QMixer

class torchrl.modules.QMixer(state_shape: tuple[int, ...] | torch.Size, mixing_embed_dim: int, n_agents: int, device: DEVICE_TYPING)

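QMix mixer.

Mixes the agents' local Q-values into a global Q-value through a monotonic hypernetwork whose parameters are computed from the global state. From the paper https://arxiv.org/abs/1803.11485.

It transforms the local value of each agent's chosen action, of shape (*B, self.n_agents, 1), into a single global value of shape (*B, 1). Used with torchrl.objectives.QMixerLoss. See examples/multiagent/qmix_vdn.py for examples.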
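To make "monotonic hypernetwork" concrete, here is a minimal, dependency-free sketch of the mixing idea for a single sample. It is not torchrl's implementation: `ToyMonotonicMixer`, `random_layer`, and `linear` are hypothetical names, the hypernetworks are single linear layers with random (untrained) parameters, and the final bias is simplified to a linear function of the state. The point it illustrates is that taking the absolute value of the state-conditioned mixing weights keeps the global value non-decreasing in every agent's local value.

```python
import math
import random


def elu(x):
    """ELU activation: identity for positive inputs, exp(x) - 1 otherwise."""
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, inputs):
    """Dense layer: `weights` holds one row per output unit, `bias` one entry per row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical helper: random parameters standing in for a trained hypernetwork."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


class ToyMonotonicMixer:
    """Illustrative QMIX-style mixer for one sample (no batching, no training)."""

    def __init__(self, state_dim, n_agents, embed_dim, seed=0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: each maps the global state to mixing parameters.
        self.hyper_w1 = random_layer(rng, n_agents * embed_dim, state_dim)
        self.hyper_b1 = random_layer(rng, embed_dim, state_dim)
        self.hyper_w2 = random_layer(rng, embed_dim, state_dim)
        self.hyper_b2 = random_layer(rng, 1, state_dim)

    def mix(self, agent_qs, state):
        # abs() makes the mixing weights non-negative; biases are unconstrained.
        w1 = [abs(v) for v in linear(*self.hyper_w1, state)]
        b1 = linear(*self.hyper_b1, state)
        w2 = [abs(v) for v in linear(*self.hyper_w2, state)]
        b2 = linear(*self.hyper_b2, state)[0]
        # First mixing layer: hidden_j = ELU(sum_a q_a * w1[a, j] + b1[j]).
        hidden = []
        for j in range(self.embed_dim):
            pre = sum(
                agent_qs[a] * w1[a * self.embed_dim + j] for a in range(self.n_agents)
            ) + b1[j]
            hidden.append(elu(pre))
        # Second mixing layer: q_tot = sum_j hidden_j * w2[j] + b2.
        return sum(h * w for h, w in zip(hidden, w2)) + b2


mixer = ToyMonotonicMixer(state_dim=3, n_agents=4, embed_dim=8)
state = [0.5, -1.0, 2.0]
q_low = mixer.mix([0.0, 0.0, 0.0, 0.0], state)
q_high = mixer.mix([0.0, 1.0, 0.0, 0.0], state)  # raise one agent's local value
print(q_high >= q_low)  # True: the global value never decreases
```

Because the weights are non-negative and ELU is non-decreasing, the greedy joint action can be found by letting each agent maximize its own local value, which is the property the QMIX paper relies on.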

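Parameters:

state_shape (tuple or torch.Size) – the shape of the state (excluding any leading batch dimensions).
mixing_embed_dim (int) – the size of the mixing embedding dimension.
n_agents (int) – the number of agents.
device (str or torch.Device) – the torch device for the network.

Examples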

>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictModule
>>> from torchrl.modules.models.multiagent import QMixer
>>> n_agents = 4
>>> qmix = TensorDictModule(
...     module=QMixer(
...         state_shape=(64, 64, 3),
...         mixing_embed_dim=32,
...         n_agents=n_agents,
...         device="cpu",
...     ),
...     in_keys=[("agents", "chosen_action_value"), "state"],
...     out_keys=["chosen_action_value"],
... )
>>> td = TensorDict(
...     {
...         "agents": TensorDict(
...             {"chosen_action_value": torch.zeros(32, n_agents, 1)}, [32, n_agents]
...         ),
...         "state": torch.zeros(32, 64, 64, 3),
...     },
...     [32],
... )
>>> td
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                chosen_action_value: Tensor(shape=torch.Size([32, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 4]),
            device=None,
            is_shared=False),
        state: Tensor(shape=torch.Size([32, 64, 64, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([32]),
    device=None,
    is_shared=False)
>>> qmix(td)
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                chosen_action_value: Tensor(shape=torch.Size([32, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 4]),
            device=None,
            is_shared=False),
        chosen_action_value: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        state: Tensor(shape=torch.Size([32, 64, 64, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([32]),
    device=None,
    is_shared=False)

mix(chosen_action_value: Tensor, state: Tensor)

Forward pass of the mixer.

Parameters: chosen_action_value – tensor of shape [*B, n_agents]
Returns: chosen_action_value – tensor of shape [*B]
