ActionMask

class torchrl.envs.transforms.ActionMask(action_key: NestedKey = 'action', mask_key: NestedKey = 'action_mask')

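An adaptive action masker.

This transform ensures that randomly generated actions respect the set of legal actions by masking the action spec. After each step is executed, it reads the mask from the input tensordict and updates the mask of the finite action spec accordingly.

Note

This transform fails when used without an environment.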

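Parameters:

action_key (NestedKey, optional) – the key where the action tensor can be found. Defaults to "action".
mask_key (NestedKey, optional) – the key where the action mask can be found. Defaults to "action_mask".

Examples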

>>> import torch
>>> from torchrl.data.tensor_specs import Categorical, Binary, Unbounded, Composite
>>> from torchrl.envs.transforms import ActionMask, TransformedEnv
>>> from torchrl.envs.common import EnvBase
>>> class MaskedEnv(EnvBase):
...     def __init__(self, *args, **kwargs):
...         super().__init__(*args, **kwargs)
...         self.action_spec = Categorical(4)
...         self.state_spec = Composite(action_mask=Binary(4, dtype=torch.bool))
...         self.observation_spec = Composite(obs=Unbounded(3))
...         self.reward_spec = Unbounded(1)
...
...     def _reset(self, tensordict=None):
...         td = self.observation_spec.rand()
...         td.update(torch.ones_like(self.state_spec.rand()))
...         return td
...
...     def _step(self, data):
...         td = self.observation_spec.rand()
...         mask = data.get("action_mask")
...         action = data.get("action")
...         mask = mask.scatter(-1, action.unsqueeze(-1), 0)
...
...         td.set("action_mask", mask)
...         td.set("reward", self.reward_spec.rand())
...         td.set("done", ~mask.any().view(1))
...         return td
...
...     def _set_seed(self, seed) -> None:
...         pass
...
>>> torch.manual_seed(0)
>>> base_env = MaskedEnv()
>>> env = TransformedEnv(base_env, ActionMask())
>>> r = env.rollout(10)
>>> r["action_mask"]
tensor([[ True,  True,  True,  True],
        [ True,  True, False,  True],
        [ True,  True, False, False],
        [ True, False, False, False]])
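
The rollout above returns four rows even though ten steps were requested: each action that is taken becomes illegal, so the episode ends once no legal action remains. The following is a minimal pure-Python sketch of that masking logic, not TorchRL's implementation. The helper names `sample_legal_action` and `masked_rollout` are hypothetical and exist only for illustration.

```python
import random


def sample_legal_action(mask, rng):
    """Pick uniformly among the indices whose mask entry is True."""
    legal = [i for i, allowed in enumerate(mask) if allowed]
    if not legal:
        raise ValueError("no legal action left")
    return rng.choice(legal)


def masked_rollout(n_actions=4, max_steps=10, seed=0):
    """Hypothetical stand-in for the rollout above, without tensors."""
    rng = random.Random(seed)
    mask = [True] * n_actions  # after reset, every action is legal
    history = []
    for _ in range(max_steps):
        history.append(list(mask))  # mask visible when the action is sampled
        action = sample_legal_action(mask, rng)
        mask[action] = False  # the action just taken becomes illegal
        if not any(mask):  # done once no legal action remains
            break
    return history


for row in masked_rollout():
    print(row)
```

Whatever the seed, the sketch records four masks holding four, three, two and one legal actions, which is the pattern visible in `r["action_mask"]`.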

forward(tensordict: TensorDictBase) → TensorDictBase

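Reads the input tensordict and applies the transform to the selected keys.

By default, this method calls _apply_transform() directly and does not call _step() or _call().

This method is never called within env.step. It is, however, called within sample() of a replay buffer.

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples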

>>> from tensordict import TensorDict, TensorDictBase
>>> from torchrl.envs.transforms import Transform
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         # Store the measured size, not the builtin `bytes` type.
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
