ProbabilisticActor

- class torchrl.modules.tensordict_module.ProbabilisticActor(*args, **kwargs)

General class for probabilistic actors in RL.

The Actor class comes with a default value for out_keys (["action"]), and if a spec is provided but is not a Composite object, it will automatically be translated into

spec = Composite(action=spec)
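The auto-wrapping above can be sketched without TorchRL installed; the following is purely illustrative, with a plain dict standing in for Composite and `wrap_spec` a hypothetical helper name, not the actual implementation:

```python
def wrap_spec(spec):
    # Illustrative stand-in for the conversion described above: a dict plays
    # the role of Composite, so a non-composite spec is keyed under "action",
    # mirroring spec = Composite(action=spec).
    if not isinstance(spec, dict):
        spec = {"action": spec}
    return spec

# A bare (non-composite) spec gets wrapped; a composite one is left as-is.
wrapped = wrap_spec("bounded_spec_placeholder")
already_composite = wrap_spec({"action": "bounded_spec_placeholder"})
```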
- Parameters:
  - module (nn.Module) – a torch.nn.Module used to map the input to the output parameter space.
  - in_keys (str, iterable of str, or dict) – keys that will be read from the input TensorDict and used to build the distribution. Importantly, if in_keys is a string or an iterable of strings, those keys must match the keywords used by the distribution class of interest, e.g. "loc" and "scale" for the Normal distribution and similar. If in_keys is a dictionary, its keys are the keys of the distribution, and its values are the keys in the tensordict that will be matched to the corresponding distribution keys.
  - out_keys (str or iterable of str) – keys where the sampled values will be written. Importantly, if these keys are found in the input TensorDict, the sampling step will be skipped.
  - spec (TensorSpec, optional) – keyword-only argument containing the spec of the output tensor. If the module outputs multiple output tensors, spec characterizes the space of the first output tensor.
  - safe (bool) – keyword-only argument. If True, the output value is checked against the input spec. Out-of-domain sampling can occur because of exploration policies or numerical under/overflow issues. If the value is out of bounds, it is projected back onto the desired space using the TensorSpec.project method. Default is False.
  - default_interaction_type (tensordict.nn.InteractionType, optional) – keyword-only argument. Default method to be used to retrieve the output value. Should be one of InteractionType.MODE, InteractionType.DETERMINISTIC, InteractionType.MEDIAN, InteractionType.MEAN or InteractionType.RANDOM (in which case the value is sampled randomly from the distribution). TorchRL's ExplorationType class is a proxy to InteractionType. Defaults to InteractionType.DETERMINISTIC.

    Note: when a sample is drawn, a ProbabilisticActor instance will first look for the interaction mode dictated by the global interaction_type() function. If this returns None (its default value), the default_interaction_type of the ProbabilisticTDModule instance is used instead. Note that DataCollectorBase instances use set_interaction_type to set tensordict.nn.InteractionType.RANDOM by default.

  - distribution_class (Type, optional) – keyword-only argument. A torch.distributions.Distribution class to be used for sampling. Default is tensordict.nn.distributions.Delta.

    Note: if distribution_class is of type CompositeDistribution, the keys will be inferred from the distribution_map / name_map keyword arguments of that distribution. If this distribution is used with another constructor (e.g., a partial or lambda function), the out_keys need to be provided explicitly. Note also that actions are not prefixed with an "action" key; see the example below for a way to achieve this with a ProbabilisticActor.

  - distribution_kwargs (dict, optional) – keyword-only argument. Keyword-argument pairs to be passed to the distribution.
  - return_log_prob (bool, optional) – keyword-only argument. If True, the log-probability of the distribution sample will be written in the tensordict with the key 'sample_log_prob'. Default is False.
  - cache_dist (bool, optional) – keyword-only argument. EXPERIMENTAL: if True, the parameters of the distribution (i.e. the output of the module) will be written to the tensordict along with the sample. Those parameters can be used to re-compute the original distribution later on (e.g. to compute the divergence between the distribution used to sample the action and the updated distribution in PPO). Default is False.
  - n_empirical_estimate (int, optional) – keyword-only argument. Number of samples used to compute the empirical mean when it is not available. Defaults to 1000.
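The interaction-mode resolution described in the note under default_interaction_type can be sketched without any dependencies. This is a minimal, hypothetical mock of the tensordict.nn machinery (the names mirror the real API, but the bodies are illustrative only), showing that a globally-set interaction type takes precedence over the module's default:

```python
from enum import Enum

# Illustrative stand-ins for tensordict.nn's InteractionType and the global
# interaction_type() function; not the real implementation.
class InteractionType(Enum):
    DETERMINISTIC = "deterministic"
    RANDOM = "random"

_GLOBAL_TYPE = None  # what the global interaction_type() returns; None by default

def interaction_type():
    return _GLOBAL_TYPE

def resolve_interaction(default_interaction_type):
    # Mirrors the note above: the globally-dictated mode wins; if it is None,
    # fall back to the module's default_interaction_type.
    global_type = interaction_type()
    return global_type if global_type is not None else default_interaction_type

# With no global override, the module's default is used.
default_mode = resolve_interaction(InteractionType.DETERMINISTIC)

# A collector-style override (akin to set_interaction_type) takes precedence.
_GLOBAL_TYPE = InteractionType.RANDOM
overridden_mode = resolve_interaction(InteractionType.DETERMINISTIC)
```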
Examples

>>> import torch
>>> from torch import nn
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictModule
>>> from torchrl.data import Bounded
>>> from torchrl.modules import ProbabilisticActor, NormalParamExtractor, TanhNormal
>>> td = TensorDict({"observation": torch.randn(3, 4)}, [3,])
>>> action_spec = Bounded(shape=torch.Size([4]),
...     low=-1, high=1)
>>> module = nn.Sequential(torch.nn.Linear(4, 8), NormalParamExtractor())
>>> tensordict_module = TensorDictModule(module, in_keys=["observation"], out_keys=["loc", "scale"])
>>> td_module = ProbabilisticActor(
...     module=tensordict_module,
...     spec=action_spec,
...     in_keys=["loc", "scale"],
...     distribution_class=TanhNormal,
...     )
>>> td = td_module(td)
>>> td
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        loc: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        observation: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        scale: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([3]),
    device=None,
    is_shared=False)
Probabilistic actors also support compound actions through the tensordict.nn.CompositeDistribution class. This distribution takes a tensordict as input (typically "params") and reads it as a whole: the content of that tensordict is the input to the distributions contained in the compound one.

Examples
>>> from tensordict import TensorDict
>>> from tensordict.nn import CompositeDistribution, TensorDictModule
>>> from torchrl.modules import ProbabilisticActor
>>> from torch import nn, distributions as d
>>> import torch
>>>
>>> class Module(nn.Module):
...     def forward(self, x):
...         return x[..., :3], x[..., 3:6], x[..., 6:]
>>> module = TensorDictModule(Module(),
...     in_keys=["x"],
...     out_keys=[("params", "normal", "loc"),
...         ("params", "normal", "scale"),
...         ("params", "categ", "logits")])
>>> actor = ProbabilisticActor(module,
...     in_keys=["params"],
...     distribution_class=CompositeDistribution,
...     distribution_kwargs={"distribution_map": {
...         "normal": d.Normal, "categ": d.Categorical}}
...     )
>>> data = TensorDict({"x": torch.rand(10)}, [])
>>> actor(data)
TensorDict(
    fields={
        categ: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int64, is_shared=False),
        normal: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
        params: TensorDict(
            fields={
                categ: TensorDict(
                    fields={
                        logits: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False),
                normal: TensorDict(
                    fields={
                        loc: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                        scale: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        x: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
Using a probabilistic actor with a composite distribution, prefixed with an "action" key, can be achieved with the following example code:

Examples
>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import CompositeDistribution
>>> from tensordict.nn import TensorDictModule
>>> from torch import distributions as d
>>> from torch import nn
>>>
>>> from torchrl.modules import ProbabilisticActor
>>>
>>> class Module(nn.Module):
...     def forward(self, x):
...         return x[..., :3], x[..., 3:6], x[..., 6:]
...
>>> module = TensorDictModule(Module(),
...     in_keys=["x"],
...     out_keys=[
...         ("params", "normal", "loc"), ("params", "normal", "scale"), ("params", "categ", "logits")
...     ])
>>> actor = ProbabilisticActor(module,
...     in_keys=["params"],
...     distribution_class=CompositeDistribution,
...     distribution_kwargs={"distribution_map": {"normal": d.Normal, "categ": d.Categorical},
...         "name_map": {"normal": ("action", "normal"),
...             "categ": ("action", "categ")}}
...     )
>>> print(actor.out_keys)
[('params', 'normal', 'loc'), ('params', 'normal', 'scale'), ('params', 'categ', 'logits'), ('action', 'normal'), ('action', 'categ')]
>>>
>>> data = TensorDict({"x": torch.rand(10)}, [])
>>> module(data)
>>> print(actor(data))
TensorDict(
    fields={
        action: TensorDict(
            fields={
                categ: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int64, is_shared=False),
                normal: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        params: TensorDict(
            fields={
                categ: TensorDict(
                    fields={
                        logits: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False),
                normal: TensorDict(
                    fields={
                        loc: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                        scale: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        x: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)