TanhNormal

class torchrl.modules.TanhNormal(loc: torch.Tensor, scale: torch.Tensor, upscale: torch.Tensor | Number = 5.0, low: torch.Tensor | Number = -1.0, high: torch.Tensor | Number = 1.0, event_dims: int | None = None, tanh_loc: bool = False, safe_tanh: bool = True)

Implements a TanhNormal distribution with location scaling.

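Location scaling keeps the location from drifting "too far" from 0 when a TanhTransform is applied; a location far from 0 ultimately leads to numerically unstable samples and poor gradient computation (e.g., exploding gradients). In practice, with location scaling enabled, the location is computed as: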

\[loc = tanh(loc / upscale) * upscale.\]
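To make the effect of this formula concrete, the following minimal sketch applies it to a few scalar locations using only the Python standard library. The helper name `scale_location` is hypothetical and is not part of torchrl; it simply mirrors the formula above, with the default upscale of 5.0 taken from the class signature.

```python
import math


def scale_location(loc: float, upscale: float = 5.0) -> float:
    """Hypothetical helper mirroring loc = tanh(loc / upscale) * upscale."""
    return math.tanh(loc / upscale) * upscale


# Small locations pass through almost unchanged, while large ones
# saturate and stay strictly inside (-upscale, upscale).
for raw in (0.1, 1.0, 5.0, 50.0):
    print(f"{raw:>5} -> {scale_location(raw):.4f}")
```

Near 0 the mapping is close to the identity, so the scaling only matters for locations whose magnitude approaches or exceeds upscale.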

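Parameters:

loc (torch.Tensor) – location parameter of the normal distribution.
scale (torch.Tensor) – sigma parameter of the normal distribution (the square root of the variance).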
upscale (torch.Tensor or number) –
Scaling factor used in the formula below. Defaults to 5.0.

\[loc = tanh(loc / upscale) * upscale.\]
low (torch.Tensor or number, optional) – minimum value of the distribution. Defaults to -1.0.
high (torch.Tensor or number, optional) – maximum value of the distribution. Defaults to 1.0.
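event_dims (int, optional) – number of dimensions that describe the action. Defaults to 1. Setting event_dims to 0 yields a log-probability with the same shape as the input, 1 sums over the last dimension, 2 sums over the last two dimensions, and so on.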
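tanh_loc (bool, optional) – if True, the location is scaled with the formula above; otherwise the raw value is kept. Defaults to False.
safe_tanh (bool, optional) – if True, the tanh transform is computed "safely" to avoid numerical overflow. This currently breaks with torch.compile(). Defaults to True.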
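
The effect of event_dims on the shape of the log-probability can be illustrated without torchrl. The sketch below is a standard-library approximation, not the library's implementation: it assumes the default bounds low = -1.0 and high = 1.0, so that an action is simply a = tanh(u) with u drawn from Normal(loc, scale), and the helper name `elementwise_log_prob` is hypothetical.

```python
import math


def elementwise_log_prob(action: float, loc: float, scale: float) -> float:
    """Log-density of a = tanh(u), u ~ Normal(loc, scale), for one element."""
    u = math.atanh(action)  # invert the tanh squashing
    normal_lp = (
        -0.5 * ((u - loc) / scale) ** 2
        - math.log(scale)
        - 0.5 * math.log(2.0 * math.pi)
    )
    # Change of variables: subtract log|d tanh(u)/du| = log(1 - a^2).
    return normal_lp - math.log(1.0 - action ** 2)


# A batch of 2 actions with 3 components each (shape [2, 3]).
actions = [[0.1, -0.4, 0.7], [0.0, 0.5, -0.9]]
loc, scale = 0.0, 1.0

# event_dims = 0: one log-probability per element, same shape as the input.
per_element = [[elementwise_log_prob(a, loc, scale) for a in row] for row in actions]

# event_dims = 1: sum over the last dimension, one value per batch entry.
per_action = [sum(row) for row in per_element]

print(len(per_element), len(per_element[0]))  # 2 3
print(len(per_action))                        # 2
```

Summing over the last dimension treats the components of one action as independent, which is why the default of 1 suits a batch of vector-valued actions.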
