AdaptiveKLController
- class torchrl.data.AdaptiveKLController(*, init_kl_coef: float, target: float, horizon: int, model: nn.Module | None = None)
Adaptive KL Controller as described in Ziegler et al. “Fine-Tuning Language Models from Human Preferences”.
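- Keyword Arguments:
init_kl_coef (float) – Starting value of the coefficient.
target (float) – Target KL value. When the observed KL is smaller than the target, the coefficient is decreased, which relaxes the KL penalty in the training objective and lets the model drift further from the reference model. When the observed KL is greater than the target, the coefficient is increased, which pulls the model back toward the reference model.
horizon (int) – Scaling factor that controls how aggressively the coefficient is updated.
model (nn.Module, optional) – Wrapped model to be controlled. Must have an attribute "kl_coef". If provided, "kl_coef" is updated in place.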
Reference: Section 2.2 of https://arxiv.org/pdf/1909.08593.pdf#page=2
Source: https://github.com/openai/lm-human-preferences/blob/master/lm_human_preferences/train_policy.py
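
The adjustment behaviour described for target and horizon can be illustrated with a minimal, standalone sketch of the proportional update rule from Ziegler et al. It does not call TorchRL and is not the library's implementation: the function name adaptive_kl_update, the n_steps argument, the HypotheticalPolicy class, and the 0.2 clipping bound (taken from the referenced paper and source) are assumptions made for illustration only.

```python
def adaptive_kl_update(coef, observed_kl, target, horizon, n_steps):
    """Return the KL coefficient after one adaptive update (illustrative sketch)."""
    # Relative deviation of the observed KL from the target, clipped to [-0.2, 0.2]
    # so that a single noisy measurement cannot change the coefficient too much.
    error = observed_kl / target - 1.0
    error = max(-0.2, min(0.2, error))
    # In this rule, a larger horizon yields a smaller multiplicative adjustment.
    return coef * (1.0 + error * n_steps / horizon)


class HypotheticalPolicy:
    """Stand-in for a wrapped model that exposes a ``kl_coef`` attribute."""

    kl_coef = 0.1


policy = HypotheticalPolicy()

# Observed KL above the target: the coefficient grows, tightening the penalty.
policy.kl_coef = adaptive_kl_update(
    policy.kl_coef, observed_kl=12.0, target=6.0, horizon=1000, n_steps=100
)

# Observed KL below the target: the coefficient shrinks, relaxing the penalty.
relaxed = adaptive_kl_update(0.1, observed_kl=3.0, target=6.0, horizon=1000, n_steps=100)
```

Writing the result back to policy.kl_coef mirrors the in-place update described for the model argument; how and when the real controller is invoked should be checked against the TorchRL source.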