Int8DynActInt4WeightQATLinear

class torchao.quantization.qat.linear.Int8DynActInt4WeightQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)

This module implements a linear layer with int8 dynamic per-token fake-quantized activations and int4 fake-quantized grouped per-channel weights.
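To make "dynamic per-token fake quantization" concrete, here is a minimal sketch in plain Python. It is not torchao's implementation: the function name is hypothetical, and the asymmetric scheme (a scale plus a zero point per token) is an assumption chosen for illustration. Each row stands for one token; its scale is computed from that row's own values at call time, the values are rounded to the int8 grid, and then mapped straight back to floats.

```python
def fake_quantize_per_token(rows, qmin=-128, qmax=127):
    """Quantize each row (one token) to int8 and immediately dequantize it."""
    result = []
    for row in rows:
        # Widen the range to include 0.0 so that zero stays exactly representable.
        lo = min(min(row), 0.0)
        hi = max(max(row), 0.0)
        scale = (hi - lo) / (qmax - qmin)
        if scale == 0.0:
            scale = 1.0  # all-zero token: any positive scale works
        zero_point = qmin - round(lo / scale)
        fq_row = []
        for x in row:
            q = round(x / scale) + zero_point
            q = max(qmin, min(qmax, q))  # clamp to the int8 range
            fq_row.append((q - zero_point) * scale)  # back to float
        result.append(fq_row)
    return result


tokens = [[0.1, -0.5, 2.0], [100.0, 0.0, -3.0]]
fake_quantized = fake_quantize_per_token(tokens)
```

The output stays in floating point but only takes values on each token's int8 grid, which is how a model can be exposed to quantization error during training.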

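Parameters:

groupsize – the number of elements in each quantized group of the weights
precision – the precision of the weights
scales_precision – the precision of the per-group scales and zero points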
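The role of groupsize can be sketched as follows. This is an illustrative plain-Python version, not the library's code: the function name is hypothetical, and the symmetric scheme (zero point fixed at 0, largest magnitude mapped to the top int4 code) is an assumption. Each weight row is split into consecutive groups of groupsize elements, and every group gets its own scale.

```python
def fake_quantize_weight_per_group(weight, groupsize, qmin=-8, qmax=7):
    """Fake-quantize each row of `weight` to int4 in groups of `groupsize`.

    Returns (fake_quantized_weight, scales), with one scale per group.
    """
    fq_weight, scales = [], []
    for row in weight:  # one row per output channel
        if len(row) % groupsize != 0:
            raise ValueError("row length must be divisible by groupsize")
        fq_row, row_scales = [], []
        for start in range(0, len(row), groupsize):
            group = row[start:start + groupsize]
            max_abs = max(abs(w) for w in group)
            # Symmetric scheme: the zero point is 0 and max_abs maps to qmax.
            scale = max_abs / qmax if max_abs > 0 else 1.0
            row_scales.append(scale)
            for w in group:
                q = max(qmin, min(qmax, round(w / scale)))
                fq_row.append(q * scale)
        fq_weight.append(fq_row)
        scales.append(row_scales)
    return fq_weight, scales


weight = [[0.0, 1.0, -2.0, 3.5, 10.0, -20.0, 5.0, 0.5]]  # 1 output channel, 8 inputs
fq_weight, scales = fake_quantize_weight_per_group(weight, groupsize=4)
```

In this sketch a smaller groupsize yields more scales per row and a finer fit to local weight magnitudes, at the cost of storing more scales. With the default groupsize of 256, a row of 4096 weights would carry 16 scales.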

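Note: the activation scales are hardcoded to use torch.float32, but users may specify the precision of the weight scales (which defaults to torch.float32). To get an exact numerical match with Int8DynamicActivationInt4WeightConfig, users must use the same dtype for both the weights and the scales. Here scales_precision refers only to the weight scales, not to the activation scales.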
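One way to see why the dtype of the scales affects exact numerics: the same integer code dequantizes to a slightly different float when the scale is stored at a different precision. The sketch below uses only Python's standard struct module to emulate fp16 and fp32 storage. It does not call torchao, and the scale and code values are hypothetical.

```python
import struct


def round_to(fmt, value):
    """Round a Python float to a struct format's precision ('e' = fp16, 'f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


scale = 0.1  # hypothetical per-group weight scale
q = 7        # an int4 code

scale_fp32 = round_to("f", scale)
scale_fp16 = round_to("e", scale)

# The same code q maps to two different floats depending on the scale's precision.
dequant_fp32 = q * scale_fp32
dequant_fp16 = q * scale_fp16
```

The two dequantized values differ in the low digits, so a path that keeps the scales at a different precision from the reference path cannot reproduce its outputs exactly.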
