NF4Tensor

class torchao.dtypes.NF4Tensor(tensor_meta: SubclassTensorArgs, block_size: int, n_blocks: int, scaler_block_size: int, quantized_scalers: Tensor, quantization_factor: Tensor, scaler_mean: Tensor, quantized_data: Tensor, nf4: Tensor)

NF4Tensor class for converting weights to the QLoRA NF4 format.
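To see why the class stores both `quantized_scalers` and a per-scaler-block `quantization_factor`, it helps to count bits. The sketch below is a back-of-the-envelope estimate, not part of the torchao API. It assumes 4-bit codes, int8 first-level scalers and int16 second-level factors (the dtypes listed under `double_quantize_scalers`), and it assumes `block_size=64` and `scaler_block_size=256` as defaults. `nf4_bits_per_param` and `bits_without_double_quant` are hypothetical helpers.

```python
def nf4_bits_per_param(
    block_size: int = 64,          # assumed default: weights per first-level block
    scaler_block_size: int = 256,  # assumed default: scalers per second-level block
    code_bits: int = 4,
    scaler_bits: int = 8,
    factor_bits: int = 16,
) -> float:
    """Hypothetical helper: average storage bits per weight (ignores the single scaler mean)."""
    bits = float(code_bits)                                  # one 4-bit NF4 code per weight
    bits += scaler_bits / block_size                         # one int8 scaler per block of weights
    bits += factor_bits / (block_size * scaler_block_size)   # one factor per block of scalers
    return bits


def bits_without_double_quant(block_size: int = 64, scaler_bits: int = 16) -> float:
    """Hypothetical comparison: 16-bit scalers stored directly, with no second level."""
    return 4.0 + scaler_bits / block_size


print(nf4_bits_per_param())         # 4.1259765625
print(bits_without_double_quant())  # 4.25
```

Under these assumptions, double quantization cuts the per-weight overhead of the scalers from 0.25 bits to roughly 0.126 bits.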

static convert_to_norm_float_weight(input_tensor: Tensor, n_blocks: int, block_size: int, nf4: Tensor) → Tensor

Convert a tensor to the normalized float weight format.
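A minimal pure-Python sketch of what this conversion involves, under stated assumptions: each block of `block_size` values is divided by its absolute maximum so it lies in [-1, 1], every scaled value is snapped to the nearest of the 16 NF4 levels, and consecutive 4-bit codes are packed two per byte. `NF4_LEVELS` is assumed to hold the 16 QLoRA NF4 values (the role played by the `nf4` argument), and `to_norm_float_codes` and `nearest_code` are hypothetical names; the library itself works on torch tensors, not lists.

```python
from typing import List

# Assumed: the 16 NF4 levels from QLoRA, sorted ascending (stands in for `nf4`).
NF4_LEVELS: List[float] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230134963989,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def nearest_code(x: float) -> int:
    # Index of the NF4 level closest to x.
    return min(range(len(NF4_LEVELS)), key=lambda i: abs(x - NF4_LEVELS[i]))


def to_norm_float_codes(values: List[float], block_size: int) -> List[int]:
    assert len(values) % block_size == 0, "length must be a multiple of block_size"
    assert len(values) % 2 == 0, "two 4-bit codes are packed per byte"
    codes: List[int] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # an all-zero block keeps scale 1.0
        # Scale the block into [-1, 1], then snap each value to an NF4 index.
        codes.extend(nearest_code(v / absmax) for v in block)
    # Pack consecutive pairs: [a, b, c, d] -> [(a << 4) | b, (c << 4) | d].
    return [(codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2)]


print(to_norm_float_codes([0.5, -1.0, 0.0, 2.0], block_size=2))  # [192, 127]
```

Here the codes are [12, 0, 7, 15]. Note that 0.5 lands on level 12 (about 0.4407); that gap is the quantization error.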

static dequantize(value: Tensor, nf4: Tensor) → Tensor

Dequantize nf4 values to bfloat16 format.
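Because each NF4 code is just an index, dequantizing is a table lookup. The sketch below assumes the two-codes-per-byte packing (high nibble first) and uses Python floats, so it does not reproduce the bfloat16 output dtype. `unpack_bytes` and `dequantize_codes` are hypothetical helpers, and `NF4_LEVELS` is assumed to match the `nf4` table.

```python
from typing import List

# Assumed: the 16 NF4 levels from QLoRA (stands in for `nf4`).
NF4_LEVELS: List[float] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230134963989,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def unpack_bytes(packed: List[int]) -> List[int]:
    # Split each byte into its high nibble (first code) and low nibble (second code).
    codes: List[int] = []
    for b in packed:
        codes.append(b >> 4)
        codes.append(b & 0x0F)
    return codes


def dequantize_codes(codes: List[int]) -> List[float]:
    # Each 4-bit code indexes directly into the 16-entry table.
    return [NF4_LEVELS[c] for c in codes]


print(dequantize_codes(unpack_bytes([192, 127])))
```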

dequantize_scalers(input_tensor: Tensor, quantization_factor: Tensor, scaler_block_size: int) → Tensor

Used to unpack the double-quantized scalers.

Parameters:

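input_tensor – Input tensor to convert to QLoRA format; here, the quantized scalers in int8 format
quantization_factor – Tensor of per-scaler-block quantization factors, stored in inpt_weight.dtype
scaler_block_size – Scaler block size to use for double quantization.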
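A minimal sketch of the unpacking step, assuming the int8 scalers are laid out contiguously in blocks of `scaler_block_size` and that each block is divided by its own quantization factor. The `scaler_mean` argument is a hypothetical addition: it assumes the mean subtracted during double quantization must be added back, and the class's `scaler_mean` field is presumably where that value would come from. `dequantize_scalers_sketch` is a hypothetical name.

```python
from typing import List


def dequantize_scalers_sketch(
    quantized: List[int],
    quantization_factor: List[float],
    scaler_block_size: int,
    scaler_mean: float = 0.0,  # hypothetical: mean removed during double quantization
) -> List[float]:
    assert len(quantized) % scaler_block_size == 0, "scalers must fill whole blocks"
    n_scaler_blocks = len(quantized) // scaler_block_size
    assert len(quantization_factor) == n_scaler_blocks, "one factor per scaler block"
    out: List[float] = []
    for blk in range(n_scaler_blocks):
        factor = quantization_factor[blk]
        block = quantized[blk * scaler_block_size:(blk + 1) * scaler_block_size]
        # Undo the per-block scaling, then restore the mean.
        out.extend(q / factor + scaler_mean for q in block)
    return out


print(dequantize_scalers_sketch([127, -64, 10, 20], [127.0, 10.0], 2, scaler_mean=0.5))
```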

static double_quantize_scalers(input_tensor: Tensor, block_size: int, scaler_block_size: int) → Tuple[Tensor, Tensor, Tensor]

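Used to achieve double quantization of the scalers. We first take the input tensor and compute the absmax quantization factor for each block. We then compute the mean of these positive absmax scalers, subtract that mean from the scalers, and compute the absmax quantization factor for each block again. Finally, we quantize the scalers to int8.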

Parameters:

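input_tensor – Input tensor to convert to QLoRA format, typically a weight tensor
block_size – Block size used for the first-level absmax quantization
scaler_block_size – Scaler block size to use for double quantization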

Returns:

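torch.Tensor: Tensor of per-block quantization factors stored in int8 format; size: (n_blocks)
torch.Tensor: Tensor of per-scaler-block quantization factors stored in int16 format; size: (n_scaler_blocks)
torch.Tensor: Mean of the first-level absmax scalers, subtracted before the second round of quantization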

Return type:

Tuple[Tensor, Tensor, Tensor]
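The double-quantization procedure described above can be sketched in pure Python. This is an illustration under assumptions, not the torchao implementation: the second level uses symmetric int8 absmax quantization with a factor of 127 / absmax (the library's exact constant may differ), an all-zero scaler block gets a factor of 1.0 to avoid dividing by zero, and `block_absmax` and `double_quantize_scalers_sketch` are hypothetical names. Following the `Tuple[Tensor, Tensor, Tensor]` signature, the sketch returns the int8 scalers, one factor per scaler block, and the mean.

```python
from typing import List, Tuple


def block_absmax(values: List[float], block_size: int) -> List[float]:
    # One absolute maximum per contiguous block.
    assert len(values) % block_size == 0, "length must be a multiple of block_size"
    return [max(abs(v) for v in values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def double_quantize_scalers_sketch(
    input_tensor: List[float], block_size: int, scaler_block_size: int
) -> Tuple[List[int], List[float], float]:
    # First level: one absmax scaler per block of weights.
    scalers = block_absmax(input_tensor, block_size)
    # The absmax values are all non-negative, so center them on their mean.
    mean = sum(scalers) / len(scalers)
    centered = [s - mean for s in scalers]
    assert len(centered) % scaler_block_size == 0, "scalers must fill whole blocks"
    quantized: List[int] = []
    factors: List[float] = []
    # Second level: symmetric int8 absmax quantization per scaler block.
    for start in range(0, len(centered), scaler_block_size):
        block = centered[start:start + scaler_block_size]
        absmax = max(abs(c) for c in block)
        factor = 127.0 / absmax if absmax > 0 else 1.0
        factors.append(factor)
        quantized.extend(max(-128, min(127, round(c * factor))) for c in block)
    return quantized, factors, mean


weights = [1.0, -2.0, 0.5, 0.25, -4.0, 3.0, 0.1, -0.1]
print(double_quantize_scalers_sketch(weights, block_size=2, scaler_block_size=2))
```

Reconstructing a scaler as `q / factor + mean` recovers it to within half a quantization step (`0.5 / factor`).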

get_original_weight() → Tensor

Get the original weight back from the normalized float weight format.
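Putting the pieces together, recovering weights means unpacking the 4-bit codes, looking each one up in the NF4 table, and multiplying it by the scaler of its block. The sketch assumes the scalers have already been dequantized (the job of `dequantize_scalers`), assumes high-nibble-first packing, and `get_original_weight_sketch` is a hypothetical name; `NF4_LEVELS` is assumed to match the `nf4` table.

```python
from typing import List

# Assumed: the 16 NF4 levels from QLoRA (stands in for `nf4`).
NF4_LEVELS: List[float] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230134963989,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def get_original_weight_sketch(packed: List[int], scalers: List[float],
                               block_size: int) -> List[float]:
    # Unpack two 4-bit codes per byte, high nibble first.
    codes: List[int] = []
    for b in packed:
        codes.extend((b >> 4, b & 0x0F))
    assert len(codes) == len(scalers) * block_size, "one scaler per block"
    # Look up each code and rescale it by its block's (already dequantized) scaler.
    return [NF4_LEVELS[c] * scalers[i // block_size] for i, c in enumerate(codes)]


print(get_original_weight_sketch([192, 127], [1.0, 2.0], block_size=2))
```

With packed bytes [192, 127], scalers [1.0, 2.0] and `block_size=2`, it returns [0.44070982933044434, -1.0, 0.0, 2.0].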

static quantize_tensor_nearest(value: Tensor, nf4: Tensor) → Tensor

Quantize a float16 tensor to nf4 format using the nearest value (not rounding up).
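The difference between nearest-value and round-up quantization is easiest to see side by side. `quantize_nearest` picks the closest level; `quantize_round_up` is a hypothetical contrast that picks the smallest level not below the value (clipped to the top index). `NF4_LEVELS` is assumed to be the sorted 16-entry QLoRA table, and Python floats stand in for float16.

```python
import bisect
from typing import List

# Assumed: the 16 NF4 levels from QLoRA, sorted ascending (stands in for `nf4`).
NF4_LEVELS: List[float] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230134963989,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_nearest(values: List[float]) -> List[int]:
    # Closest level by absolute difference; min() resolves ties to the lower index.
    return [min(range(len(NF4_LEVELS)), key=lambda i: abs(v - NF4_LEVELS[i]))
            for v in values]


def quantize_round_up(values: List[float]) -> List[int]:
    # Contrast only: smallest level >= value, clipped to the last index.
    return [min(bisect.bisect_left(NF4_LEVELS, v), len(NF4_LEVELS) - 1) for v in values]


print(quantize_nearest([0.09, -0.99, 1.2]))   # [8, 0, 15]
print(quantize_round_up([0.09, -0.99, 1.2]))  # [9, 1, 15]
```

For 0.09 the nearest level is index 8 (about 0.0796), while rounding up gives index 9 (about 0.1609), several times farther from the input.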
