quantization_utils

Utilities for quantization, including scaling factor adjustments.

Functions

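`adjust_attn_amax_values`	Adjusts the amax values for the attention layers.
`all_items_same`	Checks whether all elements in the provided list are the same.
`convert_state_dict_amax_to_scales`	Converts the _amax keys in a quantized state dictionary to scale values and updates the state dictionary accordingly.
`filter_output_quantizer`	Filters out all output quantizers in the state_dict except those related to the kv_cache.
`from_quantized_weight`	Converts the quantized weight to the target torch_dtype format.
`get_activation_scaling_factor`	Returns the activation scaling factor.
`get_kv_cache_dtype`	Returns the kv_cache dtype.
`get_kv_cache_scaling_factor`	Returns the kv_cache scaling factor if the output quantizer is set.
`get_prequant_scaling_factor`	Returns the prequant scaling factor.
`get_qkv_and_avg_prequant_scale`	Gets the qkv and average prequant scaling factors for the module.
`get_quantization_format`	Gets the quantization string.
`get_scaling_factor`	Returns the scaling factor from the quantizer as a torch.Tensor.
`get_weight_block_size`	Returns the weight block size.
`get_weight_scaling_factor`	Returns the weight scaling factor.
`get_weight_scaling_factor_2`	Returns the secondary weight scaling factor.
`get_weights_scaling_factor_and_amax`	Calculates the weight scaling factors for a given group size.
`process_layer_quant_config`	Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
`resmooth_and_get_scale_and_amax`	Resmooths weights from a single rank or multiple ranks and gets the scaling factors and amax.
`to_quantized_weight`	Converts the weight to the quantized (packed) format.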

adjust_attn_amax_values(module): Adjusts the amax values for the attention layers.

all_items_same(item_list): Checks whether all elements in the provided list are the same.

convert_state_dict_amax_to_scales(quantized_state_dict, maxbound, layers_quant_config)

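Converts the _amax keys in a quantized state dictionary to scale values and updates the state dictionary accordingly.

Parameters:

quantized_state_dict (dict) – The input state dictionary containing quantized values.
maxbound (float) – The maximum bound value for the given quantization format.
layers_quant_config (dict/str) – A dictionary containing per-layer quantization format information (for mixed precision), or a string containing the quantization format (for regular quantization).

Returns:

The updated state dictionary with the converted scale values.

Return type:

dict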
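
The conversion described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each scale is `amax / maxbound` and that a key ending in `_amax` is renamed to end in `_scale`. The real function's key naming and its handling of `layers_quant_config` are not specified on this page, and plain floats stand in for tensors. The name `amax_to_scales_sketch` is hypothetical.

```python
def amax_to_scales_sketch(quantized_state_dict, maxbound):
    """Return a new dict in which every `*_amax` entry becomes a `*_scale` entry.

    Assumption (not confirmed by the reference text): scale = amax / maxbound.
    """
    converted = {}
    for key, value in quantized_state_dict.items():
        if key.endswith("_amax"):
            # Hypothetical renaming rule: swap the `_amax` suffix for `_scale`.
            new_key = key[: -len("_amax")] + "_scale"
            converted[new_key] = value / maxbound
        else:
            # Non-amax entries are passed through unchanged.
            converted[key] = value
    return converted


# 448.0 is the largest finite value of the FP8 E4M3 format.
state = {"layer0.weight": [1, 2], "layer0.weight_quantizer._amax": 896.0}
scales = amax_to_scales_sketch(state, maxbound=448.0)
```

Here the amax of 896.0 maps to a scale of 2.0, and the weight entry is left as it was.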

filter_output_quantizer(state_dict)

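Filters out all output quantizers in the state_dict except those related to the kv_cache.

Parameters: state_dict (dict) – The full model state dictionary.
Returns: The filtered state_dict, containing only the kv_cache output quantizers.
Return type: dict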
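
As a rough illustration of this kind of filtering, the sketch below drops output-quantizer entries unless their key looks kv_cache related. The substring tests (`output_quantizer`, `k_proj`, `v_proj`) and the choice to keep all non-quantizer entries are assumptions made for the example, not the library's documented rule, and `filter_output_quantizer_sketch` is a hypothetical name.

```python
def filter_output_quantizer_sketch(state_dict, kv_markers=("k_proj", "v_proj")):
    """Drop output-quantizer entries unless their key looks kv_cache related.

    Assumption: key substrings are enough to identify both kinds of entry.
    """
    filtered = {}
    for key, value in state_dict.items():
        is_output_quantizer = "output_quantizer" in key
        is_kv_related = any(marker in key for marker in kv_markers)
        if not is_output_quantizer or is_kv_related:
            filtered[key] = value
    return filtered


state = {
    "attn.q_proj.output_quantizer._amax": 1.0,
    "attn.k_proj.output_quantizer._amax": 2.0,
    "attn.v_proj.output_quantizer._amax": 3.0,
    "attn.q_proj.weight": 4.0,
}
kept = filter_output_quantizer_sketch(state)
```

Only the `q_proj` output quantizer is removed in this example; the `k_proj` and `v_proj` quantizers and the weight entry survive.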

from_quantized_weight(weight, weights_scaling_factor, quantization, torch_dtype)

Converts the quantized weight to the target torch_dtype format.

Parameters:

weight (Tensor) –
weights_scaling_factor (Tensor) –
quantization (str) –

get_activation_scaling_factor(module)

Returns the activation scaling factor.

Parameters: module (Module) –
Return type: Tensor

get_kv_cache_dtype(modules)

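Returns the kv_cache dtype.

If the num_bits of the output_quantizer is (4, 3), returns FP8; if it is 8, returns int8; otherwise returns None.

Parameters: modules (Union[List[nn.Module], nn.Module]) – The module or list of modules to inspect.
Returns: The kv_cache dtype.
Return type: str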
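
The num_bits rule stated above can be written as a small lookup. This sketch takes a bare `num_bits` value rather than modules, because how the real function reads quantizers from one or several modules is not described here. The function name is hypothetical and the returned strings simply mirror the wording of this page.

```python
def kv_cache_dtype_from_num_bits(num_bits):
    """Map an output quantizer's num_bits to a kv_cache dtype label.

    (4, 3) denotes a floating-point format with 4 exponent and 3 mantissa bits.
    """
    if num_bits == (4, 3):
        return "FP8"
    if num_bits == 8:
        return "int8"
    # Any other setting yields no kv_cache dtype.
    return None
```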

get_kv_cache_scaling_factor(qkv_modules)

Returns the kv_cache scaling factor if the output quantizer is set. Otherwise, returns None by default.

Parameters: qkv_modules (List[Module]) –
Return type: Tensor

get_prequant_scaling_factor(module, dtype)

Returns the prequant scaling factor.

Parameters:

module (Module) –
dtype (dtype) –

Return type:

Tensor

get_qkv_and_avg_prequant_scale(module, dtype)

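Gets the qkv and average prequant scaling factors for the module.

Parameters:

module – The module containing the q, k, and v submodules.
dtype – The data type of the scaling factors.

Returns:

A tuple containing the average prequant scaling factor and the individual scaling factors for q, k, and v.

Return type:

tuple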
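
To make the returned tuple concrete, here is a minimal sketch that assumes the average is the element-wise mean of the three per-projection scales. That averaging rule is an assumption, lists of floats stand in for tensors, and `average_prequant_scale` is a hypothetical helper.

```python
def average_prequant_scale(q_scale, k_scale, v_scale):
    """Element-wise mean of the q, k, and v prequant scales (assumed rule)."""
    return [(q + k + v) / 3 for q, k, v in zip(q_scale, k_scale, v_scale)]


q, k, v = [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]
avg = average_prequant_scale(q, k, v)
# Same ordering as the documented return value: average first, then q, k, v.
result = (avg, q, k, v)
```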

get_quantization_format(module)

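Gets the quantization string.

Gets the quantization string by iterating through the module and its submodules. Returns the first non-None quantization string.

Return type: str | None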
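
The traversal described above amounts to a depth-first search that stops at the first hit. The sketch below uses a hypothetical `Node` class in place of real modules, and the format labels in the usage example are illustrative strings only. The visiting order of the real function is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Stand-in for a module: an optional format label plus child nodes."""
    quantization: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def first_quantization_format(node):
    """Return the first quantization string found, parent before children."""
    if node.quantization is not None:
        return node.quantization
    for child in node.children:
        found = first_quantization_format(child)
        if found is not None:
            return found
    return None


tree = Node(children=[
    Node(),  # unquantized submodule, skipped
    Node(quantization="fp8", children=[Node(quantization="int8")]),
    Node(quantization="int4_awq"),
])
fmt = first_quantization_format(tree)
```

In this tree the search returns the label of the second child and never visits the nodes after it.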

get_scaling_factor(quantizer)

Returns the scaling factor from the quantizer as a torch.Tensor.

Parameters: quantizer (TensorQuantizer) –
Return type: Tensor

get_weight_block_size(module)

Returns the weight block size.

Parameters: module (Module) –
Return type: int

get_weight_scaling_factor(module)

Returns the weight scaling factor.

Parameters: module (Module) –
Return type: Tensor

get_weight_scaling_factor_2(module)

Returns the secondary weight scaling factor.

Parameters: module (Module) –
Return type: Tensor

get_weights_scaling_factor_and_amax(weight, group_size): Calculates the weight scaling factors for a given group size.

process_layer_quant_config(layer_config_dict): Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
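
One common way to compute group-wise scaling factors is sketched below. It assumes each weight row is split into consecutive groups of `group_size` elements, the amax of a group is its largest absolute value, and the scale is `amax / maxbound`. The `maxbound` parameter (7.0, the largest value of a signed symmetric 4-bit range) is an assumption added for the example and is not part of the documented signature. Nested lists stand in for tensors.

```python
def group_scales_and_amax(weight, group_size, maxbound=7.0):
    """Return (scales, amaxes), one value per group of each weight row."""
    scales, amaxes = [], []
    for row in weight:
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        # Largest absolute value inside each consecutive group.
        row_amax = [
            max(abs(x) for x in row[i : i + group_size])
            for i in range(0, len(row), group_size)
        ]
        amaxes.append(row_amax)
        scales.append([a / maxbound for a in row_amax])
    return scales, amaxes


weight = [[1.0, -7.0, 0.5, 3.5], [-14.0, 2.0, 0.0, 0.0]]
scales, amaxes = group_scales_and_amax(weight, group_size=2)
```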

resmooth_and_get_scale_and_amax(merged_weights, pre_quant_scales, ranks, group_size, avg_pre_quant_scale=None, quantization=None)

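Resmooths weights from a single rank or multiple ranks and gets the scaling factors and amax.

Parameters:

merged_weights (Tensor) – The weights merged from the ranks.
pre_quant_scales (List[Tensor]) – The list of pre-quantization scales, one per rank.
ranks (int) – The number of ranks.
group_size (int) – The group size of the quantization block.
avg_pre_quant_scale (optional) – If not provided, the weights are resmoothed using the average of pre_quant_scales.
quantization (str | None) –

Returns:

weights: The resmoothed weights.
weight_scaling_factors: The resmoothed scaling factors.
avg_pre_quant_scale: The calculated average of the pre-quantization scales.
amaxes: The amax values of the weights.

Return type:

weights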
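
The resmoothing step alone can be sketched as follows. The sketch assumes that `merged_weights` stacks equal-sized row blocks, one per rank, and that each block is rescaled column-wise by the ratio of its own pre-quantization scale to the average scale. Both points are assumptions for illustration; the scaling-factor and amax computation is left out, nested lists stand in for tensors, and `resmooth_sketch` is a hypothetical name.

```python
def resmooth_sketch(merged_weights, pre_quant_scales, ranks, avg_pre_quant_scale=None):
    """Rescale each rank's rows from its own pre-quant scale to a shared average."""
    if avg_pre_quant_scale is None:
        # Column-wise mean of the per-rank scales.
        avg_pre_quant_scale = [
            sum(col) / len(pre_quant_scales) for col in zip(*pre_quant_scales)
        ]
    rows_per_rank = len(merged_weights) // ranks
    resmoothed = []
    for rank, scale in enumerate(pre_quant_scales):
        block = merged_weights[rank * rows_per_rank : (rank + 1) * rows_per_rank]
        for row in block:
            resmoothed.append(
                [w * s / a for w, s, a in zip(row, scale, avg_pre_quant_scale)]
            )
    return resmoothed, avg_pre_quant_scale


merged = [[1.0, 2.0], [3.0, 4.0]]  # one row from rank 0, one row from rank 1
scales = [[1.0, 2.0], [3.0, 6.0]]  # one pre-quant scale vector per rank
weights, avg = resmooth_sketch(merged, scales, ranks=2)
```

Passing an explicit `avg_pre_quant_scale` skips the averaging, mirroring the optional parameter in the signature above.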

to_quantized_weight(weight, weights_scaling_factor, quantization, weights_scaling_factor2=None, block_size=None)

Converts the weight to the quantized (packed) format.

Parameters:

weight (Tensor) –
weights_scaling_factor (Tensor) –
quantization (str) –
weights_scaling_factor2 (Tensor | None) –
block_size (int | None) –
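
The relationship between to_quantized_weight and from_quantized_weight can be illustrated with a simple 8-bit integer round trip. This is a sketch under stated assumptions: one scale per weight row, round-to-nearest, clamping to [-128, 127], and no bit packing. The formats the real functions support, and how they pack values or use `weights_scaling_factor2` and `block_size`, are not covered. Both helper names are hypothetical.

```python
def to_int8_sketch(weight, scales):
    """Quantize: divide each row by its scale, round, and clamp to int8 range."""
    return [
        [max(-128, min(127, round(w / s))) for w in row]
        for row, s in zip(weight, scales)
    ]


def from_int8_sketch(qweight, scales):
    """Dequantize: multiply each quantized row by its scale."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]


weight = [[0.5, -1.0, 0.26], [2.0, 4.0, -8.0]]
scales = [0.25, 0.5]
quantized = to_int8_sketch(weight, scales)
restored = from_int8_sketch(quantized, scales)
```

The value 0.26 comes back as 0.25, which shows the rounding error that quantization introduces, while values that are exact multiples of the scale survive the round trip unchanged.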