tensor_quantizer

TensorQuantizer module.

Classes

`TensorQuantizer`	Tensor quantizer module.
`SequentialQuantizer`	A sequential container for `TensorQuantizer` modules.

class SequentialQuantizer

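Bases: Sequential

A sequential container for TensorQuantizer modules.

This module quantizes a tensor sequentially in multiple formats. It takes TensorQuantizer modules as input and wraps them in a container, similar to torch.nn.Sequential.

Parameters: quantizers (TensorQuantizer) – The TensorQuantizer modules to add to the container.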
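
The sequential behavior described above can be illustrated with a minimal, library-free sketch. The names `make_fake_quantizer` and `SequentialSketch` are hypothetical stand-ins, not part of the actual API, and the symmetric rounding scheme is an assumption chosen for illustration.

```python
from typing import Callable, List

Quantizer = Callable[[List[float]], List[float]]


def make_fake_quantizer(num_bits: int, amax: float) -> Quantizer:
    """Hypothetical stand-in for one quantizer: symmetric fake quantization."""
    maxbound = (1 << (num_bits - 1)) - 1  # largest integer level
    scale = amax / maxbound               # width of one integer step

    def quantize(values: List[float]) -> List[float]:
        out = []
        for v in values:
            level = round(v / scale)                       # snap to the grid
            level = max(-maxbound, min(maxbound, level))   # clamp to range
            out.append(level * scale)                      # back to float
        return out

    return quantize


class SequentialSketch:
    """Applies the contained quantizers one after another, in order."""

    def __init__(self, *quantizers: Quantizer):
        self.quantizers = list(quantizers)

    def __call__(self, values: List[float]) -> List[float]:
        for quantizer in self.quantizers:
            values = quantizer(values)
        return values


# A 4-bit stage followed by an 8-bit stage, both with amax = 1.0.
two_stage = SequentialSketch(make_fake_quantizer(4, 1.0), make_fake_quantizer(8, 1.0))
result = two_stage([0.3, -0.72, 1.5])
```

Each stage receives the output of the previous one, so the order of the quantizers matters.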

__init__(*quantizers)

Initialize the SequentialQuantizer module.

Parameters: quantizers (TensorQuantizer) –

disable(): Disable the quantizer modules.

get_modelopt_state()

Get the meta state to be saved in the checkpoint.

Return type: Dict[str, Any]

static replace_sequential_quantizer_with_single_quantizer(model, indx=0)

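Replace the SequentialQuantizer instances in the model with a single quantizer.

Each sequential quantizer is replaced by the quantizer it holds at index indx. This method is useful for calibrating the quantizers in a sequential quantizer individually.

Parameters: indx (int) –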
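
The replacement step can be pictured as a recursive walk over the module tree. The sketch below uses hypothetical minimal classes (`Node`, `SeqContainer`, `LeafQuantizer`) instead of real modules, so it only illustrates the idea and is not the library's implementation.

```python
class LeafQuantizer:
    """Hypothetical stand-in for a single quantizer."""

    def __init__(self, name):
        self.name = name


class Node:
    """Hypothetical minimal module that holds named children."""

    def __init__(self, **children):
        self.children = dict(children)


class SeqContainer(Node):
    """Hypothetical stand-in for a sequential quantizer container."""

    def __init__(self, *quantizers):
        super().__init__()
        self.quantizers = list(quantizers)


def replace_with_single(module, indx=0):
    """Swap every SeqContainer in the tree for its quantizer at position indx."""
    for name, child in list(module.children.items()):
        if isinstance(child, SeqContainer):
            module.children[name] = child.quantizers[indx]
        elif isinstance(child, Node):
            replace_with_single(child, indx)  # descend into submodules


model = Node(
    layer=Node(
        weight_quantizer=SeqContainer(LeafQuantizer("int4"), LeafQuantizer("fp8")),
        input_quantizer=LeafQuantizer("fp8"),
    )
)
replace_with_single(model, indx=1)
```

After the call, the container under `weight_quantizer` has been replaced by its second quantizer, which can then be calibrated on its own.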

reset_amax(): 重置量化器的最大值。

set_from_attribute_config(attributes)

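Set the attributes of the contained quantizers from a list of attribute dictionaries.

Parameters: attributes (List[Dict[str, Any] | QuantizerAttributeConfig] | Dict[str, Any] | QuantizerAttributeConfig) –

static tensor_quantizer_iterator(quantizers): Iterator over the quantizers in the container (yields the object itself if it is a TensorQuantizer).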

class TensorQuantizer

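Bases: Module

Tensor quantizer module.

This module manages quantization and calibration of the input tensor. It can perform fake (simulated) or real quantization for various precisions and formats, such as FP8 per-tensor, INT8 per-channel, and INT4 per-block.

If quantization is enabled, it calls the appropriate quantization function and returns the quantized tensor. For fake quantization, the quantized tensor has the same data type as the input tensor. In calibration mode, the module collects statistics using its calibrator.

The quantization parameters are described in QuantizerAttributeConfig. They can be set at initialization using quant_attribute_cfg, or later by calling set_from_attribute_config().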

Parameters:

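quant_attribute_cfg – An instance of QuantizerAttributeConfig or None. If None, default values are used.
if_quant – A boolean. If True, quantization is enabled in the forward path.
if_clip – A boolean. If True, clipping (with _learn_amax) is enabled in the forward path.
if_calib – A boolean. If True, calibration is enabled in the forward path.
amax – None or an array-like object, such as a list, tuple, numpy array, or scalar, that can be used to construct the amax tensor.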
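
How the if_quant and if_calib flags might gate the forward path can be shown with a small pure-Python sketch. `FlagQuantizerSketch` is a hypothetical class, the symmetric INT8 rounding is an assumed scheme, and clipping with a learned amax is left out; the real module operates on tensors and supports many more formats.

```python
class FlagQuantizerSketch:
    """Hypothetical stand-in showing flag-controlled behavior on plain lists."""

    def __init__(self, num_bits=8, if_quant=True, if_calib=False, amax=None):
        self.num_bits = num_bits
        self.if_quant = if_quant
        self.if_calib = if_calib
        self.amax = amax
        self.calib_max = 0.0  # running statistic gathered while calibrating

    def forward(self, inputs):
        if self.if_calib:
            # Calibration mode: observe the data and track the largest magnitude.
            self.calib_max = max(self.calib_max, max(abs(v) for v in inputs))
        if not self.if_quant:
            return list(inputs)  # quantization off: pass values through
        # Fake quantization: round to the integer grid, then return floats again.
        maxbound = (1 << (self.num_bits - 1)) - 1
        scale = self.amax / maxbound
        return [max(-maxbound, min(maxbound, round(v / scale))) * scale for v in inputs]


# Calibrate first, then quantize with the collected amax.
quantizer = FlagQuantizerSketch(if_quant=False, if_calib=True)
quantizer.forward([0.5, -2.0, 1.0])
quantizer.amax = quantizer.calib_max
quantizer.if_calib, quantizer.if_quant = False, True
outputs = quantizer.forward([0.5, -2.0, 3.0])
```

The outputs stay floats, mirroring the statement that fake quantization keeps the input data type; values beyond amax saturate at the range boundary.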

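__init__(quant_attribute_cfg=None, if_quant=True, if_clip=False, if_calib=False, amax=None): Initialize the quantizer and set up the required variables.

property amax: Return the amax used for quantization.

property axis: Return the axis used for quantization.

property block_sizes: Return the block sizes used for quantization.

clean_up_after_set_from_modelopt_state(prefix=''): Clean up the temporary variables created during set_from_modelopt_state.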

dequantize(qtensor)

Dequantize a real-quantized tensor to the specified data type.

Parameters: qtensor (BaseQuantizedTensor) –

disable()

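Bypass the module.

If the module is disabled, calibration, clipping, and quantization are not performed.

disable_calib(): Disable calibration.

disable_clip(): Disable the clip stage.

disable_quant(): Disable quantization.

enable(): Enable the module.

enable_calib(): Enable calibration.

enable_clip(): Enable the clip stage.

enable_quant(): Enable quantization.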

export_amax()

Export a correctly formatted and shaped amax.

Return type: Tensor | None

extra_repr(): Set the extra information about this module.

property fake_quant: Return True if fake quantization is used.

forward(inputs)

Apply the tensor_quant function to the inputs.

Parameters: inputs – A tensor of type float32, float16, or bfloat16.
Returns: A tensor of type output_dtype.
Return type: outputs

get_modelopt_state(properties_only=False)

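Get the meta state to be saved in the checkpoint.

If properties_only is True, only the quantizer properties, such as num_bits and axis, are included. To restore the quantizer fully, use properties_only=False.

Parameters: properties_only (bool) –
Return type: Dict[str, Any]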

init_learn_amax(): Initialize the learned amax from the fixed amax.

property is_enabled: Return True if the module is not disabled.

property is_mx_format: Check whether the format is an MX format.

load_calib_amax(*args, **kwargs)

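Load amax from the calibrator.

Updates the amax buffer with the value computed by the calibrator, creating the buffer if necessary. *args and **kwargs are passed directly to compute_amax, except for "strict" in kwargs. See compute_amax for more details.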
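
The argument forwarding described above can be sketched as follows. `MaxCalibratorSketch`, `CalibratedQuantizerSketch`, and the `margin` argument are hypothetical; the sketch also assumes that "strict" decides whether a missing calibration result raises an error, which the text here does not spell out.

```python
class MaxCalibratorSketch:
    """Hypothetical calibrator that tracks the largest absolute value seen."""

    def __init__(self):
        self._max = None

    def collect(self, values):
        batch_max = max(abs(v) for v in values)
        self._max = batch_max if self._max is None else max(self._max, batch_max)

    def compute_amax(self, margin=1.0):
        # `margin` is a made-up argument used only to show the pass-through.
        return None if self._max is None else self._max * margin


class CalibratedQuantizerSketch:
    """Hypothetical quantizer that loads amax from its calibrator."""

    def __init__(self, calibrator):
        self.calibrator = calibrator
        self.amax = None  # created on the first successful load

    def load_calib_amax(self, *args, **kwargs):
        strict = kwargs.pop("strict", True)  # consumed here, never forwarded
        calib_amax = self.calibrator.compute_amax(*args, **kwargs)
        if calib_amax is None:
            if strict:  # assumed meaning of "strict"
                raise RuntimeError("calibrator has not seen any data")
            return
        self.amax = calib_amax


calibrator = MaxCalibratorSketch()
calibrator.collect([0.5, -2.0, 1.0])
quantizer = CalibratedQuantizerSketch(calibrator)
quantizer.load_calib_amax(margin=1.5, strict=False)

# With no collected data, non-strict loading leaves amax untouched.
empty = CalibratedQuantizerSketch(MaxCalibratorSketch())
empty.load_calib_amax(strict=False)
```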

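property maxbound: Return the maximum bound of the quantization range.

property narrow_range: Return True if a symmetric integer range is used for signed quantization.

property num_bits: Return the num_bits used for quantization.

property pre_quant_scale: Return the pre_quant_scale used for smoothquant.

reset_amax(): Reset amax to None.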

set_from_attribute_config(attribute_cfg)

Set the quantizer attributes from attribute_cfg.

The attributes are defined in QuantizerAttributeConfig.

Parameters: attribute_cfg (QuantizerAttributeConfig | Dict) –

set_from_modelopt_state(modelopt_state, prefix=''): Set the meta state from the checkpoint.

property step_size: Return the step size for integer quantization.
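
For integer formats, the relationship between num_bits, unsigned, narrow_range, maxbound, and step_size can be sketched with the conventional symmetric-quantization formulas. These free functions are hypothetical helpers written under that assumption; the library's exact definitions may differ.

```python
def maxbound(num_bits: int, unsigned: bool = False) -> int:
    """Largest integer level: 2^(bits-1) - 1 if signed, 2^bits - 1 if unsigned."""
    return (1 << (num_bits - 1 + int(unsigned))) - 1


def integer_range(num_bits: int, unsigned: bool = False, narrow_range: bool = True):
    """Return (lowest, highest) integer level of the quantization grid."""
    hi = maxbound(num_bits, unsigned)
    if unsigned:
        return (0, hi)
    # A narrow range is symmetric around zero; otherwise one extra negative level.
    lo = -hi if narrow_range else -hi - 1
    return (lo, hi)


def step_size(amax: float, num_bits: int, unsigned: bool = False) -> float:
    """Float distance between two adjacent integer levels."""
    return amax / maxbound(num_bits, unsigned)


int8_bound = maxbound(8)                              # 127
uint8_bound = maxbound(8, unsigned=True)              # 255
int8_symmetric = integer_range(8)                     # (-127, 127)
int8_full = integer_range(8, narrow_range=False)      # (-128, 127)
int8_step = step_size(1.27, 8)                        # about 0.01
```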

sync_amax_across_distributed_group(parallel_group)

Synchronize amax across all ranks in the given group.

Parameters: parallel_group (DistributedProcessGroup) –
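
One plausible way to synchronize amax is a MAX all-reduce, so that every rank ends up with the largest value observed anywhere in the group. The text above does not state which reduction is used, so the sketch below is an assumption; it simulates the ranks with plain lists, and `all_reduce_max` is a hypothetical helper rather than a distributed API call.

```python
def all_reduce_max(per_rank_amax):
    """Simulate a MAX all-reduce over ranks.

    per_rank_amax holds one list of amax values per rank. Every rank
    receives the element-wise maximum across all ranks.
    """
    reduced = [max(column) for column in zip(*per_rank_amax)]
    return [list(reduced) for _ in per_rank_amax]


# Three ranks, each with a per-channel amax for two channels.
before = [[0.5, 2.0], [1.5, 1.0], [0.7, 0.9]]
after = all_reduce_max(before)
```

Taking the maximum keeps the quantization range wide enough for the data seen on every rank, at the cost of a coarser step size on ranks that observed smaller values.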

property trt_high_precision_dtype: Return True if FP16 AMAX is used when exporting the model.

property unsigned: Return True if unsigned quantization is used.