Postprocess

Utilities for loading and processing model_config.

Functions

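`check_weight_shape_valid`	Checks whether the weight shapes are valid for the inference tensor parallelism (TP).
`pad_embedding_lm_head`	Pads lm_head and embedding to multiples of 64 for AWQ quantization.
`postprocess_model_config`	Postprocesses the model configs from the training tensor parallelism to the target inference tensor parallelism.
`postprocess_tensors`	Ensures that all tensors in the model_config are on CPU, contiguous, and own their memory.
`update_lm_head_quantization`	Updates the lm_head quantization config for TRT-LLM export.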

check_weight_shape_valid(config, inference_tensor_parallel=1, training_tensor_parallel=1)

Checks whether the weight shapes are valid for the inference tensor parallelism (TP).

This function is recursive.
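
As an illustration of what a recursive shape check can look like, here is a minimal sketch. It is not the library's implementation: the helper name `shapes_divisible`, the nested dict/list layout, the `"shape"` key, and the rule that the last dimension must be divisible by the TP size are all assumptions made for this example.

```python
def shapes_divisible(config, tensor_parallel):
    """Recursively check every "shape" entry in a nested dict/list config.

    Hypothetical rule for this sketch: the last dimension of each shape
    must be divisible by tensor_parallel.
    """
    if isinstance(config, dict):
        shape = config.get("shape")
        if isinstance(shape, tuple) and shape and shape[-1] % tensor_parallel != 0:
            return False
        # Recurse into every nested value (sub-dicts, lists of layers, ...).
        return all(shapes_divisible(v, tensor_parallel) for v in config.values())
    if isinstance(config, (list, tuple)):
        return all(shapes_divisible(v, tensor_parallel) for v in config)
    # Leaves (ints, strings, None, ...) carry no shape to validate.
    return True


# Hypothetical nested config with two weight shapes.
example_config = {
    "layers": [
        {"mlp": {"shape": (1024, 4096)}},
        {"attn": {"shape": (1024, 3072)}},
    ]
}
ok_for_tp4 = shapes_divisible(example_config, 4)
ok_for_tp5 = shapes_divisible(example_config, 5)
```

The recursion mirrors the nesting of the config, so a single failing shape anywhere in the tree makes the whole check fail.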

pad_embedding_lm_head(model_config, padding_factor=64)

Pads lm_head and embedding to multiples of 64 for AWQ quantization.

Parameters:

model_config (ModelConfig) –
padding_factor (int) –
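
The padding arithmetic can be sketched in plain Python. This is only an illustration: `padded_size` and `pad_rows` are hypothetical helpers, and padding the vocabulary (row) dimension with zeros is an assumption, not something the description above confirms.

```python
def padded_size(size, padding_factor=64):
    """Round size up to the nearest multiple of padding_factor."""
    return -(-size // padding_factor) * padding_factor


def pad_rows(weight, padding_factor=64):
    """Append zero rows so the row count is a multiple of padding_factor.

    weight is a list of rows (a stand-in for a 2-D tensor); the row
    dimension is assumed to be the vocabulary dimension.
    """
    hidden = len(weight[0]) if weight else 0
    extra = padded_size(len(weight), padding_factor) - len(weight)
    return weight + [[0.0] * hidden for _ in range(extra)]


# A vocabulary of 32001 rows would grow to 32064 rows with the default factor.
target_rows = padded_size(32001)
small = pad_rows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], padding_factor=4)
```

Sizes that are already a multiple of the padding factor are left unchanged.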

postprocess_model_config(model_config, inference_tensor_parallel=1, inference_pipeline_parallel=1, training_pipeline_parallel=1, workspace_path=None)

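Postprocesses the model configs from the training tensor parallelism to the target inference tensor parallelism.

If training_pipeline_parallel > 1, the model configs across pipeline-parallel (PP) ranks are merged into one.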

Returns:

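The processed model configs, as a list.

For the merging case: The merged rank returns the merged model_config as a single-item list. The other ranks return an empty list, because they are no longer exported.
For the split case: The list of split model configs is returned.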

Parameters:

inference_tensor_parallel (int) –
inference_pipeline_parallel (int) –
training_pipeline_parallel (int) –
workspace_path (Path | str | None) –

Return type:

list[ModelConfig]
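
Because the result is always a list, a caller can handle the merged, empty, and split cases with one loop. The sketch below uses plain strings as stand-ins for ModelConfig objects; `export_configs` and `export_one` are hypothetical names for this example and are not part of the module.

```python
def export_configs(processed_configs, export_one):
    """Export each returned config; an empty list means nothing to export."""
    exported = []
    for index, config in enumerate(processed_configs):
        exported.append(export_one(index, config))
    return exported


def describe(index, config):
    # Stand-in for a real export step: just label the config.
    return f"rank{index}:{config}"


merged_rank_result = export_configs(["merged_config"], describe)  # merging case, merged rank
other_rank_result = export_configs([], describe)                  # merging case, other ranks
split_result = export_configs(["cfg_a", "cfg_b"], describe)       # split case
```

Treating the empty list as "nothing to export" avoids a special branch for ranks whose configs were merged away.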

postprocess_tensors(model_config, force_cpu=True, force_contiguous=True, force_non_view=True)

Ensures that all tensors in the model_config are on CPU, contiguous, and own their memory.

Parameters:

model_config (ModelConfig) –
force_cpu (bool) –
force_contiguous (bool) –
force_non_view (bool) –

update_lm_head_quantization(config, lm_head, inference_tensor_parallel=1)

Updates the lm_head quantization config for TRT-LLM export.

Parameters:

config (ModelConfig) –
lm_head (QuantLinear) –
inference_tensor_parallel (int) –