TorchAO

TorchAO is an architecture optimization library for PyTorch. It provides high-performance dtypes, optimization techniques, and kernels for inference and training, and composes with native PyTorch features such as torch.compile and FSDP. Some benchmark numbers can be found here.

We recommend installing the latest torchao nightly build:

# Install the latest TorchAO nightly build
# Choose the CUDA version that matches your system (cu126, cu128, etc.)
pip install \
    --pre "torchao>=0.10.0" \
    --index-url https://download.pytorch.org/whl/nightly/cu126
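After installing, you can check that the install works with torchao's native quantize_ API, which the intro above notes composes with torch.compile. The snippet below is a minimal sketch using a toy two-layer model (the layer sizes and compile mode are illustrative, not from the original doc), and it assumes that in this torchao version quantize_ accepts a config object such as Int8WeightOnlyConfig:

import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

# Toy model purely for illustration
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
).cuda().half()

# Replace the Linear weights with int8 weight-only quantized tensors in place
quantize_(model, Int8WeightOnlyConfig())

# The quantized model still composes with torch.compile
model = torch.compile(model)

x = torch.randn(8, 1024, dtype=torch.float16, device="cuda")
with torch.no_grad():
    out = model(x)
print(out.shape)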

Quantizing Hugging Face models

You can use torchao to quantize your own Hugging Face models, such as transformers and diffusers models, and push the quantized checkpoint to the Hugging Face Hub with code like the following:

import torch
from transformers import TorchAoConfig, AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import Int8WeightOnlyConfig

model_name = "meta-llama/Meta-Llama-3-8B"

# Quantize the weights to int8 (weight-only) while loading the model
quantization_config = TorchAoConfig(Int8WeightOnlyConfig())
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Sanity-check the quantized model with a short generation
input_text = "What are we having for dinner?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
output = quantized_model.generate(**input_ids, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Push the tokenizer and the quantized checkpoint to the Hub
hub_repo = "YOUR_HUB_REPO_ID"  # replace with your Hub repo id
tokenizer.push_to_hub(hub_repo)
quantized_model.push_to_hub(hub_repo, safe_serialization=False)
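Once pushed, the checkpoint can be loaded back for inference like any other transformers model. The snippet below is a minimal sketch, assuming the repo id is the placeholder you pushed to above and that the saved checkpoint keeps its TorchAoConfig so transformers restores the quantized weights automatically:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

hub_repo = "YOUR_HUB_REPO_ID"  # the repo you pushed to above
model = AutoModelForCausalLM.from_pretrained(
    hub_repo, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(hub_repo)

inputs = tokenizer("What are we having for dinner?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))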

Alternatively, you can use the TorchAO Quantization Space to quantize a model through a simple user interface.