Installation

Linux

CUDA Installation

CUDA is a parallel computing platform and programming model created by NVIDIA, which allows developers to use NVIDIA's GPUs for high-performance parallel computing.

First, check whether your GPU supports CUDA at https://developer.nvidia.com/cuda-gpus
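If the NVIDIA driver is already installed, you can also query the GPU model directly from the terminal (a quick check; nvidia-smi ships with the driver):

```shell
# Print the GPU model if the NVIDIA driver is installed; otherwise explain what to do
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi
```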

  1. Ensure that your Linux distribution supports CUDA. Run uname -m && cat /etc/*release in the terminal; you should see output similar to:

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
  2. Check whether gcc is installed. Run gcc --version in the terminal; you should see output similar to:

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  3. Download the required CUDA toolkit from the CUDA Toolkit Archive; version 12.2 is recommended: https://developer.nvidia.com/cuda-toolkit-archive Note that you need to select the options matching the output above (architecture, distribution, version).


If you have installed CUDA before (for example, version 12.1), first uninstall it with sudo /usr/local/cuda-12.1/bin/cuda-uninstaller. If this command cannot run, you can instead remove it directly:

sudo rm -r /usr/local/cuda-12.1/
sudo apt clean && sudo apt autoclean

After the uninstallation is completed, run the following command and continue the installation according to the prompts:

wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
sudo sh cuda_12.2.0_535.54.03_linux.run

Note: Unless you have confirmed that the driver bundled with the CUDA installer is compatible with your GPU, it is recommended to deselect the Driver component during installation.
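For a non-interactive install that skips the bundled driver entirely, the runfile also accepts command-line flags (a sketch; the full flag list is printed by sh cuda_<version>_linux.run --help). The guard makes this a no-op if the runfile has not been downloaded yet:

```shell
# Install only the CUDA toolkit, skipping the bundled driver (non-interactive);
# guarded so it does nothing if the runfile is not in the current directory
if [ -f cuda_12.2.0_535.54.03_linux.run ]; then
  sudo sh cuda_12.2.0_535.54.03_linux.run --silent --toolkit
fi
```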


After completion, enter nvcc -V to check whether the corresponding version number appears. If it does, the installation is complete.
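If nvcc is not found after installation, the toolkit directories usually still need to be added to your environment (a sketch, assuming the default install prefix /usr/local/cuda-12.2):

```shell
# Add the CUDA toolkit to PATH and the library search path (default prefix assumed)
echo 'export PATH=/usr/local/cuda-12.2/bin:$PATH' >> "$HOME/.bashrc"
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' >> "$HOME/.bashrc"
. "$HOME/.bashrc"
# then re-run: nvcc -V
```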


Windows

CUDA Installation

  1. Open Settings, find Windows specifications under About, and ensure that the system version is in the following list:

Supported versions:

Microsoft Windows 11 21H2
Microsoft Windows 11 22H2-SV2
Microsoft Windows 11 23H2
Microsoft Windows 10 21H2
Microsoft Windows 10 22H2
Microsoft Windows Server 2022

  2. Select the corresponding version on the download page, then download and install it according to the prompts.

  3. Open the command prompt (cmd) and enter nvcc -V. If the CUDA version information is printed, the installation was successful.


Otherwise, check the system environment variables to ensure that the CUDA paths have been added correctly.


LLaMA-Factory Installation

Before installing LLaMA-Factory, make sure you have installed the dependencies described above (such as CUDA).

Run the following commands to install LLaMA-Factory and its dependencies:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

If you run into a dependency conflict, try pip install --no-deps -e . instead.

LLaMA-Factory Verification

After the installation is completed, you can quickly verify whether it succeeded by running llamafactory-cli version.

If the LLaMA-Factory version information is printed, the installation was successful.

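Beyond the CLI version check, you can also confirm that the installed PyTorch build can actually see your GPU (a quick sanity check; prints False on CPU-only machines):

```shell
# Check whether PyTorch was installed with working CUDA support
python -c "import torch; print(torch.cuda.is_available())" 2>/dev/null \
  || echo "torch not importable - rerun the installation step above"
```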

LLaMA-Factory Advanced Options

Windows

QLoRA

If you want to enable Quantized LoRA (QLoRA) on Windows, please select the appropriate bitsandbytes release version according to your CUDA version.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl

FlashAttention-2

If you want to enable FlashAttention-2 on the Windows platform, please select the appropriate flash-attention release version according to your CUDA version.

Extra Dependencies

If you have additional requirements, install the corresponding optional dependencies:

torch: The open-source deep learning framework PyTorch, widely used in machine learning and artificial intelligence research.

torch-npu: The Ascend device compatibility package for PyTorch.

metrics: Used to evaluate and monitor the performance of machine learning models.

deepspeed: Provides the Zero Redundancy Optimizer (ZeRO) required for distributed training.

bitsandbytes: Used for quantization of large language models.

hqq: Used for quantization of large language models.

eetq: Used for quantization of large language models.

gptq: Used to load GPTQ-quantized models.

awq: Used to load AWQ-quantized models.

aqlm: Used to load AQLM-quantized models.

vllm: Provides high-speed concurrent model inference services.

galore: Provides an efficient full-parameter fine-tuning algorithm.

badam: Provides an efficient full-parameter fine-tuning algorithm.

qwen: Provides the packages required to load Qwen v1 models.

modelscope: The ModelScope community hub, which provides a way to download pre-trained models and datasets.

swanlab: The open-source training tracking tool SwanLab, used to record and visualize the training process.

dev: Used for the development and maintenance of LLaMA-Factory.
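The extras above are selected by listing them inside the brackets of the editable install. A sketch combining several of them (the exact set depends on your needs; run it inside the LLaMA-Factory checkout):

```shell
# Combine optional extras in one editable install; guarded so it is a no-op
# outside the LLaMA-Factory source directory
if [ -f pyproject.toml ] || [ -f setup.py ]; then
  pip install -e ".[torch,metrics,deepspeed,bitsandbytes]"
else
  echo "run this inside the LLaMA-Factory source directory"
fi
```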