BGE Reranker#

Like the embedding models, BGE provides a family of rerankers in different sizes and with different functionalities. This tutorial walks through the BGE reranker series.

0. Installation#

Install the dependencies in your environment.

%pip install -U FlagEmbedding

1. bge-reranker#

The first generation of BGE rerankers contains two models:

| Model | Language | Parameters | Description | Base Model |
|:------|:---------|:-----------|:------------|:-----------|
| BAAI/bge-reranker-base | Chinese and English | 278M | A cross-encoder model; higher accuracy but lower efficiency | XLM-RoBERTa-Base |
| BAAI/bge-reranker-large | Chinese and English | 560M | A cross-encoder model; higher accuracy but lower efficiency | XLM-RoBERTa-Large |

from FlagEmbedding import FlagReranker

model = FlagReranker(
    'BAAI/bge-reranker-large',
    use_fp16=True,
    devices=["cuda:0"],   # if you don't have GPUs, you can use "cpu"
)

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The population of China is over 1.4 billion people."],
    ["What is the population of China?", "Paris is the capital of France."],
    ["What is the population of China?", "The population of China is over 1.4 billion people."]
]

scores = model.compute_score(pairs)
scores
[7.984375, -6.84375, -7.15234375, 5.44921875]
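
To see how these raw scores are typically used, here is a minimal sketch that reranks one query's candidate passages (reusing the `model` object from the cell above; the query and candidates are illustrative):

# Pair the query with each candidate and sort candidates by cross-encoder score
query = "What is the capital of France?"
candidates = [
    "Paris is the capital of France.",
    "The population of China is over 1.4 billion people.",
]

scores = model.compute_score([[query, c] for c in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for passage, score in reranked:
    print(f"{score:.4f}  {passage}")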

2. bge-reranker v2#

| Model | Language | Parameters | Description | Base Model |
|:------|:---------|:-----------|:------------|:-----------|
| BAAI/bge-reranker-v2-m3 | Multilingual | 568M | A lightweight cross-encoder model with strong multilingual capability; easy to deploy, with fast inference | XLM-RoBERTa-Large |
| BAAI/bge-reranker-v2-gemma | Multilingual | 2.51B | A cross-encoder model suitable for multilingual contexts; performs well in both English proficiency and multilingual capability | Gemma-2B |
| BAAI/bge-reranker-v2-minicpm-layerwise | Multilingual | 2.72B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting output layers to accelerate inference | MiniCPM |
| BAAI/bge-reranker-v2.5-gemma2-lightweight | Multilingual | 9.24B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting output layers, compression ratio, and compression layers to accelerate inference | Gemma2-9B |

bge-reranker-v2-m3#

bge-reranker-v2-m3 is trained on top of bge-m3, introducing strong multilingual capability while keeping the model lightweight.

from FlagEmbedding import FlagReranker

# Setting use_fp16 to True speeds up computation with a slight performance degradation (if using gpu)
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
# or set "normalize=True" to apply a sigmoid function to the score for 0-1 range
score = reranker.compute_score(['query', 'passage'], normalize=True)

print(score)
[0.003483424193080668]
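
As a quick sanity check, the normalized score should be the sigmoid of the raw score. A minimal sketch (reusing the `reranker` object from above; the two values should agree up to floating-point noise, since the model runs in fp16):

import math

raw = reranker.compute_score(['query', 'passage'])[0]
normalized = reranker.compute_score(['query', 'passage'], normalize=True)[0]

# normalize=True maps the raw logit into (0, 1) with a sigmoid
print(normalized, 1 / (1 + math.exp(-raw)))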

bge-reranker-v2-gemma#

bge-reranker-v2-gemma is trained on top of gemma-2b. It excels in both English proficiency and multilingual capability.

from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
print(score)
[1.974609375]

bge-reranker-v2-minicpm-layerwise#

bge-reranker-v2-minicpm-layerwise is trained on top of minicpm-2b-dpo-bf16. It is suitable for multilingual contexts and performs well in both English and Chinese.

Its other special functionality is the layerwise design, which lets users freely select the output layer used for scoring, accelerating inference.

from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)
[-7.06640625]
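
To illustrate the speed/quality trade-off of the layerwise design, a sketch that scores the same pair at two different cutoff layers (the layer values here are illustrative; lower layers should be faster but may score less accurately):

import time

for layer in (20, 28):
    start = time.perf_counter()
    s = reranker.compute_score(['query', 'passage'], cutoff_layers=[layer])
    print(f"cutoff layer {layer}: score={s}, time={time.perf_counter() - start:.3f}s")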

bge-reranker-v2.5-gemma2-lightweight#

bge-reranker-v2.5-gemma2-lightweight is trained on top of gemma2-9b. It is also suitable for multilingual contexts.

Besides the layerwise reduction functionality, bge-reranker-v2.5-gemma2-lightweight also integrates token compression, which further saves resources while maintaining outstanding performance.

from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(score)
[14.734375]
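
As a further sketch, you could vary which layers are compressed to trade resources against score fidelity. This assumes, based on the call above, that compress_layers accepts an arbitrary subset of layers; the single-layer value is illustrative:

# compress_ratio=2 is taken from the call above; compressing at fewer layers
# (assumed valid) keeps more tokens and may preserve score quality better
for layers in ([24], [24, 40]):
    s = reranker.compute_score(['query', 'passage'], cutoff_layers=[28],
                               compress_ratio=2, compress_layers=layers)
    print(f"compress_layers={layers}: {s}")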

Comparison#

The BGE reranker series offers a wide range of choices for different functional needs. Select the model that fits your scenario and resource budget:

  • For multilingual scenarios, use BAAI/bge-reranker-v2-m3, BAAI/bge-reranker-v2-gemma, or BAAI/bge-reranker-v2.5-gemma2-lightweight.

  • For Chinese or English, use BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-v2-minicpm-layerwise.

  • For efficiency, use BAAI/bge-reranker-v2-m3 or the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.

  • For saving resources and extreme efficiency, use BAAI/bge-reranker-base or BAAI/bge-reranker-large.

  • For better performance, use BAAI/bge-reranker-v2-minicpm-layerwise or BAAI/bge-reranker-v2-gemma.

Always test on your real use case and choose the model with the best balance of speed and quality!
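
For example, a minimal sketch of how you might compare two candidates from the tables above on your own data (the pair below is illustrative, and latency depends on your hardware):

import time
from FlagEmbedding import FlagReranker

pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]]

for name in ("BAAI/bge-reranker-base", "BAAI/bge-reranker-v2-m3"):
    # Load each candidate model and time a single scoring call
    reranker = FlagReranker(name, use_fp16=True, devices=["cuda:0"])
    start = time.perf_counter()
    scores = reranker.compute_score(pairs, normalize=True)
    print(f"{name}: scores={scores}, latency={time.perf_counter() - start:.3f}s")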