BGE Reranker#

Like the embedding models, BGE provides a family of rerankers in different sizes and with different functionalities. This tutorial walks through the BGE reranker series.

0. Installation#

Install the dependencies in your environment.

%pip install -U FlagEmbedding

1. bge-reranker#

The first generation of BGE rerankers contains two models:

| Model | Language | Parameters | Description | Base Model |
|:------|:---------|:-----------|:------------|:-----------|
| BAAI/bge-reranker-base | Chinese and English | 278M | A cross-encoder model; higher accuracy but lower efficiency | XLM-RoBERTa-Base |
| BAAI/bge-reranker-large | Chinese and English | 560M | A cross-encoder model; higher accuracy but lower efficiency | XLM-RoBERTa-Large |

from FlagEmbedding import FlagReranker

model = FlagReranker(
    'BAAI/bge-reranker-large',
    use_fp16=True,
    devices=["cuda:0"],   # if you don't have GPUs, you can use "cpu"
)

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The population of China is over 1.4 billion people."],
    ["What is the population of China?", "Paris is the capital of France."],
    ["What is the population of China?", "The population of China is over 1.4 billion people."]
]

scores = model.compute_score(pairs)
scores
[7.984375, -6.84375, -7.15234375, 5.44921875]
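
To see how these raw scores are typically used, here is a minimal sketch that reranks one query's candidate passages (reusing the `model` object from the cell above; the query and candidates are illustrative):

# Pair the query with each candidate and sort candidates by cross-encoder score
query = "What is the capital of France?"
candidates = [
    "Paris is the capital of France.",
    "The population of China is over 1.4 billion people.",
]

scores = model.compute_score([[query, c] for c in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for passage, score in reranked:
    print(f"{score:.4f}  {passage}")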

2. bge-reranker v2#

| Model | Language | Parameters | Description | Base Model |
|:------|:---------|:-----------|:------------|:-----------|
| BAAI/bge-reranker-v2-m3 | Multilingual | 568M | A lightweight cross-encoder model with strong multilingual capability; easy to deploy, with fast inference | XLM-RoBERTa-Large |
| BAAI/bge-reranker-v2-gemma | Multilingual | 2.51B | A cross-encoder model suitable for multilingual contexts; performs well in both English proficiency and multilingual capability | Gemma-2B |
| BAAI/bge-reranker-v2-minicpm-layerwise | Multilingual | 2.72B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting output layers to accelerate inference | MiniCPM |
| BAAI/bge-reranker-v2.5-gemma2-lightweight | Multilingual | 9.24B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting output layers, compression ratio, and compression layers to accelerate inference | Gemma2-9B |

bge-reranker-v2-m3#

bge-reranker-v2-m3 is trained on top of bge-m3, introducing strong multilingual capability while keeping the model lightweight.

from FlagEmbedding import FlagReranker

# Setting use_fp16 to True speeds up computation with a slight performance degradation (if using gpu)
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
# or set "normalize=True" to apply a sigmoid function to the score for 0-1 range
score = reranker.compute_score(['query', 'passage'], normalize=True)

print(score)
[0.003483424193080668]
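
As a quick sanity check, the normalized score should be the sigmoid of the raw score. A minimal sketch (reusing the `reranker` object from above; the two values should agree up to floating-point noise, since the model runs in fp16):

import math

raw = reranker.compute_score(['query', 'passage'])[0]
normalized = reranker.compute_score(['query', 'passage'], normalize=True)[0]

# normalize=True maps the raw logit into (0, 1) with a sigmoid
print(normalized, 1 / (1 + math.exp(-raw)))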

bge-reranker-v2-gemma#

bge-reranker-v2-gemma is trained on top of gemma-2b. It excels in both English proficiency and multilingual capability.

from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
print(score)
[1.974609375]

bge-reranker-v2-minicpm-layerwise#

bge-reranker-v2-minicpm-layerwise is trained on top of minicpm-2b-dpo-bf16. It is suitable for multilingual contexts and performs well in both English and Chinese.

Its other special functionality is the layerwise design, which lets users freely select the output layer used for scoring, accelerating inference.

from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)
[-7.06640625]
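
To illustrate the speed/quality trade-off of the layerwise design, a sketch that scores the same pair at two different cutoff layers (the layer values here are illustrative; lower layers should be faster but may score less accurately):

import time

for layer in (20, 28):
    start = time.perf_counter()
    s = reranker.compute_score(['query', 'passage'], cutoff_layers=[layer])
    print(f"cutoff layer {layer}: score={s}, time={time.perf_counter() - start:.3f}s")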

bge-reranker-v2.5-gemma2-lightweight#

bge-reranker-v2.5-gemma2-lightweight is trained on top of gemma2-9b. It is also suitable for multilingual contexts.

Besides the layerwise reduction functionality, bge-reranker-v2.5-gemma2-lightweight also integrates token compression, which further saves resources while maintaining outstanding performance.

from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(score)
[14.734375]
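
As a further sketch, you could vary which layers are compressed to trade resources against score fidelity. This assumes, based on the call above, that compress_layers accepts an arbitrary subset of layers; the single-layer value is illustrative:

# compress_ratio=2 is taken from the call above; compressing at fewer layers
# (assumed valid) keeps more tokens and may preserve score quality better
for layers in ([24], [24, 40]):
    s = reranker.compute_score(['query', 'passage'], cutoff_layers=[28],
                               compress_ratio=2, compress_layers=layers)
    print(f"compress_layers={layers}: {s}")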

Comparison#

The BGE reranker series offers a wide range of choices for different functional needs. Select the model that fits your scenario and resource budget:

  • For multilingual scenarios, use BAAI/bge-reranker-v2-m3, BAAI/bge-reranker-v2-gemma, or BAAI/bge-reranker-v2.5-gemma2-lightweight.

  • For Chinese or English, use BAAI/bge-reranker-v2-m3 or BAAI/bge-reranker-v2-minicpm-layerwise.

  • For efficiency, use BAAI/bge-reranker-v2-m3 or the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.

  • For saving resources and extreme efficiency, use BAAI/bge-reranker-base or BAAI/bge-reranker-large.

  • For better performance, use BAAI/bge-reranker-v2-minicpm-layerwise or BAAI/bge-reranker-v2-gemma.

Always test on your real use case and choose the model with the best balance of speed and quality!
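
For example, a minimal sketch of how you might compare two candidates from the tables above on your own data (the pair below is illustrative, and latency depends on your hardware):

import time
from FlagEmbedding import FlagReranker

pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]]

for name in ("BAAI/bge-reranker-base", "BAAI/bge-reranker-v2-m3"):
    # Load each candidate model and time a single scoring call
    reranker = FlagReranker(name, use_fp16=True, devices=["cuda:0"])
    start = time.perf_counter()
    scores = reranker.compute_score(pairs, normalize=True)
    print(f"{name}: scores={scores}, latency={time.perf_counter() - start:.3f}s")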