BGE Reranker#
Similar to the embedding models, BGE offers a family of rerankers at different scales and with different capabilities. This tutorial walks through the BGE reranker series.
0. Installation#
Install the dependencies in your environment.
%pip install -U FlagEmbedding
1. bge-reranker#
The first generation of BGE rerankers contains two models:
| Model | Language | Parameters | Description | Base Model |
|---|---|---|---|---|
| BAAI/bge-reranker-base | Chinese and English | 278M | A cross-encoder model with higher accuracy but lower efficiency | XLM-RoBERTa-Base |
| BAAI/bge-reranker-large | Chinese and English | 560M | A cross-encoder model with higher accuracy but lower efficiency | XLM-RoBERTa-Large |
from FlagEmbedding import FlagReranker
model = FlagReranker(
'BAAI/bge-reranker-large',
use_fp16=True,
devices=["cuda:0"], # if you don't have GPUs, you can use "cpu"
)
pairs = [
["What is the capital of France?", "Paris is the capital of France."],
["What is the capital of France?", "The population of China is over 1.4 billion people."],
["What is the population of China?", "Paris is the capital of France."],
["What is the population of China?", "The population of China is over 1.4 billion people."]
]
scores = model.compute_score(pairs)
scores
You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[7.984375, -6.84375, -7.15234375, 5.44921875]
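The raw scores above are all you need to rerank candidates: pair each passage with its score and sort descending. A minimal sketch, reusing the scores printed above for the first query (no model call needed here):

```python
# Rank the two candidate passages for "What is the capital of France?"
# using the raw compute_score outputs shown above.
passages = [
    "Paris is the capital of France.",
    "The population of China is over 1.4 billion people.",
]
scores = [7.984375, -6.84375]  # raw logits from the output above

ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:+.3f}  {passage}")
```

As expected, the relevant passage about Paris comes out on top, while the off-topic passage receives a strongly negative score.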
2. bge-reranker v2#
| Model | Language | Parameters | Description | Base Model |
|---|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | Multilingual | 568M | A lightweight cross-encoder model with strong multilingual capability; easy to deploy, with fast inference | XLM-RoBERTa-Large |
| BAAI/bge-reranker-v2-gemma | Multilingual | 2.51B | A cross-encoder model suitable for multilingual contexts; performs well in both English proficiency and multilingual capability | Gemma2-2B |
| BAAI/bge-reranker-v2-minicpm-layerwise | Multilingual | 2.72B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting the output layer to accelerate inference | MiniCPM |
| BAAI/bge-reranker-v2.5-gemma2-lightweight | Multilingual | 9.24B | A cross-encoder model suitable for multilingual contexts; performs well in both English and Chinese, and allows freely selecting the output layer, compression ratio, and compression layers to accelerate inference | Gemma2-9B |
bge-reranker-v2-m3#
bge-reranker-v2-m3 is trained on top of bge-m3, introducing strong multilingual capability while keeping a lightweight model size.
from FlagEmbedding import FlagReranker
# Setting use_fp16 to True speeds up computation with a slight performance degradation (if using gpu)
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', devices=["cuda:0"], use_fp16=True)
score = reranker.compute_score(['query', 'passage'])
# or set "normalize=True" to apply a sigmoid function to the score for 0-1 range
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)
[0.003483424193080668]
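The `normalize=True` option maps the raw logit into the 0-1 range with a sigmoid, which is why the score above is a small positive number rather than a negative logit. A quick sketch of that mapping (the raw score below is a made-up example, not the model's actual logit):

```python
import math

def sigmoid(x: float) -> float:
    # normalize=True applies this mapping to the raw score
    return 1 / (1 + math.exp(-x))

raw = -5.65  # hypothetical raw logit for illustration
print(sigmoid(raw))  # a small probability-like value in (0, 1)
```

Normalized scores are convenient when you want to apply an absolute relevance threshold instead of only comparing passages against each other.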
bge-reranker-v2-gemma#
bge-reranker-v2-gemma is trained on top of gemma-2b. It performs well in both English proficiency and multilingual capability.
from FlagEmbedding import FlagLLMReranker
reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', devices=["cuda:0"], use_fp16=True)
score = reranker.compute_score(['query', 'passage'])
print(score)
[1.974609375]
bge-reranker-v2-minicpm-layerwise#
bge-reranker-v2-minicpm-layerwise is trained on top of minicpm-2b-dpo-bf16. It is suitable for multilingual contexts and performs well in both English and Chinese.
Its other special feature is the layerwise design, which lets users freely select the output layer to accelerate inference.
from FlagEmbedding import LayerWiseFlagLLMReranker
reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', devices=["cuda:0"], use_fp16=True)
# Adjust 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)
[-7.06640625]
bge-reranker-v2.5-gemma2-lightweight#
bge-reranker-v2.5-gemma2-lightweight is trained on top of gemma2-9b. It is also suitable for multilingual contexts.
Besides the layerwise reduction feature, bge-reranker-v2.5-gemma2-lightweight integrates token compression to further save resources while maintaining outstanding performance.
from FlagEmbedding import LightWeightFlagLLMReranker
reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', devices=["cuda:0"], use_fp16=True)
# Adjust 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(score)
[14.734375]
Comparison#
The BGE reranker series offers a wide range of choices for different needs. Select the model that fits your scenario and resources:

- For multilingual scenarios, use BAAI/bge-reranker-v2-m3, BAAI/bge-reranker-v2-gemma, and BAAI/bge-reranker-v2.5-gemma2-lightweight.
- For Chinese or English, use BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-minicpm-layerwise.
- For efficiency, use BAAI/bge-reranker-v2-m3 and the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.
- To save resources and achieve extreme efficiency, use BAAI/bge-reranker-base and BAAI/bge-reranker-large.
- For better performance, use BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma.

Always test on your actual use case and choose the model with the best balance of speed and quality!
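Whichever model you pick, the reranking workflow is the same: score every (query, passage) pair and sort by score. A sketch of a reusable helper, assuming the `compute_score` interface shown in the snippets above; the scorer is injected as a function so the demo below can run with a toy scorer instead of a GPU model:

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[List[List[str]]], List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair and return the top_k passages, best first."""
    pairs = [[query, p] for p in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

# With a real model you would pass reranker.compute_score as score_fn.
# Toy scorer for demonstration: count of shared lowercase words.
def toy_score(pairs):
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]

top = rerank(
    "capital of France",
    ["Paris is the capital of France.", "China has a large population."],
    toy_score,
    top_k=1,
)
print(top)
```

Keeping the scorer as a parameter also makes it easy to swap between the base, layerwise, and lightweight rerankers without touching the ranking logic.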