
Local Embeddings with HuggingFace

LlamaIndex supports HuggingFace embedding models, including Sentence Transformer models such as BGE, Mixedbread, Nomic, Jina, and E5. We can use these models to create embeddings for documents and queries for retrieval.

Furthermore, we provide utilities to create and use ONNX and OpenVINO models, using HuggingFace's Optimum library.

The base HuggingFaceEmbedding class is a generic wrapper around any HuggingFace embedding model. All embedding models on Hugging Face should work. You can refer to the embeddings leaderboard for more recommendations.

This class depends on the sentence-transformers package, which you can install with pip install sentence-transformers.

NOTE: if you were previously using HuggingFaceEmbeddings from LangChain, this should give equivalent results.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-embeddings-huggingface
!pip install llama-index
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# loads https://huggingface.co/BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
384
[-0.003275700844824314, -0.011690810322761536, 0.041559211909770966, -0.03814814239740372, 0.024183044210076332]
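During retrieval, the query embedding is compared against document embeddings like this one, typically by cosine similarity. A minimal sketch of that comparison in plain Python (the short vectors below are made-up toy values standing in for real model output, not actual embeddings):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embed_model.get_text_embedding(...) output
doc_vec = [0.1, 0.3, -0.2]
query_vec = [0.1, 0.25, -0.15]
print(round(cosine_similarity(doc_vec, query_vec), 3))
```

With real 384-dimensional bge-small-en-v1.5 vectors the arithmetic is identical, just over longer lists.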

For comparison, let's benchmark on a classic large document: chapter 3 of the IPCC climate report.

!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
documents = SimpleDirectoryReader(
    input_files=["IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# loads BAAI/bge-small-en-v1.5 with the default torch backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    embed_batch_size=8,
)
test_embeds = embed_model.get_text_embedding("Hello World!")
Settings.embed_model = embed_model
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 428.44it/s]
Generating embeddings: 100%|██████████| 459/459 [00:19<00:00, 23.32it/s]
20.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
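The embed_batch_size=8 setting above controls how many text chunks are sent to the model per forward pass, so the 459 chunks in this run are embedded in batches of 8. A quick sketch of the batching arithmetic (pure Python, independent of LlamaIndex):

```python
import math


def num_batches(n_texts: int, batch_size: int) -> int:
    """Number of forward passes needed to embed n_texts in batches."""
    return math.ceil(n_texts / batch_size)


# 459 document chunks embedded with embed_batch_size=8
print(num_batches(459, 8))  # 58 forward passes
```

Larger batch sizes mean fewer forward passes and usually better throughput, at the cost of more memory per pass.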
# pip install sentence-transformers[onnx]
# loads BAAI/bge-small-en-v1.5 with the onnx backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="onnx",
    # For ONNX, you can specify the execution provider; see
    # https://sbert.net/docs/sentence_transformer/usage/efficiency.html
    model_kwargs={"provider": "CPUExecutionProvider"},
)
test_embeds = embed_model.get_text_embedding("Hello World!")
Settings.embed_model = embed_model
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 421.63it/s]
Generating embeddings: 100%|██████████| 459/459 [00:31<00:00, 14.53it/s]
32.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# pip install sentence-transformers[openvino]
# loads BAAI/bge-small-en-v1.5 with the openvino backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="openvino",  # OpenVINO is very strong on CPUs
    # BAAI/bge-small-en-v1.5 itself doesn't have an OpenVINO model currently,
    # but there's a PR with one that we can load:
    # https://huggingface.co/BAAI/bge-small-en-v1.5/discussions/16
    revision="refs/pr/16",
    # If we're using an optimized/quantized model, we need to specify the
    # file name like this
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)
test_embeds = embed_model.get_text_embedding("Hello World!")
Settings.embed_model = embed_model
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 403.15it/s]
Generating embeddings: 100%|██████████| 459/459 [00:08<00:00, 53.83it/s]
9.03 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
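Putting the three runs side by side: on this machine, torch took 20.2 s, ONNX 32.1 s, and the quantized OpenVINO model 9.03 s for the same 459 chunks. A quick sketch of the speedup arithmetic using the timings above (your numbers will vary with hardware and batch size):

```python
# Wall-clock times from the benchmark runs above (seconds for 459 chunks)
timings = {"torch": 20.2, "onnx": 32.1, "openvino": 9.03}

baseline = timings["torch"]
for backend, seconds in timings.items():
    speedup = baseline / seconds  # relative to the torch backend
    throughput = 459 / seconds  # chunks embedded per second
    print(f"{backend}: {speedup:.2f}x vs torch, {throughput:.1f} chunks/s")
```

On this CPU, the quantized OpenVINO model is roughly 2.2x faster than the torch baseline, while plain ONNX is actually slower; it is worth benchmarking the backends on your own hardware before picking one.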