Local Embeddings with HuggingFace
LlamaIndex supports HuggingFace embedding models, including Sentence Transformer models such as BGE, Mixedbread, Nomic, Jina, and E5. We can use these models to create embeddings for our documents and queries for retrieval.

Furthermore, we provide utilities to create and use ONNX and OpenVINO models using the Optimum library from HuggingFace.
HuggingFaceEmbedding
The base HuggingFaceEmbedding class is a generic wrapper around any HuggingFace embedding model. All embedding models on Hugging Face should work. You can refer to the embedding leaderboard for more recommendations.
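Instruction-tuned retrieval models such as BGE typically expect a task instruction prepended to *queries* but not to the documents being indexed; this is why the wrapper exposes separate `get_query_embedding` and `get_text_embedding` methods (and parameters like `query_instruction`; check the current API reference). A stdlib-only conceptual sketch, not the library internals. The prefix below is the one commonly documented for BGE English models; confirm it against the model card:

```python
# Toy sketch of query-side instruction prefixing (assumption: this is the
# prefix documented for BGE English models; verify on the model card).
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_inputs(query: str, docs: list[str]) -> tuple[str, list[str]]:
    """Prefix the query with the instruction; leave documents untouched."""
    return BGE_QUERY_INSTRUCTION + query, docs


q, d = prepare_inputs("what is a llama?", ["Llamas are camelids."])
print(q)  # "Represent this sentence for searching relevant passages: what is a llama?"
print(d)  # documents are embedded without the instruction
```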
This class depends on the sentence-transformers package, which you can install with pip install sentence-transformers.

NOTE: If you were previously using HuggingFaceEmbeddings from LangChain, this should give equivalent results.

If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
```python
%pip install llama-index-embeddings-huggingface
!pip install llama-index
```

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
```
```python
# loads https://huggingface.co/BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
```

```
384
[-0.003275700844824314, -0.011690810322761536, 0.041559211909770966, -0.03814814239740372, 0.024183044210076332]
```

Let's try this out with a classic, large document to compare against: the IPCC climate report, chapter 3.
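Retrieval then reduces to comparing the query embedding against each document embedding, most commonly with cosine similarity. A minimal stdlib-only illustration of that comparison, using toy vectors rather than real model outputs:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real bge-small-en-v1.5 vectors are 384-dim)
query = [0.1, 0.9, 0.2]
doc_about_llamas = [0.2, 0.8, 0.1]
doc_about_tax_law = [0.9, -0.1, 0.4]

print(cosine_similarity(query, doc_about_llamas))   # close to 1.0
print(cosine_similarity(query, doc_about_tax_law))  # much lower
```

A vector index like the one built below does essentially this at scale, with approximate nearest-neighbor structures replacing the brute-force loop.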
```python
!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf
```

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 20.7M  100 20.7M    0     0  69.6M      0 --:--:-- --:--:-- --:--:-- 70.0M
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
```
```python
documents = SimpleDirectoryReader(
    input_files=["IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()
```

Base HuggingFace Embeddings
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
```
```python
# loads BAAI/bge-small-en-v1.5 with the default torch backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    embed_batch_size=8,
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
```

```python
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```

```
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 428.44it/s]
Generating embeddings: 100%|██████████| 459/459 [00:19<00:00, 23.32it/s]
20.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```

ONNX Embeddings

```python
# pip install sentence-transformers[onnx]
```
```python
# loads BAAI/bge-small-en-v1.5 with the onnx backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="onnx",
    # For ONNX, you can specify the provider, see
    # https://sbert.net/docs/sentence_transformer/usage/efficiency.html
    model_kwargs={"provider": "CPUExecutionProvider"},
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
```

```python
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```

```
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 421.63it/s]
Generating embeddings: 100%|██████████| 459/459 [00:31<00:00, 14.53it/s]
32.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```

OpenVINO Embeddings
```python
# pip install sentence-transformers[openvino]

# loads BAAI/bge-small-en-v1.5 with the openvino backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="openvino",  # OpenVINO is very strong on CPUs
    # BAAI/bge-small-en-v1.5 itself doesn't have an OpenVINO model currently,
    # but there's a PR with one that we can load:
    # https://huggingface.co/BAAI/bge-small-en-v1.5/discussions/16
    revision="refs/pr/16",
    # If we're using an optimized/quantized model, we need to specify the
    # file name like this
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
```

```python
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```

```
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 403.15it/s]
Generating embeddings: 100%|██████████| 459/459 [00:08<00:00, 53.83it/s]
9.03 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
```

- Local Embedding Models goes into more detail about using local models like these.
- Sentence Transformers > Speeding up Inference contains detailed documentation on using the backend options effectively, including optimization and quantization for ONNX and OpenVINO.
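Since each run embedded the same 459 chunks, the timings above can be turned into a quick back-of-the-envelope throughput comparison (numbers taken directly from the runs above; your hardware will differ):

```python
# Wall-clock times measured above for indexing the 459 chunks of the report
timings = {"torch": 20.2, "onnx": 32.1, "openvino": 9.03}
baseline = timings["torch"]

for backend, seconds in timings.items():
    chunks_per_sec = 459 / seconds
    speedup = baseline / seconds
    print(f"{backend:9s} {chunks_per_sec:5.1f} chunks/s  {speedup:.2f}x vs torch")
```

On this particular CPU the quantized OpenVINO model comes out roughly 2x faster than the default torch backend, while ONNX is actually slower; backend performance varies a lot by hardware, so benchmark on your own machine before committing to one.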