NVIDIA NIMs¶
llama-index-llms-nvidia 软件包包含LlamaIndex与NVIDIA NIM推理微服务上模型构建应用的集成。NIM支持跨领域的模型,如来自社区和NVIDIA的聊天、嵌入和重新排序模型。这些模型经过NVIDIA优化,可在NVIDIA加速基础设施上提供最佳性能,并部署为NIM——一个易于使用的预构建容器,只需在NVIDIA加速基础设施上执行单一命令即可随处部署。
NVIDIA托管的NIM部署可在NVIDIA API目录上进行测试。测试完成后,企业可以使用NVIDIA AI Enterprise许可证从NVIDIA API目录导出NIM,并在本地或云端运行,从而获得对其知识产权和AI应用的完全所有权和控制权。
NIMs以每个模型为基础打包为容器镜像,并通过NVIDIA NGC目录作为NGC容器镜像分发。 NIMs的核心是为AI模型运行推理提供简单、一致且熟悉的API。
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-llms-nvidia
!pip install llama-index-embeddings-nvidia
!pip install llama-index-postprocessor-nvidia-rerank
引入一个测试数据集,这是一份关于2021年旧金山住房建设的PDF文件。
!mkdir data
!wget "https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0" -O "data/housing_data.pdf"
--2024-05-28 17:42:44-- https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6016:18::a27d:112 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline/CTzJ0ZeHC3AFIV3iv1bv9v0oMNXW03OW2waLdeKJNs0X6Tto0MSewm9RZBHwSLhqk4jWFaCmbhMGVXeWa6xPO4mAR4hC3xflJfwgS9Z4lpPUyE4AtlDXpnfsltjEaNeFCSY/file# [following] --2024-05-28 17:42:45-- https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline/CTzJ0ZeHC3AFIV3iv1bv9v0oMNXW03OW2waLdeKJNs0X6Tto0MSewm9RZBHwSLhqk4jWFaCmbhMGVXeWa6xPO4mAR4hC3xflJfwgS9Z4lpPUyE4AtlDXpnfsltjEaNeFCSY/file Resolving ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com (ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com)... 162.125.4.15, 2620:100:6016:15::a27d:10f Connecting to ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com (ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com)|162.125.4.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CTySzMwupnuXzKpccOeYJ-7RI0NK0f7XMKBkpicHxSBuuwqAvFly51Fm0oCOwFctgeTqmD3thJsTqfFNOFHNe2JSIkJerj3mMr4Du3C7x1BcSy8t5raSfHQ_qSXF1eHrhdFII8Ou59jbofYVLe0punOl-RIa9k_v722SwkxVbg0KL9MrRL48XjX7JbsYHKTHq-gZSdAmpXpIGqS22eJavcSTuYMIy_GSZtDIs3quHM3PGU4849rG34RjpvAa-XkYDBdE996CxWupZ1C2Red9jEc5Tc6miGgt8-4LbGoxKwKF5I_Q3EqHCbvkibVR8OuKSKPtQZcNJSjsvIImzDLJ2WB6BAp2CBxz8szFF3jF3Gp6Iw/file [following] --2024-05-28 17:42:45-- https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline2/CTySzMwupnuXzKpccOeYJ-7RI0NK0f7XMKBkpicHxSBuuwqAvFly51Fm0oCOwFctgeTqmD3thJsTqfFNOFHNe2JSIkJerj3mMr4Du3C7x1BcSy8t5raSfHQ_qSXF1eHrhdFII8Ou59jbofYVLe0punOl-RIa9k_v722SwkxVbg0KL9MrRL48XjX7JbsYHKTHq-gZSdAmpXpIGqS22eJavcSTuYMIy_GSZtDIs3quHM3PGU4849rG34RjpvAa-XkYDBdE996CxWupZ1C2Red9jEc5Tc6miGgt8-4LbGoxKwKF5I_Q3EqHCbvkibVR8OuKSKPtQZcNJSjsvIImzDLJ2WB6BAp2CBxz8szFF3jF3Gp6Iw/file Reusing existing connection to ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 4808625 (4.6M) [application/pdf] Saving to: ‘data/housing_data.pdf’ data/housing_data.p 100%[===================>] 4.58M 8.26MB/s in 0.6s 2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]
设置¶
导入我们的依赖项并从API目录https://build.nvidia.com设置NVIDIA API密钥,用于我们将使用的两个托管在目录上的模型(嵌入和重新排序模型)。
开始使用:
在NVIDIA创建一个免费账户,该平台托管了NVIDIA AI基础模型。
点击您选择的模型。
在输入部分选择Python标签页,点击
Get API Key。然后点击Generate Key。复制并保存生成的密钥为NVIDIA_API_KEY。之后,您应该就能访问这些端点了。
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings
from google.colab import userdata
import os
os.environ["NVIDIA_API_KEY"] = userdata.get("nvidia-api-key")
让我们使用NVIDIA托管的NIM作为嵌入模型。
NVIDIA的默认嵌入仅处理前512个标记,因此我们将分块大小设置为500,以最大化嵌入的准确性。
Settings.text_splitter = SentenceSplitter(chunk_size=500)
documents = SimpleDirectoryReader("./data").load_data()
我们将嵌入模型设置为NVIDIA的默认配置。如果某个文本块超过了模型能编码的token数量,默认情况下会抛出错误,因此我们设置truncate="END"来丢弃超出限制的token(由于我们之前设置的文本块大小,希望这种情况不会太多)。
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
index = VectorStoreIndex.from_documents(documents)
现在我们已经将数据嵌入并在内存中建立了索引,接下来设置本地自托管的LLM。按照这个NIM快速入门指南,使用Docker可以在5分钟内完成NIM的本地部署。
下面,我们将展示如何:
- 使用Meta开源的
meta/llama3-8b-instruct模型作为本地NIM meta/llama3-70b-instruct作为来自NVIDIA托管的API目录中的NIM。
如果您使用的是本地NIM,请确保将base_url更改为您部署的NIM URL!
我们将检索前5个最相关的文本块来回答我们的问题。
# self-hosted NIM: if you want to use a self-hosted NIM uncomment the line below
# and comment the line using the API catalog
# Settings.llm = NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
# api catalog NIM: if you're using a self-hosted NIM comment the line below
# and un-comment the line using local NIM above
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")
query_engine = index.as_query_engine(similarity_top_k=20)
让我们问一个简单的问题,我们知道答案在文档的某一处(第18页)。
response = query_engine.query(
"How many new housing units were built in San Francisco in 2021?"
)
print(response)
There was a net addition of 4,649 units to the City’s housing stock in 2021.
现在让我们问一个更复杂的问题,需要阅读表格(该表格位于文档第41页):
response = query_engine.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.
这不太行!这是全新的数值,不是我们想要的数字。让我们尝试一个更高级的PDF解析工具LlamaParse:
!pip install llama-parse
from llama_parse import LlamaParse
# in a notebook, LlamaParse requires this to work
import nest_asyncio
nest_asyncio.apply()
# you can get a key at cloud.llamaindex.ai
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get("llama-cloud-key")
# set up parser
parser = LlamaParse(
result_type="markdown" # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents2 = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine(similarity_top_k=20)
response = query_engine2.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
The net gain in housing units in the Mission in 2021 was 1,305 units.
完美!通过使用更好的解析器,LLM能够回答这个问题。
现在让我们尝试一个更棘手的问题:
response = query_engine2.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
Repeat: 110
LLM似乎感到困惑;这看起来是住房单元的增长百分比。
让我们尝试为LLM提供更多上下文(40而非20),然后使用重新排序器对这些文本块进行排序。这里我们将使用NVIDIA的重新排序器:
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
query_engine3 = index2.as_query_engine(
similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]
)
response = query_engine3.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
1,495
太棒了!现在图表是正确的(如果你好奇的话,这位于第35页)。