JSON查询引擎
JSON查询引擎对于查询符合JSON模式的JSON文档非常有用。
该JSON模式随后在提示上下文中使用,将自然语言查询转换为结构化的JSON Path查询。该JSON Path查询随后用于检索数据以回答给定问题。
如果您在 Colab 上打开这个笔记本,您可能需要安装 LlamaIndex 🦙。
%pip install llama-index-llms-openai!pip install llama-index# First, install the jsonpath-ng package which is used by default to parse & execute the JSONPath queries.!pip install jsonpath-ngRequirement already satisfied: jsonpath-ng in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (1.5.3)Requirement already satisfied: ply in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (3.11)Requirement already satisfied: six in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (1.16.0)Requirement already satisfied: decorator in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (5.1.1)[33mWARNING: You are using pip version 21.2.4; however, version 23.2.1 is available.You should consider upgrading via the '/Users/loganmarkewich/llama_index/llama-index/bin/python3 -m pip install --upgrade pip' command.[0mimport loggingimport sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))import osimport openai
os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE"from IPython.display import Markdown, display让我们从一个玩具JSON开始
Section titled “Let’s start on a Toy JSON”一个非常简单的JSON对象,包含来自博客文章网站的用户评论数据。
我们还将提供一个JSON模式(我们通过给ChatGPT一个JSON示例就能生成)。
请确保您已为JSON模式中的每个字段提供了有用的"description"值。
如示例所示,"username" 字段的描述提到用户名会被转换为小写。您会发现这最终有助于大语言模型生成正确的 JSON 路径查询。
# Test on some sample datajson_value = { "blogPosts": [ { "id": 1, "title": "First blog post", "content": "This is my first blog post", }, { "id": 2, "title": "Second blog post", "content": "This is my second blog post", }, ], "comments": [ { "id": 1, "content": "Nice post!", "username": "jerry", "blogPostId": 1, }, { "id": 2, "content": "Interesting thoughts", "username": "simon", "blogPostId": 2, }, { "id": 3, "content": "Loved reading this!", "username": "simon", "blogPostId": 2, }, ],}
# JSON Schema object that the above JSON value conforms tojson_schema = { "$schema": "http://json-schema.org/draft-07/schema#", "description": "Schema for a very simple blog post app", "type": "object", "properties": { "blogPosts": { "description": "List of blog posts", "type": "array", "items": { "type": "object", "properties": { "id": { "description": "Unique identifier for the blog post", "type": "integer", }, "title": { "description": "Title of the blog post", "type": "string", }, "content": { "description": "Content of the blog post", "type": "string", }, }, "required": ["id", "title", "content"], }, }, "comments": { "description": "List of comments on blog posts", "type": "array", "items": { "type": "object", "properties": { "id": { "description": "Unique identifier for the comment", "type": "integer", }, "content": { "description": "Content of the comment", "type": "string", }, "username": { "description": ( "Username of the commenter (lowercased)" ), "type": "string", }, "blogPostId": { "description": ( "Identifier for the blog post to which the comment" " belongs" ), "type": "integer", }, }, "required": ["id", "content", "username", "blogPostId"], }, }, }, "required": ["blogPosts", "comments"],}from llama_index.llms.openai import OpenAIfrom llama_index.core.indices.struct_store import JSONQueryEngine
llm = OpenAI(model="gpt-4")
nl_query_engine = JSONQueryEngine( json_value=json_value, json_schema=json_schema, llm=llm,)raw_query_engine = JSONQueryEngine( json_value=json_value, json_schema=json_schema, llm=llm, synthesize_response=False,)nl_response = nl_query_engine.query( "What comments has Jerry been writing?",)raw_response = raw_query_engine.query( "What comments has Jerry been writing?",)display( Markdown(f"<h1>Natural language Response</h1><br><b>{nl_response}</b>"))display(Markdown(f"<h1>Raw JSON Response</h1><br><b>{raw_response}</b>"))自然语言响应
Jerry has written the comment "Nice post!".
原始JSON响应
["Nice post!"]
# get the json path query string. Same would apply to raw_responseprint(nl_response.metadata["json_path_response_str"])$.comments[?(@.username=='jerry')].content