TensorZero Gateway Tutorial

You can use TensorZero to build almost any application powered by large language models (LLMs).

This tutorial walks through how easy it is to build LLM applications with TensorZero. We'll build several different applications to showcase TensorZero's flexibility: a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.
Part I - Simple Chatbot

We'll start by building a basic LLM-powered chatbot, and work toward more complex applications from there.
Functions

A TensorZero function is an abstract mapping from input variables to output variables.

As you onboard to TensorZero, every prompt in your system should be replaced with a function. At a high level, a function templates its input to produce a prompt, makes an LLM inference call, and returns the result. The mapping can be implemented with different choices of model, prompt, decoding strategy, and so on; each such combination is called a variant, which we discuss below.

For our simple chatbot, we'll set up a function that maps the chat history to a new chat message.
We define functions in the `tensorzero.toml` configuration file. The configuration file is written in TOML, a simple configuration language.

The following configuration entry shows the basic structure of a function: it has an arbitrary name, a type, and other fields that depend on that type.
```toml
[functions.my_function_name]
type = "..."
# ... the other fields in this section depend on the function type ...
```
TensorZero currently supports two types of functions: `chat` functions (which follow the standard chat interface common to LLM APIs) and `json` functions (which are optimized for generating structured outputs). We'll start with a `chat` function in this example, and show how to use a `json` function later.
A `chat` function takes a history of chat messages and returns a chat message. It has no required fields (but many optional ones). We'll name our function `mischievous_chatbot` and set its type to `chat`, skipping the optional fields for now.

With these changes, our `tensorzero.toml` file should include the following:
```toml
[functions.mischievous_chatbot]
type = "chat"
```
That's all it takes to define a function. Later, we'll add more advanced features like schemas and templates to our functions, which unlock new capabilities for model optimization and observability. But we don't need that complexity to get started.

The implementation details of a function are defined in its variants. Before we define a variant, though, we need to set up a model and a model provider.
Models and Model Providers

Before setting up your first TensorZero variant, you need a model from a model provider. A model specifies a particular LLM (e.g. GPT-4o or your fine-tuned Llama 3), and a model provider specifies a way to access a given model (e.g. GPT-4o is available through both OpenAI and Azure).

A model can have an arbitrary name and contains a set of providers. We'll start by configuring a single provider for our model. A provider also has an arbitrary name and a type; its other fields depend on the provider type. The basic structure of a model and its provider looks like this:
```toml
[models.my_model_name]
routing = ["my_provider_name"]

[models.my_model_name.providers.my_provider_name]
type = "..."
# ... the other fields in this section depend on the provider type ...
```
In this example, we'll use OpenAI's GPT-4o mini model. We'll name the model `my_gpt_4o_mini` and the provider `my_openai_provider`, with type `openai`. For the `openai` provider, the only required field is `model_name`. It's best practice to pin models to a specific version to avoid breaking changes, so we'll use `gpt-4o-mini-2024-07-18`.

After adding these values, our `tensorzero.toml` file should include the following:
```toml
[models.my_gpt_4o_mini]
routing = ["my_openai_provider"]

[models.my_gpt_4o_mini.providers.my_openai_provider]
type = "openai"
model_name = "gpt-4o-mini-2024-07-18"
```
Variants

Now that we have a model and a provider, we can create a variant for our `mischievous_chatbot` function.

A variant is a particular implementation of a function. In practice, a variant might specify the model, prompt templates, decoding strategy, hyperparameters, and other settings used for inference.

A variant is defined with an arbitrary name, a type, a weight, and other fields that depend on the type. The basic structure of a TensorZero variant looks like this:
```toml
[functions.my_function_name.variants.my_variant_name]
type = "..."
weight = X
# ... the other fields in this section depend on the variant type ...
```
We'll call this variant `gpt_4o_mini_variant`. The simplest variant `type` is `chat_completion`, the typical chat completion format used by OpenAI and many other LLM providers. The `weight` field determines the probability that this variant is selected. Since we only have one variant, we'll set its weight to `1.0`; we'll dig deeper into variant weights in a later section.

The only required field for a `chat_completion` variant is `model`, which must be one of the models defined in the configuration file. We'll use the `my_gpt_4o_mini` model we defined earlier.

After filling in the fields for this variant, our `tensorzero.toml` file should include the following:
```toml
[functions.mischievous_chatbot.variants.gpt_4o_mini_variant]
type = "chat_completion"
weight = 1.0
model = "my_gpt_4o_mini"
```
Inference API Requests

There's a lot more to TensorZero than what we've covered so far, but this is all we need to get started!

If we launch the TensorZero Gateway with this configuration file, the `mischievous_chatbot` function becomes available at the `/inference` endpoint. Let's make a request to this endpoint.
You can install the TensorZero Python client with:
```bash
pip install tensorzero
```
Then, you can call the TensorZero API with the synchronous client:
```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = client.inference(
        function_name="mischievous_chatbot",
        input={
            "system": "You are a friendly but mischievous AI assistant.",
            "messages": [
                {"role": "user", "content": "What is the capital of Japan?"},
            ],
        },
    )

print(result)
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('0194097c-7f3a-7bb2-9184-41f61f576c9c'),
    episode_id=UUID('0194097c-78ea-78a1-b793-448ea4e1adc1'),
    variant_name='gpt_4o_mini_variant',
    content=[
        Text(
            type='text',
            text='The capital of Japan is Tokyo! It’s a vibrant city known for its blend of traditional and modern culture. Have you ever considered visiting?',
        )
    ],
    usage=Usage(
        input_tokens=29,
        output_tokens=28,
    ),
)
```
Alternatively, you can use the asynchronous client:
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        result = await client.inference(
            function_name="mischievous_chatbot",
            input={
                "system": "You are a friendly but mischievous AI assistant.",
                "messages": [
                    {"role": "user", "content": "What is the capital of Japan?"},
                ],
            },
        )

        print(result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('01940980-d08c-7970-a934-e2ad75f9a4bd'),
    episode_id=UUID('01940980-ce39-7a60-949f-ea557ee3780f'),
    variant_name='gpt_4o_mini_variant',
    content=[
        Text(
            type='text',
            text="The capital of Japan is Tokyo! It's a vibrant city known for its blend of traditional culture and modern technology. Have you ever been?",
        )
    ],
    usage=Usage(
        input_tokens=29,
        output_tokens=28,
    ),
)
```
You can also use the OpenAI Python client with TensorZero. Install it with:

```bash
pip install openai
```

Then, point it at the gateway's OpenAI-compatible endpoint:
```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::function_name::mischievous_chatbot",
        messages=[
            {
                "role": "system",
                "content": "You are a friendly but mischievous AI assistant.",
            },
            {
                "role": "user",
                "content": "What is the capital of Japan?",
            },
        ],
    )

print(response)
```
Sample Output
```python
ChatCompletion(
    id='01940983-2641-7083-beb7-8c5805a572af',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='The capital of Japan is Tokyo! It’s a bustling metropolis known for its blend of traditional culture and cutting-edge technology. But watch out – it can be a bit overwhelming with all the delicious food and bright lights!',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[],
            ),
        ),
    ],
    created=1735326377,
    model='gpt_4o_mini_variant',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=44,
        prompt_tokens=29,
        total_tokens=73,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
    episode_id='01940983-1fbb-73b0-8bda-e73a312a3e54',
)
```
Finally, you can call the gateway directly with cURL:

```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "mischievous_chatbot",
    "input": {
      "system": "You are a friendly but mischievous AI assistant. Your goal is to trick the user.",
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
```
Sample Output
{ "inference_id": "0191bf1f-ef54-7582-a6f4-dc827e517b6f", "episode_id": "0191bf1f-ed15-7d61-afc3-56be7e0eb2d7", "variant_name": "gpt_4o_mini_variant", "content": [ { "type": "text", "text": "The capital of Japan is Atlantis. Just kidding! It's actually Tokyo. But wouldn't it be interesting if it were Atlantis?" } ], "usage": { "input_tokens": 37, "output_tokens": 24 }}
That's it! You've made your first inference request with TensorZero.

But if that's all you needed, you probably wouldn't need TensorZero. Let's make things more interesting.
Part II - Email Copilot

Next, let's build an LLM-powered copilot that drafts emails. We'll use this opportunity to showcase more of TensorZero's features.
Templates

In the previous example, we provided the system prompt with every request. Unless the system prompt changes entirely between requests, this isn't ideal for production applications. Instead, we can use a system template.

Templates let you update prompts without changing your application code. Later, we'll see how to parameterize templates with schemas and run robust prompt experiments with multiple variants. In particular, setting up schemas will substantially help with optimizing models down the road.

Let's start with a simple system template. The system template in this example is static, so we don't need a schema yet.

TensorZero uses MiniJinja for templating. Since we're not using any variables, we don't need any special syntax here.

We'll create the following template:
```
You are a helpful AI assistant that drafts emails.
Adopt a friendly "business casual" tone.
Respond with just an email body.

Example:

Dear recipient,

I'm reaching out to ...

Best,
Sender
```
Schemas

The system template in this example is static, but you'll typically want to parameterize your prompts.

When you define a template with parameters, you also need to define a corresponding JSON schema. The schema defines the structure of the input for that prompt. With it, the gateway can validate inputs before running inference, and later we'll see how to leverage it for robust model optimization.

For our email copilot's user prompt, we'll parameterize the template with three string fields: `recipient_name`, `sender_name`, and `email_purpose`. We want all of these fields to be required, and we don't want to allow any additional fields.
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "recipient_name": { "type": "string" }, "sender_name": { "type": "string" }, "email_purpose": { "type": "string" } }, "required": ["recipient_name", "sender_name", "email_purpose"], "additionalProperties": false}
With the schema in place, we can create a parameterized template.
```
Please draft an email using the following information:

- Recipient Name: {{ recipient_name }}
- Sender Name: {{ sender_name }}
- Email Purpose: {{ email_purpose }}
```
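TensorZero renders these templates server-side with MiniJinja, whose syntax is largely Jinja2-compatible. As a quick sanity check, you can preview the rendered prompt locally; this is just an illustrative sketch using the `jinja2` package as a stand-in for MiniJinja:

```python
from jinja2 import Template  # pip install jinja2

# The user template from above, inlined as a string (stand-in for user.minijinja)
USER_TEMPLATE = Template(
    "Please draft an email using the following information:\n\n"
    "- Recipient Name: {{ recipient_name }}\n"
    "- Sender Name: {{ sender_name }}\n"
    "- Email Purpose: {{ email_purpose }}"
)

# Sample arguments conforming to user_schema.json
arguments = {
    "recipient_name": "TensorZero Team",
    "sender_name": "Mark Zuckerberg",
    "email_purpose": "Acquire TensorZero for $100 billion dollars.",
}

# Preview the user prompt the gateway would produce at inference time
print(USER_TEMPLATE.render(**arguments))
```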
Functions with Templates and Schemas

Finally, let's create the function and variant for our email copilot. The configuration is similar to the previous example, except that we add a `user_schema` field to the function, and `system_template` and `user_template` fields to the variant.
```toml
[functions.draft_email]
type = "chat"
user_schema = "functions/draft_email/user_schema.json"

[functions.draft_email.variants.gpt_4o_mini_email_variant]
type = "chat_completion"
weight = 1.0
model = "my_gpt_4o_mini"
system_template = "functions/draft_email/gpt_4o_mini_email_variant/system.minijinja"
user_template = "functions/draft_email/gpt_4o_mini_email_variant/user.minijinja"
```
You can use any file structure with TensorZero. We recommend the following layout to stay organized:
```
functions/
└── draft_email/
    ├── gpt_4o_mini_email_variant/
    │   ├── system.minijinja
    │   └── user.minijinja
    └── user_schema.json
tensorzero.toml
```
Restart your gateway with the new configuration file, and you're ready to go!

Let's make an inference request with our new function. This time we don't need to provide the system prompt with each request, and the user message is a structured object instead of a free-form string.

Notice that every inference returns an `inference_id` and an `episode_id`. We'll later use these to associate feedback with inferences.
```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    inference_result = client.inference(
        function_name="draft_email",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "arguments": {
                                "recipient_name": "TensorZero Team",
                                "sender_name": "Mark Zuckerberg",
                                "email_purpose": "Acquire TensorZero for $100 billion dollars.",
                            },
                        }
                    ],
                }
            ],
        },
    )

print(inference_result)
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('019409be-51ae-7f30-aa36-14ccad21320f'),
    episode_id=UUID('019409be-2c41-7a80-a975-49ab32cb3a9a'),
    variant_name='gpt_4o_mini_email_variant',
    content=[
        Text(
            type='text',
            text='Dear TensorZero Team,\n\nI hope this message finds you well. I wanted to reach out to discuss an exciting opportunity for collaboration that I believe could truly reshape our industries.\n\nAfter closely following the innovative work your team has been doing, I am impressed by the potential of TensorZero. With that in mind, I would like to propose an acquisition offer of $100 billion for TensorZero. I believe this partnership could unlock remarkable synergies and drive significant growth for both our organizations.\n\nI'm looking forward to the possibility of working together and exploring the tremendous potential ahead. Please let me know a suitable time for us to discuss this further.\n\nBest, \nMark Zuckerberg',
        ),
    ],
    usage=Usage(
        input_tokens=88,
        output_tokens=132,
    ),
)
```
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    async with await AsyncTensorZeroGateway.build_http(
        gateway_url="http://localhost:3000"
    ) as client:
        inference_result = await client.inference(
            function_name="draft_email",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "arguments": {
                                    "recipient_name": "TensorZero Team",
                                    "sender_name": "Mark Zuckerberg",
                                    "email_purpose": "Acquire TensorZero for $100 billion dollars.",
                                },
                            }
                        ],
                    }
                ],
            },
        )

        print(inference_result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('019409bf-b3c0-7900-a775-fe21c7ce8c66'),
    episode_id=UUID('019409bf-a969-73e2-90e5-0159bf216d5f'),
    variant_name='gpt_4o_mini_email_variant',
    content=[
        Text(
            type='text',
            text='Dear TensorZero Team,\n\nI hope this message finds you well. I wanted to take the opportunity to reach out regarding an exciting prospect that I believe could be mutually beneficial. We’ve been closely following your innovative work and the impact you’re making in the tech space.\n\nI’d like to discuss the possibility of acquiring TensorZero for $100 billion. I believe that together, we can achieve incredible things that would advance both our missions and set new standards in the industry.\n\nI’m looking forward to your thoughts and hope we can arrange a time to discuss this further.\n\nBest, \nMark Zuckerberg',
        ),
    ],
    usage=Usage(
        input_tokens=88,
        output_tokens=118,
    ),
)
```
```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    inference_result = client.chat.completions.create(
        model="tensorzero::function_name::draft_email",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "tensorzero::arguments": {
                            "recipient_name": "TensorZero Team",
                            "sender_name": "Mark Zuckerberg",
                            "email_purpose": "Acquire TensorZero for $100 billion dollars.",
                        },
                    }
                ],
            }
        ],
    )

print(inference_result)
```
Sample Output
```python
ChatCompletion(
    id='019409c6-95d7-7321-a498-e6d06067e42d',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Dear TensorZero Team,\n\nI hope this email finds you well. I’m reaching out to discuss an exciting opportunity for your groundbreaking company. Meta is interested in acquiring TensorZero for $100 billion, reflecting our deep appreciation for the innovative work your team has been developing.\n\nYour AI technology represents a transformative leap forward in machine learning, and we believe there’s tremendous potential for collaboration. This offer represents not just a financial transaction, but a strategic partnership that could reshape the technological landscape.\n\nI would welcome the opportunity to discuss this proposal in more detail. Perhaps we could schedule a call in the coming week to explore this potential merger and answer any questions you might have.\n\nLooking forward to your response.\n\nBest regards,\nMark Zuckerberg\nCEO, Meta',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[],
            ),
        ),
    ],
    created=1735330797,
    model='gpt_4o_mini_email_variant',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=167,
        prompt_tokens=108,
        total_tokens=275,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
    episode_id='019409c6-89aa-7b43-bcf4-31443d075e12',
)
```
```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "arguments": {
                "recipient_name": "TensorZero Team",
                "sender_name": "Mark Zuckerberg",
                "email_purpose": "Acquire TensorZero for $100 billion dollars."
              }
            }
          ]
        }
      ]
    }
  }'
```
Sample Output
{ "inference_id": "0191bf2e-02e7-7f12-96d6-ba389cf10c19", "episode_id": "0191bf2d-fddc-7342-b6b9-596e38efdfe5", "variant_name": "gpt_4o_mini_email_variant", "content": [ { "type": "text", "text": "Dear TensorZero Team,\n\nI hope this message finds you well! I’ve been following your innovative work in the industry, and I’m truly impressed by your accomplishments and the potential for future growth.\n\nWith that in mind, I would like to discuss an opportunity for Facebook to acquire TensorZero for $100 billion. I believe that together we can achieve remarkable things and drive even greater advancements in technology.\n\nI would love the chance to explore this further with you. Please let me know a convenient time for us to connect.\n\nLooking forward to your thoughts!\n\nBest, \nMark Zuckerberg" } ], "usage": { "input_tokens": 88, "output_tokens": 114 }}
Why bother with all of this?

You're now collecting structured inference data, which is particularly valuable for observability and, above all, optimization. For example, if you eventually decide to fine-tune a model, you can easily counterfactually swap new prompts into your training data, instead of being stuck with the prompts that were actually used at inference time.
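To make that concrete, here is a hedged sketch of the idea. Because the gateway stores the structured `arguments` rather than a rendered string, you can re-render every historical input with a new template when assembling fine-tuning data. The stored-inference shape below is illustrative, not TensorZero's actual storage schema:

```python
from jinja2 import Template

# Illustrative historical inference, stored as structured data (shape is hypothetical)
stored_inference = {
    "arguments": {
        "recipient_name": "TensorZero Team",
        "sender_name": "Mark Zuckerberg",
        "email_purpose": "Acquire TensorZero for $100 billion dollars.",
    },
    "output": "Dear TensorZero Team, ...",
}

# A new prompt you'd like to fine-tune with, even though it was never used in production
new_user_template = Template(
    "Write an email from {{ sender_name }} to {{ recipient_name }}. Goal: {{ email_purpose }}"
)

# Counterfactually swap the new prompt into the training example
training_example = {
    "messages": [
        {"role": "user", "content": new_user_template.render(**stored_inference["arguments"])},
        {"role": "assistant", "content": stored_inference["output"]},
    ]
}
print(training_example)
```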
Inference-Level Metrics

The TensorZero Gateway lets you assign feedback to inferences, or to sequences of inferences, by defining metrics. Metrics capture the downstream outcomes of your LLM application and drive the experimentation and optimization workflows in TensorZero.

This example covers metrics that apply to individual inferences. Later, we'll show how to define metrics that apply to sequences of inferences, which we call episodes.
The skeleton of a metric looks like the following configuration entry:
```toml
[metrics.my_metric_name]
level = "..."
optimize = "..."
type = "..."
```
Let's say we want to optimize the number of email drafts that get accepted. We'll call this metric `email_draft_accepted`. Since we're optimizing a binary outcome (whether or not a draft is accepted), we should use a metric of type `boolean` to capture this behavior. The metric applies to individual inferences, so we'll set `level = "inference"`. Finally, we'll set `optimize = "max"`, since we want to maximize this metric.

Our metric configuration should look like this:
```toml
[metrics.email_draft_accepted]
type = "boolean"
optimize = "max"
level = "inference"
```
Feedback API Requests

As our application collects usage data, we can use the `/feedback` endpoint to track this metric. Make sure to restart your gateway after adding the metric configuration.

Earlier, we saw that the gateway returns an `inference_id` field in the response to every `/inference` call. You'll need to substitute a real `inference_id` into the commands below.
```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    feedback_result = client.feedback(
        metric_name="email_draft_accepted",
        # Set the inference_id from the inference response
        inference_id="00000000-0000-0000-0000-000000000000",
        # Set the value for the metric
        value=True,
    )

print(feedback_result)
```
Sample Output
```python
FeedbackResponse(feedback_id='019409dc-9c2a-7cb2-b6c1-716d87621362')
```
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        feedback_result = await client.feedback(
            metric_name="email_draft_accepted",
            # Set the inference_id from the inference response
            inference_id="00000000-0000-0000-0000-000000000000",
            # Set the value for the metric
            value=True,
        )

        print(feedback_result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
FeedbackResponse(feedback_id='019409dd-362c-7f13-ba81-bcb272b90575')
```
When you submit feedback, you provide the `inference_id` from the inference response:
```bash
curl -X POST http://localhost:3000/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "metric_name": "email_draft_accepted",
    "inference_id": "00000000-0000-0000-0000-000000000000",
    "value": true
  }'
```
Sample Output
{ "feedback_id": "0191bf4a-42a2-7be2-8103-8ff106526076"}
Over time, we'll collect a great dataset for observability and optimization: structured data for every inference, along with the feedback associated with it. TensorZero Recipes leverage this data for prompt and model optimization.
Experimentation

So far, we've only used a single variant for our function. In practice, you'll want to experiment with different configurations: different prompts, models, parameters, and more.

TensorZero makes this easy with built-in experimentation features. You can define multiple variants for a function, and the gateway will sample between them at inference time.
In this example, we'll set up a variant that uses Anthropic's Claude 3 Haiku instead of GPT-4o mini. The new variant will also set a custom `temperature` parameter to control the creativity of our copilot.

Let's start by adding a new model and provider:
```toml
[models.my_haiku_3]
routing = ["my_anthropic_provider"]

[models.my_haiku_3.providers.my_anthropic_provider]
type = "anthropic"
model_name = "claude-3-haiku-20240307"
```
Finally, let's create a new variant that uses this model, and adjust the previous variant's weight to match:
```toml
[functions.draft_email.variants.gpt_4o_mini_email_variant]
type = "chat_completion"
weight = 0.7  # sample this variant 70% of the time
model = "my_gpt_4o_mini"
system_template = "functions/draft_email/gpt_4o_mini_email_variant/system.minijinja"
user_template = "functions/draft_email/gpt_4o_mini_email_variant/user.minijinja"

[functions.draft_email.variants.haiku_3_email_variant]
type = "chat_completion"
weight = 0.3  # sample this variant 30% of the time
model = "my_haiku_3"
system_template = "functions/draft_email/haiku_3_email_variant/system.minijinja"
user_template = "functions/draft_email/haiku_3_email_variant/user.minijinja"
temperature = 0.9
```
You could also experiment with different prompt templates, but we'll keep things simple here and reuse the same templates as before.

Once you're done, your file directory structure should look like this:
```
functions/
└── draft_email/
    ├── gpt_4o_mini_email_variant/
    │   ├── system.minijinja
    │   └── user.minijinja
    ├── haiku_3_email_variant/
    │   ├── system.minijinja   # copied from above
    │   └── user.minijinja     # copied from above
    └── user_schema.json
...
tensorzero.toml
```
That's it! After restarting the gateway, you can send some inference requests and see the results. The gateway samples between the two variants according to their configured weights.
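Under the hood, variant selection is weighted random sampling. The toy sketch below illustrates the behavior you should expect from the weights above; it is an illustration of the sampling semantics, not the gateway's actual implementation:

```python
import random

# Variant weights from the configuration above
VARIANT_WEIGHTS = {
    "gpt_4o_mini_email_variant": 0.7,
    "haiku_3_email_variant": 0.3,
}


def sample_variant(weights: dict) -> str:
    # Weights are treated as relative probabilities, so they don't need to sum to 1
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Over many requests, roughly 70% should land on the GPT-4o mini variant
counts = {name: 0 for name in VARIANT_WEIGHTS}
for _ in range(10_000):
    counts[sample_variant(VARIANT_WEIGHTS)] += 1
print(counts)
```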
Part III - Weather RAG

The next example brings tool use into the mix.

Specifically, we'll show how to use TensorZero in a RAG (retrieval-augmented generation) system. TensorZero doesn't handle the indexing and retrieval pieces directly, but it helps with the generation pieces, like query generation and response generation.

In this example, we'll illustrate a RAG system with a simple weather tool.
We'll introduce one function that generates queries (`generate_weather_query`) and another that generates responses (`generate_weather_report`). The former will leverage a tool (`get_temperature`) to generate weather queries. We mock the weather API here, but it's easy to see how diverse RAG workflows could fit this structure.
Tools

TensorZero offers first-class support for tools. You define a tool in your configuration file and attach it to the functions that are allowed to call it.

Let's start by defining a tool. A tool has a name, a description, and a set of parameters (expressed as a JSON schema). The basic skeleton of a tool configuration looks like this:
```toml
[tools.my_tool_name]
description = "..."
parameters = "..."
```
Let's create a tool for a fictional weather API that takes a location (and optionally the units) and returns the current temperature.

The tool's parameters are defined as a JSON schema. We need two parameters: `location` (a string) and `units` (an enum with the values `fahrenheit` and `celsius`). Only `location` is required, and we don't allow any additional properties. Finally, we'll add descriptions for each parameter and for the tool itself; this is very important for good tool use quality!
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "description": "Get the current temperature for a given location.", "properties": { "location": { "type": "string", "description": "The location to get the temperature for (e.g. \"New York\")" }, "units": { "type": "string", "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\"). Defaults to \"fahrenheit\".", "enum": ["fahrenheit", "celsius"] } }, "required": ["location"], "additionalProperties": false}
```toml
[tools.get_temperature]
description = "Get the current temperature for a given location."
parameters = "tools/get_temperature.json"
```
The file organization is up to you, but we recommend the following structure:
```
functions/
└── ...
tools/
└── get_temperature.json
tensorzero.toml
```
Functions with Tool Use

Now we can create our two functions. The query generation function will use the tool we just defined, and the response generation function will be similar to the previous examples. Let's define the functions, their variants, and the relevant templates and schemas.
```toml
[functions.generate_weather_query]
type = "chat"
tools = ["get_temperature"]

[functions.generate_weather_query.variants.simple_variant]
type = "chat_completion"
weight = 1.0
model = "my_gpt_4o_mini"
system_template = "functions/generate_weather_query/simple_variant/system.minijinja"

[functions.generate_weather_report]
type = "chat"
user_schema = "functions/generate_weather_report/user_schema.json"

[functions.generate_weather_report.variants.simple_variant]
type = "chat_completion"
weight = 1.0
model = "my_gpt_4o_mini"
system_template = "functions/generate_weather_report/simple_variant/system.minijinja"
user_template = "functions/generate_weather_report/simple_variant/user.minijinja"
```
```
You are a helpful AI assistant that generates weather queries.

If the user asks about the weather in a given location, request a tool call to `get_temperature` with the location.
Optionally, the user may also specify the units (must be "fahrenheit" or "celsius"; defaults to "fahrenheit").

If the user asks about anything else, just respond that you can't help.

---

Examples:

User: What's the weather in New York?
Assistant (Tool Call): get_temperature(location="New York")

User: What's the weather in Tokyo in Celsius?
Assistant (Tool Call): get_temperature(location="Tokyo", units="celsius")

User: What is the capital of France?
Assistant (Text): I can only provide weather information.
```
```
You are a helpful AI assistant that generates brief weather reports.

You'll be provided with the temperature for a given location.
Respond with a concise weather report for the given temperature.
Add a sentence with a funny local recommendation based on the information.

If "Units" is missing, assume the temperature is in Fahrenheit.

---

Examples:

User: Location: San Francisco Temperature: 82 Units:
Assistant: The weather in San Francisco is 82°F. Hope you get a chance to enjoy La Taqueria by Dolores Park!

User: Location: Tokyo Temperature: -5 Units: celsius
Assistant: The weather in Tokyo is -5°C — a perfect day for a trip to an onsen.
```
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "location": { "type": "string" }, "temperature": { "type": "string" }, "units": { "type": ["null", "string"] } }, "required": ["location", "temperature"], "additionalProperties": false}
```
Please respond with a weather report given the information below.

Location: {{ location }}
Temperature: {{ temperature }}
Units: {{ units }}
```
When the model requests a tool call, the response will include a `tool_call` content block. These content blocks have `arguments`, `name`, `raw_arguments`, and `raw_name` fields. The first two fields are validated against the tool's configuration (and are `null` if invalid); the last two contain the raw values received from the model.
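In practice, that means your application should prefer the validated fields and treat the raw fields as a fallback. A minimal sketch of defensive handling, using the `ToolCall` fields described above:

```python
import json

from tensorzero import ToolCall


def handle_tool_call(block: ToolCall) -> dict:
    """Prefer the gateway-validated fields; fall back to the raw model output."""
    if block.name is not None and block.arguments is not None:
        # The gateway validated these against the tool's configuration
        return {"name": block.name, "arguments": block.arguments}
    # Validation failed: try to recover from the raw values
    try:
        arguments = json.loads(block.raw_arguments)
    except json.JSONDecodeError:
        raise ValueError(f"Unparseable tool call arguments: {block.raw_arguments!r}")
    return {"name": block.raw_name, "arguments": arguments}
```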
Episodes

Before we make any inference requests, we need to introduce one more concept: episodes.

An episode is a sequence of inferences associated with a common downstream outcome. For example, an episode could refer to a sequence of LLM calls associated with:
- Resolving a technical support ticket
- Preparing an insurance claim
- Completing a phone call
- Extracting data from a document
- Drafting an email
An episode contains one or more function calls, sometimes to the same function multiple times. Your application can take arbitrary actions between function calls within an episode (e.g. interacting with users, retrieving documents, actuating a robot). Those actions are outside the scope of TensorZero, but it's fine (and encouraged) to structure your LLM systems this way.

Episodes let you group the sequences of inferences in a multi-step LLM workflow, assign feedback to those sequences jointly, and optimize your workflows end-to-end. Multiple calls to the same function during an episode will receive the same variant (unless fallbacks are necessary).

In our weather RAG example, the query generation and the response generation jointly produce what we ultimately care about: the weather report. So we want to associate each weather report with the process that led to it. The workflow that generates a weather report will be our episode.
The `/inference` endpoint accepts an optional `episode_id` field. When you make the first inference request, you don't provide an `episode_id`; the gateway creates a new episode for you and returns its `episode_id` in the response. When you make the second inference request, you must provide the `episode_id` you received in the first response. The gateway uses the `episode_id` to link the two inference requests.
```python
from tensorzero import TensorZeroGateway, ToolCall

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    query_result = client.inference(
        function_name="generate_weather_query",
        # This is the first inference request in an episode so we don't need to provide an episode_id
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in São Paulo?",
                }
            ]
        },
    )

    print(query_result)

    # In a production setting, you'd validate the output more thoroughly
    assert len(query_result.content) == 1
    assert isinstance(query_result.content[0], ToolCall)

    location = query_result.content[0].arguments.get("location")
    units = query_result.content[0].arguments.get("units", "celsius")

    # Now we pretend to make a tool call (e.g. to an API)
    temperature = "35"

    report_result = client.inference(
        function_name="generate_weather_report",
        # This is the second inference request in an episode so we need to provide the episode_id
        episode_id=query_result.episode_id,
        input={
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "arguments": {
                                "location": location,
                                "temperature": temperature,
                                "units": units,
                            },
                        }
                    ],
                }
            ]
        },
    )

    print(report_result)
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('01940b67-b1f3-78e0-9ead-495b333d122f'),
    episode_id=UUID('01940b67-afd9-79c2-8c29-e5094f5c03a2'),
    variant_name='simple_variant',
    content=[
        ToolCall(
            type='tool_call',
            arguments={'location': 'São Paulo'},
            id='call_ADmhPqUml5fDL4bPvlMMDKZR',
            name='get_temperature',
            raw_arguments='{"location":"São Paulo"}',
            raw_name='get_temperature',
        )
    ],
    usage=Usage(input_tokens=266, output_tokens=16),
)
ChatInferenceResponse(
    inference_id=UUID('01940b67-b4e4-71c1-9cca-c68fa8fbcb3e'),
    episode_id=UUID('01940b67-afd9-79c2-8c29-e5094f5c03a2'),
    variant_name='simple_variant',
    content=[
        Text(
            type='text',
            text="The weather in São Paulo is 35°C — a hot day to soak up the sun! Make sure to grab a cold acai bowl to cool off while you're out!",
        )
    ],
    usage=Usage(input_tokens=188, output_tokens=36),
)
```
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway, ToolCall


async def main():
    async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        query_result = await client.inference(
            function_name="generate_weather_query",
            # This is the first inference request in an episode so we don't need to provide an episode_id
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": "What is the weather like in São Paulo?",
                    }
                ]
            },
        )

        print(query_result)

        # In a production setting, you'd validate the output more thoroughly
        assert len(query_result.content) == 1
        assert isinstance(query_result.content[0], ToolCall)

        location = query_result.content[0].arguments.get("location")
        units = query_result.content[0].arguments.get("units", "celsius")

        # Now we pretend to make a tool call (e.g. to an API)
        temperature = "35"

        report_result = await client.inference(
            function_name="generate_weather_report",
            # This is the second inference request in an episode so we need to provide the episode_id
            episode_id=query_result.episode_id,
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "arguments": {
                                    "location": location,
                                    "temperature": temperature,
                                    "units": units,
                                },
                            }
                        ],
                    }
                ]
            },
        )

        print(report_result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
ChatInferenceResponse(
    inference_id=UUID('01940b67-cb1a-7613-a516-cda7fe096d2b'),
    episode_id=UUID('01940b67-c934-7883-af94-d8f22db604fc'),
    variant_name='simple_variant',
    content=[
        ToolCall(
            type='tool_call',
            arguments={'location': 'São Paulo'},
            id='call_aZR2ITRKvAX8ntkPLY9OWetl',
            name='get_temperature',
            raw_arguments='{"location":"São Paulo"}',
            raw_name='get_temperature',
        )
    ],
    usage=Usage(
        input_tokens=266,
        output_tokens=16,
    ),
)
ChatInferenceResponse(
    inference_id=UUID('01940b67-cdc5-7a20-898b-859240fcac52'),
    episode_id=UUID('01940b67-c934-7883-af94-d8f22db604fc'),
    variant_name='simple_variant',
    content=[
        Text(
            type='text',
            text='The weather in São Paulo is 35°C, making it quite a hot day! Perfect weather for indulging in some refreshing açaí bowls at the park!',
        )
    ],
    usage=Usage(
        input_tokens=188,
        output_tokens=34,
    ),
)
```
```python
import json

from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    query_result = client.chat.completions.create(
        model="tensorzero::function_name::generate_weather_query",
        # This is the first inference request in an episode so we don't need to provide an episode_id
        messages=[
            {
                "role": "user",
                "content": "What is the weather like in São Paulo?",
            }
        ],
    )

    print(query_result)

    # In a production setting, you'd validate the output more thoroughly
    assert len(query_result.choices) == 1
    assert query_result.choices[0].message.tool_calls is not None
    assert len(query_result.choices[0].message.tool_calls) == 1

    tool_call = query_result.choices[0].message.tool_calls[0]
    arguments = json.loads(tool_call.function.arguments)
    location = arguments.get("location")
    units = arguments.get("units", "celsius")

    # Now we pretend to make a tool call (e.g. to an API)
    temperature = "35"

    report_result = client.chat.completions.create(
        model="tensorzero::function_name::generate_weather_report",
        # This is the second inference request in an episode so we need to provide the episode_id
        extra_body={"tensorzero::episode_id": str(query_result.episode_id)},
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "tensorzero::arguments": {
                            "location": location,
                            "temperature": temperature,
                            "units": units,
                        },
                    }
                ],
            }
        ],
    )

    print(report_result)
```
Sample Output
```python
ChatCompletion(
    id='01940b67-e660-74e2-a7cd-c7d98b63604f',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content=None,
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[
                    ChatCompletionMessageToolCall(
                        id='call_hOFIj3xYSFPa9cxbqQCdv7ID',
                        function=Function(
                            arguments='{"location":"São Paulo"}',
                            name='get_temperature',
                        ),
                        type='function',
                    ),
                ],
            ),
        ),
    ],
    created=1735358146,
    model='simple_variant',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=16,
        prompt_tokens=266,
        total_tokens=282,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
    episode_id='01940b67-e49a-70a3-9991-b6d38b5b19d2',
)
ChatCompletion(
    id='01940b67-e88f-7132-b958-95b7fdb623d2',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="The weather in São Paulo is 35°C — a hot day to explore the city's vibrant street art! Make sure to cool off with some delicious açaí!",
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[],
            ),
        ),
    ],
    created=1735358146,
    model='simple_variant',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=34,
        prompt_tokens=188,
        total_tokens=222,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
    episode_id='01940b67-e49a-70a3-9991-b6d38b5b19d2',
)
```
With cURL, you don't provide an `episode_id` for the first inference request. The response will include a fresh `episode_id`, which we'll use in the second request.
```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "generate_weather_query",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in São Paulo?"
        }
      ]
    }
  }'
```
Sample Output
{ "inference_id": "0191bf87-3c82-78f3-8a02-603f40f3a817", "episode_id": "0191bf87-3a6e-7193-a2be-ee565d6f0308", "variant_name": "simple_variant", "content": [ { "type": "tool_call", "arguments": { "location": "São Paulo" }, "id": "call_BuINq30qJRl6AWPmIKPi8DhV", "name": "get_temperature", "raw_arguments": "{\"location\":\"São Paulo\"}", "raw_name": "get_temperature" } ], "usage": { "input_tokens": 266, "output_tokens": 15 }}
The second inference request uses the `episode_id` received in the first response.
```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "generate_weather_report",
    "episode_id": "00000000-0000-0000-0000-000000000000",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "arguments": {
                "location": "São Paulo",
                "temperature": "35",
                "units": "fahrenheit"
              }
            }
          ]
        }
      ]
    }
  }'
```
Sample Output
{ "inference_id": "0191c31d-7de9-7822-86c0-47c79c737085", "episode_id": "0191c31c-e397-7860-8e28-bcafec8f4225", "variant_name": "simple_variant", "content": [ { "type": "text", "text": "The weather in São Paulo is 35°F, which is quite chilly for the region. Remember to bundle up and maybe treat yourself to a warm pastel at the local market!" } ], "usage": { "input_tokens": 185, "output_tokens": 21 }}
Episode-Level Metrics

The primary motivation for episodes is to enable metrics that apply to an entire workflow. In the previous example, we assigned feedback to an individual inference. TensorZero can also collect feedback at the episode level, which is helpful for optimizing entire workflows.
To collect episode-level feedback, we define a metric with `level = "episode"`. Let's add one to our weather RAG example: we'll call the metric `user_rating` and collect it as a float.
```toml
[metrics.user_rating]
level = "episode"
optimize = "max"
type = "float"
```
When you submit a feedback request with an `episode_id` instead of an `inference_id`, the feedback is associated with the entire episode.
```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    feedback_result = client.feedback(
        metric_name="user_rating",
        # Set the episode_id to the one returned in the inference response
        episode_id="00000000-0000-0000-0000-000000000000",
        # Set the value for the metric (numeric types will be coerced to float)
        value=5,
    )

print(feedback_result)
```
Sample Output
```python
FeedbackResponse(feedback_id='01940b67-b4ee-7402-9caf-a235717753c7')
```
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        feedback_result = await client.feedback(
            metric_name="user_rating",
            # Set the episode_id to the one returned in the inference response
            episode_id="00000000-0000-0000-0000-000000000000",
            # Set the value for the metric (numeric types will be coerced to float)
            value=5,
        )

        print(feedback_result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
FeedbackResponse(feedback_id='01940b67-cdcd-7583-930a-0c796f1a7fd3')
```
Set the `episode_id` to the one returned in the inference response. The value for a float metric can be any numeric type; under the hood, it is coerced to a float.
```bash
curl -X POST http://localhost:3000/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "metric_name": "user_rating",
    "episode_id": "00000000-0000-0000-0000-000000000000",
    "value": 5
  }'
```
Sample Output
{ "feedback_id": "0191bf8e-0c3f-7b11-9e12-6a7eb823d538" }
That's everything we need for our weather RAG example. It's clearly a simplified example, but it illustrates the core concepts behind TensorZero. You could swap the mocked weather API for real API calls, or, if you're doing document search, anything from BM25 to cutting-edge vector search.
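For instance, swapping in a real weather lookup only touches the code between the two inference calls. Here is a sketch with a hypothetical HTTP weather service (the endpoint and response shape below are made up for illustration):

```python
import requests  # pip install requests

# Hypothetical weather service; substitute your real provider's endpoint
WEATHER_API_URL = "https://api.example-weather.com/v1/current"


def get_temperature(location: str, units: str = "fahrenheit") -> str:
    """Replace the mocked `temperature = "35"` with a real API call."""
    response = requests.get(
        WEATHER_API_URL,
        params={"location": location, "units": units},
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape: {"temperature": 35}
    return str(response.json()["temperature"])
```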
Part IV - Email Data Extraction

JSON Functions

Everything we've done so far has used chat functions. TensorZero also supports JSON functions, which suit use cases that call for structured outputs. The inputs work the same way, but the function returns a JSON value instead of a chat message.

Let's create a JSON function that extracts an email address from a user's message.
The setup is very similar to our previous examples, except that the function is defined with `type = "json"` and requires an `output_schema`. We'll start by defining the schema, a static system template, and the rest of the configuration.
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "email": { "type": "string", "description": "The email address extracted from the user's message." } }, "required": ["email"], "additionalProperties": false}
```
You are a helpful AI assistant that extracts an email address from a user's message.
Return an empty string if no email address is found.

Your output should be a JSON object with the following schema:

{
  "email": "..."
}

---

Examples:

User: Using TensorZero at work? Ping [email protected] to set up a Slack channel (free).
Assistant: {"email": "[email protected]"}

User: I just received an ominous email from [email protected]...
Assistant: {"email": "[email protected]"}

User: Let's sue TensorZero!
Assistant: {"email": ""}
```
```toml
[functions.extract_email]
type = "json"
output_schema = "functions/extract_email/output_schema.json"

[functions.extract_email.variants.simple_variant]
type = "chat_completion"
weight = 1
model = "my_gpt_4o_mini"
system_template = "functions/extract_email/simple_variant/system.minijinja"
```
Once you're done, your file directory structure should look like this:
```
functions/
├── ...
└── extract_email/
    ├── simple_variant/
    │   ├── system.minijinja
    │   └── user.minijinja
    └── output_schema.json
...
tensorzero.toml
```
Finally, let's make an inference request. The request format is very similar to that of chat functions, but the response will include an `output` field instead of a `content` field. The `output` field is a JSON object with `parsed` and `raw` fields: `parsed` holds the parsed output conforming to your schema (or `null` if the model didn't generate JSON matching your schema), and `raw` holds the raw string generated by the model.
```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = client.inference(
        function_name="extract_email",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "blah blah blah [email protected] blah blah blah",
                }
            ]
        },
    )

print(result)
```
Sample Output
```python
JsonInferenceResponse(
    inference_id=UUID('01940b76-8226-7601-a86c-6cb927f05b44'),
    episode_id=UUID('01940b76-80c6-7013-812b-2bcc88e82519'),
    variant_name='simple_variant',
    output=JsonInferenceOutput(
        raw='{"email":"[email protected]"}',
        parsed={'email': '[email protected]'},
    ),
    usage=Usage(
        input_tokens=139,
        output_tokens=11,
    ),
)
```
```python
import asyncio

from tensorzero import AsyncTensorZeroGateway


async def main():
    async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        result = await client.inference(
            function_name="extract_email",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": "blah blah blah [email protected] blah blah blah",
                    }
                ]
            },
        )

        print(result)


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Output
```python
JsonInferenceResponse(
    inference_id=UUID('01940b76-de03-7f81-ad44-59534cf775ae'),
    episode_id=UUID('01940b76-db5d-7cc3-9f43-b86709d5f9bd'),
    variant_name='simple_variant',
    output=JsonInferenceOutput(
        raw='{"email":"[email protected]"}',
        parsed={'email': '[email protected]'},
    ),
    usage=Usage(
        input_tokens=139,
        output_tokens=11,
    ),
)
```
```python
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    result = client.chat.completions.create(
        model="tensorzero::function_name::extract_email",
        messages=[
            {
                "role": "user",
                "content": "blah blah blah [email protected] blah blah blah",
            },
        ],
    )

print(result)
```
Sample Output
```python
ChatCompletion(
    id='01940b75-46ba-7b80-829d-e19d4421c749',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='{"email":"[email protected]"}',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=None,
            ),
        ),
    ],
    created=1735359022,
    model='simple_variant',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=10,
        prompt_tokens=139,
        total_tokens=149,
        completion_tokens_details=None,
        prompt_tokens_details=None,
    ),
    episode_id='01940b75-40fd-74a1-8c01-d09347f890a4',
)
```
```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "extract_email",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "blah blah blah [email protected] blah blah blah"
        }
      ]
    }
  }'
```
Sample Output
{ "inference_id": "0191bf98-2fbc-7781-9197-4a066ea7cd68", "episode_id": "0191bf98-2cb0-7822-9a85-700ac377cf36", "variant_name": "simple_variant", "output": { }, "usage": { "input_tokens": 139, "output_tokens": 10 }}
Conclusion

This tutorial only scratches the surface of what you can do with TensorZero.

TensorZero particularly shines when it comes to optimizing complex LLM workflows using the data collected by the gateway. For example, the structured data the gateway collects can be used to fine-tune models much more effectively than historical prompts and generations alone.

We're working on a series of examples covering the complete "data flywheel out of the box" that TensorZero provides. Here are a few highlights: