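Evaluation Sample
An evaluation sample is a single structured data instance used to assess and measure the performance of your LLM application in specific scenarios. It represents a single unit of interaction or a specific use case that the AI application is expected to handle. In Ragas, evaluation samples are represented using the SingleTurnSample and MultiTurnSample classes.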
SingleTurnSample
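SingleTurnSample represents a single-turn interaction between a user and an LLM, together with the expected results for evaluation. It is suitable for evaluations that involve a single question-and-answer pair, possibly with additional context or reference information.
Example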
The following example demonstrates how to create a SingleTurnSample instance for evaluating a single-turn interaction in a RAG-based application. In this scenario, a user asks a question, and the AI provides an answer. We’ll create a SingleTurnSample instance to represent this interaction, including any retrieved contexts, reference answers, and evaluation rubrics.
from ragas import SingleTurnSample
# User's question
user_input = "What is the capital of France?"
# Retrieved contexts (e.g., from a knowledge base or search engine)
retrieved_contexts = ["Paris is the capital and most populous city of France."]
# AI's response
response = "The capital of France is Paris."
# Reference answer (ground truth)
reference = "Paris"
# Evaluation rubric
rubric = {
    "accuracy": "Correct",
    "completeness": "High",
    "fluency": "Excellent",
}
# Create the SingleTurnSample instance
sample = SingleTurnSample(
    user_input=user_input,
    retrieved_contexts=retrieved_contexts,
    response=response,
    reference=reference,
    # Note: the field on SingleTurnSample is named `rubrics` (plural)
    rubrics=rubric,
)
MultiTurnSample
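MultiTurnSample represents a multi-turn interaction between a human, an AI and, optionally, a tool, together with the expected results for evaluation. It is suitable for representing conversational agents in more complex interactions for evaluation. In MultiTurnSample, the user_input attribute holds a sequence of messages that together make up the multi-turn conversation between a human user and an AI system. These messages are instances of the classes HumanMessage, AIMessage, and ToolMessage.
Example
The following example demonstrates how to create a MultiTurnSample instance for evaluating a multi-turn interaction. In this scenario, a user wants to know the current weather in New York City. The AI assistant uses a weather API tool to fetch the information and responds to the user.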
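To make the shape of such a message sequence concrete, here is a minimal sketch that uses plain dataclasses as hypothetical stand-ins for the three message types. The names Human, AI, Tool and find_orphan_tool_messages are invented for illustration and are not the Ragas classes. The ordering rule it checks (a tool message should follow an AI message that issued tool calls) is an assumption about well-formed tool-using conversations, not a documented Ragas guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Hypothetical stand-ins for illustration only; these are NOT the Ragas classes.
@dataclass
class Human:
    content: str


@dataclass
class Tool:
    content: str


@dataclass
class AI:
    content: str
    tool_calls: List[Dict] = field(default_factory=list)


Message = Union[Human, AI, Tool]


def find_orphan_tool_messages(conversation: List[Message]) -> List[int]:
    """Return indices of tool messages not preceded by an AI turn with tool calls."""
    orphans = []
    tool_call_open = False
    for index, message in enumerate(conversation):
        if isinstance(message, AI):
            # An AI turn opens a tool call only if it actually requested one.
            tool_call_open = bool(message.tool_calls)
        elif isinstance(message, Tool):
            # Several tool results may follow one AI turn, so the flag stays set.
            if not tool_call_open:
                orphans.append(index)
        else:
            # A human turn closes any pending tool call.
            tool_call_open = False
    return orphans


conversation = [
    Human("What's the weather like in New York City today?"),
    AI(
        "Let me check the current weather in New York City for you.",
        tool_calls=[{"name": "WeatherAPI", "args": {"location": "New York City"}}],
    ),
    Tool("It's sunny with a temperature of 75°F in New York City."),
    AI("It's sunny and 75 degrees Fahrenheit in New York City today."),
]

print(find_orphan_tool_messages(conversation))
```

A check like this can be handy when conversations are assembled from logs, where tool outputs sometimes end up detached from the AI turn that requested them.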
from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall
# User asks about the weather in New York City
user_message = HumanMessage(content="What's the weather like in New York City today?")
# AI decides to use a weather API tool to fetch the information
ai_initial_response = AIMessage(
    content="Let me check the current weather in New York City for you.",
    tool_calls=[ToolCall(name="WeatherAPI", args={"location": "New York City"})],
)
# Tool provides the weather information
tool_response = ToolMessage(content="It's sunny with a temperature of 75°F in New York City.")
# AI delivers the final response to the user
ai_final_response = AIMessage(content="It's sunny and 75 degrees Fahrenheit in New York City today.")
# Combine all messages into a list to represent the conversation
conversation = [
    user_message,
    ai_initial_response,
    tool_response,
    ai_final_response,
]
Now, use the conversation to create a MultiTurnSample object, including any reference responses and evaluation rubrics.