Evaluation Sample

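An evaluation sample is a single structured data instance used to assess and measure how your LLM application performs in a specific scenario. It represents one unit of interaction, or one specific use case, that your AI application is expected to handle. In Ragas, evaluation samples are represented by the SingleTurnSample and MultiTurnSample classes.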

SingleTurnSample

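SingleTurnSample represents a single-turn interaction between a user and an LLM, together with the expected results used for evaluation. It is suited to evaluations that involve a single question-and-answer pair, optionally with additional context or reference information.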

Example

The following example demonstrates how to create a SingleTurnSample instance for evaluating a single-turn interaction in a RAG-based application. In this scenario, a user asks a question, and the AI provides an answer. We’ll create a SingleTurnSample instance to represent this interaction, including any retrieved contexts, reference answers, and evaluation rubrics.

from ragas import SingleTurnSample

# User's question
user_input = "What is the capital of France?"

# Retrieved contexts (e.g., from a knowledge base or search engine)
retrieved_contexts = ["Paris is the capital and most populous city of France."]

# AI's response
response = "The capital of France is Paris."

# Reference answer (ground truth)
reference = "Paris"

# Evaluation rubric
rubric = {
    "accuracy": "Correct",
    "completeness": "High",
    "fluency": "Excellent"
}

# Create the SingleTurnSample instance
sample = SingleTurnSample(
    user_input=user_input,
    retrieved_contexts=retrieved_contexts,
    response=response,
    reference=reference,
    rubrics=rubric,  # the SingleTurnSample field is named `rubrics`
)
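
To make the shape of a single-turn sample concrete without depending on Ragas itself, here is a minimal sketch using a plain dataclass. `SingleTurnRecord` and `populated_fields` are hypothetical stand-ins written for illustration, not part of the Ragas API. The field names mirror a subset of the keyword arguments used above, and every field is optional on the assumption that different evaluations need different subsets of the data.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SingleTurnRecord:
    """Hypothetical stand-in for a single-turn sample (not the Ragas class)."""
    user_input: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None
    response: Optional[str] = None
    reference: Optional[str] = None


def populated_fields(record: SingleTurnRecord) -> List[str]:
    """Return the names of the fields that were actually provided."""
    return [f.name for f in fields(record) if getattr(record, f.name) is not None]


# A sample without retrieved contexts, e.g. for a non-RAG question-answer check
record = SingleTurnRecord(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    reference="Paris",
)
print(populated_fields(record))  # ['user_input', 'response', 'reference']
```

Listing the populated fields like this is one simple way to see which evaluations a given sample could support, since a check that compares against retrieved contexts has nothing to work with when that field is empty.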

MultiTurnSample

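MultiTurnSample represents a multi-turn interaction between a human, an AI, and optionally a tool, together with the expected results used for evaluation. It is suited to representing conversational agents in more complex interactions. In MultiTurnSample, the user_input attribute holds a sequence of messages that together form the multi-turn conversation between the human user and the AI system. These messages are instances of the HumanMessage, AIMessage, and ToolMessage classes.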

Example

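The following example demonstrates how to create a MultiTurnSample instance for evaluating a multi-turn interaction. In this scenario, a user wants to know the current weather in New York City. The AI assistant uses a weather API tool to fetch the information and reply to the user.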

from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall

# User asks about the weather in New York City
user_message = HumanMessage(content="What's the weather like in New York City today?")

# AI decides to use a weather API tool to fetch the information
ai_initial_response = AIMessage(
    content="Let me check the current weather in New York City for you.",
    tool_calls=[ToolCall(name="WeatherAPI", args={"location": "New York City"})]
)

# Tool provides the weather information
tool_response = ToolMessage(content="It's sunny with a temperature of 75°F in New York City.")

# AI delivers the final response to the user
ai_final_response = AIMessage(content="It's sunny and 75 degrees Fahrenheit in New York City today.")

# Combine all messages into a list to represent the conversation
conversation = [
    user_message,
    ai_initial_response,
    tool_response,
    ai_final_response
]

Now, use the conversation to create a MultiTurnSample object, including any reference responses and evaluation rubrics.

from ragas import MultiTurnSample

# Reference response used for evaluation
reference_response = "Provide the current weather in New York City to the user."

# Create the MultiTurnSample instance
sample = MultiTurnSample(
    user_input=conversation,
    reference=reference_response,
)
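
The conversation structure can also be sketched with plain dataclasses to show how the message roles fit together. The classes below (`Human`, `AI`, `Tool`) and the helper `tool_messages_are_anchored` are hypothetical stand-ins, not the Ragas message classes. The ordering rule they check, that a tool message should only appear after an AI message that requested a tool call, is an assumption about well-formed conversations made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Human:
    content: str


@dataclass
class AI:
    content: str
    # Each tool call is kept as a plain dict here, e.g. {"name": ..., "args": ...}
    tool_calls: List[Dict[str, object]] = field(default_factory=list)


@dataclass
class Tool:
    content: str


Message = Union[Human, AI, Tool]


def tool_messages_are_anchored(conversation: List[Message]) -> bool:
    """Check that every Tool message follows an AI message that made a tool call.

    Consecutive Tool messages are accepted as long as the run starts right
    after an AI message with at least one tool call.
    """
    tool_call_open = False
    for message in conversation:
        if isinstance(message, AI):
            tool_call_open = bool(message.tool_calls)
        elif isinstance(message, Tool):
            if not tool_call_open:
                return False
        else:  # a Human message ends any pending tool exchange
            tool_call_open = False
    return True


conversation = [
    Human("What's the weather like in New York City today?"),
    AI(
        "Let me check the current weather in New York City for you.",
        tool_calls=[{"name": "WeatherAPI", "args": {"location": "New York City"}}],
    ),
    Tool("It's sunny with a temperature of 75°F in New York City."),
    AI("It's sunny and 75 degrees Fahrenheit in New York City today."),
]
print(tool_messages_are_anchored(conversation))  # True
```

A conversation such as `[Human("hi"), Tool("result")]` fails this check, because the tool output has no preceding AI tool call to attach to.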