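Evaluate RAG with Human Feedback
When working with a RAG (retrieval-augmented generation) pipeline, your goal is not only to evaluate a single LLM response, but also to assess the retrieved documents along several dimensions, using metrics such as contextual relevancy, answer relevancy, and faithfulness.
In this example, you will create a labeling interface that evaluates:
- Contextual relevancy of the retrieved documents
- Answer relevancy
- Answer faithfulness
For a tutorial on how to use this template with the Label Studio SDK, see Evaluate LLM Responses.
Configure the labeling interface
Create a project and set up the following labeling configuration: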
<View>
<Style>
.htx-text {white-space: pre-wrap;}
.question {
font-size: 120%;
width: 800px;
margin-bottom: 0.5em;
border: 1px solid #eee;
padding: 0 1em 1em 1em;
background: #fefefe;
}
.answer {
font-size: 120%;
width: 800px;
margin-top: 0.5em;
border: 1px solid #eee;
padding: 0 1em 1em 1em;
background: #fefefe;
}
.doc-body {
white-space: pre-wrap;
overflow-wrap: break-word;
word-break: keep-all;
}
.doc-footer {
font-size: 85%;
overflow-wrap: break-word;
word-break: keep-all;
}
h3 + p + p {font-size: 85%;} /* doc id */
</Style>
<View className="question">
<Header value="Question"/>
<Text name="question" value="$question"/>
</View>
<View style="margin-top: 2em">
<Header value="Context"/>
<List name="results" value="$similar_docs" title="Retrieved Documents"/>
<Ranker name="rank" toName="results">
<Bucket name="relevant" title="Relevant"/>
<Bucket name="non_relevant" title="Non Relevant"/>
</Ranker>
</View>
<View className="answer">
<Header value="Answer"/>
<Text name="answer" value="$answer"/>
</View>
<Collapse>
<Panel value="How relevant is the answer to the provided context?">
<Choices name="answer_relevancy" toName="question" showInline="true">
<Choice value="Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-up&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
&lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&#128077;&lt;/span&gt; &lt;!-- Thumbs Up Emoji --&gt;
&lt;/div&gt;&lt;/div&gt;"/>
<Choice value="Non Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-down&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
&lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&#128078;&lt;/span&gt; &lt;!-- Thumbs Down Emoji --&gt;
&lt;/div&gt;
&lt;/div&gt;"/>
</Choices>
</Panel>
</Collapse>
<Collapse>
<Panel value="Does the answer factually align with the retrieved context?">
<Choices name="faithfulness" toName="question" showInline="true">
<Choice value="Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-up&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
&lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&#128077;&lt;/span&gt; &lt;!-- Thumbs Up Emoji --&gt;
&lt;/div&gt;&lt;/div&gt;"/>
<Choice value="Non Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-down&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
&lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&#128078;&lt;/span&gt; &lt;!-- Thumbs Down Emoji --&gt;
&lt;/div&gt;
&lt;/div&gt;"/>
</Choices>
</Panel>
</Collapse>
</View>
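This configuration includes the following elements:
- <View> - All labeling configurations must include a base View tag. In this configuration, the View tag is used to configure the display of blocks, similar to the div tag in HTML. It helps organize the layout of the labeling interface.
- <Style> - The Style tag defines CSS styles that apply to the elements inside the View. In this configuration, it sets the styles for the classes used by the different sections of the labeling interface.
- <Header> - The Header tag displays a title or heading in the labeling interface. The heading text is defined by the value parameter.
- <Text> - The Text tag displays text content taken from the input data. With the example input data below, the text blocks display the information under the question and answer keys of the source JSON. You may need to adjust these variables to match your own JSON data.
- <List> - The List tag lists the retrieved documents. With the example input data below, the list is populated from the similar_docs field of the source JSON.
- <Ranker> - The Ranker tag creates a user interface element that lets you rank list items by dragging and dropping them into different buckets.
- <Bucket> - The Bucket tag defines a category, or container, inside the Ranker into which items can be placed.
- <Collapse> - The Collapse tag creates a collapsible area that users can expand or collapse by clicking on it.
- <Panel> - The Panel tag is used inside the Collapse element and defines the content that is expanded or collapsed.
- <Choices> - The Choices tag provides a set of options for annotators to choose from, connected to other tags through the name and toName parameters.
- <Choice> - The Choice tag defines a single option inside the Choices tag. In this example, the options are styled as clickable thumbs-up and thumbs-down icons.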
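The html parameter of a Choice tag carries HTML markup inside an XML attribute, so its angle brackets and quotes have to be escaped for the configuration to stay well-formed. The following is a minimal sketch of producing such an attribute with Python's standard library. The helper name choice_tag and the shortened markup are illustrative assumptions, not part of the template.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def choice_tag(value: str, inner_html: str) -> str:
    """Return a <Choice> element whose html attribute is safely escaped.

    Hypothetical helper for illustration; it is not part of Label Studio.
    """
    # quoteattr escapes &, < and >, and picks a quote style (or escapes quotes)
    # so that the result is always a valid XML attribute value.
    return f"<Choice value={quoteattr(value)} html={quoteattr(inner_html)}/>"


# Shortened stand-in for the thumbs-up markup used in the configuration.
thumb_up = (
    '<div class="thumb-box" id="thumb-up">'
    '<span class="thumb-icon" style="font-size: 48px;">\U0001F44D</span>'
    "</div>"
)

tag = choice_tag("Relevant", thumb_up)
print(tag)

# Round trip: an XML parser recovers the original markup unchanged.
parsed = ET.fromstring(tag)
assert parsed.get("html") == thumb_up
```

Generating the attribute this way avoids hand-escaping mistakes when the icon markup changes.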
Input data
In this example, the input data includes the prompt, the response, and the documents used as context.
[
{
"data": {
"question": "Can I use Label Studio for LLM evaluation?",
"answer": "Yes, you can use Label Studio for LLM evaluation.",
"similar_docs": [
{"id": 0, "body": "Label Studio is a data labeling tool."},
{"id": 1, "body": "Label Studio is a data labeling tool for AI projects."}
]
}
}
]
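Because the labeling configuration refers to the input data through variables such as $question, a task with a missing key will not display correctly. The sketch below is one way to check tasks against a configuration before importing them. It uses a trimmed-down copy of the configuration with styling omitted, and the function names config_variables and missing_keys are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the labeling configuration (styling omitted).
LABEL_CONFIG = """
<View>
  <Text name="question" value="$question"/>
  <List name="results" value="$similar_docs" title="Retrieved Documents"/>
  <Ranker name="rank" toName="results">
    <Bucket name="relevant" title="Relevant"/>
    <Bucket name="non_relevant" title="Non Relevant"/>
  </Ranker>
  <Text name="answer" value="$answer"/>
</View>
"""


def config_variables(label_config: str) -> set[str]:
    """Collect the $variables referenced by value attributes in the config."""
    variables = set()
    for element in ET.fromstring(label_config).iter():
        match = re.fullmatch(r"\$(\w+)", element.get("value", ""))
        if match:
            variables.add(match.group(1))
    return variables


def missing_keys(task: dict, label_config: str) -> set[str]:
    """Return the config variables that have no matching key in task["data"]."""
    return config_variables(label_config) - set(task.get("data", {}))


tasks = [{
    "data": {
        "question": "Can I use Label Studio for LLM evaluation?",
        "answer": "Yes, you can use Label Studio for LLM evaluation.",
        "similar_docs": [
            {"id": 0, "body": "Label Studio is a data labeling tool."},
            {"id": 1, "body": "Label Studio is a data labeling tool for AI projects."},
        ],
    }
}]

for task in tasks:
    print(missing_keys(task, LABEL_CONFIG) or "all variables present")
```

An empty result means every variable used by the configuration has a value in the task.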
Use LlamaIndex
You can collect this kind of data with the LlamaIndex framework.
pip install llama-index llama-index-readers-github
For example, you can use a script to create a RAG pipeline that answers user queries about GitHub issues:
import os

from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler
from llama_index.readers.github import GitHubIssuesClient, GitHubRepositoryIssuesReader

# GitHubIssuesClient expects a GitHub token, by default from the GITHUB_TOKEN environment variable
reader = GitHubRepositoryIssuesReader(
    github_client=GitHubIssuesClient(),
    owner="HumanSignal",
    repo="label-studio",
)

# the debug handler records retrieval events so the retrieved nodes can be read back later
llama_debug = LlamaDebugHandler()
callback_manager = CallbackManager([llama_debug])

# check if storage already exists
PERSIST_DIR = "./llama-index-storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = reader.load_data(state=GitHubRepositoryIssuesReader.IssueState.CLOSED)
    index = VectorStoreIndex.from_documents(documents, callback_manager=callback_manager)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context, callback_manager=callback_manager)

query_engine = index.as_query_engine()
question = "Can I use Label Studio for LLM evaluation?"
answer = query_engine.query(question)

# accessing the list of top retrieved documents from callback
event_pairs = llama_debug.get_event_pairs(CBEventType.RETRIEVE)
# event_pairs[0] is the (start, end) pair of the first retrieval; the end event's payload holds the nodes
retrieved_nodes = list(event_pairs[0][1].payload.values())[0]
retrieved_documents = [node.text for node in retrieved_nodes]
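As an alternative to reading the retrieved nodes from the debug callback, the response object returned by query_engine.query also exposes them through its source_nodes attribute. The sketch below shows that route as a swapped-in technique, not the approach used by the script above. It uses stand-in objects so that it runs without an index or API keys, and the helper name docs_from_response is hypothetical.

```python
from types import SimpleNamespace


def docs_from_response(response) -> list[dict]:
    """Turn a response's source nodes into the similar_docs structure."""
    return [
        {"id": i, "body": node.text}
        for i, node in enumerate(response.source_nodes)
    ]


# Stand-in for a real query response: only the attributes used above are modeled.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(text="Label Studio is a data labeling tool.", score=0.83),
        SimpleNamespace(text="Label Studio is a data labeling tool for AI projects.", score=0.79),
    ]
)

similar_docs = docs_from_response(fake_response)
print(similar_docs)
```

With a real pipeline, you would pass the value returned by query_engine.query in place of fake_response.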
Now you can build a task that can be imported with the SDK directly into a Label Studio project that uses the labeling configuration described above:
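task = {
    "question": question,
    # the query result is a response object; convert it to plain text so the task is JSON-serializable
    "answer": str(answer),
    "similar_docs": [{"id": i, "body": text} for i, text in enumerate(retrieved_documents)],
}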
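The following sketch shows one possible way to get such tasks into a project: write them to a JSON file that can be uploaded through the Label Studio import dialog, or send them with the SDK. The function names are hypothetical, and import_with_sdk assumes the label-studio-sdk 1.x client (LabelStudio and projects.import_tasks), so check the calls against the SDK version you have installed. It is defined here but not called.

```python
import json
import tempfile
from pathlib import Path


def to_import_payload(question, answer, retrieved_documents) -> dict:
    """Build one JSON-serializable task in the shape of the input data example."""
    return {
        "data": {
            "question": question,
            # str() turns a response object into plain text
            "answer": str(answer),
            "similar_docs": [
                {"id": i, "body": text} for i, text in enumerate(retrieved_documents)
            ],
        }
    }


def save_tasks(tasks: list[dict], path: str) -> Path:
    """Write tasks to a JSON file suitable for manual import."""
    out = Path(path)
    out.write_text(json.dumps(tasks, indent=2, ensure_ascii=False), encoding="utf-8")
    return out


def import_with_sdk(tasks: list[dict], project_id: int, base_url: str, api_key: str) -> None:
    """Send tasks to an existing project. Assumes label-studio-sdk 1.x."""
    # Imported lazily so the rest of this sketch runs without the SDK installed.
    from label_studio_sdk.client import LabelStudio

    client = LabelStudio(base_url=base_url, api_key=api_key)
    # Each request item is the content of one task's "data" dictionary.
    client.projects.import_tasks(id=project_id, request=[t["data"] for t in tasks])


tasks = [
    to_import_payload(
        "Can I use Label Studio for LLM evaluation?",
        "Yes, you can use Label Studio for LLM evaluation.",
        ["Label Studio is a data labeling tool."],
    )
]

with tempfile.TemporaryDirectory() as tmp:
    path = save_tasks(tasks, f"{tmp}/rag_tasks.json")
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded == tasks)
```

Keeping the payload builder separate from the upload step makes it possible to inspect or version the JSON before anything is sent to the server.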