Evaluate RAG with Human Feedback

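When you work with a RAG (retrieval-augmented generation) pipeline, evaluating the LLM response on its own is not enough. You also need to assess it together with the retrieved documents, along several dimensions such as context relevance, answer relevance, and faithfulness (whether the answer is factually consistent with the retrieved context).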

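In this example, you create a labeling interface designed to evaluate:

Context relevance of the retrieved documents
Answer relevance
Answer faithfulness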
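Once annotators have labeled a batch of tasks, one option is to report each of these three judgments as a simple proportion. The sketch below is not Label Studio's export format: it assumes each annotation has already been flattened into a hypothetical `RagJudgment` record, with the bucket and choice values taken from the configuration in this example (`relevant` bucket, `Relevant` / `Non Relevant` choices). You would need to map your exported JSON onto it first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RagJudgment:
    """One annotator's judgment of one RAG task (hypothetical flattened format)."""
    retrieved_doc_ids: List[int]  # every document shown in the Context list
    relevant_doc_ids: List[int]   # documents dragged into the "relevant" bucket
    answer_relevancy: str         # "Relevant" or "Non Relevant"
    faithfulness: str             # "Relevant" or "Non Relevant"


def context_relevance(j: RagJudgment) -> float:
    """Fraction of the retrieved documents that the annotator marked as relevant."""
    if not j.retrieved_doc_ids:
        return 0.0
    relevant = set(j.relevant_doc_ids) & set(j.retrieved_doc_ids)
    return len(relevant) / len(j.retrieved_doc_ids)


def summarize(judgments: List[RagJudgment]) -> dict:
    """Average each metric over all judgments (0.0 for an empty batch)."""
    n = len(judgments)
    if n == 0:
        return {"context_relevance": 0.0, "answer_relevancy": 0.0, "faithfulness": 0.0}
    return {
        "context_relevance": sum(context_relevance(j) for j in judgments) / n,
        "answer_relevancy": sum(j.answer_relevancy == "Relevant" for j in judgments) / n,
        "faithfulness": sum(j.faithfulness == "Relevant" for j in judgments) / n,
    }


judgments = [
    RagJudgment([0, 1], [1], "Relevant", "Relevant"),
    RagJudgment([0, 1], [0, 1], "Relevant", "Non Relevant"),
]
print(summarize(judgments))
# {'context_relevance': 0.75, 'answer_relevancy': 1.0, 'faithfulness': 0.5}
```

Plain proportions keep the metrics easy to compare across pipeline versions; if several annotators label the same task, you might aggregate per task first (for example, majority vote) before averaging.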

For a tutorial on how to use this template with the Label Studio SDK, see Evaluate LLM Responses.

Configure the labeling interface

Create a project and set up the following labeling configuration:

<View>
  <Style>
    .htx-text {white-space: pre-wrap;}
    .question {
      font-size: 120%;
      width: 800px;
      margin-bottom: 0.5em;
      border: 1px solid #eee;
      padding: 0 1em 1em 1em;
      background: #fefefe;
    }
    .answer {
      font-size: 120%;
      width: 800px;
      margin-top: 0.5em;
      border: 1px solid #eee;
      padding: 0 1em 1em 1em;
      background: #fefefe;
    }
    .doc-body {
      white-space: pre-wrap;
      overflow-wrap: break-word;
      word-break: keep-all;
    }
    .doc-footer {
      font-size: 85%;
      overflow-wrap: break-word;
      word-break: keep-all;
    }
    h3 + p + p {font-size: 85%;} /* doc id */
  </Style>

  <View className="question">
    <Header value="Question"/>
    <Text name="question" value="$question"/>
  </View>

  <View style="margin-top: 2em">
    <Header value="Context"/>
    <List name="results" value="$similar_docs" title="Retrieved Documents"/>
    <Ranker name="rank" toName="results">
      <Bucket name="relevant" title="Relevant"/>
      <Bucket name="non_relevant" title="Non Relevant"/>
    </Ranker>
  </View>

  <View className="answer">
    <Header value="Answer"/>
    <Text name="answer" value="$answer"/>
  </View>
  <Collapse>
    <Panel value="How relevant is the answer to the provided context?">
      <Choices name="answer_relevancy" toName="question" showInline="true">
        <Choice value="Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
  &lt;div class=&quot;thumb-box&quot; id=&quot;thumb-up&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
      &lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&amp;#128077;&lt;/span&gt; &lt;!-- Thumbs Up Emoji --&gt;
  &lt;/div&gt;&lt;/div&gt;"/>
        <Choice value="Non Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-down&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
      &lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&amp;#128078;&lt;/span&gt; &lt;!-- Thumbs Down Emoji --&gt;
  &lt;/div&gt;
&lt;/div&gt;"/>
      </Choices>

    </Panel>
  </Collapse>

  <Collapse>
    <Panel value="Does the answer factually align with the retrieved context?">
      <Choices name="faithfulness" toName="question" showInline="true">
        <Choice value="Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
  &lt;div class=&quot;thumb-box&quot; id=&quot;thumb-up&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
      &lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&amp;#128077;&lt;/span&gt; &lt;!-- Thumbs Up Emoji --&gt;
  &lt;/div&gt;&lt;/div&gt;"/>
        <Choice value="Non Relevant" html="&lt;div class=&quot;thumb-container&quot; style=&quot;display: flex; gap: 20px;&quot;&gt;
&lt;div class=&quot;thumb-box&quot; id=&quot;thumb-down&quot; style=&quot;width: 100px; height: 100px; display: flex; align-items: center; justify-content: center; border: 1px solid #ccc; border-radius: 5px; cursor: pointer; transition: background-color 0.3s;&quot;&gt;
      &lt;span class=&quot;thumb-icon&quot; style=&quot;font-size: 48px;&quot;&gt;&amp;#128078;&lt;/span&gt; &lt;!-- Thumbs Down Emoji --&gt;
  &lt;/div&gt;
&lt;/div&gt;"/>
      </Choices>

    </Panel>
  </Collapse>
</View>

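This configuration includes the following elements:

<View> - All labeling configurations must include a base View tag. In this configuration, the View tag is used to configure the display of blocks, similar to the div tag in HTML. It helps organize the layout of the labeling interface.
<Style> - The Style tag defines CSS styles that apply to elements inside the View. In this configuration, it styles the classes used by the different parts of the labeling interface layout.
<Header> - The Header tag displays a title or heading in the labeling interface. The heading text is set with the value parameter.
<Text> - The Text tag displays text provided by the input data. Based on the example input data below, the text blocks display the information from the question or answer key of the source JSON. You might need to adjust these variables to match your own JSON data.
<List> - The List tag lists the retrieved documents. Based on the example input data below, the list is populated from the similar_docs field of the source JSON.
<Ranker> - The Ranker tag creates a UI element that lets you sort list items by dragging and dropping them into different groups.
<Bucket> - The Bucket tag defines a category, or container, inside the Ranker into which items can be placed.
<Collapse> - The Collapse tag creates a collapsible section that users can expand or collapse by clicking it.
<Panel> - The Panel tag is used inside a Collapse element to define the content that can be expanded or collapsed.
<Choices> - The Choices tag provides a set of options for the annotator to choose from. The name parameter identifies the tag, and toName specifies the object tag it applies to.
<Choice> - The Choice tag defines an individual option within a Choices tag. In this example, the options are styled as clickable thumbs-up and thumbs-down icons.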
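Because each control tag (Ranker, Choices) points at an object tag through toName, a typo there is easy to miss until you open the project. A minimal sketch that cross-checks this with Python's standard `xml.etree.ElementTree`. It assumes the object tags in your config are among those listed in the hypothetical `OBJECT_TAGS` set (extend it if you use others), and the sample config is trimmed from the one above with a deliberate typo (`answr`) added to show a failure.

```python
import xml.etree.ElementTree as ET

# Assumption: object tags used in the config; extend this set if you use other ones.
OBJECT_TAGS = {"Text", "List", "Image", "HyperText"}


def check_to_names(config_xml: str) -> list:
    """Return one message per control tag whose toName matches no object tag name."""
    root = ET.fromstring(config_xml)
    object_names = {el.get("name") for el in root.iter() if el.tag in OBJECT_TAGS}
    problems = []
    for el in root.iter():
        to_name = el.get("toName")
        if to_name is not None and to_name not in object_names:
            problems.append(f'<{el.tag} name="{el.get("name")}"> -> missing "{to_name}"')
    return problems


config = """
<View>
  <Text name="question" value="$question"/>
  <List name="results" value="$similar_docs" title="Retrieved Documents"/>
  <Ranker name="rank" toName="results">
    <Bucket name="relevant" title="Relevant"/>
  </Ranker>
  <Choices name="answer_relevancy" toName="question">
    <Choice value="Relevant"/>
  </Choices>
  <Choices name="faithfulness" toName="answr">
    <Choice value="Relevant"/>
  </Choices>
</View>
"""
print(check_to_names(config))
# ['<Choices name="faithfulness"> -> missing "answr"']
```

Parsing the full configuration above should also work, since its embedded HTML is entity-escaped inside the attributes.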

Input data

In this example, you include the prompt, the response, and the documents used as context.

[
  {
    "data": {
      "question": "Can I use Label Studio for LLM evaluation?",
      "answer": "Yes, you can use Label Studio for LLM evaluation.",
      "similar_docs": [
        {"id": 0, "body": "Label Studio is a data labeling tool."},
        {"id": 1, "body": "Label Studio is a data labeling tool for AI projects."}
      ]
    }
  }
]
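The $question, $answer, and $similar_docs variables in the configuration are looked up in each task's data object, so a missing or misspelled key will not render as intended. A minimal sketch that checks tasks against the shape of the example above before import. `validate_task` is a hypothetical helper, not part of Label Studio, and the id/body requirement for list items simply mirrors the example data; check the List tag documentation for the full item schema.

```python
from typing import Any, Dict, List

# Keys referenced by $question, $answer, and $similar_docs in this example config.
REQUIRED_KEYS = ("question", "answer", "similar_docs")


def validate_task(task: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one task; an empty list means it looks OK."""
    data = task.get("data")
    if not isinstance(data, dict):
        return ['task has no "data" object']
    problems = [f'missing "{key}"' for key in REQUIRED_KEYS if key not in data]
    docs = data.get("similar_docs", [])
    if not isinstance(docs, list):
        problems.append('"similar_docs" must be a list')
    else:
        for i, doc in enumerate(docs):
            if not isinstance(doc, dict) or "id" not in doc or "body" not in doc:
                problems.append(f'similar_docs[{i}] needs "id" and "body"')
    return problems


tasks = [
    {
        "data": {
            "question": "Can I use Label Studio for LLM evaluation?",
            "answer": "Yes, you can use Label Studio for LLM evaluation.",
            "similar_docs": [
                {"id": 0, "body": "Label Studio is a data labeling tool."},
                {"id": 1, "body": "Label Studio is a data labeling tool for AI projects."},
            ],
        }
    }
]
for t in tasks:
    print(validate_task(t))  # [] for the example task
```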

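Using LlamaIndex

You can use the LlamaIndex framework to collect this kind of data. The GitHub reader used below ships as a separate package:

pip install llama-index llama-index-readers-github

For example, you can use the following script to build a RAG pipeline that answers user questions from the closed issues of the HumanSignal/label-studio GitHub repository: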

import os

from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler
from llama_index.readers.github import GitHubIssuesClient, GitHubRepositoryIssuesReader

# GitHubIssuesClient needs a GitHub access token (typically taken from the GITHUB_TOKEN
# environment variable); LlamaIndex uses OpenAI models by default, so set OPENAI_API_KEY too
reader = GitHubRepositoryIssuesReader(
    github_client=GitHubIssuesClient(),
    owner="HumanSignal",
    repo="label-studio",
)

# LlamaDebugHandler records callback events so the retrieved nodes can be inspected later
llama_debug = LlamaDebugHandler()
callback_manager = CallbackManager([llama_debug])

# check if storage already exists
PERSIST_DIR = "./llama-index-storage"
if not os.path.exists(PERSIST_DIR):
    # load the closed issues and create the index
    documents = reader.load_data(state=GitHubRepositoryIssuesReader.IssueState.CLOSED)
    index = VectorStoreIndex.from_documents(documents, callback_manager=callback_manager)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context, callback_manager=callback_manager)

query_engine = index.as_query_engine()
question = "Can I use Label Studio for LLM evaluation?"
response = query_engine.query(question)
# convert the Response object to plain text so it can be stored in a JSON task
answer = str(response)

# access the list of top retrieved documents from the RETRIEVE event recorded by the callback
event_pairs = llama_debug.get_event_pairs(CBEventType.RETRIEVE)
retrieved_nodes = list(event_pairs[0][1].payload.values())[0]
retrieved_documents = [node.text for node in retrieved_nodes]

Now you can build a task that can be imported with the SDK directly into a Label Studio project that uses the labeling configuration described above:

task = {
  "question": question,
  "answer": str(answer),  # str() ensures a plain, JSON-serializable string
  "similar_docs": [{"id": i, "body": text} for i, text in enumerate(retrieved_documents)]
}
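If you would rather upload tasks through the Label Studio UI than import them with the SDK, one option is to write them to a JSON file in the same [{"data": {...}}] shape as the input data example above. A minimal sketch using only the standard library; `build_task`, `write_tasks`, and the file name `rag_tasks.json` are hypothetical, and the placeholder values stand in for the question, answer, and retrieved_documents produced by the script above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def build_task(question: str, answer: str, retrieved_documents: List[str]) -> Dict[str, Any]:
    """Wrap one RAG result in the {"data": {...}} shape used by the example input data."""
    return {
        "data": {
            "question": question,
            "answer": answer,
            "similar_docs": [{"id": i, "body": text} for i, text in enumerate(retrieved_documents)],
        }
    }


def write_tasks(tasks: List[Dict[str, Any]], path: str = "rag_tasks.json") -> Path:
    """Write the tasks as a single JSON array, one element per task."""
    out = Path(path)
    out.write_text(json.dumps(tasks, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


# Placeholder values; in practice pass question, str(answer), and retrieved_documents.
task = build_task(
    "Can I use Label Studio for LLM evaluation?",
    "Yes, you can use Label Studio for LLM evaluation.",
    ["Label Studio is a data labeling tool."],
)
print(write_tasks([task]))  # rag_tasks.json
```

Collecting several questions into one list before calling write_tasks keeps a whole evaluation batch in a single import file.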
