
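Ragas Experimental

A framework for applying Evaluation-Driven Development (EDD) to AI applications.

The goal of Ragas Experimental is to evolve Ragas into a general-purpose evaluation framework for AI applications. It helps teams design, run, and reason about evaluations in any AI workflow. Beyond the tooling, it offers a mental model that treats evaluation not only as a diagnostic tool but as the backbone of iterative improvement.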

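✨ Introduction

  • 🚀 Tutorials

    Step-by-step guides to help you get started with Ragas Experimental. Learn how to evaluate AI applications such as RAG pipelines and agents through practical examples.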

    Tutorials

  • 📚 Core Concepts

    A deeper look at the principles of evaluation and at how Ragas Experimental supports evaluation-driven development for AI applications.

    Core Concepts

Hello World 👋

1. Install Ragas Experimental with the local backend

pip install ragas-experimental && pip install "ragas-experimental[local]"

2. Copy this snippet into a file named hello_world.py and run python hello_world.py

import numpy as np
from ragas_experimental import Dataset
from ragas import experiment
from ragas_experimental.metrics import MetricResult, discrete_metric  

# Define a custom metric for accuracy
@discrete_metric(name="accuracy_score", allowed_values=["pass", "fail"])
def accuracy_score(response: str, expected: str):
    result = "pass" if expected.lower().strip() == response.lower().strip() else "fail"
    return MetricResult(value=result, reason=f"Match: {result == 'pass'}")

# Mock application endpoint that simulates an AI application response.
# It ignores the query and returns a random answer, so accuracy varies between runs.
def mock_app_endpoint(**kwargs) -> str:
    return str(np.random.choice(["Paris", "4", "Blue Whale", "Einstein", "Python"]))

# Create an experiment that uses the mock application endpoint and the accuracy metric
@experiment()
async def run_experiment(row):
    response = mock_app_endpoint(query=row.get("query"))
    accuracy = accuracy_score.score(response=response, expected=row.get("expected_output"))
    return {**row, "response": response, "accuracy": accuracy.value}

if __name__ == "__main__":
    import asyncio

    # Create dataset inline
    dataset = Dataset(name="test_dataset", backend="local/csv", root_dir=".")
    test_data = [
        {"query": "What is the capital of France?", "expected_output": "Paris"},
        {"query": "What is 2 + 2?", "expected_output": "4"},
        {"query": "What is the largest animal?", "expected_output": "Blue Whale"},
        {"query": "Who developed the theory of relativity?", "expected_output": "Einstein"},
        {"query": "What programming language is named after a snake?", "expected_output": "Python"},
    ]

    for sample in test_data:
        dataset.append(sample)
    dataset.save()

    # Run experiment
    results = asyncio.run(run_experiment.arun(dataset, name="first_experiment"))
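
The comparison inside accuracy_score can be tried on its own, without installing anything. The following is a minimal sketch of that exact-match rule in plain Python; the helper name exact_match is hypothetical and is not part of Ragas Experimental.

```python
# Hypothetical helper (not a Ragas API): the same case-insensitive,
# whitespace-insensitive comparison used by accuracy_score above.
def exact_match(response: str, expected: str) -> str:
    """Return "pass" when both strings match after lowercasing and stripping."""
    return "pass" if expected.lower().strip() == response.lower().strip() else "fail"


# Differences in case and surrounding whitespace are ignored.
print(exact_match("  paris ", "Paris"))
# Any other difference counts as a failure.
print(exact_match("Blue Whale", "Einstein"))
```

Because the mock endpoint picks its answer at random, most rows in a given run will likely be marked "fail"; the rule itself is deterministic.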

3. Check the current directory structure to see the created dataset and experiment results.

tree .

Output:

├── datasets
│   └── test_dataset.csv
└── experiments
    └── first_experiment.csv
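
The tree command is not installed on every system. As a rough alternative, the sketch below lists the same CSV files using only the Python standard library. It assumes the datasets/ and experiments/ folder names shown in the listing above; the function name list_csv_outputs is hypothetical.

```python
from pathlib import Path


# Hypothetical helper: collect CSV files from the two folders shown above.
def list_csv_outputs(root: str = ".") -> list[str]:
    """Return relative paths of CSV files under datasets/ and experiments/."""
    base = Path(root)
    found: list[str] = []
    for folder in ("datasets", "experiments"):
        directory = base / folder
        if not directory.is_dir():
            continue  # Folder not created yet, e.g. the script has not been run.
        found.extend(
            sorted(p.relative_to(base).as_posix() for p in directory.glob("*.csv"))
        )
    return found


# Prints an empty list if hello_world.py has not been run in this directory.
print(list_csv_outputs("."))
```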

4. View the results of your first experiment

open experiments/first_experiment.csv

Output:
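
The results file can also be summarized in code rather than opened by hand. The sketch below counts the values in the accuracy column using the standard csv module. It assumes the local/csv backend writes the keys of the dictionary returned by run_experiment as column headers, so that a column named accuracy exists; the function name summarize_accuracy is hypothetical.

```python
import csv
import os
from collections import Counter


# Hypothetical helper: tally the values found in one column of a results CSV.
def summarize_accuracy(csv_path: str, column: str = "accuracy") -> Counter:
    """Count how many rows hold each value in the given column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))


# Assumed path, taken from the directory listing in step 3.
results_path = os.path.join("experiments", "first_experiment.csv")
if os.path.exists(results_path):
    counts = summarize_accuracy(results_path)
    total = sum(counts.values())
    print(f"pass: {counts['pass']} / {total}")
```

The exact counts differ from run to run, because the mock endpoint returns a random answer for each row.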