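Getting started

Get started with RedisVL

RedisVL is a versatile Python library with an integrated CLI, designed to enhance AI applications built on Redis. This guide walks you through the following steps:

Define an IndexSchema.
Prepare a sample dataset.
Create a SearchIndex object.
Test the rvl CLI functionality.
Load the sample data.
Build a VectorQuery object and execute a search.
Update a SearchIndex object.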

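Note:

This document is a converted form of this Jupyter notebook.

Before you begin, make sure of the following:

You have installed RedisVL and activated that environment.
You have a running Redis instance with Redis Query Engine capabilities.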

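Define an `IndexSchema`

The IndexSchema holds the index configuration and field definitions that enable search in Redis. For convenience, the schema can be constructed from a Python dictionary or a YAML file.

Example schema creation

Consider a dataset with user information, including job, age, credit_score, and a three-dimensional user_embedding vector.

You must decide on a Redis index name and key prefix to use for this dataset. Below are example schema definitions in YAML and Python dict formats.

YAML definition: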

version: '0.1.0'

index:
  name: user_simple
  prefix: user_simple_docs

fields:
    - name: user
      type: tag
    - name: credit_score
      type: tag
    - name: job
      type: text
    - name: age
      type: numeric
    - name: user_embedding
      type: vector
      attrs:
        algorithm: flat
        dims: 3
        distance_metric: cosine
        datatype: float32

Store this information in a local file, such as schema.yaml, for use with RedisVL.

Python dictionary:

schema = {
    "index": {
        "name": "user_simple",
        "prefix": "user_simple_docs",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ]
}
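Because a mistyped field name or a missing attribute is easy to overlook, it can help to run a quick structural check on the dictionary before handing it to RedisVL. The following is a minimal sketch in plain Python. `check_schema` is a hypothetical helper, not a RedisVL API, and the set of expected vector attributes is taken from the example above rather than from the library's own validation rules.

```python
def check_schema(schema):
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    index = schema.get("index", {})
    for key in ("name", "prefix"):
        if key not in index:
            problems.append(f"index is missing '{key}'")

    fields = schema.get("fields", [])
    names = [field["name"] for field in fields]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {duplicates}")

    # Attributes used by the vector field in the example above (an assumption,
    # not an exhaustive list of what RedisVL itself validates).
    expected_attrs = {"dims", "distance_metric", "algorithm", "datatype"}
    for field in fields:
        if field["type"] == "vector":
            missing = sorted(expected_attrs - set(field.get("attrs", {})))
            if missing:
                problems.append(f"{field['name']} is missing attrs: {missing}")
    return problems


good = {
    "index": {"name": "user_simple", "prefix": "user_simple_docs"},
    "fields": [
        {"name": "job", "type": "text"},
        {"name": "user_embedding", "type": "vector",
         "attrs": {"dims": 3, "distance_metric": "cosine",
                   "algorithm": "flat", "datatype": "float32"}},
    ],
}
bad = {
    "index": {"name": "user_simple"},
    "fields": [{"name": "user_embedding", "type": "vector", "attrs": {"dims": 3}}],
}

print(check_schema(good))  # []
print(check_schema(bad))   # two problems: missing prefix, missing vector attrs
```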

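Sample dataset preparation

Below, create a mock dataset with user, job, age, credit_score, and user_embedding fields. The user_embedding vectors are synthetic examples for demonstration purposes.

For more information on creating real-world embeddings, see this article.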

import numpy as np

data = [
    {
        'user': 'john',
        'age': 1,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'mary',
        'age': 2,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'joe',
        'age': 3,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
    }
]

As shown above, the sample user_embedding vectors are converted to bytes using the NumPy Python package.
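To make the byte conversion concrete, here is a minimal sketch that uses only the standard library. It assumes a little-endian float32 layout, which matches what NumPy produces on most common platforms. The helper names `to_float32_bytes` and `from_float32_bytes` are illustrative and are not part of RedisVL or NumPy.

```python
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(raw):
    """Unpack little-endian float32 bytes back into a list of floats."""
    count = len(raw) // 4  # each float32 value occupies 4 bytes
    return list(struct.unpack(f"<{count}f", raw))


raw = to_float32_bytes([0.1, 0.1, 0.5])
print(len(raw))  # 12, that is 3 dimensions * 4 bytes
roundtrip = from_float32_bytes(raw)
```

For a float32 vector field, the byte length of each stored value should equal dims × 4, so the three-dimensional vectors in this guide occupy 12 bytes each. The round-tripped values differ from the Python floats only by float32 rounding.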

Create a `SearchIndex`

With the schema and sample dataset ready, create a SearchIndex:

from redisvl.index import SearchIndex

index = SearchIndex.from_dict(schema)
# or use .from_yaml('schema_file.yaml')

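You also need a Redis connection. There are two ways to set one up:

Create and manage your own client connection (recommended).
Provide a simple Redis URL and let RedisVL connect on your behalf.

Bring your own Redis connection instance

This is ideal when the connection instance has custom settings or when your application shares a connection pool: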

from redis import Redis

client = Redis.from_url("redis://localhost:6379")

index.set_client(client)
# optionally provide an async Redis client object to enable async index operations

Let the index manage the connection instance

This is ideal for simple cases:

index.connect("redis://localhost:6379")
# optionally use an async client by passing use_async=True

Create the underlying index

Now that you are connected to Redis, run the create command.

index.create(overwrite=True)

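Note: at this point, the index has no associated data. Data loading follows.

Inspect with the `rvl` command

Use the rvl CLI to inspect the newly created index and its fields: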

$ rvl index listall
18:25:34 [RedisVL] INFO   Indices:
18:25:34 [RedisVL] INFO   1. user_simple

$ rvl index info -i user_simple

╭──────────────┬────────────────┬──────────────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes             │ Index Options   │   Indexing │
├──────────────┼────────────────┼──────────────────────┼─────────────────┼────────────┤
│ user_simple  │ HASH           │ ['user_simple_docs'] │ []              │          0 │
╰──────────────┴────────────────┴──────────────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ user           │ user           │ TAG     │ SEPARATOR      │ ,              │
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │                │
│ user_embedding │ user_embedding │ VECTOR  │                │                │
╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯

Load data to `SearchIndex`

Load the sample dataset into Redis:

keys = index.load(data)

print(keys)

['user_simple_docs:31d4f3c73f1a4c26b41cf0e2b8e0248a',
 'user_simple_docs:c9ff740437064b919245e49ef585484d',
 'user_simple_docs:6db5f2e09f08438785b73d8048d5350b']

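By default, load creates a unique Redis key as a combination of the index key prefix and a UUID. You can also customize the key by providing keys directly or by pointing to a specified id_field on load.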
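The naming scheme described above can be sketched in plain Python. This is only an illustration of how a prefix combines with a UUID or with a field value, assuming a ":" separator. `make_key` is a hypothetical helper and is not RedisVL's implementation.

```python
import uuid


def make_key(prefix, record, id_field=None, separator=":"):
    """Build a key from a prefix and either a record field or a random UUID."""
    if id_field is not None:
        suffix = str(record[id_field])  # reuse a value from the record itself
    else:
        suffix = uuid.uuid4().hex       # 32 hex characters, unique per call
    return f"{prefix}{separator}{suffix}"


record = {"user": "john", "age": 1}
default_key = make_key("user_simple_docs", record)
custom_key = make_key("user_simple_docs", record, id_field="user")
print(custom_key)  # user_simple_docs:john
```

Keys derived from a field are stable across loads, so loading the same record again overwrites the existing entry instead of creating a new one. UUID-based keys produce a new entry on every load.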

Update the index with new data

Use the load method to update the index with new data:

# Add more data
new_data = [{
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()
}]
keys = index.load(new_data)

print(keys)

['user_simple_docs:ea6e8f2f93d5447c950ccb6843627761']

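Create `VectorQuery` objects

Next, create a vector query object for your newly populated index. This example uses a simple vector to demonstrate how vector search works. Vectors in production will likely be much larger than three floats and typically require a machine learning model (for example, Huggingface sentence transformers) or an embeddings API (for example, Cohere and OpenAI). RedisVL provides a set of vectorizers to assist with vector creation.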

from redisvl.query import VectorQuery
# jupyterutils is a local helper module from the source notebook, not an installable package
from jupyterutils import result_print

query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

Executing queries

With your VectorQuery object defined, you can execute the query over the SearchIndex using the query method.

results = index.query(query)
result_print(results)

vector_distance	user	age	job	credit_score
0	john	1	engineer	high
0	mary	2	doctor	low
0.0566299557686	tyler	9	engineer	high
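The distances in this table can be reproduced by hand. The sketch below assumes that the reported value is one minus the cosine similarity, which is how the cosine metric is defined for Redis vector fields. It works in Python double precision, so the trailing digits differ slightly from the float32 result above. `cosine_distance` is an illustrative helper, not a RedisVL function.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [0.1, 0.1, 0.5]
same = cosine_distance(query_vec, [0.1, 0.1, 0.5])   # john and mary: approximately 0
tyler = cosine_distance(query_vec, [0.1, 0.3, 0.5])  # tyler
print(round(tyler, 4))  # 0.0566
```

john and mary share the query vector exactly, so their distance is zero up to rounding. tyler's vector points in a slightly different direction, which gives the small nonzero distance.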

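Using an asynchronous Redis client

The AsyncSearchIndex class, together with an async Redis Python client, provides asynchronous queries, index creation, and data loading. This is the recommended way to work with redisvl in production settings.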

from redisvl.index import AsyncSearchIndex
from redis.asyncio import Redis

client = Redis.from_url("redis://localhost:6379")

index = AsyncSearchIndex.from_dict(schema)
index.set_client(client)

# execute the vector query async
results = await index.aquery(query)
result_print(results)

vector_distance	user	age	job	credit_score
0	john	1	engineer	high
0	mary	2	doctor	low
0.0566299557686	tyler	9	engineer	high
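The async snippets in this guide use top-level await, which works in a Jupyter notebook but not in a regular Python script. In a script, the awaited calls need to live inside a coroutine that is started with asyncio.run. The sketch below shows only that pattern. `fake_query` is a hypothetical stand-in for an awaitable index call and does not talk to Redis.

```python
import asyncio


async def fake_query(vector):
    """Hypothetical stand-in for an awaitable call such as an async index query."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"queried a {len(vector)}-dimensional vector"


async def main():
    # In a real script, the awaited index calls would go here.
    return await fake_query([0.1, 0.1, 0.5])


result = asyncio.run(main())
print(result)  # queried a 3-dimensional vector
```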

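Updating a schema

In some scenarios, it makes sense to update the index schema. With Redis and RedisVL, this is easy because Redis can keep the underlying data in place while you update the index configuration.

Imagine you want to re-index this data in the following ways:

Use a Tag type for the job field instead of Text.
Use an HNSW vector index for the user_embedding field instead of a flat vector index.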

# Modify this schema to have what we want

index.schema.remove_field("job")
index.schema.remove_field("user_embedding")
index.schema.add_fields([
    {"name": "job", "type": "tag"},
    {
        "name": "user_embedding",
        "type": "vector",
        "attrs": {
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
])

# Run the index update but keep underlying data in place
await index.create(overwrite=True, drop=False)

# Execute the vector query
results = await index.aquery(query)
result_print(results)

vector_distance	user	age	job	credit_score
0	john	1	engineer	high
0	mary	2	doctor	low
0.0566299557686	tyler	9	engineer	high
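The same schema change can also be expressed as a transformation of the original schema dictionary, which keeps the old and new definitions side by side for review. This is a minimal sketch in plain Python. `updated_schema` is a hypothetical helper, and the abbreviated `original` dictionary stands in for the full schema defined earlier.

```python
import copy


def updated_schema(schema):
    """Return a copy of the schema with job as a tag and an HNSW vector index."""
    new_schema = copy.deepcopy(schema)  # leave the original definition untouched
    for field in new_schema["fields"]:
        if field["name"] == "job":
            field["type"] = "tag"
        elif field["name"] == "user_embedding":
            field["attrs"]["algorithm"] = "hnsw"
    return new_schema


original = {
    "index": {"name": "user_simple", "prefix": "user_simple_docs"},
    "fields": [
        {"name": "job", "type": "text"},
        {"name": "user_embedding", "type": "vector",
         "attrs": {"dims": 3, "distance_metric": "cosine",
                   "algorithm": "flat", "datatype": "float32"}},
    ],
}

changed = updated_schema(original)
print(changed["fields"][0]["type"])                # tag
print(changed["fields"][1]["attrs"]["algorithm"])  # hnsw
```

A dictionary built this way could then be passed to from_dict to construct a new index object, as shown earlier in this guide.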

Check index stats

Use rvl to check the statistics for the index:

$ rvl stats -i user_simple

Statistics:
╭─────────────────────────────┬─────────────╮
│ Stat Key                    │ Value       │
├─────────────────────────────┼─────────────┤
│ num_docs                    │ 4           │
│ num_terms                   │ 0           │
│ max_doc_id                  │ 4           │
│ num_records                 │ 20          │
│ percent_indexed             │ 1           │
│ hash_indexing_failures      │ 0           │
│ number_of_uses              │ 2           │
│ bytes_per_record_avg        │ 1           │
│ doc_table_size_mb           │ 0.00044632  │
│ inverted_sz_mb              │ 1.90735e-05 │
│ key_table_size_mb           │ 0.000165939 │
│ offset_bits_per_record_avg  │ nan         │
│ offset_vectors_sz_mb        │ 0           │
│ offsets_per_term_avg        │ 0           │
│ records_per_doc_avg         │ 5           │
│ sortable_values_size_mb     │ 0           │
│ total_indexing_time         │ 0.246       │
│ total_inverted_index_blocks │ 11          │
│ vector_index_sz_mb          │ 0.0201416   │
╰─────────────────────────────┴─────────────╯

Cleanup

# clean up the index
await index.adelete()
