Vectorizers

Supported vectorizers

In this document, you will learn how to use RedisVL to create embeddings with its built-in text embedding vectorizers. RedisVL supports:

  1. OpenAI
  2. HuggingFace
  3. Vertex AI
  4. Cohere
Note:
This document is a converted form of this Jupyter notebook.

Before beginning, make sure of the following:

  1. You have installed RedisVL and have that environment activated.
  2. You have a running Redis instance with the Redis Query Engine features enabled.
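If you still need to set these up, a minimal route (assuming you use pip and Docker; adapt to your own environment) is:

pip install redisvl
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest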
# import necessary modules
import os

Creating text embeddings

This example shows how to create embeddings from three simple sentences with a number of different text vectorizers in RedisVL.

  • "That is a happy dog"
  • "That is a happy person"
  • "Today is a sunny day"

OpenAI

The OpenAITextVectorizer makes it easy to use RedisVL with OpenAI's embedding models. For this you will need to install openai:

pip install openai
import getpass

# setup the API Key
api_key = os.environ.get("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")
from redisvl.utils.vectorize import OpenAITextVectorizer

# create a vectorizer
oai = OpenAITextVectorizer(
    model="text-embedding-ada-002",
    api_config={"api_key": api_key},
)

test = oai.embed("This is a test sentence.")
print("Vector dimensions: ", len(test))
test[:10]

Vector dimensions:  1536

[-0.001025049015879631,
 -0.0030993607360869646,
 0.0024536605924367905,
 -0.004484387580305338,
 -0.010331203229725361,
 0.012700922787189484,
 -0.005368996877223253,
 -0.0029411641880869865,
 -0.0070833307690918446,
 -0.03386051580309868]
# Create many embeddings at once
sentences = [
    "That is a happy dog",
    "That is a happy person",
    "Today is a sunny day"
]

embeddings = oai.embed_many(sentences)
embeddings[0][:10]

[-0.01747742109000683,
 -5.228330701356754e-05,
 0.0013870716793462634,
 -0.025637786835432053,
 -0.01985435001552105,
 0.016117358580231667,
 -0.0037306349258869886,
 0.0008945261361077428,
 0.006577865686267614,
 -0.025091219693422318]
# openai also supports asynchronous requests, which you can use to speed up the vectorization process.
embeddings = await oai.aembed_many(sentences)
print("Number of Embeddings:", len(embeddings))

Number of Embeddings: 3
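Note that a bare await like the one above only works in an async context such as a Jupyter notebook cell. In a plain Python script you would drive the same coroutine through asyncio; a minimal sketch, reusing the oai vectorizer and sentences list defined above:

import asyncio

# asyncio.run creates and manages the event loop outside of a notebook
embeddings = asyncio.run(oai.aembed_many(sentences))
print("Number of Embeddings:", len(embeddings))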

Huggingface

Huggingface is a popular natural language processing (NLP) platform with many pre-trained models you can use off the shelf. RedisVL supports using Huggingface "Sentence Transformers" to create embeddings from text. To use Huggingface, you will need to install the sentence-transformers library:

pip install sentence-transformers
os.environ["TOKENIZERS_PARALLELISM"] = "false"
from redisvl.utils.vectorize import HFTextVectorizer

# create a vectorizer
# choose your model from the huggingface website
hf = HFTextVectorizer(model="sentence-transformers/all-mpnet-base-v2")

# embed a sentence
test = hf.embed("This is a test sentence.")
test[:10]

[0.00037810884532518685,
 -0.05080341175198555,
 -0.03514723479747772,
 -0.02325104922056198,
 -0.044158220291137695,
 0.020487844944000244,
 0.0014617963461205363,
 0.031261757016181946,
 0.05605152249336243,
 0.018815357238054276]
# You can also create many embeddings at once
embeddings = hf.embed_many(sentences, as_buffer=True)
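With as_buffer=True each embedding comes back as a raw bytes buffer rather than a Python list, which is the format used when loading vectors into the hash index later in this guide. If you need to inspect such a buffer, you can decode it with NumPy; a minimal sketch, assuming the default float32 datatype:

import numpy as np

# decode the first byte buffer back into a float array for inspection
vector = np.frombuffer(embeddings[0], dtype=np.float32)
print(vector.shape)  # (768,) for all-mpnet-base-v2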

VertexAI

VertexAI is GCP's fully featured AI platform, which includes a number of pre-trained LLMs. RedisVL supports using VertexAI to create embeddings from these models. To use VertexAI, you will first need to install the google-cloud-aiplatform library:

pip install google-cloud-aiplatform>=1.26

Then you need access to a Google Cloud Project and appropriate credentials. This is done by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a JSON key file downloaded from a service account on GCP.

Finally, you need to find your project ID and the geographic region for VertexAI.

Make sure the following environment variables are set:

GOOGLE_APPLICATION_CREDENTIALS=<path to your gcp JSON creds>
GCP_PROJECT_ID=<your gcp project id>
GCP_LOCATION=<your gcp geo region for vertex ai>
from redisvl.utils.vectorize import VertexAITextVectorizer


# create a vectorizer
vtx = VertexAITextVectorizer(api_config={
    "project_id": os.environ.get("GCP_PROJECT_ID") or getpass.getpass("Enter your GCP Project ID: "),
    "location": os.environ.get("GCP_LOCATION") or getpass.getpass("Enter your GCP Location: "),
    "google_application_credentials": os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") or getpass.getpass("Enter your Google App Credentials path: ")
})

# embed a sentence
test = vtx.embed("This is a test sentence.")
test[:10]

[0.04373306408524513,
 -0.05040992051362991,
 -0.011946038343012333,
 -0.043528858572244644,
 0.021510830149054527,
 0.028604144230484962,
 0.014770914800465107,
 -0.01610461436212063,
 -0.0036560404114425182,
 0.013746795244514942]

Cohere

Cohere lets you build language AI into your product. The CohereTextVectorizer makes it simple to use RedisVL with Cohere's embedding models. For this you will need to install cohere:

pip install cohere
import getpass
# set up the API Key
api_key = os.environ.get("COHERE_API_KEY") or getpass.getpass("Enter your Cohere API key: ")

Pay special attention to the input_type parameter for each embed call. For example, to embed a query you should set input_type='search_query'; to embed a document, set input_type='search_document'. See more information here.

from redisvl.utils.vectorize import CohereTextVectorizer

# create a vectorizer
co = CohereTextVectorizer(
    model="embed-english-v3.0",
    api_config={"api_key": api_key},
)

# embed a search query
test = co.embed("This is a test sentence.", input_type='search_query')
print("Vector dimensions: ", len(test))
print(test[:10])

# embed a document
test = co.embed("This is a test sentence.", input_type='search_document')
print("Vector dimensions: ", len(test))
print(test[:10])

Vector dimensions:  1024
[-0.010856628, -0.019683838, -0.0062179565, 0.003545761, -0.047943115, 0.0009365082, -0.005924225, 0.016174316, -0.03289795, 0.049194336]
Vector dimensions:  1024
[-0.009712219, -0.016036987, 2.8073788e-05, -0.022491455, -0.041259766, 0.002281189, -0.033294678, -0.00057029724, -0.026260376, 0.0579834]

To learn more about using RedisVL with Cohere, see this dedicated user guide.

Search with provider embeddings

Now that you have created your embeddings, you can use them to search for similar sentences. You will use the same three sentences from above and search for similar ones.

First, create the schema for your index.

Here is what an example schema looks like in YAML for the HuggingFace vectorizer (save it locally as schema.yaml so it can be loaded below):

version: '0.1.0'

index:
    name: vectorizers
    prefix: doc
    storage_type: hash

fields:
    - name: sentence
      type: text
    - name: embedding
      type: vector
      attrs:
        dims: 768
        algorithm: flat
        distance_metric: cosine
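If you prefer to stay in Python, the same schema can be expressed as a dictionary; a rough sketch mirroring the YAML above, which you could pass to SearchIndex.from_dict instead of from_yaml:

schema = {
    "index": {
        "name": "vectorizers",
        "prefix": "doc",
        "storage_type": "hash",
    },
    "fields": [
        {"name": "sentence", "type": "text"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {"dims": 768, "algorithm": "flat", "distance_metric": "cosine"},
        },
    ],
}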
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_yaml("./schema.yaml")

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet)
index.create(overwrite=True)
# use the CLI to see the created index
!rvl index listall

22:02:27 [RedisVL] INFO   Indices:
22:02:27 [RedisVL] INFO   1. vectorizers
# load expects an iterable of dictionaries where
# the vector is stored as a bytes buffer

data = [{"text": t,
         "embedding": v}
        for t, v in zip(sentences, embeddings)]

index.load(data)

    ['doc:17c401b679ce43cb82f3ab2280ad02f2',
     'doc:3fc0502bec434b17a3f06e20824b2e59',
     'doc:199f17b0e5d24dcaa1fd4fb41558150c']
from redisvl.query import VectorQuery

# use the HuggingFace vectorizer again to create a query embedding
query_embedding = hf.embed("That is a happy cat")

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["text"],
    num_results=3
)

results = index.query(query)
for doc in results:
    print(doc["text"], doc["vector_distance"])

That is a happy dog 0.160862326622
That is a happy person 0.273598492146
Today is a sunny day 0.744559407234
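The vector_distance values above are cosine distances (the metric configured in the schema), so lower means more similar. If a similarity score is more intuitive, the conversion is straightforward; a minimal sketch, assuming the cosine metric:

# cosine similarity = 1 - cosine distance
for doc in results:
    print(doc["text"], 1 - float(doc["vector_distance"]))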
# cleanup
index.delete()