May 22, 2023

Using Weaviate with the Generative OpenAI module for Generative Search

This notebook is prepared for a scenario where:

  • Your data is already in Weaviate
  • You want to use Weaviate with the Generative OpenAI module (generative-openai).

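Prerequisites

This cookbook only covers Generative Search examples; it does not cover configuration and data imports.

To make the most of this cookbook, please complete the Getting Started cookbook first, where you will learn the essentials of working with Weaviate and import the demo data.

Checklist:

  • completed the Getting Started cookbook,
  • created a Weaviate instance,
  • imported data into your Weaviate instance,
  • you have an OpenAI API key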

===========================================================

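Prepare your OpenAI API key

The OpenAI API key is used to vectorize your data at import time and to run queries.

If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.

Once you have your key, add it to your environment variables as OPENAI_API_KEY.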

# Export OpenAI API Key
!export OPENAI_API_KEY="your key"
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os

# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'

if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")

import weaviate
from datasets import load_dataset
import os

# Connect to your Weaviate instance
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network/",
    # url="http://localhost:8080/",
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="<YOUR-WEAVIATE-API-KEY>"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

# Check if your instance is live and ready
# This should return `True`
client.is_ready()
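
Before running generative queries, it can help to confirm that the instance actually has the module enabled. The sketch below assumes the metadata returned by `client.get_meta()` contains a `modules` mapping keyed by module name. The helper name `has_module` and the sample dictionary are illustrative and not part of the original notebook.

```python
def has_module(meta: dict, module_name: str = "generative-openai") -> bool:
    """Return True if `module_name` appears in the instance metadata."""
    # Assumption: enabled modules are listed under the "modules" key.
    modules = meta.get("modules") or {}
    return module_name in modules


# Hypothetical metadata, shaped like what `client.get_meta()` is assumed to return.
sample_meta = {
    "version": "1.19.0",
    "modules": {"text2vec-openai": {}, "generative-openai": {}},
}

print(has_module(sample_meta))
print(has_module({"modules": {}}))
```

With a live connection, the same check would be `has_module(client.get_meta())`.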

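Weaviate offers a Generative Search OpenAI module, which generates responses based on the data stored in your Weaviate instance.

The way you construct a generative search query is very similar to a standard semantic search query in Weaviate.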

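For example:

  • Search in the "Article" collection,
  • Return "title", "content", "url"
  • Look for objects related to "football clubs"
  • Limit results to 5 objects

    result = (
        client.query
        .get("Article", ["title", "content", "url"])
        # with_near_text expects a dict; "concepts" is a list of search terms
        .with_near_text({"concepts": ["football clubs"]})
        .with_limit(5)
        # generative query will go here
        .do()
    )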
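
The raw response of such a query is a nested dictionary, so a small helper can make the unpacking explicit. The following is a minimal sketch, assuming the response shape used later in this notebook (`result["data"]["Get"][collection_name]`, with an `errors` list on failure). The helper name `extract_objects` and the sample response are hypothetical.

```python
def extract_objects(result: dict, collection_name: str) -> list:
    """Return the objects for `collection_name`, raising if the query failed."""
    if "errors" in result:
        # Surface the first error message reported by the server.
        raise RuntimeError(result["errors"][0]["message"])
    return result["data"]["Get"][collection_name]


# Hypothetical response, shaped like the output of `.do()`.
sample_result = {
    "data": {
        "Get": {
            "Article": [
                {
                    "title": "FC Example",
                    "content": "A short article body.",
                    "url": "https://example.com/a",
                }
            ]
        }
    }
}

for obj in extract_objects(sample_result, "Article"):
    print(obj["title"], obj["url"])
```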

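Now, you can add the with_generate() function to apply a generative transformation. with_generate takes either:

  • single_prompt - to generate a response for each returned object,
  • grouped_task - to generate a single response from all returned objects.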
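
A single_prompt can reference object properties with placeholders such as `{content}`, which are filled in separately for each returned object. The sketch below is only a local illustration of that substitution, not Weaviate's server-side implementation. The function name `render_prompt` and the sample objects are hypothetical.

```python
import re


def render_prompt(template: str, obj: dict) -> str:
    """Replace each {property} placeholder with the object's property value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in obj:
            raise KeyError(f"No value for placeholder '{key}'")
        return str(obj[key])

    return re.sub(r"\{\s*(\w+)\s*\}", substitute, template)


prompt = "Summarize in a short tweet the following content: {content}"

# Hypothetical objects standing in for query results.
articles = [
    {"title": "First", "content": "Text of the first article."},
    {"title": "Second", "content": "Text of the second article."},
]

# One rendered prompt per object, mirroring the per-object behaviour of single_prompt.
for article in articles:
    print(render_prompt(prompt, article))
```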
def generative_search_per_item(query, collection_name):
    prompt = "Summarize in a short tweet the following content: {content}"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({ "concepts": [query], "distance": 0.7 })
        .with_limit(5)
        .with_generate(single_prompt=prompt)
        .do()
    )
    
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]["message"])

    return result["data"]["Get"][collection_name]

query_result = generative_search_per_item("football clubs", "Article")

for i, article in enumerate(query_result):
    print(f"{i+1}. {article['title']}")
    print(article['_additional']['generate']['singleResult'])  # print generated response
    print("-----------------------")

def generative_search_group(query, collection_name):
    generateTask = "Explain what these have in common"

    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({ "concepts": [query], "distance": 0.7 })
        .with_generate(grouped_task=generateTask)
        .with_limit(5)
        .do()
    )
    
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]["message"])

    return result["data"]["Get"][collection_name]

query_result = generative_search_group("football clubs", "Article")

# The grouped response is attached to the first returned object
print(query_result[0]['_additional']['generate']['groupedResult'])
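
Indexing `query_result[0]` fails when no objects match the query. A slightly more defensive way to read the grouped response might look like the sketch below, which assumes the same result layout as the code above. The helper name `grouped_result` and the sample data are hypothetical.

```python
from typing import Optional


def grouped_result(objects: list) -> Optional[str]:
    """Return the grouped generation from the first object, or None if absent."""
    if not objects:
        return None
    generate = (objects[0].get("_additional") or {}).get("generate") or {}
    return generate.get("groupedResult")


# Hypothetical query results: the grouped text sits on the first object only.
sample_objects = [
    {"title": "A", "_additional": {"generate": {"groupedResult": "All are football clubs."}}},
    {"title": "B", "_additional": {"generate": None}},
]

print(grouped_result(sample_objects))
print(grouped_result([]))
```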

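Thanks for following along. You're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things, so enjoy! For more complex use cases, please continue to work through the other cookbook examples in this repo.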