Google Cloud SQL for PostgreSQL - `PostgresVectorStore`
Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LlamaIndex integration.
This notebook goes over how to use Cloud SQL for PostgreSQL to store vector embeddings with the PostgresVectorStore class.
Learn more about the package on GitHub.
To run this notebook, you will need to install the integration library, `llama-index-cloud-sql-pg`, and the library for the embedding service, `llama-index-embeddings-vertex`.
```python
%pip install --upgrade --quiet llama-index-cloud-sql-pg llama-index-embeddings-vertex llama-index-llms-vertex llama-index
```

Colab only: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top.
```python
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)
```

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.
- If you are running this notebook in Colab, use the cell below and continue.
- If you are using Vertex AI Workbench, check out the setup instructions here.
```python
from google.colab import auth

auth.authenticate_user()
```

Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.
If you don't know your project ID, try the following:
- Run `gcloud config list`.
- Run `gcloud projects list`.
- See the support page: Locate the project ID.
```python
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.
PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}
```

Set Cloud SQL database values

Find your database values in the Cloud SQL Instances page.
```python
# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "vector_store"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}
```

PostgresEngine Connection Pool

One of the requirements and arguments to establish Cloud SQL as a vector store is a PostgresEngine object. The PostgresEngine configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.
To create a PostgresEngine using PostgresEngine.from_instance(), you need to provide only four things:
- `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.
- `region` : Region where the Cloud SQL instance is located.
- `instance` : The name of the Cloud SQL instance.
- `database` : The name of the database to connect to on the Cloud SQL instance.
By default, IAM database authentication is used as the method of database authentication. This library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.
For more information on IAM database authentication, see the Cloud SQL documentation on IAM database authentication.
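For instance, a minimal sketch of IAM-based engine creation (reusing this notebook's PROJECT_ID, REGION, INSTANCE, and DATABASE values) simply omits the user and password arguments, so the library falls back to the ADC-based IAM principal described above:

```python
from llama_index_cloud_sql_pg import PostgresEngine

# No user/password supplied: the engine uses IAM database authentication
# with Application Default Credentials sourced from the environment.
iam_engine = await PostgresEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
)
```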
Optionally, built-in database authentication using a username and password to access the Cloud SQL database can also be used. Just provide the optional user and password arguments to PostgresEngine.from_instance():
- `user` : Database user to use for built-in database authentication and login.
- `password` : Database password to use for built-in database authentication and login.
Note: This tutorial demonstrates the async interface. All async methods have corresponding sync methods.
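For example, a synchronous sketch of the engine creation step uses PostgresEngine.from_instance(), the sync counterpart of the afrom_instance() call used below; treat this as illustrative and check the package reference for the sync variants of the other methods:

```python
from llama_index_cloud_sql_pg import PostgresEngine

# Synchronous counterpart of PostgresEngine.afrom_instance()
sync_engine = PostgresEngine.from_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
```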
```python
from llama_index_cloud_sql_pg import PostgresEngine

engine = await PostgresEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
```

The PostgresVectorStore class requires a database table. The PostgresEngine engine has a helper method init_vector_store_table() that can be used to create a table with the proper schema for you.
```python
await engine.ainit_vector_store_table(
    table_name=TABLE_NAME,
    vector_size=768,  # Vector size for VertexAI model (textembedding-gecko@latest)
)
```

You can also specify a schema name by passing schema_name wherever you pass table_name.
```python
SCHEMA_NAME = "my_schema"

await engine.ainit_vector_store_table(
    table_name=TABLE_NAME,
    schema_name=SCHEMA_NAME,
    vector_size=768,
)
```

You can use any Llama Index embeddings model.
You may need to enable the Vertex AI API to use VertexTextEmbedding. We recommend setting the embedding model's version for production; learn more about the Text embeddings models.
```python
# enable Vertex AI API
!gcloud services enable aiplatform.googleapis.com
```

```python
from llama_index.core import Settings
from llama_index.embeddings.vertex import VertexTextEmbedding
from llama_index.llms.vertex import Vertex
import google.auth

credentials, project_id = google.auth.default()

Settings.embed_model = VertexTextEmbedding(
    model_name="textembedding-gecko@003",
    project=PROJECT_ID,
    credentials=credentials,
)

Settings.llm = Vertex(model="gemini-1.5-flash-002", project=PROJECT_ID)
```

Initialize a default PostgresVectorStore
```python
from llama_index_cloud_sql_pg import PostgresVectorStore

vector_store = await PostgresVectorStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
)
```

```python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
```

Use with VectorStoreIndex
Create an index from the vector store by using VectorStoreIndex.
The simplest way to use a Vector Store is to load a set of documents and build an index from them using from_documents.
```python
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
```

```python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do?")
print(response)
```
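If you want to inspect the retrieved nodes without LLM synthesis, the index can also be used as a retriever. The snippet below is an illustrative sketch using LlamaIndex's generic as_retriever() API; the similarity_top_k value is an arbitrary choice.

```python
# Retrieve the most similar nodes directly, without sending them to the LLM
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What did the author do?")

for node_with_score in nodes:
    print(node_with_score.score, node_with_score.text[:100])
```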
The vector store can take advantage of relational data to filter similarity searches.

Create a new table with custom metadata columns. You can also re-use an existing table that already has custom columns for the document id, content, embedding, and/or metadata.
```python
from llama_index_cloud_sql_pg import Column

# Set table name
TABLE_NAME = "vectorstore_custom"
# SCHEMA_NAME = "my_schema"

await engine.ainit_vector_store_table(
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    vector_size=768,  # VertexAI model: textembedding-gecko@003
    metadata_columns=[Column("len", "INTEGER")],
)
```

```python
# Initialize PostgresVectorStore
custom_store = await PostgresVectorStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    metadata_columns=["len"],
)
```

Document metadata can provide the LLM and retrieval process with more information. Learn more about different approaches for extracting and adding metadata.
```python
from llama_index.core import Document

fruits = ["apple", "pear", "orange", "strawberry", "banana", "kiwi"]
documents = [
    Document(text=fruit, metadata={"len": len(fruit)}) for fruit in fruits
]
```

```python
storage_context = StorageContext.from_defaults(vector_store=custom_store)
custom_doc_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
```

You can apply pre-filtering to the search results by specifying a filters argument.
```python
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="len", operator=FilterOperator.GT, value="5"),
    ],
)

query_engine = custom_doc_index.as_query_engine(filters=filters)
res = query_engine.query("List some fruits")
print(str(res.source_nodes[0].text))
```
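Multiple filters can also be combined. The sketch below uses LlamaIndex's standard FilterCondition to AND two conditions together; it assumes the Postgres vector store translates these operators into the corresponding SQL predicates, and the specific values are only illustrative.

```python
from llama_index.core.vector_stores.types import (
    FilterCondition,
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Match documents whose "len" metadata is greater than 3 AND less than 6
combined_filters = MetadataFilters(
    filters=[
        MetadataFilter(key="len", operator=FilterOperator.GT, value=3),
        MetadataFilter(key="len", operator=FilterOperator.LT, value=6),
    ],
    condition=FilterCondition.AND,
)

query_engine = custom_doc_index.as_query_engine(filters=combined_filters)
print(query_engine.query("List some fruits"))
```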
Speed up vector search queries by applying a vector index. Learn more about vector indexes.

```python
from llama_index_cloud_sql_pg.indexes import IVFFlatIndex

index = IVFFlatIndex()

await vector_store.aapply_vector_index(index)
await vector_store.areindex()  # Re-index using default index name
await vector_store.adrop_vector_index()  # Delete index using default name
```
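As a further sketch, pgvector also supports HNSW indexes; the example below assumes the package's indexes module exposes an HNSWIndex class with m and ef_construction parameters (mirroring pgvector's HNSW options). Verify the class name and parameters against the llama-index-cloud-sql-pg reference before relying on it.

```python
# Assumption: HNSWIndex is provided by llama_index_cloud_sql_pg.indexes
from llama_index_cloud_sql_pg.indexes import HNSWIndex

# m: max connections per graph node; ef_construction: candidate list size at build time
hnsw_index = HNSWIndex(m=16, ef_construction=64)

await vector_store.aapply_vector_index(hnsw_index)
```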