Google Cloud SQL for PostgreSQL - `PostgresReader`

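Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and strong scalability. It supports the MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences with Cloud SQL's LlamaIndex integrations.

This notebook shows how to use Cloud SQL for PostgreSQL to retrieve data as documents with the PostgresReader class.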

Learn more about the package on GitHub.

Before you begin

To run this notebook, you will need to do the following:

🦙 Library Installation

Install the integration library, llama-index-cloud-sql-pg.

Colab only: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top.

# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

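🔐 Authentication

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud project.

If you are using Colab to run this notebook, use the cell below and continue.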
If you are using Vertex AI Workbench, check out the setup instructions here.

from google.colab import auth

auth.authenticate_user()

☁ Set Your Google Cloud Project

Set your Google Cloud project so that you can use Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Basic Usage

Set Cloud SQL database values

Find your database values on the Cloud SQL Instances page.

# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "reader_table"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}

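PostgresEngine Connection Pool

One of the requirements and arguments for using Cloud SQL as a reader is a PostgresEngine object. The PostgresEngine configures a connection pool to your Cloud SQL database, so that your application can connect reliably and follow industry best practices.

To create a PostgresEngine using PostgresEngine.from_instance() you need to provide only 4 things:

project_id : Project ID of the Google Cloud project where the Cloud SQL instance is located.
region : Region where the Cloud SQL instance is located.
instance : The name of the Cloud SQL instance.
database : The name of the database to connect to on the Cloud SQL instance.

By default, IAM database authentication is used as the method of database authentication. The library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.

For more information on IAM database authentication, see the Cloud SQL documentation.

Optionally, built-in database authentication with a username and password can also be used to access the Cloud SQL database. Just provide the optional user and password arguments to PostgresEngine.from_instance():

user : Database user to use for built-in database authentication and login.
password : Database password to use for built-in database authentication and login.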

Note: This tutorial demonstrates the async interface. All async methods have corresponding sync methods.

from llama_index_cloud_sql_pg import PostgresEngine

engine = await PostgresEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
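As a rough sketch of how the required and optional arguments fit together, the hypothetical helper below (`build_engine_kwargs` is not part of the library) assembles the keyword arguments for `PostgresEngine.afrom_instance()`. It leaves out `user` and `password` unless both are given, in which case the engine would fall back to IAM database authentication. Treating a lone `user` or a lone `password` as an error is an assumption made here for illustration.

```python
from typing import Any, Dict, Optional


def build_engine_kwargs(
    project_id: str,
    region: str,
    instance: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect arguments for PostgresEngine.afrom_instance()."""
    kwargs: Dict[str, Any] = {
        "project_id": project_id,
        "region": region,
        "instance": instance,
        "database": database,
    }
    # Assumption for this sketch: user and password only make sense together.
    if (user is None) != (password is None):
        raise ValueError("Provide both user and password, or neither.")
    if user is not None:
        # Built-in database authentication.
        kwargs["user"] = user
        kwargs["password"] = password
    # Otherwise both are omitted and IAM database authentication applies.
    return kwargs


iam_kwargs = build_engine_kwargs(
    "my-project-id", "us-central1", "my-primary", "my-database"
)
# In the notebook: engine = await PostgresEngine.afrom_instance(**iam_kwargs)
```

Keeping the arguments in a dictionary makes it easy to switch between the two authentication methods without editing the call itself.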

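Create PostgresReader

When creating a PostgresReader to fetch data from Cloud SQL for PostgreSQL, you have two main options for specifying the data to load:

Using the table_name argument - When you specify the table_name argument, you tell the reader to fetch all the data from the given table.
Using the query argument - When you specify the query argument, you provide a custom SQL query to fetch the data. This gives you full control over the SQL query, including selecting specific columns, applying filters, sorting, joining tables, and so on.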
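One way to picture the difference between the two options: loading by `table_name` behaves like running a `SELECT *` over that table, while `query` lets you write the statement yourself. The helper below is a hypothetical sketch, not the library's code. The identifier quoting, the default `public` schema, and the rule that exactly one of the two arguments must be given are all assumptions.

```python
from typing import Optional


def resolve_query(
    table_name: Optional[str] = None,
    query: Optional[str] = None,
    schema_name: str = "public",
) -> str:
    """Hypothetical helper: return the SQL a reader would effectively run."""
    # Assumption: the two options are mutually exclusive.
    if (table_name is None) == (query is None):
        raise ValueError("Specify exactly one of table_name or query.")
    if query is not None:
        # A custom query is used as written.
        return query
    # A table name expands to "fetch everything from that table".
    return f'SELECT * FROM "{schema_name}"."{table_name}"'


print(resolve_query(table_name="reader_table"))
print(resolve_query(query="SELECT id, product_name FROM products WHERE id < 10"))
```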

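Load Documents using the `table_name` argument

Load Documents via default table

The reader returns a list of Documents from the table, using the first column as text and all other columns as metadata. The default table has the first column as text and the second column as metadata (JSON). Each row becomes a document.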

from llama_index_cloud_sql_pg import PostgresReader

# Creating a basic PostgresReader object
reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
)

Load Documents via custom table/metadata or custom page content columns

reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name"],  # Optional
    metadata_columns=["id"],  # Optional
)
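The column rules described in this section can be sketched in plain Python. The function below is a hypothetical illustration (`split_row` does not exist in the library): with no column lists it treats the first column as text and the rest as metadata, and with explicit lists it uses exactly those columns. Joining several content columns with a space, and defaulting the metadata to every non-content column, are assumptions of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_row(
    row: Dict[str, Any],
    content_columns: Optional[List[str]] = None,
    metadata_columns: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: turn one table row into (text, metadata)."""
    columns = list(row)  # dicts preserve the column order of the row
    if content_columns is None:
        # Default: the first column is the document text.
        content_columns = columns[:1]
    if metadata_columns is None:
        # Default (assumed): every remaining column becomes metadata.
        metadata_columns = [c for c in columns if c not in content_columns]
    text = " ".join(str(row[c]) for c in content_columns)
    metadata = {c: row[c] for c in metadata_columns}
    return text, metadata


row = {"product_name": "cards", "description": "playing cards", "id": 1}
print(split_row(row))
print(split_row(row, content_columns=["product_name"], metadata_columns=["id"]))
```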

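Load Documents using a SQL query

The query argument lets you specify a custom SQL query, which can include filters, to load specific documents from the database.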

table_name = "products"
content_columns = ["product_name", "description"]
metadata_columns = ["id", "content"]

reader = await PostgresReader.create(
    engine=engine,
    query=f"SELECT * FROM {table_name};",
    content_columns=content_columns,
    metadata_columns=metadata_columns,
)

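Note: If content_columns and metadata_columns are not specified, the reader automatically treats the first returned column as the document's text and all subsequent columns as metadata.

Set page content format

The reader returns a list of Documents, one per row, with the page content rendered in the specified string format: text (space-separated concatenation), JSON, YAML, CSV, and so on. The JSON and YAML formats include field headers, while the text and CSV formats do not.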

reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name", "description"],
    format="YAML",
)
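To make the difference between the formats concrete, here is a minimal sketch that renders the content columns of one row in each style. It is not the library's formatter: `format_content` is a hypothetical name, the YAML branch is a simple `key: value` rendering rather than a real YAML serializer, and the exact separators the library emits may differ.

```python
import json
from typing import Any, Dict, List


def format_content(
    row: Dict[str, Any], content_columns: List[str], fmt: str = "text"
) -> str:
    """Hypothetical helper: render the content columns of one row as a string."""
    selected = {c: row[c] for c in content_columns if c in row}
    kind = fmt.lower()
    if kind == "text":
        # Values only, separated by spaces (no field headers).
        return " ".join(str(v) for v in selected.values())
    if kind == "csv":
        # Values only, separated by commas (no field headers).
        return ", ".join(str(v) for v in selected.values())
    if kind == "json":
        # Field names are kept as JSON keys.
        return json.dumps(selected)
    if kind == "yaml":
        # Field names are kept as "key: value" lines.
        return "\n".join(f"{k}: {v}" for k, v in selected.items())
    raise ValueError(f"Unsupported format: {fmt}")


row = {"product_name": "cards", "description": "playing cards", "id": 1}
for fmt in ("text", "csv", "JSON", "YAML"):
    print(f"--- {fmt} ---")
    print(format_content(row, ["product_name", "description"], fmt))
```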

Load the documents

You can choose to load the documents in two ways:

Load all the data at once
Lazy load the data

Load all the data at once

docs = await reader.aload_data()

print(docs)

Lazy load the data

docs_iterable = reader.alazy_load_data()

docs = []
async for doc in docs_iterable:
    docs.append(doc)

print(docs)
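The practical difference between the two loading styles is when documents are materialized. The standalone sketch below uses a plain async generator as a stand-in for the reader (the names `lazy_load`, `load_all`, and `first_document` are hypothetical, and `ROWS` replaces a real database). It is written to run as a script with `asyncio.run`; inside a notebook you would `await` the coroutines directly instead.

```python
import asyncio
from typing import AsyncIterator, List

ROWS = ["row 1", "row 2", "row 3"]  # stand-in for rows in a database table


async def lazy_load() -> AsyncIterator[str]:
    """Yield one document at a time, as each row becomes available."""
    for row in ROWS:
        await asyncio.sleep(0)  # stand-in for fetching the next row
        yield row


async def load_all() -> List[str]:
    """Collect every document into a list before returning."""
    return [doc async for doc in lazy_load()]


async def first_document() -> str:
    """Stop after the first document without reading the rest."""
    async for doc in lazy_load():
        return doc
    raise LookupError("no documents")


print(asyncio.run(load_all()))
print(asyncio.run(first_document()))
```

Lazy loading is the better fit when the result set is large or when you may stop early, since nothing beyond the rows you consume needs to be held in memory.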