This page provides instructions and guidance for developers building a GPT Action for a specific application. Before you proceed, make sure you are familiar with the following information:
This solution enables a GPT Action to retrieve data from Redshift and perform data analysis. It uses AWS Functions, so every operation stays within the AWS ecosystem and network. The middleware (an AWS Function) executes the SQL query, waits for it to complete, and returns the data as a file. The code provided is meant for reference and should be adapted to your needs.
This solution leverages the ability to retrieve files in Actions and use them as if you had uploaded them directly to the conversation.
This solution highlights a connection to Redshift Serverless. For provisioned Redshift, retrieving the network details and establishing the connection may differ slightly, but the overall code and (minimal) integration should be similar.
Value + Example Business Use Cases
Value: Leverage ChatGPT's natural-language capabilities to connect to a Redshift data warehouse.
Example use cases:
- Data scientists can connect to tables and run data analyses using ChatGPT's Data Analysis features
- Citizen data users can ask basic questions about their transactional data
- Users can get clearer visibility into their data and potential anomalies
Application Information
Application Prerequisites
Before you get started, make sure that:
- You have access to a Redshift environment
- You have the rights to deploy AWS Functions in the same VPC (Virtual Private Cloud)
- Your AWS CLI is authenticated
Middleware Information
Install the required libraries
Middleware function
To create a function, follow the steps in the AWS Middleware cookbook.
To deploy an application that specifically connects to Redshift, use the code below instead of the "hello-world" GitHub repository referenced in the Middleware AWS Function cookbook. You can either clone that repository or take the code pasted below and modify it as needed.
This code is meant to be directional - while it should work out of the box, it is designed to be customized to your needs (see the examples at the end of this document).
To get the code, clone the openai-cookbook repository and navigate to the redshift-middleware directory:
git clone https://github.com/pap-openai/redshift-middleware
cd redshift-middleware

import json
import psycopg2
import os
import base64
import tempfile
import csv

# Fetch Redshift credentials from environment variables
host = os.environ['REDSHIFT_HOST']
port = os.environ['REDSHIFT_PORT']
user = os.environ['REDSHIFT_USER']
password = os.environ['REDSHIFT_PASSWORD']
database = os.environ['REDSHIFT_DB']

def execute_statement(sql_statement):
    try:
        # Establish connection
        conn = psycopg2.connect(
            host=host,
            port=port,
            user=user,
            password=password,
            dbname=database
        )
        cur = conn.cursor()
        cur.execute(sql_statement)
        conn.commit()

        # Fetch all results
        if cur.description:
            columns = [desc[0] for desc in cur.description]
            result = cur.fetchall()
        else:
            columns = []
            result = []

        cur.close()
        conn.close()
        return columns, result

    except Exception as e:
        raise Exception(f"Database query failed: {str(e)}")

def lambda_handler(event, context):
    try:
        data = json.loads(event['body'])
        sql_statement = data['sql_statement']

        # Execute the statement and fetch results
        columns, result = execute_statement(sql_statement)

        # Create a temporary file to save the result as CSV
        with tempfile.NamedTemporaryFile(delete=False, mode='w', suffix='.csv', newline='') as tmp_file:
            csv_writer = csv.writer(tmp_file)
            if columns:
                csv_writer.writerow(columns)  # Write the header
            csv_writer.writerows(result)  # Write all rows
            tmp_file_path = tmp_file.name

        # Read the file and encode its content to base64
        with open(tmp_file_path, 'rb') as f:
            file_content = f.read()
            encoded_content = base64.b64encode(file_content).decode('utf-8')

        response = {
            'openaiFileResponse': [
                {
                    'name': 'query_result.csv',
                    'mime_type': 'text/csv',
                    'content': encoded_content
                }
            ]
        }

        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps(response)
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
Retrieve VPC information
We need to connect the function to Redshift, so we have to find the network Redshift uses. You can find it in the AWS console in the Redshift interface, under Amazon Redshift Serverless > Workgroup configuration > your_workgroup > Data access, or via the CLI:
aws redshift-serverless get-workgroup --workgroup-name default-workgroup --query 'workgroup.{address: endpoint.address, port: endpoint.port, SecurityGroupIds: securityGroupIds, SubnetIds: subnetIds}'

Set up the AWS function
Copy env.sample.yaml to env.yaml and replace the values with those retrieved above. You will need a Redshift user with access to the database/schema.
cp env.sample.yaml env.yaml
Fill in env.yaml with the values retrieved by the previous command, as well as your Redshift credentials.
Alternatively, you can manually create a file named env.yaml and fill in the following variables:
RedshiftHost: default-workgroup.xxxxx.{region}.redshift-serverless.amazonaws.com
RedshiftPort: 5439
RedshiftUser: username
RedshiftPassword: password
RedshiftDb: my-db
SecurityGroupId: sg-xx
SubnetId1: subnet-xx
SubnetId2: subnet-xx
SubnetId3: subnet-xx
SubnetId4: subnet-xx
SubnetId5: subnet-xx
SubnetId6: subnet-xx
This file will be used to deploy the function with parameters, as shown below:
PARAM_FILE="env.yaml"
PARAMS=$(yq eval -o=json $PARAM_FILE | jq -r 'to_entries | map("\(.key)=\(.value|tostring)") | join(" ")')
sam deploy --template-file template.yaml --stack-name redshift-middleware --capabilities CAPABILITY_IAM --parameter-overrides $PARAMS
template.yaml contains the following:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  redshift-middleware

  Middleware to fetch RedShift data and return it through HTTP as files

Globals:
  Function:
    Timeout: 3

Parameters:
  RedshiftHost:
    Type: String
  RedshiftPort:
    Type: String
  RedshiftUser:
    Type: String
  RedshiftPassword:
    Type: String
  RedshiftDb:
    Type: String
  SecurityGroupId:
    Type: String
  SubnetId1:
    Type: String
  SubnetId2:
    Type: String
  SubnetId3:
    Type: String
  SubnetId4:
    Type: String
  SubnetId5:
    Type: String
  SubnetId6:
    Type: String
  CognitoUserPoolName:
    Type: String
    Default: MyCognitoUserPool
  CognitoUserPoolClientName:
    Type: String
    Default: MyCognitoUserPoolClient

Resources:
  MyCognitoUserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      UserPoolName: !Ref CognitoUserPoolName
      Policies:
        PasswordPolicy:
          MinimumLength: 8
      UsernameAttributes:
        - email
      Schema:
        - AttributeDataType: String
          Name: email
          Required: false

  MyCognitoUserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties:
      UserPoolId: !Ref MyCognitoUserPool
      ClientName: !Ref CognitoUserPoolClientName
      GenerateSecret: true

  RedshiftMiddlewareApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: Prod
      Cors: "'*'"
      Auth:
        DefaultAuthorizer: MyCognitoAuthorizer
        Authorizers:
          MyCognitoAuthorizer:
            AuthorizationScopes:
              - openid
              - email
              - profile
            UserPoolArn: !GetAtt MyCognitoUserPool.Arn

  RedshiftMiddlewareFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: redshift-middleware/
      Handler: app.lambda_handler
      Runtime: python3.11
      Timeout: 45
      Architectures:
        - x86_64
      Events:
        SqlStatement:
          Type: Api
          Properties:
            Path: /sql_statement
            Method: post
            RestApiId: !Ref RedshiftMiddlewareApi
      Environment:
        Variables:
          REDSHIFT_HOST: !Ref RedshiftHost
          REDSHIFT_PORT: !Ref RedshiftPort
          REDSHIFT_USER: !Ref RedshiftUser
          REDSHIFT_PASSWORD: !Ref RedshiftPassword
          REDSHIFT_DB: !Ref RedshiftDb
      VpcConfig:
        SecurityGroupIds:
          - !Ref SecurityGroupId
        SubnetIds:
          - !Ref SubnetId1
          - !Ref SubnetId2
          - !Ref SubnetId3
          - !Ref SubnetId4
          - !Ref SubnetId5
          - !Ref SubnetId6

Outputs:
  RedshiftMiddlewareApi:
    Description: "API Gateway endpoint URL for Prod stage for SQL Statement function"
    Value: !Sub "https://${RedshiftMiddlewareApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/sql_statement/"
  RedshiftMiddlewareFunction:
    Description: "SQL Statement Lambda Function ARN"
    Value: !GetAtt RedshiftMiddlewareFunction.Arn
  RedshiftMiddlewareFunctionIamRole:
    Description: "Implicit IAM Role created for SQL Statement function"
    Value: !GetAtt RedshiftMiddlewareFunctionRole.Arn
  CognitoUserPoolArn:
    Description: "ARN of the Cognito User Pool"
    Value: !GetAtt MyCognitoUserPool.Arn
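As an aside on the deploy step above: it builds the parameter-overrides string with yq and jq. If those tools are not installed, the same string can be produced with a few lines of Python. This is a sketch that assumes env.yaml stays flat (simple Key: value pairs, as shown above); the function name is made up:

```python
def yaml_to_overrides(text):
    # Turn flat "Key: value" lines into "Key=value" pairs for `sam deploy --parameter-overrides`
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or ':' not in line:
            continue  # skip blanks, comments, and non key-value lines
        key, value = line.split(':', 1)
        pairs.append(f"{key.strip()}={value.strip()}")
    return ' '.join(pairs)

sample = """RedshiftPort: 5439
RedshiftUser: username
SecurityGroupId: sg-xx"""
print(yaml_to_overrides(sample))
# RedshiftPort=5439 RedshiftUser=username SecurityGroupId=sg-xx
```

For anything beyond flat key-value pairs, use a real YAML parser instead of this line-splitting shortcut.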
Retrieve the URL from the output of the previous command. You can then run a cURL request, which should return the data in file format:
curl -X POST https://<your_url>/Prod/sql_statement/ \
-H "Content-Type: application/json" \
-d '{ "sql_statement": "SELECT * FROM customers LIMIT 10", "workgroup_name": "default-workgroup", "database_name": "pap-db" }'

ChatGPT Steps
Custom GPT Instructions
Once you've created a Custom GPT, copy the text below into the Instructions panel.
**Context**: You are an expert at writing Redshift SQL queries. You will initially retrieve the table schema and use it thoroughly. Every attribute, table name, and data type will be known to you.
**Instructions**:
1. No matter the user's question, start by running `runQuery` operation using this query: "SELECT table_name, column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = 'public' ORDER BY table_name, ordinal_position;" It will help you understand how to query the data. A CSV will be returned with all the attributes and their table. Make sure to read it fully and understand all available tables & their attributes before querying. You don't have to show this to the user.
2. Convert the user's question into a SQL statement that leverages the step above and run the `runQuery` operation on that SQL statement to confirm the query works. Let the user know which table you will use/query.
3. Execute the query and show the user the data. Show only the first few rows.
**Additional Notes**: If the user says "Let's get started", explain they can ask a question they want answered about data that we have access to. If the user has no ideas, suggest that we have transactions data they can query - ask if they want you to query that.
**Important**: Never make up a table name or table attribute. If you don't know, go back to the data you've retrieved to check what is available. If you think no table or attribute is available, then tell the user you can't perform this query for them.

OpenAPI Specification
Once you've created a Custom GPT, copy the text below into the Actions panel.
This expects a response that follows the file retrieval structure from our documentation, and takes in a query as a parameter to execute.
Make sure you follow the steps in the AWS Middleware cookbook to set up authentication.
Make sure to switch the function app name based on your function deployment.
openapi: 3.1.0
info:
  title: SQL Execution API
  description: API to execute SQL statements and return results as a file.
  version: 1.0.0
servers:
  - url: {your_function_url}/Prod
    description: Production server
paths:
  /sql_statement:
    post:
      operationId: executeSqlStatement
      summary: Executes a SQL statement and returns the result as a file.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                sql_statement:
                  type: string
                  description: The SQL statement to execute.
                  example: SELECT * FROM customers LIMIT 10
              required:
                - sql_statement
      responses:
        '200':
          description: The SQL query result as a JSON file.
          content:
            application/json:
              schema:
                type: object
                properties:
                  openaiFileResponse:
                    type: array
                    items:
                      type: object
                      properties:
                        name:
                          type: string
                          description: The name of the file.
                          example: query_result.json
                        mime_type:
                          type: string
                          description: The MIME type of the file.
                          example: application/json
                        content:
                          type: string
                          description: The base64 encoded content of the file.
                          format: byte
                          example: eyJrZXkiOiJ2YWx1ZSJ9
        '500':
          description: Error response
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: Error message.
                    example: Database query failed error details
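A consumer of the 200 response defined above has to parse the JSON body, check the file fields, and base64-decode the content. A short sketch of that client-side step, using a made-up sample body and only the standard library:

```python
import base64
import json

# Hypothetical response body shaped like the 200 schema above
body = json.dumps({
    "openaiFileResponse": [{
        "name": "query_result.csv",
        "mime_type": "text/csv",
        "content": base64.b64encode(b"id,amount\r\n1,9.5\r\n").decode("utf-8"),
    }]
})

def extract_files(body):
    # Validate the expected keys and decode each base64 file back to text
    files = json.loads(body)["openaiFileResponse"]
    out = {}
    for f in files:
        assert {"name", "mime_type", "content"} <= f.keys()
        out[f["name"]] = base64.b64decode(f["content"]).decode("utf-8")
    return out

files = extract_files(body)
print(files["query_result.csv"].splitlines()[0])  # id,amount
```

Running a check like this against the cURL output from the deployment step is a quick way to confirm the middleware returns files ChatGPT can ingest.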
Conclusion
You have now deployed a GPT that talks, in an authenticated way, to a middleware in AWS that can connect to Redshift. Users with access (in Cognito) can now query your database to perform data analysis tasks.
