创建一个Text2SQL代理
利用RAGFlow的RAG能力构建一个Text2SQL代理。由@TeslaZY贡献。
场景
Text2SQL代理桥接了自然语言处理(NLP)和结构化查询语言(SQL)之间的差距。其主要优势如下:
-
协助非技术用户使用SQL:并非所有用户都有SQL背景或了解查询中涉及的表结构。通过Text2SQL代理,用户可以用自然语言提出问题或请求数据,而无需深入了解数据库结构或SQL语法。
-
提高SQL开发效率:对于熟悉SQL的人来说,Text2SQL代理通过使用户能够快速构建复杂查询,而无需手动编写每个部分,从而简化了流程。
-
最小化错误:手动编写SQL查询容易出错,特别是对于复杂的查询或不熟悉数据库结构的用户。Text2SQL代理可以解释自然语言指令并生成准确的SQL查询,从而减少潜在的语法和逻辑错误。
-
提升数据分析能力:在商业智能和数据分析中,迅速从数据中获得洞察至关重要。Text2SQL代理有助于更直接和方便地从数据库中提取有价值的信息,从而帮助加速决策。
-
自动化和集成:Text2SQL代理可以集成到更大的系统中,以支持自动化工作流程,例如自动报告生成和数据监控。它还可以与其他服务和技术无缝集成,提供更丰富的应用可能性。
-
支持多种语言和多样表达:人们可以用多种方式表达相同的想法。一个有效的Text2SQL系统应该能够理解各种表达方式,并准确地将它们转换为SQL查询。
总之,Text2SQL代理旨在使数据库查询更加直观和用户友好,同时确保效率和准确性。它服务于广泛的用户,从完全非技术人员到经验丰富的数据分析师和开发人员。
然而,传统的Text2SQL解决方案通常需要模型微调,这在与RAG或Agent组件一起在企业环境中实施时,可能会显著增加部署和维护成本。RAGFlow基于RAG的Text2SQL利用现有的(已连接的)大型语言模型(LLM),允许与其他RAG/Agent组件无缝集成,而无需额外的微调模型。
食谱
所需组件列表:
程序
数据准备
数据库环境
Mysql-8.0.39
数据库表创建语句
SET NAMES utf8mb4;
-- ----------------------------
-- Table structure for Customers
-- ----------------------------
DROP TABLE IF EXISTS `Customers`;
CREATE TABLE `Customers` (
`CustomerID` int NOT NULL AUTO_INCREMENT,
`UserName` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`Email` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`PhoneNumber` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`CustomerID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- ----------------------------
-- Table structure for Products
-- ----------------------------
DROP TABLE IF EXISTS `Products`;
CREATE TABLE `Products` (
`ProductID` int NOT NULL AUTO_INCREMENT,
`ProductName` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`Description` text COLLATE utf8mb4_unicode_ci,
`Price` decimal(10,2) DEFAULT NULL,
`StockQuantity` int DEFAULT NULL,
PRIMARY KEY (`ProductID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- ----------------------------
-- Table structure for Orders
-- ----------------------------
DROP TABLE IF EXISTS `Orders`;
CREATE TABLE `Orders` (
`OrderID` int NOT NULL AUTO_INCREMENT,
`CustomerID` int DEFAULT NULL,
`OrderDate` date DEFAULT NULL,
`TotalPrice` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`OrderID`),
KEY `CustomerID` (`CustomerID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- ----------------------------
-- Table structure for OrderDetails
-- ----------------------------
DROP TABLE IF EXISTS `OrderDetails`;
CREATE TABLE `OrderDetails` (
`OrderDetailID` int NOT NULL AUTO_INCREMENT,
`OrderID` int DEFAULT NULL,
`ProductID` int DEFAULT NULL,
`UnitPrice` decimal(10,2) DEFAULT NULL,
`Quantity` int DEFAULT NULL,
`TotalPrice` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`OrderDetailID`),
KEY `OrderID` (`OrderID`),
KEY `ProductID` (`ProductID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
生成测试数据
START TRANSACTION;
INSERT INTO Customers (UserName, Email, PhoneNumber) VALUES
('Alice', 'alice@example.com', '123456789'),
('Bob', 'bob@example.com', '987654321'),
('Charlie', 'charlie@example.com', '112233445'),
('Diana', 'diana@example.com', '555666777'),
('Eve', 'eve@example.com', '999888777'),
('Frank', 'frank@example.com', '123123123'),
('Grace', 'grace@example.com', '456456456'),
('Hugo', 'hugo@example.com', '789789789'),
('Ivy', 'ivy@example.com', '321321321'),
('Jack', 'jack@example.com', '654654654');
INSERT INTO Products (ProductName, Description, Price, StockQuantity) VALUES
('Laptop', 'High performance laptop', 1200.00, 50),
('Smartphone', 'Latest model smartphone', 800.00, 100),
('Tablet', 'Portable tablet device', 300.00, 75),
('Headphones', 'Noise-cancelling headphones', 150.00, 200),
('Camera', 'Professional camera', 600.00, 30),
('Monitor', '24-inch Full HD monitor', 200.00, 45),
('Keyboard', 'Mechanical keyboard', 100.00, 150),
('Mouse', 'Ergonomic gaming mouse', 50.00, 250),
('Speaker', 'Wireless Bluetooth speaker', 80.00, 120),
('Router', 'Wi-Fi router with high speed', 120.00, 90);
INSERT INTO Orders (CustomerID, OrderDate, TotalPrice) VALUES
(1, '2024-01-15', 0),
(2, '2024-02-01', 0),
(3, '2024-03-05', 0),
(4, '2024-04-10', 0),
(5, '2024-05-15', 0),
(6, '2024-06-20', 0),
(7, '2024-07-25', 0),
(8, '2024-08-30', 0),
(9, '2024-09-05', 0),
(10, '2024-10-10', 0),
(1, '2024-11-15', 0),
(2, '2024-12-01', 0),
(3, '2024-01-05', 0),
(4, '2024-02-10', 0),
(5, '2024-03-15', 0),
(6, '2024-04-20', 0),
(7, '2024-05-25', 0),
(8, '2024-06-30', 0),
(9, '2024-07-05', 0),
(10, '2024-08-10', 0);
INSERT INTO OrderDetails (OrderID, ProductID, UnitPrice, Quantity, TotalPrice) VALUES
(1, 1, (SELECT Price FROM Products WHERE ProductID = 1), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 1)),
(1, 2, (SELECT Price FROM Products WHERE ProductID = 2), 1, (SELECT Price FROM Products WHERE ProductID = 2)),
(2, 3, (SELECT Price FROM Products WHERE ProductID = 3), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 3)),
(2, 4, (SELECT Price FROM Products WHERE ProductID = 4), 1, (SELECT Price FROM Products WHERE ProductID = 4)),
(3, 5, (SELECT Price FROM Products WHERE ProductID = 5), 1, (SELECT Price FROM Products WHERE ProductID = 5)),
(3, 6, (SELECT Price FROM Products WHERE ProductID = 6), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 6)),
(4, 7, (SELECT Price FROM Products WHERE ProductID = 7), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 7)),
(5, 8, (SELECT Price FROM Products WHERE ProductID = 8), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 8)),
(5, 9, (SELECT Price FROM Products WHERE ProductID = 9), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 9)),
(6, 10, (SELECT Price FROM Products WHERE ProductID = 10), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 10)),
(7, 2, (SELECT Price FROM Products WHERE ProductID = 2), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 2)),
(7, 8, (SELECT Price FROM Products WHERE ProductID = 8), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 8)),
(8, 1, (SELECT Price FROM Products WHERE ProductID = 1), 1, (SELECT Price FROM Products WHERE ProductID = 1)),
(8, 9, (SELECT Price FROM Products WHERE ProductID = 9), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 9)),
(8, 10, (SELECT Price FROM Products WHERE ProductID = 10), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 10)),
(9, 3, (SELECT Price FROM Products WHERE ProductID = 3), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 3)),
(9, 6, (SELECT Price FROM Products WHERE ProductID = 6), 1, (SELECT Price FROM Products WHERE ProductID = 6)),
(10, 4, (SELECT Price FROM Products WHERE ProductID = 4), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 4)),
(10, 7, (SELECT Price FROM Products WHERE ProductID = 7), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 7)),
(11, 5, (SELECT Price FROM Products WHERE ProductID = 5), 1, (SELECT Price FROM Products WHERE ProductID = 5)),
(11, 10, (SELECT Price FROM Products WHERE ProductID = 10), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 10)),
(12, 1, (SELECT Price FROM Products WHERE ProductID = 1), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 1)),
(12, 8, (SELECT Price FROM Products WHERE ProductID = 8), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 8)),
(13, 2, (SELECT Price FROM Products WHERE ProductID = 2), 1, (SELECT Price FROM Products WHERE ProductID = 2)),
(13, 9, (SELECT Price FROM Products WHERE ProductID = 9), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 9)),
(14, 3, (SELECT Price FROM Products WHERE ProductID = 3), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 3)),
(14, 6, (SELECT Price FROM Products WHERE ProductID = 6), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 6)),
(15, 4, (SELECT Price FROM Products WHERE ProductID = 4), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 4)),
(15, 7, (SELECT Price FROM Products WHERE ProductID = 7), 1, (SELECT Price FROM Products WHERE ProductID = 7)),
(16, 5, (SELECT Price FROM Products WHERE ProductID = 5), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 5)),
(16, 10, (SELECT Price FROM Products WHERE ProductID = 10), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 10)),
(17, 1, (SELECT Price FROM Products WHERE ProductID = 1), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 1)),
(17, 8, (SELECT Price FROM Products WHERE ProductID = 8), 1, (SELECT Price FROM Products WHERE ProductID = 8)),
(18, 2, (SELECT Price FROM Products WHERE ProductID = 2), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 2)),
(18, 9, (SELECT Price FROM Products WHERE ProductID = 9), 2, (SELECT Price * 2 FROM Products WHERE ProductID = 9)),
(19, 3, (SELECT Price FROM Products WHERE ProductID = 3), 3, (SELECT Price * 3 FROM Products WHERE ProductID = 3)),
(19, 6, (SELECT Price FROM Products WHERE ProductID = 6), 4, (SELECT Price * 4 FROM Products WHERE ProductID = 6)),
(20, 4, (SELECT Price FROM Products WHERE ProductID = 4), 1, (SELECT Price FROM Products WHERE ProductID = 4)),
(20, 7, (SELECT Price FROM Products WHERE ProductID = 7), 5, (SELECT Price * 5 FROM Products WHERE ProductID = 7));
-- Update Orders Table's TotalPrice
UPDATE Orders o
JOIN (
SELECT OrderID, SUM(TotalPrice) as order_total
FROM OrderDetails
GROUP BY OrderID
) od ON o.OrderID = od.OrderID
SET o.TotalPrice = od.order_total;
COMMIT;
配置知识库
对于RAGFlow基于RAG的Text2SQL,通常需要以下知识库:
- DDL: 数据库表创建语句。
- DB_Description: 表和列的详细描述。
- Q->SQL: 自然语言查询描述以及相应的SQL查询示例(问答对)。
然而,在专门的查询场景中,用户查询可能包含领域特定术语的缩写或同义词。如果用户引用了领域特定术语的同义词,系统可能无法生成正确的SQL查询。因此,建议引入同义词词典,以帮助代理生成更准确的SQL查询。
- TextSQL_Thesaurus: 一个涵盖领域特定术语及其同义词的词典。
配置DDL知识库
- DDL 文本内容如下:
CREATE TABLE Customers (
CustomerID int NOT NULL AUTO_INCREMENT,
UserName varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
Email varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PhoneNumber varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (CustomerID)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE Products (
ProductID int NOT NULL AUTO_INCREMENT,
ProductName varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
Description text COLLATE utf8mb4_unicode_ci,
Price decimal(10,2) DEFAULT NULL,
StockQuantity int DEFAULT NULL,
PRIMARY KEY (ProductID)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE Orders (
OrderID int NOT NULL AUTO_INCREMENT,
CustomerID int DEFAULT NULL,
OrderDate date DEFAULT NULL,
TotalPrice decimal(10,2) DEFAULT NULL,
PRIMARY KEY (OrderID),
KEY CustomerID (CustomerID)
) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE OrderDetails (
OrderDetailID int NOT NULL AUTO_INCREMENT,
OrderID int DEFAULT NULL,
ProductID int DEFAULT NULL,
UnitPrice decimal(10,2) DEFAULT NULL,
Quantity int DEFAULT NULL,
TotalPrice decimal(10,2) DEFAULT NULL,
PRIMARY KEY (OrderDetailID),
KEY OrderID (OrderID),
KEY ProductID (ProductID)
) ENGINE=InnoDB AUTO_INCREMENT=40 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
- 为DLL知识库设置块数据
配置DB_Description知识库
- DB_Description 文本的内容如下:
### Customers (Customer Information Table)
The Customers table records detailed information about different customers in the online store. Here is the meaning of each field within this table:
- CustomerID: A unique identifier for a customer, auto-incremented.
- UserName: The name used by the customer for logging into the online store or displayed on the site.
- Email: The email address of the customer, which can be used for account verification, password recovery, and order updates.
- PhoneNumber: The phone number of the customer, useful for contact purposes such as delivery notifications or customer service.
### Products (Product Information Table)
The Products table contains information about the products offered by the online store. Each field within this table represents:
- ProductID: A unique identifier for a product, auto-incremented.
- ProductName: The name of the product, such as laptop, smartphone, nounch, etc.
- Description: Detailed information about the product.
- Price: The selling price of the product, stored as a decimal value to accommodate currency formatting.
- StockQuantity: The quantity of the product available in stock.
### Orders (Order Information Table)
The Orders table tracks orders placed by customers. This table includes fields that denote:
- OrderID: A unique identifier for an order, auto-incremented.
- CustomerID: A foreign key that references the CustomerID in the Customers table, indicating which customer placed the order.
- OrderDate: The date when the order was placed.
- TotalPrice: The total price of all items in the order, calculated at the time of purchase.
### OrderDetails (Order Details Table)
The OrderDetails table provides detailed information about each item in an order. Fields within this table include:
- OrderDetailID: A unique identifier for each line item in an order, auto-incremented.
- OrderID: A foreign key that references the OrderID in the Orders table, linking the detail to a specific order.
- ProductID: A foreign key that references the ProductID in the Products table, specifying which product was ordered.
- UnitPrice: The price per unit of the product at the time of order.
- Quantity: The number of units of the product ordered.
- TotalPrice: The total price for this particular item in the order, calculated as UnitPrice * Quantity.
- 为DB_Description知识库设置块数据
配置 Q->SQL 知识库
- Q->SQL Excel 文档 QA.xlsx
- 将Q->SQL Excel文档上传到Q->SQL知识库,并通过解析设置块数据如下:
配置TextSQL_Thesaurus知识库
- TextSQL_Thesaurus 文本的内容如下:
###
Standard noun: StockQuantity
Synonyms: stock,stockpile,inventory
###
Standard noun: UserName
Synonyms: user name, user's name
###
Standard noun: Quantity
Synonyms: amount,number
###
Standard noun: Smartphone
Synonyms: phone, mobile phone, smart phone, mobilephone
###
Standard noun: ProductName
Synonyms: product name, product's name
###
Standard noun: tablet
Synonyms: pad,Pad
###
Standard noun: laptop
Synonyms: laptop computer,laptop pc
- 为TextSQL_Thesaurus知识库设置块数据
构建代理
- 使用Text2SQL代理模板创建一个代理。
- 进入Agent的配置页面以开始设置过程。
- 创建一个检索节点并将其命名为Thesaurus;创建一个ExeSQL节点。
- 配置Q->SQL、DDL、DB_Description和TextSQL_Thesaurus知识库。请参考以下内容:
- Configure the Generate node, named LLM‘s prompt:
- 将此内容添加到模板提供的提示中,以向LLM提供同义词库内容:
## 您可以使用以下同义词库语句。例如,如果我的请求来自同义词,您必须使用标准名词生成SQL。使用过去问题的回答来指导您:{sql_thesaurus}。 - 确保键和组件ID之间的映射配置正确。
- 配置结果应如下所示:
- 将此内容添加到模板提供的提示中,以向LLM提供同义词库内容:
- 配置ExecSQL节点,填写MySQL数据库的配置信息。
- 在Begin组件中设置一个开启器,例如:
你好!我是你的电子产品在线商店业务数据分析助手。我能为你做什么?
运行和测试代理
- 点击运行按钮以启动代理。
- 输入问题:
Help me summarize stock quantities for each product
- 点击发送按钮将问题发送给代理。
- 代理将响应以下内容:
调试代理
自0.15.0版本以来,ragflow引入了Agent组件/工具的逐步执行功能,为调试和测试提供了强大的机制。让我们探讨如何进行逐步运行。
-
要进入测试运行模式,您可以点击组件上方的三角形图标,或者通过点击组件本身进入组件的详细信息页面。进入后,选择组件详细信息右上角的测试运行按钮。
-
输入一个在Q->SQL知识库中不存在但性质相似的问题。 点击运行按钮以接收组件的输出。
Find all customers who has bought a mobile phone
3. 如图所示,从Q->SQL知识库中未检索到匹配信息,但数据库中存在类似问题。请相应地调整Rerank模型、“相似度阈值”或“关键词相似度权重”以返回相关内容。
-
观察LLM节点和ExeSQL节点的输入和输出。
-
代理现在生成了正确的SQL查询结果。
-
对于关于“手机”的查询,代理成功使用“智能手机”生成了适当的SQL查询。这展示了同义词词典如何指导LLM生成准确的SQL查询。
通过这个,你可能会欣赏到Step Run的能力。它无疑有助于构建更有效的代理。
故障排除
总计:0 数据库中没有记录!
- 确认SQL是否正确。如果是,检查数据库的连接信息。
- 如果连接信息正确,可能数据库中实际上没有与您的查询匹配的数据。
注意事项
在垂直领域的实际生产场景中,有效实施Text2SQL需要考虑以下几个关键因素:
-
处理DDL和DB_Description:处理数据定义语言(DDL)语句和数据库描述需要丰富的调试经验。根据实际的业务环境,辨别哪些信息是关键的,哪些可能是冗余的,这一点至关重要。这包括确定表属性的相关性,如主键、外键、索引等。
-
维护高质量的QA数据:确保问答数据的高标准显著有助于LLM生成更准确的SQL查询。
-
管理领域特定同义词:专业领域的同义词可以极大地影响SQL查询条件的生成。因此,维护一个广泛且最新的同义词库对于缓解这一挑战至关重要。
-
促进用户反馈:在Agent中实施反馈机制,允许用户提供正确的SQL查询。管理员随后可以利用此反馈自动生成相应的QA数据,减少手动维护的需求。
总之,从Text2SQL获得高质量输出仍然依赖于高质量的输入。构建强大的问答数据集是优化RAGFlow的Text2SQL能力的核心。