vanna

Source Code

View the source code on GitHub: https://github.com/vanna-ai/vanna

Basic Usage

Getting an API Key

import vanna as vn
api_key = vn.get_api_key('my-email@example.com')
vn.set_api_key(api_key)

Setting the Model

vn.set_model('chinook')

Asking a Question

vn.ask(question='What are the top 10 artists by sales?')

vn.ask(...) is a convenience wrapper around vn.generate_sql(...), vn.run_sql(...), vn.generate_plotly_code(...), vn.get_plotly_figure(...), and vn.generate_followup_questions(...).
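The composition that vn.ask(...) wraps can be sketched in plain Python. This is a minimal sketch, not Vanna's actual implementation: `ask_pipeline` is a hypothetical name, and the five steps are passed in as callables (stubbed with lambdas here) so the data flow can be shown without a model or a database.

```python
from typing import Any, Callable, List, Tuple


def ask_pipeline(
    question: str,
    generate_sql: Callable[[str], str],
    run_sql: Callable[[str], Any],
    generate_plotly_code: Callable[[str, str, Any], str],
    get_plotly_figure: Callable[[str, Any], Any],
    generate_followup_questions: Callable[[str, Any], List[str]],
) -> Tuple[str, Any, Any, List[str]]:
    """Hypothetical sketch: chain the steps that vn.ask(...) bundles together."""
    sql = generate_sql(question)                           # question -> SQL
    df = run_sql(sql)                                      # SQL -> result table
    plotly_code = generate_plotly_code(question, sql, df)  # results -> chart code
    fig = get_plotly_figure(plotly_code, df)               # chart code -> figure
    followups = generate_followup_questions(question, df)  # results -> new questions
    return sql, df, fig, followups


# Stub implementations stand in for the real vn.* functions.
sql, df, fig, followups = ask_pipeline(
    "What is the average salary of employees?",
    generate_sql=lambda q: "SELECT AVG(salary) FROM employees",
    run_sql=lambda s: [(150.0,)],
    generate_plotly_code=lambda q, s, d: "fig = px.bar(df)",
    get_plotly_figure=lambda code, d: {"code": code, "rows": len(d)},
    generate_followup_questions=lambda q, d: ["What is the maximum salary?"],
)
```

Passing the steps in as arguments keeps the sketch testable; the real wrapper calls the configured model and database instead.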

For a runnable notebook where you can ask questions, see here.

Training

There are three main types of training data you can add to a model: SQL, DDL, and documentation.

# DDL Statements
vn.train(ddl='CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)')

# Documentation
vn.train(documentation='Our organization\'s definition of sales is the discount price of an item multiplied by the quantity sold.')

# SQL
vn.train(sql='SELECT AVG(salary) FROM employees')

vn.train(...) is a convenience wrapper around vn.add_sql(...), vn.add_ddl(...), and vn.add_documentation(...).
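The routing that vn.train(...) performs for these three keyword arguments can be sketched as a small dispatcher. This is a hypothetical sketch (`train_dispatch` and the fake add functions are illustrative names, not part of Vanna): it only shows that each keyword maps to one add_* call, and it passes a missing question through as None rather than guessing how Vanna fills it in.

```python
from typing import Callable, List, Optional, Tuple


def train_dispatch(
    add_sql: Callable[[Optional[str], str], bool],
    add_ddl: Callable[[str], bool],
    add_documentation: Callable[[str], bool],
    question: Optional[str] = None,
    sql: Optional[str] = None,
    ddl: Optional[str] = None,
    documentation: Optional[str] = None,
) -> bool:
    """Hypothetical sketch: route one kind of training data to its add_* function."""
    if sql is not None:
        return add_sql(question, sql)   # question/SQL pair (question may be None here)
    if ddl is not None:
        return add_ddl(ddl)             # schema definition
    if documentation is not None:
        return add_documentation(documentation)  # free-text business context
    raise ValueError("Pass one of sql=, ddl=, or documentation=")


calls: List[Tuple[str, ...]] = []


def fake_add_sql(question: Optional[str], sql: str) -> bool:
    calls.append(("sql", str(question), sql))
    return True


def fake_add_ddl(ddl: str) -> bool:
    calls.append(("ddl", ddl))
    return True


def fake_add_documentation(documentation: str) -> bool:
    calls.append(("documentation", documentation))
    return True


train_dispatch(fake_add_sql, fake_add_ddl, fake_add_documentation,
               ddl="CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)")
train_dispatch(fake_add_sql, fake_add_ddl, fake_add_documentation,
               sql="SELECT AVG(salary) FROM employees")
```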

For a runnable notebook where you can train a model, see here.

Glossary

Prefix	Definition	Examples
`vn.set_`	Sets a variable for the current session	[`vn.set_model(...)`][set_model] [`vn.set_api_key(...)`][set_api_key]
`vn.get_`	Performs a read-only operation	[`vn.get_models()`][get_models]
`vn.add_`	Adds something to the model	[`vn.add_sql(...)`][add_sql] [`vn.add_ddl(...)`][add_ddl]
`vn.generate_`	Uses AI to generate something based on the information in the model	[`vn.generate_sql(...)`][generate_sql] [`vn.generate_explanation()`][generate_explanation]
`vn.run_`	Runs code (SQL or Plotly)	[`vn.run_sql`][run_sql]
`vn.remove_`	Removes something from the model	[`vn.remove_training_data`][remove_training_data]
`vn.update_`	Updates something in the model	[`vn.update_model_visibility(...)`][update_model_visibility]
`vn.connect_`	Connects to a database	[`vn.connect_to_snowflake(...)`][connect_to_snowflake]

Permissions

By default, a model is private when you create it. You can add members or admins to your model, or make it public.

User role	Public model: Use	Public model: Train	Private model: Use	Private model: Train
Non-member	✅	❌	❌	❌
Member	✅	❌	✅	❌
Admin	✅	✅	✅	✅
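To make the permissions table easier to check, here is a small lookup that encodes it row by row. The role, visibility, and action strings are illustrative labels chosen for this sketch, not values used by the Vanna API.

```python
# Encodes the permissions table above: (role, visibility) -> allowed actions.
PERMISSIONS = {
    ("non-member", "public"): {"use"},
    ("non-member", "private"): set(),
    ("member", "public"): {"use"},
    ("member", "private"): {"use"},
    ("admin", "public"): {"use", "train"},
    ("admin", "private"): {"use", "train"},
}


def is_allowed(role: str, visibility: str, action: str) -> bool:
    """Return True if `role` may perform `action` ("use" or "train") on a model with this visibility."""
    return action in PERMISSIONS[(role, visibility)]
```

Only admins can train, and a private model is invisible to non-members.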

Open Source and Extending

Vanna.AI is open source and extensible. If you want to use Vanna without a server, see the example here.

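Below is an example of where the various functions are implemented in the codebase when you use the default "local" version of Vanna. vanna.base.VannaBase is the base class that provides the vanna.base.VannaBase.ask and vanna.base.VannaBase.train functions. These functions rely on abstract methods that are implemented in the subclasses vanna.openai_chat.OpenAI_Chat and vanna.chromadb_vector.ChromaDB_VectorStore. vanna.openai_chat.OpenAI_Chat uses the OpenAI API to generate SQL and Plotly code. vanna.chromadb_vector.ChromaDB_VectorStore uses ChromaDB to store training data and generate embeddings.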

If you want to use Vanna with other LLMs or databases, you can create your own subclass of vanna.base.VannaBase and implement the abstract methods.
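The pattern described here, where a base class supplies high-level logic and separate subclasses supply the storage and LLM halves, can be illustrated with Python's abc module. This is a self-contained sketch under stated assumptions: `MiniVannaBase`, `MyVectorStore`, `MyChat`, and `MyVanna` are hypothetical names, only two method names (generate_embedding and submit_prompt, both from the flowchart) are modeled, and the real vanna.base.VannaBase has many more abstract methods.

```python
from abc import ABC, abstractmethod
from typing import List


class MiniVannaBase(ABC):
    """Hypothetical stand-in for vanna.base.VannaBase, reduced to two abstract methods."""

    @abstractmethod
    def generate_embedding(self, data: str) -> List[float]: ...

    @abstractmethod
    def submit_prompt(self, prompt: str) -> str: ...

    def ask(self, question: str) -> str:
        # Concrete base-class logic relies only on the abstract methods.
        _ = self.generate_embedding(question)
        return self.submit_prompt(f"Write SQL for: {question}")


class MyVectorStore(MiniVannaBase):
    # Storage half: a toy embedding (character codes) instead of a real vector store.
    def generate_embedding(self, data: str) -> List[float]:
        return [float(ord(c)) for c in data[:8]]


class MyChat(MiniVannaBase):
    # LLM half: echoes the prompt instead of calling a real model.
    def submit_prompt(self, prompt: str) -> str:
        return f"-- response to: {prompt}"


class MyVanna(MyVectorStore, MyChat):
    """Combines both halves, so every abstract method has an implementation."""


vanna_like = MyVanna()
print(vanna_like.ask("What are the top 10 artists by sales?"))
```

Because each half is an incomplete subclass on its own, Python refuses to instantiate MyVectorStore or MyChat directly; only the combined class can be created.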

flowchart
    subgraph VannaBase
        ask
        train
    end

    subgraph OpenAI_Chat
        get_sql_prompt
        submit_prompt
        generate_question
        generate_plotly_code
    end

    subgraph ChromaDB_VectorStore
        generate_embedding
        add_question_sql
        add_ddl
        add_documentation
        get_similar_question_sql
        get_related_ddl
        get_related_documentation
    end
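get_similar_question_sql retrieves previously trained question/SQL pairs that resemble a new question. The sketch below illustrates only the retrieval idea, using a toy bag-of-words vector and cosine similarity in place of real embeddings; `embed`, `cosine`, and `most_similar` are hypothetical helpers, and ChromaDB's actual embedding model works differently.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real store uses a learned embedding model.
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(question: str, trained: Dict[str, str], k: int = 1) -> List[Tuple[str, str]]:
    """Return the k stored (question, sql) pairs closest to `question`."""
    q = embed(question)
    ranked = sorted(trained.items(), key=lambda item: cosine(q, embed(item[0])), reverse=True)
    return ranked[:k]


trained_pairs = {
    "What is the average salary of employees?": "SELECT AVG(salary) FROM employees",
    "How many artists are there?": "SELECT COUNT(*) FROM artists",
}
result = most_similar("What is the average salary in the company?", trained_pairs)
```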

API Reference

api_key: Optional[str] = None

fig_as_img: bool = False

run_sql: Optional[Callable[[str], pandas.core.frame.DataFrame]] = None

Example:

vn.run_sql = lambda sql: pd.read_sql(sql, engine)

Sets the SQL-to-DataFrame function for Vanna.AI. This is used by the [vn.ask(...)][ask] function. Instead of setting it directly, you can also set it with [vn.connect_to_snowflake(...)][connect_to_snowflake].
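As a concrete sketch of such a function, the example below builds an in-memory SQLite database with the standard-library sqlite3 module and wraps pandas.read_sql. The employees table and its rows are invented for illustration; with Vanna installed you would then register the function with `vn.run_sql = run_sql`.

```python
import sqlite3

import pandas as pd

# Illustrative in-memory database; the table and rows are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 200)],
)
conn.commit()


def run_sql(sql: str) -> pd.DataFrame:
    """Run a SQL string against the connection and return the rows as a DataFrame."""
    return pd.read_sql(sql, conn)


# With Vanna: vn.run_sql = run_sql
df = run_sql("SELECT AVG(salary) AS avg_salary FROM employees")
```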

def get_api_key(email: str, otp_code: Optional[str] = None) -> str:

Example:

vn.get_api_key(email="my-email@example.com")

Log in to the Vanna.AI API.

Args:

email (str): The email address to log in with.
otp_code (Union[str, None]): The OTP code to log in with. If None, an OTP code will be sent to the email address.

Returns:

str: The API key.

def set_api_key(key: str) -> None:

Sets the API key for Vanna.AI.

Example:

api_key = vn.get_api_key(email="my-email@example.com")
vn.set_api_key(api_key)

Args:

key (str): The API key.

def get_models() -> List[str]:

Example:

models = vn.get_models()

Lists the models that the user is a member of.

Returns:

List[str]: A list of model names.

def create_model(model: str, db_type: str) -> bool:

Example:

vn.create_model(model="my-model", db_type="postgres")

Creates a new model.

Args:

model (str): The name of the model to create.
db_type (str): The type of database to use for the model. This can be "Snowflake", "BigQuery", "Postgres", or anything else.

Returns:

bool: True if the model was created successfully, False otherwise.

def add_user_to_model(model: str, email: str, is_admin: bool) -> bool:

Example:

vn.add_user_to_model(model="my-model", email="user@example.com", is_admin=False)

Adds a user to a model.

Args:

model (str): The name of the model to add the user to.
email (str): The email address of the user to add.
is_admin (bool): Whether or not the user should be an admin.

Returns:

bool: True if the user was added successfully, False otherwise.

def update_model_visibility(public: bool) -> bool:

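Example:

vn.update_model_visibility(public=True)

Sets the visibility of the current model. If the model is public, anyone can see it. Otherwise, only members of the model can see it.

Args:

public (bool): Whether or not the model should be publicly visible.

Returns:

bool: True if the model visibility was set successfully, False otherwise.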

def set_model(model: str):

Sets the model to use with the Vanna.AI API.

Example:

vn.set_model("my-model")

Args:

model (str): The name of the model to use.

def add_sql(question: str, sql: str, tag: Optional[str] = 'Manually Trained') -> bool:

Adds a question and its corresponding SQL query to the model's training data. The recommended way to call this is with [vn.train(sql=...)][train].

Example:

vn.add_sql(
    question="What is the average salary of employees?",
    sql="SELECT AVG(salary) FROM employees"
)

Args:

question (str): The question to store.
sql (str): The SQL query to store.
tag (Union[str, None]): A tag to associate with the question and SQL query.

Returns:

bool: True if the question and SQL query were stored successfully, False otherwise.

def add_ddl(ddl: str) -> bool:

Adds a DDL statement to the model's training data.

Example:

vn.add_ddl(
    ddl="CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)"
)

Args:

ddl (str): The DDL statement to store.

Returns:

bool: True if the DDL statement was stored successfully, False otherwise.
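One way to collect DDL for vn.add_ddl(...) or vn.train(ddl=...) is to read it back from the database itself. SQLite keeps the CREATE TABLE text in its sqlite_master table, so the sketch below gathers it with the standard-library sqlite3 module. The `ddl_statements` helper is hypothetical, and other databases expose their DDL through different catalog tables or commands.

```python
import sqlite3
from typing import List


def ddl_statements(conn: sqlite3.Connection) -> List[str]:
    """Return the CREATE TABLE statements stored in an SQLite database's schema table."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)")

for ddl in ddl_statements(conn):
    # With Vanna: vn.add_ddl(ddl=ddl) or vn.train(ddl=ddl)
    print(ddl)
```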

def add_documentation(documentation: str) -> bool:

Adds documentation to the model's training data.

Example:

vn.add_documentation(
    documentation="Our organization's definition of sales is the discount price of an item multiplied by the quantity sold."
)

Args:

documentation (str): The documentation string to store.

Returns:

bool: True if the documentation string was stored successfully, False otherwise.

@dataclass

class TrainingPlanItem:

TrainingPlanItem(item_type: str, item_group: str, item_name: str, item_value: str)

item_type: str

item_group: str

item_name: str

item_value: str

ITEM_TYPE_SQL = 'sql'

ITEM_TYPE_DDL = 'ddl'

ITEM_TYPE_IS = 'is'

class TrainingPlan:

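A class that represents a training plan. You can inspect what is in it and remove any items you don't want to train on.

Example:

plan = vn.get_training_plan()

plan.get_summary()

TrainingPlan(plan: List[TrainingPlanItem])

def get_summary(self) -> List[str]:

Example:

plan = vn.get_training_plan()

plan.get_summary()

Gets a summary of the training plan.

Returns:

List[str]: A list of strings describing the training plan.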

def remove_item(self, item: str):

Example:

plan = vn.get_training_plan()

plan.remove_item("Train on SQL: What is the average salary of employees?")

Removes an item from the training plan.

Args:

item (str): The item to remove.
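The inspect-then-prune workflow of TrainingPlan can be sketched with a small stand-in. `PlanItem` and `Plan` are hypothetical classes that mirror the four TrainingPlanItem fields and the get_summary/remove_item methods documented above; the summary string format is an assumption modeled on the "Train on SQL: ..." string in the remove_item example, not Vanna's actual formatting code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanItem:
    """Hypothetical mirror of TrainingPlanItem's four fields."""
    item_type: str   # e.g. "sql", "ddl", or "is"
    item_group: str
    item_name: str
    item_value: str

    def summary(self) -> str:
        # Assumed format, modeled on the remove_item example above.
        labels = {"sql": "SQL", "ddl": "DDL", "is": "Information Schema"}
        return f"Train on {labels.get(self.item_type, self.item_type)}: {self.item_name}"


class Plan:
    """Hypothetical stand-in for TrainingPlan: holds items and lets you drop some before training."""

    def __init__(self, items: List[PlanItem]):
        self._items = list(items)

    def get_summary(self) -> List[str]:
        return [item.summary() for item in self._items]

    def remove_item(self, item: str) -> None:
        # Drop every item whose summary string matches exactly.
        self._items = [i for i in self._items if i.summary() != item]


plan = Plan([
    PlanItem("sql", "", "What is the average salary of employees?",
             "SELECT AVG(salary) FROM employees"),
    PlanItem("ddl", "public", "employees",
             "CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)"),
])
plan.remove_item("Train on SQL: What is the average salary of employees?")
```

Matching on the summary string keeps the removal call identical to what the user sees in get_summary().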

def get_training_plan_postgres( filter_databases: Optional[List[str]] = None, filter_schemas: Optional[List[str]] = None, include_information_schema: bool = False, use_historical_queries: bool = True) -> TrainingPlan:

def get_training_plan_generic(df) -> TrainingPlan:

def get_training_plan_experimental( filter_databases: Optional[List[str]] = None, filter_schemas: Optional[List[str]] = None, include_information_schema: bool = False, use_historical_queries: bool = True) -> TrainingPlan:

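Experimental: This method is experimental and may change in future versions.

Gets a training plan based on the metadata in the database. Currently this only works with Snowflake.

Example: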

plan = vn.get_training_plan_experimental(filter_databases=["employees"], filter_schemas=["public"])

vn.train(plan=plan)

def train( question: str = None, sql: str = None, ddl: str = None, documentation: str = None, json_file: str = None, sql_file: str = None, plan: TrainingPlan = None) -> bool:

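Example:

vn.train()

Trains Vanna.AI on a question and its corresponding SQL query. If you call it with no arguments, it checks whether you are connected to a database and tries to train on that database's metadata. If you call it with the sql argument, it is equivalent to [add_sql()][add_sql]. If you call it with the ddl argument, it is equivalent to [add_ddl()][add_ddl]. If you call it with the documentation argument, it is equivalent to [add_documentation()][add_documentation]. It can also take a JSON file path or an SQL file path to train on a batch of questions and SQL queries or on a series of SQL queries, respectively. You can also pass a [TrainingPlan][TrainingPlan] object; use [vn.get_training_plan_experimental()][get_training_plan_experimental] to get one.

Args: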

question (str): The question to train on.
sql (str): The SQL query to train on.
sql_file (str): The SQL file path.
json_file (str): The JSON file path.
ddl (str): The DDL statement.
documentation (str): The documentation to train on.
plan (TrainingPlan): The training plan to train on.

def flag_sql_for_review( question: str, sql: Optional[str] = None, error_msg: Optional[str] = None) -> bool:

Example:

vn.flag_sql_for_review(question="What is the average salary of employees?")

Flags a question and its corresponding SQL query for review. The flag is shown in [vn.get_all_questions()][get_all_questions].

Args:

question (str): The question to flag.
sql (str): The SQL query to flag.
error_msg (str): The error message to flag.

Returns:

bool: True if the question and SQL query were flagged successfully, False otherwise.

def remove_sql(question: str) -> bool:

Removes a question and its corresponding SQL query from the model's training data.

Example:

vn.remove_sql(question="What is the average salary of employees?")

Args:

question (str): The question to remove.

def remove_training_data(id: str) -> bool:

Removes training data from the model.

Example:

vn.remove_training_data(id="1-ddl")

Args:

id (str): The ID of the training data to remove.

def generate_sql(question: str) -> str:

Example:

vn.generate_sql(question="What is the average salary of employees?")
# SELECT AVG(salary) FROM employees

Generates an SQL query using the Vanna.AI API.

Args:

question (str): The question to generate an SQL query for.

Returns:

str or None: The SQL query, or None if an error occurred.

def get_related_training_data(question: str) -> vanna.types.TrainingData:

Example:

training_data = vn.get_related_training_data(question="What is the average salary of employees?")

Gets the training data related to a question.

Args:

question (str): The question to get related training data for.

Returns:

TrainingData or None: The related training data, or None if an error occurred.

def generate_meta(question: str) -> str:

Example:

vn.generate_meta(question="What tables are in the database?")
# Information about the tables in the database

Generates an answer about the database metadata using the Vanna.AI API.

Args:

question (str): The question to generate an answer for.

Returns:

str or None: The answer, or None if an error occurred.

def generate_followup_questions(question: str, df: pandas.core.frame.DataFrame) -> List[str]:

Example:

vn.generate_followup_questions(question="What is the average salary of employees?", df=df)
# ['What is the average salary of employees in the Sales department?', 'What is the average salary of employees in the Engineering department?', ...]

Generates follow-up questions using the Vanna.AI API.

Args:

question (str): The question to generate follow-up questions for.
df (pd.DataFrame): The DataFrame to generate follow-up questions for.

Returns:

List[str] or None: The follow-up questions, or None if an error occurred.

def generate_questions() -> List[str]:

Example:

vn.generate_questions()
# ['What is the average salary of employees?', 'What is the total salary of employees?', ...]

Generates questions using the Vanna.AI API.

Returns:

List[str] or None: The list of questions, or None if an error occurred.

def ask( question: Optional[str] = None, print_results: bool = True, auto_train: bool = True, generate_followups: bool = True) -> Optional[Tuple[Optional[str], Optional[pandas.core.frame.DataFrame], Optional[plotly.graph_objs._figure.Figure], Optional[List[str]]]]:

Example:

# RECOMMENDED IN A NOTEBOOK:
sql, df, fig, followup_questions = vn.ask()


sql, df, fig, followup_questions = vn.ask(question="What is the average salary of employees?")
# SELECT AVG(salary) FROM employees

Asks a question using the Vanna.AI API. This generates an SQL query, runs it, and returns the results as a DataFrame and a Plotly figure. If print_results is set to True, the SQL, DataFrame, and figure are printed to the screen instead of being returned.

Args:

question (str): The question to ask. If None, you will be prompted to enter a question.
print_results (bool): Whether to print the SQL query and results.
auto_train (bool): Whether to automatically train the model if the SQL query is incorrect.
generate_followups (bool): Whether to generate follow-up questions.

Returns:

str or None: The SQL query, or None if an error occurred.
pd.DataFrame or None: The results of the SQL query, or None if an error occurred.
plotly.graph_objs.Figure or None: The Plotly figure, or None if an error occurred.
List[str] or None: The follow-up questions, or None if an error occurred.
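Because the return type of ask is Optional[Tuple[...]], the whole tuple may be None, and unpacking it straight into four names would then raise a TypeError. The hypothetical helper below (`unpack_ask_result` is an illustrative name, not part of Vanna) normalizes the result so callers can always unpack four values; it is a defensive sketch, not a description of when Vanna actually returns None.

```python
from typing import Any, List, Optional, Tuple

# Shape taken from the ask(...) signature above; DataFrame and Figure are typed as Any here.
AskResult = Optional[Tuple[Optional[str], Any, Any, Optional[List[str]]]]


def unpack_ask_result(result: AskResult) -> Tuple[Optional[str], Any, Any, List[str]]:
    """Always return (sql, df, fig, followups), substituting None/[] when values are missing."""
    if result is None:
        # The whole tuple can be None according to the return type.
        return None, None, None, []
    sql, df, fig, followups = result
    return sql, df, fig, followups or []


# With Vanna:
# sql, df, fig, followups = unpack_ask_result(vn.ask(question="...", print_results=False))
```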

def generate_plotly_code( question: Optional[str], sql: Optional[str], df: pandas.core.frame.DataFrame, chart_instructions: Optional[str] = None) -> str:

Example:

vn.generate_plotly_code(
    question="What is the average salary of employees?",
    sql="SELECT AVG(salary) FROM employees",
    df=df
)
# fig = px.bar(df, x="name", y="salary")

Generates Plotly code using the Vanna.AI API.

Args:

question (str): The question to generate Plotly code for.
sql (str): The SQL query to generate Plotly code for.
df (pd.DataFrame): The dataframe to generate Plotly code for.
chart_instructions (str): Optional instructions for how to plot the chart.

Returns:

str or None: The Plotly code, or None if an error occurred.

def get_plotly_figure( plotly_code: str, df: pandas.core.frame.DataFrame, dark_mode: bool = True) -> plotly.graph_objs._figure.Figure:

Example:

fig = vn.get_plotly_figure(
    plotly_code="fig = px.bar(df, x='name', y='salary')",
    df=df
)
fig.show()

Gets a Plotly figure from a dataframe and Plotly code.

Args:

df (pd.DataFrame): The dataframe to use.
plotly_code (str): The Plotly code to use.

Returns:

plotly.graph_objs.Figure: The Plotly figure.

def get_results(cs, default_database: str, sql: str) -> pandas.core.frame.DataFrame:

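Deprecated. Use vn.run_sql instead. Runs an SQL query and returns the results as a pandas DataFrame. This is just a helper function and does not use the Vanna.AI API.

Args:

cs: Snowflake connection cursor.
default_database (str): The default database to use.
sql (str): The SQL query to execute.

Returns:

pd.DataFrame: The results of the SQL query.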

def generate_explanation(sql: str) -> str:

Example:

vn.generate_explanation(sql="SELECT * FROM students WHERE name = 'John Doe'")
# 'This query selects all columns from the students table where the name is John Doe.'

Generates an explanation of an SQL query using the Vanna.AI API.

Args:

sql (str): The SQL query to generate an explanation for.

Returns:

str or None: The explanation, or None if an error occurred.

def generate_question(sql: str) -> str:

Example:

vn.generate_question(sql="SELECT * FROM students WHERE name = 'John Doe'")
# 'What is the name of the student?'

Generates a question from an SQL query using the Vanna.AI API.

Args:

sql (str): The SQL query to generate a question for.

Returns:

str or None: The question, or None if an error occurred.

def get_all_questions() -> pandas.core.frame.DataFrame:

Gets a list of questions from the Vanna.AI API.

Example:

questions = vn.get_all_questions()

Returns:

pd.DataFrame or None: The list of questions, or None if an error occurred.

def get_training_data() -> pandas.core.frame.DataFrame:

Gets the training data for the current model.

Example:

training_data = vn.get_training_data()

Returns:

pd.DataFrame or None: The training data, or None if an error occurred.

def connect_to_sqlite(url: str):

Connects to an SQLite database. This is just a helper function to set [vn.run_sql][run_sql].

Args:

url (str): The URL of the database to connect to.

Returns:

None

def connect_to_snowflake( account: str, username: str, password: str, database: str, schema: Optional[str] = None, role: Optional[str] = None):

Connects to Snowflake using the Snowflake connector. This is just a helper function to set [vn.run_sql][run_sql].

Example:

import snowflake.connector

vn.connect_to_snowflake(
    account="myaccount",
    username="myusername",
    password="mypassword",
    database="mydatabase",
    role="myrole",
)

Args:

account (str): The Snowflake account name.
username (str): The Snowflake username.
password (str): The Snowflake password.
database (str): The default database to use.
schema (Union[str, None], optional): The schema to use. Defaults to None.
role (Union[str, None], optional): The role to use. Defaults to None.

def connect_to_postgres( host: str = None, dbname: str = None, user: str = None, password: str = None, port: int = None):

Connects to Postgres using the psycopg2 connector. This is just a helper function to set [vn.run_sql][run_sql].

Example:

import psycopg2  # requires the psycopg2 package
vn.connect_to_postgres(
    host="myhost",
    dbname="mydatabase",
    user="myuser",
    password="mypassword",
    port=5432
)

Args:

host (str): The Postgres host.
dbname (str): The Postgres database name.
user (str): The Postgres user.
password (str): The Postgres password.
port (int): The Postgres port.

def connect_to_bigquery(cred_file_path: str = None, project_id: str = None):

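Connects to Google Cloud BigQuery using the BigQuery connector. This is just a helper function to set [vn.run_sql][run_sql].

Example:

from google.cloud import bigquery  # requires the google-cloud-bigquery package
vn.connect_to_bigquery(
    project_id="myprojectid",
    cred_file_path="path/to/credentials.json",
)

Args:

project_id (str): The Google Cloud project ID.
cred_file_path (str): The path to the Google Cloud credentials file.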
