SQL示例#
从SQL数据库中的第一个表获取并绘制数据
使用Graphistry预装的SQL ODBC驱动程序
默认:可视化Graphistry中使用的模式迁移表
展示了几种可视化模式 + 一个用于 sql->交互式可视化的便捷函数
尝试:修改指定的行以更改以可视化任何其他表
进一步阅读:- UI Guide - CSV上传笔记本应用
设置#
Graphistry#
[ ]:
import graphistry
#pip install graphistry -q
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
SQL连接字符串#
使用您自己的数据库连接字符串进行修改
对于更频繁的使用和共享,请参阅示例,了解如何在会话之间重用凭据的同时隐藏笔记本中的凭据
[ ]:
user = "graphistry"
pwd = "password"
server = "postgres:5432"
##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in
#import json
#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:
# cfg = json.load(json_file)
# user = cfg['user']
# pwd = cfg['pwd']
# server = cfg['server']
#
## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:
#import json
#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:
# json.dump({
# "user": "graphistry",
# "pwd": "password",
# "server": "postgres:5432"
# }, outfile)
## Delete ^^^ after use
db_string = "postgres://" + user + ":" + pwd + "@" + server
### Take care not to save a print of the result
[ ]:
# OPTIONAL: Install ODBC drivers in other environments:
# ! apt-get update
# ! apt-get install -y g++ unixodbc unixodbc-dev
# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5
连接到数据库#
[ ]:
import pandas as pd
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String
from sqlalchemy.orm import sessionmaker
[ ]:
engine = create_engine(db_string)
Session = sessionmaker(bind=engine)
session = Session()
检查可用表格#
[ ]:
table_names = engine.table_names()
", ".join(table_names)
可选:修改以选择您自己的表!#
[ ]:
if 'django_migrations' in table_names:
table = 'django_migrations'
else:
table = table_names[0]
初始化可视化:获取数据#
[ ]:
result = engine.execute("SELECT * FROM \"" + table + "\" LIMIT 1000")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print("table", table, '# rows', len(df))
df.sample(min(3, len(df)))
绘图#
几种变体:1. 将每行和单元格值视为节点,并连接行<>单元格值 2. 将每个单元格值视为节点,并在它们出现在同一行时连接所有单元格值 3. 将每个单元格值视为边,并指定在哪些列上连接值 4. 使用显式的节点/边表
1. 将每一行和单元格值视为一个节点,并连接行<>单元格值#
[ ]:
graphistry.hypergraph(df)['graph'].plot()
2. 将每个单元格值视为一个节点,并在它们出现在同一行时将所有单元格值连接在一起#
[ ]:
graphistry.hypergraph(df, direct=True)['graph'].plot()
3. 将每个单元格值视为边,并指定哪些列将值连接在一起#
[ ]:
graphistry.hypergraph(df, direct=True,
opts={
'EDGES': {
'id': ['name'],
'applied': ['name'],
'name': ['app']
}
})['graph'].plot()
4. 使用显式的节点/边表#
[ ]:
g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))
g.plot()
[ ]:
# Add node bindings..
nodes_df = pd.concat([
df[['name', 'id', 'applied']],
df[['app']].drop_duplicates().assign(\
id = df[['app']].drop_duplicates()['app'], \
name = df[['app']].drop_duplicates()['app'])
], ignore_index=True, sort=False)
g = g.bind(node='id', point_title='name').nodes(nodes_df)
g.plot()
便捷函数#
[ ]:
def explore(sql, *args, **kvargs):
result = engine.execute(sql)
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print('# rows', len(df))
g = graphistry.hypergraph(df, *args, **kvargs)['graph']
return g
简单使用#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000").plot()
传入 graphistry.hypergraph() 选项#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000", direct=True).plot()
获取数据#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000")._nodes
更多文档#
[ ]: