SQL示例#

从SQL数据库中的第一个表获取并绘制数据

  • 使用Graphistry预装的SQL ODBC驱动程序

  • 默认:可视化Graphistry中使用的模式迁移表

  • 展示了几种可视化模式 + 一个用于 sql->交互式可视化的便捷函数

  • 尝试:修改指定的行以更改以可视化任何其他表

进一步阅读:- UI Guide - CSV上传笔记本应用

设置#

Graphistry#

[ ]:
import graphistry

#pip install graphistry -q

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

SQL连接字符串#

  • 使用您自己的数据库连接字符串进行修改

  • 对于更频繁的使用和共享,请参阅示例,了解如何在会话之间重用凭据的同时隐藏笔记本中的凭据

[ ]:
user = "graphistry"
pwd = "password"
server = "postgres:5432"

##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in
#import json
#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:
#    cfg = json.load(json_file)
#    user = cfg['user']
#    pwd = cfg['pwd']
#    server = cfg['server']
#
## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:
#import json
#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:
#    json.dump({
#        "user": "graphistry",
#        "pwd": "password",
#        "server": "postgres:5432"
#    }, outfile)
## Delete ^^^ after use

db_string = "postgres://" + user + ":" + pwd + "@" + server
### Take care not to save a print of the result
[ ]:
# OPTIONAL: Install ODBC drivers in other environments:
# ! apt-get update
# ! apt-get install -y g++ unixodbc unixodbc-dev
# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5

连接到数据库#

[ ]:
import pandas as pd
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String
from sqlalchemy.orm import sessionmaker
[ ]:
engine = create_engine(db_string)
Session = sessionmaker(bind=engine)
session = Session()

检查可用表格#

[ ]:
table_names = engine.table_names()
", ".join(table_names)

可选:修改以选择您自己的表!#

[ ]:
if 'django_migrations' in table_names:
    table = 'django_migrations'
else:
    table = table_names[0]

初始化可视化:获取数据#

[ ]:
result = engine.execute("SELECT * FROM \"" + table + "\" LIMIT 1000")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print("table", table, '# rows', len(df))
df.sample(min(3, len(df)))

绘图#

几种变体:1. 将每行和单元格值视为节点,并连接行<>单元格值 2. 将每个单元格值视为节点,并在它们出现在同一行时连接所有单元格值 3. 将每个单元格值视为边,并指定在哪些列上连接值 4. 使用显式的节点/边表

1. 将每一行和单元格值视为一个节点,并连接行<>单元格值#

[ ]:
graphistry.hypergraph(df)['graph'].plot()

2. 将每个单元格值视为一个节点,并在它们出现在同一行时将所有单元格值连接在一起#

[ ]:
graphistry.hypergraph(df, direct=True)['graph'].plot()

3. 将每个单元格值视为边,并指定哪些列将值连接在一起#

[ ]:
graphistry.hypergraph(df, direct=True,
    opts={
        'EDGES': {
            'id': ['name'],
            'applied': ['name'],
            'name': ['app']
        }
    })['graph'].plot()

4. 使用显式的节点/边表#

[ ]:
g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))
g.plot()
[ ]:
# Add node bindings..
nodes_df = pd.concat([
    df[['name', 'id', 'applied']],
    df[['app']].drop_duplicates().assign(\
          id = df[['app']].drop_duplicates()['app'], \
        name = df[['app']].drop_duplicates()['app'])
], ignore_index=True, sort=False)

g = g.bind(node='id', point_title='name').nodes(nodes_df)

g.plot()

便捷函数#

[ ]:
def explore(sql, *args, **kvargs):
    result = engine.execute(sql)
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
    print('# rows', len(df))
    g = graphistry.hypergraph(df, *args, **kvargs)['graph']
    return g

简单使用#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000").plot()

传入 graphistry.hypergraph() 选项#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000", direct=True).plot()

获取数据#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000")._nodes

更多文档#

[ ]: