Graphistry Neptune Gremlin identity graph demo#
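PyGraphistry helps you connect to graph data sources, wrangle them with Python dataframe tools, and visualize them with Graphistry. It is often used in notebooks, data apps, and dashboards.
This notebook uses PyGraphistry to quickly:
* Connect to Neptune
* Run Gremlin queries through the built-in bindings on gremlin_python
* Convert results to dataframes for data wrangling: CPU via Pandas, GPU via RAPIDS cuDF
* Visualize, automatically generating rich, interactive, GPU-accelerated Graphistry graph visualization sessions
* Share and embed your results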
For any API used below, run help(graphistry.the_method) to quickly view its documentation.
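When the full help() page is more than you need, the first line of a docstring is often enough. A minimal sketch using only the standard library; `doc_summary` is a hypothetical helper name, not part of PyGraphistry:

```python
import inspect

def doc_summary(obj) -> str:
    """Return the first non-empty docstring line of obj, or '' if it has none."""
    doc = inspect.getdoc(obj) or ''
    for line in doc.splitlines():
        if line.strip():
            return line.strip()
    return ''

# Works on any Python object; with PyGraphistry installed, the same call
# could be made on a method such as graphistry.register.
print(doc_summary(len))
```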
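The demo uses the AWS Neptune identity graph sample data from our joint graph-app-kit tutorial. If you have your own dataset, including non-identity data, the sample queries should still work.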
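Setup#
Optional: quick launch via graph-app-kit for Neptune:
* Neptune: tested against Neptune's identity graph database sample kit; you can substitute your own
* Graphistry: use your own, get a free Hub account, or launch in AWS alongside Neptune's VPC and public subnet
* Notebook: use your own, or launch in AWS alongside Neptune's VPC and public subnet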
If you run into gremlinpython event runtime errors, try this gist to resolve them.
Install#
Already provided in Graphistry environments
[1]:
# ! pip install -U gremlinpython graphistry
# ! pip install -U pandas
# see https://rapids.ai/ if trying GPU dataframes
Imports#
[2]:
! pip show gremlinpython graphistry | grep 'Name\|Version'
Name: gremlinpython
Version: 3.4.10
Name: graphistry
Version: 0.19.0+5.g5ce1d3fb0
[3]:
import graphistry
graphistry.__version__
[3]:
'0.19.0+5.g5ce1d3fb0'
Configure#
[4]:
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
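One way to keep credentials out of the notebook is to collect the `graphistry.register()` keyword arguments from environment variables. The sketch below is an assumption-laden example: the `GRAPHISTRY_*` variable names and the `graphistry_cfg_from_env` helper are hypothetical, and only the keyword names (`api`, `username`, `password`, `protocol`, `server`) come from the comment above.

```python
import os

def graphistry_cfg_from_env(env=os.environ) -> dict:
    """Build keyword arguments for graphistry.register() from a mapping.

    The GRAPHISTRY_* variable names are hypothetical; use whatever your
    deployment defines.
    """
    cfg = {
        'api': 3,
        'username': env.get('GRAPHISTRY_USERNAME'),
        'password': env.get('GRAPHISTRY_PASSWORD'),
        'protocol': env.get('GRAPHISTRY_PROTOCOL', 'https'),
        'server': env.get('GRAPHISTRY_SERVER', 'hub.graphistry.com'),
    }
    # Fail early with a readable message instead of a failed login later
    missing = [k for k in ('username', 'password') if not cfg[k]]
    if missing:
        raise ValueError(f'Missing Graphistry credentials: {missing}')
    return cfg

# Example with an explicit mapping instead of the real environment
GRAPHISTRY_CFG = graphistry_cfg_from_env({
    'GRAPHISTRY_USERNAME': 'my_user',
    'GRAPHISTRY_PASSWORD': 'my_password',
})
```

The resulting dict can then be passed as `graphistry.register(**GRAPHISTRY_CFG)`.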
[29]:
NEPTUNE_READER_PROTOCOL='wss'
NEPTUNE_READER_HOST='neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com'
NEPTUNE_READER_PORT='8182'
endpoint = f'{NEPTUNE_READER_PROTOCOL}://{NEPTUNE_READER_HOST}:{NEPTUNE_READER_PORT}/gremlin'
endpoint
[29]:
'wss://neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com:8182/gremlin'
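If the endpoint is assembled in several places, a small helper with basic validation can catch a mistyped host or port before the connection attempt. This is a sketch only: `neptune_endpoint` is a hypothetical helper, and the accepted schemes are an assumption about what a WebSocket Gremlin endpoint uses.

```python
from urllib.parse import urlsplit

def neptune_endpoint(host: str, port=8182, protocol: str = 'wss') -> str:
    """Assemble a Gremlin WebSocket endpoint URL and sanity-check it."""
    if protocol not in ('ws', 'wss'):
        raise ValueError(f'Expected a WebSocket scheme, got {protocol!r}')
    endpoint = f'{protocol}://{host}:{int(port)}/gremlin'
    parts = urlsplit(endpoint)
    # An empty host or an unparsable port shows up here
    if not parts.hostname or parts.port is None:
        raise ValueError(f'Malformed endpoint: {endpoint!r}')
    return endpoint

endpoint = neptune_endpoint(
    'neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com',
    port='8182',
)
```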
[6]:
#import logging
#logging.basicConfig(level=logging.DEBUG)
Connect#
[7]:
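# GRAPHISTRY_CFG is not defined in this notebook; set it first to a dict of graphistry.register() keyword arguments
graphistry.register(**GRAPHISTRY_CFG)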
g = graphistry.neptune(endpoint=endpoint)
g._gremlin_client
[7]:
<gremlin_python.driver.client.Client at 0x7fdfc230e3d0>
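Query & plot#
* PyGraphistry automatically converts Gremlin results into node/edge dataframes
* Edge queries typically return only node IDs; call fetch_nodes() to enrich your g._nodes dataframe
* PyGraphistry plots dataframes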
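To see why an edge query yields such a thin node table, it helps to derive the nodes by hand: each edge only names its endpoints by ID, so the unique IDs are all there is to collect until properties are looked up on the server. The sketch below illustrates that idea with plain Python records (the kind of rows you could pass to a dataframe constructor). It is not PyGraphistry's actual implementation, and `nodes_from_edges` and the sample IDs are hypothetical.

```python
def nodes_from_edges(edges, src='src', dst='dst'):
    """Collect the unique node IDs referenced by edge records, in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for edge in edges:
        for col in (src, dst):
            seen.setdefault(edge[col], None)
    return [{'id': node_id} for node_id in seen]

# Two made-up 'visited' edges that share a destination
edges = [
    {'id': 'e1', 'label': 'visited', 'src': 'a', 'dst': 'x'},
    {'id': 'e2', 'label': 'visited', 'src': 'b', 'dst': 'x'},
]
nodes = nodes_from_edges(edges)
print(nodes)
```

Here three nodes come out of two edges, each carrying nothing but an `id`; any further columns have to come from a separate lookup.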
[25]:
%%time
g2 = g.gremlin('g.E().limit(10000)')
CPU times: user 4.96 s, sys: 27.9 ms, total: 4.99 s
Wall time: 4.95 s
[26]:
print('NODES:')
g2._nodes.info()
g2._nodes.sample(3)
NODES:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8106 entries, 0 to 8105
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 8106 non-null object
1 label 8106 non-null object
dtypes: object(2)
memory usage: 126.8+ KB
[26]:
| | id | label |
---|---|---|
4102 | ed95a9a5be30e4c8/e212d4b4d4a865a/7e3e41e09dfe6... | website |
6496 | 6ea77fc3ea42bd5b/87be29bd5615083/d4392e74543e413 | website |
7540 | 4c980617e02858a4/7de2f069da3a3655/30591f4d8c71... | website |
[27]:
print('EDGES:')
print(g2._edges.info())
g2._edges.sample(3)
EDGES:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 10000 non-null object
1 label 10000 non-null object
2 src 10000 non-null object
3 dst 10000 non-null object
dtypes: object(4)
memory usage: 312.6+ KB
None
[27]:
| | id | label | src | dst |
---|---|---|---|---|
2814 | f7803bf0ac187592421c0695792b698f43b596ce | visited | 556de63e26686d50/95263499b67bbda1?f300c39f4f33... | 48e740025e70e4e38dc87928cd45357c |
8081 | fe80cddfec97a7dd802cf93cf277da01d9b5fb65 | visited | 3ccec85ce35ea661?fa76e6024017220f | 23c31ea91be100fd224dff1499939851 |
2046 | 4e5290971de41c1e1bcb7433e53ffc6321e410cf | visited | 6ea77fc3ea42bd5b/9c280de73bf0fb32/bb555a4d63de... | 9e77c2a52fdf9f9b7416e85cabaf7c76 |
[28]:
%%time
# Enrich nodes dataframe with any available server property data
g3 = g2.fetch_nodes()
print(g3._nodes.info())
g3._nodes.sample(3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8106 entries, 0 to 8105
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 8106 non-null object
1 label 8106 non-null object
dtypes: object(2)
memory usage: 126.8+ KB
None
CPU times: user 4.32 s, sys: 43.9 ms, total: 4.37 s
Wall time: 4.33 s
[28]:
| | id | label |
---|---|---|
1242 | 4c980617e02858a4/7de2f069da3a3655/30591f4d8c71... | website |
3190 | 493a46bbfd2029ae/4a0cad2f071a71ce/f9ba18598922... | website |
6782 | 4c980617e02858a4/7de2f069da3a3655/30591f4d8c71... | website |
[19]:
%%time
g3.plot()
CPU times: user 59.8 ms, sys: 4 ms, total: 63.8 ms
Wall time: 1.68 s
[19]:
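Customize your visuals & embed#
Graphistry visualizes data with smart defaults: community-based coloring, degree-based sizing, force-directed layout, auto-zoom, and built-in visual analytics. Even so, it often helps to configure the visuals ahead of time.
Example:
* Enable the legend on a new column "type"
* Color nodes by the node column "type"
* Pick icons by node type
* Set the background color to match the notebook
* Use a tighter layout
For more examples, see the PyGraphistry GitHub repo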
[24]:
%%time
g4 = (g3
    # Add node column 'type' based on gremlin-provided column 'label'
    # The legend auto-detects this column and appears
    .nodes(lambda g: g._nodes.assign(type=g._nodes['label']))
    .encode_point_color('type', categorical_mapping={
        'website': 'blue',
        'transientId': 'green'
    })
    .encode_point_icon('type', categorical_mapping={
        'website': 'link',
        'transientId': 'barcode'
    })
    .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})
    # More: https://hub.graphistry.com/docs/api/1/rest/url/
    .settings(url_params={'play': 2000})
)
g4.plot()
CPU times: user 63.5 ms, sys: 3.88 ms, total: 67.3 ms
Wall time: 1.62 s
[24]:
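The hand-written mappings above cover two labels. When a dataset has more node types than you want to enumerate, one option is to keep the explicit choices and fill in the rest from a palette. A minimal sketch, assuming color names like those used above are acceptable values; `color_mapping`, the palette, and the extra label `persistentId` are hypothetical illustrations.

```python
from itertools import cycle

def color_mapping(labels, known=None, palette=('orange', 'purple', 'red', 'gray')):
    """Return a label -> color dict: explicit choices first, palette colors for the rest."""
    mapping = dict(known or {})
    colors = cycle(palette)  # wrap around if there are more labels than colors
    for label in sorted(set(labels)):  # sorted so the assignment is deterministic
        if label not in mapping:
            mapping[label] = next(colors)
    return mapping

mapping = color_mapping(
    ['website', 'transientId', 'website', 'persistentId'],
    known={'website': 'blue', 'transientId': 'green'},
)
print(mapping)
```

The resulting dict has the same shape as the `categorical_mapping` argument passed to `encode_point_color` above, with labels taken from the `type` column.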
Generate URLs for other systems#
[23]:
%%time
url = g4.plot(render=False)
url
CPU times: user 64.8 ms, sys: 0 ns, total: 64.8 ms
Wall time: 1.67 s
[23]:
'https://hub.graphistry.com/graph/graph.html?dataset=7405d0ac396a47ea9ee84acab7b0b31d&type=arrow&viztoken=c5e68946-e922-487e-9484-ef8fc9e2c8f9&usertag=5bf3845f-pygraphistry-0.19.0+5.g5ce1d3fb0&splashAfter=1625879227&info=true&strongGravity=False&play=2000'
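A URL like the one above can be dropped into another web page as an iframe, and its query string can be inspected to confirm which settings were applied. The sketch below uses only the standard library; `iframe_html`, the styling, and the shortened `dataset=abc` URL are hypothetical stand-ins.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

def iframe_html(url: str, height: int = 500) -> str:
    """Wrap a visualization URL in an iframe tag, escaping it for use in an attribute."""
    if urlsplit(url).scheme not in ('http', 'https'):
        raise ValueError(f'Expected an http(s) URL, got {url!r}')
    src = escape(url, quote=True)  # '&' becomes '&amp;' inside the attribute
    return f'<iframe src="{src}" style="width:100%; height:{int(height)}px; border:0"></iframe>'

# Shortened placeholder URL; substitute the value returned by g4.plot(render=False)
url = 'https://hub.graphistry.com/graph/graph.html?dataset=abc&play=2000'
snippet = iframe_html(url)
params = parse_qs(urlsplit(url).query)
print(snippet)
print(params['play'])
```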
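Next steps#
Dive deeper into PyGraphistry: customization examples, GPU graph analytics, and more
Build dashboards with the graph-app-kit / Streamlit Neptune integration
Amazon Neptune's launch announcement and tutorial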
Additional Graphistry APIs: REST, React, JS, …