Graphistry Neptune Gremlin Identity Graph Demo#

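PyGraphistry helps you connect to graph data sources, wrangle them with Python dataframe tools, and visualize them with Graphistry. It is commonly used in notebooks, data apps, and dashboards.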

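This notebook uses PyGraphistry to quickly:
* Connect to Neptune
* Run Gremlin queries through the built-in bindings over gremlin_python
* Convert results to dataframes for data wrangling: on CPU with Pandas, on GPU with RAPIDS cuDF
* Visualize: automatically generate rich, interactive, GPU-accelerated Graphistry graph visualization sessions
* Share and embed your results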

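For any API used below, run help(graphistry.the_method) to quickly view its documentation.

The demo uses the AWS Neptune identity graph sample data from our joint graph-app-kit tutorial. If you have your own dataset, including non-identity data, the sample queries should still work.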

Setup#

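Optional: quick launch via graph-app-kit for Neptune:
* Neptune: Tested against Neptune's identity graph database sample kit; you can substitute your own
* Graphistry: Use your own, get a free Hub account, or launch in AWS alongside Neptune's VPC and public subnet
* Notebook: Use your own, or launch in AWS alongside Neptune's VPC and public subnet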

If you hit gremlinpython event loop runtime errors, try this gist to resolve them.

Install#

Already provided in Graphistry environments

[1]:

# ! pip install -U gremlinpython graphistry
# ! pip install -U pandas
# see https://rapids.ai/ if trying GPU dataframes

Imports#

[2]:

! pip show gremlinpython graphistry | grep 'Name\|Version'

Name: gremlinpython
Version: 3.4.10
Name: graphistry
Version: 0.19.0+5.g5ce1d3fb0
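The `pip show | grep` cell above relies on a Unix shell. A minimal, shell-free sketch of the same check using the standard library's `importlib.metadata` might look like the following. The helper name `installed_version` is illustrative and not part of PyGraphistry or gremlinpython.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages this notebook depends on
for name in ("gremlinpython", "graphistry"):
    print(name, installed_version(name) or "not installed")
```

Returning `None` rather than raising keeps the check usable in environments where one of the packages is missing.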

[3]:

import graphistry
graphistry.__version__

[3]:

'0.19.0+5.g5ce1d3fb0'

Configure#

[4]:

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

[29]:

NEPTUNE_READER_PROTOCOL='wss'
NEPTUNE_READER_HOST='neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com'
NEPTUNE_READER_PORT='8182'

endpoint = f'{NEPTUNE_READER_PROTOCOL}://{NEPTUNE_READER_HOST}:{NEPTUNE_READER_PORT}/gremlin'
endpoint

[29]:

'wss://neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com:8182/gremlin'
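Hard-coding the reader endpoint is fine for a demo, but the same string can be assembled from environment variables so the notebook moves between clusters without edits. The sketch below assumes environment variables named after the notebook's own variables (`NEPTUNE_READER_PROTOCOL`, `NEPTUNE_READER_HOST`, `NEPTUNE_READER_PORT`). Those names and the helper `build_gremlin_endpoint` are illustrative assumptions, not a PyGraphistry or Neptune convention.

```python
import os


def build_gremlin_endpoint(host: str, port: int = 8182, protocol: str = "wss") -> str:
    """Assemble a Gremlin WebSocket endpoint URL of the form protocol://host:port/gremlin."""
    if protocol not in ("ws", "wss"):
        raise ValueError(f"expected 'ws' or 'wss', got {protocol!r}")
    if not host:
        raise ValueError("host must be a non-empty string")
    return f"{protocol}://{host}:{int(port)}/gremlin"


# Fall back to the placeholder values from the cell above when a variable is unset
endpoint = build_gremlin_endpoint(
    host=os.environ.get(
        "NEPTUNE_READER_HOST",
        "neptunedbcluster-abc.cluster-ro-xyz.us-east-1.neptune.amazonaws.com",
    ),
    port=int(os.environ.get("NEPTUNE_READER_PORT", "8182")),
    protocol=os.environ.get("NEPTUNE_READER_PROTOCOL", "wss"),
)
print(endpoint)
```

Validating the protocol up front surfaces a misconfigured scheme before a connection attempt fails.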

[6]:

#import logging
#logging.basicConfig(level=logging.DEBUG)

Connect#

[7]:

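# GRAPHISTRY_CFG is not defined earlier in this notebook: set it to a dict of
# graphistry.register() keyword arguments (see the Configure cell above) first
graphistry.register(**GRAPHISTRY_CFG)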

g = graphistry.neptune(endpoint=endpoint)

g._gremlin_client

[7]:

<gremlin_python.driver.client.Client at 0x7fdfc230e3d0>

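Query & plot#

* PyGraphistry automatically converts Gremlin results into node and edge dataframes
* Edge queries typically return only node IDs; call fetch_nodes() to enrich your g._nodes dataframe
* PyGraphistry plots the dataframes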
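To see why an edge query yields a node table at all, and why that table can have fewer rows than the edge table, it helps to think of nodes as the distinct endpoints of the edges. The library-free sketch below illustrates that idea on toy data. It is not PyGraphistry's internal implementation, and the function name `nodes_from_edges` and the sample IDs are made up for illustration.

```python
def nodes_from_edges(edges, src="src", dst="dst"):
    """Collect the distinct node IDs referenced by edge endpoints, in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for edge in edges:
        for key in (src, dst):
            seen.setdefault(edge[key], None)
    return [{"id": node_id} for node_id in seen]


# Toy edge list using the same column names as g2._edges
edges = [
    {"id": "e1", "label": "visited", "src": "a", "dst": "w1"},
    {"id": "e2", "label": "visited", "src": "b", "dst": "w1"},
    {"id": "e3", "label": "visited", "src": "a", "dst": "w2"},
]

nodes = nodes_from_edges(edges)
print(len(edges), "edges reference", len(nodes), "distinct nodes")
```

A table derived this way carries only IDs, which is the gap that a follow-up lookup such as fetch_nodes() is meant to fill with server-side properties.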

[25]:

%%time

g2 = g.gremlin('g.E().limit(10000)')

CPU times: user 4.96 s, sys: 27.9 ms, total: 4.99 s
Wall time: 4.95 s

[26]:

print('NODES:')
g2._nodes.info()
g2._nodes.sample(3)

NODES:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8106 entries, 0 to 8105
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      8106 non-null   object
 1   label   8106 non-null   object
dtypes: object(2)
memory usage: 126.8+ KB

[26]:

	id	label
4102	ed95a9a5be30e4c8/e212d4b4d4a865a/7e3e41e09dfe6...	website
6496	6ea77fc3ea42bd5b/87be29bd5615083/d4392e74543e413	website
7540	4c980617e02858a4/7de2f069da3a3655/30591f4d8c71...	website

[27]:

print('EDGES:')
print(g2._edges.info())

g2._edges.sample(3)

EDGES:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      10000 non-null  object
 1   label   10000 non-null  object
 2   src     10000 non-null  object
 3   dst     10000 non-null  object
dtypes: object(4)
memory usage: 312.6+ KB
None

[27]:

	id	label	src	dst
2814	f7803bf0ac187592421c0695792b698f43b596ce	visited	556de63e26686d50/95263499b67bbda1?f300c39f4f33...	48e740025e70e4e38dc87928cd45357c
8081	fe80cddfec97a7dd802cf93cf277da01d9b5fb65	visited	3ccec85ce35ea661?fa76e6024017220f	23c31ea91be100fd224dff1499939851
2046	4e5290971de41c1e1bcb7433e53ffc6321e410cf	visited	6ea77fc3ea42bd5b/9c280de73bf0fb32/bb555a4d63de...	9e77c2a52fdf9f9b7416e85cabaf7c76

[28]:

%%time

# Enrich nodes dataframe with any available server property data

g3 = g2.fetch_nodes()

print(g3._nodes.info())

g3._nodes.sample(3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8106 entries, 0 to 8105
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      8106 non-null   object
 1   label   8106 non-null   object
dtypes: object(2)
memory usage: 126.8+ KB
None
CPU times: user 4.32 s, sys: 43.9 ms, total: 4.37 s
Wall time: 4.33 s

[28]:

	id	label
1242	4c980617e02858a4/7de2f069da3a3655/30591f4d8c71...	website
3190	493a46bbfd2029ae/4a0cad2f071a71ce/f9ba18598922...	website
6782	4c980617e02858a4/7de2f069da3a3655/30591f4d8c71...	website

[19]:

%%time

g3.plot()

CPU times: user 59.8 ms, sys: 4 ms, total: 63.8 ms
Wall time: 1.68 s

[19]:

Customize your visuals & embed#

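Graphistry visualizes data with smart defaults: community-based coloring, degree-based sizing, force-directed layout, auto-zoom, and built-in visual analytics. Even so, it often helps to configure the visuals ahead of time.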
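As background for "degree-based sizing": a node's degree is the number of edge endpoints that touch it. The sketch below computes degrees from a toy edge list with the standard library. It only illustrates the quantity and makes no claim about how Graphistry computes or scales its default sizes. The function name `node_degrees` and the sample data are hypothetical.

```python
from collections import Counter


def node_degrees(edges, src="src", dst="dst"):
    """Count how many edge endpoints touch each node (a self-loop counts twice)."""
    degrees = Counter()
    for edge in edges:
        degrees[edge[src]] += 1
        degrees[edge[dst]] += 1
    return degrees


# Toy edge list using the same column names as the notebook's edge dataframe
edges = [
    {"src": "a", "dst": "w1"},
    {"src": "b", "dst": "w1"},
    {"src": "a", "dst": "w2"},
]

degrees = node_degrees(edges)
for node_id, degree in sorted(degrees.items()):
    print(node_id, degree)
```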

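Examples:
* Enable the legend on a new column "type"
* Color nodes by the node column "type"
* Pick icons by node type
* Set the background color to match the notebook
* Use a more compact layout

See the PyGraphistry GitHub repo for more examples.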
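When a graph has more node types than you want to list by hand, the dictionary passed as `categorical_mapping` can be generated from the labels themselves. The sketch below is one way to do that with the standard library. The helper `build_color_mapping`, the palette, and the label "otherLabel" are illustrative assumptions. Only "website" and "transientId" and the colors "blue" and "green" appear in this notebook, and whether Graphistry accepts the other color names should be checked against its documentation.

```python
from itertools import cycle


def build_color_mapping(labels, palette=("blue", "green", "orange", "purple")):
    """Assign one palette color per distinct label, in first-seen order.

    Colors repeat if there are more distinct labels than palette entries.
    """
    mapping = {}
    colors = cycle(palette)
    for label in labels:
        if label not in mapping:
            mapping[label] = next(colors)
    return mapping


# In the notebook this list could come from g3._nodes['label']
labels = ["website", "transientId", "website", "otherLabel"]
color_mapping = build_color_mapping(labels)
print(color_mapping)
```

The resulting dictionary has the same shape as the hand-written mappings passed to `encode_point_color` and `encode_point_icon` in this section.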

[24]:

%%time

g4 = (g3

      # Add node column 'type' based on gremlin-provided column 'label'
      # The legend auto-detects this column and appears
      .nodes(lambda g: g._nodes.assign(type=g._nodes['label']))

      .encode_point_color('type', categorical_mapping={
          'website': 'blue',
          'transientId': 'green'
      })

      .encode_point_icon('type', categorical_mapping={
          'website': 'link',
          'transientId': 'barcode'
      })

      .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})

      # More: https://hub.graphistry.com/docs/api/1/rest/url/
      .settings(url_params={'play': 2000})
)

g4.plot()

CPU times: user 63.5 ms, sys: 3.88 ms, total: 67.3 ms
Wall time: 1.62 s

[24]:

Generate a URL for other systems#

[23]:

%%time

url = g4.plot(render=False)

url

CPU times: user 64.8 ms, sys: 0 ns, total: 64.8 ms
Wall time: 1.67 s

[23]:

'https://hub.graphistry.com/graph/graph.html?dataset=7405d0ac396a47ea9ee84acab7b0b31d&type=arrow&viztoken=c5e68946-e922-487e-9484-ef8fc9e2c8f9&usertag=5bf3845f-pygraphistry-0.19.0+5.g5ce1d3fb0&splashAfter=1625879227&info=true&strongGravity=False&play=2000'
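One common use for the returned URL is embedding the visualization in another web page. The sketch below wraps a URL in an `<iframe>` tag and escapes it for use in an HTML attribute, since the query string contains `&` characters. The helper `iframe_html` and its styling are illustrative assumptions rather than a Graphistry API, and the URL shown is a placeholder.

```python
from html import escape


def iframe_html(url: str, height_px: int = 600) -> str:
    """Wrap a visualization URL in an <iframe> tag, escaping it for an HTML attribute."""
    safe_url = escape(url, quote=True)
    return (
        f'<iframe src="{safe_url}" '
        f'style="width: 100%; height: {int(height_px)}px; border: 0;" '
        'allowfullscreen></iframe>'
    )


# Placeholder URL: substitute the value returned by g4.plot(render=False)
snippet = iframe_html("https://hub.graphistry.com/graph/graph.html?dataset=...&play=2000")
print(snippet)
```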

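Next steps#

* Dig deeper into PyGraphistry: customization examples, GPU graph analytics, and more
* Explore gremlinpython
* Build dashboards with the Neptune integration for graph-app-kit / Streamlit
  - Amazon Neptune launch announcement and tutorial
* Try CSV upload on Hub or launch your own Graphistry server
* Additional Graphistry APIs: REST, React, JS, …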

