教程:使用Graphistry和Neo4j可视化丝绸之路区块链#
通过使用Graphistry直接可视化cypher(BOLT)查询结果,调查大型数据集变得更加容易。本教程将引导您完成查询Neo4j、可视化结果以及额外的配置和查询。
此分析基于Graphistry团队围绕法庭程序进行的区块链数据提取,当时关键DEA特工Carl Force在丝绸之路案件中因从Ross Ulbricht(恐惧海盗罗伯茨)那里挪用资金而被判刑。我们探讨如何重现这一分析,并确定Carl在最初挪用资金后,钱去了哪里。
说明 * 阅读各个单元格 * 点击预构建的可视化以启动它们,并像使用Google Maps一样与它们互动 * 要自己尝试,请设置您自己的Neo4j实例并获取Graphistry API密钥,然后运行数据加载单元格
进一步阅读
Python 客户端教程和演示:graphistry/pygraphistry
Graphistry API 密钥: https://www.graphistry.com/api-request
Neo4j即服务:http://graphstory.com
配置#
安装依赖项
在首次运行非Graphistry笔记本服务器时:
取消注释并运行前两行
从顶部菜单重新启动你的Python内核运行时
对于高级替代安装,请参阅后续的注释行
[ ]:
#!pip install --user pandas
#!pip install --user graphistry[bolt]
### ADVANCED:
### If you already have the neo4j python driver, you can leave out '[bolt]':
### !pip install --user graphistry
### If you already have graphistry but not neo4j, you can reuse your existing graphistry:
### !pip install --user neo4j
导入 & 测试
[ ]:
import pandas as pd
import neo4j # just for testing
from neo4j import GraphDatabase # for data loader
import graphistry
print('neo4j', neo4j.__version__)
print('graphistry', graphistry.__version__)
连接
如果您的Neo4j连接关闭,您可能需要重新连接
取消注释以下部分以用于非Graphistry笔记本服务器
[ ]:
NEO4J = {
'uri': "bolt://my.site.COM:7687",
'auth': ("neo4j", "myalphapwd1")
}
graphistry.register(bolt=NEO4J)
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
可选:将受污染的交易加载到您自己的Neo4j数据库中#
要填充您自己的Neo4j实例,请将一个或两个顶部命令设置为True
[ ]:
DELETE_EXISTING_DATABASE=False
POPULATE_DATABASE=False
if DELETE_EXISTING_DATABASE:
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
# split into 2 transancations case of memory limit errors
print('Deleting existing transactions')
tx = session.begin_transaction()
tx.run("""MATCH (a:Account)-[r]->(b) DELETE r""")
tx.commit()
print('Deleting existing accounts')
tx = session.begin_transaction()
tx.run("""MATCH (a:Account) DELETE a""")
tx.commit()
print('Delete successful')
if POPULATE_DATABASE:
edges = pd.read_csv('https://www.dropbox.com/s/q1daa707y99ind9/edges.csv?dl=1')
edges = edges.rename(columns={'Amount $': "USD", 'Transaction ID': 'Transaction'})[['USD', 'Date', 'Source', 'Destination', 'Transaction']]
id_len = len(edges['Source'][0].split('...')[0]) #truncate IDs (dirty data)
edges = edges.assign(
Source=edges['Source'].apply(lambda id: id[:id_len]),
Destination=edges['Destination'].apply(lambda id: id[:id_len]))
ROSS_FULL='2a37b3bdca935152335c2097e5da367db24209cc'
ROSS = ROSS_FULL[:32]
CARL_FULL = 'b2233dd22ade4c9978ec1fd1fbb36eb7f9b4609e'
CARL = CARL_FULL[:32]
CARL_NICK = 'Carl Force (DEA)'
ROSS_NICK = 'Ross Ulbricht (SilkRoad)'
nodes = pd.read_csv('https://www.dropbox.com/s/nf796f1asow8tx7/nodes.csv?dl=1')
nodes = nodes.rename(columns={'Balance $': 'USD', 'Balance (avg) $': 'USD_avg', 'Balance (max) $': 'USD_max', 'Tainted Coins': 'Tainted_Coins'})[['Account', 'USD', 'USD_avg', 'USD_max', 'Tainted_Coins']]
nodes['Account'] = nodes['Account'].apply(lambda id: id[:id_len])
nodes['Account'] = nodes['Account'].apply(lambda id: CARL_NICK if id == CARL else ROSS_NICK if id == ROSS else id)
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
tx = session.begin_transaction()
print('Loading', len(nodes), 'accounts')
for index, row in nodes.iterrows():
if index % 2000 == 0:
print('Committing', index - 2000, '...', index)
tx.commit()
tx = session.begin_transaction()
tx.run("""
CREATE (a:Account {
Account: $Account,
USD: $USD, USD_avg: $USD_avg, USD_max: $USD_max, Tainted_Coins: $Tainted_Coins
})
RETURN id(a)
""", **row)
if index % 2000 == 0:
print(index)
print('Committing rest')
tx.commit()
tx = session.begin_transaction()
print('Creating index on Account')
tx.run(""" CREATE INDEX ON :Account(Account) """)
tx.commit()
STATUS=1000
BATCH=2000
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
tx = session.begin_transaction()
print('Loading', len(edges), 'transactions')
for index, row in edges.iterrows():
tx.run("""MATCH (a:Account),(b:Account)
WHERE a.Account = $Source AND b.Account = $Destination
CREATE (a)-[r:PAYMENT {
Source: $Source, Destination: $Destination, USD: $USD, Date: $Date, Transaction: $Transaction
}]->(b)
""", **row)
if index % STATUS == 0:
print(index)
if index % BATCH == 0 and index > 0:
print('sending batch out')
tx.commit()
print('... done')
tx = session.begin_transaction()
tx.commit()
Cypher 演示#
1a. 热身:可视化所有$7K - $10K的交易#
尝试平移和缩放(与Google Maps相同的触控板/鼠标控制),并点击单个钱包和交易。
[ ]:
g = graphistry.cypher("""
MATCH (a)-[r:PAYMENT]->(b) WHERE r.USD > 7000 AND r.USD < 10000 RETURN a, r, b ORDER BY r.USD DESC
""")
[ ]:
g.plot()
截图
1b. 清理:配置节点和边标题以使用金额字段#
静态配置:我们可以直接在笔记本中预先配置可视化
动态配置: 尝试在工具中动态即时改进可视化效果
在
edge:USD
和point:USD_MAX
上执行add histogram for...
使用它们设置边缘/点着色,并选择“渐变(Spectral7 7)”混合,然后切换以反转顺序(从冷到热)。
对于
point:USD_MAX
,切换它以控制点大小,并在Scene settings
中,增加点大小滑块
[ ]:
g = g\
.bind(point_title='Account')\
.bind(edge_title='USD')
g.plot()
2. 查找所有与贪污的DEA特工Carl Force相关的1-5跳交易#
2a. 下游#
卡尔的大部分钱去了哪里?* 尝试在edge:USD
上设置过滤器,以区分小额和大额资金流动。
[ ]:
g.cypher("""
match (a)-[r:PAYMENT*1..20]->(b)
where a.Account = $root and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
return a, r, b
""",
{'root': "Carl Force (DEA)",
'min_amount': 999,
'max_amount': 99999}).plot()
截图:
2b. 上游#
卡尔从哪里获得了大部分的钱?
[ ]:
g.cypher("""
match (a)-[r:PAYMENT*1..10]->(b)
where b.Account=$sink and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
return r, a, b
""",
{'sink': "Carl Force (DEA)",
'min_amount': 1999,
'max_amount': 99999}).plot()
截图:
3. 丝绸之路与卡尔·福斯之间的路径#
[ ]:
g.cypher("match (a)-[r:PAYMENT*1..10]->(b) where a.Account=$silk and b.Account=$dea return r, a, b",
{'dea': "Carl Force (DEA)", "silk": "Ross Ulbricht (SilkRoad)"}).plot()