教程:使用Graphistry和Neo4j可视化丝绸之路区块链#

通过使用Graphistry直接可视化cypher(BOLT)查询结果,调查大型数据集变得更加容易。本教程将引导您完成查询Neo4j、可视化结果以及额外的配置和查询。

此分析基于Graphistry团队围绕法庭程序进行的区块链数据提取,当时关键DEA特工Carl Force在丝绸之路案件中因从Ross Ulbricht(恐惧海盗罗伯茨)那里挪用资金而被判刑。我们探讨如何重现这一分析,并确定Carl在最初挪用资金后,钱去了哪里。

说明 * 阅读各个单元格 * 点击预构建的可视化以启动它们,并像使用Google Maps一样与它们互动 * 要自己尝试,请设置您自己的Neo4j实例并获取Graphistry API密钥,然后运行数据加载单元格

进一步阅读

配置#

安装依赖项

  • 在首次运行非Graphistry笔记本服务器时:

    1. 取消注释并运行前两行

    2. 从顶部菜单重新启动你的Python内核运行时

  • 对于高级替代安装,请参阅后续的注释行

[ ]:
#!pip install --user pandas
#!pip install --user graphistry[bolt]

### ADVANCED:
### If you already have the neo4j python driver, you can leave out '[bolt]':
### !pip install --user graphistry
### If you already have graphistry but not neo4j, you can reuse your existing graphistry:
### !pip install --user neo4j

导入 & 测试

[ ]:
import pandas as pd
import neo4j # just for testing
from neo4j import GraphDatabase # for data loader
import graphistry
print('neo4j', neo4j.__version__)
print('graphistry', graphistry.__version__)

连接

  • 如果您的Neo4j连接关闭,您可能需要重新连接

  • 取消注释以下部分以用于非Graphistry笔记本服务器

[ ]:
NEO4J = {
    'uri': "bolt://my.site.COM:7687",
    'auth': ("neo4j", "myalphapwd1")
}

graphistry.register(bolt=NEO4J)

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

可选:将受污染的交易加载到您自己的Neo4j数据库中#

要填充您自己的Neo4j实例,请将一个或两个顶部命令设置为True

[ ]:
DELETE_EXISTING_DATABASE=False
POPULATE_DATABASE=False

if DELETE_EXISTING_DATABASE:
    driver = GraphDatabase.driver(**NEO4J)
    with driver.session() as session:
        # split into 2 transancations case of memory limit errors
        print('Deleting existing transactions')
        tx = session.begin_transaction()
        tx.run("""MATCH (a:Account)-[r]->(b) DELETE r""")
        tx.commit()
        print('Deleting existing accounts')
        tx = session.begin_transaction()
        tx.run("""MATCH (a:Account) DELETE a""")
        tx.commit()
        print('Delete successful')

if POPULATE_DATABASE:
    edges = pd.read_csv('https://www.dropbox.com/s/q1daa707y99ind9/edges.csv?dl=1')
    edges = edges.rename(columns={'Amount $': "USD", 'Transaction ID': 'Transaction'})[['USD', 'Date', 'Source', 'Destination', 'Transaction']]
    id_len = len(edges['Source'][0].split('...')[0]) #truncate IDs (dirty data)
    edges = edges.assign(
    Source=edges['Source'].apply(lambda id: id[:id_len]),
    Destination=edges['Destination'].apply(lambda id: id[:id_len]))
    ROSS_FULL='2a37b3bdca935152335c2097e5da367db24209cc'
    ROSS = ROSS_FULL[:32]
    CARL_FULL = 'b2233dd22ade4c9978ec1fd1fbb36eb7f9b4609e'
    CARL = CARL_FULL[:32]
    CARL_NICK = 'Carl Force (DEA)'
    ROSS_NICK = 'Ross Ulbricht (SilkRoad)'
    nodes = pd.read_csv('https://www.dropbox.com/s/nf796f1asow8tx7/nodes.csv?dl=1')
    nodes = nodes.rename(columns={'Balance $': 'USD', 'Balance (avg) $': 'USD_avg', 'Balance (max) $': 'USD_max', 'Tainted Coins': 'Tainted_Coins'})[['Account', 'USD', 'USD_avg', 'USD_max', 'Tainted_Coins']]
    nodes['Account'] = nodes['Account'].apply(lambda id: id[:id_len])
    nodes['Account'] = nodes['Account'].apply(lambda id: CARL_NICK if id == CARL else ROSS_NICK if id == ROSS else id)
    driver = GraphDatabase.driver(**NEO4J)
    with driver.session() as session:
        tx = session.begin_transaction()
        print('Loading', len(nodes), 'accounts')
        for index, row in nodes.iterrows():
            if index % 2000 == 0:
                print('Committing', index - 2000, '...', index)
                tx.commit()
                tx = session.begin_transaction()
            tx.run("""
            CREATE (a:Account {
              Account: $Account,
              USD: $USD, USD_avg: $USD_avg, USD_max: $USD_max, Tainted_Coins: $Tainted_Coins
            })
            RETURN id(a)
            """, **row)
            if index % 2000 == 0:
                print(index)
        print('Committing rest')
        tx.commit()
        tx = session.begin_transaction()
        print('Creating index on Account')
        tx.run("""  CREATE INDEX ON :Account(Account)  """)
        tx.commit()
    STATUS=1000
    BATCH=2000
    driver = GraphDatabase.driver(**NEO4J)

    with driver.session() as session:
        tx = session.begin_transaction()
        print('Loading', len(edges), 'transactions')
        for index, row in edges.iterrows():
            tx.run("""MATCH (a:Account),(b:Account)
                  WHERE a.Account = $Source AND b.Account = $Destination
                  CREATE (a)-[r:PAYMENT {
                    Source: $Source, Destination: $Destination, USD: $USD, Date: $Date, Transaction: $Transaction
                  }]->(b)
                  """, **row)
            if index % STATUS == 0:
                print(index)
            if index % BATCH == 0 and index > 0:
                print('sending batch out')
                tx.commit()
                print('... done')
                tx = session.begin_transaction()
        tx.commit()

Cypher 演示#

1a. 热身:可视化所有$7K - $10K的交易#

尝试平移和缩放(与Google Maps相同的触控板/鼠标控制),并点击单个钱包和交易。

[ ]:
g = graphistry.cypher("""
      MATCH (a)-[r:PAYMENT]->(b) WHERE r.USD > 7000 AND r.USD < 10000  RETURN a, r, b ORDER BY r.USD DESC
  """)
[ ]:
g.plot()

截图 比特币交易在7K到10K之间

1b. 清理:配置节点和边标题以使用金额字段#

  • 静态配置:我们可以直接在笔记本中预先配置可视化

  • 动态配置: 尝试在工具中动态即时改进可视化效果

    • edge:USDpoint:USD_MAX上执行add histogram for...

    • 使用它们设置边缘/点着色,并选择“渐变(Spectral7 7)”混合,然后切换以反转顺序(从冷到热)。

    • 对于point:USD_MAX,切换它以控制点大小,并在Scene settings中,增加点大小滑块

[ ]:
g = g\
  .bind(point_title='Account')\
  .bind(edge_title='USD')

g.plot()

2. 查找所有与贪污的DEA特工Carl Force相关的1-5跳交易#

2a. 下游#

卡尔的大部分钱去了哪里?* 尝试在edge:USD上设置过滤器,以区分小额和大额资金流动。

[ ]:
g.cypher("""
    match (a)-[r:PAYMENT*1..20]->(b)
    where a.Account = $root and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
    return a, r, b
  """,
  {'root': "Carl Force (DEA)",
   'min_amount': 999,
   'max_amount': 99999}).plot()

截图:

卡尔·福斯的比特币账户

2b. 上游#

卡尔从哪里获得了大部分的钱?

[ ]:
g.cypher("""
      match (a)-[r:PAYMENT*1..10]->(b)
      where b.Account=$sink and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
      return r, a, b
    """,
    {'sink': "Carl Force (DEA)",
    'min_amount': 1999,
    'max_amount': 99999}).plot()

截图:

卡尔·福斯从丝绸之路挪用资金

3. 丝绸之路与卡尔·福斯之间的路径#

[ ]:
g.cypher("match (a)-[r:PAYMENT*1..10]->(b) where a.Account=$silk and b.Account=$dea return r, a, b",
         {'dea': "Carl Force (DEA)", "silk": "Ross Ulbricht (SilkRoad)"}).plot()

进一步阅读#