Tutorial: Data Analysis in Graphistry

  1. Register

  2. Load table

  3. Plot:

    • Simple: the input is a list of edges

    • Arbitrary: the input is a table (hypergraph transform)

  4. Advanced plotting

  5. Further reading

1. Register

[101]:
import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

2. Load table

Graphistry works seamlessly with dataframe libraries such as Pandas and GPU RAPIDS cuDF.

[94]:
import pandas as pd

df = pd.read_csv('./data/honeypot.csv')

df.sample(3)
[94]:
     attackerIP      victimIP      victimPort  vulnName          count  time(max)     time(min)
64   178.77.190.33   172.31.14.66  445.0       MS08067 (NetAPI)  6      1.419968e+09  1.419967e+09
7    112.209.78.240  172.31.14.66  445.0       MS08067 (NetAPI)  10     1.414516e+09  1.414514e+09
182  79.140.174.193  172.31.14.66  445.0       MS08067 (NetAPI)  2      1.422062e+09  1.422062e+09

3. Plot

A. Simple graphs

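  • Build up a set of bindings. A simple graph consists of:

    • Required: an edge table, with src + dst ID columns and, optionally, additional property columns

    • Optional: a node table, with a matching node ID column

  • See the UI guide for in-tool activity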

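Demo graph schema:

  • Input table: the alerts df above, with columns | attackerIP | victimIP |

  • Edges: link df's columns attackerIP -> victimIP

  • Nodes: unspecified; by default, Graphistry generates them from the edges

  • Node colors: by default, Graphistry infers communities

  • Node sizes: by default, Graphistry uses the number of edges ("degree")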

[16]:
g = graphistry.edges(df, 'attackerIP', 'victimIP')
[17]:
g.plot()
[17]:
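
The defaults listed in the schema above (nodes generated from the edges, sized by degree) can be illustrated with a small standalone sketch. This is not Graphistry's implementation; `infer_nodes` is a hypothetical helper, and the edge list simply reuses the three sampled rows shown earlier.

```python
from collections import Counter

# Edges in the same shape as df[['attackerIP', 'victimIP']],
# taken from the three sampled rows shown earlier.
edges = [
    ("178.77.190.33", "172.31.14.66"),
    ("112.209.78.240", "172.31.14.66"),
    ("79.140.174.193", "172.31.14.66"),
]


def infer_nodes(edge_list):
    """Return {node_id: degree} for every ID seen as a source or destination.

    Hypothetical helper: it only illustrates what "nodes inferred from edges"
    and "degree" mean, not how Graphistry computes them internally.
    """
    degree = Counter()
    for src, dst in edge_list:
        degree[src] += 1  # each edge adds one to its source's degree
        degree[dst] += 1  # ... and one to its destination's degree
    return dict(degree)


nodes = infer_nodes(edges)
for node_id, deg in sorted(nodes.items(), key=lambda kv: -kv[1]):
    print(node_id, deg)
```

In this sample the victim touches every edge, so it would be drawn as the largest point, while each attacker has degree 1.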

B. Hypergraphs: plotting arbitrary tables

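The hypergraph transform is a convenient way to turn a table into a graph:

  • It extracts entities from the table and links them together

  • Entities are linked together when they come from the same row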
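
The two rules above can be sketched in plain Python. This is a simplified illustration, not Graphistry's implementation: `to_hypergraph` is a hypothetical helper, the rows are a reduced sample of the honeypot columns, and the `column::value` ID scheme only loosely mirrors the IDs visible in the hypergraph output later in this tutorial. In PyGraphistry itself the transform is exposed as `graphistry.hypergraph(df)`, whose result includes a plottable graph under the `'graph'` key; the `hg2_g` object used later is presumably such a graph, although its definition is not shown in this section.

```python
# Two rows using a subset of the honeypot columns (values from the sample above).
rows = [
    {"attackerIP": "178.77.190.33", "victimIP": "172.31.14.66",
     "vulnName": "MS08067 (NetAPI)"},
    {"attackerIP": "112.209.78.240", "victimIP": "172.31.14.66",
     "vulnName": "MS08067 (NetAPI)"},
]


def to_hypergraph(rows, entity_columns):
    """Turn each row into an event node linked to one entity node per cell.

    Hypothetical helper for illustration only.
    """
    nodes = {}  # node_id -> node type (column name, or 'EventID' for row nodes)
    edges = []  # (event_id, entity_id, column) triples
    for i, row in enumerate(rows):
        event_id = f"EventID::{i}"
        nodes[event_id] = "EventID"
        for col in entity_columns:
            # The same value in the same column always maps to the same node,
            # which is what links rows that share an entity.
            entity_id = f"{col}::{row[col]}"
            nodes[entity_id] = col
            edges.append((event_id, entity_id, col))
    return nodes, edges


nodes, edges = to_hypergraph(rows, ["attackerIP", "victimIP", "vulnName"])
print(len(nodes), "nodes,", len(edges), "edges")
```

Both rows share the same victim and vulnerability, so those entities appear once as nodes and connect the two event nodes to each other.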

4. Advanced plotting

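You can then drive visual styles from node and edge properties.

This demo first computes a node table. By default, you do not need to provide a node table explicitly, but without one you may lack data for node properties:

  • Nodes inferred for a regular graph will only have an id and a degree

  • Hypergraph edges and row nodes will have many properties, but hypergraph entity nodes will only have an id, a type/category, and a degree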

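Demo schema:

  • Node table: | node_id | type | attacks |

  • Point size: number of attacks

  • Point icon and color: attacker vs. victim

  • Edge color: based on the time of the first attack (time(min))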

[62]:
# Cell:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node

targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)

attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
)

# targets_df has no 'attacks' column, so victim rows get NaN for it after concat
nodes_df = pd.concat([targets_df, attackers_df])

nodes_df.sort_values(by='attacks', ascending=False)[:5]
[62]:
    node_id          type      attacks
31  125.64.35.67     attacker  6.0
32  125.64.35.68     attacker  4.0
95  198.204.253.101  attacker  2.0
78  188.225.73.153   attacker  2.0
79  188.44.107.239   attacker  2.0
[86]:
# Cell:
# Add


# The newer encoding features require api=3: `graphistry.register(api=3, username='...', password='...')`

g2 = (g
      .nodes(nodes_df, 'node_id')

      # 'red', '#f00', '#ff0000'
      .encode_point_color('type', categorical_mapping={
          'attacker': 'red',
          'victim': 'white'
      }, default_mapping='gray')

      # Icons: https://fontawesome.com/v4.7/cheatsheet/
      .encode_point_icon('type', categorical_mapping={
          'attacker': 'bomb',
          'victim': 'laptop'
      })

      # Gradient
      .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

      .encode_point_size('attacks')

      .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})

      # Options: https://hub.graphistry.com/docs/api/1/rest/url/
      .settings(url_params={'play': 1000, 'pointSize': 0.5})
)

g2.plot(as_files=False)
[86]:
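
As a rough intuition for the gradient edge coloring used above (`palette=['blue', 'purple', 'red']` with `as_continuous=True`), the sketch below linearly interpolates a value between evenly spaced palette stops. It is an assumption-laden illustration: `gradient_color` and `lerp_color` are hypothetical helpers, the RGB triples are the standard CSS values for the three color names, and Graphistry's actual interpolation may differ.

```python
# CSS RGB values for blue, purple, red
PALETTE = [(0, 0, 255), (128, 0, 128), (255, 0, 0)]


def lerp_color(c0, c1, t):
    """Linearly blend two RGB triples; t=0 gives c0, t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def gradient_color(value, vmin, vmax, palette=PALETTE):
    """Map value in [vmin, vmax] onto evenly spaced palette stops.

    Hypothetical helper for illustration; not Graphistry's algorithm.
    """
    if vmax == vmin:
        return palette[0]
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)               # clamp out-of-range values
    scaled = t * (len(palette) - 1)         # position along the palette stops
    i = min(int(scaled), len(palette) - 2)  # index of the lower stop
    return lerp_color(palette[i], palette[i + 1], scaled - i)


# Earliest and latest time(min) values among the three sampled rows shown earlier
t_first, t_last = 1.414514e+09, 1.422062e+09
print(gradient_color(t_first, t_first, t_last))  # earliest attack: blue end
print(gradient_color(t_last, t_first, t_last))   # latest attack: red end
```

Under this reading, edges for the earliest first attacks sit at the blue end of the palette and the most recent at the red end.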

Advanced bindings also work with hypergraphs

Hypergraphs precompute many values on their nodes and edges, which we can use to drive clearer visualizations.

[104]:
hg2_g._nodes.sample(3)
[104]:
     attackerIP     nodeTitle      type        category  nodeID             victimIP  victimPort  vulnName  EventID
159  77.52.11.94    77.52.11.94    attackerIP  ip        ip::77.52.11.94    NaN       NaN         NaN       NaN
162  78.187.242.78  78.187.242.78  attackerIP  ip        ip::78.187.242.78  NaN       NaN         NaN       NaN
170  81.47.128.144  81.47.128.144  attackerIP  ip        ip::81.47.128.144  NaN       NaN         NaN       NaN
[103]:
hg2_g._edges.sample(3)
[103]:
     edgeType      category              vulnName          dst                         time(max)     time(min)     src                         victimPort  victimIP      EventID       count  attackerIP
516  ip::vulnName  attackerIP::vulnName  MS08067 (NetAPI)  vulnName::MS08067 (NetAPI)  1.416885e+09  1.416881e+09  ip::186.149.87.94           445.0       172.31.14.66  EventID::76   3      186.149.87.94
932  vulnName::ip  vulnName::victimIP    MS08067 (NetAPI)  ip::172.31.14.66            1.423515e+09  1.423515e+09  vulnName::MS08067 (NetAPI)  445.0       172.31.14.66  EventID::52   1      176.119.227.9
983  vulnName::ip  vulnName::victimIP    MS08067 (NetAPI)  ip::172.31.14.66            1.423932e+09  1.423932e+09  vulnName::MS08067 (NetAPI)  445.0       172.31.14.66  EventID::103  2      192.110.160.227
[113]:
(hg2_g

 .encode_point_color('type', categorical_mapping={
     'attackerIP': 'yellow',
     'victimIP': 'blue'
 }, default_mapping='gray')

 .encode_point_icon('type', categorical_mapping={
     'attackerIP': 'bomb',
     'victimIP': 'laptop'
 }, default_mapping='')

 .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

 .settings(url_params={'pointsOfInterestMax': 10})

).plot()
[113]:

Further reading: