Tutorial: Data Analysis in Graphistry

Register
Load a table
Plot:
- Simple: input is a list of edges
- Arbitrary: input is a table (hypergraph transform)
Advanced plotting
Further reading

1. Register

[101]:

import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

2. Load a table

Graphistry works directly with dataframes such as Pandas and GPU RAPIDS cuDF.

[94]:

import pandas as pd

df = pd.read_csv('./data/honeypot.csv')

df.sample(3)

[94]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
64	178.77.190.33	172.31.14.66	445.0	MS08067 (NetAPI)	6	1.419968e+09	1.419967e+09
7	112.209.78.240	172.31.14.66	445.0	MS08067 (NetAPI)	10	1.414516e+09	1.414514e+09
182	79.140.174.193	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.422062e+09	1.422062e+09
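The time(max) and time(min) columns look like Unix epoch timestamps in seconds: values around 1.42e+09 fall in late 2014 and early 2015. That unit is an assumption, since the dataset does not document it. A minimal stdlib sketch that turns one such value into a readable UTC datetime:

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: float) -> datetime:
    """Convert Unix epoch seconds (assumed unit) into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# 1.419968e+09 is the time(max) value shown for row 64 above
print(epoch_to_utc(1.419968e+09).isoformat())  # prints 2014-12-30T19:33:20+00:00
```

With pandas, the equivalent for a whole column is pd.to_datetime(df['time(min)'], unit='s').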

3. Plot

A. Simple graphs

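Create a set of bindings. A simple graph has:
- Required: an edge table with src and dst ID columns, plus optional attribute columns
- Optional: a node table with a matching node ID column

See the UI guide for working inside the tool.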

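Demo graph schema:

- Input table: the alerts df above, with columns | attackerIP | victimIP |
- Edges: link df columns attackerIP -> victimIP
- Nodes: unspecified; Graphistry generates them from the edges by default
- Node color: Graphistry infers communities by default
- Node size: Graphistry uses the number of edges ("degree") by default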

[16]:

g = graphistry.edges(df, 'attackerIP', 'victimIP')

[17]:

g.plot()
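Because g binds only edges, the node list comes from the distinct src and dst values, and the default point size follows each node's degree. A pure-Python sketch of that inference, assuming degree means the number of incident edges (each duplicate edge counts again). It illustrates the idea and is not Graphistry's internal code:

```python
from collections import Counter

def infer_nodes(edges):
    """Derive node IDs and their degrees from (src, dst) pairs.

    Every edge adds 1 to the degree of both of its endpoints.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)

# The three sampled alert rows, in the attackerIP -> victimIP shape used above
edges = [
    ("178.77.190.33", "172.31.14.66"),
    ("112.209.78.240", "172.31.14.66"),
    ("79.140.174.193", "172.31.14.66"),
]
print(infer_nodes(edges))  # the shared victim 172.31.14.66 gets degree 3
```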


B. Hypergraphs: plot any table

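The hypergraph transform is a convenient way to turn a table into a graph:

- It extracts entities from the table and links them together
- Entities are linked when they come from the same row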

Method 1: Treat each row as a node and link it to each of its cell values

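Demo graph schema:
- Edges: row -> attackerIP, row -> victimIP, row -> victimPort, row -> vulnName
- Nodes: row, attackerIP, victimIP, victimPort, vulnName
- Node color: automatic, based on inferred communities
- Node size: number of edges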

[93]:

hg1 = graphistry.hypergraph(
    df,

    # Optional: Subset of columns to turn into nodes; defaults to all
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],

    # Optional: merge nodes when their IDs appear in multiple columns
    # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
    # ... with just one node ip::1.1.1.1
    opts={
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })

hg1_g = hg1['graph']
hg1_g.plot()

# links 880
# events 220
# attrib entities 221
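The printed link count fits one edge per row per entity column: 220 rows x 4 entity types = 880 links. A pure-Python sketch of the row-node transform, assuming entity node IDs take the form category::value (suggested by IDs such as ip::77.52.11.94 and EventID::76 in the hypergraph tables shown later), with CATEGORIES mapping several columns onto one category so equal values merge. The function name and the row-node IDs are hypothetical:

```python
def row_hypergraph(rows, entity_types, categories=None):
    """Sketch: one edge from each row node to each of its entity cells.

    rows: list of dicts, one per table row
    categories: optional {category: [columns]}; columns in one category share node IDs
    """
    # By default each column is its own category
    col_to_cat = {col: col for col in entity_types}
    for cat, cols in (categories or {}).items():
        for col in cols:
            col_to_cat[col] = cat

    edges, entity_nodes = [], set()
    for i, row in enumerate(rows):
        row_node = f"EventID::{i}"  # hypothetical row-node ID
        for col in entity_types:
            entity = f"{col_to_cat[col]}::{row[col]}"
            entity_nodes.add(entity)
            edges.append((row_node, entity))
    return edges, entity_nodes

cols = ["attackerIP", "victimIP", "victimPort", "vulnName"]
rows = [  # toy rows; 1.1.1.1 appears as both attacker and victim
    {"attackerIP": "1.1.1.1", "victimIP": "172.31.14.66", "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
    {"attackerIP": "2.2.2.2", "victimIP": "1.1.1.1", "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
]
edges, nodes = row_hypergraph(rows, cols, categories={"ip": ["attackerIP", "victimIP"]})
print(len(edges), sorted(nodes))
```

Two rows x 4 columns give 8 edges. With the ip category, 1.1.1.1 becomes a single node (5 entity nodes); without it, attackerIP::1.1.1.1 and victimIP::1.1.1.1 stay separate (6 entity nodes).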


Method 2: Link values from the column entries directly

For finer control over the hypergraph, we can skip the row nodes and choose which edges get generated by enabling direct=True.

Demo graph schema:
- Edges:
  - attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
  - victimPort -> victimIP
  - vulnName -> victimIP
- Nodes: attackerIP, victimIP, victimPort, vulnName
- Default color: automatic, based on inferred communities
- Default node size: number of edges

[102]:

hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        # Optional: without EDGES, direct=True links every entity column to every other one within each row
        'EDGES': {
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']
        },

        # Optional: merge nodes when their IDs appear in multiple columns
        # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
        # ... with just one node ip::1.1.1.1
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })

hg2_g = hg2['graph']
hg2_g.plot()

# links 1100
# events 220
# attrib entities 221
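Here each row contributes one edge per rule in EDGES: three from attackerIP and one each from victimPort and vulnName, so 220 rows x 5 = 1100 links, matching the printed count. A pure-Python sketch of direct mode driven by such a rule map, assuming the same category::value node-ID scheme as the hypergraph tables shown later. The function name is hypothetical:

```python
def direct_hypergraph(rows, edge_rules, categories=None):
    """Sketch of direct mode: per row, link entity values according to edge_rules.

    edge_rules: {src_column: [dst_columns]}, shaped like the EDGES option above
    categories: optional {category: [columns]}; columns in one category share node IDs
    """
    col_to_cat = {}
    for cat, cols in (categories or {}).items():
        for col in cols:
            col_to_cat[col] = cat

    def node_id(col, value):
        # Assumed ID scheme: "<category>::<value>", category defaulting to the column name
        return f"{col_to_cat.get(col, col)}::{value}"

    edges = []
    for row in rows:
        for src_col, dst_cols in edge_rules.items():
            for dst_col in dst_cols:
                edges.append((node_id(src_col, row[src_col]), node_id(dst_col, row[dst_col])))
    return edges

edge_rules = {
    "attackerIP": ["victimIP", "victimPort", "vulnName"],
    "victimPort": ["victimIP"],
    "vulnName": ["victimIP"],
}
rows = [  # toy rows
    {"attackerIP": "1.1.1.1", "victimIP": "172.31.14.66", "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
    {"attackerIP": "2.2.2.2", "victimIP": "172.31.14.66", "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
]
edges = direct_hypergraph(rows, edge_rules, categories={"ip": ["attackerIP", "victimIP"]})
print(len(edges))  # 2 rows x 5 rules = 10
print(edges[0])    # ('ip::1.1.1.1', 'ip::172.31.14.66')
```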


4. Advanced plotting

You can then drive visual styles from node and edge attributes.

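This demo first computes a node table. You do not have to provide a node table explicitly, but without one you may lack data for node attributes:

- Regular inferred graph nodes will have only an id and a degree
- Hypergraph edges and row nodes will have many attributes, but hypergraph entity nodes will have only an id, a type/category, and a degree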

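Demo schema:

- Node table: | node_id | type | attacks |
- Point size: number of attacks
- Point icon and color: attacker vs. victim
- Edge color: based on the time of first attack (time(min))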

[62]:

# Cell:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node

targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)

attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
)

nodes_df = pd.concat([targets_df, attackers_df])

nodes_df.sort_values(by='attacks', ascending=False)[:5]

[62]:

	node_id	type	attacks
31	125.64.35.67	attacker	6.0
32	125.64.35.68	attacker	4.0
95	198.204.253.101	attacker	2.0
78	188.225.73.153	attacker	2.0
79	188.44.107.239	attacker	2.0
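pd.concat stacks the victim and attacker tables, so an IP that appears as both an attacker and a victim would get two node rows with the same node_id. The rows shown above do not hit this case, but here is a pure-Python sketch of one way to guard against it by merging into one record per IP. Like the groupby above, it counts alert rows per attacker; victims get 0 instead of NaN, and the "both" label is a hypothetical choice that the color mapping below would send to its gray default:

```python
from collections import Counter

def build_nodes(alerts):
    """Build one node record per IP from (attackerIP, victimIP) alert pairs.

    attacks: number of alert rows where the IP is the attacker
    type: "attacker", "victim", or "both" (hypothetical label for IPs in both roles)
    """
    attacks = Counter(attacker for attacker, _ in alerts)
    victims = {victim for _, victim in alerts}
    nodes = {}
    for ip in sorted(set(attacks) | victims):
        if ip in attacks and ip in victims:
            node_type = "both"
        elif ip in attacks:
            node_type = "attacker"
        else:
            node_type = "victim"
        nodes[ip] = {"type": node_type, "attacks": attacks.get(ip, 0)}
    return nodes

alerts = [
    ("125.64.35.67", "172.31.14.66"),
    ("125.64.35.67", "172.31.14.66"),
    ("10.0.0.5", "125.64.35.67"),  # hypothetical row: an attacker that is also a victim
]
for ip, record in build_nodes(alerts).items():
    print(ip, record)
```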

[86]:

# Cell:
# Add the node table and visual encodings to the simple graph g

# New encoding features require api=3: graphistry.register(api=3, username='...', password='...')

g2 = (g
      .nodes(nodes_df, 'node_id')

      # Colors can be given as names or hex codes: 'red', '#f00', '#ff0000'
      .encode_point_color('type', categorical_mapping={
          'attacker': 'red',
          'victim': 'white'
      }, default_mapping='gray')

      # Icons: https://fontawesome.com/v4.7/cheatsheet/
      .encode_point_icon('type', categorical_mapping={
          'attacker': 'bomb',
          'victim': 'laptop'
      })

      # Continuous edge color gradient over each edge's time(min) value
      .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

      .encode_point_size('attacks')

      .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})

      # Options: https://hub.graphistry.com/docs/api/1/rest/url/
      .settings(url_params={'play': 1000, 'pointSize': 0.5})
)

g2.plot(as_files=False)
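Two mapping ideas drive the styling above: categorical_mapping looks up each value and falls back to default_mapping for anything unlisted, while as_continuous=True spreads the palette across the column's value range. A pure-Python sketch of both, assuming even-width bins from min to max for the continuous case; Graphistry's actual interpolation is not described here and may differ:

```python
def categorical_color(value, mapping, default):
    """Look up a category's color, falling back to the default for unmapped values."""
    return mapping.get(value, default)

def continuous_color(x, lo, hi, palette):
    """Place x into one of len(palette) equal-width bins between lo and hi (assumed scheme)."""
    if hi == lo:
        return palette[0]
    frac = (x - lo) / (hi - lo)
    idx = min(int(frac * len(palette)), len(palette) - 1)  # clamp x == hi into the last bin
    return palette[idx]

mapping = {"attacker": "red", "victim": "white"}
print(categorical_color("attacker", mapping, "gray"))  # red
print(categorical_color("unknown", mapping, "gray"))   # gray

palette = ["blue", "purple", "red"]
times = [1.414514e+09, 1.419967e+09, 1.422062e+09]  # time(min) values from the sampled rows
lo, hi = min(times), max(times)
print([continuous_color(t, lo, hi, palette) for t in times])
```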


Advanced bindings also work for hypergraphs

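Hypergraphs precompute many values on nodes and edges, such as type, category, and the original row attributes, which we can use to drive clearer visualizations.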

[104]:

hg2_g._nodes.sample(3)

[104]:

	attackerIP	nodeTitle	type	category	nodeID	victimIP	victimPort	vulnName	EventID
159	77.52.11.94	77.52.11.94	attackerIP	ip	ip::77.52.11.94	NaN	NaN	NaN	NaN
162	78.187.242.78	78.187.242.78	attackerIP	ip	ip::78.187.242.78	NaN	NaN	NaN	NaN
170	81.47.128.144	81.47.128.144	attackerIP	ip	ip::81.47.128.144	NaN	NaN	NaN	NaN

[103]:

hg2_g._edges.sample(3)

[103]:

	edgeType	category	vulnName	dst	time(max)	time(min)	src	victimPort	victimIP	EventID	count	attackerIP
516	ip::vulnName	attackerIP::vulnName	MS08067 (NetAPI)	vulnName::MS08067 (NetAPI)	1.416885e+09	1.416881e+09	ip::186.149.87.94	445.0	172.31.14.66	EventID::76	3	186.149.87.94
932	vulnName::ip	vulnName::victimIP	MS08067 (NetAPI)	ip::172.31.14.66	1.423515e+09	1.423515e+09	vulnName::MS08067 (NetAPI)	445.0	172.31.14.66	EventID::52	1	176.119.227.9
983	vulnName::ip	vulnName::victimIP	MS08067 (NetAPI)	ip::172.31.14.66	1.423932e+09	1.423932e+09	vulnName::MS08067 (NetAPI)	445.0	172.31.14.66	EventID::103	2	192.110.160.227

[113]:

(hg2_g

 .encode_point_color('type', categorical_mapping={
     'attackerIP': 'yellow',
     'victimIP': 'blue'
 }, default_mapping='gray')

 .encode_point_icon('type', categorical_mapping={
     'attackerIP': 'bomb',
     'victimIP': 'laptop'
 }, default_mapping='')

 .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

 .settings(url_params={'pointsOfInterestMax': 10})

).plot()

