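Tutorial: Data analysis in Graphistry
Register
Load a table
Plot:
Simple: the input is a list of edges
Arbitrary: the input is a table (hypergraph transform)
Advanced plotting
Further reading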
1. Register
[101]:
import graphistry
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
2. Load a table
Graphistry works seamlessly with dataframe libraries such as Pandas and GPU RAPIDS cuDF.
[94]:
import pandas as pd
df = pd.read_csv('./data/honeypot.csv')
df.sample(3)
[94]:
| | attackerIP | victimIP | victimPort | vulnName | count | time(max) | time(min) |
|---|---|---|---|---|---|---|---|
| 64 | 178.77.190.33 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 6 | 1.419968e+09 | 1.419967e+09 |
| 7 | 112.209.78.240 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 10 | 1.414516e+09 | 1.414514e+09 |
| 182 | 79.140.174.193 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 2 | 1.422062e+09 | 1.422062e+09 |
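The time(max) and time(min) columns look like Unix epoch seconds. This is an assumption based on their magnitude; the source does not state the unit. Under that assumption, a minimal standard-library sketch for turning one of the sampled values into a readable UTC timestamp could look like this (epoch_to_utc is a hypothetical helper name):

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: float) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Value copied from the time(min) column of the first sampled row above.
# Assumption: the column stores seconds since 1970-01-01 UTC.
first_seen = epoch_to_utc(1.419967e9)
print(first_seen.isoformat())
```

If the assumption holds, the sampled events fall in late 2014 and early 2015.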
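3. Plot
A. Simple graphs
Build up a set of bindings. A simple graph consists of:
Required: an edge table with src and dst ID columns, plus optional additional property columns
Optional: a node table with a matching node ID column
See the UI guide for in-tool activities
Demo graph schema:
Input table: the alerts df above, with columns | attackerIP | victimIP |
Edges: link df's columns attackerIP -> victimIP
Nodes: unspecified; Graphistry defaults to generating them from the edges
Node colors: Graphistry defaults to inferring communities
Node sizes: Graphistry defaults to the number of edges ("degree")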
[16]:
g = graphistry.edges(df, 'attackerIP', 'victimIP')
[17]:
g.plot()
[17]:
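To make the default node handling in the schema above concrete, here is a minimal plain-Python sketch of deriving nodes and their degree from an edge list. It illustrates the idea only and is not Graphistry's implementation; infer_nodes is a hypothetical helper, and the edge list is a three-row stand-in for df[['attackerIP', 'victimIP']].

```python
from collections import Counter

# Hypothetical edge list built from the three sampled rows shown earlier
edges = [
    ("178.77.190.33", "172.31.14.66"),
    ("112.209.78.240", "172.31.14.66"),
    ("79.140.174.193", "172.31.14.66"),
]


def infer_nodes(edge_list):
    """Return {node_id: degree}, counting each edge once per endpoint."""
    degree = Counter()
    for src, dst in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


nodes = infer_nodes(edges)
for node_id, deg in sorted(nodes.items(), key=lambda kv: -kv[1]):
    print(node_id, deg)
```

In this sample the single victim IP ends up with the highest degree, which is why it would be drawn as the largest node under the default size encoding.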
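B. Hypergraphs: plot arbitrary tables
The hypergraph transform is a convenient way to turn a table into a graph:
It extracts entities from the table and links them together
Entities are linked when they come from the same row
Method 1: Treat each row as a node and link it to each of its cell values
Demo graph schema:
Edges: row -> attackerIP, row -> victimIP, row -> victimPort, row -> vulnName
Nodes: row, attackerIP, victimIP, victimPort, vulnName
Node colors: set automatically from inferred communities
Node sizes: number of edges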
[93]:
hg1 = graphistry.hypergraph(
    df,
    # Optional: Subset of columns to turn into nodes; defaults to all
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    # Optional: merge nodes when their IDs appear in multiple columns
    # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
    # ... with just one node ip::1.1.1.1
    opts={
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })
hg1_g = hg1['graph']
hg1_g.plot()
# links 880
# events 220
# attrib entities 221
[93]:
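The row-to-cell linking of Method 1 can be sketched in plain Python. This is a conceptual illustration only, not the library's code: hypergraph_rows is a hypothetical helper, the edge direction follows the schema described above, and the two sample rows reuse values from the sampled table.

```python
def hypergraph_rows(rows, entity_types, categories=None):
    """Link each row (event) node to one node per selected cell value."""
    # Map a column name to its merged category, e.g. attackerIP -> ip
    col_to_cat = {
        col: cat
        for cat, cols in (categories or {}).items()
        for col in cols
    }
    nodes, edges = set(), []
    for i, row in enumerate(rows):
        event_id = f"EventID::{i}"
        nodes.add(event_id)
        for col in entity_types:
            value = row.get(col)
            if value is None:
                continue  # skip missing cells
            entity_id = f"{col_to_cat.get(col, col)}::{value}"
            nodes.add(entity_id)
            edges.append((event_id, entity_id))
    return nodes, edges


sample_rows = [
    {"attackerIP": "178.77.190.33", "victimIP": "172.31.14.66",
     "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
    {"attackerIP": "112.209.78.240", "victimIP": "172.31.14.66",
     "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
]
sample_nodes, sample_edges = hypergraph_rows(
    sample_rows,
    entity_types=["attackerIP", "victimIP", "victimPort", "vulnName"],
    categories={"ip": ["attackerIP", "victimIP"]},
)
print(len(sample_nodes), len(sample_edges))
```

Each row contributes one edge per selected column, so 220 rows with 4 entity columns would give 220 × 4 = 880 edges if no cells are missing. That is consistent with the "links 880" and "events 220" counts printed above.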
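Method 2: Link values from column entries
For finer control over the hypergraph, we can skip the row nodes and control which edges are generated by enabling direct.
Demo graph schema:
Edges:
attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
victimPort -> victimIP
vulnName -> victimIP
Nodes: attackerIP, victimIP, victimPort, vulnName
Default colors: set automatically from inferred communities
Default node sizes: number of edges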
[102]:
hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        # Optional: Without, creates edges that are all-to-all for each row
        'EDGES': {
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']
        },
        # Optional: merge nodes when their IDs appear in multiple columns
        # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
        # ... with just one node ip::1.1.1.1
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })
hg2_g = hg2['graph']
hg2_g.plot()
# links 1100
# events 220
# attrib entities 221
[102]:
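The direct mode of Method 2 can be sketched the same way: instead of a row node, each row emits one edge per (source column, destination column) pair listed in the EDGES option. The helper below is hypothetical and only illustrates the shape of the result; it is not how the library is implemented.

```python
def hypergraph_direct(rows, edge_shape, categories=None):
    """Emit value-to-value edges per row, following edge_shape."""
    # Map a column name to its merged category, e.g. attackerIP -> ip
    col_to_cat = {
        col: cat
        for cat, cols in (categories or {}).items()
        for col in cols
    }

    def node_id(col, value):
        return f"{col_to_cat.get(col, col)}::{value}"

    edges = []
    for row in rows:
        for src_col, dst_cols in edge_shape.items():
            for dst_col in dst_cols:
                src_val, dst_val = row.get(src_col), row.get(dst_col)
                if src_val is None or dst_val is None:
                    continue  # skip pairs with a missing cell
                edges.append((node_id(src_col, src_val),
                              node_id(dst_col, dst_val)))
    return edges


direct_rows = [
    {"attackerIP": "178.77.190.33", "victimIP": "172.31.14.66",
     "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
    {"attackerIP": "112.209.78.240", "victimIP": "172.31.14.66",
     "victimPort": 445.0, "vulnName": "MS08067 (NetAPI)"},
]
direct_edges = hypergraph_direct(
    direct_rows,
    edge_shape={
        "attackerIP": ["victimIP", "victimPort", "vulnName"],
        "victimPort": ["victimIP"],
        "vulnName": ["victimIP"],
    },
    categories={"ip": ["attackerIP", "victimIP"]},
)
print(len(direct_edges))
```

The edge shape above lists 3 + 1 + 1 = 5 column pairs, so 220 complete rows would give 220 × 5 = 1100 edges, consistent with the "links 1100" count printed above.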
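4. Advanced plotting
You can then drive visual styles from node and edge attributes.
This demo first computes a node table. By default, you do not need to provide a node table explicitly, but without one you may lack data for node attributes:
Nodes inferred for a regular graph have only an id and a degree
Hypergraph edges and row nodes have many attributes, but hypergraph entity nodes have only an id, a type/category, and a degree
Demo schema:
Node table: | node_id | type | attacks |
Point size: number of attacks
Point icon and color: attacker vs. victim
Edge color: based on first attack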
[62]:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node
targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)
attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'})
    .assign(type='attacker')
)
nodes_df = pd.concat([targets_df, attackers_df])
nodes_df.sort_values(by='attacks', ascending=False)[:5]
[62]:
| | node_id | type | attacks |
|---|---|---|---|
| 31 | 125.64.35.67 | attacker | 6.0 |
| 32 | 125.64.35.68 | attacker | 4.0 |
| 95 | 198.204.253.101 | attacker | 2.0 |
| 78 | 188.225.73.153 | attacker | 2.0 |
| 79 | 188.44.107.239 | attacker | 2.0 |
[86]:
# Add the node table and visual encodings
# The newer encoding features require api=3: `graphistry.register(api=3, username='...', password='...')`
g2 = (g
    .nodes(nodes_df, 'node_id')
    # Color values may be written as 'red', '#f00', or '#ff0000'
    .encode_point_color('type', categorical_mapping={
        'attacker': 'red',
        'victim': 'white'
    }, default_mapping='gray')
    # Icons: https://fontawesome.com/v4.7/cheatsheet/
    .encode_point_icon('type', categorical_mapping={
        'attacker': 'bomb',
        'victim': 'laptop'
    })
    # Gradient
    .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)
    .encode_point_size('attacks')
    .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})
    # Options: https://hub.graphistry.com/docs/api/1/rest/url/
    .settings(url_params={'play': 1000, 'pointSize': 0.5})
)
g2.plot(as_files=False)
[86]:
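The categorical_mapping and default_mapping arguments used above can be read as a lookup with a fallback. The plain-Python sketch below illustrates that reading only; resolve_encoding is a hypothetical helper and does not describe how Graphistry resolves encodings internally.

```python
def resolve_encoding(value, categorical_mapping, default_mapping=None):
    """Return the mapped style for value, or the default when unmapped."""
    return categorical_mapping.get(value, default_mapping)


# Same mapping as the point color encoding above
color_mapping = {"attacker": "red", "victim": "white"}

attacker_color = resolve_encoding("attacker", color_mapping, "gray")
victim_color = resolve_encoding("victim", color_mapping, "gray")
# Any type missing from the mapping falls back to the default
other_color = resolve_encoding("unknown", color_mapping, "gray")
print(attacker_color, victim_color, other_color)
```

Supplying a default keeps nodes with unexpected or missing type values visible in a neutral color rather than leaving their style undefined.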
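Advanced bindings also work with hypergraphs
Hypergraphs precompute many values on nodes and edges, which we can use to drive clearer visualizations.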
[104]:
hg2_g._nodes.sample(3)
[104]:
| | attackerIP | nodeTitle | type | category | nodeID | victimIP | victimPort | vulnName | EventID |
|---|---|---|---|---|---|---|---|---|---|
| 159 | 77.52.11.94 | 77.52.11.94 | attackerIP | ip | ip::77.52.11.94 | NaN | NaN | NaN | NaN |
| 162 | 78.187.242.78 | 78.187.242.78 | attackerIP | ip | ip::78.187.242.78 | NaN | NaN | NaN | NaN |
| 170 | 81.47.128.144 | 81.47.128.144 | attackerIP | ip | ip::81.47.128.144 | NaN | NaN | NaN | NaN |
[103]:
hg2_g._edges.sample(3)
[103]:
| | edgeType | category | vulnName | dst | time(max) | time(min) | src | victimPort | victimIP | EventID | count | attackerIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 516 | ip::vulnName | attackerIP::vulnName | MS08067 (NetAPI) | vulnName::MS08067 (NetAPI) | 1.416885e+09 | 1.416881e+09 | ip::186.149.87.94 | 445.0 | 172.31.14.66 | EventID::76 | 3 | 186.149.87.94 |
| 932 | vulnName::ip | vulnName::victimIP | MS08067 (NetAPI) | ip::172.31.14.66 | 1.423515e+09 | 1.423515e+09 | vulnName::MS08067 (NetAPI) | 445.0 | 172.31.14.66 | EventID::52 | 1 | 176.119.227.9 |
| 983 | vulnName::ip | vulnName::victimIP | MS08067 (NetAPI) | ip::172.31.14.66 | 1.423932e+09 | 1.423932e+09 | vulnName::MS08067 (NetAPI) | 445.0 | 172.31.14.66 | EventID::103 | 2 | 192.110.160.227 |
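In the sampled tables above, node identifiers such as ip::77.52.11.94 and vulnName::MS08067 (NetAPI) appear to follow a category::value pattern. Assuming that pattern holds for all entity nodes, which the samples suggest but do not prove, a small hypothetical helper can split an identifier back into its parts:

```python
def split_node_id(node_id: str):
    """Split 'category::value' into (category, value).

    Splits on the first '::' only, so values that themselves
    contain '::' are kept intact.
    """
    category, sep, value = node_id.partition("::")
    if not sep:
        raise ValueError(f"not a category::value id: {node_id!r}")
    return category, value


ip_parts = split_node_id("ip::77.52.11.94")
vuln_parts = split_node_id("vulnName::MS08067 (NetAPI)")
print(ip_parts, vuln_parts)
```

This can be handy when filtering or labeling hypergraph nodes by category outside of Graphistry.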
[113]:
(hg2_g
    .encode_point_color('type', categorical_mapping={
        'attackerIP': 'yellow',
        'victimIP': 'blue'
    }, default_mapping='gray')
    .encode_point_icon('type', categorical_mapping={
        'attackerIP': 'bomb',
        'victimIP': 'laptop'
    }, default_mapping='')
    .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)
    .settings(url_params={'pointsOfInterestMax': 10})
).plot()
[113]: