Visualize CSV Mini-App
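Jupyter: File -> Make a copy
Colab: File -> Save a copy in Drive
Run notebook cells by pressing shift-enter
Either edit and run the top cells one by one, or edit and run the self-contained version at the bottom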
[1]:
#!pip install graphistry -q
[3]:
import pandas as pd
import graphistry
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
1. Upload CSV
Use a file by uploading it or by pointing to its URL.
Run help(pd.read_csv) for more options.
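As a minimal sketch of a few `pd.read_csv` options, the snippet below parses an inline CSV string instead of a real upload. The sample rows and the names `csv_text` and `demo_df` are made up for illustration; `usecols`, `nrows`, and `dtype` are standard `read_csv` parameters.

```python
import io
import pandas as pd

# Hypothetical stand-in for an uploaded file: three columns we want, one we skip.
csv_text = """attackerIP,victimIP,victimPort,note
10.0.0.1,10.0.0.9,445,first
10.0.0.2,10.0.0.9,139,second
10.0.0.3,10.0.0.9,445,third
"""

demo_df = pd.read_csv(
    io.StringIO(csv_text),  # a path, URL, or file-like object all work here
    usecols=['attackerIP', 'victimIP', 'victimPort'],  # load only these columns
    nrows=2,  # stop after the first two data rows
    dtype={'victimPort': 'int64'},  # force an integer port column
)
print(demo_df)
```

Limiting rows and columns at load time can help keep a large CSV manageable before it is sampled for plotting.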
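File upload: Jupyter Notebooks
If the circle in the top right is not green, click kernel->reconnect
Click the Jupyter logo to reach the file directory (/tree)
Navigate to the directory page containing your notebook
Press the upload button in the top right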
File upload: Google Colab
Open the left sidebar by pressing the right arrow on the left
Go to the Files tab
Press UPLOAD
Make sure the file goes into /content
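File upload: URL
Set file_path in the cell below to the actual data URL; pd.read_csv accepts URLs as well as local paths
Run help(pd.read_csv) for more options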
[4]:
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)
print('# rows', len(df))
df.sample(min(len(df), 3))
('# rows', 220)
[4]:
| | attackerIP | victimIP | victimPort | vulnName | count | time(max) | time(min) |
|---|---|---|---|---|---|---|---|
| 145 | 41.230.211.128 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 2 | 1.421730e+09 | 1.421729e+09 |
| 25 | 122.121.202.157 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 8 | 1.423612e+09 | 1.423611e+09 |
| 75 | 182.68.160.230 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 9 | 1.417438e+09 | 1.417436e+09 |
2. Optional: Clean up CSV
[5]:
df = df.rename(columns={
    # 'attackerIP': 'src_ip',
    # 'victimIP': 'dest_ip'
})
df.sample(3)
[5]:
| | attackerIP | victimIP | victimPort | vulnName | count | time(max) | time(min) |
|---|---|---|---|---|---|---|---|
| 70 | 182.161.224.84 | 172.31.14.66 | 139.0 | MS08067 (NetAPI) | 4 | 1.419954e+09 | 1.419952e+09 |
| 10 | 115.115.227.82 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 2 | 1.413569e+09 | 1.413569e+09 |
| 152 | 46.130.76.13 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 7 | 1.421093e+09 | 1.421092e+09 |
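One possible cleanup, sketched on made-up rows shaped like the sample above: rename columns and convert the `time(max)` / `time(min)` values, which look like Unix epoch seconds, into timestamps. The `src_ip` / `dest_ip` names follow the commented-out rename; `last_seen` and `first_seen` are hypothetical names chosen for this example.

```python
import pandas as pd

# Made-up rows using the same column names as the honeypot sample.
raw = pd.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.2'],
    'victimIP': ['10.0.0.9', '10.0.0.9'],
    'time(max)': [1.421730e+09, 1.423612e+09],
    'time(min)': [1.421729e+09, 1.423611e+09],
})

clean = raw.rename(columns={
    'attackerIP': 'src_ip',
    'victimIP': 'dest_ip',
    'time(max)': 'last_seen',
    'time(min)': 'first_seen',
})

# Assumption: the time columns hold seconds since the Unix epoch.
for col in ['first_seen', 'last_seen']:
    clean[col] = pd.to_datetime(clean[col], unit='s')

print(clean.dtypes)
```

If columns are renamed this way, the column names used in the configuration step need to be updated to match.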
3. Configure: Visualize with 3 kinds of graphs
Set mode and the corresponding values:
Mode "A". View a graph from a table of (src, dst) edges
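Mode A expects one row per edge. The pandas-only sketch below (made-up data, hypothetical `edge_df` name) shows what such a table looks like, and one optional way to collapse repeated (src, dst) pairs into weighted edges before plotting.

```python
import pandas as pd

edge_df = pd.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.1', '10.0.0.2'],
    'victimIP': ['10.0.0.9', '10.0.0.9', '10.0.0.9'],
    'count': [2, 3, 1],
})

# Collapse duplicate (source, destination) pairs and sum their counts.
weighted = (
    edge_df
    .groupby(['attackerIP', 'victimIP'], as_index=False)['count']
    .sum()
)

# Every distinct value in either column becomes a node.
node_ids = pd.unique(edge_df[['attackerIP', 'victimIP']].values.ravel())
print(weighted)
print(len(node_ids), 'nodes')
```

A frame like `weighted` can then be bound with `source='attackerIP'` and `destination='victimIP'`, as the plotting cell does for mode A.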
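Mode "B". View a hypergraph: draw each row as a node and connect it to the entities in that row
Pick which columns to use as nodes
If multiple columns share the same type (e.g., "src_ip" and "dest_ip" are both "ip"), unify them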
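To make the hypergraph idea concrete, here is a pandas-only sketch of the transformation described above. It is not PyGraphistry's implementation: the function name `hypergraph_edges_sketch` and the `event::` / `category::value` identifiers are assumptions made for illustration, and the library's own naming may differ.

```python
import pandas as pd

def hypergraph_edges_sketch(df, node_cols, categories=None):
    """Link one event node per row to an entity node per selected column."""
    categories = categories or {}
    # Map each column to its unified category, e.g. attackerIP -> ip.
    col_to_cat = {col: cat for cat, cols in categories.items() for col in cols}
    parts = []
    for col in node_cols:
        sub = df[col].dropna()  # missing values produce no entity node
        prefix = col_to_cat.get(col, col)
        parts.append(pd.DataFrame({
            'event': ['event::' + str(i) for i in sub.index],
            'entity': [prefix + '::' + str(v) for v in sub],
            'edge_type': col,
        }))
    return pd.concat(parts, ignore_index=True)

rows = pd.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.9'],
    'victimIP': ['10.0.0.9', '10.0.0.5'],
    'vulnName': ['MS08067 (NetAPI)', None],
})
hyper_edges = hypergraph_edges_sketch(
    rows,
    ['attackerIP', 'victimIP', 'vulnName'],
    categories={'ip': ['attackerIP', 'victimIP']},
)
print(hyper_edges)
```

Because both IP columns share the `ip` category, the address `10.0.0.9` becomes a single entity node whether it appears as an attacker or as a victim. Without the category, it would be split into two nodes, one per column.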
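Mode "C". View by creating multiple nodes and edges per row
Pick how different column values point to other column values
If multiple columns share the same type (e.g., "src_ip" and "dest_ip" are both "ip"), unify them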
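The per-row edges of mode C can be sketched in plain pandas as follows. This illustrates the idea only and is not PyGraphistry's implementation; `direct_edges_sketch` and the `column::value` identifiers are hypothetical.

```python
import pandas as pd

def direct_edges_sketch(df, edges):
    """Emit one edge per row for every (source column -> destination column) pair."""
    parts = []
    for src_col, dst_cols in edges.items():
        for dst_col in dst_cols:
            sub = df[[src_col, dst_col]].dropna()  # skip rows missing either end
            parts.append(pd.DataFrame({
                'src': [src_col + '::' + str(v) for v in sub[src_col]],
                'dst': [dst_col + '::' + str(v) for v in sub[dst_col]],
                'edge_type': src_col + '::' + dst_col,
            }))
    return pd.concat(parts, ignore_index=True)

rows = pd.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.2'],
    'victimIP': ['10.0.0.9', '10.0.0.9'],
    'victimPort': [445.0, 139.0],
    'vulnName': ['MS08067 (NetAPI)', 'MS04011 (LSASS)'],
})
edge_spec = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP'],
}
direct_edges = direct_edges_sketch(rows, edge_spec)
print(len(direct_edges), 'edges')
```

In this sketch the edge count is the number of complete rows times the number of column pairs, so 220 rows with five pairs would give 220 × 5 = 1100 edges.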
[6]:
# Pick 'A', 'B', or 'C'
mode = 'B'
max_rows = 1000
### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'
### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = {  # optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}
### 'C' == mode
edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP']
}
categories = {  # optional; this later assignment is the one that takes effect
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}
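Since every name in this configuration has to match a column in the CSV, a quick check before plotting can catch a misspelled name such as `attacker_IP`. The helper below is a hypothetical sketch (the name `missing_columns` is not part of pandas or PyGraphistry).

```python
import pandas as pd

def missing_columns(df, node_cols=(), categories=None, edges=None):
    """Return configured column names that are absent from df, sorted."""
    wanted = set(node_cols)
    for cols in (categories or {}).values():
        wanted.update(cols)
    for src, dsts in (edges or {}).items():
        wanted.add(src)
        wanted.update(dsts)
    return sorted(wanted - set(df.columns))

# Empty frame with the honeypot column names, enough to test the configuration.
demo = pd.DataFrame(columns=['attackerIP', 'victimIP', 'victimPort', 'vulnName'])
print(missing_columns(demo, categories={'ip': ['attacker_IP', 'victimIP']}))
# → ['attacker_IP']
```

An empty result means every configured name was found in the table.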
4. Plot: Upload & render!
See UI guide
[75]:
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    # Plain graph: one edge per row
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    # Hypergraph: one node per row, linked to the entities in node_cols
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    # Direct edges between column values; node columns are every column named in edges
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
#hg
print(len(g._edges))
g.plot()
('# links', 1100)
('# events', 220)
('# attrib entities', 221)
1100
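Note that `df.sample(num_rows)` draws rows at random without replacement, so when the table has more than `max_rows` rows, each run may upload a different subset. If a repeatable plot matters, one option is pandas' `random_state` parameter, sketched here on a made-up frame (`df_demo` and `max_rows_demo` are illustrative names).

```python
import pandas as pd

df_demo = pd.DataFrame({'row': range(10)})
max_rows_demo = 4
num = min(max_rows_demo, len(df_demo))

# random_state pins the pseudo-random draw so the same rows come back each run.
first = df_demo.sample(num, random_state=0)
second = df_demo.sample(num, random_state=0)
print(first.index.tolist() == second.index.tolist())
```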
Alternative: Combined
Split into data loading and cleaning/configuring/plotting.
[59]:
#!pip install graphistry -q
import pandas as pd
import graphistry
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
##########
#1. Load
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)
print(df.columns)
print('rows:', len(df))
print(df.sample(min(len(df),3)))
Index([u'attackerIP', u'victimIP', u'victimPort', u'vulnName', u'count',
u'time(max)', u'time(min)'],
dtype='object')
('rows:', 220)
attackerIP victimIP victimPort vulnName count \
81 187.143.247.231 172.31.14.66 445.0 MS04011 (LSASS) 1
47 151.252.204.92 172.31.14.66 139.0 MS08067 (NetAPI) 1
41 125.64.35.68 172.31.14.66 9999.0 MaxDB Vulnerability 6
time(max) time(min)
81 1.420657e+09 1.420657e+09
47 1.422929e+09 1.422929e+09
41 1.420915e+09 1.417479e+09
[79]:
##########
#2. Clean
#df = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'})
##########
#3. Config - Pick 'A', 'B', or 'C'
mode = 'C'
max_rows = 1000
### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'
### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'victimPort', 'vulnName']
categories = {  # optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}
### 'C' == mode
edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP']
}
categories = {  # optional; this later assignment is the one that takes effect
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}
##########
#4. Plot
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    # Plain graph: one edge per row
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    # Hypergraph: one node per row, linked to the entities in node_cols
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    # Direct edges between column values; node columns are every column named in edges
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
g.plot()
('# links', 1100)
('# events', 220)
('# attrib entities', 221)