Visualize CSV Mini-App


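Jupyter: File -> Make a Copy; Colab: File -> Save a copy in Drive
Run notebook cells by pressing shift-enter
Either edit and run the top cells one by one, or edit and run the self-contained version at the bottom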

[1]:

#!pip install graphistry -q

[3]:

import pandas as pd
import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

1. Upload CSV

Use a file either by uploading it or through a URL.

Run help(pd.read_csv) for more options.

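File upload: Jupyter Notebooks

If the circle in the top right is not green, click kernel -> reconnect
Click the Jupyter logo to go to the file directory (/tree)
Navigate to the directory page that contains your notebook
Press the upload button in the top right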

File upload: Google Colab

Press the right arrow on the left to open the left sidebar
Go to the Files tab
Press UPLOAD
Make sure the file lands in /content

File upload: URL

Uncomment the line below and put in the actual data URL
Run help(pd.read_csv) for more options
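As a small illustration of the extra options help(pd.read_csv) lists, the sketch below reads a few rows copied from the honeypot preview. An in-memory io.StringIO buffer stands in for the uploaded file or URL (pd.read_csv accepts a path, a URL, or a file-like object); the variable names are illustrative only.

```python
import io

import pandas as pd

# Illustrative CSV text built from rows shown in this notebook's previews.
# pd.read_csv also accepts a local path or an http(s) URL in place of the buffer.
csv_text = """attackerIP,victimIP,victimPort,vulnName
41.230.211.128,172.31.14.66,445.0,MS08067 (NetAPI)
125.64.35.68,172.31.14.66,9999.0,MaxDB Vulnerability
151.252.204.92,172.31.14.66,139.0,MS08067 (NetAPI)
"""

df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['attackerIP', 'victimIP', 'vulnName'],  # load only the columns you need
    dtype={'vulnName': 'category'},                  # repeated strings -> categorical
    nrows=2,                                         # read at most 2 data rows
)
print(df_small.shape)  # (2, 3)
```

Limiting columns and rows this way keeps the upload-and-plot loop fast when the source CSV is large.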

[4]:

file_path = './data/honeypot.csv'
# file_path = 'https://...'  # uncomment and put in the actual data URL
df = pd.read_csv(file_path)

print('# rows', len(df))
df.sample(min(len(df), 3))

('# rows', 220)

[4]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
145	41.230.211.128	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.421730e+09	1.421729e+09
25	122.121.202.157	172.31.14.66	445.0	MS08067 (NetAPI)	8	1.423612e+09	1.423611e+09
75	182.68.160.230	172.31.14.66	445.0	MS08067 (NetAPI)	9	1.417438e+09	1.417436e+09
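df.sample returns different random rows on every run. The sketch below, on a hypothetical four-row frame, shows two pandas behaviors that matter here: passing random_state makes the preview reproducible, and sampling exactly len(df) rows (what the plot cell does whenever max_rows >= len(df)) keeps every row and only shuffles the order.

```python
import pandas as pd

# Hypothetical toy data standing in for the honeypot table.
df_demo = pd.DataFrame({'attackerIP': ['a', 'b', 'c', 'd'],
                        'victimIP': ['v', 'v', 'v', 'v']})

# Same random_state -> same rows on every run.
preview_1 = df_demo.sample(min(len(df_demo), 3), random_state=0)
preview_2 = df_demo.sample(min(len(df_demo), 3), random_state=0)

# Sampling all rows keeps every row; only the order changes.
shuffled = df_demo.sample(len(df_demo), random_state=0)
```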

2. Optional: Clean the CSV

[5]:

df = df.rename(columns={
#    'attackerIP': 'src_ip',
#    'victimIP': 'dest_ip'
})

df.sample(3)

[5]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
70	182.161.224.84	172.31.14.66	139.0	MS08067 (NetAPI)	4	1.419954e+09	1.419952e+09
10	115.115.227.82	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.413569e+09	1.413569e+09
152	46.130.76.13	172.31.14.66	445.0	MS08067 (NetAPI)	7	1.421093e+09	1.421092e+09
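Beyond renaming, a typical cleanup for this data is making the time and port columns readable. The sketch below assumes time(max) and time(min) are Unix epoch seconds (the values around 1.42e+09 are consistent with that, but the notebook does not say so) and uses two illustrative rows with rounded times; column names other than the renames are taken from the preview.

```python
import pandas as pd

# Two illustrative rows shaped like the honeypot data (times rounded).
raw = pd.DataFrame({
    'attackerIP': ['41.230.211.128', '125.64.35.68'],
    'victimIP': ['172.31.14.66', '172.31.14.66'],
    'victimPort': [445.0, 9999.0],
    'time(max)': [1421730000.0, 1420915000.0],
    'time(min)': [1421729000.0, 1417479000.0],
})

cleaned = raw.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip'})

# Assumption: the time columns hold Unix epoch seconds.
for col in ['time(max)', 'time(min)']:
    cleaned[col] = pd.to_datetime(cleaned[col], unit='s')

# Ports are whole numbers stored as floats; nullable Int64 still allows missing values.
cleaned['victimPort'] = cleaned['victimPort'].astype('Int64')
```

If you rename columns like this, the column names used in the configuration step must be updated to match.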

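3. Configure: Visualize with 3 kinds of graphs

Set mode and the corresponding values:

Mode 'A'. View a graph from a table of (src, dst) edges

Mode 'B'. See the hypergraph: draw each row as a node and connect it to the entities in that same row

Pick which columns become nodes
If multiple columns share the same type (e.g., "src_ip" and "dest_ip" are both "ip"), unify them

Mode 'C'. View by creating multiple nodes and edges per row

Choose how values in one column point to values in other columns
If multiple columns share the same type (e.g., "src_ip" and "dest_ip" are both "ip"), unify them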
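To make mode 'B' concrete, here is a pure-pandas sketch of the row-as-node idea: each row becomes an event node with one edge to each entity in node_cols, and columns grouped under the same category share a node namespace, so the same IP in two columns collapses into one node. This only illustrates the concept; it is not Graphistry's implementation, and the "event::" / "category::value" node-id format is a hypothetical choice.

```python
import pandas as pd

def hypergraph_edges(df, node_cols, categories=None):
    """Hypothetical sketch: one event node per row, linked to each entity in node_cols."""
    # Map each column to its category name; unlisted columns use their own name.
    col_to_cat = {col: cat for cat, cols in (categories or {}).items() for col in cols}
    rows = []
    for i, row in df.iterrows():
        event = f'event::{i}'
        for col in node_cols:
            if pd.isna(row[col]):
                continue  # skip missing values
            kind = col_to_cat.get(col, col)
            rows.append({'src': event, 'dst': f'{kind}::{row[col]}', 'edgeType': col})
    return pd.DataFrame(rows, columns=['src', 'dst', 'edgeType'])

# Hypothetical events: 1.1.1.1 is an attacker in row 0 and a victim in row 1.
events_b = pd.DataFrame({
    'attackerIP': ['1.1.1.1', '2.2.2.2'],
    'victimIP': ['172.31.14.66', '1.1.1.1'],
    'vulnName': ['MS08067 (NetAPI)', 'MS04011 (LSASS)'],
})
edges_b = hypergraph_edges(events_b, ['attackerIP', 'victimIP', 'vulnName'],
                           {'ip': ['attackerIP', 'victimIP']})
```

Because both IP columns map to the "ip" category, 1.1.1.1 ends up as a single node connected to both events.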
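Mode 'C' skips the per-row event node and links entity values directly. The sketch below shows the idea in plain pandas: for every row, it adds one edge per (source column -> destination column) pair in an edges dict shaped like the one in the config cell. It is an illustration under that assumption, not Graphistry's code, and the node-id format is hypothetical.

```python
import pandas as pd

def direct_edges(df, edges, categories=None):
    """Hypothetical sketch of mode 'C': one edge per row per (src col -> dst col) pair."""
    col_to_cat = {col: cat for cat, cols in (categories or {}).items() for col in cols}
    parts = []
    for src_col, dst_cols in edges.items():
        for dst_col in dst_cols:
            pair = df[[src_col, dst_col]].dropna()  # skip rows missing either value
            parts.append(pd.DataFrame({
                'src': col_to_cat.get(src_col, src_col) + '::' + pair[src_col].astype(str),
                'dst': col_to_cat.get(dst_col, dst_col) + '::' + pair[dst_col].astype(str),
                'edgeType': f'{src_col}->{dst_col}',
            }))
    return pd.concat(parts, ignore_index=True)

# Hypothetical events using the notebook's column names.
events_c = pd.DataFrame({
    'attackerIP': ['1.1.1.1', '2.2.2.2'],
    'victimIP': ['172.31.14.66', '172.31.14.66'],
    'victimPort': [445.0, 139.0],
    'vulnName': ['MS08067 (NetAPI)', 'MS04011 (LSASS)'],
})
edges_c_df = direct_edges(
    events_c,
    {'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
     'victimIP': ['victimPort'],
     'vulnName': ['victimIP']},
    {'ip': ['attackerIP', 'victimIP']},
)
```

With 5 column pairs and 2 rows, this yields 10 edges.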

[6]:

#Pick 'A', 'B', or 'C'
mode = 'B'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'



### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = { #optional; note: redefined below for mode 'C'
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}



### 'C' == mode
edges = {
    'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
    'victimIP': [ 'victimPort'],
    'vulnName': [ 'victimIP' ]
}
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}
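A misspelled column name in the config, such as 'attacker_IP' instead of 'attackerIP', only shows up later as a confusing error or a missing node type. A quick check like the hypothetical helper below lists every referenced column that is not in the DataFrame before you plot.

```python
import pandas as pd

def missing_config_columns(df, node_cols=(), edges=None, categories=None):
    """Hypothetical helper: return config column names that do not exist in df."""
    referenced = set(node_cols)
    for src, dsts in (edges or {}).items():
        referenced.add(src)
        referenced.update(dsts)
    for cols in (categories or {}).values():
        referenced.update(cols)
    return sorted(referenced - set(df.columns))

# Empty frame with the honeypot column names, just for the check.
df_cols = pd.DataFrame(columns=['attackerIP', 'victimIP', 'victimPort', 'vulnName'])
missing = missing_config_columns(df_cols, ['attackerIP'],
                                 categories={'ip': ['attacker_IP', 'victimIP']})
print(missing)  # ['attacker_IP']
```

In the notebook, calling it with the real df, node_cols, edges, and categories and asserting the result is empty is a cheap guard before the plot cell.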

4. Plot: Upload & Render!

See the UI guide

[75]:

g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']

#hg
print(len(g._edges))

g.plot()

('# links', 1100)
('# events', 220)
('# attrib entities', 221)
1100
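The printed link count is consistent with the mode 'C' edges configuration: it has 5 (source -> destination) column pairs, and 5 pairs × 220 rows = 1100 links, assuming no rows are dropped for missing values. The sketch below redoes that arithmetic and also shows a compact equivalent of the loop that builds node_cols for mode 'C' (sorted here only to make the result deterministic). The example_* names are illustrative so they do not overwrite the notebook's own variables.

```python
example_edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP'],
}
example_rows = 220

# Every column that appears as a source or a destination becomes a node column.
example_node_cols = sorted(set(example_edges) | {d for dsts in example_edges.values() for d in dsts})

# One link per row per (source -> destination) pair, if no rows are dropped.
pairs_per_row = sum(len(dsts) for dsts in example_edges.values())
expected_links = pairs_per_row * example_rows
print(example_node_cols, pairs_per_row, expected_links)
```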


Alternative: All-in-one

Split into one cell for loading the data and one cell for cleaning, configuring, and plotting.

[59]:

#!pip install graphistry -q
import pandas as pd
import graphistry
#graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')


##########
#1. Load
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print(df.columns)
print('rows:', len(df))
print(df.sample(min(len(df),3)))

Index([u'attackerIP', u'victimIP', u'victimPort', u'vulnName', u'count',
       u'time(max)', u'time(min)'],
      dtype='object')
('rows:', 220)
         attackerIP      victimIP  victimPort             vulnName  count  \
81  187.143.247.231  172.31.14.66       445.0      MS04011 (LSASS)      1
47   151.252.204.92  172.31.14.66       139.0     MS08067 (NetAPI)      1
41     125.64.35.68  172.31.14.66      9999.0  MaxDB Vulnerability      6

       time(max)     time(min)
81  1.420657e+09  1.420657e+09
47  1.422929e+09  1.422929e+09
41  1.420915e+09  1.417479e+09

[79]:

##########
#2. Clean
#df = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'})
# If you rename columns, update the column names in the config below to match


##########
#3. Config - Pick 'A', 'B', or 'C'
mode = 'C'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'

### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'victimPort', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}

### 'C' == mode
edges = {
    'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
    'victimIP': [ 'victimPort' ],
    'vulnName': ['victimIP' ]
}
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}

##########
#4. Plot
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']


g.plot()

('# links', 1100)
('# events', 220)
('# attrib entities', 221)
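If you uncomment the rename in the Clean step, the Config step still refers to the original column names. One way to keep them in sync is to push the same mapping through the config; the helper below is a hypothetical sketch of that, using the rename mapping and the mode 'B'/'C' values from the cell above.

```python
def rename_config(mapping, node_cols, edges, categories):
    """Hypothetical helper: apply a column rename mapping to the mode 'B'/'C' config."""
    def r(col):
        return mapping.get(col, col)  # unmapped columns keep their name
    new_node_cols = [r(c) for c in node_cols]
    new_edges = {r(src): [r(d) for d in dsts] for src, dsts in edges.items()}
    new_categories = {cat: [r(c) for c in cols] for cat, cols in categories.items()}
    return new_node_cols, new_edges, new_categories

mapping = {'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'}
new_node_cols, new_edges, new_categories = rename_config(
    mapping,
    ['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    {'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
     'victimIP': ['victimPort'],
     'vulnName': ['victimIP']},
    {'ip': ['attackerIP', 'victimIP']},
)
```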
