Splunk <> Graphistry#

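Graphistry brings modern visual analytics to event data in Splunk. The full platform is aimed at enterprise teams, while this tutorial shares visualization techniques for researchers and hunters.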

To use:
* Read along, and click through to launch the prebuilt visualizations
* Plug in your Graphistry API key and Splunk credentials to run it on your own data

Further reading:
* UI guide: https://hub.graphistry.com/docs/ui/index/
* Python client tutorials and demos: graphistry/pygraphistry
* Graphistry API key: https://www.graphistry.com/api-request
* DoD / VAST Challenge: https://www.cs.umd.edu/hcil/varepository/benchmarks.php

0. Configure#

[ ]:

#splunk
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'
}
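
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the same `SPLUNK` dictionary could instead be assembled from environment variables. The `SPLUNK_*` variable names and the helper `splunk_config_from_env` are hypothetical choices for this illustration, not part of the Splunk SDK or Graphistry.

```python
import os

def splunk_config_from_env(env=None):
    """Build a connection dict shaped like SPLUNK from environment variables.

    The SPLUNK_* variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        'host': env.get('SPLUNK_HOST', 'MY.SPLUNK.com'),
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        'port': int(env.get('SPLUNK_PORT', '8089')),
        'username': env['SPLUNK_USERNAME'],  # required: KeyError if unset
        'password': env['SPLUNK_PASSWORD'],  # required: KeyError if unset
    }

# Demonstrate with an explicit mapping instead of the real environment
example = splunk_config_from_env({
    'SPLUNK_HOST': 'splunk.example.com',
    'SPLUNK_USERNAME': 'analyst',
    'SPLUNK_PASSWORD': 'not-a-real-password',
})
print(example['host'], example['port'])
```

In the notebook, `SPLUNK = splunk_config_from_env()` would then replace the literal dictionary, provided those variables are set before the kernel starts.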

1. Imports#

[ ]:
import pandas as pd

Graphistry#

[ ]:
!pip install graphistry

import graphistry
graphistry.__version__

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
Requirement already satisfied: graphistry in /usr/local/lib/python2.7/dist-packages (0.9.56)
Requirement already satisfied: pandas>=0.17.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.22.0)
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from graphistry) (1.14.6)
Requirement already satisfied: requests in /usr/local/lib/python2.7/dist-packages (from graphistry) (2.18.4)
Requirement already satisfied: future>=0.15.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.16.0)
Requirement already satisfied: protobuf>=2.6.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (3.6.1)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2018.5)
Requirement already satisfied: python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2.5.3)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (3.0.4)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (1.11.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (39.1.0)
u'0.9.56'

Splunk#

[ ]:
# !pip install splunk-sdk

import splunklib
[ ]:
#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results

service = client.connect(**SPLUNK)
[ ]:
def extend(o, override):
    # Return a new dict so the caller's arguments are never mutated
    out = dict(o)
    out.update(override)
    return out

STEP = 10000

def splunkToPandas(qry, overrides=None):
    # Run a blocking search, then page through the results STEP rows at a time
    kwargs_blockingsearch = extend({
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }, overrides or {})
    job = service.jobs.create(qry, **kwargs_blockingsearch)

    print("Search results:\n")
    resultCount = int(job["resultCount"])
    offset = 0

    print("results {}".format(resultCount))
    frames = []
    while offset < resultCount:
        print("fetching: {} - {}".format(offset, offset + STEP))
        kwargs_paginate = extend(kwargs_blockingsearch,
                                 {"count": STEP,
                                  "offset": offset})

        # Fetch one page; keep result rows (dicts) and skip diagnostic messages
        blocksearch_results = job.results(**kwargs_paginate)
        reader = results.ResultsReader(blocksearch_results)
        frames.append(pd.DataFrame([x for x in reader if isinstance(x, dict)]))
        offset += STEP
    # As before, return None when the search produced no results
    return pd.concat(frames, ignore_index=True) if frames else None
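
The helper above pages through a finished search job with an offset and a page size. The loop itself can be sketched without a Splunk connection. Here `fetch_all` and `fake_fetch_page` are hypothetical names, and the in-memory list stands in for `job.results(...)`.

```python
def fetch_all(fetch_page, total, step=10000):
    """Collect `total` rows by requesting pages of at most `step` rows."""
    rows = []
    offset = 0
    while offset < total:
        rows.extend(fetch_page(offset=offset, count=step))
        offset += step
    return rows

# Stand-in for job.results(...): serves slices of an in-memory list
fake_results = [{'n': i} for i in range(25)]

def fake_fetch_page(offset, count):
    return fake_results[offset:offset + count]

rows = fetch_all(fake_fetch_page, total=len(fake_results), step=10)
print(len(rows))
```

With 25 rows and a step of 10, the loop issues three requests (offsets 0, 10, and 20), and the last page is simply shorter than the others.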

2. Get data#

[ ]:
query = 'search index="vast" srcip=* destip=* | rename destip -> dest_ip, srcip -> src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})

#df = splunkToPandasAll('search index="vast" | head 10')
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)


print('results {}'.format(len(df)))

df.sample(5)
Search results:

results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035
      dest_ip         src_ip        protocol  time
4324  10.138.235.111  172.30.0.4    TCP       1505519752
2806  10.0.3.5        10.12.15.152  TCP       1505519767
2630  10.0.4.5        10.12.15.152  TCP       1505519769
20    10.0.4.7        10.6.6.7      TCP       1505519795
866   10.0.2.8        10.17.15.10   TCP       1505519787

3. Visualize!#

A) Simple IP<>IP: 1326 nodes, 253K edges#

[ ]:
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()

B) IP<>IP + srcip<>protocol: 1328 nodes, 506K edges#

[ ]:
def make_edges(df, src, dst):
  out = df.copy()
  out['src'] = df[src]
  out['dst'] = df[dst]
  return out



ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')

combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)
      dest_ip   src_ip          protocol  time        src             dst
6889  10.0.3.5  10.13.77.49     TCP       1505519777  10.13.77.49     TCP
3440  10.0.2.6  10.12.15.152    TCP       1505519761  10.12.15.152    10.0.2.6
6396  10.0.4.5  10.138.235.111  TCP       1505519782  10.138.235.111  TCP
1394  10.0.4.5  10.138.235.111  TCP       1505519782  10.138.235.111  10.0.4.5
5975  10.0.2.7  10.17.15.10     TCP       1505519786  10.17.15.10     TCP
8683  10.0.2.4  10.12.15.152    TCP       1505519759  10.12.15.152    TCP
[ ]:
graphistry.bind(source='src', destination='dst').edges(combined).plot()

C) All<>All via hypergraph: 254K nodes, 760K edges#

[ ]:
hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print(list(hg.keys()))
hg['graph'].plot()
('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
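
The counts above show 15105 links for 5035 events, which is exactly 3 links per event: one for each of the three entity types. The following is an illustrative sketch of that idea in plain Python, not Graphistry's actual implementation. The function name `toy_hypergraph` and the `column::value` id format are assumptions made for this example.

```python
def toy_hypergraph(rows, entity_types):
    """Turn each row into an event node linked to one node per column value."""
    event_nodes = []
    attrib_nodes = set()
    links = []
    for i, row in enumerate(rows):
        event_id = 'event::%d' % i
        event_nodes.append(event_id)
        for col in entity_types:
            # Equal values in the same column share a single attribute node
            attrib_id = '%s::%s' % (col, row[col])
            attrib_nodes.add(attrib_id)
            links.append((event_id, attrib_id))
    return event_nodes, sorted(attrib_nodes), links

rows = [
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.3.5', 'protocol': 'TCP'},
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.4.5', 'protocol': 'TCP'},
]
events, attribs, links = toy_hypergraph(rows, ['src_ip', 'dest_ip', 'protocol'])
print(len(events), len(attribs), len(links))
```

Because repeated values collapse into one attribute node, the number of attribute nodes grows with the number of distinct values rather than with the number of rows, while the link count stays at rows times entity types.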
[ ]:

Node colors#

[ ]:

nodes = pd.concat([
    df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'),
    df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip')
], ignore_index=True).drop_duplicates(['id'])

#see https://hub.graphistry.com/docs/api/api-color-palettes/
col2color = {
    "src_ip": 90005,
    "dest_ip": 46005
}

nodes_with_color = nodes.assign(color=nodes.apply(lambda row: col2color[ row['orig_col'] ], axis=1))

nodes_with_color.sample(3)
      id          orig_col  color
4383  172.30.0.3  src_ip    90005
9403  10.0.0.42   dest_ip   46005
4206  172.30.0.4  src_ip    90005
[ ]:
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).nodes(nodes_with_color).bind(node='id', point_color='color').plot()
[ ]: