Splunk<> Graphistry

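Graphistry brings modern visual analytics to event data in Splunk. The full platform is intended for enterprise teams, while this tutorial shares visualization techniques for researchers and threat hunters.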

How to use:
* Read along and click to launch the prebuilt visualizations
* Plug in your Graphistry API key and Splunk credentials to run it on your own data

Further reading:
* UI guide: https://hub.graphistry.com/docs/ui/index/
* Python client tutorials and demos: graphistry/pygraphistry
* Graphistry API key: https://www.graphistry.com/api-request
* DoD / VAST challenges: https://www.cs.umd.edu/hcil/varepository/benchmarks.php

0. Configure#

[ ]:

#splunk
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'
}
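
Hardcoding a password in a notebook makes it easy to leak when the notebook is shared. One alternative is sketched below, assuming the credentials are exported as environment variables. The variable names (`SPLUNK_HOST`, `SPLUNK_SCHEME`, `SPLUNK_PORT`, `SPLUNK_USERNAME`, `SPLUNK_PASSWORD`) and the helper `splunk_config_from_env` are hypothetical, chosen for illustration; neither Splunk nor Graphistry defines them.

```python
import os

def splunk_config_from_env(env=None):
    """Build the SPLUNK settings dict from environment variables.

    The variable names are illustrative assumptions; rename them to
    match your own deployment.
    """
    env = os.environ if env is None else env
    return {
        'host': env.get('SPLUNK_HOST', 'MY.SPLUNK.com'),
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        # Environment values are strings, so convert the port explicitly
        'port': int(env.get('SPLUNK_PORT', '8089')),
        'username': env.get('SPLUNK_USERNAME', 'MY_SPLUNK_USER'),
        'password': env.get('SPLUNK_PASSWORD', 'MY_SPLUNK_PWD'),
    }

SPLUNK = splunk_config_from_env()
```

The resulting dict has the same keys as the literal `SPLUNK` dict above, so it can be used in its place.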

1. Imports#

[ ]:

import pandas as pd

Graphistry#

[ ]:

!pip install graphistry

import graphistry
graphistry.__version__

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

Requirement already satisfied: graphistry in /usr/local/lib/python2.7/dist-packages (0.9.56)
Requirement already satisfied: pandas>=0.17.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.22.0)
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from graphistry) (1.14.6)
Requirement already satisfied: requests in /usr/local/lib/python2.7/dist-packages (from graphistry) (2.18.4)
Requirement already satisfied: future>=0.15.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.16.0)
Requirement already satisfied: protobuf>=2.6.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (3.6.1)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2018.5)
Requirement already satisfied: python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2.5.3)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (3.0.4)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (1.11.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (39.1.0)

u'0.9.56'

Splunk#

[ ]:

# !pip install splunk-sdk

import splunklib

[ ]:

#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results

service = client.connect(**SPLUNK)

[ ]:

STEP = 10000  # rows fetched per page of results

def splunkToPandas(qry, overrides=None):
    # Search-job settings; caller-supplied overrides win over the defaults
    kwargs_blockingsearch = {
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }
    kwargs_blockingsearch.update(overrides or {})
    job = service.jobs.create(qry, **kwargs_blockingsearch)

    print("Search results:\n")
    resultCount = int(job["resultCount"])
    print('results', resultCount)

    pages = []
    offset = 0
    while offset < resultCount:
        print("fetching:", offset, '-', offset + STEP)
        # Page through the finished job; only paging arguments are needed here
        blocksearch_results = job.results(count=STEP, offset=offset)
        reader = results.ResultsReader(blocksearch_results)
        # The reader can also yield diagnostic Message objects; keep only result rows
        pages.append(pd.DataFrame([x for x in reader if isinstance(x, dict)]))
        offset += STEP
    # Return an empty DataFrame, not None, when the search matches nothing
    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
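
The paging loop in `splunkToPandas` boils down to walking offsets from 0 up to the result count in steps of `STEP`. The sketch below isolates just that arithmetic, with no Splunk connection involved. `page_ranges` is a hypothetical helper name introduced here for illustration.

```python
def page_ranges(result_count, step=10000):
    """Yield (offset, count) pairs that together cover result_count rows."""
    if step <= 0:
        raise ValueError("step must be positive")
    offset = 0
    while offset < result_count:
        # The last page may be shorter than a full step
        yield offset, min(step, result_count - offset)
        offset += step

pages = list(page_ranges(25000, step=10000))
print(pages)
```

A search that returns fewer rows than `STEP` fits in a single page, and an empty search yields no pages at all.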

2. Get data#

[ ]:

query = 'search index="vast" srcip=* destip=* | rename destip AS dest_ip, srcip AS src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})

#df = splunkToPandasAll('search index="vast" | head 10')
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)


print('results', len(df))

df.sample(5)

Search results:

results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035

	dest_ip	src_ip	protocol	time
4324	10.138.235.111	172.30.0.4	TCP	1505519752
2806	10.0.3.5	10.12.15.152	TCP	1505519767
2630	10.0.4.5	10.12.15.152	TCP	1505519769
20	10.0.4.7	10.6.6.7	TCP	1505519795
866	10.0.2.8	10.17.15.10	TCP	1505519787
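
The `time` column above holds Unix epoch seconds, and values fetched through the Splunk SDK may arrive as strings. A small sketch, assuming that layout, of converting the column to pandas timestamps so it can be filtered or sorted as dates. The sample rows are copied from the output above, and `with_timestamps` is a hypothetical helper name.

```python
import pandas as pd

def with_timestamps(df, col='time'):
    """Return a copy of df with `col` parsed from epoch seconds to UTC datetimes."""
    out = df.copy()
    # pd.to_numeric handles values that arrive as strings
    out[col] = pd.to_datetime(pd.to_numeric(out[col]), unit='s', utc=True)
    return out

sample = pd.DataFrame({
    'dest_ip': ['10.138.235.111', '10.0.3.5'],
    'src_ip': ['172.30.0.4', '10.12.15.152'],
    'protocol': ['TCP', 'TCP'],
    'time': ['1505519752', '1505519767'],
})
converted = with_timestamps(sample)
print(converted['time'])
```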

3. Visualize!#

A) Simple IP<>IP: 1326 nodes, 253K edges#

[ ]:

graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()
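
Many event rows share the same source and destination, so the edge count tracks the number of events, not the number of distinct IP pairs. The sketch below, assuming the `src_ip` and `dest_ip` columns from the query above, collapses repeated pairs into one weighted edge and counts distinct nodes in plain pandas. `summarize_edges` and the `weight` column are hypothetical names introduced for illustration.

```python
import pandas as pd

def summarize_edges(df, src='src_ip', dst='dest_ip'):
    """Collapse repeated (src, dst) rows into one edge with a `weight` count."""
    return (df.groupby([src, dst])
              .size()
              .reset_index(name='weight'))

events = pd.DataFrame({
    'src_ip':  ['10.12.15.152', '10.12.15.152', '10.6.6.7', '10.12.15.152'],
    'dest_ip': ['10.0.3.5',     '10.0.3.5',     '10.0.4.7', '10.0.4.5'],
})
edges = summarize_edges(events)
# Distinct nodes are the union of both endpoint columns
n_nodes = pd.concat([events['src_ip'], events['dest_ip']]).nunique()
print(edges)
print('nodes:', n_nodes, 'edges:', len(edges))
```

The aggregated frame keeps the `src_ip` and `dest_ip` column names, so it can be passed to the same `bind(...).edges(...)` call in place of `df`.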

B) IP<>IP + srcip<>protocol: 1328 nodes, 506K edges#

[ ]:

def make_edges(df, src, dst):
  out = df.copy()
  out['src'] = df[src]
  out['dst'] = df[dst]
  return out



ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')

combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)

	dest_ip	src_ip	protocol	time	src	dst
6889	10.0.3.5	10.13.77.49	TCP	1505519777	10.13.77.49	TCP
3440	10.0.2.6	10.12.15.152	TCP	1505519761	10.12.15.152	10.0.2.6
6396	10.0.4.5	10.138.235.111	TCP	1505519782	10.138.235.111	TCP
1394	10.0.4.5	10.138.235.111	TCP	1505519782	10.138.235.111	10.0.4.5
5975	10.0.2.7	10.17.15.10	TCP	1505519786	10.17.15.10	TCP
8683	10.0.2.4	10.12.15.152	TCP	1505519759	10.12.15.152	TCP

[ ]:

graphistry.bind(source='src', destination='dst').edges(combined).plot()
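
After concatenation, IP-to-IP edges and IP-to-protocol edges can only be told apart by inspecting `dst`. One option is sketched below: a variant of `make_edges` that also records which column pair each edge came from. `make_typed_edges` and the `edge_type` column are hypothetical names, not part of the notebook or of PyGraphistry.

```python
import pandas as pd

def make_typed_edges(df, src, dst):
    """Like make_edges, but also label each row with the column pair it came from."""
    return df.assign(src=df[src], dst=df[dst], edge_type=src + '->' + dst)

events = pd.DataFrame({
    'src_ip': ['10.12.15.152', '10.6.6.7'],
    'dest_ip': ['10.0.3.5', '10.0.4.7'],
    'protocol': ['TCP', 'TCP'],
})
combined = pd.concat([
    make_typed_edges(events, 'src_ip', 'dest_ip'),
    make_typed_edges(events, 'src_ip', 'protocol'),
], ignore_index=True)
print(combined[['src', 'dst', 'edge_type']])
```

With the extra column in place, the two edge kinds can be filtered or styled separately.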

C) All<>All via hypergraph: 254K nodes, 760K edges#

[ ]:

hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print(list(hg.keys()))
hg['graph'].plot()

('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
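
The counts printed above are consistent with a simple reading of the hypergraph transform: each row becomes an event node, each distinct value of an entity column becomes an attribute node, and one link joins every event to each of its values (5035 events × 3 columns = 15105 links). The sketch below reproduces that bookkeeping in plain pandas under that assumption. It illustrates the idea only and is not PyGraphistry's implementation; `hypergraph_counts` is a hypothetical helper.

```python
import pandas as pd

def hypergraph_counts(df, entity_types):
    """Count event nodes, attribute nodes, and links for an event-to-attribute graph."""
    n_events = len(df)
    # One attribute node per distinct (column, value) pair
    n_attribs = sum(df[col].nunique() for col in entity_types)
    # One link per non-null cell in the entity columns
    n_links = int(df[entity_types].notna().sum().sum())
    return {'events': n_events, 'attribs': n_attribs, 'links': n_links}

events = pd.DataFrame({
    'src_ip': ['10.12.15.152', '10.12.15.152', '10.6.6.7'],
    'dest_ip': ['10.0.3.5', '10.0.4.5', '10.0.4.7'],
    'protocol': ['TCP', 'TCP', 'TCP'],
})
counts = hypergraph_counts(events, ['src_ip', 'dest_ip', 'protocol'])
print(counts)
```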

Node colors#

[ ]:

nodes = pd.concat([
    df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'),
    df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip') ],
    ignore_index=True).drop_duplicates(['id'])

#see https://hub.graphistry.com/docs/api/api-color-palettes/
col2color = {
    "src_ip": 90005,
    "dest_ip": 46005
}

nodes_with_color = nodes.assign(color=nodes['orig_col'].map(col2color))

nodes_with_color.sample(3)

	id	orig_col	color
4383	172.30.0.3	src_ip	90005
9403	10.0.0.42	dest_ip	46005
4206	172.30.0.4	src_ip	90005

[ ]:

graphistry.bind(source='src_ip', destination='dest_ip').edges(df).nodes(nodes_with_color).bind(node='id', point_color='color').plot()
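
Because `drop_duplicates(['id'])` keeps the first occurrence, an IP that appears as both a source and a destination is labeled `src_ip` only. The sketch below shows one way to make that third case explicit; `node_roles` and the `'both'` label are hypothetical, introduced here for illustration.

```python
import pandas as pd

def node_roles(df, src='src_ip', dst='dest_ip'):
    """Return one row per IP with a role of 'src_ip', 'dest_ip', or 'both'."""
    sources = set(df[src])
    dests = set(df[dst])
    rows = []
    for ip in sorted(sources | dests):
        if ip in sources and ip in dests:
            role = 'both'
        elif ip in sources:
            role = 'src_ip'
        else:
            role = 'dest_ip'
        rows.append({'id': ip, 'orig_col': role})
    return pd.DataFrame(rows, columns=['id', 'orig_col'])

events = pd.DataFrame({
    'src_ip': ['10.0.0.1', '10.0.0.2'],
    'dest_ip': ['10.0.0.2', '10.0.0.3'],
})
roles = node_roles(events)
print(roles)
```

To color the `'both'` group, add a third entry to `col2color` with a code of your choice from the palette reference linked above.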
