Tutorial: Graph Algorithm Analysis with the NetworkX API

In the previous tutorial, we showed how to manipulate graph data with the NetworkX API. This tutorial shows how to run NetworkX-style graph analysis with GraphScope.

How does NetworkX perform graph analysis?

A NetworkX graph analysis workflow usually begins with building the graph.

In the following example, we first create an empty graph and then populate it through NetworkX's graph manipulation interfaces.

import networkx
# Initialize an empty graph
G = networkx.Graph()

# Add edges (1, 2) and (1, 3) via the `add_edges_from` interface
G.add_edges_from([(1, 2), (1, 3)])

# Add vertex 4 via the `add_node` interface
G.add_node(4)

We can then query information about the graph.

# Query the number of vertices by `number_of_nodes` interface.
G.number_of_nodes()
# Similarly, query the number of edges by `number_of_edges` interface.
G.number_of_edges()
# Query the degree of each vertex by `degree` interface.
sorted(d for n, d in G.degree())
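To make the expected query results concrete, the sketch below mirrors the same graph with a plain dict of neighbor sets. This is a hypothetical stand-in for illustration only, not NetworkX's internal data structure, and the helper names add_node/add_edge here are local functions, not the NetworkX methods.

```python
# Hypothetical stand-in for G: an undirected graph as a dict of neighbor sets.
adj = {}

def add_node(n):
    # Create the vertex with no neighbors if it does not exist yet.
    adj.setdefault(n, set())

def add_edge(u, v):
    # Undirected: record each endpoint in the other's neighbor set.
    add_node(u)
    add_node(v)
    adj[u].add(v)
    adj[v].add(u)

for u, v in [(1, 2), (1, 3)]:
    add_edge(u, v)
add_node(4)

num_nodes = len(adj)
# Each undirected edge appears twice in the neighbor sets (no self-loops here).
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
degrees = sorted(len(nbrs) for nbrs in adj.values())
print(num_nodes, num_edges, degrees)  # 4 2 [0, 1, 1, 2]
```

These are the same values the three NetworkX queries above report for G: 4 vertices, 2 edges, and sorted degrees [0, 1, 1, 2].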

Finally, we can call NetworkX's built-in algorithms to analyze graph G.

# Run 'connected components' algorithm
list(networkx.connected_components(G))

# Run 'clustering' algorithm
networkx.clustering(G)
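For intuition about what these two algorithms compute, here is a minimal pure-Python sketch: connected components found by breadth-first search, and the local clustering coefficient of an unweighted graph, i.e. the number of edges among a vertex's neighbors divided by the number of possible neighbor pairs (0 for vertices with fewer than two neighbors). It is an illustration under those definitions, not NetworkX's implementation; the function names are local to this sketch.

```python
from collections import deque
from itertools import combinations

# Same graph as G above: edges (1, 2), (1, 3) plus isolated vertex 4.
adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}

def connected_components(adj):
    """Yield each connected component as a set of vertices (BFS)."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        yield component

def clustering(adj):
    """Local clustering coefficient: linked neighbor pairs / possible pairs."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0  # fewer than two neighbors: no triangle possible
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[v] = 2 * links / (k * (k - 1))
    return result

print(list(connected_components(adj)))  # [{1, 2, 3}, {4}]
print(clustering(adj))  # {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
```

Every coefficient is 0 here because vertices 2 and 3 are not connected, so the graph has no triangles.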

How to perform graph analysis with GraphScope's NetworkX API

To use the NetworkX API in GraphScope, simply replace import networkx as nx with import graphscope.nx as nx.

As in the previous tutorial, we first create a graph with nx.Graph().

import graphscope
graphscope.set_option(show_log=True)
import graphscope.nx as nx

# Initialize an empty graph
G = nx.Graph()

# Add one vertex by `add_node` interface
G.add_node(1)

# Or add a batch of vertices from iterable list
G.add_nodes_from([2, 3])

# Also you can add attributes while adding vertices
G.add_nodes_from([(4, {"color": "red"}), (5, {"color": "green"})])

# Similarly, add one edge by `add_edge` interface
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)

# Or add a batch of edges from iterable list
G.add_edges_from([(1, 2), (1, 3)])

# Add attributes while adding edges
G.add_edges_from([(1, 2), (2, 3, {'weight': 3.1415})])

Graph analysis

The interfaces of GraphScope's graph analysis module are also compatible with NetworkX.

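On the graph created above, we use connected_components to find its connected components, clustering to compute the clustering coefficient of each vertex, and all_pairs_shortest_path to compute a shortest path between every pair of vertices.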

# Run connected_components
list(nx.connected_components(G))

# Run clustering
nx.clustering(G)

# Run all_pairs_shortest_path
sp = dict(nx.all_pairs_shortest_path(G))
sp[3]
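As a rough illustration of what all_pairs_shortest_path returns, the sketch below runs one breadth-first search per source vertex and records one shortest path to every reachable vertex, producing the same dict-of-paths shape that sp[3] indexes into. It treats the graph as unweighted, so the 'weight' attribute added earlier plays no role. This is a hypothetical pure-Python sketch, not GraphScope's or NetworkX's implementation.

```python
from collections import deque

# Adjacency of the graph built above: vertices 1-5, edges (1, 2), (2, 3), (1, 3).
# Edge attributes such as 'weight' are ignored: paths count hops only.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set(), 5: set()}

def single_source_shortest_path(adj, source):
    """BFS from source; map each reachable vertex to one shortest path."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in paths:
                # The first time BFS reaches w is via a shortest path.
                paths[w] = paths[v] + [w]
                queue.append(w)
    return paths

def all_pairs_shortest_path(adj):
    """Yield (source, paths) pairs, one BFS per vertex."""
    for source in adj:
        yield source, single_source_shortest_path(adj, source)

sp = dict(all_pairs_shortest_path(adj))
# From vertex 3, vertices 1 and 2 are one hop away; 4 and 5 are unreachable.
print(sp[3])
```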

Graph visualization

As in NetworkX, you can draw a graph with the draw interface, which relies on Matplotlib for plotting.

First, install matplotlib:

pip3 install matplotlib

Then draw the graph as follows:

nx.draw(G, with_labels=True, font_weight='bold')

Performance speedup of GraphScope over NetworkX

Let's run the clustering algorithm on the Twitter dataset to see how much faster GraphScope is than NetworkX.

If the dataset is not already in your environment, download it:

wget https://raw.githubusercontent.com/GraphScope/gstest/master/twitter.e -P /tmp

Then load the dataset into both GraphScope and NetworkX.

import os
import graphscope.nx as gs_nx
import networkx as nx

# loading graph in NetworkX
g1 = nx.read_edgelist(
    os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=nx.Graph
)
type(g1)

# Loading graph in GraphScope
g2 = gs_nx.read_edgelist(
    os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=gs_nx.Graph
)
type(g2)
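Conceptually, read_edgelist with nodetype=int and data=False reads one edge per line, converts the two endpoint tokens to integers, and ignores any remaining columns and '#' comments. The hypothetical parser below sketches that behavior into an undirected neighbor-set dict, assuming whitespace-separated endpoints; it is not the library's code and skips its error handling and options.

```python
import io

def read_edgelist(lines, nodetype=int):
    """Hypothetical minimal parser: 'u v [ignored...]' per line, '#' starts a comment."""
    adj = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank or comment-only lines
        # Keep only the two endpoints; extra columns are ignored (like data=False).
        u, v = (nodetype(tok) for tok in line.split()[:2])
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

sample = io.StringIO("# toy edge list\n1 2\n1 3\n\n2 3\n")
adj = read_edgelist(sample)
print(sorted(adj))  # [1, 2, 3]
```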

Run the algorithm in GraphScope and in NetworkX and display the elapsed time of each.

# GraphScope (%time is the IPython line magic that times a single statement)
%time ret_gs = gs_nx.clustering(g2)
# NetworkX
%time ret_nx = nx.clustering(g1)
# Result comparison
ret_gs == ret_nx
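Outside a notebook, the same comparison can be sketched with the standard library: time.perf_counter for wall-clock timing, and a key-by-key math.isclose check instead of ==, since floating-point coefficients computed by two different engines may differ in the last bits. The helpers timed and results_close are hypothetical names for this sketch, and the comparison assumes both clustering results are dicts mapping each vertex to its coefficient.

```python
import math
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def results_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two {vertex: float} dicts key-by-key with a float tolerance."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in a)

total, seconds = timed(sum, range(1_000_000))
print(f"sum={total} in {seconds:.4f}s")
# Exact equality fails on rounding noise; the tolerant check does not.
print(0.1 + 0.2 == 0.3, results_close({1: 0.1 + 0.2}, {1: 0.3}))  # False True
```

With the graphs loaded above, usage would look like ret_gs, t_gs = timed(gs_nx.clustering, g2) and ret_nx, t_nx = timed(nx.clustering, g1), followed by results_close(dict(ret_gs), dict(ret_nx)).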