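Community Events and LifeCycle

Community events describe how the community structure of a network changes over time. That structure can change because nodes arrive or leave, communities are created or dissolved, or communities merge or split.

The cdlib library provides a set of tools for analyzing how communities evolve over time, including the detection of community events and the analysis of community life cycles.

The interface is designed to be as simple as possible, so that users can analyze the evolution of communities in their networks with little effort.

Check the LifeCycle class for more details: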

LifeCycle Object

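Clustering with Explicit LifeCycle

Some dynamic community detection algorithms (e.g., Temporal Trade-Off ones) provide an explicit representation of the life cycle of communities. In this case there is no need to detect community events as a post-processing step, because the life cycle of each community is already available.

To analyze such pre-computed events, apply the following snippet: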

from cdlib import LifeCycle
from cdlib import algorithms
import dynetx as dn
import networkx as nx

dg = dn.DynGraph()
for x in range(10):
    g = nx.erdos_renyi_graph(200, 0.05)
    dg.add_interactions_from(list(g.edges()), t=x)

coms = algorithms.tiles(dg, 2)

lc = LifeCycle(coms)
lc.compute_events_from_explicit_matching()

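Clustering without Explicit LifeCycle

When a dynamic community detection algorithm does not provide an explicit representation of the life cycle of communities, the library offers tools to detect community events and analyze the life cycle afterwards. In particular, events can be identified with four different strategies:

Facets events definition [Failla24]
Greene events definition [Greene2010]
Asur events definition [Asur2009]
Custom events definition

The first three strategies rely on community event definitions proposed in the literature, while the last one lets users define their own events.

To apply one of the first three strategies, use the following snippet: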

from cdlib import LifeCycle, TemporalClustering, algorithms
from networkx.generators.community import LFR_benchmark_graph

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"

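Note

Each strategy has its own parameters, which can be specified by passing a dictionary to the compute_events method. In particular, the facets strategy requires the min_branch_size parameter (default 1), while greene and asur require the threshold parameter (default 0.1).

To define custom events, use the following snippet: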

from cdlib import LifeCycle, TemporalClustering, algorithms
from networkx.generators.community import LFR_benchmark_graph

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
jaccard = lambda x, y: len(set(x) & set(y)) / len(set(x) | set(y))
events.compute_events_with_custom_matching(jaccard, threshold=0.3, two_sided=True)

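In the snippet above, the jaccard function defines the similarity between two communities, and the threshold parameter sets the minimum similarity required to consider one community an evolution of the other. By changing the similarity function and the threshold, users can define their own matching strategy.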
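
To make the role of the similarity function and the threshold concrete, here is a minimal, library-independent sketch of threshold-based matching between two consecutive snapshots. The helper name match_snapshots, the dictionary layout of the snapshots, and the use of ">=" for the comparison are assumptions made for illustration; the sketch does not reproduce cdlib's internal matching and does not model the two_sided option.

```python
def jaccard(x, y):
    """Jaccard similarity between two collections of nodes."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match_snapshots(prev, curr, similarity, threshold):
    """Hypothetical helper: pair communities whose similarity reaches the threshold.

    prev and curr map community identifiers to lists of nodes.
    Returns a list of (previous_id, current_id, score) tuples.
    """
    matches = []
    for p_id, p_nodes in prev.items():
        for c_id, c_nodes in curr.items():
            score = similarity(p_nodes, c_nodes)
            if score >= threshold:  # assumption: the threshold is inclusive
                matches.append((p_id, c_id, score))
    return matches


# Two toy snapshots: node 4 moves from the first community to the second one.
prev = {"0_0": [1, 2, 3, 4], "0_1": [5, 6, 7]}
curr = {"1_0": [1, 2, 3], "1_1": [4, 5, 6, 7]}

matches = match_snapshots(prev, curr, jaccard, threshold=0.3)
for p_id, c_id, score in matches:
    print(f"{p_id} -> {c_id} (similarity {score:.2f})")
```

Raising the threshold makes the matching stricter: with these toy snapshots, any value above 0.75 would leave no matched pair.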

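Analyzing Events and Flows

Once community events have been detected, the library provides a set of tools to analyze them. Each event is characterized by a set of properties, such as its type, the communities involved, the nodes involved, and the time of occurrence.

Note

The library assigns each community a unique identifier of the form t_c, where t is the time of occurrence and c is the community identifier. For example, the community with identifier 2_3 is the community with identifier 3 at time 2.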
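
The t_c naming scheme can be handled with two small helpers. This is only a sketch: the function names are hypothetical, and it assumes that both the time and the community label are integers, which the text above does not guarantee for every algorithm.

```python
def split_community_id(cid):
    """Split an identifier such as '2_3' into (time, community)."""
    t, c = cid.split("_")
    return int(t), int(c)  # assumption: both parts are integers


def make_community_id(t, c):
    """Build the identifier of community c observed at time t."""
    return f"{t}_{c}"


t, c = split_community_id("2_3")
print(f"community {c} observed at time {t}")
```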

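Each tracking strategy defines a different set of events (e.g., creation, dissolution, merge, split). However, cdlib generalizes the notion of event by decomposing it into four components. For each generic temporal community t_c, it provides access to:

In flows: the set of nodes entering community t_c from clusters at time t-1;
Out flows: the set of nodes that will leave community t_c at time t+1;
From events: the set of events that generate the community observed at time t and involve clusters at time t-1;
To events: the set of events that community t_c starts at time t and that will affect clusters at time t+1.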
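
The in-flow and out-flow components listed above can be illustrated with plain set operations. The sketch below is independent of cdlib: the nested dictionary layout (time, then community label, then nodes) and the function names are assumptions chosen for the example, and the returned structure is not meant to mirror what the library returns.

```python
def _flow(snapshots, t, c, step):
    """Nodes shared between community c at time t and each community at time t + step."""
    target = set(snapshots[t][c])
    flows = {}
    for other_c, nodes in snapshots.get(t + step, {}).items():
        shared = target & set(nodes)
        if shared:
            flows[f"{t + step}_{other_c}"] = shared
    return flows


def in_flow(snapshots, t, c):
    """Nodes of community t_c grouped by the community they belonged to at t-1."""
    return _flow(snapshots, t, c, step=-1)


def out_flow(snapshots, t, c):
    """Nodes of community t_c grouped by the community they belong to at t+1."""
    return _flow(snapshots, t, c, step=+1)


# Toy temporal clustering: {time: {community label: nodes}}
snapshots = {
    0: {0: [1, 2, 3, 4], 1: [5, 6, 7]},
    1: {0: [1, 2, 3], 1: [4, 5, 6, 7]},
    2: {0: [1, 2], 1: [3, 4, 5], 2: [6, 7]},
}

print(in_flow(snapshots, 1, 1))   # where the members of 1_1 came from
print(out_flow(snapshots, 1, 1))  # where the members of 1_1 go next
```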

All this information can be summarized in a directed graph of temporal dependencies among communities, called the polytree.
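
As a rough picture of such a dependency graph, the sketch below links each community to the communities of the next time step with which it shares nodes. It is an assumption-laden illustration: the edge rule (at least one shared node), the edge weight (number of shared nodes), and the function name are hypothetical and may differ from how cdlib builds its polytree.

```python
def temporal_dependency_edges(snapshots):
    """Return (source_id, target_id, shared_nodes) edges between consecutive time steps."""
    edges = []
    for t in sorted(snapshots):
        following = snapshots.get(t + 1)
        if following is None:
            continue  # last snapshot: nothing to link to
        for c, nodes in snapshots[t].items():
            for next_c, next_nodes in following.items():
                shared = len(set(nodes) & set(next_nodes))
                if shared:  # assumption: one shared node is enough to create an edge
                    edges.append((f"{t}_{c}", f"{t + 1}_{next_c}", shared))
    return edges


# Toy temporal clustering: {time: {community label: nodes}}
snapshots = {
    0: {0: [1, 2, 3, 4], 1: [5, 6, 7]},
    1: {0: [1, 2, 3], 1: [4, 5, 6, 7]},
}

edges = temporal_dependency_edges(snapshots)
for source, target, weight in edges:
    print(f"{source} -> {target} ({weight} shared nodes)")
```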

Here is an example of how to analyze community events and flows:

from cdlib import LifeCycle, TemporalClustering, algorithms
from networkx.generators.community import LFR_benchmark_graph

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
event_types = events.get_event_types() # provide the list of available events for the detected method (in this case for 'facets')

ev = events.get_event("1_2") # to compute events for all communities use the get_events() method
print(ev.out_flow)  # to get the out flow of the community 1_2
print(ev.in_flow)  # to get the in flow of the community 1_2
print(ev.from_event)  # to get the from events of the community 1_2
print(ev.to_event)  # to get the to events of the community 1_2

out_flow = events.analyze_flow("1_2", "+") # if the community id is not specified all the communities are considered
in_flow = events.analyze_flow("1_2", "-")

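Each event is characterized by its degree of importance for the actual status of the community. In particular, facets events are fuzzy events (more than one can occur at the same time), while greene and asur events are crisp events (only one can occur at a time).

Note

Following the facets terminology, analyze_flow and analyze_flows return a dictionary that describes the flow in terms of its Unicity, Identity, and Outflow. For a detailed description of these measures, refer to [Failla24].

Furthermore, if the temporal network comes with attributes associated with the nodes (either dynamically changing or static, e.g., political leaning), the library provides a set of tools to analyze the typicality of the events.

Setting and retrieving node attributes is straightforward: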

import random  # needed by random_leaning() below

from cdlib import LifeCycle, TemporalClustering, algorithms
from networkx.generators.community import LFR_benchmark_graph

def random_leaning():
    attrs = {}
    for i in range(250): # 250 nodes
        attrs[i] = {}
        for t in range(10): # 10 time steps
            attrs[i][t] = random.choice(["left", "right"])
    return attrs

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
events.set_attribute(random_leaning(), "political_leaning")
attrs = events.get_attribute("political_leaning")

events.analyze_flow("1_1", "+",  attr="political_leaning") # to analyze the flow of political leaning in the community 1_1

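Attributes are stored as a dictionary of dictionaries, where the first key is the node id and the second key is the time step.

If such information is available, the analyze_flow method also includes an evaluation of the entropy of the flow attributes in its analysis.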
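
To give an intuition of what an attribute entropy measures, the sketch below computes the Shannon entropy of an attribute over a set of nodes at a given time step, reading values from the node-then-time dictionary layout described above. The function name, the base-2 logarithm, and the lack of normalization are assumptions; the exact definition used by the library should be checked in [Failla24].

```python
import math
from collections import Counter


def attribute_entropy(nodes, attrs, t):
    """Shannon entropy (in bits) of the attribute values held by nodes at time t."""
    values = [attrs[n][t] for n in nodes if n in attrs and t in attrs[n]]
    if not values:
        return 0.0
    total = len(values)
    # sum of p * log2(1 / p) over the observed attribute values
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(values).values()
    )


# Attributes keyed by node id, then by time step
attrs = {
    1: {0: "left"},
    2: {0: "right"},
    3: {0: "left"},
    4: {0: "right"},
}

mixed = attribute_entropy([1, 2, 3, 4], attrs, t=0)  # evenly split flow
uniform = attribute_entropy([1, 3], attrs, t=0)      # homogeneous flow
print(mixed, uniform)
```

A homogeneous flow yields zero entropy, while a flow evenly split between two values yields one bit.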

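Visualizing Events and Flows

The library provides a set of tools to visualize the events and flows detected in the community structure of a network.

Note

The library uses networkx to represent the community structure of the network and matplotlib / plotly to visualize it.

Here is an example of how to visualize community events, flows, and the polytree: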

from cdlib import LifeCycle, TemporalClustering, algorithms
from cdlib.viz import (
    plot_flow,
    plot_event_radar,
    plot_event_radars,
    typicality_distribution,
    )
from networkx.generators.community import LFR_benchmark_graph

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"

fig = plot_flow(events)
fig.show()

fig = plot_event_radar(events, "1_2", direction="+") # only out events
fig.show()

fig = plot_event_radars(events, "1_2") # both in and out events
fig.show()

fig = typicality_distribution(events, "+")
fig.show()

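import matplotlib.pyplot as plt
import networkx as nx

dg = events.polytree()
nx.draw_networkx(dg, with_labels=True)  # draws on the current matplotlib axes and returns None
plt.show()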

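For a detailed description of the available methods and parameters, see the Visual Analytics section of the cdlib reference guide.

Validating Flows

The library provides a set of tools to statistically validate the observed flows against a null model.

Here is an example of how to validate the observed flows: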

from cdlib import LifeCycle, TemporalClustering, algorithms
from cdlib.lifecycles.validation import validate_flow, validate_all_flows
from networkx.generators.community import LFR_benchmark_graph

tc = TemporalClustering()
for t in range(0, 10):
    g = LFR_benchmark_graph(
            n=250,
            tau1=3,
            tau2=1.5,
            mu=0.1,
            average_degree=5,
            min_community=20,
            seed=10,
    )
    coms = algorithms.louvain(g)  # here any CDlib algorithm can be applied
    tc.add_clustering(coms, t)

events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"

cf = events.flow_null("1_2", "+", iterations=1000)  # validate the out flow of community 1_2. Iterations define the number of randomizations to perform.
vf = events.all_flows_null("+", iterations=1000) # validate all out flows

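Both validation methods return a dictionary keyed by set identifier, whose values are the mean, standard deviation, and p-value obtained by comparing the observed flow against the null model.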
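
The general idea of validating an observed value against a null model can be sketched without the library. The example below compares an observed flow size with the sizes obtained when the same number of nodes is drawn at random, and reports a mean, a standard deviation, and an empirical p-value. Everything here is an assumption made for illustration: the function name, the dictionary keys, the random-draw null model, and the one-sided p-value with add-one smoothing are not taken from cdlib.

```python
import random
import statistics


def validate_against_null(observed, null_sampler, iterations=1000, seed=0):
    """Compare an observed statistic with samples drawn from a null model."""
    rng = random.Random(seed)  # seeded for reproducibility
    samples = [null_sampler(rng) for _ in range(iterations)]
    # One-sided empirical p-value with add-one smoothing, so it is never exactly 0
    at_least_as_large = sum(s >= observed for s in samples)
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "p-value": (at_least_as_large + 1) / (iterations + 1),
    }


population = list(range(100))  # all nodes in the network
target = set(range(10))        # members of the destination community
observed = 8                   # 8 of the 10 leaving nodes reached the target


def random_flow_size(rng):
    """Null model: 10 nodes picked at random; count how many fall in the target."""
    return len(target & set(rng.sample(population, 10)))


result = validate_against_null(observed, random_flow_size, iterations=1000)
print(result)
```

A small p-value suggests that the observed flow is larger than what random node movements of the same size would produce.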

`flow_null`(lc, target, direction[, ...])	Compare the flow with a null model.
`all_flows_null`(lc, direction[, ...])	Compare all flows with a null model.

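[Failla24]

Andrea Failla, Rémy Cazabet, Giulio Rossetti, Salvatore Citraro. "Redefining Event Types and Group Evolution in Temporal Data." arXiv preprint arXiv:2403.06771. 2024.

[Asur2009]

Sitaram Asur, Srinivasan Parthasarathy, Duygu Ucar. "An event-based framework for characterizing the evolutionary behavior of interaction graphs." ACM Transactions on Knowledge Discovery from Data (TKDD) 3.4 (2009): 1-36.

[Greene2010]

Derek Greene, Donal Doyle, Padraig Cunningham. "Tracking the evolution of communities in dynamic social networks." 2010 International Conference on Advances in Social Networks Analysis and Mining. IEEE, 2010.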