Adjacency Spectral Embedding

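This demo shows how to use the Adjacency Spectral Embed (ASE) class. We then use ASE to show how k-means in the embedded space can recover the two communities of a graph sampled from a stochastic block model (SBM).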

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans

from graspologic.embed import AdjacencySpectralEmbed
from graspologic.simulations import sbm
from graspologic.plot import heatmap, pairplot

import warnings
warnings.filterwarnings('ignore')
np.random.seed(8889)
%matplotlib inline

Data Generation

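ASE is a method for estimating the latent positions of a network modeled as a random dot product graph (RDPG). The embedding is both a form of dimensionality reduction for a graph and a way of fitting a generative model to graph data. We first generate two 2-block SBMs: one directed and one undirected.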
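To make the RDPG connection concrete: a 2-block SBM is an RDPG in which every vertex of a block shares the same latent position. The plain-NumPy sketch below factors the block matrix `[[0.8, 0.2], [0.2, 0.8]]` used in the next cell as `Z @ Z.T` and samples an undirected and a directed graph from the resulting edge probabilities. It is an illustration only, not graspologic's `sbm`; the `default_rng` generator, its seed, and the no-self-loop convention are our own assumptions.

```python
import numpy as np

# Block connection probabilities, as in the tutorial.
B = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# B is positive semidefinite, so it factors as B = Z Z^T.
# Row i of Z is the latent position shared by every vertex in block i.
eigvals, eigvecs = np.linalg.eigh(B)      # ascending: 0.6, 1.0
Z = eigvecs * np.sqrt(eigvals)            # scale each eigenvector column

n_verts = 100
block_of = np.repeat([0, 1], n_verts)     # true block of each vertex
X = Z[block_of]                           # (200, 2) latent positions
P = X @ X.T                               # RDPG edge probabilities x_i . x_j

rng = np.random.default_rng(8889)
# Undirected: sample the strict upper triangle, then mirror it (no self-loops).
upper = np.triu(rng.random(P.shape) < P, k=1)
A_undirected = (upper | upper.T).astype(int)
# Directed: every off-diagonal entry is an independent Bernoulli draw.
A_directed = (rng.random(P.shape) < P).astype(int)
np.fill_diagonal(A_directed, 0)
```

Because every vertex in a block has the same latent position, `P` reproduces the block structure exactly; ASE's job is to estimate `X` (up to an orthogonal transformation) from a single sampled adjacency matrix.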

[2]:
# Define parameters
n_verts = 100
labels_sbm = n_verts * [0] + n_verts * [1]
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# Generate SBMs from parameters
undirected_sbm = sbm(2 * [n_verts], P)
directed_sbm = sbm(2 * [n_verts], P, directed=True)

# Plot both SBMs
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
heatmap(undirected_sbm, title='2-block SBM (undirected)', inner_hier_labels=labels_sbm, ax=axes[0])
heatmap(directed_sbm, title='2-block SBM (directed)', inner_hier_labels=labels_sbm, ax=axes[1]);
[Figure: adjacency heatmaps of the 2-block SBM, undirected (left) and directed (right)]

Embedding: Undirected Case

We now use the AdjacencySpectralEmbed class to embed the adjacency matrix into a lower-dimensional space.
If no parameters are given to the AdjacencySpectralEmbed class, it automatically chooses the number of embedding dimensions.

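For a symmetric adjacency matrix, the embedding keeps the `d` eigenvalues of largest magnitude and sets `X_hat = U_d |Lambda_d|^{1/2}`. As far as we understand the graspologic API reference, the automatic choice of `d` is an elbow search on the singular values following Zhu and Ghodsi's profile-likelihood method, and by default it looks for more than one elbow (`n_elbows`), so the library may keep a different number of dimensions than the single-elbow sketch below. The helpers `undirected_ase` and `profile_likelihood_elbow` are hypothetical, written in plain NumPy for illustration, and are not graspologic's code.

```python
import numpy as np

def undirected_ase(A, d):
    """Hypothetical helper: X_hat = U_d |Lambda_d|^{1/2} for a symmetric A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:d]   # d largest-magnitude eigenvalues
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

def profile_likelihood_elbow(values):
    """Hypothetical helper: a single elbow in the spirit of Zhu & Ghodsi (2006).

    Splits decreasing values into a "signal" group values[:q] and a "noise"
    group values[q:], each Gaussian with its own mean and a shared variance,
    and returns the q with the highest profile log-likelihood.
    """
    values = np.asarray(values, dtype=float)
    p = values.size
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        g1, g2 = values[:q], values[q:]
        ss = np.sum((g1 - g1.mean()) ** 2) + np.sum((g2 - g2.mean()) ** 2)
        var = max(ss / p, 1e-12)                  # MLE of the shared variance
        ll = -0.5 * p * np.log(2 * np.pi * var) - 0.5 * ss / var
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Usage on an undirected 2-block SBM sampled with NumPy.
rng = np.random.default_rng(0)
blocks = np.repeat([0, 1], 100)
B = np.array([[0.8, 0.2], [0.2, 0.8]])
P = B[np.ix_(blocks, blocks)]                     # (200, 200) edge probabilities
upper = np.triu(rng.random(P.shape) < P, k=1)
A = (upper | upper.T).astype(float)

# For a symmetric matrix, the singular values are the absolute eigenvalues.
singular_values = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
d_hat = profile_likelihood_elbow(singular_values)
X_hat = undirected_ase(A, d_hat)
print(d_hat, X_hat.shape)
```

With a shared variance estimated by maximum likelihood, maximizing this profile likelihood is the same as choosing the split with the smallest pooled within-group sum of squares, which is why the method tends to land on the gap between the few large "signal" values and the noise bulk.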
[3]:
# instantiate an ASE object
ase = AdjacencySpectralEmbed()

# call its fit_transform method to generate latent positions
Xhat = ase.fit_transform(undirected_sbm)
_ = pairplot(Xhat, title='SBM adjacency spectral embedding')
[Figure: pairplot titled "SBM adjacency spectral embedding"]

Embedding: Directed Case

If the graph is directed, we get two outputs, roughly corresponding to the "out" and "in" latent positions, since these are no longer the same.
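A directed adjacency matrix is not symmetric, so a singular value decomposition takes the place of the eigendecomposition: with `A ≈ U_d S_d V_d^T`, the "out" positions are `U_d S_d^{1/2}` and the "in" positions are `V_d S_d^{1/2}`. The plain-NumPy sketch below illustrates this with a hypothetical helper `directed_ase`; it assumes `A[i, j] = 1` encodes an edge from node `i` to node `j` and omits any preprocessing graspologic may apply.

```python
import numpy as np

def directed_ase(A, d):
    """Hypothetical helper: rank-d "out" and "in" positions via the SVD.

    Assumes A[i, j] = 1 encodes an edge i -> j, so A is approximated by
    X_out @ Y_in.T.
    """
    U, s, Vt = np.linalg.svd(A)       # singular values in decreasing order
    root_s = np.sqrt(s[:d])
    X_out = U[:, :d] * root_s         # row i: how node i sends edges
    Y_in = Vt[:d].T * root_s          # row j: how node j receives edges
    return X_out, Y_in

# Usage on a directed 2-block SBM sampled with NumPy (not graspologic's sbm).
rng = np.random.default_rng(0)
blocks = np.repeat([0, 1], 100)
B = np.array([[0.8, 0.2], [0.2, 0.8]])
P = B[np.ix_(blocks, blocks)]         # (200, 200) edge probabilities
A = (rng.random(P.shape) < P).astype(float)
np.fill_diagonal(A, 0)                # no self-loops

X_out, Y_in = directed_ase(A, d=2)
rank2 = X_out @ Y_in.T                # best rank-2 approximation of A (Frobenius norm)
print(X_out.shape, Y_in.shape)        # (200, 2) (200, 2)
```

Splitting `S_d` evenly between the two factors keeps the out and in embeddings on the same scale, so the edge probability from `i` to `j` is estimated by the dot product of row `i` of `X_out` with row `j` of `Y_in`.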

[4]:
# Transform in directed case
ase = AdjacencySpectralEmbed()
Xhat, Yhat = ase.fit_transform(directed_sbm)

# Plot both embeddings
pairplot(Xhat, title='SBM adjacency spectral embedding "out"')
_ = pairplot(Yhat, title='SBM adjacency spectral embedding "in"')
[Figures: pairplots of the "out" and "in" embeddings of the directed SBM]

Dimension Specification

The embedding parameters can also be set explicitly.
Here, we set the number of embedding dimensions and change the SVD solver used to compute the embedding.

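For context on what the `algorithm` argument switches between: as far as we understand the graspologic API reference, its default is `'randomized'`, a randomized SVD, while `'truncated'` requests a truncated SVD instead. The sketch below is a plain-NumPy randomized SVD in the style of Halko, Martinsson and Tropp (hypothetical helper `randomized_svd`; the oversampling and power-iteration counts are our own choices, not the library's), checked against the exact leading singular values.

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=5, seed=0):
    """Hypothetical helper: rank-k SVD via a random range finder."""
    rng = np.random.default_rng(seed)
    # Multiply A by a Gaussian test matrix to sample its dominant column space.
    Y = A @ rng.standard_normal((A.shape[1], k + n_oversample))
    for _ in range(n_iter):
        # Re-orthonormalize, then apply A A^T to amplify the top directions.
        Q = np.linalg.qr(Y)[0]
        Y = A @ (A.T @ Q)
    Q = np.linalg.qr(Y)[0]                # orthonormal basis for the sampled range
    # Exact SVD of the small (k + n_oversample) x n matrix Q^T A.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Usage: compare with the exact top-2 singular values of a 2-block SBM.
rng = np.random.default_rng(1)
blocks = np.repeat([0, 1], 100)
B = np.array([[0.8, 0.2], [0.2, 0.8]])
P = B[np.ix_(blocks, blocks)]
upper = np.triu(rng.random(P.shape) < P, k=1)
A = (upper | upper.T).astype(float)

U2, s2, Vt2 = randomized_svd(A, k=2)
exact = np.linalg.svd(A, compute_uv=False)[:2]
print(s2, exact)
```

Randomized methods only touch `A` through a few matrix products, which is what makes them attractive for large graphs; the power iterations matter most when the leading singular values are not well separated from the rest.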
[5]:
# fit and transform
ase = AdjacencySpectralEmbed(n_components=2, algorithm='truncated')
Xhat = ase.fit_transform(undirected_sbm)

# plot
_ = pairplot(Xhat, title='2-component embedding', height=4)
[Figure: pairplot titled "2-component embedding"]

Clustering in the Embedded Space

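Now we use the Euclidean representation of the graph to apply a standard clustering algorithm such as k-means. We start with an SBM in which the two blocks have exactly the same connection probabilities, which effectively gives an Erdős–Rényi (ER) graph; in this case k-means cannot distinguish the two embedded blocks. As the within-block connection probability (the insularity) grows relative to the between-block probability, the blocks separate and the clustering improves. For each graph, we plot its adjacency matrix, the k-means labels predicted in the embedding space, and the errors with respect to the true labels. The adjusted Rand index (ARI) measures clustering accuracy: 1 indicates a perfect clustering relative to the ground truth, and values near 0 indicate chance-level agreement. The error rate is simply the proportion of incorrectly labeled nodes.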
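Both metrics can be written out directly. k-means may return the two communities with their labels swapped, so the error rate has to be minimized over relabelings, while the ARI is invariant to relabeling by construction. The plain-NumPy sketch below uses hypothetical helpers `error_rate` and `adjusted_rand_index`; the tutorial itself computes the ARI with scikit-learn's `adjusted_rand_score`.

```python
import numpy as np
from itertools import permutations

def error_rate(true, pred):
    """Hypothetical helper: share of mislabeled nodes under the best relabeling.

    Assumes both label arrays use integers 0..k-1. Tries every permutation
    of the predicted labels, so it is only practical for small k.
    """
    true, pred = np.asarray(true), np.asarray(pred)
    k = int(max(true.max(), pred.max())) + 1
    return min(np.mean(np.array(perm)[pred] != true)
               for perm in permutations(range(k)))

def adjusted_rand_index(true, pred):
    """Hypothetical helper: ARI from the contingency table (Hubert & Arabie)."""
    _, t = np.unique(true, return_inverse=True)
    _, p = np.unique(pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)                 # counts of (true, pred) pairs

    def pairs(x):
        return x * (x - 1) / 2                  # "x choose 2", elementwise

    index = pairs(table).sum()
    rows = pairs(table.sum(axis=1)).sum()
    cols = pairs(table.sum(axis=0)).sum()
    expected = rows * cols / pairs(t.size)
    max_index = (rows + cols) / 2
    if max_index == expected:                   # degenerate case, e.g. one cluster each
        return 1.0
    return (index - expected) / (max_index - expected)

# Usage: swapped labels are a perfect clustering; ten flipped nodes are not.
truth = np.repeat([0, 1], 100)
swapped = 1 - truth
noisy = truth.copy()
noisy[:10] = 1                                  # ten nodes of block 0 mislabeled
print(error_rate(truth, swapped), adjusted_rand_index(truth, swapped))
print(error_rate(truth, noisy), adjusted_rand_index(truth, noisy))
```

The brute-force search over permutations is fine for the two-block case here; for many clusters, the Hungarian algorithm on the contingency table is the usual replacement.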

[6]:
palette = {'Right':(0,0.7,0.2),
           'Wrong':(0.8,0.1,0.1)}

for insularity in np.linspace(0.5, 0.625, 4):
    P = np.array([[insularity, 1-insularity], [1-insularity, insularity]])
    sampled_sbm = sbm(2 * [n_verts], P)
    Xhat = AdjacencySpectralEmbed(n_components=2).fit_transform(sampled_sbm)
    labels_kmeans = KMeans(n_clusters=2).fit_predict(Xhat)
    ari = adjusted_rand_score(labels_sbm, labels_kmeans)
    error = labels_sbm - labels_kmeans
    error = error != 0
    # sometimes the labels given by kmeans will be the inverse of ours
    if np.sum(error) / (2 * n_verts) > 0.5:
        error = error == 0
    error_rate = np.sum(error) / (2 * n_verts)
    error_label = (2 * n_verts) * ['Right']
    error_label = np.array(error_label)
    error_label[error] = 'Wrong'

    heatmap(sampled_sbm, title=f'Insularity: {str(insularity)[:5]}',
            inner_hier_labels=labels_sbm)
    pairplot(Xhat,
             labels=labels_kmeans,
             title=f'KMeans on embedding, ARI: {str(ari)[:5]}',
             legend_name='Predicted label',
             height=3.5,
             palette='muted',)
    pairplot(Xhat,
             labels=error_label,
             title=f'Error from KMeans, Error rate: {str(error_rate)}',
             legend_name='Error label',
             height=3.5,
             palette=palette,)
[Figures: for each of the four insularity values, the adjacency heatmap, the k-means labels on the embedding, and the k-means errors]