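Adjacency Spectral Embedding
This demo shows how to use the Adjacency Spectral Embed (ASE) class. We then use ASE to show how k-means, applied in the embedded space, can recover the two communities of a graph sampled from a stochastic block model (SBM).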
[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans
from graspologic.embed import AdjacencySpectralEmbed
from graspologic.simulations import sbm
from graspologic.plot import heatmap, pairplot
import warnings
warnings.filterwarnings('ignore')
np.random.seed(8889)
%matplotlib inline
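Data generation
ASE is a method for estimating the latent positions of a network modeled as a random dot product graph (RDPG). The embedding is both a form of dimensionality reduction for a graph and a way of fitting a generative model to graph data. We first generate two 2-block SBMs: one directed and one undirected.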
[2]:
# Define parameters
n_verts = 100
labels_sbm = n_verts * [0] + n_verts * [1]
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])
# Generate SBMs from parameters
undirected_sbm = sbm(2 * [n_verts], P)
directed_sbm = sbm(2 * [n_verts], P, directed=True)
# Plot both SBMs
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
heatmap(undirected_sbm, title='2-block SBM (undirected)', inner_hier_labels=labels_sbm, ax=axes[0])
heatmap(directed_sbm, title='2-block SBM (directed)', inner_hier_labels=labels_sbm, ax=axes[1]);
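To make the generative model concrete, here is a minimal NumPy sketch of how an undirected, loopless 2-block SBM could be sampled by hand. The helper `sample_sbm` and the `*_demo` names are hypothetical and exist only for illustration; graspologic's `sbm` function may differ in its details and options.

```python
import numpy as np

def sample_sbm(block_sizes, P, rng):
    """Sample an undirected, loopless SBM adjacency matrix (illustrative helper)."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability for every pair of vertices, looked up by block membership
    probs = P[labels][:, labels]
    n = labels.size
    # Draw the strict upper triangle only, then mirror it to get a symmetric matrix
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

rng = np.random.default_rng(0)
P_demo = np.array([[0.8, 0.2],
                   [0.2, 0.8]])
A_demo, labels_demo = sample_sbm([100, 100], P_demo, rng)

# Empirical edge densities should be close to the entries of P_demo
within = A_demo[:100, :100][np.triu_indices(100, k=1)].mean()
between = A_demo[:100, 100:].mean()
print(f"within-block density: {within:.2f}, between-block density: {between:.2f}")
```

Each potential edge is an independent Bernoulli draw whose probability depends only on the blocks of its two endpoints, which is what produces the block pattern visible in the heatmaps.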
Embedding: undirected case
We now use the AdjacencySpectralEmbed class to embed the adjacency matrix into a lower-dimensional space.
If no parameters are given to the AdjacencySpectralEmbed class, it automatically chooses the number of dimensions to embed into.
[3]:
# instantiate an ASE object
ase = AdjacencySpectralEmbed()
# call its fit_transform method to generate latent positions
Xhat = ase.fit_transform(undirected_sbm)
_ = pairplot(Xhat, title='SBM adjacency spectral embedding')
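As a rough picture of what the embedding computes, the sketch below takes the top singular vectors of a matrix and scales them by the square roots of the singular values. This is only an assumption-laden illustration: `ase_sketch` is a hypothetical helper, it is applied to the expected (noise-free) adjacency matrix of a tiny SBM, and graspologic's implementation may differ (for example in solver choice, preprocessing, and automatic dimension selection).

```python
import numpy as np

def ase_sketch(A, n_components):
    """Top-d SVD embedding, X = U_d * sqrt(S_d) (illustrative helper)."""
    U, S, Vt = np.linalg.svd(A)
    U_d, S_d = U[:, :n_components], S[:n_components]
    # Scale each retained singular vector by the square root of its singular value
    return U_d * np.sqrt(S_d)

# Expected adjacency matrix of a 2-block SBM with 3 vertices per block
block_labels = np.repeat([0, 1], 3)
P_small = np.array([[0.8, 0.2],
                    [0.2, 0.8]])
expected_A = P_small[block_labels][:, block_labels]

X_small = ase_sketch(expected_A, n_components=2)

# expected_A is rank 2 and positive semidefinite, so X X^T reproduces it
print(np.allclose(X_small @ X_small.T, expected_A))  # True
```

In this noise-free setting, vertices in the same block receive identical latent positions. With a sampled graph the positions scatter around those two points, which is why two clouds appear in the pairplot.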
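Embedding: directed case
If the graph is directed, we get two outputs, roughly corresponding to the "out" and "in" latent positions, since these are no longer the same.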
[4]:
# Transform in directed case
ase = AdjacencySpectralEmbed()
Xhat, Yhat = ase.fit_transform(directed_sbm)
# Plot both embeddings
pairplot(Xhat, title='SBM adjacency spectral embedding "out"')
_ = pairplot(Yhat, title='SBM adjacency spectral embedding "in"')
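One way to see why a directed graph yields two sets of positions is that the left and right singular vectors of an asymmetric matrix differ. The sketch below is a hypothetical illustration on a noise-free, asymmetric block matrix; `directed_ase_sketch` and `P_asym` are made-up names and values, not part of graspologic.

```python
import numpy as np

def directed_ase_sketch(A, n_components):
    """Return (out, in) positions from the top-d SVD of A (illustrative helper)."""
    U, S, Vt = np.linalg.svd(A)
    scale = np.sqrt(S[:n_components])
    X_out = U[:, :n_components] * scale      # from left singular vectors
    Y_in = Vt[:n_components].T * scale       # from right singular vectors
    return X_out, Y_in

# Asymmetric block probabilities: block 0 -> 1 differs from block 1 -> 0
P_asym = np.array([[0.8, 0.1],
                   [0.4, 0.6]])
blocks = np.repeat([0, 1], 3)
expected_directed = P_asym[blocks][:, blocks]

X_out, Y_in = directed_ase_sketch(expected_directed, n_components=2)

# The rank-2 expected matrix is recovered as X_out @ Y_in.T
print(np.allclose(X_out @ Y_in.T, expected_directed))  # True
```

Here the probability of an edge from vertex i to vertex j is modeled by the dot product of the "out" position of i with the "in" position of j, so a single set of positions is no longer enough.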
Dimension specification
One can also specify the parameters for embedding.
Here, we specify the number of embedded dimensions and change the SVD solver used to compute the embedding.
[5]:
# fit and transform
ase = AdjacencySpectralEmbed(n_components=2, algorithm='truncated')
Xhat = ase.fit_transform(undirected_sbm)
# plot
pairplot(Xhat, title='2-component embedding', height=4)
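Clustering in the embedded space
Now we use the Euclidean representation of the graph to apply a standard clustering algorithm such as k-means. We start with an SBM in which both blocks have exactly the same connection probabilities (effectively giving us an ER model graph). In this case, k-means cannot distinguish the two embedded blocks. As the connections between the blocks become more distinct, the clustering improves. For each graph, we plot its adjacency matrix, the k-means cluster labels predicted in the embedded space, and the errors relative to the true labels. The adjusted Rand index (ARI) is a measure of clustering accuracy, where 1 indicates perfect clustering relative to the ground truth. The error rate is simply the proportion of incorrectly labeled nodes.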
[6]:
palette = {'Right':(0,0.7,0.2),
           'Wrong':(0.8,0.1,0.1)}
for insularity in np.linspace(0.5, 0.625, 4):
    P = np.array([[insularity, 1-insularity], [1-insularity, insularity]])
    sampled_sbm = sbm(2 * [n_verts], P)
    Xhat = AdjacencySpectralEmbed(n_components=2).fit_transform(sampled_sbm)
    labels_kmeans = KMeans(n_clusters=2).fit_predict(Xhat)
    ari = adjusted_rand_score(labels_sbm, labels_kmeans)
    error = labels_sbm - labels_kmeans
    error = error != 0
    # sometimes the labels given by kmeans will be the inverse of ours
    if np.sum(error) / (2 * n_verts) > 0.5:
        error = error == 0
    error_rate = np.sum(error) / (2 * n_verts)
    error_label = (2 * n_verts) * ['Right']
    error_label = np.array(error_label)
    error_label[error] = 'Wrong'

    heatmap(sampled_sbm, title=f'Insularity: {str(insularity)[:5]}',
            inner_hier_labels=labels_sbm)
    pairplot(Xhat,
             labels=labels_kmeans,
             title=f'KMeans on embedding, ARI: {str(ari)[:5]}',
             legend_name='Predicted label',
             height=3.5,
             palette='muted',)
    pairplot(Xhat,
             labels=error_label,
             title=f'Error from KMeans, Error rate: {str(error_rate)}',
             legend_name='Error label',
             height=3.5,
             palette=palette,)
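The label-flip check in the loop above exists because k-means assigns cluster ids arbitrarily, so cluster 0 may correspond to either block. A minimal standalone sketch of the same idea, using a hypothetical helper `two_cluster_error_rate` and made-up labels, could look like this. It only applies to the two-cluster case.

```python
import numpy as np

def two_cluster_error_rate(true_labels, predicted_labels):
    """Error rate for two clusters, ignoring which id each cluster received."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    mismatch = np.mean(true_labels != predicted_labels)
    # With two clusters, the only alternative matching is the swapped one
    return min(mismatch, 1.0 - mismatch)

true_demo = [0, 0, 0, 1, 1, 1]
pred_demo = [1, 1, 0, 0, 0, 0]   # ids are swapped and one node is misplaced

rate_demo = two_cluster_error_rate(true_demo, pred_demo)
print(f"error rate: {rate_demo:.3f}")  # error rate: 0.167
```

Taking the smaller of the two mismatch fractions plays the same role as the `if np.sum(error) / (2 * n_verts) > 0.5` branch. ARI needs no such correction because it is invariant to relabeling of the clusters.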