Signal Subgraph Estimators

This tutorial introduces the following signal-subgraph estimators:

  • Incoherent subgraph estimator

  • Coherent subgraph estimator

[1]:
import graspologic.subgraph as sg
import matplotlib.pyplot as plt
import numpy as np

Preliminaries

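The general graph model is characterized by \(M_V(m,s;\pi,p,q)\), where:

\(V\) - the number of vertices in the graph

\(n\) - the number of graph samples

\(s\) - the number of edges that must be present in the subgraph

\(m\) - the number of vertices that each edge in the subgraph must be incident to

\(\pi\) - the probability that a graph sample belongs to class 1

\(p\) - the probability of an edge in the signal-subgraph, conditioned on class 0

\(q\) - the probability of an edge in the signal-subgraph, conditioned on class 1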

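The signal-subgraph is the subset of edges whose likelihood parameters differ between the classes. A signal-subgraph estimator evaluates a test statistic for every edge in the graph and selects the \(s\) edges with the lowest test statistic, where \(s\) is the desired size of the signal-subgraph.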
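The edge-selection rule described above can be sketched as follows. This is a minimal illustration, not graspologic's implementation: it assumes the per-edge test statistic is a two-sided Fisher exact p-value computed from a class-by-edge contingency table, and the helper names `fisher_pvalue`, `edge_pvalues`, and `lowest_s_edges` are hypothetical.

```python
from math import comb

import numpy as np


def fisher_pvalue(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7)
    )


def edge_pvalues(graphs, labels):
    """Per-edge p-values for binary graphs stacked as an array of shape (V, V, n)."""
    labels = np.asarray(labels)
    class0, class1 = graphs[:, :, labels == 0], graphs[:, :, labels == 1]
    present0, present1 = class0.sum(axis=2), class1.sum(axis=2)
    n0, n1 = class0.shape[2], class1.shape[2]
    pvals = np.ones(graphs.shape[:2])
    for i, j in np.ndindex(*pvals.shape):
        # Table rows are the classes; columns are edge present / edge absent
        a, c = int(present0[i, j]), int(present1[i, j])
        pvals[i, j] = fisher_pvalue(a, n0 - a, c, n1 - c)
    return pvals


def lowest_s_edges(stat, s):
    """Return (rows, cols) of the s edges with the smallest statistic."""
    flat = np.argsort(stat, axis=None, kind="stable")[:s]
    return np.unravel_index(flat, stat.shape)
```

Ties are broken in row-major order because the sort is stable. The double loop is written for clarity rather than speed.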

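The choice of estimator determines certain properties of the resulting subgraph. Both estimators use \(s\) to set the size of the resulting subgraph. \(m\) is used only by the coherent estimator, which constrains the subgraph to \(m\) vertices.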
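To make the role of \(m\) concrete, here is a simplified sketch of a coherent selection step. It is a heuristic written for illustration under stated assumptions, not the algorithm graspologic uses: each vertex is scored by the sum of its \(s\) lowest incident statistics, the \(m\) best-scoring vertices are kept, and the \(s\) lowest-statistic edges incident to a kept vertex are returned. The function name `coherent_edges` is hypothetical.

```python
import numpy as np


def coherent_edges(stat, s, m):
    """Pick s low-statistic edges that each touch one of m chosen vertices."""
    n_vertices = stat.shape[0]
    scores = np.empty(n_vertices)
    for v in range(n_vertices):
        # Edges incident to v: row v plus column v (diagonal counted once)
        incident = np.concatenate([stat[v, :], np.delete(stat[:, v], v)])
        scores[v] = np.sort(incident)[:s].sum()
    keep = np.argsort(scores, kind="stable")[:m]

    # Only edges in a kept vertex's row or column remain eligible
    allowed = np.zeros(stat.shape, dtype=bool)
    allowed[keep, :] = True
    allowed[:, keep] = True
    if s > allowed.sum():
        raise ValueError("s exceeds the number of edges incident to m vertices")

    masked = np.where(allowed, stat, np.inf)
    flat = np.argsort(masked, axis=None, kind="stable")[:s]
    return np.unravel_index(flat, stat.shape), keep
```

With \(m = 1\), every returned edge lies in a single row or column of the adjacency matrix, so a strongly significant edge elsewhere in the graph is ignored.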

Incoherent Signal Subgraph Estimator

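In this example, we randomly select 20 edges from a graph with 70 vertices. These edges have distinct class-conditional edge probabilities, and the graphs are sampled from the model \(M_{70}(20; 0.5, 0.8, 0.1)\) with \(n = 100\).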

[2]:
from graspologic.plot import heatmap

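verts = 70       # V: number of vertices
sedges = 20      # s: number of signal edges
pi = 0.5         # class-1 probability (unused below; labels simply alternate)
p = 0.8          # edge probability for class 0 and for all non-signal edges
q = 0.1          # signal-edge probability for class 1
nsamples = 100   # n: number of graph samples

np.random.seed(8888)
# Alternate labels: even-indexed samples are class 0, odd-indexed are class 1
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

# Choose the signal edges and build the class-1 edge-probability matrix
sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)
vect = p * np.ones(verts ** 2)
vect[sigsubindex] = q
vect = np.reshape(vect, (verts, verts))
expected = np.where(vect == q, 1, 0)

# Class 0 uses probability p everywhere; class 1 uses q on the signal edges
blank = vect[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)

# Incoherent estimator: an integer constraint is the number of edges s
sigsub = sg.SignalSubgraph()
sigsub.fit_transform(graphs=A, labels=classlabels, constraints=sedges)

# Mark the estimated signal edges in a V x V indicator matrix
estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1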

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
[Figure: heatmaps of the expected (left) and estimated (right) signal-subgraph]

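Note that because \(p\) and \(q\) are sufficiently different and \(n\) is sufficiently large, the expected and estimated signal-subgraphs should match exactly.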
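Rather than comparing the two heatmaps by eye, the agreement can be measured directly. The sketch below is one simple way to do so; the helper name `edge_recovery` is hypothetical and not part of graspologic.

```python
import numpy as np


def edge_recovery(expected, estimated):
    """Fraction of expected signal edges that also appear in the estimate."""
    expected = np.asarray(expected, dtype=bool)
    estimated = np.asarray(estimated, dtype=bool)
    n_expected = expected.sum()
    if n_expected == 0:
        raise ValueError("expected subgraph has no edges")
    return (expected & estimated).sum() / n_expected
```

Calling `edge_recovery(expected, estimatesigsub)` on the matrices from the cell above returns 1.0 when every expected edge was found. For a stricter check that also rules out extra edges, `np.array_equal(expected, estimatesigsub)` compares the two matrices element by element.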

Coherent Signal Subgraph Estimator

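Again, we randomly select 20 edges from a graph with 70 vertices. These edges have distinct class-conditional edge probabilities, but this time the graphs are sampled from the model \(M_{70}(1, 20; 0.5, 0.8, 0.1)\) with \(n = 100\).

The estimated signal-subgraph will contain 20 edges, each of which must be incident to the same vertex. First, we use the same expected signal-subgraph as in the previous example.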

[3]:
np.random.seed(8888)
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)
vect = p * np.ones(verts ** 2)
vect[sigsubindex] = q
vect = np.reshape(vect, (verts, verts))
expected = np.where(vect == q, 1, 0)

blank = vect[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)

# Coherent estimator: constraints = [s, m] asks for 20 edges incident to 1 vertex
sigsub = sg.SignalSubgraph()
sigsub.fit_transform(A, classlabels, [20, 1])

estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
[Figure: heatmaps of the expected (left) and estimated (right) signal-subgraph]

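Note how the coherent estimator restricts the estimated signal-subgraph to 20 edges incident to the single vertex with the best total significance value. Next, we try an expected signal-subgraph that is itself restricted to one vertex.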

[4]:
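mverts = 1  # m: number of vertices the signal edges must be incident to

np.random.seed(7777)
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

# Pick the m vertices, then draw the signal edges among their incident edges
m = np.random.choice(verts, mverts)
vect = p * np.ones(2 * verts * mverts - (mverts ** 2))
vect[np.random.choice(len(vect), sedges, replace=False)] = q

# Mark row m and column m with NaN, then fill those slots with vect
blank = p * np.ones((verts, verts))
blank[m, :] = np.nan
blank[:, m] = np.nan
blank[np.isnan(blank)] = vect
expected = np.where(blank == q, 1, 0)

# Class 0 uses probability p everywhere; class 1 uses q on the signal edges
blank = blank[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)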

sigsub = sg.SignalSubgraph()
sigsub.fit_transform(graphs=A, labels=classlabels, constraints=[sedges, mverts])

estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
[Figure: heatmaps of the expected (left) and estimated (right) signal-subgraph]

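Now that the expected signal-subgraph satisfies the constraints of the coherent signal-subgraph model, the expected and estimated signal-subgraphs are exactly equal.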
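To confirm that an estimated subgraph really is coherent, one can search for a small set of vertices that touches every selected edge. The brute-force sketch below is only practical for small \(m\); the helper name `find_cover` is hypothetical, and it assumes the edges are given as a (rows, cols) index pair, which is how `sigsub.sigsub_` is used for indexing in the cells above.

```python
from itertools import combinations

import numpy as np


def find_cover(rows, cols, m):
    """Return at most m vertices touching every (row, col) edge, else None."""
    edges = list(zip(np.asarray(rows).tolist(), np.asarray(cols).tolist()))
    if not edges:
        return ()
    # Only edge endpoints can be part of a useful cover
    candidates = sorted({v for edge in edges for v in edge})
    for size in range(1, m + 1):
        for group in combinations(candidates, size):
            chosen = set(group)
            if all(r in chosen or c in chosen for r, c in edges):
                return group
    return None
```

For the example above, `find_cover(*sigsub.sigsub_, mverts)` should return a single vertex when the estimate respects the one-vertex constraint, and `None` otherwise.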