Matrixplot and Adjplot: Visualizing and Sorting Matrices with Metadata

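This guide introduces Matrixplot and Adjplot. They sort a matrix according to its metadata and plot it as a heatmap or a scatter map. Both can also add color axes or tick axes that mark the boundaries between groups or attributes.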

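Note: Matrixplot and Adjplot have nearly identical inputs and functionality. Adjplot is a convenience wrapper around Matrixplot. It assumes that the matrix is square and that its rows and columns share the same metadata.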

from graspologic.simulations import sbm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from graspologic.plot import adjplot, matrixplot
import seaborn as sns

Simulating a binary graph with a stochastic block model

The 4-block model is defined as follows:

\begin{align*} n &= [50, 50, 50, 50]\\ P &= \begin{bmatrix}0.8 & 0.1 & 0.05 & 0.01\\ 0.1 & 0.4 & 0.15 & 0.02\\ 0.05 & 0.15 & 0.3 & 0.01\\ 0.01 & 0.02 & 0.01 & 0.4 \end{bmatrix} \end{align*}

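The vertices are assigned to blocks in order: the first 50 belong to block 1, the next 50 to block 2, the third 50 to block 3, and the last 50 to block 4.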
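In a stochastic block model, vertices u and v are connected independently with probability P[b(u), b(v)], where b(v) is the block of v. The sketch below implements that idea in plain NumPy. It is not graspologic's `sbm`. The helper `sample_sbm` is hypothetical, and it assumes an undirected graph without self-loops.

```python
import numpy as np

def sample_sbm(n, P, rng):
    """Hypothetical helper (not graspologic's sbm): sample an undirected,
    loopless binary SBM and return the adjacency matrix and block labels."""
    P = np.asarray(P)
    # Block label of every vertex, e.g. n=[2, 3] -> [0, 0, 1, 1, 1]
    labels = np.repeat(np.arange(len(n)), n)
    # Connection probability for every vertex pair, looked up from P
    probs = P[np.ix_(labels, labels)]
    # Draw each pair once in the strict upper triangle, then mirror it
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

rng = np.random.default_rng(2)
n = [50, 50, 50, 50]
P = [[0.8, 0.1, 0.05, 0.01],
     [0.1, 0.4, 0.15, 0.02],
     [0.05, 0.15, 0.3, 0.01],
     [0.01, 0.02, 0.01, 0.4]]
A_sketch, labels = sample_sbm(n, P, rng)

# Empirical edge density inside block 1; should be close to P[0][0] = 0.8
block1 = A_sketch[np.ix_(labels == 0, labels == 0)]
density = block1.sum() / (50 * 49)
```

Each edge is counted twice in the symmetric block, so dividing by the 50 × 49 ordered off-diagonal pairs gives the edge density.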

Each block is associated with some metadata:

Block	Hemisphere	Region
1	0	0
2	0	1
3	1	0
4	1	1
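The per-vertex metadata built in the next code cell can also be derived directly from this table by looking up each vertex's block row. The pandas sketch below does this. The names `block_table` and `meta_sketch` are illustrative and do not come from the original notebook.

```python
import numpy as np
import pandas as pd

N = 50
# One row per block, copied from the table above
block_table = pd.DataFrame({"hemisphere": [0, 0, 1, 1], "region": [0, 1, 0, 1]})
# Block index of each vertex: N vertices per block, in order
block_of_vertex = np.repeat(np.arange(4), N)
# Look up each vertex's block row to get per-vertex metadata
meta_sketch = block_table.iloc[block_of_vertex].reset_index(drop=True)
meta_sketch["cell_size"] = np.arange(4 * N)
```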

N = 50
n_communities = [N, N, N, N]
p = [[0.8, 0.1, 0.05, 0.01],
     [0.1, 0.4, 0.15, 0.02],
     [0.05, 0.15, 0.3, 0.01],
     [0.01, 0.02, 0.01, 0.4]]

np.random.seed(2)
A = sbm(n_communities, p)
meta = pd.DataFrame(
    data={
        'hemisphere': np.concatenate((np.full((1, 2*N), 0), np.full((1, 2*N), 1)), axis=1).flatten(),
        'region': np.concatenate((np.full((1, N), 0), np.full((1, N), 1), np.full((1, N), 0), np.full((1, N), 1)), axis=1).flatten(),
        'cell_size': np.arange(4*N)},
)

Before shuffling, the original data look like this:

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
)

[Figure: scatter map of A before shuffling]

Shuffle the data so that we can see how much the ordering of a matrix matters visually:

rnd_idx = np.arange(4*N)
np.random.shuffle(rnd_idx)
A = A[np.ix_(rnd_idx, rnd_idx)]
meta = meta.reindex(rnd_idx)
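Indexing with `np.ix_(rnd_idx, rnd_idx)` applies the same permutation to rows and columns. `meta.reindex(rnd_idx)` reorders the metadata rows the same way. As a result, row i of the shuffled `meta` still describes row and column i of the shuffled `A`.

The toy check below uses plain NumPy/pandas and made-up data to confirm this alignment. It also shows that a stable sort on the shuffled block labels restores the block structure, which is the idea behind the grouping options that follow.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 3)              # 3 blocks of 3 vertices
# Toy matrix: entry is 1 when both vertices share a block
A_toy = (labels[:, None] == labels[None, :]).astype(int)
meta_toy = pd.DataFrame({"block": labels})

perm = rng.permutation(len(labels))
A_shuf = A_toy[np.ix_(perm, perm)]            # same permutation on both axes
meta_shuf = meta_toy.reindex(perm)            # metadata follows the same order

# A stable argsort on the shuffled labels recovers the block structure
order = np.argsort(meta_shuf["block"].to_numpy(), kind="stable")
A_sorted = A_shuf[np.ix_(order, order)]
```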

The shuffled data:

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
)

[Figure: scatter map of the shuffled A]

Using `group`

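The `group` parameter can be a string, a list, or an np.array. It determines how the rows and columns of the matrix are grouped. Strings refer to column names in `meta`.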
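Conceptually, grouping reorders the vertices so that vertices with the same label become contiguous, and it applies the same order to rows and columns. The pandas sketch below shows that step, assuming a stable sort on the group columns. `sort_by_group` is a hypothetical helper, not part of graspologic, and graspologic's internal ordering may differ.

```python
import numpy as np
import pandas as pd

def sort_by_group(meta, group):
    """Hypothetical helper (not graspologic): row positions of `meta`
    after a stable sort on the `group` column(s)."""
    if isinstance(group, str):
        group = [group]
    # reset_index so the returned values are positions, not index labels
    sorted_meta = meta.reset_index(drop=True).sort_values(list(group), kind="mergesort")
    return sorted_meta.index.to_numpy()

meta_toy = pd.DataFrame({"hemisphere": [1, 0, 1, 0], "region": [0, 1, 1, 0]})
A_toy = np.arange(16).reshape(4, 4)
inds = sort_by_group(meta_toy, "hemisphere")
A_grouped = A_toy[np.ix_(inds, inds)]   # same order on both axes
print(inds.tolist())  # [1, 3, 0, 2]
```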

Grouping the matrix by one metadata field (hemisphere)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere"],
    sizes=(5, 5),
)

[Figure: shuffled A grouped by hemisphere]

Grouping the matrix by another field (region), with a color axis marking hemisphere

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["region"],
    color=["hemisphere"],
    sizes=(5, 5),
)

[Figure: A grouped by region, with a hemisphere color axis]

Grouping by two metadata fields at once

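Note that the order of the list matters. The matrix is sorted by the first field, and then by the second field within each group of the first.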
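A multi-key sort makes the effect of list order visible. The plain pandas illustration below reproduces the two orderings used in the cell below. It mimics the ordering only and is not graspologic's code. `sorted_pairs` is a hypothetical helper.

```python
import pandas as pd

meta_toy = pd.DataFrame({"hemisphere": [0, 0, 1, 1], "region": [0, 1, 0, 1]})

def sorted_pairs(meta, group):
    """Hypothetical helper: (hemisphere, region) of each row after sorting by `group`."""
    sorted_meta = meta.sort_values(group, kind="mergesort")
    return list(zip(sorted_meta["hemisphere"].tolist(), sorted_meta["region"].tolist()))

print(sorted_pairs(meta_toy, ["hemisphere", "region"]))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted_pairs(meta_toy, ["region", "hemisphere"]))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```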

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["region", "hemisphere"],
)

[Figure: A grouped by hemisphere, then region]

[Figure: A grouped by region, then hemisphere]

Using `group_order`

Ordering the groups by size

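If the groups differ in size, we can order them by size in ascending order. The next cell simulates a new, weighted SBM with unequal block sizes (N, 3N, 2N, N with N = 10). Its edge weights are drawn from a normal distribution with mean 5 and standard deviation 1. The cell also adds a continuous `axon_length` field to the metadata.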
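Ordering by `"size"` can be read as follows: at each level of `group`, count the rows in each group and place smaller groups first. The pandas sketch below applies that interpretation to data shaped like the next cell's. `size_sort` is a hypothetical helper and may not match graspologic's tie-breaking exactly.

```python
import numpy as np
import pandas as pd

def size_sort(meta, group):
    """Hypothetical helper (not graspologic): order rows so that, at every
    level of `group`, smaller groups come first (ascending size)."""
    meta = meta.reset_index(drop=True).copy()
    keys = []
    for i, col in enumerate(group):
        size_col = f"_size_{i}"
        # Size of the group this row belongs to, using all levels up to i
        meta[size_col] = meta.groupby(list(group[: i + 1]))[col].transform("size")
        keys += [size_col, col]
    return meta.sort_values(keys, kind="mergesort").index.to_numpy()

N = 10
sizes = [N, 3 * N, 2 * N, N]
meta_toy = pd.DataFrame({
    "hemisphere": np.repeat([0, 0, 1, 1], sizes),
    "region": np.repeat([0, 1, 0, 1], sizes),
})
inds = size_sort(meta_toy, ["hemisphere", "region"])
block_order = meta_toy.iloc[inds].drop_duplicates()
print(block_order.to_numpy().tolist())  # [[1, 1], [1, 0], [0, 0], [0, 1]]
```

Hemisphere 1 (30 vertices) comes before hemisphere 0 (40 vertices). Within each hemisphere, the smaller region comes first.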

from numpy.random import normal
N = 10
n_communities = [N, 3*N, 2*N, N]
p = [[0.8, 0.1, 0.05, 0.01],
     [0.1, 0.4, 0.15, 0.02],
     [0.05, 0.15, 0.3, 0.01],
     [0.01, 0.02, 0.01, 0.4]]
wt = [[normal]*4]*4
wtargs = [[dict(loc=5, scale=1)]*4]*4

np.random.seed(2)
A = sbm(n_communities, p, wt=wt, wtargs=wtargs)
meta = pd.DataFrame(
    data={
        'hemisphere': np.concatenate((np.full((1, 4*N), 0), np.full((1, 3*N), 1)), axis=1).flatten(),
        'region': np.concatenate((np.full((1, N), 0), np.full((1, 3*N), 1), np.full((1, 2*N), 0), np.full((1, N), 1)), axis=1).flatten(),
        'cell_size': np.arange(7*N),
        'axon_length': np.concatenate((np.random.normal(5, 1, (1, N)), np.random.normal(2, 1, (1, 3*N)), np.random.normal(5, 1, (1, 2*N)), np.random.normal(2, 1, (1, N))), axis=1).flatten()},
)
rnd_idx = np.arange(7*N)
np.random.shuffle(rnd_idx)
A = A[np.ix_(rnd_idx, rnd_idx)]
meta = meta.reindex(rnd_idx)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    group_order="size", # note that this is a special keyword which was not in `meta`
    color=["cell_size", "axon_length"],
    palette=["Purples", "Blues"],
    sizes=(1, 30),
)

[Figure: weighted A grouped by hemisphere and region, groups ordered by size, with cell_size and axon_length color axes]

If the metadata contain other fields, we can also order the groups by the mean of a field, in ascending order:
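For a single grouping level, ordering by the mean of a field ranks the groups by that field's within-group average. The resulting group order can be read off with pandas. The numbers below are toy values, for illustration only.

```python
import pandas as pd

meta_toy = pd.DataFrame({
    "region": [0, 0, 1, 1, 2, 2],
    "cell_size": [9.0, 11.0, 1.0, 3.0, 5.0, 7.0],
})
# Mean cell_size per region, ascending: the order the groups would take
group_means = meta_toy.groupby("region")["cell_size"].mean().sort_values()
print(group_means.index.tolist())  # [1, 2, 0]
```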

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    group_order=["cell_size"],
    color=["cell_size", "axon_length"],
    palette=["Purples", "Blues"],
)

[Figure: groups ordered by mean cell_size]

We can also sort by several fields at once. The list can also include the special `"size"` keyword.
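One natural reading of a multi-entry list is that each entry breaks ties left by the previous one at the same grouping level. The pandas sketch below implements that reading and also accepts the `"size"` keyword. `multi_key_group_sort` is hypothetical, and graspologic's own implementation may order keys or ties differently.

```python
import pandas as pd

def multi_key_group_sort(meta, group, group_order):
    """Hypothetical helper (not graspologic's code). At each level of `group`,
    order groups by each entry of `group_order` in turn, ascending:
    "size" means group size, any other name means that column's group mean.
    The group label itself is the final tie-breaker at each level."""
    meta = meta.reset_index(drop=True).copy()
    keys = []
    for i, col in enumerate(group):
        levels = list(group[: i + 1])
        for j, field in enumerate(group_order):
            key = f"_key_{i}_{j}"
            if field == "size":
                meta[key] = meta.groupby(levels)[col].transform("size")
            else:
                meta[key] = meta.groupby(levels)[field].transform("mean")
            keys.append(key)
        keys.append(col)
    return meta.sort_values(keys, kind="mergesort").index.to_numpy()

meta_toy = pd.DataFrame({
    "region": [0, 0, 1, 1, 2],
    "cell_size": [1, 1, 1, 1, 5],
    "axon_length": [3, 3, 1, 1, 0],
})
by_means = multi_key_group_sort(meta_toy, ["region"], ["cell_size", "axon_length"])
by_size = multi_key_group_sort(meta_toy, ["region"], ["size"])
print(meta_toy["region"].iloc[by_means].tolist())  # [1, 1, 0, 0, 2]
print(meta_toy["region"].iloc[by_size].tolist())   # [2, 0, 0, 1, 1]
```

Regions 0 and 1 tie on mean `cell_size`, so mean `axon_length` decides between them.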

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    group_order=["cell_size", "axon_length"],
    color=["cell_size", "axon_length"],
    palette=["Purples", "Blues"],
)

[Figure: groups ordered by mean cell_size, then mean axon_length]

Using `item_order`

The `item_order` parameter sorts the items within each group.
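In effect, `item_order` acts as a secondary sort key applied after the group keys, so it never moves an item across a group boundary. The pandas sketch below shows the combined ordering on toy data. It illustrates the idea and is not graspologic's code.

```python
import pandas as pd

meta_toy = pd.DataFrame({
    "hemisphere": [1, 0, 1, 0, 0],
    "cell_size": [7, 9, 2, 4, 1],
})
# Group key first, then the item key (ascending cell_size) within each group
order = meta_toy.sort_values(["hemisphere", "cell_size"], kind="mergesort").index.to_numpy()
print(meta_toy.iloc[order].to_numpy().tolist())
# [[0, 1], [0, 4], [0, 9], [1, 2], [1, 7]]
```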

Without `item_order`, the items within each group remain in shuffled order:

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    color=["cell_size"],
    palette="Purples",
)

[Figure: A grouped by hemisphere and region, items within each group still shuffled]

With `item_order`, the items within each group are sorted:

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    item_order=["cell_size"],
    color=["cell_size"],
    palette="Purples",
)

[Figure: A grouped by hemisphere and region, items sorted by cell_size within each group]

Using `highlight`

`highlight` draws the separators of the listed groupings in a different style, which is set with `highlight_kws`.
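Separators sit wherever the sorted group labels change. `highlight` picks the subset of separators that belong to the listed fields. The NumPy sketch below locates both sets of positions for a small sorted example. `group_boundaries` is a hypothetical helper, not a graspologic function.

```python
import numpy as np

def group_boundaries(*label_arrays):
    """Hypothetical helper: positions where a new group starts, i.e. where
    any of the given (already sorted) label arrays changes value."""
    changed = np.zeros(len(label_arrays[0]) - 1, dtype=bool)
    for labels in label_arrays:
        labels = np.asarray(labels)
        changed |= labels[1:] != labels[:-1]
    return np.flatnonzero(changed) + 1

# Labels after grouping by ["hemisphere", "region"]
hemisphere = [0, 0, 0, 0, 1, 1, 1]
region = [0, 1, 1, 1, 0, 0, 1]
all_separators = group_boundaries(hemisphere, region)  # every group edge
highlighted = group_boundaries(hemisphere)             # only hemisphere edges
print(all_separators.tolist(), highlighted.tolist())   # [1, 4, 6] [4]
```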

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
highlight_kws = dict(color="red", linestyle="-", linewidth=5)
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    highlight=["hemisphere"],
    highlight_kws=highlight_kws,
)

[Figure: hemisphere separators highlighted with thick red lines]

Using multiple palettes

All color axes can share the same palette, or each color axis can be given its own palette.
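Judging from the cell below, the pairing is positional: the i-th palette goes with the i-th entry of `color`, and a single palette name applies to every color axis. The tiny pure-Python sketch below expresses that rule. The helper and its length check are assumptions, not verified graspologic behavior.

```python
def pair_palettes(color, palette):
    """Hypothetical helper (not part of graspologic): pair each color axis
    with a palette, reusing a single palette name for every axis."""
    if isinstance(color, str):
        color = [color]
    if isinstance(palette, str):
        palette = [palette] * len(color)
    if len(palette) != len(color):
        raise ValueError("expected one palette per color axis")
    return dict(zip(color, palette))

shared = pair_palettes(["hemisphere", "region"], "tab10")
separate = pair_palettes(
    ["hemisphere", "region", "cell_size", "axon_length"],
    ["tab10", "tab20", "Purples", "Blues"],
)
print(shared)  # {'hemisphere': 'tab10', 'region': 'tab10'}
```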

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    meta=meta,
    plot_type="scattermap",
    group=["hemisphere", "region"],
    ticks=False,
    item_order=["cell_size"],
    color=["hemisphere", "region", "cell_size", "axon_length"],
    palette=["tab10", "tab20", "Purples", "Blues"],
)

[Figure: four color axes (hemisphere, region, cell_size, axon_length), each with its own palette]

Labeling the row and column axes with different metadata

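To group rows and columns by different metadata, use `matrixplot`, which takes separate arguments for each axis. Most parameters match those of `adjplot`, with a `row_` or `col_` prefix indicating which axis of `data` they apply to.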

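Note: do not sort the rows and columns of an adjacency matrix separately. Doing so makes row i and column i refer to different vertices, which breaks the graph representation. We do it here only for demonstration, treating A as if it were not an adjacency matrix.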
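Sorting rows with the `row_*` arguments and columns with the `col_*` arguments amounts to indexing with two different permutations. The NumPy/pandas sketch below uses toy data to show why this is fine for a general matrix but not for an adjacency matrix.

```python
import numpy as np
import pandas as pd

meta_toy = pd.DataFrame({"hemisphere": [1, 0, 1, 0], "region": [0, 0, 1, 1]})
A_toy = np.arange(16).reshape(4, 4)   # entry (u, v) = 4*u + v, easy to trace

row_inds = meta_toy.sort_values("region", kind="mergesort").index.to_numpy()
col_inds = meta_toy.sort_values("hemisphere", kind="mergesort").index.to_numpy()
A_rc = A_toy[np.ix_(row_inds, col_inds)]   # different order on each axis

# Row i and column i now refer to different vertices, so the diagonal of
# A_rc no longer holds the (v, v) entries of the original matrix
print(row_inds.tolist(), col_inds.tolist())  # [0, 1, 2, 3] [1, 3, 0, 2]
print(np.diag(A_rc).tolist())                # [1, 7, 8, 14]
```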

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
matrixplot(
    data=A,
    ax=ax,
    col_meta=meta,
    row_meta=meta,
    plot_type="scattermap",
    col_group=["hemisphere"],
    row_group=["region"],
    col_item_order=["cell_size"],
    row_item_order=["cell_size"],
    row_color=["region"],
    row_ticks=False,
)

[Figure: matrixplot scatter map with columns grouped by hemisphere and rows grouped by region]

Plotting with `heatmap` instead of `scattermap`

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
matrixplot(
    data=A,
    ax=ax,
    col_meta=meta,
    row_meta=meta,
    plot_type="heatmap",
    col_group=["hemisphere"],
    row_group=["region"],
    col_item_order=["cell_size"],
    row_item_order=["cell_size"],
    row_color=["region"],
    row_ticks=False,
)

[Figure: the same matrixplot drawn as a heatmap]

Providing array-like objects instead of `meta`

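If `meta` is not provided, each sorting, grouping, or coloring keyword can be given as an array-like object instead. In the example below, `group` is a (7N, 2) array whose two columns play the roles of hemisphere and region.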

group_0 = np.concatenate((np.full((4*N, 1), 0), np.full((3*N, 1), 1)), axis=0)
group_1 = np.concatenate((np.full((N, 1), 0), np.full((3*N, 1), 1), np.full((2*N, 1), 0), np.full((N, 1), 1)), axis=0)
group = np.concatenate((group_0, group_1), axis=1)

group = group[rnd_idx, :]
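The `group` array built above holds the same values as the `hemisphere` and `region` columns of the shuffled `meta`, as an (n_vertices, 2) array in the same row order. The sketch below takes it straight from a DataFrame. It assumes a permutation from `np.random.default_rng(0)` rather than the notebook's seed.

```python
import numpy as np
import pandas as pd

N = 10
sizes = [N, 3 * N, 2 * N, N]
rng = np.random.default_rng(0)
rnd_idx = rng.permutation(7 * N)

# Metadata table, shuffled with reindex as in the earlier cells
meta = pd.DataFrame({
    "hemisphere": np.repeat([0, 0, 1, 1], sizes),
    "region": np.repeat([0, 1, 0, 1], sizes),
}).reindex(rnd_idx)

# Array-like equivalent: one column per grouping field, rows in the same order
group_from_meta = meta[["hemisphere", "region"]].to_numpy()
```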

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
adjplot(
    data=A,
    ax=ax,
    plot_type="scattermap",
    group=group,
    sizes=(1, 30),
)

[Figure: A grouped with an array passed as `group` instead of `meta`]