分布的维度#

PyMC 提供了多种方式来指定其分布的维度。本文档提供了一个概述，并提供了一些用户提示。

术语表#

在本文档中，我们将使用“维度”一词来指代维度的概念。以下每个术语在 PyMC 中都有特定的语义和计算定义。虽然我们在这里分享它们，但在下面的示例中查看时，它们会更有意义。

支持维度 → 分布的核心维度
批处理维度 → 超出分布支持维度的额外维度
隐含维度 → 源自分布参数的值或形状的维度
显式维度 → 由以下参数之一明确定义的维度：
- 形状 → 从分布中抽样的数量
- 维度 → 维度名称的数组
坐标 → 一个字典，将维度名称映射到坐标值

from functools import partial

import pymc as pm
import numpy as np
import pytensor.tensor as pt

单变量分布示例#

我们可以从最简单的情况开始，单个正态分布。我们使用.dist来指定一个位于PyMC模型之外的分布。

normal_dist = pm.Normal.dist()

然后我们可以使用 draw() 函数从该分布中进行随机抽样。

# 仅对绘图函数进行修补以确保可重复性
rng = np.random.default_rng(seed=sum(map(ord, "dimensionality")))
draw = partial(pm.draw, random_seed=rng)

normal_draw = draw(normal_dist)
normal_draw, normal_draw.ndim

(array(0.80189558), 0)

在这种情况下，我们最终得到一个单一的标量值。这意味着正态分布具有标量支撑维度，因为你可以进行的最小随机抽取是一个维度为零的标量。每个分布的支撑维度作为一个属性被硬编码。

normal_dist.owner.op.ndim_supp

显式批处理维度#

如果需要进行多次抽取，通常的倾向是创建多个相同变量的副本并将它们堆叠在一起。

normal_dists = pm.math.stack([pm.Normal.dist() for _ in range(3)])
draw(normal_dists)

array([ 0.83636296, -0.33327414,  0.9434115 ])

更简单地说，可以通过使用形状参数来创建来自同一分布族的批量独立抽样。

normal_dists = pm.Normal.dist(shape=(3,))
draw(normal_dists)

array([ 0.98810294, -0.07003785, -0.37962748])

normal_dists = pm.Normal.dist(shape=(4, 3))
draw(normal_dists)

array([[ 7.99932116e-04, -1.94407945e+00,  3.90451962e-01],
       [ 1.10657367e+00,  6.49042149e-01, -1.09450185e+00],
       [-2.96226305e-01,  1.41884595e+00, -1.31589441e+00],
       [ 1.53109449e+00, -7.73771737e-01,  2.37418367e+00]])

这不仅更加简洁，而且生成的向量化代码效率更高。在PyMC中，我们很少使用堆栈方法，除非需要组合来自不同分布族的抽样。

隐式批次维度#

还可以通过传递具有更高维度的参数来创建一批抽样，而无需指定形状。

normal_dists = pm.Normal.dist(mu=np.array([0, 0, 0]), sigma=np.array([1, 1, 1]))
draw(normal_dists)

array([ 0.81833093, -0.2891973 ,  1.2399946 ])

这与之前的显式形状示例是等效的，我们本可以在这里显式地传递它。由于我们没有这样做，因此我们将这些批次维度称为隐式。

这在我们希望参数在批次维度上变化时变得非常有用。

draw(pm.Normal.dist(mu=[1, 10, 100], sigma=0.0001))

array([  0.99989975,  10.00009874, 100.00004215])

当参数的形状不同时，它们会被广播，类似于NumPy的工作方式。在这种情况下，sigma被广播以匹配mu的形状。

np.broadcast_arrays([1, 10, 100], 0.0001)

[array([  1,  10, 100]), array([0.0001, 0.0001, 0.0001])]

了解 NumPy 广播的工作原理是很重要的。当你做一些无效的操作时，容易遇到这种错误：

try:
    # 形状为 (3,) 和 (2,) 的数组无法一起广播
    draw(pm.Normal.dist(mu=[1, 10, 100], sigma=[0.1, 0.1]))
except ValueError as error:
    print(error)

shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (3,) and arg 1 with shape (2,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F6427F8CAC0>), [], 11, [  1  10 100], [0.1 0.1])
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(0,)), TensorType(int64, shape=()), TensorType(int64, shape=(3,)), TensorType(float64, shape=(2,))]
Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]
Inputs strides: ['No strides', (0,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F6427F8CAC0, array([], dtype=int64), array(11), array([  1,  10, 100]), array([0.1, 0.1])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

结合隐式和显式批次维度#

您可以将显式形状维度与隐式批次维度结合使用。如上所提到的，它们可以提供相同的信息。

normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3,))
draw(normal_dists)

array([-0.49526775, -0.94608062,  1.66397913])

但是形状也可以用来扩展超出任何隐式批次维度。

normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3))
draw(normal_dists)

array([[ 2.22626513,  2.12938134,  0.49074886],
       [ 0.08312601,  1.05049093,  1.91718083],
       [-0.68191815,  1.43771096,  1.76780399],
       [-0.59883241,  0.26954893,  2.74319335]])

请注意，由于广播规则，显式的批次维度必须始终“放在左侧”，在任何隐式维度之前。因此，在前面的例子中，shape=(4, 3) 是有效的，但 shape=(3, 4) 是无效的，因为 mu 参数可以广播到第一个形状，而不能广播到第二个形状。

try:
    draw(pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3, 4)))
except ValueError as error:
    print(error)

shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (3, 4) and arg 1 with shape (3,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F64280725E0>), [3 4], 11, [0 1 2], 1.0)
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(2,)), TensorType(int64, shape=()), TensorType(int64, shape=(3,)), TensorType(float64, shape=())]
Inputs shapes: ['No shapes', (2,), (), (3,), ()]
Inputs strides: ['No strides', (8,), (), (8,), ()]
Inputs values: [Generator(PCG64) at 0x7F64280725E0, array([3, 4]), array(11), array([0, 1, 2]), array(1.)]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

如果您需要正态变量的形状为 shape=(3, 4)，可以在定义后对其进行转置。

transposed_normals = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3)).T
draw(transposed_normals)

array([[-0.73397401, -0.18717845, -0.78548049,  1.64478883],
       [ 3.54543846,  1.22954216,  2.13674063,  1.94194106],
       [ 0.85294471,  3.52041332,  2.94428975,  3.25944187]])

小技巧

重要的是不要混淆在分布定义中设定的维度与在后续操作（如转置、索引或广播）中设定的维度。在使用PyMC进行采样时（无论是通过前向采样还是MCMC），随机抽样始终会来自于分布形状。请注意，在以下示例中，尽管两个变量具有相同的最终形状，但实际抽取的“随机”样本数量是不同的。

vector_normals = pm.Normal.dist(shape=(3,))
broadcasted_normal = pt.broadcast_to(pm.Normal.dist(), (3,))

draw(vector_normals), draw(broadcasted_normal)

(array([-0.45755879,  1.59975702,  0.20546749]),
 array([0.29866199, 0.29866199, 0.29866199]))

多元分布示例#

某些分布根据定义在评估时返回多个值。这可能是一个值的向量，或一个矩阵，或一个任意的多维张量。一个例子是多元正态分布，它总是返回一个向量（具有一个维度的数组）。

mvnormal_dist = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3))
mvnormal_draw = draw(mvnormal_dist)
mvnormal_draw, mvnormal_draw.ndim

(array([0.55390975, 2.17440418, 1.83014764]), 1)

与任何分布一样，支持的维度被指定为一个固定的属性。

mvnormal_dist.owner.op.ndim_supp

即使您指定一个单维的MvNormal，您也会得到一个向量！

smallest_mvnormal_dist = pm.MvNormal.dist(mu=[1], cov=[[1]])
smallest_mvnormal_draw = draw(smallest_mvnormal_dist)
smallest_mvnormal_draw, smallest_mvnormal_draw.ndim

(array([-0.68893796]), 1)

隐式支持维度#

在我们刚刚看到的MvNormal示例中，支持维度实际上是隐含的。我们并没有具体指定我们想要的抽样向量的维度是3还是1。这是从mu和cov的形状中推断出来的。因此，我们称其为隐含支持维度。我们可以通过使用形状来使其更加明确。

explicit_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(3,))
draw(explicit_mvnormal)

array([0.57262853, 0.34230354, 1.96818163])

警告

然而，请注意，在撰写时，形状仅仅被忽略用于支持维度。它仅作为一种“类型提示”来标记预期的维度。

ignored_shape_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(4,))
draw(ignored_shape_mvnormal)

array([1.0623799 , 0.84622693, 0.34046237])

显式批次维度#

与单变量分布一样，我们可以添加显式的批量维度。我们将使用另一个向量分布来说明这一点：多项分布。下面的代码片段定义了一个包含五个独立多项分布的矩阵，每个分布都是大小为3的向量。

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)))

array([[2, 0, 3],
       [1, 1, 3],
       [0, 2, 3],
       [0, 1, 4],
       [1, 0, 4]])

警告

再次注意，形状对支持的维度没有影响

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 4)))

array([[0, 1, 4],
       [0, 0, 5],
       [3, 1, 1],
       [0, 1, 4],
       [0, 2, 3]])

出于同样的原因，您必须始终在支持维度的“左侧”明确定义显式的批量维度。以下内容将不会按预期行为。

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(3, 5)))

array([[2, 0, 3],
       [1, 3, 1],
       [1, 1, 3]])

如果你需要多项式变量的形状为 shape=(3, 5)，可以在定义后进行转置。

transposed_multinomials = pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)).T
draw(transposed_multinomials)

array([[0, 0, 0, 0, 0],
       [2, 2, 1, 0, 3],
       [3, 3, 4, 5, 2]])

隐式批量维度#

与单变量分布一样，我们可以为每个批处理维度使用不同的参数。

multinomial_dist = pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6])
draw(multinomial_dist)

array([[1, 2, 2],
       [0, 3, 7]])

哪一个等同于更冗长的内容

draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))

array([[2, 2, 1],
       [0, 3, 7]])

如果您熟悉NumPy的广播规则，您可能会好奇PyMC是如何使其工作的。简单的广播在这里是行不通的。

try:
    np.broadcast_arrays([5, 10], [0.1, 0.3, 0.6])
except ValueError as exc:
    print(exc)

shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (2,) and arg 1 with shape (3,).

为了理解发生了什么，我们需要引入参数核心维度的概念。分布参数的核心维度是定义分布所需的参数最小维数。在多项式分布中，n 至少必须是一个标量整数，而 p 至少必须是一个表示每个类别结果概率的向量。因此，对于多项式分布，n 的核心维度是0，而 p 的核心维度是1。

因此，如果我们有一个包含两个 n 的向量，我们实际上应该将 p 的向量广播成一个包含两个这样的向量的矩阵，并将每个 n 与每个广播的 p 的行配对。这的工作方式与 np.vectorize 完全相同。

def core_multinomial(n, p):
    print(">>", n, p)
    return draw(pm.Multinomial.dist(n, p))


vectorized_multinomial = np.vectorize(core_multinomial, signature="(),(p)->(p)")
vectorized_multinomial([5, 10], [0.1, 0.3, 0.6])

>> 5 [0.1 0.3 0.6]
>> 10 [0.1 0.3 0.6]

array([[1, 0, 4],
       [1, 2, 7]])

每个分布参数的核心维度也是硬编码为每个分布的一个属性。

multinomial_dist.owner.op.ndims_params

(0, 1)

隐式批处理维度仍然必须遵循广播规则。以下示例无效，因为 n 的批处理维度为 shape=(2,) ，而 p 的批处理维度为 shape=(3,) ，这两个维度无法一起广播。

try:
    draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))
except ValueError as error:
    print(error)

operands could not be broadcast together with remapped shapes [original->remapped]: (2,)  and requested shape (3,)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F6425B8B060>), [], 4, [ 5 10], [[0.1 0.3 ... 0.3 0.6]])
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(0,)), TensorType(int64, shape=()), TensorType(int64, shape=(2,)), TensorType(float64, shape=(3, 3))]
Inputs shapes: ['No shapes', (0,), (), (2,), (3, 3)]
Inputs strides: ['No strides', (0,), (), (8,), (24, 8)]
Inputs values: [Generator(PCG64) at 0x7F6425B8B060, array([], dtype=int64), array(4), array([ 5, 10]), 'not shown']
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

结合隐式和显式批量维度#

您可以并且应该将多维参数的隐式维度与显式形状信息结合，这样更易于推理。

draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 3)))

array([[0, 1, 4],
       [4, 1, 5]])

显式的批量维度仍然可以超出任何隐式的批量维度。同样，由于广播的工作原理，显式批量维度必须始终”在左边”。以下情况是无效的，因为n具有形状为(2,)的批量维度，这无法广播到形状为(2, 4)的显式批量维度。

try:
    draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 4, 3)))
except ValueError as error:
    print(error)

operands could not be broadcast together with remapped shapes [original->remapped]: (2,)  and requested shape (2,4)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F6425AC8120>), [2 4], 4, [ 5 10], [0.1 0.3 0.6])
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(2,)), TensorType(int64, shape=()), TensorType(int64, shape=(2,)), TensorType(float64, shape=(3,))]
Inputs shapes: ['No shapes', (2,), (), (2,), (3,)]
Inputs strides: ['No strides', (8,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F6425AC8120, array([2, 4]), array(4), array([ 5, 10]), array([0.1, 0.3, 0.6])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

使用模型图检查维度#

在PyMC模型中，分布通常会被频繁使用，因此存在一些工具可以在该上下文中帮助推理分布的形状。

with pm.Model() as pmodel:
    mu = pm.Normal("x", mu=0, shape=(3))
    sigma = pm.HalfNormal("sigma")
    y = pm.Normal("y", mu=mu, sigma=sigma)

for rv, shape in pmodel.eval_rv_shapes().items():
    print(f"{rv:>11}: shape={shape}")

          x: shape=(3,)
sigma_log__: shape=()
      sigma: shape=()
          y: shape=(3,)

一个更强大的工具来理解和调试PyMC中的维度是model_to_graphviz()函数。我们可以通过读取Graphviz输出而不是检查数组输出，以理解变量的维度。

pm.model_to_graphviz(pmodel)

../../_images/32bbcc40193d6edd997363f8ef2052558a0eb76115d36ec4ca8a54aff60924d9.svg

在上面的示例中，每个框（或板）左下角的数字表示其中分布的维度。如果某个分布在任何带数字的框之外，它具有标量形状。

让我们利用这个工具来回顾隐式和显式维度：

with pm.Model() as pmodel:
    pm.Normal("scalar (support)")
    pm.Normal("vector (implicit)", mu=[1, 2, 3])
    pm.Normal("vector (explicit)", shape=(4,))

pm.model_to_graphviz(pmodel)

../../_images/ebf196be8ef8c74bf92aa108634caba3e3b9e05dbf8fd0ff5ba350547a33af96.svg

维度#

PyMC 支持 dims 的概念。对于许多随机变量来说，哪个维度对应于哪个“现实世界”的概念，例如观察数量、处理单元数量等，可能会变得令人困惑。dims 参数是一个额外的易读标签，可以传达这种含义。当单独使用时，dims 必须与显式的 shape 信息结合使用。

with pm.Model() as model:
    pm.Normal("profit", shape=(3,), dims="year")

pm.model_to_graphviz(model)

../../_images/112253519c8d490bb25a5efe495a103862af614d679a3e5563aea86d5d35e104.svg

在 dims 的使用中，最强大的地方在于模型中指定的 coords。这为每个 dim 条目提供了一个独特的标签，使其更具意义。

coords = {"year": [2020, 2021, 2022]}
with pm.Model(coords=coords) as model:
    pm.Normal("profit", dims="year")

pm.model_to_graphviz(model)

在这种情况下，分布的维度实际上可以通过使用的 dims 来定义。我们不需要传递形状或定义隐式批处理维度。

让我们通过一个多元正态分布的例子来回顾不同维度的特点。

coords = {
    "batch": [0, 1, 2, 3],
    "support": ["A", "B", "C"],
}
with pm.Model(coords=coords) as model:
    pm.MvNormal("vector", mu=[0, 0, 0], cov=np.eye(3), dims=("support",))
    pm.MvNormal("matrix (implicit)", mu=np.zeros((4, 3)), cov=np.eye(3), dims=("batch", "support"))
    pm.MvNormal(
        "matrix (explicit)", mu=[0, 0, 0], cov=np.eye(3), shape=(4, 3), dims=("batch", "support")
    )

pm.model_to_graphviz(model)

../../_images/6513eb061fe8c2e25397f014288b28f41752168c35ef0b79db4f9dbb4dd90605.svg

小技巧

对于最终模型的发布，我们建议将尺寸和坐标作为标签，因为这些标签将传递给 arviz.InferenceData。这既是最佳实践，确保透明性和可读性，方便他人理解。同时，在单开发者的工作流程中，这也是非常有用的，例如，当存在三维或更高维度的分布时，它将有助于指明哪个维度对应于哪个模型概念。

调试形状问题的提示#

尽管我们提供了所有这些工具以方便用户，并且 PyMC 尽力理解用户的意图，但混合维度工具的结果可能并不总是符合最终的预期维度。有时模型可能在采样时不会指示错误，或者根本不会指示问题。在处理维度，特别是更复杂的维度时，我们建议：

在采样之前使用 model_to_graphviz 可视化模型
使用 draw 或 sample_prior predictive 早期捕捉错误
检查返回的 az.InferenceData 对象，以确保所有数组大小符合预期
在追踪错误时使用质数定义形状。

%load_ext watermark

%watermark -n -u -v -iv -w -p pytensor

Last updated: Mon Jul 17 2023

Python implementation: CPython
Python version       : 3.10.9
IPython version      : 8.11.0

pytensor: 2.12.3

pymc    : 5.2.0
numpy   : None
pytensor: 2.12.3

Watermark: 2.3.1