变化的长模型#

import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import preliz as pz
import pymc as pm
import statsmodels.api as sm
import xarray as xr

lowess = sm.nonparametric.lowess

%config InlineBackend.figure_format = 'retina'  # high resolution figures
az.style.use("arviz-darkgrid")
rng = np.random.default_rng(42)

对变化的研究涉及同时分析个体变化的轨迹，并从所研究的个体集合中抽象出更广泛的关于所讨论变化性质的见解。因此，很容易因为专注于树木而忽略了森林。在这个例子中，我们将展示使用分层贝叶斯模型研究个体群体内变化的一些微妙之处——从个体内部视角转向个体间/跨个体视角。

我们将遵循Singer和Willett在《应用纵向数据分析》中概述的模型构建讨论和迭代方法。更多详情请参见Singer .D和Willett .B [2003]）。我们将与他们的展示有所不同，因为他们专注于使用最大似然法拟合数据，而我们将使用PyMC和贝叶斯方法。在这一方法中，我们遵循了Solomon Kurz在brms中的工作，参见Kurz [2021]。我们强烈推荐这两本书，以获得对这里讨论问题的更详细介绍。

演示文稿的结构#

青少年酒精消费随时间变化分析
插曲：使用Bambi的替代模型规范
青少年外化行为随时间变化分析
结论

探索性数据分析：个体差异的混乱#

对于任何纵向分析，我们需要三个组成部分：(1) 多次数据收集 (2) 合适的时间定义 (3) 一个感兴趣的结果。结合这些，我们可以评估个体在时间上的变化与该结果的关系。在这个模型的第一系列中，我们将研究青少年酒精使用情况如何随着从14岁开始每年收集的数据在儿童之间变化。

try:
    df = pd.read_csv("../data/alcohol1_pp.csv")
except FileNotFoundError:
    df = pd.read_csv(pm.get_data("alcohol1_pp.csv"))

df["peer_hi_lo"] = np.where(df["peer"] > df["peer"].mean(), 1, 0)
df

	id	age	coa	male	age_14	alcuse	peer	cpeer	ccoa	peer_hi_lo
0	1	14	1	0	0	1.732051	1.264911	0.246911	0.549	1
1	1	15	1	0	1	2.000000	1.264911	0.246911	0.549	1
2	1	16	1	0	2	2.000000	1.264911	0.246911	0.549	1
3	2	14	1	1	0	0.000000	0.894427	-0.123573	0.549	0
4	2	15	1	1	1	0.000000	0.894427	-0.123573	0.549	0
...	...	...	...	...	...	...	...	...	...	...
241	81	15	0	1	1	0.000000	1.549193	0.531193	-0.451	1
242	81	16	0	1	2	0.000000	1.549193	0.531193	-0.451	1
243	82	14	0	0	0	0.000000	2.190890	1.172890	-0.451	1
244	82	15	0	0	1	1.414214	2.190890	1.172890	-0.451	1
245	82	16	0	0	2	1.000000	2.190890	1.172890	-0.451	1

246 行 × 10 列

首先，我们将研究一部分儿童的消费模式，以了解他们报告的使用情况如何表现出不同的轨迹。所有这些轨迹都可以合理地建模为线性现象。

fig, axs = plt.subplots(2, 4, figsize=(20, 8), sharey=True)
axs = axs.flatten()
for i, ax in zip(df["id"].unique()[0:8], axs):
    temp = df[df["id"] == i].sort_values("age")
    ax.plot(temp["age"], temp["alcuse"], "--o", color="black")
    ax.set_title(f"Subject {i}")
    ax.set_ylabel("alcohol Usage")
    ax.set_xlabel("Age")

../_images/cb51b79da3d2ea65a74632f35d3f6fc34ece85c010e19f4603eee7e16a6809fd.png

我们可能会认为酒精使用模式因性别的影响而有所不同，但个体的轨迹是嘈杂的。在接下来的系列图中，我们在个体数据上拟合了简单的回归模型，并根据斜率系数是\(\color{green}{负的}\)还是\(\color{red}{正的}\)来为趋势线着色。这非常粗略地给出了每个性别的个体消费模式是否以及如何变化的印象。

绿色和红色的线代表个人的OLS拟合，但灰色虚线代表个人的观察轨迹。

../_images/496d04259c81687ecd86709da11024de77587f2be9867f57b4fecea23318bb51.png

接下来我们检查相同的图表，但根据孩子是否是酗酒父母的子女进行分层。

../_images/6ef96c8b15e7b2edfcebd50a95c2484086a018fd05a12cb5b5203aa1bac1bbf2.png

我们将通过粗略地将孩子们分为他们的同伴群体是否报告了高酒精使用得分来结束对这个数据集的探索。

../_images/8d6b43d4e9050496125be017d069a8bb50bce97ecf6aac407c62d346be465f2c.png

这次探索性练习的整体印象是巩固复杂性的概念。虽然有许多个体的轨迹和独特的时间旅程，但如果我们想对感兴趣的现象——父母酗酒对子女饮酒行为的影响——做出一般性的陈述，我们需要做的不仅仅是面对复杂性而放弃。

随时间变化的建模#

我们从简单的无条件模型开始，该模型仅模拟个体对最终结果的贡献。换句话说，该模型的特点是它不包含任何预测因子。它用于划分结果中的变异来源，根据个体偏离总体均值的程度，将或多或少异常的行为归因于每个个体。

无条件均值模型#

id_indx, unique_ids = pd.factorize(df["id"])
coords = {"ids": unique_ids, "obs": range(len(df["alcuse"]))}
with pm.Model(coords=coords) as model:
    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 2)
    subject_intercept = pm.Normal("subject_intercept", 0, subject_intercept_sigma, dims="ids")
    global_sigma = pm.HalfStudentT("global_sigma", 1, 3)
    global_intercept = pm.Normal("global_intercept", 0, 1)
    grand_mean = pm.Deterministic("grand_mean", global_intercept + subject_intercept[id_indx])
    outcome = pm.Normal("outcome", grand_mean, global_sigma, observed=df["alcuse"], dims="obs")
    idata_m0 = pm.sample_prior_predictive()
    idata_m0.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m0.extend(pm.sample_posterior_predictive(idata_m0))

pm.model_to_graphviz(model)

Sampling: [global_intercept, global_sigma, outcome, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [subject_intercept_sigma, subject_intercept, global_sigma, global_intercept]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
Sampling: [outcome]

../_images/e92b9c131b709dd3102a2bbda8a48ec4e31709504f4c74c002a97a6fba5b5619.svg

az.summary(idata_m0, var_names=["subject_intercept_sigma", "global_intercept", "global_sigma"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
subject_intercept_sigma	0.766	0.084	0.604	0.917	0.002	0.001	2852.0	2579.0	1.0
global_intercept	0.913	0.100	0.709	1.088	0.003	0.002	1032.0	1798.0	1.0
global_sigma	0.757	0.043	0.676	0.833	0.001	0.001	3442.0	2960.0	1.0

fig, ax = plt.subplots(figsize=(20, 7))
expected_individual_mean = idata_m0.posterior["subject_intercept"].mean(axis=1).values[0]

ax.bar(
    range(len(expected_individual_mean)),
    np.sort(expected_individual_mean),
    color="slateblue",
    ec="black",
)
ax.set_xlabel("Individual ID")
ax.set_ylabel("alcohol Use")
ax.set_title("Distribution of Individual Modifications to the Grand Mean");

../_images/27bed2cffbf77245799132d03bf9a57ed1a0f3eeba63a1b4cb37097bf086e809.png

我们在这里看到研究中每个个体对总体均值隐含修正的变化。

无条件增长模型#

接下来，我们将更明确地建模个体对回归模型斜率的贡献，其中时间是关键预测因子。这个模型的结构值得停下来考虑。在不同的领域和学科中，这种层次模型有各种不同的实例。经济学、政治学、心理测量学和生态学都有自己稍微不同的词汇来命名模型的各个部分：固定效应、随机效应、内部估计量、外部估计量……等等，这个列表还在继续，讨论也因此变得复杂。这些术语是模糊的，并且使用方式各不相同。Wilett 和 Singer 提到了 Level 1 和 Level 2 子模型，但精确的术语并不重要。

这些模型中重要的是层次结构。存在一个全局现象和一个特定主题的现象实例化。该模型允许我们将全局模型与每个主题的个体贡献组合起来。这有助于模型解释主题层面的未观察到的异质性。根据模型规范，每个主题的斜率和截距可能会有所不同。它不能解决所有形式的偏差，但它确实有助于解释模型预测中的这种偏斜来源。

\[\begin{split} \begin{aligned} & alcohol \sim Normal(\color{purple}{\mu, \sigma}) \\ & \color{purple}{\mu} = \color{red}{\alpha} + \color{green}{\beta} \cdot age \\ & \color{red}{\alpha} = \sum_{j=0}^{N} \alpha_{1} + \alpha_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{green}{\beta} = \sum_{j=0}^{N} \beta_{1} + \beta_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{purple}{\sigma} = HalfStudentT(?, ?) \\ & \alpha_{i, j} \sim Normal(?, ?) \\ & \beta_{i, j} \sim Normal(?, ?) \end{aligned} \end{split}\]

拟合模型后，我们不仅了解到每个个体如何修改全局模型，还让我们学习到全局参数。特别是，我们允许对表示时间的变量系数进行特定主题的修改。对于我们在接下来的一系列模型中概述的所有层次模型，都存在一种广泛相似的组合模式。在贝叶斯设置中，我们试图学习最适合数据的参数。在PyMC中实现模型的代码如下：

id_indx, unique_ids = pd.factorize(df["id"])
coords = {"ids": unique_ids, "obs": range(len(df["alcuse"]))}
with pm.Model(coords=coords) as model:
    age_14 = pm.MutableData("age_14_data", df["age_14"].values)

    ## Level 1
    global_intercept = pm.Normal("global_intercept", 0, 1)
    global_sigma = pm.HalfStudentT("global_sigma", 1, 3)
    global_age_beta = pm.Normal("global_age_beta", 0, 1)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 5)
    subject_age_sigma = pm.HalfNormal("subject_age_sigma", 5)

    ## Level 2
    subject_intercept = pm.Normal("subject_intercept", 0, subject_intercept_sigma, dims="ids")
    subject_age_beta = pm.Normal("subject_age_beta", 0, subject_age_sigma, dims="ids")

    growth_model = pm.Deterministic(
        "growth_model",
        (global_intercept + subject_intercept[id_indx])
        + (global_age_beta + subject_age_beta[id_indx]) * age_14,
    )
    outcome = pm.Normal(
        "outcome", growth_model, global_sigma, observed=df["alcuse"].values, dims="obs"
    )
    idata_m1 = pm.sample_prior_predictive()
    idata_m1.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m1.extend(pm.sample_posterior_predictive(idata_m1))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_age_beta, global_intercept, global_sigma, outcome, subject_age_beta, subject_age_sigma, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, global_age_beta, subject_intercept_sigma, subject_age_sigma, subject_intercept, subject_age_beta]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.
Sampling: [outcome]

../_images/026f294cb58b2a4b6c7668daf03e1febc51a44fca380a8bcbd2c7a03714cf287.svg

西格玛项（方差分量）可能是理解模型中最重要的部分。全局和特定于主题的西格玛项代表了我们在模型中允许的变异来源。全局效应可以被认为是整个人群中的“固定”效应，而特定于主题的项则是从同一个人群中“随机”抽取的。

az.summary(
    idata_m1,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_age_beta",
        "subject_intercept_sigma",
        "subject_age_sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	0.644	0.103	0.441	0.830	0.002	0.002	2100.0	2584.0	1.00
global_sigma	0.612	0.045	0.526	0.696	0.001	0.001	1293.0	2791.0	1.00
global_age_beta	0.271	0.062	0.158	0.389	0.001	0.001	4585.0	3257.0	1.00
subject_intercept_sigma	0.754	0.084	0.597	0.909	0.001	0.001	3178.0	3130.0	1.00
subject_age_sigma	0.347	0.069	0.217	0.477	0.003	0.002	694.0	1058.0	1.01

我们现在可以通过使用后验分布来推导隐含模型的不确定性。我们通过在所有受试者中取平均参数估计值来绘制推导出的轨迹。下面我们将看到如何在我们关注个体成长时，改为关注受试者内的估计。这里我们的关注点是青少年酒精使用的一般隐含轨迹。

fig, ax = plt.subplots(figsize=(20, 8))
posterior = az.extract(idata_m1.posterior)
intercept_group_specific = posterior["subject_intercept"].mean(dim="ids")
slope_group_specific = posterior["subject_age_beta"].mean(dim="ids")
a = posterior["global_intercept"].mean() + intercept_group_specific
b = posterior["global_age_beta"].mean() + slope_group_specific
time_xi = xr.DataArray(np.arange(4))
ax.plot(time_xi, (a + b * time_xi).T, color="slateblue", alpha=0.2, linewidth=0.2)
ax.plot(
    time_xi, (a.mean() + b.mean() * time_xi), color="red", lw=2, label="Expected Growth Trajectory"
)
ax.set_ylabel("Alcohol Usage")
ax.set_xlabel("Time in Years from 14")
ax.legend()
ax.set_title("Individual Consumption Growth", fontsize=20);

../_images/43af8ba9f15d36a19263d73958a2e5c6cf3b89c032e2ae87430ea3e27c81578b.png

父母酗酒的失控影响#

接下来，我们将添加第二个预测变量及其与年龄的交互作用来修改结果。这扩展了上述模型，如下所示：

\[\begin{split} \begin{aligned} & alcohol \sim Normal(\color{purple}{\mu, \sigma}) \\ & \color{purple}{\mu} = \color{red}{\alpha} + \color{green}{\beta_{age}} \cdot age + \color{blue}{\beta_{coa}} \cdot coa + \color{orange}{\beta_{coa*age}}(coa*age) \\ & \color{red}{\alpha} = \sum_{j=0}^{N} \alpha_{1} + \alpha_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{green}{\beta} = \sum_{j=0}^{N} \beta_{1} + \beta_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{purple}{\sigma} = HalfStudentT(?, ?) \\ & \alpha_{i, j} \sim Normal(?, ?) \\ & \beta_{i, j} \sim Normal(?, ?) \\ & \color{blue}{\beta_{coa}} \sim Normal(?, ?) \\ & \color{orange}{\beta_{coa*age}} \sim Normal(?,?) \end{aligned} \end{split}\]

id_indx, unique_ids = pd.factorize(df["id"])
coords = {"ids": unique_ids, "obs": range(len(df["alcuse"]))}
with pm.Model(coords=coords) as model:
    age_14 = pm.MutableData("age_14_data", df["age_14"].values)
    coa = pm.MutableData("coa_data", df["coa"].values)

    ## Level 1
    global_intercept = pm.Normal("global_intercept", 0, 1)
    global_sigma = pm.HalfStudentT("global_sigma", 1, 3)
    global_age_beta = pm.Normal("global_age_beta", 0, 1)
    global_coa_beta = pm.Normal("global_coa_beta", 0, 1)
    global_coa_age_beta = pm.Normal("global_coa_age_beta", 0, 1)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 5)
    subject_age_sigma = pm.HalfNormal("subject_age_sigma", 5)

    ## Level 2
    subject_intercept = pm.Normal("subject_intercept", 0, subject_intercept_sigma, dims="ids")
    subject_age_beta = pm.Normal("subject_age_beta", 0, subject_age_sigma, dims="ids")

    growth_model = pm.Deterministic(
        "growth_model",
        (global_intercept + subject_intercept[id_indx])
        + global_coa_beta * coa
        + global_coa_age_beta * (coa * age_14)
        + (global_age_beta + subject_age_beta[id_indx]) * age_14,
    )
    outcome = pm.Normal(
        "outcome", growth_model, global_sigma, observed=df["alcuse"].values, dims="obs"
    )
    idata_m2 = pm.sample_prior_predictive()
    idata_m2.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m2.extend(pm.sample_posterior_predictive(idata_m2))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_age_beta, global_coa_age_beta, global_coa_beta, global_intercept, global_sigma, outcome, subject_age_beta, subject_age_sigma, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, global_age_beta, global_coa_beta, global_coa_age_beta, subject_intercept_sigma, subject_age_sigma, subject_intercept, subject_age_beta]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 16 seconds.
Sampling: [outcome]

../_images/47cbb4d0afc2fed7abe56cec504144923b190713872ad569d4d6e76e07c8219c.svg

az.summary(
    idata_m2,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_age_beta",
        "global_coa_age_beta",
        "subject_intercept_sigma",
        "subject_age_sigma",
    ],
    stat_focus="median",
)

	median	mad	eti_3%	eti_97%	mcse_median	ess_median	ess_tail	r_hat
global_intercept	0.320	0.087	0.088	0.565	0.004	2284.281	2552.0	1.00
global_sigma	0.613	0.030	0.537	0.705	0.002	1577.315	2186.0	1.01
global_age_beta	0.290	0.057	0.129	0.449	0.002	2820.643	2815.0	1.00
global_coa_age_beta	-0.046	0.081	-0.268	0.182	0.003	2861.193	2935.0	1.00
subject_intercept_sigma	0.663	0.054	0.524	0.814	0.002	2662.958	2366.0	1.00
subject_age_sigma	0.351	0.045	0.215	0.469	0.002	1046.679	752.0	1.01

我们可以看到，这里包含的父母酗酒的二元标志对儿童消费的增长轨迹没有显著影响，但它确实改变了截距项的可能位置。

fig, ax = plt.subplots(figsize=(20, 8))
posterior = az.extract(idata_m2.posterior)
intercept_group_specific = posterior["subject_intercept"].mean(dim="ids")
slope_group_specific = posterior["subject_age_beta"].mean(dim="ids")
a = posterior["global_intercept"].mean() + intercept_group_specific
b = posterior["global_age_beta"].mean() + slope_group_specific
b_coa = posterior["global_coa_beta"].mean()
b_coa_age = posterior["global_coa_age_beta"].mean()

time_xi = xr.DataArray(np.arange(4))
ax.plot(
    time_xi,
    (a + b * time_xi + b_coa * 1 + b_coa_age * (time_xi * 1)).T,
    color="slateblue",
    linewidth=0.2,
    alpha=0.2,
)
ax.plot(
    time_xi,
    (a + b * time_xi + b_coa * 0 + b_coa_age * (time_xi * 0)).T,
    color="magenta",
    linewidth=0.2,
    alpha=0.2,
)
ax.plot(
    time_xi,
    (a.mean() + b.mean() * time_xi + b_coa * 1 + b_coa_age * (time_xi * 1)),
    color="darkblue",
    lw=2,
    label="Expected Growth Trajectory: Child of Alcoholic",
)
ax.plot(
    time_xi,
    (a.mean() + b.mean() * time_xi + b_coa * 0 + b_coa_age * (time_xi * 0)),
    color="darkred",
    lw=2,
    label="Expected Growth Trajectory: Not Child of Alcoholic",
)
ax.set_ylabel("Alcohol Usage")
ax.set_xlabel("Time in Years from 14")
ax.legend()
ax.set_title("Individual Consumption Growth", fontsize=20);

../_images/0700df0faffc654f1b7e09f8b026b4749caa7ca885dae8cad1f190f558db8570.png

这已经暗示了分层纵向模型如何允许我们探讨政策问题和因果干预的影响。政策转变或特定干预在隐含的增长轨迹中的影响可能会导致重大的投资决策。然而，我们将这些评论作为暗示，因为关于在面板数据设计中使用因果推断的文献丰富且存在争议。差异中的差异文献充满了关于有意义地解释因果问题所需条件的警告。例如，参见差异中的差异以获取更多讨论和相关辩论的参考。一个关键点是，虽然主题级别的术语有助于解释数据中的一种异质性，但它们无法解释所有来源的个体变异，尤其是与时间相关的变异。

我们将暂时忽略因果推断的细微差别，接下来考虑同伴效应的引入如何改变我们模型所隐含的关联。

控制同伴效应的模型#

为了解释性和简化我们的采样器，我们将同龄人数据围绕其均值进行中心化。同样，这个模型自然地使用这些控制因素及其与年龄这一焦点变量的交互项来指定。

\[\begin{split} \begin{aligned} & alcohol \sim Normal(\color{purple}{\mu, \sigma}) \\ & \color{purple}{\mu} = \color{red}{\alpha} + \color{green}{\beta_{age}} \cdot age + \color{blue}{\beta_{coa}} \cdot coa + \color{orange}{\beta_{coa*age}}(coa*age) + \color{pink}{\beta_{peer}}*peer + \color{lightblue}{\beta_{peer*age}}(peer*age) \\ & \color{red}{\alpha} = \sum_{j=0}^{N} \alpha_{1} + \alpha_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{green}{\beta} = \sum_{j=0}^{N} \beta_{1} + \beta_{2, j} \ \ \ \ \forall j \in Subjects \\ & \color{purple}{\sigma} = HalfStudentT(?, ?) \\ & \alpha_{i, j} \sim Normal(?, ?) \\ & \beta_{i, j} \sim Normal(?, ?) \\ & \color{blue}{\beta_{coa}} \sim Normal(?, ?) \\ & \color{orange}{\beta_{coa*age}} \sim Normal(?,?) \\ & \color{pink}{\beta_{peer}} \sim Normal(?, ?) \\ & \color{lightblue}{\beta_{peer*age}} \sim Normal(?, ?) \end{aligned} \end{split}\]

id_indx, unique_ids = pd.factorize(df["id"])
coords = {"ids": unique_ids, "obs": range(len(df["alcuse"]))}
with pm.Model(coords=coords) as model:
    age_14 = pm.MutableData("age_14_data", df["age_14"].values)
    coa = pm.MutableData("coa_data", df["coa"].values)
    peer = pm.MutableData("peer_data", df["cpeer"].values)

    ## Level 1
    global_intercept = pm.Normal("global_intercept", 0, 1)
    global_sigma = pm.HalfStudentT("global_sigma", 1, 3)
    global_age_beta = pm.Normal("global_age_beta", 0, 1)
    global_coa_beta = pm.Normal("global_coa_beta", 0, 1)
    global_peer_beta = pm.Normal("global_peer_beta", 0, 1)
    global_coa_age_beta = pm.Normal("global_coa_age_beta", 0, 1)
    global_peer_age_beta = pm.Normal("global_peer_age_beta", 0, 1)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 5)
    subject_age_sigma = pm.HalfNormal("subject_age_sigma", 5)

    ## Level 2
    subject_intercept = pm.Normal("subject_intercept", 0, subject_intercept_sigma, dims="ids")
    subject_age_beta = pm.Normal("subject_age_beta", 0, subject_age_sigma, dims="ids")

    growth_model = pm.Deterministic(
        "growth_model",
        (global_intercept + subject_intercept[id_indx])
        + global_coa_beta * coa
        + global_coa_age_beta * (coa * age_14)
        + global_peer_beta * peer
        + global_peer_age_beta * (peer * age_14)
        + (global_age_beta + subject_age_beta[id_indx]) * age_14,
    )
    outcome = pm.Normal(
        "outcome", growth_model, global_sigma, observed=df["alcuse"].values, dims="obs"
    )
    idata_m3 = pm.sample_prior_predictive()
    idata_m3.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m3.extend(pm.sample_posterior_predictive(idata_m3))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_age_beta, global_coa_age_beta, global_coa_beta, global_intercept, global_peer_age_beta, global_peer_beta, global_sigma, outcome, subject_age_beta, subject_age_sigma, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, global_age_beta, global_coa_beta, global_peer_beta, global_coa_age_beta, global_peer_age_beta, subject_intercept_sigma, subject_age_sigma, subject_intercept, subject_age_beta]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 19 seconds.
Sampling: [outcome]

../_images/02339b0161813a1d741a543d97075e0860da13f692011803597e4f97a73554f1.svg

az.plot_trace(idata_m3);

../_images/0506fadc530bf7adfd2d4a7b2e2cfc90b18b4dee5c9b77d118b98185bb688dfb.png

az.summary(
    idata_m3,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_age_beta",
        "global_coa_age_beta",
        "global_peer_beta",
        "global_peer_age_beta",
        "subject_intercept_sigma",
        "subject_age_sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	0.395	0.112	0.177	0.601	0.002	0.002	2210.0	2599.0	1.0
global_sigma	0.595	0.042	0.522	0.676	0.001	0.001	1589.0	2884.0	1.0
global_age_beta	0.273	0.085	0.117	0.434	0.002	0.001	2382.0	2805.0	1.0
global_coa_age_beta	-0.007	0.128	-0.245	0.225	0.003	0.002	2364.0	2734.0	1.0
global_peer_beta	0.683	0.114	0.483	0.904	0.002	0.001	3072.0	2872.0	1.0
global_peer_age_beta	-0.146	0.090	-0.313	0.022	0.002	0.001	3080.0	2617.0	1.0
subject_intercept_sigma	0.500	0.077	0.356	0.647	0.002	0.002	1023.0	1231.0	1.0
subject_age_sigma	0.384	0.061	0.271	0.497	0.002	0.001	1080.0	1407.0	1.0

接下来，我们将绘制个体在父母和同伴关系条件下的典型变化轨迹。注意数据中的同伴评分如何推动多项式曲线的截距沿图表的y轴上升。

../_images/9d44697026955ec88b798fa1b065eb1bd7f144d82c5c6046a116813572e33cef.png

模型估计的比较#

最后，综合我们迄今为止的所有建模工作，我们可以从上述图表中看到一些明显的现象：(i) 年龄项的全局斜率参数在所有三个包含它的模型中都非常稳定。同样，对于显示的三个个体中的每一个的特定主题斜率参数也是如此。全局截距参数被拉向零，因为我们可以将更多的结果效应抵消到父母和同伴关系的影响中。父母酗酒的全局效应在我们拟合的模型中大致一致。

然而，需要注意的是——解释这种层次模型的结果可能会很困难，特定参数的含义可能会根据模型性质和变量测量尺度而有所偏差。在本笔记本中，我们专注于理解由不同输入数据值引起的后验预测轨迹中的边际效应所隐含的对比。这通常是理解模型对所讨论增长轨迹所学内容的最清晰方式。

az.plot_forest(
    [idata_m0, idata_m1, idata_m2, idata_m3],
    model_names=["Grand Mean", "Unconditional Growth", "COA growth Model", "COA_Peer_Model"],
    var_names=[
        "global_intercept",
        "global_age_beta",
        "global_coa_beta",
        "global_coa_age_beta",
        "global_peer_beta",
        "global_peer_age_beta",
        "subject_intercept_sigma",
        "subject_age_sigma",
        "subject_intercept",
        "subject_age_beta",
        "global_sigma",
    ],
    figsize=(20, 15),
    kind="ridgeplot",
    combined=True,
    ridgeplot_alpha=0.3,
    coords={"ids": [1, 2, 70]},
);

../_images/426ca83e3f0b1598fa3b09c57d1404808cefd7893a3ce90c4fb6bfe72aba56b6.png

对于模型的数值摘要，Willett 和 Singer 建议使用偏差统计量。在贝叶斯工作流程中，我们将使用 LOO。

compare = az.compare(
    {
        "Grand Mean": idata_m0,
        "Unconditional Growth": idata_m1,
        "COA growth Model": idata_m2,
        "COA_Peer_Model": idata_m3,
    },
)
compare

/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(

	rank	elpd_loo	p_loo	elpd_diff	weight	se	dse	warning	scale
COA_Peer_Model	0	-280.547108	88.317101	0.000000	0.919592	12.286282	0.000000	True	log
COA growth Model	1	-289.267096	89.708607	8.719988	0.000000	12.723690	4.226971	True	log
Unconditional Growth	2	-289.776375	91.650347	9.229267	0.080408	12.808780	4.784305	True	log
Grand Mean	3	-315.943445	58.611975	35.396337	0.000000	12.426100	8.431326	True	log

az.plot_compare(compare, figsize=(10, 4));

../_images/26371d1234a5069fdd76455975e10be709414dd99c2fa7389e16baaa4162a0fc.png

Willett 和 Singer 详细讨论了如何分析模型之间的差异，并评估不同的变异来源，以得出关于预测变量与结果之间关系的多个汇总统计数据。我们强烈推荐它作为对感兴趣读者的深入探讨。

插曲：使用Bambi的分层模型#

虽然我们直接在 PyMC 中拟合这些模型，但还有一个替代的贝叶斯多层次建模包 Bambi，它是基于 PyMC 框架构建的。Bambi 在多个方面进行了优化，以适应分层贝叶斯模型，包括使用公式指定模型结构的选项。我们将简要演示如何使用这种语法拟合最后一个模型。

公式规范使用 1 表示截距项，并使用条件 | 运算符表示主题级别参数与上述方式指定的相同类型的全局参数的组合。我们将添加截距项和焦点变量年龄的beta系数的主体特定修改，如上述模型所示。我们使用 Bambi 公式语法中的语法 (1 + age_14 | id) 来实现这一点。

模型定义得很好，详细说明了层次结构和主题级别的参数结构。默认情况下，Bambi 模型会分配先验并使用非中心参数化。Bambi 模型定义使用了常见和组级别效应的语言，而不是我们在此示例中一直使用的全局和主题的区别。再次强调，重要的是层次结构，而不是名称。

model

       Formula: alcuse ~ 1 + age_14 + coa + cpeer + age_14:coa + age_14:cpeer + (1 + age_14 | id)
        Family: gaussian
          Link: mu = identity
  Observations: 246
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 0.922, sigma: 5.0974)
            age_14 ~ Normal(mu: 0.0, sigma: 3.2485)
            coa ~ Normal(mu: 0.0, sigma: 5.3302)
            cpeer ~ Normal(mu: 0.0, sigma: 3.6587)
            age_14:coa ~ Normal(mu: 0.0, sigma: 3.5816)
            age_14:cpeer ~ Normal(mu: 0.0, sigma: 2.834)
        
        Group-level effects
            1|id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0974))
            age_14|id ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 3.2485))
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0609)
------
* To see a plot of the priors call the .plot_priors() method.
* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()

由于使用了偏移量来实现非中心参数化，模型图看起来也有所不同。

model.graph()

../_images/a54a8b4c7f97510284fc95396e0b8715508e244b59df6b78f5c0aa84f0113cfb.svg

az.summary(
    idata_bambi,
    var_names=[
        "Intercept",
        "age_14",
        "coa",
        "cpeer",
        "age_14:coa",
        "age_14:cpeer",
        "1|id_sigma",
        "age_14|id_sigma",
        "sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
Intercept	0.390	0.107	0.170	0.579	0.002	0.001	2855.0	3083.0	1.0
age_14	0.277	0.084	0.123	0.438	0.002	0.001	2283.0	2994.0	1.0
coa	0.581	0.162	0.268	0.873	0.003	0.002	2784.0	3038.0	1.0
cpeer	0.692	0.116	0.480	0.914	0.002	0.002	2959.0	3040.0	1.0
age_14:coa	-0.013	0.123	-0.237	0.219	0.002	0.002	2441.0	2928.0	1.0
age_14:cpeer	-0.150	0.087	-0.303	0.020	0.002	0.001	2890.0	3103.0	1.0
1\|id_sigma	0.502	0.078	0.356	0.645	0.002	0.002	1315.0	2130.0	1.0
age_14\|id_sigma	0.379	0.060	0.262	0.488	0.002	0.001	1244.0	1860.0	1.0
sigma	0.595	0.043	0.516	0.674	0.001	0.001	1124.0	2355.0	1.0

az.plot_forest(
    idata_bambi,
    var_names=[
        "Intercept",
        "age_14",
        "coa",
        "cpeer",
        "age_14:coa",
        "age_14:cpeer",
        "1|id_sigma",
        "age_14|id_sigma",
        "sigma",
    ],
    figsize=(20, 6),
    kind="ridgeplot",
    combined=True,
    ridgeplot_alpha=0.3,
);

../_images/0c7e65eaa5eb10e117e77377fde4d57a183b08b027d2bf85c001666a5c338e32.png

我们可以在这里看到，bambi模型规范如何恢复了与使用PyMC推导出的相同的参数化。在实际应用和生产环境中，如果您使用的是贝叶斯分层模型，您应该尽可能使用bambi。它在许多用例中都很灵活，您可能只需要在高度定制的模型中使用PyMC，因为在这些模型中，模型规范的灵活性无法通过公式语法的约束来满足。

非线性变化轨迹#

接下来我们将研究一个数据集，其中个体的轨迹显示出个体间行为的剧烈波动。该数据报告了每个儿童的外化行为得分。这可以衡量多种行为，包括但不限于：身体攻击、言语欺凌、关系攻击、反抗、盗窃和破坏行为。得分越高，儿童表现出的外化行为越多。该量表的下限为0，最高可能得分为68。每个儿童在学校的每个年级都会被测量这些行为。

try:
    df_external = pd.read_csv("../data/external_pp.csv")
except FileNotFoundError:
    df_external = pd.read_csv(pm.get_data("external_pp.csv"))

df_external.head()

	ID	EXTERNAL	TIME	GRADE
0	1	50	0	1
1	1	57	1	2
2	1	51	2	3
3	1	48	3	4
4	1	43	4	5

fig, axs = plt.subplots(2, 4, figsize=(20, 8))
axs = axs.flatten()
for ax, id in zip(axs, range(9)[1:9]):
    temp = df_external[df_external["ID"] == id]
    ax.plot(temp["GRADE"], temp["EXTERNAL"], "o", color="black")
    z = lowess(temp["EXTERNAL"], temp["GRADE"], 1 / 2)
    ax.plot(z[:, 1], "--", color="black")

../_images/7b786a9ab38e0af9de5f80a19c3d1a2341177e0e9d7a79b2a896e8a3741cd5a3.png

fig, axs = plt.subplots(2, 3, figsize=(20, 6))
axs = axs.flatten()
for ax, g in zip(axs, [1, 2, 3, 4, 5, 6]):
    temp = df_external[df_external["GRADE"] == g]
    ax.hist(temp["EXTERNAL"], bins=10, ec="black", color="C0")
    ax.set_title(f"External Behaviour in Grade {g}")

../_images/774f8b825dac8a40476a0f9c251ba04f0eab21842802a37628f413616ec0e414.png

我们可以看到，数据集中存在一个强烈的偏斜，趋向于低external分数。我们将在这里偏离Willett和Singer的演示，尝试将这个分布建模为一个带有删失的Gumbel分布。为此，我们将让PyMC找到一个在Gumbel分布上合理的先验。

pz.maxent(pz.Gumbel(), lower=0, upper=68, mass=0.99);

../_images/b83a2a630dd5559b365bd0240ef39e752861286d42506065c527fce7cbd1ffba.png

一个最小模型#

与之前一样，我们将从一个相当简单的模型开始，指定一个层次模型，其中每个个体修改总体均值。我们允许使用非正态的截断似然项。

id_indx, unique_ids = pd.factorize(df_external["ID"])
coords = {"ids": unique_ids, "obs": range(len(df_external["EXTERNAL"]))}
with pm.Model(coords=coords) as model:
    external = pm.MutableData("external_data", df_external["EXTERNAL"].values + 1e-25)
    global_intercept = pm.Normal("global_intercept", 6, 1)
    global_sigma = pm.HalfNormal("global_sigma", 7)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 5)
    subject_intercept = pm.Normal("subject_intercept", 0, subject_intercept_sigma, dims="ids")
    mu = pm.Deterministic("mu", global_intercept + subject_intercept[id_indx])
    outcome_latent = pm.Gumbel.dist(mu, global_sigma)
    outcome = pm.Censored(
        "outcome", outcome_latent, lower=0, upper=68, observed=external, dims="obs"
    )
    idata_m4 = pm.sample_prior_predictive()
    idata_m4.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m4.extend(pm.sample_posterior_predictive(idata_m4))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_intercept, global_sigma, outcome, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, subject_intercept_sigma, subject_intercept]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 11 seconds.
Sampling: [outcome]

../_images/266253ece3e6a4798ce35fa0a337d359c66793a8b38da653ea4f6c9680998ee8.svg

az.summary(idata_m4, var_names=["global_intercept", "global_sigma", "subject_intercept_sigma"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	7.347	0.754	5.950	8.769	0.019	0.013	1612.0	2336.0	1.0
global_sigma	6.810	0.380	6.118	7.546	0.006	0.004	4726.0	2892.0	1.0
subject_intercept_sigma	6.802	0.895	5.158	8.459	0.016	0.011	3045.0	2681.0	1.0

az.plot_ppc(idata_m4, figsize=(20, 7))

<Axes: xlabel='outcome'>

../_images/c0dd81c2934d74167b239058dc8c95a3ea4723d0a3b3c2c7efcea52d7b812e4e.png

fig, ax = plt.subplots(figsize=(20, 7))
expected_individual_mean = idata_m4.posterior["subject_intercept"].mean(axis=1).values[0]

ax.bar(
    range(len(expected_individual_mean)),
    np.sort(expected_individual_mean),
    color="slateblue",
    ec="black",
)
ax.set_xlabel("Individual ID")
ax.set_ylabel("External Score")
ax.set_title("Distribution of Individual Modifications to the Grand Mean");

../_images/b274d39569b46a433531f7a533530bd8ee8d3b77eecc81f5f63076d1167f19a1.png

随时间变化的行为#

我们现在以分层的方式对行为随时间的演变进行建模。我们从一个简单的分层线性回归开始，以年级作为焦点预测因子。

id_indx, unique_ids = pd.factorize(df_external["ID"])
coords = {"ids": unique_ids, "obs": range(len(df_external["EXTERNAL"]))}
with pm.Model(coords=coords) as model:
    grade = pm.MutableData("grade_data", df_external["GRADE"].values)
    external = pm.MutableData("external_data", df_external["EXTERNAL"].values + 1e-25)
    global_intercept = pm.Normal("global_intercept", 6, 1)
    global_sigma = pm.Normal("global_sigma", 7, 0.5)
    global_beta_grade = pm.Normal("global_beta_grade", 0, 1)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 2)
    subject_intercept = pm.Normal("subject_intercept", 5, subject_intercept_sigma, dims="ids")

    subject_beta_grade_sigma = pm.HalfNormal("subject_beta_grade_sigma", 1)
    subject_beta_grade = pm.Normal("subject_beta_grade", 0, subject_beta_grade_sigma, dims="ids")
    mu = pm.Deterministic(
        "mu",
        global_intercept
        + subject_intercept[id_indx]
        + (global_beta_grade + subject_beta_grade[id_indx]) * grade,
    )
    outcome_latent = pm.Gumbel.dist(mu, global_sigma)
    outcome = pm.Censored(
        "outcome", outcome_latent, lower=0, upper=68, observed=external, dims="obs"
    )
    idata_m5 = pm.sample_prior_predictive()
    idata_m5.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m5.extend(pm.sample_posterior_predictive(idata_m5))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_beta_grade, global_intercept, global_sigma, outcome, subject_beta_grade, subject_beta_grade_sigma, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, global_beta_grade, subject_intercept_sigma, subject_intercept, subject_beta_grade_sigma, subject_beta_grade]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 33 seconds.
There were 60 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [outcome]

../_images/1236aba3394381ff5d6f960e546c78465402df9369227bd94a145be2238beab5.svg

az.summary(
    idata_m5,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_beta_grade",
        "subject_intercept_sigma",
        "subject_beta_grade_sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	5.246	0.746	3.952	6.829	0.078	0.061	143.0	2169.0	1.09
global_sigma	6.875	0.279	6.343	7.456	0.004	0.003	3986.0	1885.0	1.39
global_beta_grade	-0.262	0.242	-0.673	0.253	0.028	0.020	99.0	1903.0	1.07
subject_intercept_sigma	5.320	0.887	3.416	6.789	0.133	0.095	55.0	976.0	1.06
subject_beta_grade_sigma	0.625	0.444	0.019	1.266	0.166	0.123	8.0	5.0	1.49

我们现在可以检查模型的后验预测图。结果似乎对数据拟合得相当合理。

az.plot_ppc(idata_m5, figsize=(20, 7));

../_images/b3441c3af594e0bfc255e0a863bd017c00d4acb10af58c953d44d4aa4c925027.png

但我们希望看到每个人的个体模型拟合情况。这里我们绘制了预期的轨迹。

../_images/c06e40a07c332b66f73879045a52af1ef23f7da84111efddf2a62dc927bee613.png

我们可以看到这里模型是如何被限制以非常线性的方式拟合行为轨迹的。第一印象是每个个体的行为模式相对稳定。但这实际上是模型无法表达行为数据曲率的结果。尽管如此，我们可以看到个体差异使得截距项在5到25的范围内变化。

多项式时间内的加法#

为了使模型更具灵活性以模拟随时间的变化，我们可以加入多项式项。

id_indx, unique_ids = pd.factorize(df_external["ID"])
coords = {"ids": unique_ids, "obs": range(len(df_external["EXTERNAL"]))}
with pm.Model(coords=coords) as model:
    grade = pm.MutableData("grade_data", df_external["GRADE"].values)
    grade2 = pm.MutableData("grade2_data", df_external["GRADE"].values ** 2)
    external = pm.MutableData("external_data", df_external["EXTERNAL"].values + 1e-25)

    global_intercept = pm.Normal("global_intercept", 6, 2)
    global_sigma = pm.Normal("global_sigma", 7, 1)
    global_beta_grade = pm.Normal("global_beta_grade", 0, 1)
    global_beta_grade2 = pm.Normal("global_beta_grade2", 0, 1)

    subject_intercept_sigma = pm.HalfNormal("subject_intercept_sigma", 1)
    subject_intercept = pm.Normal("subject_intercept", 2, subject_intercept_sigma, dims="ids")

    subject_beta_grade_sigma = pm.HalfNormal("subject_beta_grade_sigma", 1)
    subject_beta_grade = pm.Normal("subject_beta_grade", 0, subject_beta_grade_sigma, dims="ids")

    subject_beta_grade2_sigma = pm.HalfNormal("subject_beta_grade2_sigma", 1)
    subject_beta_grade2 = pm.Normal("subject_beta_grade2", 0, subject_beta_grade2_sigma, dims="ids")

    mu = pm.Deterministic(
        "mu",
        global_intercept
        + subject_intercept[id_indx]
        + (global_beta_grade + subject_beta_grade[id_indx]) * grade
        + (global_beta_grade2 + subject_beta_grade2[id_indx]) * grade2,
    )
    outcome_latent = pm.Gumbel.dist(mu, global_sigma)
    outcome = pm.Censored(
        "outcome", outcome_latent, lower=0, upper=68, observed=external, dims="obs"
    )
    idata_m6 = pm.sample_prior_predictive()
    idata_m6.extend(
        pm.sample(random_seed=100, target_accept=0.95, idata_kwargs={"log_likelihood": True})
    )
    idata_m6.extend(pm.sample_posterior_predictive(idata_m6))

pm.model_to_graphviz(model)

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_beta_grade, global_beta_grade2, global_intercept, global_sigma, outcome, subject_beta_grade, subject_beta_grade2, subject_beta_grade2_sigma, subject_beta_grade_sigma, subject_intercept, subject_intercept_sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [global_intercept, global_sigma, global_beta_grade, global_beta_grade2, subject_intercept_sigma, subject_intercept, subject_beta_grade_sigma, subject_beta_grade, subject_beta_grade2_sigma, subject_beta_grade2]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 79 seconds.
There were 849 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [outcome]

../_images/c568131c2809c162f9080648c86a1d0c4a1acf13dec3b2e3a326023b1afe8a14.svg

az.summary(
    idata_m6,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_beta_grade",
        "global_beta_grade2",
        "subject_intercept_sigma",
        "subject_beta_grade_sigma",
        "subject_beta_grade2_sigma",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	6.160	1.136	4.591	8.464	0.306	0.221	16.0	1359.0	1.17
global_sigma	6.978	0.333	6.329	7.650	0.013	0.010	809.0	1315.0	1.53
global_beta_grade	-0.117	0.591	-1.328	0.958	0.040	0.065	590.0	1156.0	1.53
global_beta_grade2	0.071	0.093	-0.118	0.246	0.006	0.004	445.0	1403.0	1.53
subject_intercept_sigma	1.174	1.173	0.010	3.273	0.424	0.312	6.0	4.0	1.79
subject_beta_grade_sigma	1.337	0.293	0.745	1.731	0.083	0.063	13.0	393.0	1.22
subject_beta_grade2_sigma	0.081	0.076	0.001	0.212	0.027	0.020	6.0	4.0	1.67

az.plot_ppc(idata_m6, figsize=(20, 7));

../_images/8e736ef5a0d3fd1a5753c919c77181619f3dbe3a5a0300d6faac2f804660daa8.png

../_images/de90840743bd65bbc1fb42d0e035e795a97da0bd38e84a3afde09646304d0be8.png

赋予模型更多的灵活性，使其能够描述更细致的增长轨迹。

按性别比较轨迹#

我们现在将允许模型更大的灵活性，并引入主体的性别来分析青少年的性别是否以及在多大程度上影响他们的行为变化。

id_indx, unique_ids = pd.factorize(df_external["ID"])
coords = {"ids": unique_ids, "obs": range(len(df_external["EXTERNAL"]))}
with pm.Model(coords=coords) as model:
    grade = pm.MutableData("grade_data", df_external["GRADE"].values)
    grade2 = pm.MutableData("grade2_data", df_external["GRADE"].values ** 2)
    grade3 = pm.MutableData("grade3_data", df_external["GRADE"].values ** 3)
    external = pm.MutableData("external_data", df_external["EXTERNAL"].values + 1e-25)
    female = pm.MutableData("female_data", df_external["FEMALE"].values)

    global_intercept = pm.Normal("global_intercept", 6, 2)
    global_sigma = pm.Normal("global_sigma", 7, 1)
    global_beta_grade = pm.Normal("global_beta_grade", 0, 1)
    global_beta_grade2 = pm.Normal("global_beta_grade2", 0, 1)
    global_beta_grade3 = pm.Normal("global_beta_grade3", 0, 1)
    global_beta_female = pm.Normal("global_beta_female", 0, 1)
    global_beta_female_grade = pm.Normal("global_beta_female_grade", 0, 1)
    global_beta_female_grade2 = pm.Normal("global_beta_female_grade2", 0, 1)
    global_beta_female_grade3 = pm.Normal("global_beta_female_grade3", 0, 1)

    subject_intercept_sigma = pm.Exponential("subject_intercept_sigma", 1)
    subject_intercept = pm.Normal("subject_intercept", 3, subject_intercept_sigma, dims="ids")

    subject_beta_grade_sigma = pm.Exponential("subject_beta_grade_sigma", 1)
    subject_beta_grade = pm.Normal("subject_beta_grade", 0, subject_beta_grade_sigma, dims="ids")

    subject_beta_grade2_sigma = pm.Exponential("subject_beta_grade2_sigma", 1)
    subject_beta_grade2 = pm.Normal("subject_beta_grade2", 0, subject_beta_grade2_sigma, dims="ids")

    subject_beta_grade3_sigma = pm.Exponential("subject_beta_grade3_sigma", 1)
    subject_beta_grade3 = pm.Normal("subject_beta_grade3", 0, subject_beta_grade3_sigma, dims="ids")

    mu = pm.Deterministic(
        "growth_model",
        global_intercept
        + subject_intercept[id_indx]
        + global_beta_female * female
        + global_beta_female_grade * (female * grade)
        + global_beta_female_grade2 * (female * grade2)
        + global_beta_female_grade3 * (female * grade3)
        + (global_beta_grade + subject_beta_grade[id_indx]) * grade
        + (global_beta_grade2 + subject_beta_grade2[id_indx]) * grade2
        + (global_beta_grade3 + subject_beta_grade3[id_indx]) * grade3,
    )

    outcome_latent = pm.Gumbel.dist(mu, global_sigma)
    outcome = pm.Censored(
        "outcome", outcome_latent, lower=0, upper=68, observed=external, dims="obs"
    )

    idata_m7 = pm.sample_prior_predictive()
    idata_m7.extend(
        pm.sample(
            draws=5000,
            random_seed=100,
            target_accept=0.99,
            nuts_sampler="numpyro",
            idata_kwargs={"log_likelihood": True},
        )
    )
    idata_m7.extend(pm.sample_posterior_predictive(idata_m7))

Show code cell output Hide code cell output

/home/osvaldo/proyectos/00_BM/pymc-devs/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
  warnings.warn(
Sampling: [global_beta_female, global_beta_female_grade, global_beta_female_grade2, global_beta_female_grade3, global_beta_grade, global_beta_grade2, global_beta_grade3, global_intercept, global_sigma, outcome, subject_beta_grade, subject_beta_grade2, subject_beta_grade2_sigma, subject_beta_grade3, subject_beta_grade3_sigma, subject_beta_grade_sigma, subject_intercept, subject_intercept_sigma]

2024-08-17 14:12:54.209555: E external/xla/xla/service/slow_operation_alarm.cc:65] Constant folding an instruction is taking > 1s:

  %reduce.1 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.80, f64[] %constant.25), dimensions={2}, to_apply=%region_2.83, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmp42m6glc9" source_line=43}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-08-17 14:12:55.884724: E external/xla/xla/service/slow_operation_alarm.cc:133] The operation took 2.675309039s
Constant folding an instruction is taking > 1s:

  %reduce.1 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.80, f64[] %constant.25), dimensions={2}, to_apply=%region_2.83, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmp42m6glc9" source_line=43}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-08-17 14:12:57.889074: E external/xla/xla/service/slow_operation_alarm.cc:65] Constant folding an instruction is taking > 2s:

  %reduce.2 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.81, f64[] %constant.25), dimensions={2}, to_apply=%region_2.83, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmp42m6glc9" source_line=43}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-08-17 14:12:59.534352: E external/xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.645368623s
Constant folding an instruction is taking > 2s:

  %reduce.2 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.81, f64[] %constant.25), dimensions={2}, to_apply=%region_2.83, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmp42m6glc9" source_line=43}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-08-17 14:13:33.840528: E external/xla/xla/service/slow_operation_alarm.cc:65] Constant folding an instruction is taking > 4s:

  %reduce.7 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.65, f64[] %constant.28), dimensions={2}, to_apply=%region_0.68, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmppjx6pjej" source_line=37}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-08-17 14:13:33.945988: E external/xla/xla/service/slow_operation_alarm.cc:133] The operation took 4.109097056s
Constant folding an instruction is taking > 4s:

  %reduce.7 = f64[4,5000,254]{2,1,0} reduce(f64[4,5000,1,254]{3,2,1,0} %broadcast.65, f64[] %constant.28), dimensions={2}, to_apply=%region_0.68, metadata={op_name="jit(process_fn)/jit(main)/reduce_prod[axes=(2,)]" source_file="/tmp/tmppjx6pjej" source_line=37}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [outcome]

pm.model_to_graphviz(model)

../_images/800eabc85d015a35598969eef66846e9a8bb29d85e1b787afd680ee1f5935903.svg

az.plot_trace(idata_m7);

../_images/e39584e608253ef660ec5aa413f0074dcd4463bb7f83ee2eb3b63e0cb51c9ef6.png

az.summary(
    idata_m7,
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_beta_grade",
        "global_beta_grade2",
        "subject_intercept_sigma",
        "subject_beta_grade_sigma",
        "subject_beta_grade2_sigma",
        "subject_beta_grade3_sigma",
        "global_beta_female",
        "global_beta_female_grade",
        "global_beta_female_grade2",
        "global_beta_female_grade3",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
global_intercept	6.557	1.297	4.027	8.916	0.016	0.011	6847.0	10276.0	1.00
global_sigma	6.633	0.408	5.871	7.393	0.010	0.007	1777.0	8061.0	1.00
global_beta_grade	0.009	0.878	-1.577	1.735	0.011	0.008	6155.0	9098.0	1.00
global_beta_grade2	-0.139	0.390	-0.840	0.621	0.006	0.004	4503.0	7911.0	1.00
subject_intercept_sigma	5.786	1.239	3.392	8.144	0.054	0.038	573.0	550.0	1.01
subject_beta_grade_sigma	0.552	0.379	0.003	1.218	0.047	0.033	50.0	166.0	1.07
subject_beta_grade2_sigma	0.098	0.072	0.001	0.220	0.009	0.006	60.0	171.0	1.07
subject_beta_grade3_sigma	0.019	0.013	0.000	0.042	0.001	0.001	131.0	111.0	1.05
global_beta_female	-0.212	0.940	-1.968	1.573	0.009	0.007	9904.0	10717.0	1.00
global_beta_female_grade	-0.132	0.898	-1.804	1.565	0.010	0.007	7308.0	10362.0	1.00
global_beta_female_grade2	0.020	0.495	-0.903	0.958	0.007	0.005	4831.0	7979.0	1.00
global_beta_female_grade3	-0.008	0.071	-0.141	0.124	0.001	0.001	4882.0	8440.0	1.00

az.plot_ppc(idata_m7, figsize=(20, 7));

../_images/6c5061d9c6b954dd364e1c1c3ddbcaa8218129fc57e1607f34aa7afa179a1119.png

比较模型#

如上所述，我们将使用LOO比较各种模型的参数拟合和模型性能指标。

az.plot_forest(
    [idata_m4, idata_m5, idata_m6, idata_m7],
    model_names=["Minimal Model", "Linear Model", "Polynomial Model", "Polynomial + Gender"],
    var_names=[
        "global_intercept",
        "global_sigma",
        "global_beta_grade",
        "global_beta_grade2",
        "subject_intercept_sigma",
        "subject_beta_grade_sigma",
        "subject_beta_grade2_sigma",
        "subject_beta_grade3_sigma",
        "global_beta_female",
        "global_beta_female_grade",
        "global_beta_female_grade2",
        "global_beta_female_grade3",
    ],
    figsize=(20, 15),
    kind="ridgeplot",
    combined=True,
    coords={"ids": [1, 2, 30]},
    ridgeplot_alpha=0.3,
);

../_images/ba462da37a72e2f9419163edd315546ac4edf4d5e26a2723645c892e9a58b63b.png

compare = az.compare(
    {
        "Minimal Model": idata_m4,
        "Linear Model": idata_m5,
        "Polynomial Model": idata_m6,
        "Polynomial + Gender": idata_m7,
    },
)
compare

/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/osvaldo/proyectos/00_BM/arviz-devs/arviz/arviz/stats/stats.py:795: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(

	rank	elpd_loo	p_loo	elpd_diff	weight	se	dse	warning	scale
Linear Model	0	-913.492285	32.270139	0.000000	8.146420e-01	13.556027	0.000000	True	log
Polynomial + Gender	1	-915.895242	49.043791	2.402957	8.394507e-16	13.449388	2.597256	True	log
Minimal Model	2	-917.384833	33.632784	3.892548	7.475052e-16	13.823280	2.375968	True	log
Polynomial Model	3	-921.859743	27.035846	8.367458	1.853580e-01	14.762473	5.305436	True	log

az.plot_compare(compare, figsize=(10, 4));

../_images/ae81801edef14035bf511cbc2c1f20cacddc23614205a675968e40a39628b65a.png

正如预期的那样，根据WAIC排名，我们最终的基于性别的模型被认为是最好的。但有点出人意料的是，具有固定轨迹的线性模型紧随其后。

绘制最终模型#

我们希望展示模型对多个个体的拟合情况以及在整个群体中隐含的总体趋势。

Show code cell source Hide code cell source

def plot_individual(posterior, individual, female, ax):
    posterior = posterior.sel(ids=individual)
    time_xi = xr.DataArray(np.arange(7))
    i = posterior["global_intercept"].mean() + posterior["subject_intercept"]
    a = (posterior["global_beta_grade"].mean() + posterior["subject_beta_grade"]) * time_xi
    b = (posterior["global_beta_grade2"].mean() + posterior["subject_beta_grade2"]) * time_xi**2
    c = (posterior["global_beta_grade3"].mean() + posterior["subject_beta_grade3"]) * time_xi**3
    d = posterior["global_beta_female"].mean() * female + posterior["global_beta_female_grade"] * (
        time_xi * female
    )
    fit = i + a + b + c + d
    if female:
        color = "cyan"
    else:
        color = "slateblue"
    for i in range(len(fit)):
        ax.plot(time_xi, fit[i], color=color, alpha=0.1, linewidth=0.2)
    ax.plot(time_xi, fit.mean(axis=0), color="magenta")


mosaic = """BCDE
            AAAA
            FGHI"""
fig, axs = plt.subplot_mosaic(mosaic=mosaic, figsize=(20, 15))
axs = [axs[k] for k in axs.keys()]
posterior = az.extract(idata_m7.posterior, num_samples=4000)
intercept_group_specific = posterior["subject_intercept"].mean("ids")
slope_group_specific = posterior["subject_beta_grade"].mean("ids")
slope_group_specific_2 = posterior["subject_beta_grade2"].mean("ids")
slope_group_specific_3 = posterior["subject_beta_grade3"].mean("ids")
a = posterior["global_intercept"].mean() + intercept_group_specific
b = posterior["global_beta_grade"].mean() + slope_group_specific
c = posterior["global_beta_grade2"].mean() + slope_group_specific_2
d = posterior["global_beta_grade3"].mean() + slope_group_specific_3
e = posterior["global_beta_female"].mean()
f = posterior["global_beta_female_grade"].mean()

time_xi = xr.DataArray(np.arange(7))

axs[4].plot(
    time_xi,
    (a + b * time_xi + c * (time_xi**2) + d * (time_xi**3) + e * 1 + f * (1 * time_xi)).T,
    color="cyan",
    linewidth=2,
    alpha=0.1,
)
axs[4].plot(
    time_xi,
    (a + b * time_xi + c * (time_xi**2) + d * (time_xi**3) + e * 0 + f * (0 * time_xi)).T,
    color="slateblue",
    alpha=0.1,
    linewidth=2,
)
axs[4].plot(
    time_xi,
    (
        a.mean()
        + b.mean() * time_xi
        + c.mean() * (time_xi**2)
        + d.mean() * (time_xi**3)
        + e * 0
        + f * (0 * time_xi)
    ),
    color="red",
    lw=2,
    label="Expected Growth Trajectory - Male",
)

axs[4].plot(
    time_xi,
    (
        a.mean()
        + b.mean() * time_xi
        + c.mean() * (time_xi**2)
        + d.mean() * (time_xi**3)
        + e * 1
        + f * (1 * time_xi)
    ),
    color="darkblue",
    lw=2,
    label="Expected Growth Trajectory - Female",
)

for indx, id in zip([0, 1, 2, 3, 5, 6, 7, 8], [2, 8, 10, 30, 34, 40, 9, 11]):
    female = df_external[df_external["ID"] == id]["FEMALE"].unique()[0] == 1
    plot_individual(posterior, id, female, axs[indx])
    axs[indx].plot(
        df_external[df_external["ID"] == id]["GRADE"],
        df_external[df_external["ID"] == id]["EXTERNAL"],
        "o",
        color="black",
        label="Observed",
    )
    axs[indx].set_title(f"Within the Individual {id} Uncertainty")
    axs[indx].legend()


axs[4].set_ylabel("Externalising Score")
axs[4].set_xlabel("Time in Grade")
axs[4].legend()
axs[4].set_title("Between Individual Trajectories \n By Gender", fontsize=20);

../_images/3a53862a735439d70c06d3269f901a2b9f80c60dd6fdaae0273ab43f5a41544c.png

这个最终模型的含义表明，男性和女性在可能的成长轨迹上存在非常微小的差异，而且外化行为随时间变化的水平变化也非常小，但在6年级时趋于上升。请注意，对于个体40，模型拟合无法捕捉到观察数据中的剧烈波动。这是由于收缩效应，它将我们的截距项强烈地拉向全局平均值。这是一个很好的提醒，模型是为了泛化，而不是个体轨迹的完美表示。

结论#

我们现在已经了解了如何将贝叶斯分层模型应用于研究并探讨关于随时间变化的问题。我们看到了贝叶斯工作流的灵活性如何能够结合不同的先验和模型规范组合，以捕捉数据生成过程的细微差别。至关重要的是，我们看到了如何在个体内部视角和个体间模型的含义之间进行转换，同时评估模型对数据的忠实度。在面板数据背景下评估因果推断问题存在一些细微的问题，但也很明显，纵向建模使我们能够以系统化和有原则的方式梳理个体差异的复杂性。

这些是用于捕捉和评估变化模式以在队列内和队列间进行比较的强大模型。在没有随机对照试验或其他设计实验和准实验模式的情况下，这种类型的面板数据分析是你能从观察数据中得出因果结论（至少部分地考虑到个体治疗结果的异质性）的最接近的方法。

作者#

由 Nathaniel Forde 于2023年4月撰写
更新为使用 pz.maxent 而不是 pm.find_constrained_prior 和 LOO 而不是 WAIC。Osvaldo Martin。2024年8月

参考资料#

[1]

朱迪思·辛格 .D 和约翰·威利特 .B. 应用纵向数据分析。牛津大学出版社，2003年。

[2]

A. 所罗门·库尔兹. 应用纵向数据分析在brms和tidyverse中. Bookdown, 2021. URL: https://bookdown.org/content/4253/.

水印#

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor

Last updated: Sat Aug 17 2024

Python implementation: CPython
Python version       : 3.11.5
IPython version      : 8.16.1

pytensor: 2.25.2

xarray     : 2023.10.1
matplotlib : 3.8.4
statsmodels: 0.14.2
arviz      : 0.19.0.dev0
pymc       : 5.16.2+20.g747fda319
numpy      : 1.24.4
bambi      : 0.14.0
pandas     : 2.1.2
preliz     : 0.9.0

Watermark: 2.4.3

许可证声明#

本示例库中的所有笔记本均在MIT许可证下提供，该许可证允许修改和重新分发，前提是保留版权和许可证声明。

引用 PyMC 示例#

要引用此笔记本，请使用Zenodo为pymc-examples仓库提供的DOI。

重要

许多笔记本是从其他来源改编的：博客、书籍……在这种情况下，您应该引用原始来源。

同时记得引用代码中使用的相关库。

这是一个BibTeX的引用模板：

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}

渲染后可能看起来像：

分类

标签