Mediation analysis with DoWhy: Direct and Indirect Effects

[1]:

import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.datasets

# Warnings and logging
import warnings
warnings.filterwarnings('ignore')

Creating a dataset

[2]:

# Creating a dataset with a single confounder and a single mediator (num_frontdoor_variables)
data = dowhy.datasets.linear_dataset(10, num_common_causes=1, num_samples=10000,
                                     num_instruments=0, num_effect_modifiers=0,
                                     num_treatments=1,
                                     num_frontdoor_variables=1,
                                     treatment_is_binary=False,
                                     outcome_is_binary=False)
df = data['df']
print(df.head())

         FD0        W0         v0           y
0  -5.161805 -0.238899  -2.774685  -24.838799
1  -5.297022 -0.639080  -2.613453  -27.150863
2 -21.917294 -2.305822 -10.291745 -110.928846
3  -5.663202 -0.588138  -2.992840  -28.633400
4  -4.601948 -0.892113  -2.831342  -25.002046

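Step 1: Modeling the causal mechanism

We create a dataset that follows a causal graph based on the frontdoor criterion. That is, the treatment has no direct effect on the outcome; all of its effect is mediated through the frontdoor variable FD0.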

[3]:

model = CausalModel(df,
                    data["treatment_name"], data["outcome_name"],
                    data["gml_graph"],
                    missing_nodes_as_confounders=True)

model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

../_images/example_notebooks_dowhy_mediation_analysis_5_0.png

../_images/example_notebooks_dowhy_mediation_analysis_5_1.png

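Step 2: Identifying the natural direct and indirect effects

We use the estimand_type argument to specify whether the target estimand should be the natural direct effect or the natural indirect effect. For definitions, see "Interpretation and Identification of Causal Mediation" by Judea Pearl.

Natural direct effect: the effect due to the path v0->y.

Natural indirect effect: the effect due to the path v0->FD0->y (mediated by FD0).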
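
For reference, the two quantities can be written in counterfactual notation. The block below is a sketch of the standard definitions for a change in treatment from 0 to 1; the symbols T (treatment, v0 here), M (mediator, FD0 here) and Y(t, m) (the outcome under treatment t and mediator value m) are notation assumed for this illustration and do not appear in the DoWhy output.

```latex
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\big[\,Y(1, M(0)) - Y(0, M(0))\,\big] \\
\mathrm{NIE} &= \mathbb{E}\big[\,Y(0, M(1)) - Y(0, M(0))\,\big]
\end{aligned}
```

In the natural direct effect the mediator is held at the value it would take without treatment, so only the path v0->y is active. In the natural indirect effect the treatment itself is held fixed and only the mediator responds, so only the path v0->FD0->y is active.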

[4]:

# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde",
                                                proceed_when_unidentifiable=True)
print(identified_estimand_nde)

Estimand type: EstimandType.NONPARAMETRIC_NDE

### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         ⎤
E⎢─────(y|FD0)⎥
 ⎣d[v₀]       ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)

[5]:

# Natural indirect effect (nie)
identified_estimand_nie = model.identify_effect(estimand_type="nonparametric-nie",
                                                proceed_when_unidentifiable=True)
print(identified_estimand_nie)

Estimand type: EstimandType.NONPARAMETRIC_NIE

### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         d         ⎤
E⎢──────(y)⋅─────([FD₀])⎥
 ⎣d[FD₀]    d[v₀]       ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)

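Step 3: Estimation of the effect

Currently only two-stage linear regression is supported for estimation. We plan to add a non-parametric Monte Carlo method soon, as described in Imai, Keele and Yamamoto (2010).

Natural Indirect Effect

The estimator converts the mediation effect estimation into a series of backdoor effect estimations.

1. The first-stage model estimates the effect from the treatment (v0) to the mediator (FD0).
2. The second-stage model estimates the effect from the mediator (FD0) to the outcome (y).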
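
The two stages can be illustrated without DoWhy. The sketch below is not DoWhy's implementation: it simulates a small linear system with NumPy and multiplies two ordinary least squares coefficients. The coefficients `a_true` and `b_true`, the data-generating equations, and the helper `ols_slopes` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical linear structural equations (illustrative values, not DoWhy's).
a_true, b_true = 2.0, 3.0                  # v0 -> FD0 and FD0 -> y
w0 = rng.normal(size=n)                    # observed common cause of v0 and y
v0 = 0.5 * w0 + rng.normal(size=n)         # treatment
fd0 = a_true * v0 + rng.normal(size=n)     # mediator
y = b_true * fd0 + 1.5 * w0 + rng.normal(size=n)  # outcome, no direct v0 -> y edge


def ols_slopes(target, *regressors):
    """Fit OLS with an intercept and return the slope coefficients only."""
    design = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


# Stage 1: effect of the treatment on the mediator, adjusting for W0.
first_stage = ols_slopes(fd0, v0, w0)[0]
# Stage 2: effect of the mediator on the outcome, adjusting for W0.
second_stage = ols_slopes(y, fd0, w0)[0]

# The indirect effect is estimated as the product of the two coefficients.
nie_estimate = first_stage * second_stage
print(nie_estimate, a_true * b_true)
```

Under these assumed equations the product should land close to `a_true * b_true`, which mirrors the `(b: FD0~v0+W0)*(b: y~FD0+W0)` form that DoWhy reports as the realized estimand.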

[6]:

import dowhy.causal_estimators.linear_regression_estimator
causal_estimate_nie = model.estimate_effect(
    identified_estimand_nie,
    method_name="mediation.two_stage_regression",
    confidence_intervals=False,
    test_significance=False,
    method_params={
        'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
        'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
    }
)
print(causal_estimate_nie)

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_NIE

### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         d         ⎤
E⎢──────(y)⋅─────([FD₀])⎥
 ⎣d[FD₀]    d[v₀]       ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
(b: FD0~v0+W0)*(b: y~FD0+W0)
Target units: ate

## Estimate
Mean value: 8.854778256842833

Note that the value equals the true value of the natural indirect effect (up to random noise).

[7]:

print(causal_estimate_nie.value, data["ate"])

8.854778256842833 8.961633713652406
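
To put the gap between the two printed numbers in perspective, the short sketch below computes the absolute and relative difference. The two values are copied from the output above; the variable names are introduced here for illustration only.

```python
# Values copied from the printed output above.
estimate = 8.854778256842833     # causal_estimate_nie.value
true_value = 8.961633713652406   # data["ate"]

abs_gap = true_value - estimate
rel_gap = abs_gap / true_value
print(f"absolute gap: {abs_gap:.4f}, relative gap: {rel_gap:.2%}")
```

For this run the estimate sits roughly 1.2% below the true value.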

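The parameter is called ate because, in the simulated dataset, the direct effect is set to zero.

Natural Direct Effect

Now let us check whether the direct effect estimator returns the (correct) estimate of zero.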

[8]:

causal_estimate_nde = model.estimate_effect(
    identified_estimand_nde,
    method_name="mediation.two_stage_regression",
    confidence_intervals=False,
    test_significance=False,
    method_params={
        'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
        'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
    }
)
print(causal_estimate_nde)

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_NDE

### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         ⎤
E⎢─────(y|FD0)⎥
 ⎣d[v₀]       ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
(b: y~v0+W0) - ((b: FD0~v0+W0)*(b: y~FD0+W0))
Target units: ate

## Estimate
Mean value: -3.29335893400895e-05
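
The realized estimand above has the form "total effect minus indirect effect". The NumPy sketch below reproduces that subtraction on simulated data. It is not DoWhy's code: the structural equations, the coefficients `a_true`, `b_true` and `c_true`, and the helper `slope` are hypothetical. It assumes a linear model in which the total effect of v0 on y is `c_true + a_true * b_true`, with the direct coefficient `c_true` set to zero as in this notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical coefficients; c_true = 0 means there is no direct v0 -> y effect.
a_true, b_true, c_true = 2.0, 3.0, 0.0
w0 = rng.normal(size=n)                        # observed common cause
v0 = 0.5 * w0 + rng.normal(size=n)             # treatment
fd0 = a_true * v0 + rng.normal(size=n)         # mediator
y = c_true * v0 + b_true * fd0 + 1.5 * w0 + rng.normal(size=n)  # outcome


def slope(target, *regressors):
    """OLS with an intercept; return the coefficient on the first regressor."""
    design = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1]


total = slope(y, v0, w0)                              # b: y ~ v0 + W0
indirect = slope(fd0, v0, w0) * slope(y, fd0, w0)     # product of the two stages
nde_estimate = total - indirect                       # direct = total - indirect
print(total, indirect, nde_estimate)
```

With `c_true = 0` the total and indirect estimates should nearly coincide, so the difference stays close to zero, in line with the DoWhy estimate printed above.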

Step 4: Refutations

TODO