Functional API Preview#
This notebook is part of a set of notebooks that provides a preview of the proposed functional API for DoWhy. For details on the new API, check out py-why/dowhy. This is a work in progress that we will update as we add new functionality. We welcome your feedback through Discord or on the Discussions page. The functional API is designed with backwards compatibility in mind, so the old and new APIs will continue to coexist and work in upcoming releases. The old API based on CausalModel will be gradually deprecated in favor of the new API.
The current Functional API covers:

* Identify Effect:
    * identify_effect(...): Run the identify-effect algorithm using defaults; just provide the graph, treatment, and outcome.
    * identify_effect_auto(...): A more configurable version of identify_effect(...).
    * identify_effect_id(...): Identify the effect using the ID algorithm.
* Refute Estimate:
    * refute_estimate: Run a set of the refuters below with their default parameters.
    * refute_bootstrap: Refute an estimate by running it on a random sample of the data containing measurement error in the confounders.
    * refute_data_subset: Refute an estimate by rerunning it on a random subset of the original data.
    * refute_random_common_cause: Refute an estimate by introducing a randomly generated confounder (that may have been unobserved).
    * refute_placebo_treatment: Refute an estimate by replacing the treatment with a randomly generated placebo variable.
    * sensitivity_simulation: Add an unobserved confounder for refutation (simulation of an unobserved confounder).
    * sensitivity_linear_partial_r2: Add an unobserved confounder for refutation (linear partial R2: sensitivity analysis for linear models).
    * sensitivity_non_parametric_partial_r2: Add an unobserved confounder for refutation (non-parametric partial R2: sensitivity analysis for non-parametric models).
    * sensitivity_e_value: Compute the E-value for the point estimate and confidence limits; benchmark E-values against measured confounders using Observed Covariate E-values; plot E-values and Observed Covariate E-values.
    * refute_dummy_outcome: Refute an estimate by replacing the outcome with a randomly generated dummy outcome variable.
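To make one of these refuters concrete: sensitivity_e_value is based on VanderWeele and Ding's E-value, which for a risk ratio RR ≥ 1 has the closed form RR + sqrt(RR · (RR − 1)). A minimal sketch of that formula (independent of DoWhy's implementation, which also handles confidence limits and other effect scales):

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio: the minimum strength of association an
    unobserved confounder would need with both treatment and outcome to
    explain away the observed effect."""
    if rr < 1:
        rr = 1 / rr  # a protective effect is inverted first
    return rr + math.sqrt(rr * (rr - 1))

# A risk ratio of 2 needs a confounder with RR >= 3.41 on both arms
print(round(e_value(2.0), 2))  # -> 3.41
```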
Import Dependencies#
[1]:
# Config dict to set the logging level
import logging.config
from dowhy import CausalModel # We still need this as we haven't created the functional API for effect estimation
from dowhy.causal_estimators.econml import Econml
from dowhy.causal_estimators.propensity_score_matching_estimator import PropensityScoreMatchingEstimator
from dowhy.causal_graph import CausalGraph
from dowhy.graph import build_graph
# Functional API imports
from dowhy.causal_identifier import (
    BackdoorAdjustment,
    EstimandType,
    identify_effect,
    identify_effect_auto,
    identify_effect_id,
)
from dowhy.causal_refuters import (
    refute_bootstrap,
    refute_data_subset,
    refute_estimate,
)
from dowhy.datasets import linear_dataset
DEFAULT_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "": {
            "level": "WARN",
        },
    },
}
# set random seed for deterministic dataset generation
# and avoid problems when running tests
import numpy as np
np.random.seed(1)
logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action="ignore", category=DataConversionWarning)
Create the Dataset#
[2]:
# Parameters for creating the Dataset
TREATMENT_IS_BINARY = True
BETA = 10
NUM_SAMPLES = 500
NUM_CONFOUNDERS = 3
NUM_INSTRUMENTS = 2
NUM_EFFECT_MODIFIERS = 2
# Creating a Linear Dataset with the given parameters
data = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)
data_2 = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)
treatment_name = data["treatment_name"]
print(treatment_name)
outcome_name = data["outcome_name"]
print(outcome_name)
graph = build_graph(
    action_nodes=treatment_name,
    outcome_nodes=outcome_name,
    effect_modifier_nodes=data["effect_modifier_names"],
    common_cause_nodes=data["common_causes_names"],
)
observed_nodes = data["df"].columns.tolist()
['v0']
y
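For intuition, the kind of structural model that linear_dataset samples from can be sketched in plain NumPy. The variable names W, Z, X, v0, and y mirror the dataset's columns; the coefficients here are illustrative and not DoWhy's actual values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 500, 10.0

W = rng.normal(size=(n, 3))            # common causes (confounders)
Z = rng.binomial(1, 0.5, size=(n, 2))  # instruments: affect treatment only
X = rng.normal(size=(n, 2))            # effect modifiers

# Binary treatment depends on confounders and instruments
t_score = W @ np.array([1.0, 0.5, -0.5]) + Z @ np.array([2.0, 1.0])
v0 = (t_score + rng.normal(size=n) > 0).astype(float)

# Outcome: the treatment effect beta is modified by X; confounders enter directly
y = v0 * (beta + X @ np.array([1.0, 0.5])) + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)
```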
Identify Effect - Functional API (Preview)#
[3]:
# Default identify_effect call example:
identified_estimand = identify_effect(graph, treatment_name, outcome_name, observed_nodes)
# identify_effect_auto example with extra parameters:
identified_estimand_auto = identify_effect_auto(
    graph,
    treatment_name,
    outcome_name,
    observed_nodes,
    estimand_type=EstimandType.NONPARAMETRIC_ATE,
    backdoor_adjustment=BackdoorAdjustment.BACKDOOR_EFFICIENT,
)
# identify_effect_id example:
identified_estimand_id = identify_effect_id(
    graph, treatment_name, outcome_name
)  # Note that the return type for identify_effect_id is IDExpression, not IdentifiedEstimand
print(identified_estimand)
print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
### Estimand : 2
Estimand name: iv
No such variable(s) found!
### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
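The backdoor estimand above says that conditioning on {W0, W1, W2} suffices. Under a linear model, that conditioning reduces to including the confounders as regressors. A minimal sketch with synthetic data (this is not DoWhy's estimator, just the adjustment idea):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 10.0

W = rng.normal(size=(n, 3))  # confounders W0, W1, W2
v0 = (W.sum(axis=1) + rng.normal(size=n) > 0).astype(float)
y = beta * v0 + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

# Naive difference in means is confounded (biased upward here)
naive = y[v0 == 1].mean() - y[v0 == 0].mean()

# Backdoor adjustment: regress y on v0 plus the adjustment set {W0, W1, W2}
design = np.column_stack([np.ones(n), v0, W])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]  # close to the true effect of 10
```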
Estimate Effect - Functional API (Preview)#
[4]:
# Basic Estimate Effect function
estimator = PropensityScoreMatchingEstimator(
    identified_estimand=identified_estimand,
    test_significance=None,
    evaluate_effect_strength=False,
    confidence_intervals=False,
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)
estimate = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
# Using the same estimator with different data
second_estimate = estimator.estimate_effect(
    data=data_2["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
print(estimate)
print("-----------")
print(second_estimate)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
-----------
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 15.620265231373276
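Behind PropensityScoreMatchingEstimator is the idea of pairing each treated unit with the control unit whose propensity score, P(T=1 | W), is closest. A deliberately simplified sketch with a single confounder and the true propensity known (DoWhy's estimator fits a propensity model from data and matches in both directions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 2000, 10.0

w = rng.normal(size=n)            # single confounder
p = 1 / (1 + np.exp(-w))          # propensity score P(T=1 | w)
t = rng.random(n) < p             # boolean treatment assignment
y = beta * t + 2.0 * w + rng.normal(size=n)

# Match each treated unit to the control with the closest propensity score
controls = np.flatnonzero(~t)
matched = controls[np.abs(p[~t][None, :] - p[t][:, None]).argmin(axis=1)]
att = (y[t] - y[matched]).mean()  # close to the true effect of 10
```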
[5]:
# EconML estimator example
from econml.dml import DML
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import GradientBoostingRegressor
estimator = Econml(
    identified_estimand=identified_estimand,
    econml_estimator=DML(
        model_y=GradientBoostingRegressor(),
        model_t=GradientBoostingRegressor(),
        model_final=LassoCV(fit_intercept=False),
        featurizer=PolynomialFeatures(degree=1, include_bias=True),
    ),
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)
estimate_econml = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
print(estimate_econml)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
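The DML estimator above follows the partialling-out idea: predict both the outcome and the treatment from the confounders, then regress the outcome residuals on the treatment residuals. A linear sketch of that logic (EconML's DML additionally uses cross-fitting and flexible first-stage learners):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 10.0

W = rng.normal(size=(n, 3))
t = W.sum(axis=1) + rng.normal(size=n)  # continuous treatment
y = beta * t + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

def residualize(target, X):
    """Residuals of a least-squares fit of target on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return target - X1 @ coef

# Final stage: residual-on-residual regression recovers the effect
y_res, t_res = residualize(y, W), residualize(t, W)
theta = (t_res @ y_res) / (t_res @ t_res)  # close to the true effect of 10
```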
Refute Estimate - Functional API (Preview)#
[6]:
# You can call the refute_estimate function to execute several refuters with default parameters
# Currently this function does not support the sensitivity_* functions
refutation_results = refute_estimate(
    data["df"],
    identified_estimand,
    estimate,
    treatment_name=treatment_name,
    outcome_name=outcome_name,
    refuters=[refute_bootstrap, refute_data_subset],
)
for result in refutation_results:
print(result)
# Or you can execute the refute methods directly
# You can swap refute_bootstrap / refute_data_subset for any of the other refuters and add their required parameters
bootstrap_refutation = refute_bootstrap(data["df"], identified_estimand, estimate)
print(bootstrap_refutation)
data_subset_refutation = refute_data_subset(data["df"], identified_estimand, estimate)
print(data_subset_refutation)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.88856219387756
p value:0.54
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.66112498000895
p value:0.6599999999999999
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.681262891992505
p value:0.7
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.743625685672644
p value:0.5800000000000001
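The bootstrap and subset refuters share one recipe: re-estimate the effect on resampled data many times, then report how extreme the original estimate is relative to that distribution. A schematic version with a plain difference-in-means "estimator" (DoWhy's refute_bootstrap additionally simulates measurement error in the confounders, and its significance test differs in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = rng.binomial(1, 0.5, size=n).astype(bool)
y = 10.0 * t + rng.normal(size=n)

def effect(y, t):
    return y[t].mean() - y[~t].mean()

original = effect(y, t)
new_effects = []
for _ in range(100):
    idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
    new_effects.append(effect(y[idx], t[idx]))
new_effects = np.asarray(new_effects)

# Two-sided p-value: how often a resampled effect lands at least as far from
# the bootstrap mean as the original estimate does
p_value = np.mean(np.abs(new_effects - new_effects.mean()) >= abs(original - new_effects.mean()))
```

A high p-value, as in the outputs above, means the refuter could not distinguish the original estimate from the resampled ones, so the estimate passes this check.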
Backwards Compatibility#
This section shows how to replicate the same results using only the CausalModel API.
[7]:
# Create Causal Model
causal_model = CausalModel(
    data=data["df"], treatment=treatment_name, outcome=outcome_name, graph=data["gml_graph"]
)
Identify Effect#
[8]:
identified_estimand_causal_model_api = (
    causal_model.identify_effect()
)  # graph, treatment and outcome come from the causal_model object
print(identified_estimand_causal_model_api)
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
### Estimand : 2
Estimand name: iv
Estimand expression:
⎡ -1⎤
⎢ d ⎛ d ⎞ ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟ ⎥
⎣d[Z₁ Z₀] ⎝d[Z₁ Z₀] ⎠ ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z1,Z0})
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→{v0}, then ¬({Z1,Z0}→y)
### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
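Unlike the functional-API call earlier (whose graph omitted the instruments), the CausalModel run above also finds an iv estimand: the effect of the instruments on the outcome divided by their effect on the treatment. For a single binary instrument this ratio is the Wald estimator, sketched here on synthetic data (not DoWhy's IV estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 20000, 10.0

u = rng.normal(size=n)            # unobserved confounder
z = rng.binomial(1, 0.5, size=n)  # instrument: affects y only through t
t = ((z + u + rng.normal(size=n)) > 0.5).astype(float)
y = beta * t + 2.0 * u + rng.normal(size=n)

# Wald estimator: ratio of the instrument's effect on outcome to its effect on treatment
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
# wald is close to the true effect of 10 despite the unobserved confounder u
```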
Estimate Effect#
[9]:
estimate_causal_model_api = causal_model.estimate_effect(
    identified_estimand_causal_model_api, method_name="backdoor.propensity_score_matching"
)
print(estimate_causal_model_api)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
Refute Estimate#
[10]:
bootstrap_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "bootstrap_refuter"
)
print(bootstrap_refutation_causal_model_api)
data_subset_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "data_subset_refuter"
)
print(data_subset_refutation_causal_model_api)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.80905615602231
p value:0.5
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.766914477757974
p value:0.6399999999999999