Functional API Preview#
This notebook is part of a set of notebooks that provides a preview of the proposed functional API for DoWhy. For details on the new API, check out py-why/dowhy. This is a work in progress that we will update as we add new functionality. We welcome your feedback through Discord or on the Discussions page. The functional API is designed with backwards compatibility in mind, so the old and new APIs will continue to coexist and work in upcoming releases. The old API based on CausalModel will be gradually deprecated in favor of the new API.
The current Functional API covers:

* Identify Effect:
    * identify_effect(...): Run the identify-effect algorithm using defaults; just provide the graph, treatment, and outcome.
    * identify_effect_auto(...): A more configurable version of identify_effect(...).
    * identify_effect_id(...): Identify the effect using the ID algorithm.
* Refute Estimate:
    * refute_estimate: Run a set of the refuters below with their default parameters.
    * refute_bootstrap: Refute an estimate by running it on a random sample of the data containing measurement error in the confounders.
    * refute_data_subset: Refute an estimate by rerunning it on a random subset of the original data.
    * refute_random_common_cause: Refute an estimate by introducing a randomly generated confounder (that may have been unobserved).
    * refute_placebo_treatment: Refute an estimate by replacing the treatment with a randomly generated placebo variable.
    * sensitivity_simulation: Add an unobserved confounder for refutation (simulation of an unobserved confounder).
    * sensitivity_linear_partial_r2: Add an unobserved confounder for refutation (linear partial R2: sensitivity analysis for linear models).
    * sensitivity_non_parametric_partial_r2: Add an unobserved confounder for refutation (non-parametric partial R2: sensitivity analysis for non-parametric models).
    * sensitivity_e_value: Compute the E-value for the point estimate and confidence limits; benchmark E-values against measured confounders using Observed Covariate E-values; plot E-values and Observed Covariate E-values.
    * refute_dummy_outcome: Refute an estimate by replacing the outcome with a randomly generated dummy outcome variable.
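To make one of these refuters concrete: sensitivity_e_value is based on VanderWeele and Ding's E-value, which for a risk ratio RR ≥ 1 has the closed form RR + sqrt(RR · (RR − 1)). A minimal sketch of that formula (independent of DoWhy's implementation, which also handles confidence limits and other effect scales):

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio: the minimum strength of association an
    unobserved confounder would need with both treatment and outcome to
    explain away the observed effect."""
    if rr < 1:
        rr = 1 / rr  # a protective effect is inverted first
    return rr + math.sqrt(rr * (rr - 1))

# A risk ratio of 2 needs a confounder with RR >= 3.41 on both arms
print(round(e_value(2.0), 2))  # -> 3.41
```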
Import Dependencies#
[1]:
# Config dict to set the logging level
import logging.config
from dowhy import CausalModel # We still need this as we haven't created the functional API for effect estimation
from dowhy.causal_estimators.econml import Econml
from dowhy.causal_estimators.propensity_score_matching_estimator import PropensityScoreMatchingEstimator
from dowhy.causal_graph import CausalGraph
from dowhy.graph import build_graph
# Functional API imports
from dowhy.causal_identifier import (
    BackdoorAdjustment,
    EstimandType,
    identify_effect,
    identify_effect_auto,
    identify_effect_id,
)
from dowhy.causal_refuters import (
    refute_bootstrap,
    refute_data_subset,
    refute_estimate,
)
from dowhy.datasets import linear_dataset
DEFAULT_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "": {
            "level": "WARN",
        },
    },
}
# set random seed for deterministic dataset generation
# and avoid problems when running tests
import numpy as np
np.random.seed(1)
logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action="ignore", category=DataConversionWarning)
Create the Dataset#
[2]:
# Parameters for creating the Dataset
TREATMENT_IS_BINARY = True
BETA = 10
NUM_SAMPLES = 500
NUM_CONFOUNDERS = 3
NUM_INSTRUMENTS = 2
NUM_EFFECT_MODIFIERS = 2
# Creating a Linear Dataset with the given parameters
data = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)
data_2 = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)
treatment_name = data["treatment_name"]
print(treatment_name)
outcome_name = data["outcome_name"]
print(outcome_name)
graph = build_graph(
    action_nodes=treatment_name,
    outcome_nodes=outcome_name,
    effect_modifier_nodes=data["effect_modifier_names"],
    common_cause_nodes=data["common_causes_names"],
)
observed_nodes = data["df"].columns.tolist()
['v0']
y
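For intuition, the kind of structural model that linear_dataset samples from can be sketched in plain NumPy. The variable names W, Z, X, v0, and y mirror the dataset's columns; the coefficients here are illustrative and not DoWhy's actual values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 500, 10.0

W = rng.normal(size=(n, 3))            # common causes (confounders)
Z = rng.binomial(1, 0.5, size=(n, 2))  # instruments: affect treatment only
X = rng.normal(size=(n, 2))            # effect modifiers

# Binary treatment depends on confounders and instruments
t_score = W @ np.array([1.0, 0.5, -0.5]) + Z @ np.array([2.0, 1.0])
v0 = (t_score + rng.normal(size=n) > 0).astype(float)

# Outcome: the treatment effect beta is modified by X; confounders enter directly
y = v0 * (beta + X @ np.array([1.0, 0.5])) + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)
```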
Identify Effect - Functional API (Preview)#
[3]:
# Default identify_effect call example:
identified_estimand = identify_effect(graph, treatment_name, outcome_name, observed_nodes)
# identify_effect_auto example with extra parameters:
identified_estimand_auto = identify_effect_auto(
    graph,
    treatment_name,
    outcome_name,
    observed_nodes,
    estimand_type=EstimandType.NONPARAMETRIC_ATE,
    backdoor_adjustment=BackdoorAdjustment.BACKDOOR_EFFICIENT,
)
# identify_effect_id example:
identified_estimand_id = identify_effect_id(
    graph, treatment_name, outcome_name
)  # Note that the return type for identify_effect_id is IDExpression, not IdentifiedEstimand
print(identified_estimand)
print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
### Estimand : 2
Estimand name: iv
No such variable(s) found!
### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
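The backdoor estimand above says that conditioning on {W0, W1, W2} suffices. Under a linear model, that conditioning reduces to including the confounders as regressors. A minimal sketch with synthetic data (this is not DoWhy's estimator, just the adjustment idea):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 10.0

W = rng.normal(size=(n, 3))  # confounders W0, W1, W2
v0 = (W.sum(axis=1) + rng.normal(size=n) > 0).astype(float)
y = beta * v0 + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

# Naive difference in means is confounded (biased upward here)
naive = y[v0 == 1].mean() - y[v0 == 0].mean()

# Backdoor adjustment: regress y on v0 plus the adjustment set {W0, W1, W2}
design = np.column_stack([np.ones(n), v0, W])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]  # close to the true effect of 10
```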
Estimate Effect - Functional API (Preview)#
[4]:
# Basic Estimate Effect function
estimator = PropensityScoreMatchingEstimator(
    identified_estimand=identified_estimand,
    test_significance=None,
    evaluate_effect_strength=False,
    confidence_intervals=False,
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)
estimate = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
# Using the same estimator with different data
second_estimate = estimator.estimate_effect(
    data=data_2["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
print(estimate)
print("-----------")
print(second_estimate)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
-----------
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 15.620265231373276
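Behind PropensityScoreMatchingEstimator is the idea of pairing each treated unit with the control unit whose propensity score, P(T=1 | W), is closest. A deliberately simplified sketch with a single confounder and the true propensity known (DoWhy's estimator fits a propensity model from data and matches in both directions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 2000, 10.0

w = rng.normal(size=n)            # single confounder
p = 1 / (1 + np.exp(-w))          # propensity score P(T=1 | w)
t = rng.random(n) < p             # boolean treatment assignment
y = beta * t + 2.0 * w + rng.normal(size=n)

# Match each treated unit to the control with the closest propensity score
controls = np.flatnonzero(~t)
matched = controls[np.abs(p[~t][None, :] - p[t][:, None]).argmin(axis=1)]
att = (y[t] - y[matched]).mean()  # close to the true effect of 10
```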
[5]:
# EconML estimator example
from econml.dml import DML
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import GradientBoostingRegressor
estimator = Econml(
    identified_estimand=identified_estimand,
    econml_estimator=DML(
        model_y=GradientBoostingRegressor(),
        model_t=GradientBoostingRegressor(),
        model_final=LassoCV(fit_intercept=False),
        featurizer=PolynomialFeatures(degree=1, include_bias=True),
    ),
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)
estimate_econml = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)
print(estimate_econml)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
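The DML estimator above follows the partialling-out idea: predict both the outcome and the treatment from the confounders, then regress the outcome residuals on the treatment residuals. A linear sketch of that logic (EconML's DML additionally uses cross-fitting and flexible first-stage learners):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 10.0

W = rng.normal(size=(n, 3))
t = W.sum(axis=1) + rng.normal(size=n)  # continuous treatment
y = beta * t + W @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

def residualize(target, X):
    """Residuals of a least-squares fit of target on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return target - X1 @ coef

# Final stage: residual-on-residual regression recovers the effect
y_res, t_res = residualize(y, W), residualize(t, W)
theta = (t_res @ y_res) / (t_res @ t_res)  # close to the true effect of 10
```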
Refute Estimate - Functional API (Preview)#
[6]:
# You can call the refute_estimate function to execute several refuters with default parameters
# Currently this function does not support the sensitivity_* functions
refutation_results = refute_estimate(
    data["df"],
    identified_estimand,
    estimate,
    treatment_name=treatment_name,
    outcome_name=outcome_name,
    refuters=[refute_bootstrap, refute_data_subset],
)
for result in refutation_results:
print(result)
# Or you can execute the refute methods directly
# You can swap refute_bootstrap / refute_data_subset for any of the other refuters and add their required parameters
bootstrap_refutation = refute_bootstrap(data["df"], identified_estimand, estimate)
print(bootstrap_refutation)
data_subset_refutation = refute_data_subset(data["df"], identified_estimand, estimate)
print(data_subset_refutation)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.88856219387756
p value:0.54
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.66112498000895
p value:0.6599999999999999
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.681262891992505
p value:0.7
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.743625685672644
p value:0.5800000000000001
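The bootstrap and subset refuters share one recipe: re-estimate the effect on resampled data many times, then report how extreme the original estimate is relative to that distribution. A schematic version with a plain difference-in-means "estimator" (DoWhy's refute_bootstrap additionally simulates measurement error in the confounders, and its significance test differs in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = rng.binomial(1, 0.5, size=n).astype(bool)
y = 10.0 * t + rng.normal(size=n)

def effect(y, t):
    return y[t].mean() - y[~t].mean()

original = effect(y, t)
new_effects = []
for _ in range(100):
    idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
    new_effects.append(effect(y[idx], t[idx]))
new_effects = np.asarray(new_effects)

# Two-sided p-value: how often a resampled effect lands at least as far from
# the bootstrap mean as the original estimate does
p_value = np.mean(np.abs(new_effects - new_effects.mean()) >= abs(original - new_effects.mean()))
```

A high p-value, as in the outputs above, means the refuter could not distinguish the original estimate from the resampled ones, so the estimate passes this check.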
Backwards Compatibility#
This section shows how to replicate the same results using only the CausalModel API.
[7]:
# Create Causal Model
causal_model = CausalModel(
    data=data["df"], treatment=treatment_name, outcome=outcome_name, graph=data["gml_graph"]
)
Identify Effect#
[8]:
identified_estimand_causal_model_api = (
    causal_model.identify_effect()
)  # graph, treatment and outcome come from the causal_model object
print(identified_estimand_causal_model_api)
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
### Estimand : 2
Estimand name: iv
Estimand expression:
⎡ -1⎤
⎢ d ⎛ d ⎞ ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟ ⎥
⎣d[Z₁ Z₀] ⎝d[Z₁ Z₀] ⎠ ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z1,Z0})
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→{v0}, then ¬({Z1,Z0}→y)
### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
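Unlike the functional-API call earlier (whose graph omitted the instruments), the CausalModel run above also finds an iv estimand: the effect of the instruments on the outcome divided by their effect on the treatment. For a single binary instrument this ratio is the Wald estimator, sketched here on synthetic data (not DoWhy's IV estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 20000, 10.0

u = rng.normal(size=n)            # unobserved confounder
z = rng.binomial(1, 0.5, size=n)  # instrument: affects y only through t
t = ((z + u + rng.normal(size=n)) > 0.5).astype(float)
y = beta * t + 2.0 * u + rng.normal(size=n)

# Wald estimator: ratio of the instrument's effect on outcome to its effect on treatment
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
# wald is close to the true effect of 10 despite the unobserved confounder u
```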
Estimate Effect#
[9]:
estimate_causal_model_api = causal_model.estimate_effect(
    identified_estimand_causal_model_api, method_name="backdoor.propensity_score_matching"
)
print(estimate_causal_model_api)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)
## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate
## Estimate
Mean value: 11.32687325588067
Refute Estimate#
[10]:
bootstrap_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "bootstrap_refuter"
)
print(bootstrap_refutation_causal_model_api)
data_subset_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "data_subset_refuter"
)
print(data_subset_refutation_causal_model_api)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.80905615602231
p value:0.5
Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.766914477757974
p value:0.6399999999999999