Interpretable Causal ML

Causal ML provides methods to interpret trained treatment effect models; more example code is available in the feature_interpretations_example.ipynb notebook.

Meta-Learner Feature Importances

from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor

# X, w_multi, y, and feature_names are assumed to be prepared beforehand:
# the feature matrix, multi-arm treatment assignment, outcome, and feature names.
slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)

model_tau_feature = RandomForestRegressor()  # specify model for model_tau_feature

slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
                        normalize=True, method='auto', features=feature_names)

# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')

# Using eli5's PermutationImportance
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')

# Using SHAP
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)

# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau)

# Plot shap values WITH specifying shap_dict
slearner.plot_shap_values(X=X, shap_dict=shap_slearner)

# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)
slearner.plot_shap_dependence(treatment_group='treatment_A',
                              feature_idx=1,
                              X=X,
                              tau=slearner_tau,
                              interaction_idx='auto')
[Figures: meta-learner feature importances, SHAP values, and SHAP dependence plots (meta_feature_imp_vis.png, meta_shap_vis.png, meta_shap_dependence_vis.png)]
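
The other meta-learners imported above can be inspected in the same way. The following is a minimal sketch, not taken from the original docs, using the T-learner as an example; it assumes the same X, w_multi, and y as above, and that the importance and SHAP helpers shown for the S-learner are shared by the other Base*Regressor classes.

# Minimal sketch: same interpretation interface assumed for the T-learner
tlearner = BaseTRegressor(LGBMRegressor(), control_name='control')
tlearner_tau = tlearner.fit_predict(X, w_multi, y)

# Importance and SHAP helpers, as for the S-learner (assumed shared interface)
tlearner.plot_importance(X=X, tau=tlearner_tau, normalize=True, method='auto')
shap_tlearner = tlearner.get_shap_values(X=X, tau=tlearner_tau)
tlearner.plot_shap_values(X=X, shap_dict=shap_tlearner)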

Uplift Tree Visualization

from IPython.display import Image
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot
from causalml.dataset import make_uplift_classification

df, x_names = make_uplift_classification()
uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
                                    n_reg=100, evaluationFunction='KL', control_name='control')

uplift_model.fit(df[x_names].values,
                 treatment=df['treatment_group_key'].values,
                 y=df['conversion'].values)

graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, x_names)
Image(graph.create_png())
[Figure: uplift tree visualization (uplift_tree_vis.png)]

See the notes below for how to read the plot; the uplift_tree_visualization.ipynb example notebook is provided in the repository.

  • feature_name > threshold: For a non-leaf node, the first line is an inequality indicating the splitting rule of this node to its children nodes.

  • impurity: the impurity is defined as the value of the split criterion function (such as KL, Chi, or ED) evaluated at the current node.

  • total_sample: the sample size in this node.

  • group_sample: the sample size by treatment group.

  • uplift score: the treatment effect in this node; if there are multiple treatments, it denotes the maximum (signed) treatment effect across all treatment-vs-control pairs.

  • uplift p_value: the p-value of the treatment effect in this node.

  • validation uplift score: all of the information above is static once the tree is trained, while the validation uplift score represents the treatment effect of the test data when the fill() method is used (see the sketch after this list). This score can be compared with the training uplift score to evaluate whether the tree has an overfitting issue.
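
As a rough illustration of the validation uplift score, the sketch below fills the trained tree with held-out data and re-renders the plot. It assumes a hypothetical test split df_test with the same columns as df, and that fill() accepts (X, treatment, y) in the same form as fit().

# Hypothetical held-out split df_test; fill() is assumed to take (X, treatment, y) like fit()
uplift_model.fill(df_test[x_names].values,
                  treatment=df_test['treatment_group_key'].values,
                  y=df_test['conversion'].values)

# Re-render the tree: nodes now also display the validation uplift score
graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, x_names)
Image(graph.create_png())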

Uplift Tree Feature Importances

import pandas as pd
pd.Series(uplift_model.feature_importances_, index=x_names).sort_values().plot(kind='barh', figsize=(12, 8))
[Figure: uplift tree feature importances (uplift_tree_feature_imp_vis.png)]
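
A forest of uplift trees can be inspected in a similar way. The following is a minimal sketch, not from the original docs, assuming that UpliftRandomForestClassifier (imported above) exposes the same feature_importances_ attribute after fitting.

# Minimal sketch: uplift random forest feature importances (assumed attribute)
uplift_rf = UpliftRandomForestClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
                                         n_reg=100, evaluationFunction='KL', control_name='control')
uplift_rf.fit(df[x_names].values,
              treatment=df['treatment_group_key'].values,
              y=df['conversion'].values)

pd.Series(uplift_rf.feature_importances_, index=x_names).sort_values().plot(kind='barh', figsize=(12, 8))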