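Interpretable Causal ML
Causal ML provides methods for interpreting trained treatment effect models. More sample code is available in the feature_interpretations_example.ipynb notebook.
Meta-Learner Feature Importances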
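from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor
from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
# X (features), w_multi (treatment labels), y (outcome) and feature_names are assumed to be defined already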
slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)
model_tau_feature = RandomForestRegressor() # specify model for model_tau_feature
slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
                        normalize=True, method='auto', features=feature_names)
# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')
# Using eli5's PermutationImportance
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')
# Using SHAP
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)
# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau)
# Plot shap values WITH specifying shap_dict
slearner.plot_shap_values(X=X, shap_dict=shap_slearner)
# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)
slearner.plot_shap_dependence(treatment_group='treatment_A',
                              feature_idx=1,
                              X=X,
                              tau=slearner_tau,
                              interaction_idx='auto')
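The idea behind permutation-based importance (the method='permutation' option above) can be illustrated without any library: shuffle one feature column at a time and measure how much the error in predicting tau grows. The following is a minimal sketch in plain Python, not causalml's implementation. The names permutation_importance, predict_tau, X_demo and tau_demo are hypothetical and exist only for this illustration; predict_tau stands in for a model fitted to the estimated treatment effects.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, target, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(target, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(target, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a model fitted on tau: it only uses feature 0
def predict_tau(row):
    return 2.0 * row[0]


# Synthetic demo data (assumption): tau depends on feature 0, not on feature 1
data_rng = random.Random(42)
X_demo = [[data_rng.random(), data_rng.random()] for _ in range(200)]
tau_demo = [2.0 * row[0] for row in X_demo]

importances = permutation_importance(predict_tau, X_demo, tau_demo)
print(importances)
```

In this toy setup, shuffling feature 0 degrades the predictions while shuffling feature 1 leaves them unchanged, so only feature 0 receives a positive importance.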
Uplift Tree Visualization
from IPython.display import Image
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot
from causalml.dataset import make_uplift_classification
df, x_names = make_uplift_classification()
uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
                                    n_reg=100, evaluationFunction='KL', control_name='control')
uplift_model.fit(df[x_names].values,
                 treatment=df['treatment_group_key'].values,
                 y=df['conversion'].values)
graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, x_names)
Image(graph.create_png())
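See below for how to read the plot. The uplift_tree_visualization.ipynb example notebook is also provided in the repo.
feature_name > threshold: for a non-leaf node, the first line is an inequality giving the rule that splits this node into its children.
impurity: the value of the split criterion function (such as KL, Chi, or ED) evaluated at the current node.
total_sample: the sample size in this node.
group_sample: the sample sizes by treatment group.
uplift score: the treatment effect in this node. With multiple treatments, it is the maximum (signed) treatment effect across all treatment vs. control pairs.
uplift p_value: the p-value of the treatment effect in this node.
validation uplift score: all of the information above is static once the tree is trained (it is based on the training data), whereas the validation uplift score is the treatment effect on the testing data, computed when the fill() method is used. Compare it with the training uplift score to assess whether the tree is overfitting.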
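The node statistics described above can be reproduced by hand for a single node. The sketch below is a minimal illustration in plain Python, not causalml's implementation. It assumes a binary outcome, and it assumes that the KL criterion sums the binary KL divergence between each treatment group's conversion rate and the control group's. The helper names kl_divergence and node_summary, and all of the counts, are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-6):
    """Binary KL divergence between conversion rates p (treatment) and q (control)."""
    # Clip the rates away from 0 and 1 so the logarithms stay finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def node_summary(counts, control_name="control"):
    """Summarize one node; counts maps group name -> (n_samples, n_conversions)."""
    rates = {g: conv / n for g, (n, conv) in counts.items()}
    p_control = rates[control_name]
    # Treatment effect of each treatment group relative to control
    effects = {g: r - p_control for g, r in rates.items() if g != control_name}
    return {
        "total_sample": sum(n for n, _ in counts.values()),
        "group_sample": {g: n for g, (n, _) in counts.items()},
        # Maximum signed effect across all treatment vs. control pairs
        "uplift_score": max(effects.values()),
        # Assumed KL criterion: summed divergence from the control rate
        "impurity": sum(kl_divergence(rates[g], p_control) for g in effects),
    }


# Hypothetical counts for the same node on training and validation data
train_counts = {"control": (400, 40), "treatment_A": (300, 45), "treatment_B": (300, 24)}
valid_counts = {"control": (200, 22), "treatment_A": (150, 18), "treatment_B": (150, 12)}

train = node_summary(train_counts)
valid = node_summary(valid_counts)
gap = train["uplift_score"] - valid["uplift_score"]

print(train)
print(valid)
print(f"training minus validation uplift: {gap:.3f}")
```

A training uplift score that is much larger than the validation uplift score for the same node, as in these made-up counts, is the kind of gap that suggests overfitting.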
Uplift Tree Feature Importances
import pandas as pd
pd.Series(uplift_model.feature_importances_, index=x_names).sort_values().plot(kind='barh', figsize=(12,8))