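Explaining Local Classifier Per Parent Node
A minimal example showing how to use the HiClass Explainer to obtain SHAP values for an LCPPN (Local Classifier Per Parent Node) model. A detailed summary of the Explainer class is given in the Algorithms Overview section on hierarchical explainability. The SHAP values are computed on a synthetic platypus diseases dataset, which is available for download and is loaded in the script with load_platypus().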

Output:
<xarray.Dataset>
Dimensions:          (class: 15, level: 3, sample: 246, feature: 9)
Coordinates:
  * class            (class) <U16 'Allergy' 'Bee Allergy' ... 'Respiratory'
  * level            (level) int64 0 1 2
Dimensions without coordinates: sample, feature
Data variables:
    node             (sample, level) object 'hiclass::root' ... 'Food Allergy'
    predicted_class  (sample, level) object 'Respiratory' ... 'Milk Allergy'
    predict_proba    (sample, level, class) float64 0.2 nan nan ... nan nan nan
    classes          (sample, level, class) object 'Allergy' nan nan ... nan nan
    shap_values      (level, class, sample, feature) float64 0.0163 ... nan
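The dataset above stores shap_values with dimensions (level, class, sample, feature), and many of its entries are nan. One plausible reading, which is an assumption and not confirmed on this page, is that an entry is nan when a class is not handled by the local classifier that explained a given sample. The sketch below uses a small synthetic NumPy array with the same axis order (not real HiClass output) to show how a NaN-aware reduction such as np.nanmean can summarize per-feature importance without being distorted by that padding. The helper name mean_abs_shap is hypothetical.

```python
import numpy as np

# Synthetic stand-in for shap_values with axes (level, class, sample, feature).
# The shape and the numbers are illustrative only, not HiClass output.
shap_values = np.full((2, 3, 4, 2), np.nan)

# Pretend that at level 0 only class 0 was explained, for all 4 samples.
shap_values[0, 0] = np.array(
    [[0.1, -0.3], [0.2, 0.1], [-0.1, 0.5], [0.4, -0.1]]
)


def mean_abs_shap(values, level, class_index):
    """Mean absolute SHAP value per feature, ignoring NaN entries."""
    block = values[level, class_index]  # shape: (sample, feature)
    return np.nanmean(np.abs(block), axis=0)


importance = mean_abs_shap(shap_values, level=0, class_index=0)
print(importance)
```

Averaging absolute values is one common way to rank features globally; whether it is the right summary depends on the question being asked of the model.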
from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerParentNode, Explainer
import shap
from hiclass.datasets import load_platypus

# Load train and test splits
X_train, X_test, Y_train, Y_test = load_platypus()

# Use random forest classifiers for every node
rfc = RandomForestClassifier()
classifier = LocalClassifierPerParentNode(
    local_classifier=rfc, replace_classifiers=False
)

# Train local classifier per parent node
classifier.fit(X_train, Y_train)

# Define Explainer
explainer = Explainer(classifier, data=X_train.values, mode="tree")
explanations = explainer.explain(X_test.values)
print(explanations)

# Keep only the samples predicted as "Respiratory" at the first level
respiratory_idx = classifier.predict(X_test)[:, 0] == "Respiratory"

# Specify additional filters to obtain only level 0
shap_filter = {"level": 0, "class": "Respiratory", "sample": respiratory_idx}

# Use .sel() method to apply the filter and obtain filtered results
shap_val_respiratory = explanations.sel(shap_filter)

# Plot feature importance on test set
shap.plots.violin(
    shap_val_respiratory.shap_values,
    feature_names=X_train.columns.values,
    plot_size=(13, 8),
)
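The filtering step in the script combines a boolean mask over samples with a selection on level and class. The following sketch reproduces that idea with plain NumPy on made-up predictions and a random array, so it runs without HiClass, SHAP, or xarray. The tiny hierarchy, the class list, and the variable class_pos are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical predictions: one row per sample, one column per hierarchy level.
predictions = np.array(
    [
        ["Respiratory", "Cold"],
        ["Allergy", "Food Allergy"],
        ["Respiratory", "Covid"],
    ]
)
classes = np.array(["Allergy", "Respiratory"])

# Synthetic SHAP-like array with axes (level, class, sample, feature).
rng = np.random.default_rng(0)
shap_values = rng.normal(size=(2, 2, 3, 4))

# Boolean mask: samples predicted as "Respiratory" at the first level.
respiratory_idx = predictions[:, 0] == "Respiratory"

# Position of the "Respiratory" class along the class axis.
class_pos = int(np.flatnonzero(classes == "Respiratory")[0])

# Select level 0, the "Respiratory" class, and only the masked samples.
selected = shap_values[0, class_pos][respiratory_idx]
print(selected.shape)  # (2, 4)
```

The result is a (sample, feature) matrix restricted to the samples of interest, which is the shape a per-feature plot such as a violin plot expects.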
Total running time of the script: (0 minutes 34.499 seconds)