Interpretable Classification

In this notebook we fit a classification Explainable Boosting Machine (EBM), a LogisticRegression, and a ClassificationTree. After fitting them, we use their transparency to understand their global and local explanations.

This notebook can be found in our examples folder on GitHub.

# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret pandas scikit-learn
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from interpret import show
from interpret.perf import ROC

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"
]
X = df.iloc[:, :-1]
y = (df.iloc[:, -1] == " >50K").astype(int)

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

Explore the dataset

from interpret.data import ClassHistogram

hist = ClassHistogram().explain_data(X_train, y_train, name='Train Data')
show(hist)
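
As a quick numeric complement to the histogram, it can help to check the class balance directly. This is a small pandas sketch, not part of the interpret API:

# The positive class (Income >50K) is the minority class in the Adult dataset
print(y_train.value_counts(normalize=True))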

Train the Explainable Boosting Machine (EBM)

from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

EBMs are glassbox models, so we can edit them

# post-process monotonize the Age feature
ebm.monotonize("Age", increasing=True)
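
Besides editing, you can inspect the learned shape functions programmatically. The attribute names below (term_names_, term_scores_) assume a recent version of interpret, so treat this as a sketch rather than the only way to get at the scores:

# Each term has a name and an array of additive scores over its bins.
# After monotonize(..., increasing=True), the bin scores for "Age" should be non-decreasing.
age_index = ebm.term_names_.index("Age")
print(ebm.term_scores_[age_index])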

Global explanations: what the model learned overall

ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)

Local explanations: how an individual prediction was made

ebm_local = ebm.explain_local(X_test[:5], y_test[:5], name='EBM')
show(ebm_local, 0)
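
To tie the local explanation back to the model's output, you can also print the predicted probabilities for the same rows. EBMs follow the standard scikit-learn predict_proba convention:

# Predicted probability of Income >50K for the five explained rows
print(ebm.predict_proba(X_test[:5])[:, 1])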

Evaluate EBM performance

ebm_perf = ROC(ebm).explain_perf(X_test, y_test, name='EBM')
show(ebm_perf)
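
If you want a single summary number alongside the ROC curve, the same quantity can be cross-checked with scikit-learn. This is a minimal sketch outside the interpret dashboard:

from sklearn.metrics import roc_auc_score

# AUC computed directly from the EBM's predicted probabilities
print("EBM test AUC:", roc_auc_score(y_test, ebm.predict_proba(X_test)[:, 1]))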

Let's test out a few other interpretable models

from interpret.glassbox import LogisticRegression, ClassificationTree

# We have to transform categorical variables to use Logistic Regression and Decision Tree
X = pd.get_dummies(X, prefix_sep='.').astype(float)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

lr = LogisticRegression(random_state=seed, penalty='l1', solver='liblinear')
lr.fit(X_train, y_train)

tree = ClassificationTree()
tree.fit(X_train, y_train)

Compare performance using the Dashboard

lr_perf = ROC(lr).explain_perf(X_test, y_test, name='Logistic Regression')
show(lr_perf)
tree_perf = ROC(tree).explain_perf(X_test, y_test, name='Classification Tree')
show(tree_perf)
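
For a quick numeric side-by-side (the dashboard plots above remain the richer comparison), the AUCs of the two new models can be computed the same way. Note that X_test here is the one-hot encoded split, so the EBM fitted earlier on the raw features is not scored on it:

from sklearn.metrics import roc_auc_score

# Both glassbox wrappers expose a scikit-learn style predict_proba
print("Logistic Regression test AUC:", roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]))
print("Classification Tree test AUC:", roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))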

Glassbox: all of our models have global and local explanations

lr_global = lr.explain_global(name='Logistic Regression')
show(lr_global)
tree_global = tree.explain_global(name='Classification Tree')
show(tree_global)
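
Local explanations work the same way for these models as they did for the EBM. As a sketch, the same five test rows can be explained with the explain_local API shown earlier:

lr_local = lr.explain_local(X_test[:5], y_test[:5], name='Logistic Regression')
show(lr_local, 0)

tree_local = tree.explain_local(X_test[:5], y_test[:5], name='Classification Tree')
show(tree_local, 0)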