Linear Model

Links to API References: LogisticRegression, LinearRegression

See the backing repository for Linear Model here.

Summary

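Linear and logistic regression model the relationship between a response and its explanatory variables with a linear predictor function. They are among the foundational models in statistical modeling: they train quickly and offer good interpretability, although predictive performance varies. The implementation is a light wrapper around the linear and logistic regression exposed in scikit-learn.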
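As a rough illustration of what a linear predictor function is, the sketch below computes a weighted sum of the features plus an intercept and, for the logistic case, passes that score through the sigmoid to obtain a probability. The weights, intercept, and feature values are made-up numbers for illustration, not taken from any fitted model.

```python
import math

def linear_predictor(weights, intercept, features):
    """Weighted sum of the features plus an intercept."""
    return intercept + sum(w * x for w, x in zip(weights, features))

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and a single observation (illustrative values only).
weights = [0.5, -1.0, 2.0]
intercept = 0.25
features = [1.0, 2.0, 0.5]

score = linear_predictor(weights, intercept, features)  # what linear regression would output
probability = sigmoid(score)                            # what logistic regression would output
print(score, probability)
```

Because the score is a plain sum of per-feature terms, each term `w * x` can be read as that feature's additive contribution, which is where the interpretability of these models comes from.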

How it Works

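Christoph Molnar's e-book "Interpretable Machine Learning" [1] has an excellent overview of linear and logistic regression models, which can be found here and here respectively.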

For implementation-specific details, scikit-learn's user guide [2] on linear and regression models is solid and can be found here.
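Since the implementation wraps scikit-learn, a small sketch against scikit-learn itself may help show what a fitted linear model exposes. The example below uses `sklearn.linear_model.LinearRegression` directly (not the interpret wrapper) on synthetic, noise-free data generated from a known equation, so the recovered coefficients can be checked by eye. The data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SkLinearRegression

# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 1 (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

model = SkLinearRegression().fit(X, y)

# Each coefficient is the change in the prediction per unit change in that feature,
# holding the other features fixed; the intercept is the prediction at all-zero features.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

With noise-free data the fit should recover the generating coefficients up to floating-point error; on real data the coefficients are estimates and depend on feature scaling.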

Code Example

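The following code trains a logistic regression model on the breast cancer dataset. The visualizations provided will be for both global and local explanations.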

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from interpret.glassbox import LogisticRegression
from interpret import show

seed = 42
np.random.seed(seed)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

lr = LogisticRegression(max_iter=3000, random_state=seed)
lr.fit(X_train, y_train)

auc = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
AUC: 0.998
/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
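When lbfgs reports a ConvergenceWarning on unscaled features, scikit-learn's message suggests scaling the data. The sketch below shows one way to do that, assuming it is acceptable to use scikit-learn's own `LogisticRegression` inside a `Pipeline` with `StandardScaler`; it does not assume anything about how the `interpret.glassbox` wrapper behaves inside a pipeline. The names `scaled_lr` and `scaled_auc` are introduced here for illustration.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

seed = 42
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

# Standardize each feature to zero mean and unit variance before fitting.
# The scaler is fit on the training split only, via the pipeline.
scaled_lr = make_pipeline(
    StandardScaler(),
    SkLogisticRegression(max_iter=3000, random_state=seed),
)

with warnings.catch_warnings():
    # Turn a ConvergenceWarning into an error so a failed fit is not silently ignored.
    warnings.simplefilter("error", category=ConvergenceWarning)
    scaled_lr.fit(X_train, y_train)

scaled_auc = roc_auc_score(y_test, scaled_lr.predict_proba(X_test)[:, 1])
print("AUC (scaled): {:.3f}".format(scaled_auc))
```

Note that standardizing changes the units of the coefficients: they then describe the effect of a one-standard-deviation change in a feature rather than a one-unit change, which matters when reading the explanations.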
show(lr.explain_global())