Evaluate classification by compiling a report

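Specific metrics have been developed to evaluate classifiers trained on imbalanced data. imblearn provides a classification report similar to the one in sklearn, extended with additional metrics aimed at imbalanced learning problems.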

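Running the script below prints the following report. The columns are precision (pre), recall (rec), specificity (spe), F1 score (f1), geometric mean (geo), index balanced accuracy (iba), and support (sup).

                   pre       rec       spe        f1       geo       iba       sup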

          0       0.42      0.84      0.88      0.56      0.86      0.73       123
          1       0.98      0.88      0.84      0.93      0.86      0.74      1127

avg / total       0.93      0.87      0.84      0.89      0.86      0.74      1250
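
To make the extra columns easier to read, here is a minimal pure-Python sketch of how they can be derived from raw labels for one class at a time. It is not imblearn's implementation. The helper names `per_class_metrics` and `weighted_average` and the toy labels are hypothetical. The `iba` formula assumes the report's defaults as commonly documented (alpha = 0.1 applied to the squared geometric mean). The numbers in the report above are consistent with `geo` being the square root of `rec * spe`, and with the `avg / total` row being a support-weighted average.

```python
from math import sqrt


def per_class_metrics(y_true, y_pred, label, alpha=0.1):
    """Report-style metrics for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)

    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0  # sensitivity / true positive rate
    spe = tn / (tn + fp) if tn + fp else 0.0  # specificity / true negative rate
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    geo = sqrt(rec * spe)
    # Assumption: index balanced accuracy scales the squared geometric mean
    # by the dominance (rec - spe), weighted by alpha.
    iba = (1 + alpha * (rec - spe)) * geo**2
    return {"pre": pre, "rec": rec, "spe": spe, "f1": f1,
            "geo": geo, "iba": iba, "sup": tp + fn}


def weighted_average(values, supports):
    """Support-weighted mean, as used here to mimic the 'avg / total' row."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Toy labels, made up for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

m0 = per_class_metrics(y_true, y_pred, label=0)
m1 = per_class_metrics(y_true, y_pred, label=1)
for name, m in (("0", m0), ("1", m1)):
    print(name, {k: round(v, 2) for k, v in m.items()})

avg_rec = weighted_average([m0["rec"], m1["rec"]], [m0["sup"], m1["sup"]])
print("support-weighted recall:", round(avg_rec, 2))
```

In a binary problem the recall of one class is the specificity of the other, which is why the `rec` and `spe` values in the two rows of the report mirror each other and both rows share the same `geo`.
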

# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT


from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from imblearn import over_sampling as os
from imblearn import pipeline as pl
from imblearn.metrics import classification_report_imbalanced

print(__doc__)

RANDOM_STATE = 42

# Generate a dataset
X, y = datasets.make_classification(
    n_classes=2,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=10,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=4,
    n_samples=5000,
    random_state=RANDOM_STATE,
)

pipeline = pl.make_pipeline(
    StandardScaler(),
    os.SMOTE(random_state=RANDOM_STATE),
    LogisticRegression(max_iter=10_000),
)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_STATE)

# Train the classifier with balancing
pipeline.fit(X_train, y_train)

# Test the classifier and get the prediction
y_pred_bal = pipeline.predict(X_test)

# Show the classification report
print(classification_report_imbalanced(y_test, y_pred_bal))
