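Evaluate classification by compiling a report
Specific metrics have been developed to evaluate classifiers trained on imbalanced data. imblearn provides a classification report similar to the one in sklearn, with additional metrics specific to imbalanced learning problems.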
                   pre       rec       spe        f1       geo       iba       sup
          0       0.42      0.84      0.88      0.56      0.86      0.73       123
          1       0.98      0.88      0.84      0.93      0.86      0.74      1127
avg / total       0.93      0.87      0.84      0.89      0.86      0.74      1250
# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn import over_sampling as os
from imblearn import pipeline as pl
from imblearn.metrics import classification_report_imbalanced
print(__doc__)
RANDOM_STATE = 42
# Generate a dataset
X, y = datasets.make_classification(
    n_classes=2,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=10,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=4,
    n_samples=5000,
    random_state=RANDOM_STATE,
)
# Build the pipeline: scale the features, oversample with SMOTE, then classify
pipeline = pl.make_pipeline(
    StandardScaler(),
    os.SMOTE(random_state=RANDOM_STATE),
    LogisticRegression(max_iter=10_000),
)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_STATE)
# Train the classifier with balancing
pipeline.fit(X_train, y_train)
# Test the classifier and get the prediction
y_pred_bal = pipeline.predict(X_test)
# Show the classification report
print(classification_report_imbalanced(y_test, y_pred_bal))
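The extra columns in the report can be checked by hand. The sketch below is not imblearn's implementation. It assumes that spe is the specificity, geo is the geometric mean of recall and specificity, and iba is the index balanced accuracy computed as (1 + alpha * (rec - spe)) * rec * spe with alpha = 0.1. The helper `imbalance_metrics` and the toy labels are hypothetical and exist only for illustration.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive, alpha=0.1):
    """Return (recall, specificity, geometric mean, IBA) for one class.

    Hypothetical helper: `positive` is the label treated as the positive
    class; every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Geometric mean of the two per-class rates
    geo = sqrt(recall * specificity)
    # Assumed IBA form: squared geometric mean weighted by the dominance
    iba = (1 + alpha * (recall - specificity)) * recall * specificity
    return recall, specificity, geo, iba


# Hypothetical toy labels: 2 minority samples (0) and 8 majority samples (1)
toy_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]

for label in (0, 1):
    rec, spe, geo, iba = imbalance_metrics(toy_true, toy_pred, positive=label)
    print(f"class {label}: rec={rec:.2f} spe={spe:.2f} geo={geo:.2f} iba={iba:.2f}")
```

Under these assumptions, the values reported for class 1 above (rec 0.88, spe 0.84) give a geometric mean of about 0.86 and an iba of about 0.74, which agrees with that row of the report.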