Under-sampling methods

The imblearn.under_sampling module provides methods to under-sample a dataset.

Prototype generation

The imblearn.under_sampling.prototype_generation submodule contains methods that generate new samples in order to balance the dataset.

`ClusterCentroids`(*[, sampling_strategy, ...])	Under-sample by generating centroids based on clustering methods.
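
The idea behind prototype generation can be illustrated with a minimal standard-library sketch: the majority class is clustered and its samples are replaced by the cluster centroids, which are new points rather than a subset of the original data. The helpers `kmeans_centroids` and `centroid_undersample` are hypothetical names introduced only for this illustration, and the plain Lloyd's k-means with a deterministic start is an assumption; this is not how `ClusterCentroids` is implemented, and in practice the library class would be used through its `fit_resample` method.

```python
from math import dist
from statistics import fmean


def kmeans_centroids(points, k, n_iter=20):
    """Plain Lloyd's k-means returning k centroids (hypothetical helper)."""
    # Deterministic start: take k points spread evenly through the input.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            tuple(fmean(axis) for axis in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids


def centroid_undersample(X, y, majority_label, n_keep):
    """Replace the majority class by n_keep generated centroids."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    rest = [(x, label) for x, label in zip(X, y) if label != majority_label]
    generated = [(c, majority_label) for c in kmeans_centroids(majority, n_keep)]
    X_res, y_res = zip(*(rest + generated))
    return list(X_res), list(y_res)


# Six majority samples (label 0) in two tight groups, two minority samples (label 1).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
     (5.0, 5.0), (5.0, 6.0)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = centroid_undersample(X, y, majority_label=0, n_keep=2)
print(y_res)  # minority samples first, then the two generated centroids
```

After resampling, both classes have two samples each, and the two majority-class points are centroids that do not appear in the original `X`.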

Prototype selection

The imblearn.under_sampling.prototype_selection submodule contains methods that select samples in order to balance the dataset.

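`CondensedNearestNeighbour`(*[, ...])	Under-sample based on the condensed nearest neighbour method.
`EditedNearestNeighbours`(*[, ...])	Under-sample based on the edited nearest neighbour method.
`RepeatedEditedNearestNeighbours`(*[, ...])	Under-sample based on the repeated edited nearest neighbour method.
`AllKNN`(*[, sampling_strategy, n_neighbors, ...])	Under-sample based on the AllKNN method.
`InstanceHardnessThreshold`(*[, estimator, ...])	Under-sample based on the instance hardness threshold.
`NearMiss`(*[, sampling_strategy, version, ...])	Class to perform under-sampling based on NearMiss methods.
`NeighbourhoodCleaningRule`(*[, ...])	Under-sample based on the neighbourhood cleaning rule.
`OneSidedSelection`(*[, sampling_strategy, ...])	Class to perform under-sampling based on the one-sided selection method.
`RandomUnderSampler`(*[, sampling_strategy, ...])	Class to perform random under-sampling.
`TomekLinks`(*[, sampling_strategy, n_jobs])	Under-sample by removing Tomek's links.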
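
In contrast to prototype generation, prototype selection keeps a subset of the original samples. The following standard-library sketch illustrates two of the ideas listed above, random under-sampling and Tomek link removal, on a toy one-dimensional dataset. The helpers `random_undersample` and `tomek_link_pairs` are hypothetical names for this illustration, and dropping only the majority-class member of each link is an assumption of the sketch; neither is the implementation behind `RandomUnderSampler` or `TomekLinks`, which would normally be applied through `fit_resample`.

```python
import random
from collections import Counter
from math import dist


def random_undersample(X, y, seed=0):
    """Randomly keep as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for label in sorted(set(y)):
        indices = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def tomek_link_pairs(X, y):
    """Index pairs that are mutual nearest neighbours with different labels."""
    def nearest(i):
        others = (j for j in range(len(X)) if j != i)
        return min(others, key=lambda j: dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


# Four majority samples (label 0) and three minority samples (label 1).
X = [(0.0,), (1.1,), (2.3,), (3.0,), (3.4,), (6.0,), (7.0,)]
y = [0, 0, 0, 0, 1, 1, 1]

# Random under-sampling: every class is cut down to the minority size.
X_rs, y_rs = random_undersample(X, y)

# Tomek links: find cross-class mutual nearest neighbours and drop the
# majority-class member of each pair (an assumption of this sketch).
links = tomek_link_pairs(X, y)
majority = Counter(y).most_common(1)[0][0]
drop = {i for pair in links for i in pair if y[i] == majority}
X_tl = [x for i, x in enumerate(X) if i not in drop]
y_tl = [lab for i, lab in enumerate(y) if i not in drop]
```

Here the samples at 3.0 (label 0) and 3.4 (label 1) are each other's nearest neighbour, so they form the only Tomek link, and removing the majority-class member cleans the class boundary without generating any new point.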