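Pipeline

class imblearn.pipeline.Pipeline(steps, *, transform_input=None, memory=None, verbose=False)

Pipeline of transforms and resamples with a final estimator.

Sequentially apply a list of transforms, samplers and a final estimator. Intermediate steps of the pipeline must be transformers or resamplers, that is, they must implement fit and transform (transformers) or fit_resample (samplers). The samplers are only applied during fit. The final estimator only needs to implement fit. The transformers and samplers in the pipeline can be cached using the memory argument.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a "__" (for example, parameter n_components of a step named pca is addressed as pca__n_components). A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to "passthrough" or None.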

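Parameters:

steps : list

List of (name, transform) tuples (implementing fit/transform/fit_resample) that are chained, in the order in which they are chained, with the last object an estimator.

transform_input : list of str, default=None

The names of the metadata parameters that should be transformed by the pipeline before being passed to the step consuming them.

This enables transforming some input arguments to fit (other than X) by the steps of the pipeline up to the step which requires them. The requirement is defined via metadata routing. For instance, this can be used to pass a validation set through the pipeline.

You can only set this if metadata routing is enabled, which you can do with sklearn.set_config(enable_metadata_routing=True).

Added in version 1.6.

memory : instance of joblib.Memory or str, default=None

Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

verbose : bool, default=False

If True, the time elapsed while fitting each step will be printed as it is completed.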

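Attributes:

named_steps : Bunch

Access the steps by name.

classes_ : ndarray of shape (n_classes,)

The class labels.

n_features_in_ : int

Number of features seen during the first step's fit method.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during the first step's fit method.

See also

make_pipeline

Helper function to make a pipeline.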

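Notes

See Usage of pipeline embedding samplers.

Warning

A surprising behaviour of the imbalanced-learn pipeline is that it breaks the scikit-learn contract where one expects estimator.fit_transform(X, y) to be equivalent to estimator.fit(X, y).transform(X).

The semantics of fit_resample are to be applied only during the fit stage. Therefore, resampling will happen when calling fit_transform, while it will only happen in the fit stage when calling fit and transform separately. In practice, fit_transform returns a resampled dataset, while fit followed by transform does not.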

Examples

>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split as tts
>>> from sklearn.decomposition import PCA
>>> from sklearn.neighbors import KNeighborsClassifier as KNN
>>> from sklearn.metrics import classification_report
>>> from imblearn.over_sampling import SMOTE
>>> from imblearn.pipeline import Pipeline
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print(f'Original dataset shape {Counter(y)}')
Original dataset shape Counter({1: 900, 0: 100})
>>> pca = PCA()
>>> smt = SMOTE(random_state=42)
>>> knn = KNN()
>>> pipeline = Pipeline([('smt', smt), ('pca', pca), ('knn', knn)])
>>> X_train, X_test, y_train, y_test = tts(X, y, random_state=42)
>>> pipeline.fit(X_train, y_train)
Pipeline(...)
>>> y_hat = pipeline.predict(X_test)
>>> print(classification_report(y_test, y_hat))
              precision    recall  f1-score   support

           0       0.87      1.00      0.93        26
           1       1.00      0.98      0.99       224

    accuracy                           0.98       250
   macro avg       0.93      0.99      0.96       250
weighted avg       0.99      0.98      0.98       250

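Methods

`decision_function`(X, **params)	Transform the data, and apply `decision_function` with the final estimator.
`fit`(X[, y])	Fit the model.
`fit_predict`(X[, y])	Apply `fit_predict` of the last step in the pipeline after transforms.
`fit_resample`(X[, y])	Fit the model and sample with the final estimator.
`fit_transform`(X[, y])	Fit the model and transform with the final estimator.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(Xt, **params)	Apply `inverse_transform` for each step in reverse order.
`predict`(X, **params)	Transform the data, and apply `predict` with the final estimator.
`predict_log_proba`(X, **params)	Transform the data, and apply `predict_log_proba` with the final estimator.
`predict_proba`(X, **params)	Transform the data, and apply `predict_proba` with the final estimator.
`score`(X[, y, sample_weight])	Transform the data, and apply `score` with the final estimator.
`score_samples`(X)	Transform the data, and apply `score_samples` with the final estimator.
`set_output`(*[, transform])	Set the output container when `"transform"` and `"fit_transform"` are called.
`set_params`(**kwargs)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Request metadata passed to the `score` method.
`transform`(X, **params)	Transform the data, and apply `transform` with the final estimator.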

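property classes_

The class labels. Only exists if the last step is a classifier.

decision_function(X, **params)

Transform the data, and apply decision_function with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its decision_function method. Only valid if the final estimator implements decision_function.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See the Metadata Routing User Guide for more details.

Returns:

y_score : ndarray of shape (n_samples, n_classes)

Result of calling decision_function on the final estimator.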

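property feature_names_in_

Names of features seen during the first step's fit method.

fit(X, y=None, **params)

Fit the model.

Fit all the transforms/samplers one after the other and transform/sample the data, then fit the transformed/sampled data using the final estimator.

Parameters:

X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

See the Metadata Routing User Guide for more details.

Returns:

self : Pipeline

This estimator.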

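fit_predict(X, y=None, **params)

Apply fit_predict of the last step in the pipeline after transforms.

Applies fit_transform of each step of the pipeline to the data, followed by the fit_predict method of the final estimator in the pipeline. Valid only if the final estimator implements fit_predict.

Parameters:

X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters to the predict called at the end of all transformations in the pipeline.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See the Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:

y_pred : ndarray of shape (n_samples,)

The predicted target.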

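fit_resample(X, y=None, **params)

Fit the model and sample with the final estimator.

Fit all the transformers/samplers one after the other and transform/sample the data, then use fit_resample on the transformed data with the final estimator.

Parameters:

X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See the Metadata Routing User Guide for more details.

Returns:

Xt : array-like of shape (n_samples, n_transformed_features)

Transformed samples.

yt : array-like of shape (n_samples, n_transformed_features)

Transformed target.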

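fit_transform(X, y=None, **params)

Fit the model and transform with the final estimator.

Fit all the transformers/samplers one after the other and transform/sample the data, then use fit_transform on the transformed data with the final estimator.

Parameters:

X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See the Metadata Routing User Guide for more details.

Returns:

Xt : array-like of shape (n_samples, n_transformed_features)

Transformed samples.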

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Transform input features using the pipeline.

Parameters:

input_features : array-like of str or None, default=None

Input features.

Returns:

feature_names_out : ndarray of str objects

Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRouter

A MetadataRouter encapsulating routing information.

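get_params(deep=True)

Get parameters for this estimator.

Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.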

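inverse_transform(Xt, **params)

Apply inverse_transform for each step in reverse order.

All estimators in the pipeline must support inverse_transform.

Parameters:

Xt : array-like of shape (n_samples, n_transformed_features)

Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of the inverse_transform method of the last step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See the Metadata Routing User Guide for more details.

Returns:

Xt : ndarray of shape (n_samples, n_features)

Inverse transformed data, that is, data in the original feature space.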

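property n_features_in_

Number of features seen during the first step's fit method.

property named_steps

Access the steps by name.

Read-only attribute to access any step by its given name. Keys are step names and values are the step objects.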

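predict(X, **params)

Transform the data, and apply predict with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its predict method. Only valid if the final estimator implements predict.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters to the predict called at the end of all transformations in the pipeline.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

See the Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:

y_pred : ndarray

Result of calling predict on the final estimator.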

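predict_log_proba(X, **params)

Transform the data, and apply predict_log_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its predict_log_proba method. Only valid if the final estimator implements predict_log_proba.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters to the predict_log_proba called at the end of all transformations in the pipeline.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See the Metadata Routing User Guide for more details.

Returns:

y_log_proba : ndarray of shape (n_samples, n_classes)

Result of calling predict_log_proba on the final estimator.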

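predict_proba(X, **params)

Transform the data, and apply predict_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its predict_proba method. Only valid if the final estimator implements predict_proba.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

If enable_metadata_routing=False (default):

Parameters to the predict_proba called at the end of all transformations in the pipeline.

If enable_metadata_routing=True:

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See the Metadata Routing User Guide for more details.

Returns:

y_proba : ndarray of shape (n_samples, n_classes)

Result of calling predict_proba on the final estimator.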

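score(X, y=None, sample_weight=None, **params)

Transform the data, and apply score with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its score method. Only valid if the final estimator implements score.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weight : array-like, default=None

If not None, this argument is passed as the sample_weight keyword argument to the score method of the final estimator.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See the Metadata Routing User Guide for more details.

Returns:

score : float

Result of calling score on the final estimator.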

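score_samples(X)

Transform the data, and apply score_samples with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its score_samples method. Only valid if the final estimator implements score_samples.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

Returns:

y_score : ndarray of shape (n_samples,)

Result of calling score_samples on the final estimator.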

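set_output(*, transform=None)

Set the output container when "transform" and "fit_transform" are called.

Calling set_output will set the output of all estimators in steps.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: The "polars" option was added.

Returns:

self : estimator instance

Estimator instance.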

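set_params(**kwargs)

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in steps.

Parameters:

**kwargs : dict

Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using the step name and the parameter name separated by a '__'.

Returns:

self : object

Pipeline class instance.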

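set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Pipeline

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for the sample_weight parameter in score.

Returns:

self : object

The updated object.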

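transform(X, **params)

Transform the data, and apply transform with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls its transform method. Only valid if the final estimator implements transform.

This also works when the final estimator is None, in which case all prior transformations are applied.

Parameters:

X : iterable

Data to transform. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See the Metadata Routing User Guide for more details.

Returns:

Xt : ndarray of shape (n_samples, n_transformed_features)

Transformed data.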

Examples using `imblearn.pipeline.Pipeline`

Multiclass classification with under-sampling

Example of topic classification in text documents

Customized sampler to implement an outlier rejections estimator

Benchmark over-sampling methods in a face recognition task

Fitting model on imbalanced datasets and how to fight bias

Compare sampler combining over- and under-sampling

Bagging classifiers using sampler

Evaluate classification by compiling a report

Metrics specific to imbalanced learning

Plotting Validation Curves

Compare over-sampling samplers

Usage of pipeline embedding samplers

Compare under-sampling samplers
