BaggingRegressor

class sklearn.ensemble.BaggingRegressor(estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)

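A Bagging regressor.

A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples (without replacement), the algorithm is known as Pasting [1]. If samples are drawn with replacement, the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, the method is known as Random Patches [4].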

Read more in the User Guide.

Added in version 0.15.

Parameters:

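estimator object, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeRegressor.

Added in version 1.2: base_estimator was renamed to estimator.

n_estimators int, default=10

The number of base estimators in the ensemble.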

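max_samples int or float, default=1.0

The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.

max_features int or float, default=1.0

The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

If int, then draw max_features features.
If float, then draw max(1, int(max_features * n_features_in_)) features.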
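The int and float interpretations of max_samples and max_features can be checked on a fitted ensemble. The following is a small sketch on synthetic data; the specific values 30 and 0.35 are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

regr = BaggingRegressor(
    n_estimators=5,
    max_samples=30,     # int: 30 samples are drawn for each estimator
    max_features=0.35,  # float: max(1, int(0.35 * 10)) == 3 features
    random_state=0,
).fit(X, y)

# Sizes of the index arrays actually drawn for every base estimator.
samples_per_estimator = [len(idx) for idx in regr.estimators_samples_]
features_per_estimator = [len(idx) for idx in regr.estimators_features_]
print(samples_per_estimator, features_per_estimator)
```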

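bootstrap bool, default=True

Whether samples are drawn with replacement. If False, sampling without replacement is performed.

bootstrap_features bool, default=False

Whether features are drawn with replacement.

oob_score bool, default=False

Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.

warm_start bool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble. See the Glossary.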
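A minimal sketch of growing an ensemble incrementally with warm_start, assuming the default decision-tree base estimator and synthetic data. The ensemble sizes 5 and 8 are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(n_estimators=5, warm_start=True, random_state=0)
regr.fit(X, y)
first_five = list(regr.estimators_)  # keep references to the initial members

# Raise n_estimators and fit again: only the 3 missing estimators are trained.
regr.set_params(n_estimators=8)
regr.fit(X, y)

print(len(regr.estimators_))
```

The estimators fitted by the first call are kept as they are, so a second fit with a larger n_estimators only pays for the additional members.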

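n_jobs int, default=None

The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the Glossary for more details.

random_state int, RandomState instance or None, default=None

Controls the random resampling of the original dataset (sample-wise and feature-wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See the Glossary.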
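The effect of passing an int can be sketched as follows: two ensembles built with the same seed should produce identical predictions, while the individual trees inside one ensemble receive different seeds. The seed 42 and the dataset are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Same integer seed twice: resampling and tree construction are repeated exactly.
pred_a = BaggingRegressor(n_estimators=10, random_state=42).fit(X, y).predict(X)
pred_b = BaggingRegressor(n_estimators=10, random_state=42).fit(X, y).predict(X)
same_seed_matches = np.allclose(pred_a, pred_b)

# The default DecisionTreeRegressor accepts random_state, so each member
# of the ensemble is given its own derived seed.
regr = BaggingRegressor(n_estimators=10, random_state=42).fit(X, y)
seeds = {est.random_state for est in regr.estimators_}
print(same_seed_matches, len(seeds))
```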

verbose int, default=0

Controls the verbosity when fitting and predicting.

Attributes:

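estimator_ estimator: The base estimator from which the ensemble is grown.

Added in version 1.2: base_estimator_ was renamed to estimator_.
n_features_in_ int: Number of features seen during fit.

Added in version 0.24.
feature_names_in_ ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.
estimators_ list of estimators: The collection of fitted sub-estimators.
estimators_samples_ list of arrays: The subset of drawn samples (i.e., the in-bag samples) for each base estimator.
estimators_features_ list of arrays: The subset of drawn features for each base estimator.
oob_score_ float: Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.
oob_prediction_ ndarray of shape (n_samples,): Prediction computed with the out-of-bag estimate on the training set. If n_estimators is small, it is possible that a data point was never left out during the bootstrap. In this case, oob_prediction_ might contain NaN. This attribute exists only when oob_score is True.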
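The out-of-bag attributes can be inspected as in the sketch below. It assumes synthetic data and uses 50 estimators so that every training sample is very likely to be left out of at least one bootstrap draw.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# oob_score=True requires bootstrap=True, which is the default.
regr = BaggingRegressor(n_estimators=50, oob_score=True, random_state=0).fit(X, y)

print(regr.oob_score_)             # R^2 computed from out-of-bag predictions
print(regr.oob_prediction_.shape)  # one out-of-bag prediction per training sample
```

The out-of-bag score gives an estimate of generalization performance without setting aside a separate validation split.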

See also

BaggingClassifier: A Bagging classifier.

References

[1]

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

[2]

L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

[3]

T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

[4]

G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=4,
...                        n_informative=2, n_targets=1,
...                        random_state=0, shuffle=False)
>>> regr = BaggingRegressor(estimator=SVR(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> regr.predict([[0, 0, 0, 0]])
array([-2.8720...])

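property estimators_samples_

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: to reduce the object's memory footprint, the sampling data is not stored, and the list is re-created at each call to the property. Fetching the property may therefore be slower than expected.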
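Because the list is rebuilt on every access, it can be worth reading the property once and reusing the result. A minimal sketch, assuming synthetic data and an arbitrary max_samples of 0.5:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
regr = BaggingRegressor(n_estimators=3, max_samples=0.5, random_state=0).fit(X, y)

# Read the property once; each access regenerates the index arrays.
samples = regr.estimators_samples_
for i, idx in enumerate(samples):
    print(i, len(idx), idx[:5])

# A second access regenerates the same indices from the stored seeds.
again = regr.estimators_samples_
stable = all(np.array_equal(a, b) for a, b in zip(samples, again))
```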

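fit(X, y, *, sample_weight=None, **fit_params)

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:

X {array-like, sparse matrix} of shape (n_samples, n_features): The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
y array-like of shape (n_samples,): The target values (class labels in classification, real numbers in regression).
sample_weight array-like of shape (n_samples,), default=None: Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
**fit_params dict: Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.

Returns:

self object: Fitted estimator.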

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.5.

Returns:

routing MetadataRouter: A MetadataRouter encapsulating routing information.

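get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params dict: Parameter names mapped to their values.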

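predict(X)

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean of the predicted regression targets of the estimators in the ensemble.

Parameters:

X {array-like, sparse matrix} of shape (n_samples, n_features): The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:

y ndarray of shape (n_samples,): The predicted values.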
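The averaging described above can be reproduced by hand from the fitted sub-estimators. This is a sketch on synthetic data; each sub-estimator is evaluated only on the feature columns listed for it in estimators_features_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
regr = BaggingRegressor(n_estimators=10, max_features=0.5, random_state=0).fit(X, y)

# Predict with every sub-estimator on the feature subset it was trained on.
per_estimator = np.array([
    est.predict(X[:, feats])
    for est, feats in zip(regr.estimators_, regr.estimators_features_)
])

# The ensemble prediction is the plain mean over the sub-estimators.
manual = per_estimator.mean(axis=0)
print(np.allclose(manual, regr.predict(X)))
```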

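score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination $R^2$ is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X array-like of shape (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y array-like of shape (n_samples,) or (n_samples, n_outputs): True values for X.
sample_weight array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score float: $R^2$ of self.predict(X) with respect to y.

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 onward, to stay consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).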
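The definition of $R^2$ above can be verified against score on held-out data. The sketch below uses a synthetic dataset and a default train/test split, both of which are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regr = BaggingRegressor(n_estimators=10, random_state=0).fit(X_train, y_train)
y_pred = regr.predict(X_test)

u = ((y_test - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

print(np.isclose(r2_manual, regr.score(X_test, y_test)))
```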

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaggingRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in fit.

Returns:

self object: The updated object.

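set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params dict: Estimator parameters.

Returns:

self estimator instance: Estimator instance.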

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaggingRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns:

self object: The updated object.

Gallery examples

Single estimator versus bagging: bias-variance decomposition