使用EnbPI进行时间序列回归，一种保形预测方法¶

在这个例子中，我们展示了EnbPI，一种用于时间序列回归的保形预测方法。对于数据和模型定义，我们遵循了scikit-learn示例Time-related feature engineering。

下载和准备数据¶

我们下载了一个共享单车需求数据集。给定各种特征，我们有兴趣预测随时间变化的单车需求数量。

[1]:

from sklearn.datasets import fetch_openml

bike_sharing = fetch_openml(
    "Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas"
)
df = bike_sharing.frame

[2]:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 2))
average_week_demand = df.groupby(["weekday", "hour"])["count"].mean()
average_week_demand.plot(ax=ax)
_ = ax.set(
    title="Average hourly bike demand during the week",
    xticks=[i * 24 for i in range(7)],
    xticklabels=["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    xlabel="Time of the week",
    ylabel="Number of bike rentals",
)

../_images/examples_enbpi_ts_regression_5_0.svg

[3]:

from sklearn.model_selection import train_test_split
import numpy as np

df = df.sample(500)
y = df["count"] / df["count"].max()
X = df.drop("count", axis="columns")
X_train, X_test = train_test_split(X, test_size=0.2, shuffle=False)
y_train, y_test = train_test_split(y, test_size=0.2, shuffle=False)

数据引导¶

EnbPI 需要对数据进行自举，即对时间序列的随机子集进行有放回抽样，并为每个样本训练一个模型。

[4]:

class DataFrameBootstrapper:
    def __init__(self, n_samples: int):
        self.n_samples = n_samples

    def __call__(
        self, X: np.ndarray, y: np.ndarray
    ) -> tuple[np.ndarray, list[tuple[np.ndarray, np.ndarray]]]:
        indices = np.random.choice(y.shape[0], size=(self.n_samples, y.shape[0]))
        return indices, [(X.iloc[idx], y.iloc[idx]) for idx in indices]


import numpy as np

n_bs_samples = 10
bs_indices, bs_train_data = DataFrameBootstrapper(n_samples=n_bs_samples)(
    X_train, y_train
)

模型定义¶

为了适应时间序列，我们定义了一个基于直方图的梯度提升回归树，同时对分类和序数特征进行编码。

[5]:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingRegressor

categorical_columns = ["weather", "season", "holiday", "workingday"]
categories = [
    ["clear", "misty", "rain", "heavy_rain"],
    ["spring", "summer", "fall", "winter"],
    ["False", "True"],
    ["False", "True"],
]
ordinal_encoder = OrdinalEncoder(categories=categories)

gbrt_pipeline = make_pipeline(
    ColumnTransformer(
        transformers=[
            ("categorical", ordinal_encoder, categorical_columns),
        ],
        remainder="passthrough",
        verbose_feature_names_out=False,
    ),
    HistGradientBoostingRegressor(
        categorical_features=categorical_columns,
    ),
).set_output(transform="pandas")

每个引导样本的模型训练¶

我们现在准备为每个自举样本数据训练模型。对于每个自举样本，我们按照EnbPI所需的格式存储所有训练和测试数据点的预测结果。

[6]:

bs_train_preds = np.zeros((n_bs_samples, X_train.shape[0]))
bs_test_preds = np.zeros((n_bs_samples, X_test.shape[0]))
for i, batch in enumerate(bs_train_data):
    gbrt_pipeline.fit(*batch)
    bs_train_preds[i] = gbrt_pipeline.predict(X_train)
    bs_test_preds[i] = gbrt_pipeline.predict(X_test)

使用EnbPI的保形区间¶

我们利用Fortuna中的EnbPI实现来计算时间序列测试数据的共形区间，给定上述的训练和测试自举预测。我们选择覆盖误差等于0.05，对应于95%的置信度。

[7]:

from fortuna.conformal import EnbPI

conformal_intervals = EnbPI().conformal_interval(
    bootstrap_indices=bs_indices,
    bootstrap_train_preds=bs_train_preds,
    bootstrap_test_preds=bs_test_preds,
    train_targets=y_train.values,
    error=0.05,
)

为了评估条件覆盖，我们测量预测区间覆盖概率（PICP），即实际落在符合区间内的测试目标变量的百分比。我们进一步测量包含模型给出的点预测的区间百分比。最后，我们测量符合区间的大小，如果没有提供在线反馈，如在本例中，EnbPI 认为每个区间的大小相同。

[8]:

from fortuna.metric.regression import prediction_interval_coverage_probability

print(
    "Percentage of intervals containing average bootstrap predictions: "
    f"{prediction_interval_coverage_probability(*conformal_intervals.T, bs_test_preds.mean(0))}."
)
print(
    "Percentage of intervals containing true targets: "
    f"{prediction_interval_coverage_probability(*conformal_intervals.T, y_test.values)}."
)
print(f"Size of the conformal intervals: {np.diff(conformal_intervals)[0][0]}")

Percentage of intervals containing average bootstrap predictions: 1.0.
Percentage of intervals containing true targets: 0.9699999690055847.
Size of the conformal intervals: 0.49568092823028564

很高兴看到所有区间都包含了各自的点预测。另一方面，估计的条件覆盖概率约为80%，低于期望的95%。请注意，EnbPI满足的是近似边际覆盖保证，而不是条件覆盖保证，并且只有在数据中的残差在每个时间步都是同分布的假设下才成立。这些事实可能解释了这里观察到的覆盖不足。

[9]:

def weakly_avg(x):
    s = x.shape[0] // 7
    x = x[: s * 7]
    return x.reshape(7, s, *x.shape[1:]).mean(0)


weekly_avg_test = weakly_avg(y_test.values)
n_weeks = weekly_avg_test.shape[0]

plt.figure(figsize=(12, 2))
plt.plot(weakly_avg(y_test.values), label="weekly averaged true test target")
plt.fill_between(
    np.arange(n_weeks),
    *weakly_avg(conformal_intervals).T,
    alpha=0.5,
    color="C0",
    label="weekly averaged conformal interval",
)
plt.xlabel("test weeks", fontsize=14)
plt.legend(fontsize=11, loc="upper right")

[9]:

<matplotlib.legend.Legend at 0x7ff9c46f9650>

../_images/examples_enbpi_ts_regression_22_1.svg

在上面的例子中，EnbPI假设测试数据集中的所有预测都是一次性完成的，没有来自测试目标的新观测数据的在线反馈。因此，所有测试数据点的置信区间大小是相同的。然而，如果我们在一个在线环境中工作，其中测试目标是逐步观察到的，EnbPI可以利用这种反馈来改进置信区间。以下部分提供了一个如何做到这一点的示例。

带有在线反馈的EnbPI¶

在本节中，我们假设测试数据是分批到达的，并且对于每个任务，我们必须进行预测并计算符合区间。为了简化，我们将考虑每批大小为1个数据点，即在每个测试数据点到达后进行预测。在不重新训练模型的情况下，我们将使用在第一时间步估计的残差，以及新预测提供的反馈，来动态计算改进的符合区间。

[10]:

batch_size = 1
conformal_intervals2 = np.zeros((len(y_test), 2))
for i in range(0, len(y_test), batch_size):
    if i == 0:
        conformal_intervals2[:batch_size], train_residuals = EnbPI().conformal_interval(
            bootstrap_indices=bs_indices,
            bootstrap_train_preds=bs_train_preds,
            bootstrap_test_preds=bs_test_preds[:, :batch_size],
            train_targets=y_train.values,
            error=0.05,
            return_residuals=True,
        )
    else:
        (
            conformal_intervals2[i : i + batch_size],
            train_residuals,
        ) = EnbPI().conformal_interval_from_residuals(
            train_residuals=train_residuals,
            bootstrap_new_train_preds=bs_test_preds[:, i - batch_size : i],
            bootstrap_new_test_preds=bs_test_preds[:, i : i + batch_size],
            new_train_targets=y_test.values[i - batch_size : i],
            error=0.05,
        )

与上述操作类似，我们计算预测值和真实测试目标落在符合区间内的百分比。

[11]:

print(
    "Percentage of intervals containing average bootstrap predictions: "
    f"{prediction_interval_coverage_probability(*conformal_intervals2.T, bs_test_preds.mean(0))}."
)
print(
    "Percentage of intervals containing true targets: "
    f"{prediction_interval_coverage_probability(*conformal_intervals2.T, y_test.values)}."
)

Percentage of intervals containing average bootstrap predictions: 1.0.
Percentage of intervals containing true targets: 0.9699999690055847.

再次，很高兴看到所有保形区间都包含了点预测。此外，包含真实目标的区间百分比增加到了大约83%，更接近期望的95%覆盖率。

[12]:

plt.figure(figsize=(12, 2))
plt.plot(weakly_avg(y_test.values), label="weekly averaged true test target")
plt.fill_between(
    np.arange(n_weeks),
    *weakly_avg(conformal_intervals2).T,
    alpha=0.5,
    color="C0",
    label="weekly averaged conformal interval",
)
plt.xlabel("test weeks", fontsize=14)
plt.legend(fontsize=11, loc="upper right")

[12]:

<matplotlib.legend.Legend at 0x7ff9c47c6f90>

../_images/examples_enbpi_ts_regression_30_1.svg

以下图表比较了没有和有在线反馈的区间大小。

[13]:

plt.figure(figsize=(12, 2))
plt.title("Conformal interval size comparison")
plt.plot(np.diff(conformal_intervals), label="without online feedback")
plt.plot(np.diff(conformal_intervals2), label="with online feedback")
plt.xlabel("test set", fontsize=14)
plt.ylabel("size", fontsize=14)
plt.legend(fontsize=12)

[13]:

<matplotlib.legend.Legend at 0x7ff9c4439490>

../_images/examples_enbpi_ts_regression_32_1.svg

我们应该再次指出，关于保形区间的反馈是在不需要重新训练模型的情况下提供的，这在实际应用中非常有用。然而，如果数据的分布随着时间的推移开始漂移，保形区间可能会逐渐变大并变得不可用。在这种情况下，可以跟踪保形区间的大小，并在达到某个阈值后触发重新训练。