使用EnbPI进行时间序列回归,一种保形预测方法¶
在这个例子中,我们展示了EnbPI,一种用于时间序列回归的保形预测方法。对于数据和模型定义,我们遵循了scikit-learn示例Time-related feature engineering。
下载和准备数据¶
我们下载了一个共享单车需求数据集。给定各种特征,我们有兴趣预测随时间变化的单车需求数量。
[1]:
from sklearn.datasets import fetch_openml
bike_sharing = fetch_openml(
"Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas"
)
df = bike_sharing.frame
[2]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 2))
average_week_demand = df.groupby(["weekday", "hour"])["count"].mean()
average_week_demand.plot(ax=ax)
_ = ax.set(
title="Average hourly bike demand during the week",
xticks=[i * 24 for i in range(7)],
xticklabels=["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
xlabel="Time of the week",
ylabel="Number of bike rentals",
)
[3]:
from sklearn.model_selection import train_test_split
import numpy as np
df = df.sample(500)
y = df["count"] / df["count"].max()
X = df.drop("count", axis="columns")
X_train, X_test = train_test_split(X, test_size=0.2, shuffle=False)
y_train, y_test = train_test_split(y, test_size=0.2, shuffle=False)
数据引导¶
EnbPI 需要对数据进行自举,即对时间序列的随机子集进行有放回抽样,并为每个样本训练一个模型。
[4]:
class DataFrameBootstrapper:
def __init__(self, n_samples: int):
self.n_samples = n_samples
def __call__(
self, X: np.ndarray, y: np.ndarray
) -> tuple[np.ndarray, list[tuple[np.ndarray, np.ndarray]]]:
indices = np.random.choice(y.shape[0], size=(self.n_samples, y.shape[0]))
return indices, [(X.iloc[idx], y.iloc[idx]) for idx in indices]
import numpy as np
n_bs_samples = 10
bs_indices, bs_train_data = DataFrameBootstrapper(n_samples=n_bs_samples)(
X_train, y_train
)
模型定义¶
为了适应时间序列,我们定义了一个基于直方图的梯度提升回归树,同时对分类和序数特征进行编码。
[5]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingRegressor
categorical_columns = ["weather", "season", "holiday", "workingday"]
categories = [
["clear", "misty", "rain", "heavy_rain"],
["spring", "summer", "fall", "winter"],
["False", "True"],
["False", "True"],
]
ordinal_encoder = OrdinalEncoder(categories=categories)
gbrt_pipeline = make_pipeline(
ColumnTransformer(
transformers=[
("categorical", ordinal_encoder, categorical_columns),
],
remainder="passthrough",
verbose_feature_names_out=False,
),
HistGradientBoostingRegressor(
categorical_features=categorical_columns,
),
).set_output(transform="pandas")
每个引导样本的模型训练¶
我们现在准备为每个自举样本数据训练模型。对于每个自举样本,我们按照EnbPI所需的格式存储所有训练和测试数据点的预测结果。
[6]:
bs_train_preds = np.zeros((n_bs_samples, X_train.shape[0]))
bs_test_preds = np.zeros((n_bs_samples, X_test.shape[0]))
for i, batch in enumerate(bs_train_data):
gbrt_pipeline.fit(*batch)
bs_train_preds[i] = gbrt_pipeline.predict(X_train)
bs_test_preds[i] = gbrt_pipeline.predict(X_test)
使用EnbPI的保形区间¶
我们利用Fortuna中的EnbPI实现来计算时间序列测试数据的共形区间,给定上述的训练和测试自举预测。我们选择覆盖误差等于0.05,对应于95%的置信度。
[7]:
from fortuna.conformal import EnbPI
conformal_intervals = EnbPI().conformal_interval(
bootstrap_indices=bs_indices,
bootstrap_train_preds=bs_train_preds,
bootstrap_test_preds=bs_test_preds,
train_targets=y_train.values,
error=0.05,
)
为了评估条件覆盖,我们测量预测区间覆盖概率(PICP),即实际落在符合区间内的测试目标变量的百分比。我们进一步测量包含模型给出的点预测的区间百分比。最后,我们测量符合区间的大小,如果没有提供在线反馈,如在本例中,EnbPI 认为每个区间的大小相同。
[8]:
from fortuna.metric.regression import prediction_interval_coverage_probability
print(
"Percentage of intervals containing average bootstrap predictions: "
f"{prediction_interval_coverage_probability(*conformal_intervals.T, bs_test_preds.mean(0))}."
)
print(
"Percentage of intervals containing true targets: "
f"{prediction_interval_coverage_probability(*conformal_intervals.T, y_test.values)}."
)
print(f"Size of the conformal intervals: {np.diff(conformal_intervals)[0][0]}")
Percentage of intervals containing average bootstrap predictions: 1.0.
Percentage of intervals containing true targets: 0.9699999690055847.
Size of the conformal intervals: 0.49568092823028564
很高兴看到所有区间都包含了各自的点预测。另一方面,估计的条件覆盖概率约为80%,低于期望的95%。请注意,EnbPI满足的是近似边际覆盖保证,而不是条件覆盖保证,并且只有在数据中的残差在每个时间步都是同分布的假设下才成立。这些事实可能解释了这里观察到的覆盖不足。
[9]:
def weakly_avg(x):
s = x.shape[0] // 7
x = x[: s * 7]
return x.reshape(7, s, *x.shape[1:]).mean(0)
weekly_avg_test = weakly_avg(y_test.values)
n_weeks = weekly_avg_test.shape[0]
plt.figure(figsize=(12, 2))
plt.plot(weakly_avg(y_test.values), label="weekly averaged true test target")
plt.fill_between(
np.arange(n_weeks),
*weakly_avg(conformal_intervals).T,
alpha=0.5,
color="C0",
label="weekly averaged conformal interval",
)
plt.xlabel("test weeks", fontsize=14)
plt.legend(fontsize=11, loc="upper right")
[9]:
<matplotlib.legend.Legend at 0x7ff9c46f9650>
在上面的例子中,EnbPI假设测试数据集中的所有预测都是一次性完成的,没有来自测试目标的新观测数据的在线反馈。因此,所有测试数据点的置信区间大小是相同的。然而,如果我们在一个在线环境中工作,其中测试目标是逐步观察到的,EnbPI可以利用这种反馈来改进置信区间。以下部分提供了一个如何做到这一点的示例。
带有在线反馈的EnbPI¶
在本节中,我们假设测试数据是分批到达的,并且对于每个任务,我们必须进行预测并计算符合区间。为了简化,我们将考虑每批大小为1个数据点,即在每个测试数据点到达后进行预测。在不重新训练模型的情况下,我们将使用在第一时间步估计的残差,以及新预测提供的反馈,来动态计算改进的符合区间。
[10]:
batch_size = 1
conformal_intervals2 = np.zeros((len(y_test), 2))
for i in range(0, len(y_test), batch_size):
if i == 0:
conformal_intervals2[:batch_size], train_residuals = EnbPI().conformal_interval(
bootstrap_indices=bs_indices,
bootstrap_train_preds=bs_train_preds,
bootstrap_test_preds=bs_test_preds[:, :batch_size],
train_targets=y_train.values,
error=0.05,
return_residuals=True,
)
else:
(
conformal_intervals2[i : i + batch_size],
train_residuals,
) = EnbPI().conformal_interval_from_residuals(
train_residuals=train_residuals,
bootstrap_new_train_preds=bs_test_preds[:, i - batch_size : i],
bootstrap_new_test_preds=bs_test_preds[:, i : i + batch_size],
new_train_targets=y_test.values[i - batch_size : i],
error=0.05,
)
与上述操作类似,我们计算预测值和真实测试目标落在符合区间内的百分比。
[11]:
print(
"Percentage of intervals containing average bootstrap predictions: "
f"{prediction_interval_coverage_probability(*conformal_intervals2.T, bs_test_preds.mean(0))}."
)
print(
"Percentage of intervals containing true targets: "
f"{prediction_interval_coverage_probability(*conformal_intervals2.T, y_test.values)}."
)
Percentage of intervals containing average bootstrap predictions: 1.0.
Percentage of intervals containing true targets: 0.9699999690055847.
再次,很高兴看到所有保形区间都包含了点预测。此外,包含真实目标的区间百分比增加到了大约83%,更接近期望的95%覆盖率。
[12]:
plt.figure(figsize=(12, 2))
plt.plot(weakly_avg(y_test.values), label="weekly averaged true test target")
plt.fill_between(
np.arange(n_weeks),
*weakly_avg(conformal_intervals2).T,
alpha=0.5,
color="C0",
label="weekly averaged conformal interval",
)
plt.xlabel("test weeks", fontsize=14)
plt.legend(fontsize=11, loc="upper right")
[12]:
<matplotlib.legend.Legend at 0x7ff9c47c6f90>
以下图表比较了没有和有在线反馈的区间大小。
[13]:
plt.figure(figsize=(12, 2))
plt.title("Conformal interval size comparison")
plt.plot(np.diff(conformal_intervals), label="without online feedback")
plt.plot(np.diff(conformal_intervals2), label="with online feedback")
plt.xlabel("test set", fontsize=14)
plt.ylabel("size", fontsize=14)
plt.legend(fontsize=12)
[13]:
<matplotlib.legend.Legend at 0x7ff9c4439490>
我们应该再次指出,关于保形区间的反馈是在不需要重新训练模型的情况下提供的,这在实际应用中非常有用。然而,如果数据的分布随着时间的推移开始漂移,保形区间可能会逐渐变大并变得不可用。在这种情况下,可以跟踪保形区间的大小,并在达到某个阈值后触发重新训练。