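Multivariate Forecasting with TSMixer

A tutorial on multivariate forecasting with TSMixer models.

In _multivariate_ forecasting, we use the information from every time series to produce the forecasts for all time series jointly. In contrast, in _univariate_ forecasting we only consider the information from each individual time series and produce a forecast for each series separately. Multivariate methods thus use more information per forecast and should, in principle, be able to produce better forecasts. However, their computational cost also grows with the number of time series, which means they are commonly less well suited for large-scale problems (i.e. forecasting many time series).

In this notebook, we demonstrate the performance of a state-of-the-art multivariate forecasting architecture, TSMixer / TSMixerx, compared to a univariate forecasting method (NHITS) and a simple MLP-based multivariate method (MLPMultivariate).

We will show how to:
* Load the ETTm2 benchmark dataset, which is used in the academic literature.
* Train TSMixer, TSMixerx and MLPMultivariate models
* Forecast the test set
* Optimize the hyperparameters

You can run these experiments on a GPU with Google Colab.

1. Installing libraries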

%%capture
!pip install neuralforecast datasetsforecast

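2. Load ETTm2 Data

The LongHorizon class automatically downloads the complete ETTm2 dataset and processes it.

It returns three DataFrames: Y_df contains the values of the target variables, X_df contains exogenous calendar features, and S_df contains static features for each time series (none for ETTm2). In this example we use Y_df and X_df.

With TSMixerx, we can make use of the additional exogenous features contained in X_df. TSMixer does _not_ support exogenous features, so use TSMixerx if you want to include them.

If you want to use your own data, just replace Y_df and X_df. Make sure to use the long format, with a structure similar to our dataset.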
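As a rough illustration of the long format mentioned above, the sketch below reshapes a small wide table (one column per series) into rows of unique_id, ds and y. The table wide_df and its values are hypothetical and only serve to show the reshaping; the column names follow the Y_df used in this tutorial.

```python
import pandas as pd

# Hypothetical wide table: one row per timestamp, one column per series.
wide_df = pd.DataFrame({
    "ds": pd.date_range("2016-07-01", periods=3, freq="15min"),
    "HUFL": [0.1, 0.2, 0.3],
    "OT": [1.0, 1.1, 1.2],
})

# Melt into long format: one row per (unique_id, ds) pair, value stored in y.
long_df = (
    wide_df.melt(id_vars="ds", var_name="unique_id", value_name="y")
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)[["unique_id", "ds", "y"]]

print(long_df)
```

Exogenous features, if any, would then be additional columns keyed by the same unique_id and ds.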

import pandas as pd

from datasetsforecast.long_horizon import LongHorizon

# Change this to your own data to try the model
Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# X_df contains the exogenous features, which we add to Y_df.
X_df['ds'] = pd.to_datetime(X_df['ds'])
Y_df = Y_df.merge(X_df, on=['unique_id', 'ds'], how='left')

# We make validation and test splits
n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)

Y_df

	unique_id	ds	y	ex_1	ex_2	ex_3	ex_4
0	HUFL	2016-07-01 00:00:00	-0.041413	-0.500000	0.166667	-0.500000	-0.001370
1	HUFL	2016-07-01 00:15:00	-0.185467	-0.500000	0.166667	-0.500000	-0.001370
2	HUFL	2016-07-01 00:30:00	-0.257495	-0.500000	0.166667	-0.500000	-0.001370
3	HUFL	2016-07-01 00:45:00	-0.577510	-0.500000	0.166667	-0.500000	-0.001370
4	HUFL	2016-07-01 01:00:00	-0.385501	-0.456522	0.166667	-0.500000	-0.001370
...	...	...	...	...	...	...	...
403195	OT	2018-02-20 22:45:00	-1.581325	0.456522	-0.333333	0.133333	-0.363014
403196	OT	2018-02-20 23:00:00	-1.581325	0.500000	-0.333333	0.133333	-0.363014
403197	OT	2018-02-20 23:15:00	-1.581325	0.500000	-0.333333	0.133333	-0.363014
403198	OT	2018-02-20 23:30:00	-1.562328	0.500000	-0.333333	0.133333	-0.363014
403199	OT	2018-02-20 23:45:00	-1.562328	0.500000	-0.333333	0.133333	-0.363014

403200 rows × 7 columns
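The split sizes computed above can be checked by hand from the row count shown in the output. The sketch below assumes that all 7 series share the same timestamps and that the splits are chronological, with the test set taken from the end of the series and the validation set right before it.

```python
# Numbers taken from the Y_df output above.
n_rows, n_series = 403200, 7

# Assumption: every series covers the same timestamps.
n_time = n_rows // n_series

val_size = int(.2 * n_time)
test_size = int(.2 * n_time)
train_size = n_time - val_size - test_size

print(f"timestamps per series: {n_time}")
print(f"train / val / test: {train_size} / {val_size} / {test_size}")
```

At a 15-minute frequency, n_time timestamps correspond to n_time / 96 days of data.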

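3. Train models

We will train models using the cross_validation method, which allows users to automatically simulate multiple historic forecasts (in the test set).

The cross_validation method will use the validation set for hyperparameter selection and early stopping, and will then produce the forecasts for the test set.

First, instantiate each model in the models list, specifying the horizon, input_size, and training iterations. In this notebook, we compare against the univariate NHITS and the multivariate MLPMultivariate models.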

# %%capture
from neuralforecast.core import NeuralForecast
from neuralforecast.models import TSMixer, TSMixerx, NHITS, MLPMultivariate
from neuralforecast.losses.pytorch import MSE, MAE

%%capture
horizon = 96
input_size = 512
models = [
          TSMixer(h=horizon,
                input_size=input_size,
                n_series=7,
                max_steps=1000,
                val_check_steps=100,
                early_stop_patience_steps=5,
                scaler_type='identity',
                valid_loss=MAE(),
                random_seed=12345678,
                ),  
          TSMixerx(h=horizon,
                input_size=input_size,
                n_series=7,
                max_steps=1000,
                val_check_steps=100,
                early_stop_patience_steps=5,
                scaler_type='identity',
                dropout=0.7,
                valid_loss=MAE(),
                random_seed=12345678,
                futr_exog_list=['ex_1', 'ex_2', 'ex_3', 'ex_4'],
                ),
          MLPMultivariate(h=horizon,
                input_size=input_size,
                n_series=7,
                max_steps=1000,
                val_check_steps=100,
                early_stop_patience_steps=5,
                scaler_type='standard',
                hidden_size=256,
                valid_loss=MAE(),
                random_seed=12345678,
                ),                                             
           NHITS(h=horizon,
                input_size=horizon,
                max_steps=1000,
                val_check_steps=100,
                early_stop_patience_steps=5,
                scaler_type='robust',
                valid_loss=MAE(),
                random_seed=12345678,
                ),                                                                       
         ]

INFO:lightning_fabric.utilities.seed:Seed set to 12345678
INFO:lightning_fabric.utilities.seed:Seed set to 12345678
INFO:lightning_fabric.utilities.seed:Seed set to 12345678
INFO:lightning_fabric.utilities.seed:Seed set to 12345678
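The models above set val_check_steps=100 and early_stop_patience_steps=5. A minimal sketch of the stopping rule this implies is shown below, assuming patience counts consecutive validation checks without improvement. The function early_stop_step and the loss values are hypothetical; the actual callback used during training has more options (for example a minimum improvement threshold).

```python
def early_stop_step(val_losses, val_check_steps=100, patience=5):
    """Return the training step at which training would stop,
    or None if patience is never exhausted."""
    best = float("inf")
    checks_without_improvement = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return i * val_check_steps
    return None

# Hypothetical validation losses, one per validation check.
losses = [0.40, 0.35, 0.33, 0.34, 0.36, 0.35, 0.34, 0.37, 0.30]
print(early_stop_step(losses))  # 800
```

In this example the best loss occurs at the third check, and training stops five checks later, at step 800, before the later improvement is ever seen.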

Tip

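Check our auto models for automatic hyperparameter optimization, and see the end of this tutorial for an example of hyperparameter tuning.

Instantiate a NeuralForecast object with the following required parameters:

models: a list of models.
freq: a string indicating the frequency of the data. (See pandas' available frequencies.)

Second, use the cross_validation method, specifying the dataset (Y_df), the validation size and the test size.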

%%capture
nf = NeuralForecast(
    models=models,
    freq='15min')

Y_hat_df = nf.cross_validation(df=Y_df,
                               val_size=val_size,
                               test_size=test_size,
                               n_windows=None
                               )                                 
Y_hat_df = Y_hat_df.reset_index()

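The cross_validation method will return the forecasts for each model on the test set.

4. Evaluate Results

Next, we plot the forecasts on the test set for the OT variable for all models.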

import matplotlib.pyplot as plt
Y_plot = Y_hat_df[Y_hat_df['unique_id']=='OT'] # OT dataset
cutoffs = Y_hat_df['cutoff'].unique()[::horizon]
Y_plot = Y_plot[Y_plot['cutoff'].isin(cutoffs)]

plt.figure(figsize=(20,5))
plt.plot(Y_plot['ds'], Y_plot['y'], label='True')
for model in models:
    plt.plot(Y_plot['ds'], Y_plot[f'{model}'], label=f'{model}')
plt.xlabel('Datestamp')
plt.ylabel('OT')
plt.grid()
plt.legend()

Finally, we compute the test errors using the Mean Absolute Error (MAE) and Mean Squared Error (MSE):

\(\qquad MAE = \frac{1}{Windows * Horizon} \sum_{\tau} |y_{\tau} - \hat{y}_{\tau}| \qquad\) and \(\qquad MSE = \frac{1}{Windows * Horizon} \sum_{\tau} (y_{\tau} - \hat{y}_{\tau})^{2} \qquad\)
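To make the two formulas concrete, here is a minimal NumPy sketch that averages the absolute and squared errors over all forecasted points. The names mae_sketch and mse_sketch and the toy values are hypothetical; they are not the library functions used in this notebook.

```python
import numpy as np

def mae_sketch(y, y_hat):
    """Mean of absolute errors over all forecasted points."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mse_sketch(y, y_hat):
    """Mean of squared errors over all forecasted points."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Toy values: errors are 0.5, 0.0, 1.0 and 1.0.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 5.0]

print(mae_sketch(y_true, y_pred))  # 0.625
print(mse_sketch(y_true, y_pred))  # 0.5625
```

Because MSE squares each error, the two errors of size 1.0 dominate it more strongly than they dominate MAE.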

from neuralforecast.losses.numpy import mse, mae

for model in models:
    mae_model = mae(Y_hat_df['y'], Y_hat_df[f'{model}'])
    mse_model = mse(Y_hat_df['y'], Y_hat_df[f'{model}'])
    print(f'{model} horizon {horizon} - MAE: {mae_model:.3f}')
    print(f'{model} horizon {horizon} - MSE: {mse_model:.3f}')

TSMixer horizon 96 - MAE: 0.250
TSMixer horizon 96 - MSE: 0.163
TSMixerx horizon 96 - MAE: 0.257
TSMixerx horizon 96 - MSE: 0.170
MLPMultivariate horizon 96 - MAE: 0.322
MLPMultivariate horizon 96 - MSE: 0.257
NHITS horizon 96 - MAE: 0.251
NHITS horizon 96 - MSE: 0.179

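For reference, we can compare these results with the self-reported performance in the paper. We find that TSMixer provides better results than the univariate method NHITS. Our implementation of TSMixer also closely tracks the results of the original paper. Finally, there seems to be little benefit in using the additional exogenous variables contained in the dataframe X_df, as TSMixerx performs worse than TSMixer, especially on longer horizons. Note also that MLPMultivariate clearly performs worse than the other methods, which is somewhat expected given its relative simplicity.

Mean Absolute Error (MAE)

Horizon	TSMixer (this notebook)	TSMixer (paper)	TSMixerx (this notebook)	NHITS (this notebook)	NHITS (paper)	MLPMultivariate (this notebook)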
96	0.250	0.252	0.257	0.251	0.255	0.322
192	0.288	0.290	0.300	0.291	0.305	0.361
336	0.323	0.324	0.380	0.344	0.346	0.390
720	0.377	0.422	0.464	0.417	0.413	0.608

Mean Squared Error (MSE)

Horizon	TSMixer (this notebook)	TSMixer (paper)	TSMixerx (this notebook)	NHITS (this notebook)	NHITS (paper)	MLPMultivariate (this notebook)
96	0.163	0.163	0.170	0.179	0.176	0.255
192	0.220	0.216	0.231	0.239	0.245	0.330
336	0.272	0.268	0.361	0.311	0.295	0.376
720	0.356	0.420	0.493	0.451	0.401	3.421

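Note that for the tables above, we use the same hyperparameters for all methods across all horizons, whereas the original papers tune the hyperparameters for each horizon.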
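As a small worked example of reading the MSE table, the sketch below computes the relative MSE reduction of TSMixer over NHITS for each horizon, using only the "this notebook" columns reported above. The variable names are illustrative.

```python
# MSE values copied from the table above ("this notebook" columns).
mse_tsmixer = {96: 0.163, 192: 0.220, 336: 0.272, 720: 0.356}
mse_nhits = {96: 0.179, 192: 0.239, 336: 0.311, 720: 0.451}

# Relative reduction: (baseline - model) / baseline.
relative_reduction = {
    h: (mse_nhits[h] - mse_tsmixer[h]) / mse_nhits[h]
    for h in mse_tsmixer
}

for h, r in relative_reduction.items():
    print(f"horizon {h}: TSMixer MSE is {100 * r:.1f}% lower than NHITS")
```

On these numbers the gap is largest at the longest horizon.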

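5. Tuning the hyperparameters

The AutoTSMixer / AutoTSMixerx classes automatically perform hyperparameter tuning using the Tune library, exploring a user-defined or default search space. Models are selected based on the error on a validation set, and the best model is then stored and used during inference.

The AutoTSMixer.default_config / AutoTSMixerx.default_config attributes contain a suggested hyperparameter space. Here, we specify a different search space that follows the paper's hyperparameters. Feel free to play around with this space.

In this example, we optimize the hyperparameters for horizon = 96.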

from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch
from neuralforecast.auto import AutoTSMixer, AutoTSMixerx

horizon = 96 # 96 steps * 15 minutes = 24 hours.

tsmixer_config = {
       "input_size": input_size,                                                 # Size of input window
       "max_steps": tune.choice([500, 1000, 2000]),                              # Number of training iterations
       "val_check_steps": 100,                                                   # Compute validation every x steps
       "early_stop_patience_steps": 5,                                           # Validation checks without improvement before stopping
       "learning_rate": tune.loguniform(1e-4, 1e-2),                             # Initial learning rate
       "n_block": tune.choice([1, 2, 4, 6, 8]),                                  # Number of mixing layers
       "dropout": tune.uniform(0.0, 0.99),                                       # Dropout
       "ff_dim": tune.choice([32, 64, 128]),                                     # Dimension of the feature linear layer
       "scaler_type": 'identity',       
    }

tsmixerx_config = tsmixer_config.copy()
tsmixerx_config['futr_exog_list'] = ['ex_1', 'ex_2', 'ex_3', 'ex_4']
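To give a feel for what such a search space means, the sketch below draws 10 configurations from a space shaped like tsmixer_config and keeps the one with the lowest validation loss. It uses plain random search with the standard library rather than Ray Tune or HyperOpt, and toy_validation_loss is a hypothetical stand-in for training a model and measuring its validation MAE.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a space shaped like tsmixer_config."""
    return {
        "max_steps": rng.choice([500, 1000, 2000]),
        # Log-uniform: uniform in log space, so each decade is equally likely.
        "learning_rate": math.exp(rng.uniform(math.log(1e-4), math.log(1e-2))),
        "n_block": rng.choice([1, 2, 4, 6, 8]),
        "dropout": rng.uniform(0.0, 0.99),
        "ff_dim": rng.choice([32, 64, 128]),
    }

def toy_validation_loss(config):
    """Hypothetical objective; a real search would train and validate a model."""
    return abs(math.log10(config["learning_rate"]) + 3) + abs(config["dropout"] - 0.5)

rng = random.Random(0)
trials = [sample_config(rng) for _ in range(10)]  # analogous to num_samples=10
best = min(trials, key=toy_validation_loss)
print(best)
```

A search algorithm such as HyperOpt differs from this sketch in that it uses the losses of earlier trials to decide which configurations to try next.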

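To instantiate AutoTSMixer and AutoTSMixerx you need to define:

h: forecasting horizon.
n_series: number of time series in the multivariate time series problem.

In addition, we define the following parameters (if not given, the AutoTSMixer / AutoTSMixerx class uses a pre-defined value):
* loss: training loss. Use DistributionLoss to produce probabilistic forecasts.
* config: hyperparameter search space. If None, the AutoTSMixer class uses a pre-defined suggested hyperparameter space.
* num_samples: number of configurations explored. In this example, we use only a limited number of 10.
* search_alg: type of search algorithm used for selecting parameter values within the hyperparameter space.
* backend: the backend used for the hyperparameter optimization search, either ray or optuna.
* valid_loss: the loss used on the validation set during the optimization procedure.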

model = AutoTSMixer(h=horizon,
                    n_series=7,
                    loss=MAE(),
                    config=tsmixer_config,
                    num_samples=10,
                    search_alg=HyperOptSearch(),
                    backend='ray',
                    valid_loss=MAE())

modelx = AutoTSMixerx(h=horizon,
                    n_series=7,
                    loss=MAE(),
                    config=tsmixerx_config,
                    num_samples=10,
                    search_alg=HyperOptSearch(),
                    backend='ray',
                    valid_loss=MAE())

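Now, we fit the models by instantiating a NeuralForecast object with the following required parameters:

models: a list of models.
freq: a string indicating the frequency of the data. (See pandas' available frequencies.)

The cross_validation method allows you to simulate multiple historic forecasts, greatly simplifying pipelines by replacing for loops with fit and predict methods.

With time series data, cross validation is done by defining a sliding window across the historical data and predicting the period that follows it. This form of cross validation gives a better estimate of the model's predictive ability across a wider range of temporal instances, while keeping the data in the training set contiguous, as our models require.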
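The sliding-window idea can be sketched with plain index arithmetic. The helper rolling_windows below is hypothetical and assumes that windows advance by step_size positions and that every window needs a full horizon inside the test set; how the library constructs its windows internally may differ in detail.

```python
def rolling_windows(n_time, test_size, h, step_size=1):
    """Yield (cutoff, first_forecast_idx, last_forecast_idx) per window,
    using 0-based positions along the time axis."""
    first_cutoff = n_time - test_size - 1   # last observation before the test set
    last_cutoff = n_time - h - 1            # latest cutoff with a full horizon left
    for cutoff in range(first_cutoff, last_cutoff + 1, step_size):
        yield cutoff, cutoff + 1, cutoff + h

# Toy example: 20 time steps, the last 8 form the test set, horizon 3.
windows = list(rolling_windows(n_time=20, test_size=8, h=3))
for cutoff, start, end in windows:
    print(f"train up to {cutoff}, forecast {start}..{end}")

# Sizes from this tutorial: 57600 time steps, test_size 11520, horizon 96.
n_windows = sum(1 for _ in rolling_windows(57600, 11520, 96))
print(n_windows)  # 11425
```

With a step of one, consecutive windows overlap heavily, which is why the plotting code earlier keeps only every horizon-th cutoff.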

The cross_validation method will use the validation set for hyperparameter selection, and will then produce the forecasts for the test set.

%%capture
nf = NeuralForecast(models=[model, modelx], freq='15min')
Y_hat_df = nf.cross_validation(df=Y_df, val_size=val_size,
                               test_size=test_size, n_windows=None)

2024-03-22 09:08:28,183 INFO worker.py:1724 -- Started a local Ray instance.
2024-03-22 09:08:29,427 INFO tune.py:220 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2024-03-22 09:08:29,429 INFO tune.py:583 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
2024-03-22 09:08:45,570 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:09:03,688 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:09:17,107 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:09:28,650 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:09:47,489 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:10:19,949 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:11:20,191 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:11:30,224 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:12:06,451 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:12:28,275 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
INFO:lightning_fabric.utilities.seed:Seed set to 1
2024-03-22 09:13:12,831 INFO tune.py:583 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
2024-03-22 09:13:42,119 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:14:08,067 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:14:34,340 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:14:59,946 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:16:44,930 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:17:02,576 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:17:22,409 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:17:41,035 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:18:02,149 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
2024-03-22 09:19:45,156 INFO tensorboardx.py:275 -- Removed the following hyperparameter values when logging to tensorboard: {'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'), 'loss': ('__ref_ph', 'de895953'), 'valid_loss': ('__ref_ph', '004b9a7a')}
INFO:lightning_fabric.utilities.seed:Seed set to 1

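6. Evaluate Results

The AutoTSMixer / AutoTSMixerx class contains a results attribute that stores information about each configuration explored, including the validation loss and the best validation hyperparameters. The result dataframe Y_hat_df that we obtained in the previous step is based on the best config of the hyperparameter search. For AutoTSMixer, the best config is: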

nf.models[0].results.get_best_result().config

{'input_size': 512,
 'max_steps': 2000,
 'val_check_steps': 100,
 'early_stop_patience_steps': 5,
 'learning_rate': 0.0008831625975972278,
 'n_block': 1,
 'dropout': 0.531963627534685,
 'ff_dim': 128,
 'scaler_type': 'identity',
 'n_series': 7,
 'h': 96,
 'loss': MAE(),
 'valid_loss': MAE()}

and for AutoTSMixerx:

nf.models[1].results.get_best_result().config

{'input_size': 512,
 'max_steps': 500,
 'val_check_steps': 100,
 'early_stop_patience_steps': 5,
 'learning_rate': 0.006813015000503828,
 'n_block': 1,
 'dropout': 0.6915259307542235,
 'ff_dim': 32,
 'scaler_type': 'identity',
 'futr_exog_list': ('ex_1', 'ex_2', 'ex_3', 'ex_4'),
 'n_series': 7,
 'h': 96,
 'loss': MAE(),
 'valid_loss': MAE()}

We compute the test errors of the best config for the two metrics of interest:

\(\qquad MAE = \frac{1}{Windows * Horizon} \sum_{\tau} |y_{\tau} - \hat{y}_{\tau}| \qquad\) and \(\qquad MSE = \frac{1}{Windows * Horizon} \sum_{\tau} (y_{\tau} - \hat{y}_{\tau})^{2} \qquad\)

y_true = Y_hat_df.y.values
y_hat_tsmixer = Y_hat_df['AutoTSMixer'].values
y_hat_tsmixerx = Y_hat_df['AutoTSMixerx'].values

# Pass the true values first, matching the earlier mae(y, y_hat) calls
print(f'MAE TSMixer: {mae(y_true, y_hat_tsmixer):.3f}')
print(f'MSE TSMixer: {mse(y_true, y_hat_tsmixer):.3f}')
print(f'MAE TSMixerx: {mae(y_true, y_hat_tsmixerx):.3f}')
print(f'MSE TSMixerx: {mse(y_true, y_hat_tsmixerx):.3f}')

MAE TSMixer: 0.250
MSE TSMixer: 0.163
MAE TSMixerx: 0.264
MSE TSMixerx: 0.178

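We can compare the error metrics for our optimized setting to the earlier setting in which we used the default hyperparameters. In this case, for a horizon of 96, we obtained slightly improved results for TSMixer on MAE. Interestingly, TSMixerx did not improve compared to the default settings. For this dataset, there seems to be limited value in using exogenous features with the TSMixerx architecture for a horizon of 96.

Metric	TSMixer (optimized)	TSMixer (default)	TSMixer (paper)	TSMixerx (optimized)	TSMixerx (default)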
MAE	0.247	0.250	0.252	0.258	0.257
MSE	0.162	0.163	0.163	0.174	0.170

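Note that we only evaluated 10 hyperparameter configurations (num_samples=10), which suggests that forecasting performance may improve further by exploring more hyperparameter configurations.

References

Chen, Si-An, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and Tomas Pfister (2023). “TSMixer: An All-MLP Architecture for Time Series Forecasting.”
Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco, Artur Dubrawski (2021). “NHITS: Neural Hierarchical Interpolation for Time Series Forecasting.” Accepted at AAAI 2023.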
