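End-to-End Walkthrough

Model training, evaluation, and selection for multiple time series

Prerequisites

This guide assumes basic familiarity with NeuralForecast. For a minimal example, visit the Quick Start.

Follow this step-by-step guide to build a production-ready forecasting pipeline for multiple time series.

In this guide you will become familiar with the core NeuralForecast class and some of its methods, such as NeuralForecast.fit, NeuralForecast.predict, and NeuralForecast.cross_validation.

We will use a classic benchmarking dataset from the M4 competition. The dataset includes time series from different domains such as finance, economics, and sales. In this example, we will use a subset of the Hourly dataset.

We will model the time series globally: you will train a set of models on the whole dataset and then select the best model for each individual time series. NeuralForecast focuses on speed, simplicity, and scalability, which makes it well suited to this task.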

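Outline:

Install packages.
Read the data.
Explore the data.
Train many models globally for the entire dataset.
Evaluate the models' performance using cross-validation.
Select the best model for every unique time series.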

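Not covered in this guide

Using external regressors or exogenous variables
- Follow this tutorial to include exogenous variables such as weather or holidays, or static variables such as category or family.
Probabilistic forecasting
- Follow this tutorial to generate probabilistic forecasts.
Transfer learning
- To train a model and use it to forecast on different data, see this tutorial.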

Tip

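You can use Colab to run this notebook interactively.

Warning

To reduce computation time, we recommend using a GPU. When using Colab, do not forget to activate it: go to Runtime>Change runtime type and select GPU as the hardware accelerator.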

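1. Install libraries

We assume you already have NeuralForecast installed. Check this guide for instructions on how to install NeuralForecast.

Additionally, we will install s3fs to read from the S3 filesystem of AWS, statsforecast for plotting, and datasetsforecast for common error metrics such as MAE or MASE.

Install the necessary packages using pip install statsforecast s3fs datasetsforecast.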

%%capture
! pip install statsforecast s3fs datasetsforecast

%%capture
! pip install git+https://github.com/Nixtla/neuralforecast.git@main

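2. Read the data

We will use pandas to read the M4 Hourly dataset, which is stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats, such as .csv.

The input to NeuralForecast is always a data frame in long format with three columns: unique_id, ds, and y:

unique_id (string, int, or category) represents an identifier for the series.
ds (datestamp or int) should be either an integer indexing time or, ideally, a datestamp in YYYY-MM-DD format for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
y (numeric) represents the measurement we wish to forecast.

This dataset already satisfies the requirements.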
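
Before training on your own data, it can help to verify the three-column requirement programmatically. The following is a minimal sketch only: the helper name check_long_format, the duplicate check, and the toy frame are assumptions made for illustration and are not part of NeuralForecast.

```python
import pandas as pd

REQUIRED_COLUMNS = ["unique_id", "ds", "y"]

def check_long_format(df: pd.DataFrame) -> None:
    """Raise ValueError if df lacks the unique_id, ds, y long-format layout."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if not pd.api.types.is_numeric_dtype(df["y"]):
        raise ValueError("column 'y' must be numeric")
    # Assumption: one observation per series and time step.
    if df.duplicated(subset=["unique_id", "ds"]).any():
        raise ValueError("each (unique_id, ds) pair must appear only once")

# Hypothetical toy data in long format: two series, two time steps each.
toy_df = pd.DataFrame({
    "unique_id": ["H1", "H1", "H2", "H2"],
    "ds": [1, 2, 1, 2],
    "y": [605.0, 586.0, 12.0, 13.5],
})
check_long_format(toy_df)  # passes silently when the layout is valid
```

The same helper could be called on Y_df right after loading it.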

Depending on your internet connection, this step should take around 10 seconds.

import pandas as pd

Y_df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')

Y_df.head()

	unique_id	ds	y
0	H1	1	605.0
1	H1	2	586.0
2	H1	3	586.0
3	H1	4	559.0
4	H1	5	511.0

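This dataset contains 414 unique series with 900 observations on average. For this example, and for reproducibility, we will select only 10 unique IDs. Depending on your processing infrastructure, feel free to select more or fewer series.

Note

Processing time depends on the available computing resources. Running this example on the complete dataset takes approximately 10 minutes on a c5d.24xlarge (96 cores) instance from AWS.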

uids = Y_df['unique_id'].unique()[:10] # Select 10 ids to make the example faster
Y_df = Y_df.query('unique_id in @uids').reset_index(drop=True)

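3. Explore the data with the plot method of StatsForecast

Plot some series using the plot method from the StatsForecast class. This method plots 8 randomly selected series from the dataset and is useful for basic exploratory data analysis (EDA).

Note

The StatsForecast.plot method uses Plotly as its default plotting engine. You can switch to Matplotlib by setting engine="matplotlib".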

from statsforecast import StatsForecast

StatsForecast.plot(Y_df, engine='matplotlib')


4. Train multiple models for many series

NeuralForecast can efficiently train many models globally on many time series.

from ray import tune

from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM
from neuralforecast.losses.pytorch import MQLoss

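Each Auto model contains a default search space that was extensively tested on multiple large-scale datasets. Additionally, users can define search spaces tailored to particular datasets and tasks.

First, we create a custom search space for the AutoNHITS and AutoLSTM models. A search space is specified with a dictionary, where the keys correspond to the model's hyperparameters and each value is a Tune function that specifies how the hyperparameter will be sampled. For example, use randint to sample integers uniformly and choice to sample values from a list.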

config_nhits = {
    "input_size": tune.choice([48, 48*2, 48*3]),                    # Length of input window
    "start_padding_enabled": True,
    "n_blocks": 5*[1],                                              # Number of blocks in each of the 5 stacks
    "mlp_units": 5 * [[64, 64]],                                    # Hidden layer sizes of the MLP in each stack
    "n_pool_kernel_size": tune.choice([5*[1], 5*[2], 5*[4],
                                      [8, 4, 2, 1, 1]]),            # MaxPooling kernel size
    "n_freq_downsample": tune.choice([[8, 4, 2, 1, 1],
                                      [1, 1, 1, 1, 1]]),            # Interpolation expressivity ratios
    "learning_rate": tune.loguniform(1e-4, 1e-2),                   # Initial learning rate
    "scaler_type": tune.choice([None]),                             # Scaler type
    "max_steps": tune.choice([1000]),                               # Max number of training iterations
    "batch_size": tune.choice([1, 4, 10]),                          # Number of series in batch
    "windows_batch_size": tune.choice([128, 256, 512]),             # Number of windows in batch
    "random_seed": tune.randint(1, 20),                             # Random seed
}

config_lstm = {
    "input_size": tune.choice([48, 48*2, 48*3]),              # Length of input window
    "encoder_hidden_size": tune.choice([64, 128]),            # Hidden size of LSTM cells
    "encoder_n_layers": tune.choice([2,4]),                   # Number of layers in LSTM
    "learning_rate": tune.loguniform(1e-4, 1e-2),             # Initial learning rate
    "scaler_type": tune.choice(['robust']),                   # Scaler type
    "max_steps": tune.choice([500, 1000]),                    # Max number of training iterations
    "batch_size": tune.choice([1, 4]),                        # Number of series in batch
    "random_seed": tune.randint(1, 20),                       # Random seed
}
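
To make the sampling behaviour of such a search space more concrete, the sketch below imitates what choice, randint, and loguniform do, using only the Python standard library. It is an illustration, not Ray Tune's implementation: the sample_* helpers are hypothetical names, and the upper bound of randint is treated as exclusive, as Ray Tune documents it.

```python
import math
import random

def sample_choice(rng, options):
    # Pick one element of the list uniformly at random.
    return rng.choice(options)

def sample_randint(rng, lower, upper):
    # Uniform integer with lower inclusive and upper exclusive.
    return rng.randrange(lower, upper)

def sample_loguniform(rng, lower, upper):
    # Uniform in log space, so every order of magnitude is equally likely.
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One hypothetical trial drawn from a reduced version of the search space.
trial = {
    "input_size": sample_choice(rng, [48, 48 * 2, 48 * 3]),
    "learning_rate": sample_loguniform(rng, 1e-4, 1e-2),
    "random_seed": sample_randint(rng, 1, 20),
}
print(trial)
```

Each of the num_samples trials run by an Auto model corresponds to one such draw, which is why larger search spaces tend to need more samples.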

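To instantiate an Auto model you need to define:

h: the forecasting horizon.
loss: the training and validation loss, from neuralforecast.losses.pytorch.
config: the hyperparameter search space. If None, the Auto class uses a predefined suggested hyperparameter space.
search_alg: the search algorithm (from tune.search); the default is random search. For more information on the available search algorithms, see https://docs.ray.io/en/latest/tune/api_docs/suggestion.html.
num_samples: the number of configurations explored.

In this example we set the horizon h to 48, use the multi-quantile loss MQLoss for training and validation, and use the default search algorithm.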

nf = NeuralForecast(
    models=[
        AutoNHITS(h=48, config=config_nhits, loss=MQLoss(), num_samples=5),
        AutoLSTM(h=48, config=config_lstm, loss=MQLoss(), num_samples=2),
    ],
    freq='H'
)
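
MQLoss trains each model on several quantiles at once, which is why the forecasts later contain a median column plus lo and hi columns. As a rough sketch of the idea, a multi-quantile loss can be viewed as the average of the quantile (pinball) loss over the requested quantiles. The functions and toy numbers below are assumptions for illustration and do not reproduce the library's implementation.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss for a quantile q in (0, 1)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)

def multi_quantile_loss(y_true, preds_by_quantile):
    """Mean pinball loss across quantiles; the dict maps q to its predictions."""
    losses = [pinball_loss(y_true, preds, q) for q, preds in preds_by_quantile.items()]
    return sum(losses) / len(losses)

# Hypothetical observations and quantile forecasts.
y_true = [100.0, 120.0, 90.0]
preds = {
    0.05: [80.0, 95.0, 70.0],
    0.50: [98.0, 118.0, 95.0],
    0.95: [125.0, 140.0, 110.0],
}
loss = multi_quantile_loss(y_true, preds)
print(round(loss, 4))
```

At q = 0.5 the pinball loss equals half the mean absolute error, so the median forecast is the one trained to minimize absolute error.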

Tip

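The number of samples, num_samples, is a crucial parameter! Larger values usually produce better results because more configurations in the search space are explored, but they increase training time. Larger search spaces usually require more samples. As a general rule, we recommend setting num_samples higher than 20.

Next, we use the NeuralForecast class to train the Auto models. In this step, the Auto models automatically perform hyperparameter tuning: they train multiple models with different hyperparameters, produce forecasts on the validation set, and evaluate them. The best configuration is selected based on the error on the validation set. Only the best model is stored and used during inference.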

%%capture
nf.fit(df=Y_df)

Global seed set to 15
Global seed set to 4

Next, we use the predict method to forecast the next 48 hours using the optimal hyperparameters.

fcst_df = nf.predict()
fcst_df.columns = fcst_df.columns.str.replace('-median', '')
fcst_df.head()


	ds	AutoNHITS	AutoNHITS-lo-90	AutoNHITS-lo-80	AutoNHITS-hi-80	AutoNHITS-hi-90	AutoLSTM	AutoLSTM-lo-90	AutoLSTM-lo-80	AutoLSTM-hi-80	AutoLSTM-hi-90
unique_id
H1	749	550.545288	491.368347	484.838226	640.832520	658.631592	581.597534	510.460632	533.967041	660.153076	690.976379
H1	750	549.216736	491.054932	484.474243	639.552002	657.615967	530.324402	440.821899	472.254272	622.214539	653.435913
H1	751	528.075989	466.917053	463.002289	621.197205	642.255005	487.045593	383.502045	423.310974	594.273071	627.640320
H1	752	486.842255	418.012115	419.017242	585.653259	611.903809	457.408081	347.901093	390.807495	569.789062	604.200012
H1	753	452.015930	371.543884	379.539215	558.845154	590.465942	441.641418	333.888611	374.730621	557.401978	595.008484

StatsForecast.plot(Y_df, fcst_df, engine='matplotlib', max_insample_length=48 * 3, level=[80, 90])

StatsForecast.plot allows for further customization. For example, you can plot the results of specific models and unique IDs.

# Plot selected unique_ids and models
StatsForecast.plot(Y_df, fcst_df, models=["AutoLSTM"], unique_ids=["H107", "H104"], level=[80, 90], engine='matplotlib')

# Explore other models
StatsForecast.plot(Y_df, fcst_df, models=["AutoNHITS"], unique_ids=["H10", "H105"], level=[80, 90], engine='matplotlib')

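5. Evaluate the models' performance

In the previous steps, we used our historical data to predict the future. To assess accuracy, however, we would also like to know how the models would have performed in the past. To assess the accuracy and robustness of your models on your data, perform cross-validation.

With time series data, cross-validation is done by defining a sliding window across the historical data and predicting the period that follows it. This form of cross-validation gives a better estimate of the model's predictive ability across a wider range of points in time, while keeping the data in the training set contiguous, as our models require.

The following chart depicts such a cross-validation strategy:

Tip

Setting n_windows=1 mirrors a traditional train-test split, with the historical data as the training set and the last 48 hours as the test set.

The cross_validation method of the NeuralForecast class takes the following arguments.

df: the training data frame.
step_size (int): the step size between consecutive windows. In other words: how often do you want to run the forecasting process.
n_windows (int): the number of windows used for cross-validation. In other words: how many forecasting processes in the past do you want to evaluate.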
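
The interaction between h, step_size, and n_windows can be sketched with a small helper that lists the cutoff of each window. This is an illustration under stated assumptions rather than the library's code: the function name cv_cutoffs is hypothetical, time is a 1-based integer index, the last window is assumed to end at the final observation, and step_size is assumed to be 1 when it is not passed.

```python
def cv_cutoffs(n_obs, h, n_windows, step_size=1):
    """Return the cutoff (last training index, 1-based) of each validation window.

    Assumes the windows are aligned so that the last one ends at observation n_obs.
    """
    last_cutoff = n_obs - h
    first_cutoff = last_cutoff - step_size * (n_windows - 1)
    if first_cutoff < 1:
        raise ValueError("series too short for the requested windows")
    return [first_cutoff + step_size * i for i in range(n_windows)]

# Series H1 has 748 hourly observations (its forecasts start at ds=749).
for cutoff in cv_cutoffs(n_obs=748, h=48, n_windows=2):
    print(f"train on 1..{cutoff}, forecast {cutoff + 1}..{cutoff + 48}")
```

Under these assumptions the first window for H1 has cutoff 699, and a larger step_size spreads the windows further apart so that they overlap less.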

from neuralforecast.auto import AutoNHITS, AutoLSTM
config_nhits = {
    "input_size": tune.choice([48, 48*2, 48*3]),                    # Length of input window
    "start_padding_enabled": True,
    "n_blocks": 5*[1],                                              # Number of blocks in each of the 5 stacks
    "mlp_units": 5 * [[64, 64]],                                    # Hidden layer sizes of the MLP in each stack
    "n_pool_kernel_size": tune.choice([5*[1], 5*[2], 5*[4],
                                      [8, 4, 2, 1, 1]]),            # MaxPooling kernel size
    "n_freq_downsample": tune.choice([[8, 4, 2, 1, 1],
                                      [1, 1, 1, 1, 1]]),            # Interpolation expressivity ratios
    "learning_rate": tune.loguniform(1e-4, 1e-2),                   # Initial learning rate
    "scaler_type": tune.choice([None]),                             # Scaler type
    "max_steps": tune.choice([1000]),                               # Max number of training iterations
    "batch_size": tune.choice([1, 4, 10]),                          # Number of series in batch
    "windows_batch_size": tune.choice([128, 256, 512]),             # Number of windows in batch
    "random_seed": tune.randint(1, 20),                             # Random seed
}

config_lstm = {
    "input_size": tune.choice([48, 48*2, 48*3]),              # Length of input window
    "encoder_hidden_size": tune.choice([64, 128]),            # Hidden size of LSTM cells
    "encoder_n_layers": tune.choice([2,4]),                   # Number of layers in LSTM
    "learning_rate": tune.loguniform(1e-4, 1e-2),             # Initial learning rate
    "scaler_type": tune.choice(['robust']),                   # Scaler type
    "max_steps": tune.choice([500, 1000]),                    # Max number of training iterations
    "batch_size": tune.choice([1, 4]),                        # Number of series in batch
    "random_seed": tune.randint(1, 20),                       # Random seed
}
nf = NeuralForecast(
    models=[
        AutoNHITS(h=48, config=config_nhits, loss=MQLoss(), num_samples=5),
        AutoLSTM(h=48, config=config_lstm, loss=MQLoss(), num_samples=2), 
    ],
    freq='H'
)

%%capture
cv_df = nf.cross_validation(Y_df, n_windows=2)

Global seed set to 4
Global seed set to 19

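The cv_df object is a new data frame that includes the following columns:

unique_id: identifies each time series.
ds: datestamp or temporal index.
cutoff: the last datestamp or temporal index of the training data for each window. If n_windows=1, there is one unique cutoff value; if n_windows=2, there are two unique cutoff values.
y: the true value.
"model": columns named after each model, containing that model's forecasts.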

cv_df.columns = cv_df.columns.str.replace('-median', '')

cv_df.head()

	unique_id	ds	cutoff	AutoNHITS	AutoNHITS-lo-90	AutoNHITS-lo-80	AutoNHITS-hi-80	AutoNHITS-hi-90	AutoLSTM	AutoLSTM-lo-90	AutoLSTM-lo-80	AutoLSTM-hi-80	AutoLSTM-hi-90	y
0	H1	700	699	646.881714	601.402893	626.471008	672.432617	683.847778	633.707031	365.139832	407.289246	871.474976	925.476196	684.0
1	H1	701	699	635.608643	595.042908	612.889771	669.565979	679.472900	632.455017	365.303131	406.472992	869.484985	922.926514	619.0
2	H1	702	699	592.663940	564.124390	566.502319	648.286072	647.859253	633.002502	365.147522	407.174866	868.677979	925.269409	565.0
3	H1	703	699	543.364563	516.760742	517.990234	603.099182	601.462280	633.903503	364.976746	408.498779	869.797180	925.993164	532.0
4	H1	704	699	498.051178	461.069489	474.206360	540.752563	555.169739	634.015991	363.384155	408.305298	870.154297	920.329224	495.0

for cutoff in cv_df['cutoff'].unique():
    StatsForecast.plot(
        Y_df, 
        cv_df.query('cutoff == @cutoff').drop(columns=['y', 'cutoff']), 
        max_insample_length=48 * 4, 
        unique_ids=['H185'],
        engine='matplotlib'
    )

Now, let's evaluate the models' performance.

from datasetsforecast.losses import mse, mae, rmse
from datasetsforecast.evaluation import accuracy

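Warning

You can also use the Mean Absolute Percentage Error (MAPE). However, for granular forecasts, MAPE values are extremely hard to judge and are not useful for assessing forecast quality.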
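
The difficulty with MAPE can be seen in a small worked example. In the sketch below, two made-up series receive forecasts that are off by exactly one unit at every step, so their MAE is identical, yet the MAPE of the low-valued series is hundreds of times larger. The helper names and all numbers are hypothetical.

```python
def mean_abs_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mean_abs_pct_error(y_true, y_pred):
    # Dividing by the actual value inflates the error when actuals are near zero.
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)

y_high = [500.0, 520.0, 480.0]   # series with large values
y_low = [0.5, 2.0, 1.0]          # granular series with values close to zero

# Both forecasts miss by the same absolute amount of one unit.
pred_high = [y + 1.0 for y in y_high]
pred_low = [y + 1.0 for y in y_low]

print("MAE :", mean_abs_error(y_high, pred_high), mean_abs_error(y_low, pred_low))
print("MAPE:", mean_abs_pct_error(y_high, pred_high), mean_abs_pct_error(y_low, pred_low))
```

A series that contains exact zeros would make MAPE undefined altogether, which is another reason to prefer scale-based metrics such as MAE or RMSE here.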

Create a data frame with the results of evaluating the cross-validation data frame using the MSE, MAE, and RMSE metrics.

evaluation_df = accuracy(cv_df, [mse, mae, rmse], agg_by=['unique_id'])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)
evaluation_df.head()

	metric	unique_id	AutoNHITS	AutoLSTM	best_model
0	mae	H1	38.259457	131.158150	AutoNHITS
1	mae	H10	14.044900	32.972164	AutoNHITS
2	mae	H100	254.464978	281.836064	AutoNHITS
3	mae	H101	257.810841	148.341771	AutoLSTM
4	mae	H102	176.114826	472.413350	AutoNHITS
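
The selection of best_model relies on idxmin, which returns the column holding the smallest error in each row. The following sketch reproduces that logic by hand for a single metric (MAE) on a made-up cross-validation frame. The model names ModelX and ModelY and all values are hypothetical, and the per-series averaging is written out explicitly rather than taken from the accuracy function.

```python
import pandas as pd

# Hypothetical cross-validation results: true values plus two model columns.
toy_cv = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "y": [10.0, 12.0, 100.0, 110.0],
    "ModelX": [11.0, 12.0, 90.0, 95.0],
    "ModelY": [14.0, 9.0, 101.0, 108.0],
})
models = ["ModelX", "ModelY"]

# Absolute error of every model at every time step.
abs_err = toy_cv[models].sub(toy_cv["y"], axis=0).abs()

# Mean absolute error per series, then the model with the lowest error.
mae_by_series = abs_err.groupby(toy_cv["unique_id"]).mean()
mae_by_series["best_model"] = mae_by_series[models].idxmin(axis=1)
print(mae_by_series)
```

In this toy case ModelX wins for series A and ModelY for series B, mirroring how different unique_ids can prefer different models.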

Create a summary table with a model column and the number of series for which that model performs best.

summary_df = evaluation_df.groupby(['metric', 'best_model']).size().sort_values().to_frame()

summary_df = summary_df.reset_index()
summary_df.columns = ['metric', 'model', 'nr. of unique_ids']
summary_df

	metric	model	nr. of unique_ids
0	mae	AutoLSTM	1
1	mse	AutoLSTM	1
2	rmse	AutoLSTM	1
3	mae	AutoNHITS	9
4	mse	AutoNHITS	9
5	rmse	AutoNHITS	9

summary_df.query('metric == "mse"')

	metric	model	nr. of unique_ids
1	mse	AutoLSTM	1
4	mse	AutoNHITS	9

You can further explore your results by plotting the unique_ids where a specific model wins.

nhits_ids = evaluation_df.query('best_model == "AutoNHITS" and metric == "mse"')['unique_id'].unique()

StatsForecast.plot(Y_df, fcst_df, unique_ids=nhits_ids, engine='matplotlib')

6. Select the best model for every unique series

Define a utility function that takes the data frame with your forecasts and the evaluation data frame, and returns a data frame with the forecast of the best model for every unique_id.

def get_best_model_forecast(forecasts_df, evaluation_df, metric):
    df = forecasts_df.set_index('ds', append=True).stack().to_frame().reset_index(level=2) # Wide to long
    df.columns = ['model', 'best_model_forecast'] 
    df = df.join(evaluation_df.query('metric == @metric').set_index('unique_id')[['best_model']])
    df = df.query('model.str.replace("-lo-90|-hi-90", "", regex=True) == best_model').copy()
    df.loc[:, 'model'] = [model.replace(bm, 'best_model') for model, bm in zip(df['model'], df['best_model'])]
    df = df.drop(columns='best_model').set_index('model', append=True).unstack()
    df.columns = df.columns.droplevel()
    df = df.reset_index(level=1)
    return df
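
The core idea of this function, reshaping the forecasts to long format and keeping only the rows of each series' best model, can be sketched on a toy frame. This is a simplified illustration and not a replacement for the function above: it ignores prediction intervals, and the model names, values, and the best_by_id mapping are hypothetical.

```python
import pandas as pd

# Hypothetical point forecasts from two models for two series.
toy_fcst = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": [1, 2, 1, 2],
    "ModelX": [10.5, 11.0, 95.0, 96.0],
    "ModelY": [12.0, 12.5, 101.0, 102.0],
})
# Hypothetical winner per series, as an evaluation step might produce.
best_by_id = pd.Series({"A": "ModelX", "B": "ModelY"}, name="best_model")

# Wide to long: one row per (unique_id, ds, model).
long_df = toy_fcst.melt(
    id_vars=["unique_id", "ds"], var_name="model", value_name="best_model_forecast"
)
# Attach each series' winning model and keep only the matching rows.
long_df = long_df.merge(best_by_id, left_on="unique_id", right_index=True)
selected = (
    long_df[long_df["model"] == long_df["best_model"]]
    .drop(columns=["model", "best_model"])
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(selected)
```

The resulting frame has one forecast per series and time step, taken from whichever model won for that series.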

Create your production-ready data frame with the best forecast for every unique_id.

prod_forecasts_df = get_best_model_forecast(fcst_df, evaluation_df, metric='mse')

prod_forecasts_df.head()

model	ds	best_model	best_model-hi-90	best_model-lo-90
unique_id
H1	749	550.545288	658.631592	491.368347
H1	750	549.216736	657.615967	491.054932
H1	751	528.075989	642.255005	466.917053
H1	752	486.842255	611.903809	418.012115
H1	753	452.015930	590.465942	371.543884

Plot the results.

StatsForecast.plot(Y_df, prod_forecasts_df, level=[90], engine='matplotlib')
