Exogenous variables

!pip install -Uqq nixtla

from nixtla.utils import in_colab

IN_COLAB = in_colab()

if not IN_COLAB:
    from nixtla.utils import colab_badge
    from dotenv import load_dotenv

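Exogenous variables, or external factors, are crucial in time series forecasting because they provide additional information that can influence the prediction. These variables can include holiday markers, marketing spend, weather data, or any other external data that correlate with the time series you are forecasting.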

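For example, if you are forecasting ice cream sales, temperature data can serve as a useful exogenous variable: on hotter days, ice cream sales are likely to increase.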

To incorporate exogenous variables in TimeGPT, you need to pair each point in your time series data with the corresponding external data.
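As a minimal illustration of this pairing, the sketch below joins a table of daily sales with a separate table of daily temperatures on the timestamp, so that every row carries both the target and the exogenous value. The shop name, dates and numbers are made-up toy values; the column names unique_id, ds and y follow the convention used in the rest of this tutorial.

```python
import pandas as pd

# Made-up toy data: daily ice cream sales (target y) for one shop.
sales = pd.DataFrame({
    "unique_id": ["shop_1"] * 4,
    "ds": pd.date_range("2024-07-01", periods=4, freq="D"),
    "y": [120, 135, 180, 160],
})

# Made-up daily temperatures, stored in a separate table.
weather = pd.DataFrame({
    "ds": pd.date_range("2024-07-01", periods=4, freq="D"),
    "temperature": [24.0, 26.5, 31.0, 29.0],
})

# Pair every target observation with the exogenous value at the same timestamp.
# A left join keeps all sales rows; an unmatched timestamp would show up as NaN.
df_with_exog = sales.merge(weather, on="ds", how="left")
assert not df_with_exog["temperature"].isna().any(), "some timestamps lack temperature data"
print(df_with_exog)
```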

if not IN_COLAB:
    load_dotenv()
    colab_badge('docs/tutorials/01_exogenous_variables')

1. Import packages

First, we import the required packages and initialize the Nixtla client.

import pandas as pd
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, remember to also set the base_url argument:

nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")

if not IN_COLAB:
    nixtla_client = NixtlaClient()

2. Load data

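Let's look at an example of forecasting next-day electricity prices. The following dataset contains hourly electricity prices (y column) for five markets in Europe and the US, identified by the unique_id column. The columns from Exogenous1 to day_6 are exogenous variables that TimeGPT will use to predict the prices.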

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()

	unique_id	ds	y	Exogenous1	Exogenous2	day_5
0	BE	2016-10-22 00:00:00	70.00	49593.0	57253.0	1.0
1	BE	2016-10-22 01:00:00	37.10	46073.0	51887.0	1.0
2	BE	2016-10-22 02:00:00	37.10	44927.0	51896.0	1.0
3	BE	2016-10-22 03:00:00	44.75	44483.0	48428.0	1.0
4	BE	2016-10-22 04:00:00	37.10	44338.0	46721.0	1.0
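Because the client infers the series frequency from ds, it can be worth checking beforehand that every series is regularly spaced. The helper below is a hypothetical sketch (not part of the nixtla package) that assumes hourly data, as in this dataset, and reports whether each unique_id advances in steps of exactly one hour.

```python
import pandas as pd


def has_regular_spacing(df: pd.DataFrame, step: pd.Timedelta = pd.Timedelta(hours=1)) -> bool:
    """Hypothetical helper: True if every series in df advances by exactly `step`."""
    for _, group in df.groupby("unique_id"):
        # Gaps between consecutive timestamps of this series (first diff is NaT, dropped).
        gaps = pd.to_datetime(group["ds"]).sort_values().diff().dropna()
        if not (gaps == step).all():
            return False
    return True
```

Usage, assuming the df loaded above: `has_regular_spacing(df)`.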

3a. Forecasting electricity prices using future exogenous variables

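To forecast with future exogenous variables, we have to provide the future values of the exogenous variables. Let's read this dataset. Here we want to forecast 24 steps ahead, so each unique_id has 24 observations.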

future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
future_ex_vars_df.head()

	unique_id	ds	Exogenous1	Exogenous2	day_5
0	BE	2016-12-31 00:00:00	64108.0	70318.0	1.0
1	BE	2016-12-31 01:00:00	62492.0	67898.0	1.0
2	BE	2016-12-31 02:00:00	61571.0	68379.0	1.0
3	BE	2016-12-31 03:00:00	60381.0	64972.0	1.0
4	BE	2016-12-31 04:00:00	60298.0	62900.0	1.0
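A hypothetical sanity check (not part of the nixtla package) can catch a misaligned future_ex_vars_df before it is sent to the API. The sketch assumes hourly data and verifies two things: that X_df only contains columns also present in df, and that for each unique_id it covers exactly the h timestamps following the last observation in df.

```python
import pandas as pd


def check_future_exog(df: pd.DataFrame, X_df: pd.DataFrame, h: int,
                      step: pd.Timedelta = pd.Timedelta(hours=1)) -> None:
    """Hypothetical helper: raise ValueError if X_df does not look like valid
    future exogenous data for df (assumes a regular `step` between rows)."""
    # Every column of X_df (ids, timestamps, exogenous variables) should also exist in df.
    unknown = set(X_df.columns) - set(df.columns)
    if unknown:
        raise ValueError(f"X_df has columns not found in df: {sorted(unknown)}")

    # For each series, X_df must hold exactly the h timestamps after df's last one.
    last_ds = pd.to_datetime(df["ds"]).groupby(df["unique_id"]).max()
    for uid, last in last_ds.items():
        future = pd.to_datetime(X_df.loc[X_df["unique_id"] == uid, "ds"]).sort_values()
        expected = [last + step * (i + 1) for i in range(h)]
        if list(future) != expected:
            raise ValueError(f"X_df rows for {uid!r} are not the next {h} timestamps")
```

Usage: `check_future_exog(df, future_ex_vars_df, h=24)` returns silently when the checks pass.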

Let's call the forecast method, passing this information:

timegpt_fcst_ex_vars_df = nixtla_client.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	unique_id	ds	TimeGPT	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-lo-80	TimeGPT-lo-90
0	BE	2016-12-31 00:00:00	74.540771	84.506861	89.003936	64.574681	60.077606
1	BE	2016-12-31 01:00:00	43.344290	52.200879	57.771782	34.487701	28.916798
2	BE	2016-12-31 02:00:00	44.429219	51.034622	57.623160	37.823817	31.235279
3	BE	2016-12-31 03:00:00	38.094396	48.108948	51.528001	28.079844	24.660791
4	BE	2016-12-31 04:00:00	37.389140	46.747685	52.186070	28.030595	22.592211
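With level=[80, 90], each row carries two prediction intervals, and the 80% interval should sit inside the 90% one around the point forecast. The helper below is a small hypothetical sketch for checking that property, for example with `intervals_are_nested(timegpt_fcst_ex_vars_df)`; it relies only on the column naming visible in the output above (TimeGPT, TimeGPT-lo-80, TimeGPT-hi-90 and so on).

```python
import pandas as pd


def intervals_are_nested(fcst: pd.DataFrame, levels=(80, 90), model: str = "TimeGPT") -> bool:
    """Return True if, on every row, narrower intervals lie inside wider ones
    and the point forecast lies inside the narrowest interval."""
    levels = sorted(levels)
    narrowest = levels[0]
    ok = (fcst[f"{model}-lo-{narrowest}"] <= fcst[model]) & (fcst[model] <= fcst[f"{model}-hi-{narrowest}"])
    # Compare each level with the next wider one: lo must not rise, hi must not fall.
    for inner, outer in zip(levels, levels[1:]):
        ok &= fcst[f"{model}-lo-{outer}"] <= fcst[f"{model}-lo-{inner}"]
        ok &= fcst[f"{model}-hi-{inner}"] <= fcst[f"{model}-hi-{outer}"]
    return bool(ok.all())
```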

📘 Available models in Azure AI

If you are using an Azure AI endpoint, make sure to set model="azureai":

nixtla_client.forecast(..., model="azureai")

For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.

By default, timegpt-1 is used. See this tutorial for how and when to use timegpt-1-long-horizon.

nixtla_client.plot(
    df[['unique_id', 'ds', 'y']], 
    timegpt_fcst_ex_vars_df, 
    max_insample_length=365, 
    level=[80, 90], 
)

We can also display the feature importance.

nixtla_client.weights_x.plot.barh(x='features', y='weights')

This plot shows that Exogenous1 and Exogenous2 are the most important features for this forecasting task, as they have the largest weights.

3b. Forecasting electricity prices using historic exogenous variables

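In the example above, we provided the future values of the exogenous variables. Often these values are not available, because they are not yet known at forecast time. We can also forecast using only historic exogenous variables, simply by omitting the X_df argument. In that case, the exogenous variables present in df are treated as historic exogenous variables.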

Important

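If you include historic exogenous variables in your model, you are implicitly making assumptions about the future values of these exogenous variables in your forecast. It is recommended to make these assumptions explicit by using future exogenous variables.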

Let's call the forecast method without X_df:

timegpt_fcst_hist_ex_vars_df = nixtla_client.forecast(df=df, h=24, level=[80, 90])
timegpt_fcst_hist_ex_vars_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	unique_id	ds	TimeGPT	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-lo-80	TimeGPT-lo-90
0	BE	2016-12-31 00:00:00	45.769382	55.735472	60.232546	35.803292	31.306217
1	BE	2016-12-31 01:00:00	47.991004	56.847593	62.418496	39.134415	33.563512
2	BE	2016-12-31 02:00:00	49.496135	56.101537	62.690075	42.890732	36.302195
3	BE	2016-12-31 03:00:00	49.510808	59.525360	62.944413	39.496257	36.077203
4	BE	2016-12-31 04:00:00	48.510558	57.869103	63.307488	39.152014	33.713629

nixtla_client.plot(
    df[['unique_id', 'ds', 'y']], 
    timegpt_fcst_hist_ex_vars_df, 
    max_insample_length=365, 
    level=[80, 90], 
)
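Following the recommendation in the note above, one way to make the assumption explicit is to write it down as an X_df. The sketch below is a hypothetical helper that assumes each exogenous variable simply stays at its last observed value for the next h hourly steps. Persistence is only one possible assumption, chosen here for simplicity; the point is that whatever you assume becomes visible in the data you pass to forecast.

```python
import pandas as pd


def persistence_future_exog(df: pd.DataFrame, exog_cols, h: int,
                            step: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Hypothetical helper: build an X_df under the explicit assumption that each
    exogenous variable stays at its last observed value for the next h steps."""
    frames = []
    for uid, group in df.groupby("unique_id"):
        group = group.assign(ds=pd.to_datetime(group["ds"])).sort_values("ds")
        last = group.iloc[-1]
        frame = pd.DataFrame({
            "unique_id": uid,
            "ds": [last["ds"] + step * (i + 1) for i in range(h)],
        })
        for col in exog_cols:
            frame[col] = last[col]  # the assumption, stated explicitly
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

For example, `persistence_future_exog(df, ["Exogenous1", "Exogenous2"], h=24)`. Calendar columns such as day_0 to day_6 should not be carried forward this way, since they are known in advance and can be derived from ds.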

3c. Forecasting electricity prices using future and historic exogenous variables

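A third option is to use both historic and future exogenous variables. For example, we might not have future information available for Exogenous1 and Exogenous2. In this example, we drop these variables from the future exogenous dataframe (because we assume we do not know their future values).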

future_ex_vars_df_limited = future_ex_vars_df.drop(columns = ["Exogenous1", "Exogenous2"])
timegpt_fcst_ex_vars_df_limited = nixtla_client.forecast(df=df, X_df=future_ex_vars_df_limited, h=24, level=[80, 90])

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous2', 'Exogenous1']
INFO:nixtla.nixtla_client:Using future exogenous features: ['day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

nixtla_client.plot(
    df[['unique_id', 'ds', 'y']], 
    timegpt_fcst_ex_vars_df_limited, 
    max_insample_length=365, 
    level=[80, 90], 
)

Note that TimeGPT tells you which variables are used as historic exogenous variables and which as future exogenous variables.
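The split reported in the logs can be reproduced with a few lines of plain Python. The function below is a sketch of the rule the three examples suggest, not the client's actual implementation: columns of df other than unique_id, ds and y count as future exogenous when they also appear in X_df, and as historic otherwise (all of them, when no X_df is passed).

```python
def split_exog_columns(df_columns, x_df_columns=None,
                       id_col="unique_id", time_col="ds", target_col="y"):
    """Sketch: return (historic, future) exogenous column lists, in df order."""
    reserved = {id_col, time_col, target_col}
    exog = [c for c in df_columns if c not in reserved]
    # Columns also supplied in X_df have known future values.
    provided = set(x_df_columns) if x_df_columns is not None else set()
    future = [c for c in exog if c in provided]
    historic = [c for c in exog if c not in provided]
    return historic, future
```

For the case above, `split_exog_columns(df.columns, future_ex_vars_df_limited.columns)` would place Exogenous1 and Exogenous2 in the historic list and day_0 to day_6 in the future list.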

3d. Forecasting the future exogenous variables

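A fourth option, when the future exogenous variables are not available, is to forecast them. Below, we show how to forecast Exogenous1 and Exogenous2 separately, so that you can generate their future values yourself.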

# Read the data and create a separate dataframe for each historic exogenous variable we want to forecast.
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df_exog1 = df[['unique_id', 'ds', 'Exogenous1']]
df_exog2 = df[['unique_id', 'ds', 'Exogenous2']]

Next, we can use TimeGPT to forecast Exogenous1 and Exogenous2. Here we assume that these quantities can be forecast independently of each other.

timegpt_fcst_ex1 = nixtla_client.forecast(df=df_exog1, h=24, target_col='Exogenous1')
timegpt_fcst_ex2 = nixtla_client.forecast(df=df_exog2, h=24, target_col='Exogenous2')

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
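To prototype this step offline, without calling the API, a simple baseline can stand in for TimeGPT. The sketch below swaps in a seasonal-naive forecast, which repeats the values of the last 24 hours; this is a different and much simpler technique than TimeGPT, used here only for illustration. It assumes hourly data with at least one full day of history per series, and it names the output column after the exogenous variable, so no rename is needed before merging.

```python
import pandas as pd


def seasonal_naive(df: pd.DataFrame, target_col: str, h: int, season_length: int = 24,
                   step: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Forecast target_col per series by repeating its last `season_length` values."""
    frames = []
    for uid, group in df.groupby("unique_id"):
        group = group.assign(ds=pd.to_datetime(group["ds"])).sort_values("ds")
        history = group[target_col].to_numpy()
        if len(history) < season_length:
            raise ValueError(f"series {uid!r} has fewer than {season_length} observations")
        last_season = history[-season_length:]
        last_ds = group["ds"].iloc[-1]
        frames.append(pd.DataFrame({
            "unique_id": uid,
            "ds": [last_ds + step * (i + 1) for i in range(h)],
            # Each step reuses the value at the same position within the last observed season.
            target_col: [last_season[i % season_length] for i in range(h)],
        }))
    return pd.concat(frames, ignore_index=True)
```

For example, `seasonal_naive(df_exog1, "Exogenous1", h=24)` returns a frame with unique_id, ds and Exogenous1 columns.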

We can now start building X_df, which contains the future exogenous variables.

timegpt_fcst_ex1 = timegpt_fcst_ex1.rename(columns={'TimeGPT':'Exogenous1'})
timegpt_fcst_ex2 = timegpt_fcst_ex2.rename(columns={'TimeGPT':'Exogenous2'})

X_df = timegpt_fcst_ex1.merge(timegpt_fcst_ex2)
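The merge above joins on the columns the two frames have in common, which after the rename are unique_id and ds. A slightly more defensive variant, sketched below as a hypothetical helper, names the keys explicitly and asks pandas to validate that the join is one-to-one, so a duplicated timestamp raises an error instead of silently multiplying rows.

```python
import pandas as pd


def merge_exog_forecasts(*frames: pd.DataFrame, keys=("unique_id", "ds")) -> pd.DataFrame:
    """Hypothetical helper: combine per-variable forecast frames into one X_df."""
    X_df = frames[0]
    for frame in frames[1:]:
        # validate="one_to_one" raises pandas.errors.MergeError on duplicated keys.
        X_df = X_df.merge(frame, on=list(keys), how="inner", validate="one_to_one")
    if len(X_df) != len(frames[0]):
        raise ValueError("some (unique_id, ds) pairs are missing from one of the forecasts")
    return X_df
```

Usage: `X_df = merge_exog_forecasts(timegpt_fcst_ex1, timegpt_fcst_ex2)`.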

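Next, we also need to add the future exogenous variables day_0 to day_6. This is straightforward: they encode the day of the week, which we can extract from the ds column.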

# One indicator column per weekday (Monday = day_0, ..., Sunday = day_6), set to 1 or 0.
for i in range(7):
    X_df[f'day_{i}'] = 1 * (pd.to_datetime(X_df['ds']).dt.weekday == i)
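Looping over range(7) guarantees that all seven columns exist, even though the 24-hour horizon here falls entirely on a Saturday (day_5); a plain pd.get_dummies on the weekday numbers would only produce day_5. If you prefer get_dummies, fixing the categories to 0 through 6 keeps that guarantee, as in this sketch (the function name is hypothetical):

```python
import pandas as pd


def add_weekday_dummies(X_df: pd.DataFrame) -> pd.DataFrame:
    """Append day_0 ... day_6 indicator columns derived from ds (Monday = 0)."""
    weekday = pd.to_datetime(X_df["ds"]).dt.weekday.astype(int)
    # Fixed categories force all seven columns, even if only one weekday occurs.
    dummies = pd.get_dummies(pd.Categorical(weekday, categories=list(range(7))),
                             prefix="day", dtype=int)
    dummies.index = X_df.index  # align with the input rows before concatenating
    return pd.concat([X_df, dummies], axis=1)
```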

We have now created X_df. Let's inspect it:

X_df.head(10)

	unique_id	ds	Exogenous1	Exogenous2	day_5
0	BE	2016-12-31 00:00:00	66282.507812	70861.390625	1
1	BE	2016-12-31 01:00:00	64465.335938	67851.718750	1
2	BE	2016-12-31 02:00:00	63257.125000	67246.546875	1
3	BE	2016-12-31 03:00:00	62059.343750	64027.210938	1
4	BE	2016-12-31 04:00:00	61247.132812	61523.867188	1
5	BE	2016-12-31 05:00:00	62052.453125	63053.929688	1
6	BE	2016-12-31 06:00:00	63457.507812	65199.175781	1
7	BE	2016-12-31 07:00:00	65388.433594	68285.367188	1
8	BE	2016-12-31 08:00:00	67406.664062	72037.671875	1
9	BE	2016-12-31 09:00:00	68057.156250	72820.468750	1

Let's compare it with the version we loaded earlier:

future_ex_vars_df.head(10)

	unique_id	ds	Exogenous1	Exogenous2	day_5
0	BE	2016-12-31 00:00:00	64108.0	70318.0	1.0
1	BE	2016-12-31 01:00:00	62492.0	67898.0	1.0
2	BE	2016-12-31 02:00:00	61571.0	68379.0	1.0
3	BE	2016-12-31 03:00:00	60381.0	64972.0	1.0
4	BE	2016-12-31 04:00:00	60298.0	62900.0	1.0
5	BE	2016-12-31 05:00:00	60339.0	62364.0	1.0
6	BE	2016-12-31 06:00:00	62576.0	64242.0	1.0
7	BE	2016-12-31 07:00:00	63732.0	65884.0	1.0
8	BE	2016-12-31 08:00:00	66235.0	68217.0	1.0
9	BE	2016-12-31 09:00:00	66801.0	69921.0	1.0

As you can see, the values of Exogenous1 and Exogenous2 differ slightly, which makes sense because we forecast these values with TimeGPT.
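To quantify how far the forecasted exogenous values are from the pre-loaded ones, you can compute their mean absolute difference, treating future_ex_vars_df as the reference. The helper below is a hypothetical sketch; it converts ds to datetimes on both sides before joining, because a frame read with pd.read_csv stores ds as strings while other frames may hold datetimes.

```python
import pandas as pd


def exog_mae(forecast_df: pd.DataFrame, reference_df: pd.DataFrame, cols,
             id_col: str = "unique_id", time_col: str = "ds") -> dict:
    """Hypothetical helper: mean absolute difference per exogenous column."""
    keys = [id_col, time_col]

    def prep(frame: pd.DataFrame) -> pd.DataFrame:
        frame = frame[keys + list(cols)].copy()
        frame[time_col] = pd.to_datetime(frame[time_col])
        return frame

    merged = prep(forecast_df).merge(prep(reference_df), on=keys, suffixes=("_fcst", "_ref"))
    return {c: float((merged[f"{c}_fcst"] - merged[f"{c}_ref"]).abs().mean()) for c in cols}
```

Usage: `exog_mae(X_df, future_ex_vars_df, ["Exogenous1", "Exogenous2"])`.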

Let's use our new X_df to create a new forecast of the electricity prices with TimeGPT:

timegpt_fcst_ex_vars_df_new = nixtla_client.forecast(df=df, X_df=X_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df_new.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Using the following exogenous variables: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	unique_id	ds	TimeGPT	TimeGPT-lo-90	TimeGPT-lo-80	TimeGPT-hi-80	TimeGPT-hi-90
0	BE	2016-12-31 00:00:00	46.578371	40.398307	41.808656	51.348086	52.758435
1	BE	2016-12-31 01:00:00	37.258364	28.092805	30.929055	43.587673	46.423923
2	BE	2016-12-31 02:00:00	41.779458	29.432284	35.379695	48.179221	54.126632
3	BE	2016-12-31 03:00:00	37.822341	25.122863	31.484450	44.160232	50.521820
4	BE	2016-12-31 04:00:00	37.389141	23.840454	28.535553	46.242729	50.937828

Let's create a combined dataframe with both forecasts and plot the values to compare them.

timegpt_fcst_ex_vars_df = timegpt_fcst_ex_vars_df.rename(columns={'TimeGPT':'TimeGPT-provided_exogenous'})
timegpt_fcst_ex_vars_df_new = timegpt_fcst_ex_vars_df_new.rename(columns={'TimeGPT':'TimeGPT-forecasted_exogenous'})

forecasts = timegpt_fcst_ex_vars_df[['unique_id', 'ds', 'TimeGPT-provided_exogenous']].merge(timegpt_fcst_ex_vars_df_new[['unique_id', 'ds', 'TimeGPT-forecasted_exogenous']])

nixtla_client.plot(
    df[['unique_id', 'ds', 'y']], 
    forecasts, 
    max_insample_length=365, 
)

As you can see, using the forecasted exogenous variables instead of the provided ones yields a different forecast.
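To quantify the gap between the two forecasts rather than judging it from the plot, a small sketch like the one below averages the absolute difference between the two forecast columns per series. It assumes the forecasts frame and the column names created above; the function name is hypothetical.

```python
import pandas as pd


def forecast_gap(forecasts: pd.DataFrame,
                 col_a: str = "TimeGPT-provided_exogenous",
                 col_b: str = "TimeGPT-forecasted_exogenous") -> pd.Series:
    """Mean absolute difference between two forecast columns, per unique_id."""
    gap = (forecasts[col_a] - forecasts[col_b]).abs()
    return gap.groupby(forecasts["unique_id"]).mean()
```

Usage: `forecast_gap(forecasts)`.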
