Bounded forecasts

!pip install -Uqq nixtla pyreadr

from nixtla.utils import in_colab

IN_COLAB = in_colab()

if not IN_COLAB:
    from nixtla.utils import colab_badge
    from itertools import product
    from fastcore.test import test_eq, test_fail, test_warns
    from dotenv import load_dotenv

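In forecasting, we often want to make sure the predictions stay within a certain range. For example, when forecasting product sales, we may require every forecast to be positive. In such cases the forecasts need to be bounded.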

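With TimeGPT, you can create bounded forecasts by transforming your data before calling the forecast function and inverting the transformation afterwards.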

if not IN_COLAB:
    load_dotenv()    
    colab_badge('docs/tutorials/13_bounded_forecasts')

1. Import packages

First, we install and import the required packages and initialize the Nixtla client.

import pandas as pd
import numpy as np

from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, set the base_url argument:

nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")

if not IN_COLAB:
    nixtla_client = NixtlaClient()

2. Load data

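We use the annual egg prices dataset from Forecasting: Principles and Practice. Egg prices should be strictly positive, so we want to restrict our forecasts to positive values.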

Note

You can install pyreadr with pip:

pip install pyreadr

import pyreadr
from pathlib import Path

# Download and store the dataset
url = 'https://github.com/robjhyndman/fpp3package/raw/master/data/prices.rda'
dst_path = str(Path.cwd().joinpath('prices.rda'))
# download_file returns the destination path, which read_r then loads
result = pyreadr.read_r(pyreadr.download_file(url, dst_path))

# Perform some preprocessing
df = result['prices'][['year', 'eggs']]
df = df.dropna().reset_index(drop=True)
df = df.rename(columns={'year':'ds', 'eggs':'y'})
df['ds'] = pd.to_datetime(df['ds'], format='%Y')
df['unique_id'] = 'eggs'

df.tail(10)

	ds	y	unique_id
84	1984-01-01	100.58	eggs
85	1985-01-01	76.84	eggs
86	1986-01-01	81.10	eggs
87	1987-01-01	69.60	eggs
88	1988-01-01	64.55	eggs
89	1989-01-01	80.36	eggs
90	1990-01-01	79.79	eggs
91	1991-01-01	74.79	eggs
92	1992-01-01	64.86	eggs
93	1993-01-01	62.27	eggs

We can look at how prices developed over the 20th century; the plot shows that the price is trending down.

nixtla_client.plot(df)

3. Bounded forecasts with TimeGPT

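First, we transform the target data. In this case, we log-transform the data before forecasting, so that the back-transformed forecasts can only be positive prices.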

df_transformed = df.copy()
df_transformed['y'] = np.log(df_transformed['y'])
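To see why the log transform enforces positivity, here is a minimal sketch that does not call the API. The values in log_forecasts are hypothetical stand-ins for forecasts on the log scale, not TimeGPT output; the prices are the last three values of the eggs series shown above.

```python
import numpy as np

# Last three observed prices from the eggs series shown above.
prices = np.array([74.79, 64.86, 62.27])

# Forward transform: strictly positive prices map onto the whole real line.
log_prices = np.log(prices)

# Hypothetical forecasts on the log scale (assumed values for illustration),
# including a very low one such as a wide lower interval bound might produce.
log_forecasts = np.array([4.1, 0.0, -5.0])

# Inverse transform: exp() of any finite real number is strictly positive.
back_transformed = np.exp(log_forecasts)

print(back_transformed)
print(np.allclose(np.exp(log_prices), prices))  # round trip recovers the prices
```

However low a value is on the log scale, its exponential stays above zero, which is what keeps the back-transformed forecasts positive.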

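We will create forecasts for the next 10 years, and we include prediction intervals at the 80%, 90% and 99.5% levels of our forecast distribution.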

timegpt_fcst_with_transform = nixtla_client.forecast(df=df_transformed, h=10, freq='Y', level=[80, 90, 99.5])

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: AS-JAN
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

📘 Available models in Azure AI

If you are using an Azure AI endpoint, make sure to set model="azureai":

nixtla_client.forecast(..., model="azureai")

For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.

By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.

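After creating the forecasts, we need to invert the transformation we applied earlier. For a log transformation, this simply means taking the exponential of the forecasts and of the interval bounds: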

cols_to_transform = [col for col in timegpt_fcst_with_transform if col not in ['unique_id', 'ds']]
for col in cols_to_transform:
    timegpt_fcst_with_transform[col] = np.exp(timegpt_fcst_with_transform[col])
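Exponentiating the interval bounds is valid because exp() is an increasing function, so it maps quantiles on the log scale to quantiles on the original scale. The same is not true for a mean. The sketch below illustrates both points with simulated draws; the normal distribution and its parameters are assumptions chosen for illustration and say nothing about how TimeGPT builds its intervals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical draws from a forecast distribution on the log scale.
log_draws = rng.normal(loc=4.2, scale=0.5, size=1001)
draws = np.exp(log_draws)

# Order statistics at roughly the 5%, 50% and 95% positions.
positions = [50, 500, 950]
log_quantiles = np.sort(log_draws)[positions]
quantiles = np.sort(draws)[positions]

# exp() preserves ordering, so it maps quantiles to quantiles ...
quantiles_match = np.allclose(np.exp(log_quantiles), quantiles)

# ... but exp(mean of logs) is the geometric mean, which lies below
# the arithmetic mean on the original scale.
mean_gap = draws.mean() - np.exp(log_draws.mean())

print(quantiles_match, mean_gap > 0)
```

In practice this means the back-transformed interval bounds keep their coverage level, while a back-transformed point forecast is better read as a median-like value than as a mean.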

Now we can plot the forecasts. We include multiple prediction intervals, at the 80%, 90% and 99.5% levels of our forecast distribution.

nixtla_client.plot(
    df, 
    timegpt_fcst_with_transform, 
    level=[80, 90, 99.5],
    max_insample_length=20
)

The forecast and the prediction intervals look reasonable.

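Let's compare these forecasts with the ones we get when no transformation is applied. In that case, nothing prevents the model from forecasting a negative price.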

timegpt_fcst_without_transform = nixtla_client.forecast(df=df, h=10, freq='Y', level=[80, 90, 99.5])

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: AS-JAN
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

Indeed, we now observe prediction intervals that become negative:

nixtla_client.plot(
    df, 
    timegpt_fcst_without_transform, 
    level=[80, 90, 99.5],
    max_insample_length=20
)

For example, in 1995 the lower bounds of the 90% and 99.5% intervals are negative:

timegpt_fcst_without_transform

	unique_id	ds	TimeGPT	TimeGPT-lo-99.5	TimeGPT-lo-90	TimeGPT-lo-80	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-hi-99.5
0	eggs	1994-01-01	66.859756	43.103240	46.131448	49.319034	84.400479	87.588065	90.616273
1	eggs	1995-01-01	64.993477	-20.924112	-4.750041	12.275298	117.711656	134.736995	150.911066
2	eggs	1996-01-01	66.695808	6.499170	8.291150	10.177444	123.214173	125.100467	126.892446
3	eggs	1997-01-01	66.103325	17.304282	24.966939	33.032894	99.173756	107.239711	114.902368
4	eggs	1998-01-01	67.906517	4.995371	12.349648	20.090992	115.722042	123.463386	130.817663
5	eggs	1999-01-01	66.147575	29.162207	31.804460	34.585779	97.709372	100.490691	103.132943
6	eggs	2000-01-01	66.062637	14.671932	19.305822	24.183601	107.941673	112.819453	117.453343
7	eggs	2001-01-01	68.045769	3.915282	13.188964	22.950736	113.140802	122.902573	132.176256
8	eggs	2002-01-01	66.718903	-42.212631	-30.583703	-18.342726	151.780531	164.021508	175.650436
9	eggs	2003-01-01	67.344078	-86.239911	-44.959745	-1.506939	136.195095	179.647901	220.928067
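Rather than reading the table by eye, the negative lower bounds can be located programmatically. The sketch below rebuilds only the lower-bound columns from the table above, rounded to two decimals, so that it runs without an API call; with the real forecast DataFrame the same filter would apply directly.

```python
import pandas as pd

# Lower bounds copied from the forecast table above (rounded to two decimals).
lower_bounds = pd.DataFrame(
    {
        "ds": pd.to_datetime([f"{year}-01-01" for year in range(1994, 2004)]),
        "TimeGPT-lo-99.5": [43.10, -20.92, 6.50, 17.30, 5.00,
                            29.16, 14.67, 3.92, -42.21, -86.24],
        "TimeGPT-lo-90": [46.13, -4.75, 8.29, 24.97, 12.35,
                          31.80, 19.31, 13.19, -30.58, -44.96],
        "TimeGPT-lo-80": [49.32, 12.28, 10.18, 33.03, 20.09,
                          34.59, 24.18, 22.95, -18.34, -1.51],
    }
)

# For each interval level, list the years whose lower bound is below zero.
negative_years = {
    col: lower_bounds.loc[lower_bounds[col] < 0, "ds"].dt.year.tolist()
    for col in lower_bounds.columns
    if col != "ds"
}
print(negative_years)
# {'TimeGPT-lo-99.5': [1995, 2002, 2003], 'TimeGPT-lo-90': [1995, 2002, 2003], 'TimeGPT-lo-80': [2002, 2003]}
```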

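This demonstrates the value of the log transformation for obtaining bounded forecasts with TimeGPT: the back-transformed intervals cannot fall below zero, which gives us better calibrated prediction intervals.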
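The log transform only enforces a lower bound of zero. When forecasts must stay inside a two-sided range, one option is the scaled logit transform described in Forecasting: Principles and Practice. The sketch below is an illustration under assumptions: the helper names scaled_logit and inv_scaled_logit and the bounds 0 and 400 are hypothetical and are not part of the nixtla package or of this tutorial's dataset.

```python
import numpy as np

def scaled_logit(y, lower, upper):
    """Map values strictly between lower and upper onto the real line."""
    y = np.asarray(y, dtype=float)
    return np.log((y - lower) / (upper - y))

def inv_scaled_logit(z, lower, upper):
    """Map real values back into the range between lower and upper."""
    z = np.asarray(z, dtype=float)
    return lower + (upper - lower) / (1.0 + np.exp(-z))

# Hypothetical bounds, chosen only for illustration.
lower, upper = 0.0, 400.0

# Last three observed prices from the eggs series.
prices = np.array([74.79, 64.86, 62.27])

# Transform before forecasting, invert afterwards.
z = scaled_logit(prices, lower, upper)
round_trip = inv_scaled_logit(z, lower, upper)

# Even extreme values on the transformed scale stay inside the bounds.
extreme = inv_scaled_logit(np.array([-20.0, 20.0]), lower, upper)

print(np.allclose(round_trip, prices))
print(extreme)
```

The workflow is the same as with the log transform: apply scaled_logit to the target column, forecast, then apply inv_scaled_logit to the point forecast and to every interval column.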

References

Hyndman, Rob J., and George Athanasopoulos (2021). "Forecasting: Principles and Practice (3rd Ed)"
