Fine-tuning with a specific loss function

!pip install -Uqq nixtla utilsforecast

from nixtla.utils import in_colab

IN_COLAB = in_colab()

if not IN_COLAB:
    from nixtla.utils import colab_badge
    from dotenv import load_dotenv

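When fine-tuning, the model trains on your dataset to adapt its forecasts to your particular scenario. You can therefore specify the loss function used during fine-tuning.

Specifically, you can choose one of the following:

"default" - a proprietary loss function that is robust to outliers
"mae" - mean absolute error
"mse" - mean squared error
"rmse" - root mean squared error
"mape" - mean absolute percentage error
"smape" - symmetric mean absolute percentage error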
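
Since the loss is passed as a plain string, a typo is easy to make. The following is a minimal sketch of a local check against the options listed above; `ALLOWED_FINETUNE_LOSSES` and `check_finetune_loss` are hypothetical names introduced here for illustration and are not part of the nixtla package.

```python
# Loss names taken from the list above. The constant and the helper are
# illustrative only; they are not provided by the nixtla package.
ALLOWED_FINETUNE_LOSSES = ("default", "mae", "mse", "rmse", "mape", "smape")


def check_finetune_loss(name):
    """Return the name unchanged if it is one of the documented options."""
    if name not in ALLOWED_FINETUNE_LOSSES:
        raise ValueError(
            f"Unknown finetune_loss {name!r}; expected one of {ALLOWED_FINETUNE_LOSSES}"
        )
    return name


print(check_finetune_loss("mae"))  # mae
```

Running a check like this before a request catches a misspelled loss name locally instead of after a round trip to the API.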

if not IN_COLAB:
    load_dotenv()    
    colab_badge('docs/tutorials/07_loss_function_finetuning')

1. Import packages

First, we import the required packages and initialize the Nixtla client.

import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse, rmse, mape, smape

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)
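
The comment in the cell above notes that the key defaults to the `NIXTLA_API_KEY` environment variable. The sketch below mirrors that fallback in plain Python so a missing key can be detected early; `resolve_api_key` is a hypothetical helper written for this tutorial, not the client's actual implementation.

```python
import os


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise fall back to NIXTLA_API_KEY.

    Hypothetical helper: it only illustrates the fallback described in the
    comment above and is not how the nixtla client is implemented.
    """
    key = explicit_key or os.environ.get("NIXTLA_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set NIXTLA_API_KEY."
        )
    return key
```

The returned value could then be passed as `api_key` when creating the client.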

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, remember to also set the base_url argument:

nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")

if not IN_COLAB:
    nixtla_client = NixtlaClient()

2. Load data

We load the air passengers dataset and add a unique_id column that identifies the series.

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.insert(loc=0, column='unique_id', value=1)

df.head()

	unique_id	timestamp	value
0	1	1949-01-01	112
1	1	1949-02-01	118
2	1	1949-03-01	132
3	1	1949-04-01	129
4	1	1949-05-01	121

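3. Fine-tuning with the mean absolute error

Let's fine-tune the model on this dataset using the mean absolute error (MAE).

To do so, we simply pass the string that names the loss function to the finetune_loss parameter of the forecast method.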

timegpt_fcst_finetune_mae_df = nixtla_client.forecast(
    df=df, 
    h=12, 
    finetune_steps=10,
    finetune_loss='mae',   # set your desired loss function
    time_col='timestamp', 
    target_col='value',
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

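📘 Available models in Azure AI

If you are using an Azure AI endpoint, make sure to set model="azureai":

nixtla_client.forecast(..., model="azureai")

For the public API, two models are supported: timegpt-1 and timegpt-1-long-horizon.

timegpt-1 is used by default. See this tutorial for how and when to use timegpt-1-long-horizon.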

nixtla_client.plot(
    df, timegpt_fcst_finetune_mae_df, 
    time_col='timestamp', target_col='value',
)

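Depending on your data, you will rely on a specific error metric to evaluate the performance of your forecasting model.

Below is a non-exhaustive guide to choosing a metric for your use case.

Mean Absolute Error (MAE)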

$\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} |y_{\tau} - \hat{y}_{\tau}|$

Robust to outliers
Easy to understand
You care equally about errors of all sizes
Expressed in the same units as your data
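
To make the MAE formula concrete, here is a minimal pure-Python sketch. The actual values are the first four rows of the dataset shown earlier; the forecasts are made-up numbers, and `mean_absolute_error` is a name chosen for this illustration rather than a library function.

```python
def mean_absolute_error(y, y_hat):
    """Average of |y - y_hat| over the forecast horizon."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


# First four observed values of the dataset, with made-up forecasts
actual = [112, 118, 132, 129]
forecast = [110, 120, 130, 135]

# Absolute errors are 2, 2, 2 and 6, so the mean is 12 / 4
print(mean_absolute_error(actual, forecast))  # 3.0
```

The result is in the same units as the data: on average, the forecast is off by 3 units.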

Mean Squared Error (MSE)

$\mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}$

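You want to penalize large errors more heavily than small ones
Sensitive to outliers
Use it when large errors must be avoided
Expressed in the squared units of your data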
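
The different treatment of large errors can be seen by comparing two forecasts with the same total absolute error. This is an illustrative sketch with made-up numbers; the function names are chosen here and are not library functions.

```python
def mean_absolute_error(y, y_hat):
    """Average of |y - y_hat|."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def mean_squared_error(y, y_hat):
    """Average of (y - y_hat)^2."""
    return sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y)


actual = [10, 10, 10, 10]
spread_out = [11, 11, 11, 11]    # four errors of size 1
one_big_miss = [10, 10, 10, 14]  # a single error of size 4

# MAE cannot tell the two forecasts apart
print(mean_absolute_error(actual, spread_out))    # 1.0
print(mean_absolute_error(actual, one_big_miss))  # 1.0

# MSE penalizes the single large miss four times as much
print(mean_squared_error(actual, spread_out))    # 1.0
print(mean_squared_error(actual, one_big_miss))  # 4.0
```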

Root Mean Squared Error (RMSE)

$\mathrm{RMSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \sqrt{\frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}}$

Brings the MSE back to the original units of your data
Penalizes large errors more heavily than small ones
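
As a small worked example with made-up numbers, the sketch below takes the square root of the mean squared error; `root_mean_squared_error` is a name chosen for this illustration.

```python
import math


def root_mean_squared_error(y, y_hat):
    """Square root of the average squared error."""
    mse = sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y)
    return math.sqrt(mse)


actual = [10, 10, 10, 10]
one_big_miss = [10, 10, 10, 14]  # a single error of size 4

# The squared errors average to 16 / 4 = 4, and the square root of 4 is 2
print(root_mean_squared_error(actual, one_big_miss))  # 2.0
```

For this forecast the MAE would be 1.0, so the RMSE of 2.0 still reflects the extra weight on the large miss while staying in the units of the data.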

Mean Absolute Percentage Error (MAPE)

$\mathrm{MAPE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|}$

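Easy to understand for non-technical stakeholders
Expressed as a percentage
Asymmetric: for non-negative data an under-forecast can cost at most 100%, while an over-forecast has no upper bound, so over-forecasts are penalized more heavily
Avoid it if your data contains values at or close to 0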
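
The following sketch, with made-up numbers, illustrates two properties of MAPE: its asymmetry between under- and over-forecasts, and its instability when actual values are close to zero. The function name is chosen for this illustration, and the result is returned as a fraction, matching the formula above (multiply by 100 for a percentage).

```python
def mean_absolute_percentage_error(y, y_hat):
    """Average of |y - y_hat| / |y|, as a fraction (0.5 means 50%)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(y, y_hat)) / len(y)


# With non-negative data, the worst possible under-forecast costs 100% ...
under = mean_absolute_percentage_error([100], [0])
# ... while over-forecasts have no upper bound
over = mean_absolute_percentage_error([100], [300])
print(under, over)  # 1.0 2.0

# A tiny actual value inflates the metric: an absolute miss of about 1
# becomes an error of roughly 99, i.e. about 9900%
near_zero = mean_absolute_percentage_error([0.01], [1.0])
print(near_zero)
```

An actual value of exactly 0 makes the ratio undefined; in this sketch it raises `ZeroDivisionError`.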

Symmetric Mean Absolute Percentage Error (sMAPE)

$\mathrm{SMAPE}_{2}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|+|\hat{y}_{\tau}|}$

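Reduces the bias of MAPE
Designed to treat over- and under-forecasts more evenly than MAPE
Avoid it if your data contains values at or close to 0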
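
Here is a minimal sketch of the sMAPE formula as written above, without a factor of 2, so the result lies between 0 and 1. The numbers are made up, `symmetric_mape` is a name chosen for this illustration, and treating the 0/0 case as zero error is an assumption of this sketch.

```python
def symmetric_mape(y, y_hat):
    """Average of |y - y_hat| / (|y| + |y_hat|), between 0 and 1."""
    total = 0.0
    for a, f in zip(y, y_hat):
        denom = abs(a) + abs(f)
        # Assumption: when actual and forecast are both zero, count zero error
        total += abs(a - f) / denom if denom else 0.0
    return total / len(y)


# Swapping the actual and the forecast gives the same score
print(symmetric_mape([100], [150]))  # 0.2
print(symmetric_mape([150], [100]))  # 0.2

# With an actual value of 0, any nonzero forecast gets the maximum score,
# however small the miss
print(symmetric_mape([0.0], [0.001]))  # 1.0
```

The last line shows why values at or near zero are problematic: a negligible miss is scored as badly as a very large one.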

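With TimeGPT, you can choose the loss function used during fine-tuning to optimize the model for the performance metric that matters in your use case.

Let's run a small experiment to see how much each loss function improves its associated metric compared with the default setting.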

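# Hold out the last 36 months for evaluation
train = df[:-36]
test = df[-36:].copy()  # copy so that new columns are not written to a slice of df

losses = ['default', 'mae', 'mse', 'rmse', 'mape', 'smape']

for loss in losses:
    # Fine-tune with the given loss and forecast the 36 held-out months
    preds_df = nixtla_client.forecast(
        df=train,
        h=36,
        finetune_steps=10,
        finetune_loss=loss,
        time_col='timestamp',
        target_col='value',
    )

    # Store the forecasts next to the actual values, one column per loss
    test[f'TimeGPT_{loss}'] = preds_df['TimeGPT'].values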

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
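
The negative-index slicing used for the holdout split can be checked on plain row positions. This sketch assumes the series has 144 monthly observations (1949-01 through 1960-12); the variable names are chosen for this illustration.

```python
n_obs = 144    # assumed length of the monthly series, 1949-01 through 1960-12
horizon = 36   # number of months held out for evaluation

rows = list(range(n_obs))
train_idx = rows[:-horizon]  # everything except the last 36 positions
test_idx = rows[-horizon:]   # the last 36 positions

print(len(train_idx), len(test_idx), test_idx[0])  # 108 36 108
```

The first held-out position is 108, which corresponds to January 1958.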

test.head()

	unique_id	timestamp	value	TimeGPT_default	TimeGPT_mae	TimeGPT_mse	TimeGPT_rmse	TimeGPT_mape	TimeGPT_smape
108	1	1958-01-01	340	347.134094	341.933563	347.600616	347.059113	356.154938	341.958679
109	1	1958-02-01	318	345.739746	343.268738	346.399963	345.678314	354.163422	343.929657
110	1	1958-03-01	362	394.611450	390.873169	395.436646	394.636627	396.496155	392.543640
111	1	1958-04-01	348	404.133545	400.997070	404.369598	403.498901	396.927185	402.459625
112	1	1958-05-01	363	421.236542	418.793365	422.122223	421.541443	410.335663	422.161255

Great! We now have TimeGPT forecasts for every loss function. We can evaluate each one with its associated metric and measure the improvement.

loss_fct_dict = {
    "mae": mae,
    "mse": mse,
    "rmse": rmse,
    "mape": mape,
    "smape": smape
}

pct_improv = {}

for loss in losses[1:]:
    # Score the default and the loss-specific forecasts with the matching metric
    evaluation = loss_fct_dict[loss](
        test,
        models=['TimeGPT_default', f'TimeGPT_{loss}'],
        id_col='unique_id',
        target_col='value',
    )
    # Relative reduction of the metric versus the default loss, in percent
    pct_diff = (evaluation['TimeGPT_default'] - evaluation[f'TimeGPT_{loss}']) / evaluation['TimeGPT_default'] * 100
    pct_improv[loss] = pct_diff.round(2).values

metrics_df = pd.DataFrame(pct_improv)
metrics_df.index = ['Metric improvement (%)']

metrics_df

	mae	mse	rmse	mape	smape
Metric improvement (%)	8.54	0.31	0.64	31.02	7.36

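As the table above shows, fine-tuning with a specific loss function improves its associated error metric relative to the default loss function.

In this example, fine-tuning with MAE as the loss function lowered the MAE by 8.54% compared with the default loss function.

So, depending on your use case and performance metric, you can choose the appropriate loss function to maximize the accuracy of your forecasts.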
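
The improvement percentage is the relative reduction of the metric with respect to the default loss. The sketch below works through that arithmetic with hypothetical scores (they are not the values from this experiment); `pct_improvement` is a name chosen for this illustration.

```python
def pct_improvement(default_score, tuned_score):
    """Relative reduction of an error metric versus the default, in percent."""
    return (default_score - tuned_score) / default_score * 100


# Hypothetical scores: the error drops from 20.0 to 18.0
print(round(pct_improvement(20.0, 18.0), 2))  # 10.0

# A negative value means the tuned model scored worse than the default
print(round(pct_improvement(20.0, 21.0), 2))  # -5.0
```

Because all five metrics are errors, a positive percentage always means the fine-tuned forecast is closer to the actual values under that metric.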
