Anomaly detection

!pip install -Uqq nixtla

from nixtla.utils import in_colab

IN_COLAB = in_colab()

if not IN_COLAB:
    from nixtla.utils import colab_badge
    from dotenv import load_dotenv

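Anomaly detection is the task of identifying points that deviate from the normal behavior of a series. It is crucial in many applications, such as cybersecurity and equipment monitoring.

In this tutorial, we explore the anomaly detection capabilities of TimeGPT in detail.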

if not IN_COLAB:
    load_dotenv()
    colab_badge('docs/tutorials/20_anomaly_detection')

Import packages

First, we import the packages required for this tutorial and create an instance of NixtlaClient.

import pandas as pd
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, set the base_url argument:

nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")

if not IN_COLAB:
    nixtla_client = NixtlaClient()

Load the dataset

Now, let's load the dataset for this tutorial. We use the Peyton Manning dataset, which tracks visits to Peyton Manning's Wikipedia page.

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')

df.head()

	timestamp	value
0	2007-12-10	9.590761
1	2007-12-11	8.519590
2	2007-12-12	8.183677
3	2007-12-13	8.072467
4	2007-12-14	7.893572
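The calls later in this tutorial pass freq='D' because the series has one row per day. As a quick sanity check, the sketch below retypes the five rows shown above into a small DataFrame (the name `sample` is only for illustration) and asks pandas to infer the frequency from the parsed timestamps.

```python
import pandas as pd

# The five rows printed by df.head(), retyped by hand for illustration.
sample = pd.DataFrame({
    "timestamp": ["2007-12-10", "2007-12-11", "2007-12-12", "2007-12-13", "2007-12-14"],
    "value": [9.590761, 8.519590, 8.183677, 8.072467, 7.893572],
})

# Parse the strings into datetimes, then infer the spacing between rows.
sample["timestamp"] = pd.to_datetime(sample["timestamp"])
inferred_freq = pd.infer_freq(pd.DatetimeIndex(sample["timestamp"]))
print(inferred_freq)
```

On the full dataset, gaps in the dates could prevent pandas from inferring a frequency, so this check is only a convenience.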

nixtla_client.plot(
    df,
    time_col='timestamp',
    target_col='value',
    max_insample_length=365
)

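Anomaly detection

We now perform anomaly detection. By default, TimeGPT uses a 99% confidence interval. A point that falls outside this interval is considered an anomaly.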

anomalies_df = nixtla_client.detect_anomalies(
    df, 
    time_col='timestamp', 
    target_col='value', 
    freq='D'
)

anomalies_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...

	timestamp	TimeGPT-lo-99	TimeGPT	TimeGPT-hi-99
0	2008-01-10	6.936009	8.224194	9.512378
1	2008-01-11	6.863336	8.151521	9.439705
2	2008-01-12	6.839064	8.127249	9.415433
3	2008-01-13	7.629072	8.917256	10.205441
4	2008-01-14	7.714111	9.002295	10.290480

📘 Available models in Azure AI

If you are using an Azure AI endpoint, make sure to set model="azureai":

nixtla_client.detect_anomalies(..., model="azureai")

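For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.

By default, timegpt-1 is used. For more information on when and how to use timegpt-1-long-horizon, please see this tutorial.

In the result, a label of 0 marks "normal" values, which fall inside the confidence interval, and a label of 1 marks anomalies.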
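The labeling rule can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: the interval bounds are copied from the first three rows of the output above, while the observed values and the column name `anomaly` are assumptions made up for the example.

```python
import pandas as pd

# Interval bounds copied from the output above; "value" is hypothetical.
scored = pd.DataFrame({
    "timestamp": pd.to_datetime(["2008-01-10", "2008-01-11", "2008-01-12"]),
    "value": [8.0, 9.9, 6.5],  # made-up observations
    "TimeGPT-lo-99": [6.936009, 6.863336, 6.839064],
    "TimeGPT-hi-99": [9.512378, 9.439705, 9.415433],
})

# A point is flagged (1) when it lies below the lower bound or above the upper bound.
outside = (scored["value"] < scored["TimeGPT-lo-99"]) | (scored["value"] > scored["TimeGPT-hi-99"])
scored["anomaly"] = outside.astype(int)

print(scored[["timestamp", "value", "anomaly"]])
```

Here the first value sits inside its interval and is labeled 0, while the other two fall above and below their intervals and are labeled 1.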

We can also plot these anomalies using NixtlaClient.

nixtla_client.plot(
    df, 
    anomalies_df,
    time_col='timestamp', 
    target_col='value'
)

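Anomaly detection with exogenous features

Previously, we performed anomaly detection without any exogenous features. We can also create features specifically for this scenario to help the model detect anomalies.

Here, we create date features that the model can use.

This is done with the date_features argument. We can set it to True, which generates all possible features from the given dates and the frequency of the data. Alternatively, we can pass a list of the features we want. In this case, we only want features at the month and year level.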

anomalies_df_x = nixtla_client.detect_anomalies(
    df,
    time_col='timestamp',
    target_col='value', 
    freq='D', 
    date_features=['month', 'year'],
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...
INFO:nixtla.nixtla_client:Using the following exogenous variables: month_1, month_2, month_3, month_4, month_5, month_6, month_7, month_8, month_9, month_10, month_11, month_12, year_2007, year_2008, year_2009, year_2010, year_2011, year_2012, year_2013, year_2014, year_2015, year_2016
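The log above lists one indicator per month and per year (month_1, ..., year_2016), which suggests the date features are one-hot encoded. The following sketch reproduces that naming with pandas on three hypothetical dates. It is an assumption about the shape of the features, not the client's actual code.

```python
import pandas as pd

# Three hypothetical timestamps, chosen to span two years and three months.
dates = pd.DataFrame({
    "timestamp": pd.to_datetime(["2007-12-10", "2008-01-10", "2008-02-10"]),
})

# One indicator column per distinct month and per distinct year.
month = pd.get_dummies(dates["timestamp"].dt.month, prefix="month")
year = pd.get_dummies(dates["timestamp"].dt.year, prefix="year")

features = pd.concat([dates, month, year], axis=1)
print(list(features.columns))
```

Each row has exactly one active month column and one active year column, which is why the full dataset yields twelve month features and one feature per calendar year.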

Then, we can plot the weight of each feature to understand its impact on anomaly detection.

nixtla_client.weights_x.plot.barh(x='features', y='weights')
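With more than twenty features, a table sorted by magnitude can be easier to read than the bar chart. The sketch below uses a hypothetical stand-in for nixtla_client.weights_x with the same two columns used in the plot call (features and weights). The numbers are invented, and reading a larger absolute weight as a stronger influence is an assumption, not something the tutorial states.

```python
import pandas as pd

# Hypothetical stand-in for nixtla_client.weights_x; the values are invented.
weights_x = pd.DataFrame({
    "features": ["month_1", "month_2", "year_2008"],
    "weights": [0.12, -0.45, 0.03],
})

# Rank features by the absolute size of their weight, largest first.
ranked = weights_x.sort_values("weights", key=lambda s: s.abs(), ascending=False)
print(ranked.reset_index(drop=True))
```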

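Modifying the confidence interval

We can adjust the confidence interval with the level argument. It accepts any value between 0 and 100, including decimals.

Lowering the level narrows the interval, so more points fall outside it and are flagged as anomalies. Raising the level widens the interval and reduces the number of anomalies.

For example, here we lower the level to 70%, and the plot shows more anomalies (red dots).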

anomalies_df = nixtla_client.detect_anomalies(
    df, 
    time_col='timestamp', 
    target_col='value', 
    freq='D',
    level=70
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...

nixtla_client.plot(
    df, 
    anomalies_df,
    time_col='timestamp', 
    target_col='value'
)
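To see why a lower level flags more points, consider a simplified setting in which the interval is symmetric and based on a normal distribution. That is an assumption made only for this sketch, since the tutorial does not describe how TimeGPT builds its intervals. The standardized residuals and the helper count_anomalies are hypothetical.

```python
from statistics import NormalDist

# Hypothetical standardized residuals (observed minus predicted, in standard deviations).
residuals = [0.2, -1.3, 2.9, 1.1, -0.5, -2.0]

def count_anomalies(level: float) -> int:
    """Count residuals outside a symmetric normal interval covering `level` percent."""
    # For level=99, this is the 99.5th percentile of the standard normal.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    return sum(abs(r) > z for r in residuals)

count_99 = count_anomalies(99)
count_70 = count_anomalies(70)
print(count_99, count_70)
```

Under this assumption the half-width shrinks from about 2.58 to about 1.04 standard deviations when the level drops from 99 to 70, so the set of flagged points can only grow.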
