使用LOESS的多季节趋势分解（MSTL）¶

本笔记本演示了使用 MSTL [1] 将时间序列分解为：趋势分量、多个季节性分量和残差分量。MSTL 使用 STL（基于 LOESS 的季节性趋势分解）迭代地从时间序列中提取季节性分量。MSTL 的关键输入包括：

periods - 每个季节性成分的周期（例如，对于具有每日和每周季节性的小时数据，我们将有：periods=(24, 24*7)。
windows - 每个季节性平滑器相对于每个周期的长度。如果这些值较大，则季节性成分在时间上将显示较少的变异性。必须是奇数。如果 None，则使用原始论文[1]中通过实验确定的一组默认值。
lmbda - Box-Cox变换前的lambda参数，用于分解。如果None，则不进行变换。如果"auto"，则从数据中自动选择合适的lambda值。
iterate - 用于优化季节性成分的迭代次数。
stl_kwargs - 可以传递给STL的所有其他参数（例如，robust、seasonal_deg等）。请参阅STL文档。

[1] K. Bandura, R.J. Hyndman, 和 C. Bergmeir (2021) MSTL: 一种用于具有多季节模式时间序列的季节性趋势分解算法。arXiv 预印本 arXiv:2107.13462。

请注意，此实现与1有一些关键差异。缺失数据必须在MSTL类之外处理。论文中提出的算法处理了没有季节性的情况。此实现假设至少存在一个季节性成分。

首先我们导入所需的包，准备图形环境，并准备数据。

[1]:

import matplotlib.pyplot as plt
import datetime
import pandas as pd
import numpy as np
import seaborn as sns
from pandas.plotting import register_matplotlib_converters

from statsmodels.tsa.seasonal import MSTL
from statsmodels.tsa.seasonal import DecomposeResult

register_matplotlib_converters()
sns.set_style("darkgrid")

[2]:

plt.rc("figure", figsize=(16, 12))
plt.rc("font", size=13)

MSTL 应用于一个玩具数据集¶

创建一个具有多重季节性的玩具数据集¶

我们创建了一个具有每小时频率的时间序列，该序列具有每日和每周的季节性，这些季节性遵循正弦波。我们稍后在笔记本中演示了一个更贴近现实世界的示例。

[3]:

t = np.arange(1, 1000)
daily_seasonality = 5 * np.sin(2 * np.pi * t / 24)
weekly_seasonality = 10 * np.sin(2 * np.pi * t / (24 * 7))
trend = 0.0001 * t**2
y = trend + daily_seasonality + weekly_seasonality + np.random.randn(len(t))
ts = pd.date_range(start="2020-01-01", freq="H", periods=len(t))
df = pd.DataFrame(data=y, index=ts, columns=["y"])

/var/folders/xc/cwj7_pwj6lb0lkpyjtcbm7y80000gn/T/ipykernel_80673/288299940.py:6: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  ts = pd.date_range(start="2020-01-01", freq="H", periods=len(t))

[4]:

df.head()

[4]:

	y
2020-01-01 00:00:00	1.749946
2020-01-01 01:00:00	1.473535
2020-01-01 02:00:00	5.653997
2020-01-01 03:00:00	7.034944
2020-01-01 04:00:00	6.379327

让我们绘制时间序列

[5]:

df["y"].plot(figsize=[10, 5])

[5]:

<Axes: >

../../../_images/examples_notebooks_generated_mstl_decomposition_9_1.png

用MSTL分解玩具数据集¶

让我们使用MSTL将时间序列分解为趋势分量、每日和每周的季节性分量以及残差分量。

[6]:

mstl = MSTL(df["y"], periods=[24, 24 * 7])
res = mstl.fit()

如果输入是 pandas 数据框，那么季节性成分的输出也是一个数据框。每个成分的周期反映在列名中。

[7]:

res.seasonal.head()

[7]:

	seasonal_24	seasonal_168
2020-01-01 00:00:00	0.583430	1.346826
2020-01-01 01:00:00	1.650795	0.370467
2020-01-01 02:00:00	3.613729	1.789999
2020-01-01 03:00:00	4.960929	1.323921
2020-01-01 04:00:00	4.688963	2.312499

[8]:

ax = res.plot()

../../../_images/examples_notebooks_generated_mstl_decomposition_15_0.png

我们看到每小时和每周的季节性成分已经被提取出来。

除了 period 和 seasonal（因为它们由 periods 和 windows 在 MSTL 中设置）之外的任何 STL 参数，也可以通过将 arg:value 对作为字典传递给 stl_kwargs 来设置（我们将在示例中展示这一点）。

在这里，我们展示了我们仍然可以通过trend设置STL的趋势平滑器，并通过seasonal_deg设置季节性拟合的多项式阶数。我们还将显式设置windows、seasonal_deg和iterate参数。我们将得到一个较差的拟合结果，但这只是一个如何将这些参数传递给MSTL类的示例。

[9]:

mstl = MSTL(
    df,
    periods=[24, 24 * 7],  # The periods and windows must be the same length and will correspond to one another.
    windows=[101, 101],  # Setting this large along with `seasonal_deg=0` will force the seasonality to be periodic.
    iterate=3,
    stl_kwargs={
                "trend":1001, # Setting this large will force the trend to be smoother.
                "seasonal_deg":0, # Means the seasonal smoother is fit with a moving average.
               }
)
res = mstl.fit()
ax = res.plot()

../../../_images/examples_notebooks_generated_mstl_decomposition_18_0.png

MSTL应用于电力需求数据集¶

准备数据¶

我们将使用这里找到的维多利亚电力需求数据集：https://github.com/tidyverts/tsibbledata/tree/master/data-raw/vic_elec。该数据集用于原始MSTL论文 [1]。它是2002年至2015年初澳大利亚维多利亚州以半小时为粒度的总电力需求。有关数据集的更详细描述可以在这里找到。

[10]:

url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
df = pd.read_csv(url)

[11]:

df.head()

[11]:

	Date	Period	OperationalLessIndustrial	Industrial
0	37257	1	3535.867064	1086.132936
1	37257	2	3383.499028	1088.500972
2	37257	3	3655.527552	1084.472448
3	37257	4	3510.446636	1085.553364
4	37257	5	3294.697156	1081.302844

日期是表示从原点日期开始的天数的整数。此数据集的原点日期由这里和这里确定，为“1899-12-30”。Period整数指的是一天中24小时的30分钟间隔，因此每天有48个间隔。

让我们提取日期和日期时间。

[12]:

df["Date"] = df["Date"].apply(lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="days"))
df["ds"] = df["Date"] + pd.to_timedelta((df["Period"]-1)*30, unit="m")

我们将关注OperationalLessIndustrial，这是排除某些高能耗工业用户需求后的电力需求。我们将数据重采样为每小时，并将数据过滤到与原始MSTL论文 [1]相同的时间段，即2012年的前149天。

[13]:

timeseries = df[["ds", "OperationalLessIndustrial"]]
timeseries.columns = ["ds", "y"] # Rename to OperationalLessIndustrial to y for simplicity.

# Filter for first 149 days of 2012.
start_date = pd.to_datetime("2012-01-01")
end_date = start_date + pd.Timedelta("149D")
mask = (timeseries["ds"] >= start_date) & (timeseries["ds"] < end_date)
timeseries = timeseries[mask]

# Resample to hourly
timeseries = timeseries.set_index("ds").resample("H").sum()
timeseries.head()

/var/folders/xc/cwj7_pwj6lb0lkpyjtcbm7y80000gn/T/ipykernel_80673/185151541.py:11: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  timeseries = timeseries.set_index("ds").resample("H").sum()

[13]:

	y
ds
2012-01-01 00:00:00	7926.529376
2012-01-01 01:00:00	7901.826990
2012-01-01 02:00:00	7255.721350
2012-01-01 03:00:00	6792.503352
2012-01-01 04:00:00	6635.984460

使用MSTL分解电力需求¶

让我们将MSTL应用于此数据集。

注意：stl_kwargs 设置为给出接近 [1] 的结果，该文献使用了 R，因此对于底层 STL 参数有略微不同的默认设置。在实践中，手动明确设置 inner_iter 和 outer_iter 的情况很少见。

[14]:

mstl = MSTL(timeseries["y"], periods=[24, 24 * 7], iterate=3, stl_kwargs={"seasonal_deg": 0,
                                                                          "inner_iter": 2,
                                                                          "outer_iter": 0})
res = mstl.fit() # Use .fit() to perform and return the decomposition
ax = res.plot()
plt.tight_layout()

../../../_images/examples_notebooks_generated_mstl_decomposition_30_0.png

多个季节性成分存储在 seasonal 属性中的 pandas 数据框中：

[15]:

res.seasonal.head()

[15]:

	seasonal_24	seasonal_168
ds
2012-01-01 00:00:00	-1685.986297	-161.807086
2012-01-01 01:00:00	-1591.640845	-229.788887
2012-01-01 02:00:00	-2192.989492	-260.121300
2012-01-01 03:00:00	-2442.169359	-388.484499
2012-01-01 04:00:00	-2357.492551	-660.245476

让我们更详细地检查季节性成分，并查看前几天的数据，以分析每日和每周的季节性。

[16]:

fig, ax = plt.subplots(nrows=2, figsize=[10,10])
res.seasonal["seasonal_24"].iloc[:24*3].plot(ax=ax[0])
ax[0].set_ylabel("seasonal_24")
ax[0].set_title("Daily seasonality")

res.seasonal["seasonal_168"].iloc[:24*7*3].plot(ax=ax[1])
ax[1].set_ylabel("seasonal_168")
ax[1].set_title("Weekly seasonality")

plt.tight_layout()

../../../_images/examples_notebooks_generated_mstl_decomposition_34_0.png

我们可以看到，电力需求的日季节性得到了很好的捕捉。这是1月份的前几天，因此在澳大利亚的夏季月份，下午有一个高峰，这很可能是由于空调的使用。

对于每周的季节性，我们可以看到周末的使用量较少。

MSTL的优点之一是它允许我们捕捉随时间变化的季节性。因此，让我们来看看五月较凉爽月份的季节性。

[17]:

fig, ax = plt.subplots(nrows=2, figsize=[10,10])
mask = res.seasonal.index.month==5
res.seasonal[mask]["seasonal_24"].iloc[:24*3].plot(ax=ax[0])
ax[0].set_ylabel("seasonal_24")
ax[0].set_title("Daily seasonality")

res.seasonal[mask]["seasonal_168"].iloc[:24*7*3].plot(ax=ax[1])
ax[1].set_ylabel("seasonal_168")
ax[1].set_title("Weekly seasonality")

plt.tight_layout()

../../../_images/examples_notebooks_generated_mstl_decomposition_36_0.png

现在我们可以看到晚上有一个额外的峰值！这可能与现在晚上所需的供暖和照明有关。所以这是有道理的。我们看到周末需求较低的主要每周模式仍在继续。

其他组件也可以从 trend 和 resid 属性中提取：

[18]:

display(res.trend.head()) # trend component
display(res.resid.head()) # residual component

ds
2012-01-01 00:00:00    10373.942662
2012-01-01 01:00:00    10363.488489
2012-01-01 02:00:00    10353.037721
2012-01-01 03:00:00    10342.590527
2012-01-01 04:00:00    10332.147100
Freq: h, Name: trend, dtype: float64

ds
2012-01-01 00:00:00   -599.619903
2012-01-01 01:00:00   -640.231767
2012-01-01 02:00:00   -644.205579
2012-01-01 03:00:00   -719.433316
2012-01-01 04:00:00   -678.424613
Freq: h, Name: resid, dtype: float64

就是这样！使用MSTL，我们可以在多季节性时间序列上执行时间序列分解！

Last update: Oct 16, 2024