PAA and SAX features

This example compares PAA [1], SAX [2] and 1d-SAX [3] features.

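PAA (Piecewise Aggregate Approximation) downsamples the original time series: the series is split into segments of fixed size, and each segment is represented by its mean value.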
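The idea can be sketched in a few lines of NumPy. This is only an illustration, not the tslearn implementation: the helper name `paa` is hypothetical, and the sketch assumes the series length is an exact multiple of the number of segments.

```python
import numpy as np


def paa(series, n_segments):
    """Return the mean of each of `n_segments` equal-length segments.

    Hypothetical helper for illustration; assumes the series length is
    a multiple of `n_segments`.
    """
    series = np.asarray(series, dtype=float)
    if series.size % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    # One row per segment, then average along each row
    return series.reshape(n_segments, -1).mean(axis=1)


series = np.array([1.0, 3.0, 2.0, 4.0, 10.0, 12.0])
paa_values = paa(series, n_segments=3)
print(paa_values)  # segment means: 2.0, 3.0, 11.0
```

Repeating each mean over the length of its segment gives the piecewise-constant curve that the PAA panel of the figure displays.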

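SAX (Symbolic Aggregate approXimation) builds upon PAA by quantizing the segment means. The quantization boundaries are computed so that all symbols are equiprobable under a standard normal distribution assumption.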
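A minimal sketch of this quantization step, assuming the segment means are already available: the boundaries are taken as the quantiles of the standard normal distribution at k / alphabet_size. The helper names `sax_breakpoints` and `sax_symbols` are hypothetical and are not part of tslearn, whose internal implementation may differ.

```python
from statistics import NormalDist

import numpy as np


def sax_breakpoints(alphabet_size):
    """Boundaries that cut N(0, 1) into `alphabet_size` equiprobable bins."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return np.array([std_normal.inv_cdf(k / alphabet_size)
                     for k in range(1, alphabet_size)])


def sax_symbols(segment_means, alphabet_size):
    """Map each segment mean to the index of the bin it falls into."""
    return np.digitize(segment_means, sax_breakpoints(alphabet_size))


breakpoints = sax_breakpoints(4)  # roughly [-0.674, 0.0, 0.674]
symbols = sax_symbols(np.array([-1.2, -0.3, 0.1, 2.0]), alphabet_size=4)
print(breakpoints, symbols)
```

Because the boundaries are derived from a standard normal distribution, the sketch only makes sense for series that have been rescaled to zero mean and unit variance beforehand.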

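Finally, 1d-SAX is an extension of SAX in which each segment is represented by an affine function, so two parameters are quantized per segment: the slope and the mean value.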
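As a rough illustration of the two per-segment parameters, the sketch below fits a least-squares line to each segment and returns its slope together with the segment mean. The helper name `segment_slopes_and_means` is hypothetical, the series length is assumed to be a multiple of the number of segments, and the quantization of both parameters is deliberately left out.

```python
import numpy as np


def segment_slopes_and_means(series, n_segments):
    """Slope of a least-squares line and mean value for each segment.

    Hypothetical helper for illustration; quantization is not performed.
    """
    series = np.asarray(series, dtype=float)
    if series.size % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    segments = series.reshape(n_segments, -1)
    t = np.arange(segments.shape[1])
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slopes = np.array([np.polyfit(t, seg, deg=1)[0] for seg in segments])
    means = segments.mean(axis=1)
    return slopes, means


slopes, means = segment_slopes_and_means([0, 1, 2, 3, 5, 5, 5, 5], n_segments=2)
print(slopes, means)
```

In this toy input the first segment rises steadily and the second is flat, so the slopes distinguish two segments that a mean-only representation would describe with a single number each.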

[1] E. Keogh & M. Pazzani. Scaling up dynamic time warping for datamining applications. SIGKDD 2000, pp. 285–289.

[2] J. Lin, E. Keogh, L. Wei, et al. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 2007. vol. 15(107).

[3] S. Malinowski, T. Guyet, R. Quiniou, R. Tavenard. 1d-SAX: a Novel Symbolic Representation for Time Series. IDA 2013.

[Figure: four panels titled "Raw time series", "PAA", "SAX, 8 symbols" and "1d-SAX, 64 symbols (8x8)"]
# Author: Romain Tavenard
# License: BSD 3 clause

import numpy
import matplotlib.pyplot as plt

from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import PiecewiseAggregateApproximation
from tslearn.piecewise import SymbolicAggregateApproximation, \
    OneD_SymbolicAggregateApproximation

numpy.random.seed(0)
# Generate a random walk time series
n_ts, sz, d = 1, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.)  # Rescale time series
dataset = scaler.fit_transform(dataset)

# PAA transform (and inverse transform) of the data
n_paa_segments = 10
paa = PiecewiseAggregateApproximation(n_segments=n_paa_segments)
paa_dataset_inv = paa.inverse_transform(paa.fit_transform(dataset))

# SAX transform
n_sax_symbols = 8
sax = SymbolicAggregateApproximation(n_segments=n_paa_segments,
                                     alphabet_size_avg=n_sax_symbols)
sax_dataset_inv = sax.inverse_transform(sax.fit_transform(dataset))

# 1d-SAX transform
n_sax_symbols_avg = 8
n_sax_symbols_slope = 8
one_d_sax = OneD_SymbolicAggregateApproximation(
    n_segments=n_paa_segments,
    alphabet_size_avg=n_sax_symbols_avg,
    alphabet_size_slope=n_sax_symbols_slope)
transformed_data = one_d_sax.fit_transform(dataset)
one_d_sax_dataset_inv = one_d_sax.inverse_transform(transformed_data)

plt.figure()
plt.subplot(2, 2, 1)  # First, raw time series
plt.plot(dataset[0].ravel(), "b-")
plt.title("Raw time series")

plt.subplot(2, 2, 2)  # Second, PAA
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(paa_dataset_inv[0].ravel(), "b-")
plt.title("PAA")

plt.subplot(2, 2, 3)  # Then SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(sax_dataset_inv[0].ravel(), "b-")
plt.title("SAX, %d symbols" % n_sax_symbols)

plt.subplot(2, 2, 4)  # Finally, 1d-SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(one_d_sax_dataset_inv[0].ravel(), "b-")
plt.title("1d-SAX, %d symbols "
          "(%dx%d)" % (n_sax_symbols_avg * n_sax_symbols_slope,
                       n_sax_symbols_avg,
                       n_sax_symbols_slope))

plt.tight_layout()
plt.show()

Total running time of the script: (0 minutes 5.561 seconds)
