PAA and SAX features
This example presents a comparison between PAA [1], SAX [2] and 1d-SAX [3] features.
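PAA (Piecewise Aggregate Approximation) corresponds to a downsampling of the original time series: the series is split into segments of fixed size, and only the mean value of each segment is retained.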
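To make the downsampling concrete, here is a minimal pure-Python sketch of PAA. The helper names `paa` and `paa_inverse` are hypothetical and are not part of tslearn; the sketch also assumes, for simplicity, that any trailing samples that do not fill a whole segment are ignored.

```python
def paa(series, n_segments):
    """Return the mean of each of n_segments fixed-size segments.

    Simplifying assumption: trailing samples that do not fill a
    whole segment are dropped.
    """
    seg_len = len(series) // n_segments
    if seg_len == 0:
        raise ValueError("n_segments must not exceed the series length")
    return [
        sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
        for i in range(n_segments)
    ]


def paa_inverse(means, seg_len):
    """Repeat each segment mean seg_len times (piecewise-constant signal)."""
    return [m for m in means for _ in range(seg_len)]


series = [0.0, 2.0, 4.0, 6.0, 1.0, 3.0]
means = paa(series, n_segments=3)
print(means)                  # [1.0, 5.0, 2.0]
print(paa_inverse(means, 2))  # [1.0, 1.0, 5.0, 5.0, 2.0, 2.0]
```

The inverse step is what makes the PAA curve in the figure look like a staircase: each segment mean is simply held constant over the segment.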
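SAX (Symbolic Aggregate approXimation) builds upon PAA by quantizing the segment means. The quantization boundaries are computed so that all symbols are equiprobable under a standard normal distribution assumption.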
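The equiprobable boundaries can be illustrated with the standard library alone. In the sketch below, `sax_breakpoints` and `sax_symbols` are hypothetical helper names (not tslearn functions), and mapping a value that falls exactly on a boundary to the upper bin is an arbitrary choice made for this illustration.

```python
from bisect import bisect_right
from statistics import NormalDist


def sax_breakpoints(alphabet_size):
    """Boundaries that cut N(0, 1) into alphabet_size equiprobable bins."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return [
        standard_normal.inv_cdf(k / alphabet_size)
        for k in range(1, alphabet_size)
    ]


def sax_symbols(paa_means, alphabet_size):
    """Map each PAA mean to the index of the bin it falls into."""
    breakpoints = sax_breakpoints(alphabet_size)
    return [bisect_right(breakpoints, m) for m in paa_means]


# With 4 symbols, the boundaries sit at the quartiles of N(0, 1):
# roughly -0.674, 0.0 and 0.674.
print(sax_breakpoints(4))
print(sax_symbols([-1.0, -0.3, 0.2, 1.5], alphabet_size=4))  # [0, 1, 2, 3]
```

This is why the input series is rescaled to zero mean and unit variance before the transform: the boundaries are only equiprobable if the values are approximately standard normal.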
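Finally, 1d-SAX is an extension of SAX in which each segment is represented by an affine function, so two parameters are quantized per segment: the slope and the mean value.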
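As a rough illustration of the per-segment affine representation, the sketch below fits a least-squares line to each segment and returns its slope and mean, before any quantization. The function names are hypothetical and this is not tslearn's implementation; quantizing the two values is left out on purpose.

```python
def segment_slope_and_mean(segment):
    """Least-squares slope and mean value of one segment.

    Time indices are assumed to be 0, 1, ..., len(segment) - 1.
    """
    n = len(segment)
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:  # single-sample segment: no slope can be estimated
        return 0.0, y_mean
    numer = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    return numer / denom, y_mean


def one_d_features(series, n_segments):
    """(slope, mean) pair for each of n_segments fixed-size segments."""
    seg_len = len(series) // n_segments
    return [
        segment_slope_and_mean(series[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]


series = [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0]
print(one_d_features(series, n_segments=2))  # [(1.0, 1.5), (0.0, 5.0)]
```

Compared with SAX, the extra slope parameter lets the reconstruction follow trends inside a segment instead of holding a constant level.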
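[1] E. Keogh & M. Pazzani. Scaling up dynamic time warping for datamining applications. SIGKDD 2000, pp. 285–289.
[2] J. Lin, E. Keogh, L. Wei, et al. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 2007. vol. 15(107)
[3] S. Malinowski, T. Guyet, R. Quiniou, R. Tavenard. 1d-SAX: a Novel Symbolic Representation for Time Series. IDA 2013.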
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
import matplotlib.pyplot as plt
from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import PiecewiseAggregateApproximation
from tslearn.piecewise import SymbolicAggregateApproximation, \
    OneD_SymbolicAggregateApproximation
numpy.random.seed(0)
# Generate a random walk time series
n_ts, sz, d = 1, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.) # Rescale time series
dataset = scaler.fit_transform(dataset)
# PAA transform (and inverse transform) of the data
n_paa_segments = 10
paa = PiecewiseAggregateApproximation(n_segments=n_paa_segments)
paa_dataset_inv = paa.inverse_transform(paa.fit_transform(dataset))
# SAX transform
n_sax_symbols = 8
sax = SymbolicAggregateApproximation(n_segments=n_paa_segments,
                                     alphabet_size_avg=n_sax_symbols)
sax_dataset_inv = sax.inverse_transform(sax.fit_transform(dataset))
# 1d-SAX transform
n_sax_symbols_avg = 8
n_sax_symbols_slope = 8
one_d_sax = OneD_SymbolicAggregateApproximation(
    n_segments=n_paa_segments,
    alphabet_size_avg=n_sax_symbols_avg,
    alphabet_size_slope=n_sax_symbols_slope)
transformed_data = one_d_sax.fit_transform(dataset)
one_d_sax_dataset_inv = one_d_sax.inverse_transform(transformed_data)
plt.figure()
plt.subplot(2, 2, 1) # First, raw time series
plt.plot(dataset[0].ravel(), "b-")
plt.title("Raw time series")
plt.subplot(2, 2, 2) # Second, PAA
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(paa_dataset_inv[0].ravel(), "b-")
plt.title("PAA")
plt.subplot(2, 2, 3) # Then SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(sax_dataset_inv[0].ravel(), "b-")
plt.title("SAX, %d symbols" % n_sax_symbols)
plt.subplot(2, 2, 4) # Finally, 1d-SAX
plt.plot(dataset[0].ravel(), "b-", alpha=0.4)
plt.plot(one_d_sax_dataset_inv[0].ravel(), "b-")
# Note the space before "(": the two string literals are concatenated
plt.title("1d-SAX, %d symbols "
          "(%dx%d)" % (n_sax_symbols_avg * n_sax_symbols_slope,
                       n_sax_symbols_avg,
                       n_sax_symbols_slope))
plt.tight_layout()
plt.show()
Total running time of the script: (0 minutes 5.561 seconds)