
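Overview of this notebook

  • Motivating example and modular building blocks

    • Connecting distances, aligners, classifiers

  • Pairwise transformers - the "type" of time series distances and kernels

  • Time series alignment and alignment distances, e.g., time warping

  • Composition patterns for distances, kernels, aligners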

[1]:
import warnings

warnings.filterwarnings("ignore")

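6.1 Motivating example

Rich component relationships between object types!

  • Many classifiers, regressors, and clusterers use distances or kernels

  • Distances and kernels are often composite, e.g., sum of distances, independent distance

  • TS distances are often based on scalar multivariate distances (e.g., Euclidean)

  • TS distances are often based on alignment, and TS aligners are an estimator type!

  • Aligners often use scalar uni/multivariate distances internally

Example:

  • 1-nn using sklearn nearest neighbors

  • using the multivariate dynamic time warping distance from the dtw-python library

  • with the multivariate "mahalanobis" distance from scipy

  • built from custom components in the sktime-compatible interface

So, conceptually:

  • We build a sequence alignment algorithm (dtw-python) using the scipy Mahalanobis distance.

  • We obtain a distance matrix computation from the alignment algorithm.

  • We use that distance matrix in sklearn knn.

  • This is a time series classifier!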
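
The last two steps of this chain can be illustrated without any library. The following is a minimal sketch, not the sklearn or sktime implementation: it assumes a distance matrix from test series to training series has already been computed, and predicts the label of the nearest training series. The function name `predict_1nn`, the distance values, and the labels are all made up for illustration.

```python
def predict_1nn(dist_to_train, y_train):
    """Predict labels with 1-nearest-neighbour from precomputed distances.

    dist_to_train[i][j] is the distance from test series i to training series j.
    """
    predictions = []
    for row in dist_to_train:
        # index of the training series with the smallest distance
        nearest = min(range(len(row)), key=row.__getitem__)
        predictions.append(y_train[nearest])
    return predictions


# hypothetical distances from 2 test series to 3 training series
dist_to_train = [
    [4.0, 0.5, 3.0],
    [1.0, 2.5, 0.2],
]
y_train = ["walking", "running", "standing"]

y_pred = predict_1nn(dist_to_train, y_train)
print(y_pred)  # ['running', 'standing']
```

The classifier never looks at the series themselves, only at the distance matrix, which is why any time series distance can be plugged in.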

[2]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn
[3]:
clf.get_params()
[3]:
{'algorithm': 'brute',
 'distance': DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))),
 'distance_mtype': None,
 'distance_params': None,
 'leaf_size': 30,
 'n_jobs': None,
 'n_neighbors': 1,
 'pass_train_distances': False,
 'weights': 'uniform',
 'distance__aligner': AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis')),
 'distance__aligner__dist_trafo': ScipyDist(metric='mahalanobis'),
 'distance__aligner__open_begin': False,
 'distance__aligner__open_end': False,
 'distance__aligner__step_pattern': 'symmetric2',
 'distance__aligner__window_type': 'none',
 'distance__aligner__dist_trafo__colalign': 'intersect',
 'distance__aligner__dist_trafo__metric': 'mahalanobis',
 'distance__aligner__dist_trafo__metric_kwargs': None,
 'distance__aligner__dist_trafo__p': 2,
 'distance__aligner__dist_trafo__var_weights': None}

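What are all the objects in this chain?

  • ScipyDist - pairwise distance between scalar-valued vectors (table rows) - type transformer-pairwise

  • AlignerDTWfromDist - time series alignment algorithm - type aligner

  • DistFromAligner - pairwise distance between time series - type transformer-pairwise-panel

  • KNeighborsTimeSeriesClassifier - time series classifier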

[4]:
from sktime.registry import scitype

scitype(mw_aligner)  # prints the type of estimator (as a string)
# same for other components
[4]:
'aligner'

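Let's go through these one by one - we have already seen classifiers.

6.2 Time series distances and kernels - pairwise panel transformers

6.2.1 Distances, kernels - general interface

Pairwise panel transformers produce one distance for each pair of series in the panel: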

[5]:
from sktime.datasets import load_osuleaf

# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")

X1 = X[:3]
X2 = X[5:10]
[6]:
# constructing the transformer
from sktime.dists_kernels import FlatDist
from sktime.dists_kernels.scipy_dist import ScipyDist

# Euclidean distance between pairs of flattened time series
eucl_dist = FlatDist(ScipyDist())
[7]:
X1.shape
[7]:
(3, 1, 427)
[8]:
X2.shape
[8]:
(5, 1, 427)

X1 is a panel with 3 series, X2 is a panel with 5 series.

So the matrix of pairwise distances from X1 to X2 should have shape (3, 5).

[9]:
distmat = eucl_dist(X1, X2)

# alternatively, via the transform method
distmat = eucl_dist.transform(X1, X2)
distmat
[9]:
array([[29.94033435, 30.69443315, 29.02704475, 30.49413394, 29.77534229],
       [28.86289916, 32.03165025, 29.6118973 , 32.95499251, 30.82017584],
       [29.52672336, 18.76259726, 30.55213501, 15.93324954, 27.89072122]])
[10]:
distmat.shape
[10]:
(3, 5)

Calling the transformer, or transform, with a single argument is the same as passing that argument twice:

[11]:
distmat_symm = eucl_dist.transform(X1)
distmat_symm
[11]:
array([[ 0.        , 24.58470308, 33.83913255],
       [24.58470308,  0.        , 35.44109497],
       [33.83913255, 35.44109497,  0.        ]])
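
To make the shape logic concrete, here is a pure-Python sketch of a flattened Euclidean distance between two panels. It only illustrates the idea behind FlatDist(ScipyDist()) and is not sktime's implementation. The helper names `flatten` and `flat_euclidean` and the toy panels are assumptions made for this example.

```python
import math


def flatten(series):
    """Concatenate all variables of one series (list of lists) into one vector."""
    return [value for variable in series for value in variable]


def flat_euclidean(panel_a, panel_b=None):
    """Pairwise Euclidean distance between the flattened series of two panels."""
    if panel_b is None:  # one argument: compare the panel with itself
        panel_b = panel_a
    return [
        [math.dist(flatten(a), flatten(b)) for b in panel_b]
        for a in panel_a
    ]


# toy panels, shaped (instances, variables, time points) = (2, 1, 3) and (3, 1, 3)
P1 = [[[0.0, 0.0, 0.0]], [[1.0, 2.0, 2.0]]]
P2 = [[[3.0, 4.0, 0.0]], [[0.0, 0.0, 1.0]], [[1.0, 2.0, 2.0]]]

cross = flat_euclidean(P1, P2)
self_dist = flat_euclidean(P1)
print(len(cross), len(cross[0]))  # 2 3
print(self_dist)  # [[0.0, 3.0], [3.0, 0.0]]
```

With two panels the result has one row per series in the first panel and one column per series in the second. With one panel it is square and symmetric, with zeros on the diagonal.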

Pairwise panel transformers are compatible with the scikit-learn / scikit-base interface and are composable, like everything else in sktime:

[12]:
eucl_dist.get_params()
[12]:
{'transformer': ScipyDist(),
 'transformer__colalign': 'intersect',
 'transformer__metric': 'euclidean',
 'transformer__metric_kwargs': None,
 'transformer__p': 2,
 'transformer__var_weights': None}

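6.2.2 Time series distances, kernels - composition

Pairwise transformers can be composed in a number of ways:

  • arithmetic, e.g., addition, multiplication - use the dunders +, * etc., or CombinedDistance

  • subsetting to one or more columns - use the my_dist[colnames] dunder

  • summing or aggregating univariate distances over a multivariate panel - use IndepDist (also known as "independent distance")

  • composition with series-to-series transformers - use the * dunder or make_pipeline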

[13]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[13]:
(3, 6, 100)
[14]:
# example 1: variable subsetting and arithmetic combinations

# we define *two* distances now
from sktime.dists_kernels import FlatDist, ScipyDist

# Euclidean distance (on flattened time series)
eucl_dist = FlatDist(ScipyDist())
# Cosine distance (on flattened time series)
cos_dist = FlatDist(ScipyDist(metric="cosine"))

# arithmetic product of:
# * the Euclidean distance on gyrometer 2 time series
# * the Cosine distance on accelerometer 3 time series
prod_dist_42 = eucl_dist[4] * cos_dist[2]
prod_dist_42
[14]:
CombinedDistance(operation='*',
                 pw_trafos=[PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist()),
                                                 transformers=[ColumnSelect(columns=4)]),
                            PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist(metric='cosine')),
                                                 transformers=[ColumnSelect(columns=2)])])
[15]:
prod_dist_42(X)
[15]:
array([[0.        , 1.87274896, 2.28712525],
       [1.87274896, 0.        , 2.62764453],
       [2.28712525, 2.62764453, 0.        ]])
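
What such a composite distance computes can be sketched with plain functions. The block below is an illustration only and does not reproduce how CombinedDistance or column subsetting are implemented in sktime. The helpers `cosine_distance`, `restrict`, and `multiply` and the two toy series are hypothetical names and data chosen for this example.

```python
import math


def cosine_distance(x, y):
    """Cosine distance 1 - cos(angle) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (math.hypot(*x) * math.hypot(*y))


def restrict(dist, variable):
    """Apply a vector distance to a single variable of multivariate series."""
    return lambda s, t: dist(s[variable], t[variable])


def multiply(dist_1, dist_2):
    """Pointwise product of two distances between series."""
    return lambda s, t: dist_1(s, t) * dist_2(s, t)


# two toy series with 2 variables and 2 time points each
s = [[0.0, 0.0], [1.0, 0.0]]
t = [[3.0, 4.0], [0.0, 1.0]]

# Euclidean distance on variable 0 times cosine distance on variable 1
prod_dist = multiply(restrict(math.dist, 0), restrict(cosine_distance, 1))

panel = [s, t]
matrix = [[prod_dist(a, b) for b in panel] for a in panel]
print(matrix)  # [[0.0, 5.0], [5.0, 0.0]]
```

Here the Euclidean distance on variable 0 is 5.0 and the cosine distance on variable 1 is 1.0 (the two vectors are orthogonal), so the product is 5.0.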
[16]:
# example 2: independent dynamic time warping distance
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.indep import IndepDist

# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())

# independent distance - by default IndepDist sums over univariate distances
indep_dtw_dist = IndepDist(dtw_dist)

# that is, this distance is arithmetic sum of
# * DTW distance on accelerometer 1 time series
# * DTW distance on accelerometer 2 time series
# * DTW distance on accelerometer 3 time series
# * DTW distance on gyrometer 1 time series
# * DTW distance on gyrometer 2 time series
# * DTW distance on gyrometer 3 time series
[17]:
indep_dtw_dist(X)
[17]:
array([[ 0.        , 31.7765985 , 32.65822   ],
       [31.7765985 ,  0.        , 39.78652033],
       [32.65822   , 39.78652033,  0.        ]])
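
The "independent" construction itself is simple: apply a univariate distance to each variable separately and sum the results. The sketch below shows this with a Manhattan distance standing in for DTW to keep it short. This is an assumption for illustration, not what IndepDist does internally, and the names `manhattan` and `independent_distance` are hypothetical.

```python
def manhattan(x, y):
    """Sum of absolute differences between two equal-length univariate series."""
    return sum(abs(a - b) for a, b in zip(x, y))


def independent_distance(univariate_dist):
    """Build a multivariate distance as the sum of per-variable distances."""

    def dist(s, t):
        return sum(univariate_dist(sv, tv) for sv, tv in zip(s, t))

    return dist


# two toy series with 2 variables and 2 time points each
s = [[1.0, 2.0], [0.0, 0.0]]
t = [[1.0, 4.0], [3.0, 1.0]]

indep = independent_distance(manhattan)
total = indep(s, t)  # variable 0 contributes 2.0, variable 1 contributes 4.0
print(total)  # 6.0
```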
[18]:
# example 3: dynamic time warping distance on first differences
from sktime.transformations.series.difference import Differencer

diff_dtw_distance = Differencer() * dtw_dist
[19]:
diff_dtw_distance(X)
[19]:
array([[ 0.      , 20.622806, 27.731956],
       [20.622806,  0.      , 30.487498],
       [27.731956, 30.487498,  0.      ]])

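Some compositions may be available as efficient numba-based distances.

For instance, difference-then-dtw is available as the sktime native implementation DtwDist(derivative=True) in sktime.dists_kernels.dtw.

6.3 Pairwise tabular transformers

6.3.1 Pairwise tabular transformers - general interface

Pairwise tabular transformers transform pairs of plain tabular data, e.g., plain pd.DataFrame.

They produce one distance for each pair of rows.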

[20]:
from sktime.datatypes import get_examples

# we retrieve some DataFrame examples
X_tabular = get_examples("pd.DataFrame", "Series")[1]
X2_tabular = get_examples("pd.DataFrame", "Series")[1][0:3]
[21]:
# just an ordinary DataFrame, no time series
X_tabular
[21]:
a b
0 1.0 3.000000
1 4.0 7.000000
2 0.5 2.000000
3 -3.0 -0.428571
[22]:
X2_tabular
[22]:
a b
0 1.0 3.0
1 4.0 7.0
2 0.5 2.0

Example: pairwise Euclidean distance between rows

[23]:
# constructing the transformer
from sktime.dists_kernels import ScipyDist

# Euclidean distance between pairs of rows
my_tabular_dist = ScipyDist(metric="euclidean")
[24]:
# obtain matrix of distances between each pair of rows in X_tabular, X2_tabular
my_tabular_dist(X_tabular, X2_tabular)
[24]:
array([[ 0.        ,  5.        ,  1.11803399],
       [ 5.        ,  0.        ,  6.10327781],
       [ 1.11803399,  6.10327781,  0.        ],
       [ 5.26831112, 10.20704039,  4.26004216]])
[25]:
# alternative call with transform:
my_tabular_dist.transform(X_tabular, X2_tabular)
[25]:
array([[ 0.        ,  5.        ,  1.11803399],
       [ 5.        ,  0.        ,  6.10327781],
       [ 1.11803399,  6.10327781,  0.        ],
       [ 5.26831112, 10.20704039,  4.26004216]])
[26]:
# as with pairwise panel transformers, one arg means second is the same
my_tabular_dist(X_tabular)
[26]:
array([[ 0.        ,  5.        ,  1.11803399,  5.26831112],
       [ 5.        ,  0.        ,  6.10327781, 10.20704039],
       [ 1.11803399,  6.10327781,  0.        ,  4.26004216],
       [ 5.26831112, 10.20704039,  4.26004216,  0.        ]])

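6.3.2 Constructing pairwise time series transformers from tabular ones

"Simple" time series distances can be obtained directly from tabular transformers:

  • flatten the time series to a table, then compute the distance - FlatDist

  • aggregate the tabular distance matrix computed between two individual time series - AggrDist

These are important "baseline" distances!

Both can be used with sktime pairwise transformers and sklearn pairwise transformers.

The classes are named "dist", but they work equally for kernels.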
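
The aggregation idea is not demonstrated in the cells that follow, so here is a small sketch of it. For two series, it computes the tabular distance between every pair of time points and aggregates that matrix to a single number. Using the mean as the aggregation function is an assumption of this sketch, and `aggregated_distance` is a hypothetical helper, not the AggrDist class.

```python
import math
from statistics import mean


def aggregated_distance(s, t, point_dist=math.dist, aggfunc=mean):
    """Aggregate point-wise distances over all pairs of time points.

    s and t are series given as lists of time points, each time point being
    a list of variable values. The two series may have different lengths.
    """
    return aggfunc([point_dist(p, q) for p in s for q in t])


# two univariate toy series with 2 time points each
s = [[0.0], [2.0]]
t = [[1.0], [3.0]]

# point-wise distances are 1, 3, 1, 1
print(aggregated_distance(s, t))  # 1.5
print(aggregated_distance(s, t, aggfunc=max))  # 3.0
```

Unlike flattening, this construction does not require the two series to have the same number of time points.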

[27]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[27]:
(3, 6, 100)
[28]:
# example 1: flat Gaussian RBF kernel between time series
from sklearn.gaussian_process.kernels import RBF

from sktime.dists_kernels import FlatDist

flat_gaussian_tskernel = FlatDist(RBF(length_scale=10))
flat_gaussian_tskernel.get_params()
[28]:
{'transformer': RBF(length_scale=10),
 'transformer__length_scale': 10,
 'transformer__length_scale_bounds': (1e-05, 100000.0)}
[29]:
flat_gaussian_tskernel(X)
[29]:
array([[1.        , 0.02267939, 0.28034066],
       [0.02267939, 1.        , 0.05447445],
       [0.28034066, 0.05447445, 1.        ]])
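
For reference, the Gaussian RBF kernel between two vectors x and y with length scale l is exp(-||x - y||^2 / (2 l^2)). The sketch below applies that formula to flattened series in pure Python. It illustrates the formula only, the helper name `flat_rbf_kernel` and the toy series are made up, and it is not the code path sklearn or sktime use.

```python
import math


def flat_rbf_kernel(s, t, length_scale=10.0):
    """Gaussian RBF kernel between two series after flattening them."""
    x = [value for variable in s for value in variable]
    y = [value for variable in t for value in variable]
    squared_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_dist / (2.0 * length_scale**2))


# toy series with 1 variable and 2 time points, squared distance 100
s = [[0.0, 0.0]]
t = [[6.0, 8.0]]

k_st = flat_rbf_kernel(s, t)  # equals exp(-0.5) for length_scale=10
k_ss = flat_rbf_kernel(s, s)
print(k_ss)  # 1.0
```

A kernel is a similarity rather than a distance: identical series give 1.0 on the diagonal, and values decay towards 0 as series move apart, which matches the pattern in the matrix above.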
[30]:
# example 2: pairwise cosine distance - we've already seen FlatDist a couple times
from sktime.dists_kernels import FlatDist, ScipyDist

cos_tsdist = FlatDist(ScipyDist(metric="cosine"))
cos_tsdist.get_params()
[30]:
{'transformer': ScipyDist(metric='cosine'),
 'transformer__colalign': 'intersect',
 'transformer__metric': 'cosine',
 'transformer__metric_kwargs': None,
 'transformer__p': 2,
 'transformer__var_weights': None}
[31]:
cos_tsdist(X)
[31]:
array([[1.11022302e-16, 1.36699314e+00, 6.99338545e-01],
       [1.36699314e+00, 0.00000000e+00, 1.10061843e+00],
       [6.99338545e-01, 1.10061843e+00, 0.00000000e+00]])

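6.4 Alignment algorithms, aka aligners

  • "Aligners" find a new index set for two or more time series so that they become "similar"

  • The new index set is a non-linear reparametrization of the old index sets

  • Often, aligners also produce an overall distance between the two series

6.4.1 Aligners - general interface

Aligner methods:

  • fit - computes the alignment

  • get_alignment - returns the reparametrized indices, also called the "alignment path"

  • get_aligned - returns the reparametrized series

  • get_distance - returns the distance between the two aligned series - only available if the "capability:distance" tag is True

Let's try to align two leaf contours from OSUleaf!

OSUleaf is a panel dataset of flattened tree leaf contours

  • instance = leaf

  • index ("time") = angle around the centroid

  • variable = distance of the contour from the centroid at that angle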


[32]:
from sktime.datasets import load_osuleaf

# load an example time series panel in pd-multiindex mtype
X, _ = load_osuleaf(return_type="pd-multiindex")

X1 = X.loc[0]  # leaf 0
X2 = X.loc[1]  # leaf 1
[33]:
from sktime.utils.plotting import plot_series

plot_series(X1, X2, labels=["leaf_1", "leaf_2"])
[33]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)
[34]:
from sktime.alignment.dtw_python import AlignerDTW

# use dtw-python package for aligning
# simple univariate alignment algorithm with default params
aligner = AlignerDTW()
[35]:
aligner.fit([X1, X2])  # series to align need to be passed as list
[35]:
AlignerDTW()
[36]:
# alignment path
aligner.get_alignment()

# this aligns, e.g.:
# from row "2": aligns index 0 in X1 with index 2 of X2
# from row "664": aligns index 424 in X1 with index 423 of X2
[36]:
ind0 ind1
0 0 0
1 0 1
2 0 2
3 1 2
4 2 3
... ... ...
663 423 422
664 424 423
665 425 424
666 426 425
667 426 426

668 rows × 2 columns

[37]:
# obtain the aligned versions of the two series
X1_al, X2_al = aligner.get_aligned()
[38]:
from sktime.utils.plotting import plot_series

plot_series(
    X1_al.reset_index(drop=True),
    X2_al.reset_index(drop=True),
    labels=["leaf_1", "leaf_2"],
)
[38]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)

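The DTW aligner implements the "distance" capability.

Intuitively, this distance combines the distance summed over the aligned series with the amount of stretching: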

[39]:
# the AlignerDTW class (based on dtw-python) doesn't just align
# it also produces a distance
aligner.get_tags()
[39]:
{'python_dependencies_alias': {'dtw-python': 'dtw'},
 'capability:multiple-alignment': False,
 'capability:distance': True,
 'capability:distance-matrix': True,
 'python_dependencies': 'dtw-python'}
[40]:
# this is the distance between the two time series we aligned
aligner.get_distance()
[40]:
113.73231668301005
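
To show where an alignment path and its distance come from, here is a textbook dynamic time warping sketch in pure Python. It uses the absolute difference as local cost and the plain unweighted step pattern, so it is not claimed to reproduce dtw-python's default settings or the number above. The function `dtw` and the toy series are assumptions for illustration.

```python
def dtw(x, y):
    """Return (distance, path) of an unweighted DTW alignment of x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i+1] with y[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = abs(x[i] - y[j]) + best_prev

    # backtrack from the last cell to recover the alignment path
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[-1][-1], path


# y is x with its first value repeated once
x = [0, 1, 2, 1]
y = [0, 0, 1, 2, 1]
distance, path = dtw(x, y)
print(distance)  # 0.0
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)]
```

As in the alignment table above, one index of the first series can be matched to several indices of the second: here index 0 of x is aligned with indices 0 and 1 of y, after which the series match exactly.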

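6.4.2 Alignment-based time series distances

The DistFromAligner wrapper simply computes the alignment distance for each pair of series.

This turns any aligner into a time series distance: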

[41]:
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner

# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())
[42]:
from sktime.datasets import load_osuleaf

# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")

X1 = X[:3]
X2 = X[5:10]
[43]:
dtw_distmat = dtw_dist(X1, X2)
dtw_distmat
[43]:
array([[165.25420136, 148.53521913, 159.93034065, 158.50379563,
        155.98824527],
       [153.5587322 , 151.52004769, 125.14570395, 183.97186106,
         93.55389512],
       [170.41354799, 154.24275848, 212.54601605,  66.59572457,
        295.32544676]])
[44]:
dtw_distmat.shape
[44]:
(3, 5)
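
The wrapper pattern can be sketched in a few lines: given any function that aligns two series and returns a cost, loop over all pairs to fill a distance matrix. The block below is a minimal illustration with a toy unweighted DTW cost, not the DistFromAligner implementation. The names `dtw_cost` and `distance_from_aligner` and the toy panels are hypothetical.

```python
def dtw_cost(x, y):
    """Accumulated cost of an unweighted DTW alignment (toy aligner)."""
    inf = float("inf")
    prev = [inf] * (len(y) + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for xi in x:
        curr = [inf] * (len(y) + 1)
        for j, yj in enumerate(y, start=1):
            curr[j] = abs(xi - yj) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def distance_from_aligner(align_cost):
    """Turn a two-series alignment cost into a pairwise panel distance."""

    def pairwise(panel_a, panel_b=None):
        if panel_b is None:  # one argument: compare the panel with itself
            panel_b = panel_a
        return [[align_cost(a, b) for b in panel_b] for a in panel_a]

    return pairwise


toy_dtw_dist = distance_from_aligner(dtw_cost)

# toy panels of univariate series with unequal lengths
P1 = [[0, 1, 2], [2, 2]]
P2 = [[0, 0, 1, 2], [5], [2, 2, 2]]

toy_distmat = toy_dtw_dist(P1, P2)
print(toy_distmat)  # [[0.0, 12.0, 3.0], [5.0, 6.0, 0.0]]
```

Because the alignment handles the stretching, the series in this toy example do not need to have equal length.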

6.5 Revisiting the initial example

[45]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn
[46]:
clf
[46]:
KNeighborsTimeSeriesClassifier(distance=DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))))
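  • We build a sequence alignment algorithm (dtw-python) using the scipy Mahalanobis distance.

  • We obtain a distance matrix computation from the alignment algorithm.

  • We use that distance matrix in sklearn knn.

  • This is a time series classifier!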

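6.6 Searching for distances, kernels, transformers

As with all sktime objects, we can use the registry.all_estimators utility to display all transformers in sktime.

The relevant scitypes are:

  • "transformer-pairwise" for all pairwise transformers on tabular data

  • "transformer-pairwise-panel" for all pairwise transformers on panel data

  • "aligner" for all time series aligners

  • "transformer" for all transformers; these can be composed with all of the above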

[47]:
from sktime.registry import all_estimators
[48]:
# listing all pairwise panel transformers - distances, kernels on time series
all_estimators("transformer-pairwise-panel", as_dataframe=True)
[48]:
name object
0 AggrDist <class 'sktime.dists_kernels.compose_tab_to_pa...
1 CombinedDistance <class 'sktime.dists_kernels.algebra.CombinedD...
2 ConstantPwTrafoPanel <class 'sktime.dists_kernels.dummy.ConstantPwT...
3 DistFromAligner <class 'sktime.dists_kernels.compose_from_alig...
4 DistFromKernel <class 'sktime.dists_kernels.dist_to_kern.Dist...
5 DtwDist <class 'sktime.dists_kernels.dtw.DtwDist'>
6 EditDist <class 'sktime.dists_kernels.edit_dist.EditDist'>
7 FlatDist <class 'sktime.dists_kernels.compose_tab_to_pa...
8 IndepDist <class 'sktime.dists_kernels.indep.IndepDist'>
9 KernelFromDist <class 'sktime.dists_kernels.dist_to_kern.Kern...
10 PwTrafoPanelPipeline <class 'sktime.dists_kernels.compose.PwTrafoPa...
11 SignatureKernel <class 'sktime.dists_kernels.signature_kernel....
[49]:
# listing all pairwise (tabular) transformers - distances, kernels on vectors/df-rows
all_estimators("transformer-pairwise", as_dataframe=True)
[49]:
name object
0 ScipyDist <class 'sktime.dists_kernels.scipy_dist.ScipyD...
[50]:
# listing all alignment algorithms that can produce distances
all_estimators("aligner", as_dataframe=True, filter_tags={"capability:distance": True})
[50]:
name object
0 AlignerDTW <class 'sktime.alignment.dtw_python.AlignerDTW'>
1 AlignerDTWfromDist <class 'sktime.alignment.dtw_python.AlignerDTW...
2 AlignerDtwNumba <class 'sktime.alignment.dtw_numba.AlignerDtwN...

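6.7 Outlook, roadmap - panel tasks

  • implementing estimators - distances, classifiers, etc.

  • backend optimizations - numba, distributed/parallel

  • sequence-to-sequence regression, classification

  • further maturing the time series alignment module

Join and contribute!

6.8 Summary

  • sktime - modular framework for learning with time series

  • panel data = collection of time series - tasks: classification, regression, clustering

  • use transformers to build flexible pipelines, tune them via grid search etc.

  • panel estimators typically rely on time series distances, kernels, aligners

  • TS distances, kernels, aligners can also be constructed in a modular, flexible way

  • all objects above are first-class citizens with an sklearn-like interface!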


Credits: notebook 6 - time series distances, kernels, alignment

Notebook creation: fkiraly

