Data is often gathered sequentially in the form of time series: sequences of data points observed at successive time points. Dynamic time warping (DTW) was proposed as a means of defining a meaningful similarity measure between time series. However, it is not defined when the time series live on incomparable spaces. We address this issue by defining a distance between time series on incomparable spaces.

Specifically, we propose Gromov-DTW^{1}, a distance between time series on completely incomparable spaces, and apply it to various settings, including barycentric averaging, generative modeling, and imitation learning.

# Time Series Alignment

Sakoe and Chiba^{2} consider the problem of aligning two time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \mathcal{X}^{T_y}$, where potentially $T_x \neq T_y$.
This is formalized as
$$
DTW(\boldsymbol{x},\boldsymbol{y}) = \min_{\boldsymbol{A} \in \mathcal{A}(T_x,T_y)} \langle\boldsymbol{D}, \boldsymbol{A}\rangle_{F},
$$
where $D_{ij} = d_\mathcal{X}(x_i, y_j)$ is the pairwise distance matrix, $\mathcal{A}(T_x, T_y)$ is the set of binary alignment matrices (monotone warping paths from $(1,1)$ to $(T_x, T_y)$), and $A_{ij}$ is $1$ if $x_i$ and $y_j$ are aligned, and $0$ otherwise. We illustrate a DTW alignment of two time series in Figure 1.
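For concreteness, the minimization above can be computed with the classic dynamic program. The following is a minimal NumPy sketch, not the paper's implementation; the function name and the default absolute-difference metric are our own choices:

```python
import numpy as np

def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """DTW cost between sequences x and y via dynamic programming."""
    Tx, Ty = len(x), len(y)
    D = np.array([[dist(xi, yj) for yj in y] for xi in x])
    # R[i, j] = cost of the best monotone alignment of x[:i+1] and y[:j+1]
    R = np.full((Tx, Ty), np.inf)
    R[0, 0] = D[0, 0]
    for i in range(Tx):
        for j in range(Ty):
            if i == 0 and j == 0:
                continue
            prev = min(
                R[i - 1, j] if i > 0 else np.inf,                # advance in x only
                R[i, j - 1] if j > 0 else np.inf,                # advance in y only
                R[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # advance in both
            )
            R[i, j] = D[i, j] + prev
    return R[Tx - 1, Ty - 1]
```

Because the alignment may repeat samples, `dtw([1, 2, 3], [1, 2, 2, 3])` is `0`: the repeated `2` in the second sequence is matched twice at no cost.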

A practical drawback of DTW is the need to define a meaningful metric between samples from $\boldsymbol{x}$ and $\boldsymbol{y}$, which requires them to live on registered spaces with a metric $d_{\mathcal{X}}$. This is hard in the following settings:

- Time series are close only up to isometries (e.g. rotations and translations)
- Spaces are completely unregistered (e.g. pixel images vs. 2-d point samples)

In such cases, defining a meaningful distance between samples from the two sequences is impractical as it would require detailed understanding of the objects we wish to study.

# How to deal with incomparable spaces?

Motivated by connections between DTW and optimal transport^{3}, we introduce a distance between time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \mathcal{Y}^{T_y}$ defined on potentially incomparable metric spaces.
The key idea is to define a loss function that measures how well pairwise distances within $\boldsymbol{x}$ are preserved as pairwise distances within $\boldsymbol{y}$. For this, we define the Gromov dynamic time warping distance as
$$
GDTW(\boldsymbol{x},\boldsymbol{y}) = \min_{\boldsymbol{A} \in \mathcal{A}(T_x,T_y)} \sum_{i,j,k,l} \mathcal{L}\left[d_{\mathcal{X}}(x_i,x_k), d_{\mathcal{Y}}(y_j,y_l)\right] A_{ij} A_{kl},
$$
where $d_{\mathcal X}$ is a distance defined on $\mathcal X$, and $d_{\mathcal Y}$ a distance defined on $\mathcal Y$.

We solve the optimization problem over the non-convex set of alignment matrices with a Frank-Wolfe algorithm, and provide local convergence guarantees.
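Under simplifying assumptions — the squared loss $\mathcal{L}(a, b) = (a - b)^2$ and a plain jump-to-vertex update — the scheme can be sketched as follows; all function names are ours and this is not the paper's implementation. The useful observation is that the linearized subproblem at each Frank-Wolfe step is itself a DTW problem, solvable by dynamic programming with backtracking:

```python
import numpy as np

def dtw_path(C):
    """Alignment matrix minimizing <C, A>_F over monotone warping paths (DP + backtracking)."""
    Tx, Ty = C.shape
    R = np.full((Tx, Ty), np.inf)
    R[0, 0] = C[0, 0]
    for i in range(Tx):
        for j in range(Ty):
            if i == 0 and j == 0:
                continue
            prev = []
            if i > 0 and j > 0:
                prev.append(R[i - 1, j - 1])
            if i > 0:
                prev.append(R[i - 1, j])
            if j > 0:
                prev.append(R[i, j - 1])
            R[i, j] = C[i, j] + min(prev)
    A = np.zeros_like(C)  # backtrack the optimal path from (Tx-1, Ty-1) to (0, 0)
    i, j = Tx - 1, Ty - 1
    A[i, j] = 1.0
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((R[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((R[i - 1, j], i - 1, j))
        if j > 0:
            steps.append((R[i, j - 1], i, j - 1))
        _, i, j = min(steps)
        A[i, j] = 1.0
    return A

def half_grad(Dx, Dy, A):
    """M[i, j] = sum_{k,l} (Dx[i, k] - Dy[j, l])**2 * A[k, l], expanded into matrix products."""
    return (((Dx**2) @ A.sum(1))[:, None]
            + ((Dy**2) @ A.sum(0))[None, :]
            - 2 * Dx @ A @ Dy)

def gdtw(Dx, Dy, n_iter=50):
    """Frank-Wolfe-style minimization of the GDTW objective over alignment matrices."""
    A = dtw_path(np.ones((Dx.shape[0], Dy.shape[0])))  # a feasible starting alignment
    for _ in range(n_iter):
        # The loss tensor is symmetric in (i,j) <-> (k,l), so the gradient is 2 * half_grad.
        A_new = dtw_path(2 * half_grad(Dx, Dy, A))  # linearized subproblem = a DTW problem
        if np.sum(half_grad(Dx, Dy, A_new) * A_new) >= np.sum(half_grad(Dx, Dy, A) * A):
            break  # no strict decrease: stop at the current vertex
        A = A_new
    return float(np.sum(half_grad(Dx, Dy, A) * A)), A
```

A sanity check that matches the motivation above: if one point cloud is a rotated and translated copy of the other, their distance matrices agree, so the GDTW cost is (numerically) zero even though the raw coordinates are far apart.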

Like DTW, GDTW has discontinuous derivatives: whenever a small perturbation of the time series changes the optimal alignment matrix, the gradient jumps. To remedy this, we describe in the paper^{1} how GDTW can be softened analogously to soft-DTW^{4}.

The smoothed objective has well-behaved derivatives, which matters when GDTW is used as a differentiable loss, for instance in generative modeling of time series or imitation learning.
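To illustrate the softening, here is the smoothed minimum and the resulting recursion for plain soft-DTW^{4}; the same min-to-softmin substitution applies inside GDTW's dynamic program. This is a sketch of the standard construction, not the paper's code:

```python
import numpy as np

def softmin(vals, gamma):
    """Smoothed minimum: -gamma * log sum exp(-v / gamma); recovers min as gamma -> 0."""
    vals = np.asarray(vals, dtype=float)
    m = vals.min()  # shift for a numerically stable log-sum-exp
    return m - gamma * np.log(np.sum(np.exp(-(vals - m) / gamma)))

def soft_dtw(D, gamma=0.1):
    """Soft-DTW value for a pairwise cost matrix D: the DTW recursion with min -> softmin."""
    Tx, Ty = D.shape
    R = np.full((Tx + 1, Ty + 1), np.inf)  # one-cell border so boundary cases fall out
    R[0, 0] = 0.0
    for i in range(1, Tx + 1):
        for j in range(1, Ty + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[Tx, Ty]
```

Because the soft minimum averages over all warping paths, soft-DTW is a smooth function of the inputs (and can even be slightly negative, e.g. $-\gamma \log 3$ for a $2 \times 2$ all-zero cost matrix), at the price of no longer being a hard minimum.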

# A large range of applications

In the paper, we demonstrate the broad applicability of Gromov-DTW in three settings: barycentric averaging, generative modeling, and imitation learning.

**Barycentric averaging**: we extend the algorithm of Peyré et al.^{3} to the sequential setting via coordinate descent on the GDTW objective. We plot barycenters under several warping approaches in Figure 2.

**Generative modeling**: we extend the algorithms of Genevay et al.^{5} and Bunne et al.^{6} to the sequential setting by using GDTW as the ground metric in an entropic-Wasserstein objective. Samples can be observed in Figure 3.

**Imitation learning**: we propose an approach to imitation learning when the agent and expert do not live on comparable spaces, which consists in minimizing the Gromov-DTW distance between expert demonstrations and agent rollouts. This is illustrated in Figure 4.

# Summary

We introduced Gromov-DTW, a distance between time series living on potentially incomparable spaces, which significantly broadens the range of settings in which time-series metrics such as DTW can be applied. We also proposed a smoothed version of Gromov-DTW that alleviates the discontinuity of GDTW's gradients. Gromov-DTW is a general tool for comparing time series, with applications including barycentric averaging, generative modeling, and imitation learning.

S. Cohen, G. Luise, A. Terenin, B. Amos, M. P. Deisenroth. Aligning Time Series on Incomparable Spaces. arXiv:2006.12648 ↩︎

H. Sakoe, S. Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978 ↩︎

G. Peyré, M. Cuturi, J. Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. ICML 2016 ↩︎

M. Cuturi, M. Blondel. Soft-DTW: A Differentiable Loss Function for Time-Series. ICML 2017 ↩︎

A. Genevay, G. Peyré, M. Cuturi. Learning Generative Models with Sinkhorn Divergences. AISTATS 2018 ↩︎

C. Bunne, D. Alvarez-Melis, A. Krause, S. Jegelka. Learning Generative Models Across Incomparable Spaces. ICML 2019 ↩︎