Aligning Time Series on Incomparable Space

Data is often gathered sequentially in the form of a time series, which consists of sequences of data points observed at successive time points. Dynamic time warping (DTW) was proposed as a mean to define a meaningful similarity measure between time series. However, it is not defined in settings in which time series live on incomparable spaces. We address this issue by defining a distance between time series on incomparable spaces.

Specifically, we propose Gromov-DTW1, a distance between time series on completely incomparable spaces, and apply it to various settings, including barycentric averaging, generative modeling, and imitation learning.

Time Series Alignment

Sakoe and Chiba2 consider the problem of aligning two time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \boldsymbol{X}^{T_y}$, where potentially $T_x \neq T_y$. This is formalized as $$ DTW(\boldsymbol{x},\boldsymbol{y}) = \min_{\boldsymbol{A} \in \mathcal{A}(T_x,T_y)} \langle\boldsymbol{D}, \boldsymbol{A}\rangle_{F}, $$ where $D_{ij} = d_\mathcal{X}(x_i,y_j)$ is the pairwise distance matrix and $A_{ij}$ is $1$ if $x_i$ and $y_j$ are aligned, and $0$ otherwise. We illustrate a DTW alignment of two time series in Figure 1.

Figure 1: Alignment of two time series via the Dynamic Time Warping approach

A practical drawback of DTW is the need to define a meaningful metric between samples from $\boldsymbol{x}$ and $\boldsymbol{y}$, which requires them to live on registered spaces with a metric $d_{\mathcal{X}}$. This is hard in the following settings:

  • Time series are close up to isometries (up to rotations/translations)
  • Spaces are completely unregistered (pixel images/2d samples)

In such cases, defining a meaningful distance between samples from the two sequences is impractical as it would require detailed understanding of the objects we wish to study.

How to deal with incomparable spaces?

Motivated by connections between DTW and optimal transport3, we introduce a distance between time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \mathcal{Y}^{T_y}$ defined on potentially incomparable metric spaces. The key idea is to define a loss function that encodes that pairwise distances in $\mathcal{X}^{T_x}$ are preserved in $\mathcal{Y}^{T_y}$. For this, we define the Gromov dynamic time warping distance as $$ GDTW(\boldsymbol{x},\boldsymbol{y})=\min_{\boldsymbol{A} \in \mathcal{A}(T_x,T_y)} \sum_{ijkl} \mathcal{L} [{d_{\mathcal{X}}(x_i,x_k),d_{\mathcal{Y}}(y_j,y_l)}] A_{ij}A_{kl}, $$ where $d_{\mathcal X}$ is a distance defined on $\mathcal X$, and $d_{\mathcal Y}$ a distance defined on $\mathcal Y$.

We solve the optimization problem over the non-convex set of alignment matrices via a Franke-Wolfe algorithm, and provide local convergence guarantees.

Similarly to DTW, GDTW suffers from unpredictability when the time series is close to a change point of the optimal alignment matrix because of the discontinuity of derivatives. To remediate this, we describe in the paper1 how GDTW can be softened analogously to soft-DTW4.

This allows smoother derivatives when applying it to for instance generative modeling of time series and imitation learning.

A large range of applications

In the paper, we aimed to demonstrate the large range of applications of Gromov-DTW. We considered three settings: barycentric averaging, generative modeling and imitation learning.

  • Barycentric averaging: we extend the algorithm of Peyré and al3 to the sequential setting via coordinate descent on the GDTW objective. We plot barycenters under several warping approaches in Figure 2.

    Figure 2: Barycenters of the Quickdraw dataset’s fishes via various time warping approaches.

  • Generative modeling: we extend the algorithm of Genevay et al.5, Bunne et al.6 to the sequential setting by leveraging GDTW as ground metric in an entropic-Wasserstein objective. Samples can be observed in Figure 3.

    Figure 3: Samples generated by the time series GAN trained on Sequential MNIST

  • Imitation learning: We propose an approach to performing imitation learning when the agent and expert do not live on comparable spaces, which consists in minimizing the Gromov-DTW between expert demonstrations and agent rollouts. This is illustrated in Figure 4.

Figure 4: Left: Expert trajectory (sequence of pixel images). Right: policy of an agent (sequence of points in $\mathbb{R}^2$) learned by imitation learning. Agent and expert thus live on unregistered spaces.


We introduced a distance between time series living on potentially incomparable spaces, Gromov-DTW, which significantly broadens the range of applications of previous time-series metrics like DTW. We also proposed a smoothed version of Gromov-DTW that alleviates the discontinuity of GDTW’s gradients. Gromov-DTW is a general concept for comparing two time series and can be applied to a wide range of applications, including barycentric averaging, generative modeling and imitation learning.

  1. S. Cohen, G. Luise, A. Terenin, B. Amos, M. P. Deisenroth. Aligning Time Series on Incomparable Spaces. arXiv:2006.12648 ↩︎

  2. H. Sakoe, S. Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. ICASSP 2018 ↩︎

  3. G. Peyré, M. Cuturi, J. Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. ICML 2016 ↩︎

  4. M. Cuturi, M. Blondel. Soft-DTW: A Differentiable Loss Function for Time-Series. ICML 2017 ↩︎

  5. A. Genevay, G. Peyré, M. Cuturi. Learning Generative Models with Sinkhorn Divergences. AISTATS 2018 ↩︎

  6. C. Bunne, D. Alvarez-Melis, A. Krause, S. Jegelka. Learning Generative Models Across Incomparable Spaces. ICML 2019 ↩︎