Variational Integrator Networks

Learning models of physical systems can be tricky, but exploiting inductive biases about the nature of the system can speed up learning significantly. In the following, we give a brief overview of variational integrator networks and the key insights behind them.

When learning models of physical systems, we often deal with nonlinear dynamics and must learn from noisy or high-dimensional data with a limited number of samples. This is particularly relevant in robotics, where collecting more data is expensive. Expressive models like neural networks are great at handling high-dimensional data and learning complex functions. Using standard feed-forward or recurrent neural network architectures, we can learn to approximate physical systems if given enough data. However, two potential issues can make neural networks difficult to use in practice.

  1. Because they learn approximate physics, predictions can behave erratically. This is particularly the case when predicting iteratively to forecast the evolution of the system.
  2. Having to learn the physics requires more data, and data-efficiency can be crucial.

Error can accumulate over time, causing even an accurate short-term model, such as the recurrent residual network shown below, to do worse over the long term.
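
A minimal sketch of this compounding effect, under assumptions that are not part of the original experiments: the true system is a rotation in a 2D phase space, and `learned_step` is a hypothetical learned model that reproduces it up to a 1% amplitude error per step. The one-step error is small, but feeding predictions back into the model lets it grow.

```python
import math

def true_step(state, angle=0.1):
    """Ground truth: rotate the 2D state by a fixed angle (an ideal oscillator in phase space)."""
    x, y = state
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def learned_step(state, angle=0.1, gain=1.01):
    """Hypothetical learned model: the same rotation with a 1% amplitude error per step."""
    x, y = true_step(state, angle)
    return (gain * x, gain * y)

def rollout(step, state, n_steps):
    """Forecast iteratively by feeding each prediction back in as the next input."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states

def distance(a, b):
    """Euclidean distance between two 2D states."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

start = (1.0, 0.0)
truth = rollout(true_step, start, 200)
pred = rollout(learned_step, start, 200)

one_step_error = distance(truth[1], pred[1])
long_term_error = distance(truth[200], pred[200])
print(f"error after 1 step:    {one_step_error:.4f}")
print(f"error after 200 steps: {long_term_error:.4f}")
```

The per-step error here is multiplicative, so it compounds geometrically. A real learned model will have a less regular error, but the same feedback mechanism applies.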

Simple illustration: Erratic forecasting when the model (a residual network) lacks physical inductive biases. The ResNet is used in a VAE framework and learns the dynamics in a 2D latent space from pixel observations. Black pixels denote the ground truth and blue are generated by the model.

To address these issues, we proposed variational integrator networks (VINs) [1]. VINs are expressive neural network architectures with built-in physics. Using VINs allows us to easily learn models with physical forecasting behaviour from noisy or even pixel data in a data-efficient way.

From Residual Networks to Variational Integrator Networks

The idea is simple: if we view neural networks as dynamical systems [2, 3, 4], and discretize them in a manner that preserves qualitative physical properties [5], we can define network architectures that obey the laws of physics. A particularly salient example of the kind of inductive bias we are interested in is the presence of conservation laws, for instance conservation of energy or conservation of momentum.

A canonical description of classical physical dynamical systems is Lagrangian mechanics, where a system is completely characterized by its Lagrangian $L(q, \dot{q}, t)$, a scalar function that encodes the underlying physical properties. The equations of motion for such a system are a set of second-order ODEs called the Euler-Lagrange equations. At the same time, a deep residual network can be viewed as a system of ODEs $$ \frac{\operatorname{d} x}{\operatorname{d} t} = f_{\theta}(x, t) $$ discretized using an Euler scheme with step size $h$ [2, 3, 4], giving $$ x_{t+1} = x_t + hf_{\theta}(x_t, t) . $$

Inspired by this perspective, one could instead apply an Euler discretization to the equations of motion $$ \frac{\operatorname{d}}{\operatorname{d} t}\frac{\partial L_{\theta}}{\partial \dot{q}} - \frac{\partial L_{\theta}}{\partial q} = 0 $$ arising from Lagrangian mechanics with a learned Lagrangian $L_{\theta}$, and use the result as the residual network. A problem with this approach is that the Euler scheme ignores the underlying geometry and qualitative properties of the equations of motion, and hence the physics. This is the reason the dynamics spiral out of control in the video shown previously.

To avoid this, we propose to use variational integrators [5], a class of structure-preserving integrators. The result is __Variational Integrator Networks (VINs)__. VINs facilitate accurate long-term predictions and data-efficient learning while remaining flexible enough to model complex behavior. An illustration of the architecture and example comparisons are given below.
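
The difference between the two kinds of discretization can be seen without any learning. The sketch below is an illustration under stated assumptions, not the VIN implementation: the system is an ideal pendulum with unit mass, length and gravity, the known force `-sin(q)` stands in for a learned model, and velocity Verlet is used as one standard variational integrator (the function names are illustrative).

```python
import math

def force(q):
    """Force -dU/dq for an ideal pendulum with potential U(q) = -cos(q).
    In a learned model this would come from a network; here it is fixed."""
    return -math.sin(q)

def energy(q, p):
    """Total energy: kinetic plus potential, with unit mass."""
    return 0.5 * p * p - math.cos(q)

def euler_step(q, p, h):
    """Explicit Euler, the scheme behind a plain residual update x + h f(x)."""
    return q + h * p, p + h * force(q)

def verlet_step(q, p, h):
    """Velocity Verlet, a variational (and hence symplectic) integrator."""
    p_half = p + 0.5 * h * force(q)
    q_next = q + h * p_half
    p_next = p_half + 0.5 * h * force(q_next)
    return q_next, p_next

def energy_drift(step, q, p, h, n_steps):
    """Largest absolute deviation from the initial energy along a rollout."""
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst

euler_drift = energy_drift(euler_step, 1.0, 0.0, 0.1, 1000)
verlet_drift = energy_drift(verlet_step, 1.0, 0.0, 0.1, 1000)
print(f"max energy error, Euler:           {euler_drift:.4f}")
print(f"max energy error, velocity Verlet: {verlet_drift:.6f}")
```

With the same step size and the same force, the Euler rollout steadily gains energy, while the velocity Verlet energy error stays small and bounded. This is the qualitative behaviour a structure-preserving architecture is meant to inherit.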

Example VIN: $(q,p)$ are hidden states and $f_{\theta}$ is a residual block obtained from a variational integrator.
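
To make "a residual block obtained from a variational integrator" concrete, here is a sketch of one standard construction. It rests on assumptions that are not stated above: a single degree of freedom, a scalar mass $m$, and a Lagrangian $L_{\theta}(q, \dot{q}) = \tfrac{1}{2} m \dot{q}^2 - U_{\theta}(q)$ with a learned potential $U_{\theta}$. The Lagrangians and discretizations used in actual VINs may differ. Approximating the action over one step of size $h$ with a trapezoidal rule gives the discrete Lagrangian

$$ L_d(q_k, q_{k+1}) = h\left[\frac{m}{2}\left(\frac{q_{k+1} - q_k}{h}\right)^2 - \frac{U_{\theta}(q_k) + U_{\theta}(q_{k+1})}{2}\right] . $$

Requiring the discrete action $\sum_k L_d(q_k, q_{k+1})$ to be stationary yields the discrete Euler-Lagrange equations

$$ D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0 , $$

where $D_i$ denotes the derivative with respect to the $i$-th argument. Defining momenta by $p_k = -D_1 L_d(q_k, q_{k+1})$ and $p_{k+1} = D_2 L_d(q_k, q_{k+1})$ and solving for the next state gives

$$ \begin{aligned} q_{k+1} &= q_k + \frac{h}{m}\, p_k - \frac{h^2}{2m}\, U_{\theta}'(q_k) , \\ p_{k+1} &= p_k - \frac{h}{2}\left( U_{\theta}'(q_k) + U_{\theta}'(q_{k+1}) \right) . \end{aligned} $$

Both lines have the residual form "current state plus update", but the update is tied to the derivative of a single scalar function rather than being an unconstrained network. This particular update is the velocity Verlet scheme.
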
Simple illustration: well-behaved forecasting on the same setup as before using a VIN. The VIN is used in a VAE framework and learns the dynamics in a 2D latent space from pixel observations. Black pixels denote the ground truth and red are generated by the model.
Panels: (a) ResNet, (b) VIN-VV, (c) VIN-$SO(2)$.
Forecasting in pixel-space, side-by-side comparison: black pixels are ground truth, pixels with colour are generated by the model. (a) The ResNet produces unphysical forecasts. (b) A VIN with no manifold restrictions. (c) A VIN restricted to $SO(2)$.
Panels: (a) VAE, (b) DVAE, (c) LG-VAE, (d) VIN-$SO(2)$, (e) VIN-$SO(2)$ (true $\mathbf{M}$), (f) Ground Truth.
Example embedded representations of an ideal pendulum system: black/colored dots represent embedded train/test images, gray lines connect points sequentially in time. The embeddings learned by the baseline models fail to capture the global structure (a)-(b) and/or are discontinuous with respect to the time dimension (c). The VIN-$SO(2)$ (d) learns an embedding that is consistent with the ground truth (f), particularly in (e), where the non-identifiable latent mass matrix $\mathbf{M}$ is set to its true value.

Concluding remarks

To summarize, learning approximate physics implicitly can lead to incorrect qualitative behavior and a decrease in accuracy. Variational integrator networks are a class of network architectures that encode physical laws explicitly, which improves data-efficiency and produces well-behaved forecasts, particularly over longer trajectories. Variational integrator networks can be used to learn from noisy observations of a physical system, or as an architecture for variational autoencoders, enabling them to learn from pixel observations.


  1. S. Saemundsson, A. Terenin, K. Hofmann, and M. P. Deisenroth. Variational Integrator Networks for Physically Structured Embeddings. AISTATS, 2020.

  2. E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34(1):014004, 2017.

  3. W. E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 5(1):1–11, 2017.

  4. R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. NeurIPS, 2018.

  5. J. E. Marsden, S. Pekarsky, S. Shkoller, and M. West. Variational methods, multisymplectic geometry and continuum mechanics. Journal of Geometry and Physics, 38(3–4):253–284, 2001.