Learning models of physical systems can be tricky, but exploiting inductive biases about the nature of the system can speed up learning significantly. In the following, we give a brief overview of variational integrator networks and the key insights behind them.
When learning models of physical systems, we are often dealing with nonlinear dynamics and learning from a limited number of noisy or high-dimensional samples. This is particularly relevant in robotics, where collecting more data is expensive. Expressive models like neural networks are great at handling high-dimensional data and learning complex functions. Using standard feed-forward or recurrent neural network architectures, we can learn to approximate physical systems if given enough data. However, two potential issues can make neural networks difficult to use in practice.
- Because they learn approximate physics, predictions can behave erratically. This is particularly the case when predicting iteratively to forecast the evolution of the system.
- Having to learn the physics requires more data, and data-efficiency can be crucial.
Error can accumulate over time, causing even an accurate short-term model, such as the recurrent residual network shown below, to do worse over the long term.
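As a toy illustration of this compounding effect, the sketch below rolls out a one-step model by feeding each prediction back in as the next input. It is not taken from the experiments: the "learned" model is simply assumed to be the exact one-step map of a unit harmonic oscillator with a hypothetical 0.1% scaling error, and all names are chosen for illustration.

```python
import numpy as np

def rollout(step_matrix, x0, n_steps):
    """Apply a linear one-step model repeatedly, feeding each prediction back in."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step_matrix @ xs[-1])
    return np.array(xs)

h = 0.1  # time step (assumed)

# Exact one-step map of a unit harmonic oscillator: a rotation in phase space.
true_step = np.array([[np.cos(h), np.sin(h)],
                      [-np.sin(h), np.cos(h)]])

# Hypothetical "learned" model: the true map with a 0.1% scaling error.
learned_step = 1.001 * true_step

x0 = np.array([1.0, 0.0])  # initial (position, velocity)
true_traj = rollout(true_step, x0, 2000)
pred_traj = rollout(learned_step, x0, 2000)

# Distance between predicted and true state at every step of the rollout.
errors = np.linalg.norm(pred_traj - true_traj, axis=1)
print("error after 1 step:    ", errors[1])
print("error after 2000 steps:", errors[-1])
```

Under these assumptions the one-step error is tiny, yet after a long rollout the prediction error is several times larger than the state itself, because the small per-step error is multiplied at every iteration.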
To address these issues, we proposed variational integrator networks (VINs) [1]. VINs are expressive neural network architectures with built-in physics. Using VINs allows us to easily learn models with physical forecasting behaviour from noisy or even pixel data in a data-efficient way.
From Residual Networks to Variational Integrator Networks
The idea is simple: if we view neural networks as dynamical systems [2, 3, 4] and discretize them in a manner that preserves qualitative physical properties [5], we can define network architectures that obey the laws of physics. A particularly salient example of the kind of inductive bias we are interested in is the presence of conservation laws, for instance conservation of energy or conservation of momentum.
A canonical description of classical physical dynamical systems is Lagrangian mechanics, where a system is completely characterized by its Lagrangian $L(q, \dot{q}, t)$, a scalar function that encodes the underlying physical properties. The equations of motion for such a system are a set of second-order ODEs called the Euler-Lagrange equations.
At the same time, a deep residual network can be viewed as a system of ODEs $$ \frac{\operatorname{d} x}{\operatorname{d} t} = f_{\theta}(x, t) $$ discretized using an Euler scheme with step size $h$ [2, 3, 4], giving $$ x_{t+1} = x_t + hf_{\theta}(x_t) . $$ Inspired by this perspective, one can instead apply an Euler discretization to the equations of motion $$ \frac{\operatorname{d}}{\operatorname{d} t}\frac{\partial L_{\theta}}{\partial \dot{q}} - \frac{\partial L_{\theta}}{\partial q} = 0 $$ arising from Lagrangian mechanics, which yields a corresponding residual network.
A problem with this approach is that the Euler scheme ignores the underlying geometry and qualitative properties of the equations of motion, and hence the physics. This is why the dynamics spiral out of control in the video shown previously. To address this, we propose to use variational integrators [5], a class of structure-preserving integrators. The result is Variational Integrator Networks (VINs). VINs facilitate accurate long-term predictions and data-efficient learning while remaining flexible enough to model complex behavior. An illustration of the architecture and example comparisons are given below.
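To make the contrast between an Euler scheme and a variational integrator concrete, here is a minimal sketch that integrates a pendulum with Lagrangian $L(q, \dot{q}) = \frac{1}{2}\dot{q}^2 - V(q)$ using both an explicit Euler step and a velocity Verlet (Störmer-Verlet) step, one well-known variational integrator. This is an illustration under stated assumptions rather than the VIN architecture itself: the potential $V(q) = 1 - \cos q$ is a fixed analytic function standing in for what would be a learned network, and the function names, step size, and initial condition are all hypothetical.

```python
import numpy as np

def grad_potential(q):
    """Gradient of the assumed pendulum potential V(q) = 1 - cos(q)."""
    return np.sin(q)

def energy(q, v):
    """Total energy: kinetic plus potential (unit mass, length and gravity)."""
    return 0.5 * v**2 + (1.0 - np.cos(q))

def euler_step(q, v, h):
    """Explicit Euler update, analogous to a residual-network layer."""
    return q + h * v, v - h * grad_potential(q)

def verlet_step(q, v, h):
    """Velocity Verlet update, a variational (structure-preserving) integrator."""
    v_half = v - 0.5 * h * grad_potential(q)
    q_next = q + h * v_half
    v_next = v_half - 0.5 * h * grad_potential(q_next)
    return q_next, v_next

def max_energy_drift(step, q0, v0, h, n_steps):
    """Largest absolute deviation from the initial energy along a rollout."""
    q, v = q0, v0
    e0 = energy(q, v)
    worst = 0.0
    for _ in range(n_steps):
        q, v = step(q, v, h)
        worst = max(worst, abs(energy(q, v) - e0))
    return worst

euler_drift = max_energy_drift(euler_step, 1.0, 0.0, 0.1, 1000)
verlet_drift = max_energy_drift(verlet_step, 1.0, 0.0, 0.1, 1000)
print("max energy drift, explicit Euler: ", euler_drift)
print("max energy drift, velocity Verlet:", verlet_drift)
```

In this toy setting the Euler rollout steadily gains energy, while the Verlet rollout keeps the energy error small and bounded, even though both updates use the same force and the same step size. Replacing `grad_potential` with the gradient of a learned potential would be the natural next step toward a network of this kind, though the details here are assumptions and not the authors' implementation.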