We are a research group at UCL’s Centre for Artificial Intelligence. Our research expertise is in data-efficient machine learning, probabilistic modeling, and autonomous decision making. Applications focus on robotics, climate science, and sustainable development.

If you are interested in joining the team, please check out our openings.

Gaussian processes are a model class for learning unknown functions from data. They are particularly of interest in statistical …

Meta-learning can make machine learning algorithms more data-efficient, using experience from prior tasks to learn related tasks …

Learning models of physical systems can sometimes be difficult. Vanilla neural networks—like residual networks—particularly …

Learning models of physical systems can be tricky, but exploiting inductive biases about the nature of the system can speed up learning …

Bayesian optimization is a powerful technique for the optimization of expensive black-box functions, but typically limited to …

Data is often gathered sequentially in the form of a time series, which consists of sequences of data points observed at successive …

Barycenters summarize populations of measures, but computing them does not scale to high dimensions with existing methods. We propose a …

Efficient sampling from Gaussian process posteriors is relevant in practical applications. With Matheron’s rule we decouple the …

Products of Gaussian process experts commonly suffer from poor performance when experts are weak. We propose aggregations and weighting …

Gaussian processes are a useful technique for modeling unknown functions. They are used in many application areas, particularly in …

Our paper on Matérn Gaussian processes on graphs has been awarded the Best Student Paper Award at AISTATS

Dr. Sanket Kamthe successfully passed his PhD viva

Dr. Riccardo Moriconi successfully passed his PhD viva

Workshop on Bridging the Gap between Data-driven and Analytical Physics-based Grasping and Manipulation accepted at ICRA 2021

Four papers from our group have been accepted at AISTATS and ICLR

Advances in algorithmic fairness have largely omitted sexual orientation and gender identity. We explore queer concerns in privacy, …

Recurrent neural networks are usually trained with backpropagation through time, which requires storing a complete history of network …

Dynamic time warping (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in …

Learning physically structured representations of dynamical systems that include contact between different objects is an important …

In this work, we tackle the problem of learning symbolic representations of low-level and continuous environments. We present a …

Aggregated data is commonplace in areas such as epidemiology and demography. For example, census data for a population is usually given …

From everyday apps to complex algorithms, technology has the potential to hide, speed, and deepen discrimination, while appearing …

Dynamic time warping (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces. In this work, we consider a setting in which time series live on different spaces without a sensible ground metric, causing DTW to become ill-defined. To alleviate this, we propose Gromov dynamic time warping (GDTW), a distance between time series on potentially incomparable spaces that avoids the comparability requirement by instead considering intra-relational geometry. We demonstrate its effectiveness at aligning, combining and comparing time series living on incomparable spaces. We further propose a smoothed version of GDTW as a differentiable loss and assess its properties in a variety of settings, including barycentric averaging, generative modeling and imitation learning.
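To make the comparability requirement concrete, here is a minimal sketch of classical DTW (not GDTW) as a dynamic program. The function name `dtw_cost` and the choice of a Euclidean ground cost are assumptions made for illustration. The ground cost is only defined when both series have the same feature dimension, which is the limitation the abstract describes.

```python
import numpy as np


def dtw_cost(x, y):
    """Classical DTW cost between two series of shape (T, d) or (T,).

    Hypothetical helper for illustration; uses a Euclidean ground cost.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    if x.shape[1] != y.shape[1]:
        # Without a shared space there is no ground metric to compare points.
        raise ValueError("series must live in the same space for classical DTW")

    # Pairwise ground cost between every point of x and every point of y.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible predecessor alignments.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m]
```

Calling `dtw_cost` on a one-dimensional series and a two-dimensional series raises the `ValueError`, which is the ill-defined case that motivates a distance built from within-series geometry instead.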

Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for deep learning based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but typically require impractical quantities of data, and often suffer from pathological behaviour when forecasting for longer time horizons. In this work, we use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations in settings which are traditionally difficult for black-box approaches and recent physics inspired neural networks. Our results indicate that an idealised form of touch feedback—which is heavily relied upon by biological systems—is a key component of making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our techniques enable accurate learning of contact dynamics from physical data.

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes—a widely-used model class in the Euclidean setting—to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs, and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch, online, and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.
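As a rough illustration, the sketch below builds a Matérn-type kernel matrix on a small undirected graph from the eigendecomposition of its Laplacian. It assumes the kernel takes the spectral form (2ν/κ² + Δ)^(-ν), where Δ is the graph Laplacian. The parameter names `nu`, `kappa`, `sigma2` and the variance normalisation are assumptions of this sketch, and the dense eigendecomposition is only practical for small graphs.

```python
import numpy as np


def graph_matern_kernel(adjacency, nu=1.5, kappa=1.0, sigma2=1.0):
    """Matérn-type kernel matrix over the nodes of an undirected graph.

    Hypothetical helper: assumes the spectral form (2*nu/kappa**2 + L)**(-nu),
    with L the combinatorial graph Laplacian.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

    # The Laplacian is symmetric, so eigh returns real eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off

    # Apply the Matérn spectral function to each Laplacian eigenvalue.
    spectrum = (2.0 * nu / kappa**2 + eigvals) ** (-nu)
    kernel = (eigvecs * spectrum) @ eigvecs.T

    # Assumed normalisation: rescale so the average node variance is sigma2.
    return sigma2 * kernel / np.mean(np.diag(kernel))


# Example: a path graph with four nodes, 0 - 1 - 2 - 3.
path_adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
)
path_kernel = graph_matern_kernel(path_adjacency, nu=1.0, kappa=1.0)
```

The resulting matrix can be used as the prior covariance over node values, in the same way a Euclidean kernel matrix would be.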

As Gaussian processes are integrated into increasingly complex problem settings, analytic solutions to quantities of interest become scarcer and scarcer. Monte Carlo methods act as a convenient bridge for connecting intractable mathematical expressions with actionable estimates via sampling. Conventional approaches for simulating Gaussian process posteriors view samples as vectors drawn from marginal distributions over process values at a finite number of input locations. This distribution-based characterization leads to generative strategies that scale cubically in the size of the desired random vector. These methods are, therefore, prohibitively expensive in cases where high-dimensional vectors, let alone continuous functions, are required. In this work, we investigate a different line of reasoning. Rather than focusing on distributions, we articulate Gaussian conditionals at the level of random variables. We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to fast sampling from Gaussian process posteriors. We analyze these methods, along with the approximation errors they introduce, from first principles. We then complement this theory by exploring the practical ramifications of pathwise conditioning in various applied settings.
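For reference, conditioning at the level of random variables is commonly written using Matheron's rule. The notation below is assumed here for illustration and does not come from the abstract: X denotes the training inputs, y the noisy observations, σ² the noise variance, and k the prior covariance function.

```latex
\[
(f \mid \mathbf{y})(\cdot) \overset{d}{=}
f(\cdot) + k(\cdot, X)\bigl(k(X, X) + \sigma^{2} I\bigr)^{-1}
\bigl(\mathbf{y} - f(X) - \boldsymbol{\varepsilon}\bigr),
\qquad
f \sim \mathcal{GP}(0, k),
\quad
\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^{2} I).
\]
```

Read this way, a posterior sample is a prior sample plus a data-dependent correction. Once the linear solve against the training data is done, evaluating the correction costs time linear in the number of test locations.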

Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
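The following is a minimal sketch of the idea of separating the prior from the data when sampling. It is an illustration under stated assumptions, not the implementation from the paper. It assumes one-dimensional inputs, a squared-exponential kernel, and a prior approximated with random Fourier features. All function names and the `num_features` parameter are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    diff = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_prior_function(rng, lengthscale, num_features=500):
    """Approximate zero-mean prior draw as a weighted sum of random cosine features.

    Assumption of this sketch: the prior is approximated with random Fourier features.
    """
    freqs = rng.standard_normal(num_features) / lengthscale
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    weights = rng.standard_normal(num_features)

    def prior_fn(x):
        x = np.asarray(x, dtype=float)
        features = np.sqrt(2.0 / num_features) * np.cos(np.outer(x, freqs) + phases)
        return features @ weights

    return prior_fn


def sample_posterior_function(x_train, y_train, noise_var, lengthscale, rng,
                              num_features=500):
    """Return one approximate posterior sample as a function callable at any inputs."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n = x_train.shape[0]

    prior_fn = sample_prior_function(rng, lengthscale, num_features)
    noise = np.sqrt(noise_var) * rng.standard_normal(n)

    # Data-dependent update: one linear solve against the training set only.
    gram = rbf_kernel(x_train, x_train, lengthscale) + noise_var * np.eye(n)
    coeffs = np.linalg.solve(gram, y_train - prior_fn(x_train) - noise)

    def posterior_fn(x):
        # Prior draw plus a correction expressed in kernel basis functions.
        return prior_fn(x) + rbf_kernel(x, x_train, lengthscale) @ coeffs

    return posterior_fn


# Example usage with synthetic data.
rng = np.random.default_rng(0)
x_train = np.array([-1.5, -0.5, 0.5, 1.5])
y_train = np.sin(x_train)
draw = sample_posterior_function(x_train, y_train, noise_var=1e-6,
                                 lengthscale=0.5, rng=rng)
values = draw(np.linspace(-2.0, 2.0, 200))
```

Because the sample is returned as a function, it can be evaluated at new test locations without redrawing it, and each additional location costs time linear in the number of features and training points.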