We are a research group at UCL’s Centre for Artificial Intelligence. Our research expertise is in data-efficient machine learning, probabilistic modeling, and autonomous decision making. Applications focus on robotics, climate science, and sustainable development.
If you are interested in joining the team, please check out our openings.
Meta-learning can make machine learning algorithms more data-efficient, using experience from prior tasks to learn related tasks …
Learning models of physical systems can be tricky, but exploiting inductive biases about the nature of the system can speed up learning …
Bayesian optimization is a powerful technique for optimizing expensive black-box functions, but is typically limited to …
Data is often gathered sequentially in the form of a time series, which consists of sequences of data points observed at successive …
Barycenters summarize populations of measures, but computing them does not scale to high dimensions with existing methods. We propose a …
Efficient sampling from Gaussian process posteriors is relevant in practical applications. With Matheron’s rule we decouple the …
Products of Gaussian process experts commonly suffer from poor performance when experts are weak. We propose aggregations and weighting …
Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building on this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
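As a minimal illustration of the underlying idea, the sketch below applies Matheron's rule to an exact GP with a squared-exponential kernel: joint prior draws at the training and test locations are corrected by the mismatch between the prior draw and the data. It deliberately omits the sparse and Fourier-feature machinery that makes the approach scalable in the paper; the helper names (`rbf_kernel`, `decoupled_posterior_sample`) and the toy data are purely illustrative.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def decoupled_posterior_sample(X_train, y_train, X_test, noise_var=0.1, n_samples=5):
    """Draw joint prior samples, then correct them with Matheron's rule:
    (f | y)(x*) = f(x*) + k(x*, X) (K + s^2 I)^{-1} (y - f(X) - eps)."""
    n, m = len(X_train), len(X_test)
    X_all = np.vstack([X_train, X_test])
    K_all = rbf_kernel(X_all, X_all) + 1e-8 * np.eye(n + m)

    # Joint prior draws at training and test locations.
    L = np.linalg.cholesky(K_all)
    f_prior = L @ np.random.randn(n + m, n_samples)
    f_train, f_test = f_prior[:n], f_prior[n:]

    # Pathwise update: shift each prior draw by the data mismatch.
    eps = np.sqrt(noise_var) * np.random.randn(n, n_samples)
    K_nn = rbf_kernel(X_train, X_train) + noise_var * np.eye(n)
    K_sn = rbf_kernel(X_test, X_train)
    correction = K_sn @ np.linalg.solve(K_nn, y_train[:, None] - f_train - eps)
    return f_test + correction

# Example: posterior function draws on a 1D toy problem.
X_train = np.linspace(-3, 3, 20)[:, None]
y_train = np.sin(X_train).ravel() + 0.1 * np.random.randn(20)
X_test = np.linspace(-4, 4, 100)[:, None]
samples = decoupled_posterior_sample(X_train, y_train, X_test)  # shape (100, 5)
```

In this exact-GP form the update still requires a joint prior draw over all locations; the paper's contribution is precisely to replace that draw with a cheap approximate prior so that sampling no longer scales cubically at test time.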
Gaussian processes are nonparametric Bayesian models that have been applied to regression and classification problems. One of the approaches to alleviate their cubic training cost is the use of local GP experts trained on subsets of the data. In particular, product-of-expert models combine the predictive distributions of local experts through a tractable product operation. While these expert models allow for massively distributed computation, their predictions can suffer from erratic behaviour of the mean or uncalibrated uncertainty quantification. By calibrating predictions via tempered softmax weighting, we provide a solution to these problems for multiple product-of-expert models, including the generalised product of experts and the robust Bayesian committee machine. Furthermore, we leverage the optimal transport literature and propose a new product-of-expert model that combines predictions of local experts by computing their Wasserstein barycenter, which can be applied to both regression and classification.
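The sketch below shows one way a tempered-softmax-weighted product-of-experts combination can look for Gaussian expert predictions. The per-expert score used here (negative predictive variance) and the function names are assumptions made for illustration, not necessarily the paper's exact weighting scheme.

```python
import numpy as np

def tempered_softmax(scores, temperature=1.0):
    """Softmax over expert scores; the temperature controls how sharp the weights are."""
    z = scores / temperature
    z = z - z.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(z)
    return w / w.sum(axis=0, keepdims=True)

def weighted_poe_combine(means, variances, temperature=1.0):
    """Combine per-expert Gaussian predictions (K experts x N test points)
    with a generalised-PoE rule whose weights come from a tempered softmax.
    Illustrative score: negative predictive variance, so more confident
    experts receive larger weights."""
    beta = tempered_softmax(-variances, temperature)     # (K, N), columns sum to 1
    precision = np.sum(beta / variances, axis=0)         # combined precision
    var_comb = 1.0 / precision
    mean_comb = var_comb * np.sum(beta * means / variances, axis=0)
    return mean_comb, var_comb

# Example: three experts predicting at four test points.
means = np.array([[0.9, 1.1, 0.2, -0.3],
                  [1.0, 1.0, 0.5, -0.1],
                  [0.4, 1.3, 0.3, -0.2]])
variances = np.array([[0.1, 0.2, 1.5, 0.3],
                      [0.2, 0.1, 0.4, 0.2],
                      [1.0, 0.9, 0.3, 0.5]])
mu, var = weighted_poe_combine(means, variances, temperature=0.5)
```

Because the weights at each test point sum to one, the combined predictive variance cannot collapse simply by adding more weak experts, which is the calibration failure the weighting is meant to address.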
We present a Bayesian non-parametric way of inferring stochastic differential equations for both regression tasks and continuous-time dynamical modelling. The work places particular emphasis on the stochastic part of the differential equation, known as the diffusion, which we model with Wishart processes. Further, we present a semi-parametric approach that allows the framework to scale to high dimensions. This naturally leads to modelling both latent and autoregressive temporal systems with conditionally heteroskedastic noise. Experimentally, we verify that modelling the diffusion often improves performance and that this randomness in the differential equation can be essential to avoid overfitting.
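As a rough forward-simulation sketch (not the paper's inference procedure), the code below builds a time-varying diffusion matrix from GP draws via the generalised Wishart process construction Sigma(t) = sum_i L u_i(t) u_i(t)^T L^T and plugs it into an Euler-Maruyama scheme. The drift, kernel, dimensions, and function names are illustrative choices.

```python
import numpy as np

def gp_draws(ts, n_draws, lengthscale=1.0):
    """Independent GP function draws over time points ts (squared-exponential kernel)."""
    sq = (ts[:, None] - ts[None, :])**2
    K = np.exp(-0.5 * sq / lengthscale**2) + 1e-8 * np.eye(len(ts))
    L = np.linalg.cholesky(K)
    return L @ np.random.randn(len(ts), n_draws)  # (T, n_draws)

def wishart_process_diffusion(ts, dim=2, dof=3, scale=None):
    """Time-varying covariance Sigma(t) = sum_i L u_i(t) u_i(t)^T L^T built from
    dof * dim independent GP draws (generalised Wishart process construction)."""
    if scale is None:
        scale = 0.3 * np.eye(dim)
    U = gp_draws(ts, dof * dim).reshape(len(ts), dof, dim)   # u_i(t) vectors
    sigmas = np.einsum('tid,tie->tde', U, U)                 # sum_i u_i u_i^T
    return scale @ sigmas @ scale.T                          # (T, dim, dim)

def euler_maruyama(x0, drift, sigmas, dt):
    """Simulate dx = drift(x) dt + sqrt(Sigma(t)) dW with an Euler-Maruyama scheme."""
    xs = [np.asarray(x0, dtype=float)]
    for Sigma in sigmas[:-1]:
        L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(len(x0)))
        noise = L @ np.random.randn(len(x0)) * np.sqrt(dt)
        xs.append(xs[-1] + drift(xs[-1]) * dt + noise)
    return np.array(xs)

# Example: 2D mean-reverting drift with a Wishart-process diffusion.
ts = np.linspace(0.0, 5.0, 200)
sigmas = wishart_process_diffusion(ts)
path = euler_maruyama(x0=[1.0, -1.0], drift=lambda x: -x, sigmas=sigmas, dt=ts[1] - ts[0])
```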
Learning workable representations of dynamical systems is becoming an increasingly important problem in a number of application areas. By leveraging recent work connecting deep neural networks to systems of differential equations, we propose variational integrator networks, a class of neural network architectures designed to preserve the geometric structure of physical systems. This class of network architectures facilitates accurate long-term prediction, interpretability, and data-efficient learning, while remaining highly flexible and capable of modeling complex behavior. We demonstrate that they can learn dynamical systems from both noisy observations in phase space and from image pixels within which the unknown dynamics are embedded.
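The sketch below shows the discrete Euler-Lagrange (Stormer-Verlet) position update that variational integrator networks build on, q_{t+1} = 2 q_t - q_{t-1} - h^2 grad U(q_t). The quadratic potential with weight matrix `W` stands in for the learned neural-network potential and is an assumption made to keep the example self-contained.

```python
import numpy as np

def potential_grad(q, W):
    """Gradient of a simple learnable potential U(q) = 0.5 q^T W q (W symmetric PSD).
    In a variational integrator network, U would be a neural network and its
    gradient would come from automatic differentiation."""
    return W @ q

def vin_rollout(q0, q1, W, h, n_steps):
    """Roll out a trajectory with the discrete Euler-Lagrange update
    q_{t+1} = 2 q_t - q_{t-1} - h^2 grad U(q_t), which preserves the
    symplectic structure of the underlying mechanical system."""
    traj = [np.asarray(q0, dtype=float), np.asarray(q1, dtype=float)]
    for _ in range(n_steps):
        q_prev, q_curr = traj[-2], traj[-1]
        q_next = 2 * q_curr - q_prev - h**2 * potential_grad(q_curr, W)
        traj.append(q_next)
    return np.array(traj)

# Example: 2D harmonic oscillator; long rollouts stay bounded because the
# update is structure preserving rather than accumulating energy drift.
W = np.array([[1.0, 0.0], [0.0, 2.0]])
h = 0.05
q0 = np.array([1.0, 0.0])
q1 = q0  # start approximately from rest
trajectory = vin_rollout(q0, q1, W, h, n_steps=2000)
```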
The interpretation of Large Hadron Collider (LHC) data in the framework of Beyond the Standard Model (BSM) theories is hampered by the need to run computationally expensive event generators and detector simulators. Performing statistically convergent scans of high-dimensional BSM theories is consequently challenging, and in practice infeasible for very high-dimensional BSM theories. We present here a new machine learning method that accelerates the interpretation of LHC data, by learning the relationship between BSM theory parameters and data. As a proof-of-concept, we demonstrate that this technique accurately predicts natural SUSY signal events in two signal regions at the High Luminosity LHC, up to four orders of magnitude faster than standard techniques. The new approach makes it possible to rapidly and accurately reconstruct the theory parameters of complex BSM theories, should an excess in the data be discovered at the LHC.
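As a purely illustrative sketch of the surrogate idea, the code below fits a cheap regressor to a small set of (theory parameter, log signal yield) pairs and then evaluates it across a large parameter scan. The regressor choice, the synthetic data, and the parameter dimensionality are assumptions, not the method or results of the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative stand-in for a handful of expensive simulator runs mapping BSM
# theory parameters (hypothetical 4-dimensional points "theta") to the log of
# the expected signal yield in one signal region.
rng = np.random.default_rng(0)
theta_train = rng.uniform(-1.0, 1.0, size=(500, 4))
log_yield_train = np.sin(theta_train @ np.array([1.0, -0.5, 0.3, 2.0]))

# Fit a cheap surrogate on the simulator runs; working in log yield keeps the
# target well scaled across orders of magnitude.
surrogate = GradientBoostingRegressor().fit(theta_train, log_yield_train)

# The surrogate can now be evaluated at many parameter points during a scan,
# far faster than rerunning the event generator and detector simulation.
theta_scan = rng.uniform(-1.0, 1.0, size=(100_000, 4))
predicted_log_yields = surrogate.predict(theta_scan)
```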