Estimating Barycenters of Measures in High Dimensions
Barycenters summarize populations of measures, but existing methods for computing them do not scale to high dimensions. We propose a scalable algorithm for estimating barycenters in high dimensions by turning the optimization over measures into a more tractable optimization over a space of generative models.
Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of atoms and optimize weights and/or locations. However, these approaches do not scale to high-dimensional settings due to the curse of dimensionality. We propose a scalable and general algorithm for estimating barycenters of measures in high dimensions [1].
Barycenters of Probability Measures
The barycenter of $P$ probability measures $\mu_{1},\dots,\mu_{P}$, weighted by a vector $\boldsymbol{\beta} \in \Delta_P$, can be expressed as the optimal measure $\mu^{\star}$ solving the optimization problem [2] $$ \mu^{\star}=\arg\min_{\mu} \sum_{p=1}^{P}\beta_{p}D(\mu, \mu_{p}), $$ where $D$ is a distance between probability measures. It can be interpreted as an extension of the Euclidean mean to probability measures.
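As a quick sanity check of this interpretation (a standard fact, not specific to our paper): if $D$ is the squared 2-Wasserstein distance and every $\mu_p = \delta_{\boldsymbol{x}_p}$ is a Dirac measure, the barycenter is the Dirac at the weighted Euclidean mean, $$ \arg\min_{\mu} \sum_{p=1}^{P} \beta_p W_2^2(\mu, \delta_{\boldsymbol{x}_p}) = \delta_{\bar{\boldsymbol{x}}}, \qquad \bar{\boldsymbol{x}} = \sum_{p=1}^{P} \beta_p \boldsymbol{x}_p. $$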
Importantly, different distances $D$ will induce different averaging properties. For instance, we see in Figure 1 that maximum mean discrepancy (MMD) leads to a mixture behavior, whilst the 2-Wasserstein distance interpolates between measures.
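To make this contrast concrete (a simple calculation, not taken from the paper): take two equally weighted Diracs $\mu_1 = \delta_{\boldsymbol{a}}$ and $\mu_2 = \delta_{\boldsymbol{b}}$. With $D$ the squared MMD, the barycenter is the mixture (the kernel mean embedding is linear, so the mixture attains the minimum of the weighted sum of squared RKHS distances), whereas with $D = W_2^2$ the barycenter is a single Dirac halfway between the two: $$ \mu^{\star}_{\mathrm{MMD}^2} = \tfrac{1}{2}\delta_{\boldsymbol{a}} + \tfrac{1}{2}\delta_{\boldsymbol{b}}, \qquad \mu^{\star}_{W_2^2} = \delta_{(\boldsymbol{a}+\boldsymbol{b})/2}. $$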
Previous approaches [3, 4, 5] typically parametrize the barycenter as a discrete measure and optimize its weights and/or locations. This requires an exponentially increasing number of locations as the dimensionality of the domain grows. As a result of this “curse of dimensionality”, these methods are typically restricted to low-dimensional problems ($\mathbb{R}^d$ with $d \leq 3$) and are hand-tailored to specific choices of $D$.
Scalable and General Computation of Barycenters
In our paper [1], we propose a scalable algorithm for estimating barycenters of measures. The main idea is to turn the optimization over measures into a more tractable optimization over a space of generative models [6, 7].
- Generative barycenter: Specifically, we parametrize the barycenter $P_\theta$ as a generative model consisting of a latent measure $\rho$ and a parametric functional $G_{\theta}$. We can sample from it as follows: $$\boldsymbol{z} \sim \rho, \qquad \boldsymbol{x} = G_\theta(\boldsymbol{z}).$$ We illustrate a circular generative model in Figure 2. In that example, the radius and centre of the ellipse are the parameters $\theta$ to be optimized, instead of the locations of individual atoms.
- Optimization: Computing the parametric barycenter amounts to solving the following optimization problem with SGD variants: $$ \theta^\star = \arg\min_{\theta} \sum_{p=1}^P\beta_p D(G_{\theta\#}\rho, \mu_{p}), $$ where $G_{\theta\#}\rho$ denotes the pushforward of the latent measure $\rho$ through $G_\theta$, i.e., the law of $G_\theta(\boldsymbol{z})$ for $\boldsymbol{z} \sim \rho$. A minimal sketch of this procedure is given after this list.
- Inductive Biases: We can incorporate prior knowledge on the form of the barycenter through the generator $G_\theta$’s structure (e.g., CNNs for barycenters of images). This enables scaling to high-dimensional settings. Note that $G_\theta$ is not restricted to being a neural network, and domain knowledge can enable more efficient learning.
- Convergence Guarantees: We study the local convergence of our algorithms for different choices of $D$. In particular, we show that under smoothness assumptions on the distance, the optimization converges to a stationary point; these assumptions hold for popular choices of discrepancy [1].
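To make the recipe above concrete, below is a minimal, hypothetical sketch (not the authors' code) of the generative-barycenter pipeline in JAX: the latent measure $\rho$ is a standard Gaussian, $G_\theta$ is a small MLP, $D$ is the squared MMD with a Gaussian kernel, and the loss $\sum_p \beta_p D(G_{\theta\#}\rho, \mu_p)$ is estimated from samples and minimized with plain SGD. All function names and hyperparameters are illustrative.

```python
# Hypothetical sketch of the generative-barycenter recipe (not the authors' code).
# D = squared MMD with a Gaussian kernel; rho = N(0, I); G_theta = small MLP.
import jax
import jax.numpy as jnp


def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased sample estimate of squared MMD between point clouds x and y."""
    def gram(a, b):
        d2 = jnp.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return jnp.exp(-d2 / (2.0 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()


def init_mlp(key, dims=(2, 64, 64, 2)):
    """Initialise MLP parameters theta as a list of (weight, bias) pairs."""
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params


def generator(params, z):
    """G_theta: push latent samples z through the MLP."""
    h = z
    for w, b in params[:-1]:
        h = jax.nn.relu(h @ w + b)
    w, b = params[-1]
    return h @ w + b


def barycenter_loss(params, key, samples_per_measure, weights, n_latent=256):
    """Monte-Carlo estimate of sum_p beta_p * MMD^2(G_theta# rho, mu_p)."""
    z = jax.random.normal(key, (n_latent, 2))   # z ~ rho = N(0, I)
    x = generator(params, z)                    # x = G_theta(z)
    return sum(beta * gaussian_mmd2(x, mu_samples)
               for beta, mu_samples in zip(weights, samples_per_measure))


# Toy input: P = 2 empirical measures (point clouds) with equal weights.
mu_1 = jax.random.normal(jax.random.PRNGKey(1), (500, 2)) + jnp.array([-3.0, 0.0])
mu_2 = jax.random.normal(jax.random.PRNGKey(2), (500, 2)) + jnp.array([3.0, 0.0])
measures, weights = [mu_1, mu_2], [0.5, 0.5]

# Plain SGD on theta (the paper uses SGD variants; this is the simplest one).
key = jax.random.PRNGKey(0)
params, lr = init_mlp(key), 1e-2
for step in range(2000):
    key, sub = jax.random.split(key)
    loss, grads = jax.value_and_grad(barycenter_loss)(params, sub, measures, weights)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

For Sinkhorn or Wasserstein barycenters, the `gaussian_mmd2` estimator would simply be swapped for a (debiased) Sinkhorn-divergence estimator; the sampling and SGD loop stay unchanged.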
Experiments
We now briefly discuss some experiments, and recommend looking at the paper [1] for more details and comparisons to previous work.
- Nested Ellipses: First, we compute the Sinkhorn barycenter of $P=30$ nested ellipses. We consider two approaches to parametrizing the generator $G_\theta$: $(i)$ using a multi-layer perceptron (MLP) as $G_\theta$, and $(ii)$ exploiting inductive biases by directly parametrizing two ellipses ($\theta$: axis lengths and centers of both ellipses; a minimal sketch of this parametrization follows this list). Figure 2 shows that both approaches recover a sensible barycenter.
- Natural Image Datasets: We then turn to problems with thousands of dimensions by computing MMD barycenters of image datasets. Here each measure is a dataset and each atom is an image, whilst previous computer-vision approaches computed barycenters of single images, with atoms being individual pixels.
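For the nested-ellipse experiment, here is a minimal, hypothetical version of the non-neural parametrization in approach $(ii)$ (illustrative only, not the authors' exact construction): the latent variable is an angle together with a uniform variable selecting one of the two ellipses, and $\theta$ consists of the axis lengths and centers of both ellipses. Gradients with respect to $\theta$ flow through the axis lengths and centers, so this generator plugs directly into the SGD loop sketched earlier.

```python
# Hypothetical non-neural generator for approach (ii): theta holds the axis
# lengths and centers of two nested ellipses; the latent variable is an angle
# plus a uniform variable choosing which ellipse a sample lands on.
import jax
import jax.numpy as jnp


def ellipse_generator(theta, z_angle, z_pick):
    """theta = ((a1, b1, c1), (a2, b2, c2)); z_angle ~ U[0, 2*pi), z_pick ~ U[0, 1)."""
    (a1, b1, c1), (a2, b2, c2) = theta
    e1 = jnp.stack([a1 * jnp.cos(z_angle), b1 * jnp.sin(z_angle)], axis=-1) + c1
    e2 = jnp.stack([a2 * jnp.cos(z_angle), b2 * jnp.sin(z_angle)], axis=-1) + c2
    return jnp.where(z_pick[:, None] < 0.5, e1, e2)   # half the mass on each ellipse


# Draw samples from the generative barycenter candidate.
theta = ((1.0, 0.5, jnp.zeros(2)),   # outer ellipse: axes 1.0 and 0.5, centered at 0
         (0.4, 0.2, jnp.zeros(2)))   # inner ellipse: axes 0.4 and 0.2, centered at 0
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
z_angle = jax.random.uniform(k1, (512,)) * 2.0 * jnp.pi
z_pick = jax.random.uniform(k2, (512,))
samples = ellipse_generator(theta, z_angle, z_pick)   # shape (512, 2)
```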
Concluding remarks
- Previous approaches parametrize barycenters with compact, local basis functions (atoms), which scales poorly with dimension due to the curse of dimensionality.
- We propose a generative approach relying on global basis functions, which allows for scalability and generality.
- We provide local convergence guarantees for a variety of distances.
- We apply our algorithm to problems of unprecedented scale, and study barycentric behavior from both theoretical (advancing previous results) and empirical standpoints.
[1] S. Cohen, M. Arbel, M. P. Deisenroth. Estimating Barycenters of Measures in High Dimensions. arXiv:2007.07105, 2020.
[2] M. Agueh, G. Carlier. Barycenters in the Wasserstein Space. SIAM Journal on Mathematical Analysis, 2011.
[3] M. Cuturi, A. Doucet. Fast Computation of Wasserstein Barycenters. ICML, 2014.
[4] G. Luise, S. Salzo, M. Pontil, C. Ciliberto. Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm. NeurIPS, 2019.
[5] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, G. Peyré. Iterative Bregman Projections for Regularized Transportation Problems. SIAM Journal on Scientific Computing, 2014.
[6] A. Genevay, G. Peyré, M. Cuturi. Learning Generative Models with Sinkhorn Divergences. AISTATS, 2018.
[7] C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, B. Poczos. MMD GAN: Towards Deeper Understanding of Moment Matching Network. NeurIPS, 2017.