Geoff Pleiss: Understanding Neural Networks through Gaussian Processes, and Vice Versa

Abstract

Neural networks and Gaussian processes (GPs) represent different learning paradigms: the former are parametric and rely on training by empirical risk minimization (ERM), while the latter are non-parametric and employ Bayesian inference. Despite these differences, I will discuss how Gaussian processes can help us understand and improve neural network design. One example is our recent work investigating the effect of width on neural networks. We study a generalized class of models, deep Gaussian processes (DGPs), in which parametric layers are replaced with GP layers. Analysis techniques from Bayesian nonparametrics uncover surprising pathologies of wide models, introduce a new interpretation of feature learning, and demonstrate a loss of adaptability with increasing width. We empirically confirm that these findings hold for DGPs, Bayesian neural networks, and conventional neural networks alike. Time permitting, I will also discuss recent work that leverages insights from neural network training to improve Gaussian process scalability. Taking inspiration from deep learning libraries, we constrain ourselves to write GP inference algorithms that use only matrix multiplication and other linear operations, procedures amenable to GPU acceleration and distributed computing. These methods induce a slight bias, which we quantify and bound through a novel numerical analysis, but we demonstrate that this bias can be eliminated through randomized truncation techniques and stochastic optimization.
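
As a rough illustration of GP inference built only from matrix multiplication, the sketch below computes a GP posterior mean by solving the linear system with conjugate gradients, so the kernel matrix is touched only through matrix-vector products. The abstract does not name a specific algorithm; the choice of conjugate gradients, the RBF kernel, and all function and parameter names here (`rbf_kernel`, `conjugate_gradients`, `gp_posterior_mean`, `noise`, `lengthscale`) are assumptions made for illustration, not the speaker's implementation.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def conjugate_gradients(matvec, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0
    p = r.copy()          # current search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def gp_posterior_mean(X_train, y_train, X_test, noise=0.1, lengthscale=1.0):
    """Posterior mean K_* (K + noise * I)^{-1} y, with noise as a variance."""
    K = rbf_kernel(X_train, X_train, lengthscale)
    # The solver never factorizes K; it only multiplies by it.
    weights = conjugate_gradients(lambda v: K @ v + noise * v, y_train)
    return rbf_kernel(X_test, X_train, lengthscale) @ weights
```

Stopping the iteration before convergence (a small `max_iter` or a loose `tol`) yields an approximate solve, which is one way an iterative method of this kind can introduce error relative to an exact factorization.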

Bio

Geoff Pleiss is a postdoc in the Department of Statistics and the Zuckerman Institute at Columbia University. He received his Ph.D. from Cornell University under the supervision of Kilian Q. Weinberger, and his undergraduate degree in engineering from Olin College. His research lies at the intersection of deep learning and probabilistic modeling, with an emphasis on making learning algorithms more scalable, robust, and reliable. In particular, his work focuses on uncertainty quantification, detecting anomalous training and test data, and speeding up Gaussian processes for extremely large datasets.