Pavel Izmailov
I’m a PhD student in Computer Science at NYU, working with Andrew Gordon Wilson. I am primarily interested in understanding and improving deep neural networks. In particular, my interests include out-of-distribution generalization, probabilistic deep learning, representation learning, and large models. I am also excited about generative models, uncertainty estimation, semi-supervised learning, and other topics.
From 2017 to 2019 I was a PhD student in Operations Research and Information Engineering at Cornell University, after which I received an MSc degree and transferred to NYU. I received a BSc in applied math and computer science from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University, where I worked in the Bayesian Methods Research Group under the supervision of Dmitry Vetrov and Dmitry Kropotov.
In the summer of 2019 I completed a research internship at Amazon AWS in Palo Alto, working with Bernie Wang and Alex Smola. In the summer of 2020 I worked with Matt Hoffman at Google AI. Between June 2021 and February 2022 I worked with Alex Alemi and Ben Poole at Google as a research intern and a student researcher. In the summer of 2022 I am excited to return to Google to work with Lucas Beyer and Simon Kornblith.
Our work on Bayesian model selection was recently recognized with an Outstanding Paper Award at ICML 2022!
You can contact me at pi390@nyu.edu and follow me on Twitter.
Recorded Talks

What Are Bayesian Neural Network Posteriors Really Like?
AABI; Invited Talk, 2022 
How Do We Build Neural Networks We Can Trust?
Broad Institute of MIT and Harvard MIA Seminar; Invited Talk, 2019

Averaging Weights Leads to Wider Optima and Better Generalization
UAI Oral presentation, 2018
Publications

*Equal first authorship.

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
arXiv preprint, 2022
[PDF, ArXiv, Code] 
On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification
arXiv preprint, 2022
[PDF, ArXiv, Code] 
Bayesian Model Selection, the Marginal Likelihood, and Generalization
International Conference on Machine Learning (ICML), 2022
Long Talk (Oral)
[PDF, ArXiv, Code] 
Dangers of Bayesian Model Averaging under Covariate Shift
Neural Information Processing Systems (NeurIPS), 2021
[PDF, ArXiv, Poster, Code] 
Does Knowledge Distillation Really Work?
Neural Information Processing Systems (NeurIPS), 2021
[PDF, ArXiv, Poster, Code] 
What Are Bayesian Neural Network Posteriors Really Like?
International Conference on Machine Learning (ICML), 2021
Long Talk (Oral)
[PDF, ArXiv, Code, HMC samples, Poster, NeurIPS competition] 
Learning Invariances in Neural Networks from Training Data
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Why Normalizing Flows Fail to Detect Out-of-Distribution Data
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
International Conference on Machine Learning (ICML), 2020
[PDF, ArXiv, Code] 
Semi-Supervised Learning with Normalizing Flows
International Conference on Machine Learning (ICML), 2020
[PDF, ArXiv, Code] 
Subspace Inference for Bayesian Deep Learning
Uncertainty in Artificial Intelligence (UAI), 2019
[PDF, ArXiv, Code, Poster] 
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Neural Information Processing Systems (NeurIPS), 2019
[PDF, ArXiv, Code, Poster, Video] 
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
International Conference on Learning Representations (ICLR), 2019
[PDF, ArXiv, Code, Poster] 
Averaging Weights Leads to Wider Optima and Better Generalization
Uncertainty in Artificial Intelligence (UAI), 2018
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides, PyTorch Blogpost, Towards Data Science Blogpost, fast.ai Blogpost] 
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Neural Information Processing Systems (NeurIPS), 2018
Spotlight Presentation
[PDF, ArXiv, Code, Poster, Slides, Video, Blogpost] 
Tensor Train decomposition on TensorFlow (T3F)
Journal of Machine Learning Research, 2020
[PDF, ArXiv, Code] 
Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
Artificial Intelligence and Statistics (AISTATS), 2018
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides] 
Faster variational inducing input Gaussian process classification
Journal of Machine Learning and Data Analysis, 2017
[PDF, ArXiv]
Workshop Papers

On Feature Learning in the Presence of Spurious Correlations
ICML Workshop on Principles of Distribution Shift (PODS), 2022

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
ICML Workshop on Spurious Correlations, Invariance, and Stability, 2022
Oral Presentation
[PDF, ArXiv, Code] 
Semi-Supervised Learning with Normalizing Flows
ICML Workshop on Invertible Neural Nets and Normalizing Flows, 2019
[PDF, Poster] 
Invertible Convolutional Networks
ICML Workshop on Invertible Neural Nets and Normalizing Flows, 2019
Spotlight Presentation
[PDF, Poster, Slides] 
Subspace Inference for Bayesian Deep Learning
ICML Workshop on Uncertainty & Robustness in Deep Learning, 2019
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides, Polina's Talk] 
Fast Uncertainty Estimates and Bayesian Model Averaging of DNNs
UAI Workshop: Uncertainty in Deep Learning, 2018
Oral Presentation
[PDF, Code, Poster, Slides] 
Improving Stability in Deep Reinforcement Learning with Weight Averaging
UAI Workshop: Uncertainty in Deep Learning, 2018
[PDF, Poster]