Pavel Izmailov
Contact: pi390@nyu.edu, Twitter
I'm a PhD student in Computer Science at NYU, working with Andrew Gordon Wilson. In my research, I aim to build foundational understanding of models, training procedures, and their limitations. I use this understanding to develop practically impactful, interpretable, robust and broadly applicable methods and models. My interests include out-of-distribution generalization, probabilistic deep learning, representation learning, large models, and other topics. Our work on Bayesian model selection was recently recognized with an Outstanding Paper Award at ICML 2022!
From 2017 to 2019 I was a PhD student in Operations Research and Information Engineering at Cornell University, after which I received an MSc degree and transferred to NYU. I received a BSc in applied math and computer science from the faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University, where I was working at the Bayesian Methods Research Group under the supervision of Dmitry Vetrov and Dmitry Kropotov.
In the summer of 2019 I completed a research internship at Amazon AWS in Palo Alto, working with Bernie Wang and Alex Smola. In the summer of 2020 I worked with Matt Hoffman at Google AI. Between June 2021 and February 2022 I worked with Alex Alemi and Ben Poole at Google as a research intern and a student researcher. In the summer of 2022 I worked with Lucas Beyer and Simon Kornblith at Google Brain.
I am on the academic job market!
Links
 [Home, Publications, Talks, CV, GitHub, Google Scholar, Semantic Scholar]
Publications

*Equal first authorship.

FlexiViT: one model for all patch sizes
In submission, 2022
[PDF, ArXiv, Code] 
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
International Conference on Learning Representations (ICLR), 2023
Spotlight Presentation
[PDF, ArXiv, Code] 
On Feature Learning in the Presence of Spurious Correlations
Neural Information Processing Systems (NeurIPS), 2022
[PDF, ArXiv, Code] 
On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification
Neural Information Processing Systems (NeurIPS), 2022
[PDF, ArXiv, Code] 
Bayesian Model Selection, the Marginal Likelihood, and Generalization
International Conference on Machine Learning (ICML), 2022
Outstanding Paper Award, Long Talk (Oral)
[PDF, ArXiv, Code] 
Unsupervised learning of two-component nematicity from STM data on magic angle bilayer graphene
arXiv preprint, 2022
[PDF, ArXiv] 
Dangers of Bayesian Model Averaging under Covariate Shift
Neural Information Processing Systems (NeurIPS), 2021
[PDF, ArXiv, Poster, Code] 
Does Knowledge Distillation Really Work?
Neural Information Processing Systems (NeurIPS), 2021
[PDF, ArXiv, Poster, Code] 
What Are Bayesian Neural Network Posteriors Really Like?
International Conference on Machine Learning (ICML), 2021
Long Talk (Oral)
[PDF, ArXiv, Code, HMC samples, Poster, NeurIPS competition] 
Learning Invariances in Neural Networks from Training Data
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Why Normalizing Flows Fail to Detect Out-of-Distribution Data
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
Neural Information Processing Systems (NeurIPS), 2020
[PDF, ArXiv, Code] 
Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
International Conference on Machine Learning (ICML), 2020
[PDF, ArXiv, Code] 
Semi-Supervised Learning with Normalizing Flows
International Conference on Machine Learning (ICML), 2020
[PDF, ArXiv, Code] 
Subspace Inference for Bayesian Deep Learning
Uncertainty in Artificial Intelligence (UAI), 2019
[PDF, ArXiv, Code, Poster] 
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Neural Information Processing Systems (NeurIPS), 2019
[PDF, ArXiv, Code, Poster, Video] 
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
International Conference on Learning Representations (ICLR), 2019
[PDF, ArXiv, Code, Poster] 
Averaging Weights Leads to Wider Optima and Better Generalization
Uncertainty in Artificial Intelligence (UAI), 2018
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides, PyTorch Blogpost, Towards Data Science Blogpost, fast.ai Blogpost] 
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Neural Information Processing Systems (NeurIPS), 2018
Spotlight Presentation
[PDF, ArXiv, Code, Poster, Slides, Video, Blogpost] 
Tensor Train decomposition on TensorFlow (T3F)
Journal of Machine Learning Research, 2020
[PDF, ArXiv, Code] 
Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
Artificial Intelligence and Statistics (AISTATS), 2018
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides] 
Faster variational inducing input Gaussian process classification
Journal of Machine Learning and Data Analysis, 2017
[PDF, ArXiv]
Workshop Papers

On Feature Learning in the Presence of Spurious Correlations
ICML Workshop on Principles of Distribution Shift (PODS), 2022

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
ICML Workshop on Spurious Correlations, Invariance, and Stability, 2022
Oral Presentation
[PDF, ArXiv, Code] 
Semi-Supervised Learning with Normalizing Flows
ICML Workshop on Invertible Neural Nets and Normalizing Flows, 2019
[PDF, Poster] 
Invertible Convolutional Networks
ICML Workshop on Invertible Neural Nets and Normalizing Flows, 2019
Spotlight Presentation
[PDF, Poster, Slides] 
Subspace Inference for Bayesian Deep Learning
ICML Workshop on Uncertainty & Robustness in Deep Learning, 2019
Oral Presentation
[PDF, ArXiv, Code, Poster, Slides, Polina's Talk] 
Fast Uncertainty Estimates and Bayesian Model Averaging of DNNs
UAI Workshop: Uncertainty in Deep Learning, 2018
Oral Presentation
[PDF, Code, Poster, Slides] 
Improving Stability in Deep Reinforcement Learning with Weight Averaging
UAI Workshop: Uncertainty in Deep Learning, 2018
[PDF, Poster]