Improved Autoregressive Modeling with Distribution Smoothing

Chenlin Meng    Jiaming Song    Yang Song   
Shengjia Zhao    Stefano Ermon   

Stanford University    

Paper | GitHub


While autoregressive models excel at image compression, their sample quality is often lacking. Although not realistic, generated images often have high likelihood according to the model, resembling the case of adversarial examples. Inspired by a successful adversarial defense method, we incorporate randomized smoothing into autoregressive generative modeling. We first model a smoothed version of the data distribution, and then reverse the smoothing process to recover the original data distribution. This procedure drastically improves the sample quality of existing autoregressive models on several synthetic and real-world image datasets while obtaining competitive likelihoods on synthetic datasets.



arXiv 2103.15089, 2021.


Chenlin Meng, Jiaming Song, Yang Song, Shengjia Zhao and Stefano Ermon. "Improved Autoregressive Modeling with Distribution Smoothing"

ICLR 2021 (Oral), Paper, arXiv, Bibtex

Stage 1: Learning the smoothed distribution

Instead of directly modeling the data distribution, we propose to first train an autoregressive model on the smoothed version of the data distribution.
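As a minimal sketch of what "smoothing" the data distribution can look like, the snippet below adds isotropic Gaussian noise to toy 1-D samples, so that an autoregressive model would be fit to the noisy samples instead of the raw ones. The Gaussian noise choice, the noise level `sigma=0.3`, the toy two-mode dataset, and the helper name `smooth` are all assumptions made for illustration; they are not taken from this page.

```python
import numpy as np

def smooth(x, sigma, rng):
    """Draw x_tilde ~ N(x, sigma^2 I), i.e. add Gaussian noise to each sample.

    Assumption: the smoothing distribution is an isotropic Gaussian.
    """
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)

# Toy 1-D "data distribution": two sharp modes at -1 and +1 (hypothetical data).
data = rng.choice([-1.0, 1.0], size=10_000)

# Samples from the smoothed distribution; a stage-1 model would be trained on these.
smoothed = smooth(data, sigma=0.3, rng=rng)
```

In this toy setting the smoothed samples follow a two-component Gaussian mixture, which has no sharp spikes and is therefore easier for a density model to fit than the original point masses.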


Stage 2: Reverse smoothing

Although the smoothing process makes the distribution easier to learn, it also introduces bias. Thus, we need an extra step to debias the learned distribution by reverting the smoothing process.
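One simple way to illustrate reversing Gaussian smoothing is a single gradient-based denoising step, x ≈ x̃ + σ² ∇ log q(x̃), which returns the posterior mean of the clean sample given the noisy one (Tweedie's formula). The sketch below applies it to a toy case where the smoothed density is known in closed form. This is an assumed illustration only: the Gaussian noise model, the two-mode toy distribution, and the helper names `smoothed_score` and `denoise` are hypothetical, and the approach on this page may instead learn the reverse step with a second model.

```python
import numpy as np

def smoothed_score(x_tilde, means, weights, sigma):
    """Gradient of log q(x_tilde) for a 1-D Gaussian mixture with shared std sigma.

    Assumption: clean data are point masses at `means`, so the smoothed
    density q is a Gaussian mixture centered at those points.
    """
    x = np.asarray(x_tilde, dtype=float)[..., None]
    # Log of unnormalized component responsibilities, stabilized before exp.
    log_comp = np.log(weights) - 0.5 * ((x - means) / sigma) ** 2
    log_comp -= log_comp.max(axis=-1, keepdims=True)
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=-1, keepdims=True)
    return (resp * (means - x)).sum(axis=-1) / sigma**2

def denoise(x_tilde, means, weights, sigma):
    """Single-step denoising: x_tilde + sigma^2 * score (posterior mean of x)."""
    x = np.asarray(x_tilde, dtype=float)
    return x + sigma**2 * smoothed_score(x, means, weights, sigma)

means = np.array([-1.0, 1.0])      # hypothetical clean-data modes
weights = np.array([0.5, 0.5])     # equal mixing weights
noisy = np.array([-1.2, 0.0, 1.2])
denoised = denoise(noisy, means, weights, sigma=0.3)
```

Noisy points near a mode are pulled back onto that mode, while a point exactly between the two modes stays at the midpoint. This shows a limitation of a single denoising step: it returns an average over plausible clean samples rather than a draw from the conditional distribution.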



In this paper, we propose to incorporate randomized smoothing techniques into autoregressive modeling. By choosing the smoothness level appropriately, this seemingly simple approach drastically improves the sample quality of existing autoregressive models on several synthetic and real-world datasets while retaining reasonable likelihoods. Our work provides insights into how recent adversarial defense techniques can be leveraged to build more robust generative models. Since we apply randomized smoothing directly to the target data distribution rather than to the model, we believe our approach is also applicable to other generative models such as variational autoencoders (VAEs) and generative adversarial networks (GANs).