Diffusion Beats Autoregressive in Data-Constrained Settings

...forecast suggests that by around 2028, we will enter a data-constrained regime: far more compute will be available than there are training tokens to consume.

This paper addresses the challenge by asking: how can we trade additional compute for scarce data? Our central idea is to revisit the foundations of modern generative modeling and compare the two dominant paradigms for scaling AI: autoregressive and diffusion models.

If you are compute-constrained, use autoregressive models; if you are data-constrained, use diffusion models.