TESS 2: A Large-Scale Generalist Diffusion Language Model
Abstract
We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, and matches, and sometimes exceeds, strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining, using the usual cross-entropy loss as the diffusion loss, and then performing further instruction tuning. We find that both adaptation training and the choice of base model are crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure for aligning model outputs without retraining the underlying model. Finally, we show that TESS 2 improves further with increased inference-time compute, highlighting the advantage of diffusion LMs in offering fine-grained control over the amount of compute used at inference. Code and models are available at https://github.com/hamishivi/tess-2.
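To make the reward-guidance idea concrete, here is a minimal, self-contained sketch of inference-time reward guidance on a toy vocabulary. It is illustrative only: the actual TESS 2 procedure uses a trained diffusion LM and a learned reward model, whereas the names below (`denoise_step`, `reward_grad`, `GUIDANCE_WEIGHT`) and the linear toy reward are assumptions introduced for this example. The key property it demonstrates is the one stated in the abstract: the base model is never retrained; the reward signal only nudges the logits at each denoising step.

```python
# Toy sketch of reward guidance at inference time (illustrative only).
# Assumption: a diffusion-style denoiser proposes logits over a tiny vocabulary,
# and a differentiable reward is defined on the token probabilities.
import math

VOCAB = ["good", "bad", "ok"]
GUIDANCE_WEIGHT = 2.0  # strength of the reward nudge (a hyperparameter)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def denoise_step(logits):
    # Stand-in for the diffusion LM's prediction at one step:
    # here it simply sharpens the current logits.
    return [2.0 * x for x in logits]

def reward_grad(probs):
    # Gradient of a toy linear reward r(p) = sum_i w_i * p_i with respect to
    # the logits: d r / d logit_j = p_j * (w_j - sum_i w_i * p_i).
    w = [1.0, -1.0, 0.0]  # favour "good", penalise "bad"
    expected = sum(wi * pi for wi, pi in zip(w, probs))
    return [pj * (wj - expected) for wj, pj in zip(w, probs)]

def guided_decode(logits, steps=10):
    for _ in range(steps):
        logits = denoise_step(logits)
        probs = softmax(logits)
        grad = reward_grad(probs)
        # Reward guidance: shift logits toward higher reward, no retraining.
        logits = [l + GUIDANCE_WEIGHT * g for l, g in zip(logits, grad)]
    return VOCAB[max(range(len(logits)), key=logits.__getitem__)]

print(guided_decode([0.1, 0.3, 0.2]))  # guidance steers the argmax toward "good"
```

Because the guidance term is added purely at decoding time, swapping in a different reward only changes `reward_grad`, which is what makes the procedure modular in the sense described above.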