Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17, 2024 • 22
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published May 2, 2024 • 30
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 104
Training Chain-of-Thought via Latent-Variable Inference Paper • 2312.02179 • Published Nov 28, 2023 • 11
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) Paper • 2309.08968 • Published Sep 16, 2023 • 23
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale Paper • 2309.06497 • Published Sep 12, 2023 • 6
Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 11