Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published 4 days ago • 10
Efficient Attention Mechanisms for Large Language Models: A Survey Paper • 2507.19595 • Published Jul 25 • 6
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Paper • 2507.06607 • Published Jul 9 • 10
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 56
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 49