Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23, 2025
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models Paper • 2309.09958 • Published Sep 18, 2023
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation Paper • 2305.09515 • Published May 16, 2023