How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published 3 days ago • 65 • 8
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 1.37k
AIDE: AI-Driven Exploration in the Space of Code Paper • 2502.13138 • Published 5 days ago • 7 • 6
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation Paper • 2502.13581 • Published 4 days ago • 5 • 3
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation Paper • 2502.12638 • Published 5 days ago • 7
MoM: Linear Sequence Modeling with Mixture-of-Memories Paper • 2502.13685 • Published 4 days ago • 29
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 5 days ago • 34
Continuous Diffusion Model for Language Modeling Paper • 2502.11564 • Published 6 days ago • 48
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Paper • 2502.13092 • Published 5 days ago • 12
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections Paper • 2502.12170 • Published 10 days ago • 10
Large Language Models and Mathematical Reasoning Failures Paper • 2502.11574 • Published 6 days ago • 3 • 3
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models Paper • 2502.08130 • Published 12 days ago • 9
We Can't Understand AI Using our Existing Vocabulary Paper • 2502.07586 • Published 12 days ago • 8 • 4
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Paper • 2502.08639 • Published 11 days ago • 36
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 12 days ago • 42
Competitive Programming with Large Reasoning Models Paper • 2502.06807 • Published 20 days ago • 62