Collections including paper arxiv:2509.04419

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 624
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 294
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 101

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 17
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 45

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1

- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
  Paper • 2409.11355 • Published • 32
- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 141
- Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
  Paper • 2409.14988 • Published • 24
- LLM Pretraining with Continuous Concepts
  Paper • 2502.08524 • Published • 29

- Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
  Paper • 2509.02522 • Published • 22
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  Paper • 2508.20751 • Published • 85
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 139
- Towards a Unified View of Large Language Model Post-Training
  Paper • 2509.04419 • Published • 43

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 281 • 95
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 35
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 98
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 89

- Rethinking Data Selection at Scale: Random Selection is Almost All You Need
  Paper • 2410.09335 • Published • 17
- From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
  Paper • 2410.06456 • Published • 38
- Emergent properties with repeated examples
  Paper • 2410.07041 • Published • 8
- Personalized Visual Instruction Tuning
  Paper • 2410.07113 • Published • 71

- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 27
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 64
- Direct Preference Optimization Using Sparse Feature-Level Constraints
  Paper • 2411.07618 • Published • 17
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 55