- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 7
- Scaling Laws for Autoregressive Generative Modeling
  Paper • 2010.14701 • Published
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 4

Collections including paper arxiv:2401.02954

- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 346
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 141
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 136

- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 48
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 50
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44

- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44
- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 48

- Rethinking Optimization and Architecture for Tiny Language Models
  Paper • 2402.02791 • Published • 13
- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 46
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 16
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 63

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44
- Perspectives on the State and Future of Deep Learning -- 2023
  Paper • 2312.09323 • Published • 8
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
  Paper • 2405.15071 • Published • 38
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
  Paper • 2407.10718 • Published • 18

- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44
- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 5
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 45