- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 127
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 73
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 24

Collections including paper arxiv:2402.04177

- Scaling Laws for Forgetting When Fine-Tuning Large Language Models
  Paper • 2401.05605 • Published
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks
  Paper • 2312.15614 • Published

- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- A Tale of Tails: Model Collapse as a Change of Scaling Laws
  Paper • 2402.07043 • Published • 15
- Scaling Laws for Fine-Grained Mixture of Experts
  Paper • 2402.07871 • Published • 14
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 24

- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 115
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 31
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18

- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- Offline Actor-Critic Reinforcement Learning Scales to Large Models
  Paper • 2402.05546 • Published • 5
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 79
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 609

- Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Paper • 2310.10837 • Published • 11
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 97
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 14

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 18
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 37
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 49
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23