Papers-Fundamentals
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper
• 2104.09864
• Published • 17
Attention Is All You Need
Paper
• 1706.03762
• Published • 118
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
• 2404.03715
• Published • 62
Zero-Shot Tokenizer Transfer
Paper
• 2405.07883
• Published • 5
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
• 2401.02994
• Published • 52
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper
• 2406.06608
• Published • 68
Extreme Compression of Large Language Models via Additive Quantization
Paper
• 2401.06118
• Published • 14
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
• 2402.03300
• Published • 142
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction
Paper
• 2401.17948
• Published • 4
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Paper
• 2405.20233
• Published • 7
Stream of Search (SoS): Learning to Search in Language
Paper
• 2404.03683
• Published • 30
Xmodel-2 Technical Report
Paper
• 2412.19638
• Published • 27
Transformer²: Self-adaptive LLMs
Paper
• 2501.06252
• Published • 55
Foundations of Large Language Models
Paper
• 2501.09223
• Published • 13
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published • 443
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
• 2502.01534
• Published • 40
Levels of AGI for Operationalizing Progress on the Path to AGI
Paper
• 2311.02462
• Published • 36
Large Language Diffusion Models
Paper
• 2502.09992
• Published • 127
A Survey on Post-training of Large Language Models
Paper
• 2503.06072
• Published • 11
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper
• 2503.09573
• Published • 76
Transformers without Normalization
Paper
• 2503.10622
• Published • 172
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper
• 2503.21460
• Published • 83
rasbt/llama-3.2-from-scratch
Updated • 284
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Paper
• 2505.01658
• Published • 39
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper
• 2505.09343
• Published • 76
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Paper
• 2505.15656
• Published • 15
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
Paper
• 2602.03837
• Published • 5
Robot Learning: A Tutorial
Paper
• 2510.12403
• Published • 127
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned
Paper
• 2603.05344
• Published • 6