Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 13 days ago • 134
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 63
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 28 days ago • 360
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 2 items • Updated 28 days ago • 100
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published Dec 8, 2024 • 9
PaliGemma 2 Release Collection Vision-language models available in 3B, 10B, and 28B variants. • 23 items • Updated Dec 13, 2024 • 141
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 525
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Nov 28, 2024 • 357
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 570
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 126
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions Paper • 2406.09264 • Published Jun 13, 2024 • 1