- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 249
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 131
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
  Paper • 2507.22827 • Published • 98
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 179
Collections including paper arxiv:2508.18265
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 179
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 2.73k • 117
- OpenGVLab/InternVL3_5-38B
  Image-Text-to-Text • 38B • Updated • 7.07k • 24
- OpenGVLab/InternVL3_5-30B-A3B
  Image-Text-to-Text • 31B • Updated • 24.3k • 31
- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 148
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
  Paper • 2507.22448 • Published • 65
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 179
- R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
  Paper • 2508.21113 • Published • 103
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 165
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 23
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 14
- Motif-Technologies/Motif-2.6B
  Text Generation • 3B • Updated • 1.69k • 75
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 179
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 2.73k • 117
- openbmb/MiniCPM-V-4_5
  Image-Text-to-Text • 9B • Updated • 22.6k • 880
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
  Paper • 2310.19909 • Published • 21
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 18
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34
- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 6.03k • 1.16k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 14
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 1.18k • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 61
- ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
  Paper • 2503.20756 • Published • 7
- BLIP3-o: A Family of Fully Open Unified Multimodal Models — Architecture, Training and Dataset
  Paper • 2505.09568 • Published • 97
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 179