Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation Paper • 2508.18032 • Published 13 days ago • 40
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11 • 27
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 286 • 8
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 879
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper • 2501.15369 • Published Jan 26 • 13