Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models Paper • 2508.02886 • Published Aug 4 • 1
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published 24 days ago • 141
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 16 days ago • 276
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 522
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 263