SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation Paper • 2505.21904 • Published May 28, 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning Paper • 2505.24871 • Published May 30, 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models Paper • 2505.24025 • Published May 29, 2025