Yash Thube

thubZ9

AI & ML interests

Multimodal learning • CV • RL

Recent Activity

liked a dataset 2 days ago

HuggingFaceM4/FineVision

updated a model about 2 months ago

thubZ9/sam_lora

published a model about 2 months ago

thubZ9/sam_lora

upvoted 4 papers 3 months ago

upvoted an article 4 months ago

Article

Vision Language Models Explained

and 1 other • Apr 11, 2024 • 444

upvoted 2 papers 4 months ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 184

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 70

upvoted 4 papers 5 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14 • 13

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 286

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 165

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 302

upvoted a paper 6 months ago

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Paper • 2503.13358 • Published Mar 17 • 96

upvoted 2 collections 6 months ago

Cohere Labs Aya Vision

Collection

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated Jul 31 • 70

Gemma 3 Release

Collection

28 items • Updated 27 days ago • 493

upvoted an article 6 months ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

and 3 others • Mar 4 • 75

upvoted 5 papers 6 months ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 124

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 95

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 83

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 76

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published Feb 24 • 53