Tony Zhao
tianchez
AI & ML interests
Multimodal Agent, Generative AI
Recent Activity
reacted to their post about 11 hours ago
Introducing VLM-R1!
GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).
https://github.com/om-ai-lab/VLM-R1
new activity
2 days ago
omlab/VLM-R1-Referral-Expression: Fixes 500 error for some users
reacted to their post with ❤️ 6 days ago
Organizations
tianchez's activity
Fixes 500 error for some users
1
#1 opened 4 days ago by Tonic

Update to correct ref: omlab/omdet-turbo-swin-tiny-hf
1
#2 opened 5 months ago by ozdeadman

Image guided object detection
1
#3 opened 4 months ago by godaspeg

Is there any open-source repo for this?
3
#1 opened 6 months ago by lucasjin