VLM R1 Referral Expression
Highlight described objects in images
Turn your lyrics into complete songs with vocals in multiple languages
Import a portrait, click to move the head!
Apply the motion of a video to a portrait
A unified multimodal understanding and generation model.
Interact with Qwen2.5-VL-72B to get responses and generate images