GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B with GRPO on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
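To make the recipe concrete, here is a minimal sketch of GRPO for visual grounding: each sampled completion is rewarded by the IoU between its predicted box and the ground-truth box, and that reward is handed to TRL's GRPOTrainer. This is illustrative, not the exact training script; the toy dataset, column names ("prompt", "bbox"), the box output format, and VLM support in GRPOTrainer (which depends on your TRL version) are all assumptions.

```python
import re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def grounding_reward(completions, bbox, **kwargs):
    """Score each completion by IoU against its ground-truth box (assumes plain-text completions)."""
    rewards = []
    for completion, gt in zip(completions, bbox):
        nums = [float(n) for n in re.findall(r"-?\d+\.?\d*", completion)[:4]]
        rewards.append(iou(nums, gt) if len(nums) == 4 else 0.0)
    return rewards

# Sanity check of the reward on a fake completion.
print(grounding_reward(["The object is at [10, 10, 50, 50]."], bbox=[[12, 12, 48, 52]]))

# Toy stand-in for a RefCOCO split; real rows would also carry the image.
toy_train = Dataset.from_dict({
    "prompt": ["Locate the red cup in the image as [x1, y1, x2, y2]."],
    "bbox": [[10, 10, 50, 50]],
})

args = GRPOConfig(output_dir="qwen2.5-vl-3b-grpo-refcoco", num_generations=8)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    reward_funcs=grounding_reward,
    args=args,
    train_dataset=toy_train,
)
trainer.train()
```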
Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production.
Now available as a Docker Space directly on the HF Hub!
- Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
- One-click deployment: on Spaces with persistent storage and integrated OAuth
- Simple prompt management: version, edit, and update without redeployment
- Intuitive evals: collect user feedback, run model/prompt evaluations, and improve quality
- Dataset creation: build datasets directly from production data to enhance future performance
Kudos to the Langfuse team for this collab and the awesome, open-first product they're building! @marcklingen @Clemo @MJannik
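As a rough idea of what "trace everything" looks like in practice, here is a small sketch that points the Langfuse Python SDK at a self-hosted instance running as a Space and traces a single OpenAI call. The Space URL and keys are placeholders, and the exact import paths depend on your SDK version; this is not the only supported integration.

```python
import os

# Point the SDK at your Langfuse Space (placeholder URL and keys, not real credentials).
os.environ["LANGFUSE_HOST"] = "https://your-org-langfuse.hf.space"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

from langfuse.decorators import observe
from langfuse.openai import openai  # drop-in wrapper that logs OpenAI calls (needs OPENAI_API_KEY)

@observe()  # creates a trace for each call of this function
def answer(question: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("What does Langfuse trace?"))
```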
Small but mighty: you can fine-tune SmolVLM on an L4 with a batch size of 4, and it only takes 16.4 GB of VRAM. With gradient accumulation, the simulated batch size is 16. I made a notebook that includes all the goodies: QLoRA, gradient accumulation, and gradient checkpointing, with explanations of how they work. https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
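For orientation, this is a condensed sketch of the memory-saving recipe (4-bit base model with LoRA adapters, gradient checkpointing, and gradient accumulation giving an effective batch of 4 x 4 = 16). The hyperparameters here are illustrative assumptions, not the notebook's exact values; see the notebook for the data collator and trainer setup.

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# QLoRA: load the base model in 4-bit NF4 and train only LoRA adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules="all-linear"))

training_args = TrainingArguments(
    output_dir="smolvlm-ft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # simulated batch size of 16
    gradient_checkpointing=True,     # recompute activations to save memory
    bf16=True,
    learning_rate=1e-4,
)
```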
Want to validate some hparams or figure out which timm model to use before committing to downloading or training on a large dataset? Try mini-imagenet: timm/mini-imagenet
I had this sitting on my drive and forgot where I pulled it together from. It's 100 classes of ImageNet: 50k train and 10k val images (from the ImageNet-1k train set) and 5k test images (from the ImageNet-1k val set). That's 7.4 GB instead of >100 GB for the full ImageNet-1k. This version is not reduced resolution like some other 'mini' versions. It's super easy to use with the timm train/val scripts; check out the dataset card.
I often check fine-tuning with even smaller datasets like:
* timm/resisc45
* timm/oxford-iiit-pet
But those are a bit small to train any modest-size model without starting from pretrained weights.
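If you'd rather poke at it from Python than the timm scripts, here is a quick sanity-check sketch: pull the dataset from the Hub and push a few batches through a small timm model. The column names ("image", "label") are assumptions about the dataset card, and resnet18 is just a stand-in.

```python
import timm
import torch
from datasets import load_dataset

ds = load_dataset("timm/mini-imagenet", split="train")

# Small model with 100 output classes to match mini-imagenet.
model = timm.create_model("resnet18", pretrained=False, num_classes=100)
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg, is_training=True)

def collate(batch):
    images = torch.stack([transform(ex["image"].convert("RGB")) for ex in batch])
    labels = torch.tensor([ex["label"] for ex in batch])
    return images, labels

loader = torch.utils.data.DataLoader(ds, batch_size=32, shuffle=True, collate_fn=collate)
images, labels = next(iter(loader))
print(model(images).shape)  # expect torch.Size([32, 100])
```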
Hello, researchers! I've tried to make reading HF Daily Papers easier and built a tool that writes reviews with LLMs like Claude 3.5 and GPT-4o, and sometimes FLUX.
- Classification by topic
- Sorting by publication date and HF addition date
- Syncing every 2 hours
- Hosted on GitHub
- English, Russian, and Chinese
- Top by week/month (in progress)
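For flavor, the core loop of such a tool might look roughly like this: fetch the Daily Papers feed and ask an LLM for a short review of each abstract. The endpoint and response fields are assumptions about the public Hub API, and the Anthropic call is a generic example rather than this tool's actual code.

```python
import requests
import anthropic

# Fetch the current Daily Papers list from the Hub API.
papers = requests.get("https://huggingface.co/api/daily_papers", timeout=30).json()
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

for entry in papers[:3]:
    paper = entry.get("paper", {})
    title, abstract = paper.get("title", ""), paper.get("summary", "")
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Write a three-sentence review of this paper.\n\n{title}\n\n{abstract}",
        }],
    )
    print(title, "\n", message.content[0].text, "\n")
```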