Maybe you are looking for:
CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval Paper • 2412.13834 • Published Dec 18, 2024
CountingDINO: A Training-free Pipeline for Class-Agnostic Counting using Unsupervised Backbones Paper • 2504.16570 • Published Apr 23
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework Paper • 2510.02898 • Published Oct 3