Is it better to show a model many different images once each (Diversity), or to extract as much information as possible from a small set of images (Density)?
I have always wanted to do an ablation study on this, and recently I got the chance to do exactly that. Why? In applied domains like robotics, manufacturing, or banking, we rarely have the luxury of internet-scale, diverse image datasets. We are often "Data Poor" in terms of diversity but "Data Rich" in depth.
The takeaway? Density is efficient for learning facts but dangerous for reasoning (logical collapse) if you don't have larger-scale data.
More details:
https://huggingface.co/blog/Akhil-Theerthala/diversity-density-for-vision-language-models