The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 11 days ago • 181
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models Paper • 2411.10867 • Published Nov 16, 2024 • 9
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 20
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 19
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 16
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 31
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning Paper • 2411.10161 • Published Nov 15, 2024 • 9
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages Paper • 2411.12240 • Published Nov 19, 2024 • 7
Continuous Speculative Decoding for Autoregressive Image Generation Paper • 2411.11925 • Published Nov 18, 2024 • 16
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements Paper • 2411.12044 • Published Nov 18, 2024 • 14
Building Trust: Foundations of Security, Safety and Transparency in AI Paper • 2411.12275 • Published Nov 19, 2024 • 11
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Paper • 2411.11909 • Published Nov 17, 2024 • 21
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 25