IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 32