Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 63
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models Paper • 2504.13367 • Published Apr 17 • 25
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 79
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12, 2024 • 8
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12, 2024 • 8
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12, 2024 • 8 • 1
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29, 2024 • 22
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 29
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 29
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models Paper • 2305.15393 • Published May 24, 2023 • 1
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View Paper • 2307.06082 • Published Jul 12, 2023
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis Paper • 2212.05032 • Published Dec 9, 2022 • 1
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29, 2024 • 22
Neuro-Symbolic Procedural Planning with Commonsense Prompting Paper • 2206.02928 • Published Jun 6, 2022