InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published 14 days ago
Persona Vectors: Monitoring and Controlling Character Traits in Language Models Paper • 2507.21509 • Published Jul 29
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels Paper • 2507.21809 • Published Jul 29
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents Paper • 2507.22827 • Published Jul 30
DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation Paper • 2506.06251 • Published Jun 6