Has GPT-5 Achieved Spatial Intelligence? An Empirical Study • Paper • 2508.13142 • Published 2 days ago • 24
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model • Paper • 2508.13009 • Published 3 days ago • 19
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping • Paper • 2508.12466 • Published 3 days ago • 8
Representing Speech Through Autoregressive Prediction of Cochlear Tokens • Paper • 2508.11598 • Published 6 days ago • 14
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models • Paper • 2508.12945 • Published 3 days ago • 10
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models • Paper • 2508.12880 • Published 3 days ago • 38
Precise Action-to-Video Generation Through Visual Action Prompts • Paper • 2508.13104 • Published 3 days ago • 9
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models • Paper • 2508.09834 • Published 8 days ago • 41
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining • Paper • 2508.10975 • Published 6 days ago • 48
TexVerse: A Universe of 3D Objects with High-Resolution Textures • Paper • 2508.10868 • Published 6 days ago • 13
FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation • Paper • 2508.11255 • Published 6 days ago • 9
StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation • Paper • 2508.11203 • Published 6 days ago • 8
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation • Paper • 2508.08248 • Published 9 days ago • 24