- PlayerOne: Egocentric World Simulator
  Paper • 2506.09995 • Published • 34
- Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
  Paper • 2506.17201 • Published • 55
- Playing with Transformer at 30+ FPS via Next-Frame Diffusion
  Paper • 2506.01380 • Published • 2
Collections including paper arxiv:2506.09995
- 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
  Paper • 2503.10437 • Published • 33
- Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
  Paper • 2503.09642 • Published • 19
- VGGT: Visual Geometry Grounded Transformer
  Paper • 2503.11651 • Published • 29
- 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
  Paper • 2503.16422 • Published • 14

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 44
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64

- Video World Models with Long-term Spatial Memory
  Paper • 2506.05284 • Published • 53
- yejunliang23/ShapeLLM-7B-omni
  Image-to-3D • 8B • Updated • 3.86k • 12
- Image Editing As Programs with Diffusion Models
  Paper • 2506.04158 • Published • 24
- VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
  Paper • 2506.03930 • Published • 26

- Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
  Paper • 2502.08590 • Published • 44
- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 49
- Soundwave: Less is More for Speech-Text Alignment in LLMs
  Paper • 2502.12900 • Published • 86
- Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
  Paper • 2503.09419 • Published • 6

- MotionLLM: Understanding Human Behaviors from Human Motions and Videos
  Paper • 2405.20340 • Published • 21
- Spectrally Pruned Gaussian Fields with Neural Compensation
  Paper • 2405.00676 • Published • 10
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First
  Paper • 2404.18212 • Published • 30
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 122