Submitted by akhaliq 60 Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model · 6 authors 3
Submitted by akhaliq 21 SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding · 8 authors 1
Submitted by akhaliq 17 WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens · 6 authors 1
Submitted by akhaliq 15 DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference · 11 authors 2
Submitted by akhaliq 14 SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers · 6 authors 1
Submitted by akhaliq 14 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models · 7 authors 2
Submitted by akhaliq 11 TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion · 11 authors 1
Submitted by akhaliq 10 Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis · 5 authors 2
Submitted by akhaliq 8 ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization · 6 authors 1