Collections

Collections including paper arxiv:2504.19056

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published 24 days ago • 5

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published 23 days ago • 17

ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published Aug 6 • 1

Prompt Orchestration Markup Language
Paper • 2508.13948 • Published 18 days ago • 46

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Paper • 2505.02707 • Published May 5 • 86

MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing
Paper • 2505.02823 • Published May 5 • 5

PixelHacker: Image Inpainting with Structural and Semantic Consistency
Paper • 2504.20438 • Published Apr 29 • 44

Improving Editability in Image Generation with Layer-wise Memory
Paper • 2505.01079 • Published May 2 • 29

talking-head-generation

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Paper • 2312.13578 • Published Dec 21, 2023 • 29

Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Paper • 2312.13150 • Published Dec 20, 2023 • 16

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Paper • 2312.03029 • Published Dec 5, 2023 • 26

Relightable Gaussian Codec Avatars
Paper • 2312.03704 • Published Dec 6, 2023 • 33

May 2025 - Top Papers

The Leaderboard Illusion
Paper • 2504.20879 • Published Apr 29 • 70

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published May 14 • 69

LLMs for Engineering: Teaching Models to Design High Powered Rockets
Paper • 2504.19394 • Published Apr 27 • 14

Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Paper • 2504.19056 • Published Apr 27 • 18

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Paper • 2409.03718 • Published Sep 5, 2024 • 28

Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
Paper • 2411.08017 • Published Nov 12, 2024 • 11

Unleashing Vecset Diffusion Model for Fast Shape Generation
Paper • 2503.16302 • Published Mar 20 • 44

OmniSVG: A Unified Scalable Vector Graphics Generation Model
Paper • 2504.06263 • Published Apr 8 • 181
