Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.13248

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 20
Chronos: Learning the Language of Time Series

Paper • 2403.07815 • Published Mar 12, 2024 • 47
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Paper • 2403.06504 • Published Mar 11, 2024 • 53
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Paper • 2211.07600 • Published Nov 14, 2022

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5, 2024 • 11
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3, 2024 • 15
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 45

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 19
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 32
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 27
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 20

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 87
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20, 2024 • 78
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Paper • 2409.20551 • Published Sep 30, 2024 • 15

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29, 2024 • 38
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23, 2024 • 85
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Paper • 2307.04725 • Published Jul 10, 2023 • 64
Boximator: Generating Rich and Controllable Motions for Video Synthesis

Paper • 2402.01566 • Published Feb 2, 2024 • 27

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Paper • 2401.04092 • Published Jan 8, 2024 • 21
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20, 2024 • 78

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs