Collections
Collections including paper arxiv:2408.06072

- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
  Paper • 2408.06072 • Published • 39
- AtomoVideo: High Fidelity Image-to-Video Generation
  Paper • 2403.01800 • Published • 22
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
  Paper • 2411.04928 • Published • 50
- AnimateAnything: Consistent and Controllable Animation for Video Generation
  Paper • 2411.10836 • Published • 22

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 68
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87

- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
  Paper • 2405.20222 • Published • 11
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
  Paper • 2406.00908 • Published • 11
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
  Paper • 2406.02509 • Published • 9
- I4VGen: Image as Stepping Stone for Text-to-Video Generation
  Paper • 2406.02230 • Published • 17

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 28
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 37

- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
  Paper • 2402.14797 • Published • 20
- Subobject-level Image Tokenization
  Paper • 2402.14327 • Published • 17
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 128
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 19

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 17
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
  Paper • 2310.19512 • Published • 16
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
  Paper • 2311.12052 • Published • 32
- Fast View Synthesis of Casual Videos
  Paper • 2312.02135 • Published • 11
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Paper • 2312.04433 • Published • 10

- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
  Paper • 2306.10012 • Published • 35
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 42
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
  Paper • 2408.06072 • Published • 39
- haoningwu/StoryGen
  Updated • 4

- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
  Paper • 2309.03549 • Published • 6
- CCEdit: Creative and Controllable Video Editing via Diffusion Models
  Paper • 2309.16496 • Published • 9
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
  Paper • 2310.11440 • Published • 17
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
  Paper • 2310.10769 • Published • 9