Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published 25 days ago • 52
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Paper • 2508.03694 • Published Aug 5 • 50
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Paper • 2505.13437 • Published May 19 • 5
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1 • 26
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1 • 45
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 52
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published Apr 17 • 21
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published Mar 19 • 22
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization Paper • 2501.01245 • Published Jan 2 • 5
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 61
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 38
OmniCreator: Self-Supervised Unified Generation with Universal Editing Paper • 2412.02114 • Published Dec 3, 2024 • 14