SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation • Paper • 2502.13128 • Published 5 days ago • 34
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model • Paper • 2502.10248 • Published 9 days ago • 49
Magic 1-For-1: Generating One Minute Video Clips within One Minute • Paper • 2502.07701 • Published 12 days ago • 32
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text • Paper • 2501.15654 • Published 28 days ago • 12
RL + Transformer = A General-Purpose Problem Solver • Paper • 2501.14176 • Published about 1 month ago • 24
RepVideo: Rethinking Cross-Layer Representation for Video Generation • Paper • 2501.08994 • Published Jan 15 • 15
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction • Paper • 2501.06282 • Published Jan 10 • 45
Multi-task retriever fine-tuning for domain-specific and efficient RAG • Paper • 2501.04652 • Published Jan 8 • 10
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving • Paper • 2412.20735 • Published Dec 30, 2024 • 11
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs • Paper • 2412.21187 • Published Dec 30, 2024 • 39
Slow Perception: Let's Perceive Geometric Figures Step-by-step • Paper • 2412.20631 • Published Dec 30, 2024 • 15
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization • Paper • 2412.21037 • Published Dec 30, 2024 • 24
IDOL: Instant Photorealistic 3D Human Creation from a Single Image • Paper • 2412.14963 • Published Dec 19, 2024 • 6
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution • Paper • 2412.15213 • Published Dec 19, 2024 • 26
Autoregressive Video Generation without Vector Quantization • Paper • 2412.14169 • Published Dec 18, 2024 • 14
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • Published Dec 17, 2024 • 13