Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 60
π^3: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17 • 64
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Paper • 2504.20690 • Published Apr 29 • 19
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 130
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 50
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 77
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 418
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 21
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16 • 37
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published Jan 16 • 72
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11 • 11