Qiang Cai

cqaimx

AI & ML interests

None yet

Recent Activity

upvoted a paper 7 days ago

Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

upvoted a paper 5 months ago

Video-T1: Test-Time Scaling for Video Generation

upvoted a paper 5 months ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Organizations

None yet

upvoted a paper 7 days ago

Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

Paper • 2508.20470 • Published 11 days ago • 64

upvoted 4 papers 5 months ago

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

Paper • 2503.06053 • Published Mar 8 • 138

upvoted 3 papers 6 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 147

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 165

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 222

liked a Space 6 months ago

3.16k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

liked a model 6 months ago

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 368k • 1.48k

upvoted 10 papers 6 months ago

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 104

nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Paper • 2406.14347 • Published Jun 20, 2024 • 103

The Road Less Scheduled

Paper • 2405.15682 • Published May 24, 2024 • 28

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published May 24, 2024 • 30

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 30

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14, 2024 • 35

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30, 2024 • 37

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published May 7, 2024 • 39

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30, 2024 • 50

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published May 16, 2024 • 49
