- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
  Paper • 2406.11271 • Published • 21
- Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
  Paper • 2410.13674 • Published • 17
- Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
  Paper • 2410.11190 • Published • 22
Haonan Zhang
haonanzhang
AI & ML interests
AI & ML, Multi-modal Learning, Agent, LLM, etc.
Recent Activity
upvoted a paper 5 days ago
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
updated a model 17 days ago
Tongyi-ConvAI/MMEvol-Qwen2-7B
updated a model 18 days ago
Tongyi-ConvAI/MMEvol-LLaMA3-8B
Organizations
Collections: 1
Spaces: 1
Models: None public yet
Datasets: None public yet