ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22 • 65
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published Jul 7 • 16
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation Paper • 2509.00428 • Published 7 days ago • 11