Submitted by akhaliq 55 InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model · 23 authors 1
Submitted by akhaliq 49 Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling · 6 authors 7
Submitted by akhaliq 38 Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling · 12 authors 8
Submitted by akhaliq 24 SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning · 10 authors 1
Submitted by akhaliq 23 Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance · 9 authors 4
Submitted by akhaliq 19 Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception · 8 authors 4
Submitted by akhaliq 12 Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding · 3 authors 3
Submitted by akhaliq 11 Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation · 6 authors
Submitted by akhaliq 7 Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization · 4 authors 1