Submitted by akhaliq 44 Music ControlNet: Multiple Time-varying Controls for Music Generation · 4 authors 4
Submitted by akhaliq 29 Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text · 4 authors
Submitted by akhaliq 28 Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models · 14 authors 2
Submitted by akhaliq 16 To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning · 6 authors
Submitted by akhaliq 15 SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models · 16 authors
Submitted by akhaliq 15 MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks · 11 authors
Submitted by akhaliq 14 GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation · 12 authors 1
Submitted by akhaliq 14 The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 · 2 authors
Submitted by akhaliq 12 LayoutPrompter: Awaken the Design Ability of Large Language Models · 6 authors
Submitted by akhaliq 9 Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer · 6 authors
Submitted by akhaliq 8 Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data · 9 authors
Submitted by akhaliq 5 Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" · 31 authors