-
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 10 -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 31 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36 -
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper • 2405.14129 • Published • 13
Collections
Discover the best community collections!
Collections including paper arxiv:2312.00589
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 29 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28 -
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Paper • 2401.00604 • Published • 6 -
LARP: Language-Agent Role Play for Open-World Games
Paper • 2312.17653 • Published • 32
-
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 123 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 118 -
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 42 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 53
-
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 50 -
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Paper • 2310.08166 • Published • 1 -
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Paper • 2310.00653 • Published • 3
-
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 16 -
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper • 2310.12823 • Published • 35 -
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Paper • 2308.10848 • Published • 1 -
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 10