Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2312.00589

about 1 month ago

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Paper • 2403.12596 • Published Mar 19, 2024 • 10
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 31
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25, 2024 • 36
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

Paper • 2405.14129 • Published May 23, 2024 • 13

aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3, 2024 • 29
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Paper • 2401.00604 • Published Dec 31, 2023 • 6
LARP: Language-Agent Role Play for Open-World Games

Paper • 2312.17653 • Published Dec 24, 2023 • 32

Merlin:Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 26

Papers related to Multimodel LLM

Merlin:Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 26
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Paper • 2311.04257 • Published Nov 7, 2023 • 22

Merlin:Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 26

Awesome AI papers!

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123
Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 118
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 42
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 53

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning

Paper • 2310.08166 • Published Oct 12, 2023 • 1
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Paper • 2310.00653 • Published Oct 1, 2023 • 3

A Zero-Shot Language Agent for Computer Control with Structured Reflection

Paper • 2310.08740 • Published Oct 12, 2023 • 16
AgentTuning: Enabling Generalized Agent Abilities for LLMs

Paper • 2310.12823 • Published Oct 19, 2023 • 35
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Paper • 2308.10848 • Published Aug 21, 2023 • 1
CLEX: Continuous Length Extrapolation for Large Language Models

Paper • 2310.16450 • Published Oct 25, 2023 • 10

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs