- GLaMM: Pixel Grounding Large Multimodal Model
  Paper • 2311.03356 • Published • 35
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
  Paper • 2311.07575 • Published • 15
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
  Paper • 2311.03354 • Published • 8
- Language-Informed Visual Concept Learning
  Paper • 2312.03587 • Published • 7
Collections including paper arxiv:2311.03354
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69
- GLaMM: Pixel Grounding Large Multimodal Model
  Paper • 2311.03356 • Published • 35
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
  Paper • 2311.03354 • Published • 8
- CogVLM: Visual Expert for Pretrained Language Models
  Paper • 2311.03079 • Published • 25
- UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework
  Paper • 2311.10125 • Published • 6
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 31
- GLaMM: Pixel Grounding Large Multimodal Model
  Paper • 2311.03356 • Published • 35
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
  Paper • 2311.03354 • Published • 8