-
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Paper • 2407.13301 • Published • 56 -
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
Paper • 2412.20070 • Published • 46 -
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Paper • 2409.15277 • Published • 36 -
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Paper • 2410.06456 • Published • 36
Collections
Discover the best community collections!
Collections including paper arxiv:2411.14522
-
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Paper • 2412.04626 • Published • 14 -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 34 -
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Paper • 2411.03823 • Published • 45 -
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Paper • 2410.18558 • Published • 19
-
A Survey of Medical Vision-and-Language Applications and Their Techniques
Paper • 2411.12195 • Published -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 34 -
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
Paper • 2412.07769 • Published • 26
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 53 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 30 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 74 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 43
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 54 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 40 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 13 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31