Kosmos-G: Generating Images in Context with Multimodal Large Language Models Paper • 2310.02992 • Published Oct 4, 2023
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection Paper • 2205.09613 • Published May 19, 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper • 2208.06366 • Published Aug 12, 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024
Kosmos-2: Grounding Multimodal Large Language Models to the World Paper • 2306.14824 • Published Jun 26, 2023