MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers Paper • 2002.10957 • Published Feb 25, 2020
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Paper • 2012.15828 • Published Dec 31, 2020
s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning Paper • 2110.13640 • Published Oct 26, 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts Paper • 2111.02358 • Published Nov 3, 2021
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper • 2208.06366 • Published Aug 12, 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024