SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published 3 days ago • 101
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective • Paper • 2502.14296 • Published 4 days ago • 42
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections • Paper • 2502.12170 • Published 11 days ago • 10
You Do Not Fully Utilize Transformer's Representation Capacity • Paper • 2502.09245 • Published 11 days ago • 30
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding • Paper • 2502.10392 • Published 9 days ago • 6
DarwinLM: Evolutionary Structured Pruning of Large Language Models • Paper • 2502.07780 • Published 12 days ago • 17
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling • Paper • 2502.09509 • Published 10 days ago • 5
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models • Paper • 2502.10458 • Published 12 days ago • 27
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation • Paper • 2502.12148 • Published 6 days ago • 16
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training • Paper • 2502.11196 • Published 7 days ago • 20
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published 8 days ago • 133
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation • Paper • 2502.09838 • Published 10 days ago • 9
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation • Paper • 2502.13145 • Published 5 days ago • 34
Rethinking Diverse Human Preference Learning through Principal Component Analysis • Paper • 2502.13131 • Published 5 days ago • 34
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • Paper • 2502.10391 • Published 9 days ago • 29