SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published 3 days ago • 99
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation • Paper • 2502.09838 • Published 10 days ago • 9
You Do Not Fully Utilize Transformer's Representation Capacity • Paper • 2502.09245 • Published 10 days ago • 30
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published 7 days ago • 133
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding • Paper • 2501.17578 • Published 25 days ago • 1
iFormer: Integrating ConvNet and Transformer for Mobile Application • Paper • 2501.15369 • Published 29 days ago • 12