Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published 7 days ago • 133
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning • Paper • 2305.14160 • Published May 23, 2023 • 1
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs • Paper • 2307.15992 • Published Jul 29, 2023 • 1
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts • Paper • 2408.15664 • Published Aug 28, 2024 • 12