FAST: Factorizable Attention for Speeding up Transformers Paper • 2402.07901 • Published Feb 12, 2024 • 3
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published 4 days ago • 18
Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs Paper • 2502.06766 • Published Feb 10