QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization
Paper • 2506.22396 • Published