This repository contains only the AttnGate weights for the Qwen2.5-14B-Instruct model.
SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework, where they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its integrity while avoiding costly retraining. During inference, these gates generate block-sparse binary masks by applying threshold/TopK to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
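To make the two steps above concrete, here is a minimal PyTorch sketch of (a) the 2D max-pooled attention target used as the self-distillation label and (b) turning gate scores into a block-sparse binary mask with a TopK rule. The code is illustrative only: the block size, function names, and tensor shapes are assumptions and do not reflect the repository's actual implementation.

```python
# Minimal sketch (hypothetical code, not the repository's API; all names are illustrative).
import torch
import torch.nn.functional as F

BLOCK = 64  # assumed block size for block-level sparsity

def pooled_attention_target(attn: torch.Tensor) -> torch.Tensor:
    """Self-distillation target: 2D max-pool the full attention map into blocks.

    attn: [heads, q_len, k_len] softmax attention from the frozen model.
    Returns block-level scores of shape [heads, q_len // BLOCK, k_len // BLOCK].
    """
    return F.max_pool2d(attn, kernel_size=BLOCK, stride=BLOCK)

def topk_block_mask(gate_scores: torch.Tensor, density: float) -> torch.Tensor:
    """Inference-time gating: keep the top `density` fraction of key blocks
    per query block, producing a binary mask for a block-sparse kernel."""
    k = max(1, int(density * gate_scores.shape[-1]))
    idx = gate_scores.topk(k, dim=-1).indices
    mask = torch.zeros_like(gate_scores, dtype=torch.bool)
    mask.scatter_(-1, idx, True)
    return mask

# Example: 8 heads, 1024 query/key tokens -> 16 x 16 blocks, kept at 50% density.
attn = torch.softmax(torch.randn(8, 1024, 1024), dim=-1)
target = pooled_attention_target(attn)       # what the AttnGates learn to mimic
mask = topk_block_mask(target, density=0.5)  # blocks the sparse kernel would compute
```

In the actual pipeline the binary mask drives the custom block-sparse FlashAttention kernel, so attention is only computed for key/value blocks whose mask entry is set.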
## Original GitHub Repo
https://github.com/microsoft/SeerAttention
## Evaluation Results
### PG19 PPL

| Density | 8192 tokens (PPL) | 16384 tokens (PPL) | 32768 tokens (PPL) |
|---|---|---|---|
| 0.10 | 8.62 | 8.23 | 8.17 |
| 0.20 | 8.32 | 8.08 | 8.06 |
| 0.30 | 8.23 | 8.02 | 8.03 |
| 0.40 | 8.19 | 8.00 | 8.01 |
| 0.50 | 8.17 | 7.99 | 8.00 |
| 1.00 | 8.16 | 7.99 | 8.00 |
### LongBench

| Dataset | 0-4k (Dense / Sparse) | 4-8k (Dense / Sparse) | 8k+ (Dense / Sparse) |
|---|---|---|---|
| qasper | 47.23/48.05 | 37.51/37.20 | 35.26/36.49 |
| multifieldqa_en | 56.40/56.10 | 47.13/47.36 | 48.64/50.36 |
| lcc | 62.32/63.25 | 67.48/66.58 | 61.47/63.53 |
| gov_report | 34.26/34.30 | 34.06/33.70 | 33.02/32.52 |
| 2wikimqa | 51.29/52.13 | 48.03/47.78 | 31.68/30.90 |
| multi_news | 26.46/26.21 | 23.71/23.55 | 22.42/22.58 |
| samsum | 42.97/42.95 | 41.08/40.23 | 44.88/44.62 |
| passage_count | 20.00/19.00 | 07.00/06.00 | 08.00/08.00 |
| repobench-p | 64.17/63.63 | 64.87/64.61 | 57.85/58.60 |
| trec | 60.00/60.00 | 75.00/74.00 | 71.00/71.00 |
| hotpotqa | 58.57/57.16 | 56.87/55.91 | 56.18/56.99 |
| triviaqa | 87.63/87.35 | 88.38/90.00 | 88.49/90.15 |
| passage_retrieval_en | 99.00/99.00 | 100.0/100.0 | 100.0/100.0 |
| Average score | 54.64/54.55 | 53.16/52.84 | 50.68/51.21 |
| Average density | 0.841 | 0.624 | 0.379 |