Dynamic Hierarchical Sparse Attention (DHSA)
This repository hosts the boundary predictor weights used in *Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs* (NeurIPS 2025 Workshop on Efficient Reasoning).
Overview
The boundary predictor is a lightweight transformer module trained to dynamically segment long text sequences into variable-length chunks based on semantic boundaries. It forms the first stage of Dynamic Hierarchical Sparse Attention (DHSA), enabling large language models to efficiently process long contexts by predicting where to attend sparsely.
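To make the first stage concrete, the sketch below turns per-token boundary probabilities into variable-length chunk spans. The `boundaries_to_chunks` helper, the 0.5 threshold, and the tensor shapes are illustrative assumptions, not part of the released model.

```python
import torch

def boundaries_to_chunks(boundary_probs: torch.Tensor, threshold: float = 0.5):
    """Convert per-token boundary probabilities into (start, end) chunk spans.

    Both this helper and the threshold are illustrative assumptions,
    not part of the released model.
    """
    seq_len = boundary_probs.shape[0]
    cuts = (boundary_probs > threshold).nonzero(as_tuple=True)[0].tolist()
    chunks, start = [], 0
    for cut in cuts:
        chunks.append((start, cut + 1))  # chunk ends at the predicted boundary
        start = cut + 1
    if start < seq_len:
        chunks.append((start, seq_len))  # trailing chunk after the last boundary
    return chunks

# Boundaries predicted after positions 2 and 5 -> three variable-length chunks
probs = torch.tensor([0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.2, 0.1])
print(boundaries_to_chunks(probs))  # [(0, 3), (3, 6), (6, 8)]
```

The resulting spans define the chunks over which the later DHSA stages attend sparsely.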
Model Architecture
The boundary predictor consists of three main parts:
- Shared Encoder – Uses attention and pooling layers to capture the left and right context around each token.
- Feature Fusion – Combines the two contextual features along with their difference, product, and similarity to represent local semantic changes.
- MLP Classifier – Takes the fused features and predicts whether a given position marks a boundary.
This lightweight design efficiently identifies semantic shifts in long text sequences.
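For illustration, here is a minimal PyTorch sketch of the feature-fusion and MLP-classifier stages. The `BoundaryPredictorHead` name, the hidden sizes, the GELU activation, and the use of cosine similarity are assumptions made for exposition and may differ from the released weights; the shared encoder that produces the left/right contextual features is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPredictorHead(nn.Module):
    """Fusion + classifier sketch; hidden sizes and layer choices are assumed."""

    def __init__(self, d_model: int = 256, d_hidden: int = 128):
        super().__init__()
        # Fused input: [left, right, left - right, left * right, cosine similarity]
        self.mlp = nn.Sequential(
            nn.Linear(4 * d_model + 1, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, 1),  # one boundary logit per position
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left, right: (batch, seq_len, d_model) contextual features
        # produced by the shared encoder (omitted here).
        sim = F.cosine_similarity(left, right, dim=-1).unsqueeze(-1)
        fused = torch.cat([left, right, left - right, left * right, sim], dim=-1)
        return self.mlp(fused).squeeze(-1)  # (batch, seq_len) boundary logits

head = BoundaryPredictorHead()
logits = head(torch.randn(2, 16, 256), torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 16])
```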
Training Data
The predictor was trained on a diverse mix of long-context datasets drawn from multiple reasoning and QA sources.
Data were automatically annotated using an internal semantic-boundary labeling process based on similarity shifts between consecutive key embeddings.
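As a rough sketch of what such a labeling rule could look like, the snippet below flags a boundary wherever the similarity between consecutive key embeddings drops sharply. The `label_boundaries` helper, the cosine-similarity criterion, and the threshold are assumptions; the actual internal labeling process is not released.

```python
import torch
import torch.nn.functional as F

def label_boundaries(key_embeds: torch.Tensor, threshold: float = 0.3) -> torch.Tensor:
    """Mark position i as a boundary when the cosine similarity between
    consecutive key embeddings drops below `threshold`.

    key_embeds: (seq_len, d) key embeddings for one sequence. The cosine
    criterion and the threshold are assumptions about the internal process.
    """
    sim = F.cosine_similarity(key_embeds[:-1], key_embeds[1:], dim=-1)  # (seq_len - 1,)
    labels = torch.zeros(key_embeds.shape[0], dtype=torch.long)
    labels[1:] = (sim < threshold).long()  # 1 where similarity collapses
    return labels
```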
Citation
If you use this model, please cite:
@misc{xiong2025longcontextmodelingdynamichierarchical,
  title={Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs},
  author={Siheng Xiong and Joe Zou and Faramarz Fekri and Yae Jee Cho},
  year={2025},
  eprint={2510.24606},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}