Gausson/pythia-160m-deduped-n128-SepLLM


	---
	license: mit
	---


	Please refer to the [SepLLM paper - ICML 2025](https://arxiv.org/abs/2412.12094) and our [`GitHub repository`](https://github.com/HKUDS/SepLLM) for instructions on using this model.

	To use this checkpoint, you must install the `transformers-4.38.0.post1+sepllm-py3-none-any.whl` released in our [`GitHub repository`](https://github.com/HKUDS/SepLLM). Below are the reference script used for testing and a sample of the test results. We ran the tests with `lm_eval==0.4.0`.

	```
	CUDA_LAUNCH_BLOCKING=1 lm_eval --model hf \
	--model_args pretrained=Gausson/pythia-160m-deduped-n128-SepLLM \
	--tasks arc_challenge,arc_easy,lambada_openai,logiqa,piqa,sciq,winogrande,wsc,wikitext \
	--num_fewshot 5 \
	--device cuda:0 \
	--batch_size 32
	```
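
Before running the command above, it can help to confirm that the patched packages are the ones actually installed. The following is a minimal sketch, not part of the SepLLM repository: `check_env` and `EXPECTED` are hypothetical names, and the expected version strings are assumptions inferred from the wheel filename and the `lm_eval==0.4.0` pin mentioned above.

```python
from importlib import metadata

# Assumed version strings: the first is read off the wheel filename,
# the second is the lm_eval version used for the reported results.
EXPECTED = {
    "transformers": "4.38.0.post1+sepllm",
    "lm_eval": "0.4.0",
}


def check_env(get_version=metadata.version, expected=EXPECTED):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_env()
    print("\n".join(issues) if issues else "environment matches")
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment; by default it queries the installed package metadata.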


	```
	hf (pretrained=Gausson/pythia-160m-deduped-n128-SepLLM), gen_kwargs: (), limit: None, num_fewshot: 5, batch_size: 32
	|    Tasks     |Version|Filter|n-shot|    Metric     | Value |   |Stderr|
	|--------------|-------|------|-----:|---------------|------:|---|-----:|
	|arc_challenge |Yaml   |none  |     5|acc            | 0.2014|±  |0.0117|
	|              |       |none  |     5|acc_norm       | 0.2346|±  |0.0124|
	|arc_easy      |Yaml   |none  |     5|acc            | 0.4731|±  |0.0102|
	|              |       |none  |     5|acc_norm       | 0.4520|±  |0.0102|
	|lambada_openai|Yaml   |none  |     5|perplexity     |30.1605|±  |1.0128|
	|              |       |none  |     5|acc            | 0.3315|±  |0.0066|
	|logiqa        |Yaml   |none  |     5|acc            | 0.2273|±  |0.0164|
	|              |       |none  |     5|acc_norm       | 0.2857|±  |0.0177|
	|piqa          |Yaml   |none  |     5|acc            | 0.6464|±  |0.0112|
	|              |       |none  |     5|acc_norm       | 0.6447|±  |0.0112|
	|sciq          |Yaml   |none  |     5|acc            | 0.8260|±  |0.0120|
	|              |       |none  |     5|acc_norm       | 0.8150|±  |0.0123|
	|wikitext      |Yaml   |none  |     5|word_perplexity|30.3488|   |      |
	|              |       |none  |     5|byte_perplexity| 1.8931|   |      |
	|              |       |none  |     5|bits_per_byte  | 0.9207|   |      |
	|winogrande    |Yaml   |none  |     5|acc            | 0.5178|±  |0.0140|
	|wsc           |Yaml   |none  |     5|acc            | 0.3750|±  |0.0477|
	```
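
Two quick sanity checks on the table above can be sketched in Python. The numbers are copied from the results; the helper names `bits_per_byte` and `confidence_interval` are hypothetical, and the 95% interval relies on a normal approximation, which is an assumption of this sketch rather than something `lm_eval` reports.

```python
import math


def bits_per_byte(byte_perplexity):
    """bits_per_byte is the base-2 logarithm of byte_perplexity."""
    return math.log2(byte_perplexity)


def confidence_interval(value, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation (assumption)."""
    return value - z * stderr, value + z * stderr


# wikitext row: byte_perplexity 1.8931 vs. reported bits_per_byte 0.9207
bpb = bits_per_byte(1.8931)
print(f"bits_per_byte from byte_perplexity: {bpb:.4f}")

# wsc row: acc 0.3750 with stderr 0.0477
low, high = confidence_interval(0.3750, 0.0477)
print(f"wsc acc 95% interval: [{low:.4f}, {high:.4f}]")
```

With a standard error of 0.0477, the wsc interval is much wider than those of the other accuracy rows, so that score is best read loosely.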

	If you find our work helpful, please consider giving us a star ⭐ on our [`GitHub repository`](https://github.com/HKUDS/SepLLM) and citing our paper. We greatly appreciate your support 😄
	```
	@inproceedings{chen2025sepllm,
	title={{SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator}},
	author={Chen, Guoxuan and Shi, Han and Li, Jiawei and Gao, Yihang and Ren, Xiaozhe and Chen, Yimeng and Jiang, Xin and Li, Zhenguo and Liu, Weiyang and Huang, Chao},
	booktitle={International Conference on Machine Learning},
	year={2025},
	note={Also available at arXiv:2412.12094}
	}
	```