---
license: mit
language:
- en
- zh
---

# Model Card for sparsing-law-0.1b-relu

- **Paper:** [paper](https://arxiv.org/pdf/2411.02335)
- **Repository and demo code:** [GitHub](https://github.com/thunlp/SparsingLaw)

This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
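
The model can be loaded like any causal language model on the Hugging Face Hub. The snippet below is a minimal sketch: the repository id is a placeholder (substitute the actual Hub id), and `trust_remote_code=True` is included on the assumption that the architecture ships custom modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual Hub id of this model.
model_id = "path/to/sparsing-law-0.1b-relu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code may be required if the checkpoint uses custom modeling code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Simple greedy generation as a smoke test.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```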

The model was trained from scratch on the pre-training dataset described in our paper, using the WSD (Warmup-Stable-Decay) learning rate scheduler. It is the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.
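
For readers unfamiliar with WSD, the sketch below shows the common shape of such a schedule: a linear warmup, a long constant (stable) plateau, and a final anneal. The phase lengths, peak learning rate, and decay form are illustrative placeholders, not the values used in the paper; this checkpoint corresponds to the end of the stable plateau, before any decay is applied.

```python
def wsd_lr(step: int, warmup_steps: int, stable_steps: int,
           decay_steps: int, peak_lr: float, final_lr: float) -> float:
    """Illustrative WSD (Warmup-Stable-Decay) schedule; all values are placeholders."""
    if step < warmup_steps:
        # Warmup: linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(warmup_steps, 1)
    if step < warmup_steps + stable_steps:
        # Stable: constant at peak_lr (this model's training stops here).
        return peak_lr
    # Decay: linear anneal from peak_lr down to final_lr.
    progress = min((step - warmup_steps - stable_steps) / max(decay_steps, 1), 1.0)
    return peak_lr + (final_lr - peak_lr) * progress
```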