Add library name and pipeline tag #1
by nielsr (HF Staff), opened

README.md CHANGED
@@ -3,7 +3,10 @@ datasets:
 - HuggingFaceTB/smollm-corpus
 language:
 - en
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # Outlier-Safe Pre-Training
 
 [](https://arxiv.org/abs/2506.19697)
@@ -25,8 +28,6 @@ A method that prevents outliers but significantly reduces efficiency is unlikely
 3. 🧩**Ensuring full compatibility with existing inference pipelines**<br/>
    We prioritize compatibility with widely adopted inference frameworks such as vLLM and SGLang. Rather than introducing architectural changes that break compatibility, OSP preserves computational invariance, allowing models to be directly integrated into existing pipelines without additional effort.
 
-
-
 ## Model Checkpoints
 
 ### Final Models
@@ -36,7 +37,6 @@ The models were trained on 1 trillion tokens, following the pre-training recipe
 - [🤗 OSP-1.4B-1T-Adam](https://huggingface.co/dmis-lab/OSP-1.4B-1T-Adam): Trained on the standard Adam optimizer, without any modifications.
 - [🤗 OSP-1.4B-1T-Muon-SSNorm-EmbProj](https://huggingface.co/dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj): Trained on the OSP framework. This is our final model.
 
-
 ### Ablation Models
 
 <table>
@@ -177,7 +177,6 @@ The models were trained on 1 trillion tokens, following the pre-training recipe
 </table>
 †Model configuration that disables decoupled embedding optimization by training with Muon optimizer without Adam optimization on embedding layers
 
-
 ## Training
 
 ### Model
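With `library_name: transformers` and `pipeline_tag: text-generation` added to the card metadata, the checkpoints listed in the diff can be routed to the standard `transformers` text-generation path. A minimal sketch of what this enables, assuming the `dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj` repository ships a standard causal-LM config and tokenizer (the prompt string is illustrative only):

```python
# Minimal sketch, not part of the model card: load an OSP checkpoint through the
# generic transformers text-generation pipeline that the new metadata points to.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj",  # repo id from the checkpoint list in the diff
    torch_dtype="auto",
    device_map="auto",
)

print(generator("Outlier-safe pre-training is", max_new_tokens=32)[0]["generated_text"])
```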
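The compatibility claim in point 3 of the diff (vLLM and SGLang support via computational invariance) amounts to saying the checkpoint should load like any other decoder-only model. A hedged vLLM sketch under that assumption, not verified against these exact repositories:

```python
# Sketch only: assumes the OSP checkpoint loads as a standard decoder-only model in vLLM,
# as the card's compatibility claim suggests.
from vllm import LLM, SamplingParams

llm = LLM(model="dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Outlier-safe pre-training is"], params)
print(outputs[0].outputs[0].text)
```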