## PDS-160M

[paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection)

**PDS-160M** is a 160M-parameter language model with the [Mistral](https://arxiv.org/abs/2310.06825) architecture, pre-trained from scratch on data selected from the CC split of [RedPajama](https://github.com/togethercomputer/RedPajama-Data) using the PDS framework.
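
The card does not include a usage snippet, so here is a minimal sketch with the `transformers` library. The repository id `Data-Selection/PDS-160M` is an assumption, inferred by analogy with the baseline repository linked below; substitute the actual id if it differs.

```python
# Minimal usage sketch, not an official example. The repository id below is
# assumed by analogy with the baseline (Data-Selection/BSL-160M).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Data-Selection/PDS-160M"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding of a short continuation.
inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```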

The PDS framework is based on [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle) for optimal pre-training data selection, which not only enjoys strong theoretical support but also scales to the training of large language models.
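
For readers unfamiliar with the principle, the generic continuous-time statement is reproduced below. This is textbook background, not the paper's exact formulation; the paper works with a discrete-time analogue in which the model parameters play the role of the state and the per-example data weights the role of the control.

```latex
% Textbook statement of Pontryagin's maximum principle (cost-minimization
% convention); the paper instantiates a discrete-time analogue of these
% conditions for pre-training data selection.
\begin{aligned}
\dot{x}(t) &= f\big(x(t), u(t)\big), \qquad
J(u) = \int_0^T L\big(x(t), u(t)\big)\,\mathrm{d}t \;\to\; \min_u,\\
H(x, u, \lambda) &= \lambda^{\top} f(x, u) - L(x, u),\\
\dot{\lambda}(t) &= -\left.\frac{\partial H}{\partial x}\right|_{x^*(t),\,u^*(t)},
\qquad \lambda(T) = 0,\\
u^*(t) &= \operatorname*{arg\,max}_{u}\, H\big(x^*(t), u, \lambda(t)\big).
\end{aligned}
```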

Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.

### Overview of the theory

<p align='left'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/624ac662102fcdff87be51b9/Hdw83Vsb305GRlsqB7c34.png" width="700">
</p>

### Overview of the PDS framework

<p align='left'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/624ac662102fcdff87be51b9/YPwluLyZGK7DACH1WqDUN.png" width="700">
</p>
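
To make the pipeline concrete, the sketch below shows a rough version of the final selection step, assuming a data scorer has already assigned a quality score to each CC document; the file name, function name, and keep ratio are all hypothetical, and the linked code repository contains the real pipeline.

```python
# Rough sketch of the final selection step, assuming per-document scores
# from a data scorer are already available; names and the keep ratio are
# hypothetical -- see the linked code repository for the real pipeline.
import numpy as np

def select_top_documents(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return the indices of the highest-scoring documents, best first."""
    n_keep = int(len(scores) * keep_ratio)
    # argsort ascending, take the last n_keep indices, reverse to best-first
    return np.argsort(scores)[-n_keep:][::-1]

scores = np.load("cc_document_scores.npy")  # hypothetical scorer output
selected = select_top_documents(scores, keep_ratio=0.5)
```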

### Evaluation

PDS-selected data improves the performance of language models pre-trained from scratch and saves pre-training computation. The improvement scales up to large model sizes.

<p align='left'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/624ac662102fcdff87be51b9/6undIr37d10qD73TDiPDK.png" width="600">
</p>

### Baseline

[Conventional Pre-training](https://huggingface.co/Data-Selection/BSL-160M)

### Citation

```bibtex
@article{gu2024data,
  title={Data Selection via Optimal Control for Language Models},
  author={Gu, Yuxian and Dong, Li and Wang, Hongning and Hao, Yaru and Dong, Qingxiu and Wei, Furu and Huang, Minlie},
  journal={arXiv preprint arXiv:2410.07064},
  year={2024}
}
```