Update README.md
### Model Sources

- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
- **Paper:** [Soon!]
- **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
## Uses

This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
[codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
### Out-of-Scope Use

This model was trained on curated speech datasets which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.
## How to Get Started with the Model

We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
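As a rough, unofficial sketch of what generating a continuation could look like (the _SlamKit_ codebase is the supported path), the snippet below assumes the checkpoint loads as a causal LM through 🤗transformers and that the prompt is already a sequence of de-duplicated speech units; `slprl/slam` is a placeholder for this card's actual model ID.

```python
# Minimal sketch, NOT the official API -- the supported path is the slamkit codebase.
# Assumptions: the checkpoint loads as a causal LM via 🤗 transformers, and the
# prompt is already a sequence of de-duplicated HuBERT unit IDs (see Preprocessing).
# "slprl/slam" is a placeholder -- substitute the ID of this model card.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("slprl/slam")  # placeholder model ID
model.eval()

prompt_units = torch.tensor([[12, 57, 3, 498, 221]])  # toy unit IDs, batch size 1

with torch.no_grad():
    continuation = model.generate(
        prompt_units,
        max_new_tokens=128,   # length of the generated continuation, in units
        do_sample=True,
        top_p=0.95,
        temperature=0.8,
    )

print(continuation[0].tolist())  # unit IDs; a unit vocoder is needed to get audio back
```

Resynthesising the generated units into a waveform requires a unit vocoder; see the official repository for the supported end-to-end flow.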
## Training Details

### Training Procedure

This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
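For intuition only (not the authors' recipe), a DPO stage of this general shape could be sketched with 🤗TRL as below. Whether TRL was actually used, the SpokenSwag column names and split, and the hyper-parameters are all assumptions, and `slprl/slam` is again a placeholder model ID.

```python
# Hedged sketch of the DPO stage, not the authors' recipe. Assumptions: the model
# loads as a causal LM with a matching tokenizer, SpokenSwag exposes
# prompt/chosen/rejected preference columns in a "train" split, and 🤗 TRL is an
# acceptable stand-in for the actual training framework.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("slprl/slam")   # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained("slprl/slam")      # placeholder model ID
train_set = load_dataset("slprl/SpokenSwag", split="train")  # assumed split and schema

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="slam-dpo", beta=0.1, per_device_train_batch_size=4),
    train_dataset=train_set,
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
)
trainer.train()
```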
#### Preprocessing

Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the
official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slamkit).
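For intuition only, a hedged sketch of that pipeline is shown below: extract frame-level HuBERT features, assign each frame to its nearest k-means centroid, and collapse consecutive repeats. Loading the HuBERT checkpoint through 🤗transformers, the 16 kHz resampling, the choice of feature layer, and having the centroids as a plain tensor are all assumptions here; the authoritative implementation is in textlesslib and the official repository.

```python
# Illustrative sketch only -- the official pipeline uses textlesslib with the
# released k-means model. Assumptions: the HuBERT checkpoint loads via
# 🤗 transformers' AutoModel, audio is resampled to 16 kHz, and `centroids` is a
# (K, D) float tensor holding the released k-means centers.
import itertools
import torch
import torchaudio
from transformers import AutoModel


def speech_to_units(wav_path: str, centroids: torch.Tensor) -> list[int]:
    wav, sr = torchaudio.load(wav_path)                         # (channels, samples)
    wav = torchaudio.functional.resample(wav, sr, 16_000)[:1]   # mono, 16 kHz
    hubert = AutoModel.from_pretrained("slprl/mhubert-base-25hz")  # assumed to load this way
    with torch.no_grad():
        # The exact transformer layer used for k-means should be taken from the
        # official code; last_hidden_state is used here purely for illustration.
        feats = hubert(wav).last_hidden_state.squeeze(0)        # (frames, dim)
    units = torch.cdist(feats, centroids).argmin(dim=-1)        # nearest centroid per frame
    # De-duplicate: collapse runs of identical units, e.g. [5, 5, 9, 9, 9, 2] -> [5, 9, 2]
    return [u for u, _ in itertools.groupby(units.tolist())]
```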
## Evaluation

This model was trained using **only 2 Nvidia A100 GPUs** for **48 hours**.
#### Software

The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds upon 🤗transformers, extending it to support
easy and efficient training of Speech Language Models.
## Citation