gallilmaimon committed on
Commit f978b00 · verified · 1 Parent(s): 071ffcb

Update README.md

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -32,13 +32,13 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 
 ### Model Sources
 
-- **Repository:** [https://github.com/slp-rl/slam](https://github.com/slp-rl/slam)
+- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [Soon!]
 - **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
-This is a base SpeechLM and as such can be used to generate contiuations for speech segments, or as base for further tuning. See the _slam_
-[codebase](https://github.com/slp-rl/slam) for more details on usage, and checkout the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples
+This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
+[codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
 
 ### Out-of-Scope Use
 This model was trained on curated speech datasets, which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.
@@ -46,7 +46,7 @@ This model was trained on curated speech datasets, which contain mainly audiobooks and stories
 
 
 ## How to Get Started with the Model
-We refer users to the official repository for full usage explainations - [github](https://github.com/slp-rl/slam).
+We refer users to the official repository for full usage explanations: [github](https://github.com/slp-rl/slamkit).
 
 
 ## Training Details
@@ -61,12 +61,12 @@ dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 
 ### Training Procedure
 This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
-Please refer to the [paper]() or [code](https://github.com/slp-rl/slam) for the full training recipes.
+Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
 
 #### Preprocessing
 Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the
 official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
-We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slam).
+We encourage you to explore the official repository for full details: [github](https://github.com/slp-rl/slamkit).
 
 
 ## Evaluation
@@ -104,7 +104,7 @@ This model was trained as part of ["*Slamming*: Training a Speech Language Model
 This model was trained using **only 2 Nvidia A100 GPUs** for **48 hours**.
 
 #### Software
-The model was trained using the [*Slam*](https://github.com/slp-rl/slam) codebase which builds upon 🤗transformers extending it to support
+The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds on 🤗transformers, extending it to support
 easy and efficient training of Speech Language Models.
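
The Preprocessing section above notes that speech units are de-duplicated before training. As a minimal sketch of what consecutive-unit de-duplication means (this is an illustration only; `dedup_units` is a hypothetical helper, not part of the SlamKit API):

```python
from itertools import groupby

def dedup_units(units):
    """Collapse consecutive repeats of the same speech unit into one token.

    HuBERT-style discrete unit sequences often repeat the same cluster id
    across adjacent frames; de-duplication keeps a single copy per run,
    shortening the sequence the language model is trained on.
    """
    return [unit for unit, _ in groupby(units)]

print(dedup_units([5, 5, 5, 12, 12, 7, 5]))  # → [5, 12, 7, 5]
```

Note that de-duplication only merges adjacent repeats; a unit that recurs later in the sequence (like the trailing `5` above) is preserved.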