## Open Whisper-style Speech Model (OWSM)

OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).

Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).

[OWSM v4]() is the latest version in the OWSM series, which significantly outperforms OWSM v3.1 in language identification (LID) and multilingual ASR.
Additionally, OWSM v4 applies 8x subsampling (instead of 4x in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
When running inference, we recommend setting `maxlenratio=1.0` (the default) instead of smaller values.
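The arithmetic behind the 80 ms figure, and why a small `maxlenratio` can hurt, can be sketched as follows. This assumes the usual 10 ms log Mel frame shift and the common ESPnet behavior of capping beam-search output length at roughly `maxlenratio` times the encoder output length — both assumptions here, not guarantees about this model.

```python
# Encoder time resolution, assuming a 10 ms log Mel frame shift
# (an assumption; check the model config for the actual value).
frame_shift_ms = 10          # hop between consecutive log Mel frames
subsampling = 8              # OWSM v4 (OWSM v3.1 used 4)
resolution_ms = frame_shift_ms * subsampling
print(resolution_ms)         # 80 ms per encoder frame

# Why maxlenratio matters: beam search typically caps the hypothesis
# length near maxlenratio * (number of encoder frames). With 8x
# subsampling there are half as many encoder frames as in v3.1, so a
# small maxlenratio is more likely to truncate long transcripts.
audio_seconds = 30
encoder_frames = audio_seconds * 1000 // resolution_ms
max_tokens = int(1.0 * encoder_frames)   # with maxlenratio = 1.0
print(encoder_frames, max_tokens)        # 375 375
```

With only 375 encoder frames for 30 s of audio, halving `maxlenratio` would leave little headroom for dense speech, which is why the default is recommended.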

This repo contains a base-sized model with 102M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
It is trained on 320k hours of public speech data.
The newly curated data will be publicly released. Please stay tuned!

It supports the following speech-to-text tasks:
- Language identification
- Speech recognition
- Speech translation
- Utterance-level timestamp prediction
- Long-form recognition or translation
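As a rough sketch, inference with ESPnet's `Speech2Text` interface typically looks like the code below. The repo ID and the exact condition tokens are placeholders (assumptions, not confirmed for this model); see the project page linked above for complete, verified examples. Running this requires `espnet` and `soundfile` installed plus a model download, so it is illustrative rather than copy-paste ready.

```python
# Hedged sketch of ESPnet S2T inference; model ID and tokens below
# are placeholders — consult the official examples for this model.
import soundfile

from espnet2.bin.s2t_inference import Speech2Text

model = Speech2Text.from_pretrained(
    "espnet/owsm-v4-base",   # placeholder: use the actual repo ID
    lang_sym="<eng>",        # language condition token
    task_sym="<asr>",        # task token; see the model card for supported tokens
    beam_size=5,
    maxlenratio=1.0,         # keep the default, as recommended above
)

speech, rate = soundfile.read("speech.wav")  # 16 kHz mono audio
text, *_ = model(speech)[0]
print(text)
```

Switching the task is a matter of changing the condition tokens rather than loading a different model, which is how one interface covers the task list above.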

### OWSM series