Commit 2c83b85 (verified) · committed by pyf98 · 1 parent: 001b0ba

Update README.md

Files changed (1)
  1. README.md +9 -4
README.md CHANGED
@@ -14,21 +14,26 @@ license: cc-by-4.0

  ## Open Whisper-style Speech Model (OWSM)

- OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits including [ESPnet](https://github.com/espnet/espnet).
+ OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).

- Inference examples can be found in our [project page](https://www.wavlab.org/activities/2024/owsm/).
+ Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
  The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).

  [OWSM v4]() is the latest version in the OWSM series, which significantly outperforms OWSM v3.1 in LID and multilingual ASR.
+ Additionally, OWSM v4 applies 8 times subsampling (instead of 4 times in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
+ When running inference, we recommend setting `maxlenratio=1.0` (default) instead of smaller values.

- This repo contains a medium-sized model with 1.02B parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU). It is trained on 320k hours of public speech data. The newly curated data will be publicly released. Please stay tuned!
+ This repo contains a base-sized model with 102M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
+ It is trained on 320k hours of public speech data.
+ The newly curated data will be publicly released. Please stay tuned!

  It supports the following speech-to-text tasks:
  - Language identification
  - Speech recognition
  - Speech translation
  - Utterance-level timestamp prediction
- - Long-form transcription
+ - Long-form recognition or translation
+


  ### OWSM series