nithinraok committed on
Commit ef8d793 · 1 Parent(s): d97f7ac

update info

Signed-off-by: nithinraok <[email protected]>

Files changed (1)
  1. README.md +7 -12
README.md CHANGED
@@ -5,14 +5,8 @@ language:
  pipeline_tag: automatic-speech-recognition
  library_name: nemo
  datasets:
- - librispeech_asr
- - fisher_corpus
- - mozilla-foundation/common_voice_8_0
- - National-Singapore-Corpus-Part-1
- - vctk
- - voxpopuli
- - europarl
- - multilingual_librispeech
+ - nvidia/Granary
+ - nvidia/nemo-asr-set-3.0
  thumbnail: null
  tags:
  - automatic-speech-recognition
@@ -147,7 +141,7 @@ metrics:
  - wer
  ---
 
- # **Parakeet TDT 0.6B V2 (En)**
+ # ** 🦜 Parakeet TDT 0.6B V2 (En)**
 
  <style>
  img {
@@ -366,9 +360,10 @@ Performance across different Signal-to-Noise Ratios (SNR) using MUSAN music and
  | **SNR Level** | **Avg WER** | **AMI** | **Earnings** | **GigaSpeech** | **LS test-clean** | **LS test-other** | **SPGI** | **Tedlium** | **VoxPopuli** | **Relative Change** |
  |:---------------|:-------------:|:----------:|:------------:|:----------------:|:-----------------:|:-----------------:|:-----------:|:-------------:|:---------------:|:-----------------:|
  | Clean | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | - |
- | SNR 50 | 6.04 | 11.11 | 11.12 | 9.74 | 1.70 | 3.18 | 2.18 | 3.34 | 5.98 | +0.25% |
- | SNR 25 | 6.50 | 12.76 | 11.50 | 9.98 | 1.78 | 3.63 | 2.54 | 3.46 | 6.34 | -7.04% |
- | SNR 5 | 8.39 | 19.33 | 13.83 | 11.28 | 2.36 | 5.50 | 3.91 | 3.91 | 6.96 | -38.11% |
+ | SNR 10 | 6.95 | 14.38 | 12.04 | 10.24 | 1.92 | 4.13 | 2.84 | 3.63 | 6.38 | -14.75% |
+ | SNR 5 | 8.23 | 18.07 | 13.82 | 11.18 | 2.33 | 5.58 | 3.81 | 4.24 | 6.81 | -35.97% |
+ | SNR 0 | 11.88 | 25.43 | 18.59 | 14.32 | 4.40 | 10.07 | 7.27 | 6.42 | 8.54 | -96.28% |
+ | SNR -5 | 20.26 | 36.57 | 28.06 | 22.27 | 11.82 | 19.91 | 16.14 | 13.07 | 14.23 | -234.66% |
 
  ### Telephony Audio Performance
  Performance comparison between standard 16kHz audio and telephony-style audio (using μ-law encoding with 16kHz→8kHz→16kHz conversion):