nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition

hf-asr-leaderboard

nithinraok committed 11 days ago

Commit ef8d793 · 1 Parent(s): d97f7ac

update info

Signed-off-by: nithinraok <[email protected]>

Files changed (1)

README.md +7 -12

@@ -5,14 +5,8 @@ language:
 pipeline_tag: automatic-speech-recognition
 library_name: nemo
 datasets:
-- librispeech_asr
-- fisher_corpus
-- mozilla-foundation/common_voice_8_0
-- National-Singapore-Corpus-Part-1
-- vctk
-- voxpopuli
-- europarl
-- multilingual_librispeech
+- nvidia/Granary
+- nvidia/nemo-asr-set-3.0
 thumbnail: null
 tags:
 - automatic-speech-recognition
@@ -147,7 +141,7 @@ metrics:
 - wer
 ---
-# **Parakeet TDT 0.6B V2 (En)**
+# ** 🦜 Parakeet TDT 0.6B V2 (En)**
 <style>
 img {
@@ -366,9 +360,10 @@ Performance across different Signal-to-Noise Ratios (SNR) using MUSAN music and
 | **SNR Level** | **Avg WER** | **AMI** | **Earnings** | **GigaSpeech** | **LS test-clean** | **LS test-other** | **SPGI** | **Tedlium** | **VoxPopuli** | **Relative Change** |
 |:---------------|:-------------:|:----------:|:------------:|:----------------:|:-----------------:|:-----------------:|:-----------:|:-------------:|:---------------:|:-----------------:|
 | Clean | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | - |
-| SNR 50 | 6.04 | 11.11 | 11.12 | 9.74 | 1.70 | 3.18 | 2.18 | 3.34 | 5.98 | +0.25% |
-| SNR 25 | 6.50 | 12.76 | 11.50 | 9.98 | 1.78 | 3.63 | 2.54 | 3.46 | 6.34 | -7.04% |
-| SNR 5 | 8.39 | 19.33 | 13.83 | 11.28 | 2.36 | 5.50 | 3.91 | 3.91 | 6.96 | -38.11% |
+| SNR 10 | 6.95 | 14.38 | 12.04 | 10.24 | 1.92 | 4.13 | 2.84 | 3.63 | 6.38 | -14.75% |
+| SNR 5 | 8.23 | 18.07 | 13.82 | 11.18 | 2.33 | 5.58 | 3.81 | 4.24 | 6.81 | -35.97% |
+| SNR 0 | 11.88 | 25.43 | 18.59 | 14.32 | 4.40 | 10.07 | 7.27 | 6.42 | 8.54 | -96.28% |
+| SNR -5 | 20.26 | 36.57 | 28.06 | 22.27 | 11.82 | 19.91 | 16.14 | 13.07 | 14.23 | -234.66% |
 ### Telephony Audio Performance
 Performance comparison between standard 16kHz audio and telephony-style audio (using μ-law encoding with 16kHz→8kHz→16kHz conversion):
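
The telephony-style conversion named in the context line above (μ-law encoding with a 16kHz→8kHz→16kHz round trip) can be approximated in a few lines. This is a rough sketch, not the evaluation pipeline behind the model card. It assumes continuous μ-law companding with μ = 255 quantized to 8 bits (not a bit-exact G.711 codec), and it substitutes pair averaging and linear interpolation for a proper anti-aliased resampler. All function names are hypothetical.

```python
import math

MU = 255       # standard mu-law companding constant
LEVELS = 256   # 8-bit quantization (assumption: one byte per sample)

def mulaw_encode(x: float) -> int:
    """Compress a sample in [-1, 1] and quantize it to a code in 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * (LEVELS - 1)))

def mulaw_decode(code: int) -> float:
    """Map an 8-bit code back to [-1, 1] and undo the companding."""
    y = code / (LEVELS - 1) * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def downsample_by_2(samples):
    """16 kHz -> 8 kHz: average adjacent pairs (crude low-pass plus decimation)."""
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]

def upsample_by_2(samples):
    """8 kHz -> 16 kHz: linear interpolation between neighbouring samples."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.extend([s, (s + nxt) / 2.0])
    return out

def simulate_telephony(samples_16k):
    """Apply the 16 kHz -> 8 kHz -> mu-law -> 16 kHz round trip."""
    narrowband = downsample_by_2(samples_16k)
    decoded = [mulaw_decode(mulaw_encode(s)) for s in narrowband]
    return upsample_by_2(decoded)

# Usage: 10 ms of a 440 Hz tone sampled at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
degraded = simulate_telephony(tone)
```

For real evaluation data, a dedicated resampler and codec implementation would be the better choice; the naive resampling here introduces a half-sample delay and weaker alias rejection.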
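
The Relative Change column in the SNR table above appears to express how far the average WER at each noise level moves from the clean baseline, with negative values marking degradation. A minimal sketch, assuming the formula (clean - noisy) / clean × 100 applied to the rounded Avg WER values from the updated rows. The function name `relative_change` is illustrative, and the results differ from the published column by a fraction of a percentage point, presumably because the card was computed from unrounded averages.

```python
# Avg WER values copied from the updated table rows in this commit.
CLEAN_AVG_WER = 6.05
NOISY_AVG_WER = {"SNR 10": 6.95, "SNR 5": 8.23, "SNR 0": 11.88, "SNR -5": 20.26}

def relative_change(clean_wer: float, noisy_wer: float) -> float:
    """Percent change versus the clean baseline; negative means WER got worse.

    Assumed formula, inferred from the sign convention in the table.
    """
    return (clean_wer - noisy_wer) / clean_wer * 100.0

for level, wer in NOISY_AVG_WER.items():
    print(f"{level}: {relative_change(CLEAN_AVG_WER, wer):+.2f}%")
```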