nithinraok committed on
Commit · ef8d793
1 Parent(s): d97f7ac
update info
Signed-off-by: nithinraok <[email protected]>
README.md CHANGED
@@ -5,14 +5,8 @@ language:
  pipeline_tag: automatic-speech-recognition
  library_name: nemo
  datasets:
- -
- -
- - mozilla-foundation/common_voice_8_0
- - National-Singapore-Corpus-Part-1
- - vctk
- - voxpopuli
- - europarl
- - multilingual_librispeech
+ - nvidia/Granary
+ - nvidia/nemo-asr-set-3.0
  thumbnail: null
  tags:
  - automatic-speech-recognition
@@ -147,7 +141,7 @@ metrics:
  - wer
  ---

- # **Parakeet TDT 0.6B V2 (En)**
+ # ** 🦜 Parakeet TDT 0.6B V2 (En)**

  <style>
  img {
@@ -366,9 +360,10 @@ Performance across different Signal-to-Noise Ratios (SNR) using MUSAN music and
  | **SNR Level** | **Avg WER** | **AMI** | **Earnings** | **GigaSpeech** | **LS test-clean** | **LS test-other** | **SPGI** | **Tedlium** | **VoxPopuli** | **Relative Change** |
  |:---------------|:-------------:|:----------:|:------------:|:----------------:|:-----------------:|:-----------------:|:-----------:|:-------------:|:---------------:|:-----------------:|
  | Clean | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | - |
- | SNR
- | SNR
- | SNR
+ | SNR 10 | 6.95 | 14.38 | 12.04 | 10.24 | 1.92 | 4.13 | 2.84 | 3.63 | 6.38 | -14.75% |
+ | SNR 5 | 8.23 | 18.07 | 13.82 | 11.18 | 2.33 | 5.58 | 3.81 | 4.24 | 6.81 | -35.97% |
+ | SNR 0 | 11.88 | 25.43 | 18.59 | 14.32 | 4.40 | 10.07 | 7.27 | 6.42 | 8.54 | -96.28% |
+ | SNR -5 | 20.26 | 36.57 | 28.06 | 22.27 | 11.82 | 19.91 | 16.14 | 13.07 | 14.23 | -234.66% |

  ### Telephony Audio Performance
  Performance comparison between standard 16kHz audio and telephony-style audio (using μ-law encoding with 16kHz→8kHz→16kHz conversion):
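The telephony-style degradation named in the last context line (μ-law encoding with a 16kHz→8kHz→16kHz round trip) can be sketched roughly as follows. This is an illustration only, not the pipeline used for the README's evaluation: the function names `mu_law_roundtrip` and `simulate_telephony` are hypothetical, μ = 255 is assumed as the usual 8-bit setting, and the numpy-only pair averaging and linear interpolation are crude stand-ins for a proper resampler.

```python
import numpy as np

MU = 255  # assumed: the common 8-bit mu-law parameter


def mu_law_roundtrip(x, mu=MU):
    """Compand float audio in [-1, 1] to 256 mu-law levels and expand it back."""
    x = np.clip(x, -1.0, 1.0)
    # Compress: logarithmic curve that gives quiet samples finer resolution.
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize to integer codes 0..255, then map back to [-1, 1].
    codes = np.round((compressed + 1.0) / 2.0 * mu)
    restored = codes / mu * 2.0 - 1.0
    # Expand: exact inverse of the compression curve.
    return np.sign(restored) * np.expm1(np.abs(restored) * np.log1p(mu)) / mu


def simulate_telephony(audio_16k):
    """Hypothetical helper: 16 kHz -> 8 kHz -> mu-law -> back to 16 kHz."""
    audio_16k = np.asarray(audio_16k, dtype=np.float64)
    # Crude downsampling: average adjacent sample pairs (halves the rate).
    n = len(audio_16k) // 2 * 2
    audio_8k = audio_16k[:n].reshape(-1, 2).mean(axis=1)
    audio_8k = mu_law_roundtrip(audio_8k)
    # Crude upsampling: linear interpolation back to the original length.
    t_8k = np.arange(len(audio_8k)) * 2 + 0.5
    t_16k = np.arange(len(audio_16k))
    return np.interp(t_16k, t_8k, audio_8k)
```

The output has the same length and nominal sample rate as the input, so it can be fed to the same model as the clean audio; only the bandwidth and quantization noise differ.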