nvidia/canary-1b-v2

Automatic Speech Recognition

automatic-speech-translation

hf-asr-leaderboard


msekoyan committed 10 days ago

Commit 75a41cc · 1 Parent(s): 784a4f3

update long-form inference

Signed-off-by: monica-sekoyan <[email protected]>

Files changed (1)

README.md +3 -3

@@ -391,12 +391,12 @@ Number of characters per minute on [MUSAN](https://www.openslr.org/17) \[16] 48
 ### Long-form Inference
-`Canary-1b-v2` achieves strong performance on long-form transcription by using dynamic chunking with 1-second overlap between chunks, allowing for efficient parallel processing. This feature is automatically enabled when calling `.transcribe()` with `batch_size=1` on audio exceeding 40 seconds.
 | **Dataset**             | **WER ↓** |
 | ----------------------- | --------- |
-| Earnings-22             | 13.51%    |
-| This American Life      | 8.65%     |
 **Note:** Presented WERs do not include Punctuation and Capitalization errors.

 ### Long-form Inference
+`Canary-1b-v2` achieves strong performance on long-form transcription by using dynamic chunking with 1-second overlap between chunks, allowing for efficient parallel processing. This dynamic chunking feature is automatically enabled when calling `.transcribe()` on a single audio file, or when using `batch_size=1` with multiple audio files that are longer than 40 seconds.
 | **Dataset**             | **WER ↓** |
 | ----------------------- | --------- |
+| Earnings-22             | 13.78%    |
+| This American Life      | 9.87%     |
 **Note:** Presented WERs do not include Punctuation and Capitalization errors.
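
Reviewer note: the updated paragraph describes chunking with a 1-second overlap for audio longer than 40 seconds, but it does not say how chunk boundaries are chosen. The sketch below is only an illustration of overlapping windows, not NeMo's dynamic chunking implementation. The helper name `chunk_spans` is hypothetical, and the fixed 40-second chunk length is an assumption borrowed from the threshold in the README text.

```python
from typing import List, Tuple


def chunk_spans(
    duration_s: float,
    chunk_s: float = 40.0,   # assumed chunk length; the README only states a 40 s threshold
    overlap_s: float = 1.0,  # 1-second overlap, as stated in the README
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping fixed-size chunks.

    Hypothetical helper for illustration; the real chunker picks sizes dynamically.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    # Audio at or under the threshold is processed in one piece.
    if duration_s <= chunk_s:
        return [(0.0, duration_s)]

    spans: List[Tuple[float, float]] = []
    step = chunk_s - overlap_s  # each new chunk starts overlap_s before the previous end
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start += step
    return spans


if __name__ == "__main__":
    for span in chunk_spans(100.0):
        print(span)
```

Because the chunks are independent once the boundaries are fixed, they can be batched and decoded in parallel, which is the efficiency the README attributes to this approach. Merging the text in the overlapping second is left out of the sketch.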