Update README.md
README.md (CHANGED)
@@ -82,11 +82,11 @@ FastConformer [1] is an optimized version of the Conformer model with 8x depthwi
 
 ## Training
 
-The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model
+The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model was trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml).
 The tokenizers for this model were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
 The model was initialized with the weights of the [Spanish FastConformer Hybrid (Transducer and CTC) Large P&C model](https://huggingface.co/nvidia/stt_es_fastconformer_hybrid_large_pc) and fine-tuned to Portuguese using the labeled and unlabeled data (with pseudo-labels).
-The MLS dataset
+The MLS dataset was used as unlabeled data as it does not contain punctuation and capitalization.
 
 ## Training Dataset:
 
@@ -124,7 +124,7 @@ The model was trained on around 2200 hours of Portuguese speech data.
 **Test Hardware:** A5000 GPU
 
 The performance of Automatic Speech Recognition models is measured using Character Error Rate (CER) and Word Error Rate (WER).
-The following
+The following table summarizes the performance of the available model in this collection with the Transducer and CTC decoders.
 
 
 
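The updated training paragraph points at NeMo's `speech_to_text_finetune.py` example and its base config. As a rough, hedged sketch of that same flow in plain Python (initialize from the Spanish checkpoint, switch to a Portuguese BPE tokenizer built with `process_asr_text_tokenizer.py`, then train on labeled plus pseudo-labeled manifests), one might write something like the block below; the tokenizer directory, manifest path, batch size, and trainer settings are placeholders, not the values behind the released checkpoint.

```python
# Hedged sketch of the fine-tuning flow described in the diff above.
# Paths, vocab size, batch size, and trainer settings are illustrative
# placeholders, not the released training recipe.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

# Start from the Spanish FastConformer Hybrid (Transducer + CTC) P&C checkpoint.
model = nemo_asr.models.ASRModel.from_pretrained(
    "nvidia/stt_es_fastconformer_hybrid_large_pc"
)

# Swap in a Portuguese BPE tokenizer, e.g. one built with
# scripts/tokenizers/process_asr_text_tokenizer.py (directory is a placeholder).
model.change_vocabulary(new_tokenizer_dir="tokenizers/pt_bpe", new_tokenizer_type="bpe")

# Portuguese training manifest: labeled data plus MLS pseudo-labels, in the
# usual NeMo JSON-lines format (audio_filepath, duration, text).
train_cfg = OmegaConf.create(
    {
        "manifest_filepath": "manifests/pt_train.json",
        "sample_rate": 16000,
        "batch_size": 16,
        "shuffle": True,
    }
)
model.setup_training_data(train_cfg)

trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=100)
model.set_trainer(trainer)
trainer.fit(model)
```

In the actual recipe these choices live in the linked Hydra config rather than in code; the sketch only mirrors the steps the paragraph describes.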
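The evaluation paragraph reports quality as CER and WER. To make those definitions concrete, here is a small self-contained reference implementation (edit distance over words or characters divided by reference length); in practice a library such as `jiwer` or NeMo's built-in WER metric would be used instead.

```python
# Minimal reference implementation of WER and CER: edit distance / reference length.

def _edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, dp[j] = dp[j], min(
                dp[j] + 1,             # deletion
                dp[j - 1] + 1,         # insertion
                prev_diag + (r != h),  # substitution (0 cost if tokens match)
            )
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference: str, hypothesis: str) -> float:
    return _edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

print(wer("o gato está no telhado", "o gato esta no telhado"))  # 0.2 (1 of 5 words)
print(cer("o gato está no telhado", "o gato esta no telhado"))  # ~0.045 (1 of 22 chars)
```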