Automatic Speech Recognition
NeMo
Portuguese
FastConformer
Nune1 committed · Commit 036da25 · verified · 1 Parent(s): 62e183d

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
```diff
@@ -82,11 +82,11 @@ FastConformer [1] is an optimized version of the Conformer model with 8x depthwi
 
 ## Training
 
-The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model is trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml).
+The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model was trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml).
 The tokenizers for this model was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
 The model was initialized with the weights of [Spanish FastConformer Hybrid (Transducer and CTC) Large P&C model](https://huggingface.co/nvidia/stt_es_fastconformer_hybrid_large_pc) and fine-tuned to Portuguese using the labeled and unlabeled data(with pseudo-labels).
-The MLS dataset is used as unlabeled data as it does not contain punctuation and capitalization.
+The MLS dataset was used as unlabeled data as it does not contain punctuation and capitalization.
 
 ## Training Dataset:
 
@@ -124,7 +124,7 @@ The model was trained on around 2200 hours of Portuguese speech data.
 **Test Hardware:** A5000 GPU
 
 The performance of Automatic Speech Recognition models is measured using Character Error Rate (CER) and Word Error Rate (WER).
-The following tables summarize the performance of the available model in this collection with the Transducer and CTC decoders.
+The following table summarizes the performance of the available model in this collection with the Transducer and CTC decoders.
 
 
 
```
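The card reports performance as Word Error Rate (WER) and Character Error Rate (CER). Both are length-normalized edit distances between the reference transcript and the model output; a minimal pure-Python sketch of the definitions (not NeMo's own implementation):

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein distance via single-row dynamic programming.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution (0 if equal)
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    # Word Error Rate: edit distance over word tokens,
    # normalized by the reference length in words.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: the same computation over characters.
    return edit_distance(reference, hypothesis) / len(reference)
```

NeMo ships its own metric utilities for this; the sketch above is only to make the definitions concrete (e.g. one substituted word in a four-word reference gives a WER of 0.25).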