Automatic Speech Recognition
NeMo
Portuguese
FastConformer
Nune1 committed · Commit 036da25 · verified · 1 Parent(s): 62e183d

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
```diff
@@ -82,11 +82,11 @@ FastConformer [1] is an optimized version of the Conformer model with 8x depthwi
 
 ## Training
 
-The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model is trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml).
+The NeMo toolkit [3] was used for training the models for over several hundred epochs. The model was trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml).
 The tokenizers for this model was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
 The model was initialized with the weights of [Spanish FastConformer Hybrid (Transducer and CTC) Large P&C model](https://huggingface.co/nvidia/stt_es_fastconformer_hybrid_large_pc) and fine-tuned to Portuguese using the labeled and unlabeled data(with pseudo-labels).
-The MLS dataset is used as unlabeled data as it does not contain punctuation and capitalization.
+The MLS dataset was used as unlabeled data as it does not contain punctuation and capitalization.
 
 ## Training Dataset:
 
@@ -124,7 +124,7 @@ The model was trained on around 2200 hours of Portuguese speech data.
 **Test Hardware:** A5000 GPU
 
 The performance of Automatic Speech Recognition models is measured using Character Error Rate (CER) and Word Error Rate (WER).
-The following tables summarize the performance of the available model in this collection with the Transducer and CTC decoders.
+The following table summarizes the performance of the available model in this collection with the Transducer and CTC decoders.
 
 
 
```
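The card reports performance as Word Error Rate (WER) and Character Error Rate (CER). Both are length-normalized edit distances between the reference transcript and the model output; a minimal pure-Python sketch of the definitions (not NeMo's own implementation):

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein distance via single-row dynamic programming.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution (0 if equal)
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    # Word Error Rate: edit distance over word tokens,
    # normalized by the reference length in words.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: the same computation over characters.
    return edit_distance(reference, hypothesis) / len(reference)
```

NeMo ships its own metric utilities for this; the sketch above is only to make the definitions concrete (e.g. one substituted word in a four-word reference gives a WER of 0.25).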