Model description
This model is a fine-tuned version of openai/whisper-small, trained on the Indonesian-English portion of the CoVoST2 dataset.
Intended uses & limitations
This model translates Indonesian speech into English text.
How to Use
This is how to use the model with Faster-Whisper.
Convert the model into the CTranslate2 format with float16 quantization.
    !ct2-transformers-converter \
        --model cobrayyxx/whisper_translation_ID-EN \
        --output_dir ct2-whisper-translation-finetuned \
        --quantization float16 \
        --copy_files tokenizer_config.json
Load the converted model using the faster_whisper library.

    from faster_whisper import WhisperModel

    model_name = "ct2-whisper-translation-finetuned"  # converted model (after fine-tuning)

    # Run on GPU with FP16
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
The loaded model can now be used for translation.

    tgt_lang = "en"
    # Placeholder input: a file path or a 16 kHz mono NumPy array of Indonesian speech
    audio = "indonesian_audio.wav"

    segments, info = model.transcribe(
        audio,
        beam_size=5,
        language=tgt_lang,
        vad_filter=True,
    )
    translation = " ".join([segment.text.strip() for segment in segments])
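In faster-whisper, `segments` is a generator, so decoding only runs when it is iterated, and each segment carries `start`, `end`, and `text` fields. The sketch below shows one way to keep the timestamps next to the translated text. The `FakeSegment` class and the `format_segments` helper are hypothetical names used only for illustration; in practice you would pass the `segments` object returned by `model.transcribe`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeSegment:
    """Hypothetical stand-in with the start, end, and text fields of a segment."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable) -> List[str]:
    """Return one '[start -> end] text' line per segment (consumes the iterable once)."""
    return [
        f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text.strip()}"
        for seg in segments
    ]


# Demo with stand-in data instead of real model output
demo = [
    FakeSegment(0.0, 2.5, " Good morning."),
    FakeSegment(2.5, 5.0, " How are you?"),
]
lines = format_segments(demo)
translation = " ".join(line.split("] ", 1)[1] for line in lines)
```

Because the generator can only be consumed once, collect the formatted lines (or the segments themselves) in a list if you need both the timestamps and the joined translation.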
Note: If you hit a kernel error every time you run the transcription code, you need to install the nvidia-cublas and nvidia-cudnn libraries. First install cuDNN with apt:
    apt update
    apt install libcudnn9-cuda-12
Then install the libraries with pip and point LD_LIBRARY_PATH at them. Read the documentation for more details.
    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
    export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
Special thanks to Yasmin Moslem for her help in resolving this.
Training Procedure
Training Results
Epoch | Training Loss | Validation Loss | WER |
---|---|---|---|
1 | 0.757300 | 0.763333 | 49.192132 |
2 | 0.351300 | 0.778579 | 49.297506 |
3 | 0.156600 | 0.828453 | 49.174570 |
4 | 0.066600 | 0.894528 | 50.087812 |
5 | 0.027600 | 0.944322 | 49.947313 |
6 | 0.013600 | 0.976878 | 49.964875 |
7 | 0.005900 | 1.012044 | 50.544433 |
8 | 0.003300 | 1.050839 | 50.526870 |
9 | 0.002800 | 1.063206 | 50.684932 |
10 | 0.002400 | 1.067140 | 50.807868 |
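The WER column is not defined in this card. As a rough illustration, assuming the usual word-level definition (edit distance between hypothesis and reference words, divided by the number of reference words, expressed as a percentage), a minimal sketch could look like this. The function name `word_error_rate` is hypothetical, and this is not necessarily the implementation that produced the numbers in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length, in percent."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One missing word out of six reference words
example_wer = word_error_rate("i am going to the market", "i am going to market")
```

Under this definition, a WER near 50 means roughly one word-level edit for every two reference words, which is a strict measure for translation output because valid paraphrases still count as errors.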
Model Evaluation
The baseline and fine-tuned models were evaluated with the BLEU and ChrF++ metrics on the validation dataset. The fine-tuned model improves over the baseline by 11.15 BLEU points and 12.25 ChrF++ points.
Model | BLEU | ChrF++ |
---|---|---|
Baseline | 25.87 | 43.79 |
Fine-Tuned | 37.02 | 56.04 |
Evaluation details
- BLEU: Measures the overlap between predicted and reference text based on n-grams.
- ChrF++: Uses character n-grams, extended with word unigrams and bigrams, for evaluation, making it particularly suitable for morphologically rich languages.
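To make the character n-gram idea concrete, here is a simplified sentence-level sketch of a character n-gram F-score. It is an illustration only: it omits the word n-gram component of ChrF++, the function names `char_ngrams` and `simple_chrf` are hypothetical, and its scores should not be expected to match those of the evaluation tool used for the table above. The defaults (n-grams up to length 6, beta of 2) follow the commonly used chrF settings.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


partial = simple_chrf("the cat sat on the mat", "the cat sits on the mat")
```

Because matching happens at the character level, a hypothesis that differs from the reference only in an inflection ("sat" versus "sits") still earns substantial credit, whereas a word-level metric would count the whole word as wrong.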
Framework Versions
- Transformers 4.48.3
- PyTorch 2.5.1+cu124
- Datasets 3.3.0
- Tokenizers 0.21.0
Credits
Huge thanks to Yasmin Moslem for mentoring me.