Overview

This Tunisian Automatic Speech Recognition (ASR) project focuses on developing a system that accurately transcribes spoken Tunisian Arabic into text. The model is fine-tuned from WavLM (an extension of Wav2Vec 2.0 that uses a transformer architecture) as the base model and boosted with a KenLM language model located in language_model/languageModel.arpa.
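Because the language model ships as an ARPA text file, one quick sanity check is to read the n-gram counts from its header. The sketch below relies only on the standard ARPA layout (a `\data\` line followed by `ngram N=count` lines); the function name `read_arpa_ngram_counts` is hypothetical and not part of this repo.

```python
from pathlib import Path


def read_arpa_ngram_counts(arpa_path):
    """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    with open(arpa_path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue
            if line.startswith("ngram "):
                # Header lines look like "ngram 2=12345".
                order, count = line[len("ngram "):].split("=")
                counts[int(order)] = int(count)
            elif line.startswith("\\"):
                # The first n-gram section marker ends the header.
                break
    return counts


if __name__ == "__main__":
    # Path taken from the overview; adjust it if your copy lives elsewhere.
    model_path = Path("language_model/languageModel.arpa")
    if model_path.is_file():
        print(read_arpa_ngram_counts(model_path))
    else:
        print(f"Language model not found at {model_path}")
```

An empty result usually means the file is not in ARPA format or was not downloaded completely.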

📈 Performance

Tested on a private dataset

CER	WER
`9.18%`	`24.78%`

Test set: a private dataset of 2.5 hours of Tunisian audio.
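For readers unfamiliar with the two metrics, here is a minimal sketch of how CER and WER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted in characters or in words. This is not the evaluation script used for the numbers above. The helper names are hypothetical, the sample strings are made up, and counting spaces as characters in CER is an assumption (conventions vary).

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate: total edits / total reference length."""
    total_errors = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        # Words for WER; characters (spaces included) for CER.
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    if total_length == 0:
        raise ValueError("references contain no units to score")
    return total_errors / total_length


if __name__ == "__main__":
    refs = ["the cat sat"]  # made-up sample, not from the test set
    hyps = ["the cat sit"]
    print(f"WER: {error_rate(refs, hyps, 'word'):.2%}")
    print(f"CER: {error_rate(refs, hyps, 'char'):.2%}")
```

WER is typically higher than CER because a single wrong character makes the whole word count as an error.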

🚀 How to run the web app locally

1. Download the repo:

Make sure the Hugging Face client is installed before cloning the repo.

> git clone https://huggingface.co/brdhaker3/TunASR

2. Install the necessary dependencies:

> pip install -r requirements.txt

3. Adjust the hyperparams.yaml file:

Open the hyperparameters file hyperparams.yaml and verify that the language model path points to the .arpa file on your machine.
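One way to do this check without knowing the exact key name is to scan the file for anything ending in `.arpa` and test whether it exists. The sketch below is a plain text scan under stated assumptions: relative paths are resolved against the folder containing hyperparams.yaml, YAML references or variables are not expanded, and paths containing spaces are not detected. The function names are hypothetical.

```python
import re
from pathlib import Path

# Any run of non-space, non-quote characters ending in ".arpa".
ARPA_PATTERN = re.compile(r"""[^\s'"]+\.arpa""")


def find_arpa_paths(yaml_path):
    """Return every .arpa path mentioned in the file, ignoring comments."""
    paths = []
    for line in Path(yaml_path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0]  # drop YAML comments
        paths.extend(ARPA_PATTERN.findall(line))
    return paths


def check_language_model(yaml_path="hyperparams.yaml"):
    """Map each referenced .arpa path to whether the file exists."""
    # Assumption: relative paths are relative to the YAML file's folder.
    base_dir = Path(yaml_path).resolve().parent
    return {p: (base_dir / p).is_file() for p in find_arpa_paths(yaml_path)}


if __name__ == "__main__":
    if Path("hyperparams.yaml").is_file():
        for path, exists in check_language_model().items():
            print(f"{path}: {'found' if exists else 'MISSING'}")
    else:
        print("hyperparams.yaml not found in the current directory")
```

If a path is reported as missing, either correct it in hyperparams.yaml or move the language model file to that location.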

4. 🌐 Run the web app:

To run the web app, you only need to execute:

> python app.py

✉️ Contact

If you have questions, you can send an email to: [email protected]

Dhaker Br
