---
license: mit
language:
- ar
metrics:
- cer
- wer
base_model:
- microsoft/wavlm-large
---
# Overview

This Tunisian Automatic Speech Recognition (ASR) project aims to accurately transcribe spoken Tunisian Arabic into text.
The model is fine-tuned from WavLM (`microsoft/wavlm-large`, a transformer-based extension of Wav2Vec 2.0) and boosted with a KenLM language model located in `language_model/languageModel.arpa`.


## 📈 Performance
Evaluated on a private dataset containing 2.5 hours of Tunisian audio data:

| CER     | WER      |
| :------ | :------- |
| `9.18%` | `24.78%` |
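
For readers unfamiliar with the two metrics, here is a minimal sketch of how CER and WER are commonly computed from an edit distance. It is not the evaluation script used for the numbers above. The function names `edit_distance`, `wer`, and `cer` are illustrative, the reference is assumed to be non-empty, and spaces are counted as characters in the CER (conventions differ on this point).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both functions return a fraction; multiply by 100 to get a percentage comparable to the table.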

## 🚀 How to run the web app locally

### 1. Download the repo:
Make sure the Hugging Face client is installed before cloning the repo.

```bash
git clone https://huggingface.co/brdhaker3/TunASR
```

### 2. Install the necessary dependencies:

```bash
pip install -r requirements.txt
```

### 3. Adjust the hyperparams.yaml file

Check the **hyperparameters** file `hyperparams.yaml` and verify that the path of the **language model** points to the right file.
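
As a quick sanity check for this step, the sketch below tests whether a given path points to something that looks like an ARPA file. It is a heuristic, not part of the repo: the helper name `looks_like_arpa` is hypothetical, and it assumes a text ARPA file whose first non-empty line is the `\data\` header, as KenLM writes it. It takes the path directly, because the name of the corresponding key in `hyperparams.yaml` is not stated here.

```python
from pathlib import Path


def looks_like_arpa(path):
    """Return True if the file exists and its first non-empty line is the ARPA data header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("r", encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if stripped:
                # Heuristic: KenLM-style ARPA files begin with a "\data\" line.
                return stripped == "\\data\\"
    return False  # empty file


if __name__ == "__main__":
    # Path mentioned in this README; adjust it to match hyperparams.yaml.
    lm_path = "language_model/languageModel.arpa"
    status = "OK" if looks_like_arpa(lm_path) else "missing or not an ARPA file"
    print(f"{lm_path}: {status}")
```

Run it from the root of the cloned repo so that the relative path resolves correctly.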

### 4. 🌐 Run the web app:

To run the web app, execute:
```bash
python app.py
```


## ✉️ Contact:
If you have questions, you can send an email to: benrejebdhaker3@gmail.com
* [Dhaker Br](https://tn.linkedin.com/in/BrDhaker)