Commit 9fc6f09 · committed by vhdm · verified · 1 Parent(s): 86599fc

Update README.md

  1. README.md +95 -39
README.md CHANGED
@@ -5,7 +5,19 @@ language:
  license: mit
  base_model: openai/whisper-large-v3-turbo
  tags:
- - generated_from_trainer
  datasets:
  - vhdm/persian-voice-v1.1
  metrics:
@@ -26,57 +38,101 @@ model-index:
  value: 14.065335753176045
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # vhdm/whisper-v3-turbo-persian-v1.1

- This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the vhdm/persian-voice-v1 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.1445
- - Wer: 14.0653

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 16
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 500
- - training_steps: 5000
- - mixed_precision_training: Native AMP

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:------:|:----:|:---------------:|:-------:|
- | 0.219 | 0.6150 | 1000 | 0.2093 | 22.0750 |
- | 0.1191 | 1.2300 | 2000 | 0.1698 | 17.8463 |
- | 0.1051 | 1.8450 | 3000 | 0.1485 | 15.7895 |
- | 0.0644 | 2.4600 | 4000 | 0.1530 | 16.0375 |
- | 0.0289 | 3.0750 | 5000 | 0.1445 | 14.0653 |


- ### Framework versions

- - Transformers 4.52.4
- - Pytorch 2.7.1+cu118
- - Datasets 3.6.0
- - Tokenizers 0.21.1
 
  license: mit
  base_model: openai/whisper-large-v3-turbo
  tags:
+ - whisper
+ - whisper-large-v3
+ - persian
+ - farsi
+ - speech-recognition
+ - asr
+ - automatic-speech-recognition
+ - audio
+ - transformers
+ - generated_from_trainer
+ - h100
+ - huggingface
+ - vhdm
  datasets:
  - vhdm/persian-voice-v1.1
  metrics:
 
  value: 14.065335753176045
  ---

+ # 📢 vhdm/whisper-v3-turbo-persian-v1.1

+ 🎧 **Fine-tuned Whisper Large V3 Turbo for Persian Speech Recognition**

+ This model is a fine-tuned version of [`openai/whisper-large-v3-turbo`](https://huggingface.co/openai/whisper-large-v3-turbo) trained specifically on high-quality Persian speech data from the [`vhdm/persian-voice-v1`](https://huggingface.co/datasets/vhdm/persian-voice-v1) dataset.

+ ---
+
+ ## 🧪 Evaluation Results
+
+ | Metric | Value |
+ |--------|-------|
+ | **Final Validation Loss** | 0.1445 |
+ | **Word Error Rate (WER)** | **14.07%** |
+
+ The model improves over the course of training and reaches a WER of ~14% on clean Persian speech data.
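
For readers unfamiliar with the metric, the WER figure above can be read as word-level edit distance divided by the number of reference words, times 100. The following is a minimal sketch of that definition in plain Python. It is not the evaluation script used for this model card; the function name `word_error_rate` and the example sentences are assumptions made for illustration, and a real evaluation would normally aggregate edits over the whole evaluation set and apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level Levenshtein distance / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref_words)


# Hypothetical example: one substituted word out of five reference words.
reference = "من به مدرسه می روم"
hypothesis = "من به خانه می روم"
print(word_error_rate(reference, hypothesis))
```

On this scale, the reported 14.07% means roughly one word-level edit for every seven reference words.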

+ ---
+
+ ## 🧠 Model Description

+ This model aims to bring high-accuracy **automatic speech recognition (ASR)** to the Persian language using the Whisper architecture. By leveraging OpenAI's powerful Whisper Large V3 Turbo backbone and carefully curated Persian data, it can transcribe Persian audio with high fidelity.
+
+ ---

+ ## Intended Use

+ This model is best suited for:

+ - 📱 Transcribing Persian voice notes
+ - 🗣️ Real-time or batch ASR for Persian podcasts, videos, and interviews
+ - 🔍 Creating searchable transcripts of Persian audio content
+ - 🧩 Fine-tuning or domain adaptation for Persian speech tasks

+ ### 🚫 Limitations
+
+ - The model is fine-tuned on clean audio from specific sources and may perform poorly on noisy, accented, or dialectal speech.
+ - Not optimized for real-time streaming ASR (though inference is fast).
+ - It may occasionally produce hallucinations (incorrect but plausible words), a common issue in Whisper models.
+
+ ---
+
+ ## 📚 Training Data
+
+ The model was trained on the [`vhdm/persian-voice-v1`](https://huggingface.co/datasets/vhdm/persian-voice-v1) dataset, a curated collection of Persian speech recordings with high-quality transcriptions.
+
+ ---

+ ## ⚙️ Training Procedure

+ - **Optimizer**: AdamW (`betas=(0.9, 0.999)`, `eps=1e-08`)
+ - **Learning Rate**: 1e-5
+ - **Batch Sizes**: Train - 16 | Eval - 8
+ - **Scheduler**: Linear with 500 warmup steps
+ - **Mixed Precision**: Native AMP (automatic mixed precision)
+ - **Seed**: 42
+ - **Training Steps**: 5000
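
To make the scheduler entry concrete, here is a small sketch of what "linear with 500 warmup steps" implies for the learning rate across the 5000 training steps. It assumes the usual shape of the linear schedule in 🤗 Transformers (`get_linear_schedule_with_warmup`): a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0 at the final step. The function name `linear_schedule_lr` is hypothetical, and the sketch is an illustration of the listed hyperparameters, not the training code itself.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-5,      # "Learning Rate" from the list above
    warmup_steps: int = 500,    # "Scheduler: Linear with 500 warmup steps"
    total_steps: int = 5000,    # "Training Steps"
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 2750, 5000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and is halved again by step 2750, so the later checkpoints in the run were trained with progressively smaller updates.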
+
+ ---
+
+ ## ⏱️ Training Time & Hardware
+
+ The model was trained using an **NVIDIA H100 GPU**, and the full fine-tuning process took approximately **20 hours**.
+
+ ---
+
+ ## 📈 Training Progress
+
+ | Step | Training Loss | Validation Loss | WER (%) |
+ |------|----------------|-----------------|----------|
+ | 1000 | 0.2190 | 0.2093 | 22.07 |
+ | 2000 | 0.1191 | 0.1698 | 17.85 |
+ | 3000 | 0.1051 | 0.1485 | 15.79 |
+ | 4000 | 0.0644 | 0.1530 | 16.04 |
+ | 5000 | 0.0289 | 0.1445 | **14.07** |
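
The progress table can be summarized with a few lines of Python. The sketch below uses the unrounded WER values from the auto-generated table that this commit removes, picks the checkpoint with the lowest WER, flags any step where WER got worse than at the previous checkpoint, and computes the relative WER reduction between the first and best checkpoints. Variable names are illustrative only.

```python
# (step, training_loss, validation_loss, wer_percent), copied from the
# original auto-generated training results table.
progress = [
    (1000, 0.2190, 0.2093, 22.0750),
    (2000, 0.1191, 0.1698, 17.8463),
    (3000, 0.1051, 0.1485, 15.7895),
    (4000, 0.0644, 0.1530, 16.0375),
    (5000, 0.0289, 0.1445, 14.0653),
]

# Checkpoint with the lowest WER.
best_step, _, _, best_wer = min(progress, key=lambda row: row[3])

# Steps where WER increased relative to the previous logged checkpoint.
regressions = [
    curr[0] for prev, curr in zip(progress, progress[1:]) if curr[3] > prev[3]
]

# Relative WER reduction from the first logged checkpoint to the best one.
first_wer = progress[0][3]
relative_reduction = 100.0 * (first_wer - best_wer) / first_wer

print(f"best step: {best_step} (WER {best_wer:.2f}%)")
print(f"steps with a WER regression: {regressions}")
print(f"relative WER reduction since step 1000: {relative_reduction:.1f}%")
```

The only regression is at step 4000, where both validation loss and WER tick up before the final checkpoint recovers, so the improvement is not strictly monotonic.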
+
+ ---
+
+ ## 🧰 Framework Versions
+
+ - `transformers`: 4.52.4
+ - `torch`: 2.7.1+cu118
+ - `datasets`: 3.6.0
+ - `tokenizers`: 0.21.1
+
+ ---

+ ## 🚀 Try it out

+ You can load and test the model using 🤗 Transformers:

+ ```python
+ from transformers import pipeline

+ pipe = pipeline("automatic-speech-recognition", model="vhdm/whisper-v3-turbo-persian-v1.1")
+ result = pipe("path_to_persian_audio.wav")
+ print(result["text"])
+ ```