vhdm
/

whisper-large-fa-v1

@@ -5,7 +5,19 @@ language:
 license: mit
 base_model: openai/whisper-large-v3-turbo
 tags:
-- generated_from_trainer
 datasets:
 - vhdm/persian-voice-v1.1
 metrics:
@@ -26,57 +38,101 @@ model-index:
       value: 14.065335753176045
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# vhdm/whisper-v3-turbo-persian-v1.1
-This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the vhdm/persian-voice-v1 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1445
-- Wer: 14.0653
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- training_steps: 5000
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Wer     |
-|:-------------:|:------:|:----:|:---------------:|:-------:|
-| 0.219         | 0.6150 | 1000 | 0.2093          | 22.0750 |
-| 0.1191        | 1.2300 | 2000 | 0.1698          | 17.8463 |
-| 0.1051        | 1.8450 | 3000 | 0.1485          | 15.7895 |
-| 0.0644        | 2.4600 | 4000 | 0.1530          | 16.0375 |
-| 0.0289        | 3.0750 | 5000 | 0.1445          | 14.0653 |
-### Framework versions
-- Transformers 4.52.4
-- Pytorch 2.7.1+cu118
-- Datasets 3.6.0
-- Tokenizers 0.21.1

 license: mit
 base_model: openai/whisper-large-v3-turbo
 tags:
+  - whisper
+  - whisper-large-v3
+  - persian
+  - farsi
+  - speech-recognition
+  - asr
+  - automatic-speech-recognition
+  - audio
+  - transformers
+  - generated_from_trainer
+  - h100
+  - huggingface
+  - vhdm
 datasets:
 - vhdm/persian-voice-v1.1
 metrics:
       value: 14.065335753176045
 ---
+# 📢 vhdm/whisper-v3-turbo-persian-v1.1
+🎧 **Fine-tuned Whisper Large V3 Turbo for Persian Speech Recognition**
+This model is a fine-tuned version of [`openai/whisper-large-v3-turbo`](https://huggingface.co/openai/whisper-large-v3-turbo) trained specifically on high-quality Persian speech data from the [`vhdm/persian-voice-v1`](https://huggingface.co/datasets/vhdm/persian-voice-v1) dataset.
+---
+## 🧪 Evaluation Results
+| Metric | Value |
+|--------|-------|
+| **Final Validation Loss** | 0.1445 |
+| **Word Error Rate (WER)** | **14.07%** |
+WER drops from 22.07% at step 1000 to 14.07% at step 5000, with a small uptick at step 4000; the final ~14% is measured on the clean Persian speech in the evaluation set.
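The card does not say how WER was computed, so the following is only a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization are assumptions made for illustration, not details taken from the training setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Multiplying the result by 100 gives a percentage on the same scale as the table above. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.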
+---
+## 🧠 Model Description
+This model brings **automatic speech recognition (ASR)** to the Persian language using the Whisper architecture. It combines OpenAI's Whisper Large V3 Turbo backbone with curated Persian speech data to transcribe Persian audio.
+---
+## ✅ Intended Use
+This model is best suited for:
+- 📱 Transcribing Persian voice notes
+- 🗣️ Batch ASR for Persian podcasts, videos, and interviews
+- 🔍 Creating searchable transcripts of Persian audio content
+- 🧩 Fine-tuning or domain adaptation for Persian speech tasks
+### 🚫 Limitations
+- The model is fine-tuned on clean audio from specific sources and may perform poorly on noisy, accented, or dialectal speech.
+- Not optimized for real-time streaming ASR (though inference is fast).
+- It may occasionally produce hallucinations (incorrect but plausible words), a common issue in Whisper models.
+---
+## 📚 Training Data
+The model was trained on the [`vhdm/persian-voice-v1`](https://huggingface.co/datasets/vhdm/persian-voice-v1) dataset, a curated collection of Persian speech recordings with high-quality transcriptions.
+---
+## ⚙️ Training Procedure
+- **Optimizer**: AdamW (`betas=(0.9, 0.999)`, `eps=1e-08`)
+- **Learning Rate**: 1e-5
+- **Batch Sizes**: Train - 16 | Eval - 8
+- **Scheduler**: Linear with 500 warmup steps
+- **Mixed Precision**: Native AMP (automatic mixed precision)
+- **Seed**: 42
+- **Training Steps**: 5000
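To make the scheduler settings above concrete, here is a small sketch of a linear schedule with warmup, using the values listed (peak learning rate 1e-5, 500 warmup steps, 5000 total steps). The helper name `linear_warmup_lr` is hypothetical, and the formula assumes the usual behaviour of a linear warmup followed by a linear decay to zero; the training script itself is not shown in this card.

```python
def linear_warmup_lr(step: int,
                     peak_lr: float = 1e-5,
                     warmup_steps: int = 500,
                     total_steps: int = 5000) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at the evaluation checkpoints from the card.
for step in (1000, 2000, 3000, 4000, 5000):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and reaches zero at step 5000, so the last checkpoint is trained with a learning rate close to zero.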
+---
+## ⏱️ Training Time & Hardware
+The model was trained using an **NVIDIA H100 GPU**, and the full fine-tuning process took approximately **20 hours**.
+---
+## 📈 Training Progress
+| Step | Training Loss | Validation Loss | WER (%) |
+|------|----------------|-----------------|----------|
+| 1000 | 0.2190         | 0.2093          | 22.07    |
+| 2000 | 0.1191         | 0.1698          | 17.85    |
+| 3000 | 0.1051         | 0.1485          | 15.79    |
+| 4000 | 0.0644         | 0.1530          | 16.04    |
+| 5000 | 0.0289         | 0.1445          | **14.07** |
+---
+## 🧰 Framework Versions
+- `transformers`: 4.52.4
+- `torch`: 2.7.1+cu118
+- `datasets`: 3.6.0
+- `tokenizers`: 0.21.1
+---
+## 🚀 Try it out
+You can load and test the model using 🤗 Transformers:
+```python
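+from transformers import pipeline
+
+# Load the fine-tuned checkpoint as an ASR pipeline.
+pipe = pipeline("automatic-speech-recognition", model="vhdm/whisper-v3-turbo-persian-v1.1")
+
+# Transcribe a local audio file (replace with your own path).
+result = pipe("path_to_persian_audio.wav")
+print(result["text"])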