---
language: sv
license: mit
tags:
- whisper
- automatic-speech-recognition
- sv
- transformers.js
- onnx
- speech
- audio
- transcription
datasets:
- common_voice
metrics:
- wer
model-index:
- name: whisper-base-onnx-web-v8
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: common_voice
      name: Common Voice Swedish
    metrics:
    - type: wer
      value: N/A
      name: Word Error Rate
---

# 🎤 whisper-base-onnx-web-v8

Fine-tuned Whisper model for Swedish transcription, optimized for web deployment with Transformers.js.

## 📋 Model Details

- **Base Model**: openai/whisper-base
- **Language**: Swedish (sv)
- **Task**: Speech Recognition / Transcription
- **Training Steps**: N/A
- **License**: MIT

## 🚀 Usage with Transformers.js

This model is optimized for browser-based transcription using Transformers.js:

```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'markusingvarsson/whisper-base-onnx-web-v8'
);

// Transcribe audio.
// `audioFile` is a placeholder: pass a URL to an audio file
// or a Float32Array of 16 kHz mono samples.
const result = await transcriber(audioFile, {
  language: 'sv',
  task: 'transcribe',
  chunk_length_s: 30,  // split long audio into 30-second chunks
  stride_length_s: 5   // overlap between neighbouring chunks
});

console.log(result.text);
```

## 🐍 Usage with Python

```python
import torch
from transformers import pipeline

# Load pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="markusingvarsson/whisper-base-onnx-web-v8",
    device=0 if torch.cuda.is_available() else -1,  # Use GPU if available, otherwise CPU
)

# Transcribe
result = transcriber(
    "audio.wav",
    generate_kwargs={"language": "sv", "task": "transcribe"},
)

print(result["text"])
```

## 📊 Performance

- **Word Error Rate (WER)**: N/A (not reported)
- **Model Size (ONNX)**: ~95 MB (quantized)
- **Inference Speed**: 1-2x real time on modern hardware

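No WER figure is reported for this model yet. For readers who want to measure it on their own Swedish test set, the metric can be sketched as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the simple normalization (lowercasing and splitting on whitespace) are assumptions made for illustration; published evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("jag heter anna", "jag heter hanna"))
```

To score a whole test set, sum the edit distances and the reference word counts over all utterances before dividing, rather than averaging per-utterance rates.
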

## 🎯 Intended Use

This model is designed for:
- Voice note transcription
- Meeting transcription
- Swedish podcast transcription
- Real-time speech-to-text in web browsers
- Accessibility applications

## 🔧 Training Details

- **Hardware**: GPU/CPU
- **Batch Size**: 8
- **Learning Rate**: 1e-5
- **Training Loss**: N/A

## 📁 Model Files

- `*.onnx`: ONNX model files for web deployment
- `config.json`: Model configuration
- `tokenizer.json`: Fast tokenizer for Transformers.js
- `processor_config.json`: Audio processing configuration

## 🌐 Demo

Try the model in your browser: [Coming Soon]

## 📝 Limitations

- Optimized for Swedish only
- Best performance with clear audio (minimal background noise)
- May struggle with heavy dialects or very fast speech
- Audio is processed in chunks of at most 30 seconds; longer recordings must be split into chunks (for example with `chunk_length_s`)

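To make the 30-second chunk limit concrete, the sketch below computes which time windows a longer recording would be cut into when `chunk_length_s` is 30 and `stride_length_s` is 5, as in the usage example. It assumes each window advances by the chunk length minus the stride on both sides, so neighbouring windows overlap. The helper name `chunk_windows` is hypothetical, and this is a simplified illustration rather than the library's actual chunking code.

```python
def chunk_windows(total_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) times in seconds of overlapping windows covering the audio."""
    # Assumption: the stride is applied on both sides of a chunk,
    # so each new window starts chunk_length_s - 2 * stride_length_s later.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be greater than 2 * stride_length_s")

    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last window reaches the end of the audio
        start += step
    return windows


if __name__ == "__main__":
    # A 70-second recording is covered by three overlapping windows.
    for start, end in chunk_windows(70):
        print(f"{start:5.1f} s to {end:5.1f} s")
```

A larger stride gives the model more context at chunk boundaries but increases the number of windows, and therefore the total processing time.
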

## 🤝 Citation

If you use this model, please cite:

```bibtex
@misc{whisper_base_onnx_web_v8_2024,
  title={whisper-base-onnx-web-v8: Swedish Whisper for Web},
  author={markusingvarsson},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v8}
}
```

## 🙏 Acknowledgments

- OpenAI for the original Whisper model
- Hugging Face for the tools and platform
- The Swedish NLP community

## 📄 License

This model is released under the MIT License.