---
language: sv
license: mit
tags:
- whisper
- automatic-speech-recognition
- sv
- transformers.js
- onnx
- speech
- audio
- transcription
datasets:
- common_voice
metrics:
- wer
model-index:
- name: whisper-base-onnx-web-v8
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: common_voice
      name: Common Voice Swedish
    metrics:
    - type: wer
      value: N/A
      name: Word Error Rate
---
# 🎤 whisper-base-onnx-web-v8
Fine-tuned Whisper model for Swedish transcription, optimized for web deployment with Transformers.js.
## 📋 Model Details
- **Base Model**: openai/whisper-base
- **Language**: Swedish (sv)
- **Task**: Speech Recognition / Transcription
- **Training Steps**: N/A
- **License**: MIT
## 🚀 Usage with Transformers.js
This model is optimized for browser-based transcription using Transformers.js:
```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'markusingvarsson/whisper-base-onnx-web-v8'
);

// Transcribe audio. `audioFile` is a URL string or a Float32Array
// of 16 kHz mono samples.
const result = await transcriber(audioFile, {
  language: 'sv',
  task: 'transcribe',
  chunk_length_s: 30,
  stride_length_s: 5
});
console.log(result.text);
```
## 🐍 Usage with Python
```python
import torch
from transformers import pipeline

# Load pipeline on the first GPU if CUDA is available, otherwise on CPU
transcriber = pipeline(
    "automatic-speech-recognition",
    model="markusingvarsson/whisper-base-onnx-web-v8",
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe
result = transcriber(
    "audio.wav",
    generate_kwargs={"language": "sv", "task": "transcribe"},
)
print(result["text"])
```
## 📊 Performance
- **Word Error Rate (WER)**: N/A (not reported)
- **Model Size (ONNX)**: ~95MB (quantized)
- **Inference Speed**: 1-2x realtime on modern hardware
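
Since no WER is reported here, a reader may want to measure it on their own Swedish test set. The following is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions for illustration, not the evaluation procedure used for this model:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("jag heter anna", "jag heter hanna"))
```

In practice, punctuation stripping and number normalization affect the score noticeably, so results are only comparable when the same normalization is applied to both reference and hypothesis.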
## 🎯 Intended Use
This model is designed for:
- Voice note transcription
- Meeting transcription
- Swedish podcast transcription
- Real-time speech-to-text in web browsers
- Accessibility applications
## 🔧 Training Details
- **Hardware**: GPU/CPU
- **Batch Size**: 8
- **Learning Rate**: 1e-5
- **Training Loss**: N/A
## 📁 Model Files
- `*.onnx`: ONNX model files for web deployment
- `config.json`: Model configuration
- `tokenizer.json`: Fast tokenizer for Transformers.js
- `processor_config.json`: Audio processing configuration
## 🌐 Demo
Try the model in your browser: [Coming Soon]
## 📝 Limitations
- Optimized for Swedish language only
- Best performance with clear audio (minimal background noise)
- May struggle with heavy dialects or very fast speech
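- Audio is processed in chunks of at most 30 seconds; longer recordings must be split into overlapping chunks (see `chunk_length_s` and `stride_length_s` in the Transformers.js example)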
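
To make the 30-second limit concrete, here is a rough sketch of how a long recording could be covered by overlapping windows when using `chunk_length_s: 30` and `stride_length_s: 5`. It assumes the stride applies to both sides of each chunk, so consecutive windows start `chunk_length_s - 2 * stride_length_s` seconds apart; the helper `chunk_windows` is hypothetical and not part of either library:

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) times in seconds of overlapping windows over the audio."""
    # Assumption: the stride overlaps on both sides of a chunk.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be greater than 2 * stride_length_s")

    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, float(duration_s))
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reaches the end of the recording
        start += step
    return windows


if __name__ == "__main__":
    # A 70-second recording is covered by three 30-second windows.
    for start, end in chunk_windows(70):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Each window shares audio with its neighbours, which gives the decoder context at chunk boundaries and is the reason a non-zero stride tends to reduce words cut in half between chunks.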
## 🤝 Citation
If you use this model, please cite:
```bibtex
@misc{whisper_base_onnx_web_v8_2024,
title={whisper-base-onnx-web-v8: Swedish Whisper for Web},
author={markusingvarsson},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v8}
}
```
## 🙏 Acknowledgments
- OpenAI for the original Whisper model
- Hugging Face for the tools and platform
- The Swedish NLP community
## 📄 License
This model is released under the MIT License.