---
language: sv
license: mit
tags:
  - whisper
  - automatic-speech-recognition
  - sv
  - transformers.js
  - onnx
  - speech
  - audio
  - transcription
datasets:
  - common_voice
metrics:
  - wer
model-index:
  - name: whisper-base-onnx-web-v3
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          type: common_voice
          name: Common Voice Swedish
        metrics:
          - type: wer
            value: N/A
            name: Word Error Rate
---

# 🎤 whisper-base-onnx-web-v3

Fine-tuned Whisper model for Swedish transcription, optimized for web deployment with Transformers.js.

## 📋 Model Details

- **Base Model**: openai/whisper-base
- **Language**: Swedish (sv)
- **Task**: Speech Recognition / Transcription
- **Training Steps**: N/A
- **License**: MIT

## 🚀 Usage with Transformers.js

This model is optimized for browser-based transcription using Transformers.js:

```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'markusingvarsson/whisper-base-onnx-web-v3'
);

// Transcribe audio. `audioFile` is assumed to be an audio URL or a
// Float32Array of 16 kHz mono samples.
const result = await transcriber(audioFile, {
  language: 'sv',
  task: 'transcribe',
  chunk_length_s: 30,
  stride_length_s: 5
});

console.log(result.text);
```

## 🐍 Usage with Python

```python
import torch
from transformers import pipeline

# Load pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="markusingvarsson/whisper-base-onnx-web-v3",
    device=0 if torch.cuda.is_available() else -1  # First GPU if available, otherwise CPU
)

# Transcribe
result = transcriber(
    "audio.wav",
    generate_kwargs={"language": "sv", "task": "transcribe"}
)

print(result["text"])
```

## 📊 Performance

- **Word Error Rate (WER)**: N/A
- **Model Size (ONNX)**: ~95 MB (quantized)
- **Inference Speed**: approximately 1-2x real time on modern hardware
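
WER is listed as N/A above. For readers who want to measure it on their own recordings, the following is a minimal sketch of a word-level WER computation using edit distance. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions made for illustration; this is not the evaluation script used for this model, and published WER figures often apply a more elaborate text normalizer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper: normalization here is only lowercasing and
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words.
print(word_error_rate("jag bor i Stockholm", "jag bor Stockholm"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.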

## 🎯 Intended Use

This model is designed for:
- Voice note transcription
- Meeting transcription
- Swedish podcast transcription
- Real-time speech-to-text in web browsers
- Accessibility applications

## 🔧 Training Details

- **Hardware**: GPU/CPU
- **Batch Size**: 8
- **Learning Rate**: 1e-5
- **Training Loss**: N/A

## 📁 Model Files

- `*.onnx`: ONNX model files for web deployment
- `config.json`: Model configuration
- `tokenizer.json`: Fast tokenizer for Transformers.js
- `processor_config.json`: Audio processing configuration

## 🌐 Demo

Try the model in your browser: [Coming Soon]

## 📝 Limitations

- Optimized for Swedish only
- Best performance with clear audio (minimal background noise)
- May struggle with heavy dialects or very fast speech
- Audio is processed in chunks of at most 30 seconds; longer recordings must be split into chunks (for example with `chunk_length_s`)
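
The `chunk_length_s` and `stride_length_s` options in the Transformers.js usage example split long recordings into overlapping windows. The sketch below illustrates how such windows could be laid out, assuming each window advances by the chunk length minus the stride on both sides. The helper name `chunk_windows` is hypothetical, and the exact boundary handling inside the pipelines may differ.

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) times in seconds of overlapping windows.

    Hypothetical helper for illustration: each window is at most
    chunk_length_s long and starts chunk_length_s - 2 * stride_length_s
    after the previous one, so neighbours share some overlap.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be greater than 2 * stride_length_s")

    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reaches the end of the audio
        start += step
    return windows


# A 70-second recording with the settings from the usage example.
for start, end in chunk_windows(70):
    print(f"{start:.0f}s to {end:.0f}s")
```

The overlap gives the model context on both sides of each boundary, which tends to reduce words being cut off where chunks meet.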

## 🤝 Citation

If you use this model, please cite:

```bibtex
@misc{whisper_base_onnx_web_v3_2024,
  title={whisper-base-onnx-web-v3: Swedish Whisper for Web},
  author={markusingvarsson},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v3}
}
```

## 🙏 Acknowledgments

- OpenAI for the original Whisper model
- Hugging Face for the tools and platform
- The Swedish NLP community

## 📄 License

This model is released under the MIT License.