---
language: sv
license: mit
tags:
- whisper
- automatic-speech-recognition
- sv
- transformers.js
- onnx
- speech
- audio
- transcription
datasets:
- common_voice
metrics:
- wer
model-index:
- name: whisper-base-onnx-web-v8
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: common_voice
      name: Common Voice Swedish
    metrics:
    - type: wer
      value: N/A
      name: Word Error Rate
---

# 🎤 whisper-base-onnx-web-v8

A fine-tuned Whisper model for Swedish speech transcription, exported to ONNX and optimized for web deployment with Transformers.js.

## 📋 Model Details

- **Base Model**: openai/whisper-base
- **Language**: Swedish (sv)
- **Task**: Speech Recognition / Transcription
- **Training Steps**: N/A
- **License**: MIT

## 🚀 Usage with Transformers.js

This model is optimized for browser-based transcription using Transformers.js:

```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'markusingvarsson/whisper-base-onnx-web-v8'
);

// Transcribe audio
// `audioFile` is a URL string or a Float32Array of audio samples
const result = await transcriber(audioFile, {
  language: 'sv',
  task: 'transcribe',
  chunk_length_s: 30,
  stride_length_s: 5
});

console.log(result.text);
```

## 🐍 Usage with Python

```python
import torch
from transformers import pipeline

# Load pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="markusingvarsson/whisper-base-onnx-web-v8",
    device=0 if torch.cuda.is_available() else -1  # Use GPU if available, else CPU
)

# Transcribe
result = transcriber(
    "audio.wav",
    generate_kwargs={"language": "sv", "task": "transcribe"}
)

print(result["text"])
```

## 📊 Performance

- **Word Error Rate (WER)**: N/A (not reported)
- **Model Size (ONNX)**: ~95 MB (quantized)
- **Inference Speed**: 1-2x real time on modern hardware

## 🎯 Intended Use

This model is designed for:

- Voice note transcription
- Meeting transcription
- Swedish podcast transcription
- Real-time speech-to-text in web browsers
- Accessibility applications

## 🔧 Training Details

- **Hardware**: GPU/CPU
- **Batch Size**: 8
- **Learning Rate**: 1e-5
- **Training Loss**: N/A

## 📁 Model Files

- `*.onnx`: ONNX model files for web deployment
- `config.json`: Model configuration
- `tokenizer.json`: Fast tokenizer for Transformers.js
- `processor_config.json`: Audio processing configuration

## 🌐 Demo

Try the model in your browser: [Coming Soon]

## 📝 Limitations

- Optimized for Swedish only
- Performs best on clear audio with minimal background noise
- May struggle with heavy dialects or very fast speech
- Audio is processed in chunks of at most 30 seconds; longer recordings must be split into chunks (see `chunk_length_s` in the usage example)

## 🤝 Citation

If you use this model, please cite:

```bibtex
@misc{whisper_base_onnx_web_v8_2024,
  title={whisper-base-onnx-web-v8: Swedish Whisper for Web},
  author={markusingvarsson},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v8}
}
```

## 🙏 Acknowledgments

- OpenAI for the original Whisper model
- Hugging Face for the tools and platform
- The Swedish NLP community

## 📄 License

This model is released under the MIT License.
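
## 🧩 Appendix: Chunk Layout Sketch

The `chunk_length_s: 30` and `stride_length_s: 5` options in the Transformers.js example split long audio into overlapping windows. The sketch below illustrates how such windows might be laid out. It assumes each window advances by the chunk length minus twice the stride, which may not match the library's behaviour exactly; `chunk_windows` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def chunk_windows(
    duration_s: float,
    chunk_length_s: float = 30.0,
    stride_length_s: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping windows.

    Assumption (not confirmed by the model card): each window advances by
    chunk_length_s - 2 * stride_length_s, so neighbouring windows share
    2 * stride_length_s seconds of audio.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")

    windows = []
    start = 0.0
    while True:
        # Clip the final window to the end of the recording
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if start + chunk_length_s >= duration_s:
            break
        start += step
    return windows


# A 70-second recording with the settings from the usage example
print(chunk_windows(70.0))
# → [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

Under this assumption, a 70-second recording yields three windows that each overlap their neighbour by 10 seconds, which gives the decoder context on both sides of every chunk boundary.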