	---
	language: sv
	license: mit
	tags:
	- whisper
	- automatic-speech-recognition
	- sv
	- transformers.js
	- onnx
	- speech
	- audio
	- transcription
	datasets:
	- common_voice
	metrics:
	- wer
	model-index:
	- name: whisper-base-onnx-web-v8
	  results:
	  - task:
	      type: automatic-speech-recognition
	    dataset:
	      type: common_voice
	      name: Common Voice Swedish
	    metrics:
	    - type: wer
	      value: N/A
	      name: Word Error Rate
	---

	# 🎤 whisper-base-onnx-web-v8

	Fine-tuned Whisper model for Swedish transcription, optimized for web deployment with Transformers.js.

	## 📋 Model Details

	- Base Model: openai/whisper-base
	- Language: Swedish (sv)
	- Task: Speech Recognition / Transcription
	- Training Steps: N/A
	- License: MIT

	## 🚀 Usage with Transformers.js

	This model is optimized for browser-based transcription using Transformers.js:

	```javascript
	import { pipeline } from '@xenova/transformers';

	// Load the model
	const transcriber = await pipeline(
	  'automatic-speech-recognition',
	  'markusingvarsson/whisper-base-onnx-web-v8'
	);

	// Transcribe audio (audioFile: a URL string or a Float32Array of 16 kHz mono samples)
	const result = await transcriber(audioFile, {
	  language: 'sv',
	  task: 'transcribe',
	  chunk_length_s: 30,
	  stride_length_s: 5
	});

	console.log(result.text);
	```

	## 🐍 Usage with Python

	```python
	import torch
	from transformers import pipeline

	# Load pipeline
	transcriber = pipeline(
	    "automatic-speech-recognition",
	    model="markusingvarsson/whisper-base-onnx-web-v8",
	    device=0 if torch.cuda.is_available() else -1,  # Use GPU if available, otherwise CPU
	)

	# Transcribe
	result = transcriber(
	    "audio.wav",
	    generate_kwargs={"language": "sv", "task": "transcribe"},
	)

	print(result["text"])
	```

	## 📊 Performance

	- Word Error Rate (WER): N/A
	- Model Size (ONNX): ~95MB (quantized)
	- Inference Speed: 1-2x realtime on modern hardware
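
Since no WER figure is reported here, the following sketch shows how the metric could be computed for your own test recordings. It is a minimal pure-Python version, assuming whitespace tokenization and no text normalization (casing and punctuation count as differences); `word_error_rate` is a hypothetical helper, not part of this model or of any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One dropped word out of five reference words
print(word_error_rate("det är en fin dag", "det är fin dag"))
```

Published WER figures are usually computed after normalizing case and punctuation, so numbers from this sketch are only comparable to each other.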

	## 🎯 Intended Use

	This model is designed for:
	- Voice note transcription
	- Meeting transcription
	- Swedish podcast transcription
	- Real-time speech-to-text in web browsers
	- Accessibility applications

	## 🔧 Training Details

	- Hardware: GPU/CPU
	- Batch Size: 8
	- Learning Rate: 1e-5
	- Training Loss: N/A

	## 📁 Model Files

	- `*.onnx`: ONNX model files for web deployment
	- `config.json`: Model configuration
	- `tokenizer.json`: Fast tokenizer for Transformers.js
	- `processor_config.json`: Audio processing configuration
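
As a quick sanity check, a file listing can be compared against the patterns above. This is a minimal sketch: `missing_patterns` is a hypothetical helper, and the file names in the example call are made up for illustration rather than taken from this repository.

```python
from fnmatch import fnmatch

# Patterns taken from the file list above.
EXPECTED_PATTERNS = [
    "*.onnx",
    "config.json",
    "tokenizer.json",
    "processor_config.json",
]


def missing_patterns(filenames, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match none of the given file names."""
    missing = []
    for pattern in patterns:
        # Match on the base name so files inside subfolders also count.
        if not any(fnmatch(name.rsplit("/", 1)[-1], pattern) for name in filenames):
            missing.append(pattern)
    return missing


# Illustrative listing only (hypothetical file names)
example_files = ["config.json", "tokenizer.json", "onnx/encoder_model_quantized.onnx"]
print(missing_patterns(example_files))
```

A real listing could be obtained with `list_repo_files` from the `huggingface_hub` package, assuming that package is installed and the repository is reachable.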

	## 🌐 Demo

	Try the model in your browser: [Coming Soon]

	## 📝 Limitations

	- Optimized for Swedish only
	- Best performance with clear audio (minimal background noise)
	- May struggle with heavy dialects or very fast speech
	- Audio is processed in chunks of at most 30 seconds; longer recordings must be split into chunks (for example with `chunk_length_s`)
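
To make the 30-second chunk limit concrete, the sketch below shows how a longer recording could be split into overlapping windows, using the `chunk_length_s: 30` and `stride_length_s: 5` values from the Transformers.js example. It assumes each window advances by the chunk length minus twice the stride, which is a simplification; the libraries' actual chunking and merging logic may differ in detail. `chunk_windows` is a hypothetical helper.

```python
def chunk_windows(total_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) times in seconds for overlapping chunks."""
    # Assumption: the stride overlaps on both sides of a chunk,
    # so consecutive windows start (chunk - 2 * stride) seconds apart.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be greater than 2 * stride_length_s")

    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # the last window reaches the end of the audio
            break
        start += step
    return windows


# A 70-second recording is covered by three overlapping 30-second windows
for start, end in chunk_windows(70):
    print(f"{start:.0f}s to {end:.0f}s")
```

The overlap gives the model context on both sides of each cut, which helps avoid words being split at chunk boundaries.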

	## 🤝 Citation

	If you use this model, please cite:

	```bibtex
	@misc{whisper_base_onnx_web_v8_2024,
	title={whisper-base-onnx-web-v8: Swedish Whisper for Web},
	author={markusingvarsson},
	year={2024},
	publisher={Hugging Face},
	url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v8}
	}
	```

	## 🙏 Acknowledgments

	- OpenAI for the original Whisper model
	- Hugging Face for the tools and platform
	- The Swedish NLP community

	## 📄 License

	This model is released under the MIT License.