# mitkaj/w2v2BERT-CZ-CV-17.0
This is a fine-tuned Wav2Vec2BERT model for Czech Automatic Speech Recognition (ASR) using CTC loss.
## Model Details

- Base Model: facebook/w2v-bert-2.0
- Architecture: Wav2Vec2BertForCTC
- Training Data: Czech subset of Common Voice 17.0
- Loss Function: CTC (Connectionist Temporal Classification)
- Vocab Size: 51 tokens
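As a quick sanity check, the architecture and vocabulary size above can be read straight off the checkpoint (a minimal sketch, assuming the repository is publicly available on the Hugging Face Hub):

```python
from transformers import AutoConfig, AutoTokenizer

repo = "mitkaj/w2v2BERT-CZ-CV-17.0"

# The config records the CTC architecture and vocabulary size
config = AutoConfig.from_pretrained(repo)
print(config.architectures)  # expected: ['Wav2Vec2BertForCTC']
print(config.vocab_size)     # expected: 51

# The tokenizer holds the 51-token character vocabulary used by the CTC head
tokenizer = AutoTokenizer.from_pretrained(repo)
print(tokenizer.get_vocab())
```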
## Training Summary
- Training Epochs: 19.97
- Final Training Loss: 0.0305
- Final Evaluation Loss: 0.1450
- Final WER: 0.0583 (5.83%)
- Total Training Time: 5.1 hours
- Total FLOPs: 79,819,834,495,052,513,280 (≈7.98 × 10¹⁹)
## Usage
```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForCTC

# Load the model and processor
processor = AutoProcessor.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")
model = AutoModelForCTC.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")
model.eval()

# Load audio and resample to the 16 kHz rate the model expects
audio, _ = librosa.load("sample.wav", sr=16000)

# Extract input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Forward pass to obtain per-frame logits
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame;
# batch_decode collapses repeats and removes blank tokens
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
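For a shorter path, the same checkpoint should also work through the `automatic-speech-recognition` pipeline (a sketch, assuming a recent `transformers` release with Wav2Vec2-BERT support; `sample.wav` is a placeholder path):

```python
from transformers import pipeline

# One-liner inference via the ASR pipeline
asr = pipeline("automatic-speech-recognition", model="mitkaj/w2v2BERT-CZ-CV-17.0")
print(asr("sample.wav")["text"])
```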
## Training

The model was fine-tuned with CTC loss on Czech speech data from Common Voice; a sketch of loading that data follows.
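The Czech split of Common Voice 17.0 can be loaded as shown below. This is a minimal sketch under stated assumptions: the dataset is gated on the Hub, so the repository id `mozilla-foundation/common_voice_17_0` and an authenticated session are assumptions here, not confirmed by this card.

```python
from datasets import load_dataset, Audio

# Czech ("cs") split of Common Voice 17.0; requires accepting the dataset
# terms on the Hugging Face Hub and logging in (e.g. `huggingface-cli login`).
# Some `datasets` versions may additionally need trust_remote_code=True.
cv = load_dataset("mozilla-foundation/common_voice_17_0", "cs", split="train")

# Resample to the 16 kHz rate expected by the model
cv = cv.cast_column("audio", Audio(sampling_rate=16000))
```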
## Performance

The model was evaluated on Czech test data using the Word Error Rate (WER) metric, reaching 5.83% WER (see the training summary above).
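WER can be computed with the `evaluate` library; the reference and prediction strings below are illustrative placeholders, not actual evaluation data:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder strings for illustration only
references = ["dobrý den jak se máte"]
predictions = ["dobrý den jak se mate"]

# WER = (substitutions + insertions + deletions) / reference word count
print(wer_metric.compute(references=references, predictions=predictions))
```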
## Citation
If you use this model, please cite the original Wav2Vec2BERT paper and this fine-tuned version.