mitkaj/w2v2BERT-CZ-CV-17.0

This is a fine-tuned Wav2Vec2BERT model for Czech Automatic Speech Recognition (ASR) using CTC loss.

Model Details

  • Base Model: facebook/w2v-bert-2.0
  • Architecture: Wav2Vec2BertForCTC
  • Training: Fine-tuned on Czech Common Voice dataset
  • Loss Function: CTC (Connectionist Temporal Classification)
  • Vocab Size: 51 tokens
  • Model Size: 606M parameters (F32, Safetensors)

Training Summary

  • Training Epochs: 19.97
  • Final Training Loss: 0.0305
  • Final Evaluation Loss: 0.1450
  • Final WER: 0.0583 (5.83%)
  • Total Training Time: 5.1 hours
  • Total Training FLOPs: 79,819,834,495,052,513,280 (≈ 7.98 × 10¹⁹)

Usage

from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load the fine-tuned model and its processor
processor = AutoProcessor.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")
model = AutoModelForCTC.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")

# Load audio as a 16 kHz mono waveform ("sample.wav" is a placeholder path)
audio, _ = librosa.load("sample.wav", sr=16000)

# Extract input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame; batch_decode
# collapses repeated tokens and removes CTC blanks
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
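Note that the model expects 16 kHz mono audio; resample any input to 16 kHz (as in the librosa call above) before passing it to the processor.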

Training

This model was fine-tuned from facebook/w2v-bert-2.0 with CTC loss on the Czech subset of Common Voice 17.0; see the Training Summary above for the final metrics. A minimal sketch of the setup follows.
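The exact training script is not published here, so the snippet below is only a sketch of how a fresh CTC head is typically attached to the base encoder with transformers. The vocab_size matches the 51 tokens reported above; ctc_loss_reduction and pad_token_id are assumptions, not confirmed values.

from transformers import Wav2Vec2BertForCTC

# Sketch: load the base encoder and initialize a new CTC head sized to
# the 51-token Czech vocabulary. The vocab size comes from the Model
# Details above; the remaining arguments are assumed, common defaults.
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    vocab_size=51,               # from Model Details above
    ctc_loss_reduction="mean",   # assumption: common choice for CTC fine-tuning
    pad_token_id=0,              # assumption: pad token id of the tokenizer
)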

Performance

The model was evaluated on held-out Czech test data using Word Error Rate (WER), reaching 5.83% (0.0583) as reported in the Training Summary above.
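For reference, WER can be computed with the Hugging Face evaluate library as shown below; the transcripts are placeholders, not the actual evaluation data.

import evaluate

# Compute Word Error Rate between model hypotheses and references
# (both lists are placeholder examples, not the real test set)
wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["dobrý den jak se máte"],
    references=["dobrý den jak se máte"],
)
print(f"WER: {wer:.4f}")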

Citation

If you use this model, please cite the original Wav2Vec2BERT paper and this fine-tuned version.
