Kyrgyz Whisper Medium — LoRA Adapter (PEFT)

This repository contains a LoRA/PEFT adapter for Kyrgyz automatic speech recognition (ASR).

Links

What is this?

This repo provides adapter weights only. For inference, you must load the base model and then attach this adapter via PEFT.

If you want a single, standalone checkpoint, use the merged model linked above.

Dataset

  • Training/evaluation dataset: fsicoli/common_voice_22_0 (config: ky)
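
For reference, a minimal sketch of loading this split with the datasets library (pip install datasets; the dataset may be gated on the Hub, in which case log in with huggingface-cli login first):

from datasets import Audio, load_dataset

cv_test = load_dataset("fsicoli/common_voice_22_0", "ky", split="test")
# Whisper expects 16 kHz input audio.
cv_test = cv_test.cast_column("audio", Audio(sampling_rate=16_000))

print(cv_test[0]["sentence"])  # reference transcript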

Results

Evaluation on the Common Voice 22.0 Kyrgyz test split (all WER values in percent):

  • WER (normalized): 16.2061
  • WER_ortho (orthographic): 19.1491
  • test_loss: 0.1722

Quick check (200 random test samples):

  • WER: 16.1677
  • WER_ortho: 19.6021

Note: WER depends on text normalization (punctuation/case), decoding settings, and audio preprocessing.
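
The exact normalization behind the reported numbers is not specified here; a common choice in the standard Whisper fine-tuning recipe is transformers' BasicTextNormalizer (lowercasing plus punctuation stripping). A toy sketch of how the two WER variants diverge (pip install evaluate jiwer):

import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

references = ["Саламатсызбы!"]   # toy data, not the actual test set
predictions = ["саламатсызбы"]

# Orthographic WER: case and punctuation count as errors.
wer_ortho = 100 * wer_metric.compute(references=references, predictions=predictions)

# Normalized WER: both sides are normalized first.
wer = 100 * wer_metric.compute(
  references=[normalizer(r) for r in references],
  predictions=[normalizer(p) for p in predictions],
)

print(f"WER_ortho={wer_ortho:.2f}%, WER={wer:.2f}%")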

Training details

LoRA fine-tuning summary:

  • LoRA: r=8, lora_alpha=16, lora_dropout=0.1
  • Target modules: q_proj, v_proj
  • Steps: max_steps=4000
  • Best checkpoint by WER: checkpoint-4000 (WER=16.21)
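
As a sketch, the summary above corresponds to a PEFT configuration like the following (bias="none" is an assumption taken from the common Whisper LoRA recipe, not stated in this card):

from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
  r=8,
  lora_alpha=16,
  lora_dropout=0.1,
  target_modules=["q_proj", "v_proj"],
  bias="none",  # assumption, see note above
)

# Applied to the base model as: peft_model = get_peft_model(base_model, lora_cfg)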

Training progress (selected checkpoints):

Step   Train loss   Val loss   WER_ortho (%)   WER (%)
 500   0.7980       0.7911     44.3501         42.0754
1000   0.3980       0.2043     28.9947         27.8551
1500   0.1712       0.1821     20.7479         17.7343
2000   0.1734       0.1770     20.7569         17.6977
2500   0.1935       0.1743     19.7995         16.8192
3000   0.3406       0.1728     19.8988         16.9656
3500   0.3192       0.1724     19.3840         16.4074
4000   0.1499       0.1722     19.1491         16.2061

How to use

Install

pip install -U "transformers" "peft" "accelerate" "torch"

Inference (Transformers pipeline + PEFT)

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

adapter_id = "AleksTv/whisper-medium-ky-lora"

peft_cfg = PeftConfig.from_pretrained(adapter_id)
base_id = peft_cfg.base_model_name_or_path  # nineninesix/kyrgyz-whisper-medium

device = 0 if torch.cuda.is_available() else -1
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load on CPU first; the pipeline's `device` argument moves the model below.
# (device_map="auto" conflicts with pipeline(device=...), so it is not used here.)
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
  base_id,
  torch_dtype=dtype,
  low_cpu_mem_usage=True,
  use_safetensors=True,
)

model = PeftModel.from_pretrained(base_model, adapter_id)

# The base model uses custom tokenizer components for Kyrgyz support.
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)

asr = pipeline(
  "automatic-speech-recognition",
  model=model,
  tokenizer=processor.tokenizer,
  feature_extractor=processor.feature_extractor,
  device=device,
)

print(asr("path/to/audio.wav")["text"])
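
The pipeline also accepts in-memory audio. For example, to transcribe one row of the Common Voice test split (assuming the cv_test dataset from the Dataset sketch above):

sample = cv_test[0]["audio"]
print(asr({"array": sample["array"], "sampling_rate": sample["sampling_rate"]})["text"])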

Merge adapter into the base model (standalone weights)

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

adapter_id = "AleksTv/whisper-medium-ky-lora"

peft_cfg = PeftConfig.from_pretrained(adapter_id)
base_id = peft_cfg.base_model_name_or_path

# float16 halves the checkpoint size; fall back to float32 if your torch build
# lacks fp16 matmul support on CPU.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
  base_id,
  torch_dtype=dtype,
  low_cpu_mem_usage=True,
  use_safetensors=True,
)

model = PeftModel.from_pretrained(base_model, adapter_id)
merged = model.merge_and_unload()

out_dir = "whisper-medium-ky-merged"
merged.save_pretrained(out_dir, safe_serialization=True)
AutoProcessor.from_pretrained(base_id, trust_remote_code=True).save_pretrained(out_dir)
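
Once merged, the output directory is a self-contained checkpoint and loads without PEFT:

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model = AutoModelForSpeechSeq2Seq.from_pretrained("whisper-medium-ky-merged")
processor = AutoProcessor.from_pretrained("whisper-medium-ky-merged", trust_remote_code=True)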

Limitations

  • Quality may degrade on very noisy audio, far-field microphones, strong accents, code-switching, or long recordings without segmentation.
  • For production, you typically want VAD/segmentation + post-processing.
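
For long recordings, the pipeline's built-in chunking is a simple starting point before reaching for a full VAD setup; a sketch reusing the asr pipeline from the inference example (the chunk/stride values are illustrative, not tuned):

out = asr("path/to/long_audio.wav", chunk_length_s=30, stride_length_s=5)
print(out["text"])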

License

Apache-2.0.
