Kyrgyz Whisper Medium — LoRA Adapter (PEFT)

This repository contains a LoRA/PEFT adapter for Kyrgyz automatic speech recognition (ASR).

What is this?

This repo provides adapter weights only. For inference, you must load the base model and then attach this adapter via PEFT.

If you want a single, standalone checkpoint, use the merged model linked above.

Dataset

Training/evaluation dataset: fsicoli/common_voice_22_0 (config: ky)

Results

Evaluation on Common Voice 22.0 Kyrgyz (test split):

WER (normalized): 16.2061
WER_ortho (orthographic): 19.1491
test_loss: 0.1722

Quick check (200 random test samples):

WER: 16.1677
WER_ortho: 19.6021

Note: WER depends on text normalization (punctuation/case), decoding settings, and audio preprocessing.
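
To make the difference between the two metrics concrete, here is a minimal sketch of how an orthographic WER and a normalized WER can diverge on the same transcript. The normalizer (lowercasing plus punctuation removal) and the sample sentences are illustrative assumptions; this card does not state which normalizer produced the reported numbers.

```python
import re

def normalize(text: str) -> str:
    """Illustrative normalizer: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Cyrillic letters are kept
    return " ".join(text.split())

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# Made-up example pair, not taken from the dataset.
reference = "Салам, дүйнө! Кандайсың?"
hypothesis = "салам дүйнө кандайсын"

wer_ortho = wer(reference, hypothesis)                       # case and punctuation count as errors
wer_norm = wer(normalize(reference), normalize(hypothesis))  # only the real word error remains
print(f"WER_ortho={wer_ortho:.2f}  WER={wer_norm:.2f}")
```

Here every word differs orthographically (case or punctuation), while only one word differs after normalization, which is why WER_ortho is the higher of the two figures.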

Training details

LoRA fine-tuning summary:

LoRA: r=8, lora_alpha=16, lora_dropout=0.1
Target modules: q_proj, v_proj
Steps: max_steps=4000
Best checkpoint by WER: checkpoint-4000 (WER=16.21)
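
As a rough sanity check on adapter size, the sketch below estimates how many trainable parameters these settings imply. The layer counts and hidden size are assumptions taken from the standard whisper-medium architecture (they are not stated in this card), and the dict is only a convenient container whose keys happen to match `peft.LoraConfig` keyword arguments.

```python
# Hyperparameters copied from the summary above; the keys match
# keyword arguments accepted by peft.LoraConfig.
lora_kwargs = dict(r=8, lora_alpha=16, lora_dropout=0.1,
                   target_modules=["q_proj", "v_proj"])

# ASSUMPTION: standard whisper-medium shape (not stated in this card).
D_MODEL = 1024
ENCODER_LAYERS = 24
DECODER_LAYERS = 24

# Target modules are matched by name suffix, so each encoder layer contributes
# its self-attention q_proj/v_proj, and each decoder layer contributes both
# self-attention and cross-attention q_proj/v_proj.
n_targets = len(lora_kwargs["target_modules"])
adapted_modules = ENCODER_LAYERS * n_targets + DECODER_LAYERS * 2 * n_targets

# Each adapted linear layer gains A (r x d_in) and B (d_out x r).
params_per_module = lora_kwargs["r"] * (D_MODEL + D_MODEL)
trainable_params = adapted_modules * params_per_module

# LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(adapted_modules, trainable_params, scaling)  # 144 2359296 2.0
```

Under these assumptions the adapter trains roughly 2.4M parameters, a small fraction of the base model.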

Training progress (selected checkpoints):

Step	Train loss	Val loss	WER_ortho (%)	WER (%)
500	0.7980	0.7911	44.3501	42.0754
1000	0.3980	0.2043	28.9947	27.8551
1500	0.1712	0.1821	20.7479	17.7343
2000	0.1734	0.1770	20.7569	17.6977
2500	0.1935	0.1743	19.7995	16.8192
3000	0.3406	0.1728	19.8988	16.9656
3500	0.3192	0.1724	19.3840	16.4074
4000	0.1499	0.1722	19.1491	16.2061
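
The table can be checked programmatically. This small sketch copies the rows above, picks the checkpoint with the lowest normalized WER, and computes the relative WER reduction between the first and last listed checkpoints. The variable names are illustrative.

```python
# (step, train_loss, val_loss, wer_ortho, wer) rows copied from the table above.
rows = [
    (500, 0.7980, 0.7911, 44.3501, 42.0754),
    (1000, 0.3980, 0.2043, 28.9947, 27.8551),
    (1500, 0.1712, 0.1821, 20.7479, 17.7343),
    (2000, 0.1734, 0.1770, 20.7569, 17.6977),
    (2500, 0.1935, 0.1743, 19.7995, 16.8192),
    (3000, 0.3406, 0.1728, 19.8988, 16.9656),
    (3500, 0.3192, 0.1724, 19.3840, 16.4074),
    (4000, 0.1499, 0.1722, 19.1491, 16.2061),
]

# Best checkpoint = lowest normalized WER (last column).
best = min(rows, key=lambda row: row[4])

# Relative WER reduction between the first and last listed checkpoints.
first, last = rows[0], rows[-1]
relative_reduction = (first[4] - last[4]) / first[4]

# Validation loss decreases at every listed checkpoint, even though the
# logged train loss fluctuates from step to step.
val_losses = [row[2] for row in rows]
val_loss_monotonic = all(a > b for a, b in zip(val_losses, val_losses[1:]))

print(f"best step: {best[0]}, WER: {best[4]:.2f}")
print(f"relative WER reduction: {relative_reduction:.1%}")
```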

How to use

Install

pip install -U "transformers" "peft" "accelerate" "torch"

Inference (Transformers pipeline + PEFT)

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

adapter_id = "AleksTv/whisper-medium-ky-lora"

peft_cfg = PeftConfig.from_pretrained(adapter_id)
base_id = peft_cfg.base_model_name_or_path  # nineninesix/kyrgyz-whisper-medium

device = 0 if torch.cuda.is_available() else -1
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# No device_map here: the pipeline below places the model via its `device`
# argument, and combining both can raise an error on GPU.
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
  base_id,
  torch_dtype=dtype,
  low_cpu_mem_usage=True,
  use_safetensors=True,
)

model = PeftModel.from_pretrained(base_model, adapter_id)

# The base model uses custom tokenizer components for Kyrgyz support.
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)

asr = pipeline(
  "automatic-speech-recognition",
  model=model,
  tokenizer=processor.tokenizer,
  feature_extractor=processor.feature_extractor,
  device=device,
)

print(asr("path/to/audio.wav")["text"])

Merge adapter into the base model (standalone weights)

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

adapter_id = "AleksTv/whisper-medium-ky-lora"

peft_cfg = PeftConfig.from_pretrained(adapter_id)
base_id = peft_cfg.base_model_name_or_path

dtype = torch.float16 if torch.cuda.is_available() else torch.float32

base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
  base_id,
  torch_dtype=dtype,
  low_cpu_mem_usage=True,
  use_safetensors=True,
)

model = PeftModel.from_pretrained(base_model, adapter_id)
merged = model.merge_and_unload()

out_dir = "whisper-medium-ky-merged"
merged.save_pretrained(out_dir, safe_serialization=True)
AutoProcessor.from_pretrained(base_id, trust_remote_code=True).save_pretrained(out_dir)
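
For intuition about what merging does, the toy sketch below folds a LoRA update into a weight matrix and checks that the merged weight gives the same output as the base weight plus the adapter path. It uses tiny hand-picked matrices and plain Python lists and is not PEFT's actual implementation. The rank and alpha here are toy values chosen so the arithmetic is easy to follow.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

r, lora_alpha = 1, 2          # toy values; this adapter uses r=8, lora_alpha=16
scaling = lora_alpha / r      # both settings give a scaling factor of 2.0

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (d_out x d_in)
B = [[0.5], [-1.0]]           # LoRA B (d_out x r)
A = [[2.0, 0.0]]              # LoRA A (r x d_in)
x = [[1.0], [1.0]]            # input column vector

# Adapter attached: base path plus the scaled low-rank path.
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Adapter merged: fold the update into a single weight matrix.
W_merged = add(W, matmul(B, A), scale=scaling)
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)  # [[5.0], [3.0]] [[5.0], [3.0]]
```

Dropout is inactive at inference time, so it plays no role in the merge. With real float16 weights, the merged and unmerged outputs may differ by small rounding errors.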

Limitations

Quality may degrade on very noisy audio, far-field microphones, strong accents, code-switching, or long recordings without segmentation.
For production use, add voice activity detection (VAD) or segmentation before transcription, and post-process the output text.
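
As a starting point for long recordings, here is a minimal sketch of fixed-length windowing with overlap. It is a naive stand-in for real VAD: it cuts at fixed offsets and may split words. The function name and defaults are illustrative; the 16 kHz sample rate and 30-second window reflect Whisper's standard input format.

```python
def chunk_bounds(n_samples, sample_rate=16_000, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of fixed-length, overlapping windows."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap_s must be smaller than chunk_s")
    size = int(chunk_s * sample_rate)
    step = size - int(overlap_s * sample_rate)
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + size, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break  # the last window reached the end of the recording
        start += step
    return bounds

# A 70-second recording at 16 kHz yields three windows.
bounds = chunk_bounds(70 * 16_000)
print(bounds)  # [(0, 480000), (400000, 880000), (800000, 1120000)]
```

Each window can then be transcribed separately. Stitching the overlapping transcripts back together is not handled here. The Transformers ASR pipeline also accepts a `chunk_length_s` argument that performs chunking internally, which may be simpler to try first.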

License

Apache-2.0.


Model tree for AleksTv/whisper-medium-ky-lora

Base model: openai/whisper-medium
Fine-tuned: nineninesix/kyrgyz-whisper-medium
Adapter: this model
