jonasaise committed · Commit 4548a00 · verified · 1 Parent(s): 9fa928d

Upload fine-tuned Icelandic Whisper LoRA adapter v1

README.md ADDED
@@ -0,0 +1,306 @@
1
+ ---
2
+ language: is
3
+ license: mit
4
+ library_name: peft
5
+ tags:
6
+ - openai
7
+ - whisper
8
+ - whisper-large-v3
9
+ - automatic-speech-recognition
10
+ - asr
11
+ - icelandic
12
+ - lora
13
+ - peft
14
+ - speech
15
+ base_model: openai/whisper-large-v3
16
+ datasets:
17
+ - language-and-voice-lab/raddromur_icelandic_speech_22_09 # training used a local copy of this corpus
18
+ - language-and-voice-lab/samromur_milljon
19
+ metrics:
20
+ - wer
21
+ - cer
22
+ model-index:
23
+ - name: whisper-large-v3-lora-is
24
+ results:
25
+ - task:
26
+ type: automatic-speech-recognition
27
+ name: Automatic Speech Recognition
28
+ dataset:
29
+ name: Samrómur Milljón (female_18to49_yrs subset)
30
+ type: language-and-voice-lab/samromur_milljon
31
+ config: is
32
+ split: female_18to49_yrs (1000 samples)
33
+ metrics:
34
+ - name: WER
35
+ type: wer
36
+ value: 33.07
37
+ - name: CER
38
+ type: cer
39
+ value: 10.59
40
+ ---
41
+
42
+ # LoRA Fine-tuned Whisper Large v3 for Icelandic ASR
43
+
44
+ This repository contains a LoRA (Low-Rank Adaptation) adapter for the `openai/whisper-large-v3` model, fine-tuned for Automatic Speech Recognition (ASR) in Icelandic.
45
+
46
+ The fine-tuning was performed on the "Raddrómur Icelandic Speech 22.09" corpus, and the adapter was evaluated on a subset of the "Samrómur Milljón" dataset.
47
+
48
+ ## Model Description
49
+
50
+ * **Base Model:** `openai/whisper-large-v3`
51
+ * **Fine-tuning Method:** LoRA (Parameter-Efficient Fine-Tuning) using the `peft` library.
52
+ * **Language:** Icelandic (is)
53
+ * **Task:** Automatic Speech Recognition (transcription)
54
+
55
+ ## Fine-tuning Data
56
+
57
+ * **Dataset Name:** Raddrómur Icelandic Speech 22.09
58
+ * **Source:** Language and Voice Laboratory (LVL) at Reykjavík University (RU)
59
+ * **Description:** Approximately 49 hours of Icelandic speech sourced from radio podcasts (primarily RÚV). The audio is 16 kHz mono FLAC, with automatically aligned transcriptions (see the loading sketch below).
60
+ * **License:** [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
61
+
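As a rough illustration of how such a corpus can be consumed, the sketch below loads the corpus metadata into a Hugging Face `datasets.Dataset` and casts the audio column to 16 kHz. This is a minimal sketch, not the actual training script: the local path and the column names (`file_path`, `normalized_text`) are assumptions and should be checked against the header of the corpus's `metadata.tsv`.

```python
import pandas as pd
from datasets import Audio, Dataset

DATA_DIR = "/path/to/raddromur_22.09"  # local copy of the corpus (illustrative path)

# Column names ("file_path", "normalized_text") are assumptions; check metadata.tsv.
meta = pd.read_csv(f"{DATA_DIR}/metadata.tsv", sep="\t")
meta["audio"] = DATA_DIR + "/speech/" + meta["file_path"]

ds = Dataset.from_pandas(meta[["audio", "normalized_text"]])
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # decode/resample to 16 kHz
ds = ds.train_test_split(test_size=0.1, seed=42)           # e.g. a 90/10 split
print(ds)
```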
62
+ ## Evaluation
63
+
64
+ The fine-tuned adapter was evaluated against the base `openai/whisper-large-v3` model on a 1000-sample subset of the `female_18to49_yrs` split from the `language-and-voice-lab/samromur_milljon` dataset.
65
+
66
+ **Evaluation Metrics (Lower is Better):**
67
+
68
+ | Model | WER (%) | CER (%) |
69
+ | :------------------- | :-----: | :-----: |
70
+ | Base Model | 34.15 | 11.05 |
71
+ | Fine-tuned Adapter | 33.07 | 10.59 |
72
+
73
+ *(Note: no stereo files were detected in the evaluation subset, and both evaluation runs completed without errors.)*
74
+
75
+ **Comparison Plot:**
76
+
77
+ possibly
78
+
79
+ **Interpretation:** The fine-tuned LoRA adapter demonstrates a modest improvement over the base `whisper-large-v3` model on this specific Icelandic evaluation subset. The Word Error Rate (WER) was reduced by approximately 1.08 points (absolute), and the Character Error Rate (CER) was reduced by approximately 0.46 points (absolute). Further evaluation on larger or different test sets could provide more comprehensive insights.
80
+
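The evaluation script itself is not part of this upload. For reference, the snippet below is a minimal sketch of how WER and CER can be computed with the Hugging Face `evaluate` library (the example strings are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# references: ground-truth transcriptions; predictions: model outputs.
references = ["þetta er prófun"]
predictions = ["þetta er profun"]

wer = 100 * wer_metric.compute(references=references, predictions=predictions)
cer = 100 * cer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```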
81
+ ## How to Use
82
+
83
+ This LoRA adapter is intended to be used with the base `openai/whisper-large-v3` model.
84
+
85
+ First, ensure you have the necessary libraries installed:
86
+ ```bash
87
+ # Using pip
88
+ pip install transformers peft torch accelerate soundfile librosa
89
+
90
+ # Or using uv
91
+ uv pip install transformers peft torch accelerate soundfile librosa
92
+ ```
93
+
94
+ Then, you can load the base model and apply the LoRA adapter from the Hugging Face Hub like this:
95
+
96
+ ```python
97
+ import torch
98
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
99
+ from peft import PeftModel
100
+ import librosa # Or your preferred audio loading library
101
+ import numpy as np
102
+
103
+ # --- Configuration ---
104
+ BASE_MODEL_ID = "openai/whisper-large-v3"
105
+ # Replace with your actual Hugging Face Hub ID for the adapter
106
+ # For example, if you pushed it to "jonasaise/whisper-large-v3-lora-is"
107
+ ADAPTER_HUB_ID = "jonasaise/your-repo-name" # <--- CHANGE THIS
108
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
109
+ # Use the precision your model was trained/evaluated with
110
+ MODEL_PRECISION = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else (torch.float16 if torch.cuda.is_available() else torch.float32)  # fall back to float32 on CPU
111
+
112
+ TARGET_LANGUAGE = "is"
113
+ TASK = "transcribe"
114
+
115
+ # --- 1. Load Processor ---
116
+ try:
117
+ processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID, language=TARGET_LANGUAGE, task=TASK)
118
+ except Exception as e:
119
+ print(f"Error loading processor: {e}")
120
+ # Fallback if processor isn't found with base model ID (less common for Whisper)
121
+ # processor = WhisperProcessor.from_pretrained(ADAPTER_HUB_ID, language=TARGET_LANGUAGE, task=TASK)
122
+
123
+
124
+ # --- 2. Load Base Model ---
125
+ print(f"Loading base model: {BASE_MODEL_ID}...")
126
+ base_model = WhisperForConditionalGeneration.from_pretrained(
127
+ BASE_MODEL_ID,
128
+ torch_dtype=MODEL_PRECISION,
129
+ low_cpu_mem_usage=True,
130
+ attn_implementation="sdpa" # Recommended for speed if supported, or remove/use "eager"
131
+ )
132
+ print("Base model loaded.")
133
+
134
+ # --- 3. Load LoRA Adapter ---
135
+ print(f"Loading LoRA adapter from: {ADAPTER_HUB_ID}...")
136
+ # This loads the adapter weights and applies them to the base model
137
+ model = PeftModel.from_pretrained(base_model, ADAPTER_HUB_ID)
138
+ model = model.to(DEVICE)
139
+ model.eval() # Set to evaluation mode
140
+ print("LoRA adapter loaded and applied. Model is on device:", model.device)
141
+
142
+ # --- 4. Prepare Your Audio ---
143
+ # Replace "path/to/your/icelandic_audio.wav" with the actual path to your audio file
144
+ AUDIO_FILE_PATH = "path/to/your/icelandic_audio.wav" # <--- CHANGE THIS
145
+ try:
146
+ # Load audio and resample to 16kHz mono
147
+ speech_array, sampling_rate = librosa.load(AUDIO_FILE_PATH, sr=16000, mono=True)
148
+ print(f"Audio loaded and resampled to 16kHz mono. Duration: {len(speech_array)/sampling_rate:.2f}s")
149
+ except Exception as e:
150
+ print(f"Error loading audio file {AUDIO_FILE_PATH}: {e}")
151
+ exit()
152
+
153
+ # Process audio to get input features
154
+ input_features = processor(speech_array, sampling_rate=16000, return_tensors="pt").input_features
155
+
156
+ # Ensure input_features are on the correct device and precision
157
+ # Note: Autocast during generation will handle precision, but explicit cast can also be done
158
+ input_features = input_features.to(DEVICE) # Move to device
159
+ if MODEL_PRECISION == torch.bfloat16:
160
+ input_features = input_features.to(torch.bfloat16)
161
+ elif MODEL_PRECISION == torch.float16:
162
+ input_features = input_features.to(torch.float16)
163
+
164
+ print("Input features prepared.")
165
+
166
+ # --- 5. Generate Transcription ---
167
+ # Configure generation parameters
168
+ # Use the model's existing generation_config as a base
169
+ generation_config = model.generation_config
170
+ generation_config.language = TARGET_LANGUAGE
171
+ generation_config.task = TASK
172
+ generation_config.forced_decoder_ids = None # Let processor handle this based on task/language
173
+ generation_config.suppress_tokens = [] # Clear any suppressed tokens
174
+
175
+ print("Generating transcription...")
176
+ with torch.inference_mode(): # Disables gradient calculations for inference
177
+ with torch.autocast(device_type=DEVICE, dtype=MODEL_PRECISION, enabled=torch.cuda.is_available()): # Enable autocast for mixed precision
178
+ predicted_ids = model.generate(input_features, generation_config=generation_config)
179
+
180
+ # --- 6. Decode Transcription ---
181
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
182
+ print("-" * 30)
183
+ print(f"Transcription: {transcription}")
184
+ print("-" * 30)
185
+ ```
186
+
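If you want to deploy without a `peft` dependency at inference time, the LoRA weights can optionally be merged into the base model. This is a minimal sketch using the standard `peft` API, continuing from the variables above; the output path is illustrative:

```python
# Fold the LoRA weights into the base model and drop the PEFT wrapper.
merged_model = model.merge_and_unload()

# Save a standalone Whisper checkpoint that can be loaded without peft.
merged_model.save_pretrained("whisper-large-v3-is-merged")
processor.save_pretrained("whisper-large-v3-is-merged")
```

The merged checkpoint behaves like a regular `WhisperForConditionalGeneration` model and no longer requires `peft` at load time.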
187
+ ## Training Procedure
188
+
189
+ This section details the setup and hyperparameters used for fine-tuning the LoRA adapter.
190
+
191
+ ### Data Preprocessing
192
+
193
+ The fine-tuning script (`finetune_whisper_ice_lora.py`) performs the following preprocessing steps on the Raddrómur dataset:
194
+ 1. Loads audio file paths and transcriptions from the `metadata.tsv` file.
195
+ 2. Constructs full paths to audio files, accounting for the nested directory structure (e.g., `<DATA_DIR>/speech/<podcast_name_dir>/<podcast_id_dir>/<filename.flac>`).
196
+ 3. Casts audio to 16kHz mono (though Raddrómur is already in this format).
197
+ 4. Splits the dataset into training and test/validation sets (e.g., 90/10 split).
198
+ 5. Uses the `WhisperProcessor` to:
199
+ * Convert audio arrays into log-Mel input features.
200
+ * Tokenize the Icelandic transcriptions into label IDs.
201
+ 6. A `DataCollatorSpeechSeq2SeqWithPadding` is used to dynamically pad sequences within each batch (a minimal sketch of this collator follows the list).
202
+
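The fine-tuning script itself is not included in this repository, so the exact implementation may differ, but the collator referenced in step 6 typically follows the standard Whisper fine-tuning recipe. A minimal sketch:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    """Pads log-Mel input features and tokenized labels independently within a batch."""

    processor: Any
    decoder_start_token_id: int

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # Pad the audio features (already log-Mel spectrograms) to a uniform shape.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # Pad the tokenized transcriptions; replace padding with -100 so it is ignored by the loss.
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        # If a start token was prepended during tokenization, drop it; it is re-added during training.
        if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```

It would typically be instantiated as `DataCollatorSpeechSeq2SeqWithPadding(processor=processor, decoder_start_token_id=model.config.decoder_start_token_id)` and passed to the `Seq2SeqTrainer`.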
203
+ ### Fine-tuning Hyperparameters & Setup
204
+
205
+ The model was fine-tuned using the following configuration (a code sketch reconstructing it follows the list):
206
+
207
+ * **Base Model:** `openai/whisper-large-v3`
208
+ * **Fine-tuning Method:** LoRA (Low-Rank Adaptation) using `peft`, with the configuration recorded in the included `adapter_config.json`:
+ * `r` (rank of the LoRA update matrices): 32
+ * `lora_alpha`: 64
+ * `target_modules`: `["q_proj", "v_proj"]`
+ * `lora_dropout`: 0.05
213
+ * **Precision:** BFloat16 (`bf16=True` in `Seq2SeqTrainingArguments`).
214
+ * **Optimizer:** AdamW 8-bit (`optim="adamw_8bit"` in `Seq2SeqTrainingArguments`, requires `bitsandbytes`).
215
+ * **Learning Rate:** peak of `1e-5` with linear warmup and decay (per the schedule logged in the included `trainer_state.json`).
+ * **Batch Size (Per Device):** 4 (`train_batch_size` in `trainer_state.json`).
+ * **Gradient Accumulation Steps:** approximately 8 (not recorded in the uploaded files).
+ * **Effective Batch Size:** (per-device batch size) × (gradient accumulation steps) × (number of GPUs).
+ * **Number of Epochs:** 3, for 180 optimizer steps in total (per `trainer_state.json`).
+ * **Warmup Steps:** 18 (10% of the 180 total steps, per the logged learning-rate schedule).
221
+ * **Attention Implementation:** Scaled Dot Product Attention (`attn_implementation="sdpa"` during model loading).
222
+ * **Gradient Checkpointing:** Enabled (`model.gradient_checkpointing_enable()`).
223
+ * **Logging:** Weights & Biases (`report_to=["wandb"]`).
224
+ * **Evaluation Strategy during Training:** evaluated every 30 steps (`eval_steps: 30` in the included `trainer_state.json`).
225
+ * **Language & Task:** Icelandic (`is`), Transcribe (`transcribe`).
226
+
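The sketch below reconstructs this configuration in code. The LoRA values come from the included `adapter_config.json` and the schedule-related values from `trainer_state.json`; the gradient-accumulation value and the exact argument spelling (`eval_strategy` vs. `evaluation_strategy`, depending on the `transformers` version) are assumptions rather than a copy of the original script.

```python
from peft import LoraConfig, get_peft_model
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    attn_implementation="sdpa",
)
base.gradient_checkpointing_enable()

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-is-raddromur-lora-wandb",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # assumed value; not recorded in the uploaded files
    learning_rate=1e-5,
    warmup_steps=18,                 # 10% of the 180 total steps
    num_train_epochs=3,
    bf16=True,
    optim="adamw_8bit",              # requires bitsandbytes
    eval_strategy="steps",           # "evaluation_strategy" in older transformers releases
    eval_steps=30,
    save_steps=30,
    logging_steps=3,
    report_to=["wandb"],
    remove_unused_columns=False,     # keep the columns the data collator expects
    label_names=["labels"],
)
```

A `Seq2SeqTrainer` would then be constructed with this model, these arguments, the processed datasets, and the data collator sketched earlier.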
227
+ ### Compute Infrastructure
228
+
229
+ * **Hardware:** NVIDIA DGX A100 (training initially targeted 5 GPUs; the final successful run used 2 GPUs, devices 6 and 7).
230
+ * **Software:**
231
+ * Python 3.10
232
+ * PyTorch
233
+ * `transformers`
234
+ * `datasets`
235
+ * `peft`
236
+ * `accelerate` (training launched with `torchrun`)
237
+ * `uv` (for environment management)
238
+
239
+ ## Intended Use
240
+
241
+ This fine-tuned LoRA adapter is intended to improve the performance of `openai/whisper-large-v3` for transcribing general Icelandic speech. It is particularly suited for:
242
+
243
+ * Transcribing Icelandic audio content similar in nature to radio podcasts (the primary source of the Raddrómur fine-tuning data).
244
+ * Use cases where improved accuracy on Icelandic-specific vocabulary, names, and nuances is desired over the base multilingual model.
245
+ * Applications requiring efficient fine-tuning and deployment, leveraging the small footprint of LoRA adapters.
246
+
247
+ ## Limitations and Bias
248
+
249
+ * **Domain Specificity:** The fine-tuning dataset (Raddrómur) consists primarily of relatively clean radio-podcast speech. Performance may vary on other domains of Icelandic speech, such as noisy recordings, accents not represented in Raddrómur, spontaneous conversational speech, or children's speech beyond what appears in the Samrómur resources.
250
+ * **Base Model Biases:** The base `openai/whisper-large-v3` model has its own inherent limitations and potential biases (e.g., demographic performance differences, sensitivity to certain audio characteristics). These may still be present or be amplified/mitigated to some extent by this fine-tuning.
251
+ * **Evaluation Subset:** The reported evaluation metrics are based on a 1000-sample subset of a specific demographic split (`female_18to49_yrs`) from the Samrómur Milljón dataset. Performance might differ on the full dataset, other splits, or other Icelandic evaluation benchmarks.
252
+ * **LoRA Limitations:** While parameter-efficient, LoRA fine-tunes only a small subset of the model's parameters. It might not capture all the nuances that full fine-tuning could, but offers a significant reduction in computational cost.
253
+
254
+ ### Recommendations
255
+
256
+ Users should be aware of the above limitations. It is recommended to:
257
+ * Test the model on a diverse set of Icelandic audio relevant to the specific application before deployment.
258
+ * Consider further fine-tuning or domain adaptation if performance on a specific out-of-domain task is critical.
259
+ * Be mindful of potential biases when using the model in sensitive applications.
260
+
261
+ ## License
262
+
263
+ * **This Adapter:** MIT (as declared in the metadata header of this model card).
264
+ * **Base Model (`openai/whisper-large-v3`):** The license of the original Whisper model applies to the base weights.
265
+ * **Datasets Used:**
266
+ * Raddrómur Icelandic Speech 22.09: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
267
+ * Samrómur Milljón: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
268
+
269
+ ## Acknowledgements
270
+
271
+ * The Language and Voice Laboratory (LVL) at Reykjavík University for creating the Raddrómur and Samrómur Milljón datasets.
272
+ * The Language Technology Programme for Icelandic 2019-2023, managed by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture, for funding the dataset creation.
273
+ * OpenAI for the Whisper model.
274
+ * Hugging Face for the `transformers`, `datasets`, `evaluate`, `peft`, and `accelerate` libraries.
275
+ * The Weights & Biases platform for experiment tracking.
276
+ * Astral for the `uv` tool.
277
+
278
+ ## Citations
279
+
280
+ If you use this adapter or build upon this work, please consider citing the original datasets and the base model:
281
+
282
+ 1. **Raddrómur Dataset:**
283
+ Mena, Carlos et al. "Raddrómur Icelandic Speech 22.09". Web Download. Reykjavik University: Language and Voice Lab, 2022.
284
+
285
+ 2. **Samrómur Milljón Dataset:**
286
+ ```bibtex
287
+ @inproceedings{mena2024samromur,
288
+ title={Samr{\'o}mur Millj{\'o}n: An ASR Corpus of One Million Verified Read Prompts in Icelandic},
289
+ author={Mena, Carlos Daniel Hernandez and Gunnarsson, {\TH}orsteinn Da{\dh}i and Gu{\dh}nason, J{\'o}n},
290
+ booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
291
+ pages={14305--14312},
292
+ year={2024}
293
+ }
294
+ ```
295
+
296
+ 3. **Whisper Model:**
297
+ ```bibtex
298
+ @inproceedings{radford2023robust,
299
+ title={Robust Speech Recognition via Large-Scale Weak Supervision},
300
+ author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
301
+ booktitle={International Conference on Machine Learning},
302
+ pages={28492--28518},
303
+ year={2023},
304
+ organization={PMLR}
305
+ }
306
+ ```
adapter_config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": {
4
+ "base_model_class": "WhisperForConditionalGeneration",
5
+ "parent_library": "transformers.models.whisper.modeling_whisper"
6
+ },
7
+ "base_model_name_or_path": "openai/whisper-large-v3",
8
+ "bias": "none",
9
+ "corda_config": null,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "r": 32,
27
+ "rank_pattern": {},
28
+ "revision": null,
29
+ "target_modules": [
30
+ "q_proj",
31
+ "v_proj"
32
+ ],
33
+ "task_type": null,
34
+ "trainable_token_indices": null,
35
+ "use_dora": false,
36
+ "use_rslora": false
37
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b2c73cdaaf9502ca8b315e800b1fd77897b987e2007139ea2b13e9027c362840
3
+ size 62969640
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:075c2a782c7043337fd73bf408d774e7cbd0c2931a4b5a8a7c53258ed16afe76
3
+ size 32397925
preprocessor_config.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "dither": 0.0,
4
+ "feature_extractor_type": "WhisperFeatureExtractor",
5
+ "feature_size": 128,
6
+ "hop_length": 160,
7
+ "n_fft": 400,
8
+ "n_samples": 480000,
9
+ "nb_max_frames": 3000,
10
+ "padding_side": "right",
11
+ "padding_value": 0.0,
12
+ "processor_class": "WhisperProcessor",
13
+ "return_attention_mask": false,
14
+ "sampling_rate": 16000
15
+ }
rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bd7ee66ab0fd9ddc4c410bdc8d443c5c6be52a37a2fb1d24d9fbd4dfa335e36d
3
+ size 15877
rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0572938e382c7d667720f18c88bb097c31756eae9bedf73385b21e48723121bf
3
+ size 15877
rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7db6751b0cfa1197a9322a223d560a9a86c2025ffd50f323a19678f17b2a9f85
3
+ size 15877
rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aab4ce4486a7bd20864b06f49e0dd7a74fdadcb1bfc75e43b14c9d6a5aa01cad
3
+ size 15877
rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1855a3c4b18e64f3b6c9949d1e18239d8c9aac3622dee9c530c2cf1fd3db1e1
3
+ size 15877
rng_state_5.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1bbd56a76da13d6902da2baca599762571693475d4e7a8afb3d0c4807752bd8a
3
+ size 15877
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5d580d3c491626d13ae4b819903668bad781890634fa0e2f34b9ac4544083fe
3
+ size 1465
trainer_state.json ADDED
@@ -0,0 +1,502 @@
1
+ {
2
+ "best_global_step": 180,
3
+ "best_metric": 50.112359550561806,
4
+ "best_model_checkpoint": "./whisper-large-v3-is-raddromur-lora-wandb/checkpoint-180",
5
+ "epoch": 2.9856262833675564,
6
+ "eval_steps": 30,
7
+ "global_step": 180,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.049281314168377825,
14
+ "grad_norm": 0.22914348542690277,
15
+ "learning_rate": 1.111111111111111e-06,
16
+ "loss": 1.391,
17
+ "step": 3
18
+ },
19
+ {
20
+ "epoch": 0.09856262833675565,
21
+ "grad_norm": 0.24495559930801392,
22
+ "learning_rate": 2.7777777777777783e-06,
23
+ "loss": 1.417,
24
+ "step": 6
25
+ },
26
+ {
27
+ "epoch": 0.14784394250513347,
28
+ "grad_norm": 0.2494819313287735,
29
+ "learning_rate": 4.444444444444444e-06,
30
+ "loss": 1.4382,
31
+ "step": 9
32
+ },
33
+ {
34
+ "epoch": 0.1971252566735113,
35
+ "grad_norm": 0.23504748940467834,
36
+ "learning_rate": 6.111111111111112e-06,
37
+ "loss": 1.3623,
38
+ "step": 12
39
+ },
40
+ {
41
+ "epoch": 0.2464065708418891,
42
+ "grad_norm": 0.25508585572242737,
43
+ "learning_rate": 7.77777777777778e-06,
44
+ "loss": 1.4247,
45
+ "step": 15
46
+ },
47
+ {
48
+ "epoch": 0.29568788501026694,
49
+ "grad_norm": 0.24351638555526733,
50
+ "learning_rate": 9.444444444444445e-06,
51
+ "loss": 1.4221,
52
+ "step": 18
53
+ },
54
+ {
55
+ "epoch": 0.34496919917864477,
56
+ "grad_norm": 0.2543489933013916,
57
+ "learning_rate": 9.876543209876543e-06,
58
+ "loss": 1.4149,
59
+ "step": 21
60
+ },
61
+ {
62
+ "epoch": 0.3942505133470226,
63
+ "grad_norm": 0.250897079706192,
64
+ "learning_rate": 9.691358024691358e-06,
65
+ "loss": 1.4158,
66
+ "step": 24
67
+ },
68
+ {
69
+ "epoch": 0.44353182751540043,
70
+ "grad_norm": 0.23567010462284088,
71
+ "learning_rate": 9.506172839506174e-06,
72
+ "loss": 1.3949,
73
+ "step": 27
74
+ },
75
+ {
76
+ "epoch": 0.4928131416837782,
77
+ "grad_norm": 0.24683956801891327,
78
+ "learning_rate": 9.320987654320989e-06,
79
+ "loss": 1.3688,
80
+ "step": 30
81
+ },
82
+ {
83
+ "epoch": 0.4928131416837782,
84
+ "eval_runtime": 745.8583,
85
+ "eval_samples_per_second": 1.735,
86
+ "eval_steps_per_second": 0.036,
87
+ "eval_wer": 53.24536190227332,
88
+ "step": 30
89
+ },
90
+ {
91
+ "epoch": 0.5420944558521561,
92
+ "grad_norm": 0.22976352274417877,
93
+ "learning_rate": 9.135802469135803e-06,
94
+ "loss": 1.3591,
95
+ "step": 33
96
+ },
97
+ {
98
+ "epoch": 0.5913757700205339,
99
+ "grad_norm": 0.24124480783939362,
100
+ "learning_rate": 8.950617283950618e-06,
101
+ "loss": 1.3709,
102
+ "step": 36
103
+ },
104
+ {
105
+ "epoch": 0.6406570841889117,
106
+ "grad_norm": 0.22739216685295105,
107
+ "learning_rate": 8.765432098765432e-06,
108
+ "loss": 1.4126,
109
+ "step": 39
110
+ },
111
+ {
112
+ "epoch": 0.6899383983572895,
113
+ "grad_norm": 0.2386259138584137,
114
+ "learning_rate": 8.580246913580249e-06,
115
+ "loss": 1.3458,
116
+ "step": 42
117
+ },
118
+ {
119
+ "epoch": 0.7392197125256673,
120
+ "grad_norm": 0.23364992439746857,
121
+ "learning_rate": 8.395061728395062e-06,
122
+ "loss": 1.3779,
123
+ "step": 45
124
+ },
125
+ {
126
+ "epoch": 0.7885010266940452,
127
+ "grad_norm": 0.23184379935264587,
128
+ "learning_rate": 8.209876543209876e-06,
129
+ "loss": 1.338,
130
+ "step": 48
131
+ },
132
+ {
133
+ "epoch": 0.837782340862423,
134
+ "grad_norm": 0.23423455655574799,
135
+ "learning_rate": 8.024691358024692e-06,
136
+ "loss": 1.3115,
137
+ "step": 51
138
+ },
139
+ {
140
+ "epoch": 0.8870636550308009,
141
+ "grad_norm": 0.23327411711215973,
142
+ "learning_rate": 7.839506172839507e-06,
143
+ "loss": 1.2838,
144
+ "step": 54
145
+ },
146
+ {
147
+ "epoch": 0.9363449691991786,
148
+ "grad_norm": 0.24564896523952484,
149
+ "learning_rate": 7.654320987654322e-06,
150
+ "loss": 1.3335,
151
+ "step": 57
152
+ },
153
+ {
154
+ "epoch": 0.9856262833675564,
155
+ "grad_norm": 0.21617886424064636,
156
+ "learning_rate": 7.469135802469136e-06,
157
+ "loss": 1.3044,
158
+ "step": 60
159
+ },
160
+ {
161
+ "epoch": 0.9856262833675564,
162
+ "eval_runtime": 755.2022,
163
+ "eval_samples_per_second": 1.713,
164
+ "eval_steps_per_second": 0.036,
165
+ "eval_wer": 53.106872223673896,
166
+ "step": 60
167
+ },
168
+ {
169
+ "epoch": 1.0492813141683779,
170
+ "grad_norm": 0.2329856902360916,
171
+ "learning_rate": 7.283950617283952e-06,
172
+ "loss": 1.403,
173
+ "step": 63
174
+ },
175
+ {
176
+ "epoch": 1.0985626283367556,
177
+ "grad_norm": 0.2415734827518463,
178
+ "learning_rate": 7.098765432098766e-06,
179
+ "loss": 1.2926,
180
+ "step": 66
181
+ },
182
+ {
183
+ "epoch": 1.1478439425051334,
184
+ "grad_norm": 0.22719435393810272,
185
+ "learning_rate": 6.913580246913581e-06,
186
+ "loss": 1.3266,
187
+ "step": 69
188
+ },
189
+ {
190
+ "epoch": 1.1971252566735113,
191
+ "grad_norm": 0.22385141253471375,
192
+ "learning_rate": 6.728395061728395e-06,
193
+ "loss": 1.3099,
194
+ "step": 72
195
+ },
196
+ {
197
+ "epoch": 1.2464065708418892,
198
+ "grad_norm": 0.22575075924396515,
199
+ "learning_rate": 6.543209876543211e-06,
200
+ "loss": 1.2993,
201
+ "step": 75
202
+ },
203
+ {
204
+ "epoch": 1.2956878850102669,
205
+ "grad_norm": 0.2280450165271759,
206
+ "learning_rate": 6.358024691358025e-06,
207
+ "loss": 1.2516,
208
+ "step": 78
209
+ },
210
+ {
211
+ "epoch": 1.3449691991786448,
212
+ "grad_norm": 0.21805013716220856,
213
+ "learning_rate": 6.17283950617284e-06,
214
+ "loss": 1.2796,
215
+ "step": 81
216
+ },
217
+ {
218
+ "epoch": 1.3942505133470227,
219
+ "grad_norm": 0.2454097718000412,
220
+ "learning_rate": 5.9876543209876546e-06,
221
+ "loss": 1.2567,
222
+ "step": 84
223
+ },
224
+ {
225
+ "epoch": 1.4435318275154003,
226
+ "grad_norm": 0.23440390825271606,
227
+ "learning_rate": 5.80246913580247e-06,
228
+ "loss": 1.2578,
229
+ "step": 87
230
+ },
231
+ {
232
+ "epoch": 1.4928131416837782,
233
+ "grad_norm": 0.21233566105365753,
234
+ "learning_rate": 5.617283950617285e-06,
235
+ "loss": 1.226,
236
+ "step": 90
237
+ },
238
+ {
239
+ "epoch": 1.4928131416837782,
240
+ "eval_runtime": 757.9185,
241
+ "eval_samples_per_second": 1.707,
242
+ "eval_steps_per_second": 0.036,
243
+ "eval_wer": 51.94408152599947,
244
+ "step": 90
245
+ },
246
+ {
247
+ "epoch": 1.542094455852156,
248
+ "grad_norm": 0.23111841082572937,
249
+ "learning_rate": 5.432098765432099e-06,
250
+ "loss": 1.2835,
251
+ "step": 93
252
+ },
253
+ {
254
+ "epoch": 1.5913757700205338,
255
+ "grad_norm": 0.22747503221035004,
256
+ "learning_rate": 5.246913580246914e-06,
257
+ "loss": 1.1713,
258
+ "step": 96
259
+ },
260
+ {
261
+ "epoch": 1.6406570841889117,
262
+ "grad_norm": 0.24629150331020355,
263
+ "learning_rate": 5.061728395061729e-06,
264
+ "loss": 1.2652,
265
+ "step": 99
266
+ },
267
+ {
268
+ "epoch": 1.6899383983572895,
269
+ "grad_norm": 0.20970605313777924,
270
+ "learning_rate": 4.876543209876544e-06,
271
+ "loss": 1.2063,
272
+ "step": 102
273
+ },
274
+ {
275
+ "epoch": 1.7392197125256672,
276
+ "grad_norm": 0.2347603589296341,
277
+ "learning_rate": 4.691358024691358e-06,
278
+ "loss": 1.1642,
279
+ "step": 105
280
+ },
281
+ {
282
+ "epoch": 1.7885010266940453,
283
+ "grad_norm": 0.22151677310466766,
284
+ "learning_rate": 4.506172839506173e-06,
285
+ "loss": 1.2559,
286
+ "step": 108
287
+ },
288
+ {
289
+ "epoch": 1.837782340862423,
290
+ "grad_norm": 0.21644067764282227,
291
+ "learning_rate": 4.3209876543209875e-06,
292
+ "loss": 1.2654,
293
+ "step": 111
294
+ },
295
+ {
296
+ "epoch": 1.8870636550308009,
297
+ "grad_norm": 0.2234969586133957,
298
+ "learning_rate": 4.135802469135803e-06,
299
+ "loss": 1.1653,
300
+ "step": 114
301
+ },
302
+ {
303
+ "epoch": 1.9363449691991788,
304
+ "grad_norm": 0.2156331092119217,
305
+ "learning_rate": 3.9506172839506175e-06,
306
+ "loss": 1.172,
307
+ "step": 117
308
+ },
309
+ {
310
+ "epoch": 1.9856262833675564,
311
+ "grad_norm": 0.21376466751098633,
312
+ "learning_rate": 3.7654320987654325e-06,
313
+ "loss": 1.2796,
314
+ "step": 120
315
+ },
316
+ {
317
+ "epoch": 1.9856262833675564,
318
+ "eval_runtime": 760.4652,
319
+ "eval_samples_per_second": 1.702,
320
+ "eval_steps_per_second": 0.036,
321
+ "eval_wer": 51.795139796185,
322
+ "step": 120
323
+ },
324
+ {
325
+ "epoch": 2.0492813141683777,
326
+ "grad_norm": 0.2266222983598709,
327
+ "learning_rate": 3.580246913580247e-06,
328
+ "loss": 1.3315,
329
+ "step": 123
330
+ },
331
+ {
332
+ "epoch": 2.0985626283367558,
333
+ "grad_norm": 0.22814051806926727,
334
+ "learning_rate": 3.395061728395062e-06,
335
+ "loss": 1.1759,
336
+ "step": 126
337
+ },
338
+ {
339
+ "epoch": 2.1478439425051334,
340
+ "grad_norm": 0.22590585052967072,
341
+ "learning_rate": 3.2098765432098767e-06,
342
+ "loss": 1.2064,
343
+ "step": 129
344
+ },
345
+ {
346
+ "epoch": 2.197125256673511,
347
+ "grad_norm": 0.22349856793880463,
348
+ "learning_rate": 3.0246913580246917e-06,
349
+ "loss": 1.1868,
350
+ "step": 132
351
+ },
352
+ {
353
+ "epoch": 2.246406570841889,
354
+ "grad_norm": 0.21798408031463623,
355
+ "learning_rate": 2.8395061728395062e-06,
356
+ "loss": 1.1485,
357
+ "step": 135
358
+ },
359
+ {
360
+ "epoch": 2.295687885010267,
361
+ "grad_norm": 0.23827993869781494,
362
+ "learning_rate": 2.6543209876543212e-06,
363
+ "loss": 1.1347,
364
+ "step": 138
365
+ },
366
+ {
367
+ "epoch": 2.344969199178645,
368
+ "grad_norm": 0.21975603699684143,
369
+ "learning_rate": 2.469135802469136e-06,
370
+ "loss": 1.152,
371
+ "step": 141
372
+ },
373
+ {
374
+ "epoch": 2.3942505133470227,
375
+ "grad_norm": 0.2301456183195114,
376
+ "learning_rate": 2.283950617283951e-06,
377
+ "loss": 1.212,
378
+ "step": 144
379
+ },
380
+ {
381
+ "epoch": 2.4435318275154003,
382
+ "grad_norm": 0.2236107736825943,
383
+ "learning_rate": 2.0987654320987654e-06,
384
+ "loss": 1.2156,
385
+ "step": 147
386
+ },
387
+ {
388
+ "epoch": 2.4928131416837784,
389
+ "grad_norm": 0.22880277037620544,
390
+ "learning_rate": 1.9135802469135804e-06,
391
+ "loss": 1.1885,
392
+ "step": 150
393
+ },
394
+ {
395
+ "epoch": 2.4928131416837784,
396
+ "eval_runtime": 758.1625,
397
+ "eval_samples_per_second": 1.707,
398
+ "eval_steps_per_second": 0.036,
399
+ "eval_wer": 50.802194930755164,
400
+ "step": 150
401
+ },
402
+ {
403
+ "epoch": 2.542094455852156,
404
+ "grad_norm": 0.23217734694480896,
405
+ "learning_rate": 1.7283950617283952e-06,
406
+ "loss": 1.2508,
407
+ "step": 153
408
+ },
409
+ {
410
+ "epoch": 2.5913757700205338,
411
+ "grad_norm": 0.21702837944030762,
412
+ "learning_rate": 1.54320987654321e-06,
413
+ "loss": 1.1574,
414
+ "step": 156
415
+ },
416
+ {
417
+ "epoch": 2.640657084188912,
418
+ "grad_norm": 0.22827443480491638,
419
+ "learning_rate": 1.3580246913580248e-06,
420
+ "loss": 1.1662,
421
+ "step": 159
422
+ },
423
+ {
424
+ "epoch": 2.6899383983572895,
425
+ "grad_norm": 0.22730480134487152,
426
+ "learning_rate": 1.1728395061728396e-06,
427
+ "loss": 1.1829,
428
+ "step": 162
429
+ },
430
+ {
431
+ "epoch": 2.739219712525667,
432
+ "grad_norm": 0.24221959710121155,
433
+ "learning_rate": 9.876543209876544e-07,
434
+ "loss": 1.2032,
435
+ "step": 165
436
+ },
437
+ {
438
+ "epoch": 2.7885010266940453,
439
+ "grad_norm": 0.22492796182632446,
440
+ "learning_rate": 8.024691358024692e-07,
441
+ "loss": 1.1646,
442
+ "step": 168
443
+ },
444
+ {
445
+ "epoch": 2.837782340862423,
446
+ "grad_norm": 0.23047611117362976,
447
+ "learning_rate": 6.17283950617284e-07,
448
+ "loss": 1.1689,
449
+ "step": 171
450
+ },
451
+ {
452
+ "epoch": 2.8870636550308006,
453
+ "grad_norm": 0.22853408753871918,
454
+ "learning_rate": 4.320987654320988e-07,
455
+ "loss": 1.1771,
456
+ "step": 174
457
+ },
458
+ {
459
+ "epoch": 2.9363449691991788,
460
+ "grad_norm": 0.21958370506763458,
461
+ "learning_rate": 2.469135802469136e-07,
462
+ "loss": 1.1692,
463
+ "step": 177
464
+ },
465
+ {
466
+ "epoch": 2.9856262833675564,
467
+ "grad_norm": 0.22913524508476257,
468
+ "learning_rate": 6.17283950617284e-08,
469
+ "loss": 1.1844,
470
+ "step": 180
471
+ },
472
+ {
473
+ "epoch": 2.9856262833675564,
474
+ "eval_runtime": 756.6055,
475
+ "eval_samples_per_second": 1.71,
476
+ "eval_steps_per_second": 0.036,
477
+ "eval_wer": 50.112359550561806,
478
+ "step": 180
479
+ }
480
+ ],
481
+ "logging_steps": 3,
482
+ "max_steps": 180,
483
+ "num_input_tokens_seen": 0,
484
+ "num_train_epochs": 3,
485
+ "save_steps": 30,
486
+ "stateful_callbacks": {
487
+ "TrainerControl": {
488
+ "args": {
489
+ "should_epoch_stop": false,
490
+ "should_evaluate": false,
491
+ "should_log": false,
492
+ "should_save": true,
493
+ "should_training_stop": true
494
+ },
495
+ "attributes": {}
496
+ }
497
+ },
498
+ "total_flos": 1.1899959344350469e+20,
499
+ "train_batch_size": 4,
500
+ "trial_name": null,
501
+ "trial_params": null
502
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:670f8c0507466e58b10789714ec0c355eea1b14095d32cbc8f8175b865a2e65a
3
+ size 10641