output

This model is a fine-tuned version of mistralai/Voxtral-Mini-3B-2507 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1209
  • WER (word error rate): 7.4378
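
A WER figure in this style can be reproduced with the `evaluate` library; the sketch below is illustrative only, and the transcript pairs are placeholders rather than data from this model's evaluation set.

```python
# Minimal WER computation sketch using the `evaluate` library
# (requires `pip install evaluate jiwer`); transcripts are placeholders.
import evaluate

wer_metric = evaluate.load("wer")

references = ["the quick brown fox jumps over the lazy dog"]   # ground truth
predictions = ["the quick brown fox jumped over a lazy dog"]   # model output

# `compute` returns a fraction; multiplying by 100 matches the
# percentage-style WER reported above.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.4f}")
```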

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_bnb_8bit (8-bit AdamW via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 4
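
These values map onto Transformers `TrainingArguments` roughly as shown below. This is a hedged sketch: only the listed values come from this card, and anything else (such as the `output_dir` name, taken from the repo name) is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the training configuration implied by the hyperparameter list;
# only the values shown above come from this card.
training_args = TrainingArguments(
    output_dir="output",                 # assumed from the repo name
    learning_rate=6e-06,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_bnb_8bit",              # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=4,
)
```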

Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.3459        | 0.1359 | 100  | 0.2102          | 17.6374 |
| 0.9531        | 0.2717 | 200  | 0.1271          | 7.9300  |
| 0.5131        | 0.4076 | 300  | 0.1175          | 7.6839  |
| 0.1454        | 0.5435 | 400  | 0.1172          | 8.2308  |
| 0.4597        | 0.6793 | 500  | 0.1142          | 7.2737  |
| 0.1317        | 0.8152 | 600  | 0.1208          | 7.1643  |
| 0.2293        | 0.9511 | 700  | 0.1112          | 7.4651  |
| 0.0656        | 1.0870 | 800  | 0.1210          | 7.9300  |
| 0.082         | 1.2228 | 900  | 0.1230          | 7.7386  |
| 0.2314        | 1.3587 | 1000 | 0.1249          | 7.9027  |
| 0.0726        | 1.4946 | 1100 | 0.1198          | 7.7659  |
| 0.5566        | 1.6304 | 1200 | 0.1192          | 8.0394  |
| 0.4758        | 1.7663 | 1300 | 0.1187          | 7.4651  |
| 0.0438        | 1.9022 | 1400 | 0.1147          | 7.5198  |
| 0.1857        | 2.0380 | 1500 | 0.1141          | 7.5198  |
| 0.3315        | 2.1739 | 1600 | 0.1163          | 7.1643  |
| 0.3889        | 2.3098 | 1700 | 0.1206          | 7.2737  |
| 0.4921        | 2.4457 | 1800 | 0.1190          | 7.7659  |
| 0.0262        | 2.5815 | 1900 | 0.1212          | 7.4378  |
| 0.2187        | 2.7174 | 2000 | 0.1189          | 7.6839  |
| 0.4816        | 2.8533 | 2100 | 0.1173          | 7.3831  |
| 0.8181        | 2.9891 | 2200 | 0.1176          | 7.4104  |
| 0.1194        | 3.125  | 2300 | 0.1223          | 7.4378  |
| 0.1414        | 3.2609 | 2400 | 0.1210          | 7.4378  |
| 0.0412        | 3.3967 | 2500 | 0.1209          | 7.5745  |
| 0.249         | 3.5326 | 2600 | 0.1217          | 7.4651  |
| 0.3678        | 3.6685 | 2700 | 0.1211          | 7.3831  |
| 0.019         | 3.8043 | 2800 | 0.1212          | 7.5745  |
| 0.2528        | 3.9402 | 2900 | 0.1211          | 7.5472  |

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.5.1+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.0
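
With these versions installed, the checkpoint can presumably be loaded through the Voxtral classes introduced in Transformers 4.54. The snippet below is a loading sketch only: the repo id `BlahBlah314/output` is taken from this page, while the dtype and device settings are assumptions.

```python
# Loading sketch, assuming the Voxtral model classes available in
# Transformers 4.54.0; generation/transcription helpers are not shown
# because their exact API may differ across versions.
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

repo_id = "BlahBlah314/output"  # this fine-tuned checkpoint

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption; full-precision weights also work
    device_map="auto",
)
```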