output

This model is a fine-tuned version of mistralai/Voxtral-Mini-3B-2507 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1209
  • WER (word error rate): 7.4378
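
A WER figure in this style can be reproduced with the `evaluate` library; the sketch below is illustrative only, and the transcript pairs are placeholders rather than data from this model's evaluation set.

```python
# Minimal WER computation sketch using the `evaluate` library
# (requires `pip install evaluate jiwer`); transcripts are placeholders.
import evaluate

wer_metric = evaluate.load("wer")

references = ["the quick brown fox jumps over the lazy dog"]   # ground truth
predictions = ["the quick brown fox jumped over a lazy dog"]   # model output

# `compute` returns a fraction; multiplying by 100 matches the
# percentage-style WER reported above.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.4f}")
```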

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_bnb_8bit (8-bit AdamW via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 4
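
These values map onto Transformers `TrainingArguments` roughly as shown below. This is a hedged sketch: only the listed values come from this card, and anything else (such as the `output_dir` name, taken from the repo name) is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the training configuration implied by the hyperparameter list;
# only the values shown above come from this card.
training_args = TrainingArguments(
    output_dir="output",                 # assumed from the repo name
    learning_rate=6e-06,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_bnb_8bit",              # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=4,
)
```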

Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.3459        | 0.1359 | 100  | 0.2102          | 17.6374 |
| 0.9531        | 0.2717 | 200  | 0.1271          | 7.9300  |
| 0.5131        | 0.4076 | 300  | 0.1175          | 7.6839  |
| 0.1454        | 0.5435 | 400  | 0.1172          | 8.2308  |
| 0.4597        | 0.6793 | 500  | 0.1142          | 7.2737  |
| 0.1317        | 0.8152 | 600  | 0.1208          | 7.1643  |
| 0.2293        | 0.9511 | 700  | 0.1112          | 7.4651  |
| 0.0656        | 1.0870 | 800  | 0.1210          | 7.9300  |
| 0.082         | 1.2228 | 900  | 0.1230          | 7.7386  |
| 0.2314        | 1.3587 | 1000 | 0.1249          | 7.9027  |
| 0.0726        | 1.4946 | 1100 | 0.1198          | 7.7659  |
| 0.5566        | 1.6304 | 1200 | 0.1192          | 8.0394  |
| 0.4758        | 1.7663 | 1300 | 0.1187          | 7.4651  |
| 0.0438        | 1.9022 | 1400 | 0.1147          | 7.5198  |
| 0.1857        | 2.0380 | 1500 | 0.1141          | 7.5198  |
| 0.3315        | 2.1739 | 1600 | 0.1163          | 7.1643  |
| 0.3889        | 2.3098 | 1700 | 0.1206          | 7.2737  |
| 0.4921        | 2.4457 | 1800 | 0.1190          | 7.7659  |
| 0.0262        | 2.5815 | 1900 | 0.1212          | 7.4378  |
| 0.2187        | 2.7174 | 2000 | 0.1189          | 7.6839  |
| 0.4816        | 2.8533 | 2100 | 0.1173          | 7.3831  |
| 0.8181        | 2.9891 | 2200 | 0.1176          | 7.4104  |
| 0.1194        | 3.125  | 2300 | 0.1223          | 7.4378  |
| 0.1414        | 3.2609 | 2400 | 0.1210          | 7.4378  |
| 0.0412        | 3.3967 | 2500 | 0.1209          | 7.5745  |
| 0.249         | 3.5326 | 2600 | 0.1217          | 7.4651  |
| 0.3678        | 3.6685 | 2700 | 0.1211          | 7.3831  |
| 0.019         | 3.8043 | 2800 | 0.1212          | 7.5745  |
| 0.2528        | 3.9402 | 2900 | 0.1211          | 7.5472  |

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.5.1+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.0
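
With these versions installed, the checkpoint can presumably be loaded through the Voxtral classes introduced in Transformers 4.54. The snippet below is a loading sketch only: the repo id `BlahBlah314/output` is taken from this page, while the dtype and device settings are assumptions.

```python
# Loading sketch, assuming the Voxtral model classes available in
# Transformers 4.54.0; generation/transcription helpers are not shown
# because their exact API may differ across versions.
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

repo_id = "BlahBlah314/output"  # this fine-tuned checkpoint

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption; full-precision weights also work
    device_map="auto",
)
```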