Built with Axolotl

c5f8ed40-7b3a-41f1-9158-3c56df1c8206

This model is a fine-tuned version of katuni4ka/tiny-random-qwen1.5-moe; the training dataset was not recorded in the card (it is logged as "None"). It achieves the following results on the evaluation set:

  • Loss: 11.7902
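
This repository is a PEFT adapter for the base model above (see the framework versions and model tree below). The snippet that follows is a minimal loading sketch, not a tested recipe; the repo ids are taken from this card.

```python
# Minimal sketch (untested): attach this PEFT adapter to its base model.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "katuni4ka/tiny-random-qwen1.5-moe"
adapter_id = "lesso14/c5f8ed40-7b3a-41f1-9158-3c56df1c8206"  # repo id from the model tree

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # applies the adapter weights
```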

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 0.000214
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 140
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
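
For reference, here is a minimal sketch of equivalent Transformers TrainingArguments. It assumes single-GPU training (4 per-device × 2 accumulation steps = total batch size 8) and bitsandbytes installed for the 8-bit AdamW optimizer; output_dir is a hypothetical placeholder, not from the card.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # hypothetical placeholder path
    learning_rate=0.000214,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=140,
    gradient_accumulation_steps=2,   # 4 per device x 2 steps = total batch 8
    optim="adamw_bnb_8bit",          # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```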

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0001 | 1    | 11.9311         |
| 11.8          | 0.0048 | 50   | 11.8232         |
| 11.7624       | 0.0095 | 100  | 11.7956         |
| 11.7631       | 0.0143 | 150  | 11.7933         |
| 11.7553       | 0.0191 | 200  | 11.7917         |
| 11.7537       | 0.0238 | 250  | 11.7911         |
| 11.7471       | 0.0286 | 300  | 11.7909         |
| 11.7635       | 0.0333 | 350  | 11.7906         |
| 11.7679       | 0.0381 | 400  | 11.7905         |
| 11.7547       | 0.0429 | 450  | 11.7901         |
| 11.7542       | 0.0476 | 500  | 11.7902         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model tree for lesso14/c5f8ed40-7b3a-41f1-9158-3c56df1c8206

  • Adapter of katuni4ka/tiny-random-qwen1.5-moe