wikipedia_30

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0459
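
Assuming the loss is mean per-token cross-entropy in nats (the Trainer default for language modeling), this corresponds to a perplexity of exp(4.0459) ≈ 57.2.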

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
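
As a minimal sketch, these hyperparameters map onto the Hugging Face `Trainer` API as shown below. This assumes the model was trained with `Trainer`; the `output_dir` is a placeholder, and `eval_steps`/`logging_steps` are inferred from the cadence of the results table rather than stated in the original card.

```python
# Hedged reconstruction of the training configuration, not the author's
# verbatim script. Placeholders and inferences are marked in the comments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wikipedia_30",        # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,    # total train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                        # "Native AMP" mixed-precision training
    eval_strategy="steps",
    eval_steps=2_000,                 # inferred: eval rows appear every 2,000 steps
    logging_steps=4_000,              # inferred: training loss updates every 4,000 steps
)
```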

Training results

| Training Loss | Epoch    | Step   | Validation Loss |
|:-------------:|:--------:|:------:|:---------------:|
| No log        | 2.1471   | 2000   | 7.1380          |
| 7.213         | 4.2941   | 4000   | 5.8542          |
| 7.213         | 6.4412   | 6000   | 5.4036          |
| 5.4304        | 8.5883   | 8000   | 5.0499          |
| 5.4304        | 10.7354  | 10000  | 4.7606          |
| 4.771         | 12.8824  | 12000  | 4.5172          |
| 4.771         | 15.0295  | 14000  | 4.3206          |
| 4.2888        | 17.1766  | 16000  | 4.1530          |
| 4.2888        | 19.3237  | 18000  | 4.0155          |
| 3.9141        | 21.4707  | 20000  | 3.8966          |
| 3.9141        | 23.6178  | 22000  | 3.8047          |
| 3.6154        | 25.7649  | 24000  | 3.7359          |
| 3.6154        | 27.9120  | 26000  | 3.6784          |
| 3.3661        | 30.0590  | 28000  | 3.6360          |
| 3.3661        | 32.2061  | 30000  | 3.6019          |
| 3.1473        | 34.3532  | 32000  | 3.5816          |
| 3.1473        | 36.5003  | 34000  | 3.5699          |
| 2.9533        | 38.6473  | 36000  | 3.5650          |
| 2.9533        | 40.7944  | 38000  | 3.5667          |
| 2.777         | 42.9415  | 40000  | 3.5747          |
| 2.777         | 45.0886  | 42000  | 3.5878          |
| 2.6015        | 47.2356  | 44000  | 3.6107          |
| 2.6015        | 49.3827  | 46000  | 3.6261          |
| 2.4429        | 51.5298  | 48000  | 3.6414          |
| 2.4429        | 53.6769  | 50000  | 3.6637          |
| 2.3125        | 55.8239  | 52000  | 3.6778          |
| 2.3125        | 57.9710  | 54000  | 3.7033          |
| 2.1989        | 60.1181  | 56000  | 3.7410          |
| 2.1989        | 62.2652  | 58000  | 3.7755          |
| 2.1044        | 64.4122  | 60000  | 3.7876          |
| 2.1044        | 66.5593  | 62000  | 3.8081          |
| 2.0257        | 68.7064  | 64000  | 3.8222          |
| 2.0257        | 70.8535  | 66000  | 3.8411          |
| 1.9563        | 73.0005  | 68000  | 3.8488          |
| 1.9563        | 75.1476  | 70000  | 3.8915          |
| 1.8905        | 77.2947  | 72000  | 3.9079          |
| 1.8905        | 79.4418  | 74000  | 3.9169          |
| 1.836         | 81.5888  | 76000  | 3.9382          |
| 1.836         | 83.7359  | 78000  | 3.9430          |
| 1.7885        | 85.8830  | 80000  | 3.9471          |
| 1.7885        | 88.0301  | 82000  | 3.9668          |
| 1.7431        | 90.1771  | 84000  | 3.9860          |
| 1.7431        | 92.3242  | 86000  | 4.0088          |
| 1.7024        | 94.4713  | 88000  | 4.0132          |
| 1.7024        | 96.6184  | 90000  | 4.0260          |
| 1.6687        | 98.7654  | 92000  | 4.0358          |
| 1.6687        | 100.9125 | 94000  | 4.0290          |
| 1.6369        | 103.0596 | 96000  | 4.0422          |
| 1.6369        | 105.2067 | 98000  | 4.0445          |
| 1.6129        | 107.3537 | 100000 | 4.0459          |
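
Validation loss reaches its minimum of 3.5650 at step 36000 and rises steadily thereafter while training loss continues to fall, a typical overfitting pattern; an earlier checkpoint may generalize better than the final one.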

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model size

  • 12.7M parameters
  • Tensor type: F32 (Safetensors)
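
As a hedged loading sketch: the card does not state the architecture, so the causal-LM `Auto` class below is an assumption, and the repo id is a placeholder to be replaced with the actual one.

```python
# Usage sketch under stated assumptions: repo id is hypothetical, and the
# checkpoint is assumed to be a causal language model; swap the Auto class
# if the architecture differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/wikipedia_30"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The history of Wikipedia", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```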