wikipedia_42

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.3376
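If this is the standard mean cross-entropy loss (in nats), as is typical for Trainer-based language-model runs, it corresponds to a perplexity of roughly exp(4.3376) ≈ 76.5. A minimal check, assuming exactly that:

```python
import math

# Assuming the reported evaluation loss is mean cross-entropy in nats,
# perplexity is its exponential.
eval_loss = 4.3376
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 76.5
```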

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
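These settings map directly onto transformers.TrainingArguments. The following is a minimal sketch, assuming a single-device run (so a per-device batch size of 16 with 2 accumulation steps yields the listed effective batch size of 32); output_dir is a placeholder, and the Adam settings are the library defaults, spelled out for completeness:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wikipedia_42",      # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # 16 * 2 = total train batch size 32
    adam_beta1=0.9,                 # Adam defaults, listed for completeness
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                      # "Native AMP" mixed precision
)
```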

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.4657  | 2000   | 7.7675          |
| 7.8833        | 2.9315  | 4000   | 7.0282          |
| 7.8833        | 4.3972  | 6000   | 6.5853          |
| 6.6338        | 5.8630  | 8000   | 6.2212          |
| 6.6338        | 7.3287  | 10000  | 5.9097          |
| 5.9514        | 8.7944  | 12000  | 5.6314          |
| 5.9514        | 10.2602 | 14000  | 5.3996          |
| 5.4199        | 11.7259 | 16000  | 5.1990          |
| 5.4199        | 13.1916 | 18000  | 5.0405          |
| 5.0121        | 14.6574 | 20000  | 4.9063          |
| 5.0121        | 16.1231 | 22000  | 4.7988          |
| 4.7026        | 17.5889 | 24000  | 4.7078          |
| 4.7026        | 19.0546 | 26000  | 4.6257          |
| 4.457         | 20.5203 | 28000  | 4.5664          |
| 4.457         | 21.9861 | 30000  | 4.5069          |
| 4.2579        | 23.4518 | 32000  | 4.4624          |
| 4.2579        | 24.9176 | 34000  | 4.4199          |
| 4.0916        | 26.3833 | 36000  | 4.3874          |
| 4.0916        | 27.8490 | 38000  | 4.3623          |
| 3.9507        | 29.3148 | 40000  | 4.3443          |
| 3.9507        | 30.7805 | 42000  | 4.3140          |
| 3.821         | 32.2462 | 44000  | 4.3072          |
| 3.821         | 33.7120 | 46000  | 4.2900          |
| 3.7002        | 35.1777 | 48000  | 4.2812          |
| 3.7002        | 36.6435 | 50000  | 4.2770          |
| 3.6009        | 38.1092 | 52000  | 4.2762          |
| 3.6009        | 39.5749 | 54000  | 4.2695          |
| 3.5172        | 41.0407 | 56000  | 4.2709          |
| 3.5172        | 42.5064 | 58000  | 4.2759          |
| 3.4448        | 43.9722 | 60000  | 4.2693          |
| 3.4448        | 45.4379 | 62000  | 4.2815          |
| 3.3812        | 46.9036 | 64000  | 4.2788          |
| 3.3812        | 48.3694 | 66000  | 4.2915          |
| 3.3268        | 49.8351 | 68000  | 4.2839          |
| 3.3268        | 51.3008 | 70000  | 4.2940          |
| 3.2758        | 52.7666 | 72000  | 4.2919          |
| 3.2758        | 54.2323 | 74000  | 4.3084          |
| 3.2333        | 55.6981 | 76000  | 4.3099          |
| 3.2333        | 57.1638 | 78000  | 4.3111          |
| 3.1928        | 58.6295 | 80000  | 4.3121          |
| 3.1928        | 60.0953 | 82000  | 4.3197          |
| 3.1562        | 61.5610 | 84000  | 4.3232          |
| 3.1562        | 63.0267 | 86000  | 4.3240          |
| 3.1231        | 64.4925 | 88000  | 4.3278          |
| 3.1231        | 65.9582 | 90000  | 4.3292          |
| 3.0943        | 67.4240 | 92000  | 4.3349          |
| 3.0943        | 68.8897 | 94000  | 4.3352          |
| 3.0684        | 70.3554 | 96000  | 4.3382          |
| 3.0684        | 71.8212 | 98000  | 4.3372          |
| 3.0462        | 73.2869 | 100000 | 4.3376          |
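Note that the best validation loss (4.2693 at step 60,000) is reached well before training ends; validation loss then drifts back up while training loss keeps falling, which suggests some overfitting over the remaining steps. To read the table against the learning-rate schedule, here is a minimal sketch of the linear warmup/decay rule implied by the hyperparameters above (matching the behavior of transformers' get_linear_schedule_with_warmup):

```python
def linear_warmup_lr(step: int,
                     base_lr: float = 1e-4,
                     warmup_steps: int = 40_000,
                     total_steps: int = 100_000) -> float:
    """Learning rate at a given optimizer step: linear ramp up to base_lr
    over the warmup, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Peak LR at step 40,000; half the peak at step 70,000.
assert abs(linear_warmup_lr(40_000) - 1e-4) < 1e-12
assert abs(linear_warmup_lr(70_000) - 5e-5) < 1e-12
```

So the validation-loss minimum falls roughly a third of the way into the decay phase, while the learning rate is still above half its peak.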

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1