---
library_name: transformers
tags:
  - generated_from_trainer
model-index:
  - name: wikipedia_30
    results: []
---

wikipedia_30

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2325

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
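
Assuming `lr_scheduler_type: linear` denotes the standard Transformers schedule (linear warmup to the base learning rate, then linear decay to zero), the learning-rate curve implied by the hyperparameters above can be sketched as follows; the function name `linear_lr` is illustrative, not taken from the training code:

```python
def linear_lr(step: int,
              base_lr: float = 1e-4,
              warmup_steps: int = 40_000,
              total_steps: int = 100_000) -> float:
    """Linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device train batch * gradient accumulation steps.
effective_batch = 16 * 2  # matches total_train_batch_size: 32

print(linear_lr(0))        # 0.0 at the first step
print(linear_lr(40_000))   # peak of 1e-4 at the end of warmup
print(linear_lr(100_000))  # back to 0.0 at the final training step
```

The long warmup (40% of the 100,000 training steps) means the model only sees the peak learning rate for the second half of training.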

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.0518  | 2000   | 7.5189          |
| 7.6141        | 2.1036  | 4000   | 6.5682          |
| 7.6141        | 3.1554  | 6000   | 6.1693          |
| 6.2396        | 4.2072  | 8000   | 5.9016          |
| 6.2396        | 5.2590  | 10000  | 5.6737          |
| 5.7217        | 6.3108  | 12000  | 5.4600          |
| 5.7217        | 7.3626  | 14000  | 5.2725          |
| 5.3064        | 8.4144  | 16000  | 5.1075          |
| 5.3064        | 9.4662  | 18000  | 4.9717          |
| 4.9744        | 10.5180 | 20000  | 4.8649          |
| 4.9744        | 11.5698 | 22000  | 4.7635          |
| 4.7273        | 12.6216 | 24000  | 4.6833          |
| 4.7273        | 13.6734 | 26000  | 4.6216          |
| 4.5397        | 14.7252 | 28000  | 4.5626          |
| 4.5397        | 15.7770 | 30000  | 4.5062          |
| 4.3839        | 16.8288 | 32000  | 4.4644          |
| 4.3839        | 17.8806 | 34000  | 4.4263          |
| 4.2529        | 18.9324 | 36000  | 4.3944          |
| 4.2529        | 19.9842 | 38000  | 4.3651          |
| 4.1409        | 21.0360 | 40000  | 4.3513          |
| 4.1409        | 22.0878 | 42000  | 4.3216          |
| 4.0334        | 23.1396 | 44000  | 4.3008          |
| 4.0334        | 24.1914 | 46000  | 4.2791          |
| 3.9378        | 25.2432 | 48000  | 4.2719          |
| 3.9378        | 26.2950 | 50000  | 4.2624          |
| 3.8595        | 27.3468 | 52000  | 4.2532          |
| 3.8595        | 28.3986 | 54000  | 4.2432          |
| 3.7927        | 29.4504 | 56000  | 4.2378          |
| 3.7927        | 30.5022 | 58000  | 4.2377          |
| 3.7367        | 31.5540 | 60000  | 4.2334          |
| 3.7367        | 32.6058 | 62000  | 4.2300          |
| 3.6869        | 33.6576 | 64000  | 4.2269          |
| 3.6869        | 34.7094 | 66000  | 4.2245          |
| 3.6416        | 35.7612 | 68000  | 4.2257          |
| 3.6416        | 36.8130 | 70000  | 4.2206          |
| 3.6032        | 37.8648 | 72000  | 4.2247          |
| 3.6032        | 38.9166 | 74000  | 4.2272          |
| 3.5669        | 39.9684 | 76000  | 4.2262          |
| 3.5669        | 41.0202 | 78000  | 4.2309          |
| 3.533         | 42.0720 | 80000  | 4.2304          |
| 3.533         | 43.1238 | 82000  | 4.2290          |
| 3.5033        | 44.1757 | 84000  | 4.2331          |
| 3.5033        | 45.2275 | 86000  | 4.2322          |
| 3.4759        | 46.2793 | 88000  | 4.2311          |
| 3.4759        | 47.3311 | 90000  | 4.2341          |
| 3.4528        | 48.3829 | 92000  | 4.2348          |
| 3.4528        | 49.4347 | 94000  | 4.2326          |
| 3.4319        | 50.4865 | 96000  | 4.2331          |
| 3.4319        | 51.5383 | 98000  | 4.2326          |
| 3.4136        | 52.5901 | 100000 | 4.2325          |
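
The validation loss plateaus around 4.23 from roughly step 60,000 onward. Assuming it is the usual per-token cross-entropy in nats, the final value can be converted to perplexity with a one-liner; the figure 4.2325 is taken from the table above:

```python
import math

final_val_loss = 4.2325          # final validation loss from the table above
perplexity = math.exp(final_val_loss)
print(f"{perplexity:.1f}")       # approximately 68.9
```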

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1