T5LAE

This model is a fine-tuned version of an unspecified base model, trained on the HuggingFaceFW/fineweb sample-10BT dataset. It achieves the following results on the evaluation set (a rough perplexity conversion is sketched below the metrics):

  • Loss: 6.3530
  • Accuracy: 0.0323
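
For scale, if the reported loss is the mean per-token cross-entropy in nats (the usual convention for Transformers language-modeling evaluation, though the card does not say so explicitly), it corresponds to a perplexity of roughly 574. A minimal sketch of the conversion:

```python
import math

# Evaluation loss reported in this card.
eval_loss = 6.3530

# Perplexity is exp(cross-entropy), assuming the loss is the mean
# per-token cross-entropy in nats (an assumption, not stated in the card).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 574.2
```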

Model description

More information needed; the checkpoint contains 60.5M parameters, stored as F32 safetensors.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 200000
  • mixed_precision_training: Native AMP
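
For reference, these settings map onto the transformers Trainer API roughly as sketched below. This is not the original training script: output_dir is a placeholder, the model and data setup are omitted, and Native AMP is approximated with fp16=True; launching on the 2 GPUs (e.g. via torchrun) is what yields the total batch size of 16.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments mirroring the hyperparameters above.
training_args = TrainingArguments(
    output_dir="t5lae-output",        # placeholder, not the author's path
    learning_rate=5e-5,
    per_device_train_batch_size=8,    # x 2 GPUs = total batch size 16
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=200_000,
    fp16=True,                        # "Native AMP" mixed precision
)
```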

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| 7.8503        | 0.01   | 2000   | 7.6894          | 0.0317   |
| 7.3885        | 0.02   | 4000   | 7.3045          | 0.0291   |
| 7.2248        | 0.03   | 6000   | 7.1483          | 0.0295   |
| 7.146         | 0.04   | 8000   | 7.0598          | 0.0298   |
| 7.098         | 0.05   | 10000  | 7.0069          | 0.0293   |
| 7.059         | 0.06   | 12000  | 6.9745          | 0.0304   |
| 7.036         | 0.07   | 14000  | 6.9492          | 0.0294   |
| 7.0083        | 0.08   | 16000  | 6.9298          | 0.0290   |
| 6.9703        | 0.09   | 18000  | 6.9145          | 0.0294   |
| 6.961         | 0.1    | 20000  | 6.9006          | 0.0303   |
| 6.9502        | 0.11   | 22000  | 6.8869          | 0.0302   |
| 6.9297        | 0.12   | 24000  | 6.8809          | 0.0282   |
| 6.9577        | 0.13   | 26000  | 6.8740          | 0.0288   |
| 6.9097        | 0.14   | 28000  | 6.8537          | 0.0290   |
| 6.9034        | 0.15   | 30000  | 6.8485          | 0.0293   |
| 6.9243        | 0.16   | 32000  | 6.8369          | 0.0292   |
| 6.8998        | 0.17   | 34000  | 6.8280          | 0.0297   |
| 6.8914        | 0.18   | 36000  | 6.8237          | 0.0289   |
| 6.8788        | 0.19   | 38000  | 6.8096          | 0.0306   |
| 6.8585        | 0.2    | 40000  | 6.8057          | 0.0295   |
| 6.8719        | 0.21   | 42000  | 6.7966          | 0.0313   |
| 6.8534        | 0.22   | 44000  | 6.7896          | 0.0297   |
| 6.8463        | 1.0067 | 46000  | 6.7795          | 0.0312   |
| 6.8588        | 1.0167 | 48000  | 6.7659          | 0.0304   |
| 6.8477        | 1.0267 | 50000  | 6.7667          | 0.0293   |
| 6.8268        | 1.0367 | 52000  | 6.7545          | 0.0301   |
| 6.8205        | 1.0467 | 54000  | 6.7439          | 0.0308   |
| 6.8035        | 1.0567 | 56000  | 6.7329          | 0.0297   |
| 6.7904        | 1.0667 | 58000  | 6.7233          | 0.0314   |
| 6.781         | 1.0767 | 60000  | 6.7235          | 0.0290   |
| 6.7722        | 1.0867 | 62000  | 6.7047          | 0.0311   |
| 6.7618        | 1.0967 | 64000  | 6.6947          | 0.0315   |
| 6.7821        | 1.1067 | 66000  | 6.6881          | 0.0309   |
| 6.7478        | 1.1167 | 68000  | 6.6781          | 0.0313   |
| 6.7544        | 1.1267 | 70000  | 6.6677          | 0.0292   |
| 6.7451        | 1.1367 | 72000  | 6.6529          | 0.0314   |
| 6.738         | 1.1467 | 74000  | 6.6436          | 0.0316   |
| 6.7223        | 1.1567 | 76000  | 6.6381          | 0.0312   |
| 6.7099        | 1.1667 | 78000  | 6.6245          | 0.0321   |
| 6.6851        | 1.1767 | 80000  | 6.6122          | 0.0311   |
| 6.6702        | 1.1867 | 82000  | 6.5993          | 0.0314   |
| 6.6761        | 1.1967 | 84000  | 6.5896          | 0.0317   |
| 6.6701        | 1.2067 | 86000  | 6.5855          | 0.0302   |
| 6.6696        | 1.2167 | 88000  | 6.5767          | 0.0313   |
| 6.6283        | 2.0035 | 90000  | 6.5673          | 0.0312   |
| 6.6662        | 2.0135 | 92000  | 6.5728          | 0.0307   |
| 6.6544        | 2.0235 | 94000  | 6.5492          | 0.0310   |
| 6.634         | 2.0335 | 96000  | 6.5433          | 0.0319   |
| 6.63          | 2.0435 | 98000  | 6.5395          | 0.0318   |
| 6.6022        | 2.0535 | 100000 | 6.5284          | 0.0318   |
| 6.5875        | 2.0635 | 102000 | 6.5209          | 0.0316   |
| 6.6115        | 2.0735 | 104000 | 6.5107          | 0.0320   |
| 6.5769        | 2.0835 | 106000 | 6.5118          | 0.0318   |
| 6.5941        | 2.0935 | 108000 | 6.4977          | 0.0312   |
| 6.5838        | 2.1035 | 110000 | 6.4884          | 0.0326   |
| 6.579         | 2.1135 | 112000 | 6.4919          | 0.0316   |
| 6.5642        | 2.1235 | 114000 | 6.4880          | 0.0318   |
| 6.5825        | 2.1335 | 116000 | 6.4747          | 0.0325   |
| 6.5625        | 2.1435 | 118000 | 6.4699          | 0.0310   |
| 6.5352        | 2.1535 | 120000 | 6.4664          | 0.0323   |
| 6.5174        | 2.1635 | 122000 | 6.4611          | 0.0320   |
| 6.5338        | 2.1735 | 124000 | 6.4618          | 0.0323   |
| 6.5264        | 2.1835 | 126000 | 6.4524          | 0.0320   |
| 6.533         | 2.1935 | 128000 | 6.4500          | 0.0315   |
| 6.5256        | 2.2035 | 130000 | 6.4433          | 0.0314   |
| 6.5293        | 2.2135 | 132000 | 6.4379          | 0.0316   |
| 6.5199        | 3.0002 | 134000 | 6.4395          | 0.0324   |
| 6.5356        | 3.0102 | 136000 | 6.4327          | 0.0321   |
| 6.4831        | 3.0202 | 138000 | 6.4207          | 0.0322   |
| 6.5051        | 3.0302 | 140000 | 6.4205          | 0.0311   |
| 6.5076        | 3.0402 | 142000 | 6.4148          | 0.0326   |
| 6.5085        | 3.0502 | 144000 | 6.4078          | 0.0323   |
| 6.5023        | 3.0602 | 146000 | 6.4070          | 0.0325   |
| 6.5019        | 3.0702 | 148000 | 6.4053          | 0.0331   |
| 6.4881        | 3.0802 | 150000 | 6.4011          | 0.0323   |
| 6.4642        | 3.0902 | 152000 | 6.4023          | 0.0316   |
| 6.4711        | 3.1002 | 154000 | 6.3948          | 0.0320   |
| 6.4713        | 3.1102 | 156000 | 6.3942          | 0.0323   |
| 6.461         | 3.1202 | 158000 | 6.3899          | 0.0319   |
| 6.4891        | 3.1302 | 160000 | 6.3877          | 0.0319   |
| 6.454         | 3.1402 | 162000 | 6.3834          | 0.0318   |
| 6.4456        | 3.1502 | 164000 | 6.3858          | 0.0319   |
| 6.4825        | 3.1602 | 166000 | 6.3827          | 0.0325   |
| 6.4563        | 3.1702 | 168000 | 6.3758          | 0.0321   |
| 6.4595        | 3.1802 | 170000 | 6.3755          | 0.0320   |
| 6.4525        | 3.1902 | 172000 | 6.3731          | 0.0319   |
| 6.4332        | 3.2002 | 174000 | 6.3691          | 0.0320   |
| 6.4656        | 3.2102 | 176000 | 6.3682          | 0.0318   |
| 6.4312        | 3.2202 | 178000 | 6.3672          | 0.0323   |
| 6.4439        | 4.0069 | 180000 | 6.3707          | 0.0317   |
| 6.4629        | 4.0169 | 182000 | 6.3619          | 0.0323   |
| 6.4505        | 4.0269 | 184000 | 6.3633          | 0.0324   |
| 6.4294        | 4.0369 | 186000 | 6.3594          | 0.0324   |
| 6.4427        | 4.0469 | 188000 | 6.3580          | 0.0319   |
| 6.4237        | 4.0569 | 190000 | 6.3600          | 0.0321   |
| 6.4201        | 4.0669 | 192000 | 6.3591          | 0.0322   |
| 6.4308        | 4.0769 | 194000 | 6.3554          | 0.0322   |
| 6.4349        | 4.0869 | 196000 | 6.3535          | 0.0323   |
| 6.4181        | 4.0969 | 198000 | 6.3542          | 0.0322   |
| 6.4385        | 4.1069 | 200000 | 6.3530          | 0.0323   |
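
The validation loss decreases steadily from 7.6894 at step 2,000 to 6.3530 at step 200,000, with accuracy inching up from 0.0317 to 0.0323. To visualize the curve, a minimal matplotlib sketch using a thinned subset of the rows above:

```python
import matplotlib.pyplot as plt

# (step, validation loss) pairs taken from the table above,
# sampled roughly every 20k steps for readability.
steps = [2_000, 20_000, 40_000, 60_000, 80_000, 100_000,
         120_000, 140_000, 160_000, 180_000, 200_000]
val_loss = [7.6894, 6.9006, 6.8057, 6.7235, 6.6122, 6.5284,
            6.4664, 6.4205, 6.3877, 6.3707, 6.3530]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("T5LAE validation loss")
plt.tight_layout()
plt.savefig("t5lae_val_loss.png")
```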

Framework versions

  • Transformers 4.49.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
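
To try the checkpoint, something like the following should work, assuming the model loads through the standard transformers auto classes; if T5LAE uses a custom architecture, trust_remote_code=True or the author's own loading code may be required, and the seq2seq head used here is an assumption:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id from this card. Loading via the generic auto classes is an
# assumption; a custom architecture may need trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5LAE")
model = AutoModelForSeq2SeqLM.from_pretrained("hrezaei/T5LAE")

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```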