T5LAA

This model was fine-tuned on the HuggingFaceFW/fineweb sample-10BT dataset. It achieves the following results on the evaluation set (a rough perplexity conversion follows the list):

  • Loss: 4.8058
  • Accuracy: 0.0279
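
For intuition, if the reported loss is mean per-token cross-entropy in nats (the usual convention for Transformers language-model evaluation, though the card does not state it explicitly), it corresponds to a perplexity of roughly exp(4.8058) ≈ 122:

```python
import math

# Assumption: eval loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
eval_loss = 4.8058
print(math.exp(eval_loss))  # ~122.2
```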

Model description

A 77M-parameter model stored as float32 safetensors weights (per the repository metadata); further architectural details are not yet documented.

Intended uses & limitations

More information needed
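
As a starting point, here is a minimal loading sketch. It assumes the repository ships a tokenizer and uses the generic `AutoModel` head, since the card does not document the architecture or task head:

```python
from transformers import AutoModel, AutoTokenizer

# Hedged sketch: assumes a tokenizer is bundled with the checkpoint
# and that the generic head is sufficient; neither is confirmed here.
tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5LAA")
model = AutoModel.from_pretrained("hrezaei/T5LAA")
```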

Training and evaluation data

More information needed
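
The card names the sample-10BT subset of HuggingFaceFW/fineweb. A minimal sketch of streaming it with 🤗 Datasets (the split and the `text` field follow the public dataset, not this card):

```python
from datasets import load_dataset

# Stream the ~10B-token FineWeb sample rather than downloading it in full.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb", name="sample-10BT", split="train", streaming=True
)
print(next(iter(fineweb))["text"][:200])  # peek at one document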

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 200000
  • mixed_precision_training: Native AMP
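
A minimal sketch of how these settings might map onto `transformers.TrainingArguments`; the output path is hypothetical, and the per-device batch sizes are inferred from the two-GPU totals above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="t5laa-finetune",    # hypothetical path, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x2 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,   # x2 GPUs -> total eval batch size 16
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=200_000,
    fp16=True,                      # "Native AMP" mixed precision
)
```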

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| 8.7605 | 0.005 | 1000 | 8.5073 | 0.0339 |
| 8.0954 | 0.01 | 2000 | 8.0179 | 0.0310 |
| 7.7188 | 0.015 | 3000 | 7.6839 | 0.0308 |
| 7.4459 | 0.02 | 4000 | 7.4330 | 0.0329 |
| 7.2526 | 0.025 | 5000 | 7.2564 | 0.0323 |
| 7.1018 | 0.03 | 6000 | 7.1287 | 0.0335 |
| 7.014 | 0.035 | 7000 | 7.0243 | 0.0341 |
| 6.9585 | 0.04 | 8000 | 6.9537 | 0.0316 |
| 6.9082 | 0.045 | 9000 | 6.8731 | 0.0329 |
| 6.8857 | 0.05 | 10000 | 6.8224 | 0.0326 |
| 6.8166 | 0.055 | 11000 | 6.8210 | 0.0324 |
| 6.8225 | 0.06 | 12000 | 6.7650 | 0.0334 |
| 6.791 | 0.065 | 13000 | 6.7341 | 0.0322 |
| 6.7786 | 0.07 | 14000 | 6.7270 | 0.0329 |
| 6.7516 | 0.075 | 15000 | 6.6738 | 0.0336 |
| 6.7343 | 0.08 | 16000 | 6.6957 | 0.0337 |
| 6.7027 | 0.085 | 17000 | 6.6473 | 0.0333 |
| 6.6741 | 0.09 | 18000 | 6.6254 | 0.0345 |
| 6.6426 | 0.095 | 19000 | 6.6426 | 0.0339 |
| 6.6475 | 0.1 | 20000 | 6.6046 | 0.0330 |
| 6.6649 | 0.105 | 21000 | 6.5704 | 0.0342 |
| 6.619 | 0.11 | 22000 | 6.5711 | 0.0324 |
| 6.6216 | 0.115 | 23000 | 6.5813 | 0.0320 |
| 6.5812 | 0.12 | 24000 | 6.5470 | 0.0331 |
| 6.5995 | 0.125 | 25000 | 6.5184 | 0.0338 |
| 6.5891 | 0.13 | 26000 | 6.5082 | 0.0333 |
| 6.5767 | 0.135 | 27000 | 6.4814 | 0.0328 |
| 6.5387 | 0.14 | 28000 | 6.5033 | 0.0324 |
| 6.5427 | 0.145 | 29000 | 6.4800 | 0.0319 |
| 6.5139 | 0.15 | 30000 | 6.4772 | 0.0314 |
| 6.5186 | 0.155 | 31000 | 6.4465 | 0.0323 |
| 6.5233 | 0.16 | 32000 | 6.4228 | 0.0326 |
| 6.4659 | 0.165 | 33000 | 6.4369 | 0.0318 |
| 6.4819 | 0.17 | 34000 | 6.3976 | 0.0338 |
| 6.4735 | 0.175 | 35000 | 6.4116 | 0.0330 |
| 6.4659 | 0.18 | 36000 | 6.4191 | 0.0313 |
| 6.443 | 0.185 | 37000 | 6.3790 | 0.0323 |
| 6.448 | 0.19 | 38000 | 6.3910 | 0.0316 |
| 6.421 | 0.195 | 39000 | 6.3719 | 0.0322 |
| 6.4127 | 0.2 | 40000 | 6.3744 | 0.0320 |
| 6.4213 | 0.205 | 41000 | 6.3811 | 0.0315 |
| 6.42 | 0.21 | 42000 | 6.3516 | 0.0311 |
| 6.414 | 0.215 | 43000 | 6.3339 | 0.0310 |
| 6.3899 | 0.22 | 44000 | 6.3502 | 0.0315 |
| 6.3715 | 1.0017 | 45000 | 6.3191 | 0.0314 |
| 6.3588 | 1.0067 | 46000 | 6.3087 | 0.0315 |
| 6.3802 | 1.0117 | 47000 | 6.2925 | 0.0315 |
| 6.3708 | 1.0167 | 48000 | 6.3044 | 0.0318 |
| 6.3189 | 1.0217 | 49000 | 6.3186 | 0.0308 |
| 6.3545 | 1.0267 | 50000 | 6.3024 | 0.0307 |
| 6.3255 | 1.0317 | 51000 | 6.3016 | 0.0306 |
| 6.3162 | 1.0367 | 52000 | 6.2832 | 0.0317 |
| 6.309 | 1.0417 | 53000 | 6.2734 | 0.0305 |
| 6.314 | 1.0467 | 54000 | 6.2505 | 0.0312 |
| 6.293 | 1.0517 | 55000 | 6.2592 | 0.0317 |
| 6.2813 | 1.0567 | 56000 | 6.2271 | 0.0308 |
| 6.2781 | 1.0617 | 57000 | 6.2520 | 0.0305 |
| 6.2625 | 1.0667 | 58000 | 6.2200 | 0.0309 |
| 6.2638 | 1.0717 | 59000 | 6.1990 | 0.0301 |
| 6.2455 | 1.0767 | 60000 | 6.2035 | 0.0311 |
| 6.253 | 1.0817 | 61000 | 6.2160 | 0.0314 |
| 6.2408 | 1.0867 | 62000 | 6.2086 | 0.0301 |
| 6.2332 | 1.0917 | 63000 | 6.1925 | 0.0298 |
| 6.2182 | 1.0967 | 64000 | 6.1664 | 0.0304 |
| 6.2301 | 1.1017 | 65000 | 6.1583 | 0.0303 |
| 6.2379 | 1.1067 | 66000 | 6.1884 | 0.0305 |
| 6.2211 | 1.1117 | 67000 | 6.1614 | 0.0311 |
| 6.2018 | 1.1167 | 68000 | 6.1608 | 0.0307 |
| 6.1969 | 1.1217 | 69000 | 6.1333 | 0.0305 |
| 6.1989 | 1.1267 | 70000 | 6.1314 | 0.0302 |
| 6.2058 | 1.1317 | 71000 | 6.1510 | 0.0299 |
| 6.1994 | 1.1367 | 72000 | 6.1442 | 0.0295 |
| 6.1715 | 1.1417 | 73000 | 6.1396 | 0.0299 |
| 6.1849 | 1.1467 | 74000 | 6.1083 | 0.0300 |
| 6.1709 | 1.1517 | 75000 | 6.0837 | 0.0302 |
| 6.1669 | 1.1567 | 76000 | 6.0925 | 0.0292 |
| 6.16 | 1.1617 | 77000 | 6.0939 | 0.0292 |
| 6.1637 | 1.1667 | 78000 | 6.0950 | 0.0297 |
| 6.1446 | 1.1717 | 79000 | 6.0897 | 0.0293 |
| 6.1231 | 1.1767 | 80000 | 6.0780 | 0.0298 |
| 6.1287 | 1.1817 | 81000 | 6.0912 | 0.0290 |
| 6.1196 | 1.1867 | 82000 | 6.0849 | 0.0290 |
| 6.1136 | 1.1917 | 83000 | 6.0616 | 0.0294 |
| 6.1135 | 1.1967 | 84000 | 6.0516 | 0.0291 |
| 6.1157 | 1.2017 | 85000 | 6.0517 | 0.0296 |
| 6.1102 | 1.2067 | 86000 | 6.0622 | 0.0291 |
| 6.1218 | 1.2117 | 87000 | 6.0639 | 0.0285 |
| 6.1104 | 1.2167 | 88000 | 6.0515 | 0.0290 |
| 6.0777 | 1.2217 | 89000 | 6.0191 | 0.0295 |
| 6.051 | 2.0035 | 90000 | 6.0048 | 0.0287 |
| 6.065 | 2.0085 | 91000 | 6.0302 | 0.0288 |
| 6.0941 | 2.0135 | 92000 | 6.0298 | 0.0284 |
| 6.0833 | 2.0185 | 93000 | 6.0141 | 0.0287 |
| 6.0816 | 2.0235 | 94000 | 6.0137 | 0.0281 |
| 6.0771 | 2.0285 | 95000 | 6.0285 | 0.0290 |
| 6.0646 | 2.0335 | 96000 | 6.0099 | 0.0277 |
| 6.0421 | 2.0385 | 97000 | 6.0031 | 0.0294 |
| 6.0477 | 2.0435 | 98000 | 5.9979 | 0.0280 |
| 6.0317 | 2.0485 | 99000 | 5.9879 | 0.0286 |
| 6.0236 | 2.0535 | 100000 | 5.9789 | 0.0286 |
| 6.0245 | 2.0585 | 101000 | 5.9813 | 0.0286 |
| 6.0046 | 2.0635 | 102000 | 5.9600 | 0.0272 |
| 6.0089 | 2.0685 | 103000 | 5.9696 | 0.0282 |
| 6.0268 | 2.0735 | 104000 | 5.9631 | 0.0284 |
| 6.015 | 2.0785 | 105000 | 5.9860 | 0.0279 |
| 5.9978 | 2.0835 | 106000 | 5.9594 | 0.0282 |
| 6.0095 | 2.0885 | 107000 | 5.9667 | 0.0280 |
| 6.008 | 2.0935 | 108000 | 5.9561 | 0.0275 |
| 5.9912 | 2.0985 | 109000 | 5.9748 | 0.0278 |
| 6.0 | 2.1035 | 110000 | 5.9513 | 0.0279 |
| 5.9981 | 2.1085 | 111000 | 5.9358 | 0.0277 |
| 5.9877 | 2.1135 | 112000 | 5.9350 | 0.0279 |
| 5.9726 | 2.1185 | 113000 | 5.9340 | 0.0278 |
| 5.9696 | 2.1235 | 114000 | 5.9248 | 0.0274 |
| 5.9842 | 2.1285 | 115000 | 5.9515 | 0.0273 |
| 5.9919 | 2.1335 | 116000 | 5.9237 | 0.0277 |
| 5.972 | 2.1385 | 117000 | 5.9278 | 0.0270 |
| 5.9715 | 2.1435 | 118000 | 5.9110 | 0.0268 |
| 5.9727 | 2.1485 | 119000 | 5.9139 | 0.0275 |
| 5.9427 | 2.1535 | 120000 | 5.9278 | 0.0273 |
| 5.9514 | 2.1585 | 121000 | 5.9227 | 0.0269 |
| 5.9217 | 2.1635 | 122000 | 5.9305 | 0.0273 |
| 5.9862 | 2.1685 | 123000 | 5.9092 | 0.0267 |
| 5.9388 | 2.1735 | 124000 | 5.8899 | 0.0270 |
| 5.9429 | 2.1785 | 125000 | 5.8950 | 0.0267 |
| 5.9317 | 2.1835 | 126000 | 5.9110 | 0.0268 |
| 5.9367 | 2.1885 | 127000 | 5.8681 | 0.0268 |
| 5.9273 | 2.1935 | 128000 | 5.8802 | 0.0274 |
| 5.934 | 2.1985 | 129000 | 5.8973 | 0.0268 |
| 5.9229 | 2.2035 | 130000 | 5.8916 | 0.0270 |
| 5.942 | 2.2085 | 131000 | 5.8965 | 0.0266 |
| 5.9224 | 2.2135 | 132000 | 5.8800 | 0.0268 |
| 5.936 | 2.2185 | 133000 | 5.8693 | 0.0269 |
| 5.9129 | 3.0002 | 134000 | 5.8501 | 0.0265 |
| 5.8787 | 3.0052 | 135000 | 5.8702 | 0.0267 |
| 5.9171 | 3.0102 | 136000 | 5.8449 | 0.0269 |
| 5.8931 | 3.0152 | 137000 | 5.8457 | 0.0270 |
| 5.8612 | 3.0202 | 138000 | 5.8630 | 0.0263 |
| 5.8897 | 3.0252 | 139000 | 5.8497 | 0.0267 |
| 5.8772 | 3.0302 | 140000 | 5.8177 | 0.0263 |
| 5.8774 | 3.0352 | 141000 | 5.8212 | 0.0266 |
| 5.8694 | 3.0402 | 142000 | 5.8374 | 0.0267 |
| 5.8561 | 3.0452 | 143000 | 5.7928 | 0.0267 |
| 5.8658 | 3.0502 | 144000 | 5.7936 | 0.0269 |
| 5.8295 | 3.0552 | 145000 | 5.7956 | 0.0265 |
| 5.8444 | 3.0602 | 146000 | 5.7924 | 0.0264 |
| 5.8318 | 3.0652 | 147000 | 5.7651 | 0.0265 |
| 5.8323 | 3.0702 | 148000 | 5.7701 | 0.0268 |
| 5.8239 | 3.0752 | 149000 | 5.7793 | 0.0264 |
| 5.8057 | 3.0802 | 150000 | 5.7676 | 0.0274 |
| 5.7818 | 3.0852 | 151000 | 5.7569 | 0.0270 |
| 5.773 | 3.0902 | 152000 | 5.7408 | 0.0267 |
| 5.7491 | 3.0952 | 153000 | 5.7206 | 0.0274 |
| 5.7655 | 3.1002 | 154000 | 5.7095 | 0.0268 |
| 5.7706 | 3.1052 | 155000 | 5.7079 | 0.0272 |
| 5.7379 | 3.1102 | 156000 | 5.6919 | 0.0273 |
| 5.7374 | 3.1152 | 157000 | 5.6678 | 0.0274 |
| 5.7077 | 3.1202 | 158000 | 5.6482 | 0.0270 |
| 5.7176 | 3.1252 | 159000 | 5.6142 | 0.0274 |
| 5.7077 | 3.1302 | 160000 | 5.6299 | 0.0275 |
| 5.6882 | 3.1352 | 161000 | 5.5914 | 0.0275 |
| 5.6513 | 3.1402 | 162000 | 5.5857 | 0.0272 |
| 5.6516 | 3.1452 | 163000 | 5.5584 | 0.0274 |
| 5.6158 | 3.1502 | 164000 | 5.5223 | 0.0281 |
| 5.6235 | 3.1552 | 165000 | 5.5276 | 0.0277 |
| 5.6308 | 3.1602 | 166000 | 5.4992 | 0.0282 |
| 5.5782 | 3.1652 | 167000 | 5.4890 | 0.0276 |
| 5.5723 | 3.1702 | 168000 | 5.4436 | 0.0279 |
| 5.5417 | 3.1752 | 169000 | 5.4166 | 0.0284 |
| 5.5346 | 3.1802 | 170000 | 5.4036 | 0.0285 |
| 5.5068 | 3.1852 | 171000 | 5.3664 | 0.0285 |
| 5.5024 | 3.1902 | 172000 | 5.3372 | 0.0286 |
| 5.4611 | 3.1952 | 173000 | 5.3065 | 0.0286 |
| 5.4352 | 3.2002 | 174000 | 5.3051 | 0.0285 |
| 5.4305 | 3.2052 | 175000 | 5.2718 | 0.0290 |
| 5.4244 | 3.2102 | 176000 | 5.2341 | 0.0286 |
| 5.406 | 3.2152 | 177000 | 5.1970 | 0.0287 |
| 5.3693 | 3.2202 | 178000 | 5.1883 | 0.0288 |
| 5.3414 | 4.0019 | 179000 | 5.1566 | 0.0287 |
| 5.3252 | 4.0069 | 180000 | 5.1210 | 0.0291 |
| 5.3302 | 4.0119 | 181000 | 5.1127 | 0.0290 |
| 5.3112 | 4.0169 | 182000 | 5.0792 | 0.0289 |
| 5.2651 | 4.0219 | 183000 | 5.0433 | 0.0291 |
| 5.2623 | 4.0269 | 184000 | 5.0256 | 0.0288 |
| 5.2297 | 4.0319 | 185000 | 5.0291 | 0.0287 |
| 5.1991 | 4.0369 | 186000 | 4.9703 | 0.0288 |
| 5.1883 | 4.0419 | 187000 | 4.9758 | 0.0287 |
| 5.1854 | 4.0469 | 188000 | 4.9428 | 0.0282 |
| 5.1636 | 4.0519 | 189000 | 4.9118 | 0.0284 |
| 5.1356 | 4.0569 | 190000 | 4.9047 | 0.0282 |
| 5.1329 | 4.0619 | 191000 | 4.8749 | 0.0283 |
| 5.107 | 4.0669 | 192000 | 4.8771 | 0.0281 |
| 5.1159 | 4.0719 | 193000 | 4.8562 | 0.0280 |
| 5.0892 | 4.0769 | 194000 | 4.8465 | 0.0279 |
| 5.083 | 4.0819 | 195000 | 4.8258 | 0.0279 |
| 5.0824 | 4.0869 | 196000 | 4.8216 | 0.0280 |
| 5.0774 | 4.0919 | 197000 | 4.8172 | 0.0279 |
| 5.0567 | 4.0969 | 198000 | 4.8118 | 0.0278 |
| 5.0657 | 4.1019 | 199000 | 4.8077 | 0.0278 |
| 5.0751 | 4.1069 | 200000 | 4.8058 | 0.0279 |

Framework versions

  • Transformers 4.49.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0