# T5LAA2
This model is a fine-tuned version of an unspecified base model, trained on the HuggingFaceFW/fineweb sample-10BT dataset. It achieves the following results on the evaluation set:
- Loss: 4.9746
- Accuracy: 0.0349
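For context, a per-token cross-entropy loss of 4.9746 (in nats) corresponds to a perplexity of exp(4.9746) ≈ 144.7. A minimal sketch of that conversion:

```python
import math

# Perplexity implied by the reported evaluation loss
# (per-token cross-entropy in nats).
eval_loss = 4.9746
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 144.7
```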
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 200000
- mixed_precision_training: Native AMP
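
These settings map directly onto `transformers.TrainingArguments`. The sketch below is an assumption of how the run could have been configured; the actual training script, model wiring, and `output_dir` are not given in this card:

```python
from transformers import TrainingArguments

# A sketch mirroring the hyperparameters above. The two-GPU setup is handled
# by the launcher (e.g. `torchrun --nproc_per_node=2 train.py`), which gives
# the total batch size of 16 (8 per device x 2 devices).
args = TrainingArguments(
    output_dir="T5LAA2",            # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=200_000,
    fp16=True,                      # "Native AMP" mixed precision
)
```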
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:--------------|:------|:-----|:----------------|:---------|
8.1082 | 0.01 | 2000 | 8.0085 | 0.0310 |
7.4447 | 0.02 | 4000 | 7.4334 | 0.0315 |
7.0973 | 0.03 | 6000 | 7.1390 | 0.0314 |
6.9575 | 0.04 | 8000 | 6.9287 | 0.0324 |
6.8881 | 0.05 | 10000 | 6.8389 | 0.0337 |
6.8308 | 0.06 | 12000 | 6.7560 | 0.0331 |
6.7858 | 0.07 | 14000 | 6.7402 | 0.0337 |
6.7431 | 0.08 | 16000 | 6.6976 | 0.0337 |
6.6837 | 0.09 | 18000 | 6.6293 | 0.0340 |
6.66 | 0.1 | 20000 | 6.6135 | 0.0328 |
6.6303 | 0.11 | 22000 | 6.5921 | 0.0325 |
6.5921 | 0.12 | 24000 | 6.5393 | 0.0337 |
6.6028 | 0.13 | 26000 | 6.5255 | 0.0329 |
6.5503 | 0.14 | 28000 | 6.5147 | 0.0326 |
6.5273 | 0.15 | 30000 | 6.4799 | 0.0321 |
6.5339 | 0.16 | 32000 | 6.4449 | 0.0319 |
6.4968 | 0.17 | 34000 | 6.4403 | 0.0318 |
6.4781 | 0.18 | 36000 | 6.4197 | 0.0320 |
6.4579 | 0.19 | 38000 | 6.4102 | 0.0317 |
6.4236 | 0.2 | 40000 | 6.3859 | 0.0313 |
6.4295 | 0.21 | 42000 | 6.3818 | 0.0306 |
6.3988 | 0.22 | 44000 | 6.3365 | 0.0327 |
6.3703 | 1.0067 | 46000 | 6.3021 | 0.0318 |
6.3803 | 1.0167 | 48000 | 6.3204 | 0.0320 |
6.3618 | 1.0267 | 50000 | 6.3032 | 0.0312 |
6.323 | 1.0367 | 52000 | 6.2990 | 0.0305 |
6.3208 | 1.0467 | 54000 | 6.2684 | 0.0312 |
6.2884 | 1.0567 | 56000 | 6.2435 | 0.0305 |
6.2682 | 1.0667 | 58000 | 6.2377 | 0.0301 |
6.2536 | 1.0767 | 60000 | 6.1934 | 0.0303 |
6.2466 | 1.0867 | 62000 | 6.2002 | 0.0301 |
6.222 | 1.0967 | 64000 | 6.1915 | 0.0300 |
6.243 | 1.1067 | 66000 | 6.1834 | 0.0293 |
6.2053 | 1.1167 | 68000 | 6.1616 | 0.0299 |
6.2029 | 1.1267 | 70000 | 6.1284 | 0.0296 |
6.198 | 1.1367 | 72000 | 6.1381 | 0.0292 |
6.1838 | 1.1467 | 74000 | 6.1051 | 0.0299 |
6.1672 | 1.1567 | 76000 | 6.0780 | 0.0289 |
6.1604 | 1.1667 | 78000 | 6.0737 | 0.0288 |
6.1217 | 1.1767 | 80000 | 6.0762 | 0.0286 |
6.1147 | 1.1867 | 82000 | 6.0566 | 0.0287 |
6.1067 | 1.1967 | 84000 | 6.0456 | 0.0286 |
6.1013 | 1.2067 | 86000 | 6.0242 | 0.0284 |
6.0998 | 1.2167 | 88000 | 6.0249 | 0.0281 |
6.0444 | 2.0035 | 90000 | 6.0138 | 0.0274 |
6.0844 | 2.0135 | 92000 | 6.0001 | 0.0274 |
6.0707 | 2.0235 | 94000 | 5.9964 | 0.0274 |
6.0536 | 2.0335 | 96000 | 5.9791 | 0.0271 |
6.0356 | 2.0435 | 98000 | 5.9854 | 0.0272 |
6.0136 | 2.0535 | 100000 | 5.9646 | 0.0273 |
5.9952 | 2.0635 | 102000 | 5.9612 | 0.0262 |
6.0154 | 2.0735 | 104000 | 5.9358 | 0.0274 |
5.9869 | 2.0835 | 106000 | 5.9149 | 0.0273 |
5.9975 | 2.0935 | 108000 | 5.9366 | 0.0269 |
5.9888 | 2.1035 | 110000 | 5.9123 | 0.0267 |
5.9796 | 2.1135 | 112000 | 5.9214 | 0.0271 |
5.9614 | 2.1235 | 114000 | 5.8995 | 0.0269 |
5.9857 | 2.1335 | 116000 | 5.9275 | 0.0264 |
5.9644 | 2.1435 | 118000 | 5.9066 | 0.0264 |
5.9345 | 2.1535 | 120000 | 5.9039 | 0.0267 |
5.9149 | 2.1635 | 122000 | 5.8752 | 0.0276 |
5.9313 | 2.1735 | 124000 | 5.8851 | 0.0263 |
5.9262 | 2.1835 | 126000 | 5.8796 | 0.0262 |
5.919 | 2.1935 | 128000 | 5.8653 | 0.0268 |
5.9173 | 2.2035 | 130000 | 5.8649 | 0.0260 |
5.9142 | 2.2135 | 132000 | 5.8633 | 0.0266 |
5.9055 | 3.0002 | 134000 | 5.8366 | 0.0262 |
5.9042 | 3.0102 | 136000 | 5.8207 | 0.0262 |
5.8497 | 3.0202 | 138000 | 5.8287 | 0.0266 |
5.8606 | 3.0302 | 140000 | 5.7986 | 0.0263 |
5.8573 | 3.0402 | 142000 | 5.7948 | 0.0265 |
5.8488 | 3.0502 | 144000 | 5.7460 | 0.0270 |
5.825 | 3.0602 | 146000 | 5.7428 | 0.0269 |
5.8103 | 3.0702 | 148000 | 5.7150 | 0.0273 |
5.779 | 3.0802 | 150000 | 5.7306 | 0.0273 |
5.7424 | 3.0902 | 152000 | 5.6899 | 0.0271 |
5.736 | 3.1002 | 154000 | 5.6589 | 0.0278 |
5.7026 | 3.1102 | 156000 | 5.6290 | 0.0283 |
5.6729 | 3.1202 | 158000 | 5.6141 | 0.0279 |
5.6802 | 3.1302 | 160000 | 5.5824 | 0.0284 |
5.6288 | 3.1402 | 162000 | 5.5451 | 0.0288 |
5.5959 | 3.1502 | 164000 | 5.5078 | 0.0300 |
5.6142 | 3.1602 | 166000 | 5.4618 | 0.0303 |
5.5646 | 3.1702 | 168000 | 5.4429 | 0.0304 |
5.5365 | 3.1802 | 170000 | 5.3986 | 0.0311 |
5.5131 | 3.1902 | 172000 | 5.3711 | 0.0316 |
5.4601 | 3.2002 | 174000 | 5.3532 | 0.0318 |
5.4596 | 3.2102 | 176000 | 5.3133 | 0.0321 |
5.4167 | 3.2202 | 178000 | 5.2558 | 0.0328 |
5.3816 | 4.0069 | 180000 | 5.2237 | 0.0333 |
5.3758 | 4.0169 | 182000 | 5.1653 | 0.0336 |
5.3366 | 4.0269 | 184000 | 5.1590 | 0.0335 |
5.2862 | 4.0369 | 186000 | 5.1374 | 0.0336 |
5.281 | 4.0469 | 188000 | 5.0837 | 0.0340 |
5.2351 | 4.0569 | 190000 | 5.0673 | 0.0344 |
5.2101 | 4.0669 | 192000 | 5.0382 | 0.0343 |
5.2015 | 4.0769 | 194000 | 4.9934 | 0.0350 |
5.1979 | 4.0869 | 196000 | 4.9951 | 0.0349 |
5.1725 | 4.0969 | 198000 | 4.9806 | 0.0348 |
5.1922 | 4.1069 | 200000 | 4.9746 | 0.0349 |
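
The final checkpoint is published as `hrezaei/T5LAA2`. A hedged loading sketch, assuming a T5-style seq2seq checkpoint (as the model name suggests); if the checkpoint is actually a causal LM, `AutoModelForCausalLM` would be the right class instead:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5LAA2")
model = AutoModelForSeq2SeqLM.from_pretrained("hrezaei/T5LAA2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Feeding the inputs back as labels yields a language-modeling-style
    # loss, comparable in kind (not in value) to the table above.
    out = model(**inputs, labels=inputs["input_ids"])
print(f"loss: {out.loss.item():.4f}")
```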
### Framework versions
- Transformers 4.49.0.dev0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
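
To check that a local environment matches these versions, something like:

```python
import datasets
import tokenizers
import torch
import transformers

# Compare against the versions reported above.
for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```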