T5LAE

This model is a fine-tuned version of an unspecified base model on the sample-10BT subset of the HuggingFaceFW/fineweb dataset. It achieves the following results on the evaluation set:

Loss: 6.3530
Accuracy: 0.0323
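The card does not say how the loss is defined. If it is the mean per-token cross-entropy in nats, which is the usual convention for Trainer language-modeling runs, it can be converted to perplexity by exponentiation. A minimal sketch under that assumption:

```python
import math

# Final evaluation loss reported above.
# Assumption: mean per-token cross-entropy using the natural log (nats).
eval_loss = 6.3530

# Perplexity is exp(cross-entropy) when the loss is measured in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.0f}")
```

Under this assumption the final loss corresponds to a perplexity of roughly 574. If the loss was computed differently (for example over a masked or span-corruption objective), this conversion does not apply.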

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
training_steps: 200000
mixed_precision_training: Native AMP
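The total batch sizes follow from the per-device sizes multiplied by the number of devices (8 × 2 = 16), assuming no gradient accumulation, which the card does not list. With lr_scheduler_type linear and no warmup listed (zero warmup steps is assumed here), the learning rate decays linearly from 5e-05 to zero over the 200000 training steps. The sketch below illustrates both calculations; the helper names and the GRAD_ACCUM_STEPS constant are hypothetical and not taken from the training code.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 8       # per device
NUM_DEVICES = 2
TRAINING_STEPS = 200_000
GRAD_ACCUM_STEPS = 1       # assumption: not listed in the card


def total_batch_size(per_device: int, devices: int, accum: int = 1) -> int:
    """Hypothetical helper: sequences consumed per optimizer step."""
    return per_device * devices * accum


def linear_lr(step: int, peak_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS, warmup_steps: int = 0) -> float:
    """Hypothetical helper: linear warmup (assumed 0 steps), then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(total_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 16
for step in (0, 100_000, 200_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is halved to about 2.5e-05 at step 100000 and reaches zero at the final step.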

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
7.8503	0.01	2000	7.6894	0.0317
7.3885	0.02	4000	7.3045	0.0291
7.2248	0.03	6000	7.1483	0.0295
7.146	0.04	8000	7.0598	0.0298
7.098	0.05	10000	7.0069	0.0293
7.059	0.06	12000	6.9745	0.0304
7.036	0.07	14000	6.9492	0.0294
7.0083	0.08	16000	6.9298	0.0290
6.9703	0.09	18000	6.9145	0.0294
6.961	0.1	20000	6.9006	0.0303
6.9502	0.11	22000	6.8869	0.0302
6.9297	0.12	24000	6.8809	0.0282
6.9577	0.13	26000	6.8740	0.0288
6.9097	0.14	28000	6.8537	0.0290
6.9034	0.15	30000	6.8485	0.0293
6.9243	0.16	32000	6.8369	0.0292
6.8998	0.17	34000	6.8280	0.0297
6.8914	0.18	36000	6.8237	0.0289
6.8788	0.19	38000	6.8096	0.0306
6.8585	0.2	40000	6.8057	0.0295
6.8719	0.21	42000	6.7966	0.0313
6.8534	0.22	44000	6.7896	0.0297
6.8463	1.0067	46000	6.7795	0.0312
6.8588	1.0167	48000	6.7659	0.0304
6.8477	1.0267	50000	6.7667	0.0293
6.8268	1.0367	52000	6.7545	0.0301
6.8205	1.0467	54000	6.7439	0.0308
6.8035	1.0567	56000	6.7329	0.0297
6.7904	1.0667	58000	6.7233	0.0314
6.781	1.0767	60000	6.7235	0.0290
6.7722	1.0867	62000	6.7047	0.0311
6.7618	1.0967	64000	6.6947	0.0315
6.7821	1.1067	66000	6.6881	0.0309
6.7478	1.1167	68000	6.6781	0.0313
6.7544	1.1267	70000	6.6677	0.0292
6.7451	1.1367	72000	6.6529	0.0314
6.738	1.1467	74000	6.6436	0.0316
6.7223	1.1567	76000	6.6381	0.0312
6.7099	1.1667	78000	6.6245	0.0321
6.6851	1.1767	80000	6.6122	0.0311
6.6702	1.1867	82000	6.5993	0.0314
6.6761	1.1967	84000	6.5896	0.0317
6.6701	1.2067	86000	6.5855	0.0302
6.6696	1.2167	88000	6.5767	0.0313
6.6283	2.0035	90000	6.5673	0.0312
6.6662	2.0135	92000	6.5728	0.0307
6.6544	2.0235	94000	6.5492	0.0310
6.634	2.0335	96000	6.5433	0.0319
6.63	2.0435	98000	6.5395	0.0318
6.6022	2.0535	100000	6.5284	0.0318
6.5875	2.0635	102000	6.5209	0.0316
6.6115	2.0735	104000	6.5107	0.0320
6.5769	2.0835	106000	6.5118	0.0318
6.5941	2.0935	108000	6.4977	0.0312
6.5838	2.1035	110000	6.4884	0.0326
6.579	2.1135	112000	6.4919	0.0316
6.5642	2.1235	114000	6.4880	0.0318
6.5825	2.1335	116000	6.4747	0.0325
6.5625	2.1435	118000	6.4699	0.0310
6.5352	2.1535	120000	6.4664	0.0323
6.5174	2.1635	122000	6.4611	0.0320
6.5338	2.1735	124000	6.4618	0.0323
6.5264	2.1835	126000	6.4524	0.0320
6.533	2.1935	128000	6.4500	0.0315
6.5256	2.2035	130000	6.4433	0.0314
6.5293	2.2135	132000	6.4379	0.0316
6.5199	3.0002	134000	6.4395	0.0324
6.5356	3.0102	136000	6.4327	0.0321
6.4831	3.0202	138000	6.4207	0.0322
6.5051	3.0302	140000	6.4205	0.0311
6.5076	3.0402	142000	6.4148	0.0326
6.5085	3.0502	144000	6.4078	0.0323
6.5023	3.0602	146000	6.4070	0.0325
6.5019	3.0702	148000	6.4053	0.0331
6.4881	3.0802	150000	6.4011	0.0323
6.4642	3.0902	152000	6.4023	0.0316
6.4711	3.1002	154000	6.3948	0.0320
6.4713	3.1102	156000	6.3942	0.0323
6.461	3.1202	158000	6.3899	0.0319
6.4891	3.1302	160000	6.3877	0.0319
6.454	3.1402	162000	6.3834	0.0318
6.4456	3.1502	164000	6.3858	0.0319
6.4825	3.1602	166000	6.3827	0.0325
6.4563	3.1702	168000	6.3758	0.0321
6.4595	3.1802	170000	6.3755	0.0320
6.4525	3.1902	172000	6.3731	0.0319
6.4332	3.2002	174000	6.3691	0.0320
6.4656	3.2102	176000	6.3682	0.0318
6.4312	3.2202	178000	6.3672	0.0323
6.4439	4.0069	180000	6.3707	0.0317
6.4629	4.0169	182000	6.3619	0.0323
6.4505	4.0269	184000	6.3633	0.0324
6.4294	4.0369	186000	6.3594	0.0324
6.4427	4.0469	188000	6.3580	0.0319
6.4237	4.0569	190000	6.3600	0.0321
6.4201	4.0669	192000	6.3591	0.0322
6.4308	4.0769	194000	6.3554	0.0322
6.4349	4.0869	196000	6.3535	0.0323
6.4181	4.0969	198000	6.3542	0.0322
6.4385	4.1069	200000	6.3530	0.0323
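The final-step row (step 200000) matches the evaluation results reported at the top of the card, and it is also the lowest validation loss in the log. A minimal sketch that checks this by parsing the tab-separated rows; only the last four rows are copied here, and the full table can be pasted into LOG the same way. The parse_rows helper is hypothetical.

```python
# Last rows of the training log above:
# training loss, epoch, step, validation loss, accuracy (tab-separated).
LOG = """\
6.4308\t4.0769\t194000\t6.3554\t0.0322
6.4349\t4.0869\t196000\t6.3535\t0.0323
6.4181\t4.0969\t198000\t6.3542\t0.0322
6.4385\t4.1069\t200000\t6.3530\t0.0323"""


def parse_rows(text: str) -> list[dict]:
    """Hypothetical helper: turn each tab-separated line into a typed record."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split("\t")
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(acc),
        })
    return rows


rows = parse_rows(LOG)
# Pick the checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])  # 200000 6.353
```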

Framework versions

Transformers 4.49.0.dev0
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0

hrezaei/T5LAE