flan-t5-small-squad-qag

This model is a fine-tuned version of google/flan-t5-small. The training dataset is not specified in this card, although the model name suggests SQuAD-style question-answer generation (QAG). It achieves the following results on the evaluation set:

  • Loss: 6.1573
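
A minimal inference sketch follows, assuming the standard transformers seq2seq API. The repository ID devagonal/flan-t5-small-squad-qag is taken from this card; the input format the model expects for question-answer generation is not documented here, so the example prompt is illustrative only.

```python
# Minimal inference sketch: the model ID comes from this card, but the QAG
# prompt format is an assumption, as the card does not document one.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "devagonal/flan-t5-small-squad-qag"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input: a context passage from which to generate a
# question-answer pair.
context = "The Eiffel Tower was completed in 1889 and is located in Paris."
inputs = tokenizer(context, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```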

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 3e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
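
Below is a hedged sketch of how these settings map onto transformers' Seq2SeqTrainingArguments. The output directory is a placeholder, and the model, dataset, and data-collator wiring are omitted because the card does not document them.

```python
# Training-arguments sketch matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-squad-qag",  # hypothetical output path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # total train batch size: 8 * 4 = 32
    num_train_epochs=100,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```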

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 40.773        | 0.5714  | 1    | 41.7049         |
| 58.3411       | 1.5714  | 2    | 39.3183         |
| 54.8652       | 2.5714  | 3    | 37.3843         |
| 53.7579       | 3.5714  | 4    | 35.8088         |
| 52.5214       | 4.5714  | 5    | 34.5335         |
| 50.0236       | 5.5714  | 6    | 33.5388         |
| 49.5252       | 6.5714  | 7    | 32.7734         |
| 48.018        | 7.5714  | 8    | 32.1632         |
| 46.7346       | 8.5714  | 9    | 31.6080         |
| 45.4348       | 9.5714  | 10   | 31.0589         |
| 44.8246       | 10.5714 | 11   | 30.5032         |
| 44.1633       | 11.5714 | 12   | 29.9093         |
| 42.8213       | 12.5714 | 13   | 29.2965         |
| 43.2365       | 13.5714 | 14   | 28.6880         |
| 41.5266       | 14.5714 | 15   | 28.0847         |
| 40.6435       | 15.5714 | 16   | 27.4881         |
| 40.1899       | 16.5714 | 17   | 26.9148         |
| 39.3795       | 17.5714 | 18   | 26.3482         |
| 38.4061       | 18.5714 | 19   | 25.8042         |
| 38.4415       | 19.5714 | 20   | 25.2741         |
| 36.9642       | 20.5714 | 21   | 24.7624         |
| 36.3868       | 21.5714 | 22   | 24.2690         |
| 36.2422       | 22.5714 | 23   | 23.7877         |
| 35.3793       | 23.5714 | 24   | 23.3194         |
| 34.9853       | 24.5714 | 25   | 22.8591         |
| 34.0927       | 25.5714 | 26   | 22.4058         |
| 33.2451       | 26.5714 | 27   | 21.9624         |
| 32.8551       | 27.5714 | 28   | 21.5381         |
| 32.1326       | 28.5714 | 29   | 21.1176         |
| 31.84         | 29.5714 | 30   | 20.6980         |
| 31.2982       | 30.5714 | 31   | 20.2775         |
| 30.8415       | 31.5714 | 32   | 19.8578         |
| 30.073        | 32.5714 | 33   | 19.4395         |
| 29.8896       | 33.5714 | 34   | 19.0213         |
| 29.2583       | 34.5714 | 35   | 18.6041         |
| 28.5195       | 35.5714 | 36   | 18.1902         |
| 27.7352       | 36.5714 | 37   | 17.7715         |
| 28.0043       | 37.5714 | 38   | 17.3529         |
| 26.7202       | 38.5714 | 39   | 16.9311         |
| 26.8391       | 39.5714 | 40   | 16.5091         |
| 26.0355       | 40.5714 | 41   | 16.0881         |
| 25.5678       | 41.5714 | 42   | 15.6670         |
| 25.281        | 42.5714 | 43   | 15.2460         |
| 24.9389       | 43.5714 | 44   | 14.8265         |
| 24.2087       | 44.5714 | 45   | 14.4072         |
| 24.0442       | 45.5714 | 46   | 13.9871         |
| 23.5964       | 46.5714 | 47   | 13.5686         |
| 22.5465       | 47.5714 | 48   | 13.1483         |
| 22.0742       | 48.5714 | 49   | 12.7263         |
| 21.9666       | 49.5714 | 50   | 12.3055         |
| 21.1685       | 50.5714 | 51   | 11.8917         |
| 21.1257       | 51.5714 | 52   | 11.4814         |
| 20.2889       | 52.5714 | 53   | 11.0750         |
| 20.3047       | 53.5714 | 54   | 10.6724         |
| 19.8761       | 54.5714 | 55   | 10.2840         |
| 19.0577       | 55.5714 | 56   | 9.9060          |
| 18.6548       | 56.5714 | 57   | 9.5428          |
| 18.7313       | 57.5714 | 58   | 9.2004          |
| 18.247        | 58.5714 | 59   | 8.8795          |
| 17.7508       | 59.5714 | 60   | 8.5831          |
| 17.1485       | 60.5714 | 61   | 8.3108          |
| 16.8734       | 61.5714 | 62   | 8.0638          |
| 16.7851       | 62.5714 | 63   | 7.8416          |
| 16.2609       | 63.5714 | 64   | 7.6450          |
| 16.1574       | 64.5714 | 65   | 7.4740          |
| 15.8518       | 65.5714 | 66   | 7.3281          |
| 15.8425       | 66.5714 | 67   | 7.2009          |
| 15.3619       | 67.5714 | 68   | 7.0914          |
| 15.5268       | 68.5714 | 69   | 6.9991          |
| 15.3891       | 69.5714 | 70   | 6.9188          |
| 14.7154       | 70.5714 | 71   | 6.8483          |
| 14.5997       | 71.5714 | 72   | 6.7852          |
| 14.6067       | 72.5714 | 73   | 6.7290          |
| 14.4925       | 73.5714 | 74   | 6.6800          |
| 14.326        | 74.5714 | 75   | 6.6356          |
| 14.0346       | 75.5714 | 76   | 6.5929          |
| 13.9427       | 76.5714 | 77   | 6.5531          |
| 13.8931       | 77.5714 | 78   | 6.5155          |
| 13.6341       | 78.5714 | 79   | 6.4793          |
| 13.7549       | 79.5714 | 80   | 6.4462          |
| 13.4067       | 80.5714 | 81   | 6.4152          |
| 13.4218       | 81.5714 | 82   | 6.3872          |
| 13.1982       | 82.5714 | 83   | 6.3615          |
| 13.0855       | 83.5714 | 84   | 6.3381          |
| 12.9228       | 84.5714 | 85   | 6.3163          |
| 12.8098       | 85.5714 | 86   | 6.2966          |
| 12.9304       | 86.5714 | 87   | 6.2780          |
| 13.0          | 87.5714 | 88   | 6.2604          |
| 12.6473       | 88.5714 | 89   | 6.2440          |
| 12.4884       | 89.5714 | 90   | 6.2286          |
| 12.8845       | 90.5714 | 91   | 6.2152          |
| 12.3722       | 91.5714 | 92   | 6.2033          |
| 12.5444       | 92.5714 | 93   | 6.1931          |
| 12.3583       | 93.5714 | 94   | 6.1844          |
| 12.3182       | 94.5714 | 95   | 6.1766          |
| 12.345        | 95.5714 | 96   | 6.1702          |
| 12.3766       | 96.5714 | 97   | 6.1649          |
| 12.7799       | 97.5714 | 98   | 6.1610          |
| 12.505        | 98.5714 | 99   | 6.1586          |
| 12.2264       | 99.5714 | 100  | 6.1573          |

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.5.1+cu124
  • Datasets 3.3.0
  • Tokenizers 0.21.0
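
As a quick environment sanity check (a sketch, assuming the same libraries are installed locally):

```python
# Print installed library versions to compare against the list above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # card: 4.48.3
print("PyTorch:", torch.__version__)              # card: 2.5.1+cu124
print("Datasets:", datasets.__version__)          # card: 3.3.0
print("Tokenizers:", tokenizers.__version__)      # card: 0.21.0
```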

Model size

  • 77M parameters (tensor type F32, Safetensors format)