wikipedia_30

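This model is a fine-tuned version of an unspecified base model on an unknown dataset. On the evaluation set it reaches a validation loss of 4.0459 at the final logged step (100000); the lowest logged validation loss is 3.5650, at step 36000.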

Model description

More information needed

The hyperparameters used during training are not listed. Training results, evaluated every 2000 steps:

Training Loss	Epoch	Step	Validation Loss
No log	2.1471	2000	7.1380
7.213	4.2941	4000	5.8542
7.213	6.4412	6000	5.4036
5.4304	8.5883	8000	5.0499
5.4304	10.7354	10000	4.7606
4.771	12.8824	12000	4.5172
4.771	15.0295	14000	4.3206
4.2888	17.1766	16000	4.1530
4.2888	19.3237	18000	4.0155
3.9141	21.4707	20000	3.8966
3.9141	23.6178	22000	3.8047
3.6154	25.7649	24000	3.7359
3.6154	27.9120	26000	3.6784
3.3661	30.0590	28000	3.6360
3.3661	32.2061	30000	3.6019
3.1473	34.3532	32000	3.5816
3.1473	36.5003	34000	3.5699
2.9533	38.6473	36000	3.5650
2.9533	40.7944	38000	3.5667
2.777	42.9415	40000	3.5747
2.777	45.0886	42000	3.5878
2.6015	47.2356	44000	3.6107
2.6015	49.3827	46000	3.6261
2.4429	51.5298	48000	3.6414
2.4429	53.6769	50000	3.6637
2.3125	55.8239	52000	3.6778
2.3125	57.9710	54000	3.7033
2.1989	60.1181	56000	3.7410
2.1989	62.2652	58000	3.7755
2.1044	64.4122	60000	3.7876
2.1044	66.5593	62000	3.8081
2.0257	68.7064	64000	3.8222
2.0257	70.8535	66000	3.8411
1.9563	73.0005	68000	3.8488
1.9563	75.1476	70000	3.8915
1.8905	77.2947	72000	3.9079
1.8905	79.4418	74000	3.9169
1.836	81.5888	76000	3.9382
1.836	83.7359	78000	3.9430
1.7885	85.8830	80000	3.9471
1.7885	88.0301	82000	3.9668
1.7431	90.1771	84000	3.9860
1.7431	92.3242	86000	4.0088
1.7024	94.4713	88000	4.0132
1.7024	96.6184	90000	4.0260
1.6687	98.7654	92000	4.0358
1.6687	100.9125	94000	4.0290
1.6369	103.0596	96000	4.0422
1.6369	105.2067	98000	4.0445
1.6129	107.3537	100000	4.0459
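
The validation loss in the table falls until step 36000 and rises afterwards while the training loss keeps dropping, which usually points to overfitting. The sketch below shows one way to find the best checkpoint from such a log and to estimate the number of optimizer steps per epoch. It uses only a handful of rows copied from the table; the names `EvalRow`, `LOG`, and `parse_log` are hypothetical helpers introduced for illustration and are not part of any training script for this model.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    val_loss: float


# Subset of rows copied from the table above: epoch, step, validation loss.
# (Assumption: a partial copy is enough to illustrate the idea; paste the
# full table to analyse every evaluation point.)
LOG = """
2.1471 2000 7.1380
21.4707 20000 3.8966
36.5003 34000 3.5699
38.6473 36000 3.5650
40.7944 38000 3.5667
64.4122 60000 3.7876
107.3537 100000 4.0459
"""


def parse_log(text: str) -> List[EvalRow]:
    """Parse whitespace-separated 'epoch step val_loss' lines."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, val_loss = line.split()
        rows.append(EvalRow(float(epoch), int(step), float(val_loss)))
    return rows


rows = parse_log(LOG)

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r.val_loss)

# Last logged evaluation, and how much worse it is than the best one.
final = rows[-1]
gap = final.val_loss - best.val_loss

# Steps per epoch, estimated from the final row (step / epoch).
steps_per_epoch = final.step / final.epoch

print(f"best step: {best.step} (epoch {best.epoch}, val loss {best.val_loss})")
print(f"final val loss is {gap:.4f} above the best")
print(f"approx. steps per epoch: {steps_per_epoch:.1f}")
```

On these rows the minimum lands at step 36000, so a checkpoint saved near that step, if one exists, would likely generalize better than the final one.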