ditransitives_removed_seed-42_1e-3

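This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (values taken from the final row of the training results, step 30140):

Loss: 3.5169
Accuracy: 0.3627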

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
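
The hyperparameters above interact in two ways worth spelling out: the total batch size is the per-device batch size multiplied by the gradient accumulation steps, and the warmup length (32000 steps) is longer than the run itself, which ends at step 30140 according to the results table. The sketch below reproduces both calculations in plain Python. It assumes a single training device and the usual linear warmup followed by linear decay; `NUM_DEVICES`, `TOTAL_STEPS`, and the helper names are illustrative and do not come from the original training script.

```python
# Values copied from the hyperparameter list.
PEAK_LR = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000

# Assumptions, not stated in the card: one device, and a run length
# equal to the last step shown in the results table.
NUM_DEVICES = 1
TOTAL_STEPS = 30_140


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
final_lr = linear_lr(TOTAL_STEPS)

print(f"effective batch size: {batch}")
print(f"learning rate at final step: {final_lr:.6g}")
```

Under these assumptions the schedule never leaves the warmup phase, so the learning rate rises throughout training and stays below the nominal 0.001 peak.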

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.1054	0.9998	1507	4.5218	0.2801
4.1313	1.9997	3014	4.0830	0.3122
3.9303	2.9995	4521	3.8739	0.3273
3.6971	4.0	6029	3.7495	0.3384
3.6089	4.9998	7536	3.6714	0.3456
3.5019	5.9997	9043	3.6255	0.3494
3.44	6.9995	10550	3.5873	0.3534
3.3899	8.0	12058	3.5776	0.3544
3.3437	8.9998	13565	3.5597	0.3571
3.3226	9.9997	15072	3.5447	0.3587
3.2818	10.9995	16579	3.5331	0.3599
3.2776	12.0	18087	3.5290	0.3604
3.2409	12.9998	19594	3.5242	0.3616
3.2466	13.9997	21101	3.5150	0.3623
3.2122	14.9995	22608	3.5189	0.3618
3.2248	16.0	24116	3.5207	0.3620
3.1929	16.9998	25623	3.5077	0.3634
3.2095	17.9997	27130	3.5113	0.3637
3.1794	18.9995	28637	3.5053	0.3634
3.1995	19.9967	30140	3.5169	0.3627
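
To make the table easier to interpret, the sketch below picks out the evaluation step with the lowest validation loss and converts that loss to a perplexity. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; the `ROWS` list is copied from the table, and the variable names are illustrative.

```python
import math

# (step, validation_loss, accuracy) copied from the results table.
ROWS = [
    (1507, 4.5218, 0.2801), (3014, 4.0830, 0.3122), (4521, 3.8739, 0.3273),
    (6029, 3.7495, 0.3384), (7536, 3.6714, 0.3456), (9043, 3.6255, 0.3494),
    (10550, 3.5873, 0.3534), (12058, 3.5776, 0.3544), (13565, 3.5597, 0.3571),
    (15072, 3.5447, 0.3587), (16579, 3.5331, 0.3599), (18087, 3.5290, 0.3604),
    (19594, 3.5242, 0.3616), (21101, 3.5150, 0.3623), (22608, 3.5189, 0.3618),
    (24116, 3.5207, 0.3620), (25623, 3.5077, 0.3634), (27130, 3.5113, 0.3637),
    (28637, 3.5053, 0.3634), (30140, 3.5169, 0.3627),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss, best_acc = min(ROWS, key=lambda row: row[1])

# Assumption: the loss is mean cross-entropy in nats per token,
# so perplexity is exp(loss).
best_ppl = math.exp(best_loss)

final_step, final_loss, _ = ROWS[-1]

print(f"best step: {best_step} (loss {best_loss}, accuracy {best_acc})")
print(f"perplexity at best step: {best_ppl:.2f}")
print(f"final step: {final_step} (loss {final_loss})")
```

The lowest validation loss in the table (3.5053) occurs at step 28637, not at the final step, where the loss is 3.5169.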