ditransitives_removed_seed-42_1e-3
This model was trained from scratch on an unknown dataset. At the final checkpoint (step 30140), it achieves the following results on the evaluation set:
- Loss: 3.5169
- Accuracy: 0.3627
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
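The hyperparameters above imply two derived quantities worth checking: the effective batch size and the learning rate actually reached. The sketch below is a minimal, standalone calculation, not the training code. It assumes a single device (the card does not state the device count) and assumes the `linear` scheduler follows the usual linear-warmup-then-linear-decay rule; the helper `linear_warmup_lr` is a hypothetical name introduced here for illustration. Under those assumptions, because the 32000 warmup steps exceed the 30140 steps reported in the training results, the run would end while still in warmup and never reach the nominal peak of 0.001.

```python
# Values copied from the hyperparameter list in this card.
learning_rate = 1e-3
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_steps = 32_000
final_step = 30_140  # last step reported in the training results table

# Assumption: one device, so effective batch = per-device batch x accumulation.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_warmup_lr(step, base_lr, warmup, total):
    """Hypothetical helper: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Learning rate at the last reported step, under the assumed schedule.
lr_at_end = linear_warmup_lr(final_step, learning_rate, warmup_steps, final_step)

print(f"effective batch size: {total_train_batch_size}")
print(f"learning rate at step {final_step}: {lr_at_end:.6g}")
```

If the run used more than one device, `num_devices` would need to change and the reported total of 256 would imply a different per-device setup.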
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
6.1054 | 0.9998 | 1507 | 4.5218 | 0.2801 |
4.1313 | 1.9997 | 3014 | 4.0830 | 0.3122 |
3.9303 | 2.9995 | 4521 | 3.8739 | 0.3273 |
3.6971 | 4.0 | 6029 | 3.7495 | 0.3384 |
3.6089 | 4.9998 | 7536 | 3.6714 | 0.3456 |
3.5019 | 5.9997 | 9043 | 3.6255 | 0.3494 |
3.44 | 6.9995 | 10550 | 3.5873 | 0.3534 |
3.3899 | 8.0 | 12058 | 3.5776 | 0.3544 |
3.3437 | 8.9998 | 13565 | 3.5597 | 0.3571 |
3.3226 | 9.9997 | 15072 | 3.5447 | 0.3587 |
3.2818 | 10.9995 | 16579 | 3.5331 | 0.3599 |
3.2776 | 12.0 | 18087 | 3.5290 | 0.3604 |
3.2409 | 12.9998 | 19594 | 3.5242 | 0.3616 |
3.2466 | 13.9997 | 21101 | 3.5150 | 0.3623 |
3.2122 | 14.9995 | 22608 | 3.5189 | 0.3618 |
3.2248 | 16.0 | 24116 | 3.5207 | 0.3620 |
3.1929 | 16.9998 | 25623 | 3.5077 | 0.3634 |
3.2095 | 17.9997 | 27130 | 3.5113 | 0.3637 |
3.1794 | 18.9995 | 28637 | 3.5053 | 0.3634 |
3.1995 | 19.9967 | 30140 | 3.5169 | 0.3627 |
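To make the table easier to interpret, the following sketch converts validation loss to perplexity and locates the checkpoint with the lowest validation loss. It assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for language-model training but is not stated in this card. The rows are copied from the table above; only the later epochs are included for brevity.

```python
import math

# (step, validation_loss, accuracy) copied from the training results table.
eval_log = [
    (21101, 3.5150, 0.3623),
    (22608, 3.5189, 0.3618),
    (24116, 3.5207, 0.3620),
    (25623, 3.5077, 0.3634),
    (27130, 3.5113, 0.3637),
    (28637, 3.5053, 0.3634),
    (30140, 3.5169, 0.3627),
]

# Checkpoint with the lowest validation loss among the listed rows.
best_step, best_loss, best_acc = min(eval_log, key=lambda row: row[1])

# Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
final_step, final_loss, _ = eval_log[-1]
final_ppl = math.exp(final_loss)
best_ppl = math.exp(best_loss)

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"perplexity at step {final_step}: {final_ppl:.2f}")
print(f"perplexity at step {best_step}: {best_ppl:.2f}")
```

The final checkpoint is not the one with the lowest validation loss in the table, although the differences across the last few epochs are small.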
Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0