ads_o_fr_13
This model is a fine-tuned version of a base model that this card does not name, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.9097
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 256
- eval_batch_size: 256
- seed: 13
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 20
- mixed_precision_training: Native AMP
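The learning-rate settings above can be illustrated with a small sketch. It assumes the usual shape of a linear schedule with warmup: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the last step. The total of 2180 steps is taken from the final row of the training results table; the function name `linear_warmup_lr` is hypothetical and is not part of any library.

```python
def linear_warmup_lr(step, peak_lr=1e-4, warmup_steps=500, total_steps=2180):
    """Learning rate at a given optimizer step (illustrative sketch only).

    Assumes linear warmup from 0 to peak_lr, then linear decay to 0.
    Defaults mirror the hyperparameters listed in this card.
    """
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale the peak rate by the fraction of post-warmup steps left.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the end of a few epochs (109 steps per epoch in this run).
    for epoch in (1, 5, 10, 20):
        step = epoch * 109
        print(f"epoch {epoch:2d} (step {step:4d}): lr = {linear_warmup_lr(step):.2e}")
```

Under these assumptions, warmup ends at step 500, which falls between epochs 4 and 5 of this run, so the first four epochs are trained below the peak rate of 0.0001.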
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4816 | 1.0 | 109 | 6.1251 |
| 5.2479 | 2.0 | 218 | 4.8178 |
| 4.5633 | 3.0 | 327 | 4.4960 |
| 4.3166 | 4.0 | 436 | 4.3039 |
| 4.1561 | 5.0 | 545 | 4.1806 |
| 4.0364 | 6.0 | 654 | 4.0962 |
| 3.9474 | 7.0 | 763 | 4.0343 |
| 3.8753 | 8.0 | 872 | 3.9900 |
| 3.8118 | 9.0 | 981 | 3.9578 |
| 3.7548 | 10.0 | 1090 | 3.9341 |
| 3.7009 | 11.0 | 1199 | 3.9164 |
| 3.6487 | 12.0 | 1308 | 3.9004 |
| 3.6004 | 13.0 | 1417 | 3.8925 |
| 3.5523 | 14.0 | 1526 | 3.8943 |
| 3.5049 | 15.0 | 1635 | 3.8889 |
| 3.4614 | 16.0 | 1744 | 3.8933 |
| 3.4222 | 17.0 | 1853 | 3.8971 |
| 3.3888 | 18.0 | 1962 | 3.9032 |
| 3.3613 | 19.0 | 2071 | 3.9074 |
| 3.3411 | 20.0 | 2180 | 3.9097 |
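The table above can be checked for the point where validation loss stops improving. The following is a minimal sketch that copies the per-epoch values from the table and finds the epoch with the lowest validation loss; the variable names are illustrative, and nothing here reflects how checkpoints were actually selected for this model.

```python
# (epoch, training_loss, validation_loss), copied from the training results table.
rows = [
    (1, 7.4816, 6.1251), (2, 5.2479, 4.8178), (3, 4.5633, 4.4960),
    (4, 4.3166, 4.3039), (5, 4.1561, 4.1806), (6, 4.0364, 4.0962),
    (7, 3.9474, 4.0343), (8, 3.8753, 3.9900), (9, 3.8118, 3.9578),
    (10, 3.7548, 3.9341), (11, 3.7009, 3.9164), (12, 3.6487, 3.9004),
    (13, 3.6004, 3.8925), (14, 3.5523, 3.8943), (15, 3.5049, 3.8889),
    (16, 3.4614, 3.8933), (17, 3.4222, 3.8971), (18, 3.3888, 3.9032),
    (19, 3.3613, 3.9074), (20, 3.3411, 3.9097),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(rows, key=lambda r: r[2])

# Final-epoch values, which are what the card reports as the evaluation loss.
final_epoch, final_train, final_val = rows[-1]

# How much validation loss rose between the best epoch and the final epoch.
val_regression = final_val - best_val

# Gap between validation and training loss at the final epoch.
final_gap = final_val - final_train

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (validation loss {best_val:.4f})")
    print(f"final epoch: {final_epoch} (validation loss {final_val:.4f})")
    print(f"validation loss increase after best epoch: {val_regression:.4f}")
    print(f"final validation/training gap: {final_gap:.4f}")
```

Validation loss is lowest at epoch 15 (3.8889) and rises slightly to 3.9097 by epoch 20 while training loss keeps falling, a pattern consistent with mild overfitting in the last five epochs. The reported evaluation loss of 3.9097 corresponds to the final epoch rather than the best one.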
Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.0