---
library_name: transformers
license: mit
base_model: FacebookAI/xlm-roberta-large
tags:
- generated_from_trainer
model-index:
- name: xlm-roberta-large-bs-16-lr-0.0001-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1
  results: []
---

# xlm-roberta-large-bs-16-lr-0.0001-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1

This model is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2438

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
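For readers who want to reproduce the setup, the list above maps onto `TrainingArguments` roughly as in the sketch below. This is a hedged reconstruction, not the actual training script: `output_dir` is illustrative, and `max_grad_norm` and `fp16` are inferred from the `gnm-1.0` and `FP16` tokens in the run name rather than from the list itself.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (assumptions noted inline).
training_args = TrainingArguments(
    output_dir="xlm-roberta-large-bs-16-lr-0.0001-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=8,  # 16 * 8 = 128 total train batch size (single device assumed)
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    max_grad_norm=1.0,  # assumption: "gnm-1.0" in the run name
    fp16=True,          # assumption: "FP16" in the run name
)
```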
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 17.501 | 0.0055 | 50 | 4.8573 |
| 16.4238 | 0.0109 | 100 | 4.2333 |
| 15.0223 | 0.0164 | 150 | 4.1599 |
| 14.6734 | 0.0219 | 200 | 4.0074 |
| 14.8891 | 0.0273 | 250 | nan |
| 14.0058 | 0.0328 | 300 | 3.5820 |
| 13.7471 | 0.0382 | 350 | 3.4834 |
| 14.0411 | 0.0437 | 400 | 3.4724 |
| 13.7614 | 0.0492 | 450 | 3.4450 |
| 13.728 | 0.0546 | 500 | 3.3631 |
| 13.6001 | 0.0601 | 550 | 3.3878 |
| 12.943 | 0.0656 | 600 | 3.3878 |
| 14.0021 | 0.0710 | 650 | 3.1696 |
| 13.4041 | 0.0765 | 700 | 3.2144 |
| 13.2302 | 0.0819 | 750 | 3.1456 |
| 13.3945 | 0.0874 | 800 | 3.1081 |
| 13.3763 | 0.0929 | 850 | 3.0475 |
| 13.3499 | 0.0983 | 900 | 3.2461 |
| 13.5559 | 0.1038 | 950 | 3.0163 |
| 13.839 | 0.1093 | 1000 | 3.0701 |
| 13.3534 | 0.1147 | 1050 | 2.9885 |
| 13.2552 | 0.1202 | 1100 | 3.0023 |
| 13.6676 | 0.1256 | 1150 | nan |
| 13.1216 | 0.1311 | 1200 | 3.0053 |
| 12.6853 | 0.1366 | 1250 | 2.8969 |
| 12.9434 | 0.1420 | 1300 | 2.9016 |
| 12.2164 | 0.1475 | 1350 | 2.8974 |
| 12.825 | 0.1530 | 1400 | 2.9705 |
| 12.7314 | 0.1584 | 1450 | 2.8804 |
| 12.7405 | 0.1639 | 1500 | 2.8514 |
| 12.5693 | 0.1694 | 1550 | 2.8858 |
| 12.2698 | 0.1748 | 1600 | 2.8437 |
| 12.19 | 0.1803 | 1650 | 2.9199 |
| 12.2267 | 0.1857 | 1700 | 2.7915 |
| 12.1787 | 0.1912 | 1750 | 2.9066 |
| 12.1286 | 0.1967 | 1800 | 2.8383 |
| 12.3344 | 0.2021 | 1850 | nan |
| 13.0251 | 0.2076 | 1900 | 2.8345 |
| 12.4427 | 0.2131 | 1950 | 2.7413 |
| 12.6127 | 0.2185 | 2000 | 2.7285 |
| 12.6358 | 0.2240 | 2050 | 2.7807 |
| 12.2132 | 0.2294 | 2100 | 2.7657 |
| 12.5298 | 0.2349 | 2150 | 2.7935 |
| 12.156 | 0.2404 | 2200 | 2.6942 |
| 12.2265 | 0.2458 | 2250 | 2.7374 |
| 12.0772 | 0.2513 | 2300 | 2.6400 |
| 11.7906 | 0.2568 | 2350 | 2.6862 |
| 11.5912 | 0.2622 | 2400 | 2.6664 |
| 12.242 | 0.2677 | 2450 | 2.7530 |
| 11.3089 | 0.2731 | 2500 | 2.7606 |
| 11.2301 | 0.2786 | 2550 | 2.6787 |
| 11.9706 | 0.2841 | 2600 | 2.7440 |
| 11.5268 | 0.2895 | 2650 | 2.6760 |
| 11.8031 | 0.2950 | 2700 | 2.6846 |
| 11.6836 | 0.3005 | 2750 | nan |
| 11.4748 | 0.3059 | 2800 | 2.6796 |
| 11.9102 | 0.3114 | 2850 | 2.7101 |
| 11.4223 | 0.3169 | 2900 | 2.7066 |
| 12.0939 | 0.3223 | 2950 | 2.5908 |
| 11.5229 | 0.3278 | 3000 | nan |
| 10.8909 | 0.3332 | 3050 | 2.5104 |
| 11.2679 | 0.3387 | 3100 | 2.6391 |
| 11.6102 | 0.3442 | 3150 | 2.6375 |
| 11.1783 | 0.3496 | 3200 | 2.5392 |
| 11.5862 | 0.3551 | 3250 | 2.6254 |
| 11.0802 | 0.3606 | 3300 | 2.4951 |
| 11.2194 | 0.3660 | 3350 | 2.5535 |
| 10.8891 | 0.3715 | 3400 | 2.4888 |
| 11.1372 | 0.3769 | 3450 | 2.6514 |
| 11.1702 | 0.3824 | 3500 | nan |
| 11.1283 | 0.3879 | 3550 | 2.4935 |
| 11.858 | 0.3933 | 3600 | 2.6377 |
| 10.6952 | 0.3988 | 3650 | 2.5486 |
| 11.1094 | 0.4043 | 3700 | 2.5827 |
| 10.5929 | 0.4097 | 3750 | 2.5155 |
| 10.9796 | 0.4152 | 3800 | 2.6333 |
| 11.4408 | 0.4207 | 3850 | 2.4885 |
| 11.3756 | 0.4261 | 3900 | 2.6248 |
| 10.6489 | 0.4316 | 3950 | 2.5080 |
| 11.2278 | 0.4370 | 4000 | 2.6829 |
| 10.9081 | 0.4425 | 4050 | nan |
| 10.3177 | 0.4480 | 4100 | 2.5467 |
| 11.1393 | 0.4534 | 4150 | 2.4981 |
| 11.109 | 0.4589 | 4200 | 2.5696 |
| 10.5874 | 0.4644 | 4250 | 2.5346 |
| 10.2922 | 0.4698 | 4300 | 2.5247 |
| 11.1379 | 0.4753 | 4350 | 2.5050 |
| 10.9258 | 0.4807 | 4400 | 2.4393 |
| 10.7622 | 0.4862 | 4450 | 2.5386 |
| 10.5537 | 0.4917 | 4500 | 2.4742 |
| 10.6157 | 0.4971 | 4550 | 2.5183 |
| 10.5721 | 0.5026 | 4600 | 2.4624 |
| 10.448 | 0.5081 | 4650 | nan |
| 10.9621 | 0.5135 | 4700 | 2.4363 |
| 10.5947 | 0.5190 | 4750 | 2.4489 |
| 10.4982 | 0.5244 | 4800 | nan |
| 10.241 | 0.5299 | 4850 | 2.4834 |
| 10.8498 | 0.5354 | 4900 | nan |
| 10.291 | 0.5408 | 4950 | 2.4880 |
| 10.032 | 0.5463 | 5000 | 2.4780 |
| 10.6992 | 0.5518 | 5050 | 2.4536 |
| 10.3189 | 0.5572 | 5100 | 2.5406 |
| 10.36 | 0.5627 | 5150 | 2.5421 |
| 10.1413 | 0.5682 | 5200 | 2.5299 |
| 10.4146 | 0.5736 | 5250 | 2.4525 |
| 10.0561 | 0.5791 | 5300 | 2.5126 |
| 10.3447 | 0.5845 | 5350 | 2.4347 |
| 10.2634 | 0.5900 | 5400 | 2.3891 |
| 10.067 | 0.5955 | 5450 | 2.4418 |
| 10.479 | 0.6009 | 5500 | 2.4801 |
| 9.8486 | 0.6064 | 5550 | 2.4651 |
| 10.2608 | 0.6119 | 5600 | 2.3497 |
| 10.0271 | 0.6173 | 5650 | 2.5478 |
| 9.8674 | 0.6228 | 5700 | 2.3528 |
| 10.1599 | 0.6282 | 5750 | 2.4087 |
| 9.9866 | 0.6337 | 5800 | 2.3972 |
| 10.5326 | 0.6392 | 5850 | 2.4910 |
| 10.2033 | 0.6446 | 5900 | 2.3823 |
| 9.8695 | 0.6501 | 5950 | 2.3799 |
| 10.0466 | 0.6556 | 6000 | 2.4245 |
| 9.5177 | 0.6610 | 6050 | 2.4596 |
| 10.4291 | 0.6665 | 6100 | 2.4178 |
| 10.0009 | 0.6719 | 6150 | 2.3328 |
| 10.0692 | 0.6774 | 6200 | 2.3533 |
| 9.6967 | 0.6829 | 6250 | 2.4248 |
| 9.9892 | 0.6883 | 6300 | 2.3493 |
| 10.1783 | 0.6938 | 6350 | 2.3389 |
| 10.019 | 0.6993 | 6400 | 2.4507 |
| 9.8618 | 0.7047 | 6450 | 2.2831 |
| 10.3984 | 0.7102 | 6500 | 2.3761 |
| 9.919 | 0.7157 | 6550 | 2.5036 |
| 9.2917 | 0.7211 | 6600 | 2.3926 |
| 9.6774 | 0.7266 | 6650 | 2.3494 |
| 10.0028 | 0.7320 | 6700 | 2.3653 |
| 9.6192 | 0.7375 | 6750 | 2.3574 |
| 9.9689 | 0.7430 | 6800 | 2.4544 |
| 10.0934 | 0.7484 | 6850 | 2.4070 |
| 10.0145 | 0.7539 | 6900 | 2.3699 |
| 9.559 | 0.7594 | 6950 | nan |
| 10.5713 | 0.7648 | 7000 | 2.3410 |
| 9.7507 | 0.7703 | 7050 | nan |
| 9.9102 | 0.7757 | 7100 | 2.4138 |
| 9.4241 | 0.7812 | 7150 | 2.2941 |
| 9.6202 | 0.7867 | 7200 | 2.3024 |
| 9.5112 | 0.7921 | 7250 | 2.3756 |
| 9.4726 | 0.7976 | 7300 | 2.3240 |
| 9.5841 | 0.8031 | 7350 | 2.4397 |
| 9.1056 | 0.8085 | 7400 | nan |
| 9.0733 | 0.8140 | 7450 | 2.3982 |
| 9.9461 | 0.8194 | 7500 | 2.3694 |
| 9.1871 | 0.8249 | 7550 | 2.3681 |
| 9.723 | 0.8304 | 7600 | 2.3977 |
| 9.7697 | 0.8358 | 7650 | 2.4167 |
| 9.2425 | 0.8413 | 7700 | 2.2994 |
| 9.5511 | 0.8468 | 7750 | 2.3465 |
| 9.8158 | 0.8522 | 7800 | 2.3081 |
| 9.4219 | 0.8577 | 7850 | 2.2640 |
| 9.4233 | 0.8632 | 7900 | 2.3290 |
| 9.3864 | 0.8686 | 7950 | 2.2964 |
| 9.4981 | 0.8741 | 8000 | 2.2984 |
| 9.1101 | 0.8795 | 8050 | 2.3284 |
| 9.1299 | 0.8850 | 8100 | 2.3426 |
| 8.9554 | 0.8905 | 8150 | 2.3206 |
| 9.5779 | 0.8959 | 8200 | 2.2987 |
| 9.1416 | 0.9014 | 8250 | 2.3276 |
| 9.4434 | 0.9069 | 8300 | 2.2201 |
| 9.1004 | 0.9123 | 8350 | 2.2855 |
| 9.3678 | 0.9178 | 8400 | 2.3188 |
| 9.2545 | 0.9232 | 8450 | 2.3988 |
| 9.3835 | 0.9287 | 8500 | 2.2233 |
| 9.7359 | 0.9342 | 8550 | 2.2780 |
| 9.2803 | 0.9396 | 8600 | 2.3142 |
| 8.9966 | 0.9451 | 8650 | 2.2083 |
| 9.2548 | 0.9506 | 8700 | 2.4125 |
| 10.0036 | 0.9560 | 8750 | 2.1931 |
| 9.4264 | 0.9615 | 8800 | 2.1629 |
| 9.102 | 0.9669 | 8850 | 2.3306 |
| 9.3087 | 0.9724 | 8900 | 2.2894 |
| 8.9155 | 0.9779 | 8950 | 2.2347 |
| 9.1586 | 0.9833 | 9000 | 2.3156 |
| 9.2523 | 0.9888 | 9050 | nan |
| 9.541 | 0.9943 | 9100 | 2.2957 |
| 9.4701 | 0.9997 | 9150 | 2.2438 |
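Since the card reports only raw loss values, a perplexity conversion can make the final number easier to interpret. The snippet below assumes the validation loss is a mean cross-entropy in nats (the card does not state the training objective):

```python
import math

# Perplexity implied by the final validation loss, under the assumption
# that the loss is mean cross-entropy in nats over predicted tokens.
final_val_loss = 2.2438
print(math.exp(final_val_loss))  # ~9.43
```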
### Framework versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0
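Because the card does not describe the downstream task, the following is a minimal usage sketch that assumes the checkpoint keeps a masked-language-modeling head like its base model; the repo id is a placeholder, not the actual upload location.

```python
from transformers import pipeline

# Hypothetical repo id; replace with the real location of this checkpoint.
fill_mask = pipeline(
    "fill-mask",
    model="your-username/xlm-roberta-large-bs-16-lr-0.0001-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1",
)

# XLM-RoBERTa tokenizers use "<mask>" as the mask token.
print(fill_mask("The capital of France is <mask>."))
```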