# qwen2.5-0.5b-sft-25
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new dataset. It achieves the following results on the evaluation set:
- Loss: 4.8045
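For intuition, a final cross-entropy loss of 4.8045 corresponds to a perplexity of roughly exp(4.8045) ≈ 122. This quick sanity check assumes the reported loss is the standard mean token-level cross-entropy in nats:

```python
import math

# Final evaluation loss reported above (assumed to be mean
# token-level cross-entropy in nats).
eval_loss = 4.8045

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 122.1
```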
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 8
- total_train_batch_size: 96
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
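The derived values in the list above can be reproduced from the base settings. The sketch below computes the effective batch size (per-device batch × gradient accumulation × devices) and models the learning-rate schedule as linear warmup followed by cosine decay to zero, the shape used by `transformers`' `get_cosine_schedule_with_warmup`; treating that as the exact trainer configuration is an assumption:

```python
import math

# Hyperparameters from the list above.
train_batch_size = 4            # per device
gradient_accumulation_steps = 8
num_devices = 3
learning_rate = 5e-6
warmup_ratio = 0.1
total_steps = 1370              # last step in the results table below

# Effective train batch size: per-device batch * accumulation * devices.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)   # 96

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 137
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))            # 0.0 (start of warmup)
print(lr_at(137))          # 5e-06 (peak, end of warmup)
print(lr_at(total_steps))  # ~0.0 (end of cosine decay)
```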
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:--------------|:------|:-----|:----------------|
3.4028 | 0.0365 | 5 | 3.3513 |
3.3695 | 0.0729 | 10 | 3.3424 |
3.35 | 0.1094 | 15 | 3.3145 |
3.3055 | 0.1459 | 20 | 3.2534 |
3.2405 | 0.1823 | 25 | 3.1755 |
3.1534 | 0.2188 | 30 | 3.1125 |
3.1062 | 0.2552 | 35 | 3.0519 |
3.0267 | 0.2917 | 40 | 2.9690 |
2.9463 | 0.3282 | 45 | 2.9127 |
2.8754 | 0.3646 | 50 | 2.8559 |
2.8207 | 0.4011 | 55 | 2.7941 |
2.7374 | 0.4376 | 60 | 2.7358 |
2.6771 | 0.4740 | 65 | 2.6877 |
2.636 | 0.5105 | 70 | 2.6438 |
2.5848 | 0.5469 | 75 | 2.6022 |
2.532 | 0.5834 | 80 | 2.5622 |
2.4987 | 0.6199 | 85 | 2.5223 |
2.4358 | 0.6563 | 90 | 2.4851 |
2.4119 | 0.6928 | 95 | 2.4507 |
2.3687 | 0.7293 | 100 | 2.4186 |
2.3183 | 0.7657 | 105 | 2.3887 |
2.2827 | 0.8022 | 110 | 2.3616 |
2.2406 | 0.8387 | 115 | 2.3393 |
2.2093 | 0.8751 | 120 | 2.3150 |
2.1705 | 0.9116 | 125 | 2.2945 |
2.1322 | 0.9480 | 130 | 2.2731 |
2.1359 | 0.9845 | 135 | 2.2563 |
2.0614 | 1.0210 | 140 | 2.2424 |
2.0018 | 1.0574 | 145 | 2.2277 |
1.9863 | 1.0939 | 150 | 2.2170 |
1.9724 | 1.1304 | 155 | 2.2054 |
1.9252 | 1.1668 | 160 | 2.1963 |
1.8979 | 1.2033 | 165 | 2.1882 |
1.8958 | 1.2397 | 170 | 2.1821 |
1.8375 | 1.2762 | 175 | 2.1831 |
1.818 | 1.3127 | 180 | 2.1760 |
1.7963 | 1.3491 | 185 | 2.1743 |
1.8217 | 1.3856 | 190 | 2.1734 |
1.7672 | 1.4221 | 195 | 2.1786 |
1.7074 | 1.4585 | 200 | 2.1854 |
1.7135 | 1.4950 | 205 | 2.1875 |
1.6742 | 1.5314 | 210 | 2.1954 |
1.6577 | 1.5679 | 215 | 2.1947 |
1.6024 | 1.6044 | 220 | 2.2123 |
1.6272 | 1.6408 | 225 | 2.2123 |
1.566 | 1.6773 | 230 | 2.2140 |
1.5337 | 1.7138 | 235 | 2.2185 |
1.5457 | 1.7502 | 240 | 2.2344 |
1.5015 | 1.7867 | 245 | 2.2457 |
1.4954 | 1.8232 | 250 | 2.2656 |
1.476 | 1.8596 | 255 | 2.2690 |
1.4716 | 1.8961 | 260 | 2.2816 |
1.4273 | 1.9325 | 265 | 2.2847 |
1.3919 | 1.9690 | 270 | 2.2793 |
1.3851 | 2.0055 | 275 | 2.2966 |
1.2981 | 2.0419 | 280 | 2.3274 |
1.2814 | 2.0784 | 285 | 2.3330 |
1.2736 | 2.1149 | 290 | 2.3312 |
1.2007 | 2.1513 | 295 | 2.3435 |
1.2254 | 2.1878 | 300 | 2.3802 |
1.176 | 2.2242 | 305 | 2.3973 |
1.2075 | 2.2607 | 310 | 2.3862 |
1.1551 | 2.2972 | 315 | 2.3936 |
1.198 | 2.3336 | 320 | 2.4092 |
1.1336 | 2.3701 | 325 | 2.4326 |
1.1318 | 2.4066 | 330 | 2.4501 |
1.103 | 2.4430 | 335 | 2.4602 |
1.118 | 2.4795 | 340 | 2.4965 |
1.09 | 2.5160 | 345 | 2.5208 |
1.0911 | 2.5524 | 350 | 2.5272 |
1.0578 | 2.5889 | 355 | 2.5331 |
1.0403 | 2.6253 | 360 | 2.5241 |
1.0326 | 2.6618 | 365 | 2.5398 |
1.0187 | 2.6983 | 370 | 2.5680 |
0.9639 | 2.7347 | 375 | 2.5864 |
0.9774 | 2.7712 | 380 | 2.6144 |
0.9968 | 2.8077 | 385 | 2.6572 |
0.9566 | 2.8441 | 390 | 2.6481 |
0.9272 | 2.8806 | 395 | 2.6594 |
0.9221 | 2.9170 | 400 | 2.6491 |
0.9206 | 2.9535 | 405 | 2.6680 |
0.8799 | 2.9900 | 410 | 2.6838 |
0.8552 | 3.0264 | 415 | 2.7329 |
0.8216 | 3.0629 | 420 | 2.7356 |
0.84 | 3.0994 | 425 | 2.7741 |
0.7934 | 3.1358 | 430 | 2.8322 |
0.812 | 3.1723 | 435 | 2.8657 |
0.8109 | 3.2088 | 440 | 2.8937 |
0.7825 | 3.2452 | 445 | 2.9137 |
0.7763 | 3.2817 | 450 | 2.9293 |
0.777 | 3.3181 | 455 | 2.9468 |
0.7487 | 3.3546 | 460 | 2.9743 |
0.7677 | 3.3911 | 465 | 2.9984 |
0.7607 | 3.4275 | 470 | 2.9930 |
0.7453 | 3.4640 | 475 | 3.0155 |
0.7244 | 3.5005 | 480 | 3.0068 |
0.7108 | 3.5369 | 485 | 3.0263 |
0.7101 | 3.5734 | 490 | 2.9943 |
0.6938 | 3.6098 | 495 | 3.0026 |
0.6798 | 3.6463 | 500 | 3.0478 |
0.6957 | 3.6828 | 505 | 3.0962 |
0.6599 | 3.7192 | 510 | 3.1179 |
0.6669 | 3.7557 | 515 | 3.1637 |
0.6369 | 3.7922 | 520 | 3.1779 |
0.6418 | 3.8286 | 525 | 3.1732 |
0.6771 | 3.8651 | 530 | 3.1779 |
0.6434 | 3.9015 | 535 | 3.2210 |
0.6237 | 3.9380 | 540 | 3.1923 |
0.6335 | 3.9745 | 545 | 3.1995 |
0.6027 | 4.0109 | 550 | 3.2534 |
0.5673 | 4.0474 | 555 | 3.2862 |
0.5941 | 4.0839 | 560 | 3.2781 |
0.572 | 4.1203 | 565 | 3.3229 |
0.5512 | 4.1568 | 570 | 3.3519 |
0.5644 | 4.1933 | 575 | 3.3804 |
0.5652 | 4.2297 | 580 | 3.4017 |
0.5499 | 4.2662 | 585 | 3.4768 |
0.5406 | 4.3026 | 590 | 3.4529 |
0.5478 | 4.3391 | 595 | 3.4290 |
0.5334 | 4.3756 | 600 | 3.4491 |
0.5472 | 4.4120 | 605 | 3.4766 |
0.5292 | 4.4485 | 610 | 3.5048 |
0.5216 | 4.4850 | 615 | 3.6019 |
0.5054 | 4.5214 | 620 | 3.5419 |
0.5239 | 4.5579 | 625 | 3.5749 |
0.4993 | 4.5943 | 630 | 3.5809 |
0.5045 | 4.6308 | 635 | 3.5857 |
0.5122 | 4.6673 | 640 | 3.5984 |
0.4905 | 4.7037 | 645 | 3.6182 |
0.5005 | 4.7402 | 650 | 3.6510 |
0.4692 | 4.7767 | 655 | 3.6446 |
0.4996 | 4.8131 | 660 | 3.6743 |
0.4792 | 4.8496 | 665 | 3.6728 |
0.481 | 4.8861 | 670 | 3.6977 |
0.4993 | 4.9225 | 675 | 3.7459 |
0.471 | 4.9590 | 680 | 3.7445 |
0.4979 | 4.9954 | 685 | 3.7539 |
0.4427 | 5.0319 | 690 | 3.7856 |
0.4391 | 5.0684 | 695 | 3.8017 |
0.4396 | 5.1048 | 700 | 3.8481 |
0.4412 | 5.1413 | 705 | 3.8722 |
0.4232 | 5.1778 | 710 | 3.8464 |
0.4319 | 5.2142 | 715 | 3.8549 |
0.4249 | 5.2507 | 720 | 3.9393 |
0.4187 | 5.2871 | 725 | 3.9709 |
0.423 | 5.3236 | 730 | 3.9221 |
0.4357 | 5.3601 | 735 | 3.9165 |
0.4057 | 5.3965 | 740 | 3.9499 |
0.4114 | 5.4330 | 745 | 3.9972 |
0.4203 | 5.4695 | 750 | 3.9683 |
0.4067 | 5.5059 | 755 | 3.9992 |
0.412 | 5.5424 | 760 | 4.0263 |
0.4097 | 5.5789 | 765 | 4.0477 |
0.4031 | 5.6153 | 770 | 4.0130 |
0.4064 | 5.6518 | 775 | 4.0363 |
0.3878 | 5.6882 | 780 | 4.0770 |
0.4017 | 5.7247 | 785 | 4.0458 |
0.4067 | 5.7612 | 790 | 4.0821 |
0.403 | 5.7976 | 795 | 4.0853 |
0.4037 | 5.8341 | 800 | 4.1023 |
0.3856 | 5.8706 | 805 | 4.1016 |
0.3944 | 5.9070 | 810 | 4.1343 |
0.386 | 5.9435 | 815 | 4.0983 |
0.3953 | 5.9799 | 820 | 4.1593 |
0.3779 | 6.0164 | 825 | 4.2123 |
0.3552 | 6.0529 | 830 | 4.2179 |
0.3636 | 6.0893 | 835 | 4.2617 |
0.3689 | 6.1258 | 840 | 4.2406 |
0.3719 | 6.1623 | 845 | 4.2694 |
0.3655 | 6.1987 | 850 | 4.2655 |
0.3526 | 6.2352 | 855 | 4.2406 |
0.3651 | 6.2716 | 860 | 4.3019 |
0.3687 | 6.3081 | 865 | 4.2735 |
0.3539 | 6.3446 | 870 | 4.2967 |
0.3548 | 6.3810 | 875 | 4.3397 |
0.3514 | 6.4175 | 880 | 4.3039 |
0.354 | 6.4540 | 885 | 4.3482 |
0.3537 | 6.4904 | 890 | 4.3211 |
0.3496 | 6.5269 | 895 | 4.3648 |
0.348 | 6.5634 | 900 | 4.3463 |
0.3415 | 6.5998 | 905 | 4.3704 |
0.3542 | 6.6363 | 910 | 4.3777 |
0.3345 | 6.6727 | 915 | 4.3697 |
0.352 | 6.7092 | 920 | 4.4153 |
0.343 | 6.7457 | 925 | 4.3800 |
0.3445 | 6.7821 | 930 | 4.4223 |
0.3495 | 6.8186 | 935 | 4.4179 |
0.3387 | 6.8551 | 940 | 4.4201 |
0.3351 | 6.8915 | 945 | 4.4395 |
0.3503 | 6.9280 | 950 | 4.4323 |
0.3358 | 6.9644 | 955 | 4.4621 |
0.331 | 7.0009 | 960 | 4.4445 |
0.3286 | 7.0374 | 965 | 4.5664 |
0.3082 | 7.0738 | 970 | 4.5114 |
0.3374 | 7.1103 | 975 | 4.5675 |
0.3217 | 7.1468 | 980 | 4.5296 |
0.3195 | 7.1832 | 985 | 4.5777 |
0.3233 | 7.2197 | 990 | 4.5433 |
0.3212 | 7.2562 | 995 | 4.5648 |
0.3167 | 7.2926 | 1000 | 4.5686 |
0.3232 | 7.3291 | 1005 | 4.5661 |
0.3328 | 7.3655 | 1010 | 4.5963 |
0.322 | 7.4020 | 1015 | 4.5819 |
0.322 | 7.4385 | 1020 | 4.6099 |
0.3164 | 7.4749 | 1025 | 4.5745 |
0.3169 | 7.5114 | 1030 | 4.5936 |
0.3215 | 7.5479 | 1035 | 4.6230 |
0.3202 | 7.5843 | 1040 | 4.6132 |
0.3293 | 7.6208 | 1045 | 4.6172 |
0.3184 | 7.6572 | 1050 | 4.6160 |
0.3169 | 7.6937 | 1055 | 4.6323 |
0.3172 | 7.7302 | 1060 | 4.6271 |
0.3061 | 7.7666 | 1065 | 4.6317 |
0.3108 | 7.8031 | 1070 | 4.6392 |
0.3136 | 7.8396 | 1075 | 4.6369 |
0.3209 | 7.8760 | 1080 | 4.6514 |
0.305 | 7.9125 | 1085 | 4.6410 |
0.3179 | 7.9490 | 1090 | 4.6598 |
0.3079 | 7.9854 | 1095 | 4.6556 |
0.3118 | 8.0219 | 1100 | 4.6821 |
0.3088 | 8.0583 | 1105 | 4.7342 |
0.3072 | 8.0948 | 1110 | 4.7028 |
0.298 | 8.1313 | 1115 | 4.7099 |
0.2998 | 8.1677 | 1120 | 4.7381 |
0.3012 | 8.2042 | 1125 | 4.7328 |
0.2964 | 8.2407 | 1130 | 4.7283 |
0.2975 | 8.2771 | 1135 | 4.7397 |
0.3001 | 8.3136 | 1140 | 4.7421 |
0.2972 | 8.3500 | 1145 | 4.7296 |
0.3097 | 8.3865 | 1150 | 4.7425 |
0.308 | 8.4230 | 1155 | 4.7552 |
0.306 | 8.4594 | 1160 | 4.7413 |
0.3019 | 8.4959 | 1165 | 4.7465 |
0.3148 | 8.5324 | 1170 | 4.7622 |
0.2988 | 8.5688 | 1175 | 4.7521 |
0.3031 | 8.6053 | 1180 | 4.7495 |
0.2922 | 8.6418 | 1185 | 4.7595 |
0.2975 | 8.6782 | 1190 | 4.7682 |
0.3082 | 8.7147 | 1195 | 4.7557 |
0.3018 | 8.7511 | 1200 | 4.7503 |
0.3032 | 8.7876 | 1205 | 4.7610 |
0.3065 | 8.8241 | 1210 | 4.7699 |
0.3007 | 8.8605 | 1215 | 4.7654 |
0.3085 | 8.8970 | 1220 | 4.7626 |
0.3007 | 8.9335 | 1225 | 4.7650 |
0.3084 | 8.9699 | 1230 | 4.7675 |
0.2976 | 9.0064 | 1235 | 4.7681 |
0.2943 | 9.0428 | 1240 | 4.7838 |
0.2909 | 9.0793 | 1245 | 4.8035 |
0.2986 | 9.1158 | 1250 | 4.8131 |
0.3068 | 9.1522 | 1255 | 4.8090 |
0.2959 | 9.1887 | 1260 | 4.8023 |
0.2994 | 9.2252 | 1265 | 4.7985 |
0.2817 | 9.2616 | 1270 | 4.7998 |
0.3049 | 9.2981 | 1275 | 4.8020 |
0.293 | 9.3345 | 1280 | 4.8025 |
0.3065 | 9.3710 | 1285 | 4.8029 |
0.2901 | 9.4075 | 1290 | 4.8026 |
0.2959 | 9.4439 | 1295 | 4.8027 |
0.3024 | 9.4804 | 1300 | 4.8026 |
0.3027 | 9.5169 | 1305 | 4.8035 |
0.2927 | 9.5533 | 1310 | 4.8047 |
0.286 | 9.5898 | 1315 | 4.8052 |
0.2976 | 9.6263 | 1320 | 4.8055 |
0.3021 | 9.6627 | 1325 | 4.8057 |
0.2974 | 9.6992 | 1330 | 4.8060 |
0.2954 | 9.7356 | 1335 | 4.8057 |
0.2969 | 9.7721 | 1340 | 4.8051 |
0.301 | 9.8086 | 1345 | 4.8046 |
0.2915 | 9.8450 | 1350 | 4.8044 |
0.295 | 9.8815 | 1355 | 4.8045 |
0.3019 | 9.9180 | 1360 | 4.8045 |
0.2969 | 9.9544 | 1365 | 4.8044 |
0.3036 | 9.9909 | 1370 | 4.8045 |
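Note that validation loss bottoms out at 2.1734 around step 190 and climbs steadily thereafter while training loss keeps falling, a classic overfitting signature. A quick scan over the (step, validation loss) pairs from the rows around the minimum illustrates how to pick the best checkpoint:

```python
# (step, validation_loss) pairs copied from the table rows around the minimum.
val_losses = [
    (170, 2.1821), (175, 2.1831), (180, 2.1760), (185, 2.1743),
    (190, 2.1734), (195, 2.1786), (200, 2.1854),
]

# Best checkpoint = the step with the lowest validation loss.
best_step, best_loss = min(val_losses, key=lambda pair: pair[1])
print(best_step, best_loss)  # 190 2.1734
```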
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1