qwen2.5-0.5b-sft-25

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8045
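
For reference, the checkpoint can be loaded through the standard transformers API. This is a minimal sketch; the prompt and generation settings are illustrative and not taken from the original card:

```python
# Minimal loading/generation sketch (prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft-25"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Explain gradient accumulation in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```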

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 96
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
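
These settings map onto the Hugging Face TrainingArguments roughly as follows. This is a minimal sketch assuming the standard Trainer was used; output_dir is a placeholder, and the listed Adam betas and epsilon are the Trainer defaults:

```python
# Sketch of the reported configuration as TrainingArguments
# (standard Hugging Face Trainer assumed; output_dir is a placeholder).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft-25",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,     # x 3 GPUs x 8 accumulation steps = 96 total
    per_device_eval_batch_size=4,      # x 3 GPUs = 12 total
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                         # "Native AMP" mixed precision
)
```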

Training results

Training Loss Epoch Step Validation Loss
3.4028 0.0365 5 3.3513
3.3695 0.0729 10 3.3424
3.35 0.1094 15 3.3145
3.3055 0.1459 20 3.2534
3.2405 0.1823 25 3.1755
3.1534 0.2188 30 3.1125
3.1062 0.2552 35 3.0519
3.0267 0.2917 40 2.9690
2.9463 0.3282 45 2.9127
2.8754 0.3646 50 2.8559
2.8207 0.4011 55 2.7941
2.7374 0.4376 60 2.7358
2.6771 0.4740 65 2.6877
2.636 0.5105 70 2.6438
2.5848 0.5469 75 2.6022
2.532 0.5834 80 2.5622
2.4987 0.6199 85 2.5223
2.4358 0.6563 90 2.4851
2.4119 0.6928 95 2.4507
2.3687 0.7293 100 2.4186
2.3183 0.7657 105 2.3887
2.2827 0.8022 110 2.3616
2.2406 0.8387 115 2.3393
2.2093 0.8751 120 2.3150
2.1705 0.9116 125 2.2945
2.1322 0.9480 130 2.2731
2.1359 0.9845 135 2.2563
2.0614 1.0210 140 2.2424
2.0018 1.0574 145 2.2277
1.9863 1.0939 150 2.2170
1.9724 1.1304 155 2.2054
1.9252 1.1668 160 2.1963
1.8979 1.2033 165 2.1882
1.8958 1.2397 170 2.1821
1.8375 1.2762 175 2.1831
1.818 1.3127 180 2.1760
1.7963 1.3491 185 2.1743
1.8217 1.3856 190 2.1734
1.7672 1.4221 195 2.1786
1.7074 1.4585 200 2.1854
1.7135 1.4950 205 2.1875
1.6742 1.5314 210 2.1954
1.6577 1.5679 215 2.1947
1.6024 1.6044 220 2.2123
1.6272 1.6408 225 2.2123
1.566 1.6773 230 2.2140
1.5337 1.7138 235 2.2185
1.5457 1.7502 240 2.2344
1.5015 1.7867 245 2.2457
1.4954 1.8232 250 2.2656
1.476 1.8596 255 2.2690
1.4716 1.8961 260 2.2816
1.4273 1.9325 265 2.2847
1.3919 1.9690 270 2.2793
1.3851 2.0055 275 2.2966
1.2981 2.0419 280 2.3274
1.2814 2.0784 285 2.3330
1.2736 2.1149 290 2.3312
1.2007 2.1513 295 2.3435
1.2254 2.1878 300 2.3802
1.176 2.2242 305 2.3973
1.2075 2.2607 310 2.3862
1.1551 2.2972 315 2.3936
1.198 2.3336 320 2.4092
1.1336 2.3701 325 2.4326
1.1318 2.4066 330 2.4501
1.103 2.4430 335 2.4602
1.118 2.4795 340 2.4965
1.09 2.5160 345 2.5208
1.0911 2.5524 350 2.5272
1.0578 2.5889 355 2.5331
1.0403 2.6253 360 2.5241
1.0326 2.6618 365 2.5398
1.0187 2.6983 370 2.5680
0.9639 2.7347 375 2.5864
0.9774 2.7712 380 2.6144
0.9968 2.8077 385 2.6572
0.9566 2.8441 390 2.6481
0.9272 2.8806 395 2.6594
0.9221 2.9170 400 2.6491
0.9206 2.9535 405 2.6680
0.8799 2.9900 410 2.6838
0.8552 3.0264 415 2.7329
0.8216 3.0629 420 2.7356
0.84 3.0994 425 2.7741
0.7934 3.1358 430 2.8322
0.812 3.1723 435 2.8657
0.8109 3.2088 440 2.8937
0.7825 3.2452 445 2.9137
0.7763 3.2817 450 2.9293
0.777 3.3181 455 2.9468
0.7487 3.3546 460 2.9743
0.7677 3.3911 465 2.9984
0.7607 3.4275 470 2.9930
0.7453 3.4640 475 3.0155
0.7244 3.5005 480 3.0068
0.7108 3.5369 485 3.0263
0.7101 3.5734 490 2.9943
0.6938 3.6098 495 3.0026
0.6798 3.6463 500 3.0478
0.6957 3.6828 505 3.0962
0.6599 3.7192 510 3.1179
0.6669 3.7557 515 3.1637
0.6369 3.7922 520 3.1779
0.6418 3.8286 525 3.1732
0.6771 3.8651 530 3.1779
0.6434 3.9015 535 3.2210
0.6237 3.9380 540 3.1923
0.6335 3.9745 545 3.1995
0.6027 4.0109 550 3.2534
0.5673 4.0474 555 3.2862
0.5941 4.0839 560 3.2781
0.572 4.1203 565 3.3229
0.5512 4.1568 570 3.3519
0.5644 4.1933 575 3.3804
0.5652 4.2297 580 3.4017
0.5499 4.2662 585 3.4768
0.5406 4.3026 590 3.4529
0.5478 4.3391 595 3.4290
0.5334 4.3756 600 3.4491
0.5472 4.4120 605 3.4766
0.5292 4.4485 610 3.5048
0.5216 4.4850 615 3.6019
0.5054 4.5214 620 3.5419
0.5239 4.5579 625 3.5749
0.4993 4.5943 630 3.5809
0.5045 4.6308 635 3.5857
0.5122 4.6673 640 3.5984
0.4905 4.7037 645 3.6182
0.5005 4.7402 650 3.6510
0.4692 4.7767 655 3.6446
0.4996 4.8131 660 3.6743
0.4792 4.8496 665 3.6728
0.481 4.8861 670 3.6977
0.4993 4.9225 675 3.7459
0.471 4.9590 680 3.7445
0.4979 4.9954 685 3.7539
0.4427 5.0319 690 3.7856
0.4391 5.0684 695 3.8017
0.4396 5.1048 700 3.8481
0.4412 5.1413 705 3.8722
0.4232 5.1778 710 3.8464
0.4319 5.2142 715 3.8549
0.4249 5.2507 720 3.9393
0.4187 5.2871 725 3.9709
0.423 5.3236 730 3.9221
0.4357 5.3601 735 3.9165
0.4057 5.3965 740 3.9499
0.4114 5.4330 745 3.9972
0.4203 5.4695 750 3.9683
0.4067 5.5059 755 3.9992
0.412 5.5424 760 4.0263
0.4097 5.5789 765 4.0477
0.4031 5.6153 770 4.0130
0.4064 5.6518 775 4.0363
0.3878 5.6882 780 4.0770
0.4017 5.7247 785 4.0458
0.4067 5.7612 790 4.0821
0.403 5.7976 795 4.0853
0.4037 5.8341 800 4.1023
0.3856 5.8706 805 4.1016
0.3944 5.9070 810 4.1343
0.386 5.9435 815 4.0983
0.3953 5.9799 820 4.1593
0.3779 6.0164 825 4.2123
0.3552 6.0529 830 4.2179
0.3636 6.0893 835 4.2617
0.3689 6.1258 840 4.2406
0.3719 6.1623 845 4.2694
0.3655 6.1987 850 4.2655
0.3526 6.2352 855 4.2406
0.3651 6.2716 860 4.3019
0.3687 6.3081 865 4.2735
0.3539 6.3446 870 4.2967
0.3548 6.3810 875 4.3397
0.3514 6.4175 880 4.3039
0.354 6.4540 885 4.3482
0.3537 6.4904 890 4.3211
0.3496 6.5269 895 4.3648
0.348 6.5634 900 4.3463
0.3415 6.5998 905 4.3704
0.3542 6.6363 910 4.3777
0.3345 6.6727 915 4.3697
0.352 6.7092 920 4.4153
0.343 6.7457 925 4.3800
0.3445 6.7821 930 4.4223
0.3495 6.8186 935 4.4179
0.3387 6.8551 940 4.4201
0.3351 6.8915 945 4.4395
0.3503 6.9280 950 4.4323
0.3358 6.9644 955 4.4621
0.331 7.0009 960 4.4445
0.3286 7.0374 965 4.5664
0.3082 7.0738 970 4.5114
0.3374 7.1103 975 4.5675
0.3217 7.1468 980 4.5296
0.3195 7.1832 985 4.5777
0.3233 7.2197 990 4.5433
0.3212 7.2562 995 4.5648
0.3167 7.2926 1000 4.5686
0.3232 7.3291 1005 4.5661
0.3328 7.3655 1010 4.5963
0.322 7.4020 1015 4.5819
0.322 7.4385 1020 4.6099
0.3164 7.4749 1025 4.5745
0.3169 7.5114 1030 4.5936
0.3215 7.5479 1035 4.6230
0.3202 7.5843 1040 4.6132
0.3293 7.6208 1045 4.6172
0.3184 7.6572 1050 4.6160
0.3169 7.6937 1055 4.6323
0.3172 7.7302 1060 4.6271
0.3061 7.7666 1065 4.6317
0.3108 7.8031 1070 4.6392
0.3136 7.8396 1075 4.6369
0.3209 7.8760 1080 4.6514
0.305 7.9125 1085 4.6410
0.3179 7.9490 1090 4.6598
0.3079 7.9854 1095 4.6556
0.3118 8.0219 1100 4.6821
0.3088 8.0583 1105 4.7342
0.3072 8.0948 1110 4.7028
0.298 8.1313 1115 4.7099
0.2998 8.1677 1120 4.7381
0.3012 8.2042 1125 4.7328
0.2964 8.2407 1130 4.7283
0.2975 8.2771 1135 4.7397
0.3001 8.3136 1140 4.7421
0.2972 8.3500 1145 4.7296
0.3097 8.3865 1150 4.7425
0.308 8.4230 1155 4.7552
0.306 8.4594 1160 4.7413
0.3019 8.4959 1165 4.7465
0.3148 8.5324 1170 4.7622
0.2988 8.5688 1175 4.7521
0.3031 8.6053 1180 4.7495
0.2922 8.6418 1185 4.7595
0.2975 8.6782 1190 4.7682
0.3082 8.7147 1195 4.7557
0.3018 8.7511 1200 4.7503
0.3032 8.7876 1205 4.7610
0.3065 8.8241 1210 4.7699
0.3007 8.8605 1215 4.7654
0.3085 8.8970 1220 4.7626
0.3007 8.9335 1225 4.7650
0.3084 8.9699 1230 4.7675
0.2976 9.0064 1235 4.7681
0.2943 9.0428 1240 4.7838
0.2909 9.0793 1245 4.8035
0.2986 9.1158 1250 4.8131
0.3068 9.1522 1255 4.8090
0.2959 9.1887 1260 4.8023
0.2994 9.2252 1265 4.7985
0.2817 9.2616 1270 4.7998
0.3049 9.2981 1275 4.8020
0.293 9.3345 1280 4.8025
0.3065 9.3710 1285 4.8029
0.2901 9.4075 1290 4.8026
0.2959 9.4439 1295 4.8027
0.3024 9.4804 1300 4.8026
0.3027 9.5169 1305 4.8035
0.2927 9.5533 1310 4.8047
0.286 9.5898 1315 4.8052
0.2976 9.6263 1320 4.8055
0.3021 9.6627 1325 4.8057
0.2974 9.6992 1330 4.8060
0.2954 9.7356 1335 4.8057
0.2969 9.7721 1340 4.8051
0.301 9.8086 1345 4.8046
0.2915 9.8450 1350 4.8044
0.295 9.8815 1355 4.8045
0.3019 9.9180 1360 4.8045
0.2969 9.9544 1365 4.8044
0.3036 9.9909 1370 4.8045
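
The validation loss bottoms out at 2.1734 around epoch 1.4 (step 190) and then rises steadily to the final 4.8045 while the training loss keeps falling. Plotting the two columns makes this divergence easy to see; below is a minimal matplotlib sketch using a small hand-copied subset of the rows above:

```python
# Plot a subset of the reported losses (values copied from the table above;
# matplotlib is assumed to be available).
import matplotlib.pyplot as plt

steps      = [5, 100, 190, 400, 700, 1000, 1370]
train_loss = [3.4028, 2.3687, 1.8217, 0.9221, 0.4396, 0.3167, 0.3036]
val_loss   = [3.3513, 2.4186, 2.1734, 2.6491, 3.8481, 4.5686, 4.8045]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```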

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1