qwen2.5-0.5b-sft-25

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8045
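
For reference, the checkpoint can be loaded through the standard transformers API. This is a minimal sketch; the prompt and generation settings are illustrative and not taken from the original card:

```python
# Minimal loading/generation sketch (prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft-25"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Explain gradient accumulation in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```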

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 96
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
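
These settings map onto the Hugging Face TrainingArguments roughly as follows. This is a minimal sketch assuming the standard Trainer was used; output_dir is a placeholder, and the listed Adam betas and epsilon are the Trainer defaults:

```python
# Sketch of the reported configuration as TrainingArguments
# (standard Hugging Face Trainer assumed; output_dir is a placeholder).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft-25",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,     # x 3 GPUs x 8 accumulation steps = 96 total
    per_device_eval_batch_size=4,      # x 3 GPUs = 12 total
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                         # "Native AMP" mixed precision
)
```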

Training results

Training Loss Epoch Step Validation Loss
3.4028 0.0365 5 3.3513
3.3695 0.0729 10 3.3424
3.35 0.1094 15 3.3145
3.3055 0.1459 20 3.2534
3.2405 0.1823 25 3.1755
3.1534 0.2188 30 3.1125
3.1062 0.2552 35 3.0519
3.0267 0.2917 40 2.9690
2.9463 0.3282 45 2.9127
2.8754 0.3646 50 2.8559
2.8207 0.4011 55 2.7941
2.7374 0.4376 60 2.7358
2.6771 0.4740 65 2.6877
2.636 0.5105 70 2.6438
2.5848 0.5469 75 2.6022
2.532 0.5834 80 2.5622
2.4987 0.6199 85 2.5223
2.4358 0.6563 90 2.4851
2.4119 0.6928 95 2.4507
2.3687 0.7293 100 2.4186
2.3183 0.7657 105 2.3887
2.2827 0.8022 110 2.3616
2.2406 0.8387 115 2.3393
2.2093 0.8751 120 2.3150
2.1705 0.9116 125 2.2945
2.1322 0.9480 130 2.2731
2.1359 0.9845 135 2.2563
2.0614 1.0210 140 2.2424
2.0018 1.0574 145 2.2277
1.9863 1.0939 150 2.2170
1.9724 1.1304 155 2.2054
1.9252 1.1668 160 2.1963
1.8979 1.2033 165 2.1882
1.8958 1.2397 170 2.1821
1.8375 1.2762 175 2.1831
1.818 1.3127 180 2.1760
1.7963 1.3491 185 2.1743
1.8217 1.3856 190 2.1734
1.7672 1.4221 195 2.1786
1.7074 1.4585 200 2.1854
1.7135 1.4950 205 2.1875
1.6742 1.5314 210 2.1954
1.6577 1.5679 215 2.1947
1.6024 1.6044 220 2.2123
1.6272 1.6408 225 2.2123
1.566 1.6773 230 2.2140
1.5337 1.7138 235 2.2185
1.5457 1.7502 240 2.2344
1.5015 1.7867 245 2.2457
1.4954 1.8232 250 2.2656
1.476 1.8596 255 2.2690
1.4716 1.8961 260 2.2816
1.4273 1.9325 265 2.2847
1.3919 1.9690 270 2.2793
1.3851 2.0055 275 2.2966
1.2981 2.0419 280 2.3274
1.2814 2.0784 285 2.3330
1.2736 2.1149 290 2.3312
1.2007 2.1513 295 2.3435
1.2254 2.1878 300 2.3802
1.176 2.2242 305 2.3973
1.2075 2.2607 310 2.3862
1.1551 2.2972 315 2.3936
1.198 2.3336 320 2.4092
1.1336 2.3701 325 2.4326
1.1318 2.4066 330 2.4501
1.103 2.4430 335 2.4602
1.118 2.4795 340 2.4965
1.09 2.5160 345 2.5208
1.0911 2.5524 350 2.5272
1.0578 2.5889 355 2.5331
1.0403 2.6253 360 2.5241
1.0326 2.6618 365 2.5398
1.0187 2.6983 370 2.5680
0.9639 2.7347 375 2.5864
0.9774 2.7712 380 2.6144
0.9968 2.8077 385 2.6572
0.9566 2.8441 390 2.6481
0.9272 2.8806 395 2.6594
0.9221 2.9170 400 2.6491
0.9206 2.9535 405 2.6680
0.8799 2.9900 410 2.6838
0.8552 3.0264 415 2.7329
0.8216 3.0629 420 2.7356
0.84 3.0994 425 2.7741
0.7934 3.1358 430 2.8322
0.812 3.1723 435 2.8657
0.8109 3.2088 440 2.8937
0.7825 3.2452 445 2.9137
0.7763 3.2817 450 2.9293
0.777 3.3181 455 2.9468
0.7487 3.3546 460 2.9743
0.7677 3.3911 465 2.9984
0.7607 3.4275 470 2.9930
0.7453 3.4640 475 3.0155
0.7244 3.5005 480 3.0068
0.7108 3.5369 485 3.0263
0.7101 3.5734 490 2.9943
0.6938 3.6098 495 3.0026
0.6798 3.6463 500 3.0478
0.6957 3.6828 505 3.0962
0.6599 3.7192 510 3.1179
0.6669 3.7557 515 3.1637
0.6369 3.7922 520 3.1779
0.6418 3.8286 525 3.1732
0.6771 3.8651 530 3.1779
0.6434 3.9015 535 3.2210
0.6237 3.9380 540 3.1923
0.6335 3.9745 545 3.1995
0.6027 4.0109 550 3.2534
0.5673 4.0474 555 3.2862
0.5941 4.0839 560 3.2781
0.572 4.1203 565 3.3229
0.5512 4.1568 570 3.3519
0.5644 4.1933 575 3.3804
0.5652 4.2297 580 3.4017
0.5499 4.2662 585 3.4768
0.5406 4.3026 590 3.4529
0.5478 4.3391 595 3.4290
0.5334 4.3756 600 3.4491
0.5472 4.4120 605 3.4766
0.5292 4.4485 610 3.5048
0.5216 4.4850 615 3.6019
0.5054 4.5214 620 3.5419
0.5239 4.5579 625 3.5749
0.4993 4.5943 630 3.5809
0.5045 4.6308 635 3.5857
0.5122 4.6673 640 3.5984
0.4905 4.7037 645 3.6182
0.5005 4.7402 650 3.6510
0.4692 4.7767 655 3.6446
0.4996 4.8131 660 3.6743
0.4792 4.8496 665 3.6728
0.481 4.8861 670 3.6977
0.4993 4.9225 675 3.7459
0.471 4.9590 680 3.7445
0.4979 4.9954 685 3.7539
0.4427 5.0319 690 3.7856
0.4391 5.0684 695 3.8017
0.4396 5.1048 700 3.8481
0.4412 5.1413 705 3.8722
0.4232 5.1778 710 3.8464
0.4319 5.2142 715 3.8549
0.4249 5.2507 720 3.9393
0.4187 5.2871 725 3.9709
0.423 5.3236 730 3.9221
0.4357 5.3601 735 3.9165
0.4057 5.3965 740 3.9499
0.4114 5.4330 745 3.9972
0.4203 5.4695 750 3.9683
0.4067 5.5059 755 3.9992
0.412 5.5424 760 4.0263
0.4097 5.5789 765 4.0477
0.4031 5.6153 770 4.0130
0.4064 5.6518 775 4.0363
0.3878 5.6882 780 4.0770
0.4017 5.7247 785 4.0458
0.4067 5.7612 790 4.0821
0.403 5.7976 795 4.0853
0.4037 5.8341 800 4.1023
0.3856 5.8706 805 4.1016
0.3944 5.9070 810 4.1343
0.386 5.9435 815 4.0983
0.3953 5.9799 820 4.1593
0.3779 6.0164 825 4.2123
0.3552 6.0529 830 4.2179
0.3636 6.0893 835 4.2617
0.3689 6.1258 840 4.2406
0.3719 6.1623 845 4.2694
0.3655 6.1987 850 4.2655
0.3526 6.2352 855 4.2406
0.3651 6.2716 860 4.3019
0.3687 6.3081 865 4.2735
0.3539 6.3446 870 4.2967
0.3548 6.3810 875 4.3397
0.3514 6.4175 880 4.3039
0.354 6.4540 885 4.3482
0.3537 6.4904 890 4.3211
0.3496 6.5269 895 4.3648
0.348 6.5634 900 4.3463
0.3415 6.5998 905 4.3704
0.3542 6.6363 910 4.3777
0.3345 6.6727 915 4.3697
0.352 6.7092 920 4.4153
0.343 6.7457 925 4.3800
0.3445 6.7821 930 4.4223
0.3495 6.8186 935 4.4179
0.3387 6.8551 940 4.4201
0.3351 6.8915 945 4.4395
0.3503 6.9280 950 4.4323
0.3358 6.9644 955 4.4621
0.331 7.0009 960 4.4445
0.3286 7.0374 965 4.5664
0.3082 7.0738 970 4.5114
0.3374 7.1103 975 4.5675
0.3217 7.1468 980 4.5296
0.3195 7.1832 985 4.5777
0.3233 7.2197 990 4.5433
0.3212 7.2562 995 4.5648
0.3167 7.2926 1000 4.5686
0.3232 7.3291 1005 4.5661
0.3328 7.3655 1010 4.5963
0.322 7.4020 1015 4.5819
0.322 7.4385 1020 4.6099
0.3164 7.4749 1025 4.5745
0.3169 7.5114 1030 4.5936
0.3215 7.5479 1035 4.6230
0.3202 7.5843 1040 4.6132
0.3293 7.6208 1045 4.6172
0.3184 7.6572 1050 4.6160
0.3169 7.6937 1055 4.6323
0.3172 7.7302 1060 4.6271
0.3061 7.7666 1065 4.6317
0.3108 7.8031 1070 4.6392
0.3136 7.8396 1075 4.6369
0.3209 7.8760 1080 4.6514
0.305 7.9125 1085 4.6410
0.3179 7.9490 1090 4.6598
0.3079 7.9854 1095 4.6556
0.3118 8.0219 1100 4.6821
0.3088 8.0583 1105 4.7342
0.3072 8.0948 1110 4.7028
0.298 8.1313 1115 4.7099
0.2998 8.1677 1120 4.7381
0.3012 8.2042 1125 4.7328
0.2964 8.2407 1130 4.7283
0.2975 8.2771 1135 4.7397
0.3001 8.3136 1140 4.7421
0.2972 8.3500 1145 4.7296
0.3097 8.3865 1150 4.7425
0.308 8.4230 1155 4.7552
0.306 8.4594 1160 4.7413
0.3019 8.4959 1165 4.7465
0.3148 8.5324 1170 4.7622
0.2988 8.5688 1175 4.7521
0.3031 8.6053 1180 4.7495
0.2922 8.6418 1185 4.7595
0.2975 8.6782 1190 4.7682
0.3082 8.7147 1195 4.7557
0.3018 8.7511 1200 4.7503
0.3032 8.7876 1205 4.7610
0.3065 8.8241 1210 4.7699
0.3007 8.8605 1215 4.7654
0.3085 8.8970 1220 4.7626
0.3007 8.9335 1225 4.7650
0.3084 8.9699 1230 4.7675
0.2976 9.0064 1235 4.7681
0.2943 9.0428 1240 4.7838
0.2909 9.0793 1245 4.8035
0.2986 9.1158 1250 4.8131
0.3068 9.1522 1255 4.8090
0.2959 9.1887 1260 4.8023
0.2994 9.2252 1265 4.7985
0.2817 9.2616 1270 4.7998
0.3049 9.2981 1275 4.8020
0.293 9.3345 1280 4.8025
0.3065 9.3710 1285 4.8029
0.2901 9.4075 1290 4.8026
0.2959 9.4439 1295 4.8027
0.3024 9.4804 1300 4.8026
0.3027 9.5169 1305 4.8035
0.2927 9.5533 1310 4.8047
0.286 9.5898 1315 4.8052
0.2976 9.6263 1320 4.8055
0.3021 9.6627 1325 4.8057
0.2974 9.6992 1330 4.8060
0.2954 9.7356 1335 4.8057
0.2969 9.7721 1340 4.8051
0.301 9.8086 1345 4.8046
0.2915 9.8450 1350 4.8044
0.295 9.8815 1355 4.8045
0.3019 9.9180 1360 4.8045
0.2969 9.9544 1365 4.8044
0.3036 9.9909 1370 4.8045
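
The validation loss bottoms out at 2.1734 around epoch 1.4 (step 190) and then rises steadily to the final 4.8045 while the training loss keeps falling. Plotting the two columns makes this divergence easy to see; below is a minimal matplotlib sketch using a small hand-copied subset of the rows above:

```python
# Plot a subset of the reported losses (values copied from the table above;
# matplotlib is assumed to be available).
import matplotlib.pyplot as plt

steps      = [5, 100, 190, 400, 700, 1000, 1370]
train_loss = [3.4028, 2.3687, 1.8217, 0.9221, 0.4396, 0.3167, 0.3036]
val_loss   = [3.3513, 2.4186, 2.1734, 2.6491, 3.8481, 4.5686, 4.8045]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```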

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1