zephyr-7b-dpo-qlora

This model is a fine-tuned version of TII-Frontier-Team/falcon3-3b-instruct on the TII-Frontier-Team/Reasoning_DPO dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0299
  • Rewards/chosen: -4.6362
  • Rewards/rejected: -10.4479
  • Rewards/accuracies: 0.9306
  • Rewards/margins: 5.8117
  • Logps/rejected: -1080.7013
  • Logps/chosen: -496.4129
  • Logits/rejected: 2.0470
  • Logits/chosen: 2.2558

Model description

This repository contains a QLoRA (PEFT) adapter for TII-Frontier-Team/falcon3-3b-instruct, trained with direct preference optimization (DPO) on the TII-Frontier-Team/Reasoning_DPO preference dataset. The base model is required at load time; see the loading sketch below.

Intended uses & limitations

More information needed
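
No official usage guidance is provided, so the following is a minimal, unofficial sketch of loading the adapter for inference with transformers and peft (matching the framework versions listed at the end of this card). The adapter repo id `your-org/zephyr-7b-dpo-qlora` is a placeholder, and the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch (unofficial). Assumes this repo is a PEFT adapter
# for the base model named in this card; the adapter id below is a placeholder.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TII-Frontier-Team/falcon3-3b-instruct"  # base model from this card
adapter_id = "your-org/zephyr-7b-dpo-qlora"        # placeholder: replace with the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-tuned LoRA weights
model.eval()

prompt = "Explain in one sentence why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```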

Training and evaluation data

Training used the TII-Frontier-Team/Reasoning_DPO pairwise preference dataset; the results above are reported on its evaluation set. No further details about the data are documented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training sketch reconstructing them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
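
The exact training script is not published. The sketch below reconstructs only the hyperparameters listed above using TRL's DPOTrainer (TRL 0.11-era API, contemporary with the Transformers 4.45.1 version listed below); the 4-bit quantization settings, the LoRA shape, and the DPO beta are not reported on this card and are assumptions filled in with common QLoRA defaults.

```python
# Unofficial reconstruction of the listed hyperparameters with TRL's DPOTrainer.
# Quantization, LoRA settings, and DPO beta are assumptions (not on the card);
# only the DPOConfig values marked "from card" are documented above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

model_id = "TII-Frontier-Team/falcon3-3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(  # QLoRA-style 4-bit load (assumed)
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
# DPOTrainer expects prompt/chosen/rejected columns in the dataset.
dataset = load_dataset("TII-Frontier-Team/Reasoning_DPO")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,                # from card
    per_device_train_batch_size=4,     # from card (train_batch_size)
    per_device_eval_batch_size=8,      # from card (eval_batch_size)
    gradient_accumulation_steps=4,     # from card; 8 GPUs x 4 x 4 = 128 total
    num_train_epochs=1,                # from card
    lr_scheduler_type="cosine",        # from card
    warmup_ratio=0.1,                  # from card
    seed=42,                           # from card
)

trainer = DPOTrainer(
    model,
    ref_model=None,                    # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,               # renamed processing_class in newer TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),  # assumed LoRA shape
)
trainer.train()
```

The listed optimizer (Adam with betas=(0.9,0.999), epsilon=1e-08) matches the Trainer default, so it needs no explicit setting in this sketch.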

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6913 | 0.0315 | 100 | 0.6911 | 0.0007 | -0.0036 | 0.6220 | 0.0042 | -36.2718 | -32.7285 | -1.6824 | -1.6348 |
| 0.6742 | 0.0629 | 200 | 0.6751 | 0.0003 | -0.0454 | 0.6276 | 0.0458 | -40.4596 | -32.7631 | -1.5097 | -1.4586 |
| 0.6081 | 0.0944 | 300 | 0.5872 | -0.5193 | -0.8644 | 0.6619 | 0.3451 | -122.3552 | -84.7303 | -0.4701 | -0.3830 |
| 0.4463 | 0.1258 | 400 | 0.3978 | -2.0312 | -3.2212 | 0.7190 | 1.1900 | -358.0407 | -235.9217 | -0.3673 | -0.2101 |
| 0.3548 | 0.1573 | 500 | 0.3048 | -2.5142 | -4.1605 | 0.7698 | 1.6464 | -451.9689 | -284.2137 | 0.4417 | 0.6033 |
| 0.3014 | 0.1887 | 600 | 0.2395 | -2.7662 | -4.8033 | 0.7963 | 2.0371 | -516.2451 | -309.4138 | 1.0026 | 1.1670 |
| 0.25 | 0.2202 | 700 | 0.1989 | -3.1039 | -5.4194 | 0.8235 | 2.3155 | -577.8538 | -343.1828 | 1.3421 | 1.5051 |
| 0.2163 | 0.2517 | 800 | 0.1564 | -3.4535 | -6.3881 | 0.8369 | 2.9346 | -674.7255 | -378.1511 | 1.8084 | 1.9697 |
| 0.178 | 0.2831 | 900 | 0.1349 | -3.4355 | -6.5411 | 0.8586 | 3.1056 | -690.0276 | -376.3503 | 1.7688 | 1.9492 |
| 0.1736 | 0.3146 | 1000 | 0.1127 | -3.5471 | -6.9599 | 0.8668 | 3.4128 | -731.9055 | -387.5069 | 2.0848 | 2.2440 |
| 0.1474 | 0.3460 | 1100 | 0.0982 | -3.6177 | -7.2322 | 0.8799 | 3.6145 | -759.1403 | -394.5700 | 1.8280 | 2.0076 |
| 0.1382 | 0.3775 | 1200 | 0.0819 | -4.3123 | -8.3603 | 0.8862 | 4.0480 | -871.9455 | -464.0287 | 2.0966 | 2.2833 |
| 0.1133 | 0.4089 | 1300 | 0.0714 | -4.0671 | -8.3309 | 0.8955 | 4.2638 | -869.0029 | -439.5055 | 1.9082 | 2.1044 |
| 0.1209 | 0.4404 | 1400 | 0.0634 | -4.8366 | -9.4739 | 0.8933 | 4.6374 | -983.3081 | -516.4533 | 2.0574 | 2.2678 |
| 0.1057 | 0.4718 | 1500 | 0.0575 | -4.1835 | -8.8581 | 0.9019 | 4.6746 | -921.7241 | -451.1488 | 2.0907 | 2.2780 |
| 0.1057 | 0.5033 | 1600 | 0.0536 | -4.2093 | -8.9250 | 0.9131 | 4.7157 | -928.4156 | -453.7231 | 2.0198 | 2.2136 |
| 0.0881 | 0.5348 | 1700 | 0.0490 | -4.4577 | -9.3694 | 0.9101 | 4.9118 | -972.8605 | -478.5644 | 1.8760 | 2.0804 |
| 0.0847 | 0.5662 | 1800 | 0.0441 | -4.2531 | -9.4108 | 0.9131 | 5.1578 | -977.0005 | -458.1054 | 2.0999 | 2.2904 |
| 0.0713 | 0.5977 | 1900 | 0.0411 | -4.4101 | -9.6543 | 0.9168 | 5.2442 | -1001.3448 | -473.8065 | 2.0887 | 2.2861 |
| 0.0553 | 0.6291 | 2000 | 0.0378 | -4.9687 | -10.5782 | 0.9123 | 5.6095 | -1093.7402 | -529.6686 | 2.0469 | 2.2608 |
| 0.0668 | 0.6606 | 2100 | 0.0362 | -4.7485 | -10.3227 | 0.9190 | 5.5741 | -1068.1823 | -507.6488 | 2.1354 | 2.3368 |
| 0.0528 | 0.6920 | 2200 | 0.0356 | -4.6766 | -10.2170 | 0.9175 | 5.5404 | -1057.6173 | -500.4605 | 1.9572 | 2.1594 |
| 0.0596 | 0.7235 | 2300 | 0.0340 | -4.6180 | -10.2121 | 0.9235 | 5.5942 | -1057.1299 | -494.5929 | 2.0041 | 2.2117 |
| 0.063 | 0.7550 | 2400 | 0.0328 | -4.5357 | -10.1876 | 0.9257 | 5.6519 | -1054.6713 | -486.3653 | 2.1493 | 2.3488 |
| 0.0558 | 0.7864 | 2500 | 0.0311 | -4.7155 | -10.5680 | 0.9261 | 5.8526 | -1092.7185 | -504.3435 | 2.1208 | 2.3275 |
| 0.0552 | 0.8179 | 2600 | 0.0312 | -4.6574 | -10.3658 | 0.9254 | 5.7084 | -1072.4943 | -498.5399 | 2.0544 | 2.2592 |
| 0.066 | 0.8493 | 2700 | 0.0305 | -4.6506 | -10.4766 | 0.9287 | 5.8259 | -1083.5740 | -497.8611 | 2.0914 | 2.2968 |
| 0.0568 | 0.8808 | 2800 | 0.0302 | -4.6423 | -10.4629 | 0.9302 | 5.8206 | -1082.2051 | -497.0266 | 2.0957 | 2.3026 |
| 0.0602 | 0.9122 | 2900 | 0.0299 | -4.6260 | -10.4608 | 0.9299 | 5.8348 | -1081.9958 | -495.3989 | 2.0861 | 2.2911 |
| 0.0634 | 0.9437 | 3000 | 0.0298 | -4.6454 | -10.4843 | 0.9313 | 5.8389 | -1084.3455 | -497.3409 | 2.0655 | 2.2739 |
| 0.0602 | 0.9751 | 3100 | 0.0299 | -4.6289 | -10.4404 | 0.9302 | 5.8116 | -1079.9603 | -495.6860 | 2.0537 | 2.2623 |
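
For interpreting the reward columns here and in the evaluation summary at the top: DPO trains no separate reward model, so the reported "rewards" are the implicit rewards β·log(π_θ/π_ref) on the chosen and rejected completions. Rewards/margins is chosen minus rejected, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one. For reference, the standard DPO objective (β is a hyperparameter not reported on this card) is:

```latex
% Standard DPO objective (Rafailov et al., 2023). The implicit reward is
% r(x, y) = beta * log( pi_theta(y|x) / pi_ref(y|x) ); beta is not on this card.
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
     \log \sigma\!\Bigl(
       \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
     - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
     \Bigr)
   \right]
```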

Framework versions

  • PEFT 0.13.0
  • Transformers 4.45.1
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0