mistralit2_1000_STEPS_5e7_rate_0.1_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7132
  • Rewards/chosen: -3.0068
  • Rewards/rejected: -5.0778
  • Rewards/accuracies: 0.6813
  • Rewards/margins: 2.0710
  • Logps/rejected: -79.3505
  • Logps/chosen: -53.4537
  • Logits/rejected: -2.5776
  • Logits/chosen: -2.5788
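
For context (not stated in this card): under the standard DPO formulation, and assuming the usual definitions, the reward columns above are the scaled log-probability ratios between this policy and the frozen reference model, and the margin is simply their difference:

$$
r_\theta(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\bigr)
$$

Consistent with this, Rewards/margins equals Rewards/chosen minus Rewards/rejected (-3.0068 - (-5.0778) = 2.0710), and the β of 0.1 suggested by the model name would be the scaling factor above.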

Model description

More information needed

Intended uses & limitations

More information needed
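
Since no usage details are documented, here is a minimal inference sketch. It assumes the checkpoint keeps the Mistral-7B-Instruct-v0.2 chat template and that the repository id matches this card's title; the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch (assumptions: the checkpoint uses the base model's
# chat template; repo id matches this card). Requires transformers + torch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_5e7_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```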

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training sketch using these values follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
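
The list above maps naturally onto TRL's DPOTrainer; the sketch below is a hedged reconstruction, not the authors' script. The preference dataset is unknown (placeholder path), the β of 0.1 is inferred from the model name, and the DPOTrainer keyword arguments assume a TRL release contemporary with Transformers 4.38.

```python
# Hedged reconstruction of the training setup using TRL's DPOTrainer.
# Dataset, output paths, and exact TRL version are assumptions, not from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: the actual preference dataset is not documented in the card.
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_5e7_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # TRL clones the policy to act as the frozen reference
    args=args,
    beta=0.1,            # inferred from "0.1_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```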

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6418 | 0.1  | 50   | 0.6447 | -0.5872 | -0.7568 | 0.5736 | 0.1696 | -36.1403 | -29.2577 | -2.8316 | -2.8320 |
| 0.5915 | 0.2  | 100  | 0.6534 | -2.5902 | -3.2664 | 0.6000 | 0.6762 | -61.2361 | -49.2879 | -2.5920 | -2.5930 |
| 0.6181 | 0.29 | 150  | 0.6108 | -1.7262 | -2.4531 | 0.6352 | 0.7270 | -53.1036 | -40.6475 | -2.6698 | -2.6708 |
| 0.5919 | 0.39 | 200  | 0.6201 | -0.8739 | -1.3497 | 0.6110 | 0.4758 | -42.0694 | -32.1245 | -2.8217 | -2.8224 |
| 0.7232 | 0.49 | 250  | 0.6496 | -2.3019 | -2.8348 | 0.6110 | 0.5328 | -56.9199 | -46.4053 | -2.8105 | -2.8116 |
| 0.6175 | 0.59 | 300  | 0.6052 | -1.3274 | -2.0772 | 0.6440 | 0.7497 | -49.3443 | -36.6603 | -2.8706 | -2.8714 |
| 0.6294 | 0.68 | 350  | 0.5762 | -0.5378 | -1.3786 | 0.6484 | 0.8407 | -42.3582 | -28.7642 | -2.8508 | -2.8515 |
| 0.5572 | 0.78 | 400  | 0.5838 | -2.3342 | -3.3990 | 0.6615 | 1.0648 | -62.5628 | -46.7279 | -2.9194 | -2.9202 |
| 0.5339 | 0.88 | 450  | 0.6065 | -2.3478 | -3.1946 | 0.6615 | 0.8468 | -60.5187 | -46.8642 | -2.8735 | -2.8743 |
| 0.5162 | 0.98 | 500  | 0.6054 | -1.8059 | -2.8617 | 0.6593 | 1.0558 | -57.1895 | -41.4452 | -2.8408 | -2.8416 |
| 0.1367 | 1.07 | 550  | 0.5967 | -1.5441 | -3.2437 | 0.6923 | 1.6996 | -61.0093 | -38.8268 | -2.7152 | -2.7164 |
| 0.1427 | 1.17 | 600  | 0.6612 | -2.6012 | -4.5496 | 0.6923 | 1.9484 | -74.0686 | -49.3976 | -2.6127 | -2.6140 |
| 0.2423 | 1.27 | 650  | 0.6953 | -3.2920 | -5.2913 | 0.6835 | 1.9992 | -81.4852 | -56.3063 | -2.5920 | -2.5933 |
| 0.2461 | 1.37 | 700  | 0.6994 | -3.0907 | -5.0995 | 0.6791 | 2.0088 | -79.5678 | -54.2931 | -2.5993 | -2.6005 |
| 0.05   | 1.46 | 750  | 0.7081 | -2.9719 | -5.0539 | 0.6835 | 2.0820 | -79.1113 | -53.1052 | -2.5893 | -2.5906 |
| 0.1265 | 1.56 | 800  | 0.7096 | -2.9511 | -5.0249 | 0.6791 | 2.0739 | -78.8217 | -52.8965 | -2.5798 | -2.5810 |
| 0.1903 | 1.66 | 850  | 0.7099 | -3.0000 | -5.0705 | 0.6769 | 2.0705 | -79.2773 | -53.3856 | -2.5782 | -2.5795 |
| 0.1908 | 1.76 | 900  | 0.7144 | -3.0075 | -5.0795 | 0.6857 | 2.0720 | -79.3678 | -53.4610 | -2.5779 | -2.5792 |
| 0.2293 | 1.86 | 950  | 0.7119 | -3.0087 | -5.0829 | 0.6835 | 2.0742 | -79.4011 | -53.4726 | -2.5778 | -2.5790 |
| 0.1167 | 1.95 | 1000 | 0.7132 | -3.0068 | -5.0778 | 0.6813 | 2.0710 | -79.3505 | -53.4537 | -2.5776 | -2.5788 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2