mistralit2_1000_STEPS_5e7_rate_03_beta_DPO

This model is a DPO fine-tune of mistralai/Mistral-7B-Instruct-v0.2; the training dataset is not documented here. It achieves the following results on the evaluation set:

  • Loss: 1.0554
  • Rewards/chosen: -4.6458
  • Rewards/rejected: -7.9897
  • Rewards/accuracies: 0.6593
  • Rewards/margins: 3.3439
  • Logps/rejected: -55.2048
  • Logps/chosen: -38.8718
  • Logits/rejected: -2.6256
  • Logits/chosen: -2.6266
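
For inference, the checkpoint loads like any other Mistral-Instruct fine-tune. Below is a minimal usage sketch with transformers; the Hub repo id is assumed to be tsavage68/mistralit2_1000_STEPS_5e7_rate_03_beta_DPO (matching this card), and the dtype/device settings should be adjusted to your hardware.

```python
# Minimal usage sketch (assumed repo id; adjust dtype/device to your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_5e7_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the checkpoint is stored in FP16
    device_map="auto",
)

# Mistral-Instruct uses a chat template; build the prompt through it.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```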

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
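
The card does not name the training framework or dataset. A minimal sketch of how these hyperparameters could map onto TRL's DPOTrainer is shown below, assuming a TRL release contemporaneous with Transformers 4.38 (where beta and the tokenizer are passed to the trainer directly); the preference dataset is a placeholder, and beta=0.3 is inferred from the "03_beta" in the model name rather than stated on the card.

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# The real dataset is undocumented; this toy Dataset only shows the expected columns.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder preference data with the prompt/chosen/rejected columns DPOTrainer expects.
preference_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Less preferred response"],
})

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_5e7_rate_03_beta_DPO",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size of 8
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                   # TRL clones the policy as the frozen reference
    args=training_args,
    beta=0.3,                         # assumption: "03_beta" in the model name
    train_dataset=preference_dataset, # placeholder preference data
    tokenizer=tokenizer,
)
trainer.train()
```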

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6053 | 0.1 | 50 | 0.6740 | -0.3080 | -0.4763 | 0.5429 | 0.1682 | -30.1599 | -24.4126 | -2.8583 | -2.8587 |
| 0.6455 | 0.2 | 100 | 0.6888 | -1.9296 | -2.8473 | 0.6110 | 0.9176 | -38.0633 | -29.8180 | -2.7011 | -2.7016 |
| 0.6646 | 0.29 | 150 | 0.8842 | -4.4677 | -5.7716 | 0.5956 | 1.3039 | -47.8112 | -38.2782 | -2.7068 | -2.7076 |
| 0.8576 | 0.39 | 200 | 0.8269 | 0.2095 | -0.3290 | 0.5341 | 0.5385 | -29.6690 | -22.6876 | -2.8074 | -2.8077 |
| 0.9282 | 0.49 | 250 | 0.8715 | -3.3030 | -4.1864 | 0.5758 | 0.8834 | -42.5272 | -34.3958 | -2.8320 | -2.8326 |
| 0.883 | 0.59 | 300 | 0.8491 | -1.6930 | -2.7293 | 0.5846 | 1.0364 | -37.6702 | -29.0290 | -2.8023 | -2.8028 |
| 0.7641 | 0.68 | 350 | 0.8305 | -0.5284 | -1.4934 | 0.5868 | 0.9650 | -33.5504 | -25.1471 | -2.8008 | -2.8013 |
| 0.8485 | 0.78 | 400 | 0.8168 | -1.8042 | -3.2662 | 0.6286 | 1.4620 | -39.4597 | -29.3999 | -2.8978 | -2.8983 |
| 0.6637 | 0.88 | 450 | 0.9089 | -4.1779 | -5.6349 | 0.6220 | 1.4570 | -47.3556 | -37.3123 | -2.7996 | -2.8003 |
| 0.8293 | 0.98 | 500 | 0.7790 | -1.7260 | -3.1768 | 0.6242 | 1.4508 | -39.1617 | -29.1392 | -2.7937 | -2.7943 |
| 0.1061 | 1.07 | 550 | 0.8642 | -2.6748 | -4.9677 | 0.6659 | 2.2929 | -45.1314 | -32.3019 | -2.7609 | -2.7616 |
| 0.1183 | 1.17 | 600 | 1.0052 | -4.2792 | -7.1691 | 0.6527 | 2.8899 | -52.4695 | -37.6498 | -2.6760 | -2.6769 |
| 0.3423 | 1.27 | 650 | 1.0032 | -4.1972 | -7.1444 | 0.6571 | 2.9472 | -52.3871 | -37.3765 | -2.6563 | -2.6572 |
| 0.3015 | 1.37 | 700 | 1.0111 | -4.0263 | -7.1542 | 0.6549 | 3.1280 | -52.4198 | -36.8067 | -2.6518 | -2.6526 |
| 0.0814 | 1.46 | 750 | 1.0416 | -4.3351 | -7.5972 | 0.6484 | 3.2621 | -53.8964 | -37.8360 | -2.6335 | -2.6344 |
| 0.1279 | 1.56 | 800 | 1.0511 | -4.6097 | -7.9321 | 0.6505 | 3.3224 | -55.0127 | -38.7514 | -2.6277 | -2.6287 |
| 0.1507 | 1.66 | 850 | 1.0478 | -4.6393 | -7.9834 | 0.6484 | 3.3441 | -55.1838 | -38.8501 | -2.6262 | -2.6272 |
| 0.2148 | 1.76 | 900 | 1.0515 | -4.6439 | -7.9924 | 0.6527 | 3.3485 | -55.2139 | -38.8655 | -2.6260 | -2.6270 |
| 0.2291 | 1.86 | 950 | 1.0554 | -4.6452 | -7.9877 | 0.6505 | 3.3425 | -55.1980 | -38.8697 | -2.6257 | -2.6267 |
| 0.13 | 1.95 | 1000 | 1.0554 | -4.6458 | -7.9897 | 0.6593 | 3.3439 | -55.2048 | -38.8718 | -2.6256 | -2.6266 |
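
The reward and log-probability columns are TRL's standard DPO diagnostics. As a rough reminder of what they measure (beta is the DPO temperature, inferred to be 0.3 from the model name rather than stated on the card):

```latex
% Implicit DPO reward of a completion y for prompt x, and the pairwise loss
% over a preferred completion y_w and a dispreferred completion y_l.
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\!\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```

Roughly, Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and dispreferred completions, Rewards/margins is their mean difference, Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward, and the Logps columns are the corresponding policy log-probabilities of the completions.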

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2