mistralit2_1000_STEPS_5e7_rate_03_beta_DPO
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2, trained with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):
- Loss: 1.0554
- Rewards/chosen: -4.6458
- Rewards/rejected: -7.9897
- Rewards/accuracies: 0.6593
- Rewards/margins: 3.3439
- Logps/rejected: -55.2048
- Logps/chosen: -38.8718
- Logits/rejected: -2.6256
- Logits/chosen: -2.6266
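
For context, these reward metrics follow the standard DPO definition: each reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the margin is the chosen-minus-rejected difference. A brief sketch, where β = 0.3 is an assumption inferred from the model name rather than a documented value:

```latex
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right], \qquad \beta = 0.3
\text{margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\big( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \big)
```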
Model description
More information needed
Intended uses & limitations
More information needed
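
In the absence of documented usage, the model can presumably be loaded like any other Mistral-Instruct-style causal LM. A minimal inference sketch, assuming the repo id shown in this card and that the base model's chat template is preserved by the fine-tune:

```python
# Minimal inference sketch (assumption: standard causal-LM loading; usage is not documented in this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_5e7_rate_03_beta_DPO"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Mistral-Instruct models ship a chat template; assuming this fine-tune keeps it.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```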
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
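
A minimal sketch of how these hyperparameters could map onto TRL's `DPOTrainer`. The preference dataset, the β value of 0.3 (inferred from the model name), and the TRL 0.7-era API are assumptions, since the card does not record them:

```python
# Training-configuration sketch (assumptions: TRL ~0.7 DPOTrainer API, beta=0.3 from the model
# name, and a placeholder preference dataset -- the actual dataset is not documented).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Placeholder: any dataset with "prompt", "chosen", "rejected" columns (hypothetical name).
train_dataset = load_dataset("your/preference_dataset", split="train")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_5e7_rate_03_beta_DPO",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 8
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # TRL creates a frozen reference copy when None
    args=training_args,
    beta=0.3,                # assumption: from "03_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```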
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6053 | 0.1 | 50 | 0.6740 | -0.3080 | -0.4763 | 0.5429 | 0.1682 | -30.1599 | -24.4126 | -2.8583 | -2.8587 |
0.6455 | 0.2 | 100 | 0.6888 | -1.9296 | -2.8473 | 0.6110 | 0.9176 | -38.0633 | -29.8180 | -2.7011 | -2.7016 |
0.6646 | 0.29 | 150 | 0.8842 | -4.4677 | -5.7716 | 0.5956 | 1.3039 | -47.8112 | -38.2782 | -2.7068 | -2.7076 |
0.8576 | 0.39 | 200 | 0.8269 | 0.2095 | -0.3290 | 0.5341 | 0.5385 | -29.6690 | -22.6876 | -2.8074 | -2.8077 |
0.9282 | 0.49 | 250 | 0.8715 | -3.3030 | -4.1864 | 0.5758 | 0.8834 | -42.5272 | -34.3958 | -2.8320 | -2.8326 |
0.883 | 0.59 | 300 | 0.8491 | -1.6930 | -2.7293 | 0.5846 | 1.0364 | -37.6702 | -29.0290 | -2.8023 | -2.8028 |
0.7641 | 0.68 | 350 | 0.8305 | -0.5284 | -1.4934 | 0.5868 | 0.9650 | -33.5504 | -25.1471 | -2.8008 | -2.8013 |
0.8485 | 0.78 | 400 | 0.8168 | -1.8042 | -3.2662 | 0.6286 | 1.4620 | -39.4597 | -29.3999 | -2.8978 | -2.8983 |
0.6637 | 0.88 | 450 | 0.9089 | -4.1779 | -5.6349 | 0.6220 | 1.4570 | -47.3556 | -37.3123 | -2.7996 | -2.8003 |
0.8293 | 0.98 | 500 | 0.7790 | -1.7260 | -3.1768 | 0.6242 | 1.4508 | -39.1617 | -29.1392 | -2.7937 | -2.7943 |
0.1061 | 1.07 | 550 | 0.8642 | -2.6748 | -4.9677 | 0.6659 | 2.2929 | -45.1314 | -32.3019 | -2.7609 | -2.7616 |
0.1183 | 1.17 | 600 | 1.0052 | -4.2792 | -7.1691 | 0.6527 | 2.8899 | -52.4695 | -37.6498 | -2.6760 | -2.6769 |
0.3423 | 1.27 | 650 | 1.0032 | -4.1972 | -7.1444 | 0.6571 | 2.9472 | -52.3871 | -37.3765 | -2.6563 | -2.6572 |
0.3015 | 1.37 | 700 | 1.0111 | -4.0263 | -7.1542 | 0.6549 | 3.1280 | -52.4198 | -36.8067 | -2.6518 | -2.6526 |
0.0814 | 1.46 | 750 | 1.0416 | -4.3351 | -7.5972 | 0.6484 | 3.2621 | -53.8964 | -37.8360 | -2.6335 | -2.6344 |
0.1279 | 1.56 | 800 | 1.0511 | -4.6097 | -7.9321 | 0.6505 | 3.3224 | -55.0127 | -38.7514 | -2.6277 | -2.6287 |
0.1507 | 1.66 | 850 | 1.0478 | -4.6393 | -7.9834 | 0.6484 | 3.3441 | -55.1838 | -38.8501 | -2.6262 | -2.6272 |
0.2148 | 1.76 | 900 | 1.0515 | -4.6439 | -7.9924 | 0.6527 | 3.3485 | -55.2139 | -38.8655 | -2.6260 | -2.6270 |
0.2291 | 1.86 | 950 | 1.0554 | -4.6452 | -7.9877 | 0.6505 | 3.3425 | -55.1980 | -38.8697 | -2.6257 | -2.6267 |
0.13 | 1.95 | 1000 | 1.0554 | -4.6458 | -7.9897 | 0.6593 | 3.3439 | -55.2048 | -38.8718 | -2.6256 | -2.6266 |
Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2