---
license: apache-2.0
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- generated_from_trainer
model-index:
- name: openhermes-mistral-2.5-7b-dpo-test
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# openhermes-mistral-2.5-7b-dpo-test

This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) on an unspecified dataset.
It achieves the following results on the evaluation set (the note after this list explains how these DPO-style metrics are typically computed):
- Loss: 0.4487
- Rewards/chosen: -0.2951
- Rewards/rejected: -2.2421
- Rewards/accuracies: 0.875
- Rewards/margins: 1.9470
- Logps/rejected: -257.4751
- Logps/chosen: -204.3027
- Logits/rejected: -3.0752
- Logits/chosen: -3.0485
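
These metric names follow the convention of TRL's `DPOTrainer`; the card does not state which trainer produced them, so the interpretation below is an assumption. Under it, the rewards are the beta-scaled gap between the fine-tuned policy's and the reference model's log-probabilities of each response, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of preference pairs where the chosen response receives the higher reward. A minimal sketch:

```python
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of DPO-style eval metrics from per-sequence log-probabilities.

    All inputs are 1-D tensors, one entry per preference pair (log-probs summed
    over response tokens). beta=0.1 is an assumption (TRL's default), not a
    value taken from this card.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        "logps/chosen": policy_chosen_logps.mean().item(),
        "logps/rejected": policy_rejected_logps.mean().item(),
    }
```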
## Model description

More information needed

## Intended uses & limitations

More information needed
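
Pending a fuller description, here is a minimal inference sketch. It assumes the fine-tune keeps the ChatML prompt format of the base OpenHermes-2.5-Mistral-7B model and that the checkpoint is published as full model weights; the repository id below is a hypothetical placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "your-namespace/openhermes-mistral-2.5-7b-dpo-test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML-style prompt, as used by the base OpenHermes-2.5 model.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSummarize what DPO fine-tuning does in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the checkpoint is actually a PEFT/LoRA adapter rather than merged weights, it would need to be loaded with the `peft` library instead; the card does not specify.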
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch using these values follows the list):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200
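
The card does not say which training script produced these numbers, but the hyperparameter names follow the Hugging Face `Trainer` conventions and the reward metrics match TRL's `DPOTrainer`. A reproduction sketch under that assumption; the preference dataset, `beta`, and sequence lengths are placeholders, not values taken from this card:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # assumption: TRL's DPO implementation

base = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token is set
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Placeholder: the card does not name the preference dataset.
# DPOTrainer expects "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

training_args = TrainingArguments(
    output_dir="openhermes-mistral-2.5-7b-dpo-test",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=200,
    seed=42,
    evaluation_strategy="steps",  # eval every 10 steps, matching the results table below
    eval_steps=10,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,                 # assumption: TRL's default DPO beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    max_length=1024,          # placeholder sequence lengths
    max_prompt_length=512,
)
trainer.train()
```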
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1645 | 0.01 | 10 | 0.5339 | 0.3993 | -0.1483 | 0.6875 | 0.5476 | -236.5374 | -197.3593 | -3.1575 | -3.1872 |
| 0.0519 | 0.01 | 20 | 0.5521 | 0.2239 | -0.4486 | 0.625 | 0.6725 | -239.5405 | -199.1127 | -3.1969 | -3.2456 |
| 0.1618 | 0.01 | 30 | 0.5866 | -0.0538 | -0.8893 | 0.5625 | 0.8355 | -243.9472 | -201.8902 | -3.2286 | -3.2525 |
| 0.1752 | 0.02 | 40 | 0.5943 | -0.2184 | -1.2057 | 0.5 | 0.9873 | -247.1112 | -203.5360 | -3.2201 | -3.2477 |
| 0.3811 | 0.03 | 50 | 0.6973 | -0.6180 | -1.8146 | 0.5 | 1.1966 | -253.2001 | -207.5316 | -3.1943 | -3.2034 |
| 1.158 | 0.03 | 60 | 0.6347 | -0.4710 | -1.7363 | 0.5625 | 1.2653 | -252.4173 | -206.0622 | -3.1655 | -3.1197 |
| 0.8751 | 0.04 | 70 | 0.6103 | -0.4061 | -1.5966 | 0.5625 | 1.1905 | -251.0201 | -205.4132 | -3.1360 | -3.0544 |
| 0.7811 | 0.04 | 80 | 0.6405 | -0.4774 | -1.6574 | 0.5625 | 1.1799 | -251.6278 | -206.1260 | -3.1337 | -3.0492 |
| 1.4305 | 0.04 | 90 | 0.6257 | -0.4784 | -1.6184 | 0.5625 | 1.1399 | -251.2379 | -206.1361 | -3.1251 | -3.0489 |
| 0.5478 | 0.05 | 100 | 0.6191 | -0.5317 | -1.7067 | 0.5625 | 1.1750 | -252.1214 | -206.6691 | -3.1207 | -3.0753 |
| 0.6344 | 0.06 | 110 | 0.5691 | -0.4827 | -1.7734 | 0.5625 | 1.2907 | -252.7882 | -206.1789 | -3.1075 | -3.0806 |
| 0.5405 | 0.06 | 120 | 0.5337 | -0.4681 | -2.1739 | 0.8125 | 1.7058 | -256.7935 | -206.0332 | -3.1124 | -3.0733 |
| 0.7848 | 0.07 | 130 | 0.5390 | -0.5288 | -2.3789 | 0.8125 | 1.8501 | -258.8436 | -206.6404 | -3.1019 | -3.0628 |
| 1.3119 | 0.07 | 140 | 0.4753 | -0.3276 | -2.0907 | 0.875 | 1.7631 | -255.9614 | -204.6279 | -3.0904 | -3.0648 |
| 0.3636 | 0.07 | 150 | 0.4555 | -0.2566 | -2.0064 | 0.625 | 1.7498 | -255.1179 | -203.9175 | -3.0804 | -3.0640 |
| 0.427 | 0.08 | 160 | 0.4614 | -0.2900 | -2.0804 | 0.625 | 1.7904 | -255.8585 | -204.2518 | -3.0721 | -3.0518 |
| 0.8971 | 0.09 | 170 | 0.4629 | -0.3117 | -2.1791 | 0.875 | 1.8673 | -256.8448 | -204.4694 | -3.0711 | -3.0468 |
| 0.6219 | 0.09 | 180 | 0.4560 | -0.3042 | -2.2114 | 0.875 | 1.9073 | -257.1686 | -204.3934 | -3.0743 | -3.0485 |
| 0.7551 | 0.1 | 190 | 0.4520 | -0.3007 | -2.2400 | 0.875 | 1.9392 | -257.4540 | -204.3593 | -3.0755 | -3.0481 |
| 1.0917 | 0.1 | 200 | 0.4487 | -0.2951 | -2.2421 | 0.875 | 1.9470 | -257.4751 | -204.3027 | -3.0752 | -3.0485 |
### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
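
A quick way to confirm a local environment matches these pins:

```python
# Print the installed versions of the libraries listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.34.1
print("Pytorch:", torch.__version__)              # expected 2.1.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.14.6
print("Tokenizers:", tokenizers.__version__)      # expected 0.14.1
```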