---
license: apache-2.0
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- generated_from_trainer
model-index:
- name: openhermes-mistral-2.5-7b-dpo-test
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# openhermes-mistral-2.5-7b-dpo-test

This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) on an unspecified dataset.
It achieves the following results on the evaluation set (the note after this list explains how these DPO-style metrics are typically computed):
- Loss: 0.4487
- Rewards/chosen: -0.2951
- Rewards/rejected: -2.2421
- Rewards/accuracies: 0.875
- Rewards/margins: 1.9470
- Logps/rejected: -257.4751
- Logps/chosen: -204.3027
- Logits/rejected: -3.0752
- Logits/chosen: -3.0485
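
These metric names follow the convention of TRL's `DPOTrainer`; the card does not state which trainer produced them, so the interpretation below is an assumption. Under it, the rewards are the beta-scaled gap between the fine-tuned policy's and the reference model's log-probabilities of each response, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of preference pairs where the chosen response receives the higher reward. A minimal sketch:

```python
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of DPO-style eval metrics from per-sequence log-probabilities.

    All inputs are 1-D tensors, one entry per preference pair (log-probs summed
    over response tokens). beta=0.1 is an assumption (TRL's default), not a
    value taken from this card.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        "logps/chosen": policy_chosen_logps.mean().item(),
        "logps/rejected": policy_rejected_logps.mean().item(),
    }
```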
## Model description

More information needed

## Intended uses & limitations

More information needed
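
Pending a fuller description, here is a minimal inference sketch. It assumes the fine-tune keeps the ChatML prompt format of the base OpenHermes-2.5-Mistral-7B model and that the checkpoint is published as full model weights; the repository id below is a hypothetical placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "your-namespace/openhermes-mistral-2.5-7b-dpo-test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML-style prompt, as used by the base OpenHermes-2.5 model.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSummarize what DPO fine-tuning does in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the checkpoint is actually a PEFT/LoRA adapter rather than merged weights, it would need to be loaded with the `peft` library instead; the card does not specify.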
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch using these values follows the list):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200
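
The card does not say which training script produced these numbers, but the hyperparameter names follow the Hugging Face `Trainer` conventions and the reward metrics match TRL's `DPOTrainer`. A reproduction sketch under that assumption; the preference dataset, `beta`, and sequence lengths are placeholders, not values taken from this card:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # assumption: TRL's DPO implementation

base = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token is set
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Placeholder: the card does not name the preference dataset.
# DPOTrainer expects "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

training_args = TrainingArguments(
    output_dir="openhermes-mistral-2.5-7b-dpo-test",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=200,
    seed=42,
    evaluation_strategy="steps",  # eval every 10 steps, matching the results table below
    eval_steps=10,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,                 # assumption: TRL's default DPO beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    max_length=1024,          # placeholder sequence lengths
    max_prompt_length=512,
)
trainer.train()
```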
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1645 | 0.01 | 10 | 0.5339 | 0.3993 | -0.1483 | 0.6875 | 0.5476 | -236.5374 | -197.3593 | -3.1575 | -3.1872 |
| 0.0519 | 0.01 | 20 | 0.5521 | 0.2239 | -0.4486 | 0.625 | 0.6725 | -239.5405 | -199.1127 | -3.1969 | -3.2456 |
| 0.1618 | 0.01 | 30 | 0.5866 | -0.0538 | -0.8893 | 0.5625 | 0.8355 | -243.9472 | -201.8902 | -3.2286 | -3.2525 |
| 0.1752 | 0.02 | 40 | 0.5943 | -0.2184 | -1.2057 | 0.5 | 0.9873 | -247.1112 | -203.5360 | -3.2201 | -3.2477 |
| 0.3811 | 0.03 | 50 | 0.6973 | -0.6180 | -1.8146 | 0.5 | 1.1966 | -253.2001 | -207.5316 | -3.1943 | -3.2034 |
| 1.158 | 0.03 | 60 | 0.6347 | -0.4710 | -1.7363 | 0.5625 | 1.2653 | -252.4173 | -206.0622 | -3.1655 | -3.1197 |
| 0.8751 | 0.04 | 70 | 0.6103 | -0.4061 | -1.5966 | 0.5625 | 1.1905 | -251.0201 | -205.4132 | -3.1360 | -3.0544 |
| 0.7811 | 0.04 | 80 | 0.6405 | -0.4774 | -1.6574 | 0.5625 | 1.1799 | -251.6278 | -206.1260 | -3.1337 | -3.0492 |
| 1.4305 | 0.04 | 90 | 0.6257 | -0.4784 | -1.6184 | 0.5625 | 1.1399 | -251.2379 | -206.1361 | -3.1251 | -3.0489 |
| 0.5478 | 0.05 | 100 | 0.6191 | -0.5317 | -1.7067 | 0.5625 | 1.1750 | -252.1214 | -206.6691 | -3.1207 | -3.0753 |
| 0.6344 | 0.06 | 110 | 0.5691 | -0.4827 | -1.7734 | 0.5625 | 1.2907 | -252.7882 | -206.1789 | -3.1075 | -3.0806 |
| 0.5405 | 0.06 | 120 | 0.5337 | -0.4681 | -2.1739 | 0.8125 | 1.7058 | -256.7935 | -206.0332 | -3.1124 | -3.0733 |
| 0.7848 | 0.07 | 130 | 0.5390 | -0.5288 | -2.3789 | 0.8125 | 1.8501 | -258.8436 | -206.6404 | -3.1019 | -3.0628 |
| 1.3119 | 0.07 | 140 | 0.4753 | -0.3276 | -2.0907 | 0.875 | 1.7631 | -255.9614 | -204.6279 | -3.0904 | -3.0648 |
| 0.3636 | 0.07 | 150 | 0.4555 | -0.2566 | -2.0064 | 0.625 | 1.7498 | -255.1179 | -203.9175 | -3.0804 | -3.0640 |
| 0.427 | 0.08 | 160 | 0.4614 | -0.2900 | -2.0804 | 0.625 | 1.7904 | -255.8585 | -204.2518 | -3.0721 | -3.0518 |
| 0.8971 | 0.09 | 170 | 0.4629 | -0.3117 | -2.1791 | 0.875 | 1.8673 | -256.8448 | -204.4694 | -3.0711 | -3.0468 |
| 0.6219 | 0.09 | 180 | 0.4560 | -0.3042 | -2.2114 | 0.875 | 1.9073 | -257.1686 | -204.3934 | -3.0743 | -3.0485 |
| 0.7551 | 0.1 | 190 | 0.4520 | -0.3007 | -2.2400 | 0.875 | 1.9392 | -257.4540 | -204.3593 | -3.0755 | -3.0481 |
| 1.0917 | 0.1 | 200 | 0.4487 | -0.2951 | -2.2421 | 0.875 | 1.9470 | -257.4751 | -204.3027 | -3.0752 | -3.0485 |
### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
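
A quick way to confirm a local environment matches these pins:

```python
# Print the installed versions of the libraries listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.34.1
print("Pytorch:", torch.__version__)              # expected 2.1.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.14.6
print("Tokenizers:", tokenizers.__version__)      # expected 0.14.1
```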