RyanYr committed
Commit 9aa4e77 · verified · Parent: 73544ec

Model save

README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 base_model: mistralai/Ministral-8B-Instruct-2410
 library_name: transformers
-model_name: reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02
+model_name: reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1
 tags:
 - generated_from_trainer
 - trl
@@ -9,7 +9,7 @@ tags:
 licence: license
 ---
 
-# Model Card for reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02
+# Model Card for reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1
 
 This model is a fine-tuned version of [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410).
 It has been trained using [TRL](https://github.com/huggingface/trl).
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02", device="cuda")
+generator = pipeline("text-generation", model="RyanYr/reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/uun0ytpj)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/kdzwa0gl)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
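The README above says the model was trained with DPO via TRL (in practice, TRL's `DPOTrainer` handles the batched objective). For context, here is a minimal, self-contained sketch of the per-example DPO loss from the cited paper, `-log sigmoid(beta * margin)`, where the margin compares policy vs. reference log-probabilities on chosen and rejected responses. The function name, `beta` value, and log-probabilities below are illustrative, not taken from this training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)), written as log1p(exp(-x)) for stability.
    return math.log1p(math.exp(-beta * margin))

# Toy log-probabilities (illustrative values only): the policy prefers
# the chosen response more than the reference does, so the loss is small.
loss = dpo_loss(policy_chosen_logp=-10.0, policy_rejected_logp=-11.0,
                ref_chosen_logp=-12.0, ref_rejected_logp=-9.0)
print(round(loss, 4))
```

A larger positive margin drives the loss toward zero; a zero margin gives `log(2)`, the loss of an indifferent policy.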
last_checkpoint/model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c4117303f4be9b97ddb10195a7f1a812e91561cd90ea810d5ec17aaee97c938b
+oid sha256:07eb7c315bbb4e3da4b3c65a4effcfcabe3b55018c9b6235afbe46b45a7cb47e
 size 4983016096
last_checkpoint/model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:def07944035e06c0434bd72075a71ced40845d039af7774e91ff83c005d9c92c
+oid sha256:32b3ff53e68129cab66948efe4bff608a0d0ac69645d32186ff6dd17f0c71f00
 size 4999836776
last_checkpoint/model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:331d2388494f13122bc7b08ddf0cceccbd5022b4efa037b8f3b466b8073a3232
+oid sha256:61fe331b7687901c814c10461d3af972ca1c4761d95fb0b626ba3ceec7b5beb2
 size 4983067960
last_checkpoint/model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:44f2dab95fadf4be92b4cdd435489335b4718b1db8316d78b7cbdc21a26ccf68
+oid sha256:5de10b97927d0f07135501e4eac8e203205863cb4a7e197565835326c9d2e065
 size 1073750144
last_checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:da611df76d37b7438ad654aae0944d344d6fd43df2bc53615d21bfe4451aecef
+oid sha256:b4a9368ff66c902f09859d2561c42082708e68abdd109e26fa3ca72c2cdd839f
 size 8056