RyanYr committed
Commit 9aa4e77 · verified · Parent: 73544ec

Model save

README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 base_model: mistralai/Ministral-8B-Instruct-2410
 library_name: transformers
-model_name: reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02
+model_name: reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1
 tags:
 - generated_from_trainer
 - trl
@@ -9,7 +9,7 @@ tags:
 licence: license
 ---
 
-# Model Card for reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02
+# Model Card for reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1
 
 This model is a fine-tuned version of [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410).
 It has been trained using [TRL](https://github.com/huggingface/trl).
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT02", device="cuda")
+generator = pipeline("text-generation", model="RyanYr/reflect_mini8Bit_Om2G8kOm2AgG8k40kIpsdpT1", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/uun0ytpj)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/kdzwa0gl)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
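The README above says the model was trained with DPO via TRL (in practice, TRL's `DPOTrainer` handles the batched objective). For context, here is a minimal, self-contained sketch of the per-example DPO loss from the cited paper, `-log sigmoid(beta * margin)`, where the margin compares policy vs. reference log-probabilities on chosen and rejected responses. The function name, `beta` value, and log-probabilities below are illustrative, not taken from this training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)), written as log1p(exp(-x)) for stability.
    return math.log1p(math.exp(-beta * margin))

# Toy log-probabilities (illustrative values only): the policy prefers
# the chosen response more than the reference does, so the loss is small.
loss = dpo_loss(policy_chosen_logp=-10.0, policy_rejected_logp=-11.0,
                ref_chosen_logp=-12.0, ref_rejected_logp=-9.0)
print(round(loss, 4))
```

A larger positive margin drives the loss toward zero; a zero margin gives `log(2)`, the loss of an indifferent policy.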
last_checkpoint/model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c4117303f4be9b97ddb10195a7f1a812e91561cd90ea810d5ec17aaee97c938b
+oid sha256:07eb7c315bbb4e3da4b3c65a4effcfcabe3b55018c9b6235afbe46b45a7cb47e
 size 4983016096
last_checkpoint/model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:def07944035e06c0434bd72075a71ced40845d039af7774e91ff83c005d9c92c
+oid sha256:32b3ff53e68129cab66948efe4bff608a0d0ac69645d32186ff6dd17f0c71f00
 size 4999836776
last_checkpoint/model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:331d2388494f13122bc7b08ddf0cceccbd5022b4efa037b8f3b466b8073a3232
+oid sha256:61fe331b7687901c814c10461d3af972ca1c4761d95fb0b626ba3ceec7b5beb2
 size 4983067960
last_checkpoint/model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:44f2dab95fadf4be92b4cdd435489335b4718b1db8316d78b7cbdc21a26ccf68
+oid sha256:5de10b97927d0f07135501e4eac8e203205863cb4a7e197565835326c9d2e065
 size 1073750144
last_checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:da611df76d37b7438ad654aae0944d344d6fd43df2bc53615d21bfe4451aecef
+oid sha256:b4a9368ff66c902f09859d2561c42082708e68abdd109e26fa3ca72c2cdd839f
 size 8056