nielsr (HF Staff) · Commit 9596fe2 (verified) · Parent(s): c004c9b

Add library name and link to code


This PR adds a link to the GitHub repository to improve discoverability. It also declares the library name (Transformers) in the model card metadata, which makes the library appear in the top right corner of the model page.
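Once `library_name: transformers` is declared, the Hub can surface a ready-made loading snippet for the model. Below is a minimal sketch of that usage, assuming the checkpoint lives at `general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B`; the repo id is inferred from the org hosting GPM-Gemma-2B and is an assumption, not something this PR confirms.

```python
# Minimal sketch: load the model with Transformers once library_name is declared.
# NOTE: the repo id below is an assumption inferred from the org hosting GPM-Gemma-2B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # bf16 roughly halves memory vs. fp32
    device_map="auto",           # place weights across available devices
)

# Llama-3-Instruct checkpoints ship a chat template, so apply_chat_template applies.
messages = [{"role": "user", "content": "What is self-play preference optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```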

Files changed (1)
  1. README.md +6 -4
README.md CHANGED

@@ -1,10 +1,11 @@
 ---
+datasets:
+- openbmb/UltraFeedback
 language:
 - en
 license: apache-2.0
-datasets:
-- openbmb/UltraFeedback
 pipeline_tag: text-generation
+library_name: transformers
 model-index:
 - name: SPPO-Llama-3-8B-Instruct-GPM-2B
   results:
@@ -104,6 +105,8 @@ model-index:
 
 General Preference Modeling with Preference Representations for Aligning Language Models (https://arxiv.org/abs/2410.02197)
 
+This code can be found at https://github.com/general-preference/general-preference-model
+
 # SPPO-Llama-3-8B-Instruct-GPM-2B
 
 This model was developed using [SPPO](https://arxiv.org/abs/2405.00675) at iteration 3 and the [General Preference representation Model (GPM)](https://arxiv.org/abs/2410.02197) (specifically, using [GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B)), based on the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
@@ -165,5 +168,4 @@ The following hyperparameters were used during training:
 journal={arXiv preprint arXiv:2410.02197},
 year={2024}
 }
-```
-
+```