Add library name and link to code
This PR adds a link to the GitHub repository to improve discoverability. It also declares the library used (Transformers) via the `library_name` metadata field, so that it appears in the top right corner of the model page.
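The `library_name` field lives in the card's YAML front matter and is machine-readable. A quick sketch of checking it with `huggingface_hub` (the repo id is an assumption inferred from the model name, not stated in this PR):

```python
# Sketch: read the card metadata this PR touches via huggingface_hub.
# The repo id below is an assumption inferred from the model name.
from huggingface_hub import ModelCard

card = ModelCard.load("general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B")
print(card.data.library_name)  # expected: "transformers" once this PR is merged
print(card.data.pipeline_tag)  # "text-generation"
```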
README.md CHANGED

````diff
@@ -1,10 +1,11 @@
 ---
+datasets:
+- openbmb/UltraFeedback
 language:
 - en
 license: apache-2.0
-datasets:
-- openbmb/UltraFeedback
 pipeline_tag: text-generation
+library_name: transformers
 model-index:
 - name: SPPO-Llama-3-8B-Instruct-GPM-2B
   results:
@@ -104,6 +105,8 @@ model-index:
 
 General Preference Modeling with Preference Representations for Aligning Language Models (https://arxiv.org/abs/2410.02197)
 
+This code can be found at https://github.com/general-preference/general-preference-model
+
 # SPPO-Llama-3-8B-Instruct-GPM-2B
 
 This model was developed using [SPPO](https://arxiv.org/abs/2405.00675) at iteration 3 and the [General Preference representation Model (GPM)](https://arxiv.org/abs/2410.02197) (specifically, using [GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B)), based on the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
@@ -165,5 +168,4 @@ The following hyperparameters were used during training:
 journal={arXiv preprint arXiv:2410.02197},
 year={2024}
 }
-```
-
+```
````
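With `library_name: transformers` declared, the Hub will surface the standard Transformers loading snippet for this model. A minimal sketch of that usage (the repo id is assumed from the model name; the prompt and generation settings are illustrative, not taken from the card):

```python
# Minimal sketch: load the model through the Transformers API implied by
# `library_name: transformers`. The repo id is an assumption; generation
# settings are illustrative defaults, not values from this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama-3-Instruct models use a chat template; apply_chat_template builds
# the prompt in the format the model was fine-tuned on.
messages = [{"role": "user", "content": "Briefly explain what SPPO is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```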