Update README.md
README.md
CHANGED
@@ -1,3 +1,6 @@
1 |
*(Pronounced "Gee RP Oh". The name is a sort-of pun because it was aligned with the GRPO algorithm, but is for RP (roleplay). Therefore, gRPo.)*
|
2 |
|
3 |
This is an experimental proof-of-concept model trained with [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline. The reinforcement learning attempted to maximize the amount of emotion in the model's writing.
|
@@ -26,4 +29,7 @@ Typical min P settings seem to work alright, though on some sampling params repe
|
|
26 |
|
27 |
Fundamentally, this is an experimental method applied to a slightly continually trained Mistral 7b v0.2. Because its base is old, it might lack some of the raw intelligence of newer models.
|
28 |
|
29 |
-
Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes are required: just use a prompt that gives high grades to the responses you like.
1 |
+
---
|
2 |
+
license: llama3.1
|
3 |
+
---
|
4 |
*(Pronounced "Gee RP Oh". The name is a sort-of pun because it was aligned with the GRPO algorithm, but is for RP (roleplay). Therefore, gRPo.)*
|
5 |
|
6 |
This is an experimental proof-of-concept model trained with [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline. The reinforcement learning attempted to maximize the amount of emotion in the model's writing.
|
|
|
29 |
|
30 |
Fundamentally, this is an experimental method applied to a slightly continually trained Mistral 7b v0.2. Because its base is old, it might lack some of the raw intelligence of newer models.
|
31 |
|
32 |
+
Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes are required: just use a prompt that gives high grades to the responses you like.
|
33 |
+
|
34 |
+
Q: Why the Llama license?
|
35 |
+
A: The DeepSeek Llama Distill model was used as the quality grader. I am not sure whether this actually means the license applies, since that model's outputs were not used directly to make this one. Still, I am erring on the side of caution.
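
For readers unfamiliar with how grader scores drive GRPO, here is a minimal sketch of the group-relative advantage step described in the GRPO literature. It is not Augmentoolkit's actual code: the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-6):
    """Normalize grader scores within one group of responses to the same prompt.

    Responses the grader scored above the group mean get a positive advantage
    (reinforced); responses below the mean get a negative one (discouraged).
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; an assumption, implementations vary
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical grader scores for four sampled responses to one prompt.
scores = [2.0, 5.0, 5.0, 8.0]
advantages = group_relative_advantages(scores)
print(advantages)
```

Because only the ranking within a group matters, the grading prompt does not need a calibrated scale. It only needs to score the responses you like consistently higher than the ones you do not.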
|