Heralax committed on
Commit 03d0612 · verified · 1 Parent(s): f39ee02

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -1,3 +1,6 @@
+---
+license: llama3.1
+---
 *(Pronounced "Gee RP Oh". The name is a sort-of pun because it was aligned with the GRPO algorithm, but is for RP (roleplay). Therefore, gRPo.)*
 
 This is an experimental proof of concept model trained with [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline. The Reinforcement Learning done attempted to maximize the amount of emotion that the model wrote with.
@@ -26,4 +29,7 @@ Typical min P settings seem to work alright, though on some sampling params repe
 
 Fundamentally this is an experimental method applied to a slightly-continually-trained Mistral 7b v0.2, due to the agedness of its base it might lack some of the raw intelligence of newer models.
 
-Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
+Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
+
+Q: Why the Llama license?
+A: The Deepseek Llama Distil model was used as the quality grader. I am not sure if this actually means the license has to kick in, since the model's outputs were not used to make this one directly. But, caution.