Update README.md
README.md
CHANGED
@@ -24,4 +24,6 @@ Using the hardcoded system prompt prefix is heavily encouraged.
 
 Typical min P settings seem to work alright, though on some sampling params repeitition is observed, be careful and experiment a bit.
 
+Fundamentally this is an experimental method applied to a slightly-continually-trained Mistral 7b v0.2, due to the agedness of its base it might lack some of the raw intelligence of newer models.
+
 Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
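
For readers unfamiliar with the "min P" setting mentioned in the README text, here is a minimal sketch of what min-p truncation does to a next-token distribution. It is plain Python rather than any particular inference library's API; the names `min_p_filter` and `sample`, and the value `min_p=0.1`, are illustrative assumptions and not settings recommended by the README.

```python
import math
import random


def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Return renormalized probabilities after min-p truncation.

    Tokens whose probability is below min_p * (probability of the most
    likely token) are dropped. min_p=0.1 is an illustrative value only.
    """
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at or above the min-p threshold.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the surviving tokens so they sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample(probs, rng=random):
    """Draw one token index according to the filtered probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    filtered = min_p_filter([2.0, 1.0, -3.0], min_p=0.1)
    print(filtered)          # the third token is zeroed out
    print(sample(filtered))  # index 0 or 1
```

Raising `min_p` prunes more of the low-probability tail; lowering it keeps more candidates. If repetition shows up, varying this value together with temperature is one way to experiment, as the README suggests.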