Commit f39ee02 (verified) by Heralax
1 Parent(s): a210d0d

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -24,4 +24,6 @@ Using the hardcoded system prompt prefix is heavily encouraged.
 
 Typical min P settings seem to work alright, though repetition is observed with some sampling parameters. Be careful and experiment a bit.
 
+Fundamentally, this is an experimental method applied to a slightly-continually-trained Mistral 7b v0.2. Due to the age of its base, it might lack some of the raw intelligence of newer models.
+
 Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
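
For readers unfamiliar with the min P setting the README mentions, here is a minimal sketch of how min P filtering is commonly described: tokens whose probability falls below `min_p` times the top token's probability are dropped, and the rest are renormalized. The function name `min_p_filter` and the default of 0.1 are assumptions made for illustration; they are not taken from this model's README or from any particular inference library.

```python
import math


def min_p_filter(logits, min_p=0.1):
    """Return renormalized token probabilities after min P filtering.

    Hypothetical helper for illustration only; not this model's recommended
    settings and not the API of any specific library.
    """
    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


example = min_p_filter([2.0, 1.0, -3.0], min_p=0.1)
```

In this example the third token is far less likely than the first, so it is filtered out and the remaining two are rescaled. Raising `min_p` prunes more of the tail; lowering it toward 0 keeps more candidates, which is one knob to experiment with if repetition shows up.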