Heralax commited on
Commit
f9c5d72
·
verified ·
1 Parent(s): 03d0612

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -0
README.md CHANGED
@@ -32,4 +32,5 @@ Fundamentally this is an experimental method applied to a slightly-continually-t
32
  Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
33
 
34
  Q: Why the Llama license?
 
35
  A: The Deepseek Llama Distil model was used as the quality grader. I am not sure if this actually means the license has to kick in, since the model's outputs were not used to make this one directly. But, caution.
 
32
  Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
33
 
34
  Q: Why the Llama license?
35
+
36
  A: The Deepseek Llama Distil model was used as the quality grader. I am not sure if this actually means the license has to kick in, since the model's outputs were not used to make this one directly. But, caution.