Update README.md
README.md
@@ -32,4 +32,5 @@ Fundamentally this is an experimental method applied to a slightly-continually-t
 Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes required, just use a prompt that grades responses you like highly.
 
 Q: Why the Llama license?
+
 A: The Deepseek Llama Distil model was used as the quality grader. I am not sure if this actually means the license has to kick in, since the model's outputs were not used to make this one directly. But, caution.