Heralax/llama-gRPo-thoughtprocess


Heralax committed on Jun 7

Commit f9c5d72 (verified) · 1 parent: 03d0612

Update README.md

Files changed (1)

README.md +1 -0

@@ -32,4 +32,5 @@ Fundamentally this is an experimental method applied to a slightly-continually-t
 Try using [Augmentoolkit's](https://github.com/e-p-armstrong/augmentoolkit) GRPO pipeline to do RL on your own RP models! No code changes are required; just use a grading prompt that gives high scores to the kinds of responses you like.
 Q: Why the Llama license?
 A: The DeepSeek Llama distill model was used as the quality grader. I am not sure whether this actually means its license applies, since the grader's outputs were not used to train this model directly. But I'm using the Llama license out of caution.
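The tip above relies on a grader model turning each sampled reply into a reward, which GRPO then compares within a group of replies to the same prompt. The sketch below is a minimal, generic illustration of that idea. It assumes the standard GRPO group normalization (reward minus group mean, divided by group standard deviation) and is **not** Augmentoolkit's actual code. The `SCORE:` reply format, `parse_score`, and `group_relative_advantages` are hypothetical names chosen for illustration.

```python
import re
import statistics

# HYPOTHETICAL grader output format: the grading prompt is assumed to make the
# grader end its reply with a line like "SCORE: 7". Not Augmentoolkit's format.
SCORE_PATTERN = re.compile(r"SCORE:\s*(\d+(?:\.\d+)?)")


def parse_score(grader_reply: str) -> float:
    """Return the last numeric score in a grader reply, or 0.0 if none is found."""
    matches = SCORE_PATTERN.findall(grader_reply)
    return float(matches[-1]) if matches else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    All rewards must come from replies sampled for the SAME prompt.
    Uses the population standard deviation (an assumption; some
    implementations use the sample standard deviation instead).
    eps avoids division by zero when every reply gets the same score.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled replies to one RP prompt, each judged by the grader model.
    grader_replies = [
        "Vivid, stays in character. SCORE: 9",
        "Generic and flat. SCORE: 3",
        "Decent pacing. SCORE: 6",
        "Grader forgot to give a score",
    ]
    rewards = [parse_score(reply) for reply in grader_replies]
    advantages = group_relative_advantages(rewards)
    for reward, adv in zip(rewards, advantages):
        print(f"reward={reward:.1f} advantage={adv:+.2f}")
```

Because advantages are relative within the group, replies the grader likes more than their siblings get pushed up and the rest get pushed down. This is why changing only the grading prompt is enough to steer the model, with no code changes.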
