Any documents?

#1
by invokerliang - opened

Are you working on something related to R1 applied to Lean proofs? Is there any documents to this model?

No documents yet as I just fine-tuned the Qwen 1.5B distill of R1 on the DeepSeek Prover dataset with unsloth.
This was basically a test to see if I could fine-tune a reasoning model without it forgetting how to reason. It didn't extend its reasoning to Lean4 (the dataset isn't CoT) and was instable/collapsed often.
I would like to work on R1 on Lean proofs (I would just need to setup a Lean enviroment then do GRPO based on the compiler's result) but I do not have enough compute to execute this at a scale that's worth trying.

Sign up or log in to comment