albincsergo/qwen-lean-r1-new

No documents yet as I just fine-tuned the Qwen 1.5B distill of R1 on the DeepSeek Prover dataset with unsloth.
This was basically a test to see if I could fine-tune a reasoning model without it forgetting how to reason. It didn't extend its reasoning to Lean4 (the dataset isn't CoT) and was instable/collapsed often.
I would like to work on R1 on Lean proofs (I would just need to setup a Lean enviroment then do GRPO based on the compiler's result) but I do not have enough compute to execute this at a scale that's worth trying.

albincsergo
/

qwen-lean-r1-new

Any documents?