ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated Jul 6 • 10
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Jul 6 • 24
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6 Reinforcement Learning • 1B • Updated Jul 6 • 13
tensorblock/Nellyw888_VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF Reinforcement Learning • 7B • Updated 23 days ago • 138
mradermacher/Qwen3-14B-ARPO-DeepSearch-GGUF Reinforcement Learning • 15B • Updated 9 days ago • 3.04k • 1
mradermacher/Qwen3-14B-ARPO-DeepSearch-i1-GGUF Reinforcement Learning • 15B • Updated 9 days ago • 2.91k • 1
mradermacher/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct-GGUF Reinforcement Learning • 0.6B • Updated 21 days ago • 185
mradermacher/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct-GGUF Reinforcement Learning • 2B • Updated 21 days ago • 432