konstantin-ketterer/Qwen2-3B-GRPO-max-advantage-4x-oversampling-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated 1 day ago
konstantin-ketterer/Qwen2-3B-GRPO-baseline-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated about 13 hours ago