tyzhu/sokoban-qwen2.5-7b-it-reward-0.0-text_with_coordinates-env-add_worldmodel_1 Updated 27 days ago
tyzhu/sokoban-qwen2.5-7b-it-reward-1.0-text_with_coordinates-env-add_worldmodel_1 Updated 30 days ago
tyzhu/squad2mem8b_recite_from_wikipedia-r1-grpo-llama3.1-8b-em-warmup-0.05-rouge-rougeL-t1 Updated Aug 5