Scaling test-time compute: Enhance math problem solving by scaling test-time compute • Running • 518
Article • DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • By NormalUhr • 16 days ago • 42
DocLLM: A layout-aware generative language model for multimodal document understanding • Paper • 2401.00908 • Published Dec 31, 2023 • 181 • 25
paulofinardi/OIG_small_chip2_portuguese_brasil • Viewer • Updated Mar 19, 2023 • 210k • 62 • 14