Raja Biswas

rbiswasfc

AI & ML interests

NLP, Generative AI

Recent Activity

upvoted an article about 19 hours ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

upvoted an article about 19 hours ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

liked a model about 19 hours ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

rbiswasfc's activity

upvoted 2 articles about 19 hours ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • 16 days ago • 42

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 169

upvoted 2 collections 5 days ago

SimpleRL

The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 5 days ago • 4

CodeI/O

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 11 days ago • 6

upvoted a paper 7 days ago

Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 84

upvoted an article 8 days ago

How NuminaMath Won the 1st AIMO Progress Prize

Article • Jul 11, 2024 • 116

upvoted a collection 8 days ago

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 13 days ago • 75

upvoted an article 10 days ago

1 Billion Classifications

Article • 11 days ago • 38

upvoted 4 papers 12 days ago

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published 18 days ago • 42

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published 16 days ago • 114

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 13 days ago • 59

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 13 days ago • 134

upvoted 2 collections 12 days ago

OpenR1-Math

Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 9 days ago • 6

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 12 items • Updated 4 days ago • 79

upvoted a paper 12 days ago

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 15 days ago • 31

upvoted an article 13 days ago

Open R1: Update #2

Article • 7 authors • 13 days ago • 184

upvoted a paper 14 days ago

On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published 19 days ago • 17

upvoted an article 14 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • 27 days ago • 770

upvoted 2 papers 14 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 18 days ago • 51

LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published 18 days ago • 56