Wenkai Yang

Keven16

https://keven980716.github.io/

keven980716

AI & ML interests

None yet

Recent Activity

upvoted an article 8 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

commented on a paper 11 days ago

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

published a model about 1 month ago

Keven16/Qwen2.5-32B-TOPS-Iter-DPO-Preview

Organizations

None yet

upvoted an article 8 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11 • 64

commented on a paper 11 days ago

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 12 days ago • 14

published 2 models about 1 month ago

Keven16/Qwen2.5-32B-TOPS-Iter-DPO-Preview

33B • Updated May 15 • 5

Keven16/Qwen2.5-32B-TOPS-Iter-DPO

33B • Updated May 15 • 2

upvoted a paper about 1 month ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 148

published 3 models about 1 month ago

published a model 2 months ago

Keven16/DeepCritic-7B-RL1.5-PRM800K

8B • Updated Jun 25 • 14

updated a model 2 months ago

Keven16/DeepCritic-7B-RL1.5-PRM800K

8B • Updated Jun 25 • 14

published a model 2 months ago

Keven16/DeepCritic-7B-RL1.5-Numina

8B • Updated Jun 23 • 8

updated a model 3 months ago

Keven16/DeepCritic-7B-RL1.5-Numina

8B • Updated Jun 23 • 8

upvoted 2 papers 3 months ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 263

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260

upvoted 2 papers 4 months ago

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Paper • 2505.16933 • Published May 22 • 34

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22 • 57

published 2 datasets 4 months ago

Keven16/DeepCritic-RL-Data

Viewer • Updated May 13 • 55k • 6

Keven16/DeepCritic-4.5K

Preview • Updated May 13 • 12

published 2 models 4 months ago

Keven16/DeepCritic-7B-RL-Numina

8B • Updated May 12 • 5

Keven16/DeepCritic-7B-RL-PRM800K

8B • Updated May 12 • 2
