Wenkai Yang

Keven16

https://keven980716.github.io/

keven980716

AI & ML interests

None yet

Recent Activity

upvoted an article 8 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

commented on a paper 11 days ago

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

published a model about 1 month ago

Keven16/Qwen2.5-32B-TOPS-Iter-DPO-Preview

Organizations

None yet

commented on a paper 11 days ago

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 12 days ago • 14

commented on 2 papers 4 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 55

commented on a paper about 1 year ago

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Paper • 2406.11431 • Published Jun 17, 2024 • 4