weiliu

thinkwee

https://thinkwee.top/about/

AI & ML interests

LLM reasoning, agents

Recent Activity

upvoted a paper 11 days ago

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?

upvoted a paper 17 days ago

Agentic Reinforced Policy Optimization

updated a collection 18 days ago

Organizations

None yet

New activity in thinkwee/NOVEReason_5k about 1 month ago

[bot] Conversion to Parquet

#1 opened about 1 month ago by parquet-converter

commented a paper 3 months ago

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Paper • 2505.16022 • Published May 21 • 3

commented 2 papers 4 months ago

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Paper • 2505.16022 • Published May 21 • 3
