Bingxiang He

hbx

https://hbx-hbx.github.io/

AI & ML interests

NLP

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published 2 days ago • 52

upvoted a paper 19 days ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published 23 days ago • 91

upvoted a paper 25 days ago

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29 • 29

liked a model about 1 month ago

openbmb/MiniCPM-V-4

Image-Text-to-Text • 4B • Updated 25 days ago • 21k • 460

upvoted a paper 3 months ago

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9 • 90

upvoted a collection 3 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 28 items • Updated about 2 hours ago • 74

upvoted a paper 3 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130

upvoted 2 papers 5 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 46

authored a paper 7 months ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 62

upvoted a paper 7 months ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 62

liked a model 8 months ago

PRIME-RL/Eurus-2-7B-PRIME

Text Generation • 8B • Updated Feb 19 • 1.3k • 62

upvoted an article 8 months ago

Process Reinforcement through Implicit Rewards

Article • and 1 other • Jan 3 • 29

upvoted a paper 9 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 35

upvoted a collection over 1 year ago

Eurus

Collection

Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated 30 days ago • 25

updated a dataset over 1 year ago

hbx/IN3

Viewer • Updated Feb 20, 2024 • 1.37k • 100 • 7

updated a model over 1 year ago

hbx/Mistral-Interact

Text Generation • Updated Feb 20, 2024 • 14 • 3

updated a dataset over 1 year ago

hbx/IN3-interaction

Viewer • Updated Feb 20, 2024 • 2.53k • 59 • 3

liked a model over 1 year ago

hbx/Mistral-Interact

Text Generation • Updated Feb 20, 2024 • 14 • 3
