Tencent

Team

Verified

Recent Activity

aleclyu new activity about 5 hours ago

tencent/HunyuanOCR:Update README

aleclyu new activity 1 day ago

tencent/HunyuanOCR:Update README

Papers

tencent's Papers (28)

Submitted by

Kai Yang

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

tencent

Submitted by

liu

MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

tencent

Submitted by

Chenchen Zhang

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

tencent

Submitted by

Zihao Yi

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

tencent

Submitted by

Ke Li

LTD-Bench: Evaluating Large Language Models by Letting Them Draw

tencent

Submitted by

Chenze Shao

Continuous Autoregressive Language Models

tencent

Submitted by

Tian Lan

The End of Manual Decoding: Towards Truly End-to-End Language Models

tencent

Submitted by

Dian Yu

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

tencent

Submitted by

Liyang He

ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

tencent

Submitted by

Chenchen Zhang

ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding

tencent

Submitted by

Wenhao Yu

Don't Throw Away Your Pretrained Model

tencent

Submitted by

taesiri

Training-Free Group Relative Policy Optimization

tencent

Submitted by

Hao Wu

GCPO: When Contrast Fails, Go Gold

tencent

Submitted by

Guanhua Huang

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

tencent

Submitted by

Zhenwen Liang

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

tencent

Submitted by

Rui Liu

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

tencent

Submitted by

Zhaopeng Tu

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

tencent

Submitted by

xuxin

Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

tencent

Submitted by

Zhongwen Xu

Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

tencent

Submitted by

taesiri

HunyuanImage 3.0 Technical Report

tencent

Submitted by

taesiri

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

tencent

Submitted by

Zhongwen Xu

Single-stream Policy Optimization

tencent

Submitted by

Xinyu Yang

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

tencent

Submitted by

Wenhao Yu

Self-Rewarding Vision-Language Model via Reasoning Decomposition

tencent

Submitted by

Zhongwen Xu

Understanding Tool-Integrated Reasoning

tencent

Submitted by

Chengsong Huang

R-Zero: Self-Evolving Reasoning LLM from Zero Data

tencent

Submitted by

Yulei Qin

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

tencent

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

tencent