tencent/Hunyuan-7B-Instruct-FP8
Text Generation • 8B • Updated • 66 • 5
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation