Aligning the Reward Model with Competitive Exam Marking Schemes (JEE Main for Aryabhata-1.0)

Discussion #7, opened by Haryaksh

To the PhysicsWallah AI Team,

Congratulations on the successful launch and open-sourcing of the Aryabhata-1.0 model. It's a significant contribution to the Indian AI ecosystem.

I have a suggestion about the model's training methodology, specifically for improving its performance on competitive exams such as JEE Main.

My suggestion is to experiment with a reinforcement-learning reward model that directly mirrors the marking scheme of the target exam. For example, using the JEE Main pattern for multiple-choice questions:
• +4 reward for selecting the single correct answer in a multiple-choice question.
• -1 reward (penalty) for selecting an incorrect answer.
• 0 reward for leaving the question unattempted, or for responses that cannot be graded as strictly right or wrong (e.g., subjective explanations).
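To make the proposal concrete, here is a minimal Python sketch of such a reward for single-correct MCQs. It assumes a hypothetical output convention in which the model ends its solution with something like "Answer: B" or "Answer: (B)"; the regex, the constants, and the function names `extract_option` and `jee_main_reward` are illustrative only and are not part of Aryabhata's actual training code. Any completion without a parsable option is treated as unattempted and receives 0.

```python
import re
from typing import Optional

# Reward values taken from the JEE Main MCQ marking scheme proposed above.
CORRECT_REWARD = 4.0
INCORRECT_REWARD = -1.0
NO_ATTEMPT_REWARD = 0.0

# Hypothetical output convention: the model writes "Answer: B" or "Answer: (B)".
# Only the keyword is case-insensitive; the option letter must be an uppercase A-D
# and must not be the first letter of a longer word (e.g. "Because").
ANSWER_RE = re.compile(r"(?i:answer)\s*[:\-]?\s*\(?([A-D])\)?(?![A-Za-z])")


def extract_option(completion: str) -> Optional[str]:
    """Return the last option letter the model committed to, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def jee_main_reward(completion: str, correct_option: str) -> float:
    """Score one single-correct MCQ completion: +4 correct, -1 wrong, 0 unattempted."""
    chosen = extract_option(completion)
    if chosen is None:
        # No parsable option (skipped or purely explanatory output): no reward, no penalty.
        return NO_ATTEMPT_REWARD
    return CORRECT_REWARD if chosen == correct_option.upper() else INCORRECT_REWARD


if __name__ == "__main__":
    print(jee_main_reward("... so the answer: (B)", "B"))          # 4.0
    print(jee_main_reward("Answer: C", "B"))                       # -1.0
    print(jee_main_reward("I am not sure about this one.", "B"))   # 0.0
```

In an RL setup, a function like this would be called once per sampled completion and its return value used as the scalar reward; how it plugs into a particular trainer is left open here.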
The core rationale: if the model's ultimate goal is to excel in an exam, its training and reward process should be aligned with how the exam itself measures success. This approach could train the model to be more decisive and accurate in high-stakes, objective-question settings.
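The marking scheme also implies a simple decision rule. If the model's probability of being correct on a question is p, attempting earns an expected 4p - (1 - p) = 5p - 1, while skipping earns 0, so attempting only pays off when p > 0.2. The plain-Python sketch below (all names are illustrative) derives this break-even point from the reward values rather than hard-coding it:

```python
# Illustrative reward values from the proposed JEE Main MCQ scheme.
CORRECT_REWARD = 4.0
INCORRECT_REWARD = -1.0
SKIP_REWARD = 0.0


def expected_attempt_reward(p: float) -> float:
    """Expected reward for answering when the chance of being correct is p."""
    return p * CORRECT_REWARD + (1.0 - p) * INCORRECT_REWARD


def break_even_probability() -> float:
    """Probability at which answering and skipping have equal expected reward.

    Solves p * R_correct + (1 - p) * R_incorrect = R_skip for p.
    """
    return (SKIP_REWARD - INCORRECT_REWARD) / (CORRECT_REWARD - INCORRECT_REWARD)


def should_attempt(p: float) -> bool:
    """Attempt only if answering beats skipping in expectation."""
    return expected_attempt_reward(p) > SKIP_REWARD


if __name__ == "__main__":
    print(break_even_probability())  # 0.2
    for p in (0.1, 0.5, 0.9):
        print(f"p={p}: expected reward {expected_attempt_reward(p):+.2f}, attempt={should_attempt(p)}")
```

This is one way to read "more decisive": the scheme pushes the model to commit when it is reasonably confident and to abstain otherwise, which presumably only helps if the model's confidence is reasonably well calibrated.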

Furthermore, if your team is open to community contributions, I would be glad to help implement and test this reward mechanism myself, and I am eager to contribute to the project hands-on.

Thank you for your consideration and for your pioneering work. I look forward to seeing Aryabhata's future development.

Best Regards,
Haryaksh

PhysicsWallah org

Hi @Haryaksh,

Thanks for showing your interest. We will connect with you soon. Feel free to explore the model and share feedback in the meantime.

Hi @pw-ai-research,

Great work! I was wondering if you could open-source the dataset as well.

Hi @pw-ai-research,

PhysicsWallahAI/Aryabhata1.0: the reported results could not be reproduced; the model scored 10% on JEE Main 2025.

Complete Report: https://medium.com/fundamentals-of-artificial-intellegence/physicswallahai-aryabhata1-0-could-not-reproduce-the-results-scored-10-on-jee-mains-176c38c5d384
Claims vs Reality: https://lnkd.in/gZbvb4FZ
Recommendations Guide: https://lnkd.in/gA3VHCaD
Reproducible Code/Notes: https://lnkd.in/gH7PnVCG
