How to reproduce the results on hle(Humanity's Last Exam)?
#95
by
wenhanli
- opened
Hello,
thanks for your great work!
When I try to reproduce the results on hle, I get a much lower acc as 8%, which is 14.9% from your blog.
- My server is lanuched with vllm
vllm serve gpt-oss-120b
- I use the code from
https://github.com/centerforaisafety/hle
in which the temperature and top_p are not set. May this be the key problem?
And are there any other points I need to pay attention to?
Best regards!