How to reproduce the results on hle(Humanity's Last Exam)?

#95
by wenhanli - opened

Hello,

thanks for your great work!

When I try to reproduce the results on hle, I get a much lower acc as 8%, which is 14.9% from your blog.

  1. My server is lanuched with vllm
vllm serve gpt-oss-120b
  1. I use the code from
https://github.com/centerforaisafety/hle

in which the temperature and top_p are not set. May this be the key problem?

And are there any other points I need to pay attention to?

Best regards!

Sign up or log in to comment