How to reproduce the results on hle(Humanity's Last Exam)?

#95

by wenhanli - opened 12 days ago

12 days ago

Hello,

thanks for your great work!

When I try to reproduce the results on hle, I get a much lower acc as 8%, which is 14.9% from your blog.

vllm serve gpt-oss-120b

https://github.com/centerforaisafety/hle

in which the temperature and top_p are not set. May this be the key problem?

And are there any other points I need to pay attention to?

Best regards!

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment