BigCodeArena: Judging code generations end to end with code executions
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions