BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks
Compare two AI models by sending them coding tasks
Evaluate code samples using specified parameters
Explore and analyze code completion benchmarks
Display a PDF file in a web app
Submit code models for evaluation and view leaderboard
Check if your GitHub repo is in The Stack dataset
Search for code snippets in a dataset
Generate code and answers from questions
Start a web server for an app
Generate code snippets in Python, Java, and JavaScript