Good for coding?

#1
by urtuuuu - opened

Can you give some coding scenarios that show how good it is? With this famous and pretty simple question, it looks like it fails, just like R1 distilled.

Explain the bug in the following code:

from time import sleep
from multiprocessing.pool import ThreadPool
 
def task():
    sleep(1)
    return 'all done'

if __name__ == '__main__':
    with ThreadPool() as pool:
        result = pool.apply_async(task())
        value = result.get()
        print(value)

Answer: Basically, it should be result = pool.apply_async(task), not pool.apply_async(task()). The function itself has to be passed, not the result of calling it.
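For anyone who wants to check a model's answer against what actually happens, here is a minimal sketch. The helper names run_buggy and run_fixed are mine, not from the original snippet. As far as I can tell, the buggy call runs task() in the main thread first, then hands its return value (the string 'all done') to the pool as if it were a function, so the error only surfaces as a TypeError when result.get() is called.

```python
from time import sleep
from multiprocessing.pool import ThreadPool


def task():
    sleep(1)
    return 'all done'


def run_buggy():
    # Hypothetical helper: reproduces the bug from the question.
    with ThreadPool() as pool:
        # task() is executed right here, in the main thread; the pool
        # receives the string 'all done' instead of a callable.
        result = pool.apply_async(task())
        try:
            return result.get()
        except TypeError as exc:
            # The worker tried to call a str; get() re-raises that error.
            return f'TypeError: {exc}'


def run_fixed():
    # Hypothetical helper: passes the function object, so the pool calls it.
    with ThreadPool() as pool:
        result = pool.apply_async(task)
        return result.get()


if __name__ == '__main__':
    print(run_buggy())
    print(run_fixed())
```

A good answer from a model should mention both points: the call happens too early, and the failure shows up at get() rather than at apply_async().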

With everything I've tried so far, non-thinking models do better. Or am I using the wrong temperature or something? It's 0.6.

Sorry to break it to everyone, especially to those model creators who claim their ~7B model is good for coding, yet apparently never tested it for coding...

There's no such thing as a ~7B model that's good for coding!!! Benchmarks are 💩. Real-world testing always shows that claims like "7B model good for coding" are blatant lies!

Want models that are somewhat decent for coding tasks? 32B is the absolute entry point, meaning they MAY or MAY NOT figure out the problem with the code, but at the very least they will not screw it up even further (well, not as much as smaller models, that is)...
