Why use a small model like the 1.5B instead of a larger one? Is there a reason?

#15
by likewendy - opened

Why use a small model like the 1.5B instead of a larger one? Is there a reason?

That's surely about training cost, @likewendy. It's always better to experiment on a smaller model and, if the results are promising, go bigger. I read somewhere that really small LMs can struggle to pick up RL, so I think they targeted a model just above that limit.

I see! I thought of many reasons, but the only one I hadn’t considered was money.

... I hadn’t considered was money.

lol
