Why use a small model like the 1.5B instead of a larger one? Is there a reason?
#15, opened by likewendy
Why use a small model like the 1.5B instead of a larger one? Is there a reason?
That's surely about training cost, @likewendy. It's always better to experiment on a smaller model first and, if the results look promising, go bigger. I read somewhere that really small LMs can struggle to pick up RL, so I think they targeted a model just above that limit.
I see! I thought of many reasons, but the only one I hadn’t considered was money.
... I hadn’t considered was money.
lol