Request: FP8 / BF16 version of model?

#53
by Epliz - opened

Hi,

First reports from users on Reddit seem to indicate that the model hallucinates quite a bit at times, which might be partly due to the MXFP4 quantization of the MLP layers.
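For context on what that quantization means: per the OCP Microscaling (MX) spec, MXFP4 stores each block of 32 weights as 4-bit FP4 (E2M1) codes plus one shared power-of-two (E8M0) scale, so each weight can take only 15 distinct values within a block. A minimal illustrative sketch of the dequantization (real kernels do this fused on the GPU, not in Python):

```python
# Magnitudes representable by FP4 E2M1 (low 3 bits of the code):
# 0, 0.5, 1, 1.5, 2, 3, 4, 6 -- only 8 magnitudes per sign.
_E2M1_MAG = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code (bit 3 is the sign bit)."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * _E2M1_MAG[code & 0b0111]

def decode_mxfp4_block(codes: list[int], scale_e8m0: int) -> list[float]:
    """Dequantize one MXFP4 block: all elements share a single
    power-of-two scale, 2**(scale_e8m0 - 127) per the E8M0 format."""
    scale = 2.0 ** (scale_e8m0 - 127)
    return [decode_fp4(c) * scale for c in codes]
```

The coarseness is visible here: within a block, every weight is one of a handful of values times a shared scale, which is why people suspect it could cost some quality versus BF16.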

If the models were trained in FP8 or BF16 precision, would it be possible for you to upload those versions as well?
Those models might no longer fit on a single GPU (except on AMD MI355X), but I am sure people will still be able to use them on multi-GPU machines.

Best regards,
Epliz

Apparently the model was trained directly in 4-bit, so I doubt we'll see these.
