Quantize more models?

#3
by MiaoCata - opened

Great work! There's a fine-tuned model called DeepScaleR that performs better — could you quantize it with NexaQuant, starting from the original Q8_0?

Since Q8_0 and FP16 give nearly identical quality, quantizing from Q8_0 should also be faster.
