Can this release be dynamically quantized just like your R1 671b 2.51-bit?
#1 by pty819 - opened
Hello, I was deeply impressed by the almost perfect performance of your 2.51-bit model after trying it out. However, I am currently limited by hardware and would like to use the dynamically quantized 70b version to provide model services for my company. Could you tell me how to convert these weights into GGUF format and serve them with llama-server? As far as I can tell, this model can currently only be loaded through the unsloth library, which offers no way to expose an OpenAI-compatible API.
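For context, here is the workflow I have in mind, sketched as shell commands. This assumes llama.cpp's standard conversion tools apply to this checkpoint; I don't know whether the dynamic quantization survives this path, and the directory and file names are placeholders:

```shell
# Sketch only: assumes the weights load as a standard Hugging Face
# checkpoint and that llama.cpp's converter supports this architecture.
# "./model-dir" and the output file names are hypothetical.

# 1. Convert the Hugging Face checkpoint to GGUF at f16 precision.
python llama.cpp/convert_hf_to_gguf.py ./model-dir \
    --outfile model-f16.gguf --outtype f16

# 2. Quantize with llama.cpp's standard quantization types (this is
#    not the unsloth dynamic scheme; Q4_K_M is just an example target).
./llama.cpp/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# 3. Serve the quantized model over HTTP.
./llama.cpp/llama-server -m model-Q4_K_M.gguf --port 8080
```

If this works, llama-server's OpenAI-compatible endpoints (e.g. /v1/chat/completions) would give us exactly the external API interface we need, but I'm unsure whether step 1 or 2 preserves what makes the dynamic quantization special.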