gguf when? c'mon, it's been 11 min already!

#2
by Hanswalter - opened


lol well darn, i had plans today... oof... as a quantizer, i wonder if i should wait for the -Instruct? is that out yet? lol...

Better call @bartowski

@MarinaraSpaghetti

I'll put up the Bat-towski signal!

@ubergarm I was hoping to see you in one of these threads :D

+1 gguf please

wait for the instruct model, not sure how a gguf of the base model could be useful for personal usage

Base models are good for creative writing.


> lol well darn, i had plans today... oof... as a quantizer, i wonder if i should wait for the -Instruct? is that out yet? lol...

How dare you have plans when ds puts out a new model!!! 😂

"Why is the GGUF so late it's been 20 seconds already!"

i think let's wait for the instruct version. I am very patient. very very very patient.

I think llama.cpp needs to be updated first.

I figured out how to create the bf16 safetensors; now I'm creating the bf16 gguf. We'll see.
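For anyone following along, a rough sketch of that conversion path (a guess at the workflow, not a confirmed recipe; all paths are placeholders). DeepSeek publishes FP8 weights, so the usual route is upcasting to bf16 safetensors first, then converting to GGUF:

```bash
# Upcast the released FP8 weights to bf16 safetensors using the helper
# script from the deepseek-ai/DeepSeek-V3 repo (paths are hypothetical).
python inference/fp8_cast_bf16.py \
    --input-fp8-hf-path ./DeepSeek-V3.1-Base \
    --output-bf16-hf-path ./DeepSeek-V3.1-Base-BF16

# Convert the bf16 safetensors to a bf16 GGUF with llama.cpp's converter.
python convert_hf_to_gguf.py ./DeepSeek-V3.1-Base-BF16 \
    --outtype bf16 \
    --outfile ./DeepSeek-V3.1-Base-BF16.gguf
```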

Yeah, seems like it needs some changes to llama.cpp. I got it inferring but the chat template seems messed up.

I'm throwing a Q4_K_M up soon while I work on imatrix and further quants
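For reference, the standard llama.cpp flow for that (filenames are placeholders): compute an importance matrix over a calibration text, then quantize with it. A plain Q4_K_M can be made right away by dropping the --imatrix flag, which is presumably how it goes up before the imatrix run finishes:

```bash
# Compute an importance matrix over a calibration text file.
./llama-imatrix -m DeepSeek-V3.1-Base-BF16.gguf \
    -f calibration.txt -o imatrix.dat

# Quantize the bf16 GGUF down to Q4_K_M, guided by the imatrix.
./llama-quantize --imatrix imatrix.dat \
    DeepSeek-V3.1-Base-BF16.gguf DeepSeek-V3.1-Base-Q4_K_M.gguf Q4_K_M
```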

@createthis it's also a base model, so chatting is not going to be as reliable without giving it a multi-turn prompt

@bartowski Thanks for the llama-cli example. TIL.
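For context, base-model prompting with llama-cli looks roughly like this (a sketch, not necessarily the exact example referenced above; the filename is hypothetical). Instead of relying on a chat template, you hand the model a raw multi-turn transcript and let it continue:

```bash
# No chat template: give the base model a partial transcript to complete.
# $'...' quoting so \n is interpreted as a real newline in bash.
./llama-cli -m DeepSeek-V3.1-Base-Q4_K_M.gguf \
    -p $'User: What is a GGUF file?\nAssistant: A binary model format used by llama.cpp.\nUser: Why quantize one?\nAssistant:' \
    -n 256
```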


when will it be in ik_llama :p

Team mradermacher is now generating quants. You can follow the progress on the status page at https://hf.tst.eu/status.html. The first static quants should appear at https://huggingface.co/mradermacher/DeepSeek-V3.1-Base-GGUF within the next few hours.

> when will it be in ik_llama :p

Yeah, I need ik_llama to fit decent quality Deepseek on my hardware too.

I'm making an IQ2_KS for myself (using @ubergarm's cookbook and his calibration dataset for the imatrix). I'll upload it if nobody else has made anything better by the time it's done.

I've also uploaded the upcast bf16 GGUFs: gghfez/DeepSeek-V3.1-Base-256x21B-BF16, in case it helps anyone else making ik_llama quants.
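Roughly what the ik_llama.cpp quantize step looks like (a sketch based on the cookbook approach; filenames are hypothetical, and the actual recipes mix per-tensor quant types rather than using a flat IQ2_KS everywhere):

```bash
# ik_llama.cpp's llama-quantize; IQ2_KS is an ik_llama-specific quant type.
# Reuses the imatrix computed from the calibration dataset mentioned above.
./llama-quantize --imatrix imatrix.dat \
    DeepSeek-V3.1-Base-BF16.gguf DeepSeek-V3.1-Base-IQ2_KS.gguf IQ2_KS
```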

@gghfez

Thanks for providing something for -Base for people to try out. Keep in mind that the imatrix was made for an earlier version, so it might not be as accurate when used against a different version of the weights.

I'm gonna start working on the Instruct now that it is ready: https://huggingface.co/deepseek-ai/DeepSeek-V3.1
