GLM 4 32B too?
Hey! I'll take a look at it, but don't expect it anytime soon; too many things in my backlog.
I will give it a try, but it will be a couple of days: I'm just creating a 12-headed (~0.4B) distilled version of Qwen2.5-0.5B-Instruct that I can use for future draft models (i.e. instead of having to trim to 12 heads and then retrain every time for a new model).
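For anyone curious, the "trim to 12 heads" step looks roughly like this. This is a sketch, not the actual script: it assumes Qwen2.5-0.5B's 14 query heads of dim 64, arbitrarily drops the last two heads per layer, and relies on the reloaded config honouring an explicit `head_dim` (since 896 / 12 is not an integer):

```python
# Rough sketch of query-head trimming on Qwen2.5-0.5B-Instruct
# (14 heads of dim 64 -> 12 heads). Which heads to drop (here: the
# last two per layer) is an arbitrary choice for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

head_dim, keep_heads = 64, 12
keep = keep_heads * head_dim  # 768 of the original 896 q/o dims

for layer in model.model.layers:
    attn = layer.self_attn
    # q_proj's output rows are laid out head-by-head: keep the first 12 heads
    attn.q_proj.weight.data = attn.q_proj.weight.data[:keep, :]
    if attn.q_proj.bias is not None:
        attn.q_proj.bias.data = attn.q_proj.bias.data[:keep]
    # o_proj consumes the concatenated head outputs: trim its input columns
    attn.o_proj.weight.data = attn.o_proj.weight.data[:, :keep]
    # the 2 GQA key/value heads are left alone (12 is still divisible by 2)

model.config.num_attention_heads = keep_heads
model.config.head_dim = head_dim  # needed because 896 / 12 isn't an integer
# the in-memory model is inconsistent after this surgery; the point is
# the saved checkpoint plus patched config, which is then retrained
model.save_pretrained("qwen2.5-0.4b-12head")
AutoTokenizer.from_pretrained(name).save_pretrained("qwen2.5-0.4b-12head")
```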
Once I have this, I should be able to create draft models using way less data (hopefully)... I can generate around 0.5B tokens per day using 7 GPUs, and hope to have at least 2B tokens (so around 4 days' worth) for creating the distilled version.
If it works, then I'll try GLM-4-32B-0414 first, as it seems like a good test case with no tiny models available to use as a draft.
@alamios @jukofyork Thanks, looking forward to it!
There still seem to be problems with this model in llama.cpp:
https://github.com/ggml-org/llama.cpp/issues/12946
So I'll probably hold off until it's working 100%.
Any update on this? 👀
Sorry, forgot all about this: I did try with GLM-4-32B-0414, but it didn't work due to some weird issue with the lack of a <BOS> symbol :/
What about the general pruned 12-head version?
@jukofyork I tried your transplant-vocab script with Llama 3.2 1B as a draft for GLM-4-32B and got this error:
So just wanted to clarify, is this the problem you're running into?
Also, I was wondering: is this an issue that could be resolved with enough training?
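(For context, the rough idea behind a vocab transplant is to keep the donor's transformer body but swap in the target's tokenizer, initialising each new embedding row from the donor tokens that the target token decodes to. A toy sketch of just that idea, not the actual transplant-vocab script, using the models mentioned above:)

```python
# Toy illustration of the vocab-transplant idea (NOT the real
# transplant-vocab script): give a donor model the target model's
# tokenizer, seeding each target token's embedding with the mean of
# the donor-token embeddings it decodes to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

donor = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
donor_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
target_tok = AutoTokenizer.from_pretrained("THUDM/GLM-4-32B-0414")

old_emb = donor.get_input_embeddings().weight.data
new_emb = torch.zeros(len(target_tok), old_emb.shape[1], dtype=old_emb.dtype)

# slow Python loop over ~150k tokens; fine for a toy illustration
for tid in range(len(target_tok)):
    text = target_tok.decode([tid])
    ids = donor_tok(text, add_special_tokens=False)["input_ids"]
    if ids:  # tokens that decode to "" (e.g. some specials) stay zero
        new_emb[tid] = old_emb[ids].mean(dim=0)

donor.resize_token_embeddings(len(target_tok))
donor.get_input_embeddings().weight.data = new_emb
# Llama 3.2 1B ties its LM head to the input embeddings, so this covers
# both; an untied head would need the same treatment before training.
```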
I think there is something wrong with the GLM-4-32B model and transformers, as I can't even load it to train control vectors either. It has some really odd BOS-type token with "mask" in the name that I ran into when I tried to fix this last time.
See: https://huggingface.co/THUDM/GLM-4-32B-0414/blob/main/tokenizer_config.json
"chat_template": "[gMASK]<sop>{%- if tools -%}<|system|>\n# 可用工具\n{% for tool in tools %}{%- set function = tool.function if tool.get(\"function\") else tool %}\n\n## {{ function.name }}\n\n{{ function | tojson(indent=4, ensure_ascii=False) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{%- endfor %}{%- endif -%}{%- for msg in messages %}{%- if msg.role == 'system' %}<|system|>\n{{ msg.content }}{%- endif %}{%- endfor %}{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ content }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ content }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ content }}{%- elif role == 'observation' %}<|observation|>\n{{ content }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>\n{% endif %}",
Hmm yeah, that's strange. I'll look more closely if I have the time. Right now I'm just messing around with a draft model for the new Reka Flash 3.1.
It might be possible to put that odd token back as the BOS token and get it to work, but I'm too busy with other stuff ATM to fiddle with it any more.
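(If anyone wants to experiment, forcing [gMASK] to act as the BOS token might look like this. Untested, and whether downstream conversion tools respect the patched files is an open question:)

```python
# Untested idea: register [gMASK] as the BOS token and save a patched
# copy, so tools that insist on a <BOS> have something to use.
from transformers import AutoConfig, AutoTokenizer

path = "THUDM/GLM-4-32B-0414"
tok = AutoTokenizer.from_pretrained(path)
tok.bos_token = "[gMASK]"  # the token already exists in the vocab
cfg = AutoConfig.from_pretrained(path)
cfg.bos_token_id = tok.convert_tokens_to_ids("[gMASK]")
tok.save_pretrained("glm4-32b-0414-bos-patched")
cfg.save_pretrained("glm4-32b-0414-bos-patched")
```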
https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0
https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0-GGUF
This should hopefully work on the older zai-org/GLM-4-32B-0414 (and zai-org/GLM-4.5-Air) as it uses the same tokeniser.
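(A quick sanity check that the tokenisers really do match, assuming both repos ship HF tokenizer files:)

```python
# Diff the two tokenizers before pairing the draft with the 0414 model.
from transformers import AutoTokenizer

a = AutoTokenizer.from_pretrained("zai-org/GLM-4-32B-0414")
b = AutoTokenizer.from_pretrained("jukofyork/GLM-4.5-DRAFT-0.6B-v3.0")

assert a.get_vocab() == b.get_vocab(), "vocabs differ"
sample = "Speculative decoding sanity check: 你好!"
assert a(sample)["input_ids"] == b(sample)["input_ids"], "tokenisation differs"
print("tokenisers match")
```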