GLM 4 32B too?

#2 opened by qingy2024

Hi @alamios!

This model works really well. Are you planning to create an equivalent 0.5B draft model for the new GLM-4-32B-0414?

Owner

Hey! I'll take a look at it, but don't expect it anytime soon; too many things in my backlog.

I will give it a try, but it will be a couple of days:

I'm just creating a 12-headed (~0.4B) distilled version of Qwen2.5-0.5B-Instruct that I can use for future draft models (i.e. instead of having to trim to 12 heads and then retrain every time for a new model).
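
Roughly, the trimming step could look something like this (a simplified sketch of the general idea, not the exact pipeline; the numbers come from Qwen2.5-0.5B's public config, and the pruned model is unusable until it's retrained):

```python
# Hypothetical sketch of width-pruning Qwen2.5-0.5B-Instruct from 14 heads
# (hidden size 896 = 14 * 64) down to 12 heads (768 = 12 * 64).
# This is a guess at the general idea, NOT the actual procedure used here;
# the pruned model is garbage until it is retrained/distilled.
import torch
from transformers import AutoModelForCausalLM

OLD_HIDDEN, NEW_HIDDEN = 896, 768  # 14 heads * 64 -> 12 heads * 64

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float32
)

pruned = {}
for name, w in model.state_dict().items():
    # Keep the first NEW_HIDDEN channels on every axis that matches the old
    # hidden size; this shrinks q_proj/o_proj (head count), the MLP in/out
    # projections, the embeddings and the norms all at once.
    for dim in range(w.dim()):
        if w.shape[dim] == OLD_HIDDEN:
            w = w.narrow(dim, 0, NEW_HIDDEN)
    pruned[name] = w.clone()

# The config must agree with the new shapes before the weights will load.
cfg = model.config
cfg.hidden_size = NEW_HIDDEN
cfg.num_attention_heads = 12

small = AutoModelForCausalLM.from_config(cfg)
small.load_state_dict(pruned)
print(sum(p.numel() for p in small.parameters()) / 1e9, "B params")
```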

After I have this, I should be able to create draft models using way less data (hopefully)... I can generate around 0.5B tokens per day using 7 GPUs, and hope to have at least 2B tokens for creating the distilled version (i.e. around four days of generation at that rate).

If it works, then I'll try GLM-4-32B-0414 first, as it seems a good test case with no tiny models available to use as a draft.

@alamios @jukofyork Thanks, looking forward to it!

Any update on this? 👀

Sorry, forgot all about this: I did try with GLM-4-32B-0414, but it didn't work due to some weird issue with the lack of a <BOS> symbol :/

What about the general pruned 12-head version?

@jukofyork I tried your transplant-vocab script with Llama 3.2 1B as a draft for GLM-4-32B and got this error:

(screenshot of the error)

So just wanted to clarify, is this the problem you're running into?

Also, I was wondering: is this an issue that could be resolved with enough training?

I think there is something wrong with the GLM-4-32B model and transformers, as I can't even load it in transformers to train control vectors. It has some really odd BOS-type token with "mask" in the name, which is what tripped me up when I tried to fix this last time.

See: https://huggingface.co/THUDM/GLM-4-32B-0414/blob/main/tokenizer_config.json

  "chat_template": "[gMASK]<sop>{%- if tools -%}<|system|>\n# 可用工具\n{% for tool in tools %}{%- set function = tool.function if tool.get(\"function\") else tool %}\n\n## {{ function.name }}\n\n{{ function | tojson(indent=4, ensure_ascii=False) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{%- endfor %}{%- endif -%}{%- for msg in messages %}{%- if msg.role == 'system' %}<|system|>\n{{ msg.content }}{%- endif %}{%- endfor %}{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ content }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ content }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ content }}{%- elif role == 'observation' %}<|observation|>\n{{ content }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>\n{% endif %}",

Hmm yeah, that's strange. I'll look more closely if I have the time. Right now I'm just messing around with a draft model for the new Reka Flash 3.1.

It might be possible to put that odd token back as the BOS token and get it to work, but I'm too busy with other stuff ATM to fiddle with it any more.
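
For anyone who wants to try that workaround, it might be as simple as the following (an untested sketch; it assumes the draft-building tooling only needs `bos_token` to be defined):

```python
# Untested workaround sketch: register GLM-4's existing [gMASK] token as
# the BOS token, so tooling that insists on a BOS has something to use.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("THUDM/GLM-4-32B-0414")
tok.bos_token = "[gMASK]"             # [gMASK] is already in the vocab
tok.save_pretrained("glm4-with-bos")  # point the draft tooling at this copy
```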

https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0
https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0-GGUF

This should hopefully work on the older zai-org/GLM-4-32B-0414 (and zai-org/GLM-4.5-Air) as it uses the same tokeniser.
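
For a quick sanity check of the pairing in transformers, assisted generation should work, since the draft shares the target's tokeniser (a sketch, untested with these exact repos):

```python
# Sketch: speculative ("assisted") decoding in transformers, pairing the
# 0.6B draft with GLM-4-32B-0414. This only works because the draft was
# built on the target's tokeniser, which is the point of the transplant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("zai-org/GLM-4-32B-0414")
target = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4-32B-0414", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "jukofyork/GLM-4.5-DRAFT-0.6B-v3.0", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok.apply_chat_template(
    [{"role": "user", "content": "Write a haiku about speculative decoding."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(target.device)

out = target.generate(inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```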
