GLM 4 32B too?
Hey! I'll take a look at it, but don't expect it anytime soon; too many things in my backlog.
I will give it a try, but it will be a couple of days: I'm just creating a 12-headed (~0.4B) distilled version of Qwen2.5-0.5B-Instruct that I can use for future draft models (i.e. instead of having to trim to 12 heads and then retrain every time for a new model).
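For anyone curious, the "trim to 12 heads" step looks roughly like this. This is a sketch, not the actual script: it assumes Qwen2.5-0.5B's 14 query heads of dim 64, arbitrarily drops the last two heads per layer, and relies on the reloaded config honouring an explicit `head_dim` (since 896 / 12 is not an integer):

```python
# Rough sketch of query-head trimming on Qwen2.5-0.5B-Instruct
# (14 heads of dim 64 -> 12 heads). Which heads to drop (here: the
# last two per layer) is an arbitrary choice for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

head_dim, keep_heads = 64, 12
keep = keep_heads * head_dim  # 768 of the original 896 q/o dims

for layer in model.model.layers:
    attn = layer.self_attn
    # q_proj's output rows are laid out head-by-head: keep the first 12 heads
    attn.q_proj.weight.data = attn.q_proj.weight.data[:keep, :]
    if attn.q_proj.bias is not None:
        attn.q_proj.bias.data = attn.q_proj.bias.data[:keep]
    # o_proj consumes the concatenated head outputs: trim its input columns
    attn.o_proj.weight.data = attn.o_proj.weight.data[:, :keep]
    # the 2 GQA key/value heads are left alone (12 is still divisible by 2)

model.config.num_attention_heads = keep_heads
model.config.head_dim = head_dim  # needed because 896 / 12 isn't an integer
# the in-memory model is inconsistent after this surgery; the point is
# the saved checkpoint plus patched config, which is then retrained
model.save_pretrained("qwen2.5-0.4b-12head")
AutoTokenizer.from_pretrained(name).save_pretrained("qwen2.5-0.4b-12head")
```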
Once I have this, I should be able to create draft models using way less data (hopefully)... I can generate around 0.5B tokens per day using 7 GPUs, and hope to have at least 2B tokens (so around 4 days' worth) for creating the distilled version.
If it works, then I'll try GLM-4-32B-0414 first, as it seems like a good test case with no tiny models available to use as a draft.
@alamios @jukofyork Thanks, looking forward to it!
There still seem to be problems with this model in llama.cpp:
https://github.com/ggml-org/llama.cpp/issues/12946
So I'll probably hold off until it's working 100%.
Any update on this? 👀
Sorry, forgot all about this: I did try with GLM-4-32B-0414, but it didn't work due to some weird issue with the lack of a <BOS> symbol :/
What about the general pruned 12-head version?
@jukofyork I tried your transplant-vocab script with Llama 3.2 1B as a draft for GLM-4-32B and got this error:
So just wanted to clarify, is this the problem you're running into?
Also, I was wondering: is this an issue that could be resolved with enough training?
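(For context, the rough idea behind a vocab transplant is to keep the donor's transformer body but swap in the target's tokenizer, initialising each new embedding row from the donor tokens that the target token decodes to. A toy sketch of just that idea, not the actual transplant-vocab script, using the models mentioned above:)

```python
# Toy illustration of the vocab-transplant idea (NOT the real
# transplant-vocab script): give a donor model the target model's
# tokenizer, seeding each target token's embedding with the mean of
# the donor-token embeddings it decodes to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

donor = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
donor_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
target_tok = AutoTokenizer.from_pretrained("THUDM/GLM-4-32B-0414")

old_emb = donor.get_input_embeddings().weight.data
new_emb = torch.zeros(len(target_tok), old_emb.shape[1], dtype=old_emb.dtype)

# slow Python loop over ~150k tokens; fine for a toy illustration
for tid in range(len(target_tok)):
    text = target_tok.decode([tid])
    ids = donor_tok(text, add_special_tokens=False)["input_ids"]
    if ids:  # tokens that decode to "" (e.g. some specials) stay zero
        new_emb[tid] = old_emb[ids].mean(dim=0)

donor.resize_token_embeddings(len(target_tok))
donor.get_input_embeddings().weight.data = new_emb
# Llama 3.2 1B ties its LM head to the input embeddings, so this covers
# both; an untied head would need the same treatment before training.
```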
I think there is something wrong with the GLM-4-32B model and transformers, as I can't even load it to train control vectors either. It has some really odd BOS-type token with "mask" in the name that I ran into when I tried to fix this last time.
See: https://huggingface.co/THUDM/GLM-4-32B-0414/blob/main/tokenizer_config.json
"chat_template": "[gMASK]<sop>{%- if tools -%}<|system|>\n# 可用工具\n{% for tool in tools %}{%- set function = tool.function if tool.get(\"function\") else tool %}\n\n## {{ function.name }}\n\n{{ function | tojson(indent=4, ensure_ascii=False) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{%- endfor %}{%- endif -%}{%- for msg in messages %}{%- if msg.role == 'system' %}<|system|>\n{{ msg.content }}{%- endif %}{%- endfor %}{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ content }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ content }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ content }}{%- elif role == 'observation' %}<|observation|>\n{{ content }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>\n{% endif %}",
Hmm yeah, that's strange. I'll look more closely if I have the time. Right now I'm just messing around with a draft model for the new Reka Flash 3.1.
It might be possible to put that odd token back as the BOS token and get it to work, but I'm too busy with other stuff ATM to fiddle with it any more.
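(If anyone wants to experiment, forcing [gMASK] to act as the BOS token might look like this. Untested, and whether downstream conversion tools respect the patched files is an open question:)

```python
# Untested idea: register [gMASK] as the BOS token and save a patched
# copy, so tools that insist on a <BOS> have something to use.
from transformers import AutoConfig, AutoTokenizer

path = "THUDM/GLM-4-32B-0414"
tok = AutoTokenizer.from_pretrained(path)
tok.bos_token = "[gMASK]"  # the token already exists in the vocab
cfg = AutoConfig.from_pretrained(path)
cfg.bos_token_id = tok.convert_tokens_to_ids("[gMASK]")
tok.save_pretrained("glm4-32b-0414-bos-patched")
cfg.save_pretrained("glm4-32b-0414-bos-patched")
```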
https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0
https://huggingface.co/jukofyork/GLM-4.5-DRAFT-0.6B-v3.0-GGUF
This should hopefully work on the older zai-org/GLM-4-32B-0414 (and zai-org/GLM-4.5-Air) as it uses the same tokeniser.
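(A quick sanity check that the tokenisers really do match, assuming both repos ship HF tokenizer files:)

```python
# Diff the two tokenizers before pairing the draft with the 0414 model.
from transformers import AutoTokenizer

a = AutoTokenizer.from_pretrained("zai-org/GLM-4-32B-0414")
b = AutoTokenizer.from_pretrained("jukofyork/GLM-4.5-DRAFT-0.6B-v3.0")

assert a.get_vocab() == b.get_vocab(), "vocabs differ"
sample = "Speculative decoding sanity check: 你好!"
assert a(sample)["input_ids"] == b(sample)["input_ids"], "tokenisation differs"
print("tokenisers match")
```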