convert to gguf : AttributeError: TikTokenTokenizer has no attribute vocab
#2
by Doctor-Chad-PhD
Hi,
I'm trying to quantize this model to GGUF, but I'm getting this error:
```
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:Reloaded tiktoken model from /home/me/Moonlight-16B-A3B-Instruct/tiktoken.model
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
Traceback (most recent call last):
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
    main()
    ~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 5102, in main
    model_instance.write_vocab()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 450, in write_vocab
    self.prepare_metadata(vocab_only=False)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
    ~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 4058, in set_vocab
    self._set_vocab_gpt2()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 728, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 523, in get_vocab_base
    vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
                                                    ^^^^^^^^^^^^^^^
  File "/home/me/llama.cpp/venv/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: TikTokenTokenizer has no attribute vocab
```
Do you know how to fix this?
Thanks!
You should use `tokenizer.vocab_size` instead of `len(tokenizer.vocab)`.
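For anyone else hitting this, here is a minimal sketch of why the change works. The traceback shows that `TikTokenTokenizer` doesn't expose a `vocab` dict, but it does expose `vocab_size`. The class below is an illustrative stand-in, not the real tokenizer; the vocab size comes from the log line above (`#words: 163842`):

```python
# Stand-in for a tokenizer that, like TikTokenTokenizer,
# has vocab_size but no .vocab attribute.
class TiktokenLikeTokenizer:
    vocab_size = 163842  # from the log: "#words: 163842"

tokenizer = TiktokenLikeTokenizer()
hparams = {}  # "vocab_size" missing from the config, so the fallback is evaluated

# Original line 523 of convert_hf_to_gguf.py — fails because .vocab doesn't exist
# (the dict.get() default is evaluated eagerly, so it fails even before lookup):
try:
    vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
except AttributeError as e:
    print(e)

# Suggested fix — fall back to the tokenizer's vocab_size instead:
vocab_size = hparams.get("vocab_size", tokenizer.vocab_size)
print(vocab_size)  # 163842
```

So the edit is just swapping `len(tokenizer.vocab)` for `tokenizer.vocab_size` on line 523.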
Thank you @grapevine-AI, it works now!