Convert to GGUF: AttributeError: TikTokenTokenizer has no attribute vocab

by Doctor-Chad-PhD

Hi,

I'm trying to quantize this model to GGUF, but I'm getting this error:

INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:Reloaded tiktoken model from /home/me/Moonlight-16B-A3B-Instruct/tiktoken.model
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
Traceback (most recent call last):
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
    main()
    ~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 5102, in main
    model_instance.write_vocab()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 450, in write_vocab
    self.prepare_metadata(vocab_only=False)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
    ~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 4058, in set_vocab
    self._set_vocab_gpt2()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 728, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ~~~~~~~~~~~~~~~~~~~^^
  File "/home/me/llama.cpp/convert_hf_to_gguf.py", line 523, in get_vocab_base
    vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
                                                    ^^^^^^^^^^^^^^^
  File "/home/me/llama.cpp/venv/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: TikTokenTokenizer has no attribute vocab
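For reference, loading the tokenizer directly outside the conversion script shows the same thing (a minimal check using the model path from above; trust_remote_code is needed to pull in the custom TikTokenTokenizer class):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(
        "/home/me/Moonlight-16B-A3B-Instruct", trust_remote_code=True
    )
    print(type(tok).__name__)       # TikTokenTokenizer
    print(hasattr(tok, "vocab"))    # False, which is what the conversion script trips over
    print(tok.vocab_size)           # the vocabulary size itself is still reported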

Do you know how to fix this?
Thanks!

You should use tokenizer.vocab_size instead of len(tokenizer.vocab).
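In other words, a minimal sketch of the change in get_vocab_base() in convert_hf_to_gguf.py (line 523 in the traceback above; the exact line may have moved in newer checkouts), so the fallback uses an attribute this tokenizer actually has:

    # convert_hf_to_gguf.py, get_vocab_base() -- before:
    vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
    # after: TikTokenTokenizer has no .vocab, but it does expose vocab_size
    vocab_size = self.hparams.get("vocab_size", tokenizer.vocab_size)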

Thank you @grapevine-AI, it works now!
