Tokenizer Mismatch
#28 by clpoehl
```
2025-08-05 11:45:54,728 - INFO - Vocab size: 150000
2025-08-05 11:45:54,729 - INFO - Cutting vocab to first 130072 tokens.
2025-08-05 11:45:55,168 - INFO - Tokenizer IDs -> EOT: None, EOS: None, UNK: 0, PAD: 0
2025-08-05 11:45:55,342 - INFO - Model-declared vocab_size: 131072
```
Does anyone know why I get this tokenizer mismatch when using the mistral_common tokenizer?
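For context, here is a minimal sketch of how I'm comparing the two sides. It assumes a Tekken-based v3 tokenizer and uses a placeholder model id (neither is confirmed by the logs above, so adjust both to whatever is actually being loaded). If I read mistral_common right, the tokenizer's `n_words` already includes the reserved special-token slots, so it should be comparable to the model-declared `vocab_size` directly:

```python
# Minimal sketch for comparing tokenizer-side and model-side vocab sizes.
# Assumptions: a Tekken-based v3 tokenizer and a placeholder model id;
# neither is confirmed by the logs in this post.
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from transformers import AutoConfig

# Tokenizer side: load the bundled v3 Tekken tokenizer
# (or MistralTokenizer.from_file("tekken.json") for a local file).
mistral_tok = MistralTokenizer.v3(is_tekken=True)
inner = mistral_tok.instruct_tokenizer.tokenizer

# n_words is the effective vocab size, special-token slots included.
print("tokenizer n_words:", inner.n_words)
print("bos_id / eos_id:", inner.bos_id, inner.eos_id)

# Model side: vocab_size as declared in the checkpoint's config.json.
# Placeholder model id; swap in the checkpoint actually being loaded.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
print("model vocab_size:", cfg.vocab_size)
```

If both numbers print as 131072, the sizes may actually agree, and the 130072 in the log would just be the regular vocab after the 1,000 reserved special-token slots are subtracted, though I'm not certain that's what the loader is complaining about.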