Fix: add <eos> token at the end of chat template

#12
by adhi29 - opened

When we generate training data for instruction fine-tuning of the CodeGemma instruct model using the "apply_chat_template" function, the Jinja template doesn't add an <eos> token at the end of the final turn. As a result, the model never sees the end-of-sequence token in its training targets and fails to learn (or unlearns) when to stop generating.
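A minimal sketch of the problem, using hypothetical Gemma-style turn markers rather than the actual CodeGemma template: when the rendered training example never ends with <eos>, the end-of-sequence token never appears in a target position.

```python
EOS = "<eos>"

def render_turn(role: str, content: str) -> str:
    # Gemma-style turn markers (illustrative, not the exact template)
    return f"<start_of_turn>{role}\n{content}<end_of_turn>\n"

def render_chat(messages, add_eos: bool) -> str:
    # Concatenate all turns after a <bos> marker, as a chat template would
    out = "<bos>" + "".join(render_turn(m["role"], m["content"]) for m in messages)
    if add_eos:
        out += EOS  # the trailing token this PR adds for training data
    return out

msgs = [
    {"role": "user", "content": "Write hello world in Python."},
    {"role": "model", "content": 'print("hello world")'},
]

broken = render_chat(msgs, add_eos=False)  # current template behavior
fixed = render_chat(msgs, add_eos=True)    # behavior with the fix

assert not broken.endswith(EOS)
assert fixed == broken + EOS
```

With the broken rendering, every fine-tuning example ends at `<end_of_turn>\n`, so the loss never rewards emitting <eos> and inference tends to run past the intended stopping point.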

The expected behavior of a chat template is to produce complete training examples, including the trailing <eos>, when "add_generation_prompt" = False and "continue_final_message" = False. That's not the case here. The new tokenizer_config.json file fixes that problem.
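The shape of the fix, sketched as Jinja (this is illustrative, not the exact diff in tokenizer_config.json): append `eos_token` after the last turn whenever a generation prompt is not being added.

```jinja
{{ bos_token }}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<start_of_turn>model\n' }}{% else %}{{ eos_token }}{% endif %}
```

The `{% else %}` branch is the addition: with `add_generation_prompt=True` (inference) the output still ends with an open model turn, while with `add_generation_prompt=False` (training data) it now terminates with the eos token.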

adhi29 changed pull request title from Upload tokenizer_config.json to Fix: add <eos> token at the end of chat template
Ready to merge
This branch is ready to get merged automatically.
