Wrap assistant messages inside "{% generation %}" markers in chat_template.jinja
Adding `{% generation %}` markers enables the TRL [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer#trl.SFTConfig)'s `assistant_only_loss` config option. `assistant_only_loss` tells the SFTTrainer to compute the loss (and therefore gradients) only on the assistant messages, which this PR wraps in `{% generation %}` / `{% endgeneration %}` markers. I confirmed that this behaves as expected by using this custom template for the `gpt-oss-20b` tokenizer as the `processing_class` for SFTTrainer.
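For reference, a minimal sketch of that training setup. The dataset, `output_dir`, and `CORRECTED_JINJA_TEMPLATE` are illustrative placeholders, not part of this PR:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Patch the tokenizer with the template from this PR (placeholder name).
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
tokenizer.chat_template = CORRECTED_JINJA_TEMPLATE

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    args=SFTConfig(assistant_only_loss=True, output_dir="gpt-oss-20b-sft"),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),  # any chat-format dataset
    processing_class=tokenizer,  # tokenizer carrying the patched template
)
trainer.train()
```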
See the transformers [PR](https://github.com/huggingface/transformers/pull/30650) that introduced the `{% generation %}` keyword.
See also how [trl/trainer/sft_trainer.py](https://github.com/huggingface/trl/blob/206964ce16e15f2afd4f8f12fe49d1d828312f97/trl/trainer/sft_trainer.py#L845) consumes the assistant masks that [transformers/utils/chat_template_utils.py](https://github.com/huggingface/transformers/blob/52c6c1bb6e27ca87c4faede34a4c2a7404c17c4d/src/transformers/utils/chat_template_utils.py#L475) extracts from these markers.
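In practice, `apply_chat_template(..., return_assistant_tokens_mask=True)` returns a 0/1 `assistant_masks` entry covering exactly the spans wrapped in `{% generation %}`, and the trainer masks every other position out of the loss. A sketch of that masking step, with made-up toy tensors:

```python
import torch

# Toy values: a 5-token sequence where only the last 3 tokens are assistant tokens.
input_ids = torch.tensor([[101, 2023, 102, 7592, 999]])
assistant_masks = torch.tensor([[0, 0, 1, 1, 1]])

# Positions with mask 0 get label -100, which PyTorch's
# cross-entropy loss ignores by default (ignore_index=-100).
labels = input_ids.clone()
labels[assistant_masks == 0] = -100
print(labels)  # tensor([[-100, -100,  102, 7592,  999]])
```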
Code segment to verify that the masking is applied correctly; assistant tokens are printed in green:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('openai/gpt-oss-20b', trust_remote_code=True)
tokenizer.chat_template = CORRECTED_JINJA_TEMPLATE  # the template from this PR

templated_output = tokenizer.apply_chat_template(
    sample['messages'],  # one conversation: a list of {"role": ..., "content": ...} dicts
    tokenize=True,
    add_generation_prompt=False,
    return_assistant_tokens_mask=True,
    return_dict=True,
)

print("Visualizing token masks. Green text is used for loss calculation.\n")
GREEN = "\033[92m"
RESET = "\033[0m"

input_ids = templated_output['input_ids']
assistant_mask = templated_output['assistant_masks']
if len(input_ids) != len(assistant_mask):
    raise ValueError("Mismatch between input_ids and assistant_masks length.")

# Walk the sequence, grouping consecutive tokens that share the same mask value.
current_chunk_tokens = []
current_mask_status = None
for token_id, is_assistant in zip(input_ids, assistant_mask):
    mask_status = bool(is_assistant)
    if current_mask_status is None:
        current_mask_status = mask_status
    if mask_status != current_mask_status:
        # Decode and print the completed chunk
        decoded_text = tokenizer.decode(current_chunk_tokens, skip_special_tokens=False)
        if current_mask_status:
            print(f"{GREEN}{decoded_text}{RESET}", end="")
        else:
            print(decoded_text, end="")
        # Start a new chunk
        current_chunk_tokens = [token_id]
        current_mask_status = mask_status
    else:
        current_chunk_tokens.append(token_id)

# Print the final chunk after the loop
if current_chunk_tokens:
    decoded_text = tokenizer.decode(current_chunk_tokens, skip_special_tokens=False)
    if current_mask_status:
        print(f"{GREEN}{decoded_text}{RESET}", end="")
    else:
        print(decoded_text, end="")
```
Prints something like:
```
<|start|>user<|message|>USER_MESSAGE<|end|>[GREEN_STARTS]<|start|>assistant<|channel|>analysis<|message|>...<|call|>[GREEN_ENDS]
```
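The same mechanism can be sanity-checked on a toy template independent of `gpt-oss-20b`; the `gpt2` tokenizer and the two-message conversation below are arbitrary choices:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# Minimal chat template: only assistant content is wrapped in {% generation %}.
tok.chat_template = (
    "{% for m in messages %}"
    "{% if m['role'] == 'assistant' %}"
    "{% generation %}{{ m['content'] }}{% endgeneration %}"
    "{% else %}{{ m['content'] }}{% endif %}"
    "{% endfor %}"
)
out = tok.apply_chat_template(
    [{"role": "user", "content": "Hi there. "},
     {"role": "assistant", "content": "Hello!"}],
    tokenize=True,
    return_assistant_tokens_mask=True,
    return_dict=True,
)
print(out["assistant_masks"])  # 0s for the user span, 1s for "Hello!"
```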
`chat_template.jinja` (+24 −17):

```diff
@@ -288,30 +288,37 @@
             {%- endif %}
             {%- if message.content and message.thinking %}
                 {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
-            {%- elif message.content and not future_final_message.found %}
-                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
-            {%- elif message.thinking and not future_final_message.found %}
-                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
             {%- endif %}
-            {{- "<|start|>assistant to=" }}
-            {{- "functions." + tool_call.name + "<|channel|>commentary " }}
-            {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
-            {{- tool_call.arguments|tojson }}
-            {{- "<|call|>" }}
+            {% generation %}
+            {%- if message.content and not future_final_message.found %}
+                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
+            {%- elif message.thinking and not future_final_message.found %}
+                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
+            {%- endif %}
+            {{- "<|start|>assistant to=" }}
+            {{- "functions." + tool_call.name + "<|channel|>commentary " }}
+            {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
+            {{- tool_call.arguments|tojson }}
+            {{- "<|call|>" }}
+            {% endgeneration %}
             {%- set last_tool_call.name = tool_call.name %}
         {%- elif loop.last and not add_generation_prompt %}
             {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
             {#- This is a situation that should only occur in training, never in inference. #}
-            {%- if "thinking" in message %}
-                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
-            {%- endif %}
-            {#- <|return|> indicates the end of generation, but <|end|> does not #}
-            {#- <|return|> should never be an input to the model, but we include it as the final token #}
-            {#- when training, so the model learns to emit it. #}
-            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
+            {% generation %}
+            {%- if "thinking" in message %}
+                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
+            {%- endif %}
+            {#- <|return|> indicates the end of generation, but <|end|> does not #}
+            {#- <|return|> should never be an input to the model, but we include it as the final token #}
+            {#- when training, so the model learns to emit it. #}
+            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
+            {% endgeneration %}
         {%- else %}
             {#- CoT is dropped during all previous turns, so we never render it for inference #}
-            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
+            {% generation %}
+            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
+            {% endgeneration %}
             {%- set last_tool_call.name = none %}
         {%- endif %}
     {%- elif message.role == 'tool' -%}
```