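Fix missing `{% generation %}` keyword when using `tokenizer.apply_chat_template(..., return_assistant_tokens_mask=True)`

The chat template has no `{% generation %}` / `{% endgeneration %}` block. As a result, `apply_chat_template` logs a warning and returns an `assistant_masks` of all zeros. This PR wraps the assistant-message branch of `chat_template.jinja` in these tags so that assistant tokens are marked with 1. Reproduction: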

```python
from transformers import AutoTokenizer

# Local checkout of gpt-oss-20b
tokenizer = AutoTokenizer.from_pretrained("/opt/tiger/gpt-oss-20b")

messages = [
    {
        "role": "user",
        "content": "hi",
    },
    {
        "role": "assistant",
        "thinking": "think a moment",
        "content": "Hello",
    },
]

# Render the conversation as text and drop everything up to the first <|end|>
# (the system message), so only the user/assistant turns are printed.
print(tokenizer.apply_chat_template(messages, tokenize=False).split("<|end|>", 1)[1])

processed = tokenizer.apply_chat_template(
    messages,
    reasoning_effort="high",
    return_assistant_tokens_mask=True,
    return_dict=True,
)

# 200007 is the id of <|end|>; skip the system message the same way as above.
first_end = processed["input_ids"].index(200007) + 1
print(processed["input_ids"][first_end:])
print(processed["attention_mask"][first_end:])
print(processed["assistant_masks"][first_end:])
```
Output before the fix (note the warning and the all-zero `assistant_masks`):
```plain
return_assistant_tokens_mask==True but chat template does not contain `{% generation %}` keyword.
<|start|>user<|message|>hi<|end|><|start|>assistant<|channel|>analysis<|message|>think a moment<|end|><|start|>assistant<|channel|>final<|message|>Hello<|return|>
[200006, 1428, 200008, 3686, 200007, 200006, 173781, 200005, 35644, 200008, 49631, 261, 4205, 200007, 200006, 173781, 200005, 17196, 200008, 13225, 200002]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
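The warning above is emitted because transformers finds no `{% generation %}` tag in the template. A minimal sketch of an equivalent check, assuming a regex search similar to the one transformers performs; the helper name `has_generation_tag` and the two template strings are hypothetical:

```python
import re

# Match a `{% generation %}` tag, with or without `-` whitespace control.
# `{% endgeneration %}` does not match because "end" precedes "generation".
GENERATION_TAG = re.compile(r"\{%-?\s*generation\s*-?%\}")


def has_generation_tag(chat_template: str) -> bool:
    """Return True if the template marks assistant output with {% generation %}."""
    return GENERATION_TAG.search(chat_template) is not None


before = "{%- if message.role == 'assistant' -%}{{ message.content }}{%- endif %}"
after = (
    "{%- if message.role == 'assistant' -%}"
    "{% generation %}{{ message.content }}{% endgeneration %}"
    "{%- endif %}"
)

print(has_generation_tag(before))  # False
print(has_generation_tag(after))   # True
```

Against a loaded tokenizer, the same helper could be called as `has_generation_tag(tokenizer.chat_template)`, assuming the template is stored as a single string.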
Output after the fix (no warning; every token from the first `<|start|>assistant` onward is marked with 1):
```plain
<|start|>user<|message|>hi<|end|><|start|>assistant<|channel|>analysis<|message|>think a moment<|end|><|start|>assistant<|channel|>final<|message|>Hello<|return|>
[200006, 1428, 200008, 3686, 200007, 200006, 173781, 200005, 35644, 200008, 49631, 261, 4205, 200007, 200006, 173781, 200005, 17196, 200008, 13225, 200002]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
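A common reason to request the mask is supervised fine-tuning where the loss is computed only on assistant tokens. A minimal sketch, assuming a loss that ignores label `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`); `build_labels` is a hypothetical helper, and the lists are copied from the output above:

```python
# Token ids and assistant mask from the "after the fix" output
# (everything after the system message).
input_ids = [200006, 1428, 200008, 3686, 200007, 200006, 173781, 200005, 35644, 200008,
             49631, 261, 4205, 200007, 200006, 173781, 200005, 17196, 200008, 13225, 200002]
assistant_masks = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

IGNORE_INDEX = -100  # assumption: the label value the loss function skips


def build_labels(ids, mask, ignore_index=IGNORE_INDEX):
    """Keep the token id where the assistant mask is 1; use ignore_index elsewhere."""
    if len(ids) != len(mask):
        raise ValueError("input_ids and assistant_masks must have the same length")
    return [tok if m == 1 else ignore_index for tok, m in zip(ids, mask)]


labels = build_labels(input_ids, assistant_masks)
print(labels[:6])  # [-100, -100, -100, -100, -100, 200006]
```

With the tags placed as in this patch, the `<|start|>assistant<|channel|>...<|message|>` header tokens of each assistant message fall inside the mask (index 5 onward above), so they are supervised as well.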

Files changed (1): `chat_template.jinja` (+2 -0)
```diff
@@ -259,6 +259,7 @@
 {%- for message in loop_messages -%}
     {#- At this point only assistant/user/tool messages should remain #}
     {%- if message.role == 'assistant' -%}
+    {% generation %}
         {#- Checks to ensure the messages are being passed in the format we expect #}
         {%- if "content" in message %}
             {%- if "<|channel|>analysis<|message|>" in message.content or "<|channel|>final<|message|>" in message.content %}
@@ -314,6 +315,7 @@
             {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
             {%- set last_tool_call.name = none %}
         {%- endif %}
+    {% endgeneration %}
     {%- elif message.role == 'tool' -%}
         {%- if last_tool_call.name is none %}
             {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
```
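When editing a template by hand, it is easy to add the opening tag and forget the closing one. A minimal sanity-check sketch, assuming tags are matched by a simple regex and that a closing tag must follow its opening tag; `generation_tags_balanced` and the `patched` string are hypothetical:

```python
import re

# Opening and closing generation tags, with or without `-` whitespace control.
OPEN_TAG = re.compile(r"\{%-?\s*generation\s*-?%\}")
CLOSE_TAG = re.compile(r"\{%-?\s*endgeneration\s*-?%\}")


def generation_tags_balanced(chat_template: str) -> bool:
    """Return True if every {% generation %} is closed by a later {% endgeneration %}."""
    # Collect tag positions: +1 for an opening tag, -1 for a closing tag.
    events = sorted(
        [(m.start(), 1) for m in OPEN_TAG.finditer(chat_template)]
        + [(m.start(), -1) for m in CLOSE_TAG.finditer(chat_template)]
    )
    if not events:
        return False  # no tags at all: the assistant mask would stay all zeros
    depth = 0
    for _, delta in events:
        depth += delta
        if depth < 0:  # a closing tag appeared before its opening tag
            return False
    return depth == 0


patched = (
    "{%- if message.role == 'assistant' -%}\n"
    "    {% generation %}\n"
    "        {{- message.content }}\n"
    "    {% endgeneration %}\n"
    "{%- endif %}"
)
print(generation_tags_balanced(patched))  # True
print(generation_tags_balanced(patched.replace("{% endgeneration %}", "")))  # False
```

After loading the patched tokenizer, `generation_tags_balanced(tokenizer.chat_template)` would be the corresponding check, assuming the template is a single string.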