Support prefix completion

#158

Changed the chat template to support the chat prefix completion functionality described in the official DeepSeek API documentation.

The following logic changes were implemented:
• If a message has prefix=true, nothing is appended after its content.
• If prefix=true, add_generation_prompt is ignored.
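
For illustration, here is a minimal pure-Python sketch of those two rules. The render function and its simplifications are mine, not part of the template; the real Jinja template further down also handles system prompts and tool calls.

def render(messages, add_generation_prompt=False):
    # Simplified stand-in for the Jinja template (no system/tool handling).
    out = '<|begin▁of▁sentence|>'
    for m in messages:
        if m['role'] == 'user':
            out += '<|User|>' + m['content']
        elif m['role'] == 'assistant':
            if m.get('prefix'):
                # Rule 1: prefix=true leaves the content open; no end-of-sentence token.
                out += '<|Assistant|>' + m['content']
            else:
                out += '<|Assistant|>' + m['content'] + '<|end▁of▁sentence|>'
    # Rule 2: the generation prompt is skipped when the last message is a prefix.
    if add_generation_prompt and not messages[-1].get('prefix'):
        out += '<|Assistant|><think>\n'
    return out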

Unit test code is below:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")

old_template = tokenizer.chat_template
prefix_template = '''{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_prefix=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{% if 'prefix' in message and message['prefix'] == true %}{{'<|Assistant|>' + content}}{%- set ns.is_prefix = true -%}{%- else %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{% endif %}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool and not ns.is_prefix %}{{'<|Assistant|><think>\n'}}{% endif %}'''

conversation = [
    dict(role='user', content='hello'),
    dict(role='assistant', content='hello, nice to meet you!'),
    dict(role='user', content='1+1='),
    # The last message is an assistant prefix for the model to continue.
    dict(role='assistant', content='<think>The user want to know', prefix=True),
]
# Render with the original template first, for comparison.
tokenizer.chat_template = old_template
print(tokenizer.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False))

# Render with the new prefix-aware template.
tokenizer.chat_template = prefix_template
print(tokenizer.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False))

The output should be (old template first, then the prefix template):

<|begin▁of▁sentence|><|User|>hello<|Assistant|>hello, nice to meet you!<|end▁of▁sentence|><|User|>1+1=<|Assistant|><think>The user want to know<|end▁of▁sentence|><|Assistant|><think>

<|begin▁of▁sentence|><|User|>hello<|Assistant|>hello, nice to meet you!<|end▁of▁sentence|><|User|>1+1=<|Assistant|><think>The user want to know
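
To actually continue the prefix, the rendered prompt can be fed directly to the model. A minimal sketch, assuming enough hardware to load the full R1 model; the generation parameters are illustrative:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'deepseek-ai/DeepSeek-R1', torch_dtype='auto', device_map='auto', trust_remote_code=True
)
prompt = tokenizer.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
# The template already emits <|begin▁of▁sentence|>, so don't add special tokens again.
inputs = tokenizer(prompt, return_tensors='pt', add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated continuation of the prefilled prefix.
print(tokenizer.decode(out[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))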

I really hope that other providers implement this feature as well. Prefilling the reasoning and content can help the model escape local optima for some prompts.

Yes, I’m using a critic model to steer the R1 model’s thought process, and so far it works perfectly. To do this, you need this functionality. My fallback solution is to deploy the R1 model with SGLang, whose native generation API lets you send arbitrarily arranged text inputs to the model; see the sketch below.
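
A sketch of that fallback, assuming an SGLang server is already running locally (host, port, and sampling parameters are illustrative; the /generate request shape follows SGLang's native API):

import requests

# Assumes a server launched with something like:
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 --port 30000
resp = requests.post(
    'http://localhost:30000/generate',
    json={
        # Arbitrarily arranged text, ending in a prefilled assistant prefix.
        'text': '<|begin▁of▁sentence|><|User|>1+1=<|Assistant|><think>The user want to know',
        'sampling_params': {'temperature': 0.6, 'max_new_tokens': 256},
    },
)
print(resp.json()['text'])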

