---
model_name: Apertus-8B-Instruct-2509-Math-Step-DPO-10K
base_model: swiss-ai/Apertus-8B-Instruct-2509
license: apache-2.0
language:
- en
- multilingual
tags:
- multilingual
- compliant
- swiss-ai
- apertus
- fine-tuned
- aqua_rat
- text-generation
pipeline_tag: text-generation
library_name: transformers
author: Safouane El Ghazouali
author_email: [email protected]
model_creator: Safouane El Ghazouali
location: Switzerland
datasets:
- xinlai/Math-Step-DPO-10K
---

# safouaneelg/Apertus-8B-Instruct-2509-Math-Step-DPO-10K

Apertus is available in two sizes: multilingual models with 70B and 8B parameters.
Check out the model info here: [Swiss-AI/LLM](https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e65415c231ace3b059)

# Fine-tuned via DPO (Direct Preference Optimization) on Math-Step-DPO-10K

This repo contains a version of Apertus-8B-Instruct-2509 fine-tuned on the [Math-Step-DPO-10K dataset](https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K).

The fine-tuning was performed using Unsloth on a single GPU (RTX A6000, 48 GB) with the following parameters:

- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4
- num_train_epochs: 3
- learning_rate: 5e-5
- fp16/bf16: enabled based on hardware support
- lr_scheduler_type: linear
- seed: 3407
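
As a rough illustration of what these settings imply, the sketch below computes the effective batch size and an approximate number of optimizer steps. The dictionary simply mirrors the values listed above; the helper names and the dataset size of 10,000 examples are assumptions made for the example, not figures taken from the actual run.

```python
import math

# Values copied from the parameter list above.
train_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 3,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "linear",
    "seed": 3407,
}


def effective_batch_size(cfg, n_gpus=1):
    """Examples consumed per optimizer step across all GPUs."""
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"] * n_gpus


def optimizer_steps(cfg, n_examples, n_gpus=1):
    """Approximate total optimizer steps over all epochs."""
    per_epoch = math.ceil(n_examples / effective_batch_size(cfg, n_gpus))
    return per_epoch * cfg["num_train_epochs"]


# ASSUMPTION: illustrative dataset size only; check the real number of training rows.
ASSUMED_TRAIN_EXAMPLES = 10_000

print(effective_batch_size(train_config))                     # 8
print(optimizer_steps(train_config, ASSUMED_TRAIN_EXAMPLES))  # 3750
```

With one GPU, each optimizer step therefore sees 2 x 4 = 8 preference pairs.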

**Dataset format**:

The dataset was reformatted into the `prompt` / `chosen` / `rejected` structure that DPO training expects:

```python
# NOTE: SYSTEM_PROMPT, tokenizer, pick_field, make_completion, chosen_candidates
# and rejected_candidates are not defined in this snippet.
def format_for_dpo(example):
    # Build the user turn: the problem statement plus any initial reasoning steps
    user_content = example.get("prompt", "")
    if example.get("initial_reason_steps"):
        user_content = user_content + "\n" + example.get("initial_reason_steps", "")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    # Pick chosen/rejected content from the dataset fields if available;
    # the fields required for DPO are 'chosen' and 'rejected'
    chosen_text = pick_field(example, chosen_candidates)
    rejected_text = pick_field(example, rejected_candidates)

    # Prefer explicit reasoning/answer fields, falling back to the picked text
    chosen_reason = example.get("chosen_reason_steps", "") or example.get("reason_chosen", "")
    chosen_answer = example.get("chosen_answer", "") or chosen_text
    rejected_reason = example.get("rejected_reason_steps", "") or example.get("reason_rejected", "")
    rejected_answer = example.get("rejected_answer", "") or rejected_text

    chosen_completion = make_completion(chosen_reason, chosen_answer)
    rejected_completion = make_completion(rejected_reason, rejected_answer)

    return {
        "prompt": prompt,
        "chosen": chosen_completion,
        "rejected": rejected_completion,
    }
```
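
The function above calls `pick_field` and `make_completion`, and reads `chosen_candidates` and `rejected_candidates`, none of which are shown in this card. The sketch below is one hypothetical way these helpers could look, assuming `pick_field` returns the first non-empty field among a list of candidate names and `make_completion` joins the reasoning and the answer with a newline. The candidate field names and the toy example are illustrative guesses, not the code used for training.

```python
def pick_field(example, candidates):
    """Return the first non-empty string found among the candidate field names."""
    for name in candidates:
        value = example.get(name)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def make_completion(reason, answer):
    """Join reasoning steps and final answer into one completion string."""
    parts = [p.strip() for p in (reason, answer) if p and p.strip()]
    return "\n".join(parts)


# ASSUMPTION: hypothetical candidate names; adjust to the actual dataset columns.
chosen_candidates = ["chosen", "full_chosen"]
rejected_candidates = ["rejected", "full_rejected"]

# Toy record, made up for illustration.
toy_example = {
    "prompt": "What is 2 + 3?",
    "chosen": "2 + 3 = 5. The answer is 5",
    "rejected": "2 + 3 = 6. The answer is 6",
}

chosen = make_completion("", pick_field(toy_example, chosen_candidates))
rejected = make_completion("", pick_field(toy_example, rejected_candidates))
print(chosen)
print(rejected)
```

Returning an empty string when no candidate matches keeps the `or` fallbacks in `format_for_dpo` well-defined; whether empty pairs should be filtered out afterwards is a separate design choice.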
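
**Training plots**

The plots below show the reward curves from DPO training:

![Rewards for chosen and rejected completions](rewards_chosen_rejected.png)

![Reward margin and reward accuracy](reward_margin_reward_accuracy.png)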

## How to use

You can run this fine-tuned model by following the steps below:

1. `Transformers 4.56.0` is required to run the model; I used `Unsloth 2025.8.10`.

```bash
pip install -U transformers unsloth
```

2. I got the model running after installing the xiELU activation function, which should be installable with the command below.

```bash
pip install git+https://github.com/rubber-duck-debug/xielu
```

If the installation fails, see the xiELU installation notes in my other fine-tuned model ([safouaneelg/Apertus-8B-Instruct-2509-GSM8k-SFT](https://huggingface.co/safouaneelg/Apertus-8B-Instruct-2509-GSM8k-SFT)).

3. Run inference with Unsloth (if you hit a `StaticLayer` error, try toggling the commented-out `prompt_lookup_num_tokens=None` argument in the `generate` call).

```python
import torch
from unsloth import FastLanguageModel

# Load the merged DPO model in 4-bit to reduce GPU memory usage
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./output_merged/apertus_dpo_math_step_merged",  # Path to your merged model
    max_seq_length=2048,
    load_in_4bit=True,
    load_in_8bit=False,
    full_finetuning=False,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

sample_question = """
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

"""

# Wrap the question in the chat template expected by the instruct model
messages = [
    {"role": "user", "content": sample_question}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=2048).to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    # prompt_lookup_num_tokens=None,  # toggle this if you hit a `StaticLayer` error
)

# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Model Response:")
print(response)

# Free GPU memory
del model, tokenizer
torch.cuda.empty_cache()
```

Example output:

![Screenshot of output](example.png)

## Citation

```bibtex
@misc{swissai2025apertus,
  title={{Apertus: Democratizing Open and Compliant LLMs for Global Language Environments}},
  author={Apertus Team},
  year={2025},
  howpublished={\url{https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509}}
}
```