---
library_name: transformers
license: apache-2.0
datasets:
- kurakurai/luth-sft
language:
- fr
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
---
# Luth-0.6B-Instruct

**Luth-0.6B-Instruct** is a French fine-tuned version of [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B), trained on the [Luth-SFT](https://huggingface.co/datasets/kurakurai/luth-sft) dataset. Fine-tuning substantially improved the model's French capabilities in instruction following, math, and general knowledge, while its English capabilities remained stable and even improved in some areas.

Our evaluation, training, and data scripts are available on [GitHub](https://github.com/kurakurai/Luth), along with the accompanying [blog post](https://huggingface.co/blog/MaxLSB/luth).
## Model Details

Luth was trained with full fine-tuning on the Luth-SFT dataset using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl); the resulting model was then merged with the base Qwen3-0.6B model. This preserved the base model's English capabilities while improving performance on nearly all selected benchmarks, in both French and English.
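The card does not specify the merge recipe. As a minimal sketch, assuming a plain linear interpolation of weights between the SFT output and the base model (the `alpha` ratio, the SFT checkpoint path, and the output directory below are illustrative placeholders, not the actual recipe):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical linear merge: merged = (1 - alpha) * base + alpha * sft.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
sft = AutoModelForCausalLM.from_pretrained("path/to/luth-sft-checkpoint")  # placeholder

alpha = 0.5  # assumed interpolation weight toward the fine-tuned weights
base_sd, sft_sd = base.state_dict(), sft.state_dict()
merged_sd = {
    name: torch.lerp(base_sd[name].float(), sft_sd[name].float(), alpha)
    for name in base_sd
}
base.load_state_dict(merged_sd)
base.save_pretrained("luth-0.6b-merged")  # placeholder output dir
```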
## Benchmark Results

We used [LightEval](https://github.com/huggingface/lighteval) for evaluation, with custom tasks for the French benchmarks. All models were evaluated at `temperature=0`. The best score for each benchmark is underlined in the tables below.
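For reference, `temperature=0` corresponds to greedy decoding; in `transformers` terms that is `do_sample=False`. A minimal sketch of the equivalent generation settings (the `max_new_tokens` value is an arbitrary placeholder, not the evaluation setting):

```python
from transformers import GenerationConfig

# Greedy decoding: sampling is disabled, so temperature has no effect.
greedy_config = GenerationConfig(do_sample=False, max_new_tokens=512)
```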
### Evaluation Visualizations

**French Evaluation:**

*(figure: French benchmark comparison across the three models)*
**English Evaluation:**

*(figure: English benchmark comparison across the three models)*
### French Benchmark Scores

| Benchmark       | Qwen3-0.6B | Qwen2.5-0.5B-Instruct | Luth-0.6B-Instruct |
|-----------------|------------|-----------------------|--------------------|
| ifeval-fr       | 44.45      | 22.18                 | <u>48.24</u>       |
| gpqa-diamond-fr | 28.93      | 23.86                 | <u>33.50</u>       |
| mmlu-fr         | 27.16      | 35.04                 | <u>40.23</u>       |
| math-500-fr     | 29.20      | 10.00                 | <u>43.00</u>       |
| arc-chall-fr    | 31.31      | 28.23                 | <u>33.88</u>       |
| hellaswag-fr    | 25.11      | <u>51.45</u>          | 45.70              |
### English Benchmark Scores

| Benchmark       | Qwen3-0.6B   | Qwen2.5-0.5B-Instruct | Luth-0.6B-Instruct |
|-----------------|--------------|-----------------------|--------------------|
| ifeval-en       | <u>57.86</u> | 29.21                 | 53.97              |
| gpqa-diamond-en | <u>29.80</u> | 26.77                 | 28.28              |
| mmlu-en         | 36.85        | 43.80                 | <u>48.10</u>       |
| math-500-en     | 45.00        | 31.80                 | <u>47.80</u>       |
| arc-chall-en    | 33.62        | 32.17                 | <u>35.92</u>       |
| hellaswag-en    | 42.91        | <u>49.56</u>          | 46.96              |
## Code Example

The snippet below runs a simple French chat completion with `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("kurakurai/Luth-0.6B-Instruct")
model = AutoModelForCausalLM.from_pretrained("kurakurai/Luth-0.6B-Instruct")

messages = [
    {"role": "user", "content": "Quelle est la capitale de la France?"},
]

# Apply the chat template and tokenize in a single step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt.
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
)
```
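Because Luth is based on Qwen3, generations may start with a `<think>…</think>` reasoning block. Assuming the base model's chat template and its `enable_thinking` switch are preserved (not confirmed by this card), it can be disabled when building the prompt, continuing the example above:

```python
# Assumption: the enable_thinking flag from Qwen3's chat template still applies.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,  # skip the <think> reasoning block
).to(model.device)
```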
## Citation

```bibtex
@misc{luth2025kurakurai,
  title        = {Luth-0.6B-Instruct},
  author       = {Kurakura AI Team},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/kurakurai/Luth-0.6B-Instruct}},
  note         = {Qwen3-0.6B fine-tuned on French datasets}
}
```