---
license: gemma
language:
- bg
base_model:
- INSAIT-Institute/BgGPT-Gemma-2-2.6B-IT-v1.0
tags:
- function_calling
- MCP
- tool_use
---
# Tucan-2.6B-v1.0

## Bulgarian Language Models for Function Calling 🇧🇬

**Paper: https://arxiv.org/abs/2506.23394**

## Overview 🚀

TUCAN (Tool-Using Capable Assistant Navigator) is a series of open-source Bulgarian language models fine-tuned specifically for function calling and tool use.

These models can interact with external tools, APIs, and databases, making them suitable for building AI agents and [Model Context Protocol (MCP)](https://arxiv.org/abs/2503.23278) applications.

Tucan models are built on top of the [BgGPT models](https://huggingface.co/collections/INSAIT-Institute/bggpt-gemma-2-673b972fe9902749ac90f6fe) from [INSAIT Institute](https://insait.ai/), which are themselves based on [Gemma 2](https://arxiv.org/pdf/2408.00118), and have been enhanced with function-calling capabilities.

## Motivation 🎯

Although BgGPT models demonstrate [strong Bulgarian language comprehension](https://arxiv.org/pdf/2412.10893), they struggle to maintain the precise formatting required for consistent function calling. Even with detailed system prompts, their performance on this task remains suboptimal.

This project addresses that gap by fine-tuning BgGPT, providing the Bulgarian AI community with proper tool-use capabilities in their native language.

## Models and variants 📦
Available in three sizes with full models, LoRA adapters, and quantized GGUF variants:

<div align="center">

| Model Size | Full Model | LoRA Adapter | GGUF (Quantized) |
|------------|------------|--------------|------------------|
| **2.6B** | [Tucan-2.6B-v1.0](https://huggingface.co/llm-bg/Tucan-2.6B-v1.0) 📍 | [LoRA](https://huggingface.co/llm-bg/Tucan-2.6B-v1.0-LoRA) | [GGUF](https://huggingface.co/llm-bg/Tucan-2.6B-v1.0-GGUF) |
| **9B** | [Tucan-9B-v1.0](https://huggingface.co/llm-bg/Tucan-9B-v1.0) | [LoRA](https://huggingface.co/llm-bg/Tucan-9B-v1.0-LoRA) | [GGUF](https://huggingface.co/llm-bg/Tucan-9B-v1.0-GGUF) |
| **27B** | [Tucan-27B-v1.0](https://huggingface.co/llm-bg/Tucan-27B-v1.0) | [LoRA](https://huggingface.co/llm-bg/Tucan-27B-v1.0-LoRA) | [GGUF](https://huggingface.co/llm-bg/Tucan-27B-v1.0-GGUF) |

*GGUF variants include q4_k_m, q5_k_m, q6_k, q8_0, and q4_0 quantizations.*

📍 *Current model/repo*

</div>

Models and quantizations are also available for easy use in Ollama: https://ollama.com/s_emanuilov/tucan
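
For a quick local test, the model can also be called through the [Ollama Python client](https://github.com/ollama/ollama-python). The snippet below is a minimal sketch; it assumes a running Ollama instance, the `ollama` Python package, and that the `s_emanuilov/tucan` tag from the link above has already been pulled (e.g. `ollama pull s_emanuilov/tucan`):

```python
# Minimal sketch: chat with Tucan through a local Ollama server.
# Assumes `pip install ollama` and `ollama pull s_emanuilov/tucan` have been run.
import ollama

response = ollama.chat(
    model="s_emanuilov/tucan",  # tag from https://ollama.com/s_emanuilov/tucan
    messages=[{"role": "user", "content": "Здравей! Какво можеш да правиш?"}],
)
print(response["message"]["content"])
```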
## Benchmarks 📊

All evaluations were performed using the [Tucan evaluation framework](https://github.com/s-emanuilov/tucan), with results averaged across multiple runs. Tucan models demonstrate superior function-calling capabilities compared to their BgGPT counterparts, with particularly strong improvements in smaller model sizes. To ensure no catastrophic forgetting occurred, we evaluated knowledge retention using [EleutherAI's lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on Bulgarian benchmarks, confirming that each Tucan model maintains performance on par with its BgGPT equivalent.

<div align="center">

| Model | Function Calling | HellaswagBG | WinograndeBG | ARC-Easy-BG | ARC-Challenge-BG |
|-------|-----------------|-------------|--------------|-------------|------------------|
| **Tucan-2.6B-v1.0** 🔥 | **0.7875** | 0.5924 | 0.6456 | 0.5657 | 0.3754 |
| **Tucan-9B-v1.0** 🔥 | **0.8667** | 0.7046 | 0.7151 | 0.7024 | 0.5188 |
| **Tucan-27B-v1.0** 🔥 | **0.8750** | 0.6179 | 0.6275 | 0.6486 | 0.4420 |
| BgGPT-Gemma-2-2.6B-IT-v1.0 | 0.5874 | 0.6306 | 0.5821 | 0.5657 | 0.3720 |
| BgGPT-Gemma-2-9B-IT-v1.0 | 0.7833 | 0.7057 | 0.7190 | 0.7231 | 0.5188 |
| BgGPT-Gemma-2-27B-IT-v1.0 | 0.8667 | 0.6200 | 0.6212 | 0.6587 | 0.4590 |

*Note: 27B models were evaluated in 8-bit precision for comparison purposes.*

</div>

## Usage 🛠️

### Quick start ⚡
```bash
pip install -U "transformers[torch]" accelerate bitsandbytes
```
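The `bitsandbytes` dependency is only needed for optional quantized loading (the 27B benchmark above was run in 8-bit, for example). Below is a minimal sketch of 4-bit loading as an alternative to the bfloat16 setup in the full example further down; the exact quantization settings here are illustrative assumptions:

```python
# Optional: load the model in 4-bit with bitsandbytes to reduce GPU memory use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "s-emanuilov/Tucan-2.6B-v1.0"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, as in the full example below
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
    attn_implementation="eager",  # recommended for Gemma 2 models
)
```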
### Prompt format ⚙️
**Critical:** Use this exact format for function calling to get the best results.

<details>
<summary><strong>📋 Required system prompt template</strong></summary>

```
<bos><start_of_turn>user
Ти си полезен AI асистент, който предоставя полезни и точни отговори.

Имаш достъп и можеш да извикаш една или повече функции, за да помогнеш с потребителското запитване. Използвай ги, само ако е необходимо и подходящо.

Когато използваш функция, форматирай извикването ѝ в блок ```tool_call``` на отделен ред, а след това ще получиш резултат от изпълнението в блок ```tool_response```.

## Шаблон за извикване:
```tool_call
{"name": <function-name>, "arguments": <args-json-object>}```

## Налични функции:
[your function definitions here]

## Потребителска заявка:
[your query in Bulgarian]<end_of_turn>
<start_of_turn>model
```

</details>

### Note 📝
**The model only generates the `tool_call` blocks with function names and parameters - it doesn't actually execute the functions.** Your client application must parse these generated calls, execute the actual functions (API calls, database queries, etc.), and provide the results back to the model in `tool_response` blocks so the conversation can continue with interpretation of the results. A full demo is coming soon.
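
To make that client-side loop concrete, here is a minimal, hypothetical sketch of parsing a generated `tool_call` block, executing it, and formatting the `tool_response` block to send back. The regex, the `execute_function` dispatcher, and the placeholder result are illustrative assumptions, not part of the model or this repository:

```python
import json
import re

# Example of what the model might emit (illustrative only).
model_output = """```tool_call
{"name": "create_calendar_event", "arguments": {"title": "Годишен преглед", "date": "2025-06-08", "start_time": "14:00", "end_time": "14:30"}}```"""

def extract_tool_calls(text: str):
    """Parse ```tool_call``` blocks emitted by the model into dicts."""
    return [json.loads(m) for m in re.findall(r"```tool_call\s*(\{.*?\})\s*```", text, re.DOTALL)]

def execute_function(name: str, arguments: dict) -> dict:
    """Hypothetical dispatcher: route the call to your real API or database code."""
    if name == "create_calendar_event":
        return {"status": "success", "event_id": "evt_123"}  # placeholder result
    raise ValueError(f"Unknown function: {name}")

for call in extract_tool_calls(model_output):
    result = execute_function(call["name"], call["arguments"])
    # Wrap the result in a ```tool_response``` block and send it back to the
    # model as the next turn, so it can interpret the outcome for the user.
    tool_response = f"```tool_response\n{json.dumps(result, ensure_ascii=False)}\n```"
    print(tool_response)
```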
### Python example 🐍

<details>
<summary><strong>💻 Complete Working Example</strong></summary>

```python
import torch
import json
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load model
model_name = "s-emanuilov/Tucan-2.6B-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager"  # Required for Gemma models
)

# Create prompt with system template
def create_prompt(functions, user_query):
    system_prompt = """Ти си полезен AI асистент, който предоставя полезни и точни отговори.

Имаш достъп и можеш да извикаш една или повече функции, за да помогнеш с потребителското запитване. Използвай ги, само ако е необходимо и подходящо.

Когато използваш функция, форматирай извикването ѝ в блок ```tool_call``` на отделен ред, а след това ще получиш резултат от изпълнението в блок ```tool_response```.

## Шаблон за извикване:
```tool_call
{"name": <function-name>, "arguments": <args-json-object>}```
"""

    functions_text = json.dumps(functions, ensure_ascii=False, indent=2)
    full_prompt = f"{system_prompt}\n## Налични функции:\n{functions_text}\n\n## Потребителска заявка:\n{user_query}"

    chat = [{"role": "user", "content": full_prompt}]
    return tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Example usage
functions = [{
    "name": "create_calendar_event",
    "description": "Creates a new event in Google Calendar.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
            "start_time": {"type": "string"},
            "end_time": {"type": "string"}
        },
        "required": ["title", "date", "start_time", "end_time"]
    }
}]

query = "Създай събитие 'Годишен преглед' за 8-ми юни 2025 от 14:00 до 14:30."

# Generate response
prompt = create_prompt(functions, query)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1.0,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<end_of_turn>")],
    pad_token_id=tokenizer.eos_token_id
)

result = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(result)
```

</details>

## Performance & Dataset 📊

> 📄 **Full methodology, dataset details, and comprehensive evaluation results are presented in the [paper](https://arxiv.org/abs/2506.23394).**

**Dataset:** 10,000+ bilingual (Bulgarian/English) function-calling examples across 1,000+ topics, including tool calls with single/multiple arguments, optional parameters, follow-up queries, multi-tool selection, ambiguous queries requiring clarification, and conversational interactions without tool use. Data sourced from manual curation and synthetic generation (Gemini Pro 2.5/GPT-4.1/Sonnet 4).

**Results:** Significant improvements in tool-use capabilities over the base BgGPT models in [internal benchmarks](https://github.com/s-emanuilov/tucan): a 34.1% relative gain in function-calling accuracy for the 2.6B model, 10.6% for the 9B, and 1.0% for the 27B. Beyond the raw function-calling scores, all Tucan models demonstrate more natural conversational flow during tool use while retaining their base knowledge.
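
These percentages are relative gains over the corresponding BgGPT function-calling scores from the benchmark table above, as the quick check below shows:

```python
# Relative function-calling improvement over the BgGPT base models,
# computed from the scores in the benchmark table above.
scores = {"2.6B": (0.7875, 0.5874), "9B": (0.8667, 0.7833), "27B": (0.8750, 0.8667)}

for size, (tucan, bggpt) in scores.items():
    gain = (tucan - bggpt) / bggpt * 100
    print(f"{size}: {gain:.1f}%")  # 2.6B: 34.1%, 9B: 10.6%, 27B: 1.0%
```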
## Acknowledgments 🙏
Built on top of the [BgGPT series](https://huggingface.co/collections/INSAIT-Institute/bggpt-gemma-2-673b972fe9902749ac90f6fe).

## Questions & Contact 💬
For questions, collaboration, or feedback: **[Connect on LinkedIn](https://www.linkedin.com/in/simeon-emanuilov/)**