---
license: cc-by-nc-4.0
tags:
- text-generation
- llama-3.1-8b-instruct
- function-calling
- finetuned-model
- trl
- lora
- Salesforce/xlam-function-calling-60k
datasets:
- Salesforce/xlam-function-calling-60k
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
language:
- en
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct Fine-tuned on xLAM
## Overview

This is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct, trained with Hugging Face's TRL library on the Salesforce/xlam-function-calling-60k dataset to improve its function-calling capabilities.

## Model Details
- Developed by: ermiaazarkhalili
- License: cc-by-nc-4.0
- Language: English
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
- Model size: ~8 billion parameters
- Vocab size: 128,256 tokens
- Max sequence length: 2,048 tokens
- Tensor type: BF16
- Pad token: `<|eot_id|>` (ID: 128009)
## Training Information
The model was fine-tuned using the following configuration:
### Training Libraries
- Hugging Face TRL Library for advanced training techniques
- LoRA (Low-Rank Adaptation) for parameter-efficient training
- 4-bit quantization for memory efficiency
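As a rough sketch of how these pieces fit together, the snippet below loads the base model in 4-bit precision before attaching LoRA adapters. The specific quantization settings (nf4, bfloat16 compute dtype) are illustrative assumptions and were not necessarily the exact values used for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 4-bit precision (QLoRA-style).
# nf4 and the bfloat16 compute dtype are illustrative assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```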
### Training Parameters
- Learning Rate: 0.0001
- Batch Size: 16
- Gradient Accumulation Steps: 8
- Max Training Steps: 1,000
- Warmup Ratio: 0.1
- Max Sequence Length: 2,048
- Output Directory: ./Llama_3_1_8B_Instruct_xLAM
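For reference, these hyperparameters map roughly onto TRL's SFTConfig as sketched below. Field names differ slightly between TRL releases, and the split between per-device and effective batch size is an assumption, so treat this as an approximation of the training setup rather than the exact script.

```python
from trl import SFTConfig

# Approximate mapping of the hyperparameters above onto TRL's SFTConfig;
# the per-device vs. effective batch-size split is an assumption.
training_args = SFTConfig(
    output_dir="./Llama_3_1_8B_Instruct_xLAM",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
    max_steps=1000,
    warmup_ratio=0.1,
    max_seq_length=2048,  # called max_length in recent TRL releases
    bf16=True,
)
```

Together with the LoRA configuration in the next subsection, a config like this would typically be passed to TRL's SFTTrainer.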
### LoRA Configuration
- LoRA Rank (r): 16
- LoRA Alpha: 32
- Target Modules: Query and Value projections
- LoRA Dropout: 0.1
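In PEFT terms, this corresponds approximately to the LoraConfig below. The module names q_proj and v_proj are the standard Llama names for the query and value projections and are assumed here.

```python
from peft import LoraConfig

# LoRA settings listed above; q_proj/v_proj are the usual Llama module
# names for the query and value projections (assumed, not confirmed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```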
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
"ermiaazarkhalili/Llama-3.1-8B-Instruct_Function_Calling_xLAM",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
"ermiaazarkhalili/Llama-3.1-8B-Instruct_Function_Calling_xLAM",
trust_remote_code=True
)
text= "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>"
# Tokenize and generate
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id
)
# Decode only the newly generated tokens (skip the prompt)
generated_text = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
print(generated_text)
```
## Dataset

The model was trained on the Salesforce/xlam-function-calling-60k dataset, a collection of 60,000 function-calling examples released by Salesforce.
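To inspect the training data yourself, the dataset can be loaded directly from the Hub. The field names shown below (query, tools, answers) follow the published dataset schema; double-check them against the dataset card, and note that access may require accepting the dataset's license.

```python
from datasets import load_dataset

# Load the xLAM function-calling dataset from the Hugging Face Hub.
dataset = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

example = dataset[0]
print(example["query"])    # natural-language user request
print(example["tools"])    # JSON string describing the available tools
print(example["answers"])  # JSON string with the expected function calls
```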
## Model Performance
This fine-tuned model demonstrates improved capabilities in:
- Function Detection: Identifying when to call functions
- Parameter Extraction: Extracting correct parameters from user queries
- Output Formatting: Generating properly structured function calls
- Tool Integration: Working with external APIs and tools
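Because the model is trained to emit structured function calls, its output can usually be parsed as JSON. The exact output format depends on the prompt template, so the helper below is only a hypothetical sketch based on the xLAM answer format.

```python
import json

def parse_function_calls(generated_text: str) -> list:
    """Hypothetical helper: interpret the model output as a JSON list of
    {"name": ..., "arguments": {...}} calls, as in the xLAM answer format."""
    try:
        calls = json.loads(generated_text)
    except json.JSONDecodeError:
        return []  # not valid JSON; handle the raw text some other way
    return calls if isinstance(calls, list) else [calls]

# Example with an xLAM-style answer string
print(parse_function_calls('[{"name": "is_power_of_two", "arguments": {"num": 8}}]'))
```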
## Credits
This model was developed by ermiaazarkhalili and leverages the capabilities of:
- Llama-3.1-8B-Instruct base model
- Hugging Face TRL for advanced fine-tuning techniques
- LoRA for parameter-efficient adaptation
## Contact

For questions or support, please reach out to the developer via their Hugging Face profile: ermiaazarkhalili.
## Acknowledgments
We would like to thank the creators of:
- Llama-3.1-8B-Instruct for the excellent base model
- Hugging Face for the TRL library and infrastructure
- xLAM dataset contributors
- LoRA researchers for parameter-efficient fine-tuning methods
## Citation
If you use this model, please cite:
```bibtex
@misc{ermiaazarkhalili_Llama-3.1-8B-Instruct_Function_Calling_xLAM,
  author       = {ermiaazarkhalili},
  title        = {Fine-tuning Llama-3.1-8B-Instruct on xLAM for Function Calling},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Llama-3.1-8B-Instruct_Function_Calling_xLAM}}
}
```