[FastLlama logo]

These are only the LoRA adapters of FastLlama-3.2-1B-Instruct. You must also load the base model in order to use them!

You can use either the ChatML or the Alpaca prompt format.
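For reference, a minimal sketch of the two prompt layouts (the example system and user messages are placeholders; only the surrounding markers matter):

ChatML:

<|im_start|>system
You are a friendly assistant named FastLlama.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant

Alpaca:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Who are you?

### Response: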

You can chat with the model via this Space.

Overview:

FastLlama is an optimized version of the Llama-3.2-1B-Instruct model. Designed to perform well in constrained environments, it combines speed, compactness, and high accuracy. This version has been fine-tuned on the MetaMathQA-50k subset of the HuggingFaceTB/smoltalk dataset to enhance its mathematical reasoning and problem-solving abilities.

Features:

- Lightweight and Fast: Optimized to deliver Llama-class capabilities with reduced computational overhead.
- Fine-Tuned for Math Reasoning: Trained on MetaMathQA-50k for better handling of complex mathematical problems and logical reasoning tasks.
- Instruction-Tuned: Built on an instruction-following base model, making it robust at understanding and executing detailed queries.
- Versatile Use Cases: Suitable for educational tools, tutoring systems, or any application requiring mathematical reasoning.

Performance Highlights:

- Smaller Footprint: Delivers results comparable to larger counterparts while operating efficiently on smaller hardware.
- Enhanced Accuracy: Demonstrates improved performance on mathematical QA benchmarks.
- Instruction Adherence: Retains high fidelity in understanding and following user instructions, even for complex queries.

Loading the Model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"  # Base model ID
adapter_id = "suayptalha/FastLlama-3.2-LoRA"  # Adapter ID

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the base model in bfloat16 and place it automatically across devices
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Attach the LoRA adapters to the base model
model = PeftModel.from_pretrained(base_model, adapter_id)

# Text-generation pipeline; the model is already distributed across
# devices via device_map="auto" above, so no device argument is needed
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

messages = [
    {"role": "system", "content": "You are a friendly assistant named FastLlama."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The pipeline echoes the full chat; the last entry is the assistant's reply
print(outputs[0]["generated_text"][-1])
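If you don't need to switch adapters at runtime, you can optionally merge the LoRA weights into the base model for simpler deployment. A minimal sketch using PEFT's merge_and_unload (the output directory name is a placeholder):

# Fold the LoRA weights into the base weights; the result is a plain
# Llama model with no PEFT wrapper or adapter overhead
merged_model = model.merge_and_unload()

merged_model.save_pretrained("FastLlama-3.2-1B-Instruct-merged")  # placeholder path
tokenizer.save_pretrained("FastLlama-3.2-1B-Instruct-merged")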

Dataset:

MetaMathQA-50k

The MetaMathQA-50k subset of HuggingFaceTB/smoltalk was selected for fine-tuning due to its focus on mathematical reasoning, multi-step problem-solving, and logical inference. The dataset includes:

- Algebraic problems
- Geometric reasoning tasks
- Statistical and probabilistic questions
- Logical deduction problems
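To inspect the data yourself, a minimal loading sketch (the config name "metamathqa-50k" is an assumption based on how smoltalk names its subsets):

from datasets import load_dataset

# Load the MetaMathQA-50k subset of smoltalk
dataset = load_dataset("HuggingFaceTB/smoltalk", "metamathqa-50k", split="train")
print(dataset[0]["messages"])  # chat-style list of role/content dicts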

Model Fine-Tuning:

Fine-tuning was conducted using the following configuration:

- Learning Rate: 2e-4
- Epochs: 1
- Optimizer: AdamW
- Framework: Unsloth
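For readers who want to reproduce a comparable run, here is a hedged sketch using Unsloth with TRL's SFTTrainer. Only the learning rate, epoch count, and optimizer family come from this card; the LoRA rank, alpha, target modules, sequence length, and batch size are assumptions:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model with Unsloth's optimized kernels
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,    # assumption
)

# Attach LoRA adapters; rank/alpha/target modules are assumptions
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("HuggingFaceTB/smoltalk", "metamathqa-50k", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        learning_rate=2e-4,             # from the card
        num_train_epochs=1,             # from the card
        optim="adamw_torch",            # AdamW, from the card
        per_device_train_batch_size=8,  # assumption
        output_dir="outputs",           # placeholder
    ),
)
trainer.train()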

License:

This model is licensed under the Apache 2.0 License. See the LICENSE file for details.
