Better SQL Agent - Llama 3.1 8B

Training Results

  • Training Samples: 19,480 (SQL analytics + technical conversations)
  • Hardware: 4x NVIDIA A10G GPUs (96 GB total VRAM)

Model Description

This model is a fine-tuned version of Meta-Llama-3.1-8B-Instruct, optimized for:

  • SQL query generation and optimization
  • Data analysis and insights
  • Technical assistance and debugging
  • Tool-based workflows

Training Configuration

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Training Method: LoRA (Low-Rank Adaptation)
    • Rank: 16, Alpha: 32, Dropout: 0.05
  • Quantization: 4-bit with BF16 training precision
  • Context Length: 128K tokens (the native Llama 3.1 context window)
  • Optimizer: AdamW with cosine scheduling
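
For reference, here is a minimal sketch of how the configuration above could be expressed with the peft and bitsandbytes libraries. The target modules and any hyperparameters not listed in this card are assumptions, not confirmed details of the actual training run.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base weights with BF16 compute, matching the card's description
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with the rank/alpha/dropout listed above;
# target_modules is an assumption, not stated in this card
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()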

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model (weights are stored in BF16)
model_name = "abhishekgahlot/better-sql-agent-llama"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a prompt in the Llama 3.1 chat format
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Create a SQL query to find the top 5 customers by total revenue in 2024.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

# The prompt already contains <|begin_of_text|>, so skip the automatic BOS
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
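
Alternatively, the tokenizer's built-in chat template builds the same Llama 3.1 prompt without hand-writing the special tokens:

messages = [
    {"role": "user",
     "content": "Create a SQL query to find the top 5 customers by total revenue in 2024."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256,
                         temperature=0.7, do_sample=True,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))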

Performance Metrics

Metric            Value
----------------  ---------
Starting Loss     1.53
Final Loss        0.0508
Loss Reduction    96.7%
Training Time     8.9 hours
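
Loss reduction is relative to the starting loss: (1.53 − 0.0508) / 1.53 ≈ 96.7%.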

Use Cases

  • SQL Generation: Create complex queries from natural language
  • Data Analysis: Generate insights and analytical queries
  • Code Assistance: Debug and optimize SQL code
  • Technical Support: Answer database and analytics questions
  • Learning Aid: Explain SQL concepts and best practices

Training Data

The model was trained on a curated dataset of 19,480 high-quality examples including:

  • SQL query generation tasks
  • Data analysis conversations
  • Technical problem-solving dialogues
  • Tool usage patterns and workflows
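
For illustration only, a hypothetical example of what one SQL-generation sample might look like in chat format; the actual dataset schema and contents are not published with this card.

# Purely illustrative sample; not taken from the real training set
sample = {
    "messages": [
        {"role": "user",
         "content": "Show monthly active users for 2024 by month."},
        {"role": "assistant",
         "content": (
             "SELECT DATE_TRUNC('month', login_at) AS month,\n"
             "       COUNT(DISTINCT user_id)       AS mau\n"
             "FROM logins\n"
             "WHERE login_at >= '2024-01-01' AND login_at < '2025-01-01'\n"
             "GROUP BY 1\n"
             "ORDER BY 1;"
         )},
    ]
}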

Optimization Features

  • 4-bit Quantization: Reduced memory footprint
  • Flash Attention: Optimized attention mechanism
  • Mixed Precision: BF16 training for efficiency
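
If the flash-attn package is installed and a supported GPU is available, the optimized attention path can also be requested at inference time via the standard transformers argument shown below; this is an inference-side illustration, not part of the training recipe.

import torch
from transformers import AutoModelForCausalLM

# Request the FlashAttention-2 kernel (requires the flash-attn package)
model = AutoModelForCausalLM.from_pretrained(
    "abhishekgahlot/better-sql-agent-llama",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)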

License

This model inherits the Llama 3.1 license from the base model. Please review the official license for usage terms.

Acknowledgments

  • Based on Meta's Llama 3.1 8B Instruct model

Model Card Contact

For questions about this model, please open an issue in the repository or contact the model author.

