---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - 4bit Safetensors

A fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit for efficient inference with a reduced memory footprint.

## Model Description

This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with safetensors format for fast loading and efficient inference.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.

## Model Specifications

| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes) |
| **Model Size** | ~7.2 GB |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |

## Quick Start

### Installation

```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

# Format with chat template
messages = [
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
    {"role": "user", "content": prompt}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
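
For interactive use, you can stream tokens to the console as they are generated instead of waiting for the full completion. A minimal sketch using the `TextStreamer` utility from Transformers, reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt hides the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True,
    streamer=streamer,
)
```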

## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~160,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens
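
For reference, here is a hedged sketch of how LoRA adapters like these are typically applied and merged with the `peft` library. This repository already ships the merged, re-quantized result, so this is only needed if you want to reproduce the merge yourself:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the published LoRA adapters on top.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-3-12b-it-qat-bnb-4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "aciklab/kubernetes-ai")

# merge_and_unload() folds the LoRA deltas into the base weights. Merging
# into 4bit-quantized layers involves dequantization under the hood, so
# treat this as an illustration rather than the exact pipeline used here.
merged = model.merge_and_unload()
```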

### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~160,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

This model uses 4bit quantization with BitsAndBytes for optimal memory efficiency:

- **Source:** Merged LoRA adapters with base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory
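
The uploaded checkpoint already embeds its quantization config, so no explicit configuration is needed at load time. For completeness, a sketch of a `BitsAndBytesConfig` matching the settings listed above, using the standard Transformers API:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4bit weights with FP16 compute, mirroring the settings above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aciklab/kubernetes-ai-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```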

### Advantages of 4bit Format

- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
- **Flexible Deployment:** CPU execution is possible, though much slower than GPU; the GGUF builds under Related Models are better suited to CPU-only machines
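
As an illustration of the framework support mentioned above, a hedged sketch of offline inference with vLLM. bitsandbytes support in vLLM is version-dependent, so the exact arguments may differ on your install; for chat-formatted prompts, apply the tokenizer's chat template first as in the Quick Start:

```python
from vllm import LLM, SamplingParams

# Assumes a vLLM build with bitsandbytes quantization support.
llm = LLM(
    model="aciklab/kubernetes-ai-4bit",
    quantization="bitsandbytes",
    dtype="float16",  # matches the FP16 compute precision used above
)

params = SamplingParams(temperature=1.0, top_p=0.95, top_k=64, max_tokens=512)
outputs = llm.generate(["Kubernetes'te bir pod'u nasıl yeniden başlatırım?"], params)
print(outputs[0].outputs[0].text)
```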

## Hardware Requirements

### Minimum (GPU)
- **GPU:** 8GB VRAM 
- **RAM:** 8GB system memory
- **Storage:** 10GB free space

### Recommended
- **GPU:** 12GB+ VRAM 
- **RAM:** 16GB system memory
- **Storage:** 15GB free space
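
A quick way to check your GPU against the minimum above before downloading ~7 GB of weights:

```python
import torch

# Compare available VRAM against the ~8 GB minimum listed above.
if torch.cuda.is_available():
    gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"{torch.cuda.get_device_name(0)}: {gb:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; consider the GGUF builds under Related Models.")
```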



## Limitations

- **Language:** Optimized primarily for Turkish and English.
- **Domain:** Specialized for Kubernetes; may not perform well on general topics.
- **Quantization:** 4bit quantization may occasionally reduce response quality on complex queries.

## License

This model is released under the **MIT License**. Free to use in commercial and open-source projects.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{kubernetes-ai-4bit,
  author = {HAVELSAN/Açıklab},
  title = {Kubernetes AI - 4bit Safetensors},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.

## Related Models

- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp

---

**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization required.