---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - 4bit Safetensors

A fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit for efficient inference with a reduced memory footprint.

## Model Description

This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with safetensors format for fast loading and efficient inference.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.

## Model Specifications

| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes) |
| **Model Size** | ~7.2 GB |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |

## Quick Start

### Installation

```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

# Format with chat template
messages = [
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
    {"role": "user", "content": prompt}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
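
For interactive use, you can stream tokens to the console as they are generated instead of waiting for the full completion. A minimal sketch using the `TextStreamer` utility from Transformers, reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt hides the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True,
    streamer=streamer,
)
```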

## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~160,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens
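
For reference, here is a hedged sketch of how LoRA adapters like these are typically applied and merged with the `peft` library. This repository already ships the merged, re-quantized result, so this is only needed if you want to reproduce the merge yourself:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the published LoRA adapters on top.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-3-12b-it-qat-bnb-4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "aciklab/kubernetes-ai")

# merge_and_unload() folds the LoRA deltas into the base weights. Merging
# into 4bit-quantized layers involves dequantization under the hood, so
# treat this as an illustration rather than the exact pipeline used here.
merged = model.merge_and_unload()
```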

### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~160,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

This model uses 4bit quantization with BitsAndBytes for optimal memory efficiency:

- **Source:** Merged LoRA adapters with base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory
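
The uploaded checkpoint already embeds its quantization config, so no explicit configuration is needed at load time. For completeness, a sketch of a `BitsAndBytesConfig` matching the settings listed above, using the standard Transformers API:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4bit weights with FP16 compute, mirroring the settings above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aciklab/kubernetes-ai-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```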

### Advantages of 4bit Format

- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
- **Flexible Deployment:** CPU execution is possible, though much slower than GPU; the GGUF builds under Related Models are better suited to CPU-only machines
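
As an illustration of the framework support mentioned above, a hedged sketch of offline inference with vLLM. bitsandbytes support in vLLM is version-dependent, so the exact arguments may differ on your install; for chat-formatted prompts, apply the tokenizer's chat template first as in the Quick Start:

```python
from vllm import LLM, SamplingParams

# Assumes a vLLM build with bitsandbytes quantization support.
llm = LLM(
    model="aciklab/kubernetes-ai-4bit",
    quantization="bitsandbytes",
    dtype="float16",  # matches the FP16 compute precision used above
)

params = SamplingParams(temperature=1.0, top_p=0.95, top_k=64, max_tokens=512)
outputs = llm.generate(["Kubernetes'te bir pod'u nasıl yeniden başlatırım?"], params)
print(outputs[0].outputs[0].text)
```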

## Hardware Requirements

### Minimum (GPU)
- **GPU:** 8GB VRAM 
- **RAM:** 8GB system memory
- **Storage:** 10GB free space

### Recommended
- **GPU:** 12GB+ VRAM 
- **RAM:** 16GB system memory
- **Storage:** 15GB free space
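
A quick way to check your GPU against the minimum above before downloading ~7 GB of weights:

```python
import torch

# Compare available VRAM against the ~8 GB minimum listed above.
if torch.cuda.is_available():
    gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"{torch.cuda.get_device_name(0)}: {gb:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; consider the GGUF builds under Related Models.")
```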



## Limitations

- **Language:** Optimized primarily for Turkish and English.
- **Domain:** Specialized for Kubernetes; may not perform well on general topics.
- **Quantization:** 4bit quantization may occasionally reduce response quality on complex queries.

## License

This model is released under the **MIT License**. Free to use in commercial and open-source projects.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{kubernetes-ai-4bit,
  author = {HAVELSAN/Açıklab},
  title = {Kubernetes AI - 4bit Safetensors},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.

## Related Models

- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp

---

**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization required.