---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---
# Kubernetes AI - 4bit Safetensors
Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit format for efficient inference with reduced memory footprint.
## Model Description
This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with safetensors format for fast loading and efficient inference.
**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.
## Model Specifications
| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes) |
| **Model Size** | ~7.2 GB |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |
## Quick Start
### Installation
```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
trust_remote_code=True
)
# Prepare input
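# The example prompt below asks, in Turkish: "How do I create a deployment with 3 replicas in Kubernetes?"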
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
# Format with chat template
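# The Turkish system message tells the model it is an AI assistant specialized in Kubernetes
# and that it answers Kubernetes-related questions in Turkish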
messages = [
{"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
{"role": "user", "content": prompt}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate response
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=1.0,
top_p=0.95,
top_k=64,
repetition_penalty=1.05,
do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
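For interactive use, the output can also be streamed token by token with the `TextStreamer` utility from Transformers. The sketch below reuses `model`, `tokenizer`, and `inputs` from the example above:
```python
from transformers import TextStreamer
# Print tokens to stdout as they are generated; skip_prompt hides the echoed input
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True,
    streamer=streamer
)
```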
## Training Details
This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters; the key training hyperparameters are listed below, followed by an illustrative configuration sketch:
- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens
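For reference, the LoRA setup above corresponds roughly to the following PEFT configuration. This is an illustrative sketch, not the exact training code; `lora_alpha` and `lora_dropout` are not documented in this card and use placeholder values:
```python
from peft import LoraConfig
# Illustrative LoRA configuration matching the documented rank and target modules
lora_config = LoraConfig(
    r=8,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,     # assumption: not stated in the training details
    lora_dropout=0.0,  # assumption: not stated in the training details
    bias="none",
    task_type="CAUSAL_LM",
)
```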
### Training Dataset Summary
| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |
## Quantization Details
This model uses 4bit quantization with BitsAndBytes for memory efficiency; the key details are listed below, with an equivalent configuration sketch after the list:
- **Source:** Merged LoRA adapters with base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory
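The settings above are already baked into the saved checkpoint, so no extra configuration is needed at load time. For illustration, they correspond roughly to this `BitsAndBytesConfig`:
```python
import torch
from transformers import BitsAndBytesConfig
# Equivalent quantization settings: NF4 4bit weights with FP16 compute
# (shown for reference only; the checkpoint already stores its quantization config)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
```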
### Advantages of 4bit Format
- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
- **Flexible Deployment:** Can run on CPU with acceptable speed
## Hardware Requirements
### Minimum (GPU)
- **GPU:** 8GB VRAM
- **RAM:** 8GB system memory
- **Storage:** 10GB free space
### Recommended
- **GPU:** 12GB+ VRAM
- **RAM:** 16GB system memory
- **Storage:** 15GB free space
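To quickly check whether a local GPU meets the minimum requirement before downloading the model, a small PyTorch snippet like the following can be used (a sketch; the 8 GB threshold is the minimum listed above):
```python
import torch
# Report the local GPU and whether it meets the ~8 GB VRAM minimum for this model
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)} ({total_gb:.1f} GB VRAM)")
    print("Meets the 8 GB minimum" if total_gb >= 8 else "Below the 8 GB minimum; expect offloading or OOM")
else:
    print("No CUDA GPU detected; CPU inference will be significantly slower")
```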
## Limitations
- **Language:** Optimized primarily for Turkish and English
- **Domain:** Specialized for Kubernetes; may not perform well on general topics
- **Quantization:** 4bit quantization may occasionally affect response quality on complex queries
## License
This model is released under the **MIT License**. Free to use in commercial and open-source projects.
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{kubernetes-ai-4bit,
author = {HAVELSAN/Açıklab},
title = {Kubernetes AI - 4bit Safetensors},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```
## Contact
**Produced by:** HAVELSAN/Açıklab
For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.
## Related Models
- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp
---
**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization required.