---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - 4bit Safetensors

Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit format for efficient inference with a reduced memory footprint.

## Model Description

This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with the safetensors format for fast loading and efficient inference.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.

## Model Specifications

| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes) |
| **Model Size** | ~7.2 GB |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |

## Quick Start

### Installation

```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input ("How do I create a deployment with 3 replicas in Kubernetes?")
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

# Format with chat template; the system prompt says (in Turkish):
# "You are an AI assistant specialized in Kubernetes. You answer Kubernetes-related questions in Turkish."
messages = [
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
    {"role": "user", "content": prompt}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
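The example above decodes the full sequence, so the prompt is echoed back together with the answer. If you only want the newly generated text, slice off the input tokens before decoding:

```python
# Decode only the tokens generated after the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```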
## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens

### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from the community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

This model uses 4bit quantization with BitsAndBytes for optimal memory efficiency:

- **Source:** LoRA adapters merged with the base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory
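As a reference point, the sketch below shows how a checkpoint with these settings (NF4 weights, FP16 compute, safetensors serialization) can be produced with Transformers and BitsAndBytes. The `kubernetes-ai-merged` path is a placeholder for the merged base + LoRA model; the exact pipeline used to build this repository may differ.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4bit weights with FP16 compute, matching the settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# "kubernetes-ai-merged" is a placeholder for the merged (base + LoRA) checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "kubernetes-ai-merged",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serialize the quantized weights in safetensors format
model.save_pretrained("kubernetes-ai-4bit", safe_serialization=True)
```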
### Advantages of 4bit Format

- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
- **Flexible Deployment:** Can run on CPU with acceptable speed

## Hardware Requirements

### Minimum (GPU)

- **GPU:** 8GB VRAM
- **RAM:** 8GB system memory
- **Storage:** 10GB free space

### Recommended

- **GPU:** 12GB+ VRAM
- **RAM:** 16GB system memory
- **Storage:** 15GB free space

## Limitations

- **Language:** Optimized primarily for Turkish and English
- **Domain:** Specialized for Kubernetes; may not perform well on general topics
- **Quantization:** 4bit quantization may occasionally affect response quality on complex queries

## License

This model is released under the **MIT License**. Free to use in commercial and open-source projects.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{kubernetes-ai-4bit,
  author = {HAVELSAN/Açıklab},
  title = {Kubernetes AI - 4bit Safetensors},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.

## Related Models

- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp

---

**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization is required.