Upload folder using huggingface_hub

Files changed (13)

.gitattributes +1 -0
README.md +324 -3
added_tokens.json +3 -0
chat_template.jinja +47 -0
config.json +132 -0
generation_config.json +13 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +0 -0
special_tokens_map.json +33 -0
tokenizer.json +3 -0
tokenizer.model +3 -0
tokenizer_config.json +0 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text
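
The added line routes `tokenizer.json` through Git LFS. As a rough illustration of how such patterns select files, the sketch below approximates gitattributes matching with Python's `fnmatch`. It uses only the four patterns visible in this hunk (the real file has more entries), and the helper name `is_lfs_tracked` is hypothetical. Real gitattributes matching has extra rules (for example `**` and patterns containing slashes) that this sketch ignores.

```python
from fnmatch import fnmatchcase

# Only the patterns visible in the hunk above; the full .gitattributes has more.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximation: a slash-free pattern is matched against the file's basename."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


for name in ["tokenizer.json", "config.json", "logs/events.out.tfevents.1"]:
    print(name, is_lfs_tracked(name))
```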

README.md CHANGED Viewed

@@ -1,3 +1,324 @@
----
-license: mit
----

+---
+license: mit
+language:
+- tr
+- en
+library_name: transformers
+tags:
+- kubernetes
+- devops
+- quantized
+- 4bit
+- gemma3
+- bitsandbytes
+base_model: aciklab/kubernetes-ai
+model_type: gemma3
+quantized_by: aciklab
+---
+# Kubernetes AI - 4bit Safetensors
+Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit format for efficient inference with reduced memory footprint.
+## Model Description
+This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with safetensors format for fast loading and efficient inference.
+**Primary Purpose:** Answer Kubernetes-related questions in Turkish language with minimal hardware requirements.
+## Model Specifications
+| Specification | Details |
+|---------------|---------|
+| **Format** | Safetensors (4bit quantized) |
+| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
+| **Quantization** | 4bit (BitsAndBytes) |
+| **Model Size** | ~7.8 GB on disk (about 7.3 GiB) |
+| **Memory Usage** | ~8-10 GB VRAM/RAM |
+| **Precision** | 4bit weights (NF4), BF16 compute |
+## Quick Start
+### Installation
+```bash
+# Install required packages
+pip install torch transformers accelerate bitsandbytes safetensors
+```
+### Basic Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load model and tokenizer
+model_name = "aciklab/kubernetes-ai-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True
+)
+# Prepare input
+prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
+# Format with chat template
+messages = [
+    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
+    {"role": "user", "content": prompt}
+]
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
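+# The chat template already prepends <bos>, so do not let the tokenizer add it again
+inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)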
+# Generate response
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    temperature=1.0,
+    top_p=0.95,
+    top_k=64,
+    repetition_penalty=1.05,
+    do_sample=True
+)
+# Decode only the newly generated tokens, not the echoed prompt
+response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+print(response)
+```
+### Advanced Usage with Pipeline
+```python
+from transformers import pipeline
+# Create text generation pipeline
+pipe = pipeline(
+    "text-generation",
+    model="aciklab/kubernetes-ai-4bit",
+    device_map="auto",
+    trust_remote_code=True
+)
+# Generate response
+messages = [
+    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın."},
+    {"role": "user", "content": "Pod ve Deployment arasındaki fark nedir?"}
+]
+response = pipe(
+    messages,
+    max_new_tokens=512,
+    temperature=1.0,
+    top_p=0.95,
+    do_sample=True
+)
+print(response[0]["generated_text"][-1]["content"])
+```
+### Streaming Responses
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
+from threading import Thread
+model_name = "aciklab/kubernetes-ai-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True
+)
+# Prepare input
+prompt = "Kubernetes Service türlerini açıkla"
+messages = [
+    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın."},
+    {"role": "user", "content": prompt}
+]
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
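+# The chat template already prepends <bos>, so do not let the tokenizer add it again
+inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)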
+# Setup streamer
+# skip_prompt=True keeps the prompt itself out of the streamed text
+streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+generation_kwargs = dict(
+    **inputs,
+    max_new_tokens=512,
+    temperature=1.0,
+    streamer=streamer
+)
+# Generate in separate thread
+thread = Thread(target=model.generate, kwargs=generation_kwargs)
+thread.start()
+# Stream output
+for text in streamer:
+    print(text, end="", flush=True)
+thread.join()
+```
+## Training Details
+This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:
+- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
+- **Training Method:** LoRA (Low-Rank Adaptation)
+- **LoRA Rank:** 8
+- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
+- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
+- **Max Sequence Length:** 1024 tokens
+### Training Dataset Summary
+| Dataset Category | Count | Description |
+|-----------------|-------|-------------|
+| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
+| **Stack Overflow** | 52,000 | Kubernetes Q&A from community |
+| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
+| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
+| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |
+## Quantization Details
+This model uses 4bit quantization with BitsAndBytes for optimal memory efficiency:
+- **Source:** Merged LoRA adapters with base model
+- **Quantization Method:** BitsAndBytes 4bit (NF4)
+- **Compute Precision:** BF16
+- **Format:** Safetensors (fast loading)
+- **Memory Footprint:** ~7.8 GB on disk (about 7.3 GiB), ~8-10 GB in memory
+### Advantages of 4bit Format
+- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
+- **Fast Loading:** Safetensors format loads quickly
+- **Good Quality:** Minimal accuracy loss compared to full precision
+- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
+- **Flexible Deployment:** Can run on CPU with acceptable speed
+## Hardware Requirements
+### Minimum (GPU)
+- **GPU:** 8GB VRAM (e.g., RTX 3060, RTX 4060)
+- **RAM:** 8GB system memory
+- **Storage:** 10GB free space
+- **Recommended:** CUDA-capable NVIDIA GPU
+### Minimum (CPU Only)
+- **CPU:** 8+ cores
+- **RAM:** 16GB system memory
+- **Storage:** 10GB free space
+- **Note:** CPU inference will be slower than GPU
+### Recommended
+- **GPU:** 12GB+ VRAM (e.g., RTX 3080, RTX 4070, RTX 5070)
+- **RAM:** 16GB system memory
+- **Storage:** 15GB free space
+- **CUDA:** 11.8 or higher
+## Performance Benchmarks
+| Hardware | Tokens/Second | Latency (512 tokens) |
+|----------|---------------|----------------------|
+| RTX 5070 12GB | ~45-55 | ~10-12 seconds |
+| RTX 4060 8GB | ~35-45 | ~12-15 seconds |
+| CPU (16 cores) | ~5-10 | ~60-100 seconds |
+*Benchmarks are approximate and may vary based on system configuration*
+## Inference Optimization Tips
+### For Maximum Speed
+```python
+# Use Flash Attention 2 (if available)
+from transformers import AutoModelForCausalLM
+model_name = "aciklab/kubernetes-ai-4bit"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True,
+    attn_implementation="flash_attention_2"  # Requires flash-attn package
+)
+```
+### For Lower Memory Usage
+```python
+# Explicit 4bit NF4 settings with double quantization (mirrors config.json)
+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+model_name = "aciklab/kubernetes-ai-4bit"
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4"
+)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    quantization_config=quantization_config,
+    device_map="auto"
+)
+```
+## Example Queries
+```python
+# Example 1: Creating a Deployment
+"Kubernetes'te 3 replikaya sahip bir nginx deployment nasıl oluştururum?"
+# Example 2: Service Explanation
+"ClusterIP, NodePort ve LoadBalancer service türleri arasındaki farklar nelerdir?"
+# Example 3: Troubleshooting
+"Pod'um CrashLoopBackOff durumunda, nasıl debug edebilirim?"
+# Example 4: Configuration
+"ConfigMap ve Secret arasındaki fark nedir ve ne zaman hangisini kullanmalıyım?"
+# Example 5: Best Practices
+"Production ortamında Kubernetes deployment için en iyi pratikler nelerdir?"
+```
+## Limitations
+- **Language:** Optimized primarily for Turkish; English queries may work but with reduced quality
+- **Context Window:** Fine-tuned with a 1024-token maximum sequence length; longer inputs may reduce response quality
+- **Domain:** Specialized for Kubernetes; may not perform well on general topics
+- **Quantization:** 4bit quantization may occasionally affect response quality on complex queries
+## License
+This model is released under the **MIT License**. Free to use in commercial and open-source projects.
+## Citation
+If you use this model in your research or applications, please cite:
+```bibtex
+@misc{kubernetes-ai-4bit,
+  author = {HAVELSAN/Açıklab},
+  title = {Kubernetes AI - 4bit Safetensors},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
+}
+```
+## Contact
+**Produced by:** HAVELSAN/Açıklab
+For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.
+## Related Models
+- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
+- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp
+---
+**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization required.
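
The README's benchmark table pairs a tokens-per-second range with a latency for 512 tokens. The sketch below shows the arithmetic that links the two columns. The helper name `latency_seconds` is hypothetical, and the calculation covers decoding only, ignoring prompt processing and model loading, so quoted latencies may reasonably run somewhat higher than these figures.

```python
def latency_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Decode-only latency: tokens to generate divided by throughput."""
    return num_tokens / tokens_per_second


# Throughput ranges copied from the README's benchmark table (tokens/second).
BENCHMARKS = {
    "RTX 5070 12GB": (45, 55),
    "RTX 4060 8GB": (35, 45),
    "CPU (16 cores)": (5, 10),
}

for hardware, (low_tps, high_tps) in BENCHMARKS.items():
    fastest = latency_seconds(512, high_tps)
    slowest = latency_seconds(512, low_tps)
    print(f"{hardware}: {fastest:.1f}-{slowest:.1f} s for 512 tokens")
```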

added_tokens.json ADDED Viewed

	@@ -0,0 +1,3 @@

+{
+  "<image_soft_token>": 262144
+}

chat_template.jinja ADDED Viewed

	@@ -0,0 +1,47 @@

+{{ bos_token }}
+{%- if messages[0]['role'] == 'system' -%}
+    {%- if messages[0]['content'] is string -%}
+        {%- set first_user_prefix = messages[0]['content'] + '
+' -%}
+    {%- else -%}
+        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
+' -%}
+    {%- endif -%}
+    {%- set loop_messages = messages[1:] -%}
+{%- else -%}
+    {%- set first_user_prefix = "" -%}
+    {%- set loop_messages = messages -%}
+{%- endif -%}
+{%- for message in loop_messages -%}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
+    {%- endif -%}
+    {%- if (message['role'] == 'assistant') -%}
+        {%- set role = "model" -%}
+    {%- else -%}
+        {%- set role = message['role'] -%}
+    {%- endif -%}
+    {{ '<start_of_turn>' + role + '
+' + (first_user_prefix if loop.first else "") }}
+    {%- if message['content'] is string -%}
+        {{ message['content'] | trim }}
+    {%- elif message['content'] is iterable -%}
+        {%- for item in message['content'] -%}
+            {%- if item['type'] == 'image' -%}
+                {{ '<start_of_image>' }}
+            {%- elif item['type'] == 'text' -%}
+                {{ item['text'] | trim }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- else -%}
+        {{ raise_exception("Invalid content type") }}
+    {%- endif -%}
+    {{ '<end_of_turn>
+' }}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{'<start_of_turn>model
+'}}
+{%- endif -%}
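
To make the template's behaviour easier to follow, here is a minimal pure-Python sketch of the same logic for text-only messages. It is an illustration, not the code Transformers runs: the function name `render_gemma_chat` is hypothetical, image content is not handled, and the separator after the system text is taken as a single newline, as the template appears above (it is exposed as a parameter in case the real file differs).

```python
def render_gemma_chat(messages, add_generation_prompt=True,
                      bos_token="<bos>", separator="\n"):
    """Render text-only chat messages the way the Jinja template above does."""
    # A leading system message is folded into the first user turn.
    if messages and messages[0]["role"] == "system":
        first_user_prefix = messages[0]["content"] + separator
        loop_messages = messages[1:]
    else:
        first_user_prefix = ""
        loop_messages = messages

    parts = [bos_token]
    for index, message in enumerate(loop_messages):
        # Turns must alternate user, assistant, user, ...
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        # The template renames the "assistant" role to "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        prefix = first_user_prefix if index == 0 else ""
        parts.append(
            f"<start_of_turn>{role}\n{prefix}{message['content'].strip()}<end_of_turn>\n"
        )

    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_chat([
    {"role": "system", "content": "You are a Kubernetes assistant."},
    {"role": "user", "content": "What is a Pod?"},
]))
```

Note that the rendered text already starts with `<bos>`, which is why tokenizing it again with the tokenizer's default special tokens would add a second one.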

config.json ADDED Viewed

	@@ -0,0 +1,132 @@

+{
+  "architectures": [
+    "Gemma3ForConditionalGeneration"
+  ],
+  "boi_token_index": 255999,
+  "bos_token_id": 2,
+  "eoi_token_index": 256000,
+  "eos_token_id": 106,
+  "image_token_index": 262144,
+  "initializer_range": 0.02,
+  "mm_tokens_per_image": 256,
+  "model_type": "gemma3",
+  "pad_token_id": 0,
+  "quantization_config": {
+    "_load_in_4bit": true,
+    "_load_in_8bit": false,
+    "bnb_4bit_compute_dtype": "bfloat16",
+    "bnb_4bit_quant_storage": "uint8",
+    "bnb_4bit_quant_type": "nf4",
+    "bnb_4bit_use_double_quant": true,
+    "llm_int8_enable_fp32_cpu_offload": false,
+    "llm_int8_has_fp16_weight": false,
+    "llm_int8_skip_modules": [
+      "lm_head",
+      "multi_modal_projector",
+      "merger",
+      "modality_projection"
+    ],
+    "llm_int8_threshold": 6.0,
+    "load_in_4bit": true,
+    "load_in_8bit": false,
+    "quant_method": "bitsandbytes"
+  },
+  "text_config": {
+    "_sliding_window_pattern": 6,
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "attn_logit_softcapping": null,
+    "cache_implementation": "hybrid",
+    "final_logit_softcapping": null,
+    "head_dim": 256,
+    "hidden_activation": "gelu_pytorch_tanh",
+    "hidden_size": 3840,
+    "initializer_range": 0.02,
+    "intermediate_size": 15360,
+    "layer_types": [
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention"
+    ],
+    "max_position_embeddings": 131072,
+    "model_type": "gemma3_text",
+    "num_attention_heads": 16,
+    "num_hidden_layers": 48,
+    "num_key_value_heads": 8,
+    "query_pre_attn_scalar": 256,
+    "rms_norm_eps": 1e-06,
+    "rope_local_base_freq": 10000,
+    "rope_scaling": {
+      "factor": 8.0,
+      "rope_type": "linear"
+    },
+    "rope_theta": 1000000,
+    "sliding_window": 1024,
+    "torch_dtype": "bfloat16",
+    "use_cache": true,
+    "vocab_size": 262208
+  },
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.55.4",
+  "unsloth_fixed": true,
+  "vision_config": {
+    "attention_dropout": 0.0,
+    "hidden_act": "gelu_pytorch_tanh",
+    "hidden_size": 1152,
+    "image_size": 896,
+    "intermediate_size": 4304,
+    "layer_norm_eps": 1e-06,
+    "model_type": "siglip_vision_model",
+    "num_attention_heads": 16,
+    "num_channels": 3,
+    "num_hidden_layers": 27,
+    "patch_size": 14,
+    "torch_dtype": "bfloat16",
+    "vision_use_head": false
+  }
+}
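
The long `layer_types` list in `text_config` follows a regular pattern: with `_sliding_window_pattern` set to 6, every sixth layer uses full attention and the rest use sliding-window attention. The sketch below rebuilds that list from the two numbers and derives a couple of attention figures from the same config. The variable names are illustrative, and the generation rule is inferred from the listing above rather than taken from the Transformers source.

```python
# Values copied from text_config above.
NUM_HIDDEN_LAYERS = 48
SLIDING_WINDOW_PATTERN = 6
NUM_ATTENTION_HEADS = 16
NUM_KEY_VALUE_HEADS = 8
HEAD_DIM = 256
HIDDEN_SIZE = 3840

# Every sixth layer (1-based) is full attention; the others are sliding-window.
layer_types = [
    "full_attention" if (i + 1) % SLIDING_WINDOW_PATTERN == 0 else "sliding_attention"
    for i in range(NUM_HIDDEN_LAYERS)
]

full_layers = layer_types.count("full_attention")
sliding_layers = layer_types.count("sliding_attention")

# Grouped-query attention: several query heads share one key/value head.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS

# The attention projection width is heads * head_dim, which here differs from hidden_size.
attention_width = NUM_ATTENTION_HEADS * HEAD_DIM

print(full_layers, sliding_layers, queries_per_kv_head, attention_width, HIDDEN_SIZE)
```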

generation_config.json ADDED Viewed

	@@ -0,0 +1,13 @@

+{
+  "bos_token_id": 2,
+  "cache_implementation": "hybrid",
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    106
+  ],
+  "pad_token_id": 0,
+  "top_k": 64,
+  "top_p": 0.95,
+  "transformers_version": "4.55.4"
+}
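
This file turns sampling on by default with `top_k` 64 and `top_p` 0.95. As a rough illustration of what those two settings do to a next-token distribution, here is a pure-Python sketch over a toy probability list. The function name `filter_top_k_top_p` and the example numbers are hypothetical, and this is a simplified stand-in for, not a copy of, the logits processing in Transformers.

```python
def filter_top_k_top_p(probs, top_k=64, top_p=0.95):
    """Keep the top_k most likely tokens, then the smallest prefix of them whose
    renormalised cumulative probability reaches top_p. Returns {index: probability}."""
    # Top-k: indices of the k largest probabilities, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in ranked)  # renormalise over the top-k survivors

    # Top-p (nucleus): accumulate until the cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they form a distribution again.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_probs = [0.5, 0.3, 0.15, 0.05]
print(filter_top_k_top_p(toy_probs, top_k=64, top_p=0.9))
```

Sampling then draws the next token from the returned distribution, so the low-probability tail that was filtered out can never be selected.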

model-00001-of-00002.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1c4dfdbd9ed238c8963e6c48673889cb9c5a65a044ed782229f7fb87ecb0657
+size 4992268790

model-00002-of-00002.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ff251b4e29bc079e6c802a3f5529dd0543b8c5f66469352f974fa36b2dc7e39
+size 2806556012
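
The two safetensors entries are Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses the two pointers shown above and adds up the shard sizes to get the total download size. The helper name `parse_lfs_pointer` is hypothetical, and the parser handles only the three fields that appear here.

```python
# Pointer contents copied from the two shard diffs above.
POINTERS = {
    "model-00001-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:e1c4dfdbd9ed238c8963e6c48673889cb9c5a65a044ed782229f7fb87ecb0657\n"
        "size 4992268790\n"
    ),
    "model-00002-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:1ff251b4e29bc079e6c802a3f5529dd0543b8c5f66469352f974fa36b2dc7e39\n"
        "size 2806556012\n"
    ),
}


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a small dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


total_bytes = sum(parse_lfs_pointer(text)["size"] for text in POINTERS.values())
print(f"{total_bytes} bytes = {total_bytes / 1e9:.2f} GB = {total_bytes / 2**30:.2f} GiB")
```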

model.safetensors.index.json ADDED Viewed

The diff for this file is too large to render. See raw diff

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<end_of_turn>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+size 33384568

tokenizer.model ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074

tokenizer_config.json ADDED Viewed

The diff for this file is too large to render. See raw diff