HackNetAyush committed
Commit 616e513 · verified · 1 parent: 522de31

Initial model upload

Files changed (6)
  1. .gitattributes +1 -0
  2. LICENSE +13 -0
  3. README.md +133 -3
  4. config.json +37 -0
  5. smollm2-135m-instruct-q8_0.gguf +3 -0
  6. tokenizer.json +0 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ smollm2-135m-instruct-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,13 @@
+ Copyright 2025 HackNetAyush
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md CHANGED
@@ -1,3 +1,133 @@
- ---
- license: apache-2.0
- ---
+ # SmolLM2 135M Instruct (Quantized Q8_0, GGUF)
+
+ [![Apache License 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
+ [![Model Parameters](https://img.shields.io/badge/Parameters-135M-green.svg)]()
+ [![Model Size](https://img.shields.io/badge/Size-138MB-green.svg)]()
+ [![Context Length](https://img.shields.io/badge/Context-8K%20tokens-orange.svg)]()
+
+ A tiny yet capable instruction-tuned language model optimized for CPU inference. With only 135 million parameters and a file size of about 138 MB, it delivers usable chat and instruction-following performance even on modest hardware.
+
+ ## 🌟 Key Features
+
+ - **Tiny Footprint**: Only 138 MB in size
+ - **CPU-Friendly**: Runs efficiently without a GPU
+ - **Low Resource Requirements**: Works on systems with just 1-2 GB RAM
+ - **Fast Inference**: Responsive even on older CPUs
+ - **Instruction-Tuned**: Optimized for chat and instruction-following tasks
+ - **Long Context**: Supports up to 8,192 tokens
+
+ ## 📦 Model Details
+
+ - **Architecture**: LLaMA-like transformer
+ - **Parameters**: 135M
+ - **Format**: GGUF (compatible with the llama.cpp ecosystem)
+ - **Quantization**: Q8_0 (8-bit block-wise quantization)
+ - **Type**: Instruction-tuned chat model
+
+ ## 🗂️ Repository Contents
+
+ - `smollm2-135m-instruct-q8_0.gguf` - Main model file (Q8_0 quantized)
+ - `tokenizer.json` - Model tokenizer file
+ - `config.json` - HuggingFace compatibility configuration
+ - `LICENSE` - Apache 2.0 license file
+ - `README.md` - This documentation
+
+ ## 🚀 Quick Start Guide
+
+ ### Prerequisites
+
+ ```bash
+ # Install llama-cpp-python
+ pip install llama-cpp-python
+ ```
+
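+ If the weights aren't downloaded yet, here is a minimal sketch using `huggingface_hub` (the repo id below is an assumption based on this repository's name; adjust it to the actual repo):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Downloads the GGUF file into the local HF cache and returns its path.
+ # repo_id is illustrative only; replace with the real repository id.
+ model_path = hf_hub_download(
+     repo_id="HackNetAyush/smollm2-135m-instruct-q8_0",
+     filename="smollm2-135m-instruct-q8_0.gguf",
+ )
+ print(model_path)
+ ```
+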
+ ### Using llama.cpp CLI
+
+ ```bash
+ # Basic usage (recent llama.cpp builds ship this binary as llama-cli instead of main)
+ ./main -m smollm2-135m-instruct-q8_0.gguf -p "Who are you?"
+
+ # With custom parameters: 2048-token context window, 4 CPU threads
+ ./main -m smollm2-135m-instruct-q8_0.gguf --ctx-size 2048 --threads 4 -p "Write a story."
+ ```
+
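+ Recent llama.cpp builds also offer an interactive conversation mode; a sketch (flag names taken from current llama.cpp, so check `--help` on your build):
+
+ ```bash
+ # -cnv starts conversation mode; -p is used as the system prompt there
+ ./llama-cli -m smollm2-135m-instruct-q8_0.gguf -cnv -p "You are a helpful AI assistant."
+ ```
+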
+ ### Using Python with llama-cpp-python
+
+ ```python
+ from llama_cpp import Llama
+
+ # Initialize the model
+ llm = Llama(
+     model_path="smollm2-135m-instruct-q8_0.gguf",
+     n_ctx=2048,    # Context window
+     n_threads=4,   # CPU threads to use
+     n_batch=512,   # Batch size for prompt processing
+ )
+
+ # Generate a completion and print just the generated text
+ output = llm(
+     "What is the capital of France?",
+     max_tokens=128,
+     temperature=0.7,
+     top_p=0.95,
+ )
+ print(output["choices"][0]["text"])
+ ```
+
+ ## 💬 Prompt Format
+
+ This is a chat-style instruction-tuned model. Use the following message format for best results:
+
+ ```json
+ [
+   {"role": "system", "content": "You are a helpful AI assistant."},
+   {"role": "user", "content": "Tell me a joke."}
+ ]
+ ```
+
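+ A minimal sketch of feeding these messages through llama-cpp-python's `create_chat_completion`, which applies the chat template embedded in the GGUF file when one is present:
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(model_path="smollm2-135m-instruct-q8_0.gguf", n_ctx=2048)
+
+ # The messages list uses the format shown above.
+ response = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a helpful AI assistant."},
+         {"role": "user", "content": "Tell me a joke."},
+     ],
+     max_tokens=128,
+     temperature=0.7,
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```
+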
+ ### Example Interaction
+
+ ```
+ User: What is your name?
+
+ Luna: My name is Luna, and I'm your tiny but capable AI assistant, ready to help with anything you need!
+ ```
+
+ ## 🔧 Compatible Software
+
+ - llama.cpp
+ - text-generation-webui
+ - LM Studio
+ - KoboldCpp
+ - llama-cpp-python
+
+ ## 💪 Why Choose This Model?
+
+ - ✨ **Runs Offline**: No internet connection needed
+ - 📱 **Tiny Footprint**: Just 138 MB on disk
+ - ⚡ **Fast Inference**: Optimized for CPU performance
+ - 🌐 **Open Source**: Apache 2.0 licensed
+ - 🛠️ **Versatile**: Perfect for edge devices, embedded systems, hobby projects, and learning
+
+ ## 🥲 Limitations
+
+ SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
+
+ ## 📄 License
+
+ [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## 🙏 Credits
+
+ - Quantized and packaged by Ayush Swami (HackNetAyush)
+ - Based on HuggingFaceTB's SmolLM2-135M-Instruct model
+
+ ## 💻 Hardware Requirements
+
+ - CPU: Any modern CPU
+ - RAM: 1-2 GB minimum (see the rough estimate below)
+ - GPU: Not required
+ - Disk Space: ~140 MB
+
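+ As a sanity check on the RAM figure, here is a back-of-the-envelope KV-cache estimate computed from the values in `config.json` (assuming an fp16 cache, llama.cpp's default):
+
+ ```python
+ # Values from config.json
+ n_layers, n_heads, n_kv_heads, hidden = 30, 9, 3, 576
+ head_dim = hidden // n_heads  # 64
+ bytes_fp16 = 2
+
+ # K and V caches: 2 tensors per layer, n_kv_heads * head_dim values per token
+ kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16  # 23,040 B
+
+ for n_ctx in (2048, 8192):
+     print(f"n_ctx={n_ctx}: ~{kv_bytes_per_token * n_ctx / 2**20:.0f} MiB KV cache")
+ # n_ctx=2048: ~45 MiB; n_ctx=8192: ~180 MiB -- comfortably within the
+ # 1-2 GB budget on top of the ~138 MiB of Q8_0 weights.
+ ```
+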
+ Feel free to Like ❤️ the repository if you find this model useful!
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 576,
+   "initializer_range": 0.041666666666666664,
+   "intermediate_size": 1536,
+   "is_llama_config": true,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 9,
+   "num_hidden_layers": 30,
+   "num_key_value_heads": 3,
+   "pad_token_id": 2,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_interleaved": false,
+   "rope_scaling": null,
+   "rope_theta": 100000,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.3",
+   "transformers.js_config": {
+     "kv_cache_dtype": {
+       "q4f16": "float16",
+       "fp16": "float16"
+     }
+   },
+   "use_cache": true,
+   "vocab_size": 49152
+ }
smollm2-135m-instruct-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4090835042df84521a580391915d5cfed516b911e203a1580f1d69aa7a4b9bf8
+ size 144811968
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff