---
license: apache-2.0
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
tags:
- gguf
- q8_0
- quantized
- llama
- llama.cpp
- smollm2
- embedded-ai
- lightweight
- fast-inference
- efficient
- tiny-llm
---

# SmolLM2 135M Instruct (Quantized Q8_0, GGUF)

[![Apache License 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)

[![Model Parameters](https://img.shields.io/badge/Parameters-135M-green.svg)]()

[![Model Size](https://img.shields.io/badge/Size-138MB-green.svg)]()

[![Context Length](https://img.shields.io/badge/Context-8K%20tokens-orange.svg)]()

A tiny yet powerful instruction-tuned language model optimized for CPU inference. With only 135 million parameters and a file size of 138 MB, this model delivers impressive performance even on modest hardware.

## 🌟 Key Features

- **Tiny Footprint**: Only 138 MB in size
- **CPU-Friendly**: Runs efficiently without a GPU
- **Low Resource Requirements**: Works on systems with just 1-2 GB RAM
- **Fast Inference**: Responsive even on older CPUs
- **Instruction-Tuned**: Optimized for chat and instruction-following tasks
- **Long Context**: Supports up to 8,192 tokens
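
The quick-start examples below use a 2,048-token window to keep memory use low; to work with the full 8,192-token context, raise `--ctx-size` (llama.cpp CLI) or `n_ctx` (llama-cpp-python) to 8192, which will use somewhat more RAM.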

## 📦 Model Details

- **Architecture**: LLaMA-like transformer
- **Parameters**: 135M
- **Format**: GGUF (compatible with llama.cpp ecosystem)
- **Quantization**: Q8_0 (8-bit linear quantization)
- **Type**: Instruction-tuned chat model
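
Q8_0 stores each weight in 8 bits plus a small per-block scale, so the file works out to roughly one byte per parameter: about 138 MB for 135M parameters, versus roughly 270 MB for the unquantized 16-bit weights.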

## 🗂️ Repository Contents

- `smollm2-135m-instruct-q8_0.gguf` - Main model file (Q8_0 quantized)
- `tokenizer.json` - Model tokenizer file
- `config.json` - HuggingFace compatibility configuration
- `LICENSE` - Apache 2.0 license file
- `README.md` - This documentation

## 🚀 Quick Start Guide

### Prerequisites

```bash
# Install llama-cpp-python
pip install llama-cpp-python
```
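
If you don't already have the GGUF file locally, one option is to fetch it with `huggingface_hub`. This is a minimal sketch; the `repo_id` below is a placeholder and should be replaced with this repository's actual id.

```python
# Minimal download sketch (requires: pip install huggingface_hub)
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="<user>/<this-repo>",                # placeholder, replace with the actual repo id
    filename="smollm2-135m-instruct-q8_0.gguf",  # GGUF file listed under "Repository Contents"
)
print(model_path)  # local path to the downloaded model file
```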

### Using llama.cpp CLI

```bash
# Basic usage
./main -m smollm2-135m-instruct-q8_0.gguf -p "Who are you?"

# With custom parameters
./main -m smollm2-135m-instruct-q8_0.gguf --ctx-size 2048 --threads 4 -p "Write a story."
```
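
Note that recent llama.cpp builds ship the example binary as `llama-cli` rather than `main`; if `./main` is not present in your build, the same arguments work with `./llama-cli`.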

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Initialize the model
llm = Llama(
    model_path="smollm2-135m-instruct-q8_0.gguf",
    n_ctx=2048,    # Context window
    n_threads=4,   # CPU threads to use
    n_batch=512    # Batch size for prompt processing
)

# Generate a response
output = llm("What is the capital of France?",
             max_tokens=128,
             temperature=0.7,
             top_p=0.95)
print(output)
```
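
The call returns an OpenAI-style completion dictionary; the generated text itself is available as `output["choices"][0]["text"]`.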
96
+
97
+ ## 💬 Prompt Format
98
+
99
+ This is a chat-style instruction-tuned model. Use the following message format for best results:
100
+
101
+ ```json
102
+ [
103
+ {"role": "system", "content": "You are a helpful AI assistant."},
104
+ {"role": "user", "content": "Tell me a joke."}
105
+ ]
106
+ ```
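
With llama-cpp-python, messages in this format can be passed to `create_chat_completion`, which applies the chat template embedded in the GGUF (falling back to a default template if none is found). A minimal sketch, assuming `llm` was loaded as in the quick-start example above:

```python
# Chat-style generation; the messages list mirrors the format shown above.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
    max_tokens=128,
    temperature=0.7
)
print(response["choices"][0]["message"]["content"])  # assistant reply text
```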

### Example Interaction

```
User: What is your name?

Luna: My name is Luna, and I'm your tiny but capable AI assistant, ready to help with anything you need!
```

## 🔧 Compatible Software

- llama.cpp
- text-generation-webui
- LM Studio
- KoboldCPP
- llama-cpp-python

## 💪 Why Choose This Model?

- ✨ **Runs Offline**: No internet connection needed
- 📱 **Tiny Footprint**: Just 138 MB on disk
- **Fast Inference**: Optimized for CPU performance
- 🌐 **Open Source**: Apache 2.0 licensed
- 🛠️ **Versatile**: Perfect for edge devices, embedded systems, hobby projects, and learning

## 🥲 Limitations

SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.

## 📄 License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## 🙏 Credits

- Quantized and packaged by Ayush Swami (HackNetAyush)
- Based on HuggingFaceTB's SmolLM2-135M-Instruct model

## 💻 Hardware Requirements

- CPU: Any modern CPU
- RAM: 1-2 GB minimum
- GPU: Not required
- Disk Space: ~140 MB

Feel free to Like ❤️ the repository if you find this model useful!