---
license: mit
library_name: mlx-lm
tags:
- mlx
- apple-silicon
- quantized
- moe
- text-generation
base_model: zai-org/GLM-4.5
model_type: glm
language:
- en
- zh
pipeline_tag: text-generation
---

# GLM-4.5 MLX 8-bit

## Model Description

This is an 8-bit quantized MLX version of [zai-org/GLM-4.5](https://huggingface.co/zai-org/GLM-4.5), optimized for Apple Silicon machines with large unified-memory configurations.

## Key Features

- **8-bit quantization** (8.502 bits per weight) for memory efficiency
- **MLX-optimized** for the Apple Silicon unified memory architecture
- **High-memory optimized**: designed for systems with 512GB+ unified memory
- **Long-context capable**: tested with 6,500+ word documents
- **Performance**: ~11.75 tokens/second on a Mac Studio with 512GB RAM

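At 8.502 effective bits per weight, the on-disk size follows directly from the parameter count. A quick sanity check — the ~355B total parameter count here is an assumption taken from the upstream GLM-4.5 release, not something stated in this card:

```python
# Rough model-size estimate from parameter count and quantized bit width.
# ASSUMPTION: ~355B total (not active) parameters, per the upstream GLM-4.5 release.
params = 355e9           # total MoE parameters
bits_per_weight = 8.502  # effective bits/weight after 8-bit quantization (group size 64)

size_gb = params * bits_per_weight / 8 / 1e9  # decimal GB
print(f"Estimated size: {size_gb:.0f} GB")
```

This lands near the ~375GB reported below, which is a useful cross-check that the quantization metadata is consistent.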
## Model Details

- **Base Model**: GLM-4.5 by ZhipuAI
- **Architecture**: Mixture of Experts (MoE)
- **Quantization**: 8-bit MLX with group size 64
- **MLX-LM Version**: 0.26.3
- **Model Size**: ~375GB
- **Context Length**: 131,072 tokens (tested stable up to 72K+ tokens)

## System Requirements

- **Hardware**: Mac Studio or Mac Pro with Apple Silicon (M1/M2/M3 series)
- **Memory**: 512GB+ unified memory strongly recommended
- **Storage**: ~400GB free space
- **Software**: macOS with the MLX framework

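Before downloading, it is worth confirming the ~400GB of free space is actually available. A minimal sketch — the path and threshold are illustrative, not part of any official tooling:

```python
import shutil

REQUIRED_GB = 400  # approximate space needed for the 8-bit weights

# Check free space on the volume the model will be downloaded to.
free_gb = shutil.disk_usage("/").free / 1e9  # decimal GB, matching the sizes above
print(f"Free disk space: {free_gb:.0f} GB")
if free_gb < REQUIRED_GB:
    print("Not enough free space to download the model.")
```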
## Performance Benchmarks

**Test Configuration**: Mac Studio with 512GB unified memory

### Context Length Performance

- **Short context (6.5K tokens)**: 11.75 tokens/second
- **Long context (72K tokens)**: 5.0 tokens/second, 86% memory usage
- **Extended context (121K tokens)**: 2.53 tokens/second, 92% memory usage
- **Beyond the stated limit (132K tokens)**: 5.74 tokens/second, 85% peak memory
- **Proven capability**: successfully exceeds the stated 131K context window (102.2% capacity)
- **Quality**: full comprehension and analysis of complex, sprawling content at maximum context

### Recommended Generation Settings

- **Temperature**: 0.8
- **Top K**: 100
- **Repeat Penalty**: 1.1
- **Min P**: default/unset
- **Top P**: default/unset

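With mlx-lm, these settings can be passed to `generate` through a sampler and a repetition-penalty logits processor. A sketch, assuming a recent mlx-lm where `make_sampler` and `make_logits_processors` are available — check your installed version's API before relying on it:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("mlx-community/GLM-4.5-MLX-8bit")

# The recommended settings above: temperature 0.8, top-k 100, repeat penalty 1.1.
sampler = make_sampler(temp=0.8, top_k=100)
logits_processors = make_logits_processors(repetition_penalty=1.1)

response = generate(
    model,
    tokenizer,
    prompt="Your prompt here",
    max_tokens=500,
    sampler=sampler,
    logits_processors=logits_processors,
)
```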
### Comparison with GGUF

- **MLX version**: system remains responsive during inference; stable performance
- **GGUF version**: system becomes unusable, with frequent crashes around 30-40K tokens

## Usage

### With MLX-LM

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/GLM-4.5-MLX-8bit")

# GLM-4.5 is a chat model, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
```
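For long generations at this model's scale, streaming the output is usually more practical than blocking on the full response. A sketch using `stream_generate`, assuming a recent mlx-lm where it yields response chunks with a `.text` field — the streaming API has shifted across versions:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/GLM-4.5-MLX-8bit")

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens as they are generated instead of waiting for completion.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```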

### With LM Studio

1. Download the model files
2. Load the model in LM Studio
3. Set the context length to fit your available memory
4. Apply the generation settings listed above (temperature 0.8, top-k 100, repeat penalty 1.1)

## Limitations

- Requires substantial unified memory (512GB+ recommended)
- Optimized specifically for Apple Silicon; may not perform well on other architectures
- Quantization may introduce minor quality differences compared to the full-precision model

## Training Data & Bias

Please refer to the original [GLM-4.5 model card](https://huggingface.co/zai-org/GLM-4.5) for information about training data, intended use, and potential biases.

## Citation

If you use this model, please cite the original GLM-4.5 work and acknowledge this MLX conversion:

```bibtex
@misc{glm45-mlx-8bit,
  title        = {GLM-4.5 MLX 8-bit},
  author       = {Onceler},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/mlx-community/GLM-4.5-MLX-8bit}},
}
```

## Acknowledgments

- Original model by ZhipuAI ([zai-org/GLM-4.5](https://huggingface.co/zai-org/GLM-4.5))
- MLX framework by Apple
- Conversion performed on a Mac Studio with 512GB unified memory

## License

This model inherits the license of the original GLM-4.5 model; see the original model repository for details.