Update README.md
README.md (changed)
```diff
@@ -26,7 +26,7 @@ This is an 8-bit quantized MLX version of [zai-org/GLM-4.5](https://huggingface.
 - **8-bit quantization** (8.502 bits per weight) for memory efficiency
 - **MLX optimized** for Apple Silicon unified memory architecture
 - **High-memory optimized**: Designed for systems with 512GB+ unified memory
-- **Long context capable**: Tested with 6,500+ word documents
+- **Long context capable**: Tested with multiple 6,500+ word documents, 30K token chunks
 - **Performance**: ~11.75 tokens/second on Mac Studio with 512GB RAM
 
 ## Model Details
@@ -36,24 +36,24 @@ This is an 8-bit quantized MLX version of [zai-org/GLM-4.5](https://huggingface.
 - **Quantization**: 8-bit MLX with group size 64
 - **MLX-LM Version**: 0.26.3
 - **Model Size**: ~375GB
-- **Context Length**: 131,072 tokens (tested stable up to
+- **Context Length**: 131,072 tokens (tested stable up to 132K+ tokens)
 
 ## System Requirements
 
-- **Hardware**: Mac Studio or Mac Pro with Apple Silicon (
+- **Hardware**: Mac Studio or Mac Pro with Apple Silicon (M3 Ultra)
 - **Memory**: 512GB+ unified memory strongly recommended
 - **Storage**: ~400GB free space
 - **Software**: macOS with MLX framework
 
 ## Performance Benchmarks
 
-**Test Configuration**: Mac Studio with 512GB unified memory
+**Test Configuration**: 2025 Mac Studio M3 Ultra with 512GB unified memory
 
 ### Context Length Performance
 - **Short Context (6.5K tokens)**: 11.75 tokens/second
 - **Long Context (72K tokens)**: 5.0 tokens/second, 86% memory usage
-- **Extended Context (121K tokens)**: 2.53 tokens/second, 92% memory usage
-- **Beyond Theoretical Limit (132K tokens)**: 5.74 tokens/second, 85% peak memory
+- **Extended Context (121K tokens)**: 30K token input prompt, 2.53 tokens/second, 92% memory usage
+- **Beyond Theoretical Limit (132K tokens)**: 11K token input prompt, 5.74 tokens/second, 85% peak memory
 - **Proven Capability**: Successfully exceeds stated 131K context window (102.2% capacity)
 - **Quality**: Full comprehension and analysis of complex, sprawling content at maximum context
 
@@ -66,7 +66,7 @@ This is an 8-bit quantized MLX version of [zai-org/GLM-4.5](https://huggingface.
 
 ### Comparison with GGUF
 - **MLX Version**: System remains responsive during inference, stable performance
-- **GGUF Version**: System becomes unusable, frequent crashes around 30-40K tokens
+- **GGUF Version**: System becomes unusable, frequent crashes around 30-40K tokens in context window
 
 ## Usage
```
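The "~375GB at 8.502 bits per weight" figures in the diff can be sanity-checked with a quick back-of-the-envelope calculation. This is only a sketch: the parameter count below is an assumption taken from the upstream GLM-4.5 model card (roughly 355B total parameters), not something stated in this README.

```python
# Sanity check: does ~355B parameters at 8.502 bits/weight land near ~375GB?
# ASSUMPTION: 355e9 total parameters (from the upstream GLM-4.5 model card,
# not from this README). 8.502 bpw is the effective rate quoted in the diff,
# which already folds in quantization scale/bias overhead.
PARAMS = 355e9
BITS_PER_WEIGHT = 8.502

size_bytes = PARAMS * BITS_PER_WEIGHT / 8
size_gb = size_bytes / 1e9    # decimal gigabytes, as the README appears to use
size_gib = size_bytes / 2**30  # binary gibibytes, as some tools report

print(f"{size_gb:.0f} GB ({size_gib:.0f} GiB)")  # → 377 GB (351 GiB)
```

That lands within a couple of gigabytes of the "~375GB" in the README, so the quoted size is consistent with whole-model 8-bit quantization rather than quantizing only a subset of the weights.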