EliasOenal committed on
Commit 8bb7311 · verified · Parent(s): f2e6aaf

Update README.md

Files changed (1): README.md (+44 −3)
README.md (updated):

---
license: apache-2.0
language:
- en
tags:
- mistral
- mistral-small
- w8a8
- vllm
base_model: mistralai/Mistral-Small-24B-Instruct-2501
library_name: transformers
datasets:
- neuralmagic/LLM_compression_calibration
---

# Mistral-Small-24B-Instruct-2501-W8A8-dynamic

## Model Overview
- **Model Architecture:** Mistral-Small-24B-Instruct-2501
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Weight quantization:** INT8
  - **Activation quantization:** INT8
- **Release Date:** 2/12/2025
- **Version:** 1.0
- **Model Developers:** Elias Oenal

Quantized version of [Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).

### Model Optimizations

This model was obtained by quantizing the weights and activations of the base model to the W8A8 data type, ready for inference with vLLM. This optimization reduces the number of bits per parameter from 16 to 8, cutting disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within the transformer blocks are quantized; activation quantization is dynamic, with scales computed per token at runtime (hence the "dynamic" suffix in the model name).
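
As a back-of-the-envelope check on the 50% figure (a sketch only; the ~24B parameter count is read off the model name, and only weight storage is counted):

```python
# Approximate weight storage for a ~24B-parameter model.
params = 24e9
print(f"16-bit weights: ~{params * 2 / 1e9:.0f} GB")  # ~48 GB
print(f"INT8 weights:   ~{params * 1 / 1e9:.0f} GB")  # ~24 GB
```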

## Deployment

### Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, for example as in the sketch below.

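A minimal offline-generation example (the repository id is assumed from this card's title; the prompt and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# Repository id assumed from the model card title.
model_id = "EliasOenal/Mistral-Small-24B-Instruct-2501-W8A8-dynamic"

# vLLM reads the quantization config from the checkpoint and selects
# its INT8 W8A8 kernels automatically; no extra flags are required.
llm = LLM(model=model_id)
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

messages = [{"role": "user", "content": "Briefly explain INT8 quantization."}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

The same checkpoint can also be served behind an OpenAI-compatible API with `vllm serve <model_id>`.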

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) and the [neuralmagic/LLM_compression_calibration](https://huggingface.co/datasets/neuralmagic/LLM_compression_calibration) dataset.
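
The exact recipe is not included in this card; the following one-shot flow is a plausible reconstruction. The SmoothQuant/GPTQ recipe, calibration sample count, sequence length, and dataset column name are assumptions, not the author's confirmed settings.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"
NUM_SAMPLES = 512  # assumed
MAX_LEN = 2048     # assumed

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration set; oneshot tokenizes the dataset's "text" column
# (the column name is an assumption about this dataset).
ds = load_dataset(
    "neuralmagic/LLM_compression_calibration", split=f"train[:{NUM_SAMPLES}]"
)

# W8A8 scheme: INT8 weights (static, per channel) plus INT8 activations
# (dynamic, per token) on Linear layers; the lm_head stays unquantized.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

SAVE_DIR = "Mistral-Small-24B-Instruct-2501-W8A8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```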