# Llama-3.3-70B-Instruct GGUF Quantized Models
Using llama.cpp release b4273 for quantization.
Original model: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
Run them in [LM Studio](https://lmstudio.ai/).
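They also run with any other llama.cpp-based runtime. As a minimal sketch, assuming you use the llama-cpp-python bindings (not covered by this card) and have already downloaded the Q4_K_M file listed below:

```python
# Minimal sketch: load a downloaded quant with llama-cpp-python
# (pip install llama-cpp-python); any llama.cpp-based runtime works similarly.
from llama_cpp import Llama

llm = Llama(
    model_path="./Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # path assumes the download step below
    n_ctx=8192,       # context window; lower it if you run out of memory
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what a GGUF quant is in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

`create_chat_completion` applies the model's built-in chat template, so in this case you do not need to assemble the prompt format below by hand.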
## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
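For runtimes that take a raw prompt string instead of chat messages, the template can be filled in directly. A minimal sketch (`build_prompt` is an illustrative helper, not part of any library):

```python
# Fill the Llama 3 prompt template shown above.
# build_prompt is a hypothetical helper, not part of any library.
def build_prompt(system_prompt: str, prompt: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "Cutting Knowledge Date: December 2023\n"
        "Today Date: 26 Jul 2024\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Hello!"))
```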
## Download a file

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Llama-3.3-70B-Instruct-Q5_K_M.gguf | Q5_K_M | 49.9 GB | false | High quality, recommended. |
| Llama-3.3-70B-Instruct-Q4_K_M.gguf | Q4_K_M | 42.5 GB | false | Good quality, default size for most use cases, recommended. |
| Llama-3.3-70B-Instruct-IQ3_XS.gguf | IQ3_XS | 29.3 GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download BabaK07/Llama-3.3-70B-Instruct-GGUF --include "Llama-3.3-70B-Instruct-Q4_K_M.gguf" --local-dir ./
```
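The same download can also be scripted with the `huggingface_hub` Python API (installed by the pip command above); a minimal sketch:

```python
# Download one quant file programmatically instead of via the CLI.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="BabaK07/Llama-3.3-70B-Instruct-GGUF",
    filename="Llama-3.3-70B-Instruct-Q4_K_M.gguf",
    local_dir="./",
)
print(path)  # local path of the downloaded .gguf file
```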
Base model: https://huggingface.co/meta-llama/Llama-3.1-70B