Llamacpp imatrix Quantizations of Llama-3.3-70B-Instruct

Using llama.cpp release b4273 for quantization.

Original model: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

Run them in LM Studio

Prompt format

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
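Most frontends apply this template automatically from the GGUF's chat-template metadata, but when calling llama.cpp directly you can assemble it by hand. A minimal Python sketch; the knowledge/date lines mirror the example above and need not match verbatim:

```python
def build_prompt(system_prompt: str, prompt: str) -> str:
    """Assemble a single-turn Llama 3.3 prompt string from the template above."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "Cutting Knowledge Date: December 2023\n"
        "Today Date: 26 Jul 2024\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        # The trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Why is the sky blue?"))
```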

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Llama-3.3-70B-Instruct-Q5_K_M.gguf | Q5_K_M | 49.9 GB | false | High quality, recommended. |
| Llama-3.3-70B-Instruct-Q4_K_M.gguf | Q4_K_M | 42.5 GB | false | Good quality, default size for most use cases, recommended. |
| Llama-3.3-70B-Instruct-IQ3_XS.gguf | IQ3_XS | 29.3 GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
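A quant's file size is roughly the parameter count times the format's average bits per weight (bpw). A rough sketch of that arithmetic; the bpw figures below are approximate community values, not exact llama.cpp numbers, so real files differ slightly:

```python
# Approximate average bits per weight for each quant type (assumed values).
APPROX_BPW = {"Q5_K_M": 5.67, "Q4_K_M": 4.85, "IQ3_XS": 3.30}
PARAMS = 70.6e9  # parameter count reported for this model

def est_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated file size in GB: params * bits-per-weight / 8 bits-per-byte."""
    return params * APPROX_BPW[quant] / 8 / 1e9

print(f"Q4_K_M ~ {est_size_gb('Q4_K_M'):.1f} GB")  # prints "Q4_K_M ~ 42.8 GB"
```

The estimates land close to the listed sizes (Q4_K_M ~42.8 GB vs. 42.5 GB listed); keep in mind you also need memory for the KV cache on top of the weights.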

Downloading using huggingface-cli


First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download BabaK07/Llama-3.3-70B-Instruct-GGUF --include "Llama-3.3-70B-Instruct-Q4_K_M.gguf" --local-dir ./
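If you script downloads of several quants, the same CLI call can be assembled programmatically. A small sketch that only builds and prints the command, since actually running it requires huggingface_hub installed and roughly 43 GB of disk:

```python
import shlex

def download_cmd(repo: str, filename: str, local_dir: str = "./") -> list[str]:
    """Build the huggingface-cli argument list for fetching one file."""
    return ["huggingface-cli", "download", repo,
            "--include", filename, "--local-dir", local_dir]

cmd = download_cmd("BabaK07/Llama-3.3-70B-Instruct-GGUF",
                   "Llama-3.3-70B-Instruct-Q4_K_M.gguf")
# Print the shell-quoted command instead of executing it.
print(shlex.join(cmd))
```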