Llama 3 70B Instruct quantized to the Q40 format supported by Distributed Llama.

License

Before downloading this repository, please accept the Llama 3 Community License.

How to run

  1. Clone this repository.
  2. Clone Distributed Llama:
git clone https://github.com/b4rtaz/distributed-llama.git
  3. Build Distributed Llama:
make dllama
  4. Run Distributed Llama:
sudo nice -n -20 ./dllama inference --model /path/to/dllama_model_llama3-70b-instruct_q40.m --tokenizer /path/to/dllama_tokenizer_llama3.t --weights-float-type q40 --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4
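The command above runs everything on a single node, but a 70B Q40 model needs a large amount of RAM, and the point of Distributed Llama is to split inference across machines. As a sketch (the exact `worker` subcommand, `--port`, and `--workers` flags are assumptions based on the Distributed Llama repository and may differ between versions; the IP addresses and port are placeholders):

```shell
# On each worker node (hypothetical flags; check the Distributed Llama docs):
sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4

# On the root node, list the workers so layers are distributed across nodes:
sudo nice -n -20 ./dllama inference \
  --model /path/to/dllama_model_llama3-70b-instruct_q40.m \
  --tokenizer /path/to/dllama_tokenizer_llama3.t \
  --weights-float-type q40 --buffer-float-type q80 \
  --prompt "Hello world" --steps 16 --nthreads 4 \
  --workers 10.0.0.1:9998 10.0.0.2:9998
```

`nice -n -20` raises the process priority (hence `sudo`); it is optional but reduces latency jitter during inference.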

Chat Template

Please keep in mind that this model expects prompts to follow the Llama 3 chat template.
