Llama-2-13b-chat-hf - bnb 4bit

Description

This model is a 4-bit quantized version of Llama-2-13b-chat-hf, produced with bitsandbytes. It is designed for fine-tuning. The PAD token is set to the UNK token.
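Because the weights are already stored in bitsandbytes 4-bit format, the checkpoint can be loaded directly with transformers. The snippet below is a minimal sketch, not an official usage example from the model authors; it assumes transformers, accelerate, and bitsandbytes are installed and uses the repository id itsanurag/Llama-2-13b-Chat-4BitQuantized from this card.

```python
# Minimal loading sketch (assumes transformers, accelerate, and bitsandbytes are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "itsanurag/Llama-2-13b-Chat-4BitQuantized"

# The checkpoint already carries its bitsandbytes 4-bit quantization config,
# so no extra BitsAndBytesConfig needs to be passed at load time.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The card states that PAD is set to UNK; mirror that on the tokenizer
# in case it is not already configured.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token

prompt = "Explain 4-bit quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the card describes the model as intended for fine-tuning, a typical route is parameter-efficient fine-tuning on top of the frozen 4-bit weights, e.g. with PEFT/LoRA. This is an assumed workflow, not one prescribed by the model authors, and the hyperparameters are illustrative only.

```python
# QLoRA-style fine-tuning sketch with PEFT (assumed workflow; values are illustrative).
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # prepare the 4-bit base model for training

lora_config = LoraConfig(
    r=16,                                 # LoRA rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama blocks
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the LoRA adapter weights are trainable
```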

Model files

Format: Safetensors
Model size: 6.87B params
Tensor types: F32, FP16, U8
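The parameter count and stored tensor types above can be checked against the safetensors metadata on the Hub without downloading the weights. A minimal sketch, assuming the huggingface_hub client is installed (the HfApi call is standard Hub tooling, not something shipped with this repository):

```python
# Sketch: read parameter counts and stored dtypes from the safetensors headers
# on the Hub without downloading the weight files (assumes huggingface_hub is installed).
from huggingface_hub import HfApi

meta = HfApi().get_safetensors_metadata("itsanurag/Llama-2-13b-Chat-4BitQuantized")

# parameter_count maps each stored dtype (e.g. U8, F16, F32) to its parameter count.
for dtype, count in meta.parameter_count.items():
    print(dtype, count)
```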

Model repository

itsanurag/Llama-2-13b-Chat-4BitQuantized, quantized from Llama-2-13b-chat-hf.