Llama.cpp imatrix quantizations of LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct

Using llama.cpp commit 5783575 for quantization.

All quants were made using the imatrix option and Bartowski's calibration file.


Perplexity table (the lower the better)

Quant     Size (MB)  PPL      Size (%)  Accuracy (%)  PPL error rate
IQ1_S     1820       26.3205  12.20     33.81         0.40
IQ1_M     1955       19.0360  13.10     46.75         0.28
IQ2_XXS   2182       13.3276  14.63     66.77         0.20
IQ2_XS    2379       11.7742  15.95     75.58         0.18
IQ2_S     2514       11.3084  16.85     78.69         0.17
IQ2_M     2695       10.3850  18.07     85.69         0.16
Q2_K_S    2730       11.2910  18.30     78.82         0.17
Q2_K      2912       11.1386  19.52     79.89         0.17
IQ3_XXS   3006       9.5453   20.15     93.23         0.14
IQ3_XS    3226       9.2103   21.63     96.62         0.14
Q3_K_S    3365       10.0571  22.56     88.49         0.16
IQ3_S     3382       9.2420   22.67     96.29         0.14
IQ3_M     3479       9.0709   23.32     98.11         0.13
Q3_K_M    3703       9.2078   24.82     96.65         0.14
Q3_K_L    3992       9.1908   26.76     96.83         0.14
IQ4_XS    4101       9.0166   27.49     98.70         0.14
Q4_0      4316       9.4186   28.93     94.49         0.14
IQ4_NL    4318       9.0297   28.95     98.55         0.14
Q4_K_S    4332       8.9634   29.04     99.28         0.13
Q4_K_M    4549       8.9107   30.50     99.87         0.13
Q4_1      4743       8.9614   31.80     99.31         0.13
Q5_K_S    5184       8.9042   34.75     99.94         0.13
Q5_0      5198       9.0533   34.85     98.30         0.14
Q5_K_M    5311       8.9100   35.60     99.88         0.13
Q5_1      5625       8.9230   37.71     99.73         0.13
Q6_K      6121       8.8800   41.03     100.22        0.13
Q8_0      7927       8.8534   53.14     100.52        0.13
F16       14917      8.8992   100       100           0.13
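
The two percentage columns look like they are derived from the F16 row: Size (%) is the file size relative to F16, and Accuracy (%) is the F16 perplexity divided by the quant's perplexity. This reading is inferred from the numbers in the table, not stated by the author. A minimal sketch that reproduces three rows under that assumption (the function names are illustrative):

```python
# Reference (F16) values copied from the table above.
F16_SIZE_MB = 14917
F16_PPL = 8.8992


def size_percent(size_mb: float) -> float:
    """File size as a percentage of the F16 file size."""
    return round(100 * size_mb / F16_SIZE_MB, 2)


def accuracy_percent(ppl: float) -> float:
    """F16 perplexity over quant perplexity, in percent (assumed definition)."""
    return round(100 * F16_PPL / ppl, 2)


# (size in MB, perplexity) for a few rows of the table.
rows = {
    "IQ1_S": (1820, 26.3205),
    "Q4_K_M": (4549, 8.9107),
    "Q8_0": (7927, 8.8534),
}

for name, (size_mb, ppl) in rows.items():
    print(name, size_percent(size_mb), accuracy_percent(ppl))
```

Under this definition, values above 100% (Q6_K, Q8_0) only mean that the measured perplexity came out slightly below the F16 measurement; the gap is smaller than the reported error of 0.13, so it should probably not be read as a real improvement.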


EXAONE-3.5-7.8B-Instruct

Introduction

We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. The EXAONE 3.5 language models include: 1) a 2.4B model optimized for deployment on small or resource-constrained devices, 2) a 7.8B model matching the size of its predecessor but offering improved performance, and 3) a 32B model delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.

For more details, please refer to our technical report, blog and GitHub.

This repository contains the instruction-tuned 7.8B language model with the following features:

  • Number of Parameters (without embeddings): 6.98B
  • Number of Layers: 32
  • Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads
  • Vocab Size: 102,400
  • Context Length: 32,768 tokens
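
To illustrate what the GQA configuration above means for memory, here is a rough estimate of the key/value cache at full context. The layer count, head counts and context length come from the list above; the head dimension (128) and the 16-bit cache are assumptions that are not stated in this card, so treat the result as an order-of-magnitude sketch.

```python
NUM_LAYERS = 32          # from the feature list
NUM_Q_HEADS = 32         # from the feature list
NUM_KV_HEADS = 8         # from the feature list
CONTEXT_LENGTH = 32_768  # from the feature list
HEAD_DIM = 128           # ASSUMPTION: not stated in this card
BYTES_PER_VALUE = 2      # ASSUMPTION: 16-bit (fp16/bf16) cache


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 covers keys plus values; one entry per layer, KV head and head dimension.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


gqa = kv_cache_bytes(CONTEXT_LENGTH)
# Hypothetical comparison: the same model with one KV head per query head.
mha = kv_cache_bytes(CONTEXT_LENGTH, num_kv_heads=NUM_Q_HEADS)

print(f"GQA cache: {gqa / 2**30:.1f} GiB, full multi-head cache: {mha / 2**30:.1f} GiB")
```

With 8 KV heads shared by 32 query heads, the cache is one quarter of what a full multi-head layout would need under the same assumptions.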

Quickstart

We recommend using transformers v4.43 or later.

Here is the code snippet to run conversational inference with the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (the later assignment wins; comment out the one you don't want)
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"       # Korean example

messages = [
    {"role": "system", 
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),  # follow the device chosen by device_map="auto"
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

Note

The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt, so we highly recommend using the system prompts provided in the code snippet above.

Evaluation

The following table shows the evaluation results of real-world use cases. The full evaluation results can be found in the technical report.

Models            MT-Bench  LiveBench  Arena-Hard  AlpacaEval  IFEval  KoMT-Bench[1]  LogicKor
EXAONE 3.5 7.8B   8.29      39.8       68.7        54.2        78.9    7.96           9.08
Qwen 2.5 7B       6.48      35.6       48.9        31.7        72.5    5.19           6.38
Llama 3.1 8B      7.59      28.3       27.7        25.7        74.5    4.85           5.99
Gemma 2 9B        7.64      32.1       43.6        47.3        54.7    7.10           8.05
Phi 3 small (7B)  7.63      27.9       26.8        29.2        59.5    3.22           3.99
  • [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details.

Deployment

EXAONE 3.5 models can be run for inference in various frameworks, such as:

  • TensorRT-LLM
  • vLLM
  • SGLang
  • llama.cpp
  • Ollama

Please refer to our EXAONE 3.5 GitHub for more details about the inference frameworks.
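
For the GGUF files in this repository, one possible route is the llama-cpp-python bindings for llama.cpp. The sketch below is an assumption-laden example, not an official recipe: it assumes `llama-cpp-python` is installed with a llama.cpp build that supports the exaone architecture, that the chat template is read from the GGUF metadata, and that the context size and GPU offload settings suit your hardware. The helper names are illustrative.

```python
SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."


def build_messages(prompt: str) -> list:
    """Chat messages using the system prompt recommended in the Quickstart."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]


def chat(model_path: str, prompt: str, max_tokens: int = 128) -> str:
    """Run one chat turn against a local GGUF file and return the reply text."""
    # Imported lazily so the helper above works without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,  # path to a downloaded .gguf file
        n_ctx=4096,             # ASSUMPTION: smaller than the 32K maximum to save memory
        n_gpu_layers=-1,        # offload all layers if a GPU build is available
    )
    result = llm.create_chat_completion(
        messages=build_messages(prompt),
        max_tokens=max_tokens,
        temperature=0.0,        # greedy-style decoding, similar to do_sample=False
    )
    return result["choices"][0]["message"]["content"]
```

Calling `chat("<path to a .gguf file from this repository>", "Explain how wonderful you are")` would then return the model's reply as a string.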

Quantization

We provide pre-quantized EXAONE 3.5 models with AWQ and several quantization types in GGUF format. Please refer to our EXAONE 3.5 collection to find the corresponding quantized models.

Limitation

The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probabilities of tokens, which are learned from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.

  • Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
  • Biased responses may be generated, which are associated with age, gender, race, and so on.
  • The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences.
  • Since the model does not reflect the latest information, the responses may be false or contradictory.

LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models.

License

The model is licensed under the EXAONE AI Model License Agreement 1.1 - NC.

Citation

@article{exaone-3.5,
  title={EXAONE 3.5: Series of Large Language Models for Real-world Use Cases},
  author={LG AI Research},
  journal={arXiv preprint arXiv:2412.04862},
  year={2024}
}

Contact

LG AI Research Technical Support: [email protected]
