kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN
model information
Llama-3.1-405B-Instruct quantized to 4 bits using AutoAWQ. The calibration data used during quantization contained both Japanese and English.
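For reference, the quantization step described above could be sketched with AutoAWQ roughly as follows. This is a hedged illustration, not the author's actual script: the paths, the `quant_config` values, and the `calib_data` argument are assumptions; only the 4-bit setting and the use of AutoAWQ come from the card.

```python
# Hypothetical quantization config; zero_point/q_group_size/version are
# common AutoAWQ defaults assumed here, only w_bit=4 is stated in the card.
QUANT_CONFIG = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

def quantize(model_path: str, quant_path: str, calib_data) -> None:
    """Sketch of an AutoAWQ quantization run (requires GPUs and the weights)."""
    # Imports kept local so the sketch can be read without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # calib_data would be the JP/EN calibration prompts described below.
    model.quantize(tokenizer, quant_config=QUANT_CONFIG, calib_data=calib_data)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```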
usage
vLLM
```python
from vllm import LLM, SamplingParams

# Load the quantized model across 4 GPUs with tensor parallelism.
llm = LLM(
    model="kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.97,
    quantization="awq"
)
tokenizer = llm.get_tokenizer()

# Example messages (the card's original messages are in Japanese):
messages = [
    {"role": "system", "content": "You are an AI chat assistant that responds in Japanese. Please support the user."},
    {"role": "user", "content": "Please write sample code that draws a scatter plot using plotly.graph_objects."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=1024
)
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
Please refer to this notebook for execution on an instance equipped with four H100 (94GB) GPUs.
calibration data
512 data points (prompts) were extracted from the following datasets. Each data point is limited to a maximum of 350 tokens.
- TFMC/imatrix-dataset-for-japanese-llm
- meta-math/MetaMathQA
- m-a-p/CodeFeedback-Filtered-Instruction
- kunishou/databricks-dolly-15k-ja
- Original data created from Japanese and English Wikipedia articles, as well as original data for avoiding harmful prompts.
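The sampling described above (512 prompts, at most 350 tokens each) could be sketched like this. The `toy_tokenize` function is a whitespace stand-in for the real Llama 3.1 tokenizer, used only to keep the example self-contained; the dataset mixing and exact selection procedure are assumptions.

```python
import random

MAX_TOKENS = 350   # per-prompt token cap, as stated in the card
NUM_SAMPLES = 512  # number of calibration prompts, as stated in the card

def toy_tokenize(text: str) -> list[str]:
    """Placeholder tokenizer; the real pipeline would use the model's tokenizer."""
    return text.split()

def build_calib_set(corpus: list[str], k: int = NUM_SAMPLES) -> list[str]:
    """Sample up to k prompts and truncate each to MAX_TOKENS tokens."""
    rng = random.Random(0)  # fixed seed for reproducibility
    picked = rng.sample(corpus, min(k, len(corpus)))
    return [" ".join(toy_tokenize(t)[:MAX_TOKENS]) for t in picked]

# Toy corpus standing in for the pooled JP/EN datasets listed above.
corpus = [f"example prompt {i} " + "word " * 400 for i in range(1000)]
calib = build_calib_set(corpus)
print(len(calib), max(len(toy_tokenize(t)) for t in calib))  # 512 350
```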
License
The MIT License applies. However, please also follow the Llama 3.1 Community License Agreement, which applies to the base model that was quantized.
Model tree for kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN
Base model
meta-llama/Llama-3.1-405B