
Llama-3.1-Sherkala-8B-Chat

Llama-3.1-Sherkala-8B-Chat (Sherkala for short) is a state-of-the-art 8 billion parameter instruction-tuned large language model (LLM) designed primarily for Kazakh while maintaining robust performance in English, Russian, and Turkish. Developed by Inception (a G42 company) and MBZUAI, in collaboration with Cerebras Systems, Sherkala leverages a balanced mixture of multilingual data and a custom tokenizer to overcome the challenges of data scarcity in Kazakh. This model has been optimized for downstream tasks, safe text generation, and cultural alignment.

Sherkala Details

  • Developed by: Inception (a G42 company), MBZUAI, Cerebras Systems.
  • Languages: Kazakh (primary), English, Russian, Turkish.
  • Input: Text.
  • Output: Generated text.
  • Model Size: 8B parameters.
  • Context Length: 8,192 tokens.
  • Technical Report: Sherkala Technical Report
  • License: cc-by-nc-sa-4.0

How to Get Started with the Model:

Below is sample code to use the model; it has been tested with transformers==4.46.2.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
model_path = "inceptionai/Llama-3.1-Sherkala-8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")
device = "cuda" if torch.cuda.is_available() else "cpu"

# Set the Llama-3.1-style chat template (header/eot special tokens) used by the model
tokenizer.chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role']+'<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %} {% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"


def get_response(text):
    conversation = [
        {"role": "user", "content": text}
    ]

    input_ids = tokenizer.apply_chat_template(
        conversation=conversation,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt").to(device)

    # Generate a response
    gen_tokens = model.generate(
        input_ids,
        max_new_tokens=500,
        stop_strings=["<|eot_id|>"],
        tokenizer=tokenizer
        )

    # Decode only the newly generated tokens, dropping the prompt and the trailing <|eot_id|>
    gen_text = tokenizer.decode(gen_tokens[0][len(input_ids[0]): -1])
    return gen_text

question = 'Қазақстанның жақсы тағамдарын ұсына аласыз ба?'
print(get_response(question))

Response:

  1. Бешбармақ: Бұл Қазақстанның ең танымал тағамы болуы мүмкін. Бұл ет (әдетте қой, сиыр немесе жылқы) сүйекке дейін пісірілетін және ұсақ сүйектері бар табаққа ұсынылатын дәстүрлі тағам. Етті қолмен жейді, көбінесе сорпамен бірге.
  2. Казыбек: Бұл ашытылған сүттен жасалған дәстүрлі қазақ ірімшігі. Ол жұмсақ және сәл қышқыл дәмімен танымал. Оны өздігінен жеуге немесе нанға немесе сорпаға жаюға болады.
  3. Бауырсақ: Бұл ет пен пиязбен толтырылған дәмді, дөңгелек нан. Бұл көптеген қазақ тағамдарының негізгі тағамы.
  4. Қуырылған тауық: Бұл қуырылған тауық еті, бірақ оны жасау тәсілі ерекше. Тауық етін дәмдеуіштер мен йогурт қоспасында маринадтайды, содан кейін қытырлақ болғанша қуырады.
  5. Шашлык: Бұл кәуапқа ұқсайды және бүкіл әлемде танымал. Шашлык әдетте тауық еті, қой еті немесе сиыр еті сияқты ет кесектерінен дайындалады және ашық отта грильде пісіріледі.
  6. Борщ: Бұл қырыққабат, сәбіз, картоп және ет қосылған қызылша сорпасы. Бұл суық айларда жиі ұсынылатын дәмді, жылытатын сорпа.
  7. Кәстрөл: Бұл ет (әдетте қой немесе сиыр еті), картоп, пияз және басқа да көкөністерден жасалған бұқтырылған тағам. Ол әдетте нанмен бірге беріледі.
  8. Жал-жая: Бұл қой етінен, картоптан, пияздан және дәмдеуіштерден жасалған бұқтырылған тағам. Ол әдетте буға пісірілген күрішпен бірге беріледі.
  9. Қуырылған кеспе: Бұл кеспе, ет және көкөністерден жасалған сорпа. Ол әдетте нанмен бірге беріледі.
  10. Бал шырыны: Бұл бал мен сүттен жасалған тәтті сусын. Бұл сергітетін және пайдалы

Model Architecture

Sherkala builds upon the Llama-3.1-8B architecture—a causal, decoder-only transformer model that employs RoPE positional encoding and grouped-query attention. To better capture the rich morphological features of Kazakh, we extend the base vocabulary by 25% with high-frequency Kazakh tokens. This expansion reduces tokenization fertility (i.e., the average number of subwords per word) and improves both training and inference efficiency.
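A quick way to see the effect of the extended vocabulary is to estimate fertility directly, dividing the number of subword tokens by the number of whitespace-separated words. The sketch below compares the Sherkala tokenizer with the base Llama-3.1 tokenizer; the sample sentence and whitespace splitting are simplifications for illustration, and the base checkpoint shown is a gated assumption on the reader's side.

# Minimal sketch: estimating tokenization fertility (subwords per whitespace word).
# The sample text and whitespace word splitting are simplifications; the base
# Llama-3.1 tokenizer is gated and requires accepting its license on the Hub.
from transformers import AutoTokenizer

def fertility(tokenizer, text: str) -> float:
    words = text.split()
    return len(tokenizer.encode(text, add_special_tokens=False)) / len(words)

kk_text = "Қазақстанның жақсы тағамдарын ұсына аласыз ба?"

sherkala_tok = AutoTokenizer.from_pretrained("inceptionai/Llama-3.1-Sherkala-8B-Chat")
base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

print("Sherkala fertility:", fertility(sherkala_tok, kk_text))
print("Base Llama fertility:", fertility(base_tok, kk_text))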

Training Data

Sherkala is continually pre-trained on 45.3 billion tokens from a diverse range of sources covering Kazakh and English with the addition of Russian and Turkish to enable better performance in Kazakh via cross-lingual transfer of capabilities. Pretraining data is preprocessed using standard techniques including language-specific standardization, filtering, cleaning, and deduplication using locality-sensitive hashing.
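The exact deduplication implementation is not specified here; the following is a minimal sketch of near-duplicate filtering with MinHash locality-sensitive hashing using the datasketch library, with illustrative shingle size, permutation count, and threshold rather than the values used for Sherkala.

# Minimal sketch of near-duplicate filtering with MinHash LSH (pip install datasketch).
# Parameters (shingle size, num_perm, threshold) and the toy corpus are illustrative only.
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128, shingle: int = 5) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for i in range(max(len(text) - shingle + 1, 1)):
        m.update(text[i:i + shingle].encode("utf-8"))
    return m

corpus = [
    "Бешбармақ дәстүрлі қазақ тағамы.",
    "Бешбармақ дәстүрлі қазақ тағамы!",   # near-duplicate of the first document
    "Алматы Қазақстанның ең ірі қаласы.",
]

lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for doc_id, doc in enumerate(corpus):
    m = minhash(doc)
    if not lsh.query(m):           # keep the document only if no near-duplicate is already indexed
        lsh.insert(str(doc_id), m)
        kept.append(doc)

print(kept)  # the second, near-duplicate document is dropped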

To enable robust instruction following and safe dialog generation, Sherkala is fine-tuned on a large-scale multilingual instruction dataset comprising Kazakh, English, and Russian prompt-response examples. The instruction dataset covers a wide range of general tasks and capabilities in all three languages. A dedicated safety dataset, created using a mix of direct and adversarial prompts, is incorporated to mitigate harmful or biased outputs and to ensure cultural alignment. More information can be found in the Sherkala Technical Report.
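The published materials do not fix a schema for the instruction data; a typical prompt-response record in the chat-message format consumed by the template shown earlier might look like the following, with field names and content that are purely illustrative.

# Illustrative instruction-tuning record in the chat-message format used by the
# template above; field names and content are examples, not actual dataset rows.
example_record = {
    "messages": [
        {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
        {"role": "assistant", "content": "Қазақстанның астанасы - Астана қаласы."},
    ],
    "language": "kk",      # kk / en / ru
    "subset": "general",   # e.g. general instruction vs. safety examples
}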

Training Details

Training Hyperparameters

  • Learning rate: 1.5e-4
  • Batch size: 4 million tokens
  • Optimizer: AdamW (β1 = 0.9, β2 = 0.95, ε = 1e-5)
  • Weight decay: 0.1
  • Gradient norm clipping: 1.0
  • Learning rate schedule (see the sketch after this list):
    • Linear warm-up for 110 steps
    • Cosine decay to one-tenth of the peak learning rate by step 11,433
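Assuming "10× cosine decay" means decaying the learning rate to one-tenth of its peak value, the schedule can be sketched as follows.

# Sketch of the schedule above: linear warm-up for 110 steps, then cosine decay
# from the peak LR (1.5e-4) to one-tenth of it by step 11,433. The reading of
# "10x cosine decay" as decay-to-peak/10 is an assumption.
import math

PEAK_LR = 1.5e-4
MIN_LR = PEAK_LR / 10
WARMUP_STEPS = 110
TOTAL_STEPS = 11_433

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(110), lr_at(11_433))  # 0.0, 1.5e-4, 1.5e-5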

Training Infrastructure

  • Training was performed on the Cerebras Condor Galaxy 2 (CG-2) AI supercomputer
  • Training ran on 16 Cerebras CS-2 systems
  • Parallelism: pure data parallelism across the CS-2 systems

Evaluation

Sherkala has been extensively evaluated across downstream tasks, open-ended generation, and safety metrics. The following sections detail the evaluation results.

Downstream Evaluation

Sherkala is benchmarked on multiple tasks in Kazakh, Russian, and English using lm-evaluation-harness in a zero-shot setting; an illustrative harness invocation follows the list below. The evaluation criteria spanned various dimensions, including:

  • Knowledge: How well the model answers factual questions.
  • Reasoning: The model's ability to answer questions requiring reasoning.
  • Misinformation/Bias: Assessment of the model's susceptibility to generating false or misleading information, and its neutrality.
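The exact harness configuration is not given here; a minimal zero-shot run via the lm-evaluation-harness Python API might look like the sketch below. The tasks listed are standard English ones for illustration; the Kazakh-specific benchmarks reported below may require dedicated task configurations.

# Minimal sketch of a zero-shot evaluation with lm-evaluation-harness (pip install lm-eval).
# Task names shown are illustrative English tasks, not the full benchmark suite used here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=inceptionai/Llama-3.1-Sherkala-8B-Chat,dtype=bfloat16",
    tasks=["hellaswag", "piqa", "arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])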

Kazakh Benchmark Results

| Model | AVG | KazMMLU | MMLU | Belebele | HS | PIQA | BoolQA | SIQA | ARC | OBQA | NIS | COPA | T-QA | CS-Pairs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLOOM (7.1B) | 37.6 | 29.3 | 27.9 | 26.4 | 29.9 | 52.0 | 62.1 | 36.7 | 23.6 | 33.6 | 22.0 | 47.2 | 49.2 | 49.1 |
| BLOOMZ (7.1B) | 36.9 | 29.2 | 27.8 | 22.1 | 30.4 | 50.8 | 54.4 | 36.8 | 24.4 | 31.0 | 23.0 | 51.8 | 48.1 | 50.1 |
| Gemma-2 (9B) | 35.7 | 26.1 | 27.5 | 26.0 | 28.3 | 51.9 | 62.0 | 33.5 | 23.6 | 28.4 | 17.0 | 45.2 | 47.1 | 47.5 |
| Gemma-2-it (9B) | 36.9 | 31.4 | 28.4 | 23.8 | 27.9 | 51.0 | 63.5 | 36.0 | 24.0 | 30.6 | 22.0 | 48.8 | 49.3 | 42.6 |
| Qwen-2.5 (7B) | 38.5 | 35.1 | 31.3 | 26.3 | 31.2 | 53.4 | 54.8 | 38.0 | 27.1 | 30.2 | 36.0 | 46.0 | 48.0 | 42.6 |
| Qwen-2.5-Instruct (7B) | 40.8 | 37.8 | 33.2 | 31.1 | 31.5 | 52.3 | 60.9 | 38.1 | 27.8 | 31.6 | 38.0 | 47.2 | 51.0 | 49.3 |
| LLama3.1 (8B) | 39.8 | 38.3 | 31.3 | 25.9 | 37.8 | 57.2 | 63.7 | 38.1 | 29.6 | 32.8 | 20.0 | 47.8 | 51.3 | 43.9 |
| LLama3.1-Instruct (8B) | 40.4 | 38.9 | 32.4 | 27.0 | 37.5 | 57.5 | 67.5 | 37.9 | 30.3 | 32.6 | 22.0 | 48.2 | 49.7 | 43.2 |
| LLama3.1-KazLLM-1.0 (8B) | 43.7 | 37.0 | 31.5 | 27.8 | 46.0 | 62.8 | 69.8 | 44.7 | 35.5 | 34.2 | 32.0 | 50.4 | 50.9 | 45.0 |
| Irbis-7b-v0.1 (7B) | 37.7 | 29.5 | 27.8 | 26.1 | 31.3 | 53.9 | 52.4 | 37.8 | 24.8 | 30.0 | 25.0 | 54.4 | 46.6 | 50.9 |
| mGPT-13B (13B) | 37.7 | 28.5 | 26.7 | 27.9 | 31.4 | 54.6 | 56.4 | 38.5 | 24.0 | 32.0 | 23.0 | 49.4 | 47.9 | 49.8 |
| Sherkala (Ours) | 45.7 | 51.6 | 37.7 | 25.9 | 53.1 | 68.1 | 66.9 | 42.2 | 38.1 | 37.0 | 18.0 | 51.0 | 50.3 | 54.3 |
| Sherkala-chat (Ours-chat) | 47.6 | 41.4 | 34.6 | 30.6 | 55.2 | 65.9 | 75.8 | 48.1 | 42.9 | 37.4 | 28.0 | 53.2 | 52.5 | 53.3 |

The average score (AVG) represents the mean performance across all tasks, with higher values indicating better results across all metrics. The abbreviations "HS," "ARC," "OBQA," "NIS," "T-QA," and "CS-Pairs" correspond to HellaSwag, ARC-Challenge (Easy), OpenBookQA, NIS-Math, TruthfulQA, and CrowS-Pairs, respectively.

English Benchmark Results

| Model | AVG | MMLU | RACE | HS | PIQA | BoolQA | SIQA | ARC | OBQA | T-QA | CrowS-Pairs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BLOOM (7.1B) | 48.5 | 29.1 | 36.5 | 59.6 | 73.6 | 62.2 | 46.5 | 33.4 | 35.8 | 38.9 | 68.9 |
| BLOOMZ (7.1B) | 57.0 | 36.7 | 45.6 | 63.1 | 77.4 | 90.7 | 59.7 | 43.6 | 42.0 | 45.2 | 65.6 |
| Gemma-2 (9B) | 39.4 | 27.4 | 27.8 | 33.2 | 59.1 | 62.2 | 37.6 | 24.2 | 26.4 | 46.4 | 49.3 |
| Gemma-2-it (9B) | 53.2 | 37.7 | 46.7 | 65.4 | 69.5 | 80.1 | 44.1 | 40.7 | 29.6 | 62.1 | 56.5 |
| Qwen-2.5 (7B) | 60.8 | 44.0 | 41.4 | 78.9 | 79.9 | 84.5 | 51.9 | 51.4 | 47.2 | 56.4 | 71.9 |
| Qwen-2.5-Instruct (7B) | 62.1 | 46.7 | 46.3 | 80.5 | 80.3 | 86.4 | 48.7 | 54.9 | 48.8 | 64.8 | 63.2 |
| LLama3.1 (8B) | 56.6 | 39.6 | 38.9 | 79.0 | 81.3 | 65.3 | 52.6 | 53.5 | 45.0 | 45.2 | 65.5 |
| LLama3.1-Instruct (8B) | 60.1 | 41.7 | 44.9 | 79.2 | 81.0 | 79.4 | 52.7 | 55.0 | 43.6 | 54.0 | 69.0 |
| LLama3.1-KazLLM-1.0 (8B) | 58.6 | 39.7 | 44.3 | 77.9 | 80.8 | 72.8 | 51.5 | 54.6 | 43.0 | 51.0 | 70.0 |
| Sherkala (Ours) | 58.7 | 46.8 | 39.2 | 78.3 | 80.5 | 77.2 | 51.3 | 52.1 | 46.0 | 49.6 | 65.9 |
| Sherkala-chat (Ours-chat) | 59.1 | 40.5 | 41.6 | 78.1 | 79.1 | 84.8 | 58.0 | 52.6 | 42.6 | 51.3 | 62.2 |

Average here represents the mean score across tasks. Higher scores are better across all metrics. “HS”, “ARC”, “OBQA”, “T-QA” and "CS-Pairs" denote HellaSwag, ARC-Challenge (Easy), OpenBookQA, TruthfulQA, and CrowS-Pairs respectively. Further details on the evaluation, including additional results in Russian, can be found in the Sherkala Technical Report.

Generation Evaluation

We further evaluated open-ended text generation using GPT-4 as a judge. The following table shows average generation scores (with standard deviations) for models on the MT and Vicuna benchmarks across Kazakh, Russian, and English; a minimal judging sketch follows the table:

| Model | Kazakh MT (avg ± sd) | Kazakh Vicuna (avg ± sd) | Russian MT (avg ± sd) | Russian Vicuna (avg ± sd) | English MT (avg ± sd) | English Vicuna (avg ± sd) |
|---|---|---|---|---|---|---|
| GPT-4o | 8.81 ± 1.51 | 9.32 ± 0.61 | 8.89 ± 1.59 | 9.79 ± 0.41 | 8.36 ± 1.35 | 9.03 ± 0.59 |
| Qwen-2.5-7B-Instruct | 3.52 ± 3.52 | 3.23 ± 1.73 | 5.81 ± 2.36 | 6.05 ± 3.07 | 7.40 ± 1.85 | 8.06 ± 1.22 |
| Llama-3.1-8B-Instruct | 3.76 ± 2.11 | 3.75 ± 1.91 | 0.85 ± 1.20 | 0.82 ± 1.55 | 6.55 ± 2.03 | 7.41 ± 1.28 |
| KazLLM-1.0-8B | 3.98 ± 2.15 | 4.88 ± 2.01 | 0.72 ± 1.06 | 0.28 ± 0.71 | 6.00 ± 2.15 | 6.66 ± 1.24 |
| Sherkala-chat | 5.99 ± 2.73 | 7.39 ± 1.89 | 1.02 ± 1.41 | 0.97 ± 1.70 | 5.78 ± 2.43 | 6.55 ± 1.59 |
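The judging prompts used in the evaluation follow the MT/Vicuna protocols described in the technical report; the sketch below shows a generic single-answer grading call with the OpenAI Python client. The rubric wording and the judge model name are illustrative, not the ones used for the reported scores.

# Minimal sketch of single-answer grading with an LLM judge (pip install openai).
# The rubric prompt and judge model name are illustrative, not the evaluation's exact setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def judge(question: str, answer: str) -> str:
    prompt = (
        "Rate the following answer to the question on a scale of 1-10 for helpfulness, "
        "relevance, accuracy, and fluency. Reply with the rating only.\n\n"
        f"Question: {question}\n\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(judge("Қазақстанның астанасы қай қала?", "Қазақстанның астанасы - Астана."))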

Intended Use

We release Sherkala under:

  • Meta’s Llama 3.1 Community License: users must adhere to the terms and conditions of the license, Meta’s acceptable use policy, Meta’s privacy policy, and the applicable policies, laws, and regulations governing the specific use case and region; and
  • The CC BY-NC-SA 4.0 license: users must adhere to its terms and conditions.

Sherkala is intended for research in Kazakh NLP, including:

  • Chat Assistants: Conversational agents tailored for Kazakh speakers.
  • Question Answering & Content Generation: Systems that deliver culturally aligned, factual, and contextually rich responses.
  • Multilingual NLP: Applications that support English, Russian, and Turkish alongside Kazakh.

We believe that a number of audiences will benefit from our model:

  • Academics: Those researching Kazakh natural language processing.
  • Businesses: Companies targeting Kazakh-speaking audiences.
  • Developers: Those integrating Kazakh language capabilities in apps.

Out-of-Scope Use

While Sherkala is a powerful language model catering to Kazakh and English, it is essential to understand its limitations and the potential for misuse.

Sherkala is not recommended for:

  • Commercial use: Sherkala shall not be used for any commercial purposes, including anything primarily intended to derive commercial advantage or monetary compensation.
  • Malicious Use: The model should not be used for generating harmful, misleading, or inappropriate content. This includes, but is not limited to:
    • Generating or promoting hate speech, violence, or discrimination,
    • Spreading misinformation or fake news,
    • Engaging in illegal activities or promoting them,
    • Handling sensitive information: the model should not be used to handle or to generate personal, confidential, or sensitive information.
  • Generalization Across All Languages: Sherkala is optimized only for Kazakh and English. It should not be assumed to have equal proficiency in other languages or dialects.
  • High-Stakes Decisions: The model should not be used for making high-stakes decisions without human oversight. This includes medical, legal, financial, or safety-critical decisions, among others.

Bias, Risks, and Limitations

Although extensive measures have been taken to mitigate biases and ensure safe outputs, Sherkala—like all large language models—may still produce inaccurate, misleading, or biased content. Users should apply additional safety measures and conduct thorough evaluations when deploying the model in sensitive or high-stakes environments.

By using Sherkala, you acknowledge and accept that, as with any large language model, it may generate incorrect, misleading and/or offensive information or content. The information is not intended as advice and should not be relied upon in any way, nor are we responsible for any of the content or consequences resulting from its use. We are continuously working to develop models with greater capabilities, and as such, welcome any feedback on the model.

Copyright Inception Institute of Artificial Intelligence Ltd. Sherkala is made available under the license CC-BY-NC-SA-4.0. You shall not use Sherkala except in compliance with the License. You may obtain a copy of the License at https://creativecommons.org/licenses/by-nc-sa/4.0/.

Unless required by applicable law or agreed to in writing, Sherkala is distributed on an AS IS basis, without warranties or conditions of any kind, either express or implied. Please see the License for the specific language governing permissions and limitations under the License.

Note: The model files have been updated to the latest version as of February 20, 2025.
