	---
	language:
	- en
	- ja
	library_name: transformers
	pipeline_tag: text-generation
	license:
	- gemma
	- llama3.3
	datasets:
	- tokyotech-llm/lmsys-chat-1m-synth
	- tokyotech-llm/swallow-magpie-ultra-v0.1
	- tokyotech-llm/swallow-gemma-magpie-v0.1
	- lmsys/lmsys-chat-1m
	- argilla/magpie-ultra-v0.1
	---

	# Gemma-2-Llama-Swallow

	The Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
	Gemma-2-Llama-Swallow enhances the Japanese language capabilities of the original Gemma 2 while retaining its English language capabilities.
	For continual pre-training, we used approximately 200 billion tokens sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding content (see the Training Datasets section of the base model).
	The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on synthetic data built specifically for Japanese.
	See the Swallow Model Index section to find other model variants. Built with Gemma. Built with Llama.

	## Release History

	- May 19, 2025: Released [Gemma-2-Llama-Swallow-2b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1),
	[Gemma-2-Llama-Swallow-9b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1),
	[Gemma-2-Llama-Swallow-27b-pt-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1),
	[Gemma-2-Llama-Swallow-2b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1),
	[Gemma-2-Llama-Swallow-9b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1),
	and [Gemma-2-Llama-Swallow-27b-it-v0.1](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1).

	## Swallow Model Index

	\| Model \| gemma-2-swallow v0.1 \| gemma-2-swallow-it v0.1 \|
	\| ----- \| ---------------------------------------------------------------------------------------- \| ---------------------------------------------------------------------------------------- \|
	\| 2B \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1) \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1) \|
	\| 9B \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1) \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1) \|
	\| 27B \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1) \| [🤗 HuggingFace](https://huggingface.co/tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1) \|

	![logo](./logo.png)

	The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/index.en.html) provides large language models developed by the Swallow team.

	## Model Details

	- Model type: Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the model architecture.
	- Language(s): Japanese, English
	- Library: [maxtext](https://github.com/AI-Hypercomputer/maxtext)
	- Tokenizer: Please refer to [Gemma 2 paper](https://arxiv.org/abs/2408.00118) for details on the tokenizer.
	- Contact: swallow[at]nlp.c.titech.ac.jp

	## Model Performance

	### MT-Bench JA

	\| Model \| coding \| extraction \| humanities \| math \| reasoning \| roleplay \| stem \| writing \| JMT Avg \|
	\| --------------------------------------------------- \| ------ \| ---------- \| ---------- \| ----- \| --------- \| -------- \| ----- \| ------- \| ------- \|
	\| google/gemma-3-1b-it \| 0.379 \| 0.497 \| 0.680 \| 0.385 \| 0.322 \| 0.628 \| 0.540 \| 0.651 \| 0.510 \|
	\| Qwen/Qwen2.5-1.5B-Instruct \| 0.408 \| 0.513 \| 0.456 \| 0.527 \| 0.352 \| 0.473 \| 0.406 \| 0.469 \| 0.450 \|
	\| google/gemma-2-2b-it \| 0.454 \| 0.587 \| 0.693 \| 0.524 \| 0.445 \| 0.654 \| 0.567 \| 0.630 \| 0.569 \|
	\| rinna/gemma-2-baku-2b-it \| 0.470 \| 0.625 \| 0.810 \| 0.414 \| 0.382 \| 0.713 \| 0.609 \| 0.697 \| 0.590 \|
	\| google/gemma-2-2b-jpn-it \| 0.467 \| 0.488 \| 0.741 \| 0.379 \| 0.406 \| 0.660 \| 0.589 \| 0.672 \| 0.550 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 \| 0.438 \| 0.533 \| 0.781 \| 0.557 \| 0.404 \| 0.706 \| 0.674 \| 0.682 \| 0.597 \|
	\| Qwen/Qwen2.5-3B-Instruct \| 0.567 \| 0.647 \| 0.597 \| 0.665 \| 0.457 \| 0.649 \| 0.526 \| 0.637 \| 0.593 \|
	\| google/gemma-3-4b-it \| 0.603 \| 0.724 \| 0.798 \| 0.767 \| 0.498 \| 0.803 \| 0.775 \| 0.822 \| 0.724 \|
	\| Qwen/Qwen2.5-7B-Instruct \| 0.599 \| 0.741 \| 0.719 \| 0.637 \| 0.541 \| 0.744 \| 0.624 \| 0.713 \| 0.665 \|
	\| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 \| 0.562 \| 0.756 \| 0.869 \| 0.610 \| 0.512 \| 0.783 \| 0.748 \| 0.803 \| 0.705 \|
	\| google/gemma-2-9b-it \| 0.652 \| 0.765 \| 0.857 \| 0.614 \| 0.673 \| 0.811 \| 0.713 \| 0.800 \| 0.736 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 \| 0.592 \| 0.796 \| 0.872 \| 0.742 \| 0.638 \| 0.802 \| 0.745 \| 0.803 \| 0.749 \|
	\| google/gemma-3-12b-it \| 0.807 \| 0.814 \| 0.871 \| 0.886 \| 0.623 \| 0.847 \| 0.858 \| 0.863 \| 0.821 \|
	\| google/gemma-2-27b-it \| 0.727 \| 0.809 \| 0.874 \| 0.719 \| 0.639 \| 0.810 \| 0.740 \| 0.826 \| 0.768 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 \| 0.618 \| 0.839 \| 0.873 \| 0.741 \| 0.608 \| 0.814 \| 0.739 \| 0.836 \| 0.759 \|
	\| google/gemma-3-27b-it \| 0.804 \| 0.927 \| 0.879 \| 0.876 \| 0.774 \| 0.846 \| 0.848 \| 0.882 \| 0.855 \|
	\| Qwen/Qwen2.5-32B-Instruct \| 0.724 \| 0.885 \| 0.816 \| 0.918 \| 0.726 \| 0.834 \| 0.763 \| 0.808 \| 0.809 \|
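
The JMT Avg column appears to be the unweighted mean of the eight category scores. The sketch below checks this reading against the Gemma-2-Llama-Swallow-9b-it-v0.1 row; the function name `jmt_average` is illustrative, and the macro-average interpretation is inferred from the table rather than stated by the authors.

```python
from statistics import mean

# Category scores copied from the Gemma-2-Llama-Swallow-9b-it-v0.1 row above.
swallow_9b_it = {
    "coding": 0.592, "extraction": 0.796, "humanities": 0.872, "math": 0.742,
    "reasoning": 0.638, "roleplay": 0.802, "stem": 0.745, "writing": 0.803,
}
reported_jmt_avg = 0.749  # JMT Avg value from the same row


def jmt_average(category_scores: dict[str, float]) -> float:
    """Unweighted mean over the MT-Bench category scores."""
    return mean(category_scores.values())


computed = jmt_average(swallow_9b_it)
# The table is rounded to three decimals, so allow half a unit in the last place
# (plus a tiny margin for floating-point error).
consistent = abs(computed - reported_jmt_avg) <= 0.0005 + 1e-9
print(consistent)  # True
```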

	### Japanese tasks

	\| Model \| JCom. \| JEMHopQA \| NIILC \| JSQuAD \| XL-Sum \| MGSM \| WMT20-en-ja \| WMT20-ja-en \| JMMLU \| JHumanEval \| Ja Avg \|
	\| --------------------------------------------------- \| ------ \| -------- \| ------- \| ------- \| ------- \| ------ \| ----------- \| ----------- \| ------ \| ---------- \| ------ \|
	\| \| 4-shot \| 4-shot \| 4-shot \| 4-shot \| 1-shot \| 4-shot \| 4-shot \| 4-shot \| 5-shot \| 0-shot \| \|
	\| \| EM acc \| Char-F1 \| Char-F1 \| Char-F1 \| ROUGE-2 \| EM acc \| BLEU \| BLEU \| EM acc \| pass@1 \| \|
	\| google/gemma-3-1b-it \| 0.526 \| 0.330 \| 0.237 \| 0.700 \| 0.113 \| 0.088 \| 0.166 \| 0.115 \| 0.332 \| 0.245 \| 0.285 \|
	\| Qwen/Qwen2.5-1.5B-Instruct \| 0.812 \| 0.276 \| 0.241 \| 0.847 \| 0.128 \| 0.292 \| 0.147 \| 0.119 \| 0.447 \| 0.242 \| 0.355 \|
	\| google/gemma-2-2b-it \| 0.862 \| 0.348 \| 0.315 \| 0.879 \| 0.117 \| 0.252 \| 0.207 \| 0.183 \| 0.437 \| 0.321 \| 0.392 \|
	\| rinna/gemma-2-baku-2b-it \| 0.855 \| 0.228 \| 0.390 \| 0.877 \| 0.115 \| 0.172 \| 0.255 \| 0.190 \| 0.415 \| 0.165 \| 0.366 \|
	\| google/gemma-2-2b-jpn-it \| 0.845 \| 0.321 \| 0.291 \| 0.877 \| 0.131 \| 0.192 \| 0.204 \| 0.180 \| 0.418 \| 0.311 \| 0.377 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 \| 0.862 \| 0.367 \| 0.483 \| 0.881 \| 0.145 \| 0.288 \| 0.258 \| 0.200 \| 0.485 \| 0.267 \| 0.424 \|
	\| Qwen/Qwen2.5-3B-Instruct \| 0.876 \| 0.304 \| 0.293 \| 0.866 \| 0.144 \| 0.228 \| 0.198 \| 0.168 \| 0.536 \| 0.474 \| 0.409 \|
	\| google/gemma-3-4b-it \| 0.818 \| 0.444 \| 0.404 \| 0.801 \| 0.134 \| 0.332 \| 0.217 \| 0.169 \| 0.477 \| 0.365 \| 0.416 \|
	\| Qwen/Qwen2.5-7B-Instruct \| 0.915 \| 0.429 \| 0.391 \| 0.891 \| 0.168 \| 0.632 \| 0.211 \| 0.192 \| 0.623 \| 0.532 \| 0.498 \|
	\| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 \| 0.924 \| 0.528 \| 0.583 \| 0.896 \| 0.191 \| 0.532 \| 0.281 \| 0.229 \| 0.544 \| 0.394 \| 0.510 \|
	\| google/gemma-2-9b-it \| 0.931 \| 0.532 \| 0.527 \| 0.876 \| 0.149 \| 0.636 \| 0.273 \| 0.239 \| 0.623 \| 0.559 \| 0.535 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 \| 0.946 \| 0.606 \| 0.643 \| 0.852 \| 0.170 \| 0.624 \| 0.296 \| 0.238 \| 0.639 \| 0.446 \| 0.546 \|
	\| google/gemma-3-12b-it \| 0.935 \| 0.566 \| 0.542 \| 0.808 \| 0.148 \| 0.724 \| 0.289 \| 0.239 \| 0.645 \| 0.637 \| 0.553 \|
	\| google/gemma-2-27b-it \| 0.956 \| 0.541 \| 0.576 \| 0.883 \| 0.166 \| 0.704 \| 0.290 \| 0.249 \| 0.670 \| 0.638 \| 0.567 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 \| 0.969 \| 0.654 \| 0.658 \| 0.891 \| 0.194 \| 0.764 \| 0.316 \| 0.258 \| 0.686 \| 0.635 \| 0.602 \|
	\| google/gemma-3-27b-it \| 0.946 \| 0.592 \| 0.584 \| 0.867 \| 0.142 \| 0.764 \| 0.307 \| 0.253 \| 0.716 \| 0.736 \| 0.591 \|
	\| Qwen/Qwen2.5-32B-Instruct \| 0.959 \| 0.567 \| 0.497 \| 0.903 \| 0.169 \| 0.780 \| 0.228 \| 0.195 \| 0.757 \| 0.651 \| 0.571 \|

	### English tasks

	\| Model \| OpenBookQA \| TriviaQA \| HellaSWAG \| SQuAD2.0 \| XWINO \| MMLU \| GSM8K \| MATH \| BBH \| HumanEval \| En Avg \|
	\| --------------------------------------------------- \| ---------- \| -------- \| --------- \| -------- \| ------ \| ------ \| ------ \| ---------- \| ---------- \| --------- \| ------ \|
	\| \| 4-shot \| 4-shot \| 4-shot \| 4-shot \| 4-shot \| 5-shot \| 4-shot \| 4-shot \| 3-shot \| 0-shot \| \|
	\| \| Acc \| EM acc \| Acc \| EM acc \| Acc \| Acc \| EM acc \| CoT EM Acc \| CoT EM Acc \| pass@1 \| \|
	\| google/gemma-3-1b-it \| 0.272 \| 0.229 \| 0.421 \| 0.501 \| 0.786 \| 0.398 \| 0.256 \| 0.340 \| 0.379 \| 0.335 \| 0.392 \|
	\| Qwen/Qwen2.5-1.5B-Instruct \| 0.334 \| 0.378 \| 0.503 \| 0.501 \| 0.844 \| 0.604 \| 0.257 \| 0.272 \| 0.272 \| 0.277 \| 0.424 \|
	\| google/gemma-2-2b-it \| 0.354 \| 0.502 \| 0.520 \| 0.548 \| 0.878 \| 0.569 \| 0.440 \| 0.230 \| 0.464 \| 0.382 \| 0.489 \|
	\| rinna/gemma-2-baku-2b-it \| 0.342 \| 0.416 \| 0.511 \| 0.522 \| 0.871 \| 0.526 \| 0.027 \| 0.174 \| 0.063 \| 0.158 \| 0.361 \|
	\| google/gemma-2-2b-jpn-it \| 0.370 \| 0.503 \| 0.532 \| 0.539 \| 0.879 \| 0.557 \| 0.351 \| 0.132 \| 0.451 \| 0.392 \| 0.471 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1 \| 0.332 \| 0.417 \| 0.529 \| 0.506 \| 0.856 \| 0.530 \| 0.284 \| 0.150 \| 0.405 \| 0.301 \| 0.431 \|
	\| Qwen/Qwen2.5-3B-Instruct \| 0.364 \| 0.446 \| 0.562 \| 0.504 \| 0.869 \| 0.664 \| 0.096 \| 0.612 \| 0.128 \| 0.471 \| 0.472 \|
	\| google/gemma-3-4b-it \| 0.412 \| 0.500 \| 0.560 \| 0.552 \| 0.872 \| 0.583 \| 0.769 \| 0.306 \| 0.598 \| 0.513 \| 0.566 \|
	\| Qwen/Qwen2.5-7B-Instruct \| 0.428 \| 0.519 \| 0.624 \| 0.569 \| 0.877 \| 0.742 \| 0.739 \| 0.688 \| 0.217 \| 0.636 \| 0.604 \|
	\| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 \| 0.396 \| 0.629 \| 0.593 \| 0.570 \| 0.884 \| 0.629 \| 0.622 \| 0.266 \| 0.626 \| 0.445 \| 0.566 \|
	\| google/gemma-2-9b-it \| 0.432 \| 0.658 \| 0.605 \| 0.659 \| 0.904 \| 0.723 \| 0.779 \| 0.394 \| 0.719 \| 0.613 \| 0.649 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-9b-it-v0.1 \| 0.404 \| 0.640 \| 0.609 \| 0.623 \| 0.900 \| 0.680 \| 0.710 \| 0.392 \| 0.663 \| 0.491 \| 0.611 \|
	\| google/gemma-3-12b-it \| 0.422 \| 0.665 \| 0.639 \| 0.649 \| 0.901 \| 0.721 \| 0.867 \| 0.796 \| 0.802 \| 0.712 \| 0.717 \|
	\| google/gemma-2-27b-it \| 0.458 \| 0.766 \| 0.655 \| 0.669 \| 0.909 \| 0.762 \| 0.851 \| 0.466 \| 0.790 \| 0.707 \| 0.703 \|
	\| tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1 \| 0.424 \| 0.747 \| 0.663 \| 0.664 \| 0.911 \| 0.749 \| 0.821 \| 0.442 \| 0.772 \| 0.682 \| 0.687 \|
	\| google/gemma-3-27b-it \| 0.418 \| 0.744 \| 0.661 \| 0.687 \| 0.906 \| 0.774 \| 0.916 \| 0.852 \| 0.793 \| 0.829 \| 0.758 \|
	\| Qwen/Qwen2.5-32B-Instruct \| 0.424 \| 0.534 \| 0.671 \| 0.536 \| 0.893 \| 0.834 \| 0.581 \| 0.802 \| 0.017 \| 0.589 \| 0.588 \|

	## Evaluation Benchmarks

	The evaluation script can be found at [swallow-llm/swallow-evaluation](https://github.com/swallow-llm/swallow-evaluation), tagged as `v202411`.

	### MT-Bench JA

	We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the capabilities of multi-turn dialogue with the following settings:

	- Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
	- Question: [Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v4](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question/v3)
	- Reference Answer: A revised version of [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v2](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1), in which we verified and corrected incorrect answers. This revised version has been released alongside [swallow-evaluation](https://github.com/swallow-llm/swallow-evaluation) Ver. 202411.
	- Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
	- Judge: `gpt-4o-2024-08-06`
	- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs.
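
As a rough illustration of the scoring rule, assume the judge returns the usual MT-Bench rating on a 1-10 scale and that normalization means dividing by 10. The exact formula is not spelled out here, so both points are assumptions, and the ratings below are made up.

```python
from statistics import mean

JUDGE_MAX = 10  # assumption: judge ratings range from 1 to 10


def normalized_score(ratings_per_run: list[float]) -> float:
    """Average the judge ratings over runs, then map them onto the 0-1 range."""
    if not ratings_per_run:
        raise ValueError("need at least one run")
    return mean(ratings_per_run) / JUDGE_MAX


# Hypothetical ratings for one question across five runs.
ratings = [8, 7, 9, 8, 8]
print(normalized_score(ratings))  # 0.8
```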

	### Japanese evaluation benchmarks

	We used llm-jp-eval (v1.3.0), JP Language Model Evaluation Harness (commit #9b42d41), and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:

	- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
	- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
	- Open-ended question answering (NIILC [関根, 2003])
	- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
	- Automatic summarization (XL-Sum [Hasan et al., 2021])
	- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
	- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
	- Mathematical reasoning (MGSM [Shi et al., 2023])
	- Academic exams (JMMLU [尹ら, 2024])
	- Code generation (JHumanEval [佐藤ら, 2024])

	### English evaluation benchmarks

	We used the Language Model Evaluation Harness (v0.4.2) and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:

	- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
	- Open-ended question answering (TriviaQA [Joshi et al., 2017])
	- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
	- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
	- Natural language inference (HellaSwag [Zellers et al., 2019])
	- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
	- Mathematical reasoning (MATH [Hendrycks et al., 2022][Lightman et al., 2024])
	- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
	- Academic exams (MMLU [Hendrycks et al., 2021])
	- Code generation (HumanEval [Chen et al., 2021])
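
HumanEval and JHumanEval are reported as pass@1. For reference, the unbiased pass@k estimator from Chen et al. (2021) can be sketched as below. How many samples per problem were drawn for the tables above is not stated here, so the values of `n` and `c` in the example are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem, c: samples that pass the unit tests.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With a single sample per problem (n = k = 1), pass@1 reduces to the
# fraction of problems whose one completion passes.
per_problem = [pass_at_k(1, c, 1) for c in (1, 0, 1, 1)]
print(sum(per_problem) / len(per_problem))  # 0.75
```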

	## Usage

	```sh
	pip install vllm
	```

	```python
	from transformers import AutoTokenizer
	from vllm import LLM, SamplingParams

	model_name = "tokyotech-llm/Gemma-2-Llama-Swallow-27b-it-v0.1"

	# The tokenizer is only needed here to apply the chat template.
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	llm = LLM(
	    model=model_name,
	    tensor_parallel_size=1,  # number of GPUs to shard the model across
	)

	sampling_params = SamplingParams(
	    temperature=0.6,
	    top_p=0.9,
	    max_tokens=512,
	)

	# A single-turn conversation in the chat-message format.
	message = [
	    {
	        "role": "user",
	        "content": "日本の春から夏の移り変わりについて教えてください",
	    },
	]

	# Render the conversation into the prompt string the model expects.
	prompt = tokenizer.apply_chat_template(
	    message, tokenize=False, add_generation_prompt=True
	)

	output = llm.generate(prompt, sampling_params)

	print(output[0].outputs[0].text)
	```
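
To make explicit what `apply_chat_template` produces, the sketch below builds the prompt string by hand. It assumes this model keeps the standard Gemma 2 chat format (`<start_of_turn>` and `<end_of_turn>` markers, with the assistant role named `model`). The helper `render_gemma2_prompt` is illustrative; the tokenizer's own template remains the source of truth.

```python
def render_gemma2_prompt(messages: list[dict[str, str]],
                         add_generation_prompt: bool = True) -> str:
    """Hand-rolled approximation of the Gemma 2 chat template (assumed format)."""
    parts = ["<bos>"]
    for msg in messages:
        if msg["role"] == "system":
            # The Gemma 2 template does not accept a system role.
            raise ValueError("system role is not supported")
        # Gemma 2 names the assistant side of the dialogue "model".
        role = "model" if msg["role"] == "assistant" else msg["role"]
        parts.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [{"role": "user", "content": "日本の春から夏の移り変わりについて教えてください"}]
print(render_gemma2_prompt(messages))
```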

	## Training Datasets

	### Instruction Tuning

	The following datasets were used for the instruction tuning.

	- [Gemma-2-LMSYS-Chat-1M-Synth](https://huggingface.co/datasets/tokyotech-llm/lmsys-chat-1m-synth)
	  - Multi-turn Japanese instruction dataset synthesized and derived from [lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) [\[Zhang+, ICLR24\]](https://openreview.net/forum?id=BOfDKxfwt0).
	  - First-turn user instructions were translated into Japanese via DeepL (machine translation), and assistant responses were generated using [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). The same model, i.e., [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it), served as a judge for rejection sampling (n=6).
	  - Second-turn user instructions and responses were synthesized using [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). The same model scored the quality of each second-turn response on a scale of 1-10. Second-turn responses with scores lower than 9 were rejected, along with their corresponding instructions.
	  - Conversations containing personally identifiable information (PII) and template-based user instructions were removed. Duplicate instructions were removed.
	- [Swallow-Magpie-Ultra-v0.1](https://huggingface.co/datasets/tokyotech-llm/swallow-magpie-ultra-v0.1)
	  - A Japanese variant of the `filtered-magpie-ultra-en` dataset, translated into Japanese by [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it).
	- [Swallow-Gemma-Magpie-v0.1](https://huggingface.co/datasets/tokyotech-llm/swallow-gemma-magpie-v0.1)
	  - A Japanese synthetic instruction-tuning dataset built from scratch, generated by [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). User instructions were created with prompts specific to each topic, and assistant responses were generated for these instructions.
	  - The conversations were heuristically filtered for quality and length. Then, [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) was applied to score the quality of each conversation on a scale of 1-10. Conversations with scores <= 7 were rejected.
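
The score-based selection described above can be sketched with a few small helpers. The record layout (`text`, `score`), the function names, and the rule of keeping the single highest-scoring candidate in rejection sampling are assumptions made for illustration; the actual pipeline may differ in detail.

```python
from typing import TypedDict


class Candidate(TypedDict):
    text: str
    score: int  # judge score on a 1-10 scale


def pick_best(candidates: list[Candidate]) -> Candidate:
    """Rejection sampling (assumed rule): keep the highest-scoring sampled response."""
    return max(candidates, key=lambda c: c["score"])


def keep_second_turn(score: int) -> bool:
    """Gemma-2-LMSYS-Chat-1M-Synth: second turns scoring below 9 are rejected."""
    return score >= 9


def keep_magpie_conversation(score: int) -> bool:
    """Swallow-Gemma-Magpie: conversations scoring 7 or lower are rejected."""
    return score > 7


# Six hypothetical first-turn candidates (n=6) with made-up judge scores.
samples: list[Candidate] = [
    {"text": f"response {i}", "score": s}
    for i, s in enumerate([6, 8, 9, 7, 9, 5])
]
print(pick_best(samples)["text"])  # response 2
print(keep_second_turn(8), keep_magpie_conversation(8))  # False True
```

Note that the two thresholds differ: a score of 8 is enough to keep a Swallow-Gemma-Magpie conversation but not a second-turn response in Gemma-2-LMSYS-Chat-1M-Synth.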

	## Risks and Limitations

	The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.

	## Acknowledgements

	We thank Google DeepMind for releasing Gemma 2 under a generous open license.

	We received various support, including:

	- AIST project: "Research and Development of Foundation Models for Generative AI in the Physical Domain"
	- NEDO project: "Development of Artificial Intelligence Application Technology to Support Judgment in Design Risk Assessment Work Based on the Perspective of Skilled Persons" (JPNP18002) of "Development of Integration Technology as the Core of Next Generation Artificial Intelligence and Robotics"
	- MEXT project: "Formation of R&D center to ensure transparency and reliability of generative AI models"
	- AIST program: [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html)
	- TPU Research Cloud

	## License

	[Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [META LLAMA 3.3 COMMUNITY LICENSE](https://www.llama.com/llama3_3/license/)

	## Authors

	Team members:

	- From [Institute of Science Tokyo Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
	  - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
	  - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
	  - [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html)
	  - [Koki Maeda](https://sites.google.com/view/silviase)
	  - [Kakeru Hattori](https://aya-se.vercel.app/)
	  - [Masanari Ohi](https://sites.google.com/view/masanariohi)
	  - [Hinari Shimada](https://hinarishimada.github.io/portfolio)
	  - [Taihei Shiotani](https://github.com/inatoihs)
	  - [Koshiro Saito](https://sites.google.com/view/koshiro-saito)
	- From [Institute of Science Tokyo YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
	  - [Rio Yokota](https://twitter.com/rioyokota)
	  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
	  - [Taishi Nakamura](https://twitter.com/Setuna7777_2)
	  - [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto)
	  - [Ishida Shigeki](https://www.wantedly.com/id/reborn27)
	  - [Yukito Tajima](https://www.linkedin.com/in/yukito-tajima-51bbb2299)
	  - [Masaki Kawamura](https://x.com/Masakichi333210)
	- From [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/), the following members:
	  - [Hiroya Takamura](https://sites.google.com/view/hjtakamura)

	## How to cite

	If you find our work helpful, please feel free to cite these papers.

	```
	@inproceedings{Fujii:COLM2024,
	title={Continual Pre-Training for Cross-Lingual LLM Adaptation:
	Enhancing Japanese Language Capabilities},
	author={Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Hiroki
	Iida and Masanari Ohi and Kakeru Hattori and Hirai Shota and Sakae
	Mizuki and Rio Yokota and Naoaki Okazaki},
	booktitle="Proceedings of the First Conference on Language Modeling",
	series={COLM},
	pages="(to appear)",
	year="2024",
	month=oct,
	address={University of Pennsylvania, USA},
	}

	@inproceedings{Okazaki:COLM2024,
	title={Building a Large Japanese Web Corpus for Large Language Models},
	author={Naoaki Okazaki and Kakeru Hattori and Hirai Shota and Hiroki
	Iida and Masanari Ohi and Kazuki Fujii and Taishi Nakamura and Mengsay
	Loem and Rio Yokota and Sakae Mizuki},
	booktitle="Proceedings of the First Conference on Language Modeling",
	series={COLM},
	pages="(to appear)",
	year="2024",
	month=oct,
	address={University of Pennsylvania, USA},
	}

	@misc{ma:arxiv2025,
	title={Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models},
	author={Youmi Ma and Sakae Mizuki and Kazuki Fujii and Taishi Nakamura and Masanari Ohi and Hinari Shimada and Taihei Shiotani and Koshiro Saito and Koki Maeda and Kakeru Hattori and Takumi Okamoto and Shigeki Ishida and Rio Yokota and Hiroya Takamura and Naoaki Okazaki},
	year={2025},
	eprint={2503.23714},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2503.23714},
	}
	```

	### References

	```tex
	@misc{gemmateam2024gemma2improvingopen,
	title={Gemma 2: Improving Open Language Models at a Practical Size},
	author={Gemma Team},
	year={2024},
	eprint={2408.00118},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2408.00118},
	}
	```