Upload README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ license: apache-2.0
|
|
10 |
|
11 |
# Swallow-MS-7b-v0.1
|
12 |
|
13 |
-
Our Swallow-MS-7b-v0.1 model has undergone
|
14 |
|
15 |
# Model Release Updates
|
16 |
|
@@ -38,24 +38,9 @@ This repository provides large language models developed by [TokyoTech-LLM](http
|
|
38 |
|---|---|---|---|---|---|---|---|---|---|
|
39 |
| Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
|
40 |
|
41 |
-
## Base Model Performance
|
42 |
-
|
43 |
|
44 |
## Evaluation Benchmarks
|
45 |
|
46 |
-
### Japanese evaluation benchmarks
|
47 |
-
|
48 |
-
We used llm-jp-eval (v1.0.0) and JP Language Model Evaluation Harness (commit #9b42d41). The details are as follows:
|
49 |
-
|
50 |
-
- Multiple-choice question answering (JCommonsenseQA [Kurihara+, 2022])
|
51 |
-
- Open-ended question answering (JEMHopQA [Ishii+, 2023])
|
52 |
-
- Open-ended question answering (NIILC [Sekine, 2003])
|
53 |
-
- Machine reading comprehension (JSQuAD [Kurihara+, 2022])
|
54 |
-
- Automatic summarization (XL-Sum [Hasan+, 2021])
|
55 |
-
- Machine translation (WMT2020 ja-en [Barrault+, 2020])
|
56 |
-
- Machine translation (WMT2020 en-ja [Barrault+, 2020])
|
57 |
-
- Mathematical reasoning (MGSM [Shi+, 2023])
|
58 |
-
|
59 |
### MT-Bench JA
|
60 |
|
61 |
We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models.
|
@@ -66,24 +51,6 @@ We utilized the following artifacts:
|
|
66 |
- Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1)
|
67 |
- Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
|
68 |
|
69 |
-
### English evaluation benchmarks
|
70 |
-
|
71 |
-
We used the Language Model Evaluation Harness (v.0.3.0). The details are as follows:
|
72 |
-
|
73 |
-
- Multiple-choice question answering (OpenBookQA [Mihaylov+, 2018])
|
74 |
-
- Open-ended question answering (TriviaQA [Joshi+, 2017])
|
75 |
-
- Machine reading comprehension (SQuAD 2.0 [Rajpurkar+, 2018])
|
76 |
-
- Commonsense reasoning (XWINO [Tikhonov & Ryabinin, 2021])
|
77 |
-
- Natural language inference (HellaSwag [Zellers+, 2019])
|
78 |
-
- Mathematical reasoning (GSM8k [Cobbe+, 2021])
|
79 |
-
|
80 |
-
### Code evaluation benchmarks
|
81 |
-
|
82 |
-
We utilized the Code Generation LM Evaluation Harness [Allal+, 2022] (commit #0261c52). The details are as follows:
|
83 |
-
|
84 |
-
- Code generation (HumanEval [Chen+, 2021])
|
85 |
-
- Code generation in Japanese (JHumanEval [Satoh+, 2024])
|
86 |
-
|
87 |
|
88 |
## Usage
|
89 |
|
@@ -93,7 +60,7 @@ First install additional dependencies in [requirements.txt](./requirements.txt):
|
|
93 |
pip install -r requirements.txt
|
94 |
```
|
95 |
|
96 |
-
### Instruction format
|
97 |
This format must be adhered to strictly, as deviations may result in less optimal outputs from the model.
|
98 |
|
99 |
The template used to construct a prompt for the Instruct model is specified as follows:
|
@@ -102,15 +69,16 @@ The template used to construct a prompt for the Instruct model is specified as f
|
|
102 |
<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [/INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
|
103 |
```
|
104 |
|
105 |
-
Please be aware that
|
106 |
|
107 |
-
|
|
|
108 |
|
109 |
```python
|
110 |
import torch
|
111 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
112 |
|
113 |
-
model_name = "tokyotech-llm/Swallow-MS-7b-instruct-
|
114 |
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
|
115 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
116 |
|
@@ -131,49 +99,9 @@ decoded = tokenizer.batch_decode(generated_ids)
|
|
131 |
print(decoded[0])
|
132 |
```
|
133 |
|
134 |
-
|
135 |
-
### Use the base model
|
136 |
-
|
137 |
-
```python
|
138 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
139 |
-
import torch
|
140 |
-
|
141 |
-
model_name = "tokyotech-llm/Swallow-MS-7b-v0.1"
|
142 |
-
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
143 |
-
|
144 |
-
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
|
145 |
-
prompt = "東京工業大学の主なキャンパスは、"
|
146 |
-
input_ids = tokenizer.encode(
|
147 |
-
    prompt,
|
148 |
-
    add_special_tokens=False,
|
149 |
-
    return_tensors="pt"
|
150 |
-
)
|
151 |
-
tokens = model.generate(
|
152 |
-
    input_ids.to(device=model.device),
|
153 |
-
    max_new_tokens=128,
|
154 |
-
    temperature=0.99,
|
155 |
-
    top_p=0.95,
|
156 |
-
    do_sample=True,
|
157 |
-
)
|
158 |
-
|
159 |
-
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
|
160 |
-
print(out)
|
161 |
-
```
|
162 |
-
|
163 |
## Training Datasets
|
164 |
|
165 |
-
###
|
166 |
-
The following datasets were used for continual pre-training.
|
167 |
-
|
168 |
-
- [Algebraic Stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
|
169 |
-
- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
|
170 |
-
- [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
|
171 |
-
- [Swallow Corpus](https://chokkan.org/temp/tokyotech-llm/swallow-corpus)
|
172 |
-
- [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
|
173 |
-
|
174 |
-
### Instruction Tuning
|
175 |
-
|
176 |
-
#### Ver1.0
|
177 |
|
178 |
The following datasets were used for the instruction tuning.
|
179 |
|
|
|
10 |
|
11 |
# Swallow-MS-7b-v0.1
|
12 |
|
13 |
+
Our Swallow-MS-7b-v0.1 model has undergone continual pre-training from Mistral-7B-v0.1, primarily with the addition of Japanese language data.
|
14 |
|
15 |
# Model Release Updates
|
16 |
|
|
|
38 |
|---|---|---|---|---|---|---|---|---|---|
|
39 |
| Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
|
40 |
|
41 |
|
42 |
## Evaluation Benchmarks
|
43 |
|
44 |
### MT-Bench JA
|
45 |
|
46 |
We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models.
|
|
|
51 |
- Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1)
|
52 |
- Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
|
53 |
|
54 |
|
55 |
## Usage
|
56 |
|
|
|
60 |
pip install -r requirements.txt
|
61 |
```
|
62 |
|
63 |
+
### Instruction format Ver0.1
|
64 |
This format must be adhered to strictly, as deviations may result in less optimal outputs from the model.
|
65 |
|
66 |
The template used to construct a prompt for the Instruct model is specified as follows:
|
|
|
69 |
<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [/INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
|
70 |
```
|
71 |
|
72 |
+
Please be aware that ``<s>`` and ``</s>`` are special tokens marking the beginning of sequence (BOS) and end of sequence (EOS), respectively, while ``[INST]`` and ``[/INST]`` are treated as regular strings.
|
73 |
|
74 |
+
For the "{Instruction}" part, we recommend using "あなたは誠実で優秀な日本人のアシスタントです。".
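
The template and the recommended system instruction can be combined programmatically. The following is a minimal sketch, not code from the Swallow repository: `build_prompt` and `DEFAULT_SYSTEM` are hypothetical names introduced here for illustration. It assumes that every user message is closed by `[/INST]`, as the final turn of the template and the note on `[INST]`/`[/INST]` suggest, and that `\n` in the template stands for a literal newline.

```python
from typing import List, Optional, Tuple

# Hypothetical helper for illustration; not part of the Swallow repository.
DEFAULT_SYSTEM = "あなたは誠実で優秀な日本人のアシスタントです。"


def build_prompt(
    turns: List[Tuple[str, Optional[str]]],
    system: str = DEFAULT_SYSTEM,
) -> str:
    """Render (user, bot) turns into the instruct template.

    Pass None as the bot message of the last turn so that the prompt
    ends with [/INST] and the model generates the reply.
    """
    prompt = ""
    for i, (user, bot) in enumerate(turns):
        if i == 0:
            # The system block is attached to the first user message only.
            prompt += f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
        else:
            # Later turns carry no system block (assumed from the template).
            prompt += f"[INST] {user}[/INST]"
        if bot is not None:
            # A completed assistant reply is terminated by the EOS marker.
            prompt += f" {bot} </s>"
    return prompt


if __name__ == "__main__":
    print(build_prompt([("東京工業大学の主なキャンパスについて教えてください", None)]))
```

Because the rendered string already starts with `<s>`, tokenizing it with `add_special_tokens=False` is one way to avoid a duplicated BOS token; whether this matters depends on the tokenizer configuration.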
|
75 |
+
### Use the instruct model Ver0.1
|
76 |
|
77 |
```python
|
78 |
import torch
|
79 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
80 |
|
81 |
+
model_name = "tokyotech-llm/Swallow-MS-7b-instruct-v0.1"
|
82 |
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
|
83 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
84 |
|
|
|
99 |
print(decoded[0])
|
100 |
```
|
101 |
|
102 |
## Training Datasets
|
103 |
|
104 |
+
### Instruction Tuning Ver0.1
|
105 |
|
106 |
The following datasets were used for the instruction tuning.
|
107 |
|