Upload README.md
README.md
@@ -40,22 +40,22 @@ I use A100 GPU 40GB and COLAB, when trianing.
 
 | Model Name | Vocabulary Size | Description |
 | --- | --- | --- |
-| Original Platypus2 |
-| **Expanded KO-Platypus-ex** |
+| Original Platypus2 | 32000 | Sentencepiece BPE |
+| **Expanded KO-Platypus-ex** | 46336 | Sentencepiece BPE. Added Korean vocab and merges |
 
 **Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**
 
 | Model | Tokens |
 | --- | --- |
-| Platypus2-7b | `[
-| KO-Platypus2-7b-ex | `[
+| Platypus2-7b | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요', '.']` |
+| KO-Platypus2-7b-ex | `['▁안녕', '하세요', ',', '▁오늘은', '▁날', '씨가', '▁좋네요', '.']` |
 
 **Tokenizing "Platypus: Quick, Cheap, and Powerful Refinement of LLMs"**
 
 | Model | Tokens |
 | --- | --- |
-| Platypus2-7b | `[
-| KO-Platypus2-7b-ex | `[
+| Platypus2-7b | `['▁Plat', 'yp', 'us', ':', '▁Quick', ',', '▁Che', 'ap', ',', '▁and', '▁Power', 'ful', '▁Re', 'fin', 'ement', '▁of', '▁L', 'LM', 's']` |
+| KO-Platypus2-7b-ex | `[▁Plat', 'yp', 'us', ':', '▁Quick', ',', '▁Che', 'ap', ',', '▁and', '▁Power', 'ful', '▁Re', 'fin', 'ement', '▁of', '▁L', 'LM', 's']` |
 
 # **Model Benchmark**
 
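
Note on the first token table: the `<0x..>` entries in the Platypus2-7b row look like SentencePiece byte fallback, where a character missing from the 32000-entry vocabulary is emitted as its raw UTF-8 bytes. The sketch below is plain Python with no tokenizer library involved, and the helper name `byte_fallback_tokens` is hypothetical. It only shows that the UTF-8 bytes of the affected Hangul syllables line up with the byte tokens in that row. It is not a reimplementation of either tokenizer.

```python
def byte_fallback_tokens(text: str) -> list[str]:
    """Render each UTF-8 byte of `text` in the <0xNN> style used in the table."""
    return [f"<0x{byte:02X}>" for byte in text.encode("utf-8")]


# Syllables from "안녕하세요, 오늘은 날씨가 좋네요." that appear as byte
# tokens in the Platypus2-7b row (assumed to be absent from its vocabulary).
for syllable in ["녕", "늘", "날", "좋", "네"]:
    print(syllable, byte_fallback_tokens(syllable))
```

Each Hangul syllable takes three UTF-8 bytes, so every fallback costs three tokens. That is consistent with the Platypus2-7b row listing 30 tokens for the Korean sentence against 8 for KO-Platypus2-7b-ex.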