Fix typo on tokenize example
README.md
CHANGED
@@ -43,7 +43,7 @@ Llama-2-Ko is an auto-regressive language model that uses an optimized transform
 - New vocab and merges, trained with Korean Corpus
 - Tokenizer Examples: Llama-2 vs **Llama-2-Ko**
 - Use the same tokenization for English, but a shorter/merged tokenization for Korean.
-- Tokenize "안녕하세요, 오늘은 날씨가
 - Llama-2:
 ```
 ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']
 - New vocab and merges, trained with Korean Corpus
 - Tokenizer Examples: Llama-2 vs **Llama-2-Ko**
 - Use the same tokenization for English, but a shorter/merged tokenization for Korean.
+- Tokenize "안녕하세요, 오늘은 날씨가 좋네요."
 - Llama-2:
 ```
 ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']
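Review note: the `<0xNN>` pieces in the Llama-2 output look like byte-fallback tokens, so the list can be decoded back to text to check which sentence it corresponds to. The sketch below assumes SentencePiece-style conventions (`▁` stands for a space, each `<0xNN>` piece is one raw UTF-8 byte). `decode_pieces` is a hypothetical helper written for this check, not part of any tokenizer library.

```python
import re

# Matches byte-fallback pieces such as '<0xEB>' (assumed SentencePiece-style format).
BYTE_PIECE = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def decode_pieces(pieces):
    """Hypothetical helper: join tokenizer pieces back into a string.

    Byte-fallback pieces contribute one raw byte each; every other piece
    contributes its own UTF-8 bytes, with '▁' mapped to a space.
    """
    buf = bytearray()
    for piece in pieces:
        match = BYTE_PIECE.match(piece)
        if match:
            buf.append(int(match.group(1), 16))
        else:
            buf.extend(piece.replace("▁", " ").encode("utf-8"))
    return buf.decode("utf-8")


# Piece list copied from the Llama-2 example in the diff.
llama2_pieces = [
    '▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',',
    '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은',
    '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가',
    '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요',
]

decoded = decode_pieces(llama2_pieces)
print(decoded.strip())
```

Under these assumptions the pieces decode to a sentence ending in "좋네요", which matches the completed example on the `+` line rather than the truncated one. The listed pieces contain no token for the trailing period.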