---
license: mit
language:
- en
tags:
- Chinese Spell Correction
- csc
- Chinese Spell Checking
---
|

# NamBert-for-csc

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/iioSnail/NamBert/blob/master/example.ipynb)

This is the official model for the paper "Unveiling the Impact of Multimodal Features on Chinese Spelling Correction: From Analysis to Design".

GitHub: https://github.com/iioSnail/NamBert

The sentence-level performance of the model on the SIGHAN benchmarks is as follows (all values are percentages):

|
| Dataset | Detect-Acc | Detect-Precision | Detect-Recall | Detect-F1 | Correct-Acc | Correct-Precision | Correct-Recall | Correct-F1 |
|--|--|--|--|--|--|--|--|--|
| SIGHAN 2013 | 82.70 | 87.72 | 82.39 | 84.97 | 81.60 | 86.51 | 81.26 | 83.80 |
| SIGHAN 2014 | 79.76 | 69.03 | 75.00 | 71.89 | 79.10 | 67.79 | 73.65 | 70.60 |
| SIGHAN 2015 | 86.18 | 77.52 | 85.40 | 81.27 | 85.73 | 76.68 | 84.47 | 80.39 |
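The F1 columns are the standard harmonic mean of precision and recall. As a quick sanity check (not part of the released evaluation code), the SIGHAN 2015 detection row can be reproduced from its precision and recall:

```python
# Detection precision/recall for SIGHAN 2015, taken from the table above (percentages).
precision = 77.52
recall = 85.40

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 81.27, matching the Detect-F1 column
```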
|

# Usage

|
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)

# The input sentence contains typos (e.g. 喜换 for 喜欢, 平果 for 苹果, 逆 for 你).
inputs = tokenizer("我喜换吃平果,逆呢?", return_tensors='pt')
logits = model(**inputs).logits

# Pick the highest-scoring vocabulary id at each position, then restore ids
# that should be copied from the input unchanged (a helper from the model's
# custom tokenizer, loaded via trust_remote_code).
target_ids = logits.argmax(-1)
target_ids = tokenizer.restore_ids(target_ids, inputs['input_ids'])

# Drop the [CLS]/[SEP] tokens and join the corrected characters.
print(''.join(tokenizer.convert_ids_to_tokens(target_ids[0, 1:-1])))
```
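The decoding step above is a plain per-position argmax over the vocabulary dimension; a minimal illustration on toy logits (shapes and values are invented for the example):

```python
import torch

# Toy logits: batch of 1 sentence, 4 token positions, vocabulary of 5 ids.
logits = torch.tensor([[[0.1, 2.0, 0.3, 0.0, 0.1],
                        [1.5, 0.2, 0.1, 0.0, 0.3],
                        [0.0, 0.1, 0.2, 3.0, 0.4],
                        [0.2, 0.1, 2.5, 0.0, 0.1]]])

# Same decoding step as in the snippet above: for each position,
# pick the vocabulary id with the highest score.
target_ids = logits.argmax(-1)
print(target_ids.tolist())  # [[1, 0, 3, 2]]
```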
|

Alternatively, use the built-in `predict` helper, which runs tokenization, inference, and decoding in one call:

|
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)

# Use a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model = model.eval()
model.set_tokenizer(tokenizer)

# `predict` accepts a single sentence or a list of sentences.
model.predict("我是炼习时长两念半的个人练习生菜徐坤")
model.predict(["我是炼习时长两念半的个人练习生菜徐坤", "喜欢场跳rap篮球!!"])
```
|