---
license: apache-2.0
---

# BiLLa: A Bilingual LLaMA with Enhanced Reasoning Ability 

BiLLa is an open-source reasoning-enhanced bilingual LLaMA model. Its main features are:
- Greatly improved Chinese language modeling ability, with minimal damage to LLaMA's original English ability;
- More task data added during training, with analyses generated by ChatGPT;
- Full-parameter optimization for better performance.

Github: https://github.com/Neutralzz/BiLLa

<b>Note</b>: Due to LLaMA's license, the model weights in this hub cannot be used directly. 
The released `word embedding` weights are the sum of the trained model's weights and the original LLaMA's, 
so that developers with access to the original LLaMA model can convert the weights released in this hub into a usable model.

## Usage

First, recover the model weights with [this script](https://github.com/Neutralzz/BiLLa/blob/main/embedding_convert.py):
```shell
python3 embedding_convert.py \
    --model_dir /path_to_BiLLa/BiLLa-7B-SFT \
    --meta_llama_pth_file /path_to_LLaMA/llama-7b/consolidated.00.pth
```
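Conceptually, the conversion subtracts the original LLaMA word-embedding weights from the released (summed) weights. The sketch below only illustrates that idea and makes assumptions about the checkpoint layout and key names (`pytorch_model.bin`, `model.embed_tokens.weight`, `tok_embeddings.weight`); the `embedding_convert.py` script linked above is the authoritative implementation.

```python
# Conceptual sketch only -- the real embedding_convert.py may differ in details.
# Idea: released embedding = trained embedding + original LLaMA embedding,
# so subtracting the original LLaMA embedding recovers the trained weights.
import torch

billa_ckpt = "/path_to_BiLLa/BiLLa-7B-SFT/pytorch_model.bin"   # assumed single-file checkpoint
llama_pth = "/path_to_LLaMA/llama-7b/consolidated.00.pth"

billa_state = torch.load(billa_ckpt, map_location="cpu")
llama_state = torch.load(llama_pth, map_location="cpu")

orig_emb = llama_state["tok_embeddings.weight"]          # key in Meta's original checkpoint
emb_key = "model.embed_tokens.weight"                    # assumed key in the HF-format checkpoint
vocab = orig_emb.shape[0]
# Only the rows shared with the original LLaMA vocabulary were summed (assumption).
billa_state[emb_key][:vocab] -= orig_emb.to(billa_state[emb_key].dtype)

torch.save(billa_state, billa_ckpt)
```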

Then, you can run this model as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path_to_BiLLa/BiLLa-7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, torch_dtype=torch.float16
).cuda()

# BiLLa-7B-SFT expects the "Human: ...\nAssistant: " prompt format (see below).
prompt = "Human: Write a Python function that checks if a given number is even or odd.\nAssistant: "
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).cuda(),
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
# Drop the prompt tokens, keeping only the newly generated ones.
output_ids = output_ids[0][len(input_ids[0]):]

outputs = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
print(outputs)
```

### Input Format
Unlike [BiLLa-7B-LLM](https://huggingface.co/Neutralzz/BiLLa-7B-LLM), the input to `BiLLa-7B-SFT` should be formatted as follows:
```
Human: [Your question]
Assistant: 
```
Note that <b>a space</b> follows `Assistant:`.
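
For convenience, a small helper like the one below (hypothetical, not part of the repository) can build prompts in this format:

```python
def build_prompt(question: str) -> str:
    # The trailing space after "Assistant:" is required by the SFT input format.
    return f"Human: {question}\nAssistant: "

prompt = build_prompt("Write a Python function that checks if a given number is even or odd.")
```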