---
license: mit
datasets:
- monsoon-nlp/asknyc-chatassistant-format
language:
- en
tags:
- reddit
- asknyc
- nyc
- llama2
---

# nyc-savvy-llama2-7b

Essentials:
- Based on LLaMa2-7b-hf (version 2, 7B params)
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/)
- Merged LLaMa2 and the adapter weights for this full-sized model (see the loading sketch below)
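
If you just want to try the merged model, here is a minimal loading sketch with `transformers` (assuming a machine with enough memory for a 7B model in bf16; `device_map="auto"` also assumes `accelerate` is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("monsoon-nlp/nyc-savvy-llama2-7b")
m = AutoModelForCausalLM.from_pretrained(
    "monsoon-nlp/nyc-savvy-llama2-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; remove to load on CPU
)
```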

## Prompt options

Here is the template used in training. Note that it starts with `### Human: ` (with a trailing space), then the post title and content, then `### Assistant: ` (no space or newline before it, but a trailing space after the colon).

`### Human: Post title - post content### Assistant: `

For example:

`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.`

From [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it looks helpful to add a more assistant-like preamble, especially if you follow their lead for a chat format:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
```
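
For reference, a minimal sketch of assembling a prompt in exactly this format (the `build_prompt` helper is illustrative, not part of this repo):

```python
def build_prompt(title: str, content: str, chat_preamble: bool = True) -> str:
    """Format a question the same way the training data was formatted."""
    prompt = ""
    if chat_preamble:
        # Optional assistant-style preamble, following QLoRA's Gradio example
        prompt += ("A chat between a curious human and an artificial intelligence "
                   "assistant. The assistant gives helpful, detailed, and polite "
                   "answers to the user's questions.\n")
    # "### Human: " keeps its trailing space; "### Assistant: " follows the
    # content with no separator before it and a trailing space after the colon
    prompt += f"### Human: {title} - {content}### Assistant: "
    return prompt

print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```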

## Training data

- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019)
- Downloaded from PushShift, accepted comments only if upvote scores >= 3
- Originally collected for my GPT-NYC model in spring 2021: https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d

## Training script

Takes about 2 hours on Colab once you get it right. QLoRA's script only lets you control training length via `max_steps`, but I wanted to stop at 1 epoch, so I set `max_steps` to cover roughly one epoch.
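(For where 760 comes from: with the per-device batch size of 1 and 16 gradient accumulation steps used below, 760 steps works out to 760 × 16 = 12,160 examples, roughly one pass over the ~13k-row dataset.)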

```
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
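
One note on the data flags: with `--dataset_format oasst1`, each JSONL row appears to be treated as a single block of text holding the full Human/Assistant exchange, which would explain why `--source_max_len` can be as small as 16 while `--target_max_len 512` carries the actual content.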

## Merging it back

What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.

The `peftmerger.py` script applies the adapter and saves the merged model like this:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_name = "../llama-2-7b-hf"           # base LLaMa2 weights (path from the training script)
adapters_name = "../nyc-savvy-llama2-7b"  # the QLoRA output_dir holding the adapter

# Load the base model in bf16 rather than 4-bit so the merge produces full-size weights
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    #device_map={"": 0},
)
m = PeftModel.from_pretrained(m, adapters_name)  # attach the LoRA adapter
m = m.merge_and_unload()                         # fold the adapter into the base weights
m.save_pretrained("nyc-savvy")
```
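
One follow-up: `save_pretrained` only writes the model weights, so saving the tokenizer next to them (e.g. `tok.save_pretrained("nyc-savvy")`, assuming `tok` is the LLaMa2 tokenizer) lets the directory load cleanly with both `AutoModelForCausalLM` and `AutoTokenizer`.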

## Testing that the model is NYC-savvy

You might wonder whether the model actually learned anything about NYC, or is the same old LLaMa2. Without your prompt adding any clues, try this from the `pefttester.py` script in this repo:

```python
from transformers import StoppingCriteriaList

# tok and m are the tokenizer and merged model loaded earlier in the script
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

input_ids = tok(messages, return_tensors="pt").input_ids

# ...

# sampling settings
temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,  # greedy decoding if temperature is zero
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),  # stop is defined in the elided section
)
for line in op:
    print(tok.decode(line))
```
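
The `stop` object above comes from the elided part of the script; here is a hypothetical sketch of one way to build something like it, halting generation once the model starts a new `### Human:` turn (the `StopOnTurn` class is an assumption, not necessarily what `pefttester.py` does):

```python
import torch
from transformers import StoppingCriteria

class StopOnTurn(StoppingCriteria):
    """Stop once the generated text starts a new '### Human:' turn."""
    def __init__(self, tokenizer, stop_string="### Human:"):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode only the tail, so the prompt's own '### Human:' line
        # doesn't trigger an immediate stop
        tail = self.tokenizer.decode(input_ids[0][-10:])
        return self.stop_string in tail

stop = StopOnTurn(tok)
```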