---
license: mit
datasets:
- monsoon-nlp/asknyc-chatassistant-format
language:
- en
tags:
- reddit
- asknyc
- nyc
- llama2
---

# nyc-savvy-llama2-7b

Essentials:
- Based on LLaMa2-7b-hf (version 2, 7B params)
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/)
- Merged LLaMa2 and the adapter weights for this full-sized model (see the loading sketch below)
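
If you just want to try the merged model, here is a minimal loading sketch with `transformers` (assuming a machine with enough memory for a 7B model in bf16; `device_map="auto"` also assumes `accelerate` is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("monsoon-nlp/nyc-savvy-llama2-7b")
m = AutoModelForCausalLM.from_pretrained(
    "monsoon-nlp/nyc-savvy-llama2-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; remove to load on CPU
)
```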

## Prompt options

Here is the template used in training. Note that it starts with `### Human: ` (with a trailing space), then the post title and content, then `### Assistant: ` (no space or newline before it, but a trailing space after the colon).

`### Human: Post title - post content### Assistant: `

For example:

`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.`

From [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it looks helpful to add a more assistant-like preamble, especially if you follow their lead for a chat format:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
```
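
For reference, a minimal sketch of assembling a prompt in exactly this format (the `build_prompt` helper is illustrative, not part of this repo):

```python
def build_prompt(title: str, content: str, chat_preamble: bool = True) -> str:
    """Format a question the same way the training data was formatted."""
    prompt = ""
    if chat_preamble:
        # Optional assistant-style preamble, following QLoRA's Gradio example
        prompt += ("A chat between a curious human and an artificial intelligence "
                   "assistant. The assistant gives helpful, detailed, and polite "
                   "answers to the user's questions.\n")
    # "### Human: " keeps its trailing space; "### Assistant: " follows the
    # content with no separator before it and a trailing space after the colon
    prompt += f"### Human: {title} - {content}### Assistant: "
    return prompt

print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```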

## Training data

- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019)
- Downloaded from PushShift, accepted comments only if upvote scores >= 3
- Originally collected for my GPT-NYC model in spring 2021: https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d

## Training script

Takes about 2 hours on Colab once you get it right. QLoRA's script only lets you control training length via `max_steps`, but I wanted to stop at 1 epoch, so I set `max_steps` to cover roughly one epoch.
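(For where 760 comes from: with the per-device batch size of 1 and 16 gradient accumulation steps used below, 760 steps works out to 760 × 16 = 12,160 examples, roughly one pass over the ~13k-row dataset.)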

```
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
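
One note on the data flags: with `--dataset_format oasst1`, each JSONL row appears to be treated as a single block of text holding the full Human/Assistant exchange, which would explain why `--source_max_len` can be as small as 16 while `--target_max_len 512` carries the actual content.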

## Merging it back

What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.

The `peftmerger.py` script applies the adapter and saves the merged model like this:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_name = "../llama-2-7b-hf"           # base LLaMa2 weights (path from the training script)
adapters_name = "../nyc-savvy-llama2-7b"  # the QLoRA output_dir holding the adapter

# Load the base model in bf16 rather than 4-bit so the merge produces full-size weights
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    #device_map={"": 0},
)
m = PeftModel.from_pretrained(m, adapters_name)  # attach the LoRA adapter
m = m.merge_and_unload()                         # fold the adapter into the base weights
m.save_pretrained("nyc-savvy")
```
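
One follow-up: `save_pretrained` only writes the model weights, so saving the tokenizer next to them (e.g. `tok.save_pretrained("nyc-savvy")`, assuming `tok` is the LLaMa2 tokenizer) lets the directory load cleanly with both `AutoModelForCausalLM` and `AutoTokenizer`.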

## Testing that the model is NYC-savvy

You might wonder whether the model actually learned anything about NYC, or is the same old LLaMa2. Without your prompt adding any clues, try this from the `pefttester.py` script in this repo:

```python
from transformers import StoppingCriteriaList

# tok and m are the tokenizer and merged model loaded earlier in the script
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

input_ids = tok(messages, return_tensors="pt").input_ids

# ...

# sampling settings
temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,  # greedy decoding if temperature is zero
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),  # stop is defined in the elided section
)
for line in op:
    print(tok.decode(line))
```
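
The `stop` object above comes from the elided part of the script; here is a hypothetical sketch of one way to build something like it, halting generation once the model starts a new `### Human:` turn (the `StopOnTurn` class is an assumption, not necessarily what `pefttester.py` does):

```python
import torch
from transformers import StoppingCriteria

class StopOnTurn(StoppingCriteria):
    """Stop once the generated text starts a new '### Human:' turn."""
    def __init__(self, tokenizer, stop_string="### Human:"):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode only the tail, so the prompt's own '### Human:' line
        # doesn't trigger an immediate stop
        tail = self.tokenizer.decode(input_ids[0][-10:])
        return self.stop_string in tail

stop = StopOnTurn(tok)
```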