yongchao chen
committed on
Commit
·
6417eac
1
Parent(s):
886972e
add files
Browse files
README.md
CHANGED
|
@@ -1,4 +1,119 @@
|
|
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
---
|
| 4 |
-
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
---
|
| 4 |
+
# CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
|
| 5 |
+
<img src="./Figures/Tag.png" width="650px" alt="s" />
|
| 6 |
+
|
| 7 |
+
[Huggingface🤗](https://huggingface.co/yongchao98/CodeSteer-v1)
|
| 8 |
+
[Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link)
|
| 9 |
+
[Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link)
|
| 10 |
+
[SymBench Datasets](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather)
|
| 11 |
+
[SymBench Synthesis Scripts](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/benchmark)
|
| 12 |
+
|
| 13 |
+
## Contents
|
| 14 |
+
|
| 15 |
+
- [Framework](#Framework)
|
| 16 |
+
- [Inspirations](#Inspirations)
|
| 17 |
+
- [Performance](#Performance)
|
| 18 |
+
- [Environment_Setup](#Environment_Setup)
|
| 19 |
+
- [LLM_API_Key_Setup](#LLM_API_Key_Setup)
|
| 20 |
+
- [Train_and_Test_Models](#Train_and_Test_Models)
|
| 21 |
+
- [Assistance](#Assistance)
|
| 22 |
+
- [Citation](#Citation)
|
| 23 |
+
|
| 24 |
+
## Framework
|
| 25 |
+
<img src="./Figures/CodeSteer-intro.png" width="800px" alt="s" />
|
| 26 |
+
|
| 27 |
+
<p align="center" style="font-size: 16px;">
|
| 28 |
+
Figure: CodeSteer guides LLM code/text generation to integrate symbolic computing. At each interaction with TaskLLM, it reviews the current and previous answers, then provides guidance for the next round.
|
| 29 |
+
</p>
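
The interaction described in the caption can be sketched as a simple loop. This is only an illustrative sketch, not the repository's implementation: `run_guided_rounds`, the `FINALIZE` sentinel, the toy stand-in models, and the `max_rounds` default are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel: the steering model returns it to accept the latest answer.
FINALIZE = "<<FINALIZE>>"

# (guidance, answer) pairs from earlier rounds, oldest first.
History = List[Tuple[str, str]]


def run_guided_rounds(
    question: str,
    task_llm: Callable[[str, str], str],
    steer_llm: Callable[[str, History], str],
    max_rounds: int = 3,  # arbitrary cap chosen for this sketch
) -> str:
    """Alternate between a steering model and a task model for a few rounds."""
    history: History = []
    answer = ""
    for _ in range(max_rounds):
        # The steering model reviews all previous answers before guiding again.
        guidance = steer_llm(question, history)
        if guidance == FINALIZE:
            break
        answer = task_llm(question, guidance)
        history.append((guidance, answer))
    return answer


def toy_steer_llm(question: str, history: History) -> str:
    """Toy stand-in: try text first, then switch to code, then finalize."""
    if not history:
        return "Reason in text."
    if len(history) == 1:
        return "Write and run code instead."
    return FINALIZE


def toy_task_llm(question: str, guidance: str) -> str:
    """Toy stand-in: only the code path computes the exact result."""
    if "code" in guidance:
        return str(sum(range(1, 101)))
    return "about 5000"


if __name__ == "__main__":
    print(run_guided_rounds("Sum the integers 1..100", toy_task_llm, toy_steer_llm))
```

In a real setup the two callables would wrap actual model calls; the toy versions only make the control flow visible.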
|
| 30 |
+
|
| 31 |
+
## Inspirations
|
| 32 |
+
<img src="./Figures/LLM-makes-simple-mistakes-gather.png" width="800px" alt="s" />
|
| 33 |
+
<p align="center" style="font-size: 16px;">
|
| 34 |
+
Figure: Cases where GPT-4o makes simple mistakes with direct textual reasoning but reliably solves the problem when prompted to use code.
|
| 35 |
+
</p>
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
## Performance
|
| 39 |
+
We compare GPT-4o + CodeSteer with OpenAI o1 and DeepSeek R1 on SymBench, which comprises 28 seen tasks and 9 unseen tasks. GPT-4o + CodeSteer surpasses o1 (82.7), R1 (76.8), and o1-preview (74.8), highlighting the importance of integrating symbolic computing into LLMs.
|
| 40 |
+
|
| 41 |
+
<img src="./Figures/Table-results.png" width="800px" alt="s" />
|
| 42 |
+
|
| 43 |
+
The token cost and runtime of each method are shown below. GPT-4o + CodeSteer uses fewer tokens and less runtime than o1 and R1.
|
| 44 |
+
<img src="./Figures/Cost-token-runtime.png" width="800px" alt="s" />
|
| 45 |
+
|
| 46 |
+
## Environment_Setup
|
| 47 |
+
The fine-tuning and inference of CodeSteerLLM are based on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), with some modules modified by us.
|
| 48 |
+
```bash
|
| 49 |
+
git clone https://github.com/yongchao98/CodeSteer-v1.0.git
|
| 50 |
+
cd CodeSteer-v1.0
|
| 51 |
+
|
| 52 |
+
conda create -n CodeSteer python=3.10
|
| 53 |
+
conda activate CodeSteer
|
| 54 |
+
pip install -r requirements.txt
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
## LLM_API_Key_Setup
|
| 58 |
+
To use API-based LLMs as TaskLLM or CodeSteerLLM, you need to set up the corresponding API keys.
|
| 59 |
+
|
| 60 |
+
1. First, create a .env file in your project root:
|
| 61 |
+
```
|
| 62 |
+
OPENAI_API_KEY='your_key_here'
|
| 63 |
+
CLAUDE_API_KEY='your_key_here'
|
| 64 |
+
MIXTRAL_API_KEY='your_key_here'
|
| 65 |
+
DEEPSEEK_API_KEY='your_key_here'
|
| 66 |
+
```
|
| 67 |
+
2. Add this .env file to your .gitignore to prevent accidentally committing it:
|
| 68 |
+
```bash
|
| 69 |
+
echo ".env" >> .gitignore
|
| 70 |
+
```
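
As a rough illustration of how keys in such a `.env` file can reach a Python process, the sketch below parses `KEY='value'` lines with the standard library only. It is an assumption for illustration: the repository's scripts may load the keys differently (for example through a dotenv package), and `load_env_file` is a hypothetical helper, not part of the codebase.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=value lines and export them without overriding existing variables."""
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # nothing to load; the caller decides how to react
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)  # variables already set take precedence
    return loaded


if __name__ == "__main__":
    keys = load_env_file()
    # Report which keys were found without printing the secret values.
    print(sorted(keys))
```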
|
| 71 |
+
|
| 72 |
+
## Train_and_Test_Models
|
| 73 |
+
|
| 74 |
+
### Create_test_samples
|
| 75 |
+
The synthesized test samples for the 37 tasks of SymBench are in the [dataset_gather](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather) directory. You can also synthesize the samples yourself with tunable complexity, using the scripts in [create_dataset](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/create_dataset).
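
To make "tunable complexity" concrete, here is a toy generator in which one parameter controls difficulty. It is not one of the repository's synthesis scripts: the multiplication task, the `make_samples` function, and the `question`/`answer` field names are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List


def make_samples(num_samples: int, num_digits: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate toy multiplication questions; more digits means a harder sample."""
    rng = random.Random(seed)  # fixed seed keeps the generated set reproducible
    low, high = 10 ** (num_digits - 1), 10 ** num_digits - 1
    samples: List[Dict[str, str]] = []
    for _ in range(num_samples):
        a, b = rng.randint(low, high), rng.randint(low, high)
        samples.append({"question": f"What is {a} * {b}?", "answer": str(a * b)})
    return samples


if __name__ == "__main__":
    # Raising num_digits is the complexity knob in this sketch.
    for sample in make_samples(num_samples=2, num_digits=4):
        print(sample["question"], "->", sample["answer"])
```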
|
| 76 |
+
|
| 77 |
+
### Run inference without GPU, test closed LLM as CodeSteerLLM
|
| 78 |
+
We can directly use an unfinetuned model such as GPT-4o as CodeSteerLLM. In this case, run:
|
| 79 |
+
```bash
|
| 80 |
+
python benchmark_test_baseline.py
|
| 81 |
+
```
|
| 82 |
+
|
| 83 |
+
### Run inference with GPU, test finetuned CodeSteerLLM
|
| 84 |
+
We can run inference with Llama-3.1-8B on our own GPUs (the default setting in infer_CodeSteer.sh uses 4*H100 on the Harvard cluster; modify it freely for your own cluster settings). You can also download the [Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link) locally and change the path in llama3_8B_CodeSteer.yaml.
|
| 85 |
+
|
| 86 |
+
```bash
|
| 87 |
+
bash infer_CodeSteer.sh
|
| 88 |
+
# The default config file is ./llama3_8B_CodeSteer.yaml, which uses the model uploaded to Hugging Face.
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
### Finetuning CodeSteerLLM with synthesized data
|
| 92 |
+
Our synthesized datasets for both SFT and DPO finetuning are in [Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link).
|
| 93 |
+
We use LLaMA-Factory and DeepSpeed for the finetuning process. First, install LLaMA-Factory with:
|
| 94 |
+
```bash
|
| 95 |
+
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
|
| 96 |
+
cd LLaMA-Factory
|
| 97 |
+
pip install -e ".[torch,metrics]"
|
| 98 |
+
cd ..
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
Then run training with the following command (the default setting in train_llama3-8B-CodeSteer.sh uses 4*H100 on the Harvard cluster; modify it freely for your own cluster settings):
|
| 102 |
+
```bash
|
| 103 |
+
bash train_llama3-8B-CodeSteer.sh
|
| 104 |
+
```
|
| 105 |
+
|
| 106 |
+
## Assistance
|
| 107 |
+
|
| 108 |
+
We appreciate all feedback! Feel free to raise an issue for bugs, questions, or suggestions. You can also contact [Yongchao Chen](https://yongchao98.github.io/YongchaoChen/) and [Chuchu Fan](https://chuchu.mit.edu) with any questions or for discussion.
|
| 109 |
+
|
| 110 |
+
## Citation
|
| 111 |
+
```bibtex
|
| 112 |
+
@article{chen2024steering,
|
| 113 |
+
title={Steering Large Language Models between Code Execution and Textual Reasoning},
|
| 114 |
+
author={Chen, Yongchao and Jhamtani, Harsh and Sharma, Srinagesh and Fan, Chuchu and Wang, Chi},
|
| 115 |
+
journal={arXiv preprint arXiv:2410.03524},
|
| 116 |
+
year={2024}
|
| 117 |
+
}
|
| 118 |
+
```
|
| 119 |
+
|