# ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation (ACL 2025 Main)

🤗 Dataset (HuggingFace) | 🤖 Dataset (ModelScope) | 📑 [Paper](https://arxiv.org/abs/2501.06598)

This repository contains the code for training ChartCoder and running inference with it.

**Our GitHub repository [ChartCoder](https://github.com/thunlp/ChartCoder) is updated with more details and news.**

## Installation

1. Clone this repo

   ```
   git clone https://github.com/thunlp/ChartCoder.git
   ```

2. Create environment

   ```
   conda create -n chartcoder python=3.10 -y
   conda activate chartcoder
   pip install --upgrade pip  # enable PEP 660 support
   pip install -e .
   ```

3. Additional packages required for training

   ```
   pip install -e ".[train]"
   pip install flash-attn --no-build-isolation
   ```

## Train

Training consists of two stages: pre-training and supervised fine-tuning (SFT). Before training ChartCoder, download `siglip-so400m-patch14-384` and `deepseek-coder-6.7b-instruct`.

For **Pre-training**, run

```
bash scripts/train/pretrain_siglip.sh
```

For **SFT**, run

```
bash scripts/train/finetune_siglip_a4.sh
```

Change the model paths in these scripts to your local paths before running them. See the corresponding `.sh` file for details.

We also provide other training scripts, such as variants that use CLIP (`_clip` in the script name) and variants for multi-machine training (`_m` in the script name). See `scripts/train` for further information.

## Citation

If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:

```
@misc{zhao2025chartcoderadvancingmultimodallarge,
      title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation},
      author={Xuanle Zhao and Xianzhen Luo and Qi Shi and Chi Chen and Shuo Wang and Wanxiang Che and Zhiyuan Liu and Maosong Sun},
      year={2025},
      eprint={2501.06598},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.06598},
}
```
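
## Example: checking the base models before training

The Train section asks you to download `siglip-so400m-patch14-384` and `deepseek-coder-6.7b-instruct` and to point the training scripts at your local copies. The snippet below is a minimal sketch of a pre-flight check you could run first. It is not part of this repository. The `MODEL_ROOT` variable, the `./checkpoints` default, and both function names are hypothetical, and the check assumes each model was saved as a Hugging Face style directory containing a `config.json`.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative only, not shipped with ChartCoder).
# MODEL_ROOT is a hypothetical location: set it to wherever you stored the downloads.
MODEL_ROOT="${MODEL_ROOT:-./checkpoints}"

# Succeeds if the directory exists and contains a config.json.
# Assumption: the model was downloaded as a Hugging Face style checkpoint directory.
check_model_dir() {
  [ -d "$1" ] && [ -f "$1/config.json" ]
}

# Prints one status line per required base model.
# The return code is the number of missing models (0 means both were found).
check_base_models() {
  cbm_root="$1"
  cbm_missing=0
  for cbm_name in siglip-so400m-patch14-384 deepseek-coder-6.7b-instruct; do
    if check_model_dir "$cbm_root/$cbm_name"; then
      echo "found:   $cbm_root/$cbm_name"
    else
      echo "missing: $cbm_root/$cbm_name"
      cbm_missing=$((cbm_missing + 1))
    fi
  done
  return "$cbm_missing"
}

if check_base_models "$MODEL_ROOT"; then
  echo "Both base models found. Update the paths in the training scripts, then run them."
else
  echo "Download the missing models before running the training scripts."
fi
```

To use it, save the snippet under a name of your choice and run it with `MODEL_ROOT=/path/to/models bash <your-script>.sh`. It only reports what it finds. It does not download anything or modify the training scripts.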