Update README.md
README.md (CHANGED)
@@ -12,45 +12,34 @@ tags:
 
 This repository contains the model presented in the paper [**ChartCap: Mitigating Hallucination of Dense Chart Captioning**](https://huggingface.co/papers/2508.03164).
 
-**Project Page:** [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
+**Project Page:** (WIP) [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
 **Code:** [https://github.com/junyoung-00/ChartCap](https://github.com/junyoung-00/ChartCap)
 
 ## Model Description
 
-`ChartCap` is a
+`Phi-3.5-vision-instruct-ChartCap` is a ChartCap-fine-tuned version of microsoft/Phi-3.5-vision-instruct.
 
-The model aims to generate high-quality, dense captions for
-
-## Key Features
-
-* **Dense Chart Captioning:** Generates detailed, type-specific captions that highlight structural elements and key insights from charts.
-* **Hallucination Mitigation:** Designed to reduce the generation of extraneous information not discernible from the chart data.
-* **Real-world Data:** Fine-tuned on `ChartCap`, a large-scale dataset of 565K real-world chart images with high-quality, dense captions.
+The model aims to generate high-quality, dense captions for charts, ensuring that the generated text accurately captures structural elements and key insights discernible from the charts, while mitigating the inclusion of extraneous or hallucinated information.
 
 ## How to Use
 
-You can use the ChartCap model with the Hugging Face `transformers` library. The model is built upon a Phi-3.5-vision-instruct base, implying a multimodal conversation template.
-
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
 from PIL import Image
 import requests
 import torch
 
-# For example, if this model is hosted at `junyoung-00/ChartCap-Phi3V`, use "junyoung-00/ChartCap-Phi3V".
-model_id = "your_model_id"
+model_id = "junyoung-00/Phi-3.5-vision-instruct-ChartCap"
 
 processor = AutoProcessor.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
 
-#
-
-image_url = "https://junyoung-00.github.io/ChartCap/assets/images/teaser.png" # Example chart image from project page
+# Load an example chart image (URL or local path)
+image_url = "https://your-server.com/example_chart.png"
 image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
 
 # Define the prompt for dense chart captioning
-prompt = "
+prompt = "Please provide a detailed caption for the chart."
 messages = [
     {"role": "user", "content": f"<|image|>\n{prompt}"}
@@ -71,20 +60,15 @@ response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response.strip())
 ```
 
-## Dataset
-
-This model was fine-tuned on **ChartCap**, a large-scale dataset featuring 565K real-world chart images paired with type-specific, dense captions. The dataset generation pipeline ensures captions are derived solely from discernible chart data, emphasizing structural elements and key insights to mitigate hallucination.
-
 ## Citation
 
-If you find this model or the associated research helpful, please
+If you find this model or the associated research helpful, please cite:
 
 ```bibtex
-@
-title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
-author={Junyoung
-
-year={2025}
-
-}
+@inproceedings{lim2025chartcap,
+  title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
+  author={Junyoung Lim and Jaewoo Ahn and Gunhee Kim},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  year={2025}
+}
 ```
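The diff collapses the unchanged middle of the Python snippet (new lines 46-59), so everything between building `messages` and the `batch_decode` context line of the second hunk is hidden. Below is a minimal sketch of the steps that typically fill that gap for a Phi-3.5-vision-based checkpoint, continuing the snippet above. It assumes the stock `transformers` chat-template flow, not this fine-tune's confirmed interface; note that the base model normally also needs `trust_remote_code=True` in both `from_pretrained` calls and uses `<|image_1|>` as its image placeholder.

```python
# Hypothetical completion of the elided steps; assumes the standard
# Phi-3.5-vision chat-template flow, not the author's confirmed code.
chat_prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# The processor pairs the templated text with the list of images
inputs = processor(chat_prompt, [image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=processor.tokenizer.eos_token_id,
)
# Drop the prompt tokens so only the newly generated caption is decoded
generated_ids = generated_ids[:, inputs["input_ids"].shape[1]:]

response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response.strip())
```

For a local file, the image-loading line reduces to `image = Image.open("chart.png").convert("RGB")`, matching the "URL or local path" comment.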