junyoung-00 committed
Commit 43cf888 · verified · 1 Parent(s): 3c374ca

Update README.md

Files changed (1)
  1. README.md +14 -30
README.md CHANGED
@@ -12,45 +12,34 @@ tags:

This repository contains the model presented in the paper [**ChartCap: Mitigating Hallucination of Dense Chart Captioning**](https://huggingface.co/papers/2508.03164).

- **Project Page:** [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
+ **Project Page:** (WIP) [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
**Code:** [https://github.com/junyoung-00/ChartCap](https://github.com/junyoung-00/ChartCap)

## Model Description

- `ChartCap` is a vision-language model specifically fine-tuned for generating accurate, informative, and hallucination-free captions for charts. It addresses the challenges of existing chart captioning models by leveraging innovations in both data and a novel evaluation metric.
+ `Phi-3.5-vision-instruct-ChartCap` is a version of microsoft/Phi-3.5-vision-instruct fine-tuned on the ChartCap dataset.

- The model aims to generate high-quality, dense captions for various chart types, ensuring that the generated text accurately captures structural elements and key insights discernible from the charts, while mitigating the inclusion of extraneous or hallucinated information.
-
- ## Key Features
-
- * **Dense Chart Captioning:** Generates detailed, type-specific captions that highlight structural elements and key insights from charts.
- * **Hallucination Mitigation:** Designed to reduce the generation of extraneous information not discernible from the chart data.
- * **Real-world Data:** Fine-tuned on `ChartCap`, a large-scale dataset of 565K real-world chart images with high-quality, dense captions.
+ The model aims to generate high-quality, dense captions for charts, ensuring that the generated text accurately captures structural elements and key insights discernible from the charts, while mitigating the inclusion of extraneous or hallucinated information.

## How to Use

- You can use the ChartCap model with the Hugging Face `transformers` library. The model is built upon a Phi-3.5-vision-instruct base, so it follows that model's multimodal conversation template.
-
```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import requests
import torch

- # Replace "your_model_id" with the actual model ID from the Hugging Face Hub.
- # For example, if this model is hosted at `junyoung-00/ChartCap-Phi3V`, use "junyoung-00/ChartCap-Phi3V".
- model_id = "your_model_id"
+ model_id = "junyoung-00/Phi-3.5-vision-instruct-ChartCap"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

- # Example image: a bar chart (replace with your chart image URL or local path)
- # For a local image: image = Image.open("path/to/your/chart_image.png").convert("RGB")
- image_url = "https://junyoung-00.github.io/ChartCap/assets/images/teaser.png"  # Example chart image from project page
+ # Load an example chart image (URL or local path)
+ image_url = "https://your-server.com/example_chart.png"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# Define the prompt for dense chart captioning
- prompt = "Describe this chart in detail, focusing on its structural elements and key insights."
+ prompt = "Please provide a detailed caption for the chart."
messages = [
    {"role": "user", "content": f"<|image|>\n{prompt}"}
@@ -71,20 +60,15 @@ response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response.strip())
```

- ## Dataset
-
- This model was fine-tuned on **ChartCap**, a large-scale dataset featuring 565K real-world chart images paired with type-specific, dense captions. The dataset generation pipeline ensures captions are derived solely from discernible chart data, emphasizing structural elements and key insights to mitigate hallucination.
-
## Citation

- If you find this model or the associated research helpful, please consider citing the paper:
+ If you find this model or the associated research helpful, please cite:

```bibtex
- @article{Kim2025ChartCapMH,
-   title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
-   author={Junyoung Kim and Suhyang Gwon and Jonghun Kim and Hyeonseop Song and Seung-Hoon Na and Junmo Kim},
-   journal={arXiv preprint arXiv:2508.03164},
-   year={2025},
-   url={https://arxiv.org/abs/2508.03164}
- }
+ @inproceedings{lim2025chartcap,
+   title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
+   author={Junyoung Lim and Jaewoo Ahn and Gunhee Kim},
+   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+   year={2025}
+ }
```

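Note: the hunks above elide the README's generation step, between building `messages` and the final `print(response.strip())`. For readers who want to try the updated snippet end to end, here is a minimal, self-contained sketch. It is not the verbatim README code: the chat-template flow, the `trust_remote_code=True` flags, and the `<|image_1|>` placeholder are assumptions carried over from the microsoft/Phi-3.5-vision-instruct model card, and the image URL is the placeholder from the updated README.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "junyoung-00/Phi-3.5-vision-instruct-ChartCap"

# The Phi-3.5-vision base ships custom processing code, so trust_remote_code=True
# is assumed here; drop it if the fine-tuned repo does not require it.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Placeholder URL from the updated README; substitute a real chart image,
# or open a local file with Image.open("chart.png").convert("RGB").
image_url = "https://your-server.com/example_chart.png"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# The base model's chat template numbers image placeholders: <|image_1|>, <|image_2|>, ...
messages = [
    {"role": "user", "content": "<|image_1|>\nPlease provide a detailed caption for the chart."}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding: determinism matters more than diversity here
    eos_token_id=processor.tokenizer.eos_token_id,
)
# Drop the prompt tokens so only the newly generated caption is decoded
generated_ids = generated_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response.strip())
```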