junyoung-00, nielsr (HF Staff) committed
Commit 3c374ca · verified · 1 parent: e1e2eb1

Enhance model card for ChartCap with metadata, links, and usage example (#1)

- Enhance model card for ChartCap with metadata, links, and usage example (677d1665be94894e05d2718da5b94c92d1c8f2be)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1):
README.md (+90 −3)
README.md CHANGED
@@ -1,3 +1,90 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-text
+ library_name: transformers
+ tags:
+ - chart-captioning
+ - multimodal
+ - vision-language-model
+ ---
+
+ # ChartCap: Mitigating Hallucination of Dense Chart Captioning
+
+ This repository contains the model presented in the paper [**ChartCap: Mitigating Hallucination of Dense Chart Captioning**](https://huggingface.co/papers/2508.03164).
+
+ **Project Page:** [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
+ **Code:** [https://github.com/junyoung-00/ChartCap](https://github.com/junyoung-00/ChartCap)
+
+ ## Model Description
+
+ `ChartCap` is a vision-language model fine-tuned to generate accurate, informative, and hallucination-free captions for charts. It addresses the shortcomings of existing chart captioning models through a high-quality training dataset and a novel evaluation metric.
+
+ The model generates dense, type-specific captions for a variety of chart types, accurately capturing structural elements and key insights discernible from the chart while avoiding extraneous or hallucinated information.
+
+ ## Key Features
+
+ * **Dense Chart Captioning:** Generates detailed, type-specific captions that highlight structural elements and key insights from charts.
+ * **Hallucination Mitigation:** Designed to avoid generating information that is not discernible from the chart.
+ * **Real-World Data:** Fine-tuned on `ChartCap`, a large-scale dataset of 565K real-world chart images with high-quality, dense captions.
+
+ ## How to Use
+
+ You can use the ChartCap model with the Hugging Face `transformers` library. The model is built upon a Phi-3.5-vision-instruct base, so it follows that model's multimodal chat template and requires `trust_remote_code=True`.
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ from PIL import Image
+ import requests
+ import torch
+
+ # Replace "your_model_id" with the actual model ID from the Hugging Face Hub.
+ # For example, if this model is hosted at `junyoung-00/ChartCap-Phi3V`, use "junyoung-00/ChartCap-Phi3V".
+ model_id = "your_model_id"
+
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Example image: a chart (replace with your chart image URL or local path).
+ # For a local image: image = Image.open("path/to/your/chart_image.png").convert("RGB")
+ image_url = "https://junyoung-00.github.io/ChartCap/assets/images/teaser.png"  # example chart from the project page
+ image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
+
+ # Define the prompt for dense chart captioning.
+ # Phi-3.5-vision expects numbered image placeholders such as <|image_1|>.
+ prompt = "Describe this chart in detail, focusing on its structural elements and key insights."
+ messages = [
+     {"role": "user", "content": f"<|image_1|>\n{prompt}"}
+ ]
+
+ # Render the chat template to a string, then let the processor embed the image.
+ prompt_text = processor.tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ inputs = processor(prompt_text, [image], return_tensors="pt").to(model.device)
+
+ # Generate, then strip the prompt tokens before decoding.
+ generated_ids = model.generate(**inputs, max_new_tokens=512)
+ generated_ids = generated_ids[:, inputs["input_ids"].shape[1]:]
+
+ response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response.strip())
+ ```
+
+ ## Dataset
+
+ This model was fine-tuned on **ChartCap**, a large-scale dataset of 565K real-world chart images paired with type-specific, dense captions. The dataset generation pipeline derives captions solely from data discernible in the chart, emphasizing structural elements and key insights to mitigate hallucination.
+
+ ## Citation
+
+ If you find this model or the associated research helpful, please consider citing the paper:
+
+ ```bibtex
+ @article{Kim2025ChartCapMH,
+   title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
+   author={Junyoung Kim and Suhyang Gwon and Jonghun Kim and Hyeonseop Song and Seung-Hoon Na and Junmo Kim},
+   journal={arXiv preprint arXiv:2508.03164},
+   year={2025},
+   url={https://arxiv.org/abs/2508.03164}
+ }
+ ```