Update README.md
README.md (CHANGED)
@@ -12,45 +12,34 @@ tags:
 
 This repository contains the model presented in the paper [**ChartCap: Mitigating Hallucination of Dense Chart Captioning**](https://huggingface.co/papers/2508.03164).
 
-**Project Page:** [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
+**Project Page:** (WIP) [https://junyoung-00.github.io/ChartCap/](https://junyoung-00.github.io/ChartCap/)
 **Code:** [https://github.com/junyoung-00/ChartCap](https://github.com/junyoung-00/ChartCap)
 
 ## Model Description
 
-`ChartCap` is a
+`Phi-3.5-vision-instruct-ChartCap` is a ChartCap-fine-tuned version of microsoft/Phi-3.5-vision-instruct.
 
-The model aims to generate high-quality, dense captions for
-
-## Key Features
-
-* **Dense Chart Captioning:** Generates detailed, type-specific captions that highlight structural elements and key insights from charts.
-* **Hallucination Mitigation:** Designed to reduce the generation of extraneous information not discernible from the chart data.
-* **Real-world Data:** Fine-tuned on `ChartCap`, a large-scale dataset of 565K real-world chart images with high-quality, dense captions.
+The model aims to generate high-quality, dense captions for charts, ensuring that the generated text accurately captures structural elements and key insights discernible from the charts, while mitigating the inclusion of extraneous or hallucinated information.
 
 ## How to Use
 
-You can use the ChartCap model with the Hugging Face `transformers` library. The model is built upon a Phi-3.5-vision-instruct base, implying a multimodal conversation template.
-
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
 from PIL import Image
 import requests
 import torch
 
-# For example, if this model is hosted at `junyoung-00/ChartCap-Phi3V`, use "junyoung-00/ChartCap-Phi3V".
-model_id = "your_model_id"
+model_id = "junyoung-00/Phi-3.5-vision-instruct-ChartCap"
 
 processor = AutoProcessor.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
 
-#
-
-image_url = "https://junyoung-00.github.io/ChartCap/assets/images/teaser.png" # Example chart image from project page
+# Load an example chart image (URL or local path)
+image_url = "https://your-server.com/example_chart.png"
 image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
 
 # Define the prompt for dense chart captioning
-prompt = "
+prompt = "Please provide a detailed caption for the chart."
 messages = [
     {"role": "user", "content": f"<|image|>\n{prompt}"}
@@ -71,20 +60,15 @@ response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response.strip())
 ```
 
-## Dataset
-
-This model was fine-tuned on **ChartCap**, a large-scale dataset featuring 565K real-world chart images paired with type-specific, dense captions. The dataset generation pipeline ensures captions are derived solely from discernible chart data, emphasizing structural elements and key insights to mitigate hallucination.
-
 ## Citation
 
-If you find this model or the associated research helpful, please
+If you find this model or the associated research helpful, please cite:
 
 ```bibtex
-@
-title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
-author={Junyoung
-
-year={2025}
-
-}
+@inproceedings{lim2025chartcap,
+  title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
+  author={Junyoung Lim and Jaewoo Ahn and Gunhee Kim},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  year={2025}
+}
 ```
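The diff collapses the unchanged middle of the Python snippet (new lines 46-59), so everything between building `messages` and the `batch_decode` context line of the second hunk is hidden. Below is a minimal sketch of the steps that typically fill that gap for a Phi-3.5-vision-based checkpoint, continuing the snippet above. It assumes the stock `transformers` chat-template flow, not this fine-tune's confirmed interface; note that the base model normally also needs `trust_remote_code=True` in both `from_pretrained` calls and uses `<|image_1|>` as its image placeholder.

```python
# Hypothetical completion of the elided steps; assumes the standard
# Phi-3.5-vision chat-template flow, not the author's confirmed code.
chat_prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# The processor pairs the templated text with the list of images
inputs = processor(chat_prompt, [image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=processor.tokenizer.eos_token_id,
)
# Drop the prompt tokens so only the newly generated caption is decoded
generated_ids = generated_ids[:, inputs["input_ids"].shape[1]:]

response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response.strip())
```

For a local file, the image-loading line reduces to `image = Image.open("chart.png").convert("RGB")`, matching the "URL or local path" comment.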