mikewang nielsr HF Staff committed on
Commit f66bfef · verified · 1 Parent(s): 97c7c61

Add library name and pipeline tag (#1)

- Add library name and pipeline tag (97a9cdad3ecac3f51efa7f7e4185176af28c93c5)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -1,7 +1,9 @@
  ---
- license: apache-2.0
  datasets:
  - mikewang/PVD-160K
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-to-text
  ---

  <h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
@@ -19,7 +21,6 @@ datasets:

  </p>

-
  We observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.

  ![Teaser](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true)
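
As a quick sanity check on the metadata change in the diff above, the following is a minimal sketch of how the new front matter keys could be read back from the README text. The helper name `parse_front_matter` is hypothetical and not part of any library. It assumes the front matter is delimited by `---` lines and handles only flat `key: value` pairs and `- item` lists, not full YAML.

```python
# Minimal sketch (hypothetical helper, not a library API): read flat key/value
# pairs from a model card's YAML front matter.
# Assumption: the front matter is delimited by '---' lines and contains only
# simple "key: value" pairs and "- item" lists, as in the diff above.

def parse_front_matter(readme_text):
    lines = readme_text.splitlines()
    # No front matter unless the very first line is the '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: stop before the README body
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recently seen key.
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip()
    return meta


# Front matter as it appears after this commit (copied from the diff).
README = """---
datasets:
- mikewang/PVD-160K
license: apache-2.0
library_name: transformers
pipeline_tag: image-to-text
---

<h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
"""

metadata = parse_front_matter(README)
print(metadata["library_name"], metadata["pipeline_tag"])
```

For anything beyond a quick check, a real YAML parser would be the safer choice, since this sketch ignores nesting, quoting, and comments.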