mikewang/PVD-160k-Mistral-7b

text-generation

text-generation-inference


Add library name and pipeline tag

#1

by nielsr HF Staff - opened Jun 13

base: refs/heads/main ← from: refs/pr/1

README.md +3 -2

@@ -1,7 +1,9 @@
 ---
-license: apache-2.0
 datasets:
 - mikewang/PVD-160K
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-to-text
 ---
 <h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
@@ -19,7 +21,6 @@ datasets:
 </p>
 We observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.
 ![Teaser](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true)
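
To sanity-check the metadata block after this change, the front matter can be read back and the three keys inspected. The snippet below is only a minimal sketch: `read_front_matter` is a hypothetical helper written for illustration (it is not part of this PR or of any Hugging Face library), it uses only the standard library, and it handles just the flat `key: value` and `- item` forms that appear in this README rather than full YAML.

```python
# Hypothetical helper for illustration only; handles flat "key: value"
# pairs and "- item" lists, not arbitrary YAML.
def read_front_matter(readme_text):
    """Return top-level metadata from a README's leading '---' block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return metadata  # closing marker reached
        if stripped.startswith("- ") and current_key is not None:
            # List item: attach it to the most recently seen key.
            if not isinstance(metadata[current_key], list):
                metadata[current_key] = []
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            metadata[current_key] = value.strip()
    return {}  # front matter was never closed


# The metadata block as it reads after this PR is applied.
README = """---
datasets:
- mikewang/PVD-160K
license: apache-2.0
library_name: transformers
pipeline_tag: image-to-text
---
<h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
"""

metadata = read_front_matter(README)
for key in ("license", "library_name", "pipeline_tag"):
    print(key, "=", metadata.get(key))
```

For real use, a proper YAML parser would be the safer choice; the hand-rolled reader here just keeps the example dependency-free.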