Upload folder using huggingface_hub

Files changed:
- README.md +10 -24
- configuration_pointsv15_chat.py +0 -6
- preprocessor_config.json +4 -2
README.md
CHANGED
@@ -1,17 +1,3 @@
----
-datasets:
-- HuggingFaceM4/Docmatix
-- opendatalab/OmniDocBench
-language:
-- zh
-- en
-base_model:
-- Qwen/Qwen2.5-3B-Instruct
-- WePOINTS/POINTS-Qwen-2-5-7B-Chat
-tags:
-- vision-language
-- document-parsing
----
 <p align="center">
   <img src="images/logo.png" width="700"/>
 <p>
@@ -51,7 +37,7 @@ We are delighted to announce that the WePOINTS family has welcomed a new member:
 
 ## Results
 
-For comparison, we use the results reported by [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main):
+We take the following results from [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader for comparison:
 
 <table style="width: 92%; margin: auto; border-collapse: collapse;">
   <thead>
@@ -239,8 +225,8 @@ For comparison, we use the results reported by [OmniDocBench](https://github.com
       <td>0.641</td>
     </tr>
     <tr>
-      <td rowspan="
-      <td
+      <td rowspan="10">Expert VLMs</td>
+      <td>POINTS-Reader-3B</td>
       <td>0.133</td>
       <td>0.212</td>
       <td>0.062</td>
@@ -607,9 +593,9 @@ prompt = (
 image_path = '/path/to/your/local/image'
 model_path = 'tencent/POINTS-Reader'
 model = AutoModelForCausalLM.from_pretrained(model_path,
-
-
-
+                                             trust_remote_code=True,
+                                             torch_dtype=torch.float16,
+                                             device_map='cuda')
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 image_processor = Qwen2ImageProcessorForPOINTSV15.from_pretrained(model_path)
 content = [
@@ -647,8 +633,8 @@ We will create a Pull Request to SGLang, please stay tuned.
 
 ## Known Issues
 
-- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
-- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
+- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
+- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
 - **Multi-language Document Parsing**: POINTS-Reader currently supports only English and Chinese, limiting its effectiveness on other languages.
 
 ## Citation
@@ -659,7 +645,7 @@ If you use this model in your work, please cite the following paper:
 @article{points-reader,
   title={POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion},
   author={Liu, Yuan and Zhongyin Zhao and Tian, Le and Haicheng Wang and Xubing Ye and Yangxiu You and Zilin Yu and Chuhan Wu and Zhou, Xiao and Yu, Yang and Zhou, Jie},
-  journal={
+  journal={},
   year={2025}
 }
 
@@ -683,4 +669,4 @@ If you use this model in your work, please cite the following paper:
   journal={arXiv preprint arXiv:2405.11850},
   year={2024}
 }
-```
+```
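The new keyword arguments pin down how the README's quick-start loads the checkpoint: remote code enabled, fp16 weights, single-GPU placement. For readers of this commit, a complete round trip might look like the sketch below. It follows the POINTS-family chat convention; the prompt text, the generation-config keys, and the use of `AutoImageProcessor` as a stand-in for `Qwen2ImageProcessorForPOINTSV15` (whose import path is not shown in this diff) are assumptions, not part of the change.

```python
import torch
from transformers import AutoImageProcessor, AutoModelForCausalLM, AutoTokenizer

model_path = 'tencent/POINTS-Reader'

# Loading flags as added in this commit: remote code, fp16, single CUDA device.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             trust_remote_code=True,
                                             torch_dtype=torch.float16,
                                             device_map='cuda')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Stand-in loader; the README imports Qwen2ImageProcessorForPOINTSV15 directly
# from the checkpoint's remote code.
image_processor = AutoImageProcessor.from_pretrained(model_path,
                                                     trust_remote_code=True)

# Hypothetical prompt and content list; the README's own values are truncated
# in this diff.
image_path = '/path/to/your/local/image'
content = [
    dict(type='image', image=image_path),
    dict(type='text', text='Convert the document in the image to Markdown.'),
]
messages = [{'role': 'user', 'content': content}]
generation_config = {'max_new_tokens': 2048, 'do_sample': False}

# POINTS-family checkpoints expose a chat() helper through their remote code.
response = model.chat(messages, tokenizer, image_processor, generation_config)
print(response)
```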
configuration_pointsv15_chat.py
CHANGED
@@ -27,9 +27,3 @@ class POINTSV15ChatConfig(PretrainedConfig):
             self.llm_config = Qwen2Config(**llm_config)
         else:
             self.llm_config = llm_config
-
-    def to_dict(self) -> Dict[str, Any]:
-        output = copy.deepcopy(self.__dict__)
-        output["vision_config"] = self.vision_config.to_dict()
-        output["llm_config"] = self.llm_config.to_dict()
-        return output
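The deleted `to_dict` override duplicated behavior that recent `transformers` releases already provide: `PretrainedConfig.to_dict` converts nested config objects (the CLIP-style composite-config path) to plain dicts on its own, so the hand-rolled copy was dead weight and one more place to drift out of sync. If `copy`, `Dict`, and `Any` were imported only for this method, they can be dropped as well. A quick sanity check, assuming a transformers version with that nested-config handling, might look like:

```python
# Minimal sketch: verify nested configs still serialize without the override.
# Assumes a recent transformers where PretrainedConfig.to_dict converts
# sub-configs to plain dicts by itself.
from transformers import AutoConfig

config = AutoConfig.from_pretrained('tencent/POINTS-Reader', trust_remote_code=True)
d = config.to_dict()

# These are exactly the two fields the removed method used to convert by hand.
assert isinstance(d['vision_config'], dict)
assert isinstance(d['llm_config'], dict)
```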
preprocessor_config.json
CHANGED
@@ -22,8 +22,10 @@
   "rescale_factor": 0.00392156862745098,
   "size": {
     "max_pixels": 12845056,
-    "min_pixels": 3136
+    "min_pixels": 3136,
+    "longest_edge": 12845056,
+    "shortest_edge": 3136
   },
   "temporal_patch_size": 2,
   "processor_class": "Qwen2VLProcessor"
-}
+}
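Keeping both key pairs in `size` looks like a compatibility shim: older Qwen2-VL image processors read `min_pixels`/`max_pixels`, while newer `transformers` releases expect `shortest_edge`/`longest_edge`, and both pairs here encode the same pixel budget (3136 = 56 x 56 at the low end, 12845056 at the high end). A rough illustration of the resize rule that budget implies, modeled on Qwen2-VL's smart-resize behavior (the helper below is an illustrative re-implementation, not the library's code):

```python
import math

def smart_resize_sketch(height: int, width: int,
                        factor: int = 28,
                        min_pixels: int = 3136,
                        max_pixels: int = 12845056) -> tuple[int, int]:
    """Snap each side to the 28px patch grid, then rescale so the total
    pixel count stays within [min_pixels, max_pixels]."""
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        scale = math.sqrt(height * width / max_pixels)
        h = math.floor(height / scale / factor) * factor
        w = math.floor(width / scale / factor) * factor
    elif h * w < min_pixels:
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w

# A 6000x4000 scan exceeds the 12845056-pixel cap, so it is scaled down to
# 4368x2912 (about 12.7M pixels) before patching.
print(smart_resize_sketch(6000, 4000))
```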