nielsr (HF Staff) committed
Commit 774ee03 · verified · 1 Parent(s): f4c9b7f

Improve model card: Add pipeline tag, library, links, and usage example


This PR significantly enhances the model card for **PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning** by:

- Adding the `pipeline_tag: image-text-to-text`, which accurately categorizes the model as a multimodal large language model capable of processing images and text to generate text. This improves its discoverability on the Hugging Face Hub (e.g., at https://huggingface.co/models?pipeline_tag=image-text-to-text).
- Specifying `library_name: transformers`, enabling the convenient "Use in Transformers" widget directly on the model page and providing standard loading instructions for users.
- Updating the paper link to the official Hugging Face Papers page (https://huggingface.co/papers/2507.06448) for better integration within the Hub's ecosystem.
- Including direct links to the project page (https://mikewangwzhl.github.io/PAPO/) and the GitHub repository (https://github.com/mikewangwzhl/PAPO) for users to find more context and the source code.
- Adding a practical Python code snippet for inference using the `transformers` library, allowing users to quickly get started with the model.

These updates aim to provide a more comprehensive, user-friendly, and discoverable model card.
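The `license`, `pipeline_tag`, and `library_name` keys added by this PR live in the YAML front matter at the top of README.md, which the Hub reads to drive discoverability filters and the "Use in Transformers" widget. As a minimal sketch of what those keys look like once merged, the snippet below parses them back out of the front matter; the `front_matter` helper and the hard-coded README text are illustrative only, not part of the Hub tooling:

```python
# Illustrative README front matter, matching the metadata this PR adds.
README = """---
datasets:
- PAPOGalaxy/PAPO_train
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---

# PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning
"""

def front_matter(text: str) -> dict:
    """Parse the simple `key: value` pairs between the leading '---' fences."""
    block = text.split("---")[1]
    meta = {}
    for line in block.strip().splitlines():
        # Skip YAML list items (e.g. the datasets entries); keep scalar keys.
        if ":" in line and not line.startswith("-"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

meta = front_matter(README)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

In practice the Hub parses this block with a full YAML parser; the point here is only which keys the PR introduces and where they sit in the file.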

Files changed (1): README.md (+47 -5)
````diff
@@ -1,14 +1,56 @@
 ---
-license: mit
 datasets:
 - PAPOGalaxy/PAPO_train
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
-# PAPO Model
-
-## Model Source
-This is the official model released for our paper **PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning** (arxiv.org/abs/2507.06448)
+# PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning
+
+This is the official model released for our paper [Perception-Aware Policy Optimization for Multimodal Reasoning](https://huggingface.co/papers/2507.06448).
+
+**Project Page:** [https://mikewangwzhl.github.io/PAPO/](https://mikewangwzhl.github.io/PAPO/)
+**Code:** [https://github.com/mikewangwzhl/PAPO](https://github.com/mikewangwzhl/PAPO)
 
 ## Model Version
-PAPO (γ=0.01)
+PAPO (γ=0.01)
+
+## Usage
+
+You can use this model with the Hugging Face `transformers` library.
+
+```python
+from transformers import AutoProcessor, AutoModelForCausalLM
+from PIL import Image
+import requests
+
+# Replace "PAPOGalaxy/PAPO" with the actual model ID if different
+# For example, if it's PAPOGalaxy/PAPO-7B or PAPOGalaxy/PAPO-3B
+model_id = "PAPOGalaxy/PAPO"
+
+processor = AutoProcessor.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+# Example image (replace with your own image path or URL)
+image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bee.JPG"
+image = Image.open(requests.get(image_url, stream=True).raw)
+
+# Example prompt
+prompt = "What is in the image?"
+
+# Prepare inputs following the model's chat template
+messages = [
+    {"role": "user", "content": [
+        {"type": "image", "image": image},
+        {"type": "text", "text": prompt}
+    ]}
+]
+text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
+
+# Generate response
+generated_ids = model.generate(**inputs, max_new_tokens=100)
+generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(generated_text)
+```
````