---
tags:
- paligemma
- lora
- adapter
- visual-question-answering
- image-to-text
base_model: google/paligemma2-3b-mix-224
widget:
- text: "<image>\nQuestion: What is in this image?\nAnswer:"
---

# paligemma2-3b-lora-vqa-d1000-r24

This is a LoRA adapter for PaliGemma-2 3B (google/paligemma2-3b-mix-224), fine-tuned for visual question answering on the VizWiz dataset.

## Usage

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Base model and adapter
base_model_id = "google/paligemma2-3b-mix-224"
adapter_id = "yu3733/paligemma2-3b-lora-vqa-d1000-r24"

# Load processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load base model (PaliGemma is a vision-language model, so it uses the
# conditional-generation class rather than AutoModelForCausalLM)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)

# Inference
image = Image.open("example.jpg").convert("RGB")  # placeholder path: any RGB image
prompt = "<image>\nQuestion: What is in this image?\nAnswer:"
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.float16).to(model.device)  # match the model's dtype and device
outputs = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
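
To serve the adapter without keeping PEFT in the inference path, the LoRA weights can be folded into the base model. This is a minimal sketch using PEFT's `merge_and_unload()`, continuing from the loading code above; the output directory name is hypothetical.

```python
# Optional: merge the LoRA weights into the base model for faster inference.
# merge_and_unload() returns the underlying PaliGemmaForConditionalGeneration
# with the adapter deltas folded into the base weights.
merged = model.merge_and_unload()
merged.save_pretrained("paligemma2-3b-vqa-merged")     # hypothetical output dir
processor.save_pretrained("paligemma2-3b-vqa-merged")  # keep the processor alongside
```

Merging removes the small per-step overhead of applying adapter deltas, at the cost of no longer being able to hot-swap adapters.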

## Training Details

- Base Model: google/paligemma2-3b-mix-224
- Training Data: VizWiz VQA Dataset
- LoRA Rank: 24 (see the configuration sketch below)
- Training Framework: PEFT + Transformers
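
For reference, a `LoraConfig` consistent with the details above might look like the sketch below. Only the rank (24) comes from this card; the alpha, dropout, and target modules are assumed values, not the exact training recipe.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=24,                         # LoRA rank (from this card)
    lora_alpha=48,                # assumed: common 2x-rank setting
    lora_dropout=0.05,            # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```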

## License

Same as the base model (see google/paligemma2-3b-mix-224).