jupyterjazz committed
Commit 3a7d083 · 1 Parent(s): 18e0a83

add README

Signed-off-by: jupyterjazz <[email protected]>

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
<br><br>

<p align="center">
  <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
  <b>The embedding model trained by <a href="https://jina.ai/">Jina AI</a>.</b>
</p>

# [Jina Embeddings v4](https://huggingface.co/jinaai/jina-embeddings-v4): Universal Embeddings for Multimodal Multilingual Retrieval

[Blog](https://jina.ai/news/jina-embeddings-v4-universal-embeddings-for-multimodal-multilingual-retrieval) | [Technical Report](https://arxiv.org/abs/2506.18902) | [API](https://jina.ai/embeddings)

## Model Overview

This repository hosts a vLLM-compatible version of [`jina-embeddings-v4`](https://huggingface.co/jinaai/jina-embeddings-v4) with the retrieval adapter merged into the base `Qwen2.5-VL` weights. This architecture modification enables native compatibility with vLLM without requiring custom adapter-handling code.
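For readers unfamiliar with adapter merging, the general pattern is to fold LoRA adapter weights into the base model and save the result as a standalone checkpoint. The snippet below is only an illustrative sketch of that pattern using the PEFT library; the base model id and adapter path are assumptions, not the exact procedure used to produce this repository.

```python
# Illustrative sketch only: fold a LoRA adapter into its base model so the
# merged checkpoint can be served without adapter-handling code.
# The base model id and adapter path are assumptions/placeholders.
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

base = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
model = PeftModel.from_pretrained(base, "<path_to_retrieval_lora_adapter>")

# merge_and_unload() bakes the adapter weights into the base weights
merged = model.merge_and_unload()
merged.save_pretrained("<output_dir_for_merged_checkpoint>")
```

Because the adapter is already merged in this repository, no such step is needed at inference time; vLLM can load the checkpoint directly as shown below.
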
## Usage

The example below embeds a text query, a text passage, and an image with vLLM, then mean-pools the token-level outputs (vision tokens only for images) and L2-normalizes them to obtain the final embeddings.

```python
import torch
from PIL import Image

from vllm import LLM
from vllm.config import PoolerConfig
from vllm.inputs.data import TextPrompt

# Initialize the model
model = LLM(
    model="jinaai/jina-embeddings-v4-vllm-retrieval",
    task="embed",
    enforce_eager=True,
    override_pooler_config=PoolerConfig(pooling_type="ALL", normalize=False),
    dtype="float16",
)

# Create text prompts
query = "Overview of climate change impacts on coastal cities"
query_prompt = TextPrompt(prompt=f"Query: {query}")

passage = "The impacts of climate change on coastal cities are significant.."
passage_prompt = TextPrompt(prompt=f"Passage: {passage}")

# Create an image prompt
image = Image.open("<path_to_image>")
image_prompt = TextPrompt(
    prompt="<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe the image.<|im_end|>\n",
    multi_modal_data={"image": image},
)

# Encode all prompts
prompts = [query_prompt, passage_prompt, image_prompt]
outputs = model.encode(prompts)


def get_embeddings(outputs):
    VISION_START_TOKEN_ID, VISION_END_TOKEN_ID = 151652, 151653

    embeddings = []
    for output in outputs:
        if VISION_START_TOKEN_ID in output.prompt_token_ids:
            # Pool over the vision tokens only
            img_start_pos = torch.where(
                torch.tensor(output.prompt_token_ids) == VISION_START_TOKEN_ID
            )[0][0]
            img_end_pos = torch.where(
                torch.tensor(output.prompt_token_ids) == VISION_END_TOKEN_ID
            )[0][0]
            embeddings_tensor = output.outputs.data.detach().clone()[
                img_start_pos : img_end_pos + 1
            ]
        else:
            # Use all tokens for text-only prompts
            embeddings_tensor = output.outputs.data.detach().clone()

        # Mean-pool over the selected tokens and L2-normalize
        pooled_output = (
            embeddings_tensor.sum(dim=0, dtype=torch.float32)
            / embeddings_tensor.shape[0]
        )
        embeddings.append(torch.nn.functional.normalize(pooled_output, dim=-1))
    return embeddings


embeddings = get_embeddings(outputs)
```
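Since the pooled embeddings are L2-normalized, the dot product of two embeddings is their cosine similarity. A minimal follow-up sketch, continuing from the `embeddings` list produced above:

```python
# `embeddings` follows the order of `prompts`: query, passage, image
query_emb, passage_emb, image_emb = embeddings

# Dot products of L2-normalized vectors are cosine similarities
text_similarity = torch.dot(query_emb, passage_emb).item()
image_similarity = torch.dot(query_emb, image_emb).item()

print(f"query vs. passage: {text_similarity:.4f}")
print(f"query vs. image:   {image_similarity:.4f}")
```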