|
--- |
|
tags: |
|
- vidore |
|
- colpali |
|
- multimodal-embedding |
|
- multilingual-embedding |
|
- Text-to-Visual Document (T→VD) retrieval |
|
- feature-extraction |
|
- sentence-similarity |
|
- mteb |
|
- sentence-transformers |
|
language: |
|
- multilingual |
|
inference: false |
|
library_name: transformers |
|
pipeline_tag: visual-document-retrieval |
|
--- |
|
<br><br> |
|
|
|
<p align="center"> |
|
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px"> |
|
</p> |
|
|
|
|
|
<p align="center"> |
|
<b>The embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b> |
|
</p> |
|
|
|
# Jina Embeddings v4: Universal Embeddings for Multimodal Multilingual Retrieval |
|
|
|
|
|
[GGUF](https://github.com/jina-ai/jina-embeddings-v4-gguf) | [Blog](https://jina.ai/news/jina-embeddings-v4-universal-embeddings-for-multimodal-multilingual-retrieval) | [Technical Report](https://arxiv.org/abs/2506.18902) | [API](https://jina.ai/embeddings) |
|
|
|
|
|
## Intended Usage & Model Info |
|
`jina-embeddings-v4` is a universal embedding model for multimodal and multilingual retrieval. |
|
The model is specially designed for complex document retrieval, including visually rich documents with charts, tables, and illustrations. |
|
|
|
|
|
Built on [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), `jina-embeddings-v4` features: |
|
|
|
- **Unified embeddings** for text, images, and visual documents, supporting both dense (single-vector) and late-interaction (multi-vector) retrieval. |
|
- **Multilingual support** (30+ languages) and compatibility with a wide range of domains, including technical and visually complex documents. |
|
- **Task-specific adapters** for retrieval, text matching, and code-related tasks, which can be selected at inference time. |
|
- **Flexible embedding size**: dense embeddings are 2048 dimensions by default but can be truncated to as few as 128 dimensions with minimal performance loss.
|
|
|
|
|
Summary of features: |
|
|
|
| Feature | Jina Embeddings V4 | |
|
|------------|------------| |
|
| Base Model | Qwen2.5-VL-3B-Instruct | |
|
| Supported Tasks | `retrieval`, `text-matching`, `code` | |
|
| Model DType | BFloat16 |
|
| Max Sequence Length | 32768 | |
|
| Single-Vector Dimension | 2048 | |
|
| Multi-Vector Dimension | 128 | |
|
| Matryoshka dimensions | 128, 256, 512, 1024, 2048 | |
|
| Pooling Strategy | Mean pooling | |
|
| Attention Mechanism | FlashAttention2 | |
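
The Matryoshka dimensions above mean dense embeddings can be shortened without re-encoding. Below is a minimal sketch of the two options: the `truncate_dim` argument is an encode-time option described in the Usage section, while the manual slice-and-renormalize step is a generic Matryoshka recipe rather than an API specific to this model.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v4", trust_remote_code=True, torch_dtype=torch.float16
)
model.to("cuda")

# Option 1: let the model truncate for you (truncate_dim is an encode-time argument)
emb_512 = model.encode_text(
    texts=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
    truncate_dim=512,
)

# Option 2: truncate a full 2048-d embedding yourself and re-normalize
emb_full = model.encode_text(
    texts=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
)
emb_128 = torch.nn.functional.normalize(
    torch.as_tensor(emb_full[0])[:128], p=2, dim=-1
)
```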
|
|
|
|
|
|
|
## Training & Evaluation |
|
|
|
Please refer to the [jina-embeddings-v4 technical report](https://arxiv.org/abs/2506.18902) for training details and benchmark results.
|
|
|
|
|
## Usage |
|
|
|
<details> |
|
<summary>Requirements</summary>
|
|
|
The following Python packages are required: |
|
|
|
- `transformers>=4.52.0` |
|
- `torch>=2.6.0` |
|
- `peft>=0.15.2` |
|
- `torchvision` |
|
- `pillow` |
|
|
|
### Optional / Recommended |
|
- **flash-attention**: Installing [flash-attention](https://github.com/Dao-AILab/flash-attention) is recommended for faster, more memory-efficient inference, but it is not mandatory (see the loading sketch after this list).
|
- **sentence-transformers**: If you want to use the model via the `sentence-transformers` interface, install this package as well. |
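
If flash-attention is installed, it can be requested explicitly when loading the model. This is a minimal sketch using the standard `attn_implementation` argument of `transformers`; it is not specific to this model, and the argument can simply be omitted if flash-attention is not available.

```python
# Optional: pip install flash-attn --no-build-isolation
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v4",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # omit this line if flash-attn is not installed
)
model.to("cuda")
```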
|
|
|
</details> |
|
|
|
|
|
<details> |
|
<summary>via <a href="https://jina.ai/embeddings/">Jina AI Embeddings API</a></summary> |
|
|
|
|
|
```bash |
|
curl https://api.jina.ai/v1/embeddings \ |
|
-H "Content-Type: application/json" \ |
|
-H "Authorization: Bearer $JINA_AI_API_TOKEN" \ |
|
-d @- <<EOFEOF |
|
{ |
|
"model": "jina-embeddings-v4", |
|
"task": "text-matching", |
|
"input": [ |
|
{ |
|
"text": "غروب جميل على الشاطئ" |
|
}, |
|
{ |
|
"text": "海滩上美丽的日落" |
|
}, |
|
{ |
|
"text": "A beautiful sunset over the beach" |
|
}, |
|
{ |
|
"text": "Un beau coucher de soleil sur la plage" |
|
}, |
|
{ |
|
"text": "Ein wunderschöner Sonnenuntergang am Strand" |
|
}, |
|
{ |
|
"text": "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία" |
|
}, |
|
{ |
|
"text": "समुद्र तट पर एक खूबसूरत सूर्यास्त" |
|
}, |
|
{ |
|
"text": "Un bellissimo tramonto sulla spiaggia" |
|
}, |
|
{ |
|
"text": "浜辺に沈む美しい夕日" |
|
}, |
|
{ |
|
"text": "해변 위로 아름다운 일몰" |
|
}, |
|
{ |
|
"image": "https://i.ibb.co/nQNGqL0/beach1.jpg" |
|
}, |
|
{ |
|
"image": "https://i.ibb.co/r5w8hG8/beach2.jpg" |
|
} |
|
] |
|
} |
|
EOFEOF |
|
``` |
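
The same request can be issued from Python. This is a small sketch assuming the `requests` package and a `JINA_AI_API_TOKEN` environment variable, mirroring the payload of the curl call above; the embeddings are read from the `data` field of the JSON response.

```python
import os
import requests

response = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['JINA_AI_API_TOKEN']}",
    },
    json={
        "model": "jina-embeddings-v4",
        "task": "text-matching",
        "input": [
            {"text": "A beautiful sunset over the beach"},
            {"image": "https://i.ibb.co/nQNGqL0/beach1.jpg"},
        ],
    },
)
response.raise_for_status()
embeddings = [item["embedding"] for item in response.json()["data"]]
```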
|
|
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary> |
|
|
|
```python |
|
# !pip install "transformers>=4.52.0" "torch>=2.6.0" "peft>=0.15.2" torchvision pillow
|
from transformers import AutoModel |
|
import torch |
|
|
|
# Initialize the model |
|
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v4", trust_remote_code=True, torch_dtype=torch.float16) |
|
|
|
model.to("cuda") |
|
|
|
# ======================== |
|
# 1. Retrieval Task |
|
# ======================== |
|
# Configure truncate_dim, max_length (for texts), max_pixels (for images), vector_type, and batch_size in the encode functions as needed
|
|
|
# Encode query |
|
query_embeddings = model.encode_text( |
|
texts=["Overview of climate change impacts on coastal cities"], |
|
task="retrieval", |
|
prompt_name="query", |
|
) |
|
|
|
# Encode passage (text) |
|
passage_embeddings = model.encode_text( |
|
texts=[ |
|
"Climate change has led to rising sea levels, increased frequency of extreme weather events..." |
|
], |
|
task="retrieval", |
|
prompt_name="passage", |
|
) |
|
|
|
# Encode image/document |
|
image_embeddings = model.encode_image( |
|
images=["https://i.ibb.co/nQNGqL0/beach1.jpg"], |
|
task="retrieval", |
|
) |
|
|
|
# ======================== |
|
# 2. Text Matching Task |
|
# ======================== |
|
texts = [ |
|
"غروب جميل على الشاطئ", # Arabic |
|
"海滩上美丽的日落", # Chinese |
|
"Un beau coucher de soleil sur la plage", # French |
|
"Ein wunderschöner Sonnenuntergang am Strand", # German |
|
"Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία", # Greek |
|
"समुद्र तट पर एक खूबसूरत सूर्यास्त", # Hindi |
|
"Un bellissimo tramonto sulla spiaggia", # Italian |
|
"浜辺に沈む美しい夕日", # Japanese |
|
"해변 위로 아름다운 일몰", # Korean |
|
] |
|
|
|
text_embeddings = model.encode_text(texts=texts, task="text-matching") |
|
|
|
# ======================== |
|
# 3. Code Understanding Task |
|
# ======================== |
|
|
|
# Encode query |
|
query_embedding = model.encode_text( |
|
texts=["Find a function that prints a greeting message to the console"], |
|
task="code", |
|
prompt_name="query", |
|
) |
|
|
|
# Encode code |
|
code_embeddings = model.encode_text( |
|
texts=["def hello_world():\n print('Hello, World!')"], |
|
task="code", |
|
prompt_name="passage", |
|
) |
|
|
|
# ======================== |
|
# 4. Use multivectors |
|
# ======================== |
|
|
|
multivector_embeddings = model.encode_text( |
|
texts=texts, |
|
task="retrieval", |
|
prompt_name="query", |
|
return_multivector=True, |
|
) |
|
|
|
images = ["https://i.ibb.co/nQNGqL0/beach1.jpg", "https://i.ibb.co/r5w8hG8/beach2.jpg"] |
|
multivector_image_embeddings = model.encode_image( |
|
images=images, |
|
task="retrieval", |
|
return_multivector=True, |
|
) |
|
``` |
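
Single-vector embeddings are compared with cosine similarity, while multi-vector embeddings use a late-interaction (MaxSim) score. The sketch below continues from the snippet above; it assumes the encode functions return tensors/arrays of shape `(n, dim)` for single-vector output and a per-input sequence of `(num_tokens, 128)` token embeddings for multi-vector output.

```python
import torch
import torch.nn.functional as F

# Cosine similarity between the dense query and passage embeddings from section 1
q = F.normalize(torch.stack([torch.as_tensor(e) for e in query_embeddings]), p=2, dim=-1)
p = F.normalize(torch.stack([torch.as_tensor(e) for e in passage_embeddings]), p=2, dim=-1)
dense_scores = q @ p.T  # (num_queries, num_passages)

# Late-interaction (MaxSim) score between multi-vector embeddings from section 4
def maxsim(query_tokens: torch.Tensor, doc_tokens: torch.Tensor) -> torch.Tensor:
    sim = query_tokens @ doc_tokens.T      # token-to-token similarities
    return sim.max(dim=-1).values.sum()    # best document token per query token, summed

score = maxsim(
    torch.as_tensor(multivector_embeddings[0]),
    torch.as_tensor(multivector_image_embeddings[0]),
)
```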
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://sbert.net/">sentence-transformers</a></summary> |
|
|
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Initialize the model |
|
model = SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True) |
|
# ======================== |
|
# 1. Retrieval Task |
|
# ======================== |
|
# Encode query |
|
query_embeddings = model.encode( |
|
sentences=["Overview of climate change impacts on coastal cities"], |
|
task="retrieval", |
|
prompt_name="query", |
|
) |
|
|
|
print(f"query_embeddings.shape = {query_embeddings.shape}") |
|
|
|
# Encode passage (text) |
|
passage_embeddings = model.encode( |
|
sentences=[ |
|
"Climate change has led to rising sea levels, increased frequency of extreme weather events..." |
|
], |
|
task="retrieval", |
|
prompt_name="passage", |
|
) |
|
|
|
print(f"passage_embeddings.shape = {passage_embeddings.shape}") |
|
|
|
# Encode image/document |
|
image_embeddings = model.encode( |
|
sentences=["https://i.ibb.co/nQNGqL0/beach1.jpg"], |
|
task="retrieval", |
|
) |
|
|
|
print(f"image_embeddings.shape = {image_embeddings.shape}") |
|
|
|
# ======================== |
|
# 2. Text Matching Task |
|
# ======================== |
|
texts = [ |
|
"غروب جميل على الشاطئ", # Arabic |
|
"海滩上美丽的日落", # Chinese |
|
"Un beau coucher de soleil sur la plage", # French |
|
"Ein wunderschöner Sonnenuntergang am Strand", # German |
|
"Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία", # Greek |
|
"समुद्र तट पर एक खूबसूरत सूर्यास्त", # Hindi |
|
"Un bellissimo tramonto sulla spiaggia", # Italian |
|
"浜辺に沈む美しい夕日", # Japanese |
|
"해변 위로 아름다운 일몰", # Korean |
|
] |
|
|
|
text_embeddings = model.encode(sentences=texts, task="text-matching") |
|
|
|
# ======================== |
|
# 3. Code Understanding Task |
|
# ======================== |
|
|
|
# Encode query |
|
query_embeddings = model.encode( |
|
sentences=["Find a function that prints a greeting message to the console"], |
|
task="code", |
|
prompt_name="query", |
|
) |
|
|
|
# Encode code |
|
code_embeddings = model.encode( |
|
sentences=["def hello_world():\n print('Hello, World!')"], |
|
task="code", |
|
prompt_name="passage", |
|
) |
|
|
|
# ======================== |
|
# 4. Use multivectors |
|
# ======================== |
|
# If you want to use multi-vector embeddings, please use the Hugging Face model directly. |
|
``` |
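
Embeddings returned by `model.encode` are plain vectors, so they can be scored with the `similarity` helper available in recent `sentence-transformers` versions (cosine similarity by default). A short continuation using the retrieval outputs from section 1:

```python
# Score the retrieval passage and image from section 1 against a fresh query embedding
query_emb = model.encode(
    sentences=["Overview of climate change impacts on coastal cities"],
    task="retrieval",
    prompt_name="query",
)
print(model.similarity(query_emb, passage_embeddings))  # (1, 1) cosine-similarity matrix
print(model.similarity(query_emb, image_embeddings))    # (1, 1) cosine-similarity matrix
```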
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://github.com/vllm-project/vllm">vLLM</a></summary> |
|
|
|
We provide separate model versions for each task (`retrieval`, `text-matching`, `code`), in which the task-specific adapter is merged into the base `Qwen2.5-VL` weights.
This makes the models natively compatible with vLLM.
|
|
|
Instructions and usage examples for each task are available in their respective directories: |
|
- [jina-embeddings-v4-vllm-retrieval](https://huggingface.co/jinaai/jina-embeddings-v4-vllm-retrieval) |
|
- [jina-embeddings-v4-vllm-text-matching](https://huggingface.co/jinaai/jina-embeddings-v4-vllm-text-matching) |
|
- [jina-embeddings-v4-vllm-code](https://huggingface.co/jinaai/jina-embeddings-v4-vllm-code) |
|
|
|
Please refer to the directory that matches your task for more details. |
|
|
|
</details> |
|
|
|
|
|
## Jina-VDR |
|
Alongside `jina-embeddings-v4`, we’re releasing [Jina VDR](https://github.com/jina-ai/jina-vdr), a multilingual, multi-domain benchmark for visual document retrieval. The task collection can be viewed [here](https://huggingface.co/collections/jinaai/jinavdr-visual-document-retrieval-684831c022c53b21c313b449), and evaluation instructions can be found [here](https://github.com/jina-ai/jina-vdr). |
|
|
|
|
|
## License |
|
|
|
This model was initially released under CC-BY-NC-4.0 due to an error.
The correct license is the Qwen Research License, as the model is derived from Qwen2.5-VL-3B-Instruct, which is governed by that license.
|
|
|
## Contact |
|
|
|
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas. |
|
|
|
|
|
## Citation |
|
|
|
If you find `jina-embeddings-v4` useful in your research, please cite the following paper: |
|
``` |
|
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal, |
|
title={jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval}, |
|
author={Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Sedigheh Eslami and Scott Martens and Bo Wang and Nan Wang and Han Xiao}, |
|
year={2025}, |
|
eprint={2506.18902}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.AI}, |
|
url={https://arxiv.org/abs/2506.18902}, |
|
} |
|
``` |