---
base_model:
- Qwen/Qwen2.5-Coder-0.5B
---

<br><br>

<p align="center">
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
<b>The code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>

# Jina Embeddings c1: A Small but Performant Code Embedding Model

## Intended Usage & Model Info
`jina-embeddings-c1` is an embedding model for code retrieval. 
The model supports various types of code retrieval (natural language-to-code, code-to-code, code-to-natural language, code-to-completion) and technical question answering across 15+ programming languages. 


Built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B), `jina-embeddings-c1` features:

- **Multilingual support** (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
- **Task-specific instruction prefixes** for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
- **Flexible embedding size**: dense embeddings are 896-dimensional by default but can be truncated to as few as 64 dimensions with minimal performance loss.
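
As a rough illustration of the flexible embedding size, the sketch below truncates a full-size vector to one of the listed Matryoshka dimensions and re-normalizes it. This is a minimal sketch, not the model's own API: `truncate_embedding` is a hypothetical helper, the input vector is synthetic stand-in data rather than real model output, and re-normalizing after truncation is an assumption based on common practice for Matryoshka-style embeddings.

```python
import math

# Truncation sizes listed in this model card.
MATRYOSHKA_DIMS = (64, 128, 256, 512, 896)


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result.

    Hypothetical helper for illustration; not part of any Jina AI library.
    """
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}, got {dim}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


# Synthetic stand-in for an 896-dimensional embedding (not real model output).
full = [math.sin(i + 1) for i in range(896)]
small = truncate_embedding(full, 64)
print(len(small))
```

Because the truncated vectors are unit length, a plain dot product between two of them gives their cosine similarity, so the same retrieval code can be reused at any of the listed sizes.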


Summary of features:

| Feature   | `jina-embeddings-c1`   |
|------------|------------|
| Base Model | Qwen2.5-Coder-0.5B |
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Model DType | BFloat16 |
| Max Sequence Length | 32768 |
| Embedding Vector Dimension | 896 |
| Matryoshka dimensions | 64, 128, 256, 512, 896 |
| Pooling Strategy | Last-token pooling |
| Attention Mechanism | FlashAttention2 |
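
To make the "last-token pooling" entry concrete, here is a minimal pure-Python sketch of the general technique: for each sequence, the hidden state of the last non-padding token is taken as the embedding. The function name `last_token_pool`, the nested-list layout, and the toy numbers are assumptions for illustration only; the model's actual implementation (padding side, tensor library) is not described in this card.

```python
def last_token_pool(hidden_states, attention_mask):
    """Return, per sequence, the hidden vector of the last real (mask == 1) token.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of sequences of 0/1 flags, 1 marking a real token.
    Hypothetical helper for illustration; works for left or right padding.
    """
    pooled = []
    for token_vectors, mask in zip(hidden_states, attention_mask):
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence contains no real tokens")
        pooled.append(list(token_vectors[real_positions[-1]]))
    return pooled


# Toy batch: 2 sequences, 3 token slots each, hidden size 2.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],   # no padding
    [[7.0, 8.0], [9.0, 10.0], [0.0, 0.0]],  # last slot is padding
]
mask = [
    [1, 1, 1],
    [1, 1, 0],
]
pooled = last_token_pool(hidden, mask)
print(pooled)  # [[5.0, 6.0], [9.0, 10.0]]
```

In a causal decoder such as the Qwen2.5-Coder base model, the final token can attend to every earlier token, which is why its hidden state is a natural choice for a whole-sequence summary.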

## Training & Evaluation

Please refer to our `jina-embeddings-c1` technical report for training details and benchmarks.

## Contact

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.