---
base_model:
- jinaai/jina-code-embeddings-0.5b
base_model_relation: quantized
license: cc-by-nc-4.0
---
<p align="center">
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>
<p align="center">
<b>The GGUF version of the code embedding model trained by <a href="https://jina.ai/">Jina AI</a>.</b>
</p>
# Jina Code Embeddings: A Small but Performant Code Embedding Model
## Intended Usage & Model Info
`jina-code-embeddings-0.5b-GGUF` is the **GGUF export** of our [jina-code-embeddings-0.5b](https://huggingface.co/jinaai/jina-code-embeddings-0.5b), built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B).
The model supports code retrieval and technical QA across **15+ programming languages** and multiple domains, including web development, software development, machine learning, data science, and educational coding problems.
### Key Features
| Feature | Jina Code Embeddings 0.5B GGUF |
|------------------------|--------------------------------|
| Base Model | Qwen2.5-Coder-0.5B |
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Max Sequence Length | 32768 (**recommended ≤ 8192**) |
| Embedding Vector Dim | **896** |
| Matryoshka Dimensions | 64, 128, 256, 512, 896 (**client-side slice**) |
| Pooling Strategy | **MUST use `--pooling last`** (EOS) |
> **Matryoshka note:** `llama.cpp` always returns **896-d** embeddings for this model. To use 64/128/256/512, **slice client-side** (e.g., take the first *k* elements).
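For illustration, a minimal client-side slice (a sketch using NumPy; the re-normalization step is an assumption that keeps cosine similarity well-behaved after truncation):

```python
import numpy as np

def truncate_embedding(vec, dim=256):
    """Slice an 896-d embedding to a Matryoshka dimension and re-normalize.

    dim should be one of the supported Matryoshka dimensions:
    64, 128, 256, 512, or 896.
    """
    v = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```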
---
## Task Instructions
Prefix inputs with task-specific instructions:
```python
INSTRUCTION_CONFIG = {
"nl2code": {
"query": "Find the most relevant code snippet given the following query:\n",
"passage": "Candidate code snippet:\n"
},
"qa": {
"query": "Find the most relevant answer given the following question:\n",
"passage": "Candidate answer:\n"
},
"code2code": {
"query": "Find an equivalent code snippet given the following code snippet:\n",
"passage": "Candidate code snippet:\n"
},
"code2nl": {
"query": "Find the most relevant comment given the following code snippet:\n",
"passage": "Candidate comment:\n"
},
"code2completion": {
"query": "Find the most relevant completion given the following start of code snippet:\n",
"passage": "Candidate completion:\n"
}
}
```
Use the appropriate prefix for **queries** and **passages** at inference time.
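For example, a small helper that builds prefixed inputs (a sketch reusing the `INSTRUCTION_CONFIG` dictionary above; the function name is illustrative):

```python
def build_input(task, role, text):
    """Prepend the task- and role-specific instruction to the raw text.

    task: a key of INSTRUCTION_CONFIG (e.g., "nl2code")
    role: "query" or "passage"
    """
    return INSTRUCTION_CONFIG[task][role] + text

# Prepare an nl2code retrieval pair
query   = build_input("nl2code", "query",   "print hello world in python")
passage = build_input("nl2code", "passage", 'print("Hello World!")')
```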
---
## Install `llama.cpp`
Follow the official instructions: **[https://github.com/ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)**
---
## Model files
Hugging Face repo (GGUF): **[https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF](https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF)**
Pick a file (e.g., `jina-code-embeddings-0.5b-F16.gguf`). You can either:
* **auto-download** by passing the **repo and file directly** to `llama.cpp`
* **use a local path** with `-m`
---
## HTTP service with `llama-server`
### Auto-download from Hugging Face (repo + file)
```bash
./llama-server \
--embedding \
--hf-repo jinaai/jina-code-embeddings-0.5b-GGUF \
--hf-file jina-code-embeddings-0.5b-F16.gguf \
--host 0.0.0.0 \
--port 8080 \
--ctx-size 32768 \
--ubatch-size 8192 \
--pooling last
```
### Local file
```bash
./llama-server \
--embedding \
-m /path/to/jina-code-embeddings-0.5b-F16.gguf \
--host 0.0.0.0 \
--port 8080 \
--ctx-size 32768 \
--ubatch-size 8192 \
--pooling last
```
> **Tips:** use `-ngl <N>` to offload layers to the GPU. The maximum context is 32768 tokens, but keep input length (and `--ubatch-size`) at 8192 or below for best results.
---
## Query examples (HTTP)
### Native endpoint (`/embedding`)
```bash
curl -X POST http://localhost:8080/embedding \
-H "Content-Type: application/json" \
-d '{
"content": [
"Find the most relevant code snippet given the following query:\nprint hello world in python",
"Candidate code snippet:\nprint(\"Hello World!\")"
]
}'
```
### OpenAI-compatible (`/v1/embeddings`)
```bash
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": [
"Find the most relevant code snippet given the following query:\nprint hello world in python",
"Candidate code snippet:\nprint(\"Hello World!\")"
]
}'
```
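The same call from Python (a minimal sketch, assuming the standard OpenAI-style response shape `data[i].embedding` and a server running as configured above):

```python
import numpy as np
import requests

# Send a prefixed query/passage pair to the OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": [
        "Find the most relevant code snippet given the following query:\nprint hello world in python",
        "Candidate code snippet:\nprint(\"Hello World!\")",
    ]},
)
resp.raise_for_status()

# Extract the two 896-d vectors and score them with cosine similarity.
q, p = (np.asarray(d["embedding"], dtype=np.float32) for d in resp.json()["data"])
cosine = float(q @ p / (np.linalg.norm(q) * np.linalg.norm(p)))
print(f"cosine similarity: {cosine:.4f}")
```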
---
## Training & Evaluation
See our technical report: **[https://arxiv.org/abs/2508.21290](https://arxiv.org/abs/2508.21290)**
---
## Contact
Join our Discord: **[https://discord.jina.ai](https://discord.jina.ai)**