---
base_model:
- jinaai/jina-code-embeddings-0.5b
base_model_relation: quantized
license: cc-by-nc-4.0
---

<p align="center">
 <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
 <b>The GGUF version of the code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>

# Jina Code Embeddings: A Small but Performant Code Embedding Model

## Intended Usage & Model Info

`jina-code-embeddings-0.5b-GGUF` is the **GGUF export** of our [jina-code-embeddings-0.5b](https://huggingface.co/jinaai/jina-code-embeddings-0.5b), built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B).

The model supports code retrieval and technical QA across **15+ programming languages** and multiple domains, including web development, software development, machine learning, data science, and educational coding problems.

### Key Features
| Feature                | Jina Code Embeddings 0.5B GGUF |
|------------------------|--------------------------------|
| Base Model             | Qwen2.5-Coder-0.5B             |
| Supported Tasks        | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Max Sequence Length    | 32768 (**recommended ≤ 8192**) |
| Embedding Vector Dim   | **896**                        |
| Matryoshka Dimensions  | 64, 128, 256, 512, 896 (**client-side slice**) |
| Pooling Strategy       | **MUST use `--pooling last`** (EOS) |

> **Matryoshka note:** `llama.cpp` always returns **896-d** embeddings for this model. To use 64/128/256/512, **slice client-side** (e.g., take the first *k* elements).
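
For example, a minimal client-side slicing helper (the function name and the re-normalization step are our own additions; the card only prescribes taking the first *k* elements):

```python
import numpy as np

def truncate_embedding(embedding, dim):
    """Slice an 896-d embedding down to a Matryoshka dimension
    (64/128/256/512) and re-normalize so cosine similarity still
    behaves as expected."""
    vec = np.asarray(embedding, dtype=np.float32)[:dim]
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```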

---

## Task Instructions

Prefix inputs with task-specific instructions:

```python
INSTRUCTION_CONFIG = {
  "nl2code": {
    "query": "Find the most relevant code snippet given the following query:\n",
    "passage": "Candidate code snippet:\n"
  },
  "qa": {
    "query": "Find the most relevant answer given the following question:\n",
    "passage": "Candidate answer:\n"
  },
  "code2code": {
    "query": "Find an equivalent code snippet given the following code snippet:\n",
    "passage": "Candidate code snippet:\n"
  },
  "code2nl": {
    "query": "Find the most relevant comment given the following code snippet:\n",
    "passage": "Candidate comment:\n"
  },
  "code2completion": {
    "query": "Find the most relevant completion given the following start of code snippet:\n",
    "passage": "Candidate completion:\n"
  }
}
```

Use the appropriate prefix for **queries** and **passages** at inference time.
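
For instance, a small helper (the function name is illustrative, not part of the model's API) that applies these prefixes before embedding:

```python
def build_inputs(task, queries, passages):
    """Prepend the task-specific instruction strings from INSTRUCTION_CONFIG."""
    cfg = INSTRUCTION_CONFIG[task]
    return (
        [cfg["query"] + q for q in queries],
        [cfg["passage"] + p for p in passages],
    )

# Example: nl2code retrieval
queries, passages = build_inputs(
    "nl2code",
    ["print hello world in python"],
    ['print("Hello World!")'],
)
```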

---

## Install `llama.cpp`

Follow the official instructions: **[https://github.com/ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)**

---

## Model files

Hugging Face repo (GGUF): **[https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF](https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF)**

Pick a file (e.g., `jina-code-embeddings-0.5b-F16.gguf`). You can either:

* **auto-download** by passing the **repo and file directly** to `llama.cpp`
* **use a local path** with `-m`

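For the local-path route, one option (our suggestion, not something this card requires) is to fetch the file with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

# Downloads (and caches) the GGUF file, returning the local path to pass to -m.
model_path = hf_hub_download(
    repo_id="jinaai/jina-code-embeddings-0.5b-GGUF",
    filename="jina-code-embeddings-0.5b-F16.gguf",
)
print(model_path)
```
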
---


## HTTP service with `llama-server`

### Auto-download from Hugging Face (repo + file)

```bash
./llama-server \
  --embedding \
  --hf-repo jinaai/jina-code-embeddings-0.5b-GGUF \
  --hf-file jina-code-embeddings-0.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

### Local file

```bash
./llama-server \
  --embedding \
  -m /path/to/jina-code-embeddings-0.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

> **Tips:** pass `-ngl <N>` to offload layers to the GPU. The maximum context is 32768, but keep `--ubatch-size` at 8192 or below, matching the recommended sequence length, for best results.

---

## Query examples (HTTP)

### Native endpoint (`/embedding`)

```bash
curl -X POST http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{
        "content": [
          "Find the most relevant code snippet given the following query:\nprint hello world in python",
          "Candidate code snippet:\nprint(\"Hello World!\")"
        ]
      }'
```

### OpenAI-compatible (`/v1/embeddings`)

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "input": [
          "Find the most relevant code snippet given the following query:\nprint hello world in python",
          "Candidate code snippet:\nprint(\"Hello World!\")"
        ]
      }'
```
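
The same call from Python, with a cosine-similarity check between the query and passage vectors (a sketch using `requests`; the parsing assumes the standard OpenAI-style embeddings response shape):

```python
import numpy as np
import requests

texts = [
    "Find the most relevant code snippet given the following query:\nprint hello world in python",
    'Candidate code snippet:\nprint("Hello World!")',
]

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": texts},
    timeout=60,
)
resp.raise_for_status()

# OpenAI-style payload: {"data": [{"embedding": [...]}, ...]}
query, passage = (
    np.asarray(d["embedding"], dtype=np.float32) for d in resp.json()["data"]
)

cosine = float(query @ passage / (np.linalg.norm(query) * np.linalg.norm(passage)))
print(f"cosine similarity: {cosine:.4f}")
```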

---

## Training & Evaluation

See our technical report: **[https://arxiv.org/abs/2508.21290](https://arxiv.org/abs/2508.21290)**

---

## Contact

Join our Discord: **[https://discord.jina.ai](https://discord.jina.ai)**