---
base_model:
- jinaai/jina-code-embeddings-0.5b
base_model_relation: quantized
license: cc-by-nc-4.0
---

<p align="center">
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
<b>The GGUF version of the code embedding model trained by <a href="https://jina.ai/">Jina AI</a>.</b>
</p>

# Jina Code Embeddings: A Small but Performant Code Embedding Model

## Intended Usage & Model Info

`jina-code-embeddings-0.5b-GGUF` is the **GGUF export** of our [jina-code-embeddings-0.5b](https://huggingface.co/jinaai/jina-code-embeddings-0.5b), built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B).

The model supports code retrieval and technical QA across **15+ programming languages** and multiple domains, including web development, software development, machine learning, data science, and educational coding problems.

### Key Features

| Feature | Jina Code Embeddings 0.5B GGUF |
|------------------------|--------------------------------|
| Base Model | Qwen2.5-Coder-0.5B |
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Max Sequence Length | 32768 (**recommended ≤ 8192**) |
| Embedding Vector Dim | **896** |
| Matryoshka Dimensions | 64, 128, 256, 512, 896 (**client-side slice**) |
| Pooling Strategy | **MUST use `--pooling last`** (EOS) |

> **Matryoshka note:** `llama.cpp` always returns the full **896-d** embedding for this model. To use 64/128/256/512 dimensions, **slice client-side** (e.g., take the first *k* elements).
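
A minimal client-side slicing sketch (assumptions: `embedding` is an 896-d vector returned by the server, and `numpy` is available for convenience):

```python
import numpy as np

def truncate_embedding(embedding, k=256):
    """Keep the first k Matryoshka dimensions, then re-normalize
    so that dot products behave like cosine similarities."""
    vec = np.asarray(embedding, dtype=np.float32)[:k]
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0.0 else vec
```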

---

## Task Instructions

Prefix inputs with task-specific instructions:

```python
INSTRUCTION_CONFIG = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query:\n",
        "passage": "Candidate code snippet:\n",
    },
    "qa": {
        "query": "Find the most relevant answer given the following question:\n",
        "passage": "Candidate answer:\n",
    },
    "code2code": {
        "query": "Find an equivalent code snippet given the following code snippet:\n",
        "passage": "Candidate code snippet:\n",
    },
    "code2nl": {
        "query": "Find the most relevant comment given the following code snippet:\n",
        "passage": "Candidate comment:\n",
    },
    "code2completion": {
        "query": "Find the most relevant completion given the following start of code snippet:\n",
        "passage": "Candidate completion:\n",
    },
}
```

Use the appropriate prefix for **queries** and **passages** at inference time.
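
For example, building an `nl2code` pair from the config above (the same strings used in the curl examples further down):

```python
task = "nl2code"

query_text = INSTRUCTION_CONFIG[task]["query"] + "print hello world in python"
passage_text = INSTRUCTION_CONFIG[task]["passage"] + 'print("Hello World!")'
# query_text and passage_text are the strings to send for embedding.
```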

---

## Install `llama.cpp`

Follow the official instructions: **[https://github.com/ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)**

---

## Model files

Hugging Face repo (GGUF): **[https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF](https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF)**

Pick a file (e.g., `jina-code-embeddings-0.5b-F16.gguf`). You can either:

* **auto-download** by passing the repo and file names directly to `llama.cpp` (`--hf-repo` / `--hf-file`)
* **use a local path** with `-m` (e.g., a file downloaded ahead of time, as sketched below)
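
If you prefer fetching the file yourself, here is a minimal sketch using the `huggingface_hub` Python package (an extra dependency, not required by `llama.cpp`; any download method works):

```python
from huggingface_hub import hf_hub_download

# Downloads the GGUF file into the local HF cache and returns its path.
local_path = hf_hub_download(
    repo_id="jinaai/jina-code-embeddings-0.5b-GGUF",
    filename="jina-code-embeddings-0.5b-F16.gguf",
)
print(local_path)  # pass this path to llama-server via -m
```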

---

## HTTP service with `llama-server`

### Auto-download from Hugging Face (repo + file)

```bash
./llama-server \
  --embedding \
  --hf-repo jinaai/jina-code-embeddings-0.5b-GGUF \
  --hf-file jina-code-embeddings-0.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

### Local file

```bash
./llama-server \
  --embedding \
  -m /path/to/jina-code-embeddings-0.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

> Tips: add `-ngl <N>` to offload layers to the GPU. The maximum context is 32768 tokens, but inputs of at most 8192 tokens are recommended, which is why `--ubatch-size` is set to 8192 above.

---

## Query examples (HTTP)

### Native endpoint (`/embedding`)

```bash
curl -X POST http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{
    "content": [
      "Find the most relevant code snippet given the following query:\nprint hello world in python",
      "Candidate code snippet:\nprint(\"Hello World!\")"
    ]
  }'
```

### OpenAI-compatible (`/v1/embeddings`)

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Find the most relevant code snippet given the following query:\nprint hello world in python",
      "Candidate code snippet:\nprint(\"Hello World!\")"
    ]
  }'
```
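
The same request from Python, scoring the pair with cosine similarity (a sketch; it assumes the server above is running, uses the third-party `requests` package, and relies on the OpenAI-style response carrying one vector per input under `data[i].embedding`):

```python
import numpy as np
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "input": [
            "Find the most relevant code snippet given the following query:\nprint hello world in python",
            "Candidate code snippet:\nprint(\"Hello World!\")",
        ]
    },
)
resp.raise_for_status()

# One embedding per input, in request order.
query_vec, passage_vec = (np.array(d["embedding"]) for d in resp.json()["data"])

# Cosine similarity between the query and the candidate passage.
score = float(query_vec @ passage_vec / (np.linalg.norm(query_vec) * np.linalg.norm(passage_vec)))
print(f"cosine similarity: {score:.4f}")
```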

---

## Training & Evaluation

See our technical report: **[https://arxiv.org/abs/2508.21290](https://arxiv.org/abs/2508.21290)**

---

## Contact

Join our Discord: **[https://discord.jina.ai](https://discord.jina.ai)**