metadata

base_model:
  - Qwen/Qwen2.5-Coder-0.5B

A code embedding model trained by Jina AI.

Jina Embeddings c1: A Small but Performant Code Embedding Model

Intended Usage & Model Info

jina-embeddings-c1 is an embedding model for code retrieval. The model supports various types of code retrieval (natural language-to-code, code-to-code, code-to-natural language, code-to-completion) and technical question answering across 15+ programming languages.
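As a rough illustration of how embeddings are typically used for retrieval, the sketch below ranks candidate code snippets against a query by cosine similarity. It does not call the model: the 3-dimensional vectors are toy stand-ins for real embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, chosen only for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for a query embedding and three snippet embeddings.
# Real embeddings from the model would be much higher-dimensional.
query = [1.0, 0.0, 0.0]
snippets = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]
ranking = rank_by_similarity(query, snippets)
print(ranking)
```

The same ranking step applies whichever retrieval type is used; only the texts being embedded (natural language, code, or both) change.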

Built on Qwen/Qwen2.5-Coder-0.5B, jina-embeddings-c1 features:

- Multilingual support (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
- Task-specific instruction prefixes for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
- Flexible embedding size: dense embeddings are 896-dimensional by default but can be truncated to as few as 64 dimensions with minimal performance loss.
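Truncating a Matryoshka-style embedding usually means keeping its leading components and rescaling the result to unit length so that cosine or dot-product comparisons remain meaningful. The following is a minimal sketch of that idea in plain Python, assuming the supported sizes are the ones listed in the feature summary (64, 128, 256, 512, 896). The function name `truncate_embedding` and the synthetic input vector are illustrative and not part of any official API.

```python
import math

# Sizes taken from the feature summary; treated here as the only valid targets.
MATRYOSHKA_DIMS = (64, 128, 256, 512, 896)

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit L2 norm."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}, got {dim}")
    if len(vec) < dim:
        raise ValueError("embedding is shorter than the requested dimension")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale for an all-zero vector
    return [x / norm for x in head]

# Synthetic 896-dimensional vector standing in for a real model output.
full = [math.sin(i + 1) for i in range(896)]
small = truncate_embedding(full, 64)
print(len(small))
```

Queries and documents should be truncated to the same dimension before they are compared.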

Summary of features:

Feature	jina-embeddings-c1
Base Model	Qwen2.5-Coder-0.5B
Supported Tasks	`nl2code`, `code2code`, `code2nl`, `code2completion`, `qa`
Model DType	BFloat16
Max Sequence Length	32768
Embedding Vector Dimension	896
Matryoshka dimensions	64, 128, 256, 512, 896
Pooling Strategy	Last-token pooling
Attention Mechanism	FlashAttention2
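
Last-token pooling, the strategy named in the table, takes the hidden state of the final non-padding token of each sequence as that sequence's embedding. The sketch below shows the selection logic on nested Python lists rather than tensors, so it runs without any dependencies. The function name `last_token_pool` and the tiny inputs are assumptions made for illustration, not the model's actual implementation.

```python
def last_token_pool(hidden_states, attention_mask):
    """Select the hidden state of the last real token in each sequence.

    hidden_states:  nested lists shaped [batch][seq_len][hidden_dim]
    attention_mask: nested lists shaped [batch][seq_len], 1 = token, 0 = padding
    Works for both left- and right-padded batches. Raises ValueError if a
    sequence consists only of padding.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled

# Two toy sequences of length 3 with hidden size 2; the first is right-padded.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]
pooled = last_token_pool(hidden, mask)
print(pooled)
```

With a decoder-only base model, the last token is the only position that has attended to the entire input, which is the usual motivation for pooling there.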

Training & Evaluation

Please refer to our technical report on jina-embeddings-c1 for training details and benchmarks.

Contact

Join our Discord community and chat with other community members about ideas.