metadata
base_model:
- Qwen/Qwen2.5-Coder-0.5B
The code embedding model trained by Jina AI.
Jina Embeddings c1: A Small but Performant Code Embedding Model
Intended Usage & Model Info
jina-embeddings-c1 is an embedding model for code retrieval.
The model supports various types of code retrieval (natural language-to-code, code-to-code, code-to-natural language, code-to-completion) and technical question answering across 15+ programming languages.
Built on Qwen/Qwen2.5-Coder-0.5B, jina-embeddings-c1 features:
- Multilingual support (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
- Task-specific instruction prefixes for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
- Flexible embedding size: dense embeddings are 896-dimensional by default but can be truncated to as few as 64 dimensions with minimal performance loss.
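The flexible embedding size can be illustrated with a minimal sketch: keep the leading components of a vector and re-normalize it so that cosine similarity still behaves as expected. The helper name `truncate_embedding` and the random stand-in vector are assumptions made for illustration; they are not part of the model's API, and a real embedding would come from the model itself.

```python
import math
import random


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result.

    Hypothetical helper for illustration only, not part of any library.
    """
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Stand-in for a full 896-dimensional embedding (random values, not model output).
random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(896)]

small = truncate_embedding(full, 64)
print(len(small))  # 64
```

Re-normalizing after truncation matters when downstream code scores pairs with a plain dot product, since the truncated prefix of a unit vector is generally no longer unit length.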
Summary of features:
Feature | Jina Embeddings C1 |
---|---|
Base Model | Qwen2.5-Coder-0.5B |
Supported Tasks | nl2code, code2code, code2nl, code2completion, qa |
Model DType | BFloat16 |
Max Sequence Length | 32768 |
Embedding Vector Dimension | 896 |
Matryoshka dimensions | 64, 128, 256, 512, 896 |
Pooling Strategy | Last-token pooling |
Attention Mechanism | FlashAttention2 |
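As a rough illustration of the last-token pooling strategy listed in the table, the sketch below selects the hidden state of the final non-padding token in a sequence. It uses plain Python lists in place of tensors; the function name `last_token_pool` and the toy inputs are assumptions for illustration and do not reflect the model's actual implementation.

```python
def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token.

    hidden_states: one vector per token position, for a single sequence.
    attention_mask: 1 for real tokens, 0 for padding, same length.
    Hypothetical helper for illustration only.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("sequence contains no real tokens")
    # Taking the highest real index works for both left- and right-padded input.
    return hidden_states[real_positions[-1]]


# Right-padded toy sequence: two real tokens followed by one pad position.
right_padded = last_token_pool(
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],
    [1, 1, 0],
)

# Left-padded toy sequence: one pad position followed by two real tokens.
left_padded = last_token_pool(
    [[0.0, 0.0], [0.5, 0.6], [0.7, 0.8]],
    [0, 1, 1],
)

print(right_padded)  # [0.3, 0.4]
print(left_padded)   # [0.7, 0.8]
```

In a decoder-only model with causal attention, the last token is the only position that has attended to the entire input, which is the usual motivation for pooling there.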
Training & Evaluation
Please refer to the jina-embeddings-c1 technical report for training details and benchmarks.
Contact
Join our Discord community and chat with other community members about ideas.