---
base_model:
  - Qwen/Qwen2.5-Coder-0.5B
---



Jina AI: Your Search Foundation, Supercharged!

The code embedding model trained by Jina AI.

Jina Embeddings c1: A Small but Performant Code Embedding Model

Intended Usage & Model Info

jina-embeddings-c1 is an embedding model for code retrieval. The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages.

Built on Qwen/Qwen2.5-Coder-0.5B, jina-embeddings-c1 features:

  • Multilingual support (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
  • Task-specific instruction prefixes for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
  • Flexible embedding size: dense embeddings are 896-dimensional by default but can be truncated to as low as 64 with minimal performance loss.
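The Matryoshka truncation described above amounts to keeping the leading dimensions of a dense vector and re-normalizing it. The sketch below illustrates the idea with toy 8-dimensional vectors standing in for real 896-dimensional model outputs; in practice the model's `encode` function accepts `truncate_dim` and does this for you.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length.

    Illustrative sketch of Matryoshka-style truncation: the model is trained
    so that leading dimensions carry most of the signal, so the shortened
    vector remains a usable embedding.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy vectors standing in for real 896-d embeddings, truncated to 4 dims.
q = truncate_embedding([0.9, 0.4, 0.1, 0.05, 0.02, 0.01, 0.0, 0.0], 4)
p = truncate_embedding([0.8, 0.5, 0.2, 0.0, 0.01, 0.0, 0.0, 0.0], 4)
print(round(cosine(q, p), 3))
```

The truncated vectors are still unit-length, so their dot product can be compared directly across any chosen dimension.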

Summary of features:

| Feature | Jina Embeddings C1 |
|---|---|
| Base Model | Qwen2.5-Coder-0.5B |
| Supported Tasks | nl2code, code2code, code2nl, code2completion, qa |
| Model DType | BFloat16 |
| Max Sequence Length | 32768 |
| Embedding Vector Dimension | 896 |
| Matryoshka Dimensions | 64, 128, 256, 512, 896 |
| Pooling Strategy | Last-token pooling |
| Attention Mechanism | FlashAttention2 |

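The last-token pooling strategy listed in the table can be sketched in a few lines: for each sequence, the embedding is taken from the hidden state of the last non-padding token, located via the attention mask. This is an illustration of the general technique, not the model's actual implementation.

```python
def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of each sequence's last non-padding token.

    hidden_states: [batch][seq_len][dim] nested lists;
    attention_mask: [batch][seq_len] with 1 for real tokens, 0 for padding.
    Sketch of last-token pooling, not the model's actual code.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled

# Batch of 2 sequences, hidden dim 3; the second sequence ends in padding.
h = [[[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]],
     [[4.0, 4.0, 4.0], [5.0, 5.0, 5.0], [0.0, 0.0, 0.0]]]
m = [[1, 1, 1], [1, 1, 0]]
print(last_token_pool(h, m))  # states of the 3rd and 2nd tokens respectively
```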
Usage

Requirements

The following Python packages are required:

  • transformers>=4.53.0
  • torch>=2.7.1

Optional / Recommended

  • flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
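If flash-attention is installed, it can typically be enabled through the standard `attn_implementation` argument of `from_pretrained`, together with a half-precision dtype. Treat this as a sketch under those assumptions rather than a verified recipe for this model:

```python
import torch
from transformers import AutoModel

# Sketch: request FlashAttention2 at load time (assumes flash-attn is
# installed and a CUDA GPU is available; otherwise omit the argument).
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-c1-0.5B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the model's BFloat16 weights
    attn_implementation="flash_attention_2",
)
model.to("cuda")
```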
via transformers
# !pip install "transformers>=4.53.0" "torch>=2.7.1"

from transformers import AutoModel
import torch

# Initialize the model
model = AutoModel.from_pretrained("jinaai/jina-embeddings-c1-0.5B", trust_remote_code=True)
model.to("cuda")

# Configure truncate_dim, max_length, batch_size in the encode function if needed

# Encode query
query_embeddings = model.encode(
    ["print hello world in python"],
    task="nl2code",
    prompt_name="query",
)

# Encode passage
passage_embeddings = model.encode(
    ["print('Hello World!')"],
    task="nl2code",
    prompt_name="passage",
)
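The embeddings returned by `encode` are plain vectors, so retrieval reduces to ranking passages by similarity to the query, commonly cosine similarity. The sketch below uses small placeholder vectors in place of real model outputs; the hypothetical passage names are for illustration only.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Placeholder vectors standing in for query_embeddings[0] and the
# passage embeddings; with real outputs, rank passages the same way.
query = [0.2, 0.7, 0.1]
passages = {
    "hello_world.py": [0.25, 0.68, 0.05],
    "fibonacci.py": [0.9, 0.1, 0.3],
}
ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query, passages[name]),
    reverse=True,
)
print(ranked)  # most similar passage first
```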

Training & Evaluation

Please refer to the jina-embeddings-c1 technical report for training details and benchmark results.

Contact

Join our Discord community and chat with other community members about ideas.