Update README.md
README.md CHANGED
@@ -21,8 +21,10 @@ tags:
 
 ## Intended Usage & Model Info
 
-`jina-embeddings-v2-base-code` is an multilingual **embedding model** speaks English and
-
+`jina-embeddings-v2-base-code` is a multilingual **embedding model** that speaks **English and 30 widely used programming languages**.
+Like other jina-embeddings-v2 series models, it supports a sequence length of **8192** tokens.
+
+`jina-embeddings-v2-base-code` is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
 The backbone `jina-bert-v2-base-code` is pretrained on the [github-code](https://huggingface.co/datasets/codeparrot/github-code) dataset.
 The model is further trained on Jina AI's collection of more than 150 million coding question-answer and docstring-source-code pairs.
 These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
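Since the updated card describes a text-and-code embedding model trained on question-answer and docstring-code pairs, a minimal usage sketch may help illustrate the intended retrieval-style use. It is not part of the diff above and rests on assumptions: the Hugging Face repo id `jinaai/jina-embeddings-v2-base-code` and the `encode` convenience method exposed when loading with `trust_remote_code=True`.

```python
# Minimal sketch (assumed usage, not taken from the diff): embed a natural-language
# query and a code snippet, then compare them with cosine similarity.
from numpy import dot
from numpy.linalg import norm
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-code",  # assumed repo id
    trust_remote_code=True,  # loads the JinaBert implementation with ALiBi support
)

query = "How do I reverse a list in Python?"
code = "def reverse(xs):\n    return xs[::-1]"

# encode() (assumed helper from the remote code) returns one embedding per input
# string; thanks to ALiBi, inputs up to 8192 tokens are supported.
query_emb, code_emb = model.encode([query, code])

# Cosine similarity between the query embedding and the code embedding.
cos_sim = dot(query_emb, code_emb) / (norm(query_emb) * norm(code_emb))
print(f"cosine similarity: {cos_sim:.4f}")
```

The query/code pairing mirrors the training data described in the card (coding questions paired with answers and docstrings paired with source code), which is why a single shared embedding space is used for both inputs here.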