Update README.md
README.md CHANGED
@@ -21,8 +21,10 @@ tags:
 
 ## Intended Usage & Model Info
 
-`jina-embeddings-v2-base-code` is an multilingual **embedding model** speaks English and
-
+`jina-embeddings-v2-base-code` is a multilingual **embedding model** that speaks **English and 30 widely used programming languages**.
+Like other jina-embeddings-v2 series models, it supports a sequence length of **8192** tokens.
+
+`jina-embeddings-v2-base-code` is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
 The backbone `jina-bert-v2-base-code` is pretrained on the [github-code](https://huggingface.co/datasets/codeparrot/github-code) dataset.
 The model is further trained on Jina AI's collection of more than 150 million coding question-answer and docstring-source-code pairs.
 These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
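Since the updated card describes a text-and-code embedding model trained on question-answer and docstring-code pairs, a minimal usage sketch may help illustrate the intended retrieval-style use. It is not part of the diff above and rests on assumptions: the Hugging Face repo id `jinaai/jina-embeddings-v2-base-code` and the `encode` convenience method exposed when loading with `trust_remote_code=True`.

```python
# Minimal sketch (assumed usage, not taken from the diff): embed a natural-language
# query and a code snippet, then compare them with cosine similarity.
from numpy import dot
from numpy.linalg import norm
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-code",  # assumed repo id
    trust_remote_code=True,  # loads the JinaBert implementation with ALiBi support
)

query = "How do I reverse a list in Python?"
code = "def reverse(xs):\n    return xs[::-1]"

# encode() (assumed helper from the remote code) returns one embedding per input
# string; thanks to ALiBi, inputs up to 8192 tokens are supported.
query_emb, code_emb = model.encode([query, code])

# Cosine similarity between the query embedding and the code embedding.
cos_sim = dot(query_emb, code_emb) / (norm(query_emb) * norm(code_emb))
print(f"cosine similarity: {cos_sim:.4f}")
```

The query/code pairing mirrors the training data described in the card (coding questions paired with answers and docstrings paired with source code), which is why a single shared embedding space is used for both inputs here.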