---
license: apache-2.0
datasets:
- cnmoro/AllTripletsMsMarco-PTBR
- Tevatron/msmarco-passage-corpus
language:
- en
- pt
library_name: model2vec
base_model:
- nomic-ai/nomic-embed-text-v2-moe
pipeline_tag: feature-extraction
---

This [Model2Vec](https://github.com/MinishLab/model2vec) model was created using [Tokenlearn](https://github.com/MinishLab/tokenlearn) with [nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) as the base model, and was trained on around 20M English and Portuguese passages.

The output dimension is 50.
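
As a rough illustration of what a static embedding model computes, the sketch below looks up one vector per token and mean-pools those vectors into a single 50-dimensional embedding. It is a simplified sketch, not the model2vec implementation: the whitespace tokenizer, the random embedding table, and the helper names `build_toy_table` and `static_encode` are hypothetical, and the final L2 normalization is an assumption of this sketch.

```python
import math
import random

# Hypothetical toy setup: a real Model2Vec model ships a learned embedding
# matrix and a subword tokenizer. Here a whitespace "tokenizer" and random
# vectors stand in for both, purely to show the shape of the computation.
DIM = 50  # matches the output dimension stated for this model


def build_toy_table(vocab, dim=DIM, seed=0):
    """Assign a random (hypothetical) vector to each token in vocab."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def static_encode(text, table, dim=DIM):
    """Mean-pool the vectors of known tokens, then L2-normalize (assumption)."""
    vectors = [table[tok] for tok in text.lower().split() if tok in table]
    if not vectors:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


table = build_toy_table(["example", "sentence", "frase", "exemplo"])
vec = static_encode("Example sentence", table)
print(len(vec))  # 50
```

Because no neural network runs at inference time, encoding reduces to table lookups and an average, which is why static models like this one are very fast and small.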

It is intended as a smaller, more minimal version of [cnmoro/static-nomic-eng-ptbr](https://huggingface.co/cnmoro/static-nomic-eng-ptbr).

## Usage

Load this model using the `from_pretrained` method:
```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr-tiny")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
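
Since the training data consists of MS MARCO-style passages, one plausible use is ranking passages against a query by cosine similarity. The sketch below shows only that ranking step in plain Python; `cosine` and `rank_passages` are hypothetical helpers, not part of the model2vec API, and the small 3-dimensional vectors stand in for real 50-dimensional embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Hand-made toy vectors standing in for model outputs.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_passages(query, passages))  # [1, 2, 0]
```

With the real model, the inputs would come from `model.encode`, which returns a NumPy array; converting rows with `.tolist()` lets them be passed to these helpers, though vectorized NumPy operations would be the more efficient choice for large passage collections.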