---
license: apache-2.0
datasets:
  - cnmoro/AllTripletsMsMarco-PTBR
  - Tevatron/msmarco-passage-corpus
language:
  - en
  - pt
library_name: model2vec
base_model:
  - nomic-ai/nomic-embed-text-v2-moe
pipeline_tag: feature-extraction
---

This Model2Vec model was created using Tokenlearn, with nomic-ai/nomic-embed-text-v2-moe as the base model, trained on around 3.5M passages (English and Portuguese).
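For context, the snippet below is a minimal sketch of the plain Model2Vec distillation step from a Transformer base; the model published here was trained with the full Tokenlearn pipeline rather than this one-shot distillation, and the `pca_dims` value is only an assumption chosen to match the 512-dimensional output.

```python
# Illustrative sketch only: plain Model2Vec distillation, not the Tokenlearn
# training actually used to produce cnmoro/static-nomic-eng-ptbr.
from model2vec.distill import distill

# pca_dims=512 is an assumption matching this model's output dimension.
m2v_model = distill(model_name="nomic-ai/nomic-embed-text-v2-moe", pca_dims=512)

# Save the resulting static model locally.
m2v_model.save_pretrained("my-static-model")
```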

I have yet to run any formal benchmarks on it, but it easily outperforms potion-multilingual-128M on my custom Portuguese testing workload.

The output dimension is 512.

## Usage

Load this model using the `from_pretrained` method:

```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
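As a quick sanity check, here is a hedged sketch (the sentences are just placeholders) showing that `encode` returns a NumPy array with 512 columns, which you can use directly for cosine similarity:

```python
import numpy as np

from model2vec import StaticModel

model = StaticModel.from_pretrained("cnmoro/static-nomic-eng-ptbr")

# Encode a small batch; the result is a NumPy array of shape (n_sentences, 512).
embeddings = model.encode(["O gato dorme no sofá.", "The cat is sleeping on the couch."])
print(embeddings.shape)

# Cosine similarity between the two sentences.
a, b = embeddings
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)
```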