Feature Extraction
sentence-transformers
ONNX
Transformers
fastText
sentence-embeddings
sentence-similarity
semantic-search
vector-search
retrieval-augmented-generation
multilingual
cross-lingual
low-resource
merged-model
combined-model
tokenizer-embedded
tokenizer-integrated
standalone
all-in-one
quantized
int8
int8-quantization
optimized
efficient
fast-inference
low-latency
lightweight
small-model
edge-ready
arm64
edge-device
mobile-device
on-device
mobile-inference
tablet
smartphone
embedded-ai
onnx-runtime
onnx-model
MiniLM
MiniLM-L12-v2
paraphrase
usecase-ready
plug-and-play
production-ready
deployment-ready
real-time
distiluse
🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
This is a highly optimized, quantized, and fully standalone model for generating sentence embeddings from multilingual text, including Ukrainian, English, Polish, and more.
Built upon distiluse-base-multilingual-cased-v2, the model has been:
- 🔁 Merged with its tokenizer into a single ONNX file
- ⚙️ Extended with a custom preprocessing layer
- ⚡ Quantized to INT8 and ARM64-ready
- 🧪 Extensively tested across real-world NLP tasks
- 🛠️ Bug-fixed vs the original sentence-transformers quantized version, which produced inaccurate cosine similarity scores
🚀 Key Features
- 🧩 Single-file architecture: no external tokenizer, vocab files, or transformers library required.
- ⚡ 93% faster inference on mobile compared to the original model.
- 🗣️ Multilingual: robust across many languages, including low-resource ones.
- 🧠 Output = pure embeddings: pass a string, get a 768-dim vector. That’s it.
- 🛠️ Ready for production: small, fast, accurate, and easy to integrate.
- 📱 Ideal for edge-AI, mobile, and offline scenarios.
🤖 Author
@vlad-m-dev
Built for edge-AI / phone / tablet offline inference.
Telegram: https://t.me/dwight_schrute_engineer
🐍 Python Example
```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom ops that implement the embedded tokenizer
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# Pass raw strings; tokenization and pooling happen inside the graph
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
```
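With the embeddings in hand, sentence similarity is just a cosine between vectors. Below is a minimal sketch that reuses the `session` from the example above; the sentences and the `cosine` helper are illustrative, and it assumes the graph accepts a batch of strings (if not, run one sentence per call):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sentences = np.asarray([
    "Where can I charge my phone?",   # English
    "Де можна зарядити телефон?",     # Ukrainian, same meaning
    "The weather is nice today.",     # unrelated
])

# One output row per input sentence
vectors = session.run(None, {"text": sentences})[0]

print(cosine(vectors[0], vectors[1]))  # cross-lingual paraphrase: should score high
print(cosine(vectors[0], vectors[2]))  # unrelated pair: should score noticeably lower
```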
🟨 JS Example
```javascript
// Assumes onnxruntime-web; adjust the import for your JS runtime
import { InferenceSession, Tensor } from 'onnxruntime-web';

const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
const inputTensor = new Tensor('string', ['something..'], [1]);  // raw string in, tokenization happens in-graph
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data;
```
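🧪 Optional Sanity Check vs the Original Model
To sanity-check the cosine-similarity fix, you can compare the scores this model produces against the original float32 sentence-transformers model. A rough sketch, assuming sentence-transformers is installed and reusing the `session` from the Python example; the sentence pairs are illustrative, and only the relative ordering of scores is expected to match, not the exact values:

```python
import numpy as np
from sentence_transformers import SentenceTransformer, util

pairs = [
    ("A cat sits on the mat.", "Кіт сидить на килимку."),      # paraphrase pair
    ("A cat sits on the mat.", "Stock prices fell sharply."),  # unrelated pair
]

# Original float32 model as the reference
reference = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")

for a, b in pairs:
    ref_score = util.cos_sim(reference.encode(a), reference.encode(b)).item()

    # Scores from this merged, quantized ONNX model
    va, vb = (session.run(None, {"text": np.asarray([s])})[0][0] for s in (a, b))
    onnx_score = float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

    print(f"reference={ref_score:.3f}  onnx={onnx_score:.3f}")
```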