vllm docs

#1
by ndurkee - opened

Could we add some docs to the README for how to run this with vllm? I haven't been able to do my full tests but it appears to work with

vllm serve google/embeddinggemma-300m --dtype bfloat16

Because the model was trained with Matryoshka Representation Learning (MRL), a single checkpoint can produce embeddings at several sizes. Passing --hf_overrides '{"matryoshka_dimensions":[128,256,512,768]}' lets you pick the embedding size that best balances speed and accuracy for your application.

vllm serve google/embeddinggemma-300m --dtype bfloat16 --hf_overrides '{"matryoshka_dimensions":[128,256,512,768]}'
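To illustrate what the matryoshka dimensions mean, here is a minimal sketch of MRL-style truncation: a shorter embedding is just a prefix of the full vector, L2-renormalized. This is a conceptual illustration, not vLLM's actual implementation; the function name is made up.

```python
import math

def truncate_embedding(embedding, dim):
    """Truncate a matryoshka embedding to `dim` components and L2-renormalize.

    Illustrative only: shows the idea behind matryoshka_dimensions,
    not how vLLM implements it internally.
    """
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# A toy 4-dim "full" embedding truncated to 2 dims.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
# `short` keeps the first 2 components, rescaled to unit length.
```

Smaller dimensions trade a little accuracy for faster similarity search and lower storage, which is why exposing several sizes from one model is useful.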

Hi @ndurkee ,

Thanks for reaching out, and welcome to the Gemma family of open models. Yes, you can run the model with the basic vllm serve command you mentioned above. If you would like to experiment with different vector dimensions, add the --hf_overrides flag; google/embeddinggemma-300m supports embedding dimensions from 128 to 768.

Thank you so much for your interest in Gemma models.

Thanks.
