vllm docs

#1
by ndurkee - opened

Could we add some docs to the README for how to run this with vllm? I haven't been able to do my full tests but it appears to work with

vllm serve google/embeddinggemma-300m --dtype bfloat16

Because the model was trained with Matryoshka Representation Learning (MRL), a single checkpoint can produce embeddings at several sizes. Passing --hf_overrides '{"matryoshka_dimensions":[128,256,512,768]}' lets you pick the embedding size that best balances speed and accuracy for your application.

vllm serve google/embeddinggemma-300m --dtype bfloat16 --hf_overrides '{"matryoshka_dimensions":[128,256,512,768]}'
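To illustrate what the matryoshka dimensions mean, here is a minimal sketch of MRL-style truncation: a shorter embedding is just a prefix of the full vector, L2-renormalized. This is a conceptual illustration, not vLLM's actual implementation; the function name is made up.

```python
import math

def truncate_embedding(embedding, dim):
    """Truncate a matryoshka embedding to `dim` components and L2-renormalize.

    Illustrative only: shows the idea behind matryoshka_dimensions,
    not how vLLM implements it internally.
    """
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# A toy 4-dim "full" embedding truncated to 2 dims.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
# `short` keeps the first 2 components, rescaled to unit length.
```

Smaller dimensions trade a little accuracy for faster similarity search and lower storage, which is why exposing several sizes from one model is useful.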

Hi @ndurkee ,

Thanks for reaching out, and welcome to the Gemma family of open models. Yes, you can run the model with the basic vllm serve command you mentioned above. If you would like to experiment with different vector dimensions, add the --hf_overrides flag; google/embeddinggemma-300m supports embedding dimensions from 128 to 768.

Thank you so much for your interest in Gemma models.

Thanks.
