---
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
pipeline_tag: text-generation
extra_gated_heading: Access MedGemma on Hugging Face
extra_gated_prompt: >-
  To access MedGemma on Hugging Face, you're required to review and agree to
  [Health AI Developer Foundation's terms of
  use](https://developers.google.com/health-ai-developer-foundations/terms). To
  do this, please ensure you're logged in to Hugging Face and click below.
  Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/medgemma-27b-text-it
tags:
  - medical
  - clinical-reasoning
  - thinking
---

# litert-community/MedGemma-27B-IT

This model provides a few variants of [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it) that are ready for deployment on the web using the MediaPipe LLM Inference API.

## Web

To add the model to your web app, please follow the instructions in our documentation.

## Performance

### Web

Note that all benchmark stats were measured on a 2024 MacBook Pro (Apple M4 Max chip) running Chrome, with a KV cache size of 1280, 1024 prefill tokens, and 256 decode tokens.

| Precision | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | GPU Memory | CPU Memory | Model size | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F16 activations / int8 weights | GPU | 167 tk/s | 8 tk/s | 14.9 s | 27.0 GB | 1.5 GB | 27.05 GB | 🔗 |
| F32 activations / int8 weights | GPU | 97 tk/s | 8 tk/s | 15.0 s | 28.0 GB | 1.5 GB | 27.05 GB | 🔗 |

* Model size: measured by the size of the .tflite flatbuffer (the serialization format for LiteRT models).
* int8: quantized model with int8 weights and float activations.
* GPU memory: measured via the "GPU Process" memory for all of Chrome while running. Chrome was measured as using 340-350 MB before any model loading took place.
* CPU memory: measured for the entire tab while running. The tab was measured as using 60-70 MB before any model loading took place.