metadata

license: apache-2.0
base_model:
  - apple/aimv2-large-patch14-448
pipeline_tag: zero-shot-object-detection
tags:
  - t5gemma
  - vision

TMoon

3D positional embedding (as in the text encoder replacement, object detection and others)
Supports multiple image aspect ratios
Contrastive prediction model
AIMv2 vision model
T5Gemma text encoder
Moondream2 dataset