Tiny Llava 4 CPU 🐛


🚀 Model Overview

tiny-llava-open-elm-aimv2 is a lightweight image-text-to-text model that combines OpenELM-270M-Instruct as the LLM backbone with AIMv2-Large-Patch14-224-distilled (309M) as the vision encoder. The model has been fine-tuned using LoRA (Low-Rank Adaptation) for efficient training. It was developed with the TinyLLaVA Factory codebase, which provides a modular framework for lightweight multi-modal models.
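
For orientation, LoRA fine-tuning in this setup trains only small low-rank adapter matrices injected into the backbone's attention projections while the base weights stay frozen. Below is a minimal sketch using Hugging Face `peft`; the rank, alpha, dropout, and target module names are illustrative assumptions, not the exact recipe used for this checkpoint:

```python
# Minimal LoRA setup sketch with Hugging Face peft.
# Hyperparameters and target module names are illustrative assumptions,
# not the actual training configuration of this model.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=16,                                      # low-rank dimension (assumed)
    lora_alpha=32,                             # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["qkv_proj", "out_proj"],   # OpenELM attention projections (assumed names)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```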

The model is designed to run efficiently on CPU, making it well suited to resource-constrained environments. It has been evaluated on the POPE and TextVQA benchmarks. The total model size is roughly 0.6B parameters.
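
A minimal CPU inference sketch follows, assuming the checkpoint ships TinyLLaVA Factory's custom modeling code (loaded via `trust_remote_code`) with its `chat()` helper; the prompt and image URL are placeholders:

```python
# CPU inference sketch; assumes the repo provides TinyLLaVA Factory's
# remote modeling code exposing a chat() helper, as other TinyLLaVA
# Factory checkpoints do.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cpu4dream/llava-small-OpenELM-AIMv2-0.6B"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

prompt = "What is shown in this image?"
image_url = "https://example.com/sample.jpg"  # placeholder image URL

# chat() returns the generated answer and the generation time.
answer, gen_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)
print(answer)
```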


📊 Performance

| Model Name | VQAv2 | GQA | SQA | TextVQA | MM-VET | POPE | MME | MMMU |
|---|---|---|---|---|---|---|---|---|
| LLaVA-1.5-7B | 78.5 | 62.0 | 66.8 | 58.2 | 30.5 | 85.9 | 1510.7 | - |
| bczhou/TinyLLaVA-3.1B | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
| tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B | 78.4 | 61.6 | 64.4 | 53.6 | 26.9 | 86.4 | 1339.0 | 31.7 |
| tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
| cpu4dream/llava-small-OpenELM-AIMv2-0.6B | - | - | - | 39.68 | - | 83.93 | - | - |

🔗 References
