VITA-MLLM/VITA-1.5

Video-Text-to-Text


This repository contains the model of the paper VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

Code: https://github.com/VITA-MLLM/VITA

Downloads last month: 875

Safetensors

Model size: 8.32B params

Tensor type: BF16
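As a rough illustration of what the listed model size and tensor type imply for storage, the sketch below estimates the size of the weights alone. It assumes every one of the 8.32B parameters is stored as BF16 (2 bytes each), which the card does not state explicitly, and it ignores activations, caches, and any runtime overhead. The names `PARAM_COUNT`, `BYTES_PER_BF16`, and `weight_bytes` are hypothetical and exist only for this example.

```python
# Back-of-the-envelope estimate of weight storage only.
# Assumption: all parameters are stored in BF16; the card lists a single tensor type.

PARAM_COUNT = 8.32e9   # "8.32B params" from the model card
BYTES_PER_BF16 = 2     # bfloat16 is a 16-bit (2-byte) format


def weight_bytes(param_count: float, bytes_per_param: int) -> float:
    """Return the number of bytes needed to hold the raw weights."""
    return param_count * bytes_per_param


total = weight_bytes(PARAM_COUNT, BYTES_PER_BF16)

# Report in decimal gigabytes (1 GB = 10**9 bytes) and binary gibibytes (1 GiB = 2**30 bytes).
print(f"Weights: {total / 1e9:.2f} GB")
print(f"Weights: {total / 2**30:.2f} GiB")
```

Under these assumptions the weights come to about 16.64 GB. Actual memory use during inference will be higher.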


This model is not currently available via any of the supported Inference Providers.

The model cannot be deployed to the HF Inference API: The model has no library tag.