E.T. Chat

arXiv | Project Page | GitHub

E.T. Chat is a novel time-sensitive Video-LLM that reformulates timestamp prediction as an embedding matching problem, serving as a strong baseline on E.T. Bench. E.T. Chat consists of a visual encoder, a frame compressor, and an LLM. A special token <vid> is introduced to trigger frame embedding matching for timestamp prediction.
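
The embedding matching idea can be illustrated with a toy sketch. This is not the released implementation: the function names, the use of cosine similarity as the matching score, the 3-dimensional embeddings, and the fixed sampling rate `fps` are all assumptions made for illustration. The sketch only shows the general pattern of comparing the embedding produced at a <vid> token against per-frame embeddings and mapping the best-matching frame index back to a time in seconds.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_timestamp(
    vid_embedding: Sequence[float],
    frame_embeddings: Sequence[Sequence[float]],
    fps: float,
) -> float:
    """Return the time (in seconds) of the frame that best matches the query.

    vid_embedding: hypothetical embedding emitted at the <vid> token.
    frame_embeddings: hypothetical per-frame embeddings, in temporal order.
    fps: assumed frame sampling rate (frames per second).
    """
    if not frame_embeddings:
        raise ValueError("frame_embeddings must not be empty")
    scores = [cosine_similarity(vid_embedding, f) for f in frame_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index / fps


# Toy data: four frames sampled at 2 frames per second.
frames = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]
query = [0.1, 0.9, 0.0]  # closest in direction to the second frame

timestamp = match_timestamp(query, frames, fps=2.0)
print(timestamp)  # 0.5
```

Here the query is most similar to the frame at index 1, which corresponds to 0.5 seconds at the assumed sampling rate. Selecting a frame by similarity, rather than generating the digits of a timestamp as text, is the general pattern the description above refers to; the scoring function and frame sampling used by the actual model may differ.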

🔖 Model Details

Model Description

  • Developed by: Ye Liu
  • Model type: Multi-modal Large Language Model
  • Language(s): English
  • License: BSD-3-Clause

Training Data

The stage-3 checkpoint of E.T. Chat was trained on the ET-Instruct-164K dataset.

More Details

Please refer to our GitHub Repository for more details about this model.

📖 Citation

Please cite our paper if you find this project helpful.

@inproceedings{liu2024etbench,
  title={E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding},
  author={Liu, Ye and Ma, Zongyang and Qi, Zhongang and Wu, Yang and Chen, Chang Wen and Shan, Ying},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024}
}