VisionLLaMA-Base-MAE

Following the Masked Autoencoder (MAE) paradigm, VisionLLaMA-Base-MAE is pretrained on ImageNet-1K without labels. It delivers substantial improvements on ImageNet-1K classification (supervised fine-tuning and linear probing) and on ADE20K semantic segmentation.
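The MAE objective can be illustrated with a minimal sketch: randomly mask a large fraction of image patches, encode only the visible ones, and reconstruct the masked pixels. The masking ratio, patch grid, and helper names below are illustrative assumptions, not the exact VisionLLaMA training configuration.

```python
# Minimal sketch of MAE-style random patch masking (illustrative only).
# Assumed values: 14x14 patch grid, 768-dim embeddings, 75% mask ratio.
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (B, N, D) patch embeddings; returns kept patches and the binary mask."""
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=patches.device)      # random score per patch
    ids_shuffle = noise.argsort(dim=1)                    # lowest scores are kept
    ids_keep = ids_shuffle[:, :num_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=patches.device)        # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return kept, mask

# Example: a batch of 2 images, 196 patches each.
kept, mask = random_masking(torch.randn(2, 196, 768))
print(kept.shape, mask.sum(dim=1))   # (2, 49, 768); ~147 patches masked per image
```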

| Model | ImageNet-1K Top-1 Acc (SFT) | ImageNet-1K Top-1 Acc (Linear Probe) | ADE20K mIoU |
|---|---|---|---|
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |
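For context, the linear-probe numbers correspond to freezing the pretrained encoder and training only a linear classifier on its features. The sketch below uses a placeholder `backbone` and `feat_dim`; it is not the VisionLLaMA evaluation code.

```python
# Illustrative linear-probing setup: freeze the pretrained backbone and train
# only a linear head on top of its pooled features (backbone is assumed to
# return (B, feat_dim) feature vectors).
import torch
import torch.nn as nn

def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int = 1000):
    for p in backbone.parameters():
        p.requires_grad = False                  # keep pretrained weights fixed
    head = nn.Linear(feat_dim, num_classes)      # the only trainable module
    model = nn.Sequential(backbone, head)
    optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    return model, optimizer
```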

How to Use

Please refer to the GitHub page for usage instructions.
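The exact loading API is defined in that repository. As a rough illustration of how an MAE-pretrained checkpoint is typically consumed, one might inspect it with plain PyTorch; the filename and state-dict layout below are assumptions.

```python
# Rough illustration only: inspect an MAE-pretrained checkpoint with plain PyTorch.
# The filename is a placeholder; use the model builders and fine-tuning /
# linear-probe scripts from the GitHub page for actual training and evaluation.
import torch

ckpt = torch.load("visionllama_base_mae.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)             # many MAE checkpoints nest weights under "model"
print(f"{len(state_dict)} tensors in checkpoint")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))             # peek at the first few parameter shapes
```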

Citation

@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}