MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

[2025/02/10] 🔥 We are proud to open-source MM-RLHF, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

  • A high-quality MLLM alignment dataset.
  • A strong Critique-Based MLLM reward model and its training algorithm.
  • A novel alignment algorithm MM-DPO.
  • Two new benchmarks.

Our dataset and algorithms enable consistent performance improvements across 10 dimensions and 27 benchmarks.

Use

Intended use

The model was trained on MM-RLHF data and can interact with single images, multi-image inputs, and videos.


Feel free to share your generations in the Community tab!

Generation

We provide a simple generation example for using our model below. For more details, please refer to the GitHub repository.
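The snippet below is a minimal sketch, not the official pipeline: it loads the checkpoint through Hugging Face transformers with `trust_remote_code` and scores a single image-question-answer pair with the critique-based reward model. The prompt template, the processor call, and the structure of the model output are assumptions here; the scoring script in the GitHub repository is the authoritative reference.

```python
# Minimal sketch: load the reward model and score one (image, question, answer)
# pair. Assumes the checkpoint ships custom modeling code usable through
# trust_remote_code; the prompt format and output handling are illustrative.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are stored in BF16
    trust_remote_code=True,
    device_map="auto",
).eval()

image = Image.open("example.jpg")
question = "What is happening in this image?"
answer = "A dog is catching a frisbee in a park."

# Build a LLaVA-OneVision-style prompt; this template is an assumption.
prompt = f"<image>\nQuestion: {question}\nAnswer: {answer}"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

# The reward model is critique-based, so the output may include a critique in
# addition to a scalar score; inspect `outputs` (or use the repo's scoring
# helper) to extract the reward for your pair.
print(outputs)
```

To rank several candidate answers for the same image, score each candidate the same way and keep the one with the highest reward.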

Citation

If you find this project useful for your research or applications, please cite the related paper using this BibTeX:

@article{zhang2025mm,
  title={MM-RLHF: The Next Step Forward in Multimodal LLM Alignment},
  author={Zhang, Yi-Fan and Yu, Tao and Tian, Haochen and Fu, Chaoyou and Li, Peiyan and Zeng, Jianshu and Xie, Wulin and Shi, Yang and Zhang, Huanyu and Wu, Junkang and others},
  journal={arXiv preprint arXiv:2502.10391},
  year={2025}
}