DeepSeer: Vision Language Models with Reasoning

Vision language models with chain-of-thought reasoning are just starting to emerge. This repository is a proof of concept for training a vision language model that uses the thinking-enabled chat templates of the DeepSeek-R1 models.
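As a rough illustration of what a thinking-enabled output looks like downstream, the sketch below separates the reasoning from the final answer. It assumes the model follows the DeepSeek-R1 convention of wrapping its chain of thought in `<think>` and `</think>` tags. The helper name `split_reasoning` is hypothetical and not part of seers, and the exact text printed by `predict.py` may differ.

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes the DeepSeek-R1 convention: reasoning ends at "</think>".
    The opening "<think>" may be missing from the completion when the
    chat template already placed it in the prompt, so it is optional here.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # Drop the opening tag if the completion includes it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>The image shows two cats.</think>There are two cats."
    reasoning, answer = split_reasoning(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Keeping the two parts separate makes it easy to show only the answer to a user while logging the reasoning for inspection.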

Setup

pip install git+https://github.com/facebookresearch/schedule_free.git
pip install peft
git clone https://github.com/mkturkcan/seers.git
cd seers/seers/
git clone https://huggingface.co/mehmetkeremturkcan/DeepSeer-R1-Vision-Distill-Qwen-1.5B_google_vit-base-patch16-224

Test

Run

python predict.py

Training Details

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the 5CD-AI/LLaVA-CoT-o1-Instruct dataset. It has been trained using seers.
