---
license: cc-by-nc-sa-4.0
datasets:
  - PengxiangLi/SPORT
language:
  - en
base_model:
  - Qwen/Qwen2-VL-7B-Instruct
---

# 🎯 SPORT: Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
[![arXiv](https://img.shields.io/badge/arXiv-2504.21561-b31b1b.svg)](https://arxiv.org/abs/2504.21561) [![Project Page](https://img.shields.io/badge/Project-Page-2ea44f)](https://sport-agents.github.io) [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/pdf/2504.21561)
This repository contains the **LoRA checkpoint** for **SPORT**, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization. We finetuned **Qwen2-VL-7B-Instruct** with **LoRA adapters** using **Direct Preference Optimization (DPO)**, making the model more effective at reasoning over multimodal tool-use tasks and better aligned with preference signals.

---

## 📋 Key Features

* **LoRA Fine-tuning**: Lightweight finetuning on top of Qwen2-VL-7B-Instruct for efficient adaptation.
* **DPO Training**: Preference-based optimization for stronger alignment without human annotations.
* **Task Synthesis**: Multimodal task generation via LLMs for broad coverage.
* **Step Exploration**: Multiple candidate actions sampled per decision point.
* **Step Verification**: LLM-based critics evaluate and rank candidate outcomes.
* **Self-Improvement Loop**: Iterative cycle of task creation, exploration, and refinement.

---

## 🚀 Performance Highlights

On the **GTA benchmark**, SPORT demonstrates consistent improvements over strong baselines:

* **+7%** Answer Accuracy (AnsAcc)
* **+8%** Tool Accuracy (ToolAcc)
* **+7%** Code Execution Success (CodeExec)

---

## 💾 Model Details

* **Base Model**: [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
* **Finetuning Method**: LoRA (rank 64, α=16)
* **Optimization**: Direct Preference Optimization (DPO)
* **Checkpoint**: LoRA weights only (load on top of the base model, or merge into it, for inference)

---

## 🛠️ Usage

Qwen2-VL is a vision-language model, so it is loaded with `Qwen2VLForConditionalGeneration` and `AutoProcessor` rather than the text-only `AutoModelForCausalLM`/`AutoTokenizer` classes:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_model = "Qwen/Qwen2-VL-7B-Instruct"
lora_ckpt = "your-hf-username/SPORT-LoRA-7B"  # replace with the actual checkpoint repo

processor = AutoProcessor.from_pretrained(base_model)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model, torch_dtype="auto", device_map="auto"
)
# Attach the SPORT LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, lora_ckpt)
```

---

## 📝 Citation

If you use SPORT or this checkpoint in your research, please cite:

```bibtex
@inproceedings{li2025iterative,
  title={Iterative Trajectory Exploration for Multimodal Agents},
  author={Li, Pengxiang and Gao, Zhi and Zhang, Bofei and Mi, Yapeng and Ma, Xiaojian and Shi, Chenrui and Yuan, Tao and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
  year={2025},
  eprint={2504.21561},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2504.21561},
}
```

---

⚠️ **Note**: This repository only provides LoRA weights. You must load them on top of the base **Qwen2-VL-7B-Instruct** model for inference, or merge them into it first; see the sketches below.
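If you prefer a standalone checkpoint (e.g., for serving without `peft` at inference time), the adapter can be folded into the base weights with PEFT's `merge_and_unload()`. A minimal sketch, assuming the placeholder checkpoint repo id from the Usage section and a hypothetical output directory:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_model = "Qwen/Qwen2-VL-7B-Instruct"
lora_ckpt = "your-hf-username/SPORT-LoRA-7B"  # placeholder repo id

model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_ckpt)

# Fold the LoRA deltas into the base weights and drop the adapter wrappers,
# leaving a plain Qwen2VLForConditionalGeneration that no longer needs peft.
model = model.merge_and_unload()

# Save a self-contained checkpoint (hypothetical output path)
model.save_pretrained("./sport-merged")
AutoProcessor.from_pretrained(base_model).save_pretrained("./sport-merged")
```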
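For a quick end-to-end check, the snippet below runs one image-question round trip using the standard Qwen2-VL chat-template flow. It assumes the `model` and `processor` objects built in the Usage section above (the merged model from the previous sketch works identically); `demo.jpg` and the question are hypothetical examples.

```python
from PIL import Image

# Hypothetical local image; any RGB image works
image = Image.open("demo.jpg")

# Qwen2-VL chat format: an image placeholder followed by the text question
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Which tool would you use to answer questions about this image?"},
        ],
    }
]

# Render the chat template, then tokenize text and image together
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens before detokenizing
output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```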
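For completeness, the LoRA hyperparameters listed under Model Details (rank 64, α=16) correspond to a PEFT config along these lines. Everything beyond `r` and `lora_alpha` is an assumption for illustration; the card does not specify target modules or dropout:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,           # rank, from Model Details
    lora_alpha=16,  # alpha, from Model Details
    lora_dropout=0.05,                                        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",                                    # assumed
)
```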