FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published 5 days ago • 18
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published Jul 8 • 47
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2 • 61
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 97
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published Apr 23 • 13
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3 • 27
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published Jan 20 • 29
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 18 items • Updated Jul 28 • 39
Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration Paper • 2307.05300 • Published Jul 11, 2023 • 19