Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 5 days ago
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27, 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13, 2024
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published Aug 20, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models Paper • 2402.05935 • Published Feb 8, 2024
AIMO Progress Prize Collection Models and datasets used in the winning solution to the AIMO 1st Progress Prize • 7 items • Updated Jul 19, 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024
4M Models Collection Multimodal models from https://4m.epfl.ch/ • 14 items • Updated Jun 14, 2024
Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation Paper • 2406.08860 • Published Jun 13, 2024
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification Paper • 2405.16591 • Published May 26, 2024