Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.13923

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 18 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 83
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 146
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

about 10 hours ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 4 days ago • 136

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 3 days ago • 98
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

Paper • 2502.14834 • Published 3 days ago • 22
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 4 days ago • 136

Improved GUI Grounding via Iterative Narrowing

Paper • 2411.13591 • Published Nov 18, 2024
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 4 days ago • 136

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 4 days ago • 136

about 8 hours ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 25
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 97

Image-Video MultiModal Understanding

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Paper • 2501.01245 • Published Jan 2 • 5
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14 • 32

Instruction Following without Instruction Tuning

Paper • 2409.14254 • Published Sep 21, 2024 • 29
Baichuan Alignment Technical Report

Paper • 2410.14940 • Published Oct 19, 2024 • 50
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 60
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Paper • 2410.18558 • Published Oct 24, 2024 • 19

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 346
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 141
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 3
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 4 days ago • 136

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs