Daily Papers

by AK and the research community

Aug 7

Submitted by

akhaliq

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

·
12 authors

Submitted by

akhaliq

LLaVA-OneVision: Easy Visual Task Transfer

·
10 authors

Submitted by

akhaliq

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

·
4 authors

Submitted by

akhaliq

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

·
4 authors

Submitted by

akhaliq

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

·
11 authors

Submitted by

akhaliq

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

·
6 authors

Submitted by

akhaliq

CoverBench: A Challenging Benchmark for Complex Claim Verification

·
8 authors

Submitted by

akhaliq

Diffusion Models as Data Mining Tools

·
5 authors

Submitted by

davanstrien

Synthesizing Text-to-SQL Data from Weak and Strong LLMs

·
6 authors

Submitted by

akhaliq

ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

·
13 authors

Submitted by

Bowieee

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

·
7 authors

Submitted by

MarkWang

AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

·
7 authors