Daily Papers

by AK and the research community

Dec 6

Submitted by

Senqiao

VisionZip: Longer is Better but Not Necessary in Vision Language Models

·
7 authors

Submitted by

ranpox

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

·
9 authors

Submitted by

jiuhai

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

·
7 authors

Submitted by

JeffreyXiang

Structured 3D Latents for Scalable and Versatile 3D Generation

·
9 authors

Submitted by

zhijianliu

NVILA: Efficient Frontier Visual Language Models

·
27 authors

Submitted by

seungone

Evaluating Language Models as Synthetic Data Generators

·
10 authors

Submitted by

Zhoues

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

·
8 authors

Submitted by

sunovivid

A Noise is Worth Diffusion Guidance

·
12 authors

Submitted by

huanngzh

MV-Adapter: Multi-view Consistent Image Generation Made Easy

·
7 authors

Submitted by

Crayon-Shinchan

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

·
8 authors

Submitted by

jsingh

Negative Token Merging: Image-based Adversarial Feature Guidance

·
10 authors

Submitted by

xcjthu

Densing Law of LLMs

·
7 authors

Submitted by

dvilasuero

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

·
23 authors

Submitted by

leo1117

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

·
8 authors

Submitted by

Franck-Dernoncourt

Personalized Multimodal Large Language Models: A Survey

·
27 authors

Submitted by

BryanW

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

·
7 authors

Submitted by

affjljoo3581

Monet: Mixture of Monosemantic Experts for Transformers

·
4 authors

Submitted by

jacklishufan

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

·
7 authors

Submitted by

adrianb1

Discriminative Fine-tuning of LVLMs

·
7 authors

Submitted by

ltzheng

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

·
9 authors

Submitted by

akhaliq

Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

·
20 authors

Submitted by

haoningwu

Towards Universal Soccer Video Understanding

·
6 authors

Submitted by

xumingyu16

KV Shifting Attention Enhances Language Modeling

·
4 authors

Submitted by

kpzhang996

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

·
7 authors

Submitted by

james371507

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

·
10 authors

Submitted by

JungleGym

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

·
6 authors

Submitted by

russwang

Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

·
9 authors

Submitted by

haoningwu

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

·
5 authors

Submitted by

ethanbradley

SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction

·
4 authors

Submitted by

liujch1998

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

·
12 authors

Submitted by

wentingzhao

Challenges in Trustworthy Human Evaluation of Chatbots

·
3 authors