Daily Papers

by AK and the research community

Nov 27

Submitted by

guyuchao

ROICtrl: Boosting Instance Control for Visual Generation

·
8 authors

Submitted by

KevinQHLin

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

·
9 authors

Submitted by

BestWishYsh

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

·
8 authors

Submitted by

noamrot

Pathways on the Image Manifold: Image Editing via Video Generation

·
6 authors

Submitted by

SadilKhan

MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

·
7 authors

Submitted by

shuaishuaicdp

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

·
11 authors

Submitted by

huangsiteng

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

·
7 authors

Submitted by

yifanzhang114

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

·
12 authors

Submitted by

yaelvinker

SketchAgent: Language-Driven Sequential Sketch Generation

·
6 authors

Submitted by

sggetao

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

·
6 authors

Submitted by

cyw-3d

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

·
5 authors

Submitted by

tobiaslee

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

·
12 authors

Submitted by

Xuweiyi

Learning 3D Representations from Procedural 3D Programs

·
2 authors

Submitted by

arkimjh

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

·
4 authors

Submitted by

hhua2

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

·
8 authors

Submitted by

akhaliq

AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation

·
10 authors

Submitted by

SanghyeokLee

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

·
3 authors

Submitted by

phenixace

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

·
9 authors

Submitted by

yisol

Controllable Human Image Generation with Personalized Multi-Garments

·
5 authors

Submitted by

amanchadha

Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)

·
14 authors