
Daily Papers

by AK and the research community

Dec 13

Submitted by

akhaliq

Phi-4 Technical Report

27 authors

Submitted by

myownskyW7

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

29 authors

Submitted by

oliu-io

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

5 authors

Submitted by

wcy1122

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

15 authors

Submitted by

unilm

Multimodal Latent Language Modeling with Next-Token Diffusion

8 authors

Submitted by

ranpox

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

8 authors

Submitted by

alanspike

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

19 authors

Submitted by

CaraJ

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

8 authors

Submitted by

arielgera

JuStRank: Benchmarking LLM Judges for System Ranking

6 authors

Submitted by

kangnamgyu27

PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations

4 authors

Submitted by

zxhezexin

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

5 authors

Submitted by

lisabdunlap

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

8 authors

Submitted by

OAOA

Arbitrary-steps Image Super-resolution via Diffusion Inversion

3 authors

Submitted by

danjacobellis

Learned Compression for Compressed Learning

2 authors

Submitted by

praeclarumjj3

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

5 authors

Submitted by

wenyueH

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

7 authors

Submitted by

shuangfei

Normalizing Flows are Capable Generative Models

10 authors

Submitted by

andreim14

Word Sense Linking: Disambiguating Outside the Sandbox

5 authors

Submitted by

versae

The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

18 authors

Submitted by

Yw22

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

7 authors

Submitted by

enisimsar

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

4 authors

Submitted by

bluestyle97

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

3 authors

Submitted by

adhiraj1998

ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities

6 authors

Submitted by

praeclarumjj3

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

6 authors

Submitted by

ZGZzz

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

6 authors

Submitted by

rumourscape

Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

2 authors