Daily Papers

by AK and the research community

Sep 4

Submitted by

HaoranWei

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

·
12 authors

Submitted by

Muennighoff

OLMoE: Open Mixture-of-Experts Language Models

·
24 authors

Submitted by

SushantGautam

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

·
7 authors

Submitted by

sheryc

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

·
11 authors

Submitted by

akhaliq

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

·
8 authors

Submitted by

Huage001

LinFusion: 1 GPU, 1 Minute, 16K Image

·
4 authors

Submitted by

akhaliq

FLUX that Plays Music

·
4 authors

Submitted by

zlzheng

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

·
4 authors

Submitted by

akhaliq

Diffusion Policy Policy Optimization

·
9 authors

Submitted by

akhaliq

Compositional 3D-aware Video Generation with LLM Director

·
6 authors

Submitted by

akhaliq

ContextCite: Attributing Model Generation to Context

·
4 authors

Submitted by

akhaliq

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

·
9 authors

Submitted by

akhaliq

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

·
8 authors

Submitted by

whlzy

GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI

·
5 authors

Submitted by

akhaliq

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

·
10 authors

Submitted by

amanchadha

Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders

·
4 authors

Submitted by

antoinelouis

Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

·
3 authors

Submitted by

de-Rodrigo

The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts

·
4 authors

Submitted by

EchoShao8899

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

·
5 authors