Submitted by HaoranWei 83 General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model · 12 authors 9
Submitted by sheryc 38 LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models · 11 authors 2
Submitted by akhaliq 36 DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos · 8 authors 3
Submitted by zlzheng 27 VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges · 4 authors 6
Submitted by akhaliq 14 OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model · 9 authors 2
Submitted by akhaliq 12 Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization · 8 authors 2
Submitted by whlzy 9 GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI · 5 authors 3
Submitted by akhaliq 6 Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation · 10 authors 2
Submitted by amanchadha 4 Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders · 4 authors 3
Submitted by antoinelouis 3 Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain · 3 authors 2
Submitted by de-Rodrigo 2 The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts · 4 authors 2
Submitted by EchoShao8899 1 PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action · 5 authors 2