StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 13 days ago • 17
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 17 days ago • 200
VABench: A Comprehensive Benchmark for Audio-Video Generation Paper • 2512.09299 • Published 26 days ago • 7
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 139
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 50
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15, 2025 • 30
NativeRes-LLaVA Collection LLaVA using images with native resolution • 7 items • Updated Jun 14, 2025 • 5
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5, 2025 • 51
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15, 2025 • 2