Submitted by zlatamaria 139 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA · 9 authors 18 4
Submitted by chenguolin 78 PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers · 7 authors 2.1k 8
Submitted by lss727 37 Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning · 9 authors 5 2
Submitted by abhi1nandy2 36 Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs · 3 authors 2
Submitted by russwang 34 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning · 13 authors 28 2
Submitted by Shunian 30 FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion · 8 authors 78 2
Submitted by thomagram 22 STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis · 10 authors 2
Submitted by DarthZhu 22 Is Extending Modality The Right Path Towards Omni-Modality? · 4 authors 13 2
Submitted by scott-yjyang 20 Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning · 11 authors 74 2
Submitted by dcml0714 15 Audio-Aware Large Language Models as Judges for Speaking Styles · 11 authors 4
Submitted by DhavalPatel 14 AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance · 8 authors 153 3
Submitted by EmetTheGolum 10 Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery · 4 authors 2
Submitted by cg1177 9 Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision · 8 authors 12 2
Submitted by salman-abdullah 9 MIRIAD: Augmenting LLMs with millions of medical query-response pairs · 10 authors 116 2
Submitted by zhwang01 9 CodeContests+: High-Quality Test Case Generation for Competitive Programming · 5 authors 5
Submitted by MauroC 7 Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data · 6 authors 2
Submitted by Hoyard 6 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model · 7 authors 26 2
Submitted by sy1998 5 When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding · 10 authors 2
Submitted by guineapig 5 HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization · 3 authors 16 2
Submitted by benshi34 4 When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration · 6 authors 2
Submitted by JohnCage 4 Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward · 8 authors 47 2
Submitted by neildlf 3 GuideX: Guided Synthetic Data Generation for Zero-Shot Information Extraction · 4 authors 5 2