Submitted by akhaliq · 30 upvotes · Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution · 15 authors · 2 comments
Submitted by akhaliq · 28 upvotes · In-context Autoencoder for Context Compression in a Large Language Model · 5 authors
Submitted by akhaliq · 23 upvotes · InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation · 13 authors
Submitted by akhaliq · 23 upvotes · Stack More Layers Differently: High-Rank Training Through Low-Rank Updates · 4 authors
Submitted by akhaliq · 14 upvotes · SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning · 6 authors · 1 comment
Submitted by akhaliq · 10 upvotes · Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events · 11 authors · 1 comment
Submitted by akhaliq · 10 upvotes · DNAGPT: A Generalized Pretrained Tool for Multiple DNA Sequence Analysis Tasks · 6 authors
Submitted by akhaliq · 10 upvotes · Instruction Mining: High-Quality Instruction Data Selection for Large Language Models · 3 authors
Submitted by akhaliq · 8 upvotes · Generating Benchmarks for Factuality Evaluation of Language Models · 10 authors
Submitted by akhaliq · 7 upvotes · T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation · 5 authors
Submitted by akhaliq · 4 upvotes · VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models · 6 authors
Submitted by akhaliq · 3 upvotes · Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations · 3 authors