

Weitai Kang

weitaikang


https://weitaikang.github.io/

AI & ML interests

Large Multimodal Models, Visual Grounding

Organizations

None yet

authored a paper 3 months ago

InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

Paper • 2505.10887 • Published May 16, 2025 • 10

authored 8 papers 11 months ago

Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

Paper • 2409.12963 • Published Sep 19, 2024

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Paper • 2410.00255 • Published Sep 30, 2024 • 5

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Paper • 2407.03200 • Published Jul 3, 2024

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

Paper • 2405.18295 • Published May 28, 2024

ACTRESS: Active Retraining for Semi-supervised Visual Grounding

Paper • 2407.03251 • Published Jul 3, 2024

Visual Grounding with Attention-Driven Constraint Balancing

Paper • 2407.03243 • Published Jul 3, 2024

On the Faithfulness of Vision Transformer Explanations

Paper • 2404.01415 • Published Apr 1, 2024

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Paper • 2403.14552 • Published Mar 21, 2024