On Vacation 🏝️

Delin Qu

delinqu

https://delinqu.github.io/

AI & ML interests

Embodied AI, 3D Vision

Recent Activity

updated a dataset 6 days ago

delinqu/comet-1.5k

liked a dataset 6 days ago

delinqu/comet-1.5k

upvoted a paper 2 days ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published 3 days ago • 77

upvoted a paper 17 days ago

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Paper • 2512.19629 • Published 18 days ago • 25

upvoted a paper 18 days ago

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published 25 days ago • 100

upvoted an article 18 days ago

Article

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

Aug 21, 2024

upvoted a paper 25 days ago

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

Paper • 2512.10071 • Published 30 days ago • 17

upvoted a paper 29 days ago

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Paper • 2512.10949 • Published 29 days ago • 45

upvoted 3 papers about 1 month ago

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 68

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Paper • 2512.04678 • Published Dec 4, 2025 • 40

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Paper • 2512.02834 • Published Dec 2, 2025 • 40

upvoted 3 papers 4 months ago

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Paper • 2505.24625 • Published May 30, 2025 • 9

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Paper • 2509.06951 • Published Sep 8, 2025 • 32

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28, 2025 • 77

upvoted a collection 4 months ago

EO-Robotics

Collection

EmbodiedOneVision is a unified framework for multimodal embodied reasoning and robot control, featuring interleaved vision-text-action pretraining. • 8 items • Updated Dec 7, 2025 • 8

upvoted 2 papers 5 months ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published Aug 7, 2025 • 73

Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 67

upvoted 2 papers 6 months ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 316

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 159

upvoted a collection 6 months ago

Libero Benchmark Dataset

Collection

18 items • Updated Aug 28, 2025 • 7

upvoted a paper 6 months ago

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

Paper • 2505.21432 • Published May 27, 2025 • 4

upvoted a paper 7 months ago

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22, 2025 • 66
