V-DPM: 4D Video Reconstruction with Dynamic Point Maps
Abstract
The V-DPM framework extends Dynamic Point Maps to video input, achieving state-of-the-art 3D and 4D reconstruction by recovering both dynamic depth and the full 3D motion of scene points.
Powerful 3D representations such as DUSt3R's invariant point maps, which encode 3D shape and camera parameters, have significantly advanced feed-forward 3D reconstruction. While point maps assume static scenes, Dynamic Point Maps (DPMs) extend this concept to dynamic 3D content by additionally representing scene motion. However, existing DPMs are limited to image pairs and, like DUSt3R, require post-processing via optimization when more than two views are involved. We argue that DPMs are more useful when applied to videos and introduce V-DPM to demonstrate this. First, we show how to formulate DPMs for video input in a way that maximizes representational power, facilitates neural prediction, and enables reuse of pretrained models. Second, we implement these ideas on top of VGGT, a recent and powerful 3D reconstructor. Although VGGT was trained on static scenes, we show that a modest amount of synthetic data is sufficient to adapt it into an effective V-DPM predictor. Our approach achieves state-of-the-art performance in 3D and 4D reconstruction for dynamic scenes. In particular, unlike recent dynamic extensions of VGGT such as P3, DPMs recover not only dynamic depth but also the full 3D motion of every point in the scene.
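To make the output of a video dynamic-point-map predictor concrete, here is a minimal, hedged sketch in PyTorch. The function name, tensor names, and shapes are illustrative assumptions, not the paper's actual interface; it only conveys the idea that each pixel of each frame gets two 3D points (its position at the frame's own time and at a shared reference time, both in a common reference camera frame), so their difference gives per-point 3D scene motion.

```python
# Illustrative sketch only: names and shapes are assumptions, not the V-DPM API.
import torch

def dummy_vdpm_predictor(video: torch.Tensor) -> dict:
    """video: (T, 3, H, W) RGB frames. Returns per-pixel dynamic point maps."""
    T, _, H, W = video.shape
    return {
        # 3D position of each pixel's scene point at its own frame time,
        # expressed in a common reference camera coordinate frame.
        "points_at_frame_time": torch.zeros(T, H, W, 3),
        # 3D position of the same scene point at a shared reference time.
        "points_at_reference_time": torch.zeros(T, H, W, 3),
        # Per-pixel confidence, as in DUSt3R-style point-map predictors.
        "confidence": torch.ones(T, H, W),
    }

video = torch.rand(8, 3, 224, 224)             # an 8-frame clip
out = dummy_vdpm_predictor(video)
# Subtracting the two point maps yields the full 3D motion of every pixel,
# which per-frame depth alone cannot provide.
scene_flow = out["points_at_frame_time"] - out["points_at_reference_time"]
print(scene_flow.shape)                        # (8, 224, 224, 3)
```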
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass (2025)
- Any4D: Unified Feed-Forward Metric 4D Reconstruction (2025)
- Efficiently Reconstructing Dynamic Scenes One D4RT at a Time (2025)
- AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend (2025)
- VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction (2025)
- LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction (2025)
- Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment (2025)
