RealMaster: Lifting Rendered Scenes into Photorealistic Video
Abstract
RealMaster combines video diffusion models with 3D engine outputs to generate photorealistic videos that maintain geometric accuracy and scene consistency through paired training and IC-LoRA distillation.
State-of-the-art video generation models produce remarkable photorealism, but they lack the precise control required to align generated content with specific scene requirements. Furthermore, without an underlying explicit geometry, these models cannot guarantee 3D consistency. Conversely, 3D engines offer granular control over every scene element and provide native 3D consistency by design, yet their output often remains trapped in the "uncanny valley". Bridging this sim-to-real gap requires both structural precision, where the output must exactly preserve the geometry and dynamics of the input, and global semantic transformation, where materials, lighting, and textures must be holistically transformed to achieve photorealism. We present RealMaster, a method that leverages video diffusion models to lift rendered video into photorealistic video while maintaining full alignment with the output of the 3D engine. To train this model, we generate a paired dataset via an anchor-based propagation strategy, where the first and last frames are enhanced for realism and propagated across the intermediate frames using geometric conditioning cues. We then train an IC-LoRA on these paired videos to distill the high-quality outputs of the pipeline into a model that generalizes beyond the pipeline's constraints, handling objects and characters that appear mid-sequence and enabling inference without requiring anchor frames. Evaluated on complex GTA-V sequences, RealMaster significantly outperforms existing video editing baselines, improving photorealism while preserving the geometry, dynamics, and identity specified by the original 3D control.
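The IC-LoRA distillation step described above can be illustrated with a toy sketch: a frozen base layer augmented with a trainable low-rank adapter, trained on paired (rendered, photorealistic) data with the rendered video supplied as an in-context condition by concatenation along the feature axis. This is a minimal illustration of the general LoRA + in-context-conditioning recipe, not the paper's actual architecture; all names, dimensions, and the simplified noise model are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA) adapter."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the adapter is trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Toy stand-in for a denoiser block: the rendered video features are
# concatenated with the noisy target features (in-context conditioning).
feat_dim = 32
denoiser = LoRALinear(nn.Linear(2 * feat_dim, feat_dim), rank=4)
opt = torch.optim.AdamW([denoiser.A, denoiser.B], lr=1e-3)

rendered = torch.randn(8, feat_dim)   # 3D-engine render (condition)
real = torch.randn(8, feat_dim)       # paired photorealistic target
noise = torch.randn_like(real)
noisy = real + noise                  # heavily simplified forward process

pred = denoiser(torch.cat([noisy, rendered], dim=-1))
loss = nn.functional.mse_loss(pred, noise)  # standard noise-prediction loss
loss.backward()
opt.step()
```

Because the base weights stay frozen and `B` starts at zero, the adapted model initially reproduces the base model exactly and only gradually specializes to the paired sim-to-real data, which is the property that lets the distilled model generalize beyond the anchor-based pipeline's constraints.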
Community
RealMaster lifts rendered scenes into photorealistic video with full 3D geometry preservation and alignment to a controllable 3D engine.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- CamLit: Unified Video Diffusion with Explicit Camera and Lighting Control (2026)
- Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion (2026)
- VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction (2026)
- CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation (2026)
- TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos (2026)
- 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model (2026)
- Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation (2026)