CVPR 2026 · Accepted

DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos

Real-time stereo inpainting at 768×1280@~25FPS on a single A100 with NFE=1.

ByteDance

*Equal contribution · †Corresponding author

Abstract

Task. Mono-to-stereo video conversion via stereo inpainting, filling occlusions while maintaining geometric consistency and temporal coherence.

Method. DreamStereo enables real-time HD stereo inpainting through three key innovations:

  • GAPW, gradient-aware backward warping for clean occlusion boundaries
  • PBDP, pseudo-binocular data pipeline generating training pairs from monocular videos
  • SASI, sparse attention inference achieving 10.7× DiT speedup

Performance. 768×1280 at 25 FPS on an A100 (40 ms per frame); 54 ms on an RTX 3090 and 33 ms on an RTX 4090. First real-time HD stereo inpainting system.
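The latency and frame-rate figures above are related by simple arithmetic. The sketch below converts the reported latencies to frames per second, assuming each latency is the end-to-end time for one frame (stated for the A100, assumed here for the other two GPUs). The helper name `latency_ms_to_fps` is illustrative and not part of any released code.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies as reported on this page (assumed to be per-frame).
reported_latency_ms = {"A100": 40.0, "RTX 3090": 54.0, "RTX 4090": 33.0}

for gpu, ms in reported_latency_ms.items():
    print(f"{gpu}: {ms:.0f} ms/frame -> {latency_ms_to_fps(ms):.1f} FPS")
```

Under that assumption, 40 ms corresponds to 25 FPS, 54 ms to about 18.5 FPS, and 33 ms to about 30.3 FPS.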

Figure 1

Real-Time

Why it runs in real time

SASI
  • SASI prunes 70%+ of tokens as redundant during inference.
  • A distilled VAE accelerates encoding and decoding.
  • Single-step inference (NFE=1) for real-time throughput.

Speed comparison
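To make the token-pruning idea concrete, here is a minimal sketch of score-based token selection. This page does not describe how SASI decides which tokens are redundant, so the per-token `scores`, the `keep_ratio` value, and the function name `prune_tokens` are all assumptions for illustration only.

```python
import numpy as np


def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float = 0.3):
    """Keep the highest-scoring fraction of tokens (hypothetical criterion).

    tokens: (N, D) array of token features.
    scores: (N,) importance score per token; higher means keep.
    Returns the kept tokens and their original indices, in original order.
    """
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    # Indices of the k largest scores, re-sorted to preserve token order.
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep], keep


tokens = np.arange(20, dtype=np.float32).reshape(10, 2)
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.0, 0.05, 0.6, 0.4])
kept, idx = prune_tokens(tokens, scores, keep_ratio=0.3)
print(idx, kept.shape)
```

With dense self-attention, cost grows with the square of the token count, so keeping 30% of tokens leaves about 9% of the pairwise interactions. The speedup actually achieved depends on the implementation and on the layers that are not attention-bound.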

Data Construction

PBDP
  • GAPW: backward warping + gradient-based occlusion masks for clean boundaries.
  • PBDP: dual projection to build pseudo-stereo pairs/masks from monocular videos.
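The two ingredients named for GAPW, backward warping and a gradient-based occlusion mask, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: disparity is assumed to be horizontal only and defined on the target-view grid, the sign convention (`x_source = x_target + disparity`) is arbitrary, and the mask simply thresholds the horizontal disparity gradient with a hypothetical `threshold`.

```python
import numpy as np


def backward_warp(src: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Sample `src` at x + disparity for every target pixel (linear interpolation).

    src: (H, W) or (H, W, C) image; disparity: (H, W) in pixels.
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] + disparity
    xs = np.clip(xs, 0, w - 1)          # clamp samples to the image border
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    left, right = src[rows, x0], src[rows, x1]
    if src.ndim == 3:
        frac = frac[..., None]          # broadcast over channels
    return (1.0 - frac) * left + frac * right


def gradient_occlusion_mask(disparity: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Mark pixels where disparity jumps sharply along x (assumed occlusion cue)."""
    grad = np.abs(np.diff(disparity, axis=1, prepend=disparity[:, :1]))
    return grad > threshold


src = np.tile(np.arange(8.0), (2, 1))
disp = np.zeros((2, 8))
disp[:, 4:] = 3.0                       # a depth edge at column 4
warped = backward_warp(src, disp)
mask = gradient_occlusion_mask(disp)
```

Because every target pixel reads exactly one source location, backward warping produces no holes; the region that needs inpainting has to be identified separately, which is the role the mask plays in this sketch.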

Comparison Data

Data Construction Method                PSNR ↑   SSIM ↑   LPIPS ↓
Random Mask                             26.64    0.906    0.092
TrajectoryCrafter (Forward Splatting)   31.14    0.923    0.047
Ours (PBDP)                             32.48    0.933    0.049
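For readers unfamiliar with the first metric in the table, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below computes it for a single image pair; the page does not state the evaluation protocol, so the value range (`max_value=1.0`) and any averaging over frames are assumptions.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")             # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)         # uniform error of 0.1 -> MSE = 0.01
print(f"{psnr(reference, estimate):.2f} dB")
```

Higher PSNR and SSIM indicate closer agreement with the reference, while lower LPIPS indicates smaller perceptual distance, which is what the arrows in the table header denote.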

Comparison of Stereo Inpainting

BibTeX

Cite:

@inproceedings{Huang2026DreamStereo,
  title     = {DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos},
  author    = {Huang, Yuan and Zhao, Sijie and Cheng, Jing and Xu, Hao and Jiao, Shaohui},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
  note      = {arXiv:2604.12270}
}