CrowdEraser: Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos

Rice University

Our method produces a clean, human-free video from crowd-heavy egocentric walking tour videos.

Abstract

Egocentric walking tour videos are a rich source of visual data for modeling real-world environments, but their usefulness is limited by frequent human occlusions from crowds and eye-level viewpoints. We address this by developing a generative method to realistically remove humans and their shadows from such videos.

Key to our approach is the construction of a rich semi-synthetic dataset of paired video clips for training this generative model. We use this dataset to fine-tune Casper, a state-of-the-art video diffusion model for object and effects inpainting, and demonstrate that the resulting model substantially outperforms the original Casper, both qualitatively and quantitatively, at removing humans from walking tour clips with significant human presence and complex backgrounds.

Comparisons on Crowd Removal

Data Construction Pipeline

Data Construction Pipeline Diagram
Overview of our data construction pipeline. Both background and foreground clips are sourced from real “walking tour” videos. Foreground clips are selected to ensure an approximately uniform distribution across Crowd% levels. For each human instance, we generate a soft shadow with randomized strength and angle by applying an affine transform to the human mask (red dots indicate pivot points).
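The shadow-synthesis step can be sketched as follows: the human mask is sheared about a pivot at its lowest (foot) row, softened, and used to darken the frame. This is a minimal NumPy sketch under assumed parameter names (`angle_deg`, `strength`, `feather`); the paper's actual affine transform and blending may differ.

```python
import numpy as np

def synth_soft_shadow(image, mask, angle_deg=30.0, strength=0.4, feather=3):
    """Project a binary human mask into a soft shadow and darken the frame.

    image: float array (H, W, 3) in [0, 1]; mask: binary array (H, W).
    angle_deg / strength / feather are illustrative randomized parameters,
    not the paper's exact ones.
    """
    H, W = mask.shape
    ys, _ = np.nonzero(mask)
    if len(ys) == 0:
        return image  # no human instance in this frame
    pivot_y = ys.max()  # pivot at the lowest mask pixel (the feet)
    shear = np.tan(np.deg2rad(angle_deg))

    # Shear the mask row by row: rows farther above the pivot shift more.
    shadow = np.zeros((H, W), dtype=np.float32)
    for y in range(H):
        dx = int(round((pivot_y - y) * shear))
        dx = max(1 - W, min(W - 1, dx))  # clamp shift to the frame width
        src = mask[y].astype(np.float32)
        row = np.zeros(W, dtype=np.float32)
        if dx >= 0:
            row[dx:] = src[:W - dx]
        else:
            row[:W + dx] = src[-dx:]
        shadow[y] = row

    # Cheap feathering: a few passes of a 5-point box blur soften the edge.
    for _ in range(feather):
        shadow = (shadow
                  + np.roll(shadow, 1, 0) + np.roll(shadow, -1, 0)
                  + np.roll(shadow, 1, 1) + np.roll(shadow, -1, 1)) / 5.0

    # Darken the image under the (soft) shadow by the chosen strength.
    return image * (1.0 - strength * shadow[..., None])
```

In practice the angle and strength would be drawn at random per instance, and the shadow is composited before the foreground human is pasted onto the background clip.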

BibTeX

@inproceedings{TBU,
  author    = {TBU},
  title     = {Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos},
  booktitle = {CVPR},
  year      = {2026},
}