Haotian Zhang¹    Long Mai²    Ning Xu²    Zhaowen Wang²    John Collomosse²,³    Hailin Jin²

¹ Stanford University
² Adobe Research
³ University of Surrey

Abstract

We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent "Deep Image Prior" (DIP), which exploits convolutional network architectures to enforce plausible texture in static images. In extending DIP to video we make two important contributions. First, we show that coherent video inpainting is possible without a priori training. We take a generative approach to inpainting based on internal (within-video) learning, without reliance upon an external corpus of visual data to train a one-size-fits-all model for the large space of general videos. Second, we show that such a framework can jointly generate both appearance and flow, whilst exploiting these complementary modalities to ensure mutual consistency. We show that leveraging appearance statistics specific to each video achieves visually plausible results whilst handling the challenging problem of long-term consistency.

Materials

Paper
Code

Method

In this work, we approach video inpainting with an internal learning formulation. The general idea is to use the input video itself as the training data to learn a generative neural network \(G_{\theta}\) that generates each target frame \(I^*_i\) from a corresponding noise map \(N_i\). Each noise map \(N_i\) has one channel and the same spatial size as the input frame; the noise maps are sampled independently for each frame and kept fixed during training. The network \(G_{\theta}\) is trained to predict both frames \(\hat{I}_i\) and optical flow maps \(\hat{F}_{i,i\pm t}\). The model is trained entirely on the input video (with holes) without any external data, optimizing a combination of the image generation loss \(L_r\), the perceptual loss \(L_p\), the flow generation loss \(L_f\), and the consistency loss \(L_c\).
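To make this concrete, below is a minimal PyTorch sketch of the internal-learning loop described above: a small network maps a fixed per-frame noise map to a frame and a flow field, supervised only on the input video's known (non-hole) pixels. The toy architecture, the warp helper, the loss weights, and the dummy data are illustrative assumptions rather than the authors' implementation; the perceptual loss \(L_p\) is omitted for brevity (see the released code for the actual model).

import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(img, flow):
    # Backward-warp img (B,C,H,W) with a flow field (B,2,H,W) in pixel units.
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    base = torch.stack((xs, ys)).to(img.device)      # (2,H,W), x first
    coords = base.unsqueeze(0) + flow                # absolute sample coordinates
    gx = 2 * coords[:, 0] / (W - 1) - 1              # normalize to [-1,1]
    gy = 2 * coords[:, 1] / (H - 1) - 1
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

class Generator(nn.Module):
    # Toy stand-in for G_theta: noise map -> (frame, flow towards the next frame).
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 5, 3, padding=1))          # 3 image + 2 flow channels
    def forward(self, n):
        out = self.body(n)
        return torch.sigmoid(out[:, :3]), out[:, 3:]

# Toy "input video": T frames with holes given by masks (1 = known pixel),
# plus a reference flow computed on the input video (random here for brevity).
# The noise maps are sampled once per frame and never resampled.
T, H, W = 8, 64, 64
frames = torch.rand(T, 3, H, W)
masks = (torch.rand(T, 1, H, W) > 0.2).float()
ref_flow = torch.randn(T - 1, 2, H, W)
noise = torch.randn(T, 1, H, W)

G = Generator()
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for step in range(200):
    i = torch.randint(0, T - 1, (1,)).item()
    img_i, flow_i = G(noise[i:i + 1])                # generate frame i and flow i->i+1
    img_j, _ = G(noise[i + 1:i + 2])                 # generate frame i+1
    L_r = (masks[i:i + 1] * (img_i - frames[i:i + 1]).abs()).mean()     # image loss
    L_f = (masks[i:i + 1] * (flow_i - ref_flow[i:i + 1]).abs()).mean()  # flow loss
    L_c = (img_i - warp(img_j, flow_i)).abs().mean()                    # consistency
    loss = L_r + 0.1 * L_f + 0.1 * L_c               # illustrative weights, no L_p
    opt.zero_grad(); loss.backward(); opt.step()

Note that this sketch simplifies the losses: in the paper, the flow loss is masked by the hole regions of both frames a flow connects, and the consistency loss is restricted to non-occluded pixels.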

Results

Supplemental Video for ICCV 2019

Oral Video for the ECCV 2020 Workshop on Deep Internal Learning

Citation

@inproceedings{zhang2019internal,
  title={An Internal Learning Approach to Video Inpainting},
  author={Zhang, Haotian and Mai, Long and Xu, Ning and Wang, Zhaowen and Collomosse, John and Jin, Hailin},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={2720--2729},
  year={2019}
}