Partial-View Object View Synthesis via Filtered Inversion

Stanford University, NVIDIA Research, Georgia Tech
Workshop XRNeRF, CVPR 2023
3DV 2024 (Spotlight)

Input Views (Top) and Reconstructions (Bottom)

Abstract

We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or a few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, covering cases where the object is not entirely in view, is partially occluded, or is only observed from similar views.

To achieve this, FINV learns shape priors by training a 3D generative model. At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds. FINV maintains this set of latent codes, filtering and resampling them after each new observation, akin to particle filtering. The generator is then fine-tuned for each latent code on the available views to adapt to novel objects.

We show that FINV successfully synthesizes novel views of real-world objects (e.g., chairs, tables, and cars), even if the generative prior is trained only on synthetic objects. The ability to address the sim-to-real problem allows FINV to be used for object categories without real-world datasets. FINV achieves state-of-the-art performance on multiple real-world datasets, recovers object shape and texture from partial and sparse views, is robust to occlusion, and is able to incrementally improve its representation with more observations.

Method

Phase I: given one or more observed views, our method first samples a set of latent codes and then optimizes them so that the generated 3D model semantically matches the observed views. Phase II: after the latent optimization, we freeze the latent codes and fine-tune the generator on the observations. In each phase, the module highlighted in blue is frozen while the one in yellow is trained.
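The two phases can be illustrated with a deliberately tiny sketch. It is not the FINV implementation: the "generator" here is a hypothetical linear map standing in for the generate-then-render pipeline, the loss is a plain squared error, and all names (`render`, `optimize_latent`, `finetune_generator`) are made up for illustration. The observation is chosen so that the frozen prior cannot reproduce it exactly, loosely mimicking a novel object outside the training distribution.

```python
def render(weights, latent):
    """Toy 'generator': a linear map standing in for generate-then-render."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]


def loss(weights, latent, target):
    """Squared error between the rendering and the observation."""
    return sum((p - t) ** 2 for p, t in zip(render(weights, latent), target))


def optimize_latent(weights, latent, target, steps=200, lr=0.05):
    """Phase I (toy): generator frozen, latent code updated by gradient descent."""
    latent = list(latent)
    for _ in range(steps):
        residual = [p - t for p, t in zip(render(weights, latent), target)]
        # Gradient of ||W z - y||^2 with respect to z is 2 W^T (W z - y).
        grad = [
            2 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(latent))
        ]
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


def finetune_generator(weights, latent, target, steps=200, lr=0.05):
    """Phase II (toy): latent frozen, generator weights updated."""
    weights = [list(row) for row in weights]  # copy so the prior stays intact
    for _ in range(steps):
        residual = [p - t for p, t in zip(render(weights, latent), target)]
        # Gradient of ||W z - y||^2 with respect to W[i][j] is 2 (W z - y)_i z_j.
        for i, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= lr * 2 * residual[i] * latent[j]
    return weights


# Hypothetical prior and observation; the third output of the prior is always
# the sum of the first two, which the observation violates (1 + 2 != 4).
prior_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
observed = [1.0, 2.0, 4.0]

z_star = optimize_latent(prior_weights, [0.0, 0.0], observed)
loss_phase1 = loss(prior_weights, z_star, observed)

tuned_weights = finetune_generator(prior_weights, z_star, observed)
loss_phase2 = loss(tuned_weights, z_star, observed)
```

In this toy setting, Phase I alone leaves a residual error because the frozen prior cannot express the observation, and Phase II removes it by adapting the generator around the fixed latent code.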


Filtering Inversion

The method first samples multiple latent codes (shown by the unfilled icons). An inversion update (Eq. 1 in the paper) then refines the sampled latent codes, as shown by the yellow icons. Finally, we render each latent code and compare the results (Eq. 2 in the paper) to decide which codes are resampled and which are updated further, as shown by the blue icons.
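The sample, refine, compare, and resample loop described above can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce Eq. 1 or Eq. 2 from the paper: the latent code is a 2D vector, `render` is a hypothetical linear projection onto a view direction, the inversion update is a plain gradient step, and the comparison is a mean squared error. The keep-half-and-perturb resampling rule is likewise an assumption chosen for brevity.

```python
import random


def render(latent, view):
    """Toy renderer: one 'pixel' obtained by projecting the latent onto a view."""
    return latent[0] * view[0] + latent[1] * view[1]


def score(latent, observations):
    """Stand-in comparison: mean squared error over all views seen so far."""
    return sum((render(latent, v) - y) ** 2 for v, y in observations) / len(observations)


def inversion_step(latent, observations, lr=0.1):
    """Stand-in inversion update: one gradient step on the score."""
    grad = [0.0, 0.0]
    for view, value in observations:
        residual = render(latent, view) - value
        grad[0] += 2 * residual * view[0] / len(observations)
        grad[1] += 2 * residual * view[1] / len(observations)
    return [latent[0] - lr * grad[0], latent[1] - lr * grad[1]]


def filter_and_resample(latents, observations, rng, keep_fraction=0.5, noise=0.1):
    """Keep the best-scoring codes and replace the rest with perturbed copies."""
    ranked = sorted(latents, key=lambda z: score(z, observations))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    survivors = ranked[:n_keep]
    resampled = []
    for _ in range(len(latents) - n_keep):
        parent = rng.choice(survivors)
        resampled.append([c + rng.gauss(0.0, noise) for c in parent])
    return survivors + resampled


def filtering_inversion(observation_stream, num_latents=8, steps=50, seed=0):
    """Maintain a set of latent codes while observations arrive one at a time."""
    rng = random.Random(seed)
    latents = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(num_latents)]
    seen = []
    for observation in observation_stream:
        seen.append(observation)
        for _ in range(steps):
            latents = [inversion_step(z, seen) for z in latents]
        latents = filter_and_resample(latents, seen, rng)
    best = min(latents, key=lambda z: score(z, seen))
    return best, latents


# Hypothetical ground-truth latent and three views of it.
true_latent = [1.5, -0.5]
views = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
observations = [(v, render(true_latent, v)) for v in views]

best, latents = filtering_inversion(observations)
```

With only the first view, every code whose first component matches the observation scores equally well, so a single view leaves the latent ambiguous. Keeping several hypotheses and filtering them as further views arrive is what resolves that ambiguity in this toy, which is the intuition behind the particle-filter analogy.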

BibTeX

@article{sun2023partial,
  title={Partial-View Object View Synthesis via Filtered Inversion},
  author={Sun, Fan-Yun and Tremblay, Jonathan and Blukis, Valts and Lin, Kevin and Xu, Danfei and Ivanovic, Boris and Karkus, Peter and Birchfield, Stan and Fox, Dieter and Zhang, Ruohan and others},
  journal={arXiv preprint arXiv:2304.00673},
  year={2023}
}