Seeing a Rose in Five Thousand Ways
arXiv preprint arXiv:2212.04965

Yunzhi Zhang, Stanford University
Shangzhe Wu, University of Oxford
Noah Snavely, Cornell Tech, Cornell University
Jiajun Wu, Stanford University

People where you live ... grow five thousand roses in one garden ... And yet what they’re looking for could be found in a single rose.

--- "The Little Prince" by Antoine de Saint-Exupéry



Abstract

What is a rose, visually? A rose comprises its intrinsics: the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we can render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type, all of which share the same intrinsics but appear different due to a combination of variation within those intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (the distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.

Learning from Internet Images

After training on a single Internet image containing a group of similar objects, our method can robustly recover the geometry and texture of the object class, and capture the variation among the observed instances. Note that our method has no access to camera intrinsic or extrinsic parameters, object poses, or illumination conditions.
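The appearance/shading pairs shown in the gallery below follow the standard intrinsic-image factorization, in which the observed appearance is the per-pixel albedo modulated by shading. As a rough illustration only — not the paper's actual learned model, and with a simple Lambertian shading assumption stated here as a placeholder — this is how the composition can be sketched in NumPy:

```python
import numpy as np

def lambertian_shading(normals, light_dir, ambient=0.1):
    """Per-pixel diffuse shading from surface normals and a light direction.

    This is a hypothetical, simplified shading model for illustration;
    the paper's renderer learns geometry and reflectance from data.
    """
    light_dir = light_dir / np.linalg.norm(light_dir)
    # Dot product of each pixel's normal with the light direction, clamped at 0.
    diffuse = np.clip(normals @ light_dir, 0.0, None)  # shape (H, W)
    return ambient + (1.0 - ambient) * diffuse

def compose_appearance(albedo, normals, light_dir):
    """Appearance = albedo * shading (shading broadcast over the RGB channels)."""
    shading = lambertian_shading(normals, light_dir)
    return albedo * shading[..., None]

# Toy example: a flat 4x4 patch facing +z, lit head-on from above.
normals = np.zeros((4, 4, 3))
normals[..., 2] = 1.0
albedo = np.full((4, 4, 3), 0.8)  # uniform albedo
img = compose_appearance(albedo, normals, np.array([0.0, 0.0, 1.0]))
```

Varying the light direction changes only the shading image while the albedo stays fixed, which is what the "shading" columns in the gallery visualize as lighting is swept.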


The first column below shows training inputs. The second and third columns show rendered results and corresponding geometry.
Try it yourself: Move the slider to change viewpoints, lighting, and identities.

[Results gallery: for each of the eleven input images, the viewer shows the input photograph alongside rendered appearance and shading while sweeping the viewpoint, the lighting direction, and the latent identity code.]

BibTeX
@article{zhang2022seeing,
  title     = {Seeing a Rose in Five Thousand Ways},
  author    = {Yunzhi Zhang and Shangzhe Wu and Noah Snavely and Jiajun Wu},
  journal   = {arXiv preprint arXiv:2212.04965},
  year      = {2022}
}
Acknowledgement

We would like to thank Angjoo Kanazawa and Josh Tenenbaum for detailed comments and feedback, Ruocheng Wang and Kai Zhang for insightful discussions, and Yiming Dou and Koven Yu for their advice on data collection. This work is supported in part by the Stanford Institute for Human-Centered AI (HAI), NSF CCRI #2120095, NSF RI #2211258, ONR MURI N00014-22-1-2740, Amazon, Bosch, Ford, Google, and Samsung.
The website design is adapted from SunStage.