NEW [Oct 22, 2019] Errata: Page 11, Eq. (25): S denotes the set of testing shapes, not the training shapes.
[July 27, 2019] Paper accepted. Code and data are released.
The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations which add, remove, or modify the shape constituents and compositional structure. Such object structure can typically be organized into a hierarchy of constituent object parts and relationships, represented as a hierarchy of n-ary graphs. We introduce StructureNet, a hierarchical graph network which (i) can directly encode shapes represented as such n-ary graphs; (ii) can be robustly trained on large and complex shape families; and (iii) can be used to generate a great diversity of realistic structured shape geometries. Technically, we accomplish this by drawing inspiration from recent advances in graph neural networks to propose an order-invariant encoding of n-ary graphs, considering jointly both part geometry and inter-part relations during network training. We extensively evaluate the quality of the learned latent spaces for various shape families and show significant advantages over baseline and competing methods. The learned latent spaces enable several structure-aware geometry processing applications, including shape generation and interpolation, shape editing, and shape structure discovery directly from un-annotated images, point clouds, or partial scans.
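The hierarchy of n-ary graphs mentioned above can be pictured with a minimal data structure. This is a hypothetical sketch (names like `PartNode` are illustrative, not the authors' code); the paper's actual representation stores richer per-part geometry and typed relationships such as adjacency and symmetry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PartNode:
    """One node in the part hierarchy (hypothetical sketch, not the authors' code)."""
    label: str                                 # part semantics, e.g. "chair_back"
    box: Tuple[float, ...] = ()                # part geometry, e.g. oriented-box parameters
    children: List["PartNode"] = field(default_factory=list)
    # n-ary graph over the children: (child_i, child_j, relation),
    # where relations include adjacency and symmetry
    edges: List[Tuple[int, int, str]] = field(default_factory=list)

# A toy chair: three parts at the top level, with the base refined into four legs.
chair = PartNode("chair", children=[
    PartNode("chair_back"),
    PartNode("chair_seat"),
    PartNode("chair_base", children=[PartNode("leg") for _ in range(4)]),
], edges=[(0, 1, "adjacent"), (1, 2, "adjacent")])

def num_parts(node: PartNode) -> int:
    """Count all nodes in the hierarchy."""
    return 1 + sum(num_parts(c) for c in node.children)
```

Note that each internal node carries its own small graph over its children, so continuous part deformations live in the node geometry while discrete structural changes add or remove nodes and edges.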
Figure 1. StructureNet is a hierarchical graph network that produces a unified latent space to encode structured models with both continuous geometric and discrete structural variations. In this example, we projected an un-annotated point cloud (left) and un-annotated image (right) into the learned latent space yielding semantically segmented point clouds structured as a hierarchy of graphs. The shape interpolation in the latent space also produces structured point clouds (top) including their corresponding graphs (bottom). Edges correspond to specific part relationships that are modeled by our approach. For simplicity, here we only show the graphs without the hierarchy. Note how the base of the chair morphs via functionally plausible intermediate configurations, or the chair back transitions from a plain back to a back with arm-rests.
Figure 2. Network Architecture. Our variational autoencoder consists of two encoders and two decoders that operate on our shape representation. The geometry encoder e_geo encodes the geometry of a part into a fixed-length feature vector f, illustrated with a gray circle. The graph encoder e_graph encodes the feature vectors of the parts in a graph, together with the relationships among the parts, into a feature vector of the same size using graph convolutions. The graph encoder is applied recursively to obtain a feature vector z that encodes the entire shape. The reverse process is performed by the graph and geometry decoders d_graph and d_geo to reconstruct the shape. The decoder also recovers the geometry of non-leaf nodes.
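The recursive encoding described in the caption can be sketched as follows. This is a toy illustration in which random linear maps stand in for the learned networks e_geo and e_graph; all names and sizes are assumptions, not the authors' implementation. It shows the key property of the order-invariant encoding: because messages are aggregated symmetrically and node features are sum-pooled, permuting the children of a node (with edges remapped accordingly) leaves the parent feature unchanged.

```python
import numpy as np

# Toy sketch of a recursive, order-invariant graph encoding (illustrative only).
rng = np.random.default_rng(0)
D = 8  # feature size (hypothetical)

W_geo = rng.standard_normal((4, D))       # e_geo: box parameters -> feature
W_msg = rng.standard_normal((2 * D, D))   # message along one edge
W_node = rng.standard_normal((2 * D, D))  # node update from (feature, messages)

def e_geo(box):
    """Encode a leaf part's geometry into a fixed-length feature."""
    return np.tanh(box @ W_geo)

def e_graph(feats, edges):
    """One round of message passing, then symmetric (sum) pooling."""
    msgs = np.zeros_like(feats)
    for i, j in edges:  # messages flow both ways along each relationship edge
        msgs[j] += np.tanh(np.concatenate([feats[i], feats[j]]) @ W_msg)
        msgs[i] += np.tanh(np.concatenate([feats[j], feats[i]]) @ W_msg)
    updated = np.tanh(np.concatenate([feats, msgs], axis=1) @ W_node)
    return updated.sum(axis=0)  # sum pooling makes the result order-invariant

def encode(node):
    """Recursively encode a hierarchy: leaves via e_geo, internal nodes via e_graph."""
    if not node["children"]:
        return e_geo(node["box"])
    child_feats = np.stack([encode(c) for c in node["children"]])
    return e_graph(child_feats, node["edges"])

leaf = lambda b: {"box": np.asarray(b, dtype=float), "children": [], "edges": []}
shape = {"box": None,
         "children": [leaf([1, 0, 0, 0]), leaf([0, 1, 0, 0]), leaf([0, 0, 1, 0])],
         "edges": [(0, 1), (1, 2)]}
z = encode(shape)  # fixed-length code for the whole shape
```

Reordering the children of `shape` while remapping its edge indices produces the same code `z`, which is what allows the encoder to handle n-ary graphs without imposing an artificial child order.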
Figure 3. Generated Shapes. We show shapes in all categories decoded from random latent vectors, including shapes with bounding box geometry, and shapes with point cloud geometry. Parts are colored according to semantics, see the Supplementary for the full semantic hierarchy for each category. Since we explicitly encode shape structure in our latent representation, the generated shapes have a large variety of different structures.
Figure 4. Part Interpolation. We interpolate either only the backrest (first row) or only the base (second row) between the chairs on the left and right sides. Intermediate shapes remain structurally plausible: they differ from the target part mainly in geometry, while the structure is faithfully interpolated. We observe that these interpolations are not necessarily symmetric: the base interpolations follow different paths to stay compatible with the different back styles.
Figure 5. Shape Abstraction. Images, synthetic point clouds, and real-world scans from ScanNet [Dai et al. 2017] are embedded into our learned latent space, allowing us to effectively recover a full shape description that matches the raw input.
Figure 6. Structure-aware Shape Editing. We show editing results on two shapes with box geometry (first four rows) and two shapes with point cloud geometry (bottom two rows). For the two shapes with box geometry, we perform five different edits each, one edit per column. The edited box is highlighted in yellow, and the result is shown below. We see that the other boxes in the shape are adjusted to maintain shape plausibility. For the two shapes with point cloud geometry, we show intermediate results for one edit each. From left to right, these are (a) the original point cloud; (b) the predicted box abstraction; (c) the induced segmentation; (d) edited boxes; and (e) the induced edit of the point cloud.
This project was supported by a Vannevar Bush Faculty Fellowship, NSF grant RI-1764078, NSF grant CCF-1514305, a Google Research award, an ERC Starting Grant (SmartGeometry StG-2013-335373), an ERC PoC Grant (SemanticCity), Google Faculty Awards, Google PhD Fellowships, a Royal Society Advanced Newton Fellowship, KAUST OSR award CRG2017-3426, and gifts from Adobe, Autodesk, and Qualcomm. We especially thank Kun Liu, Peilang Zhu, Yan Zhang, and Kai Xu for their help in preparing binary symmetry hierarchies for the GRASS baselines on PartNet. We also thank the anonymous reviewers for their fruitful suggestions.