Human-designed visual manuals are crucial components in shape assembly activities. They provide step-by-step guidance on how we should move and connect different parts in a convenient and physically realizable way. While there has been an ongoing effort in building agents that perform assembly tasks, the information in human-designed manuals has been largely overlooked. We identify that this is due to 1) a lack of realistic 3D assembly objects that have paired manuals and 2) the difficulty of extracting structured information from purely image-based manuals. Motivated by this observation, we present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals. We provide fine-grained annotations on the IKEA objects and assembly manuals, including decomposed assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals. We illustrate the broad applicability of our dataset on four tasks related to shape assembly: assembly plan generation, part segmentation, pose estimation, and 3D part assembly.
We present IKEA-Manual, a dataset for step-by-step understanding of shape assembly from 3D models and human-designed visual manuals. (a) IKEA-Manual contains 102 3D IKEA objects paired with human-designed visual manuals, where each object is decomposed into primitive assembly parts that match the manual, shown in different colors. (b) The original IKEA manuals provide step-by-step guidance on the assembly process by showing images of how parts are connected. (c) We extract a high-level, tree-structured assembly plan from the visual manual, specifying how parts are connected during the assembly process. (d) For each step, we provide dense visual annotations such as 2D part segmentation and 2D-3D correspondence between 2D manual images and 3D parts.
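To make the tree-structured assembly plan concrete, the following is a minimal sketch of how such a plan could be represented and linearized into an assembly order. All class, field, and step names here are illustrative assumptions, not the dataset's actual annotation schema: leaves stand for primitive parts, and internal nodes stand for steps that join their children's subassemblies.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical representation of a tree-structured assembly plan.
# Leaves are primitive parts; internal nodes are assembly steps
# joining their children. Names are illustrative only.
@dataclass
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)

    def is_part(self) -> bool:
        # A node with no children is a primitive part, not a step.
        return not self.children

def assembly_order(node: PlanNode) -> List[str]:
    """Post-order traversal: every subassembly is completed
    before the step that consumes it."""
    order = []
    for child in node.children:
        order.extend(assembly_order(child))
    if not node.is_part():
        order.append(node.name)
    return order

# Example: build a seat frame from two legs and a seat,
# then attach the backrest in the final step.
plan = PlanNode("attach_backrest", [
    PlanNode("build_seat_frame", [
        PlanNode("leg_left"), PlanNode("leg_right"), PlanNode("seat"),
    ]),
    PlanNode("backrest"),
])
print(assembly_order(plan))  # ['build_seat_frame', 'attach_backrest']
```

A post-order traversal of the plan tree yields a valid step sequence, which is one way such annotations can support tasks like assembly plan generation.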