The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning based Real-world Indoor Robot Visual Navigation

Kaichun Mo1,*     Haoxiang Li2     Zhe Lin2     Joon-Young Lee2    
1 Stanford University     2 Adobe Research

*This work was done while Kaichun Mo was a research intern at Adobe Research.

[arXiv version] [Code and Data (GitHub)] [BibTex]

Figure 1. The AdobeIndoorNav Dataset and other 3D scene datasets. Our dataset supports research on robot visual navigation in real-world scenes. It provides visual inputs for a given robot position: (a) the original 3D point cloud reconstruction; (b) the densely sampled locations shown on the 2D scene map; (c) four example RGB images captured by the robot camera, with their corresponding locations and poses. Sample views from 3D synthetic and real-world reconstructed scene datasets: (d) observation images from two synthetic datasets, SceneNet RGB-D and AI2-THOR; (e) rendered images from two real-world scene datasets, Stanford 2D-3D-S and ScanNet.


Deep reinforcement learning (DRL) has shown promise for learning model-free policies for robot visual navigation. However, these data-hungry algorithms require a large number of navigation trajectories for training. Existing datasets for training such navigation algorithms consist of either synthetic 3D scenes or 3D reconstructions of real scenes: synthetic data suffers from a domain gap to real-world scenes, while visual inputs rendered from 3D reconstructions contain undesired holes and artifacts. In this paper, we present a new dataset collected in the real world to facilitate research on DRL-based visual navigation. Our dataset includes 3D reconstructions of real-world scenes as well as real 2D images densely captured in those scenes. It provides the robot with high-quality visual inputs of real-world scene complexity at dense grid locations. We further study and benchmark a recent DRL-based navigation algorithm, and present our attempts at and thoughts on improving its generalization to unseen test targets in the scenes.
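Because images are pre-captured at dense grid locations, navigation over the dataset can be framed as a discrete MDP: each state is a grid cell plus a heading, and the observation is the real photo taken there. The sketch below illustrates this framing; the class name, action set, and reward values are our own illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

class GridNavEnv:
    """Illustrative sketch (not the paper's API): discrete navigation
    over a dataset of images pre-captured at grid locations.

    Each state is (cell, heading); the observation is the stored RGB
    image the robot captured at that cell facing that heading.
    """
    HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W

    def __init__(self, images, free_cells, target):
        self.images = images          # dict: (cell, heading) -> RGB array
        self.free = set(free_cells)   # navigable grid cells
        self.target = target          # goal cell
        self.state = None

    def reset(self, start_cell, heading=0):
        self.state = (start_cell, heading)
        return self.images[self.state]

    def step(self, action):
        (x, y), h = self.state
        if action == "turn_left":
            h = (h - 1) % 4
        elif action == "turn_right":
            h = (h + 1) % 4
        else:  # "forward": advance one cell if the next cell is free
            dx, dy = self.HEADINGS[h]
            if (x + dx, y + dy) in self.free:
                x, y = x + dx, y + dy
        self.state = ((x, y), h)
        done = (x, y) == self.target
        reward = 1.0 if done else -0.01  # small step penalty (assumed)
        return self.images[self.state], reward, done
```

Since every observation is a real photograph rather than a rendering, a policy trained this way sees real-world appearance without the holes and artifacts of reconstructed-mesh renderings.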

Data Collection Pipeline

Figure 2. The Pipeline to Collect the AdobeIndoorNav Dataset and the Robot Setup. (a) A 3D reconstruction of the scene is obtained with the Tango device; (b) a 2D obstacle map, indicating the area where the robot can navigate, is generated from the 3D point cloud; (c) a 2D laser-scan map, used later for robot localization, is also generated from the 3D point cloud; (d) densely sampled grid locations are generated on the 2D obstacle map; (e) the robot runs in the real scene and captures RGB-D and panoramic images at all grid locations; (f) our TurtleBot is equipped with an RGB-D camera, a 360° panoramic camera, and a set of laser scanners.
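Steps (b) and (d) of the pipeline amount to projecting the point cloud onto a 2D occupancy grid and then sampling locations on the free cells. A minimal sketch of this idea follows; the function names, cell size, height band, and sampling stride are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def obstacle_map_from_point_cloud(points, cell_size=0.05,
                                  z_min=0.1, z_max=1.2):
    """Project a 3D point cloud onto a 2D obstacle grid.

    points: (N, 3) array of x, y, z coordinates in meters.
    Points whose height lies in [z_min, z_max] (roughly the robot's
    body height; values assumed for illustration) count as obstacles.
    """
    mask = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    xy = points[mask, :2]

    origin = xy.min(axis=0)                      # grid frame origin
    idx = np.floor((xy - origin) / cell_size).astype(int)
    shape = idx.max(axis=0) + 1

    grid = np.zeros(shape, dtype=bool)
    grid[idx[:, 0], idx[:, 1]] = True            # True = obstacle
    return grid, origin

def sample_grid_locations(grid, origin, cell_size=0.05, spacing=0.5):
    """Densely sample navigable locations on free cells at a fixed
    spacing (stride chosen for illustration)."""
    step = max(1, int(round(spacing / cell_size)))
    locations = []
    for i in range(0, grid.shape[0], step):
        for j in range(0, grid.shape[1], step):
            if not grid[i, j]:                   # free cell
                locations.append(origin + cell_size * np.array([i, j]))
    return np.array(locations)
```

In the actual pipeline the resulting locations are then visited by the robot, which localizes against the laser-scan map and captures RGB-D and panoramic images at each one.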