Their work centered on developing a platform with three cameras, each focused at a different distance: 0.4 m, 1 m, and 2 m. The images were then fed into an on-board computer. As you can see in the examples below, the part of each image that is in focus (and thus at roughly that camera's focus distance) is the sharpest part.

To define "sharp" for a computer, each pixel is given a value based on the intensity differences between it and its neighboring pixels. From these values, an 8 x 5 depth map is generated, with each pixel of the map set to 0, 1, or 2 according to which of the cameras was in focus there, which also indicates the approximate distance to the nearest object at that point.
Using only that information, the robot was able to maneuver safely both indoors and outdoors, avoiding many obstacles, including humans actively trying to confuse it.
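
The depth-map step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual implementation: the text does not say which sharpness measure was used, so the sum of absolute differences to the right and lower neighbors stands in for it, and the 8 x 5 map is assumed to be 8 cells wide by 5 cells tall. The function names (`sharpness`, `cell_sharpness`, `depth_map`) are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

# From the text: one focus distance per camera, indexed by label 0, 1, 2.
FOCUS_DISTANCES_M = [0.4, 1.0, 2.0]


def sharpness(img: Image, r: int, c: int) -> float:
    """Assumed focus measure: absolute intensity differences
    to the right and lower neighbors of pixel (r, c)."""
    h, w = len(img), len(img[0])
    total = 0.0
    if c + 1 < w:
        total += abs(img[r][c] - img[r][c + 1])
    if r + 1 < h:
        total += abs(img[r][c] - img[r + 1][c])
    return total


def cell_sharpness(img: Image, rows: int, cols: int) -> List[List[float]]:
    """Accumulate per-pixel sharpness into a rows x cols grid of cells."""
    h, w = len(img), len(img[0])
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(h):
        for c in range(w):
            grid[r * rows // h][c * cols // w] += sharpness(img, r, c)
    return grid


def depth_map(images: List[Image], rows: int = 5, cols: int = 8) -> List[List[int]]:
    """For each cell, return the index of the camera whose image is sharpest.
    Ties go to the lowest index (the nearest focus distance)."""
    grids = [cell_sharpness(img, rows, cols) for img in images]
    return [
        [max(range(len(grids)), key=lambda k: grids[k][r][c]) for c in range(cols)]
        for r in range(rows)
    ]
```

Given three same-sized grayscale images ordered by focus distance, `depth_map(images)` returns a 5-row, 8-column grid of labels, and `FOCUS_DISTANCES_M[label]` converts a label back to an approximate distance in meters. Breaking ties toward the nearest camera is a conservative choice made for this sketch, since it treats ambiguous regions as close obstacles.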