Computer science PhD graduate, 2014
Advised by Sebastian Thrun
I'm interested in robotics and machine learning. In particular, I'm focused on pushing the boundaries of machine perception using depth sensors and big datasets.
A more generic segmentation and tracking algorithm
The tracking-based semi-supervised learning method shown below requires the ability to segment and track objects before a label is attached to them. This is fairly easy to do in autonomous driving environments but quite difficult in more general scenes. Here, the algorithm is initialized with a segmentation on the first frame, which it then propagates through the rest of the video. The segmented object is shown in the pointcloud (right) in bold and color; everything else is small and gray.
See “Learning to segment and track in RGBD” at my papers page for more details, or skip right to the pdf.
If you're having trouble seeing what's going on in this video, click on the YouTube icon and view it in fullscreen there.
Tracking-based semi-supervised learning
By exploiting tracking information in a semi-supervised learning framework, a classifier trained with three hand-labeled training tracks can produce accuracy comparable to the fully-supervised equivalent (i.e. trained with many thousands of training tracks). This video shows laser track classifications projected into video for easy visualization.
Gray outlines show objects that were tracked in the laser and classified as neither person, bike, nor car.
I am highly excited about this result. Despite the ease with which humans can recognize objects, training a computer to do the same is extremely difficult, and when it is possible, it nearly always requires very large training sets. This requirement makes it impractical for regular people to, for example, train their smarthome to recognize their dog or train their automated farming equipment to recognize a particular type of weed in their fields. Using the techniques of this work, we can lift this requirement.
This is not at all to say the problem is entirely solved. There are many apparent failures in this video, but they are almost entirely due to segmentation and tracking failures. The missing piece now is robust segmentation and tracking.