Anticipating which activities a human will do next (and how) can enable an assistive robot to plan ahead for reactive responses in human environments. We propose a graphical model that captures the rich context of activities and object affordances, and obtains a distribution over a large space of future human activities. Tested on robots performing reactive tasks based on these anticipations.
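The sketch below illustrates the anticipation idea at toy scale; it is not the paper's actual model. A distribution over the next sub-activity is obtained by combining a transition prior with an object-affordance observation via Bayes' rule. All activity names, tables, and probabilities here are illustrative.

    import numpy as np

    SUB_ACTIVITIES = ["reach", "move", "pour", "drink", "place"]

    # Hypothetical transition prior P(next | current); each row sums to 1.
    TRANSITIONS = np.array([
        [0.05, 0.60, 0.15, 0.05, 0.15],  # after "reach"
        [0.05, 0.10, 0.35, 0.20, 0.30],  # after "move"
        [0.05, 0.15, 0.05, 0.60, 0.15],  # after "pour"
        [0.10, 0.30, 0.05, 0.10, 0.45],  # after "drink"
        [0.60, 0.20, 0.05, 0.05, 0.10],  # after "place"
    ])

    # Hypothetical likelihood P(observed affordance | next activity), e.g. the
    # object currently in hand looks "pourable".
    AFFORDANCE_LIKELIHOOD = {"pourable": np.array([0.1, 0.2, 0.9, 0.6, 0.2])}

    def anticipate(current, affordance):
        """Posterior over the next sub-activity given context (Bayes' rule)."""
        prior = TRANSITIONS[SUB_ACTIVITIES.index(current)]
        posterior = prior * AFFORDANCE_LIKELIHOOD[affordance]
        return dict(zip(SUB_ACTIVITIES, posterior / posterior.sum()))

    print(anticipate("move", "pourable"))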
To reason about human environments, it is critical that we reason through the humans who use them. By observing online 3D data, such as Google 3D Warehouse models or RGB-D scenes, robots learn how humans use objects and environments. Applied to robotic arrangement of objects and to 3D scene labeling.
Being able to grasp and pick up objects is critical for a robot to interact with human environments in useful ways. Although a robot should be able to reason about how to grasp any object, even one it has not seen before, it can be difficult to design good features that allow it to do so. In this work, we use a deep neural network to learn these features instead, both avoiding the need to hand-engineer them and improving the performance of our grasp detection system.
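As a rough illustration of learning grasp features rather than hand-engineering them, here is a minimal sketch; it is not the paper's actual architecture or training pipeline, and the input channels, layer sizes, and data are all illustrative. A small network maps a candidate grasp patch to a graspability score, with the features emerging in the hidden layer during training.

    import torch
    import torch.nn as nn

    class GraspScorer(nn.Module):
        """Scores a candidate grasp patch; the hidden layer learns the features."""
        def __init__(self, patch_dim=24 * 24 * 7):  # e.g. flattened RGB-D+normals patch
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(patch_dim, 200), nn.ReLU(),  # learned features
                nn.Linear(200, 1),                     # scalar graspability logit
            )

        def forward(self, patch):
            return self.net(patch.flatten(1))

    model = GraspScorer()
    patches = torch.randn(32, 24 * 24 * 7)           # batch of candidate rectangles
    labels = torch.randint(0, 2, (32, 1)).float()    # 1 = graspable, 0 = not
    loss = nn.BCEWithLogitsLoss()(model(patches), labels)
    loss.backward()                                  # features emerge from training
    print(float(loss))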
Learning algorithms to predict robotic placements, even for objects of types never seen before by the robot. Applied to tasks such as arranging a cluttered room, loading items onto a dish-rack, putting items in a fridge, etc.
Being able to detect human activities is important for making personal assistant robots useful in performing assistive tasks. Our CAD dataset comprises twelve different activities (composed of several sub-activities) performed by four people in different environments, such as a kitchen, a living room, and an office. Tested on robots reactively responding to the detected activities. (Code + CAD dataset available).
Learning algorithms to understand the 3D structure of scenes.
Learning algorithms to predict depth and infer 3-d models, given just a single still image. Applications include creating immersive 3-d experiences from users' photos, improving the performance of stereovision, creating large-scale models from a few images, robot navigation, etc. Tens of thousands of users have converted their single photographs into 3D models.
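A minimal sketch of the supervised core of the idea, not Make3D itself: regress (log-)depth from per-patch image features. The feature dimensionality is illustrative, and random numbers stand in for real filter responses and ground-truth depths.

    import numpy as np

    rng = np.random.default_rng(0)
    n_patches, n_features = 5000, 34                 # illustrative sizes
    X = rng.normal(size=(n_patches, n_features))     # stand-in patch features
    w_true = rng.normal(size=n_features)
    log_depth = X @ w_true + 0.1 * rng.normal(size=n_patches)  # supervised targets

    # Ridge regression: w = (X^T X + lam*I)^{-1} X^T y
    lam = 1.0
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ log_depth)

    new_patch = rng.normal(size=n_features)          # features of an unseen patch
    print("predicted depth (m):", np.exp(new_patch @ w))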
Learning algorithms to predict robotic grasps, even for objects of types never seen before by the robot. Applied to tasks such as unloading items from a dishwasher, clearing up a cluttered table, opening new doors, etc.
Holistic scene understanding requires solving several tasks simultaneously, including object detection, scene categorization, labeling of meaningful regions, and 3-d reconstruction. We develop a learning method that couples these individual sub-tasks for improving performance in each of them.
Use monocular depth perception and reinforcement learning techniques to drive a small RC car at high speeds in unstructured environments. Also fly indoor helicopters/quadrotors autonomously using a single onboard camera.
For robots to be practically deployed in home and office environments, they should be able to manipulate their environment to gain access to new spaces. We present learning algorithms to do so, making our robot the first able to navigate anywhere in a new building by opening doors and operating elevators, even ones it has never seen before.
The ability to perform monaural (single-ear) localization is important to many animals; indeed, monaural cues are also the primary method by which humans decide if a sound comes from the front or back, as well as estimate its elevation. In this paper, we propose a machine learning approach to monaural localization, using only a single microphone and an "artificial pinna" (that distorts sound in a direction-dependent way).
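The learning idea can be sketched as follows; this is a stand-in, not the paper's system. An artificial pinna colors the spectrum differently per direction, and a classifier learns to invert that mapping from spectral features alone. The transfer functions, band counts, and nearest-neighbor learner below are all illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    directions = np.arange(0, 360, 30)              # candidate source directions (deg)
    n_bands = 32                                    # spectral feature bands

    # Hypothetical direction-dependent transfer functions (the pinna's coloring).
    pinna = {d: rng.uniform(0.2, 1.0, n_bands) for d in directions}

    def observe(direction, source_spectrum):
        """Spectrum at the single microphone: source shaped by the pinna."""
        return pinna[direction] * source_spectrum

    # Training set: near-white sources from known directions.
    X, y = [], []
    for d in directions:
        for _ in range(50):
            X.append(observe(d, rng.uniform(0.8, 1.2, n_bands)))
            y.append(d)
    X, y = np.array(X), np.array(y)

    def localize(spectrum):
        """Nearest neighbor in log-spectral space (a stand-in for the learner)."""
        dists = np.linalg.norm(np.log(X) - np.log(spectrum), axis=1)
        return y[np.argmin(dists)]

    test = observe(90, rng.uniform(0.8, 1.2, n_bands))
    print("estimated direction:", localize(test))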
We propose novel optical proximity sensors for improving grasping. These sensors, mounted on fingertips, allow pre-touch pose estimation, and therefore allow for online grasp adjustments to an initial grasp point without the need for premature object contact or regrasping strategies.
We developed algorithms to automatically modify videos by adding textures to them. Our algorithms perform robust tracking, occlusion inference, and color correction to make the texture look like part of the original scene.
Create 3-d models of large environments, given only a small number of (possibly) non-overlapping images. This technique integrates Structure from Motion (SFM) techniques with Make3D's single image depth perception algorithms.
Stereovision is fundamentally limited by the baseline distance between the two cameras: depth estimates tend to be inaccurate when the distances considered are large. We believe that monocular visual cues give largely orthogonal, and therefore complementary, information about depth. We propose a method that incorporates monocular cues with stereo (triangulation) cues to obtain significantly more accurate depth estimates than is possible with either alone.
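The fusion intuition can be sketched with simple inverse-variance weighting; the actual work uses a learned probabilistic model, and the noise constants below are illustrative. Stereo error grows roughly quadratically with depth, so the monocular cue takes over at range.

    import numpy as np

    def fuse(d_stereo, d_mono, baseline=0.2, focal=700.0, sigma_disp=0.5, sigma_mono=2.0):
        # Triangulation gives d = f*b/disparity, so a fixed disparity noise
        # sigma_disp yields depth noise sigma_d ~ (d**2 / (f*b)) * sigma_disp.
        var_stereo = (d_stereo**2 / (focal * baseline) * sigma_disp) ** 2
        var_mono = sigma_mono**2                  # roughly depth-independent
        w = var_mono / (var_mono + var_stereo)    # trust stereo when near
        return w * d_stereo + (1 - w) * d_mono

    print(fuse(d_stereo=2.0, d_mono=2.5))    # near range: stereo dominates
    print(fuse(d_stereo=40.0, d_mono=25.0))  # far range: monocular cue dominates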
This device uses accelerometers and gyroscopes to estimate its 3-d location and 3-d orientation. It can be used, for example, to navigate conveniently in a 3-d virtual world.
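A minimal sketch of the estimation core under strong simplifying assumptions (small angles, body frame aligned with the world frame, no drift correction; constants illustrative): integrate gyroscope rates into orientation, and gravity-compensated accelerometer readings twice into position.

    import numpy as np

    dt = 0.01                                   # 100 Hz sample rate (illustrative)
    theta = np.zeros(3)                         # orientation (small-angle integration)
    vel = np.zeros(3)                           # velocity estimate
    pos = np.zeros(3)                           # position estimate
    g = np.array([0.0, 0.0, -9.81])             # gravity in the world frame

    def step(gyro_rad_s, accel_m_s2):
        """One dead-reckoning update from raw gyro and accelerometer samples."""
        global theta, vel, pos
        theta = theta + gyro_rad_s * dt         # integrate angular rate to orientation
        a = accel_m_s2 + g                      # accelerometer reads specific force f = a - g,
                                                # so adding g back recovers acceleration
        vel = vel + a * dt                      # integrate acceleration to velocity
        pos = pos + vel * dt                    # integrate velocity to position

    for _ in range(100):                        # one second of simulated at-rest samples
        step(np.zeros(3), np.array([0.0, 0.0, 9.81]))
    print(theta, pos)                           # stays near zero; real IMUs drift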
Isomap (a non-linear dimensionality reduction technique) suffers from the problem of short-circuiting, which occurs when the neighborhood distance is larger than the distance between folds of the manifold. We proposed a new variant of the Isomap algorithm, based on the local linear properties of manifolds, to increase its robustness to short-circuiting.
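Here is a sketch of the Isomap pipeline with one illustrative guard in the spirit of the local-linearity idea; this stand-in criterion is not the paper's. A candidate neighbor that sits far off the neighborhood's local tangent plane likely jumps between folds, so that edge is pruned before computing geodesic distances.

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    def robust_isomap(X, k=8, n_components=2, residual_tol=0.5):
        """Isomap with an illustrative local-linearity guard on neighbor edges.
        Assumes the pruned neighborhood graph stays connected."""
        n = len(X)
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.zeros((n, n))                      # weighted neighborhood graph
        for i in range(n):
            nbrs = np.argsort(d2[i])[1:k + 1]     # k nearest neighbors (skip self)
            local = X[nbrs] - X[nbrs].mean(0)
            _, _, Vt = np.linalg.svd(local, full_matrices=False)
            plane = Vt[:2]                        # local tangent plane (top-2 PCs)
            resid = np.linalg.norm(local - local @ plane.T @ plane, axis=1)
            for j, r in zip(nbrs, resid):
                if r < residual_tol:              # keep only edges near the plane;
                    W[i, j] = W[j, i] = np.sqrt(d2[i, j])  # off-plane = short-circuit
        G = shortest_path(W, method="D", directed=False)   # geodesic distances
        H = np.eye(n) - 1.0 / n                   # classical MDS on geodesics
        B = -0.5 * H @ (G ** 2) @ H
        vals, vecs = np.linalg.eigh(B)
        top = np.argsort(vals)[::-1][:n_components]
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

    X = np.random.default_rng(4).normal(size=(60, 3))
    print(robust_isomap(X, k=10, residual_tol=10.0).shape)  # generous tol for demo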
The question of what data is available to learn from is at the heart of all learning algorithms: often, even an inferior learning algorithm will outperform a superior one if it is given more data to learn from. We proposed a novel and practical solution to the dataset-collection problem: we first use a green screen to rapidly collect data, then use a probabilistic model to synthesize a much larger training set. We used this data to build reliable classifiers for our robots.
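A minimal sketch of the synthesis step, assuming a simple chroma-key rule (the thresholds, jitter, and random data below are illustrative, not the paper's probabilistic model): segment the object off the green screen, then composite it onto many backgrounds to multiply the training set.

    import numpy as np

    def chroma_key_mask(img):
        """Foreground = pixels that are not dominantly green (uint8 HxWx3)."""
        r, g, b = (img[..., c].astype(int) for c in range(3))
        return ~((g > 100) & (g > r + 30) & (g > b + 30))

    def synthesize(obj_img, backgrounds):
        """Yield one composite per background, with a random placement jitter."""
        mask = chroma_key_mask(obj_img)
        h, w = obj_img.shape[:2]
        for bg in backgrounds:
            out = bg.copy()
            dy, dx = np.random.randint(0, 20, size=2)   # small random shift
            region = out[dy:dy + h, dx:dx + w]
            region[mask] = obj_img[mask]                # paste foreground pixels
            yield out

    rng = np.random.default_rng(2)
    obj = rng.integers(0, 255, (40, 40, 3), dtype=np.uint8)   # stand-in object shot
    bgs = [rng.integers(0, 255, (80, 80, 3), dtype=np.uint8) for _ in range(3)]
    dataset = list(synthesize(obj, bgs))
    print(len(dataset), dataset[0].shape)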
Infer facial expressions (e.g., smile, surprise, disgust) given an image of a face. The algorithm builds a sparse geometric model of the face and uses the parameters of that model as features in a learning algorithm. It is reasonably robust to partial occlusions. In a similar project, we used a web camera to track the hand and infer hand gestures for controlling a simple computer GUI; no other equipment, such as gloves, was needed.
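A minimal sketch of the geometric-feature idea, with synthetic landmarks and a nearest-centroid stand-in for the learner (not the paper's model): pairwise distances between facial landmarks, normalized for scale, serve as features for expression classification.

    import numpy as np

    def geometric_features(landmarks):
        """Pairwise distances between landmarks (e.g. mouth corners, brows, eyes),
        normalized by inter-ocular distance so the descriptor is scale-invariant."""
        iod = np.linalg.norm(landmarks[0] - landmarks[1])   # assume eyes are points 0, 1
        i, j = np.triu_indices(len(landmarks), k=1)
        return np.linalg.norm(landmarks[i] - landmarks[j], axis=1) / iod

    rng = np.random.default_rng(3)
    faces = rng.normal(size=(200, 8, 2))        # 8 synthetic landmarks per face
    X = np.array([geometric_features(f) for f in faces])
    y = rng.integers(0, 3, 200)                 # smile / surprise / disgust labels

    # Nearest-centroid classifier as a stand-in for the learning step.
    centroids = np.array([X[y == c].mean(0) for c in range(3)])
    pred = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    print("training accuracy:", (pred == y).mean())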
We described a simple, bioinspired approach for the conversion of an insulator, polystyrene, to a moderately conducting polymer by introducing adenine nucleobases.
We developed an electronic device that, when worn as a wrist-watch, protects the wearer from electric shocks. It continuously monitors skin potentials and wirelessly trips the power circuit to save the person's life.
Other projects: see the publications page for more.