Idea of Visual Perception

Human Perception

Computer vision is all about trying to make a computer take in an image and make sense of it. Just as you can look around at one thing or many things and make judgments about what you see, we'd like a computer to be able to do the same. This process of seeing may not seem like a difficult task at all. If you were asked to write a 25-page paper or watch a sunset, chances are you'd pick watching the sunset. Even so, seeing is not really as easy as we may think it is. It comes easily to us because it is an unconscious process, whereas writing, for example, takes conscious effort.

In viewing an image, the retina of the human eye performs the equivalent of close to ten billion calculations per second before sending that image to the optic nerve. It contains about one hundred million rods and cones, along with four other layers of neurons. In addition, an image is hardly simple itself: as M. Waldrop describes in his book Man-Made Minds, "depending on what part of [a picture] you're looking at, the actual intensity of light and color at your chosen point is a function of the color and texture of [that part], the orientation of the patch of surface, the color and intensity of the lighting, the direction of the lighting, the transparency of the intervening atmosphere, the position of shadows, ad infinitum" (Waldrop 89).

Moreover, the eye does not simply record an image passively, as a camera does -- it analyzes the information it receives as well. The human visual process determines whether a small object in its line of sight is actually small or just far away by converting information from a two-dimensional picture into a three-dimensional form. It can also recognize a type of object regardless of what form it may take -- a small chair, a swivel chair, a purple chair, and an armless chair are all perceived as chairs.

Humans can also perceive from memory. If you're driving at night and an oncoming car's headlights impair your ability to see the road, you can continue driving because you remember where the road is supposed to be. Or, if somebody throws a ball into the air and sunlight blocks your view of it, you don't walk away assuming it has disappeared forever -- you know it will come back down. Clearly, millions of years of evolution have created a visual process that is extremely intricate.

As a result, teaching a computer to see in the same way that humans see is not an easy project.

The Sensory Equation

The central problem of computer vision, and in fact the very purpose of the field, is to recover information about the world from sensory input. This can be expressed as a formula: S = f(W), meaning that our sensory information S is a function of the world W around us. What humans take for granted, and what the field of computer vision struggles to make machines do, is the reverse: W = f^{-1}(S), that is, to understand the world from sensory information.
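To see why the reverse direction is the hard one, it may help to try the formula on a toy case. The sketch below is only an illustration, not a model of real vision: it assumes an ideal pinhole camera, a world that consists of a single square facing the viewer, and a made-up focal length of 0.05 metres. The names Square, project, and candidate_worlds are hypothetical and chosen just for this example. The forward function f is one line of arithmetic, but many different worlds produce exactly the same sensory value, so f cannot be inverted without extra knowledge about the scene.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Square:
    """A hypothetical toy 'world' W: one square facing the camera."""
    side: float      # physical side length, in metres
    distance: float  # distance from the pinhole, in metres


def project(world: Square, focal_length: float = 0.05) -> float:
    """Forward model S = f(W): side length of the square's image, in metres.

    Assumes an ideal pinhole camera, where
    image size = focal_length * object size / distance.
    """
    return focal_length * world.side / world.distance


def candidate_worlds(image_side: float, distances: list[float],
                     focal_length: float = 0.05) -> list[Square]:
    """Attempted inverse: for each guessed distance, build the square
    that would produce the given image size at that distance."""
    return [Square(side=image_side * d / focal_length, distance=d)
            for d in distances]


# A small square nearby and a large square far away.
small_near = Square(side=0.5, distance=2.0)
large_far = Square(side=5.0, distance=20.0)

# Both worlds yield (numerically) the same sensory input.
same_image = math.isclose(project(small_near), project(large_far))
print("Same image size:", same_image)

# Going backwards, a single image size is consistent with many worlds.
for world in candidate_worlds(project(small_near), [1.0, 2.0, 20.0]):
    print(world)
```

The image alone cannot tell the small, near square from the large, far one. This is the "actually small or just far away" question described earlier, and answering it takes additional cues or prior knowledge beyond S itself.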
