Elements of Computer Vision

Object Recognition

• Text
• An in-depth discussion of one aspect of text recognition can be found in the Applications section.
• 3-D Recognition
• Alignment Method
In general, an object to be recognized by a computer vision system must have an archetypal representation for the system to draw on. Typically there is a library of represented objects, each consisting of a set of points in a 3-D representation. In any given image, the object to be identified has undergone an unknown 3-D rotation, translation, and projection. The object can be recognized if the vision system can recover that transformation.

One method is to match three points from the computer's model with three points in the image. The rotation and translation can then be computed from this triplet and verified against the remaining points of the model to confirm the initial match. Here is a pseudo-code implementation of such a strategy, based on Russell & Norvig's version in Artificial Intelligence.

```
align(image_points, model_points) {
    while (untried image triplets left) {
        choose an untried triplet (p1, p2, p3) from the image
        while (untried model triplets left) {
            choose an untried triplet (m1, m2, m3) from the model
            TransF = findtransform(p1, p2, p3, m1, m2, m3)
            if (projection according to TransF explains image) {
                return (TransF)
            }
        }
    }
    return (failure)
}

findtransform(p1, p2, p3, m1, m2, m3) returns a transformation showing how
the points on the image are rotated, translated, and projected from the model.
```
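The triplet search above can be sketched concretely. The following is a minimal, runnable 2-D sketch only: it substitutes a 2-D affine transform (fully determined by three point correspondences) for the full 3-D rotation, translation, and projection, and the helper names `fit_affine`, `apply_affine`, and `align` are our own, not from Russell & Norvig.

```python
import numpy as np
from itertools import combinations, permutations

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst.

    Returns a 2x3 matrix M with a 2x2 linear part and a translation column.
    """
    A = np.hstack([np.asarray(src, float), np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, np.asarray(dst, float), rcond=None)
    return M.T  # shape (2, 3)

def apply_affine(M, pts):
    """Apply the 2x3 affine transform M to an (n, 2) array of points."""
    pts = np.asarray(pts, float)
    return pts @ M[:, :2].T + M[:, 2]

def align(image_pts, model_pts, tol=1e-6):
    """Try triplet correspondences until one transform explains the image.

    Image triplets are unordered; model triplets are tried in every order,
    since the correspondence p_i <-> m_i depends on the ordering.
    """
    image_arr = np.asarray(image_pts, float)
    for ip in combinations(range(len(image_pts)), 3):
        for mp in permutations(range(len(model_pts)), 3):
            M = fit_affine([model_pts[i] for i in mp],
                           [image_pts[i] for i in ip])
            proj = apply_affine(M, model_pts)
            # "Explains the image": every projected model point lands
            # (within tol) on some image point.
            if all(min(np.linalg.norm(p - q) for q in image_arr) < tol
                   for p in proj):
                return M
    return None  # failure
```

For example, if the image points are the model points translated by (2, 3), the first consistent triplet already yields a transform whose projection matches every image point, and `align` returns it immediately.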

Such a search takes O(m^4 n^3 log n) time in the worst case (where m and n are the number of points in the model and image, respectively); shortcuts have brought that down to O(mn^3). But the complexity of the search also depends on the number of models in the computer vision system's library.

• Projective Invariants
Geometric invariants of an object have the same value regardless of the object's 3-D orientation or the projection under which it is viewed.

The advantage of this is that once an invariant relationship is identified in an image, it can be matched directly against a model's invariant -- other models no longer need to be examined. Another advantage is that, by the nature of invariant features, a computer's models can be derived from actual images rather than from archetypal models.
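A classic projective invariant, which illustrates the idea above, is the cross-ratio of four collinear points: it is unchanged by any projective transformation of the line. The small self-contained demonstration below uses our own illustrative function names and an arbitrarily chosen 1-D projective map.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points, given as 1-D coordinates
    along their common line: ((c-a)/(c-b)) / ((d-a)/(d-b))."""
    return ((c - a) / (c - b)) / ((d - a) / (d - b))

def projective_map(x, coeffs=(2.0, 1.0, 0.5, 3.0)):
    """A 1-D projective (fractional linear) map x -> (ax + b) / (cx + d).

    The coefficients are arbitrary, chosen so that ad - bc != 0.
    """
    a, b, c, d = coeffs
    return (a * x + b) / (c * x + d)

pts = [0.0, 1.0, 2.0, 4.0]
before = cross_ratio(*pts)
after = cross_ratio(*(projective_map(x) for x in pts))
# `before` and `after` agree up to floating-point error, so the cross-ratio
# computed from an image could index directly into a library of models.
```

Because the value survives projection, a vision system can compute it once from the image and compare it against stored model invariants without re-deriving the viewing transformation.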

• Neural Networks and Face Detection
• Researchers at Carnegie Mellon University developed a neural network-based face detection system that detects frontal views of faces in gray-scale images.

The system operates in two stages: it first applies a set of neural network-based filters to an image, and then uses an arbitrator to combine the filter outputs. Each filter examines every location in the image at several scales, looking for locations that might contain a face. The arbitrator then merges detections from individual filters and eliminates overlapping detections. A bootstrap algorithm is used for training the networks.
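The two-stage pipeline can be sketched as follows. This is a toy illustration only: a simple brightness test stands in for the trained neural-network filter, the image pyramid is crude subsampling, and all function names (`window_scores`, `arbitrate`, `detect_faces`) are our own, not CMU's.

```python
def window_scores(img, win, stride, score):
    """Stage 1 helper: slide a win x win window over img (a 2-D list of
    floats) and record each location the filter accepts."""
    h, w = len(img), len(img[0])
    hits = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = [row[x:x + win] for row in img[y:y + win]]
            if score(patch):
                hits.append((x, y, win))
    return hits

def arbitrate(hits):
    """Stage 2: keep a detection only if it does not overlap one already
    kept -- a crude form of eliminating overlapping detections."""
    kept = []
    for x, y, s in hits:
        if all(abs(x - kx) >= s or abs(y - ky) >= s for kx, ky, _ in kept):
            kept.append((x, y, s))
    return kept

def detect_faces(img, scales=(1, 2), win=4, stride=2):
    """Run the toy filter at several scales, then arbitrate the outputs."""
    bright = lambda p: sum(map(sum, p)) / (win * win) > 0.5  # stand-in filter
    hits = []
    for step in scales:  # coarse image pyramid by subsampling
        small = [row[::step] for row in img[::step]]
        for x, y, s in window_scores(small, win, stride, bright):
            hits.append((x * step, y * step, s * step))  # back to full res
    return arbitrate(hits)
```

Running `detect_faces` on a dark image containing one bright square yields a short list of boxes around that square; in the real system the scoring function is a trained network and the arbitrator additionally combines the outputs of multiple networks.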

This system does not run in real time: applying two networks to a 320x240 pixel image on a Sparc 20 takes about 590 seconds.

We scanned some pictures of our class into the system. The face detection process took only a few minutes, and the results were quite astounding.