One method is to match three points from the computer's model with three points
in the image. The rotation and translation can then be computed and tested
against the other points of the model to confirm the initial triad match. A
pseudo-code implementation of such a strategy, based on Russell & Norvig's
version in Artificial Intelligence: A Modern Approach, appears below.
Such a search takes O(m^4 n^3 log n) time in the worst case, where m and n are
the numbers of points in the model and the image, respectively: there are O(m^3)
model triplets to pair with O(n^3) image triplets, and verifying each candidate
transformation against the remaining model points costs O(m log n). Shortcuts
have since brought that down to O(m n^3). The complexity of the search also
grows with the number of models in the computer vision system's library, since
each model must be tried in turn.
The advantage of matching on geometric invariants (defined below) is that once
an invariant relationship is identified in an image, it can be matched directly
with a model's invariant -- other models no longer need to be examined. Another
advantage is that, by the nature of invariant features, a computer's models can
be derived from actual images rather than from archetypal models.
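To make that indexing idea concrete, here is a minimal sketch in Python. The
invariant function is supplied by the caller (the cross-ratio defined later in
this section is one example), and the quantization step and model names are
illustrative assumptions; a robust system would probe nearby bins rather than
rely on exact equality.

    def build_index(models, invariant, ndigits=2):
        """Precompute: map each model's quantized invariant value to its name."""
        index = {}
        for name, feature_points in models.items():
            key = round(invariant(feature_points), ndigits)
            index.setdefault(key, []).append(name)
        return index

    def lookup(index, image_points, invariant, ndigits=2):
        """At recognition time, one dictionary probe replaces a scan of the
        whole model library. (A real system would also probe nearby bins.)"""
        return index.get(round(invariant(image_points), ndigits), [])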
One face-detection system operates in two stages: it first applies a set of
neural network-based filters to an image, and then uses an arbitrator to
combine the filter outputs. The filters examine each location in the image at
several scales, looking for
locations that might contain a face. The arbitrator then merges detections from
individual filters and eliminates overlapping detections. A bootstrap algorithm
is used for training the networks.
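The control flow of such a scan-and-arbitrate pipeline can be sketched as
follows. This is not the actual system: face_score is a dummy stand-in for the
trained neural filter, and the 20x20 window, step size, and 1.2 scale factor
are assumptions chosen for illustration.

    import numpy as np

    WINDOW = 20  # filter receptive field, e.g. 20x20 pixels (an assumption)

    def face_score(window):
        """Stand-in for the trained neural filter: a confidence that this
        window contains a face. A dummy rule so the sketch runs."""
        return 1.0 if window.std() > 40 else 0.0

    def shrink(img, factor):
        """Nearest-neighbour downscaling by the given factor."""
        ys = (np.arange(int(img.shape[0] / factor)) * factor).astype(int)
        xs = (np.arange(int(img.shape[1] / factor)) * factor).astype(int)
        return img[np.ix_(ys, xs)]

    def overlaps(a, b):
        """True if two square detections (x, y, size, score) intersect."""
        ax, ay, asz, _ = a
        bx, by, bsz, _ = b
        return (abs((ax + asz / 2) - (bx + bsz / 2)) < (asz + bsz) / 2 and
                abs((ay + asz / 2) - (by + bsz / 2)) < (asz + bsz) / 2)

    def arbitrate(detections):
        """Merge overlapping detections, keeping the strongest in each cluster."""
        kept = []
        for d in sorted(detections, key=lambda d: -d[3]):
            if not any(overlaps(d, k) for k in kept):
                kept.append(d)
        return kept

    def detect(img, threshold=0.5, step=4, scale_factor=1.2):
        """Scan every location at several scales, then arbitrate."""
        detections, scale = [], 1.0
        while min(img.shape) >= WINDOW:
            for y in range(0, img.shape[0] - WINDOW + 1, step):
                for x in range(0, img.shape[1] - WINDOW + 1, step):
                    s = face_score(img[y:y + WINDOW, x:x + WINDOW])
                    if s > threshold:
                        # record in original-image coordinates
                        detections.append((x * scale, y * scale, WINDOW * scale, s))
            img, scale = shrink(img, scale_factor), scale * scale_factor
        return arbitrate(detections)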
This system does not run in real time: applying two networks to a 320x240 pixel
image on a Sparc 20 takes about 590 seconds.
We scanned some pictures of our class into the system. The face-detection
process took only a few minutes, and the results were quite astounding.
In general, an object that is to be recognized by a computer vision system has
to have an archetypal representation for the system to draw on. For example,
there may be a library of represented objects, where each object consists of a
set of points in a 3-D representation. In any image, the representation of the
object to be identified has undergone an unknown 3-D rotation, translation, and
projection. Such a representation can be recognized if the vision system can
recover that transformation.
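Such a library and the unknown transformation can be written down directly.
Everything below is a toy assumption: the model is a made-up point set, and
orthographic projection (simply dropping depth) stands in for a full
perspective camera.

    import numpy as np

    # A toy library: each model is a named set of 3-D points (illustrative).
    library = {
        "tetrahedron": np.array([[0.0, 0.0, 0.0],
                                 [1.0, 0.0, 0.0],
                                 [0.0, 1.0, 0.0],
                                 [0.0, 0.0, 1.0]]),
    }

    def rot_z(theta):
        """Rotation about the z-axis by theta radians."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

    def image_of(points, R, t):
        """Apply an unknown pose (rotation R, translation t), then project
        orthographically by keeping only x and y."""
        return (points @ R.T + t)[:, :2]

    # The recognition problem: given only pts_2d, recover which model and
    # which transformation produced it.
    pts_2d = image_of(library["tetrahedron"], rot_z(0.3), np.array([2.0, 1.0, 5.0]))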
align(image_points, model_points) {
    while (untried image triplets left) {
        choose an untried triplet (p1, p2, p3) from the image
        mark all model triplets as untried
        while (untried model triplets left) {
            choose an untried triplet (m1, m2, m3) from the model
            TransF = findtransform(p1, p2, p3, m1, m2, m3)
            if (projection according to TransF explains image) {
                return (TransF)
            }
        }
    }
    return (failure)
}
findtransform(p1, p2, p3, m1, m2, m3) returns the transformation that maps the
model points (m1, m2, m3) onto the image points (p1, p2, p3) by rotation,
translation, and projection.
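A Python rendering of the pseudo-code makes the search structure explicit.
Here find_transform and explains_image are left as caller-supplied stubs, since
the three-point pose computation and the verification tolerance are not spelled
out here; model triplets are taken in every order because each ordering encodes
a different correspondence.

    from itertools import combinations, permutations

    def align(image_points, model_points, find_transform, explains_image):
        """Try each image triplet against each ordered model triplet; return
        the first transformation whose projection explains the image."""
        for p1, p2, p3 in combinations(image_points, 3):
            for m1, m2, m3 in permutations(model_points, 3):
                transf = find_transform(p1, p2, p3, m1, m2, m3)
                if transf is not None and explains_image(transf):
                    return transf
        return None  # failure: no triplet pairing explains the image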
Geometric invariants of an object are quantities that keep the same value
regardless of the object's 3-D orientation or its projection into the image.
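A quick numeric check of one classic invariant, the cross-ratio of four
collinear points: applying an arbitrary projective transformation (the matrix
below is made up) leaves its value unchanged.

    import numpy as np

    def cross_ratio(p1, p2, p3, p4):
        """Cross-ratio of four collinear points: a classic projective invariant."""
        pts = np.array([p1, p2, p3, p4], dtype=float)
        d = pts[3] - pts[0]              # direction of the common line
        a, b, c, e = (pts - pts[0]) @ d  # affine parameter of each point
        return ((c - a) * (e - b)) / ((c - b) * (e - a))

    def apply_homography(H, p):
        """Apply a projective transformation to a 2-D point."""
        x, y, w = H @ np.array([p[0], p[1], 1.0])
        return np.array([x / w, y / w])

    # Four collinear points and an arbitrary projective transform of the plane.
    line = [np.array(p, dtype=float) for p in [(0, 0), (1, 1), (2, 2), (4, 4)]]
    H = np.array([[1.2, 0.1, 3.0],
                  [-0.2, 0.9, 1.0],
                  [0.001, 0.002, 1.0]])
    warped = [apply_homography(H, p) for p in line]

    print(cross_ratio(*line), cross_ratio(*warped))  # both print ~1.5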