Part-based models are a broad class of detection algorithms for images in which various parts of the image are used separately to determine whether and where an object of interest exists. A very popular method of this kind is the constellation model, which covers schemes that detect a small number of features and their relative positions and then use them to decide whether the object of interest is present. These models build on the original idea of Fischler and Elschlager of using the relative position of a few template matches, and they grow in complexity in the work of Perona and others. These models are covered in the constellation models section. An example makes the idea concrete. Suppose we are trying to detect faces. A constellation model would use smaller part detectors, for instance mouth, nose and eye detectors, and judge whether an image contains a face from the relative positions at which those detectors fire.
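The face example can be illustrated with a minimal sketch. It assumes that each part detector has already fired at a single position and scores the layout with an independent Gaussian penalty on each part's offset from the nose. The expected offsets, the tolerance `SIGMA`, the threshold and the function names are hypothetical values chosen for illustration. They are not taken from Fischler and Elschlager or from Perona's work, whose models are considerably richer.

```python
import math

# Hypothetical expected offsets (dx, dy) of each part relative to the nose,
# measured in units of the distance between the eyes. Illustrative only.
EXPECTED_OFFSETS = {
    "left_eye": (-0.5, -0.6),
    "right_eye": (0.5, -0.6),
    "mouth": (0.0, 0.7),
}
SIGMA = 0.15  # assumed tolerance on each normalized offset


def constellation_score(detections):
    """Return a score in [0, 1] for how face-like the part layout is.

    `detections` maps part names to the (x, y) position where that part
    detector fired. The nose serves as the reference part.
    """
    required = set(EXPECTED_OFFSETS) | {"nose"}
    if not required <= set(detections):
        return 0.0  # this sketch does not handle missing or occluded parts
    nose_x, nose_y = detections["nose"]
    lx, ly = detections["left_eye"]
    rx, ry = detections["right_eye"]
    scale = math.hypot(rx - lx, ry - ly)  # normalize by inter-eye distance
    if scale == 0:
        return 0.0
    log_score = 0.0
    for part, (ex, ey) in EXPECTED_OFFSETS.items():
        px, py = detections[part]
        dx = (px - nose_x) / scale - ex
        dy = (py - nose_y) / scale - ey
        log_score -= (dx * dx + dy * dy) / (2 * SIGMA ** 2)
    return math.exp(log_score)


def is_face(detections, threshold=0.5):
    """Decide face / no face from the relative positions of the parts."""
    return constellation_score(detections) >= threshold
```

With detections at `nose=(100, 100)`, `left_eye=(80, 76)`, `right_eye=(120, 76)` and `mouth=(100, 128)` the layout matches the assumed offsets and is accepted. Moving the mouth above the eyes drives the score towards zero, even though every individual detector still fired.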

Non-constellation models

Many overlapping ideas fall under the title of part-based models even after excluding models of the constellation variety. The uniting thread is the use of small parts to build up an algorithm that can detect or recognize an item (a face, a car, etc.). Early efforts, such as those by Yuille, Hallinan and Cohen, sought to detect facial features and fit deformable templates to them. These templates were mathematically defined outlines that sought to capture the position and shape of the feature. Yuille, Hallinan and Cohen's algorithm does have trouble finding the global minimum fit for a given model, and so templates did occasionally become mismatched. Later efforts, such as those by Poggio and Brunelli, focus on building specific detectors for each feature. They use successive detectors to estimate scale, position, etc. and to narrow the search field used by the next detector. As such it is a part-based model; however, they seek to recognize specific faces rather than to detect the presence of a face. They do so by using each detector to build a 35-element vector of characteristics of a given face. These characteristics can then be compared to recognize specific faces, and cut-offs can also be used to detect whether a face is present at all. Cootes, Lanitis and Taylor build on this work by constructing a 100-element representation of the primary features of a face. The model is more detailed and robust, although this might be expected given the additional complexity (100 elements compared to 35). The model essentially computes deviations from a mean face in terms of shape, orientation and gray level, and it is matched by minimizing an error function. These three classes of algorithms naturally fall within the scope of template matching.
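The idea of comparing per-face feature vectors and applying a cut-off can be sketched as a nearest-neighbour search with a distance threshold. The Euclidean distance, the `gallery` dictionary, the `cutoff` parameter and the function names are assumptions made for illustration. The text above does not say which comparison Poggio and Brunelli actually used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, gallery, cutoff):
    """Return the name of the closest stored face, or None.

    `features` is the vector built by the part detectors for a test image,
    `gallery` maps a person's name to a stored vector of the same length,
    and `cutoff` is an assumed maximum distance for accepting a match.
    Returning None can be read as "no known face here".
    """
    best_name, best_dist = None, float("inf")
    for name, stored in gallery.items():
        if len(stored) != len(features):
            raise ValueError("feature vectors must have the same length")
        dist = euclidean(features, stored)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= cutoff else None


# Usage with short toy vectors; the vectors described above have 35 elements.
gallery = {"alice": [0.0, 1.0, 2.0], "bob": [5.0, 5.0, 5.0]}
match = recognize([0.1, 1.0, 2.1], gallery, cutoff=1.0)
no_match = recognize([20.0, 20.0, 20.0], gallery, cutoff=1.0)
```

The same function serves both purposes mentioned above: the returned name identifies a specific face, while a `None` result from the cut-off indicates that nothing sufficiently face-like was found.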

Of the non-constellation models, perhaps the most successful is that of Leibe and Schiele (Leibe, Leonardis and Schiele, "An Implicit Shape Model for Combined Object Categorization and Segmentation", in Toward Category-Level Object Recognition, Lecture Notes in Computer Science, vol. 4170, 2006, p. 508, ISBN 978-3-540-68794-8, doi:10.1007/11957959_26). Their algorithm finds templates associated with positive examples and records both the template (an average of the feature over all positive examples in which it is present) and the position of the center of the item (a face, for instance) relative to the template. The algorithm then takes a test image and runs an interest point locator (ideally one of the scale-invariant variety). These interest points are then compared to each template and the probability of a match is computed. All templates then cast votes for the center of the detected object, weighted by the probability of the match and the probability that the template predicts the center. These votes are summed, and if there are enough of them, sufficiently well clustered, the presence of the object in question (e.g. a face or a car) is predicted. The algorithm is effective because it imposes much less constellational rigidity than the constellation model does. Admittedly, the constellation model can be modified to allow for occlusions and other large abnormalities, but this model is naturally suited to them. That said, the more rigid structure of the constellation model is sometimes desired.
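The voting step can be sketched as follows. This is a simplified illustration rather than Leibe and Schiele's implementation: it assumes the template matching has already been done, and it replaces whatever clustering the original method uses with a coarse grid of bins. The tuple layout, `bin_size`, `threshold` and the function names are all hypothetical.

```python
from collections import defaultdict


def vote_for_centers(matches, bin_size=10):
    """Accumulate weighted votes for the object center on a coarse grid.

    Each match is (x, y, dx, dy, p_match, p_center): an interest point at
    (x, y) matched a template whose recorded offset to the object center
    is (dx, dy), with match probability p_match and probability p_center
    that the template predicts that center.
    """
    votes = defaultdict(float)
    for x, y, dx, dy, p_match, p_center in matches:
        cx, cy = x + dx, y + dy  # center predicted by this match
        cell = (int(cx // bin_size), int(cy // bin_size))
        votes[cell] += p_match * p_center
    return votes


def detect(matches, threshold, bin_size=10):
    """Return the middle of the strongest cell if its votes reach threshold."""
    votes = vote_for_centers(matches, bin_size)
    if not votes:
        return None
    cell, weight = max(votes.items(), key=lambda item: item[1])
    if weight < threshold:
        return None
    return ((cell[0] + 0.5) * bin_size, (cell[1] + 0.5) * bin_size)


# Three matches agree on a center near (103, 103); one stray match does not.
matches = [
    (90, 80, 12, 22, 0.9, 1.0),
    (110, 85, -7, 18, 0.8, 1.0),
    (100, 120, 3, -16, 0.7, 1.0),
    (300, 300, 5, 5, 0.6, 1.0),
]
center = detect(matches, threshold=2.0)
```

Because each match votes independently, removing one of the three agreeing matches (an occluded part, say) lowers the vote total but does not invalidate the others, which is the flexibility described above. A fixed grid can split a cluster that straddles a cell boundary, so a practical system would need a more careful clustering step.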
