Template matching is a technique in digital image processing for finding small parts of an image that match a template image. It can be used in manufacturing as part of quality control, as a way to navigate a mobile robot, or as a way to detect edges in images. The main challenges in the template matching task are occlusion, detection of non-rigid transformations, illumination and background changes, background clutter, and scale changes.

Feature-based approach

The feature-based approach relies on the extraction of image features, such as shapes, textures, and colors, to match in the target image or frame. This approach is currently achieved by using neural networks and deep learning classifiers such as VGG, AlexNet, and ResNet. Deep convolutional neural networks process the image by passing it through different hidden layers and at each layer produce a vector with classification information about the image. These vectors are extracted from the network and are used as the features of the image. Feature extraction using deep neural networks is extremely effective and is therefore the standard in state-of-the-art template matching algorithms. This method is considered more robust because it can match templates with non-rigid and out-of-plane transformations, and it can match in the presence of high background clutter and illumination changes.

Template-based approach

For templates without strong features, or when the bulk of the template image constitutes the matching image, a template-based approach may be effective. Because template-based matching may require sampling a large number of points, the number of sampling points can be reduced in several ways: by reducing the resolution of the search and template images by the same factor and performing the operation on the resulting downsized images (a multiresolution, or pyramid, approach); by providing a search window of data points within the search image so that the template does not have to be compared at every viable data point; or by a combination of both.

Motion tracking and occlusion handling

In instances where the template may not provide a direct match, it may be useful to use eigenspaces: sets of templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, color contrasts, or acceptable matching object "poses". For example, if the user is looking for a face, the eigenspaces may consist of images (templates) of faces in different positions relative to the camera, in different lighting conditions, or with different expressions.

It is also possible for the matching image to be obscured, or occluded, by an object; in these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search object may be a playing card, and in some of the search images the card is obscured by the fingers of someone holding it, by another card on top of it, or by any other object in front of the camera. In cases where the object is malleable or poseable, motion also becomes a problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision.
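The subdivision idea can be illustrated with a minimal sketch. It assumes greyscale images stored as lists of rows, splits the template into four quadrants, matches each quadrant by the sum of absolute differences, and lets the quadrants vote on the template origin they imply. The function names, the choice of four quadrants, and the voting rule are illustrative assumptions, not a prescribed method.

```python
from collections import Counter


def sad(search, template, x, y):
    """Sum of absolute differences with the template's top-left corner at column x, row y."""
    return sum(
        abs(search[y + i][x + j] - template[i][j])
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def best_position(search, template):
    """Exhaustive search: return the (x, y) placement with the lowest SAD."""
    max_y = len(search) - len(template)
    max_x = len(search[0]) - len(template[0])
    _, x, y = min(
        (sad(search, template, x, y), x, y)
        for y in range(max_y + 1)
        for x in range(max_x + 1)
    )
    return x, y


def match_by_parts(search, template):
    """Match each template quadrant separately and vote on the implied template origin.

    Hypothetical helper: an occluded quadrant may vote for a wrong origin,
    but the unoccluded quadrants can still agree on the right one.
    """
    t_rows, t_cols = len(template), len(template[0])
    half_r, half_c = t_rows // 2, t_cols // 2
    votes = Counter()
    for r0, r1 in ((0, half_r), (half_r, t_rows)):
        for c0, c1 in ((0, half_c), (half_c, t_cols)):
            part = [row[c0:c1] for row in template[r0:r1]]
            px, py = best_position(search, part)
            # Shift the quadrant's position back to the full template's origin.
            votes[(px - c0, py - r0)] += 1
    return votes.most_common(1)[0][0]
```

Voting is only one way to combine the per-part results; a real system might instead weight each part by its match quality or discard parts whose score exceeds a threshold.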

Deformable templates in computational anatomy

Template matching is a central tool in computational anatomy (CA). The deformable template model represents the space of human anatomies as orbits under the group action of diffeomorphisms. Template matching arises as the problem of estimating the unknown diffeomorphism that acts on the template to match the target image. Template matching algorithms in CA have come to be called large deformation diffeomorphic metric mapping (LDDMM); there are now LDDMM template matching algorithms for matching landmark points, curves, surfaces, and volumes.

Template-based matching explained using cross correlation or sum of absolute differences

A basic method of template matching uses an image patch (template), tailored to a specific feature of the search image that we want to detect. This technique can easily be performed on grey images or edge images. The cross-correlation output will be highest at places where the image structure matches the mask structure, where large image values get multiplied by large mask values.

This method is normally implemented by first picking out a part of the search image to use as a template. We will call the search image S(x, y), where (x, y) represents the coordinates of each pixel in the search image, and the template T(x_t, y_t), where (x_t, y_t) represents the coordinates of each pixel in the template. We then move the center (or the origin) of the template T(x_t, y_t) over each (x, y) point in the search image and calculate the sum of products between the coefficients in S(x, y) and T(x_t, y_t) over the whole area spanned by the template. As all possible positions of the template with respect to the search image are considered, the position with the highest score is the best position. This method is sometimes referred to as "linear spatial filtering", and the template is called a "filter mask".

Another way to handle translation problems on images using template matching is to compare the intensities of the pixels, using the sum of absolute differences (SAD) measure. A pixel in the search image with coordinates (x_s, y_s) has intensity I_s(x_s, y_s), and a pixel in the template with coordinates (x_t, y_t) has intensity I_t(x_t, y_t). Thus the absolute difference in the pixel intensities is defined as Diff(x_s, y_s, x_t, y_t) = |I_s(x_s, y_s) - I_t(x_t, y_t)|.

The SAD at position (x, y) of the search image is then

SAD(x, y) = \sum_{i=0}^{T_{cols}-1} \sum_{j=0}^{T_{rows}-1} Diff(x+i, y+j, i, j)

The mathematical representation of looping through the pixels in the search image, as we translate the origin of the template to every pixel and take the SAD measure, is the following:

\sum_{x=0}^{S_{cols}-T_{cols}} \sum_{y=0}^{S_{rows}-T_{rows}} SAD(x, y)

S_rows and S_cols denote the number of rows and columns of the search image, and T_rows and T_cols denote the number of rows and columns of the template image, respectively. In this method, the lowest SAD score gives the estimate for the best position of the template within the search image. The method is simple to implement and understand, but it is one of the slowest methods.
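For comparison with SAD, the cross-correlation score described at the start of this section can be sketched as follows. This is a minimal illustration that assumes greyscale images stored as lists of rows and uses the template's top-left corner as its origin; the function names are hypothetical.

```python
def cross_correlation_map(search, template):
    """Return a grid of sum-of-products scores, one per template placement.

    scores[y][x] is the score with the template's top-left corner
    at column x, row y of the search image.
    """
    t_rows, t_cols = len(template), len(template[0])
    scores = []
    for y in range(len(search) - t_rows + 1):
        row_scores = []
        for x in range(len(search[0]) - t_cols + 1):
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(t_rows)
                for j in range(t_cols)
            )
            row_scores.append(score)
        scores.append(row_scores)
    return scores


def best_match(scores):
    """Return the (x, y) placement with the highest score (first one on ties)."""
    best_x, best_y, best_score = 0, 0, scores[0][0]
    for y, row in enumerate(scores):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y
```

Because the raw sum of products grows with image brightness, a uniformly bright region can outscore the true match; normalized variants of cross-correlation are commonly used to reduce this effect.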

Implementation

In this simple implementation, it is assumed that the method described above is applied to grey images, which is why Grey is used as the pixel intensity. The final position in this implementation gives the top-left location where the template image best matches the search image.

    minSAD = VALUE_MAX;
    // loop through the search image
    for ( size_t x = 0; x <= S_cols - T_cols; x++ )

One way to perform template matching on color images is to decompose the pixels into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately.
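The exhaustive SAD search, including the color variant, can be sketched as follows. This is an illustrative Python version and not the article's original listing. It assumes images are lists of rows whose pixels are either single grey values or tuples of color components; the names `pixel_diff` and `sad_search` are hypothetical.

```python
def pixel_diff(p, q):
    """Absolute difference for grey pixels, or the sum over channels for color pixels."""
    if isinstance(p, (int, float)):
        return abs(p - q)
    return sum(abs(a - b) for a, b in zip(p, q))


def sad_search(search, template):
    """Return (x, y, min_sad) for the placement with the lowest SAD.

    (x, y) is the top-left corner of the best placement: x is the column
    and y is the row in the search image.
    """
    s_rows, s_cols = len(search), len(search[0])
    t_rows, t_cols = len(template), len(template[0])
    min_sad = float("inf")
    best_x = best_y = 0
    # Loop through every placement of the template in the search image.
    for x in range(s_cols - t_cols + 1):
        for y in range(s_rows - t_rows + 1):
            total = 0
            # Loop through the template pixels.
            for i in range(t_rows):
                for j in range(t_cols):
                    total += pixel_diff(search[y + i][x + j], template[i][j])
            if total < min_sad:
                min_sad, best_x, best_y = total, x, y
    return best_x, best_y, min_sad
```

Summing the per-channel differences pixel by pixel gives the same total as computing a separate SAD for each color plane and adding the results.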

Speeding up the process

In the past, this type of spatial filtering was normally used only in dedicated hardware solutions because of the computational complexity of the operation. This complexity can be lessened by performing the filtering in the frequency domain of the image, referred to as "frequency domain filtering"; this is done through the use of the convolution theorem.

Another way of speeding up the matching process is through the use of an image pyramid. This is a series of images, at different scales, which are formed by repeatedly filtering and subsampling the original image in order to generate a sequence of reduced-resolution images. These lower-resolution images can then be searched for the template (with a similarly reduced resolution) in order to yield possible start positions for searching at the larger scales. The larger images can then be searched in a small window around the start position to find the best template location. Other methods can handle problems such as translation, scale, image rotation, and even all affine transformations.
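The pyramid search can be sketched with a single reduced level. The example below is a simplified illustration, assuming greyscale images stored as lists of rows, 2x2 block averaging as the filter-and-subsample step, and SAD as the match score. The function names and the `radius` parameter are assumptions made for this sketch.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, len(image[0]) - 1, 2)
        ]
        for y in range(0, len(image) - 1, 2)
    ]


def sad(search, template, x, y):
    """Sum of absolute differences with the template's top-left corner at (x, y)."""
    return sum(
        abs(search[y + i][x + j] - template[i][j])
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def best_in_window(search, template, xs, ys):
    """Return the lowest-SAD placement among the candidate columns xs and rows ys."""
    _, x, y = min((sad(search, template, x, y), x, y) for y in ys for x in xs)
    return x, y


def coarse_to_fine(search, template, radius=2):
    """Find a start position at half resolution, then refine at full resolution."""
    small_search, small_template = downsample(search), downsample(template)
    coarse_x, coarse_y = best_in_window(
        small_search,
        small_template,
        range(len(small_search[0]) - len(small_template[0]) + 1),
        range(len(small_search) - len(small_template) + 1),
    )
    # Search only a small window around the scaled-up start position.
    max_x = len(search[0]) - len(template[0])
    max_y = len(search) - len(template)
    xs = range(max(0, 2 * coarse_x - radius), min(max_x, 2 * coarse_x + radius) + 1)
    ys = range(max(0, 2 * coarse_y - radius), min(max_y, 2 * coarse_y + radius) + 1)
    return best_in_window(search, template, xs, ys)
```

The full-resolution pass evaluates at most (2 × radius + 1)² placements instead of every placement in the image. A deeper pyramid would repeat the same step over several levels, and a coarse level can occasionally point at the wrong region, so the result is an estimate and not a guarantee.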

Improving the accuracy of the matching

Improvements can be made to the matching method by using more than one template (eigenspaces); these other templates can have different scales and rotations. It is also possible to improve the accuracy of the matching method by hybridizing the feature-based and template-based approaches. Naturally, this requires that the search and template images have features that are apparent enough to support feature matching.
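Matching with several templates can be sketched as follows. The example is a minimal illustration that assumes greyscale images stored as lists of rows, generates the extra templates by rotating one template in 90-degree steps, and compares them by mean absolute difference so that templates of different sizes are scored on the same scale. The function names and the restriction to 90-degree rotations are assumptions for this sketch.

```python
def rotate90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotations(template):
    """Return the template rotated by 0, 90, 180, and 270 degrees."""
    variants = [template]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


def mean_abs_diff(search, template, x, y):
    """SAD divided by the template's pixel count, with the top-left corner at (x, y)."""
    t_rows, t_cols = len(template), len(template[0])
    total = sum(
        abs(search[y + i][x + j] - template[i][j])
        for i in range(t_rows)
        for j in range(t_cols)
    )
    return total / (t_rows * t_cols)


def match_variants(search, templates):
    """Return (index, x, y, score) for the best placement over all templates."""
    candidates = []
    for index, template in enumerate(templates):
        max_y = len(search) - len(template)
        max_x = len(search[0]) - len(template[0])
        for y in range(max_y + 1):
            for x in range(max_x + 1):
                score = mean_abs_diff(search, template, x, y)
                candidates.append((score, index, x, y))
    score, index, x, y = min(candidates)
    return index, x, y, score
```

The returned index identifies which variant matched, which also indicates the rotation of the object in the search image. The cost grows linearly with the number of templates, so this trades speed for robustness.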

Similar methods

Similar methods include stereo matching, image registration, and the scale-invariant feature transform (SIFT).

Examples of use

Template matching has various applications and is used in such fields as face recognition (see facial recognition system) and medical image processing. Systems have been developed and used in the past to count the number of faces that walk across part of a bridge within a certain amount of time. Other systems include automated calcified nodule detection within digital chest X-rays. More recently, the method has been implemented in geostatistical simulation, where it could provide a fast algorithm (Tahmasebi, P., Hezarkhani, A., Sahimi, M., 2012. Multiple-point geostatistical modeling based on the cross-correlation functions. Computational Geosciences, 16(3):779-797).

External links

* Template Matching in OpenCV
* Visual Object Recognition using Template Matching
* http://campar.in.tum.de/Main/AndreasHofhauser perspective-invariant template matching
* An extensive template matching bibliography up to 2009