In the fields of
computer vision
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
and
image analysis
Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophi ...
, the Harris affine region detector belongs to the category of
feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or
interest points so to make correspondences between images, recognize textures, categorize objects or build panoramas.
Overview
The Harris affine detector can identify similar regions between images that are related through
affine transformations
In Euclidean geometry, an affine transformation or affinity (from the Latin, ''affinis'', "connected with") is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles.
More generally, ...
and have different illuminations. These ''affine-invariant'' detectors should be capable of identifying similar regions in images taken from different viewpoints that are related by a simple geometric transformation: scaling, rotation and shearing. These detected regions have been called both ''invariant'' and ''covariant''. On one hand, the regions are detected ''invariant'' of the image transformation but the regions ''covariantly'' change with image transformation.
[ Do not dwell too much on these two naming conventions; the important thing to understand is that the design of these interest points will make them compatible across images taken from several viewpoints. Other detectors that are affine-invariant include Hessian affine region detector, ]Maximally stable extremal regions
In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al.J. Matas, O. Chum, M. Urban, and T. Pajdla"Robust wide baseline stereo from maximally stable ...
, Kadir–Brady saliency detector
The Kadir–Brady saliency detector extracts features of objects in images that are distinct and representative. It was invented by Timor Kadir and J. Michael Brady in 2001 and an affine invariant version was introduced by Kadir and Brady in 200 ...
, edge-based regions (EBR) and intensity-extrema-based regions (IBR).
Mikolajczyk and Schmid (2002) first described the Harris affine detector as it is used today i
Affine Invariant Interest Point Detector''
[Mikolajcyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In ''Proceedings of the 8th International Conference on Computer Vision'', Vancouver, Canada.]
/ref> Earlier works in this direction include use of affine shape adaptation
Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shape ...
by Lindeberg and Garding for computing affine invariant image descriptors and in this way reducing the influence of perspective image deformations,[ depth cues from affine distortions of local 2- structure". Image and Vision Computing 15: pp 415—434.">T. Lindeberg and J. Garding (1997). "Shape-adapted smoothing in estimation of 3- depth cues from affine distortions of local 2- structure". Image and Vision Computing 15: pp 415—434.]
/ref> the use affine adapted feature points for wide baseline matching by Baumberg and the first use of scale invariant feature points by Lindeberg;
[[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A453064&dswid=7766 T. Lindeberg (1998). "Feature detection with automatic scale selection". International Journal of Computer Vision 30 (2): pp 77—116.]] for an overview of the theoretical background. The Harris affine detector relies on the combination of corner points detected through Corner detection#Harris corner, Harris corner detection, multi-scale analysis through Gaussian scale space and affine normalization using an iterative affine shape adaptation
Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shape ...
algorithm. The recursive and iterative algorithm follows an iterative approach to detecting these regions:
# Identify initial region points using scale-invariant Harris–Laplace detector.
# For each initial point, normalize the region to be affine invariant using affine shape adaptation
Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shape ...
.
# Iteratively estimate the affine region: selection of proper integration scale, differentiation scale and spatially localize interest points..
# Update the affine region using these scales and spatial localizations.
# Repeat step 3 if the stopping criterion is not met.
Algorithm description
Harris–Laplace detector (initial region points)
The Harris affine detector relies heavily on both the Harris measure and a Gaussian scale space representation
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal th ...
. Therefore, a brief examination of both follow. For a more exhaustive derivations see corner detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosai ...
and Gaussian scale space or their associated papers.[C. Harris and M. Stephens (1988). "A combined corner and edge detector". Proceedings of the 4th Alvey Vision Conference: pages 147—151.]
Harris corner measure
The Harris corner detector algorithm relies on a central principle: at a corner, the image intensity will change largely in multiple directions. This can alternatively be formulated by examining the changes of intensity due to shifts in a local window. Around a corner point, the image intensity will change greatly when the window is shifted in an arbitrary direction. Following this intuition and through a clever decomposition, the Harris detector uses the second moment matrix as the basis of its corner decisions. (See corner detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosai ...
for more complete derivation). The matrix , has also been called the autocorrelation matrix and has values closely related to the derivatives of image intensity.
:
where and are the respective derivatives
The derivative of a function is the rate of change of the function's output relative to its input value.
Derivative may also refer to:
In mathematics and economics
* Brzozowski derivative in the theory of formal languages
* Formal derivative, an ...
(of pixel intensity) in the and direction at point (,); and are the position parameters of the weighting function w. The off-diagonal entries are the product of and , while the diagonal entries are squares of the respective derivatives
The derivative of a function is the rate of change of the function's output relative to its input value.
Derivative may also refer to:
In mathematics and economics
* Brzozowski derivative in the theory of formal languages
* Formal derivative, an ...
. The weighting function can be uniform, but is more typically an isotropic, circular Gaussian,
:
that acts to average in a local region while weighting those values near the center more heavily.
As it turns out, this matrix describes the shape of the autocorrelation measure as due to shifts in window location. Thus, if we let and be the eigenvalues of , then these values will provide a quantitative description of how the autocorrelation measure changes in space: its principal curvatures. As Harris and Stephens (1988) point out, the matrix centered on corner points will have two large, positive eigenvalues.[ Rather than extracting these eigenvalues using methods like singular value decomposition, the Harris measure based on the trace and determinant is used:
:
where is a constant. Corner points have large, positive eigenvalues and would thus have a large Harris measure. Thus, corner points are identified as local maxima of the Harris measure that are above a specified threshold.
:
where are the set of all corner points, is the Harris measure calculated at , is an 8-neighbor set centered on and is a specified threshold.
]
Gaussian scale-space
A Gaussian scale space representation
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal th ...
of an image is the set of images that result from convolving a Gaussian kernel of various sizes with the original image. In general, the representation can be formulated as:
:
where is an isotropic, circular Gaussian kernel as defined above. The convolution with a Gaussian kernel smooths the image using a window the size of the kernel. A larger scale, , corresponds to a smoother resultant image. Mikolajczyk and Schmid (2001) point out that derivatives and other measurements must be normalized across scales.[K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada, pages 525-531, 2001.]
/ref> A derivative of order , , must be normalized by a factor in the following manner:
:
These derivatives, or any arbitrary measure, can be adapted to a scale space representation by calculating this measure using a set of scales recursively where the th scale is . See scale space
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theor ...
for a more complete description.
Combining Harris detector across Gaussian scale-space
The Harris-Laplace detector combines the traditional 2D Harris corner detector with the idea of a Gaussian scale space representation
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal th ...
in order to create a scale-invariant detector. Harris-corner points are good starting points because they have been shown to have good rotational and illumination invariance in addition to identifying the interesting points of the image. However, the points are not scale invariant and thus the second-moment matrix must be modified to reflect a scale-invariant property. Let us denote, as the scale adapted second-moment matrix used in the Harris-Laplace detector.
: [Mikolajczyk, K. and Schmid, C. 2004. Scale & affine invariant interest point detectors. ''International Journal on Computer Vision'' 60(1):63-86.]
/ref>
where is the Gaussian kernel of scale and . Similar to the Gaussian-scale space, is the Gaussian-smoothed image. The operator denotes convolution. and are the derivatives in their respective direction applied to the smoothed image and calculated using a Gaussian kernel with scale . In terms of our Gaussian scale-space framework, the parameter determines the current scale at which the Harris corner points are detected.
Building upon this scale-adapted second-moment matrix, the Harris-Laplace detector is a twofold process: applying the Harris corner detector at multiple scales and automatically choosing the ''characteristic scale''.
Multi-scale Harris corner points
The algorithm searches over a fixed number of predefined scales. This set of scales is defined as:
:
Mikolajczyk and Schmid (2004) use . For each integration scale, , chosen from this set, the appropriate differentiation scale is chosen to be a constant factor of the integration scale: . Mikolajczyk and Schmid (2004) used .[ Using these scales, the interest points are detected using a Harris measure on the matrix. The ''cornerness,'' like the typical Harris measure, is defined as:
:
Like the traditional Harris detector, corner points are those local (8 point neighborhood) maxima of the ''cornerness'' that are above a specified threshold.
]
Characteristic scale identification
An iterative algorithm based on Lindeberg (1998) both spatially localizes the corner points and selects the ''characteristic scale''.[ The iterative search has three key steps, that are carried for each point that were initially detected at scale by the multi-scale Harris detector ( indicates the iteration):
* Choose the scale that maximizes the Laplacian-of-Gaussians (LoG) over a predefined range of neighboring scales. The neighboring scales are typically chosen from a range that is within a ''two scale-space'' neighborhood. That is, if the original points were detected using a scaling factor of between successive scales, a ''two scale-space'' neighborhood is the range . Thus the Gaussian scales examined are: . The LoG measurement is defined as:
::
:where and are the second derivatives in their respective directions. The factor (as discussed above in Gaussian scale-space) is used to normalize the LoG across scales and make these measures comparable, thus making a maximum relevant. Mikolajczyk and Schmid (2001) demonstrate that the LoG measure attains the highest percentage of correctly detected corner points in comparison to other scale-selection measures.][ The scale which maximizes this LoG measure in the ''two scale-space'' neighborhood is deemed the ''characteristic scale'', , and used in subsequent iterations. If no extrema, or maxima of the LoG is found, this point is discarded from future searches.
* Using the characteristic scale, the points are spatially localized. That is to say, the point is chosen such that it maximizes the Harris corner measure (''cornerness'' as defined above) within an 8×8 local neighborhood.
* Stopping criterion: and .
If the stopping criterion is not met, then the algorithm repeats from step 1 using the new points and scale. When the stopping criterion is met, the found points represent those that maximize the LoG across scales (scale selection) and maximize the Harris corner measure in a local neighborhood (spatial selection).
]
Affine-invariant points
Mathematical theory
The Harris-Laplace detected points are scale invariant and work well for isotropic regions that are viewed from the same viewing angle. In order to be invariant to arbitrary affine transformations (and viewpoints), the mathematical framework must be revisited. The second-moment matrix is defined more generally for anisotropic regions:
:
where and are covariance matrices defining the differentiation and the integration Gaussian kernel scales. Although this may look significantly different from the second-moment matrix in the Harris-Laplace detector; it is in fact, identical. The earlier matrix was the 2D-isotropic version in which the covariance matrices and were 2x2 identity matrices multiplied by factors and , respectively. In the new formulation, one can think of Gaussian kernels as a multivariate Gaussian distributions as opposed to a uniform Gaussian kernel. A uniform Gaussian kernel can be thought of as an isotropic, circular region. Similarly, a more general Gaussian kernel defines an ellipsoid. In fact, the eigenvectors and eigenvalues of the covariance matrix define the rotation and size of the ellipsoid. Thus we can easily see that this representation allows us to completely define an arbitrary elliptical affine region over which we want to integrate or differentiate.
The goal of the affine invariant detector is to identify regions in images that are related through affine transformations. We thus consider a point and the transformed point , where A is an affine transformation. In the case of images, both and live in space. The second-moment matrices are related in the following manner:
:
where and are the covariance matrices for the reference frame. If we continue with this formulation and enforce that
:
where and are scalar factors, one can show that the covariance matrices for the related point are similarly related:
:
By requiring the covariance matrices to satisfy these conditions, several nice properties arise. One of these properties is that the square root of the second-moment matrix, will transform the original anisotropic region into isotropic regions that are related simply through a pure rotation matrix . These new isotropic regions can be thought of as a normalized reference frame. The following equations formulate the relation between the normalized points and :
:
The rotation matrix can be recovered using gradient methods likes those in the SIFT descriptor. As discussed with the Harris detector, the eigenvalues and eigenvectors of the second-moment matrix, characterize the curvature and shape of the pixel intensities. That is, the eigenvector associated with the largest eigenvalue indicates the direction of largest change and the eigenvector associated with the smallest eigenvalue defines the direction of least change. In the 2D case, the eigenvectors and eigenvalues define an ellipse. For an isotropic region, the region should be circular in shape and not elliptical. This is the case when the eigenvalues have the same magnitude. Thus a measure of the isotropy around a local region is defined as the following:
:
where denote eigenvalues. This measure has the range