In computer vision, speeded up robust features (SURF) is a patented local feature detector and descriptor. It can be used for tasks such as object recognition, image registration, classification, or 3D reconstruction. It is partly inspired by the scale-invariant feature transform (SIFT) descriptor. The standard version of SURF is several times faster than SIFT and is claimed by its authors to be more robust against different image transformations than SIFT. To detect interest points, SURF uses an integer approximation of the determinant of Hessian blob detector, which can be computed with 3 integer operations using a precomputed integral image. Its feature descriptor is based on the sum of the Haar wavelet responses around the point of interest. These responses can also be computed with the aid of the integral image. SURF descriptors have been used to locate and recognize objects, people or faces, to reconstruct 3D scenes, to track objects and to extract points of interest.

SURF was first published by Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, and presented at the 2006 European Conference on Computer Vision. An application of the algorithm is patented in the United States. An "upright" version of SURF (called U-SURF) is not invariant to image rotation and is therefore faster to compute and better suited for applications where the camera remains more or less horizontal.

The image is transformed into coordinates using the multi-resolution pyramid technique, which copies the original image into a pyramidal Gaussian or Laplacian pyramid shape to obtain an image of the same size but with reduced bandwidth. This achieves a special blurring effect on the original image, called scale space, and ensures that the points of interest are scale invariant.

Algorithm and features

The SURF algorithm is based on the same principles and steps as SIFT, but the details in each step differ. The algorithm has three main parts: interest point detection, local neighborhood description, and matching.

Detection

SURF uses square-shaped filters as an approximation of Gaussian smoothing. (The SIFT approach uses cascaded filters to detect scale-invariant characteristic points, where the difference of Gaussians (DoG) is calculated on progressively rescaled images.) Filtering the image with a square is much faster if the integral image is used:

S(x, y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j)
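A minimal NumPy sketch of the integral image and the four-corner rectangle sum. It stores S with an extra leading row and column of zeros, an implementation convenience assumed here so that rectangles touching the image border need no special cases; the function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros.

    ii[y + 1, x + 1] equals S(x, y): the sum of I over all pixels with
    column <= x and row <= y (NumPy indexes arrays as [row, column]).
    """
    img = np.asarray(img, dtype=np.float64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of I over columns x0..x1 and rows y0..y1 (inclusive),
    using only the four corner values of the integral image."""
    return ii[y1 + 1, x1 + 1] - ii[y0, x1 + 1] - ii[y1 + 1, x0] + ii[y0, x0]

img = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns
ii = integral_image(img)
print(rect_sum(ii, 1, 0, 2, 1))     # columns 1-2, rows 0-1: 1+2+5+6, prints 14.0
```

The cost of `rect_sum` is four lookups regardless of the rectangle's size, which is what makes large box filters as cheap as small ones.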

The sum of the original image within a rectangle can be evaluated quickly using the integral image, requiring evaluations only at the rectangle's four corners. SURF uses a blob detector based on the Hessian matrix to find points of interest. The determinant of the Hessian matrix is used as a measure of local change around the point, and points are chosen where this determinant is maximal. In contrast to the Hessian-Laplacian detector by Mikolajczyk and Schmid, SURF also uses the determinant of the Hessian for selecting the scale, as is also done by Lindeberg. Given a point p = (x, y) in an image I, the Hessian matrix H(p, σ) at point p and scale σ is:

H(p, \sigma) = \begin{pmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{pmatrix}

where L_xx(p, σ), L_xy(p, σ) and L_yy(p, σ) are the convolutions of the corresponding second-order Gaussian derivatives with the image I(x, y) at the point p. The box filter of size 9×9 is an approximation of a Gaussian with σ = 1.2 and represents the lowest level (highest spatial resolution) for blob-response maps.
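The determinant of the approximated Hessian can be sketched with box filters evaluated on the integral image. Following the 2008 SURF paper rather than this text, the sketch weights the mixed term by w = 0.9 to compensate for the box approximation, det ≈ Dxx·Dyy − (w·Dxy)², and normalizes each response by the filter area. The lobe geometry follows the layout used by OpenSURF; the function names are hypothetical, and pixels too close to the border are not handled.

```python
import numpy as np

def integral_image(img):
    # Zero-padded summed-area table: ii[r, c] = sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.asarray(img, dtype=np.float64).cumsum(0).cumsum(1)
    return ii

def box(ii, r0, c0, h, w):
    # Sum of img[r0:r0+h, c0:c0+w] from four corner lookups.
    return ii[r0 + h, c0 + w] - ii[r0, c0 + w] - ii[r0 + h, c0] + ii[r0, c0]

def hessian_det(ii, r, c, size=9, w=0.9):
    """Approximate det(H) at pixel (r, c) with SURF-style box filters.

    Assumes size is an odd multiple of 3 (9, 15, 21, ...) and that the
    whole filter lies inside the image.
    """
    l = size // 3          # lobe length (3 for the 9x9 filter)
    b = (size - 1) // 2    # half-size of the filter
    # Dyy: lobes (+1, -2, +1) stacked vertically, each l tall and 2l-1 wide,
    # computed as the full box minus 3 times the middle lobe.
    dyy = (box(ii, r - b, c - l + 1, size, 2 * l - 1)
           - 3 * box(ii, r - l // 2, c - l + 1, l, 2 * l - 1))
    # Dxx: the same filter transposed.
    dxx = (box(ii, r - l + 1, c - b, 2 * l - 1, size)
           - 3 * box(ii, r - l + 1, c - l // 2, 2 * l - 1, l))
    # Dxy: four l x l lobes around the centre with a one-pixel gap; its sign
    # convention does not matter because it is squared below.
    dxy = (box(ii, r - l, c + 1, l, l) + box(ii, r + 1, c - l, l, l)
           - box(ii, r - l, c - l, l, l) - box(ii, r + 1, c + 1, l, l))
    area = float(size * size)  # normalise responses by the filter area
    dxx, dyy, dxy = dxx / area, dyy / area, dxy / area
    return dxx * dyy - (w * dxy) ** 2
```

On a flat image all three responses cancel to zero, while a bright (or dark) blob centered on (r, c) gives Dxx and Dyy of the same sign and hence a positive determinant.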

Scale-space representation and location of points of interest

Interest points can be found at different scales, partly because the search for correspondences often requires comparing images in which they are seen at different scales. In other feature detection algorithms, the scale space is usually realized as an image pyramid: images are repeatedly smoothed with a Gaussian filter and then subsampled to get the next higher level of the pyramid. Therefore, several levels are computed using masks of various sizes, and the scale corresponding to each mask is:

\sigma_{\text{approx}} = \text{current filter size} \times \left( \frac{\text{base filter scale}}{\text{base filter size}} \right)
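With the base values from the Hessian section (a 9×9 filter corresponds to σ = 1.2), this formula gives the scale of each larger filter. The sketch below assumes the octave layout from the 2008 SURF paper as implemented in OpenSURF, size = 3·(2^octave·interval + 1) with four intervals per octave; this article itself only lists the first-octave sizes 9, 15, 21 and 27, so treat the later octaves as an assumption. The function names are hypothetical.

```python
def filter_size(octave, interval):
    """Box-filter side length for octave 1, 2, ... and interval 1..4.

    Assumed layout (Bay et al. 2008 / OpenSURF): the size step doubles
    from one octave to the next, so each octave covers a doubling of scale.
    """
    return 3 * (2 ** octave * interval + 1)

def approx_sigma(size, base_size=9, base_scale=1.2):
    """sigma_approx = current filter size * (base filter scale / base filter size)."""
    return size * base_scale / base_size

for octave in (1, 2, 3):
    sizes = [filter_size(octave, i) for i in range(1, 5)]
    print(octave, sizes, [round(approx_sigma(s), 2) for s in sizes])
# octave 1: sizes 9, 15, 21, 27 -> sigma 1.2, 2.0, 2.8, 3.6
```

Consecutive octaves overlap (27 appears in octaves 1, 2 and 3), which keeps every scale covered without ever resampling the image.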

The scale space is divided into a number of octaves, where an octave refers to a series of response maps covering a doubling of scale. In SURF, the lowest level of the scale space is obtained from the output of the 9×9 filters. Unlike previous methods, scale spaces in SURF are therefore implemented by applying box filters of different sizes: the scale space is analyzed by up-scaling the filter size rather than iteratively reducing the image size. The output of the above 9×9 filter is considered as the initial scale layer at scale s = 1.2 (corresponding to Gaussian derivatives with σ = 1.2). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific filter structure. This results in filters of size 9×9, 15×15, 21×21, 27×27, and so on.

Non-maximum suppression in a 3×3×3 neighborhood is applied to localize interest points in the image and over scales. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown et al. Scale-space interpolation is especially important in this case, as the difference in scale between the first layers of every octave is relatively large.
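The 3×3×3 non-maximum suppression step can be sketched as follows, assuming the determinant-of-Hessian responses for one octave are stacked into a NumPy array indexed as (layer, row, column). The `threshold` parameter and the function name are hypothetical, and the sub-pixel interpolation of Brown et al. is not included.

```python
import numpy as np

def nms_3x3x3(stack, threshold):
    """Return (layer, row, col) of responses that exceed `threshold` and are
    strictly greater than all 26 neighbours in the 3x3x3 cube around them.

    The first and last layers and the outer pixel ring are skipped because
    they lack a full neighbourhood.
    """
    stack = np.asarray(stack, dtype=np.float64)
    n_layers, h, w = stack.shape
    peaks = []
    for k in range(1, n_layers - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = stack[k, y, x]
                if v <= threshold:
                    continue
                cube = stack[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbours = np.delete(cube, 13)  # index 13 is the centre
                if v > neighbours.max():
                    peaks.append((k, y, x))
    return peaks
```

Requiring a strict maximum means that a plateau of equal responses produces no detection, which avoids reporting the same blob twice.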

Descriptor

The goal of a descriptor is to provide a unique and robust description of an image feature, e.g., by describing the intensity distribution of the pixels within the neighbourhood of the point of interest. Most descriptors are thus computed in a local manner, so a description is obtained for every point of interest identified previously. The dimensionality of the descriptor has a direct impact on both its computational complexity and its point-matching robustness and accuracy. A short descriptor may be more robust against appearance variations, but may not offer sufficient discrimination and thus give too many false positives. The first step consists of fixing a reproducible orientation based on information from a circular region around the interest point. Then a square region aligned to the selected orientation is constructed, and the SURF descriptor is extracted from it.

Orientation assignment

In order to achieve rotational invariance, the orientation of the point of interest needs to be found. The Haar wavelet responses in both the x- and y-directions within a circular neighbourhood of radius 6s around the point of interest are computed, where s is the scale at which the point of interest was detected. The obtained responses are weighted by a Gaussian function centered at the point of interest, then plotted as points in a two-dimensional space, with the horizontal response on the abscissa and the vertical response on the ordinate. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of size π/3: the horizontal and vertical responses within the window are summed, and the two sums yield a local orientation vector. The longest such vector over all window positions defines the orientation of the point of interest. The size of the sliding window is a parameter that has to be chosen carefully to achieve a desired balance between robustness and angular resolution.
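The sliding-window search can be sketched in Python. It assumes the Haar responses have already been sampled within the 6s radius and Gaussian-weighted, and are passed in as parallel lists `dx` and `dy`; the window step of 0.15 rad and the function name are hypothetical choices, not values from the text.

```python
import math

def dominant_orientation(dx, dy, window=math.pi / 3, step=0.15):
    """Angle (radians, in (-pi, pi]) of the longest summed response vector
    over all positions of a sliding angular window of width `window`."""
    two_pi = 2 * math.pi
    angles = [math.atan2(y, x) % two_pi for x, y in zip(dx, dy)]
    best_len2, best_angle = -1.0, 0.0
    start = 0.0
    while start < two_pi:
        sx = sy = 0.0
        for a, x, y in zip(angles, dx, dy):
            # Offset from the window start, wrapped so windows can cross 0.
            if (a - start) % two_pi < window:
                sx += x
                sy += y
        len2 = sx * sx + sy * sy
        if len2 > best_len2:
            best_len2, best_angle = len2, math.atan2(sy, sx)
        start += step
    return best_angle
```

A wider window averages over more responses and is more robust to noise, but can merge two distinct dominant directions; a narrower one resolves angles more finely but is noisier, which is the trade-off described above.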

Descriptor based on the sum of Haar wavelet responses

To describe the region around the point, a square region is extracted, centered on the interest point and oriented along the orientation selected above. The size of this window is 20s. The interest region is split into smaller 4×4 square sub-regions, and for each one, the Haar wavelet responses are extracted at 5×5 regularly spaced sample points. The responses are weighted with a Gaussian to offer more robustness against deformations, noise and translation.
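The text does not spell out what is summed per sub-region. In the original SURF paper, each 5×5 sub-region contributes the four sums Σdx, Σdy, Σ|dx| and Σ|dy|, giving a 4×4×4 = 64-dimensional vector that is then normalized to unit length. The sketch below follows that layout, assuming `dx` and `dy` are 20×20 arrays of already rotated, Gaussian-weighted Haar responses (one sample per step of s across the 20s window); the function name is hypothetical.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """64-D descriptor from 20x20 grids of Haar responses.

    Assumed layout (2008 SURF paper): for each of the 4x4 sub-regions of
    5x5 samples, append (sum dx, sum dy, sum |dx|, sum |dy|).
    """
    dx = np.asarray(dx, dtype=np.float64)
    dy = np.asarray(dy, dtype=np.float64)
    assert dx.shape == dy.shape == (20, 20)
    feats = []
    for i in range(4):
        for j in range(4):
            bx = dx[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            by = dy[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            feats.extend([bx.sum(), by.sum(), np.abs(bx).sum(), np.abs(by).sum()])
    v = np.asarray(feats)
    norm = np.linalg.norm(v)
    # Unit length makes the descriptor invariant to contrast scaling.
    return v / norm if norm > 0 else v
```

Keeping both the signed sums and the sums of absolute values lets the descriptor distinguish, for example, a uniform gradient from an alternating stripe pattern whose signed responses cancel out.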

Matching

By comparing the descriptors obtained from different images, matching pairs can be found.
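The article does not say how descriptors are compared. A common choice, borrowed here from SIFT practice rather than taken from this text, is Euclidean nearest-neighbour search with a distance-ratio test. The sketch below uses brute force and a hypothetical ratio of 0.7; practical systems usually replace the brute-force search with an approximate nearest-neighbour index.

```python
import numpy as np

def match_descriptors(a, b, ratio=0.7):
    """Nearest-neighbour matching with a distance-ratio test.

    a: (n, d) descriptors from image 1; b: (m, d) descriptors from image 2,
    with m >= 2. Returns (i, j) pairs where b[j] is the nearest neighbour
    of a[i] and is clearly closer than the second-nearest one.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Pairwise Euclidean distances, shape (n, m).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches
```

The ratio test discards ambiguous descriptors, such as those on repeated texture, whose best and second-best candidates are almost equally close.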


Sources

* Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "Speeded Up Robust Features", ETH Zurich, Katholieke Universiteit Leuven.
* Andrea Maricela Plaza Cordero and Jorge Luis Zambrano Martínez, "Estudio y Selección de las Técnicas SIFT, SURF y ASIFT de Reconocimiento de Imágenes para el Diseño de un Prototipo en Dispositivos Móviles" [Study and selection of the SIFT, SURF and ASIFT image recognition techniques for the design of a prototype on mobile devices], 15º Concurso de Trabajos Estudiantiles, EST 2012.
* A. M. Romero and M. Cazorla, "Comparativa de detectores de características visuales y su aplicación al SLAM" [Comparison of visual feature detectors and their application to SLAM], X Workshop de Agentes Físicos, September 2009, Cáceres.
* P. M. Panchal, S. R. Panchal, and S. K. Shah, "A Comparison of SIFT and SURF", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 1, Issue 2, April 2013.
* Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346–359, 2008.
* Christopher Evans, "Notes on the OpenSURF Library", MSc Computer Science, University of Bristol; source code and documentation archive.
* Jan Knopp, Mukta Prasad, Gert Willems, Radu Timofte, and Luc Van Gool, "Hough Transform and 3D SURF for Robust Three Dimensional Classification", European Conference on Computer Vision (ECCV), 2010.

External links

* SURF on GitHub
* Website of SURF: Speeded Up Robust Features
* First publication of Speeded Up Robust Features (2006)
* Revised publication of SURF (2008): tp://ftp.vision.ee.ethz.ch/publications/articles/eth_biwi_00517.pdf