Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter ''t'' in this family is referred to as the ''scale parameter'', with the interpretation that image structures of spatial size smaller than about \sqrt{t} have largely been smoothed away in the scale-space level at scale ''t''.
The main type of scale space is the ''linear (Gaussian) scale space'', which has wide applicability as well as the attractive property that it can be derived from a small set of ''scale-space axioms''. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made ''scale invariant'', which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and, in addition, the distance between the object and the camera may be unknown and may vary depending on the circumstances.
Definition
The notion of scale space applies to signals of arbitrary numbers of variables. The most common case in the literature applies to two-dimensional images, which is what is presented here. Consider a given image ''f'' where ''f''(''x'', ''y'') is the greyscale value of the pixel at position (''x'', ''y''). The linear (Gaussian) ''scale-space representation'' of ''f'' is a family of derived signals ''L''(''x'', ''y''; ''t'') defined by the convolution of ''f''(''x'', ''y'') with the two-dimensional Gaussian kernel
: g(x, y; t) = \frac{1}{2 \pi t} \, e^{-(x^2 + y^2)/(2t)}
such that
: L(\cdot, \cdot; t) = g(\cdot, \cdot; t) * f(\cdot, \cdot),
where the semicolon in the argument of ''L'' implies that the convolution is performed only over the variables ''x'', ''y'', while the scale parameter ''t'' after the semicolon just indicates which scale level is being defined. This definition of ''L'' works for a continuum of scales t \geq 0, but typically only a finite discrete set of levels in the scale-space representation would actually be considered.
The scale parameter ''t'' = \sigma^2 is the variance of the Gaussian filter, and in the limit ''t'' → 0 the filter ''g'' becomes an impulse function such that ''L''(''x'', ''y''; 0) = ''f''(''x'', ''y''), that is, the scale-space representation at scale level ''t'' = 0 is the image ''f'' itself. As ''t'' increases, ''L'' is the result of smoothing ''f'' with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is \sigma = \sqrt{t}, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter ''t''; see the following figures for graphical illustrations.
Image:Scalespace0.png, Scale-space representation at scale ''t'' = 0, corresponding to the original image
Image:Scalespace1.png, Scale-space representation at scale
Image:Scalespace2.png, Scale-space representation at scale
Image:Scalespace3.png, Scale-space representation at scale
Image:Scalespace4.png, Scale-space representation at scale
Image:Scalespace5.png, Scale-space representation at scale
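The definition can be illustrated with a minimal NumPy sketch. It is not a reference implementation: the function names `gaussian_kernel_1d` and `smooth` are hypothetical, the Gaussian is simply sampled and truncated at four standard deviations, and the image is reflected at its borders, all of which are assumptions made for the example. The sketch relies on the separability of the two-dimensional Gaussian kernel and uses a random array as a stand-in for an image.

```python
import numpy as np

def gaussian_kernel_1d(t, truncate=4.0):
    """Sampled 1-D Gaussian with variance t, normalised to unit sum."""
    sigma = np.sqrt(t)
    radius = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * t))
    return g / g.sum()

def smooth(f, t):
    """Approximate L(., .; t) = g(., .; t) * f by separable convolution."""
    if t == 0:
        return f.astype(float).copy()  # t = 0 gives back the image itself
    g = gaussian_kernel_1d(t)
    radius = len(g) // 2
    # Reflect the image at its borders (a boundary-handling assumption).
    padded = np.pad(f.astype(float), radius, mode="reflect")
    # The 2-D Gaussian is separable: filter along rows, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, g, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")

# A random array stands in for an image; only a finite set of scales is kept.
rng = np.random.default_rng(0)
f = rng.random((64, 64))
scale_space = {t: smooth(f, t) for t in (0, 1, 4, 16)}

for t, L in scale_space.items():
    # The spread of grey values shrinks as fine-scale detail is smoothed away.
    print(f"t = {t:2d}: standard deviation of L = {L.std():.4f}")
```

Each level is computed directly from ''f'' here; a practical implementation might instead smooth incrementally from one level to the next.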
Why a Gaussian filter?
When faced with the task of generating a multi-scale representation one may ask: could any filter ''g'' of low-pass type and with a parameter ''t'' which determines its width be used to generate a scale space? The answer is no, as it is of crucial importance that the smoothing filter does not introduce new spurious structures at coarse scales that do not correspond to simplifications of corresponding structures at finer scales. In the scale-space literature, this criterion has been formulated in precise mathematical terms in a number of different ways.
The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale space constitutes the ''canonical'' way to generate a linear scale space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale.
Conditions, referred to as ''scale-space axioms'', that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance.
In some works, the uniqueness claimed in the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed. The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality or non-enhancement of local extrema.
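As an illustration of one of the listed axioms, the semi-group structure states that smoothing with variance ''t''<sub>1</sub> followed by smoothing with variance ''t''<sub>2</sub> is the same as a single smoothing with variance ''t''<sub>1</sub> + ''t''<sub>2</sub>. The following sketch checks this numerically. It assumes periodic image boundaries and applies the Gaussian in the Fourier domain, where a Gaussian of variance ''t'' has transfer function exp(−''t''|ω|²/2); the helper name `smooth_periodic` is hypothetical.

```python
import numpy as np

def smooth_periodic(f, t):
    """Gaussian smoothing with variance t, applied in the Fourier domain.

    Assumes periodic image boundaries. The transfer function of a Gaussian
    with variance t is exp(-t * |omega|^2 / 2).
    """
    wy = 2.0 * np.pi * np.fft.fftfreq(f.shape[0])  # angular frequencies, rows
    wx = 2.0 * np.pi * np.fft.fftfreq(f.shape[1])  # angular frequencies, columns
    w2 = wy[:, None] ** 2 + wx[None, :] ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.exp(-0.5 * t * w2)))

rng = np.random.default_rng(1)
f = rng.random((64, 64))

# Smoothing in two steps (variances 3 and 5) versus one step (variance 8).
two_steps = smooth_periodic(smooth_periodic(f, 3.0), 5.0)
one_step = smooth_periodic(f, 8.0)
max_difference = np.abs(two_steps - one_step).max()
print("largest deviation between the two results:", max_difference)
```

In this idealized setting the two results agree up to floating-point rounding; with truncated spatial kernels the agreement would only be approximate.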
Alternative definition
''Equivalently'', the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation),
: \partial_t L = \frac{1}{2} \nabla^2 L,
with initial condition ''L''(''x'', ''y''; 0) = ''f''(''x'', ''y''). This formulation of the scale-space representation ''L'' means that it is possible to interpret the intensity values of the image ''f'' as a "temperature distribution" in the image plane and that the process that generates the scale-space representation as a function of ''t'' corresponds to heat diffusion in the image plane over time ''t'' (assuming the thermal conductivity of the material equal to the arbitrarily chosen constant 1/2). Although this connection may appear superficial for a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale spaces, which also generalizes to nonlinear scale spaces, for example, using anisotropic diffusion. Hence, one may say that the primary way to generate a scale space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation.
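The equivalence between diffusion and Gaussian smoothing can be checked numerically. The sketch below is an illustration under stated assumptions rather than a recommended scheme: it integrates the diffusion equation with diffusion constant 1/2 using explicit Euler steps and a five-point discrete Laplacian, assumes periodic boundaries, and compares the result at time ''t'' with Gaussian smoothing of variance ''t'' applied in the Fourier domain. The function names and the step size `dt` are hypothetical choices.

```python
import numpy as np

def diffuse(f, t, dt=0.1):
    """Integrate dL/dt = 0.5 * laplacian(L) from time 0 to t (explicit Euler)."""
    L = f.astype(float).copy()
    for _ in range(int(round(t / dt))):
        # Five-point discrete Laplacian with periodic boundaries.
        lap = (np.roll(L, 1, 0) + np.roll(L, -1, 0)
               + np.roll(L, 1, 1) + np.roll(L, -1, 1) - 4.0 * L)
        L += dt * 0.5 * lap
    return L

def gaussian_smooth_periodic(f, t):
    """Gaussian smoothing with variance t via its Fourier transfer function."""
    wy = 2.0 * np.pi * np.fft.fftfreq(f.shape[0])
    wx = 2.0 * np.pi * np.fft.fftfreq(f.shape[1])
    w2 = wy[:, None] ** 2 + wx[None, :] ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.exp(-0.5 * t * w2)))

rng = np.random.default_rng(2)
f = rng.random((64, 64))
t = 4.0

L_diffusion = diffuse(f, t)
L_gaussian = gaussian_smooth_periodic(f, t)

# Relative deviation, measured against the variation left in the smoothed image.
rel_error = (np.linalg.norm(L_diffusion - L_gaussian)
             / np.linalg.norm(L_gaussian - L_gaussian.mean()))
print("relative deviation between diffusion and Gaussian smoothing:", rel_error)
```

The two results differ slightly because the discrete Laplacian and the finite time step only approximate the continuous equation; the deviation shrinks at coarser scales, where the remaining structures are well resolved by the grid.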
Motivations
The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation.
For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales.
For a computer vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data.
Hence, the only reasonable approach is to consider descriptions at multiple scales in order to be able to capture the unknown scale variations that may occur.
Taken to the limit, a scale-space representation considers representations at all scales.
Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply ''operators of non-infinitesimal size'' to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. Scale-space theory, on the other hand, explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement as well as any other operation that depends on a real-world measurement.
There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex.
In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.
Gaussian derivatives
At any scale in scale space, we can apply local derivative operators to the scale-space representation:
: L_{x^m y^n}(x, y; t) = \left( \partial_{x^m y^n} L \right)(x, y; t).
Due to the commutative property between the derivative operator and the Gaussian smoothing operator, such ''scale-space derivatives'' can equivalently be computed by convolving the original image with Gaussian derivative operators. For this reason they are often also referred to as ''Gaussian derivatives'':
: L_{x^m y^n}(\cdot, \cdot; t) = \left( \partial_{x^m y^n} g(\cdot, \cdot; t) \right) * f(\cdot, \cdot).
The uniqueness of the Gaussian derivative operators as local operations derived from a scale-space representation can be obtained by similar axiomatic derivations as are used for deriving the uniqueness of the Gaussian kernel for scale-space smoothing.
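The commutative property can be illustrated numerically for the first-order derivative ''L''<sub>''x''</sub>. The following sketch computes it in two ways: by smoothing the image and then differentiating with central differences, and by convolving the image directly with a sampled Gaussian derivative kernel. The helper names, the truncation of the kernels at five standard deviations and the periodic boundary handling are assumptions made for the example.

```python
import numpy as np

def kernels(t, truncate=5.0):
    """Sampled 1-D Gaussian with variance t and its analytic first derivative."""
    radius = int(np.ceil(truncate * np.sqrt(t)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    g_x = -x / t * g
    return g, g_x

def conv_axis(a, k, axis):
    """Convolve every 1-D slice of a along axis with k (periodic borders)."""
    r = len(k) // 2
    pad = [(0, 0)] * a.ndim
    pad[axis] = (r, r)
    padded = np.pad(a, pad, mode="wrap")
    return np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")

rng = np.random.default_rng(3)
f = rng.random((64, 64))
t = 9.0
g, g_x = kernels(t)

# (a) Smooth first, then differentiate along x (axis 1) by central differences.
L = conv_axis(conv_axis(f, g, 0), g, 1)
Lx_smooth_then_diff = (np.roll(L, -1, 1) - np.roll(L, 1, 1)) / 2.0

# (b) Convolve the image directly with the Gaussian derivative kernel.
Lx_gaussian_derivative = conv_axis(conv_axis(f, g, 0), g_x, 1)

rel_diff = (np.linalg.norm(Lx_smooth_then_diff - Lx_gaussian_derivative)
            / np.linalg.norm(Lx_gaussian_derivative))
print("relative difference between the two computations:", rel_diff)
```

The small remaining difference comes from the finite-difference approximation in (a), not from a failure of commutativity.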
Visual front end
These Gaussian derivative operators can in turn be combined by linear or non-linear operators into a larger variety of different types of feature detectors, which in many cases can be well modelled by differential geometry. Specifically, invariance (or more appropriately ''covariance'') to local geometric transformations, such as rotations or local affine transformations, can be obtained by considering differential invariants under the appropriate class of transformations, or alternatively by normalizing the Gaussian derivative operators to a local coordinate frame determined from, e.g., a preferred orientation in the image domain, or by applying a preferred local affine transformation to a local image patch (see the article on affine shape adaptation for further details).
When Gaussian derivative operators and differential invariants are used in this way as basic feature detectors at multiple scales, the uncommitted first stages of visual processing are often referred to as a ''visual front-end''. This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition. The set of Gaussian derivative operators up to a certain order is often referred to as the ''N-jet'' and constitutes a basic type of feature within the scale-space framework.
Detector examples
Following the idea of expressing visual operations in terms of differential invariants computed at multiple scales using Gaussian derivative operators, we can express an edge detector from the set of points that satisfy the requirement that the gradient magnitude
: |\nabla L| = \sqrt{L_x^2 + L_y^2}
should assume a local maximum in the gradient direction
: \nabla L = (L_x, L_y)^T.
By working out the differential geometry, it can be shown that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant
: L_x^2 \, L_{xx} + 2 \, L_x \, L_y \, L_{xy} + L_y^2 \, L_{yy} = 0
that satisfy the following sign condition on a third-order differential invariant:
: L_x^3 \, L_{xxx} + 3 \, L_x^2 \, L_y \, L_{xxy} + 3 \, L_x \, L_y^2 \, L_{xyy} + L_y^3 \, L_{yyy} < 0.
Similarly, multi-scale blob detectors at any given fixed scale can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)
: \nabla^2 L = L_{xx} + L_{yy}
or the determinant of the Hessian matrix
: \det H L = L_{xx} L_{yy} - L_{xy}^2.
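The gradient magnitude, the Laplacian and the determinant of the Hessian can be computed at a fixed scale from Gaussian derivatives up to order two. The sketch below is illustrative only: the function names are hypothetical, the derivative kernels are sampled from the analytic derivatives of the Gaussian and truncated at five standard deviations, and periodic boundaries are assumed. It is applied to a synthetic bright blob, for which the Laplacian is expected to have its minimum, and the determinant of the Hessian its maximum, at the blob centre.

```python
import numpy as np

def gaussian_derivative_kernel(t, order, truncate=5.0):
    """Sampled derivative of order 0, 1 or 2 of a 1-D Gaussian with variance t."""
    radius = int(np.ceil(truncate * np.sqrt(t)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    if order == 0:
        return g
    if order == 1:
        return -x / t * g
    if order == 2:
        return (x**2 / t**2 - 1.0 / t) * g
    raise ValueError("only orders 0, 1 and 2 are sketched here")

def conv_axis(a, k, axis):
    """Convolve every 1-D slice of a along axis with k (periodic borders)."""
    r = len(k) // 2
    pad = [(0, 0)] * a.ndim
    pad[axis] = (r, r)
    padded = np.pad(a, pad, mode="wrap")
    return np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")

def gaussian_derivative(f, t, m, n):
    """L_{x^m y^n} at scale t: order m along x (axis 1), order n along y (axis 0)."""
    out = conv_axis(f.astype(float), gaussian_derivative_kernel(t, n), 0)
    return conv_axis(out, gaussian_derivative_kernel(t, m), 1)

def detector_responses(f, t):
    """Differential invariants used by the edge and blob detectors."""
    Lx, Ly = gaussian_derivative(f, t, 1, 0), gaussian_derivative(f, t, 0, 1)
    Lxx, Lyy = gaussian_derivative(f, t, 2, 0), gaussian_derivative(f, t, 0, 2)
    Lxy = gaussian_derivative(f, t, 1, 1)
    return {
        "gradient_magnitude": np.hypot(Lx, Ly),
        "laplacian": Lxx + Lyy,
        "det_hessian": Lxx * Lyy - Lxy**2,
    }

# Synthetic test image: one bright Gaussian blob centred at row 20, column 40.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 40) ** 2 + (yy - 20) ** 2) / (2.0 * 16.0))

resp = detector_responses(blob, t=4.0)
center_lap = np.unravel_index(np.argmin(resp["laplacian"]), blob.shape)
center_det = np.unravel_index(np.argmax(resp["det_hessian"]), blob.shape)
print("Laplacian minimum at (row, column):", center_lap)
print("det-Hessian maximum at (row, column):", center_det)
```

Turning these response maps into actual detectors additionally requires locating local extrema or zero-crossings, which is omitted here.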
In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives. The algebraic expressions for the corner and ridge detection operators are, however, somewhat more complex and the reader is referred to the articles on corner detection and ridge detection for further details.
Scale-space operations have also been frequently used for expressing coarse-to-fine methods, in particular for tasks such as image matching and for multi-scale image segmentation.
Scale selection
The theory presented so far describes a well-founded framework for ''representing'' image structures at multiple scales. In many cases it is, however, also necessary to select locally appropriate scales for further analysis. This need for ''scale selection'' arises for two major reasons: (i) real-world objects may have different sizes, and these sizes may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance information may also be unknown ''a priori''.
A highly useful property of scale-space representation is that image representations can be made invariant to scales, by performing automatic local scale selection based on local maxima (or minima) over scales of scale-normalized derivatives
:
where