Scale Space Representation

Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures (Ijima, T. "Basic theory on normalization of pattern (in case of typical one-dimensional pattern)". Bull. Electrotech. Lab. 26, 368–388, 1962, in Japanese). The parameter t in this family is referred to as the ''scale parameter'', with the interpretation that image structures of spatial size smaller than about \sqrt{t} have largely been smoothed away in the scale-space level at scale t.

The main type of scale space is the ''linear (Gaussian) scale space'', which has wide applicability as well as the attractive property that it can be derived from a small set of ''scale-space axioms''. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made ''scale invariant'', which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and, in addition, the distance between the object and the camera may be unknown and may vary depending on the circumstances.


Definition

The notion of scale space applies to signals of arbitrary numbers of variables. The most common case in the literature applies to two-dimensional images, which is what is presented here. For a given image f(x, y), its linear (Gaussian) ''scale-space representation'' is a family of derived signals L(x, y; t) defined by the convolution of f(x, y) with the two-dimensional Gaussian kernel

:g(x, y; t) = \frac{1}{2\pi t} e^{-(x^2 + y^2)/(2t)}

such that

:L(\cdot, \cdot; t) = g(\cdot, \cdot; t) * f(\cdot, \cdot),

where the semicolon in the argument of L implies that the convolution is performed only over the variables x, y, while the scale parameter t after the semicolon just indicates which scale level is being defined. This definition of L works for a continuum of scales t \geq 0, but typically only a finite discrete set of levels in the scale-space representation is actually considered.

The scale parameter t = \sigma^2 is the variance of the Gaussian filter, and in the limit t = 0 the filter g becomes an impulse function such that L(x, y; 0) = f(x, y), that is, the scale-space representation at scale level t = 0 is the image f itself. As t increases, L is the result of smoothing f with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is \sigma = \sqrt{t}, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter t; see the following figure for graphical illustrations.

Figure: Scale-space representations L(x, y; t) of the same image at scales t = 0 (corresponding to the original image f), t = 1, t = 4, t = 16, t = 64 and t = 256.
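The definition above can be sketched in a few lines of code. The following is a minimal illustration, not a reference implementation: it assumes a sampled Gaussian truncated at four standard deviations, replicated image borders, and a toy image holding a single bright pixel. The function names (`gaussian_kernel_1d`, `scale_space_level`) are hypothetical, and only the Python standard library is used so that the example stays self-contained.

```python
import math


def gaussian_kernel_1d(t, truncate=4.0):
    """Sampled 1-D Gaussian with variance t, normalised to unit sum."""
    if t <= 0:
        return [1.0]  # t = 0: unit impulse, so that L(., .; 0) = f
    radius = int(math.ceil(truncate * math.sqrt(t)))
    weights = [math.exp(-(k * k) / (2.0 * t)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve every row with a symmetric kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def scale_space_level(image, t):
    """L(., .; t) = g(., .; t) * f, using the separability of the 2-D Gaussian."""
    kernel = gaussian_kernel_1d(t)
    smoothed = convolve_rows(image, kernel)                         # along x
    return transpose(convolve_rows(transpose(smoothed), kernel))   # along y


# Toy image (an assumption of this example): one bright pixel on a dark background.
size = 33
f = [[0.0] * size for _ in range(size)]
f[16][16] = 1.0

levels = {t: scale_space_level(f, t) for t in (0, 1, 4, 16)}
peaks = {t: levels[t][16][16] for t in levels}
```

Because the input is an impulse, each level is simply the smoothing kernel itself: the peak value drops roughly as 1/(2\pi t) while the total intensity stays constant, which is one way to see that increasing t spreads structure out rather than removing energy.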


Why a Gaussian filter?

When faced with the task of generating a multi-scale representation one may ask: could any filter ''g'' of low-pass type and with a parameter ''t'' which determines its width be used to generate a scale space? The answer is no, as it is of crucial importance that the smoothing filter does not introduce new spurious structures at coarse scales that do not correspond to simplifications of corresponding structures at finer scales. In the scale-space literature, a number of different ways have been expressed to formulate this criterion in precise mathematical terms.

The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale space constitutes the ''canonical'' way to generate a linear scale space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale. Conditions, referred to as ''scale-space axioms'', that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance. In some works, the uniqueness claimed in the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed. The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality or non-enhancement of local extrema.
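As a brief worked illustration of the semi-group structure listed among the axioms, the property can be written out for the Gaussian kernel as follows. The Fourier-transform convention used here (kernel transform \hat{g}(\omega; t) = e^{-|\omega|^2 t/2}) is an assumption of this sketch rather than something stated in the text.

```latex
% Semi-group property: smoothing twice adds the scale parameters.
g(\cdot, \cdot; t_1) * g(\cdot, \cdot; t_2) = g(\cdot, \cdot; t_1 + t_2)

% In the Fourier domain the convolution becomes a product:
\hat{g}(\omega; t_1)\, \hat{g}(\omega; t_2)
  = e^{-|\omega|^2 t_1 / 2}\, e^{-|\omega|^2 t_2 / 2}
  = e^{-|\omega|^2 (t_1 + t_2) / 2}
  = \hat{g}(\omega; t_1 + t_2)

% Consequence (cascade property): a coarser level can be computed from any finer one.
L(\cdot, \cdot; t_2) = g(\cdot, \cdot; t_2 - t_1) * L(\cdot, \cdot; t_1),
  \qquad t_2 > t_1 \geq 0
```

The cascade form makes explicit that every coarse-scale level is a smoothed version of every finer level, which is the setting in which the requirement of not creating new structures is formulated.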


Alternative definition

''Equivalently'', the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation),

:\partial_t L = \frac{1}{2} \nabla^2 L,

with initial condition L(x, y; 0) = f(x, y). This formulation of the scale-space representation ''L'' means that it is possible to interpret the intensity values of the image ''f'' as a "temperature distribution" in the image plane and that the process that generates the scale-space representation as a function of ''t'' corresponds to heat diffusion in the image plane over time ''t'' (assuming the thermal conductivity of the material equal to the arbitrarily chosen constant ½).

Although this connection may appear superficial to a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale spaces, which also generalizes to nonlinear scale spaces, for example, using anisotropic diffusion. Hence, one may say that the primary way to generate a scale space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation.
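The diffusion formulation can be checked numerically on a one-dimensional signal. The sketch below is an illustration under stated assumptions, not the discrete scale-space theory referred to above: it uses a plain explicit Euler scheme with unit grid spacing, a hypothetical step size `dt = 0.25`, and replicated boundary values. Diffusing a unit impulse up to time t should then give a response whose variance equals t and whose peak is close to that of a Gaussian with variance t.

```python
import math


def diffuse(signal, t, dt=0.25):
    """Evolve dL/dt = 1/2 * d2L/dx2 up to time t with explicit Euler steps.

    The grid spacing is 1; dt <= 1 keeps this explicit scheme stable.
    """
    steps = int(round(t / dt))
    current = list(signal)
    n = len(current)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            left = current[max(i - 1, 0)]        # replicated boundary
            right = current[min(i + 1, n - 1)]
            laplacian = left - 2.0 * current[i] + right
            nxt.append(current[i] + 0.5 * dt * laplacian)
        current = nxt
    return current


n = 101
centre = n // 2
impulse = [0.0] * n
impulse[centre] = 1.0

t = 4.0
response = diffuse(impulse, t)

mass = sum(response)
variance = sum(((i - centre) ** 2) * v for i, v in enumerate(response))
gaussian_peak = 1.0 / math.sqrt(2.0 * math.pi * t)  # peak of a 1-D Gaussian with variance t
```

Each Euler step is a convolution with the three-tap kernel [dt/2, 1 - dt, dt/2], whose variance is dt, so the variances add up to t after t/dt steps; this mirrors, in a discrete setting, how the Gaussian arises as the impulse response of the diffusion equation.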


Motivations

The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a computer vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales in order to be able to capture the unknown scale variations that may occur. Taken to the limit, a scale-space representation considers representations at all scales.

Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply ''operators of non-infinitesimal size'' to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. Scale-space theory, on the other hand, explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement as well as any other operation that depends on a real-world measurement.

There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex. In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.


Gaussian derivatives

At any scale in scale space, we can apply local derivative operators to the scale-space representation:

:L_{x^m y^n}(x, y; t) = \left( \partial_{x^m y^n} L \right)(x, y; t).

Due to the commutative property between the derivative operator and the Gaussian smoothing operator, such ''scale-space derivatives'' can equivalently be computed by convolving the original image with Gaussian derivative operators. For this reason they are often also referred to as ''Gaussian derivatives'':

:L_{x^m y^n}(\cdot, \cdot; t) = \partial_{x^m y^n} g(\cdot, \cdot;\, t) * f(\cdot, \cdot).

The uniqueness of the Gaussian derivative operators as local operations derived from a scale-space representation can be obtained by similar axiomatic derivations as are used for deriving the uniqueness of the Gaussian kernel for scale-space smoothing.
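The commutation between differentiation and Gaussian smoothing can be illustrated in one dimension. In this minimal sketch the kernels are sampled Gaussian derivatives truncated at five standard deviations, and the test signal f(x) = x^2 is given as a Python function on the integers so that no boundary handling is needed; all names are hypothetical and chosen for this example only.

```python
import math


def gaussian_derivative_kernels(t, truncate=5.0):
    """Sampled 1-D kernels g, g_x and g_xx for variance t, on offsets -r..r."""
    radius = int(math.ceil(truncate * math.sqrt(t)))
    offsets = list(range(-radius, radius + 1))
    g = [math.exp(-(k * k) / (2.0 * t)) for k in offsets]
    total = sum(g)
    g = [w / total for w in g]
    g_x = [-(k / t) * w for k, w in zip(offsets, g)]                  # first derivative
    g_xx = [((k * k - t) / (t * t)) * w for k, w in zip(offsets, g)]  # second derivative
    return offsets, g, g_x, g_xx


def convolve_at(signal, x, offsets, kernel):
    """(kernel * signal)(x) = sum_k kernel[k] * signal(x - k)."""
    return sum(w * signal(x - k) for k, w in zip(offsets, kernel))


def f(x):
    return float(x * x)      # test signal (an assumption of this example)


def f_prime(x):
    return 2.0 * x           # its exact derivative


t = 4.0
offsets, g, g_x, g_xx = gaussian_derivative_kernels(t)
x0 = 7

lx_kernel = convolve_at(f, x0, offsets, g_x)          # (d/dx g) * f
lx_smoothed = convolve_at(f_prime, x0, offsets, g)    # g * (d/dx f)
lxx_kernel = convolve_at(f, x0, offsets, g_xx)        # (d2/dx2 g) * f
```

Both routes to the first-order scale-space derivative agree up to truncation error (about 14 at x0 = 7), and the second-order Gaussian derivative recovers the constant second derivative of the quadratic (about 2).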


Visual front end

These Gaussian derivative operators can in turn be combined by linear or non-linear operators into a larger variety of different types of feature detectors, which in many cases can be well modelled by differential geometry. Specifically, invariance (or more appropriately ''covariance'') to local geometric transformations, such as rotations or local affine transformations, can be obtained by considering differential invariants under the appropriate class of transformations or alternatively by normalizing the Gaussian derivative operators to a locally determined coordinate frame determined from, e.g., a preferred orientation in the image domain, or by applying a preferred local affine transformation to a local image patch (see the article on affine shape adaptation for further details).

When Gaussian derivative operators and differential invariants are used in this way as basic feature detectors at multiple scales, the uncommitted first stages of visual processing are often referred to as a ''visual front-end''. This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition. The set of Gaussian derivative operators up to a certain order is often referred to as the ''N-jet'' and constitutes a basic type of feature within the scale-space framework.


Detector examples

Following the idea of expressing visual operations in terms of differential invariants computed at multiple scales using Gaussian derivative operators, we can express an edge detector from the set of points that satisfy the requirement that the gradient magnitude

:L_v = \sqrt{L_x^2 + L_y^2}

should assume a local maximum in the gradient direction

:\nabla L = (L_x, L_y)^T.

By working out the differential geometry, it can be shown that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant

:\tilde{L}_v^2 = L_x^2 \, L_{xx} + 2 \, L_x \, L_y \, L_{xy} + L_y^2 \, L_{yy} = 0

that satisfy the following sign condition on a third-order differential invariant:

:\tilde{L}_v^3 = L_x^3 \, L_{xxx} + 3 \, L_x^2 \, L_y \, L_{xxy} + 3 \, L_x \, L_y^2 \, L_{xyy} + L_y^3 \, L_{yyy} < 0.

Similarly, multi-scale blob detectors at any given fixed scale can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)

:\nabla^2 L = L_{xx} + L_{yy}

or the determinant of the Hessian matrix

:\operatorname{det} H L(x, y; t) = L_{xx} L_{yy} - L_{xy}^2.

In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives. The algebraic expressions for the corner and ridge detection operators are, however, somewhat more complex and the reader is referred to the articles on corner detection and ridge detection for further details.

Scale-space operations have also been frequently used for expressing coarse-to-fine methods, in particular for tasks such as image matching and for multi-scale image segmentation.
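The differential invariants used by these edge and blob detectors can be evaluated directly from separable Gaussian derivative kernels. The sketch below is illustrative only: it assumes a toy image consisting of one bright isotropic blob with variance `T0 = 9`, sampled Gaussian derivative kernels truncated at five standard deviations, and hypothetical helper names. It computes the gradient magnitude, the second-order edge invariant, the Laplacian and the determinant of the Hessian at a few points; the third-order sign condition for edges is left out to keep the example short.

```python
import math


def kernels(t, truncate=5.0):
    """Sampled 1-D Gaussian g and its first two derivatives for variance t."""
    radius = int(math.ceil(truncate * math.sqrt(t)))
    offsets = list(range(-radius, radius + 1))
    g = [math.exp(-(k * k) / (2.0 * t)) for k in offsets]
    total = sum(g)
    g = [w / total for w in g]
    g1 = [-(k / t) * w for k, w in zip(offsets, g)]
    g2 = [((k * k - t) / (t * t)) * w for k, w in zip(offsets, g)]
    return offsets, g, g1, g2


T0 = 9.0  # variance of the toy blob (an assumption of this example)


def image(x, y):
    """Toy image: one bright isotropic blob centred at the origin."""
    return math.exp(-(x * x + y * y) / (2.0 * T0))


def response(x, y, offsets, kx, ky):
    """Separable filter response: sum_i sum_j kx[i] * ky[j] * f(x - i, y - j)."""
    return sum(
        wx * wy * image(x - i, y - j)
        for i, wx in zip(offsets, kx)
        for j, wy in zip(offsets, ky)
    )


def invariants(x, y, t):
    """Differential invariants of L(., .; t) at the point (x, y)."""
    offsets, g, g1, g2 = kernels(t)
    lx = response(x, y, offsets, g1, g)
    ly = response(x, y, offsets, g, g1)
    lxx = response(x, y, offsets, g2, g)
    lyy = response(x, y, offsets, g, g2)
    lxy = response(x, y, offsets, g1, g1)
    return {
        "gradient_magnitude": math.hypot(lx, ly),
        "edge_invariant": lx * lx * lxx + 2.0 * lx * ly * lxy + ly * ly * lyy,
        "laplacian": lxx + lyy,
        "det_hessian": lxx * lyy - lxy * lxy,
    }


t = 4.0
centre = invariants(0, 0, t)    # blob centre
inside = invariants(3, 0, t)    # just inside the edge of the smoothed blob
outside = invariants(4, 0, t)   # just outside it
```

For this blob the Laplacian is negative and the determinant of the Hessian positive at the centre, as expected for a bright blob, while the edge invariant changes sign between x = 3 and x = 4, bracketing the inflection circle of the smoothed blob at radius \sqrt{T0 + t} \approx 3.6.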


Scale selection

The theory presented so far describes a well-founded framework for ''representing'' image structures at multiple scales. In many cases it is, however, also necessary to select locally appropriate scales for further analysis. This need for ''scale selection'' originates from two major reasons: (i) real-world objects may have different sizes, and this size may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance information may also be unknown ''a priori''.

A highly useful property of scale-space representation is that image representations can be made invariant to scales, by performing automatic local scale selection (T. Lindeberg, "Spatio-temporal scale selection in video data", Journal of Mathematical Imaging and Vision, 60(4): 525–562, 2018; T. Lindeberg, "Dense scale selection over space, time and space-time", SIAM Journal on Imaging Sciences, 11(1): 407–441, 2018) based on local maxima (or minima) over scales of scale-normalized derivatives

:L_{\xi^m \eta^n}(x, y; t) = t^{(m+n)\gamma/2} L_{x^m y^n}(x, y; t),

where \gamma \in [0, 1] is a parameter that is related to the dimensionality of the image feature. This algebraic expression for ''scale normalized Gaussian derivative operators'' originates from the introduction of ''\gamma-normalized derivatives'' according to

:\partial_\xi = t^{\gamma/2} \partial_x \quad and \quad \partial_\eta = t^{\gamma/2} \partial_y.

It can be theoretically shown that a scale selection module working according to this principle will satisfy the following ''scale covariance property'': if for a certain type of image feature a local maximum is assumed in a certain image at a certain scale t_0, then under a rescaling of the image by a scale factor s the local maximum over scales in the rescaled image will be transformed to the scale level s^2 t_0.
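The scale covariance property can be checked on a case that has a closed form. As an assumption of this sketch, take the image to be a two-dimensional Gaussian blob with variance t_0, f = g(\cdot, \cdot; t_0). By the semi-group property of the Gaussian kernel, L = g(\cdot, \cdot; t_0 + t), so at the blob centre the Laplacian is -1/(\pi (t_0 + t)^2) and the scale-normalized Laplacian with \gamma = 1 is -t/(\pi (t_0 + t)^2). The function names and the sampling of scale levels below are hypothetical.

```python
import math


def normalized_laplacian_at_centre(t, t0, gamma=1.0):
    """t^gamma * (L_xx + L_yy) at the centre of a 2-D Gaussian blob of variance t0."""
    return -(t ** gamma) / (math.pi * (t0 + t) ** 2)


def select_scale(t0, scales):
    """Scale level at which the magnitude of the normalized response is largest."""
    return max(scales, key=lambda t: abs(normalized_laplacian_at_centre(t, t0)))


# Scale levels sampled exponentially (eight levels per octave, from 0.25 to 1024).
scales = [2.0 ** (k / 8.0) for k in range(-16, 81)]

t0 = 4.0   # blob variance in the original image
s = 2.0    # spatial rescaling factor; the blob variance becomes s^2 * t0

t_hat = select_scale(t0, scales)
t_hat_rescaled = select_scale(s * s * t0, scales)
```

In this example the selected scale equals the blob variance (t_hat is 4), and after rescaling the image by s = 2 the selected scale moves to s^2 t_0 = 16, in agreement with the scale covariance property stated above.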


Scale invariant feature detection

Following this approach of gamma-normalized derivatives, it can be shown that different types of ''scale adaptive and scale invariant feature detectors'' can be expressed for tasks such as blob detection, corner detection, ridge detection, edge detection and spatio-temporal interest point detection (see the specific articles on these topics for in-depth descriptions of how these scale-invariant feature detectors are formulated). Furthermore, the scale levels obtained from automatic scale selection can be used for determining regions of interest for subsequent affine shape adaptation to obtain affine invariant interest points or for determining scale levels for computing associated image descriptors, such as locally scale adapted N-jets.

Recent work has shown that also more complex operations, such as scale-invariant object recognition, can be performed in this way, by computing local image descriptors (N-jets or local histograms of gradient directions) at scale-adapted interest points obtained from scale-space extrema of the normalized Laplacian operator (see also scale-invariant feature transform) or the determinant of the Hessian (see also SURF); see also the Scholarpedia article on the scale-invariant feature transform for a more general outlook of object recognition approaches based on receptive field responses in terms of Gaussian derivative operators or approximations thereof.


Related multi-scale representations

An image
pyramid A pyramid (from el, πυραμίς ') is a structure whose outer surfaces are triangular and converge to a single step at the top, making the shape roughly a pyramid in the geometric sense. The base of a pyramid can be trilateral, quadrilat ...
is a discrete representation in which a scale space is sampled in both space and scale. For scale invariance, the scale factors should be sampled exponentially, for example as integer powers of 2 or . When properly constructed, the ratio of the sample rates in space and scale are held constant so that the impulse response is identical in all levels of the pyramid. Fast, O(N), algorithms exist for computing a scale invariant image pyramid, in which the image or signal is repeatedly smoothed then subsampled. Values for scale space between pyramid samples can easily be estimated using interpolation within and between scales and allowing for scale and position estimates with sub resolution accuracy. In a scale-space representation, the existence of a continuous scale parameter makes it possible to track zero crossings over scales leading to so-called ''deep structure''. For features defined as zero-crossings of
differential invariant In mathematics, a differential invariant is an invariant for the action of a Lie group on a space that involves the derivatives of graphs of functions in the space. Differential invariants are fundamental in projective differential geometry, and t ...
s, the implicit function theorem directly defines
trajectories A trajectory or flight path is the path that an object with mass in motion follows through space as a function of time. In classical mechanics, a trajectory is defined by Hamiltonian mechanics via canonical coordinates; hence, a complete traj ...
across scales, and at those scales where
bifurcation Bifurcation or bifurcated may refer to: Science and technology * Bifurcation theory, the study of sudden changes in dynamical systems ** Bifurcation, of an incompressible flow, modeled by squeeze mapping the fluid flow * River bifurcation, the ...
s occur, the local behaviour can be modelled by
singularity theory In mathematics, singularity theory studies spaces that are almost manifolds, but not quite. A string can serve as an example of a one-dimensional manifold, if one neglects its thickness. A singularity can be made by balling it up, dropping it ...
.Florack, L., Kuijper, A. The topological structure of scale-space images. Journal of Mathematical Imaging and Vision 12, 65–79, 2000.
/ref> Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes. These '' non-linear scale-spaces'' often start from the equivalent diffusion formulation of the scale-space concept, which is subsequently extended in a non-linear fashion. A large number of evolution equations have been formulated in this way, motivated by different specific requirements (see the abovementioned book references for further information). It should be noted, however, that not all of these non-linear scale-spaces satisfy similar "nice" theoretical requirements as the linear Gaussian scale-space concept. Hence, unexpected artifacts may sometimes occur and one should be very careful of not using the term "scale-space" for just any type of one-parameter family of images. A first-order extension of the isotropic Gaussian scale space is provided by the ''affine (Gaussian) scale space''. One motivation for this extension originates from the common need for computing image descriptors subject for real-world objects that are viewed under a perspective camera model. To handle such non-linear deformations locally, partial invariance (or more correctly
covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the ...
) to local affine deformations can be achieved by considering affine Gaussian kernels with their shapes determined by the local image structure, see the article on
affine shape adaptation Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point. Equivalently, affine shap ...
for theory and algorithms. Indeed, this affine scale space can also be expressed from a non-isotropic extension of the linear (isotropic) diffusion equation, while still being within the class of linear partial differential equations. There exists a more general extension of the Gaussian scale-space model to affine and spatio-temporal scale-spaces.Lindeberg, T. Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space, Journal of Mathematical Imaging and Vision, 40(1): 36–81, 2011.
/ref>Lindeberg, T. Generalized axiomatic scale-space theory, Advances in Imaging and Electron Physics, Elsevier, volume 178, pages 1–96, 2013.
/ref>T. Lindeberg (2016) "Time-causal and time-recursive spatio-temporal receptive fields", Journal of Mathematical Imaging and Vision, 55(1): 50–88.
/ref> In addition to variabilities over scale, which original scale-space theory was designed to handle, this ''generalized scale-space theory'' also comprises other types of variabilities caused by geometric transformations in the image formation process, including variations in viewing direction approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations. This generalized scale-space theory leads to predictions about receptive field profiles in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision.Lindeberg, T. A computational theory of visual receptive fields, Biological Cybernetics, 107(6): 589–635, 2013.
/ref>Lindeberg, T. Invariance of visual operations at the level of receptive fields, PLoS ONE 8(7):e66990, 2013
There are strong relations between scale-space theory and wavelet theory, although these two notions of multi-scale representation have been developed from somewhat different premises. There has also been work on other multi-scale approaches, such as pyramids and a variety of other kernels, that do not exploit or impose the same requirements as true scale-space descriptions do.
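
As a sketch of the affine scale space referred to in this section, one way of writing it out for a two-dimensional image f is to replace the scalar scale parameter t by a symmetric positive definite covariance matrix. The symbols \Sigma_0 and \Sigma_t are notation assumed here for illustration, and the factor 1/2 assumes the convention in which the isotropic diffusion equation reads \partial_t L = \frac{1}{2} \nabla^2 L.

```latex
g(x;\, \Sigma_t) = \frac{1}{2\pi\sqrt{\det \Sigma_t}}
                   \exp\!\left(-\tfrac{1}{2}\, x^{T} \Sigma_t^{-1} x\right),
\qquad
L(\cdot;\, \Sigma_t) = g(\cdot;\, \Sigma_t) * f ,

\Sigma_t = t\, \Sigma_0
\quad\Longrightarrow\quad
\partial_t L = \tfrac{1}{2}\, \nabla^{T}\!\left(\Sigma_0\, \nabla L\right).
```

With \Sigma_0 equal to the identity matrix this reduces to the isotropic case, and for any fixed \Sigma_0 the equation remains linear in L.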


Relations to biological vision and hearing

There are interesting relations between scale-space representation and biological vision and hearing. Neurophysiological studies of biological vision have shown that there are receptive field profiles in the mammalian retina and visual cortex that can be well modelled by linear Gaussian derivative operators, in some cases also complemented by a non-isotropic affine scale-space model, a spatio-temporal scale-space model and/or non-linear combinations of such linear operators (Lindeberg, T. (2021) Normative theory of visual receptive fields, Heliyon 7(1): e05897). Regarding biological hearing, there are receptive field profiles in the inferior colliculus and the primary auditory cortex that can be well modelled by spectro-temporal receptive fields, which in turn can be well modelled by Gaussian derivatives over logarithmic frequencies and windowed Fourier transforms over time, with the window functions being temporal scale-space kernels (T. Lindeberg and A. Friberg "Idealized computational models of auditory receptive fields", PLOS ONE, 10(3): e0119032, pages 1–58, 2015; T. Lindeberg and A. Friberg (2015) "Scale-space theory for auditory signals", Proc. SSVM 2015: Scale-Space and Variational Methods in Computer Vision, Springer LNCS 9087: 3–15).
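
To make the notion of a Gaussian derivative operator concrete, the following is a minimal sketch that samples a one-dimensional Gaussian of variance t and its first two derivatives at integer positions. The function name, the truncation at five standard deviations and the restriction to one dimension are assumptions made for illustration; the receptive field models in the cited work are defined over two-dimensional space and over space-time.

```python
import math


def gaussian_derivative_kernel(t, order, truncate=5.0):
    """Sample a 1-D Gaussian with variance t, or its first or second
    derivative, at integer positions. Returns (positions, values)."""
    radius = int(math.ceil(truncate * math.sqrt(t)))
    xs = list(range(-radius, radius + 1))
    values = []
    for x in xs:
        g = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        if order == 0:
            values.append(g)                          # g(x; t)
        elif order == 1:
            values.append(-x / t * g)                 # dg/dx
        elif order == 2:
            values.append((x * x - t) / (t * t) * g)  # d^2g/dx^2
        else:
            raise ValueError("only orders 0, 1 and 2 are sketched here")
    return xs, values


# Illustrative scale value t = 4 (standard deviation 2).
xs, first = gaussian_derivative_kernel(t=4.0, order=1)
_, second = gaussian_derivative_kernel(t=4.0, order=2)
```

The first-order kernel is antisymmetric around the origin, whereas the second-order kernel is symmetric with a negative centre and positive flanks.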


Deep learning and scale space

In the area of classical computer vision, scale-space theory has established itself as a theoretical framework for early vision, with Gaussian derivatives constituting a canonical model for the first layer of receptive fields. With the introduction of deep learning, there has also been work on using Gaussian derivatives or Gaussian kernels as a general basis for receptive fields in deep networks (Jacobsen, J.J., van Gemert, J., Lou, Z., Smeulders, A.W.M. (2016) Structured receptive fields in CNNs. In: Proceedings of Computer Vision and Pattern Recognition, pp. 2610–2619; Worrall, D., Welling, M. (2019) Deep scale-spaces: Equivariance over scale. In: Advances in Neural Information Processing Systems (NeurIPS 2019), pp. 7366–7378; Lindeberg, T. (2020) Provably scale-covariant continuous hierarchical networks based on scale-normalized differential expressions coupled in cascade. J. Math. Imaging Vis. 62, 120–148; Lindeberg, T. (2022) Scale-covariant and scale-invariant Gaussian derivative networks. J. Math. Imaging Vis. 64, 223–242; Pintea, S. L., Tömen, N., Goes, S. F., Loog, M., & van Gemert, J. C. (2021). Resolution learning in deep convolutional networks using scale-space theory. IEEE Transactions on Image Processing, 30, 8342–8353). Using the transformation properties of the Gaussian derivatives and Gaussian kernels under scaling transformations, it is in this way possible to obtain scale covariance/equivariance and scale invariance of the deep network, so that image structures at different scales are handled in a theoretically well-founded manner. There have also been approaches developed to obtain scale covariance/equivariance and scale invariance by learned filters combined with multiple scale channels (Sosnovik, I., Szmaja, M., Smeulders, A. (2020) Scale-equivariant steerable networks. In: International Conference on Learning Representations; Bekkers, E.J.: B-spline CNNs on Lie groups (2020) In: International Conference on Learning Representations; Jansson, Y., Lindeberg, T. (2021) Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges. In: International Conference on Pattern Recognition (ICPR 2020), pp. 1181–1188; Sosnovik, I., Moskalev, A., Smeulders, A. (2021) DISCO: Accurate discrete scale convolutions. In: British Machine Vision Conference; Jansson, Y., Lindeberg, T. (2022) Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales, Journal of Mathematical Imaging and Vision, 64(5): 506–536; Zhu, W., Qiu, Q., Calderbank, R., Sapiro, G., & Cheng, X. (2022) Scaling-translation-equivariant networks with decomposed convolutional filters. Journal of Machine Learning Research, 23(68): 1–45). Specifically, using the notions of scale covariance/equivariance and scale invariance, it is possible to make deep networks operate robustly at scales not spanned by the training data, thus enabling scale generalization.
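
The scaling property of the Gaussian kernel that underlies such constructions can be checked numerically. The sketch below is not taken from any of the cited networks. It assumes a one-dimensional continuous signal, a scaling factor s, and the convention that the scale parameter t is the variance of the Gaussian. Under these assumptions, if f'(x') = f(x) with x' = s x, then the representation of f' at scale t' = s^2 t, evaluated at x', should equal the representation of f at scale t, evaluated at x. All function names and the test signal are hypothetical.

```python
import math


def gaussian(x, t):
    """Continuous 1-D Gaussian kernel with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def smooth_at(f, x, t, step=0.01, width=8.0):
    """Approximate L(x; t) = (g(.; t) * f)(x) by a Riemann sum over a
    window of `width` standard deviations on each side."""
    n = int(width * math.sqrt(t) / step)
    total = 0.0
    for k in range(-n, n + 1):
        u = k * step
        total += gaussian(u, t) * f(x - u)
    return total * step


def signal(x):
    """Hypothetical test signal: two bumps of different widths."""
    return math.exp(-x * x / 2.0) + 0.5 * math.exp(-((x - 3.0) ** 2) / 0.5)


s = 2.0  # spatial scaling factor (illustrative value)


def rescaled_signal(x):
    """The same signal stretched by the factor s: f'(x') = f(x'/s)."""
    return signal(x / s)


t, x = 1.5, 0.7
original = smooth_at(signal, x, t)
rescaled = smooth_at(rescaled_signal, s * x, s * s * t)
difference = abs(original - rescaled)  # expected to be at rounding level
```

Matching the scale levels as t' = s^2 t is what allows responses computed from a rescaled image to be related to responses computed from the original image.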


Implementation issues

When implementing scale-space smoothing in practice, a number of different approaches can be taken: continuous or discrete Gaussian smoothing, implementation in the Fourier domain, pyramids based on binomial filters that approximate the Gaussian, or recursive filters. More details are given in a separate article on scale space implementation.
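
As a minimal sketch of the simplest of these options, the code below smooths a one-dimensional signal with a sampled Gaussian kernel of variance t, truncated and renormalized, and collects the results for several scale values. The function names, the truncation at four standard deviations, the border replication and the test signal are assumptions made for illustration. Note that a sampled Gaussian is not the same thing as the discrete Gaussian kernel, and is used here only because it needs nothing beyond the standard library.

```python
import math


def sampled_gaussian_kernel(t, truncate=4.0):
    """Sampled Gaussian with variance t, truncated at `truncate` standard
    deviations and renormalized so that the coefficients sum to one."""
    radius = max(1, int(math.ceil(truncate * math.sqrt(t))))
    weights = [math.exp(-(n * n) / (2.0 * t)) for n in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, t):
    """Convolve a 1-D signal with the kernel, replicating border samples."""
    kernel = sampled_gaussian_kernel(t)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def scale_space(signal, scales):
    """Return {t: smoothed signal} for each scale value t > 0."""
    return {t: smooth(signal, t) for t in scales}


# Hypothetical test signal: a coarse plateau with fine-scale oscillations.
signal = [(1.0 if 20 <= i < 44 else 0.0) + 0.2 * math.sin(2.5 * i)
          for i in range(64)]
levels = scale_space(signal, [1.0, 4.0, 16.0])
```

Each level is computed directly from the original signal. Because the kernel is non-negative and sums to one, no smoothed value can exceed the maximum or fall below the minimum of the input.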


See also

* Difference of Gaussians
* Gaussian function
* Mipmapping



Further reading

* Lindeberg, Tony: Scale-space theory: A basic tool for analysing structures at different scales, in J. of Applied Statistics, 21(2), pp. 224–270, 1994. (longer pdf tutorial on scale-space)
* Lindeberg, Tony: Scale-space: A framework for handling image structures at multiple scales, Proc. CERN School of Computing, 96(8): 27–38, 1996.
* Romeny, Bart ter Haar: Introduction to Scale-Space Theory: Multiscale Geometric Image Analysis, Tutorial VBC ’96, Hamburg, Germany, Fourth International Conference on Visualization in Biomedical Computing.
* Florack, Luc, Romeny, Bart ter Haar, Viergever, Max, & Koenderink, Jan: Linear scale space, Journal of Mathematical Imaging and Vision volume 4: 325–351, 1994.
* Lindeberg, Tony, "Principles for automatic scale selection", In: B. Jähne (et al., eds.), Handbook on Computer Vision and Applications, volume 2, pp 239–274, Academic Press, Boston, USA, 1999. (tutorial on approaches to automatic scale selection)
* Lindeberg, Tony: "Scale-space theory" In: Encyclopedia of Mathematics, (Michiel Hazewinkel, ed) Kluwer, 1997.
* Web archive backup: Lecture on scale-space at the University of Massachusetts (pdf)


External links



* Peak detection in 1D data using a scale-space approach, BSD-licensed MATLAB code: http://www.mathworks.fr/matlabcentral/fileexchange/42927-find-peaks-using-scale-space-approach