Scale Space Implementation
   HOME

TheInfoList



OR:

In the areas of
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
,
image analysis Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophi ...
and
signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as audio signal processing, sound, image processing, images, and scientific measurements. Signal processing techniq ...
, the notion of scale-space representation is used for processing measurement data at multiple scales, and specifically enhance or suppress image features over different ranges of scale (see the article on
scale space Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theor ...
). A special type of scale-space representation is provided by the Gaussian scale space, where the image data in ''N'' dimensions is subjected to smoothing by Gaussian
convolution In mathematics (in particular, functional analysis), convolution is a operation (mathematics), mathematical operation on two function (mathematics), functions ( and ) that produces a third function (f*g) that expresses how the shape of one is ...
. Most of the theory for Gaussian scale space deals with continuous images, whereas one when implementing this theory will have to face the fact that most measurement data are discrete. Hence, the theoretical problem arises concerning how to discretize the continuous theory while either preserving or well approximating the desirable theoretical properties that lead to the choice of the Gaussian kernel (see the article on
scale-space axioms In image processing and computer vision, a scale space framework can be used to represent an image as a family of gradually smoothed images. This framework is very general and a variety of scale space representations exist. A typical approach ...
). This article describes basic approaches for this that have been developed in the literature.


Statement of the problem

The Gaussian scale-space representation of an ''N''-dimensional continuous signal, :f_C \left (x_1, \cdots, x_N, t \right), is obtained by convolving ''fC'' with an ''N''-dimensional
Gaussian kernel In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form f(x) = \exp (-x^2) and with parametric extension f(x) = a \exp\left( -\frac \right) for arbitrary real constants , and non-zero . It is ...
: :g_N \left (x_1, \cdots, x_N, t \right). In other words: :L \left (x_1, \cdots, x_N, t \right ) =\int_^ \cdots \int_^ f_C \left (x_1-u_1, \cdots, x_N-u_N, t \right ) \cdot g_N \left(u_1, \cdots, u_N, t \right) \, du_1 \cdots du_N. However, for implementation, this definition is impractical, since it is continuous. When applying the scale space concept to a discrete signal ''fD'', different approaches can be taken. This article is a brief summary of some of the most frequently used methods.


Separability

Using the ''separability property'' of the Gaussian kernel :g_N \left (x_1,\dots, x_N, t \right) = G\left(x_1, t \right) \cdots G\left(x_N, t\right) the ''N''-dimensional
convolution In mathematics (in particular, functional analysis), convolution is a operation (mathematics), mathematical operation on two function (mathematics), functions ( and ) that produces a third function (f*g) that expresses how the shape of one is ...
operation can be decomposed into a set of separable smoothing steps with a one-dimensional Gaussian kernel ''G'' along each dimension :L(x_1, \cdots, x_N, t) = \int_^ \cdots \int_^ f_C(x_1-u_1, \cdots, x_N-u_N, t) G(u_1, t) \, du_1 \cdots G(u_N, t) \, du_N, where :G(x, t) = \frac e^ and the standard deviation of the Gaussian σ is related to the scale parameter ''t'' according to ''t'' = σ2. Separability will be assumed in all that follows, even when the kernel is not exactly Gaussian, since separation of the dimensions is the most practical way to implement multidimensional smoothing, especially at larger scales. Therefore, the rest of the article focuses on the one-dimensional case.


The sampled Gaussian kernel

When implementing the one-dimensional smoothing step in practice, the presumably simplest approach is to convolve the discrete signal ''fD'' with a ''sampled Gaussian kernel'': :L(x, t) = \sum_^ f(x-n) \, G(n, t) where :G(n, t) = \frac e^ (with ''t'' = σ2) which in turn is truncated at the ends to give a filter with finite impulse response :L(x, t) = \sum_^ f(x-n) \, G(n, t) for ''M'' chosen sufficiently large (see
error function In mathematics, the error function (also called the Gauss error function), often denoted by , is a complex function of a complex variable defined as: :\operatorname z = \frac\int_0^z e^\,\mathrm dt. This integral is a special (non-elementary ...
) such that :2 \int_M^ G(u, t) \, du = 2 \int_^ G(v, 1) \, dv < \varepsilon. A common choice is to set ''M'' to a constant ''C'' times the standard deviation of the Gaussian kernel :M = C \sigma + 1 = C \sqrt + 1 where ''C'' is often chosen somewhere between 3 and 6. Using the sampled Gaussian kernel can, however, lead to implementation problems, in particular when computing higher-order derivatives at finer scales by applying sampled derivatives of Gaussian kernels. When accuracy and robustness are primary design criteria, alternative implementation approaches should therefore be considered. For small values of ε (10−6 to 10−8) the errors introduced by truncating the Gaussian are usually negligible. For larger values of ε, however, there are many better alternatives to a rectangular
window function In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the inte ...
. For example, for a given number of points, a
Hamming window In discrete-time signal processing, windowing is a preliminary signal shaping technique, usually applied to improve the appearance and usefulness of a subsequent Discrete Fourier Transform. Several '' window functions'' can be defined, based on ...
, Blackman window, or
Kaiser window The Kaiser window, also known as the Kaiser–Bessel window, was developed by James Kaiser at Bell Laboratories. It is a one-parameter family of window functions used in finite impulse response filter design and spectral estimation, spectral analy ...
will do less damage to the spectral and other properties of the Gaussian than a simple truncation will. Notwithstanding this, since the Gaussian kernel decreases rapidly at the tails, the main recommendation is still to use a sufficiently small value of ε such that the truncation effects are no longer important.


The discrete Gaussian kernel

] A more refined approach is to convolve the original signal with the ''discrete Gaussian kernel'' ''T''(''n'', ''t'')Lindeberg, T., "Scale-space for discrete signals," PAMI(12), No. 3, March 1990, pp. 234-254.
/ref>
R.A. Haddad and A.N. Akansu,
A Class of Fast Gaussian Binomial Filters for Speech and Image Processing
" IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 39, pp 723-727, March 1991.
:L(x, t) = \sum_^ f(x-n) \, T(n, t) where :T(n, t) = e^ I_n(t) and I_n(t) denotes the
modified Bessel function Bessel functions, first defined by the mathematician Daniel Bernoulli and then generalized by Friedrich Bessel, are canonical solutions of Bessel's differential equation x^2 \frac + x \frac + \left(x^2 - \alpha^2 \right)y = 0 for an arbitrary ...
s of integer order, ''n''. This is the discrete counterpart of the continuous Gaussian in that it is the solution to the discrete
diffusion equation The diffusion equation is a parabolic partial differential equation. In physics, it describes the macroscopic behavior of many micro-particles in Brownian motion, resulting from the random movements and collisions of the particles (see Fick's la ...
(discrete space, continuous time), just as the continuous Gaussian is the solution to the continuous diffusion equation. This filter can be truncated in the spatial domain as for the sampled Gaussian :L(x, t) = \sum_^ f(x-n) \, T(n, t) or can be implemented in the Fourier domain using a closed-form expression for its
discrete-time Fourier transform In mathematics, the discrete-time Fourier transform (DTFT) is a form of Fourier analysis that is applicable to a sequence of values. The DTFT is often used to analyze samples of a continuous function. The term ''discrete-time'' refers to the ...
: :\widehat(\theta, t) = \sum_^ T(n, t) \, e^ = e^. With this frequency-domain approach, the scale-space properties transfer ''exactly'' to the discrete domain, or with excellent approximation using periodic extension and a suitably long
discrete Fourier transform In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex- ...
to approximate the
discrete-time Fourier transform In mathematics, the discrete-time Fourier transform (DTFT) is a form of Fourier analysis that is applicable to a sequence of values. The DTFT is often used to analyze samples of a continuous function. The term ''discrete-time'' refers to the ...
of the signal being smoothed. Moreover, higher-order derivative approximations can be computed in a straightforward manner (and preserving scale-space properties) by applying small support central difference operators to the discrete
scale space representation Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal th ...
. As with the sampled Gaussian, a plain truncation of the infinite impulse response will in most cases be a sufficient approximation for small values of ε, while for larger values of ε it is better to use either a decomposition of the discrete Gaussian into a cascade of generalized binomial filters or alternatively to construct a finite approximate kernel by multiplying by a
window function In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the inte ...
. If ε has been chosen too large such that effects of the truncation error begin to appear (for example as spurious extrema or spurious responses to higher-order derivative operators), then the options are to decrease the value of ε such that a larger finite kernel is used, with cutoff where the support is very small, or to use a tapered window.


Recursive filters

Since computational efficiency is often important, low-order ''
recursive filter In signal processing, a recursive filter is a type of filter which re-uses one or more of its outputs as an input. This feedback typically results in an unending impulse response (commonly referred to as ''infinite impulse response'' (IIR)), chara ...
s'' are often used for scale-space smoothing. For example, Young and van Vliet use a third-order recursive filter with one real
pole Pole may refer to: Astronomy *Celestial pole, the projection of the planet Earth's axis of rotation onto the celestial sphere; also applies to the axis of rotation of other planets *Pole star, a visible star that is approximately aligned with the ...
and a pair of complex poles, applied forward and backward to make a sixth-order symmetric approximation to the Gaussian with low computational complexity for any smoothing scale. By relaxing a few of the axioms, Lindeberg concluded that good smoothing filters would be "normalized Pólya frequency sequences", a family of discrete kernels that includes all filters with real poles at 0 < ''Z'' < 1 and/or ''Z'' > 1, as well as with real zeros at ''Z'' < 0. For symmetry, which leads to approximate directional homogeneity, these filters must be further restricted to pairs of poles and zeros that lead to zero-phase filters. To match the transfer function curvature at zero frequency of the discrete Gaussian, which ensures an approximate
semi-group In mathematics, a semigroup is an algebraic structure consisting of a set together with an associative internal binary operation on it. The binary operation of a semigroup is most often denoted multiplicatively: ''x''·''y'', or simply ''xy'', ...
property of additive ''t'', two poles at :Z = 1 + \frac - \sqrt can be applied forward and backwards, for symmetry and stability. This filter is the simplest implementation of a normalized Pólya frequency sequence kernel that works for any smoothing scale, but it is not as excellent an approximation to the Gaussian as Young and van Vliet's filter, which is ''not'' normalized Pólya frequency sequence, due to its complex poles. The transfer function, ''H''1, of a symmetric pole-pair recursive filter is closely related to the
discrete-time Fourier transform In mathematics, the discrete-time Fourier transform (DTFT) is a form of Fourier analysis that is applicable to a sequence of values. The DTFT is often used to analyze samples of a continuous function. The term ''discrete-time'' refers to the ...
of the discrete Gaussian kernel via first-order approximation of the exponential: :\widehat(\theta, t) = \frac \approx \frac = H_1(\theta, t), where the ''t'' parameter here is related to the stable pole position ''Z'' = ''p'' via: :t = \frac. Furthermore, such filters with ''N'' pairs of poles, such as the two pole pairs illustrated in this section, are an even better approximation to the exponential: :\frac = H_N(\theta, t), where the stable pole positions are adjusted by solving: :\frac = \frac. The impulse responses of these filters are not very close to gaussian unless more than two pole pairs are used. However, even with only one or two pole pairs per scale, a signal successively smoothed at increasing scales will be very close to a gaussian-smoothed signal. The semi-group property is poorly approximated when too few pole pairs are used.
Scale-space axioms In image processing and computer vision, a scale space framework can be used to represent an image as a family of gradually smoothed images. This framework is very general and a variety of scale space representations exist. A typical approach ...
that are still satisfied by these filters are: *''linearity'' *''shift invariance'' (integer shifts) *''non-creation of local extrema'' (zero-crossings) in one dimension *''non-enhancement of local extrema'' in any number of dimensions *''positivity'' *''normalization'' The following are only approximately satisfied, the approximation being better for larger numbers of pole pairs: *existence of an ''infinitesimal generator'' ''A'' (the infinitesimal generator of the discrete Gaussian, or a filter approximating it, approximately maps a recursive filter response to one of infinitesimally larger ''t'') *the ''semi-group structure'' with the associated ''cascade smoothing property'' (this property is approximated by considering kernels to be equivalent when they have the same ''t'' value, even if they are not quite equal) *''rotational symmetry'' *''scale invariance'' This recursive filter method and variations to compute both the Gaussian smoothing as well as Gaussian derivatives has been described by several authors. Tan ''et al.'' have analyzed and compared some of these approaches, and have pointed out that the Young and van Vliet filters are a cascade (multiplication) of forward and backward filters, while the Deriche and the Jin ''et al.'' filters are sums of forward and backward filters. At fine scales, the recursive filtering approach as well as other separable approaches are not guaranteed to give the best possible approximation to rotational symmetry, so non-separable implementations for 2D images may be considered as an alternative. When computing several derivatives in the
N-jet An ''N''-jet is the set of (partial) derivatives of a function f(x) up to order ''N''. Specifically, in the area of computer vision, the ''N''-jet is usually computed from a scale space representation L of the input image f(x, y), and the p ...
simultaneously, discrete scale-space smoothing with the discrete analogue of the Gaussian kernel, or with a recursive filter approximation, followed by small support difference operators, may be both faster and more accurate than computing recursive approximations of each derivative operator.


Finite-impulse-response (FIR) smoothers

For small scales, a low-order
FIR filter In signal processing, a finite impulse response (FIR) filter is a filter whose impulse response (or response to any finite length input) is of ''finite'' duration, because it settles to zero in finite time. This is in contrast to infinite impulse ...
may be a better smoothing filter than a recursive filter. The symmetric 3-kernel , for ''t'' ≤ 0.5 smooths to a scale of ''t'' using a pair of real zeros at ''Z'' < 0, and approaches the discrete Gaussian in the limit of small ''t''. In fact, with infinitesimal ''t'', either this two-zero filter or the two-pole filter with poles at ''Z'' = ''t''/2 and ''Z'' = 2/''t'' can be used as the infinitesimal generator for the discrete Gaussian kernels described above. The FIR filter's zeros can be combined with the recursive filter's poles to make a general high-quality smoothing filter. For example, if the smoothing process is to always apply a biquadratic (two-pole, two-zero) filter forward then backwards on each row of data (and on each column in the 2D case), the poles and zeros can each do a part of the smoothing. The zeros limit out at ''t'' = 0.5 per pair (zeros at ''Z'' = –1), so for large scales the poles do most of the work. At finer scales, the combination makes an excellent approximation to the discrete Gaussian if the poles and zeros each do about half the smoothing. The ''t'' values for each portion of the smoothing (poles, zeros, forward and backward multiple applications, etc.) are additive, in accordance with the approximate semi-group property. The FIR filter transfer function is closely related to the discrete Gaussian's DTFT, just as was the recursive filter's. For a single pair of zeros, the transfer function is :\widehat(\theta, t) = e^ \approx = F_1(\theta, t), where the ''t'' parameter here is related to the zero positions ''Z'' = ''z'' via: :t = -\frac, and we require ''t'' ≤ 0.5 to keep the transfer function non-negative. Furthermore, such filters with ''N'' pairs of zeros, are an even better approximation to the exponential and extend to higher values of ''t'' : :\left(1 -\frac(1 - \cos \theta) \right)^N = F_N(\theta, t), where the stable zero positions are adjusted by solving: :\frac = -\frac. These FIR and pole-zero filters are valid scale-space kernels, satisfying the same axioms as the all-pole recursive filters.


Real-time implementation within pyramids and discrete approximation of scale-normalized derivatives

Regarding the topic of automatic scale selection based on normalized derivatives, pyramid approximations are frequently used to obtain real-time performance.Lowe, D. G., “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 60, 2, pp. 91-110, 2004.
/ref> The appropriateness of approximating scale-space operations within a pyramid originates from the fact that repeated cascade smoothing with generalized binomial kernels leads to equivalent smoothing kernels that under reasonable conditions approach the Gaussian. Furthermore, the binomial kernels (or more generally the class of generalized binomial kernels) can be shown to constitute the unique class of finite-support kernels that guarantee non-creation of local extrema or zero-crossings with increasing scale (see the article on
multi-scale approaches The scale space representation of a signal obtained by Gaussian smoothing satisfies a number of special properties, scale-space axioms, which make it into a special form of multi-scale representation. There are, however, also other types of "multi ...
for details). Special care may, however, need to be taken to avoid discretization artifacts.


Other multi-scale approaches

For one-dimensional kernels, there is a well-developed theory of
multi-scale approaches The scale space representation of a signal obtained by Gaussian smoothing satisfies a number of special properties, scale-space axioms, which make it into a special form of multi-scale representation. There are, however, also other types of "multi ...
, concerning filters that do not create new local extrema or new zero-crossings with increasing scales. For continuous signals, filters with real poles in the ''s''-plane are within this class, while for discrete signals the above-described recursive and FIR filters satisfy these criteria. Combined with the strict requirement of a continuous semi-group structure, the continuous Gaussian and the discrete Gaussian constitute the unique choice for continuous and discrete signals. There are many other multi-scale signal processing, image processing and data compression techniques, using
wavelets A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the num ...
and a variety of other kernels, that do not exploit or require the same requirements as
scale space Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theor ...
descriptions do; that is, they do not depend on a coarser scale not generating a new extremum that was not present at a finer scale (in 1D) or non-enhancement of local extrema between adjacent scale levels (in any number of dimensions).


See also

*
Scale space Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theor ...
*
Pyramid (image processing) Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, image processing and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampl ...
*
Multi-scale approaches The scale space representation of a signal obtained by Gaussian smoothing satisfies a number of special properties, scale-space axioms, which make it into a special form of multi-scale representation. There are, however, also other types of "multi ...
*
Gaussian filter In electronics and signal processing mainly in digital signal processing, a Gaussian filter is a filter whose impulse response is a Gaussian function (or an approximation to it, since a true Gaussian response would have infinite impulse response) ...


References

Image processing Computer vision Gaussian function