Semi-global matching

Semi-global matching (SGM) is a computer vision algorithm for the estimation of a dense disparity map from a rectified stereo image pair, introduced in 2005 by Heiko Hirschmüller while working at the German Aerospace Center (DLR).[Hirschmüller (2005), pp. 807–814] Given its predictable run time, its favourable trade-off between quality of the results and computing time, and its suitability for fast parallel implementation in ASIC or FPGA, it has encountered wide adoption in real-time stereo vision applications such as robotics and advanced driver assistance systems.[Hirschmüller (2011), pp. 178–184]


Problem

Pixelwise stereo matching allows real-time calculation of disparity maps by measuring the similarity of each pixel in one stereo image to each pixel within a subset of the other stereo image. Given a rectified stereo image pair, for a pixel with coordinates (x, y) the set of candidate pixels in the other image is usually selected as \{(x - d, y) \mid d \in [0, D]\}, where D is the maximum allowed disparity shift. A simple search for the best matching pixel produces many spurious matches, and this problem can be mitigated by adding a regularisation term that penalises jumps in disparity between adjacent pixels, with a cost function of the form

:E(\boldsymbol{d}) = \sum_p D(p, d_p) + \sum_{(p, q) \in \mathcal{N}} R(p, d_p, q, d_q)

where D(p, d_p) is the pixelwise dissimilarity cost at pixel p with disparity d_p, and R(p, d_p, q, d_q) is the regularisation cost between pixels p and q with disparities d_p and d_q respectively, summed over all pairs of neighbouring pixels \mathcal{N}. Such a constraint can be enforced efficiently on a per-scanline basis by using dynamic programming (e.g. the Viterbi algorithm), but this limitation can still introduce streaking artefacts in the depth map, because little or no regularisation is performed across scanlines. A possible solution is to perform global optimisation in 2D, which is however an NP-complete problem in the general case. For some families of cost functions (e.g. submodular functions) a solution with strong optimality properties can be found in polynomial time using graph cut optimisation; however, such global methods are generally too expensive for real-time processing.
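
To make the pixelwise matching step concrete, the following sketch builds an absolute-difference cost volume for a rectified pair. It is a minimal illustration under assumed conventions (8-bit grayscale NumPy arrays, disparity measured as a leftward shift in the right image); the function name pixelwise_cost_volume is made up for this example and is not part of the original algorithm description.

    import numpy as np

    def pixelwise_cost_volume(left, right, max_disparity):
        """Matching cost C(x, y, d) = |I_L(x, y) - I_R(x - d, y)| for a
        rectified pair; positions with no valid candidate keep a high cost."""
        h, w = left.shape
        volume = np.full((h, w, max_disparity + 1), 255.0)
        for d in range(max_disparity + 1):
            # Compare each left-image pixel with the pixel d columns to its
            # left in the right image; columns x < d have no valid candidate.
            volume[:, d:, d] = np.abs(left[:, d:].astype(float)
                                      - right[:, :w - d].astype(float))
        return volume

Taking a plain argmin over d of this volume is exactly the unregularised winner-takes-all search that produces the spurious matches described above.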


Algorithm

The idea behind SGM is to perform line optimisation along multiple directions and to compute an aggregated cost S(p, d) by summing the costs to reach pixel p with disparity d from each direction. The number of directions affects the run time of the algorithm, and while 16 directions usually ensure good quality, a lower number can be used to achieve faster execution.[Hirschmüller (2007), p. 331] A typical 8-direction implementation of the algorithm can compute the cost in two passes: a forward pass accumulating the cost from the left, top-left, top, and top-right, and a backward pass accumulating the cost from the right, bottom-right, bottom, and bottom-left. A single-pass algorithm can be implemented with only five directions.

The cost is composed of a matching term D(p, d) and a binary regularisation term R(d_p, d_q). The former can in principle be any local image dissimilarity measure; commonly used functions are the absolute or squared intensity difference (usually summed over a window around the pixel, and after applying a high-pass filter to the images to gain some illumination invariance), the Birchfield–Tomasi dissimilarity, the Hamming distance of the census transform, and the Pearson correlation (normalized cross-correlation). Even mutual information can be approximated as a sum over the pixels, and thus used as a local similarity metric.[Kim et al. (2003), pp. 1033–1040]
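
As an example of one of the matching costs listed above, the sketch below computes a 5×5 census transform and the Hamming distance between census codes at a given disparity shift. It is a simplified illustration (borders are handled by wrap-around via np.roll, which a production implementation would avoid), and the function names are hypothetical.

    import numpy as np

    def census_transform_5x5(img):
        """5x5 census transform: encode, for each pixel, which of its 24
        neighbours is darker than the centre, as a 24-bit integer code."""
        census = np.zeros(img.shape, dtype=np.uint32)
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                if dy == 0 and dx == 0:
                    continue
                shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                census = (census << 1) | (shifted < img)
        return census

    def census_hamming_cost(census_left, census_right, d):
        """Hamming distance between census codes at disparity shift d."""
        diff = census_left ^ np.roll(census_right, d, axis=1)
        # Popcount via an 8-bit lookup table applied to each byte.
        lut = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None],
                            axis=1).sum(axis=1)
        return lut[diff.view(np.uint8)].reshape(*diff.shape, 4).sum(axis=2)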
The regularisation term has the form

:R(d_p, d_q) = \begin{cases} 0 & \text{if } d_p = d_q \\ P_1 & \text{if } |d_p - d_q| = 1 \\ P_2 & \text{if } |d_p - d_q| > 1 \end{cases}

where P_1 and P_2 are two constant parameters, with P_1 < P_2. The three-way comparison assigns a smaller penalty to unitary changes in disparity, thus allowing smooth transitions corresponding e.g. to slanted surfaces, and penalises larger jumps while preserving discontinuities thanks to the constant penalty term. To further preserve discontinuities, the gradient of the intensity can be used to adapt the penalty term, because discontinuities in depth usually correspond to discontinuities in image intensity I, by setting

:P_2 = \max \left\{ P_1, \frac{P'_2}{|I_p - I_q|} \right\}

for each pair of pixels p and q, where P'_2 is a constant parameter.[Hirschmüller (2007), p. 330]

The accumulated cost S(p, d) = \sum_r L_r(p, d) is the sum of all costs L_r(p, d) to reach pixel p with disparity d along direction r. Each term can be expressed recursively as

:L_r(p, d) = D(p, d) + \min \left\{ L_r(p - r, d),\ L_r(p - r, d - 1) + P_1,\ L_r(p - r, d + 1) + P_1,\ \min_k L_r(p - r, k) + P_2 \right\} - \min_k L_r(p - r, k)

where the minimum cost at the previous pixel \min_k L_r(p - r, k) is subtracted for numerical stability: since it is constant for all values of disparity at the current pixel, it does not affect the optimisation.

The value of disparity at each pixel is given by d^*(p) = \operatorname{argmin}_d S(p, d), and sub-pixel accuracy can be achieved by fitting a curve through d^*(p) and its neighbouring costs and taking the minimum along the curve. Since the two images in the stereo pair are not treated symmetrically in the calculations, a consistency check can be performed by computing the disparity a second time in the opposite direction, swapping the roles of the left and right images, and invalidating the result for the pixels where the two calculations disagree. Further post-processing techniques for the refinement of the disparity image include morphological filtering to remove outliers, intensity consistency checks to refine textureless regions, and interpolation to fill in pixels invalidated by consistency checks.[Hirschmüller (2007), pp. 332–334]

The cost volume C(p, d) for all values of p = (x, y) and d can be precomputed, and in an implementation of the full algorithm, using D possible disparity shifts and R directions, each pixel is subsequently visited R times; therefore the computational complexity of the algorithm for an image of size W \times H is O(WHD).[Hirschmüller et al. (2012), p. 372]
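
The recursion above maps directly to a scanline sweep. The sketch below aggregates path costs along the single direction r = (1, 0) (left to right); a full implementation would repeat it for the remaining directions and sum the results into S(p, d). Function and parameter names are illustrative, and the cost volume C is assumed precomputed as in the earlier sketch.

    import numpy as np

    def aggregate_left_to_right(C, P1, P2):
        """Path costs L_r along direction r = (1, 0), for a cost volume C
        of shape (H, W, D)."""
        H, W, D = C.shape
        L = np.empty((H, W, D))
        L[:, 0, :] = C[:, 0, :]            # no predecessor at the left border
        for x in range(1, W):
            prev = L[:, x - 1, :]                       # shape (H, D)
            prev_min = prev.min(axis=1, keepdims=True)  # min over disparities
            same = prev                                 # stay at disparity d
            up = np.full_like(prev, np.inf)
            down = np.full_like(prev, np.inf)
            up[:, 1:] = prev[:, :-1] + P1               # come from d - 1
            down[:, :-1] = prev[:, 1:] + P1             # come from d + 1
            jump = prev_min + P2                        # any larger jump
            best = np.minimum(np.minimum(same, up), np.minimum(down, jump))
            # Subtracting prev_min keeps the values bounded, as in the text.
            L[:, x, :] = C[:, x, :] + best - prev_min
        return L

Summing the outputs of such sweeps over all chosen directions gives S(p, d), and S.argmin(axis=2) yields the integer disparity map before sub-pixel refinement and consistency checking.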


Memory efficient variant

The main drawback of SGM is its memory consumption. An implementation of the two-pass, 8-direction version of the algorithm requires storing W \times H \times D + 3 \times W \times D + D elements, since the accumulated cost volume has a size of W \times H \times D, and to compute the cost for a pixel during each pass it is necessary to keep track of the D path costs of its left or right neighbour along one direction and of the W \times D path costs of the pixels in the row above or below along the other three directions. One solution to reduce memory consumption is to compute SGM on partially overlapping image tiles, interpolating the values over the overlapping regions. This method also makes it possible to apply SGM to very large images that would not fit in memory in the first place.[Hirschmüller (2007), pp. 334–335]

A memory-efficient approximation of SGM stores, for each pixel, only the costs for the disparity values that represent a minimum along some direction, instead of the costs for all possible disparity values. The true minimum is highly likely to be among the minima along the eight directions, so the quality of the results is similar. The algorithm uses eight directions and three passes. During the first pass, it stores for each pixel the cost of the optimal disparity along the four top-down directions, plus the two closest lower and higher cost values (for sub-pixel interpolation). Since the cost volume is stored in a sparse fashion, the four values of optimal disparity also need to be stored. In the second pass, the other four bottom-up directions are computed, completing the calculations for the four disparity values selected in the first pass, which have now been evaluated along all eight directions. An intermediate value of cost and disparity is computed from the output of the first pass and stored, and the memory holding the four outputs of the first pass is replaced with the four optimal disparity values and their costs from the directions of the second pass. A third pass goes along the same directions used in the first pass, completing the calculations for the disparity values from the second pass. The final result is then selected among the four minima from the third pass and the intermediate result computed during the second pass.[Hirschmüller et al. (2012), p. 373]

In each pass four disparity values are stored, together with three cost values each (the minimum and its two closest neighbouring costs), plus the disparity and cost values of the intermediate result, for a total of eighteen values per pixel, making the total memory consumption equal to 18 \times W \times H + 3 \times W \times D + D, at the cost in time of an additional pass over the image.
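
To put the two formulas side by side, here is a quick back-of-the-envelope comparison for an assumed 1024×768 image with 128 disparity levels (element counts, not bytes, since the element width depends on the chosen cost type):

    W, H, D = 1024, 768, 128  # assumed image size and disparity range

    full_sgm = W * H * D + 3 * W * D + D      # standard two-pass 8-direction SGM
    efficient = 18 * W * H + 3 * W * D + D    # memory-efficient variant

    print(full_sgm, efficient, round(full_sgm / efficient, 1))
    # 101056640 14549120 6.9

For these values the approximation stores roughly seven times fewer elements, at the cost of one additional pass over the image.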


See also

* 3D reconstruction
* Computer stereo vision
* Structure from motion


References

* Hirschmüller, H. (2005). "Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 807–814.
* Hirschmüller, H. (2007). "Stereo Processing by Semiglobal Matching and Mutual Information". IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), pp. 328–341.
* Hirschmüller, H. (2011). "Semi-Global Matching – Motivation, Developments and Applications". Photogrammetric Week '11, pp. 173–184.
* Hirschmüller, H.; Buder, M.; Ernst, I. (2012). "Memory Efficient Semi-Global Matching". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, I-3, pp. 371–376.
* Kim, J.; Kolmogorov, V.; Zabih, R. (2003). "Visual Correspondence Using Energy Minimization and Mutual Information". Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1033–1040.


External links

* "Stereo vision", 6d-vision.com: http://www.6d-vision.com/home/stereovision (archived at https://web.archive.org/web/20190101105327/http://www.6d-vision.com/home/stereovision, 2019-01-01)