HOME

TheInfoList



OR:

In
digital image processing Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allo ...
, the sum of absolute differences (SAD) is a measure of the similarity between image blocks. It is calculated by taking the
absolute difference The absolute difference of two real numbers x and y is given by , x-y, , the absolute value of their difference. It describes the distance on the real line between the points corresponding to x and y. It is a special case of the Lp distance for a ...
between each
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the smal ...
in the original block and the corresponding pixel in the block being used for comparison. These differences are summed to create a simple metric of block similarity, the ''L''1 norm of the difference image or
Manhattan distance A taxicab geometry or a Manhattan geometry is a geometry whose usual distance function or Metric (mathematics), metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences ...
between two image blocks. The sum of absolute differences may be used for a variety of purposes, such as
object recognition Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the ...
, the generation of disparity maps for
stereo Stereophonic sound, or more commonly stereo, is a method of sound reproduction that recreates a multi-directional, 3-dimensional audible perspective. This is usually achieved by using two independent audio channels through a configuration ...
images, and
motion estimation Motion estimation is the process of determining ''motion vectors'' that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions b ...
for
video compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
.


Example

This example uses the sum of absolute differences to identify which part of a search image is most similar to a template image. In this example, the template image is 3 by 3 pixels in size, while the search image is 3 by 5 pixels in size. Each pixel is represented by a single
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
from 0 to 9.
Template    Search image
 2 5 5       2 7 5 8 6
 4 0 7       1 7 4 2 7
 7 5 9       8 4 6 8 5
There are exactly three unique locations within the search image where the template may fit: the left side of the image, the center of the image, and the right side of the image. To calculate the SAD values, the absolute value of the difference between each corresponding pair of pixels is used: the difference between 2 and 2 is 0, 4 and 1 is 3, 7 and 8 is 1, and so forth. Calculating the values of the absolute differences for each pixel, for the three possible template locations, gives the following:
Left    Center   Right
0 2 0   5 0 3    3 3 1
3 7 3   3 4 5    0 2 0
1 1 3   3 1 1    1 3 4
For each of these three image patches, the 9 absolute differences are added together, giving SAD values of 20, 25, and 17, respectively. From these SAD values, it could be asserted that the right side of the search image is the most similar to the template image, because it has the lowest sum of absolute differences as compared to the other two locations.


Comparison to other metrics


Object recognition

The sum of absolute differences provides a simple way to automate the searching for objects inside an image, but may be unreliable due to the effects of contextual factors such as changes in lighting, color, viewing direction, size, or shape. The SAD may be used in conjunction with other object recognition methods, such as
edge detection Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuitie ...
, to improve the reliability of results.


Video compression

SAD is an extremely fast metric due to its simplicity; it is effectively the simplest possible metric that takes into account every
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the smal ...
in a block. Therefore, it is very effective for a wide motion search of many different blocks. SAD is also easily
parallelizable In mathematics, a differentiable manifold M of dimension ''n'' is called parallelizable if there exist smooth vector fields \ on the manifold, such that at every point p of M the tangent vectors \ provide a basis of the tangent space at p. Equi ...
since it analyzes each pixel separately, making it easily implementable with such instructions as
ARM NEON ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configured ...
or x86 SSE2. For example, SSE has packed sum of absolute differences instruction (PSADBW) specifically for this purpose. Once candidate blocks are found, the final refinement of the motion estimation process is often done with other slower but more accurate metrics, which better take into account
human perception Perception () is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. All perception involves signals that go through the nervous system ...
. These include the sum of absolute transformed differences (SATD), the
sum of squared differences In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between ...
(SSD), and rate-distortion optimization.


See also

*
Computer stereo vision Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positi ...
*
Hadamard transform The Hadamard transform (also known as the Walsh–Hadamard transform, Hadamard–Rademacher–Walsh transform, Walsh transform, or Walsh–Fourier transform) is an example of a generalized class of Fourier transforms. It performs an orthogonal ...
*
Motion compensation Motion compensation in computing, is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video d ...
*
Motion estimation Motion estimation is the process of determining ''motion vectors'' that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions b ...
*
Object recognition (computer vision) Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the ...
* Rate-distortion optimization


References

*{{cite book , last = E. G. Richardson , first = Iain , title = H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia , publisher = John Wiley & Sons Ltd. , year = 2003 , location = Chichester Video compression Signal processing metrics Loss functions