digital image processing Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allo ...

and

computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...

, image segmentation is the process of partitioning a

digital image A digital image is an image composed of picture elements, also known as ''pixels'', each with ''finite'', '' discrete quantities'' of numeric representation for its intensity or gray level that is an output from its two-dimensional functions ...

into multiple image segments, also known as image regions or image objects ( sets of

pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the smal ...

s). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Linda G. Shapiro and George C. Stockman (2001): “Computer Vision”, pp 279–325, New Jersey, Prentice-Hall, Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics. The result of image segmentation is a set of segments that collectively cover the entire image, or a set of

contour Contour may refer to: * Contour (linguistics), a phonetic sound * Pitch contour * Contour (camera system), a 3D digital camera system * Contour, the KDE Plasma 4 interface for tablet devices * Contour line, a curve along which the function has a ...

s extracted from the image (see

edge detection Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuitie ...

). Each of the pixels in a region are similar with respect to some characteristic or computed property , such as

color Color (American English) or colour (British English) is the visual perceptual property deriving from the spectrum of light interacting with the photoreceptor cells of the eyes. Color categories and physical specifications of color are associ ...

intensity Intensity may refer to: In colloquial use *Strength (disambiguation) *Amplitude * Level (disambiguation) * Magnitude (disambiguation) In physical sciences Physics *Intensity (physics), power per unit area (W/m2) *Field strength of electric, ma ...

, or

texture Texture may refer to: Science and technology * Surface texture, the texture means smoothness, roughness, or bumpiness of the surface of an object * Texture (roads), road surface characteristics with waves shorter than road roughness * Texture (c ...

. Adjacent regions are significantly different color respect to the same characteristic(s). When applied to a stack of images, typical in

medical imaging Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to rev ...

, the resulting contours after image segmentation can be used to create

3D reconstruction In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape i ...

s with the help of interpolation algorithms like

marching cubes Marching cubes is a computer graphics algorithm, published in the 1987 SIGGRAPH proceedings by Lorensen and Cline, for extracting a polygonal mesh of an isosurface from a three-dimensional discrete scalar field (the elements of which are sometime ...

Applications

Some of the practical applications of image segmentation are: *

Content-based image retrieval Content-based image retrieval, also known as query by image content ( QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching ...

Machine vision Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to m ...

Medical imaging Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to rev ...

, including volume rendered images from

computed tomography A computed tomography scan (CT scan; formerly called computed axial tomography scan or CAT scan) is a medical imaging technique used to obtain detailed internal images of the body. The personnel that perform CT scans are called radiographers ...

and

magnetic resonance imaging Magnetic resonance imaging (MRI) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio wave ...

. ** Locate tumors and other pathologies ** Measure tissue volumes ** Diagnosis, study of anatomical structure ** Surgery planning ** Virtual surgery simulation ** Intra-surgery navigation *

Object detection Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched ...

** Pedestrian detection **

Face detection Face detection is a computer technology being used in a variety of applications that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. ...

** Brake light detection ** Locate objects in satellite images (roads, forests, crops, etc.) * Recognition Tasks **

Face recognition A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and wo ...

Fingerprint recognition A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfac ...

Iris recognition Iris recognition is an automated method of biometric identification that uses mathematical pattern-recognition techniques on video images of one or both of the irises of an individual's eyes, whose complex patterns are unique, stable, and can b ...

* Traffic control systems *

Video surveillance Closed-circuit television (CCTV), also known as video surveillance, is the use of video cameras to transmit a signal to a specific place, on a limited set of monitors. It differs from broadcast television in that the signal is not openly tr ...

* Video object co-segmentation and action localization Several general-purpose

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...

s and techniques have been developed for image segmentation. To be useful, these techniques must typically be combined with a domain's specific knowledge in order to effectively solve the domain's segmentation problems.

Classes of segmentation techniques

There are two classes of segmentation techniques. * Classical computer vision approaches * AI based techniques

Groups of image segmentation

* Semantic segmentation is an approach detecting, for every pixel, belonging class of the object. For example, when all people in a figure are segmented as one object and background as one object. * Instance segmentation is an approach that identifies, for every pixel, a belonging instance of the object. It detects each distinct object of interest in the image. For example, when each person in a figure is segmented as an individual object. * Panoptic segmentation combines semantic and instance segmentation. Like semantic segmentation, panoptic segmentation is an approach that identifies, for every pixel, the belonging class. Unlike semantic segmentation, panoptic segmentation distinguishes different instances of the same class.

Thresholding

The simplest method of image segmentation is called the thresholding method. This method is based on a clip-level (or a threshold value) to turn a gray-scale image into a binary image. The key of this method is to select the threshold value (or values when multiple-levels are selected). Several popular methods are used in industry including the maximum entropy method, balanced histogram thresholding,

Otsu's method In computer vision and image processing, Otsu's method, named after , is used to perform automatic image thresholding. In the simplest form, the algorithm returns a single intensity threshold that separate pixels into two classes, foreground a ...

(maximum variance), and

k-means clustering ''k''-means clustering is a method of vector quantization, originally from signal processing, that aims to partition ''n'' observations into ''k'' clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or ...

. Recently, methods have been developed for thresholding computed tomography (CT) images. The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the (reconstructed) image. New methods suggested the usage of multi-dimensional fuzzy rule-based non-linear thresholds. In these works decision over each pixel's membership to a segment is based on multi-dimensional rules derived from fuzzy logic and evolutionary algorithms based on image lighting environment and application.

Clustering methods

The

K-means algorithm ''k''-means clustering is a method of vector quantization, originally from signal processing, that aims to partition ''n'' observations into ''k'' clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or ...

is an

iterative Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...

technique that is used to partition an image into ''K'' clusters. The basic

is # Pick ''K'' cluster centers, either

random In common usage, randomness is the apparent or actual lack of pattern or predictability in events. A random sequence of events, symbols or steps often has no :wikt:order, order and does not follow an intelligible pattern or combination. Ind ...

ly or based on some

heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, ...

method, for example K-means++ # Assign each pixel in the image to the cluster that minimizes the

distance Distance is a numerical or occasionally qualitative measurement of how far apart objects or points are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two counties over"). ...

between the pixel and the cluster center # Re-compute the cluster centers by averaging all of the pixels in the cluster # Repeat steps 2 and 3 until convergence is attained (i.e. no pixels change clusters) In this case,

is the squared or absolute difference between a pixel and a cluster center. The difference is typically based on pixel

, and location, or a weighted combination of these factors. ''K'' can be selected manually,

ly, or by a

. This algorithm is guaranteed to converge, but it may not return the

optimal Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...

solution. The quality of the solution depends on the initial set of clusters and the value of ''K''. The

Mean Shift Mean shift is a non-parametric feature space, feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode (statistics), mode-seeking algorithm. Application domains include cluster analysis in compu ...

algorithm is a technique that is used to partition an image into an unknown apriori no. of clusters. This has the advantage of not having to start with an initial guess of such parameter which makes it a better general solution for more diverse cases.

Motion and interactive segmentation

Motion based segmentation is a technique that relies on motion in the image to perform segmentation. The idea is simple: look at the differences between a pair of images. Assuming the object of interest is moving, the difference will be exactly that object. Improving on this idea, Kenney et al. proposed interactive segmentatio

They use a robot to poke objects in order to generate the motion signal necessary for motion-based segmentation. Interactive segmentation follows the interactive perception framework proposed by Dov Kat

and Oliver Broc

Another technique that is based on motion is

rigid motion segmentation In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene ...

Compression-based methods

Compression based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data. The connection between these two concepts is that segmentation tries to find patterns in an image and any regularity in the image can be used to compress it. The method describes each segment by its texture and boundary shape. Each of these components is modeled by a probability distribution function and its coding length is computed as follows: # The boundary encoding leverages the fact that regions in natural images tend to have a smooth contour. This prior is used by

Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algori ...

to encode the difference

chain code A chain code is a lossless compression based image segmentation method for binary images based upon tracing image contours. The basic principle of chain coding, like other contour codings, is to separately encode each connected component, or "blob" ...

of the contours in an image. Thus, the smoother a boundary is, the shorter coding length it attains. # Texture is encoded by

lossy compression In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...

in a way similar to

minimum description length Minimum Description Length (MDL) is a model selection principle where the shortest description of the data is the best model. MDL methods learn through a data compression perspective and are sometimes described as mathematical applications of Occam ...

(MDL) principle, but here the length of the data given the model is approximated by the number of samples times the

entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...

of the model. The texture in each region is modeled by a

multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One d ...

whose entropy has a closed form expression. An interesting property of this model is that the estimated entropy bounds the true entropy of the data from above. This is because among all distributions with a given mean and covariance, normal distribution has the largest entropy. Thus, the true coding length cannot be more than what the algorithm tries to minimize. For any given segmentation of an image, this scheme yields the number of bits required to encode that image based on the given segmentation. Thus, among all possible segmentations of an image, the goal is to find the segmentation which produces the shortest coding length. This can be achieved by a simple agglomerative clustering method. The distortion in the lossy compression determines the coarseness of the segmentation and its optimal value may differ for each image. This parameter can be estimated heuristically from the contrast of textures in an image. For example, when the textures in an image are similar, such as in camouflage images, stronger sensitivity and thus lower quantization is required.

Histogram-based methods

Histogram A histogram is an approximate representation of the distribution of numerical data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to " bin" (or "bucket") the range of values—that is, divide the ent ...

-based methods are very efficient compared to other image segmentation methods because they typically require only one pass through the

s. In this technique, a histogram is computed from all of the pixels in the image, and the peaks and valleys in the histogram are used to locate the clusters in the image.

Color Color (American English) or colour (British English) is the visual perceptual property deriving from the spectrum of light interacting with the photoreceptor cells of the eyes. Color categories and physical specifications of color are associ ...

can be used as the measure. A refinement of this technique is to

recursively Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...

apply the histogram-seeking method to clusters in the image in order to divide them into smaller clusters. This operation is repeated with smaller and smaller clusters until no more clusters are formed. One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the image. Histogram-based approaches can also be quickly adapted to apply to multiple frames, while maintaining their single pass efficiency. The histogram can be done in multiple fashions when multiple frames are considered. The same approach that is taken with one frame can be applied to multiple, and after the results are merged, peaks and valleys that were previously difficult to identify are more likely to be distinguishable. The histogram can also be applied on a per-pixel basis where the resulting information is used to determine the most frequent color for the pixel location. This approach segments based on active objects and a static environment, resulting in a different type of segmentation useful in

video tracking Video tracking is the process of locating a moving object (or multiple objects) over time using a camera. It has a variety of uses, some of which are: human-computer interaction, security and surveillance, video communication and compression, aug ...

Edge detection

Edge detection Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuitie ...

is a well-developed field on its own within image processing. Region boundaries and edges are closely related, since there is often a sharp adjustment in intensity at the region boundaries. Edge detection techniques have therefore been used as the base of another segmentation technique. The edges identified by edge detection are often disconnected. To segment an object from an image however, one needs closed region boundaries. The desired edges are the boundaries between such objects or spatial-taxons. Spatial-taxons are information granules, consisting of a crisp pixel region, stationed at abstraction levels within a hierarchical nested scene architecture. They are similar to the

Gestalt Gestalt may refer to: Psychology * Gestalt psychology, a school of psychology * Gestalt therapy, a form of psychotherapy * Bender Visual-Motor Gestalt Test, an assessment of development disorders * Gestalt Practice, a practice of self-exploration ...

psychological designation of figure-ground, but are extended to include foreground, object groups, objects and salient object parts. Edge detection methods can be applied to the spatial-taxon region, in the same manner they would be applied to a silhouette. This method is particularly useful when the disconnected edge is part of an illusory contour Segmentation methods can also be applied to edges obtained from edge detectors. Lindeberg and Li developed an integrated method that segments edges into straight and curved edge segments for parts-based object recognition, based on a minimum description length (M_DL) criterion that was optimized by a split-and-merge-like method with candidate breakpoints obtained from complementary junction cues to obtain more likely points at which to consider partitions into different segments.

Dual clustering method

This method is a combination of three characteristics of the image: partition of the image based on histogram analysis is checked by high compactness of the clusters (objects), and high gradients of their borders. For that purpose two spaces have to be introduced: one space is the one-dimensional histogram of brightness ''H'' = ''H''(''B''); the second space is the dual 3-dimensional space of the original image itself ''B'' = ''B''(''x'', ''y''). The first space allows to measure how compactly the brightness of the image is distributed by calculating a minimal clustering kmin. Threshold brightness T corresponding to kmin defines the binary (black-and-white) image – bitmap ''b'' = ''φ''(''x'', ''y''), where ''φ''(''x'', ''y'') = 0, if ''B''(''x'', ''y'') < ''T'', and ''φ''(''x'', ''y'') = 1, if ''B''(''x'', ''y'') ≥ ''T''. The bitmap ''b'' is an object in dual space. On that bitmap a measure has to be defined reflecting how compact distributed black (or white) pixels are. So, the goal is to find objects with good borders. For all ''T'' the measure ''M''_DC = ''G''/(''k'' × ''L'') has to be calculated (where ''k'' is difference in brightness between the object and the background, ''L'' is length of all borders, and ''G'' is mean gradient on the borders). Maximum of MDC defines the segmentation.

Region-growing methods

Region-growing Region growing is a simple region-based image segmentation method. It is also classified as a pixel-based image segmentation method since it involves the selection of initial seed points. This approach to segmentation examines neighboring pixels ...

methods rely mainly on the assumption that the neighboring pixels within one region have similar values. The common procedure is to compare one pixel with its neighbors. If a similarity criterion is satisfied, the pixel can be set to belong to the same cluster as one or more of its neighbors. The selection of the similarity criterion is significant and the results are influenced by noise in all instances. The method of Statistical Region MergingR. Nock and F. Nielsen
Statistical Region Merging
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 26, No 11, pp 1452–1458, 2004. (SRM) starts by building the graph of pixels using 4-connectedness with edges weighted by the absolute value of the intensity difference. Initially each pixel forms a single pixel region. SRM then sorts those edges in a priority queue and decides whether or not to merge the current regions belonging to the edge pixels using a statistical predicate. One

region-growing Region growing is a simple region-based image segmentation method. It is also classified as a pixel-based image segmentation method since it involves the selection of initial seed points. This approach to segmentation examines neighboring pixels ...

method is the seeded region growing method. This method takes a set of seeds as input along with the image. The seeds mark each of the objects to be segmented. The regions are iteratively grown by comparison of all unallocated neighboring pixels to the regions. The difference between a pixel's intensity value and the region's mean,

\delta

, is used as a

measure of similarity In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such meas ...

. The pixel with the smallest difference measured in this way is assigned to the respective region. This process continues until all pixels are assigned to a region. Because seeded region growing requires seeds as additional input, the segmentation results are dependent on the choice of seeds, and noise in the image can cause the seeds to be poorly placed. Another

method is the unseeded region growing method. It is a modified algorithm that does not require explicit seeds. It starts with a single region

A_1

—the pixel chosen here does not markedly influence the final segmentation. At each iteration it considers the neighboring pixels in the same way as seeded region growing. It differs from seeded region growing in that if the minimum

\delta

is less than a predefined threshold

T

then it is added to the respective region

A_j

. If not, then the pixel is considered different from all current regions

A_i

and a new region

A_

is created with this pixel. One variant of this technique, proposed by

Haralick Robert M. Haralick (born 1943) is Distinguished Professor in Computer Science at Graduate Center of the City University of New York (CUNY). Haralick is one of the leading figures in computer vision, pattern recognition, and image analysis. He is a ...

and Shapiro (1985), is based on pixel intensities. The

mean There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set. For a data set, the ''arithme ...

and scatter of the region and the intensity of the candidate pixel are used to compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the region, and the region’s mean and scatter are recomputed. Otherwise, the pixel is rejected, and is used to form a new region. A special region-growing method is called

\lambda

-connected segmentation (see also lambda-connectedness). It is based on pixel intensities and neighborhood-linking paths. A degree of connectivity (connectedness) is calculated based on a path that is formed by pixels. For a certain value of

\lambda

, two pixels are called

\lambda

-connected if there is a path linking those two pixels and the connectedness of this path is at least

\lambda

\lambda

-connectedness is an equivalence relation.L. Chen, H. D. Cheng, and J. Zhang
Fuzzy subfiber and its application to seismic lithology classification
Information Sciences: Applications, Vol 1, No 2, pp 77–95, 1994. Split-and-merge segmentation is based on a

quadtree A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four q ...

partition of an image. It is sometimes called quadtree segmentation. This method starts at the root of the tree that represents the whole image. If it is found non-uniform (not homogeneous), then it is split into four child squares (the splitting process), and so on. If, in contrast, four child squares are homogeneous, they are merged as several connected components (the merging process). The node in the tree is a segmented node. This process continues recursively until no further splits or merges are possible.S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Directed Split and Merge Procedure, Proc. ICPR, 1974, Denmark, pp. 424–433.S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Tree Traversal Algorithm, Journal of the ACM, 23 (1976), pp. 368–388. When a special data structure is involved in the implementation of the algorithm of the method, its time complexity can reach

O(n\log n)

, an optimal algorithm of the method.L. Chen
The lambda-connected segmentation and the optimal algorithm for split-and-merge segmentation
Chinese J. Computers, 14(1991), pp 321–331

Partial differential equation-based methods

Using a

partial differential equation In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a Multivariable calculus, multivariable function. The function is often thought of as an "unknown" to be sol ...

(PDE)-based method and solving the PDE equation by a numerical scheme, one can segment the image. Curve propagation is a popular technique in this category, with numerous applications to object extraction, object tracking, stereo reconstruction, etc. The central idea is to evolve an initial curve towards the lowest potential of a cost function, where its definition reflects the task to be addressed. As for most

inverse problems An inverse problem in science is the process of calculating from a set of observations the causal factors that produced them: for example, calculating an image in X-ray computed tomography, source reconstruction in acoustics, or calculating the ...

, the minimization of the cost functional is non-trivial and imposes certain smoothness constraints on the solution, which in the present case can be expressed as geometrical constraints on the evolving curve.

Parametric methods

Lagrangian Lagrangian may refer to: Mathematics * Lagrangian function, used to solve constrained minimization problems in optimization theory; see Lagrange multiplier ** Lagrangian relaxation, the method of approximating a difficult constrained problem with ...

techniques are based on parameterizing the contour according to some sampling strategy and then evolving each element according to image and internal terms. Such techniques are fast and efficient, however the original "purely parametric" formulation (due to Kass,

Witkin Witkin is a surname. Notable people with the surname include: * Andrew Witkin (1952–2010), American computer scientist who made major contributions in computer vision and computer graphics * Beatrice Witkin (1916-1990), American composer and pian ...

and Terzopoulos in 1987 and known as "

snakes Snakes are elongated, limbless, carnivorous reptiles of the suborder Serpentes . Like all other squamates, snakes are ectothermic, amniote vertebrates covered in overlapping scales. Many species of snakes have skulls with several more joi ...

"), is generally criticized for its limitations regarding the choice of sampling strategy, the internal geometric properties of the curve, topology changes (curve splitting and merging), addressing problems in higher dimensions, etc.. Nowadays, efficient "discretized" formulations have been developed to address these limitations while maintaining high efficiency. In both cases, energy minimization is generally conducted using a steepest-gradient descent, whereby derivatives are computed using, e.g., finite differences.

Level-set methods

The

level-set method Level-set methods (LSM) are a conceptual framework for using level sets as a tool for numerical analysis of surfaces and shapes. The advantage of the level-set model is that one can perform numerical computations involving curves and surfaces on a ...

was initially proposed to track moving interfaces by Dervieux and Thomasset in 1979 and 1981 and was later reinvented by Osher and Sethian in 1988. This has spread across various imaging domains in the late 1990s. It can be used to efficiently address the problem of curve/surface/etc. propagation in an implicit manner. The central idea is to represent the evolving contour using a signed function whose zero corresponds to the actual contour. Then, according to the motion equation of the contour, one can easily derive a similar flow for the implicit surface that when applied to the zero level will reflect the propagation of the contour. The level-set method affords numerous advantages: it is implicit, is parameter-free, provides a direct way to estimate the geometric properties of the evolving structure, allows for change of topology, and is intrinsic. It can be used to define an optimization framework, as proposed by Zhao, Merriman and Osher in 1996. One can conclude that it is a very convenient framework for addressing numerous applications of computer vision and medical image analysis. Research into various

level-set data structures In computer science a level set data structure is designed to represent discretely Sampling (statistics), sampled dynamic level sets functions. A common use of this form of data structure is in efficient image Rendering (computer graphics), render ...

has led to very efficient implementations of this method.

Fast marching methods

The

fast marching method The fast marching methodJ.A. Sethian. A Fast Marching Level Set Method for Monotonically Advancing Fronts, Proc. Natl. Acad. Sci., 93, 4, pp.1591--1595, 1996/ref> is a numerical method created by James Sethian for solving boundary value problems ...

has been used in image segmentation, and this model has been improved (permitting both positive and negative propagation speeds) in an approach called the generalized fast marching method.

Variational methods

The goal of variational methods is to find a segmentation which is optimal with respect to a specific energy functional. The functionals consist of a data fitting term and a regularizing terms. A classical representative is the

Potts model In statistical mechanics, the Potts model, a generalization of the Ising model, is a model of interacting spins on a crystalline lattice. By studying the Potts model, one may gain insight into the behaviour of ferromagnets and certain other phenome ...

defined for an image

f

by :

\operatorname_u \gamma \,  \nabla u \, _0 + \int (u - f)^2 \, dx.

A minimizer

u^*

is a piecewise constant image which has an optimal tradeoff between the squared L2 distance to the given image

f

and the total length of its jump set. The jump set of

u^*

defines a segmentation. The relative weight of the energies is tuned by the parameter

\gamma >0

. The binary variant of the Potts model, i.e., if the range of

u

is restricted to two values, is often called Chan- Vese model. An important generalization is the Mumford-Shah model given by :

\operatorname_  \gamma , K,  +
 \mu \int_ , \nabla u, ^2 \, dx + \int (u - f)^2 \, dx.

The functional value is the sum of the total length of the segmentation curve

K

, the smoothness of the approximation

u

, and its distance to the original image

f

. The weight of the smoothness penalty is adjusted by

\mu > 0

. The Potts model is often called piecewise constant Mumford-Shah model as it can be seen as the degenerate case

\mu \to \infty

. The optimization problems are known to be NP-hard in general but near-minimizing strategies work well in practice. Classical algorithms are graduated non-convexity and Ambrosio-Tortorelli approximation.

Graph partitioning methods

Graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...

partitioning methods are an effective tools for image segmentation since they model the impact of pixel neighborhoods on a given cluster of pixels or pixel, under the assumption of homogeneity in images. In these methods, the image is modeled as a weighted,

undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' v ...

. Usually a pixel or a group of pixels are associated with

nodes In general, a node is a localized swelling (a "knot") or a point of intersection (a Vertex (graph theory), vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two ...

and

edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed by ...

weights define the (dis)similarity between the neighborhood pixels. The graph (image) is then partitioned according to a criterion designed to model "good" clusters. Each partition of the nodes (pixels) output from these algorithms are considered an object segment in the image; see

Segmentation-based object categorization The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying graph partitioni ...

. Some popular algorithms of this category are normalized cuts, random walker, minimum cut, isoperimetric partitioning,

minimum spanning tree-based segmentation Image segmentation strives to partition a digital image into regions of pixels with similar properties, e.g. homogeneity. The higher-level region representation simplifies image analysis tasks such as counting objects or detecting changes, because r ...

, and

segmentation-based object categorization The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying graph partitioni ...

Markov random fields

The application of

Markov random field In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to b ...

s (MRF) for images was suggested in early 1984 by Geman and Geman. Their strong mathematical foundation and ability to provide a global optimum even when defined on local features proved to be the foundation for novel research in the domain of image analysis, de-noising and segmentation. MRFs are completely characterized by their prior probability distributions, marginal probability distributions,

cliques A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...

, smoothing constraint as well as criterion for updating values. The criterion for image segmentation using MRFs is restated as finding the labelling scheme which has maximum probability for a given set of features. The broad categories of image segmentation using MRFs are supervised and unsupervised segmentation.

Supervised image segmentation using MRF and MAP

In terms of image segmentation, the function that MRFs seek to maximize is the probability of identifying a labelling scheme given a particular set of features are detected in the image. This is a restatement of the

maximum a posteriori estimation In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the b ...

method.

The generic algorithm for image segmentation using MAP is given below:

Optimization algorithms

Each optimization algorithm is an adaptation of models from a variety of fields and they are set apart by their unique cost functions. The common trait of cost functions is to penalize change in pixel value as well as difference in pixel label when compared to labels of neighboring pixels.

= Iterated conditional modes/gradient descent

= The

iterated conditional modes In statistics, iterated conditional modes is a deterministic algorithm for obtaining a configuration of a local maximum of the joint probability of a Markov random field. It does this by iteratively maximizing the probability of each variable condi ...

(ICM) algorithm tries to reconstruct the ideal labeling scheme by changing the values of each pixel over each iteration and evaluating the energy of the new labeling scheme using the cost function given below, :

\alpha(1-\delta(\ell_i-\ell_)+ \beta \Sigma_(1 - \delta(\ell_i,\ell_)).

where is the penalty for change in pixel label and is the penalty for difference in label between neighboring pixels and chosen pixel. Here

N(i)

is neighborhood of pixel i and is the Kronecker delta function. A major issue with ICM is that, similar to gradient descent, it has a tendency to rest over local maxima and thus not obtain a globally optimal labeling scheme.

= Simulated annealing (SA)

= Derived as an analogue of annealing in metallurgy,

simulated annealing Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. It ...

(SA) uses change in pixel label over iterations and estimates the difference in energy of each newly formed graph to the initial data. If the newly formed graph is more profitable, in terms of low energy cost, given by: :

\Delta U = U^\text - U^\text

\ell_i =
\begin
\ell^\text_i, & \text \Delta U \leq 0 ,\\
\ell^\text_i, & \text \Delta U > 0 \text \delta < e^,
\ell^\text_i
\end

the algorithm selects the newly formed graph. Simulated annealing requires the input of temperature schedules which directly affects the speed of convergence of the system, as well as energy threshold for minimization to occur.

= Alternative algorithms

= A range of other methods exist for solving simple as well as higher order MRFs. They include Maximization of Posterior Marginal, Multi-scale MAP estimation, Multiple Resolution segmentation and more. Apart from likelihood estimates, graph-cut using maximum flow and other highly constrained graph based methods exist for solving MRFs.

Image segmentation using MRF and expectation–maximization

The

expectation–maximization algorithm In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variabl ...

is utilized to iteratively estimate the a posterior probabilities and distributions of labeling when no training data is available and no estimate of segmentation model can be formed. A general approach is to use histograms to represent the features of an image and proceed as outlined briefly in this three-step algorithm: 1. A random estimate of the model parameters is utilized. 2. E step: Estimate class statistics based on the random segmentation model defined. Using these, compute the

conditional probability In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. This particular method relies on event B occur ...

of belonging to a label given the feature set is calculated using naive

Bayes' theorem In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For examp ...

. :

P(\lambda \mid f_i) = \frac

Here

\lambda \in \Lambda

, the set of all possible labels. 3. M step: The established relevance of a given feature set to a labeling scheme is now used to compute the a priori estimate of a given label in the second part of the algorithm. Since the actual number of total labels is unknown (from a training data set), a hidden estimate of the number of labels given by the user is utilized in computations. :

P(\lambda) = \frac

where

\Omega

is the set of all possible features. Sample segmentation HMRF-EM

Disadvantages of MAP and EM based image segmentation

# Exact MAP estimates cannot be easily computed. # Approximate MAP estimates are computationally expensive to calculate. # Extension to multi-class labeling degrades performance and increases storage required. # Reliable estimation of parameters for EM is required for global optima to be achieved. # Based on method of optimization, segmentation may cluster to local minima.

Watershed transformation

The watershed transformation considers the gradient magnitude of an image as a topographic surface. Pixels having the highest gradient magnitude intensities (GMIs) correspond to watershed lines, which represent the region boundaries. Water placed on any pixel enclosed by a common watershed line flows downhill to a common local intensity minimum (LIM). Pixels draining to a common minimum form a catch basin, which represents a segment.

Model-based segmentation

The central assumption of model-based approaches is that the structures of interest have a tendency towards a particular shape. Therefore, one can seek a probabilistic model that characterizes the shape and its variation. When segmenting an image, constraints can be imposed using this model as a prior. Such a task may involve (i) registration of the training examples to a common pose, (ii) probabilistic representation of the variation of the registered samples, and (iii) statistical inference between the model and the image. Other important methods in the literature for model-based segmentation include

active shape model Active shape models (ASMs) are statistical models of the shape of objects which iteratively deform to fit to an example of the object in a new image, developed by Tim Cootes and Chris Taylor in 1995. The shapes are constrained by the PDM (point dist ...

s and

active appearance model An active appearance model (AAM) is a computer vision algorithm for matching a statistical model of object shape and appearance to a new image. They are built during a training phase. A set of images, together with coordinates of landmarks that ap ...

Multi-scale segmentation

Image segmentations are computed at multiple scales in

scale space Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theor ...

and sometimes propagated from coarse to fine scales; see

scale-space segmentation Scale-space segmentation or multi-scale segmentation is a general framework for signal and image segmentation, based on the computation of image descriptors at multiple scales of smoothing. One-dimensional hierarchical signal segmentation Wit ...

. Segmentation criteria can be arbitrarily complex and may take into account global as well as local criteria. A common requirement is that each region must be connected in some sense.

One-dimensional hierarchical signal segmentation

Witkin's seminal work in scale space included the notion that a one-dimensional signal could be unambiguously segmented into regions, with one scale parameter controlling the scale of segmentation. A key observation is that the zero-crossings of the second derivatives (minima and maxima of the first derivative or slope) of multi-scale-smoothed versions of a signal form a nesting tree, which defines hierarchical relations between segments at different scales. Specifically, slope extrema at coarse scales can be traced back to corresponding features at fine scales. When a slope maximum and slope minimum annihilate each other at a larger scale, the three segments that they separated merge into one segment, thus defining the hierarchy of segments.

Image segmentation and primal sketch

There have been numerous research works in this area, out of which a few have now reached a state where they can be applied either with interactive manual intervention (usually with application to medical imaging) or fully automatically. The following is a brief overview of some of the main research ideas that current approaches are based upon. The nesting structure that Witkin described is, however, specific for one-dimensional signals and does not trivially transfer to higher-dimensional images. Nevertheless, this general idea has inspired several other authors to investigate coarse-to-fine schemes for image segmentation. Koenderink proposed to study how iso-intensity contours evolve over scales and this approach was investigated in more detail by Lifshitz and Pizer. Unfortunately, however, the intensity of image features changes over scales, which implies that it is hard to trace coarse-scale image features to finer scales using iso-intensity information. LindebergLindeberg, Tony, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994
studied the problem of linking local extrema and saddle points over scales, and proposed an image representation called the scale-space primal sketch which makes explicit the relations between structures at different scales, and also makes explicit which image features are stable over large ranges of scale including locally appropriate scales for those. Bergholm proposed to detect edges at coarse scales in scale-space and then trace them back to finer scales with manual choice of both the coarse detection scale and the fine localization scale. Gauch and Pizer studied the complementary problem of ridges and valleys at multiple scales and developed a tool for interactive image segmentation based on multi-scale watersheds. The use of multi-scale watershed with application to the gradient map has also been investigated by Olsen and Nielsen and been carried over to clinical use by Dam. Vincken et al. proposed a hyperstack for defining probabilistic relations between image structures at different scales. The use of stable image structures over scales has been furthered by Ahuja and his co-workers into a fully automated system. A fully automatic brain segmentation algorithm based on closely related ideas of multi-scale watersheds has been presented by Undeman and Lindeberg and been extensively tested in brain databases. These ideas for multi-scale image segmentation by linking image structures over scales have also been picked up by Florack and Kuijper. Bijaoui and Rué associate structures detected in scale-space above a minimum noise threshold into an object tree which spans multiple scales and corresponds to a kind of feature in the original signal. Extracted features are accurately reconstructed using an iterative conjugate gradient matrix method.

Semi-automatic segmentation

In one kind of segmentation, the user outlines the region of interest with the mouse clicks and algorithms are applied so that the path that best fits the edge of the image is shown. Techniques like SIOX, Livewire, Intelligent Scissors or IT-SNAPS are used in this kind of segmentation. In an alternative kind of semi-automatic segmentation, the algorithms return a spatial-taxon (i.e. foreground, object-group, object or object-part) selected by the user or designated via prior probabilities.

Trainable segmentation

Most of the aforementioned segmentation methods are based only on color information of pixels in the image. Humans use much more knowledge when performing image segmentation, but implementing this knowledge would cost considerable human engineering and computational time, and would require a huge

domain knowledge Domain knowledge is knowledge of a specific, specialized discipline or field, in contrast to general (or domain-independent) knowledge. The term is often used in reference to a more general discipline—for example, in describing a software engin ...

database which does not currently exist. Trainable segmentation methods, such as

neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...

segmentation, overcome these issues by modeling the domain knowledge from a dataset of labeled pixels. An image segmentation

can process small areas of an image to extract simple features such as edges.

Mahinda Pathegama Mahinda may refer to: People * Mahinda Rajapaksa, former Sri Lankan President *Mahinda Amaraweera, Sri Lankan politician *Mahinda Samarasinghe, Sri Lankan politician *Mahinda Ratnatilaka, Sri Lankan politician *Mahinda Wijesekara, Sri Lankan politi ...

& Ö Göl (2004): "Edge-end pixel extraction for edge-based image segmentation", ''Transactions on Engineering, Computing and Technology,'' vol. 2, pp 213–216, ISSN 1305-5313 Another neural network, or any decision-making mechanism, can then combine these features to label the areas of an image accordingly. A type of network designed this way is the

Kohonen map A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the to ...

. Pulse-coupled neural networks (PCNNs) are neural models proposed by modeling a cat’s visual cortex and developed for high-performance

biomimetic Biomimetics or biomimicry is the emulation of the models, systems, and elements of nature for the purpose of solving complex human problems. The terms "biomimetics" and "biomimicry" are derived from grc, βίος (''bios''), life, and μίμησ ...

image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...

. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of a cat’s visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, the Eckhorn model was adapted to be an image processing algorithm by John L. Johnson, who termed this algorithm Pulse-Coupled Neural Network. Over the past decade, PCNNs have been utilized for a variety of image processing applications, including: image segmentation, feature generation, face extraction, motion detection, region growing, noise reduction, and so on. A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in an input image, receiving its corresponding pixel’s color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until it exceeds a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. The temporal series of pulse outputs contain information of input images and can be utilized for various image processing applications, such as image segmentation and feature generation. Compared with conventional image processing means, PCNNs have several significant merits, including robustness against noise, independence of geometric variations in input patterns, capability of bridging minor intensity variations in input patterns, etc.

U-Net U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg. The network is based on the fully convolutional network and its architecture was modifie ...

is a

convolutional neural network In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Netwo ...

which takes as input an image and outputs a label for each pixel. U-Net initially was developed to detect cell boundaries in biomedical images. U-Net follows classical

autoencoder An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder lear ...

architecture, as such it contains two sub-structures. The encoder structure follows the traditional stack of convolutional and max pooling layers to increase the receptive field as it goes through the layers. It is used to capture the context in the image. The decoder structure utilizes transposed convolution layers for upsampling so that the end dimensions are close to that of the input image. Skip connections are placed between convolution and transposed convolution layers of the same shape in order to preserve details that would have been lost otherwise. In addition to pixel-level semantic segmentation tasks which assign a given category to each pixel, modern segmentation applications include instance-level semantic segmentation tasks in which each individual in a given category must be uniquely identified, as well as panoptic segmentation tasks which combines these two tasks to provide a more complete scene segmentation.

Segmentation of related images and videos

Related images such as a photo album or a sequence of video frames often contain semantically similar objects and scenes, therefore it is often beneficial to exploit such correlations. The task of simultaneously segmenting scenes from related images or video frames is termed co-segmentation, which is typically used in human action localization. Unlike conventional

bounding box In geometry, the minimum or smallest bounding or enclosing box for a point set in dimensions is the box with the smallest measure (area, volume, or hypervolume in higher dimensions) within which all the points lie. When other kinds of measure a ...

-based

object detection Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched ...

, human action localization methods provide finer-grained results, typically per-image segmentation masks delineating the human object of interest and its action category (e.g., ''Segment-Tube''). Techniques such as dynamic

Markov Networks In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to b ...

CNN CNN (Cable News Network) is a multinational cable news channel headquartered in Atlanta, Georgia, U.S. Founded in 1980 by American media proprietor Ted Turner and Reese Schonfeld as a 24-hour cable news channel, and presently owned by the M ...

and

LSTM Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) ca ...

are often employed to exploit the inter-frame correlations.

Other methods

There are many other methods of segmentation like

multispectral segmentation Multispectral segmentation is a method for differentiating tissue classes of similar characteristics in a single imaging modality using several independent images of the same anatomical slice in different modalities (e.g., T2, proton density, T ...

or connectivity-based segmentation based on DTI images.Menke, RA, Jbabdi, S, Miller, KL, Matthews, PM and Zarei, M.: , Neuroimage, 52:4, pp. 1175–80, 2010.]

Notes

References

3D Entropy Based Image Segmentation
*

External links

by Syed Zainudeen. University Technology of Malaysia.
Generalized Fast Marching method
by Forcadel et al.

008 008, OO8, O08, or 0O8 may refer to: * The Streetwear Brand @008us , inspired by Ian Fleming & Virgil Abloh *"030", the fictional 030 Agent of MI6 * '' 038: Operation Exterminate'', a 1965 Italian action film * '' Explosivo 030'' a 1940 Argentine c ...

for applications in image segmentation.
Image Processing Research Group
An Online Open Image Processing Research Community.

an
Minimizing energy to segment images
by Mathworks
More image segmentation methods with detailed algorithms
by Yu-Hsiang Wang (王昱翔), National Taiwan University, Taipei, Taiwan, ROC

by IPOL Journal {{DEFAULTSORT:Segmentation (Image Processing) Digital imaging