Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of

graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...

, where subsets of connected components are uniquely labeled based on a given

heuristic A heuristic or heuristic technique (''problem solving'', '' mental shortcut'', ''rule of thumb'') is any approach to problem solving that employs a pragmatic method that is not fully optimized, perfected, or rationalized, but is nevertheless ...

. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in

computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...

to detect connected

region In geography, regions, otherwise referred to as areas, zones, lands or territories, are portions of the Earth's surface that are broadly divided by physical characteristics (physical geography), human impact characteristics (human geography), and ...

s in

binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two values (0 and 1) for each digit * Binary function, a function that takes two arguments * Binary operation, a mathematical op ...

digital image A digital image is an image composed of picture elements, also known as pixels, each with '' finite'', '' discrete quantities'' of numeric representation for its intensity or gray level that is an output from its two-dimensional functions f ...

s, although

color image A digital image is an image composed of picture elements, also known as pixels, each with '' finite'', '' discrete quantities'' of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fe ...

s and data with higher dimensionality can also be processed. When integrated into an

image recognition Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form o ...

system or human-computer interaction interface, connected component labeling can operate on a variety of information. Blob extraction is generally performed on the resulting

binary image A binary image is a digital image that consists of pixels that can have one of exactly two colors, usually black and white. Each pixel is stored as a single bit — i.e. either a 0 or 1. A binary image can be stored in memory as a bitmap: a p ...

from a thresholding step, but it can be applicable to gray-scale and color images as well. Blobs may be counted, filtered, and tracked. Blob extraction is related to but distinct from

blob detection In computer vision and image processing, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a ''blob'' is a region of an ...

Overview

A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors. Connectivity is determined by the medium; image graphs, for example, can be

4-connected neighborhood In image processing, pixel connectivity is the way in which pixels in 2-dimensional (or hypervoxels in n-dimensional) images relate to their neighbors. Formulation In order to specify a set of connectivities, the dimension and the width of t ...

8-connected neighborhood In image processing, pixel connectivity is the way in which pixels in 2-dimensional (or hypervoxels in n-dimensional) images relate to their neighbors. Formulation In order to specify a set of connectivities, the dimension and the width of t ...

. Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed .

Definition

The usage of the term connected-components labeling (CCL) and its definition is quite consistent in the academic literature, whereas connected-components analysis (CCA) varies in terms of both terminology and problem definition. Rosenfeld et al. define connected components labeling as the “ eation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label.” Shapiro et al. define CCL as an operator whose “input is a binary image and ..output is a symbolic image in which the label assigned to each pixel is an integer uniquely identifying the connected component to which that pixel belongs.” There is no consensus on the definition of CCA in the academic literature. It is often used interchangeably with CCL. A more extensive definition is given by Shapiro et al.: “Connected component analysis consists of connected component labeling of the black pixels followed by property measurement of the component regions and decision making.” The definition for connected-component analysis presented here is more general, taking the thoughts expressed in into account.

Algorithms

The algorithms discussed can be generalized to arbitrary dimensions, albeit with increased time and space complexity.

One component at a time

This is a fast and very simple method to implement and understand. It is based on

graph traversal In computer science, graph traversal (also known as graph search) refers to the process of visiting (checking and/or updating) each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. Tree traversa ...

methods in graph theory. In short, once the first pixel of a connected component is found, all the connected pixels of that connected component are labelled before going onto the next pixel in the image. This algorithm is part of Vincent and Soille's watershed segmentation algorithm, other implementations also exist. In order to do that a

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...

is formed that will keep the indexes of the pixels that are connected to each other, steps (2) and (3) below. The method of defining the linked list specifies the use of a depth or a

breadth Length is a measure of distance. In the International System of Quantities, length is a quantity with dimension distance. In most systems of measurement a base unit for length is chosen, from which all other units are derived. In the Interna ...

first search. For this particular application, there is no difference which strategy to use. The simplest kind of a last in first out queue implemented as a

singly linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...

will result in a depth first search strategy. It is assumed that the input image is a

, with pixels being either background or foreground and that the connected components in the foreground pixels are desired. The algorithm steps can be written as: # Start from the first pixel in the image. Set current label to 1. Go to (2). # If this pixel is a foreground pixel and it is not already labelled, give it the current label and add it as the first element in a queue, then go to (3). If it is a background pixel or it was already labelled, then repeat (2) for the next pixel in the image. # Pop out an element from the queue, and look at its neighbours (based on any type of connectivity). If a neighbour is a foreground pixel and is not already labelled, give it the current label and add it to the queue. Repeat (3) until there are no more elements in the queue. # Go to (2) for the next pixel in the image and increment current label by 1. Note that the pixels are labelled before being put into the queue. The queue will only keep a pixel to check its neighbours and add them to the queue if necessary. This algorithm only needs to check the neighbours of each foreground pixel once and doesn't check the neighbours of background pixels. The

pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...

is: algorithm OneComponentAtATime(data) ''input'' : imageData

Dim Dimness is a measure of an object's luminosity. Dim or dimness may refer to: Computing * .dim, a disk image * A keyword in most versions of the BASIC programming language Chemistry, biology, and medicine * 3,3'-Diindolylmethane, an anticarcinoge ...

yDim] ''initialization'' : label = 0, labelArray

yDim] = 0, statusArray

yDim] = false, queue1, queue2; for i = 0 to xDim do for j = 0 to yDim do if imageData j] has not been processed do if imageData j] is a foreground pixel do check its four neighbors(north, south, east, west) : if neighbor is not processed do if neighbor is a foreground pixel do add it to queue1 else update its status to processed end if labelArray j] = label (give label) statusArray j] = true (update status) while queue1 is not empty do For each pixel in the queue do : check its four neighbors if neighbor is not processed do if neighbor is a foreground pixel do add it to queue2 else update its status to processed end if give it the current label update its status to processed remove the current element from queue1 copy queue2 into queue1 end While increase the label end if else update its status to processed end if end if end if end for end for

Two-pass

Relatively simple to implement and understand, the two-pass algorithm, (also known as the

Hoshen–Kopelman algorithm The Hoshen–Kopelman algorithm is a simple and efficient algorithm for labeling clusters on a grid, where the grid is a regular network of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known union-f ...

) iterates through 2-dimensional binary data. The algorithm makes two passes over the image: the first pass to assign temporary labels and record equivalences, and the second pass to replace each temporary label by the smallest label of its equivalence class. The input data can be modified ''

in situ is a Latin phrase meaning 'in place' or 'on site', derived from ' ('in') and ' ( ablative of ''situs'', ). The term typically refers to the examination or occurrence of a process within its original context, without relocation. The term is use ...

'' (which carries the risk of

data corruption Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of meas ...

), or labeling information can be maintained in an additional data structure. Connectivity checks are carried out by checking neighbor pixels' labels (neighbor elements whose labels are not assigned yet are ignored), or say, the north-east, the north, the north-west and the west of the current pixel (assuming 8-connectivity). 4-connectivity uses only north and west neighbors of the current pixel. The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed) Conditions to check: #Does the pixel to the left (west) have the same value as the current pixel? ##Yes – We are in the same region. Assign the same label to the current pixel ##No – Check next condition #Do both pixels to the north and west of the current pixel have the same value as the current pixel but not the same label? ##Yes – We know that the north and west pixels belong to the same region and must be merged. Assign the current pixel the minimum of the north and west labels, and record their equivalence relationship ##No – Check next condition #Does the pixel to the left (west) have a different value and the one to the north the same value as the current pixel? ##Yes – Assign the label of the north pixel to the current pixel ##No – Check next condition #Do the pixel's north and west neighbors have different pixel values than current pixel? ##Yes – Create a new label id and assign it to the current pixel The algorithm continues this way, and creates new region labels whenever necessary. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships. Union-find essentially stores labels which correspond to the same blob in a

disjoint-set data structure In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of Disjoint sets, disjoint (non-overlapping) Set (mathematics), sets. Equivalently, it ...

, making it easy to remember the equivalence of two labels by the use of an interface method E.g.: findSet(l). findSet(l) returns the minimum label value that is equivalent to the function argument 'l'. Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element. A faster-scanning algorithm for connected-region extraction is presented below. On the first pass: #Iterate through each element of the data by column, then by row (Raster Scanning) #If the element is not the background ## Get the neighboring elements of the current element ## If there are no neighbors, uniquely label the current element and continue ## Otherwise, find the neighbor with the smallest label and assign it to the current element ## Store the equivalence between neighboring labels On the second pass: #Iterate through each element of the data by column, then by row #If the element is not the background ## Relabel the element with the lowest equivalent label Here, the background is a classification, specific to the data, used to distinguish salient elements from the foreground. If the background variable is omitted, then the two-pass algorithm will treat the background as another region.

Graphical example of two-pass algorithm

1. The array from which connected regions are to be extracted is given below (8-connectivity based). We first assign different binary values to elements in the graph. The values "0~1" at the center of each of the elements in the following graph are the elements' values, whereas the "1,2,...,7" values in the next two graphs are the elements' labels. The two concepts should not be confused. 2. After the first pass, the following labels are generated: A total of 7 labels are generated in accordance with the conditions highlighted above. The label equivalence relationships generated are, 3. Array generated after the merging of labels is carried out. Here, the label value that was the smallest for a given region "floods" throughout the connected region and gives two distinct labels, and hence two distinct labels. 4. Final result in color to clearly see two different regions that have been found in the array. Screenshot-Figure 1

The

is: algorithm TwoPass(data) is linked = [] labels = structure with dimensions of data, initialized with the value of Background NextLabel = 0 ''First pass'' for row in data do for column in row do if data owcolumn] is not Background then neighbors = connected elements with the current element's value if neighbors is empty then linked extLabel= ''set'' containing NextLabel labels owcolumn] = NextLabel NextLabel += 1 else ''Find the smallest label'' L = neighbors labels labels owcolumn] = ''min''(L) for label in L do linked

abel Abel ( ''Hébel'', in pausa ''Hā́ḇel''; ''Hábel''; , ''Hābēl'') is a biblical figure in the Book of Genesis within the Abrahamic religions. Born as the second son of Adam and Eve, the first two humans created by God in Judaism, God, he ...

= ''union''(linked

L) ''Second pass'' for row in data do for column in row do if data owcolumn] is not Background then labels owcolumn] = ''find''(labels owcolumn]) return labels The ''find'' and ''union'' algorithms are implemented as described in union find.

Sequential algorithm

Create a region counter Scan the image (in the following example, it is assumed that scanning is done from left to right and from top to bottom): * For every pixel check the ''north'' and ''west'' pixel (when considering 4- connectivity) or the ''northeast'', ''north'', ''northwest'', and ''west'' pixel for 8-connectivity for a given region criterion (i.e. intensity value of 1 in binary image, or similar intensity to connected pixels in gray-scale image). * If none of the neighbors fit the criterion then assign to region value of the region counter. Increment region counter. * If only one neighbor fits the criterion assign pixel to that region. * If multiple neighbors match and are all members of the same region, assign pixel to their region. * If multiple neighbors match and are members of different regions, assign pixel to one of the regions (it doesn't matter which one). Indicate that all of these regions are equivalent. * Scan image again, assigning all equivalent regions the same region value.

Others

Some of the steps present in the two-pass algorithm can be merged for efficiency, allowing for a single sweep through the image. Multi-pass algorithms also exist, some of which run in

linear time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...

relative to the number of image pixels. In the early 1990s, there was considerable interest in parallelizing connected-component algorithms in

image analysis Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading barcode, bar coded tags or a ...

applications, due to the bottleneck of sequentially processing each pixel. The interest to the algorithm arises again with an extensive use of CUDA.

Pseudocode for the one-component-at-a-time algorithm

Algorithm: #Connected-component matrix is initialized to size of image matrix. #A mark is initialized and incremented for every detected object in the image. #A counter is initialized to count the number of objects. #A row-major scan is started for the entire image. #If an object pixel is detected, then following steps are repeated while (Index !=0) ##Set the corresponding pixel to 0 in Image. ##A vector (Index) is updated with all the neighboring pixels of the currently set pixels. ##Unique pixels are retained and repeated pixels are removed. ##Set the pixels indicated by Index to mark in the connected-component matrix. #Increment the marker for another object in the image. ''One-Component-at-a-Time''(''image'')

, N The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

:= size(''image'') ''connected'' := zeros(M, N) ''mark'' := value ''difference'' := increment ''offsets'' := 1; M; 1; -M ''index'' := [] ''no_of_objects'' := 0 for i: 1:M do for j: 1:N do if (''image''(i, j)

1) then ''no_of_objects'' := ''no_of_objects'' + 1 ''index'' := (j-1) × M + i) ''connected''(''index'') := ''mark'' while ~isempty(''index'') do ''image''(''index'') := 0 ''neighbors'' := bsxfun(@plus, ''index'', ''offsets'') ''neighbors'' := unique(''neighbors''(:)) ''index'' := ''neighbors''(find(''image''(''neighbors''))) ''connected''(''index'') := ''mark'' end while ''mark'' := ''mark'' + ''difference'' end if end for end for The run time of the algorithm depends on the size of the image and the amount of foreground. The time complexity is comparable to the two pass algorithm if the foreground covers a significant part of the image. Otherwise the time complexity is lower. However, memory access is less structured than for the two-pass algorithm, which tends to increase the run time in practice.

Performance evaluation

In the last two decades many novel approaches to connected-component labeling have been proposed, but almost none of them have been subjected to a comparative performance assessment using the same data set. YACCLAB (acronym for Yet Another Connected Components Labeling Benchmark) is an example of

C++ C++ (, pronounced "C plus plus" and sometimes abbreviated as CPP or CXX) is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup. First released in 1985 as an extension of the C programmin ...

open source framework which collects, runs, and tests connected-component labeling algorithms.

Hardware architectures

The emergence of

FPGAs A field-programmable gate array (FPGA) is a type of configurable integrated circuit that can be repeatedly programmed after manufacturing. FPGAs are a subset of logic devices referred to as programmable logic devices (PLDs). They consist of a ...

with enough capacity to perform complex image processing tasks also led to high-performance architectures for connected-component labeling. Most of these architectures utilize the single pass variant of this algorithm, because of the limited memory resources available on an

FPGA A field-programmable gate array (FPGA) is a type of configurable integrated circuit that can be repeatedly programmed after manufacturing. FPGAs are a subset of logic devices referred to as programmable logic devices (PLDs). They consist of a ...

. These types of connected component labeling architectures can process several image pixels in parallel, thereby achieving high throughput and low processing latency.

References

General

* *

External links

Implementation in C#

about Extracting objects from image and Direct Connected Component Labeling Algorithm
{{Authority control Computer vision