Multi-focus image fusion is a
multiple image compression technique using input images with different
focus depths to make one output image that preserves all information.
Overview
In recent years,
image fusion
The image fusion process is defined as gathering all the important information from multiple images, and their inclusion into fewer images, usually a single one. This single image is more informative and accurate than any single source image, and i ...
has been used in many applications such as remote sensing,
surveillance, medical diagnosis, and photography applications.
Two major applications of image fusion in photography are fusion of multi-focus images and
multi-exposure
In photography and cinematography, a multiple exposure is the superimposition of two or more exposures to create a single image, and double exposure has a corresponding meaning in respect of two images. The exposure values may or may not be ide ...
images.
The main idea of image fusion is gathering important and the essential information from the input images into one single image which ideally has all of the information of the input images.
The research history of image fusion spans over 30 years and many scientific papers.
Image fusion generally has two aspects: image fusion methods and objective evaluation metrics.

In
visual sensor networks (VSN), sensors are cameras which record images and video sequences. In many applications of VSN, a camera can't give a perfect
illustration
An illustration is a decoration, interpretation or visual explanation of a text, concept or process, designed for integration in print and digital published media, such as posters, flyers, magazines, books, teaching materials, animations, vide ...
including all details of the scene. This is because of the limited depth of focus of the
optical lens
A lens is a transmissive optical device which focuses or disperses a light beam by means of refraction. A simple lens consists of a single piece of transparent material, while a compound lens consists of several simple lenses (''elements''), ...
of cameras. Therefore, just the object located in the
focal length
The focal length of an optical system is a measure of how strongly the system converges or diverges light; it is the inverse of the system's optical power. A positive focal length indicates that a system converges light, while a negative foca ...
of camera is focused and clear, and other parts of the image are blurred.
VSN captures images with different depths of focus using several cameras. Due to the large amount of data generated by cameras compared to other sensors such as pressure and temperature sensors and some limitations of
bandwidth
Bandwidth commonly refers to:
* Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range
* Bandwidth (computing), the rate of data transfer, bit rate or thr ...
, energy consumption and processing time, it is essential to process the local input images to decrease the amount of transmitted data.
Much research on multi-focus image fusion has been done in recent years and can be classified into two categories: transform and spatial domains. Commonly used transforms for image fusion are
Discrete cosine transform (DCT) and Multi-Scale Transform (MST).
Recently,
Deep learning (DL) has been thriving in several image processing and
computer vision
Computer vision is an Interdisciplinarity, interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate t ...
applications.
Multi-Focus image fusion in the spatial domain
Huang and Jing have reviewed and applied several focus measurements in the spatial domain for the multi-focus image fusion process, suitable for real-time applications. They mentioned some focus measurements including
variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of number ...
, energy of
image gradient
An image gradient is a directional change in the intensity or color in an image. The gradient of the image is one of the fundamental building blocks in image processing. For example, the Canny edge detector uses image gradient for edge detection. ...
(EOG), Tenenbaum's algorithm (Tenengrad), energy of
Laplacian
In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols \nabla\cdot\nabla, \nabla^2 (where \nabla is ...
(EOL), sum-modified-
Laplacian
In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols \nabla\cdot\nabla, \nabla^2 (where \nabla is ...
(SML), and
spatial frequency
In mathematics, physics, and engineering, spatial frequency is a characteristic of any structure that is periodic across position in space. The spatial frequency is a measure of how often sinusoidal components (as determined by the Fourier tr ...
(SF). Their experiments showed that EOL gave better results than other methods like variance and spatial frequency.
Multi-Focus image fusion in multi-scale transform and DCT domain
Image fusion based on the multi-scale transform is the most commonly used and promising technique. Laplacian
pyramid
A pyramid (from el, πυραμίς ') is a structure whose outer surfaces are triangular and converge to a single step at the top, making the shape roughly a pyramid in the geometric sense. The base of a pyramid can be trilateral, quadrila ...
transform, gradient pyramid-based transform, morphological pyramid transform and the premier ones, discrete
wavelet transform
In mathematics, a wavelet series is a representation of a square-integrable ( real- or complex-valued) function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal ...
,
shift-invariant wavelet transform (SIDWT), and
discrete cosine
Discrete may refer to:
*Discrete particle or quantum in physics, for example in quantum theory
*Discrete device, an electronic component with just one circuit element, either passive or active, other than an integrated circuit
*Discrete group, a ...
harmonic wavelet transform In the mathematics of signal processing, the harmonic wavelet transform, introduced by David Edward Newland in 1993, is a wavelet-based linear transformation of a given function into a time-frequency representation. It combines advantages of the ...
(DCHWT) are some examples of image fusion methods based on multi-scale transform.
These methods are complex and have some limitations e.g. processing time and energy consumption. For example, multi-focus image fusion methods based on DWT require a lot of
convolution
In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions ( and ) that produces a third function (f*g) that expresses how the shape of one is modified by the other. The term ''convolution' ...
operations, so they take more time and energy to process. Therefore, most methods in multi-scale transform are not suitable for real-time applications.
Moreover, these methods are not very successful along edges, due to the
wavelet
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the num ...
transform process missing the edges of the image. They create ringing
artefacts in the output image and reduce its quality.
Due to the aforementioned problems in the multi-scale transform methods, researchers are interested in multi-focus image fusion in the DCT domain. DCT-based methods are more efficient in terms of transmission and archiving images coded in
Joint Photographic Experts Group
The Joint Photographic Experts Group (JPEG) is the joint committee between ISO/ IEC JTC 1/ SC 29 and ITU-T Study Group 16 that created and maintains the JPEG, JPEG 2000, JPEG XR, JPEG XT, JPEG XS, JPEG XL, and related digital image standar ...
(JPEG) standard to the upper node in the VSN agent. A JPEG system consists of a pair of an
encoder Encoder may refer to:
Electronic circuits
* Audio encoder, converts digital audio to analog audio signals
* Video encoder, converts digital video to analog video signals
* Simple encoder, assigns a binary code to an active input line
* Priority e ...
and a decoder. In the encoder, images are divided into non-overlapping 8×8 blocks, and the DCT
coefficients
In mathematics, a coefficient is a multiplicative factor in some term of a polynomial, a series, or an expression; it is usually a number, but may be any expression (including variables such as , and ). When the coefficients are themselves ...
are calculated for each. Since the quantization of DCT coefficients is a
lossy
In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...
process, many of the small-valued DCT coefficients are quantized to zero, which corresponds to high frequencies. DCT-based image fusion algorithms work better when the multi-focus image fusion methods are applied in the compressed domain.
In addition, in the spatial-based methods, the input images must be decoded and then transferred to the spatial domain. After implementation of the image fusion operations, the output fused images must again be encoded. DCT domain-based methods do not require complex and time-consuming consecutive
decoding
Decoding or decode may refer to: is the process of converting code into plain text or any format that is useful for subsequent processes.
Science and technology
* Decoding, the reverse of encoding
* Parsing, in computer science
* Digital-to-analog ...
and encoding operations. Therefore, the image fusion methods based on DCT domain operate with much less energy and processing time.
Recently, a lot of research has been carried out in the DCT domain. DCT+Variance, DCT+Corr_Eng, DCT+EOL, and DCT+VOL are some prominent examples of DCT based methods.
Multi-Focus image fusion using Deep Learning
Nowadays, the deep learning is utilized in image fusion applications such as multi-focus image fusion. Liu et al. were the first researchers that used CNN for multi-focus image fusion. They used the Siamese architecture for comparing the focused and unfocused patches.
C. Du et al. submitted MSCNN method that obtains the initial segmented decision map with
image segmentation
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects ( sets of pixels). The goal of segmentation is to simpl ...
between the focused and unfocused patches through the multi-scale convolution
neural network
A neural network is a network or neural circuit, circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up ...
. H. Tang et al. introduced the
pixel
In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device.
In most digital display devices, pixels are the s ...
-wise convolution neural network (p-CNN) for classification of the focused and unfocused patches.
All of these CNN based multi-focus image fusion methods have enhanced the decision map. Nevertheless, their initial segmented decision maps have a lot of weakness and errors. Therefore, satisfaction of their final fusion decision map depends to use vast post-processing algorithms such as Consistency Verification (CV),
morphological operations, watershed, guiding filters, and small region removal on the initial segmented decision map. Along with the CNN based multi-focus image fusion methods, fully
convolutional network (FCN) is also utilized in multi-focus image fusion.
[{{Cite journal, last1=Guo, first1=Xiaopeng, last2=Nie, first2=Rencan, last3=Cao, first3=Jinde, last4=Zhou, first4=Dongming, last5=Qian, first5=Wenhua, date=2018-06-12, title=Fully Convolutional Network-Based Multifocus Image Fusion, journal=Neural Computation, volume=30, issue=7, pages=1775–1800, doi=10.1162/neco_a_01098, pmid=29894654, s2cid=48358558 , issn=0899-7667]
ECNN: Ensemble of CNN for Multi-Focus Image Fusion

The
Convolutional Neural Networks
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Networ ...
(CNNs) based multi-focus image fusion methods have recently attracted enormous attention. They greatly enhanced the constructed decision map compared with the previous state of the art methods that have been done in the spatial and transform domains. Nevertheless, these methods have not reached to the satisfactory initial decision map, and they need to undergo vast post-processing
algorithms
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
to achieve a satisfactory decision map.
In the method of ECNN, a novel CNNs based method with the help of the
ensemble learning
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
Unlike a statistical ensemble in statist ...
is proposed. It is very reasonable to use various models and datasets rather than just one. The ensemble learning based methods intend to pursue increasing diversity among the models and datasets in order to decrease the problem of the overfitting on the training
dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the d ...
.
It is obvious that the results of an ensemble of CNNs are better than just one single CNNs. Also, the proposed method introduces a new simple type of multi-focus images dataset. It simply changes the arranging of the patches of the multi-focus datasets, which is very useful for obtaining the better accuracy. With this new type arrangement of datasets, the three different datasets including the original and the
Gradient
In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p is the "direction and rate of fastest increase". If the gr ...
in directions of vertical and horizontal patches are generated from the
COCO
Coco commonly refers to:
* Coco (folklore), a mythical bogeyman in many Hispano- and Lusophone nations
Coco may also refer to:
People
* Coco (given name), a first name, its shorthand, or unrelated nickname
* Coco (surname), a list of people wi ...
dataset. Therefore, the proposed method introduces a new network that three CNNs models which have been trained on three different created datasets to construct the initial segmented decision map. These ideas greatly improve the initial segmented decision map of the proposed method which is similar, or even better than, the other final decision map of CNNs based methods obtained after applying many post-processing algorithms. Many real multi-focus test images are used in our experiments, and the results are compared with quantitative and qualitative criteria. The obtained experimental results indicate that the proposed CNNs based network is more accurate and have the better decision map without post-processing algorithms than the other existing state of the art multi-focus fusion methods which used many post-processing algorithms.

This method introduces a new network for achieving the cleaner initial segmented decision map compared with the others. The pro- posed method introduces a new architecture which uses an ensemble of three CNNs trained on three different datasets. Also, the proposed method prepares a new simple type of multi- focus image datasets for achieving the better fusion performance than the other popular multi-focus image datasets.
This idea is very helpful to achieve the better initial segmented decision map, which is the same or even better than the others initial segmented decision map by using vast post-processing algorithms.
External links
The source code of ECNN http://amin-naji.com/publications/ and https://github.com/mostafaaminnaji/ECNN
References
Image processing
Image compression