![Saliencymap example](https://upload.wikimedia.org/wikipedia/commons/1/1a/Saliencymap_example.jpg)
In computer vision, a saliency map is an image that highlights the region on which people's eyes focus first. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system. For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map. Saliency maps engineered in artificial or computer vision are typically not the same as the actual saliency map constructed by biological or natural vision.
Applications
Overview
Saliency maps have applications in a variety of different problems. Some general applications:
* Image and video compression: The human eye focuses only on a small region of interest in the frame. Therefore, it is not necessary to compress the entire frame with uniform quality. According to the authors, using a saliency map reduces the final size of the video while preserving the same visual perception.
* Image and video quality assessment: The main task for an image or video quality metric is a high correlation with user opinions. Differences in salient regions are given more importance and thus contribute more to the quality score.
* Image retargeting: It aims at resizing an image by expanding or shrinking the noninformative regions. Therefore, retargeting algorithms rely on the availability of saliency maps that accurately estimate all the salient image details.
* Object detection and recognition: Instead of applying a computationally complex algorithm to the whole image, we can apply it to the most salient regions of an image, which are most likely to contain an object.
Saliency as a segmentation problem
Saliency estimation may be viewed as an instance of image segmentation. In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
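As a minimal illustration of this labeling view (a hypothetical two-class toy example, not a full segmentation algorithm), every pixel can be assigned a label such that pixels sharing a label share a characteristic:

```python
import numpy as np

# Toy grayscale image with a dark region and a bright region.
img = np.array([[10, 12, 200],
                [11, 199, 201]])

# Segmentation as per-pixel labeling: pixels sharing a label share a
# characteristic -- here, simply whether their intensity exceeds a threshold.
labels = (img > 128).astype(int)

print(labels)
```

Real segmentation algorithms replace the threshold with richer criteria (color similarity, spatial proximity, learned features), but the output has the same form: one label per pixel.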
Algorithms
Overview
There are three forms of classic saliency estimation algorithms implemented in OpenCV:
* Static saliency: Relies on image features and statistics to localize the regions of interest of an image.
* Motion saliency: Relies on motion in a video, detected by optical flow. Objects that move are considered salient.
* Objectness: Reflects how likely an image window is to cover an object. These algorithms generate a set of bounding boxes of where an object may lie in an image.
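As a sketch of the static-saliency idea, the spectral residual method (the basis of one of OpenCV's static saliency implementations) can be reproduced in a few lines of NumPy. This is an illustrative reimplementation under simplifying assumptions (a 3×3 box filter for the average spectrum), not OpenCV's exact code:

```python
import numpy as np

def spectral_residual_saliency(gray):
    """Static saliency via the spectral residual idea: anomalies in the
    log-amplitude spectrum, relative to its local average, are salient.

    gray: 2D float array. Returns a map of the same shape, normalized to [0, 1].
    """
    f = np.fft.fft2(gray)
    log_amp = np.log1p(np.abs(f))   # log amplitude spectrum
    phase = np.angle(f)             # phase spectrum (kept unchanged)
    h, w = log_amp.shape
    # Local 3x3 box average of the log amplitude spectrum.
    pad = np.pad(log_amp, 1, mode="edge")
    avg = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    residual = log_amp - avg        # the "spectral residual"
    # Back to the spatial domain; squared magnitude gives the saliency.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

# A bright square on a dark background: its borders should stand out.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
saliency = spectral_residual_saliency(img)
print(saliency.shape)
```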
In addition to the classic approaches, neural-network-based methods are also popular. Examples of neural networks for motion saliency estimation:
* TASED-Net: Consists of two building blocks. First, the encoder network extracts low-resolution spatiotemporal features, and then the prediction network decodes the spatially encoded features while aggregating all the temporal information.
* STRA-Net: Emphasizes two essential issues. First, spatiotemporal features are integrated via appearance and optical flow coupling, and then multi-scale saliency is learned via an attention mechanism.
* STAViS: Combines spatiotemporal visual and auditory information. This approach employs a single network that learns to localize sound sources and to fuse the two saliencies to obtain the final saliency map.
Example implementation
First, we should calculate the distance of each pixel to the rest of the pixels in the same frame:

$$\mathrm{SALS}(I_k) = \sum_{i=1}^{N} |I_k - I_i|$$

where $I_k$ is the value of pixel $k$, in the range $[0, 255]$. The following equation is the expanded form of this equation:

$$\mathrm{SALS}(I_k) = |I_k - I_1| + |I_k - I_2| + \dots + |I_k - I_N|$$

where $N$ is the total number of pixels in the current frame. We can then restructure the formula by grouping together the pixels that share the same intensity value:

$$\mathrm{SALS}(I_k) = \sum_{n=0}^{255} F_n \, |I_k - I_n|$$

where $F_n$ is the frequency of intensity value $I_n$, and $n$ ranges over $[0, 255]$. The frequencies are expressed in the form of a histogram, which can be computed in $O(N)$ time.
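The restructured formula can be implemented directly. The following NumPy sketch (function and variable names are hypothetical, not from the original paper's code) builds the histogram once and evaluates the sum over the 256 intensity levels instead of over all $N$ pixels:

```python
import numpy as np

def pixel_saliency(frame):
    """Per-pixel saliency SALS(I_k) = sum_n F_n * |I_k - n| via a histogram.

    frame: 2D uint8 array. Runs in O(N + 256^2) rather than the naive O(N^2).
    """
    hist = np.bincount(frame.ravel(), minlength=256)  # F_n, computed in O(N)
    levels = np.arange(256)
    dist = np.abs(levels[:, None] - levels[None, :])  # |k - n| for all pairs
    sal_per_level = dist @ hist                       # sum_n F_n * |k - n|
    return sal_per_level[frame]                       # one lookup per pixel

frame = np.array([[0, 0, 255],
                  [0, 128, 255]], dtype=np.uint8)
sal = pixel_saliency(frame)
print(sal)
```

Because the saliency of a pixel depends only on its intensity, the 256-entry table `sal_per_level` covers every possible pixel value, and the final step is a constant-time lookup per pixel.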
Time complexity
This saliency map algorithm has $O(N)$ time complexity, where $N$ is the number of pixels in a frame. The histogram is computed in $O(N)$ time, and the subtraction and multiplication parts of the equation require 256 operations per intensity level, independent of $N$. Consequently, the time complexity of this algorithm is $O(N) + O(256 \times 256)$, which equals $O(N)$.
Pseudocode
All of the following code is pseudo MATLAB code. First, read data from the video sequences:

for k = 2 : 1 : 13 % from frame 2 to frame 13; k increases by one on every loop
    I = imread(currentfilename);   % read the current frame
    I1 = im2single(I);             % convert the image to single precision (required by vl_slic)
    J = imread(previousfilename);  % read the previous frame
    I2 = im2single(J);
    regionSize = 10;  % SLIC parameter (experimentally chosen): the superpixel size
    regularizer = 1;  % SLIC regularizer parameter
    segments1 = vl_slic(I1, regionSize, regularizer); % superpixels of the current frame
    segments2 = vl_slic(I2, regionSize, regularizer); % superpixels of the previous frame
    numsuppix = max(segments1(:)); % the number of superpixels in the current frame
    regstats1 = regionprops(segments1, 'all'); % region statistics based on segments1
    regstats2 = regionprops(segments2, 'all'); % region statistics based on segments2
end
After we read the data, we apply superpixel processing to each frame. spnum1 and spnum2 represent the number of superpixels in the current frame and in the previous frame, respectively.
% First, we calculate the centre distance of each superpixel pair.
% This is our core code.
for i = 1:1:spnum1 % over all superpixels of the current frame; i increases by one each loop
    for j = 1:1:spnum2 % over all superpixels of the previous frame
        centredist(i, j) = sum(abs(center(i) - center(j))); % calculate the centre distance
    end
end
Then we calculate the color distance of each superpixel pair; we call this process the contrast function.

for i = 1:1:spnum1 % over all superpixels of the current frame
    for j = 1:1:spnum2 % over all superpixels of the previous frame
        posdiff(i, j) = sum(abs(regstats1(j).Centroid' - mupwtd(:, i))); % calculate the color distance
    end
end
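The two nested loops above can also be vectorized. A rough NumPy equivalent (the arrays below are hypothetical stand-ins for the MATLAB variables, with made-up values) computes all pairwise centre and color distances at once via broadcasting:

```python
import numpy as np

# Hypothetical data: superpixel centres (2 x n arrays of x, y coordinates)
# and mean colors (3 x n arrays) for the current and previous frames.
centers_curr = np.array([[0.0, 10.0],
                         [0.0, 10.0]])  # 2 superpixels in the current frame
centers_prev = np.array([[1.0, 9.0],
                         [1.0, 9.0]])   # 2 superpixels in the previous frame
colors_curr = np.array([[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]])
colors_prev = np.array([[0.4, 0.4], [0.2, 0.2], [0.3, 0.3]])

# centredist[i, j] = sum(|center_i - center_j|): all pairwise L1 centre distances.
centredist = np.abs(centers_curr[:, :, None] - centers_prev[:, None, :]).sum(axis=0)

# posdiff[i, j]: pairwise L1 color distance between the two frames' superpixels.
posdiff = np.abs(colors_curr[:, :, None] - colors_prev[:, None, :]).sum(axis=0)

print(centredist.shape, posdiff.shape)
```

Broadcasting replaces the double loop with two array operations, which matters once the number of superpixels per frame grows.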
After these two processes, we get a saliency map, and then we store all of these maps into a new file folder.
Difference in algorithms
The major difference between saliency functions one and two is the contrast function. If spnum1 and spnum2 both refer to the number of superpixels in the current frame, the contrast function belongs to the first saliency function. If spnum1 refers to the current frame and spnum2 to the previous frame, it belongs to the second saliency function. For the third saliency function, we use the second contrast function (which uses the superpixels of the same frame to get the centre distance) to obtain a saliency map for each frame, and then subtract the previous frame's saliency map from the current frame's to get the new saliency result.
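The third saliency function amounts to a per-pixel difference of two per-frame saliency maps. A minimal sketch (with placeholder maps; the function name is hypothetical) could look like:

```python
import numpy as np

def motion_saliency(sal_curr, sal_prev):
    """Third saliency function: the current frame's saliency map minus the
    previous frame's (one could also take the absolute value, depending on
    whether disappearing saliency should count)."""
    return sal_curr - sal_prev

# Placeholder per-frame maps: a salient blob that moved one column to the right.
prev = np.zeros((4, 4)); prev[1:3, 0:2] = 1.0
curr = np.zeros((4, 4)); curr[1:3, 1:3] = 1.0
result = motion_saliency(curr, prev)
print(result)
```

Positive values mark regions that became salient in the current frame, negative values regions that stopped being salient.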
Datasets
A saliency dataset usually contains human eye movements recorded over a set of image sequences. Such datasets are valuable for creating new saliency algorithms and for benchmarking existing ones. The most important dataset parameters are spatial resolution, size, and the eye-tracking equipment used; see, for example, the MIT/Tübingen Saliency Benchmark datasets.
To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited. Observers must have normal or corrected-to-normal vision and must sit at the same distance from the screen. At the beginning of each recording session, the eye tracker is recalibrated: the observer fixates their gaze on the screen center. The session then starts, and saliency data are collected by showing the sequences and recording eye gazes.
The eye-tracking device is a high-speed camera capable of recording eye movements at 250 frames per second or more. Images from the camera are processed by software running on a dedicated computer, which returns the gaze data.
References
* Zhai, Yun; Shah, Mubarak (2006). "Visual Attention Detection in Video Sequences Using Spatiotemporal Cues". Proceedings of the 14th ACM International Conference on Multimedia (MM '06). New York: ACM. pp. 815–824. doi:10.1145/1180639.1180824. ISBN 978-1595934475.
External links
* VLfeat: http://www.vlfeat.org/index.html
* Saliency map at Scholarpedia
See also
* Image segmentation
* Salience (neuroscience)