HOME

TheInfoList



OR:

Mean shift is a
non-parametric Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distri ...
feature-space mathematical analysis technique for locating the maxima of a
density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can ...
, a so-called
mode Mode ( la, modus meaning "manner, tune, measure, due measure, rhythm, melody") may refer to: Arts and entertainment * '' MO''D''E (magazine)'', a defunct U.S. women's fashion magazine * ''Mode'' magazine, a fictional fashion magazine which is ...
-seeking algorithm. Application domains include
cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...
in
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
and
image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
.


History

The mean shift procedure is usually credited to work by Fukunaga and Hostetler in 1975. It is, however, reminiscent of earlier work by Schnell in 1964.


Overview

Mean shift is a procedure for locating the maxima—the
modes Mode ( la, modus meaning "manner, tune, measure, due measure, rhythm, melody") may refer to: Arts and entertainment * '' MO''D''E (magazine)'', a defunct U.S. women's fashion magazine * ''Mode'' magazine, a fictional fashion magazine which is ...
—of a density function given discrete data sampled from that function. This is an iterative method, and we start with an initial estimate x . Let a
kernel function In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving ...
K(x_i - x) be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a
Gaussian kernel In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form f(x) = \exp (-x^2) and with parametric extension f(x) = a \exp\left( -\frac \right) for arbitrary real constants , and non-zero . It is n ...
on the distance to the current estimate is used, K(x_i - x) = e^ . The weighted mean of the density in the window determined by K is : m(x) = \frac where N(x) is the neighborhood of x , a set of points for which K(x_i - x) \neq 0 . The difference m(x) - x is called ''mean shift'' in Fukunaga and Hostetler. The ''mean-shift algorithm'' now sets x \leftarrow m(x) , and repeats the estimation until m(x) converges. Although the mean shift algorithm has been widely used in many applications, a rigid proof for the convergence of the algorithm using a general kernel in a high dimensional space is still not known. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real world applications. Also, the convergence of the algorithm in higher dimensions with a finite number of the stationary (or isolated) points has been proved. However, sufficient conditions for a general kernel function to have finite stationary (or isolated) points have not been provided. Gaussian Mean-Shift is an
Expectation–maximization algorithm In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variabl ...
.


Details

Let data be a finite set S embedded in the n-dimensional Euclidean space, X. Let K be a flat kernel that is the characteristic function of the \lambda-ball in X,
K(x) = \begin 1 & \text\ \, x\, \leq \lambda\\ 0 & \text\ \, x\, > \lambda\\ \end
In each iteration of the algorithm, s \leftarrow m(s) is performed for all s \in S simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width h,
f(x) = \sum_K(x - x_i) = \sum_k \left(\frac\right)
where x_i are the input samples and k(r) is the kernel function (or ''Parzen window''). h is the only parameter in the algorithm and is called the bandwidth. This approach is known as ''kernel density estimation'' or the Parzen window technique. Once we have computed f(x) from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate f(x) over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as ''multiple restart gradient descent''. Starting at some guess for a local maximum, y_k, which can be a random input data point x_1, mean shift computes the gradient of the density estimate f(x) at y_k and takes an uphill step in that direction.


Types of kernels

Kernel definition: Let X be the n-dimensional Euclidean space, \mathbb^n . The norm of x is a non-negative number, \, x\, ^2=x^x \geq 0 . A function K: X\rightarrow \mathbb is said to be a kernel if there exists a ''profile'', k: , \inftyrightarrow \mathbb , such that K(x) = k(\, x\, ^2) and * k is non-negative. * k is non-increasing: k(a)\ge k(b) if a < b . * k is piecewise continuous and \int_0^\infty k(r)\,dr < \infty\ The two most frequently used kernel profiles for mean shift are: ;Flat kernel
k(x) = \begin 1 & \text\ x \le \lambda\\ 0 & \text\ x > \lambda\\ \end
;Gaussian kernel
k(x) = e^,
where the standard deviation parameter \sigma works as the bandwidth parameter, h .


Applications


Clustering

Consider a set of points in two-dimensional space. Assume a circular window centered at C and having radius r as the kernel. Mean-shift is a hill climbing algorithm which involves shifting this kernel iteratively to a higher density region until convergence. Every shift is defined by a mean shift vector. The mean shift vector always points toward the direction of the maximum increase in the density. At every iteration the kernel is shifted to the centroid or the mean of the points within it. The method of calculating this mean depends on the choice of the kernel. In this case if a Gaussian kernel is chosen instead of a flat kernel, then every point will first be assigned a weight which will decay exponentially as the distance from the kernel's center increases. At convergence, there will be no direction at which a shift can accommodate more points inside the kernel.


Tracking

The mean shift algorithm can be used for visual tracking. The simplest such algorithm would create a confidence map in the new image based on the color histogram of the object in the previous image, and use mean shift to find the peak of a confidence map near the object's old position. The confidence map is a probability density function on the new image, assigning each pixel of the new image a probability, which is the probability of the pixel color occurring in the object in the previous image. A few algorithms, such as kernel-based object tracking, ensemble tracking, CAMshift
Gary Bradski Gary Bradski is an American scientist, engineer, entrepreneur, and author. He co-founded Industrial Perception, a company that developed perception applications for industrial robotic application (since acquired by Google in 2012 ) and has worke ...
(1998
Computer Vision Face Tracking For Use in a Perceptual User Interface
, Intel Technology Journal, No. Q2.
expand on this idea.


Smoothing

Let x_i and z_i, i = 1,...,n, be the d-dimensional input and filtered image pixels in the joint spatial-range domain. For each pixel, * Initialize j = 1 and y_ = x_i * Compute y_ according to m(\cdot) until convergence, y = y_. * Assign z_i =(x_i^s,y_^r). The superscripts s and r denote the spatial and range components of a vector, respectively. The assignment specifies that the filtered data at the spatial location axis will have the range component of the point of convergence y_^r.


Strengths

# Mean shift is an application-independent tool suitable for real data analysis. # Does not assume any predefined shape on data clusters. # It is capable of handling arbitrary feature spaces. # The procedure relies on choice of a single parameter: bandwidth. # The bandwidth/window size 'h' has a physical meaning, unlike ''k''-means.


Weaknesses

# The selection of a window size is not trivial. # Inappropriate window size can cause modes to be merged, or generate additional “shallow” modes. # Often requires using adaptive window size.


Availability

Variants of the algorithm can be found in machine learning and image processing packages: *
ELKI ELKI (for ''Environment for DeveLoping KDD-Applications Supported by Index-Structures'') is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teaching. It was originally at the database s ...
. Java data mining tool with many clustering algorithms. *
ImageJ ImageJ is a Java-based image processing program developed at the National Institutes of Health and the Laboratory for Optical and Computational Instrumentation (LOCI, University of Wisconsin). Its first version, ImageJ 1.x, is developed in the publ ...
. Image filtering using the mean shift filter. *
mlpack mlpack is a machine learning software library for C++, built on top of the Armadillo library and thensmallennumerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possibl ...
. Efficient dual-tree algorithm-based implementation. *
OpenCV OpenCV (''Open Source Computer Vision Library'') is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage then Itseez (which was later acquired by Int ...
contains mean-shift implementation via cvMeanShift Method *
Orfeo toolbox In computer science, Orfeo Toolbox (OTB) is a software library for processing images from Earth observation satellites. OTB was initiated by the French space agency (CNES) in 2006. The software is released under a free licence; a number of contri ...
. A C++ implementation. *
scikit-learn scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector m ...
Numpy/Python implementation uses ball tree for efficient neighboring points lookup


See also

*
DBSCAN Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering non-parametric algorithm: give ...
*
OPTICS algorithm Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is simil ...
*
Kernel density estimation In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on ''kernels'' as w ...
(KDE) *
Kernel (statistics) The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics. Bayesian statistics In statistics, especially in Bayesian statistics, the kernel ...


References

{{reflist, 30em Computer vision Cluster analysis algorithms