In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate is varied depending upon either the location of the samples or the location of the test point.
It is a particularly effective technique when the sample space is multi-dimensional.
Rationale
Given a set of samples, <math>\{\vec x_i\}</math>, we wish to estimate the density, <math>P</math>, at a test point, <math>\vec x</math>:
:<math>P(\vec x) \approx \frac{W}{n h^D}</math>
:<math>W = \sum_{i=1}^n w_i</math>
:<math>w_i = K\left(\frac{\vec x - \vec x_i}{h}\right)</math>
where ''n'' is the number of samples, ''K'' is the "kernel", ''h'' is its width and ''D'' is the number of dimensions in <math>\vec x</math>.
The kernel can be thought of as a simple, linear filter.
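To make the definitions above concrete, here is a minimal illustrative Python sketch of the fixed-width estimator, assuming a Gaussian kernel (the function names are ours, not taken from any of the packages linked below):
<syntaxhighlight lang="python">
import numpy as np

def gaussian_kernel(u):
    # Normalized Gaussian kernel K(u), evaluated row-wise for D-dimensional u.
    D = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** (D / 2)

def fixed_kde(x, samples, h):
    # P(x) ~ W / (n h^D), where W is the sum of the weights w_i = K((x - x_i)/h).
    n, D = samples.shape
    w = gaussian_kernel((x - samples) / h)
    return w.sum() / (n * h ** D)

# Toy usage: 200 samples from a 2-D standard normal; the true density at the
# origin is 1/(2*pi), roughly 0.159.
rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 2))
print(fixed_kde(np.zeros(2), samples, h=0.5))
</syntaxhighlight>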
Using a fixed filter width may mean that in regions of low density, all samples
will fall in the tails of the filter with very low weighting, while regions of high
density will find an excessive number of samples in the central region with weighting
close to unity. To fix this problem, we vary the width of the kernel in different
regions of the sample space.
There are two methods of doing this: balloon and pointwise estimation.
In a balloon estimator, the kernel width is varied depending on the location
of the test point. In a pointwise estimator, the kernel width is varied depending
on the location of the sample.
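As a sketch of the pointwise variant (the pilot-density construction and parameter names here are illustrative assumptions, not a prescribed algorithm), each sample can be given its own width from a fixed-width pilot estimate, shrinking the kernels where the pilot density is high; this reuses gaussian_kernel and fixed_kde from the sketch above:
<syntaxhighlight lang="python">
def pointwise_kde(x, samples, h_pilot, k):
    # Pointwise (sample-point) estimator: sample i gets its own width h_i,
    # here taken inversely proportional to a pilot density at x_i, by analogy
    # with the balloon width rule quoted below.
    n, D = samples.shape
    pilot = np.array([fixed_kde(xi, samples, h_pilot) for xi in samples])
    h_i = k / (n * pilot) ** (1.0 / D)              # per-sample bandwidths
    w = gaussian_kernel((x - samples) / h_i[:, None])
    return np.sum(w / h_i ** D) / n                 # each sample carries its own h_i

print(pointwise_kde(np.zeros(2), samples, h_pilot=0.5, k=1.0))
</syntaxhighlight>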
For multivariate estimators, the parameter, ''h'', can be generalized to
vary not just the size, but also the shape of the kernel. This more complicated approach
will not be covered here.
Balloon estimators
A common method of varying the kernel width is to make it inversely proportional to the density at the test point:
:<math>h = \frac{k}{\left[ n P(\vec x) \right]^{1/D}}</math>
where ''k'' is a constant.
If we back-substitute the estimated PDF, assuming a Gaussian kernel function, we can show that ''W'' is a constant:
:<math>W = k^D (2\pi)^{D/2}</math>
A similar derivation holds for any kernel whose normalising function is of the order <math>h^D</math>, although with a different constant factor in place of the <math>(2\pi)^{D/2}</math> term. This produces a generalization of the k-nearest neighbour algorithm. That is, a uniform kernel function will return the KNN technique.
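A minimal balloon-estimator sketch under the width rule above, with the unknown density replaced by a fixed-width pilot estimate at the test point (this two-pass construction is our assumption; ''k'' and h_pilot are free parameters, and fixed_kde is reused from the first sketch):
<syntaxhighlight lang="python">
def balloon_kde(x, samples, h_pilot, k):
    # Balloon estimator: the width is set at the *test point*,
    # h = k / [n P(x)]^(1/D), using a crude fixed-width pilot estimate
    # in place of the true density P(x).
    n, D = samples.shape
    pilot = fixed_kde(x, samples, h_pilot)
    h = k / (n * pilot) ** (1.0 / D)
    return fixed_kde(x, samples, h)

print(balloon_kde(np.zeros(2), samples, h_pilot=0.5, k=1.0))
</syntaxhighlight>
Swapping the Gaussian for a uniform (top-hat) kernel inside fixed_kde would make this equivalent to a k-nearest-neighbour density estimate, as noted above.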
There are two components to the error: a variance term and a bias term. The variance term is given as:
:<math>e_1 = \frac{P \int K^2}{n h^D}</math>.
The bias term is found by evaluating the approximated function in the limit as the kernel width becomes much larger than the sample spacing; expanding the true density in a Taylor series, the zeroth- and first-order terms cancel, leaving a leading bias term of:
:<math>e_2 = \frac{h^2}{2} \nabla^2 P</math>
An optimal kernel width that minimizes the error of each estimate can thus be derived.
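For example (a standard argument, sketched here rather than taken from the text above), treating the total error as the variance plus the squared bias and setting its derivative with respect to ''h'' to zero recovers the familiar rate:
:<math>\frac{d}{dh}\left[\frac{P \int K^2}{n h^D} + \left(\frac{h^2}{2}\,\nabla^2 P\right)^2\right] = 0 \quad\Rightarrow\quad h_\text{opt} \propto n^{-1/(D+4)}</math>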
Use for statistical classification
The method is particularly effective when applied to statistical classification.
There are two ways we can proceed: the first is to compute the PDFs of
each class separately, using different bandwidth parameters,
and then compare them as in Taylor.
Alternatively, we can divide up the sum based on the class of each sample:
:<math>P(\vec x, j) \approx \frac{1}{n h^D} \sum_{i=1,\, c_i = j}^n w_i</math>
where <math>c_i</math> is the class of the ''i''th sample.
The class of the test point may be estimated through maximum likelihood.
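A short sketch of this second approach (the class-wise split of the sum and the argmax rule follow the text; the helper name and toy labels are ours), reusing gaussian_kernel and samples from the sketches above:
<syntaxhighlight lang="python">
def classify(x, samples, labels, h):
    # Divide the kernel sum by class: P(x, j) ~ (1/(n h^D)) * sum over samples
    # with c_i = j of w_i, then return the class with the largest estimate
    # (maximum likelihood). A balloon width could be used for h equally well.
    n, D = samples.shape
    w = gaussian_kernel((x - samples) / h)
    dens = {j: w[labels == j].sum() / (n * h ** D) for j in np.unique(labels)}
    return max(dens, key=dens.get)

labels = (samples[:, 0] > 0).astype(int)   # toy two-class labelling
print(classify(np.array([1.0, 0.0]), samples, labels, h=0.5))
</syntaxhighlight>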
External links
* akde1d.m - Matlab m-file for one-dimensional adaptive kernel density estimation.
* libAGF - A C++ library for multivariate adaptive kernel density estimation.
* akde.m - Matlab function for multivariate (high-dimensional) variable kernel density estimation.