The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.
Bayesian statistics
In statistics, especially in Bayesian statistics, the kernel of a probability density function (pdf) or probability mass function (pmf) is the form of the pdf or pmf in which any factors that are not functions of any of the variables in the domain are omitted. Note that such factors may well be functions of the parameters of the pdf or pmf. These factors form part of the normalization factor of the probability distribution, and are unnecessary in many situations. For example, in pseudo-random number sampling, most sampling algorithms ignore the normalization factor. In addition, in Bayesian analysis of conjugate prior distributions, the normalization factors are generally ignored during the calculations, and only the kernel is considered. At the end, the form of the kernel is examined, and if it matches a known distribution, the normalization factor can be reinstated. Otherwise, it may be unnecessary (for example, if the distribution only needs to be sampled from).
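As a minimal sketch of the conjugate-prior workflow described above, assume a Beta(''a'', ''b'') prior on a success probability ''θ'' and ''k'' successes in ''n'' Bernoulli trials. This model, the data values, and the names `posterior_kernel`, `beta_pdf` and `normalize_on_grid` are illustrative assumptions, not taken from the text. Multiplying the prior and likelihood kernels gives a function proportional to ''θ''^(''a''+''k''−1) (1−''θ'')^(''b''+''n''−''k''−1), which matches the kernel of a Beta(''a''+''k'', ''b''+''n''−''k'') distribution, so the normalization factor can be reinstated from the known form.

```python
import math

def posterior_kernel(theta, a, b, k, n):
    """Prior kernel times likelihood kernel; all theta-free factors are dropped."""
    prior = theta ** (a - 1) * (1 - theta) ** (b - 1)
    likelihood = theta ** k * (1 - theta) ** (n - k)
    return prior * likelihood

def beta_pdf(theta, a, b):
    """Fully normalized Beta(a, b) density."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)

def normalize_on_grid(kernel, m=20000):
    """Midpoint-rule estimate of the integral of `kernel` over (0, 1)."""
    width = 1.0 / m
    return sum(kernel((i + 0.5) * width) for i in range(m)) * width

# Hypothetical data: Beta(2, 3) prior, 7 successes in 10 trials.
a, b, k, n = 2.0, 3.0, 7, 10
z = normalize_on_grid(lambda t: posterior_kernel(t, a, b, k, n))

# Normalizing the kernel numerically should agree with Beta(a + k, b + n - k).
theta = 0.6
numeric = posterior_kernel(theta, a, b, k, n) / z
closed_form = beta_pdf(theta, a + k, b + n - k)
print(numeric, closed_form)
```

The numerical integration is only there to confirm the match; in practice the kernel is recognized by inspection and the constant is read off from the known distribution.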
For many distributions, the kernel can be written in closed form, but not the normalization constant.
An example is the normal distribution. Its probability density function is
: <math>p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>
and the associated kernel is
: <math>p(x \mid \mu, \sigma^2) \propto e^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>
Note that the factor in front of the exponential has been omitted, even though it contains the parameter <math>\sigma^2</math>, because it is not a function of the domain variable <math>x</math>.
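To illustrate why a sampler can ignore the normalization factor, here is a minimal sketch of a random-walk Metropolis sampler driven only by the normal kernel. The choice of Metropolis sampling, the function names, and the parameter values (mean 2, standard deviation 1.5, proposal step 2) are assumptions made for illustration. The acceptance rule depends only on a ratio of kernel values, so any factor that is constant in ''x'' cancels.

```python
import math
import random

def normal_kernel(x, mu, sigma):
    """Unnormalized normal density: the 1/sqrt(2*pi*sigma^2) factor is omitted."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def metropolis(kernel, n_samples, x0=0.0, step=2.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler that only evaluates an unnormalized kernel."""
    rng = random.Random(seed)
    x = x0
    fx = kernel(x)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        fp = kernel(proposal)
        # Accept with probability min(1, fp / fx); only the ratio matters.
        if rng.random() * fx < fp:
            x, fx = proposal, fp
        if i >= burn_in:
            samples.append(x)
    return samples

# Hypothetical target: normal with mean 2 and standard deviation 1.5.
samples = metropolis(lambda x: normal_kernel(x, 2.0, 1.5), 20000)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)
```

The sample mean and standard deviation should land close to the target values, although the kernel passed to the sampler never includes the normalizing constant.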
Pattern analysis
The kernel of a reproducing kernel Hilbert space is used in the suite of techniques known as kernel methods to perform tasks such as statistical classification, regression analysis, and cluster analysis on data in an implicit space. This usage is particularly common in machine learning.
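The phrase "implicit space" can be made concrete with a small sketch. It assumes a homogeneous degree-2 polynomial kernel on two-dimensional inputs (the kernel choice and the names `poly_kernel` and `feature_map` are illustrative, not from the text). Evaluating the kernel on the raw inputs gives the same number as an inner product in a three-dimensional feature space that is never constructed when only the kernel is used.

```python
import math

def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot ** 2

def feature_map(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)

# Kernel evaluated directly on the inputs (feature space stays implicit).
implicit = poly_kernel(x, y)

# Same quantity computed through the explicit feature map.
explicit = sum(p * q for p, q in zip(feature_map(x), feature_map(y)))
print(implicit, explicit)
```

Both values agree up to floating-point rounding, which is what lets kernel methods work with inner products in the feature space without computing the feature vectors.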
Nonparametric statistics
In nonparametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time series analysis, where they appear in the use of the periodogram to estimate the spectral density and are known as window functions. An additional use is in the estimation of a time-varying intensity for a point process, where window functions (kernels) are convolved with time-series data.
Commonly, kernel widths must also be specified when running a non-parametric estimation.
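The two estimators mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel and the Nadaraya-Watson form of kernel regression; the function names, the sample values, and the kernel widths `h` are hypothetical and chosen only for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the weighting function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    """Kernel density estimate at x with kernel width (bandwidth) h > 0."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def kernel_regression(x, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x]: a kernel-weighted average of ys."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

# Hypothetical sample for density estimation.
data = [1.0, 1.5, 2.0, 4.0, 4.2]
density_at_2 = kde(2.0, data, h=0.5)

# Hypothetical (x, y) pairs lying on the line y = 2x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 2.0, 4.0]
fit_at_1 = kernel_regression(1.0, xs, ys, h=0.7)
print(density_at_2, fit_at_1)
```

A smaller `h` makes both estimates follow the individual observations more closely, while a larger `h` smooths them; the estimators do not choose this width themselves, which is why it has to be supplied.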
Definition
A kernel is a non-negative real-valued integrable function ''K''. For most applications, it is desirable to define the function to satisfy two additional requirements:
* Normalization:
: <math>\int_{-\infty}^{+\infty} K(u)\,du = 1;</math>
* Symmetry:
: <math>K(-u) = K(u) \text{ for all values of } u.</math>
The first requirement ensures that the method of kernel density estimation results in a probability density function. The second requirement ensures that the average of the corresponding distribution is equal to that of the sample used.
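The claim about the average can be checked directly. As a sketch, assume the usual kernel density estimator built from a sample ''x''<sub>1</sub>, …, ''x''<sub>''n''</sub> with kernel width ''h'' > 0, and assume that the first moment of ''K'' exists (the symbols for the estimator, the width and the sample mean are notation introduced here). Substituting ''u'' = (''x'' − ''x''<sub>''i''</sub>)/''h'' in each term, then using normalization for the first integral and symmetry for the second, gives:

```latex
\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
\int_{-\infty}^{+\infty} x\,\hat{f}_h(x)\,dx
  = \frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty} (x_i + hu)\,K(u)\,du
  = \frac{1}{n}\sum_{i=1}^{n} x_i \;+\; h\int_{-\infty}^{+\infty} u\,K(u)\,du
  = \bar{x}.
```

The last step uses the fact that ''uK''(''u'') is an odd function when ''K'' is symmetric, so its integral vanishes.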
If ''K'' is a kernel, then so is the function ''K''* defined by ''K''*(''u'') = λ''K''(λ''u''), where λ > 0. This can be used to select a scale that is appropriate for the data.
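A short derivation shows why the rescaled function keeps the normalization and symmetry properties, assuming ''K'' has them. The substitution variable ''v'' = λ''u'' is notation introduced here; because λ > 0, the limits of integration keep their order.

```latex
\int_{-\infty}^{+\infty} K^{*}(u)\,du
  = \int_{-\infty}^{+\infty} \lambda K(\lambda u)\,du
  = \int_{-\infty}^{+\infty} K(v)\,dv
  = 1,
\qquad
K^{*}(-u) = \lambda K(-\lambda u) = \lambda K(\lambda u) = K^{*}(u).
```

Non-negativity carries over as well, since λ > 0 and ''K'' ≥ 0. Larger values of λ concentrate the kernel around zero, and smaller values spread it out.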
Kernel functions in common use

Several types of kernel functions are commonly used: uniform, triangle, Epanechnikov, quartic (biweight), tricube, triweight, Gaussian, quadratic and cosine.
In the table below, if ''K'' is given with a bounded support, then ''K''(''u'') = 0 for values of ''u'' lying outside the support.
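As a minimal sketch, four of the kernels named above can be written out using their standard textbook forms, with support [−1, 1] for the bounded ones. The function names and the midpoint-rule integrator are illustrative assumptions. The loop checks normalization numerically, symmetry at one point, and the value outside the support.

```python
import math

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def triangle(u):
    return 1.0 - abs(u) if abs(u) <= 1 else 0.0

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0

def gaussian(u):
    # Unbounded support: positive for every real u.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def integrate(f, lo=-10.0, hi=10.0, m=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / m
    return sum(f(lo + (i + 0.5) * width) for i in range(m)) * width

KERNELS = {
    "uniform": uniform,
    "triangle": triangle,
    "Epanechnikov": epanechnikov,
    "Gaussian": gaussian,
}

for name, K in KERNELS.items():
    # Columns: name, approximate integral, symmetry check, value at u = 1.5.
    print(name, round(integrate(K), 4), K(0.3) == K(-0.3), K(1.5))
```

Only the Gaussian kernel returns a nonzero value at ''u'' = 1.5, because its support is the whole real line.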
See also
* Kernel density estimation
* Kernel smoother
* Stochastic kernel
* Positive-definite kernel
* Density estimation
* Multivariate kernel density estimation