Cluster-weighted Modeling

In data mining, cluster-weighted modeling (CWM) is an algorithm-based approach to non-linear prediction of outputs (dependent variables) from inputs (independent variables), based on density estimation using a set of models (clusters) that are each notionally appropriate in a sub-region of the input space. The overall approach works in the joint input-output space, and an initial version was proposed by Neil Gershenfeld.


Basic form of model

The procedure for cluster-weighted modeling of an input-output problem can be outlined as follows. In order to construct predicted values for an output variable ''y'' from an input variable ''x'', the modeling and calibration procedure arrives at a joint probability density function, ''p''(''y'', ''x''). Here the "variables" might be univariate, multivariate or time-series. For convenience, model parameters are not indicated in the notation here, and several different treatments of them are possible, including setting them to fixed values as a step in the calibration or treating them using a Bayesian analysis. The required predicted values are obtained by constructing the conditional probability density ''p''(''y'' | ''x''), from which the prediction using the conditional expected value can be obtained, with the conditional variance providing an indication of uncertainty. The important step of the modeling is that ''p''(''y'', ''x'') is assumed to take the following form, as a mixture model:

:p(y,x)=\sum_{j=1}^n w_j p_j(y,x),

where ''n'' is the number of clusters and the ''wj'' are weights that sum to one. The functions ''pj''(''y'', ''x'') are joint probability density functions that relate to each of the ''n'' clusters. These functions are modeled using a decomposition into a conditional and a marginal density:

:p_j(y,x)=p_j(y|x)p_j(x),

where:
:*''pj''(''y'' | ''x'') is a model for predicting ''y'' given ''x'', and given that the input-output pair should be associated with cluster ''j'' on the basis of the value of ''x''. This model might be a regression model in the simplest cases.
:*''pj''(''x'') is formally a density for values of ''x'', given that the input-output pair should be associated with cluster ''j''. The relative sizes of these functions between the clusters determine whether a particular value of ''x'' is associated with any given cluster-center. This density might be a Gaussian function centered at a parameter representing the cluster-center.

In the same way as for regression analysis, it will be important to consider preliminary data transformations as part of the overall modeling strategy if the core components of the model are to be simple regression models for the cluster-wise conditional densities ''pj''(''y'' | ''x''), and normal distributions for the cluster-weighting densities ''pj''(''x'').
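The decomposition above leads directly to a prediction rule: the conditional mean of ''y'' given ''x'' is a weighted average of the local models' outputs, with weights proportional to ''wj'' ''pj''(''x''). The following is a minimal sketch of that rule, assuming (as in the simple case described above) univariate data, Gaussian cluster-weighting densities, and local linear regression models; all function names and parameter values are illustrative, not part of any standard library.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Univariate Gaussian density, used here as the cluster-weighting term p_j(x)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def cwm_predict(x, weights, centers, widths, slopes, intercepts):
    """Conditional mean E[y | x] of a cluster-weighted model whose local
    models are linear, m_j(x) = slope_j * x + intercept_j.

    Implements E[y | x] = sum_j w_j p_j(x) m_j(x) / sum_j w_j p_j(x).
    """
    # Unnormalized posterior mass of each cluster at this input value
    px = weights * gaussian_density(x, centers, widths)
    # Mean prediction of each cluster's local linear model
    local_means = slopes * x + intercepts
    return np.sum(px * local_means) / np.sum(px)

# Two well-separated clusters with opposite local slopes
weights = np.array([0.5, 0.5])
centers = np.array([-2.0, 2.0])
widths = np.array([1.0, 1.0])
slopes = np.array([1.0, -1.0])
intercepts = np.array([0.0, 0.0])

# Near a cluster center, the prediction is dominated by that cluster's local model
print(cwm_predict(-2.0, weights, centers, widths, slopes, intercepts))
```

Away from all cluster centers the prediction blends the local models smoothly, which is what gives CWM its non-linear overall behavior despite the simplicity of each local model.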


General versions

The basic CWM algorithm gives a single output cluster for each input cluster. However, CWM can be extended to multiple clusters which are still associated with the same input cluster. Each cluster in CWM is localized to a Gaussian input region, and contains its own trainable local model. It is recognized as a versatile inference algorithm which provides simplicity, generality, and flexibility; even when a feedforward layered network might be preferred, CWM is sometimes used as a "second opinion" on the nature of the training problem.

The original form proposed by Gershenfeld describes two innovations:
* Enabling CWM to work with continuous streams of data
* Addressing the problem of local minima encountered by the CWM parameter adjustment process

CWM can be used to classify media in printer applications, using at least two parameters to generate an output that has a joint dependency on the input parameters.
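The "parameter adjustment process" mentioned above is, for mixture models of this kind, typically an expectation-maximization (EM) style iteration, which is also where the local-minima problem arises. As a hedged sketch only, assuming the same univariate Gaussian clusters with local linear models as in the previous section, one EM iteration might look like this (all names are illustrative; the M-step shown updates only the mixture weights and cluster centers, not the local regression parameters):

```python
import numpy as np

def _norm_pdf(v, mu, s):
    """Univariate Gaussian density, vectorized over broadcastable arguments."""
    return np.exp(-0.5 * ((v - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def e_step(x, y, weights, centers, widths, slopes, intercepts, noise):
    """E-step: responsibilities r[i, j] that pair (x_i, y_i) belongs to
    cluster j, proportional to w_j * p_j(y_i | x_i) * p_j(x_i)."""
    px = _norm_pdf(x[:, None], centers[None, :], widths[None, :])
    mean_y = slopes[None, :] * x[:, None] + intercepts[None, :]
    py = _norm_pdf(y[:, None], mean_y, noise[None, :])
    joint = weights[None, :] * py * px
    # Normalize each row so responsibilities sum to one per data point
    return joint / joint.sum(axis=1, keepdims=True)

def m_step_weights_centers(x, r):
    """Partial M-step: re-estimate mixture weights and cluster centers
    from the responsibilities."""
    nk = r.sum(axis=0)                # effective number of points per cluster
    weights = nk / len(x)
    centers = (r * x[:, None]).sum(axis=0) / nk
    return weights, centers
```

Because the likelihood surface of such mixtures is multimodal, iterating these two steps can stall in a local minimum, which motivates the second of Gershenfeld's innovations listed above.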

