Feature scaling

Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.


Motivation

Since the range of values of raw data varies widely, in some machine learning algorithms, objective functions will not work properly without normalization. For example, many classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.

Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it. It is also important to apply feature scaling if regularization is used as part of the loss function, so that coefficients are penalized appropriately.


Methods


Rescaling (min-max normalization)

Also known as min-max scaling or min-max normalization, rescaling is the simplest method and consists of rescaling the range of features to lie in [0, 1] or [−1, 1]. Selecting the target range depends on the nature of the data. The general formula for a min-max scaling to [0, 1] is given as:

: x' = \frac{x - \min(x)}{\max(x) - \min(x)}

where x is an original value and x' is the normalized value. For example, suppose that we have the students' weight data, and the students' weights span [160 pounds, 200 pounds]. To rescale this data, we first subtract 160 from each student's weight and divide the result by 40 (the difference between the maximum and minimum weights). To rescale the range between an arbitrary set of values [a, b], the formula becomes:

: x' = a + \frac{(x - \min(x))(b - a)}{\max(x) - \min(x)}

where a, b are the target min-max values.
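As an illustration, the following is a minimal Python/NumPy sketch of min-max rescaling. The function name rescale and its parameters are illustrative choices, not a standard API, and the feature is assumed not to be constant:

    import numpy as np

    def rescale(x, a=0.0, b=1.0):
        # Map [min(x), max(x)] linearly onto the target range [a, b].
        x = np.asarray(x, dtype=float)
        x_min, x_max = x.min(), x.max()
        # Assumes the feature is not constant (x_max != x_min).
        return a + (x - x_min) * (b - a) / (x_max - x_min)

    # Student weights from the example above: span [160, 200] pounds.
    weights = np.array([160.0, 170.0, 180.0, 200.0])
    print(rescale(weights))          # -> [0.   0.25 0.5  1.  ]
    print(rescale(weights, -1, 1))   # -> [-1.  -0.5  0.   1. ]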


Mean normalization

: x' = \frac{x - \bar{x}}{\max(x) - \min(x)}

where x is an original value, x' is the normalized value, and \bar{x} = \text{average}(x) is the mean of that feature vector. There is another form of mean normalization that divides by the standard deviation; this is also called standardization.
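A minimal sketch of mean normalization under the same assumptions (NumPy, non-constant feature; the function name is illustrative):

    import numpy as np

    def mean_normalize(x):
        # Center on the mean, then scale by the range; results lie in [-1, 1].
        x = np.asarray(x, dtype=float)
        # Assumes the feature is not constant (max != min).
        return (x - x.mean()) / (x.max() - x.min())

    weights = np.array([160.0, 170.0, 180.0, 200.0])
    print(mean_normalize(weights))  # -> [-0.4375 -0.1875  0.0625  0.5625]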


Standardization (Z-score normalization)

In machine learning, we can handle various types of data, e.g. audio signals and pixel values for image data, and this data can include multiple dimensions. Feature standardization makes the values of each feature in the data have zero mean (when subtracting the mean in the numerator) and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and artificial neural networks). The general method of calculation is to determine the distribution mean and standard deviation for each feature, then subtract the mean from each feature and divide the mean-subtracted values of each feature by its standard deviation:

: x' = \frac{x - \bar{x}}{\sigma}

where x is the original feature vector, \bar{x} = \text{average}(x) is the mean of that feature vector, and \sigma is its standard deviation.
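A minimal sketch, again in NumPy (population standard deviation, nonzero spread assumed; the helper name is illustrative):

    import numpy as np

    def standardize(x):
        # Subtract the mean, divide by the standard deviation (z-scores).
        x = np.asarray(x, dtype=float)
        # np.std computes the population standard deviation by default;
        # assumes std(x) != 0.
        return (x - x.mean()) / x.std()

    weights = np.array([160.0, 170.0, 180.0, 200.0])
    z = standardize(weights)
    print(z.mean(), z.std())  # -> approximately 0.0 and 1.0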


Scaling to unit length

Another option that is widely used in machine learning is to scale the components of a feature vector such that the complete vector has length one. This usually means dividing each component by the Euclidean length of the vector:

: x' = \frac{x}{\lVert x \rVert}

In some applications (e.g., histogram features) it can be more practical to use the L1 norm (i.e., taxicab geometry) of the feature vector. This is especially important if in the following learning steps the scalar metric is used as a distance measure. Note that this only works for x \neq \mathbf{0}.
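A minimal sketch of unit-length scaling for both norms (NumPy; the function name and the norm keyword are illustrative, and the input must be a nonzero vector):

    import numpy as np

    def scale_to_unit_length(x, norm="l2"):
        # Divide by the Euclidean (L2) or taxicab (L1) length of the vector.
        x = np.asarray(x, dtype=float)
        length = np.abs(x).sum() if norm == "l1" else np.sqrt((x * x).sum())
        # Only defined for nonzero vectors (length != 0).
        return x / length

    v = np.array([3.0, 4.0])
    print(scale_to_unit_length(v))             # -> [0.6 0.8]   (L2 length 5)
    print(scale_to_unit_length(v, norm="l1"))  # -> [0.4285714  0.5714286]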


Application

In stochastic gradient descent, feature scaling can sometimes improve the convergence speed of the algorithm. In support vector machines, it can reduce the time needed to find support vectors. Note that feature scaling changes the SVM result.


See also

* Normalization (statistics)
* Standard score
* fMLLR, Feature space Maximum Likelihood Linear Regression



Further reading

* Han, Jiawei; Kamber, Micheline; Pei, Jian (2011). "Data Transformation and Data Discretization". Data Mining: Concepts and Techniques. Elsevier. pp. 111–118. ISBN 9780123814807. https://www.google.com/books/edition/Data_Mining_Concepts_and_Techniques/pQws07tdpjoC?hl=en&gbpv=1&pg=PA111


External links


* Lecture by Andrew Ng on feature scaling