
Data binning, also called data discrete binning or data bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values that fall into a given small interval, a "bin", are replaced by a value representative of that interval, often a central value (mean or median). It is related to quantization: data binning operates on the abscissa axis while quantization operates on the ordinate axis. Binning is a generalization of rounding.

Statistical data binning is a way to group a number of more-or-less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together). It can also be used in multivariate statistics, binning in several dimensions at once.

In digital image processing, "binning" has a very different meaning. Pixel binning is the process of combining blocks of adjacent pixels throughout an image, by summing or averaging their values, during or after readout. It reduces the amount of data and also lowers the relative noise level in the result.
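
The statistical form of binning described above (grouping ages into five-year intervals and replacing each value by a central value of its bin) can be illustrated with a minimal Python sketch. The function name `bin_values`, the half-open interval convention, the choice of the mean as the representative value, and the sample ages are all assumptions made for illustration; they are not taken from any particular library.

```python
from collections import defaultdict
from statistics import mean


def bin_values(values, width, start=0):
    """Replace each value by the mean of the values that share its bin.

    Bins are assumed to be half-open intervals
    [start + k*width, start + (k+1)*width) for integer k.
    """
    groups = defaultdict(list)
    for v in values:
        index = int((v - start) // width)  # bin index k for this value
        groups[index].append(v)
    # One representative (here: the mean) per occupied bin.
    representatives = {k: mean(vs) for k, vs in groups.items()}
    return [representatives[int((v - start) // width)] for v in values]


# Hypothetical ages, grouped into five-year bins starting at age 20.
ages = [21, 23, 24, 25, 29, 31, 34]
binned_ages = bin_values(ages, width=5, start=20)
print(binned_ages)
```

Here the ages 21, 23 and 24 fall into the bin [20, 25) and are all replaced by their common mean, while 25 and 29 share the bin [25, 30). Using the median instead would only require swapping `mean` for `statistics.median`.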


Example usage

Histograms are an example of data binning used to observe underlying frequency distributions. They are typically one-dimensional and use equal-width intervals for ease of visualization.

Data binning may be used when small instrumental shifts in the spectral dimension from mass spectrometry (MS) or nuclear magnetic resonance (NMR) experiments would otherwise be falsely interpreted as representing different components when a collection of data profiles is subjected to pattern recognition analysis. A straightforward way to cope with this problem is to use binning techniques in which the spectrum is reduced in resolution to a sufficient degree to ensure that a given peak remains in its bin despite small spectral shifts between analyses. For example, in NMR the chemical shift axis may be discretized and coarsely binned, and in MS the measured mass values may be rounded to integer atomic mass unit values.

Also, several digital camera systems incorporate an automatic pixel binning function to improve image contrast.

Binning is also used in machine learning to speed up the decision-tree boosting method for supervised classification and regression, in algorithms such as Microsoft's LightGBM and scikit-learn's Histogram-based Gradient Boosting Classification Tree.
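
The spectral use case above, where coarse bins keep a peak in the same bin despite small shifts between analyses, can be sketched as follows. This is an illustrative sketch only: the function `bin_spectrum`, the peak lists `run_a` and `run_b`, and the bin width of 0.5 are hypothetical values chosen for the example, not data or parameters from any real instrument or software package.

```python
from collections import Counter


def bin_spectrum(peaks, bin_width):
    """Sum peak intensities per bin; returns {bin_index: total_intensity}.

    `peaks` is a list of (position, intensity) pairs, where position is
    the spectral coordinate (for example a chemical shift or a mass).
    """
    profile = Counter()
    for position, intensity in peaks:
        profile[int(position // bin_width)] += intensity
    return dict(profile)


# Two hypothetical runs of the same sample; peak positions drift slightly.
run_a = [(1.21, 10.0), (3.48, 4.0), (7.26, 6.0)]
run_b = [(1.23, 10.0), (3.46, 4.0), (7.27, 6.0)]

profile_a = bin_spectrum(run_a, bin_width=0.5)
profile_b = bin_spectrum(run_b, bin_width=0.5)

# The raw peak positions differ, but the binned profiles coincide.
print(profile_a == profile_b)
```

The comparison only works because none of the example peaks sits close to a bin edge. A peak near a boundary can still move into the neighbouring bin after a small shift, which is why the bin width has to be chosen large relative to the expected drift.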


See also

* Binning (disambiguation)
* Discretization of continuous features
* Grouped data
* Histogram
* Level of measurement
* Quantization (signal processing)
* Rounding

