HOME

TheInfoList



OR:

Data binning, also called data discrete binning or data bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall into a given small interval, a '' bin'', are replaced by a value representative of that interval, often a central value (
mean A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
or
median The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...
). It is related to quantization: data binning operates on the abscissa axis while quantization operates on the
ordinate In mathematics, the abscissa (; plural ''abscissae'' or ''abscissas'') and the ordinate are respectively the first and second coordinate of a point in a Cartesian coordinate system: : abscissa \equiv x-axis (horizontal) coordinate : ordinate \e ...
axis. Binning is a generalization of
rounding Rounding or rounding off is the process of adjusting a number to an approximate, more convenient value, often with a shorter or simpler representation. For example, replacing $ with $, the fraction 312/937 with 1/3, or the expression √2 with ...
. Statistical data binning is a way to group numbers of more-or-less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together). It can also be used in
multivariate statistics Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., '' multivariate random variables''. Multivariate statistics concerns understanding the differ ...
, binning in several dimensions at once. In
digital image processing Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allo ...
, "binning" has a very different meaning.
Pixel binning Pixel binning, also known as binning, is a process image sensors of digital cameras use to combine adjacent pixels throughout an image, by summing or averaging their values, during or after readout. It improves low-light performance while still al ...
is the process of combining blocks of adjacent
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a Raster graphics, raster image, or the smallest addressable element in a dot matrix display device. In most digital display devices, p ...
s throughout an image, by summing or averaging their values, during or after readout. It reduces the amount of data; also the relative noise level in the result is lower.


Example usage

Histogram A histogram is a visual representation of the frequency distribution, distribution of quantitative data. To construct a histogram, the first step is to Data binning, "bin" (or "bucket") the range of values— divide the entire range of values in ...
s are an example of data binning used in order to observe underlying frequency distributions. They typically occur in
one-dimensional space A one-dimensional space (1D space) is a mathematical space in which location can be specified with a single coordinate. An example is the number line, each point of which is described by a single real number. Any straight line or smooth curv ...
and in equal intervals for ease of visualization. Data binning may be used when small instrumental shifts in the spectral dimension from
mass spectrometry Mass spectrometry (MS) is an analytical technique that is used to measure the mass-to-charge ratio of ions. The results are presented as a ''mass spectrum'', a plot of intensity as a function of the mass-to-charge ratio. Mass spectrometry is used ...
(MS) or
nuclear magnetic resonance Nuclear magnetic resonance (NMR) is a physical phenomenon in which nuclei in a strong constant magnetic field are disturbed by a weak oscillating magnetic field (in the near field) and respond by producing an electromagnetic signal with a ...
(NMR) experiments will be falsely interpreted as representing different components, when a collection of data profiles is subjected to
pattern recognition Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
analysis. A straightforward way to cope with this problem is by using binning techniques in which the spectrum is reduced in resolution to a sufficient degree to ensure that a given peak remains in its bin despite small spectral shifts between analyses. For example, in
NMR Nuclear magnetic resonance (NMR) is a physical phenomenon in which atomic nucleus, nuclei in a strong constant magnetic field are disturbed by a weak oscillating magnetic field (in the near and far field, near field) and respond by producing ...
the
chemical shift In nuclear magnetic resonance (NMR) spectroscopy, the chemical shift is the resonant frequency of an atomic nucleus relative to a standard in a magnetic field. Often the position and number of chemical shifts are diagnostic of the structure of ...
axis may be discretized and coarsely binned, and in MS the spectral accuracies may be rounded to an integer multiple of the dalton. Also, several
digital camera A digital camera, also called a digicam, is a camera that captures photographs in Digital data storage, digital memory. Most cameras produced today are digital, largely replacing those that capture images on photographic film or film stock. Dig ...
systems incorporate an automatic pixel binning function to improve image contrast. Binning is also used in
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
to speed up the decision-tree boosting method for supervised classification and regression in algorithms such as
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
's LightGBM and
scikit-learn scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support ...
'
Histogram-based Gradient Boosting Classification Tree


See also

* Binning (disambiguation) *
Censoring (statistics) In statistics, censoring is a condition in which the Value (mathematics), value of a measurement or observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, ...
* Discretization of continuous features * Grouped data *
Histogram A histogram is a visual representation of the frequency distribution, distribution of quantitative data. To construct a histogram, the first step is to Data binning, "bin" (or "bucket") the range of values— divide the entire range of values in ...
*
Level of measurement Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
*
Quantization (signal processing) Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and tr ...
*
Rounding Rounding or rounding off is the process of adjusting a number to an approximate, more convenient value, often with a shorter or simpler representation. For example, replacing $ with $, the fraction 312/937 with 1/3, or the expression √2 with ...


References

Statistical data coding {{Statistics-stub