In
statistics
Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
and
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
, discretization refers to the process of converting or partitioning continuous
attributes
Attribute may refer to:
* Attribute (philosophy), an extrinsic property of an object
* Attribute (research), a characteristic of an object
* Grammatical modifier, in natural languages
* Attribute (computing), a specification that defines a prope ...
,
features
Feature may refer to:
Computing
* Feature (CAD), could be a hole, pocket, or notch
* Feature (computer vision), could be an edge, corner or blob
* Feature (software design) is an intentional distinguishing characteristic of a software ite ...
or
variables to discretized or
nominal attributes/features/variables/
intervals. This can be useful when creating probability mass functions – formally, in
density estimation
In statistics, probability density estimation or simply density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is thought of ...
. It is a form of
discretization
In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerica ...
in general and also of
binning, as in making a
histogram
A histogram is an approximate representation of the frequency distribution, distribution of numerical data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to "Data binning, bin" (or "Data binning, buck ...
. Whenever
continuous
Continuity or continuous may refer to:
Mathematics
* Continuity (mathematics), the opposing concept to discreteness; common examples include
** Continuous probability distribution or random variable in probability and statistics
** Continuous g ...
data is discretized, there is always some amount of
discretization error
In numerical analysis, computational physics, and simulation, discretization error is the error resulting from the fact that a function of a continuous variable is represented in the computer by a finite number of evaluations, for example, on a ...
. The goal is to reduce the amount to a level considered