In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features, or variables to discretized or nominal attributes, features, variables, or intervals. This can be useful when creating probability mass functions (formally, in density estimation). It is a form of discretization in general and also of binning, as in making a histogram. Whenever continuous data is discretized, there is always some amount of discretization error. The goal is to reduce that error to a level considered negligible for the modeling purposes at hand.
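The trade-off between bin count and discretization error can be illustrated with a minimal Python sketch. It assumes equal-width bins and represents each value by the midpoint of its bin; the function name `discretize_to_midpoints` and the synthetic uniform data are illustrative choices, not part of any particular library.

```python
import numpy as np

def discretize_to_midpoints(x, k):
    """Map each value to the midpoint of its equal-width bin (k bins over [min, max]).

    Assumes x contains at least two distinct values, so the bin width is nonzero.
    """
    lo, hi = x.min(), x.max()
    width = (hi - lo) / k
    # Bin index in 0..k-1; the maximum value is clipped into the last bin.
    idx = np.minimum(((x - lo) / width).astype(int), k - 1)
    return lo + (idx + 0.5) * width, width

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=1000)  # synthetic data, for illustration only

for k in (5, 20, 80):
    approx, width = discretize_to_midpoints(x, k)
    max_err = np.abs(x - approx).max()
    print(f"k={k:3d}  bin width={width:.4f}  max error={max_err:.4f}")
```

Under these assumptions the error of any single value is at most half a bin width, so increasing the number of bins shrinks the worst-case error at the cost of a less compact representation.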
Typically, data is discretized into ''K'' partitions of equal width (equal intervals) or into partitions that each contain ''K''% of the total data (equal frequencies).
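The two schemes can be compared with a short NumPy sketch. The helper names `equal_width_bins` and `equal_frequency_bins` are hypothetical, the skewed sample data is synthetic, and the equal-frequency variant is parameterized by the number of bins `k` (so each bin holds roughly 1/`k` of the data) rather than by a percentage.

```python
import numpy as np

def equal_width_bins(x, k):
    """Assign each value to one of k bins of equal width over [min(x), max(x)]."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # Use only the k-1 interior edges so that labels fall in 0..k-1.
    return np.digitize(x, edges[1:-1]), edges

def equal_frequency_bins(x, k):
    """Assign each value to one of k bins holding roughly the same number of points."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1))
    return np.digitize(x, edges[1:-1]), edges

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=1000)  # skewed, illustrative data

width_labels, _ = equal_width_bins(x, 4)
freq_labels, _ = equal_frequency_bins(x, 4)
print("equal width counts:    ", np.bincount(width_labels, minlength=4))
print("equal frequency counts:", np.bincount(freq_labels, minlength=4))
```

On skewed data such as this, equal-width bins tend to place most points in the first bin, while equal-frequency bins keep the counts balanced but produce intervals of very different widths.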
Mechanisms for discretizing continuous data include Fayyad & Irani's MDL method, which uses mutual information to recursively define the best bins, as well as CAIM, CACC, Ameva, and many others.
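A recursive, entropy-based discretizer in the spirit of Fayyad & Irani's method might be sketched as follows. This is an illustrative reading rather than a reference implementation: the function names are hypothetical, the split score is information gain (the mutual information between the binary split indicator and the class label), and the stopping rule is the MDL criterion as published in their 1993 paper. The quadratic search over candidate cuts is kept simple for clarity.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a class-label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_cut(x, y):
    """Return (gain, cut_value, split_index) for the best binary split, or None.

    x must be sorted in ascending order, with y aligned to it.
    """
    n, base = len(y), entropy(y)
    best = None
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # a cut is only possible between distinct values
        e = (i * entropy(y[:i]) + (n - i) * entropy(y[i:])) / n
        gain = base - e
        if best is None or gain > best[0]:
            best = (gain, float((x[i - 1] + x[i]) / 2.0), i)
    return best

def mdlp_accepts(y, i, gain):
    """MDL stopping criterion: accept the split only if the gain pays for its cost."""
    n = len(y)
    left, right = y[:i], y[i:]
    k, k1, k2 = (len(np.unique(s)) for s in (y, left, right))
    delta = np.log2(3.0**k - 2) - (
        k * entropy(y) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (np.log2(n - 1) + delta) / n

def mdlp_cuts(x, y):
    """Recursively collect cut points for one continuous feature."""
    order = np.argsort(x, kind="stable")
    x, y = np.asarray(x)[order], np.asarray(y)[order]

    def recurse(xs, ys):
        found = best_cut(xs, ys)
        if found is None:
            return []
        gain, cut, i = found
        if not mdlp_accepts(ys, i, gain):
            return []
        return recurse(xs[:i], ys[:i]) + [cut] + recurse(xs[i:], ys[i:])

    return recurse(x, y)

# Two well-separated clusters, one per class (synthetic example).
x = np.concatenate([np.linspace(0, 1, 20), np.linspace(5, 6, 20)])
y = np.array([0] * 20 + [1] * 20)
print(mdlp_cuts(x, y))
```

For this toy input the sketch finds a single cut midway between the two clusters and then stops, because neither pure half offers any further gain.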
Many machine learning algorithms are known to produce better models by discretizing continuous attributes.
This is a partial list of software that implements the MDL algorithm.
* A tool designed to work with popular CRF implementations
* mdlp in the R package discretization
* Discretize in the R package RWeka
See also:
* Density estimation
* Continuity correction