Entropy estimation

In various science/engineering applications, such as independent component analysis, image analysis, genetic analysis, speech recognition, manifold learning, and time delay estimation,[Benesty, J.; Huang, Y.; Chen, J. (2007). "Time Delay Estimation via Minimum Entropy". ''Signal Processing Letters'', 14(3), 157–160.] it is useful to estimate the differential entropy of a system or process, given some observations. The simplest and most common approach uses histogram-based estimation, but other approaches have been developed and used, each with its own benefits and drawbacks.[Beirlant, J.; Dudewicz, E. J.; Győrfi, L.; van der Meulen, E. C. (1997). "Nonparametric entropy estimation: An overview". ''International Journal of Mathematical and Statistical Sciences'', 6, 17–39.] The main factor in choosing a method is often a trade-off between the bias and the variance of the estimate,[Schürmann, T. (2004). "Bias analysis in entropy estimation". ''J. Phys. A: Math. Gen.'', 37, L295–L301.] although the nature of the (suspected) distribution of the data may also be a factor.


Histogram estimator

The histogram approach uses the idea that the differential entropy of a probability distribution f(x) for a continuous random variable x,

h(X) = -\int_{\mathbb{X}} f(x)\log f(x)\,dx,

can be approximated by first approximating f(x) with a histogram of the observations, and then finding the discrete entropy of a quantization of x,

H(X) = -\sum_{i=1}^{n} f(x_i)\log\left(\frac{f(x_i)}{w(x_i)}\right),

with the bin probabilities f(x_i) given by that histogram and w(x_i) the width of the ith bin. The histogram is itself a maximum-likelihood (ML) estimate of the discretized frequency distribution. Histograms can be quick to calculate, and simple, so this approach has some attraction. However, the estimate produced is biased, and although corrections can be made to the estimate, they may not always be satisfactory.[Miller, G. (1955). "Note on the bias of information estimates". In ''Information Theory in Psychology: Problems and Methods'', pp. 95–100.] A method better suited for multidimensional probability density functions (pdfs) is to first make a pdf estimate with some method, and then, from the pdf estimate, compute the entropy. A useful pdf estimation method is, for example, Gaussian mixture modelling (GMM), where the expectation–maximization (EM) algorithm is used to find an ML estimate of a weighted sum of Gaussian pdfs approximating the data pdf.
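
As a rough illustration, the plug-in estimator described above can be written in a few lines. The following Python sketch assumes one-dimensional data and natural logarithms; the function name, the bin count and the example distribution are illustrative choices rather than anything prescribed by the method.

```python
import numpy as np

def histogram_entropy(samples, bins=20):
    """Plug-in differential-entropy estimate from a histogram (in nats)."""
    counts, edges = np.histogram(samples, bins=bins)
    widths = np.diff(edges)                  # w(x_i): bin widths
    probs = counts / counts.sum()            # f(x_i): bin probabilities
    nonzero = probs > 0                      # empty bins contribute nothing
    return -np.sum(probs[nonzero] * np.log(probs[nonzero] / widths[nonzero]))

# A standard normal has differential entropy 0.5*log(2*pi*e), about 1.42 nats.
rng = np.random.default_rng(0)
print(histogram_entropy(rng.normal(size=10_000)))
```

The number of bins controls the bias–variance trade-off mentioned above: few wide bins oversmooth the density, while many narrow bins leave most bins nearly empty.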


Estimates based on sample-spacings

If the data is one-dimensional, we can imagine taking all the observations and putting them in order of their value. The spacing between one value and the next then gives a rough idea of (the reciprocal of) the probability density in that region: the closer together the values are, the higher the probability density. This is a very rough estimate with high variance, but it can be improved, for example by considering the space between a given value and the one ''m'' away from it, where ''m'' is some fixed number. The probability density estimated in this way can then be used to calculate the entropy estimate, in a similar way to that given above for the histogram, but with some slight tweaks. One of the main drawbacks of this approach is going beyond one dimension: the idea of lining the data points up in order falls apart in more than one dimension. However, using analogous methods, some multidimensional entropy estimators have been developed.[Learned-Miller, E. G. (2003). "A new class of entropy estimators for multi-dimensional densities". In ''Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03)'', vol. 3, pp. 297–300.][Lee, I. (2010). "Sample-spacings based density and entropy estimators for spherically invariant multidimensional data". ''Neural Computation'', 22(8), 2208–2227.]
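
The text above does not fix a particular formula; one classical instance is Vasicek's ''m''-spacing estimator, sketched below for continuous one-dimensional data. The function name and the square-root rule for choosing ''m'' are illustrative assumptions, not something taken from the references above.

```python
import numpy as np

def vasicek_entropy(samples, m=None):
    """Vasicek m-spacing estimate of differential entropy (in nats, 1-D data)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    if m is None:
        m = max(1, int(np.sqrt(n)))          # a common rule of thumb
    idx = np.arange(n)
    upper = x[np.minimum(idx + m, n - 1)]    # X_(i+m), clipped at the ends
    lower = x[np.maximum(idx - m, 0)]        # X_(i-m), clipped at the ends
    spacings = upper - lower                 # wider spacing => lower local density
    return np.mean(np.log(n / (2 * m) * spacings))

rng = np.random.default_rng(1)
print(vasicek_entropy(rng.normal(size=10_000)))   # close to 1.42 for a standard normal
```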


Estimates based on nearest-neighbours

For each point in our dataset, we can find the distance to its nearest neighbour. We can in fact estimate the entropy from the distribution of the nearest-neighbour distances of our data points. (In a uniform distribution these distances all tend to be fairly similar, whereas in a strongly nonuniform distribution they may vary a lot more.)
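
A widely used estimator built on this idea is the Kozachenko–Leonenko k-nearest-neighbour estimator. The sketch below is a minimal version of it, assuming SciPy is available and that the data is continuous with no duplicate points; the function names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(samples, k=1):
    """Kozachenko-Leonenko k-NN differential-entropy estimate (in nats)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                       # treat 1-D input as shape (n, 1)
    n, d = x.shape
    tree = cKDTree(x)
    # Each point is its own nearest neighbour at distance 0, so query k+1.
    distances, _ = tree.query(x, k=k + 1)
    eps = distances[:, k]                    # distance to the k-th real neighbour
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(eps))

rng = np.random.default_rng(2)
# A 2-D standard normal has differential entropy log(2*pi*e), about 2.84 nats.
print(knn_entropy(rng.normal(size=(5_000, 2))))
```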


Bayesian estimator

In the under-sampled regime, having a prior on the distribution can help the estimation. One such Bayesian estimator, proposed in the neuroscience context, is the NSB (Nemenman–Shafee–Bialek) estimator.[Nemenman, I.; Shafee, F.; Bialek, W. (2003). "Entropy and Inference, Revisited". ''Advances in Neural Information Processing Systems''.][Nemenman, I.; Bialek, W.; de Ruyter van Steveninck, R. (2004). "Entropy and information in neural spike trains: Progress on the sampling problem". ''Physical Review E''.] The NSB estimator uses a mixture of Dirichlet priors, chosen such that the induced prior over the entropy is approximately uniform.
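
The full NSB construction, which integrates over the Dirichlet concentration parameter, is not reproduced here. The sketch below shows only its simpler building block: the closed-form posterior-mean entropy of a discrete distribution under a single symmetric Dirichlet prior with fixed concentration. The function name, the choice of pseudocount and the example counts are illustrative.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_mean_entropy(counts, alpha=1.0):
    """Posterior-mean Shannon entropy (in nats) under a symmetric Dirichlet prior.

    For a Dirichlet(a_1, ..., a_K) posterior with a_i = counts_i + alpha,
    E[H] = psi(A + 1) - sum_i (a_i / A) * psi(a_i + 1), where A = sum_i a_i.
    """
    a = np.asarray(counts, dtype=float) + alpha
    A = a.sum()
    return digamma(A + 1) - np.sum((a / A) * digamma(a + 1))

# Ten bins observed only 30 times: the prior regularises the naive estimate.
counts = [6, 4, 5, 3, 2, 4, 1, 3, 1, 1]
print(dirichlet_mean_entropy(counts, alpha=0.5))
```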


Estimates based on expected entropy

A new approach to the problem of entropy evaluation is to compare the expected entropy of a sample of a random sequence with the calculated entropy of the sample. The method gives very accurate results, but it is limited to calculations of random sequences modeled as Markov chains of the first order with small values of bias and correlation. This is the first known method that takes into account the size of the sample sequence and its impact on the accuracy of the calculation of entropy.[Lesniewicz, M. (2014). "Expected Entropy as a Measure and Criterion of Randomness of Binary Sequence". ''Przeglad Elektrotechniczny'', 90(1), 42–46.][Lesniewicz, M. (2016). "Analyses and Measurements of Hardware Generated Random Binary Sequences Modeled as Markov Chain". ''Przeglad Elektrotechniczny'', 92(11), 268–274.]
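
The construction of the cited papers is not reproduced here; the sketch below only shows, under illustrative parameter choices, the two kinds of quantity such a comparison involves: the theoretical entropy rate of a binary first-order Markov chain with a given bias and correlation, and the empirical per-bit entropy computed from a finite sample of it.

```python
import numpy as np

def _binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def markov_entropy_rate(p01, p10):
    """Entropy rate (bits/symbol) of a binary first-order Markov chain.

    p01 = P(next = 1 | current = 0), p10 = P(next = 0 | current = 1).
    """
    pi0 = p10 / (p01 + p10)                  # stationary probability of state 0
    return pi0 * _binary_entropy(p01) + (1 - pi0) * _binary_entropy(p10)

def empirical_bit_entropy(bits):
    """Zeroth-order plug-in entropy of a binary sample (bits/symbol)."""
    return _binary_entropy(float(np.mean(bits)))

# Simulate a slightly biased, slightly correlated chain and compare the two values.
rng = np.random.default_rng(3)
p01, p10 = 0.45, 0.55
bits = np.empty(100_000, dtype=np.int8)
bits[0] = 0
for t in range(1, len(bits)):
    p_one = p01 if bits[t - 1] == 0 else 1.0 - p10
    bits[t] = rng.random() < p_one
print(markov_entropy_rate(p01, p10), empirical_bit_entropy(bits))
```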

