Maximal Information Coefficient

In statistics, the maximal information coefficient (MIC) is a measure of the strength of the linear or non-linear association between two variables ''X'' and ''Y''. The MIC belongs to the maximal information-based nonparametric exploration (MINE) class of statistics. In a simulation study, MIC outperformed some selected low-power tests; however, concerns have been raised regarding reduced statistical power in detecting some associations in settings with low sample size when compared to powerful methods such as distance correlation and Heller–Heller–Gorfine (HHG). Comparisons with these methods, in which MIC was outperformed, were made in Simon and Tibshirani and in Gorfine, Heller, and Heller. It is claimed that MIC approximately satisfies a property called ''equitability'', which is illustrated by selected simulation studies. It was later proved that no non-trivial coefficient can exactly satisfy the ''equitability'' property as defined by Reshef et al., although this result has been challenged. Some criticisms of MIC are addressed by Reshef et al. in further studies published on arXiv.


Overview

The maximal information coefficient uses binning as a means to apply mutual information to continuous random variables. Binning has been used for some time as a way of applying mutual information to continuous distributions; what MIC contributes in addition is a methodology for selecting the number of bins and picking a maximum over many possible grids.

The rationale is that the bins for both variables should be chosen in such a way that the mutual information between the variables is maximal. That is achieved whenever \mathrm{H}\left(X_b\right)=\mathrm{H}\left(Y_b\right)=\mathrm{H}\left(X_b,Y_b\right) (the ''b'' subscripts emphasize that the entropies and the mutual information are calculated over the binned variables). Thus, when the mutual information is maximal over a binning of the data, we should expect the following two properties to hold, as far as the nature of the data allows. First, the bins would have roughly the same size, because the entropies \mathrm{H}(X_b) and \mathrm{H}(Y_b) are maximized by equal-sized binning. And second, each bin of ''X'' will roughly correspond to a bin in ''Y''.
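The relation between the binned entropies and the binned mutual information can be made concrete with a short numerical sketch. The following Python code is only an illustration (the function name and the use of NumPy with equal-width bins are assumptions of this sketch; MIC itself searches over grid placements rather than fixing them): it computes \mathrm{H}(X_b), \mathrm{H}(Y_b) and \mathrm{H}(X_b,Y_b) from a 2-D histogram and combines them into \mathrm{I}(X_b;Y_b).

<syntaxhighlight lang="python">
import numpy as np

def binned_entropies(x, y, n_x, n_y):
    """Entropies of the binned variables X_b, Y_b and their joint, in nats.

    Equal-width bins are an illustrative simplification; MIC itself
    searches over grid placements rather than fixing them.
    """
    counts, _, _ = np.histogram2d(x, y, bins=(n_x, n_y))
    p_xy = counts / counts.sum()   # joint distribution over the grid cells
    p_x = p_xy.sum(axis=1)         # marginal of the binned X
    p_y = p_xy.sum(axis=0)         # marginal of the binned Y

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return h(p_x), h(p_y), h(p_xy.ravel())

# For a strictly monotone relationship the bins correspond one-to-one,
# so H(X_b), H(Y_b) and H(X_b, Y_b) are roughly equal and the mutual
# information is maximal for this grid size.
x = np.linspace(0.0, 1.0, 1000)
hx, hy, hxy = binned_entropies(x, 2 * x + 1, n_x=4, n_y=4)
print(hx, hy, hxy)       # roughly equal
print(hx + hy - hxy)     # I(X_b; Y_b) = H(X_b) + H(Y_b) - H(X_b, Y_b)
</syntaxhighlight>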
Because the variables ''X'' and ''Y'' are real numbers, it is almost always possible to create exactly one bin for each (''x'',''y'') datapoint, and that would yield a very high value of the MI. To avoid forming this kind of trivial partitioning, the authors of the paper propose taking a number of bins n_x for ''X'' and n_y for ''Y'' whose product is relatively small compared with the size ''N'' of the data sample. Concretely, they propose: n_x\times n_y \leq \mathrm{N}^{0.6}. In some cases it is possible to achieve a good correspondence between X_b and Y_b with numbers as low as n_x=2 and n_y=2, while in other cases the number of bins required may be higher.

The maximum for \mathrm{I}(X_b;Y_b) is determined by \mathrm{H}(X_b), which is in turn determined by the number of bins in each axis; therefore, the mutual information value depends on the number of bins selected for each variable. In order to compare mutual information values obtained with partitions of different sizes, the mutual information value is normalized by dividing by the maximum achievable value for the given partition size. It is worth noting that a similar adaptive binning procedure for estimating mutual information had been proposed previously. Entropy is maximized by uniform probability distributions, or in this case, by bins with the same number of elements. Also, joint entropy is minimized by having a one-to-one correspondence between bins. If we substitute such values in the formula I(X;Y)=H(X)+H(Y)-H(X,Y), we can see that the maximum value achievable by the MI for a given pair n_x,n_y of bin counts is \log\min\left(n_x,n_y\right). Thus, this value is used as a normalizing divisor for each pair of bin counts.

Last, the normalized maximal mutual information values for different combinations of n_x and n_y are tabulated, and the maximum value in the table is selected as the value of the statistic. It is important to note that trying all possible binning schemes that satisfy n_x\times n_y \leq \mathrm{N}^{0.6} is computationally infeasible even for small ''N''; therefore, in practice the authors apply a heuristic which may or may not find the true maximum.
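As a rough illustration of the steps described above (the constraint n_x\times n_y \leq \mathrm{N}^{0.6}, the normalization by \log\min\left(n_x,n_y\right), and the maximization over the table of grid sizes), the following Python sketch enumerates the admissible bin-count pairs exhaustively. It is not the authors' algorithm: it fixes equal-width bins instead of optimizing the positions of the grid lines, and the function names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def normalized_binned_mi(x, y, n_x, n_y):
    """Mutual information of equal-width-binned data, divided by log(min(n_x, n_y))."""
    p_xy, _, _ = np.histogram2d(x, y, bins=(n_x, n_y))
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)          # marginal of binned X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)          # marginal of binned Y (row vector)
    mask = p_xy > 0
    mi = np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))
    return mi / np.log(min(n_x, n_y))

def mic_sketch(x, y, exponent=0.6):
    """Exhaustive toy version of the MIC search over bin counts (not the
    published heuristic): try all (n_x, n_y) with n_x * n_y <= N**0.6 and
    keep the largest normalized mutual information."""
    budget = len(x) ** exponent
    best = 0.0
    for n_x in range(2, int(budget / 2) + 1):
        for n_y in range(2, int(budget / n_x) + 1):
            best = max(best, normalized_binned_mi(x, y, n_x, n_y))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
print(mic_sketch(x, x**2 + rng.normal(scale=0.1, size=500)))  # noisy parabola: large value
print(mic_sketch(x, rng.uniform(-1.0, 1.0, 500)))             # independent noise: smaller value
</syntaxhighlight>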


Notes


References

{{Reflist}}

[[Category:Information theory]]
[[Category:Covariance and correlation]]