Medcouple

In statistics, the medcouple is a robust statistic that measures the skewness of a univariate distribution. It is defined as a scaled median difference between the left and right halves of a distribution. Its robustness makes it suitable for identifying outliers in adjusted boxplots. Ordinary box plots do not fare well with skewed distributions, since they label points in the longer, asymmetric tail as outliers. Using the medcouple, the whiskers of a boxplot can be adjusted for skewed distributions, giving a more accurate identification of outliers in non-symmetric distributions. As a kind of order statistic, the medcouple belongs to the class of incomplete generalised L-statistics. Like the ordinary median or mean, the medcouple is a nonparametric statistic, so it can be computed for any distribution.


Definition

The following description uses zero-based indexing in order to harmonise with the indexing in many programming languages. Let $X := \{x_0 \geq x_1 \geq \cdots \geq x_{n-1}\}$ be an ordered sample of size $n$, and let $x_m$ be the median of $X$. Define the sets

$$X^+ := \{x_i \mid x_i \geq x_m\}, \qquad X^- := \{x_j \mid x_j \leq x_m\},$$

of sizes $p := |X^+|$ and $q := |X^-|$ respectively. Both sets inherit the decreasing order of $X$, and $x_i^+$ and $x_j^-$ denote their elements at zero-based positions $i$ and $j$. For $x_i^+ \in X^+$ and $x_j^- \in X^-$, we define the kernel function

$$h(x_i^+, x_j^-) := \begin{cases} \dfrac{(x_i^+ - x_m) - (x_m - x_j^-)}{x_i^+ - x_j^-} & \text{if } x_i^+ > x_j^-, \\[1ex] \operatorname{signum}(p - 1 - i - j) & \text{if } x_i^+ = x_m = x_j^-, \end{cases}$$

where $\operatorname{signum}$ is the sign function. The medcouple is then the median of the set

$$\{\, h(x_i^+, x_j^-) \mid x_i^+ \in X^+ \text{ and } x_j^- \in X^- \,\}.$$

In other words, we split the distribution into all values greater than or equal to the median and all values less than or equal to the median. We define a kernel function whose first variable ranges over the $p$ greater values and whose second variable ranges over the $q$ lesser values. For the special case of values tied to the median, we define the kernel by the signum function. The medcouple is then the median over all $pq$ values of $h(x_i^+, x_j^-)$. Since the medcouple is not a median applied to all couples $(x_i, x_j)$, but only to those for which $x_i^+ \geq x_m \geq x_j^-$, it belongs to the class of incomplete generalised L-statistics.
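As a minimal sketch of the definition, the following evaluates the kernel on the two halves of a small sample and takes the median of all $pq$ values. It assumes Python's `statistics.median` for $x_m$; the helper names `split_at_median` and `kernel` are illustrative, not taken from any library.

```python
import statistics


def split_at_median(xs):
    """Return (x_m, X_plus, X_minus); both halves keep the decreasing order of X."""
    x = sorted(xs, reverse=True)          # x_0 >= x_1 >= ... >= x_{n-1}
    xm = statistics.median(x)
    x_plus = [v for v in x if v >= xm]    # p values, indexed by i = 0, ..., p-1
    x_minus = [v for v in x if v <= xm]   # q values, indexed by j = 0, ..., q-1
    return xm, x_plus, x_minus


def kernel(x_plus, x_minus, xm, i, j):
    """Medcouple kernel h(x_i^+, x_j^-) for zero-based positions i and j."""
    a, b = x_plus[i], x_minus[j]
    if a > b:
        return ((a - xm) - (xm - b)) / (a - b)
    # Otherwise a == x_m == b: break the tie with signum(p - 1 - i - j).
    s = len(x_plus) - 1 - i - j
    return float((s > 0) - (s < 0))


xm, x_plus, x_minus = split_at_median([1, 2, 3, 5, 9])
values = [kernel(x_plus, x_minus, xm, i, j)
          for i in range(len(x_plus)) for j in range(len(x_minus))]
print(statistics.median(values))  # 0.3333333333333333
```

The sample has a longer right tail, and the medcouple of its nine kernel values comes out positive.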


Properties of the medcouple

The medcouple has a number of desirable properties. A few of them are directly inherited from the kernel function.


The medcouple kernel

We make the following observations about the kernel function $h(x_i^+, x_j^-)$:

1. The kernel function is location-invariant. If we add the same constant to every element of the sample $X$, the corresponding values of the kernel function do not change.
2. The kernel function is scale-invariant. Scaling all elements of the sample $X$ by the same positive factor does not alter the values of the kernel function.

These properties are in turn inherited by the medcouple. Thus, the medcouple is independent of the mean and standard deviation of a distribution, a desirable property for measuring skewness. For ease of computation, these properties enable us to define the two sets

$$Z^+ := \left\{ z_i^+ = \frac{x_i^+ - x_m}{r} \;\middle|\; 0 \leq i < p \right\}, \qquad Z^- := \left\{ z_j^- = \frac{x_j^- - x_m}{r} \;\middle|\; 0 \leq j < q \right\},$$

where $r = 2 \max_{0 \leq i \leq n-1} |x_i|$. This gives the set $Z := Z^+ \cup Z^-$ a range of at most 1 and median 0, while keeping the same medcouple as $X$. For $Z$, the medcouple kernel reduces to

$$h(z_i^+, z_j^-) := \begin{cases} \dfrac{z_i^+ + z_j^-}{z_i^+ - z_j^-} & \text{if } z_i^+ > z_j^-, \\[1ex] \operatorname{signum}(p - 1 - i - j) & \text{if } z_i^+ = 0 = z_j^-. \end{cases}$$

Using the recentred and rescaled set $Z$, we can observe the following.
3. The kernel function is between $-1$ and $1$, that is, $|h(z_i^+, z_j^-)| \leq 1$. Because $z_i^+ \geq 0 \geq z_j^-$, the numerator $z_i^+ + z_j^-$ equals $|z_i^+| - |z_j^-|$ and the denominator equals $|z_i^+ - z_j^-|$, so the bound follows from the reverse triangle inequality $\bigl| |a| - |b| \bigr| \leq |a - b|$ with $a = z_i^+$ and $b = z_j^-$.
4. The medcouple kernel $h(z_i^+, z_j^-)$ is non-decreasing in each variable. This can be verified from the partial derivatives

$$\frac{\partial h}{\partial z_i^+} = \frac{-2 z_j^-}{(z_i^+ - z_j^-)^2}, \qquad \frac{\partial h}{\partial z_j^-} = \frac{2 z_i^+}{(z_i^+ - z_j^-)^2},$$

both nonnegative, since $z_i^+ \geq 0 \geq z_j^-$.

With properties 1, 2, and 4, we can thus define the following matrix:

$$H := (h_{ij}) = \bigl(h(z_i^+, z_j^-)\bigr) = \begin{pmatrix} h(z_0^+, z_0^-) & \cdots & h(z_0^+, z_{q-1}^-) \\ \vdots & \ddots & \vdots \\ h(z_{p-1}^+, z_0^-) & \cdots & h(z_{p-1}^+, z_{q-1}^-) \end{pmatrix}.$$

If we sort the sets $Z^+$ and $Z^-$ in decreasing order, then the matrix $H$ has sorted rows and sorted columns:

$$H = \begin{pmatrix} h(z_0^+, z_0^-) & \geq & \cdots & \geq & h(z_0^+, z_{q-1}^-) \\ \geq & & & & \geq \\ \vdots & & \ddots & & \vdots \\ \geq & & & & \geq \\ h(z_{p-1}^+, z_0^-) & \geq & \cdots & \geq & h(z_{p-1}^+, z_{q-1}^-) \end{pmatrix}.$$

The medcouple is then the median of this matrix with sorted rows and sorted columns. The fact that the rows and columns are sorted allows the implementation of a fast algorithm for computing the medcouple.
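The kernel properties can be checked numerically. The sketch below uses a hypothetical helper `kernel_matrix` that builds $H$ directly from the rescaled sets $Z^+$ and $Z^-$ (with an added guard for an all-zero sample, which the text does not discuss), then verifies that every entry lies in $[-1, 1]$, that rows and columns are non-increasing, and that the affine change $x \mapsto 3x - 7$ leaves $H$ unchanged up to rounding.

```python
import math
import statistics


def kernel_matrix(xs):
    """p-by-q matrix H = (h(z_i^+, z_j^-)) built from the rescaled sets Z^+ and Z^-."""
    x = sorted(xs, reverse=True)
    xm = statistics.median(x)
    r = 2 * max(abs(v) for v in x) or 1.0          # assumption: guard an all-zero sample
    z_plus = [(v - xm) / r for v in x if v >= xm]   # decreasing, all >= 0
    z_minus = [(v - xm) / r for v in x if v <= xm]  # decreasing, all <= 0
    p = len(z_plus)

    def h(i, j):
        a, b = z_plus[i], z_minus[j]
        if a == b:                                  # both are 0: tied to the median
            s = p - 1 - i - j
            return float((s > 0) - (s < 0))
        return (a + b) / (a - b)

    return [[h(i, j) for j in range(len(z_minus))] for i in range(p)]


def is_row_column_sorted(H):
    """True if every row and every column of H is non-increasing."""
    rows = all(row[j] >= row[j + 1] for row in H for j in range(len(row) - 1))
    cols = all(H[i][j] >= H[i + 1][j]
               for i in range(len(H) - 1) for j in range(len(H[0])))
    return rows and cols


sample = [1, 2, 3, 5, 9]
H = kernel_matrix(sample)
H_affine = kernel_matrix([3 * v - 7 for v in sample])
print(all(abs(v) <= 1 for row in H for v in row))          # True
print(is_row_column_sorted(H))                             # True
print(all(math.isclose(a, b, abs_tol=1e-12)
          for r1, r2 in zip(H, H_affine) for a, b in zip(r1, r2)))  # True
```

Only a positive scale factor is tested: a negative factor swaps the roles of $X^+$ and $X^-$ and flips the sign of the skewness.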


Robustness

The breakdown point is the proportion of values that a statistic can resist before it becomes meaningless, i.e. the largest proportion of arbitrarily large outliers that the data set $X$ may contain while the value of the statistic remains meaningful. For the medcouple, the breakdown point is 25%, since it is a median taken over the couples $(x_i, x_j)$ such that $x_i \geq x_m \geq x_j$.


Values

Like all measures of skewness, the medcouple is positive for distributions that are skewed to the right, negative for distributions skewed to the left, and zero for symmetric distributions. In addition, since every kernel value lies in $[-1, 1]$, the values of the medcouple are bounded by 1 in absolute value.


Algorithms for computing the medcouple

Before presenting medcouple algorithms, we recall that there exist $O(n)$ algorithms for finding the median. Since the medcouple is a median, ordinary algorithms for median-finding are important.
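For reference, one common selection routine is sketched below. It is randomised quickselect, which runs in expected $O(n)$ time but $O(n^2)$ in the worst case; a guaranteed $O(n)$ bound needs a deterministic pivot rule such as median of medians. The names `quickselect` and `select_median` are illustrative.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (zero-based) of a non-empty sequence."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(lows):
            items = lows                               # answer is below the pivot
        elif k < len(lows) + n_equal:
            return pivot                               # answer equals the pivot
        else:
            k -= len(lows) + n_equal                   # answer is above the pivot
            items = [v for v in items if v > pivot]


def select_median(values):
    """Median by selection; averages the two middle elements when n is even."""
    n = len(values)
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(select_median([7, 1, 5, 3, 9]))   # 5
```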


Naïve algorithm

The naïve algorithm for computing the medcouple is slow. It proceeds in two steps. First, it constructs the medcouple matrix $H$, which contains all of the possible values of the medcouple kernel. In the second step, it finds the median of this matrix. Since there are $pq \approx \frac{n^2}{4}$ entries in the matrix when all elements of the data set $X$ are unique, the algorithmic complexity of the naïve algorithm is $O(n^2)$.

More concretely, the naïve algorithm proceeds as follows. Recall that we are using zero-based indexing.

    function naïve_medcouple(vector X):
        // X is a vector of size n.

        // Sorting in decreasing order can be done in-place in O(n log n) time
        sort_decreasing(X)

        xm := median(X)
        xscale := 2 * max(abs(X))

        // Define the upper and lower centred and rescaled vectors;
        // they inherit X's own decreasing sorting
        Zplus  := [(x - xm)/xscale | x in X such that x >= xm]
        Zminus := [(x - xm)/xscale | x in X such that x <= xm]

        p := size(Zplus)
        q := size(Zminus)

        // Define the kernel function closing over Zplus and Zminus
        function h(i, j):
            a := Zplus[i]
            b := Zminus[j]

            if a == b:
                return signum(p - 1 - i - j)
            else:
                return (a + b) / (a - b)
            endif
        endfunction

        // O(n^2) operations necessary to form this vector
        H := [h(i, j) | i in [0, 1, ..., p - 1] and j in [0, 1, ..., q - 1]]

        return median(H)
    endfunction

The final call to median on a vector of size $O(n^2)$ can itself be done in $O(n^2)$ operations, hence the entire naïve medcouple algorithm is of the same complexity.
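A runnable Python rendering of the naïve pseudocode might look as follows. It is a sketch, assuming `statistics.median` for both medians (which averages the two middle values when their count is even) and adding a guard for an all-zero sample that the pseudocode does not mention; `naive_medcouple` is an illustrative name.

```python
import statistics


def naive_medcouple(xs):
    """O(n^2) medcouple: evaluate every kernel value, then take their median."""
    x = sorted(xs, reverse=True)                 # decreasing order, O(n log n)
    xm = statistics.median(x)
    xscale = 2 * max(abs(v) for v in x) or 1.0   # assumption: guard an all-zero sample

    # Upper and lower centred, rescaled vectors; both keep the decreasing order.
    z_plus = [(v - xm) / xscale for v in x if v >= xm]
    z_minus = [(v - xm) / xscale for v in x if v <= xm]
    p, q = len(z_plus), len(z_minus)

    def h(i, j):
        a, b = z_plus[i], z_minus[j]
        if a == b:                               # both tied to the median
            s = p - 1 - i - j
            return float((s > 0) - (s < 0))      # signum(p - 1 - i - j)
        return (a + b) / (a - b)

    # p * q kernel evaluations, then an ordinary median of all of them.
    H = [h(i, j) for i in range(p) for j in range(q)]
    return statistics.median(H)


print(naive_medcouple([1, 2, 3, 4, 5]))       # 0.0 (symmetric sample)
print(naive_medcouple([1, 2, 3, 5, 9]))       # about 1/3 (skewed to the right)
print(naive_medcouple([-1, -2, -3, -5, -9]))  # about -1/3 (skewed to the left)
```

The three calls illustrate the sign behaviour described under Values: zero for a symmetric sample, positive for a right-skewed one, and the mirrored sample gives the negated value.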




Fast algorithm

The fast algorithm outperforms the naïve algorithm by exploiting the sorted nature of the medcouple matrix $H$. Instead of computing all entries of the matrix, the fast algorithm uses the Kth pair algorithm of Johnson & Mizoguchi. The first stage of the fast algorithm proceeds as in the naïve algorithm. We first compute the necessary ingredients for the kernel matrix, $H = (h_{ij})$, with sorted rows and sorted columns in decreasing order. Rather than computing all values of $h_{ij}$, we instead exploit the monotonicity in rows and columns, via the following observations.


Comparing a value against the kernel matrix

First, we note that we can compare any $u$ with all values $h_{ij}$ of $H$ in $O(n)$ time. For example, for determining all $i$ and $j$ such that $h_{ij} > u$, we have the following function:

    function greater_h(kernel h, int p, int q, real u):
        // h is the kernel function, h(i,j) gives the ith, jth entry of H
        // p and q are the number of rows and columns of the kernel matrix H

        // vector of size p
        P := vector(p)

        // indexing from zero
        j := 0

        // starting from the bottom, compute the least upper bound for each row
        for i := p - 1, p - 2, ..., 1, 0:

            // search this row until we find a value less than or equal to u
            while j < q and h(i, j) > u:
                j := j + 1
            endwhile

            // the entry preceding the one we just found is greater than u
            P[i] := j - 1
        endfor

        return P
    endfunction

This greater_h function traverses the kernel matrix from the bottom left to the top right, and returns a vector $P$ of indices that indicate, for each row, where the boundary lies between values greater than $u$ and those less than or equal to $u$. This method works because of the row-column sorted property of $H = (h_{ij})$. Since greater_h computes at most $p + q$ values of $h_{ij}$, its complexity is $O(n)$. Conceptually, the resulting $P$ vector establishes a boundary on the matrix:

[Figure: the boundary P on the kernel matrix H; the entries greater than u are shown in red.]

The symmetric algorithm for computing the values of $h_{ij}$ less than $u$ is very similar. It instead proceeds along $H$ in the opposite direction, from the top right to the bottom left:

    function less_h(kernel h, int p, int q, real u):

        // vector of size p
        Q := vector(p)

        // last possible column index
        j := q - 1

        // starting from the top, compute the greatest lower bound for each row
        for i := 0, 1, ..., p - 2, p - 1:

            // search this row until we find a value greater than or equal to u
            while j >= 0 and h(i, j) < u:
                j := j - 1
            endwhile

            // the entry following the one we just found is less than u
            Q[i] := j + 1
        endfor

        return Q
    endfunction

This lower boundary can be visualised in the same way:

[Figure: the boundary Q on the kernel matrix H; the entries smaller than u are shown in blue.]

For each $i$, we have $P_i + 1 \leq Q_i$, with strict inequality occurring only for those rows that contain values equal to $u$; the difference $Q_i - (P_i + 1)$ is the number of such values in row $i$. We also have that the sums

$$\sum_{i=0}^{p-1} (P_i + 1) \qquad \text{and} \qquad \sum_{i=0}^{p-1} Q_i$$

give, respectively, the number of elements of $H$ that are greater than $u$ and the number of elements that are greater than or equal to $u$. Thus this method also yields the rank of $u$ within the elements $h_{ij}$ of $H$.
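The two boundary scans translate directly into Python. The sketch below takes the kernel as a callable `h(i, j)` on any matrix whose rows and columns are non-increasing, as the pseudocode assumes, and checks the counting identities on a small hand-made matrix (an arbitrary example, not a medcouple kernel).

```python
def greater_h(h, p, q, u):
    """P[i] is the last column of row i holding a value > u (-1 if there is none)."""
    P = [0] * p
    j = 0
    for i in range(p - 1, -1, -1):          # bottom row first
        while j < q and h(i, j) > u:        # stop at the first value <= u
            j += 1
        P[i] = j - 1
    return P


def less_h(h, p, q, u):
    """Q[i] is the first column of row i holding a value < u (q if there is none)."""
    Q = [0] * p
    j = q - 1
    for i in range(p):                      # top row first
        while j >= 0 and h(i, j) < u:       # stop at the first value >= u
            j -= 1
        Q[i] = j + 1
    return Q


H = [[5, 4, 3],
     [4, 3, 1],
     [2, 1, 0]]


def h(i, j):
    return H[i][j]


P = greater_h(h, 3, 3, 3)
Q = less_h(h, 3, 3, 3)
print(P, Q)                      # [1, 0, -1] [3, 2, 0]
print(sum(P) + len(P), sum(Q))   # 3 5
```

Here three entries exceed 3 and five are at least 3; rows 0 and 1 contain the value 3, which is exactly where $P_i + 1 < Q_i$.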


Weighted median of row medians

The second observation is that we can use the sorted matrix structure to instantly compare any element to at least half of the entries in the matrix. For example, the median of the row medians across the entire matrix is no larger than any entry of the upper left quadrant (in red) and no smaller than any entry of the lower right quadrant (in blue):

[Figure: kernel matrix H with the upper left quadrant in red and the lower right quadrant in blue.]

More generally, using the boundaries given by the $P$ and $Q$ vectors from the previous section, we can assume that after some iterations, we have pinpointed the position of the medcouple to lie between the red left boundary and the blue right boundary:

[Figure: candidate entries of H between a red left boundary and a blue right boundary; the median of each row is marked in yellow.]

The yellow entries indicate the median of each row. If we mentally re-arrange the rows so that the medians align and ignore the discarded entries outside the boundaries,

[Figure: the remaining row segments shifted so that their yellow row medians line up.]

we can select a weighted median of these medians, each entry weighted by the number of remaining entries on its row. This ensures that we can discard at least 1/4 of all remaining values, no matter whether we have to discard the larger values in red or the smaller values in blue:

[Figure: the entries discarded on either side of the weighted median of the row medians.]

Each row median can be computed in $O(1)$ time, since the rows are sorted, and the weighted median can be computed in $O(n)$ time, using a binary search.
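A minimal sketch of one selection step follows. For simplicity it computes the weighted median by sorting, which costs $O(k \log k)$ for $k$ rows rather than the $O(n)$ bound stated above. The names `weighted_median` and `row_medians` are illustrative; the kernel is passed as a callable `h(i, j)`, together with per-row left and right boundary vectors `L` and `R`.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight reaches half."""
    total = sum(weights)
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if 2 * cumulative >= total:
            return value
    raise ValueError("weights must be positive and non-empty")


def row_medians(h, L, R):
    """Middle entry of every non-empty row segment [L[i], R[i]] and its length."""
    rows = [i for i in range(len(L)) if L[i] <= R[i]]
    medians = [h(i, (L[i] + R[i]) // 2) for i in rows]   # O(1) per sorted row
    weights = [R[i] - L[i] + 1 for i in rows]             # remaining entries per row
    return medians, weights


H = [[5, 4, 3],
     [4, 3, 1],
     [2, 1, 0]]
medians, weights = row_medians(lambda i, j: H[i][j], [0, 0, 0], [2, 2, 2])
print(medians, weights)                     # [4, 3, 1] [3, 3, 3]
print(weighted_median(medians, weights))    # 3
```

Rows that are already empty (`L[i] > R[i]`) are skipped, so they contribute neither a median nor a weight.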


Kth pair algorithm

Putting together these two observations, the fast medcouple algorithm proceeds broadly as follows.

1. Compute the necessary ingredients for the medcouple kernel function $h(i, j)$ with $p$ sorted rows and $q$ sorted columns.
2. At each iteration, approximate the medcouple with the weighted median of the row medians.
3. Compare this tentative guess to the entire matrix, obtaining right and left boundary vectors $P$ and $Q$ respectively. The sum of these vectors also gives us the rank of this tentative medcouple.
   a. If the rank of the tentative medcouple is exactly $pq/2$, then stop. We have found the medcouple.
   b. Otherwise, discard the entries greater than or less than the tentative guess by picking either $P$ or $Q$ as the new right or left boundary, depending on which side the element of rank $pq/2$ is in. This step always discards at least 1/4 of all remaining entries.
4. Once the number of candidate medcouples between the right and left boundaries is less than or equal to $p$, perform a rank selection amongst the remaining entries, such that the rank within this smaller candidate set corresponds to the $pq/2$ rank of the medcouple within the whole matrix.

The initial sorting in order to form the $h(i, j)$ function takes $O(n \log n)$ time. At each iteration, the weighted median takes $O(n)$ time, as do the computations of the new tentative left and right boundaries $Q$ and $P$. Since each iteration discards at least 1/4 of all remaining entries and the matrix has $pq \leq n^2$ entries, there will be at most $\log_{4/3}(pq) = O(\log n)$ iterations. Thus, the whole fast algorithm takes $O(n \log n)$ time.

Let us restate the fast algorithm in more detail.

    function medcouple(vector X):
        // X is a vector of size n

        // Compute initial ingredients as for the naïve medcouple
        sort_decreasing(X)

        xm := median(X)
        xscale := 2 * max(abs(X))

        Zplus  := [(x - xm)/xscale | x in X such that x >= xm]
        Zminus := [(x - xm)/xscale | x in X such that x <= xm]

        p := size(Zplus)
        q := size(Zminus)

        function h(i, j):
            a := Zplus[i]
            b := Zminus[j]

            if a == b:
                return signum(p - 1 - i - j)
            else:
                return (a + b) / (a - b)
            endif
        endfunction

        // Begin Kth pair algorithm (Johnson & Mizoguchi)

        // The initial left and right boundaries, two vectors of size p
        L := [0, 0, ..., 0]
        R := [q - 1, q - 1, ..., q - 1]

        // number of entries to the left of the left boundary
        Ltotal := 0

        // number of entries to the left of, or on, the right boundary
        Rtotal := p*q

        // Since we are indexing from zero, the medcouple index is one
        // less than its rank.
        medcouple_index := floor(Rtotal / 2)

        // Iterate while the number of entries between the boundaries is
        // greater than the number of rows in the matrix.
        while Rtotal - Ltotal > p:

            // Compute row medians and their associated weights, but skip
            // any rows that are already empty.
            middle_idx  := [i | i in [0, 1, ..., p - 1] such that L[i] <= R[i]]
            row_medians := [h(i, floor((L[i] + R[i])/2)) | i in middle_idx]
            weights     := [R[i] - L[i] + 1 | i in middle_idx]

            WM := weighted_median(row_medians, weights)

            // New tentative right and left boundaries
            P := greater_h(h, p, q, WM)
            Q := less_h(h, p, q, WM)

            Ptotal := sum(P) + size(P)
            Qtotal := sum(Q)

            // Determine which entries to discard, or if we've found the medcouple
            if medcouple_index <= Ptotal - 1:
                R := P
                Rtotal := Ptotal
            else:
                if medcouple_index > Qtotal - 1:
                    L := Q
                    Ltotal := Qtotal
                else:
                    // Found the medcouple: the rank of the weighted median equals the medcouple index
                    return WM
                endif
            endif

        endwhile

        // Did not find the medcouple, but there are very few tentative entries remaining
        remaining := [h(i, j) | i in [0, 1, ..., p - 1],
                                j in [L[i], L[i] + 1, ..., R[i]]
                                such that L[i] <= R[i]]

        // Select the medcouple by rank amongst the remaining entries
        medcouple := select_nth(remaining, medcouple_index - Ltotal)

        return medcouple
    endfunction

In real-world use, the algorithm also needs to account for errors arising from finite-precision floating-point arithmetic. For example, the comparisons for the medcouple kernel function should be done within machine epsilon, as should the order comparisons in the greater_h and less_h functions.
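The whole procedure can be sketched in Python as below. This is an illustrative translation of the pseudocode, not the robustbase implementation. It compares kernel values exactly rather than within machine epsilon, uses a sort-based weighted median, and sorts the remaining candidates in place of `select_nth`. Like the pseudocode, it returns the entry at zero-based index $\lfloor pq/2 \rfloor$ in decreasing order, so when $pq$ is even it can differ from the average of the two middle values that `statistics.median` would give. The helper names are illustrative.

```python
import statistics


def _weighted_median(values, weights):
    # Sort-based lower weighted median (a linear-time selection could replace it).
    total, cumulative = sum(weights), 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if 2 * cumulative >= total:
            return value


def _greater_h(h, p, q, u):
    # P[i]: last column of row i whose entry is > u (-1 if none).
    P, j = [0] * p, 0
    for i in range(p - 1, -1, -1):
        while j < q and h(i, j) > u:
            j += 1
        P[i] = j - 1
    return P


def _less_h(h, p, q, u):
    # Q[i]: first column of row i whose entry is < u (q if none).
    Q, j = [0] * p, q - 1
    for i in range(p):
        while j >= 0 and h(i, j) < u:
            j -= 1
        Q[i] = j + 1
    return Q


def fast_medcouple(xs):
    """Entry of zero-based index floor(p*q/2) of H, counting from the largest."""
    x = sorted(xs, reverse=True)
    xm = statistics.median(x)
    xscale = 2 * max(abs(v) for v in x) or 1.0   # assumption: guard an all-zero sample
    z_plus = [(v - xm) / xscale for v in x if v >= xm]
    z_minus = [(v - xm) / xscale for v in x if v <= xm]
    p, q = len(z_plus), len(z_minus)

    def h(i, j):
        a, b = z_plus[i], z_minus[j]
        if a == b:
            s = p - 1 - i - j
            return float((s > 0) - (s < 0))
        return (a + b) / (a - b)

    L, R = [0] * p, [q - 1] * p          # per-row left and right boundaries
    Ltotal, Rtotal = 0, p * q            # entries left of L; entries up to R
    medcouple_index = Rtotal // 2

    while Rtotal - Ltotal > p:
        rows = [i for i in range(p) if L[i] <= R[i]]
        medians = [h(i, (L[i] + R[i]) // 2) for i in rows]
        weights = [R[i] - L[i] + 1 for i in rows]
        wm = _weighted_median(medians, weights)

        P, Q = _greater_h(h, p, q, wm), _less_h(h, p, q, wm)
        Ptotal = sum(P) + p              # number of entries > wm
        Qtotal = sum(Q)                  # number of entries >= wm

        if medcouple_index <= Ptotal - 1:
            R, Rtotal = P, Ptotal        # the target is larger than wm
        elif medcouple_index > Qtotal - 1:
            L, Ltotal = Q, Qtotal        # the target is smaller than wm
        else:
            return wm                    # wm has exactly the target rank

    # Few candidates left: select by rank (a sort stands in for select_nth).
    remaining = sorted((h(i, j) for i in range(p) for j in range(L[i], R[i] + 1)),
                       reverse=True)
    return remaining[medcouple_index - Ltotal]


print(fast_medcouple([1, 2, 3, 4, 5]))   # 0.0
```

Each loop iteration only evaluates $O(n)$ kernel entries, so the matrix $H$ is never materialised; only the final candidate set of at most $p$ entries is listed explicitly.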


Software/source code

* The fast medcouple algorithm is implemented in R's robustbase package.
* The fast medcouple algorithm is implemented in a C extension for Python in the Robustats Python package.
* A GPL'ed C++ implementation of the fast algorithm, derived from the R implementation.
* A Stata implementation of the fast algorithm.
* An implementation of the naïve algorithm in Matlab (and hence GNU Octave).
* The naïve algorithm is also implemented for the Python package statsmodels.


See also

* Robust statistic
* Skewness
* Adjusted boxplots


References

* Brys, G.; Hubert, M.; Struyf, A. (November 2004). "A robust measure of skewness". Journal of Computational and Graphical Statistics. 13 (4): 996–1017. doi:10.1198/106186004X12632. MR 2425170.
* Pearson, Ron (February 6, 2011). "Boxplots and Beyond – Part II: Asymmetry". ExploringDataBlog. http://exploringdatablog.blogspot.ca/2011/02/boxplots-and-beyond-part-ii-asymmetry.html. Retrieved April 6, 2015.
* Hubert, M.; Vandervieren, E. (2008). "An adjusted boxplot for skewed distributions". Computational Statistics and Data Analysis. 52 (12): 5186–5201. doi:10.1016/j.csda.2007.11.008. MR 2526585.
* Johnson, Donald B.; Mizoguchi, Tetsuo (May 1978). "Selecting the Kth element in X + Y and X1 + X2 + ... + Xm". SIAM Journal on Computing. 7 (2): 147–153. doi:10.1137/0207013. MR 0502214.