Median Trick
   HOME

TheInfoList



OR:

The median trick is a generic approach that increases the chances of a
probabilistic algorithm A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performan ...
to succeed. Apparently first used in 1986 by Jerrum et al. for
approximate counting algorithm The approximate counting algorithm allows the counting of a large number of events using a small amount of memory. Invented in 1977 by Robert Morris of Bell Labs, it uses probabilistic techniques to increment the counter. It was fully analyzed ...
s, the technique was later applied to a broad selection of
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...
and regression problems. The idea of median trick is very simple: run the randomized algorithm with numeric output multiple times, and use the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic fe ...
of the obtained results as a final answer. For example, for sublinear in time algorithms the same algorithm can be run repeatedly (or in parallel) over random subsets of input data, and, per Chernoff inequality, the median of the results will converge to solution very fast. For the algorithms that are sublinear in space (e.g., counting the distinct elements of a stream), different randomizations of the algorithm (say, with different
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...
s) should be used for repeated runs over the same data.


References


Sources

* * * {{algorithm-stub Statistical randomness