HOME

TheInfoList



OR:

In the field of
streaming algorithm In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically one-pass algorithm, just one. These algorithms are desi ...
s, Misra–Gries summaries are used to solve the frequent elements problem in the data stream model. That is, given a long stream of input that can only be examined once (and in some arbitrary order), the Misra-Gries algorithm can be used to compute which (if any) value makes up a majority of the stream, or more generally, the set of items that constitute some fixed fraction of the stream. The term "summary" is due to Graham Cormode.Graham Cormode, Misra-Gries Summaries
/ref> The algorithm was presented by Misra and Gries alongside a different algorithm for finding frequent elements, the
Misra–Gries heavy hitters algorithm Misra and Gries defined the ''heavy-hitters problem'' (though they did not introduce the term ''heavy-hitters'') and described the first algorithm for it in the paper ''Finding repeated elements''. Their algorithm extends the Boyer-Moore majorit ...
.


The Misra–Gries summary

As for all algorithms in the data stream model, the input is a finite
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
of
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
s from a finite domain. The algorithm outputs an
associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
which has values from the stream as keys, and estimates of their frequency as the corresponding values. It takes a parameter which determines the size of the array, which impacts both the quality of the estimates and the amount of memory used. algorithm misra-gries: input: A positive integer A finite sequence taking values in the range 1,2,..., output: An associative array with frequency estimates for each item in := new (empty) associative array while is not empty: take a value from if is in keys(): [] := [i] + 1 else if , keys(), < - 1: [] := 1 else: for each in keys(): [] := [] - 1 if [] = 0: remove from keys() return


Properties

The Misra–Gries algorithm uses O((log()+log()))
space Space is a three-dimensional continuum containing positions and directions. In classical physics, physical space is often conceived in three linear dimensions. Modern physicists usually consider it, with time, to be part of a boundless ...
, where is the number of distinct values in the stream and is the length of the stream. The factor accounts for the number of entries that are kept in the associative array . Each entry consists of a value and an associated counter . The counter can, in principle, take any value in , which requires ⌈log(+1)⌉ bits to store. Assuming that the values are integers in , storing them requires ⌈log()⌉ bits. Every item which occurs more than / times is guaranteed to appear in the output array. Therefore, in a second pass over the data, the exact frequencies for the −1 items can be computed to solve the frequent items problem, or in the case of =2, the majority problem. With the same arguments as above, this second pass also takes O((log()+log())) space. The summaries (arrays) output by the algorithm are ''mergeable'', in the sense that combining summaries of two streams and by adding their arrays keywise and then decrementing each counter in the resulting array until only keys remain results in a summary of the same (or better) quality as compared to running the Misra-Gries algorithm over the
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
of with .


Example

Let k=2 and the data stream be 1,4,5,4,4,5,4,4 (n=8,m=5). Note that 4 is appearing 5 times in the data stream which is more than n/k=4 times and thus should appear as the output of the algorithm. Since k=2 and , keys(A), =k−1=1 the algorithm can only have one key with its corresponding value. The algorithm will then execute as follows(- signifies that no key is present): Output: 4


References

{{DEFAULTSORT:Misra-Gries summary Streaming algorithms