Misra–Gries algorithm
A bag is a like a set in which the same value may occur multiple times. Assume that a bag is available as an array of elements. In the abstract description of the algorithm, we treat and its segments also as bags. Henceforth, a ''heavy hitter'' of bag is a value that occurs more than times in it, for some integer , . A ''-reduced bag'' for bag is derived from by repeating the following operation until no longer possible: Delete distinct elements from . From its definition, a -reduced bag contains fewer than different values. The following theorem is easy to prove: Theorem 1. Each heavy-hitter of is an element of a -reduced bag for . The first pass of the heavy-hitters computation constructs a -reduced bag . The second pass declares an element of to be a heavy-hitter if it occurs more than times in . According to Theorem 1, this procedure determines all and only the heavy-hitters. The second pass is easy to program, so we describe only the first pass. In order to construct , scan the values in in arbitrary order, for specificity the following algorithm scans them in the order of increasing indices. Invariant of the algorithm is that is a -reduced bag for the scanned values and is the number of distinct values in . Initially, no value has been scanned, is the empty bag, and is zero. Whenever element is scanned, in order to preserve the invariant: (1) if is not in , add it to and increase by 1, (2) if is in , add it to but don't modify , and (3) if becomes equal to , reduce by deleting distinct values from it and update appropriately. algorithm Misra–Gries is 1 = t, d := , 0; for i from 0 to n-1 do if b t then t, d:= t ∪ , d+1 else t, d:= t ∪ endif if d = k then Delete distinct values from update endif endfor A possible implementation of is as a set of pairs of the form , ) where each is a distinct value in and is the number of occurrences of in . Then is the size of this set. The step "Delete distinct values from " amounts to reducing each by 1 and then removing any pair (, ) from the set if becomes 0. Using an AVL tree implementation of , the algorithm has a running time of . In order to assess the space requirement, assume that the elements of can have possible values, so the storage of a value needs bits. Since each counter may have a value as high as , its storage needs bits. Therefore, for value-counter pairs, the space requirement is .Summaries
In the field ofReferences
{{reflist Streaming algorithms