Boyer–Moore Majority Vote Algorithm
   HOME

TheInfoList



OR:

The Boyer–Moore majority vote algorithm is an
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
for finding the
majority A majority is more than half of a total; however, the term is commonly used with other meanings, as explained in the "#Related terms, Related terms" section below. It is a subset of a Set (mathematics), set consisting of more than half of the se ...
of a sequence of elements using
linear time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
and a constant number of words of memory. It is named after
Robert S. Boyer Robert Stephen Boyer is an American retired professor of computer science, mathematics, and philosophy at University of Texas at Austin, The University of Texas at Austin. He and J Strother Moore invented the Boyer–Moore string-search algorit ...
and
J Strother Moore J Strother Moore (his first name is the alphabetic character "J" – not an abbreviated "J.") is an American computer scientist. He is a co-developer of the Boyer–Moore string-search algorithm, Boyer–Moore majority vote algorithm, and the Boy ...
, who published it in 1981,. Originally published as a technical report in 1981. and is a prototypical example of a
streaming algorithm In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically one-pass algorithm, just one. These algorithms are desi ...
. In its simplest form, the algorithm finds a majority element, if there is one: that is, an element that occurs repeatedly for more than half of the elements of the input. A version of the algorithm that makes a second pass through the data can be used to verify that the element found in the first pass really is a majority. If a second pass is not performed and there is no majority, the algorithm will not detect that no majority exists. In the case that no strict majority exists, the returned element can be arbitrary; it is not guaranteed to be the element that occurs most often (the
mode Mode ( meaning "manner, tune, measure, due measure, rhythm, melody") may refer to: Arts and entertainment * MO''D''E (magazine), a defunct U.S. women's fashion magazine * ''Mode'' magazine, a fictional fashion magazine which is the setting fo ...
of the sequence). It is not possible for a streaming algorithm to find the most frequent element in less than linear space, for sequences whose number of repetitions can be small.


Description

The algorithm maintains in its
local variable In computer science, a local variable is a variable that is given ''local scope''. A local variable reference in the function or block in which it is declared overrides the same variable name in the larger scope. In programming languages with ...
s a sequence element and a counter, with the counter initially zero. It then processes the elements of the sequence, one at a time. When processing an element , if the counter is zero, the algorithm stores as its remembered sequence element and sets the counter to one. Otherwise, it compares to the stored element and either increments the counter (if they are equal) or decrements the counter (otherwise). At the end of this process, if the sequence has a majority, it will be the element stored by the algorithm. This can be expressed in
pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...
as the following steps: *Initialize an element and a counter with *For each element of the input sequence: **If , then assign and **else if , then assign **else assign *Return Even when the input sequence has no majority, the algorithm will report one of the sequence elements as its result. However, it is possible to perform a second pass over the same input sequence in order to count the number of times the reported element occurs and determine whether it is actually a majority. This second pass is needed, as it is not possible for a sublinear-space algorithm to determine whether there exists a majority element in a single pass through the input.


Analysis

The amount of memory that the algorithm needs is the space for one element and one counter. In the
random access Random access (also called direct access) is the ability to access an arbitrary element of a sequence in equal time or any datum from a population of addressable elements roughly as easily and efficiently as any other, no matter how many elemen ...
model of computing usually used for the
analysis of algorithms In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that r ...
, each of these values can be stored in a
machine word In computing, a word is any Central processing unit, processor design's natural unit of data. A word is a fixed-sized Data (computing), datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits ...
and the total space needed is . If an array index is needed to keep track of the algorithm's position in the input sequence, it doesn't change the overall constant space bound. The algorithm's
bit complexity The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
(the space it would need, for instance, on a
Turing machine A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...
) is higher, the sum of the
binary logarithm In mathematics, the binary logarithm () is the exponentiation, power to which the number must be exponentiation, raised to obtain the value . That is, for any real number , :x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n. For example, th ...
s of the input length and the size of the universe from which the elements are drawn.. Both the random access model and bit complexity analyses only count the working storage of the algorithm, and not the storage for the input sequence itself. Similarly, on a random access machine, the algorithm takes time (linear time) on an input sequence of items, because it performs only a constant number of operations per input item. The algorithm can also be implemented on a Turing machine in time linear in the input length ( times the number of bits per input item).


Correctness

After processing ''n'' input elements, the input sequence can be partitioned into pairs of unequal elements, and copies of left over. This is a proof by induction; it is trivially true when , and is maintained every time an element is added: * If , add it to the set of copies of (and increment ). * If and , then remove one of the copies of from the left-over set and pair it with the final value (and decrement ). * If , then set and add to the (previously empty) set of copies of m (and set to 1). In all cases, the
loop invariant In computer science, a loop invariant is a property of a program loop that is true before (and after) each iteration. It is a logical assertion, sometimes checked with a code assertion. Knowing its invariant(s) is essential in understanding th ...
is maintained. After the entire sequence has been processed, it follows that no element can have a majority, because can equal at most one element of each unequal pair and none of the remaining copies of . Thus, if there is a majority element, it can only be .


See also

* Element distinctness problem, the problem of testing whether a collection of elements has any repeated elements *
Majority function In Boolean logic, the majority function (also called the median operator) is the Boolean function that evaluates to false when half or more arguments are false and true otherwise, i.e. the value of the function equals the value of the majority of t ...
, the majority of a collection of
Boolean value In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values ''true'' and ''false'', usually denoted by 1 and 0, whereas ...
s * Majority problem (cellular automaton), the problem of finding a majority element in the
cellular automaton A cellular automaton (pl. cellular automata, abbrev. CA) is a discrete model of computation studied in automata theory. Cellular automata are also called cellular spaces, tessellation automata, homogeneous structures, cellular structures, tessel ...
computational model *
Misra–Gries heavy hitters algorithm Misra and Gries defined the ''heavy-hitters problem'' (though they did not introduce the term ''heavy-hitters'') and described the first algorithm for it in the paper ''Finding repeated elements''. Their algorithm extends the Boyer-Moore majorit ...
and
Misra–Gries summary In the field of streaming algorithms, Misra–Gries summaries are used to solve the Streaming algorithm#Frequent elements, frequent elements problem in the Streaming algorithm#Data stream model, data stream model. That is, given a long stream of ...
, a natural generalization of the Boyer–Moore majority vote algorithm that stores more than one item and more than one count


References

{{DEFAULTSORT:Boyer-Moore majority vote algorithm Streaming algorithms