In the

theory of computation In theoretical computer science and mathematics, the theory of computation is the branch that deals with what problems can be solved on a model of computation, using an algorithm, how efficiently they can be solved or to what degree (e.g., a ...

and

automata theory Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word ''automata'' comes from the Greek word αὐτόματο� ...

, the powerset construction or subset construction is a standard method for converting a

nondeterministic finite automaton In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if * each of its transitions is ''uniquely'' determined by its source state and input symbol, and * reading an input symbol is required for each state tr ...

(NFA) into a

deterministic finite automaton In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...

(DFA) which recognizes the same

formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...

. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has ''n'' states, the resulting DFA may have up to 2^''n'' states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs. The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to distinguish it from similar constructions for other types of automata, was first published by

Michael O. Rabin Michael Oser Rabin ( he, מִיכָאֵל עוזר רַבִּין; born September 1, 1931) is an Israeli mathematician and computer scientist and a recipient of the Turing Award. Biography Early life and education Rabin was born in 1931 in ...

and

Dana Scott Dana Stewart Scott (born October 11, 1932) is an American logician who is the emeritus Hillman University Professor of Computer Science, Philosophy, and Mathematical Logic at Carnegie Mellon University; he is now retired and lives in Berkeley, Ca ...

in 1959.

Intuition

To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any time: the state that the automaton will reach after seeing a

prefix A prefix is an affix which is placed before the Word stem, stem of a word. Adding it to the beginning of one word changes it into another word. For example, when the prefix ''un-'' is added to the word ''happy'', it creates the word ''unhappy'' ...

of the input. In contrast, to simulate an NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If, after a certain prefix of the input, a set of states can be reached, then after the next input symbol the set of reachable states is a deterministic function of and . Therefore, the sets of reachable NFA states play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the sets of NFA states appearing in this simulation may be re-interpreted as being states of a DFA.

Construction

The powerset construction applies most directly to an NFA that does not allow state transformations without consuming input symbols (aka: "ε-moves"). Such an automaton may be defined as a 5-tuple , in which is the set of states, is the set of input symbols, is the transition function (mapping a state and an input symbol to a set of states), is the initial state, and is the set of accepting states. The corresponding DFA has states corresponding to subsets of . The initial state of the DFA is , the (one-element) set of initial states. The transition function of the DFA maps a state (representing a subset of ) and an input symbol to the set , the set of all states that can be reached by an -transition from a state in . A state of the DFA is an accepting state if and only if at least one member of is an accepting state of the NFA. In the simplest version of the powerset construction, the set of all states of the DFA is the

powerset In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is postu ...

of , the set of all possible subsets of . However, many states of the resulting DFA may be useless as they may be unreachable from the initial state. An alternative version of the construction creates only the states that are actually reachable.

NFA with ε-moves

For an NFA with ε-moves (also called an ε-NFA), the construction must be modified to deal with these by computing the ''ε- closure'' of states: the set of all states reachable from some given state using only ε-moves. Van Noord recognizes three possible ways of incorporating this closure computation in the powerset construction: # Compute the ε-closure of the entire automaton as a preprocessing step, producing an equivalent NFA without ε-moves, then apply the regular powerset construction. This version, also discussed by Hopcroft and Ullman, is straightforward to implement, but impractical for automata with large numbers of ε-moves, as commonly arise in

natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...

application. # During the powerset computation, compute the ε-closure

\

of each state that is considered by the algorithm (and cache the result). # During the powerset computation, compute the ε-closure

\

of each subset of states that is considered by the algorithm, and add its elements to .

Multiple initial states

If NFAs are defined to allow for multiple initial states, the initial state of the corresponding DFA is the set of all initial states of the NFA, or (if the NFA also has ε-moves) the set of all states reachable from initial states by ε-moves.

Example

The NFA below has four states; state 1 is initial, and states 3 and 4 are accepting. Its alphabet consists of the two symbols 0 and 1, and it has ε-moves. NFA-powerset-construction-example

The initial state of the DFA constructed from this NFA is the set of all NFA states that are reachable from state 1 by ε-moves; that is, it is the set . A transition from by input symbol 0 must follow either the arrow from state 1 to state 2, or the arrow from state 3 to state 4. Additionally, neither state 2 nor state 4 have outgoing ε-moves. Therefore, (,0) = , and by the same reasoning the full DFA constructed from the NFA is as shown below. DFA-powerset-construction-example

As can be seen in this example, there are five states reachable from the start state of the DFA; the remaining 11 sets in the powerset of the set of NFA states are not reachable.

Complexity

Because the DFA states consist of sets of NFA states, an -state NFA may be converted to a DFA with at most states. For every , there exist -state NFAs such that every subset of states is reachable from the initial subset, so that the converted DFA has exactly states, giving Θ()

worst-case In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on average'', respectively. Usually the resource being considered is running time, i.e. time complexity, b ...

time complexity.. A simple example requiring nearly this many states is the language of strings over the alphabet in which there are at least characters, the th from last of which is 1. It can be represented by an -state NFA, but it requires DFA states, one for each -character suffix of the input; cf. picture for .

Applications

Brzozowski's algorithm for DFA minimization uses the powerset construction, twice. It converts the input DFA into an NFA for the reverse language, by reversing all its arrows and exchanging the roles of initial and accepting states, converts the NFA back into a DFA using the powerset construction, and then repeats its process. Its worst-case complexity is exponential, unlike some other known DFA minimization algorithms, but in many examples it performs more quickly than its worst-case complexity would suggest. Safra's construction, which converts a non-deterministic

Büchi automaton In computer science and automata theory, a deterministic Büchi automaton is a theoretical machine which either accepts or rejects infinite inputs. Such a machine has a set of states and a transition function, which determines which state the machi ...

with states into a deterministic

Muller automaton In automata theory, a Muller automaton is a type of an ω-automaton. The acceptance condition separates a Muller automaton from other ω-automata. The Muller automaton is defined using a Muller acceptance condition, i.e. the set of all states visi ...

or into a deterministic

Rabin automaton Rabin is a Hebrew surname. It originates from the Hebrew word ''rav'' meaning Rabbi, or from the name of the specific Rabbi Abin. The most well known bearer of the name was Yitzhak Rabin, prime minister of Israel and Nobel Peace prize Laureate. ...

with 2^{O( log )} states, uses the powerset construction as part of its machinery..