In the theory of

formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sy ...

s, the Myhill–Nerode theorem provides a

necessary and sufficient condition In logic and mathematics, necessity and sufficiency are terms used to describe a conditional or implicational relationship between two statements. For example, in the conditional statement: "If then ", is necessary for , because the truth of ...

for a language to be regular. The theorem is named for

John Myhill John R. Myhill Sr. (11 August 1923 – 15 February 1987) was a British mathematician. Education Myhill received his Ph.D. from Harvard University under Willard Van Orman Quine in 1949. He was professor at SUNY Buffalo from 1966 until his death ...

and

Anil Nerode Anil Nerode (born 1932) is an American mathematician. He received his undergraduate education and a Ph.D. in mathematics from the University of Chicago, the latter under the directions of Saunders Mac Lane. He enrolled in the Hutchins College at t ...

, who proved it at the

University of Chicago The University of Chicago (UChicago, Chicago, U of C, or UChi) is a private research university in Chicago, Illinois. Its main campus is located in Chicago's Hyde Park neighborhood. The University of Chicago is consistently ranked among the b ...

in 1958 .

Statement

Given a language

L

, and a pair of strings

x

and

y

, define a distinguishing extension to be a string

z

such that exactly one of the two strings

xz

and

yz

belongs to

L

. Define a relation

_L

on strings as

x\; _L\ y

iff In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false. The connective is bicon ...

there is no distinguishing extension for

x

and

y

. It is easy to show that

_L

is an equivalence relation on strings, and thus it divides the set of all strings into

equivalence class In mathematics, when the elements of some set S have a notion of equivalence (formalized as an equivalence relation), then one may naturally split the set S into equivalence classes. These equivalence classes are constructed so that elements a ...

es. The Myhill–Nerode theorem states that a language

L

is regular if and only if

_L

has a finite number of equivalence classes, and moreover, that this number is equal to the number of states in the minimal

deterministic finite automaton In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state autom ...

(DFA) recognizing

L

. In particular, this implies that there is a ''unique'' minimal DFA for each regular language . Some authors refer to the

_L

relation as Nerode congruence, in honor of

Proof

L

is a regular language, then by definition there is a DFA

A

that recognizes it, with only finitely many states. If there are

n

states, then partition the set of all finite strings into

n

subsets, where subset

S_i

is the set of strings that, when given as input to automaton

A

, cause it to end in state

i

. For every two strings

x

and

y

that belong to the same subset, and for every choice of a third string

z

, the automaton

A

reaches the same state on input

xz

as it reaches on input

yz

, and therefore must either accept both of the inputs

xz

and

yz

or reject both of them. Therefore, no string

z

can be a distinguishing extension for

x

and

y

, so they must be related by

_L

. Thus,

S_i

is a subset of an equivalence class of

_L

. Combining this fact with the fact that every member of one of these equivalence classes belongs to one of the sets

S_i

, this gives a surjective function from states of

A

to equivalence classes, implying that the number of equivalence classes is finite and at most

n

. In the other direction, suppose that

_L

has finitely many equivalence classes. In this case, it is possible to design a deterministic finite automaton that has one state for each equivalence class. The start state of the automaton corresponds to the equivalence class containing the empty string, and the transition function from a state

X

on input symbol

a

takes the automaton to a new state, the state corresponding to the equivalence class containing string

xa

, where

x

is an arbitrarily chosen string in the equivalence class corresponding to

X

. The definition of the Myhill–Nerode relation implies that the transition function is well-defined: no matter which representative string

x

is chosen for state

X

, the same transition function value will result. A state of this automaton is accepting if the corresponding equivalence class contains a string in

L

; in this case, again, the definition of the relation implies that all strings in the same equivalence class must also belong to

L

, for otherwise the empty string would be a distinguishing string for some pairs of strings in the class. Thus, the existence of a finite automaton recognizing

L

implies that the Myhill–Nerode relation has a finite number of equivalence classes, at most equal to the number of states of the automaton, and the existence of a finite number of equivalence classes implies the existence of an automaton with that many states.

Use and consequences

The Myhill–Nerode theorem may be used to show that a language

L

is regular by proving that the number of equivalence classes of

_L

is finite. This may be done by an exhaustive case analysis in which, beginning from the

empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...

, distinguishing extensions are used to find additional equivalence classes until no more can be found. For example, the language consisting of binary representations of numbers that can be divided by 3 is regular. Given the empty string,

00

(or

11

01

, and

10

are distinguishing extensions resulting in the three classes (corresponding to numbers that give remainders 0, 1 and 2 when divided by 3), but after this step there is no distinguishing extension anymore. The minimal automaton accepting our language would have three states corresponding to these three equivalence classes. Another immediate corollary of the theorem is that if for a language

L

the relation

_L

has infinitely many equivalence classes, it is regular. It is this corollary that is frequently used to prove that a language is not regular.

Generalizations

The Myhill–Nerode theorem can be generalized to

tree automata A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines. The following article deals with branching tree automata, which correspond to regular languages ...

References

*. *. *.

Statement

Proof

Use and consequences

Generalizations

See also

References

Further reading