The Post correspondence problem is an
undecidable decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whethe ...
that was introduced by
Emil Post
Emil Leon Post (; February 11, 1897 – April 21, 1954) was an American mathematician and logician. He is best known for his work in the field that eventually became known as computability theory.
Life
Post was born in Augustów, Suwałki Govern ...
in 1946.
Because it is simpler than the
halting problem
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a g ...
and the ''
Entscheidungsproblem
In mathematics and computer science, the ' (, ) is a challenge posed by David Hilbert and Wilhelm Ackermann in 1928. The problem asks for an algorithm that considers, as input, a statement and answers "Yes" or "No" according to whether the statem ...
'' it is often used in proofs of undecidability.
Definition of the problem
Let
be an alphabet with at least two symbols. The input of the problem consists of two finite lists
and
of words over
. A solution to this problem is a
sequence
In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
of indices
with
and
for all
, such that
:
The decision problem then is to decide whether such a solution exists or not.
Alternative definition
:
:
This gives rise to an equivalent alternative definition often found in the literature, according to which any two homomorphisms
with a common domain and a common codomain form an instance of the Post correspondence problem, which now asks whether there exists a nonempty word
in the domain such that
:
.
Another definition describes this problem easily as a type of puzzle. We begin with a collection of dominos, each containing two strings, one on each side. An individual domino looks like
:
and a collection of dominos looks like
:
.
The task is to make a list of these dominos (repetition permitted) so that the string we get by reading off the symbols on the top is the same as the string of symbols on the bottom. This list is called a match. The Post Correspondence Problem is to determine whether a collection of dominos has a match.
For example, the following list is a match for this puzzle.
:
.
For some collections of dominos, finding a match may not be possible. For example, the collection
:
.
cannot contain a match because every top string is longer than the correspondending bottom string.
Example instances of the problem
Example 1
Consider the following two lists:
A solution to this problem would be the sequence (3, 2, 3, 1), because
:
Furthermore, since (3, 2, 3, 1) is a solution, so are all of its "repetitions", such as (3, 2, 3, 1, 3, 2, 3, 1), etc.; that is, when a solution exists, there are infinitely many solutions of this repetitive kind.
However, if the two lists had consisted of only
and
from those sets, then there would have been no solution (the last letter of any such α string is not the same as the letter before it, whereas β only constructs pairs of the same letter).
A convenient way to view an instance of a Post correspondence problem is as a collection of blocks of the form
there being an unlimited supply of each type of block. Thus the above example is viewed as
''i = 1''
''i = 2''
''i = 3''
where the solver has an endless supply of each of these three block types. A solution corresponds to some way of laying blocks next to each other so that the string in the top cells corresponds to the string in the bottom cells. Then the solution to the above example corresponds to:
''i
1 = 3''
''i
2 = 2''
''i
3 = 3''
''i
4 = 1''
Example 2
Again using blocks to represent an instance of the problem, the following is an example that has infinitely many solutions in addition to the kind obtained by merely "repeating" a solution.
''1''
''2''
''3''
In this instance, every sequence of the form (1, 2, 2, . . ., 2, 3) is a solution (in addition to all their repetitions):
''1''
''2''
''2''
''2''
''3''
Proof sketch of undecidability
The most common proof for the undecidability of PCP describes an instance of PCP that can simulate the computation of an arbitrary
Turing machine
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...
on a particular input. A match will occur if and only if the input would be accepted by the Turing machine. Because
deciding if a Turing machine will accept an input is a basic undecidable problem, PCP cannot be decidable either. The following discussion is based on
Michael Sipser
Michael Fredric Sipser (born September 17, 1954) is an American theoretical computer scientist who has made early contributions to computational complexity theory. He is a professor of applied mathematics and was the Dean of Science at the Massac ...
's textbook ''Introduction to the Theory of Computation''.
In more detail, the idea is that the string along the top and bottom will be a
computation history
In computer science, a computation history is a sequence of steps taken by an abstract machine in the process of computing its result. Computation histories are frequently used in proofs about the capabilities of certain machines, and particularly ...
of the Turing machine's computation. This means it will list a string describing the initial state, followed by a string describing the next state, and so on until it ends with a string describing an accepting state. The state strings are separated by some separator symbol (usually written #). According to the definition of a Turing machine, the full state of the machine consists of three parts:
* The current contents of the tape.
* The current state of the
finite state machine
A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
which operates the tape head.
* The current position of the tape head on the tape.
Although the tape has infinitely many cells, only some finite prefix of these will be non-blank. We write these down as part of our state. To describe the state of the finite control, we create new symbols, labelled ''q''
1 through ''q''
''k'', for each of the finite state machine's ''k'' states. We insert the correct symbol into the string describing the tape's contents at the position of the tape head, thereby indicating both the tape head's position and the current state of the finite control. For the alphabet , a typical state might look something like:
101101110''q''
700110.
A simple computation history would then look something like this:
''q''
0101#1''q''
401#11''q''
21#1''q''
810.
We start out with this block, where ''x'' is the input string and ''q''
0 is the start state:
The top starts out "lagging" the bottom by one state, and keeps this lag until the very end stage. Next, for each symbol ''a'' in the tape alphabet, as well as #, we have a "copy" block, which copies it unmodified from one state to the next:
We also have a block for each position transition the machine can make, showing how the tape head moves, how the finite state changes, and what happens to the surrounding symbols. For example, here the tape head is over a 0 in state 4, and then writes a 1 and moves right, changing to state 7:
Finally, when the top reaches an accepting state, the bottom needs a chance to finally catch up to complete the match. To allow this, we extend the computation so that once an accepting state is reached, each subsequent machine step will cause a symbol near the tape head to vanish, one at a time, until none remain. If ''q''
''f'' is an accepting state, we can represent this with the following transition blocks, where ''a'' is a tape alphabet symbol:
There are a number of details to work out, such as dealing with boundaries between states, making sure that our initial tile goes first in the match, and so on, but this shows the general idea of how a static tile puzzle can simulate a Turing machine computation.
The previous example
''q''
0101#1''q''
401#11''q''
21#1''q''
810.
is represented as the following solution to the Post correspondence problem:
...
Variants
Many variants of PCP have been considered. One reason is that, when one tries to prove undecidability of some new problem by reducing from PCP, it often happens that the first reduction one finds is not from PCP itself but from an apparently weaker version.
* The problem may be phrased in terms of
monoid morphism
In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0.
Monoids ar ...
s ''f'', ''g'' from the free monoid ''B''
∗ to the free monoid ''A''
∗ where ''B'' is of size ''n''. The problem is to determine whether there is a word ''w'' in ''B''
+ such that ''f''(''w'') = ''g''(''w'').
* The condition that the alphabet
have at least two symbols is required since the problem is decidable if
has only one symbol.
* A simple variant is to fix ''n'', the number of tiles. This problem is decidable if ''n'' ≤ 2,
but remains undecidable for ''n'' ≥ 5. It is unknown whether the problem is decidable for 3 ≤ ''n'' ≤ 4.
* The ''circular'' Post correspondence problem asks whether indexes
can be found such that
and
are
conjugate words, i.e., they are equal modulo rotation. This variant is undecidable.
* One of the most important variants of PCP is the ''bounded'' Post correspondence problem, which asks if we can find a match using no more than ''k'' tiles, including repeated tiles. A brute force search solves the problem in time O(2
''k''), but this may be difficult to improve upon, since the problem is
NP-complete
In computational complexity theory, a problem is NP-complete when:
# it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
.
Unlike some NP-complete problems like the
boolean satisfiability problem
In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfie ...
, a small variation of the bounded problem was also shown to be complete for RNP, which means that it remains hard even if the inputs are chosen at random (it is hard on average over uniformly distributed inputs).
* Another variant of PCP is called the ''marked'' Post Correspondence Problem, in which each
must begin with a different symbol, and each
must also begin with a different symbol. Halava, Hirvensalo, and de Wolf showed that this variation is decidable in
exponential time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
. Moreover, they showed that if this requirement is slightly loosened so that only one of the first two characters need to differ (the so-called 2-marked Post Correspondence Problem), the problem becomes undecidable again.
* The Post Embedding Problem is another variant where one looks for indexes
such that
is a
(scattered) subword of
. This variant is easily decidable since, when some solutions exist, in particular a length-one solution exists. More interesting is the ''Regular'' Post Embedding Problem, a further variant where one looks for solutions that belong to a given regular language (submitted, e.g., under the form of a regular expression on the set
). The Regular Post Embedding Problem is still decidable but, because of the added regular constraint, it has a very high complexity that dominates every multiply recursive function.
* The Identity Correspondence Problem (ICP) asks whether a finite set of pairs of words (over a group alphabet) can generate an identity pair by a sequence of concatenations. The problem is undecidable and equivalent to the following Group Problem: is the semigroup generated by a finite set of pairs of words (over a group alphabet) a group.
References
External links
* Eitan M. Gurari. ''An Introduction to the Theory of Computation'', Chapter 4
Post's Correspondence Problem A proof of the undecidability of PCP based on
Chomsky type-0 grammars.
* Dong, Jing
"The Analysis and Solution of a PCP Instance."2012 National Conference on Information Technology and Computer Science. The paper describes a heuristic rule for solving some specific PCP instances.
Online PHP Based PCP SolverPCP - a nice problemPCP solver in JavaPost Correspondence Problem
{{DEFAULTSORT:Post Correspondence Problem
Theory of computation
Computability theory
Undecidable problems