In
coding theory
Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are stud ...
, the Sardinas–Patterson algorithm is a classical algorithm for determining in
polynomial time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
whether a given
variable-length code
In coding theory a variable-length code is a code which maps source symbols to a ''variable'' number of bits.
Variable-length codes can allow sources to be compressed and decompressed with ''zero'' error (lossless data compression) and still b ...
is uniquely decodable, named after August Albert Sardinas and George W. Patterson, who published it in 1953. The algorithm carries out a systematic search for a string which admits two different decompositions into codewords. As
Knuth reports, the algorithm was rediscovered about ten years later in 1963 by
Floyd, despite the fact that it was at the time already well known in coding theory.
Idea of the algorithm
Consider the code
. This code, which is based on an example by Berstel, is an example of a code which is not uniquely decodable, since the string
:011101110011
can be interpreted as the sequence of codewords
:01110 – 1110 – 011,
but also as the sequence of codewords
:011 – 1 – 011 – 10011.
Two possible decodings of this encoded string are thus given by ''cdb'' and ''babe''.
In general, a codeword can be found by the following idea: In the first round, we choose two codewords
and
such that
is a
prefix
A prefix is an affix which is placed before the stem of a word. Adding it to the beginning of one word changes it into another word. For example, when the prefix ''un-'' is added to the word ''happy'', it creates the word ''unhappy''. Particu ...
of
, that is,
for some "dangling suffix"
. If one tries first
and
, the dangling
suffix is
. If we manage to find two sequences
and
of codewords such that
, then we are finished: For then the string
can alternatively be decomposed as
, and we have found the desired string having at least two different decompositions into codewords.
In the second round, we try out two different approaches: the first trial is to look for a codeword that has ''w'' as prefix. Then we obtain a new dangling suffix ''w, with which we can continue our search. If we eventually encounter a dangling suffix that is itself a codeword (or the
empty word), then the search will terminate, as we know there exists a string with two decompositions. The second trial is to seek for a codeword that is itself a prefix of ''w''. In our example, we have
, and the sequence ''1'' is a codeword. We can thus also continue with ''w'=0'' as the new dangling suffix.
Precise description of the algorithm
The algorithm is described most conveniently using
quotients of
formal language
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of s ...
s. In general, for two sets of strings ''D'' and ''N'', the (left) quotient
is defined as the residual words obtained from ''D'' by removing some prefix in ''N''. Formally,
. Now let
denote the (finite) set of codewords in the given code.
The algorithm proceeds in rounds, where we maintain in each round not only one dangling suffix as described above, but the (finite) set of all potential dangling suffixes. Starting with round
, the set of potential dangling suffixes will be denoted by
. The sets
are defined
inductively as follows:
. Here, the symbol
denotes the
empty word.
, for all
.
The algorithm computes the sets
in increasing order of
. As soon as one of the
contains a word from ''C'' or the empty word, then the algorithm terminates and answers that the given code is not uniquely decodable. Otherwise, once a set
equals a previously encountered set
with