Sudoku codes are non-linear forward error correcting codes following rules of

sudoku Sudoku (; ; originally called Number Place) is a logic puzzle, logic-based, combinatorics, combinatorial number-placement puzzle. In classic Sudoku, the objective is to fill a 9 × 9 grid with digits so that each column, each row, and ...

puzzles designed for an

erasure channel In coding theory and information theory, a binary erasure channel (BEC) is a communications channel model. A transmitter sends a bit (a zero or a one), and the receiver either receives the bit correctly, or with some probability P_e receives a me ...

. Based on this model, the transmitter sends a sequence of all symbols of a solved sudoku. The receiver either receives a symbol correctly or an erasure symbol to indicate that the symbol was not received. The decoder gets a matrix with missing entries and uses the constraints of sudoku puzzles to reconstruct a limited amount of erased symbols. Sudoku codes are not suitable for practical usage but are subject of research. Questions like the rate and error performance are still unknown for general dimensions. In a sudoku one can find missing information by using different techniques to reproduce the full puzzle. This method can be seen as decoding a sudoku coded message that is sent over an erasure channel where some symbols got erased. By using the sudoku rules the decoder can recover the missing information. Sudokus can be modeled as a

probabilistic graphical model A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a Graph (discrete mathematics), graph expresses the conditional dependence structure between random variables. Graphica ...

and thus methods from decoding

low-density parity-check code Low-density parity-check (LDPC) codes are a class of error correction codes which (together with the closely-related turbo codes) have gained prominence in coding theory and information theory since the late 1990s. The codes today are widely us ...

s like

belief propagation Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for ea ...

can be used.

Erasure channel model

In the erasure channel model a symbol gets either transmitted correctly with probability

1-p_e

or is erased with probability

p_e

(see Figure \ref). The channel introduces no errors, i.e. no channel input is changed to another symbol. The example in Figure \ref shows the transmission of a

3 \times 3

Sudoku code. 5 of the 9 symbols got erased by the channel. The decoder is still able to reconstruct the message, i.e. the whole puzzle. Sudoku3x3BSC

Note that the symbols sent over the channel are not binary. For a binary channel the symbols (e.g. integers

\

) have to be mapped onto base 2. The

binary erasure channel In coding theory and information theory, a binary erasure channel (BEC) is a communications channel model. A transmitter sends a bit (a zero or a one), and the receiver either receives the bit correctly, or with some probability P_e receives a me ...

model however is not applicable because it erases only individual bits with some probability and not Sudoku symbols. If the symbols of the Sudoku are sent in packets the channel can be described as a packet erasure channel model.

Puzzle description

A sudoku is a

N \times N

number-placement puzzle. It is filled in a way, that in each column, row and sub-grid N distinct symbols occur exactly once. Typical alphabet is the set of the integers

\

. The size of the sub-grids limit the size of SUDOKUs to

N=n^2

with

n \in \Nu

. Every solved sudoku and every sub-grid of it is a

Latin square Latin ( or ) is a classical language belonging to the Italic branch of the Indo-European languages. Latin was originally spoken by the Latins in Latium (now known as Lazio), the lower Tiber area around Rome, Italy. Through the expansion o ...

, meaning every symbol occurs exactly once in each row and column. At the starting point (in this case after the erasure channel) the puzzle is only partially complete but has only one unique solution. For channel codes also other varieties of sudokus are conceivable. Diagonal regions instead of square sub-grids can be used for performance investigations. The diagonal sudoku has the advantage, that its size can be chosen more freely. Due to the sub-grid structure normal sudokus can only be of size n², diagonal sudokus have valid solutions for all odd

N

. Sudoku codes are non-linear. In a

linear code In coding theory, a linear code is an error-correcting code for which any linear combination of Code word (communication), codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although t ...

any linear combination of codewords give a new valid codeword, this does not hold for sudoku codes. The symbols of a sudoku are from a finite alphabet (e.g. integers

\

). The constraints of Sudoku codes are non-linear: all symbols within a constraint (row, line, sub-grid) must be different from any other symbol within this constraint. Hence there is no all-zero codeword in Sudoku codes. Sudoku codes can be represented by probabilistic graphical model in which they take the form of a

Decoding with belief propagation

There are several possible decoding methods for sudoku codes. Some algorithms are very specific developments for Sudoku codes. Several methods are described in

sudoku solving algorithms A standard Sudoku contains 81 cells, in a 9×9 grid, and has 9 boxes, each box being the intersection of the first, middle, or last 3 rows, and the first, middle, or last 3 columns. Each cell may contain a number from one to nine, and each number ...

. Another efficient method is with dancing links. Decoding methods like

are also used for

s are of special interest. Performance analysis of these methods on sudoku codes can help to understand decoding problems for low-density parity-check codes better. By modeling sudoku codes as a

can be used for Sudoku codes. Belief propagation on the

tanner graph {{Short description, Bipartite graph in coding theory In coding theory, a Tanner graph is a bipartite graph that can be used to express constraints (typically equations) that specify an error correcting codes, error correcting code. Tanner graphs p ...

factor graph A factor graph is a bipartite graph representing the factorization of a function (mathematics), function. In probability theory and its applications, factor graphs are used to represent factorization of a Probability distribution function (disam ...

to decode Sudoku codes is discussed in by Sayir. and Moon This method is originally designed for low-density parity-check codes. Due to its generality belief propagation works not only with the classical

9 \times 9

Sudoku but with a variety of those.

LDPC Low-density parity-check (LDPC) codes are a class of error correction codes which (together with the closely-related turbo codes) have gained prominence in coding theory and information theory since the late 1990s. The codes today are widely ...

decoding is a common use-case for belief propagation, with slight modifications this approach can be used for solving Sudoku codes. The constraint satisfaction using a

is shown in the figure on the right.

S_n

denotes the entries of the sudoku in row-scan order.

C_m

denotes the constraint functions:

m=1,...,9

associated with rows,

m=10,...,18

associated with columns and

m=19,...,27

associated with the

3 \times 3

sub-grids of the Sudoku.

C_m

is defined as

C_m(s_1,s_2,.....,s_9) = \begin 1, & \texts_1,s_2,...,s_9\text \\ 0, & \text \end

Every cell

S_n

is connected to 3 constraints: the row, column and sub-grid constraints. A specification of the general approach for belief propagation is suggested by Sayir: The initial probability of a received symbol is either 1 to the observed symbol and 0 to all others or uniform distributed on the whole alphabet if the symbol is erased. For the belief propagation algorithm it is sufficient to transmit only a subset of possibilities instead of distributions, since the distribution is always uniform over the subset. The candidates for the erased symbols narrow down to a subset of the alphabet as symbols get excluded due to constraints. All values that are used by another cell in the constraint, and pairs that are shared among two other cells and so on are eliminated. Sudoku players use this method of logic excluding to solve most sudoku puzzles.

Encoding

The aim of error-correcting codes is to encode data in a way such that it is more resilient to errors in the transmission process. The encoder has to map data

U

to a valid sudoku grid from which the codeword

X

can we taken e.g. in row-scan order.

U = 00101 \ldots \stackrel 
	\begin
	\hline
	1 & 2 & 3 \\
	\hline
	2 & 3 & 1 \\
	\hline
	3 & 1 & 2 \\
	\hline
	\end
	\Rightarrow X = 1,2,3,1,2,1,3,1,2

shows the necessary steps. A standard

9 \times 9

sudoku has about 72.5 bits of information as calculated in the next section.

Information Information is an Abstraction, abstract concept that refers to something which has the power Communication, to inform. At the most fundamental level, it pertains to the Interpretation (philosophy), interpretation (perhaps Interpretation (log ...

after Shannon is the degree of randomness in a set of data. An ideal coin toss for example has an information of

I=\log_2 2 = 1

bit. To represent the outcome of 72 coin tosses 72 bits are necessary. One Sudoku contains therewith about the same information as 72 coin tosses or a sequence of 72 bits. A sequence of 81 random symbols

\

has

I=81 \log_2 9 \approx 256.8

bits of information. One Sudoku code can be seen as 72.5 bits of information and 184.3 bits redundancy. Theoretically a string of 72 bits can be mapped to one sudoku that is sent over the channel as a string of 81 symbols. However, there is no linear function that maps a string to a sudoku code. A suggested encoding approach by Sayir is as follows: * Start with an empty grid * Do the following for all entries sequentially * Use belief propagation to determine all valid symbols for the entry * If the cardinality of valid symbols is k>1 then convert the source randomness into a k-ary symbol and use it in the cell SudokuCode4x4Rate

For a

4 \times 4

sudoku the first entry can be filled with a source of cardinality 4. In this example this a 1. For the rest of this row, column and

2 \times 2

sub-grid this number is excluded from the possibilities in the belief propagation decoder. For the second cell only the numbers 2,3,4 are valid. The source has to be transformed into a uniform distribution between three possibilities and mapped to the valid numbers and so on, until the grid is filled.

Performance of sudoku codes

The calculation of the rate of sudoku codes is not trivial. An example rate calculation of a

4 \times 4

sudoku is shown above. Filling line by line from the top left corner only the first entry has maximum information of

\log_2 4=2

bits. Every next entry cannot be any of the numbers used before, so the information reduces to

\log_2 3

\log_2 2

and

0

for the following entries, as they have to be of the remaining left numbers. In the second lines the information is additionally reduced by the area rule: cell

5

in row-scan order can only be a

3

4

as the numbers

1

and

2

are already used in the sub-grid. The last row contains no information at all. Adding all information up one gets

\log(4*3*2^5) \approx 8.58

bits. The rate in this example is

\frac \approx 0.27

. The exact number of possible Sudoku grids according to

Mathematics of Sudoku Mathematics can be used to study Sudoku puzzles to answer questions such as "How many filled Sudoku grids are there?", "What is the minimal number of clues in a valid puzzle?" and "In what ways can Sudoku grids be symmetric?" through the use o ...

6,670,903,752,021,072,936,960

. With the total information of

\begin
 I_ &= \log_6.67*10^ \approx 22.87 \\
 I_ &= \log_6.67*10^ \approx 72.50\,\text
 \end

the average rate of a standard Sudoku is

R = I_ / 9^2 \approx 0.28

. The average number of possible entries for a cell is

\approx 1.86

\log_ \approx 0.90\,\text

of information per Sudoku cell. Note that the rate may vary between codewords. The minimum number of given entries that render a unique solution was proven to be 17. In the worst case only four missing entries can lead to ambiguous solutions. For an erasure channel it is very unlikely that 17 successful transmissions are enough to reproduce the puzzle. There are only about 50,000 known solutions with 17 given entries.

Density evolution

Density evolution is a capacity analysis algorithm originally developed for

s on belief propagation decoding. Density evolution can also be applied to Sudoku-type constraints. One important simplification used in density evolution on LDPC codes is the sufficiency to analyze only the all-one codeword. With the Sudoku constraints however, this is not a valid codeword. Unlike for linear codes the weight-distance equivalence property does not hold for non-linear codes. Therewith it is necessary to compute density evolution recursions for every possible Sudoku puzzle to get precise performance analysis. A proposed simplification is to analyze the probability distribution of the cardinalities of messages instead of the probability distribution of the message. Density evolution is calculated on the entry nodes and the constraint nodes (compare Tanner graph above). On the entry nodes one analyzes the cardinalities of the constraints. If for example the constraints have the cardinalities

(1,1)

then the entry can only be of one symbol. If the constraints have cardinalities

(2,2)

then both constraints allow two different symbols. For both constraints the correct symbol is contained for sure, lets assume the correct symbol is

1

. The other symbol can be equal or different for the constraints. If the symbols are different then the correct symbol is determined. If the second symbol is equal, lets assume

2

the output symbols are of cardinality

2

i.e. the symbols

\left\

. Depending on the alphabet size (

q

) the probability for the unique output for the input cardinalities

(2,2)

\begin
p_1^=1 - \frac
 \end

and for output of cardinality 2

\begin
	p_2^= \frac.
 \end

For a standard

9 \times 9

Sudoku this results in a probability of

7/8

for a unique solution. An analog calculation is done for all cardinality combinations. In the end the distribution of output cardinalities are summed up from the results. Note that the order of the input cardinality is interchangeable. The calculation of non-decreasing constraint combinations is therewith sufficient. For constraint nodes the procedure is somewhat similar and described in the following example based on a

4 \times 4

standard Sudoku. Inputs to the constraint nodes are the possible symbols of the connected entry nodes. Cardinality 1 means that the symbol of the source node is already determined. Again a non-decreasing analysis is sufficient. Lets assume the true output value is 4 and the inputs have cardinalities

(1,1,2)

with the true symbols 1-2-3. The messages with cardinality 1 are

\left\

and

\left\

. The message of cardinality 2 might be

\left\

\left\

\left\

as the true symbol 3 must be contained. In two of three cases the output is the correct symbol 4 with cardinality 1:

\left\

\left\

\left\

and

\left\

\left\

\left\

. In one of three case the output cardinality is 2:

\left\

\left\

\left\

. The output symbols in this case are

\left\{3,4\right\}

. The final output cardinality distribution can be expressed by summing over all possible input combinations. For a

4 \times 4

standard Sudoku these are 64 combinations that can be grouped to 20 non-decreasing ones. If the cardinality converges to 1 the decoding is error-free. To find the threshold the erasure probability must be increased until the decoding error remains positive for any number of iterations. With the method of Sayir density evolution recursions can be used to calculate thresholds also for Sudoku codes up to an alphabet size

q=8

References

Sudoku