Computation of a

cyclic redundancy check A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on ...

is derived from the mathematics of polynomial division, modulo two. In practice, it resembles

long division In arithmetic, long division is a standard division algorithm suitable for dividing multi-digit Hindu-Arabic numerals (positional notation) that is simple enough to perform by hand. It breaks down a division problem into a series of easier step ...

of the

binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two values (0 and 1) for each digit * Binary function, a function that takes two arguments * Binary operation, a mathematical op ...

message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that

exclusive or Exclusive or, exclusive disjunction, exclusive alternation, logical non-equivalence, or logical inequality is a logical operator whose negation is the logical biconditional. With two inputs, XOR is true if and only if the inputs differ (on ...

operations replace subtractions. Division of this type is efficiently realised in hardware by a modified

shift register A shift register is a type of digital circuit using a cascade of flip-flop (electronics), flip-flops where the output of one flip-flop is connected to the input of the next. They share a single clock signal, which causes the data stored in the syst ...

, and in software by a series of equivalent

algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...

s, starting with simple code close to the mathematics and becoming faster (and arguably more

obfuscated Obfuscation is the obscuring of the intended meaning of communication by making the message difficult to understand, usually with confusing and ambiguous language. The obfuscation might be either unintentional or intentional (although intent u ...

) through

byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...

-wise parallelism and

space–time tradeoff space–time trade-off, also known as time–memory trade-off or the algorithmic space-time continuum in computer science is a case where an algorithm or program trades increased space usage with decreased time. Here, ''space'' refers to the d ...

s. Various CRC standards extend the polynomial division algorithm by specifying an initial shift register value, a final Exclusive-Or step and, most critically, a bit ordering (

endianness file:Gullivers_travels.jpg, ''Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of digital data are transmitted over a data comm ...

). As a result, the code seen in practice deviates confusingly from "pure" division, and the register may shift left or right.

Example

As an example of implementing polynomial division in hardware, suppose that we are trying to compute an 8-bit CRC of an 8-bit message made of the

ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...

character "W", which is binary 01010111₂, decimal 87₁₀, or

hexadecimal Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbo ...

57₁₆. For illustration, we will use the CRC-8-ATM ( HEC) polynomial

x^8+x^2+x+1

. Writing the first bit transmitted (the coefficient of the highest power of

x

) on the left, this corresponds to the 9-bit string "100000111". The byte value 57₁₆ can be transmitted in two different orders, depending on the bit ordering convention used. Each one generates a different message polynomial

M(x)

. Msbit-first, this is

x^6+x^4+x^2+x+1

= 01010111, while lsbit-first, it is

x^7+x^6+x^5+x^3+x

= 11101010. These can then be multiplied by

x^8

to produce two 16-bit message polynomials

x^8 M(x)

. Computing the remainder then consists of subtracting multiples of the generator polynomial

G(x)

. This is just like decimal long division, but even simpler because the only possible multiples at each step are 0 and 1, and the subtractions

borrow Borrow or borrowing can mean: to receive (something) from somebody temporarily, expecting to return it. *In finance, monetary debt *In linguistics, change in a language due to contact with other languages * In arithmetic, when a digit becomes less ...

"from infinity" instead of reducing the upper digits. Because we do not care about the quotient, there is no need to record it. Observe that after each subtraction, the bits are divided into three groups: at the beginning, a group which is all zero; at the end, a group which is unchanged from the original; and a blue shaded group in the middle which is "interesting". The "interesting" group is 8 bits long, matching the degree of the polynomial. Every step, the appropriate multiple of the polynomial is subtracted to make the zero group one bit longer, and the unchanged group becomes one bit shorter, until only the final remainder is left. In the msbit-first example, the remainder polynomial is

x^7+x^5+x

. Converting to a hexadecimal number using the convention that the highest power of ''x'' is the msbit; this is A2₁₆. In the lsbit-first, the remainder is

x^7+x^4+x^3

. Converting to hexadecimal using the convention that the highest power of ''x'' is the lsbit, this is 19₁₆.

Implementation

Writing out the full message at each step, as done in the example above, is very tedious. Efficient implementations use an

n

-bit

to hold only the interesting bits. Multiplying the polynomial by

x

is equivalent to shifting the register by one place, as the coefficients do not change in value but only move up to the next term of the polynomial. Here is a first draft of some

pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...

for computing an ''n''-bit CRC. It uses a contrived

composite data type In computer science, a composite data type or compound data type is a data type that consists of programming language scalar data types and other composite types that may be heterogeneous and hierarchical in nature. It is sometimes called a struct ...

for polynomials, where ''x'' is not an integer variable, but a constructor generating a ''Polynomial''

object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an a ...

that can be added, multiplied and exponentiated. To xor two polynomials is to add them, modulo two; that is, to

exclusive OR Exclusive or, exclusive disjunction, exclusive alternation, logical non-equivalence, or logical inequality is a logical operator whose negation is the logical biconditional. With two inputs, XOR is true if and only if the inputs differ (on ...

the coefficients of each matching term from both polynomials. function crc(''bit array'' bitString ..len ''int'' len) :Code fragment 1: Simple polynomial division Note that this example code avoids the need to specify a bit-ordering convention by not using bytes; the input bitString is already in the form of a bit array, and the remainderPolynomial is manipulated in terms of polynomial operations; the multiplication by

x

could be a left or right shift, and the addition of

bitString +n/code> is done to the  $x^0$  coefficient, which could be the right or left end of the register.

This code has two disadvantages.  First, it actually requires an ''n''+1-bit register to hold the remainderPolynomial so that the  $x^n$  coefficient can be tested. More significantly, it requires the bitString to be padded with ''n'' zero bits.

The first problem can be solved by testing the  $x^$  coefficient of the remainderPolynomial before it is multiplied by  $x$ .

The second problem could be solved by doing the last ''n'' iterations differently, but there is a more subtle optimization which is used universally, in both hardware and software implementations.

Because the XOR operation used to subtract the generator polynomial from the message is commutative 




In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Perhaps most familiar as a pr ...
 and associative 





In mathematics, the associative property is a property of some binary operations that rearranging the  parentheses in an expression will not change the result. In propositional logic, associativity is a  valid  rule of replacement for  express ...
, it does not matter in what order the various inputs are combined into the remainderPolynomial.  And specifically, a given bit of the bitString does not need to be added to the remainderPolynomial until the very last instant when it is tested to determine whether to xor with the generatorPolynomial.

This eliminates the need to preload the remainderPolynomial with the first ''n'' bits of the message, as well:

 function crc(''bit array'' bitString ..len ''int'' len) 
:Code fragment 2: Polynomial division with deferred message XORing

This is the standard bit-at-a-time hardware CRC implementation, and is well worthy of study; once you understand why this computes exactly the same result as the first version, the remaining optimizations are quite straightforward.  If remainderPolynomial is only ''n'' bits long, then the  $x^n$  coefficients of it and of generatorPolynomial are simply discarded.  This is the reason that you will usually see CRC polynomials written in binary with the leading coefficient omitted.

In software, it is convenient to note that while one ''may'' delay the xor of each bit until the very last moment, it is also possible to do it earlier.  It is usually convenient to perform the xor a byte 




The byte is a  unit of digital information that most commonly consists of eight  bits. Historically, the byte was the number of bits used to encode a single  character of text in a computer and for this reason it is the smallest  addressable un ...
 at a time, even in a bit-at-a-time implementation.  Here, we take the input in 8-bit bytes:

 function crc(''byte array'' string ..len ''int'' len) 
:Code fragment 3: Polynomial division with bytewise message XORing

This is usually the most compact software implementation, used in microcontrollers 




A microcontroller (MC, uC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more  CPUs (processor cores) along with memory and programmable input/output peripherals. Pro ...
 when space is at a premium over speed.

  Bit ordering (endianness) 

When implemented in  bit serial  hardware, the generator polynomial uniquely describes the bit assignment; the first bit transmitted is always the coefficient of the highest power of  $x$ , and the last  $n$  bits transmitted are the CRC remainder  $R(x)$ , starting with the coefficient of  $x^$  and ending with the coefficient of  $x^0$ , a.k.a. the coefficient of 1.

However, when bits are processed a byte 




The byte is a  unit of digital information that most commonly consists of eight  bits. Historically, the byte was the number of bits used to encode a single  character of text in a computer and for this reason it is the smallest  addressable un ...
 at a time, such as when using parallel transmission 



In data transmission, parallel communication is a method of conveying multiple binary digits (bits) simultaneously using multiple conductors. This contrasts with serial communication, which conveys only a single bit at a time; this distinction i ...
, byte framing as in 8B/10B encoding 







In telecommunications, 8b/10b is a line code that maps 8-bit words to  10-bit symbols to achieve  DC balance and bounded disparity, and at the same time provide enough state changes to allow reasonable clock recovery.  This means that the di ...
 or RS-232 





In telecommunications, RS-232 or  Recommended Standard 232 is a  standard introduced in 1960 for  serial communication transmission of data. It formally defines signals connecting between a ''DTE'' (''data terminal equipment'') such as a compu ...
-style asynchronous serial communication 





Asynchronous serial communication is a form of  serial communication in which the communicating endpoints' interfaces are not continuously synchronized by a common clock signal. Synchronization ( clock recovery) is done by data-embedded signal ...
, or when implementing a CRC in software 




Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications.



The history of software is closely tied to the development of digital comput ...
, it is necessary to specify the bit ordering (endianness) of the data; which bit in each byte is considered "first" and will be the coefficient of the higher power of  $x$ .

If the data is destined for serial communication 





In telecommunication and data transmission, serial communication is the process of sending data one  bit at a time, sequentially, over a communication channel or computer bus. This is in contrast to parallel communication, where several bits a ...
, it is best to use the bit ordering the data will ultimately be sent in.  This is because a CRC's ability to detect burst error 


In telecommunications, a burst error or error burst is a contiguous sequence of symbols, received over a communication channel, such that the first and last symbols are in error and there exists no contiguous subsequence of ''m'' correctly receiv ...
s is based on proximity in the message polynomial  $M(x)$ ; if adjacent polynomial terms are not transmitted sequentially, a physical error burst of one length may be seen as a longer burst due to the rearrangement of bits.

For example, both IEEE 802 



IEEE 802 is a family of Institute of Electrical and Electronics Engineers (IEEE) standards for local area networks (LANs), personal area networks (PANs), and metropolitan area networks (MANs). The IEEE 802 LAN/MAN Standards Committee (LMSC) main ...
 (ethernet 






Ethernet ( ) is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 198 ...
) and RS-232 





In telecommunications, RS-232 or  Recommended Standard 232 is a  standard introduced in 1960 for  serial communication transmission of data. It formally defines signals connecting between a ''DTE'' (''data terminal equipment'') such as a compu ...
 (serial port 




A serial port is a serial communication Interface (computing), interface through which information transfers in or out sequentially one bit at a time. This is in contrast to a parallel port, which communicates multiple bits simultaneously in Pa ...
) standards specify least-significant bit first (little-endian) transmission, so a software CRC implementation to protect data sent across such a link should map the least significant bits in each byte to coefficients of the highest powers of  $x$ .  On the other hand, floppy disk 





A floppy disk or floppy diskette (casually referred to as a floppy, a diskette, or a disk) is a type of disk storage composed of a thin and flexible disk of a  magnetic storage medium in a square or nearly square plastic enclosure lined with a ...
s and most hard drive 











A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating hard disk drive platter, pla ...
s write the most significant bit of each byte first.

The lsbit-first CRC is slightly simpler to implement in software, so is somewhat more commonly seen, but many programmers find the msbit-first bit ordering easier to follow.  Thus, for example, the XMODEM 


XMODEM is a simple file transfer protocol developed as a quick hack by  Ward Christensen for use in his 1977 MODEM.ASM  terminal program. It allowed users to transmit files between their computers when both sides used MODEM. Keith Petersen made a ...
-CRC extension, an early use of CRCs in software, uses an msbit-first CRC.

So far, the pseudocode has avoided specifying the ordering of bits within bytes by describing shifts in the pseudocode as multiplications by  $x$  and writing explicit conversions from binary to polynomial form.  In practice, the CRC is held in a standard binary register using a particular bit-ordering convention.  In msbit-first form, the most significant binary bits will be sent first and so contain the higher-order polynomial coefficients, while in lsbit-first form, the least-significant binary bits contain the higher-order coefficients.  The above pseudocode can be written in both forms.  For concreteness, this uses the 16-bit  CRC-16-CCITT 


The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) is one of the three Sectors (branches) of the International Telecommunication Union (ITU). It is responsible for coordinating  standards for telecommunicat ...
 polynomial  $x^ + x^ + x^5 + 1$ :

 ''// Most significant bit first (big-endian)''
 ''// (x¹⁶)+x¹²+x⁵+1 = (1) 0001 0000 0010 0001 = 0x1021''
 function crc(''byte array'' string ..len ''int'' len) 
:Code fragment 4: Shift register based division, MSB first

 ''// Least significant bit first (little-endian)''
 ''// 1+x⁵+x¹²+(x¹⁶) = 1000 0100 0000 1000 (1) = 0x8408''
 function crc(''byte array'' string ..len ''int'' len) 
:Code fragment 5: Shift register based division, LSB first

Note that the lsbit-first form avoids the need to shift string /code> before the xor.  In either case, be sure to transmit the bytes of the CRC in the order that matches your chosen bit-ordering convention.

  Multi-bit computation using lookup tables 

Faster software implementations process more than one bit of dividend per iteration using lookup table 


In computer science, a lookup table (LUT) is an array data structure, array that replaces runtime (program lifecycle phase), runtime computation of a mathematical function (mathematics), function with a simpler array indexing operation, in a proc ...
s, indexed by highest order coefficients of rem, to  memoize the per-bit division steps.

  Sarwate algorithm (single lookup table) 

The most common technique uses a 256-entry lookup table, to process 8 bits of input per iteration.  This replaces the body of the outer loop (over i) with:

 // Msbit-first
 rem = (rem leftShift 8) xor big_endian_table tring[ixor ((leftmost 8 bits of rem) rightShift (n-8))">.html" ;"title="tring[i">tring[ixor ((leftmost 8 bits of rem) rightShift (n-8)) // Lsbit-first
 rem = (rem rightShift 8) xor little_endian_table tring[ixor (rightmost 8 bits of rem)]
:Code fragment 6: Cores of table based division

Using a 256-entry table is usually most convenient, but other sizes can be used.  In small microcontrollers, using a 16-entry table to process four bits at a time gives a useful speed improvement while keeping the table small.  On computers with ample storage, a -entry table can be used to process 16 bits at a time.

  Generating the lookup table 


The software to generate the lookup table is so small and fast that it is usually faster to compute them on program startup than to load precomputed tables from storage.  One popular technique is to use the bit-at-a-time code 256 times to generate the CRCs of the 256 possible 8-bit bytes.  However, this can be optimized significantly by taking advantage of the property that table  xor j
 table xor table /code>.  Only the table entries corresponding to powers of two need to be computed directly.

In the following example code, crc holds the value of table /code>:

 big_endian_table := 0
 crc := 0x8000 // ''Assuming a 16-bit polynomial''
 i := 1
 do  while i < 256
:Code fragment 7: Byte-at-a-time CRC table generation, msbit-first

 little_endian_table := 0
 crc := 1;
 i := 128
 do  while i > 0
:Code fragment 8: Byte-at-a-time CRC table generation, lsbit-first

In these code samples, the table index i + j is equivalent to i xor j; you may use whichever form is more convenient.

  CRC-32 example 

One of the most commonly encountered CRC polynomials is known as CRC-32, used by (among others) Ethernet 






Ethernet ( ) is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 198 ...
, FDDI 






Fiber Distributed Data Interface (FDDI) is a standard for data transmission in a local area network.
It uses optical fiber as its standard underlying physical medium.

It was also later specified to use copper cable, in which case it may be c ...
,  ZIP and other archive formats 
In computing, an archive file stores the content of one or more  files, possibly  compressed, with associated metadata such as  file name, directory structure, error detection and correction information, commentary, compressed data archives, storag ...
, and  PNG  image format.  Its polynomial can be written msbit-first as 0x04C11DB7, or lsbit-first as 0xEDB88320.

This is a practical example for the CRC-32 variant of CRC.
 
An alternate source is the W3C webpage on PNG, which includes an appendix with a short and simple table-driven implementation in C of CRC-32. You will note that the code corresponds to the lsbit-first byte-at-a-time algorithm presented here, and the table is generated using the bit-at-a-time code.

 Function CRC32
    Input:
       data:  Bytes     // Array of bytes
    Output:
       crc32: UInt32    // 32-bit unsigned CRC-32 value

 // Initialize CRC-32 to starting value
 crc32 ← 0xFFFFFFFF

 for each byte in data do
    nLookupIndex ← (crc32 xor byte) and 0xFF
    crc32 ← (crc32 shr 8) xor CRCTable LookupIndex // CRCTable is an array of 256 32-bit constants

 // Finalize the CRC-32 value by inverting all the bits
 crc32 ← crc32 xor 0xFFFFFFFF
 return crc32

In C, the algorithm looks like:

#include  // uint32_t, uint8_t

static uint32_t CRCTable 56

// Initialization by multiple threads is redundant, but safe.
static void CRC32_init(void)


uint32_t CRC32(const uint8_t data[], size_t data_length)



  Byte-Slicing using multiple tables 

There exists a slice-by-''n'' (typically slice-by-8 for CRC32) algorithm that usually doubles or triples the performance compared to the Sarwate algorithm. Instead of reading 8 bits at a time, the algorithm reads 8''n'' bits at a time. Doing so maximizes performance on superscalar 





A superscalar processor (or multiple-issue processor) is a  CPU that implements a form of  parallelism called  instruction-level parallelism within a single processor. In contrast to a  scalar processor, which can execute at most one single in ...
 processors.

It is unclear who actually invented the algorithm.

To understand the advantages, start with the slice-by-2 case.  We wish to compute a CRC two bytes (16 bits) at a time, but the standard table-based approach would require an inconveniently large 65536-entry table.  As mentioned in , CRC tables have the property that table  xor j= table xor table /code>.  We can use this identity to replace the large table by two 256-entry tables: table  + 256 × j= table_low xor table_high /code>.

So the large table is not stored explicitly, but each iteration computes the CRC value that would be there by combining the values in two smaller tables.  That is, the 16-bit index is "sliced" into two 8-bit indexes.  At first glance, this seems pointless; why do two lookups in separate tables, when the standard byte-at-a-time algorithm would do two lookups in the ''same'' table?

The difference is instruction-level parallelism 


Instruction-level parallelism (ILP) is the Parallel computing, parallel or simultaneous execution of a sequence of Instruction set, instructions in a computer program. More specifically, ILP refers to the average number of instructions run per st ...
.  In the standard algorithm, the index for each lookup depends on the value fetched in the previous one.  Thus, the second lookup cannot begin until the first lookup is complete.

When sliced tables are used, both lookups can begin at the same time.  If the processor can perform two loads in parallel (2020s microprocessors can keep track of over 100 loads in progress), then this has the potential to double the speed of the inner loop.

This technique can obviously be extended to as many slices as the processor can benefit from.

When the slicing width ''equals'' the CRC size, there is a minor speedup.  In the part of the basic Sarwate algorithm where the previous CRC value is shifted by the size of the table lookup, the previous CRC value is shifted away entirely (what remains is all zero), so the XOR can be eliminated from the critical path.

The resultant slice-by-''n'' inner loop consists of:
# XOR the current CRC with the next ''n'' bytes of the message,
# look up each byte of the resultant value in the ''n'' slice tables, then
# XOR the ''n'' results to get the next CRC.

This still has the property that all of the loads in the second step must be completed before the next iteration can commence, resulting in regular pauses during which the processor's memory subsystem (in particular, the data cache) is unused.  However, when the slicing width ''exceeds'' the CRC size, a significant second speedup appears.

This is because a portion of the results of the first step ''no longer depend'' on any previous iteration.  When XORing a 32-bit CRC with 64 bits of message, half of the result is simply a copy of the message.  If coded carefully (to avoid creating a false data dependency 

A data dependency in computer science is a situation in which a  program statement (instruction) refers to the data of a preceding statement. In compiler theory, the technique used to discover data dependencies among statements (or instructions) i ...
), half of the slice table loads can begin ''before'' the previous loop iteration has completed.  The result is enough work to keep the processor's memory subsystem ''continuously'' busy, which achieves maximum performance.  As mentioned, on post-2000 microprocessors, slice-by-8 is generally sufficient to reach this level.

There is no particular need for the slices to be 8 bits wide.  For example, it would be entirely possible to compute a CRC 64 bits at a time using a slice-by-9 algorithm, using 9 128-entry lookup tables to handle 63 bits, and the 64th bit handled by the bit-at-a-time algorithm (which is effectively a 1-bit, 2-entry lookup table).  This would almost halve the table size (going from 8×256 = 2048 entries to 9×128 = 1152) at the expense of one more data-dependent load per iteration.

  Generating the tables 

The tables for slicing computation are a simple extension of the table for the basic Sarwate algorithm.  The loop for the 256 entries of each table is identical, but beginning with the crc value left over from the previous table.

  Multi-bit computation without lookup tables 


Parallel update for a byte or a word at a time can also be done explicitly, without a table. For each bit an equation is solved after 8 bits have been shifted in.

Multiple reduction steps are normally expressed as a matrix operation.  One shift and reduction modulo a degree- $n$  generator polynomial  $G(x)$  is equivalent to multiplication by the  $n\times n$  companion matrix 
In linear algebra, the  Frobenius  companion matrix of the monic polynomial

p(x)=c_0 + c_1 x + \cdots + c_x^ + x^n

is the  square matrix defined as

C(p)=\begin
0 & 0 & \dots & 0 & -c_0 \\
1 & 0 & \dots & 0 & -c_1 \\
0 & 1 & \dots & 0 & -c_2 \\
\ ...
  $A = C(G)$ .   $r$  steps are written as the matrix  $A^r = C(G)^r$ .

This technique is normally used in high-speed hardware implementations, but is practical in software for small or sparse CRC polynomials.  For large, dense CRC polynomials, the code becomes impractically long.

  Examples for sparse polynomials 


The following tables list the equations processing 8 bits at a time modulo some commonly used polynomials, using the following symbols:








  Two-step computation 

For dense polynomials, such as the CRC-32 polynomial, computing the remainder a byte at a time produces equations where each bit depends on up to 8 bits of the previous iteration.  In byte-parallel hardware implementations this calls for either 8-input or cascaded XOR gates which have substantial  gate delay.

To maximise computation speed, an ''intermediate remainder'' can be calculated by first computing the CRC of the message modulo a sparse polynomial which is a multiple of the CRC polynomial.  For CRC-32, the polynomial ''x''¹²³ + ''x''¹¹¹ + ''x''⁹² + ''x''⁸⁴ + ''x''⁶⁴ + ''x''⁴⁶ + ''x''²³ + 1 has the property that its terms (feedback taps) are at least 8 positions apart.  Thus, a 123-bit shift register can be advanced 8 bits per iteration using only two-input XOR gates, the fastest possible.  Finally the intermediate remainder can be reduced modulo the standard polynomial in a second slower shift register (once per CRC, rather than once per input byte) to yield the CRC-32 remainder.

If 3- or 4-input XOR gates are permitted, shorter intermediate polynomials of degree 71 or 53, respectively, can be used.

  State-space transformation 

The preceding technique works, but requires a large intermediate shift register.  A more hardware-efficient technique which has been used for high-speed networking since  is ''state-space transformation''.  The inner loop of a  $r$ -bit-at-a-time CRC engine is to repeatedly update the intermediate remainder  $y$  to reflect an  $r$ -bit portion of the message  $y_i$  using:
: $y_i = A^r \cdot (y_ \oplus x_i).$ 
The implementation challenge is that the matrix multiplication by  $A^r$  must be performed in  $r$  bit times.  In general, as  $r$  increases, the so does the complexity of this multiplication, resulting in a maximum speedup of about  $r/2.$   To improve on this, first break this up the equation using distributivity 



In mathematics, the distributive property of binary operations is a generalization of the distributive law, which asserts that the equality
x \cdot (y + z) = x \cdot y + x \cdot z
is always true in elementary algebra.
For example, in elementary  ...
 into:
: $y_i = A^r y_ \oplus A^r x_i.$ 
Then, we find an invertible matrix  $T$  and perform a change of basis 



In mathematics, an  ordered basis of a vector space of finite dimension  allows representing uniquely any element of the vector space by a coordinate vector, which is a sequence of   scalars called coordinates. If two different bases are conside ...
, multipling the intermediate state by  $T^$ .  So the iteration becomes:
: $T^y_i = T^ A^r T (T^ y_) \oplus T^ A^r x_i.$ 
The final CRC is recovered as  $y = T(T^y).$ 
It's important to note that the input multiplication by  $T^A^r$  and the output multiplication by  $T$  are not time-critical, as they can be  pipelined to whatever depth is required to meet the performance target.  Only the central multiplication by  $T^A^rT$  must be completed within  $r$  bit times.  It is possible to find a transformation matrix  $T$  which gives that the form of a companion matrix.  In other words, it can be implemented using the same (fast) 2-input XOR gates as the bit-at-a-time algorithm.  This allows an  $r$ -bit parallel CRC to operate  $r$  times as fast as a 1-bit serial implementation.

There are many possible  $n\times n$  transformation matrices with this property, so it is possible to choose one which also minimizes the complexity of the input and output matrices  $T^A^r$  and  $T$ .

  Block-wise computation 


Block-wise computation of the remainder can be performed in hardware for any CRC polynomial by factorizing the state space transformation matrix needed to compute the remainder into two simpler Toeplitz matrices.

  One-pass checking 

When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one.  However, a simpler technique is commonly
used in hardware.

When the CRC is transmitted with the correct byte order (matching the chosen bit-ordering convention), a receiver can compute an overall CRC, over the message ''and'' the CRC, and if they are correct, the result will be zero.  A good source for even more
This possibility is the reason that most network protocols which include a CRC do so ''before'' the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC.

In fact, a few protocols use the CRC ''as'' the message delimiter, a technique called CRC-based framing CRC-based framing is a kind of frame synchronization used in Asynchronous Transfer Mode (ATM) and other similar protocols.

The concept of CRC-based framing was developed by  StrataCom, Inc. in order to improve the efficiency of a pre-standard Async ...
.  (This requires multiple frames to detect acquisition or loss of framing, so is limited to applications where the frames are a known length, and the frame contents are sufficiently random that valid CRCs in misaligned data are rare.)

  CRC variants 

In practice, most standards specify presetting the register to all-ones and inverting the CRC before transmission.  This has no effect on the ability of a CRC to detect changed bits, but gives it the ability to notice bits that are added to the message.

  Preset to −1 

The basic mathematics of a CRC accepts (considers as correctly transmitted) messages which, when interpreted as a polynomial, are a multiple of the CRC polynomial.  If some leading 0 bits are prepended to such a message, they will not change its interpretation as a polynomial.  This is equivalent to the fact that 0001 and 1 are the same number.

But if the message being transmitted does care about leading 0 bits, the inability of the basic CRC algorithm to detect such a change is undesirable.  If it is possible that a transmission error could add such bits, a simple solution is to start with the rem shift register set to some non-zero value; for convenience, the all-ones value is typically used.  This is mathematically equivalent to complementing (binary NOT) the first ''n'' bits of the message, where ''n'' is the number of bits in the CRC register.

This does not affect CRC generation and checking in any way, as long as both generator and checker use the same initial value.  Any non-zero initial value will do, and a few standards specify unusual values,E.g. low-frequency RFID  but the all-ones value (−1 in twos complement binary) is by far the most common.  Note that a one-pass CRC generate/check will still produce a result of zero when the message is correct, regardless of the preset value.

  Post-invert 

The same sort of error can occur at the end of a message, albeit with a more limited set of messages.  Appending 0 bits to a message is equivalent to multiplying its polynomial by ''x'', and if it was previously a multiple of the CRC polynomial, the result of that multiplication will be, as well.  This is equivalent to the fact that, since 726 is a multiple of 11, so is 7260.

A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message.  Again, any non-zero change will do; inverting all the bits (XORing with an all-ones pattern) is simply the most common.

This has an effect on one-pass CRC checking: instead of producing a result of zero when the message is correct, it produces a fixed non-zero result.  (To be precise, the result is the CRC, with zero preset but with post-invert, of the inversion pattern.)  Once this constant has been obtained (e.g. by performing a one-pass CRC generate/check on an arbitrary message), it can be used directly to verify the correctness of any other message checked using the same CRC algorithm.

  See also 

General category
* Error correction code 


In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for  controlling errors in data transmission over unreliable or noisy communication channels.

The centra ...

* List of hash functions 
This is a list of hash functions, including cyclic redundancy checks, checksum functions, and cryptographic hash functions.


 Cyclic redundancy checks




Adler-32 is often mistaken for a CRC, but it is not: it is a checksum.
 Checksums



 Univer ...

*  Parity is equivalent to a 1-bit CRC with polynomial .

Non-CRC checksums
* Adler-32 
Adler-32 is a checksum algorithm written by  Mark Adler in 1995, modifying  Fletcher's checksum. Compared to a cyclic redundancy check of the same length, it trades reliability for speed. Adler-32 is more reliable than  Fletcher-16, and slightly le ...

* Fletcher's checksum 
The Fletcher checksum is an algorithm for computing a  position-dependent checksum devised by John G. Fletcher (1934–2012) at  Lawrence Livermore Labs in the late 1970s.  The objective of the Fletcher checksum was to provide error-detection prope ...


  References 



  External links 

* 

* {{cite web
,  url=https://github.com/lizardfs/lizardfs/tree/master/external/crcutil-1.0
,  title=Efficient (~1 CPU cycle per byte) CRC implementation
,  author=Andrew Kadarch, Bob Jenkins,  website=GitHub 




GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...



 Cyclic redundancy checks
 Finite fields
 Articles with example pseudocode