2.4 Error Detection
As discussed in Chapter 1, bit errors are sometimes introduced into frames. This happens, for example, because of electrical interference or thermal noise. Although errors are rare, especially on optical links, some mechanism is needed to detect these errors so that corrective action can be taken. Otherwise, the end user is left wondering why the C program that successfully compiled just a moment ago now suddenly has a syntax error in it, when all that happened in the interim is that it was copied across a network file system.
There is a long history of techniques for dealing with bit errors in computer systems, dating back to at least the 1940s. Hamming and Reed-Solomon codes are two notable examples that were developed for use in punch card readers, when storing data on magnetic disks, and in early core memories. This section describes some of the error detection techniques most commonly used in networking.
Detecting errors is only one part of the problem. The other part is correcting errors once detected. Two basic approaches can be taken when the recipient of a message detects an error. One is to notify the sender that the message was corrupted so that the sender can retransmit a copy of the message. If bit errors are rare, then in all probability the retransmitted copy will be error free. Alternatively, some types of error detection algorithms allow the recipient to reconstruct the correct message even after it has been corrupted; such algorithms rely on error-correcting codes, discussed below.
One of the most common techniques for detecting transmission errors is a technique known as the cyclic redundancy check (CRC). It is used in nearly all the link-level protocols discussed in this chapter. This section outlines the basic CRC algorithm, but before discussing that approach, we first describe the simpler checksum scheme used by several Internet protocols.
The basic idea behind any error detection scheme is to add redundant
information to a frame that can be used to determine if errors have been
introduced. In the extreme, we could imagine transmitting two complete
copies of the data. If the two copies are identical at the receiver,
then it is probably the case that both are correct. If they differ, then
an error was introduced into one (or both) of them, and they must be
discarded. This is a rather poor error detection scheme for two reasons.
First, it sends
Fortunately, we can do a lot better than this simple scheme. In
general, we can provide quite strong error detection capability while
sending only
We say that the extra bits we send are redundant because they add no new information to the message. Instead, they are derived directly from the original message using some well-defined algorithm. Both the sender and the receiver know exactly what that algorithm is. The sender applies the algorithm to the message to generate the redundant bits. It then transmits both the message and those few extra bits. When the receiver applies the same algorithm to the received message, it should (in the absence of errors) come up with the same result as the sender. It compares the result with the one sent to it by the sender. If they match, it can conclude (with high likelihood) that no errors were introduced in the message during transmission. If they do not match, it can be sure that either the message or the redundant bits were corrupted, and it must take appropriate action—that is, discarding the message or correcting it if that is possible.
One note on the terminology for these extra bits. In general, they are referred to as error-detecting codes. In specific cases, when the algorithm to create the code is based on addition, they may be called a checksum. We will see that the Internet checksum is appropriately named: It is an error check that uses a summing algorithm. Unfortunately, the word checksum is often used imprecisely to mean any form of error-detecting code, including CRCs. This can be confusing, so we urge you to use the word checksum only to apply to codes that actually do use addition and to use error-detecting code to refer to the general class of codes described in this section.
2.4.1 Internet Checksum Algorithm
Our first approach to error detection is exemplified by the Internet checksum. Although it is not used at the link level, it nevertheless provides the same sort of functionality as CRCs, so we discuss it here.
The idea behind the Internet checksum is very simple—you add up all the words that are transmitted and then transmit the result of that sum. The result is the checksum. The receiver performs the same calculation on the received data and compares the result with the received checksum. If any transmitted data, including the checksum itself, is corrupted, then the results will not match, so the receiver knows that an error occurred.
You can imagine many different variations on the basic idea of a checksum. The exact scheme used by the Internet protocols works as follows. Consider the data being checksummed as a sequence of 16-bit integers. Add them together using 16-bit ones’ complement arithmetic (explained below) and then take the ones’ complement of the result. That 16-bit number is the checksum.
In ones’ complement arithmetic, a negative integer (-x) is represented as the complement of x; that is, each bit of x is inverted. When adding numbers in ones’ complement arithmetic, a carryout from the most significant bit needs to be added to the result. Consider, for example, the addition of -5 and -3 in ones’ complement arithmetic on 4-bit integers: +5 is 0101, so -5 is 1010; +3 is 0011, so -3 is 1100. If we add 1010 and 1100, ignoring the carry, we get 0110. In ones’ complement arithmetic, the fact that this operation caused a carry from the most significant bit causes us to increment the result, giving 0111, which is the ones’ complement representation of -8 (obtained by inverting the bits in 1000), as we would expect.
The following routine gives a straightforward implementation of the
Internet’s checksum algorithm. The count
argument gives the length
of buf
measured in 16-bit units. The routine assumes that buf
has already been padded with 0s to a 16-bit boundary.
u_short
cksum(u_short *buf, int count)
{
register u_long sum = 0;
while (count--)
{
sum += *buf++;
if (sum & 0xFFFF0000)
{
/* carry occurred, so wrap around */
sum &= 0xFFFF;
sum++;
}
}
return ~(sum & 0xFFFF);
}
This code ensures that the calculation uses ones’ complement arithmetic
rather than the twos’ complement that is used in most machines. Note the
if
statement inside the while
loop. If there is a carry into the
top 16 bits of sum
, then we increment sum
just as in the
previous example.
Compared to our repetition code, this algorithm scores well for using a small number of redundant bits—only 16 for a message of any length—but it does not score extremely well for strength of error detection. For example, a pair of single-bit errors, one of which increments a word and one of which decrements another word by the same amount, will go undetected. The reason for using an algorithm like this in spite of its relatively weak protection against errors (compared to a CRC, for example) is simple: This algorithm is much easier to implement in software. Experience has suggested that a checksum of this form was adequate, but one reason it is adequate is that this checksum is the last line of defense in an end-to-end protocol. The majority of errors are picked up by stronger error detection algorithms, such as CRCs, at the link level.
2.4.2 Cyclic Redundancy Check
It should be clear by now that a major goal in designing error detection algorithms is to maximize the probability of detecting errors using only a small number of redundant bits. Cyclic redundancy checks use some fairly powerful mathematics to achieve this goal. For example, a 32-bit CRC gives strong protection against common bit errors in messages that are thousands of bytes long. The theoretical foundation of the cyclic redundancy check is rooted in a branch of mathematics called finite fields. While this may sound daunting, the basic ideas can be easily understood.
To start, think of an (n+1)-bit message as being represented by an
We can thus think of a sender and a receiver as exchanging polynomials with each other.
For the purposes of calculating a CRC, a sender and receiver have to
agree on a divisor polynomial,
When a sender wishes to transmit a message
It will help to understand the following if you know a little about polynomial arithmetic; it is just slightly different from normal integer arithmetic. We are dealing with a special class of polynomial arithmetic here, where coefficients may be only one or zero, and operations on the coefficients are performed using modulo 2 arithmetic. This is referred to as “polynomial arithmetic modulo 2.” Since this is a networking book, not a mathematics text, let’s focus on the key properties of this type of arithmetic for our purposes (which we ask you to accept on faith):
Any polynomial
can be divided by a divisor polynomial if is of higher degree than .Any polynomial
can be divided once by a divisor polynomial if is of the same degree as .The remainder obtained when
is divided by is obtained by performing the exclusive OR (XOR) operation on each pair of matching coefficients.
For example, the polynomial
Now that we know the basic rules for dividing polynomials, we are able to do long division, which is necessary to deal with longer messages. An example appears below.
Recall that we wanted to create a polynomial for transmission that is
derived from the original message
Multiply
by ; that is, add zeros at the end of the message. Call this zero-extended message .Divide
by and find the remainder.Subtract the remainder from
.
It should be obvious that what is left at this point is a message that
is exactly divisible by
Consider the message
You can see from the very bottom of Figure 32 that the
remainder of the example calculation is 101. So we know that 10011010000
minus 101 would be exactly divisible by

Figure 32. CRC calculation using polynomial long division.
Now we will consider the question of where the polynomial
One common type of error is a single-bit error, which can be expressed
as
All single-bit errors, as long as the
and terms have nonzero coefficientsAll double-bit errors, as long as
has a factor with at least three termsAny odd number of errors, as long as
contains the factorAny “burst” error (i.e., sequence of consecutive errored bits) for which the length of the burst is less than
bits (Most burst errors of length greater than bits can also be detected.)
Six versions of
CRC-32 =
We have mentioned that it is possible to use codes that not only detect the presence of errors but also enable errors to be corrected. Since the details of such codes require yet more complex mathematics than that required to understand CRCs, we will not dwell on them here. However, it is worth considering the merits of correction versus detection.
At first glance, it would seem that correction is always better, since with detection we are forced to throw away the message and, in general, ask for another copy to be transmitted. This uses up bandwidth and may introduce latency while waiting for the retransmission. However, there is a downside to correction, as it generally requires a greater number of redundant bits to send an error-correcting code that is as strong (that is, able to cope with the same range of errors) as a code that only detects errors. Thus, while error detection requires more bits to be sent when errors occur, error correction requires more bits to be sent all the time. As a result, error correction tends to be most useful when (1) errors are quite probable, as they may be, for example, in a wireless environment, or (2) the cost of retransmission is too high, for example, because of the latency involved retransmitting a packet over a satellite link.
The use of error-correcting codes in networking is sometimes referred to as forward error correction (FEC) because the correction of errors is handled “in advance” by sending extra information, rather than waiting for errors to happen and dealing with them later by retransmission. FEC is commonly used in wireless networks such as 802.11.
Finally, we note that the CRC algorithm, while seemingly complex, is
easily implemented in hardware using a

Figure 33. CRC calculation using shift register.