
Fundamentals of Data Representation: Error checking and correction

When you send data across the internet, or even from your USB stick to a computer, you are sending millions upon millions of ones and zeros. What would happen if one of them got corrupted? Think of this situation: you are buying a new game from an online retailer and put £40 into the payment box. You click on send and the number 40 is sent to your bank stored in a byte: 00101000. Now imagine the second most significant bit got corrupted on its way to the bank, and the bank received the following: 01101000. You'd be paying £104 for that game! Error checking and correction stops things like this from happening. There are many ways to detect and correct corrupted data; we are going to learn two.

Parity

Sometimes when you see ASCII code it appears to have only 7 bits. Surely it should use a full byte to represent a character; after all, that would let it represent more characters than the measly 128 it can currently store (note that there is extended ASCII, which uses the 8th bit as well, but we don't need to cover that here). Instead, the eighth bit is used as a parity bit to detect whether the data you have been sent is correct. It will not be able to tell you which digit is wrong, so it isn't corrective.

There are two types of parity: odd and even. If you think back to primary school you will have learnt about odd and even numbers; hold that thought, we are going to need it.

Odd numbers: 1, 3, 5, 7, 9
Even numbers: 0, 2, 4, 6, 8 (note 0 is here too)

Example: How to detect errors using parity bits

When we send binary data we need to count the number of 1s that are present in it. For example, the ASCII character 'B' is 1000010, which contains two 1s. We then add an extra bit, the parity bit, to the front of it so that the total number of 1s is even (for even parity) or odd (for odd parity), and send it across the internet.

If we are using even parity: 01000010
If we are using odd parity: 11000010

Now, when the data gets to the other end and we were using even parity:

01001010 - odd number of 1s: there has been an error in transmission, so ask for the data to be sent again
01000010 - even number of 1s: no error has been detected, so the data appears to have been sent correctly
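The counting rule above can be sketched in Python. This is a minimal sketch that assumes the bits are held as a string of '0' and '1' characters, purely for readability; the function names `add_parity_bit` and `parity_ok` are hypothetical.

```python
def add_parity_bit(bits: str, even: bool = True) -> str:
    """Return `bits` (a string of '0'/'1' characters) with a parity bit added to the front."""
    ones = bits.count("1")
    if even:
        # Even parity: the parity bit makes the total number of 1s even.
        parity = "1" if ones % 2 == 1 else "0"
    else:
        # Odd parity: the parity bit makes the total number of 1s odd.
        parity = "1" if ones % 2 == 0 else "0"
    return parity + bits


def parity_ok(received: str, even: bool = True) -> bool:
    """Check a received bit string (parity bit included) against the agreed parity."""
    ones = received.count("1")
    return ones % 2 == 0 if even else ones % 2 == 1


if __name__ == "__main__":
    b = format(ord("B"), "07b")           # 7-bit ASCII code for 'B': 1000010
    print(add_parity_bit(b))              # 01000010 (even parity)
    print(add_parity_bit(b, even=False))  # 11000010 (odd parity)
    print(parity_ok("01001010"))          # False: odd number of 1s, error detected
    print(parity_ok("01000010"))          # True: no error detected
```

Note that `parity_ok` returning True only means no error was detected: if two bits flip, the count of 1s keeps its parity and the error slips through.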

Exercise: Test your knowledge of parity bits

Try to apply the correct parity bit to the following:

_1011010 (even parity)

Answer:

01011010

_1011010 (odd parity)

Answer:

11011010

_1111110 (even parity)

Answer:

01111110

_0000000 (odd parity)

Answer:

10000000

However, if we receive 10010110, knowing that the data was sent with odd parity, where is the error? The byte contains four 1s, an even number, so we know an error has occurred, but we cannot tell which bit is wrong. The best we can do is ask for the data to be resent and hope it's correct next time. Parity bits only provide error detection; we need something that both detects and corrects errors.

Checksum

A checksum or hash sum is a small-size datum computed from a block of digital data for the purpose of detecting errors which may have been introduced during its transmission or storage. A common use is checking an installation file after it has been received from the download server. By themselves, checksums are often used to verify data integrity, but should not be relied upon to also verify data authenticity.

The actual procedure which yields the checksum from a data input is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm will usually output a significantly different value even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity: if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
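The comparison described above, computing a checksum of the received data and matching it against a previously stored value, can be sketched with Python's standard `hashlib` module. SHA-256 is our choice of cryptographic hash function for illustration; the text does not name one. The file contents and the helper names `sha256_hex` and `integrity_ok` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Compute the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def integrity_ok(data: bytes, expected_hex: str) -> bool:
    """Compare a freshly computed checksum with a previously stored one."""
    return sha256_hex(data) == expected_hex


if __name__ == "__main__":
    original = b"example installer contents"   # hypothetical file contents
    published = sha256_hex(original)            # the value a download site might list
    corrupted = b"example installer content5"   # one character changed

    print(published)
    print(sha256_hex(corrupted))                # a very different digest
    print(integrity_ok(original, published))    # True
    print(integrity_ok(corrupted, published))   # False
```

As the text warns, a match only shows that the data was not accidentally altered. Anyone who can change the file can also publish a matching digest, so this check does not prove where the file came from.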

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems designed to verify both data integrity and authenticity, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.

Checksum algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the XOR of all its words, including the checksum; if the result is not a word consisting of n zeros, the receiver knows a transmission error occurred.
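The longitudinal parity check can be sketched in a few lines of Python. This sketch assumes 8-bit words (n = 8) stored as Python integers; the word values and the function names `longitudinal_parity` and `message_ok` are hypothetical. The demo also shows the weaknesses described in the next paragraph.

```python
from functools import reduce
from operator import xor


def longitudinal_parity(words: list[int]) -> int:
    """XOR all words together; the result is appended as the checksum word."""
    return reduce(xor, words, 0)


def message_ok(words_with_checksum: list[int]) -> bool:
    """The XOR of every word, checksum included, must be all zeros."""
    return longitudinal_parity(words_with_checksum) == 0


if __name__ == "__main__":
    message = [0b01000010, 0b01101001, 0b00100001]   # three arbitrary 8-bit words
    sent = message + [longitudinal_parity(message)]
    print(message_ok(sent))                          # True

    one_flip = sent.copy()
    one_flip[0] ^= 0b00000100                        # flip a single bit
    print(message_ok(one_flip))                      # False: detected

    two_flips = sent.copy()
    two_flips[0] ^= 0b00000100                       # flip the same bit position...
    two_flips[1] ^= 0b00000100                       # ...in two different words
    print(message_ok(two_flips))                     # True: not detected

    swapped = [sent[1], sent[0]] + sent[2:]
    print(message_ok(swapped))                       # True: not detected
```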

With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error which affects two bits will not be detected if those bits lie at the same position in two distinct words. Swapping two or more words will also not be detected. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.

Modular sum

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant too detects any single-bit error. The modular sum is used in SAE J1708.[1]

Position-dependent checksums

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.

General considerations

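A message that is m bits long can be viewed as a corner of the m-dimensional hypercube. A checksum algorithm that yields an n-bit checksum maps each m-bit message to a corner of a larger hypercube, with dimension m + n; of its 2^(m+n) corners, only 2^m (those carrying the correct checksum) are valid received messages. A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of its m + n adjacent corners. An error which affects k bits moves the message to a corner which is k steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, so as to increase the likelihood that "typical" transmission errors will end up in an invalid corner.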
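To make the contrast between the modular sum and position-dependent checksums concrete, here is a sketch of both. It assumes 8-bit words for the modular sum, and uses Fletcher-16 (two running sums modulo 255 over bytes, following its common definition) as one example of the position-dependent checksums named above. The function names are hypothetical. Swapping two words fools the modular sum but changes the Fletcher value.

```python
def modular_sum_checksum(words: list[int], bits: int = 8) -> int:
    """Two's complement of the sum of the words, with overflow discarded."""
    mask = (1 << bits) - 1
    total = sum(words) & mask      # add as unsigned numbers, discard overflow bits
    return (-total) & mask         # two's complement of the total


def modular_sum_ok(words_with_checksum: list[int], bits: int = 8) -> bool:
    """Adding every word, checksum included, must give a word of all zeros."""
    mask = (1 << bits) - 1
    return (sum(words_with_checksum) & mask) == 0


def fletcher16(data: bytes) -> int:
    """Fletcher-16: sum2 accumulates sum1 after every byte, so earlier bytes count more."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


if __name__ == "__main__":
    original = [0x01, 0x02, 0x03]
    swapped = [0x02, 0x01, 0x03]                     # first two words exchanged
    checksum = modular_sum_checksum(original)

    print(modular_sum_ok(original + [checksum]))     # True
    print(modular_sum_ok(swapped + [checksum]))      # True: swap not detected
    print(fletcher16(bytes(original)) == fletcher16(bytes(swapped)))  # False: detected
```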

Data validation and verification

Validation and verification are two ways to check that the data entered into a computer is correct. Data entered incorrectly is of little use.

Validation

Validation is an automatic computer check to ensure that the data entered is sensible and reasonable. It does not check the accuracy of data.

For example, a secondary school student is likely to be aged between 11 and 16. The computer can be programmed to accept only numbers between 11 and 16. This is a range check.

However, this does not guarantee that the number typed in is correct. For example, a student's age might be 14, but if 11 is entered it will be valid but incorrect.

Types of validation

There are a number of validation types that can be used to check the data that is being entered.

Validation type | How it works | Example usage
Check digit | the last one or two digits in a code are used to check the other digits are correct | bar code readers in supermarkets use check digits
Format check | checks the data is in the right format | a National Insurance number is in the form LL 99 99 99 L, where L is any letter and 9 is any number
Length check | checks the data isn't too short or too long | a password which needs to be six letters long
Lookup table | looks up acceptable values in a table | there are only seven possible days of the week
Presence check | checks that data has been entered into a field | in most databases a key field cannot be left blank
Range check | checks that a value falls within the specified range | the number of hours worked must be less than 50 and more than 0
Spell check | looks up words in a dictionary | when word processing
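Most of the checks in the table can be written as small functions that return True or False. The sketch below is a minimal illustration and all function names are hypothetical. The format check follows the table's simplified "LL 99 99 99 L" description (real National Insurance numbers also restrict which letters may appear). The check digit uses the EAN-13 scheme found on many supermarket bar codes, which is our choice of example, since the table does not name a scheme.

```python
import re


def range_check(value: int, low: int, high: int) -> bool:
    """Accept only values inside an inclusive range."""
    return low <= value <= high


def presence_check(value: str) -> bool:
    """The field must not be blank."""
    return value.strip() != ""


def length_check(value: str, min_len: int, max_len: int) -> bool:
    """The value must be neither too short nor too long."""
    return min_len <= len(value) <= max_len


# Format "LL 99 99 99 L" as described in the table (L = letter, 9 = digit).
NI_FORMAT = re.compile(r"[A-Z]{2} [0-9]{2} [0-9]{2} [0-9]{2} [A-Z]")


def format_check(value: str) -> bool:
    return NI_FORMAT.fullmatch(value) is not None


DAYS_OF_WEEK = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}


def lookup_check(value: str) -> bool:
    """The value must appear in a table of acceptable values."""
    return value in DAYS_OF_WEEK


def ean13_check_digit(first12: str) -> int:
    """EAN-13: weight the first 12 digits 1, 3, 1, 3, ... and round up to a multiple of 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def check_digit_ok(code: str) -> bool:
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


if __name__ == "__main__":
    print(range_check(14, 11, 16))          # True
    print(range_check(11, 11, 16))          # True: valid, even if the real age is 14
    print(range_check(17, 11, 16))          # False
    print(format_check("QQ 12 34 56 C"))    # True
    print(check_digit_ok("4006381333931"))  # True
    print(check_digit_ok("4006381333932"))  # False
```

The second range check shows the limitation described earlier: validation accepts 11 as sensible, but it cannot know that the student is really 14.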

Verification

Verification is performed to ensure that the data entered exactly matches the original source.

There are two main methods of verification:

1. Double entry - entering the data twice and comparing the two copies. This effectively doubles the workload, and as most people are paid by the hour, it costs more too.

2. Proofreading data - this method involves someone checking the data entered against the original document. This is also time-consuming and costly.
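The comparison step of double entry can be sketched as follows. This sketch assumes each copy is a list of field values in the same order; the function name and the sample record are hypothetical. It catches the kind of mistake validation cannot: an age typed as 11 instead of 14 passes a range check, but the two copies disagree.

```python
def double_entry_mismatches(first: list[str], second: list[str]) -> list[int]:
    """Return the positions where the two typed copies of the data differ."""
    if len(first) != len(second):
        raise ValueError("both copies must contain the same number of fields")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


if __name__ == "__main__":
    first_copy = ["Jones", "14", "Year 9"]
    second_copy = ["Jones", "11", "Year 9"]
    print(double_entry_mismatches(first_copy, second_copy))  # [1]
```

Any position reported still has to be checked against the original source, because the computer cannot tell which of the two copies is the correct one.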

Data validation and verification - Test

1. What is an automatic computer check to make sure data entered is sensible and reasonable known as?

double entry

verification

validation

2. What validation type would make sure a postcode was entered in the correct format?

length check

format check

presence check

3. What validation type would you use to check that numbers fell within a certain range?

range check

presence check

4. What validation type checks that a field is not left blank?

format check

length check

presence check

5. What validation type uses the last one or two digits to check the other digits are correct?

length check

format check

check digit

6. What validation type checks that a minimum number of characters has been entered?

length check

format check

range check

7. Data is to be entered into a computer in the format YYMMDD. Which of the following is not a valid date?

310921

211113

21st June 2004

8. Which of the following statements is false?

validation can check that the data is sensible

validation can check that the data falls between certain allowable boundaries

validation can check that the data is correct

9. Which of the following is NOT a method of verification?

double entry - typing the data in twice and getting the computer to check the second version against the first

using presence, range and length checks to make sure that no mistakes happen

printing out what you have typed in and comparing it against the source data