Data-Integrity
Fundamentals of Data Representation: Error checking and correction

When you send data across the internet, or even from a USB drive to a computer, you are sending millions upon millions of ones and zeros. What would happen if one of them got corrupted? Think of this situation: you are buying a new game from an online retailer and put £40 into the payment box. You click on send and the number 40 is sent to your bank stored in a byte: 00101000. Now imagine that the second most significant bit got corrupted on its way to the bank, and the bank received the following: 01101000. You'd be paying £104 for that game! Error checking and correction stops things like this happening. There are many ways to detect and correct corrupted data; we are going to learn two.

Parity bits

Sometimes when you see ASCII code it only appears to have 7 bits. Surely a full byte should be used to represent a character? After all, that would allow more characters than the measly 128 that 7 bits can store. (Note that there is extended ASCII, which uses the 8th bit as well, but we don't need to cover that here.) The eighth bit is used as a parity bit to detect whether the data you have been sent is correct. It cannot tell you which digit is wrong, so it isn't corrective. There are two types of parity: odd and even. If you think back to primary school you will have learnt about odd and even numbers; hold that thought, we are going to need it.

Odd numbers: 1, 3, 5, 7, 9
Even numbers: 0, 2, 4, 6, 8 (note 0 is here too)

Example: How to detect errors using parity bits

When we send binary data we need to count the number of 1s that are present in it. For example, take the ASCII character 'B': 1000010. This has two occurrences of 1. We can then apply another bit to the front of it, chosen so that the total number of 1s is even (even parity) or odd (odd parity), and send it across the internet.
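The counting step described above can be sketched in Python. This is only a minimal illustration, not a standard API: the function names add_parity_bit and parity_ok are hypothetical, and bytes are handled as strings of '0' and '1' characters purely to keep the example readable.

```python
def add_parity_bit(bits7: str, even: bool = True) -> str:
    """Prefix a 7-bit string with a parity bit (hypothetical helper)."""
    ones = bits7.count("1")
    if even:
        parity = ones % 2          # make the total number of 1s even
    else:
        parity = (ones + 1) % 2    # make the total number of 1s odd
    return str(parity) + bits7


def parity_ok(byte8: str, even: bool = True) -> bool:
    """Return True if a received 8-bit string has the agreed parity."""
    ones = byte8.count("1")
    return ones % 2 == (0 if even else 1)


b = format(ord("B"), "07b")        # the 7-bit ASCII code for 'B'
sent = add_parity_bit(b, even=True)
print(b, "->", sent)
print(parity_ok(sent, even=True))  # the receiver repeats the count
```

A receiver that gets False from parity_ok knows only that something went wrong; it has to ask for the data again.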
If we are using even parity: 01000010
If we are using odd parity: 11000010

Now when the data gets to the other end, and we were using even parity:

01001010 - odd parity, there has been an error in transmission, ask for the data to be sent again
01000010 - even parity, the data has been sent correctly

Exercise: Test your knowledge of parity bits

Try and apply the correct parity bit to the following:

_1011010 (even parity) Answer: 01011010
_1011010 (odd parity) Answer: 11011010
_1111110 (even parity) Answer: 01111110
_0000000 (odd parity) Answer: 10000000

However, if we receive 10010110, knowing that the number had odd parity, where is the error? The best we can do is ask for the data to be resent and hope it's correct next time. Parity bits only provide error detection. We need something that detects and corrects.

Checksum

A checksum or hash sum is a small-size datum computed from a block of digital data for the purpose of detecting errors which may have been introduced during its transmission or storage. It is usually applied to an installation file after it is received from the download server. By themselves checksums are often used to verify data integrity, but should not be relied upon to also verify data authenticity.

The actual procedure which yields the checksum, given a data input, is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm will usually output a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
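As a rough sketch of the comparison just described, the following Python uses SHA-256 from the standard hashlib module. The choice of SHA-256, the helper names sha256_hex and is_intact, and the sample data are all assumptions made for illustration; the text does not name a particular algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """Compare a freshly computed checksum with a previously stored one."""
    return sha256_hex(data) == expected_hex


# Hypothetical file contents standing in for a downloaded installer.
original = b"installer contents (hypothetical example data)"
stored = sha256_hex(original)      # the value published with the download

corrupted = bytearray(original)
corrupted[0] ^= 0b00000001         # flip a single bit in the first byte

print(is_intact(original, stored))          # True
print(is_intact(bytes(corrupted), stored))  # False
print(stored[:16], sha256_hex(bytes(corrupted))[:16])  # digests differ widely
```

A match here only suggests the data was not accidentally altered; it says nothing about who produced the data.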
Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems designed to verify both integrity and authenticity, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.

Checksum algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows a transmission error occurred.

With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error which affects two bits will not be detected if those bits lie at the same position in two distinct words. Swapping two or more words will not be detected either. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.
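The longitudinal parity check can be sketched in a few lines of Python. The names xor_checksum and verify, the 8-bit word size, and the sample words are assumptions chosen for illustration only.

```python
from functools import reduce
from operator import xor
from typing import Sequence


def xor_checksum(words: Sequence[int]) -> int:
    """XOR all words together to form the parity word."""
    return reduce(xor, words, 0)


def verify(words_with_checksum: Sequence[int]) -> bool:
    """Accept a message when the XOR of every word, checksum included, is 0."""
    return reduce(xor, words_with_checksum, 0) == 0


message = [0x12, 0x34, 0x56]               # three 8-bit words (n = 8)
sent = message + [xor_checksum(message)]   # checksum appended as extra word

print(verify(sent))                        # True

one_bit = list(sent)
one_bit[1] ^= 0b00000100                   # single-bit error
print(verify(one_bit))                     # False: detected

two_bits = list(sent)
two_bits[0] ^= 0b00010000                  # same bit position flipped
two_bits[2] ^= 0b00010000                  # in two distinct words
print(verify(two_bits))                    # True: error goes undetected

swapped = [sent[1], sent[0], sent[2], sent[3]]
print(verify(swapped))                     # True: reordering goes undetected
```

The last two cases show the weaknesses named above: XOR is blind to paired flips in the same bit position and to the order of the words.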
Modular sum

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant too detects any single-bit error. The modular sum is used in SAE J1708.[1]

Position-dependent checksums

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.

General considerations

Picture every possible received bit string (message plus checksum) as a corner of a hypercube with one dimension per bit; the valid corners are those whose checksum matches the message. A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of the adjacent corners. An error which affects k bits moves the message to a corner which is k steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, so as to increase the likelihood that "typical" transmission errors will end up in an invalid corner.

Data validation and verification

Validation and verification are two ways to check that the data entered into a computer is correct. Data entered incorrectly is of little use.

Validation

Validation is an automatic computer check to ensure that the data entered is sensible and reasonable. It does not check the accuracy of data. For example, a secondary school student is likely to be aged between 11 and 16. The computer can be programmed only to accept numbers between 11 and 16. This is a range check.
However, this does not guarantee that the number typed in is correct. For example, a student's age might be 14, but if 11 is entered it will be valid but incorrect.

Types of validation

There are a number of validation types that can be used to check the data that is being entered.

Validation type | How it works | Example usage
Check digit | the last one or two digits in a code are used to check the other digits are correct | bar code readers in supermarkets use check digits
Format check | checks the data is in the right format | a National Insurance number is in the form LL 99 99 99 L where L is any letter and 9 is any number
Length check | checks the data isn't too short or too long | a password which needs to be six letters long
Lookup table | looks up acceptable values in a table | there are only seven possible days of the week
Presence check | checks that data has been entered into a field | in most databases a key field cannot be left blank
Range check | checks that a value falls within the specified range | number of hours worked must be less than 50 and more than 0
Spell check | looks up words in a dictionary | when word processing

Verification

Verification is performed to ensure that the data entered exactly matches the original source. There are two main methods of verification:

1. Double entry - entering the data twice and comparing the two copies. This effectively doubles the workload, and as most people are paid by the hour, it costs more too.
2. Proofreading data - this method involves someone checking the data entered against the original document.
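Several of the validation types listed earlier, together with double-entry verification, could be sketched in Python as follows. All function names are hypothetical, the format check follows the simplified "LL 99 99 99 L" pattern given in the text rather than the real National Insurance rules, and the range check is assumed to include both end values.

```python
import re

DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"}


def presence_check(value: str) -> bool:
    """Presence check: the field must not be left blank."""
    return value.strip() != ""


def length_check(value: str, min_len: int, max_len: int) -> bool:
    """Length check: the value is neither too short nor too long."""
    return min_len <= len(value) <= max_len


def range_check(value: int, low: int, high: int) -> bool:
    """Range check: the value lies between low and high inclusive."""
    return low <= value <= high


def format_check(value: str) -> bool:
    """Format check for the simplified pattern LL 99 99 99 L."""
    pattern = r"[A-Z]{2} [0-9]{2} [0-9]{2} [0-9]{2} [A-Z]"
    return re.fullmatch(pattern, value) is not None


def lookup_check(value: str) -> bool:
    """Lookup check: the value must appear in the table of days."""
    return value in DAYS


def double_entry_matches(first: str, second: str) -> bool:
    """Verification by double entry: both typed copies must be identical."""
    return first == second


print(range_check(11, 11, 16))             # True, even if the real age is 14
print(range_check(17, 11, 16))             # False
print(format_check("AB 12 34 56 C"))       # True
print(length_check("secret", 6, 6))        # True
print(lookup_check("Funday"))              # False
print(double_entry_matches("14", "41"))    # False: the two entries differ
```

The first call shows the limit of validation: 11 passes the range check even when the student is really 14, which is why verification against the original source is a separate step.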