Checksums, Your Best Friends, for Security

Published in Linux for You, August 2008 issue.

S. Parthasarathy

Imagine that you write an electronic cheque for Rs. 1000, payable to your friend, and send it to your friend electronically. How do you, your friend, or the bank ensure that the cheque has not been tampered with or altered en route? How do you ensure the authenticity of the cheque, particularly the amount payable and the person to whom it is payable?

The same problem appears in various other forms in your day-to-day experience. How can you be sure that the file you received as an attachment has not been altered on its way? How can you be sure that the ISO image you downloaded was not created by an impostor? Do you know why your passwords are so safe in a Linux system?

Whenever you receive any software (and that includes documentation and scripts) from any source, it is important to ensure that there are no hidden risks and traps planted by intruders. Likewise, when you transfer or copy files over a network, you want to ensure that the file has not been modified by transmission errors.

When downloading software from online repositories, or when you receive prerecorded software (e.g. on a CD-ROM) from any source, it is important to consider the possibility that the site may have been compromised. One of the threats users face is that intruders could include malicious code in the software packages distributed by those sites. This code could take the form of Trojan horse programs or backdoors. In large collections of files, intruders could slip in their own files containing malicious programs, or they could modify files that contain important material, particularly intrusion detection software or procedures. In the simplest of cases, they could replace your material with material that would embarrass you or damage your reputation.
You want a simple and effective solution to protect you in all such scenarios. The answer to all these questions lies in a simple idea called the "checksum". A checksum is something like your fingerprint. Technically speaking, a "checksum" (also known as a hash digest) is a form of redundancy check. It is a simple way to protect the integrity of data by detecting errors (modifications) in data sent through space (telecommunications) or time (storage). A redundancy check, as the name implies, adds redundant information to the data so that any modification or alteration of the data can be detected (in theory) just by looking at the checksum.

Note that this is only a way of checking whether your data has been modified. It does not automatically lead you to the modifications actually made. Nor does it prevent data from being modified by unscrupulous agents. Nor will it tell you who modified the data, or when.

Now, this is how it works. Let us say you have a file X which you want to protect with a checksum. You choose some algorithm and use it to generate a checksum x derived from X. You append x to X whenever you send the file X. The receiver applies the same algorithm to the copy of X he received and gets his own checksum, say y. If X has not been altered, x should be the same as y. If they differ, the receiver can suspect some mischief somewhere between the time X was created and the time he received it.

Let us take a simple, childish example. Let us assume that your X is the file containing this article. We count all the characters (including punctuation marks) in this file, say x1, and the number of spaces, say x2. We then concatenate x1 and x2 to get x1x2. Now, when we send X, we send x1x2 along with it. It is then a small operation for the receiver to check that the file has not been altered, by recomputing the number of characters (say y1) and the number of blank spaces (say y2) in his copy of X.
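
The toy scheme just described can be sketched in a few lines of Python. This is only an illustration, not a real checksum algorithm: the function names `toy_checksum` and `verify` are made up for this sketch, and it assumes that x1 counts every character in the text, spaces included.

```python
def toy_checksum(text: str) -> str:
    """Return the toy checksum x1x2 described in the article."""
    x1 = len(text)        # assumption: every character counts, spaces included
    x2 = text.count(" ")  # blank spaces only
    return f"{x1}{x2}"    # plain concatenation of the two counts


def verify(received_text: str, received_checksum: str) -> bool:
    """Recompute y1y2 on the receiver's copy and compare it with x1x2."""
    return toy_checksum(received_text) == received_checksum


if __name__ == "__main__":
    original = "Pay Rs. 1000 to my friend"
    checksum = toy_checksum(original)  # sent along with the message

    print(verify(original, checksum))                    # unaltered copy
    print(verify("Pay Rs. 100 to my friend", checksum))  # one character dropped
```

The sender transmits the text together with the value returned by `toy_checksum`; the receiver calls `verify` on what arrived. Dropping or adding a character changes the count and is detected.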
The receiver can then compare y1y2 with x1x2 and find out whether the file X was altered before reaching him.

Of course, this approach has a lot of glaring loopholes. For instance, if a miscreant replaces one character by another (one letter by another letter, say), his mischief will never be caught. Or if he changes X, recomputes the checksum, and sends the recomputed checksum along, the mischief will go unnoticed. Or if he adds a blank space in one place and removes a blank space from another (compensating errors), he will never be caught.

Do not despair. There are powerful schemes and extensions to checksums that are immune to such mischief.

Checksums come in handy in many other ways too. For instance, suppose you have two JPEG files of the same image. Visual inspection will not reveal differences at the pixel level (a picture may consist of many thousands of pixels). If you have a thousand similar-looking images, it is not easy to tell which ones are duplicates of each other, and the task is too cumbersome for any human. A simple way to solve this problem is to compute the checksum of each JPEG file. By comparing the checksums, you can recognise duplicates easily: the slightest difference between two images will show up as a different checksum for the JPEG file.

Your own Linux machine uses a clever combination of checksum (md5/sha) and encryption (one-way encryption plus the Data Encryption Standard) to store your Linux password securely. When dealing with passwords, there are three things we are protecting: the storage of the password (can someone sneak into your machine and steal the password?), the transmission of the password (for example, when logging in over the web), and the replay of the password (for example, verifying the password entered by a user during login). Hash functions come in handy in all these cases.
You can find out how this is done by reading any good book on Linux/Unix internals. Because passwords are stored as hash digests, the passwords cannot be decoded even if your password file is compromised. The only disadvantage is that users cannot retrieve lost passwords; they must reset them.

Some important properties of good checksums are:

1. Two different sets of data should give different checksums; with a good algorithm, finding two data sets with the same checksum is practically impossible. (Two different persons will have different fingerprints.)
2. It is impossible (or extremely difficult) to reconstruct the original data set from its checksum. (Can you get a person's photograph just from his fingerprint? If you could, you could help our police enormously.)
3. A given data set will always lead to the same checksum. (The same person cannot have two fingerprints for the same finger.)
4. Just like taking a person's fingerprint, computing the checksum of any arbitrary data set should be feasible and relatively easy (read: efficient).

Two of the most popular checksum algorithms (also known as hash digest algorithms or hash functions) are sha and md5. Of course, other hash functions are also available. Each has its own strengths and weaknesses. For a good briefing on hash functions, visit "the hash function lounge" at:
http://paginas.terra.com.br/informatica/paulobarreto/hflounge.html
Another interesting site about hash digests is:
http://www.hashemall.com/

Hashing is closely related to cryptography, and is the basis of a technique called the "digital signature". A digital signature is itself an encrypted form of a hash digest.

sha

According to Wikipedia: "SHA stands for Secure Hash Algorithm. Hash algorithms compute a fixed-length digital representation (known as a message digest) of an input data sequence (the message) of any length. The term SHA collectively denotes five cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard.
"

The original specification of the algorithm was published in 1993 as the Secure Hash Standard, FIPS PUB 180, by the US government standards agency NIST (National Institute of Standards and Technology). This version is now often referred to as "SHA-0". SHA-0 was withdrawn by the NSA shortly after publication and was superseded by a revised version, published in 1995 in FIPS PUB 180-1 and commonly referred to as "SHA-1". The youngest in this series is SHA-512, born in 2000. The newer hash functions SHA-224, SHA-256, SHA-384, and SHA-512 are collectively called the SHA-2 family.

The Secure Hash Signature Standard (SHS) (FIPS PUB 180-2) specifies four secure hash algorithms (SHA-1, SHA-256, SHA-384, and SHA-512) for computing a condensed representation of electronic data (a message). When a message of any length less than 2^64 bits (for SHA-1 and SHA-256) or less than 2^128 bits (for SHA-384 and SHA-512) is input to an algorithm, the result is an output called a message digest. Message digests range in length from 160 to 512 bits, depending on the algorithm.

Notice that in all of the above, for a given message (of any arbitrary length) and a given SHA function, the length of the digest is fixed. This makes it easy to strip the hash digest out of the received "padded message" (message plus digest).

md5

MD5 (Message Digest 5) was designed in 1991 by Ronald Rivest (the "R" in the famous technique called RSA cryptography).
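
The fixed digest length mentioned above is easy to check for yourself. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `digest_bits` and `digest_lengths` and the sample messages are made up for this illustration, and it assumes your Python build has md5 enabled (some restricted builds disable it).

```python
import hashlib

# Algorithms named in the article, spelled the way hashlib expects them.
ALGORITHMS = ["md5", "sha1", "sha256", "sha384", "sha512"]


def digest_bits(algorithm: str, message: bytes) -> int:
    """Hash the message and return the digest length in bits."""
    digest = hashlib.new(algorithm, message).digest()
    return len(digest) * 8


def digest_lengths(message: bytes) -> dict:
    """Return the digest length in bits for every algorithm in ALGORITHMS."""
    return {name: digest_bits(name, message) for name in ALGORITHMS}


if __name__ == "__main__":
    # Messages of very different sizes give digests of the same length.
    for message in (b"", b"Rs. 1000", b"x" * 100_000):
        print(len(message), "bytes ->", digest_lengths(message))
```

Whatever the size of the input, each algorithm reports the same digest length every time; only the digest value changes with the message.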
