Substitution Ciphers

Substitution ciphers are those in which each letter or group of letters in the plaintext message is replaced by another letter or group of letters to produce the ciphertext. One of the oldest known substitution ciphers is the Caesar cipher, in which each letter is replaced by the letter three positions later in the alphabet, wrapping around at the end. For example, A is replaced by D, B is replaced by E, and Z is replaced by C. In this simple substitution cipher, the key is 3. Even if the offset is allowed to vary, today’s English alphabet of 26 capital letters allows only 25 useful keys, so trying them all would not take long.

There are many ways to improve the basic substitution cipher. One is character mapping, where each character of the alphabet is mapped to a different character of the alphabet, giving a key space of 26!, or about 4 x 10^26 possible keys. This is called monoalphabetic substitution. Trying each of these keys, one by one, at the rate of one per nanosecond would take more than 10^10 years to cover the entire key space.

While the monoalphabetic substitution cipher therefore seems to be an excellent choice, a cryptanalyst with even a small amount of ciphertext can easily break it using the statistical properties of natural languages. In English, for example, the six most commonly used letters are often listed as e, t, o, a, n, and i, in that order. The cryptanalyst will also look for common digrams and trigrams, which are two-letter and three-letter combinations. Common digrams are th, in, er, re, and an, while common trigrams are the, ing, and, and ion. By comparing the frequencies of letters, digrams, and trigrams in the ciphertext with these known patterns, the cryptanalyst can propose tentative character mappings and refine them from there. Knowledge of the likely subject matter helps further, because it suggests words that probably appear in the message.
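
The ideas above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the alphabet is limited to the 26 capital letters, any other character is passed through unchanged, and the function names (caesar_shift, brute_force, letter_frequencies, ngram_counts, years_to_search_monoalphabetic) are hypothetical names chosen for this example rather than part of any library.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_uppercase  # assumption: capital letters only


def caesar_shift(text: str, key: int) -> str:
    """Replace each capital letter by the one `key` places later, wrapping past Z."""
    k = key % 26
    shifted = ALPHABET[k:] + ALPHABET[:k]
    # Characters outside ALPHABET (spaces, punctuation) are left unchanged.
    return text.translate(str.maketrans(ALPHABET, shifted))


def brute_force(ciphertext: str) -> dict[int, str]:
    """Try all 25 nonzero shifts and return the candidate plaintext for each key."""
    return {key: caesar_shift(ciphertext, -key) for key in range(1, 26)}


def letter_frequencies(text: str) -> list[tuple[str, int]]:
    """Count capital letters in the text, most frequent first."""
    return Counter(c for c in text if c in ALPHABET).most_common()


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-letter combinations inside each word (n=2 digrams, n=3 trigrams)."""
    counts = Counter()
    for word in text.split():
        letters = "".join(c for c in word if c in ALPHABET)
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


def years_to_search_monoalphabetic(keys_per_second: float = 1e9) -> float:
    """Estimate the years needed to try all 26! mappings at the given rate."""
    seconds = math.factorial(26) / keys_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    ciphertext = caesar_shift("ATTACK AT DAWN", 3)
    print(ciphertext)
    print(brute_force(ciphertext)[3])
    print(letter_frequencies(ciphertext)[:3])
    print(ngram_counts("THE THEN", 2))
    print(years_to_search_monoalphabetic())
```

With a key of 3, the most frequent ciphertext letter corresponds to the most frequent plaintext letter shifted by three places, which is exactly the regularity a frequency count exposes. The same counting applies to a monoalphabetic mapping, where exhaustive search is impractical but the letter, digram, and trigram statistics remain visible in the ciphertext.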