
Math 550 Coding and Cryptography Workbook

J. Swarts 0121709

Contents

1 Introduction and Basic Ideas
    1.1 Introduction
    1.2 Terminology
    1.3 Basic Assumptions About the Channel

2 Detecting and Correcting Errors

3 Linear Codes
    3.1 Introduction
    3.2 The Generator and Parity Check Matrices
    3.3 Cosets
        3.3.1 Maximum Likelihood Decoding (MLD) of Linear Codes

4 Bounds for Codes

5 Hamming Codes
    5.1 Introduction
    5.2 Extended Codes

6 Golay Codes
    6.1 The Extended Golay Code : C24
    6.2 The Golay Code : C23

7 Reed-Muller Codes
    7.1 The Reed-Muller Codes RM(1, m)

8 Decimal Codes
    8.1 The ISBN Code
    8.2 A Single Error Correcting Decimal Code
    8.3 A Double Error Correcting Decimal Code
    8.4 The Universal Product Code (UPC)
    8.5 US Money Order
    8.6 US Postal Code

9 Hadamard Codes
    9.1 Background
    9.2 Definition of the Codes
    9.3 How good are the Hadamard Codes?

10 Introduction to Cryptography
    10.1 Basic Definitions
    10.2 Affine Cipher
        10.2.1 Cryptanalysis of the Affine Cipher
    10.3 Some Other Ciphers
        10.3.1 The Vigenère Cipher
        10.3.2 Cryptanalysis of the Vigenère Cipher : The Kasiski Examination
        10.3.3 The Vernam Cipher

11 Public Key Cryptography
    11.1 The Rivest-Shamir-Adleman (RSA) Cryptosystem
        11.1.1 Security of the RSA System
    11.2 Probabilistic Primality Testing

12 The Rabin Cryptosystem
    12.1 Security of the Rabin Cryptosystem

13 Factorisation Algorithms
    13.1 Pollard's p − 1 Factoring Method (ca. 1974)

14 The ElGamal Cryptosystem
    14.1 Discrete Logarithms (a.k.a. indices)
    14.2 The system
    14.3 Attacking the ElGamal System

15 Elliptic Curve Cryptography
    15.1 The ElGamal System in Arbitrary Groups
    15.2 Elliptic Curves
    15.3 The Menezes-Vanstone Cryptosystem

16 The Merkle-Hellman “knapsack” System

17 The McEliece Cryptosystem

A Assignments

Part I

Coding

Chapter 1

Introduction and Basic Ideas

1.1 Introduction

A model of a communication system is shown in Figure 1.1.

Figure 1.1: Model of a Communications System

The function of each block is as follows :

✗ Source : This produces the output that we would like to transmit over the channel. The output can be either continuous, for example voice, or discrete as in voltage measurements taken at set times.

✗ Source Encoder : The source encoder transforms the source data into binary digits (bits). Further, the aim of the source encoder is to minimise the number of bits required to represent the source data. We have one of two options here in that we could require that the source data be perfectly reconstructible (as in computer data) or we may allow a certain level of errors in the reconstruction (as in images or speech).

✗ Encryption : Is intended to make the data unintelligible to all but the intended receiver. That is, it is intended to preserve the secrecy of the messages in the presence of unwelcome monitoring of the channel. Cryptography is the science of maintaining secrecy of data from both passive intrusion (eavesdropping) and active intrusion (introduction or alteration of messages). This introduces the following distinctions :


✎ Authentication : is the corroboration that the origin of a message is as claimed.

✎ Data Integrity : is the property that the data has not been changed in an unauthorised way.

✗ Channel Encoder : Tries to maximise the rate at which information can be reliably transmitted on the channel in the presence of disruptions (noise) that can introduce errors. Error correction coding, located in the channel encoder, adds redundancy in a controlled manner to messages to allow transmission errors to be detected and/or corrected.

✗ Modulator : Transforms the data into a format suitable for transmission over the channel.

✗ Channel : The medium that we use to convey the data. This could be in the form of a wireless link (as in cell phones), a fiber optic link or even a storage medium such as magnetic disks or CD’s.

✗ Demodulator : Converts the received data back into its original format (usually bits).

✗ Channel Decoder : Uses the redundancy introduced by the channel encoder to try and detect/correct any possible errors that the channel introduced.

✗ Decryption : Converts the data back into an intelligible form. At this point one would also perform authentication and data integrity checks.

✗ Source Decoder : Adds back the original (naturally occurring) redundancy that was removed by the source encoder.

✗ Receiver : The intended destination of the data produced by the source.

1.2 Terminology

Definition 1.2.1 (Alphabet). An alphabet is a finite, nonempty set of symbols. Usually the binary alphabet K = {0, 1} is used and the elements of K are called binary digits or bits.

Definition 1.2.2 (Word). A word is a finite sequence of elements from an alphabet.

Definition 1.2.3 (Code). A code is a set C of words (or codewords).

Example 1.2.4. C1 = {00, 01, 10, 11} and C2 = {000, 001, 01, 1} are both codes over the alphabet K.

Definition 1.2.5 (Block code). A block code has all codewords of the same length; this number is called the length of the code.

In our example C1 is a (binary) block code of length 2, while C2 is not a block code.

Definition 1.2.6 (Prefix code). A prefix code is a code such that there do not exist distinct words wi and wj such that wi is a prefix (initial segment) of wj.

C2 is an example of a prefix code. Prefix codes can be decoded easily since no codeword is a prefix of any other. We simply scan the word from left to right until we have a codeword; we then continue scanning from this point on until we reach the next codeword. This is continued until we reach the end of the received word.

Example 1.2.7. Suppose the words 000, 001, 01 and 1 correspond to N, S, E and W respectively. If 001011100000101011 is received, the message is S E W W N S E E W.

Since C2 is a prefix code, there is only one way to decode this message. Prefix codes can be used in a situation where not all data are equally likely — shorter words are used to encode data that occur more frequently and longer words are used for infrequent data. A good example of this is Morse code. In English, the letter E occurs quite frequently and so in Morse code it is represented by a single dot (·). On the other hand the letter Z rarely occurs and it is assigned the sequence − − · · of dots and dashes.
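The left-to-right scan described above is easy to implement. A minimal sketch in Python, using the N/S/E/W codebook of Example 1.2.7 (the function and variable names here are ours, for illustration only):

```python
# Greedy left-to-right decoding of a prefix code. This works because no
# codeword is a prefix of any other, so the first match is the right one.
CODEBOOK = {"000": "N", "001": "S", "01": "E", "1": "W"}

def decode_prefix(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODEBOOK:          # a complete codeword: emit and reset
            out.append(CODEBOOK[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return " ".join(out)

print(decode_prefix("001011100000101011"))  # → S E W W N S E E W
```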

1.3 Basic Assumptions About the Channel

We make the following three assumptions about the channel

1. If n symbols are transmitted, then n symbols are received — though maybe not the same ones. That is, nothing is added or lost.

2. There is no difficulty identifying the beginning of the first word transmitted.

3. Noise is scattered randomly rather than being in clumps (called bursts). That is, the probability of error is fixed and the channel stays constant over time. Most channels do not satisfy this property with one notable exception — the deep space channel (used to transmit images and other data between spacecraft and earth). In

almost all other channels the probability of error varies over time. For example on a CD, where certain areas are more damaged than others, and in cell phones where the radio channel varies continually as the users move about their terrain.

A binary channel is called symmetric if 0 and 1 are transmitted with equal accuracy. The reliability of such a channel is the probability p, 0 ≤ p ≤ 1, that the digit sent is the digit received. The error probability q = 1 − p is the probability that the digit sent is received in error.

Figure 1.2: The Binary Symmetric Channel
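Since, by assumption 3, errors occur independently with fixed probability q, the number of errors in a word of length n is binomially distributed; Example 1.3.1 below tabulates the case n = 4. A quick sketch (the function name is ours):

```python
from math import comb

def p_exactly_k_errors(n: int, k: int, q: float) -> float:
    """Probability of exactly k errors in a word of length n on a BSC
    with error probability q (reliability p = 1 - q)."""
    p = 1 - q
    return comb(n, k) * p ** (n - k) * q ** k

probs = [p_exactly_k_errors(4, k, 0.1) for k in range(5)]
print(round(probs[0], 4))     # → 0.6561  (= 0.9^4, the no-error case)
print(round(sum(probs), 10))  # → 1.0    (the five cases are exhaustive)
```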

Example 1.3.1. If we are transmitting codewords of length 4 over a binary symmetric channel (BSC) with reliability p and error probability q = 1 − p, then

Probability of no errors : \binom{4}{0} p^4 q^0,
Probability of 1 error : \binom{4}{1} p^3 q^1,
Probability of 2 errors : \binom{4}{2} p^2 q^2,
Probability of 3 errors : \binom{4}{3} p^1 q^3,
Probability of 4 errors : \binom{4}{4} p^0 q^4.

Chapter 2

Detecting and Correcting Errors

Errors can be detected when the received word is not a codeword. If a codeword is received, then it could be that no errors, or several errors (changing one codeword into another), have occurred.

Consider C1 = {00, 01, 10, 11}. Every received word is a codeword, so no errors can be detected (and none can be corrected). On the other hand consider

C2 = {000000, 010101, 101010, 111111},

obtained from C1 by repeating each word in C1 three times — this is known as a repetition code. Here we can detect 2 or fewer errors as we need 3 or more errors to change one codeword into another.

Example 2.1.2. Let C2 = {000000, 010101, 101010, 111111} be the code defined above. Assume that 010101 is sent, but that 110111 is received. By examining C2 we conclude that 1 or 2 errors could have occurred (more are also possible). Keep in mind that we as the receiver only know the received word.

We can correct 1 error by using a majority rule : a codeword consists of three copies of a pair of bits, so we decode the received word to the codeword built by repeating the pair that occurs in the majority of the three copies. So if 101010 is sent and 111010 is received, we decode this to 101010. The idea is that if 1 error occurs, then there exists a unique codeword that differs in 1 place from the received word; we decode to this word. If 2 errors occur, the possibility exists that we may decode to the wrong codeword : if 010101 is sent and 110111 is received we will incorrectly decode this to 111111. Therefore this code cannot, in general, correct 2 errors (specific cases may be possible, but not all possible patterns of two errors are correctable).

Consider C3 = {000, 011, 101, 110}, formed from C1 by adding a third digit to each codeword such that the total number of ones in the resulting word is even. This is called

a parity check code and the digit that was added is called the parity check digit. This code can detect 1 error as no two codewords differ in exactly one place, but it cannot detect 2 errors since two errors may change one codeword into another codeword.

Example 2.1.3. Let C3 = {000, 011, 101, 110} be the parity check code from above. If 000 is sent and 001 is received we immediately detect an error as the received word has an odd number of ones. Even if we knew that only one error had occurred (in general we won't know how many errors have occurred), this would not help us in correctly decoding the word above, as 011, 101 and 000 all could have been possible sent words. If 011 is sent and 110 is received (i.e. two errors occurred) we will not detect an error since the received word has the required (even) number of ones.

If u and v are binary words of the same length, we define u + v to be the binary word obtained by component-wise addition modulo 2. That is 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1 and 1 + 1 = 0.

Example 2.1.4. 01101 + 11001 = 10100
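The parity check digit described above is simple to compute. A sketch (function name ours) that reproduces C3 from C1:

```python
def add_parity(word: str) -> str:
    """Append a parity check digit so the total number of 1s is even."""
    return word + str(word.count("1") % 2)

print([add_parity(w) for w in ["00", "01", "10", "11"]])
# → ['000', '011', '101', '110']  (the parity check code C3)
```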

Definition 2.1.5 (Hamming weight). Let v be a binary word of length n. The Hamming weight of v, denoted by wt(v), is the number of times the digit 1 occurs in v.

Definition 2.1.6 (Hamming distance). Let u and v be binary words of length n. The Hamming distance between u and v, denoted by d(u, v), is the number of places where u and v differ.

Note that d(u, v) = wt(u + v) — in the places where u and v differ a 1 will appear in u + v and in the places where u and v are the same a 0 will appear in u + v — from the way that + was defined.

Example 2.1.7. d(01011, 00111) = 2 = wt(01011 + 00111) = wt(01100) and d(10110, 10110) = 0 = wt(10110 + 10110) = wt(00000).

Let C be a binary code of length n. If v ∈ C is sent and w is received, then the error pattern is u = v + w. That is, the error pattern indicates by a 1 those positions where an error occurred. Also, since u = v + w, w = v + u by the addition defined earlier (addition and subtraction are the same under this rule). Therefore, the received word equals the transmitted word plus the error pattern.
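These operations are one-liners on binary words given as strings; the sketch below (helper names ours) checks the identity d(u, v) = wt(u + v) on Example 2.1.7:

```python
def plus(u: str, v: str) -> str:
    """Component-wise addition mod 2 of two binary words."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(u, v))

def wt(v: str) -> int:
    """Hamming weight: the number of 1s in v."""
    return v.count("1")

def d(u: str, v: str) -> int:
    """Hamming distance: the number of places where u and v differ."""
    return sum(a != b for a, b in zip(u, v))

print(plus("01101", "11001"))                           # → 10100 (Example 2.1.4)
print(d("01011", "00111"), wt(plus("01011", "00111")))  # → 2 2
```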

Definition 2.1.8 (Detecting an error pattern). A code C detects the error pattern u if u + v ∉ C for all v ∈ C.

Example 2.1.9. Let C = {001, 101, 110}. Then C detects the error pattern u = 010 because none of the codewords added to the error pattern is again a codeword : 001 + 010 = 011 ∉ C, 101 + 010 = 111 ∉ C and 110 + 010 = 100 ∉ C. On the other hand C does not detect u = 100 because the codeword 001 added to u is again a codeword : 001 + 100 = 101 ∈ C.

Definition 2.1.10 (Minimum distance of a code). For a code C with |C| ≥ 2, the minimum distance of C, dmin(C), is the smallest distance between two distinct codewords. That is

dmin(C) = min { d(u, v) | u, v ∈ C, u ≠ v }.

Example 2.1.11. Let C = {0000, 1010, 0111}. Then

d(0000, 1010) = wt(0000 + 1010) = wt(1010) = 2,
d(0000, 0111) = wt(0000 + 0111) = wt(0111) = 3,
d(1010, 0111) = wt(1010 + 0111) = wt(1101) = 3.

Therefore the minimum distance of C is 2.
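For a small code the minimum distance can be found by brute force over all pairs of distinct codewords; a sketch (function name ours):

```python
from itertools import combinations

def d_min(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(code, 2))

print(d_min(["0000", "1010", "0111"]))  # → 2, as in Example 2.1.11
```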

Theorem 2.1.12. A code C can detect all nonzero error patterns of weight at most d − 1 ⇐⇒ dmin(C) ≥ d.

Proof. ⇐ : Suppose C has dmin(C) ≥ d. Let u be a nonzero error pattern with wt(u) ≤ d − 1 and v ∈ C. Suppose v is sent and w = v + u is received. Then d(v, w) = wt(v + w) = wt(v + v + u) = wt(u) ≤ d − 1. Since C has minimum distance at least d, w ∉ C as the codeword closest to v (in terms of Hamming distance) is at least a distance d from v. Therefore C detects the error pattern u.

⇒ : Suppose C can detect all nonzero error patterns of weight at most d − 1. Let v, w ∈ C and v ≠ w. Then u = v + w is not a detectable error pattern as v + u = v + v + w = w ∈ C. Thus wt(u) = 0 or wt(u) ≥ d. Since v ≠ w, wt(u) ≠ 0. Therefore wt(u) ≥ d, so that d(v, w) = wt(v + w) = wt(u) ≥ d. This shows that dmin(C) ≥ d since v and w were two arbitrary, distinct codewords.

Definition 2.1.13 (t-error detecting code). A code C is a t-error detecting code if

1. C detects all nonzero error patterns of weight at most t.

2. C fails to detect some error pattern of weight t + 1.

So, by Theorem 2.1.12 a code C is t-error detecting if and only if C has minimum distance t + 1.

Definition 2.1.14 (Correcting an error pattern). A code C is said to correct the error pattern u if, for all v ∈ C, v + u (the received word) is closer (in the sense of Hamming distance) to v than to any other w ∈ C.

Example 2.1.15. Let C = {0000, 1010, 0111}. Then C corrects the error pattern u = 0100 :

• If 0000 is sent, then (with u as error pattern) 0000 + u = 0100 is received. This received word is closer to 0000 than to any other codeword.

• If 1010 is sent, then 1010 + u = 1110 is received. This received word is closer to 1010 than to any other codeword.

• If 0111 is sent, then 0111 + u = 0011 is received. This received word is closer to 0111 than to any other codeword.

On the other hand C does not correct the error pattern 1000 : If 0000 is sent, then 0000 + 1000 = 1000 is received and d(1000, 0000) = 1 = d(1000, 1010). Thus there is more than one codeword that is closest to the received word.
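Definition 2.1.14 can be checked exhaustively for a small code; the sketch below (names ours) reproduces the two cases of Example 2.1.15:

```python
def corrects(code, u: str) -> bool:
    """True iff for every v in code, v + u is strictly closer to v than
    to any other codeword (Definition 2.1.14)."""
    plus = lambda a, b: "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))
    d = lambda a, b: sum(x != y for x, y in zip(a, b))
    for v in code:
        w = plus(v, u)                       # the received word
        if any(d(w, c) <= d(w, v) for c in code if c != v):
            return False                     # some other codeword is as close
    return True

C = ["0000", "1010", "0111"]
print(corrects(C, "0100"), corrects(C, "1000"))  # → True False
```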

Theorem 2.1.16. A code C will correct all error patterns of weight at most t ⇐⇒ dmin(C) ≥ 2t + 1.

Proof. ⇒ : Suppose that C corrects all error patterns of weight at most t, but that dmin(C) ≤ 2t. Let v, w ∈ C such that d(v, w) = dmin(C) = d ≤ 2t. Let u be any error pattern obtained from v + w by replacing 1's by 0's until only ⌈d/2⌉ 1's remain. If v is sent and the error pattern u occurs, then

d(v, v + u) = wt(v + v + u) = wt(u) = ⌈d/2⌉,
d(w, v + u) = wt(w + v + u) = dmin(C) − ⌈d/2⌉ ≤ ⌈d/2⌉.

The second to last step, wt(w + v + u) = dmin(C) − ⌈d/2⌉, follows from the fact that wt(w + v) = dmin(C) and that u has its 1's in exactly the same locations where w + v has some of its 1's. So by computing w + v + u, u will cancel exactly ⌈d/2⌉ of (w + v)'s 1's. Therefore the received word, v + u, is at least as close to w as to v. This implies that C does not correct u, but u is an error pattern of weight ⌈d/2⌉ and ⌈d/2⌉ ≤ t, so C was supposed to be able to correct u, a contradiction.

⇐ : Suppose C has dmin(C) ≥ 2t + 1. Let u be a nonzero error pattern of weight at most t. If v ∈ C is sent and v + u is received, then d(v, v + u) = wt(v + v + u) = wt(u) ≤ t. Now for any other w ∈ C, w ≠ v, d(w, v + u) = wt(w + v + u) ≥ 2t + 1 − wt(u) ≥ 2t + 1 − t = t + 1. As above, wt(w + v + u) is bounded by realizing that wt(w + v) = d(w, v) ≥ 2t + 1 since w and v are distinct codewords and dmin(C) ≥ 2t + 1. Further, u will be able to cancel at most wt(u) ≤ t of (w + v)'s 1's. Therefore the received word v + u is closer to v than to any other codeword w, w ≠ v. Thus v + u can be correctly decoded, showing that C corrects all error patterns of weight at most t.

Definition 2.1.17 (t-error correcting code). A code C is a t-error correcting code if

1. C corrects all error patterns of weight at most t.

2. C fails to correct at least one error pattern of weight t + 1.

Chapter 3

Linear Codes

3.1 Introduction

Definition 3.1.1 (Linear code). A code C is called a linear code if u + v ∈ C whenever u, v ∈ C.

Example 3.1.2. Let C1 = {000, 001, 101}. C1 is not a linear code since 001 + 101 = 100 ∉ C1. Let C2 = {0000, 1001, 0110, 1111}; then C2 is linear : If v ∈ C2, then v + v = 0000 ∈ C2 and also 0000 + u = u for all u ∈ C2. Therefore we only have to consider the addition of two nonzero, distinct words :

1001 + 0110 = 1111 ∈ C2,
1001 + 1111 = 0110 ∈ C2,
0110 + 1111 = 1001 ∈ C2.

Theorem 3.1.3. The minimum distance of a linear code is the smallest weight of a nonzero codeword.

Proof. See assignment 1 in the appendix.

Recall that we let K = {0, 1} be the binary alphabet. If we define K^n to be the set of words (vectors) of length n over K, then K^n together with the addition defined earlier and scalar multiplication by elements of K is a vector space.

Let S ⊆ K^n, say S = {v1, v2, . . . , vk}; then the subspace spanned by S (or generated by S) is

⟨S⟩ = { w ∈ K^n | w = α1v1 + α2v2 + · · · + αkvk, αi ∈ K },

if S ≠ ∅. If S = ∅, then ⟨S⟩ = {0}. Note that a linear code can also be thought of as a subspace generated by some set S.

Example 3.1.4. Let S = {0100, 0011, 1100}. Then the subspace (code) generated by S, C = ⟨S⟩, is

C = { w | w = α1(0100) + α2(0011) + α3(1100), αi ∈ K }.

α1 α2 α3    w
 0  0  0   0000
 0  0  1   1100
 0  1  0   0011
 0  1  1   1111
 1  0  0   0100
 1  0  1   1000
 1  1  0   0111
 1  1  1   1011
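The table lists exactly the 2^3 mod-2 linear combinations of the three generators; this can be enumerated directly (function name ours):

```python
from itertools import product

def span(S):
    """All mod-2 linear combinations of the generating words in S."""
    n = len(S[0])
    code = set()
    for coeffs in product((0, 1), repeat=len(S)):
        w = [0] * n
        for a, v in zip(coeffs, S):
            if a:                                  # add generator v if αi = 1
                w = [x ^ int(b) for x, b in zip(w, v)]
        code.add("".join(map(str, w)))
    return sorted(code)

print(span(["0100", "0011", "1100"]))
# → ['0000', '0011', '0100', '0111', '1000', '1011', '1100', '1111']
```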

Recall that two vectors u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) in K^n are orthogonal if their dot product u · v = u1v1 + u2v2 + · · · + unvn = 0.

Example 3.1.5. If u = 11001 and v = 01101, then u · v = 1 × 0 + 1 × 1 + 0 × 1 + 0 × 0 + 1 × 1 = 0 + 1 + 0 + 0 + 1 = 0, so u and v are orthogonal, keeping in mind that addition is done mod 2.

If S ⊆ K^n, a vector v ∈ K^n is orthogonal to S if v · x = 0 for all x ∈ S. The set of vectors orthogonal to S is called the orthogonal complement of S and is denoted by S⊥.

Theorem 3.1.6. If V is a vector space and S ⊆ V (note a subset, not necessarily a subspace), then S⊥ is a subspace of V.

If S ⊆ K^n and C = ⟨S⟩ (the linear code generated by S), we write C⊥ = S⊥ and call C⊥ the dual code of C.

Example 3.1.7. Let S = {0100, 0101}. Then C = ⟨S⟩ = {0000, 0101, 0100, 0001}. To find C⊥ we must find all words v = v1v2v3v4 such that v · 0100 = 0 and v · 0101 = 0. It is enough to ensure that v is orthogonal to these two vectors, as v will then also be orthogonal to any linear combination of them. This translates to

0v1 + 1v2 + 0v3 + 0v4 = 0,

0v1 + 1v2 + 0v3 + 1v4 = 0.

From the first equation we find that v2 = 0 and then from the second equation we see that v4 = 0. Thus as long as v2 = v4 = 0, v will be orthogonal to C; v1 and v3 may assume arbitrary values. This implies that

C⊥ = {0000, 0010, 1000, 1010} = S⊥.

Definition 3.1.8 (Dimension). The dimension of a linear code C = ⟨S⟩ is the dimension of ⟨S⟩. We denote this by dim(C).

The next result follows from .

Theorem 3.1.9. If C is a linear code of length n, then dim(C) + dim(C⊥) = n.

Proposition 3.1.10. A linear code C = ⟨S⟩ of dimension k contains exactly 2^k codewords.

Proof. Suppose C = ⟨S⟩ has dimension k and let {v1, v2, . . . , vk} be a basis for ⟨S⟩. Then each codeword w ∈ C can be uniquely written as α1v1 + α2v2 + · · · + αkvk, αi ∈ K = {0, 1}. Since there are 2 choices for each αi, there are 2^k choices for (α1, α2, . . . , αk). Each such choice gives a different word.

Definition 3.1.11 (Information rate). The information rate of a code C is defined as

i(C) = (log₂ |C|) / n.

Corollary 3.1.12. The information rate of a linear code C of dimension k and length n is k/n.

Proof.

i(C) = (log₂ |C|) / n = (log₂ 2^k) / n = k / n.

3.2 The Generator and Parity Check Matrices

Let C = ⟨S⟩ be a linear code and {v1, v2, . . . , vk} be a set of basis vectors for C (as in the proof of Proposition 3.1.10). Further, let G be the k × n matrix whose rows are v1, v2, . . . , vk. Then the codeword w = α1v1 + α2v2 + · · · + αkvk = [α1, α2, . . . , αk]G. Every codeword arises in this way for some choice of [α1, α2, . . . , αk]. If we therefore let the binary words of length k correspond to our data, we can encode a word of length k simply by computing [α1, α2, . . . , αk]G.

Example 3.2.1. Let S = {11101, 10110, 01011, 11010} and C = ⟨S⟩. To be able to use C for encoding we first need a basis for ⟨S⟩. We do this by writing the elements of S into a matrix (as its rows) and reducing this matrix to row echelon form :

    [ 1 1 1 0 1 ]    [ 1 1 1 0 1 ]    [ 1 1 1 0 1 ]
A = [ 1 0 1 1 0 ] → [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ]
    [ 0 1 0 1 1 ]    [ 0 1 0 1 1 ]    [ 0 0 1 1 1 ]
    [ 1 1 0 1 0 ]    [ 0 0 1 1 1 ]    [ 0 0 0 0 0 ]

Therefore a basis for ⟨S⟩ is {11101, 01011, 00111}. We now let

    [ 1 1 1 0 1 ]
G = [ 0 1 0 1 1 ]
    [ 0 0 1 1 1 ]

Then G is what is known as a generator matrix for C (see Definition 3.2.2). Suppose we have established the following correspondence.

A 000   E 001   H 010   K 011
L 100   M 101   P 110   X 111

Then the message HELP is encoded as a sequence of four codewords by computing

[010]G, [001]G, [100]G, [110]G = 01011, 00111, 11101, 10110.
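Encoding with G is a vector-matrix product over GF(2). A sketch (helper name ours) reproducing the HELP encoding of Example 3.2.1:

```python
# Generator matrix of Example 3.2.1.
G = [[1, 1, 1, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1]]

def encode(y, G=G):
    """Compute yG over GF(2) for a data word y (list of k bits)."""
    n = len(G[0])
    return "".join(str(sum(a * row[j] for a, row in zip(y, G)) % 2)
                   for j in range(n))

# H, E, L, P under the correspondence of Example 3.2.1:
for y in ([0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0]):
    print(encode(y))   # → 01011, 00111, 11101, 10110
```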

Definition 3.2.2 (Generator matrix). A generator matrix for a linear code C = ⟨S⟩ is a matrix G whose rows are a basis for C.

The generator matrix has the following properties :

• The rows of G are linearly independent.

• The number of rows of G is equal to dim(C).

• Any linear code C has a generator matrix in row echelon form or reduced row echelon form.

• Any matrix G whose rows are linearly independent is the generator matrix for some linear code.

Definition 3.2.3 (Parity check matrix). A parity check matrix for a linear code C is a matrix whose columns are a basis for the dual code C⊥.

Therefore H is a parity check matrix for C if and only if HT is a generator matrix for

C⊥. Also, if C has length n and dim(C) = k, then C⊥ has length n and dim(C⊥) = n − k. Thus H has n rows and n − k columns.

Theorem 3.2.4. Let C be a linear code of length n, dimension k and let H be a parity check matrix for C. Then C = { w ∈ K^n | wH = 0 }.

Proof.

Let H = [x1 | x2 | x3 | · · · | x_{n−k}], where the columns of H, x1, x2, x3, . . . , x_{n−k}, are a basis for C⊥. Then wH is the vector [w · x1, w · x2, w · x3, . . . , w · x_{n−k}]. Now w ∈ K^n has wH = 0 if and only if each w · xi = 0. That is, wH = 0 if and only if w is orthogonal to each xi, if and only if w ∈ (C⊥)⊥ = C.

Note that if G is a generator matrix for C and H a parity check matrix for C, then GH = 0. This follows from the fact that the rows of G are a basis for C, while the columns of H are a basis for C⊥.

We now consider the following special case : Let C = ⟨S⟩ be a linear code of length n and dimension k. If A is the matrix whose rows are the words in S and if A can be put in reduced row echelon form (which is not always possible, hence the special case), that is

A → [ I_k  X ]
    [ 0    0 ]

then G = [I_k | X] is a generator matrix for C. Note the dimensions of the submatrix X : it has k rows and n − k columns.

Let

H = [ X       ]
    [ I_{n−k} ]

then

GH = [I_k | X] [ X ; I_{n−k} ] = I_k X + X I_{n−k} = X + X = 0.

This implies that every column of H lies in C⊥; furthermore there are n − k linearly independent columns (because of the I_{n−k}) and so the columns of H form a basis for C⊥. Thus H is a parity check matrix for C.

So in this special case we can transform between the matrices H_{C⊥}, G_{C⊥}, G_C and H_C, which are the parity check and generator matrices for the codes C and C⊥, as shown in the following diagram.

H_{C⊥} ←∗→ G_{C⊥}
  ↕ T         ↕ T
 G_C   ←∗→  H_C

Here T denotes transposition and ∗ denotes the operation described above.

Example 3.2.5. (This example is continued from the previous one.) In the previous example we had S = {11101, 10110, 01011, 11010} and C = ⟨S⟩. We found a basis for C by writing the elements of S into a matrix A (as its rows) and reducing A to row echelon form. We now take this one step further and reduce A to reduced row echelon form.

    [ 1 1 1 0 1 ]  from previous  [ 1 1 1 0 1 ]    [ 1 1 0 1 0 ]    [ 1 0 0 0 1 ]
A = [ 1 0 1 1 0 ]  example        [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ]
    [ 0 1 0 1 1 ]  ———————→       [ 0 0 1 1 1 ]    [ 0 0 1 1 1 ]    [ 0 0 1 1 1 ]
    [ 1 1 0 1 0 ]                 [ 0 0 0 0 0 ]    [ 0 0 0 0 0 ]    [ 0 0 0 0 0 ]

  = [ I_3  X ]
    [ 0    0 ]

Therefore

                  [ 1 0 0 0 1 ]
G = [ I_3 | X ] = [ 0 1 0 1 1 ],
                  [ 0 0 1 1 1 ]

is a (different) generator matrix for C and

    [ 0 1 ]
    [ 1 1 ]
H = [ 1 1 ],
    [ 1 0 ]
    [ 0 1 ]

is a parity check matrix for C. Note also that GH = 0.
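The passage from G = [I_k | X] to H (X stacked on top of I_{n−k}) is mechanical; a sketch (function name ours) using the matrices of Example 3.2.5:

```python
# Building a parity check matrix from a standard-form generator matrix
# G = [I_k | X]: stack the block X on top of the identity I_{n-k}.
def parity_check_from_standard(G):
    k, n = len(G), len(G[0])
    X = [row[k:] for row in G]                          # k x (n-k) block
    I = [[int(i == j) for j in range(n - k)] for i in range(n - k)]
    return X + I                                        # n x (n-k) matrix

# The standard-form G of Example 3.2.5:
G = [[1, 0, 0, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1]]
H = parity_check_from_standard(G)

# Check GH = 0 over GF(2):
GH = [[sum(row[i] * H[i][j] for i in range(len(H))) % 2
       for j in range(len(H[0]))] for row in G]
print(H)   # → [[0, 1], [1, 1], [1, 1], [1, 0], [0, 1]]
print(GH)  # → [[0, 0], [0, 0], [0, 0]]
```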

Definition 3.2.6 (Equivalent codes). Let C1 and C2 be block codes of length n. If there is a permutation of the n digits which, when applied to codewords in C1 gives C2, then C1 and C2 are called equivalent.

Example 3.2.7. Let C1 be the linear code with generator matrix

G = [ 1 0 0 ]
    [ 0 0 1 ],

so that C1 = {000, 001, 100, 101} (this can easily be seen, as C1 is just all possible sums of the rows of G).

Let C2 be obtained from C1 by exchanging the last two digits of every codeword of C1. Then C2 = {000, 010, 100, 110}, so that C1 and C2 are equivalent.

Equivalent codes are exactly the same, except for the order of the digits.

Proposition 3.2.8. Any linear code C is equivalent to a (linear) code C′ having a generator matrix in the standard form G′ = [I_k | X].

Example 3.2.9. (Continued from previous example.)

C1’s codewords all have a 0 in their second position. Therefore C1 cannot have a generator matrix in the standard form [I_2 | X], as such a generator matrix will generate codewords that have a nonzero second digit : the second row of this matrix has a 1 in the second digit (because of the I_2) and this row itself is a codeword.

On the other hand

G_2 = [ 1 0 0 ]
      [ 0 1 0 ],

is a generator matrix for C2 in standard form and by the previous example we know that C1 and C2 are equivalent.

Let G = [I_k | X] be a standard form generator matrix for a code C, and let y ∈ K^k be encoded as w = yG = y[I_k | X] = [yI_k | yX] = [y | yX]. Then from the last expression it is clear that the first k digits of w carry the information being encoded (i.e. y) and the last n − k digits are there to detect/correct errors. In this case the first k digits are known as the information digits and the last n − k digits are known as the parity check digits.

3.3 Cosets

Recall from Algebra that (K^n, +) may be regarded as a group and that a linear code C ⊆ K^n can be thought of as a subgroup of (K^n, +).

Definition 3.3.1 (Coset). The (right) coset of C determined by a word u ∈ K^n is the set

C + u = { v + u | v ∈ C }.

Furthermore, from Algebra it also follows that distinct cosets partition K^n. The following result is well known.

Theorem 3.3.2. Let C ⊆ K^n be a linear code and u, v ∈ K^n. Then

1. u ∈ C + u.

2. u ∈ C + v ⇒ C + u = C + v.

3. u + v ∈ C ⇒ C + u = C + v.

4. u + v ∉ C ⇒ C + u ≠ C + v.

5. Either C + u = C + v or (C + u) ∩ (C + v) = ∅.

6. |C + u| = |C|.

7. dim(C) = k ⇒ |C + u| = 2^k for all u ∈ K^n. That is, there exist 2^{n−k} different cosets.

8. C = C + 0 is a coset.

Example 3.3.3. Let C = {0000, 1011, 0101, 1110}. Then the cosets (in this case left and right cosets are the same since addition is commutative) of C are :

C      C + 1000   C + 0100   C + 0010
0000   1000       0100       0010
1011   0011       1111       1001
0101   1101       0001       0111
1110   0110       1010       1100

The cosets are determined as follows. By Theorem 3.3.2 number 8, C itself is always a coset. We therefore write down its elements. Next, Theorem 3.3.2 number 5 guarantees that the cosets partition K^n. We now look for a word in K^n that does not appear in any of the cosets that we have already found and add it to the elements of C. This generates a new coset, as the elements in this coset do not appear in any other coset (by number 5 of the previous Theorem). We keep on repeating this process until we have accounted for all the elements of K^n.
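The procedure just described, repeatedly picking an uncovered word and forming its coset, can be sketched directly (names ours):

```python
from itertools import product

def plus(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def cosets(code):
    """Partition K^n into cosets of the linear code `code`."""
    n = len(code[0])
    remaining = {"".join(w) for w in product("01", repeat=n)}
    parts = []
    while remaining:
        u = min(remaining)                  # any word not yet covered
        part = sorted(plus(v, u) for v in code)
        parts.append(part)
        remaining -= set(part)
    return parts

C = ["0000", "1011", "0101", "1110"]
for part in cosets(C):
    print(part)   # 4 cosets of size 4, covering all 16 words of K^4
```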

3.3.1 Maximum Likelihood Decoding (MLD) of Linear Codes

Suppose the word w is received. If there exists a unique codeword v such that d(w, v) is minimum, then decode w as v. If more than one such codeword exists we have one of two options.

• Complete Maximum Likelihood Decoding (CMLD) — we arbitrarily choose one of the candidate codewords.

• Incomplete Maximum Likelihood Decoding (IMLD) — we request that the codeword be sent again.

If C is a linear code and v ∈ C is sent and w ∈ K^n is received, the error pattern is u = v + w (remember that addition and subtraction are the same operation). Then w + u = w + w + v = v ∈ C. By Theorem 3.3.2 number 3, w and u are in the same coset. Now the received word makes it possible to determine the coset, and from the coset we should therefore be able to extract the error pattern.

Assuming that errors of small weight are more likely (since the channel is hopefully not too bad), we choose as the error pattern, u, a word of least weight in the coset that contains the received word (C + w in the notation above). We then decode w as w + u (the error pattern cancels itself).

Example 3.3.4. (Continued from previous example.) Suppose w = 1101 is received. Then w ∈ C + 1000 and the word of least weight in C + 1000 is u = 1000. Therefore we decode w as w + u = 1101 + 1000 = 0101. Note that the sum of two words in the same coset is always a codeword.

As a second example suppose w = 1111 is received. Then w ∈ C + 0100 and in this case there are two words of least weight present in this coset. They are 0100 and 0001. If we are using IMLD we will ask for a retransmission; otherwise, using CMLD, we arbitrarily choose between 0100 and 0001 as the possible error pattern.

Two steps arise in the decoding of linear codes.

• Find the coset containing the received word.

• Find the word of least weight in this coset.

The need to perform these operations as quickly as possible prompts the following use of the parity check matrix.

Definition 3.3.5 (Syndrome). Let C ⊆ K^n be a linear code and suppose dim(C) = k. Let H be a parity check matrix for C. For w ∈ K^n, the syndrome of w is the word wH (∈ K^{n−k}).

Example 3.3.6. (Continued from previous example.) A parity check matrix for C is

    [ 1 1 ]
H = [ 0 1 ]
    [ 1 0 ]
    [ 0 1 ]

The syndrome for w = 1101 is wH = 11. In fact all x ∈ C + w have syndrome 11 (see the next Theorem).

Theorem 3.3.7. Let C ⊆ K^n be a linear code with parity check matrix H. Then for u, w ∈ K^n

1. wH = 0 ⇐⇒ w ∈ C.

2. wH = uH ⇐⇒ w and u are in the same coset of C.

Proof.

1. See Theorem 3.2.4.

2.

$$wH = uH \iff (w + u)H = 0 \iff w + u \in C \text{ (by 1.)} \iff C + w = C + u.$$

We note the following.

• We can identify each coset by its syndrome. If $C$ has dimension $k$, there are $2^{n-k}$ cosets and $2^{n-k}$ syndromes. The fact that there are $2^{n-k}$ syndromes follows from considering the form of the parity check matrix: it has an identity matrix of size $n-k$ in its bottom part, and all possible words of length $n-k$ can be formed by summing these rows in the appropriate way (which is what the product $wH$, arising when calculating the syndrome, amounts to).

• If $C$ is a linear code with parity check matrix $H$, $v \in C$ is sent and the error pattern $u$ occurs, then $w = v + u$ will be received. The syndrome of $w$ will be $wH = (v + u)H = vH + uH = 0 + uH = uH$ (since $v$ is a codeword, $vH = 0$). Therefore the syndrome of $w$ will be the sum of those rows of $H$ that correspond to the positions where the errors occurred.

In summary, if $C \subseteq K^n$ is a linear code with parity check matrix $H$ and $w \in K^n$ is received, we decode as follows.

1. Compute syndrome wH.

2. Identify the coset containing w from the syndrome.

3. Find the word of least weight, say u, in this coset.

4. Decode w as w + u.
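The four steps above can be sketched in code. This is a hypothetical illustration for the small $[4, 2]$ code of the running example; it assumes the parity check matrix $H$ of Example 3.3.6 and the coset leaders worked out in the text.

```python
# Syndrome decoding sketch for the [4,2] binary code of the running example.
# Assumptions: H is the 4x2 parity check matrix of Example 3.3.6, and the
# syndrome -> coset leader table is the one worked out in the text.

H = [(1, 1), (0, 1), (1, 0), (0, 1)]  # rows of H

def syndrome(w):
    """wH over GF(2): XOR of the rows of H where w has a 1."""
    s = (0, 0)
    for bit, row in zip(w, H):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, row))
    return s

# Syndrome -> coset leader (a word of least weight in each coset).
LEADER = {(0, 0): (0, 0, 0, 0),
          (0, 1): (0, 1, 0, 0),   # this coset has two words of least weight
          (1, 0): (0, 0, 1, 0),
          (1, 1): (1, 0, 0, 0)}

def decode(w):
    u = LEADER[syndrome(w)]                     # most likely error pattern
    return tuple(a ^ b for a, b in zip(w, u))   # decode w as w + u

print(decode((1, 1, 0, 1)))  # (0, 1, 0, 1), as in Example 3.3.4
```

Under CMLD the table fixes one leader per coset; under IMLD the ambiguous syndrome $01$ would instead trigger a retransmission.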

Definition 3.3.8 (Standard decoding array, Coset leader). A standard decoding array (SDA) is a table that matches each syndrome with a word of least weight, the coset leader, in the corresponding coset.

Example 3.3.9. (Continued from previous example.) The standard decoding array from the last example is as follows.

Syndrome   Coset Leader
  00          0000
  01          0100 ❖
  10          0010
  11          1000

Here ❖ serves to indicate that the coset contains more than one word of least weight. Therefore we either ask for a retransmission (IMLD) or choose one of the two words arbitrarily (CMLD).

Chapter 4

Bounds for Codes

In this chapter we consider a variety of bounds on the parameters of a code. Note that we do not restrict our attention only to linear codes; we will explicitly indicate when a code is linear. We begin with the following proposition.

Proposition 4.0.10. If $0 \le t \le n$ and $v \in K^n$, then the number of words $w \in K^n$ such that $d(v, w) \le t$ is
$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t}.$$
Proof. Note that $\binom{n}{i}$ counts the number of words that differ from $v$ in exactly $i$ places. The final result follows by summing over all possible values of $i$.

Theorem 4.0.11 (Hamming bound [?]). If $C \subseteq K^n$ is a code with minimum distance $d_{\min} = 2t + 1$ or $d_{\min} = 2t + 2$, then
$$|C| \left( \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t} \right) \le 2^n,$$
or
$$|C| \le \frac{2^n}{\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t}}.$$
Proof. For $v \in C$ define the set
$$B_t(v) = \{ w \in K^n \mid d(v, w) \le t \},$$
the ball of radius $t$ centred at $v$. If $u, v \in C$ and $u \ne v$, then $B_t(u) \cap B_t(v) = \emptyset$: assume some $w \in B_t(u) \cap B_t(v)$, that is, $d(u, w) \le t$ and $d(v, w) \le t$. This implies that $d(u, v) \le d(u, w) + d(w, v) \le t + t = 2t$, which contradicts $d_{\min} \ge 2t + 1$.

The above implies that every $w \in K^n$ lies in at most one ball $B_t(v)$ for some $v \in C$ (it might not lie in any ball). Therefore
$$\sum_{v \in C} |B_t(v)| \le |K^n| = 2^n.$$
By Proposition 4.0.10,
$$|B_t(v)| = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t},$$
so that
$$|C| \left( \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t} \right) \le 2^n.$$

Example 4.0.12. A code $C$ of length 6 and minimum distance 3 ($d_{\min} = 2t + 1 = 3 \Rightarrow t = 1$) has
$$|C| \le \frac{2^6}{\binom{6}{0} + \binom{6}{1}} = \frac{64}{7} = 9\tfrac{1}{7}.$$
Since $|C| \in \mathbb{N} \cup \{0\}$, we have $|C| \le 9$. If $C$ is a linear code then $|C|$ is a power of two (Proposition 3.1.10). In this case $|C| \le 8$ and $\dim(C) \le 3$.

Theorem 4.0.13 (Gilbert-Varshamov bound [?, ?]). If

$$\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{d-2} < 2^{n-k},$$
then there exists a linear code of length $n$, dimension $k$ and minimum distance at least $d$.

Proof. Suppose the inequality holds. We construct a parity check matrix $H$ for the proposed code. Such a matrix $H$ would have $n$ rows and $n - k$ linearly independent columns. To ensure a minimum distance of $d$ we need to construct $H$ in such a way that any $d - 1$ rows are linearly independent (and some collection of $d$ rows is linearly dependent); see the Exercises.

In the last $n - k$ rows of $H$, put the identity matrix $I_{n-k}$. This guarantees that the columns of $H$ are linearly independent. The remainder of the proof is by induction. Suppose the last $l$ rows of $H$ have been completed, where $n - k \le l \le n - 1$. We claim that another row can be added: among the $2^{n-k}$ possibilities for the rows of $H$ we cannot select the following.

• The zero row.
• Any row that is a sum of $1, 2, \ldots, d - 2$ rows already chosen.

The reason is that any such choice would create a linearly dependent set of size at most $d - 1$, while we are trying to construct linearly independent sets of size $d - 1$. The number of rows that are not allowed (the zero row, any single row, any sum of 2 rows, ..., any sum of $d - 2$ rows) is
$$1 + \binom{l}{1} + \binom{l}{2} + \cdots + \binom{l}{d-2} = \binom{l}{0} + \binom{l}{1} + \binom{l}{2} + \cdots + \binom{l}{d-2} \le \binom{n-1}{0} + \binom{n-1}{1} + \cdots + \binom{n-1}{d-2} < 2^{n-k}.$$

Here we used the fact that if $l \le n - 1$, then $\binom{l}{i} \le \binom{n-1}{i}$. Therefore another row is available and so by induction $H$ can be constructed.

Since the columns of $H$ are linearly independent, the dual code has dimension $n - k$, which in turn implies that the code itself has dimension $k$. By selecting the last row to be the sum of some $d - 1$ rows of $H$, we guarantee that the minimum distance equals $d$.

Corollary 4.0.14. If $n \ne 1$ and $d \ne 1$ (we are assuming $d \le n$, since a code of length $n$ cannot have minimum distance greater than $n$), then there exists a linear code $C$ of length $n$ and minimum distance at least $d$ with
$$|C| \ge \frac{2^{n-1}}{\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{d-2}}.$$

Proof. Let $k$ be the largest integer less than or equal to $n$ such that
$$\binom{n-1}{0} + \binom{n-1}{1} + \cdots + \binom{n-1}{d-2} < 2^{n-k}.$$
Such an integer $k$ exists since the inequality holds in at least one case, namely $k = 0$. For this $k$ we have by Theorem 4.0.13 a linear code $C$ with $|C| = 2^k$ and
$$2^k \left( \binom{n-1}{0} + \binom{n-1}{1} + \cdots + \binom{n-1}{d-2} \right) < 2^k \, 2^{n-k} = 2^n.$$
Since we chose $k$ to be the largest integer such that the inequality holds, the inequality will be reversed for $k + 1$. Therefore
$$\frac{2^n}{\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{d-2}} \le 2^{k+1} = 2 \cdot 2^k,$$
or written differently
$$|C| = 2^k \ge \frac{2^{n-1}}{\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{d-2}}.$$

Definition 4.0.15 ([n, k, d]-code). An $[n, k, d]$-code is a linear code with length $n$, dimension $k$ and minimum distance $d$.

Example 4.0.16. (a) Is there a $[9, 2, 5]$-code? According to the Gilbert-Varshamov bound such a code will exist if
$$\binom{8}{0} + \binom{8}{1} + \binom{8}{2} + \binom{8}{3} = 1 + 8 + 28 + 56 = 93$$
is less than $2^{9-2} = 2^7 = 128$. Since the inequality holds, we know such a code exists.

(b) Does there exist a $[15, 7, 5]$-code? Using the Gilbert-Varshamov bound again we find that
$$\binom{14}{0} + \binom{14}{1} + \binom{14}{2} + \binom{14}{3} = 1 + 14 + 91 + 364 = 470$$
is not less than $2^{15-7} = 2^8 = 256$, and so we cannot reach any conclusion based on this bound alone.

(c) Find bounds on the size and dimension of a linear code $C$ with $n = 9$ and $d_{\min}(C) = 5$. Using the Hamming bound ($d_{\min}(C) = 2t + 1 = 5 \Rightarrow t = 2$) we get
$$|C| \le \frac{2^9}{\binom{9}{0} + \binom{9}{1} + \binom{9}{2}} = \frac{512}{1 + 9 + 36} = \frac{512}{46} \approx 11.1.$$
Since $C$ is linear, $|C| = 2^k$, so that in fact $|C| \le 8$ and $k \le 3$.

The Corollary to the Gilbert-Varshamov bound says that
$$|C| \ge \frac{2^8}{\binom{8}{0} + \binom{8}{1} + \binom{8}{2} + \binom{8}{3}} = \frac{256}{93} \approx 2.8.$$
Since $C$ is linear, $|C| = 2^k$, so that $|C| \ge 4$ and $k \ge 2$.
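The two bounds in part (c) are easy to evaluate numerically. The following sketch simply recomputes the fractions above; the function names are our own, not notation from the text.

```python
# Numeric check of the bounds in Example 4.0.16(c): a binary code with
# n = 9 and minimum distance d = 5 (so t = 2).
from math import comb

def hamming_bound(n, t):
    """Upper bound on |C|: 2^n / (C(n,0) + ... + C(n,t))."""
    return 2**n / sum(comb(n, i) for i in range(t + 1))

def gv_bound(n, d):
    """Lower bound on |C| from the Gilbert-Varshamov corollary."""
    return 2**(n - 1) / sum(comb(n - 1, i) for i in range(d - 1))

upper = hamming_bound(9, 2)   # 512/46, about 11.1
lower = gv_bound(9, 5)        # 256/93, about 2.8
print(lower, upper)           # a linear code must then have 4 <= |C| <= 8
```

Rounding to the nearest powers of two, as in the text, gives $4 \le |C| \le 8$, i.e. $2 \le k \le 3$.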

Definition 4.0.17 (Perfect code). A code $C$ of length $n$ is called perfect if equality holds in the Hamming bound, that is, if
$$|C| = \frac{2^n}{\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t}},$$
where $d_{\min}(C) = 2t + 1$ or $d_{\min}(C) = 2t + 2$. In other words, the balls of radius $t$ centred at the codewords partition the space $K^n$: every word in $K^n$ is in exactly one such ball.

It actually turns out that a perfect code cannot have an even minimum distance. Let $C$ be a code with minimum distance $d_{\min}(C) = 2t + 2$. Let $v \in C$ and change $v$ in $t + 1$ places to obtain a new word $z$, so that $d(v, z) = t + 1$. Let $u \in C$ with $u \ne v$. Then $d(u, v) \le d(u, z) + d(z, v)$, or $d(u, z) \ge d(u, v) - d(z, v) \ge 2t + 2 - (t + 1) = t + 1$. This implies that $z$ is a distance of at least $t + 1$ from every codeword in $C$. Therefore $z$ is in no ball of radius $t$ centred at a codeword of $C$. Thus $C$ is not perfect.

Example 4.0.18.

(a) $K^n$ is perfect: $d_{\min}(K^n) = 1 = 2(0) + 1 \Rightarrow t = 0$ and
$$|K^n| = 2^n = \frac{2^n}{\binom{n}{0}}.$$

(b) The code $C = \{00\cdots0, 11\cdots1\}$ with length $n = 2t + 1$ and $d_{\min}(C) = 2t + 1$ is perfect:
$$|C| = 2 = \frac{2^n}{\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{t}} = \frac{2^n}{\tfrac{1}{2} 2^n} = 2.$$

Here we used the fact that if $n = 2t + 1$, then $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{t}$ only includes half of the terms in $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{t} + \binom{n}{t+1} + \cdots + \binom{n}{n-1} + \binom{n}{n} = 2^n$, and by the symmetry $\binom{n}{i} = \binom{n}{n-i}$ this half sums to $2^n/2$.

The two examples above, $K^n$ and the repetition code, are called the trivial perfect codes.

Theorem 4.0.19 (Tietäväinen and van Lint [?, ?]). If $C$ is a nontrivial perfect code of length $n$ and minimum distance $d_{\min}(C) = 2t + 1$, then either
1. $n = 2^r - 1$ for some $r \ge 2$ and $d_{\min}(C) = 3$, or
2. $n = 23$ and $d_{\min}(C) = 7$.

Note the following.

• If $C$ is a perfect code of length $n$ and $d_{\min} = 2t + 1$, then every $w \in K^n$ is within a distance $t$ of some codeword. Thus $C$ can correct all error patterns of weight at most $t$ and no others.

• By Theorem 4.0.19, a nontrivial perfect code is either
1. of length $2^r - 1$ ($r \ge 2$) and 1-error correcting, or
2. of length 23 and 3-error correcting.

Chapter 5

Hamming Codes

5.1 Introduction

In this chapter we focus our attention on one of the first classes of codes that was discovered, namely the Hamming codes. We do this by describing their parity check matrices.

Definition 5.1.1 (Hamming codes [?]). Let $r \ge 2$ and let $H$ be the $(2^r - 1) \times r$ matrix whose rows are the nonzero binary words of length $r$. The linear code that has $H$ as its parity check matrix is called the Hamming code of length $2^r - 1$.

Note that $H$ has linearly independent columns, since $r$ of its rows contain those binary words with only a single 1. In the example below (for the case $r = 3$) these rows are the last three rows.

Example 5.1.2. If we let $r = 3$ in the definition above, we find that the Hamming code of length 7 has the following parity check matrix.
$$H = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

In the example above we see that the matrix is in standard form. It will always be possible to place it in this form by ensuring that the appropriate rows are at the bottom of the matrix. If
$$H = \begin{bmatrix} X \\ I_r \end{bmatrix},$$
then the generator matrix will be of the form
$$G = [\, I_{2^r - 1 - r} \mid X \,].$$
Therefore the Hamming code has dimension $2^r - r - 1$, so that it has $2^{2^r - r - 1}$ codewords. Since $H$ has no row equal to zero and no two identical rows, every set of 2 rows is linearly independent, so that the minimum distance is at least three. On the other hand there is a linearly dependent set of three rows, for example $\{100\cdots0, 010\cdots0, 110\cdots0\}$. This implies that the minimum distance of the Hamming code is exactly three, i.e. it is a 1-error correcting code.

Let's consider the Hamming bound in this case. Here $n = 2^r - 1$ and $d_{\min} = 3 = 2t + 1 \Rightarrow t = 1$. Therefore
$$|C| \le \frac{2^{2^r - 1}}{\binom{2^r - 1}{0} + \binom{2^r - 1}{1}} = \frac{2^{2^r - 1}}{1 + (2^r - 1)} = \frac{2^{2^r - 1}}{2^r} = 2^{2^r - r - 1} = |C|.$$
Therefore Hamming codes are perfect.

It's easy to make up a standard decoding array for a Hamming code: the coset leaders are the words of weight at most 1 (see the exercises). All we therefore need is to compute the syndromes of each one.

Example 5.1.3. (Continued from previous example.)

Syndrome   Coset Leader
  000       0000000
  111       1000000
  110       0100000
  101       0010000
  011       0001000
  100       0000100
  010       0000010
  001       0000001

Recall that equivalent codes have exactly the same parameters. Consider then the Hamming code whose parity check matrix has as its rows the nonzero binary words of length $r$ in numerical order. For the example above this would mean that
$$H' = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
would be the new parity check matrix.

Let $C$ be the Hamming code of length $2^r - 1$ that has its parity check matrix's rows in numerical order. Let $v \in C$ be sent and let $u$ be an error pattern in $K^{2^r - 1}$. Then $w = u + v$ will be received and the syndrome will be $wH = (u + v)H = uH + vH = uH$, since $v$ is a codeword and has zero syndrome. Thus if $u$ is an error pattern of weight 1 (and $C$ can only correct these), the syndrome will be equal to some row of $H$. By the ordering of the rows we know in fact that if $u$ has a 1 in the $i$'th position, then $uH$ will be the $i$'th row of $H$, which equals the number $i$ expressed in binary. We can therefore decode $w$ by merely complementing the $i$'th bit.
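The complement-the-$i$'th-bit rule can be sketched as follows. This is a hypothetical helper for the $r = 3$ case, assuming the numerically ordered parity check matrix $H'$ above.

```python
# Decoding sketch for the length-7 Hamming code whose parity check matrix
# has the numbers 1..7 (in binary) as its rows, as described above.
H_rows = [tuple((i >> s) & 1 for s in (2, 1, 0)) for i in range(1, 8)]

def decode(w):
    """Correct at most one bit error in a received 7-bit word w."""
    s = [0, 0, 0]
    for bit, row in zip(w, H_rows):
        if bit:
            s = [a ^ b for a, b in zip(s, row)]
    i = s[0] * 4 + s[1] * 2 + s[2]   # the syndrome, read as the number i
    w = list(w)
    if i != 0:
        w[i - 1] ^= 1                # complement the i'th bit
    return tuple(w)

# 1110000 is a codeword here (rows 1, 2, 3 of H' sum to zero); flip bit 7:
print(decode((1, 1, 1, 0, 0, 0, 1)))  # (1, 1, 1, 0, 0, 0, 0)
```

Note that no stored decoding array is needed: the syndrome itself names the erroneous position.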

5.2 Extended Codes

Recall that if a code $C$ has a generator matrix $G = [\, I_k \mid X \,]$, then the first $k$ bits of a codeword are the information bits and the last $n - k$ bits are the parity check bits.

Example 5.2.1. (Hamming code.) Recall that the Hamming code of length 7 has the following generator matrix.
$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}$$
Take $x_1 x_2 x_3 x_4 \in K^4$ and multiply it against $G$ to get the codeword $x_1 x_2 x_3 x_4 p_1 p_2 p_3$. We then see that

$$p_1 = x_1 + x_2 + x_3, \qquad p_2 = x_1 + x_2 + x_4, \qquad p_3 = x_1 + x_3 + x_4.$$
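The three parity equations translate directly into an encoder; a minimal sketch, using the equations exactly as given above.

```python
# Encoding sketch for the [7,4] Hamming code with G = [I4 | X] as above.
def encode(x1, x2, x3, x4):
    p1 = (x1 + x2 + x3) % 2
    p2 = (x1 + x2 + x4) % 2
    p3 = (x1 + x3 + x4) % 2
    return (x1, x2, x3, x4, p1, p2, p3)

print(encode(1, 0, 1, 1))  # (1, 0, 1, 1, 0, 0, 1)
```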

Definition 5.2.2 (Extended code). Let $C$ be a linear code of length $n$ and let $C^*$ be the code obtained from $C$ by adding one extra digit to each codeword so that each word in $C^*$ has even weight. Then $C^*$ is called the extended code of $C$.

If $C$ is a linear code with parity check matrix $H$, consider the following matrix,
$$H^* = \begin{bmatrix} H & j \\ 0 & 1 \end{bmatrix},$$
obtained from $H$ by adding a column $j$ of 1's to $H$ and then adding a row that has zeros everywhere except in its last entry. Thus the final column of $H^*$ is made up of 1's.

Then $H^*$ has $n - k + 1$ linearly independent columns and for every $v^* \in C^*$, $v^* H^* = 0$. Furthermore, $\dim(C^*) = k$: we can use the same set of basis vectors for $C^*$ that was used for $C$ by adding a digit to each one of the basis vectors such that each new basis vector has even weight. Each new codeword is still a sum of these new basis vectors. Therefore $C^*$ is the null space of $H^*$, implying that $C^*$ is a linear code. As discussed above, the generator matrix $G^*$ for $C^*$ can be obtained from the generator matrix of $C$ by adding a bit to every row such that each row has even weight. We then again have
$$G^* H^* = [\, G \mid i \,] \begin{bmatrix} H & j \\ 0 & 1 \end{bmatrix} = \left[\, GH \;\middle|\; [\, G \mid i \,] \begin{bmatrix} j \\ 1 \end{bmatrix} \,\right] = 0,$$
where $i$ represents the column that was added to $G$ so that each row of $G^*$ now has an even number of 1's. The product in the last column is zero since each entry is just the sum of the entries of the corresponding row of $[\, G \mid i \,]$, but since each row has an even number of 1's such a sum equals 0. Of course we also have $GH = 0$.

Example 5.2.3. Extending the Hamming code of length 7 we get the following parity check and generator matrices.
$$H^* = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad G^* = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}.$$

If $v \in C$ and $v^*$ is the corresponding codeword in $C^*$, then
$$\mathrm{wt}(v^*) = \begin{cases} \mathrm{wt}(v) & \text{if } \mathrm{wt}(v) \text{ is even}, \\ \mathrm{wt}(v) + 1 & \text{if } \mathrm{wt}(v) \text{ is odd}. \end{cases}$$

Therefore
$$d_{\min}(C^*) = \begin{cases} d_{\min}(C) & \text{if } d_{\min}(C) \text{ is even}, \\ d_{\min}(C) + 1 & \text{if } d_{\min}(C) \text{ is odd}. \end{cases}$$

Thus if $d_{\min}(C)$ is odd, then $C^*$ will detect one more error than $C$. This implies that extending $C$ is only useful when $d_{\min}(C)$ is odd.

The extended Hamming code of length 8, $C_e$, has a minimum distance of 4 (it was 3 before it was extended). Looking back at the example above, in which the generator and parity check matrices were given for this code, we see that any two rows of $G^*$ are orthogonal (even if the rows aren't distinct). Therefore

$$G^* (G^*)^T = 0.$$

This implies that the rows of $G^*$ are all in the dual code, $C_e^\perp$. The dual code has dimension 4 and $G^*$ has 4 linearly independent rows. Therefore the rows of $G^*$ are (also) a basis for $C_e^\perp$. This shows that the extended Hamming code of length 8 is self-dual: $C_e = C_e^\perp$.

Chapter 6

Golay Codes

6.1 The Extended Golay Code : C24.

The extended Golay code, $C_{24}$, is a $[24, 12, 8]$-code. We describe this code by giving its generator matrix. The construction of the generator matrix proceeds in three steps.

Step 1: Let $B_1$ be the $11 \times 11$ matrix whose first row is 11011100010 and each subsequent row is obtained by cyclically shifting its predecessor one position left. That is,
$$B_1 = \begin{bmatrix}
1 1 0 1 1 1 0 0 0 1 0 \\
1 0 1 1 1 0 0 0 1 0 1 \\
0 1 1 1 0 0 0 1 0 1 1 \\
1 1 1 0 0 0 1 0 1 1 0 \\
1 1 0 0 0 1 0 1 1 0 1 \\
1 0 0 0 1 0 1 1 0 1 1 \\
0 0 0 1 0 1 1 0 1 1 1 \\
0 0 1 0 1 1 0 1 1 1 0 \\
0 1 0 1 1 0 1 1 1 0 0 \\
1 0 1 1 0 1 1 1 0 0 0 \\
0 1 1 0 1 1 1 0 0 0 1
\end{bmatrix}.$$

Step 2: Let $B$ be the $12 \times 12$ matrix
$$B = \begin{bmatrix} B_1 & j^T \\ j & 0 \end{bmatrix},$$
where $j$ is the all 1's vector. Therefore
$$B = \begin{bmatrix}
1 1 0 1 1 1 0 0 0 1 0 & 1 \\
1 0 1 1 1 0 0 0 1 0 1 & 1 \\
0 1 1 1 0 0 0 1 0 1 1 & 1 \\
1 1 1 0 0 0 1 0 1 1 0 & 1 \\
1 1 0 0 0 1 0 1 1 0 1 & 1 \\
1 0 0 0 1 0 1 1 0 1 1 & 1 \\
0 0 0 1 0 1 1 0 1 1 1 & 1 \\
0 0 1 0 1 1 0 1 1 1 0 & 1 \\
0 1 0 1 1 0 1 1 1 0 0 & 1 \\
1 0 1 1 0 1 1 1 0 0 0 & 1 \\
0 1 1 0 1 1 1 0 0 0 1 & 1 \\
1 1 1 1 1 1 1 1 1 1 1 & 0
\end{bmatrix}.$$

Note the following.

1. $B^T = B$ ($B_1$ is symmetric).
2. $B B^T = B B = I$.

Step 3: The extended Golay code, $C_{24}$, is the linear code with generator matrix
$$G = [\, I_{12} \mid B \,] = \begin{bmatrix}
100000000000 & 110111000101 \\
010000000000 & 101110001011 \\
001000000000 & 011100010111 \\
000100000000 & 111000101101 \\
000010000000 & 110001011011 \\
000001000000 & 100010110111 \\
000000100000 & 000101101111 \\
000000010000 & 001011011101 \\
000000001000 & 010110111001 \\
000000000100 & 101101110001 \\
000000000010 & 011011100011 \\
000000000001 & 111111111110
\end{bmatrix}.$$

We now have

1. $C_{24}$ has length 24 and dimension 12.

2. The matrix
$$\begin{bmatrix} B \\ I_{12} \end{bmatrix}$$
is a parity check matrix for $C_{24}$.

3. Furthermore
$$H = \begin{bmatrix} I_{12} \\ B \end{bmatrix}$$
is also a parity check matrix for $C_{24}$ (with respect to the same generator matrix $G$):
$$GH = [\, I_{12} \mid B \,] \begin{bmatrix} I_{12} \\ B \end{bmatrix} = I_{12} + BB = I + I = 0,$$
and $H$ has 12 linearly independent columns ($\dim(C_{24}^\perp) = 12$). Therefore $H$ is a parity check matrix.

4. Another generator matrix for $C_{24}$ is $[\, B \mid I_{12} \,]$.

5. $C_{24}$ is self-dual:
$$G G^T = GH = 0,$$
therefore every basis vector for $C_{24}$ belongs to $C_{24}^\perp$. Further, $\dim(C_{24}) = \dim(C_{24}^\perp) = 12$, so that the basis vectors for $C_{24}$ are also basis vectors for $C_{24}^\perp$. This implies that $C_{24} = C_{24}^\perp$.

6. $C_{24}$ has minimum distance equal to 8. We prove this in three steps.

(a) The weight of any codeword is a multiple of 4.
Note that every row of $G$ has weight 8 or 12. Let $v \in C_{24}$ be a sum of two different rows, $r_i$ and $r_j$, of $G$. Since any two different rows of $G$ are orthogonal ($G G^T = 0$), when calculating their inner product either all of the 1's in one word get multiplied against zeros in the other, or else the two words have an even number of 1's in the same positions (so that their products can sum to zero). Therefore $r_i$ and $r_j$ have an even number of 1's in common, say $2x$. Then
$$\mathrm{wt}(v) = \mathrm{wt}(r_i + r_j) = \mathrm{wt}(r_i) + \mathrm{wt}(r_j) - 2(2x),$$
which is a multiple of four.

Suppose that any word that is a sum of $t$ or fewer rows of $G$ has weight that is a multiple of 4. Let $v \in C_{24}$ be a sum of $t + 1$ rows of $G$. Then $v = r + w$, where $r$ is a row of $G$ and $w$ is a sum of $t$ rows of $G$. By hypothesis $\mathrm{wt}(w)$ is a multiple of 4. As before, $r$ and $w$ have an even number of 1's in common, say $2y$, so that
$$\mathrm{wt}(v) = \mathrm{wt}(r + w) = \mathrm{wt}(r) + \mathrm{wt}(w) - 2(2y),$$
which is a multiple of 4. By induction any $v \in C_{24}$ has a weight that is a multiple of 4.

(b)

$$d_{\min}(C_{24}) = \min_{v \in C_{24},\, v \ne 0} \mathrm{wt}(v) = 4 \text{ or } 8.$$

The last step follows from the fact that $G$ has rows of weight 8 and each of these rows is also a codeword of $C_{24}$.

(c) $C_{24}$ has no codewords of weight 4.
Suppose $v \in C_{24}$ has weight 4. We noted earlier that both $[\, I_{12} \mid B \,]$ and $[\, B \mid I_{12} \,]$ are generator matrices for $C_{24}$. Therefore $v = w_1 [\, I_{12} \mid B \,] = w_2 [\, B \mid I_{12} \,]$ for some $w_1, w_2 \ne 0$. Now neither of the two halves of $v$ can be identically zero. This is because of the identity matrices in the generator matrices and also since $w_1, w_2 \ne 0$. Further, if either half of $v$ contained only one 1, this would imply that $v$ equalled a row of either $[\, I_{12} \mid B \,]$ or $[\, B \mid I_{12} \,]$, but each row has weight at least eight. Therefore each half of $v$ must contain exactly two 1's. This implies that $\mathrm{wt}(w_1) = \mathrm{wt}(w_2) = 2$, but the sum of two rows of $B$ has weight at least 4. Therefore $\mathrm{wt}(v) = \mathrm{wt}(w_1) + \mathrm{wt}(w_1 B) \ge 2 + 4 > 4$, a contradiction. Thus no $v \in C_{24}$ has weight 4. Therefore $d_{\min}(C_{24}) = 8$.
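Since $C_{24}$ has only $2^{12} = 4096$ codewords, the claim $d_{\min}(C_{24}) = 8$ can also be checked by brute force. The following sketch rebuilds $B$ from the cyclic shifts of Step 1 and scans every nonzero message.

```python
# Sketch: build C24 from the generator matrix [I12 | B] and verify by
# brute force over all 2^12 codewords that the minimum weight is 8.
from itertools import product

row = "11011100010"
B1 = []
for _ in range(11):
    B1.append(row)
    row = row[1:] + row[0]          # cyclic left shift, as in Step 1

# B = [[B1, j^T], [j, 0]] as in Step 2.
B = [r + "1" for r in B1] + ["1" * 11 + "0"]

def codeword_weight(m):
    """Weight of (m | mB): information bits followed by the B part."""
    parity = [0] * 12
    for bit, brow in zip(m, B):
        if bit:
            parity = [p ^ int(c) for p, c in zip(parity, brow)]
    return sum(m) + sum(parity)

min_wt = min(codeword_weight(m)
             for m in product((0, 1), repeat=12) if any(m))
print(min_wt)  # 8
```

For a linear code the minimum distance equals the minimum nonzero weight, so this confirms point 6.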

6.2 The Golay Code : C23.

This section discusses the original Golay code that appeared in [?].

Definition 6.2.1 (Puncturing a code). Puncturing a code means removing a digit (the same one) from every codeword.

Recall that the extended Golay code, $C_{24}$, has generator matrix $[\, I_{12} \mid B \,]$, where
$$B = \begin{bmatrix} B_1 & j^T \\ j & 0 \end{bmatrix}.$$
The Golay code $C_{23}$ is obtained by removing the last digit from every codeword in $C_{24}$. Therefore it has the generator matrix $G = [\, I_{12} \mid B' \,]$, where
$$B' = \begin{bmatrix} B_1 \\ j \end{bmatrix}.$$
Then $G$ is a $12 \times 23$ matrix with linearly independent rows. Therefore $C_{23}$ has length $n = 23$ and dimension $k = 12$.

Notice that the extended Golay code, $C_{24}$, is the extension (as defined previously) of the Golay code $C_{23}$ (or: $C_{23}$ results from puncturing $C_{24}$). Since the extended code, $C_{24}$, has minimum distance 8, $C_{23}$ has minimum distance 7 or 8. Since $G$ has rows of weight 7, $d_{\min}(C_{23}) = 7$. Thus $C_{23}$ is a $[23, 12, 7]$-code. This also means that $C_{23}$ is perfect (as the computation below shows) and so corrects all error patterns of weight at most 3 and no others (every word in $K^{23}$ is within distance 3 of some codeword in $C_{23}$).
$$2^{12} = |C_{23}| = \frac{2^{23}}{\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3}} = \frac{2^{23}}{1 + 23 + 253 + 1771} = \frac{2^{23}}{2^{11}} = 2^{12}.$$

Chapter 7

Reed-Muller Codes

The $r$th order Reed-Muller [?, ?] code (of length $2^m$), denoted by RM$(r, m)$, $0 \le r \le m$, is defined recursively as
1. RM$(0, m) = \{ 00\cdots0, 11\cdots1 \}$ (both words of length $2^m$). Further, RM$(m, m) = K^{2^m}$.
2. RM$(r, m) = \{ (x, x + y) \mid x \in \mathrm{RM}(r, m - 1) \text{ and } y \in \mathrm{RM}(r - 1, m - 1) \}$, where $0 < r < m$.

Example 7.0.2.
RM$(0, 0) = \{0, 1\}$.
RM$(0, 1) = \{00, 11\}$.
RM$(1, 1) = \{00, 01, 10, 11\}$.
RM$(0, 2) = \{0000, 1111\}$.
RM$(1, 2) = \{ (x, x + y) \mid x \in \mathrm{RM}(1, 1) \text{ and } y \in \mathrm{RM}(0, 1) \}$
$= \{ (x, x + y) \mid x \in \{00, 01, 10, 11\} \text{ and } y \in \{00, 11\} \}$
$= \{0000, 0011, 0101, 0110, 1010, 1001, 1111, 1100\}$.
RM$(2, 2) = K^4$.
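The recursive definition translates almost verbatim into code. This is a hypothetical (and deliberately naive) helper that enumerates RM$(r, m)$ as a set of bit tuples, not an efficient construction.

```python
# Recursive (x, x+y) construction of RM(r, m), following the definition.
from itertools import product

def RM(r, m):
    if r == 0:
        return {(0,) * 2**m, (1,) * 2**m}      # the two constant words
    if r == m:
        return set(product((0, 1), repeat=2**m))  # all of K^(2^m)
    return {x + tuple(a ^ b for a, b in zip(x, y))   # the word (x, x+y)
            for x in RM(r, m - 1) for y in RM(r - 1, m - 1)}

print(sorted(RM(1, 2)))
# the 8 codewords of Example 7.0.2:
# 0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111
```

Running it for small parameters reproduces the sets in Example 7.0.2 and the dimension formula of Theorem 7.0.4 (e.g. RM$(2, 3)$ has $2^{1+3+3} = 128$ codewords).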

Let $G(r, m)$ denote the generator matrix for RM$(r, m)$. These can be defined recursively.

1. $G(0, m) = [\, 1\ 1\ \cdots\ 1 \,]$ ($2^m$ ones).

2. For $0 < r < m$,
$$G(r, m) = \begin{bmatrix} G(r, m - 1) & G(r, m - 1) \\ 0 & G(r - 1, m - 1) \end{bmatrix}.$$

3.
$$G(m, m) = \begin{bmatrix} G(m - 1, m) \\ 0\ 0\ \cdots\ 0\ 1 \end{bmatrix}.$$

Example 7.0.3.

1. $G(0, 1) = [\, 1\ 1 \,]$.

2. $$G(1, 1) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

3. $$G(1, 2) = \begin{bmatrix} G(1, 1) & G(1, 1) \\ 0 & G(0, 1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$

4. $$G(2, 2) = \begin{bmatrix} G(1, 2) \\ 0\ 0\ 0\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

5. $$G(1, 3) = \begin{bmatrix} G(1, 2) & G(1, 2) \\ 0 & G(0, 2) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$

Theorem 7.0.4. The $r$'th order Reed-Muller code RM$(r, m)$ has the following properties.

1. Length $n = 2^m$.

2. Minimum distance $d_{\min} = 2^{m-r}$.

3. Dimension
$$k = \sum_{i=0}^{r} \binom{m}{i}.$$

4. For $r > 0$, RM$(r - 1, m) \subseteq$ RM$(r, m)$.

5. For $r < m$, RM$(r, m)^\perp = $ RM$(m - 1 - r, m)$.

Proof. We only prove 1, 2 and 4. All proofs are by induction. From the previous example we see that all statements are true when $m = 0, 1, 2$.

1. When $r = 0$ or $r = m$ this follows from the definition. If $0 < r < m$ then RM$(r, m)$ is constructed from two Reed-Muller codes, both of length $2^{m-1}$. Therefore RM$(r, m)$ will have length $2^{m-1} + 2^{m-1} = 2^m$.

4. The proof uses induction on $r + m$. We see from a previous example that RM$(0, 1) \subseteq$ RM$(1, 1)$; therefore the statement is true if $r + m \le 2$. Assume that if $r + m < t$ and $r > 0$, then RM$(r - 1, m) \subseteq$ RM$(r, m)$, and consider the situation where $r + m = t$ and $r > 0$. There are three cases.

(i) $r = 1$ (so that what we want is RM$(0, m) \subseteq$ RM$(1, m)$).
It follows from the recursive definition of $G(r, m)$ that its first row is all 1's. Since $G(0, m) = [\, 1 \cdots 1 \,]$, this means that $G(0, m)$ is a sub-matrix (the first row) of $G(1, m)$. Therefore RM$(0, m) \subseteq$ RM$(1, m)$.

(ii) $r = m$.
This is clear since RM$(m, m) = K^{2^m}$, giving RM$(m - 1, m) \subseteq$ RM$(m, m)$.

(iii) $1 < r < m$.
We have
$$G(r - 1, m) = \begin{bmatrix} G(r - 1, m - 1) & G(r - 1, m - 1) \\ 0 & G(r - 2, m - 1) \end{bmatrix}$$
and
$$G(r, m) = \begin{bmatrix} G(r, m - 1) & G(r, m - 1) \\ 0 & G(r - 1, m - 1) \end{bmatrix}.$$
By hypothesis $G(r - 1, m - 1)$ is a sub-matrix of $G(r, m - 1)$ and $G(r - 2, m - 1)$ is a sub-matrix of $G(r - 1, m - 1)$. Therefore $G(r - 1, m)$ is a sub-matrix of $G(r, m)$ and the result follows by induction.

2. Again the proof uses induction on $r + m$. The result is clear if $r + m = 0$ or $r + m = 1$. Suppose the minimum distance of RM$(s, t)$ is $2^{t-s}$ whenever $s + t < l$, and consider the case $r + m = l$ and the code RM$(r, m)$. If $r = 0$ or $r = m$, then the result is true, so assume $0 < r < m$. We know
$$\mathrm{RM}(r, m) = \{ (x, x + y) \mid x \in \mathrm{RM}(r, m - 1) \text{ and } y \in \mathrm{RM}(r - 1, m - 1) \},$$
and RM$(r - 1, m - 1) \subseteq$ RM$(r, m - 1)$ (by number 4). Therefore $x + y \in$ RM$(r, m - 1)$. If $x \ne y$ then by the induction hypothesis $\mathrm{wt}(x) \ge 2^{m-1-r}$ and $\mathrm{wt}(x + y) \ge 2^{m-1-r}$. Thus $\mathrm{wt}(x, x + y) = \mathrm{wt}(x) + \mathrm{wt}(x + y) \ge 2^{m-1-r} + 2^{m-1-r} = 2^{m-r}$. If $x = y$, then $y = x \in \mathrm{RM}(r - 1, m - 1)$ and $(x, x + y) = (x, 0)$. Therefore $\mathrm{wt}(x, x + y) = \mathrm{wt}(x) \ge 2^{(m-1)-(r-1)} = 2^{m-r}$. This shows that $d_{\min} \ge 2^{m-r}$. Equality can be seen to hold by choosing $x$ and $y$ so that equality holds in RM$(r, m - 1)$ and RM$(r - 1, m - 1)$ (i.e. choosing them to be the minimum weight words in their respective codes).

7.1 The Reed-Muller Codes RM(1, m)

These codes have the following properties.

• Length $n = 2^m$.
• Minimum distance $d_{\min} = 2^{m-1}$.
• Dimension $\binom{m}{0} + \binom{m}{1} = m + 1$, so that the code contains $2^{m+1}$ codewords.

Therefore it is a $[2^m, m + 1, 2^{m-1}]$-code. As always, when receiving a word $w$ the goal is to find a codeword closest to $w$, if any such codeword exists. The general principles that apply to the RM$(1, m)$ codes in this situation are illustrated by the following example.

Example 7.1.1. RM$(1, 3)$ is an $[8, 4, 4]$-code. Assume that $w$ is received. If we can find a codeword $v$ such that $d(v, w) < 2$, we decode $w$ to $v$. Also, if $u$ is a codeword with $d(u, w) > 6$, then $d(w, u + 11111111) < 2$ and so $w$ may be decoded to $u + 11111111$. Therefore no more than half the codewords need to be examined to find a codeword closest to $w$, if one exists. Note that 11111111 is always a codeword, so that $u + 11111111$ is a codeword if $u$ is a codeword.

Chapter 8

Decimal Codes

8.1 The ISBN Code

The International Standard Book Number is the number displayed on the back of a book that uniquely identifies the book. An example of such a number is 0-201-34292-8. It is always made up of 10 digits, say $x_1 x_2 x_3 \cdots x_{10}$, with the last digit being a parity check digit. The first 9 digits are divided into three groups as follows.

Group Identifier This identifies the group of countries where the book was published. It has more digits if the group produces fewer books. As an example x1 = 0 identifies the English speaking countries, x1 = 3 the German speaking countries and x1x2 = 87 Denmark.

Publisher Prefix This is anywhere from 2 to 7 digits and identifies the publisher.

Title Number It is 1 to 6 digits in length and is assigned by the publisher.

Parity Check

The tenth digit, $x_{10}$, is chosen so that
$$1x_1 + 2x_2 + 3x_3 + \cdots + 9x_9 + 10x_{10} \equiv 0 \pmod{11},$$
that is,
$$1x_1 + 2x_2 + \cdots + 9x_9 \equiv -10x_{10} \equiv 1 \cdot x_{10} \pmod{11}.$$

If $x_{10} = 10$, then X is used in the ISBN.

Example 8.1.1. Given 0-201-34292-8, check that it is a valid ISBN. To do this we compute
$$1 \cdot 0 + 2 \cdot 2 + 3 \cdot 0 + 4 \cdot 1 + 5 \cdot 3 + 6 \cdot 4 + 7 \cdot 2 + 8 \cdot 9 + 9 \cdot 2 + 10 \cdot 8$$
$$= 0 + 4 + 0 + 4 + 15 + 24 + 14 + 72 + 18 + 80 \equiv 4 + 4 + 4 + 2 + 3 + 6 + 7 + 3 \equiv 0 \pmod{11}.$$
Therefore it is a valid ISBN.

The ISBN code has the following properties.

• The ISBN code detects all single errors.
Suppose the ISBN is $x_1 x_2 \cdots x_{10}$, but $x_i$ is recorded wrongly as $x_i + e$, $e \ne 0$. The sum then becomes
$$1 \cdot x_1 + 2 \cdot x_2 + \cdots + 10 x_{10} + i \cdot e \equiv 0 + i \cdot e \pmod{11}.$$
Suppose $i \cdot e \equiv 0 \pmod{11}$. This implies that $11 \mid i \cdot e$, so that $11 \mid i$ or $11 \mid e$; but neither $i$ nor $e$ is congruent to 0 mod 11. Therefore we will have a nonzero sum if an error occurred.

• It is possible to correct an error if you know which digit is in error. We show this by way of an example.

Example 8.1.2. Find $x$ if 01383x0943 is an ISBN. Since it is an ISBN we know that
$$1 \cdot 0 + 2 \cdot 1 + 3 \cdot 3 + 4 \cdot 8 + 5 \cdot 3 + 6 \cdot x + 7 \cdot 0 + 8 \cdot 9 + 9 \cdot 4 + 10 \cdot 3 \equiv 6x + 9 \equiv 0 \pmod{11},$$
$$\therefore 6x \equiv -9 \equiv 2 \pmod{11},$$
$$x \equiv 4 \pmod{11}.$$
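Both the validity check and the missing-digit recovery are one-line computations; a minimal sketch (the helper name is ours, not part of any ISBN standard library).

```python
# ISBN-10 check (sketch): the weighted sum 1*x1 + ... + 10*x10 must be
# divisible by 11; the character 'X' stands for the digit value 10.
def isbn_sum(isbn):
    digits = [10 if c == 'X' else int(c) for c in isbn if c not in '- ']
    return sum(i * d for i, d in enumerate(digits, start=1)) % 11

print(isbn_sum('0-201-34292-8'))  # 0, so this is a valid ISBN

# Recovering an unknown digit, as in Example 8.1.2: try all candidates.
for x in range(11):
    trial = '01383' + ('X' if x == 10 else str(x)) + '0943'
    if isbn_sum(trial) == 0:
        print(x)  # 4
```

Since $6x \equiv 2 \pmod{11}$ has a unique solution in $\mathbb{Z}_{11}$, exactly one candidate passes the test.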

• The ISBN code detects any single error arising from transposing two digits (adjacent or not).
Let $x_1 x_2 \cdots x_{10}$ be the correct ISBN number. Assume that $x_i$ and $x_j$ get interchanged in the recording of the number ($i < j$). The check sum becomes
$$1 x_1 + 2 x_2 + \cdots + i x_j + \cdots + j x_i + \cdots + 10 x_{10}$$
$$= 1 x_1 + 2 x_2 + \cdots + i x_i + \cdots + j x_j + \cdots + 10 x_{10} + i(x_j - x_i) + j(x_i - x_j)$$
$$\equiv 0 + j(x_i - x_j) - i(x_i - x_j) \equiv (x_i - x_j)(j - i) \pmod{11}.$$
We are implicitly assuming that $x_i \ne x_j$ (otherwise no error occurs) and furthermore $j - i \ne 0$. Therefore $(x_i - x_j)(j - i)$ is a product of two nonzero numbers in $\mathbb{Z}_{11}$ and so is not equal to zero (since $\mathbb{Z}_{11}$ is a field).

8.2 A Single Error Correcting Decimal Code

Definition 8.2.1 (Distance). If $u$ and $v$ are words in any code (binary or otherwise), then the distance between $u$ and $v$, $d(u, v)$, is the number of places where $u$ and $v$ differ.

The code that we are considering in this section has the following properties.

• It is made up of 10 digits: $x_1 x_2 \cdots x_9 x_{10}$.
• $x_1$ through $x_8$ are information digits.
• $x_9$ and $x_{10}$ are parity check digits chosen such that
$$S_1 = \sum_{i=1}^{10} i x_i \equiv 0 \pmod{11}, \qquad S_2 = \sum_{i=1}^{10} x_i \equiv 0 \pmod{11}.$$

The calculation of the parity digits is illustrated in the following example.

Example 8.2.2. Let $x_1 x_2 \cdots x_8$ be 02013429. We need to choose $x_9$ and $x_{10}$ such that
$$S_1 = 1 \cdot 0 + 2 \cdot 2 + 3 \cdot 0 + 4 \cdot 1 + 5 \cdot 3 + 6 \cdot 4 + 7 \cdot 2 + 8 \cdot 9 + 9 x_9 + 10 x_{10} \equiv 0 \pmod{11},$$
$$S_2 = 0 + 2 + 0 + 1 + 3 + 4 + 2 + 9 + x_9 + x_{10} \equiv 0 \pmod{11}.$$
This simplifies to

$$9 x_9 + 10 x_{10} \equiv 10 \pmod{11},$$
$$x_9 + x_{10} \equiv 1 \pmod{11}.$$

Adding the two equations together we get $10 x_9 \equiv 0 \pmod{11}$, which implies that $x_9 = 0$. From this we get $x_{10} = 1$.

So in general we will find that
$$9 x_9 + 10 x_{10} \equiv a \pmod{11},$$
$$x_9 + x_{10} \equiv b \pmod{11}.$$
Adding the equations together leads to
$$10 x_9 \equiv a + b \pmod{11},$$
$$\therefore x_9 \equiv 10(a + b) \equiv -(a + b) \pmod{11},$$
$$\therefore x_{10} \equiv a + 2b \pmod{11}.$$
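The closed forms $x_9 \equiv -(a + b)$ and $x_{10} \equiv a + 2b$ make check-digit computation mechanical; a short sketch using the information digits of Example 8.2.2.

```python
# Computing the check digits x9, x10 for the information digits of
# Example 8.2.2 (02013429), using x9 = -(a+b) and x10 = a + 2b mod 11.
info = [0, 2, 0, 1, 3, 4, 2, 9]

a = -sum(i * x for i, x in enumerate(info, start=1)) % 11
b = -sum(info) % 11

x9 = (-(a + b)) % 11
x10 = (a + 2 * b) % 11
print(x9, x10)  # 0 1, as found in the example
```

Here $a = 10$ and $b = 1$, matching the right-hand sides of the two congruences above.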

Suppose now that $x_1 x_2 \cdots x_{10}$ is sent and a single error $e \ne 0$ occurs in the $i$'th digit. Then the received word becomes $x_1 x_2 \cdots x_{i-1} (x_i + e) x_{i+1} \cdots x_{10}$. Therefore
$$S_1 = 1 x_1 + 2 x_2 + \cdots + i x_i + \cdots + 10 x_{10} + ie \equiv 0 + ie \pmod{11},$$
$$S_2 = x_1 + x_2 + \cdots + x_i + \cdots + x_{10} + e \equiv 0 + e \pmod{11}.$$
So the second equation gives the magnitude of the error, $e$. Once we know this we can also calculate the position of the error: $i \equiv e^{-1} S_1 \equiv S_2^{-1} S_1 \pmod{11}$.

Therefore the decoding may be summed up as follows.

1. If the received word is $r$, then compute the syndrome
$$rH = (S_1, S_2) \pmod{11}, \quad \text{where} \quad H = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ \vdots & \vdots \\ 10 & 1 \end{bmatrix}.$$

2. If $S_1 = S_2 = 0$, then assume $r$ is a codeword.

3. If $S_1 \ne 0$ and $S_2 \ne 0$, then assume an error of magnitude $e \equiv S_2 \pmod{11}$ has occurred in location $i \equiv S_2^{-1} S_1 \pmod{11}$, and decode $r$ as $x_1 x_2 \cdots x_{i-1} (x_i - e) x_{i+1} \cdots x_{10}$.

4. If $S_1 = 0$ and $S_2 \ne 0$, or $S_1 \ne 0$ and $S_2 = 0$, then at least two errors have occurred, so request retransmission.

Note that point 4 above always occurs when two different digits are transposed ($S_1 \ne 0$ and $S_2 = 0$). Therefore this code can detect all errors involving the transposition of two digits.
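The four decoding rules above can be sketched directly; assumptions: digits are integers, and we use the codeword 0201342901 from Example 8.2.2 as a test vector.

```python
# Sketch of the syndrome decoder of Section 8.2 for a single error:
# S2 gives the error magnitude e, and i = S2^{-1} * S1 its location.
def decode(r):
    S1 = sum(i * x for i, x in enumerate(r, start=1)) % 11
    S2 = sum(r) % 11
    if S1 == 0 and S2 == 0:
        return list(r)                      # assume r is a codeword
    if S1 != 0 and S2 != 0:
        e = S2
        i = (pow(S2, -1, 11) * S1) % 11     # position of the error
        fixed = list(r)
        fixed[i - 1] = (fixed[i - 1] - e) % 11
        return fixed
    return None                             # >= 2 errors: retransmit

codeword = [0, 2, 0, 1, 3, 4, 2, 9, 0, 1]   # from Example 8.2.2
garbled = codeword.copy()
garbled[4] += 5                             # error e = 5 in position 5
print(decode(garbled))  # recovers the original codeword
```

The modular inverse is computed with Python's three-argument `pow` (available since Python 3.8).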

8.3 A Double Error Correcting Decimal Code

This code has the following properties.

• Length $n = 10$.
• Information digits are the first 6 digits, $x_1 x_2 \cdots x_6$.
• The last four digits, $x_7 x_8 x_9 x_{10}$, are the parity check digits. They are chosen so that
\[
S_1 = \sum_{i=1}^{10} i x_i \equiv 0, \qquad
S_2 = \sum_{i=1}^{10} x_i \equiv 0, \qquad
S_3 = \sum_{i=1}^{10} i^2 x_i \equiv 0, \qquad
S_4 = \sum_{i=1}^{10} i^3 x_i \equiv 0 \pmod{11}.
\]

The digits x1 through x6 will be known since they are information digits. Therefore the equations above will assume the form

\[
7x_7 + 8x_8 + 9x_9 + 10x_{10} \equiv a \pmod{11},
\]
\[
x_7 + x_8 + x_9 + x_{10} \equiv b \pmod{11},
\]
\[
7^2 x_7 + 8^2 x_8 + 9^2 x_9 + 10^2 x_{10} \equiv c \pmod{11},
\]
\[
7^3 x_7 + 8^3 x_8 + 9^3 x_9 + 10^3 x_{10} \equiv d \pmod{11}.
\]

Here a, b, c and d are elements of Z11 with

\[
a \equiv -\sum_{i=1}^{6} i x_i, \qquad
b \equiv -\sum_{i=1}^{6} x_i, \qquad
c \equiv -\sum_{i=1}^{6} i^2 x_i, \qquad
d \equiv -\sum_{i=1}^{6} i^3 x_i \pmod{11}.
\]

Suppose that $x_1 x_2 \cdots x_{10}$ is sent and that a single error occurs in location $i$ and is of size $e \neq 0$. This situation is exactly the same as in the previous section and we will be able to correct this single error if we know that only this error occurred. It turns out that it is indeed possible to detect the presence of only one error. To do this we examine the general case of two errors first.

Suppose therefore that two errors occur in locations $i$ and $j$ with sizes $e_1 \neq 0$ and $e_2 \neq 0$. The presence of these errors affects the parity check equations above as follows.
\[
S_1 \equiv i e_1 + j e_2, \qquad
S_2 \equiv e_1 + e_2, \qquad
S_3 \equiv i^2 e_1 + j^2 e_2, \qquad
S_4 \equiv i^3 e_1 + j^3 e_2 \pmod{11}.
\]

It can be shown that i and j are the roots (in Z11) of

$ax^2 + bx + c$, where

\[
a = S_1^2 - S_2 S_3, \qquad
b = S_2 S_4 - S_1 S_3, \qquad
c = S_3^2 - S_1 S_4.
\]
This quadratic can be solved using a formula analogous to the one used for real numbers. The derivation of this formula is done below. Note that all calculations are in $\mathbb{Z}_{11}$ and that the square root used here may not exist. The root represents the number in $\mathbb{Z}_{11}$ that upon squaring yields the number under the root sign.

\[
ax^2 + bx + c = 0, \quad a \neq 0,
\]
\[
x^2 + a^{-1}bx + a^{-1}c = 0,
\]
\[
x^2 + a^{-1}bx + (2^{-1}a^{-1}b)^2 - (2^{-1}a^{-1}b)^2 + a^{-1}c = 0,
\]
\[
(x + 2^{-1}a^{-1}b)^2 = (2^{-1}a^{-1}b)^2 - a^{-1}c,
\]
\[
x + 2^{-1}a^{-1}b = \pm\sqrt{2^{-2}a^{-2}b^2 - a^{-1}c},
\]
\[
x = -2^{-1}a^{-1}b \pm \sqrt{(2^{-1}a^{-1}b)^2 - a^{-1}c} = \left[-b \pm \sqrt{b^2 - 4ac}\,\right](2a)^{-1}.
\]
So, using this formula we can find $i$ and $j$ and then solve

\[
S_1 \equiv i e_1 + j e_2, \qquad S_2 \equiv e_1 + e_2 \pmod{11}
\]
for $e_1$ and $e_2$. The decoding can be summed up as follows. Let

\[
H = \begin{pmatrix}
1 & 1 & 1^2 & 1^3 \\
2 & 1 & 2^2 & 2^3 \\
3 & 1 & 3^2 & 3^3 \\
4 & 1 & 4^2 & 4^3 \\
\vdots & \vdots & \vdots & \vdots \\
10 & 1 & 10^2 & 10^3
\end{pmatrix}.
\]
If $r$ is the received word then

1. Compute the syndrome

\[
rH = (S_1, S_2, S_3, S_4) = S.
\]

2. If $S = 0$ then a codeword was received.

3. If $S \neq 0$ and $a = b = c = 0$ then a single error occurred, in digit $i = S_1 S_2^{-1}$ and with size $e = S_2$. This may be seen to hold by realizing that a single error implies that $S_1 = ie_1$, $S_2 = e_1$, $S_3 = i^2 e_1$ and $S_4 = i^3 e_1$.

4. If $S \neq 0$, $a \neq 0$, $c \neq 0$ and $b^2 - 4ac$ is a square in $\mathbb{Z}_{11}$, then there are two errors $e_1$ and $e_2$ in locations $i$ and $j$ as before.

5. Otherwise, at least three errors are detected, so retransmit.

Example 8.3.1. Suppose the received word is 3254571396. Then

\[
S_1 = 1 \cdot 3 + 2 \cdot 2 + 3 \cdot 5 + 4 \cdot 4 + \cdots + 10 \cdot 6 \equiv 2 \pmod{11},
\]
\[
S_2 = 3 + 2 + 5 + \cdots + 6 \equiv 1 \pmod{11},
\]
\[
S_3 = 1^2 \cdot 3 + 2^2 \cdot 2 + \cdots + 10^2 \cdot 6 \equiv 10 \pmod{11},
\]
\[
S_4 = 1^3 \cdot 3 + 2^3 \cdot 2 + \cdots + 10^3 \cdot 6 \equiv 3 \pmod{11}.
\]
So that

\[
a = S_1^2 - S_2 S_3 = 2^2 - 1 \cdot 10 \equiv 5 \pmod{11},
\]
\[
b = S_2 S_4 - S_1 S_3 = 1 \cdot 3 - 2 \cdot 10 \equiv 5 \pmod{11},
\]
\[
c = S_3^2 - S_1 S_4 = 10^2 - 2 \cdot 3 \equiv 6 \pmod{11}.
\]
Then $b^2 - 4ac = 5^2 - 4 \cdot 5 \cdot 6 = 25 - 120 \equiv 4 \pmod{11}$ is a square. Thus
\[
i, j = (-5 \pm 2)(2 \cdot 5)^{-1} = (-5 \pm 2) \cdot 10 \quad (\text{since } 10^{-1} = 10) \quad = 3, 7.
\]

Now

\[
S_1 = 3e_1 + 7e_2 \equiv 2 \pmod{11}, \qquad
S_2 = e_1 + e_2 \equiv 1 \pmod{11}.
\]
Adding 4 times $S_2$ to $S_1$ gives $7e_1 \equiv 6 \pmod{11}$, implying that $e_1 = 4$ and $e_2 = 8$. Thus the decoded word is 3214574396.
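The whole procedure (steps 1 through 5), run on the received word of Example 8.3.1, can be sketched as follows; `decode2` is our name for the helper, and the three-argument `pow` needs Python 3.8+.

```python
# Decoding for the double error correcting decimal code,
# assuming at most two errors occurred.
def decode2(r):
    """r: list of 10 received digits. Returns corrected digits,
    or None to request retransmission."""
    s1, s2, s3, s4 = [sum(pow(i + 1, k, 11) * x for i, x in enumerate(r)) % 11
                      for k in (1, 0, 2, 3)]          # S1, S2, S3, S4
    if s1 == s2 == s3 == s4 == 0:
        return r                                      # codeword received
    a = (s1 * s1 - s2 * s3) % 11
    b = (s2 * s4 - s1 * s3) % 11
    c = (s3 * s3 - s1 * s4) % 11
    out = list(r)
    if a == b == c == 0:                              # single error
        i, e = s1 * pow(s2, -1, 11) % 11, s2
        out[i - 1] = (out[i - 1] - e) % 11
        return out
    disc = (b * b - 4 * a * c) % 11
    root = next((x for x in range(1, 11) if x * x % 11 == disc), None)
    if a and c and root:                              # two distinct locations
        inv = pow(2 * a, -1, 11)                      # (2a)^(-1) mod 11
        i, j = (-b + root) * inv % 11, (-b - root) * inv % 11
        e2 = (s1 - i * s2) * pow(j - i, -1, 11) % 11  # from S1 = i e1 + j e2
        e1 = (s2 - e2) % 11                           # from S2 = e1 + e2
        out[i - 1] = (out[i - 1] - e1) % 11
        out[j - 1] = (out[j - 1] - e2) % 11
        return out
    return None                                       # three or more errors
```

Running it on 3254571396 reproduces the decoded word 3214574396 of the example.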

8.4 The Universal Product Code (UPC)

This code has the following properties

• All codewords have length 12: $x_1 x_2 x_3 \cdots x_{12}$.
• The first digit specifies the type of product.

• $x_2 x_3 \cdots x_6$ specifies the manufacturer.
• $x_7 \cdots x_{11}$ is a product number.
• The check digit, $x_{12}$, is determined by
\[
3(x_1 + x_3 + x_5 + x_7 + x_9 + x_{11}) + x_2 + x_4 + x_6 + x_8 + x_{10} + x_{12} \equiv 0 \pmod{10}.
\]

Example 8.4.1. A 50 cents off Kellogg's Rice Krispies coupon has UPC $53800051150x_{12}$. Therefore

\[
3(5 + 8 + 0 + 5 + 1 + 0) + 3 + 0 + 0 + 1 + 5 + x_{12} \equiv 7 + 9 + x_{12} \equiv 0 \pmod{10}.
\]

Thus $x_{12} = 4$.

Suppose that a single error of size $e \neq 0$ occurs in the $i$'th digit. The check sum either changes by $e$ or by $3e$, depending on $i$ being even or odd. Since $\gcd(10, 3) = 1$, $10 \mid 3e \iff 10 \mid e$. Further $1 \leq e \leq 9$, so that 10 does not divide $e$. Thus the error is detectable since we have a nonzero checksum. Note that the error cannot be corrected unless you know which digit is in error.

If adjacent digits $x_i$ and $x_j$ are transposed then the change to the check sum is $-3x_i - x_j + x_i + 3x_j = 2(x_j - x_i)$. This type of error is detected unless $(x_j - x_i) = \pm 5$.
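The check-digit computation can be sketched as a small helper (the name `upc_check_digit` is ours):

```python
# UPC check equation: 3*(odd positions) + (even positions) = 0 (mod 10).
def upc_check_digit(first11):
    """first11: digits x1..x11. Returns the check digit x12."""
    odd = sum(first11[0::2])     # x1, x3, ..., x11 (weight 3)
    even = sum(first11[1::2])    # x2, x4, ..., x10 (weight 1)
    return -(3 * odd + even) % 10
```

Applied to the coupon of Example 8.4.1 this gives $x_{12} = 4$.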

8.5 US Money Order

The code employed for US money orders has the following properties.

• The codewords have length 11: $x_1 x_2 \cdots x_{11}$.
• The check digit, $x_{11}$, is defined by
\[
x_1 x_2 x_3 \cdots x_{10} \equiv x_{11} \pmod 9.
\]
Here $x_1 x_2 x_3 \cdots x_{10}$ represents the number with the digits $x_1, x_2, \ldots, x_{10}$ and is not the product of the numbers $x_1, x_2, \ldots, x_{10}$. Also $0 \leq x_{11} \leq 8$.
• The check equation is equivalent to
\[
x_1 + x_2 + x_3 + \cdots + x_{10} \equiv x_{11} \pmod 9.
\]
The proof of this is in assignment 3 in the appendix.

Should an error occur in $x_{11}$ then it will be detected, since $x_1 + x_2 + x_3 + \cdots + x_{10}$ will not be equivalent to $x_{11} + e \pmod 9$. If an error occurs in the first 10 digits then it will be detected unless a 0 is changed into a 9 or a 9 is changed into a 0. To estimate the probability of this we proceed as follows. The probability that the $i$'th digit is a 0 is $10^9/10^{10} = 0.1$. The probability of the error being a 9 is $1/9$. Therefore the probability of changing a 0 into a 9 is $1/90$. Similarly we find that the probability of changing a 9 into a 0 is also $1/90$. So, the probability of an undetected error in the $i$'th ($1 \leq i \leq 10$) digit is $2/90$.
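The mod 9 check, and the digit-sum equivalence noted above, can be sketched as (helper name is ours):

```python
# US money order check digit: the number formed by the first ten
# digits, reduced mod 9 (equivalently, their digit sum mod 9).
def money_order_check(digits10):
    n = int("".join(map(str, digits10)))
    # The check equation is equivalent to summing the digits:
    assert n % 9 == sum(digits10) % 9
    return n % 9
```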

8.6 US Postal Code

The code used by the US postal service is the following.

• Each codeword is of length 10: $x_1 x_2 \cdots x_{10}$.
• The code is printed in a bar-coded format on the envelope. The bar-code is delimited by two long bars on either side. Each decimal digit is represented by a group of 5 long and short bars.

• Each such group of 5 bars has exactly 2 long and 3 short bars. Thus there are $\binom{5}{2} = 10$ such groups, one for each decimal digit.

• We can think of a long bar as representing a 1 and a short bar as representing a 0. Then the correspondence between the bars and the decimal digits is given by: if $abcde$ is the binary sequence of bars then $7a + 4b + 2c + d$ is the decimal digit associated with it (with 11000, whose value 11 is out of range, representing 0). In tabular form this is

Binary   Decimal
00011    1
00101    2
00110    3
01001    4
01010    5
01100    6
10001    7
10010    8
10100    9
11000    0

For example, the group of bars corresponding to the binary sequence 10010 represents the digit 8.

• A parity check digit $x_{10}$ is added to the bar-code so that $x_1 + x_2 + \cdots + x_{10} \equiv 0 \pmod{10}$.
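The bar-to-digit correspondence can be sketched as (the helper name is ours):

```python
# Decode one 5-bar group of the US postal bar code: long bar = 1,
# short bar = 0, and abcde stands for the digit 7a + 4b + 2c + d,
# with 11000 (value 11) standing for 0.
def bars_to_digit(bars):
    a, b, c, d, e = bars
    assert sum(bars) == 2, "a valid group has exactly two long bars"
    value = 7 * a + 4 * b + 2 * c + d
    return 0 if value == 11 else value
```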

Suppose that the scanner makes a single error when reading the code (i.e. it reads a 1 as a 0 or a 0 as a 1). Then some block of length 5 doesn't have exactly 2 ones, so the block itself can be identified. If one error occurs in the $i$'th block, then the decimal digit associated with it will be undefined. This digit can be recovered from the parity check equation since we now know which digit is in error.

Chapter 9

Hadamard Codes

Suppose we want a 2-error correcting code, C, of length 11. Then according to the Hamming bound we have

\[
|C| \leq \frac{2^{11}}{\binom{11}{0} + \binom{11}{1} + \binom{11}{2}} \approx 30.57.
\]

Therefore $C$ can have at most 30 codewords. Since $2^4 < 30 < 2^5$, the largest binary, linear 2-error correcting code of length 11 will have at most $2^4 = 16$ codewords. On the other hand there exists a nonlinear code of length 11 that is 2-error correcting and that has 24 codewords, a definite improvement. This is a Hadamard code, and its construction is presented at the end of this chapter. To be able to construct these codes we need a fair amount of information on Hadamard matrices. After we have constructed these matrices we use their rows as the codewords for the Hadamard code.
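The bound just quoted is a quick computation:

```python
# Hamming (sphere-packing) bound for a 2-error correcting code of
# length 11: |C| <= 2^11 / (C(11,0) + C(11,1) + C(11,2)).
from math import comb

denominator = comb(11, 0) + comb(11, 1) + comb(11, 2)  # 1 + 11 + 55 = 67
bound = 2**11 / denominator                            # about 30.57
```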

Definition 9.0.1 ((n, M, d)-code). An $(n, M, d)$-code is a block code (linear or nonlinear) that has length $n$, $M$ codewords and minimum distance $d$.

9.1 Background

Definition 9.1.1 (Hadamard matrix). A Hadamard matrix, $H$, of order $n$ is an $n \times n$ matrix of $+1$'s and $-1$'s such that
\[
HH^T = nI.
\]

Therefore any two distinct rows of a Hadamard matrix are orthogonal (using the real inner product). Furthermore, multiplying any row or column of a Hadamard matrix by $-1$ changes it into another Hadamard matrix. This allows us to put a given Hadamard matrix into the so-called normalised form, where the first row and first column only contain $+1$'s.

Example 9.1.2. Normalised Hadamard matrices of orders 1, 2, 4 and 8 are shown below. Note that $+1$'s are only shown as $+$, while $-1$'s are shown as $-$.

\[
n = 1: H_1 = [+], \qquad
n = 2: H_2 = \begin{pmatrix} + & + \\ + & - \end{pmatrix},
\]
\[
n = 4: H_4 = \begin{pmatrix}
+ & + & + & + \\
+ & - & + & - \\
+ & + & - & - \\
+ & - & - & +
\end{pmatrix}, \qquad
n = 8: H_8 = \begin{pmatrix}
+ & + & + & + & + & + & + & + \\
+ & - & + & - & + & - & + & - \\
+ & + & - & - & + & + & - & - \\
+ & - & - & + & + & - & - & + \\
+ & + & + & + & - & - & - & - \\
+ & - & + & - & - & + & - & + \\
+ & + & - & - & - & - & + & + \\
+ & - & - & + & - & + & + & -
\end{pmatrix}.
\]

Theorem 9.1.3. If a Hadamard matrix of order n exists, then n is 1,2 or a multiple of 4.

Proof. By the previous example we know that Hadamard matrices of order 1 and 2 exist. Let $H$ be a Hadamard matrix of order $n > 2$. Put $H$ in normalised form; this does not affect $n$. Therefore the first row of $H$ is made up entirely of $+1$'s. The inner product of the first row with any other row has to be zero, but this inner product is just the sum of the entries of the other row. Therefore all rows of $H$ (except the first) have an equal number of $+1$'s and $-1$'s. By permuting the columns of $H$ we can change the second row into one that has as its first $n/2$ entries all $+1$'s and as its second $n/2$ entries all $-1$'s. Therefore the first two rows of $H$ are as follows.

\[
\begin{array}{cc}
\overbrace{+ + \cdots + +}^{n/2} & \overbrace{+ + \cdots + +}^{n/2} \\
\underbrace{+ + \cdots + +}_{n/2} & \underbrace{- - \cdots - -}_{n/2}
\end{array}
\]

Let $u$ be any row of $H$ except the first two. Again $u$ has $n/2$ $+1$'s and $n/2$ $-1$'s. Say $u$ has $x$ $+1$'s in its first $n/2$ positions (so that it has $(n/2 - x)$ $-1$'s in its first $n/2$ positions) and $y$ $+1$'s in its second $n/2$ positions (and therefore $(n/2 - y)$ $-1$'s in its second $n/2$ positions). Since the inner product between $u$ and the second row is zero we have

\[
x - (n/2 - x) - y + (n/2 - y) = 2x - 2y = 0, \qquad \therefore\ x - y = 0.
\]
Further, the inner product between $u$ and the first row is also zero and this leads to

\[
x - (n/2 - x) + y - (n/2 - y) = 2x + 2y - n = 0, \qquad \therefore\ x + y = n/2.
\]
Adding these two equations we find $2x = n/2$, or $n = 4x$. $\square$

We now present two constructions of Hadamard matrices that prove to be useful from a coding standpoint.

Construction 1

If Hn is a Hadamard matrix of order n, then

\[
H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}
\]
is a Hadamard matrix of order $2n$. To see that this construction works note that
\[
H_{2n} H_{2n}^T
= \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}
  \begin{pmatrix} H_n^T & H_n^T \\ H_n^T & -H_n^T \end{pmatrix}
= \begin{pmatrix} H_n H_n^T + H_n H_n^T & H_n H_n^T - H_n H_n^T \\ H_n H_n^T - H_n H_n^T & H_n H_n^T + H_n H_n^T \end{pmatrix}
= \begin{pmatrix} 2nI & 0 \\ 0 & 2nI \end{pmatrix} = 2nI.
\]
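The doubling construction can be sketched directly (helper names are ours):

```python
# Sylvester's doubling construction: H_{2n} = [[H, H], [H, -H]],
# starting from H_1 = [1].
def sylvester(k):
    """Hadamard matrix of order 2**k as a list of rows of +-1."""
    h = [[1]]
    for _ in range(k):
        h = ([row + row for row in h] +
             [row + [-x for x in row] for row in h])
    return h

def is_hadamard(h):
    """Check H H^T = nI entry by entry."""
    n = len(h)
    return all(sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
               for i in range(n) for j in range(n))
```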

Starting with $H_1 = [1]$, this gives $H_2, H_4, H_8, \ldots$. These are Hadamard matrices whose orders are powers of two and are known as Sylvester matrices. For the second construction we need some facts about quadratic residues.

Definition 9.1.4 (Quadratic residue). Let $p$ be an odd prime. The nonzero squares modulo $p$: $1^2, 2^2, 3^2, \ldots$ reduced $\pmod p$, are called quadratic residues $\pmod p$ (or just residues $\pmod p$).

To find the residues (mod p) it is enough to consider

\[
1^2, 2^2, \ldots, \left(\frac{p-1}{2}\right)^2 \pmod p,
\]
since $(p-a)^2 = p^2 - 2pa + a^2 \equiv a^2 \pmod p$. These are all distinct, for if $i^2 \equiv j^2 \pmod p$ with $1 \leq i, j \leq (p-1)/2$, then $(i-j)(i+j) = i^2 - j^2 \equiv 0 \pmod p$. That is, $p \mid (i-j)$ or $p \mid (i+j)$. In the first case $i \equiv j \pmod p$ and in the latter case $i \equiv -j \pmod p$; but $(p-1)/2 + 1 \leq p - j \leq p-1$ while $1 \leq i \leq (p-1)/2$, so $i^2 \equiv j^2 \pmod p$ is only possible if $i \equiv j \pmod p$. Therefore there are $(p-1)/2$ quadratic residues $\pmod p$. The remaining $(p-1)/2$ nonzero numbers are called nonresidues.

Example 9.1.5. For $p = 11$, the quadratic residues are

\[
1^2 = 1, \quad 2^2 = 4, \quad 3^2 = 9, \quad 4^2 = 16 \equiv 5 \pmod{11} \quad \text{and} \quad 5^2 = 25 \equiv 3 \pmod{11}.
\]

We need the following properties of quadratic residues.

1. The product of two quadratic residues or of two nonresidues is a residue, while the product of a residue and a nonresidue is a nonresidue.

2. If $p$ is of the form $4k + 1$, $-1$ is a quadratic residue $\pmod p$. If $p$ is of the form $4k + 3$, $-1$ is a nonresidue $\pmod p$.

3. We define the following function on Zp, also known as the Legendre symbol (but we use a different notation to simplify later expressions).

\[
\chi(i) = \begin{cases}
0 & \text{if } i \equiv 0 \pmod p, \\
1 & \text{if } i \text{ is a residue}, \\
-1 & \text{if } i \text{ is a nonresidue}.
\end{cases}
\]

Note that $\chi(x)\chi(y) = \chi(xy)$, where $x, y \in \mathbb{Z}_p$. We also need the following theorem.

Theorem 9.1.6. For $c \in \mathbb{Z}_p$ and $c \neq 0$,
\[
\sum_{b=0}^{p-1} \chi(b)\chi(b+c) = -1.
\]

Construction 2 (The Paley Construction) This construction produces a Hadamard matrix of order n = p + 1, where p is a prime of the form 4k + 3 (i.e. n is a multiple of four).

(i) First construct the Jacobsthal matrix $Q = (q_{ij})$. This is a $p \times p$ matrix whose rows and columns are labelled $0, 1, 2, \ldots, p-1$ and $q_{ij} = \chi(j - i)$.

Note that $q_{ij} = \chi(j-i) = \chi(-1)\chi(i-j) = -q_{ji}$, since $p$ is of the form $4k+3$ and by property 2, $-1$ is a nonresidue. That is, $Q$ is skew-symmetric: $Q^T = -Q$.

(ii) We need the following Lemma.

Lemma 9.1.7. $QQ^T = pI - J$ and $QJ = JQ = 0$, where $J$ is the matrix all of whose entries are 1.

(iii) Let

\[
H = \begin{pmatrix} 1 & \mathbf{j} \\ \mathbf{j}^T & Q - I \end{pmatrix},
\]
where $\mathbf{j}$ is a row vector of all 1's. Therefore $H$ is a $(p + 1) \times (p + 1)$ matrix. We now have
\[
HH^T = \begin{pmatrix} 1 & \mathbf{j} \\ \mathbf{j}^T & Q - I \end{pmatrix}
\begin{pmatrix} 1 & \mathbf{j} \\ \mathbf{j}^T & Q^T - I \end{pmatrix}
= \begin{pmatrix} p + 1 & 0 \\ 0 & J + (Q-I)(Q^T-I) \end{pmatrix}.
\]
From the Lemma it follows that

\[
\begin{aligned}
J + (Q-I)(Q^T-I) &= J + QQ^T - Q - Q^T + I \\
&= J + pI - J - Q - Q^T + I \\
&= (p+1)I - Q - Q^T \\
&= (p+1)I + Q^T - Q^T = (p+1)I.
\end{aligned}
\]
Therefore
\[
HH^T = \begin{pmatrix} p+1 & 0 \\ 0 & (p+1)I \end{pmatrix} = (p+1)I.
\]
Thus $H$ is a normalised Hadamard matrix of order $p + 1$, which is said to be of Paley type.

9.2 Definition of the Codes

We now (finally) come to the definition of the codes themselves.

Let $H_n$ be a normalised Hadamard matrix of order $n$. Replace $+1$'s by 0's and $-1$'s by 1's in $H_n$. Then $H_n$ is changed into the binary Hadamard matrix $A_n$. Since the rows of $H_n$ are orthogonal, any two rows of $H_n$ (and therefore of $A_n$) agree in $n/2$ places and also differ in $n/2$ places. Therefore any two rows of $A_n$ are a Hamming distance $n/2$ apart. $A_n$ gives rise to three Hadamard codes.

1. An $(n-1, n, n/2)$-code, $A_n$, consisting of the rows of $A_n$ with the first column deleted.

2. An $(n-1, 2n, (n/2)-1)$-code, $B_n$, consisting of $A_n$ together with the complements of all its codewords.

3. An $(n, 2n, n/2)$-code, $C_n$, consisting of the rows of $A_n$ and their complements.
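Combining the Paley construction above with these definitions, the following sketch (helper names are ours) builds $H_{12}$ from $p = 11$, forms the binary matrix, and checks the parameters of the first two codes.

```python
# Paley construction of H_{p+1} for a prime p = 4k + 3, then the
# Hadamard codes obtained from its binary version.
def paley(p):
    residues = {i * i % p for i in range(1, p)}
    chi = lambda x: 0 if x % p == 0 else (1 if x % p in residues else -1)
    h = [[1] * (p + 1)]                     # first row all +1
    for i in range(p):                      # remaining rows: [1 | Q - I]
        h.append([1] + [chi(j - i) if i != j else -1 for j in range(p)])
    return h

def binary(h):                              # +1 -> 0, -1 -> 1
    return [[0 if x == 1 else 1 for x in row] for row in h]

def min_distance(code):
    return min(sum(x != y for x, y in zip(u, v))
               for i, u in enumerate(code) for v in code[i + 1:])

a12 = [row[1:] for row in binary(paley(11))]        # delete first column
b12 = a12 + [[1 - x for x in w] for w in a12]       # add all complements
# a12 is an (11, 12, 6) code and b12 an (11, 24, 5) code.
```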

Example 9.2.1. Using a $12 \times 12$ Hadamard matrix (constructed using Paley's method) and deleting the first column we find the following codes.

00000000000
11011100010
01101110001
10110111000
01011011100
00101101110
00010110111
10001011011
11000101101
11100010110
01110001011
10111000101
00100011101
10010001110
01001000111
10100100011
11010010001
11101001000
01110100100
00111010010
00011101001
10001110100
01000111010
11111111111

The first twelve rows form the (11, 12, 6) Hadamard code A12 (this is the code referred to at the start of the chapter). All 24 rows form the (11, 24, 5) Hadamard code B12.

9.3 How good are the Hadamard Codes ?

In trying to determine how good the Hadamard codes are we turn to the following bound known as Plotkin’s bound.

Theorem 9.3.1 (Plotkin bound). For any $(n, M, d)$ code $C$ with $d_{\min} = d > n/2$ we have
\[
M \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor.
\]
Proof. We will assume throughout that $M \geq 2$, which is what is needed for a code to be useful. Consider the sum

\[
S = \sum_{u \in C} \sum_{v \in C} d(u, v).
\]
Since $d(u, v) \geq d$ if $u \neq v$, we have $S \geq M(M-1)d$.

Let $A$ be the $M \times n$ matrix whose rows are the codewords of $C$. Suppose the $i$'th column contains $x_i$ 0's and $M - x_i$ 1's. This column contributes $2x_i(M - x_i)$ to $S$. Therefore
\[
S = \sum_{i=1}^{n} 2x_i(M - x_i).
\]
Now $x_i(M - x_i)$ is maximised if $x_i = M/2$, so $S$ is maximised if all $x_i = M/2$.

Suppose first that $M$ is even; then $S \leq nM^2/2$. In this case
\[
M(M-1)d \leq \frac{n}{2}M^2,
\qquad \therefore\ M^2 d - Md - \frac{n}{2}M^2 \leq 0,
\qquad \therefore\ \frac{2d-n}{2}M^2 \leq dM,
\qquad \therefore\ \frac{2d-n}{2}M \leq d,
\qquad \therefore\ M \leq \frac{2d}{2d-n}.
\]

The last step is possible since $d > n/2$. We have that $\lfloor 2x \rfloor \leq 2\lfloor x \rfloor + 1$, so
\[
M \leq 2\left\lfloor \frac{d}{2d-n} \right\rfloor + 1,
\]
but since $M$ is even,
\[
M \leq 2\left\lfloor \frac{d}{2d-n} \right\rfloor.
\]
Suppose now that $M$ is odd. Then $S$ is maximised if all $x_i = (M-1)/2$. In this case
\[
S \leq n \cdot 2 \cdot \frac{M-1}{2} \cdot \frac{M+1}{2} = \frac{n}{2}(M^2 - 1).
\]
Therefore
\[
M(M-1)d \leq \frac{n}{2}(M^2 - 1),
\qquad \therefore\ M(M-1)d \leq \frac{n}{2}(M-1)(M+1),
\qquad \therefore\ Md \leq \frac{n}{2}(M+1),
\]
\[
\therefore\ Md - \frac{n}{2}M \leq \frac{n}{2},
\qquad \therefore\ \frac{2d-n}{2}M \leq \frac{n}{2},
\qquad \therefore\ M \leq \frac{n}{2d-n} = \frac{2d}{2d-n} - 1.
\]
Again using $\lfloor 2x \rfloor \leq 2\lfloor x \rfloor + 1$, we find
\[
M \leq 2\left\lfloor \frac{d}{2d-n} \right\rfloor + 1 - 1 = 2\left\lfloor \frac{d}{2d-n} \right\rfloor. \qquad \square
\]

If we denote by $A(n, d)$ the maximum number of codewords in an $(n, d)$ code, then
\[
A(n, d) \leq 2\left\lfloor \frac{d}{2d-n} \right\rfloor
\]
if $d > n/2$.
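The bound is easy to evaluate (the helper name is ours):

```python
# Plotkin bound: A(n, d) <= 2 * floor(d / (2d - n)), valid for d > n/2.
def plotkin_bound(n, d):
    assert 2 * d > n, "bound requires d > n/2"
    return 2 * (d // (2 * d - n))
```

For instance, for $n = 11$ and $d = 6$ it gives 12, which the Hadamard code $A_{12}$ of Example 9.2.1 attains.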

By considering the code $A_n$ from above (for which this bound holds) we see that $A(n-1, n/2) \geq n$, provided a Hadamard matrix of order $n$ exists. Also, from the theorem,
\[
A(n-1, n/2) \leq 2\left\lfloor \frac{n/2}{n - (n-1)} \right\rfloor = n.
\]
Therefore in this case the Hadamard codes satisfy the bound (if the appropriate Hadamard matrix exists).

Part II

Cryptography

Chapter 10

Introduction to Cryptography

To begin this chapter we define a few basic terms. Cryptography for our purposes will be considered communication in the presence of adversaries. Cryptanalysis is the study of the mathematical methods for defeating cryptosystems. These two fields fall under the heading of Cryptology.

10.1 Basic Definitions

Definition 10.1.1 (Alphabet, Message space). An alphabet, $A$, is a finite set. $A^n$ is the set of all strings of length $n$ over $A$. $A^*$ is the set of all strings of finite length over $A$. The message space, $M$, is the set of strings over $A$.

Definition 10.1.2 (Encryption scheme). An Encryption scheme (or cipher) consists of

• Message spaces $M$ and $C$: the plain-text and cipher-text spaces.
• A key set, $K$ (also called the key space). Often keys are in a (key) pair $k = (e, d)$, where $e$ is used for encryption and $d$ is used for decryption.

• An encryption function $E_k : M \to C$.
• A decryption function $D_k : C \to M$, such that $D_k(E_k(m)) = m$ for all $m \in M$.

We distinguish between the following two classes of adversaries:

1. Passive adversary : This adversary relies on eavesdropping only.

2. Active adversary : This type of adversary may insert or block transmissions.


Our goal is to try to provide secrecy and the ability to detect altered or forged messages. Although the delivery of messages cannot always be guaranteed (meaning that the message may not arrive at all or may not be the original, intended message) we can send messages regularly to discover communications disruptions. "Breaking" a cryptosystem will be taken to mean that the plain-text can be discovered from the cipher-text. This leads to various levels of security being defined. A cryptosystem is

• Unconditionally secure if the adversary can't gain any knowledge about the plain-text (except maybe the length) regardless of the amount of cipher-text available and the amount of computing resources available.

• Computationally secure if it is "computationally infeasible" to discover the plain-text using the best known methods and some specified amount of computing power.

• Provably secure if breaking the system is at least as hard as solving a "difficult" computational problem.

Kerckhoffs (1883) gave the following guidelines for choosing a cipher:

1. The system should be unbreakable in practice.

2. The cipher-text should be easy to transmit.

3. The apparatus should be portable and be of such a nature that one person can operate it.

4. It should be “easy” to operate.

We also have Kerckhoff’s principle :

The security of the cipher should rest with the keys alone. That is, security is maintained even if the adversary knows the encryption scheme.

The possible attacks on a cryptosystem are :

• Cipher-text only attack. Here the adversary attempts to recover plain-text by observing some amount of cipher-text.

• Known plain-text attack. The adversary has some quantity of plain-text and the corresponding cipher-text.

• Chosen plain-text attack. Cipher-text corresponding to plain-text chosen by the adversary is available.

• Chosen cipher-text attack. The plain-text corresponding to cipher-text chosen by the adversary is available.

One of the earliest ciphers is the simple substitution cipher. In this system the keys are just permutations of A. The following example illustrates this. Example 10.1.3. (The shift cipher). Number A, B, C, ... , Z by 0, 1, 2,..., 25. The encryption key e is a fixed shift (mod 26). That is

\[
\alpha \mapsto \alpha + e \pmod{26}.
\]
The decryption key is $d = 26 - e$, the additive inverse $\pmod{26}$. If $e = 3$ then $d = 23$, $E_3(\text{HOCKEY}) = \text{KRFNHB}$ and $D_{23}(\text{KRFNHB}) = \text{HOCKEY}$. This system is completely insecure against chosen plain-text attacks: choose as text A, B, C, ..., Z.
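The shift cipher of Example 10.1.3 is one line of Python (the helper name is ours):

```python
# Shift cipher on upper-case letters: alpha -> alpha + e (mod 26).
# Decryption is the same map with the key d = 26 - e.
def shift(text, e):
    return "".join(chr((ord(ch) - 65 + e) % 26 + 65) for ch in text)
```

For example, `shift("HOCKEY", 3)` gives KRFNHB, and shifting that by $d = 23$ recovers HOCKEY.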

Definition 10.1.4 (Symmetric key encryption). A symmetric key encryption scheme is one in which the effort required to find $D_d$ from $e$ is about the same as that required to find $E_e$ from $d$.

Two examples of symmetric key encryption schemes are the simple substitution cipher where the keys are permutations of the alphabet and the shift cipher where the key specifies the amount by which a letter is shifted mod 26.

10.2 Affine Cipher

To describe this cipher we label the letters of the alphabet, A,B, ... ,Z, with the numbers 0,1, ... ,25. The encryption function is then given by

\[
E_k(x) = ax + b \pmod{26}.
\]

Note that for decryption to be possible, $E_k(x)$ needs to be one-to-one. This implies that we need $\gcd(a, 26) = 1$; any $b$ may be used. Therefore the encryption key $e$ is a pair, $(a, b)$, with $\gcd(a, 26) = 1$. The number of encryption keys thus is $\phi(26) \cdot 26 = 12 \cdot 26 = 312$, where $\phi$ is Euler's phi function.

As far as decryption is concerned, note that if $y \equiv ax + b \pmod{26}$, then $y - b \equiv ax \pmod{26}$, so that $x \equiv a^{-1}(y - b) \pmod{26}$. Here $a^{-1}$ is the multiplicative inverse of $a$ among the elements relatively prime to 26. We can determine $a^{-1}$ as follows: since $\gcd(a, 26) = 1$ there exist integers $r$ and $s$ such that $ar + 26s = 1$. These integers can be found using the Euclidean algorithm. Therefore

\[
ar \equiv 1 + 26(-s) \pmod{26},
\qquad ar \equiv 1 \pmod{26},
\qquad \therefore\ a^{-1} \equiv r \pmod{26}.
\]

Therefore the decryption key is also a pair, $(a^{-1}, b)$, and the decryption function is $D_k(y) = a^{-1}(y - b)$.

Example 10.2.1. As an example of the affine cipher let our key pair be $k = (e, d) = ((7, 3), (15, 3))$. Therefore

\[
E_k(x) = 7x + 3 \pmod{26}, \qquad D_k(y) = 15(y - 3) \pmod{26}.
\]

Also, $D_k(E_k(x)) = D_k(7x + 3) = 15(7x + 3 - 3) = 15(7x) \equiv x \pmod{26}$. The encryption of the word GARY would then proceed as follows.

\[
\begin{aligned}
\text{G} &: 7 \cdot 6 + 3 = 45 \equiv 19 \to \text{T}, \\
\text{A} &: 7 \cdot 0 + 3 = 3 \equiv 3 \to \text{D}, \\
\text{R} &: 7 \cdot 17 + 3 = 122 \equiv 18 \to \text{S}, \\
\text{Y} &: 7 \cdot 24 + 3 = 171 \equiv 15 \to \text{P}.
\end{aligned}
\]
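The affine maps above can be sketched directly (helper names are ours; the three-argument `pow` for the modular inverse needs Python 3.8+):

```python
# Affine cipher on upper-case letters: E(x) = ax + b (mod 26),
# D(y) = a^(-1)(y - b) (mod 26), requiring gcd(a, 26) = 1.
def affine_encrypt(text, a, b):
    return "".join(chr((a * (ord(ch) - 65) + b) % 26 + 65) for ch in text)

def affine_decrypt(text, a, b):
    a_inv = pow(a, -1, 26)      # exists since gcd(a, 26) = 1
    return "".join(chr(a_inv * (ord(ch) - 65 - b) % 26 + 65) for ch in text)
```

With the key $(7, 3)$ of Example 10.2.1, GARY encrypts to TDSP and decrypts back to GARY.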

10.2.1 Cryptanalysis of the Affine Cipher

The cryptanalysis of the affine cipher is based on examining the frequency with which cipher-text occurs. As such we require the frequency with which the letters occur in everyday usage. Below is a table that shows the letters, their associated decimal number and their frequency of occurrence in everyday English (plain-text). 10.2. AFFINE CIPHER 75

Decimal  Letter  Frequency
0        A       0.082
1        B       0.015
2        C       0.028
3        D       0.043
4        E       0.127
5        F       0.022
6        G       0.020
7        H       0.061
8        I       0.070
9        J       0.002
10       K       0.008
11       L       0.040
12       M       0.024
13       N       0.067
14       O       0.075
15       P       0.019
16       Q       0.001
17       R       0.060
18       S       0.063
19       T       0.091
20       U       0.028
21       V       0.010
22       W       0.023
23       X       0.001
24       Y       0.020
25       Z       0.001

Consider the following block of text that was obtained from an affine cipher.

FMXVEDKAPHFENDRB NDKRXRSREFMORUDS DKDVSHVUFEDKAPRK DLYEVLRHHRH

Counting the number of occurrences of the letters above we find that the most frequent characters are: R (8 times), D (8 times), E, H, K (5 times) and F, V (4 times). Based on this we guess that one of the most frequent characters in the cipher-text, R, represents the most frequent character in plain-text, E. Next, we guess that the most frequent letter after R in the cipher-text, D, represents the second most frequent letter in plain-text, T. Therefore our guess is that E → R and T → D. This is the same as $E_k(4) = 17$ and $E_k(19) = 3$, or $a \cdot 4 + b = 17$ and $a \cdot 19 + b = 3$. This implies that $15a \equiv -14 \equiv 12 \pmod{26}$, so that (since $7 \cdot 15 \equiv 1 \pmod{26}$) $a \equiv 7 \cdot 12 \equiv 6 \pmod{26}$ and $b = 19$. This cannot be a valid key as $\gcd(a, 26) \neq 1$.

After this we might guess that E → R and T → E, but this leads to the same sort of problem, as does E → R and T → H.

Our next best guess would be that E → R and T → K. Therefore $E_k(4) = 17$ and $E_k(19) = 10$. This implies that $4a + b \equiv 17 \pmod{26}$ and $19a + b \equiv 10 \pmod{26}$. Thus $15a \equiv -7 \equiv 19 \pmod{26}$, so that $a \equiv 3 \pmod{26}$ and $b \equiv 5 \pmod{26}$. Therefore our encryption key would be $(3, 5)$, which is a valid key since 3 is relatively prime to 26. The decryption key corresponding to this is $(3^{-1}, 5) = (9, 5)$ and then $D_k(y) = 9(y - 5)$. Applying this to the cipher-text we find the following.

ALGORITHMSAREQUI TEGENERALDEFINIT IONSOFARITHMETIC PROCESSES

This corresponds to the text "algorithms are quite general definitions of arithmetic processes". Seeing that we have a piece of plain-text that "makes sense" we can assume that we have the key.
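Solving for the key $(a, b)$ from two guessed plain/cipher letter pairs, as done above, can be sketched as follows (the helper name is ours; three-argument `pow` needs Python 3.8+):

```python
# Recover an affine key (a, b) from two guessed correspondences
# p1 -> c1 and p2 -> c2 (letters as numbers 0..25).
from math import gcd

def affine_key(p1, c1, p2, c2):
    """Returns (a, b), or None if the guess gives no valid key."""
    diff = (p1 - p2) % 26
    if gcd(diff, 26) != 1:
        return None                          # system not uniquely solvable
    a = (c1 - c2) * pow(diff, -1, 26) % 26   # subtract the two equations
    if gcd(a, 26) != 1:
        return None                          # not a valid encryption key
    b = (c1 - a * p1) % 26
    return a, b
```

The guess E → R, T → K ($4 \to 17$, $19 \to 10$) yields $(3, 5)$, while E → R, T → D is rejected because it leads to $a = 6$.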

10.3 Some Other Ciphers

Definition 10.3.1 (Block cipher). A block cipher transforms the message by encrypt- ing blocks of a fixed number of characters.

Definition 10.3.2 (Polyalphabetic cipher). In a polyalphabetic cipher each source symbol can map to one of several cipher-text symbols.

An example of a polyalphabetic cipher is the following.

10.3.1 The Vigenère Cipher

This cipher has the following properties.

• The symbols from $A$ are identified with $0, 1, \ldots, |A| - 1$.

n A key(word) k = k0k1, . . . , kn 1 is a string in A . • − Encryption is performed on n character blocks of source by adding the keyword • (mod A ) to the source characters. | |

Example 10.3.3 (Vigenère cipher). $A = \{a, b, \ldots, z\}$ and $k = $ golf. The encryption of a piece of plain-text (double eagle) would then proceed as follows.

doubleeagle
golfgolfgol
jcfgrspfmzp

Therefore the plain-text double eagle is encrypted as jcfgrspfmzp. To decrypt one subtracts the key from the cipher-text.

ubpragekuzwtchswuirm
golfgolfgolfgolfgolf
onemustfollowthrough

In this case the cipher-text ubpragekuzwtchswuirm corresponds to the plain-text one must follow through.
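Both operations can be sketched in one helper (the name and the `sign` parameter are ours): adding the repeating keyword encrypts, subtracting it decrypts.

```python
# Vigenere cipher on lower-case letters: add (sign=+1) or subtract
# (sign=-1) the repeating keyword mod 26.
def vigenere(text, key, sign=1):
    return "".join(
        chr((ord(c) - 97 + sign * (ord(key[i % len(key)]) - 97)) % 26 + 97)
        for i, c in enumerate(text))
```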

10.3.2 Cryptanalysis of the Vigenère Cipher: The Kasiski Examination

We now discuss a method that can be used to attack the Vigenère cipher. It was proposed by Friedrich Kasiski in 1863. The discussion is based on the following piece of cipher-text that is the result of a Vigenère cipher applied to a piece of English text. The plain-text was first processed to remove all punctuation and spaces. The spaces in the cipher-text were added to improve readability.

Offset  Cipher-text
0       UPVZB BVUPN KKFOL OGAKU FBTKF LFXUJ VIPZV KFZXO FIDLO ONLUP
50      KKFUZ OMQFQ MQXKU AFIUP VVVVK KFDFL DMFIU PVVFI ZVTMU XDBZY
100     FVVYF ZTHBA ZQHEY LTXVU JVXFM IDRSQ EJNCI PVZZQ HQEYJ BZQHB
150     YHTWL OUWND OLVUJ VREZA JHTWW VPTZW VLVDM TROPV XWIMN KJBVE
200     FITKV XRQEL FZOBY HSMND TVFOJ DZQHB YLOOZ QTQXK UISLS LNLUP
250     RESWB HOEZQ HERVC MRWJV XWIMR LSISR WMIHF TZQHN CXUBV UJVXF
300     JZTOJ VXGJA REMMU GPEEG PEEWP BYHXI KHS

The idea of the Kasiski examination is to try and recover the key-length used in the Vigenère cipher. This is accomplished by analysing the distances between identical pieces of cipher-text: occasionally identical portions of plain-text will align with the same fragment of the key, producing exactly the same cipher-text. The possibility also exists for "accidental" matches, that is, identical fragments of cipher-text which did not result from the same plain-text; but these tend to be rare, especially with longer match lengths. In the case of a "correct" match the key-length has to divide the difference in the positions where these pieces of identical cipher-text occurred. Since accidental matches may also occur it may not be enough to only examine the greatest common divisor of all distances.

In our example the fragment of cipher-text ZQH occurs in a number of places, which are underlined in the table above. The corresponding offsets are: 110, 138, 146, 226, 258, 286. It is therefore likely that the keyword length, $l$, divides the difference of any of these. The differences $138 - 110 = 28$ and $226 - 110 = 116$ suggest that $l$ divides $\gcd(28, 116) = 4$. The situation with $l = 1$ corresponds to a simple shift. At this point we may now examine the frequency distributions for each candidate key-length. The idea being that at these fixed lengths the plain-text was combined with the same letter in the keyword and so will give an accurate distribution. The table below shows the frequencies that were found at each position of the key; that is, for $l = 2$ we examine the letters in the odd and even positions separately.
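The repeated-fragment search can be sketched as (helper name is ours):

```python
# Kasiski examination: collect the positions of every repeated
# fragment and take the gcd of the distances between occurrences.
from math import gcd
from functools import reduce

def kasiski(ciphertext, length=3):
    positions = {}
    for i in range(len(ciphertext) - length + 1):
        positions.setdefault(ciphertext[i:i + length], []).append(i)
    distances = [p[j] - p[0] for p in positions.values() if len(p) > 1
                 for j in range(1, len(p))]
    return reduce(gcd, distances, 0)   # 0 if there are no repeats
```

On a toy cipher-text with period 6, such as "ABCDEFABCDEF", it returns 6; on real cipher-text accidental matches can pull the gcd below the true key length, so in practice one inspects the individual distances as above.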

l  Offsets        Frequencies of the letters in the sub-message (sorted)
2  0, 2, 4, ...   19 16 12 12 11 10 9 8 8 7 6 6 5 5 5 5 5 4 4 3 2 2 2 1 0 0
   1, 3, 5, ...   14 14 13 10 10 10 9 9 9 8 8 7 6 6 5 5 5 3 3 3 2 2 2 2 1 0
4  0, 4, 8, ...   12 10 10 7 6 6 6 5 5 4 2 2 2 1 1 1 1 1 1 1 0 0 0 0 0 0
   1, 5, 9, ...   10 9 8 7 5 5 5 5 5 5 4 3 3 2 2 2 1 1 1 0 0 0 0 0 0 0
   2, 6, 10, ...  12 11 8 8 7 5 5 5 4 4 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0
   3, 7, 11, ...  9 9 8 8 7 5 5 5 4 3 3 3 3 3 2 2 2 1 1 0 0 0 0 0 0 0

For a fixed source distribution each line of the correct key-length should reflect the source frequencies. In this example, the frequencies suggest that l = 4 is more likely than l = 2. Since we know that the letter ‘e’ is the most frequent letter in everyday English, we suspect that one of the higher-frequency cipher-text letters, in each line of the l = 4 case, corresponds to the plain-text letter ‘e’. Now each line in the l = 4 case represents a letter of the keyword. So if the highest frequency cipher-text letter in each row corresponded to ‘e’, then all we would have to do is to subtract ‘e’ from the cipher-text letter to get the corresponding keyword letter and so retrieve the keyword. Since we are not absolutely 10.3. SOME OTHER CIPHERS 79

100% sure that the highest-frequency cipher-text letter in each line corresponds to ‘e’ our best strategy would be to consider the 3 or 4 most frequent letters in each row (‘e’ would definitely be among the highest frequency ones). The table below shows the four most frequent letters in each row as well as the required keyword letter that maps these cipher-text letters back to ‘e’.

Cipher-text   Keyword ($c - e \pmod{26}$)
FJUP          BFQL
BVMI          XRIE
VZRX          RVNT
KQHL          GMDH

At this point one would do an exhaustive search using keywords that are built from the likely letters shown above. That is, one chooses a letter from the first row as the possible first letter of the keyword, a letter from the second row for the second possible letter of the keyword, and so on. This requires $4^4 = 256$ decryptions. If the keyword was chosen from a dictionary this simplifies things a great deal. In our example this would be words such as LEND and BIRD. Using BIRD on the cipher-text yields the following.

The water of the Gulf stretched out before her, gleaming with the million lights of the sun. The voice of the sea is seductive, never ceasing, whispering, clamoring, murmuring, inviting the soul to wander in abysses of solitude. All along the white beach, up and down, there was no living thing in sight. A bird with a broken wing was beating the air above, reeling, fluttering, circling disabled down, down to the water.

The ZQH in the cipher-text used in the Kasiski examination corresponds to ‘ing’ in the plain-text.
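The subtraction in the table above is easy to script. A minimal sketch (the helper name is ours, not from the text), with letters valued A = 0, ..., Z = 25:

```python
def keyword_letter(cipher_letter: str, plain_letter: str = "E") -> str:
    """The keyword letter k satisfying plain + k = cipher (mod 26)."""
    return chr((ord(cipher_letter) - ord(plain_letter)) % 26 + ord("A"))

# The four most frequent cipher-text letters in each row of the l = 4 case.
rows = ["FJUP", "BVMI", "VZRX", "KQHL"]
candidates = ["".join(keyword_letter(c) for c in row) for row in rows]
# Choosing one letter per row spells a candidate keyword; BIRD is among them.
```

Picking the letters B, I, R, D from the four candidate rows reproduces the keyword found above.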

10.3.3 The Vernam Cipher

The Vernam cipher is an example of a stream cipher.

Definition 10.3.4 (Stream cipher (state cipher)). In a stream cipher the mapping of a block may depend on its position in the message.

This cipher operates as follows.

• The alphabet A = {0, 1}.

• Keys are the same length as messages over A.
• A message m is encrypted as m ⊕ k, where ⊕ is (mod 2) addition.
• To decrypt the cipher c we add the key k to it: c ⊕ k = (m ⊕ k) ⊕ k = m.

If the key digits are the result of independent Bernoulli trials with probability 1/2 (and used only once), then the cipher is known as a one-time-pad and is unconditionally secure against a cipher-text only attack. The reason for this is that every message of the same length as the cipher-text maps to the cipher-text for some choice of key, and all keys are equally likely.

Chapter 11

Public Key Cryptography

This chapter is devoted to one of the most widely used forms of cryptography, namely public key cryptography. The RSA cryptosystem, which is an example of this, is used by millions of people around the world.

Definition 11.0.5 (Private key system). In a private key cryptosystem Dk is either the same as Ek, or easy to get from it. If Ek is known the system is insecure. Therefore Dk and Ek must be kept private.

Definition 11.0.6 (Public key system). In a public key system, if Ek is known, it is (believed to be) computationally infeasible to use it to determine Dk. Therefore Ek can be made public (for instance in a directory) and anyone can look up Ek when they want to send a message.

The believed computational infeasibility of determining Dk makes the exchange of keys unnecessary. The idea of a public key system arose in 1976 in the work of Diffie and Hellman [?]. The first public key system was discovered in 1977 by Rivest, Shamir and Adleman [?] (the RSA system). Public key systems can never be unconditionally secure: if the adversary has the cipher-text, they can use Ek to encrypt every possible piece of plain-text until a match is found. This process might not be feasible in practice, but it is possible in principle.

Definition 11.0.7 (One-way function). A function f : M → C is one-way if

• f(m) is "easy" to compute for all m ∈ M;
• for all (or most) c ∈ C it is "computationally infeasible" to find m ∈ M such that f(m) = c.


It is not known whether one-way functions exist, but a number of functions have been identified that seem to be one-way.

Example 11.0.8. An example of a one-way function is found in the UNIX operating system. Here the password to a user's account is stored in a file that is readable to all users of the system. The passwords are encrypted using what is believed to be a one-way function.

Definition 11.0.9 (Trapdoor function). A one-way function is a trapdoor function if it has the property that, given some extra information, it becomes feasible to find m ∈ M such that f(m) = c for a given c ∈ C.

So, in public key systems the extra information in a trapdoor function makes it possible to find Dk.

11.1 The Rivest-Shamir-Adleman (RSA) Cryptosystem

This system consists of the following.

• An integer n = pq, where p and q are large, distinct primes.

• The message space and cipher space are the same: M = C = Zn.
• The key pair k = (a, b), where ab ≡ 1 (mod φ(n)), with φ(n) = φ(pq) = (p − 1)(q − 1) the Euler phi function.
• The encryption function Ek(x) = x^b (mod n).
• The decryption function Dk(y) = y^a (mod n).
• n and b are public, while p, q and a are private.

Let’s compute Dk(Ek(x)) to see that we can recover an encrypted message. To do this we need the following facts.

✗ ab ≡ 1 (mod φ(n)) ⇐⇒ ab = 1 + tφ(n) for some integer t.

✗ Euler's Theorem: if gcd(x, n) = 1, then x^φ(n) ≡ 1 (mod n).

✗ If n = pq, then x1 ≡ x2 (mod n) ⇐⇒ x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q).

Proof. (⇒): x1 ≡ x2 (mod pq) implies that x1 = x2 + t·pq. Therefore x1 = x2 + (tq)p = x2 + (tp)q, so that x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q).
(⇐): x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q) implies that x1 − x2 = l1·p and x1 − x2 = l2·q. Therefore q | l1·p, so that q | l1 (p and q are different primes). Thus x1 − x2 = l3·pq, or in other words x1 ≡ x2 (mod pq).

We have Dk(Ek(x)) = Dk(x^b) = (x^b)^a = x^(ab) (mod n). Here x^(ab) = x^(1 + tφ(n)) = x · x^(tφ(n)) (mod n). Now there are two cases to consider.

1. gcd(x, n) = 1. By Euler's Theorem x^φ(n) ≡ 1 (mod n), so that x^(tφ(n)) = (x^φ(n))^t ≡ 1^t ≡ 1 (mod n). Therefore Dk(Ek(x)) = x · x^(tφ(n)) ≡ x (mod n).

2. gcd(x, n) > 1. If both p and q divide x, then x ≡ 0 (mod n) and Dk(Ek(x)) ≡ 0 ≡ x (mod n) trivially. Otherwise exactly one of them divides x; say p | x and gcd(x, q) = 1. Then x^(ab) ≡ 0 ≡ x (mod p). Modulo q, Fermat's Theorem gives x^(q−1) ≡ 1 (mod q), and since (q − 1) | φ(n) we have x^(tφ(n)) ≡ 1 (mod q), so that x^(ab) = x · x^(tφ(n)) ≡ x (mod q). By the fact above, x^(ab) ≡ x (mod pq). This gives Dk(Ek(x)) ≡ x (mod n).

In summary then the RSA system operates as follows.

1. Generate two large distinct primes p and q.

2. Compute n = pq and φ(n) = (p − 1)(q − 1).

3. Choose a random b with 0 < b < φ(n) and gcd(b, φ(n)) = 1. The last requirement ensures that b^(−1) exists, which is needed in the next step.

4. Compute a = b^(−1) (mod φ(n)).

5. Publish b and n.
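The five steps can be sketched in Python. This is a minimal sketch with toy parameters (the function name is ours; real RSA needs primes of roughly 100 digits, and Python 3.8+ is assumed for the modular inverse via pow):

```python
from math import gcd

def rsa_setup(p: int, q: int, b: int):
    """Steps 2-4: n, phi(n), and the private exponent a = b^(-1) mod phi(n)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(b, phi) == 1        # the requirement from step 3
    a = pow(b, -1, phi)            # modular inverse (extended Euclidean algorithm)
    return n, a

n, a = rsa_setup(101, 113, 3533)   # the toy primes of Example 11.1.1 below
cipher = pow(9726, 3533, n)        # Alice encrypts: x^b (mod n)
plain = pow(cipher, a, n)          # Bob decrypts: y^a (mod n), recovering 9726
```

The roundtrip pow(pow(x, b, n), a, n) == x is exactly the Dk(Ek(x)) = x computation verified above.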

The choice of b in step 3 is made by using the Euclidean algorithm to check whether gcd(b, φ(n)) = 1. The probability that a randomly chosen b is relatively prime to φ(n) is φ(φ(n))/φ(n). Therefore we need to try about φ(n)/φ(φ(n)) different b's before one that is relatively prime to φ(n) is found.

The computation of a in step 4 can be done as a byproduct of the Euclidean algorithm (using it in reverse).

Example 11.1.1. Suppose Bob chose his two primes to be p = 101 and q = 113. Then n = 11413 and φ(n) = (p − 1)(q − 1) = (100)(112) = 11200 = 2^6 · 5^2 · 7. Since the prime divisors of φ(n) are 2, 5 and 7, any b not divisible by 2, 5 or 7 is relatively prime to φ(n). Let's say Bob chooses b = 3533. We now need to find a such that ab ≡ 1 (mod 11200). The Euclidean algorithm yields

11200 = 3533 × 3 + 601,
 3533 = 601 × 5 + 528,
  601 = 528 × 1 + 73,
  528 = 73 × 7 + 17,
   73 = 17 × 4 + 5,
   17 = 5 × 3 + 2,
    5 = 2 × 2 + 1.

The last nonzero remainder is the gcd (= 1). We now use the results of the algorithm in reverse to express 1 (the gcd) in terms of 11200 and 3533.

1 = 5 − 2 × 2
  = 5 − 2(17 − 5 × 3) = 7 × 5 − 2 × 17
  = 7(73 − 17 × 4) − 2 × 17 = 7 × 73 − 30 × 17
  = 7 × 73 − 30(528 − 73 × 7) = 217 × 73 − 30 × 528
  = 217(601 − 528) − 30(528) = 217 × 601 − 247 × 528
  = 217 × 601 − 247(3533 − 601 × 5) = 1452 × 601 − 247 × 3533
  = 1452(11200 − 3533 × 3) − 247 × 3533 = 1452 × 11200 − 4603 × 3533
  = 1452 × 11200 + (−4603) × 3533.

Therefore a = b^(−1) ≡ −4603 ≡ 6597 (mod 11200).

Bob will publish n = 11413 and b = 3533. To send the plain-text message 9726 to Bob, Alice computes

9726^3533 ≡ 5761 (mod 11413).

Bob decrypts this as

5761^6597 ≡ 9726 (mod 11413),

which equals the original message.

For the RSA system to be effective we need fast methods to do the exponentiation. The method we describe is known as "square and multiply." Suppose we want to compute x^b (mod n). Let the binary representation of b be b_l b_(l−1) ... b_0. That is, b = b_0 + 2b_1 + 2^2 b_2 + ... + 2^l b_l (here l = ⌈log_2 b⌉ − 1). Therefore

x^b = x^(b_0 + 2b_1 + 2^2 b_2 + ... + 2^l b_l)
    = x^(b_0) · x^(2b_1) · x^(2^2 b_2) ... x^(2^l b_l)
    = x^(b_0) · (x^2)^(b_1) · (x^4)^(b_2) ... (x^(2^l))^(b_l).

Thus we compute x, x^2, x^4, ..., x^(2^l) and multiply together the terms corresponding to the bits b_i that equal 1.

Example 11.1.2. Compute 57^26 (mod 91). In binary, 26 = 11010. We need

57^1 ≡ 57 (mod 91),
57^2 = 3249 ≡ 64 (mod 91),
57^4 = (57^2)^2 ≡ 64^2 = 4096 ≡ 1 (mod 91),
57^8 = (57^4)^2 ≡ 1^2 ≡ 1 (mod 91),
57^16 = (57^8)^2 ≡ 1^2 ≡ 1 (mod 91).

Therefore 57^26 = 57^16 · 57^8 · 57^2 ≡ 1 · 1 · 64 ≡ 64 (mod 91).
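The square-and-multiply method translates directly into code; a sketch (function name ours) that scans the bits of b from least significant upwards:

```python
def square_and_multiply(x: int, b: int, n: int) -> int:
    """Compute x^b (mod n) using the binary expansion of b."""
    result = 1
    power = x % n                  # holds x^(2^i) at step i
    while b > 0:
        if b & 1:                  # bit b_i is 1: multiply this power in
            result = (result * power) % n
        power = (power * power) % n
        b >>= 1
    return result
```

square_and_multiply(57, 26, 91) reproduces the answer 64 of Example 11.1.2, and in general the result agrees with Python's built-in pow(x, b, n).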

11.1.1 Security of the RSA System

One of the most straightforward ways to attack the RSA system would be to try to compute φ(n), from which we would be able to compute the private exponent a. In this section we show that computing φ(n) is no easier than factoring n. Assume then that a cryptanalyst has both n and φ(n). Then by solving the set of equations

n = pq,
φ(n) = (p − 1)(q − 1),

for the unknowns p and q, we can factor n. If we substitute q = n/p into the second equation, we obtain a quadratic in the unknown p:

p^2 − (n − φ(n) + 1)p + n = 0.

The two roots of this equation will be p and q. Therefore if a cryptanalyst knows φ(n) this can be used to factor n. Thus computing φ(n) is no easier than factoring n.
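Solving the quadratic is mechanical; a sketch (function name ours) that recovers p and q from n and φ(n):

```python
from math import isqrt

def factor_from_phi(n: int, phi: int):
    """The roots of p^2 - (n - phi + 1)p + n = 0 are the prime factors of n."""
    s = n - phi + 1                # s = p + q, the negated linear coefficient
    d = isqrt(s * s - 4 * n)       # the discriminant is a perfect square here
    return (s - d) // 2, (s + d) // 2

p, q = factor_from_phi(11413, 11200)   # the toy modulus of Example 11.1.1
```

With the numbers of Example 11.1.1 this returns the primes 101 and 113.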

11.2 Probabilistic Primality Testing

In setting up an RSA system we need two large (on the order of 100 decimal digits) primes. This section discusses a method for identifying candidate primes.

Definition 11.2.1 (Decision problem). A decision problem is a problem with a “yes” or “no” answer.

Definition 11.2.2 (Probabilistic algorithm). A probabilistic algorithm is any algo- rithm that uses random numbers.

Definition 11.2.3 (Yes-based Monte Carlo algorithm). A yes-based Monte Carlo algorithm, for a decision problem, is a probabilistic algorithm in which a “yes” answer is always correct but a “no” answer may be incorrect.

A yes-based Monte Carlo algorithm has an error probability of ε if the algorithm gives an incorrect answer "no" (when the answer is really "yes") with probability at most ε. Here the probability is computed over all possible random choices made by the algorithm when it is run with a given input.

The basic idea behind the identification of possible primes is as follows. Generate random odd numbers and use an algorithm that answers "yes" or "no" to the question "is n composite?" A "yes" answer is always correct, but a "no" answer may be incorrect with some probability ε < 1. Run the algorithm k times. If it ever answers "yes" then n is composite. Otherwise n is prime with probability 1 − ε^k.

Theorem 11.2.4 (The Prime Number Theorem (1899)).

lim_{n→∞} π(n) / (n / ln(n)) = 1,

where π(n) is the number of primes less than or equal to n.

From the Prime Number Theorem we see that the probability that a randomly chosen integer k is prime is approximately

(k / ln(k)) / k = 1 / ln(k).

If only odd integers are considered, the probability is approximately 2/ln(k). So on average one could expect to test about ln(k)/2 = ln(√k) odd integers before a prime is found. If we need to find a prime with about 100 digits, then k ≈ 10^100, so that we need to test about ln(√k) ≈ 115 integers on average before a prime is found.

The algorithm for identifying possible primes that we will be discussing is known as the Miller-Rabin algorithm. We introduce this algorithm by way of the following theorem.

Theorem 11.2.5 (Fermat's Little Theorem). If n is prime and gcd(a, n) = 1, then a^(n−1) ≡ 1 (mod n).

Suppose that n is an odd prime. Then n − 1 is even, so that n − 1 can be written as n − 1 = 2^k m, where k ≥ 1 and m is odd. Let a be an integer relatively prime to n. By Fermat's Little Theorem a^(2^k m) ≡ 1 (mod n). That is, (a^(2^(k−1) m))^2 ≡ 1 (mod n). Therefore a^(2^(k−1) m) ≡ ±1 (mod n). If it is congruent to +1, we can repeat the argument to find that a^(2^(k−2) m) ≡ ±1 (mod n). If at each step a^(2^(k−i) m) ≡ 1 (mod n), then eventually a^m ≡ ±1 (mod n).

The algorithm works in reverse by looking at the values a^m, a^(2m), a^(2^2 m), ..., a^(2^(k−1) m) and seeing whether any of them is congruent to −1 (mod n).

The Miller-Rabin algorithm is given as input an odd integer n and it answers the question "is n composite?" A "yes" answer is correct and a "no" answer may be wrong. The algorithm consists of six steps.

1. Write n − 1 = 2^k m, where m is odd.

2. Choose a random a with 1 ≤ a ≤ n − 1.

3. Compute b = a^m (mod n).

4. If b ≡ 1 (mod n), then answer "prime" and stop.

5. For i = 0, 1, 2, ..., k − 1:
   if b ≡ −1 (mod n), then answer "prime" and stop;
   else replace b by b^2, i by i + 1 and go to step 5.

6. Answer “composite” and stop.
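The six steps can be sketched as follows (function name ours; this is a single round, so a "prime" answer may be wrong with probability at most 1/4):

```python
import random

def miller_rabin(n: int) -> str:
    """One round of the Miller-Rabin test for an odd integer n > 2."""
    k, m = 0, n - 1
    while m % 2 == 0:              # step 1: n - 1 = 2^k * m with m odd
        k += 1
        m //= 2
    a = random.randrange(1, n)     # step 2: random a with 1 <= a <= n - 1
    b = pow(a, m, n)               # step 3
    if b == 1:                     # step 4
        return "prime"
    for _ in range(k):             # step 5: test b, b^2, ..., b^(2^(k-1))
        if b == n - 1:             # b = -1 (mod n)
            return "prime"
        b = (b * b) % n
    return "composite"             # step 6
```

For a true prime the answer is always "prime"; for a composite, repeating the test drives the probability of never seeing "composite" below (1/4)^k.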

Theorem 11.2.6. The Miller-Rabin algorithm is a yes-based Monte Carlo algorithm for testing if n is composite.

Proof. Assume the algorithm answers "composite" for some prime integer n. We will obtain a contradiction. Since the answer is "composite", it must be the case that a^m ≢ 1 (mod n). In step 5 of the algorithm, the sequence of values tested is

b = a^m, b^2 = a^(2m), b^4 = a^(2^2 m), b^8 = a^(2^3 m), ..., b^(2^(k−1)) = a^(2^(k−1) m).

Since the algorithm answers “composite” we know that

a^(2^i m) ≢ −1 (mod n), i = 0, 1, 2, ..., k − 1.

The fact that n is prime implies that a^(n−1) = a^(2^k m) ≡ 1 (mod n), so that (a^(2^(k−1) m))^2 ≡ 1 (mod n). Thus a^(2^(k−1) m) ≡ ±1 (mod n). We know that a^(2^(k−1) m) ≢ −1 (mod n), so a^(2^(k−1) m) ≡ 1 (mod n). Repeating the argument k − 1 more times we find that a^m ≡ 1 (mod n), a contradiction.

Fact. The error probability in the Miller-Rabin algorithm is less than or equal to 0.25.

Chapter 12

The Rabin Cryptosystem

The second example of a public key cryptosystem that we will consider is the Rabin Cryptosystem [?]. This system has the following features.

• It uses two distinct primes p and q, each of which is congruent to 3 (mod 4). Let n = pq.

• The message space and cipher space are the same: M = C = Zn.
• Further, it has an encryption key B, chosen so that 0 ≤ B ≤ n − 1.
• The encryption function,

Ek(x) = x(x + B) (mod n).

• The decryption function,

Dk(y) = √(B^2/4 + y) − B/2 (mod n),

where B^2/4 = 4^(−1) B^2 and √t is any number s such that s^2 = t.

• n and B are made public, while p and q are private.

For this system to be of use we need to know that there are infinitely many primes of the form 4k + 3 (i.e. primes that are congruent to 3 (mod 4)). This can be seen in one of two ways. The first is Dirichlet's Theorem and the second is a direct proof, both of which are shown below.

Theorem 12.0.7 (Dirichlet's Theorem). The sequence ak + b, k = 0, 1, 2, ..., contains infinitely many primes if and only if gcd(a, b) = 1.

89 90 CHAPTER 12. THE RABIN CRYPTOSYSTEM

Since gcd(4, 3) = 1, we know that there are infinitely many primes of the form 4k + 3. We now show directly that there are infinitely many primes congruent to 3 (mod 4) :

Assume that there are only finitely many primes congruent to 3 (mod 4), say p0 = 3, p1 = 7, p2 = 11, ..., pn. Consider N = 4 p1 p2 ... pn + 3 (note that p0 is not included in this product). We know that if p and q are both congruent to 1 (mod 4), then so is their product. Therefore a number congruent to 3 (mod 4) must have a prime divisor that is congruent to 3 (mod 4) (if all were congruent to 1 (mod 4), the number itself would be congruent to 1 (mod 4)). We have N ≡ 3 (mod 4), so that N has a prime divisor congruent to 3 (mod 4). On the other hand, if 3 | N, then 3 | 4 p1 p2 ... pn, and since 3 is prime this implies that 3 | 4 or 3 | pi for some i ≥ 1, a contradiction. Thus 3 ∤ N. Also, if i ≥ 1 then pi | N and pi | 4 p1 p2 ... pn implies that pi | 3, a contradiction. That is, there exists a prime congruent to 3 (mod 4) not among p0, p1, ..., pn, a contradiction. Therefore the number of primes congruent to 3 (mod 4) is infinite.

It turns out that the encryption function is not one-to-one. To see this we need the following two results.

Theorem 12.0.8 (Chinese Remainder Theorem). The system of congruences

x ≡ a1 (mod n1),
x ≡ a2 (mod n2),
      ⋮
x ≡ at (mod nt),

where gcd(ni, nj) = 1 if i ≠ j, has a unique solution (mod n1 n2 ... nt). This solution is x = a1 M1 y1 + a2 M2 y2 + ... + at Mt yt, where Mi = (n1 n2 ... nt)/ni and yi = Mi^(−1) (mod ni), 1 ≤ i ≤ t.
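The formula in the theorem can be sketched directly (function name ours):

```python
def crt(remainders, moduli):
    """x = a_1*M_1*y_1 + ... + a_t*M_t*y_t (mod n_1 n_2 ... n_t)."""
    N = 1
    for n_i in moduli:
        N *= n_i
    x = 0
    for a_i, n_i in zip(remainders, moduli):
        M_i = N // n_i
        y_i = pow(M_i, -1, n_i)    # y_i = M_i^(-1) (mod n_i); needs coprime moduli
        x += a_i * M_i * y_i
    return x % N
```

For instance, crt([4, 1], [7, 11]) returns 67, the unique solution (mod 77) of x ≡ 4 (mod 7), x ≡ 1 (mod 11).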

Proposition 12.0.9. If p and q are distinct odd primes, then the congruence x^2 ≡ 1 (mod pq) has exactly four solutions (mod pq).

Proof. x^2 ≡ 1 (mod pq) ⇐⇒ x^2 − 1 ≡ 0 (mod pq) ⇐⇒ x^2 − 1 = kpq, k ∈ Z ⇐⇒ p | (x^2 − 1) and q | (x^2 − 1) ⇐⇒ x^2 ≡ 1 (mod p) and x^2 ≡ 1 (mod q).
Now x^2 − 1 = (x − 1)(x + 1), so p | (x^2 − 1) ⇐⇒ p | (x − 1)(x + 1) ⇒ p | (x − 1) or p | (x + 1). That is, x ≡ 1 (mod p) or x ≡ −1 (mod p). Similarly x ≡ ±1 (mod q).

Consider the four systems of congruences.

x ≡ 1 (mod p) and x ≡ 1 (mod q),
x ≡ 1 (mod p) and x ≡ −1 (mod q),
x ≡ −1 (mod p) and x ≡ 1 (mod q),
x ≡ −1 (mod p) and x ≡ −1 (mod q).

By the Chinese Remainder Theorem each system has a unique solution (mod pq) and each solution leads to a solution of x^2 ≡ 1 (mod pq). Since p and q are odd primes, 1 ≢ −1 (mod pq), so the four solutions are all distinct.

Example 12.0.10. If n = 15 = 3 × 5, then 1, −1 ≡ 14, 4 and −4 ≡ 11 are the square roots of 1 (mod 15).

We are now in a position to show that the encryption function is not one-to-one. Let x ∈ Zn and let ω be one of the four square roots of 1 (mod n), where n = pq. Then

Ek(ω(x + B/2) − B/2) = (ω(x + B/2) − B/2)(ω(x + B/2) − B/2 + B)
                     = (ω(x + B/2) − B/2)(ω(x + B/2) + B/2)
                     = ω^2 (x + B/2)^2 − B^2/4        (ω^2 = 1)
                     = x^2 + xB = x(x + B) = Ek(x).

This implies that there are four different plain-texts that encrypt to the same cipher-text as x.

Next we treat the decryption process. The receiver is given a cipher-text y and wants to determine x such that x^2 + Bx ≡ y (mod n). To simplify notation let x1 = x + B/2, that is x = x1 − B/2. The congruence then becomes

y ≡ (x1 − B/2)^2 + B(x1 − B/2)
  = x1^2 + B^2/4 − B^2/2
  = x1^2 − B^2/4 (mod n).

Therefore

x1^2 ≡ y + B^2/4 (mod n).

By letting C = y + B^2/4 we get

x1^2 ≡ C (mod n).

This is equivalent to the system

x1^2 ≡ C (mod p),
x1^2 ≡ C (mod q).

Each congruence in the system has zero or two solutions. These can be combined, as before, to get up to four solutions (mod pq). To determine the solutions to the congruences above we need the following concept and theorem.

Definition 12.0.11 (Quadratic residue). Let m be a prime number. A number a ≢ 0 (mod m) is called a quadratic residue (mod m) if there exists an x such that x^2 ≡ a (mod m). Otherwise a is a quadratic non-residue (mod m).

Example 12.0.12. If m = 7, the quadratic residues are

1^2 ≡ 1,
2^2 ≡ 4,
3^2 ≡ 2,
(−2)^2 ≡ 4,
(−3)^2 ≡ 2,
(−1)^2 ≡ 1 (mod 7).

As an aside we note that if m is prime then there exist (m − 1)/2 quadratic residues and (m − 1)/2 quadratic non-residues (mod m).

One way to determine if a is a quadratic residue (mod m) is to use Euler's criterion.

Theorem 12.0.13 (Euler's Criterion). Let m be prime. Then a number a ≢ 0 (mod m) is a quadratic residue (mod m) ⇐⇒ a^((m−1)/2) ≡ 1 (mod m).

Note that if a is a nonzero quadratic non-residue then a^((m−1)/2) ≡ −1 (mod m).

Euler's Criterion only answers yes or no as to whether there exists an x such that x^2 ≡ a (mod m). It does not say how to find this x. If our prime, m, is of a specific form then the determination of this x becomes easy.

If m is prime and m ≡ 1 (mod 4), there is no simple closed-form formula for square roots (mod m); finding them takes more work. On the other hand, when m is a prime and m ≡ 3 (mod 4), the square roots are easy to find: they are just ±a^((m+1)/4). To check this we compute

(±a^((m+1)/4))^2 ≡ a^((m+1)/2)
                 = a · a^((m−1)/2)
                 ≡ a (mod m),

as long as a is a quadratic residue (mod m). Therefore the square roots of a are ±a^((m+1)/4) (mod m). Note that we are using the fact that m ≡ 3 (mod 4) when we calculate the exponent (m + 1)/4.

Returning to our original problem we have

x1^2 ≡ C (mod p),
x1^2 ≡ C (mod q).

Using the above procedure we find the two square roots of C (mod p) and the two square roots of C (mod q). These may then be combined using the Chinese Remainder Theorem to find the square roots of C (mod n). Recall that x = x1 − B/2 and C = y + B^2/4. We know that

x1 = √C = √(y + B^2/4),

so that

x = √(y + B^2/4) − B/2 = Dk(y).

The four square roots of C (mod n) lead to the four possibilities for x.

Example 12.0.14. Let n = 7 × 11 = 77 and B = 9. The encryption function is Ek(x) = x^2 + Bx = x^2 + 9x (mod 77).

The decryption function is

Dk(y) = √(y + 4^(−1) B^2) − 2^(−1) B
      = √(y + 4^(−1) · 81) − 2^(−1) · 9
      = √(y + 4^(−1) · 4) − 39 · 9        (81 ≡ 4 and 2^(−1) ≡ 39 (mod 77))
      = √(y + 1) − 43 (mod 77).

Suppose the cipher-text 22 is received. Then Dk(22) = √23 − 43 (mod 77). We therefore need to find √23 (mod 77). Both 7 and 11 are congruent to 3 (mod 4). Therefore

23^((7+1)/4) = 23^2 ≡ 2^2 ≡ 4 (mod 7),
23^((11+1)/4) = 23^3 ≡ 1^3 ≡ 1 (mod 11).

Thus the square roots of 23 (mod 7) are ±4 and the square roots of 23 (mod 11) are ±1. We use these to get the four square roots of 23 (mod 77). From 4 ≡ √23 (mod 7) and 1 ≡ √23 (mod 11) we get x1 ≡ √23 (mod 77), where

x1 = 4 × 11 × y1 + 1 × 7 × y2 (mod 77),

and

y1 ≡ 11^(−1) ≡ 2 (mod 7),
y2 ≡ 7^(−1) ≡ 8 (mod 11).

Therefore x1 = 4 × 11 × 2 + 1 × 7 × 8 = 144 ≡ −10 (mod 77). Similarly, the other square roots of 23 (mod 77) can be found to be ±10 and ±32.

The four possible plain-texts are found by subtracting B/2 ≡ 43 (mod 77). They are

10 − 43 ≡ 44 (mod 77),
67 − 43 ≡ 24 (mod 77),
32 − 43 ≡ 66 (mod 77),
45 − 43 ≡ 2 (mod 77).
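The whole decryption of Example 12.0.14 can be replayed in a few lines (helper names ours), combining the a^((m+1)/4) square-root formula with the Chinese Remainder Theorem:

```python
p, q, B, y = 7, 11, 9, 22
n = p * q

def sqrt_3mod4(a: int, m: int) -> int:
    """A square root of a modulo a prime m = 3 (mod 4), if a is a QR mod m."""
    return pow(a, (m + 1) // 4, m)

def crt2(ap: int, aq: int) -> int:
    """Combine residues mod p and mod q into one residue mod n = pq."""
    return (ap * q * pow(q, -1, p) + aq * p * pow(p, -1, q)) % n

C = (y + pow(4, -1, n) * B * B) % n          # C = y + B^2/4, which is 23 (mod 77)
rp, rq = sqrt_3mod4(C, p), sqrt_3mod4(C, q)  # roots +-4 (mod 7) and +-1 (mod 11)
roots = {crt2(s * rp, t * rq) for s in (1, -1) for t in (1, -1)}
half_B = (pow(2, -1, n) * B) % n             # B/2, which is 43 (mod 77)
plaintexts = sorted((x1 - half_B) % n for x1 in roots)
```

Here plaintexts comes out as [2, 24, 44, 66], matching the four candidates found above, and each of them satisfies x(x + B) ≡ 22 (mod 77).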

12.1 Security of the Rabin Cryptosystem

The aim of this section is to show that being in possession of an algorithm that decrypts (maybe only part of the) cipher-text from a Rabin cryptosystem is equivalent to factoring n. Since it is suspected that factorisation is "difficult," one can expect that the design of such a decryption algorithm will be at least as hard as designing an algorithm that factors n. So suppose that the attacker has a decryption algorithm A. We may then proceed as follows.

1. Choose a random r, 1 ≤ r ≤ n − 1.

2. Compute y = r^2 − B^2/4 (mod n).

3. Use A to obtain a decryption x.

4. Let x1 = x + B/2.

5. If x1 ≡ ±r (mod n) the algorithm fails; else gcd(x1 + r, n) = p or q, a success.

We will see below that this algorithm factors n with probability at least 1/2. First we explain its working. In step 2 we have

Ek(r − B/2) = (r − B/2)(r − B/2 + B)
            = (r − B/2)(r + B/2)
            ≡ r^2 − B^2/4 (mod n)
            = y.

So y is the encryption of x = r − B/2, i.e. x will be the decryption that we obtain in step 3. In step 4, x1^2 ≡ (x + B/2)^2 ≡ r^2 (mod n). That is, x1 ≡ ±r (mod n) or x1 ≡ ±ωr (mod n), where ω ≠ ±1 is a square root of 1 (mod n). This also says that x1^2 − r^2 ≡ 0 (mod n), that is n | (x1 − r)(x1 + r).

From this we see why the algorithm fails if x1 ≡ ±r (mod n): if this happens, then one of the two factors that n divides is zero and this does not give us any useful information. On the other hand, if x1 ≡ ±ωr (mod n), then n does not divide either of the factors (x1 − r) or (x1 + r). Thus computing gcd(x1 + r, n) (or gcd(x1 − r, n)) must give p or q.

We now turn to estimating the probability of success of the algorithm. To do this we consider all possible n − 1 choices for the random r in step 1.

Define an equivalence relation on Zn \ {0} by r1 ∼ r2 ⇐⇒ r1^2 ≡ r2^2 (mod n). The equivalence classes all have 4 elements: the equivalence class of r is {±r, ±ωr} and these are all distinct by Proposition 12.0.9. Note that any two values in the same equivalence class give the same value for y in step 2. In step 3, given y, the decryption algorithm returns x. By the calculation in step 4, x1 is a member of the equivalence class of r. If it is ±r, the algorithm fails. If it is ±ωr the algorithm factors n as explained above.

Since r is chosen at random it is equally likely to be any of the four members of its equivalence class. Two of the four members lead to success, therefore the probability of success is at least 1/2.

Chapter 13

Factorisation Algorithms

The purpose of this chapter is to discuss some of the algorithms available for attempting to factor a given integer.

13.1 Pollard's p − 1 Factoring Method (ca. 1974)

Suppose the odd number n is to be factored and let p be a prime divisor of n. This method uses the following result.

Proposition 13.1.1. If for every prime power q with q | (p − 1) we have q ≤ B, then (p − 1) | B!.

Proof. Suppose p − 1 has the prime factorisation

p − 1 = p1^α1 p2^α2 ... pk^αk = q1 q2 ... qk, where qi = pi^αi.

Since pi ≠ pj if i ≠ j, we have gcd(qi, qj) = 1 if i ≠ j. By hypothesis, qi ≤ B for i = 1, 2, ..., k. Therefore q1, q2, ..., qk are all distinct terms in B! = 1 · 2 · 3 ... B. Thus p − 1 = q1 q2 ... qk | B!.

At this point we assume that we have found the B in the proposition above. Let a ≡ 2^(B!) (mod n); then a ≡ 2^(B!) (mod p). By Fermat's Theorem we know 2^(p−1) ≡ 1 (mod p). Since (p − 1) | B!, a ≡ 2^(B!) = (2^(p−1))^t ≡ 1^t ≡ 1 (mod p). Therefore p | (a − 1); furthermore p | n. These two statements together imply p | gcd(a − 1, n).

We now present the algorithm. It has as input an integer n (the integer to be factored) and an integer B (the "bound" on the size of the prime power divisors of p − 1).

1. Compute a ≡ 2^(B!) (mod n).

2. Set d = gcd(a − 1, n).

3. If 1 < d < n then d | n (success); else no factor of n is found.

Example 13.1.2. Let's apply the algorithm to n = 36259 and B = 5. We need to calculate 2^(5!) (mod 36259). This can be done in five steps.

2^(1!) ≡ 2 (mod 36259),
2^(2!) = (2^(1!))^2 ≡ 4 (mod 36259),
2^(3!) = (2^(2!))^3 = 4^3 ≡ 64 (mod 36259),
2^(4!) = (2^(3!))^4 = 64^4 ≡ 25558 (mod 36259),
2^(5!) = (2^(4!))^5 = 25558^5 ≡ 22719 (mod 36259).

Therefore a ≡ 22719 (mod 36259). Now gcd(a − 1, n) = gcd(22718, 36259) = 1. Since d = 1, the algorithm has failed to find a factor of n. Next we try a bigger B, namely B = 10. So we need 2^(10!) (mod 36259). We calculate in the same manner as above and find that

2^(6!) ≡ 34839 (mod 36259),
2^(7!) ≡ 7207 (mod 36259),
2^(8!) ≡ 21103 (mod 36259),
2^(9!) ≡ 25536 (mod 36259),
2^(10!) ≡ 25251 (mod 36259).
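These repeated exponentiations, together with the final gcd, are the whole method; a sketch (function name ours):

```python
from math import gcd

def pollard_p_minus_1(n: int, B: int):
    """Compute a = 2^(B!) (mod n) stepwise, then d = gcd(a - 1, n)."""
    a = 2                           # a = 2^(1!)
    for k in range(2, B + 1):
        a = pow(a, k, n)            # (2^((k-1)!))^k = 2^(k!)
    d = gcd(a - 1, n)
    return d if 1 < d < n else None
```

pollard_p_minus_1(36259, 5) fails (returns None), while pollard_p_minus_1(36259, 10) returns the factor 101, reproducing the two runs of Example 13.1.2.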

That is, a ≡ 25251 (mod 36259). So gcd(a − 1, n) = gcd(25250, 36259) = 101. From this we find that n = 36259 = 101 × 359.

Note that in the example we have p = 101, so that p − 1 = 100 = 2^2 · 5^2. By the proposition we need B to be at least 25 for the algorithm to be guaranteed to work. In our case B = 10 worked, so the algorithm apparently does not need B to be as big as required by the proposition. The proposition just says that if B is this big the algorithm will work.

Chapter 14

The ElGamal Cryptosystem

In this chapter we introduce yet another public key cryptosystem. It is known as the ElGamal Cryptosystem [?] and relies on discrete logarithms which we introduce first.

14.1 Discrete Logarithms (a.k.a indices)

Let p be a prime. Then Zp is a field and Zp* = Zp \ {0} is a cyclic group under multiplication (mod p).

Definition 14.1.1 (Primitive element (primitive root)). A generator of Zp∗ is called a primitive element of Zp∗ (or also a primitive root of p).

So if a is a primitive element of Zp*, then a, a^2, a^3, ..., a^(p−1) (mod p) are just the elements 1, 2, 3, ..., p − 1 in some order. That is, for each x ∈ Zp* there is a number e ∈ {1, 2, 3, ..., p − 1} such that a^e ≡ x (mod p).

Definition 14.1.2 (Discrete Logarithm). We call e the discrete logarithm (or index) of x with respect to a if a^e ≡ x (mod p), and denote it by log_a(x).

The problem of finding loga(x) in Zp is generally regarded as being difficult. Modular exponentiation is easy, but its inverse — discrete logarithm — is not. That is modular exponentiation is believed to be a one-way function.

14.2 The system

The ElGamal cryptosystem has the following features.

✗ It is a public key system.

99 100 CHAPTER 14. THE ELGAMAL CRYPTOSYSTEM

✗ It is based on the difficulty of finding discrete logarithms.

✗ It is non-deterministic : it involves a random integer k chosen by the sender.

The details of the system are as follows.

• Choose a prime p such that the determination of discrete logarithms in Zp is difficult.

• Choose a generator α of Zp*; there are φ(p − 1) choices.

• The message space M = Zp*.

• The cipher space C = Zp* × Zp*.
• The keys k = (p, α, e, β), where β = α^e (mod p), that is log_α(β) = e.
• p, α and β are made public while e is kept private.
• The encryption function

Ek(x) = (y1, y2),

where

y1 = α^k (mod p),
y2 = xβ^k (mod p),

and k is a random integer in {0, 1, 2, ..., p − 1}.

• The decryption function Dk(y1, y2) = y2 (y1^e)^(−1) (mod p).

Firstly let’s check that decryption works. So with y1 and y2 as given above we find.

Dk(y1, y2) = y2 (y1^e)^(−1) (mod p)
           = y2 ([α^k]^e)^(−1) (mod p)
           = y2 ([α^e]^k)^(−1) (mod p)
           = y2 (β^k)^(−1) (mod p)
           = (xβ^k)(β^k)^(−1) ≡ x (mod p).

Example 14.2.1. Let p = 23 and α = 5 in an ElGamal cryptosystem. If e = 9, then β ≡ α^e ≡ 5^9 ≡ 11 (mod 23).

To send the plain-text x = 7:

• Choose a random k ∈ {0, 1, 2, ..., 22}. Suppose k = 13.
• Then compute

y1 ≡ 5^13 ≡ 21 (mod 23),
y2 ≡ 7 · 11^13 ≡ 4 (mod 23),

and send (21, 4).

On the receiving end suppose now that (21, 4) is received. Firstly we compute y1^e ≡ 21^9 ≡ 17 (mod 23). Now we need (y1^e)^(−1) = 17^(−1) in Z23*. To do this we use the Euclidean algorithm.

23 = 1 × 17 + 6,
17 = 2 × 6 + 5,
 6 = 1 × 5 + 1.

Working backwards through the results of the algorithm we find

1 = 6 − 5
  = 6 − (17 − 2 × 6) = 3 × 6 − 17
  = 3(23 − 17) − 17 = 3 × 23 − 4 × 17.

From the last step we find −4 × 17 = 1 + (−3) × 23 ≡ 1 (mod 23). That is, 17^(−1) ≡ −4 ≡ 19 (mod 23).

To complete decryption we compute y2 (y1^e)^(−1) = 4 × 19 = 76 ≡ 7 (mod 23).
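Example 14.2.1 end to end, as a sketch (function names ours):

```python
p, alpha, e = 23, 5, 9
beta = pow(alpha, e, p)            # public key: (p, alpha, beta); private: e

def elgamal_encrypt(x: int, k: int):
    """E_k(x) = (alpha^k, x * beta^k) (mod p) for the sender's random k."""
    return pow(alpha, k, p), (x * pow(beta, k, p)) % p

def elgamal_decrypt(y1: int, y2: int) -> int:
    """D_k(y1, y2) = y2 * (y1^e)^(-1) (mod p)."""
    return (y2 * pow(pow(y1, e, p), -1, p)) % p

y1, y2 = elgamal_encrypt(7, 13)    # k = 13 as in the example
```

Here (y1, y2) is (21, 4) and elgamal_decrypt(21, 4) recovers 7, matching the hand computations above.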

14.3 Attacking the ElGamal System by Computing Discrete Logarithms

This section will discuss one algorithm that can be used to compute discrete logarithms and in so doing break the ElGamal system. The algorithm is due to Shanks.

Let p be a prime and α a generator of Zp*. In the ElGamal system we are given β and we want to find e = log_α β. Let m = ⌈√(p − 1)⌉. By the division algorithm we can write e = mj + i, 0 ≤ i ≤ m − 1. Since m = ⌈√(p − 1)⌉ and 0 ≤ e ≤ p − 2, we have 0 ≤ j ≤ m − 1. Now β ≡ α^e (mod p) ⇐⇒ β ≡ α^(mj + i) (mod p) ⇐⇒ βα^(−i) ≡ α^(mj) (mod p). This is the basis of Shanks' algorithm.

1. Compute α^(mj) (mod p), 0 ≤ j ≤ m − 1, and store the pairs (j, α^(mj)) in a list sorted by increasing value of the second coordinate. The reason for storing the numbers in this way is to simplify searching through the list later.

2. Compute βα^(−i) (mod p), 0 ≤ i ≤ m − 1, and store the pairs (i, βα^(−i)) in a list sorted by increasing second coordinate.

3. Find a pair (j, y) in the list from step 1 and a pair in the list from step 2 having the same second coordinate.

4. e = log_α β = mj + i (mod p − 1).

Note that in the last step we are computing (mod p − 1). The reason for this is that the powers of α (the generator) are always going to be in the range 1, 2, 3, ..., p − 1.

Example 14.3.1. Let p = 23 and α = 5. We would like to find log_5(11). m = ⌈√(p − 1)⌉ = ⌈√22⌉ = 5, so that m − 1 = 4. We compute the following.

5^(5·0) ≡ 1 (mod 23),
5^(5·1) ≡ 20 (mod 23),
5^(5·2) ≡ 9 (mod 23),
5^(5·3) ≡ 19 (mod 23),
5^(5·4) ≡ 12 (mod 23).

Therefore list 1 is (0, 1), (2, 9), (4, 12), (3, 19), (1, 20).

Next we need to compute 11 · 5^(−i), 0 ≤ i ≤ 4. To do this note that 5^22 ≡ 1 (mod 23), and therefore 5^(−i) ≡ 5^(22−i) (mod 23).

11 · 5^0 ≡ 11 · 1 ≡ 11 (mod 23),
11 · 5^(−1) ≡ 11 · 5^21 ≡ 11 · 14 ≡ 16 (mod 23),
11 · 5^(−2) ≡ 11 · 5^20 ≡ 11 · 12 ≡ 17 (mod 23),
11 · 5^(−3) ≡ 11 · 5^19 ≡ 11 · 7 ≡ 8 (mod 23),
11 · 5^(−4) ≡ 11 · 5^18 ≡ 11 · 6 ≡ 20 (mod 23).

This gives list 2 as (3, 8), (0, 11), (1, 16), (2, 17), (4, 20).
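The two lists, and the search for a collision between them, can be sketched as follows (function name ours; a dictionary stands in for the sorted list):

```python
from math import isqrt

def shanks(p: int, alpha: int, beta: int):
    """Baby-step giant-step: e with alpha^e = beta (mod p), or None."""
    m = isqrt(p - 1)
    if m * m < p - 1:
        m += 1                         # m = ceil(sqrt(p - 1))
    giant = {pow(alpha, m * j, p): j for j in range(m)}   # list 1
    inv_alpha = pow(alpha, -1, p)
    value = beta % p                   # beta * alpha^(-i), starting at i = 0
    for i in range(m):
        if value in giant:             # matching second coordinates
            return (m * giant[value] + i) % (p - 1)
        value = (value * inv_alpha) % p
    return None
```

shanks(23, 5, 11) returns 9, agreeing with the hand computation of this example.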

Scanning through the lists we find (1, 20) in the first list and (4, 20) in the second list, giving

e = log_5(11) = 1 × m + 4 = 1 × 5 + 4 ≡ 9 (mod 22).

Chapter 15

Elliptic Curve Cryptography

This chapter discusses a generalisation of the ElGamal cryptosystem, namely elliptic curve cryptography. This was proposed by Koblitz and Miller [?, ?].

15.1 The ElGamal System in Arbitrary Groups

Let G be a group and α ∈ G. Let H = ⟨α⟩, the cyclic subgroup of G generated by α. For β ∈ H, we can define logα β = k ⟺ α^k = β and 0 ≤ k ≤ |H| − 1. If the problem of computing logα β from G, α and β is "hard" we can set up a cryptosystem as follows.

• Let (G, ∘) be a finite group and α ∈ G.
• Let H = ⟨α⟩ be such that the problem of computing logα β (β ∈ H) is hard.
• The message space M = G.
• The cipher space C = G × G.
• The keys k = (G, α, e, β), where β = α^e, that is logα β = e.
• α and β are made public, while e is kept private.
• For x ∈ G the encryption function is Ek(x) = (y1, y2), where

y1 = α^k,
y2 = x ∘ β^k,

and k is a secret random number with 0 ≤ k ≤ |G| − 1.

• The decryption function is Dk(y1, y2) = y2 ∘ (y1^e)^(−1).

Note that x^k = x ∘ x ∘ ··· ∘ x (k times).

The proof that encryption is one-to-one is exactly the same as in the case of the original ElGamal system. An important point to note is that care is needed when choosing the group G and α ∈ G. The following illustrates this point.

Let G = (Zn, +) and α a generator of Zn, that is gcd(α, n) = 1. Since exponentiation is repeated application of the group operation, in this group we have "x^k" = x + x + ··· + x = kx. Therefore finding logα β means finding k such that kα ≡ β (mod n). That is k = α^(−1)β; here α^(−1) exists since gcd(α, n) = 1, and it can be found by using the Euclidean algorithm.

Example 15.1.1. Let G = Z30 and α = 3. Then H = {0, 3, 6, 9, . . . , 27}. Suppose e = 4 is chosen, then β = "α^e" = 4 × 3 = 12. To send the plain-text 17 we proceed as follows.

• Choose a random k ∈ {0, 1, 2, . . . , 29}, say k = 5.
• Compute Ek(17) = (y1, y2), where

y1 = "α^k" = 5 × 3 ≡ 15 (mod 30),
y2 = x ∘ "β^k" = 17 + 5 × 12 ≡ 17 (mod 30).

We then send (15, 17). Assume now that (15, 17) is received. The first step in the decoding is to compute "y1^e". We find that "y1^e" = 4 × 15 = 60 ≡ 0 (mod 30). Secondly we invert this to get ("y1^e")^(−1) = −0 ≡ 0 (mod 30). Lastly we compute y2 ∘ ("y1^e")^(−1) = 17 + 0 ≡ 17 (mod 30).

Suppose that Eve intercepts the cipher-text (9, 16) and tries to decrypt it. She knows that α = 3 and β = 12 (these are made public). She needs to find an e such that e × α = β in Z30. That is

3e ≡ 12 (mod 30) ⟺ e ≡ 4 (mod 30/gcd(3, 30)) ⟺ e ≡ 4, 14, 24 (mod 30).

If Eve knew that |H| = 10, then she'd be done. On the other hand, by Lagrange's Theorem we have that |H| divides |G|, and therefore |H| ≤ 15, which rules out 24.

This shows that we might also want to keep the order of H secret.
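Eve's attack on this additive group amounts to solving a single linear congruence with the extended Euclidean algorithm — no discrete-logarithm machinery is needed at all. A sketch (the helper name is ours):

```python
from math import gcd

def solve_linear(a, b, n):
    """Return all e in Z_n with a*e = b (mod n), via the Euclidean algorithm."""
    g = gcd(a, n)
    if b % g != 0:
        return []                       # no solutions
    # Reduce to (a/g) e = b/g (mod n/g), where a/g is invertible mod n/g.
    n0 = n // g
    e0 = (b // g) * pow(a // g, -1, n0) % n0
    return [e0 + k * n0 for k in range(g)]

print(solve_linear(3, 12, 30))   # Eve recovers e in {4, 14, 24}
```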

15.2 Elliptic Curves

We begin with the definition of elliptic curves.

Definition 15.2.1 (Elliptic Curve). Let p > 3 be prime. The elliptic curve

y^2 = x^3 + ax + b

over Zp is the set of solutions to the congruence

y^2 ≡ x^3 + ax + b (mod p),

where a, b ∈ Zp are constants such that

4a^3 + 27b^2 ≢ 0 (mod p),

together with a special point O called the point at infinity.

An elliptic curve E can be made into an Abelian group by using the following operation, where arithmetic is in Zp. Let P = (x1, y1) and Q = (x2, y2) be points on E. Then

P + Q = O if x1 = x2 and y1 = −y2, and P + Q = (x3, y3) otherwise, where

x3 = λ^2 − x1 − x2,
y3 = λ(x1 − x3) − y1,

and

λ = (y2 − y1)/(x2 − x1) if P ≠ Q,
λ = (3x1^2 + a)/(2y1) if P = Q.

We also take O to be the identity, that is P + O = O + P = P. Note that the inverse of a point (x, y) is the point (x, −y).

We will skip the proof that (E, +) is an Abelian group with identity O.
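The case analysis above translates directly into code. A sketch of the group law over Zp, representing O by None (the function name is ours; modular inverses are computed with Python's built-in three-argument pow):

```python
def ec_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + ax + b over Z_p (None = point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

# On y^2 = x^3 + x + 6 over Z_11 (the curve of the example that follows):
print(ec_add((2, 7), (2, 7), 1, 11))   # doubling (2, 7) gives (5, 2)
```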

Example 15.2.2. Let E be the elliptic curve y^2 ≡ x^3 + x + 6 over Z11.

The first thing we need to do is to find the points on E. We do this by taking each x ∈ Z11, computing x^3 + x + 6 and using Euler's criterion to see whether the result is a square. If it is a square, then since 11 ≡ 3 (mod 4), √z ≡ ±z^((11+1)/4) ≡ ±z^3 (mod 11). This produces the following table.

x    x^3 + x + 6    square ?         y
0    6              6^5 ≡ −1  ✗
1    8              8^5 ≡ −1  ✗
2    5              5^5 ≡ 1   ✓      ±5^3 ≡ ±4
3    3              ✓                ±3^3 ≡ ±5
4    8              ✗
5    4              ✓                ±2
6    8              ✗
7    4              ✓                ±2
8    9              ✓                ±3
9    7              ✗
10   4              ✓                ±2

Therefore E has 13 points (including the point at infinity) and they are :

O; (2, 4); (2, 7); (3, 5); (3, 6); (5, 2); (5, 9); (7, 2); (7, 9); (8, 3); (8, 8); (10, 2); (10, 9).

Since 13 is prime any nonidentity element will generate the group. Note also that (E, +) is cyclic and isomorphic to Z13. Let α = (2, 7). Then the powers of α are multiples of α in this group and we have the following.

2α = (2, 7) + (2, 7),
λ = (3(2^2) + 1)(2 · 7)^(−1) ≡ 2 · 3^(−1) ≡ 2 · 4 ≡ 8 (mod 11),
∴ x3 = 8^2 − 2 − 2 = 60 ≡ 5 (mod 11),
∴ y3 = 8(2 − 5) − 7 ≡ −24 − 7 ≡ −31 ≡ 2 (mod 11),
∴ 2α = (5, 2).

Also,

3α = 2α + α = (5, 2) + (2, 7),
λ = (7 − 2)(2 − 5)^(−1) ≡ 5 · (−3)^(−1) ≡ 5 · (−4) ≡ 5 · 7 ≡ 2 (mod 11),
∴ x3 = 2^2 − 5 − 2 ≡ −3 ≡ 8 (mod 11),
∴ y3 = 2(5 − 8) − 2 ≡ −6 − 2 ≡ −8 ≡ 3 (mod 11),
∴ 3α = (8, 3).

The following table gives the “powers” of α.

k    kα
1    (2, 7)
2    (5, 2)
3    (8, 3)
4    (10, 2)
5    (3, 6)
6    (7, 9)
7    (7, 2)
8    (3, 5)
9    (10, 9)
10   (8, 8)
11   (5, 9)
12   (2, 4)
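Both the point table and the table of multiples can be reproduced by brute force. A sketch (naive enumeration is perfectly adequate for p = 11; the helper name is ours):

```python
p, a, b = 11, 1, 6        # the curve y^2 = x^3 + x + 6 over Z_11
alpha = (2, 7)

def add(P, Q):
    """Group law on the curve; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + a) * pow(2 * y1, -1, p) if P == Q
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# All affine points, found by testing every (x, y) pair.
points = {(x, y) for x in range(p) for y in range(p)
          if (y * y - x**3 - a * x - b) % p == 0}
print(len(points) + 1)     # 13, counting O

# The multiples k*alpha, k = 1, ..., 12, reproduce the table above.
P = alpha
for k in range(1, 13):
    print(k, P)
    P = add(P, alpha)      # after k = 12 this wraps around to O
```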

In general one would like to know how many points there are on a given elliptic curve over Zp. This is needed so that one can construct a correspondence between plain-text and the points on the curve. The following theorem gives bounds on the number of points.

Theorem 15.2.3 (Hasse’s Theorem). Let p > 3 be prime and E an elliptic curve over Zp. Then the number, N(E), of points on E satisfies

p + 1 − 2√p ≤ N(E) ≤ p + 1 + 2√p.

We also have the following interesting result.

Theorem 15.2.4. Let p > 3 be prime and E an elliptic curve over Zp. Then there exist integers n1 and n2 such that

(E, +) ≅ Zn1 × Zn2.

Furthermore n2 | n1 and n2 | (p − 1).

The last theorem implies that there exists a cyclic subgroup of (E, +) isomorphic to

Zn1. We may be able to use this in an ElGamal system if we can find it.

Example 15.2.5. (Continued)

We use the elliptic curve in the previous example to set up an ElGamal system with α = (2, 7) and exponent e = 7. So β = "α^e" = 7α = (7, 2) (from the table).

Ek(x) = (kα, kβ + x) = (k(2, 7), k(7, 2) + x), where x ∈ E and k is chosen at random from {0, 1, 2, . . . , 12}.

Dk(y1, y2) = y2 − ey1 = y2 − 7y1.

To encrypt (10, 9) (which is a point on the curve) :

1. Choose k, say k = 3.

2. Compute

y1 = 3α = (8, 3),

y2 = 3(7, 2) + (10, 9)
   = 3(7α) + (10, 9)
   = 21α + (10, 9)
   = 8α + (10, 9)
   = (3, 5) + (10, 9)
   = (10, 2).

Therefore we send the cipher-text ((8, 3), (10, 2)).

Suppose now that we are the receiver of the cipher-text ((8, 3), (10, 2)) = (y1, y2) and that we would like to decrypt it. So we would have to compute

Dk(y1, y2) = y2 − 7y1
  = (10, 2) − 7(8, 3)
  = (10, 2) − 7(3α)
  = (10, 2) − 21α = (10, 2) − 8α
  = (10, 2) + (−8(2, 7)) = (10, 2) + (−(3, 5))
  = (10, 2) + (3, −5) = (10, 2) + (3, 6)
  = (10, 9).
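The whole exchange can be replayed in code. A sketch (the point-arithmetic helpers are ours, and scalar multiplication is done by naive repeated addition, which is fine for this 13-element group):

```python
p, a = 11, 1                  # curve y^2 = x^3 + x + 6 over Z_11

def add(P, Q):
    """Group law; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + a) * pow(2 * y1, -1, p) if P == Q
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def neg(P):
    return None if P is None else (P[0], (-P[1]) % p)

def mul(k, P):
    """k*P by repeated addition."""
    R = None
    for _ in range(k):
        R = add(R, P)
    return R

alpha, e = (2, 7), 7
beta = mul(e, alpha)                       # public key: (7, 2)

k, x = 3, (10, 9)                          # random k and plain-text point
y1, y2 = mul(k, alpha), add(mul(k, beta), x)
print(y1, y2)                              # the cipher-text (8, 3), (10, 2)

print(add(y2, neg(mul(e, y1))))            # y2 - 7*y1 recovers (10, 9)
```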

15.3 The Menezes-Vanstone Cryptosystem

In this section we describe another cryptosystem, the Menezes-Vanstone system [?], that also uses elliptic curves. This system is made up as follows.

• An elliptic curve E over Zp with p > 3 is used such that (E, +) has a cyclic subgroup H = ⟨α⟩ for which computing discrete logarithms is "hard."

• The message space M = Zp* × Zp*.

• The cipher space C = E × Zp* × Zp*.
• The keys k = (E, α, e, β), where β = eα, that is logα β = e, and α ∈ E.
• α and β are made public, while e is kept secret.
• The encryption function is Ek(x1, x2) = (y0, y1, y2), where

y0 = kα where k is a random integer,

y1 = c1x1 (mod p),

y2 = c2x2 (mod p), and

(c1, c2) = kβ ∈ E.

• The decryption function is Dk(y0, y1, y2) = (x′, x″), where

x′ = y1c1^(−1) (mod p),
x″ = y2c2^(−1) (mod p), and

(c1, c2) = ey0.

We now show that the decryption function is in fact the inverse of the encryption function. So suppose (as above) that we have encrypted the message (x1, x2) as (y0, y1, y2). Then computing Dk(y0, y1, y2) yields the following.

ey0 = ekα = kβ = (c1, c2),
x′ = y1c1^(−1) ≡ c1x1c1^(−1) ≡ x1 (mod p),
x″ = y2c2^(−1) ≡ c2x2c2^(−1) ≡ x2 (mod p).

Therefore Dk(y0, y1, y2) = (x1, x2), as desired.

Example 15.3.1. Let E be the elliptic curve y^2 ≡ x^3 + x + 6 over Z11 — the same one as in the previous example. We also choose α = (2, 7) and e = 7. Therefore β = 7α = (7, 2).

To encrypt the message (9, 1) (which in this case is an element of Z11* × Z11*, and not a point of the curve as in the previous example) we proceed as follows.

1. Choose a random integer k, say k = 6.

2. Compute

y0 = kα = 6(2, 7) = (7, 9),
kβ = 6(7, 2) = 6 · 7 · α = 42α = 3α = (8, 3),
∴ c1 = 8 and c2 = 3,
y1 = c1x1 = 8 · 9 ≡ 6 (mod 11),
y2 = c2x2 = 3 · 1 ≡ 3 (mod 11).

So we send ((7, 9), 6, 3) as the encrypted message. Suppose now that ((7, 9), 6, 3) is received and that we would like to decrypt it. We therefore compute the following.

ey0 = 7(7, 9) = 7 · 6α = 42α = 3α = 3(2, 7) = (8, 3),
∴ c1 = 8 and c2 = 3,
x′ = y1c1^(−1) = 6 · 8^(−1) ≡ 6 · 7 ≡ 42 ≡ 9 (mod 11),
x″ = y2c2^(−1) = 3 · 3^(−1) ≡ 1 (mod 11).

Therefore we decrypt this as (9, 1).

Chapter 16

The Merkle-Hellman “knapsack” System

This chapter describes a system that was developed around 1978, but that was subsequently broken a few years later. In spite of this it remains an interesting system that can be used in conjunction with other systems. The Merkle-Hellman [?] or "knapsack" cryptosystem revolves around the subset sum problem.

Definition 16.0.2 (Subset sum problem). Given positive integers s1, s2, . . . , sn and T — the sizes and the target — try to find a binary vector x = (x1, x2, . . . , xn) such that x1s1 + x2s2 + ··· + xnsn = T.

This problem is known to be NP-complete in general, but there are easy special cases.

Definition 16.0.3 (Superincreasing list). A list (s1, s2, . . . , sn) is called superincreasing if

sj > s1 + s2 + ··· + s(j−1)

for 2 ≤ j ≤ n.

If the list of sizes in the subset sum problem is superincreasing, then the problem is easy to solve, as demonstrated by the algorithm shown in Figure 16.1, which finds the binary vector in this case.

The reason why the algorithm works is simply that sn is greater than the sum of all the other sizes, so if sn ≤ T it has to be chosen to try and reach the target — all the other sizes put together are not enough to reach the target. Once sn is chosen (or discarded


for i from n down to 1 {
    if T ≥ si
        then xi = 1 and replace T by T − si
        else xi = 0
}
if Σ xisi = T then a solution has been found
otherwise no solution exists

Figure 16.1: Algorithm for solving the subset sum problem for a superincreasing sequence

if sn > T) and the target reduced to T − sn (or left as T if sn is not chosen), we have a new subset sum problem with a smaller (or equal) target and n − 1 sizes to choose from; we just repeat this procedure. To see the uniqueness, realize that each si that was chosen had to be chosen — there was no possibility of not using it.

The Merkle-Hellman "knapsack" system is made up of the following components.

• S = (s1, s2, . . . , sn) is a superincreasing list of integers.

• p > Σ si is a prime number and a ∈ Zp*.
• t = (t1, t2, . . . , tn) is defined by

ti = asi (mod p).

That is, each ti is taken to be the least residue of asi (mod p).

• The message space M = {0, 1}^n.
• The cipher-text space C = {0, 1, 2, . . . , n(p − 1)}.
• The keys k = (S, p, a, t), where t is made public and S, p and a are kept private.
• The encryption function

Ek(x1, x2, . . . , xn) = Σ_{i=1}^{n} xiti.

• The decryption function is Dk(y) = (x1, x2, . . . , xn), where x1, x2, . . . , xn is the solution to the subset sum problem with target sum T = a^(−1)y (mod p) and sizes S.
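The components above, together with the solver of Figure 16.1, fit in a few lines. A sketch using the parameters of the worked example below (helper names are ours):

```python
def solve_superincreasing(sizes, T):
    """Greedy solver of Figure 16.1; correct because sizes is superincreasing."""
    x = [0] * len(sizes)
    for i in range(len(sizes) - 1, -1, -1):
        if sizes[i] <= T:
            x[i] = 1
            T -= sizes[i]
    return x if T == 0 else None

S, p, a = [2, 5, 10, 25], 53, 10             # private key
t = [(a * s) % p for s in S]                 # public sizes
print(t)                                     # [20, 50, 47, 38]

msg = [1, 0, 1, 1]
y = sum(xi * ti for xi, ti in zip(msg, t))   # encryption
print(y)                                     # 105

T = (pow(a, -1, p) * y) % p                  # a^(-1) * y mod p = 37
print(solve_superincreasing(S, T))           # decryption: [1, 0, 1, 1]
```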

First let’s show that the decryption function is the inverse of the encryption function.

So suppose that the binary vector x1, x2, . . . , xn is encrypted as y = Σ xiti, as shown above. We need to show that Σ xisi = T = a^(−1)y (mod p).

y = x1t1 + x2t2 + ··· + xntn,
∴ a^(−1)y ≡ x1a^(−1)t1 + x2a^(−1)t2 + ··· + xna^(−1)tn (mod p)
         ≡ x1s1 + x2s2 + ··· + xnsn (mod p)
         = x1s1 + x2s2 + ··· + xnsn.

The equality in the last step follows from the fact that p > Σ si. Also, since (s1, s2, . . . , sn) is superincreasing, the solution is unique.

Example 16.0.4. S = (2, 5, 10, 25) is superincreasing. Choose p = 53 and a = 10. The public list of sizes, t, is

t1 = 20 (mod 53),

t2 = 50 (mod 53),
t3 = 100 ≡ 47 (mod 53),
t4 = 250 ≡ 38 (mod 53).

Thus t = (20, 50, 47, 38). To encrypt (1, 0, 1, 1) compute the following

1 · 20 + 0 · 50 + 1 · 47 + 1 · 38 = 105.

To decrypt 105 we need 10^(−1) ≡ 16 (mod 53). Then T = 16 × 105 ≡ 16 × (−1) ≡ 37 (mod 53). We now solve the subset sum problem with the list (2, 5, 10, 25) and target sum 37.

25 ≤ 37 ⇒ x4 = 1, new target : 37 − 25 = 12,
10 ≤ 12 ⇒ x3 = 1, new target : 12 − 10 = 2,
5 > 2  ⇒ x2 = 0, new target : 2 − 0 = 2,
2 ≤ 2  ⇒ x1 = 1, new target : 2 − 2 = 0.

Since 1 · 2 + 0 · 5 + 1 · 10 + 1 · 25 = 37 we have achieved the target and the solution is correct.

Chapter 17

The McEliece Cryptosystem

In the previous chapter we saw an example of a cryptosystem that was constructed using an easy instance of a "hard" problem. The system that we present in this chapter is based on the same idea and appeared in [?]. Here the hard problem is that of decoding a binary linear code when only a generator matrix is given. As an easy special case we consider the class of Goppa codes (which include the Hamming codes). The Goppa codes have the following properties.

• They are [2^m, 2^m − mt, 2t + 1]-codes.
• They have efficient encoding and decoding algorithms.

• There exist many inequivalent codes in this family, all with the same parameters.

The McEliece cryptosystem has the following components.

• G is a generator matrix for a [2^m, 2^m − mt, 2t + 1] Goppa code.
• S is a k × k matrix that is invertible over Z2, where k = 2^m − mt.
• P is an n × n permutation matrix, where n = 2^m.

• G′ = SGP.
• The message space M = (Z2)^k — k-tuples over Z2.
• The cipher-text space C = (Z2)^n.

• The keys k = (G, S, P, G′), where G′ is made public and S, P and G are kept private.
• The encryption function is Ek(x) = xG′ + e, where x ∈ (Z2)^k and e ∈ (Z2)^n is a random error of weight t.


• The decryption function is a four step process that operates on y ∈ (Z2)^n as follows.
(i) Compute y1 = yP^(−1).
(ii) Decode y1, obtaining y1 = x1 + e1, where x1 ∈ C.
(iii) Compute x′ ∈ (Z2)^k such that x′G = x1.
(iv) Compute x = x′S^(−1).

Let’s show that decryption actually reverses the encryption.

Assume that x is encrypted as Ek(x) = y = xG′ + e = x(SGP) + e. Therefore yP^(−1) = x(SGP)P^(−1) + eP^(−1) = xSG + eP^(−1) = x1 + e1, where x1 = (xS)G, that is x′ = xS. Thus x′S^(−1) = xSS^(−1) = x.

Example 17.0.5. As our Goppa code we choose the [7, 4, 3] Hamming code that has generator matrix

        1 0 0 0 1 1 0
    G = 0 1 0 0 1 0 1
        0 0 1 0 0 1 1
        0 0 0 1 1 1 1

Furthermore we choose

        1 1 0 1
    S = 1 0 0 1
        0 1 1 1
        1 1 0 0

to be invertible over Z2, and

        0 1 0 0 0 0 0
        0 0 0 1 0 0 0
        0 0 0 0 0 0 1
    P = 1 0 0 0 0 0 0
        0 0 1 0 0 0 0
        0 0 0 0 0 1 0
        0 0 0 0 1 0 0

The public generating matrix then is

         1 1 1 1 0 0 0
    G′ = 1 1 0 0 1 0 0
         1 0 0 1 1 0 1
         0 1 0 1 1 1 0

Suppose now that we would like to encrypt the plain-text x = (1, 1, 0, 1). Since the Hamming code is a single error correcting code our random error vector has to be of weight one. Say we choose e = (0, 0, 0, 0, 1, 0, 0). The corresponding cipher-text is

y = xG′ + e
  = (1, 1, 0, 1)G′ + (0, 0, 0, 0, 1, 0, 0)
  = (0, 1, 1, 0, 0, 1, 0) + (0, 0, 0, 0, 1, 0, 0)
  = (0, 1, 1, 0, 1, 1, 0).
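The matrix computations G′ = SGP and y = xG′ + e can be checked mechanically over Z2. A sketch with a plain-Python matrix helper (no external libraries; helper name is ours):

```python
def mat_mul(A, B):
    """Matrix product over Z_2 (matrices as lists of row lists)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

G = [[1,0,0,0,1,1,0],
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]
S = [[1,1,0,1],
     [1,0,0,1],
     [0,1,1,1],
     [1,1,0,0]]
P = [[0,1,0,0,0,0,0],
     [0,0,0,1,0,0,0],
     [0,0,0,0,0,0,1],
     [1,0,0,0,0,0,0],
     [0,0,1,0,0,0,0],
     [0,0,0,0,0,1,0],
     [0,0,0,0,1,0,0]]

G_pub = mat_mul(mat_mul(S, G), P)
print(G_pub)                 # the four rows of the public matrix G'

x = [[1, 1, 0, 1]]
e = [0, 0, 0, 0, 1, 0, 0]
y = [(c + ei) % 2 for c, ei in zip(mat_mul(x, G_pub)[0], e)]
print(y)                     # the cipher-text [0, 1, 1, 0, 1, 1, 0]
```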

Assume now that we receive the cipher-text (0, 1, 1, 0, 1, 1, 0) and that we would like to decrypt it. First we compute

y1 = yP^(−1)
   = (0, 1, 1, 0, 1, 1, 0)P^(−1)
   = (1, 0, 0, 0, 1, 1, 1),

where

             0 0 0 1 0 0 0
             1 0 0 0 0 0 0
             0 0 0 0 1 0 0
    P^(−1) = 0 1 0 0 0 0 0
             0 0 0 0 0 0 1
             0 0 0 0 0 1 0
             0 0 1 0 0 0 0

Next we need to decode y1. Looking at the generator matrix we see that y1 is at Hamming distance one from the first row of G, and since the Hamming code is a single error correcting code we decode y1 as x1 = (1, 0, 0, 0, 1, 1, 0). At this point we get x′ = (1, 0, 0, 0). Finally we compute

                              1 1 0 1
  x = x′S^(−1) = (1, 0, 0, 0) 1 1 0 0 = (1, 1, 0, 1).
                              0 1 1 1
                              1 0 0 1

Appendix A

Assignments

MATH 433D/550 Assignment 1 Due: Monday, January 21, 2002, at the start of class.

1. Recall that the reliability of a binary symmetric channel (BSC) is the probability p, 0 ≤ p ≤ 1, that the digit sent is the digit received.
(a) [1] Would you use a BSC with p = 0? If so, how? What about a BSC with p = 1/2?
(b) [1] Explain how to convert a BSC with 0 ≤ p < 1/2 into a channel with 1/2 < p ≤ 1.

2. Let C be a binary code of length n. We define the information rate of C to be the number i(C) = (1/n) log2(|C|). This quantity is a measure of the proportion of each codeword that is carrying the message, as opposed to redundancy that has been added to help deal with errors.
(a) [1] Prove that if C is a binary code, then 0 ≤ i(C) ≤ 1.
In questions (b) and (c), suppose you are using a BSC with reliability p = 1 − 10^(−8) and that digits are transmitted at the rate of 10^7 digits per second.
(b) [3] Suppose the code C contains all of the binary words of length 11.
(i) What is the information rate of C?
(ii) What is the probability that a word is transmitted incorrectly and the errors are not detected?
(iii) If the channel is in constant use, about how many words are (probably) transmitted incorrectly each day?

(c) Now suppose the code C′ is obtained from C by adding an extra (parity check) digit to the words in C, so that the number of 1's in each codeword is even.

(i) What is the information rate of C′?
(ii) What is the probability that a word is transmitted incorrectly and the transmission errors go undetected?


(iii) If the channel is in constant use, about how long do you expect to pass between undetected incorrectly transmitted words? Express your answer as a number of days.

3. [4] Establish the following three properties of the Hamming distance (for binary codes C) :
(a) d(u, w) = 0 if and only if u = w.
(b) d(v, w) = d(w, v).
(c) d(v, w) ≤ d(v, u) + d(u, w), ∀ u ∈ C.

4. Let C be the code consisting of all binary words of length 4 that have even weight.
(a) [2] Find the error patterns C detects.
(b) [2] Find the error patterns C corrects.

5. [2] Prove that the minimum distance of a linear code is the smallest weight of a non-zero codeword.

6. [6] Prove that a code can simultaneously correct all error patterns of weight at most t, and unambiguously detect all non-zero error patterns of weight t + 1 to d (where t ≤ d), if and only if it has minimum distance at least t + d + 1. (For example, consider C = {000, 111}. This single error correcting code detects all non-zero error patterns of weight at most 2. But, if 000 is sent and 110 is received, then only one error is detected and the received word is incorrectly decoded as 111. The ambiguity here is that it is not clear whether the error pattern is 110 or 001.)

MATH 550 Assignment 1 Solutions

Question 1

(a) Yes. Since the probability of error equals 1, each bit is received incorrectly; by inverting each bit at the receiver we recover the original bit. On the other hand, a BSC with p = 1/2 is completely unreliable. The probability of seeing a specific bit at the receiving end equals 1/2; this situation is similar to flipping an unbiased coin at the receiver and recording heads as 1 and tails as 0. Thus the channel is not able to carry any information.

(b) Simply invert each bit at the receiving end.

Question 2

(a) We have 1 ≤ |C| ≤ 2^n (one codeword; all words of length n). Therefore, since log is monotone increasing,

log2(1)/n ≤ log2|C|/n ≤ log2(2^n)/n,

so that

0 ≤ i(C) ≤ 1.

(b) n = 11 and |C| = 2^11. Then

(i) i(C) = log2|C| / n = 1.

(ii) This code cannot detect any errors since all words of length 11 are codewords — any codeword will be changed into another codeword by any error pattern. The undetected error probability then is (with q = 1 − p the error probability)

Pe(C) = (11 choose 1)qp^10 + (11 choose 2)q^2p^9 + ··· + (11 choose 10)q^10p + (11 choose 11)q^11 ≈ 1.1 × 10^(−7).

(iii) 10^7 bits per second implies 864 × 10^9 bits per day, which is approximately 78 545 454 545 words (of length 11) per day. We expect a fraction 1.1 × 10^(−7) of these to be in error. Therefore about 8640 words per day are in error.
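The probability in (ii) and the daily error count in (iii) can be checked numerically. A sketch:

```python
from math import comb

p = 1 - 1e-8          # bit reliability
q = 1 - p             # bit error probability
n = 11

# Probability that at least one of the 11 bits is flipped; every such
# error is undetected, since all length-11 words are codewords.
P_e = sum(comb(n, k) * q**k * p**(n - k) for k in range(1, n + 1))
print(P_e)            # about 1.1e-7

words_per_day = 10**7 * 86400 / n
print(round(P_e * words_per_day))   # about 8640 words in error per day
```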

(c) (i) |C′| = 2^11 and n′ = 12, so that

i(C′) = log2|C′| / n′ = 11/12.

(ii) This parity check code can detect all error patterns of odd weight. On the other hand an error pattern of even weight results in a received word of even weight (either two ones cancel or both ones contribute to the weight of the received word). Thus even weight error patterns are not detectable. Therefore

Pe(C′) = (12 choose 2)q^2p^10 + (12 choose 4)q^4p^8 + (12 choose 6)q^6p^6 + (12 choose 8)q^8p^4 + (12 choose 10)q^10p^2 + (12 choose 12)q^12 ≈ 6.6 × 10^(−15).

(iii) We are transmitting about 72 × 10^9 words per day. From part (ii) we know that the undetected error rate is 6.6 × 10^(−15). Therefore about (72 × 10^9)(6.6 × 10^(−15)) = 4.752 × 10^(−4) words per day are in error. This is the same as about 1 word error every 2104 days (≈ 5.77 years).

Question 3

(a) d(u, w) = 0 ⟺ u = w.
(⇒) : Let u, w ∈ C and d(u, w) = 0. This means that u and w differ in 0 places, therefore u = w.
(⇐) : d(u, u) = wt(u + u) = wt(0) = 0.

(b) d(v, w) = d(w, v).
d(v, w) = wt(v + w) = wt(w + v) = d(w, v).

(c) d(v, w) ≤ d(v, u) + d(u, w), ∀ u ∈ C.
d(v, w) = wt(v + w) = wt(v + u + u + w) = d(v + u, u + w) ≤ wt(v + u) + wt(u + w) = d(v, u) + d(u, w). The last inequality follows from the fact that d(x, y) ≤ wt(x) + wt(y) for any x, y ∈ C, since the ones in x could line up with the zeros in y and vice versa.

Question 4

C = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}

(a) Any odd weight error pattern is detectable since it changes a codeword (of even weight) into a word of odd weight. Even weight error patterns are not detectable since they change a codeword into a word of even weight, which is a codeword. Thus the detectable error patterns are

{0001, 0010, 0100, 0111, 1000, 1011, 1101, 1110}

(b) It is easily verified that this code is linear and so its minimum distance is 2. To be able to correct all error patterns of weight t we need a minimum distance of at least 2t + 1. In our case this implies t = 1/2. This says that no errors of weight greater than 1 will be correctable, but some errors of weight one might still be correctable. However, adding a weight one error pattern to any of the codewords produces a received word that is at distance one from more than one codeword, so unambiguous decoding becomes impossible. Thus the code cannot correct any errors.

Question 5

Let C be a linear code, then

dmin(C) = min_{u,v ∈ C, u ≠ v} d(u, v) = min_{u,v ∈ C, u ≠ v} wt(u + v) = min_{w ∈ C, w ≠ 0} wt(w).

The last step follows from the fact that C is linear, so that the sum of two codewords is again a codeword. Further, all nonzero codewords can be formed in this way, since if v is a nonzero codeword, then v ≠ 0 and v = v + 0.

A code C can simultaneously correct all error patterns of weight at most t and detect all nonzero error patterns of weight t + 1 to d (d ≥ t) ⟺ dmin(C) ≥ t + d + 1.

(⇐) : Let dmin(C) ≥ t + d + 1 ≥ 2t + 1, since d ≥ t. By a previous theorem we now know that C can correct all error patterns of weight t. Let u ∈ C be sent, w be received and z be an error pattern such that t + 1 ≤ wt(z) ≤ d. Therefore t + 1 ≤ d(u, w) ≤ d. Let v ∈ C with v ≠ u. Then

d(u, v) ≤ d(u, w) + d(w, v)
∴ d(w, v) ≥ d(u, v) − d(u, w) ≥ t + d + 1 − d = t + 1.

Therefore w lies outside the t-ball around v. Since v was an arbitrary codeword, w does not lie inside any of the t-balls that surround the codewords of C. This implies that w will not be decoded to any codeword, as only words that lie inside the t-ball of a codeword will be decoded to that codeword. We are therefore able to detect that more than t, but fewer than d + 1, errors have occurred.

(⇒) : Let C be a code that can simultaneously correct t and detect t + 1 to d (d ≥ t) errors. The fact that C can correct t errors implies that dmin(C) ≥ 2t + 1. Let u, v ∈ C be such that d(u, v) = dmin(C). Place a ball of radius d around u and a ball of radius t around v. By the error detection property of the code, any error pattern of weight at most d will not be able to change u in such a way that it lies inside the t-ball around v. Therefore the two balls above are disjoint. This implies that d(u, v) = dmin(C) ≥ t + d + 1.

Math 433D/550 Assignment 2

Due: Monday, February 4, 2002, in class.

1. Let S = {11000, 01111, 11110, 01010}, and C = ⟨S⟩.
(a) [3] Find both a generator matrix and a parity check matrix for C.

(b) [1] What is the dimension of C, and of C⊥? (c) An (n, k, d)-code is a linear code of length n, dimension k, and minimum distance d. Find the parameters (n, k, d) for C and C⊥. (You may want to use the result of question 3 below.)

2. Let C be the linear code with generator matrix

        1 0 0 0 1 1 1
    G = 0 1 0 0 1 1 0
        0 0 1 0 1 0 1
        0 0 0 1 0 1 1

Assign data to the words in K^4 by letting the letters A, B, . . . , P correspond to 0000, 0001, . . . , 1111, respectively.
(a) [2] Encode the message CALL HOME (ignore the space).
(b) [5] Suppose the message 0111000, 0110110, 1011001, 1011111, 1100101, 0100110 is received. Decode this message using the CMLD procedure involving syndromes and cosets (syndrome decoding). How many errors occurred in transmission?

3. [4] Let H be a parity check matrix for a linear code C. Prove that C has minimum distance d if and only if any set of d − 1 rows of H is linearly independent and some set of d rows of H is linearly dependent.

4. [4] Suppose that C is a linear code of length n with minimum distance at least 2t + 1. Prove that every coset of C contains at most one word of weight t or less. Use this to show that syndrome decoding corrects all error patterns of weight at most t.

5. (a) [3] Prove the Singleton bound : For an (n, k, d)-code, d − 1 ≤ n − k. (Hint : consider a parity check matrix.)
(b) [4] An (n, k, d)-code is called maximum distance separable (MDS) if equality holds in the Singleton bound, that is, if d = n − k + 1. Prove that the following statements are equivalent.

(1) C is MDS,
(2) every n − k rows of the parity check matrix are linearly independent,
(3) every k columns of the generator matrix are linearly independent.
(c) [3] Show that the dual of an (n, k, n − k + 1) MDS code is an (n, n − k, k + 1) MDS code.

MATH 550 Assignment 2 Solutions Question 1 (a) We write the rows of S into the matrix A and put it in reduced row echelon form :

1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 1 1 1 Add row 1 0 1 1 1 1 Add row 2 0 1 1 1 1 Add row 3 G =   to row 3   to row 4   to row 4 1 1 1 1 0 −−−−−→ 0 0 1 1 0 −−−−−→ 0 0 1 1 0 −−−−−→        0 1 0 1 0   0 1 0 1 0   0 0 1 0 1         1 1 0 0 0   1 0 1 1 1   1 0 0 0 1  Add row 2 Add row 3 Add row 3  0 1 1 1 1  to row 1  0 1 1 1 1  to row 1  0 1 1 1 1  to row 2 0 0 1 1 0 −−−−−→ 0 0 1 1 0 −−−−−→ 0 0 1 1 0 −−−−−→        0 0 0 1 1   0 0 0 1 1   0 0 0 1 1         1 0 0 0 1   1 0 0 0 1    0 1 0 0 1 Add row 4 0 1 0 0 1   to row 3   . 0 0 1 1 0 −−−−−→ 0 0 1 0 1      0 0 0 1 1   0 0 0 1 1          Therefore the generator matrix is 1 0 0 0 1 0 1 0 0 1 G =   , 0 0 1 0 1    0 0 0 1 1      and the parity check matrix is 1 1   H = 1 .    1     1      (b) dim(C) = 4 and dim(C⊥) = 1.

(c) Using question 3 we see that dmin(C) = 2. Further, G^T is a parity check matrix for C⊥, from which we see (using question 3 again) that dmin(C⊥) = 5. Therefore C is a [5, 4, 2]-code and C⊥ is a [5, 1, 5]-code.

Question 2

(a) The message CALL HOME is encoded as : 0010101 0000000 1011001 1011001 0111000 1110100 1100001 0100110.

(b) The parity check matrix for this code is

        1 1 1
        1 1 0
        1 0 1
    H = 0 1 1
        1 0 0
        0 1 0
        0 0 1

This is a parity check matrix for the (7, 4, 3) Hamming code. We can also see, using question 3, that the minimum distance is 3. Therefore this code can only correct one error, and so the coset leaders are the words of weight at most 1. The syndrome of a coset leader of weight 1 is the row of H corresponding to the position where the coset leader has a one. The received words have the following syndromes.

Received Syndrome 0111000 000 0110110 101 1011001 000 1011111 110 1100101 100 0100110 000

The syndrome 101, which is the third row of H, shows that an error occurred in the third position of the second received word. The syndrome 110, which is the second row of H, shows that an error occurred in the second position of the fourth received word. Similarly, the fifth received word has an error in position five. Of course, the zero syndrome indicates no errors. Decoding then produces the following words : 0111 0100 1011 1111 1100 0100. This corresponds to : HELP ME. In total three errors occurred in transmission.

Question 3

Let C be a linear code and H its parity check matrix. Then C has minimum distance d if and only if any set of d − 1 rows of H is linearly independent and some set of d rows is linearly dependent.

Proof :
(⇐) : Let r_i1, r_i2, . . . , r_id be the set of linearly dependent rows. That is r_i1 + r_i2 + ··· + r_id = 0. Let x = x1x2x3 . . . xn ∈ K^n be such that xj = 1 if j ∈ {i1, i2, . . . , id} and xj = 0 otherwise. Then xH = r_i1 + r_i2 + ··· + r_id = 0. This implies that wt(x) = d and x ∈ C. Let z ∈ C with z ≠ 0. Then zH = 0. Assume wt(z) ≤ d − 1. This would imply that a sum of d − 1 or fewer rows of H is equal to zero. By hypothesis any set of d − 1 rows of H is linearly independent (and so is any smaller set), so that their sum will be nonzero — a contradiction. Thus wt(z) ≥ d, so that dmin(C) = d.
(⇒) : Let dmin(C) = d. Then there exists x ∈ C such that wt(x) = d and x ≠ 0. Further, xH = 0, which implies that a sum of d rows of H equals zero, meaning that they are linearly dependent. Since wt(z) ≥ d for all nonzero z ∈ C, if y ∈ K^n and wt(y) ≤ d − 1, then yH ≠ 0. Therefore any set of d − 1 or fewer rows of H is linearly independent.

Let C be a linear code of length n with dmin(C) ≥ 2t + 1. One of the cosets of C is C itself. For every x ∈ C with x ≠ 0, wt(x) ≥ 2t + 1. Therefore in the coset C, 0 is the only word that has weight at most t. Let y ∈ K^n with wt(y) ≤ t and consider the coset y + C. Let z ∈ y + C, so z = y + c, c ∈ C. If c ≠ 0, then wt(y + c) ≥ t + 1, since y has only at most t 1's to cancel the 1's of c (of which there are at least 2t + 1). Now y ∈ y + C and y = y + 0, so y is the only word in y + C that has weight at most t. Thus the cosets y + C with wt(y) ≤ t each have a unique word of weight at most t. All cosets are disjoint, and any cosets that remain, apart from the ones above (if there are any), contain only words of weight at least t + 1. Therefore every coset has at most one word of weight t or less (some cosets may have none of these words).

Let u be an error pattern of weight at most t, v ∈ C be sent and w = v + u be received. Then wH = (u + v)H = uH + vH = uH. Therefore the coset is uniquely determined by the error pattern. If we let the syndrome uH correspond to the coset u + C, then u is the unique word of weight at most t in u + C, and so u will be chosen as the (correct) error pattern and decoding will be successful.

Question 5

(a) This can be shown in one of two ways:

(i) An (n, k, d)-code is equivalent to a code in standard form that has a parity check matrix of the form

[ X ]
[ I_{n−k} ]

Assume d − 1 > n − k (that is, d − 1 ≥ n − k + 1). By question 3, any set of d − 1 rows of H is linearly independent, so any set of n − k + 1 rows of H is linearly independent. On the other hand, if we take the n − k rows of I_{n−k} and any row of X, they form a linearly dependent set, since any z ∈ K^{n−k} is a linear combination of the rows of I_{n−k}. This gives us the desired contradiction.

(ii) Here again we use the fact that the code is equivalent to one that is in standard form; specifically, we use the fact that the generator matrix is in standard form. Let v ∈ C be a codeword that has only one of its information bits nonzero. Then the remaining n − k parity bits contribute weight at most n − k, so v has weight at most 1 + n − k, and hence d_min(C) ≤ n − k + 1.

(b) (i) C is MDS ⟺ every n − k rows of the parity check matrix are linearly independent.

⇒: Follows from question 3.

⇐: We again use question 3 and assume that C is in standard form. Note that the linearly dependent set of n − k + 1 rows can be taken as the n − k rows of I_{n−k} together with a row from X.

(ii) C is MDS ⟺ every k columns of the generator matrix are linearly independent.

⇒: Let C be an MDS code and assume some set of k columns of its generator matrix G is linearly dependent, say c_{i_1}, c_{i_2}, ..., c_{i_k}. Consider the square matrix M = [c_{i_1}, c_{i_2}, ..., c_{i_k}]. Since the dimension of the row space equals the dimension of the column space, and since the k columns of M are linearly dependent, the k rows of M are also linearly dependent, so some nonempty set of them sums to zero. Summing the corresponding rows of G (which is the same as encoding the word with 1's in exactly those information positions) produces a nonzero codeword with zeros in positions i_1, i_2, ..., i_k. Therefore this codeword has at least k zeros, so it can have at most n − k ones. Since C is MDS, d_min = n − k + 1, implying that all nonzero codewords have weight at least n − k + 1, a contradiction.

⇐: Let C be a linear code such that every k columns of the generator matrix are linearly independent, and assume that d_min < n − k + 1. If w ∈ C is a word of minimum weight then wt(w) ≤ n − k, so w has at least k zeros, say in positions i_1, i_2, ..., i_k. Furthermore, let w be the sum of rows j_1, j_2, ..., j_l of G with j_1 < j_2 < ... < j_l ≤ k (so l ≤ k). Let M be the k × k submatrix of G consisting of columns i_1, i_2, ..., i_k. The rows j_1, j_2, ..., j_l of M sum to zero (because of the zeros of w in those positions), so the rows of M are linearly dependent, and hence so are its k columns. But these are exactly the columns c_{i_1}, c_{i_2}, ..., c_{i_k} of G, contradicting the assumption that every set of k columns of G is linearly independent.

(c) Assume C is in standard form. We know that if G and H are the generator and parity check matrices for C, then G^T and H^T are the parity check and generator matrices for C⊥. So from part (b) above we know that the parity check matrix for C⊥ (namely G^T) has every set of k rows linearly independent, and we can also find a set of k + 1 linearly dependent rows (as above). By question 3 this shows that d_min(C⊥) = k + 1. Therefore C⊥ is an [n, n − k, k + 1]-code, and since k + 1 = n − (n − k) + 1 it is also MDS.

Math 433D/550 Assignment 3

Due: Thursday, February 28, 2002, in class.

1. [5] Is it true that in a self-dual code all words have even weight?

2. [4] Let C be a Hamming code of length 15. Find the number of error patterns the extended code C∗ will detect and the number of error patterns that C∗ will correct.

3. [4] Count the number of codewords of weight 7 in the Golay code C23. (Hint: Start by proving that every word of weight 4 in K^23 is distance 3 from exactly one codeword.)

4. [4] Let G(1, 3) be the generator matrix for RM(1, 3). Decode the following received words: (i) 01011110; (ii) 01100111; (iii) 00010100; (iv) 11001110.

5. [5] If possible, devise a single error correcting code with length 6, 4 information digits, and using the digits 0, 1, 2, 3, 4, 5. Describe the code, an encoding procedure, and a decoding procedure. Prove that your code corrects all single errors. What is its information rate? If not possible, say why not.

6. Codewords x_1x_2x_3x_4x_5 with decimal digits are defined by x_1x_2x_3x_4 ≡ x_5 (mod 9), where x_5 is the check digit.
(a) [2] Show that the parity check equation is equivalent to x_1 + x_2 + x_3 + x_4 ≡ x_5 (mod 9).
(b) [3] Assuming that each single error is equally likely, what percentage of single errors are undetected?
(c) [2] Which errors involving transposition of digits are detected?

7. Consider the code with 10 decimal digits in which the check digit x_10 is the least residue of x_1x_2···x_9 (mod 7) (that is, 0 ≤ x_10 ≤ 6). (As of my last information, this code is used by UPS and Federal Express.)
(a) [3] Under what conditions are single errors undetected?
(b) [3] Assuming that each single error is equally likely, what percentage of single errors are undetected?
(c) [2] Repeat (b) for errors involving transposition of digits.

MATH 550 Assignment 3 Solutions

Question 1

Let v ∈ C, where C is a self-dual code. Then since C = C⊥, v · u = 0 for all u ∈ C. Specifically, v · v = 0. If v = v_1v_2···v_n, then v · v = v_1^2 + v_2^2 + ··· + v_n^2. In the binary case this equals v_1 + v_2 + ··· + v_n, which by the orthogonality equals zero. This implies that v has an even number of 1's, so that all v ∈ C have even weight.

Question 2

All Hamming codes have minimum distance 3. The extended code C∗ has minimum distance 4, so C∗ can detect all errors of weight 1, 2 or 3. Further, C∗ can detect all error patterns of odd weight. This can be seen from the parity check matrix for C∗:

H∗ =
[ 0 0 0 1 1 ]
[ 0 0 1 0 1 ]
[ 0 0 1 1 1 ]
[ 0 1 0 0 1 ]
[ 0 1 0 1 1 ]
[ 0 1 1 0 1 ]
[ 0 1 1 1 1 ]
[ 1 0 0 0 1 ]
[ 1 0 0 1 1 ]
[ 1 0 1 0 1 ]
[ 1 0 1 1 1 ]
[ 1 1 0 0 1 ]
[ 1 1 0 1 1 ]
[ 1 1 1 0 1 ]
[ 1 1 1 1 1 ]
[ 0 0 0 0 1 ]

The syndrome associated with an error pattern of odd weight is the sum of an odd number of rows of H∗. Such a sum always has a 1 in the last digit and is thus nonzero, enabling us to detect the error. The number of odd weight error patterns is

C(16, 1) + C(16, 3) + C(16, 5) + ··· + C(16, 15) = (1/2) · 2^16 = 2^15.

An even weight error pattern that is not detectable takes one codeword into another codeword; therefore such an error pattern is itself a codeword. All codewords of C∗ have even weight, and therefore they are exactly the even weight error patterns that C∗ cannot detect. C∗ has 2^{2^4 − 4 − 1} = 2^11 codewords. So the number of even weight error patterns that are detectable is

C(16, 0) + C(16, 2) + C(16, 4) + ··· + C(16, 16) − 2^11 = 2^15 − 2^11 = 30720.

Thus the number of detectable error patterns is the number of odd weight error patterns plus the number of even weight detectable error patterns, which is 2^15 + 30720 = 63488.

All error patterns of weight one are correctable since the minimum distance is 4. The syndrome corresponding to an error pattern of weight 2 is the sum of two rows of H∗. Such a sum is equal to a row of H∗ except that the entry on the right is 0 instead of 1. Each such sum can arise in at least two different ways: for example, row 1 + row 16 and row 2 + row 3 give the same syndrome, as do row 2 + row 16 and row 8 + row 10, and so on. Therefore the syndrome associated with an error pattern of weight 2 is not unique; in other words, the coset containing an error pattern of weight 2 contains at least one other error pattern of weight 2. Thus no error pattern of weight 2 is correctable.

An error pattern of weight k ≥ 3 is the sum of an error pattern of weight 2 and an error pattern of weight k − 2, so its syndrome is the sum of the syndrome of the weight 2 pattern and the syndrome of the weight k − 2 pattern. Since the first syndrome is not unique, these syndromes (for weight k error patterns) can arise in more than one way. So no error pattern of weight k ≥ 3 is correctable. Therefore the number of correctable error patterns equals the number of single errors, which is 16.
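These counts are easy to double-check mechanically; the sketch below (mine) redoes the arithmetic, using the fact that the undetectable patterns are exactly the codewords of C∗.

```python
from math import comb

# Error patterns of length 16, split by parity of weight.
odd = sum(comb(16, k) for k in range(1, 17, 2))
even = sum(comb(16, k) for k in range(0, 17, 2))

codewords = 2 ** (2 ** 4 - 4 - 1)         # |C*| = 2^11
detectable_even = even - codewords        # even patterns that are not codewords
detectable = odd + detectable_even
```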

Question 3

The Golay code C23 is a perfect (23, 12, 7)-code, so every word in K^23 is in exactly one ball with center a codeword and radius 3. The ball around the zero codeword contains words of weight at most 3, so no word of weight 4 is inside this ball. Let v ∈ C23 be a codeword of weight at least 8. The words inside the ball around v differ from v in 1, 2 or 3 places, so they have weight at least 5. Every nonzero codeword in C23 has weight at least 7, so consider now a codeword u ∈ C23 of weight 7. The ball around u contains the words that differ from u in 1, 2 or 3 places. By changing 3 of u's 1's to 0's we obtain a word of weight 4 inside this ball, so there are C(7, 3) = 35 words of weight 4 inside this ball. There are C(23, 4) = 8855 words of weight 4 in K^23 and they all have to lie inside a ball of radius 3 with center a codeword of weight 7. Since there are 35 words of weight 4 in each such ball, this requires 8855/35 = 253 balls. Therefore there are 253 codewords of weight 7.
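The counting argument can be checked in a line of Python:

```python
from math import comb

# Words of weight 4 in K^23, divided among balls of 35 such words each.
balls_needed = comb(23, 4) // comb(7, 3)
```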

Question 4

G(1, 3) =
[ 1 1 1 1 1 1 1 1 ]
[ 0 1 0 1 0 1 0 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 0 0 0 1 1 1 1 ]

Note that RM(1, 3)⊥ = RM(3 − 1 − 1, 3) = RM(1, 3). Therefore G(1, 3)G(1, 3)^T = 0. Also G(1, 3)^T has 4 linearly independent columns, so that G(1, 3)^T is a parity check matrix for RM(1, 3). Further, d_min(RM(1, 3)) = 4.

(i) 01011110 · G(1, 3)^T = 1100 + 1110 + 1001 + 1101 + 1011 = 1101. This equals the sixth row of G(1, 3)^T, and therefore we assume an error in the sixth position and decode to 01011010.

(ii) 01100111 · G(1, 3)^T = 1100 + 1010 + 1101 + 1011 + 1111 = 1111. This is row 8, so we assume a single error in position 8 and decode to 01100110.

(iii) 00010100 · G(1, 3)^T = 1110 + 1101 = 0011. This does not equal a row of G(1, 3)^T, so more than one error probably occurred. Furthermore, this syndrome can arise in two different ways: errors in positions 4 and 6, as well as errors in positions 3 and 5. So the best we can do is ask for a retransmission.

(iv) 11001110 · G(1, 3)^T = 1000 + 1100 + 1001 + 1101 + 1011 = 1011. This equals row 7, so we assume a single error in position 7 and decode to 11001100.

Question 5

A codeword will be made up of 4 information digits x_1, x_2, x_3 and x_4. The two parity digits will be chosen such that

S_1 = Σ_{i=1}^{6} i·x_i ≡ 0 (mod 7),
S_2 = Σ_{i=1}^{6} x_i ≡ 0 (mod 7).

Adding twice the second equation to the first, we find that 3x_1 + 4x_2 + 5x_3 + 6x_4 + x_6 ≡ 0 (mod 7). Therefore

x_6 ≡ −3x_1 − 4x_2 − 5x_3 − 6x_4 ≡ 4x_1 + 3x_2 + 2x_3 + x_4 (mod 7).

Adding the two check equations together, we find 2x_1 + 3x_2 + 4x_3 + 5x_4 + 6x_5 ≡ 0 (mod 7), so that

6x_5 ≡ −2x_1 − 3x_2 − 4x_3 − 5x_4,
∴ x_5 ≡ 2x_1 + 3x_2 + 4x_3 + 5x_4 (mod 7).

Up to now we have not placed any restriction on the digits of the code. Therefore x1 through x4 could be any digit from 0 to 6 and x5 and x6 will be any digit from 0 to 6. We will show later how to remedy this. First we show that the code can correct all single errors.

To encode a given word [x_1, x_2, x_3, x_4] we compute the product

[x_1, x_2, x_3, x_4] ·
[ 1 0 0 0 2 4 ]
[ 0 1 0 0 3 3 ]
[ 0 0 1 0 4 2 ]
[ 0 0 0 1 5 1 ]

Assume now that a single error of size e occurs in position i. Then

S_1 = 1·x_1 + ··· + i(x_i + e) + ··· + 6x_6 ≡ 0 + i·e (mod 7),
S_2 = x_1 + ··· + (x_i + e) + ··· + x_6 ≡ 0 + e (mod 7).

Therefore by computing first S_2 and then S_1 we can find the size and position of the error. Let

H =
[ 1 1 ]
[ 2 1 ]
[ 3 1 ]
[ 4 1 ]
[ 5 1 ]
[ 6 1 ]

Then the decoding may be described as follows.

1. Compute rH = [S1,S2], where r is the received word and H is defined above.

2. If S1 = 0 and S2 = 0 assume the word is correct.

3. If S_1 ≠ 0 and S_2 ≠ 0, then assume an error of size S_2 occurred in position S_2^{−1}·S_1 (mod 7).

4. If exactly one of S_1 and S_2 is nonzero, then assume that more than one error occurred.
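The encoding and decoding procedure above can be sketched and checked exhaustively; the function names are mine, and digits are taken in Z_7 throughout (the restriction to the digits 0 through 5 comes next).

```python
# Single error correcting code over Z_7: S1 = sum(i * x_i) ≡ 0 and
# S2 = sum(x_i) ≡ 0 (mod 7), with x5, x6 the parity digits derived above.
def encode(x1, x2, x3, x4):
    x5 = (2 * x1 + 3 * x2 + 4 * x3 + 5 * x4) % 7
    x6 = (4 * x1 + 3 * x2 + 2 * x3 + x4) % 7
    return [x1, x2, x3, x4, x5, x6]

def decode(r):
    s1 = sum((i + 1) * d for i, d in enumerate(r)) % 7
    s2 = sum(r) % 7
    if s1 == 0 and s2 == 0:
        return r                          # assume the word is correct
    if s1 != 0 and s2 != 0:
        # single error of size s2 in position s2^{-1} * s1;
        # s2^{-1} = s2^5 mod 7 by Fermat's little theorem
        pos = (pow(s2, 5, 7) * s1) % 7
        w = list(r)
        w[pos - 1] = (w[pos - 1] - s2) % 7
        return w
    return None                           # one syndrome zero: more than one error
```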

We now modify the code so that it only uses the digits 0 through 5. By simply restricting the possibilities for the digits x_1, x_2, x_3 and x_4 to {0, 1, 2, 3, 4, 5} we ensure that they meet the requirement. The possibility still exists that x_5, x_6 or both could equal 6. So among the 6^4 codewords that have x_1, x_2, x_3 and x_4 in {0, 1, 2, 3, 4, 5}, we want to remove those with x_5, x_6 or both equal to 6. For x_5 = 6 we have

2x_1 + 3x_2 + 4x_3 + 5x_4 ≡ 6 (mod 7), where 0 ≤ x_i ≤ 5. This is the same as having

y_1 + y_2 + y_3 + y_4 = 7k + 6, with y_1 = 2x_1, y_2 = 3x_2, y_3 = 4x_3 and y_4 = 5x_4, so that y_1 ∈ {0, 2, 4, 6, 8, 10}, y_2 ∈ {0, 3, 6, 9, 12, 15}, y_3 ∈ {0, 4, 8, 12, 16, 20} and y_4 ∈ {0, 5, 10, 15, 20, 25}. The generating function that counts the number of solutions in this case is

(1 + x^2 + x^4 + x^6 + x^8 + x^10)(1 + x^3 + x^6 + x^9 + x^12 + x^15)(1 + x^4 + x^8 + x^12 + x^16 + x^20)(1 + x^5 + x^10 + x^15 + x^20 + x^25).

We want the coefficient of x^i for i = 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, where i corresponds to the possible values 7k + 6 for k = 0, 1, 2, ..., 9. The corresponding coefficients are 3, 10, 22, 33, 38, 36, 25, 13, 5, 0. So in total there are 3 + 10 + 22 + 33 + 38 + 36 + 25 + 13 + 5 + 0 = 185 solutions (codewords, then) that have x_5 = 6. Proceeding in the same manner as above, we find that for x_6 = 6 we get

4x_1 + 3x_2 + 2x_3 + x_4 ≡ 6 (mod 7), which is the same as

y_1 + y_2 + y_3 + y_4 = 7k + 6, with y_1 ∈ {0, 4, 8, 12, 16, 20}, y_2 ∈ {0, 3, 6, 9, 12, 15}, y_3 ∈ {0, 2, 4, 6, 8, 10} and y_4 ∈ {0, 1, 2, 3, 4, 5}. Here the generating function is

(1 + x^4 + x^8 + x^12 + x^16 + x^20)(1 + x^3 + x^6 + x^9 + x^12 + x^15)(1 + x^2 + x^4 + x^6 + x^8 + x^10)(1 + x + x^2 + x^3 + x^4 + x^5).

The coefficients that we are after are again the coefficients of x^i for i = 6, 13, 20, 27, 34, 41, 48, 55, 62, 69. In this case they are 8, 27, 46, 51, 36, 15, 2, 0, 0, 0. Therefore there are 8 + 27 + 46 + 51 + 36 + 15 + 2 + 0 + 0 + 0 = 185 codewords that have x_6 = 6.
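Since there are only 6^4 = 1296 choices of the information digits, the generating-function counts (and the x_5 = x_6 = 6 count that follows) can be cross-checked by brute force; this enumeration is my own sketch.

```python
from itertools import product

a = b = both = 0
for x1, x2, x3, x4 in product(range(6), repeat=4):
    x5 = (2 * x1 + 3 * x2 + 4 * x3 + 5 * x4) % 7
    x6 = (4 * x1 + 3 * x2 + 2 * x3 + x4) % 7
    a += (x5 == 6)                  # codewords with x5 = 6
    b += (x6 == 6)                  # codewords with x6 = 6
    both += (x5 == 6 and x6 == 6)   # codewords with x5 = x6 = 6

remaining = 6 ** 4 - (a + b - both)  # inclusion-exclusion
```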

Lastly we need the case x5 = x6 = 6. In this case we have

2x_1 + 3x_2 + 4x_3 + 5x_4 ≡ 6 (mod 7) and
4x_1 + 3x_2 + 2x_3 + x_4 ≡ 6 (mod 7).

Subtracting the second equation from the first gives

5x_1 + 2x_3 + 4x_4 ≡ 0 (mod 7). This is the same as

y_1 + y_2 + y_3 = 7k, where y_1 = 5x_1, y_2 = 2x_3 and y_3 = 4x_4, so that y_1 ∈ {0, 5, 10, 15, 20, 25}, y_2 ∈ {0, 2, 4, 6, 8, 10} and y_3 ∈ {0, 4, 8, 12, 16, 20}. The generating function for counting the number of solutions is

(1 + x^5 + x^10 + x^15 + x^20 + x^25)(1 + x^2 + x^4 + x^6 + x^8 + x^10)(1 + x^4 + x^8 + x^12 + x^16 + x^20).

Here we want the coefficients of x^i for i = 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70. They are respectively 1, 1, 5, 5, 7, 7, 3, 2, 0, 0, 0, so there are 1 + 1 + 5 + 5 + 7 + 7 + 3 + 2 + 0 + 0 + 0 = 31 triples (x_1, x_3, x_4) satisfying this congruence. For each such triple, however, the remaining digit x_2 is forced: from 4x_1 + 3x_2 + 2x_3 + x_4 ≡ 6 (mod 7) we get x_2 ≡ 2 + x_1 + 4x_3 + 2x_4 (mod 7), and this residue must itself lie in {0, 1, 2, 3, 4, 5}. Checking the 31 triples, 4 of them force x_2 = 6, so there are 31 − 4 = 27 codewords with x_5 = x_6 = 6.

Therefore the total number of codewords where x_5, x_6 or both equal 6 is 185 + 185 − 27 = 343. By removing these codewords from the code we get a code with 6^4 − 343 = 953 codewords. For the rate we notice that there are 6^4 = 1296 possible codewords but that only 953 of them are used; therefore the rate is 953/6^4 ≈ 0.74.

Question 6

(a) The parity check equation is x_1x_2x_3x_4 ≡ x_5 (mod 9), where x_1x_2x_3x_4 is the number with the digits x_1, x_2, x_3, x_4 and not the product of these digits. Therefore the parity check equation is x_1·10^3 + x_2·10^2 + x_3·10^1 + x_4·10^0 ≡ x_5 (mod 9). This is the same as 999x_1 + 99x_2 + 9x_3 + (x_1 + x_2 + x_3 + x_4) ≡ x_5 (mod 9). The first set of terms is congruent to zero mod 9, so the parity check equation is equivalent to x_1 + x_2 + x_3 + x_4 ≡ x_5 (mod 9).

(b) If an error e occurs in position 5 and is to be undetectable, it has to be the case that x_1 + x_2 + x_3 + x_4 ≡ x_5 + e (mod 9). This says that e ≡ 0 (mod 9), which corresponds to the two errors 0 → 9 and 9 → 0. Since 0 ≤ x_5 ≤ 8, the only candidate in position 5 is 0 → 9. This error is nevertheless detectable: a received check digit of 9 can never be legitimate, since the check digit is a least residue mod 9. Note, however, that it cannot be detected from the check equation alone, because 9 ≡ 0 (mod 9). So all errors in position 5 are detectable.

If an undetectable error e occurs in positions 1 through 4, then x_1 + x_2 + x_3 + x_4 + e ≡ x_5 (mod 9). Again this says that e ≡ 0 (mod 9), which corresponds to the errors 0 → 9 and 9 → 0.

Now in total there are 10 × 5 × 9 = 450 possible single errors: one of 10 digits is in error, in one of 5 positions, and it can be changed to one of 9 possible values. Since all errors in position 5 are detectable, we only need to consider the errors in positions 1 through 4. Of these we know that the errors 0 → 9 and 9 → 0 are undetectable. These two types of errors can occur in any of the positions 1 through 4, so there are 8 possible undetected errors. Therefore the percentage that is undetectable is 8/450 ≈ 1.8%.

(c) If two digits from x_1, x_2, x_3, x_4 are transposed, this will not be detected, since the parity check equation remains valid. Now say that x_4 and x_5 are transposed; the check equation becomes x_1 + x_2 + x_3 + x_5 ≡ x_4 (mod 9). This is the same as x_1 + x_2 + x_3 + x_4 + x_5 ≡ 2x_4 (mod 9), which in turn gives 2x_5 ≡ 2x_4 (mod 9), implying x_4 ≡ x_5 (mod 9). With x_4 ≠ x_5, the only way this can occur is if x_4 = 9 and x_5 = 0. Therefore transpositions involving the check digit are detectable as long as the digits involved are not 9 (for x_1 through x_4) and 0 (for x_5).
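A brute-force tally (my own check) over every codeword and every single digit change reproduces the 8/450 figure, provided a received check digit of 9 is treated as automatically detected (the check digit is a least residue mod 9):

```python
from itertools import product

undetected = total = 0
for x in product(range(10), repeat=4):
    word = list(x) + [sum(x) % 9]          # append the check digit
    for pos in range(5):
        for digit in range(10):
            if digit == word[pos]:
                continue                   # not an error
            total += 1
            w = list(word)
            w[pos] = digit
            # undetected only if the check digit is in range AND the
            # check equation still holds
            if w[4] <= 8 and sum(w[:4]) % 9 == w[4]:
                undetected += 1
```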

Question 7

The check equation is x_1x_2···x_9 ≡ x_10 (mod 7) (with x_1x_2···x_9 again denoting the number with these digits). It is therefore the same as

x_1·10^8 + x_2·10^7 + ··· + x_9·10^0 ≡ x_10 (mod 7).

(a) Say a single error of size e occurs in position i, 1 ≤ i ≤ 9. The check equation becomes x_1·10^8 + x_2·10^7 + ··· + (x_i + e)·10^{9−i} + ··· + x_9·10^0 ≡ x_10 + e·10^{9−i} (mod 7). The error will be undetectable if e·10^{9−i} ≡ 0 (mod 7). This implies that e = ±7, which corresponds to the following errors: 0 → 7, 1 → 8, 2 → 9, 7 → 0, 8 → 1 and 9 → 2.

If an undetectable error of size e occurs in position 10, then it must (also) be the case that e ≡ 0 (mod 7). This corresponds to the same set of errors as above, but now, since 0 ≤ x_10 ≤ 6, the only candidates are 0 → 7, 1 → 8 and 2 → 9. Since all of these change x_10 into something bigger than 6, they will be detectable.

(b) Consider first errors occurring in positions 1 through 9. For each of the 9 possible positions there are 10 possible digits, and for each digit and position there are 9 possible errors. Therefore in the first nine positions there are 10 × 9 × 9 = 810 possible errors. In the tenth position there can be any one of 7 digits, and each digit can be changed into one of 9 possibilities, so there are 7 × 9 = 63 single errors involving the tenth position. In total there are 810 + 63 = 873 possible single errors.

All errors in position 10 are detectable, so we only concern ourselves with the first nine positions. In these positions there are 6 possible undetectable errors, and each of them can occur in one of 9 positions, so there are 9 × 6 = 54 undetectable errors. Therefore the percentage of undetectable errors is 54/873 ≈ 6%.

(c) Let x_i and x_j be transposed, with i < j. If i = 1, there are 9 possible transpositions: x_1 ↔ x_2, x_1 ↔ x_3, ..., x_1 ↔ x_10. If i = 2 there are 8 transpositions, and so on; with i = 9 there is 1 transposition. In total there are 1 + 2 + ··· + 9 = 45 transpositions. Each transposition that is an error is made up of an ordered pair of distinct digits: there are 90 of these. Each pair can be combined with each of the 45 transpositions, so there are 90 × 45 = 4050 transpositions that are errors.

Say digits x_i and x_j are transposed, where 1 ≤ i < j ≤ 9. The checksum becomes

x_1·10^8 + ··· + x_j·10^{9−i} + ··· + x_i·10^{9−j} + ··· + x_9·10^0
= x_1·10^8 + ··· + x_i·10^{9−i} + ··· + x_j·10^{9−j} + ··· + x_9·10^0 − x_i·10^{9−i} + x_j·10^{9−i} − x_j·10^{9−j} + x_i·10^{9−j}
≡ x_10 + (x_j − x_i)·10^{9−i} − (x_j − x_i)·10^{9−j} (mod 7).

The error will be undetectable if (x_j − x_i)(10^{9−i} − 10^{9−j}) ≡ 0 (mod 7). That is, if x_j − x_i = ±7 or if 10^{9−i} − 10^{9−j} ≡ 0 (mod 7). The last equation is the same as 10^{9−j}(10^{j−i} − 1) ≡ 0 (mod 7). Now 7 ∤ 10^{9−j}, so 7 | 10^{j−i} − 1. That is, 10^{j−i} ≡ 1 (mod 7), implying j − i ≡ 0 (mod 6), since 10 has order φ(7) = 6 modulo 7. This corresponds to j = 9, i = 3; j = 8, i = 2; and j = 7, i = 1. Therefore if the digits in positions 1 and 7, 2 and 8, or 3 and 9 are transposed, an undetectable error occurs (regardless of the digits involved). There are 90 × 3 = 270 transpositions involving these positions.

Furthermore, if the size of a transposition is ±7, that is x_i − x_j = ±7, the transposition is undetectable. This corresponds to the 6 transpositions 0 ↔ 7, 1 ↔ 8, 2 ↔ 9, 7 ↔ 0, 8 ↔ 1 and 9 ↔ 2. Transpositions of size ±7 have already been counted for positions 3 and 9, 2 and 8, and 1 and 7 above, so we are left with the transpositions involving all the other position pairs among the first nine positions. There are (1 + 2 + 3 + ··· + 8) − 3 = 33 of these (the total number of transpositions involving the first 9 positions minus the 3 considered above). So there are 33 × 6 = 198 transpositions of size ±7 in the remaining positions.

The check equation is equivalent to

2x_1 + 3x_2 + x_3 + 5x_4 + 4x_5 + 6x_6 + 2x_7 + 3x_8 + x_9 + 6x_10 ≡ 0 (mod 7).

Therefore x_10 and x_6 can always be transposed without it being detected. There are 90 such transpositions. So there are 270 + 198 + 90 = 558 undetectable transpositions, and the percentage is 558/4050 ≈ 14%.
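The coefficients in this last congruence are simply the powers 10^{9−i} reduced mod 7 (with 6 ≡ −1 for the check digit); a one-line check:

```python
# Coefficients of x_1, ..., x_9 and x_10 in the check congruence mod 7.
coeffs = [pow(10, 9 - i, 7) for i in range(1, 10)] + [(-1) % 7]
```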

Math 433D/550 Assignment 4

Due: Thursday, March 28, 2002, in class.

1. [5] The following text was intercepted from an Affine cipher: KQEREJEBCPPCJCRKIEACUZBKRVPKRBCIBQCARBJCVFCUP KRIOFKPACUZQEPBKRXPEIIEABDKPBCPFCDCCAFIEABDKP BCPFEQPKAZBKRHAIBKAPCCIBURCCDKDCCJCIDFUIXPAFF ERBICZDFKABICBBENEFCUPJCVKABPCYDCCDPKBCOCPERK IVKSCPICBRKIJPKABI Determine the plain-text. Give a clear description of the steps you followed.

2. Suppose Bob has an RSA cryptosystem with a large modulus n which cannot be factored easily. Alice sends a message to Bob by representing the alphabetic characters A, B, ..., Z as 0, 1, ..., 25, respectively, and then encrypting each character (i.e., number) separately.
(a) [4] Explain how a message encrypted in this way can easily be decrypted.
(b) [2] The following cipher-text was encrypted using the scheme described above with n = 18721 and b = 25:
365, 0, 4845, 14930, 2608, 2608, 0
Illustrate your method from (a) by decrypting this cipher-text without factoring n.
This example illustrates a protocol failure in RSA. It demonstrates that a cipher-text can sometimes be decrypted by an adversary if the system is used in a careless way. Thus a secure cryptosystem is not enough to assure secure communication; it must also be used properly.

3. [5] What happens if the RSA system is set up using p and q where p is prime but q is not? Does encryption work (is EK(x) 1-1)? Can all encrypted messages be uniquely decrypted? Illustrate your points with an example where p and q are two digit numbers.

4. [4] Find a (hopefully small) composite odd integer n, and an integer a, 1 ≤ a ≤ n − 1, for which the Miller-Rabin algorithm answers "prime". Demonstrate this by stepping through the algorithm, assuming a is generated in step 2.

5. Suppose p = 199, q = 211 and B = 1357 in a Rabin cryptosystem. (a) [2] Determine the four square roots of 1 modulo n, where n = pq.

(b) [2] Compute E_k(32767). (c) [2] Determine the four possible decryptions of this cipher-text y.

6. [4] Factor 262063 and 9420457 using the p − 1 method. In each case, how big does B have to be to be successful?

7. [6] Gary's Poor Security (GPS) public key cryptosystem has E_k(x) = ax (mod n), where n = 17575, a is the receiver's public key, and gcd(a, n) = 1. The plain-text space and cipher-text space are both Z_n, and each element of Z_n represents three alphabetic characters as in the following examples:
DOG → 3 × 26^2 + 14 × 26 + 6 = 2398,
CAT → 2 × 26^2 + 0 × 26 + 19 = 1731.
The following message has been encrypted using Gary's public key, 1411:
7017, 17342, 5595, 16298, 12285
Explain how to break the system and decrypt the message. Do it. Show your work.

8. [5] The following message was encrypted using the Rabin cryptosystem with n = 19177 = 127 × 151 and B = 5679:
2251, 8836, 7291, 6035

The elements of Z_n correspond to triples of alphabetic characters as in question 7. Decrypt the message. Explain how you decided among the four possible plain-texts for each cipher-text symbol.

MATH 550 Assignment 4 Solutions

Question 1

The frequencies of the letters in the cipher-text are shown below.

Letter  Frequency    Letter  Frequency
A       13           O       2
B       21           P       20
C       32           Q       4
D       9            R       12
E       13           S       1
F       10           U       6
H       1            V       4
I       16           X       2
J       6            Y       1
K       20           Z       4
N       1

The most frequent letters turn out to be C, B, K, P and I. Based on this we guess that E → C and T → B. That is,

E_k(4) = 2, ∴ a·4 + b ≡ 2 (mod 26),
E_k(19) = 1, ∴ a·19 + b ≡ 1 (mod 26).

From this we find that a ≡ 19 (mod 26) and b ≡ 4 (mod 26). Therefore

D_k(y) = a^{−1}(y − b) ≡ 11(y + 22) (mod 26).

Applying D_k(y) to the cipher-text we find the following.

D_k(A) = I    D_k(O) = G
D_k(B) = T    D_k(P) = R
D_k(C) = E    D_k(Q) = C
D_k(D) = P    D_k(R) = N
D_k(E) = A    D_k(S) = Y
D_k(F) = L    D_k(U) = U
D_k(H) = H    D_k(V) = F
D_k(I) = S    D_k(X) = B
D_k(J) = D    D_k(Y) = M
D_k(K) = O    D_k(Z) = X
D_k(N) = V

Therefore the cipher-text decrypts to the following.

OCANADATERREDENOSAIEUXTONFRONTESTCEINTDEFLEUR ONSGLORIEUXCARTONBRASSAITPORTERLEPEEILSAITPOR TERLACROIXTONHISTOIREESTUNEEPOPEEDESPLUSBRILL ANTSEXPLOITSETTAVALEURDEFOITREMPEEPROTEGERANO SFOYERSETNOSDROITS

This turns out to be the following.

Ô Canada! Terre de nos aïeux, Ton front est ceint De fleurons glorieux. Car ton bras sait porter l'épée, Il sait porter la croix. Ton histoire est une épopée Des plus brillants exploits. Et ta valeur, de foi trempée, Protégera nos foyers et nos droits.
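The whole decryption is a few lines of Python; with a = 19 and b = 4 as found above, D_k(y) = 11(y − 4) mod 26 (the helper name is mine):

```python
cipher = (
    "KQEREJEBCPPCJCRKIEACUZBKRVPKRBCIBQCARBJCVFCUP"
    "KRIOFKPACUZQEPBKRXPEIIEABDKPBCPFCDCCAFIEABDKP"
    "BCPFEQPKAZBKRHAIBKAPCCIBURCCDKDCCJCIDFUIXPAFF"
    "ERBICZDFKABICBBENEFCUPJCVKABPCYDCCDPKBCOCPERK"
    "IVKSCPICBRKIJPKABI"
)

def affine_decrypt(text, a_inv=11, b=4):
    # D_k(y) = a^{-1}(y - b) mod 26, with a = 19, b = 4 and a^{-1} = 11.
    return "".join(chr((a_inv * (ord(c) - 65 - b)) % 26 + 65) for c in text)

plain = affine_decrypt(cipher)
```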

Question 2

(a) The eavesdropper encrypts the alphabet A, B, ..., Z. He or she then knows what the cipher-text of each plain-text letter is, and since the encryption function is one-to-one, this table can be inverted to find the plain-text letter for each cipher-text "letter".

(b) Encrypting the alphabet using the given parameters we find the following.

A → 0        N → 4845
B → 1        O → 1375
C → 6400     P → 13444
D → 18718    Q → 16
E → 17173    R → 13663
F → 1759     S → 1437
G → 18242    T → 2940
H → 12359    U → 10334
I → 14930    V → 365
J → 9        W → 10789
K → 6279     X → 8945
L → 2608     Y → 11373
M → 4644     Z → 5116

Reading off the plain-text from the table we find that the cipher-text message is: VANILLA.
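The attack of part (a) is a dictionary attack: with only 26 possible plain-text units, an eavesdropper tabulates E_k(m) = m^25 mod 18721 for m = 0, ..., 25 and inverts by table lookup. A sketch (the helper names here are mine):

```python
def build_table(n=18721, b=25):
    # cipher-text value -> plain-text letter, for all 26 one-letter messages
    return {pow(m, b, n): chr(65 + m) for m in range(26)}

def recover(ciphertexts, n=18721, b=25):
    table = build_table(n, b)
    return "".join(table[c] for c in ciphertexts)
```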

Question 3

In general the encryption function will not be one-to-one. To see this let p = 11 and q = 12. Then n = pq = 132 = 2^2 × 3 × 11 and φ(n) = 132 × (1/2) × (2/3) × (10/11) = 40. Thus if we choose b = 3 then gcd(b, φ(n)) = 1. Further a ≡ b^{−1} (mod φ(n)), so that a ≡ 27 (mod 40). We now find that E_k(2) ≡ 2^b ≡ 2^3 ≡ 8 (mod 132) and D_k(8) ≡ 8^a ≡ 8^27 ≡ 68 (mod 132). So here D_k(E_k(2)) ≠ 2. Also, E_k(68) ≡ 68^3 ≡ 8 (mod 132). Therefore two different elements, 2 and 68, both encrypt to 8, and the cipher-text 8 cannot be decrypted uniquely.
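That the cubing map mod 132 really fails to be one-to-one is easy to confirm numerically; this check (mine) groups all residues by their "cipher-text" value:

```python
n, b, a = 132, 3, 27           # n = 11 * 12, a = b^{-1} mod phi(n) = 40

# Group every x in Z_132 by the value of x^3 mod 132.
preimages = {}
for x in range(n):
    preimages.setdefault(pow(x, b, n), []).append(x)

collisions = [v for v in preimages.values() if len(v) > 1]
```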

Question 4

Let n = 49 = 7 × 7. Then n − 1 = 48 = 2^4 × 3, so m = 3 in the Miller-Rabin algorithm. If we choose a = 18 or a = 30, then in step 3, b = a^3 ≡ 1 (mod 49). In step 4 the algorithm will answer "prime", contrary to n being composite.
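A sketch of a single Miller-Rabin round (my own implementation of the standard algorithm) confirms that a = 18 and a = 30 fail to witness the compositeness of n = 49, while for instance a = 2 succeeds:

```python
def is_witness(a, n):
    # One Miller-Rabin round: write n - 1 = 2^s * m with m odd.
    s, m = 0, n - 1
    while m % 2 == 0:
        s, m = s + 1, m // 2
    b = pow(a, m, n)
    if b == 1 or b == n - 1:
        return False               # a does not witness: answer "prime"
    for _ in range(s - 1):
        b = (b * b) % n
        if b == n - 1:
            return False
    return True                    # n is certainly composite
```

is_witness(a, n) returns False exactly when the round would answer "prime".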

Question 5

p = 199, q = 211, n = pq = 41989 and B = 1357.

(a) x^2 ≡ 1 (mod 41989) ⟺ x ≡ ±1 (mod 199) and x ≡ ±1 (mod 211).

From x ≡ 1 (mod 199) and x ≡ 1 (mod 211) the Chinese Remainder Theorem gives the solution x ≡ 1 × M_1 × y_1 + 1 × M_2 × y_2, where

M_1 = 211, M_2 = 199,
y_1 ≡ M_1^{−1} ≡ 83 (mod 199), y_2 ≡ M_2^{−1} ≡ 123 (mod 211).

Therefore x ≡ 1 × 211 × 83 + 1 × 199 × 123 ≡ 1 (mod 41989).

From x ≡ 1 (mod 199) and x ≡ −1 (mod 211) the Chinese Remainder Theorem gives the solution x ≡ 1 × M_1 × y_1 + (−1) × M_2 × y_2 ≡ 1 × 211 × 83 − 1 × 199 × 123 ≡ 35025 (mod 41989).

The other two solutions are therefore x ≡ −1 ≡ 41988 (mod 41989) and x ≡ −35025 ≡ 6964 (mod 41989).

(b) E_k(x) = x(x + B) (mod n). Therefore

E_k(32767) ≡ 32767 × (32767 + 1357) ≡ 16027 (mod 41989).

(c) We know that

E_k(ω(x + B/2) − B/2) = E_k(x),

where ω is a square root of 1 (mod 41989). Also 2^{−1} ≡ 20995 (mod 41989), so B/2 ≡ 1357 × 20995 ≡ 21673 (mod 41989). Therefore

ω(32767 + 21673) − 21673 ≡ ω × 12451 + 20316 (mod 41989).

This gives the four possible decryptions of 16027 as

ω = 1 : 1 × 12451 + 20316 ≡ 32767 (mod 41989),
ω = −1 : −1 × 12451 + 20316 ≡ 7865 (mod 41989),
ω = 35025 : 35025 × 12451 + 20316 ≡ 18837 (mod 41989),
ω = 6964 : 6964 × 12451 + 20316 ≡ 21795 (mod 41989).
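For a modulus as small as 41989, all three parts can be confirmed by exhaustive search (my own check):

```python
n, B = 41989, 1357

# (a) the four square roots of 1 mod n
roots_of_one = [x for x in range(n) if x * x % n == 1]

# (b) and (c): the cipher-text of 32767 and everything that encrypts to it
y = 32767 * (32767 + B) % n
decryptions = sorted(x for x in range(n) if x * (x + B) % n == y)
```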

Question 6

Using the p − 1 method we find that 262063 = 521 × 503, and the first B for which the method produces an answer is B = 13. Further, 9420457 = 2351 × 4007, and the smallest B that works in this case is B = 47.
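A minimal version of the p − 1 method (base 2, exponent built up as B!) reproduces both factorisations; this sketch is mine. It succeeds for 262063 at B = 13 because 521 − 1 = 520 = 2^3 × 5 × 13 divides 13!, while 503 − 1 = 2 × 251 does not.

```python
from math import gcd

def pollard_p_minus_1(n, B):
    # Compute a = 2^(B!) mod n one factor at a time, then take gcd(a - 1, n).
    a = 2
    for j in range(2, B + 1):
        a = pow(a, j, n)
    return gcd(a - 1, n)
```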

Question 7

We are given that E_k(x) = ax (mod 17575), with a = 1411. From this we see that D_k(y) = a^{−1}y (mod 17575). Therefore all we need to do is find a^{−1} (mod 17575). We find that a^{−1} ≡ 16591 (mod 17575). Decrypting the given cipher-text we get

2247, 797, 13070, 8743, 3160.

From this we get

2247 = 3 × 26^2 + 8 × 26 + 11,
797 = 1 × 26^2 + 4 × 26 + 17,
13070 = 19 × 26^2 + 8 × 26 + 18,
8743 = 12 × 26^2 + 24 × 26 + 7,
3160 = 4 × 26^2 + 17 × 26 + 14.

Reading off the letters we find that the plain-text is: DILBERT IS MY HERO.
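The break is just one modular inversion; the following sketch (helper names mine) recovers the message:

```python
n, a = 17575, 1411
a_inv = pow(a, -1, n)           # exists since gcd(1411, 17575) = 1

def to_letters(m):
    # invert the 3-letter encoding c1*26^2 + c2*26 + c3
    c1, rest = divmod(m, 26 * 26)
    c2, c3 = divmod(rest, 26)
    return "".join(chr(65 + c) for c in (c1, c2, c3))

cipher = [7017, 17342, 5595, 16298, 12285]
plain = "".join(to_letters(a_inv * y % n) for y in cipher)
```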

Question 8

We are given n = 127 × 151 = 19177 and B = 5679. Therefore

D_k(y) = √(y + 4^{−1}B^2) − 2^{−1}B.

Now B^2 ≡ 14504 (mod 19177) and 4^{−1} ≡ 14383 (mod 19177), so that 4^{−1}B^2 ≡ 3626 (mod 19177). Also 2^{−1} ≡ 9589 (mod 19177), therefore 2^{−1}B ≡ 12428 (mod 19177). We now have

D_k(y) = √(y + 3626) − 12428 ≡ √(y + 3626) + 6749 (mod 19177).

Decrypting the cipher-text we find the following. D_k(2251) ≡ √5877 + 6749 (mod 19177). Now √5877 ≡ ±5877^{128/4} ≡ ±17 (mod 127) and √5877 ≡ ±5877^{152/4} ≡ ±21 (mod 151).

From x ≡ 17 (mod 127) and x ≡ 21 (mod 151) the Chinese Remainder Theorem gives x ≡ 17 × 151 × y_1 + 21 × 127 × y_2 (mod 19177). Here y_1 ≡ 151^{−1} ≡ 90 (mod 127) and y_2 ≡ 127^{−1} ≡ 44 (mod 151). Therefore x ≡ 17 × 151 × 90 + 21 × 127 × 44 ≡ 3192 (mod 19177).

From x ≡ 17 (mod 127) and x ≡ −21 (mod 151), we find x ≡ 17 × 151 × 90 + (−21) × 127 × 44 ≡ 17797 (mod 19177).

From x ≡ −17 (mod 127) and x ≡ 21 (mod 151), we find x ≡ −17 × 151 × 90 + 21 × 127 × 44 ≡ 1380 (mod 19177).

From x ≡ −17 (mod 127) and x ≡ −21 (mod 151), we find x ≡ −17 × 151 × 90 + (−21) × 127 × 44 ≡ 15985 (mod 19177).

Therefore

D_k(2251) ≡ 3192 + 6749 ≡ 9941 (mod 19177),
D_k(2251) ≡ 17797 + 6749 ≡ 5369 (mod 19177),
D_k(2251) ≡ 1380 + 6749 ≡ 8129 (mod 19177),
D_k(2251) ≡ 15985 + 6749 ≡ 3557 (mod 19177).

Giving us the following:

9941 = 14 × 26^2 + 18 × 26 + 9,
5369 = 7 × 26^2 + 24 × 26 + 13,
8129 = 12 × 26^2 + 0 × 26 + 17,
3557 = 5 × 26^2 + 6 × 26 + 21.

In each case this corresponds to the plain-texts OSJ, HYN, MAR, FGV. At this point the plain-text that holds the most promise seems to be MAR.

D_k(8836) ≡ √12462 + 6749 (mod 19177). Here we find that the four square roots of 12462 (mod 19177) are 18038, 13974, 5203 and 1139. Therefore

D_k(8836) ≡ 18038 + 6749 ≡ 5610 (mod 19177),
D_k(8836) ≡ 13974 + 6749 ≡ 1546 (mod 19177),
D_k(8836) ≡ 5203 + 6749 ≡ 11952 (mod 19177),
D_k(8836) ≡ 1139 + 6749 ≡ 7888 (mod 19177).

We now have

5610 = 8 × 26^2 + 7 × 26 + 20,
1546 = 2 × 26^2 + 7 × 26 + 12,
11952 = 17 × 26^2 + 17 × 26 + 18,
7888 = 11 × 26^2 + 17 × 26 + 10.

Here the corresponding plain-texts are IHU, CHM, RRS and LRK. Considered on their own, none of these seems a better choice than the others, but combined with the first set, CHM stands out, as it gives MARCHM.
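Each decryption step needs the four square roots of y + 3626 modulo 19177 = 127 × 151. Since both primes are ≡ 3 (mod 4), a square root mod p is c^{(p+1)/4} mod p, and the four roots mod n then come from the Chinese Remainder Theorem; the following sketch (mine) reproduces the roots used for the first cipher-text symbol.

```python
def sqrt_mod_n(c, p=127, q=151):
    n = p * q
    rp = pow(c, (p + 1) // 4, p)      # works because p ≡ 3 (mod 4)
    rq = pow(c, (q + 1) // 4, q)
    yp = pow(q, -1, p)                # CRT coefficients
    yq = pow(p, -1, q)
    roots = set()
    for sp in (rp, p - rp):
        for sq in (rq, q - rq):
            roots.add((sp * q * yp + sq * p * yq) % n)
    return sorted(roots)
```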

D_k(7291) ≡ √10917 + 6749 (mod 19177). The four square roots of 10917 are 12519, 15567, 3610 and 6658. This gives

D_k(7291) ≡ 12519 + 6749 ≡ 91 (mod 19177),
D_k(7291) ≡ 15567 + 6749 ≡ 3139 (mod 19177),
D_k(7291) ≡ 3610 + 6749 ≡ 10359 (mod 19177),
D_k(7291) ≡ 6658 + 6749 ≡ 13407 (mod 19177).

This leads to

91 = 0 × 26^2 + 3 × 26 + 13,
3139 = 4 × 26^2 + 16 × 26 + 19,
10359 = 15 × 26^2 + 8 × 26 + 11,
13407 = 19 × 26^2 + 21 × 26 + 17.

The plain-texts are ADN, EQT, PIL and TVR. Out of these four, the only one that combines with the result so far in a sensible manner is the first one, giving MARCHMADN.

D_k(6035) ≡ √9661 + 6749 (mod 19177). The four square roots of 9661 are 17904, 15618, 3559 and 1273. This gives

D_k(6035) ≡ 17904 + 6749 ≡ 5476 (mod 19177),
D_k(6035) ≡ 15618 + 6749 ≡ 3190 (mod 19177),
D_k(6035) ≡ 3559 + 6749 ≡ 10308 (mod 19177),
D_k(6035) ≡ 1273 + 6749 ≡ 8022 (mod 19177).

Giving us the following

5476 = 8 × 26^2 + 2 × 26 + 16,
3190 = 4 × 26^2 + 18 × 26 + 18,
10308 = 15 × 26^2 + 6 × 26 + 12,
8022 = 11 × 26^2 + 22 × 26 + 14.

The four plain-texts are ICQ, ESS, PGM and LWO. Here the second one is the only one that fits with the results so far. This gives MARCHMADNESS, that is, MARCH MADNESS.