Genomics and Proteomics Based Security Protocols for Secure Network Architectures

Harry Cornel Shaw

B.S. in Computer Science, August 1993, University of Maryland University College

M.S. in Electrical Engineering, January 2006, The George Washington University

A Dissertation submitted to

The Faculty of The School of Engineering and Applied Science of The George Washington University in partial satisfaction of the requirements for the degree of Doctor of Philosophy

May 19, 2013

Dissertation directed by

Hermann J. Helgert
Professor of Engineering and Applied Science

The School of Engineering and Applied Science of The George Washington University certifies that Harry Cornel Shaw has passed the Final Examination for the degree of Doctor of Philosophy as of March 20, 2013. This is the final and approved form of the dissertation.

Genomics and Proteomics Based Security Protocols for Secure Network Architectures

Harry C. Shaw

Dissertation Research Committee:

Hermann J. Helgert, Professor of Engineering and Applied Science, Dissertation Director

Murray Loew, Professor of Engineering and Applied Science, Committee Member

John J. Hudiburg, NASA/Goddard Space Flight Center, Exploration and Space Communications Division, Mission Systems Engineer, Committee Member

Tian Lan, Assistant Professor of Engineering and Applied Science, Committee Member

Sayed Hussein, Professional Lecturer, Department of Electrical and Computer Engineering, The George Washington University, Committee Member

 Copyright  2013 by Harry C. Shaw All rights reserved

Dedication

I dedicate this research to my mother, Mary Alice; my wife, Debbie; my sister, Renee; and all my friends and colleagues who have supported me through this long ordeal. I dedicate it to the GSFC Space Network Project, without whose support this achievement would not have been possible. Ron Miller supported my research at the very outset. I was then lucky enough to end up working for Ted Sobchak, who made it possible for me to get to the finish line. I dedicate it to Michelle Hamilton and Paula Tidwell, without whose support I literally could not have continued in graduate school, and to the entire Space Network project staff, who helped me in so many ways that I will probably never know the full extent of their support. I dedicate this effort to Diane Rawlings, Taliha Brock, Patricia Gregory, Cathy Barclay, Pat Boldosser, Andre Fortin, Yen Wong, Ron Zaleski, Darryl Lakins, Anne Kosloski, Haleh Safavi (who commiserated with me on a regular basis), Tim Rykowski, Jeff Lubelczyk, Phil Liebrecht, Jeff Volosin, Mary Ann Esfandiari, John Hudiburg, Keiji Tasaki, and Roger Flaherty; at the White Sands Complex, to Don Shinners, Mike Bielucki, Markland Benson, Bert Ransom, Dan Hein, and Richard Von Wolfe; and to my trio of supervisors, Lakesha Bates, Miriam Wennersten, and Lavida Cooper, and the Electrical Engineering Division Chief, Janet Barth. I also dedicate this effort to my good friends Polly at JPL, Barbara, and Ernie for doing what good friends always do.

Acknowledgement

I would like to acknowledge the support of my advisor, Hermann Helgert, and the committee members Murray Loew, John Hudiburg, Tian Lan, and Sayed Hussein. Dr. Hussein spent many hours with me working on this activity, and I very much appreciate his efforts as well as the efforts of the entire committee. I would also like to thank the NASA Space Communications and Navigation Program Office for its support and encouragement of my academic efforts.

Abstract

Genomics and Proteomics Based Security Protocols for Secure Network Architectures

Network security is a vital component of the design of any network. Five main requirements must be addressed in developing a secure network: authentication, confidentiality, data integrity, non-repudiation, and access control. In vivo, biomolecular cellular systems authenticate themselves through various means, such as transcription factors and promoter sequences. These factors also enforce access control. Cellular systems have means of retaining the confidentiality of the meaning of genome sequences through processes such as the control of protein expression. They are capable of establishing data integrity and non-repudiation through transcriptional and translational controls.

A suite of genomics and proteomics based authentication and confidentiality protocols will be demonstrated that augments traditional network security approaches with concepts from molecular biology via the regulation of gene expression. These protocols are agnostic to their implementation and can be incorporated into any existing network security protocol (Secure HTTP, SSL, TLS, IPsec, etc.) or any future network security strategy. The protocols can be used to implement web-based security strategies, digital signatures, digital rights management, and general-purpose encryption for data in motion or data at rest.

These protocols will provide new challenges for network attackers by forcing them to work in both the information security domain and the molecular biology domain. Although no security strategy is without vulnerabilities, the intent of this work is to present a completely new set of problems for network attackers.

Table of Contents

Dedication
Acknowledgement
Abstract
Table of Contents
List of Figures
List of Tables
CHAPTER 1  INTRODUCTION
1.1 Security Elements in the Electronic and Biological Domains
1.2 Natural sources of security concepts and architectures
1.3 Problem Statement
1.3.1 Weak points with the current security approaches
1.3.1.1 Cryptanalysis techniques are very strong and improve with increases in computing capability
1.3.1.2 Protocols for performing authentication are vulnerable to social engineering
1.3.1.3 Certificate Authorities (CA) are vulnerable to identity impersonation
1.3.1.4 Useful lifetime of cryptographic codes is unpredictable
1.3.1.5 Network vulnerability due to lax security implementation
1.4 Proposed Solution via the research goals
1.5 Organization of the Dissertation
CHAPTER 2  PREVIOUS WORK BY OTHER RESEARCHERS
2.1 Genomic approaches that are not targeted for biological instantiation
2.1.1 DNA and the central dogma
2.1.2 DNA computing and Elliptic Curve Cryptography
2.1.3 Other DNA encryption systems in the literature
2.2 Genomic approaches that are targeted for biological instantiation
2.2.1 Cryptography on the basis of separation by gel electrophoresis
2.2.2 DNA Watermarks via coding in synonymous codons
2.3 Relationship between the currently published approaches and the current research
CHAPTER 3  KEY CONCEPTS FROM BIOLOGY UTILIZED IN THE DISSERTATION
3.1 Short summary of the organization of DNA
3.1.1 Eukaryotic DNA organization
3.1.2 Prokaryotic DNA organization
3.2 Gene Transcription and Translation
3.3 Patterns of gene expression
3.3.1 Selection of gene expression processes from prokaryotic or eukaryotic groups
3.4 Organization of genes in the eukaryotes
3.4.1 DNA Nomenclature in the dissertation
3.5 Transcription and the General Transcription Machinery of the Eukaryotic Nucleus
3.5.1 Additional regulatory sequences and their functions of interest
3.5.1 Processes of Transcription
3.6 Translation of Eukaryotic messenger RNA
3.6.1 Processes of Translation
3.7 Role of these processes in the dissertation
CHAPTER 4  DESCRIPTION OF THE RESEARCH PRODUCTS
4.1 Initial Research on a DNA-inspired authentication protocol for Mobile ad hoc Networks (MANET) from 2006-2008
4.2 Protocol Enhancements Resulting in a Generalized DNA-based HMAC for Authentication (2008-2010)
4.3 Genomics and Proteomics Protocols for Network Security (2010-2012)
4.4 Development of the initial protocol
4.4.1 Encryption Process
4.1.2 Mutation Effects and Fitness
4.2 Genomics based security protocol: DNA Keyed Hash Message Authentication Code
4.2.1 Elements of the genomics HMAC architecture
4.2.1.1 Lexicographic and DNA representation of plaintext
4.2.1.2 Sentence-message order coding
4.2.1.3 Message coding
4.2.1.1 Availability of Genomic data for use in encryption
4.2.2 Encryption Process
4.2.2.1 Mismatches and Annealing
4.2.3 Prototype DNA-based, keyed HMAC system
4.2.3.1 Genomic hash code properties
4.2.3.2 Initialize and Perform Lexicographic and DNA assignments
4.2.3.3 Binary representation of the DNA bases
4.2.3.4 Encryption, Mismatches and Annealing
4.2.3.5 Cryptographic Genome
4.2.3.6 Protocol for Message Authentication
4.2.3.6.1 User Software for evaluation keyed HMAC performance
4.2.3.7 Short Message Performance
4.2.3.8 Effects of coding long strings of zeros
4.2.3.9 Intronic sequence padding and potential frameshift mutations can increase cryptographic hardness
4.2.4 Relationship Between Cryptography and Gene Expression
4.2.4.1 Epigenetic relationships between cryptography and gene expression
4.3 Genomics and Proteomics Protocol Development
4.3.1 Adaptive Self-Correcting Floating Point Source Coding Methodology for a Genomic Encryption Protocol
4.3.1.1 Overview of the source coding methodology
4.3.1.2 High-level description of the transmitter source coding process
4.3.1.3 High-level description of the receiver source decoding process
4.3.1.4 Example of floating point source coding
4.3.1.4.1 Analysis of source coding data
4.3.1.5 Genetic Algorithm (GA) for Source code error correction
4.3.1.5.1 Simulation over BPSK channel
4.3.1.5.1.1 A short description of the MATLAB model
4.3.1.5.1.2 Error correction capability through genetic algorithm selection of most fit codewords
4.3.1.6 Error correction capability through redundancy (Majority-weighted coding)
4.4 A Cryptographic System of Authentication and Confidentiality Based Upon the Principles of the Regulation of Gene Expression
4.4.1 Introduction to genomic and proteomic cryptography protocol
4.4.2 Algorithm Description
4.4.2.1 Coding of sequences as objects
4.4.2.2 Sequence Column Vector Operations
4.4.2.3 Encryption Matrix
4.4.2.4 Use of codes with other types of sequences
4.4.2.5 Coding of DNA-Protein Complexes: The General Transcriptional Regulatory Complex as an example
4.4.2.5 Control of Transcription Factor Binding
4.4.2.6 Encrypting the Ciphergene-transcription factor as a protein-DNA complex
4.4.3 Transcription and Translation
4.4.3.1 Coding of RNA Polymerase II
4.4.3.2 Co-Activators - Mediator
4.4.3.3 General Transcription Factors and Transcriptional Activator Factors
4.4.3.4 Upstream Stimulatory Activity (USA)
4.4.3.5 Upstream and Inducible Transcription Factors
4.4.3.6 The transcription process - Initiation, Elongation, Termination
4.4.3.6.1 Initiation
4.4.3.6.2 Elongation
4.4.3.6.3 Termination
4.4.3.6.5 The Spliceosome and Alternative Splicing
4.4.3.6.6 Transcriptional Regulation via forms of transcriptional activation
4.4.3.6.7 microRNA mediated mRNA post-transcriptional regulation
4.4.3.6.8 The role of non-coding DNA and RNA interference in developing a genomic cryptographic protocol
4.4.3.7 Translation
4.4.3.7.2 tRNA function
4.4.3.7.1 Translational regulatory network at Initiation
4.4.3.7.2 Translational regulatory network at Elongation
4.4.3.7.3 Translational Regulation Network at Termination
4.4.3.7.4 Post-translational modification coding, protein-protein interaction
4.4.3.8 Network Concepts for A Cryptographic System Using the Principles Of Gene Regulation
4.4.3.8.1 Using Gene Regulatory Networks as a basis of authentication in a Mobile Ad hoc Network (MANET)
4.4.3.8.2 Proteomic Authentication Messages
4.4.3.8.3 Integration of genomic and proteomic protocols into legacy security networks
4.4.3.8.4 Network Firewalls via Patterns of Gene Expression
4.4.3.8.5 Secure versus non-secure ciphercolony implementations
CHAPTER 5  GENOMIC AND PROTEOMIC ENCRYPTION/DECRYPTION PROCESSES AND SIMULATION
5.1 Process Overview
5.1.1 General Information about the protocols
5.1.2 Source Data
5.2 Encryption Process Example
5.2.1 Example Input Message
5.2.2 -globin gene sequence information
5.2.3 Ciphergene coding process using the -globin gene as the message carrier
5.2.4 Encryption of base sequences
5.2.5 Nomenclature
5.2.6 Description of Message Traffic between Sender and Receiver
5.2 Level 2 - Coding the General Transcriptional Regulatory Complex
5.3 Level 3A, 3B and 3C - Transcription and Translation
5.3.1 Level 3A, Level 3B and Level 3C
5.3.1.1 Basal Transcriptional Complex of Level 3A
5.3.1.2 Transcription and Cipher mRNA of Level 3B
5.3.1.3 Translation and Level 3C processing
5.4 Comparison of the genomic and proteomic encryption algorithms with the Advanced Encryption Standard (AES) algorithms
5.4.1 Short summary of AES
5.4.1.1 Decryption
5.4.2 Short summary of the genomic and proteomic algorithm overhead
5.4.2.1 Vulnerabilities
5.4.8 Quality of Protection
5.4.8.2 Vulnerabilities of networks, computers, and mobile applications continue to grow
5.4.8.3 Two QoP metric concepts: Security LD50 and Security Rate Decay Constant
CHAPTER 6  CONCEPTS OF NETWORK OPERATIONS USING GENOMIC AND PROTEOMIC SECURITY
6.1 Network of Networks with genomic and proteomic security
6.2 National Smart Grid Application
6.3 Hierarchal Protocol Architecture Utilizing a Certificate Authority
6.3.1 Level 1 Encryption
6.3.2 Level 2 Encryption
6.3.3 Level 3A Encryption
6.3.4 Level 3B Encryption
6.3.5 Level 3C Encryption
6.3.5.1 Post-transcriptional and post translational modifications
6.3.5.2 Post-transcriptional modifications
6.3.5.3 Post-Translational Modifications
6.3.6 Receiver Processing
6.3.6.1 Receiver processing without ciphergene protein expression
6.3.6.2 Level 3B Decryption
6.3.6.3 Level 3A Decryption
6.3.6.4 Level 2 Decryption
6.3.6.5 Level 1 Decryption
6.3.6.6 Receiver Processing with gene expression
6.3.7 A gene expression authentication session
CHAPTER 7  CONCLUSIONS
REFERENCES
APPENDIX A - DIFFUSION AND CONFUSION METRICS FOR ENCRYPTED MESSAGES
A.1 Generating Fitness algorithm from Diffusion and Confusion scores
APPENDIX B - PLAINTEXT TO CIPHERTEXT WALKTHROUGH
APPENDIX C - SAMPLE MUTANT ENCRYPTIONS

List of Figures

Figure 1-1. CA process for authentication, use of symmetric and asymmetric keys for authentication and confidentiality
Figure 1-2. Capabilities of the genomics and proteomics approach
Figure 3-1. E. coli genome
Figure 3-2. DNA to RNA to Protein
Figure 3-3. Genomic and proteomic complexity
Figure 3-4. Nomenclature for gene transcription and translation using -globin as an example
Figure 3-5. -globin transcript with control regions annotated
Figure 3-6. DNA Nomenclature in the dissertation
Figure 3-7. Pre-initiation complex for transcription
Figure 3-8. hsp70 transcriptional control regions
Figure 4-1. Research Development and Products
Figure 4-2. MANET routed over secure nodes at t1 (___) and t2 (---)
Figure 4-3. DNA Dictionary Size versus word length
Figure 4-4. Mating of chromosome to message and subsequent selection
Figure 4-5. Anneal/mutation process
Figure 4-6. Temporal route trust
Figure 4-7. Sample plaintext, encrypted mutation, annealed mutation
Figure 4-8. Comparison of the definition of fitness with respect to DNA encryption algorithm
Figure 4-9. data from 2000-2010
Figure 4-11. Single strand chromosome encryption
Figure 4-12. Dual strand chromosome encryptions
Figure 4-13. MANET with trusted and untrusted nodes and routes
Figure 4-14. Plaintext Coding, Encryption, and Annealing Process
Figure 4-15. Sender and Receiver Protocol
Figure 4-16. Sample output of sender and receiver hash code of ‘jump out windows’
Figure 4-17. Sender Keyed HMAC software interface
Figure 4-18. Receiver keyed HMAC software interface using correct pre-shared secret (start location 125)
Figure 4-19. Receiver authentication failure due to start location at 125 instead of 124
Figure 4-20. Receiver authentication failure due to start location at 126 instead of 125
Figure 4-22. Collision resistance tests for short messages
Figure 4-23. MANET route establishment at a slice in time
Figure 4-24. Frameshift Mutations
Figure 4-25. Confusion factors in actual DNA genome
Figure 4-26. Conceptual example of Confidentiality and Authentication in E. coli using lacZ expression
Figure 4-27. Simplified comparison between gene transcription control regions and MAC protocol
Figure 4-28. Organization of words for the source coding protocol
Figure 4-29. Source Coding Program User Interface
Figure 4-30. Source Decoding Program User Interface
Figure 4-31. Baseband BPSK model for analyzing source coding error correction characteristics
Figure 4-32. BPSK Channel Theoretical BER through AWGN channel
Figure 4-33. Uncorrected Codeword Error Rate
Figure 4-34. Coding gain of genetic algorithm scheme
Figure 4-35. Comparison of performance of Uncorrected, Genetic Algorithm scheme and Majority voting Rate-1/3 scheme
Figure 4-36. Biological gene structure with substitution of DNA text message into the exons
Figure 4-37. Requirement for the Pre-Transcriptional Complex
Figure 4-38. Binding of BRE to TFIIA and TATA to TFIIA
Figure 4-39. Non-binding of BRE to TFIIH
Figure 4-41. Sample coding stages for the basal transcriptional complex
Figure 4-42. Start of Transcriptional Elongation
Figure 4-43. Transcriptional Elongation of mRNA transcript
Figure 4-44. Poly A signaling during Transcriptional Elongation
Figure 4-45. Capping 5’ and 3’ ends of mRNA transcript
Figure 4-46. Format of mRNA code after completion of transcription
Figure 4-47. Transcriptional activation at PORE and MORE DNA regulatory sites
Figure 4-48. The role of non-coding DNA
Figure 4-49. tRNA Structure
Figure 4-50. Translation Initiation Coding Sequence
Figure 4-51. Translational Elongation Coding
Figure 4-52. Translational Termination Coding
Figure 4-53. MANET authentication via the general transcription machinery specified in a gene transcriptional regulatory network
Figure 4-54. Protein Coded Authentication Challenge
Figure 4-55. Alice and Bob communicate using genomic network security
Figure 4-56. Network BioID architecture
Figure 4-57. Pattern of Expression at t=12
Figure 4-58. Pattern of Expression at t=22
Figure 4-59. Pattern of Expression at t=71
Figure 4-60. Pattern of Expression at t=113
Figure 4-61. Exchange of ciphercolony state information about Proteins A and B between User 1 and User 2
Figure 4-62. Handshaking protocol between User 1 and User 2
Figure 4-63. Non secure ciphercolony implementation
Figure 4-64. Secure Ciphercolony implementation
Figure 5-1. Genomic and Proteomic Flowchart for Encryption and Decryption through all levels of the protocol
Figure 5-2. Data Types for the Genomic and Proteomic Protocols
Figure 5-3. Progression of the structure of the ciphertext
Figure 5-4. The Transformation of coding sequences into types
Figure 5-5. Operations performed on coded gene sequences
Figure 5-6. Coding and Decoding of mRNA sequences
Figure 5-7. Coding and Decoding of Protein sequences
Figure 5-8. Alice and Bob communicating with established -globin keys
Figure 5-9. Pre-transcriptional complex regulatory Network
Figure 5-10. General Transcriptional Machinery for -globin message
Figure 5-11. Basal Transcriptional Complex Network
Figure 5-12. cipher-mRNA Complex Network and Coding
Figure 5-13. Cipherprotein Translation Network
Figure 5-14. Volume of DNA text to Plaintext at various plaintext message lengths
Figure 5-15. Ratio of DNA text to Plaintext at various plaintext message lengths
Figure 5-16. Protocol overhead summary
Figure 5-17. Lightweight authentication challenge and response scenario
Figure 5-18. AES Flowchart for Encryption and Decryption
Figure 5-19. AES-128 data and key structures
Figure 5-20. Level by level detail of encryption overhead for 128 bit block
Figure 5-21. Small, Medium and Heavy encryption overhead for plaintext character lengths from 500 to 2500 characters
Figure 5-22. Applications containing vulnerabilities targeted by web exploits in 2012
Figure 5-23. QoP as expressed as a decay constant
Figure 6-1. Network Concept of Operations using regulation of gene expression
Figure 6-2. Network deployment strategy
Figure 6-3. Network BioID deployment at the individual user level
Figure 6-4. Smart Grid Information Network
Figure 6-5. Prototype Network BioID infused security architecture for the smart grid
Figure 6-6. Level 1 Sender Encryption
Figure 6-7. Level 2 Sender Encryption
Figure 6-8. Level 3A Sender Encryption
Figure 6-9. Level 3B Sender Encryption
Figure 6-10. Level 3C Encryption
Figure 6-11. Sender Encryption Post Level 3A Post-transcriptional modifications
Figure 6-12. Receiver Encryption Post Level 3C Post-transcriptional modifications
Figure 6-13. Sender Post Level 3B Post-translational modifications
Figure 6-14. Receiver Post Level 3B Post-translational modifications
Figure 6-15. Receiver Level 3C Decryption
Figure 6-16. Receiver Level 3B Decryption
Figure 6-17. Receiver Level 3A Decryption Process
Figure 6-18. Receiver Level 2 Decryption
Figure 6-19. Receiver Level 1 Decryption
Figure 6-20. Receiver processing with fluorescent protein detection
Figure 6-21. Step 1 - User Access Request
Figure 6-22. Step 2 - Proteomic message challenge/response authentication
Figure 6-23. Step 3 - User Response
Figure 6-24. Step 4 - User Access Granted
Figure A-3. Confusion and Diffusion metrics

List of Tables

Table 1-1. Security Requirements and Types of Available Security Mechanisms
Table 3-1. O. Indica Genome summary
Table 4-1. DNA base coding
Table 4-2. Sample DNA dictionary entries
Table 4-3. Encryption Process
Table 4-4. Genomic Hash Code Properties
Table 4-5. Sample of Alpha To DNA Conversion Codes
Table 4-6. Plaintext to Lexicographic Order and DNA Letter Codes
Table 4-7. Encryption and Annealing Table
Table 4-8. Sample of hash code collision metrics
Table 4-9. Sample hash code strings of 217 consecutive zeros
Table 4-10. Tests of collisions on strings of consecutive zeros
Table 4-11. Sample Diffusion and Confusion Scores for Hash Code for Message ‘Jump Out Windows’
Table 4-12. Modified Shannon-Fano-Elias Coding
Table 4-13. DNA Base Source Coding
Table 4-14. Comparison of 6-character block coding and 4-character block coding recovery
Table 4-15. Floating Point coding of sample text at 8 characters/block
Table 4-16. Floating Point coding of sample text at 6 characters/block
Table 4-17. Floating Point coding of sample text at 4 characters/block
Table 4-18. Recovered plaintext for 4, 6, 8, 12, and 24 character blocks
Table 4-19. Source Codeword Set
Table 4-20. Random seeds for AWGN channel
Table 4-21. Codeword Error count
Table 4-22. Eb/N0 = 10dB performance
Table 4-23. Eb/N0 = 8dB performance
Table 4-24. Examination of binary codewords received in error
Table 4-25. Candidate Codeword Error Corrections for Eb/N0 = 10dB
Table 4-26. Candidate Codeword Error Corrections for Eb/N0 = 8dB
Table 4-27. Codeword Correction Improvement
Table 4-28. Codeword Error count, Triple redundancy case
Table 4-29. Comparison of uncoded, genetic algorithm and Rate 1/3 source coding
Table 4-30. Cryptographic Protocol
Table 4-31. Sample of event joint probabilities for figure 4-39
Table 4-32. Joint distribution of gene regulatory and transcription factor codes
Table 4-33. Symbols of type
Table 4-34. Codeword Entropy and Random hit probability
Table 4-35. Probability of successive codeword hits by random selection
Table 4-36. Alternative tuples for type
Table 5-1. Plaintext Twitter Length messages
Table 5-2. Floating point Source Encoded Message
Table 5-2. -globin transcript information
Table 5-3. Coding and Non-coding base position summary
Table 5-4. -globin coding elements
Table 5-5. Type Identifier for -globin
Table 5-6. Compressed Ciphergene Sequence matrix, F
Table 5-7. Ciphergene expression profile matrix, G
Table 5-8. Ciphergene coding matrix, C
Table 5-9. Encryption key matrix E1
Table 5-10. Encryption key matrix E2
Table 5-11. Encrypted Output, Lout
Table 5-13. Coding the Addend from the Message and the Gene sequence
Table 5-14. Decoding the Message from the Addend and Gene sequence
Table 5-15. Coding and Decoding DNA with Addend Codes
Table 5-16. Combinations of codes upstream of transcription initiation site
Table 5-17. Gene Regulatory Codes from  and Transcription Factor Codes from
Table 5-18. Joint DNA-Protein and Protein-Protein Events required for Pre-Transcriptional Complex of the -globin messages
Table 5-19. BRE  TFIIB
Table 5-20. TATA  TFIIB
Table 5-21. TATA  TFIID
Table 5-22. INR  TFIIE
Table 5-23. INR  TFIID
Table 5-24. MTE  TFIIH
Table 5-25. DPE  TFIID
Table 5-26. TFIID  TFIIA
Table 5-27. TFIID  TFIIB
Table 5-28. TFIIE  TFIIH
Table 5-29. TFIIB  TFIIF
Table 5-30. TFIIF  TFIIH
Table 5-31. Prefix free S in type
Table 5-32. Representation of BRE  TFIIB
Table 5-33. Compressed codes for all DNA-protein and protein-protein intersections at Level 2 (Transcriptional Regulation)
Table 5-34. Level 3A codes
Table 5-35. Coding the mRNA with the Message and RNA Addend Codes
Table 5-36. Decoding the Message with mRNA and RNA Addend Codes
Table 5-37. Coding and Decoding mRNA with Addend Codes
Table 5-38. mRNA base position summary
Table 5-39. Joint DNA-Protein and Protein-Protein Events required for cipher-mRNA
Table 5-40. Codon substitution table
Table 5-41. mRNA to Protein
Table 5-42. Translation Joint Probabilities
Table 5-43. Overhead for sample message of 140 characters
Table 5-44. Message overhead at level 1 of the protocol for various length messages
Table 5-45. Adjusted sample message overhead characteristics
Table 5-46. AES Parameters
Table 5-47. Small, Medium and Heavy Implementations
Table 5-48. Overhead for encryption of 128 bit plaintext block
Table 5-49. Tabular values of figure 5-23
Table A-2. Diffusion metrics for 3 generations of encrypted sense mutants
Table A-2. Confusion metrics for 3 generations of encrypted sense mutants
Table A-3. Fitness scoring from three generations of encrypted output fitness scores
Table C-1. Mutant Encryptions

CHAPTER 1  INTRODUCTION

1.1 Security Elements in the Electronic and Biological Domains

Encryption and authentication algorithms and protocols exist in a single domain in which concepts and algorithms are vetted on a wide scale and are then adopted by system and equipment providers. Malfeasors therefore know where to concentrate their efforts when developing countermeasures to infiltrate, disrupt, and attack networks and users. The use of open source tools and the wide distribution of information over the internet amplify their attack capabilities. The high incidence of security problems in Android-based smartphone applications is an example of the kind of vulnerability that is easily exploited on open source platforms [1]. The goal of network providers is to maximize the time window between the adoption of a security concept or protocol and the emergence of a successful attack strategy that renders the concept obsolete. Network providers are forced into recurring development of counter-countermeasures to fight both successful and anticipated attack strategies. Given that all network security concepts have vulnerabilities that will lead to their eventual downfall, why not design networks around security concepts that give malfeasors a completely new set of problems to solve?

Malfeasors, like developers, do not have infinite resources. New proprietary networks, designed using some of the tools of the existing network and communications toolbox, can increase the time window between adoption and collapse. The trick is to make sensible trade-offs between the use of existing tools and concepts and the design of new tools and concepts such that the hybrid network is achievable in a practical sense.

This dissertation describes implementations of authentication and confidentiality that can augment existing security protocols but can also operate with or without a centralized third-party system and with or without standard encryption and authentication approaches. Principles from molecular biology, specifically the transcriptional and post-transcriptional regulation of gene expression, will be utilized. The entire set of prokaryotic and eukaryotic genomes provides a rich landscape for novel encryption keys and novel forms of bio-certificate authorities. This landscape can be enhanced with synthetic genomes.

1.2 Natural sources of security concepts and architectures.

Nature provides a number of sources for novel security concepts and architectures.

One source is quantum cryptography. Quantum cryptography offers the possibility of data integrity via the Heisenberg uncertainty principle (any attempt to monitor the state of a photon alters its state). It also offers the possibility of encryption key distribution via quantum entanglement [2]. Another source is molecular biology. Molecular biology provides a rich source of gene expression and regulation mechanisms, which can be adapted for use in the information and electronic communication domains. For the five major challenges of network security (authentication, confidentiality, data integrity, non-repudiation, and access control), molecular biology provides clues for addressing each of these challenges, as shown in table 1-1. The focus of this dissertation is adaptation of concepts from molecular biology for secure network architectures.


Table 1-1. Security Requirements and Types of Available Security Mechanisms

| Security Requirement | Physical Domain Mechanisms (people, objects, locations, etc.) | Electronic Domain Mechanisms (Software and Networks) | Biomimetic Enhancement |
|---|---|---|---|
| Authentication | Face-to-face identification, fingerprints, retinal scan, physical signatures, haptic responses | Certificate Authorities, Digital Signatures, Network Security Protocols, Signcryption [3], Hash Codes | Genomic based authentication codes, in vivo vs. in vitro DNA recognition, genomic tokens |
| Confidentiality | Limited physical access, security classifications, shielded facilities | Symmetric key encryption, Asymmetric key encryption, Secure electronic transactions, Digital Envelopes | Genomic and proteomic based encryption protocols, in vivo vs. in vitro DNA recognition |
| Access Control | Restricted physical access to computers, routers, switches, etc. | Access Control Lists, Administrator privilege control, Network Monitoring, Security Logging | Biomarker sample based access control, genomic smart cards |
| Data Integrity | | Hash Codes | Embedded genomic based authentication hash codes |
| Non-repudiation | | Digital Signatures, Hash Codes, Digital Envelopes, Network Monitoring, Security Logging | Genomic based digital signatures, digital envelopes |

1.3 Problem Statement

The existing authentication and confidentiality protocols and processes are becoming more vulnerable to attacks and networks are becoming less secure. Simultaneously, the power of cryptanalysis against existing encryption methodologies is growing rapidly and will eventually render all of the existing algebraic approaches obsolete. However, the current infrastructure investment in these methodologies is too large to abandon. No alternate authentication infrastructure exists which can adequately replace the current methods. Given these constraints, this dissertation will address these problems by looking at specific, quantifiable weak points with the current security approaches.

1.3.1 Weak points with the current security approaches

• Cryptanalysis techniques are very strong and improve with increases in computing capability.

• Protocols for performing authentication are vulnerable to social engineering.

• Certificate Authorities (CA) are vulnerable to identity impersonation in the fixed infrastructure environment.

• Useful lifetime of cryptographic codes is unpredictable.

• The existence of network vulnerabilities due to lax implementation of existing security protocols.

1.3.1.1 Cryptanalysis techniques are very strong and improve with increases in computing capability

Currently used encryption algorithms are well studied and have acquired numerous attack strategies. All algebraic algorithmic approaches will eventually fall to a successful attack. Security protocols that rely heavily upon algorithms such as RSA have been attacked using adaptive chosen-ciphertext attacks. These attacks involve simple power analysis and differential power analysis of smartcard implementations. Smartcards are also vulnerable to reverse engineering using chip-level diagnostic testing. The particular smartcard in question leaked side-channel information through its implementation of the Chinese Remainder Algorithm [4]. Protocols using the modular exponentiation approach can be attacked via a number of methods:

i) Timing attacks using the Chinese Remainder Algorithm and Montgomery’s Algorithm. This timing attack works by enabling factorization of the RSA modulus n. It works if the exponentiation is carried out by the Chinese Remainder Algorithm and the multiplication of the prime factors is performed by Montgomery’s Algorithm [5].

ii) Analysis of short RSA exponents. This attack uses a continued fractions algorithm to estimate the private key exponent, d, from the public key exponent, e, and the modulus, p*q. It relies on the fact that when e < p*q and GCD(p-1, q-1) is small, d can be estimated [6].

iii) Lattice basis reduction (LLL) algorithms. This type of attack can use a forged signature to recover RSA keys [7].

iv) General timing attacks on modular exponential algorithms. These attacks involve timing characterization of cryptographic functions such as RSA and others to correlate key computation cycles and timing to actual key values [8].

v) Forging public signatures with public keys only. This is an attack on signature schemes using forward security cryptographic protocols. These schemes involve a key evolution strategy in which the public key remains constant over a longer period of time than a series of updated private keys, which are distributed to users in a secure manner. The forward security scheme should also be a blind protocol such that message contents are not revealed to the signer. The authors provided a fast, seven-step solution for forging a signature onto a valid message created under this security protocol [9].
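The continued-fraction attack in item ii) can be illustrated with a minimal Python sketch. It is not taken from [6]: the function names and the toy key (n = 90581 = 379 × 239, e = 17993) are assumptions chosen so that the private exponent is small enough for the attack to apply. Each convergent k/d of e/n is treated as a guess for the private exponent d, and a guess is accepted only if it yields integer factors of n.

```python
from math import isqrt


def convergents(num, den):
    """Yield the continued-fraction convergents (p_i, q_i) of num/den."""
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    while den:
        a, rem = divmod(num, den)
        num, den = den, rem
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield p, q


def wiener_attack(e, n):
    """Return (d, p, q) if a short private exponent is found, else None."""
    for k, d in convergents(e, n):
        # A valid guess must satisfy e*d - 1 = k * phi(n) for an integer phi.
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k
        s = n - phi + 1              # candidate value of p + q
        disc = s * s - 4 * n         # (p - q)^2 if the guess is right
        if disc < 0:
            continue
        root = isqrt(disc)
        if root * root == disc and (s + root) % 2 == 0:
            p, q = (s + root) // 2, (s - root) // 2
            if p * q == n:
                return d, p, q
    return None
```

For the toy key, `wiener_attack(17993, 90581)` recovers d = 5 together with both prime factors. With a private exponent of realistic size, no convergent passes the checks and the function returns `None`.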

1.3.1.2 Protocols for performing authentication are vulnerable to social engineering

Malfeasors gain information about network access through physical intrusions and eavesdropping. Propagation of large numbers of passwords, passphrases, and other authentication trivia leads individuals to keep easily found written records, thus confounding the electronic security protocols. Individuals share passwords.

Impersonation leads to activities such as the forging of multiple identities, the ‘Sybil attack’ [10], [11]. These activities are carried out to perform tasks such as submitting multiple electronic votes and raising one’s rank in search engines. Trusted certification is the main countermeasure for the Sybil attack.

1.3.1.3 Certificate Authorities (CA) are vulnerable to identity impersonation

Current forms of commercial network security rely upon the system of certificate authorities (CA) that issue authentication certificates. The CA is like a virtual notary public. It is a party that provides both the holder of a certificate and the acceptor of a certificate a certain level of certainty about the identity of the holder. The processing of certificates can involve use of symmetric key cryptography such that sender and receiver use a single key for CA processing; it can involve asymmetric key cryptography such that sender and receiver use a public/private key pair combination; or it can involve both symmetric and asymmetric key processing. All commercial processes involve a series of transactions utilizing a key infrastructure such that all parties can perform an authentication operation that certifies the identities of all participants. A variety of systems exists for the generation of symmetric and asymmetric keys and protocols for performing the authentication process. Figure 1-1 shows a generic overview of one possible CA process for authentication. Numerous variations on the figure 1-1 theme exist. They generally have the following things in common:

• The possessor of a given private key is assumed to be the legitimate owner of the key.

• The use of an authentication algorithm that computes a cryptographic checksum (hash code) using the contents of a message between a sender and recipient. The authentication algorithm is assumed to produce a unique hash code for each message.

• The CA possesses a process to monitor users such that network privileges can be revoked if the authentication process fails.
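The second property above, a keyed cryptographic checksum computed over the message contents, can be sketched with Python's standard `hmac` and `hashlib` modules. This is an illustration only: the choice of HMAC with SHA-256 and the function names are assumptions, not the mechanism of any particular CA process in figure 1-1. The "unique hash code for each message" property is itself an assumption about the hash function, namely that collisions are infeasible to find.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute a keyed hash code (HMAC-SHA-256) over the message contents."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

A recipient holding the same key accepts a message only if `verify` returns `True`. Any change to the message or the key changes the tag, which is the event that would trigger the revocation process in the third property.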


For the mobile ad hoc user, no third party CA infrastructure may be available. Many schemes have been proposed to provide a CA function for a mobile ad hoc network (MANET). These may rely on public key encryption systems in addition to various mechanisms for enforcing a trust policy. These policies may include assigning the CA to one or more trusted nodes in a MANET [12].

Figure 1-1. CA process for authentication, use of symmetric and asymmetric keys for authentication and confidentiality.

1.3.1.4 Useful lifetime of cryptographic codes is unpredictable

• MD5 was published in 1992 and was known to be cracked by 2005 (How to Break MD5 and other hash functions) [13].

• DES was developed in 1974 and was broken as early as 1997. Multiple approaches have been demonstrated, including some that are biologically inspired (Breaking DES using P-systems) [14]. DES was eventually abandoned for triple-DES, which was also broken and abandoned.

• AES is proving vulnerable to a number of side channel attacks. One particular attack uses differential fault attacks and was capable of retrieving a 128-bit key with six faulty ciphertexts [15]. AES-256 attack strategies are becoming more successful [16]. Variants of AES with a 512-bit key have been proposed, but it is a matter of time before all the AES variants and enhancements are broken.

• Networks cannot rely on published reports of ciphers being broken to ascertain if their current ciphers have been compromised.

1.3.1.5 Network vulnerability due to lax security implementation

A large degree of the vulnerability in networks is due to lax security implementation of existing protocols. A hacker using Wireshark™ or other sniffers can recover plaintext message authentication codes. Other examples include Hot Standby Router Protocol (HSRP) attacks that exploit weak authentication [17], and attacks on unsecured network devices via protocol attacks such as spoofing an HSRP router into changing the active router or via SNMP reconnaissance [18]. Attackers probe networks and retrieve legitimate access control information. Misconfigured firewalls or XML-based firewall attacks render firewalls ineffective or allow them to be used against the network they are trying to protect [19]. The purpose of the biological, gene regulation based approach to security is to supplement existing legacy protocols.

Adding the biological protocols to networks that do not have good security practices is like locking the front door but leaving the ground floor windows open: it will not increase security or decrease vulnerability. Adding the biological security protocols in networks with a good security posture will enhance security by significantly raising the level of attack complexity, reducing the vulnerability of conventionally encrypted traffic, and reducing the opportunities for security breaches through social engineering. It will also provide a secure backstop when network operations are required to continue in an environment of degraded security. The protocols can be turned on and off as desired.

1.4 Proposed Solution via the research goals

The power of this technology rests upon functional genomics and the processes of gene regulation: the probabilities of transcription and translation, the quantities of the resulting products, and the functionality of the pieces of cellular machinery that make gene expression possible, such as transcription factors, DNA promoter sequences, and protein-nucleic acid interactions. These protocols are unlike the existing legacy approaches and do not have the legacy vulnerabilities previously described. Only through detailed knowledge of the underlying concepts in molecular biology combined with new cryptanalysis paradigms can these protocols be successfully attacked.

What can be accomplished through this security approach? Referring to figure 1-2, a new set of security approaches becomes available. In the Alice and Bob diagram, various forms of security can be achieved while still using legacy IPSec, as an example. Alice and Bob can perform standard IPSec exchanges using gene expression data as keys. Alice can send Bob a protein coded message, which directs Bob’s ciphercolony to express a protein, make an image, and send the result back to Alice for authentication. Bob can send Alice a message encrypted in a protein code and Alice can return the plaintext derived from the protein code based on a pre-shared genomic secret. Alice can send Bob an RNA message and Bob can return the unique protein message based on a pre-shared genomic secret. Bob can send Alice a message with the patterns of gene expression of 14 microbials in his ciphercolony and Alice can take that data, apply it to her ciphercolony, and reply with new patterns of expression of 25 genes in her ciphercolony. Then Bob can take that data, apply it to his ciphercolony, and return the new pattern of expression, and so forth. Eve can only impersonate Bob or Alice by knowing a very long list of state information. This is just a sample of the features this protocol provides.

This protocol is forward looking toward the security of networks as their bandwidth and data transmission capacity increase through use of optical, gigabit Ethernet, and other high-speed communications links.

Figure 1-2. Capabilities of the genomics and proteomics approach

The proposed solution will be achieved through accomplishment of the following research goals:

1. Implement the process of the regulation of gene expression into a language that permits development of cryptographic protocols for the purposes of information and network security.

2. Introduce novel, but cryptographically hard information security protocols.

3. Provide a mechanism to convert electronic messages to and from the language and products of gene expression in a cryptographically hard manner.

4. Provide a path to implementation that is consistent with the existing legacy security protocols and architecture.

1.5 Organization of the Dissertation

Chapter one introduces the subject matter. Chapter two summarizes the previous work by other researchers. Chapter three provides a brief overview of the key concepts in molecular biology used in the dissertation and explains the nomenclature that is being used in subsequent chapters. Chapter four describes the progression of the research from its initial developments through the current state of the research, provides an explanation of how protocols function, and presents simulation test results of various aspects of the protocols. Chapter five is a detailed example of coding a short message through the various levels of the protocols and the overhead (in terms of additional bits to be transmitted and received) associated with each process. Chapter six provides a context for implementing the protocols through description of various network concepts of operations. Chapter seven summarizes the conclusions.

CHAPTER 2  PREVIOUS WORK BY OTHER RESEARCHERS

There are numerous papers on DNA encryption techniques. The origin of the modern concepts on DNA encryption comes from work by Cleland in 1999 on hiding messages in microdots, Gehani in 1999 on DNA based cryptography, and Bourbakis in 1997 on image data compression. They proposed specific methodologies for performing DNA based encryption functions in applications of wide interest. DNA encryption systems are one of the paths taken in the field of molecular computing. A large variety of methods has been published to utilize DNA transcription and translation in cryptographic systems. DNA encryption techniques can, in general, be divided into two categories. One category contains the techniques that utilize the terminology and alphabet of genomics, but do not implement a path toward a biological instantiation. The other category includes techniques that utilize both the terminology and a path to a biological instantiation.

2.1 Genomic approaches that are not targeted for biological instantiation

2.1.1 DNA Cryptography and the central dogma

DNA cryptography schemes using the central dogma of biology have been published that take plaintext through a process of DNA→RNA→amino acid coding. Researchers in 2010 published a DNA based Cryptography for Secure Mobile Networks scheme [20] in which binary plaintext is converted to a DNA text via a substitution code and introns are inserted into the DNA text. A key is passed to the receiver over a secure channel to provide the details of the intron insertion. The new DNA text is transcribed into an mRNA code utilizing only the exons, and the exons are translated into an amino acid protein code. This requires a second, secure transfer of translation data so that the receiver can decode the protein code back to the mRNA, the mRNA back to the DNA sequence, and the DNA sequence, stripped of the introns, back to the original binary plaintext. The protein code can be transmitted over an open channel. There are many variations on this theme in the literature.
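The first two stages of such a scheme, the substitution code and the keyed intron insertion, can be sketched in Python as follows. The 2-bit mapping (00→A, 01→C, 10→G, 11→T), the representation of the key as a list of (offset, intron) pairs, and all function names are assumptions made for illustration; they are not the encoding used in [20].

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_dna(plaintext: str) -> str:
    """Substitution code: every 2 bits of the UTF-8 plaintext become one base."""
    bits = "".join(f"{byte:08b}" for byte in plaintext.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna: str) -> str:
    """Inverse substitution: bases back to bits, bits back to text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def insert_introns(exons: str, key) -> str:
    """Insert each intron at its offset, measured in the intron-free text."""
    out, last = [], 0
    for pos, intron in sorted(key):
        out.extend([exons[last:pos], intron])
        last = pos
    out.append(exons[last:])
    return "".join(out)


def splice(dna: str, key) -> str:
    """Remove the introns again; only a holder of the key knows where they are."""
    out, cursor, shift = [], 0, 0
    for pos, intron in sorted(key):
        start = pos + shift          # offset in the intron-containing text
        out.append(dna[cursor:start])
        cursor = start + len(intron)
        shift += len(intron)
    out.append(dna[cursor:])
    return "".join(out)
```

With the key `[(2, "GTAAG"), (6, "GTTTAG")]`, the plaintext "Hi" becomes `CAGACGGC` and then `CAGTAAGGACGGTTTAGGC`; splicing with the same key and reversing the substitution recovers "Hi". The later transcription and translation stages are omitted here.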

2.1.2 DNA computing and Elliptic Curve Cryptography

A combination of DNA computing and Elliptic Curve Cryptography (ECC) has been described [21] for a powerful form of DNA encryption. It permits encrypted traffic over communication links, which may not be secure. Sender and receiver agree on an auxiliary base parameter as a pre-shared secret for the ECC process. A substitution code for the plaintext is performed to convert it to DNA text, which is converted to integers, followed by conversion to ECC curve points using Koblitz’s algorithm. The curve points are encrypted with the ECC algorithm.
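The step from DNA text to integers to curve points can be sketched with a Koblitz-style embedding. Everything concrete below is an assumption made for illustration and is not taken from [21]: the base-4 mapping of A, C, G, T, the prime P = 2^127 - 1, the curve coefficients, and the auxiliary parameter K. The sketch covers only the encoding of a message as a point, not the subsequent ECC encryption, and the curve is not claimed to be cryptographically suitable.

```python
P = 2**127 - 1      # assumed prime modulus; P % 4 == 3 allows a simple square root
A, B = -1, 188      # assumed coefficients of y^2 = x^3 + A*x + B (mod P)
K = 64              # assumed auxiliary base parameter shared by both parties


def dna_to_int(dna: str) -> int:
    """Read a DNA text as a base-4 number (A=0, C=1, G=2, T=3)."""
    value = 0
    for base in dna:
        value = value * 4 + "ACGT".index(base)
    return value


def rhs(x: int) -> int:
    """Right-hand side of the curve equation."""
    return (x**3 + A * x + B) % P


def koblitz_encode(m: int):
    """Try x = m*K + j for j < K until the curve equation has a solution y."""
    for j in range(K):
        x = m * K + j
        if x >= P:
            break
        r = rhs(x)
        y = pow(r, (P + 1) // 4, P)   # square-root candidate, valid for P % 4 == 3
        if (y * y) % P == r:
            return x, y
    raise ValueError("no curve point found for this message")


def koblitz_decode(point) -> int:
    """Recover the message integer from the x-coordinate."""
    return point[0] // K
```

Each trial value of x lands on the curve with probability of roughly one half, so K bounds the chance of an encoding failure. The receiver needs only K to undo the embedding once the point has been decrypted.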

2.1.3 Other DNA encryption systems in the literature

Systems using DNA as a one-time code pad in a steganographic approach have been described [22]. Gehani et al. proposed the use of DNA codes, assembled from short oligonucleotide sequences, as one-time pads. They further assume that the one-time pads can be kept as a pre-shared secret. The approach relies on encoding the plaintext through a DNA substitution code or a bit-wise XOR function between the plaintext and the DNA sequence. They also propose that the language for creating the DNA ciphertext be disjoint from the plaintext. Gehani also proposes an approach with biological instantiation. The approach is compatible with using DNA one-time pads and custom DNA chips with complementary sequences to an encrypted sequence such that an encrypted image could be decrypted and revealed fluorescently.
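The bit-wise XOR variant can be sketched as follows, treating each base as a 2-bit symbol. The base-to-bits mapping (A=0, C=1, G=2, T=3) and the function names are assumptions for illustration and are not the construction in [22]. Because XOR is its own inverse, the same function encrypts and decrypts.

```python
import secrets

BASES = "ACGT"  # index of each base is its assumed 2-bit value


def random_pad(length: int) -> str:
    """Generate a random DNA one-time pad of the given length."""
    return "".join(secrets.choice(BASES) for _ in range(length))


def dna_xor(message: str, pad: str) -> str:
    """XOR a DNA-coded message with a DNA pad, base by base."""
    if len(pad) < len(message):
        raise ValueError("a one-time pad must be at least as long as the message")
    return "".join(
        BASES[BASES.index(m) ^ BASES.index(p)] for m, p in zip(message, pad)
    )
```

For example, XORing the message `CAGACGGC` with the pad `TTGACCAG` gives `GTAAATGT`, and XORing that ciphertext with the same pad returns the message. As with any one-time pad, the pad must be pre-shared, kept secret, and never reused.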

A symmetric key block cipher approach using DNA transcription and translation has been demonstrated by Sadeg [23]. This work uses nomenclature of transcription and translation. The encryption algorithm generates ciphertext blocks 128 bits in length from a plaintext block of 128 bits and a key of 128 or 256 bits. There exist sub-key generators that provide nr+2 keys, with nr equaling the number of iteration rounds.

An image compression – encryption system using a DNA-based alphabet [24] was demonstrated including a genetic algorithm based compression scheme. This work is based upon the principles of fractal-based languages such as SCAN. This approach encodes data into large fields of (n x n)! pixel-like data and selects one of the (n x n) permutations as the ciphertext.


2.2 Genomic approaches that are targeted for biological instantiation

2.2.1 Cryptography on the basis of separation by gel electrophoresis

In the category of techniques that use a biological instantiation, researchers in 2000 proposed an optical technique in which a message was coded into DNA and subjected to gel electrophoresis to separate the DNA sequence into bands. The DNA message would also be mixed with nonsense DNA to create a different pattern of bands and the two could be subtracted from each other at the receiver to resolve the message [25].

2.2.2 DNA Watermarks via coding in synonymous codons

By using the natural redundancy of the amino acid codon system, messages can be coded into biologically functional genomic sequences without disrupting the ability of the coded gene to be expressed. This permits encrypted data to be inserted as watermarks in a genome of choice. Researchers in 2008 [26], [27] created a system of DNA watermarks in which Genetically Modified Organisms (GMOs) could be tagged with a DNA message without disrupting the process of gene expression. It does this primarily by encoding the message into the third base of triplets that have synonymous codons. The natural redundancy of the codon system is such that the third base can sometimes be altered without changing the codon’s meaning to another amino acid. This was elegantly demonstrated on the Vam7 gene in a mutant Saccharomyces cerevisiae strain CG783. They showed that the watermark mutation did not influence subsequent mRNA translation into protein.
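The idea of hiding data in the third base of synonymous codons can be sketched as follows. This is a simplified illustration and not the algorithm of [26], [27]: it uses only the eight codon families of the standard genetic code in which all four third bases encode the same amino acid (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN), and it stores two bits per such codon. The bit-to-base mapping and the function names are assumptions.

```python
# First two bases of codon families whose third base never changes the amino acid.
FOURFOLD = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def embed(cds: str, bits: str) -> str:
    """Write an even-length bit string into the third base of fourfold codons."""
    out, queue = [], bits
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and len(queue) >= 2:
            codon = codon[:2] + BASES[int(queue[:2], 2)]
            queue = queue[2:]
        out.append(codon)
    if queue:
        raise ValueError("not enough synonymous sites for the watermark")
    return "".join(out)


def extract(cds: str, n_bits: int) -> str:
    """Read n_bits back from the third base of the fourfold codons."""
    bits = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and 2 * len(bits) < n_bits:
            bits.append(f"{BASES.index(codon[2]):02b}")
    return "".join(bits)
```

Embedding `01101100` into the illustrative coding sequence `ATGGCTCTGACCGGATAA` (Met-Ala-Leu-Thr-Gly-stop) gives `ATGGCCCTGACTGGATAA`, which encodes the same amino acids. Codons outside the eight families, including the start and stop codons, are left untouched.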

2.3 Relationship between the currently published approaches and the current research.

No published approach utilizes the suite of protein-nucleic acid and nucleic acid-nucleic acid behaviors found in regulation of gene expression as cryptographic code or a security approach. No published approach utilizes a combination of live and algorithmic inhabitants of a colony designed to produce codeable behavior expressed as a function of random variables associated with patterns of gene expression.

CHAPTER 3  KEY CONCEPTS FROM BIOLOGY UTILIZED IN THE DISSERTATION

3.1 Short summary of the organization of DNA

Genes are composed of deoxyribonucleic acid (DNA). DNA is organized into a hierarchy of structures that permit it to be packed within the confines of cells. DNA is organized differently between prokaryotes (simple organisms with a single chromosome not packed within a nuclear structure) and eukaryotes (complex organisms with multiple chromosomes packed within a nuclear structure).

3.1.1 Eukaryotic DNA organization

In general, eukaryotic chromosomes exist as one DNA molecule per chromosome. Additionally, each chromosome has tightly packed regions called heterochromatin, in which gene expression is low or non-existent (constitutive heterochromatin) or situational (facultative heterochromatin), as well as more loosely packed regions called euchromatin, in which most active gene expression occurs.

The rice genome (Oryza sativa L. ssp. indica) is taken as an example. A summary of the genome characteristics detailed by the International Rice Genome Sequencing Project, build 4.0, is shown in table 3-1 (see footnote 1).

Table 3-1. O. Indica Genome summary

| Characteristic | Value |
|---|---|
| Highest level of assembly | Chromosome |
| Size (total bases) | 382,150,945 |
| Number of genes | 30,294 |
| Number of proteins | 28,392 |

3.1.2 Prokaryotic DNA organization

By way of comparison to eukaryotes, the genome of E. coli is taken as an example. E. coli is a prokaryote: a simple cell with genetic material not enclosed in a nuclear structure. The genome consists of 4,639,221 bases organized into approximately 4300 genes, depicted in figure 3-1. Protein-coding genes account for 87.8% of the genome, 0.8% encodes stable RNAs, and 0.7% consists of non-coding repeats, leaving 11% for regulatory and other functions [28]. Even within this simple cell, only about 80% of the gene functions are fully understood. It has complex patterns of gene expression and a number of features to regulate gene expression.

Footnote 1: From NCBI genome resources, http://www.ncbi.nlm.nih.gov/genome/10. The summary does not include the mitochondrial DNA with 81 genes coding for 53 proteins.

Figure 3-1. E. coli genome.

Replichore 1 and 2 refer to the two replication domains of the genome. Replication runs from the origin, oriC, to the diametrically opposed terminus region.

3.2 Gene Transcription and Translation

The set of all genes for an organism is referred to as the genome. The set of proteins produced by those genes is referred to as the proteome. Basic cellular functions rely on the process by which gene sequences coded by DNA are transcribed to another sequence represented by a derivative of DNA, ribonucleic acid (RNA). Some of the RNA (protein-coding messenger RNA, hereafter referred to as mRNA) is transported to a site in the cell where it can be translated into sequences of amino acids that comprise proteins. This is an oversimplification of sets of thousands of interconnected processes eloquently (but inaccurately) summarized by Crick as the “Central Dogma” of biology (see footnote 2). In summary,

• DNA to RNA (transcription)

• RNA to Proteins (translation)

This is summarized in figure 3-2 below. When a gene has undergone this process, it is said to have been “expressed”.
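The two steps can be sketched in Python using the standard genetic code. The function names and the short input sequence are illustrative assumptions. Transcription is modeled on the sense strand (thymine replaced by uracil), and translation reads codons from the first AUG until a stop codon.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(sense_dna: str) -> str:
    """DNA to RNA: the mRNA follows the sense strand with T replaced by U."""
    return sense_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA to protein: read codons from the first AUG up to a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: the protein is released
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For the illustrative sequence `ATGGTGCATCTGACTCCTGAGGAGAAGTAA`, `translate(transcribe(...))` yields the one-letter amino acid string `MVHLTPEEK`. Splicing, capping, and regulation are deliberately left out of this sketch.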

Figure 3-2. DNA to RNA to Protein

3.3 Patterns of gene expression

A pattern of gene expression is created by an organism going through the processes of DNA transcription and RNA translation across the many genes within the genome. Genes are always expressed within the context of overall cellular requirements. Thus, genes are expressed in response to stimuli indicating a need for expression. Not all genes are expressed all the time.

Eukaryotic organisms have more complex mechanisms and options for regulating gene expression than prokaryotes. However, both prokaryotic and eukaryotic processes can be used in these protocols.

Footnote 2: Nobel prize winner Francis Crick expressed this concept in his paper, “Central Dogma of Molecular Biology”, in 1970.

Each pathway of gene expression contributes to an overall pattern of gene expression. These patterns of gene expression can be represented as sets of random variables that vary with time. The interaction between the patterns of gene expression by different organisms and colonies of organisms can also be represented as sets of random variables. Thus, they can be modeled using probabilistic and stochastic processes.

However, the fact that these processes are not fully understood or characterized allows us to use nature as a natural generator of cryptographic codes and protocols that are functions of patterns of gene expression. The more complex the organism, the greater the diversity of the pattern of gene expression. Figure 3-3 (see footnote 3) depicts the increasing diversity that proteomics adds to the basic genetic diversity of the genome. All of that diversity can be utilized to develop cryptographically hard security protocols because knowledge of a given sequence or set of sequences can be used to describe a wide range of translational outputs, depending upon the patterns of gene expression.

Footnote 3: Adapted from Thermo Scientific Pierce Protein Products, http://www.piercenet.com/browse.cfm?fldID=7CE3FCF5-0DA0-4378-A513-2E35E5E3B49B, accessed April 27, 2012.

Figure 3-3. Genomic and proteomic complexity.

Note the increase in proteomic complexity of the human proteome over the rice proteome given roughly similar numbers of genes. The complexity could be expressed as different forms of protein coding gain.

3.3.1 Selection of gene expression processes from prokaryotic or eukaryotic groups

Prokaryotes and eukaryotes share many processes of transcription, translation, and regulation of gene expression. However, eukaryotes have a more complex set of processes and most of the subsequent details will be driven by processes in eukaryotes. It should be noted that use of prokaryote-based processes is not prohibited by this approach and it may be useful in subsequent research to establish transgenic prokaryotic ciphergenes (prokaryotic genes expressing eukaryotic proteins, such as E. coli expressing insulin or human growth hormone). This will increase the complexity of cipheranalysis attacks by increasing the combinatorial relationships between transcription, translation, and protein products.

3.4 Organization of genes in the eukaryotes

There are many biological processes that create and alter patterns of gene expression in living organisms. A few of those basic processes are being converted into algorithms for the authentication and confidentiality protocols. Figure 3-4 provides a template describing the various regions of interest in this discussion and the progression of a gene transcript through transcription and translation. Figure 3-5 provides the DNA transcript accompanying figure 3-4 [29]. The structures in figures 3-4 and 3-5 will be referred to repeatedly in this dissertation.


Figure 3-4. Nomenclature for gene transcription and translation using β-globin as an example.

This figure summarizes the development process from the DNA transcript to a completed protein. In this dissertation, the nuclear RNA step is skipped over as a potential encryption coding source. All other sources are used as potential encryption coding sources.


Figure 3-5. β-globin transcript with control regions annotated

Referring to figure 3-5 [29], the following regions of the DNA genomic transcript are defined. These regions will be referred to repeatedly in the dissertation.


1. The core promoter region, which is responsible for the binding of RNA polymerase and transcription factors for the subsequent initiation of transcription.

2. The transcription initiation site, which for human β-globin is ACATTTG. This site is often called the cap sequence because it represents the 5´ end of the RNA, which will receive an m7G-cap of modified nucleotides soon after it is transcribed. The specific cap sequence varies among genes. Also found 30 bases upstream from the cap site is the TATA box sequence ATAAA (see footnote 4).

3. The translation initiation site, ATG. This codon (which becomes AUG in the mRNA) is located 50 base pairs after the transcription initiation site in the human β-globin gene. The intervening sequence of 50 base pairs between the initiation points of transcription and translation is the 5´ untranslated region (5´ UTR).

4. The first exon, which contains 90 base pairs coding for amino acids 1–30 of human β-globin.

5. An intron containing 130 base pairs with no coding sequences for the β-globin protein.

6. An exon containing 222 base pairs coding for amino acids 31–104.

7. An intron of 850 base pairs.

8. An exon containing 126 base pairs coding for amino acids 105–146.

9. A translation termination codon, TAA. This codon becomes UAA in the mRNA. The ribosome dissociates at this codon, and the protein is released.

10. A 3´ untranslated region (3´ UTR), which is transcribed but not translated into protein. This region includes the sequence AATAAA, which is needed for polyadenylation (the poly(A) sequence). Transcription continues beyond the AATAAA site for about 1000 nucleotides before being terminated [29].

Footnote 4: There are differences in the literature on the TATA box consensus sequence and location in β-globin. This selection was taken from Shi-Ping Cai, et al., “A New TATA Box Mutation Detected at Prenatal Diagnosis for β-Thalassemia”, Am. J. Hum. Genet. 45:112-114, 1989.
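The region lengths listed above can be checked for internal consistency with a short Python sketch. The lengths and amino acid ranges are the ones stated in items 3 to 8; the coordinate convention (position 1 at the transcription initiation site) and the function names are assumptions for illustration.

```python
# (region name, length in base pairs, (first, last) amino acid or None)
REGIONS = [
    ("5' UTR", 50, None),
    ("exon 1", 90, (1, 30)),
    ("intron 1", 130, None),
    ("exon 2", 222, (31, 104)),
    ("intron 2", 850, None),
    ("exon 3", 126, (105, 146)),
]


def layout(regions):
    """Assign 1-based start and end coordinates, counted from the cap site."""
    rows, start = [], 1
    for name, length, residues in regions:
        end = start + length - 1
        rows.append((name, start, end, residues))
        start = end + 1
    return rows


def exons_consistent(regions) -> bool:
    """Each exon must contain exactly three bases per encoded amino acid."""
    return all(
        length == 3 * (residues[1] - residues[0] + 1)
        for _, length, residues in regions
        if residues is not None
    )


CODING_BP = sum(length for _, length, residues in REGIONS if residues is not None)
```

The three exons total 90 + 222 + 126 = 438 base pairs, which is exactly 3 × 146 and matches the 146 amino acids listed. Under the assumed convention, the third exon ends at position 1468, immediately before the TAA termination codon.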

The upstream and downstream flanking sequences and the 5’ and 3’ UTRs contain motifs (evolutionarily conserved short sequences of bases) that alter the patterns of gene expression through binding of transcription factors. The designations of the sequences are demonstrative of their functions: promoter, enhancer, silencer, insulator, upstream activation, downstream activation.

3.4.1 DNA Nomenclature in the dissertation

The DNA codes in this dissertation can represent the template (antisense) DNA strand, the sense strand, or a double strand. Unless otherwise specified, if only one strand is shown or referenced, it is the sense strand. mRNA, however, is coded from the template strand running from the 3’ to the 5’ end, as shown in figure 3-6 [30]. As it is complementary to the template strand, it follows the sequence of the sense strand: if the sense strand is denoted as S and the template strand is denoted as S’, the mRNA that is transcribed is (S’)’ = S, with the exception that thymine is replaced by uracil. When referring to the processes of transcription, the references are to the template strand. Otherwise, the references are to the sense strand.
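The relation (S’)’ = S can be checked with a small Python sketch. The function names are assumptions; strands are written base by base in alignment with the sense strand, so the template strand returned here reads 3’ to 5’.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")   # A<->T, C<->G
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")   # as above, but A pairs with U


def template_strand(sense: str) -> str:
    """S -> S': complement of the sense strand, aligned base by base."""
    return sense.translate(DNA_COMPLEMENT)


def transcribe_from_template(template: str) -> str:
    """S' -> mRNA: complement of the template strand with uracil for thymine."""
    return template.translate(RNA_COMPLEMENT)
```

For the sense strand `ATGGTG`, the template strand is `TACCAC` and the transcribed mRNA is `AUGGUG`, which is the sense strand with thymine replaced by uracil.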


Figure 3-6. DNA Nomenclature in the dissertation.

3.5 Transcription and the General Transcription Machinery of the Eukaryotic Nucleus

Transcription occurs via nucleic acid-nucleic acid, nucleic acid-protein, and protein-protein interactions. These processes have been translated into a form in which protocols can be described. Reference will be made to the general transcription machinery of the cell, which all cells possess. The general transcription machinery is a set of protein-protein and protein-nucleic acid interactions that must occur in order for DNA to be transcribed into RNA.⁵ These interactions include binding by specific proteins called transcription factors to the control regions on the gene that were previously described in figures 3-4 and 3-5. There is a temporal quality to these interactions. At different stages of transcription, different protein-nucleic acid complexes are required. This temporal quality of transcription is also reflected in the security protocol. Figure 3-7 provides an example of the general transcription machinery prior to the start of transcription. This is the pre-initiation complex. The coding of this complex is called the pre-transcriptional complex.

⁵ In addition to the general transcription machinery, additional cell-specific and stimulus-specific transcription machinery also exists.

Figure 3-7. Pre-initiation complex for transcription.

The Pre-Initiation Complex is depicted in figure 3-7A. The DNA sequence with control elements BRE, TATA, INR, MTE, and DPE is shown along the horizontal axis. Six protein transcription factors (TFIIA, TFIIB, TFIIE, TFIIF, TFIID, TFIIH) are shown binding to their respective locations for the initiation of transcription. The Basal Transcriptional Complex is completed in figure 3-7B by addition of RNA Polymerase II, which performs the enzymatic function of transcription. TFIIA interacts with TATA Binding Protein (TBP) and stabilizes interaction with the TATA box. TFIIB interacts with DNA sequences flanking the TATA box, BRE, TFIIF, and TBP. TFIID recognizes and interacts with different promoter regions and contains TBP and 14 TBP-associated factors called TAFs. TFIIF recruits Polymerase II, TFIIE, and TFIIH. TFIIE recruits and stimulates TFIIH, which is important for transitioning RNA Polymerase II from initiation to elongation of the RNA transcript. Some new elements have been added to the control sequences shown in figure 3-5 [31].

• TFIIB Recognition Element (BRE), which functions as an important binding site for transcription factor TFIIB.

• Initiator (INR), which is the initiation sequence and contains within its sequence the transcription start site.

• Motif Ten Element (MTE), a downstream core promoter element which stimulates transcription when it appears in a precise position relative to the INR [32].

• Downstream Promoter Elements (DPE), regulatory sequences which enhance transcription by recruiting general transcription factors to the promoter site.

3.5.1 Additional regulatory sequences and their functions of interest.

Figure 3-8, adapted from [33], is a block-level diagram of the transcriptional control elements for the human heat shock protein 70 (hsp70) gene; -250 was set as an arbitrary limit on the upstream side. The gene has five transcriptional control motifs in eight transcriptional control regions upstream from the transcriptional start site: SP1, CCAAT, AP2, HSE, CCAAT, SP1, TATA, and AP2. The names SP1, AP2, and HSE refer to the protein transcription factors of the same name that bind to the corresponding sites on the DNA sequence. If the site does not appear on the DNA sequence, the rate of transcription and the probability of expression⁶ are reduced. Alternatively, if the site is a silencer, the opposite occurs.

Figure 3-8. hsp70 transcriptional control regions.

3.5.2 Processes of Transcription

The protocols reduce the processes of transcription to initiation, elongation, and termination. Initiation refers to the processes that recruit the transcription factors to the appropriate regulatory sequence. Elongation refers to the process by which the DNA sequence is read by RNA Polymerase II in the presence of the proper transcription factors and a nascent chain of RNA is produced. Termination refers to the process by which transcription is terminated and the appropriate caps are placed on the 5’ and 3’ ends of the mRNA.

⁶ The term probability of expression is used to indicate the probability of a gene expressing the ultimate protein transcript through both transcription and translation. The term probability of transcription is used to indicate the probability of a gene being transcribed at a level consistent with the full production of mRNA transcripts. A transcription probability of 1.0 means that transcription is occurring at a level that produces the maximum amount of mRNA transcripts a gene in a cell would produce; 0.5 means that transcription is occurring at a level that produces ½ the maximum amount of mRNA transcripts, and so forth. The same logic applies to probability of translation. Both are treated as independent random variables, such that if a gene has a probability of full transcription = 0.8 and a probability of full translation = 0.8, the probability of full protein expression is 0.64, i.e. 64% of the protein that could be expected under full transcription and translation conditions.

3.6 Translation of Eukaryotic messenger RNA

The security protocol provides for an analogous process for translation of mRNA into protein. The translation process involves protein-nucleic acid, protein-protein, and nucleic acid-nucleic acid interaction. The processes of translation of mRNA to protein are very different from the processes of transcription of DNA to mRNA.

• The processes occur in the cytoplasm as opposed to the nucleus. Thus, mRNA must be transported to the cytoplasm by carrier proteins (exportins). This property will be exploited in future versions of the protocol where some network nodes will be represented as cipher-nuclei and other nodes will be represented as cipher-cytoplasm. The relationship will consist of cipher-mRNA being pushed out to the specific cipher-cytoplasm nodes for one-way or two-way authentication.

• The process of translation is heavily dependent upon non-coding RNAs such as transfer RNA (tRNA) and ribosomal RNA (rRNA). RNAs read the genetic code of mRNA to facilitate the recruitment of the proper amino acid to the correct site on the protein chain.

• The output of the translation process is in a new codebook that represents proteins.

3.6.1 Processes of Translation

Translation also undergoes three processes: initiation, elongation, and termination. Initiation involves the complexing of the tRNA with proteins and the translation start sequence amino acid. The protein-nucleic acid complex is responsible for reading the mRNA and staging the first amino acid of the protein chain coded by the messenger RNA. Elongation involves the successive recruitment of tRNAs complexed with the appropriate amino acid, such that the amino acid can be complexed to the growing protein chain.

Termination is a group of processes that cap the N-terminal and C-terminal ends of the proteins. In between transcription and translation, a number of processes, such as splicing by the spliceosome, provide alternate decoding of mRNA.

3.7 Role of these processes in the dissertation

The processes by which transcription and translation are conducted within the eukaryotic cell will be used in creating the security protocols. The dissertation will concentrate on encryption coding schemes based upon:

• Transcription and transcriptional regulation. Create mRNA transcription products and alter the pattern of gene expression by determining which sequences of bases are transcribed into mRNA and the quantity of a transcription product produced.

• Post-transcriptional regulation. Alter the pattern of gene expression by determining which mRNA sequences are translated into proteins and the quantity of a protein produced.

• Translation and post-translational regulation. Create translation products and alter the protein products of gene expression to create protein variations and the quantity of each protein variant produced.


CHAPTER 4  DESCRIPTION OF THE RESEARCH PRODUCTS

The research undertaken for this dissertation has undergone a steady increase in complexity and breadth of coverage of the subject through incremental development of a suite of protocols. Figure 4-1 summarizes the research developments.

Figure 4-1. Research Development and Products


4.1 Initial Research on a DNA-inspired authentication protocol for Mobile ad hoc Networks (MANET) from 2006-2008.

The initial protocol was developed as a DNA-inspired methodology for secure mobile ad hoc networks. The initial work used a DNA encryption scheme over a fixed plaintext alphabet. It established the basic process for encrypting plaintext into DNA text. This process, with modifications, continues to be used. It allowed the fittest encrypted codewords for a message to be selected by way of a genetic algorithm. It had a trust algorithm for determining whether routing paths met a standard of network trustworthiness.

4.2 Protocol Enhancements Resulting in a Generalized DNA-based HMAC for Authentication (2008-2010).

The original concept was expanded from a DNA-inspired encryption methodology to a genomics and proteomics encryption and authentication methodology. A specific protocol for a DNA-based keyed HMAC was developed. The protocol was generalized such that a fixed plaintext alphabet was no longer required. Synthetic genomes were replaced with biological genomes. The protocol concepts were expanded. The concepts of ciphergenes and cipherproteins incorporated into ciphercolonies were developed.

4.3 Genomics and Proteomics Protocols for Network Security (2010-2012)

The concepts were further generalized to include the use of regulation of gene expression as the methodology for securing computer networks. A floating point source coding protocol for a genomic alphabet (including mutagenic and epigenetic nucleotides) was developed. The Network BioID using ciphercolonies concept was established. A three-level hierarchical set of protocols utilizing processes for transcription and translation was added. A network concept of operations was developed.

4.4 Development of the initial protocol

The research began with investigations into novel DNA-inspired encryption schemes [34]. This work demonstrated the ability of molecular biology functions such as DNA evolution to provide a basis for proprietary architectures that can achieve high degrees of diffusion and confusion and resistance to cryptanalysis. Proprietary encryption products can serve both large and small applications and can exist at both application and network level. The paper briefly outlined the basis of the proprietary encryption mechanism, which uses the principles of DNA replication and steganography (hidden word cryptography) to produce confidential data. The foundation of the approach includes: organization of coded words and messages using base pairs organized into genes, an expandable genome consisting of DNA-based chromosome keys, and a DNA-based message encoding, replication, and evolution process. The process is summarized as follows:

• The method encrypts on the basis of words, not characters.

• The words are stored in a dictionary in lexicographic order.

• Define an alphabet, dictionary and word length limitations.

  o Two dictionaries exist, a basic basepair dictionary and a codon-based dictionary.

• Implementation steps required:

  1. Precoding into DNA alphabet

  2. Encryption with chromosome encryption keys to generate a population of encrypted mutant messages

  3. Evaluate the fitness of each mutant for diffusion and confusion against some fitness criteria

  4. Anneal the most fit mutants as candidates for transmission or further generations of encrypted mutants

  5. Develop trust metrics for the known nodes in the mobile ad-hoc network

  6. Select a mutant for transmission based upon the fitness criteria and the network trust metric.

The methodology can be used for confidentiality (encryption and decryption) or authentication (one-way encryption).

This research involved a new encryption technique, which utilizes DNA-inspired coding, a dynamic fitness algorithm, and trust metrics for ad-hoc routing. Because of the dynamic, evolutionary nature of this approach, potential intruders must continually intercept decoding instructions between source and destination. Missing one generation of genome decryption information seriously corrupts the decryption process. Missing multiple generations eventually renders previous decryption analyses useless.

Figure 4-2 displays a MANET routed message from Jack to Jill routed at two different times, through secure and potentially malicious nodes. A truly ad hoc network permits routing in the presence of un-trusted peers. In this case, message traffic is between Jack and Jill. Nodes A, B and C are trustworthy nodes at time t1 and nodes ,  are trustworthy at time t2.

The problem of successful routing of messages over potentially un-trusted nodes requires:

• Routed messages arrive at the destination intact.

• Routed messages remain confidential in transmission.

• Cryptanalysis of message traffic passing through nodes other than Jack or Jill is unlikely to be successful.

• Nodes enter and leave the network at will.


Figure 4-2: MANET routed over secure nodes at t1 (___) and t2 (---).

4.4.1 Encryption Process

• Two or more users define a plaintext dictionary and a DNA-based dictionary. The users define the method by which plaintext is represented by the four DNA bases. The DNA dictionary is the source of messages and encryption keys (chromosomes).

• Messages are pre-coded from plaintext into DNA using a system of linear equations relating word position in the message and the ordinal position in the dictionary.

• Chromosomes encrypt multiple permutations of the message.

• The permutations are tested for fitness and the most fit permutation is selected for transmission by the source.

• The recipient authenticates the message with the same chromosomes.

• The genome is expanded by mutating the chromosomes with each other or with message sequences.

The system is based upon operations upon words and not individual characters. The only individual characters that are encrypted are one character words. Users of the DNA encryption tool are endowed with a starter genome, which provides the equivalent of a small dictionary for initiating messages, an intended recipient capable of possessing a secret, shared key, and a secret DNA sequence to initiate communication. Chromosomes are “long” compared to message sequences.

Let D represent a dictionary (lexicographically ordered set) of all words such that D0 represents the first word in the dictionary and that sender and receiver compose messages of Wi words (genes). A function U converts words to sequences of DNA bases Bq as shown below in equations 4.1 through 4.3:

$$D_{i-1} < D_i < D_{i+1} \quad \forall\, i < n \tag{4.1}$$

$$W_i \in D_n \tag{4.2}$$

$$D_i = U(W_i, B_q) \tag{4.3}$$

There exists a one-to-one mapping between the plaintext dictionary and the DNA dictionary built from Bq = {A,T,C,G}. The binary coding for the bases is shown in table 4-1. Note that A and T, and C and G are inverses.

Table 4-1. DNA base coding.

Base      Binary value
Adenine   0011
Thymine   1100
Cytosine  1001
Guanine   0110
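The inverse relationship in table 4-1 can be checked with a short sketch. The names BASE_CODE, PAIR, encode, and is_inverse are hypothetical helpers introduced only for this illustration; the 4-bit values are those of table 4-1.

```python
# 4-bit codes from table 4-1.
BASE_CODE = {"A": 0b0011, "T": 0b1100, "C": 0b1001, "G": 0b0110}

# Watson-Crick partners.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def encode(sequence: str) -> str:
    """Return the binary string for a DNA sequence, 4 bits per base."""
    return "".join(format(BASE_CODE[base], "04b") for base in sequence)

def is_inverse(x: str, y: str) -> bool:
    """True when the codes of x and y differ in every bit position."""
    return BASE_CODE[x] ^ BASE_CODE[y] == 0b1111

all_pairs_inverse = all(is_inverse(base, PAIR[base]) for base in PAIR)
encoded = encode("ATCG")
```

Because each base differs from its partner in all four bit positions, a bitwise XOR of a matched pair yields 1111, while any other combination leaves at least one zero bit.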


Given an alphabet of n characters and words of character length m, each plaintext word codes into a DNA word (gene) of x basepairs in length, creating ci possible combinations of DNA words for each plaintext word and Y total combinations of DNA words for the dictionary, as shown below in equations 4.4 through 4.6:

$$\log_2(n) = x \tag{4.4}$$

$$c_i = 2^{(x \cdot m)} \tag{4.5}$$

$$Y = \sum_{i=1}^{i_{max}} c_i \tag{4.6}$$

For n=8 with a character set consisting of {a,e,i,o,u,n,s,t}, and m = 3, there would be 584 total entries. Selected entries from such a dictionary are shown in table 4-2.
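The count of 584 entries can be reproduced with a minimal sketch, assuming the sum in equation 4.6 runs over word lengths 1 through the maximum length, so that each term is 2^(x·m) for a word of m characters. The function names are hypothetical.

```python
def symbols_per_character(n: int) -> int:
    """Equation 4.4 for an alphabet whose size n is a power of two:
    x = log2(n), computed with integer arithmetic."""
    return (n - 1).bit_length()

def dictionary_size(n: int, max_word_length: int) -> int:
    """Equations 4.5 and 4.6: sum 2**(x*m) over word lengths m = 1..max."""
    x = symbols_per_character(n)
    return sum(2 ** (x * m) for m in range(1, max_word_length + 1))

size_8_3 = dictionary_size(8, 3)       # 8 + 64 + 512
size_256_10 = dictionary_size(256, 10)
```

For the 8-character alphabet, x = 3, and the three terms 8, 64, and 512 sum to 584. For the 256-character alphabet with words of up to 10 characters the total is on the order of 10^24.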

Sequences of nonsense words can be inserted between plaintext words. As the character set and character length increase, the number of possible words (mostly nonsense words) increases exponentially. Actual words can be padded with interspersed nonsense words to increase security. Figure 4-3 displays the maximum size of the DNA dictionary for 8, 32 and 256 character alphabets, with word lengths ranging from 1 to 10 characters.

[Plot: Maximum DNA Dictionary Size. Vertical axis: words (logarithmic, 1.E+00 to 1.E+25). Horizontal axis: word length (1 to 10). Series: Y(8), Y(32), Y(256).]

Figure 4-3. DNA Dictionary Size versus word length

Table 4-2. Sample DNA dictionary entries

ordinal  word  DNA code
146      i     TTT
147      ia    TTACAT
148      ie    TTATAT
452      san   ACACACGCC
453      sas   ACACAGAAG
454      sat   ACACATAAT
455      sea   ACCACACCC

The plaintext coding process yields message M consisting of a sense string Msense of bases and Manti-sense string of bases. Chromosome (C1,..j) sense and anti-sense strands generated from the DNA dictionary encrypt Msense and Manti-sense to produce encrypted mutants. Given j chromosomes in the genome, m message basepairs, k chromosome basepairs, 2*j*(k-m) rounds of encryption on the sense and anti-sense message strands are possible. The message slides down the chromosome between rounds. One encrypted mutant is produced per round.
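The round count can be illustrated with a small sketch. The function names are hypothetical; the window generator follows the 2*j*(k-m) count stated above, so it yields k-m offsets per strand.

```python
def encryption_rounds(j: int, k: int, m: int) -> int:
    """Number of possible rounds for j chromosomes of k basepairs and a
    message of m basepairs, over both sense and anti-sense strands."""
    if m > k:
        raise ValueError("message must not be longer than the chromosome")
    return 2 * j * (k - m)

def sliding_windows(chromosome: str, m: int):
    """Yield (offset, fragment) pairs as the message slides down one
    chromosome strand; one fragment of length m per round."""
    for offset in range(len(chromosome) - m):
        yield offset, chromosome[offset:offset + m]

rounds = encryption_rounds(j=1, k=1793, m=54)
windows = list(sliding_windows("ATCGAT", 4))   # toy 6-base strand
```

With a single 1793-basepair chromosome and a 54-basepair message, the formula gives 3478 possible rounds, and therefore up to 3478 encrypted mutants.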

Table 4-3. Encryption Process

Step                             Function                      Output
Encrypt                          E(C,M)                        Cipher
Anneal                           A(Cipher, B(q))               ACipher
Trust for p routes               Tp(FREQp,RREQp)               Tmax
Fitness (Diffusion & Confusion)  D(M,ACipher), C(M,ACipher)    F
Select mutant                    S(g(F,Tmax))                  Output

Encryption is a 5 step process as shown in table 4-3. The encryption step processes the message against the chromosome key to create a generation of two new mutants. The first is a DNA fragment consisting of a sense strand from the message paired with a fragment of equivalent length from the sense strand of each chromosome, moving from the 5’ to the 3’ end. The second is a DNA fragment consisting of an anti-sense strand from the message paired with a fragment of equivalent length from the sense strand of each chromosome key, moving from the 3’ to the 5’ end. The process is summarized in figure 4-4. The chromosome is depicted as a series of segments. The functional output of the step is referred to as ‘Cipher’.


Figure 4-4. Mating of chromosome to message and subsequent selection

The process of aligning two dissimilar DNA strands results in numerous mismatches. Figure 4-5 demonstrates the annealing process via mutation by use of virtual bases Bq = {a,t,c,g}. Use of the transformation A→g→T, C→a→G, T→c→A, G→t→C simplifies the evolution of the code and anneals mismatches.

Figure 4-5. Anneal/mutation process.

Mutations are induced by the chromosome (encryption key) onto the message (plaintext). The rule is simple: if a mismatch between the chromosome base and a message base appears, the message is mutated to match the chromosome. For example, if a chromosome base ‘A’ is mismatched with either ‘A’, ‘C’ or ‘G’ on the corresponding message base, the message base is changed to a ‘g’, which mutates to a ‘T’. The unused bit patterns in the 4-bit binary representations of the DNA hold special codes for this transformation. The advantage of this technique is that it allows for rapid merging of chromosome and message strands, provides a path for substituting new bases into the chromosome strand, and mimics the activity of creating molecules that rely on the DNA structure but have substituted new monomer units into the structure. This technique could also be combined with crossover between chromosome and message strand. The functional output of this step is referred to as ACipher, for annealed ciphertext.
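The annealing rule can be sketched as follows. The text gives the worked case for chromosome base ‘A’ only; the other three virtual-base assignments are read from the stated transformation by analogy, which is an assumption. The function and table names are hypothetical.

```python
# Watson-Crick partner of each chromosome base.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Virtual base written into the message when it mismatches the given
# chromosome base (assumed reading of the transformation in the text).
VIRTUAL_FOR = {"A": "g", "C": "a", "T": "c", "G": "t"}

# Real base that each virtual base mutates into.
MUTATES_TO = {"g": "T", "a": "G", "c": "A", "t": "C"}

def anneal(chromosome_fragment: str, message: str):
    """Return (marked, annealed): the message with virtual bases marking
    each mismatch, and the message after the virtual bases have mutated."""
    if len(chromosome_fragment) != len(message):
        raise ValueError("fragments must be of equal length")
    marked, annealed = [], []
    for c, m in zip(chromosome_fragment, message):
        if WATSON_CRICK[c] == m:          # proper pair, nothing to repair
            marked.append(m)
            annealed.append(m)
        else:                             # mismatch: mutate the message base
            v = VIRTUAL_FOR[c]
            marked.append(v)
            annealed.append(MUTATES_TO[v])
    return "".join(marked), "".join(annealed)

marked, annealed = anneal("ACGT", "TCAA")
```

In this toy case the first and last positions are already Watson-Crick pairs and pass through unchanged, while the two mismatched positions are marked with virtual bases and then mutated to pair with the chromosome.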


The sender of the message would like to know how much trust should be placed in each potential route to the destination. Determining the level of trust to be placed is a factor in determining the fitness of the encryption. The source of trust information in the methodology is querying the network and tabulating successful forward and return route request messages (FREQ, RREQ). The value of this information decays between successive queries. Given the assumption that any route is only as secure as the weakest link, a trust metric for p routes at a given point in time can be defined as shown in equation 4.7:

$$T_p(FREQ_p, RREQ_p, t) = e^{-t/r} \cdot Z_{Fp} \cdot Z_{Rp} \tag{4.7}$$

where ZFp and ZRp are the number of successful forward and return route requests over p routes and t is the delay from the baseline query. The rate of decay in trust can be adjusted by the factor r depending upon sender preference, with the effects shown in figure 4-6. The maximum value over all Tp represents the most desired route and is referred to as Tmax.
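Equation 4.7 can be evaluated with a minimal sketch. The function names, the route labels, and the request counts are hypothetical values chosen only for illustration.

```python
import math

def route_trust(zf: float, zr: float, t: float, r: float) -> float:
    """Equation 4.7: trust decays as exp(-t/r) and scales with the counts
    of successful forward (zf) and return (zr) route requests."""
    if r <= 0:
        raise ValueError("decay factor r must be positive")
    return math.exp(-t / r) * zf * zr

def best_route(routes: dict, t: float, r: float):
    """Return (route name, Tmax) over a mapping of name -> (zf, zr)."""
    scores = {name: route_trust(zf, zr, t, r) for name, (zf, zr) in routes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

routes = {"p1": (3, 2), "p2": (4, 4)}    # hypothetical request counts
name, t_max = best_route(routes, t=0.1, r=0.5)
```

A smaller r makes trust decay faster between queries, which matches the ordering of the r = 0.9, 0.5, and 0.1 curves described for figure 4-6.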

[Plot: Dynamic Route Trust. Vertical axis: trust (0 to 1). Horizontal axis: delta t (0 to 0.9). Series: r = 0.9, r = 0.5, r = 0.1.]

Figure 4-6. Temporal route trust.


A fitness algorithm defining the desired level of diffusion and confusion [35] produces a means for evaluating each potential encryption. Diffusion ensures that redundancy or patterns in the plaintext message are dissipated into the long range statistics of the ciphertext message. Confusion ensures a complex relationship exists between the plaintext and ciphertext. Each encrypted mutant is compared to the plaintext message on this basis. The output of these functions produces a fitness value, F, for each mutant.

The source can define a fitness goal, g(F,Tmax) such that only an encrypted mutant that exceeds the goal is selected by function S to become the transmitted message, referred to as the Output.
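The selection step can be sketched under stated assumptions. The dissertation's diffusion and confusion metrics are analyzed in Appendix A and are not reproduced here; the fraction of changed bases is substituted below purely as a placeholder fitness value. How g combines F and Tmax is also not specified in this section, so the sketch assumes two separate thresholds. All names are hypothetical.

```python
def changed_fraction(message: str, mutant: str) -> float:
    """Placeholder fitness F: fraction of base positions that differ
    between the coded message and the encrypted mutant."""
    return sum(a != b for a, b in zip(message, mutant)) / len(message)

def select_mutant(message, mutants, t_max, min_fitness, min_trust):
    """Return the fittest mutant that exceeds the fitness goal, or None
    when the route trust or every mutant falls short of the goal."""
    if t_max < min_trust:
        return None
    scored = [(changed_fraction(message, mu), mu) for mu in mutants]
    fit = [(f, mu) for f, mu in scored if f > min_fitness]
    return max(fit)[1] if fit else None

choice = select_mutant("ATCG", ["ATCC", "TAGC"], t_max=0.9,
                       min_fitness=0.5, min_trust=0.5)
```

A return value of None corresponds to the case where no mutant exceeds the fitness goal, in which the sender falls back on the options listed next.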

Conceivably, if no mutant exceeded the fitness goal, the sender could select one of the following options:

• Reduce the magnitude of the fitness parameters diffusion and confusion.

• Query the network again, re-compute Tmax, and determine if there is an encryption fit for transmission.

• Conduct a second round of encryption by mating the most fit encrypted mutants, and re-compute their fitness parameters.

• Delay transmission of the encrypted message until a suitable Tmax is achieved.

Figure 4-7 displays the transition from plaintext to a pair of DNA strands 54 base pairs long (Msense and Manti-sense) to a pair of encrypted and annealed mutants with a sense strand from message and chromosome for one mutant, and an anti-sense strand from message and chromosome for a second mutant. The 8 letter dictionary {a,e,i,o,u,n,s,t} and a chromosome designated as C4 having 1793 base pairs are used in this example.

Appendix A provides an extensive analysis of fitness metrics for a series of 6 encrypted messages using 20 different synthesized DNA keys.

4.1.2 Mutation Effects and Fitness

Life is intolerant of a high mutation rate in its genetic code. Ribonucleic acid (RNA) viruses have the highest mutation rate of any living species, 10^-3 to 10^-5 errors per nucleotide per replication cycle [36]. The human DNA mutation rate has been approximated to be on the order of 10^-8 errors per nucleotide per generation [37]. Injection of mutations into DNA encrypted messages is central to the encryption process.

In evolutionary biology, fitness is a characteristic that relates to the number of offspring produced from a given genome. From a population genetics point of view, the relative fitness of the mutant depends upon the number of descendants per wild-type descendant. In evolutionary computing, a fitness algorithm determines whether candidate solutions, in this case encrypted messages, are sufficiently encrypted to be transmitted. This is summarized in figure 4-8.

By organizing the DNA dictionary into a codon-based system and applying tools of evolutionary computing, the encryption methodology can be adapted as a tool of computational biology for applications such as:

• Simulation of DNA mutations via crossover and translation

• Creation of DNA samples and mutagenic PCR (Polymerase Chain Reaction) primers for simulation

• Optimization of alignment of two DNA sequences

• Simulation of mutagenic agents on DNA

• Rate-based synthesis and mutation studies

Utilization of software-based tools provides a fast, cost-effective means of testing strategies prior to performing laboratory analyses or cell-based techniques. DNA coding for biological applications requires certain characteristics that are the opposite of those required for encryption. Diffusion and confusion must be minimized. Fitness would be defined in application-specific parameters such as rate kinetics and reactant stoichiometry. Messages could be replaced by oligonucleotides of interest.


Figure 4-7: Sample plaintext, encrypted mutation, annealed mutation

Figure 4-8. Comparison of the definition of fitness with respect to DNA encryption algorithm

4.2 Genomics based security protocol: DNA Keyed Hash Message Authentication Code

Refinements and simplifications to the initial protocols led to a DNA keyed HMAC protocol and the foundation of the genomics and proteomics based security architecture.

The research described herein is not just about inserting encrypted sequences into genomes. It will insert messages that can control gene expression through a variety of mechanisms. It is also focused on a broader goal of extending biological mechanisms that control gene expression into a domain that includes network authentication. Such architecture elucidates the nascent concept of expression of cipherproteins as vectors for authentication as well as confidentiality.

4.2.1 Elements of the genomics HMAC architecture

Plaintext is mapped into a reduced representation consisting of an alphabet of q letters, where q = 4 for a genomic alphabet such as DNA or ribonucleic acid (RNA), q = 20 for a proteomic alphabet, or other values when representing other functions in molecular biology, e.g., the histone code. The actual HMAC requires additional base representations beyond the four DNA bases, but the minimum requirement is shown in equations 4.8 and 4.9. B is the set of DNA bases A, T, C, and G, which represent the molecules adenine, thymine, cytosine, and guanine and form the entire alphabet of the genomic hash code. DNA bases have the property that the only permitted pairs are the Watson-Crick matches (A-T) and (C-G); thus, the binary representations of the B and B’ sets are complementary, such that r-bit length sequences of Bq and B’q maintain the identity property shown in equation 4.10. Letters are then assigned to DNA base sequences. Letters with greater frequency can be assigned shorter DNA sequences to reduce the code size.

4.2.1.1 Lexicographic and DNA representation of plaintext

Plaintext words, P, are converted into a numerical form suitable for subsequent coding into the cryptographic alphabet of the required code. Plaintext words are coded such that a lexicographic order is maintained between words, and the numerical forms may take either integer or floating point representations. F is a function that converts the plaintext to lexicographic numerical form. D represents the numerical form of the dictionary (lexicographically ordered set) such that D1,..n represents the set of all words. The subset D1,..i represents the words in the plaintext message. The function U assigns the DNA base sequence corresponding to each Di, as shown in equations 4.11, 4.12, and 4.13. L is the plaintext message coded into the DNA alphabet found in sets B and B’.

4.2.1.2 Sentence-message order coding

A system of linear equations codes the lexicographic position of each word relative to the sentence position of each word. This complicates detection of words based upon frequency analysis, because multiple appearances of the same word are uniquely coded. As a minimum requirement, if there are i DNA representations in the message, and n represents a numerical sequence related to the number of DNA representations in the message (the simplest case being i = 1, 2, 3, …, n), then the system of linear equations shown in equation 4.14 provides the solutions for sentence-message order coding.

$$B_q = \{A, T, C, G\} \tag{4.8}$$

$$B'_q = \{T, A, G, C\} \tag{4.9}$$

$$1 = B_q^{\,r} \oplus B_q^{\,r\prime}, \quad \forall\, r = 1, \ldots, q \tag{4.10}$$

Equations 4.8 and 4.9 define the sets containing the DNA bases that comprise the alphabet for the HMAC code. Equation 4.10 defines the complementary relationship required for the binary representations of the members of that space. For example, the XOR product of the rth bit of A and T is a one, as is true for T and A, C and G, and G and C.

Equation 4.11 defines each word in the message, Pi, as a member of the set of all words in a lexicographically ordered dictionary. Equations 4.12 and 4.13 show the operation of the function that assigns a DNA sequence, using the members of the set of DNA bases, to a coding of concatenated sequences labeled L and L’. L and L’ maintain the same complementary relationship that is a property of the individual DNA bases in the sets Bq and B’q.

The solution to the system of equations in 4.14 yields a series of coefficients x1, x2, …, xi that are concatenated as shown in equation 4.15. The binary representation of each coefficient undergoes bit expansions such that only Bq or B’q codes are represented in the bit stream created by equation 4.15. X represents the relationship between lexicographic coding of the words and their position in the message.

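$$D_i = F(P_i), \quad D_i < D_{i+1} \quad \forall\, i < n \tag{4.11}$$

$$L = U(D_1, B_q) \,\|\, U(D_2, B_q) \,\|\, \cdots \,\|\, U(D_i, B_q) \tag{4.12}$$

$$L' = U(D_1, B'_q) \,\|\, U(D_2, B'_q) \,\|\, \cdots \,\|\, U(D_i, B'_q) \tag{4.13}$$

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \end{bmatrix} \begin{bmatrix} D_1 & D_2 & \cdots & D_i \\ D_i & D_1 & \cdots & D_{i-1} \\ \vdots & \vdots & \ddots & \vdots \\ D_2 & D_3 & \cdots & D_1 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_i \end{bmatrix} \tag{4.14}$$

$$X = x_1 \,\|\, x_2 \,\|\, \cdots \,\|\, x_i \tag{4.15}$$

$$M = L \oplus X \tag{4.16}$$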

4.2.1.3 Message coding

DNA coding on the message is completed by XOR and bit expansions to maintain the DNA base coding in the binary sequence in the operation shown in equation 4.16. M is the plaintext message coded into the DNA alphabet and coded again with the sentence-message coefficients. This sequence will be subjected to encryption.

The set of linear equations in equation 4.14 provide the process of sentence-message order coding using the rth position in the message to code each word of the message. The resulting coefficients are concatenated and XOR’d with the coded plaintext message to produce the ciphertext message.

4.2.1.4 Availability of Genomic data for use in encryption.

There is a large volume of genetic sequence information available for use as encryption keys. Figure 4-9 from the International Nucleotide Sequence Database Collaboration (INSDC) [38] shows the growth in whole genome sequencing between 2000 and 2010.


Figure 4-9. Whole genome sequencing data from 2000-2010

In addition, there are approximately 126,551,501,141 bases in 135,440,924 sequence records in traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS (Whole Genome Shotgun) division as of April 2011.⁷

Additionally, there is an effectively unlimited number of ways to use genome sequences as cryptographic keys. However, genomes have high degrees of redundancy and sequence conservation across species. Without specific knowledge about the cross-species conservation of a genome sequence, genome sequences used as keys should be treated as one-time pads.

4.2.2 Encryption Process

The first step is to select a genome and a sequence from that genome and encode it with the binary representations of Bq and B’q.

⁷ http://www.ncbi.nlm.nih.gov/genbank/ Accessed January 15, 2013

DNA consists of two complementary sequences, referred to as the sense and antisense strands, as shown in Figure 4-10 [39]. A DNA sequence has a start point called the five-prime end (5’) and an endpoint called the three-prime end (3’). In biochemistry, the 5’ and 3’ designations refer to the orientation of each strand necessary for proper replication and transcription. The complements are bonded to each other base by base to create base pairs. The antisense strand is oriented in the 3’ to 5’ direction, relative to the sense strand. For a DNA encryption key, both sense and antisense strands can be encoded and utilized. Figure 4-11 and figure 4-12 demonstrate two ways of implementing the chromosome encryption key in the HMAC scheme. Figure 4-11 represents the simplest scheme, in which successive bases from the key and message are XOR’d and a single ciphertext message is produced. Encryption proceeds in the 5’ to 3’ direction using the sense strand. Figure 4-12 represents a more complex scheme, in which both sense and antisense bases from key and message are XOR’d. Encryption proceeds in the 5’ to 3’ direction in both strands.
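The single-strand scheme of figure 4-11 can be sketched as a base-by-base XOR. The sketch assumes the 4-bit base codes of table 4-1 are also the binary representations used here; the actual HMAC uses additional base representations that are not modeled. The function names and sample strands are hypothetical.

```python
# 4-bit base codes (assumed to be those of table 4-1).
BASE_CODE = {"A": 0b0011, "T": 0b1100, "C": 0b1001, "G": 0b0110}

def xor_strands(key: str, message: str) -> list:
    """XOR successive bases of the key sense strand with the message,
    proceeding 5' to 3'. Returns one 4-bit value per message base."""
    if len(key) < len(message):
        raise ValueError("key strand must be at least as long as the message")
    return [BASE_CODE[k] ^ BASE_CODE[m] for k, m in zip(key, message)]

def to_bits(values: list) -> str:
    """Render the XOR output as space-separated 4-bit groups."""
    return " ".join(format(v, "04b") for v in values)

cipher = xor_strands("ATCG", "TTGA")   # hypothetical key and message
bits = to_bits(cipher)
```

With these codes a Watson-Crick pair XORs to 1111, so any other value in the output marks a position where key and message bases do not pair.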

4.2.2.1 Mismatches and Annealing

The encryption process generates base pair mismatches that do not conform to the A-T, C-G pairing rule. These mismatches are central to creating a one-way hash code.

Subsequent to the encryption step, the mismatches are resolved through an annealing process that results in an irreversible transformation of the encryption sequence not directly traceable to the original ciphertext.


Figure 4-10. Strand Sequence Specification in DNA.

Figure 4-11. Single strand chromosome encryption

Figure 4-12. Dual strand chromosome encryptions

4.2.3 Prototype DNA-based, keyed HMAC system

Assume a network such as the one shown in Figure 4-13. Jack, Jill, JoAnn, and Lisa wish to form a secure MANET. X and Y occupy the same wireless transceiver space; their intentions are unknown, but they are capable of sending and receiving messages.

Jack, Jill, JoAnn, and Lisa possess all of the required authentication tools:

• A common genome, C, to use as an HMAC key.

• A pre-shared secret, pss, unique to each party.

• The DNA-based HMAC algorithm.

Figure 4-13. MANET with trusted and untrusted nodes and routes

Consider two authentication scenarios. In the first scenario Jack, Jill, JoAnn, and Lisa send and receive cleartext messages using the DNA-based HMAC authentication. If the receiver is not the intended destination, the receiver rebroadcasts the message with their hash and the process continues until the message reaches the intended receiver or until a message time-out period elapses. X and Y also receive the cleartext messages and hash codes. X and Y may possess the algorithm. However, if X and Y wish to substitute a new message with a valid hash code, or forward the message and have it accepted by the network members, they have to create a valid hash code and checksum, which requires knowledge of the chromosome sequence and valid pre-shared secrets known to the other MANET nodes. The MANET members change their pre-shared secrets on a pre-established basis to thwart a brute force attack to derive the pre-shared secret from the hash code.

In the second scenario, Jack, Jill, JoAnn, and Lisa wish to establish a trust relationship before exchanging sensitive information across a MANET. In this case, the participants utilize a confidentiality (encryption) protocol for the messages and establish a chain of custody using keyed HMAC authentication. A hash chain of hash codes is established such that each recipient can determine the origin and subsequent hops of the message. In this case, X and Y cannot read the plaintext and the hash code transcript may be encrypted and compressed with the ciphertext.

4.2.3.1 Genomic hash code properties

Table 4-4 summarizes the properties of the prototype hash code against the requirements for an ideal hash code [40].


Table 4-4. Genomic Hash Code Properties

Property | Compliance
Produces a fixed length output. | 2560 bits
Can be applied to a block of data of any length. | Yes
H(x) is relatively easy to compute for any message x. | Yes. 12 step process for hash code.
One-way property. For any h, it is computationally infeasible to find H(x)=h. | Maybe
Weak collision resistance. For a set of xi messages, with y≠xi for all i, no H(y)=H(xi) for all i. | Yes
Strong collision resistance. For any x, with y≠x, no H(y)=H(x). | No. Messages ≤ 512 bits require padding.

4.2.3.2 Initialize and Perform Lexicographic and DNA assignments

The plain text message is read and parsed into 3-word blocks (3WB). Take each word in the string and assign it a lexicographic value of x.yyyy….y, where x = 1,....,26 corresponds to the first letter of the word and subsequent letters are assigned to each successive decimal place until the entire word is coded in a rational number. Assign a DNA letter code to each letter. Most common English alpha characters use 2-letter codes; the rest use a 3-letter code, as shown in table 4-5. The column labeled ‘α’ is the English alphabetic character adjacent to its DNA code equivalent. As an example, the short phrase ‘jump out windows’ is shown in its lexicographic and DNA assigned forms in table 4-6.
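The lexicographic and DNA assignments can be illustrated with a minimal Python sketch. It is not the prototype's code. The dictionary holds only the sample codes listed in table 4-5; the entries for M and T are not in that sample and are inferred here from the ‘jump out windows’ example in table 4-6, and F is left out because no code for it appears in either table. The function names are hypothetical.

```python
# Sample codes transcribed from table 4-5.
DNA_CODES = {
    "0": "CGG", "A": "GC", "B": "TGT", "C": "TC", "D": "GT", "E": "TA",
    "G": "TT", "H": "AC", "I": "AA", "J": "AAG", "K": "ACT", "L": "AT",
    "N": "TG", "O": "AG", "P": "GA", "Q": "CCT", "R": "CC", "S": "GG",
    "U": "CT", "V": "CTG", "W": "CAC", "X": "GTA", "Y": "GTT", "Z": "TAG",
    # Assumption: inferred from the table 4-6 examples, not listed in table 4-5.
    "M": "CG", "T": "CA",
}


def three_word_blocks(message):
    """Split a plain text message into 3-word blocks (3WB)."""
    words = message.split()
    return [words[i:i + 3] for i in range(0, len(words), 3)]


def lexicographic_value(word):
    """Return the x.yyyy...y string: first letter's alphabet position,
    a decimal point, then the positions of the remaining letters.
    Only the letters a-z are handled in this sketch."""
    positions = [str(ord(c) - ord("a") + 1) for c in word.lower()]
    return positions[0] + "." + "".join(positions[1:])


def dna_letters(word):
    """Concatenate the DNA letter code of each character.
    Raises KeyError for characters missing from the sample table."""
    return "".join(DNA_CODES[c] for c in word.upper())


for block in three_word_blocks("jump out windows"):
    for word in block:
        print(word, lexicographic_value(word), dna_letters(word))
```

The printed values for ‘jump’, ‘out’, and ‘windows’ can be compared against the rows of table 4-6.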


Table 4-5. Sample of Alpha To DNA Conversion Codes.

α | DNA | α | DNA | α | DNA | α | DNA
0 | CGG | G | TT | N | TG | U | CT
A | GC | H | AC | O | AG | V | CTG
B | TGT | I | AA | P | GA | W | CAC
C | TC | J | AAG | Q | CCT | X | GTA
D | GT | K | ACT | R | CC | Y | GTT
E | TA | L | AT | S | GG | Z | TAG

Table 4-6. Plaintext to Lexicographic Order and DNA Letter Codes.

# | Plain text | Lexicographic Conversion | DNA Conversion
1 | jump | 10.211316 | AAGCTCGGA
2 | out | 15.2120 | AGCTCA
3 | windows | 23.9144152319 | CACAATGGTAGCACGG

4.2.3.3 Binary representation of the DNA bases

The four DNA bases (A, T, C, G) are represented by binary sequences (0011, 1100, 1001, 0110). The remaining 12 four-bit sequences code for transitional base sequences that are used to anneal mismatches in the encryption process as shown in Table 4-7. The ‘Key’ column represents the base in the chromosome encryption key. The ‘M’ column represents the corresponding base in the DNA coded message. The ‘Result’ column represents the results of encrypting the key onto the message. The ‘Anneal’ column represents the final ciphertext base. In an operational system, all codes would be significantly lengthened to thwart brute force attacks.
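One consequence of the four base codes is worth making explicit: each base's code is the bitwise complement of its pairing partner's code, so XOR of a correctly paired key and message base yields 1111, and any other pairing yields a different value. The short Python sketch below checks this property. The function names are hypothetical, and the assignment of the 12 transitional codes is not reproduced because it is not specified in the text.

```python
# Four-bit codes for the DNA bases, as given in the text.
BASE_BITS = {"A": 0b0011, "T": 0b1100, "C": 0b1001, "G": 0b0110}


def xor_bases(key_base, message_base):
    """XOR the four-bit codes of a key base and a message base."""
    return BASE_BITS[key_base] ^ BASE_BITS[message_base]


def is_complementary_pair(key_base, message_base):
    """True when the two bases follow the A-T, C-G pairing rule."""
    return xor_bases(key_base, message_base) == 0b1111


for key_base in "ATCG":
    for message_base in "ATCG":
        bits = format(xor_bases(key_base, message_base), "04b")
        print(key_base, message_base, bits,
              is_complementary_pair(key_base, message_base))
```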


Table 4-7 Encryption and Annealing Table

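Key | M | Result | Anneal | Key | M | Result | Anneal
A | T | T | G | C | G | G | A
A | A | gA | C | C | A | aA | C
A | C | gC | T | C | C | aC | G
A | G | gG | A | C | T | aT | T
T | A | A | T | G | C | C | C
T | G | cC | G | G | A | tA | G
T | C | cG | A | G | G | tG | A
T | T | cT | C | G | T | tT | T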

4.2.3.4 Encryption, Mismatches and Annealing

Figure 4-14 also provides a short example of the encryption and annealing process.

Each base in the chromosome is XOR’d against the corresponding base in the message. If the base in the message is the complement of the base in the chromosome, the base in the message is copied to the encrypted output string and then altered to a new base in the annealed output string. If the base in the message is not the complement of the base in the chromosome, a transitional base, whose value depends upon the mismatch is written to the encrypted output string. The 5’ base always determines the change in the other strand.

This feature allows tracking of point mutations and provides a future expansion capability for mutations. The annealing process also alters the encrypted result by transforming the positions that are not mismatches.
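The per-base encryption and annealing step can be sketched as a table lookup. This is an illustration under stated assumptions, not the prototype implementation: the dictionary entries are copied as they appear in Table 4-7, the function name is hypothetical, and the sketch handles a single strand with the key at least as long as the message.

```python
# (key base, message base) -> (encrypted result, annealed base),
# copied as read from Table 4-7.
TABLE_4_7 = {
    ("A", "T"): ("T", "G"), ("A", "A"): ("gA", "C"),
    ("A", "C"): ("gC", "T"), ("A", "G"): ("gG", "A"),
    ("T", "A"): ("A", "T"), ("T", "G"): ("cC", "G"),
    ("T", "C"): ("cG", "A"), ("T", "T"): ("cT", "C"),
    ("C", "G"): ("G", "A"), ("C", "A"): ("aA", "C"),
    ("C", "C"): ("aC", "G"), ("C", "T"): ("aT", "T"),
    ("G", "C"): ("C", "C"), ("G", "A"): ("tA", "G"),
    ("G", "G"): ("tG", "A"), ("G", "T"): ("tT", "T"),
}


def encrypt_and_anneal(key, message):
    """Return (encrypted results, annealed string) for one strand.

    A lowercase prefix in a result marks a transitional base written
    for a mismatch; the annealed string contains only A, T, C, G.
    """
    if len(key) < len(message):
        raise ValueError("key segment must be at least as long as the message")
    results, annealed = [], []
    for key_base, message_base in zip(key, message):
        result, final_base = TABLE_4_7[(key_base, message_base)]
        results.append(result)
        annealed.append(final_base)
    return results, "".join(annealed)


print(encrypt_and_anneal("ATCG", "TAGG"))
```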


Figure 4-14. Plaintext Coding, Encryption, and Annealing Process

4.2.3.5 Cryptographic Genome

Mycoplasma genitalium G37 (National Center for Biotechnology Information accession number NC_000908.2) is the bacterial genome used as an encryption key in the prototype system. Several characteristics of M. genitalium make it a good candidate as an encryption key base. It is small (it may be the smallest self-replicating genome), with 580,070 base pairs and 470 predicted protein coding regions, which is a manageable number of potential cipherproteins. M. genitalium has a low G+C content of 34% [41], whereas a random, uniform distribution of base pair content would provide for 50% G-C pairs and 50% A-T pairs. This feature provides some testability advantages.

Knowledge of the genome coding characteristics is important in selecting and utilizing genomes as cryptographic keys. Approximately 62,000 base pairs are being utilized from the M. genitalium genome for the prototype HMAC.

4.2.3.6 Protocol for Message Authentication.

The process is as follows:

• Encode the plaintext message into DNA code (Pre-sense message) 3 words at a time (3 word blocks – 3WB).

• Encrypt with the pre-shared secret chromosome key and generate sense and antisense strands.

• Different chromosome segments are used to encrypt each 3WB for increased key confidentiality.

• Combine sense and antisense strands to create a checksum (S).

• Anneal the sense strand (Sender) or the antisense strand (Receiver), removing the transitional bases in the 3WBs.

• Concatenate the first 64 DNA bases from the first nine 3WBs to create the Promoter (P).

• Append the checksum to the Promoter. The Promoter || checksum is the Hash Code, K (2560 bits long). The sender and receiver processes are summarized in figure 4-15.


Figure 4-15. Sender and Receiver Protocol.

The receiver extracts the Promoter and checksum from the message. The hash code computed at the receiver must have the complement of the Promoter sequence and an exact match of the checksum. Sender and receiver must have the pre-shared secret of the genome, and the location of the first base of the sequence. A sample of the output for the test message ‘jump out windows’ is shown in figure 4-16. The hash code has been truncated for test and presentation purposes.
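The receiver's acceptance test described above can be sketched in Python as follows. This is a minimal illustration, assuming the complement is taken base by base (A-T, C-G) and that the Promoter and checksum are handled as ASCII strings; the function names are hypothetical, and the use of a constant-time comparison from the standard library is a design choice of this sketch rather than something stated in the protocol.

```python
import hmac

# Base-by-base complement map (A-T, C-G).
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def complement(sequence):
    """Return the base-by-base complement of a DNA string."""
    return sequence.translate(_COMPLEMENT)


def receiver_accepts(sent_promoter, sent_checksum,
                     local_promoter, local_checksum):
    """Accept only if the locally computed Promoter is the complement
    of the received Promoter and the checksums match exactly."""
    promoter_ok = hmac.compare_digest(complement(sent_promoter), local_promoter)
    checksum_ok = hmac.compare_digest(sent_checksum, local_checksum)
    return promoter_ok and checksum_ok


# Toy values for illustration only.
print(receiver_accepts("AATTC", "10437404", "TTAAG", "10437404"))
print(receiver_accepts("AATTC", "10437404", "TAAGC", "10437404"))
```

Both checks must pass; a receiver that starts from the wrong genome position would be expected to fail at least one of them.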


Figure 4-16. Sample output of sender and receiver hash code of ‘jump out windows’

4.2.3.6.1 User Software for evaluating keyed HMAC performance

Figures 4-17 and 4-18 display the user software interfaces for the sender and receiver.


The default genome is M. genitalium. Note in figures 4-17 and 4-18 that there is a text box with an integer, in this case 125. That specifies that the 125th base in the sequence of M. genitalium is being used as the starting position for this hash code operation. Thus, 125 is the pre-shared secret between sender and receiver. If the wrong base is selected at the receiver, authentication does not occur. This is shown in figures 4-19 and 4-20. Authentication is two-phase, via a checksum and a DNA sequence. Both have to be correct and both require specific knowledge of the pre-shared secret starting location within the genome.

A frameshift error of one nucleotide in either direction at the receiver causes authentication failure. These features were extensively tested during coding evaluation.


Appendix B shows the step-by-step progression of the plaintext through to the ciphertext.

4.2.3.7 Short Message Performance

A critical factor in determining the goodness of a hash code is the ability to satisfy criteria four and five from table 4-4. A hash code algorithm should not produce identical hash code outputs for two or more different messages. Performance of short messages was evaluated for weak and strong collision resistance. The number of MAC verifications, R, required to perform a forgery attack on an m-bit MAC by brute-force verifications [42] is shown in equation 4.17:

\[ R = 2^{m-1} + \frac{2^{m-1} - 1}{2^{m}} \approx 2^{m-1} \qquad (4.17) \]

The variable R is an approximate upper bound to the brute-force verification limit.
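As a worked check of equation 4.17, the following Python sketch evaluates R for the hash code lengths that appear in table 4-8. It assumes the reading of equation 4.17 as R = 2^(m-1) + (2^(m-1) - 1)/2^m and uses exact rational arithmetic so that large m does not lose precision before formatting; the function name is hypothetical.

```python
from fractions import Fraction


def brute_force_verifications(m):
    """Evaluate R = 2^(m-1) + (2^(m-1) - 1) / 2^m exactly."""
    return Fraction(2) ** (m - 1) + Fraction(2 ** (m - 1) - 1, 2 ** m)


# Hash code lengths taken from table 4-8.
for m in (22, 30, 36, 64, 256, 576):
    print(m, f"{float(brute_force_verifications(m)):.6g}")
```

For m = 22 the value is within a fraction of a unit of 2097152.5, which agrees with the first row of table 4-8 at the precision shown there.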

Short messages were repeatedly hashed over different cryptographic sequences to look for collisions. The process is shown in figure 4-22. Table 4-8 summarizes the results of those tests.

The single letter message exhibited 403 checksum collisions and 466 hash code collisions. Chromosomes have a high degree of redundancy and repetition; therefore, short messages will require padding to eliminate hash code collisions. These statistics utilize different transcripts on the same message to identify potential collisions. These statistics should be indicative of the potential for multiple messages to produce the same hash code from a single transcript. For secure authentication purposes, this code must be implemented with higher-level protocols that would block a brute force attack and not reuse genome sequences for authentication. It must also move the starting point in the genome to widely separated start positions to prevent an attacker from guessing the encryption sequence.

Table 4-8. Sample of hash code collision metrics.

Plain Text | Msg Length | Hash Code Length | Total Hash Code Collisions | Total C/S Collisions | R
z | 1 | 22 | 466 | 403 | 2097152.5
ly | 2 | 30 | 255 | 214 | 536870912.5
cat | 3 | 36 | 136 | 109 | 34359738369
vent | 5 | 64 | 0 | 0 | 9.22337E+18
aeiou | 6 | 64 | 0 | 0 | 9.22337E+18
jump out windows | 16 | 64 | 0 | 0 | 9.22337E+18
jump out windows jump out windows jump out windows jump out | 59 | 256 | 0 | 0 | 5.7896E+76
the 123 of my fields are very large please require all personnel to take their equipment with them for the work to be performed in 365777 small increments it will be good to get practice on these tasks | 201 | 576 | 0 | 0 | 1.2367E+173


[Flowchart: Short plaintext messages → Encrypt with M. genitalium starting at position n → Increment n by 1 base, 1000 times → Sort hash codes and check for collisions]

Figure 4-22. Collision resistance tests for short messages

A hash code must be secure against recovery of the cryptographic key, in this case the original genome sequence, from the hash code. Figure 4-23 represents a small MANET example for developing trust metrics.

Figure 4-23. MANET route establishment at a slice in time.

Assume Jack is broadcasting forward requests to establish a link with Lisa and Lisa is broadcasting return route requests to Jack to establish a return link. Jill is relaying route requests in both directions. Felix wishes to join the MANET. Each node is capable of dynamically appearing and disappearing from the network at will via application of a dynamic source routing protocol. Each node can also take the role of an untrusted, unknown-trust, or trusted node depending upon the situation. Source and Destination must determine the trustability of a potential route through some quantitative means. In this case, successful forward and return route requests (FREQ, RREQ) and route delays are used to create the trust metrics. The sources and destinations can set the minimum level of trust for routes via a dynamic fitness algorithm.

To establish Felix as a trusted member, he relays forward REQs from Jack destined for Lisa and return REQs from Lisa destined for Jack with his DNA HMAC authentication attached. JoAnn does not respond to route requests, and those requests time out.

Y is a malfeasor attempting to breach the network by sending route requests with counterfeit DNA HMAC authentication and analyzing received DNA HMACs for vulnerabilities. Assume that when Y sends a counterfeit route request, genuine nodes respond with negative acknowledgement attached to a genuine authentication code. The questions to be answered are:

• Can Y establish a counterfeit authentication code (hash + checksum) for the current session (however a session is defined)?

• Can Y utilize the stolen information to recover information that might be useful for a future network breach?

If Y can recover the original cryptographic sequence, or determine the genome and genome location a cryptographic key was taken from, Y may be able to forge a valid hash code. This could be problematic for a cryptographic sequence due to the high degree of redundancy in all genomes. For this application, the hash code must be evaluated against the cryptographic key to ensure it has the proper characteristics of diffusion and confusion.

4.2.3.8 Effects of coding long strings of zeros

One strategy for attacking the authentication message is to generate long strings of zeros and identify the correct code for the non-zero positions. If a message generates long strings of zeros, it is particularly vulnerable to a key recovery attack because the attacker can reduce the number of bit matches required by the length of zero bit blocks. Table 4-9 displays the hash code for a string of 217 zeros. Table 4-10 summarizes test results of 1000 trials on messages consisting of zeroes and spaces against the genome. No collisions were identified.

Table 4-9. Sample hash code strings of 217 consecutive zeros

Checksum: 10437404

DNA Hash Code:
AATTCTAAGTTCCCGCCCGTCGGTCCGCCGCCCGT
CCGGTCCGCCGCCCGTCCCGGTCCGCCGCAATCT
CAATTCTCGCCCGTCGGTCCGCCGCCCGTCCGGTC
CGCCGCCCGTCCCGGTCCGCCGCCAACTCCAATC
TTGCCCGTCGGTCCGCCGCCCGTCCGGTCCGCCGC
CCGTCCCGGTCCGCCGCCCAATCCGAACTTCCCC
GTCGGTCCGCCGCCCGTCCGGTCCGCCGCCCGTC
CCGGTCCGCCGCCCGAACCGTAATTCTCCGTCGG
TCCGCCGCCCGTCCGGTCCGCCGCCCGTCCCGGTC
CGCCGCCCGTAACGTTAATCTTCGTCGGTCCGCCG
CCCGTCCGGTCCGCCGCCCGTCCCGGTCCGCCGC
CCGTCAAGTTCAACTTTAATCCGAACTTCAATCGT
AACGTTAATCTTTCGTTTAAGTTCAACTTTAATTA
ATTCTAATTTCAACCGTAATTCTAACGTTAAGTTC
AACTTTCGTTTCAATTCTAATTTCAATC

Table 4-10. Tests of collisions on strings of consecutive zeros

Length of ‘0’s in Plain Text | Number of Collisions after 1000 trials
73 | 0
109 | 0
217 | 0
73 | 0

Next, the hash codes were compared to the original cryptographic keys to evaluate diffusion and confusion. Mutation samples were created by hashing the message against different segments of the encryption key. Appendix C displays 50 mutation samples from hash codes on the message ‘jump out windows’ with encryption keys from the genome.

The process was run on 1000 message combinations at a time. Referring to table 4-11, mutants 4 and 21, for example, would be particularly poor fits due to the number of consecutive matches between the hash code and encryption key. Mutant 8 has no match of two consecutive bases. The diffusion metric counts the number of matching base positions between the mutant hash code and the key. The confusion metric counts the number of 2-base, 3-base, 4-base, and 5-base consecutive matches between the mutant hash code and the key. Each combination actually represents a mutant message, which can be further evaluated via a genetic algorithm for fitness. One of the major advantages of this system over a conventional encryption system is the ability to provide a set of encrypted outputs from which the most-fit (best) member can be selected.
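The two metrics can be sketched in Python as shown below. The diffusion count follows the definition above directly. For the confusion count, the text does not say how overlapping runs are tallied; this sketch assumes a sliding window, so a run of four consecutive matches contributes three 2-base matches, two 3-base matches, and one 4-base match. The function names are hypothetical.

```python
def diffusion(hash_code, key):
    """Count positions where the hash code and the key hold the same base."""
    return sum(h == k for h, k in zip(hash_code, key))


def confusion(hash_code, key, run_length):
    """Count windows of `run_length` consecutive matching positions.

    Assumption: overlapping windows are each counted once.
    """
    matches = [h == k for h, k in zip(hash_code, key)]
    windows = range(len(matches) - run_length + 1)
    return sum(all(matches[i:i + run_length]) for i in windows)


# Toy sequences for illustration only.
key = "ACGTACGT"
mutant = "ACGTTTTT"
print(diffusion(mutant, key), [confusion(mutant, key, n) for n in (2, 3, 4, 5)])
```

Under this convention, lower scores on both metrics indicate a hash code that resembles the key less, which is the selection criterion the fitness evaluation would favor.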


Table 4-11. Sample Diffusion and Confusion Scores for Hash Code for Message ‘Jump Out Windows’

Diffusion: matching base pair positions. Confusion: consecutive match positions (2, 3, 4, and 5 positions).

Mutant | Matching base pair positions | 2 positions | 3 positions | 4 positions | 5 positions
21 | 21 | 8 | 4 | 3 | 2
4 | 25 | 9 | 5 | 3 | 2
10 | 11 | 1 | 0 | 0 | 0
23 | 11 | 1 | 0 | 0 | 0
27 | 11 | 1 | 0 | 0 | 0
6 | 12 | 1 | 0 | 0 | 0
2 | 13 | 1 | 0 | 0 | 0
40 | 13 | 2 | 0 | 0 | 0
14 | 14 | 0 | 0 | 0 | 0
30 | 14 | 2 | 0 | 0 | 0
50 | 14 | 3 | 0 | 0 | 0
3 | 15 | 3 | 0 | 0 | 0
7 | 15 | 1 | 0 | 0 | 0
11 | 15 | 2 | 0 | 0 | 0
16 | 15 | 6 | 3 | 1 | 0
18 | 15 | 2 | 0 | 0 | 0
31 | 15 | 2 | 0 | 0 | 0
32 | 15 | 5 | 1 | 0 | 0
34 | 15 | 3 | 0 | 0 | 0
42 | 15 | 3 | 1 | 0 | 0
43 | 15 | 2 | 0 | 0 | 0
49 | 15 | 3 | 1 | 0 | 0
5 | 16 | 4 | 0 | 0 | 0
13 | 16 | 4 | 0 | 0 | 0
22 | 16 | 2 | 0 | 0 | 0
24 | 16 | 3 | 0 | 0 | 0
35 | 16 | 4 | 0 | 0 | 0
39 | 16 | 4 | 2 | 1 | 0
46 | 16 | 3 | 0 | 0 | 0
47 | 16 | 3 | 1 | 0 | 0
9 | 17 | 6 | 3 | 1 | 0
15 | 17 | 4 | 0 | 0 | 0
26 | 17 | 4 | 2 | 1 | 0
44 | 17 | 3 | 2 | 1 | 0
48 | 17 | 3 | 0 | 0 | 0
19 | 18 | 5 | 0 | 0 | 0
20 | 18 | 6 | 2 | 0 | 0
28 | 18 | 5 | 1 | 0 | 0
33 | 18 | 4 | 2 | 1 | 0
41 | 18 | 3 | 0 | 0 | 0
45 | 18 | 6 | 1 | 0 | 0
1 | 19 | 5 | 1 | 0 | 0
12 | 19 | 6 | 2 | 0 | 0
17 | 19 | 2 | 0 | 0 | 0
29 | 19 | 5 | 1 | 0 | 0
36 | 19 | 5 | 0 | 0 | 0
37 | 19 | 7 | 3 | 1 | 0
38 | 19 | 3 | 0 | 0 | 0
8 | 21 | 6 | 0 | 0 | 0
25 | 21 | 5 | 1 | 0 | 0

4.2.3.9 Intronic sequence padding and potential frameshift mutations can increase cryptographic hardness

Padding short messages and short words has been previously discussed as a means to decrease collisions and reduce the likelihood of successfully forging messages. Adding padding to the front of messages as well as the end, and padding short words, makes it more difficult for an attacker to find the start of the coded message sequence. The analogy in molecular biology is the frameshift mutation, in which changing the starting position by a single nucleotide can result in a completely different protein sequence, as shown in figure 4-24. The mechanics of DNA transcription in cells rely on a number of properties to identify the nucleotide triplet sequence that actually transcribes to mRNA, which translates to a protein. Some of the mechanics are thermodynamic and biochemical in nature, such as DNA folding, binding to transcription factors, and chromatin relaxation in eukaryotes. Some of the mechanics are sequence related. Four types of sequences and mechanisms from molecular biology are directly relevant to this discussion:

• Start codon: (usually ATG) to specify the translation start site (three letter sequence that ultimately specifies the first amino acid in the protein to be translated).

• Stop codon: (TAA, TGA, TAG).

• Promoters. The function of promoters is different in prokaryotes and eukaryotes, but as a general statement, the promoter is a sequence of nucleotides necessary to locate the transcription starting point. In eukaryotic genes that contain a promoter, the sequence often contains the letters ‘TATA’, hence the term ‘TATA box’.

• Enhancers. In eukaryotes, a variety of sequences upstream and downstream from the transcription site provide binding sites for transcription factors (proteins) necessary to enhance protein expression.


DNA Code: GGT CAA CGT GAA CCT → GLY GLN ARG GLU PRO (Correct Amino Acid Translation)

Frameshift mutation of DNA Code: G GTC AAC GTG AAC CT → VAL ASN VAL ASN (Resulting Error Amino Acid Translation)

Figure 4-24. Frameshift Mutations
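The frameshift in figure 4-24 can be reproduced with a few lines of Python. The codon dictionary below contains only the eight codons needed for this example, taken from the standard genetic code, and the function name is hypothetical; it is an illustration of the reading-frame effect, not part of the prototype.

```python
# Subset of the standard genetic code covering the figure 4-24 example.
CODONS = {
    "GGT": "GLY", "CAA": "GLN", "CGT": "ARG", "GAA": "GLU", "CCT": "PRO",
    "GTC": "VAL", "AAC": "ASN", "GTG": "VAL",
}


def translate(dna, start=0):
    """Read complete triplets beginning at offset `start`."""
    codons = [dna[i:i + 3] for i in range(start, len(dna) - 2, 3)]
    return [CODONS.get(codon, "???") for codon in codons]


sequence = "GGTCAACGTGAACCT"
print(translate(sequence, start=0))  # correct reading frame
print(translate(sequence, start=1))  # frame shifted by one nucleotide
```

Shifting the start by a single base changes every amino acid in the output, which is the property that padding at the front of a coded message is intended to exploit.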

The transcription (decryption) of DNA uses these sequences as markers for process control. However, the sequences can have multiple interpretations. ATG within a gene codes for the amino acid methionine; at the start of a gene, it is a start codon. Not all instances of TATA signify a promoter. These ambiguities provide DNA with its own version of adding diffusion and confusion, and the analyst must fully understand the rules and mechanisms of transcription. In fact, research in gene expression starts with unambiguously identifying the actual gene sequence that codes for proteins (in eukaryotes this is called the exon region) from intervening sequences that are untranslated regions that do not code for proteins (intron regions), as shown in figure 4-25 for the human gene hspB9, which codes for heat shock protein B9 (Ensembl ENSG00000197723). Referring back to Figure 4-24, transcription from a different start site would yield a different outcome, one that is possibly fatal to the organism. Padding creates introns spread throughout the message (exon).

Figure 4-25. Confusion factors in actual DNA genome

The same confusion and diffusion factors would apply when constructing DNA coded messages for the electronic domain that will be later instantiated into actual genomes.

The ciphertext must be capable of meeting the requirements of cryptographic hardness in the electronic domain while producing a ciphertext that can be reliably integrated into a cellular genome via standard techniques, transcribed into RNA, and translated into the appropriate cipherprotein. Decryption (expression) of the cipherprotein gene occurs in response to specific decryption instructions hidden within the electronic domain ciphertext.

4.2.4 Relationship Between Cryptography and Gene Expression

The following relationships can be observed between the cryptographic treatment of messages and control of gene expression. In the case of gene expression, the message is genomic (DNA or RNA sequence).

• Cryptography transforms messages between two states: plain and encrypted.

• Cryptography uses operations such as circular shifts, bit expansions, bit padding, and arithmetic operations to create ciphertext. These operations have analogs in molecular biology, e.g., transposable elements.

• Cells transform DNA sequences in genes between two states: Expressed (decrypted) and Silent (encrypted).

• In prokaryotes, a simple system involving operators and repressors can be described in terms of encryption and decryption, but prokaryotes have fewer mechanisms available for a rich set of cryptographic protocols. Figure 4-26 provides an example from Escherichia coli using lacZ gene expression.

In this prokaryotic example from E. coli, the lacZ gene expresses the β-galactosidase enzyme when lactose is present and the simple sugar glucose is absent. β-galactosidase metabolizes lactose into glucose and galactose. It would be inefficient to express the enzyme above a trace level if glucose is present. Figure 4-26 provides a cryptographic analogy to the states of the lacZ gene under the various conditions of glucose and lactose present, lactose present, and lactose absent. The lacZ gene is encrypted when lactose is absent or both lactose and glucose are present. A repressor protein (rep) authenticates (binds) to the encryption site (lacZ operator) on the lacZ gene when lactose is absent. A catabolite activator protein (CAP) authenticates (binds) to the decryption site (CAP site), allowing RNA polymerase to decrypt (express) the lacZ gene when glucose is absent. All of these operations are shown in Figure 4-26 as analogies to elements of cryptographic message traffic. It is possible to write the description of the gene expression sequence in Figure 4-26 in terms of a series of messages between a sender and receiver.
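The lacZ states described above reduce to a small truth table, sketched here in Python. This is a deliberately simplified model of the analogy: it treats repressor and CAP binding as pure Boolean functions of lactose and glucose and ignores trace-level expression. The function name and the state labels are hypothetical.

```python
def lacz_state(lactose_present, glucose_present):
    """Return the cryptographic-analogy state of the lacZ gene."""
    repressor_bound = not lactose_present  # rep binds the operator without lactose
    cap_bound = not glucose_present        # CAP binds the CAP site without glucose
    if cap_bound and not repressor_bound:
        return "decrypted (expressed)"
    return "encrypted (silent)"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose} glucose={glucose} -> {lacz_state(lactose, glucose)}")
```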

Figure 4-26. Conceptual example of Confidentiality and Authentication in E. coli using lacZ expression

Figure 4-27 shows the architecture of the DNA HMAC (without all the required control regions) described in detail in this paper and compares it with the gene transcriptional control structures of a typical mammalian gene and of a simple, yet important eukaryote, yeast (S. cerevisiae). The DNA HMAC structure preserves the intent of the design to mimic a genomic transcriptional control structure.

A successful, in vivo instantiation of a DNA HMAC system will require specific stop codons, start codons, promoter, and enhancer sequences. An in vivo DNA encryption system should be multi-dimensional, utilize primary, secondary, and tertiary structural information and include up/downstream regulators such that a single sequence can be seamlessly implemented at the genomic level and have multiple levels of encryption at the message or data level, depending upon the context (only known between sender and receiver). This approach also permits generation of mutant hash codes, which can be evaluated for fitness such that only the best hash code is selected for authentication purposes.


Figure 4-27. Simplified comparison between gene transcription control regions and MAC protocol


4.2.4.1 Epigenetic relationships between cryptography and gene expression.

Epigenetics involves heritable control of gene expression that does not involve modifications of the underlying DNA sequence [43]. Examples of epigenetic effects include DNA methylation of cytosine residues and control of gene expression via the higher order structures of DNA. In eukaryotes, DNA is packed into a hierarchy of structures: nucleosomes → chromatin → chromosomes. Chromatin states can also be utilized as a form of encryption and decryption by exposing or not exposing genes for transcription. Examples include:

• Heterochromatin form (encrypted) and Euchromatin form (decrypted).

• Transcriptional memory via modification of chromatin states [44].

• Histone Code. A complex series of regulatory activities, which include histone lysine acetylation by histone acetyl transferase – transcriptionally active chromatin (decrypted); Histone lysine deacetylation by histone deacetylase – transcriptionally inactive (encrypted) [45], [46].

Expansion of the cryptographic protocols to include epigenetic operations will increase the richness of the protocols and the options for producing combinations of cipherproteins.

A cryptographic hash code based upon a DNA alphabet and a secure MANET authentication protocol has been presented. These codes can be utilized at the network level or application level and can be implemented directly into genomes of choice to provide a new level of ciphertext communication at the genomic and proteomic level.

The DNA inspired cryptographic coding approach is an option in developing true MANET architectures and developing novel forms of biological authentication to augment those architectures.

4.3 Genomics and Proteomics Protocol Development

4.3.1 Adaptive Self-Correcting Floating Point Source Coding Methodology for a Genomic Encryption Protocol

One of the foundations of the protocol is the floating point encryption methodology.

For a one-way encryption scheme, sender and receiver compute the same hash code and there is no need to perform a reverse computation. A complete protocol that performs two-way encryption requires the ability to perform the inverse computation for decryption purposes. The precision of floating point computations becomes an issue depending upon how the variables are cast and the available precision. An effort was undertaken to create a new source coding methodology that would reliably allow two-way encryption and decryption using the same basic elements of the existing schemes. This effort yielded the adaptive, self-correcting, floating point source coding methodology.

4.3.1.1 Overview of the source coding methodology

This section addresses the problem of creating an adaptive source coding algorithm for a genomic encryption protocol using a small alphabet such as the nucleotide bases represented in the genetic code. For codewords derived from an alphabet of N plaintext symbols with probability of occurrence, p, a mapping into a floating point representation of the codewords, which are then translated into genomic codewords, is derived from a novel modification of the Shannon-Fano-Elias coding process. Errors in the reverse decoding process are processed through an adaptive, self-correcting codebook to determine the best fit codeword decoding solution. A genetic algorithmic approach to error correction within the source coding is also summarized.

4.3.1.2 High-level description of the transmitter source coding process

Consider a memoryless source generating letters from an alphabet A1 = {a1, a2,…, an} with a source taken from a probability mass function P = {p1, p2,…, pn}. Let the source generate a message X such that X = x1x2…xi ∈ A1 ∀ i, where i represents the word order of the message. The message X is serialized and subdivided into character blocks of size r, and r-sized blocks are arranged into k-sized word blocks in a set L as shown in Figure 4-28. The words are lexicographically coded with the Huffman decimal code for the first letter as the leading term, followed by the Huffman decimal codes for the remaining letters. Clearly, if the character blocks were long enough, precision and accuracy of subsequent floating point computations would be a concern. Therefore, the character block size is made adaptable to the floating point capabilities of the transmitting and receiving system.


words: x1, x2, x3, x4, ..., xi → (r x k x n)

block row 1: (a1,a2,...,ar), (a1,a2,...,ar), ..., (a1,a2,...,ar)
block row 2: (a1,a2,...,ar), (a1,a2,...,ar), ..., (a1,a2,...,ar)
...
block row n: (a1,a2,...,ar), (a1,a2,...,ar), ..., (a1,a2,...,ar)
block columns: 1, 2, ..., k

Figure 4-28. Organization of words for the source coding protocol.

Words are divided into equal blocks r characters long. The new blocks are coded in groups of k blocks at a time resulting in r x k x n organization to begin the source coding process.
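The r x k x n organization can be sketched in Python as follows. Two details are assumptions of this sketch because the text does not specify them: serialization simply drops whitespace, and the final character block is padded with ‘0’ to reach length r. The last row is left short rather than padded to k blocks. The function name is hypothetical.

```python
def organize(message, r, k, pad="0"):
    """Arrange a message into rows of k character blocks, each r characters long."""
    serial = "".join(message.split())          # assumption: whitespace is dropped
    if len(serial) % r:
        serial += pad * (r - len(serial) % r)  # assumption: pad the final block
    blocks = [serial[i:i + r] for i in range(0, len(serial), r)]
    return [blocks[i:i + k] for i in range(0, len(blocks), k)]


for row in organize("jump out windows", r=4, k=2):
    print(row)
```

Making r a parameter mirrors the point made above that the character block size should adapt to the floating point capabilities of the transmitting and receiving systems.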

A pilot channel link between transmitter and receiver can be used to establish the optimal character block size based upon current channel state information. The source coding can be implemented in conjunction with a subsequent channel coding algorithm.

Let R = {R1, R2,…, Rn} = {(a1, a2,…,ar)1, (a1, a2,…,ar)2, …, (a1, a2,…,ar)n}; then equation 4.18:

\[
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & \cdots & R_{1k} \\
R_{1k} & R_{11} & R_{12} & \cdots & R_{1(k-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
R_{12} & R_{13} & \cdots & R_{1k} & R_{11}
\end{bmatrix}_{n}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}_{n}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}_{n}
\qquad (4.18)
\]

The series of linear equations is analogous to the previously described system in the keyed HMAC system in equation 4.14, where the operations are performed upon blocks of plain text.

And Qn is defined as equation 4.19:

\[
Q_n =
\begin{bmatrix}
q_1 & q_2 & \cdots & q_k \\
q_k & q_1 & \cdots & q_{k-1} \\
\vdots & \vdots & & \vdots \\
q_2 & q_3 & \cdots & q_1
\end{bmatrix}_{n}
\qquad (4.19)
\]

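This is transformed into a matrix of cofactors by equation 4.20:

\[
Q_n^{-1} = \frac{1}{\operatorname{Det} Q_n^{T}}
\begin{bmatrix}
C_1 & C_2 & \cdots & C_k \\
C_k & C_1 & \cdots & C_{k-1} \\
\vdots & \vdots & & \vdots \\
C_2 & C_3 & \cdots & C_1
\end{bmatrix}_{n}
\qquad (4.20)
\]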

Let C = {C11, C12, …., Cnk} which code the entire set of the original words in X.
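The role of the inverse computation, and the precision concern raised earlier, can be illustrated with a small round trip through a circulant system of the form in equation 4.18. The sketch below is an assumption-laden illustration rather than the protocol itself: it uses the three lexicographic values from table 4-6 as the first row, an arbitrary hypothetical vector q, and exact rational arithmetic in place of floating point. It solves the system by Gauss-Jordan elimination rather than by the cofactor form of equation 4.20, and which unknown the receiver actually solves for is not fixed by this sketch.

```python
from fractions import Fraction


def circulant(first_row):
    """Build a circulant matrix: each row is the previous row rotated right."""
    k = len(first_row)
    return [[first_row[(j - i) % k] for j in range(k)] for i in range(k)]


def mat_vec(matrix, vector):
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination.

    Assumes the matrix is square and nonsingular.
    """
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Lexicographic values for 'jump', 'out', 'windows' from table 4-6.
R = [Fraction("10.211316"), Fraction("15.2120"), Fraction("23.9144152319")]
q = [Fraction(3), Fraction(1), Fraction(4)]  # hypothetical values

y = mat_vec(circulant(R), q)
print(solve(circulant(R), y) == q)
```

With exact rationals the round trip recovers q without error; repeating it with floats is one way to observe the precision issue that motivates an adaptable block size.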

Treating C as a set of symbols from an alphabet of base 10 characters, sign characters and decimal point, A2 = {.,+,-,0,1,2,3,4,5,6,7,8,9} the entropy in bits of code C can be derived from the standard definition in equation 4.21

\[
H(C) = -\sum_{i=1}^{13} p_i \log_2 p_i
\tag{4.21}
\]
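As a worked illustration of equation 4.21, the following Python sketch estimates the entropy of a coefficient string from its symbol frequencies over the 13-symbol alphabet A2. The function name and the sample string are illustrative assumptions.

```python
import math
from collections import Counter

ALPHABET_A2 = ".+-0123456789"  # the 13 symbols of A2


def entropy_bits(symbols):
    """Empirical entropy, in bits per symbol, of a string over A2."""
    if any(s not in ALPHABET_A2 for s in symbols):
        raise ValueError("symbol outside alphabet A2")
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


sample = "765.829296"  # a coefficient string in the style of table 4-17
print(entropy_bits(sample))
print(math.log2(len(ALPHABET_A2)))  # upper bound for a uniform distribution
```

The entropy can never exceed log2(13) bits per symbol, which is reached only when all 13 symbols are equally likely.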

Every unique plaintext message will have a unique distribution of symbols for each set C. The entropy, H, represents the lower bound on the average symbol length. However, the goal of coding set C is not minimum symbol length but a prefix-free code with symbol error correction capability at the decoder codebook. Therefore, a modification to the Shannon-Fano-Elias source coding algorithm has been developed for this purpose. Following Shannon-Fano-Elias, assume p(x) > 0 for all x; the cumulative distribution function, F(x), is shown in equation 4.22:

\[
F(x) = \sum_{a \le x} p(a)
\tag{4.22}
\]

Shannon-Fano-Elias modifies the CDF [47] as in equation 4.23:

\[
\bar{F}(x) = F(x) - \frac{1}{2}\, p(x)
\tag{4.23}
\]

We define instead equation 4.24:

1 F(x)  F(x)  p(x) 2 (4.24)

and the binary code length, ℓ(x), remains as in equation 4.25:

\[
\ell(x) = \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1
\tag{4.25}
\]

where the brackets indicate rounding up to the next higher integer. Equation 4.26:

 1   (x)    V (x) (4.26) 1 F(x) 

where V(x) is an offset value that shifts the decimal value of  into a desired range between adjacent values of F(x). The codeword is as shown in equation 4.27:

\[
J(x) = \operatorname{binary}(v(x))\,\big|_{\ell(x)}
\tag{4.27}
\]

J(x) is the binary codeword truncated to ℓ(x) bits. Table 4-12 illustrates an example.

The codes of length six could have been coded to be length five, but there were programming and error correction advantages to lengthening the code by one bit.


Table 4-12. Modified Shannon-Fano-Elias Coding

The expected length of this code is less than H(x) + 2, as in Shannon-Fano-Elias. This construction produces a prefix-free code, which, as expected, satisfies the Kraft inequality in equation 4.28:

\[
\sum_{i} D^{-\ell_i} \le 1
\tag{4.28}
\]
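For reference, the unmodified textbook Shannon-Fano-Elias construction can be sketched as follows. This is not the modified scheme developed above (the offset V(x) is not modeled); it only illustrates the code length of equation 4.25 and the Kraft check of equation 4.28 for D = 2. The symbol probabilities are made-up dyadic values chosen so the arithmetic is exact, and all function names are illustrative assumptions.

```python
import math


def sfe_code(probs):
    """Textbook Shannon-Fano-Elias code; probs maps symbol -> probability > 0."""
    code, cumulative = {}, 0.0
    for sym, p in probs.items():
        midpoint = cumulative + p / 2               # mass below x plus half of p(x)
        length = math.ceil(math.log2(1 / p)) + 1    # code length, equation 4.25
        bits, frac = "", midpoint
        for _ in range(length):                     # binary expansion, truncated
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        code[sym] = bits
        cumulative += p
    return code


def kraft_sum(code):
    """Left-hand side of the Kraft inequality for a binary code (D = 2)."""
    return sum(2.0 ** -len(word) for word in code.values())


def is_prefix_free(code):
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


# Illustrative (assumed) probabilities for the four main bases.
probs = {"A": 0.25, "T": 0.5, "C": 0.125, "G": 0.125}
code = sfe_code(probs)
print(code, kraft_sum(code), is_prefix_free(code))
```

For these probabilities the Kraft sum is 0.5, which shows the slack that the extra bit per codeword leaves available.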

The next step is to concatenate the binary code words for each k-sized block of codewords. Each k-sized block may be preceded by a prefix-free preamble code that is not a member of the codebook. The resulting series is labeled XT, where the subscript T refers to the transmitter, as in equation 4.29:

\[
X_T = C_{11}\, C_{12} \ldots C_{nk}
\tag{4.29}
\]

Two additional sequences are brought into the scheme: KT and PT. KT is a binary sequence representing a unique symmetric encryption key. Ostensibly, for this application it is the binary translation of a gene sequence from a genomic alphabet as described in the introduction, but it could be any user-specified binary sequence satisfying the requirements of a symmetric encryption key. PT is a binary sequence representing a message authentication code that is a pre-shared secret between transmitter and receiver. For this application, it is the binary translation of a gene sequence from a genomic alphabet, but it could also be any user-specified binary sequence satisfying the requirements of a keyed message authentication code. The final four steps are as follows in sequence 4.30:

\[
\begin{aligned}
F_T &= X_T \oplus K_T \\
G_T &= F_T \oplus P_T \\
M_T &= G_T \,\|\, P_T \\
M_T &\rightarrow M(\mathrm{DNA})_T
\end{aligned}
\tag{4.30}
\]

Where the DNA alphabet can consist of symbols from a genomic alphabet such as shown in the set in equation 4.31

AD={A,T,C,G,MeC,H,X} (4.31)

One possible coding scheme for this alphabet using the previously described procedure is shown in table 4-13.


Table 4-13. DNA Base Source Coding

A, G, C, and T represent the four main nucleotide bases adenine, guanine, cytosine, and thymine. MeC represents 5-methylcytosine, an important epigenetic marker, H represents hypoxanthine, and X represents xanthine. H and X are mutagenic deaminations of DNA bases that occasionally occur in gene sequences. MeC operates as an epigenetic marker by altering the pattern of gene expression without changing the basic sequence. Subsequent encryption steps can utilize all of the bases represented in this alphabet for creating different types of encrypted codes. The entropy of the DNA bases in a genomic sequence is also a source of potential encryption coding by skewing the code lengths of DNA based source code sequences. Certain genomes have A-T or G-C base pair contents that deviate significantly from a uniform distribution, as was previously discussed for the genome of M. genitalium G37. A genomic sequence with a high concentration of CpG (cytosine-phosphate-guanine) islands can be used to alter the source code sequence lengths for each base from what would be expected in a uniform distribution of the four main bases (A-T, C-G).

The resulting message is designated MT. MT contains the coded message contents and the required hash code necessary for the receiver to authenticate the transmitted message.


MT represents the basic, unencrypted message unit that is to be subjected to higher level encryption at the transmitter.

4.3.1.3 High-level description of the receiver source decoding process

The receiver receives the message and creates a bit stream labeled MR, which represents the receiver's best estimate of the transmitted message. MR is decrypted, and the receiver computes the pre-shared secret message authentication code PR and verifies that it matches PT. MR is then sent to the receiver source decoder. The description of the process resumes at this point.

The final four steps of the transmission source coding are reversed in decoding (the subscript R refers to processes at the receiver) in sequence 4.32:

\[
\begin{aligned}
M(\mathrm{DNA})_R &\rightarrow M_R \\
M_R &= G_R \,\|\, P_R \\
F_R &= G_R \oplus P_R \\
X_R &= F_R \oplus K_R
\end{aligned}
\tag{4.32}
\]
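The XOR and concatenation steps of sequences 4.30 and 4.32 can be sketched in Python as a round trip on byte strings. Several details here are assumptions made only for illustration: the key and authentication sequences are repeated cyclically when shorter than the data, the receiver is assumed to know the length of P in advance, and the final translation between binary and DNA text (table 4-13) is omitted. The function names are hypothetical.

```python
def xor_bytes(data, key):
    """XOR data with key, repeating the key cyclically (an assumption)."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))


def transmit(x_t, k_t, p_t):
    """Sequence 4.30 without the DNA translation step."""
    f_t = xor_bytes(x_t, k_t)      # F_T = X_T xor K_T
    g_t = xor_bytes(f_t, p_t)      # G_T = F_T xor P_T
    return g_t + p_t               # M_T = G_T || P_T


def receive(m_r, k_r, p_len):
    """Sequence 4.32 without the DNA translation step; p_len must be > 0."""
    g_r, p_r = m_r[:-p_len], m_r[-p_len:]   # M_R = G_R || P_R
    f_r = xor_bytes(g_r, p_r)               # F_R = G_R xor P_R
    x_r = xor_bytes(f_r, k_r)               # X_R = F_R xor K_R
    return x_r, p_r


x_t = b"HAMLET"                 # stand-in for the concatenated codewords X_T
k_t = bytes([0x5A, 0xC3])       # stand-in symmetric key K_T
p_t = bytes([0x0F, 0xF0, 0x3C]) # stand-in pre-shared sequence P_T

m_t = transmit(x_t, k_t, p_t)
x_r, p_r = receive(m_t, k_t, len(p_t))
print(x_r, p_r == p_t)
```

Because XOR is its own inverse, applying the same two sequences in reverse order at the receiver restores X exactly when the channel introduces no errors.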

Extending the previous definition at the transmitter gives equation 4.33:

\[
X_R = C_{11,R}\, C_{12,R} \ldots C_{nk,R}
\tag{4.33}
\]

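Using linear algebra, the cofactor matrix is assembled and the inverse yields the original lexicographic codes. Summarizing these steps yields equations 4.34 and 4.35:

\[
Q_{n,R} =
\begin{bmatrix}
C_1 & C_2 & \cdots & C_k \\
C_k & C_1 & \cdots & C_{k-1} \\
\vdots & \vdots & \ddots & \vdots \\
C_2 & C_3 & \cdots & C_1
\end{bmatrix}_{n}^{-1}
\tag{4.34}
\]

\[
R_n =
\begin{bmatrix}
R_{11} & R_{12} & \cdots & R_{1k} \\
R_{1k} & R_{11} & R_{12} & R_{1(k-1)} \\
\vdots & \vdots & \ddots & \vdots \\
R_{12} & R_{13} & \cdots & R_{11}
\end{bmatrix}_{n}
\tag{4.35}
\]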

The R coefficients then map back to the original words in set X: R = {R1, R2, …, Rn} → {(a1, a2, …, ar)1, (a1, a2, …, ar)2, …, (a1, a2, …, ar)n} → X = x1x2…xi, with xi ∈ A for all i.

4.3.1.4 Example of floating point source coding

An example is taken from the opening line of the soliloquy in Shakespeare’s Hamlet: ‘HAMLET To be or not to be that is the question Whether’. We compare the effects of dividing the plaintext phrase into 3 word blocks, 6 characters per block, versus 3 word blocks and 4 characters per block. The lexicographic codes and the uncorrected plaintext recovery are shown in table 4-14. Computations were performed on a 32-bit HP Pavilion dv4 PC under Windows 7 using Microsoft Visual Basic 2010 and Microsoft Excel 2007. The remaining errors in the recovered 6-character block codes are easily corrected at the source codebook level.


Table 4-14. Comparison of 6-character block coding and 4-character block coding recovery

The forward and reverse computations introduce floating point rounding errors. However, a genetic algorithm approach will be used to determine the fitness of candidate decoding words. This increases the probability of recovering error-free codes during source decoding. The tradeoff is the increase in the intrinsic error rate as the character block is lengthened. In the case of 6 characters coded per block at the source, 12 coefficients are transmitted and the pre-corrected recovered text was: “hamlet to be or n77t to be that is the question 76hether”. In the case of 4 characters coded per block at the source, 15 coefficients are transmitted and the pre-corrected recovered text was: “hamlet to be or not to be that is the question whether”. The longer the block at the source, the fewer coefficients are needed for transmission, at the cost of greater error correction at the receiver. For 8 characters per block at the source, only 9 coefficients are required to code the block, but the number of error positions in the message requiring correction at the receiver increased to seven. The output of this step is a sequence of DNA letter codes. They are not yet a ciphergene, because they lack the regulatory structure that will be imposed during subsequent encryption steps. The source coding process exists as a standalone modular block. It can be utilized with or without the DNA text, and with or without the floating point coding. If the subsequent process is the genomic and proteomic encryption process, the DNA text is passed to the next level of processing.

4.3.1.4.1 Analysis of source coding data

Data on the performance of transmitted and received data is shown below. One problem in analyzing the data was the performance of various software products with respect to rounding and truncation of floating point data. No cross-platform problems with rounding and truncation were observed within the Windows operating system family (tests were conducted using HP and Dell laptops running Windows XP, Windows Vista, and Windows 7) using Visual Basic 2003, 2010, and 2012. However, moving data between Notepad, Excel, and Matlab proved problematic due to truncation and rounding effects. Data often had to be moved as character strings to prevent unexpected truncation and rounding effects. Figure 4-29 provides an opening screenshot of the Windows Form application running under Visual Basic 2012 currently used for implementing the source coding function. Figure 4-30 provides the same view for the decoding function.


The source coding and decoding process is broken down into 3 programmable steps:

• Step 1 reads in the plaintext and makes the lexicographic assignments to the blocks of plaintext.

• Step 2 performs the linear algebra and derives the cofactor matrix solution.

• Step 3 performs the encryption and hash XOR functions and outputs the data in DNA text and binary.

The decoding process performs the reverse functions.

The plaintext in the software screenshots shown in figures 4-29 and 4-30 is:

PLEASE JOIN YOUR FELLOW ALUMNI AND FRIENDS FOR A COCKTAILS AND HORS D OEUVRES RECEPTION IN HONOR OF THE NEW DEAN OF SEAS DAVID S. DOLLING PHD DOLLING COMES TO GW FROM THE

Tables 4-15 through 4-17 display the floating point output of the M. genitalium encrypted text for character block lengths of 8, 6, and 4 characters, respectively. The protocol was tested out to 24 characters per block, but above 8 characters per block the programming approach runs out of precision; alternate approaches would allow longer blocks at the cost of a computational penalty. Table 4-18 displays the recovered plaintext before any post-processing, stripping of padding characters, etc.
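The precision limit can be illustrated with a short Python sketch. It assumes the block coding format that can be read off tables 4-16, 4-17 and 4-19: the code value of the first character forms the integer part and the code values of the remaining characters are concatenated as the fractional digits (for example, PLEA gives 765.829296). The decoding rule relies on an observation about table 4-19, namely that three-digit code values start with 7 and two-digit values start with 8 or 9. Only a subset of the codebook is included, and the function names are hypothetical; the original software was written in Visual Basic, not Python.

```python
from decimal import Decimal

# Subset of table 4-19: character -> decimal code value (as digit strings).
CODES = {"P": "765", "L": "82", "E": "92", "A": "96",
         "S": "762", " ": "88", "J": "84"}
BY_CODE = {v: k for k, v in CODES.items()}
# Observed in table 4-19: leading digit 7 -> 3 digits, 8 or 9 -> 2 digits.
LENGTH = {"7": 3, "8": 2, "9": 2}


def encode_block(block):
    """First code is the integer part; remaining codes form the fraction."""
    return CODES[block[0]] + "." + "".join(CODES[c] for c in block[1:])


def decode_block(text):
    """Parse the digit string back into characters; '?' marks a failure."""
    head, _, tail = text.partition(".")
    chars = [BY_CODE.get(head, "?")]
    i = 0
    while i < len(tail):
        n = LENGTH.get(tail[i], 1)
        chars.append(BY_CODE.get(tail[i:i + n], "?"))
        i += n
    return "".join(chars)


def round_trip(block, as_float):
    """Store the coded block as a binary double or as an exact Decimal."""
    coded = encode_block(block)
    value = float(coded) if as_float else Decimal(coded)
    return decode_block(str(value))


print(round_trip("PLEA", as_float=True))       # 9 significant digits
print(round_trip("PLEASE J", as_float=True))   # 18 significant digits
print(round_trip("PLEASE J", as_float=False))  # exact decimal arithmetic
```

A binary double carries at most 17 significant decimal digits, so an 8-character block that needs 18 digits cannot survive the conversion, whereas an exact decimal type preserves it at a higher computational cost.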

Table 4-15. Floating Point coding of sample text at 8 characters/block

| Block | 8 char blocks, sourced | 8 char blocks, recovered |
|---|---|---|
| 1 | 765.829296762929 | 765.82929676292900002765406765 |
| 2 | 766.857678875477 | 766.85767887547700002769495677 |
| 3 | 88.9192828276676 | 88.91928282766760000323287642 |
| 4 | 96.8275881767859 | 96.82758817678590000242335975 |
| 5 | 767.938891763859 | 767.93889176385900001911641755 |
| 6 | 93.7628891766764 | 93.76288917667640000230273200 |
| 7 | 88.947669483762 | 88.94766948376200000204428725 |
| 8 | 82.7628896767939 | 82.76288967679390000189648169 |
| 9 | 766.763762889389 | 766.76376288938900001762226386 |
| 10 | 758.757763927629 | 758.75776392762900001996963390 |
| 11 | 94.9276576185767 | 94.92765761857670000250044495 |
| 12 | 85.7678886766768 | 85.76788867667680000222206842 |
| 13 | 88.7669188761869 | 88.76691887618690000103785809 |
| 14 | 767.92756889393 | 767.92756889393000000916435561 |
| 15 | 88.766918876293 | 88.76691887629300000107288136 |
| 16 | 88.9396757859389 | 88.93967578593890000000012668 |
| 17 | 97.8893766828286 | 97.88937668282860000000013987 |
| 18 | 87.8876586938894 | 87.88765869388940000000012670 |
| 19 | 82.8285767878895 | 82.82857678788949999917592397 |
| 20 | 81.9276288761767 | 81.92762887617669999917972627 |
| 21 | 756.889176376682 | 756.88917637668199999222417505 |
| 22 | 86.928888889697 | 86.92888888969699999999998626 |
| 23 | 96.969696969697 | 96.96969696969699999999998306 |
| 24 | 96.969696969697 | 96.96969696969699999999998306 |

Table 4-16. Floating Point coding of sample text at 6 characters/block

| Block | 6 char blocks, sourced | 6 char blocks, recovered |
|---|---|---|
| 1 | 765.82929676292 | 765.82929676292000002827688665 |
| 2 | 88.847668576788 | 88.84766857678800000323143931 |
| 3 | 754.766758763889 | 754.76675876388900002787570450 |
| 4 | 92.828276675688 | 92.82827667568799999999999978 |
| 5 | 96.827588176785 | 96.82758817678499999999999965 |
| 6 | 88.96767938891 | 88.96767938891000000000000041 |
| 7 | 763.859276793762 | 763.85927679376199999620131666 |
| 8 | 88.917667638896 | 88.91766763889599999958899152 |
| 9 | 88.947669483761 | 88.94766948376099999957332154 |
| 10 | 96.85827628896 | 96.85827628895999999975750610 |
| 11 | 767.938886766763 | 767.93888676676299999833615221 |
| 12 | 762.88938876692 | 762.88938876691999999836024356 |
| 13 | 758.757763927629 | 758.75776392762899999433936220 |
| 14 | 763.929492765761 | 763.92949276576099999427176577 |
| 15 | 85.7667678885767 | 85.76676788857669999935046916 |
| 16 | 88.8676676776676 | 88.86766767766760000000000553 |
| 17 | 88.766918876186 | 88.76691887618600000000000561 |
| 18 | 92.887679275688 | 92.88767927568800000000000570 |
| 19 | 93.929676788766 | 93.92967678876599999999997418 |
| 20 | 91.887629296762 | 91.88762929676199999999997434 |
| 21 | 88.93967578593 | 88.93967578592999999999997572 |
| 22 | 88.762978893766 | 88.76297889376599999900219402 |
| 23 | 82.82857678788 | 82.82857678787999999905599502 |
| 24 | 765.86938893766 | 765.86938893765999999127230541 |
| 25 | 82.82857678788 | 82.82857678787999999785667664 |
| 26 | 94.766819276288 | 94.76681927628799999759101022 |
| 27 | 761.766888775688 | 761.76688877568799998060483517 |
| 28 | 91.7637668188761 | 91.76376681887610000000002198 |
| 29 | 86.9288888896 | 86.92888888960000000000002217 |
| 30 | 96.9696969696 | 96.96969696960000000000002453 |

Table 4-17. Floating Point coding of sample text at 4 characters/block

| Block | 4 char blocks, sourced | 4 char blocks, recovered |
|---|---|---|
| 1 | 765.829296 | 765.82929600000000000000045906 |
| 2 | 762.928884 | 762.92888400000000000000045634 |
| 3 | 766.8576788 | 766.85767880000000000000045645 |
| 4 | 754.766758763 | 754.76675876299999999484047383 |
| 5 | 88.919282 | 88.91928199999999999936265478 |
| 6 | 82.76675688 | 82.76675687999999999940036744 |
| 7 | 96.8275881 | 96.82758810000000000356228725 |
| 8 | 767.858896 | 767.85889600000000002865045478 |
| 9 | 767.938891 | 767.93889100000000002861349728 |
| 10 | 763.8592767 | 763.85927670000000000888460097 |
| 11 | 93.7628891 | 93.76288910000000000106422870 |
| 12 | 766.7638896 | 766.76388960000000000888134748 |
| 13 | 88.9476694 | 88.94766940000000000000004455 |
| 14 | 83.7619685 | 83.76196850000000000000004096 |
| 15 | 82.7628896 | 82.76288960000000000000004066 |
| 16 | 767.938886 | 767.93888599999999999186894252 |
| 17 | 766.76376288 | 766.76376287999999999188416913 |
| 18 | 93.8876692 | 93.88766919999999999907059469 |
| 19 | 758.75776392 | 758.75776391999999996673654280 |
| 20 | 762.8876392 | 762.88763919999999996655244337 |
| 21 | 94.92765761 | 94.92765760999999999585853596 |
| 22 | 85.76676788 | 85.76676788000000000384083798 |
| 23 | 85.7678886 | 85.76788860000000000384552586 |
| 24 | 766.767766763 | 766.76776676300000003443726483 |
| 25 | 88.7669188 | 88.76691879999999999977206205 |
| 26 | 761.869288 | 761.86928799999999999786675121 |
| 27 | 767.9275688 | 767.92756879999999999789371882 |
| 28 | 93.9296767 | 93.92967670000000000029824950 |
| 29 | 88.7669188 | 88.76691880000000000022074458 |
| 30 | 762.9296762 | 762.92967620000000000218216535 |
| 31 | 88.9396757 | 88.93967569999999999999984349 |
| 32 | 85.9388762 | 85.93887619999999999999985161 |
| 33 | 97.8893766 | 97.88937659999999999999982739 |
| 34 | 82.8285767 | 82.82857670000000000000002247 |
| 35 | 87.8876586 | 87.88765860000000000000002636 |
| 36 | 93.8893766 | 93.88937660000000000000002929 |
| 37 | 82.8285767 | 82.82857669999999999999995884 |
| 38 | 87.8894766 | 87.88947659999999999999995694 |
| 39 | 81.9276288 | 81.92762879999999999999995930 |
| 40 | 761.7668887 | 761.76688869999999999999058515 |
| 41 | 756.8891763 | 756.88917629999999999999065205 |
| 42 | 766.8188761 | 766.81887609999999999999051250 |
| 43 | 86.928888 | 86.92888799999999999999987229 |
| 44 | 88.969696 | 88.96969599999999999999986754 |
| 45 | 96.969696 | 96.96969599999999999999985601 |


Table 4-18. Recovered plaintext for 4, 6, 8, 12, and 24 character blocks

4 char blocks 6 char blocks 8 char blocks 12 char blocks 24 char blocks recovered recovered recovered recovered recovered p l e a p l e a s e p l e a s e 9 p l e a s e 9 p l e a s e 9 s e j j o i n o i n y 77 y o u r 9 a l u m n i 9 o i n y o u r 9 f e l l o 76 a l u m n i 9 c o c k s y o u r e l l o w a l u m n i 9 r i e n d r u v r e s 9 f e l a l u m n i n d f r i 9 c o c k s o f t h 9 l o w a n d f d s f o q n d h o q d a v i d 89 a l u m r i e n d s c o c k s u v r e s 9 l l i n g b n i a f o r a l s a n d 9 i o n i 768 h e a . n d f c o c k t o r s d 89 o f t h 9 a a a a a a . r i e n a i l s a u v r e s 9 d e a n o 9 0 d s f n d h o r c e p t i n d a v i d 89 o r a s d o e i n h o 768 l l i n g 77 c o c u v r e s 9 o f t h 9 l l i n g b k t a i r e c e p t n e w d d t o g w 89 l s a i o n i n o f s d h e a . n d h h o n o 76 d a v i d 89 0 o r s o f t h . d o l l h d o e e n e w g p h d c u v r e d e a n o l l i n g b s r e f s e a s m e s t n c e p t d a v i d w f r o l i o n s . d o h e a . i n h l l i n g a a a a a a . o n o r p h d d o a a a a a a . o f l l i n g 0 t h e c o m e s n e w t o g w d e a n f r o m t o f h e a s e a s a a a a a a d a v 0 i d s . d o l l i n g p h d d o l l i n g c o m e s t o g w f r o m t h e a a a a a a a 0


4.3.1.5 Genetic Algorithm (GA) for Source code error correction

Errors may occur at any position within a coefficient. Assume that an error is received in the DNA code and is propagated into the binary code decoded at the receiver and subsequently into the set of cofactors {Cn1-R, Cn2-R,…,Cnk-R}. The remaining source of error is the inversion of the cofactor matrix. The receiver does not know the precision of the original cofactors; therefore, arbitrary truncation will produce uneven results. Let b = codeword Cnk-R and let a represent a candidate codeword for b. A genetic algorithm approach can be used to determine the fitness of a series of candidate codewords derived from the recovered codeword. The codeword with the highest fitness score is the best estimate of the recovered text. The highest fitness is derived by ordering a series of candidate codewords so as to minimize the distance, d, between the candidate codeword and the received codeword. Thus d = |b – a| → 0 is the criterion for optimal candidate codeword selection, with the most fit codeword possessing a zero distance between candidate and received codeword. There exists a fitness threshold such that codewords with values beneath the threshold value are excluded from consideration. The code is prefix-free; therefore, candidate codewords can be generated from recovered codewords before the entire code word is received.

4.3.1.5.1 Simulation over BPSK channel

To explore the performance of the code, a BPSK channel simulation was performed using the MATLAB model shown in figure 4-31. Codewords from the source coding protocol are transmitted over an AWGN channel and then analyzed for error correction capability at the receiver decoder.

Figure 4-31. Baseband BPSK model for analyzing source coding error correction characteristics.

4.3.1.5.1.1 A short description of the MATLAB model

The model inputs codewords from the codeword set, converts them from integers to binary bits from the set (0,1), and transmits them through the MATLAB BPSK baseband modulator function. The BPSK modulator outputs a complex-valued symbol stream from the set (1+0i, -1+0i) to be processed through the MATLAB Additive White Gaussian Noise (AWGN) channel. The AWGN channel adds complex-valued Gaussian noise to the signal. The signal plus channel noise is demodulated through the BPSK demodulator function and converted from bits to integers. The model was run with Eb/N0 in the AWGN channel at 10 dB, 8 dB, 6 dB, 4 dB, and 2 dB. The codewords are selected from the codeword set used for all of the source coding, which is shown in table 4-19.
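The signal chain described above can be approximated in plain Python. This sketch is not the MATLAB model: it keeps only the real part of the BPSK symbols (the imaginary noise component does not affect a BPSK decision), assumes unit energy per bit, and uses Python's own random number generator, so its error counts will not reproduce the MATLAB runs. The function names are hypothetical.

```python
import math
import random

BITS = 32  # symbol size used for each codeword


def to_bits(value):
    """Integer -> list of BITS bits, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(BITS))]


def from_bits(bits):
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out


def bpsk_awgn(codewords, ebn0_db, seed):
    """Send integer codewords through baseband BPSK over an AWGN channel."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))   # noise std dev per real dimension, Eb = 1
    received = []
    for cw in codewords:
        rx_bits = []
        for bit in to_bits(cw):
            symbol = 1.0 if bit == 0 else -1.0          # 0 -> +1, 1 -> -1
            noisy = symbol + rng.gauss(0.0, sigma)       # add channel noise
            rx_bits.append(0 if noisy >= 0 else 1)       # hard decision
        received.append(from_bits(rx_bits))
    return received


sent = [94, 767, 6326] * 100
got = bpsk_awgn(sent, ebn0_db=6, seed=31)
errors = sum(s != g for s, g in zip(sent, got))
print(errors, "codeword errors out of", len(sent))
```

A codeword is counted as an error if any one of its 32 bits is flipped, which is why codeword error counts rise much faster than the bit error rate as Eb/N0 falls.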

Table 4-19. Source Codeword Set

| Char | Val | Codon | Name |
|---|---|---|---|
| . | 97 | GCA | ALANINE |
| A | 96 | GCT | ALANINE |
| 3 | 747 | GCC | ALANINE |
| R | 763 | CGT | ARGENINE |
| Z | 753 | CGG | ARGENINE |
| 1 | 751 | AGG | ARGENINE |
| N | 767 | AAT | ASPARAGINE |
| D | 93 | GAT | ASPARTICACID |
| 9 | 741 | GAC | ASPARTICACID |
| C | 94 | TGT | CYSTEINE |
| 6 | 744 | TGC | CYSTEINE |
| E | 92 | GAA | GLUTAMICACID |
| 5 | 745 | GAG | GLUTAMICACID |
| Q | 764 | CAA | GLUTAMINE |
| 8 | 742 | CAG | GLUTAMINE |
| G | 87 | GGT | GLYCINE |
| H | 86 | CAT | HISTIDINE |
| (space) | 88 | ATA | ISOLEUCINE |
| I | 85 | ATT | ISOLEUCINE |
| O | 766 | ATC | ISOLEUCINE |
| L | 82 | CTT | LEUCINE |
| B | 95 | AAG | LYSINE |
| K | 83 | AAA | LYSINE |
| M | 81 | ATG | METHIONINESTART |
| F | 91 | TTT | PHENYLALANINE |
| P | 765 | CCT | PROLINE |
| U | 758 | CCC | PROLINE |
| S | 762 | TCT | SERINE |
| 2 | 748 | TCC | SERINE |
| 4 | 746 | TCA | SERINE |
| @ | 6326 | TCG | SERINE |
| (Char without matching code) | 88 | AGT | SERINE |
| J | 84 | TAA | STOP |
| X | 755 | TGA | STOP |
| 0 | 752 | TAG | STOP |
| T | 761 | ACT | THREONINE |
| W | 756 | TGG | TRYPTOPHAN |
| Y | 754 | TAT | TYROSINE |
| 7 | 743 | TAC | TYROSINE |
| V | 757 | GTT | VALINE |

Characters without a codeword will code into the codeword for a space, but their DNA codeword will be distinguishable from the DNA codeword for a space.

The set of codewords does not cover the complete set of all synonymous codons. It does cover the set of all amino acids such that there is at least one codeword for each amino acid. Figure 4-33 shows the theoretical BER characteristics for a BPSK channel through an AWGN channel.


The table was derived by running the MATLAB model for BER at Eb/N0 in 2dB increments from 2dB to 10dB.

4.3.1.5.1.2 Error correction capability through genetic algorithm selection of most fit codewords

A dataset consisting of 12002 random selections from the table 4-19 codebook was constructed. These codewords were transmitted through the AWGN channel at Eb/N0 values from 10dB to 2dB in 2 dB increments. To reduce the chances of correlation between runs, the random seed for the AWGN function was changed for each run as shown in table 4-20. The symbol size is 32 bits/symbol.

Table 4-20. Random seeds for AWGN channel

| Eb/N0 (dB) | Seed |
|---|---|
| 10 | 97 |
| 8 | 11 |
| 6 | 31 |
| 4 | 73 |
| 2 | 59 |

The number of codeword errors at the receiver is shown in table 4-21.

Table 4-21. Codeword Error count

| Eb/N0 (dB) | Number of Codeword Errors in 12002 Transmitted codewords |
|---|---|
| 10 | 2 |
| 8 | 82 |
| 6 | 860 |
| 4 | 3975 |
| 2 | 8518 |
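As a cross-check on counts of this kind, the expected number of codeword errors can be estimated from the theoretical BPSK bit error rate. The sketch below assumes independent bit errors, hard-decision detection, and that any single bit error in a 32-bit symbol produces a codeword error; the function names are illustrative.

```python
import math


def bpsk_ber(ebn0_db):
    """Theoretical BPSK bit error rate over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def expected_codeword_errors(ebn0_db, n_codewords=12002, bits=32):
    """Expected number of codewords with at least one bit in error."""
    p = bpsk_ber(ebn0_db)
    return n_codewords * (1 - (1 - p) ** bits)


for ebn0_db in (10, 8, 6, 4, 2):
    print(ebn0_db, round(expected_codeword_errors(ebn0_db), 1))
```

Under these assumptions the estimates come out at the same order of magnitude as the simulated counts in table 4-21 at every Eb/N0 value.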

The codeword error rate for this channel is shown in figure 4-33.

For Eb/N0 = 10dB and 8dB, tables 4-22 and 4-23, respectively show the transmitted words and received error words and the position of the error word in the 12002 codewords.

Table 4-22. Eb/N0 = 10dB performance

| Transmitted Codeword | Received Codeword | Codeword index out of 12002 codewords transmitted | Codeword error count |
|---|---|---|---|
| 767 | 763 | 8634 | 1 |
| 94 | 536871006 | 9423 | 2 |

Table 4-23. Eb/N0 = 8dB performance

| Transmitted Codeword | Received Codeword | Codeword index out of 12002 codewords transmitted | Codeword error count |
|---|---|---|---|
| 765 | 66301 | 10 | 1 |
| 88 | 72 | 50 | 2 |
| 91 | 134217819 | 118 | 3 |
| 81 | 131153 | 258 | 4 |
| 86 | 536870998 | 303 | 5 |
| 93 | 524381 | 424 | 6 |
| 741 | 134218469 | 766 | 7 |
| 758 | 1049334 | 1037 | 8 |
| 88 | 216 | 1173 | 9 |
| 83 | 2131 | 1265 | 10 |
| 748 | 4195052 | 1586 | 11 |
| 6326 | 268470 | 2080 | 12 |
| 83 | 134217811 | 2092 | 13 |
| 761 | 1785 | 3191 | 14 |
| 743 | 4195047 | 3211 | 15 |
| 758 | 17142 | 3223 | 16 |
| 82 | 86 | 3428 | 17 |
| 753 | 16777969 | 3542 | 18 |
| 743 | 4195047 | 3610 | 19 |
| 81 | 4177 | 3653 | 20 |
| 767 | 1049343 | 3691 | 21 |
| 91 | 2097243 | 3784 | 22 |
| 746 | 1002 | 3810 | 23 |
| 88 | 216 | 3946 | 24 |
| 748 | 2796 | 4288 | 25 |
| 767 | 16777983 | 4366 | 26 |
| 762 | 698 | 4430 | 27 |
| 83 | 1107 | 4441 | 28 |
| 764 | 636 | 4546 | 29 |
| 764 | 33532 | 4609 | 30 |
| 92 | 134217820 | 4613 | 31 |
| 762 | 17146 | 4698 | 32 |
| 741 | 66277 | 4776 | 33 |
| 87 | 85 | 4980 | 34 |
| 758 | 766 | 5035 | 35 |
| 95 | 8287 | 5334 | 36 |
| 95 | 536871007 | 5506 | 37 |
| 752 | 17136 | 5601 | 38 |
| 761 | 1017 | 5653 | 39 |
| 753 | 536871665 | 5896 | 40 |
| 83 | 131155 | 6191 | 41 |
| 765 | 637 | 6192 | 42 |
| 756 | 2804 | 6196 | 43 |
| 767 | 703 | 6275 | 44 |
| 742 | 8389350 | 6282 | 45 |
| 93 | 29 | 6345 | 46 |
| 752 | 1049328 | 6396 | 47 |
| 87 | 134217815 | 6475 | 48 |
| 745 | 525033 | 6508 | 49 |
| 748 | 268436204 | 6614 | 50 |
| 86 | 1048662 | 6676 | 51 |
| 82 | 33554514 | 6918 | 52 |
| 97 | 67108961 | 6976 | 53 |
| 87 | 262231 | 7006 | 54 |
| 744 | 232 | 7112 | 55 |
| 82 | 4194386 | 7203 | 56 |
| 747 | 2097899 | 7220 | 57 |
| 757 | 66293 | 7343 | 58 |
| 767 | 703 | 7595 | 59 |
| 755 | 536871667 | 7874 | 60 |
| 85 | 16777301 | 7985 | 61 |
| 88 | 8280 | 8280 | 62 |
| 96 | 97 | 8349 | 63 |
| 758 | 131830 | 8442 | 64 |
| 764 | 33532 | 8583 | 65 |
| 762 | 250 | 8878 | 66 |
| 762 | 250 | 9051 | 67 |
| 741 | 743 | 9122 | 68 |
| 762 | 67109626 | 9323 | 69 |
| 748 | 16777964 | 9920 | 70 |
| 765 | 761 | 10500 | 71 |
| 747 | 739 | 10571 | 72 |
| 756 | 740 | 10694 | 73 |
| 766 | 254 | 10722 | 74 |
| 744 | 8389352 | 10860 | 75 |
| 763 | 1019 | 11115 | 76 |
| 753 | 525041 | 11601 | 77 |
| 85 | 4194389 | 11602 | 78 |
| 97 | 4194401 | 11626 | 79 |
| 764 | 2812 | 11720 | 80 |
| 92 | 4188 | 11885 | 81 |
| 86 | 8278 | 11980 | 82 |


An examination of the two codewords in error from the 10dB case reveals two different scenarios.

Scenario 1: The codeword received in error is also a codeword.

Scenario 2: The codeword received in error is not a codeword.

In the case of scenario 1, an alternative codeword that is more fit than the received codeword is problematic to find. This is not the case in scenario 2. An examination of the binary conversions of the codewords in error in the 10dB case yields table 4-24.

Table 4-24. Examination of binary codewords received in error

| Scenario | Decimal | Binary |
|---|---|---|
| 1 | 767 (transmitted) | 00000000000000000000001011111111 |
| 1 | 763 (received) | 00000000000000000000001011111011 |
| 2 | 94 (transmitted) | 00000000000000000000000001011110 |
| 2 | 536871006 (received) | 00100000000000000000000001011110 |

In both scenarios, a single bit error occurred. In scenario 1, the single bit error generated another codeword; in scenario 2, the error was in the higher-order bits. The fact that the two decimal values are highly dissimilar is not important. By examining the binary representation of each received codeword and comparing it to the elements of the codebook, some errors can be detected and corrected. From a genetic fitness standpoint, the most fit correction candidates have the fewest errors (shortest distance) between themselves and a member of the codebook, and this forms the basis of the fitness function of this scheme. Tables 4-25 and 4-26 display the codeword errors and a selection of the three most fit candidate codewords to correct the error, based solely on the number of error bits in the candidate codeword compared to the 40 members of the codebook. In this case, received codeword 763 cannot be corrected based on bit error count, but received codeword 536871006 can be corrected to the transmitted codeword 94. These uncorrected codewords could be corrected through a variety of forward error correction codes with some level of additional overhead required. There is no coding overhead for the genetic algorithm approach.


Table 4-25. Candidate Codeword Error Corrections for Eb/N0 = 10dB

| Transmitted Codeword | Received Codeword | Codeword index out of 12002 codewords transmitted | Codeword error count | 1st candidate from codebook | Agreement bits between candidate and error | 2nd candidate from codebook | Agreement bits between candidate and error | 3rd candidate from codebook | Agreement bits between candidate and error |
|---|---|---|---|---|---|---|---|---|---|
| 767 | 763 | 8634 | 1 | 763 | 32 | 747 | 31 | 755 | 31 |
| 94 | 536871006 | 9423 | 2 | 94 | 31 | 86 | 30 | 92 | 30 |

Table 4-26. Candidate Codeword Error Corrections for Eb/N0 = 8dB

| Transmitted Codeword | Received Codeword | Codeword index out of 12002 codewords transmitted | Codeword error count | 1st candidate from codebook | Agreement bits between candidate and error | 2nd candidate from codebook | Agreement bits between candidate and error | 3rd candidate from codebook | Agreement bits between candidate and error |
|---|---|---|---|---|---|---|---|---|---|
| 765 | 66301 | 10 | 1 | 765 | 31 | 757 | 30 | 761 | 30 |
| 88 | 72 | 50 | 2 | 88 | 31 | 88 | 31 | 92 | 30 |
| 91 | 134217819 | 118 | 3 | 91 | 31 | 83 | 30 | 95 | 30 |
| 81 | 131153 | 258 | 4 | 81 | 31 | 83 | 30 | 85 | 30 |
| 86 | 536870998 | 303 | 5 | 86 | 31 | 82 | 30 | 84 | 30 |
| 93 | 524381 | 424 | 6 | 93 | 31 | 85 | 30 | 92 | 30 |
| 741 | 134218469 | 766 | 7 | 741 | 31 | 743 | 30 | 757 | 30 |
| 758 | 1049334 | 1037 | 8 | 758 | 31 | 742 | 30 | 754 | 30 |
| 88 | 216 | 1173 | 9 | 88 | 31 | 88 | 31 | 92 | 30 |
| 83 | 2131 | 1265 | 10 | 83 | 31 | 81 | 30 | 82 | 30 |
| 748 | 4195052 | 1586 | 11 | 748 | 31 | 744 | 30 | 764 | 30 |
| 6326 | 268470 | 2080 | 12 | 6326 | 31 | 758 | 27 | 86 | 26 |
| 83 | 134217811 | 2092 | 13 | 83 | 31 | 81 | 30 | 82 | 30 |
| 761 | 1785 | 3191 | 14 | 761 | 31 | 745 | 30 | 753 | 30 |
| 743 | 4195047 | 3211 | 15 | 743 | 31 | 741 | 30 | 742 | 30 |
| 758 | 17142 | 3223 | 16 | 758 | 31 | 742 | 30 | 754 | 30 |
| 82 | 86 | 3428 | 17 | 86 | 32 | 82 | 31 | 84 | 31 |
| 753 | 16777969 | 3542 | 18 | 753 | 31 | 752 | 30 | 755 | 30 |
| 743 | 4195047 | 3610 | 19 | 743 | 31 | 741 | 30 | 742 | 30 |
| 81 | 4177 | 3653 | 20 | 81 | 31 | 83 | 30 | 85 | 30 |
| 767 | 1049343 | 3691 | 21 | 767 | 31 | 751 | 30 | 763 | 30 |
| 91 | 2097243 | 3784 | 22 | 91 | 31 | 83 | 30 | 95 | 30 |
| 746 | 1002 | 3810 | 23 | 746 | 31 | 744 | 30 | 747 | 30 |
| 88 | 216 | 3946 | 24 | 88 | 31 | 88 | 31 | 92 | 30 |
| 748 | 2796 | 4288 | 25 | 748 | 31 | 744 | 30 | 764 | 30 |
| 767 | 16777983 | 4366 | 26 | 767 | 31 | 751 | 30 | 763 | 30 |
| 762 | 698 | 4430 | 27 | 762 | 31 | 746 | 30 | 754 | 30 |
| 83 | 1107 | 4441 | 28 | 83 | 31 | 81 | 30 | 82 | 30 |
| 764 | 636 | 4546 | 29 | 764 | 31 | 92 | 30 | 748 | 30 |
| 764 | 33532 | 4609 | 30 | 764 | 31 | 748 | 30 | 756 | 30 |
| 92 | 134217820 | 4613 | 31 | 92 | 31 | 84 | 30 | 88 | 30 |
| 762 | 17146 | 4698 | 32 | 762 | 31 | 746 | 30 | 754 | 30 |
| 741 | 66277 | 4776 | 33 | 741 | 31 | 743 | 30 | 757 | 30 |
| 87 | 85 | 4980 | 34 | 85 | 32 | 81 | 31 | 84 | 31 |
| 758 | 766 | 5035 | 35 | 766 | 32 | 758 | 31 | 762 | 31 |
| 95 | 8287 | 5334 | 36 | 95 | 31 | 87 | 30 | 91 | 30 |
| 95 | 536871007 | 5506 | 37 | 95 | 31 | 87 | 30 | 91 | 30 |
| 752 | 17136 | 5601 | 38 | 752 | 31 | 753 | 30 | 754 | 30 |
| 761 | 1017 | 5653 | 39 | 761 | 31 | 745 | 30 | 753 | 30 |
| 753 | 536871665 | 5896 | 40 | 753 | 31 | 752 | 30 | 755 | 30 |
| 83 | 131155 | 6191 | 41 | 83 | 31 | 81 | 30 | 82 | 30 |
| 765 | 637 | 6192 | 42 | 765 | 31 | 93 | 30 | 757 | 30 |
| 756 | 2804 | 6196 | 43 | 756 | 31 | 752 | 30 | 757 | 30 |
| 767 | 703 | 6275 | 44 | 767 | 31 | 751 | 30 | 763 | 30 |
| 742 | 8389350 | 6282 | 45 | 742 | 31 | 743 | 30 | 758 | 30 |
| 93 | 29 | 6345 | 46 | 93 | 31 | 85 | 30 | 92 | 30 |
| 752 | 1049328 | 6396 | 47 | 752 | 31 | 753 | 30 | 754 | 30 |
| 87 | 134217815 | 6475 | 48 | 87 | 31 | 83 | 30 | 85 | 30 |
| 745 | 525033 | 6508 | 49 | 745 | 31 | 744 | 30 | 747 | 30 |
| 748 | 268436204 | 6614 | 50 | 748 | 31 | 744 | 30 | 764 | 30 |
| 86 | 1048662 | 6676 | 51 | 86 | 31 | 82 | 30 | 84 | 30 |
| 82 | 33554514 | 6918 | 52 | 82 | 31 | 83 | 30 | 86 | 30 |
| 97 | 67108961 | 6976 | 53 | 97 | 31 | 96 | 30 | 81 | 29 |
| 87 | 262231 | 7006 | 54 | 87 | 31 | 83 | 30 | 85 | 30 |
| 744 | 232 | 7112 | 55 | 744 | 31 | 96 | 30 | 745 | 30 |
| 82 | 4194386 | 7203 | 56 | 82 | 31 | 83 | 30 | 86 | 30 |
| 747 | 2097899 | 7220 | 57 | 747 | 31 | 745 | 30 | 746 | 30 |
| 757 | 66293 | 7343 | 58 | 757 | 31 | 741 | 30 | 753 | 30 |
| 767 | 703 | 7595 | 59 | 767 | 31 | 751 | 30 | 763 | 30 |
| 755 | 536871667 | 7874 | 60 | 755 | 31 | 753 | 30 | 754 | 30 |
| 85 | 16777301 | 7985 | 61 | 85 | 31 | 81 | 30 | 84 | 30 |
| 88 | 8280 | 8280 | 62 | 88 | 31 | 88 | 31 | 92 | 30 |
| 96 | 97 | 8349 | 63 | 97 | 32 | 96 | 31 | 81 | 30 |
| 758 | 131830 | 8442 | 64 | 758 | 31 | 742 | 30 | 754 | 30 |
| 764 | 33532 | 8583 | 65 | 764 | 31 | 748 | 30 | 756 | 30 |
| 762 | 250 | 8878 | 66 | 762 | 31 | 746 | 30 | 754 | 30 |
| 762 | 250 | 9051 | 67 | 762 | 31 | 746 | 30 | 754 | 30 |
| 741 | 743 | 9122 | 68 | 743 | 32 | 741 | 31 | 742 | 31 |
| 762 | 67109626 | 9323 | 69 | 762 | 31 | 746 | 30 | 754 | 30 |
| 748 | 16777964 | 9920 | 70 | 748 | 31 | 744 | 30 | 764 | 30 |
| 765 | 761 | 10500 | 71 | 761 | 32 | 745 | 31 | 753 | 31 |
| 747 | 739 | 10571 | 72 | 743 | 31 | 747 | 31 | 755 | 31 |
| 756 | 740 | 10694 | 73 | 741 | 31 | 742 | 31 | 748 | 31 |
| 766 | 254 | 10722 | 74 | 766 | 31 | 94 | 30 | 758 | 30 |
| 744 | 8389352 | 10860 | 75 | 744 | 31 | 745 | 30 | 746 | 30 |
| 763 | 1019 | 11115 | 76 | 763 | 31 | 747 | 30 | 755 | 30 |
| 753 | 525041 | 11601 | 77 | 753 | 31 | 752 | 30 | 755 | 30 |
| 85 | 4194389 | 11602 | 78 | 85 | 31 | 81 | 30 | 84 | 30 |
| 97 | 4194401 | 11626 | 79 | 97 | 31 | 96 | 30 | 81 | 29 |
| 764 | 2812 | 11720 | 80 | 764 | 31 | 748 | 30 | 756 | 30 |
| 92 | 4188 | 11885 | 81 | 92 | 31 | 84 | 30 | 88 | 30 |
| 86 | 8278 | 11980 | 82 | 86 | 31 | 82 | 30 | 84 | 30 |

The logic checks each received error codeword against the binary representation of each member of the codebook for the closest codeword. If the received error codeword is also in the codebook, the logic goes to the next closest candidate codeword.
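The candidate ranking can be sketched in Python as a bitwise comparison against the codebook. The fitness used here is the number of agreeing bits out of 32 (32 minus the Hamming distance), and ties are broken by ascending codeword value, which is an assumption chosen because it reproduces the candidate ordering in table 4-25. The duplicate value 88 in table 4-19 is collapsed to a single entry, and the function names are hypothetical.

```python
BITS = 32

# Unique code values from table 4-19 (88 appears twice there).
CODEBOOK = sorted({
    97, 96, 747, 763, 753, 751, 767, 93, 741, 94, 744, 92, 745, 764, 742,
    87, 86, 88, 85, 766, 82, 95, 83, 81, 91, 765, 758, 762, 748, 746, 6326,
    84, 755, 752, 761, 756, 754, 743, 757,
})


def agreement_bits(a, b):
    """Number of bit positions (out of BITS) in which a and b agree."""
    return BITS - bin(a ^ b).count("1")


def rank_candidates(received):
    """Codebook members as (value, agreement), most fit first."""
    scored = [(cw, agreement_bits(cw, received)) for cw in CODEBOOK]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def best_alternative(received):
    """Most fit codebook member other than the received value itself."""
    return next(cw for cw, _ in rank_candidates(received) if cw != received)


print(rank_candidates(536871006)[:3])  # scenario 2: 94 is the clear winner
print(rank_candidates(763)[:3])        # scenario 1: 763 is itself a codeword
```

In scenario 1 the received value 763 has five codebook neighbours at one bit of distance (747, 755, 761, 762 and 767), so bit agreement alone cannot single out the transmitted codeword 767.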

Table 4-27 summarizes the corrected codeword improvement of this scheme.

Table 4-27. Codeword Correction Improvement

| Eb/N0 (dB) | Number of Codeword Errors in 12002 Transmitted codewords | Remaining codewords in error after applying genetic algorithm correction scheme |
|---|---|---|
| 10 | 2 | 1 |
| 8 | 82 | 14 |
| 6 | 860 | 115 |
| 4 | 3975 | 525 |
| 2 | 8518 | 3176 |

Figure 4-34 shows the potential coding gain of the scheme. The cost of the implementation is the search time for each received error codeword against a table of binaries for each codeword in the codebook.


4.3.1.6 Error correction capability through redundancy (Majority-weighted coding)

The same model as in the previous case was re-run, except that each input codeword is now triply redundant. There are 4000 unique codewords and 12,000 total transmitted codewords. The uncorrected error statistics are shown in table 4-28.

Table 4-28. Codeword Error count, Triple redundancy case

Eb/N0 (dB)   Number of codeword errors in 12000 transmitted codewords
10           2
8            82
6            856
4            3975
2            8515

The statistics are virtually identical, as expected. The uncorrected codeword performance is the same as the previous case. Using majority voting on the codewords, the rate 1/3 code performance is as shown in table 4-29 as compared to the genetic algorithm scheme.
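A minimal Python sketch of majority voting on a triply redundant stream follows. It assumes that the three copies of each codeword are transmitted consecutively and that a word with no two agreeing copies is declared uncorrectable; the text specifies neither detail, and the function names and sample values are illustrative.

```python
from collections import Counter


def majority_vote(copies):
    """Return the value seen in at least two of the copies, else None."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= 2 else None


def decode_triple(received):
    """Decode a stream in which each codeword was sent three times in a row."""
    return [majority_vote(received[i:i + 3]) for i in range(0, len(received), 3)]


# Three codewords, each sent three times; some copies arrive corrupted.
received = [97, 97, 96, 744, 82, 744, 85, 86, 88]
print(decode_triple(received))  # [97, 744, None]
```

The last group shows the failure mode of the rate 1/3 code: when two or more copies are corrupted differently, no majority exists.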


Table 4-29. Comparison of uncoded, genetic algorithm and Rate 1/3 source coding

Eb/N0 (dB)   Number of codeword errors     Remaining codewords in error      Remaining codewords in error
             in 12002 transmitted          after applying genetic algorithm  after applying majority voting
             codewords                     correction scheme                 on Rate=1/3 coding
                                           (12002 codewords)                 (4000 codewords)
10           2                             1                                 0
8            82                            14                                0
6            860                           115                               43
4            3975                          525                               758
2            8518                          3176                              2590

Figure 4-35 compares the uncorrected, the genetic algorithm corrected schemes, and majority voting rate-1/3 corrected schemes.


At Eb/N0 around 3.5 dB, the genetic algorithm outperforms the majority voting scheme by 0.5 dB, but at Eb/N0 > 4.5dB, the majority voting scheme is superior, at the cost of data rate reduction of 2/3. At Eb/N0 = 4.5dB the two schemes are roughly equal.

4.4 A Cryptographic System of Authentication and Confidentiality Based Upon the Principles of the Regulation of Gene Expression

The previous work was used as a foundation to support the expansion of the keyed HMAC genomic protocol into a generalized cryptographic system using gene expression.

Authentication and confidentiality protocols based on genomics and proteomics can be demonstrated that use the principles of the regulation of gene expression to augment traditional network security approaches. The result is a hierarchical cryptographic system that permits users to select up to three levels of confidentiality and authentication. The system is tied to products of in vivo and in vitro gene expression. The purpose of this system is to enhance and co-exist with existing cryptographic systems and to permit secure network operation in the presence of degraded or compromised security.

4.4.1 Introduction to genomic and proteomic cryptography protocol

A hierarchical approach to genomic and proteomic encryption using the process of regulation of gene expression is now described. The input to these protocols is a DNA sequence of nucleotide bases. This sequence could be provided from a genomic database, a FASTA sequence, or it could be the output of any program that converts plain text into DNA text (such as the previously described source coding protocol). The protocols convert the DNA text into a genomic structure with regulatory regions and emulate the processes of protein-DNA interactions to produce products of gene expression. The quantity and timing of these products of gene expression can be modified post-transcriptionally and post-translationally. The protocol can be coded to accommodate a wide range of processes involved in the regulation of gene expression. The protocols in this approach are shown in tabular form in table 4-30.

Table 4-30. Cryptographic Protocol

Encryption                                                      Decryption
Level      Input        Output                                  Level      Input           Output
-          Plaintext    DNA text                                3C         Cipherprotein   c-mRNA
1          DNA text     Ciphergene                              3B         c-mRNA          BTC
2          Ciphergene   Pre-transcriptional complex (PTC)       3A         BTC             PTC
3A         PTC          Basal transcriptional complex (BTC)     2          PTC             Ciphergene
3B         BTC          Cipher messenger RNA (c-mRNA)           1          Ciphergene      DNA text
3C (End)   c-mRNA       Cipherprotein                           - (End)    DNA text        Plaintext
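The level structure of table 4-30 can be written down as data and checked for consistency. The following Python sketch is illustrative only: the constant and function names are assumptions, and the check confirms merely that each level consumes the output of the level before it and that decryption is the encryption chain reversed.

```python
# (level, input, output) for each encryption step, as read from table 4-30.
ENCRYPT_STEPS = [
    ("-",  "Plaintext",  "DNA text"),
    ("1",  "DNA text",   "Ciphergene"),
    ("2",  "Ciphergene", "PTC"),
    ("3A", "PTC",        "BTC"),
    ("3B", "BTC",        "c-mRNA"),
    ("3C", "c-mRNA",     "Cipherprotein"),
]

# Decryption walks the same levels in reverse, swapping input and output.
DECRYPT_STEPS = [(lvl, out, inp) for lvl, inp, out in reversed(ENCRYPT_STEPS)]


def chain_is_consistent(steps):
    """True if every step's output is the next step's input."""
    return all(steps[i][2] == steps[i + 1][1] for i in range(len(steps) - 1))


print(chain_is_consistent(ENCRYPT_STEPS))  # True
print(DECRYPT_STEPS[0])                    # ('3C', 'Cipherprotein', 'c-mRNA')
```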

Initially, a precoding step converts the plaintext input into an alphabetic string from the set of DNA bases AD = {A,T,C,G,MeC,H,X}. A, G, C, and T represent the main DNA bases adenine, guanine, cytosine, and thymine. MeC represents 5-Methylcytosine, an epigenetic marker, H represents hypoxanthine, and X represents xanthine. H and X are mutagenic deaminations of DNA bases that occasionally occur in gene sequences. This precoding step is described in detail in 4.3. The emerging product is an unstructured sequence of DNA alphabets. The subsequent steps are:

• The DNA text is mapped into the structure of a gene complete with introns, exons, regulatory regions, etc. This output is called a ciphergene. This represents the level 1 encryption and the inverse operation is the level 1 decryption. The purpose of this coding from a security perspective is that a single sequence of letters from a small alphabet can be used to represent a large set of permutations of message combinations. The decoding of such messages represents an NP-hard problem for attackers.

• The ciphergene code is then operated on by a series of protein transcription factor codes that combine with their counterpart regulatory codes on the ciphergene to produce a new coded sequence that represents a coded transcriptional complex. The output of level 2 is the Pre-Transcriptional Complex. It represents the level 2 encryption and the inverse operation is the level 2 decryption.

• The third step is a series of operations that takes the Pre-Transcriptional Complex code, which is operated on by protein and RNA polymerase codes, resulting in a basal transcriptional complex code. The basal transcriptional complex code is processed by algorithms that map the code into a messenger RNA code, called the cipher-mRNA code. The cipher-mRNA now consists only of codons of the original DNA text message and is translated into a protein code, called the cipherprotein. The output of level 3 is the cipherprotein code that is transmitted from the sender to the receiver. The receiver applies the symmetric decryption keys to recover the cipher-mRNA and then performs all subsequent steps to reach level 2, level 1, and decoding to produce the plaintext.

• The resulting ciphergenes, cipher-mRNA, and cipherproteins are subject to the processes of regulation of expression. This can be done as pre- or post-transcriptional operations as well as pre- or post-translational operations. These processes are utilized as part of the network security concept of operations. The scope of the protocols can be described in biological terms as the regulated transcription of genes to form messenger RNA followed by translation of the messenger RNA into proteins.

4.4.2 Algorithm Description

4.4.2.1 Coding of sequences as objects.

An object is defined as a genomic or proteomic sequence. It could be a sequence defined at:

• the nucleotide base level (e.g. AGGCT…)

• the codon level (AAG, TTA, CGC, …)

• transcription factor/binding site (SP1, CCAT, AP2, …) (see footnote 8)

• protein transcription factor (TFIIA, TFIIB, …)

and so forth.

8 Sometimes these codes are used to refer to the binding site on the gene as well as the protein that binds to the site.

Each object is drawn from the elements in a dictionary set associated with that object, for example:

• Nucleotides: N = {A, T, C, G, U, I, MeC, X, H}

• DNA Codons: DC = {ATT, ATC, ATA, CTT, CTC, CTA, CTG, TTA, TTG, GTT, GTC, GTA, GTG, TTT, TTC, ATG, TGT, TGC, GCT, GCC, GCA, GCG, GGT, GGC, GGA, GGG, CCT, CCC, CCA, CCG, ACT, ACC, ACA, ACG, TCT, TCC, TCA, TCG, AGT, AGC, TAT, TAC, TGG, CAA, CAG, AAT, AAC, CAT, CAC, GAA, GAG, GAT, GAC, AAA, AAG, CGT, CGC, CGA, CGG, AGA, AGG, TAA, TAG, TGA}

• Transcription factors: TF = {TFII, TBP, …}

Associated with each dictionary is a set of class elements that describe the function of an element in a given sequence. For the set of nucleotides, N, the classes might be:

CN ={Promoter, Upstream Activator, TATA, Exon, Intron, …}

The class defines the function of the element in a sequence at a given position. Each element of an object is mapped into a class. For example, all the nucleotides in the sequence from figure 4-36 in the range of -275 to -200 would be mapped into a code in the UAS (Upstream Activator Sequence) set.

Let f1 be a column vector of numerical representations of each element in a sequence drawn from the appropriate dictionary as shown in equation 4.29:

\[
f_1 = \begin{bmatrix} nt_1 & nt_2 & \cdots & nt_n \end{bmatrix}^{T} \tag{4.29}
\]

4.4.2.2 Sequence Column Vector Operations.

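A substitution vector, s, replaces the code for one element with the code for another element. Then, as shown in equation 4.30:

\[
f_1' = f_1 + s
     = \begin{bmatrix} nt_1 & nt_2 & \cdots & nt_n \end{bmatrix}^{T}
     + \begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}^{T},
\quad \text{where } s_n =
\begin{cases}
0   & \text{if no substitution at location } n \\
a_n & \text{if substitution at location } n
\end{cases}
\tag{4.30}
\]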

Figure 4-36. Biological gene structure with substitution of DNA text message into the exons.

The brackets in figure 4-36 denote the position of a motif within the gene. The zero point is the transcription start point.
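The sequence vector of equation 4.29 and the substitution of equation 4.30 can be illustrated with a short Python sketch. The numeric codes assigned to the nucleotides and the substitution value 10 are hypothetical; the dissertation does not list the actual code values.

```python
# Hypothetical numeric codes for a few nucleotides (illustrative only).
NUCLEOTIDE_CODE = {"A": 1, "T": 2, "C": 3, "G": 4}


def encode(sequence):
    """Build f1: one numeric code per element of the sequence."""
    return [NUCLEOTIDE_CODE[nt] for nt in sequence]


def substitute(f1, s):
    """Apply a substitution vector: f1' = f1 + s, element by element."""
    if len(f1) != len(s):
        raise ValueError("f1 and s must have the same length")
    return [nt + sn for nt, sn in zip(f1, s)]


f1 = encode("AGGCT")           # [1, 4, 4, 3, 2]
s = [0, 0, 10, 0, 0]           # substitution a_3 = 10 at the third position
f1_prime = substitute(f1, s)   # [1, 4, 14, 3, 2]

# Codes outside the nucleotide code space must be resolved by the decoder
# against a separate space of substitution codes.
needs_lookup = [code not in NUCLEOTIDE_CODE.values() for code in f1_prime]
print(f1_prime, needs_lookup)
```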

The code of nti + ai, for every position i up to n, maps into a substitution set of codes. (When the decoder does not find nti in the code space of nucleotides, it searches the code space of substitution codes for the correct code substitution.) The new f’ then takes the place of the old f as shown in figure 4-36. Figure 4-36 utilizes the basic classes needed for transcriptional regulation of genes.

• Promoter. The promoter region is responsible for the binding of RNA polymerase and for the subsequent initiation of transcription.

• Upstream Activating Sequence. This is a region upstream of the transcriptional start site that binds transcription factor proteins required for transcription.

• Downstream Activating Sequence. This is a region downstream from the transcriptional start site that binds transcription factor proteins required for transcription.

• Exon. These regions contain the codons that are ultimately translated into proteins from messenger RNA.

• Introns. These are non-coding intervening regions between exons. Introns may also contain regulatory elements.

• TATA. This is a recognition sequence of bases (ATA(A/T)A(A/T)(A/G) in Saccharomyces [48]) that appears in some genes upstream of the transcription start site and binds TATA box binding proteins required for transcription.

• Non-coding. These are regions without a specific function assigned.

• Insulator. The insulator is a regulatory region that acts as a repressor of transcription of adjacent genes.

A mapping of f1 onto the class set CN = {c1, c2, …, cn} is performed, compressing f1 into a diagonal matrix consisting only of class codes as shown in equation 4.38:

\[
F = \begin{bmatrix}
c_1    & 0      & \cdots & 0      \\
0      & c_2    & \cdots & 0      \\
0      & \cdots & \cdots & 0      \\
0      & \cdots & \cdots & c_n
\end{bmatrix}
\tag{4.38}
\]

There exists a set of gene expression codes tied to F. These codes describe the various states of transcription based upon the state of activation of the gene, which comes from a database on gene expression present at sender and receiver. The codes exist on the interval (0,1] as shown in equation 4.39.

\[
G = \begin{bmatrix}
1      & g_{11} & \cdots & g_{1m} \\
g_{21} & 1      & \cdots & g_{2m} \\
\cdots & \cdots & 1      & \cdots \\
g_{m1} & \cdots & \cdots & 1
\end{bmatrix},
\quad m = n - 1
\tag{4.39}
\]

Then C contains the gene sequence codes and codes for the expression limits of the gene as shown in equation 4.40:

\[
C = F * G \tag{4.40}
\]
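The construction of C from F and G can be sketched in Python as follows. The sketch assumes that the * in equation 4.40 denotes the ordinary matrix product (the text does not say whether an element-wise product is intended), and the class codes and expression codes are made-up values on the interval (0,1].

```python
def diag(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


class_codes = [3, 5, 7]            # hypothetical class codes c1, c2, c3
F = diag(class_codes)
G = [[1,    0.5, 0.25],            # hypothetical expression codes in (0, 1]
     [0.5,  1,   0.5],
     [0.25, 0.5, 1]]

C = matmul(F, G)
print(C)  # row i of G scaled by class code c_i; the diagonal of C is c_i
```

Under this assumption the unit diagonal of G leaves the class codes on the diagonal of C, while the off-diagonal entries carry the expression limits.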

4.4.2.3 Encryption Matrix

There exists a series of diagonal matrices, $E_n$, and their inverses, $E_n^{-1}$, that perform computationally hard encryption and decryption of C. The elements of $E_n$ and $E_n^{-1}$ can be constants or polynomials. Therefore, as shown in equation set 4.41:

\[
\begin{aligned}
L_{out} &= C * E_1 * E_2 * \cdots * E_n \\
L_{in}  &= L_{out} * E_n^{-1} * E_{n-1}^{-1} * \cdots * E_1^{-1}
\end{aligned}
\tag{4.41}
\]

The encryption can be combined with rotational matrices, as shown in equation set 4.42:

\[
\begin{aligned}
L_{out} &= C * R * E_1 * \cdots * E_n \\
L_{in}  &= L_{out} * E_n^{-1} * \cdots * E_1^{-1} * R^{-1}
\end{aligned}
\tag{4.42}
\]
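The algebra of equation sets 4.41 and 4.42 can be checked with a small Python sketch. It uses exact rational arithmetic, constant diagonal entries, and a 90 degree rotation matrix, all of which are illustrative assumptions. The sketch demonstrates only that the inverse chain recovers C; it makes no claim about the hardness of the scheme.

```python
from fractions import Fraction


def diag(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


C = [[Fraction(3), Fraction(1, 2)],
     [Fraction(5, 2), Fraction(5)]]   # hypothetical ciphergene code matrix
R = [[0, -1], [1, 0]]                 # rotation by 90 degrees
R_inv = [[0, 1], [-1, 0]]             # its inverse (the transpose)
keys = [[2, 3], [5, 7]]               # diagonals of E1 and E2 (made up)

# Encryption: L_out = C * R * E1 * E2
L_out = matmul(C, R)
for k in keys:
    L_out = matmul(L_out, diag(k))

# Decryption: L_in = L_out * E2^-1 * E1^-1 * R^-1
L_in = L_out
for k in reversed(keys):
    L_in = matmul(L_in, diag([1 / Fraction(v) for v in k]))
L_in = matmul(L_in, R_inv)

print(L_in == C)  # True
```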

4.4.2.4 Use of codes with other types of sequences

The algorithms can be applied to mRNA, protein, or other sequences, but the interpretation is changed relative to the function of the sequence. For example, if the sequence is mRNA, C contains the RNA sequence codes and codes for translation expression limits. The classes in CN might consist of the Open Reading Frame (ORF), 5’ Untranslated Region (5’ UTR) and 3’ UTR.

4.4.2.5 Coding of DNA-Protein Complexes: The General Transcriptional Regulatory Complex as an example.

Figure 3-7 showed two views of the interactions of the elements required for transcriptional regulation. Figure 4-37 translates those views into a system network perspective.


Figure 4-37. Requirement for the Pre-Transcriptional Complex.


The Pre-Transcriptional Complex is functional if the transcriptional factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH are bound to the correct regulatory sequence in the DNA sequence. Red dashed lines are protein-protein interactions. Black solid lines are protein-nucleic acid interactions.

Let T be the set of all transcription factor codes, {tf1, tf2, tf3,…, tfn} and let a subset of these codes be assigned to the set of codes for TFII, {tfII1, tfII2, …,tfIIm}. Let R be the set of all regulatory sequence codes, {r1,r2,…rj} and let a subset of these codes be assigned to the codes for BRE {rBRE1,rBRE2,…,rBREk} and TATA {rTATA1, rTATA2,…,rTATAk}.

There exists a condition of binding such that codes from BRE and TFIIA and TATA and TFIIA satisfy a condition at a threshold. Graphically this can be depicted as a Venn diagram involving BRE, TATA, and TFIIA as shown in figure 4-38.

Figure 4-38. Binding of BRE to TFIIA and TATA to TFIIA

Similarly, figure 4-39 depicts a case below the binding threshold, in which binding is not demonstrated in the codes for TFIIH and BRE.


Figure 4-39. Non-binding of BRE to TFIIH.

Using probability theory, we can express, for example, equation 4.43:

J= P(BRETFIIA)  P(TATATFIIA) (4.43)

Table 4-31 lists samples of the joint probabilities for protein-DNA binding in figure 4-39.

Table 4-31. Sample of event joint probabilities for figure 4-39

Event (Protein-DNA)
TFIIA ∩ BRE
TFIIA ∩ TATA
TFIIB ∩ BRE
TFIIB ∩ TATA
TFIIE ∩ INR
TFIIE ∩ MTE
TFIIF ∩ TATA
TFIIF ∩ INR
TFIID ∩ BRE
TFIID ∩ TATA
TFIID ∩ INR
TFIID ∩ MTE
TFIID ∩ DPE
TFIIH ∩ MTE
TFIIH ∩ DPE
TFIID ∩ TFIIA ∩ TATA
TFIIE ∩ TFIIF ∩ TFIIH


4.4.2.5 Control of Transcription Factor Binding

If we define the relationship between protein transcription factor and regulatory sequence in terms of jointly typical sets of the two sequences, then different levels of homology can be required in different authentication or confidentiality scenarios.

An example: Let  = {1, 2, 3, 4, 5}, a 5-tuple alphabet for gene regulatory sequences with type g consisting of equation set 4.44:

\[
\begin{aligned}
P_{g1} &= 2/10 = 0.2 \\
P_{g2} &= 0.4 \\
P_{g3} &= 0.1 \\
P_{g4} &= 0.1 \\
P_{g5} &= 0.2
\end{aligned}
\tag{4.44}
\]

The type class of g consists of all sequences within  with the same statistical distribution, as shown in equation set 4.45:

T(g)  {1122223455,112225534,...,5543222211}  10!  (4.45) T g     37,800  2!4!2!

We can then define the code for a member of regulatory sequence BRE as g = 2455222113 as a member of the type  which can contain all the codes for those regulatory sequences.

We can define metrics of sequences jointly typical to  such that a condition of binding occurs. Let  = {0,1,2,4,5,8,9} be a 7-tuple alphabet of transcription factor codes for members of TFII (TFIIA, TFIIB, etc.). Let tf consist of sets that conform to equation set 4.46:


P0  1/10

P1  1/10

P2  2 /10

P4  2 /10

P5  1/10 (4.46) P8  1/10

P9  2 /10  10!  T tf      453,600  2!2!2!

A code such as tf = 5089292414 for TFIID fits the condition. We can define codes for different transcription factors of the family TFII. It is clear that we can define binding criteria based on the mutual information between  and . Let tf and g have the joint distribution shown in table 4-32:

Table 4-32. Joint distribution of gene regulatory and transcription factor codes.

Define a new type, , such that it conforms to the joint distribution of  and  as shown in table 4-33. This distribution emphasizes redundancy of the most common codeword elements.


Table 4-33. Symbols of type 

Using the examples of BRE as g = 2455222113 and TFIIA as tf = 5089292414, the output sij is a codeword complying with the statistical distribution shown in table 4-33.

There are ~4.5e+88 combinations in this space. However, the entire space need not be used. The length of a codeword can be compressed at the cost of higher probability of codeword matches through random selection of codewords by an attacker. Table 4-34 displays the probability of random selection of a compliant codeword sorted against codewords of length 11, 12, 13, 14, and 15-tuples. The tuples would be selected from the sets of equations 4.47 through 4.51:

\[
\begin{aligned}
\text{tuple-11} = \{ & 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, \\
 & 62,62,62,62,62,62,62,62,62,62,62,62,62,62, \\
 & 81,81,81,81,81,81,81,81, \\
 & 7,7,7,7,7,7,7,7,7,7,7,7,7,7, \\
 & 1,1,1,1,1,1,1,1, \\
 & 68,68,68,68, \\
 & 9,9,9,9,9,9,9, \\
 & 5,5,5,5, \\
 & 4,4,4,4, \\
 & 89,89,69,69 \}
\end{aligned}
\tag{4.47}
\]

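\[
\text{tuple-12} = \text{tuple-11} \cup \{3, 3\} \tag{4.48}
\]

\[
\text{tuple-13} = \text{tuple-12} \cup \{64\} \tag{4.49}
\]

\[
\text{tuple-14} = \text{tuple-13} \cup \{826\} \tag{4.50}
\]

\[
\text{tuple-15} = \text{tuple-14} \cup \{837\} \tag{4.51}
\]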
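The tuple sets of equations 4.47 through 4.51 can be held as symbol counts, which makes their size easy to inspect. The Python sketch below assumes that the quoted size of the space (~4.5e+88) counts the distinct orderings of the 100-element multiset tuple-15; this interpretation is not stated in the text. The counts are read from equation 4.47, and the names are illustrative.

```python
from math import factorial

# symbol -> number of occurrences, read from equations 4.47 through 4.51
TUPLE_11 = {2: 28, 62: 14, 81: 8, 7: 14, 1: 8, 68: 4,
            9: 7, 5: 4, 4: 4, 89: 2, 69: 2}
TUPLE_12 = {**TUPLE_11, 3: 2}
TUPLE_13 = {**TUPLE_12, 64: 1}
TUPLE_14 = {**TUPLE_13, 826: 1}
TUPLE_15 = {**TUPLE_14, 837: 1}


def arrangements(counts):
    """Distinct orderings of a multiset: n! / (k1! * k2! * ...)."""
    total = factorial(sum(counts.values()))
    for k in counts.values():
        total //= factorial(k)
    return total


print(sum(TUPLE_15.values()))            # 100 elements in tuple-15
print(f"{arrangements(TUPLE_15):.2e}")   # order of magnitude of the space
```

With 100 elements in tuple-15, each count can also be read as a probability in hundredths, which is consistent with the sets representing a joint distribution.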

The uncompressed codeword is created from the tuples that represent the joint distribution of BRE and TFIIA. In this example, the codes for (4, 0), (2, 9), and (1, 4) have a joint probability of zero and are passed over. In this example, the set of tuple-15 is required because the codeword of the joint distribution specifies i = 5 and j = 9 (highlighted in yellow), which is the lowest probability entry in table 4-33. The sender can send the code for regulatory sequence BRE bonded to TFIIA, sij = (BRE ∩ TFIIA), in compressed form as shown below in equation 4.52 through equation 4.54:

g = 2 4 5 5 2 2 2 1 1 3 (4.52)

tf = 5 0 8 9 2 9 2 4 1 4 (4.53)

sij = 604 400 C01 F04 428 290 428 140 214 802 (4.54)

This corresponds to the code for the tuples set of {(2,5), (4,0), (5,8), (5,9), (2,2), (2,9), (2,2), (1,4), (1,1), (3,4)}. This does not correspond to the order of the sequence; it only corresponds to the statistical distribution. Tuples that are not represented in the joint probability matrix are represented in this code as (gn,tfn,0). This can reduce the data required for transfer by up to 75% depending upon which tuples are called for in the joint distribution, . The first column in table 4-33 corresponds to the fourth column in table 4-33 sorted from highest to lowest joint distribution (Pij).


Table 4-34. Codeword Entropy and Random hit probability

Given type , the probability of random selection of a codeword, summarized in table 4-34, is shown in equation 4.55:

\[
p() = 2^{-nH(P_{ij})} \tag{4.55}
\]

For each protein-protein or protein-DNA interaction, the users can set up specific intersection requirements by stating which tuples are required, what order they should be in and how many correct tuples must be successively transmitted without an invalid tuple for complete authentication. The probability of selecting n successive sequences with the required distribution becomes p()n as shown in table 4-35.
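Equation 4.55 and the successive-hit probability can be sketched in Python as follows. The sketch assumes that H is the Shannon entropy in bits of the tuple distribution and that n is the codeword length in tuples; how tables 4-34 and 4-35 were actually computed is not stated, so no table values are reproduced here. The counts are those of tuple-11 in equation 4.47, and the function names are illustrative.

```python
from math import log2

# symbol -> number of occurrences, read from equation 4.47
TUPLE_11 = {2: 28, 62: 14, 81: 8, 7: 14, 1: 8, 68: 4,
            9: 7, 5: 4, 4: 4, 89: 2, 69: 2}


def entropy_bits(counts):
    """Shannon entropy (bits per symbol) of the distribution given by counts."""
    total = sum(counts.values())
    return -sum((k / total) * log2(k / total) for k in counts.values())


def random_hit_probability(counts, n):
    """p = 2^(-n * H): chance of randomly drawing a compliant n-tuple codeword."""
    return 2.0 ** (-n * entropy_bits(counts))


def successive_hit_probability(p, k):
    """Chance of k compliant codewords in a row by random selection."""
    return p ** k


p = random_hit_probability(TUPLE_11, 11)
print(entropy_bits(TUPLE_11), p, successive_hit_probability(p, 3))
```

Lengthening the codeword or requiring more successive valid codewords both drive the random-hit probability down geometrically, which is the trade-off against compression described in the text.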

Table 4-35. Probability of successive codeword hits by random selection


The new type,  captures the joint information from the transcription code protein sequences and regulatory code DNA sequences. Transcription factors and regulatory elements that do not bind will have a joint probability distribution of zero and sij ={}.

There are a number of alternative ways to populate sij. If the requirement for prefix-free codes is relaxed then tuples of type S in  in table 4-33 could be replaced, for example, with fractional parts of irrational numbers, in which case table 4-33 might look like table 4-36. The values were taken from the fractional parts of hyperbolic sine and hyperbolic cosine functions.
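One way such values could be generated is sketched below in Python. The choice of integer arguments and of three digits per code is an assumption for illustration; the text does not state which arguments or how many digits were used for table 4-36.

```python
from math import cosh, sinh


def fractional_code(x, func, digits=3):
    """First `digits` decimal digits of the fractional part of func(x), x > 0."""
    value = func(x)
    frac = value - int(value)
    return f"{frac:.10f}"[2:2 + digits]


print(fractional_code(1, sinh))  # sinh(1) = 1.1752..., giving '175'
print(fractional_code(1, cosh))  # cosh(1) = 1.5430..., giving '543'
```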


Table 4-36. Alternative tuples for type 

4.4.2.6 Encrypting the Ciphergene-transcription factor as a protein-DNA complex

The goal of this step is to encrypt the sequence information represented at the basic level of encryption into a higher level of ciphertext that contains more information. This additional information is the interaction between gene and proteins necessary for transcription.

The ciphergene, Lout, has the property that its diagonal contains the nucleotide*class.

Let Z be defined as in equation 4.58:

\[
Z = \begin{bmatrix} l_1 \\ \vdots \\ l_n \end{bmatrix}
  = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}^{T}
\tag{4.58}
\]

where l1,…,ln are the encrypted diagonal elements of Lout.

In a manner analogous to the ciphergene, each protein (in this case, a transcription factor protein) is coded into a matrix Mout. Instead of introns, exons, etc., the constituents of the transcription factor codes are the subunits of the protein. For example, TFIID contains a TATA Binding Protein (TBP) and 14 Transcription Activation Factors (TAFs). These would be coded for the TFIID transcription factor. Instead of a set of gene expression codes there would be an analogous set of subunit activity codes. These codes describe the level of activation provided by each subunit and exist on the interval (0,1]. Thus, as shown in equations 4.59 and 4.60:

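\[
\begin{aligned}
M_{out} &= C * R * E_1 * \cdots * E_n \\
M_{in}  &= M_{out} * E_n^{-1} * \cdots * E_1^{-1} * R^{-1}
\end{aligned}
\tag{4.59}
\]

\[
Y = \begin{bmatrix} m_1 \\ \vdots \\ m_n \end{bmatrix}
  = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}^{T}
\tag{4.60}
\]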

Z and Y describe the genomic and proteomic sequences, but they do not identify the interaction between those sequences.

• The transcriptional regulatory network for Z identifies the protein-DNA interactions as shown in the example in figure 4-37: Requirement for the Pre-Transcriptional Complex. However, the proteins are now represented by the collection of subunits, just as the gene is now represented by a sequence of regulatory sequences, exons and introns.

• Complete tables of the event joint probabilities representing every interaction in the gene regulatory network are created, and all of the joint probability distributions are computed.

4.4.3 Transcription and Translation

One goal of this cryptographic model is to provide a generalized, flexible framework for the coding of the processes of gene expression. The processes of transcription and translation can be coded in this framework at any level of granularity desired. In most cases, the coding capability of the model exceeds what is required to provide a secure protocol. However, any protein can be coded as a collection of subunits or a single coding entity. Any nucleotide sequence can be coded as a single coding entity or a collection of subunits. The user of the protocol determines the level of granularity required. Figures 4-42 through 4-45 represent the minimum level of coding granularity.

Most proteins are represented as a single coding entity, but can be represented in groups of subunits that are described in the subsequent paragraphs or as individual amino acids in a polypeptide sequence. Most nucleic acids are represented by their function in the gene sequence, but can be represented in groups of codons or individual nucleotide bases.

Numerous proteins are involved in transcription of DNA to mRNA and translation from mRNA into proteins. From a coding perspective, a completed Basal Transcriptional Complex code is required prior to the transcription step of the ciphergene to cipher-mRNA. Both transcription and translation occur in assembly fashion through three steps:

• Transcription

o Initiation

o Elongation

o Termination

• Translation

o Initiation

o Elongation

o Termination

Although a fixed set of protein sequences is utilized, the encryption hardness comes from the fact that only the sender and receiver know which protein codes may be used in a particular message, and only the sender and receiver know the degree of binding between the required elements that must occur for a successful transcription process.

4.4.3.1 Coding of RNA Polymerase II

RNA Polymerase II is a twelve subunit protein complex that exists in some form across all eukaryotes. Ten of the subunits (Rpb1, Rpb2, Rpb3, Rpb5, Rpb6, Rpb8, Rpb9, Rpb10, Rpb11 and Rpb12) make up a 10-subunit polypeptide core, with Rpb4 and Rpb7 forming a heterodimer complex required for initiation [31]. These subunits may be coded into the protocol as required. In vivo, RNA Polymerase II performs the transcription function by forming a single stranded RNA biopolymer from the template strand of the double stranded DNA biopolymer. (In the coding structure of the protocol, only one DNA strand is used; therefore, in this context it is the template strand.) Joint probability matrices with the other factors and the ciphergene will be used to set up the coding as was done previously for transcription factor-ciphergene complexes.


4.4.3.2 Co-Activators - Mediator

Mediator is a 26 subunit protein complex required for assembly and regulation of the in vivo pre-initiation complex. Mediator also includes proteins CCNC, CDK8 and CDC2L6/CDK11. Mediator bridges gene-specific regulatory proteins and the basal RNA polymerase II transcription machinery [49]. Mediator has a co-activator function in that its presence is necessary for stabilizing the pre-initiation complex. The interaction of RNA Polymerase II and Mediator is a subject of intense study.

4.4.3.3 General Transcription Factors and Transcriptional Activator Factors

The general transcription factors are proteins without a high degree of sequence specificity that are required for all DNA transcription activities. For eukaryotic transcription, these factors are primarily members of the TFII family. Transcriptional Activator Factors (TAFs) are proteins and protein subunits (e.g. subunits of TFIID) that serve as co-activators in stabilizing the pre-initiation complex. Other proteins including TATA binding proteins (TBP), TBP related factors (TBR), TAF-containing complexes (TFTC, SAGA, SLIK/SALSA, STAGA, and PRC1) and TAF variants are all involved in regulation of transcription via transmission of cues between proteins and DNA [50]. The same algorithms as previously described apply.

4.4.3.4 Upstream Stimulatory Activity (USA)

These proteins induce an upregulation of transcriptional activity (positive acting, PC1/PARP-1, PC2, PC3/DNA topoisomerase I, and PC4) or downregulation of transcriptional activity (negative acting, NC1/HMGB1) when bound as a co-activator in the pre-initiation complex. They seem to function by increasing the effectiveness of transcription factor binding to the promoter, and transcription repression occurs when they are absent. They work independently or in conjunction with other proteins in the transcriptional machinery.

4.4.3.5 Upstream and Inducible Transcription Factors

In addition to the general transcription machinery previously described, additional transcription factors participate in transcription that either increase transcriptional efficiency or induce transcription in response to specific cellular conditions or demands. Codes for these factors will customize the coding performed at the general transcription factor level. These codes can be used in response to specific network conditions, thus mimicking their utilization at the biological level.

o Upstream Factors. The upstream factors are DNA binding proteins associated with short consensus sequences upstream from the initiation of transcription. These include SP1, AP1, AP2 and others. These factors increase the efficiency of initiation when bound to the consensus sequence [77].

o Inducible Transcription Factors. A variety of proteins assist in transcription by binding to DNA response elements in response to specific cellular signals such as hypoxia-induced transcription factor (HIF). These proteins may reside latently in the cytoplasm and are transported to the nucleus in response to cellular signals. These proteins include TFII-I, NF-κB, STAT and NFAT [51]. A small group of these proteins has both cytoplasmic and nuclear functions.


4.4.3.6 The transcription process – Initiation, Elongation, Termination

One key element in interpreting any language or decrypting any ciphertext is knowing where the message starts and where the message stops. Thus, the transcription initiation and termination steps are very important in maintaining confidentiality of the underlying information.

4.4.3.6.1 Initiation

All of the transcription factors, regulatory sequences and RNA Polymerase II codes appear in a gene transcription regulatory network that represents a multidimensional joint probability matrix. The initiation phase is complete when the joint probabilities of all protein-protein and protein-DNA combinations have been computed and the output type for each interaction has been computed and authenticated. This occurs in two steps. Any reference to an operation of binding or complexing means that the codes based upon the joint probabilities of two or more entities are computed and a code from the resulting type is computed using the methodology previously described.

All of the joint probabilities for the protein-protein interactions from the gene transcription regulatory network are computed. The resulting output strings can be checked against a database. This ensures that the basal transcription complex has been set up correctly to recognize the key sequences in the ciphergene. These steps occur in a specific order as shown in figure 4-41. This order is derived from the sequential assembly pathway [52] for the basal transcriptional complex. The numeric designations 1-6 refer to the connections between the proteins and the gene sequence, starting with number 1, which are the RNA Polymerase II – DNA bindings, through number 6, which are Transcription Factor IIE bindings to gene sequence regulatory elements INR and MTE. The processes then proceed as shown in figures 4-42 through 4-45 [31], [50], [52].

Starting with figure 4-41, step A, the general transcription machinery is assembled. First, the transcription factor proteins TFIID and TFIIA are complexed together. In step B, transcription factor TFIIB is complexed with the TFIID-TFIIA complex. The step B complex is complexed to RNA Polymerase II and TFIIF in step C. The step C complex is complexed with TFIIH and TFIIE in step D. This completes the validation and authentication of the protein-protein coding. The parenthetical quantities of (factor1 ∩ factor2) describe how the particular step is coded to compute the numerical representation above it. The parenthetical intersection quantities shown in figure 4-41 are representative of how each of the network diagrams in figures 4-42 through 4-45 is coded. In step E, all of the joint probabilities for the protein-DNA interactions are computed. The resulting output strings are the joint probability codes for the basal transcriptional complex.
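The ordering constraint of steps A through D can be expressed as a small dependency check. The Python sketch below is illustrative: the data layout and function name are assumptions, and it verifies only that each step uses factors or complexes that already exist, not the joint probability codes themselves.

```python
# Individual factor codes available before assembly starts.
FREE_FACTORS = {"TFIID", "TFIIA", "TFIIB", "RNA Pol II", "TFIIF", "TFIIH", "TFIIE"}

# (step name, inputs) following the sequential assembly described above.
ASSEMBLY_STEPS = [
    ("A", ["TFIID", "TFIIA"]),
    ("B", ["A", "TFIIB"]),
    ("C", ["B", "RNA Pol II", "TFIIF"]),
    ("D", ["C", "TFIIH", "TFIIE"]),
]


def validate_order(steps):
    """True if every step only combines factors or complexes already available."""
    available = set(FREE_FACTORS)
    for name, inputs in steps:
        if any(item not in available for item in inputs):
            return False
        available.add(name)  # the new complex can feed later steps
    return True


print(validate_order(ASSEMBLY_STEPS))                  # True
print(validate_order(list(reversed(ASSEMBLY_STEPS))))  # False
```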


4.4.3.6.2 Elongation

Elongation represents the transcription stage where the DNA codes are processed into RNA codes. Chain elongation is also a key step in transcriptional control and provides many opportunities for enhanced encryption options. In vivo, transcriptional elongation is a process that contains temporal discontinuities, delays, and pauses. Both positive and negative acting elongation factors exist and the underlying mechanisms are highly varied and not well understood [52]. The transcription elongation process is a subject of intense research with many unknowns. Nevertheless, it is a process of protein-protein, protein-DNA, and DNA-RNA interactions, which can be coded into this protocol. The DNA template, the nascent mRNA transcript, and the RNA Polymerase II complex form a temporary hybrid structure from which emerges the nuclear pre-mRNA transcript. The resulting output strings contain the representation of the codons of the cipher mRNA.

From a coding perspective, a baseline set of simplifying assumptions and a small set of protein-protein, protein-DNA and DNA-RNA interactions have been defined to produce the transcriptional elongation function as shown in figure 4-42. In figure 4-42, a transcriptional elongation complex consisting of RNA Polymerase II, TFIIS and TFIIF at transcription start site in the imitator (INR). In this model, DNA template is assumed to be in motion under the transcriptional elongation complex. In figure 4-43, the complex recruits the appropriate complimentary nucleotide. In figure 4-44, RNA Polymerase II

Carboxy Terminal Domain (CTD) and associated factors DSIF and PTEF-b are recruited.

PTEF-b is a positive elongation factor and its counterpart NTEF could be coded as a negative elongation factor. mRNA nucleotide addition cycle proceeds from this point.


The nascent, uncapped mRNA transcript grows codon by codon until reaching figure 4-44. This sets off poly-A signaling, and all of the codes needed for both the 5’ m7-G cap and the 3’ polyadenylation code (AAA…) are recruited in one step.
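The nucleotide addition cycle described above can be sketched in a few lines of Python. This is only an illustration of the in vivo analogue under simplifying assumptions: the template is read as a plain string of A, C, G and T, the elongation factors and joint probability codes are not modeled, and the names `PAIRING`, `elongate` and `to_codons` are hypothetical and do not appear in the protocol.

```python
# Base pairing from a DNA template base to the RNA base that is recruited.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def elongate(template: str) -> str:
    """Grow an mRNA string one nucleotide at a time from a DNA template strand."""
    transcript = []
    for base in template.upper():
        transcript.append(PAIRING[base])  # recruit the complementary nucleotide
    return "".join(transcript)


def to_codons(mrna: str) -> list:
    """Split an mRNA string into complete codons, dropping a trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


mrna = elongate("TACGGCATT")
print(mrna, to_codons(mrna))
```

In the protocol itself each recruited nucleotide would be represented by a code rather than a literal base; the sketch only shows the order of operations.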

4.4.3.6.3 Termination

In the termination step, the transcription process is stopped and the cipher mRNA is terminated at the 3’ UTR with a code from the type of poly-A codes. This is more secure than terminating the sequence with a literal poly-A string. Transcription continues until the transcription stop codon is reached. The transcription elongation codes are then no longer needed. Figure 4-45 shows a new complex that assists in truncating the code at the 3’ cleavage site and in properly terminating the 5’ UTR end with a code from the 5’ m7-G cap codes and the 3’ end with the 3’ polyadenylation code (AAA…). After completion of transcription, the cipher mRNA has a structure in the form of figure 4-46. Note that the translation start and stop codons are within the exons, and the ciphergene can be coded such that there will exist a 5’ UTR and a 3’ UTR. The section between the UTRs is referred to as the Open Reading Frame (ORF). If a 5’ UTR and a 3’ UTR are desired, the plain text message should be coded such that both ends of the message are padded with nonsense text. The untranslated regions will be useful for coding that utilizes RNA interference as a method of post-transcriptional regulation.
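The 5’ UTR / ORF / 3’ UTR layout can be illustrated with a short Python sketch. It assumes the standard start codon AUG and the three standard stop codons, and it works on literal nucleotide strings rather than on the coded representations used by the protocol; `split_transcript` is a hypothetical helper name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(mrna: str, start: str = "AUG"):
    """Return (five_prime_utr, orf, three_prime_utr) for an mRNA string.

    The ORF runs from the first start codon through the first in-frame
    stop codon; everything outside it is treated as untranslated padding.
    """
    begin = mrna.find(start)
    if begin < 0:
        raise ValueError("no start codon found")
    for i in range(begin, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:begin], mrna[begin:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


utr5, orf, utr3 = split_transcript("GGCC" + "AUGGCUUAA" + "CCAAA")
print(utr5, orf, utr3)
```

Padding both ends of the plain text with nonsense text, as described above, corresponds to supplying non-empty strings on either side of the ORF.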


Figure 4-46. Format of mRNA code after completion of transcription

4.4.3.6.5 The Spliceosome and Alternative Splicing

In vivo, the output would consist of a pre-mRNA, which would be subjected to processing by the spliceosome. The spliceosome is an extensive complex of non-coding RNAs and RNA binding proteins that accurately ligates the exon ends of the mRNA together and excises the introns from the mRNA transcript. Since this coding scheme is based upon eukaryotic processes, the introns are Group III introns. The group III introns are excised from the code and the exons to be translated are flagged. The function of alternative splicing also exists, such that a ciphergene can have alternative exons spliced together, allowing different proteins to be coded from the same gene. If the sender includes nonsense or confusing plaintext in the original message, alternative splicing can be used to either process or exclude such text. The sender can also include plaintext for exon enhancer sequences, which could be used in processing alternative message constructs. After spliceosome processing, the mRNA is ready for translation.

By specifically including the spliceosome in the translation process for encryption, the path is open for increasingly higher levels of confidentiality through increasing the fidelity of the encryption process relative to the in vivo counterpart. This fidelity would come through the use of codes for spliceosomal small nuclear RNAs.
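A minimal Python sketch of intron excision and alternative splicing follows. It assumes the convention used later for the gene listing, in which exons are written in upper case, and it further assumes that introns are written in lower case. The spliceosomal small nuclear RNA codes are not modeled, and `split_exons` and `splice` are hypothetical names.

```python
import re


def split_exons(pre_mrna: str) -> list:
    """Return the exons (upper-case runs) of a pre-mRNA string, in order."""
    return re.findall(r"[A-Z]+", pre_mrna)


def splice(pre_mrna: str, keep=None) -> str:
    """Excise introns and ligate the selected exons.

    keep is an iterable of exon indices; None keeps every exon, while a
    subset models alternative splicing of the same ciphergene.
    """
    exons = split_exons(pre_mrna)
    if keep is None:
        keep = range(len(exons))
    return "".join(exons[i] for i in keep)


pre = "AUGGCCguaaguccagGAUUCGguacgcagUUCUAA"
print(splice(pre))               # constitutive splicing: all three exons
print(splice(pre, keep=[0, 2]))  # alternative splicing: skip the middle exon
```

Skipping an exon in this way is one possible reading of how nonsense or confusing plaintext could be excluded from the translated message.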

4.4.3.6.6 Transcriptional Regulation via forms of transcriptional activation

Transcriptional regulation can be achieved through recruitment of proteins as homodimers and heterodimers. Figure 4-47 shows two different transcription factor binding scenarios that perform transcriptional regulation.

Figure 4-47. Transcriptional activation at PORE and MORE DNA regulatory sites

In the top scenario, the code for Oct-1 dimer (a domain of the POU activation protein) binds to the MORE DNA regulatory sequence code. However, this operation only results in weak transcription (low probability of transcription). In the bottom scenario, the code for the Oct-1 dimer binds to the PORE regulatory sequence. This creates a cryptographically difficult problem for attackers with limited knowledge of the encryption scheme.

Coding for dimers, or n-mers in general, is done by creating n joint probability matrices for the monomer species. The transcriptional network for figure 4-47 would contain coding for:

• Oct-1 ∩ Oct-1

• OBF-1 ∩ OBF-1

• (Oct-1 ∩ Oct-1) ∩ (OBF-1 ∩ OBF-1)

4.4.3.6.7 microRNA mediated mRNA post-transcriptional regulation

The transcriptional regulation coding processes previously described are mostly available at the algorithmic level. These processes occur in the nucleus of eukaryotic cells and are difficult to access, control and monitor. It would be extremely difficult to incorporate the necessary processes into a network biological security apparatus. microRNA mediated regulation of mRNA processing, although extremely complex, has processes that occur in the cytoplasm and could be incorporated into the operations of a ciphercolony in a Network BioID.

4.4.3.6.8 The role of non-coding DNA and RNA interference in developing a genomic cryptographic protocol

The power of this protocol to produce cryptographically hard genomic and proteomic codes is enhanced by the ability to perform functions beyond the central dogma of DNA to RNA to protein. These protocols can utilize RNA products and also introduce post-transcriptional and post-translational modifications to the gene products. In eukaryotes, most DNA does not code for proteins at all. Most of the DNA codes for non-coding RNA or is left untranscribed in the heterochromatin. In figure 4-48, the x-axis is by organism and applies to both chart A and chart B. In chart A, the ratio of the total bases of non-protein-coding DNA to the total bases of genomic DNA is shown. In chart B, the amount (in megabases) of protein-coding DNA per genome for each species is shown. Whereas chart A shows a trend towards greater complexity with higher ratios of non-coding DNA, with humans near the top, chart B puts humans in the middle of the pack (behind rice) in terms of protein-coding DNA. One of the goals of the research is to start to tap into this complexity in the non-coding DNA region (especially non-coding RNAs) for cryptographic purposes. One important class of non-coding RNA is microRNA, which post-transcriptionally regulates mRNA expression. A system for modeling microRNA mediated regulation of mRNA using information theory and public key cryptography has been developed and accepted for publication [53] as a part of the dissertation research.

While the primary purpose of the model is to study microRNA:mRNA mediated regulation, the scoring function of the model will provide inputs to alter the levels of expression of mRNA post-transcriptionally.


4.4.3.7 Translation

Translation in the code occurs by the same algorithmic processes as transcription. In the case of translation, a translation regulatory network codes the joint probability matrix for all of the protein-RNA and protein-protein interactions that are required. As was true for transcription, a fixed set of protein sequences is utilized, and the encryption hardness comes from the fact that only the sender and receiver know which protein codes in a particular message may be used, and only the sender and receiver know the degree of binding between the required elements that must occur for a successful translation process. Synonymous codons are encrypted such that they can be individually identified.

The process of translation is heavily dependent upon non-coding RNAs such as transfer RNA (tRNA) and ribosomal RNA (rRNA). In this protocol, the functions of tRNA are utilized.
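One way to keep synonymous codons individually identifiable is to pair each amino acid symbol with an index that records which synonymous codon produced it. The Python sketch below is an assumption about how such a substitution could look, not the coding actually used by the protocol. `CODON_TABLE` is only an excerpt of the standard genetic code, and `build_codebook`, `encode` and `decode` are hypothetical names.

```python
# Excerpt of the standard genetic code (not the full 64-codon table).
CODON_TABLE = {
    "AUG": "M",
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def build_codebook(table):
    """Map each codon to a unique (amino_acid, index) pair and build the inverse."""
    forward, counts = {}, {}
    for codon in sorted(table):
        amino_acid = table[codon]
        index = counts.get(amino_acid, 0)
        forward[codon] = (amino_acid, index)
        counts[amino_acid] = index + 1
    reverse = {code: codon for codon, code in forward.items()}
    return forward, reverse


def encode(mrna, forward):
    """Substitute each codon of the mRNA with its unique amino acid code."""
    return [forward[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


def decode(codes, reverse):
    """Recover the exact codons, including the synonymous choice, from the codes."""
    return "".join(reverse[code] for code in codes)


forward, reverse = build_codebook(CODON_TABLE)
codes = encode("AUGGCUGCCAAGUAA", forward)
print(codes)
print(decode(codes, reverse))
```

Because the mapping is one-to-one, the receiver can recover the exact codon sequence and not merely the amino acid sequence.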

4.4.3.7.2 tRNA function

Transfer RNA (tRNA) codes read the mRNA codes and mimic the in vivo functions. The tRNA codes are provided in the protocol as fully mature tRNA. The major functions of the tRNA are:

• Decode the mRNA codon.

• Covalently bind the proper amino acid that corresponds to the associated RNA codon and transport that amino acid to the ribosome for incorporation into the protein chain.

Figure 4-49 is a diagram of the functional parts of tRNA. Although numerous exceptions exist in nature, the rule of one tRNA per amino acid code will be used. Different tRNA codes are also utilized to provide a range of translational process options. These options permit recognition of alternative codons, e.g. AUA for AUG.

Figure 4-49. tRNA Structure

Figure 4-49 displays the tRNA structure with acceptor arm activity binding amino acids. The amino acid is initially activated by an aminoacyl-tRNA synthetase, which binds a molecule of AMP to the carboxy terminal of the amino acid. The resulting adenylated amino acid complex binds to the 3’ terminal adenosine on the tRNA, releasing the AMP moiety and leaving an aminoacyl-tRNA complex. The anticodon arm binds the complementary codon from the mRNA [78].


4.4.3.7.1 Translational regulatory network at Initiation.

Figure 4-50 displays the translational regulatory network for a successful initiation phase. The steps occur in the order shown from A through G. Although every message is coded using the same database of proteins, each message is uniquely coded from the pool of types for each protein and individual regulatory networks. For every operation, there is an algorithmic operation of calculation of joint probabilities of types that provides a representation of that operation.

o In step A, the initiator tRNA for methionine is complexed with translation factor protein eIF2 and GTP (guanosine triphosphate).

o In step B, the poly-A tail and 5’ m7G cap are bound to a complex of proteins consisting of PABP, eIF4G, eIF4A and eIF4E. This leads to circularization of the mRNA. In vivo, steps A and B occur simultaneously.

o In step C, the product of step A is complexed with the ribosomal 40S small subunit and translation factor proteins eIF1A, eIF3, and eIF1.

Step D combines the complexes from steps B and C to form the 43S Pre-Transcriptional complex. At this point, scanning for the translation start codon can begin. When the translation start codon is detected, the 43S Pre-Transcriptional complex recruits eIF5 to form the 48S Pre-Transcriptional complex, as shown in step E. eIF5 stimulates release of the initiator tRNA from eIF2, which is followed by dissociation of the proteins not needed for elongation. The remaining complexes are shown in steps F and G. eIF5B will be released prior to the start of elongation.


4.4.3.7.2 Translational regulatory network at Elongation

Three new coding factors are added to provide the coding process with factors similar to those found in ribosomal elongation. These factors are A, P and E. They are coded as subunits of the 60S ribosomal complex. The in vivo process has been modified and simplified for coding purposes. The factors A, P and E operate like a FIFO buffer, as shown in figure 4-51. The process starts with the initiator bound to the P-code and the next codon in sequence bound to the A-code. The purpose of the A, P, and E codes is to error check and authenticate the ‘to be coded’ codon before it is used to elongate the cipherprotein.

At the A-site, coding is jointly authenticated using the protein codes from eEF1A. At the A-site, the coding is jointly authenticated with protein codes from eEF2. After exiting the E-code, the amino acid code is appended to the N-terminal code of the cipherprotein. This process continues until the stop codon is read at the A-code.
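The FIFO behaviour of the A, P and E codes can be sketched as a three-slot pipeline in Python. This is an illustrative simplification: the joint authentication with eEF1A and eEF2 is replaced by a plain codebook lookup, `CODEBOOK` is a hypothetical three-codon excerpt, and the function names are not taken from the protocol.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical excerpt of a codon-to-amino-acid codebook.
CODEBOOK = {"AUG": "M", "GCU": "A", "AAA": "K"}


def check(codon):
    """Stand-in for the A/P/E error check: reject codons missing from the codebook."""
    if codon not in CODEBOOK:
        raise ValueError(f"unauthenticated codon: {codon}")
    return CODEBOOK[codon]


def elongate_cipherprotein(codons):
    """Pass codons through the A -> P -> E slots and build the cipherprotein."""
    stream = iter(codons)
    # Start with the initiator in the P slot and the next codon in the A slot.
    e_site, p_site, a_site = None, next(stream), next(stream, None)
    chain = []
    while a_site is not None and a_site not in STOP_CODONS:
        leaving = e_site
        e_site, p_site, a_site = p_site, a_site, next(stream, None)
        if leaving is not None:
            chain.append(check(leaving))  # appended only after exiting E
    # A stop codon (or the end of input) was read at A: flush E, then P.
    for codon in (e_site, p_site):
        if codon is not None:
            chain.append(check(codon))
    return "".join(chain)


print(elongate_cipherprotein(["AUG", "GCU", "AAA", "UAA"]))
```

A codon that fails the check raises an error before it can extend the chain, which mirrors the stated purpose of authenticating each codon before it is used.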


4.4.3.7.3 Translational Regulation Network at Termination

When a stop codon is read at the A-site, translation release factor eRF-1 is recruited to the E-site, as shown in figure 4-52. After the E-site successfully checks the stop codon, a signal is sent to the elongating cipherprotein to terminate the chain with a C-carboxyl termination code. The cipherprotein translation of the message is then complete.

4.4.3.7.4 Post-translational modification coding, protein-protein interaction

Post-translational modifications involve conformational and chemical changes to proteins after translation. Post-translational modifications greatly increase the diversity of patterns of gene expression [54]. These modifications are also required for proper protein function. In the context of encryption, they greatly increase the cryptographic hardness of this approach. Post-translational modifications occur in the cytoplasm. It is possible to perform these modifications experimentally in a variety of ways in vivo and in vitro.

From a coding perspective, protein codes from the protein codebook will map into post-translational codes in a variety of post-translational codebooks. Coding in the specific area of post-translational modification will require the use of amino acid codes for specificity of the protein-protein interaction (e.g. proper pairing of cysteine residues [79]).

The mechanisms of post-translational modifications required in vivo to transform the translated protein sequence into the proper functioning protein can be utilized. These include:

• Phosphorylation

• Glycosylation

• Ubiquitination

• S-Nitrosylation

• Methylation

• N-Acetylation

• Lipidation

• Proteolysis


4.4.3.8 Network Concepts for A Cryptographic System Using The Principles Of Gene Regulation

4.4.3.8.1 Using Gene Regulatory Networks as a basis of authentication in a Mobile Ad hoc Network (MANET).

Figure 4-53 summarizes an application in which users possessing transcription factor codes and the necessary pre-shared secrets form a MANET. In this case, remote MANET members A, B, D, E, F, and H authenticate candidate member Z by transmitting the transcription factors necessary to activate the ciphergene Z. If Z responds with the proper protein expression code, Z is authenticated. There is a temporal aspect to this form of authentication, as all codes must be received and responded to within a specified time window. No users need be aware of the specific functions of the codes being sent.


Figure 4-53. MANET authentication via the general transcription machinery specified in a gene transcriptional regulatory network

4.4.3.8.2 Proteomic Authentication Messages.

In figure 4-54, the IT security official receives a remote request for access to network assets from a remote user. The security official sends the user a message coded as a protein sequence by a regulatory network, using a message-specific set of protein-DNA joint distribution codes and a source coding scheme based upon a keyed hash function tied to a specific genome. The user successfully decrypts the message and returns the plaintext (which could be encrypted if desired) to the IT security official. The IT security official then sends a set of access credentials encrypted with a different protein and a different genome for the keyed hash code. The user successfully decrypts the message to gain access to the network. In this scheme, an attacker needs multiple levels of information at the genomic and proteomic levels to be able to decode the message by cryptanalytic means alone.

Figure 4-54. Protein Coded Authentication Challenge

4.4.3.8.3 Integration of genomic and proteomic protocols into legacy security networks.

Paragraphs 4.4.3.8.1 and 4.4.3.8.2 present a view of the isolated use of the protocols within an existing legacy network. This view is important because potential users will be risk averse with regard to adoption of new concepts. Given the large investment in infrastructure and manpower associated with the legacy security protocols, this is reasonable. Figure 4-55 reiterates the view of how to incorporate this technology into the legacy networks on an incremental basis.

Figure 4-55. Alice and Bob communicate using genomic network security

Two new concepts are introduced:

• The Network BioID

• The Ciphercolony

(a) Network BioID. The Network BioID interfaces to a computer network and performs the full suite of authentication and confidentiality functions required by the protocol. It exchanges data with other Network BioIDs. It is a genomic and proteomic firewall. The heart of the Network BioID is the ciphercolony.

(b) Ciphercolony. A ciphercolony contains a combination of live and virtual inhabitants which maintain a collective pattern of gene expression.


A notional architecture for the Network BioID is shown in figure 4-56.

Figure 4-56. Network BioID architecture.

As previously stated, Alice and Bob can continue to use their legacy IPSec unabated. Alice and Bob can also perform standard IPSec exchanges using gene expression data as keys. Alice can send Bob a protein-coded message, which directs Bob’s ciphercolony to express a protein, take a picture and send the result back to Alice for authentication. Bob can send Alice a message encrypted in a protein code, and Alice can return the plaintext associated with the unique plaintext and pre-shared genomic secret. Alice can send Bob an RNA message, and Bob can return the unique protein message associated with a pre-shared genomic secret. Bob can send Alice a message with the patterns of gene expression of 14 microbes in his ciphercolony, and Alice can take that data, apply it to her ciphercolony, and reply with new patterns of expression of 25 genes in her ciphercolony. Then Bob can take that data, apply it to his ciphercolony, and return the new pattern of expression, and so forth. Over time, Alice and Bob’s Network BioIDs learn to recognize each other through the patterns of gene expression in their ciphercolonies. Eve can only impersonate Bob or Alice by knowing a very long list of state information. This becomes true for every Network BioID on the network.

The ciphercolony patterns of gene expression can be stored in the Network BioID in the form of probability mass functions of the genes in a regulatory network. The distribution of masses changes with each state of gene expression. These changes of states can be expressed as transition matrices in a non-stationary Markov chain, or in a non-Markovian, or other suitable model for progression of states of gene expression. In fact, the use of external stimuli to alter patterns of gene expression is required to produce confusion and diffusion in the ciphercolony state of gene expression.
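As a rough illustration of the non-stationary Markov chain option, the Python sketch below propagates a probability mass function through a sequence of transition matrices that changes from step to step. The three-gene network, the numerical values and the names `step`, `T_REST` and `T_STIMULUS` are assumptions made for the example only.

```python
def step(pmf, transition):
    """Advance a pmf one state: p_next[j] = sum_i pmf[i] * transition[i][j]."""
    n = len(pmf)
    return [sum(pmf[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical row-stochastic transition matrices for a three-gene network.
T_REST = [[0.9, 0.1, 0.0],
          [0.2, 0.7, 0.1],
          [0.0, 0.3, 0.7]]
T_STIMULUS = [[0.5, 0.4, 0.1],
              [0.1, 0.5, 0.4],
              [0.3, 0.1, 0.6]]

pmf = [0.5, 0.3, 0.2]
# Non-stationary chain: the matrix applied depends on the time step, which
# models an external stimulus altering the pattern of expression.
for transition in (T_REST, T_STIMULUS, T_REST):
    pmf = step(pmf, transition)
    print([round(p, 4) for p in pmf])
```

Each printed vector is one stored state of the distribution of masses; the sequence of matrices plays the role of the external stimuli.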

4.4.3.8.4 Network Firewalls via Patterns of Gene Expression

A process for developing cryptographic codes for transcriptional and translational expression has been described in the previous paragraphs. A network security hierarchy can now be established by developing patterns of ciphergene expression. Messages processed through colonies of ciphergenes create gene expression products. Groups of related ciphergenes respond to the gene expression products with altered patterns of ciphergene expression. Colonies of ciphergenes interact via communications channels to expand the network of patterns of gene expression. Authenticated network nodes can recognize each other by patterns of gene expression in each authorized node. Ultimately, the ciphergene patterns of gene expression are augmented by colonies of live organisms, which exchange in vivo patterns of gene expression with the ciphergenes in their own network node and other network nodes. A hierarchical series of transition probability matrices expresses the states of the patterns of gene expression. Security is partly based upon the large number of combinatorial products of gene expression available. The products of gene expression may be sampled and coded into encryption algorithms.

Given a sample size of 100,000 gene expression products from a colony of ciphergenes, sampling 10 products at a time yields, as shown in equation 4.65,

$$\binom{100000}{10} \tag{4.65}$$

ciphertext input combinations. Some of the gene expression products may be naturally derived based upon actual assay samples from the colonies of ciphergenes and some may be derived algorithmically. By using a combination of natural and algorithmic gene expression products, there will generally be no coding latency due to the time it takes to perform biochip-based assays.
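The size of this count can be checked directly. The short Python sketch below assumes that equation 4.65 is the binomial coefficient for choosing 10 products out of 100,000 without regard to order, which is an interpretation of the text above; the variable names are illustrative.

```python
import math

SAMPLE_SIZE = 100_000   # gene expression products available
DRAW = 10               # products sampled at a time

combinations = math.comb(SAMPLE_SIZE, DRAW)
print(combinations)
print(f"about {combinations:.3e} combinations")
print(f"about {math.log2(combinations):.1f} bits")
```

If the order of the sampled products also mattered, `math.perm(SAMPLE_SIZE, DRAW)` would give the correspondingly larger count.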

By way of example, assume that a colony of ciphergenes has two organisms, where each organism is capable of expressing four gene expression products. In this case, the four gene expression products are four proteins. The first organism has patterns of expression A represented by the random or pseudorandom variables { }. The second organism has patterns of expression B represented by the random or pseudorandom variables { }. These random variables represent the quantity of a gene expression product when fully expressed (i.e., the maximum amount that the organism would produce in a homogeneous culture). The union of A and B represents the complete pattern of gene expression in the colony of ciphergenes. A and B produce the gene expression products (proteins) with probabilities as predicted in the random variables p(a) and p(b). These probabilities may change based upon the state of the expression of A and B. C represents the total protein from A and B expression. The patterns of gene expression may be modified by post-transcriptional forms of regulation as represented by F and G. The presence of proteins from the first organism affects the levels of expression of proteins in the second organism, and vice versa, as represented by state matrix H. For this example (in equations 4.66 through 4.74):

{ } (4.66) { } (4.67) { } (4.68)

( ) {( ) ( ) ∑ } (4.69)

( ) {( ) ( ) ∑ } (4.70)

( ) {( ) ( ) ∑ } (4.71)

[ ] (4.72)

[ ] (4.73)

[ ] (4.74)

A MATLAB simulation of thousands of rounds of gene expression was performed for this example. Four successive states are shown in figure 4-57 through figure 4-60 at different time intervals. At each iteration, random probabilities (forced by changes in stimuli to gene expression) P(A) and P(B) are applied to the gene expressions A and B, followed by application of post-transcriptional expression state matrices F and G. The sum of the expressed products by A and B directly affects the expression of both, so random probabilities P(C) are applied to the sum of proteins, followed by post-transcriptional expression state matrix H. The result is dynamic changes in the observed levels of expression of the eight proteins (four for the first organism and four for the second organism). In this example, A and B can exchange the expression data for later use in message exchanges for authentication. The output of each round of simulation is a state vector that describes the state of protein expression in the colony of ciphergenes.

The state vector is time stamped and translated into a message encrypted by any desired protocol. The state vector is retained by the originator and transmitted to the other colonies of ciphergenes in the network. The set of all time stamped state vectors defines the state of protein expression for the history of a colony of ciphergenes.
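The structure of one simulation round can be sketched in Python. This is not the MATLAB code used for the figures: every number below is a placeholder, the matrices `F`, `G` and `H` are simple diagonal stand-ins, and treating the "sum" of A and B as the eight combined protein levels is an assumption made so that the output matches the eight proteins described above.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def scale(probabilities, quantities):
    """Apply per-product expression probabilities to quantities."""
    return [p * q for p, q in zip(probabilities, quantities)]


def diagonal(n, value=1.0):
    """Build an n x n matrix with a constant diagonal (placeholder state matrix)."""
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


def simulate_round(t, a_max, b_max, f, g, h, rng):
    """Return one time-stamped state vector of eight protein levels."""
    a = mat_vec(f, scale([rng.random() for _ in a_max], a_max))  # P(A), then F
    b = mat_vec(g, scale([rng.random() for _ in b_max], b_max))  # P(B), then G
    c = a + b                                                    # combined products
    c = mat_vec(h, scale([rng.random() for _ in c], c))          # P(C), then H
    return {"t": t, "state": c}


# Placeholder maximum expression quantities and state matrices.
A_MAX = [10.0, 8.0, 6.0, 4.0]
B_MAX = [9.0, 7.0, 5.0, 3.0]
F, G, H = diagonal(4, 0.9), diagonal(4, 0.8), diagonal(8)

rng = random.Random(2013)  # a fixed seed is used only to make the sketch repeatable
history = [simulate_round(t, A_MAX, B_MAX, F, G, H, rng) for t in range(1, 6)]
for record in history:
    print(record["t"], [round(x, 3) for x in record["state"]])
```

A fixed seed is the opposite of what a deployed ciphercolony would want; in the scheme described here the randomness would come from stimuli applied to the colony.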

No two users will have an identical colony of ciphergenes. This is because even if two users started with identical live flora, they will have divergent patterns of gene expression. Additionally, different colonies of ciphergenes will undergo different forms of expression modifications to force the patterns of gene expression to diverge from a steady state. Furthermore, the algorithmic patterns of gene expression of each colony of ciphergenes may be initialized with random seeds derived from a pattern of gene expression from a live colony (either within the colony of ciphergenes or externally derived).

Users can authenticate each other by requesting one or more time stamped state vectors, decrypting the state vector(s) and comparing the decrypted state vector(s) to a previously stored value. Gaps in the time record can also be used for authentication. If a user knows of a time window in which no state vectors were distributed, the gap in the record can be used as an authentication tool. In another example, figure 4-61 shows the exchange of time stamped vectors between User 1 and User 2. In this example, User 1 and User 2 have colonies of biological and ciphergenes that express proteins A and B. For User 1, at time intervals 6 and 7, no expression data is given, resulting in a gap in the record. For User 2, there is a gap in time intervals 4 through 8.

Figure 4-61 depicts the uncoded data. In a real application, the data would undergo source and channel coding and encryption. Source coding to achieve a target entropy code length could be applied for data compression or unique identification of code words longer than the minimum entropy code. Such coding will enhance error detection and correction, as well as authentication.

Figure 4-62 depicts an authentication handshaking message flow between User 1 and User 2. User 1 and User 2 have previously exchanged colony of ciphergenes state information for proteins A and B. After exchanging identities via a legacy (e.g., secure sockets layer (“SSL”)) authentication process, User 1 requests that User 2 send the previously exchanged User 1 state vector for time stamp t = 1. User 2 responds and requests that User 1 send the previously exchanged User 2 state vector for time stamp t = 9. User 1 and User 2 complete the handshaking process. In the case of a compromised network, the colony of ciphergenes-based system may provide a temporary means to continue network operations until the threats to the legacy systems have been neutralized.
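The handshake and the use of gaps can be sketched as a lookup against stored histories. The Python below is a simplified illustration: encryption, source coding and the legacy identity exchange are omitted, the state vectors are invented values, and the names `answer_challenge`, `verify` and `handshake` are hypothetical.

```python
def answer_challenge(copy_of_peer_history, t):
    """Return the peer's state vector for time stamp t; None marks a gap."""
    return copy_of_peer_history.get(t)


def verify(own_history, t, answer):
    """Compare an answer against the originator's own record for time stamp t."""
    return own_history.get(t) == answer


def handshake(u1_own, u1_copy_of_u2, u2_own, u2_copy_of_u1, t1, t2):
    """User 1 challenges User 2 at t1, then User 2 challenges User 1 at t2."""
    user2_ok = verify(u1_own, t1, answer_challenge(u2_copy_of_u1, t1))
    user1_ok = verify(u2_own, t2, answer_challenge(u1_copy_of_u2, t2))
    return user2_ok and user1_ok


# Invented (protein A, protein B) levels; User 1 has a gap at t = 6, 7 and
# User 2 has a gap at t = 4 through 8, as in the example above.
USER1 = {t: (1.5 * t, 0.5 * t) for t in range(1, 11) if t not in (6, 7)}
USER2 = {t: (2.0 * t, 0.25 * t) for t in range(1, 11) if not 4 <= t <= 8}

print(handshake(USER1, dict(USER2), USER2, dict(USER1), t1=1, t2=9))
print(verify(USER1, 6, answer_challenge(dict(USER1), 6)))  # a known gap also matches
```

An impostor holding a forged or incomplete copy of the history fails the comparison, including when it returns data for a time stamp that should be a gap.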

The biosecurity system can operate in parallel with legacy security protocols. Samples of the gene expression products may be used to generate inputs for network-to-network security associations. These inputs may include inputs to public/private key pairs, key encryption keys, nonces, etc.

Figure 4-57. Pattern of Expression at t=12

Figure 4-58. Pattern of Expression at t=22


Figure 4-59. Pattern of Expression at t=71

Figure 4-60. Pattern of Expression at t=113


Figure 4-61. Exchange of ciphercolony state information about Proteins A and B between User 1 and User 2.

Figure 4-62. Handshaking protocol between User 1 and User 2.

4.4.3.8.5 Secure versus non-secure ciphercolony implementations

Referring to figure 4-63, let each bar represent the level of expression of a group of related genes. Users A and B are passing state of expression information back and forth, but the patterns become periodic as the ciphercolony inhabitants pass through the lag, log, stationary and death phases. The height of the bars corresponds to quantities of gene product expression. Note that they form a repeating pattern, in which the gene expression in the 4th time interval is the same as in the 1st time interval. In addition, assume that this pattern continues to repeat every 4th time interval. An attacker could easily discern this pattern and use it to impersonate either A, B or both. Much like a random number generator that produces the same random number sequence when provided with the same seed, this form of the concept provides no additional security.
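The weakness is easy to demonstrate: any observer who records the exchanged states can test them for a repeat. The Python sketch below, with invented expression levels and a hypothetical function name, returns the smallest period of an observed sequence, or `None` when no repetition is visible in the window.

```python
def smallest_period(states):
    """Return the smallest p such that states[i] == states[i + p] throughout.

    Only periods seen at least twice in the window (p <= len // 2) are
    reported; None means no repetition was detected.
    """
    n = len(states)
    for p in range(1, n // 2 + 1):
        if all(states[i] == states[i + p] for i in range(n - p)):
            return p
    return None


# Invented expression levels for two gene groups per time interval.
periodic = [(3, 1), (2, 4), (5, 2), (3, 1), (2, 4), (5, 2), (3, 1)]
modulated = [(3, 1), (2, 4), (5, 2), (4, 4), (1, 3), (5, 5), (2, 2)]

print(smallest_period(periodic))   # the 4th interval repeats the 1st
print(smallest_period(modulated))
```

The same check could be run by the users themselves to detect a colony that is becoming repetitious.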

Figure 4-63. Non-secure ciphercolony implementation


In a secure implementation, such as figure 4-64, the ciphercolonies have a heterogeneous colony of eukaryotes and prokaryotes, and the patterns of gene expression are constantly being modulated by external stimuli. This creates opportunities for new groups of genes to be expressed. The patterns do not repeat, as evidenced by the new patterns and the height of the bars. The stimuli can be applied independently of each other, or one user can inform another user that its patterns are becoming repetitious, after which that user must either alter its behavior or be removed from the network.

Figure 4-64. Secure Ciphercolony implementation


CHAPTER 5 - GENOMIC AND PROTEOMIC ENCRYPTION/DECRYPTION PROCESSES AND SIMULATION

5.1 Process Overview.

The following figures in paragraph 5.1 outline the general processes that are used. The example simulation and results shown in paragraph 5.2 provide the details of how each process is implemented. Figure 5-1 summarizes the basic processes of the genomic and proteomic algorithms. Messages can start as plain text or a DNA sequence, and then undergo encryption processes up to a maximum level of ciphertext in the form of a protein sequence with all of the attendant regulatory sequence information.9 The regulatory sequence information for transcription and translation is stored in the form of codes that represent the mutual information between interacting sequences, such as a DNA regulatory sequence and a protein transcription factor code. Figure 5-2 summarizes the data types for the genomic and proteomic protocols used at all levels of the process. Figure 5-3 summarizes the progression of the structure of the ciphertext through the encryption process. There are no fixed input block sizes with this protocol. Figure 5-4 summarizes the processes that transform coding sequences into types with appropriate codes. These functions occur at levels 2, 3A, 3B, and 3C. This also summarizes the supporting structural information for each gene in the ciphercolony. Figure 5-5 summarizes the operations that are performed on coded gene sequences at level 1. Figure 5-6 summarizes the transcription operations performed to code mRNA sequences from gene sequences at level 3B. Figure 5-7 summarizes the translation operations performed to code protein sequences from mRNA sequences in level 3C.

9 Although the examples and data are based upon the manipulation of text characters, any bit patterns can be encrypted and decrypted so long as the patterns can be reduced to a consistent set of representations which would be referred to as characters.
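For readers who want to see the quantity behind these codes, the following Python sketch computes the mutual information, in bits, of a joint probability matrix using the standard definition I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]. How the protocol turns this value into a stored code is not shown here; the matrices and the function name are illustrative assumptions.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits of a joint probability matrix (list of rows)."""
    p_x = [sum(row) for row in joint]        # marginal of the row variable
    p_y = [sum(col) for col in zip(*joint)]  # marginal of the column variable
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                      # zero cells contribute nothing
                total += p * log2(p / (p_x[i] * p_y[j]))
    return total


# Two toy joint distributions between a regulatory-sequence symbol (rows)
# and a transcription-factor symbol (columns).
independent = [[0.25, 0.25], [0.25, 0.25]]
fully_coupled = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(fully_coupled))
```

Independent sequences share no information, while perfectly coupled binary symbols share one full bit; real joint matrices fall between these extremes.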


Figure 5-1. Genomic and Proteomic Flowchart for Encryption and Decryption through all levels of the protocol


Figure 5-2. Data Types for the Genomic and Proteomic Protocols


Figure 5-3. Progression of the structure of the ciphertext.


Figure 5-4. The Transformation of coding sequences into types.


Figure 5-5. Operations performed on coded gene sequences


Figure 5-6. Coding and Decoding of mRNA sequences

Figure 5-7. Coding and Decoding of Protein sequences


5.1.1 General Information about the protocols.

• Every gene sequence resides in a ciphercolony database.

• Every gene sequence is indexed by a ciphergene ID.

• The ciphergene ID points to all of the features unique to the expression of the gene. It is the single link to all of the information necessary to process and regulate transcription and translation for a given gene and message.

• Each output level of the protocol carries all the levels beneath it in its payload.

• Every gene sequence possesses the following attributes:

    o Matrix F, which contains the starting location of each Type in the gene along the diagonal.

    o Matrix G, which contains a probability of expression for the gene in a given state. The number of states is given by the number of diagonal entries in G. F and G are square and the same size.

    o A matrix C, which is the product of F and G.

    o Encryption matrices $E_1, E_2, \ldots, E_n$ that operate on C, and decryption matrices $E_1^{-1}, E_2^{-1}, \ldots, E_n^{-1}$ that return C.

    o A series of regulatory networks that describes the interactions with proteins and other nucleic acids necessary for all the processes within this protocol.

    o One or more Types (the Types have been previously described in paragraph 4.4).

    o Each Type possesses the following attributes:

        ▪ A probability mass function to derive a code to represent each Type as utilized by the ciphergene.

        ▪ A position in a regulatory network to describe its relationship to the other Types required for transcription or translation of the ciphergene. Each type-to-type relationship is a joint event.

        ▪ A joint probability matrix with its mutual information to other Types required for transcription and translation using the joint event.

    o For every joint event, a code is derived from the joint probability matrix and the coding of the Types. This code is typically much longer than either of the codes for an individual Type in a joint event.

    o For sequences that are converted from a DNA message to a DNA sequence or a DNA message to an mRNA sequence (and vice versa), there exists a coding process of ring subtraction over a subset of integers and an inverse process of ring addition over a subset of integers.

    o For sequences that are converted from mRNA to protein (and vice versa), there exists a substitution process for selecting the amino acid code from a triplet of mRNA codes (codon) and a reverse substitution for recovering the codon from the amino acid code. The synonymous codons are coded uniquely.


5.1.2 Source Data

The genomic and proteomic protocols described herein can be incorporated into the structure of any security or transmission system. They can be packed into TCP/IP segments, transmitted via UDP, or incorporated into TLS/SSL, IPsec, or any other protocol. The example and the data that follow indicate the amount of overhead the protocols require.

5.2 Encryption Process Example.

5.2.1 Example Input Message

A 140-character message was encoded using the source coding process previously described, using the previously shown Visual Basic software. The chromosome key is

Homo sapiens β-globin transcript HBB-001 (Ensembl designation ENST00000335295).

Table 5-1 shows the plaintext message.

Table 5-1. Plaintext Twitter Length messages

Plain text: Twitter 1 Opening message from user 1 to user 2 Sending an opening request in first of four messages each message 140 characters long format
DNA message length (in nucleotides): 3861
DNA message length (in codons): 1287

The floating point source encoded message is shown in table 5-2.


Table 5-2. Floating point Source Encoded Message

761.7568576 763.8875189 92.7678577 81.9276276 92.8891764 88.7587629 751.8876177 762.9276389 762.9276794 87.8896768 765.9276786 88.7639276 762.7618886 91.8576376 766.9188918 763.8881928 96.8792763 96.9486888 762.7629688 751.7467529 96.7639695 763.7628883 87.8891767 96.7618889

The message length is given in equation 5-1:

Length (Msg1FPsource) = 252 characters * 4.86 bits/char = 1225 bits (5-1)

5.2.2 β-globin gene sequence information

The β-globin gene sequence is shown below in table 5-2, taken from Ensembl.org (adapted from HBB-001 ENST00000335295) [80]. Exons are in upper case letters. The untranslated regions (UTR) are in purple and the coding sequence is in black. Introns are in blue lower case letters. The upstream and downstream flanking sequences are in green lower case letters; the key regulatory sequences are found in these flanking regions.

Table 5-2. β-globin transcript information

Exon / Intron            Start       End         Length
5' upstream sequence     -           -           50
    ggtttgaagtccaactcctaagccagtgccagaagagccaaggacaggta
ENSE00001829867          5,248,427   5,248,160   268
    CGGCTGTCATCACTTAGACCTCACCCTGTGGAGCCACACCCTAGGGTTGGCCAATCTACTCCCAGGAGCAGGGAG
    GGCAGGAGCCAGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTACATTTGCTTCTGACACAACTGTG
    TTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGT
    GGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG
Intron 1-2               5,248,159   5,248,030   130
    gttggtatcaaggttacaagacaggtttaaggagaccaatagaaactgggcatgtggagacagagaagactcttg
ENSE00001057381          5,248,029   5,247,807   223
    GCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTAT
    GGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAA
    CCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGG
Intron 2-3               5,247,806   5,246,957   200
    gtgagtctatgggacgcttgatgttttctttccccttcttttctatggttaagttcatgtcataggaaggggata
    agtaacagggtacagtttagaatggttctgcttttattttatggttgggataaggctggattattctgagtccaagctagg
    cccttttgctaatcatgttcatacctcttatcttcctcccacag
ENSE00001600613          5,246,956   5,246,694   263
    CTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCC
    TATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAA
    TTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCAT
    CTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGCAA
3' downstream sequence   -           -           50
    tgatgtatttaaattatttctgaatattttactaaaaagggaatgtggga
Total                                            1184

Table 5-3 summarizes the length and location of the non-coding and coding positions in the gene.

Table 5-3. Coding and Non-coding base position summary

Non-coding (NC) Base Count   NC position   Coding (C) position   C Count
226                          1-226         227-318               92
130                          319-448       449-671               223
200                          672-871       872-1000              129
184                          1001-1184
Total bases: 740                                                 444

5.2.3 Ciphergene coding process using the β-globin gene as the message carrier

Table 5-4 shows the β-globin sequence as it will be coded into its various elements. In this case, the example from figure 3-7 will be used for guidance. The mapping of the base sequences into the regulatory regions was influenced by [55]. The non-canonical TATA box sequence and location selection in β-globin was influenced by [56].

Table 5-4. β-globin coding elements

Coding Element                            Absolute Position   Type
Non Coding 1                              1 - 10              NC
    GGTTTGAAGT
BRE                                       11 - 17             CP
    CCAACTC
Non Coding 2                              18 - 37             NC
    CTAAGCCAGTGCCAGAAGAG
TATA box                                  38 - 44             CP
    CCAAGGA
Non Coding 3                              45 - 50             NC
    CAGGTA
INR                                       51 - 57             CP
    CGGCTGT
Non Coding 4                              58 - 68             NC
    CATCACTTAGA
Motif Ten Element (MTE)                   69 - 78             CP
    CCTCACCCTG
Non Coding 5                              79 - 82             NC
    TGGA
Downstream Core Promoter Element (DPE)    83 - 88             CP
    GCCACA
Non Coding 6                              89 - 226            NC
    CCCTAGGGTTGGCCAATCTACTCCCAGGAGCAGGGAG
    GGCAGGAGCCAGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTACATTTGCTTCTGACACAACTGTG
    TTCACTAGCAACCTCAAACAGACACC
Protein Coding 1                          227 - 318           PC
    ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGG
    GCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG
Intron 1                                  319 - 448           IN
    GTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAGAAACTGGGCATGTGGAGACAGAGAAGACTCTTG
    GGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCTTAG
Protein Coding 2                          449 - 671           PC
    GCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTAT
    GGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAA
    CCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGG
Intron 2                                  672 - 871           IN
    GTGAGTCTATGGGACGCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAGGAAGGGGATA
    AGTAACAGGGTACAGTTTAGAATGGTTCTGCTTTTATTTTATGGTTGGGATAAGGCTGGATTATTC
    TGAGTCCAAGCTAGGCCCTTTTGCTAATCATGTTCATACCTCTTATCTTCCTCCCACAG
Protein Coding 3                          872 - 1000          PC
    CTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCC
    TATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAA
Non Coding 7                              1001 - 1107         NC
    GCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAAC
    TGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCT
Poly Adenylation Site                     1108 - 1121         PolyA
    AATAAAAAACATTT
Non Coding 8                              1122 - 1184         NC
    ATTTTCATTGCAATGATGTATTTAAATTATTTCTGAATATTTTACTAAAAAGGGAATGTGGGA

The process will be to code the message into the coding bases of the exons, which consist of 444 bases or 148 amino acid codons. The strategy is to break the message into 444-base increments, encode each increment into the β-globin coding bases, and transmit the resulting ciphergene to a receiver who will decrypt the message. Per the process described in 4.4.2, the first 444 bases of the message replace the β-globin protein coding bases. For this example, the sequence is divided into 5 types: NC (Non-Coding), CP (Core Promoter), PC (Protein Coding), IN (Intron), and PolyA (polyadenylation). The position of each type, designated by its starting nucleotide position from table 5-4, becomes F and is shown in table 5-5.

Table 5-5. Type Identifier for β-globin

Type    Absolute Position
NC      1
CP      11
NC      18
CP      38
NC      45
CP      51
NC      58
CP      69
NC      79
CP      83
NC      89
PC      227
IN      319
PC      449
IN      672
PC      872
NC      1001
PolyA   1108
NC      1122

There are nineteen entries in table 5-5. Coding the types of column 1 of table 5-5 reduces the 1184 nucleotide sequence to the 19 x 19 matrix referred to as F in equation 4-38, as shown in table 5-6. Table 5-7 contains the expression codes referred to in equation 4-39 as G. It is a probability of expression profile over 19 different states. Equation 4-40 is satisfied by the product C = F * G. C contains the sequence information and a series of transcription probabilities in compressed form over the 1184 bases and possible transcription probabilities. C is shown in table 5-8.

Table 5-6. Compressed Ciphergene Sequence matrix, F

Table 5-7. Ciphergene expression profile matrix, G

Table 5-8. Ciphergene coding matrix, C


Encryption is performed per equation 5.1:

$L_{\text{out\_ciphergene}} = C_{\text{ciphergene}} * E_1 * E_2$ (5.1)

For demonstration purposes, E1 is implemented using 4- and 5-digit primes on the range (-10957, +10957), and E2 is a random sequence on the range (-7.0479, +10.2522). If desired, E1 and E2 could be generated by a legacy method such as ECC Diffie-Hellman. The encryption keys can be integer, real, or complex, or any combination thereof. Decryption is performed by equation 5.2:

$L_{\text{in}} = L_{\text{out}} * E_2^{-1} * E_1^{-1}$ (5.2)

E1 and E2 are shown in tables 5-9 and 5-10 respectively. The encrypted output is shown in table 5-11. For decryption, the receiver maps the identifiers for each type back into the sets that contain their detailed sequence information, and the sequence is then decoded using the source decoding process previously described in paragraph 4.3.
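The matrix encryption and decryption steps can be sketched as follows. This is a minimal illustration in Python rather than the Visual Basic software referred to in the text. The diagonal of F is taken from table 5-5, but the values of G, the key generator `random_key`, and the helper names are hypothetical placeholders, since the contents of tables 5-7, 5-9 and 5-10 are not reproduced here. Exact fractions are used so that decryption returns C without rounding error.

```python
from fractions import Fraction
import random

# Start position of each typed segment, copied from table 5-5 (diagonal of F).
STARTS = [1, 11, 18, 38, 45, 51, 58, 69, 79, 83, 89,
          227, 319, 449, 672, 872, 1001, 1108, 1122]
N = len(STARTS)

def diag(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    return [[Fraction(v) if i == j else Fraction(0) for j in range(len(values))]
            for i, v in enumerate(values)]

def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def inverse(m):
    """Gauss-Jordan inverse over exact fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_key(n, rng):
    """Hypothetical key: unit lower-triangular times upper-triangular with a
    non-zero diagonal, which is always invertible."""
    lower = [[Fraction(1 if i == j else rng.randint(-9, 9) if j < i else 0)
              for j in range(n)] for i in range(n)]
    upper = [[Fraction(rng.randint(1, 9) if i == j else rng.randint(-9, 9) if j > i else 0)
              for j in range(n)] for i in range(n)]
    return matmul(lower, upper)

rng = random.Random(2013)
F = diag(STARTS)
# Placeholder expression probabilities 0.05, 0.10, ..., 0.95 (not the values of table 5-7).
G = diag([Fraction(k, 100) for k in range(5, 5 + 5 * N, 5)])
C = matmul(F, G)                                         # C = F * G

E1, E2 = random_key(N, rng), random_key(N, rng)
L_out = matmul(matmul(C, E1), E2)                        # equation 5.1
L_in = matmul(matmul(L_out, inverse(E2)), inverse(E1))   # equation 5.2
print(L_in == C)
```

Because the keys are applied on the right in the order E1 then E2, the inverses are applied in the reverse order, E2 first and then E1, as in equation 5.2.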


The length of Level1out_ciphergene is given by equation 5.3:

Length(Level1out_ciphergene) = 3340 characters * 4.86 avg bits/character = 16,233 bits (5.3)

5.2.4 Encryption of Nucleotide base sequences

This process can also be used to encrypt the nucleotide base sequence. In that case, the matrix size is determined by the length of the message and the processing constraints of sender and receiver. The types would be the individual nucleotides (A, C, G, T, etc.). The integer identifiers would be different; an example would be a simple assignment of A = 1, G = 2, C = 3, T = 4. All encryption and decryption processes remain the same as in paragraph 5.2.3. Taking as an example the first 9 bases in the message and the first 9 bases in the β-globin protein coding sequence, an addend that codes the conversion of the β-globin bases to the message bases is defined over the set {1, 2, 3, 4}. The receiver decrypts the addend code and converts the β-globin sequence to the message sequence. For this example, the ring subtraction and addition codes shown in tables 5-13 and 5-14 are used for explanatory purposes. Given gene sequence codes {A, G, C, T} → {1, 2, 3, 4} such that $M_j \in \{1, 2, 3, 4\}$, $A_j \in \{1, 2, 3, 4\}$, and $nt_j \in \{1, 2, 3, 4\}$:

$A_j = M_j - nt_j$ (5-4)

$M_j = A_j + nt_j$ (5-5)

where $A_j$ is the jth addend code, transmitted in place of the jth nucleotide base code in β-globin, $nt_j$ is the jth nucleotide base code in β-globin, and $M_j$ is the jth message nucleotide base code. The ring subtraction table in 5-13 encodes the message at the source and the ring addition table in 5-14 recovers the message at the receiver.

Table 5-13. Coding the Addend from the Message and the Gene sequence

                  Message {Mj}
                  1   2   3   4
Gene {ntj}   1    4   1   2   3
             2    3   4   1   2
             3    2   3   4   1
             4    1   2   3   4

Table 5-14. Decoding the Message from the Addend and Gene sequence

                  Gene {ntj}
                  1   2   3   4
Addend {Aj}  1    2   3   4   1
             2    3   4   1   2
             3    4   1   2   3
             4    1   2   3   4

The DNA sequence representing the message need not be transmitted; only the addend code is sent, as shown in table 5-15. The encrypted addend code is assigned to the level 1 output variable in equation 5.6.

Table 5-15. Coding and Decoding DNA with Addend Codes

β-globin                       A  T  G  G  T  G  C  A  T
Message                        C  C  T  A  C  T  A  G  T
ntj                            1  4  2  2  4  2  3  1  4
Mj                             3  3  4  1  3  4  1  2  4
Aj = Mj - ntj (Addend Code)    2  3  2  3  3  2  2  1  4
Mj = Aj + ntj (Message)        3  3  4  1  3  4  1  2  4

Level1out_message = A (5.6)
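The ring subtraction and addition of equations 5-4 and 5-5 can be sketched in a few lines. This is an illustrative Python sketch, not the software used for the dissertation; the function names are hypothetical. It assumes the convention visible in tables 5-13 and 5-14, where arithmetic is taken modulo 4 and a zero residue is written as 4.

```python
# Integer codes for the DNA bases, as given in the text: A=1, G=2, C=3, T=4.
CODE = {"A": 1, "G": 2, "C": 3, "T": 4}
BASE = {value: base for base, value in CODE.items()}

def ring_sub(m, nt):
    """Addend A_j = M_j - nt_j on {1, 2, 3, 4}; a zero residue is written as 4."""
    return (m - nt) % 4 or 4

def ring_add(a, nt):
    """Message code M_j = A_j + nt_j on {1, 2, 3, 4}; a zero residue is written as 4."""
    return (a + nt) % 4 or 4

def encode(message, carrier):
    """Addend codes that turn the carrier gene bases into the message bases."""
    return [ring_sub(CODE[m], CODE[c]) for m, c in zip(message, carrier)]

def decode(addends, carrier):
    """Recover the DNA text of the message from the addends and the carrier."""
    return "".join(BASE[ring_add(a, CODE[c])] for a, c in zip(addends, carrier))

carrier = "ATGGTGCAT"   # first 9 protein coding bases of the carrier (table 5-15)
message = "CCTACTAGT"   # first 9 bases of the DNA message (table 5-15)

addends = encode(message, carrier)
print(addends)                    # [2, 3, 2, 3, 3, 2, 2, 1, 4]
print(decode(addends, carrier))   # CCTACTAGT
```

Only `addends` is transmitted; a receiver that holds the same carrier sequence recovers the message with `decode`.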


The length of Level1out_message is given by equation 5.7:

Length(Level1out_message ) = 3861 addend characters

* 4.86 bits/character = 18,765 bits (5.7)

The receiver computes the ring addition of the transmitted addend code and the integer representation of the β-globin protein coding sequence, yielding the integer representation of the message, which is translated to DNA text and then hashed as in equation 5.8. An attacker would have to know the gene and the sequence within the gene to apply the addend code and retrieve the DNA text of the message. The matrices are reshaped into linear vectors and the final level 1 output is shown in equation 5.8.

Level1code = H((Level1out_ciphergene || Level1out_message || CID), KT1) (5.8)

which is equivalent to equation 5.9:

= (Level1out_ciphergene || Level1out_message || CID)  KT1 (5.9)

KT is defined in paragraph 4.3.1.2. There will be a key schedule for keys KT1,…,KTq. Each level of encryption utilizing KTq will pull a key from a key schedule pointed to by CID. Assuming the length of a ciphergene ID is 1K bits, the length of the level 1 output is given by equation 5.10:

Length(Level1code) = length(Level1out_ciphergene) + length(Level1out_message) + length(CID) = 16,233 + 18,765 + 1,024 = 36,022 bits (5.10)

This completes the description of the level 1 encryption and decryption process.
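The level 1 overhead figures can be checked with a short calculation. The sketch below is illustrative; the helper name `bits` is hypothetical, and rounding each product up to a whole bit is an assumption, since the text does not state the rounding rule. Under that assumption the calculation reproduces the terms and the total of equation 5.10.

```python
import math

BITS_PER_CHAR = 4.86   # average bits per character used in the length equations

def bits(characters):
    """Length in bits of a string of `characters` symbols, rounded up (assumed rule)."""
    return math.ceil(characters * BITS_PER_CHAR)

source_bits = bits(252)        # floating point source encoded message (equation 5-1)
ciphergene_bits = bits(3340)   # Level1out_ciphergene
message_bits = bits(3861)      # Level1out_message, one addend character per base
cid_bits = 1024                # ciphergene ID, assumed to be 1K bits in the text

level1_bits = ciphergene_bits + message_bits + cid_bits
print(source_bits, ciphergene_bits, message_bits, level1_bits)
# 1225 16233 18765 36022
```

On these figures the level 1 output is roughly 29 times the length of the 1225-bit source encoded message.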


5.2.5 Nomenclature

A ciphergene ID, CID, exists which points to the correct biogene sequence, in this case HBB-001, β-globin, and to the structure of the biogene to be used for the session or message to be operated on. The encryption matrices E1 and E2 form the Gene Sequence Key Encryption Key (GSK). The GSK permits decryption of the locus control region key (Bio-LCR) when indexed with the correct CID. The β-globin sequence can have many variations of structure without changing the underlying sequence, and the CID points to a specific regulatory structure. For example, there are 51 bases upstream of the transcription initiation site. From a security perspective, the coding of the regulatory structures is not strictly limited to their appearance in nature. Table 5-16 provides the total number of combinations of element types that could be achieved using the β-globin sequence.

Table 5-16. Combinations of codes upstream of transcription initiation site

Type           Length   Combinations
Non Coding 1   10       1.28E+10
BRE            7        1.16E+08
Non Coding 2   20       7.75E+13
TATA box       7        1.16E+08
Non Coding 3   6        1.80E+07
INR            7        1.16E+08
Sum                     7.75E+13

5.2.6 Description of Message Traffic between Sender and Receiver.

Assume that Alice and Bob have the necessary components of this system and the chromosome keys, β-globin sequence, and pre-shared secret hash codes. One possible scenario for sending a secure message incorporating legacy protocols is as follows and is shown in figure 5-8:

(a) First, Alice and Bob establish a secure session with their legacy protocols.

Then, Alice sends Bob a ciphergene ID (CID), for a given gene, X, encrypted

with Bob’s public key

(b) Bob decrypts the CID with his private key and returns a sequence, Sn, which is

a sequence of n bases from X. The location of the sequence is a pre-shared

secret between Bob and Alice.

(c) Having established two forms of identity verification between Alice and Bob, Alice transmits the CID for β-globin, encrypted with Bob’s public key. Alice transmits the Level1code. The message is 3861 bases long and β-globin has 444 protein coding bases, so the sequence count wraps around every 444 bases, i.e., the 445th addend code in the sequence is added to the first protein coding base in the β-globin sequence.

(d) Bob decrypts the CID with his private key and uses the CID to retrieve the β-globin sequence details and decryption keys, and then decrypts Level1out. Bob assembles the ciphergene and applies the addend code to retrieve the DNA text from the protein coding regions of the β-globin sequence.

(e) Bob can recover the plaintext using the source decoding process or pass the

ciphergene on to level 2 encryption.

(f) Unless Eve can impersonate Bob or Alice in a man-in-the-middle attack, Eve must have access to keys E1 and E2 as well as knowledge of the biogene regulatory structure to retrieve the plaintext or insert replacement ciphertext.

Eve may be able to mount a mathematical attack on the keys, but knowledge of the regulatory structure of the message is required to completely retrieve the DNA text, and knowledge of the pre-shared secret hash codes is required to retrieve the plaintext from the DNA text (see equation 4.31).
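The wrap-around described in step (c) amounts to a modular index into the protein coding bases of the carrier gene. The following sketch is illustrative only; the function names are hypothetical, and the coding ranges are the absolute positions listed in table 5-3.

```python
# Protein coding ranges of the carrier gene as absolute positions (table 5-3).
CODING_RANGES = [(227, 318), (449, 671), (872, 1000)]
CODING_LENGTH = sum(end - start + 1 for start, end in CODING_RANGES)  # 444

def coding_index(j):
    """1-based index of the coding base paired with the j-th addend code."""
    return (j - 1) % CODING_LENGTH + 1

def gene_position(j):
    """Absolute gene position of the coding base paired with the j-th addend code."""
    k = coding_index(j)
    for start, end in CODING_RANGES:
        span = end - start + 1
        if k <= span:
            return start + k - 1
        k -= span
    raise ValueError("index outside the coding ranges")

print(CODING_LENGTH)        # 444
print(gene_position(1))     # 227
print(gene_position(445))   # 227, the count has wrapped around
print(divmod(3861, CODING_LENGTH))   # (8, 309)
```

The last line shows that a 3861-base message uses eight complete passes over the 444 coding bases plus a partial pass of 309 bases.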

Figure 5-8. Alice and Bob communicating with established β-globin keys

5.2 Level 2 - Coding the General Transcriptional Regulatory Complex

Figure 5-9 contains the gene regulatory network for the general transcriptional complex to be used as the pre-transcriptional complex. It mimics the construction of the in vivo Pre-Initiation Complex without the RNA Polymerase II. RNA Polymerase II will be added at the Basal Transcriptional Complex level.

Figure 5-10 displays the specific interactions to be used for the coding of this example message.


For the messages in this example, figure 5-10 will apply. Using the previously given nomenclature from 4.4.2.5,  ={1,2,3,8}, a 4-tuple alphabet for the gene regulatory sequences G = {BRE, TATA, INR, MTE, DPE} with type g consisting of equation set

5.10:

$P_{g1} = 1/6, \quad P_{g2} = 2/6, \quad P_{g3} = 1/6, \quad P_{g4} = 2/6$ (5.10)

The type of class g, is populated as shown in equations 5.11 and 5.12:

T(g) = {122388,212388,…} (5.11)

 6! T(g )     360 (5.12)  2!

a 4-tuple alphabet of transcription factor codes for the transcription factor proteins TF = {TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH} with type tf

consisting of equation set 5.13:

$P_{tf1} = 1/6, \quad P_{tf2} = 1/6, \quad P_{tf3} = 2/6, \quad P_{tf4} = 2/6$ (5.13)

The type of class tf, is populated as shown in equations 5.14 and 5.15:

T(tf) = {034477,304477 ,…} (5.14)

 6! T(tf )     360 (5.15)  2! 227

The joint probability of each interaction is based upon the intersection of unique codes. Let the codes for the gene regulatory members and proteins be defined as shown in table 5-17.

Table 5-17. Gene Regulatory Codes from  and Transcription Factor Codes from 

Regulatory Sequence   Codes     Transcription Factor   Codes
BRE                   812328    TFIIA                  703447
TATA                  881232    TFIIB                  477034
INR                   328812    TFIID                  770344
MTE                   182823    TFIIE                  730744
DPE                   318282    TFIIF                  473074
                                TFIIH                  344770
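Each code in table 5-17 can be checked for membership in its type class by comparing the empirical distribution of its digits with the stated probabilities. The sketch below is illustrative. The assignment of the probabilities to the individual digits is inferred from the example members of T(g) and T(tf) given above and should be read as an assumption; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

REGULATORY = {"BRE": "812328", "TATA": "881232", "INR": "328812",
              "MTE": "182823", "DPE": "318282"}
FACTORS = {"TFIIA": "703447", "TFIIB": "477034", "TFIID": "770344",
           "TFIIE": "730744", "TFIIF": "473074", "TFIIH": "344770"}

# Symbol probabilities from equation sets 5.10 and 5.13 (digit assignment inferred).
P_G = {"1": Fraction(1, 6), "2": Fraction(2, 6), "3": Fraction(1, 6), "8": Fraction(2, 6)}
P_TF = {"0": Fraction(1, 6), "3": Fraction(1, 6), "4": Fraction(2, 6), "7": Fraction(2, 6)}

def empirical_type(code):
    """Relative frequency of each symbol in the code."""
    counts = Counter(code)
    return {symbol: Fraction(n, len(code)) for symbol, n in counts.items()}

def in_type_class(code, pmf):
    """True when the code's empirical distribution equals the given pmf."""
    return empirical_type(code) == pmf

for name, code in REGULATORY.items():
    print(name, code, in_type_class(code, P_G))
for name, code in FACTORS.items():
    print(name, code, in_type_class(code, P_TF))
```

Every code in the table is a permutation of the same multiset of digits for its alphabet, so each line prints True.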

Table 5-18 lists the joint events for these messages.

Table 5-18. Joint DNA-Protein and Protein-Protein Events required for Pre-Transcriptional Complex of the β-globin messages

Index   DNA-Protein      Index   Protein-Protein
1       BRE ∩ TFIIB      8       TFIID ∩ TFIIA
2       TATA ∩ TFIIB     9       TFIID ∩ TFIIB
3       TATA ∩ TFIID     10      TFIIE ∩ TFIIH
4       INR ∩ TFIIE      11      TFIIB ∩ TFIIF
5       INR ∩ TFIID      12      TFIIF ∩ TFIIH
6       MTE ∩ TFIIH
7       DPE ∩ TFIID

There are 12 joint probability matrices which define the binding criteria as the mutual information between  and . The new type,  is defined such that it conforms to the joint distribution of  and  using the specific codes from table 5-17. These are found in tables 5-19 through 5-30.

Table 5-19. BRE ∩ TFIIB

Table 5-20. TATA ∩ TFIIB

Table 5-21. TATA ∩ TFIID

Table 5-22. INR ∩ TFIIE

Table 5-23. INR ∩ TFIID

Table 5-24. MTE ∩ TFIIH

Table 5-25. DPE ∩ TFIID

Table 5-26. TFIID ∩ TFIIA

Table 5-27. TFIID ∩ TFIIB

Table 5-28. TFIIE ∩ TFIIH

Table 5-29. TFIIB ∩ TFIIF

Table 5-30. TFIIF ∩ TFIIH

Table 5-31 depicts the selection of the 79 prefix-free codes available to sender and receiver.

Table 5-31. Prefix free S in type 

1    84    665   897     66313    893256
2    85    667   6225    66451    893856
3    86    812   6227    66789    898666
4    611   819   6229    66856    898667
5    615   822   6234    66888
7    619   826   6247    66899
9    623   837   6289    871191
61   631   851   6299    871352
63   632   859   6412    871467
65   633   863   6429    872598
67   634   864   6451    872792
69   635   868   6467    883256
81   636   874   6473    883856
82   637   881   66112   888666
83   661   889   66114   888667

Table 5-32 depicts the number of tuples in  needed to represent the joint probability of  and  for the bonding of BRE and TFIIB (BRE  TFIIB)

Table 5-32. Representation of BRE ∩ TFIIB

BRE g   TFIIB tf   s    pg||tf   prefix code   code pointer   database (1,2,…i)
8       4          1    0.028    822           1              3
8       7          2    0.056    81            1              2
8       7          3    0.000    0             0              0
8       0          4    0.000    0             0              0
8       3          5    0.083    1             1              1
8       4          6    0.000    0             0              0
1       4          7    0.000    0             0              0
1       7          8    0.000    0             0              0
1       7          9    0.139    2             2              1
1       0          10   0.000    0             0              0
1       3          11   0.000    0             0              0
1       4          12   0.000    0             0              0
2       4          13   0.083    3             3              1
2       7          14   0.000    0             0              0
2       7          15   0.028    826           2              3
2       0          16   0.056    82            2              2
2       3          17   0.000    0             0              0
2       4          18   0.056    83            3              2
3       4          19   0.000    0             0              0
3       7          20   0.056    84            4              2
3       7          21   0.000    0             0              0
3       0          22   0.000    0             0              0
3       3          23   0.000    0             0              0
3       4          24   0.000    0             0              0
2       4          25   0.000    0             0              0
2       7          26   0.000    0             0              0
2       7          27   0.000    0             0              0
2       0          28   0.028    837           3              3
2       3          29   0.000    0             0              0
2       4          30   0.000    0             0              0
8       4          31   0.000    0             0              0
8       7          32   0.000    0             0              0
8       7          33   0.056    85            5              2
8       0          34   0.000    0             0              0
8       3          35   0.000    0             0              0
8       4          36   0.111    4             4              1

In this table,  represents the joint coding of the BRE regulatory sequence and TFIIB transcription factor codes. Each of the 36 tuples {(8,7),…, (8,0),…(1,4)} is mapped into a sequence number s, which is transmitted in the place of the tuple identification. Pg||tf is the joint probability of the tuple as shown in table 5-32. The code pointer is an integer index into the prefix codes set. It points to the specific code within the relevant set of prefix codes. There is a set of prefix codes of length1, length2, …, lengthi, where lengthi is the set of the longest codes, which is indexed by the last digit. The compressed representation is created using the nomenclature in equations 4.52 through 4.54. For this case, it corresponds to:

113212300400511600700800921100011001200133114001523162217001832190020422100220023002400250026002700283329003000310032003352340035003641

This 135-character string contains all the relevant information about the BRE to TFIIB interaction. For example, the first three digits in the code above, ‘113’, identify the first tuple (8,4), the first code in the prefix database of length-three codes (822), and a code of length 3, giving the codeword ‘113’. The codes for the set of transcriptional regulation at level two are shown in table 5-33.
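The construction of the compressed string can be sketched directly from table 5-32: for each sequence number s, the code pointer and the code length are appended to s, and the 36 pieces are concatenated. The Python below is an illustrative sketch with hypothetical names; the (pointer, length) pairs are copied from the non-zero rows of table 5-32.

```python
# (code pointer, code length) for the tuples of table 5-32 with non-zero
# joint probability, keyed by sequence number s. All other tuples use (0, 0).
NONZERO = {1: (1, 3), 2: (1, 2), 5: (1, 1), 9: (2, 1), 13: (3, 1), 15: (2, 3),
           16: (2, 2), 18: (3, 2), 20: (4, 2), 28: (3, 3), 33: (5, 2), 36: (4, 1)}

def compress(nonzero, n_tuples=36):
    """Concatenate s, code pointer and code length for s = 1..n_tuples."""
    parts = []
    for s in range(1, n_tuples + 1):
        pointer, length = nonzero.get(s, (0, 0))
        parts.append(f"{s}{pointer}{length}")
    return "".join(parts)

code = compress(NONZERO)
print(len(code))   # 135
print(code[:3])    # 113
print(code)
```

The length of 135 follows from nine three-character entries (s = 1 to 9) and twenty-seven four-character entries (s = 10 to 36).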

Table 5-33. Compressed codes for all DNA-protein and protein-protein intersections at Level 2 (Transcriptional Regulation)

BRE ∩ TFIIB
    113212300400511600700800921100011001200133114001523162217001832190020422100220023002400250026002700283329003000310032003352340035003641
TATA ∩ TFIIB
    112222300411521631713832900100011411200130014001551160017001800196120002123224223002452250026622700280029003000310032003300343335003600
TATA ∩ TFIID
    100200300413500600712822900101111211231132314321500160017411800190020002151220023002400256126002733284229003052310032623300340035003600
INR ∩ TFIIE
    100212300400500600700800900101311001200132214321500161117211831192320422100220023412400250026002751280029003000316132003333345235003662
INR ∩ TFIID
    100213312400500611700800900102111001200132214311500162317321800190020002142220023002400250026002700280029333000314132523362340035513661
MTE ∩ TFIIH
    111200313412500600700800900100011211200130014221531160017231832190020002100224223002400250026002700280029003033314132513352346235003661
DPE ∩ TFIID
    100211300413512600700800900100011001221132214001532163117001823190020002100220023422400253326002700280029003000314132513361345235623600
TFIID ∩ TFIIA
    111200300412513621700800900100011001200132314001522160017311800190020002100220023002400253326002732284229433000310032413352346235003651
TFIID ∩ TFIIB
    100212313411521600700800900100011001200132214001531160017231800190020002100220023002400253226422733280029433000315232623300344135003651
TFIIE ∩ TFIIH
    111212322400521600700800932101311311241130014001500160017001800190020422100225123002423250026002700280029003000310032523362343335003643
TFIIB ∩ TFIIF
    100212322413500623711832942100011211200130014001552163317311841190020002100220023002400250026622700285129003043310032003300340035003600
TFIIF ∩ TFIIH
    100200300400500600700812922101311001223131114321542160017211800190020002152223323312441250026002700280029003000310032623300345135003643

All subsequent protein-protein, nucleic acid-nucleic acid and protein-nucleic acid interactions are coded using the processes described in this section. Binary coding of the numerical values is performed using the codes in table 4-13. Each of the 12 strings encoding the protein-DNA and protein-protein interactions is encrypted and decrypted in the same manner previously described in paragraph 5.2. The output string of level 2 (or a substring specified by a pre-shared secret) becomes an encryption key for level 3A.

Equation 5.16 produces the Level2out_ciphergene for the joint probabilities.

Level2out_ciphergene = (BRE ∩ TFIIB) || (TATA ∩ TFIIB) || (TATA ∩ TFIID) || (INR ∩ TFIIE) || (INR ∩ TFIID) || (MTE ∩ TFIIH) || (DPE ∩ TFIID) || (TFIID ∩ TFIIA) || (TFIID ∩ TFIIB) || (TFIIE ∩ TFIIH) || (TFIIB ∩ TFIIF) || (TFIIF ∩ TFIIH) (5.16)

Equation 5-17 produces the level 2 output designated as Level2code

Level2code = (Level2out_ciphergene|| Level1code) KT2 (5.17)

The additional length of level 2 output code is shown in equation 5-18.

Length(Level2code) = (135 characters/code*12 codes*4.86bits/character) = 7,874 bits (5.18)

The CID can be hashed over a pre-shared secret or encrypted over a receiver’s public key or some other scheme as required. The Basal Transcriptional Complex consists of the Pre-Transcriptional Complex plus coding for RNA Polymerase II. That coding will be described in the subsequent paragraphs on transcription and translation.

5.3 Level 3A, 3B and 3C – Transcription and Translation

Each message is encoded from a unique regulatory network based upon the generic regulatory networks shown for transcription and translation in chapter 4. Proteins such as RNA Polymerase II and TFIID consist of multiple subunits, which create opportunities for unique coding of the protein-protein and protein-nucleic acid interactions by breaking down the generic blocks shown in the figures to the level of protein subunits, or even individual amino acid sequences, when required to achieve a specific coding result. Also, additional proteins such as the Upstream Stimulatory Activity (USA), Upstream Transcription Factors, and Inducible Transcription Factors can be added to the regulatory networks to increase the security of a given message transaction, at the cost of higher overhead. Unique combinations of factors would be associated with a unique gene expression matrix, G.

5.3.1 Level 3A, Level 3B and Level 3C

5.3.1.1 Basal Transcriptional Complex of Level 3A

The additional joint probabilities added by the inclusion of RNA Polymerase II are shown in figure 5-11. Table 5-34 adds two combinations: RNA Polymerase II ∩ TFIIF and RNA Polymerase II ∩ INR.

Table 5-34. Level 3A codes

1    BRE ∩ TFIIB
2    TATA ∩ TFIIB
3    TATA ∩ TFIID
4    INR ∩ TFIIE
5    INR ∩ TFIID
6    MTE ∩ TFIIH
7    DPE ∩ TFIID
8    TFIID ∩ TFIIA
9    TFIID ∩ TFIIB
10   TFIIE ∩ TFIIH
11   TFIIB ∩ TFIIF
12   TFIIF ∩ TFIIH
13   RNA Polymerase II ∩ TFIIF
14   RNA Polymerase II ∩ INR

The Level 3A output is as shown in equations 5.19 and 5.20.

Level3Aout_ciphergene = (RNA Polymerase II ∩ TFIIF) || (RNA Polymerase II ∩ INR) || (BRE ∩ TFIIB) || (TATA ∩ TFIIB) || (TATA ∩ TFIID) || (INR ∩ TFIIE) || (INR ∩ TFIID) || (MTE ∩ TFIIH) || (DPE ∩ TFIID) || (TFIID ∩ TFIIA) || (TFIID ∩ TFIIB) || (TFIIE ∩ TFIIH) || (TFIIB ∩ TFIIF) || (TFIIF ∩ TFIIH) (5.19)

Level3Acode = (Level3Aout_ciphergene|| Level2code)  KT3 (5.20)

The additional length of the level 3A output code is given by equation 5.21

Length (Level3Acode) = (135 characters/code * 14 codes *4.86 bits/character) = 9,186 bits (5-21)

5.3.1.2 Transcription and Cipher mRNA of Level 3B

The mRNA is compressed to a form of types in the same manner as the DNA template strand. First, convert the protein coding regions of the DNA to mRNA. The resulting mRNA is the complement of the template strand. Table 5-31 details the mRNA sequence conversion for the codes in table 5-9. Using a ring subtraction and substitution process similar to that used in coding the message into the DNA, the DNA message will be coded into the mRNA. For explanatory purposes, assume gene sequence codes {A, G, C, T} → {1, 2, 3, 4} such that $M_j \in \{1, 2, 3, 4\}$, mRNA codes {A, G, C, U} → {5, 7, 9, 11} such that $r_j \in \{5, 7, 9, 11\}$, and addend codes $B_j \in \{1, 2, 3, 4, 5, 6, 7\}$. The operations are defined in equations 5-21 and 5-22:

$B_j = M_j - r_j$ (5-21)

$M_j = B_j + r_j$ (5-22)

where $B_j$ is the jth addend code, transmitted in place of the jth mRNA nucleotide base code, $r_j$ is the jth mRNA nucleotide base code, and $M_j$ is the jth message nucleotide base code. The ring subtraction table in 5-35 encodes the message at the source and the ring addition table in 5-36 recovers the message at the receiver.

Table 5-35. Coding the mRNA with the Message and RNA Addend Codes.

                  Message {Mj}
                  1   2   3   4
mRNA {rj}    5    4   3   2   1
             7    6   5   4   3
             9    1   7   6   5
             11   3   2   1   7

Table 5-36. Decoding the Message with mRNA and RNA Addend Codes.

                  mRNA {rj}
                  5   7   9   11
Addend {Bj}  1    4   -   1   3
             2    3   -   -   2
             3    2   4   -   1
             4    1   3   -   -
             5    -   2   4   -
             6    -   1   3   -
             7    -   -   2   4

Blanks in the table (shown as dashes) indicate undefined addition products. In this case, the space of the addend codes is larger than the space of mRNA codes. Table 5-37 gives an example on a segment of the DNA coded message.

Table 5-37. Coding and Decoding mRNA with Addend Codes

β-globin RNA                  A  U   G  G  U   G  C  A  U
Message                       C  C   T  A  C   T  A  G  T
rj                            5  11  7  7  11  7  9  5  11
Mj                            3  3   4  1  3   4  1  2  4
Bj = Mj - rj (Addend Code)    2  1   3  6  1   3  1  3  7
Mj = Bj + rj (Message)        3  3   4  1  3   4  1  2  4
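The mRNA addend coding can be sketched in the same way as the DNA case. One point needs care: the entries of tables 5-35 through 5-37 are reproduced by computing (r_j - M_j) modulo 7 with a zero residue written as 7, and by the matching recovery rule (r_j - B_j) modulo 7. That rule is inferred here from the tabulated values and is an assumption, since the text states the operations only as ring subtraction and ring addition. The function names are hypothetical.

```python
DNA_CODE = {"A": 1, "G": 2, "C": 3, "T": 4}
RNA_CODE = {"A": 5, "G": 7, "C": 9, "U": 11}
DNA_BASE = {value: base for base, value in DNA_CODE.items()}

def rna_addend(m, r):
    """B_j as tabulated in table 5-35: (r_j - M_j) mod 7, with 0 written as 7."""
    return (r - m) % 7 or 7

def rna_recover(b, r):
    """M_j as tabulated in table 5-36; None marks an undefined (blank) cell."""
    m = (r - b) % 7 or 7
    return m if m in DNA_BASE else None

mrna = "AUGGUGCAU"      # mRNA carrier segment from table 5-37
message = "CCTACTAGT"   # DNA message segment from table 5-37

addends = [rna_addend(DNA_CODE[m], RNA_CODE[r]) for m, r in zip(message, mrna)]
recovered = "".join(DNA_BASE[rna_recover(b, RNA_CODE[r])] for b, r in zip(addends, mrna))

print(addends)     # [2, 1, 3, 6, 1, 3, 1, 3, 7]
print(recovered)   # CCTACTAGT
```

Because the addend alphabet has seven symbols and the message alphabet only four, some (addend, mRNA) pairs have no valid message code, which is why `rna_recover` can return None.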

The message is in the protein coding bases, so those are repeated as many times as necessary to code the entire message and this is summarized in table 5-38.

Table 5-38. mRNA base position summary

DNA Position                 mRNA Type   mRNA Position
1-226                        UTR5p       1-226
227-318,449-671,872-1000     ORF         227-671
227-318,449-671,872-1000     ORF         672-1115
227-318,449-671,872-1000     ORF         1116-1559
227-318,449-671,872-1000     ORF         1560-2003
227-318,449-671,872-1000     ORF         2004-2447
227-318,449-671,872-1000     ORF         2448-2891
227-318,449-671,872-1000     ORF         2892-3335
227-318,449-671,872-1000     ORF         3336-3779
227-318,449-664              ORF         3780-4088
1001-1107                    UTR3p       4089-4195
1108-1121                    RPolyA      4187-4209
1122-1184                    UTR3p       4202-4272

The ‘R’ in RPolyA distinguishes it from the gene polyadenylation signal. For this example, the mRNA transcriptional network will bypass the coding of the entire process of transcription in figures 4-42 through 4-45. Building messages on coding those steps is optional and should be used for very specific authentication and confidentiality sessions.


Figure 5-12 shows the level 3B coding and the transcriptional network. Table 5-39 details the coding of the tuples required to satisfy figure 5-12. Equations 5-22 and 5-23 detail the level 3B output.

Table 5-39. Joint DNA-Protein and Protein-Protein Events required for cipher-mRNA

Index   RNA-Protein       Index   Protein-Protein   Index   Other
1       CPB20 ∩ UTR5p     9       CPB20 ∩ CPB80     17      m7G-cap ∩ UTR5p
2       CPB80 ∩ UTR5p     10      CstF ∩ CPSF
3       CPB20 ∩ ORF       11      CPSF ∩ CFlm
4       CPB80 ∩ ORF       12      CFlm ∩ PAP
5       CstF ∩ UTR3p      13
6       CPSF ∩ UTR3p      14
7       CFlm ∩ RPolyA     15
8       PAP ∩ UTR3p       16

The 5’ UTR is capped by an N7-methylguanosine called the m7G-cap. Regardless of any alternate splicing present in the coding process, the m7G-cap signifies the 5’ end of the mRNA. For coding purposes, the 3’ end will terminate just after the polyadenylation sequence in a downstream element that will be coded as a 3’ UTR sequence. The encrypted mRNA sequence and the encrypted joint probability codes for the 13 combinations in table 5-39 are passed to level 3C for translation.

Level3Bout_message= (3861 RNA bases* 1.5 avg addend characters/base) * 4.86 bits/character = 28,147 bits (5-22)

Level3Bout_ciphergene = (CPB20 ∩ UTR5p) || (CPB80 ∩ UTR5p) || (CPB20 ∩ ORF) || (CPB80 ∩ ORF) || (CstF ∩ UTR3p) || (CPSF ∩ UTR3p) || (CFlm ∩ RPolyA) || (PAP ∩ UTR3p) || (CPB20 ∩ CPB80) || (CstF ∩ CPSF) || (CPSF ∩ CFlm) || (CFlm ∩ PAP) || (m7G-cap ∩ UTR5p) (5.23)

The level 3B out is given by equation 5.24


Level3Bcode=(Level3Acode||Level3Bout_message||Level3Bout_ciphergene) KT3 (5.24)

The additional length of the level 3B output is given by equation 5.25.

Length(Level3Bcode) = (17 codes * 135 characters/code * 4.86 bits/character = 11,154 bits) + 28,147 bits = 39,301 bits (5.25)
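The arithmetic behind equations 5-22 and 5.25 can be checked with a short script. This is only a sketch: the function and parameter names are illustrative, and rounding to the nearest bit is an assumption that happens to reproduce the figures quoted in the text.

```python
BITS_PER_CHAR = 4.86  # entropy-coded bits per character, as used in the text


def level3b_lengths(rna_bases=3861, addend_chars_per_base=1.5,
                    codes=17, chars_per_code=135):
    """Return (message bits, ciphergene code bits, total) for level 3B."""
    # Equation 5-22: RNA bases * addend characters per base * bits per character
    message_bits = round(rna_bases * addend_chars_per_base * BITS_PER_CHAR)
    # First term of equation 5.25: codes * characters per code * bits per character
    code_bits = round(codes * chars_per_code * BITS_PER_CHAR)
    return message_bits, code_bits, message_bits + code_bits


message_bits, code_bits, total_bits = level3b_lengths()
print(message_bits, code_bits, total_bits)
```

Changing `addend_chars_per_base` or `chars_per_code` shows how quickly the level 3B output grows with the coding parameters.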

5.3.1.3 Translation and Level 3C processing.

The protein code is compressed to a form of types in the same manner as the mRNA and DNA template strand. First, convert the mRNA bases between the start and stop codons into amino acid codes. These are the codes in the open reading frame (ORF). The amino acid codes represent the 20 amino acids, which are encoded by 64 (4^3) combinations of nucleic acid bases.

Table 4-19 lists the conversion codes for the codons. Only 40 of the 64 combinations appear in the source codebook. Any three-letter code at the receiver that is not represented in the table is treated as an error subject to source code correction. The translational network is shown in figure 5-13. For RNA coding, all thymine (T) bases are converted to uracil (U). Table 5-40 shows the substitution code at the codon level for the first 3 codons in the ciphergene. The single-letter amino acid code convention is used (footnote 10). Valid triplet addend codes map to the amino acid substitution code.

Table 5-40. Codon substitution table

Message             CCU   ACU   AGU
Addend Code         213   613   137
Substitution Code   765   761   88
Amino Acid          P     T     S
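The codon-to-amino-acid step in table 5-40 can be illustrated with the standard genetic code. The sketch below is an assumption-laden illustration: it uses the standard biological codon table and single-letter amino acid codes, not the addend or substitution codes of table 4-19, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_rna(dna: str) -> str:
    """Convert DNA text to RNA text by replacing thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame into single-letter amino acid codes."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the open reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


# The first three codons of table 5-40, written as DNA text.
print(translate(to_rna("CCTACTAGT")))
```

A codon lookup that raises `KeyError` corresponds to the receiver-side error case described above, where a three-letter code is not in the table.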

10 http://www.bioinformatics.org/sms/iupac.html

Table 5-41 shows the progression from the mRNA codes to the protein code. The ORF codes are replaced by a series of amino acid codes. Table 5-42 details the joint probabilities of the translational network. In this example, a protein subunit code for the Initiator portion of the tRNA sequence is used as a separate element. Codes for the ribosomal RNA subunits 40S, A, P, and E are also used. The ribosome is a ribonucleoprotein consisting of rRNA (ribosomal RNA) and protein subunits.

Table 5-41. mRNA to Protein

mRNA Type   mRNA Position   Protein Type   Protein Position
UTR5p       1-227           -              -
ORF         228-671         AA             1-1287
ORF         672-1115
ORF         1116-1559
ORF         1560-2003
ORF         2004-2447
ORF         2448-2891
ORF         2892-3335
ORF         3336-3779
ORF         3780-4088
UTR3p       4089-4195
RPolyA      4187-4209
UTR3p       4202-4272

The Level3CAA_code is derived from the substitution code of table 4-19. The length of the output of level 3C is given by equation 5.26.

Length(Level3CAA_code) = 1287 amino acids * 2.625 characters/amino acids *4.86 bits/integer = 16,419 bits (5.26)

Table 5-42. Translation Joint Probabilities

No.  RNA-Protein, Protein-Protein, RNA-RNA
1    elF1a40S
2    40SelF3
3    elF3elF1
4    elF1elF4G
5    elf4GPABP
6    elf4Gelf4E
7    elf4Aelf4E
8    eEF1AtRNA
9    eEF2tRNA
10   eRF-1tRNA
11   ProteintRNA
12   N-terminaltRNA
13   C-terminaltRNA
14   EProtein
15   PProtein
16   AProtein
17   EP
18   PA
19   PABPRPolyA
20   5m7G-capelF4E
21   ORFE
22   ORFP
23   ORFA
24   Initiator-tRNAtRNA
25   eEF2Initiator-tRNA
26   eEF1tRNA
27   ORFInitiator-tRNA
28   AtRNA

The Level3Cout_ciphergene is assembled as shown in equation 5.27. The length is shown in equation 5.28.

Level3Cout_ciphergene = (elF1a40S) || (40SelF3) || … || (ORF Initiator-tRNA) || (AtRNA) (5.27)

Length(Level3Cout_ciphergene) = (28 codes * 135 characters/code * 4.86 bits/character) = 18,371 bits (5.28)

The entire level 3C output code is assembled as shown in equation 5.29.

Level3Ccode=(Level3Bcode || Level3CAA_code || Level3Cout_ciphergene || CID)  PT (5.29)

The length of the level 3 output code is shown in equation 5.30.

Length(Level3Ccode)=16,419 bits+18,371 +1024 = 35,814 bits (5.30)
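Equations 5.26, 5.28 and 5.30 can be reproduced in the same way. The sketch below is illustrative only; the function name is hypothetical and rounding to the nearest bit is an assumption that matches the quoted values.

```python
BITS_PER_CHAR = 4.86   # entropy-coded bits per character
CID_BITS = 1024        # ciphergene ID length used in the example


def level3c_lengths(amino_acids=1287, chars_per_amino_acid=2.625,
                    codes=28, chars_per_code=135):
    """Return (amino acid code bits, joint probability code bits, total with CID)."""
    aa_bits = round(amino_acids * chars_per_amino_acid * BITS_PER_CHAR)   # eq. 5.26
    code_bits = round(codes * chars_per_code * BITS_PER_CHAR)             # eq. 5.28
    return aa_bits, code_bits, aa_bits + code_bits + CID_BITS             # eq. 5.30


aa_bits, code_bits, total_bits = level3c_lengths()
print(aa_bits, code_bits, total_bits)
```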

PT is defined in paragraph 4.3.1.2. The sender and receiver have a pre-shared secret DNA key in the Network BioID. The receiver uses that key to recover Level3out and proceeds to decrypt the remaining layers of the message.

Figure 5-12. cipher-mRNA Complex Network and Coding


A summary of the overhead for the protocol is shown in table 5-43 for the sample message.

Table 5-43. Overhead for sample message of 140 characters

Variable                                                        Count    Bits
Level 1 - # of addend chars                                     3,861    18,765
Level 1 - # of chars in ciphergene matrix, C of size 19 x 19    -        16,232
CID length                                                      -        1,024
total Level 1                                                   -        36,022
Level 2 - # of codes                                            12       7,874
total length 2                                                  -        7,874
Level 3A - # of codes                                           14       9,186
total length 3A                                                 -        9,186
Level 3B - # of RNA bases                                       3,861    28,147
Level 3B - # of codes                                           17       11,154
total length 3B                                                 -        39,301
Level 3C - # of amino acids                                     1,287    16,419
Level 3C - # of codes                                           28       18,371
CID length                                                      -        1,024
total length 3C                                                 -        35,814
total length (bits)                                             -        128,197

This particular message demonstrated some unusual statistics in terms of the ratio of DNA text message length to plaintext. This is shown in table 5-44.

Table 5-44. Message overhead at level 1 of the protocol for various length messages

Plaintext msg length (characters)   DNA text message length (characters)   Overhead ratio
55                                  2439                                   44.35
140                                 3861                                   27.58
170                                 7263                                   42.72
998                                 40803                                  40.88
1999                                81486                                  40.76
3362                                135852                                 40.41
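The overhead ratio in table 5-44 is the DNA text length divided by the plaintext length. A minimal sketch that recomputes the column from the two length columns (variable names are illustrative):

```python
# (plaintext length, DNA text length) pairs taken from table 5-44, in characters.
MESSAGES = [(55, 2439), (140, 3861), (170, 7263),
            (998, 40803), (1999, 81486), (3362, 135852)]


def overhead_ratio(plaintext_chars: int, dna_chars: int) -> float:
    """Ratio of DNA text length to plaintext length, rounded to two decimals."""
    return round(dna_chars / plaintext_chars, 2)


ratios = [overhead_ratio(p, d) for p, d in MESSAGES]
for (plaintext_chars, _), ratio in zip(MESSAGES, ratios):
    print(plaintext_chars, ratio)
```

The 140-character sample message stands out with a ratio near 27.6, while the other messages cluster between roughly 40 and 44.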


This is graphically shown in figures 5-14 and 5-15.

Figure 5-14. Volume of DNA text to Plaintext at various plaintext message lengths. (Bar chart of DNA text message length in characters against plaintext message length in characters, for plaintext lengths of 55 to 1999 characters.)

Figure 5-15. Ratio of DNA text to Plaintext at various plaintext message lengths. (Bar chart of overhead ratio against plaintext message length in characters, for plaintext lengths of 55 to 1999 characters.)

Table 5-45 adjusts the characteristics of the sample message to account for the abnormally low overhead ratio.

Table 5-45. Adjusted sample message overhead characteristics

Variable                                                        Count    Bits
Level 1 - # of addend chars                                     3,861    18,765
Level 1 - # of chars in ciphergene matrix, C of size 19 x 19    -        24,023
CID length                                                      -        1,024
total Level 1                                                   -        43,812
Level 2 - # of codes                                            12       7,874
total length 2                                                  -        7,874
Level 3A - # of codes                                           14       9,186
total length 3A                                                 -        9,186
Level 3B - # of RNA bases                                       3,861    28,147
Level 3B - # of codes                                           17       11,154
total length 3B                                                 -        39,301
Level 3C - # of amino acids                                     1,287    16,419
Level 3C - # of codes                                           28       18,371
total length 3C                                                 -        34,790
CID                                                             -        1,024
total length (bits)                                             -        134,963

Figure 5-16 provides a summary of the protocol overhead for plaintext lengths from 500 to 5000 characters using β-globin in the same manner as the sample message and the same number and type of joint probability codes used in this example. In practice, not every message would be encrypted at all levels for every message exchange. The use of the protocol would be tailored to the bandwidth of the narrowest links. A secure, lightweight authentication could occur as shown in figure 5-17.

Following the example of paragraph 5.2.6, Alice and Bob establish a secure session with their legacy protocols. Then, Alice sends Bob a ciphergene ID (CID) for a given gene, X, encrypted with Bob’s public key.

(a) Bob decrypts the CID with his private key and returns a sequence, Sn, which is a sequence of n bases from X. The location of the sequence is a pre-shared secret between Bob and Alice.

(b) Having established two forms of identity verification between Alice and Bob, Alice transmits the CID for β-globin, encrypted with Bob’s public key. Alice also transmits the Level3out_ciphergene (or a subset of it), encrypted with Bob’s public key. The message contains the codes for the joint probability sequences required for translation of the β-globin mRNA specified by the ciphergene ID (CID). The maximum length of this plaintext is 18,371 bits.

(c) Bob decrypts the CID with his private key and uses the CID to retrieve the β-globin amino acid sequence. Bob encrypts the amino acid sequence (or a subset of it) with Alice’s public key and transmits it to Alice. The maximum length of this plaintext is 16,419 bits.

(d) Alice and Bob are now comfortable enough to increase the level of sensitivity of the information they exchange.

(e) Alice and Bob can continue with periodic authentication exchanges in this manner as often as required.
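Step (a) of the exchange above can be sketched as a simple challenge and response. This is a minimal illustration under stated assumptions: the public-key encryption of the CID is omitted, `GENE_DB`, the CID string and the base sequence are hypothetical placeholders (not real gene data), and the pre-shared location is modeled as an offset and a length.

```python
import hmac

# Hypothetical ciphercolony lookup: CID -> gene sequence X (placeholder bases).
GENE_DB = {"CID-0001": "ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACC"}


def respond(cid: str, offset: int, n: int) -> str:
    """Bob's side: return Sn, the n bases of gene X starting at the pre-shared offset."""
    return GENE_DB[cid][offset:offset + n]


def verify(cid: str, offset: int, n: int, response: str) -> bool:
    """Alice's side: compare the response with her own copy of Sn in constant time."""
    expected = GENE_DB[cid][offset:offset + n]
    return hmac.compare_digest(expected.encode(), response.encode())


# Pre-shared secret between Alice and Bob (assumed values).
OFFSET, N = 5, 12
answer = respond("CID-0001", OFFSET, N)
print(verify("CID-0001", OFFSET, N, answer))
```

An attacker who learns the CID but not the pre-shared location cannot produce the correct Sn except by guessing, which is the property step (a) relies on.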


Figure 5-16. Protocol overhead summary (overhead in bits by level against plaintext character length).

Level      500 chars   1000 chars   2000 chars   3000 chars   5000 chars
Level 3C   105,317     191,239      363,083      534,926      878,613
Level 3B   159,473     306,767      601,356      895,945      1,485,123
Level 3A   10,210      10,210       10,210       10,210       10,210
Level 2    8,898       8,898        8,898        8,898        8,898
Level 1    123,244     221,440      417,833      614,225      1,007,010


Figure 5-17. Lightweight authentication challenge and response scenario


5.4 Comparison of the genomic and proteomic encryption algorithms with the Advanced Encryption Standard (AES) algorithms.

5.4.1 Short summary of AES

Figure 5-18 from [40] summarizes the steps in the AES encryption and decryption processes. The parameters of AES for AES-128, AES-192, and AES-256 are shown in table 5-46 [40]. The AES standard defines key lengths of 128, 192, and 256 bits; variants with 512-bit keys have been proposed but are not part of the standard.

Table 5-46. AES Parameters

Parameter                                  AES-128     AES-192     AES-256
Key size (words/bytes/bits)                4/16/128    6/24/192    8/32/256
Plain text block size (words/bytes/bits)   4/16/128    4/16/128    4/16/128
Number of rounds                           10          12          14
Round key size (words/bytes/bits)          4/16/128    4/16/128    4/16/128
Expanded key size (words/bytes)            44/176      52/208      60/240

Figure 5-18. AES Flowchart for Encryption and Decryption


The data structures of AES are shown in figure 5-19. The input is arranged by column in an n byte x n byte array designated as In. The input matrix passes through state matrices, State, which are modified at each step of the encryption/decryption process. Every four bytes of the expanded key form a column in matrix W. AES uses four types of stages:

1. Substitution: S-box bytewise substitution on State.

2. ShiftRows: A permutation on State in which the bytes in each row are rotated left. The number of positions rotated differs for each row.

3. MixColumns: A transformation on State, column by column, which treats each column as a four-term polynomial [57]. The columns are considered as polynomials over the Galois field GF(2^8) and multiplied modulo x^4 + 1 with the fixed polynomial a(x) shown in equation 5-31. This corresponds to the matrix multiplication in equation 5-32. The inverse transformation uses the polynomial b(x) given by equation 5-33. The identity relationship between a(x) and b(x) is shown in equation 5-34.

$a(x) = (03)x^3 + (01)x^2 + (01)x + (02)$ (5-31)

$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$, for $0 \le c <$ number of columns, Nb (5-32)

$b(x) = (0B)x^3 + (0D)x^2 + (09)x + (0E)$ (5-33)

$a(x)\,b(x) \equiv (01) \pmod{x^4 + 1}$ (5-34)

4. AddRoundKey: A bitwise XOR operation is performed between the 128 bits of State and the 128 bits of the round key. The inverse operation is identical because XOR is its own inverse.

Figure 5-19. AES-128 data and key structures.
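The MixColumns stage described in item 3 can be sketched in a few lines. This is an illustrative implementation of the standard AES column transformation, not code from the dissertation; the function names are hypothetical. Multiplication is carried out in GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. {02}) in GF(2^8), reducing by 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Coefficient matrices for the forward polynomial a(x) and its inverse b(x).
MIX = [[0x02, 0x03, 0x01, 0x01], [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03], [0x03, 0x01, 0x01, 0x02]]
INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]


def mix_column(column, matrix=MIX):
    """Apply a 4x4 GF(2^8) matrix to one four-byte column of State."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column)
print([f"{b:02x}" for b in mixed])
```

Applying `mix_column(mixed, INV_MIX)` recovers the original column, which is the matrix form of the identity relationship between a(x) and b(x).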

In addition to the four types of stages used, AES has a key expansion algorithm. This expands the key from the original four, six or eight words. Given Nb = number of columns in State and Nr = number of rounds, the key expansion generates a total of Nb(Nr + 1) words. The algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. This results in a key schedule which consists of a linear array of 4-byte words, w_i, for 0 ≤ i < Nb(Nr + 1). It proceeds via the following steps:

• A four-byte input word [a0, a1, a2, a3] is subjected to a cyclic permutation returning [a1, a2, a3, a0].

• That word has a bytewise substitution using the S-box applied to each of the four bytes.

• A round constant word array, Rcon(i), contains the values given by [x^(i-1), {00}, {00}, {00}], with x^(i-1) being powers of x (x is denoted as {02}) in the field GF(2^8), with i ≥ 1.

• The word resulting from the rotation-substitution is XOR’d with the round constant, Rcon(i).

• The result is the expansion of the original key to the expanded keys summarized in table 5-46.
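Two pieces of the key expansion above are easy to check numerically: the round constants Rcon(i) and the expanded key sizes Nb(Nr + 1). The sketch below is partial by design; it omits the S-box substitution and the XOR chaining of words, and the function names are illustrative.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x ({02}) in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def rot_word(word):
    """Cyclic permutation [a0, a1, a2, a3] -> [a1, a2, a3, a0]."""
    return word[1:] + word[:1]


def rcon(i: int):
    """Round constant word [x^(i-1), 00, 00, 00] for i >= 1."""
    value = 0x01
    for _ in range(i - 1):
        value = xtime(value)
    return [value, 0x00, 0x00, 0x00]


def expanded_key_words(nr: int, nb: int = 4) -> int:
    """Total number of 4-byte words in the key schedule, Nb(Nr + 1)."""
    return nb * (nr + 1)


for name, nr in (("AES-128", 10), ("AES-192", 12), ("AES-256", 14)):
    words = expanded_key_words(nr)
    print(name, words, "words,", 4 * words, "bytes")
```

The word and byte counts agree with the expanded key sizes in table 5-46.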


5.4.1.1 Decryption.

The AES decryption protocol is not the straight inverse of the AES encryption protocol. An equivalent version can be created by making modifications to the key schedule [40], [57].

5.4.2 Short summary of the genomic and proteomic algorithm overhead

Table 5-47 summarizes three different implementations of the protocols that vary by certain parameters that control the length of the encrypted output. These implementations are referred to as ‘Small’, ‘Medium’ and ‘Heavy’ encryption overhead. The detailed example in this chapter would classify as a ‘Small’ implementation, although it is possible to create smaller ones.

Table 5-47. Small, Medium and Heavy Implementations

Variable                                           Small     Medium    Heavy
Number of characters per joint probability code    135       185       235
Average number of addend characters/base           1.50      3.0       4.5
Average number of characters per amino acid code   2.625     4.0       6.0
Entropy coded bits/character                       4.86      4.86      4.86
Ciphergene matrix size                             19 x 19   19 x 19   19 x 19

Table 5-48 provides data that allows an approximate comparison with AES for a 128 bit block of data. In this scheme, 128 bits will code approximately 26 characters due to the source coding compression of 4.86 bits/character.


Table 5-48. Overhead for encryption of 128 bit plaintext block

               Small    Medium    Heavy
Total (bits)   90,915   118,168   146,484

Figure 5-20 breaks the overhead down by encryption level. Figure 5-21 compares the small, medium and heavy encryption overhead for plaintext character lengths from 500 to 2500 characters.

Level      Small    Medium   Heavy
Level 3C   23,863   33,008   43,216
Level 3B   18,814   30,604   42,394
Level 3A   10,210   13,612   17,014
Level 2    7,874    10,790   13,706
Level 1    30,154   30,154   30,154

Figure 5-20. Level by level detail of encryption overhead for 128 bit block (overhead in bits for Small, Medium and Heavy overhead encryption).
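The totals in table 5-48 are the sums of the per-level values reported with figure 5-20. The sketch below recomputes them and, as an added illustration, expresses each total as a multiple of the 128-bit plaintext block; the dictionary and function names are assumptions.

```python
# Per-level overhead in bits for a 128-bit block, from the data of figure 5-20.
LEVEL_BITS = {
    "Small":  {"Level 1": 30154, "Level 2": 7874,  "Level 3A": 10210,
               "Level 3B": 18814, "Level 3C": 23863},
    "Medium": {"Level 1": 30154, "Level 2": 10790, "Level 3A": 13612,
               "Level 3B": 30604, "Level 3C": 33008},
    "Heavy":  {"Level 1": 30154, "Level 2": 13706, "Level 3A": 17014,
               "Level 3B": 42394, "Level 3C": 43216},
}
BLOCK_BITS = 128


def total_overhead(implementation: str) -> int:
    """Sum the per-level overhead for one implementation."""
    return sum(LEVEL_BITS[implementation].values())


def expansion_factor(implementation: str) -> float:
    """Output bits per plaintext bit for a 128-bit block."""
    return total_overhead(implementation) / BLOCK_BITS


for name in LEVEL_BITS:
    print(name, total_overhead(name), round(expansion_factor(name)))
```

For comparison, AES produces a 128-bit ciphertext block from a 128-bit plaintext block, so its corresponding factor is 1.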

Plaintext message length (characters)   500       1000        1500        2000        2500
Small (bits)                            403,046   734,458     1,065,871   1,397,284   1,708,696
Medium (bits)                           612,600   1,136,313   1,660,027   2,183,741   2,707,454
Heavy (bits)                            842,612   1,579,084   2,315,556   3,052,029   3,788,500

Figure 5-21. Small, Medium and Heavy encryption overhead for plaintext character lengths from 500 to 2500 characters.

5.4.2.1 Vulnerabilities

The ciphergene ID (CID) is a single point of potential vulnerability. The mapping of any CID to its corresponding details in a ciphercolony database by an attacker must be prevented. Mitigations to this vulnerability include:

• Generation of cryptographically hard CIDs, as is done with product keys for commercial software

• Replacement of CIDs on a regular basis

• Infrequent use of the same CID

• Never using any CID in a plaintext transmission

• Utilization of public and private key pairs to generate codes that map into CIDs.

Ring operations in levels 1 and 3B are potential vulnerabilities. Mitigation of these vulnerabilities can be achieved by ensuring that the ring operations are done over large and cryptographically hard to analyze rings. These operations could also be combined with a set of substitution codes.

The substitution code in level 3C is a potential vulnerability. This vulnerability can be mitigated by a variety of means including multiple rounds of substitution.

5.4.8 Quality of Protection

Quality of Protection (QoP) is an attempt to define network metrics analogous to Quality of Service (QoS) network metrics. There is no standard definition of QoP that is widely accepted. However, it is logical to assume that just as QoS metrics have become a standard feature of evaluating network performance, QoP metrics will eventually become a standard feature for evaluating network security performance. Some researchers have developed QoP classes [58], which divide QoP into services. Their service definitions are (gold): assured restorability, (silver): best efforts, (bronze): non-protected and (economy): pre-emptible service. In [59], researchers use the term Quality of Security Services (QoSS). The QoSS is related to choices in the fidelity of coverage versus the security it provides. Users might accept more security and higher costs if it results in an increase in their satisfaction with their security posture. The metrics are related to features such as:

1. Strength of cryptographic algorithm, e.g., RSA, DES, measured in terms of the work factor associated with a brute force attack.

2. Length of cryptographic key, characterized by bit-length.

3. Security functions present in the destination job-execution environment, characterized by operating system or boundary control security policy enforcement mechanisms.

4. Confidence of policy enforcement in the remote login environment, characterized by third-party evaluation.

5. Robustness of authentication mechanism (weak password, strong password, biometric, smart cards, etc.).

Some of these QoP metrics have questionable value. The length of a cryptographic key does not necessarily correlate to the level of confidentiality. Boundary control security policies may not recognize security vulnerabilities that occur from inside the boundary, and so forth. In fact, valid boundary intrusion detection policies may prohibit encrypted traffic across the boundary because encryption can inhibit intrusion detection and detection of restricted data movement across a network boundary. The benefit of intrusion detection can outweigh the risks associated with robust confidentiality. Robustness of authentication can be thwarted by social engineering attacks and poor physical security (passwords written down and left in plain sight).

5.4.8.2 Vulnerabilities of networks, computers, and mobile applications continue to grow.

Kaspersky Laboratory statistics from 2012 show that attacks on operating systems, applications, and programming languages continue to be a major problem. Figure 5-22 details the breakdown of vulnerability exploits reported by system [60]. Over 1.5 billion browser attacks were launched in 2012. Over half the attacks involved Java vulnerabilities. According to Kaspersky Laboratory, the number of attempted web-based infections in 2012 was 1.7 times greater than in 2011, and the number in 2011 was 1.6 times greater than in 2010. Users do not have the option to build their own operating systems and application software; therefore, the trend towards increasing attacks and vulnerabilities will continue.

Figure 5-22. Applications containing vulnerabilities targeted by web exploits in 2012

5.4.8.3 Two QoP metric concepts: Security LD50 and Security Rate Decay Constant

Assume that a set of users (the customer) have adopted a security concept of operations, in this case the genomic-proteomic approach, but this concept could apply to any set of protocols. The users are making user-website transactions, user-user data transmission, email, user-to-user file transfers, etc. At the start of operations, the security is at its highest quantitative value of protection. Also assume a committed set of attackers are monitoring the operations at numerous points over a period of time. Assume that the customer user behavior is fixed. Keys, passphrases, etc. remain fixed or bounded over a small set of values such that the attackers can gain valuable data through long term monitoring. Under such a scenario, it is clear that the security of the network would decay over time to a point beyond which the security (the quality of protection) becomes irrelevant.

Further assume that attackers incrementally apply the knowledge gained through monitoring the customer users. Their attacks become increasingly skillful over time. The attacks are analogous to administration of successively increasing concentrations of a lethal drug to a set of subjects. Continuing the analogy, there could exist two useful QoP metrics, each derived differently.

• A rate decay curve for the QoP of the customers

• A median lethal dose (LD50) to the customers, with lethality defined as the breach of a particular node (user, website, etc.) by an attacker. Once a node is successfully attacked, that node is dead.

There are two methods to build a model of QoP based on these concepts. The method with the highest fidelity would be to build a network, secure the nodes (users, websites, servers, firewalls, etc.) with the genomic-proteomic concepts and conduct a series of increasingly more accurate attacks and measure the rates of lethality as the drug concentration (dosage of attacks applied to network nodes) increases over time. Then the median lethal dose (LD50) for the network can be computed [61]. Given that the genomic-proteomic protocol has not been built into any networks, no facility exists to conduct such tests.

A second method would be to use an alternate model, such as the exponential decay of radioactive nuclei and half-life estimation. Although no data exists on the applicable rate constant, a notional rate constant could be estimated as a starting point for developing this type of model. Use of a security half-life concept has been applied in other contexts [62]. Two assumptions are required for this approach.

• Assumption 1. Under a scenario in which the security of a network is fixed, all nodes will eventually be successfully attacked.

Define the QoP of a network as shown in equation 5-35.

$Q = \lambda N$ (5-35)

where N = number of nodes = network instantiations of genomic-proteomic protocols (within users, websites, firewalls, etc.) and Q equals the QoP. The security decay constant, λ, is in units of inverse time (t^-1).

Let the initial number of nodes equal N0. This leads to the second assumption:

• Assumption 2: The rate of decay of security is proportional to the number of nodes and the time interval of attacks. Thus -dN is proportional to N dt, and the proportionality constant is the security decay constant, λ. This leads to equation 5-36.

$-dN = \lambda N \, dt$ (5-36)

Divide by N and integrate both sides. For any time interval, t, the number of remaining live nodes, decreasing from N0 to Nt, is computed from equation 5-37.

$\int_{N_0}^{N_t} \frac{dN}{N} = -\lambda \int_{0}^{t} dt$ (5-37)

Evaluating both integrals in 5-37 yields equation 5-38.

$\ln\left(\frac{N_t}{N_0}\right) = -\lambda t$ (5-38)

Solving for Nt yields equation 5-39.

$N_t = N_0 e^{-\lambda t}$ (5-39)

The half-life for the network is defined as the point in time where Nt has decreased to one-half the value of N0. The relationship between the half-life and the decay constant, λ, is shown in equation 5-40.

$t_{1/2} = \frac{\ln(2)}{\lambda}$ (5-40)

The decay in QoP can be expressed in terms of the remaining serviceable (living) nodes. Normalized to N0, the proportion of remaining serviceable nodes is summarized in figure 5-23. Figure 5-23 utilizes three security decay values: λ = 0.75 (corresponding to a notional QoP for a Level 1 implementation), λ = 0.5 (corresponding to a notional QoP for a Levels 1 and 2 implementation), and λ = 0.1 (corresponding to a notional QoP for a Levels 1, 2, 3A, 3B and 3C implementation).

Figure 5-23. QoP as expressed as a decay constant.

Table 5-49 provides the tabular values for figure 5-23 with the half-life point highlighted in yellow italics. For example, for λ = 0.75, the half-life lies on the interval between t = 0.5 and t = 1. Thus, for a network using only level 1 protocols with no changes in security, half of the network nodes will have been successfully attacked at some time between t = 0.5 and t = 1. The unit of time could be seconds, years, decades, or longer. The time units must be established through network simulation and testing.

Table 5-49. Tabular values of figure 5-23

t      λ = 0.75   λ = 0.5   λ = 0.1
0      1.0000     1.0000    1.0000
0.5    0.6873     0.7788    0.9512
1      0.4724     0.6065    0.9048
1.5    0.3247     0.4724    0.8607
2      0.2231     0.3679    0.8187
2.5    0.1534     0.2865    0.7788
3      0.1054     0.2231    0.7408
3.5    0.0724     0.1738    0.7047
4      0.0498     0.1353    0.6703
4.5    0.0342     0.1054    0.6376
5      0.0235     0.0821    0.6065
5.5    0.0162     0.0639    0.5769
6      0.0111     0.0498    0.5488
6.5    0.0076     0.0388    0.5220
7      0.0052     0.0302    0.4966
7.5    0.0036     0.0235    0.4724
8      0.0025     0.0183    0.4493
8.5    0.0017     0.0143    0.4274
9      0.0012     0.0111    0.4066
9.5    0.0008     0.0087    0.3867
10     0.0006     0.0067    0.3679
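The values in table 5-49 follow from the exponential decay model with N0 normalized to 1. The sketch below regenerates a few rows and computes the half-life for each notional decay constant; the function names are illustrative, and the decay constants are the notional values used in the text.

```python
import math


def remaining_fraction(decay_constant: float, t: float) -> float:
    """Proportion of serviceable nodes at time t, N_t / N_0 = exp(-lambda * t)."""
    return math.exp(-decay_constant * t)


def half_life(decay_constant: float) -> float:
    """Time at which half of the nodes have been successfully attacked."""
    return math.log(2) / decay_constant


DECAY_CONSTANTS = (0.75, 0.5, 0.1)  # notional values for figure 5-23

for t in (0.5, 1.0, 7.0):
    row = [round(remaining_fraction(lam, t), 4) for lam in DECAY_CONSTANTS]
    print(t, row)

for lam in DECAY_CONSTANTS:
    print(f"lambda = {lam}: half-life = {half_life(lam):.3f} time units")
```

The computed half-lives fall between t = 0.5 and t = 1 for λ = 0.75, between t = 1 and t = 1.5 for λ = 0.5, and between t = 6.5 and t = 7 for λ = 0.1, consistent with where each column of the table crosses 0.5.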


In conclusion, two potential methods of deriving QoP metrics have been described. One method involves an analogy with the computation of median lethal dose (LD50). This method requires a network to obtain nodal lethality data. The second method involves an analogy with exponential decay of radioactive nuclei. This method was illustrated at a notional level with arbitrary values of decay constants for each of the levels of the genomic-proteomic protocol.

CHAPTER 6. CONCEPTS OF NETWORK OPERATIONS USING GENOMIC AND PROTEOMIC SECURITY

The utilization of the protocols described has to be placed into context with existing network architectures and operations. Chapter 6 describes examples of how the protocols can be incorporated into different network security applications.

6.1 Network of Networks with genomic and proteomic security

Figure 6-1 depicts a network of networks with secure communications using the genomic and proteomic security protocols. Clusters of computers and networks are interconnected by security firewall devices that exchange gene expression data and genomic and proteomic encrypted messages. Patterns of gene expression are formed by live and algorithmic entities organized together in colonies. Patterns of gene expression are continually modified such that the state of gene expression is unpredictable to unauthorized users.

Figure 6-1. Network Concept of Operations using regulation of gene expression

In figure 6-1, enterprises maintain Network BioIDs that communicate with each other via communication channels. The ciphercolonies are regularly sampled to measure their patterns of gene expression and receive external stimuli to modify those patterns of expression. Patterns of gene expression occur with the expression of multiple genes which interact with each other. Gene A produces protein A, which affects the expression of gene B, which produces protein B, etc. This creates a series of gene regulatory networks. In vivo, these regulatory networks appear in close proximity to each other, within the same cell or colony of cells, through a process of cellular signaling. In a computer network, that proximity need not be a physical proximity, but a communications channel. In vivo, the genes all exist in a physical sense within the nucleus or cellular compartment housing the DNA. In a computer network, some genes can exist in a virtual sense and these genes can communicate with the physical ones via a signaling process which alters the patterns of expression in both algorithmic and live participants.

Figure 6-2 displays a series of network firewalls communicating with a third party Certificate Authority (CA) for credentialing purposes. The CA becomes a Bio-Certificate Authority which exchanges Certificates with encrypted gene expression data and genomic and proteomic messages processed through the Network BioIDs. Not every network possesses the same capabilities. In figure 6-3 the network users behind the firewalls are equipped with Network BioIDs possessing ciphercolonies with algorithmic inhabitants. These networks would have BioID implementations that could be implemented on a flash drive.

Figure 6-2. Network deployment strategy


Figure 6-3. Network BioID deployment at the individual user level.


6.2 National Smart Grid Application

A real-world application of great national interest is securing the nation’s power grid. The infrastructure of the smart grid is being developed at the present time. Figure 6-4 provides a view of the smart grid topology [63].

Figure 6-4. Smart Grid Information Network

A prototype for evaluating the smart grid information network, secured with Network BioIDs and genomic authentication, could follow the architecture in figure 6-5.

Figure 6-5. Prototype Network BioID infused security architecture for the smart grid.

In this network prototype, a Bio-Certificate Authority and an IT Security Authority maintain credentials for the network of networks. The IT Security Authority performs its normal duties as well as maintaining a set of bio security credentials. The network is equipped with a distribution of Network BioIDs integrated into the incoming and outgoing firewalls. Simulators are developed to provide realistic loading and security network traffic.

The future smart grid will have a variety of network access capabilities. The legacy information security protocols remain in place. Added to the infrastructure is a Network BioID capability. Certain high value equipment could be equipped with BioIDs possessing live inhabitants. These live inhabitants could have their patterns of gene expression modified by the local environment such that local geographical location and conditions are uniquely identified. The Network BioIDs communicate a baseline pattern of gene expression to their peer IDs and maintain a database of gene expression responses as a function of the integrated genomic and proteomic protocol. Some nodes only possess the genomic authentication software capability as opposed to the complete BioID. There is a complete class of message traffic utilized solely for security purposes. Failure to authenticate with other BioIDs in the network would be treated as an equipment malfunction.

6.3 Hierarchical Protocol Architecture Utilizing a Certificate Authority

This section covers an implementation of the three-layer architecture utilizing a certificate authority, designated the Bio-CA [81]. It uses a traditional public key infrastructure approach to maintain compatibility with legacy security protocols. Figure 6-6 depicts the Level 1 encryption protocol.

Figure 6-6. Level 1 Sender Encryption.

These are symmetric key processes. The purple boxes in each figure refer to blocks in the BioID ciphercolony, which may or may not be local to the user computer performing encryption. Repeated references are made to the ciphergene ID. The ciphergene ID is essentially an index that points to the name of the gene whose sequence, transcription, and translation features are used in the encryption process. The DNA text message is embedded into the gene sequence by the source coding protocol previously described. A given message can be encoded differently by inserting it into different genes. The yellow boxes refer to processes at the Certificate Authority, which is assumed to be remote from the user. Not shown are the BioID ciphercolony functions that are performed at the Certificate Authority.

6.3.1 Level 1 Encryption

The level 1 process is as follows: The Sender encrypts the CID with a Bio-CA public key and transmits the encrypted CID to a remote Bio-CA. The Bio-CA decrypts the CID with its private key and retrieves a Gene Sequence Key Encryption Key (GSK) for the message associated with the CID. The Bio-CA encrypts the GSK with the Sender’s public key and transmits the GSK to the Sender. The Sender decrypts the GSK with its private key and retrieves the locus control region key (Bio-LCR) from the BioID ciphercolony database. The Bio-LCR is decrypted with the GSK. The DNA text is encrypted with the Bio-LCR, converting the DNA text to a ciphergene. The CID is encrypted with the public key of the sender and concatenated with the ciphergene for Level 2 encryption. This completes Level 1 encryption.
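The key hierarchy in the Level 1 process (the GSK unwraps the Bio-LCR, and the Bio-LCR encrypts the DNA text) can be sketched as follows. This is a toy illustration only: the public-key exchange with the Bio-CA is omitted, the XOR keystream cipher built on SHAKE-256 is a stand-in chosen for brevity and is not the dissertation's encryption method, and all key and message values are hypothetical.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from key.

    Stand-in for the unspecified symmetric cipher; not secure for real use.
    Applying it twice with the same key returns the original data.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def level1_encrypt(dna_text: bytes, wrapped_bio_lcr: bytes, gsk: bytes) -> bytes:
    """Unwrap the Bio-LCR with the GSK, then encrypt the DNA text into a ciphergene."""
    bio_lcr = toy_cipher(gsk, wrapped_bio_lcr)
    return toy_cipher(bio_lcr, dna_text)


def level1_decrypt(ciphergene: bytes, wrapped_bio_lcr: bytes, gsk: bytes) -> bytes:
    """Receiver side: unwrap the Bio-LCR with the GSK and recover the DNA text."""
    bio_lcr = toy_cipher(gsk, wrapped_bio_lcr)
    return toy_cipher(bio_lcr, ciphergene)


# Hypothetical values for illustration only.
gsk = b"gene-sequence-key-from-bio-ca"           # delivered by the Bio-CA
bio_lcr = b"locus-control-region-key"            # Bio-LCR key in plaintext form
wrapped_bio_lcr = toy_cipher(gsk, bio_lcr)       # as stored in the ciphercolony database
dna_text = b"ACGTTGCAAGCT"

ciphergene = level1_encrypt(dna_text, wrapped_bio_lcr, gsk)
print(level1_decrypt(ciphergene, wrapped_bio_lcr, gsk))
```

The same two-step pattern (a key encryption key from the Bio-CA unwraps a locally stored key, which then encrypts the payload) recurs at Levels 2, 3A and 3B with the GTFK, RPK and GTK.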


6.3.2 Level 2 Encryption

Figure 6-7 describes the level 2 encryption process for coding a ciphergene into a Pre-transcriptional complex (PTC) code. The Sender decrypts the CID with its private key. The Sender encrypts the CID with the Bio-CA public key and transmits it to a remote Bio-CA. The Bio-CA decrypts the CID with its private key and retrieves a Gene Transcription Factor Key Encryption Key (GTFK) for the message associated with the CID. The Bio-CA encrypts the GTFK with the Sender’s public key and transmits the encrypted GTFK to the Sender. The Sender decrypts the GTFK with its private key and retrieves the transcription factor key (Bio-TF) from the BioID ciphercolony database. The Bio-TF is decrypted with the GTFK. The ciphergene is encrypted with the Bio-TF, converting the ciphergene to a PTC. The CID is encrypted with the public key of the Sender and concatenated with the PTC for Level 3A encryption. This completes Level 2 encryption.


Figure 6-7. Level 2 Sender Encryption


6.3.3 Level 3A Encryption

Figure 6-8 describes the level 3A process for coding a PTC into a Basal Transcriptional Complex (BTC) code. The Sender decrypts the CID with its private key. The Sender encrypts the CID with the Bio-CA public key and transmits the encrypted CID to a remote Bio-CA. The Bio-CA decrypts the CID with its private key and retrieves an RNA Polymerase Key Encryption Key (RPK) for the message associated with the CID. The Bio-CA encrypts the RPK with the Sender’s public key and transmits the encrypted RPK to the Sender. The Sender decrypts the RPK with its private key and retrieves the RNA Polymerase key for the appropriate RNA Polymerase (Bio RPS-1, Bio RPS-II, or Bio RPS-III) from the BioID ciphercolony database. Bio RPS-1, Bio RPS-II, or Bio RPS-III is decrypted with the RPK. The PTC is encrypted with Bio RPS-1, Bio RPS-II, or Bio RPS-III, converting the PTC to a BTC. The CID is encrypted with the public key of the Sender and concatenated with the BTC for Level 3B encryption. This completes Level 3A encryption.


Figure 6-8. Level 3A Sender Encryption


6.3.4 Level 3B Encryption

Figure 6-9 describes the level 3B process for coding a Basal Transcriptional Complex (BTC) into a cipher-mRNA (c-mRNA) code. The Sender decrypts the CID with its private key. The Sender encrypts the CID with the Bio-CA public key and transmits the encrypted CID to a remote Bio-CA. The Bio-CA decrypts the CID with its private key and retrieves a Gene Transcription Key (GTK) for the message associated with the CID. The Bio-CA encrypts the GTK with the Sender’s public key and transmits the encrypted GTK to the Sender. The Sender decrypts the GTK with its private key and retrieves the Transcription instruction encryption key (Bio-TR) for transcribing and editing the BTC from the BioID ciphercolony database. Bio-TR is decrypted with the GTK. The BTC is encrypted with Bio-TR, converting the BTC to a cipher-mRNA code. The CID is encrypted with the public key of the Sender and concatenated with the cipher-mRNA for Level 3C encryption. This completes Level 3B encryption.


Figure 6-9. Level 3B Sender Encryption.

6.3.5 Level 3C Encryption

Figure 6-10 describes the level 3C process for coding c-mRNA into a cipherprotein code. The Sender decrypts the CID with its private key. The Sender encrypts the CID with the Bio-CA public key and transmits the encrypted CID to a remote Bio-CA. The Bio-CA decrypts the CID with its private key and retrieves a Gene Translation Key Encryption Key (GLK) for the message associated with the CID. The Bio-CA encrypts the GLK with the Sender’s public key and transmits the encrypted GLK to the Sender. The Sender decrypts the GLK with its private key and retrieves the Translation Key (Bio-TL) and Amino Acid key (Bio-AA) for translating and editing the c-mRNA from the BioID ciphercolony database. Bio-TL and Bio-AA are decrypted with the GLK. The c-mRNA is encrypted with Bio-AA and Bio-TL, converting it to a cipherprotein code. The CID is encrypted with the public key of the Sender and concatenated with the cipherprotein for transmission to the Receiver, who will start the decryption process at Level 3C. This completes Level 3C encryption.

Figure 6-10. Level 3C Encryption


6.3.5.1 Post-transcriptional and post-translational modifications

A number of options are available to the sender to implement the post-transcriptional and post-translational modifications after levels 3B and 3C. These mimic the in vivo processes and implement the coding options previously described.

6.3.5.2 Post-transcriptional modifications

Figure 6-11 depicts a process for the Level 3B application of post-transcriptional edits via encryption of the BTC with the Bio-PRN post-transcription instruction encryption key. The BTC is encrypted with Bio-TR followed by Bio-PRN to produce c-mRNA code. Figure 6-12 illustrates the reverse process, in which c-mRNA is decrypted by Bio-PRN followed by Bio-TR to produce BTC.


Figure 6-11. Sender Encryption Post Level 3A Post-transcriptional modifications


Figure 6-12. Receiver Encryption Post Level 3C Post-transcriptional modifications


6.3.5.3 Post-Translational Modifications

Figure 6-13 depicts a process for the Level 3C application of post-translational edits via encryption of the cipher-mRNA with a Bio-PTL post-translation instruction key. The c-mRNA is encrypted with Bio-AA, followed by Bio-TL, followed by Bio-PTL, to produce a cipherprotein code. Figure 6-14 depicts the reverse process, in which the cipherprotein is decrypted by Bio-PTL, followed by Bio-TL, followed by Bio-AA, resulting in c-mRNA.
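The reason the receiver must undo the layers in the opposite order can be shown with a short Python sketch. The layer functions are assumptions chosen only because they do not commute: a repeating-key XOR stands in for Bio-AA and Bio-PTL, and a byte rotation stands in for Bio-TL. None of these is the dissertation's actual transform.

```python
def xor_layer(key: bytes):
    """Repeating-key XOR; the same function encrypts and decrypts (toy only)."""
    def apply(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply, apply


def rotate_layer(shift: int):
    """Rotate bytes left to encrypt and right to decrypt (toy only)."""
    def encrypt(data: bytes) -> bytes:
        if not data:
            return data
        k = shift % len(data)
        return data[k:] + data[:k]

    def decrypt(data: bytes) -> bytes:
        if not data:
            return data
        k = len(data) - shift % len(data)
        return data[k:] + data[:k]

    return encrypt, decrypt


# Layer order on the sender side: Bio-AA, then Bio-TL, then Bio-PTL.
LAYERS = [
    ("Bio-AA", xor_layer(b"\x01\x02\x03")),
    ("Bio-TL", rotate_layer(2)),
    ("Bio-PTL", xor_layer(b"\x10\x20")),
]


def to_cipherprotein(c_mrna: bytes) -> bytes:
    data = c_mrna
    for _name, (encrypt, _decrypt) in LAYERS:
        data = encrypt(data)
    return data


def to_c_mrna(cipherprotein: bytes) -> bytes:
    # Receiver: Bio-PTL first, then Bio-TL, then Bio-AA.
    data = cipherprotein
    for _name, (_encrypt, decrypt) in reversed(LAYERS):
        data = decrypt(data)
    return data


def decrypt_in_sender_order(cipherprotein: bytes) -> bytes:
    # Deliberately wrong: applies the inverse layers in the sender's order.
    data = cipherprotein
    for _name, (_encrypt, decrypt) in LAYERS:
        data = decrypt(data)
    return data


c_mrna = b"AUGGCUUAA"
cipherprotein = to_cipherprotein(c_mrna)
```

With these toy layers, `to_c_mrna(cipherprotein)` recovers the input, while `decrypt_in_sender_order(cipherprotein)` does not.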

Figure 6-13. Sender Post Level 3B Post-translational modifications


Figure 6-14. Receiver Post Level 3B Post-translational modifications


6.3.6 Receiver Processing

The implementation of the sender-to-receiver security can be performed with or without forcing expression of a protein (either cipher or biological) at the receiver. In vivo, proteins are expressed in response to a cellular demand. In vivo, cells expend energy to perform the processes of transcription and translation. At the network level, the same constraints hold true. It requires energy and computing resources to perform the processes of transcription and translation within the Network BioID. This overhead would not be expended on every security transaction between sender and receiver.

6.3.6.1 Receiver processing without ciphergene protein expression

Figure 6-15 describes the process of decrypting cipherprotein to c-mRNA. The CID is decrypted with the Receiver private key, encrypted with the Bio-CA public key, sent to the remote Bio-CA, and decrypted with the Bio-CA private key, and the GLK is retrieved. The GLK is encrypted with the Receiver public key and transmitted to the Receiver. The Receiver decrypts the GLK with its private key and retrieves Bio-TL and Bio-AA from the BioID ciphercolony database. Bio-TL and Bio-AA are decrypted with the GLK. The cipherprotein is decrypted with the Bio-AA and Bio-TL and converted to the c-mRNA. The c-mRNA is concatenated with the CID and encrypted with the Receiver public key for Level 3B decryption. This completes Level 3C decryption.


Figure 6-15. Receiver Level 3C Decryption.


6.3.6.2 Level 3B Decryption

Figure 6-16 describes the process of decrypting cipher-mRNA to BTC. The CID is decrypted with the Receiver private key, encrypted with the Bio-CA public key, sent to the remote Bio-CA, and decrypted with the Bio-CA private key, and the GTK is retrieved. The GTK is encrypted with the Receiver public key and transmitted to the Receiver. The Receiver decrypts the GTK with its private key and retrieves Bio-TR from the BioID ciphercolony database. Bio-TR is decrypted with the GTK. The cipher-mRNA is decrypted with the Bio-TR and converted to the BTC. The BTC is concatenated with the CID and encrypted with the Receiver public key for Level 3A decryption. This completes Level 3B decryption.

Figure 6-16. Receiver Level 3B Decryption.

6.3.6.3 Level 3A Decryption

Figure 6-17 describes the process of decrypting BTC to PTC. The CID is decrypted with the Receiver private key, encrypted with the Bio-CA public key, sent to the remote Bio-CA, and decrypted with the Bio-CA private key, and the RPK is retrieved. The RPK is encrypted with the Receiver public key and transmitted to the Receiver. The Receiver decrypts the RPK with its private key and retrieves Bio RPS-1, Bio RPS-II, or Bio RPS-III from the BioID ciphercolony database. Bio RPS-1, Bio RPS-II, or Bio RPS-III is decrypted with the RPK. The BTC is decrypted with the Bio RPS-1, Bio RPS-II, or Bio RPS-III and converted to the PTC. The PTC is concatenated with the CID and encrypted with the Receiver public key for Level 2 decryption. This completes Level 3A decryption.

Figure 6-17. Receiver Level 3A Decryption Process


6.3.6.4 Level 2 Decryption

Figure 6-18 describes the process of decrypting PTC to a ciphergene. The CID is decrypted with the Receiver private key, encrypted with the Bio-CA public key, sent to the remote Bio-CA, and decrypted with the Bio-CA private key, and the GTFK is retrieved. The GTFK is encrypted with the Receiver public key and transmitted to the Receiver. The Receiver decrypts the GTFK with its private key and retrieves the Bio-TF from the BioID ciphercolony database. The Bio-TF is decrypted with the GTFK. The PTC is decrypted with the Bio-TF and converted to the ciphergene. The ciphergene is concatenated with the CID and encrypted with the Receiver public key for Level 1 decryption. This completes Level 2 decryption.

Figure 6-18. Receiver Level 2 Decryption.


6.3.6.5 Level 1 Decryption

Figure 6-19 describes the process of decrypting ciphergene to DNA text. The CID is decrypted with the Receiver private key, encrypted with the Bio-CA public key, sent to the remote Bio-CA, and decrypted with the Bio-CA private key, and the GSK is retrieved. The GSK is encrypted with the Receiver public key and transmitted to the Receiver. The Receiver decrypts the GSK with its private key and retrieves the Bio-LCR from the BioID ciphercolony database. The Bio-LCR is decrypted with the GSK. The ciphergene is decrypted with the Bio-LCR and converted to DNA text. This completes Level 1 decryption. The end result is the plaintext.

Figure 6-19. Receiver Level 1 Decryption
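The five sender levels and their reversal at the receiver can be summarized in a short Python sketch. The level names, key names, and intermediate products are taken from the text; the cipher itself (`toy_cipher`, an insecure XOR keystream), the `keys` dictionary, and the key values are assumptions for illustration, and the Bio-CA key-release exchange at each level is omitted.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Self-inverse and NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


# (level, ciphercolony key named in the text, sender-side product)
LEVELS = [
    ("1", "Bio-LCR", "ciphergene"),
    ("2", "Bio-TF", "PTC"),
    ("3A", "Bio RPS", "BTC"),
    ("3B", "Bio-TR", "c-mRNA"),
    ("3C", "Bio-TL + Bio-AA", "cipherprotein"),
]


def sender_encrypt(dna_text: bytes, keys: dict) -> list:
    """Apply Levels 1, 2, 3A, 3B, 3C in order; return every intermediate product."""
    stages = []
    data = dna_text
    for level, _key_name, product in LEVELS:
        data = toy_cipher(keys[level], data)
        stages.append((product, data))
    return stages


def receiver_decrypt(cipherprotein: bytes, keys: dict) -> bytes:
    """Undo the levels in the order 3C, 3B, 3A, 2, 1."""
    data = cipherprotein
    for level, _key_name, _product in reversed(LEVELS):
        data = toy_cipher(keys[level], data)
    return data


# Hypothetical per-level keys and message.
keys = {level: f"hypothetical key for level {level}".encode() for level, _, _ in LEVELS}
dna_text = b"GGGGGGTTAAGGCCGG"
stages = sender_encrypt(dna_text, keys)
cipherprotein = stages[-1][1]
recovered = receiver_decrypt(cipherprotein, keys)
```

In this sketch `recovered` equals `dna_text`, and `stages` lists the ciphergene, PTC, BTC, c-mRNA, and cipherprotein in the order the sender produces them.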


6.3.6.6 Receiver Processing with gene expression

The implementation of the option shown in the Alice and Bob diagram is shown in figure 6-20. The Sender provides a temporary or provisional CID and a cipherprotein sequence to a Receiver. The Receiver retrieves a GLK by the processes previously specified and retrieves a Bio-GE key from the BioID. The Bio-GE key decrypts a gene expression protocol hidden in the cipherprotein sequence code. The BioID executes the gene expression protocol for the cipherprotein and detects the results. In this example, a fluorescence detection operation is performed. The detected result is encrypted with the Sender public key and transmitted to the Sender for validation. If the pattern of expression returned by the Receiver matches the stored pattern at the Sender, the Receiver receives a CID for a subsequent message or transaction.
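The final validation step at the Sender can be sketched as follows. The representation of an expression pattern as a byte string and the function name `validate_expression` are assumptions for illustration; the public-key transport of the detected result is omitted.

```python
import hmac
from typing import Optional


def validate_expression(returned_pattern: bytes,
                        stored_pattern: bytes,
                        next_cid: bytes) -> Optional[bytes]:
    """Release the next CID only if the Receiver's detected pattern matches."""
    # compare_digest avoids leaking the position of the first mismatch via timing.
    if hmac.compare_digest(returned_pattern, stored_pattern):
        return next_cid
    return None


# Hypothetical patterns, e.g. thresholded fluorescence readings.
stored = bytes([1, 0, 1, 1, 0, 0, 1, 0])
granted = validate_expression(bytes([1, 0, 1, 1, 0, 0, 1, 0]), stored, b"CID-0002")
refused = validate_expression(bytes([1, 1, 1, 1, 0, 0, 1, 0]), stored, b"CID-0002")
```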


Figure 6-20. Receiver processing with fluorescent protein detection


6.3.7 A gene expression authentication session

A four-step process of user authentication is shown in figures 6-21 through 6-24. Figure 6-21 is a message flow diagram of step 1 of the four-step authentication process, in which a user requests access (to a Web Server, E-mail Server, VPN firewall, Cloud, etc.) from an IT Authority (“ITA”). Figure 6-22 is step 2, which is a challenge from the ITA. Figure 6-23 is a message flow diagram of step 3, which is the user response. Figure 6-24 is a message flow diagram of step 4, in which the ITA authentication acknowledgement occurs (access granted). In figure 6-21, the first step is depicted. A user generates an Access Request and supplies a Certificate from a recognized Certificate Authority (CA). The User Certificate contains the usual legacy CA information, plus a BioUser ID (BUID), which identifies the user as possessing additional credentials for access to this protected resource. A hash of the BUID is created, and a message containing the credentials of the user, encrypted with the public key of the ITA, is sent to the ITA as message S over the Internet. Assuming all other authentication data is acceptable, the ITA verifies the BUID and either issues a valid session key or ignores the network request, thus depriving the user of any valid network responses that could be used in a subsequent attack.

Figure 6-22 illustrates the next step in the authentication process, in which the ITA verifies the credentials in message S. The ITA decrypts S using its private key and executes the DNA authentication protocol. If the authentication is successful, the ITA retrieves the cipherprotein message and CID specified by the user BUID from the protein message database. The cipherprotein || CID sequence is decrypted with the private key of the ITA. The cipherprotein sequence is decrypted to a plaintext message for use in a plaintext challenge message. Message M, encrypted with the public key of the user, is transmitted over the Internet to the user.

Figure 6-23 illustrates step 3 of the process, beginning with the user decrypting the message with his or her private key. The plaintext has the keyed hash code appended by the ITA, which is extracted, and the message contents are subjected to the authentication protocol. If the authentication process is successful, the plaintext is encrypted with the keyed hash of the pss and the resulting DNA text is encrypted to a cipherprotein sequence. The challenge response from the user is created by hashing the concatenated cipherprotein and CID with the pss hash key. The hash is encrypted with the public key of the ITA and transmitted as message Q.

Figure 6-24 is the final step in the process. The ITA decrypts Q with its private key. Both hash codes are authenticated. If the hash codes pass authentication, the ITA generates a session key for the user. The session key is hashed with the pss. The session key and the hash of the session key are encrypted with the public key of the user and transmitted to the user over the Internet. The user now has a valid session key to establish a secure session or tunnel as required.
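Steps 2 through 4 can be outlined in a minimal Python sketch. Several substitutions are assumptions made so the example is self-contained: HMAC-SHA256 stands in for the genomic keyed hash, `toy_cipher` (an insecure XOR keystream) stands in for the plaintext-to-cipherprotein encryption, `pss` is treated as a secret already shared by the user and the ITA, and the public-key wrapping of messages S, M, and Q is omitted.

```python
import hashlib
import hmac
import secrets


def keyed_hash(pss: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in for the keyed hash (assumption)."""
    return hmac.new(pss, data, hashlib.sha256).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Self-inverse and NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def ita_challenge(pss, cipherprotein, protein_key):
    """Step 2: ITA turns the stored cipherprotein into a tagged plaintext challenge."""
    plaintext = toy_cipher(protein_key, cipherprotein)
    return plaintext, keyed_hash(pss, plaintext)


def user_response(pss, plaintext, tag, protein_key, cid):
    """Step 3: user checks the tag, re-encrypts, and hashes cipherprotein || CID."""
    if not hmac.compare_digest(tag, keyed_hash(pss, plaintext)):
        return None
    cipherprotein = toy_cipher(protein_key, plaintext)
    return keyed_hash(pss, cipherprotein + cid)


def ita_grant(pss, response, cipherprotein, cid):
    """Step 4: ITA checks the response and issues a session key with its hash."""
    expected = keyed_hash(pss, cipherprotein + cid)
    if response is None or not hmac.compare_digest(response, expected):
        return None
    session_key = secrets.token_bytes(32)
    return session_key, keyed_hash(pss, session_key)


# Hypothetical values for one session.
pss = b"shared-pss"
protein_key = b"hypothetical protein key"
cid = b"CID-0001"
stored_cipherprotein = toy_cipher(protein_key, b"challenge text")

plaintext, tag = ita_challenge(pss, stored_cipherprotein, protein_key)
response = user_response(pss, plaintext, tag, protein_key, cid)
grant = ita_grant(pss, response, stored_cipherprotein, cid)
```

A user who does not hold the same `pss` fails the tag check in step 3 and never produces a valid response, so `ita_grant` returns `None`.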


Figure 6-21. Step 1 - User Access Request

Figure 6-22. Step 2 - Proteomic message challenge/response authentication


Figure 6-23. Step 3 - User Response.


Figure 6-24. Step 4 - User Access Granted


CHAPTER 7 - CONCLUSIONS

Certain axioms of network security can be asserted based upon the experiences of the information age.

1. At some point in time, all networks will be successfully attacked by a variety of threats using known and unknown vulnerabilities.

2. The trend toward greater use of open source software will perpetuate the ability of attackers to invade networks and computers to retrieve data for the purposes of committing crimes and obtaining revenue for criminals.

3. Every increase in the capability of users to share data via new capabilities provided by operating systems, commercial off-the-shelf application products, websites, etc. makes individuals and networks more likely to be subjected to loss of personally identifiable information and commercially valuable data, and to potential financial and personal harm.

The genomic and proteomic security protocols offer the possibility of decreased vulnerability for networks and individuals that cooperate in utilizing these protocols. The protocols can have a high overhead in terms of data volume between users, but this cost is offset by the increased security. The protocols offer a different set of challenges than those presented by the legacy cryptographic systems. The protocols offer a potential to increase the fidelity of authentication and confidentiality using both the information and biological domains. Implementation of these protocols does not require replacement of the legacy security formulations.

Given that the vulnerabilities of operating systems, network platforms, programming languages, and personal behavior are unlikely to disappear, the use of genomics and proteomics offers a practical augmentation to existing security systems.


REFERENCES

[1] A., Shabtai, Y., Fledel, U. Kanonov, Y. Elovici, S. Dolev, and C. Glezer, "Google Android: A Comprehensive Security Assessment," Security & Privacy, IEEE , 8,(2), 35- 44, March-April 2010, doi: 10.1109/MSP.2010.2 [2] D. J. Rogers, “ Broadband Quantum Cryptography”, Synthesis Lectures on Quantum Computing, 2010, Morgan and Claypool, San Rafael, CA, pp. 63-71, doi:10.2200/S00265ED1V01Y201004QMC003 [3] H.M. Elkamchouchi, A.-A.M. Emarah, and E.A.A. Hagras, "A New Public Key Signcrypted Challenge Response Identification (PKS-CR-ID) Protocol Using Smart Cards," Computer Engineering and Systems, The 2006 International Conference on, 244-249, November. 2006, doi: 10.1109/ICCES.2006.320455 [4] R. Novak, D. Naccache, P. Paillier, “SPA-Based Adaptive Chosen-Ciphertext Attack on RSA Implementation”, Lecture Notes in Computer Science, 2002, Springer, Berlin / Heidelberg, doi: 10.1007/3-540-45664-3_18 [5] W. Schindler, “A Timing Attack against RSA with the Chinese Remainder Theorem”, Cryptographic Hardware and Embedded Systems — CHES 2000, Lecture Notes in Computer Science, 2000, Springer, Berlin / Heidelberg, 1965, pp. 109-124, doi: 10.1007/3-540-44499-8_8 [6] M.J. Wiener, “Cryptanalysis of short RSA secret exponents”, Information Theory, IEEE Transactions, 36,(3),553-558 [7] J.F. Misarsky, “A multiplicative attack using LLL algorithm on RSA signatures with redundancy”, Advances in Cryptology — CRYPTO '97, Lecture Notes in Computer Science, 1997, 221-234, Springer, Berlin / Heidelberg, doi: 10.1007/BFb0052238 [8] P. C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology — CRYPTO ’96, Lecture Notes in Computer Science, 1996, 104-113, Springer, Berlin/Heidelberg, doi: 10.1007/3-540- 68697-5_9 [9] S. Wang, F. Bao and R.H. Deng, “Cryptanalysis of a Forward Secure Blind Signature Scheme with Provable Security”, Information and Communications Security, Lecture Notes in Computer Science, 2005, 3783/2005, pp. 
53-60, Springer, Berlin/Heidelberg DOI: 10.1007/11602897_5 [10] B.N Levine, and C Shields, “A Survey of Solutions to the Sybil Attack”, University of Massachusetts Amherst, 2006, Technical Report [11] J. R. Douceur, “The Sybil Attack”, Peer-to-Peer Systems, Lecture Notes in Computer Science, 2002, 2429/2002, pp. 251-260, Springer, Berlin/Heidelberg, doi: 10.1007/3-540-45748-8_24 [12] S. Yi, and R. Kravets, “MOCA : MObile Certificate Authority for Wireless Ad 311

Hoc Networks”, Report No. UIUCDCS-R-2004-2502, UILU-ENG-2004-1805, December, 2004. [13] X. Wang and H. Yu, “How to Break MD5 and Other Hash Functions”, Eurocrypt 2005, Lecture Notes in Computer Science, 3494, pp. 19-35 [14] S. N. Krishna, and R. Rama, “Breaking DES using P systems”, Theoretical Computer Science, Elsevier, 299, pp. 495-508, 18 April 2003, doi: 10.1016/S0304- 3975(02)00531-5. [15] A. Moradi, M. T. M. Shalmani and M. Salmasizadeh, “A Generalized Method of Differential Fault Attack Against AES Cryptosystem”, Cryptographic Hardware and Embedded Systems - CHES 2006 Lecture Notes in Computer Science, 2006, 4249/2006, pp. 91-100, Springer, Berlin/Heidelberg, doi: 10.1007/11894063_8 [16] A. Biryukov, D. Khovratovich and I. Nikoli, “Distinguisher and Related-Key Attack on the Full AES-256 (Extended Version)”, Cryptology ePrint Archive: Report 2009/241 [17] US-Cert, Vulnerability Note VU#228186, “Hot Standby Router Protocol (HSRP) uses weak authentication”, Dec. 2001 [18] http://www.booches.nl/2008/07/secure-hsrp-configuration/, Accessed, April 27,2012 [19] H. Xu, A Reddyreddy, and D. Fitch, “Defending Against XML-Based Attacks Using State-Based XML Firewall”, Journal of Computers, 6,(11), 2395-2407 ,Nov. 2011, Academy, Oulu, Finland [20] H. Singh, K. Chugh, H. Dhaka and A. K. Verma, “DNA based Cryptography: an Approach to Secure Mobile Networks”, International Journal of Computer Applications 1(1), 77–80, February 2010. [21] P. Vijayakumar, V. Vijayalakshmi and G. Zayaraz. “DNA Computing based Elliptic Curve Cryptography”, International Journal of Computer Applications, 36,(4), 18-21, December 2011 [22] A. Gehani, T. LaBean, and J. Reif, “DNA-based Cryptography, Aspects of Molecular Computing”, Springer-Verlag Lecture Notes in Computer Science, 2950, pp. 167—188, 2004 Springer, Berlin [23] S. Sadeg, M. Gougache, N. Mansouri, and H. 
Drias, “An encryption algorithm inspired from DNA”, Machine and Web Intelligence (ICMWI), 2010 International Conference on, 3-5 Oct. 2010, pp. 344 – 349, IEEE, Piscataway, NJ [24] N.G. Bourbakis, “Image Data Compression-Encryption Using G-Scan Patterns”, Systems, Man, and Cybernetics, IEEE International Conference on Computational Cybernetics and Simulation, 2, pp. 1117—1120, October 1997 [25] A. Leier, C. Richter, W. Banzhaf, and H. Rauhe, “Cryptography with DNA binary strands”, BioSystems, 57, (1), pp. 13-22, June 2000 [26] D. Heider and A. Barnekow, “DNA-based watermarks using the DNA-Crypt 312

algorithm”, BMC Bioinformatics, 8, pp. 176, May 2007 [27] D. Heider and A. Barnekow, “DNA watermarks: A proof of concept”, BMC Molecular Biology, 2008, 9, p40 [28] F. R. Blattner , G. Plunkett, C. Bloch, N. Perna, V. Burland, M. Riley, et al., “The Complete Genome Sequence of Escherichia coli K-12”, Science, 277,(5331), 1453-1469, 1997, AAAS, Washington, DC [29] S.F. Gilbert, Developmental Biology. 6th edition, Differential Gene Transcription, Sunderland (MA), Sinauer Associates, 2000 [30] T, Read Strachan, Human Molecular Genetics, 2nd edition, Chapter 1, DNA structure and gene expression, AP. New York: Wiley-Liss; 1999 [31] G. Meister, RNA Biology, An Introduction”, pp. 20-70, 2011, WileyVCH, Weinheim, Germany [32] C. Lim, B. Santoso, T. Boulay, E. Dong, U. Ohler, J. T. Kadonaga, “The MTE, a new core promoter element for transcription by RNA polymerase II”, Genes & Dev. 18, 1606-1617, 2004, CSH Press, Woodbury, NY, doi: 10.1101/gad.1193404 [33] D. Latchman, Eukaryotic Transcription Factors, 5th Ed., pp. 10-105, 2008, Elsevier Press, Oxford, UK [34] H. Shaw and S. Hussein, “A DNA-Inspired Encryption Methodology for Secure, Mobile Ad-Hoc Networks (MANET)”, Proceedings of the First International Conference on Biomedical Electronics and Devices, BIOSIGNALS 2008, Funchal, Madeira, Portugal, (2), pp. 472-477, 2008 [35] Shannon, Claude, “Communication Theory of Secrecy Systems”, Bell System Technical Journal, 28, (4), 656-715, 1949 [36] S. F. Elena, P. Carrasco, J. A. Daròs, and R. Sanjuán, “Mechanisms of genetic robustness in RNA viruses”, EMBO Report (2006), 7, 168-173 [37] M.W. Nachman, S.L. Crowell, “ Estimate of the mutation rate per nucleotide in humans”, Genetics, 156, (1), 297-304, 2000, Genetics Society of America [38] G. Cochrane, I. K.-Mizrachi and Y. Nakamura, “The International Nucleotide Sequence Database Collaboration”, Nucleic Acids Research, 2011, 39, D15–D18, doi:10.1093/nar/gkq1150 [39] B. Alberts, A. Johnson, J. Lewis, L. Raff, K. Roberts, and P. 
Walter, Molecular Biology of the Cell, 4th edition, DNA Replication Mechanisms, New York, NY: Garland Science, 2002 [40] W. Stallings, Cryptography and Network Security, 4th edition, Upper Saddle River, NJ: Pearson Prentice-Hall, 2006, pp. 134- 355 [41] C. M. Fraser, J. D. Gocayne, O. White, M.D Adams, R. A. Clayton, R. D. Fleischmann, et al., “The Minimal Gene Complement of Mycoplasma genitalium”, Science, 270, (5235), pp. 397-403, 1995, AAAS, Washington, DC, doi: 10.1126/science.270.5235.397 313

[42] C. J. Mitchell, “Truncation attacks on MACs”, Electronics Letters - IET, 39, (20), pp. 1439-1440, 2003, doi: 10.1049/el:20030921 [43] M. Ptashne and A. Gann , Genes and Signals, pp. 100-192, CSH Press, Cold Spring Harbor, NY, 2001 [44] S. Kundu and C. L. Peterson, “Role of chromatin states in transcriptional memory“, Biochim Biophys Acta., 1790, (6), pp. 445–455, June 2009, doi:10.1016/ j.bbagen.2009.02.009 [45] S. Thiagalingam, K.-H. Cheng, H. J. Lee, N. Mineva, A. Thiagalingam, and J. F. Ponte, “Histone Deacetylases: Unique Players in Shaping the Epigenetic Histone Code”, Annals of the New York Academy of Sciences, 983, pp. 84–100, March 2003, New York, NY, doi: 10.1111/j.1749-6632.2003.tb05964.x [46] T. Jenuwein and C. D. Allis, “Translating the Histone Code”, Science, 293, (5532), pp. 1074-1080, August 2001, AAAS, Washington, DC, doi: 10.1126/science.1063127 [47] T. M. Cover and J. A. Thomas, , Elements of Information Theory 2nd Ed., pp,103-347, 2006, Wiley Interscience, Hoboken, NJ [48] A. D. Basehoar, S. J. Zanton and B. F. Pugh, “Identification and Distinct Regulation of Yeast TATA Box-Containing Genes”, Cell, 116, pp. 699–709, March 5, 2004 [49] C. Bernecky, P. Grob, C. C. Ebmeier, E. Nogales, D. J. Taatjes, “Molecular Architecture of the Human Mediator–RNA Polymerase II–TFIIF Assembly Polymerase II–TFIIF Assembly”, PLoS Biology, 9, (3), pp. 1-18, PLoS, San Francisco, CA, March 2011 [50] M. C. Thomas and C. Chiang, “The General Transcription Machinery and General Cofactors”, Critical Reviews in Biochemistry and Molecular Biology, 41,(3), pp. 105–178, Informa Healthcare, 2006 [51] A. L. Roy,” Biochemistry and biology of the inducible multifunctional transcription factor TFII-I: 10 years later”, Gene, 492, (1), pp. 32-41, Elsevier Press, Oxford, UK [52] S. Hahn, “Structure and Mechanism of the RNA Polymerase II Transcription Machinery”, Nature Structural and Molecular Biology, 11, (5), May 2004, pp. 394-403. [53] H. 
Shaw, “Modeling Patterns of microRNA:mRNA Regulation Through Utilization of Cryptographic Algorithms”, The Fifth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, March 2013, Accepted for publication, December 2012. [54] C.T. Walsh, S. Garneau-Tsodikova, and G.J. Gatto, Jr., ”Protein Posttranslational Modifications: The Chemistry of Proteome Diversifications”, Angew. Chem. Int. Ed., 44, pp. 7342 – 7372, 2005, Wiley VCH, Weinheim, Germany [55] T. Juven-Gershon, J-Y. Hsu, J. W. M. Theisen, and J. T. Kadonaga,” The RNA

314

Polymerase II Core Promoter – the Gateway to Transcription”, Curr Opin Cell Biol., 2008 ; 20, (3): pp. 253–259. doi:10.1016/j.ceb.2008.03.003. [56] T. Sengupta, N. Cohet, F. Morlé, and J. J. Biekera, “Distinct modes of gene regulation by a cell-specific transcriptional activator”, Proc Natl Acad Sci 2009, 106,(11), pp. 4213–4218. doi: 10.1073/pnas.08083471 [57] Federal Information Processing Standards Publications (FIPS 197), “Advanced Encryption Standard (AES) “, 26 Nov. 2001. [58] W. D. Grover, M. Clouqueur, “Span-Restorable Mesh Networks with Multiple Quality of Protection (QoP) Service Classes”, Photonic Network Communications, January 2005, 9, (1), pp 19-34 [59] T. Levin, C. Irvine, “Quality of Security Service”, Proceedings of the New Security Paradigms Workshop, Ballycotton, Ireland, pp. 1-10, September 2000. [60] Kaspersky Security Bulletin 2012. The overall statistics for 2012, http://www.securelist.com/en/analysis/204792255/Kaspersky_Security_Bulletin_2012_T he_overall_statistics_for_2012#9, Accessed, February, 3, 2013 [61] H. J. Horn, “Simplified LD50 (or ED50) Calculations”, Biometrics, 12, (3), September 1956, International Biometric Society, pp. 311-322 [62] C. Frühwirth, T. Männistö, “Improving CVSS-based vulnerability prioritization and response with context information”, ESEM '09 Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society, Washington, DC, pp 535-544, doi:10.1109/ESEM.2009.5314230 [63] “NIST Special Publication 1108: NIST framework and roadmap for smart grid interoperability standards, release 1.0”, National Institute of Standards and Technology, www.nist.gov/public_affairs/releases/upload/smartgrid_interoperability_final.pdf; 2010 accessed August 2, 2010. [64] K-J. Armache, H. Kettenberger, and P. Cramer, “Architecture of initiation- competent 12-subunit RNA polymerase II”, Proc Natl Acad Sci, June 2003, 100, (12), 6964-6968 [65] R. Qin, I. G. Macara, B. R. 
Cullen, “Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs”, Genes & Dev, 17, pp. 3011–3016 [66] T. P. Chendrimada, R. I. Gregory, E. Kumaraswamy, J. Norman, N. Cooch, K. Nishikura and R. Shiekhattar, “TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing”, Nature, 436, 740-744, August 2005, doi:10.1038/nature03868 [67] W. Ye, F. Qin, J. Zhang2, R. Luo3, H. Chen, “Atomistic Mechanism of MicroRNA Translation Upregulation via Molecular Dynamics Simulations”, PLoS One, August 2012, 7, (8), 1-11 [68] R. J. Taft, M. Pheasant, and J. S. Mattick,” The relationship between non-protein- coding DNA and eukaryotic complexity”, BioEssays, 29, 288–299, 2007 Wiley

315

Periodicals, Inc. [69] A. Moh'd, Y. Jararweh, L. Tawalbeh, “AES-512: 512-bit Advanced Encryption Standard algorithm design and evaluation,”, Information Assurance and Security (IAS), 2011 7th International Conference on, pp.292-297, Dec. 2011, doi: 10.1109/ISIAS.2011.6122835 [70] J. Lendino, “Hack Yourself”, PC Magazine, 11/20/2007, 26, Issue 23, pp87-88 [71] S. Shivkumar, and G. Umamaheswari, “Performance Comparison of Advanced Encryption Standard (AES) and AES Key Dependent S-Box - Simulation Using MATLAB”, Process Automation, Control and Computing (PACC), 2011 International Conference on ,1-6, July 2011, doi: 10.1109/PACC.2011.5979007 [72] C. S. Ong; K. Nahrstedt, W. Yuan , "Quality of protection for mobile multimedia applications", Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on, 2, pp. II-137 – II-40, July 2003, doi: 10.1109/ICME.2003.1221572 [73] Y., Seung, P. Naldurg, and R. Kravets. “Security-aware ad hoc routing for wireless networks.” Technical Report UIUCDCS, 2001. [74] M. Ergen, “Mobile Broadband: Including WiMAX and LTE, Springer Science+Buisness Media LLC, 2009, doi:10.1007/978-0-387-68192-4-1 [75] A. Travers, DNA-Protein Interactions, 1993, Chapman & Hall, London, UK [76] C. Walsh, Posttranslational modification of proteins: Expanding nature's inventory. Roberts and Co., Englewood, Colo. 2006 [77] V. Guasconi, H. Yahi, S. Ait-Si-Ali, Atlas of Genetics and Cytogenetics in Oncology and Haematology, Accessed February 11, 2013 http://atlasgeneticsoncology.org/Deep/TranscripFactorsID20043.html [78] Interactive Concepts in Biochemistry, Accessed February 11, 2013, http://www.wiley.com/college/boyer/0470003790/structure/tRNA/trna_intro.htm. [79] H. Lodish, A. Berk, S.L. Zipursky, et al., Molecular Cell Biology. 4th edition, New York, W. H. Freeman; 2000, Sec. 
17.6, Post-Translational Modifications and Quality Control in the Rough ER [80] http://useast.ensembl.org/index.html, Ensembl Genome Browser, Accessed February 11, 2013 [81] H. Shaw, “Integrated Genomic and Proteomic Security Protocol”, U.S. Patent Application Serial Number 13/21,432, August 11, 2011 [82] H. Shaw, S. Hussein, and H. Helgert, "Prototype Genomics-Based Keyed-Hash Message Authentication Code Protocol," 2nd International Conference on Evolving Internet, pp. 131-136, September 2010, doi: 10.1109/INTERNET.2010.31

316

[83] H. Shaw, S. Hussein, and H. Helgert, “Genomics-based Security Protocols: From Plaintext to Cipherprotein”, International Journal on Advances in Security, 4 (1 & 2), 2011, http://www.iariajournals.org/security/

[84] H. Shaw, S. Hussein and H. Helgert, “Adaptive Self-Correcting Floating Point Source Coding Methodology for a Genomic Encryption Protocol”, International Journal of Computer Applications 56, (3):1-5, October 2012. Foundation of Computer Science, New York, USA

317

APPENDIX A – DIFFUSION AND CONFUSION METRICS FOR ENCRYPTED MESSAGES

This DNA Encryption method uses evolutionary computing principles of fitness algorithms to determine which encrypted mutants should be selected as the final encrypted ciphertext. Two parameters, Confusion and Diffusion, are used as the basis of the fitness criteria. Diffusion and Confusion are fundamental characteristics of ciphers. Shannon describes them as:

• Diffusion: any redundancy or patterns in the plaintext message are dissipated into the long-range statistics of the ciphertext message.
• Confusion: the relationship between the plaintext and the ciphertext is made complex. A simple substitution cipher would provide very little Confusion to a code breaker.

For the purposes of this initial analysis, consider a segment of one of the encrypted mutant outputs compared to the corresponding plaintext precoded message as shown in figure A-1 below.

Confusion: compare the DNA-coded message fragment with the encrypted output base by base. If two bases match, increment the Confusion counter (increasing counter → decreasing Confusion).

DNA-coded message fragment:  GGGGGGTTAAGGCCGG ………
Encrypted output:            ggGtgcttgaGcggcc

Diffusion: count the number of transitional bases (a, c, t, g) in the encrypted output (increasing number of transitional bases → increasing Diffusion).

Figure A-1. Confusion and Diffusion metrics
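
The counting rules in Figure A-1 can be sketched in a few lines of Python. This is a minimal illustration, not the program used in this research. It assumes that the comparison is case-sensitive, so that only an unchanged upper-case base counts as a match, and that the transitional bases are exactly the lower-case letters in the encrypted output. Both assumptions are readings of the figure, and the function names are hypothetical.

```python
from collections import Counter

def confusion_score(coded: str, encrypted: str) -> int:
    """Count positions where the encrypted base equals the coded base.

    Assumption: the comparison is case-sensitive, so only an unchanged
    upper-case base counts as a match. A higher count means less Confusion.
    """
    return sum(1 for p, c in zip(coded, encrypted) if p == c)

def diffusion_scores(encrypted: str) -> dict:
    """Count transitional bases, assumed here to be the lower-case a, c, t, g."""
    counts = Counter(ch for ch in encrypted if ch in "actg")
    return {base: counts.get(base, 0) for base in "actg"}

coded = "GGGGGGTTAAGGCCGG"       # fragment shown in Figure A-1
encrypted = "ggGtgcttgaGcggcc"   # encrypted output shown in Figure A-1
print(confusion_score(coded, encrypted))
print(diffusion_scores(encrypted))
```

Under these assumptions the fragment yields two matching bases and fourteen transitional bases, split across the four base types.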

Table A-1 provides the diffusion metrics for 3 generations of encrypted sense mutants. The second and third generations are created by encrypting the previous generation. Table A-2 provides the confusion metrics for 3 generations of encrypted sense mutants. There are 6 encrypted messages and 20 chromosome keys, yielding a total of 120 encrypted mutant sense sequences and 120 encrypted mutant anti-sense sequences.


Table A-1. Diffusion metrics for 3 generations of encrypted sense mutants


Table A-2. Confusion metrics for 3 generations of encrypted sense mutants


A.1 Generating Fitness algorithm from Diffusion and Confusion scores

One of the powerful features of this encryption approach is the ability to select the best encrypted output from a population of potential encrypted outputs. The fitness of the pre-annealed, encrypted message strings is analyzed, which permits optimization of the fitness algorithm. The approach for fitness scoring is shown in equations B-1 through B-3.

FitConf = Cscore / MaxCscore (B-1)
FitDiff = (DiffScore_a + DiffScore_t + DiffScore_c + DiffScore_g) / MaxCscore (B-2)
F = -FitConf + FitDiff (B-3)

Confusion scores (Cscore and MaxCscore) and Diffusion scores (DiffScore_a for transitional base ‘a’, DiffScore_t for transitional base ‘t’, DiffScore_g for transitional base ‘g’, and DiffScore_c for transitional base ‘c’) are computed for each of the 120 encrypted message mutants from the sense strand and for an additional 120 encrypted message mutants from the anti-sense strands, using the scoring described previously. For this exercise, only the encrypted message mutants from the sense strand are presented. FitConf and FitDiff are calculated for each mutant. The goal is to balance the confusion score, in which increasing values denote decreasing confusion, against the diffusion score, in which increasing values denote increasing diffusion. The scale factor provides a mechanism to spread the scores and make them easier to differentiate. For the first generation of encryption, the goal is a score slightly above zero. The value of the fitness algorithm will be demonstrated by selecting the best encrypted outputs, annealing them, and then re-encrypting them; the fitness scores should improve with the second generation of mutants. Table A-3 shows the raw output of the fitness algorithm applied to 120 encrypted mutants for 3 generations of encryption without selection. The column labeled ‘Chrom’ is a reference designation used to keep track of the chromosome key being used. The average fitness for 5 out of 6 encrypted messages increased steadily across generations, indicating that the fitness algorithm is performing as expected.
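
Equations B-1 through B-3 translate directly into code. The sketch below is illustrative only. The function name is hypothetical, MaxCscore is taken here to be the fragment length (the largest possible match count), which is an assumption, and the scale factor mentioned in the text is omitted because its value is not given.

```python
def fitness(cscore: int, max_cscore: int, diff_scores: dict) -> float:
    """F = -FitConf + FitDiff, following equations B-1 to B-3."""
    fit_conf = cscore / max_cscore                      # (B-1)
    fit_diff = sum(diff_scores.values()) / max_cscore   # (B-2)
    return -fit_conf + fit_diff                         # (B-3)

# Scores for the 16-base fragment of Figure A-1, assuming case-sensitive
# matching and lower-case transitional bases (a hypothetical reading).
f = fitness(cscore=2, max_cscore=16,
            diff_scores={"a": 1, "t": 3, "c": 4, "g": 6})
print(f)  # 0.75
```

A mutant with few unchanged bases and many transitional bases therefore scores higher, which matches the stated goal of rewarding diffusion while penalizing a rising confusion counter.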


Table A-3. Fitness scores from three generations of encrypted output.


APPENDIX B - PLAINTEXT TO CIPHERTEXT WALKTHROUGH

This is a sample of the complete program output of the DNA Hash Code system.

Copyright Harry C. Shaw Sender Application v4 for DNA Hash Code System started at: 12/11/2012 3:03:14 PM

------Section 1 Lexicographic assignments------

the 20.85 CAACTA

123 27.2829 TTTTTATTC

of 15.6 AGACC

my 13.25 CGGTT

fields 6.9512419 ACCAATAATGTGG

are 1.185 GCCCTA

very 22.51825 CTGTACCGTT

large 12.11875 ATGCCCTTTA

please 16.1251195 GAATTAGCGGTA

require 18.517219185 CCTACCTCTAACCTA

all 1.1212 GCATAT

personnel 16.51819151414512 GATACCGGAGTGTGTAAT

to 20.15 CAAG

take 20.1115 CAGCACTTA

their 20.85918 CAACTAAACC

equipment 5.17219161351420 TACCTCTAAGACGTATGCA

with 23.9208 CACAACAAC

them 20.8513 CAACTACG

for 6.1518 ACCAGCC

the 20.85 CAACTA

work 23.151811 CACAGCCACT

to 20.15 CAAG

be 2.5 TGTTA

performed 16.518615181354 GATACCACCAGCCCGTAGT

in 9.14 AATG

365777 29.3231333333 TTCGACGAGGAAGAAGAA

small 19.1311212 GGCGGCATAT

increments 9.143185135142019 AATGTCCCTACGTATGCAGG

it 9.20 AACA

will 23.91212 CACAAATAT

be 2.5 TGTTA

good 7.15154 TTAGAGGT

to 20.15 CAAG

get 7.520 TTTACA

practice 16.181320935 GACCGCTCCAAATCTA

on 15.14 AGTG

these 20.85195 CAACTAGGTA

tasks 20.1191119 CAGCGGACTGG

@ 0.99 CGGCGG
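
The numeric codes listed in Section 1 appear to follow a simple pattern: each letter is replaced by its position in the alphabet, each digit d from 1 to 9 by 26 + d, and the value of the first symbol is placed before the decimal point with the remaining values concatenated after it. The sketch below reproduces that pattern for words made of letters and non-zero digits. The rule is inferred from the sample output only and the function name is hypothetical. The handling of '0', of symbols such as '@', and the mapping from these codes to DNA bases are not covered.

```python
def lexicographic_code(word: str) -> str:
    """Return the 'first.rest' numeric code inferred from the Section 1 listing."""
    values = []
    for ch in word.lower():
        if "a" <= ch <= "z":
            values.append(ord(ch) - ord("a") + 1)   # a=1 ... z=26
        elif "1" <= ch <= "9":
            values.append(26 + int(ch))             # 1=27 ... 9=35 (inferred)
        else:
            raise ValueError(f"no inferred code for {ch!r}")
    head, rest = values[0], values[1:]
    return f"{head}." + "".join(str(v) for v in rest)

for w in ("the", "fields", "123", "365777"):
    print(w, lexicographic_code(w))
```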


------Section 2 Message Precoded 3WB ------Block 0 phrase CAACTA phraselength 6 phrase binary 100100110011100111000011000000000000 phrase TTTTTATTC phraselength 9 phrase binary 110011001100110011000011110011001001 phrase AGACC phraselength 5 phrase binary 001101100011100110010000000000000000 maxlength 9 1001001100111001110000110000000000001100110011001100110000111100110010010011011000111001100100000000000000 00 108

------Section 2 Message Precoded 3WB ------Block 3 phrase CGGTT phraselength 5 phrase binary 1001011001101100110000000000000000000000000000000000 phrase ACCAATAATGTGG phraselength 13 phrase binary 0011100110010011001111000011001111000110110001100110 phrase GCCCTA phraselength 6 phrase binary 0110100110011001110000110000000000000000000000000000 maxlength 13 1001011001101100110000000000000000000000000000000000001110011001001100111100001100111100011011000110011001 10100110011001110000110000000000000000000000000000 156

------Section 2 Message Precoded 3WB ------Block 6 phrase CTGTACCGTT phraselength 10 phrase binary 100111000110110000111001100101101100110000000000 phrase ATGCCCTTTA phraselength 10 phrase binary 001111000110100110011001110011001100001100000000 phrase GAATTAGCGGTA phraselength 12 phrase binary 011000110011110011000011011010010110011011000011 maxlength 12 1001110001101100001110011001011011001100000000000011110001101001100110011100110011000011000000000110001100 11110011000011011010010110011011000011 144

------Section 2 Message Precoded 3WB ------Block 9 phrase CCTACCTCTAACCTA phraselength 15 phrase binary 100110011100001110011001110010011100001100111001100111000011000000000000 phrase GCATAT phraselength 6 phrase binary 011010010011110000111100000000000000000000000000000000000000000000000000 phrase GATACCGGAGTGTGTAAT phraselength 18 phrase binary 011000111100001110011001011001100011011011000110110001101100001100111100 maxlength 18 1001100111000011100110011100100111000011001110011001110000110000000000000110100100111100001111000000000000 0000000000000000000000000000000000000001100011110000111001100101100110001101101100011011000110110000110011 1100 216

------Section 2 Message Precoded 3WB ------Block 12 phrase CAAG phraselength 4 phrase binary 1001001100110110000000000000000000000000 phrase CAGCACTTA phraselength 9 phrase binary 1001001101101001001110011100110000110000 phrase CAACTAAACC phraselength 10 phrase binary 1001001100111001110000110011001110011001 maxlength 10 1001001100110110000000000000000000000000100100110110100100111001110011000011000010010011001110011100001100 11001110011001 120

------Section 2 Message Precoded 3WB ------Block 15 phrase TACCTCTAAGACGTATGCA phraselength 19 phrase binary 1100001110011001110010011100001100110110001110010110110000111100011010010011 phrase CACAACAAC phraselength 9 phrase binary 1001001110010011001110010011001110010000000000000000000000000000000000000000 phrase CAACTACG phraselength 8 phrase binary 1001001100111001110000111001011000000000000000000000000000000000000000000000 maxlength 19 1100001110011001110010011100001100110110001110010110110000111100011010010011100100111001001100111001001100 1110010000000000000000000000000000000000000000100100110011100111000011100101100000000000000000000000000000 0000000000000000 228

------Section 2 Message Precoded 3WB ------Block 18 phrase ACCAGCC phraselength 7 phrase binary 0011100110010011011010011001000000000000 phrase CAACTA phraselength 6 phrase binary 1001001100111001110000110000000000000000 phrase CACAGCCACT phraselength 10 phrase binary 1001001110010011011010011001001110011100 maxlength 10 0011100110010011011010011001000000000000100100110011100111000011000000000000000010010011100100110110100110 01001110011100 120

------Section 2 Message Precoded 3WB ------Block 21 phrase CAAG phraselength 4 phrase binary 1001001100110110000000000000000000000000000000000000000000000000000000000000 phrase TGTTA phraselength 5 phrase binary 1100011011001100001100000000000000000000000000000000000000000000000000000000 phrase GATACCACCAGCCCGTAGT phraselength 19 phrase binary 0110001111000011100110010011100110010011011010011001100101101100001101101100 maxlength 19 1001001100110110000000000000000000000000000000000000000000000000000000000000110001101100110000110000000000 0000000000000000000000000000000000000000000000011000111100001110011001001110011001001101101001100110010110 1100001101101100 228

------Section 2 Message Precoded 3WB ------Block 24 phrase AATG phraselength 4 phrase binary 001100111100011000000000000000000000000000000000000000000000000000000000 phrase TTCGACGAGGAAGAAGAA phraselength 18 phrase binary 110011001001011000111001011000110110011000110011011000110011011000110011 phrase GGCGGCATAT phraselength 10 phrase binary 011001101001011001101001001111000011110000000000000000000000000000000000 maxlength 18 0011001111000110000000000000000000000000000000000000000000000000000000001100110010010110001110010110001101 1001100011001101100011001101100011001101100110100101100110100100111100001111000000000000000000000000000000 0000 216

------Section 2 Message Precoded 3WB ------Block 27 phrase AATGTCCCTACGTATGCAGG phraselength 20 phrase binary 00110011110001101100100110011001110000111001011011000011110001101001001101100110 phrase AACA phraselength 4 phrase binary 00110011100100110000000000000000000000000000000000000000000000000000000000000000 phrase CACAAATAT phraselength 9 phrase binary 10010011100100110011001111000011110000000000000000000000000000000000000000000000 maxlength 20 0011001111000110110010011001100111000011100101101100001111000110100100110110011000110011100100110000000000 0000000000000000000000000000000000000000000000000000001001001110010011001100111100001111000000000000000000 0000000000000000000000000000 240

------Section 2 Message Precoded 3WB ------Block 30 phrase TGTTA phraselength 5 phrase binary 11000110110011000011000000000000 phrase TTAGAGGT phraselength 8 phrase binary 11001100001101100011011001101100 phrase CAAG phraselength 4 phrase binary 10010011001101100000000000000000 maxlength 8 110001101100110000110000000000001100110000110110001101100110110010010011001101100000000000000000 96


------Section 2 Message Precoded 3WB ------Block 33 phrase TTTACA phraselength 6 phrase binary 1100110011000011100100110000000000000000000000000000000000000000 phrase GACCGCTCCAAATCTA phraselength 16 phrase binary 0110001110011001011010011100100110010011001100111100100111000011 phrase AGTG phraselength 4 phrase binary 0011011011000110000000000000000000000000000000000000000000000000 maxlength 16 1100110011000011100100110000000000000000000000000000000000000000011000111001100101101001110010011001001100 11001111001001110000110011011011000110000000000000000000000000000000000000000000000000 192

------Section 2 Message Precoded 3WB ------Block 36 phrase CAACTAGGTA phraselength 10 phrase binary 10010011001110011100001101100110110000110000 phrase CAGCGGACTGG phraselength 11 phrase binary 10010011011010010110011000111001110001100110 phrase CGGCGG phraselength 6 phrase binary 10010110011010010110011000000000000000000000 maxlength 11 1001001100111001110000110110011011000011000010010011011010010110011000111001110001100110100101100110100101 10011000000000000000000000 132
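
The Section 2 output is consistent with a fixed 4-bit code per base, with each phrase zero-padded to the length of the longest phrase in its three-word block. The sketch below reproduces that layout. The base-to-bit table is read off the 'phrase binary' lines above, and the function names are hypothetical rather than taken from the program.

```python
# 4-bit codes per base, read off the 'phrase binary' lines in Section 2.
BASE_BITS = {"A": "0011", "C": "1001", "G": "0110", "T": "1100"}

def phrase_bits(phrase: str, maxlength: int) -> str:
    """Encode a DNA phrase and zero-pad it to maxlength bases (4 bits each)."""
    bits = "".join(BASE_BITS[b] for b in phrase)
    return bits.ljust(4 * maxlength, "0")

def precode_block(phrases: list) -> str:
    """Concatenate the padded encodings of the phrases in one block."""
    maxlength = max(len(p) for p in phrases)
    return "".join(phrase_bits(p, maxlength) for p in phrases)

block0 = precode_block(["CAACTA", "TTTTTATTC", "AGACC"])
print(len(block0))  # 108
```

The block length is always 3 × maxlength × 4 bits, which agrees with the lengths printed at the end of each block (for example 108 bits for Block 0 with maxlength 9).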

------Section 3 Develop Xprime------Block 0 Determinant = 6546.479529378789 0.0430882896552364865939170324 = x1 0.0377481683184752124620295067 = x2 0.0378099444462419310781087788 = x3 43088289655236500 = x1 37748168318475200 = x2 37809944446241900 = x3

01000011000010001000001010001001011001010101001000110110010100000000 = x1 00110111011101001000000101101000001100011000010001110101001000000000 = x2 00110111100000001001100101000100010001000110001001000001100100000000 = x2 XPrePrime 0100001100001000100000101000100101100101010100100011011001010000000000110111011101001000000101101000001100 01100001000111010100100000000000110111100000001001100101000100010001000110001001000001100100000000 204

Block 3 Determinant = 2336.319938761320870256059 0.2073175268608280735725832931 = x1 0.2130549804109235343323833727 = x2 0.4051729331450674512685310714 = x3 207317526860828000 = x1 213054980410924000 = x2 405172933145067000 = x3

001000000111001100010111010100100110100001100000100000101000000000000000 = x1 001000010011000001010100100110000000010000010000100100100100000000000000 = x2 010000000101000101110010100100110011000101000101000001100111000000000000 = x2 XPrePrime 0010000001110011000101110101001001101000011000001000001010000000000000000010000100110000010101001001100000 0001000001000010010010010000000000000001000000010100010111001010010011001100010100010100000110011100000000 0000 216

Block 6 Determinant = 4189.720427549122081489875 0.0206633385058710991887444999 = x1 0.1347996645779737258322875908 = x2 0.1537097436781750526088747 = x3 20663338505871100 = x1 134799664577974000 = x2 153709743678175000 = x3

00100000011001100011001100111000010100000101100001110001000100000000 = x1 000100110100011110011001011001100100010101110111100101110100000000000000 = x2 000101010011011100001001011101000011011001111000000101110101000000000000 = x2 XPrePrime 0010000001100110001100110011100001010000010110000111000100010000000000010011010001111001100101100110010001 0101110111100101110100000000000000000101010011011100001001011101000011011001111000000101110101000000000000 212


Block 9 Determinant = 9828.902409910312643107642767 0.2374710717121952139952492761 = x1 0.3567135811214999793886447775 = x2 0.3079956746045875373004151255 = x3 237471071712195000 = x1 356713581121500000 = x2 307995674604588000 = x3

001000110111010001110001000001110001011100010010000110010101000000000000 = x1 001101010110011100010011010110000001000100100001010100000000000000000000 = x2 001100000111100110010101011001110100011000000100010110001000000000000000 = x2 XPrePrime 0010001101110100011100010000011100010111000100100001100101010000000000000011010101100111000100110101100000 0100010010000101010000000000000000000000110000011110011001010101100111010001100000010001011000100000000000 0000 216

Block 12 Determinant = 32.499207301615632 0.0018103387325502543942743917 = x1 0.0188975122209250521133774954 = x2 0.0127665081920512482313941187 = x3 1810338732550250 = x1 18897512220925100 = x2 12766508192051200 = x3

0001100000010000001100111000011100110010010101010000001001010000 = x1 00011000100010010111010100010010001000100000100100100101000100000000 = x2 00010010011101100110010100001000000110010010000001010001001000000000 = x2 XPrePrime 0001100000010000001100111000011100110010010101010000001001010000000110001000100101110101000100100010001000 0010010010010100010000000000010010011101100110010100001000000110010010000001010001001000000000 200

Block 15 Determinant = 15152.263668121366742482564533 0.3899159475985634120302310292 = x1 0.3201925585680932704185500877 = x2 0.2798964606901765095172244421 = x3 389915947598563000 = x1 320192558568093000 = x2 279896460690176000 = x3

001110001001100100010101100101000111010110011000010101100011000000000000 = x1 001100100000000110010010010101011000010101101000000010010011000000000000 = x2 001001111001100010010110010001100000011010010000000101110110000000000000 = x2 XPrePrime 0011100010011001000101011001010001110101100110000101011000110000000000000011001000000001100100100101010110 0001010110100000001001001100000000000000100111100110001001011001000110000001101001000000010111011000000000 0000 216

Block 18 Determinant = 12797.591373600905024731 0.4424017810608360340781669133 = x1 0.3953081334941082516907014529 = x2 0.3219808764466154069519962261 = x3 442401781060836000 = x1 395308133494108000 = x2 321980876446615000 = x3

010001000010010000000001011110000001000001100000100000110110000000000000 = x1 001110010101001100001000000100110011010010010100000100001000000000000000 = x2 001100100001100110000000100001110110010001000110011000010101000000000000 = x2 XPrePrime 0100010000100100000000010111100000010000011000001000001101100000000000000011100101010011000010000001001100 1101001001010000010000100000000000000000110010000110011000000010000111011001000100011001100001010100000000 0000 216

Block 21 Determinant = 10207.948764525735231919266525 0.5366450561799783147460405566 = x1 0.662174803975748341575911294 = x2 0.6210013456397797904967134036 = x3 536645056179978000 = x1 662174803975748000 = x2 621001345639780000 = x3

010100110110011001000101000001010110000101111001100101111000000000000000 = x1 011001100010000101110100100000000011100101110101011101001000000000000000 = x2 011000100001000000000001001101000101011000111001011110000000000000000000 = x2 XPrePrime 0101001101100110010001010000010101100001011110011001011110000000000000000110011000100001011101001000000000 1110010111010101110100100000000000000001100010000100000000000100110100010101100011100101111000000000000000 0000 216

Block 24 Determinant = 17596.728968786310143912285003 0.577958976286624204331593782 = x1 0.4669245383313946806964409074 = x2 0.4676588825304634258607291364 = x3 577958976286624000 = x1 466924538331395000 = x2 467658882530463000 = x3

010101110111100101011000100101110110001010000110011000100100000000000000 = x1 010001100110100100100100010100111000001100110001001110010101000000000000 = x2 010001100111011001011000100010000010010100110000010001100011000000000000 = x2 XPrePrime 0101011101111001010110001001011101100010100001100110001001000000000000000100011001101001001001000101001110 0000110011000100111001010100000000000001000110011101100101100010001000001001010011000001000110001100000000 0000 216

Block 27 Determinant = 9181.468058975530824257043104 0.7201336554227998489920423888 = x1 0.7908785170556348431513525391 = x2 0.6485660146565380020660218266 = x3 720133655422800000 = x1 790878517055635000 = x2 648566014656538000 = x3

011100100000000100110011011001010101010000100010100000000000000000000000 = x1 011110010000100001111000010100010111000001010101011000110101000000000000 = x2 011001001000010101100110000000010100011001010110010100111000000000000000 = x2 XPrePrime 0111001000000001001100110110010101010100001000101000000000000000000000000111100100001000011110000101000101 1100000101010101100011010100000000000001100100100001010110011000000001010001100101011001010011100000000000 0000 216

Block 30 Determinant = 7481.964004324472264 1.1241292495481052111865415397 = x1 1.1583448838384394860473824484 = x2 1.0327109375538293053218587632 = x3 1124129249548110000 = x1 1158344883838440000 = x2 1032710937553830000 = x3

0001000100100100000100101001001001001001010101001000000100010000000000000000 = x1 0001000101011000001101000100100010000011100000111000010001000000000000000000 = x2 0001000000110010011100010000100100110111010101010011100000110000000000000000 = x2 XPrePrime 0001000100100100000100101001001001001001010101001000000100010000000000000000000100010101100000110100010010 0010000011100000111000010001000000000000000000000100000011001001110001000010010011011101010101001110000011 0000000000000000 228

Block 33 Determinant = 2605.616212484837429776248400 14.204286549190967124366696802 = x1 12.210102207405804542771712994 = x2 10.857986056276251020133009821 = x3 14204286549191000000 = x1 12210102207405800000 = x2 10857986056276200000 = x3

00010100001000000100001010000110010101001001000110010001000000000000000000000000 = x1 00010010001000010000000100000010001000000111010000000101100000000000000000000000 = x2 00010000100001010111100110000110000001010110001001110110001000000000000000000000 = x2 XPrePrime 0001010000100000010000101000011001010100100100011001000100000000000000000000000000010010001000010000000100 0000100010000001110100000001011000000000000000000000000001000010000101011110011000011000000101011000100111 0110001000000000000000000000 240

Block 36 Determinant = 15965.282552391286940938159 32.883273659155333197409599008 = x1 31.123882855571835666260476477 = x2 34.852926925834308822945751227 = x3 32883273659155300000 = x1 31123882855571800000 = x2 34852926925834300000 = x3

00110010100010000011001001110011011001011001000101010101001100000000000000000000 = x1 00110001000100100011100010000010100001010101010101110001100000000000000000000000 = x2 00110100100001010010100100100110100100100101100000110100001100000000000000000000 = x2 XPrePrime 0011001010001000001100100111001101100101100100010101010100110000000000000000000000110001000100100011100010 0000101000010101010101011100011000000000000000000000000011010010000101001010010010011010010010010110000011 0100001100000000000000000000 240
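
In the Section 3 output, each solution value appears to be rounded to 15 significant digits, scaled by 10^18 to an integer, and then written as binary-coded decimal with 4 bits per decimal digit. The sketch below reproduces those two steps for the first value of Block 0. The rounding and scaling rule is inferred from the printed values, not stated in the text, and the function names are hypothetical.

```python
from decimal import Context

def scale_to_integer(x: str) -> int:
    """Round x to 15 significant digits and scale by 10**18 (inferred rule)."""
    rounded = Context(prec=15).create_decimal(x)
    return int(rounded.scaleb(18))

def to_bcd(n: int) -> str:
    """Encode each decimal digit of n as 4 bits (binary-coded decimal)."""
    return "".join(format(int(d), "04b") for d in str(n))

x1 = scale_to_integer("0.0430882896552364865939170324")
print(x1)          # 43088289655236500
print(to_bcd(x1))
```

Reading the printed bit strings four bits at a time recovers the decimal digits of the scaled integers, which is how the labels on the binary lines can be checked against the integer lines above them.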

------Section 4 expand every 2 bits with 2 additional bits to maintain bp coding in XPrePrime----- NewXprime(0) 0110001100111100001100111010001110100011001110101010001110100110011010100110011001100110001110100011110001 1010100110011000110011001100110011110001101100011011000110001110100011001101100110101010100011001111000011 0110101000110110001101101100011001100011101000110011001100110011110001101100101000110011001110100110101001 100110001101100011011000110110001101101010001110100110001100110110101001100011001100110011 408 NewXprime(3) 0011101000110011011011000011110000110110011011000110011000111010011010101010001101101010001100111010001100 1110101010001100110011001100110011001100111010001101100011110000110011011001100110001110100110101000110011 0011011000110011011000110011101001100011101001100011001100110011001100110011011000110011001101100110001101 1001101100001110101010011000111100001111000011011001100011011001100011001101101010011011000011001100110011 00110011 432 NewXprime(6) 0011101000110011011010100110101000111100001111000011110010100011011001100011001101100110101000110110110000 1101100011011000110011001100110011011000111100011000110110110010100110101001100110101001101010011000110110 0110011011000110110010100110011011000110001100110011001100110011001100110110011001100011110001101100001100 1110100110011011000110001100111100011010100110110010100011001101100110110001100110001100110011001100110011 424 NewXprime(9) 0011101000111100011011000110001101101100001101100011001101101100001101100110110000110110001110100011011010 1001100110011000110011001100110011001100111100011001100110101001101100001101100011110001100110101000110011 0110001101100011101000110110011001100011001100110011001100110011001100110011001111000011001101101100101001 1010100110011001100110101001101100011000110110101000110011011000110110011010100011101000110011001100110011 00110011 432 NewXprime(12) 
0011011010100011001101100011001100111100001111001010001101101100001111000011101001100110011001100011001100 1110100110011000110011001101101010001110100011101001100110110001100110001101100011101000111010001110100011 0011101001100011101001100110001101100011001100110011001101100011101001101100011010100110101001100110001100 1110100011001101101010011000111010001100110110011000110110001110100011001100110011 400 NewXprime(15) 0011110010100011101001101010011000110110011001101010011001100011011011000110011010100110101000110110011001 1010100011110000110011001100110011001100111100001110100011001100110110101001100011101001100110011001101010 0011011001100110101010100011001100111010011000111100001100110011001100110011001110100110110010100110101000 1110100110011010100110001101101010001100110110101010100110001100110011011001101100011010100011001100110011 00110011 432 NewXprime(18) 0110001101100011001110100110001100110011001101100110110010100011001101100011001101101010001100111010001100 1111000110101000110011001100110011001100111100101001100110011000111100001100111010001100110110001111000011 1100011000111010011001100011001101100011001110100011001100110011001100110011001111000011101000110110101001 1010100011001100111010001101101100011010100110001101100011011010100110101000110110011001100011001100110011 00110011 432 NewXprime(21) 0110011000111100011010100110101001100011011001100011001101100110011010100011011001101100101001101010011001 1011001010001100110011001100110011001101101010011010100011101000110110011011000110001110100011001100110011 1100101001100110110001100110011011000110001110100011001100110011001100110011011010100011101000110110001100 1100110011001101100011110001100011011001100110101000111100101001100110110010100011001100110011001100110011 00110011 432 NewXprime(24) 0110011001101100011011001010011001100110101000111010011001101100011010100011101010100011011010100110101000 
1110100110001100110011001100110011001101100011011010100110101010100110001110100110001101100110001111001010 0011001111000011110000110110001111001010011001100110001100110011001100110011011000110110101001101100011010 1001100110101000111010001110100011001110100110011000111100001100110110001101101010001111000011001100110011 00110011 432

NewXprime(27) 0110110000111010001100110011011000111100001111000110101001100110011001100110001100111010001110101010001100 1100110011001100110011001100110011001101101100101001100011001110100011011011001010001101100110001101100110 1100001100110110011001100110011010100011110001100110001100110011001100110011011010100110001110100011011001 1001101010011010100011001100110110011000110110101001100110011010100110011000111100101000110011001100110011 00110011 432 NewXprime(30) 0011011000110110001110100110001100110110001110101010011000111010011000111010011001100110011000111010001100 1101100011011000110011001100110011001100110011001101100011011001100110101000110011110001100011011000111010 0011101000110011110010100011001111001010001101100011011000110011001100110011001100110011001100110110001100 1100111100001110100110110000110110001100111010011000111100011011000110011001100110001111001010001100111100 00110011001100110011001100110011 456 NewXprime(33) 0011011001100011001110100011001101100011001110101010001101101010011001100110001110100110001101101010011000 1101100011001100110011001100110011001100110011001100110011011000111010001110100011011000110011001101100011 0011001110100011101000110011011011000110001100110011011001101010001100110011001100110011001100110011001100 1100110110001100111010001101100110011011001010011010100011011010100011001101100110011010100011101001101100 01101010001110100011001100110011001100110011001100110011 480 NewXprime(36) 0011110000111010101000111010001100111100001110100110110000111100011010100110011010100110001101100110011001 1001100011110000110011001100110011001100110011001100110011110000110110001101100011101000111100101000111010 0011001110101010001101100110011001100110011001101100001101101010001100110011001100110011001100110011001100 1100111100011000111010001101100110001110101010011000111010011010101010011000111010011001101010001100111100 01100011001111000011001100110011001100110011001100110011 480
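
The expansion step named in the Section 4 header doubles the length of each XPrePrime string by turning every 2-bit pair into 4 bits. The table below is read off the sample output by comparing XPrePrime with NewXprime for Blocks 0 and 3. It is an inference rather than a specification, and the function name is hypothetical.

```python
# Expansion table read off the sample output: each 2-bit pair of XPrePrime
# becomes 4 bits of NewXprime. This table is an inference, not a specification.
EXPAND = {"00": "0011", "01": "0110", "10": "1010", "11": "1100"}

def expand_xpreprime(bits: str) -> str:
    """Replace every 2-bit pair with its 4-bit expansion, doubling the length."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    return "".join(EXPAND[bits[i:i + 2]] for i in range(0, len(bits), 2))

head = "010000110000100010000010"   # first 24 bits of XPrePrime, Block 0
print(expand_xpreprime(head))
```

Applied to a whole block, this turns the 204-bit XPrePrime of Block 0 into a 408-bit string, the length printed after NewXprime(0).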

------Section 5 pad out Lsense to same length as Xprime------Lsense(0)XP 1001001100111001110000110000000000001100110011001100110000111100110010010011011000111001100100000000000000 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 408

Lsense(3)XP 1001011001101100110000000000000000000000000000000000001110011001001100111100001100111100011011000110011001 1010011001100111000011000000000000000000000000000011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(6)XP 1001110001101100001110011001011011001100000000000011110001101001100110011100110011000011000000000110001100 1111001100001101101001011001101100001111001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 424

Lsense(9)XP 1001100111000011100110011100100111000011001110011001110000110000000000000110100100111100001111000000000000 0000000000000000000000000000000000000001100011110000111001100101100110001101101100011011000110110000110011 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(12)XP 1001001100110110000000000000000000000000100100110110100100111001110011000011000010010011001110011100001100 1100111001100111001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100 400


Lsense(15)XP 1100001110011001110010011100001100110110001110010110110000111100011010010011100100111001001100111001001100 1110010000000000000000000000000000000000000000100100110011100111000011100101100000000000000000000000000000 0000000000000000110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(18)XP 0011100110010011011010011001000000000000100100110011100111000011000000000000000010010011100100110110100110 0100111001110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(21)XP 1001001100110110000000000000000000000000000000000000000000000000000000000000110001101100110000110000000000 0000000000000000000000000000000000000000000000011000111100001110011001001110011001001101101001100110010110 1100001101101100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(24)XP 0011001111000110000000000000000000000000000000000000000000000000000000001100110010010110001110010110001101 1001100011001101100011001101100011001101100110100101100110100100111100001111000000000000000000000000000000 0000110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(27)XP 0011001111000110110010011001100111000011100101101100001111000110100100110110011000110011100100110000000000 0000000000000000000000000000000000000000000000000000001001001110010011001100111100001111000000000000000000 0000000000000000000000000000110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100 432

Lsense(30)XP 110001101100110000110000000000001100110000110110001101100110110010010011001101100000000000000000 96

Lsense(33)XP 1100110011000011100100110000000000000000000000000000000000000000011000111001100101101001110010011001001100 1100111100100111000011001101101100011000000000000000000000000000000000000000000000000011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100110011001100110011001100110011001100110011001100 480

Lsense(36)XP 1001001100111001110000110110011011000011000010010011011010010110011000111001110001100110100101100110100101 1001100000000000000000000011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 11001100110011001100110011001100110011001100110011001100 480

------Section 6 Lprime xor with Xprime to produce Presense message----
Mpresense(0) 1111000000000101111100001010001110101111111101100110111110011010101000110101000001011111101010100011110001

1001101010101011111111111111111111000010100000101000001010111101101111111110101010011001101111111100001111 1010011011111010111110100000101010101111011011111111111111111111000010100000011011111111111101101010011010 1010101111101011111010111110101111101001101111011010101111111110100110101011111111111111110000000000000000 00000000000000000000000000000000000000000000000000000000 480 Mpresense(3) 1010110001011111101011000011110000110110011011000110010110100011010110010110000001010110010111111100010101 0100110011101011110000001100110011001100111010001110101111000011111111101010101010111101101010011011111111 1111101011111111101011111111011010101111011010101111111111111111111111111111101011111111111110101010111110 1010100000111101100110101011110000111100001111101010101111101010101111111110100110101000001111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(6) 1010011001011111010100111111110011110000001111000000000011001010111111111111111110100101101000110000111100 0010101111010101011010010101011111010111110000101011111010000001101010011010101010011010100110101011111010 1010101000001010000001101010101000001010111111111111111111111111111111111010101010101111000010100000111111 1101101010101000001010111111110000101001101010000001101111111110101010000010101010111111111111111111111111 00000000000000000000000000000000000000000000000000000000 480 Mpresense(9) 1010001111111111111101011010101010101111000011111010111101011100001101100000010100001010000001100011011010 1001100110011000110011001100110011001101011111101001011111001100001010000000001111101010100000011000000000 1010111110101111011011111010101010101111111111111111111111111111111111111111111100001111111110100000011010 1001101010101010101010011010100000101011111010011011111111101011111010101001101111011011111111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(12) 
1010010110010101001101100011001100111100101011111100101001010101111100000000101011110101010111111111000000 0010011111111111111111111110100110111101101111011010101010000010101010111110101111011011110110111101101111 1111011010101111011010101010111110101111111111111111111110101111011010100000101001101010011010101010111111 1101101111111110100110101011110110111111111010101011111010111101101111111111111111000000000000000000000000 00000000000000000000000000000000000000000000000000000000 480 Mpresense(15) 1111111100111010011011110110010100000000010111111100101001011111000001010101111110011111100100001111010101 0100110011110000110011001100110011001100111100101010010000101011110101001100000011101001100110011001101010 0011011001100110011001101111111111110110101011110000111111111111111111111111111101101010000001101010011011 1101101010101001101010111110100110111111111010011001101010111111111111101010100000101001101111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(18) 0101101011110000010100111111001100110011101001010101010101100000001101100011001111111001101000001100101010 1011111111011011111111111111111111111111110000011010101010101011110000111111110110111111111010111100001111 0000101011110110101010101111111110101111111101101111111111111111111111111111111100001111011011111010011010 1001101111111111110110111110100000101001101010111110101111101001101010011011111010101010101111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(21) 1111010100001010011010100110101001100011011001100011001101100110011010100011101000000000011001011010011001 1011001010001100110011001100110011001101101010000010011111100110101111010101011111000011001010101010100101 0000100100001010000010101010101000001010111101101111111111111111111111111111101001101111011011111010111111 1111111111111110101111000010101111101010101010011011110000011010101010000001101111111111111111111111111111 11111111000000000000000000000000000000000000000000000000 480 
Mpresense(24) 0101010110101010011011001010011001100110101000111010011001101100011010101111011000110101010100110000100101 0111000101000001010000000001010000000000000101111111000000001110011010000001100110001101100110001111001010 0011111100001111000011111010111100000110101010101010111111111111111111111111101011111010011010100000101001 1010101010011011110110111101101111111101101010101011110000111111111010111110100110111100001111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(27) 0101111111111100111110101010111111111111101010101010100110100000111101010000010100001001101010011010001100 1100110011001100110011001100110011001101101100101001101010000000110000010111110110000010100110001101100110 1100001100110110011001100110101001101111000010101010111111111111111111111111101001101010111101101111101010 1010100110101001101111111111111010101011111010011010101010101001101010101011110000011011111111111111111111 11111111000000000000000000000000000000000000000000000000 480 Mpresense(30) 1111000011111010000010100110001111111010000011001001000001010110111100001001000001100110011000111111111111 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111000000000000000000000000 480 339

Mpresense(33) 1111101010100000101010010011001101100011001110101010001101101010000001011111101011001111111111110011010100 0001011111101011110000000001011111010100110011001100110011011000111010001110100011011011111111111110101111 1111111101101111011011111111101000001010111111111111101010100110111111111111111111111111111111111111111111 1111111010111111110110111110101010101000000110101001101111101001101111111110101010101001101111011010100000 10100110111101101111111111111111111111111111111111111111 480 Mpresense(36) 1010111100000011011000001100010111111111001100110101101010101010000010011111101011000000101000000000111100 0000000011110000110011001111111111111111111111111111111111000011111010111110101111011011110000011011110110 1111111101100110111110101010101010101010101010100000111110100110111111111111111111111111111111111111111111 1111110000101011110110111110101010111101100110101011110110101001100110101011110110101010100110111111110000 10101111111100001111111111111111111111111111111111111111 480
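The Section 6 header states that each Mpresense block is produced by XORing Lprime with Xprime. A minimal sketch of that operation on bit strings follows. The function name `xor_bits` and the toy operands are illustrative assumptions, not values or identifiers from the dissertation, and any padding or alignment applied before the XOR is not modeled here.

```python
def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length strings of '0'/'1' characters.

    Hypothetical helper for illustration; the operands would be an
    Lprime block and the matching Xprime block of the same length.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    # A result bit is 1 exactly when the two input bits differ.
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy operands only, not data from the listing above.
    print(xor_bits("1100", "1010"))
```

Because XOR is its own inverse, applying the same helper to the result and one operand recovers the other operand.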

------expand Mpresense message 2 bits for every 2 bits and create Msense and Mantisense--- Msense(0) 1100110000110011001100110110011011001100001100111001100100111100100110011100110011001100011010010110100111 0011001001011010011001100110010011110001100110001100110110011011001100100110011001100100111100110000110110 1001011010011001100110011001110011001100110011001100110011001100110000110011100110010011001110011001001100 1110011001110011000110100111001100110011001001100110011001011010010110100111001100110011000011001111001100 1001100101101001110011001001100111001100100110010011001110011001100110011100110001101001110011001100110011 0011001100110011001100001100111001100100110011011010011100110011001100110011000110100110011001011010011001 1001100110011100110010011001110011001001100111001100100110011100110010011001011010011100110001101001100110 0111001100110011001001100101101001100110011100110011001100110011001100110000110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(3) 1001100111000011011001101100110010011001110000110011110011000011001111000110100101101001110000110110100101 1001101001100100111100011001101001011001101001001100110110011001101001011001101100110011000011011001100110 0110001111000011110010011001110011000011001100111100001111000011110000111100001111001001100100111100100110 0111001100001100111100110011001100100110011001100110011001110011000110100110011001011010011100110011001100 1100110010011001110011001100110010011001110011001100110001101001100110011100110001101001100110011100110011 0011001100110011001100110011001100110011001100100110011100110011001100110011001001100110011001110011001001 1001100110010011001111001100011010010110100110011001110011000011001111001100001100111100110010011001100110 0111001100100110011001100111001100110011001001100101101001100110010011001111001100110011001100110011001100 
1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(6) 1001100101101001011001101100110001100110001111001100110011000011110011000011001100111100110000110011001100 1100111100001110011001110011001100110011001100110011001001100101100110100110010011110000110011110011000011 0011100110011100110001100110011001101001100101100110011001101100110001100110110011000011001110011001110011 0010011001001100110110100110011001011010011001100110011001011010011001100101101001100110011100110010011001 1001100110011001001100111001100100110011011010011001100110011001001100111001100111001100110011001100110011 0011001100110011001100110011001100110010011001100110011001100111001100001100111001100100110011110011001100 1100011010011001100110011001001100111001100111001100110011000011001110011001011010011001100100110011011010 0111001100110011001001100110011001001100111001100110011001110011001100110011001100110011001100110011001100 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(9) 1001100100111100110011001100110011001100011001101001100110011001100110011100110000110011110011001001100111 0011000110011011000011001111000110100100110011011001100011001110011001001100110110100100111100011010011001 1001011010010110100101101001001111000011110000111100001111000011110000111100011001101100110010011001011001 1011001100001111000011001110011001001100110011001111001100100110011001100100110011011010010011001100110011 1001100111001100100110011100110001101001110011001001100110011001100110011100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011000011001111001100110011001001100100110011011010011001 1001011010011001100110011001100110011001100101101001100110010011001110011001110011001001100101101001110011 0011001100100110011100110010011001100110010110100111001100011010011100110011001100110011001100110011001100 
1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(12) 1001100101100110100101100110011000111100011010010011110000111100001111001100001110011001110011001100001110 0110010110011001100110110011000011001100110011100110011100110001100110011001101100110011001100001100110011 0011100101101100110011001100110011001100110011001100100110010110100111001100011010011100110001101001100110 0110011001001100111001100110011001110011001001100111001100011010011100110001101001110011000110100111001100

1100110001101001100110011100110001101001100110011001100111001100100110011100110011001100110011001100110011 0011001001100111001100011010011001100100110011100110010110100110011001011010011001100110011001110011001100 1100011010011100110011001100100110010110100110011001110011000110100111001100110011001001100110011001110011 0010011001110011000110100111001100110011001100110011001100001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(15) 1100110011001100001111001001100101101001110011000110100101100110001100110011001101100110110011001100001110 0110010110011011001100001100110110011001100110110011001001011011001100100101100011001111001100011001100110 0110001111000011110011000011001111000011110000111100001111000011110000111100001111001100001110011001100101 1000110011100110011100110001100110001111000011001100111100100110010110100101101001011010010110100110011001 0011110001101001011010010110100101101001011010011100110011001100110011000110100110011001110011000011001111 0011001100110011001100110011001100110011001100110011000110100110011001001100110110100110011001011010011100 1100011010011001100110011001011010011001100111001100100110010110100111001100110011001001100101101001011010 0110011001110011001100110011001100100110011001100100110011100110010110100111001100110011001100110011001100 1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(18) 0110011010011001110011000011001101100110001111001100110000111100001111000011110010011001011001100110011001 1001100110100100110011001111000110100100111100001111001100110010010110100110010011001111000011100110011001 1001110011001100110001101001110011001100110011001100110011001100110011001100110011000011001101101001100110 0110011001100110011100110000110011110011001100110001101001110011001100110010011001110011000011001111001100 
0011001110011001110011000110100110011001100110011100110011001100100110011100110011001100011010011100110011 0011001100110011001100110011001100110011001100110011000011001111001100011010011100110010011001011010011001 1001011010011100110011001100110011000110100111001100100110010011001110011001011010011001100111001100100110 0111001100100110010110100110011001011010011100110010011001100110011001100111001100110011001100110011001100 1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(21) 1100110001100110001100111001100101101001100110010110100110011001011010010011110001101001011010010011110000 1111000110100101101001011010011001100100111100100110010011001100110011011010010110011010011001011010010110 1001110000111001100100111100001111000011110000111100001111000011110000111100011010011001100100110011100101 1011001100100101101001100111001100011001100110011011001100001100111100001110011001100110011001100101100110 0011001110010110001100111001100100110011100110011001100110011001001100111001100111001100011010011100110011 0011001100110011001100110011001100110011001100100110010110100111001100011010011100110010011001110011001100 1100110011001100110011001100100110011100110000110011100110011100110010011001100110011001100101101001110011 0000110011011010011001100110011001001100110110100111001100110011001100110011001100110011001100110011001100 1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(24) 0110011001100110100110011001100101101001110000111001100101101001011010010110100110011001001111001001100101 1010010110100111000011011010011001100111001100011010010011110001100110011001100011110000110011100101100110 0110110000110110011000110011011001100011001100110011011001100011001100110011001100110110011011001100110000 1100110011001111001001011010011001001100110110100101101001001111000110100101101001001111001100001110011001 
0011110011001100001100111100110000110011110011001001100111001100001100110110100110011001100110011001100111 0011001100110011001100110011001100110011001100100110011100110010011001011010011001100100110011100110010110 1001100110011001100101101001110011000110100111001100011010011100110011001100011010011001100110011001110011 0000110011110011001100110010011001110011001001100101101001110011000011001111001100110011001100110011001100 1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(27) 0110011011001100110011001100001111001100100110011001100111001100110011001100110010011001100110011001100110 0101101001100100110011110011000110011000110011011001100011001110010110100110011001011010011001001111000011 1100001111000011110000111100001111000011110000111100001111000011110000111100011010011100001110011001011010 0110011001001100110011110000110011011001101100110001101001001100111001100101101001001111000110100101101001 1100001100111100001111000110100101101001011010010110100110011001011010011100110000110011100110011001100111 0011001100110011001100110011001100110011001100100110010110100110011001110011000110100111001100100110011001 1001100110010110100110011001011010011100110011001100110011001001100110011001110011001001100101101001100110 0110011001100110010110100110011001100110011100110000110011011010011100110011001100110011001100110011001100 1100110011001100001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 110011 960 Msense(30) 1100110000110011110011001001100100110011100110010110100100111100110011001001100100110011110000111001011000 1100110110011001101001110011000011001110010110001100110110100101101001011010010011110011001100110011001100 341

1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100001100110011001100110011001100110011001100 110011 960 Msense(33) 1100110010011001100110010011001110011001100101100011110000111100011010010011110000111100100110011001100100 1111000110100110011001001100110110011011001100100110011100001111001100110011001100110000111100011001100011 0011011001101100110010011001110011000011001100110011011001101100110001100110001111000011110000111100001111 0000111100011010010011110010011001001111001001100100111100011010011100110011001100110011001001100111001100 1100110011001100011010011100110001101001110011001100110010011001001100111001100111001100110011001100110010 0110011001100101101001110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001001100111001100110011000110100111001100100110011001100110011001001100110110100110011001011010 0111001100100110010110100111001100110011001001100110011001100110010110100111001100011010011001100100110011 1001100101101001110011000110100111001100110011001100110011001100110011001100110011001100110011001100110011 001100 960 Msense(36) 1001100111001100001100110011110001101001001100111100001101100110110011001100110000111100001111000110011010 0110011001100110011001001100111001011011001100100110011100001100110011100110010011001100110011110011000011 
0011001100110011110011000011001111000011110000111100110011001100110011001100110011001100110011001100110011 0011001100001100111100110010011001110011001001100111001100011010011100110000110011011010011100110001101001 1100110011001100011010010110100111001100100110011001100110011001100110011001100110011001100110010011001111 0011001001100101101001110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011000011001110011001110011000110100111001100100110011001100111001100011010010110100110011001110011 0001101001100110010110100101101001100110011100110001101001100110011001100101101001110011001100110000110011 1001100111001100110011000011001111001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(0) 0011001111001100110011001001100100110011110011000110011011000011011001100011001100110011100101101001011000 1100110110100101100110011001101100001110011001110011001001100100110011011001100110011011000011001111001001 0110100101100110011001100110001100110011001100110011001100110011001111001100011001101100110001100110110011 0001100110001100111001011000110011001100110110011001100110100101101001011000110011001100111100110000110011 0110011010010110001100110110011000110011011001101100110001100110011001100011001110010110001100110011001100 1100110011001100110011110011000110011011001100100101100011001100110011001100111001011001100110100101100110 0110011001100011001101100110001100110110011000110011011001100011001101100110100101100011001110010110011001 1000110011001100110110011010010110011001100011001100110011001100110011001111001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(3) 0110011000111100100110010011001101100110001111001100001100111100110000111001011010010110001111001001011010 0110010110011011000011100110010110100110010110110011001001100110010110100110010011001100111100100110011001 1001110000111100001101100110001100111100110011000011110000111100001111000011110000110110011011000011011001 1000110011110011000011001100110011011001100110011001100110001100111001011001100110100101100011001100110011 0011001101100110001100110011001101100110001100110011001110010110011001100011001110010110011001100011001100 1100110011001100110011001100110011001100110011011001100011001100110011001100110110011001100110001100110110 0110011001101100110000110011100101101001011001100110001100111100110000110011110011000011001101100110011001 1000110011011001100110011000110011001100110110011010010110011001101100110000110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(6) 0110011010010110100110010011001110011001110000110011001100111100001100111100110011000011001111001100110011 0011000011110001100110001100110011001100110011001100110110011010011001011001101100001111001100001100111100 1100011001100011001110011001100110010110011010011001100110010011001110011001001100111100110001100110001100 1101100110110011001001011001100110100101100110011001100110100101100110011010010110011001100011001101100110 0110011001100110110011000110011011001100100101100110011001100110110011000110011000110011001100110011001100

1100110011001100110011001100110011001101100110011001100110011000110011110011000110011011001100001100110011 0011100101100110011001100110110011000110011000110011001100111100110001100110100101100110011011001100100101 1000110011001100110110011001100110110011000110011001100110001100110011001100110011001100110011001100110011 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(9) 0110011011000011001100110011001100110011100110010110011001100110011001100011001111001100001100110110011000 1100111001100100111100110000111001011011001100100110011100110001100110110011001001011011000011100101100110 0110100101101001011010010110110000111100001111000011110000111100001111000011100110010011001101100110100110 0100110011110000111100110001100110110011001100110000110011011001100110011011001100100101101100110011001100 0110011000110011011001100011001110010110001100110110011001100110011001100011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100111100110000110011001100110110011011001100100101100110 0110100101100110011001100110011001100110011010010110011001101100110001100110001100110110011010010110001100 1100110011011001100011001101100110011001101001011000110011100101100011001100110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(12) 0110011010011001011010011001100111000011100101101100001111000011110000110011110001100110001100110011110001 1001101001100110011001001100111100110011001100011001100011001110011001100110010011001100110011110011001100 1100011010010011001100110011001100110011001100110011011001101001011000110011100101100011001110010110011001 1001100110110011000110011001100110001100110110011000110011100101100011001110010110001100111001011000110011 0011001110010110011001100011001110010110011001100110011000110011011001100011001100110011001100110011001100 1100110110011000110011100101100110011011001100011001101001011001100110100101100110011001100110001100110011 0011100101100011001100110011011001101001011001100110001100111001011000110011001100110110011001100110001100 1101100110001100111001011000110011001100110011001100110011110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(15) 0011001100110011110000110110011010010110001100111001011010011001110011001100110010011001001100110011110001 1001101001100100110011110011001001100110011001001100110110100100110011011010011100110000110011100110011001 1001110000111100001100111100110000111100001111000011110000111100001111000011110000110011110001100110011010 0111001100011001100011001110011001110000111100110011000011011001101001011010010110100101101001011001100110 1100001110010110100101101001011010010110100101100011001100110011001100111001011001100110001100111100110000 1100110011001100110011001100110011001100110011001100111001011001100110110011001001011001100110100101100011 0011100101100110011001100110100101100110011000110011011001101001011000110011001100110110011010010110100101 1001100110001100110011001100110011011001100110011011001100011001101001011000110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(18) 1001100101100110001100111100110010011001110000110011001111000011110000111100001101100110100110011001100110 0110011001011011001100110000111001011011000011110000110011001101101001011001101100110000111100011001100110 0110001100110011001110010110001100110011001100110011001100110011001100110011001100111100110010010110011001 1001100110011001100011001111001100001100110011001110010110001100110011001101100110001100111100110000110011 1100110001100110001100111001011001100110011001100011001100110011011001100011001100110011100101100011001100 1100110011001100110011001100110011001100110011001100111100110000110011100101100011001101100110100101100110 0110100101100011001100110011001100111001011000110011011001101100110001100110100101100110011000110011011001 1000110011011001101001011001100110100101100011001101100110011001100110011000110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(21) 0011001110011001110011000110011010010110011001101001011001100110100101101100001110010110100101101100001111 0000111001011010010110100101100110011011000011011001101100110011001100100101101001100101100110100101101001 0110001111000110011011000011110000111100001111000011110000111100001111000011100101100110011011001100011010 0100110011011010010110011000110011100110011001100100110011110011000011110001100110011001100110011010011001

1100110001101001110011000110011011001100011001100110011001100110110011000110011000110011100101100011001100 1100110011001100110011001100110011001100110011011001101001011000110011100101100011001101100110001100110011 0011001100110011001100110011011001100011001111001100011001100011001101100110011001100110011010010110001100 1111001100100101100110011001100110110011001001011000110011001100110011001100110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(24) 1001100110011001011001100110011010010110001111000110011010010110100101101001011001100110110000110110011010 0101101001011000111100100101100110011000110011100101101100001110011001100110011100001111001100011010011001 1001001111001001100111001100100110011100110011001100100110011100110011001100110011001001100100110011001111 0011001100110000110110100101100110110011001001011010010110110000111001011010010110110000110011110001100110 1100001100110011110011000011001111001100001100110110011000110011110011001001011001100110011001100110011000 1100110011001100110011001100110011001100110011011001100011001101100110100101100110011011001100011001101001 0110011001100110011010010110001100111001011000110011100101100011001100110011100101100110011001100110001100 1111001100001100110011001101100110001100110110011010010110001100111100110000110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(27) 1001100100110011001100110011110000110011011001100110011000110011001100110011001101100110011001100110011001 1010010110011011001100001100111001100111001100100110011100110001101001011001100110100101100110110000111100 0011110000111100001111000011110000111100001111000011110000111100001111000011100101100011110001100110100101 1001100110110011001100001111001100100110010011001110010110110011000110011010010110110000111001011010010110 0011110011000011110000111001011010010110100101101001011001100110100101100011001111001100011001100110011000 1100110011001100110011001100110011001100110011011001101001011001100110001100111001011000110011011001100110 0110011001101001011001100110100101100011001100110011001100110110011001100110001100110110011010010110011001 1001100110011001101001011001100110011001100011001111001100100101100011001100110011001100110011001100110011 0011001100110011110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 001100 960

Mantisense(30) 0011001111001100001100110110011011001100011001101001011011000011001100110110011011001100001111000110100111 0011001001100110010110001100111100110001101001110011001001011010010110100101101100001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100 1100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110011001100110011001100110011001100110011001100110011110011001100110011001100110011001100110011 001100 960

Mantisense(33) 0011001101100110011001101100110001100110011010011100001111000011100101101100001111000011011001100110011011 0000111001011001100110110011001001100100110011011001100011110000110011001100110011001111000011100110011100 1100100110010011001101100110001100111100110011001100100110010011001110011001110000111100001111000011110000 1111000011100101101100001101100110110000110110011011000011100101100011001100110011001100110110011000110011 0011001100110011100101100011001110010110001100110011001101100110110011000110011000110011001100110011001101 1001100110011010010110001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100110110011000110011001100111001011000110011011001100110011001100110110011001001011001100110100101 1000110011011001101001011000110011001100110110011001100110011001101001011000110011100101100110011011001100 0110011010010110001100111001011000110011001100110011001100110011001100110011001100110011001100110011001100 110011 960

Mantisense(36) 0110011000110011110011001100001110010110110011000011110010011001001100110011001111000011110000111001100101 1001100110011001100110110011000110100100110011011001100011110011001100011001101100110011001100001100111100 1100110011001100001100111100110000111100001111000011001100110011001100110011001100110011001100110011001100
1100110011110011000011001101100110001100110110011000110011100101100011001111001100100101100011001110010110 0011001100110011100101101001011000110011011001100110011001100110011001100110011001100110011001101100110000 1100110110011010010110001100110011001100110011001100110011001100110011001100110011001100110011001100110011 0011001100111100110001100110001100111001011000110011011001100110011000110011100101101001011001100110001100 1110010110011001101001011010010110011001100011001110010110011001100110011010010110001100110011001111001100 0110011000110011001100111100110000110011001100110011001100110011001100110011001100110011001100110011001100 110011 960

------Section 7 Convert binary sense and antisense strands to BP code------
MsenseBP(0) TTAAAAGGTTAACCATCCTTTTGCGCTTCGCCCCATGGAAGGTTCCCCATTAGCGCCCCCTTTTTTTTTTAACCAACCAACCTTGCTTTTCCCCGCGCTT TTAATTCCGCTTCCTTCCAACCCCTTGCTTTTTTTTTTAACCAAGCTTTTTTGCCCGCCCCCTTCCTTCCTTCCTTCCGCTTGCCCTTTTCCGCCCTTTTTTT TAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(0) AATTTTCCAATTGGTAGGAAAACGCGAAGCGGGGTACCTTCCAAGGGGTAATCGCGGGGGAAAAAAAAAATTGGTTGGTTGGAACGAAAAGGGG CGCGAAAATTAAGGCGAAGGAAGGTTGGGGAACGAAAAAAAAAATTGGTTCGAAAAAACGGGCGGGGGAAGGAAGGAAGGAAGGCGAACGG GAAAAGGCGGGAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(1) CCTAGGTTCCTAATTAATGCGCTAGCGGCCATGGCGGCAAGGGCGGTTTAGGGGATATCCTTAAATATATATATCCATCCTTAATTTTCCCCCCTTGC CCGCTTTTTTCCTTTTCCTTTTGCCCTTGCCCTTTTTTTTTTTTTTCCTTTTTTCCCCTTCCCCAATTGCGCCCTTAATTAATTCCCCTTCCCCTTTTCCGCCC AATTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(1) GGATCCAAGGATTAATTACGCGATCGCCGGTACCGCCGTTCCCGCCAAATCCCCTATAGGAATTTATATATATAGGTAGGAATTAAAAGGGGGGAA CGGGCGAAAAAAGGAAAAGGAAAACGGGAACGGGAAAAAAAAAAAAAAGGAAAAAAGGGGAAGGGGTTAACGCGGGAATTAATTAAGGGGA AGGGGAAAAGGCGGGTTAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(2) CCGCGGTTGGATTTTATTAAATTAAAAATACCTTTTTTTTCCGGCCATAATTAACCTTGGGGCCGGGGTTGGTTAACCTTCCAAGCCCGCCCCCGCCC GCCCTTCCCCCCAACCAAGCCCCCAACCTTTTTTTTTTTTTTTTCCCCCCTTAACCAATTTTGCCCCCAACCTTTTAACCGCCCAAGCTTTTCCCCAACCC CTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(2) GGCGCCAACCTAAAATAATTTAATTTTTATGGAAAAAAAAGGCCGGTATTAATTGGAACCCCGGCCCCAACCAATTGGAAGGTTCGGGCGGGGGC GGGCGGGAAGGGGGGTTGGTTCGGGGGTTGGAAAAAAAAAAAAAAAAGGGGGGAATTGGTTAAAACGGGGGTTGGAAAATTGGCGGGTTCGA AAAGGGGTTGGGGAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(3) CCATTTTTTTGGCCCCCCTTAATTCCTTGGTAATGCAAGGAACCAAGCATGCCCGCGCGCATATATATATATGGTTCCGGTTATAACCAAAATTCCCC AAGCAAAACCTTCCTTGCTTCCCCCCTTTTTTTTTTTTTTTTTTTTTTAATTTTCCAAGCCCGCCCCCCCCCGCCCAACCTTCCGCTTTTCCTTCCCCGCTT GCTTTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(3) GGTAAAAAAACCGGGGGGAATTAAGGAACCATTACGTTCCTTGGTTCGTACGGGCGCGCGTATATATATATACCAAGGCCAATATTGGTTTTAAGG GGTTCGTTTTGGAAGGAACGAAGGGGGGAAAAAAAAAAAAAAAAAAAAAATTAAAAGGTTCGGGCGGGGGGGGGCGGGTTGGAAGGCGAAAA GGAAGGGGCGAACGAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(4) CCGGCGGGATGCATATATTACCTTTACCGGGGTTAAAACCTTGGGGTTTTAAAACGTTTTTTTTTTCCGCTTGCTTGCCCCCAACCCCTTCCTTGCTTG CTTGCTTTTGCCCTTGCCCCCTTCCTTTTTTTTTTCCTTGCCCAACCGCCCGCCCCCTTTTGCTTTTCCGCCCTTGCTTTTCCCCTTCCTTGCTTTTTTTTAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(4) GGCCGCCCTACGTATATAATGGAAATGGCCCCAATTTTGGAACCCCAAAATTTTGCAAAAAAAAAAGGCGAACGAACGGGGGTTGGGGAAGGAA CGAACGAACGAAAACGGGAACGGGGGAAGGAAAAAAAAAAGGAACGGGTTGGCGGGCGGGGGAAAACGAAAAGGCGGGAACGAAAAGGGG AAGGAACGAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(5) TTTTATCCGCTTGCGGAAAAGGTTTACCGGTTAAGGGGTTCGTTCGAATTGGGGATATTAATATATATATATATTACCCGAACCTTGGATAAATCCG CGCGCGCCCATGCGCGCGCGCTTTTTTGCCCTTAATTTTTTTTTTTTTTGCCCAAGCCCGCTTGCCCCCGCCCTTCCGCTTTTCCGCGCCCTTTTTTCCC CAACCGCTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240

MantisenseBP(5) AAAATAGGCGAACGCCTTTTCCAAATGGCCAATTCCCCAAGCAAGCTTAACCCCTATAATTATATATATATATAATGGGCTTGGAACCTATTTAGGC GCGCGCGGGTACGCGCGCGCGAAAAAACGGGAATTAAAAAAAAAAAAAACGGGTTCGGGCGAACGGGGGCGGGAAGGCGAAAAGGCGCGGGA AAAAAGGGGTTGGCGAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(6) GGCCTTAAGGATTTATATATCCGGGGGGGCAAATGCATATTTCGCCAATACCCCTTTTGCTTTTTTTTTTTTTTAAGCCCCCCCTTAATTTTGCTTTTCC TTAATTAACCTTGCCCCCTTTTCCTTTTGCTTTTTTTTTTTTTTTTAATTGCTTCCGCCCGCTTTTTTGCTTCCAACCGCCCTTCCTTCCGCCCGCTTCCCCC CTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(6) CCGGAATTCCTAAATATATAGGCCCCCCCGTTTACGTATAAAGCGGTTATGGGGAAAACGAAAAAAAAAAAAAATTCGGGGGGGAATTAAAACGA AAAGGAATTAATTGGAACGGGGGAAAAGGAAAACGAAAAAAAAAAAAAAAATTAACGAAGGCGGGCGAAAAAACGAAGGTTGGCGGGAAGGA AGGCGGGCGAAGGGGGGAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(7) TTGGAACCGCCCGCCCGCATGCGCATATGCGCGCCCATCCAAAAGCGGCCGCGCTACCATATATATATATATGCCCAACGTTCGCCTTGGGGTTAAT ACCCCCCGGAACGAACCAACCCCCCAACCTTGCTTTTTTTTTTTTTTCCGCTTGCTTCCTTTTTTTTTTCCTTAACCTTCCCCCCGCTTAAGCCCCCAAGC TTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(7) AACCTTGGCGGGCGGGCGTACGCGTATACGCGCGGGTAGGTTTTCGCCGGCGCGATGGTATATATATATATACGGGTTGCAAGCGGAACCCCAAT TATGGGGGGCCTTGCTTGGTTGGGGGGTTGGAACGAAAAAAAAAAAAAAGGCGAACGAAGGAAAAAAAAAAGGAATTGGAAGGGGGGCGAAT TCGGGGGTTCGAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(8) GGGGCCCCGCTACCGCGCGCCCATCCGCGCTAGCCCTTGCATGGGGATAACGGGTAGGAAGGAAAAGGAAAAAAGGTTTAAAATCGCCAAGCGC ATGCGCATTACCATTTAATTAATTCCTTAAGCCCCCCCTTTTTTTTTTTTCCTTCCGCCCAACCGCCCCCGCTTGCTTGCTTTTGCCCCCTTAATTTTCCTT CCGCTTAATTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(8) CCCCGGGGCGATGGCGCGCGGGTAGGCGCGATCGGGAACGTACCCCTATTGCCCATCCTTCCTTTTCCTTTTTTCCAAATTTTAGCGGTTCGCGTAC GCGTAATGGTAAATTAATTAAGGAATTCGGGGGGGAAAAAAAAAAAAGGAAGGCGGGTTGGCGGGGGCGAACGAACGAAAACGGGGGAATTA AAAGGAAGGCGAATTAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(9) GGTTTTTATTCCCCTTTTTTCCCCCCCGCCAATTGGAAGGAACGCCCGCCATATATATATATATATATATATGCTACCGCCCAAATAAGGTTGCAACC GCATGCGCTAATATGCGCGCGCCCGCTTAACCCCTTTTTTTTTTTTCCGCCCTTGCTTCCCCCCGCCCGCTTTTTTCCCCTTCCGCCCCCCCGCCCCCTT AAGCTTTTTTTTTTTTTTAAAAAAAAAAAAAAAAAAAAAAAA 240 MantisenseBP(9) CCAAAAATAAGGGGAAAAAAGGGGGGGCGGTTAACCTTCCTTGCGGGCGGTATATATATATATATATATATACGATGGCGGGTTTATTCCAACGTT GGCGTACGCGATTATACGCGCGCGGGCGAATTGGGGAAAAAAAAAAAAGGCGGGAACGAAGGGGGGCGGGCGAAAAAAGGGGAAGGCGGGG GGGCGGGGGAATTCGAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTT 240

MsenseBP(10) TTAATTCCAACCGCATTTCCAATACGAAGGGCTTAACGAAGCGCGCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTTTAAAAAAAAAAAA 240 MantisenseBP(10) AATTAAGGTTGGCGTAAAGGTTATGCTTCCCGAATTGCTTCGCGCGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTT 240

MsenseBP(11) TTCCCCAACCCGATATGCATATCCCCATGCCCAAGGTTCCTATTTTTTATGGAAGGTTCCTTAAAAGGTTGGATATATATATGCATCCATCCATGCTTT TTTCCTTTTTTGCTTGCTTTTCCAACCTTTTTTCCCCGCTTTTTTTTTTTTTTTTTTTTTTTTCCTTTTGCTTCCCCCCAAGCCCGCTTCCGCTTTTCCCCCCG CTTGCCCAACCGCTTGCTTTTTTTTTTTTTTTTTTTT 240 MantisenseBP(11) AAGGGGTTGGGCTATACGTATAGGGGTACGGGTTCCAAGGATAAAAAATACCTTCCAAGGAATTTTCCAACCTATATATATACGTAGGTAGGTACG AAAAAAGGAAAAAACGAACGAAAAGGTTGGAAAAAAGGGGCGAAAAAAAAAAAAAAAAAAAAAAAAGGAAAACGAAGGGGGGTTCGGGCGAA GGCGAAAAGGGGGGCGAACGGGTTGGCGAACGAAAAAAAAAAAAAAAAAAAA 240 346

MsenseBP(12) CCTTAAATGCAATAGGTTTTATATGGCCCCCCAACGTTCCTAAACCAAAATTAAAAATTAATATATTTTTTTTTTTTTTTTTAATTCCTTCCTTGCTTAAG CTTGCTTTTGCGCTTCCCCCCCCCCCCCCAATTCCGCTTTTTTTTTTTTTTTTTTTTTTTTAACCTTGCTTCCCCTTGCGCCCTTGCCCGCGCCCTTGCCCC CGCTTTTAACCTTTTAATTTTTTTTTTTTTTTTTTTT 240 MantisenseBP(12) GGAATTTACGTTATCCAAAATATACCGGGGGGTTGCAAGGATTTGGTTTTAATTTTTAATTATATAAAAAAAAAAAAAAAAATTAAGGAAGGAACG AATTCGAACGAAAACGCGAAGGGGGGGGGGGGGGTTAAGGCGAAAAAAAAAAAAAAAAAAAAAAAATTGGAACGAAGGGGAACGCGGGAACG GGCGCGGGAACGGGGGCGAAAATTGGAAAATTAAAAAAAAAAAAAAAAAAAA 240
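The MsenseBP/MantisenseBP pairs listed in Section 7 are consistent with a position-by-position Watson-Crick complement (A with T, C with G) and no strand reversal. The following Python sketch illustrates that relationship only; the helper name `complement_strand` is hypothetical and is not taken from the program that produced this listing.

```python
# Position-by-position Watson-Crick complement, without reversing the strand.
# The helper name is illustrative; it is not the dissertation's own code.
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def complement_strand(sense: str) -> str:
    """Return the antisense strand for an upper-case A/C/G/T sense strand."""
    unknown = set(sense) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return sense.translate(_COMPLEMENT)


# First 14 bases of MsenseBP(0), copied from the listing.
sense_prefix = "TTAAAAGGTTAACC"
antisense_prefix = complement_strand(sense_prefix)
```

Applying the function twice returns the original strand, which gives a quick consistency check on any sense/antisense pair in the listing.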

-----Section 8 Read in Chromosome genome key at specified position, encrypt message, and anneal sense strand------
M. genitalium NC_000908_2 start position 125
Csense AATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTT AGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATT TAGTTAATACTTTTAATAAGTTATTATTTAGTTAATAC 240 Cantisense TTATTCAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATA AATCAATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAAT AATAAATCAATTATGAAAATTATTCAATAATAAATCAATTATG 240

EncryptsenseBP(0) T T A gA gA tA cG cG T cT A gA cC cC A T C cC cT T T cT cG gC gG C cT cT gC cG cC gC cC cC A T tG cG A gA gG cG T cT gC gC C cC A T cT A gG cC cG cC gC C cC cC T T cT T aT cT T T tT cT A gA cC cC gA A cC cC gA tA cC cC T T cG gC aT cT cT T gC C cC cC gG cC cG gC cT cT cT T tA A cT T gC cC gG aC cT cT cC gC T tT cC cC gA A cC gC cC cC cT T tG cC cT T T cT T aT cT cT cT cT gA gA C cC A gA cG cC T cT cT cT T tT cG cC gC gC cG gC aC cC cC cC cT cT gC gC tT cT cC gC cT cT gC cC cT cT gC C cG cC T T cG gC aC cC cT cT cT T cC gC gG C cC cC T cT cT T cT cT cT T tA A A gA gA A gA aA A A A A gA gA A gA gA tA A A gA A A gA A A A gA tA A A gA gA A gA aA 650 EncryptannealBP(0) GGTCCGAAGCTCGGTGCGCGGCATACCCTAGTGGTGAATCAAGCTTCGTGCTAGAGTCGGGGCGTCGGTCTCGGCTGGCGGGGGATTCCGTCGG AGATCCCGGTCGTGAGCCGTGTGGCTGTGGCGAGCGGCGTCCCCCCCGTCAGGCCCGTAGTTATGGGGCCTTTCGTCCTGCCTCAGGGATGGCCC GGTACGGGCCGCCCGGTTCCTCCTTTTCCTCCGTTCTTCTTTCGTTCCTCC 240 EncryptsenseBinary(0) 1100110000110000000001001101110111001111001100001110111000111100100111101111110011001111110100010010100111 1111110001110111100001111011100011110001011101001100000010110111001111000100011001111000111100111100110010 1110110111100001100111101110110011001111110010111111110011000111111100110000111011100000001111101110000001 0011101110110011001101000110111111111111000001100111101110001011101101000111111111111111000100001111111100 0001111000101010111111111110000111000111111011100000001111100001111011101111110001011110111111001100111111 0010111111111111111111000000001001111000110000110111101100111111111111110001111101111000010001110100011010 1110111011101111111100010001011111111110000111111111000111101111111100011001110111101100110011010001101011 1011111111111111001110000100101001111011101100111111111100111111111111110001000011001100000000001100001000 0011001100110011000000000011000000000100001100110000001100110000001100110011000001000011001100000000001100 001000 960 Encryptsensehex(0) 
CC3004DDCF30EE3C9EFCCFD129FF1DE1EE3C5D302DCF119E3CF32EDE19EECCFCBFCC7F30EE03EE04EECCD1BFFC19EE2ED1FFFC43F C1E2AFFE1C7EE03E1EEFC5EFCCFCBFFFF009E30DECFFFC7DE11D1AEEEFF117FE1FF1EFF19DECCD1AEFFFCE129EECFFCFFFC43300308 3333003004330330333043300308 240
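In block 0, each whitespace-separated token of EncryptsenseBP lines up with one hex digit of Encryptsensehex (240 tokens, 240 digits). The Python sketch below records the token-to-nibble pairs that can be read off by aligning the two outputs. The table is an inference from this listing, not the author's definition of the code; it is deliberately partial, and `TOKEN_TO_NIBBLE` and `tokens_to_hex` are hypothetical names.

```python
# Token -> 4-bit value, inferred by aligning EncryptsenseBP(0) with
# Encryptsensehex(0). Assumption: one token maps to one hex digit.
# Partial table: only tokens checked by hand against the listing appear.
TOKEN_TO_NIBBLE = {
    "A": 0x3, "T": 0xC, "C": 0x9,
    "gA": 0x0, "tA": 0x4, "aA": 0x8,
    "gC": 0x1, "cC": 0xE,
    "gG": 0x2, "tG": 0x5, "cG": 0xD,
    "aT": 0xB, "cT": 0xF,
}


def tokens_to_hex(tokens: str) -> str:
    """Map each token to its hex digit; raises KeyError for unlisted tokens."""
    return "".join(format(TOKEN_TO_NIBBLE[t], "X") for t in tokens.split())


# First 16 tokens of EncryptsenseBP(0), copied from the listing.
prefix = "T T A gA gA tA cG cG T cT A gA cC cC A T"
hex_prefix = tokens_to_hex(prefix)
```

Tokens missing from the table raise a `KeyError` rather than being guessed, so the sketch fails loudly on any symbol that has not been checked against the output.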

Csense ATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTA GTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTT AGTTAATACTTTTAATAAGTTATTATTTAGTTAATACT 240 Cantisense TATTCAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAA ATCAATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATA ATAAATCAATTATGAAAATTATTCAATAATAAATCAATTATGA 240
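Successive Csense keys in this section begin one base later than the previous one (AATAAG..., then ATAAGT...), which suggests a 240-base window that advances by one position per message block. A minimal Python sketch of that windowing follows. The function name `key_window`, the 0-based indexing, and the toy genome string are assumptions made for illustration; the listing does not state how start position 125 is indexed.

```python
def key_window(genome: str, start: int, block: int, length: int = 240) -> str:
    """Return the key for one message block: `length` bases beginning at
    start + block. 0-based indexing is an assumption of this sketch."""
    begin = start + block
    window = genome[begin:begin + length]
    if len(window) != length:
        raise ValueError("genome too short for requested window")
    return window


# Toy stand-in for the chromosome: the first 30 bases of the first
# Csense key printed in Section 8.
toy_genome = "AATAAGTTATTATTTAGTTAATTAAGTTAT"
k0 = key_window(toy_genome, start=0, block=0, length=10)
k1 = key_window(toy_genome, start=0, block=1, length=10)
```

With a one-base advance, consecutive keys overlap in all but one position, so `k1` equals `k0` with its first base dropped and one new base appended.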

EncryptsenseBP(1) gC cC T gA tG cG cT T cC cC T A A cT T tA A cT gG gC cG cC T gA tG cC cG gG cC cC gA cT cG cG gC tG cG cC gA gA cG
gG cG gC gG tG cT cT T A cG gG cG cG A T tA cT cC gC T cT gA aA A T gA tT A cT gA cT A T cC cC A T C cC cT T gA A T aT cT cT gC gC C cC cC gC cT cT gG cC cC cC gG C cT cT T T cT T aC cC cT cT T T C cC cT T cT cT gG cC cC cC T tT cG cC gC gC cT T aT cT cT cT cT T T tT cT cT T cT cC gC cT cT cT T tT cT cC gC gC cC T aT cC cC cC cC A gA T tT cG cC gG cC cC gC cT cT A gA tT cT A gA T cT gC aC cC cC cT cT gC cC gC gC tT cT cT T cC cC gG cC cC cC gA tA cT cT T T cT T aT cT cT cT cT T gA A gA gA tA A A gA A A gA A A A gA tA A A gA gA A gA aA A 658 EncryptannealBP(1) TGGCAACGGGGTTCGGTCATAGGCAGAAGGCCAATAAGCCAAATAACCGTAAAATGGCGTGCCCTGCTTCCCTGGGTGCGCGCTGTCCTTCGGTC CAGGGACCCGGCGGGCCGGCGCGCCAGGGGTAGTTCGTCCCCGGTCCGCGTCCCGTCGTTGGTGGGGTCGTAGAGGTCCTCTCTCGCTGGGCCTG TTTCCGGGAGGGCGCCGGCGTCCCCGCTCCGTTCTTCTTTCGTTCCTCCT 240 EncryptsenseBinary(1) 0001111011000000010111011111110011101110110000110011111111000100001111110010000111011110110000000101111011 0100101110111000001111110111010001010111011110000000001101001011010001001001011111111111000011110100101101 1101001111000100111111100001110011110000100000111100000001110011111100001111001111001110111000111100100111 1011111100000000111100101111111111000100011001111011100001111111110010111011101110001010011111111111001100 1111110010101110111111111100110010011110111111001111111100101110111011101100011111011110000100011111110010 1111111111111111111100110001111111111111001111111000011111111111111100011111111110000100011110110010111110 1110111011100011000011000111110111100010111011100001111111110011000001111111001100001100111100011010111011 1011111111000111100001000101111111111111001110111000101110111011100000010011111111110011001111110010111111 1111111111111100000000110000000001000011001100000011001100000011001100110000010000110011000000000011000010 000011 960 Encryptsensehex(1) 1EC05DFCEEC33FC43F21DEC05ED2EE0FDD15DE00D2D125FFC3D2DD3C4FE1CF083C073F0F3CEE3C9EFC03CBFF119EE1FF2EEE29FFC 
CFCAEFFCC9EFCFF2EEEC7DE11FCBFFFFCC7FFCFE1FFFC7FE11ECBEEEE30C7DE2EE1FF307F30CF1AEEFF1E117FFCEE2EEE04FFCCFCBFF FFC030043303303330433003083 240
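Each Encryptsensehex value is the EncryptsenseBinary string read four bits at a time, so 960 bits yield 240 hex digits. The Python sketch below reproduces that conversion for checking a block by hand; `bits_to_hex` is a hypothetical helper name, and the spaces it strips are the line-wrap spaces that appear inside the printed strings.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a bit string to upper-case hex, four bits per digit.
    Whitespace (line-wrap residue in the listing) is ignored."""
    bits = "".join(bits.split())
    if len(bits) % 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected a multiple of 4 characters, all 0 or 1")
    return "".join(
        format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)
    )


# First 32 bits of EncryptsenseBinary(1), copied from the listing.
hex1 = bits_to_hex("0001111011000000 0101110111111100")
```

The result can be compared with the leading digits of Encryptsensehex(1) printed above.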

Csense TAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTA GTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTT AGTTAATACTTTTAATAAGTTATTATTTAGTTAATACTT 240 Cantisense ATTCAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAA TCAATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAA TAAATCAATTATGAAAATTATTCAATAATAAATCAATTATGAA 240

EncryptsenseBP(2) cC gC gG C cG cG T cT cG gG A cT cT T tT A cT T gA A A T T tA A A gA A cT gA cC cC cT T tT cT cT T T cT gC cC gG gG C cC A T A A T cT A A gC C cT cT gG gG cG gG aC cC gG gG tG cG cT T cG cG T cT A A gC C cT cT gC gC A gA G cC cC gC gG C cC cC gC cC cG gC cC cC cG gC C cC cT T gC cC gC aC cC cC A gA gC C A A gG cC cC gC cC cC A gA C cC cT T T cT T aT cT cT cT cT T T tT cT cT T cC cC gC cC cC cC T tT A A gC gC A gA aT cT cT cT cG cC gC gC C cC A gA cC cC T cT cT cT gA tA cC cC gG gC cC gC aA A cG cC cT T cT T gC C cC cC gA A cC gC cC cC cT T tT cT cT T T cT T aT cT cT A A gA gA A gA gA tA A A gA A A gA A A A gA tA A A gA gA A gA aA A A 642 EncryptannealBP(2) GTACAAGCAATCCGTTCGCTTGGGTTCTCCGGCGTCCGGCTGAACGTGTTGCTTTCCCAAAAGGAAAACGAAGCTTTCCCTTTCAGGTACGGTGATG GATCGCGTGTGGGTCTCTTAGGTGGTCCGCGGCGTCCCCGGTCCGGGTGGGGTTTTTTCTCCCAGTTCGTCGGGCCCCGGGATGTCTAGCGCGTCG GCTGTGGCGTCCGGCGTCCTTCCTCCGTTCTTCTTTCGTTCCTCCTT 240 EncryptsenseBinary(2) 1110000100101001110111011100111111010010001111111111110001110011111111000000001100111100110001000011001100 0000111111000011101110111111000111111111111100110011110001111000100010100111100011110000110011110011110011 0011000110011111111100100010110100101010111000100010010111011111110011011101110011110011001100011001111111 1100010001001100000110111011100001001010011110111000011110110100011110111011010001100111101111110000011110 0001101011101110001100000001100100110011001011101110000111101110001100001001111011111100110011111100101111 1111111111111111001100011111111111110011101110000111101110111011000111001100110001000100110000101111111111 1111110111100001000110011110001100001110111011001111111111110000010011101110001000011110000110000011110111 1011111100111111000001100111101110000000111110000111101110111111000111111111111100110011111100101111111111 0011001100000000001100000000010000110011000000110011000000110011001100000100001100110000000000110000100000 110011 960 Encryptsensehex(2) 
E129DDCFD23FFC73FC033CC43303F0EEFC7FFCCF1E229E3C33CF3319FF22D2AE225DFCDDCF3319FF11306EE129EE1ED1EED19EFC1 E1AEE3019332EE1EE309EFCCFCBFFFFCC7FFCEE1EEEC7331130BFFFDE119E30EECFFF04EE21E183DEFCFC19EE03E1EEFC7FFCCFCBFF3 300300433033033304330030833 240

Csense AAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAG TTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTA GTTAATACTTTTAATAAGTTATTATTTAGTTAATACTTT 240 Cantisense TTCAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAAT CAATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAAT AAATCAATTATGAAAATTATTCAATAATAAATCAATTATGAAA 240

EncryptsenseBP(3) gC gC tA cT cT T cT cT T cT cG cG gC C cC cC gC gC cT cT gA gA tT cT cC gC cT cT gG cG cT A gA tT cG cC gA gA cG gG A gA gC C A A gG cC A T cG cC cC gC tG cC cG gC gG cC gA aT A T gA tT A cT gA cT A T cG cG cT T C cC cG gG T cT gA aT A A gC gC tA A A gA cT cT gC cC cC cC gA tA cG cC gA gA A gA aC cC cT cT gC gC tT cT cG gC cT cT gC cC cC cC gC C cT cT T T cT T aT cT cT cT cT T T tT cT cT T cT cT T cT cT A gA tT cT cT T gC cC gA aA cG cC cC cC cG gC gC C cC cC gC cC cC gC cG cC cC gC tA A cC gC T cT gC aC cG cC cT cT T cT gC gC tT cT cC gC cC cC gG cC cT cT gG C cT cT T T cT T aT cT cT cT cT T T cT gA gA tA A A gA A A gA A A A gA tA A A gA gA A gA aA A A A 664 EncryptannealBP(3) TTGCCGCCGCAATCGGTTCCCCTCGTCCAACTCTAGCCAATCTCTTAGTGAGGTAGATAGCTTGCTTCCCTGAACGCGAAGCCTTTTTGTTCCCTGGG CGAGCCTCGGCCTTTCATCCTGGGTCCCGGCGTCCCCGGTCCGCCGCCTCTCCGTGCCAGGGATTCGGTGGTAGGTGTGTGCTGAGCCGCTTTCGT GGAGCCACCCGGCGTCCCCGGCCCGTTCTTCTTTCGTTCCTCCTTT 240 EncryptsenseBinary(3) 0001000101001111111111001111111111001111110111010001100111101110000100011111111100000000011111111110000111 1111110010110111110011000001111101111000000000110100100011000000011001001100110010111000111100110111101110 0001010111101101000100101110000010110011110000000111001111110000111100111100110111011111110010011110110100 1011001111000010110011001100010001010000110011000011111111000111101110111000000100110111100000000000110000 1010111011111111000100010111111111010001111111110001111011101110000110011111111111001100111111001011111111 1111111111110011000111111111111100111111111100111111110011000001111111111111000001111000001000110111101110 1110110100010001100111101110000111101110000111011110111000010100001111100001110011110001101011011110111111 1111001111000100010111111111100001111011100010111011111111001010011111111111001100111111001011111111111111 1111110011001111000000000100001100110000001100110000001100110011000001000011001100000000001100001000001100 110011 960 Encryptsensehex(3) 
114FFCFFCFDD19EE11FF007FE1FF2DF307DE00D23019332E3CDEE15ED12E0B3C073F0F3CDDFC9ED2CF0B33114330FF1EEE04DE003 0AEFF117FD1FF1EEE19FFCCFCBFFFFCC7FFCFFCFF307FFC1E08DEEED119EE1EE1DEE143E1CF1ADEFFCF117FE1EE2EFF29FFCCFCBFFFF CCF004330330333043300308333 240

Csense AGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTT AATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGT TAATACTTTTAATAAGTTATTATTTAGTTAATACTTTT 240 Cantisense TCAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATC AATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATA AATCAATTATGAAAATTATTCAATAATAAATCAATTATGAAAA 240

EncryptsenseBP(4) gC C cG cG gC cG cG gG A cT cG gC tA cT A T gA cT cT gA gC C cT cT T A cC gC cG cG cG gG tT cT A gA gA A gC cC T T tG cG cG gG cT cT T cT A A gA tA cC cG T T cT T aT cT T T tT cT cC gC cG cC T cT cG cC T tT cG cC gC gC cC gC aA A cC gC gC C cT cT gC cC cT T cG cC cT T tG cC cT T gG cC T aT cT cT cG gC gC C cT cT gG cC cC gC cC cC cT T C cC cT T T cT T aT cT cT cT cT gC gC tT cT cG gC cC cC gA A cC cC gG C cC cC gG gC cC gC aC cC cT cT cT cT gG gC tT cT cT T cC cC gG cC cC cC T tT cG cC T T cT T aC cC cC cC cT T cC gC T tT cG cC T cT cT T cT cT cT T tA A A gA gA A gA aA A A A A gA gA A gA gA tA A A gA A A gA A A A gA tA A A gA gA A gA aA A A A A 653 EncryptannealBP(4) TCAATAAATCATGCTGCCCCTCCCGTGTAAAATCTCCTTGGGAAAACCGCTTCGGAGGCGTCGGTCGTAGGCAGGTAGTTGTCTGTTCCCTGCGAGC GAGCGAGGTCCATTCCCAGGTGGCGCGCGGCGTCCCCTTTCATGGCTGGACGGATGTGGCCCCATTCCGGGAGGGGTAGGGCGGGGGCGGTGTA GGCCGCCCGGTTCCTCCTTTTCCTCCGTTCTTCTTTCGTTCCTCCTTTT 240 349

EncryptsenseBinary(4) 0001100111011101000111011101001000111111110100010100111100111100000011111111000000011001111111111100001111 1000011101110111010010011111110011000000000011000111101100110001011101110100101111111111001111001100110000 0100111011011100110011111100101111111100110001111111111000011101111011001111110111101100011111011110000100 0111100001100000111110000100011001111111110001111011111100110111101111110001011110111111000010111011001011 1111111111010001000110011111111100101110111000011110111011111100100111101111110011001111110010111111111111 1111110001000101111111110100011110111000000011111011100010100111101110001000011110000110101110111111111111 1111001000010111111111111100111011100010111011101110110001111101111011001100111111001010111011101110111111 0011100001110001111101111011001111111111001111111111111100010000110011000000000011000010000011001100110011 0000000000110000000001000011001100000011001100000011001100110000010000110011000000000011000010000011001100 110011 960 Encryptsensehex(4) 19DD1DD23FD14F3C0FF019FFC3E1DDD27F30031ECC5DD2FFCF3304EDCCFCBFCC7FE1DECFDEC7DE11E183E119FF1EFCDEFC5EFC2E CBFFD119FF2EE1EEFC9EFCCFCBFFFF117FD1EE03EE29EE21E1AEFFFF217FFCEE2EEEC7DECCFCAEEEFCE1C7DECFFCFFFC43300308333 30030043303303330433003083333 240

Csense GTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTA ATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTT AATACTTTTAATAAGTTATTATTTAGTTAATACTTTTA 240 Cantisense CAATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCA ATTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAA ATCAATTATGAAAATTATTCAATAATAAATCAATTATGAAAAT 240

EncryptsenseBP(5) tT cT cT T A cT gC cC cG cC T tT cG cC gG gG A A gA gA tG cG cT T cT A gC cC cG cG T tT A A gG gG cG gG cT T gC tG cT cT gC cG A gA cT cT cG gG tG cG A T gA cT T aA A T gA tT A cT gA cT A T A cT A T tT A cC gC gC cG gA aA cC cC T T tG cG A T A A gA cT cC cC gG C cG cC gG gC cG gC aC cC A cT gG gC tG cC cG gC cG cC gG cC cT cT T tT cT cT gG gC cC gC aT cT A A cT T T tT cT cT T cT cT T cT cT cT T tG cC cC gC gA A gG aC cC cC cG cC cT T gG C cC cC gC cC cG gC cC cC cT T C cC cG gC T cT T aT cC cC cG cC gG cC gC gC tT cT cT T cT cT gC cC cC cC gA tA cC cC gG gC cT T aT cT cT cT cT T T cT T T tA A A gA A A gA A A A gA tA A A gA gA A gA aA A A A A gA 653 EncryptannealBP(5) TCCGTCTGAGGTAGAATTCCAACGCTTGAAGTTTAAAACGTACCTATCCCAAAATGCCGCTGCTTCCCTGTCTGTTGTTACCGGGGAATGTTCCGGA CAGATATGGTCATAGATAGAGCCGTCCATGTTCTTCGGTCCGCCGCCCGAGGTCTAGGGAGCGACGGTGATGGCGCGATGCGTGGAGAGTTTCCG CCTGGGCGGGATCGTCCCCGGCGGGTTCTTCTTTCGTTCCTCCTTTTC 240 EncryptsenseBinary(5) 0111111111111100001111110001111011011110110001111101111000100010001100110000000001011101111111001111001100 0111101101110111000111001100110010001011010010111111000001010111111111000111010011000011111111110100100101 1101001111000000111111001000001111000000011100111111000011110011110000111111001111000111001111100001000111 0100001000111011101100110001011101001111000011001100001111111011100010100111011110001000011101000110101110 0011111100100001010111101101000111011110001011101111111111000111111111110010000111100001101111110011001111 1111001100011111111111110011111111110011111111111111000101111011100001000000110010101011101110110111101111 1100001010011110111000011110110100011110111011111100100111101101000111001111110010111110111011011110001011 1000010001011111111111110011111111000111101110111000000100111011100010000111111100101111111111111111111100 1100111111001100010000110011000000110011000000110011001100000100001100110000000000110000100000110011001100 110000 960 Encryptsensehex(5) 
7FFC3F1EDEC7DE2233005DFCF31EDDC73322D2FC15FF1D30FFD25D3C0FC83C073F0F3C3F3C73E11D08EECC5D3C330FEE29DE21D1 AE3F215ED1DE2EFFC7FF21E1BF33FCC7FFCFFCFFFC5EE1032AEEDEFC29EE1ED1EEFC9ED1CFCBEEDE2E117FFCFF1EEE04EE21FCBFFFF CCFCC433033033304330030833330 240

Csense TTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAA TACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTA ATACTTTTAATAAGTTATTATTTAGTTAATACTTTTAA 240 Cantisense AATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAA TTATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAA TCAATTATGAAAATTATTCAATAATAAATCAATTATGAAAATT 240

EncryptsenseBP(6) cG cG gC cC cT T A A cG gG tA cT cT T gA cT A T gA tT cC cC gG cG cG gG cG cG cG gC tA A A T gG cC gA cT gA T tT cT cC gG cC cC gA A cT A gC C cC cC T T cT T G cC T T tT cT cT T cT cT T cT cT cT T tT A A gG gC cC gC aC cC cC gC T tT A A T cT cT T cG cC cT T tT cT cC gC T cT gA aA cT cT A gA gC C cT cT gG cC cC gC cC cC cT T tT cT cC gC T cT T aT cG cC cT cT T T tT cT cT T cT cT T cT cT cT T tT A A T T cG gC aT cT cC cC cG cC gC gC tG cC cT T cT cT T cT cG cC T tT cC cC gA gA cC gC G cC cC cC cT T cC gC T tT cC cC gG cC cC gC cG cC cT T C cC cC gC gC cC T aT cT cT cT cT T T cT T T tT A A gA A A gA A A A gA tA A A gA gA A gA aA A A A A gA gA 648 EncryptannealBP(6) AATGCGTTAAGCCGCCTGCTGGAAAAAAATGTTGAGCCCGTCGAGGCTCTTCGGGGCGAGGGTCCGCCGCCCGTTTATGTGGGTGTTTGCCGAGC GTCGTGCCCCCTCTCCCAGGTGGCGTCGTGCGTAGCCGGTCCGCCGCCCGTTTGGATTCGGAGTTAGCGCCGCAGGTGGCCGTAGGGCGGTGTGG AGGTAGCGCGGTTGGTCCCCGGCGGTTTCTTCTTTCGTTCCTCCTTTTCC 240 EncryptsenseBinary(6) 1101110100011110111111000011001111010010010011111111110000001111001111000000011111101110001011011101001011 0111011101000101000011001111000010111000001111000011000111111111100010111011100000001111110011000110011110 1110110011001111110001101110110011000111111111111100111111111100111111111111110001110011001100100001111000 0110101110111000011100011100110011110011111111110011011110111111000111111111100001110011110000100011111111 0011000000011001111111110010111011100001111011101111110001111111111000011100111111001011110111101111111111 0011000111111111111100111111111100111111111111110001110011001111001100110100011011111111101110110111100001 0001010111101111110011111111110011111101111011000111111011100000000011100001011011101110111011111100111000 0111000111111011100010111011100001110111101111110010011110111000010001111011001011111111111111111111001100 1111110011000111001100110000001100110000001100110011000001000011001100000000001100001000001100110011001100 000000 960 Encryptsensehex(6) 
DD1EFC33D24FFC0F3C07EE2DD2DDD1433C2E0F0C7FE2EE03F319EECCFC6ECC7FFCFFCFFFC73321E1AEE1C733CFFCDEFC7FE1CF08FF 3019FF2EE1EEFC7FE1CFCBDEFFCC7FFCFFCFFFC733CCD1BFEEDE115EFCFFCFDEC7EE00E16EEEFCE1C7EE2EE1DEFC9EE11ECBFFFFCCF CC7330330333043300308333300 240

Csense TATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAAT ACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAA TACTTTTAATAAGTTATTATTTAGTTAATACTTTTAAC 240 Cantisense ATAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAAT TATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAAT CAATTATGAAAATTATTCAATAATAAATCAATTATGAAAATTG 240

EncryptsenseBP(7) cT T cG cG gA A cC cC gG C cC cC gG gC cC cC gG gC tA cT cG gC cG cC gA cT A cT gG C cG cC gG gC cC gC A T gC C A A gA A cG gC cG cG cC gC tG cC cG gC T A gC aC A T gA tT A cT gA cT A T A cT A T tG cC cC gC gA A gC G cT cT gC gG C cC cT T cG cG gG cG cT cT gA tA cT A gC gC cC gC aC cC cG cG gA gA C cG A gA cC cC gA A cC cC gC C cC cC gA gA cC gC aT cT cG cC cT T T tT cT cT T cT cT T cT cT cT T C cC cG gC T cT gG aC cT cT cC cC cT T T tT cT cT T cT cT T cC cC cT T tA A cC gC T cT gC aC cC cC cC cC gG cC T T tA A cG gC cC cC gC cC A A gG C cT cT T T cT T aT cT cT cT cT T T cT T T tT cT A gA A A gA A A A gA tA A A gA gA A gA aA A A A A gA gA aA 650 EncryptannealBP(7) CGAACTGGACGGATGGATGCATAGCCTCACAGATGTTGTCTTCTATAAGTAGATGTTGTGCTTCCCTGTCTGAGGTCTTACCTACGCGAAAACCCGC TTTGTGGAACCCATCGGCTGGTCGGCCGTTCAGCGGTCCGCCGCCCGCGATGCAGCCGGCGGTCCGCCGGGCGGTGTGCTGGGGGAGGGGTATG GTGTTACCCGGCGTCCCCGGCGGTCTCTTCTTTCGTTCCTCCTTTTCCC 240 EncryptsenseBinary(7) 1111110011011101000000111110111000101001111011100010000111101110001000010100111111010001110111100000111100 1111110010100111011110001000011110000100111100000110010011001100000011110100011101110111100001010111101101 0001110000110001101000111100000001110011111100001111001111000011111100111100010111101110000100000011000101 1011111111000100101001111011111100110111010010110111111111000001001111001100010001111000011010111011011101 0000000010011101001100001110111000000011111011100001100111101110000000001110000110111111110111101111110011 0001111111111111001111111111001111111111111100100111101101000111001111001010101111111111101110111111001100 0111111111111100111111111100111011101111110001000011111000011100111100011010111011101110111000101110110011 0001000011110100011110111000011110001100110010100111111111110011001111110010111111111111111111110011001111 1100110001111111001100000011001100000011001100110000010000110011000000000011000010000011001100110011000000 001000 351

960 Encryptsensehex(7) FCDD03EE29EE21EE214FD1DE0F3F29DE21E13C193303D1DDE15ED1C31A3C073F0F3C3F3C5EE10316FF129EFCDD2DFF04F311E1AE DD009D30EE03EE19EE00E1BFDEFCC7FFCFFCFFFC9ED1CF2AFFEEFCC7FFCFFCEEFC43E1CF1AEEEE2ECC43D1EE1E3329FFCCFCBFFFFCC FCC7F303303330433003083333008 240

Csense ATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATA CTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAAT ACTTTTAATAAGTTATTATTTAGTTAATACTTTTAACT 240 Cantisense TAATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAATT ATGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAATC AATTATGAAAATTATTCAATAATAAATCAATTATGAAAATTGA 240

EncryptsenseBP(8) gG cG cG gG cC cC cC gC tG cC cT gA gC cC cG gC gG C cG cC gC cC A T cC cC cG gC tG cC cT gA gG cC gC cC T T tG cC A T cG cG gG cG A cT gA tA cC cG gG gG cT gA G cG gA gA tG cG A gA A A gG cG A A gA tA A A gG gG cT T aT A A gA gA tT cC cG gC cC A gA cG cC cG gC tA cT cG gC gG cC gA aT cT A cC gC gA tT cT cT gA A cT T A A cT T C cC cT T gA A gG aC cC cC cC cC gC gC tT cT cT T cT cT T cT cT cT T tT cC cC T T cC gC G cC cC cC A A gC gC tG cC cC gC cC cC gG cC cT cT gG C cT cT gG gC cT T aT cT cG cC cC gC cC gC T tT A A T cT cT T cC cC cT T C cC cG gC T cT gA aA cT cT cT cT T T cT T T tT cT cT gA A A gA A A A gA tA A A gA gA A gA aA A A A A gA gA aA A 656 EncryptannealBP(8) AAAAGGGTAGCCTGATACAGTGTGGGATAGCCAGTGGGAGTGAAAATCCGGAAACCAACCAATCTTAATTCGTTAACGTTTCCTGATGTCAGATGC ATAGCTCTGTCTCCCTCGTTCGCGCGCTAGGGGGTTTCCGCCGCCCGTGGGGGTAGGGTTTTAGGTGGAGCCACCCATCGTCAGGTGTGTTTGCCG GGCGCGATGCCCCCCCGGCGGTCCCTTCTTTCGTTCCTCCTTTTCCCT 240 EncryptsenseBinary(8) 0010110111010010111011101110000101011110111100000001111011010001001010011101111000011110001111001110111011 0100010101111011110000001011100001111011001100010111100011110011011101001011010011111100000100111011010010 0010111100000110110100000000010111010011000000110011001011010011001100000100001100110010001011111100101100 1100110000000001111110110100011110001100001101111011010001010011111101000100101110000010111111001111100001 0000011111111111000000111111110000110011111111001001111011111100000000110010101011101110111011100001000101 1111111111110011111111110011111111111111000111111011101100110011100001011011101110111000110011000100010101 1110111000011110111000101110111111110010100111111111001000011111110010111111110111101110000111100001110001 1100110011110011111111110011101110111111001001111011010001110011110000100011111111111111111100110011111100 1100011111111111000000110011000000110011001100000100001100110000000000110000100000110011001100110000000010 000011 960 Encryptsensehex(8) 
2DD2EEE15EF01ED129DE1E3CEED15EF02E1ECC5E3CDD2D3F04ED22F06D005D30332D33043322FCB33007ED1E30DED14FD12E0BF3 E107FF03FC33FC9EFC032AEEEE117FFCFFCFFFC7EECCE16EEE33115EE1EE2EFF29FF21FCBFDEE1E1C733CFFCEEFC9ED1CF08FFFFCCFC C7FF033033304330030833330083 240

Csense TTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATAC TTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAATA CTTTTAATAAGTTATTATTTAGTTAATACTTTTAACTA 240 Cantisense AATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAATTA TGAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAATCA ATTATGAAAATTATTCAATAATAAATCAATTATGAAAATTGAT 240

EncryptsenseBP(9) cG cG T cT cT cT T tA cT cT gC gC cC cC T T tT cT cT T cC cC gC cC cC cC gC tG cC cC gA gA cT T cG gG gA tA cG cG gA A cC gG cC cC cC gG C cC A T gA cT gA aT A T gA tT A cT gA cT A T A cT A T tA cT cG gC T A gC aC cG cC gC gC tA A A T A A gG cG cT cT gG C A A gC gC cG gC aA cT cG cC gG gC tT A A T A cT gG cC cG cC gG C cG cC gC gC cG gC aT cT A A cC gC gC C cT cT T cT cT T cT cT cT T tT cT cC gC gG cC gC aC cT cT cG cC cT T gC C cC cC gC cC cG gC cC cC cG gC tT cT cT T T cT gC aC cC cC cT cT gC cC gG gC C cC cC gC cC cC gG cC cC cC gC C cT cT gA gA cG gC aT cT cT cT cT T T cT T T tT cT cT T A A gA A A A gA tA A A gA gA A gA aA A A A A gA gA aA A gA 657 352

EncryptannealBP(9) AAGCCCGGCCTTGGGGTCCGGGTGGGTAGGCCCGAACGAACTGAGGGACGTGCCCTTGCTTCCCTGTCTGGCATGTTGAGTTGTTGTTAACCACTT TTATCCAGATTTTGTCAGAGACAGTTATTCTTGTTCCCGCCGCCCGTCGTAGTGCCAGCGTCGGTGATGGATTCCGGCTGGGCCTGATCGGTGGAG GGTCCCCCATTCCCCGGCGGTCCGTTCTTTCGTTCCTCCTTTTCCCTC 240 EncryptsenseBinary(9) 1101110111001111111111111100010011111111000100011110111011001100011111111111110011101110000111101110111000 0101011110111000000000111111001101001000000100110111010000001111100010111011101110001010011110001111000000 1111000010110011110000000111001111110000111100111100001111110011110001001111110100011100001100011010110111 1000010001010000110011110000110011001011011111111100101001001100110001000111010001100011111101111000100001 0111001100111100001111110010111011011110001010011101111000010001110100011011111100110011111000010001100111 1111111100111111111100111111111111110001111111111000010010111000011010111111111101111011111100000110011110 1110000111101101000111101110110100010111111111111100110011110001101011101110111111110001111000100001100111 1011100001111011100010111011101110000110011111111100000000110100011011111111111111111111001100111111001100 0111111111111100001100110000001100110011000001000011001100000000001100001000001100110011001100000000100000 110000 960 Encryptsensehex(9) DDCFFFC4FF11EECC7FFCEE1EEE15EE00FCD204DD03E2EEE29E3C0F0B3C073F0F3C3F3C4FD1C31ADE11433C332DFF293311D18FDE2 1733C3F2EDE29DE11D1BF33E119FFCFFCFFFC7FE12E1AFFDEFC19EE1ED1EED17FFCCF1AEEFF1E219EE1EE2EEE19FF00D1BFFFFCCFCC 7FFC330333043300308333300830 240

Csense TATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACT TTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAATAC TTTTAATAAGTTATTATTTAGTTAATACTTTTAACTAA 240 Cantisense ATAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAATTAT GAAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAATCAA TTATGAAAATTATTCAATAATAAATCAATTATGAAAATTGATT 240

EncryptsenseBP(10) cT T A A cT T C cC A gA gC cC cG gC gA tT cT cT gC cC A gA cT A cC gG tA A cG gG gG cC T cT gA gA C cG A gA cG cC gG cC cG cC gA tT cT cT T T cT T aT cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT cT cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT cT cT cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT cT cT T cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT T aT cT cT cT cT T T cT T T tT cT cT T cT cT T cT cT cT T tT cT cT T T cT gA aA A A A A gA gA aA A gA gA 642 EncryptannealBP(10) CGTTCGCGTCTGATCTCCTGTCCTGAGTAAAGGCCCCATCAGAGAGCTCCGGCGTCGGTCCGCCGCCCGTCCGGCGTCCGGTCCGCCGCCCGTCCG GCGTCCCGGTCCGCCGCCCGTCCGGCGTCCCCGGTCCGCCGCCCGTCCGGCGTCCCCCGGTCCGCCGCCCGTCCGGCGTCCCCGCGGTCCGCCGCC CGTCCGGCGTCCCCGGCGGTCCGCCGCCCGTCCGGCCCTTTTCCCTCC 240 EncryptsenseBinary(10) 1111110000110011111111001001111000110000000111101101000100000111111111110001111000110000111100111110001001 0000111101001000101110110011110000000010011101001100001101111000101110110111100000011111111111110011001111 1100101111111100110001111111111111001111111111001111111111111100011111111111110011001111110010111111111111 0011000111111111111100111111111100111111111111110001111111111111001100111111001011111111111111110011000111 1111111111001111111111001111111111111100011111111111110011001111110010111111111111111111110011000111111111 1111001111111111001111111111111100011111111111110011001111110010111111111111111111111111001100011111111111 1100111111111100111111111111110001111111111111001100111111001011111111111111111111001111110011000111111111 1111001111111111001111111111111100011111111111110011001111110010111111111111111111110011001111110011000111 1111111111001111111111001111111111111100011111111111110011001111000010000011001100110011000000001000001100 000000 960 Encryptsensehex(10) 
FC33FC9E301ED107FF1E30F3E243D22ECF009D30DE2EDE07FFCCFCBFCC7FFCFFCFFFC7FFCCFCBFFCC7FFCFFCFFFC7FFCCFCBFFFCC7F FCFFCFFFC7FFCCFCBFFFFCC7FFCFFCFFFC7FFCCFCBFFFFFCC7FFCFFCFFFC7FFCCFCBFFFFCFCC7FFCFFCFFFC7FFCCFCBFFFFCCFCC7FFCFF CFFFC7FFCCF083333008300 240

Csense ATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTT TAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAATACT TTTAATAAGTTATTATTTAGTTAATACTTTTAACTAAG 240 Cantisense TAAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAATTATG AAATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAATCAAT TATGAAAATTATTCAATAATAAATCAATTATGAAAATTGATTC 240

EncryptsenseBP(11) T cT cC cC gC C A A gC gC cC cG gA T tA cT cG gC A cT gA cT cC cC gC C A cT gG gC cC gC A gA gG tG cT cT gC cC cT gA cT cT cT T tT cT A T gG cG gA aA cG gG T tT cC cC T cT A gA A A cG gG tT cT cG gG gA cT gA aT A cT gA T tA cT cG gC A cT gC cC A cT gC C A cT gG gC cT T aT cT cT cT gC gC tT cT cT T cT cT gG cC cT cT gG C cT cT T T cC gC aA A cC cC cT T T tT cT cT gC cC cC gC cG cC cT T tT cT cT T T cT T aT cT cT cT cT cT T T tT cT cT T cT cT T cC cC cT T tT cT cG gC T cT gC aC cC cC cC cC gA A gG gC C cC cG gC cT cT gC cC cG cC T tT cT cT gC gC cC gC aC cC cG cC cT T gG cC gC gC tA A cC gC cG cC T cT cG cC T tT cT cT T T cT T aT cT cT cT cT T T aT cT T T tT 665 EncryptannealBP(11) GCGGTCTTTTGACGGCATTCCCGGTCTCATGTTCAACCTGCCCCCGTCTGAACCAAGTGGGCTCTTAATCAACCCTTCCGGCATTCTGTCTCTCATCG TCCCTTTCCGCCAGCCACCCGGGTCTGGCGGTCCTGGTAGCGTCCGGCGTCCCCCGGTCCGCCGGGCGTCATGCTGGGGGCTATCGATCCTGAGGT CCTTGTGGAGCGAGTTGTGTAGGCAGGTCCGGCGTCCCCGGTCGGT 240 EncryptsenseBinary(11) 1100111111101110000110010011001100010001111011010000110001001111110100010011111100001111111011100001100100 1111110010000111100001001100000010010111111111000111101111000011111111111111000111111100111100001011010000 1000110100101100011111101110110011110011000000110011110100100111111111010010000011110000101100111111000011 0001001111110100010011111100011110001111110001100100111111001000011111110010111111111111110001000101111111 1111110011111111001011101111111100101001111111111100110011100001100000111110111011111100110001111111111100 0111101110000111011110111111000111111111111100110011111100101111111111111111111111110011000111111111111100 1111111111001110111011111100011111111101000111001111000110101110111011101110000000110010000110011110110100 0111111111000111101101111011000111111111110001000111100001101011101101111011111100001011100001000101000011 1110000111011110110011111101111011000111111111111100110011111100101111111111111111111100110010111111110011 000111 960 Encryptsensehex(11) 
CFEE193311ED0C4FD13F0FEE193F21E13025FF1EF0FFFC7F3C2D08D2C7EECF3033D27FD20F0B3F0C4FD13F1E3F193F21FCBFFF117FF CFF2EFF29FFCCE183EEFCC7FF1EE1DEFC7FFCCFCBFFFFFCC7FFCFFCEEFC7FD1CF1AEEEE03219ED1FF1EDEC7FF11E1AEDEFC2E1143E1 DECFDEC7FFCCFCBFFFFCCBFCC7 240

Csense TTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTT AAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAAGTTATTATTTAGTTAATACTTTTATAAGTTATTATTTAGTTAATACTT TTAATAAGTTATTATTTAGTTAATACTTTTAACTAAGT 240 Cantisense AAATCAATTAATTCAATAATAAATCAATTATATTCAATAATAAATCAATTATGATTCAATAATAAATCAATTATGAATTCAATAATAAATCAATTATGA AATTCAATAATAAATCAATTATGAAAATTCAATAATAAATCAATTATGAAAAATTCAATAATAAATCAATTATGAAAATATTCAATAATAAATCAATT ATGAAAATTATTCAATAATAAATCAATTATGAAAATTGATTCA 240

EncryptsenseBP(12) cC cC cT T tA A A T gG cC A gA T tA cG cG T cT cT T A cT A T tG cG cC gC gC cC gC cC gA gA C cG cT T cC cC T A A A gC C A A gA gA cT T aA A gA gA tA cT cT gA A cT gA cT A cT T tT cT cT T T cT T aT cT cT T T tT cT cT gA A cT T cC cC cT T C cC cT T gG cC T aT A A cG gC T tT cG cC T cT cT T cG cC cG gC tT cT cC gC gC cC gC aC cC cC cC cC gC gC C cC A gA cT cT gC cC cG cC T tT cT cT T T cT T aT cT cT cT cT cT T T tT cT cT T cT cT T cT A A gC C cT cT gG gC cT T aC cC cC cC cT T cG gC gG C cC cC T cT cG gC cC cC cG gC tG cC cC gC T cT gG aC cC cC cC cC gG gC cT T T tT A A gC cC cT T cT cT A gA tT cT cT T T cT T aT cT cT cT cT T T aT cT T T tT cT 648 EncryptannealBP(12) GGCGGTTGAGTCGGAAGCCGTCTGAAGTTGTGCCCACGGGGTTTTCTTCCCGCTCCGCCCTCCCTCGTCCGGCGTCCGGTCCCTCGGGCGCGCGAG GTTTATGTAGGCCGAGATTCGTTGTGGGGGTTCGTCCCTGAGGTCCGGCGTCCCCCGGTCCGCCGCTTTCCCATCGGGGGCGATACGGGCATGGAT AGGTGCAGGGGGATCGGTTTTGCGCCTCTCCGGCGTCCCCGGTCGGTC 240 EncryptsenseBinary(12) 1110111011111100010000110011110000101110001100001100010011011101110011111111110000111111001111000101110111 1000010001111000011110000000001001110111111100111011101100001100110011000110010011001100000000111111001000 0011000000000100111111110000001111110000111100111111110001111111111111001100111111001011111111111100110001 1111111111000000111111110011101110111111001001111011111100001011101100101100110011110100011100011111011110 1100111111111100110111101101000101111111111000010001111000011010111011101110111000010001100111100011000011 354

1111110001111011011110110001111111111111001100111111001011111111111111111111111100110001111111111111001111 1111110011110011001100011001111111110010000111111100101011101110111011111100110100010010100111101110110011 1111010001111011101101000101011110111000011100111100101010111011101110111000100001111111001100011100110011 0001111011111100111111110011000001111111111111001100111111001011111111111111111111001100101111111100110001 111111 960 Encryptsensehex(12) EEFC433C2E30C4DDCFFC3F3C5DE11E1E009DFCEEC333193300FC83004FF03F0F3FC7FFCCFCBFFCC7FF03FCEEFC9EFC2ECB33D1C7DE CFFCDED17FE11E1AEEEE119E30FF1EDEC7FFCCFCBFFFFFCC7FFCFFCF3319FF21FCAEEEFCD129EECFD1EED15EE1CF2AEEEE21FCC7331 EFCFF307FFCCFCBFFFFCCBFCC7F 240

Section 10 Create checksum

checksum = 35975141

'------Sender copy of hash------Sender Hash = GGTCCGAAGCTCGGTGCGCGGCATACCCTAGTGGTGAATCAAGCTTCGTGCTAGAGTCGGGGCGTGGCAACGGGGTTCGGTCATAGGCAGAAGG CCAATAAGCCAAATAACCGTAAAATGGCGTGCCCGTACAAGCAATCCGTTCGCTTGGGTTCTCCGGCGTCCGGCTGAACGTGTTGCTTTCCCAAAA GGTTGCCGCCGCAATCGGTTCCCCTCGTCCAACTCTAGCCAATCTCTTAGTGAGGTAGATAGCTTGTCAATAAATCATGCTGCCCCTCCCGTGTAAA ATCTCCTTGGGAAAACCGCTTCGGAGGCGTCGGTCCGTCTGAGGTAGAATTCCAACGCTTGAAGTTTAAAACGTACCTATCCCAAAATGCCGCTGCT AATGCGTTAAGCCGCCTGCTGGAAAAAAATGTTGAGCCCGTCGAGGCTCTTCGGGGCGAGGGTCCGAACTGGACGGATGGATGCATAGCCTCACA GATGTTGTCTTCTATAAGTAGATGTTGTGCTTCAAAAGGGTAGCCTGATACAGTGTGGGATAGCCAGTGGGAGTGAAAATCCGGAAACCAACCAAT C

Plain Text = the 123 of my fields are very large please require all personnel to take their equipment with them for the work to be performed in 365777 small increments it will be good to get practice on these tasks /

Message Builder 4 - for DNA Hash Code System ended at: 12/11/2012 3:03:14 PM
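Two relationships can be read directly off the listing above: each Cantisense string is the base-wise complement of its Csense string (A with T, C with G, no reversal), and each Encryptsensehex string is the hexadecimal rendering of the corresponding EncryptsenseBinary string (960 bits giving 240 hex digits). The following is a minimal sketch that checks both on short prefixes copied from iteration 8. The function names `complement` and `bits_to_hex` are hypothetical and are not taken from the Message Builder program.

```python
# Illustrative check only; helper names are assumptions, not the author's code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-wise complement (A<->T, C<->G) without reversing.

    Spaces left over from line wrapping in the listing are dropped first.
    """
    return strand.replace(" ", "").translate(COMPLEMENT)


def bits_to_hex(bits: str) -> str:
    """Render a bit string as uppercase hex, four bits per digit."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    return "".join(
        format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)
    )


# Prefixes copied from the iteration 8 entries of the listing.
csense_prefix = "ATTATTTAGTTAATTAAG"
cantisense_prefix = "TAATAAATCAATTAATTC"
binary_prefix = "00101101110100101110111011100001"
hex_prefix = "2DD2EEE1"

print(complement(csense_prefix) == cantisense_prefix)
print(bits_to_hex(binary_prefix) == hex_prefix)
```

Both helpers ignore embedded spaces, so a full wrapped string from the listing can be pasted in unchanged.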


APPENDIX C - SAMPLE MUTANT ENCRYPTIONS.

Table C-1. Mutant Encryptions

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
1 | AAAAAAGGATCGCCCGTCAGCCCTCTGGCGTTTCAACCCCTCAAGTACAGGTATAGATGCTGTG | TTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTA
2 | AAAAAAGTAGGTGTGTCGAGCCCCTCGTGCCTCCAATGCCTGAATCAGAGGCAGATAGCCTCCT | AGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTA
3 | AAAAAAGTAGGTCGGGCTATTCCCGCGGCGTTTCAACCCCTCAATGATAGCCAGAGAGCTTCCG | ATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTT
4 | AAAAAATGATGGTCCGCCAGTGCTCCGGCTCTCCAATGCCTGAATCAGATGGAGAGATTCTGGC | TAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTA
5 | AAAAAATCAGTGTCGGTCAGCGTTTGGGCCCTCGAACGTCGTAATGAGACCTATACAGCCTGTC | TTTATAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTT
6 | AAAAAATGAGTCTTGTCGAGCCCCTCGTGCCCTCAATCGCGTAAGTAGATGTATAGATTCCCTC | GTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAA


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
7 | AAAAAACGATCGTCGGCTAGCTCTCGCGTGCTCTAACTCCGTAAGGAGATCTATAGATTGCTTC | TATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTA
8 | AAAAAAGGATCGTCGGCTAGCTCTTGTCTCGTTCAATGTCGTAATCAGAGCCAGATAGTGCCCG | ATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATT
9 | AAAAAATCAGTGTCGGTCAGTGCCCTGGCCCCTTAAGCCGTGAACGATAGGTAGACAGCGTCCG | AGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTAA
10 | AAAAAACGATGGCTGGCGATCTCTCCGTTCCCGTAACTCCTGAAGGATAGCTATAGATTCCCTC | TTATAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTA
11 | AAAAAAGGAGGGCGGTCGAGCCCCTCGTGCCCCGAACCCGGGAACGAGATTTATAGAGTCCTTC | AAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTAT
12 | AAAAAAGGACGTCGGGCTATTCCCGCGTCTCTCTAATCCGCGAATTAGATCTAGAGACTCCCCG | TATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAG
13 | AAAAAATTAGGTTTTGTTACTCGCGCGTTCGTTTAATCCGTCAATGATAGCCAGATATCTTCCC | TTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAAT


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
14 | AAAAAACGATGGCTGGCGAGCGTTTGGGCCCTCGAATGGTGGAAGTAGATTTATATACTCCCTG | TAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATAC
15 | AAAAAAGTACGGCTGTTTATGCCTCGCGTGCTCTAACTCCTGAACGATAGGTAGACAGCGTGCT | TTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTA
16 | AAAAAATTAGGTTTTGTTACTCCCCTGGCCCCTTAAGCCGGTAAGGAGATCTATAGATCCCGGC | TTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTAT
17 | AAAAAACGATGGCTGGCGAGTGGGTCTGTGCTTCAATGCGTCAATGATAGCCAGATAGCGGCTG | AGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTT
18 | AAAAAAGGACGGTCTCTTAGTGCTTGCGTGCCCGAACCCGGGAACGATAGGCAGATAGCCTCCT | TATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATA
19 | AAAAAAGTAGGGCCCGTCAGCCCTCTGGCCGCGTAATCGCGGAAGGATATGGAGAGATTCTGGC | ATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATAC
20 | AAAAAATTAGGTTTTGTTACTCGTCGCGTGCTCTAACTCCTTAATCAGAGCCAGATAGTGCCTG | TAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTTA


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
21 | AAAAAAGTAGTGTTTCTCACTCGTTGGGTGTTTCAATCGCGTAAGTAGAGGCAGATAGCCTCCT | ATAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATT
22 | AAAAAATCATTCTTTGTCAGTGTTTGTCTCGTTCAATGTCGGAACGATAGGTAGACAGCCCGGC | TTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTT
23 | AAAAAAGGAGGGCGGGCCAGTGCTCCGGCTCTTCAATCGCGTAAGTAGATCCACAGAGTGTCTG | AAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTT
24 | AAAAAATGAGTCTTTTGTATTCGTTCTCTCCCCGAACCCGGGAACGATATGGAGAGATTCTGGC | GTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAA
25 | AAAAAAGGAGGTTTGTGTAGCGTTTGGGCCCTCGAACCGGCGAAGGAGAGGGAGATATCTTCCC | GTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTA
26 | AAAAAAGTAGGTGTGGCCAGTGCTCCGGCTCTCTAAGCCGGGAAGGACAGGCATACAGCCTGTC | AGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTA
27 | AAAAAAGGATTCTTTGTCAGTGTTTGGTCTCTCTAATCCGCGAATGATAGCCAGAGAGCTTCCG | ATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTT


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
28 | AAAAAAGGATCGTCGGCTAGCTCTCCTTGCCCTTAATCGTGGAAGTACAGGTATAGATGCTGCC | TTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTT
29 | AAAAAATGAGTCTCTCTTAGTGCTTGCGTGGGTTAATGCCGTAAGGATAGCCAGAGAGCTTCCC | GTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTT
30 | AAAAAAGTACGGCTGTTTATGCCCCTGGCCCCTTAAGCCCTTAAGTAGAGCTACAGAGCGGCTG | ATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAA
31 | AAAAAAGGACGGCCCGTCAGCCCTCTGGTGGGTTAATGCCGTAAGTATACCTAGATAGTGGCTG | TATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGT
32 | AAAAAATCAGGTCGGGCTATTCCCGCGTTGGGTTAATGCCGTAAGTAGATTTATAGAGTCCTTC | TTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACT
33 | AAAAAAGTACGGCTGTTTATGCCCTGTCTCGTTCAATGTCGTAATTAGATCTAGAGACTCCGTC | TTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTTTTT
34 | AAAAAAGGAGGTTTGTGTATCTCTCCGTTCCCGTAACGTCGTAATGAGACCTAGATAGTGTCCC | AGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACT


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
35 | AAAAAAGTAGTGTTTCTTATGCCCTCTGTCGGTTAACTCCTGAAGGATAGCCATACAGCCTGTC | GTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTT
36 | AAAAAATCAGTTGTGTTTAGTCGGTCGTCTCTCTAATCCGCGAAGTACAGGTATAGATGCTGCC | TTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTA
37 | AAAAAATCAGTGTCGGTCAGTCGCGCGTTCGTTTAATCGCTTAAGTAGAGCTACAGATTGCTTC | TAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATT
38 | AAAAAAGGATCGTTTTGTATTCGTTCTCTCGTCCAATGCCTGAATCAGATCCACAGAGTGTCTG | TTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAAT
39 | AAAAAACGAGTTGTGTTTAGTCGGTCTGCGTTTCAACCCCTCAATTATACCTAGATAGTGGCTG | TATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTTAATACTT
40 | AAAAAATTAGGTTTTGTTACTGGGTCTGTGCTTCAATGGTGGAAGTAGATTTAGATAGTGTCCC | ATTTATAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGT
41 | AAAAAAGTAGTTGTGTTTAGTCGGTCTTGCCCTTAATCGTGGAATCAGAGCCAGATAGTGTGCT | ATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGT


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
42 | AAAAAATCAGTGTCGGTCAGTGGGTCTGTGCTTCAATCGGCGAAGGAGAGGGAGAGATGCTGTC | TTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATA
43 | AAAAAACGATTCTTTGTCAGTGTTTCTTGCCCTTAATCGTGGAATTAGATCTAGAGACTCCGTG | TATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTAT
44 | AAAAAAGTAGTGTTTCTTATCTCTCCGTTCCCGTAATGGTGGAAGTAGATTTATAGATGCTGTC | TTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAG
45 | AAAAAATGATGGTCCGTTATGCCCTCTGTCGGTTAACGTCGTAATGAGACCTATATACTCCCTG | TAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGT
46 | AAAAAAGGAGGGCGGGTCACTCGTTGGGTGTTTCAACTCCTGAAGGATAGCCAGATAGTGTCCC | AAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAGTTATTATTTAGTT
47 | AAAAAAGTAGGTGTTTGTATTCGTTCTCTCGCGTAATCGCGGAAGGATACCTAGATAGTGGCTG | AGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATT


Table C-1 (continued)

code | 64 base pair hash | cryptographic key (only 1st 64 bases are significant)
48 | AAAAAATGATGGTCCGTCACTCGTTGGGTGTTTTAAGCCGGGAAGGACAGGTATAGATTCCCTC | TAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTT
49 | AAAAAAGGATCGTCTCTTAGTGCTTGCGTCGCGTAATCGCGGAAGGAGATTTATAGAGTCCTTC | TTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTATTATTTAGTTAATACTTTTAAG
50 | AAAAAAGGAGGTTTGTGTATGCCCTCTGTCGGTTAAGCCGGGAAGGACAGCCACAGAGTGTCTG | TATAAGTTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAGTTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAAGTTATTATTTAGTTAATTAAGTTATTATTTAGTTAATATAAGTTATTATTTAGTTAATACTAAGTTATTATTTAGTTAATACTTAAGTTATTATTTAGTTAATACTTTAAGTTAT
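The header of Table C-1 states that only the first 64 bases of each cryptographic key are significant, and each hash is 64 bases long. The following is a minimal sketch of how a reader might extract the significant part of a key and count base-by-base differences between two mutant hashes. The helper names `clean`, `significant_key`, and `hamming` are hypothetical, and the difference count is an illustrative comparison, not an analysis taken from the dissertation. The sample strings are the hashes for codes 1 and 2 and the leading part of the key for code 1, with wrap spaces left in.

```python
# Illustrative sketch; helper names are assumptions, not the author's code.
SIGNIFICANT_BASES = 64  # per the table header: only the first 64 key bases count


def clean(seq: str) -> str:
    """Remove whitespace left over from table cell wrapping."""
    return "".join(seq.split()).upper()


def significant_key(key: str) -> str:
    """Return the first 64 bases of a key from Table C-1."""
    key = clean(key)
    if len(key) < SIGNIFICANT_BASES:
        raise ValueError("key is shorter than 64 bases")
    return key[:SIGNIFICANT_BASES]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    a, b = clean(a), clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hashes for codes 1 and 2, copied from Table C-1.
hash_1 = "AAAAAAGGATCGC CCGTCAGCCCTCTG GCGTTTCAACCCCT CAAGTACAGGTAT AGATGCTGTG"
hash_2 = "AAAAAAGTAGGTG TGTCGAGCCCCTC GTGCCTCCAATGCC TGAATCAGAGGCA GATAGCCTCCT"

# Leading part of the key for code 1 (first two wrapped fragments).
key_1_start = (
    "TTATTATTTAGTAAGTTATTATTTAGTTAAGTTATTATTTAG "
    "TTTAAGTTATTATTTAGTTATAAGTTATTATTTAGTTAATAA"
)

print(len(clean(hash_1)), len(clean(hash_2)))
print(significant_key(key_1_start))
print(hamming(hash_1, hash_2))
```

Stripping whitespace before comparing means the sketch works on the entries whether or not the wrapped fragments have been joined.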
