Coding Theory of Permutations/Multipermutations and in the LEAN Theorem Prover
Total Page:16
File Type:pdf, Size:1020Kb
Coding Theory of Permutations/Multipermutations and in the LEAN Theorem Prover May 2018 Chiba University Graduate School of Science Division of Department of Mathematics and Informatics Justin Kong for Lukela i I.ABSTRACT This thesis investigates problems in Coding theory, a fundamental field of study for the reliable transmission or storage of data. Contributions are in two general areas, permutation/multipermutation codes and coding theory formalization. Linear Codes are well studied, but less is known about nonlinear codes. Permutation and multipermutation codes are types of nonlinear codes. Formal- ization involves the precise statement of mathematical theorems in a specified language in such a way that the veracity of these theorems may be verified algorithmically. Although coding theory is a field of study well-suited for formalization, work in coding theory formalization has been relatively limited. In this thesis we advance the study of permutation and multipermutation codes in three ways. First, we extend an LP (linear programming) decoding method originally designed for permutation codes in the Euclidean metric to permutation codes in the Kendall tau metric. Second, we provide new methods for calculating Ulam permutation sphere sizes. The results are used to present new bounds on the maximum possible size of permutation codes in the Ulam metric, including the fundamental theorem that nontrivial perfect Ulam permutation codes do not exist. Third, new methods for calculating Ulam multipermutation sphere sizes for certain parameters are provided as well as resulting bounds on the maximum possible code size. In this thesis we advance the study of formalization in coding theory in two ways. Contri- butions are made using the Lean theorem proof assistant, a software designed specifically for formalization. The first contribution is a set of files that formalize relevant basic mathematical definitions and theorems as well as the famous Repetition codes and Hamming (7,4) code. The second contribution is a set of files that formalize Levenshtein codes and related definitions and theorems. Levenshtein codes are a type of nonlinear deletion and/or insertion correcting code. ii II.ACKNOWLEDGEMENTS Firstly, I would like to thank my Lord and Savior Jesus Christ, who has lead me throughout my journey of education and without whom I cannot so much as breathe. I am grateful for the love and support of my family, especially my wife Keiko who has been a constant encouragement. I am indebted greatly to my advisor, Dr. Manabu Hagiwara, who has patiently guided me step by step throughout my post-graduate studies. I am constantly amazed by his mathematical prowess and insight, and simultaneous wisdom and kindness. iii CONTENTS i Abstract i ii Acknowledgements ii iii List of Figures vi iv List of Tables vii 1 Introduction 1 1-A Contributions . .1 1-B Organization of Thesis . .3 2 Preliminaries and Literature Overview 4 2-A Introduction . .4 2-B The Idea of Coding . .4 2-C Minimum Distance in Coding Theory . .6 2-D Permutation/Multipermutation Codes Literature Review . .7 2-E Formalization in Coding Theory Literature Review . .9 2-F Conclusion . 10 3 Kendall tau LP-decodable Permutation Codes 11 3-A Introduction . 11 3-B Extending LP decoding Methods . 12 3-C Minimum Weight . 18 3-D Weight Enumerator Polynomials . 20 3-E Parabolic Subgroups and Reflection Subgroups . 27 3-F Conclusion . 35 4 Ulam Permutation Codes 36 4-A Introduction . 36 iv 4-B Preliminaries and Notation . 37 4-C Permutation Ulam Sphere Size . 44 4-D Nonexistence of Nontrivial Perfect Ulam Permutation Codes . 48 4-E Conclusion . 54 5 Ulam Multipermutation Codes 55 5-A Introduction . 55 5-B Multipermutation Ulam Sphere Size and Duplication Sets . 55 5-C Min/Max Spheres and Code Size Bounds . 68 5-C1 Non-Binary Case . 68 5-C2 Binary Case – Cut Location Maximizing Sphere Size . 72 5-D binary case – number of cuts maximizing sphere size . 76 5-E Conclusion . 84 6 Formalization of Coding Theory in Lean 85 6-A Introduction . 85 6-B About Lean . 87 6-C File access and overview . 88 6-D Error correcting systems definition and examples . 94 6-E Formalization of Levenshtein distance . 98 6-F Formalization of Insertion/Deletion spheres . 100 6-G Conclusion . 106 7 Conclusion 106 Appendix A: Consolidation for Compact Constraints 108 Appendix B 119 Appendix C 121 Appendix D 122 v Appendix E 124 Appendix F 125 Appendix G 128 Appendix H 132 References 135 vi III.LIST OF FIGURES 1 Weak Bruhat ordering for S4 .............................. 33 2 Translocation illustration . 42 3 Young diagram and SYT . 45 4 Constructing q5 from q4 (when n = 30)........................ 78 5 Constructing q4 from q3 (when n = 34)........................ 79 6 File dependencies . 88 7 Consolidation . 116 Pc−1 2 2 8 Diagram of i=1 qc−1(i) − qc (i) .......................... 130 2 2 9 Diagram of qc−1(i) − qc (i) ............................... 130 vii IV.LIST OF TABLES I Kendall tau, Euclidean distances . 14 II Euclidean, Kendall tau distances from (1; 0; 2; 3) and [1; 0; 2; 3] ........... 15 III Kendall tau, Euclidean weights . 16 IV Kendall tau, Euclidean weights of C5 ......................... 19 V Permutation Ulam sphere sizes . 37 VI Theoretical limit on maximum Ulam permutation code size . 37 VII Non-feasibility of perfect t-error correcting codes . 52 VIII Multipermutation Ulam Sphere Sizes . 55 IX Theoretical limits on maximum Ulam multipermutation code size . 56 X Maximum sphere size verses bounded value . 83 XI Positivity, Homogeneous Constraints . 113 XII Row-sum (Column-sum) and Weak Row-sum (Column-sum) Constraints . 113 XIII Doubly Stochastic Constraint, Quasi-Homogeneous Constraint, Merged Constraint . 113 1 1. INTRODUCTION In the information age that we live in, data is communicated and stored through a plethora of digital devices. Without the advent of coding theory, reliable communication and data storage would not be possible. Error-correcting codes, explained in chapter 2, are utilized constantly to combat the inherent noise of communication channels. The study of these codes is called coding theory. Codes have also been improving since their advent in the late 1940’s and early 1950’s. With their improvement communication becomes faster and data storage denser. In fact, current codes can approach known limits to what is possible for certain communication schemes. However, the quest to improve codes and to expand their underlying theory is not over by any means. A. Contributions The main contributions of this thesis can be divided into two large categories: 1) contributions to coding theory in the area of permutation/multipermutation codes and 2) contributions to coding theory in the area of formalization. We begin by providing some context for each contribution before stating the actual results of each category. The first category of contribution is to the field of permutation and multipermuation codes. Permutation and multipermutation codes are types of nonlinear codes. Linear codes are well studied and largely well-understood because of their close relation to linear algebra. On the other hand, there remains much mystery in the field of nonlinear codes. Depending on the physical scenario, nonlinear codes may demonstrate superior performance over linear codes. Hence it is worthwhile to devote some energy to research in this area. One category of nonlinear codes are permutation and multipermutation codes. Certain prop- erties, discussed in later chapters, make permutation and multipermutation codes attractive can- didates for use in applications such as flash memory devices. There are several questions that must be answered before this is possible, and it is also necessary to generally expand the theory of permutation and multipermutation codes. 2 Our main contributions to permutation/multipermutation codes are threefold. We state them here, although some explanation of the terminology of these statements is found in subsequent chapters. First, we extend a decoding method to a new class of permutation codes, calling them Kendall tau LP-decodable permutation codes. Second, we provide new ways of calculating Ulam permutation sphere sizes, and use this to prove the fundamental bound on maximal code size that perfect Ulam permutation codes do not exist. Third, we provide new ways of calculating Ulam multipermutation sphere sizes, and use this to prove new bounds on the maximal code size of Ulam multipermutation codes. These results were partially published in [37, 51, 52]. The second category of contribution is to the field of formalization in coding theory. Math- ematical formalization involves the precise statement of mathematical theorems in a specified language in such a way that the veracity of these theorems may be verified algorithmically. When an error-correcting code fails to perform its purported function, errors in communication may result. Moreover, the study of coding theory is often complex and proofs of the properties of a particular error-correcting code may be difficult to understand or verify for non-specialists in coding theory. Even among coding theorists, the vast quantity of coding schemes can sometimes lead to miscommunication. In the future, formalization may become a valuable tool for verifying properties of codes. It may also be a unifying force, as formalization removes the ambiguity of language. Current work in the area of the formalization of coding theory is limited. Libraries of coding theory are necessary to aid future work, since any definitions or lemmas that are formalized in a proof- assistant can be subsequently applied in future formalization. As our main contribution to formalization in coding theory, we provide, to the best of our knowledge, the first coding theory library for the Lean theorem proof assistant. The library is called “Cotoleta” (COding Theory Over the LEan Theorem proof Assistant). The Lean theorem proof assistant is a free software released by Microsoft Research in cooperation with Carnegie Mellon University. Our library includes a couple of key components.