University of Warwick institutional repository: http://go.warwick.ac.uk/wrap

A Thesis Submitted for the Degree of PhD at the University of Warwick

http://go.warwick.ac.uk/wrap/61310

This thesis is made available online and is protected by original copyright. Please scroll down to view the document itself. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.

Sequence Distance Embeddings
by Graham Cormode

Thesis Submitted to the University of Warwick for the degree of Doctor of Philosophy

Computer Science
January 2003

Contents

List of Figures
Acknowledgments
Declarations
Abstract
Abbreviations

Chapter 1 Starting
  1.1 Sequence Distances
    1.1.1 Metrics
    1.1.2 Editing Distances
    1.1.3 Embeddings
  1.2 Sets and Vectors
    1.2.1 Set Difference and Set Union
    1.2.2 Symmetric Difference
    1.2.3 Intersection Size
    1.2.4 Vector Norms
  1.3 Permutations
    1.3.1 Reversal Distance
    1.3.2 Transposition Distance
    1.3.3 Swap Distance
    1.3.4 Permutation Edit Distance
    1.3.5 Reversals, Indels, Transpositions, Edits (RITE)
  1.4 Strings
    1.4.1 Hamming Distance
    1.4.2 Edit Distance
    1.4.3 Block Edit Distances
  1.5 Sequence Distance Problems
    1.5.1 Efficient Computation and Communication
    1.5.2 Approximate Pattern Matching
    1.5.3 Geometric Problems
    1.5.4 Approximate Neighbors
    1.5.5 Clustering for k-centers
  1.6 The Shape of Things to Come

Chapter 2 Sketching and Streaming
  2.1 Approximations and Estimates
    2.1.1 Sketch Model
    2.1.2 Streaming
    2.1.3 Equality Testing
  2.2 Vector Distances
    2.2.1 Johnson-Lindenstrauss lemma
    2.2.2 Frequency Moments
    2.2.3 L1 Streaming Algorithm
    2.2.4 Sketches using Stable Distributions
    2.2.5 Summary of Vector Lp Distance Algorithms
  2.3 Set Spaces and Vector Distances
    2.3.1 Symmetric Difference and Hamming Space
    2.3.2 Set Union and Distinct Elements
    2.3.3 Set Intersection Size
    2.3.4 Approximating Set Measures
  2.4 Geometric Problems
    2.4.1 Locality Sensitive Hash Functions
    2.4.2 Approximate Furthest Neighbors for Euclidean Distance
    2.4.3 Clustering for k-centers
  2.5 Discussion

Chapter 3 Searching Sequences
  3.1 Introduction
    3.1.1 Computational Biology Background
    3.1.2 Results
  3.2 Embeddings of Permutation Distances
    3.2.1 Swap Distance
    3.2.2 Reversal Distance
    3.2.3 Transposition Distance
    3.2.4 Permutation Edit Distance
    3.2.5 Hardness of Estimating Permutation Distances
    3.2.6 Extensions
  3.3 Applications of the Embeddings
    3.3.1 Sketching for Permutation Distances
    3.3.2 Approximating Pairwise Distances
    3.3.3 Approximate Nearest Neighbors and Clustering
    3.3.4 Approximate Pattern Matching with Permutations
  3.4 Discussion

Chapter 4 Strings and Substrings
  4.1 Introduction
  4.2 Embedding String Edit Distance with Moves into L1 Space
    4.2.1 Edit Sensitive Parsing
    4.2.2 Parsing of Different Metablocks
    4.2.3 Constructing ET(a)
  4.3 Properties of ESP
    4.3.1 Upper Bound Proof
    4.3.2 Lower Bound Proof
  4.4 Embedding for other block edit distances
    4.4.1 Compression Distance
    4.4.2 Unconstrained Deletes
    4.4.3 LZ Distance
    4.4.4 Q-gram distance
  4.5 Solving the Approximate Pattern Matching Problem for String Edit Distance with Moves
    4.5.1 Using the Pruning Lemma
    4.5.2 ESP subtrees
    4.5.3 Approximate Pattern Matching Algorithm
  4.6 Applications to Geometric Problems
    4.6.1 Approximate Nearest and Furthest Neighbors
    4.6.2 String Outliers
    4.6.3 Sketches in the Streaming model
    4.6.4 Approximate p-centers problem
    4.6.5 Dynamic Indexing
  4.7 Discussion

Chapter 5 Stables, Subtables and Streams
  5.1 Introduction
    5.1.1 Data Stream Comparison
    5.1.2 Tabular Data Comparison
  5.2 Sketch Computation
    5.2.1 Implementing Sketching Using Stable Distributions
    5.2.2 Median of Stable Distributions
    5.2.3 Faster Sketch Computation
    5.2.4 Implementation Issues
  5.3 Stream Based Experiments
  5.4 Experimental Results for Clustering
    5.4.1 Accuracy Measures
    5.4.2 Assessing Quality and Efficiency of Sketching
    5.4.3 Clustering Using Sketches
    5.4.4 Clustering Using Various Lp Norms
  5.5 Discussion

Chapter 6 Sending and Swapping
  6.1 Introduction
    6.1.1 Prior Work
    6.1.2 Results
  6.2 Bounds on communication
  6.3 Near Optimal Document Exchange
    6.3.1 Single Round Protocol
    6.3.2 Application to distances of interest
    6.3.3 Computational Cost
  6.4 Computationally Efficient Protocols for String Distances
    6.4.1 Hamming distance
    6.4.2 Edit Distance
    6.4.3 Tichy’s Distance
    6.4.4 LZ Distance
    6.4.5 Compression Distances and Edit Distance with Moves
    6.4.6 Compression Distance with Unconstrained Deletes
  6.5 Computationally Efficient Protocols for Permutation Distances
  6.6 Discussion

Chapter 7 Stopping
  7.1 Discussion
    7.1.1 Nature of Embeddings
    7.1.2 Permutations and Strings
  7.2 Extensions
    7.2.1 Trees
    7.2.2 Graphs
  7.3 Further Work

Appendix A Supplemental Section on Sequence Similarity
  A.1 Combined Permutation Distances