On the Variance of Average Distance of Subsets in the Hamming Space

Fang-Wei Fu^{a,1}, San Ling^{b}, Chaoping Xing^{b}

^a Temasek Laboratories, National University of Singapore, 5 Sports Drive 2, Singapore 117508, Singapore
^b Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Singapore

Discrete Applied Mathematics 145 (2005) 465–478

Received 13 September 2002; received in revised form 23 August 2004; accepted 31 August 2004

Citation: Fu, F. W., Ling, S., & Xing, C. (2004). On the variance of average distance of subsets in the Hamming space. Discrete Applied Mathematics, 145(3), 465–478. https://hdl.handle.net/10356/96425 https://doi.org/10.1016/j.dam.2004.08.004

This document is downloaded from DR-NTU (https://dr.ntu.edu.sg), Nanyang Technological University, Singapore. © 2004 Elsevier B.V. This is the author-created version of a work that has been peer reviewed and accepted for publication by Discrete Applied Mathematics, Elsevier B.V. It incorporates referees' comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The published version is available at: http://dx.doi.org/10.1016/j.dam.2004.08.004.

Abstract

Let $V$ be a finite set with $q$ distinct elements. For a subset $C$ of $V^n$, denote by $\operatorname{var}(C)$ the variance of the average Hamming distance of $C$. Let $T(n, M; q)$ and $R(n, M; q)$ denote the minimum and maximum variance of the average Hamming distance of subsets of $V^n$ with cardinality $M$, respectively. In this paper, we study $T(n, M; q)$ and $R(n, M; q)$ for general $q$. Using methods from coding theory, we derive upper and lower bounds on $\operatorname{var}(C)$, which generalize and unify the bounds for the case $q = 2$.
These bounds enable us to determine the exact values of $T(n, M; q)$ and $R(n, M; q)$ in several cases.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Hamming space; Subsets; Average distance; Variance; Codes; Distance distribution

^1 On leave from the Department of Mathematics, Nankai University, Tianjin 300071, P. R. China.
E-mail addresses: [email protected] (F.-W. Fu), [email protected] (S. Ling), [email protected] (C. Xing).

1. Introduction

Let $V = \{v_1, v_2, \ldots, v_q\}$ be a finite set with $q$ distinct elements, where $q$ is a positive integer. Let $V^n$ be the set of ordered $n$-tuples over $V$. The Hamming distance between two vectors $a$ and $b$ is the number of components in which they differ, and is denoted by $d_H(a, b)$. Let $C$ be a subset of $V^n$ with size $|C| = M$. The average Hamming distance of $C$ is defined by

\[ \bar{d}(C) = \frac{1}{M^2} \sum_{a \in C} \sum_{b \in C} d_H(a, b). \tag{1.1} \]

The variance of the average distance of $C$ is defined by

\[ \operatorname{var}(C) = \frac{1}{M^2} \sum_{a \in C} \sum_{b \in C} \left[ d_H(a, b) - \bar{d}(C) \right]^2. \tag{1.2} \]

Expanding the square in (1.2) and applying (1.1), it is easy to check that

\[ \operatorname{var}(C) = \frac{1}{M^2} \sum_{a \in C} \sum_{b \in C} \left[ d_H(a, b) \right]^2 - \left[ \bar{d}(C) \right]^2. \tag{1.3} \]

The minimum and maximum average Hamming distance of a subset of $V^n$ with size $M$ are

\[ \min\{ \bar{d}(C) \mid C \text{ is a subset of } V^n \text{ with size } |C| = M \} \quad \text{and} \quad \max\{ \bar{d}(C) \mid C \text{ is a subset of } V^n \text{ with size } |C| = M \}, \]

respectively. The minimum and maximum variance of the average distance of a subset of $V^n$ with size $M$ are defined by

\[ T(n, M; q) = \min\{ \operatorname{var}(C) \mid C \text{ is a subset of } V^n \text{ with size } |C| = M \}, \]
\[ R(n, M; q) = \max\{ \operatorname{var}(C) \mid C \text{ is a subset of } V^n \text{ with size } |C| = M \}. \]

Ahlswede and Katona [2] first posed the problem of determining the minimum average distance, a problem in the extremal combinatorics of Hamming space. A number of papers (see [1–4,8–13,15,16]) have dealt with this topic since then, and some exact values of the minimum average distance have been determined. It is still an open problem to determine the minimum average distance for general $n$, $q$ and $1 \le M \le q^n$.
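The definitions (1.1) to (1.3) and the quantities $T(n, M; q)$ and $R(n, M; q)$ can be checked numerically on very small cases. The following is a minimal sketch, not part of the paper: the function names are illustrative assumptions, exact rational arithmetic is used to avoid rounding, and the exhaustive search over all subsets is only feasible for tiny $n$, $M$ and $q$.

```python
from fractions import Fraction
from itertools import combinations, product


def hamming(a, b):
    """Number of coordinates in which the tuples a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def average_distance(code):
    """Average Hamming distance over all ordered pairs, as in (1.1)."""
    M = len(code)
    return Fraction(sum(hamming(a, b) for a in code for b in code), M * M)


def variance_by_definition(code):
    """Variance computed directly from (1.2)."""
    M = len(code)
    mean = average_distance(code)
    total = sum((hamming(a, b) - mean) ** 2 for a in code for b in code)
    return total / (M * M)


def variance_by_moments(code):
    """Variance computed from (1.3): mean square minus squared mean."""
    M = len(code)
    second = Fraction(sum(hamming(a, b) ** 2 for a in code for b in code), M * M)
    return second - average_distance(code) ** 2


def extreme_variances(n, M, q):
    """Brute-force (T, R): min and max variance over all size-M subsets of Z_q^n."""
    space = list(product(range(q), repeat=n))
    values = [variance_by_moments(c) for c in combinations(space, M)]
    return min(values), max(values)


C = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
print(average_distance(C))         # 4/3
print(variance_by_definition(C))   # 8/9
print(variance_by_moments(C))      # 8/9, agreeing with (1.2)
print(extreme_variances(2, 2, 2))  # (Fraction(1, 4), Fraction(1, 1))
```

For $n = 2$, $M = 2$, $q = 2$ a two-element subset at distance $d$ has variance $d^2/4$, so the search returns $1/4$ and $1$ for the minimum and maximum.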
Ahlswede and Althöfer [1] observed that this problem also occurs in the construction of good codes for write-efficient memories, introduced by Ahlswede and Zhang [3] as a model for storing and updating information on a rewritable medium with cost constraints. Kündgen [12] observed that this problem is equivalent to a covering problem in graph theory. Ahlswede and Katona [2] first mentioned the problem of determining the maximum average distance for $q = 2$ and gave a simple solution. Fu and Xing [11] gave a complete solution to the problem of determining the maximum average distance for general $q$.

Since the variance is an important numerical characteristic of the average distance, Fu and Shen [9] first posed the problem of determining $T(n, M; 2)$. For a subset $C$ of $\{0, 1\}^n$, Fu and Shen [9] presented a lower bound and an upper bound on $\operatorname{var}(C)$. Moreover, they determined the exact value of $T(n, 2^{n-1}; 2)$. For odd $|C|$, Xia and Fu [15] improved the lower and upper bounds of Fu and Shen on $\operatorname{var}(C)$. Furthermore, they determined the exact values of $T(n, 2^n - 1; 2)$ and $T(n, 2^{n-1} \pm 1; 2)$.

In this paper, we study $T(n, M; q)$ and $R(n, M; q)$ for general $q$. Using methods from coding theory, we derive upper and lower bounds on $\operatorname{var}(C)$, which generalize and unify the bounds for the case $q = 2$. These bounds enable us to determine the exact values of $T(n, M; q)$ and $R(n, M; q)$ in several cases.

Without loss of generality, below we assume that $V = \mathbb{Z}_q = \{0, 1, \ldots, q - 1\}$, the abelian group under addition modulo $q$, since we only deal with the Hamming distance in the Hamming space $V^n$. Furthermore, if $q$ is a prime power, we can assume that $V = \mathbb{F}_q$, the finite field of $q$ elements. The Hamming weight $w_H(a)$ of a vector $a$ in $\mathbb{Z}_q^n$ or $\mathbb{F}_q^n$ is the number of nonzero coordinates in $a$. Obviously, for $a, b \in \mathbb{Z}_q^n$ or $\mathbb{F}_q^n$, $d_H(a, b) = w_H(a - b)$.

If $q$ is a prime power, denote

\[ \Omega = \{ (c_1, c_2, c_3, \ldots, c_n) \mid c_i \in \mathbb{F}_q \text{ and } c_1 + c_2 = 0 \}, \quad \Omega^+ = \Omega \cup \{ (0, 1, 0, \ldots, 0) \}, \quad \Omega^- = \Omega \setminus \{ (0, 0, 0, \ldots, 0) \}. \]

If $q$ is a positive integer and $q \ge 2$, denote

\[ \Gamma = \mathbb{Z}_q^{n-1} \times \{0\}, \quad \Gamma^+ = \Gamma \cup \{ (0, \ldots, 0, 1) \}, \quad \Gamma^- = \Gamma \setminus \{ (0, \ldots, 0, 0) \}. \]

Our main results in this paper are as follows.

Theorem 1. For $2 \le q \le 4$, we have

\[ R(n, q^{n-1}; q) = \operatorname{var}(\Omega), \tag{1.4} \]
\[ R(n, q^{n-1} + 1; q) = \operatorname{var}(\Omega^+), \tag{1.5} \]
\[ R(n, q^{n-1} - 1; q) = \operatorname{var}(\Omega^-). \tag{1.6} \]

Theorem 2. If $q \ge 2$, we have

\[ T(n, q^{n-1}; q) = \operatorname{var}(\Gamma), \tag{1.7} \]
\[ T(n, q^{n-1} - 1; q) = \operatorname{var}(\Gamma^-). \tag{1.8} \]

If $n \ge 3$ or $2 \le q \le 4$, we have

\[ T(n, q^{n-1} + 1; q) = \operatorname{var}(\Gamma^+). \tag{1.9} \]

The exact values of $\operatorname{var}(\Omega)$, $\operatorname{var}(\Omega^+)$, $\operatorname{var}(\Omega^-)$, $\operatorname{var}(\Gamma)$, $\operatorname{var}(\Gamma^+)$ and $\operatorname{var}(\Gamma^-)$ will be computed in Section 3. It seems to be difficult to determine $T(n, M; q)$ and $R(n, M; q)$ in general. In particular, it would be interesting to know whether Theorem 1 still holds when $q$ is a prime power and $q \ge 5$.

This paper is organized as follows. In Section 2, we review some basic properties of distance distributions of codes that are needed to establish our results. In Section 3, we compute $\operatorname{var}(C)$ for some subsets. In Section 4, we derive an upper bound on $\operatorname{var}(C)$ for $2 \le q \le 4$. Theorem 1 is proved by showing that this upper bound is tight in some cases. In Section 5, we derive a lower bound on $\operatorname{var}(C)$ for general $q$. Theorem 2 is proved by showing that this lower bound is tight in some cases.

2. Preliminaries

In this section, we review some basic properties of distance distributions of codes.

A subset $C$ of $V^n$ with size $|C| = M$ is called an $(n, M; q)$ code in coding theory. The distance distribution of $C$ is defined by

\[ A_i = \frac{1}{M} \left| \{ (a, b) \mid a, b \in C,\ d_H(a, b) = i \} \right|, \quad i = 0, 1, \ldots, n. \tag{2.1} \]

The dual distance distribution of $C$ is defined by

\[ B_k = \frac{1}{M} \sum_{j=0}^{n} K_k(j; q) A_j, \quad k = 0, 1, \ldots, n, \tag{2.2} \]

where $K_k(j; q)$ are the $q$-ary Krawtchouk numbers defined by

\[ K_k(j; q) = \sum_{i=0}^{k} (-1)^i (q - 1)^{k-i} \binom{j}{i} \binom{n - j}{k - i}. \tag{2.3} \]

The distance enumerator of $C$ is defined as

\[ W_C(x) = \sum_{i=0}^{n} A_i x^i \]

and the dual distance enumerator of $C$ is defined as

\[ \hat{W}_C(x) = \sum_{i=0}^{n} B_i x^i. \]

The MacWilliams–Delsarte identity (see [14]) gives the relationship between $W_C(x)$ and $\hat{W}_C(x)$:

\[ \hat{W}_C(x) = \frac{1}{M} \left[ 1 + (q - 1)x \right]^n W_C\!\left( \frac{1 - x}{1 + (q - 1)x} \right), \tag{2.4} \]

\[ W_C(x) = \frac{M}{q^n} \left[ 1 + (q - 1)x \right]^n \hat{W}_C\!\left( \frac{1 - x}{1 + (q - 1)x} \right). \]
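The definitions (2.1) to (2.3) and the identity (2.4) can be checked on a small code. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: the function names are hypothetical, the example code is $\mathbb{Z}_3^2 \times \{0\}$ with $n = 3$ and $q = 3$, and (2.4) is verified only at the single rational point $x = 1/3$ using exact arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def hamming(a, b):
    """Number of coordinates in which the tuples a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_distribution(code, n):
    """A_0..A_n from (2.1): ordered pairs at distance i, divided by M."""
    M = len(code)
    counts = Counter(hamming(a, b) for a in code for b in code)
    return [Fraction(counts[i], M) for i in range(n + 1)]


def krawtchouk(k, j, n, q):
    """q-ary Krawtchouk number K_k(j; q) from (2.3)."""
    return sum((-1) ** i * (q - 1) ** (k - i) * comb(j, i) * comb(n - j, k - i)
               for i in range(k + 1))


def dual_distribution(A, n, q, M):
    """B_0..B_n from (2.2)."""
    return [sum(krawtchouk(k, j, n, q) * A[j] for j in range(n + 1)) / M
            for k in range(n + 1)]


def evaluate(coeffs, x):
    """Evaluate the polynomial sum(coeffs[i] * x**i) at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


n, q = 3, 3
# Example code (an assumption for illustration): all vectors of Z_3^3 with last coordinate 0.
code = [v + (0,) for v in product(range(q), repeat=n - 1)]
M = len(code)

A = distance_distribution(code, n)
B = dual_distribution(A, n, q, M)
print(A)  # distance distribution: values 1, 4, 4, 0
print(B)  # dual distance distribution: values 1, 2, 0, 0

# Check (2.4) at one rational point.
x = Fraction(1, 3)
y = (1 - x) / (1 + (q - 1) * x)
lhs = evaluate(B, x)
rhs = (1 + (q - 1) * x) ** n * evaluate(A, y) / M
print(lhs == rhs)  # True
```

Because this example code is linear, its dual distance distribution coincides with the weight distribution of the dual code $\{(0, 0, c) \mid c \in \mathbb{Z}_3\}$, which has one word of weight 0 and two of weight 1.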
