Balanced Sparsest Generator Matrices for MDS Codes

Total Page:16

File Type:pdf, Size:1020Kb

Balanced Sparsest Generator Matrices for MDS Codes Balanced Sparsest Generator Matrices for MDS Codes Son Hoang Dau∗, Wentu Song†, Zheng Dong‡, Chau Yuen§ Singapore University of Technology and Design, Singapore Emails: {∗sonhoang dau, †wentu song, ‡dong zheng, §yuenchau}@sutd.edu.sg Abstract—We show that given n and k, for q sufficiently large, In fact, any error-correcting code can be used in the there always exists an [n, k]q MDS code that has a generator aforementioned scheme for sensor networks. We choose to G matrix satisfying the following two conditions: study MDS codes first because their structure, especially their G − (C1) Sparsest: each row of has Hamming weight n k + 1; weight distribution, is well studied (see, for instance [1, Ch. (C2) Balanced: Hamming weights of the columns of G differ from each other by at most one. 11]). Moreover, they have optimal error-correcting capability, given the length and the dimension. We prove that over a I. INTRODUCTION sufficiently large field, there always exists an MDS code that We study the existence and provide a construction of a has a balanced and sparsest generator matrix, which is ideally sparsest and balanced generator matrix of Maximum Distance suitable for the above encoding scheme for sensor networks. Separable (MDS) codes. A generator matrix is the sparsest Necessary notations and definitions are provided in Sec- if it contains the least number of nonzero entries among all tion II. We state and prove our main result in Section III. generator matrices of the same MDS code. A generator matrix II. PRELIMINARIES is balanced if every column contains approximately the same F number of nonzero entries. More specifically, we require that We denote by q the finite field with q elements. Let the number of nonzero entries in each column differs from [n] denote the set {1, 2,...,n}. The support of a vector u Fn u each other by at most one. = (u1,...,un) ∈ q is defined by supp( ) = {i ∈ [n]: Apart from being of theoretical interest, our study on ui 6= 0}. The (Hamming) weight of u is |supp(u)|. We can balanced sparsest generator matrices for MDS codes was also define weight and support of a row or a column of a motivated by its application in error correction for sensor matrix over some finite field, by regarding them as vectors networks. Suppose n sensors, S1,..., Sn, collectively measure over that field. Apart from Hamming weight, we also use k conditions x1,...,xk, such as temperature, pressure, light other standard notions from coding theory such as minimum intensity, etc. Let x = (x1,...,xk), where xi ∈ Fq for each distance, linear [n, k]q and [n,k,d]q codes, MDS codes, and i =1,...,k (Fq is a finite field of q elements). These sensors generator matrices (for instance, see [1]). G Fk×n G transmit the information they collected to a base station, which For a matrix = (gi,j ) ∈ q , the support matrix of , is a data collector. Furthermore, each sensor performs some denoted supp(G),isa k×n binary matrix M = (mi,j ) where encoding on the information it has, before transmitting the mi,j = 0 if gi,j = 0 and mi,j = 1 if gi,j 6= 0. Let M = information back to the base station in the following way. Let (mi,j ) be a k × n binary matrix. We denote by var(M) = M G be an k×n generator matrix of an [n,k,d]q error-correcting (vi,j ) the matrix obtained from by replacing every nonzero code. Sensor Si transmits the scalar product of x and column entry mi,j =1 by ξi,j , where ξi,j ’s are indeterminates. More arXiv:1301.5108v1 [cs.IT] 22 Jan 2013 i of G to the base station. It is well known in classical formally, vi,j = 0 if mi,j = 0 and vi,j = ξi,j if mi,j = 1. coding theory that this coding scheme allows the base station We also denote by gr(M) the bipartite graph G = (V , E ) x d−1 defined as follows. The vertex set V can be partitioned into to retrieve when at most ⌊ 2 ⌋ sensors transmit wrong information. Moreover, the base station can also identify two parts, namely, the left part L = {ℓ1,...,ℓk}, and the right the malfunctioned sensors. For each sensor Si, only those part R = {r1,...,rn}. The edge set is conditions corresponding to nonzero entries of column i of G E = (ℓi, rj ): i ∈ [k], j ∈ [n], mi,j 6=0 . are involved into encoding. So it is sufficient for Si to measure only such conditions. Thus, if G is sparse then in average, For any k × n matrix N, we define f(N) = P det(P ), n P each sensor only needs to measure a few among k conditions where the product is taken over all k submatrices of order in order to achieve the desired error correction capability. On k of N. Q top of that, if columns of G have approximately the same III. MAIN RESULT number of nonzero entries then the sensors are required to measure approximately the same number of conditions. This A sparsest generator matrix of an [n, k]q MDS code would balance guarantees an even distribution of workload among have precisely n−k+1 nonzero entries in every row. Moreover, k(n−k+1) sensors, which is an important criterion for sensor networks if it is balanced, then each column contains either ⌊ n ⌋ k(n−k+1) where energy saving is a critical issue. or ⌈ n ⌉ nonzero entries. Hereafter, we often use Ri, i ∈ [k], and Cj , j ∈ [n], to denote the supports of row i and ∪i∈I Ri ≥ n − k + |I|, for every subset ∅ 6= I ⊆ [k]. (2) column j, respectively, of a k ×n binary matrix M. Note that Proof: Suppose that (1) holds and that there exists a Ri ⊆ [n] and Cj ⊆ [k]. nonempty set I ⊆ [k] satisfying Lemma 1. Let M = (mi,j ) be a k×n binary matrix. Suppose ∪i∈I Ri ≤ n − k + |I|− 1. (3) that each row of M has weight n − k +1. Then M is the We aim to obtain a contradiction. The condition (3) is equiv- support matrix of a generator matrix of some [n, k]q MDS n−1 alent to code over a sufficiently large field Fq (q> ) if and only k−1 ∩i I Ri ≥ k − |I| +1. (4) if f(var(M)) 6≡ 0. ∈ M G G Hence there exists a set J of k − |I| +1 columns of M that Proof: Suppose = supp( ), where = (gi,j ) satisfies ∩j∈J Cj ≥ |I|. Equivalently we have is a generator matrix of some [n, k]q MDS code. Due to a well-known property of MDS codes (see [1, p. 319]), ∪j∈J C j ≤ k − |I| < k − |I| +1= |J|. (5) every submatrix of order k of G has nonzero determinant. We obtain a contradiction between (1) and (5). The “only if” Therefore, f(G) 6= 0. Note that f(var(M)) can be regarded direction can be proved in a similar manner. as a multivariable polynomial in Fq[...,ξi,j ,...]. Moreover, since M = supp(G), we deduce that f(G) can be obtained Lemma 5. Let M = (mi,j ) be a k×n binary matrix. Suppose from f(var(M)) by substituting ξi,j by gi,j for all i, j where that each row of M has weight n − k +1. Then M is the gi,j 6=0. As f(G) 6=0, we conclude that f(var(M)) 6≡ 0. support matrix of a generator matrix of some [n, k]q MDS M F n−1 Now suppose that f(var( )) 6≡ 0. Note that each column code over a sufficiently large field q (q> k−1 ) if and only of var(M) belongs to precisely n−1 submatrices of order k if (2) holds. k−1 of var(M). Hence the exponent of each ξi,j in f(var(M)) n−1 M Proof: The proof follows from Lemma 1-4. is at most k−1 . Since f(var( )) 6≡ 0, by [2, Lemma n−1 F We present below our main result. 4], if q > k−1 then there exist gi,j ∈ q (for i, j M where mi,j = 1) so that f(var( ))(...,gi,j,...) 6= 0. Let Theorem 6 (Main Theorem). Suppose 1 ≤ k ≤ n and q > G n−1 = (gi,j ) (for i, j where mi,j = 0 we set gi,j = 0). Since . Then there always exists an [n, k] MDS code that has G M k−1 q f( ) = f(var( ))(...,gi,j,...) 6= 0, again by [1, p. 319], a generator matrix G satisfying the following two conditions. G we deduce that is a generator matrix of an [n, k]q MDS (C1) Sparsest: each row of G has weight n − k +1. G code. Therefore, each row of has weight at least n − k +1, (C2) Balanced: column weights of G differ from each other due to the Singleton Bound (see [1, p. 33]). Since each row by at most one. of M also has weight n − k +1, we deduce that gi,j 6= 0 whenever mi,j =1. Therefore, M = supp(G). By Lemma 5, to prove Theorem 6, we need to show that there always exists a k × n binary matrix M satisfying the M Lemma 2. Let = (mi,j ) be a k × n binary matrix. following properties Then var M if and only if every bipartite subgraph f( ( )) 6≡ 0 (P1) each row of M has weight n − k +1, induced by the k left-vertices and some k right-vertices in (P2) column weights of M differ from each other by at most M gr( ) has a perfect matching.
Recommended publications
  • Formalizing Randomized Matching Algorithms
    Formalizing Randomized Matching Algorithms Dai Tri Man Leˆ and Stephen A. Cook Department of Computer Science, University of Toronto fledt, [email protected] Abstract—Using Jerˇabek’s´ framework for probabilistic rea- set to another. Using this method, Jerˇabek´ developed tools for soning, we formalize the correctness of two fundamental RNC2 describing algorithms in ZPP and RP. He also showed in [13], algorithms for bipartite perfect matching within the theory VPV [14] that the theory VPV+sWPHP(L ) is powerful enough to for polytime reasoning. The first algorithm is for testing if a FP bipartite graph has a perfect matching, and is based on the formalize proofs of very sophisticated derandomization results, Schwartz-Zippel Lemma for polynomial identity testing applied e.g. the Nisan-Wigderson theorem [23] and the Impagliazzo- to the Edmonds polynomial of the graph. The second algorithm, Wigderson theorem [12]. (Note that Jerˇabek´ actually used the due to Mulmuley, Vazirani and Vazirani, is for finding a perfect single-sorted theory PV1+sWPHP(PV), but these two theories matching, where the key ingredient of this algorithm is the are isomorphic.) Isolating Lemma. In [15], Jerˇabek´ developed an even more systematic ap- proach by showing that for any bounded P=poly definable set, I. INTRODUCTION there exists a suitable pair of surjective “counting functions” 1 There is a substantial literature on theories such as PV, S2 , which can approximate the cardinality of the set up to a poly- VPV, V 1 which capture polynomial time reasoning [8], [4], nomially small error. From this and other results he argued [17], [7].
    [Show full text]
  • Coded Sparse Matrix Multiplication
    Coded Sparse Matrix Multiplication Sinong Wang∗, Jiashang Liu∗ and Ness Shroff∗† ∗Department of Electrical and Computer Engineering yDepartment of Computer Science and Engineering The Ohio State University fwang.7691, liu.3992 [email protected] Abstract r×t In a large-scale and distributed matrix multiplication problem C = A|B, where C 2 R , the coded computation plays an important role to effectively deal with “stragglers” (distributed computations that may get delayed due to few slow or faulty processors). However, existing coded schemes could destroy the significant sparsity that exists in large-scale machine learning problems, and could result in much higher computation overhead, i.e., O(rt) decoding time. In this paper, we develop a new coded computation strategy, we call sparse code, which achieves near optimal recovery threshold, low computation overhead, and linear decoding time O(nnz(C)). We implement our scheme and demonstrate the advantage of the approach over both uncoded and current fastest coded strategies. I. INTRODUCTION In this paper, we consider a distributed matrix multiplication problem, where we aim to compute C = A|B from input matrices A 2 Rs×r and B 2 Rs×t for some integers r; s; t. This problem is the key building block in machine learning and signal processing problems, and has been used in a large variety of application areas including classification, regression, clustering and feature selection problems. Many such applications have large- scale datasets and massive computation tasks, which forces practitioners to adopt distributed computing frameworks such as Hadoop [1] and Spark [2] to increase the learning speed.
    [Show full text]
  • Algebraic Algorithms in Combinatorial Optimization
    Algebraic Algorithms in Combinatorial Optimization CHEUNG, Ho Yee A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Computer Science and Engineering The Chinese University of Hong Kong September 2011 Thesis/Assessment Committee Professor Shengyu Zhang (Chair) Professor Lap Chi Lau (Thesis Supervisor) Professor Andrej Bogdanov (Committee Member) Professor Satoru Iwata (External Examiner) Abstract In this thesis we extend the recent algebraic approach to design fast algorithms for two problems in combinatorial optimization. First we study the linear matroid parity problem, a common generalization of graph matching and linear matroid intersection, that has applications in various areas. We show that Harvey's algo- rithm for linear matroid intersection can be easily generalized to linear matroid parity. This gives an algorithm that is faster and simpler than previous known al- gorithms. For some graph problems that can be reduced to linear matroid parity, we again show that Harvey's algorithm for graph matching can be generalized to these problems to give faster algorithms. While linear matroid parity and some of its applications are challenging generalizations, our results show that the al- gebraic algorithmic framework can be adapted nicely to give faster and simpler algorithms in more general settings. Then we study the all pairs edge connectivity problem for directed graphs, where we would like to compute minimum s-t cut value between all pairs of vertices. Using a combinatorial approach it is not known how to solve this problem faster than computing the minimum s-t cut value for each pair of vertices separately.
    [Show full text]
  • Coded Computation for Speeding up Distributed Machine Learning
    CODED COMPUTATION FOR SPEEDING UP DISTRIBUTED MACHINE LEARNING DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of the Ohio State University By Sinong Wang Graduate Program in Electrical and Computer Engineering The Ohio State University 2019 Dissertation Committee: Ness B. Shroff, Advisor Atilla Eryilmaz Abhishek Gupta Andrea Serrani c Copyrighted by Sinong Wang 2019 ABSTRACT Large-scale machine learning has shown great promise for solving many practical ap- plications. Such applications require massive training datasets and model parameters, and force practitioners to adopt distributed computing frameworks such as Hadoop and Spark to increase the learning speed. However, the speedup gain is far from ideal due to the latency incurred in waiting for a few slow or faulty processors, called \straggler" to complete their tasks. To alleviate this problem, current frameworks such as Hadoop deploy various straggler detection techniques and usually replicate the straggling task on another available node, which creates a large computation overhead. In this dissertation, we focus on a new and more effective technique, called coded computation to deal with stragglers in the distributed computation problems. It creates and exploits coding redundancy in local computation to enable the final output to be recoverable from the results of partially finished workers, and can therefore alleviate the impact of straggling workers. However, we observe that current coded computation techniques are not suitable for large-scale machine learning application. The reason is that the input training data exhibit both extremely large-scale targeting data and a sparse structure. However, the existing coded computation schemes destroy the sparsity and creates large computation redundancy.
    [Show full text]
  • Matchings in Graphs 1 Definitions 2 Existence of a Perfect Matching
    Matchings in Graphs Lecturer: Prajakta Nimbhorkar Meeting: 7 Scribe: Karteek 18/02/2010 In this lecture we look at an \efficient" randomized parallel algorithm for each of the follow- ing: • Checking the existence of perfect matching in a bipartite graph. • Checking the existence of perfect matching in a general graph. • Constructing a perfect matching for a given graph. • Finding a min-weight perfect matching in the given weighted graph. • Finding a Maximum matching in the given graph. where “efficient" means an NC-algorithm or an RNC-algorithm. 1 Definitions Definition 1 NC: The set of problems that can be solved within poly-log time by a polynomial number of processors working in parallel. Definition 2 RNC: The set of problems for which there is a poly-log time algorithm which uses polynomially many processors in parallel and in addition is allowed polynomially many random bits and the following properties hold: 1 • If the correct answer is YES, the algorithm returns YES with probabilty ≥ 2 . • If the correct answer is NO, the algorithm always returns NO. 2 Existence of a perfect matching 2.1 Bipartite Graphs Here we consider the decision problem of checking if a bipartite graph has a perfect matching. Let the given graph be G = (V; E). Let V =A[B. Since we are looking for perfect matchings, we only look at graphs where jAj = jBj = n, because otherwise, there cannot be a perfect matching. Fact 3 In a bipartite graph as above, every perfect matching can be seen as a permutation from Sn and every permutation from Sn can be seen as perfect matching.
    [Show full text]
  • A Random Linear Network Coding Approach to Multicast Tracey Ho, Member, IEEE, Muriel Médard, Senior Member, IEEE, Ralf Koetter, Senior Member, IEEE, David R
    IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 10, OCTOBER 2006 4413 A Random Linear Network Coding Approach to Multicast Tracey Ho, Member, IEEE, Muriel Médard, Senior Member, IEEE, Ralf Koetter, Senior Member, IEEE, David R. Karger, Associate Member, IEEE, Michelle Effros, Senior Member, IEEE, Jun Shi, and Ben Leong Abstract—We present a distributed random linear network coding approach for transmission and compression of informa- tion in general multisource multicast networks. Network nodes independently and randomly select linear mappings from inputs onto output links over some field. We show that this achieves ca- pacity with probability exponentially approaching I with the code length. We also demonstrate that random linear coding performs compression when necessary in a network, generalizing error ex- ponents for linear Slepian–Wolf coding in a natural way. Benefits of this approach are decentralized operation and robustness to network changes or link failures. We show that this approach can take advantage of redundant network capacity for improved success probability and robustness. We illustrate some potential advantages of random linear network coding over routing in two Fig. 1. An example of distributed random linear network coding. and $ examples of practical scenarios: distributed network operation are the source processes being multicast to the receivers, and the coefficients and networks with dynamically varying connections. Our deriva- are randomly chosen elements of a finite field. The label on each link represents the process being transmitted on the link. tion of these results also yields a new bound on required field size for centralized network coding on general multicast networks. Index Terms—Distributed compression, distributed networking, This family of problems includes traditional single-source mul- multicast, network coding, random linear coding.
    [Show full text]
  • Advanced Algorithms, a Graduate-Level Course Taught by Anupam Gupta at Carnegie Mellon University in Fall 2020
    ADVANCEDALGORITHMS notes for cmu 15-850 (fall 2020) lecturer: anupam gupta About this document This document contains the course notes for 15-850:Advanced Algorithms, a graduate-level course taught by Anupam Gupta at Carnegie Mellon University in Fall 2020. Parts of these notes were written by the students of previous versions of the class (based on the lectures) and then edited by the professor. The names of the student scribes will appear soon, as will missing details and bug fixes, more chapters, exercises, and (better) figures. The notes have not been thoroughly checked for accuracy, espe- cially attributions of results. They are intended to serve as study resources and not as a substitute for professionally prepared publica- tions. We apologize for any inadvertent inaccuracies or misrepresen- tations. More information about the course, including problem sets and references, can be found on the course website: https://www.cs.cmu.edu/~15850/ The style files (as well as the text on this page!) are mildly adapted from the ones developed by Yufei Zhao (MIT), for his notes on Graph Theory and Additive Combinatorics. As some of you may guess, the LATEX template used for these notes is called tufte-book. Contents I Discrete Algorithms 9 1 Minimum Spanning Trees 11 1.1 Minimum Spanning Trees: History............. 11 1.2 The Classical Algorithms.................. 13 1.3 Fredman and Tarjan’s O(m log∗ n)-time Algorithm... 15 1.4 A Linear-Time Randomized Algorithm.......... 18 1.5 Optional: MST Verification................. 21 1.6 The Ackermann Function.................. 25 1.7 Matroids............................ 25 2 Arborescences: Directed Spanning Trees 27 2.1 Arborescences........................
    [Show full text]
  • Network Flow Algorithms for Wireless Networks and Design and Analysis of Rate Compatible LDPC Codes Cuizhu Shi Iowa State University
    Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2011 Network flow algorithms for wireless networks and design and analysis of rate compatible LDPC codes Cuizhu Shi Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Electrical and Computer Engineering Commons Recommended Citation Shi, Cuizhu, "Network flow algorithms for wireless networks and design and analysis of rate compatible LDPC codes" (2011). Graduate Theses and Dissertations. 11972. https://lib.dr.iastate.edu/etd/11972 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Network flow algorithms for wireless networks and design and analysis of rate compatible LDPC codes by Cuizhu Shi A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Electrical Engineering Program of Study Committee: Aditya Ramamoorthy, Major Professor Zhengdao Wang Nicola Elia Sang W. Kim Ryan Martin Iowa State University Ames, Iowa 2011 Copyright c Cuizhu Shi, 2011. All rights reserved. ii TABLE OF CONTENTS LIST OF TABLES . v LIST OF FIGURES . vi ACKNOWLEDGEMENTS . viii ABSTRACT . ix CHAPTER 1. OVERVIEW . 1 1.1 Introduction on Wireless Information Flow . .3 1.1.1 The Known Facts . .4 1.1.2 The Unknown . .6 1.2 A Step Further { the Deterministic Channel Model .
    [Show full text]
  • Coded Sparse Matrix Multiplication
    Coded Sparse Matrix Multiplication Sinong Wang 1 Jiashang Liu 1 Ness Shroff 2 Abstract and the master node has to collect the results from all work- ers to output matrix C. As a result, a major performance In a large-scale and distributed matrix multiplica- bottleneck is the latency in waiting for a few slow or faulty tion problem C = A|B, where C 2 r×t, the R processors – called “stragglers” to finish their tasks (Dean coded computation plays an important role to ef- & Barroso, 2013). To alleviate this problem, current frame- fectively deal with “stragglers” (distributed com- works such as Hadoop deploy various straggler detection putations that may get delayed due to few slow techniques and usually replicate the straggling task on an- or faulty processors). However, existing coded other available node. schemes could destroy the significant sparsity that exists in large-scale machine learning problems, Recently, forward error correction or coding techniques pro- and could result in much higher computation over- vide a more effective way to deal with the “straggler” in head, i.e., O(rt) decoding time. In this paper, we the distributed tasks (Dutta et al., 2016; Lee et al., 2017a; develop a new coded computation strategy, we Tandon et al., 2017; Yu et al., 2017; Li et al., 2018; Wang call sparse code, which achieves near optimal re- et al., 2018). It creates and exploits coding redundancy in covery threshold, low computation overhead, and local computation to enable the matrix C recoverable from linear decoding time O(nnz(C)). We implement the results of partial finished workers, and can therefore alle- our scheme and demonstrate the advantage of the viate some straggling workers.
    [Show full text]
  • Exact Covers Via Determinants Andreas Björklund
    Exact Covers via Determinants Andreas Björklund To cite this version: Andreas Björklund. Exact Covers via Determinants. 27th International Symposium on Theoretical Aspects of Computer Science - STACS 2010, Inria Nancy Grand Est & Loria, Mar 2010, Nancy, France. pp.95-106. inria-00455349 HAL Id: inria-00455349 https://hal.inria.fr/inria-00455349 Submitted on 10 Feb 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Symposium on Theoretical Aspects of Computer Science 2010 (Nancy, France), pp. 95-106 www.stacs-conf.org EXACT COVERS VIA DETERMINANTS ANDREAS BJORKLUND¨ E-mail address: [email protected] Abstract. Given a k-uniform hypergraph on n vertices, partitioned in k equal parts such that every hyperedge includes one vertex from each part, the k-Dimensional Match- ing problem asks whether there is a disjoint collection of the hyperedges which covers all vertices. We show it can be solved by a randomized polynomial space algorithm in O∗(2n(k−2)/k) time. The O∗() notation hides factors polynomial in n and k. The general Exact Cover by k-Sets problem asks the same when the partition constraint is dropped and arbitrary hyperedges of cardinality k are permitted.
    [Show full text]
  • Admin Review String Matching
    Admin Arora talk. No class Monday. Review Fingerprinting: Universe of size u • Map to random fingerprint in universe of size v u • ≤ probability of collision 1=v • Freivald's technique verify matrix multiplication AB = C • check ABr = Cr for random r in 0; 1 n • f g probability of success 1=2 • works to check any matrix identity, not just product • useful if matrices \implicit" like AB • mapping size-n2 matrices to size-n vectors • In general, many ways to fingerprint explicitly represented objects. But for implicit objects, different methods have different strengths and weaknesses. We'll fingerprint 3 ways: vector multiply • number mod a random prime • polynomial evaluation at a random point • String matching Checksums: Alice and Bob have bit strings of length n • Think of n bit integers a, b • take a prime number p, compare a mod p and b mod p with log p bits. • trouble if a = b (mod p). How avoid? How likely? • { c = a b is n-bit integer. − 1 { so at most n prime factors. { How many prime factors less than k? Θ(k= ln k) { so take 2n2 log n limit { number of primes about n2 { So on random one, 1=n error prob. { O(log n) bits to send. { implement by add/sub, no mul or div! How find prime? { Well, a randomly chosen number is prime with prob. 1= ln n, { so just try a few. { How know its prime? Simple randomized test (later) Pattern matching in strings m-bit pattern • n-bit string • work mod prime p of size at most t • prob.
    [Show full text]
  • Lecture 11 — January 28, 2014 1 Overview 2 Ideas from Linear Algebra
    Advanced Graph Algorithms Jan-Apr 2014 Lecture 11 | January 28, 2014 Lecturer: Saket Sourabh Scribe: Prafullkumar P Tale 1 Overview In this lecture we will revisit the problem of finding perfect matching in non-bipartite graphs. We will study the algebraic approach to solve this problem. In the earlier lectures, we have seen the algorithm to decide whether a graph has perfect matching or not. In this, we will present O(nw+2) and O(nw+1) order algorithm to find the perfect matching. Before attacking perfect matching problem, we will state few well known results from Linear Algebra. 2 Ideas from Linear Algebra Theorem 1. If M is real skew symmetric matrix, then det(M) = [P f(M)]2 (1) where determinant and pfaffian of matrix A is given as follows n X Y det(M) = sgn(σ) Mi,σ(i) σ2Sn i=1 X pf(M) = sgn(µ)mµ µ #crossing where Sn is permutation group, µ is matching on f1; 2; : : : ; ng and sgn(µ) = (−1) . Definition 2 (Rank). For any given matrix M, rank of M is defined as dimension of largest square sub-matrix Msub, such that det(Msub) 6= 0. Notation Let M be an n × n matrix and X; Y be subsets of f1; 2; : : : ; ng. M[X; Y ] or MXY will denote the submatrix of M obtained by choosing the rows of M corresponding to indices in X and columns corresponding to indices in Y . Similarly M XY will donate the submatrix of M obtained by deleting rows of M corresponding to indices in X and columns corresponding to indices in Y .
    [Show full text]