1 Numerically stable coded matrix computations via circulant and rotation matrix embeddings Aditya Ramamoorthy and Li Tang Department of Electrical and Computer Engineering Iowa State University, Ames, IA 50011 fadityar,
[email protected] Abstract Polynomial based methods have recently been used in several works for mitigating the effect of stragglers (slow or failed nodes) in distributed matrix computations. For a system with n worker nodes where s can be stragglers, these approaches allow for an optimal recovery threshold, whereby the intended result can be decoded as long as any (n − s) worker nodes complete their tasks. However, they suffer from serious numerical issues owing to the condition number of the corresponding real Vandermonde-structured recovery matrices; this condition number grows exponentially in n. We present a novel approach that leverages the properties of circulant permutation matrices and rotation matrices for coded matrix computation. In addition to having an optimal recovery threshold, we demonstrate an upper bound on the worst-case condition number of our recovery matrices which grows as ≈ O(ns+5:5); in the practical scenario where s is a constant, this grows polynomially in n. Our schemes leverage the well-behaved conditioning of complex Vandermonde matrices with parameters on the complex unit circle, while still working with computation over the reals. Exhaustive experimental results demonstrate that our proposed method has condition numbers that are orders of magnitude lower than prior work. I. INTRODUCTION Present day computing needs necessitate the usage of large computation clusters that regularly process huge amounts of data on a regular basis. In several of the relevant application domains such as machine learning, arXiv:1910.06515v4 [cs.IT] 9 Jun 2021 datasets are often so large that they cannot even be stored in the disk of a single server.