SPIN: A Fast and Scalable Matrix Inversion Method in Apache Spark Chandan Misra Swastik Haldar Indian Institute of Technology Kharagpur Indian Institute of Technology Kharagpur West Bengal, India West Bengal, India
[email protected] [email protected] Sourangshu Bhaacharya Soumya K. Ghosh Indian Institute of Technology Kharagpur Indian Institute of Technology Kharagpur West Bengal, India West Bengal, India
[email protected] [email protected] ABSTRACT many of the workloads. In the big data era, many of these applica- e growth of big data in domains such as Earth Sciences, Social tions have to work on huge matrices, possibly stored over multiple Networks, Physical Sciences, etc. has lead to an immense need servers, and thus consuming huge amounts of computational re- for efficient and scalable linear algebra operations, e.g. Matrix in- sources for matrix inversion. Hence, designing efficient large scale version. Existing methods for efficient and distributed matrix in- distributed matrix inversion algorithms, is an important challenge. version using big data platforms rely on LU decomposition based Since its release in 2012, Spark [19] has been adopted as a dom- block-recursive algorithms. However, these algorithms are com- inant solution for scalable and fault-tolerant processing of huge plex and require a lot of side calculations, e.g. matrix multiplica- datasets in many applications, e.g., machine learning [11], graph tion, at various levels of recursion. In this paper, we propose a processing [7], climate science [12], social media analytics [2], etc. different scheme based on Strassen’s matrix inversion algorithm Spark has gained its popularity for its in-memory distributed data (mentioned in Strassen’s original paper in 1969), which uses far processing ability, which runs interactive and iterative applications fewer operations at each level of recursion.