arXiv:math/0307321v2
A Group-theoretic Approach to Fast Matrix Multiplication

Henry Cohn
Microsoft Research
One Microsoft Way
Redmond, WA 98052-6399
[email protected]

Christopher Umans
Department of Computer Science
California Institute of Technology
Pasadena, CA 91125
[email protected]

arXiv:math/0307321v2 [math.GR] 31 Oct 2003

Copyright © 2003 IEEE. Reprinted from Proceedings of the 44th Annual Symposium on Foundations of Computer Science. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Cornell University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Abstract

We develop a new, group-theoretic approach to bounding the exponent of matrix multiplication. There are two components to this approach: (1) identifying groups $G$ that admit a certain type of embedding of matrix multiplication into the group algebra $\mathbb{C}[G]$, and (2) controlling the dimensions of the irreducible representations of such groups. We present machinery and examples to support (1), including a proof that certain families of groups of order $n^{2+o(1)}$ support n × n matrix multiplication, a necessary condition for the approach to yield exponent 2. Although we cannot yet completely achieve both (1) and (2), we hope that it may be possible, and we suggest potential routes to that result using the constructions in this paper.

1. Introduction

Strassen [14] made the startling discovery that one can multiply two n × n matrices in only $O(n^{2.81})$ field operations, compared with $2n^3$ for the standard algorithm. This immediately raises the question of the exponent of matrix multiplication: what is the smallest number ω such that for each ε > 0, matrix multiplication can be carried out in at most $O(n^{\omega+\varepsilon})$ operations? Clearly ω ≥ 2. It is widely believed that ω = 2, but the best bound known is ω < 2.38, due to Coppersmith and Winograd [6], following a sequence of improvements to Strassen’s original algorithm (see [4, p. 420] for the history). It is known that all the standard linear algebra problems (for example, computing determinants, solving systems of equations, inverting matrices, and computing LUP decompositions; see Chapter 16 of [4]) have the same exponent as matrix multiplication, which makes ω a fundamental number for understanding algorithmic linear algebra. In addition, there are non-algebraic algorithms whose complexity is expressed in terms of ω (see, e.g., Section 16.9 in [4]).

Several fairly elaborate techniques for bounding ω are known, but since 1990 nobody has been able to improve on them. In this paper:

• We develop a new approach to bounding ω that imports the problem into the domain of group theory and representation theory. The approach is relatively simple and almost entirely separate from the existing machinery built up since Strassen’s original algorithm.

• We demonstrate the feasibility of the group theory aspect of the approach by identifying a family of groups for which a parameter that mirrors ω approaches 2. We also exhibit techniques for bounding this critical parameter and prove non-trivial bounds for a number of diverse groups and group families.

• We pose a question in representation theory (Question 4.1 below) that represents a potential barrier to directly obtaining non-trivial bounds on ω using this approach. We do not know the answer to this question. A positive answer would illuminate a path that might lead to ω = 2 using the techniques that we present in this paper.

Our approach is reminiscent of a question asked by Coppersmith and Winograd (in Section 11 of [6]) about avoiding “three disjoint equivoluminous subsets” in abelian groups, which would lead to ω = 2 if it has a positive answer. However, our technique is completely different, and our framework seems to have more algebraic structure to make use of (whereas theirs is more combinatorial).

1.1. Analogy with fast polynomial multiplication

There is a close analogy between the framework we propose in this paper and the well-known algorithm for multiplying two degree n polynomials in O(n log n) operations using the Fast Fourier Transform (FFT). In this section we elucidate this analogy to give a high-level description of our technique.

Suppose we wish to multiply the polynomials $A(x) = \sum_{i=0}^{n-1} a_i x^i$ and $B(x) = \sum_{i=0}^{n-1} b_i x^i$. The naive way to do this is to compute $n^2$ products of the form $a_i b_j$, and from these the $2n-1$ coefficients of the product polynomial $A(x) \cdot B(x)$. Of course a far better algorithm is possible; we describe it below in language that easily translates into our framework for matrix multiplication.

Let $G$ be a group and let $\mathbb{C}[G]$ be the group algebra: that is, every element of $\mathbb{C}[G]$ is a formal sum $\sum_{g \in G} a_g g$ with $a_g \in \mathbb{C}$, and the product of two such elements is

\[
\Bigl( \sum_{g \in G} a_g g \Bigr) \cdot \Bigl( \sum_{h \in G} b_h h \Bigr) = \sum_{f \in G} \Bigl( \sum_{gh = f} a_g b_h \Bigr) f.
\]

We often identify the element $\sum_{g \in G} a_g g$ with the vector of its coefficients. If $G$ is the cyclic group of order $m$, then the product of two elements $a = (a_g)_{g \in G}$ and $b = (b_g)_{g \in G}$ is a cyclic convolution of the vectors $a$ and $b$. The important observation is that a cyclic convolution is almost what is needed to compute the coefficients of the product polynomial $A(x) \cdot B(x)$; the only problem is that it wraps around. To avoid this problem, we embed $A(x)$ and $B(x)$ as elements $\bar{A}, \bar{B} \in \mathbb{C}[G]$ as follows: Let $z$ be a generator of $G$, which we assume to be a cyclic group of order $m > 2n - 1$, and define

\[
\bar{A} = \sum_{i=0}^{n-1} a_i z^i \quad \text{and} \quad \bar{B} = \sum_{i=0}^{n-1} b_i z^i.
\]

Since the group size $m$ is large enough to avoid wrapping around, we can read off the coefficients of the product polynomial from the element $\bar{A}\bar{B} \in \mathbb{C}[G]$: the coefficient of $x^i$ in $A(x)B(x)$ is the coefficient of the group element $z^i$ in $\bar{A}\bar{B}$. This is a wordy account of a so-far simple correspondence, but the payoff is near. The Discrete Fourier Transform (DFT) for $\mathbb{C}[G]$ is an invertible linear transformation $D : \mathbb{C}[G] \to \mathbb{C}^{|G|}$, which turns multiplication in $\mathbb{C}[G]$ into pointwise multiplication of vectors in $\mathbb{C}^{|G|}$. We can therefore compute the product $\bar{A}\bar{B}$ by first computing $D(\bar{A})$ and $D(\bar{B})$ and then computing the inverse DFT of their pointwise product. Thus, using the O(m log m) Fast Fourier Transform algorithm, we can perform multiplication in $\mathbb{C}[G]$ (and therefore polynomial multiplication, via the embedding above) in O(m log m) operations.

[...] and |T| × |U| matrices, respectively, then we define

\[
\bar{A} = \sum_{s \in S,\, t \in T} a_{s,t}\, s^{-1} t \quad \text{and} \quad \bar{B} = \sum_{t \in T,\, u \in U} b_{t,u}\, t^{-1} u.
\]

If $S, T, U$ satisfy the triple product property (see Definition 2.1), then we can read off the entries of the product matrix $AB$ from $\bar{A}\bar{B} \in \mathbb{C}[G]$: entry $(AB)_{s,u}$ is simply the coefficient of the group element $s^{-1}u$.

In the case of polynomial multiplication, the simplicity of the embedding obscures the fact that if $G$ is too large (e.g., if $|G| = n^2$ rather than $O(n)$), then the benefit of the entire scheme is destroyed. Avoiding this pitfall turns out to be the main challenge in the new setting. We wish to embed matrix multiplication into a group algebra over a small group $G$, as the size of $G$ is a lower bound on the complexity of multiplication in $\mathbb{C}[G]$. It is not surprising, for example, that n × n matrix multiplication can be embedded into the group algebra of a group of order $n^3$. We show that abelian groups cannot beat $n^3$ and we identify families of non-abelian groups of size $n^{2+o(1)}$ that admit such an embedding.

It might seem that this result together with the above trick for performing group algebra multiplication (i.e., taking the DFT, multiplying in the Fourier domain, and transforming back) would imply that ω = 2. There are, however, two complications introduced by the fact that we are forced to work with non-abelian groups. The first is that we know of fast algorithms to compute the DFT only for limited classes of non-abelian groups (see Section 13.5 in [4]). However, the DFT is linear, and because of the recursive structure of divide and conquer matrix multiplication algorithms, linear transformations applied before and after the recursive step are “free.” For example, in Strassen’s original matrix multiplication algorithm, the number of matrix additions and scalar multiplications in the recursive step does not affect the bound on ω. So this potential complication is in fact no problem at all.

The second complication is that for $\mathbb{C}[G]$ when $G$ is non-abelian, multiplication in the Fourier domain is not simply pointwise multiplication of vectors in $\mathbb{C}^{|G|}$. Instead it is block-diagonal matrix multiplication, where the dimensions of the blocks are the dimensions of the irreducible representations of $G$. We thus obtain a reduction of n × n matrix multiplication to a number of smaller matrix multiplications of varying sizes, which gives rise to an inequality
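The polynomial case from Section 1.1 can be made concrete with a short sketch. It is illustrative only and not taken from the paper: the function names (`cyclic_product`, `fft`, `multiply_polynomials`) are hypothetical, the cyclic group of order m is represented by the exponents 0, ..., m-1 of the generator z, m is taken to be a power of two (an assumption made so that a simple radix-2 FFT applies; the text only requires m > 2n - 1), and coefficients are assumed to be integers so that the final rounding is exact.

```python
import cmath

def cyclic_product(a_bar, b_bar):
    """Naive product in the group algebra of the cyclic group of order m.

    Index g stands for the group element z^g, so the coefficient of z^f
    collects all a_g * b_h with g + h = f (mod m): a cyclic convolution.
    """
    m = len(a_bar)
    out = [0] * m
    for g in range(m):
        for h in range(m):
            out[(g + h) % m] += a_bar[g] * b_bar[h]
    return out

def fft(vec, invert=False):
    """Recursive radix-2 DFT; len(vec) must be a power of two.

    With invert=True the conjugate transform is computed WITHOUT the
    1/m normalisation; the caller divides by m.
    """
    m = len(vec)
    if m == 1:
        return list(vec)
    even = fft(vec[0::2], invert)
    odd = fft(vec[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * m
    for k in range(m // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / m) * odd[k]
        out[k] = even[k] + twiddle
        out[k + m // 2] = even[k] - twiddle
    return out

def multiply_polynomials(a, b):
    """Multiply integer polynomials given as coefficient lists, lowest degree first."""
    n = max(len(a), len(b))
    m = 1
    while m <= 2 * n - 1:  # smallest power of two with m > 2n - 1 (no wrap-around)
        m *= 2
    # Embed A(x) and B(x) as group algebra elements by zero-padding to length m.
    a_bar = list(a) + [0] * (m - len(a))
    b_bar = list(b) + [0] * (m - len(b))
    # DFT, pointwise product, inverse DFT.
    pointwise = [x * y for x, y in zip(fft(a_bar), fft(b_bar))]
    product = fft(pointwise, invert=True)
    # Normalise, round back to integers, and keep the deg(A) + deg(B) + 1 coefficients.
    return [round((c / m).real) for c in product[: len(a) + len(b) - 1]]
```

For instance, `multiply_polynomials([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of (1 + 2x + 3x^2)(4 + 5x). Calling `cyclic_product([1, 2, 3], [4, 5, 0])` in a group of order 3 gives `[19, 13, 22]`: the x^3 coefficient 15 has wrapped around onto the constant term, which is the problem that choosing m > 2n - 1 avoids.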