The Speed Limit of Multiplication

Ola Mæhlen, Institutt for matematiske fag, NTNU

“Forum for matematiske perler”, IMF, NTNU, 06. Sept. 2019

In this presentation, numbers are represented in base 10.

Thus, computing a number means to calculate its digits in base 10.

Definitions

A multiplication algorithm 𝒜 takes in two integers and computes their product (here in base 10).

We let M_𝒜(n) denote the maximum number of elementary steps (addition/multiplication of single digits, etc.) needed for 𝒜 to compute the product of two n-digit integers.

The complexity of 𝒜 is the asymptotic behaviour of M_𝒜(n) as n goes to infinity.

The complexity of multiplication is the optimal complexity over all possible multiplication algorithms.

We formally write M(n) for the number of steps required by an “optimal algorithm” (i.e. an algorithm of optimal complexity).

The complexity of multiplication is the asymptotic behaviour of M(n).

Why do we care about the complexity of multiplication?

Multiplication is a fundamental building block in computation:

Operation                    | Algorithm                                | Complexity
Squaring                     | optimal multiplication algorithm         | O(M(n))
Division                     | Newton-Raphson division                  | O(M(n))
Square root (first n digits) | Newton's method                          | O(M(n))
gcd                          | Schönhage's controlled Euclidean descent | O(M(n) log(n))
π (n decimal places)         | Gauss-Legendre algorithm                 | O(M(n) log(n))

Note that the complexity of squaring cannot be much lower than half that of multiplication: since

xy = ((x + y)² − (x − y)²) / 4,

one multiplication can be exchanged for the cost of two squarings (plus negligible extra steps).
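As a quick sanity check, here is the identity on the numbers used later in the talk (a minimal Python sketch, not from the slides):

x, y = 249416, 133758
# one multiplication traded for two squarings, a subtraction, and a division by 4
assert x * y == ((x + y) ** 2 - (x - y) ** 2) // 4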

Grade-school multiplication (GS)

- The algorithm taught in school.
- A similar method was used in ancient Egypt at least 4000 years ago.
- Has quadratic complexity, i.e. M_GS(n) behaves no better than O(n²) asymptotically.

Example with n = 6 and the numbers a = 249416, b = 133758:

     249416
   × 133758
    1995328      (8 × 249416)
   12470800      (5 × 249416, one shift)
  174591200      (7 × 249416, two shifts)
  748248000      (3 × 249416, three shifts)
 7482480000      (3 × 249416, four shifts)
24941600000      (1 × 249416, five shifts)
-----------
33361385328

On the k'th row we multiply one digit with an n-digit number and append k − 1 extra zeros to the result (shifts); this requires ≂ n + k − 1 steps. Lastly, we sum n numbers of length in [n, 2n], which requires ≂ n² steps. The number of elementary steps is approximately

M_GS(n) ≂ Σ_{k=1}^{n} (n + (k − 1)) + n² = 5n²/2 − n/2

⇒ M_GS(n) ≂ n².
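A minimal Python sketch of the procedure (digit strings in, digit string out; the function name and structure are my own). The double loop is where the ≂ n² single-digit multiplications live:

def grade_school(a: str, b: str) -> str:
    # one cell per digit pair, plus room for carries
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        for j, db in enumerate(reversed(b)):   # ~ n^2 single-digit multiplications
            result[i + j] += int(da) * int(db)
    for k in range(len(result) - 1):           # propagate carries (the final summation)
        result[k + 1] += result[k] // 10
        result[k] %= 10
    return ''.join(map(str, reversed(result))).lstrip('0') or '0'

assert grade_school('249416', '133758') == '33361385328'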

Consequently, the complexity of multiplication satisfies (at least)

M(n) = O(n²).

Kolmogorov's n²-conjecture

In 1956 Andrey Kolmogorov conjectured that the complexity of multiplication is quadratic: M(n) ≂ n². Later, in 1960, he organised a seminar on problems in cybernetics at Moscow University; here he stated his conjecture.

Attending was the 23-year-old student Anatoly Karatsuba. After a week's search, he discovered the Karatsuba algorithm:

M_KA(n) = O(n^(log₂ 3)),   log₂(3) = 1.58496...,

disproving Kolmogorov's conjecture! After learning of this new method, Kolmogorov presented it at the next meeting… and then terminated the seminar. In 1962, Kolmogorov wrote and published the article:

A. Karatsuba and Yu. Ofman, “Multiplication of Multiplace Numbers on Automata”.

Karatsuba first learned of the article when given its reprint. The article spawned a new area of computation theory: fast multiplication algorithms.

Karatsuba algorithm (KA), 1960

We consider the integers from earlier, a = 249416, b = 133758, and split each into halves:

a = 249 × 1000 + 416 = a₁·10³ + a₂,
b = 133 × 1000 + 758 = b₁·10³ + b₂,

ab = a₁b₁·10⁶ + (a₁b₂ + a₂b₁)·10³ + a₂b₂.

Knowing a₁b₁, a₁b₂, a₂b₁ and a₂b₂, we could build ab using only addition and shifts:

M(n) < 4·M(n/2) + O(n)  ⇒  M(n) = O(n²).

Karatsuba realized that one can obtain the sum a₁b₂ + a₂b₁ without calculating the products a₁b₂ and a₂b₁ individually. Indeed, knowing a₁b₁ and a₂b₂, we need only one extra multiplication to attain this sum:

a₁b₂ + a₂b₁ = (a₁ + a₂)(b₁ + b₂) − a₁b₁ − a₂b₂,

M(n) < 3·M(n/2) + O(n)  ⇒  M(n) = O(n^(log₂ 3)).

𝒜_KA(a, b):
    if n = 1, return ab.
    else set x = 𝒜_KA(a₁, b₁),
             y = 𝒜_KA(a₂, b₂),
             z = 𝒜_KA(a₁ + a₂, b₁ + b₂),
    and return x·10^(2n) + (z − x − y)·10^n + y.

M_KA(n) = O(n^(log₂ 3)). More efficient than grade-school multiplication around n > 60.
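The recursion above, transcribed into runnable Python (a sketch; splitting in base 10 at half the digit count, with names of my choosing):

def karatsuba(a: int, b: int) -> int:
    if a < 10 or b < 10:                      # base case: a single-digit factor
        return a * b
    m = max(len(str(a)), len(str(b))) // 2    # split position, half the digits
    a1, a2 = divmod(a, 10 ** m)
    b1, b2 = divmod(b, 10 ** m)
    x = karatsuba(a1, b1)                     # high × high
    y = karatsuba(a2, b2)                     # low × low
    z = karatsuba(a1 + a2, b1 + b2)           # the single extra multiplication
    return x * 10 ** (2 * m) + (z - x - y) * 10 ** m + y

assert karatsuba(249416, 133758) == 249416 * 133758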

Toom-Cook algorithm (Tk), 1963

Introduced by Andrei Toom, and further simplified by Stephen Cook. A family of algorithms: Toom-k (KA = Toom-2).

Example of Toom-3, with n = 6:

a = 249416 → A(x) = 24x² + 94x + 16,
b = 133758 → B(x) = 13x² + 37x + 58.

Notice: C(100) = A(100)B(100) = ab for C(x) = A(x)B(x).

We want to calculate C(x) = A(x)B(x). Being a polynomial of degree 4, it is determined by 5 evaluations. Choose small values: x = 2, 1, 0, −1, −2.

⎡C(2) ⎤   ⎡  2⁴     2³     2²    2   1 ⎤ ⎡c₄⎤
⎢C(1) ⎥   ⎢  1      1      1     1   1 ⎥ ⎢c₃⎥
⎢C(0) ⎥ = ⎢  0      0      0     0   1 ⎥ ⎢c₂⎥
⎢C(−1)⎥   ⎢  1     −1      1    −1   1 ⎥ ⎢c₁⎥
⎣C(−2)⎦   ⎣(−2)⁴  (−2)³  (−2)²  −2   1 ⎦ ⎣c₀⎦

In compact notation: C⃗ = Π·c⃗. Each C(x) is a product of integers of ≈ n/3 digits; Toom-3 is reapplied to evaluate these 5 products. Calculating Π⁻¹ in advance, c⃗ is retrieved by a weighted sum of the elements of C⃗.

Toom-3: M_T3(n) < 5·M_T3(n/3) + O(n)  ⇒  M_T3(n) = O(n^(log₃ 5)).
Toom-k: M_Tk(n) < (2k − 1)·M_Tk(n/k) + O(n)  ⇒  M_Tk(n) = O(n^(log_k(2k − 1))).

Toom-Cook can get “linear + ε” complexity, as log_k(2k − 1) ↘ 1 when k → ∞.
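The interpolation step, made concrete with NumPy (a toy sketch of Toom-3's linear algebra; production implementations hard-code Π⁻¹ with exact small denominators instead of solving numerically):

import numpy as np

A = np.poly1d([24, 94, 16])                 # A(x) from the digits of a = 249416
B = np.poly1d([13, 37, 58])                 # B(x) from the digits of b = 133758
points = [2, 1, 0, -1, -2]
C_vals = [A(x) * B(x) for x in points]      # five products of ~n/3-digit numbers

Pi = np.vander(points, 5)                   # the matrix Π (rows: x^4, x^3, ..., 1)
c = np.linalg.solve(Pi, C_vals)             # c = Π⁻¹ C, the coefficients of C(x)
c = np.rint(c).astype(np.int64)             # the solution is exactly integral

assert int(np.polyval(c, 100)) == 249416 * 133758   # C(100) = ab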

Some more definitions

Notation: sequences are by default of length N: (x_k) = (x_k)_{k=0}^{N−1} = (x₀, x₁, ..., x_{N−1}).

Pointwise product:
(x_k) ⋅ (y_k) = (x_k y_k).   Cost = O(N·M(n)).

Cyclic convolution:
(x_k) * (y_k) = (Σ_j x_j y_{k−j}), where the index k − j is considered modulo N.   Cost = O(N²·M(n)).

Discrete Fourier transform:
(x̂_k) = ℱ{(x_k)},   x̂_k = Σ_j x_j e^(−2πi·kj/N).   Cost = O(N²·M(n)).
(x_k) = ℱ⁻¹{(x̂_k)},   x_k = (1/N)·Σ_j x̂_j e^(2πi·kj/N).   Cost = O(N²·M(n)).

Convolution theorem:
(x_k) * (y_k) = ℱ⁻¹{(x̂_k) ⋅ (ŷ_k)}.
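The definitions and the theorem, checked numerically with NumPy (whose np.fft follows the same sign conventions as above):

import numpy as np

N = 4
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 6.0, 7.0, 8.0])

# direct cyclic convolution: sum_j x_j * y_{(k-j) mod N}
direct = np.array([sum(x[j] * y[(k - j) % N] for j in range(N)) for k in range(N)])

# via the convolution theorem: F^{-1}( F(x) . F(y) )
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real

assert np.allclose(direct, via_fft)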

The expression ℤ[x]/⟨x^N − 1⟩ denotes the ring of polynomials with integer coefficients modulo x^N − 1; in other words, we have x^N = 1. Elements of ℤ[x]/⟨x^N − 1⟩ are represented by polynomials of degree less than N.

Polynomials:   P(x)     Q(x)     P(x)Q(x)
Coefficients:  (p_k)    (q_k)    (p_k) * (q_k)

For polynomials A(x), B(x) of degree less than N/2, the classical product coincides with the ring product. The coefficients (c_k) of C(x) = A(x)B(x) can then be calculated from

(c_k) = (a_k) * (b_k) = ℱ⁻¹{(â_k) ⋅ (b̂_k)},

though this is seemingly not more efficient…

Fast Fourier transform (FFT)

A fast Fourier transform is any discrete Fourier transform algorithm of complexity O(N log(N)·M(n)) rather than O(N²·M(n)). Popularised in 1965, when the Cooley-Tukey FFT was introduced. The same algorithm was known to Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno. Unfortunately, his work was first published after his death, and only in Latin.

(a_k), (b_k) ──────ℱ──────▶ (â_k), (b̂_k)
      │                           │
      *  O(N²·M(n))               ⋅  O(N·M(n))
      ▼                           ▼
    (c_k) ◀──────ℱ⁻¹────── (â_k) ⋅ (b̂_k)

Both ℱ and ℱ⁻¹ cost O(N log(N)·M(n)). The blue path (via ℱ, the pointwise product ⋅, and ℱ⁻¹) calculates (c_k) in O(N log(N)·M(n)) steps!

Schönhage-Strassen algorithm (SSA1, SSA2), 1971

Two algorithms developed by Arnold Schönhage and Volker Strassen in 1971. Description of SSA1 with numbers a, b of n digits (set N = n/log(n)):

a → A(x) = a_{N−1}·x^{N−1} + ... + a₀,   a = A(10^(n/N)),   (a_k) = (a₀, ..., a_{N−1}, 0, ..., 0),
b → B(x) = b_{N−1}·x^{N−1} + ... + b₀,   b = B(10^(n/N)),   (b_k) = (b₀, ..., b_{N−1}, 0, ..., 0).

Consider A(x), B(x) as elements of ℤ[x]/⟨x^(2N) − 1⟩; thus (a_k), (b_k) end with N zeroes to represent the absence of higher-order terms. To attain C(x) = A(x)B(x), we compute the coefficients with an FFT:

ℱ⁻¹{(â_k) ⋅ (b̂_k)} = (c_k).

The left-hand side contains O(N log(N)) multiplications of ≈ n/N-digit numbers; SSA1 is reapplied to these. The final product is then constructed from shifts and additions:

ab = C(10^(n/N)) = Σ_{k=0}^{2N−1} c_k·10^(kn/N).

M₁(n) = O(N log(N)·M₁(n/N)) = O(n·M₁(log(n)))  ⇒  M₁(n) = O(n·log(n)^(1+ε)).

SSA2 is more famous, elegant and popular. It is similar to SSA1, but uses the more ‘natural’ number-theoretic transform instead of the FFT. This algorithm attains a better complexity:

M₂(n) < 2√n·M₂(√n) + O(n log(n))  ⇒  M₂(n) = O(n log(n) log log(n)).
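A toy version of the SSA1 pipeline (a sketch under two simplifications: floating-point FFT in place of exact transforms, and Python's built-in product instead of recursing on the small coefficient multiplications; all names are mine):

import numpy as np

def fft_mul(a: int, b: int, chunk: int = 3) -> int:
    base = 10 ** chunk
    def coeffs(v):                            # digits of v in base 10^chunk
        out = []
        while v:
            v, r = divmod(v, base)
            out.append(r)
        return out or [0]
    ca, cb = coeffs(a), coeffs(b)
    N = 1
    while N < len(ca) + len(cb):              # room for the product, as in Z[x]/<x^2N - 1>
        N *= 2
    c = np.fft.ifft(np.fft.fft(ca, N) * np.fft.fft(cb, N)).real
    c = np.rint(c).astype(np.int64)           # the coefficients (c_k)
    return sum(int(ck) * base ** k for k, ck in enumerate(c))  # C(10^chunk): shifts + adds

assert fft_mul(249416, 133758) == 249416 * 133758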

The first Schönhage-Strassen multiplication algorithm in action:

249416 × 133758  =  R(100)
            ↕
(24x² + 94x + 16) × (13x² + 37x + 58),  i.e.  P(x) × Q(x) = R(x)
      ℱ ↓                  ↑ ℱ⁻¹
          P̂ ⋅ Q̂ = R̂        ← the recursion happens here

Schönhage-Strassen n log(n)-conjecture, 1971

The complexity of SSA hides a large constant factor (the algorithm becomes efficient around n > 10 000), but has the asymptotic behaviour M(n) = O(n log(n) log log(n)).

Is this optimal? In their article, Schönhage and Strassen write: “We do not believe that the size of n log(n) log log(n) is optimal, but suspect this for the order of magnitude n log(n).”

Despite rapid progression from 1962 to 1971, not much improvement on fast multiplication happened for the next 36 years. In 2007, a Swiss mathematician, Martin Fürer, discovered an algorithm whose complexity replaced the log log(n) factor with 2^(O(log*(n))). Unfortunately, the complexity hides an enormous constant factor; the algorithm is estimated to be faster than SSA only for ‘astronomically’ large data (n > 10^(10^4796)), making it a so-called ‘galactic’ algorithm. Still, n log(n) seemed within reach…

Harvey-van der Hoeven algorithm, 2019

By David Harvey and Joris van der Hoeven. Preprint added to Hyper Articles en Ligne (HAL) in March 2019. Of complexity n log(n). Key feature: arranging data in multiple dimensions.

When N = p_1 p_2 ⋯ p_d (distinct primes), the Chinese remainder theorem implies the ring isomorphism:

ℤ[x]/⟨x^N − 1⟩ ≈ ℤ[x_1, ..., x_d]/⟨x_1^(p_1) − 1, ..., x_d^(p_d) − 1⟩,   e.g. x ↔ x_1 x_2 ⋯ x_d.

This naturally rearranges the coefficients from a sequence into a d-dimensional array:

(c_k) ↔ (c_{k_1, k_2, ..., k_d}),    0 ≤ k_1 ≤ p_1 − 1, ..., 0 ≤ k_d ≤ p_d − 1,
(a_k) * (b_k) ↔ (a_{k_1, ..., k_d}) * (b_{k_1, ..., k_d})   (d-dimensional cyclic convolution).
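The correspondence, verified with NumPy for N = 15 = 3 × 5 (a sketch; the 2-dimensional cyclic convolution is computed with the 2-dimensional FFT):

import numpy as np

p1, p2 = 3, 5
N = p1 * p2
rng = np.random.default_rng(0)
a = rng.integers(0, 10, N).astype(float)
b = rng.integers(0, 10, N).astype(float)

# 1-D cyclic convolution (via the convolution theorem)
conv1 = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

# CRT re-indexing: k <-> (k mod p1, k mod p2)
A2 = np.zeros((p1, p2)); B2 = np.zeros((p1, p2))
for k in range(N):
    A2[k % p1, k % p2] = a[k]
    B2[k % p1, k % p2] = b[k]

# 2-D cyclic convolution of the rearranged arrays
conv2 = np.fft.ifft2(np.fft.fft2(A2) * np.fft.fft2(B2)).real

for k in range(N):
    assert np.isclose(conv1[k], conv2[k % p1, k % p2])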

Seemingly no improvement: the convolutions are equally expensive, even when exploiting classical FFTs… Harvey and van der Hoeven introduce two new devices.

Device 1: Efficient FFT for power-of-two sized arrays.

⊗_{j=1}^d ℂ^(t_j) denotes the set of complex-valued d-dimensional arrays of size t_1 × t_2 × ⋯ × t_d.

For powers of two t_1, t_2, ..., t_d, they construct an FFT

ℱ: ⊗_{j=1}^d ℂ^(t_j) ⟶ ⊗_{j=1}^d ℂ^(t_j),

consisting of a few medium-sized multiplications instead of many small ones.

On the previous slide we established an isomorphism

ℂ^N ⟷ ⊗_{j=1}^d ℂ^(p_j).

So how is this useful? The p_1, p_2, ..., p_d are distinct primes ⟹ not powers of two.

Device 2: Gaussian resampling.

They construct two cost-efficient linear maps 𝒜: ⊗_{j=1}^d ℂ^(p_j) ⟶ ⊗_{j=1}^d ℂ^(t_j) and ℬ: ⊗_{j=1}^d ℂ^(t_j) ⟶ ⊗_{j=1}^d ℂ^(p_j), satisfying ℱ = ℬ ∘ ℱ ∘ 𝒜.

Roughly how to construct 𝒜(u) from u ∈ ⊗_{j=1}^d ℂ^(p_j): Consider u as a p_1 × ⋯ × p_d grid, scaled to fit inside the d-dimensional unit torus (ℝ/ℤ)^d, with a complex number associated to each grid-point. We place a d-dimensional Gaussian at each grid-point, scaled by the associated number. We then superimpose a t_1 × t_2 × ⋯ × t_d grid v over u, and associate to each point of v the “sum of the Gaussians” of u evaluated at said point; the result is v = 𝒜(u). (Gaussians are convenient for two reasons: rapid decay and invariance under ℱ.)

[Figure: u = 11 × 13 grid (white), v = 16 × 16 grid (black).]
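A one-dimensional toy of the map 𝒜 (my own illustration of the idea, not the paper's construction): each entry u_j scales a narrow periodic Gaussian centred at j/p on the torus ℝ/ℤ, and v reads off the superposition on the power-of-two grid.

import numpy as np

def resample(u: np.ndarray, t: int, alpha: float = 40.0) -> np.ndarray:
    p = len(u)
    src = np.arange(p) / p                   # grid points of u on the torus
    dst = np.arange(t) / t                   # grid points of v
    d = dst[:, None] - src[None, :]          # pairwise distances ...
    d -= np.round(d)                         # ... wrapped to [-1/2, 1/2]
    return (u[None, :] * np.exp(-alpha * d ** 2)).sum(axis=1)

v = resample(np.ones(13), 16)                # a 13-point signal resampled to 16 points

Note the map is linear in u; the rapid decay keeps it cheap to apply, and the ℱ-invariance of Gaussians is what makes it compatible with the Fourier transform.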

Write P_d = ⊗_{j=1}^d ℂ^(p_j) and T_d = ⊗_{j=1}^d ℂ^(t_j). The algorithm takes the red path in the commutative diagram:

(a_k), (b_k) ∈ ℂ^N × ℂ^N  ∼  P_d × P_d ──𝒜──▶ T_d × T_d ──ℱ──▶ T_d × T_d
         │ *                     │ *  (Device 2)  │ *  (Device 1)  │ ⋅
         ▼                       ▼                ▼                ▼
  (c_k) ∈ ℂ^N         ∼         P_d  ◀───ℬ───   T_d  ◀──ℱ⁻¹──   T_d

M_HH(n) < K·n^((d−1)/d)·M_HH(n^(1/d)) + O(n log(n))   (where K is independent of d).

Choosing d > K, we obtain M_HH(n) = O(n log(n)).
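To see why d > K suffices (a back-of-the-envelope sketch, assuming the recursion as reconstructed above): substituting the inductive guess M_HH(m) ≤ C·m·log(m) at m = n^(1/d) into the recursive term gives

K·n^((d−1)/d) · C·n^(1/d)·log(n^(1/d)) = (CK/d)·n·log(n),

so the recursion feeds back only the fraction K/d < 1 of the bound, leaving room for the O(n log(n)) overhead, and the induction closes.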

True complexity of multiplication

We now know

M(n) = O(n log(n)),

but is this optimal? It is notoriously difficult to prove lower bounds in computation theory (cf. the P vs NP problem). The conventional belief is that n log(n) is optimal, but experts have been wrong before.

References

- A. A. Karatsuba, “The Complexity of Computations”, Proceedings of the Steklov Institute of Mathematics 211 (1995), 169-183.

- A. Schönhage and V. Strassen, “Schnelle Multiplikation großer Zahlen” [Fast multiplication of large numbers], Computing 7 (1971).

- D. Harvey and J. van der Hoeven, “Integer multiplication in time O(n log n)”, preprint, 2019. ⟨hal-02070778⟩

- D. Harvey, J. van der Hoeven, and G. Lecerf, “Even faster integer multiplication”, J. Complexity 36 (2016), 1-30. MR 3530637.

- P. Afshani, C. B. Freksen, L. Kamma, and K. G. Larsen, “Lower bounds for multiplication via network coding”, 2019.

- A. Kruppa, “Speeding up integer multiplication and factorization”, PhD thesis, Université Henri Poincaré - Nancy 1, 2010. NNT: 2010NAN10054. ⟨tel-01748662⟩

- D. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd ed., 1997.
- R. P. Brent and P. Zimmermann, Modern Computer Arithmetic, Cambridge Monographs on Applied and Computational Mathematics, vol. 18, Cambridge University Press, Cambridge, 2011. MR 2760886.

- Homepage of David Harvey: https://web.maths.unsw.edu.au/~davidharvey/

- Article by D. Harvey: https://theconversation.com/weve-found-a-quicker-way-to-multiply-really-big-numbers-114923

- (And some Wikipedia…)