
Data Sparse Computations - Lecture 3

Scribe: John Ryan, September 5, 2017

Contents

1 Application of Fast Fourier Transform

2 Project Topic Example

3 Beginning Fast Multipole Method

1 Application of Fast Fourier Transform

Discrete Convolution The discrete convolution is a common technique in signal processing. Suppose we have two signals, $x_i$ and $y_i$ for $i = 0, \dots, N-1$, which are both periodic with respect to $N$. That is,

$$x_{i+jN} = x_i, \qquad y_{i+jN} = y_i \qquad (1)$$

for any integer $j$. We’d like to compute the discrete convolution of $x$ with $y$, which is defined as

$$g_k = (x * y)_k = \sum_{n=0}^{N-1} x_n y_{k-n} \qquad (2)$$

for $k = 0, \dots, N-1$. (Scribe note: for some visual explanations of convolution, the reader may visit https://en.wikipedia.org/wiki/Convolution and check out the "Visual Explanation" subsection.)
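(Scribe note: as a concrete reference, here is a direct $O(N^2)$ evaluation of equation (2) as a small Python/NumPy sketch; the function name is mine.)

```python
import numpy as np

def circular_convolution(x, y):
    """Direct O(N^2) evaluation of equation (2): g_k = sum_n x_n * y_{k-n}.
    Indices of y are reduced mod N, which is valid by periodicity (1)."""
    N = len(x)
    return np.array([sum(x[n] * y[(k - n) % N] for n in range(N))
                     for k in range(N)])
```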

Convolution as a Matrix/Vector Multiplication Notice that (2) can be written as $g = Yx$, where $g$ is a column vector with elements $g_k = (x * y)_k$, $x$ is a column vector with elements $x_k$, and $Y$ is an $N \times N$ matrix. By examining (2), we can deduce that the first row of the matrix $Y$ should be

$$Y_{0,:} = \{y_0, y_{-1}, y_{-2}, \dots, y_{-(N-1)}\}$$

Similarly, the second row should be

$$Y_{1,:} = \{y_1, y_0, y_{-1}, \dots, y_{-(N-2)}\}$$

Now, we may take advantage of the periodicity of the signal $y$. Namely, we note that

$$y_{-1} = y_{N-1}, \qquad y_{-2} = y_{N-2},$$

and so on (see equation (1)). With this in mind, the rows of the matrix $Y$ can now be written as

$$Y_{0,:} = \{y_0, y_{N-1}, y_{N-2}, \dots, y_1\}$$

Figure 1: An example of a discrete convolution, taken from https://tex.stackexchange.com/questions/328627/compute-convolution-of-discrete-signals-in-tikz.

Figure 2: An example of a circulant matrix.

$$Y_{1,:} = \{y_1, y_0, y_{N-1}, \dots, y_2\}$$

and so on. Notice the pattern here: a given row is almost the same as the row above it in the matrix, except that the last element has been made the first, and every other element has been shifted to the right. Indeed, this is true for every row of the $Y$ matrix, and as a result, the $Y$ matrix (as well as any other matrix satisfying the property that each row is a shift of the one above/below it, with an element on the end "wrapping around") is called a circulant matrix. See Figure 2.

Well, that’s nice, but how does this help us? Recall that we’re trying to compute the matrix/vector product $g = Yx$. It turns out that circulant matrices interact nicely with the Fourier transform, in a way that will provide us a smart way to perform the above operation.

Fourier Transform of a Convolution Recall from last time that the Discrete Fourier Transform of a signal (vector) $x$ is

$$y_k = \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{N-1} x_j e^{-2\pi i jk/N}$$

and can be written as a matrix/vector product:

$$y = Fx$$

Recall also that we have an algorithm, the Fast Fourier Transform, which allows us to compute the Fourier transform in $O(n \log n)$ time, where $n$ is the size of the original signal $x$. It turns out (and we will show) that if $Y$ is a circulant matrix, then $FYF^*$ is diagonal. Let us start by left-multiplying both sides of $g = Yx$ by $F$:

$$Fg = FYx$$

$$\hat{g}_k = \sum_{l=0}^{N-1} \sum_{n=0}^{N-1} x_n y_{l-n} e^{-2\pi i lk/N}$$

Now we multiply the right side by $e^{-2\pi i k(n-n)/N}$ (which is equal to 1) and we get

$$\hat{g}_k = \sum_{l=0}^{N-1} \sum_{n=0}^{N-1} x_n y_{l-n} e^{-2\pi i lk/N} e^{-2\pi i k(n-n)/N}$$

$$= \sum_{n=0}^{N-1} x_n e^{-2\pi i kn/N} \sum_{l=0}^{N-1} y_{l-n} e^{-2\pi i k(l-n)/N}$$

Now we notice that, due to the periodicity of $y$, the inner sum constitutes a Fourier transform of $y$ regardless of what $n$ is. Therefore, we have

$$\hat{g}_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i kn/N} \, \hat{y}_k = \hat{x}_k \hat{y}_k$$

Ah! In matrix/vector form, this says that

$$Fg = \mathrm{Diag}(\hat{y}) Fx$$

where $\mathrm{Diag}(\hat{y})$ is a diagonal matrix whose elements are $D_{ii} = \hat{y}_i$. Then, left-multiplying by $F^*$ (which is the inverse of $F$), we get

$$g = F^* \mathrm{Diag}(\hat{y}) F x \qquad (3)$$

which, since $g = Yx$, implies that $F^* \mathrm{Diag}(\hat{y}) F = Y$, or $\mathrm{Diag}(\hat{y}) = FYF^*$, which completes our proof that $FYF^*$ is diagonal.

So now, instead of the naive computation of $Yx$ that would take $O(n^2)$ steps, we may perform the operations in equation (3):

1. First, FFT $x$ to get $\hat{x}$ – $O(n \log n)$ steps,
2. then FFT $y$ to get $\hat{y}$ – $O(n \log n)$ steps,
3. then multiply $\hat{x}$ by $\mathrm{Diag}(\hat{y})$ – $O(n)$ steps,
4. then finally perform an inverse Fourier transform to get $g$ – $O(n \log n)$ steps.

This takes $O(n \log n)$ time in total.
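(Scribe note: here is a minimal Python/NumPy sketch of these four steps; the function name and the sanity check are mine. NumPy's FFT convention carries no $\frac{1}{\sqrt{2\pi}}$ factor, so the identity holds exactly as written.)

```python
import numpy as np

def circulant_multiply(c, x):
    """Compute g = C x, where C is the circulant matrix with first column c,
    using the factorization in equation (3)."""
    x_hat = np.fft.fft(x)        # step 1: FFT of x, O(n log n)
    c_hat = np.fft.fft(c)        # step 2: FFT of c, O(n log n)
    g_hat = c_hat * x_hat        # step 3: multiply by Diag(c_hat), O(n)
    return np.fft.ifft(g_hat)    # step 4: inverse FFT, O(n log n)

# Sanity check against the naive O(n^2) product with the circulant Y
# built from the signal y as in the text: Y[i, j] = y[(i - j) mod N].
rng = np.random.default_rng(0)
N = 8
y, x = rng.standard_normal(N), rng.standard_normal(N)
Y = np.array([[y[(i - j) % N] for j in range(N)] for i in range(N)])
assert np.allclose(Y @ x, circulant_multiply(y, x).real)
```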

There are two other important classes of matrices that we will talk about today:

1. Toeplitz matrices, whose entries $T_{ij}$ depend only on $i - j$. You can imagine that the "diagonals" are constant. Example: circulant matrices.

2. Hankel matrices, whose entries $H_{ij}$ depend only on $i + j$. You can imagine that the "anti-diagonals" are constant. Example: Figure 2's reflection in a mirror.

Exercise: you can embed a Toeplitz or a Hankel matrix of size $N$ into a circulant matrix of size $2N - 1$, and speed up matrix multiplication that way. How? (Scribe's answer at end of document.)

2 Project Topic Example

Suppose we have

$$F(\omega) = \sum_{t=0}^{N-1} x(t)\, e^{-it\omega}$$

So far, we have considered this for evenly spaced $\omega_k$, like so:

$$\omega_k = \frac{2\pi k}{N}$$

This is an equispaced transform. What about $\omega_k$ that aren't evenly spaced? Or even, instead of summing over $t = 0, 1, 2, \dots$, what about summing over $t_k$ for $k = 0, 1, 2, \dots$, where the $t_k$ aren't necessarily evenly spaced? How do we approach such transforms algorithmically? This leads to a family of algorithms known as USFFTs (unequally spaced FFTs) or NUFFTs (non-uniform FFTs); a sketch of the sum they accelerate appears after the list below. A potential plan for the final project could be to

Figure 3: An example of an uneven frequency sampling. How do we efficiently and accurately transform to spatial coordinates?

• look at and describe/discuss these algorithms (similarities/differences),

• implement some of them, look at how they scale (whether it's as expected or not), and explore their behavior and properties, and

• look into extensions into 2D, with applications. An example of an application might be the procedure by which many modern medical imaging machines infer the density of a body part by firing signals at it from many different angles and seeing what goes through. The samples one gets from this procedure are not evenly spaced in frequency space, so something special needs to be done to convert the raw data into information about what's going on in spatial coordinates.
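(Scribe note: for concreteness, here is the direct, naive evaluation of such a non-uniform transform as a Python/NumPy sketch, with made-up sample data; a NUFFT's job is to approximate this $O(NM)$ sum in roughly $O(N \log N)$ time.)

```python
import numpy as np

def nudft(x, t, omega):
    """Directly evaluate F(omega_k) = sum_j x_j * exp(-1j * t_j * omega_k)
    for non-uniform sample locations t_j and frequencies omega_k.
    Cost is O(N * M); this is the sum a NUFFT approximates quickly."""
    return np.exp(-1j * np.outer(omega, t)) @ x

# Made-up non-uniform times and frequencies.
rng = np.random.default_rng(1)
N = 128
t = np.sort(rng.uniform(0.0, 2 * np.pi, N))  # unevenly spaced sample points
omega = np.sort(rng.uniform(0.0, N, N))      # unevenly spaced frequencies
x = rng.standard_normal(N)
F = nudft(x, t, omega)
```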

3 Beginning Fast Multipole Method

Now we’re interested in computing

$$\Phi(x_i) = \sum_{j=1}^{N} K(x_i, x_j) q_j \qquad (4)$$

for $i = 1, \dots, N$. The $q_j$ are weights, or charges. For now, let's say $K(x_i, x_j) = \log(\|x_i - x_j\|)$ when $x_i \neq x_j$, and $K(x_i, x_j) = 0$ when $x_i = x_j$. Much like in our previous discussions, we can consider (4) as a matrix/vector product:

$$\Phi = Kq$$

where $K$ is a matrix defined by $K_{ij} = K(x_i, x_j)$.

Enter the Fast Multipole Method (FMM), which approximates the solution $\Phi$ in $O(N \log^{d-1}(1/\epsilon))$ time, where $d$ is the dimension and $\epsilon$ is a measure of the approximation error. This runtime is accurate as $\epsilon \to 0$. It does this by approximating $K(x, y)$ when it seems reasonable. For example, suppose you have a set of $M$ "targets" $x_i$ and a set of $N$ "sources" $y_i$, and the sources are far from the targets in space, so that $K(x, y)$ is smooth. Suppose further that your kernel can be approximated like so:

$$K(x, y) \approx \sum_{l=1}^{p} u_l(x) v_l(y)$$

which is to say, there is a low-rank approximation for the kernel. Then (4) may be written as

$$\Phi(x_i) = \sum_{j=1}^{N} \sum_{l=1}^{p} u_l(x_i) v_l(y_j) q_j$$

(Scribe’s note: I believe the switch from $x$ to $y$ is meant to emphasize that we are now discussing sets of sources and targets, rather than just some set of points.) Reordering the sums, we get

$$\Phi(x_i) = \sum_{l=1}^{p} u_l(x_i) \sum_{j=1}^{N} v_l(y_j) q_j$$

Notice that the inner sum has no dependence on $i$. Therefore, we may start by performing that sum for every $l$ (which will take $O(Np)$ time, since there are $N$ sources), and then for each $i$ we only need to compute the outer sum (which takes $O(p)$ time, and there are $M$ different values of $i$, so in total it takes $O(Mp)$). Therefore, we can find the entire $\Phi$ vector in $O(Np + Mp)$ time.
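(Scribe note: here is a minimal Python/NumPy sketch of this reordering, using a made-up separable kernel with monomial factors $u_l(x) = x^l$, $v_l(y) = y^l$; a real FMM would instead use a truncated multipole expansion of $\log\|x - y\|$.)

```python
import numpy as np

# Separable stand-in kernel: K(x, y) = sum_l u_l(x) v_l(y) with p terms.
rng = np.random.default_rng(2)
M, N, p = 50, 80, 3
x = rng.uniform(2.0, 3.0, M)   # targets, well separated from the sources
y = rng.uniform(0.0, 1.0, N)   # sources
q = rng.standard_normal(N)     # charges

U = np.stack([x**l for l in range(p)], axis=1)  # M x p, entries u_l(x_i)
V = np.stack([y**l for l in range(p)], axis=1)  # N x p, entries v_l(y_j)

w = V.T @ q    # inner sums over the sources: O(Np)
Phi = U @ w    # outer sum for each target:   O(Mp)

# Agrees with the naive O(MN) product against the full kernel matrix.
K = U @ V.T
assert np.allclose(Phi, K @ q)
```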

Linear Algebra Interpretation If we just naively multiplied $Kq$ to get $\Phi$, that would take us $O(MN)$ time. However, with the FMM, we're essentially giving $K$ a low-rank approximation:

$$K \approx uv^T$$

where $u$ is an $M \times p$ matrix and $v$ is an $N \times p$ matrix. Then $\Phi = Kq \approx uv^T q$ takes $O(Mp + Np)$ steps.

Scribe’s answer to exercise: At first I tried typing it up, but it didn't go well. The following pages contain the answer for Toeplitz matrices; for Hankel matrices everything is the same, except that $C$ is formed in a slightly different way. Note: there is a slightly better approach than what I wrote. Instead of doing an interlacing like I did, you can make the first row of $C$ be the first row of $A$ followed by the first $N - 1$ entries of the last row of $A$. If you make a circulant matrix out of this, the upper-left $N \times N$ block will be $A$, and in the final result, the vector we desire will be the first $N$ entries of the output, rather than being every other entry as in the images below. Since it is easier to read/write sequential memory, this approach is better.
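(Scribe note: in case the handwritten pages are hard to read, here is a minimal Python/NumPy sketch of the better approach described above. Function names and the test are mine; the circulant is specified by its first column, which is equivalent to specifying its first row.)

```python
import numpy as np

def circulant_multiply(c, x):
    """g = C x for the circulant C with first column c (see Section 1)."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

def toeplitz_multiply(col, row, x):
    """Compute A @ x for the N x N Toeplitz matrix A with first column `col`
    and first row `row` (col[0] == row[0]), by embedding A as the upper-left
    block of a circulant matrix C of size 2N - 1."""
    N = len(x)
    # First column of C: (a_0, a_1, ..., a_{N-1}, a_{-(N-1)}, ..., a_{-1}).
    c = np.concatenate([col, row[:0:-1]])
    # Pad x with N - 1 zeros, apply C via the FFT, keep the first N entries.
    x_pad = np.concatenate([x, np.zeros(N - 1)])
    return circulant_multiply(c, x_pad)[:N]

# Sanity check against an explicitly built Toeplitz matrix A[i, j] = a_{i-j}.
rng = np.random.default_rng(3)
N = 6
col, row = rng.standard_normal(N), rng.standard_normal(N)
row[0] = col[0]
A = np.array([[col[i - j] if i >= j else row[j - i] for j in range(N)]
              for i in range(N)])
x = rng.standard_normal(N)
assert np.allclose(A @ x, toeplitz_multiply(col, row, x).real)
```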
