

Second Exam Presentation
Low Rank Approximation
John Svadlenka

City University of New York Graduate Center

Date Pending

Outline

1 Introduction

2 Classical Results

3 Approximation and Probabilistic Results

4 Randomized Algorithms - Strategies and Benefits

5 Research Activity

6 Open Problems and Future Research Directions

Problem Definition

Given an m × n matrix A, we are often interested in approximating A as the product of an m × k matrix B and a k × n matrix C.

A ≈ B · C

Why?

Provided that k ≪ min(m, n):

- Arithmetic cost of a matrix-vector product is 2(m + n)k
- Storage space for the factors B and C is (m + n)k
- (m + n)k ≪ m · n
- We denote the product B · C as a rank-k approximation of A
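As a concrete illustration of these savings, here is a minimal numpy sketch (the dimensions m, n, k are illustrative assumptions) comparing the storage and matrix-vector cost of the factored form B · C with those of the full matrix:

```python
import numpy as np

# Illustrative sizes only: k is much smaller than min(m, n).
rng = np.random.default_rng(0)
m, n, k = 5000, 4000, 20
B = rng.standard_normal((m, k))
C = rng.standard_normal((k, n))
x = rng.standard_normal(n)

y = B @ (C @ x)                            # about 2(m + n)k flops instead of 2mn
print("factored storage:", (m + n) * k)    # 180,000 entries
print("full storage:    ", m * n)          # 20,000,000 entries
```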

More formally, we seek a rank-k matrix approximation Âk of A for some ε > 0 such that:

‖A − Âk‖ ≤ (1 + ε)‖A − Ak‖

Ak is the theoretical best rank-k approximation of A

Matrix norms are the Frobenius norm ‖·‖F or the spectral norm ‖·‖2

‖A‖F² := Σ_{i,j=1}^{m,n} |aij|²        ‖A‖2 := sup_{‖v‖2 = 1} ‖Av‖2

Ak can be computed from the SVD with cost O((m + n)mn). So we seek less costly approaches. Why?

Suppose m = n and compare mn(m + n) = 2n³ with n² log n:

n         n³           n² log n
10        1,000        332
100       1.00e+06     66,400
1,000     1.00e+09     1.00e+07
10,000    1.00e+12     1.33e+09

Consider the above statistics in light of some recent trends:

- Conventional LRA does not scale for Big Data purposes
- Approximation algorithms are increasingly preferred
- Applications utilizing numerical linear algebra are expanding beyond traditional scientific and engineering disciplines

Conventional LRA algorithms generate decompositions; the most important of these are the SVD, Rank-Revealing QR (RRQR), and Rank-Revealing LU (RRLU):

Singular Value Decomposition (SVD) [Eckart-Young]
Let A be an m × n matrix, possibly with complex elements, and let r = rank(A). Then there exist two unitary matrices U and V and an m × n diagonal matrix Σ with nonnegative elements σi, where σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and σj = 0 for j > r, such that:

A = UΣV*

U and V are m × m and n × n, respectively.

QR Decomposition Let A be an m × n matrix with m ≥ n whose elements may be complex. Then there exists an m × n matrix Q and an n × n matrix R such that A = QR where the columns of Q are orthonormal and R is upper triangular.

- Cost O(mn min(m, n)) is lower than that for the SVD
- There are several efficient strategies to orthogonalize A
- Column i of A is a linear combination of the columns of Q with the coefficients given by column i of R

The LRA problem is also significant for these related subjects:

- Principal Component Analysis
- Clustering Algorithms
- Tensor Decomposition
- Rank Structured Matrices

But a series of recent trends has provided impetus for new approaches to LRA...

Consider these examples of Emerging Applications and Big Data:

- New disciplines: Machine Learning, Data Science, Image Processing
- Modern Massive Data Sets from Physical Systems Modelling, Sensor Measurements, Internet
- New Fields: Recommender Systems, Complex Systems Science

Classical LRA algorithms and their implementations, though well-developed over many years, are characterized by:

- Limited parallelization opportunities
- Relatively high computational complexity
- Memory bottlenecks with out-of-core data sets

[Eckart-Young Theorem]
Let A ∈ C^{m×n} and let Ak be the truncated SVD of rank k, where Uk, Vk, and Σk are m × k, n × k, and k × k, respectively. We have:

Ak = Uk Σk Vk*

Then the approximation errors are defined as below. Furthermore, these are the smallest errors of any rank-k approximation of A.

‖A − Ak‖2 = σ_{k+1}

‖A − Ak‖F = ( Σ_{j=k+1}^{min(m,n)} σj² )^{1/2}

Given a rank-k SVD representation of a matrix we may generate its low rank format:

Ak = Uk · (Σk Vk*)

Other decompositions consist of matrix factors being orthogonal or having a row and/or column subset of the original matrix:

- RRQR
- UTV
- CUR
- Interpolative Decomposition (ID) (one-sided and two-sided)

We may generate a low rank format similarly with:

W = CUR = [CU]R = C[UR]
W = UTV = (UT)V = U(TV)
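To make the Eckart-Young statement and the low-rank format concrete, here is a minimal numpy sketch (the matrix and rank are illustrative assumptions) that forms the truncated SVD and checks the spectral and Frobenius error formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 120))
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]        # truncated SVD of rank k

# Eckart-Young error formulas
print(np.linalg.norm(A - Ak, 2), s[k])                           # sigma_{k+1}
print(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))  # tail energy

# low rank format: store only the two factors Uk and (Sigma_k Vk*)
B = U[:, :k]
C = np.diag(s[:k]) @ Vt[:k]
print(np.allclose(Ak, B @ C))
```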

Existence of a QR factorization for any matrix can be proven in many ways. For example, it follows from Gram-Schmidt orthogonalization: Theorem

Suppose (a1, a2, ..., an) is a linearly independent list of vectors of a fixed dimension. Then there is an orthonormal list of vectors (q1, q2, ..., qn) such that span(a1, a2, ..., an) = span(q1, q2, ..., qn).

Shortcomings of the Gram-Schmidt QR algorithm with respect to LRA:

- Problem: The algorithm may fail if rank(A) < n
  Solution: Introduce a column pivoting strategy
  Impact: A = QRP where P is a permutation matrix
- Problem: Rounding error impacts orthogonalization
  Solution: Normalize qi before computing qi+1
  Solution: Compute the qi's up to some epsilon tolerance
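The following is a minimal sketch (an assumed illustration, not the presentation's own routine) of modified Gram-Schmidt with column pivoting, normalization at each step, and a tolerance-based stop, addressing the problems listed above:

```python
import numpy as np

def pivoted_mgs_qr(A, tol=1e-10):
    """Modified Gram-Schmidt with column pivoting.
    Returns Q (m x r), R (r x n) and the permutation perm such that
    A[:, perm] ~= Q @ R, stopping once the largest remaining column
    norm drops below tol (the numerical rank r)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    perm = np.arange(n)
    r = 0
    for i in range(n):
        # pivot: move the remaining column of largest norm to position i
        norms = np.linalg.norm(A[:, i:], axis=0)
        j = i + int(np.argmax(norms))
        if norms[j - i] < tol:
            break
        A[:, [i, j]] = A[:, [j, i]]
        R[:, [i, j]] = R[:, [j, i]]
        perm[[i, j]] = perm[[j, i]]
        # normalize q_i before the next step
        R[i, i] = np.linalg.norm(A[:, i])
        Q[:, i] = A[:, i] / R[i, i]
        # orthogonalize the remaining columns against q_i
        R[i, i+1:] = Q[:, i] @ A[:, i+1:]
        A[:, i+1:] -= np.outer(Q[:, i], R[i, i+1:])
        r += 1
    return Q[:, :r], R[:r], perm
```

Keeping only the first k pivoted columns of Q and the first k rows of R then yields a rank-k approximation of A, up to the column permutation.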

Skeleton (CUR) Decomposition Theorem
Let A be an m × n real matrix with rank(A) = k. Then there exists a nonsingular k × k submatrix Â of A.

Moreover, let I and J be the index sets of the rows and columns of A, respectively, that form Â. Then A = CUR where U = Â⁻¹, C = A(1..m, J), and R = A(I, 1..n).

- A set of k columns and rows captures A's column and row spaces
- The skeleton is in contrast to the SVD's left and right singular vectors
- Can use QRP or LUP algorithms to find the submatrix Â

Interpolative Decomposition Lemma Suppose A is an m × n matrix of rank k whose elements may be complex. Then there exists an m × k matrix B consisting of a subset of columns of A and a k × n matrix P such that: A = B · P

The k × k identity matrix Ik appears in some column subset of P

|pij | ≤ 1 for all i and j

- ID is more appropriate for data analysis purposes
- Also appropriate if properties of A are required in the decomposition factors
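A minimal sketch of a one-sided ID via column-pivoted QR (an assumed construction for illustration; the computed coefficients are not guaranteed to satisfy |pij| ≤ 1, unlike the existential lemma above):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def one_sided_id(A, k):
    """A ~= B @ P, where B holds k actual columns of A and P contains
    a k x k identity block in those column positions."""
    _, R, piv = qr(A, mode='economic', pivoting=True)
    B = A[:, piv[:k]]                              # skeleton columns of A
    T = solve_triangular(R[:k, :k], R[:k, k:])     # coefficients for the rest
    P = np.zeros((k, A.shape[1]))
    P[:, piv[:k]] = np.eye(k)
    P[:, piv[k:]] = T
    return B, P
```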

What type of decomposition is better? It depends...

NLA Theoretician’s point of view: orthogonal matrices are better.

- Input error propagation is minimized
- Orthogonal bases reduce the amount of arithmetic
- They preserve vector and matrix properties in multiplication
- But they are not easy to understand for data analysis

Data Analyst's perspective: submatrices are better.

- Preserve structural properties of the original matrix
- Easier to understand in application terms
- But may not be well-conditioned

The case for approximation approaches to LRA? A large set of results concerning:

- Random matrices and subspace projections
- Existential results for rank-k approximations
- Column and/or row sampling
- Matrix skeletons (CUR) and volume maximization

New algorithmic approaches:

- Process some matrix much smaller than the original
- Provide arbitrary accuracy up to machine precision
- Employ adaptive and non-adaptive strategies
- Separate randomized and deterministic processing

Johnson-Lindenstrauss Lemma [1984]
Let X1, X2, ..., Xn ∈ R^d. Then for ε ∈ (0, 1) there exists Φ ∈ R^{k×d} with k = O(ε⁻² log n) such that:

(1 − ε)‖Xi − Xj‖2 ≤ ‖ΦXi − ΦXj‖2 ≤ (1 + ε)‖Xi − Xj‖2

Distances among vectors in Euclidean space approximately preserved in lower dimensional space independent of d

Matrix vector multiplication is O(d log n) for each Xi

Dasgupta and Gupta (2003) proved that standard Gaussian matrices with i.i.d. N(0, 1) can be used for Φ

Achlioptas (2003) showed that random {+1, −1} entries suffice.
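A minimal numerical illustration of the lemma (the point count, dimension, and ε are assumptions chosen for the example), projecting with a scaled i.i.d. Gaussian Φ in the style of Dasgupta and Gupta:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 50, 10_000, 0.25
k = int(np.ceil(4 * np.log(n) / eps**2))        # k = O(eps^-2 log n)

X = rng.standard_normal((n, d))
Phi = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian map
Y = X @ Phi.T

ratio = pdist(Y) / pdist(X)                     # pairwise distance distortion
print(k, ratio.min(), ratio.max())              # typically within [1-eps, 1+eps]
```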

Next major result: matrix vector multiplication in O(d log d + |P|).

Fast Johnson-Lindenstrauss Transform [Ailon Chazelle 2006]
Let Φ = PHD, where P ∈ R^{k×d}, H, D ∈ R^{d×d}, and d = 2^l.

Pij ∼ N(0, q⁻¹) with probability q; Pij = 0 with probability 1 − q; q = min(Θ(log² n / d), 1)

H2 = [ d^{-1/2}  d^{-1/2} ; d^{-1/2}  −d^{-1/2} ]   and   H_{2q} := [ Hq  Hq ; Hq  −Hq ],  q = 2^h, h = 1, ..., l

D is a diagonal matrix with dii drawn uniformly from {1, −1}. Then we have with probability 2/3 that:

(1 − ε)‖Xi‖2 ≤ ‖ΦXi‖2 ≤ (1 + ε)‖Xi‖2

Relative-Error Bound (Frobenius norm) [Sarlós 2006]
Let A ∈ R^{m×n}. If Φ is an r × n J-L transform with i.i.d. zero-mean {−1, +1} entries, r = Θ(k/ε + k log k), and ε ∈ (0, 1), then with probability ≥ 0.5 we have:

‖A − Proj_{AΦᵀ,k}(A)‖F ≤ (1 + ε)‖A − Ak‖F

where Proj_{AΦᵀ,k}(A) is the best rank-k approximation of the projection of A onto the column space of AΦᵀ.

Papadimitriou et al. (2000) first applied random projections for Latent Semantic Indexing (LSI) and derived an additive error bound result.

A relative-error bound in the spectral norm uses a power iteration to offset any slow singular value decay.

Relative-Error Bound (Spectral norm) [Halko et al. 2011]
Let A ∈ R^{m×n}. If B is an n × 2k Gaussian matrix and Y = (AA*)^q AB, where q is a small non-negative integer and 2k is the target rank with 2 ≤ k ≤ 0.5 min{m, n}, then:

E ‖A − Proj_{Y,2k}(A)‖2 ≤ ( 1 + 4 √( 2 min(m, n) / (k − 1) ) )^{1/(2q+1)} ‖A − Ak‖2

- A power iteration amplifies the decay of A's singular values (σj becomes σj^{2q+1}), improving accuracy
- A refined proof [Woodruff 2014] gave a rank-k approximation

From the Relative-Error Bound results of Sarlós and Halko et al.:

With l > k random linear combinations of A's columns ⇒ we can obtain a rank-k approximation of A

How and why?

- Multiplying A by a random vector x gives y ∈ colspace(A)
- With high probability the y's are linearly independent
- We get a new approximate basis Â for A with dimension l
- Project A onto Â
- Get a rank-k matrix approximation of this projection

Consider the existence result of Ruston (1962) for a collection of k columns, C, of A ∈ R^{m×n}:

‖A − CC†A‖2 ≤ √(1 + k(n − k)) ‖A − Ak‖2

The CX approximation is A ≈ CX where X := C †A

Sampling with Euclidean norms of matrix columns [Frieze Kannan Vempala 2004] to get additive error bounds

Sampling according to the top right singular vectors [Boutsidis Mahoney Drineas 2010] for relative error bounds
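A minimal sketch of a CX approximation with column sampling proportional to squared Euclidean column norms, in the spirit of Frieze-Kannan-Vempala (an illustrative assumption; their analysis also rescales the sampled columns, omitted here since rescaling does not change CC†A):

```python
import numpy as np

def cx_approx(A, c, rng=None):
    """Sample c columns with probability proportional to squared column
    norms, form C, and set X = pinv(C) @ A so that A ~= C @ X."""
    rng = rng or np.random.default_rng(0)
    p = np.sum(A * A, axis=0)
    p = p / p.sum()
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, cols]
    X = np.linalg.pinv(C) @ A
    return C, X
```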

Another approach to LRA extends column sampling to also include row sampling:

- Extensions to both CX probability distribution approaches
- Approximation error proportional to the square of the CX error

General Approach:

1. Sample c columns of A to get C as in CX
2. Sample r rows from A using a probability distribution constructed from C
3. Re-scale the selected rows and columns
4. Additional processing steps to get an LRA

More recent directions include CUR with volume sampling:

Pseudo-Skeleton Approximation [Goreinov et al. 1997]
Suppose A ∈ R^{m×n}. Then there exists a set of k columns and rows, C and R, in A as given by their index sets c and r, respectively, and a matrix U ∈ R^{k×k} such that:

‖A − CUR‖2 ≤ O(√k (√m + √n)) ‖A − Ak‖2

Maximal Volume for LRA [Goreinov and Tyrtyshnikov 2001]
Suppose Â is a CUR approximation of the form given above and U = A(r, c)⁻¹. If A(r, c) has maximal determinant modulus among all k × k submatrices of A, then

‖A − Â‖C ≤ (k + 1)‖A − Ak‖2

where ‖·‖C denotes the entrywise maximum (Chebyshev) norm.

CUR approximation of A depends on finding a sufficiently large volume submatrix:

- The submatrix is the intersection of C and R in the CUR
- Volume quantifies the orthogonality of matrix columns
- It is NP-hard to find a submatrix of maximal volume
- Greedy algorithms find approximate maximal volume

This random projection algorithm follows from J-L Lemma and Relative-Error Bound Results:

Algorithm 1: Dimension Reduction [Halko et al. 2011]

Input: A ∈ R^{m×n}; rank k; oversampling parameter p
Output: B ∈ R^{m×(k+p)}, C ∈ R^{(k+p)×n}

1. l ← k + p
2. Construct a random Gaussian matrix G ∈ R^{n×l}
3. Y ← A · G
4. Compute an orthonormal basis matrix Q for Y
5. B ← Q; C ← Q* · A
6. Output B, C

To get a rank-l SVD approximation of A from the algorithm output:

1. Run an SVD algorithm on the matrix C = Û Σ V*
2. U ← B · Û

Comments on the algorithm:

- The algorithm itself uses conventional steps on smaller matrices
- Matrix-matrix multiplication (a block operation) is preferable for A
- The costliest step is Y ← A · G, requiring O(mnl) ops
- The QR factorization may avoid the overhead of column pivoting
- The oversampling parameter is typically higher with other random matrices
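The following is a minimal numpy sketch of Algorithm 1 together with the SVD completion step (the power-iteration parameter q is included to reflect the spectral-norm bound of Halko et al.; the default parameters are illustrative assumptions):

```python
import numpy as np

def randomized_svd(A, k, p=10, q=0, rng=None):
    rng = rng or np.random.default_rng(0)
    m, n = A.shape
    l = k + p
    G = rng.standard_normal((n, l))     # random Gaussian multiplier
    Y = A @ G                           # costliest step: O(mnl)
    for _ in range(q):                  # optional power iteration, (AA*)^q AG
        Y = A @ (A.T @ Y)               # re-orthonormalizing here helps stability
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for range(Y)
    C = Q.T @ A                         # small (k+p) x n matrix
    Uh, s, Vt = np.linalg.svd(C, full_matrices=False)
    U = Q @ Uh
    return U[:, :k], s[:k], Vt[:k]      # truncate to rank k

# usage: U @ np.diag(s) @ Vt approximates the best rank-k SVD of A
```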

Other possibilities:

Introduce parallelism for the matrix-matrix multiplication

SRFT/SRHT random multipliers reduce the multiplication cost to O(mn log l)

Superfast abridged (sparse) versions of SRFT/SRHT allow further cost reduction, though with no probability guarantee.

The Subsampled Random Hadamard Transform (SRHT) is √(n/l) DHR

- D ∈ C^{n×n} is a diagonal matrix of random {−1, +1} entries
- H is the n × n Hadamard matrix
- R is an n × l matrix whose columns are random columns of the n × n identity matrix
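A minimal sketch of applying such a multiplier on the right of A without forming it explicitly (the fast Walsh-Hadamard routine below is an assumed helper; n must be a power of two):

```python
import numpy as np

def fwht_rows(X):
    """Unnormalized fast Walsh-Hadamard transform of each row of X."""
    X = X.copy()
    n = X.shape[1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = X[:, i:i + h].copy()
            b = X[:, i + h:i + 2 * h].copy()
            X[:, i:i + h] = a + b
            X[:, i + h:i + 2 * h] = a - b
        h *= 2
    return X

def srht_sketch(A, l, rng=None):
    """Y = A @ (sqrt(n/l) D H R): flip column signs (D), apply the
    normalized Hadamard transform (H) in O(mn log n), sample l columns (R)."""
    rng = rng or np.random.default_rng(0)
    m, n = A.shape                              # n assumed a power of two
    d = rng.choice([-1.0, 1.0], size=n)
    AH = fwht_rows(A * d) / np.sqrt(n)          # A D H with H normalized
    cols = rng.choice(n, size=l, replace=False)
    return np.sqrt(n / l) * AH[:, cols]
```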

Gaussian random matrices:

- Have to generate n × l entries; multiplication is also expensive
- Probability of failure is 3e^{−p}

Fast SRFT/SRHT:

- Recursive divide and conquer ⇒ smaller complexity cost
- Only n + l random entries needed
- Probability of failure rises: O(1/k) for a rank-k approximation
- Non-sequential memory access ⇒ memory bottlenecks

In general, desirable properties of random multipliers include:

- Orthogonal
- Sparse (but not too sparse)
- Structured

Questions to consider with regard to SRFT/SRHT:

- Are there alternatives that do not have the memory issues?
- Concerns about the FFT's limited parallelization
- Alternatives: trade off arithmetic complexity for better memory performance and parallelization?
- Can we have the best of both worlds?

Results on different multipliers to be shown from my own research ...

CUR Cross-Approximation

[Figure: The first three recursive steps of a Cross Approximation algorithm output three striped matrices W1, W2, and W3]

Adapted from Low Rank Approximation: New Insights, Accurate Superfast Algorithms, Pre-processing and Extensions, Victor Y. Pan, Qi Luan, John Svadlenka, Liang Zhao 2017

To complete the CUR approximation:

1. Form the matrix U by taking the inverse of A(I, J)
2. Set C = A(:, J) and R = A(I, :)

How to approximate the maximum volume:

- Use RRLU or RRQR algorithms
- Example: LU factorization to generate an upper triangular factor [C.-T. Pan 2000]
- For a triangular matrix T ∈ R^{n×n}: det(T) = ∏_{i=1}^{n} tii
- The goal is to maximize the absolute values on T's diagonal
- Involves column interchanges and searching for maximum absolute-valued elements
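As an assumed illustration of this idea (not the algorithm benchmarked later), here is a greedy complete-pivoting sketch that repeatedly selects the largest remaining absolute entry, which tends to produce a large-volume k × k submatrix whose indices can then complete the CUR:

```python
import numpy as np

def greedy_pivot_indices(A, k):
    """Greedy LU-style elimination with complete pivoting; returns k row
    and column indices whose intersection has large (approximately
    maximal) determinant modulus."""
    S = np.array(A, dtype=float)
    rows, cols = [], []
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(S)), S.shape)
        if S[i, j] == 0:
            break                                          # numerical rank reached
        rows.append(i)
        cols.append(j)
        S = S - np.outer(S[:, j], S[i, :]) / S[i, j]       # Schur complement update
    return np.array(rows), np.array(cols)

# completing the CUR from the selected index sets I, J:
# I, J = greedy_pivot_indices(A, k)
# C, R, U = A[:, J], A[I, :], np.linalg.inv(A[np.ix_(I, J)])
```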

Some comments on the CUR Cross Approximation:

- As with Dimension Reduction, it runs an algorithm on a matrix smaller than A
- Each pass through the algorithm's loop requires only O((m + n)k²) ops
- Implications of not using all matrix entries in the algorithm?
- How to parallelize this algorithm? Perhaps a divide-and-conquer approach with small blocks.

Formulate random multipliers with the strategy:

1. Utilize structured, sparse primitive matrices of random (Gaussian, Bernoulli) variables to form families of random multipliers B
2. B ∈ R^{n×l}, B = Σ_{i=1}^{t} Bi, and t is a small constant
3. The Bi are chosen and applied from the following classes:
   - Abridged and Permuted Hadamard APH (with optional scaling S)
   - Orthogonal Permutation matrix P
   - Inverse bidiagonal matrix IBD: (I + SZ)⁻¹, where S is a diagonal matrix and Z is the down-shift matrix with ones on the first subdiagonal and zeros elsewhere

Numerical Experiments: Relative errors with various multipliers

Multiplier Sum     SVD-generated Matrices       Laplacian Matrices
                   Mean         Std             Mean         Std
Gaussian           1.07E-08     3.82E-09        2.05E-13     1.62E-13
ASPH, 2 IBD        1.23E-08     5.84E-09        1.69E-13     1.34E-13
ASPH, 3 IBD        1.33E-08     1.00E-08        1.98E-13     1.30E-13
3 IBD              1.18E-08     6.23E-09        1.78E-13     1.42E-13
APH, 3 IBD         1.28E-08     1.40E-08        2.33E-13     3.44E-13
APH, 2 IBD         1.43E-08     1.87E-08        1.78E-13     1.61E-13
ASPH, 1 P          1.22E-08     1.26E-08        2.21E-13     2.83E-13
ASPH, 2 P          1.51E-08     1.18E-08        3.57E-13     9.27E-13
ASPH, 3 P          1.19E-08     6.93E-09        2.24E-13     1.76E-13
APH, 3 P           1.26E-08     1.16E-08        2.15E-13     1.70E-13
APH, 2 P           1.31E-08     1.18E-08        1.25E-14     5.16E-14

Investigate novel approaches that decrease computation:

- Sum of IBDs without APH, ASPH
- IBD is a rank structured matrix: low-rank off-diagonal blocks

Matrix Matrix Multiplication with IBD is O((n + l)m) ops

Good spatial and temporal locality (unlike SRFT/SRHT)

Generalize to other rank structured matrices?

Our numerical experiments are promising, but new directions remain to be investigated from a computational perspective:

- Incorporate approximate leverage scores
- Avoid random memory access (max element searching, column and row interchanges)
- Look for matrix-matrix multiplication possibilities instead
- Extensions to tensors?

CUR Cross Approximation Benchmark Results

Inputs             rank    mean        std
baart              6       1.94e-07    3.57e-09
shaw               12      3.02e-07    6.84e-09
gravity            25      3.35e-07    1.97e-07
wing               4       1.92e-06    8.78e-09
foxgood            10      7.25e-06    1.09e-06
inverse Laplace    25      2.40e-07    6.88e-08

Table: CUR approximation of benchmark 1000 × 1000 input matrices (at the numerical rank of the input matrices) of discretized integral equations from the San Jose State University Singular Matrix Database

Open Problems: Do there exist random multipliers for Dimension Reduction such that Matrix Matrix multiplication can be done faster than O(mn log n)?

Does there exist a CUR approximation algorithm with a relative error (1 + ε) bound in the spectral norm?

Future Research Directions:

Theoretical, algorithmic, and computational research in Low Rank Approximation, its applications, and related problem areas

Acknowledgements

I would like to thank my mentor, Professor Victor Pan, for his thoughtful guidance, insight, and support throughout my doctoral education. I am also grateful to Professors Feng Gu and xxxxx for their participation and interest as committee members for my Second Exam. Thank you.

References

N. Halko, P. G. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Approximate Matrix Decompositions, SIAM Review, 53, 2, 217-288, 2011.

M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, NOW Publishers, 3, 2, 2011. Preprint: arXiv:1104.5557 (2011). (Abridged version in: Advances in Machine Learning and Data Mining for Astronomy, edited by M. J. Way et al., 647-672, 2012.)

D. P. Woodruff, Sketching as a Tool for Numerical Linear Algebra, Foundations and Trends in Theoretical Computer Science, 10, 1-2, 1-157, 2014.

T. Sarlós, Improved Approximation Algorithms for Large Matrices via Random Projections, Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 143-152, 2006.

G. H. Golub and C. Reinsch, Singular Value Decomposition and Least Squares Solutions, Numerische Mathematik, 14, 5, 403-420, 1970.

S. Axler, Linear Algebra Done Right, second edition, Springer, New York, NY, 1997.

S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin, A Theory of Pseudo-Skeleton Approximations, Linear Algebra and Its Applications, 261, 1-21, 1997.

E. Liberty, F. Woolfe, P. G. Martinsson, V. Rokhlin, and M. Tygert, Randomized Algorithms for the Low Rank Approximation of Matrices, PNAS, 104, 51, 20167-20172, 2007.

F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, A Fast Randomized Algorithm for the Approximation of Matrices, Technical Report YALEU/DCS/TR-1380, Yale University Department of Computer Science, New Haven, CT, 2007.

W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz Mappings into a Hilbert Space, Proc. of Modern Analysis and Probability, Contemporary Mathematics, 26, 189-206, 1984.

N. Ailon and B. Chazelle, Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform, STOC 2006: Proc. 38th Ann. ACM Symposium on Theory of Computing, 557-563, 2006.

P. Drineas, M. W. Mahoney, and S. Muthukrishnan, Relative-Error CUR Matrix Decompositions, SIAM Journal on Matrix Analysis and Applications, 30, 2, 844-881, 2008.

C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, Latent Semantic Indexing: A Probabilistic Analysis, Journal of Computer and System Sciences, 61, 2, 217-235, 2000.

S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtyshnikov, Pseudo-Skeleton Approximations by Matrices of Maximal Volume, Mathematical Notes, 62, 4, 515-519, 1997.

S. A. Goreinov and E. E. Tyrtyshnikov, The Maximal-Volume Concept in Approximation by Low-Rank Matrices, Contemporary Mathematics, 208, 47-51, 2001.

D. Achlioptas, Database-Friendly Random Projections, Proc. ACM Symp. on the Principles of Database Systems, 274-281, 2001.

C.-T. Pan, On the Existence and Computation of Rank-Revealing LU Factorizations, Linear Algebra and Its Applications, 316, 199-222, 2000.

V. Y. Pan, Structured Matrices and Polynomials: Unified Superfast Algorithms, Birkhäuser/Springer, Boston/New York, 2001.

V. Y. Pan, J. Svadlenka, and L. Zhao, Fast Derandomized Low-Rank Approximation and Extensions, CoRR, abs/1607.05801, 2016.

M. Rudelson and R. Vershynin, Non-Asymptotic Theory of Random Matrices: Extreme Singular Values, CoRR, abs/1003.2990v2, 2010.

S. Dasgupta and A. Gupta, An Elementary Proof of a Theorem of Johnson and Lindenstrauss, Random Structures and Algorithms, 22, 1, 60-65, 2003.

A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, Journal of the ACM, 51, 6, 1025-1041, 2004.

B. Akin, F. Franchetti, and J. C. Hoe, FFTs with Near-Optimal Memory Access Through Block Data Layouts, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.

L. A. Barba and R. Yokota, How Will the Fast Multipole Method Fare in the Exascale Era, SIAM News, 46, 6, 1-3, 2013.

M. Gu and S. C. Eisenstat, Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization, SIAM Journal on Scientific Computing, 17, 4, 848-869, 1996.

O. Lindtjorn et al., Beyond Traditional Microprocessors for Geoscience High-Performance Computing Applications, IEEE Micro, 31, 2, 41-49, 2011.

A. Ruston, Auerbach's Theorem and Tensor Products of Banach Spaces, Mathematical Proceedings of the Cambridge Philosophical Society, 58, 3, 476-480, doi:10.1017/S0305004100036744, 1962.

H. Cheng et al., On the Compression of Low Rank Matrices, SIAM Journal on Scientific Computing, 26, 4, 1389-1404, 2005.

APPENDIX: Traditional Applications

Applications of matrix computations have typically included:

- Physical Sciences and Engineering
- Data Collection and Analysis
- Computer Graphics
- Biological and Life Sciences

The Theoretical Computer Science (TCS) perspective is increasingly important:

- Cross-fertilization of research in both fields
- Demands of new applications of interest to TCS
- Shortcomings of conventional LRA algorithms

APPENDIX: Two-sided Interpolative Decomposition

Two-sided Interpolative Decomposition Theorem [Cheng et al 2005] Let A be an m × n matrix and k ≤ min(m, n). Then there exists:

A = P_L [ Ik ; S ] A_S [ Ik | T ] P_R* + X

such that P_L and P_R are permutation matrices, A_S is the k × k skeleton submatrix, S ∈ C^{(m−k)×k}, T ∈ C^{k×(n−k)}, and X satisfies:

‖S‖F ≤ √(k(m − k))
‖T‖F ≤ √(k(n − k))
‖X‖2 ≤ σ_{k+1}(A) √(1 + k(min(m, n) − k))

APPENDIX: Deterministic Algorithms

Theorem

Gram-Schmidt and QR Factorization: Suppose (a1, a2,..., an) is a linearly independent list of vectors in an inner product space V . Then there is an orthonormal list of vectors (q1, q2,..., qn) such that span(a1, a2,..., an) = span(q1, q2,..., qn).

Proof. Let proj(r, s) := (⟨r, s⟩ / ⟨s, s⟩) s denote the projection of r onto s.

w1 := a1
w2 := a2 − proj(a2, w1)
...
wn := an − proj(an, w1) − proj(an, w2) − ··· − proj(an, wn−1)

q1 = w1/‖w1‖, q2 = w2/‖w2‖, ..., qn = wn/‖wn‖

APPENDIX: Deterministic Algorithms

Re-arranging equations for w1, w2,..., wn to be equations with a1, a2,..., an on the left-hand side and replacing wi with qi gives A = Q · R where A = [a1, a2,..., an]

Q = [q1, q2, ..., qn]

R = [ ⟨q1, a1⟩   ⟨q1, a2⟩   ⟨q1, a3⟩   ...   ⟨q1, an⟩
      0          ⟨q2, a2⟩   ⟨q2, a3⟩   ...   ⟨q2, an⟩
      0          0          ⟨q3, a3⟩   ...   ⟨q3, an⟩
      .          .          .          .     .
      0          0          0          0     ⟨qn, an⟩ ]

APPENDIX: Deterministic Algorithms
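A minimal sketch of this construction in numpy (classical Gram-Schmidt, shown only to mirror the derivation above; the pivoted variant given earlier is preferable in practice):

```python
import numpy as np

def classical_gram_schmidt_qr(A):
    """Orthogonalize the columns of A in order and collect the inner
    products <q_i, a_j> into the upper triangular matrix R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # <q_i, a_j>
            w -= R[i, j] * Q[:, i]        # subtract the projection onto q_i
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]             # assumes linearly independent columns
    return Q, R
```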

As an alternative to Gram-Schmidt QR, consider a product Qn ··· Q2Q1 of orthogonal matrices that transforms A to upper triangular form R:

(Qn ··· Q2Q1)A = R

Multiplying both sides by (Qn ··· Q2Q1)⁻¹, we have that:

(Qn ··· Q2Q1)⁻¹(Qn ··· Q2Q1)A = (Qn ··· Q2Q1)⁻¹R

A = Q1Q2 ··· QnR

A product of orthogonal matrices is also orthogonal, so allowing for column pivoting we have that:

AΠ = Q1Q2 ... QnR

A Householder reflection matrix is used for each Qi, i = 1, 2, ..., n to transform A to R column-wise...

APPENDIX: Deterministic Algorithms

A Householder matrix-vector multiplication Hx = (I − 2vvᵀ)x reflects a vector x across the hyperplane normal to v.

Unit vector v is constructed for each Qi Householder matrix so that entries of column i below the diagonal of A vanish.

- x = (aii, a_{i+1,i}, ..., a_{mi}), the subdiagonal part of column i
- v depends upon x and the standard basis vector ei
- The matrix product Qi · A is applied
- The above items are repeated for each column of A

Impact on the QR algorithm:

- Householder matrices improve numerical stability
- But each matrix Qi is applied separately to A
- Therefore, parallelism options are limited

APPENDIX: Deterministic Algorithms
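A minimal sketch of Householder QR following these steps (illustrative only; a production routine would store the reflectors compactly rather than forming Q explicitly):

```python
import numpy as np

def householder_qr(A):
    """Apply one Householder reflector per column so that the entries
    below the diagonal vanish; accumulate Q as the product of reflectors."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for i in range(min(m, n)):
        x = R[i:, i]
        s = 1.0 if x[0] >= 0 else -1.0
        v = x.copy()
        v[0] += s * np.linalg.norm(x)      # v = x + sign(x_1) ||x|| e_1
        nv = np.linalg.norm(v)
        if nv == 0:
            continue
        v /= nv
        # apply H = I - 2 v v^T to the trailing block of R, and to Q on the right
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)
    return Q, R                            # A = Q @ R, with R upper triangular
```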

The SVD decomposition A = UΣV* is computed in two distinct steps:

1st Step: Use two sequences of Householder transformations to reduce A to upper bidiagonal form:

B = Qn ··· Q2Q1 A P1P2 ··· Pn−2

Therefore, we have that: A = Q1Q2 ··· Qn B Pn−2 ··· P2P1

2nd Step: Use two sequences of Givens rotations (orthogonal transformations) to reduce B to diagonal form Σ:

Σ = Gn−1 ... G2G1BF1F2 ... Fn−1

Likewise, we have that: B = G1G2 ... Gn−1ΣFn−1 ... F2F1

Set U := Q1Q2 ··· Qn G1G2 ··· Gn−1
Set V := (F1F2 ··· Fn−1)* (P1P2 ··· Pn−2)*

APPENDIX: Deterministic Algorithms

- SVD cost is O(mn max(m, n))
- QR cost is O(kmn) for a rank-k approximation

- Random memory access (e.g., column pivoting) contributes to memory bottlenecks
- This is especially the case for out-of-core data sets
- The standard QR algorithm forms Q from a product of Householder reflector matrices, which permits better numerical stability