arXiv:1504.06729v4 [cs.DS] 12 Jul 2016

Optimal Principal Component Analysis in Distributed and Streaming Models

Christos Boutsidis
New York, New York
[email protected]

David P. Woodruff
IBM Research
Almaden, California
[email protected]

Peilin Zhong
Institute for Interdisciplinary Information Sciences
Tsinghua University, Beijing, China
[email protected]

Abstract

We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix A ∈ R^{m×n}, a rank parameter k < rank(A), and an accuracy parameter 0 < ε < 1, we want to output an m × k orthonormal matrix U for which ‖A − UU^T A‖_F^2 ≤ (1 + ε)‖A − A_k‖_F^2, where A_k ∈ R^{m×n} is the best rank-k approximation to A. We show the following.

1. In the arbitrary partition model of Kannan et al. (COLT 2014), each of s machines holds a matrix A_i and A = Σ_{i=1}^s A_i. Each machine should output U. Kannan et al. achieve O(skm/ε) + poly(sk/ε) words of communication. We obtain the improved bound of O(skm) + poly(sk/ε) words (of O(log(nm)) bits) of communication, and show an optimal Ω(skm) word lower bound.

2. We bypass the above lower bound when A is φ-sparse in each column and each server receives a subset of columns. Here we obtain an O(skφ/ε) + poly(sk/ε) word protocol. Our communication is independent of the matrix dimensions, and achieves the guarantee that each server, in addition to outputting U, outputs a subset of O(k/ε) columns of A containing a (1 + ε)-approximate rank-k matrix in its span (that is, we solve distributed column subset selection). We also show an Ω(skφ/ε) lower bound for distributed column subset selection; achieving our communication bound when A is sparse, but not sparse in each column, is impossible.

3. In the streaming model, in which the columns arrive one at a time, an algorithm of Liberty (KDD, 2013), with an improved analysis by Ghashami and Phillips (SODA, 2014), shows that O(km/ε) "real numbers" of space is achievable in a single pass, which we first improve to O(km/ε) words of space, matching a known Ω(km/ε) bit lower bound of Woodruff (NIPS, 2014). We show with two passes one can achieve O(km) + poly(k/ε) words of space and that, up to the poly(k/ε) term and the distinction between words versus bits, this is optimal for any constant number of passes. A 1/ε dependence is known to be required, but we separate this dependence from the km term.

4. In turnstile streams, in which we receive entries of A one at a time in an arbitrary order, we show how to obtain a factorization of a (1 + ε)-approximate rank-k matrix using O((m + nε^{-1})k) + poly(kε^{-1}) words of space. This improves the upper bound of Clarkson and Woodruff (STOC 2009), and matches their word lower bound up to low order terms. This resolves an open question for high precision PCA.

Notably, our results do not depend on the condition number of A.

1 Introduction
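As a concrete illustration of the approximation guarantee defined above, the following NumPy sketch (our own toy example, not from the paper; the matrix sizes are arbitrary) takes U to be the top-k left singular vectors of A, which attains the bound with ε = 0. The paper's contribution is attaining ε > 0 with far less communication or space than an exact SVD requires.

```python
import numpy as np

# Check the PCA guarantee ||A - U U^T A||_F^2 <= (1 + eps) * ||A - A_k||_F^2
# for U = top-k left singular vectors of A (this choice attains eps = 0).
rng = np.random.default_rng(0)
m, n, k = 20, 50, 3
A = rng.standard_normal((m, n))

U_full, sig, Vt = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :k]                          # m x k orthonormal matrix
A_k = (U_full[:, :k] * sig[:k]) @ Vt[:k]   # best rank-k approximation of A

err = np.linalg.norm(A - U @ (U.T @ A), 'fro') ** 2
opt = np.linalg.norm(A - A_k, 'fro') ** 2
assert err <= (1 + 1e-9) * opt             # top-k SVD attains eps = 0
```

Any other m × k orthonormal U (e.g., one produced by a sketching-based protocol) can be checked against the same inequality with the desired ε.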
In distributed-memory computing systems such as Hadoop [1] or Spark [3], Principal Component Analysis (PCA) and the related Singular Value Decomposition (SVD) of large matrices are becoming very challenging. Machine learning libraries implemented on top of such systems, for example mahout [2] or mllib [4], provide distributed PCA implementations, since PCA is often used as a building block for a learning algorithm. PCA is useful for dimension reduction, noise removal, visualization, etc. In all of these implementations, the bottleneck is the communication; hence, the focus has been on minimizing the communication cost of the related algorithms, and not the computational cost, which is the bottleneck in more traditional batch systems.

The data matrix corresponding to the dataset at hand, e.g., a term-document matrix representing a text collection, or the Netflix matrix representing users' ratings for different movies, could be distributed in many different ways [46]. In this paper, we focus on the following so-called arbitrary partition and column partition models. In the arbitrary partition model, a matrix A ∈ R^{m×n} is arbitrarily distributed among s machines. Specifically, A = Σ_{i=1}^s A_i, where A_i ∈ R^{m×n} is held by machine i. Unless otherwise stated, we always assume m ≤ n. In the column partition model, a matrix A ∈ R^{m×n} is distributed column-wise among s
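The two data layouts just described can be mimicked in a few lines of NumPy. This is only a toy illustration of the models themselves (the number of machines s and the random matrices are our own choices), not of any protocol from the paper:

```python
import numpy as np

# Toy illustration of the two partition models, with s = 4 hypothetical machines.
rng = np.random.default_rng(1)
m, n, s = 6, 12, 4
A = rng.standard_normal((m, n))

# Arbitrary partition model: A = A_1 + ... + A_s, where machine i holds a full
# m x n summand A_i; the individual summands can be arbitrary.
parts = [rng.standard_normal((m, n)) for _ in range(s - 1)]
parts.append(A - sum(parts))           # last summand makes the sum exactly A
assert np.allclose(sum(parts), A)

# Column partition model: machine i holds a disjoint block of A's columns.
blocks = np.array_split(A, s, axis=1)  # s blocks of n/s columns each
assert np.allclose(np.hstack(blocks), A)
```

Note that the column partition model is a special case of the arbitrary partition model: each column block can be zero-padded to an m × n summand.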