Split Bregman method for large scale fused Lasso

Gui-Bo Ye and Xiaohui Xie

Institute for Genomics and Bioinformatics, University of California, Irvine
Department of Computer Science, University of California, Irvine

Abstract: Ordering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an $\ell_1$-norm regularizer. However, due to nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers can only deal with problems of small or medium size, or a special case of the fused Lasso problem in which the predictor matrix is the identity matrix. In this paper, we propose an iterative algorithm based on the split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. We derive our algorithm using the augmented Lagrangian method and prove its convergence properties. The performance of our method is tested on both artificial data and real-world applications, including proteomic data from mass spectrometry and genomic data from array CGH. We demonstrate that our method is many times faster than the existing solvers, and show that it is especially efficient for large p, small n problems.

Keywords and phrases: Fused Lasso, Bregman iteration, $\ell_1$-norm, Mass spectrometry, Array CGH.

1. Introduction

Regularization terms that encourage sparsity in coefficients are increasingly being used in regression and classification procedures. One widely used example is the Lasso procedure for linear regression, which minimizes the usual sum of squared errors but additionally penalizes the $\ell_1$ norm of the regression coefficients. Because of the non-differentiability of the $\ell_1$ norm, the Lasso procedure tends to shrink the regression coefficients toward zero and achieves sparseness. Fast and efficient algorithms are available to solve Lasso with as many as millions of variables, which makes it an attractive choice for many large-scale real-world applications.

Regression or classification variables with some inherent ordering occur naturally in many real-world applications. In genomics, chromosomal features such as copy number variations (CNV), epigenetic modification patterns, and genes are naturally ordered by their chromosomal locations. In proteomics, molecular fragments from mass spectrometry (MS) measurements are ordered by their mass-to-charge ratios (m/z). In dynamic gene network inference, gene regulatory networks from developmentally closer cell types are more similar than those from more distant cell types [3].

The fused Lasso method introduced by Tibshirani et al. [1] is an extension of Lasso that considers the situation where there is a certain natural ordering in the regression coefficients. Fused Lasso takes this natural ordering into account by placing an additional regularization term on the differences of "neighboring" coefficients. Consider linear regression on a set of data $\{(x_i, y_i)\}_{i=1}^n$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ are the predictor variables and $y_i$ are the responses. (We assume the $x_{ij}$ and $y_i$ are standardized with zero mean and unit variance across different observations.) Fused Lasso finds the coefficients of the linear regression by minimizing the following loss function:

\[
\Phi(\beta) = \frac{1}{2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda_1 \sum_{i=1}^{p} |\beta_i| + \lambda_2 \sum_{i=2}^{p} |\beta_i - \beta_{i-1}|, \tag{1}
\]

where the regularization term with parameter $\lambda_1$ encourages the sparsity of the regression coefficients, while the regularization term with parameter $\lambda_2$ shrinks the differences between neighboring coefficients toward zero. As such, the method achieves both sparseness and smoothness in the regression coefficients.

Fused Lasso exploits these natural orderings, which makes it particularly suitable for applications such as those above. Not surprisingly, it has been successfully applied in these genomic areas: for example, Tibshirani and Wang used fused Lasso to detect DNA copy number variations in tumor samples using array comparative genomic hybridization (CGH) data [2], and Tibshirani et al. used fused Lasso to select proteomic features that can separate tumor vs normal samples [1]. In addition to the application areas mentioned above, fused Lasso or extensions of it have also found applications in a number of other areas, including image denoising, social networks [2], quantitative trait network analysis [4], etc.

The loss function in (1) is strictly convex, so a global optimal solution is guaranteed to exist. However, finding the optimal solution is computationally challenging due to the nondifferentiability of $\Phi(\beta)$. Existing methods circumvent the nondifferentiability of $\Phi(\beta)$ by introducing $2p-1$ additional variables and converting the unconstrained optimization problem into a constrained one with $6p-1$ linear inequality constraints. Standard convex optimization tools such as SQOPT [5] and CVX [6] can then be applied. Because of the large number of variables introduced, these methods are computationally demanding in terms of both time and space and, in practice, have only been able to solve fused Lasso problems of small or medium size.
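For concreteness, the loss (1) is straightforward to evaluate directly; below is a minimal NumPy sketch (the function and variable names are ours for illustration, not the authors' Matlab implementation):

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Evaluate the fused Lasso loss Phi(beta) in (1)."""
    residual = y - X @ beta                        # y_i - sum_j x_ij * beta_j
    squared_error = 0.5 * np.sum(residual ** 2)    # (1/2) sum_i residual_i^2
    sparsity = lam1 * np.sum(np.abs(beta))         # lambda_1 * sum_i |beta_i|
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))  # lambda_2 * sum_i |beta_i - beta_{i-1}|
    return squared_error + sparsity + fusion
```

Evaluating the objective is cheap; minimizing this nonsmooth, nonseparable function efficiently is the hard part, and is the subject of this paper.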
Component-wise coordinate descent has been proposed as an efficient approach for solving many $\ell_1$-regularized convex optimization problems, including Lasso, grouped Lasso, elastic nets, graphical Lasso, logistic regression, etc. [7]. However, coordinate descent cannot be applied to the fused Lasso problem: the variables in the loss function $\Phi(\beta)$ are nonseparable due to the second regularization term, and as such, convergence is not guaranteed [8]. For a special class of fused Lasso problems, called the fused Lasso signal approximator (FLSA), in which the predictor variables satisfy $x_{ij} = 1$ if $i = j$ and $0$ otherwise (i.e., the predictor matrix is the identity matrix), efficient algorithms are available. A key observation about FLSA, first noticed by Friedman et al., is that for fixed $\lambda_1$, increasing $\lambda_2$ can only cause pairs of variables to fuse, and once fused they never become unfused for any larger value of $\lambda_2$ (see the toy numerical illustration at the end of this section). This observation allowed Friedman et al. to develop a fusion algorithm that solves FLSA for a path of $\lambda_2$ values by keeping track of fused variables and using coordinate descent for component-wise optimization. The fusion algorithm was later extended and generalized by Hoefling [9]. However, for the fusion algorithm to work, the solution path as a function of $\lambda_2$ has to be piecewise linear, which is not true for the general fused Lasso problem [10]. As such, these algorithms are not applicable to the general fused Lasso case.

In this paper, we propose a new method based on the split Bregman iteration for solving the general fused Lasso problem. Although the Bregman iteration is an old technique proposed in the 1960s [11, 12], it gained significant interest only recently, after Osher and his coauthors demonstrated its high efficiency for image restoration [13, 14, 15]. Most recently, it has also been shown to be an efficient tool for compressed sensing [16, 17, 18], matrix completion [19], and low-rank matrix recovery [20]. In the following, we will show that the general fused Lasso problem can be reformulated so that the split Bregman iteration can be readily applied.

The rest of the paper is organized as follows. In Section 2, we derive algorithms for a class of fused Lasso problems from the augmented Lagrangian function, including SBFLasso for the general fused Lasso, SBFLSA for FLSA, and SBFLSVM for the fused Lasso support vector classifier. The convergence properties of our algorithms are also presented. We demonstrate the performance and effectiveness of the algorithms through numerical examples in Section 3, and describe additional implementation details there. The algorithms described in this paper are implemented in Matlab and are freely available from the authors.
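As the toy illustration promised above, the following sketch minimizes the FLSA objective with a generic derivative-free solver (a brute-force illustration on a tiny, made-up instance; this is neither the fusion algorithm of Friedman et al. nor the method proposed in this paper):

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([1.0, 1.2, 0.9, 3.0, 3.1])  # toy signal with two approximate levels

def flsa_objective(beta, lam1, lam2):
    # FLSA: the predictor matrix is the identity, so the fit term is 0.5*||y - beta||^2
    return (0.5 * np.sum((y - beta) ** 2)
            + lam1 * np.sum(np.abs(beta))
            + lam2 * np.sum(np.abs(np.diff(beta))))

for lam2 in [0.0, 0.5, 2.0]:
    res = minimize(flsa_objective, x0=y, args=(0.1, lam2), method="Nelder-Mead")
    # As lam2 grows, neighboring coefficients are driven to common (fused) values.
    print(f"lambda2 = {lam2}: beta =", np.round(res.x, 3))
```

Nelder-Mead is only adequate for such tiny problems; the point is the qualitative behavior, namely that neighbors fuse as $\lambda_2$ increases.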
2. Algorithms
2.1. Split Bregman iteration for a generalized fused Lasso problem
We first describe our algorithm in a more general setting than the one described in (1). Instead of the quadratic error function, we allow the error function to be any convex function of the regression coefficients. In addition, we relax the assumption that the coefficients should be ordered along a line as in (1), and allow the ordering to be specified arbitrarily, e.g., according to a graph. For the generalized fused Lasso, we find β by solving the following unconstrained optimization problem
\[
\min_{\beta} \; V(\beta) + \lambda_1 \|\beta\|_1 + \lambda_2 \|L\beta\|_1, \tag{2}
\]
where $V(\beta) = V(\beta; X, y)$ is the error term, the regularization term with parameter $\lambda_1$ encourages the sparsity of $\beta$ as before, and the regularization term with parameter $\lambda_2$ shrinks the differences between neighboring variables, as specified in the matrix $L$, toward zero. We assume $L$ is an $m \times p$ matrix. In the standard fused Lasso in (1), $L$ is simply a $(p-1) \times p$ matrix with zero entries everywhere except for $1$ on the diagonal and $-1$ on the superdiagonal (a sketch of its construction is given after (3) below). The unconstrained problem (2) can be reformulated into an equivalent constrained problem
\[
\min_{\beta, a, b} \; V(\beta) + \lambda_1 \|a\|_1 + \lambda_2 \|b\|_1 \quad \text{s.t.} \quad a = \beta, \; b = L\beta. \tag{3}
\]
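For reference, the chain-structured $L$ of the standard fused Lasso (1) can be built in a few lines (a sketch under our naming; for an ordering specified by a graph, each row of $L$ would instead carry the $+1/-1$ pair of one edge):

```python
import numpy as np

def difference_matrix(p):
    """(p-1) x p matrix with 1 on the diagonal and -1 on the superdiagonal,
    so that (L @ beta)[i] = beta[i] - beta[i+1]."""
    L = np.zeros((p - 1, p))
    idx = np.arange(p - 1)
    L[idx, idx] = 1.0
    L[idx, idx + 1] = -1.0
    return L

# With this L, ||L @ beta||_1 equals the fusion penalty in (1):
# sum_{i=2}^{p} |beta_i - beta_{i-1}|.
```

For large $p$ one would of course store $L$ in a sparse format rather than as a dense array.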
Although split Bregman methods originated from Bregman iterations [16, 13, 18, 21, 14, 15], it is more convenient to derive the split Bregman method for the generalized fused Lasso using the augmented Lagrangian method [22, 23]. Note that the Lagrangian function of (3) is
\[
\tilde{L}(\beta, a, b, u, v) = V(\beta) + \lambda_1 \|a\|_1 + \lambda_2 \|b\|_1 + \langle u, \beta - a \rangle + \langle v, L\beta - b \rangle, \tag{4}
\]

where $u \in \mathbb{R}^p$ is a dual variable corresponding to the linear constraint $\beta = a$, $v \in \mathbb{R}^m$ is a dual variable corresponding to the linear constraint $L\beta = b$, and $\langle \cdot, \cdot \rangle$ denotes the standard inner product in Euclidean space. The augmented Lagrangian function of (3) is similar to (4), except that it adds two terms, $\frac{\mu_1}{2}\|\beta - a\|_2^2 + \frac{\mu_2}{2}\|L\beta - b\|_2^2$, to penalize the violation of the linear constraints $\beta = a$ and $L\beta = b$. That is,