Some Reminders About the Lasso Estimator

Marius Kwemou (1,2)
Supervised by: M-L. Taupin (1) and A.K. Diongue (2)
(1) Université d'Evry val d'Essonne (France), Laboratoire de Statistique et génome.
(2) Université Gaston Berger de Saint-Louis (Sénégal), LERSTAD.
February, 2014

Outline
1 Model
2 High dimensional data situations (d >> n)
3 Oracle inequality

1 Model

Linear regression model

One observes n independent and identically distributed (i.i.d.) couples $(z_1, y_1), \dots, (z_n, y_n)$ from the joint distribution of $(Z, Y) \in \mathbb{R}^d \times \mathbb{R}$ such that

$$ y_i = z_i^T \beta_0 + \varepsilon_i. \qquad (1) $$

$\beta_0$ is the unknown parameter to estimate.

In matrix notation, with $(y, X) \in \mathbb{R}^n \times M_{n,d}$,

$$ y = X\beta_0 + \varepsilon. \qquad (2) $$

Estimation

If n > d, the estimator $\hat\beta$ of $\beta_0$ is defined as

$$ \hat\beta := \arg\max_{\beta \in \mathbb{R}^d} \ell(\beta), $$

where $\ell$ is the likelihood function.

→ If $\varepsilon \sim N(0, \sigma^2 I)$, $\hat\beta$ is the well-known OLS estimator

$$ \hat\beta = (X^T X)^{-1} X^T y. $$
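As an illustration of the OLS formula $\hat\beta = (X^T X)^{-1} X^T y$ in the n > d case, here is a minimal NumPy sketch. The simulated data, the seed, and the helper name `ols` are assumptions made for the example and do not come from the slides.

```python
import numpy as np

def ols(X, y):
    """OLS estimate; assumes X has full column rank (which requires n >= d)."""
    # Solve the normal equations X^T X beta = X^T y without forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)        # hypothetical simulated data
n, d = 200, 3
X = rng.normal(size=(n, d))
beta0 = np.array([1.0, -2.0, 0.5])    # illustrative "true" parameter
y = X @ beta0 + 0.1 * rng.normal(size=n)

beta_hat = ols(X, y)
# Cross-check against NumPy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

When d >= n the matrix $X^T X$ is singular, so `np.linalg.solve` fails or becomes numerically meaningless; this is the situation the penalized estimators address.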
2 High dimensional data situations (d >> n)

Penalization

If d >> n, direct maximization of the likelihood can lead to:
- Overfitting: the fitted model behaves well on the training set but can behave badly on a test set.
- Instability: since the empirical risk is data dependent, hence random, a small change in the data can lead to very different estimators.

→ Penalization

$$ \underbrace{\hat L(\beta)}_{\text{empirical risk}} + \underbrace{\mathrm{pen}(\beta)}_{\text{penalty}} $$

$\ell_0$-penalization

Akaike information criterion (AIC)

$$ \mathrm{AIC}(\beta) = 2\hat L(\beta) + 2\|\beta\|_0, $$

Bayesian information criterion (BIC)

$$ \mathrm{BIC}(\beta) = 2\hat L(\beta) + 2\log(n)\,\|\beta\|_0, $$

where $\|\beta\|_0 = \mathrm{card}\{j \in \{1, \dots, d\} : \beta_j \neq 0\}$.

- Produces interpretable models.
- Non-convex optimization:
  - $\mathrm{card}(\mathcal{P}(\{1, \dots, d\})) = 2^d$ models,
  - stepwise selection.
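The combinatorial cost of $\ell_0$-penalization can be made concrete with an exhaustive search over all $2^d$ supports. The sketch below is only an illustration: it assumes a Gaussian linear model and uses a common form of BIC, $n \log(\mathrm{RSS}/n) + \log(n)\,k$, whose normalization differs from the one on the slide. The function names and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def bic_gaussian(X, y, subset):
    """BIC (up to an additive constant) of the least-squares fit on the columns in `subset`."""
    n = len(y)
    if subset:
        Xs = X[:, list(subset)]
        coef = np.linalg.lstsq(Xs, y, rcond=None)[0]
        rss = np.sum((y - Xs @ coef) ** 2)
    else:
        rss = np.sum(y ** 2)          # empty model: predict zero
    return n * np.log(rss / n) + np.log(n) * len(subset)

def best_subset(X, y):
    """Enumerate every support of {0, ..., d-1} and return the BIC minimizer."""
    d = X.shape[1]
    subsets = [s for k in range(d + 1)
               for s in itertools.combinations(range(d), k)]
    best = min(subsets, key=lambda s: bic_gaussian(X, y, s))
    return best, len(subsets)

rng = np.random.default_rng(1)        # hypothetical simulated data
n, d = 100, 6
X = rng.normal(size=(n, d))
beta0 = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta0 + 0.5 * rng.normal(size=n)

selected, n_models = best_subset(X, y)   # n_models equals 2 ** d
```

With d = 6 the search visits 64 models; each additional covariate doubles that count, which is why exhaustive search is replaced by stepwise heuristics or by convex relaxations in high dimension.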
$\ell_1$-penalization

Lasso [Tibshirani, 1996]

$$ \hat\beta(\lambda) = \arg\min_{\beta \in \mathbb{R}^d} \; \hat L(\beta) + \lambda \sum_{j=1}^{d} |\beta_j| $$

- Regularization technique for simultaneous estimation and selection: $\hat\beta_{L,j}(\lambda) = 0$ for $j \notin K(\hat\beta_L) \subset \{1, \dots, d\}$.
- High dimension d >> n.
- Convex optimization.

- If $\varepsilon \sim N(0, I)$ and $X^T X = I$, then for $j \in \{1, \dots, d\}$,

$$ \hat\beta_j(\lambda) = \begin{cases} \hat\beta_j^{OLS} - \lambda/2 & \text{if } \hat\beta_j^{OLS} > \lambda/2, \\ \hat\beta_j^{OLS} + \lambda/2 & \text{if } \hat\beta_j^{OLS} < -\lambda/2, \\ 0 & \text{otherwise.} \end{cases} $$

[Figure: plot of the soft-thresholding function, both axes running from −4 to 4.]

Geometric view of penalization

$$ \max_{\beta} \ell(\beta) \quad \text{s.t.} \quad \mathrm{pen}(\beta) \le \xi $$

Geometry of the Lasso

[Figure: Lasso solution. Fig. 3.2: comparison of the solutions of problems regularized by an $\ell_1$ norm and an $\ell_2$ norm, drawn in the $(\beta_1, \beta_2)$ plane with the least-squares solution $\beta^{ls}$.]

On the left of Figure 3.2, $\beta^{\ell_1}$ is the estimator of problem (3.2) regularized by an $\ell_1$ norm. The second component of $\beta^{\ell_1}$ is set to zero, because the ellipse reaches the admissible region at the corner located on the axis $\beta_2 = 0$. On the right of Figure 3.2, $\beta^{\ell_2}$ is the estimator of problem (3.2) regularized by an $\ell_2$ norm. The circular shape of the admissible region does not encourage the coefficients to reach zero values.

To continue this discussion with arguments that are both simple and formal, one can give the expression of a coefficient of the estimators $\beta^{\ell_1}$ and $\beta^{\ell_2}$ when the matrix X is orthogonal (which corresponds to circular contours for the quadratic loss function). For $\beta^{\ell_2}$, we have

$$ \beta_m^{\ell_2} = \frac{1}{1 + \lambda}\, \beta_m^{ls}. $$

The coefficients undergo a proportional shrinkage through the factor $1/(1 + \lambda)$. In particular, $\beta_m^{\ell_2}$ can be zero only if the coefficient $\beta_m^{ls}$ is itself exactly zero. For $\beta^{\ell_1}$, we have

$$ \beta_m^{\ell_1} = \mathrm{sign}\big(\beta_m^{ls}\big)\, \big[\, |\beta_m^{ls}| - \lambda \,\big]_+, $$

where $[u]_+ = \max(0, u)$. This gives a "soft" thresholding: the least-squares coefficients are shrunk by a constant $\lambda$ when $|\beta_m^{ls}| > \lambda$, and are set to zero otherwise. (This source scales the penalty so that the threshold is $\lambda$, where the formula above for the case $X^T X = I$ has $\lambda/2$.)

Stability

Definition 3.2 (Stability): According to Breiman [1996], a problem is unstable if, for training sets that are similar but not identical (small perturbations), one obtains very different predictions or estimators (large perturbation).

Remark 3.5: Bousquet and Elisseeff [2002] formally defined several notions of stability, based on the behaviour of estimators when the training sample is perturbed by the removal or the replacement of one example.

Choice of λ

[Figure: Regularization path. Coefficient paths plotted against lambda; the labelled curves include Age and Fumeur (smoker).]

- Cross-validation, which is recommended when the goal is to minimize the prediction error. It can be computationally slow.
- Information criteria such as AIC or BIC can be defined for the Lasso:

$$ \mathrm{AIC}(\lambda) = L_n(\hat\beta(\lambda)) + \frac{2}{n}\, df(\lambda), $$

$$ \mathrm{BIC}(\lambda) = L_n(\hat\beta(\lambda)) + \frac{\log n}{n}\, df(\lambda), $$

where $df(\lambda)$ is the degrees of freedom of the Lasso for a given parameter $\lambda$.
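The closed form stated earlier for the case $X^T X = I$ can be checked numerically. The sketch below assumes the squared-error empirical risk $\hat L(\beta) = \|y - X\beta\|^2$, which is the normalization under which the threshold is $\lambda/2$; the coordinate-descent solver, the function names, and the simulated data are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_beta ||y - X beta||^2 + lam * ||beta||_1."""
    d = X.shape[1]
    beta = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)       # squared norm of each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(d):
            # Remove coordinate j's contribution, then refit it alone.
            residual += X[:, j] * beta[j]
            rho = X[:, j] @ residual
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(2)            # hypothetical simulated data
n, d = 50, 5
# Orthonormal design: the Q factor of a reduced QR decomposition satisfies Q^T Q = I.
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))
beta0 = np.array([2.0, 0.0, -1.5, 0.0, 0.3])
y = Q @ beta0 + 0.1 * rng.normal(size=n)

lam = 1.0
beta_ols = Q.T @ y                        # OLS estimate when X^T X = I
beta_closed_form = soft_threshold(beta_ols, lam / 2.0)
beta_cd = lasso_cd(Q, y, lam)             # agrees with the closed form
```

For a general design the coordinate updates interact and several sweeps are needed, but each update is still a one-dimensional soft-thresholding step, which is one way to see that the Lasso stays computationally tractable when d >> n.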
Properties of the Lasso

- Prediction: does the Lasso provide a good approximation of $X\beta_0$?
- Estimation: does the Lasso provide a good approximation of $\beta_0$?
- Selection: does the Lasso select the right covariates?
- Sign recovery: does the Lasso select the right covariates and correctly identify the signs of their coefficients?

3 Oracle inequality

Model: $y_i = f_0(x_i) + \varepsilon_i$

$$ \Gamma \subseteq \Big\{ \beta \in \mathbb{R}^d, \; f_\beta(\cdot) = \sum_{j=1}^{d} \beta_j \phi_j(\cdot) \Big\}. $$

Oracle inequality

$$ R(f_{\hat\beta}, f_0) \le C \times \inf_{\beta \in \Gamma} \big\{ R(f_\beta, f_0) + \Delta_n \big\}. \qquad (3) $$

References

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267–288.
