arXiv:1710.05513v1 [stat.ML] 16 Oct 2017

ROBUST MAXIMUM LIKELIHOOD ESTIMATION OF SPARSE VECTOR ERROR CORRECTION MODEL

Ziping Zhao and Daniel P. Palomar

Department of Electronic and Computer Engineering,
The Hong Kong University of Science and Technology, Hong Kong

(This work was supported by the Hong Kong RGC 16206315 research grant. Ziping Zhao is supported by the Hong Kong PhD Fellowship Scheme (HKPFS). E-mail: [email protected]; [email protected].)

ABSTRACT

In econometrics and finance, the vector error correction model (VECM) is an important time series model for cointegration analysis, which is used to estimate the long-run equilibrium variable relationships. The traditional analysis and estimation methodologies assume the underlying Gaussian distribution but, in practice, heavy-tailed data and outliers can lead to the inapplicability of these methods. In this paper, we propose a robust model estimation method based on the Cauchy distribution to tackle this issue. In addition, sparse cointegration relations are considered to realize feature selection and dimension reduction. An efficient algorithm based on the majorization-minimization (MM) method is applied to solve the proposed nonconvex problem. The performance of this algorithm is shown through numerical simulations.

Index Terms: cointegration analysis, robust statistics, heavy-tails, outliers, group sparsity.

1. INTRODUCTION

The vector error correction model (VECM) [1] is very important in cointegration analysis to estimate and test for long-run cointegrated equilibriums. It is widely used in time series modeling for financial returns and macroeconomic variables. In [2, 3], Engle and Granger first proposed the concept of "cointegration" to describe the linear stationary relationships in the nonstationary time series. Later, Johansen studied the statistical estimation and inference problem in time series cointegration modeling [4, 5, 6]. A VECM for the time series y_t is given as follows:

    \Delta y_t = \nu + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t,    (1)

where \Delta is the first difference operator, i.e., \Delta y_t = y_t - y_{t-1}, \nu denotes the drift, \Pi determines the long-run equilibriums, \Gamma_i (i = 1, ..., p-1) contains the short-run effects, and \varepsilon_t is the innovation with mean 0 and covariance \Sigma. Matrix \Pi \in R^{K \times K} has a reduced cointegration rank, i.e., rank(\Pi) = r < K, and it can be written as \Pi = \alpha\beta^T (\alpha, \beta \in R^{K \times r}). Accordingly, y_t is said to be cointegrated with rank r, and \beta^T y_t gives the long-run stationary time series defined by the cointegration matrix \beta. Such long-run equilibriums are often implied by economic theory and can be used for statistical arbitrage [7].

It is well-known that financial returns and macroeconomic variables exhibit heavy-tails and outliers due to external factors, like political and regulatory changes, as well as wrongly processed data like data corruption, and faulty observations [8]. These stylized features typically contradict the popular Gaussian noise assumption in the theoretical analysis and estimation procedures, with adverse effects in the estimated models. Cointegration analysis is particularly sensitive to these issues. Papers [9, 10, 11] discussed the properties of the Johansen test and the Dickey-Fuller test in the presence of outliers. Lucas studied such issues both from a theoretical and an empirical point of view [12, 13, 14]. To deal with the heavy-tails and outliers in time series modeling, simple and effective estimation methods are needed. In [15], the pseudo maximum likelihood estimators were introduced for VECM. In this paper, based on [15], we formulate the estimation problem on the log-likelihood function of the Cauchy distribution, a conservative representative of the heavy-tailed distributions, to better fit the heavy-tails and dampen the influence of outliers.

Sparse optimization [16] has become the focus of much research interest as a way to realize feature selection and dimension reduction (e.g., lasso [17]). In [18], element-wise sparsity was imposed on \beta in VECM modeling. As indicated by [19, 20], the group sparsity is better to realize the feature selection since it can simultaneously reduce cointegration relations and naturally keep the low-rank geometry of the parameter space. In this paper, instead of imposing the group sparsity on \beta, we equivalently put group sparsity on \Pi and add a rank constraint for it, which can realize the same target without the factorization \Pi = \alpha\beta^T ahead. For sparsity pursuing, i.e., approximating the \ell_0-"norm", rather than the popular \ell_1-norm, we use a nonconvex Geman-type function [21] which has a better sparsity approximation power. A smoothed counterpart is also firstly proposed to reduce the "singularity issue" in optimization, based on which the group sparsity regularizer is attained.
Robust estimation is somewhat underrated in financial applications due to the complex computations that are time and resource intensive. By considering the robust loss and the sparsity regularizer, a nonconvex optimization problem is finally formulated. The expectation-maximization (EM) method is usually used to solve the robust losses (e.g., [22]). However, EM cannot be applied for our formulation. To deal with it, an efficient algorithm based on the majorization-minimization (MM) method is proposed, with estimation performance numerically shown.

2. ROBUST ESTIMATION OF SPARSE VECM

Suppose a sample path \{y_t\}_{t=1}^{N} (N > K) and the needed pre-sample values are available; then the VECM (1) can be written into a matrix form as follows:

    \Delta Y = \Pi Y_{-1} + \Gamma \Delta X + E,    (2)

where \Gamma = [\Gamma_1, ..., \Gamma_{p-1}, \nu], \Delta Y = [\Delta y_1, ..., \Delta y_N], Y_{-1} = [y_0, ..., y_{N-1}], \Delta X = [\Delta x_1, ..., \Delta x_N] with \Delta x_t = [\Delta y_{t-1}^T, ..., \Delta y_{t-p+1}^T, 1]^T, and E = [\varepsilon_1, ..., \varepsilon_N].

2.1. Robustness Pursued by Cauchy Log-likelihood Loss

The robustness is pursued by a multivariate Cauchy distribution. Assume the innovations \varepsilon_t's in (1) follow a Cauchy distribution, i.e., \varepsilon_t \sim Cauchy(0, \Sigma) with \Sigma \in S_{++}^{K}; then the probability density function is given by

    g_\theta(\varepsilon_t) = \frac{\Gamma(\frac{1+K}{2})}{\Gamma(\frac{1}{2}) \pi^{K/2}} [\det(\Sigma)]^{-1/2} \left(1 + \varepsilon_t^T \Sigma^{-1} \varepsilon_t\right)^{-\frac{1+K}{2}}.

The negative log-likelihood loss function of the Cauchy distribution for N samples from (1) is written as follows:

    L(\theta) = \frac{N}{2} \log\det(\Sigma) + \frac{1+K}{2} \sum_{i=1}^{N} \log\left(1 + \left\|\Sigma^{-1/2}\left(\Delta y_i - \Pi y_{i-1} - \Gamma \Delta x_i\right)\right\|_2^2\right),    (3)

where the constants are dropped and \theta \triangleq \{\Pi(\alpha, \beta), \Gamma, \Sigma\}.

2.2. Group Sparsity Pursued by Nonconvex Regularizer

For a vector x \in R^K, the sparsity level is usually measured by the \ell_0-"norm" as \|x\|_0 = \sum_{i=1}^{K} sgn(|x_i|) = k, where k is the number of nonzero entries in x. Generally, applying the \ell_0-"norm" to different groups of variables can enforce group sparsity in the solutions. The \ell_0-"norm" is not convex and not continuous, which makes it computationally difficult and leads to intractable NP-hard problems. So, the \ell_1-norm, as the tightest convex relaxation, is usually used to approximate the \ell_0-"norm" in practice, which is easier for optimization and still favors sparse solutions.

Tighter nonconvex sparsity-inducing functions can lead to better performance [16]. In this paper, to better pursue the sparsity and to remove the "singularity issue", i.e., when using nonsmooth functions, the variable may get stuck at a nonsmooth point [23], a smooth nonconvex function based on the rational (Geman) function in [21] is used, given as follows:

    rat_p^{\epsilon}(x) = \begin{cases} \frac{p x^2}{2\epsilon(p+\epsilon)^2}, & |x| \le \epsilon, \\ \frac{|x|}{p+|x|} - \frac{2\epsilon^2 + p\epsilon}{2(p+\epsilon)^2}, & |x| > \epsilon. \end{cases}

In order to attain feature selection in VECM, i.e., sparse cointegration relations, according to [19, 20], we can impose the row-wise group sparsity on matrix \beta. In fact, due to \Pi = \alpha\beta^T, the row-wise sparsity imposed on \beta can also be realized by directly estimating \Pi through imposing the column-wise group sparsity on \Pi and constraining its rank. Then we have the sparsity regularizer of matrix \Pi, which is given by

    R(\Pi) = \sum_{i=1}^{K} rat_p^{\epsilon}(\|\pi_i\|_2),    (4)

where \pi_i (i = 1, ..., K) denotes the ith column of \Pi. The grouping effect is achieved by taking the \ell_2-norm of each group, and then applying the group regularization.

2.3. Problem Formulation

By combining the robust loss function (3) and the sparsity regularizer (4), we attain a penalized maximum likelihood estimation formulation which is specified as follows:

    \underset{\theta = \{\Pi, \Gamma, \Sigma\}}{\text{minimize}} \quad F(\theta) \triangleq L(\theta) + \xi R(\Pi)
    \text{subject to} \quad rank(\Pi) \le r, \quad \Sigma \succ 0.    (5)

This is a constrained smooth nonconvex problem due to the nonconvexity of the objective function and the constraint set.

3. PROBLEM SOLVING VIA THE MM METHOD

The MM method [24, 25, 26] is a generalization of the well-known EM method. For an optimization problem given by

    \underset{x}{\text{minimize}} \quad f(x) \quad \text{subject to} \quad x \in \mathcal{X},

instead of dealing with this problem directly, which could be difficult, the MM-based algorithm solves a series of simpler subproblems with surrogate functions that majorize f(x) over \mathcal{X}. More specifically, starting from an initial point x^{(0)}, it produces a sequence \{x^{(k)}\} by the following update rule:

    x^{(k)} \in \arg\min_{x \in \mathcal{X}} \bar{f}(x, x^{(k-1)}),

where the surrogate majorizing function \bar{f}(x, x^{(k)}) satisfies

    \bar{f}(x^{(k)}, x^{(k)}) = f(x^{(k)}), \quad \forall x^{(k)} \in \mathcal{X},
    \bar{f}(x, x^{(k)}) \ge f(x), \quad \forall x, x^{(k)} \in \mathcal{X},
    \bar{f}'(x^{(k)}, x^{(k)}; d) = f'(x^{(k)}; d), \quad \forall d \ \text{s.t.} \ x^{(k)} + d \in \mathcal{X}.

The objective function value is monotonically nonincreasing at each iteration. In order to use the MM method, the key step is to find a majorizing function to make the subproblem easy to solve, which will be discussed in the following subsections.

3.1. Majorization for the Robust Loss Function L(\theta)

Instead of using the EM method [22], in this paper, we derive the majorizing function for L(\theta) from an MM perspective.

[Figure: the fractional (rational) function, its \epsilon-smoothed version, and the majorizing function at a majorizing point.]

Lemma 1. At any point x^{(k)} \in R, \log(1+x) \le \log(1+x^{(k)}) + \frac{1}{1+x^{(k)}}(x - x^{(k)}), with the equality attained at x = x^{(k)}.

Based on Lemma 1, at the iterate \theta^{(k)}, the loss function L(\theta) can be majorized by the following function:

    \bar{L}(\theta, \theta^{(k)}) \simeq \frac{N}{2} \log\det(\Sigma) + \frac{1}{2} \left\|\Sigma^{-1/2}\left(\Delta\bar{Y} - \Pi\bar{Y}_{-1} - \Gamma\Delta\bar{X}\right)\right\|_F^2,
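The two nonconvex ingredients used in this section, the \epsilon-smoothed rational penalty and the Lemma 1 tangent-line bound, can be sanity-checked numerically. The sketch below is not the paper's code; the helper names rat and log1p_major are invented for this illustration, with p = 1 and \epsilon = 0.1 as arbitrary example values.

```python
import numpy as np

def rat(x, p=1.0, eps=0.1):
    """eps-smoothed rational (Geman) penalty: quadratic near zero,
    shifted fractional function |x|/(p+|x|) away from zero."""
    x = np.abs(x)
    quad = p * x**2 / (2 * eps * (p + eps)**2)
    tail = x / (p + x) - (2 * eps**2 + p * eps) / (2 * (p + eps)**2)
    return np.where(x <= eps, quad, tail)

def log1p_major(x, xk):
    """Lemma 1: first-order (tangent-line) upper bound of log(1+x) at xk."""
    return np.log1p(xk) + (x - xk) / (1 + xk)

xs = np.linspace(0.0, 5.0, 1001)
xk = 1.5
gap = log1p_major(xs, xk) - np.log1p(xs)

print(gap.min() >= -1e-12)                              # bound holds everywhere
print(abs(log1p_major(xk, xk) - np.log1p(xk)) < 1e-12)  # tight at x = xk

# the two branches of rat meet continuously at |x| = eps
print(abs(rat(0.1 - 1e-9) - rat(0.1 + 1e-9)) < 1e-6)
```

The first two checks confirm that, since log(1+x) is concave, its tangent line is a global majorizer that touches the function at x^{(k)}, which is exactly the property the MM surrogate needs; the last check confirms the smoothing removes the kink of the raw fractional function at the origin while keeping the penalty continuous.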