arXiv manuscript No. (will be inserted by the editor)

Higher-order Moment Optimization via The Difference-of-Convex Programming and Sums-of-Squares

Yi-Shuai Niu · Ya-Juan Wang

Received: date / Accepted: date

Abstract We are interested in developing a Difference-of-Convex (DC) programming approach based on Difference-of-Convex-Sums-of-Squares (DC-SOS) decomposition techniques for the high-order moment (Mean-Variance-Skewness-Kurtosis) portfolio optimization model. This problem can be formulated as a nonconvex quartic multivariate polynomial optimization problem, for which we investigate a DC programming formulation based on the recently developed DC-SOS decomposition. A well-known DC algorithm, namely DCA, is used for its numerical solution. Moreover, we propose an acceleration technique for DCA, namely Boosted-DCA (BDCA), which uses an inexact (Armijo-type) line search to accelerate the convergence of DCA for smooth and nonsmooth DC programs with convex constraints. This technique is applied to DCA based on the DC-SOS decomposition and to DCA based on the universal DC decomposition. Numerical simulations of DCA and Boosted-DCA on synthetic and real datasets are reported. Comparisons with some non-DC-programming-based optimization solvers (KNITRO, FILTERSD, IPOPT and MATLAB fmincon) demonstrate that our Boosted-DC algorithms achieve the same numerical results, with performance comparable to these efficient methods, on the high-order moment portfolio optimization model.

Keywords High-order moment portfolio optimization · Difference-of-Convex programming · Difference-of-Convex-Sums-of-Squares · Boosted-DCA

Funding: The authors are supported by the National Natural Science Foundation of China (Grant 11601327).

Yi-Shuai Niu
School of Mathematical Sciences, and SJTU-Paristech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China
E-mail: [email protected]

Ya-Juan Wang
College of Business, Shanghai University of Finance and Economics, Shanghai, China

arXiv:1906.01509v2 [math.OC] 25 Apr 2020

Mathematics Subject Classification (2010) 91G10 · 90C06 · 90C29 · 90C30 · 90C90

1 Introduction

The concepts of portfolio optimization and diversification are fundamental to understanding financial markets and financial decision making. The major breakthrough came in [30] with the introduction of the mean-variance portfolio selection model (MV model) developed by Harry Markowitz (Nobel Laureate in Economics in 1990). This model provided an answer to the fundamental question: How should an investor allocate funds among the possible investment choices? Markowitz first quantified the return and risk of a security, using the statistical measures of its expected return and variance. Then, he suggested that investors should consider return and risk together, and determine the allocation of funds based on their return-risk trade-off. Before Markowitz's seminal article, the finance literature had treated the interplay between return and risk in an ad hoc fashion. Based on the MV model, investors seek, among the infinitely many portfolios that achieve a particular return objective, the one with the smallest variance. The portfolio theory had a major impact on academic research and the financial industry, and is often referred to as "the first revolution on Wall Street". More discussion of the MV model can be found in the review article [49].

For a long time, there has been a confusion that applying the MV model requires Gaussian return distributions. This is not true! On this issue, Markowitz declared in [31] that "the persistence of the Great Confusion - that MV analysis is applicable in practice only when return distributions are Gaussian or utility functions quadratic - is as if geography textbooks of 1550 still described the Earth as flat." In fact, normality of asset returns is not necessary in the MV model, and it has been widely rejected in empirical tests. Many return distributions in real markets exhibit fat tails and asymmetry, which significantly affect portfolio choices and asset pricing [4,22].
For example, [18] showed that in the presence of positive skewness, investors may be willing to accept a negative expected return. There is a rich literature that attempts to model higher-order moments in the pricing of derivative securities, starting from the classic models of [33] (jump-diffusions) and [20] (stochastic volatility); see [7] for more related works. Therefore, many scholars have suggested introducing high-order moments such as skewness (3rd-order moment) and kurtosis (4th-order moment) into the portfolio optimization model.

The first work attempting to extend the MV model to higher-order moments was proposed in [21]. Some noteworthy works such as [4] and [27] focused mainly on the mean-variance-skewness model (MVS model). Later, further extensions of high-order moment portfolio models incorporating kurtosis were investigated by several authors (e.g., [10], [11], [29], and [17]). From a mathematical point of view, a higher-order moment portfolio model can be viewed as an approximation of the general expected utility function, in which one considers the Taylor series expansion of the utility function and drops the higher-order terms from the expansion. Therefore, the classical MV model is in fact a rough approximation of the general utility function, and a higher-order moment model will be more accurate. The reader is referred to the excellent survey article on 60 years of development in portfolio optimization [24] for more information about different portfolio selection models.

Despite the advantages of higher-order moment portfolio models, they are seldom used in practice. There are many reasons; typically, practitioners rely upon a utility function based on the mean-variance approximation, which is trusted to perform well enough [27].
Moreover, due to the limitations of computing power in the 20th century, constructing and solving a higher-order moment portfolio model was very difficult; e.g., a model with a quartic polynomial approximation and several hundred assets was already intractable. Fortunately, with the rapid development of CPU and GPU hardware in the early 21st century, as well as adequate computer memory, the computing power available today can handle some higher-order moment portfolios (at least portfolios of moderate size). Therefore, it is the right time to investigate numerical solutions of higher-order moment portfolio optimization.

In this paper, we focus on a high-order moment portfolio model which takes mean, variance, skewness, and kurtosis into consideration, namely the MVSK model. It consists of maximizing the mean and skewness of the portfolio while minimizing the variance and kurtosis. This is a multi-objective optimization problem which can be studied as a weighted nonconvex quartic polynomial optimization problem (see e.g., [1,46]). Due to the NP-hardness of general nonconvex polynomial optimization, we cannot expect to construct a polynomial-time global optimization algorithm to solve it (unless P = NP). Existing methods in related works include the following: stochastic algorithms (Differential Evolution and Stochastic Differential Equation) for high-order moment portfolio optimization are proposed in [29]; a DC (Difference-of-Convex) programming approach for the MVSK model, using the universal DC decomposition of polynomial functions over compact convex sets and the classical DCA, is investigated in [46]. Numerical comparisons of DCA with other nonlinear optimization methods, e.g., Lasserre's hierarchy (Gloptipoly), the sequential quadratic programming method (MATLAB fmincon), the trust-region method (MATLAB fmincon), and branch-and-bound (LINGO), are also reported in that paper.
Recently, [5] used machine learning approaches (e.g., regularization and cross-validation) in high-order moment portfolio optimization. [9] proposed a new class of non-negative symmetric tensors to reformulate a kurtosis minimization portfolio model as a multi-linear form optimization model, which can be solved by the MBI or BCD method.

Our contributions include: (i) Construction of a new DC decomposition for the MVSK model based on the recently developed Difference-of-Convex-Sums-of-Squares (DC-SOS) decomposition technique proposed in [37]. This decomposition is expected to be better than the universal one. (ii) Investigation of a Boosted-DCA, namely BDCA (inspired by [2]), for convex constrained DC programs (both smooth and nonsmooth cases), based on an Armijo-type line search, in order to accelerate the convergence of DCA applied to the MVSK model. More general portfolio models with higher-order moments (of order higher than 4) can be derived accordingly and solved by our DC programming approaches. (iii) Development of a MATLAB software package for the proposed DC algorithms (DCA, BDCA, UDCA and UBDCA). We test their numerical performance and compare them with the nonlinear optimization solvers KNITRO, FILTERSD, IPOPT and MATLAB fmincon. Numerical simulations using synthetic and real datasets illustrate the good performance of our DC programming approaches (particularly the Boosted-DC algorithms) for solving MVSK models.

The paper is organized as follows: Section 2 presents the MVSK model. The DC-SOS decomposition technique and the DC programming formulation for the MVSK model based on DC-SOS decomposition are introduced in Section 3. The DC programming algorithm (DCA) for finding KKT solutions of the MVSK model is proposed in Section 4. Then, we focus on establishing Boosted-DCA for smooth and nonsmooth DC programs with convex constraints in Section 5, where some theoretical results are also proved.
The application of Boosted-DCA to DC-SOS decompositions and to universal DC decompositions is discussed in Sections 6 and 7, respectively. Numerical results comparing different DC algorithms and some classical nonlinear optimization approaches on both synthetic and real datasets are reported in Section 8. Conclusions and topics worth studying in the future are presented in the final section.

2 High-order Moment Portfolio Optimization Model

Consider a portfolio with n assets. In this section, we present a high-order moment portfolio optimization model involving the first four moments (Mean-Variance-Skewness-Kurtosis, namely MVSK). A portfolio model involving moments beyond the 4th order can be defined in a similar way.

2.1 Portfolio Inputs

The inputs of the MVSK model consist of the first four moments and co-moments of the portfolio returns, estimated by the sample moments and co-moments defined as follows. Let $\mathbb{E}$ denote the expectation operator; let $n$ be the number of assets and $T$ be the number of periods; let $R_{i,t}$ be the return rate of asset $i \in \{1,\ldots,n\}$ in period $t \in \{1,\ldots,T\}$. The return rate of asset $i$ is denoted by $R_i$, and $R = (R_i) \in \mathbb{R}^n$ stands for the return rate vector. We have

1. Mean (1st-order moment): denoted by $\mu = (\mu_i) \in \mathbb{R}^n$, whose $i$-th element $\mu_i$ is defined by
\[ \mu_i := \mathbb{E}(R_i) \approx \frac{1}{T} \sum_{t=1}^{T} R_{i,t}. \tag{1} \]

2. Variance and Covariance (second central moment and co-moment): denoted by $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{n^2}$, where $\sigma_{i,j}$ is defined by
\[ \sigma_{i,j} := \mathbb{E}[(R_i - \mu_i)(R_j - \mu_j)] \approx \frac{1}{T-1} \sum_{t=1}^{T} (R_{i,t} - \mu_i)(R_{j,t} - \mu_j). \tag{2} \]

3. Skewness and Co-skewness (third central moment and co-moment): denoted by $S = (S_{i,j,k}) \in \mathbb{R}^{n^3}$, where $S_{i,j,k}$ is defined by
\[ S_{i,j,k} := \mathbb{E}[(R_i - \mu_i)(R_j - \mu_j)(R_k - \mu_k)] \approx \frac{1}{T} \sum_{t=1}^{T} (R_{i,t} - \mu_i)(R_{j,t} - \mu_j)(R_{k,t} - \mu_k). \tag{3} \]

4. Kurtosis and Co-kurtosis (fourth central moment and co-moment): denoted by $K = (K_{i,j,k,l}) \in \mathbb{R}^{n^4}$, where $K_{i,j,k,l}$ is defined by
\[ K_{i,j,k,l} := \mathbb{E}[(R_i - \mu_i)(R_j - \mu_j)(R_k - \mu_k)(R_l - \mu_l)] \approx \frac{1}{T} \sum_{t=1}^{T} (R_{i,t} - \mu_i)(R_{j,t} - \mu_j)(R_{k,t} - \mu_k)(R_{l,t} - \mu_l). \tag{4} \]

These inputs can be written as tensors and easily computed from data using the formulas (1), (2), (3) and (4). Note that these tensors are fully symmetric, e.g., $\Sigma$ is a real symmetric positive semi-definite matrix, and the values of $S_{i,j,k}$ (resp. $K_{i,j,k,l}$) are equal for all permutations of the indices $(i,j,k)$ (resp. $(i,j,k,l)$). Therefore, we only need to compute $\binom{n+1}{2}$, $\binom{n+2}{3}$ and $\binom{n+3}{4}$ independent elements, respectively.

When dealing with these high-order moments and co-moments, it is convenient to "slice" these tensors and create a big matrix from the slices. In our previous work [46], we discussed using the Kronecker product $\otimes$ to rewrite the co-skewness (resp. co-kurtosis) tensor as an $n \times n^2$ (resp. $n \times n^3$) matrix via the formulations
\[ \hat{S} = \mathbb{E}[(R - \mu)(R - \mu)^T \otimes (R - \mu)^T], \]
\[ \hat{K} = \mathbb{E}[(R - \mu)(R - \mu)^T \otimes (R - \mu)^T \otimes (R - \mu)^T]. \]
Then $\hat{S}$ and $\hat{K}$ are converted into sparse matrices by keeping only the independent elements based on symmetry. This computing technique is very useful when dealing with large-scale cases with large $n$.
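The sample formulas (1)-(4) and the symmetry count can be illustrated with a small sketch. This is not the authors' MATLAB implementation: the function names `sample_comoments` and `entry`, the dictionary storage keyed by sorted index tuples, and the toy return data are all assumptions made for illustration, and plain Python is used only to keep the example self-contained.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def sample_comoments(returns, order):
    """Independent entries of the order-th sample central co-moment tensor.

    returns: list of T rows; returns[t][i] is the return of asset i in period t.
    Result: dict mapping sorted index tuples (i <= j <= ...) to sample values.
    """
    T, n = len(returns), len(returns[0])
    mu = [sum(row[i] for row in returns) / T for i in range(n)]   # formula (1)
    dev = [[row[i] - mu[i] for i in range(n)] for row in returns]
    denom = T - 1 if order == 2 else T   # (2) divides by T-1; (3) and (4) by T
    return {
        idx: sum(prod(d[i] for i in idx) for d in dev) / denom
        for idx in combinations_with_replacement(range(n), order)
    }


def entry(tensor, *idx):
    """Read any entry of a fully symmetric tensor by sorting its indices."""
    return tensor[tuple(sorted(idx))]


# Made-up toy data: T = 4 periods, n = 3 assets.
R = [[0.01, 0.02, -0.01],
     [0.03, -0.01, 0.00],
     [-0.02, 0.01, 0.02],
     [0.00, 0.02, 0.01]]
Sigma, S, K = (sample_comoments(R, k) for k in (2, 3, 4))
n = len(R[0])
print(len(Sigma), len(S), len(K))   # numbers of independent entries stored
```

Storing only sorted index tuples keeps exactly the independent elements; for large n, a sparse matrix layout such as the sliced matrices described above would be the more practical choice.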

2.2 Mean-Variance-Skewness-Kurtosis Portfolio Model

Let us denote the decision variable of the portfolio (called portfolio weights) as $x \in \mathbb{R}^n$. We assume that no short sales or leverage are allowed, i.e., $x \geq 0$ and its entries sum to one; thus $x$ is restricted to the standard $(n-1)$-simplex
\[ \Omega := \{x \in \mathbb{R}^n_+ : e^T x = 1\}, \]
where $e$ denotes the vector of ones. The first four portfolio moments are functions of the portfolio decision variable $x$ defined as follows:

1. Mean (1st-order portfolio moment): $m_1 : \mathbb{R}^n \to \mathbb{R}$, $x \mapsto \mu^T x$.

2. Variance (2nd-order portfolio moment): $m_2 : \mathbb{R}^n \to \mathbb{R}$, $x \mapsto x^T \Sigma x$.

3. Skewness (3rd-order portfolio moment):
\[ m_3 : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \sum_{i,j,k=1}^{n} S_{i,j,k}\, x_i x_j x_k. \]

4. Kurtosis (4th-order portfolio moment):
\[ m_4 : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \sum_{i,j,k,l=1}^{n} K_{i,j,k,l}\, x_i x_j x_k x_l. \]

A rational investor prefers high odd moments, as this would decrease extreme values on the side of losses and increase them on the side of gains. As for even moments, the wider the tails of the return distribution, the higher the even moments will be. Therefore, the investor prefers low even moments, which implies decreased dispersion of the payoffs and less uncertainty of returns [48]. Based on these observations, the MVSK portfolio optimization model consists of maximizing the expected return and skewness while minimizing the variance and kurtosis [12, 41].

Let us denote $F : \mathbb{R}^n \to \mathbb{R}^4$ defined by
\[ F(x) := (-m_1(x),\, m_2(x),\, -m_3(x),\, m_4(x))^T. \]
The MVSK model is described as a multi-objective optimization problem:
\[ \min\{F(x) : x \in \Omega\}, \]
which can be further investigated as a weighted single-objective optimization problem:
\[ \begin{array}{rl} \min & f(x) = c^T F(x) \\ \text{s.t.} & x \in \Omega, \end{array} \tag{MVSK} \]
where the parameter $c \in \mathbb{R}^4$ denotes the investor's preference and satisfies $c \geq 0$. For example, a risk-seeking investor will put more weight on $c_1$ and $c_3$, while a risk-averse investor will put more weight on $c_2$ and $c_4$.
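To make the objective of (MVSK) concrete, the following sketch evaluates the four portfolio moments and f(x) = c^T F(x) by brute-force tensor contraction. The helper names (`comoment_tensor`, `portfolio_moment`, `mvsk_objective`), the toy returns and the equal-weight portfolio are hypothetical, and the full-tensor loop is only sensible for very small n.

```python
from itertools import product
from math import prod


def comoment_tensor(returns, order):
    """Full sample co-moment tensor as a dict over all index tuples of length `order`."""
    T, n = len(returns), len(returns[0])
    mu = [sum(r[i] for r in returns) / T for i in range(n)]
    dev = [[r[i] - mu[i] for i in range(n)] for r in returns]
    denom = T - 1 if order == 2 else T   # formula (2) uses T-1, (3) and (4) use T
    return {idx: sum(prod(d[i] for i in idx) for d in dev) / denom
            for idx in product(range(n), repeat=order)}


def portfolio_moment(tensor, x):
    """m_k(x): sum over all index tuples of tensor[idx] * x[i1] * ... * x[ik]."""
    return sum(v * prod(x[i] for i in idx) for idx, v in tensor.items())


def mvsk_objective(returns, x, c):
    """f(x) = c^T F(x) with F(x) = (-m1(x), m2(x), -m3(x), m4(x))."""
    T, n = len(returns), len(returns[0])
    mu = [sum(r[i] for r in returns) / T for i in range(n)]
    m1 = sum(mu[i] * x[i] for i in range(n))
    m2, m3, m4 = (portfolio_moment(comoment_tensor(returns, k), x) for k in (2, 3, 4))
    F = (-m1, m2, -m3, m4)
    return sum(ci * Fi for ci, Fi in zip(c, F))


# Made-up toy data and an equal-weight portfolio on the simplex.
R = [[0.01, 0.02, -0.01],
     [0.03, -0.01, 0.00],
     [-0.02, 0.01, 0.02],
     [0.00, 0.02, 0.01]]
x = [1 / 3, 1 / 3, 1 / 3]
f_value = mvsk_objective(R, x, c=(1.0, 1.0, 1.0, 1.0))
```

A useful sanity check is that m_3(x) and m_4(x) computed this way coincide with the third and fourth sample central moments of the portfolio return series x^T R_t.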

3 DC formulation for (MVSK) model based on sums-of-squares

The (MVSK) model, as a nonconvex quartic polynomial optimization problem, can be formulated as a DC (Difference-of-Convex) programming problem, since any polynomial function is $C^\infty$ and is therefore a DC function. However, constructing a DC decomposition for a high-order polynomial, i.e., rewriting the original polynomial as the difference of two convex polynomials, is not trivial for polynomials of degree higher than 2. In [46], we discussed the construction of a DC decomposition for the objective function $f$ using a universal DC decomposition technique of the form
\[ f(x) = \frac{\rho}{2}\|x\|^2 - \left(\frac{\rho}{2}\|x\|^2 - f(x)\right) \]
with a large enough parameter $\rho$. Specifically, the parameter $\rho$ is taken no smaller than an upper bound of the spectral radius of the Hessian matrix of $f$ over $\Omega$. The quality of this kind of decomposition depends on the parameter $\rho$, and a small $\rho$ is always preferred to a large one. The reason is that when $\rho$ is too large, the DC components $\frac{\rho}{2}\|x\|^2$ and $\frac{\rho}{2}\|x\|^2 - f(x)$ become more convex. We have shown in [37, 39] that a better DC decomposition, in the framework of DCA, must be undominated (i.e., the DC components must be as little convex as possible).

In this paper, we will use a new DC decomposition technique that does not require estimating the parameter $\rho$, namely the DC-SOS (Difference-of-Convex-SOS) decomposition introduced in [37]. We present some main results about DC-SOS decomposition for polynomials in the next subsection.
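As a small illustration of the universal decomposition, consider the toy cubic f(x) = -x_0^2 x_1 on the simplex of R^2 (this function and the value rho = 2 are assumptions chosen for the example, not taken from the paper). Its Hessian has rows (-2x_1, -2x_0) and (-2x_0, 0), so the maximum absolute row sum, which bounds the spectral radius, is at most 2 on the simplex. The sketch checks midpoint convexity of the second DC component numerically.

```python
import random

RHO = 2.0   # upper bound on the spectral radius of the Hessian of f over the simplex


def f(x):
    """Toy nonconvex polynomial f(x) = -x0^2 * x1 (illustrative assumption)."""
    return -x[0] ** 2 * x[1]


def g(x):
    """First DC component: (rho/2) * ||x||^2."""
    return 0.5 * RHO * (x[0] ** 2 + x[1] ** 2)


def h(x):
    """Second DC component: (rho/2) * ||x||^2 - f(x), so that f = g - h."""
    return g(x) - f(x)


rng = random.Random(0)
worst_gap = 0.0
for _ in range(1000):
    s, t = rng.random(), rng.random()
    a, b = (s, 1 - s), (t, 1 - t)                       # two points on the simplex
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    # For a convex h this gap is never positive (up to rounding).
    worst_gap = max(worst_gap, h(mid) - (h(a) + h(b)) / 2)
```

A larger rho would also give a valid decomposition, but with both components more convex than necessary, which is the drawback discussed above.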

3.1 DC-SOS decomposition for polynomials

The basic idea of DC-SOS decomposition is to represent any polynomial as a difference of two convex sums-of-squares (namely, CSOS) polynomials. One can prove that any polynomial can be rewritten as DC-SOS in polynomial time by solving a semi-definite programming problem (SDP). The minimal degree of the DC components equals the degree of the polynomial if it is even, and the degree of the polynomial plus one if it is odd.

Let $x \in \mathbb{R}^n$, and let $\mathbb{R}[x]$ be the vector space of real-valued polynomials in the variable $x$ with coefficients in $\mathbb{R}$. The set $\mathbb{R}_d[x]$ stands for the vector subspace of $\mathbb{R}[x]$ consisting of polynomials of degree $\leq d$. It is well known that $\mathbb{R}[x]$ is an infinite-dimensional space and that the subspace $\mathbb{R}_d[x]$ is a finite-dimensional space with $\dim \mathbb{R}_d[x] = \binom{n+d}{n}$.

Definition 1 (SOS polynomial) A polynomial $p$ is called Sums-Of-Squares (SOS) if there exist polynomials $q_1, \ldots, q_m$ such that $p = \sum_{i=1}^{m} q_i^2$. The set of all SOS polynomials in $\mathbb{R}[x]$ is denoted by $\mathrm{SOS}_n$, and the subset of $\mathrm{SOS}_n$ in $\mathbb{R}_d[x]$ is denoted by $\mathrm{SOS}_{n,d}$.

We can extend the definition of the SOS polynomial with convexity to get the definition of the CSOS polynomial. Analogously, the set of all CSOS polynomials in $\mathbb{R}[x]$ is denoted by $\mathrm{CSOS}_n$, and the subset of $\mathrm{CSOS}_n$ in $\mathbb{R}_d[x]$ is denoted by $\mathrm{CSOS}_{n,d}$. For the sets of SOS and CSOS polynomials, we have

Proposition 1 (See [37]) Both $\mathrm{SOS}_n$ and $\mathrm{CSOS}_n$ (resp. $\mathrm{SOS}_{n,2d}$ and $\mathrm{CSOS}_{n,2d}$) are proper cones¹ in $\mathbb{R}[x]$ (resp. $\mathbb{R}_{2d}[x]$).

¹ A proper cone is a full-dimensional (solid), closed, pointed (containing no line) convex cone.

Now, considering the difference of two SOS or CSOS polynomials, we introduce the definition of D-SOS and DC-SOS polynomials as follows:

Definition 2 (D-SOS and DC-SOS polynomial) A polynomial $p$ is called difference-of-sums-of-squares (D-SOS) (resp. difference-of-convex-sums-of-squares (DC-SOS)) if there exist SOS (resp. CSOS) polynomials $s_1$ and $s_2$ such that
\[ p = s_1 - s_2. \]

• The components $s_1$ and $s_2$ are D-SOS (resp. DC-SOS) components of $p$;
• The set of all D-SOS (resp. DC-SOS) polynomials in $\mathbb{R}[x]$ is denoted by $\text{D-SOS}_n$ (resp. $\text{DC-SOS}_n$);
• The subset of $\text{D-SOS}_n$ (resp. $\text{DC-SOS}_n$) in $\mathbb{R}_d[x]$ is denoted by $\text{D-SOS}_{n,d}$ (resp. $\text{DC-SOS}_{n,d}$).

Using Proposition 1, we can prove that

Proposition 2 (See [37]) Both $\text{D-SOS}_n$ and $\text{DC-SOS}_n$ (resp. $\text{D-SOS}_{n,d}$ and $\text{DC-SOS}_{n,d}$) are vector subspaces of $\mathbb{R}[x]$ (resp. $\mathbb{R}_d[x]$).

Moreover, the next theorem shows that the three vector spaces $\mathbb{R}[x]$, $\text{D-SOS}_n$ and $\text{DC-SOS}_n$ coincide, and gives the minimal degree for each D-SOS or DC-SOS component.

Theorem 1 (See [37])

• $\mathbb{R}[x] = \text{D-SOS}_n = \text{DC-SOS}_n$.
• For any $p \in \mathbb{R}[x]$, there exist D-SOS (resp. DC-SOS) components $s_1$ and $s_2$ such that $\max\{\deg(s_1), \deg(s_2)\} = 2\left\lceil \frac{\deg(p)}{2} \right\rceil$.

The complexity of constructing D-SOS and DC-SOS decompositions is stated as follows:

Theorem 2 (See [37]) Any polynomial $p \in \mathbb{R}_d[x]$ can be rewritten as DC-SOS (resp. D-SOS) in polynomial time by solving an SDP.

The proofs of Theorems 1 and 2, and Propositions 1 and 2 can be found in our paper [37]. Note that the SDP is used only as a theoretical device to prove the polynomial-time constructibility of D-SOS and DC-SOS decompositions with minimal degree. In practice, we never solve SDPs, since the SDPs associated with high-order polynomials are often very large-scale and inefficient to solve numerically, even though polynomial-time interior point methods exist. In fact, for DC-SOS decomposition, we suggest using the practical algorithms (e.g., Parity DC-SOS decomposition or Minimal degree DC-SOS decomposition) established in [37], which are based on the next three elementary cases:

▷ For $x_i x_j$: we can use either
\[ x_i x_j = \underbrace{\tfrac{1}{4}(x_i + x_j)^2}_{\text{CSOS}} - \underbrace{\tfrac{1}{4}(x_i - x_j)^2}_{\text{CSOS}}, \tag{5} \]
or
\[ x_i x_j = \underbrace{\tfrac{1}{2}(x_i + x_j)^2}_{\text{CSOS}} - \underbrace{\tfrac{1}{2}(x_i^2 + x_j^2)}_{\text{CSOS}}. \tag{6} \]
A single variable $x_i$ is a special case of $x_i x_j$ with $x_j = 1$.

▷ For $x_i^{2k}$, $k \in \mathbb{N}$: a trivial DC-SOS decomposition is
\[ x_i^{2k} = \underbrace{x_i^{2k}}_{\text{CSOS}} - \underbrace{0}_{\text{CSOS}}. \tag{7} \]

▷ For $p \times q$ with $(p, q) \in \text{DC-SOS}^2$: let $p_1 - p_2$ and $q_1 - q_2$ be DC-SOS decompositions of $p$ and $q$; then a DC-SOS decomposition of $p \times q$ is
\[ p \times q = \frac{1}{2}\left[(p_1 + q_1)^2 + (p_2 + q_2)^2\right] - \frac{1}{2}\left[(p_1 + q_2)^2 + (p_2 + q_1)^2\right]. \tag{8} \]

In the (MVSK) model, we are going to find DC decompositions based on DC-SOS for the portfolio moments $m_i$, $i = 1, \ldots, 4$. The first two moments $m_1$ and $m_2$ are already convex ($m_1$ is linear and $m_2$ is convex quadratic), so we will only focus on $m_3$ and $m_4$. In the next two subsections, we will use the equations (5), (6), (7) and (8) to produce DC formulations for $m_3$ and $m_4$.
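The product rule (8) can be checked numerically. The sketch below, with the hypothetical helper `dcsos_product`, combines the decomposition (5) of x_0 x_1 with the same formula applied to x_2 (second factor set to 1) and verifies at random points both the identity g - h = x_0 x_1 x_2 and midpoint convexity of g and h. It relies on the standard fact that the square of a nonnegative convex function is convex.

```python
import random


def dcsos_product(p1, p2, q1, q2):
    """Formula (8): from p = p1 - p2 and q = q1 - q2, build (g, h) with p*q = g - h."""
    def g(x):
        return 0.5 * ((p1(x) + q1(x)) ** 2 + (p2(x) + q2(x)) ** 2)

    def h(x):
        return 0.5 * ((p1(x) + q2(x)) ** 2 + (p2(x) + q1(x)) ** 2)

    return g, h


# p = x0 * x1 via formula (5)
p1 = lambda x: 0.25 * (x[0] + x[1]) ** 2
p2 = lambda x: 0.25 * (x[0] - x[1]) ** 2
# q = x2, i.e. formula (5) with the second factor equal to 1
q1 = lambda x: 0.25 * (x[2] + 1) ** 2
q2 = lambda x: 0.25 * (x[2] - 1) ** 2

g, h = dcsos_product(p1, p2, q1, q2)   # DC components of x0 * x1 * x2

rng = random.Random(0)
max_identity_error = 0.0
max_convexity_gap = 0.0
for _ in range(200):
    a = [rng.uniform(-2, 2) for _ in range(3)]
    b = [rng.uniform(-2, 2) for _ in range(3)]
    mid = [(u + v) / 2 for u, v in zip(a, b)]
    max_identity_error = max(max_identity_error, abs(g(a) - h(a) - a[0] * a[1] * a[2]))
    max_convexity_gap = max(max_convexity_gap,
                            g(mid) - (g(a) + g(b)) / 2,
                            h(mid) - (h(a) + h(b)) / 2)
```

A random test of this kind is only a sanity check, not a proof of convexity.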

3.2 DC decomposition for m3

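By the symmetry of the co-skewness tensor $S$, we can rewrite $m_3$ as
\[
\begin{aligned}
m_3(x) &= \sum_{i,j,k=1}^{n} S_{i,j,k}\, x_i x_j x_k \\
&= \sum_{i=1}^{n} S_{i,i,i}\, x_i^3 && \text{(three common indices)} \\
&\quad + \binom{3}{1} \sum_{i=1}^{n} \sum_{k \neq i} S_{i,i,k}\, x_i^2 x_k && \text{(two common indices)} \\
&\quad + 3! \sum_{1 \leq i < j < k \leq n} S_{i,j,k}\, x_i x_j x_k && \text{(no common index).}
\end{aligned}
\]

Let $\mathcal{N} = \{1, \ldots, n\}$, $\mathcal{P} = \{(i,k) : i \in \mathcal{N}, k \neq i\}$, and $\mathcal{Q} = \{(i,j,k) : 1 \leq i < j < k \leq n\}$, with sizes $|\mathcal{N}| = n$, $|\mathcal{P}| = n(n-1)$, and $|\mathcal{Q}| = \binom{n}{3}$. Then the expression of $m_3$ simplifies to
\[ m_3(x) = \sum_{i \in \mathcal{N}} S_{i,i,i}\, x_i^3 + 3 \sum_{(i,k) \in \mathcal{P}} S_{i,i,k}\, x_i^2 x_k + 6 \sum_{(i,j,k) \in \mathcal{Q}} S_{i,j,k}\, x_i x_j x_k. \]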

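There are three types of monomials in $m_3$, namely $x_i^3$, $x_i^2 x_k$ and $x_i x_j x_k$, whose DC decompositions are given by:

▷ For $x_i^3$, $\forall i \in \mathcal{N}$: since $x_i^3$ is convex on $\mathbb{R}^n_+ \supset \Omega$, a DC decomposition for $x_i^3$ on $\mathbb{R}^n_+$ is trivial: $x_i^3 = g_i(x) - h_i(x)$ with
\[ g_i(x) = x_i^3; \quad h_i(x) = 0 \tag{9} \]
being both convex functions on $\mathbb{R}^n_+$. Their gradients are
\[ \nabla g_i(x) = 3 x_i^2 e_i; \quad \nabla h_i(x) = 0_{\mathbb{R}^n}, \tag{10} \]
with $e_i$ being the $i$-th unit vector of $\mathbb{R}^n$.

▷ For $x_i^2 x_k$, $\forall (i,k) \in \mathcal{P}$: a DC-SOS formulation is
\[
\begin{aligned}
x_i^2 x_k &= \frac{1}{4}(x_i^2 - 0)\left[(x_k + 1)^2 - (x_k - 1)^2\right] \\
&= \frac{1}{8}\left[\left(x_i^2 + (x_k + 1)^2\right)^2 + (x_k - 1)^4\right] - \frac{1}{8}\left[(x_k + 1)^4 + \left(x_i^2 + (x_k - 1)^2\right)^2\right] \\
&= g_{i,k}(x) - h_{i,k}(x),
\end{aligned}
\]
where
\[ g_{i,k}(x) = \frac{1}{8}\left[\left(x_i^2 + (x_k + 1)^2\right)^2 + (x_k - 1)^4\right], \tag{11} \]
\[ h_{i,k}(x) = \frac{1}{8}\left[(x_k + 1)^4 + \left(x_i^2 + (x_k - 1)^2\right)^2\right] \tag{12} \]
are both CSOS on $\mathbb{R}^n$. Their gradients are
\[ \nabla g_{i,k}(x) = \frac{1}{2} x_i \left(x_k^2 + 2 x_k + x_i^2 + 1\right) e_i + \frac{1}{2}\left(2 x_k^3 + x_i^2 x_k + 6 x_k + x_i^2\right) e_k, \tag{13} \]
\[ \nabla h_{i,k}(x) = \frac{1}{2} x_i \left(x_k^2 - 2 x_k + x_i^2 + 1\right) e_i + \frac{1}{2}\left(2 x_k^3 + x_i^2 x_k + 6 x_k - x_i^2\right) e_k. \tag{14} \]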
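The closed-form gradients (13) and (14) are easy to mistype, so a finite-difference check is a cheap safeguard. The sketch below codes g_{i,k} and h_{i,k} from (11)-(12) as functions of the two scalars x_i and x_k, and compares the stated gradients with central differences; the helper `numeric_grad`, the test point and the step size are illustrative assumptions.

```python
def g(xi, xk):
    """g_{i,k} from (11)."""
    return ((xi ** 2 + (xk + 1) ** 2) ** 2 + (xk - 1) ** 4) / 8


def h(xi, xk):
    """h_{i,k} from (12)."""
    return ((xk + 1) ** 4 + (xi ** 2 + (xk - 1) ** 2) ** 2) / 8


def grad_g(xi, xk):
    """Components of (13) along e_i and e_k."""
    return (0.5 * xi * (xk ** 2 + 2 * xk + xi ** 2 + 1),
            0.5 * (2 * xk ** 3 + xi ** 2 * xk + 6 * xk + xi ** 2))


def grad_h(xi, xk):
    """Components of (14) along e_i and e_k."""
    return (0.5 * xi * (xk ** 2 - 2 * xk + xi ** 2 + 1),
            0.5 * (2 * xk ** 3 + xi ** 2 * xk + 6 * xk - xi ** 2))


def numeric_grad(func, xi, xk, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    return ((func(xi + eps, xk) - func(xi - eps, xk)) / (2 * eps),
            (func(xi, xk + eps) - func(xi, xk - eps)) / (2 * eps))


xi, xk = 0.3, 0.7
print(g(xi, xk) - h(xi, xk), xi ** 2 * xk)   # the two values should agree
print(grad_g(xi, xk), numeric_grad(g, xi, xk))
print(grad_h(xi, xk), numeric_grad(h, xi, xk))
```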

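▷ For $x_i x_j x_k$, $\forall (i,j,k) \in \mathcal{Q}$: a DC-SOS decomposition is given in a similar fashion as
\[
\begin{aligned}
x_i x_j x_k &= (x_i x_j)(x_k) \\
&= \frac{1}{16}\left[(x_i + x_j)^2 - (x_i - x_j)^2\right]\left[(x_k + 1)^2 - (x_k - 1)^2\right] \\
&= \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k + 1)^2\right)^2 + \left((x_i - x_j)^2 + (x_k - 1)^2\right)^2\right] \\
&\quad - \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k - 1)^2\right)^2 + \left((x_i - x_j)^2 + (x_k + 1)^2\right)^2\right] \\
&= g_{i,j,k}(x) - h_{i,j,k}(x),
\end{aligned}
\]
where
\[ g_{i,j,k}(x) = \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k + 1)^2\right)^2 + \left((x_i - x_j)^2 + (x_k - 1)^2\right)^2\right], \tag{15} \]
\[ h_{i,j,k}(x) = \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k - 1)^2\right)^2 + \left((x_i - x_j)^2 + (x_k + 1)^2\right)^2\right] \tag{16} \]
are both CSOS on $\mathbb{R}^n$. Their gradients are
\[
\begin{aligned}
\nabla g_{i,j,k}(x) ={}& \frac{1}{4}\left(x_i x_k^2 + 2 x_j x_k + 3 x_i x_j^2 + x_i^3 + x_i\right) e_i \\
&+ \frac{1}{4}\left(x_j x_k^2 + 2 x_i x_k + x_j^3 + 3 x_i^2 x_j + x_j\right) e_j \\
&+ \frac{1}{4}\left(x_k^3 + x_j^2 x_k + x_i^2 x_k + 3 x_k + 2 x_i x_j\right) e_k,
\end{aligned} \tag{17}
\]
\[
\begin{aligned}
\nabla h_{i,j,k}(x) ={}& \frac{1}{4}\left(x_i x_k^2 - 2 x_j x_k + 3 x_i x_j^2 + x_i^3 + x_i\right) e_i \\
&+ \frac{1}{4}\left(x_j x_k^2 - 2 x_i x_k + x_j^3 + 3 x_i^2 x_j + x_j\right) e_j \\
&+ \frac{1}{4}\left(x_k^3 + x_j^2 x_k + x_i^2 x_k + 3 x_k - 2 x_i x_j\right) e_k.
\end{aligned} \tag{18}
\]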

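It follows that a DC decomposition of $m_3$ is $m_3(x) = g_{m_3}(x) - h_{m_3}(x)$, where
\[
\begin{aligned}
g_{m_3}(x) ={}& \sum_{i \in \mathcal{I}^+(S)} S_{i,i,i}\, g_i(x) + 3 \sum_{(i,k) \in \mathcal{J}^+(S)} S_{i,i,k}\, g_{i,k}(x) - 3 \sum_{(i,k) \in \mathcal{J}^-(S)} S_{i,i,k}\, h_{i,k}(x) \\
&+ 6 \sum_{(i,j,k) \in \mathcal{K}^+(S)} S_{i,j,k}\, g_{i,j,k}(x) - 6 \sum_{(i,j,k) \in \mathcal{K}^-(S)} S_{i,j,k}\, h_{i,j,k}(x),
\end{aligned} \tag{19}
\]
\[
\begin{aligned}
h_{m_3}(x) ={}& - \sum_{i \in \mathcal{I}^-(S)} S_{i,i,i}\, g_i(x) + 3 \sum_{(i,k) \in \mathcal{J}^+(S)} S_{i,i,k}\, h_{i,k}(x) - 3 \sum_{(i,k) \in \mathcal{J}^-(S)} S_{i,i,k}\, g_{i,k}(x) \\
&+ 6 \sum_{(i,j,k) \in \mathcal{K}^+(S)} S_{i,j,k}\, h_{i,j,k}(x) - 6 \sum_{(i,j,k) \in \mathcal{K}^-(S)} S_{i,j,k}\, g_{i,j,k}(x)
\end{aligned} \tag{20}
\]
are both convex functions on $\mathbb{R}^n_+$, in which the index sets are $\mathcal{I}^+(S) := \{i \in \mathcal{N} : S_{i,i,i} > 0\}$; $\mathcal{I}^-(S) := \{i \in \mathcal{N} : S_{i,i,i} < 0\}$; $\mathcal{J}^+(S) := \{(i,k) \in \mathcal{P} : S_{i,i,k} > 0\}$; $\mathcal{J}^-(S) := \{(i,k) \in \mathcal{P} : S_{i,i,k} < 0\}$; $\mathcal{K}^+(S) := \{(i,j,k) \in \mathcal{Q} : S_{i,j,k} > 0\}$; and $\mathcal{K}^-(S) := \{(i,j,k) \in \mathcal{Q} : S_{i,j,k} < 0\}$. The functions $g_i$, $g_{i,k}$, $h_{i,k}$, $g_{i,j,k}$ and $h_{i,j,k}$ are given respectively in (9), (11), (12), (15) and (16), and their gradients are computed accordingly.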
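The assembly rule in (19)-(20) amounts to routing each weighted monomial to g_{m_3} or h_{m_3} according to the sign of its coefficient. The following sketch, with the hypothetical function `dc_m3` and a made-up symmetric tensor S stored as nested lists, follows that rule and checks that g_{m_3}(x) - h_{m_3}(x) reproduces m_3(x) on a point of the simplex.

```python
from itertools import combinations, permutations, product
import random


def g2(a, b):       # g_{i,k} in (11), with a = x_i, b = x_k
    return ((a * a + (b + 1) ** 2) ** 2 + (b - 1) ** 4) / 8


def h2(a, b):       # h_{i,k} in (12)
    return ((b + 1) ** 4 + (a * a + (b - 1) ** 2) ** 2) / 8


def g3(a, b, c):    # g_{i,j,k} in (15)
    return (((a + b) ** 2 + (c + 1) ** 2) ** 2 + ((a - b) ** 2 + (c - 1) ** 2) ** 2) / 32


def h3(a, b, c):    # h_{i,j,k} in (16)
    return (((a + b) ** 2 + (c - 1) ** 2) ** 2 + ((a - b) ** 2 + (c + 1) ** 2) ** 2) / 32


def dc_m3(S, x):
    """Return (g_m3(x), h_m3(x)) following (19)-(20) for x >= 0."""
    n = len(x)
    # Each term is (coefficient, g-part, h-part) of one monomial.
    terms = [(S[i][i][i], x[i] ** 3, 0.0) for i in range(n)]                 # set N
    terms += [(3 * S[i][i][k], g2(x[i], x[k]), h2(x[i], x[k]))
              for i, k in permutations(range(n), 2)]                        # set P
    terms += [(6 * S[i][j][k], g3(x[i], x[j], x[k]), h3(x[i], x[j], x[k]))
              for i, j, k in combinations(range(n), 3)]                     # set Q
    g = h = 0.0
    for s, gv, hv in terms:
        if s > 0:            # s * (gv - hv): gv goes to g_m3, hv to h_m3
            g += s * gv
            h += s * hv
        else:                # s <= 0: (-s) * hv goes to g_m3, (-s) * gv to h_m3
            g -= s * hv
            h -= s * gv
    return g, h


# Made-up symmetric co-skewness tensor built from random deviations.
rng = random.Random(1)
n, T = 4, 30
dev = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(T)]
S = [[[sum(d[i] * d[j] * d[k] for d in dev) / T for k in range(n)]
      for j in range(n)] for i in range(n)]
x = [0.1, 0.2, 0.3, 0.4]
g_val, h_val = dc_m3(S, x)
m3 = sum(S[i][j][k] * x[i] * x[j] * x[k] for i, j, k in product(range(n), repeat=3))
```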

3.3 DC decomposition for m4

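The DC decomposition for $m_4$ is constructed in a similar way as for $m_3$. We first rewrite $m_4$ as
\[
\begin{aligned}
m_4(x) ={}& \sum_{i \in \mathcal{N}} K_{i,i,i,i}\, x_i^4 + 4 \sum_{(i,k) \in \mathcal{P}} K_{i,i,i,k}\, x_i^3 x_k + 6 \sum_{(i,k) \in \widehat{\mathcal{P}}} K_{i,i,k,k}\, x_i^2 x_k^2 \\
&+ 12 \sum_{(i,j,k) \in \widehat{\mathcal{Q}}} K_{i,i,j,k}\, x_i^2 x_j x_k + 24 \sum_{(i,j,k,l) \in \mathcal{R}} K_{i,j,k,l}\, x_i x_j x_k x_l,
\end{aligned}
\]
where $\mathcal{N} = \{1, \ldots, n\}$, $\mathcal{P} = \{(i,k) : i \in \mathcal{N}, k \neq i\}$, $\widehat{\mathcal{P}} = \{(i,k) \in \mathcal{P} : k > i\}$, $\widehat{\mathcal{Q}} = \{(i,j,k) : i \in \mathcal{N},\ j < k,\ j \neq i,\ k \neq i\}$, and $\mathcal{R} = \{(i,j,k,l) : 1 \leq i < j < k < l \leq n\}$. The sizes of these sets are $|\mathcal{N}| = n$, $|\mathcal{P}| = n(n-1)$, $|\widehat{\mathcal{P}}| = \frac{n(n-1)}{2}$, $|\widehat{\mathcal{Q}}| = n\binom{n-1}{2}$, and $|\mathcal{R}| = \binom{n}{4}$.

There are 5 types of monomials, $x_i^4$, $x_i^3 x_k$, $x_i^2 x_k^2$, $x_i^2 x_j x_k$ and $x_i x_j x_k x_l$, whose DC decompositions are established as follows:

▷ For $x_i^4$, $\forall i \in \mathcal{N}$: $x_i^4 = x_i^4 - 0 = \tilde{g}_i(x) - \tilde{h}_i(x)$, where
\[ \tilde{g}_i(x) = x_i^4; \quad \tilde{h}_i(x) = 0 \tag{21} \]
are convex functions on $\mathbb{R}^n$, and their gradients are
\[ \nabla \tilde{g}_i(x) = 4 x_i^3 e_i; \quad \nabla \tilde{h}_i(x) = 0_{\mathbb{R}^n}. \tag{22} \]

▷ For $x_i^2 x_k^2$, $\forall (i,k) \in \widehat{\mathcal{P}}$: $x_i^2 x_k^2 = \hat{g}_{i,k}(x) - \hat{h}_{i,k}(x)$, where
\[ \hat{g}_{i,k}(x) = \frac{1}{2}\left(x_i^2 + x_k^2\right)^2; \quad \hat{h}_{i,k}(x) = \frac{1}{2}\left(x_i^4 + x_k^4\right) \tag{23} \]
are convex functions on $\mathbb{R}^n$, and their gradients are
\[ \nabla \hat{g}_{i,k}(x) = 2\left(x_k^2 + x_i^2\right)(x_i e_i + x_k e_k); \tag{24} \]
\[ \nabla \hat{h}_{i,k}(x) = 2\left(x_i^3 e_i + x_k^3 e_k\right). \tag{25} \]

▷ For $x_i^3 x_k$, $\forall (i,k) \in \mathcal{P}$: $x_i^3 x_k = \tilde{g}_{i,k}(x) - \tilde{h}_{i,k}(x)$, where
\[ \tilde{g}_{i,k}(x) = \frac{1}{8}\left[\left(x_i^2 + (x_i + x_k)^2\right)^2 + (x_i - x_k)^4\right]; \tag{26} \]
\[ \tilde{h}_{i,k}(x) = \frac{1}{8}\left[(x_i + x_k)^4 + \left(x_i^2 + (x_i - x_k)^2\right)^2\right] \tag{27} \]
are convex functions on $\mathbb{R}^n$, and their gradients are
\[ \nabla \tilde{g}_{i,k}(x) = \frac{1}{2} x_i \left(7 x_k^2 + 3 x_i x_k + 5 x_i^2\right) e_i + \frac{1}{2}\left(2 x_k^3 + 7 x_i^2 x_k + x_i^3\right) e_k; \tag{28} \]
\[ \nabla \tilde{h}_{i,k}(x) = \frac{1}{2} x_i \left(7 x_k^2 - 3 x_i x_k + 5 x_i^2\right) e_i + \frac{1}{2}\left(2 x_k^3 + 7 x_i^2 x_k - x_i^3\right) e_k. \tag{29} \]

▷ For $x_i^2 x_j x_k$, $\forall (i,j,k) \in \widehat{\mathcal{Q}}$: $x_i^2 x_j x_k = \tilde{g}_{i,j,k}(x) - \tilde{h}_{i,j,k}(x)$, where
\[ \tilde{g}_{i,j,k}(x) = \frac{1}{8}\left[\left(x_i^2 + (x_j + x_k)^2\right)^2 + (x_j - x_k)^4\right]; \tag{30} \]
\[ \tilde{h}_{i,j,k}(x) = \frac{1}{8}\left[(x_j + x_k)^4 + \left(x_i^2 + (x_j - x_k)^2\right)^2\right] \tag{31} \]
are convex functions on $\mathbb{R}^n$, and their gradients are
\[
\begin{aligned}
\nabla \tilde{g}_{i,j,k}(x) ={}& \frac{1}{2} x_i \left(x_k^2 + 2 x_j x_k + x_j^2 + x_i^2\right) e_i \\
&+ \frac{1}{2}\left(6 x_j x_k^2 + x_i^2 x_k + 2 x_j^3 + x_i^2 x_j\right) e_j \\
&+ \frac{1}{2}\left(2 x_k^3 + 6 x_j^2 x_k + x_i^2 x_k + x_i^2 x_j\right) e_k,
\end{aligned} \tag{32}
\]
\[
\begin{aligned}
\nabla \tilde{h}_{i,j,k}(x) ={}& \frac{1}{2} x_i \left(x_k^2 - 2 x_j x_k + x_j^2 + x_i^2\right) e_i \\
&+ \frac{1}{2}\left(6 x_j x_k^2 - x_i^2 x_k + 2 x_j^3 + x_i^2 x_j\right) e_j \\
&+ \frac{1}{2}\left(2 x_k^3 + 6 x_j^2 x_k + x_i^2 x_k - x_i^2 x_j\right) e_k.
\end{aligned} \tag{33}
\]

▷ For $x_i x_j x_k x_l$, $\forall (i,j,k,l) \in \mathcal{R}$: $x_i x_j x_k x_l = g_{i,j,k,l}(x) - h_{i,j,k,l}(x)$, where
\[ g_{i,j,k,l}(x) = \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k + x_l)^2\right)^2 + \left((x_i - x_j)^2 + (x_k - x_l)^2\right)^2\right]; \tag{34} \]
\[ h_{i,j,k,l}(x) = \frac{1}{32}\left[\left((x_i + x_j)^2 + (x_k - x_l)^2\right)^2 + \left((x_i - x_j)^2 + (x_k + x_l)^2\right)^2\right] \tag{35} \]
are convex functions on $\mathbb{R}^n$, and their gradients are
\[
\begin{aligned}
\nabla g_{i,j,k,l}(x) ={}& \frac{1}{4}\left(x_i x_l^2 + 2 x_j x_k x_l + x_i x_k^2 + 3 x_i x_j^2 + x_i^3\right) e_i \\
&+ \frac{1}{4}\left(x_j x_l^2 + 2 x_i x_k x_l + x_j x_k^2 + x_j^3 + 3 x_i^2 x_j\right) e_j \\
&+ \frac{1}{4}\left(3 x_k x_l^2 + 2 x_i x_j x_l + x_k^3 + x_j^2 x_k + x_i^2 x_k\right) e_k \\
&+ \frac{1}{4}\left(x_l^3 + 3 x_k^2 x_l + x_j^2 x_l + x_i^2 x_l + 2 x_i x_j x_k\right) e_l;
\end{aligned} \tag{36}
\]
\[
\begin{aligned}
\nabla h_{i,j,k,l}(x) ={}& \frac{1}{4}\left(x_i x_l^2 - 2 x_j x_k x_l + x_i x_k^2 + 3 x_i x_j^2 + x_i^3\right) e_i \\
&+ \frac{1}{4}\left(x_j x_l^2 - 2 x_i x_k x_l + x_j x_k^2 + x_j^3 + 3 x_i^2 x_j\right) e_j \\
&+ \frac{1}{4}\left(3 x_k x_l^2 - 2 x_i x_j x_l + x_k^3 + x_j^2 x_k + x_i^2 x_k\right) e_k \\
&+ \frac{1}{4}\left(x_l^3 + 3 x_k^2 x_l + x_j^2 x_l + x_i^2 x_l - 2 x_i x_j x_k\right) e_l.
\end{aligned} \tag{37}
\]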
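The four nontrivial monomial decompositions (23), (26)-(27), (30)-(31) and (34)-(35) can be sanity-checked at random points. In the sketch below the function names (`g_hat`, `g_31`, `g_211`, `g_1111` and their `h` counterparts) are hypothetical labels for the components in those equations, taking the relevant scalar coordinates as arguments.

```python
import random


def g_hat(a, b):            # (23), for x_i^2 x_k^2
    return 0.5 * (a * a + b * b) ** 2


def h_hat(a, b):            # (23)
    return 0.5 * (a ** 4 + b ** 4)


def g_31(a, b):             # (26), for x_i^3 x_k
    return ((a * a + (a + b) ** 2) ** 2 + (a - b) ** 4) / 8


def h_31(a, b):             # (27)
    return ((a + b) ** 4 + (a * a + (a - b) ** 2) ** 2) / 8


def g_211(a, b, c):         # (30), for x_i^2 x_j x_k
    return ((a * a + (b + c) ** 2) ** 2 + (b - c) ** 4) / 8


def h_211(a, b, c):         # (31)
    return ((b + c) ** 4 + (a * a + (b - c) ** 2) ** 2) / 8


def g_1111(a, b, c, d):     # (34), for x_i x_j x_k x_l
    return (((a + b) ** 2 + (c + d) ** 2) ** 2 + ((a - b) ** 2 + (c - d) ** 2) ** 2) / 32


def h_1111(a, b, c, d):     # (35)
    return (((a + b) ** 2 + (c - d) ** 2) ** 2 + ((a - b) ** 2 + (c + d) ** 2) ** 2) / 32


rng = random.Random(2)
max_err = 0.0
for _ in range(200):
    a, b, c, d = (rng.uniform(-1.5, 1.5) for _ in range(4))
    max_err = max(
        max_err,
        abs(g_hat(a, b) - h_hat(a, b) - a * a * b * b),
        abs(g_31(a, b) - h_31(a, b) - a ** 3 * b),
        abs(g_211(a, b, c) - h_211(a, b, c) - a * a * b * c),
        abs(g_1111(a, b, c, d) - h_1111(a, b, c, d) - a * b * c * d),
    )
```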

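Therefore, a DC decomposition of $m_4$ is given by
\[ m_4(x) = g_{m_4}(x) - h_{m_4}(x), \]
where $g_{m_4}$ and $h_{m_4}$ are convex functions on $\mathbb{R}^n$ defined by
\[
\begin{aligned}
g_{m_4}(x) ={}& \sum_{i \in \mathcal{I}^+(K)} K_{i,i,i,i}\, \tilde{g}_i(x) \\
&+ 4 \sum_{(i,k) \in \mathcal{J}^+(K)} K_{i,i,i,k}\, \tilde{g}_{i,k}(x) - 4 \sum_{(i,k) \in \mathcal{J}^-(K)} K_{i,i,i,k}\, \tilde{h}_{i,k}(x) \\
&+ 6 \sum_{(i,k) \in \widehat{\mathcal{J}}^+(K)} K_{i,i,k,k}\, \hat{g}_{i,k}(x) - 6 \sum_{(i,k) \in \widehat{\mathcal{J}}^-(K)} K_{i,i,k,k}\, \hat{h}_{i,k}(x) \\
&+ 12 \sum_{(i,j,k) \in \widehat{\mathcal{K}}^+(K)} K_{i,i,j,k}\, \tilde{g}_{i,j,k}(x) - 12 \sum_{(i,j,k) \in \widehat{\mathcal{K}}^-(K)} K_{i,i,j,k}\, \tilde{h}_{i,j,k}(x) \\
&+ 24 \sum_{(i,j,k,l) \in \mathcal{L}^+(K)} K_{i,j,k,l}\, g_{i,j,k,l}(x) - 24 \sum_{(i,j,k,l) \in \mathcal{L}^-(K)} K_{i,j,k,l}\, h_{i,j,k,l}(x),
\end{aligned} \tag{38}
\]
\[
\begin{aligned}
h_{m_4}(x) ={}& - \sum_{i \in \mathcal{I}^-(K)} K_{i,i,i,i}\, \tilde{g}_i(x) \\
&+ 4 \sum_{(i,k) \in \mathcal{J}^+(K)} K_{i,i,i,k}\, \tilde{h}_{i,k}(x) - 4 \sum_{(i,k) \in \mathcal{J}^-(K)} K_{i,i,i,k}\, \tilde{g}_{i,k}(x) \\
&+ 6 \sum_{(i,k) \in \widehat{\mathcal{J}}^+(K)} K_{i,i,k,k}\, \hat{h}_{i,k}(x) - 6 \sum_{(i,k) \in \widehat{\mathcal{J}}^-(K)} K_{i,i,k,k}\, \hat{g}_{i,k}(x) \\
&+ 12 \sum_{(i,j,k) \in \widehat{\mathcal{K}}^+(K)} K_{i,i,j,k}\, \tilde{h}_{i,j,k}(x) - 12 \sum_{(i,j,k) \in \widehat{\mathcal{K}}^-(K)} K_{i,i,j,k}\, \tilde{g}_{i,j,k}(x) \\
&+ 24 \sum_{(i,j,k,l) \in \mathcal{L}^+(K)} K_{i,j,k,l}\, h_{i,j,k,l}(x) - 24 \sum_{(i,j,k,l) \in \mathcal{L}^-(K)} K_{i,j,k,l}\, g_{i,j,k,l}(x),
\end{aligned} \tag{39}
\]
in which the index sets are $\mathcal{I}^+(K) := \{i \in \mathcal{N} : K_{i,i,i,i} > 0\}$; $\mathcal{I}^-(K) := \{i \in \mathcal{N} : K_{i,i,i,i} < 0\}$; $\mathcal{J}^+(K) := \{(i,k) \in \mathcal{P} : K_{i,i,i,k} > 0\}$; $\mathcal{J}^-(K) := \{(i,k) \in \mathcal{P} : K_{i,i,i,k} < 0\}$; $\widehat{\mathcal{J}}^+(K) := \{(i,k) \in \widehat{\mathcal{P}} : K_{i,i,k,k} > 0\}$; $\widehat{\mathcal{J}}^-(K) := \{(i,k) \in \widehat{\mathcal{P}} : K_{i,i,k,k} < 0\}$; $\widehat{\mathcal{K}}^+(K) := \{(i,j,k) \in \widehat{\mathcal{Q}} : K_{i,i,j,k} > 0\}$; $\widehat{\mathcal{K}}^-(K) := \{(i,j,k) \in \widehat{\mathcal{Q}} : K_{i,i,j,k} < 0\}$; $\mathcal{L}^+(K) := \{(i,j,k,l) \in \mathcal{R} : K_{i,j,k,l} > 0\}$; and $\mathcal{L}^-(K) := \{(i,j,k,l) \in \mathcal{R} : K_{i,j,k,l} < 0\}$. The functions $\tilde{g}_i$, $\tilde{g}_{i,k}$, $\tilde{h}_{i,k}$, $\hat{g}_{i,k}$, $\hat{h}_{i,k}$, $\tilde{g}_{i,j,k}$, $\tilde{h}_{i,j,k}$, $g_{i,j,k,l}$ and $h_{i,j,k,l}$ are defined in (21), (26), (27), (23), (30), (31), (34) and (35) respectively, and their gradients are computed accordingly.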

3.4 DC formulation of (MVSK) model

Based on the discussions in the previous subsections, a DC decomposition for the polynomial objective function $f$ of the (MVSK) model is given by
\[
\begin{aligned}
f(x) &= -c_1 m_1(x) + c_2 m_2(x) - c_3 m_3(x) + c_4 m_4(x) \\
&= -c_1 m_1(x) + c_2 m_2(x) - c_3\left(g_{m_3}(x) - h_{m_3}(x)\right) + c_4\left(g_{m_4}(x) - h_{m_4}(x)\right) \\
&= G(x) - H(x),
\end{aligned}
\]
where
\[ G(x) = -c_1 m_1(x) + c_2 m_2(x) + c_3 h_{m_3}(x) + c_4 g_{m_4}(x), \tag{40} \]
\[ H(x) = c_3 g_{m_3}(x) + c_4 h_{m_4}(x) \tag{41} \]
are both convex quartic polynomials on $\Omega$, and $h_{m_3}(x)$, $g_{m_4}(x)$, $g_{m_3}(x)$ and $h_{m_4}(x)$ are defined in (20), (38), (19) and (39) respectively. The (MVSK) model is then formulated as a DC program:
\[ \min\{G(x) - H(x) : x \in \Omega\}. \tag{DCP} \]

Note that for portfolio optimization involving higher-order moments of order greater than 4, a DC programming formulation can be derived by using the DC-SOS decomposition technique presented in subsection 3.1.

4 DC algorithm for problem (DCP)

In this section, we will discuss how to use DCA (an efficient DC algorithm) for finding the KKT points of (DCP). Firstly, we will give a short introduction about the DC program and DCA, then we can apply DCA to solve (DCP).

4.1 DC program and DCA

Let us denote by $\Gamma_0(\mathbb{R}^n)$ the set of l.s.c. (lower semi-continuous) and proper convex functions defined on $\mathbb{R}^n$ with values in $(-\infty, +\infty]$, under the convention $(+\infty) - (+\infty) = +\infty$. The standard DC program is defined as
\[ \alpha = \min\{f(x) := g(x) - h(x) : x \in \mathbb{R}^n\}, \]
where $g$ and $h$ are both $\Gamma_0(\mathbb{R}^n)$ functions, and $\alpha$ denotes its optimal value, which is assumed to be finite (which implies $\emptyset \neq \operatorname{dom} g \subset \operatorname{dom} h$). A point $x^* \in \mathbb{R}^n$ is called a critical point of the standard DC program if $\partial g(x^*) \cap \partial h(x^*) \neq \emptyset$, where $\partial h(x^*)$ denotes the subdifferential of $h$ at $x^*$, defined by (see e.g. [47])
\[ \partial h(x^*) := \{y \in \mathbb{R}^n : h(x) \geq h(x^*) + \langle x - x^*, y \rangle,\ \forall x \in \mathbb{R}^n\}. \]
The subdifferential generalizes the derivative in the sense that $\partial h(x^*)$ reduces to the singleton $\{\nabla h(x^*)\}$ if $h$ is differentiable at $x^*$. Therefore, if $g$ and $h$ are both differentiable convex functions on $\mathbb{R}^n$, then a critical point $x^*$ satisfies the classical first-order optimality condition for unconstrained optimization, $\nabla f(x^*) = \nabla g(x^*) - \nabla h(x^*) = 0$.

Consider a convex constrained DC program:

min{g(x) − h(x): x ∈ C}

where $C \subset \mathbb{R}^n$ is a nonempty closed convex set. This problem can be rewritten as a standard DC program
\[ \min\{(g + \chi_C)(x) - h(x) : x \in \mathbb{R}^n\} \]
by introducing the indicator function of $C$, defined by
\[ \chi_C(x) = \begin{cases} 0, & \text{if } x \in C, \\ +\infty, & \text{otherwise.} \end{cases} \]
Clearly, $\chi_C$ belongs to $\Gamma_0(\mathbb{R}^n)$.

An efficient DC Algorithm for solving the standard DC program, called DCA, was first introduced by Pham Dinh Tao in 1985 as an extension of the subgradient method, and has been extensively developed by Le Thi Hoai An and Pham Dinh Tao since 1994. DCA consists of solving the standard DC program by a sequence of convex programs of the form

$$x^{k+1} \in \operatorname{argmin}\{g(x) - \langle x, y^k\rangle : x \in \mathbb{R}^n\}$$

with $y^k \in \partial h(x^k)$. This convex program is in fact derived from the convex overestimation of the DC function $f$ at the current iteration point $x^k$, denoted $f^k$, which is constructed by linearizing the DC component $h$ at $x^k$ as

$$f(x) = g(x) - h(x) \leq g(x) - \big(h(x^k) + \langle x - x^k, y^k\rangle\big) = f^k(x), \quad \forall x \in \mathbb{R}^n,$$

where $y^k \in \partial h(x^k)$. Then, clearly,

$$\operatorname{argmin}\{f^k(x) : x \in \mathbb{R}^n\} = \operatorname{argmin}\{g(x) - \langle x, y^k\rangle : x \in \mathbb{R}^n\}.$$

DCA applied to convex constrained DC program yields a similar scheme as:

$$x^{k+1} \in \operatorname{argmin}\{g(x) - \langle x, y^k\rangle : x \in C\}$$

with $y^k \in \partial h(x^k)$. DCA is usually terminated when one of the following stopping criteria is verified:

• $\|x^{k+1} - x^k\|/(1 + \|x^k\|) \leq \varepsilon_1$ (i.e., the sequence $\{x^k\}$ converges).
• $|f(x^{k+1}) - f(x^k)|/(1 + |f(x^k)|) \leq \varepsilon_2$ (i.e., the sequence $\{f(x^k)\}$ converges).

The convergence theorem of DCA (see e.g., [42]) states that DCA starting from any initial point $x^0 \in \operatorname{dom} \partial h$ generates a sequence $\{x^k\}$ such that

• The sequence $\{f(x^k)\}$ is decreasing and bounded below.
• Every limit point of the sequence $\{x^k\}$ is a critical point of the standard DC program.
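The scheme above can be illustrated on a one-dimensional toy problem. The following is a minimal sketch, not the authors' implementation: it assumes the hypothetical DC decomposition $f(x) = x^4 - x^2$ with $g(x) = x^4$ and $h(x) = x^2$ (chosen here only because the convex subproblem has a closed-form solution), and it records the chain $f(x^{k+1}) \leq f^k(x^{k+1}) \leq f(x^k)$ implied by the convex overestimation.

```python
# Toy DC function (an assumption for illustration): f = g - h
def g(x):
    return x ** 4

def h(x):
    return x ** 2

def f(x):
    return g(x) - h(x)

def grad_h(x):
    return 2.0 * x

def surrogate(x, xk):
    """Convex overestimator f^k(x) = g(x) - (h(xk) + (x - xk) * h'(xk))."""
    return g(x) - (h(xk) + (x - xk) * grad_h(xk))

def dca_step(xk):
    """Minimize g(x) - x * y with y = h'(xk); stationarity gives 4 x^3 = y."""
    y = grad_h(xk)
    root = abs(y / 4.0) ** (1.0 / 3.0)
    return root if y >= 0 else -root

def dca(x0, eps=1e-12, max_iter=1000):
    """Run DCA until the relative change of the objective is below eps."""
    xs = [x0]
    for _ in range(max_iter):
        x_old = xs[-1]
        x_new = dca_step(x_old)
        xs.append(x_new)
        if abs(f(x_new) - f(x_old)) / (1.0 + abs(f(x_old))) <= eps:
            break
    return xs

iterates = dca(1.0)
```

Starting from $x^0 = 1$, the iterates approach the critical point $1/\sqrt{2}$ of this toy function, and the objective values decrease along the way.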

Note that for a smooth DC program (i.e., both DC components $g$ and $h$ are smooth), a critical point is also a KKT point (we will show this later in Theorem 3). However, as is common in nonconvex optimization, a critical point (or a KKT point) may not be a local minimizer except in some particular cases (e.g., when the objective function $f$ is locally convex at critical points). A stronger notion than the critical point is the d(irectional)-stationary point, which is a critical point $x^*$ verifying $\emptyset \neq \partial h(x^*) \subset \partial g(x^*)$, i.e., $f'(x^*; x - x^*) \geq 0$ for all $x \in \operatorname{dom} g$. The notation $f'(x; d)$ stands for the directional derivative at $x$ in the direction $d$, defined by:

$$f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}.$$

Note that if $f$ is differentiable at $x$, then $f'(x; d) = \langle \nabla f(x), d\rangle$. In view of this, a d-stationary point seems more likely to be a local minimizer, but again, this is not true in general. For instance, let $g : x \mapsto 1$, $h : x \mapsto \|x\|^2$, and consider the point $x^* = 0$. Clearly $\nabla g(x^*) - \nabla h(x^*) = 0$, so by definition $x^*$ is a d-stationary point, but $x^*$ is a maximizer, not a minimizer. The reader can refer to [25, 26, 42–45] for more topics on DC programming and DCA.
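The counterexample can be checked numerically. The snippet below is only an illustrative sketch in two dimensions (the dimension, the sampled directions and the step `t` are arbitrary choices made here): it approximates the directional derivative of $f(x) = 1 - \|x\|^2$ at $x^* = 0$ by a one-sided difference and compares the function values around $x^*$.

```python
def f(x):
    """f = g - h with g(x) = 1 and h(x) = ||x||^2."""
    return 1.0 - sum(xi * xi for xi in x)

def directional_derivative(func, x, d, t=1e-8):
    """One-sided difference quotient approximating f'(x; d)."""
    shifted = [xi + t * di for xi, di in zip(x, d)]
    return (func(shifted) - func(x)) / t

x_star = [0.0, 0.0]
directions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [0.5, -0.5]]

# Approximate f'(x*; d) for each direction: all are numerically zero.
derivatives = [directional_derivative(f, x_star, d) for d in directions]

# Yet f is strictly smaller at every nearby point x* + 0.1 d.
nearby_values = [f([0.1 * di for di in d]) for d in directions]
```

Every sampled directional derivative vanishes up to rounding, while every nearby value is below $f(x^*) = 1$, which is consistent with $x^*$ being a maximizer.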

4.2 DCA for problem (DCP)

Since $G$ and $H$ defined in (40) and (41) are convex polynomial functions, $H$ is differentiable, and we can compute $\nabla H$ as:

$$\nabla H(x) = c_3 \nabla g_{m_3}(x) + c_4 \nabla h_{m_4}(x).$$

DCA requires solving a sequence of convex optimization problems as:

$$x^{k+1} \in \operatorname{argmin}\{G(x) - \langle x, \nabla H(x^k)\rangle : x \in \Omega\}. \tag{42}$$

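The detailed DCA for (DCP) is given in Algorithm 1:

Algorithm 1 DCA for (DCP)
Input: initial point $x^0 \in \mathbb{R}^n_+$; tolerance for optimal value $\varepsilon_1 > 0$; tolerance for optimal solution $\varepsilon_2 > 0$;
Output: computed optimal solution $x^*$ and optimal value $f^*$;
1: $k \leftarrow 0$; $\Delta f \leftarrow +\infty$; $\Delta x \leftarrow +\infty$;
2: while $\Delta f > \varepsilon_1$ or $\Delta x > \varepsilon_2$ do
3:   $x^{k+1} \in \operatorname{argmin}\{G(x) - \langle x, \nabla H(x^k)\rangle : x \in \Omega\}$;
4:   $f^* \leftarrow f(x^{k+1})$; $x^* \leftarrow x^{k+1}$;
5:   $\Delta f \leftarrow |f^* - f(x^k)|/(1 + |f^*|)$; $\Delta x \leftarrow \|x^* - x^k\|/(1 + \|x^*\|)$;
6:   $k \leftarrow k + 1$;
7: end while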

Theorem 3 (Convergence theorem of DCA Algorithm 1) DCA Algorithm 1 generates a sequence $\{x^k\}$ such that
• The sequence $\{f(x^k)\}$ is decreasing and bounded below.
• Every limit point of $\{x^k\}$ is a KKT point of problem (DCP).

Proof Based on the general convergence theorem of DCA described in subsection 4.1, to prove that the sequence $\{f(x^k)\}$ is decreasing and that every limit point of $\{x^k\}$ is a critical point of (DCP), we just need to show that both sequences $\{x^k\}$ and $\{f(x^k)\}$ are bounded. The sequence $\{x^k\}$ is bounded since every $x^k$, $k \in \mathbb{N}$, belongs to $\Omega$, which is a compact convex set. The boundedness of the sequence $\{f(x^k)\}$ is obvious since any polynomial is bounded over a compact set. Now, to prove that every limit point of $\{x^k\}$ is also a KKT point of (DCP), let $x^*$ denote a limit point of $\{x^k\}$. Then, based on DCA,

$$x^* \in \operatorname{argmin}\{g(x) - \langle \nabla h(x^*), x\rangle : x \in \Omega\}. \tag{43}$$

The above optimization problem is convex and $\Omega$ is a polyhedral convex set, thus the linearity constraint qualification is verified, and $x^*$ satisfies the KKT conditions of problem (43):

$$\begin{cases} \nabla g(x^*) - \nabla h(x^*) - \lambda - \mu e = 0, \\ x^* \in \Omega, \\ \lambda^T x^* = 0, \\ \lambda \geq 0, \ \mu \in \mathbb{R}, \end{cases}$$

where $(\lambda, \mu)$ is the Lagrangian multiplier. It is straightforward to show that these KKT conditions are exactly the KKT conditions of problem (DCP). Therefore, any limit point of $\{x^k\}$ is a KKT point of problem (DCP). □

Note that $\Delta f$ is often smaller than $\Delta x$, thus $\Delta f < \varepsilon_1$ is much easier to reach than $\Delta x < \varepsilon_2$ in numerical computation. In practice, we often terminate DCA by $\Delta f < \varepsilon_1$ only. The first reason is that the convergence of $\{f(x^k)\}$ is sufficient for the convergence of DCA. The more important reason is that the sequence $\{x^k\}$ may not be convergent if the component $G$ is not strictly convex.

5 Boosted-DCA and Armijo-type line search

For a nonlinear DC function whose optimal solution lies in a flat region, i.e., the objective function becomes very flat near the optimal solution, it is often observed that the convergence of DCA becomes very slow. This is due to the fact that the convex overestimation $f^k$ in a flat region is generally not flat, since $g$ is not flat; thus $f^k$ fits the objective function $f$ poorly in the flat region, which yields a small step to the next iteration point of DCA. For example, in our MVSK model, the convex overestimation $f^k$ of a quartic polynomial objective function $f$ is still a convex quartic polynomial function. When $f$ is very flat in a region, $f$, as a difference of two convex quartic polynomials, behaves locally like an affine function, whereas $f^k$, as $g$ plus an affine function, may be far from $f$ in this region. Therefore, $f^k$ could be a bad overestimation of $f$ in a flat region, and particularly so when the degree of $g$ is high.

In order to improve the performance of DCA for high-order polynomial optimization, we propose a Boosted-DCA (called BDCA), which introduces a line search using an Armijo-type rule to get an improved iteration point from the one obtained by DCA. Fukushima and Mine introduced such a line search in a proximal point algorithm for minimizing the sum of continuously differentiable functions with a closed proper convex function [14, 34]. Then, Aragón Artacho, Vuong et al. were the first to apply this type of line search to accelerate DCA for the standard DC program with smooth functions in [2], and extended it to nonsmooth cases (where the second DC component $h$ is supposed to be nonsmooth, and $g$ is still smooth) in [3]. In this paper, we investigate and extend this acceleration technique to convex constrained DC programs.

Consider the DC program with convex constraints defined as:

$$\min\{f(x) := g(x) - h(x) : x \in C\} \tag{P}$$

where $C$ is a nonempty closed convex set defined as

$$C := \{x \in \mathbb{R}^n : u(x) \leq 0, \ v(x) = 0\},$$

the functions $g$ and $h$ are in $\Gamma_0(\mathbb{R}^n)$, $u : \mathbb{R}^n \to \mathbb{R}^p$ is a convex function and $v : \mathbb{R}^n \to \mathbb{R}^q$ is an affine function.

5.1 DC descent direction

Definition 3 Let $x^k$ be a feasible point of problem (P), and $x^{k+1}$ be the next iteration point from $x^k$ obtained by DCA for problem (P). If the vector $d := x^{k+1} - x^k$ is a descent direction of $f$ at $x^{k+1}$ for problem (P), i.e., $\exists \eta > 0$, $\forall t \in\, ]0, \eta[$, $x^{k+1} + td \in C$ and $f(x^{k+1} + td) < f(x^{k+1})$, then $d$ is called a DC descent direction.

Note that the DC descent direction (as any other descent direction) can be used to perform a line search at $x^{k+1}$, which helps to accelerate the convergence of DCA. Next, we will discuss the existence of such a DC descent direction in the differentiable and non-differentiable cases.

5.1.1 Differentiable case

We first consider the differentiable case with the next assumption:

Assumption 1 The functions $g$, $h$ and $u$ in (P) are all differentiable, and some regularity conditions (e.g., the linearity constraint qualification, Slater's condition, or the Mangasarian-Fromovitz constraint qualification) hold.

We can prove the following theorems:

Theorem 4 Under Assumption 1, let $x^k$ be a feasible point of problem (P), $x^{k+1}$ be the next iteration point from $x^k$ obtained by DCA for problem (P), and $d = x^{k+1} - x^k$. Then $\langle \nabla f(x^{k+1}), d\rangle \leq 0$.

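Proof Based on DCA, the point $x^{k+1}$ is obtained as an optimal solution of the convex optimization problem

$$\min\{g(x) - \langle x, \nabla h(x^k)\rangle : u(x) \leq 0, \ v(x) = 0\}, \tag{44}$$

and, under the assumption of constraint qualifications, $x^{k+1}$ verifies the KKT conditions of problem (44):

$$\begin{cases} \nabla g(x^{k+1}) - \nabla h(x^k) + \sum_{i=1}^{p} \lambda_i \nabla u_i(x^{k+1}) + \sum_{j=1}^{q} \mu_j \nabla v_j(x^{k+1}) = 0, \\ u(x^{k+1}) \leq 0, \ v(x^{k+1}) = 0, \\ \lambda_i u_i(x^{k+1}) = 0, \ i = 1, \ldots, p, \\ \lambda \geq 0, \ \mu \in \mathbb{R}^q, \end{cases} \tag{45}$$

where $(\lambda, \mu) \in \mathbb{R}^p_+ \times \mathbb{R}^q$ is the Lagrangian multiplier. It follows from the KKT conditions (45) and $d = x^{k+1} - x^k$ that

$$\begin{aligned} \langle \nabla f(x^{k+1}), d\rangle &= \langle \nabla g(x^{k+1}) - \nabla h(x^{k+1}), d\rangle \\ &= \Big\langle \nabla h(x^k) - \sum_{i=1}^{p} \lambda_i \nabla u_i(x^{k+1}) - \sum_{j=1}^{q} \mu_j \nabla v_j(x^{k+1}) - \nabla h(x^{k+1}), d\Big\rangle \\ &= \underbrace{\langle \nabla h(x^k) - \nabla h(x^{k+1}), d\rangle}_{(I)} - \underbrace{\Big\langle \sum_{i=1}^{p} \lambda_i \nabla u_i(x^{k+1}) + \sum_{j=1}^{q} \mu_j \nabla v_j(x^{k+1}), d\Big\rangle}_{(II)}. \end{aligned} \tag{46}$$

The sign of part (I) is determined by the monotonicity of $\nabla h$, since $h$ is convex, i.e.,

$$(I) = -\langle \nabla h(x^k) - \nabla h(x^{k+1}), x^k - x^{k+1}\rangle \leq 0. \tag{47}$$

The sign of part (II) is determined by the convexity of $u$ and the affinity of $v$, i.e.,

$$\begin{cases} u_i(x^{k+1}) - u_i(x^k) \leq \langle \nabla u_i(x^{k+1}), d\rangle, & i = 1, \ldots, p, \\ v_j(x^{k+1}) - v_j(x^k) = \langle \nabla v_j(x^{k+1}), d\rangle, & j = 1, \ldots, q. \end{cases} \tag{48}$$

Then it follows from (45), (48), $u(x^k) \leq 0$ and $v(x^k) = 0$ that

$$\begin{aligned} (II) &= \sum_{i=1}^{p} \lambda_i \langle \nabla u_i(x^{k+1}), d\rangle + \sum_{j=1}^{q} \mu_j \langle \nabla v_j(x^{k+1}), d\rangle \\ &\geq \sum_{i=1}^{p} \lambda_i \big(u_i(x^{k+1}) - u_i(x^k)\big) + \sum_{j=1}^{q} \mu_j \big(v_j(x^{k+1}) - v_j(x^k)\big) \\ &= \underbrace{\sum_{i=1}^{p} \lambda_i u_i(x^{k+1})}_{=0} - \underbrace{\sum_{i=1}^{p} \lambda_i u_i(x^k)}_{\leq 0} + \underbrace{\sum_{j=1}^{q} \mu_j v_j(x^{k+1})}_{=0} - \underbrace{\sum_{j=1}^{q} \mu_j v_j(x^k)}_{=0} \\ &\geq 0. \end{aligned} \tag{49}$$

Combining (46), (47) and (49), we get the required inequality

$$\langle \nabla f(x^{k+1}), x^{k+1} - x^k\rangle = (I) - (II) \leq 0 - 0 = 0.$$

□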

Theorem 5 Under the same assumption as in Theorem 4, let $h$ be $\rho$-strongly convex, $x^k$ and $x^{k+1}$ be consecutive iteration points generated by DCA for problem (P), and $d = x^{k+1} - x^k$. Then we have

$$\langle \nabla f(x^{k+1}), d\rangle \leq -\rho \|d\|^2.$$

Proof The $\rho$-strong convexity of $h$ means that the function $x \mapsto h(x) - \frac{\rho}{2}\|x\|^2$ is still convex for such $\rho > 0$, and in this case, $\nabla h$ is strongly monotone, i.e.,

$$\langle \nabla h(x^k) - \nabla h(x^{k+1}), x^k - x^{k+1}\rangle \geq \rho \|x^{k+1} - x^k\|^2. \tag{50}$$

By analogy with the proof of Theorem 4, it follows from (46), (49) and (50) that

$$\langle \nabla f(x^{k+1}), x^{k+1} - x^k\rangle = (I) - (II) \leq -\rho \|x^k - x^{k+1}\|^2,$$

which yields the required inequality $\langle \nabla f(x^{k+1}), d\rangle \leq -\rho \|d\|^2$. □

Theorems 4 and 5 provide a potential descent direction $d = x^{k+1} - x^k$ for the DC function $f$ at the point $x^{k+1}$, namely the DC descent direction, which can be used to perform a line search for accelerating the convergence of DCA. Note that if the strict inequality $\langle \nabla f(x^{k+1}), d\rangle < 0$ holds and $d$ is a feasible direction, then $d$ must be a DC descent direction. In particular, based on Theorem 5, if $h$ is $\rho$-strongly convex, $d \neq 0$ and $d$ is a feasible direction of (P), then $\langle \nabla f(x^{k+1}), d\rangle \leq -\rho \|d\|^2 < 0$, so that $d$ is a DC descent direction.

Based on the above discussion, an important question for checking a DC descent direction is how to check whether $x^{k+1} - x^k$ is a feasible direction at $x^{k+1}$. Let $A(x) := \{i : u_i(x) = 0\}$ denote the active set at $x$. The next theorem partially answers this question.

Theorem 6 Let $x^k$ and $x^{k+1}$ be two consecutive iteration points obtained by DCA for problem (P). If $d = x^{k+1} - x^k$ is a feasible direction at $x^{k+1}$ for the closed convex set $C = \{x : u(x) \leq 0, \ v(x) = 0\}$, then $A(x^{k+1}) \subset A(x^k)$.

Proof For any feasible direction $d$ and for any index $i \in A(x^{k+1})$, we have the inequality $\langle \nabla u_i(x^{k+1}), d\rangle \leq 0$. Replacing $d$ by $x^{k+1} - x^k$ and using the convexity of $u_i$, we get

$$u_i(x^k) - u_i(x^{k+1}) \geq -\langle \nabla u_i(x^{k+1}), d\rangle \geq 0.$$

We obtain from the above inequality and $u_i(x^{k+1}) = 0$, $\forall i \in A(x^{k+1})$, that

$$u_i(x^k) \geq 0. \tag{51}$$

Moreover, since $x^k$ is a feasible point, $u_i(x^k) \leq 0$. Combining with (51), we obtain $u_i(x^k) = 0$, which implies $i \in A(x^k)$. □

Theorem 6 indicates that the condition $A(x^{k+1}) \subset A(x^k)$ is (at least) necessary for $x^{k+1} - x^k$ to be a feasible direction at $x^{k+1}$. Unfortunately, this condition is not sufficient in general. For instance, consider the constraint set $\{x \in \mathbb{R}^2 : u_1(x) = \|x\|^2 - 1 \leq 0, \ u_2(x) = \|x - (1, 0)\|^2 - 1 \leq 0\}$. If we take $x^k = (0.5, \sqrt{3}/2)$ and $x^{k+1} = (0, 0)$, then clearly $A(x^{k+1}) = \{2\} \subset \{1, 2\} = A(x^k)$, but the vector $x^{k+1} - x^k$ is not a feasible direction at $x^{k+1}$. The condition becomes sufficient if $u$ is affine, as proved in the next theorem.
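The counterexample can be verified numerically. The sketch below assumes that the second constraint is the unit disc centered at $(1, 0)$, which is the reading under which the stated active sets hold; the helper names and the tested step sizes are hypothetical choices made for illustration.

```python
import math

def u1(x):
    """Unit disc centered at the origin: ||x||^2 - 1 <= 0."""
    return x[0] ** 2 + x[1] ** 2 - 1.0

def u2(x):
    """Unit disc centered at (1, 0) (assumed center): ||x - (1, 0)||^2 - 1 <= 0."""
    return (x[0] - 1.0) ** 2 + x[1] ** 2 - 1.0

CONSTRAINTS = {1: u1, 2: u2}

def active_set(x, tol=1e-9):
    """Indices i with u_i(x) = 0 up to the tolerance tol."""
    return {i for i, u in CONSTRAINTS.items() if abs(u(x)) <= tol}

def is_feasible(x, tol=1e-12):
    return all(u(x) <= tol for u in CONSTRAINTS.values())

x_k = (0.5, math.sqrt(3.0) / 2.0)
x_k1 = (0.0, 0.0)
d = (x_k1[0] - x_k[0], x_k1[1] - x_k[1])

# Feasibility of x^{k+1} + t d for a range of small positive steps t.
steps = [10.0 ** (-e) for e in range(1, 7)]
feasible_along_d = [
    is_feasible((x_k1[0] + t * d[0], x_k1[1] + t * d[1])) for t in steps
]
```

Along $d$ one has $u_2(x^{k+1} + td) = t^2 + t > 0$ for every $t > 0$, so no positive step keeps the point in the constraint set even though the active-set inclusion holds.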

Theorem 7 If $u$ is affine, then the condition $A(x^{k+1}) \subset A(x^k)$ is both necessary and sufficient for $d = x^{k+1} - x^k$ being a feasible direction at $x^{k+1}$.

Proof The necessary part is given by Theorem 6. For proving the sufficient part, based on the affinity of $u$, we get

$$\langle \nabla u_i(x^{k+1}), d\rangle = u_i(x^{k+1}) - u_i(x^k) = 0, \quad \forall i \in A(x^{k+1}) \subset A(x^k),$$

which means $d$ is a feasible direction at $x^{k+1}$. □

Note that in our portfolio application with the simplex constraint $\Omega = \{x : e^T x = 1, \ x \geq 0\}$, i.e., $v(x) = e^T x - 1$ and $u(x) = -x$, the function $u$ is linear, so $A(x^{k+1}) \subset A(x^k)$ is a necessary and sufficient condition for the feasibility of the direction $x^{k+1} - x^k$.
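For the simplex, the active set at $x$ is simply the set of zero coordinates, so the test reduces to a set inclusion. The following sketch illustrates this; the function names and the tolerance are hypothetical, and the second helper (the largest step keeping $x^{k+1} + \alpha d \geq 0$) is an extra illustration rather than part of the algorithms in this paper.

```python
def active_set(x, tol=1e-12):
    """Active inequality constraints of the simplex: indices with x_i = 0."""
    return {i for i, xi in enumerate(x) if xi <= tol}

def is_feasible_direction(x_k, x_k1, tol=1e-12):
    """Check A(x^{k+1}) subset of A(x^k) for d = x^{k+1} - x^k on the simplex."""
    return active_set(x_k1, tol) <= active_set(x_k, tol)

def max_feasible_step(x_k1, d):
    """Largest alpha with x^{k+1} + alpha * d >= 0 (sum(d) = 0 keeps e^T x = 1)."""
    ratios = [-xi / di for xi, di in zip(x_k1, d) if di < 0]
    return min(ratios) if ratios else float("inf")

x_k = [0.5, 0.5, 0.0]
x_inside = [0.25, 0.75, 0.0]    # keeps the same zero pattern as x_k
x_vertex = [0.0, 1.0, 0.0]      # creates a new zero coordinate

d_inside = [b - a for a, b in zip(x_k, x_inside)]
ok_inside = is_feasible_direction(x_k, x_inside)
ok_vertex = is_feasible_direction(x_k, x_vertex)
alpha_max = max_feasible_step(x_inside, d_inside)
```

In the first case the direction is feasible and the step can be as large as `alpha_max`; in the second case a coordinate has just reached zero while still decreasing along $d$, so any positive step would leave the simplex.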

5.1.2 Non-differentiable case

An interesting question is whether Theorems 4, 5 and 6 can be generalized to the non-differentiable case and without regularity conditions. Fortunately, these theorems still hold under the next assumption:

Assumption 2 The function $g$ is still differentiable, but $h$ and $u$ are non-differentiable, and regularity conditions are not required in problem (P).

Theorem 8 Under Assumption 2, let $x^k$ be a feasible point of problem (P), $x^{k+1}$ be the next iteration point from $x^k$ obtained by DCA for problem (P), and $d = x^{k+1} - x^k$. Then we have
• If $h$ is non-differentiable convex, then $f'(x^{k+1}; d) \leq 0$.
• If $h$ is non-differentiable $\rho$-strongly convex (with $\rho > 0$), then $f'(x^{k+1}; d) \leq -\rho \|d\|^2$.
• If $d$ is a feasible direction at $x^{k+1}$, then $A(x^{k+1}) \subset A(x^k)$.

The proof of Theorem 8 for the non-differentiable case is quite different from the differentiable one. In our MVSK model, all polynomials are differentiable, so Theorem 8 is not needed for this problem. Its proof is given in the Appendix as a theoretical tool for other suitable applications. Note that if $u$ is affine, then $A(x^{k+1}) \subset A(x^k)$ is still a necessary and sufficient condition for checking a feasible direction $d$ at $x^{k+1}$, regardless of the differentiability of $h$.

5.2 Armijo-type line search

Suppose that $d = x^{k+1} - x^k$ is a DC descent direction for problem (P). We are going to find a suitable stepsize $\alpha > 0$ for moving from $x^{k+1}$ to $\hat{x}^{k+1}$ along the direction $d$ as

$$\hat{x}^{k+1} = x^{k+1} + \alpha d, \tag{52}$$

verifying $f(\hat{x}^{k+1}) < f(x^{k+1})$ and $\hat{x}^{k+1} \in C$. The exact line search finds the best $\alpha$ by solving the one-dimensional minimization problem

$$\min\{f(x^{k+1} + \alpha d) : \alpha > 0, \ x^{k+1} + \alpha d \in C\}$$

using classical line search methods such as Fibonacci and golden section search, or line search methods based on curve fitting. However, the exact line search is often cumbersome. As a matter of fact, for handling large-scale cases, it is often desirable to sacrifice accuracy in the line search in order to save overall computation time. Therefore, we are more interested in inexact line search, e.g., Armijo-type line search, in which we do not look for the best $\alpha$, but try to find an acceptable $\alpha$ satisfying $f(\hat{x}^{k+1}) < f(x^{k+1})$ and $\hat{x}^{k+1} \in C$.

We start from an initial trial stepsize $\alpha > 0$ (neither too large nor too small, e.g., $\alpha = 1$). Let $\sigma \in (0, 1)$ and $\beta \in (0, 1)$, where $\beta$ denotes the reduction factor (or decay factor) used to reduce the stepsize as

$$\alpha \leftarrow \beta \alpha,$$

and $\sigma$ is chosen close to zero, e.g., $\beta \in [0.1, 0.5]$ and $\sigma \in [10^{-5}, 0.1]$. For a $\rho$-strongly convex function $h$, the function $h(\cdot) - \frac{\rho}{2}\|\cdot\|^2$ is still convex. Then, once $\alpha$ has been reduced below $\rho$, we get from the classical Armijo rule (see e.g., [6]) and Theorem 5 or 8 that

$$f(x^{k+1}) - f(\hat{x}^{k+1}) \geq -\sigma \alpha f'(x^{k+1}; d) \geq \sigma \alpha \rho \|d\|^2 \geq \sigma \alpha^2 \|d\|^2 > 0.$$

Therefore, we can stop reducing $\alpha$ when $\hat{x}^{k+1} \in C$ and

$$f(\hat{x}^{k+1}) \leq f(x^{k+1}) - \sigma \alpha^2 \|d\|^2. \tag{53}$$

Otherwise, since $d$ is a descent direction, $f'(x^{k+1}; d) < 0$ and the Armijo rule becomes

$$f(x^{k+1}) - f(\hat{x}^{k+1}) \geq -\sigma \alpha f'(x^{k+1}; d) > 0.$$

We again stop reducing $\alpha$ when $\hat{x}^{k+1} \in C$ and the inequality (53) holds, in order to avoid the computation of $f'(x^{k+1}; d)$, which is more time-consuming than the computation of $\|d\|$.

Note that the choice of the parameters $\beta$ and $\sigma$ depends on the confidence we have in the initial stepsize $\alpha$. An essential idea of the Armijo line search is that the selected $\alpha$ should be neither too large nor too small. If $\alpha$ is too large, then we may need a fast reduction in $\alpha$, so that $\beta$ and $\sigma$ should be chosen small. If $\alpha$ is too small, e.g., $\alpha \leq \varepsilon/\|d\|$ for a given tolerance $\varepsilon > 0$, then we get from (52) that

$$\|\hat{x}^{k+1} - x^{k+1}\| = \|\alpha d\| \leq \varepsilon.$$

In this case, if $\hat{x}^{k+1} \notin C$, then there is no need to continue the line search and we return the initial point $x^{k+1} \in C$ instead of $\hat{x}^{k+1} \notin C$. The proposed Armijo-type line search is described in Algorithm 2.

Algorithm 2 Armijo line search
Input: descent direction $d = x^{k+1} - x^k$; point $x^{k+1}$; reduction factor $\beta \in (0, 1)$ (e.g., $\beta = 0.3$); initial stepsize $\alpha > 0$ (e.g., $\alpha = 1$); parameter $\sigma \in (0, 1)$ (e.g., $\sigma = 10^{-3}$); tolerance for line search $\varepsilon > 0$.
Output: potentially improved candidate $\hat{x}^{k+1}$.
1: while $\alpha > \varepsilon/\|d\|$ do
2:   $\hat{x}^{k+1} \leftarrow x^{k+1} + \alpha d$;
3:   $\Delta \leftarrow f(x^{k+1}) - f(\hat{x}^{k+1}) - \sigma \alpha^2 \|d\|^2$;
4:   if $\Delta \geq 0$ and $\hat{x}^{k+1} \in C$ then
5:     return $\hat{x}^{k+1}$;
6:   end if
7:   $\alpha \leftarrow \beta \alpha$;
8: end while
9: $\hat{x}^{k+1} \leftarrow x^{k+1}$;
10: return $\hat{x}^{k+1}$.
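Algorithm 2 translates almost line by line into code. The following Python sketch is an illustration only (the paper's implementation is in MATLAB); the callables `f` and `in_C` and the toy problem in the usage lines are assumptions introduced here.

```python
import math

def armijo_line_search(f, in_C, x_next, d,
                       alpha=1.0, beta=0.3, sigma=1e-3, eps=1e-8):
    """Armijo-type backtracking along d starting from x_next (Algorithm 2).

    f    : objective, maps a list of floats to a float
    in_C : membership test for the feasible set C
    Returns the improved candidate, or x_next itself if none is found.
    """
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_d == 0.0:
        return list(x_next)
    f_next = f(x_next)
    while alpha > eps / norm_d:
        x_hat = [xi + alpha * di for xi, di in zip(x_next, d)]
        delta = f_next - f(x_hat) - sigma * alpha ** 2 * norm_d ** 2
        if delta >= 0 and in_C(x_hat):
            return x_hat
        alpha *= beta  # shrink the stepsize and try again
    return list(x_next)

# Toy usage (hypothetical data): minimize (x - 2)^2 over C = [0, 1.5].
toy_f = lambda x: (x[0] - 2.0) ** 2
toy_in_C = lambda x: 0.0 <= x[0] <= 1.5

improved = armijo_line_search(toy_f, toy_in_C, [1.0], [1.0])
unchanged = armijo_line_search(toy_f, toy_in_C, [1.0], [-1.0])
```

In the first call the full step $\alpha = 1$ decreases $f$ but leaves $C$, so the step is reduced once to $\alpha = 0.3$ and accepted. In the second call $d$ is an ascent direction, the loop exhausts the stepsize, and the starting point is returned.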

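Algorithm 3 BDCA for problem (P)
Input: initial point $x^0 \in \mathbb{R}^n_+$; tolerance for optimal value $\varepsilon_1 > 0$; tolerance for optimal solution $\varepsilon_2 > 0$;
Output: optimal solution $x^*$; optimal value $f^*$;
1: $k \leftarrow 0$; $\Delta f \leftarrow +\infty$; $\Delta x \leftarrow +\infty$;
2: while $\Delta f > \varepsilon_1$ or $\Delta x > \varepsilon_2$ do
3:   $y^k \in \partial h(x^k)$;
4:   $x^{k+1} \in \operatorname{argmin}\{g(x) - \langle x, y^k\rangle : x \in C\}$;
5:   if $(x^k, x^{k+1}) \in C^2$ and $A(x^{k+1}) \subset A(x^k)$ then
6:     $d \leftarrow x^{k+1} - x^k$;
7:     if $f'(x^{k+1}; d) < 0$ then
8:       Armijo line search Algorithm 2 to get $\hat{x}^{k+1}$ from $x^{k+1}$;
9:       $x^{k+1} \leftarrow \hat{x}^{k+1}$
10:     end if
11:   end if
12:   $f^* \leftarrow f(x^{k+1})$; $x^* \leftarrow x^{k+1}$;
13:   $\Delta f \leftarrow |f^* - f(x^k)|/(1 + |f^*|)$; $\Delta x \leftarrow \|x^* - x^k\|/(1 + \|x^*\|)$;
14:   $k \leftarrow k + 1$;
15: end while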

5.3 Boosted-DCA

Combining the DC descent direction, Armijo line search, and DCA, we propose a boosted DCA, namely BDCA, described in Algorithm 3. Figure 1 illustrates how BDCA accelerates the convergence of DCA. The classical DCA starting from $x^k$ minimizes the convex overestimator $f^k$ of $f$ at $x^k$ to get a minimizer $x^{k+1}$, while BDCA additionally performs an Armijo-type line search at $x^{k+1}$ along the DC descent direction $d = x^{k+1} - x^k$, which leads to a better candidate $\hat{x}^{k+1}$ verifying $f(\hat{x}^{k+1}) \leq f(x^{k+1})$. Clearly, if this inequality is strict, then BDCA accelerates the convergence of DCA to the local minimum $x^*$ of $f$. This acceleration is particularly useful when the graph of $f$ is flat around the optimal solution, since then the graph of the convex overestimation $f^k$ may not fit the graph of $f$ well around the current iteration point.

[Figure: graphs of $f$ and $f^k$ with the points $x^k$, $x^{k+1}$, $\hat{x}^{k+1} = x^{k+1} + \alpha d$ and $x^*$ along the direction $d$, illustrating $f(x^*) \leq f(\hat{x}^{k+1}) \leq f(x^{k+1}) \leq f^k(x^{k+1}) \leq f^k(x^k) = f(x^k)$.]

Fig. 1 BDCA for acceleration.

Note that, starting from the same initial point, DCA and BDCA may yield different local minima regardless of the strong convexity of the DC components. Figure 2 illustrates how this happens.

[Figure: graphs of $f$ and $f^k$ with the points $x^k$, $x^{k+1}$, $\hat{x}^{k+1} = x^{k+1} + \alpha d$ and two local minima $x^*$ and $y^*$ along the direction $d$, illustrating $f(y^*) \leq f(\hat{x}^{k+1}) \leq f(x^{k+1}) \leq f^k(x^{k+1}) \leq f^k(x^k) = f(x^k)$.]

Fig. 2 BDCA and DCA may converge to different local minima

In Figure 2, using classical DCA from $x^k$ without line search, we will probably converge to the nearest local minimum $x^*$. However, if $\alpha$ is well chosen, it is possible to find a better candidate $\hat{x}^{k+1}$ (than $x^{k+1}$) located around a better local minimum $y^*$ (than $x^*$). This is a useful feature of the line search, which increases the potential of DCA to escape from the attraction of the closest but undesirable local minimum.
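To make the interplay between the DCA step and the line search concrete, here is a minimal one-dimensional sketch. It is not the authors' MVSKOPT code: it assumes the toy DC function $f(x) = x^4 - x^2$ with $g(x) = x^4$, $h(x) = x^2$ and $C = \mathbb{R}$, so that the feasibility and active-set tests of Algorithm 3 are trivially satisfied and are omitted.

```python
def f(x):
    return x ** 4 - x ** 2          # f = g - h with g = x^4, h = x^2

def grad_f(x):
    return 4.0 * x ** 3 - 2.0 * x

def dca_step(x):
    """argmin of x -> g(x) - x * h'(x_k), i.e. the real root of 4 x^3 = 2 x_k."""
    y = 2.0 * x
    root = abs(y / 4.0) ** (1.0 / 3.0)
    return root if y >= 0 else -root

def armijo(x_next, d, alpha=1.0, beta=0.3, sigma=1e-3, eps=1e-8):
    """Scalar version of Algorithm 2 (no constraint, since C is the real line)."""
    while alpha * abs(d) > eps:
        x_hat = x_next + alpha * d
        if f(x_next) - f(x_hat) - sigma * alpha ** 2 * d ** 2 >= 0:
            return x_hat
        alpha *= beta
    return x_next

def solve(x0, boosted, eps_f=1e-12, eps_x=1e-6, max_iter=500):
    """DCA (boosted=False) or BDCA (boosted=True); returns all iterates."""
    xs = [x0]
    for _ in range(max_iter):
        x_old = xs[-1]
        x_new = dca_step(x_old)
        d = x_new - x_old
        if boosted and grad_f(x_new) * d < 0:   # d is a descent direction
            x_new = armijo(x_new, d)
        xs.append(x_new)
        delta_f = abs(f(x_new) - f(x_old)) / (1.0 + abs(f(x_new)))
        delta_x = abs(x_new - x_old) / (1.0 + abs(x_new))
        if delta_f <= eps_f and delta_x <= eps_x:
            break
    return xs

dca_iterates = solve(1.0, boosted=False)
bdca_iterates = solve(1.0, boosted=True)
```

Both runs approach the local minimizer $1/\sqrt{2}$ with decreasing objective values; comparing `len(dca_iterates)` and `len(bdca_iterates)` gives a rough idea of the effect of the line search on this toy example.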

6 Boosted-DCA for the DC-SOS decomposition

Applying BDCA Algorithm 3 to problem (DCP) within the standard simplex $\Omega$, we can in particular choose the initial trial stepsize $\alpha = \frac{\sqrt{2}}{\|d\|}$, which is large enough since the distance between any two points in $\Omega$ is at most $\sqrt{2}$ and $\alpha = \|\hat{x}^{k+1} - x^{k+1}\|/\|d\| \leq \frac{\sqrt{2}}{\|d\|}$. Moreover, $H$ is differentiable so that $\partial H(x^k)$ reduces to the singleton $\{\nabla H(x^k)\}$. Algorithm 4 describes BDCA for problem (DCP).

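Algorithm 4 BDCA for problem (DCP)
Input: initial point $x^0 \in \mathbb{R}^n_+$; tolerance for optimal value $\varepsilon_1 > 0$; tolerance for optimal solution $\varepsilon_2 > 0$;
Output: optimal solution $x^*$; optimal value $f^*$;
1: $k \leftarrow 0$; $\Delta f \leftarrow +\infty$; $\Delta x \leftarrow +\infty$;
2: while $\Delta f > \varepsilon_1$ or $\Delta x > \varepsilon_2$ do
3:   $x^{k+1} \in \operatorname{argmin}\{G(x) - \langle x, \nabla H(x^k)\rangle : x \in \Omega\}$;
4:   if $(x^k, x^{k+1}) \in \Omega^2$ and $A(x^{k+1}) \subset A(x^k)$ then
5:     $d \leftarrow x^{k+1} - x^k$;
6:     if $\langle \nabla f(x^{k+1}), d\rangle < 0$ then
7:       Armijo line search with initial $\alpha = \frac{\sqrt{2}}{\|d\|}$ to get $\hat{x}^{k+1}$ from $x^{k+1}$;
8:       $x^{k+1} \leftarrow \hat{x}^{k+1}$
9:     end if
10:   end if
11:   $f^* \leftarrow f(x^{k+1})$; $x^* \leftarrow x^{k+1}$;
12:   $\Delta f \leftarrow |f^* - f(x^k)|/(1 + |f^*|)$; $\Delta x \leftarrow \|x^* - x^k\|/(1 + \|x^*\|)$;
13:   $k \leftarrow k + 1$;
14: end while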

Theorem 9 (Convergence theorem of BDCA Algorithm 4) BDCA Algorithm 4 for problem (DCP) generates a sequence $\{x^k\}$ such that
• The sequence $\{f(x^k)\}$ is decreasing and bounded below.
• Every limit point of $\{x^k\}$ is a KKT point of problem (DCP).

Proof This theorem can be proved similarly to Theorem 3. All points in the chain $x^k \to x^{k+1} \to \hat{x}^{k+1}$ lie in the compact set $\Omega$ and are therefore bounded, and

$$-\infty < \min\{f(x) : x \in \Omega\} \leq f(\hat{x}^{k+1}) \leq f(x^{k+1}) \leq f(x^k)$$

implies that the new sequence $\{f(x^k)\}$ is decreasing and bounded below. Moreover, every limit point of $\{x^k\}$, denoted by $x^*$, verifies $x^* \in \operatorname{argmin}\{G(x) - \langle x, \nabla H(x^*)\rangle : x \in \Omega\}$ and the linearity constraint qualification holds. Thus, exactly as in the proof of Theorem 3, we conclude that $x^*$ verifies the KKT conditions of problem (DCP). □

7 Boosted-DCA for the universal DC decomposition

Using Boosted-DCA, we can also accelerate the DC algorithm based on the universal DC decomposition proposed in [35, 46]. The universal DC decomposition yields a DC program defined as:

$$\min\{f(x) = \bar{G}(x) - \bar{H}(x) : x \in \Omega\} \tag{54}$$

where $\bar{G}(x) = \frac{\eta}{2}\|x\|_2^2$ is a convex quadratic function on $\mathbb{R}^n$, and $\bar{H}(x) = \frac{\eta}{2}\|x\|_2^2 - f(x)$ is a differentiable convex function on $\Omega$. A suitable parameter $\eta$ can be computed using the formula in Proposition 2 of [46]:

$$\eta = 2 c_2 \|\Sigma\|_\infty + 6 c_3 \max_{1 \leq i \leq n} \Big(\sum_{j,k=1}^{n} |S_{i,j,k}|\Big) + 12 c_4 \max_{1 \leq i \leq n} \Big(\sum_{j,k,l=1}^{n} |K_{i,j,k,l}|\Big).$$

DCA applied to problem (54), namely Universal DCA (UDCA), needs to compute in each iteration the gradient of the differentiable function $\bar{H}$ by:

$$\nabla \bar{H}(x) = \eta x + c_1 \mu - 2 c_2 \Sigma x + c_3 \frac{\nabla^2 m_3(x)\, x}{2} - c_4 \frac{\nabla^2 m_4(x)\, x}{3}$$

where

$$\nabla^2 m_3(x) = 6 \Big(\sum_{k=1}^{n} S_{i,j,k}\, x_k\Big)_{(i,j) \in N^2}; \qquad \nabla^2 m_4(x) = 12 \Big(\sum_{k,l=1}^{n} K_{i,j,k,l}\, x_k x_l\Big)_{(i,j) \in N^2}.$$

Then, we have to solve the strictly convex quadratic optimization problem

$$x^{k+1} \in \operatorname{argmin}\Big\{\frac{\eta}{2}\|x\|_2^2 - \langle x, \nabla \bar{H}(x^k)\rangle : x \in \Omega\Big\} \tag{55}$$

which is equivalent to the projection of the vector $\nabla \bar{H}(x^k)/\eta$ onto the standard simplex $\Omega$. This problem can be solved very efficiently by a strongly polynomial algorithm, the Block Pivotal Principal Pivoting Algorithm (BPPPA) presented in [23, 46], or by the explicit direct projection method presented in [15, 19].

Introducing the Armijo line search in UDCA, we get a similar Boosted-UDCA, namely UBDCA, described in Algorithm 5. The only difference between Algorithms 4 and 5 is in line 3, where the convex polynomial optimization (42) in Algorithm 4 is replaced by the convex quadratic optimization (55) in Algorithm 5. So its convergence theorem is exactly the same.

Note that solving the convex quadratic optimization (55) using strongly polynomial explicit methods (such as BPPPA) is faster than solving the convex quartic polynomial optimization (42) using an interior point method. However, the DC-SOS decomposition may provide a better convex approximation of $f$, which leads to fewer iterations in both DCA and BDCA. Therefore, it is worth comparing the performances of DC algorithms based on the DC-SOS decomposition and on the universal DC decomposition.
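The equivalence between subproblem (55) and a simplex projection can be sketched in a few lines. The code below is an illustration under stated assumptions: it uses the classical sort-based projection onto the standard simplex rather than BPPPA or the specific methods of [15, 19], and the function names and sample vectors are hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {x : sum(x) = 1, x >= 0} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - 1.0) / j
        if uj - candidate > 0:
            theta = candidate   # keep the threshold of the last valid index
    return [max(vi - theta, 0.0) for vi in v]

def udca_step(grad_H_bar_at_xk, eta):
    """One UDCA iteration: solve (55) by projecting grad / eta onto the simplex."""
    return project_onto_simplex([gi / eta for gi in grad_H_bar_at_xk])

# Hypothetical data for illustration.
already_feasible = project_onto_simplex([0.2, 0.3, 0.5])
clipped = project_onto_simplex([2.0, 0.0])
x_next = udca_step([4.0, 0.0], eta=2.0)
```

Since the objective of (55) equals $\frac{\eta}{2}\|x - \nabla \bar{H}(x^k)/\eta\|_2^2$ up to a constant, minimizing it over $\Omega$ is exactly this projection. The sort makes the cost $O(n \log n)$ per iteration.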

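Algorithm 5 UBDCA for problem (54)
Input: initial point $x^0 \in \mathbb{R}^n_+$; tolerance for optimal value $\varepsilon_1 > 0$; tolerance for optimal solution $\varepsilon_2 > 0$;
Output: optimal solution $x^*$; optimal value $f^*$;
1: $k \leftarrow 0$; $\Delta f \leftarrow +\infty$; $\Delta x \leftarrow +\infty$;
2: while $\Delta f > \varepsilon_1$ or $\Delta x > \varepsilon_2$ do
3:   Compute $x^{k+1}$ by solving the convex quadratic program (55);
4:   if $(x^k, x^{k+1}) \in \Omega^2$ and $A(x^{k+1}) \subset A(x^k)$ then
5:     $d \leftarrow x^{k+1} - x^k$;
6:     if $\langle \nabla f(x^{k+1}), d\rangle < 0$ then
7:       Armijo line search with initial $\alpha = \frac{\sqrt{2}}{\|d\|}$ to get $\hat{x}^{k+1}$ from $x^{k+1}$;
8:       $x^{k+1} \leftarrow \hat{x}^{k+1}$
9:     end if
10:   end if
11:   $f^* \leftarrow f(x^{k+1})$; $x^* \leftarrow x^{k+1}$;
12:   $\Delta f \leftarrow |f^* - f(x^k)|/(1 + |f^*|)$; $\Delta x \leftarrow \|x^* - x^k\|/(1 + \|x^*\|)$;
13:   $k \leftarrow k + 1$;
14: end while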

8 Numerical simulation

8.1 Experimental setup

The numerical experiments are performed on a Dell Workstation equipped with 4 Intel i7-6820HQ (8 cores), 2.70GHz CPU and 32 GB RAM. Our DC algorithms (DCA and BDCA) are implemented in MATLAB R2019a as a package named MVSKOPT, based on a DC optimization toolbox (namely DCAM) and a multivariate polynomial matrix modeling toolbox (namely POLYLAB). The toolbox DCAM provides three main classes: the DC function class (dcfunc), the DC programming problem class (dcp), and the DCA class (dca), which can be used to model and solve a DC programming problem within a few lines of code. This toolkit is released as open-source code under the MIT license on Github [36]. The toolbox POLYLAB is also developed by us to build multivariate polynomials efficiently, and its code is released as well on Github at [38]. We welcome researchers from all over the world to test our codes extensively in their applications, and we appreciate their feedback, suggestions and contributions.

8.2 Data description

We use two datasets in our experiments.
• Synthetic datasets: the dataset is randomly generated by taking the number of assets $n$ in $\{4, 6, \ldots, 20\}$ and the number of periods $T$ as 30. For each $n$, we generate 3 models in which the investor's preference $c$ is randomly chosen with $c \geq 0$. The return rates $R_{i,t}$ are taken in $[-0.1, 0.4]$ for all $i \in \{1, \ldots, n\}$, $t \in \{1, \ldots, T\}$ using the MATLAB function rand. This dataset is used to test the performance of DCA and Boosted-DCA.
• Real datasets: we take the weekly real return rates of 1151 assets in Shanghai A shares ranging from January 2018 to December 2018 (51 weeks), downloaded from the CSMAR database http://www.gtarsc.com/. These data are used to analyze the optimal portfolios and plot the efficient frontier on a real stock market.

8.3 High-order moment computation

The input tensors of the four moments (mean, covariance, co-skewness and co-kurtosis) are computed using the formulations (1), (2), (3) and (4). The "curse of dimensionality" is a crucial problem in constructing MVSK models. Three important issues and the proposed improvements should be noted:

• Tensor sparsity issue: We have explained in our previous work [46] that the moments and co-moments are often non-zero, which yields a dense nonconvex quartic polynomial optimization problem for the MVSK model. Therefore, the number of monomials increases as fast as $O(\frac{n^4}{4!})$, since $\dim \mathbb{R}_4[x] = \binom{n+4}{4}$. This inherent difficulty makes it very time-consuming to generate dense high-order multivariate polynomials in MATLAB. Figure 3 shows the performances of different modeling tools (including POLYLAB [38], YALMIP [28], SOSTOOLS, MATLAB Symbolic Math Toolbox using sym, and MATLAB Optimization Toolbox using optimvar) for constructing MVSK models. Clearly, POLYLAB is much faster than the others, which is the reason why we use POLYLAB to model polynomial optimization problems. Nevertheless, as Figures 3(a) and 3(b) show, the modeling time of POLYLAB still grows very quickly. Based on this observation, we can predict with a fourth-order polynomial interpolation that generating an MVSK model with 50 variables could take about 1.59 hours for POLYLAB, 2.68 hours for SOSTOOLS, 4.72 hours for Sym, 7.48 hours for YALMIP, and 12.19 hours for optimvar. So the sparsity issue is one of the most important problems limiting the size of the MVSK model in practice. That is also the reason for developing POLYLAB as a by-product of this project.

(a) Number of assets v.s. cpu time (b) Number of assets v.s. log cpu time

Fig. 3 Performance of MVSK modelings using different modeling tools on MATLAB

• Computer memory issue: Based on the symmetry of the moment tensors, it is unnecessary to allocate computer memory for saving all high-order moments and co-moments. E.g., due to the size limitation of the allowed MATLAB array, the construction of an $n^4$ co-kurtosis tensor with $n = 300$ requires approximately 60.3GB of memory, which is intractable on our 32GB RAM testing device. In our previous work [46], we have tried using the Kronecker product and the MATLAB mex programming technique to compute a co-skewness (resp. co-kurtosis) tensor as an $n \times n^2$ (resp. $n \times n^3$) sparse matrix by retaining only the independent elements. Even so, saving huge amounts of moment data in memory is still very space-consuming. To overcome this difficulty, in this paper, we propose computing the moment and co-moment entries Just-In-Time (namely, the JIT technique) when they are needed, without saving them in memory at all. Moreover, these moments and co-moments need to be computed only once when constructing the polynomial objective function, and the resulting polynomial has at most $\binom{n+4}{4}$ monomials, which does not require a large amount of memory. Once the polynomial is constructed, we can evaluate its value at any given point very quickly. The JIT technique is particularly useful for overcoming the computer memory bottleneck and considerably improves the numerical performance in large-scale simulations.

• Gradient computing issue: Concerning the gradient computation required in DCA and in any first- and second-order method, it is computationally very expensive to derive the exact gradient when $n$ is large. Therefore, we suggest using a numerical gradient approximation when the exact gradient is not available. E.g., for a polynomial function $f : \mathbb{R}^n \to \mathbb{R}$, the partial derivatives $\partial_i f(x)$ can be approximated by

$$\partial_i f(x) \approx \frac{f(x + \delta e_i) - f(x - \delta e_i)}{2\delta}, \quad \forall i \in \{1, \ldots, n\},$$

where $e_i$ is the $i$-th unit vector of $\mathbb{R}^n$, and $\delta$ is a small positive real number (e.g., $\delta = 0.01$). Although providing exact derivatives is beneficial for evaluating $\nabla f(x)$ at any given point $x$, the numerical approximation of $\nabla f(x)$ is very helpful when exact derivatives are difficult to compute.
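The central difference formula of the gradient computing issue can be sketched as follows. This is a minimal Python illustration (the paper's experiments use MATLAB); the test polynomial and the evaluation point are hypothetical.

```python
def numerical_gradient(f, x, delta=0.01):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += delta      # x + delta * e_i
        x_minus[i] -= delta     # x - delta * e_i
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * delta))
    return grad

# Hypothetical quartic test polynomial: f(x) = x_0^4 + x_0 * x_1,
# whose exact gradient is (4 x_0^3 + x_1, x_0).
poly = lambda x: x[0] ** 4 + x[0] * x[1]
approx = numerical_gradient(poly, [1.0, 2.0])
exact = [4.0 * 1.0 ** 3 + 2.0, 1.0]
```

The approximation error of the central difference is of order $\delta^2$, and each gradient costs $2n$ evaluations of $f$, which is worth keeping in mind when $n$ is large.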

8.4 Numerical tests with synthetic datasets

8.4.1 Performance of DC algorithms

In this subsection, we will present the performance of our proposed four DC algorithms for MVSK model: DCA (Algorithm1) for DC-SOS decomposi- tion, BDCA (Algorithm4), UBDCA (Algorithm5), as well as our previously proposed UDCA in [46]. The polynomial convex optimization sub-problems re- quired in DCA and BDCA are solved by KNITRO [8] (an implementation of an interior-point-method for convex optimization), which seems to be the fastest solver comparing with MATLAB [32] fmincon, IPOPT [50] and CVX [16]; while the quadratic convex optimization sub-problems required in UDCA and UBDCA are solved by a strongly polynomial-time algorithm BPPPA, which can be easily implemented on MATLAB (see e.g., [46] for BPPPA). We test on the synthetic datasets with n ∈ {4, 6,..., 20} and T = 30. For each n, we generate three problems with different investor’s preferences (c = (10, 1, 10, 1) for risk-seeking, c = (1, 10, 1, 10) for risk-aversing, and c = (10, 10, 10, 10) for risk neutral). The initial point x0 for DCA and BDCA is randomly generated in {0, 1}n. The reason for using such an integer initial points is based on our observations that the optimal solutions of MVSK models are usually sparse, i.e., consist of many zero entries, and starting DCA and BDCA from sparse initial points seems to converge more frequently to sparse −6 −4 solutions as well. In DCA, the tolerances√ ε1 = 10 and ε2 = 10 . In Armijo 2 line search, the initial stepsize α = kdk , the reduction factor β = 0.5, the parameter σ = 10−3, and the stopping tolerance for line search ε = 10−8. Table1 summarizes some details of our tested synthetic models and their numerical results obtained by different DC algorithms (DCA, BDCA, UDCA and UBDCA). 
Some labels are explained as follows: labels for MVSK models include the number of assets (n), the number of periods (T), and the number of monomials (monos); labels for numerical results of DC algorithms include the number of iterations (iter), the solution time (time), and the objective value (obj). We plot the numerical results in Figure 4, in which the horizontal axis is the number of assets n, the vertical axis in Fig 4(a) is the average solution time of the three tested models (with different types of investor's preference) for the same number of assets n, and the vertical axis in Fig 4(b) is the logarithm of the average solution time. We can observe in Fig 4(a) that the fastest DC algorithm is UBDCA, followed by BDCA, UDCA, and DCA. Based on the average CPU time in Table 1, we estimate that UBDCA is about 3.6 times faster than BDCA, 8 times faster than UDCA and 12 times faster than DCA. Fig 4(b) shows that the solution time of all DC algorithms seems to increase exponentially with respect to the number of assets n, since the logarithm of the average solution time increases almost linearly with respect to the number of assets. Moreover, the boosted DC algorithms (BDCA and UBDCA) require fewer iterations on average and converge faster than the classical DC algorithms (DCA and UDCA).
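The link between a linear trend of the log-time and exponential growth can be made concrete with a short least-squares fit. The sketch below is illustrative only: the timings are HYPOTHETICAL values generated from an exact exponential law and are not the measurements of Table 1; with real data, `avg_time` would be replaced by the measured averages.

```python
import numpy as np

n_assets = np.arange(4, 21, 2)              # n = 4, 6, ..., 20
avg_time = 0.01 * np.exp(0.3 * n_assets)    # HYPOTHETICAL timings, not Table 1 data

# Fit log(time) = intercept + slope * n; a good linear fit indicates
# exponential growth time ~ exp(intercept) * exp(slope)^n.
slope, intercept = np.polyfit(n_assets, np.log(avg_time), 1)
growth_per_asset = np.exp(slope)            # multiplicative time factor per extra asset
```

Comparing the fitted slopes of two algorithms indicates whether one merely has a smaller constant factor or actually scales better with n.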

(a) n v.s. average solution time (b) n v.s. log average solution time

Fig. 4 Performance of DCA, BDCA, UDCA and UBDCA for different numbers of assets n with parameters $\varepsilon_1 = 10^{-6}$, $\varepsilon_2 = 10^{-4}$, $\alpha = \sqrt{2}/\|d\|$, $\beta = 0.5$, $\sigma = 10^{-3}$, and $\varepsilon = 10^{-8}$

We conclude that the boosted-DCA indeed accelerates the convergence of the classical DCA. Concerning the quality of the computed solutions, we plot in Fig. 5 the objective values obtained by the different DC algorithms minus the objective values obtained by BDCA. Apparently, BDCA often obtains smaller objective values than the other three DC algorithms, but the differences are very small, of order $O(10^{-4})$. Moreover, DC algorithms based on DC-SOS decomposition (DCA and BDCA) often provide smaller objective values than those based on universal DC decomposition (UDCA and UBDCA). We believe that this is a benefit of the DC-SOS decomposition technique: the number of iterations using DC-SOS decomposition is always smaller than using universal DC decomposition, which suggests that the DC-SOS decomposition can indeed provide a better convex overestimator of the original polynomial function. Thus DC-SOS is a promising approach, as expected, to provide high quality DC decompositions for polynomials.

Fig. 5 Differences between the objective values of four DC algorithms and the objective values of BDCA with respect to different MVSK models

Note that the DC-SOS decomposition produces a high-order convex polynomial optimization sub-problem, which usually takes more time to solve than the convex quadratic optimization problem required by the universal DC decomposition. Therefore, a fast convex polynomial optimization solver dealing with a CSOS polynomial objective function and linear constraints is extremely important to further improve the performance of DCA and BDCA, and thus deserves more attention in future research.

8.4.2 Comparison with other methods

In our previous work [46], we reported the numerical performance of several existing solvers, Gloptipoly, LINGO and fmincon (SQP and Trust Region algorithms), for MVSK models. The interested reader can refer to [46] for more details. In this paper, we compare our boosted DC algorithms (BDCA and UBDCA) with other existing solvers (KNITRO [8], FILTERSD [13], IPOPT [50] and MATLAB FMINCON [32]), which may represent the state of the art of local optimization solvers for large-scale problems. Note that all of these methods use POLYLAB as the polynomial modeling tool in our tests. These solvers are briefly introduced as follows:
▷ KNITRO, short for "Nonlinear Interior point Trust Region Optimization", is a commercial software package for solving large scale nonlinear mathematical optimization problems. It was co-created by Richard Waltz, Jorge Nocedal, Todd Plantenga and Richard Byrd in 2001 at Northwestern University (through Ziena Optimization LLC), and has since been continually improved by developers at Artelys. Knitro offers four different optimization algorithms for solving optimization problems. Two algorithms are of the interior point type, and two are of the active set type. It is also a solver for linear and nonlinear mixed-integer optimization models. Knitro supports a variety of programming languages (e.g., MATLAB, C/C++, C#, Java, Python, and Fortran), and modeling languages (e.g., AIMMS, AMPL, GAMS, and MPL).
▷ IPOPT, short for "Interior Point OPTimizer", is a software library for large scale nonlinear optimization of continuous systems. IPOPT is part of the COIN-OR project, originally developed by Andreas Wächter and Lorenz T. Biegler of the Department of Chemical Engineering at Carnegie Mellon University. It is written in Fortran, C and C++ and is released under the EPL. IPOPT implements a primal-dual interior point method, and uses line searches based on filter methods (Fletcher and Leyffer). IPOPT can be called from various modeling environments, and a MATLAB interface is available.
▷ FILTERSD is a software package for solving Nonlinear Programming Problems and Linearly Constrained Problems in continuous optimization. FILTERSD is written in Fortran 77 by Roger Fletcher and released as part of COIN-OR projects (the COIN project leader is Frank E. Curtis). FILTERSD uses a generalization of Robinson's method, globalised by using a filter and trust region. The MATLAB interface is provided through OPTI Toolbox.
▷ FMINCON is a MATLAB-based optimization solver for constrained nonlinear optimization problems. It is a component of the MATLAB Optimization Toolbox. We test the interior-point algorithm implemented in fmincon.

Numerical results of KNITRO, FILTERSD, IPOPT and FMINCON with default parameter settings are reported in Table 2. The comparisons of these solvers with boosted-DCA (BDCA and UBDCA) are shown in Fig. 6, in which the number of assets versus the logarithm of the average solution time of the different methods is given in Fig 6(a), while the differences between the objective values of all solvers and the objective values of KNITRO for all tested models are given in Fig 6(b). Apparently, KNITRO often provides the best numerical solutions within the shortest solution time. It is worth noting that, except for fmincon, all other solvers are developed in either C/C++ or Fortran and are invoked through MATLAB interfaces, thus they all perform quite fast. We observe that although BDCA and UBDCA are developed in MATLAB, they are still comparable with these best-performing solvers.

(a) n v.s. log average solution time (b) models v.s. objective values

Fig. 6 Comparisons among KNITRO, FILTERSD, IPOPT, FMINCON, and Boosted-DCA (BDCA and UBDCA)

8.5 Numerical tests with real datasets

In this subsection, we are interested in the shape of the portfolio efficient frontier. We plot efficient frontiers for optimal portfolios provided by MVSK models for different types of investors (risk-neutral, risk-seeking and risk-averse). We use real datasets of Shanghai A shares, and randomly select 10 potentially 'good' candidates among 1151 assets, which are selected based on their positive average returns within 51 weeks. Among these candidates, we establish MVSK models with desired expected return $m_1$ varying from 0 to 0.4 in steps of 0.001. These models are of the form

\[ \min\{ c_2 m_2(x) - c_3 m_3(x) + c_4 m_4(x) : x \in \Omega, \ m_1(x) = r_k \}, \]

where $r_k$ is the desired expected return in {0, 0.001, 0.002, ..., 0.4}, and the investor's preference c is randomly chosen as follows:

• for a risk-neutral investor, we take $c_i \in [20, 22]$, $\forall i \in \{1, \ldots, 4\}$;
• for a risk-averse investor, we take $(c_2, c_4) \in [20, 22]^2$ and $(c_1, c_3) \in [1, 3]^2$;
• for a risk-seeking investor, we take $(c_1, c_3) \in [20, 22]^2$ and $(c_2, c_4) \in [1, 3]^2$.

Then we use BDCA to solve these models for each type of investor, and obtain optimal portfolios to generate efficient frontiers. Fig. 7 illustrates efficient frontiers for the three types of investors. Fig 7(a) presents the classical Mean-Variance efficient frontiers. As expected, these frontiers resemble portions of hyperbolas. Fig 7(b), 7(c) and 7(d) represent high-dimensional efficient frontiers (Mean-Variance-Skewness frontiers, Mean-Variance-Kurtosis frontiers, and Mean-Skewness-Kurtosis frontiers). We observe that the shapes of the efficient frontiers for the three types of investors are quite different from each other. The risk-averse efficient frontier has lower risk and lower expected returns; the risk-seeking efficient frontier has higher risk and higher expected returns; while the risk-neutral efficient frontier lies between them. Moreover, the skewness and kurtosis of the optimal portfolios are both increasing when the mean (return) and variance (risk) are large enough, which indicates that a higher mean-variance optimal portfolio will have a higher probability of gains, but also more uncertainty of returns. For more insights on the shapes of high-dimensional efficient frontiers, the reader can refer to [10, 11].
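The experimental setup described above (return grid, randomly drawn preferences, and the moment-based objective) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the moments are assumed to be sample central moments normalized by T, the preference intervals are sampled uniformly, and all function names (`portfolio_moments`, `sample_preference`, `objective`) are hypothetical.

```python
import numpy as np

def portfolio_moments(R, x):
    """Sample mean and central moments of order 2, 3, 4 of portfolio returns R @ x.

    R is a T-by-n matrix of asset returns (assumed layout), x the portfolio weights.
    """
    p = R @ x
    mu = p.mean()
    c = p - mu
    return mu, (c**2).mean(), (c**3).mean(), (c**4).mean()

def sample_preference(kind, rng):
    """Draw c = (c1, c2, c3, c4) uniformly from the intervals listed above."""
    high = lambda k: rng.uniform(20.0, 22.0, size=k)
    low = lambda k: rng.uniform(1.0, 3.0, size=k)
    c = np.empty(4)
    if kind == "neutral":
        c[:] = high(4)
    elif kind == "averse":              # large weights on variance and kurtosis
        c[[1, 3]] = high(2)
        c[[0, 2]] = low(2)
    elif kind == "seeking":             # large weights on mean and skewness
        c[[0, 2]] = high(2)
        c[[1, 3]] = low(2)
    else:
        raise ValueError("unknown investor type")
    return c

def objective(R, x, c):
    """Objective c2*m2(x) - c3*m3(x) + c4*m4(x) of the constrained model."""
    _, m2, m3, m4 = portfolio_moments(R, x)
    return c[1] * m2 - c[2] * m3 + c[3] * m4

r_grid = np.linspace(0.0, 0.4, 401)     # desired returns 0, 0.001, ..., 0.4
```

Each value in `r_grid` defines one constrained model; solving all 401 models for a fixed preference vector traces one efficient frontier.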

9 Conclusion and perspectives

In this paper, we proposed several DC programming approaches for solving the high-order moment portfolio optimization problem (mean-variance-skewness-kurtosis model). We first reformulated the MVSK model as a DC programming problem based on DC-SOS decomposition, then applied DCA to find a local optimal solution. We also proposed an acceleration technique for DCA based on an Armijo-type line search, namely Boosted-DCA. This technique can be used for smooth and non-smooth DC programs with convex constraints. Then, we applied Boosted-DCA to the MVSK model, which yields respectively BDCA (Boosted-DCA with DC-SOS decomposition) and UBDCA (Boosted-DCA with universal DC decomposition). Numerical simulations comparing DCA, BDCA, UBDCA and UDCA (a classical DC algorithm based on universal DC decomposition proposed in [46]) demonstrated that the boosted-DCA (BDCA and UBDCA) indeed accelerates the convergence of the classical DCA (DCA and UDCA). Moreover, numerical results also showed that DC-SOS is a promising approach, as expected, to provide high quality DC decompositions for polynomials. Our future work is varied. Firstly, a faster convex optimization solver than KNITRO for convex polynomial optimization based on DC-SOS over linear constraints is extremely important to the performance of DCA and BDCA. It deserves more attention on developing such a fast solution method

(a) (b)

(c) (d)

Fig. 7 High-dimensional portfolio efficient frontiers for risk-neutral, risk-averse and risk-seeking optimal portfolios

based on the specific structure of the optimization problem. One possible idea is to reformulate the problem, based on the DC-SOS structure, as a convex quadratic program with convex quadratic constraints by introducing additional variables to replace all SOS terms. Then, we can use various more efficient convex quadratic optimization solvers or second-order cone programming solvers (e.g., CPLEX, GUROBI and Mosek) instead of general nonlinear optimization solvers (KNITRO and IPOPT). Secondly, we do not really need to find a minimizer of the convex optimization subproblem in DCA; it suffices to find a better feasible solution $x^{k+1} \in \Omega$ such that $f(x^{k+1}) < f(x^k)$. Therefore, it is possible to use a descent algorithm, such as the Wolfe algorithm, to search for a better feasible point $x^{k+1}$ from $x^k$ without entirely solving the convex optimization subproblem in DCA. This idea, namely the partial solution strategy, has proved to be useful for achieving faster convergence in large-scale DC optimization problems (see e.g., [40]). Research in these directions will be reported subsequently.

Acknowledgements The authors are supported by the National Natural Science Foundation of China (Grant 11601327).

Appendix

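Proof of Theorem 8 in Subsection 5.1.2: Let g be differentiable convex, and let h and u be non-differentiable convex. Then

\[
\begin{aligned}
f'(x^{k+1}; d) &= \lim_{t \downarrow 0} \frac{f(x^{k+1} + td) - f(x^{k+1})}{t} \\
&= \lim_{t \downarrow 0} \frac{g(x^{k+1} + td) - g(x^{k+1})}{t} - \lim_{t \downarrow 0} \frac{h(x^{k+1} + td) - h(x^{k+1})}{t} \\
&= \langle \nabla g(x^{k+1}), d \rangle - h'(x^{k+1}; d).
\end{aligned} \tag{56}
\]

Since h is non-differentiable and convex, we have

\[ \partial h(x^{k+1}) = \{ v : h(x^{k+1} + s) \ge h(x^{k+1}) + \langle v, s \rangle, \ \forall s \}. \]

Taking s = td, t > 0, we have, for all $v \in \partial h(x^{k+1})$,

\[ \langle v, d \rangle \le \frac{h(x^{k+1} + td) - h(x^{k+1})}{t} \xrightarrow{t \downarrow 0} h'(x^{k+1}; d). \tag{57} \]

Equations (56) and (57) imply

\[ f'(x^{k+1}; d) \le \langle \nabla g(x^{k+1}) - v, d \rangle, \quad \forall v \in \partial h(x^{k+1}). \tag{58} \]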

Now, DCA for problem (P) yields the next convex optimization subproblem

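\[ x^{k+1} \in \operatorname{argmin}\{ g(x) - \langle u, x \rangle : x \in C \} \]

where $u \in \partial h(x^k)$. This implies

\[ 0 \in \nabla g(x^{k+1}) - u + \partial \chi_C(x^{k+1}). \]

For any nonempty closed convex set C, it is easy to derive that

\[ \partial \chi_C(x^{k+1}) = N_C(x^{k+1}) \]

where $N_C(x^{k+1})$ is the normal cone of C at $x^{k+1} \in C$. It follows that

\[ u - \nabla g(x^{k+1}) \in N_C(x^{k+1}). \]

Thus, by definition of the normal cone, we have

\[ \langle u - \nabla g(x^{k+1}), x^{k+1} - x \rangle \ge 0, \quad \forall x \in C. \]

This inequality holds in particular for $x = x^k \in C$, so we obtain

\[ \langle u - \nabla g(x^{k+1}), x^{k+1} - x^k \rangle \ge 0. \tag{59} \]

Combining (58), (59) and $d = x^{k+1} - x^k$, we get

\[
\begin{aligned}
f'(x^{k+1}; d) &\le \langle \nabla g(x^{k+1}) - v, d \rangle = \langle \nabla g(x^{k+1}) - u + u - v, d \rangle \\
&= \langle \nabla g(x^{k+1}) - u, d \rangle + \langle u - v, d \rangle \le \langle u - v, d \rangle.
\end{aligned} \tag{60}
\]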

Now, we can derive the results for non-differentiable h and u as follows:

▷ If h is non-differentiable convex: by the monotonicity of ∂h, we have

\[ \langle u - v, d \rangle = \langle u - v, x^{k+1} - x^k \rangle \le 0, \quad \forall v \in \partial h(x^{k+1}), \ \forall u \in \partial h(x^k). \tag{61} \]

It follows from (60) and (61) that

\[ f'(x^{k+1}; d) \le 0. \]

▷ If h is non-differentiable ρ-strongly convex: we have

\[ \langle u - v, d \rangle = \langle u - v, x^{k+1} - x^k \rangle \le -\rho \|d\|^2, \quad \forall v \in \partial h(x^{k+1}), \ \forall u \in \partial h(x^k). \tag{62} \]

It follows from (60) and (62) that

\[ f'(x^{k+1}; d) \le -\rho \|d\|^2. \]

▷ If $u_i$ is non-differentiable convex: we have that if d is a feasible direction at $x^{k+1}$, then $A(x^{k+1}) \subset A(x^k)$. The demonstration is the same as in Theorem 6, and we only need to replace the term $\langle \nabla u_i(x^{k+1}), d \rangle$ by $u_i'(x^{k+1}; d)$ for non-differentiable $u_i$. □
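The descent property $f'(x^{k+1}; d) \le 0$ established in the proof can be checked numerically on a small example. The Python sketch below uses a toy DC function chosen purely for illustration (it is not the MVSK model): $g(x) = \frac{1}{2} x^\top Q x$ with Q positive definite, $h(x) = \|x\|_1$ (non-differentiable convex), and $C = \mathbb{R}^n$, so that the DCA subproblem has the closed-form solution $x^{k+1} = Q^{-1} u$. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite, so g is convex

def dca_step(x_k):
    u = np.sign(x_k)                 # a subgradient of h = ||.||_1 at x_k
    return np.linalg.solve(Q, u)     # minimizer of g(x) - <u, x> over R^n

def directional_derivative(x, d):
    """f'(x; d) = <grad g(x), d> - h'(x; d), with h'(x; d) for the l1-norm."""
    nonzero = x != 0
    h_dir = np.sum(np.sign(x[nonzero]) * d[nonzero]) + np.sum(np.abs(d[~nonzero]))
    return float((Q @ x) @ d - h_dir)

x_k = rng.standard_normal(n)
values = []
for _ in range(10):
    x_next = dca_step(x_k)
    d = x_next - x_k                 # DC descent direction d = x^{k+1} - x^k
    values.append(directional_derivative(x_next, d))
    x_k = x_next
```

Every entry of `values` should be nonpositive up to rounding error, in agreement with (60) and (61); once DCA has converged, d = 0 and the directional derivative vanishes.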

References

1. Ahmadi, A.A., Olshevsky, A., Parrilo, P.A., Tsitsiklis, J.N.: NP-hardness of deciding convexity of quartic polynomials and related problems. Mathematical Programming 137(1-2), 453–476 (2013)
2. Aragón Artacho, F.J., Fleming, R.M., Vuong, P.T.: Accelerating the dc algorithm for smooth functions. Mathematical Programming 169(1), 95–118 (2018)
3. Aragón Artacho, F.J., Vuong, P.T.: The boosted dc algorithm for nonsmooth functions. arXiv preprint arXiv:1812.06070 (2018)
4. Arditti, F.D., Levy, H.: Portfolio efficiency analysis in three moments: the multiperiod case. In: Financial Dec Making Under Uncertainty, pp. 137–150. Elsevier (1977)
5. Ban, G.Y., El Karoui, N., Lim, A.E.: Machine learning and portfolio optimization. Management Science 64(3), 1136–1154 (2016)
6. Bertsekas, D.P.: Nonlinear Programming: 2nd Edition. Athena (1999)
7. Bhandari, R., Das, S.R.: Options on portfolios with higher-order moments. Finance Research Letters 6(3), 122–129 (2009)
8. Byrd, R.H., Nocedal, J., Waltz, R.A.: Knitro: An integrated package for nonlinear optimization. In: Large-scale nonlinear optimization, pp. 35–59. Springer (2006). URL https://www.artelys.com/docs/knitro
9. Chen, B., He, S., Li, Z., Zhang, S.: On new classes of nonnegative symmetric tensors. SIAM Journal on Optimization 27(1), 292–318 (2017)
10. De Athayde, G.M., Flôres, R.G.: Incorporating skewness and kurtosis in portfolio optimization: a multidimensional efficient set. In: Advances in portfolio construction and implementation, pp. 243–257. Elsevier (2003)
11. De Athayde, G.M., Flôres Jr, R.G.: Finding a maximum skewness portfolio - a general solution to three-moments portfolio choice. Journal of Economic Dynamics and Control 28(7), 1335–1352 (2004)
12. Fabozzi, F.J., Focardi, S.M., Kolm, P.N.: Financial modeling of the equity market: from CAPM to cointegration, vol. 146. John Wiley & Sons (2006)
13. Fletcher, R.: Filtersd – a library for nonlinear optimization written in fortran. URL https://projects.coin-or.org/filterSD
14. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex minimization problems. International Journal of Systems Science 12(8), 989–1000 (1981)
15. Gondran, M., Minoux, M.: Graphs and algorithms. Wiley (1984)
16. Grant, M., Boyd, S., Ye, Y.: Cvx: Matlab software for disciplined convex programming (2008). URL http://cvxr.com/cvx/
17. Harvey, C.R., Liechty, J.C., Liechty, M.W., Müller, P.: Portfolio selection with higher moments. Quantitative Finance 10(5), 469–485 (2010)
18. Harvey, C.R., Siddique, A.: Conditional skewness in asset pricing tests. The Journal of Finance 55(3), 1263–1295 (2000)
19. Held, M., Wolfe, P., Crowder, H.P.: Validation of subgradient optimization. Mathematical programming 6(1), 62–88 (1974)
20. Heston, S.L.: A closed-form solution for options with stochastic volatility with applications to bond and currency options. The review of financial studies 6(2), 327–343 (1993)
21. Jean, W.H.: The extension of portfolio analysis to three or more parameters. Journal of financial and Quantitative Analysis 6(1), 505–515 (1971)
22. Jondeau, E., Rockinger, M.: Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. Journal of Economic dynamics and Control 27(10), 1699–1737 (2003)
23. Júdice, J., Pires, F.: Solution of large-scale separable strictly convex quadratic programs on the simplex. Linear Algebra and its applications 170, 214–220 (1992)
24. Kolm, P.N., Tütüncü, R., Fabozzi, F.J.: 60 years of portfolio optimization: Practical challenges and current trends. European Journal of Operational Research 234(2), 356–371 (2014)
25. Le Thi, H.A., Pham, D.T.: Large-scale molecular optimization from distance matrices by a dc optimization approach. SIAM Journal on Optimization 14(1), 77–114 (2003)
26. Le Thi, H.A., Pham, D.T.: Dc programming and dca: thirty years of developments. Mathematical Programming 169(1), 5–68 (2018)
27. Levy, H., Markowitz, H.M.: Approximating expected utility by a function of mean and variance. American Economic Review 69(69), 308–317 (1979)
28. Lofberg, J.: Yalmip: A toolbox for modeling and optimization in matlab. In: 2004 IEEE international conference on robotics and automation (IEEE Cat. No. 04CH37508), pp. 284–289. IEEE (2004). URL https://yalmip.github.io/
29. Maringer, D., Parpas, P.: Global optimization of higher order moments in portfolio selection. Journal of Global Optimization 43(2-3), 219–230 (2009)
30. Markowitz, H.: Portfolio selection. Journal of Finance 7(1), 77–91 (1952)
31. Markowitz, H.: Mean–variance approximations to expected utility. European Journal of Operational Research 234(2), 346–355 (2014)
32. MathWorks: Matlab documentation r2019a (2019). URL https://www.mathworks.com/help/matlab/
33. Merton, R.C.: Option pricing when underlying stock returns are discontinuous. Journal of financial economics 3(1-2), 125–144 (1976)
34. Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. Journal of Optimization Theory and Applications 33(1), 9–23 (1981)
35. Niu, Y.S.: Programmation dc et dca en optimisation combinatoire et optimisation polynomiale via les techniques de sdp: codes et simulations numériques. Ph.D. thesis, Rouen, INSA (2010)
36. Niu, Y.S.: DCAM – a matlab modeling and optimization toolbox for dc program (2016). URL https://github.com/niuyishuai/DCAM
37. Niu, Y.S.: On difference-of-sos and difference-of-convex-sos decompositions for polynomials. arXiv preprint arXiv:1803.09900 (2018)
38. Niu, Y.S.: POLYLAB – a matlab multivariate polynomial toolbox (2019). URL https://github.com/niuyishuai/Polylab
39. Niu, Y.S., Júdice, J., Le Thi, H.A., Pham, D.T.: Improved dc programming approaches for solving the quadratic eigenvalue complementarity problem. Applied Mathematics and Computation 353, 95–113 (2019)
40. Niu, Y.S., Pham, D.T.: Dc programming approaches for bmi and qmi feasibility problems. In: Advanced Computational Methods for Knowledge Engineering, pp. 37–63. Springer (2014)
41. Parpas, P., Rustem, B.: Global optimization of the scenario generation and portfolio selection problems. In: International Conference on Computational Science and Its Applications, pp. 908–917. Springer (2006)
42. Pham, D.T., Le Thi, H.A.: Convex analysis approach to dc programming: Theory, algorithms and applications. Acta mathematica vietnamica 22(1), 289–355 (1997)
43. Pham, D.T., Le Thi, H.A.: A dc optimization algorithm for solving the trust-region subproblem. SIAM Journal on Optimization 8(2), 476–505 (1998)
44. Pham, D.T., Le Thi, H.A.: The dc (difference of convex functions) programming and dca revisited with dc models of real world nonconvex optimization problems. Annals of operations research 133(1-4), 23–46 (2005)
45. Pham, D.T., Le Thi, H.A., Pham, V.N., Niu, Y.S.: Dc programming approaches for discrete portfolio optimization under concave transaction costs. Optimization letters 10(2), 261–282 (2016)
46. Pham, D.T., Niu, Y.S.: An efficient dc programming approach for portfolio decision with higher moments. Computational Optimization and Applications 50(3), 525–554 (2011)
47. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
48. Scott, R.C., Horvath, P.A.: On the direction of preference for moments of higher order than the variance. Journal of Finance 35(4), 915–919 (1980)
49. Steinbach, M.C.: Markowitz revisited: Mean-variance models in financial portfolio analysis. Siam Review 43(1), 31–85 (2001)
50. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming 106(1), 25–57 (2006). URL https://github.com/coin-or/Ipopt

Table 1 Numerical results of DCA, BDCA, UDCA and UBDCA with parameters $\varepsilon_1 = 10^{-6}$, $\varepsilon_2 = 10^{-4}$, $\alpha = \sqrt{2}/\|d\|$, $\beta = 0.5$, $\sigma = 10^{-3}$, and $\varepsilon = 10^{-8}$

[Table body: one row per tested model (n ∈ {4, 6, ..., 20}, three models per n, T = 30, monos ranging from 69 to 10625), with columns n, T, monos, and iter, time(sec.), obj for each of DCA, BDCA, UDCA and UBDCA, followed by a row of averages.]

Table 2 Numerical results of KNITRO, FILTERSD, IPOPT and FMINCON with default parameters

                    KNITRO                  FILTERSD                IPOPT                   FMINCON
 n   T   monos   time(sec.)  obj         time(sec.)  obj         time(sec.)  obj         time(sec.)  obj
 4  30      69      0.01  -1.980e+00        0.01  -1.980e+00        0.01  -1.980e+00        0.01  -1.980e+00
 4  30      69      0.01  -9.947e-02        0.01  -9.947e-02        0.01  -9.947e-02        0.01  -9.947e-02
 4  30      69      0.01  -1.678e+00        0.01  -1.678e+00        0.01  -1.678e+00        0.01  -1.678e+00
 6  30     209      0.01  -1.813e+00        0.01  -1.813e+00        0.02  -1.813e+00        0.02  -1.813e+00
 6  30     209      0.01  -1.267e-01        0.01  -1.267e-01        0.01  -1.267e-01        0.02  -1.267e-01
 6  30     209      0.01  -1.529e+00        0.01  -1.529e+00        0.01  -1.529e+00        0.02  -1.529e+00
 8  30     494      0.01  -1.848e+00        0.01  -1.848e+00        0.02  -1.848e+00        0.02  -1.848e+00
 8  30     494      0.01  -1.528e-01        0.02  -1.528e-01        0.02  -1.528e-01        0.03  -1.528e-01
 8  30     494      0.01  -1.700e+00        0.01  -1.700e+00        0.02  -1.700e+00        0.02  -1.700e+00
10  30    1000      0.02  -1.800e+00        0.01  -1.800e+00        0.02  -1.800e+00        0.03  -1.800e+00
10  30    1000      0.02  -1.473e-01        0.05  -1.473e-01        0.04  -1.473e-01        0.04  -1.473e-01
10  30    1000      0.01  -1.823e+00        0.01  -1.823e+00        0.04  -1.823e+00        0.03  -1.823e+00
12  30    1819      0.02  -1.848e+00        0.01  -1.848e+00        0.03  -1.848e+00        0.04  -1.848e+00
12  30    1819      0.03  -1.510e-01        0.08  -1.510e-01        0.07  -1.510e-01        0.08  -1.510e-01
12  30    1819      0.02  -1.663e+00        0.02  -1.663e+00        0.03  -1.663e+00        0.04  -1.663e+00
14  30    3059      0.03  -2.277e+00        0.03  -2.277e+00        0.05  -2.277e+00        0.05  -2.277e+00
14  30    3059      0.05  -1.551e-01        0.18  -1.551e-01        0.09  -1.551e-01        0.11  -1.551e-01
14  30    3059      0.04  -1.844e+00        0.05  -1.844e+00        0.06  -1.844e+00        0.06  -1.844e+00
16  30    4843      0.04  -1.869e+00        0.02  -1.869e+00        0.08  -1.869e+00        0.06  -1.869e+00
16  30    4844      0.08  -1.593e-01        0.19  -1.593e-01        0.16  -1.593e-01        0.15  -1.593e-01
16  30    4844      0.07  -1.757e+00        0.13  -1.757e+00        0.13  -1.757e+00        0.11  -1.757e+00
18  30    7314      0.07  -2.236e+00        0.03  -2.236e+00        0.16  -2.236e+00        0.11  -2.236e+00
18  30    7313      0.15  -1.698e-01        0.39  -1.698e-01        0.26  -1.698e-01        0.23  -1.698e-01
18  30    7312      0.08  -2.057e+00        0.05  -2.057e+00        0.18  -2.057e+00        0.15  -2.057e+00
20  30   10625      0.11  -1.986e+00        0.08  -1.986e+00        0.24  -1.986e+00        0.17  -1.986e+00
20  30   10622      0.19  -1.672e-01        0.49  -1.672e-01        0.41  -1.672e-01        0.37  -1.672e-01
20  30   10623      0.14  -1.861e+00        0.23  -1.861e+00        0.25  -1.861e+00        0.20  -1.861e+00
average             0.05                    0.08                    0.09                    0.08