Parallel Algebraic Modeling for Stochastic Optimization
Joey Huchette and Miles Lubin
Operations Research Center
Massachusetts Institute of Technology
Cambridge, MA 02139
Email: [email protected]
[email protected]

Cosmin G. Petra
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S. Cass Avenue, Argonne, IL 60439
Email: [email protected]

Preprint number ANL/MCS-P5181-0814

Abstract—We present scalable algebraic modeling software, StochJuMP, for stochastic optimization as applied to power grid economic dispatch. It enables the user to express the problem in a high-level algebraic format with minimal boilerplate. StochJuMP allows efficient parallel model instantiation across nodes and efficient data localization. Computational results are presented showing that the model construction is efficient, requiring less than one percent of solve time. StochJuMP is configured with the parallel interior-point solver PIPS-IPM but is sufficiently generic to allow straightforward adaptation to other solvers.

Keywords—optimization, parallel programming, high-performance computing, mathematical model, power system modeling, scalability

I. INTRODUCTION

Algebraic modeling languages (AMLs) for optimization are widely used by both academics and practitioners for specifying optimization problems in a human-readable, mathematical form, which is then automatically translated by the AML to the low-level formats required by efficient implementations of optimization algorithms, known as solvers. For example, a classical family of optimization problems is linear programming (LP), that is, minimization of a linear function of decision variables subject to linear equality and inequality constraints. As input, LP solvers expect vectors c, l, u, L, and U and a sparse constraint matrix A defining the LP problem

\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c^T x \\
\text{subject to} \quad & L \le Ax \le U, \\
& l \le x \le u,
\end{aligned}
\]

where the inequalities are componentwise.

In practice, it is tedious and error-prone to explicitly form a sparse constraint matrix and corresponding input vectors; this process often involves concatenating different classes of decision variables, each indexed over multidimensional symbolic sets or numeric ranges, into a single decision vector x. Instead, AMLs, which may be considered domain-specific languages in the vocabulary of computer science, handle these transformations automatically.

Notable AMLs include AMPL [1], GAMS [2], YALMIP [3], Pyomo [4], and CVX [5]. While commercial AMLs (AMPL and GAMS) offer standalone environments, open-source AMLs are typically embedded in high-level dynamic languages; CVX and YALMIP are both toolboxes for MATLAB, and Pyomo is a package for Python. Indeed, dynamic languages provide convenient platforms for AMLs, and for domain-specific languages in general, because they make development easier (no need for a customized parser) and promote ease of use by being accessible from a language that is already familiar to users and has its own tools for processing and formatting data.

Historically, speed has been a trade-off for using AMLs embedded in high-level dynamic languages, primarily because of their extensive use of operator overloading within an interpreted language. These AMLs may be orders of magnitude slower at generating the optimization model before passing it to a solver; in some reasonable use cases, this model generation time may perversely exceed the time spent in the solver.

The Julia programming language [6] presented an opportunity to address this performance gap. Lubin and Dunning developed JuMP [7], an AML embedded in Julia. By exploiting Julia's metaprogramming and just-in-time compilation functionality, they achieved performance competitive with that of commercial AMLs (AMPL and Gurobi/C++) and an improvement of one to two orders of magnitude over open-source AMLs (Pyomo and PuLP [8]) in model generation time for a benchmark of LP problems.
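As a concrete illustration (not drawn from the paper), the snippet below shows how a small LP of this kind is stated algebraically in JuMP, leaving the construction of the sparse constraint data to the AML. It uses current JuMP syntax rather than the macro names in use in 2014, and the open-source HiGHS solver is chosen purely as a stand-in.

```julia
# Minimal JuMP sketch: the LP is written algebraically; JuMP assembles the
# vectors c, l, u, L, U and the sparse matrix A that the solver expects.
using JuMP, HiGHS   # HiGHS is an illustrative choice of LP solver

products = ["a", "b"]                    # symbolic index set
profit   = Dict("a" => 3.0, "b" => 5.0)  # objective coefficients c
capacity = 10.0                          # a right-hand-side entry of U

model = Model(HiGHS.Optimizer)

# Decision variables indexed over the symbolic set, with bounds l <= x <= u
@variable(model, 0 <= x[p in products] <= 8)

# One row of L <= Ax <= U, written as an algebraic expression
@constraint(model, sum(x[p] for p in products) <= capacity)

@objective(model, Max, sum(profit[p] * x[p] for p in products))

optimize!(model)
println(value.(x))
```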
In this paper, we present an extension to JuMP, called StochJuMP, for modeling a class of stochastic optimization problems, namely, two-stage stochastic optimization with recourse, a popular paradigm for optimization under uncertainty [9]. These problems are computationally challenging and at the largest scale require the use of specialized solvers employing distributed-memory parallel computing. These solvers require structural information not provided by standard AMLs, and because of memory limits, it may not be feasible to build up the entire instance in serial and then distribute the pieces. Instead, StochJuMP generates the model in parallel according to the data distribution across processes required by the specialized solvers. We present numerical results demonstrating that StochJuMP can effectively generate these structured models in a small fraction (≈ 1.5%) of the solution time. To our knowledge, these results are among the first reports of employing Julia on a moderately sized MPI-based cluster.

II. BLOCK-ANGULAR STRUCTURE

The structured optimization problems considered in this paper are known in the optimization community as dual block angular problems, which means that they have a separable objective function and block angular, or half-arrow shaped, constraints. In particular, we consider dual block angular problems with quadratic objective function and affine constraints, which can be mathematically expressed as

\[
\begin{aligned}
\min \quad & \tfrac{1}{2} x_0^T Q_0 x_0 + c_0^T x_0 + \sum_{i=1}^{N} \Bigl( \tfrac{1}{2} x_i^T Q_i x_i + c_i^T x_i \Bigr) \\
\text{s.t.} \quad & A x_0 = b_0, \\
& T_1 x_0 + W_1 x_1 = b_1, \\
& T_2 x_0 + W_2 x_2 = b_2, \\
& \;\;\vdots \\
& T_N x_0 + W_N x_N = b_N, \\
& x_0 \ge 0, \; x_1 \ge 0, \; x_2 \ge 0, \; \ldots, \; x_N \ge 0.
\end{aligned}
\tag{1}
\]

The decision variables are the vectors x_i, i ∈ {0, 1, ..., N}. The data of the problem consist of the coefficient vectors c_i, right-hand sides b_i, symmetric matrices Q_i, and rectangular matrices A, T_i, W_i (i ∈ {1, ..., N}).
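To make the block structure concrete, the following is a minimal serial sketch (not the StochJuMP interface described in the paper) that builds the deterministic-equivalent form of problem (1) in current JuMP syntax, with made-up data and each Q_i taken as the identity. At scale, building this extensive form on a single process is exactly what becomes impractical, which is the limitation StochJuMP removes by generating the scenario blocks in parallel.

```julia
# Serial sketch of the extensive form of (1) with made-up data; for simplicity
# each T_i and W_i has a single row and each Q_i is the identity matrix.
using JuMP, LinearAlgebra, Random

Random.seed!(1)
N, n0, ni = 4, 3, 2               # scenarios, first-stage size, per-scenario size

c0 = rand(n0)                     # first-stage linear cost c_0
t  = [rand(n0) for _ in 1:N]      # the single row of each technology matrix T_i
w  = [rand(ni) for _ in 1:N]      # the single row of each recourse matrix W_i
c  = [rand(ni) for _ in 1:N]      # scenario costs c_i
b  = rand(N)                      # right-hand sides b_i

model = Model()                   # model generation only; no solver attached

@variable(model, x0[1:n0] >= 0)      # first-stage variables
@variable(model, x[1:N, 1:ni] >= 0)  # one block of recourse variables per scenario

@constraint(model, sum(x0) == 1.0)                                   # A x_0 = b_0
@constraint(model, [i = 1:N], t[i]' * x0 + w[i]' * x[i, :] == b[i])  # T_i x_0 + W_i x_i = b_i

# With Q_i = I, each quadratic term is 0.5 * ||x_i||^2
@objective(model, Min,
    0.5 * sum(x0 .^ 2) + c0' * x0 +
    sum(0.5 * sum(x[i, :] .^ 2) + c[i]' * x[i, :] for i in 1:N))

println(model)                    # inspect the generated block-structured model
```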
Problems of the form (1) arise in stochastic optimization with recourse. Stochastic programming problems with recourse (in (2) we show two-stage quadratic problems [10]) provide optimal decisions to be made now that minimize the expected cost in the presence of future uncertain conditions:

\[
\begin{aligned}
\min_{x_0} \quad & \tfrac{1}{2} x_0^T Q_0 x_0 + c_0^T x_0 + \mathbb{E}_{\xi}[G(x_0, \xi)] \\
\text{s.t.} \quad & A x_0 = b_0, \quad x_0 \ge 0.
\end{aligned}
\tag{2}
\]

The recourse function G(x_0, ξ) is defined by

\[
\begin{aligned}
G(x_0, \xi) = \min_{y} \quad & \tfrac{1}{2} y^T Q_\xi y + c_\xi^T y \\
\text{s.t.} \quad & T_\xi x_0 + W_\xi y = b_\xi, \quad y \ge 0,
\end{aligned}
\tag{3}
\]

and the expected value E[·], which is assumed to be well defined, is taken with respect to the random variable ξ, which contains the data (Q_ξ, c_ξ, T_ξ, W_ξ, b_ξ). Some of the elements in ξ can be deterministic. The matrix Q_ξ is assumed to be symmetric for all possible ξ. W_ξ, the recourse matrix, and T_ξ, the technology matrix, are random matrices. The symmetric matrix Q_0, the matrix A, and the linear coefficients c_0 are deterministic. The variable x_0 is called the first-stage decision, which is a decision to be made now. The second-stage decision y is a recourse or corrective decision that one makes in the future after some random event occurs.

The recourse problems (2) take the form of the dual block angular problems (1) when the probability distribution of ξ has finite support. When ξ does not have finite support, problems (1) arise from the use of the sample average approximation (SAA) method, in which a sample (ξ_1, ξ_2, ..., ξ_N) is used to estimate E_ξ[G(x_0, ξ)] ≈ (1/N) ∑_{i=1}^{N} G(x_0, ξ_i) and the minimization and expectation operators are commuted [11]. For the resulting problems to be convex, the matrices Q_i, i ∈ {0, 1, ..., N}, are required to be positive semidefinite. In addition, the interior-point algorithm used by the solver requires the matrices A, W_1, ..., W_N to have full row rank.

III. PIPS-IPM

Solving practically sized instances of (1) requires the use of memory-distributed computing environments and specialized optimization solvers capable of efficiently exploiting the dual block angular structure. One such solver, PIPS-IPM, developed in the past few years by some of the authors of this paper, achieves data parallelism inside the numerical Mehrotra predictor-corrector optimization algorithm [12] by distributing the data and computations required by the optimization across computational nodes. The data distribution is done by partitioning the scenarios and assigning a partition to a computational node. The first-stage data are replicated on all nodes to avoid communication overhead.

The computations follow a similar distribution scheme: the intrascenario computations (factorizations of sparse matrices, solves with the factors, matrix-matrix and matrix-vector products, and vector-vector operations) are performed in parallel, independently for each scenario, while the computations corresponding to the first stage (factorizations of dense matrices, solves with the factors, and matrix-vector and vector-vector operations) are replicated across all computational nodes [13].
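Purely as an illustration of this distribution scheme (PIPS-IPM itself is implemented in C++, and this is not its code), the sketch below shows a cyclic assignment of scenarios to MPI ranks in Julia using MPI.jl: each rank holds a copy of the first-stage data and touches only the scenario blocks it owns.

```julia
# Illustration of the scenario partitioning idea, not PIPS-IPM code.
using MPI

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)   # 0-based rank id
nprocs = MPI.Comm_size(comm)

N = 32                                                  # number of scenarios (made up)
owned = [i for i in 1:N if (i - 1) % nprocs == rank]    # cyclic partition of scenarios

# Every rank replicates the (small) first-stage data ...
first_stage = (A = rand(2, 3), b0 = rand(2))

# ... but generates and factors only the blocks it owns.
for i in owned
    T_i, W_i, b_i = rand(2, 3), rand(2, 2), rand(2)     # stand-in scenario data
    # intrascenario work (e.g., factorization involving W_i) would happen here
end

println("rank $rank owns scenarios $owned")
MPI.Finalize()
```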
Several algorithmic and implementation developments were targeted at reducing the time to solution and increasing the parallel efficiency of PIPS-IPM. In particular, the augmented incomplete factorization approach [14] considerably reduced the time to solution by making better use of the cores inside each node and by fully exploiting the sparsity of the scenario data. Additional implementation developments, such as hardware-tuned communication and the adoption of GPUs in the dense first-stage calculations [15], enabled real-time solutions of stochastic economic dispatch problems with very good parallel scaling on a variety of HPC platforms: Cray XK7, Cray XC30, IBM BG/P, and IBM BG/Q [15]. However, the algebraic specification and instantiation of the economic dispatch problems were done serially and required considerably longer time than the time spent in the optimization process. Removing this capability limitation, and being able to process and instantiate the optimization models in times that are negligible, that is, a few percent of total solution time, are the main motivations behind the present work.

IV. STOCHJUMP

StochJuMP is an extension built on top of the JuMP algebraic modeling language [7]. Extending AMLs to deal with structured problems such as multistage stochastic optimization presents additional challenges for expressing problem structure, both technically and conceptually.
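The StochJuMP interface itself is introduced in the remainder of this section (beyond this excerpt). Purely as an orientation, the sketch below shows the general shape such a model takes: a parent model holding the first-stage variables and one block per scenario holding the recourse variables and coupling constraints. The constructor and block names (StochasticModel, StochasticBlock) and all data are assumptions for illustration, not a verbatim excerpt of the package's API; the macros are the JuMP 0.x-era names contemporary with the paper.

```julia
# Orientation sketch only: StochasticModel/StochasticBlock and the data below
# are assumed names, not necessarily the actual StochJuMP API.
using StochJuMP            # assumed package name

NS = 8                     # number of scenarios (made up)
m  = StochasticModel(NS)   # parent model holding first-stage quantities

@defVar(m, x0[1:3] >= 0)                       # first-stage decisions
@addConstraint(m, x0[1] + x0[2] + x0[3] == 1)  # first-stage constraint A x_0 = b_0

for s in 1:NS
    bl = StochasticBlock(m)                    # per-scenario block tied to the parent
    @defVar(bl, y[1:2] >= 0)                   # recourse decisions for scenario s
    # coupling constraint T_s x_0 + W_s y = b_s, with made-up scalar data
    @addConstraint(bl, 0.5 * x0[1] + y[1] + y[2] == 1.0)
end
```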