A Parallel Frontal Solver for Large Scale Process Simulation And

A Parallel Frontal Solver For Large Scale Pro cess Simulation and Optimization 1 1 y 1 2 z J. U. Mallya , S. E. Zitney , S. Choudhary , and M. A. Stadtherr 1 Cray Research, Inc., 655-E Lone Oak Drive, Eagan, MN 55121, USA. 2 Department of Chemical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA Revised Decemb er, 1996 Keywords: Simulation, Optimization, Parallel Computing, Sparse Matrices y current address: Asp enTech UK Ltd., Castle Park, Cambridge CB3 0AX, England z Author to whom all corresp ondence should b e addressed Abstract For the simulation and optimization of large-scale chemical pro cesses, the overall computing time is often dominated by the time needed to solve a large sparse system of linear equations. We present here a new parallel frontal solver which can signi cantly reduce the wallclo ck time required to solve these linear equation systems using parallel/vector sup ercomputers. The algorithm exploits b oth multipro cessing and vector pro cessing by using a multilevel approach in which frontal elimination is used for the partial factorization of each front. Results on several large scale pro cess simulation and optimization problems are presented. 1 Intro duction The solution of realistic, industrial-scale simulation and optimization problems is computationally very intense, and may require the use of high p erformance computing technology to be done in a timely manner. For example, Zitney et al. 1995 describ ed a dynamic simulation problem at Bayer AG requiring 18 hours of CPU time on a CRAY C90 sup ercomputer when solved with the standard implementation of SPEEDUP Asp en Technology, Inc.. To b etter use this leading edge technology in pro cess simulation and optimization requires the use of techniques that eciently exploit vector and parallel pro cessing. Since most current techniques were develop ed for use on conventional serial machines, it is often necessary to rethink problem solving strategies in order to take full advantage of sup ercomputing power. For example, by using a di erent linear equation solving algorithm and addressing other implementation issues, Zitney et al. 1995 reduced the time needed to solve the Bayer problem from 18 hours to 21 minutes. In the Bayer problem, as in most other industrial-scale problems, the solution of large, sparse systems of linear equations is the single most computationally intensive step, requiring over 80 of the total simulation time in some cases. Thus, any reduction in the linear system solution time will result in a signi cant reduction in the total simulation time for a given problem, as well as provide the p otential for solving much larger problems within a given time frame. The matrices that arise, however, do not have any of the desirable prop erties, such as numerical or structural symmetry, p ositive de niteness, and bandedness often asso ciated with sparse matrices, and usually exploited in developing ecient parallel/vector algorithms. Recently, an implementation of the frontal metho d Zitney, 1992; Zitney and Stadtherr, 1993; Zitney et al., 1995, develop ed at the University of Illi- nois and later extended at Cray Research, Inc., has b een describ ed that is designed sp eci cally for 1 use in the context of pro cess simulation. This solver FAMP has b een incorp orated in CRAY im- plementations of p opular commercial co des, such as ASPEN PLUS, SPEEDUP Asp en Technology, Inc., and NOVA Dynamic Optimization Technology Pro ducts, Inc.. FAMP is e ectiveonvector machines since most of the computations involved can be p erformed using eciently vectorized dense matrix kernels. However, this solver do es not well exploit the multipro cessing architecture of parallel/vector sup ercomputers. In this pap er we prop ose a new parallel/vector frontal solver PFAMP that exploits b oth the vector and parallel pro cessing architectures of mo dern sup ercomputers. Results demonstrate that the approach describ ed leads to signi cant reductions in the wallclo ck time required to solve the sparse linear systems arising in large scale pro cess simulation and optimization. 2 Background Consider the solution of a linear equation system Ax = b, where A is a large sparse n n matrix and x and b are column vectors of length n. While iterative metho ds can be used to solve such systems, the reliability of such metho ds is questionable in the context of pro cess simulation Cofer and Stadtherr, 1996. Thus we concentrate here on direct metho ds. Generally such metho ds can be interpreted as an LU factorization scheme in which A is factored A = LU , where L is a lower triangular matrix and U is an upp er triangular matrix. Thus, Ax =LU x = LUx=b, and the system can be solved by a simple forward substitution to solve Ly = b for y , followed by a back substitution to nd the solution vector x from Ux = y . The frontal elimination scheme used here is an LU factorization technique that was originally develop ed to solve the banded matrices arising in nite element problems Irons, 1970; Ho o d, 1976. 2 The original motivation was, by limiting computational work to a relatively small frontal matrix, to be able to solve problems on machines with small core memories. Today it is widely used for nite element problems on vector sup ercomputers b ecause, since the frontal matrix can b e treated as dense, most of the computations involved can be p erformed by using very ecient vectorized dense matrix kernels. Stadtherr and Vegeais 1985 extended this idea to the solution of pro cess simulation problems on sup ercomputers, and later Vegeais and Stadtherr, 1990 demonstrated its p otential. As noted ab ove, an implementation of the frontal metho d develop ed sp eci cally for use in the pro cess simulation context has b een describ ed by Zitney 1992, Zitney and Stadtherr 1993, and Zitney et al. 1995, and is now incorp orated in sup ercomputer versions of p opular pro cess simulation and optimization co des. The frontal elimination scheme can b e outlined brie y as follows: 1. Assemble a rowinto the frontal matrix. 2. Determine if any columns are fully summed in the frontal matrix. A column is fully summed if it has all of its nonzero elements in the frontal matrix. 3. If there are fully summed columns, then p erform partial pivoting in those columns, eliminating the pivot rows and columns and doing an outer-pro duct up date on the remaining part of the frontal matrix. This pro cedure b egins with the assembly of row 1 into the initially empty frontal matrix, and pro ceeds sequentially rowbyrowuntil all are eliminated, thus completing the LU factorization. To b e more precise, it is the LU factors of the p ermuted matrix PAQ that have b een found, where P is a row p ermutation matrix determined by the partial pivoting, and Q is a column p ermutation matrix determined by the order in which the columns b ecome fully summed. Thus the solution 3 T T to Ax = b is found as the solution to the equivalent system P AQQ x = LU Q x = Pb, which is solved by forward substitution to solve Ly = Pb for y , back substitution to solve Uw = y for w , and nally the p ermutation x = Qw . To simplify notation, the p ermutation matrices will henceforth not b e shown explicitly. k To see this in mathematical terms, consider the submatrix A remaining to b e factored after the k 1-th pivot: 3 2 k 7 6 0 F 7 6 k : 1 A = 7 6 5 4 k k A A ps ns k k Here F is the frontal matrix. The subscript ps in A indicates that it contains columns that ps k are partially summed some but not all nonzeros in the frontal matrix and the subscript ns in A ns indicates that it contains columns that are not summed no nonzeros in the frontal matrix. If a stage in the elimination pro cess has b een reached at which all remaining columns have nonzeros k and the corresp onding zero submatrix will not app ear in Eq. 1. in the frontal matrix, then A ns Assembly of rows into the frontal matrix then pro ceeds until g 1 columns b ecome fully summed: k 2 3 k k F F 0 6 7 11 12 6 7 6 7 6 7 k k k A = : 2 6 7 F F 0 21 22 6 7 6 7 4 5 k k A A 0 ps ns k k k F is now the frontal matrix and its submatrices F and F comprise the columns that have 11 21 b ecome fully summed, which are now eliminated using rows chosen during partial pivoting and k k k k which are shown as b elonging to F here. This amounts to the factorization F = L U of 11 11 11 11 4 k the order-g blo ck F , resulting in: k 11 2 3 k k k U 0 L U 6 7 12 11 11 6 7 6 7 6 7 k k k +g A = 3 k 6 7 L F 0 21 6 7 6 7 4 5 k +g k +g k k 0 A A ps ns k k k k +g k +g k k where the new frontal matrix F is the Schur complement F = F L U , whichis 22 21 12 k +g k k +g k k computed using an ecient full-matrix outer-pro duct up date kernel, A = A and A = ps ps ns k k k A .

Load more