1

Trilinos Advanced Capabilities, Extensibility and Future Directions Michael A. Heroux Sandia National Laboratories

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000. Contributors (past 3 years)

Chris Baker Mike Heroux Dennis Ridzal Developer of Anasazi, RBGen Trilinos Project Leader Lead Developer of Aristos, Intrepid Lead Developer of Epetra, AztecOO, Ross Bartlett Kokkos, IFPACK, Tpetra Marzio Sala Lead Developer of MOOCHO, Stratimikos, RTOp, Developer of Amesos, Belos, EpetraExt Lead Developer of Didasko, Galeri, IFPACK, WebTrilinos Thyra Developer of ML, Amesos Developer of Rythmos Ulrich Hetmaniuk Developer of Anasazi Andrew Salinger Pavel Bochev Lead Developer of LOCA, Capo Lead Developer Intrepid Robert Hoekstra Lead Developer of EpetraExt Paul Sexton Paul Boggs Developer of Epetra Developer of Epetra and Tpetra Developer of Thyra Russell Hooper Bob Shuttleworth Erik Boman Developer of NOX Developer of Meros. Lead Developer Isorropia Developer Zoltan Vicki Howle Chris Siefert Lead Developer of Meros Developer of ML Todd Coffey Lead Developer of Rythmos Jonathan Hu Bill Spotz Developer of ML Lead Developer of PyTrilinos Karen Devine Developer of Epetra, New_Package Lead Developer Zoltan Joe Kotulski Lead Developer of Pliris Ken Stanley Clark Dohrmann Lead Developer of Amesos and New_Package Lead Developer of CLAPS Rich Lehoucq Developer of Anasazi and Belos Heidi Thornquist Carter Edwards Lead Developer of Anasazi, Belos, RBGen and Teuchos Lead Developer phdMesh Kevin Long Developer of Thyra Ray Tuminaro Michael Gee Lead Developer of ML and Meros Developer of ML, Moertel, NOX Roger Pawlowski Lead Developer of NOX Jim Willenbring Bob Heaphy Developer of Epetra and New_Package. Lead developer of Trilinos SQA Michael Phenow Trilinos library manager Developer Zoltan Trilinos Webmaster Developer WebTrilinos Alan Williams Lead Developer Isorropia, FEI Eric Phipps Developer of Epetra, EpetraExt, AztecOO, Tpetra Lead developer Sacado Developer of LOCA, NOX 3 Take Home Messages

ƒ Trilinos is both: Š A development community Š A collection of software ƒ OO techniques lead to: Š Extensibility at many levels. Š Scalable infrastructure. Š Interoperability of independently developed capabilities. Š Ability to adjust to architecture changes. ƒ Project is growing: Š Including more of “vertical software stack”. Š Adapting to broader user base. ƒ We are seeking collaborations with broader DOE community. 4

Background/Motivation 5 Target Problems: PDES and more…

Circuits

PDES

Inhomogeneous Fluids

And More… 6 Target Platforms: Any and All (Now and in the Future) ƒ Desktop: Development and more… ƒ Capability machines: Š Redstorm (XT3), Clusters Š Roadrunner (Cell-based). Š Large-count multicore nodes. ƒ Parallel software environments: Š MPI of course. Š UPC, CAF, threads, vectors,… Š Combinations of the above. ƒ User “skins”: Š ++/C, Python Š . Š Web, CCA. 7 Motivation For Trilinos ƒ Sandia does LOTS of solver work. ƒ When I started at Sandia in May 1998: Š Aztec was a mature package. Used in many codes. Š FETI, PETSc, DSCPack, Spooles, ARPACK, DASPK, and many other codes were (and are) in use. Š New projects were underway or planned in multi-level preconditioners, eigensolvers, non-linear solvers, etc… ƒ The challenges: Š Little or no coordination was in place to: • Efficiently reuse existing solver technology. • Leverage new development across various projects. • Support solver software processes. • Provide consistent solver APIs for applications. Š ASCI (now ASC) was forming software quality assurance/engineering (SQA/SQE) requirements: • Daunting requirements for any single solver effort to address alone. 8 Evolving Trilinos Solution ƒ Trilinos1 is an evolving framework to address these challenges: Š Fundamental atomic unit is a package. Š Includes core set of vector, graph and matrix classes (Epetra/Tpetra packages). Š Provides a common abstract solver API (Thyra package). Š Provides a ready-made package infrastructure (new_package package): • Source code management (cvs, bonsai). • Build tools (autotools). • Automated regression testing (queue directories within repository). • Communication tools (mailman mail lists). Š Specifies requirements and suggested practices for package SQA. ƒ In general allows us to categorize efforts: Š Efforts best done at the Trilinos level (useful to most or all packages). Š Efforts best done at a package level (peculiar or important to a package). Š Allows package developers to focus only on things that are unique to their package.

1. Trilinos loose translation: “A string of pearls” 9 Evolving Trilinos Solution

physicsphysics ƒ Beyond a “solvers” framework ƒ Natural expansion of capabilities to satisfy application and research needs

L(u)=f L(u)=f discretizations methods Math.Math. model model Numerical math Automatic diff. Time domain Automatic diff. Convert to models that Time domain Domain dec. can be solved on digital Space domain Domain dec. Space domain Mortar methods computers Mortar methods L (u )=f Lhh(uhh)=fhh NumericalNumerical model model solvers core Algorithms LinearLinear PetraPetra Find faster and more NonlinearNonlinear UtilitiesUtilities efficient ways to solve Eigenvalues Interfaces -1 Eigenvalues Interfaces u =L -1••f numerical models Optimization Load Balancing uhh=Lhh fhh Optimization Load Balancing AlgorithmsAlgorithms

ƒ Discretization methods, AD, Mortar methods, … computationcomputation 10

Trilinos Package Concepts

Package: The Atomic Unit 11 Trilinos Packages ƒ Trilinos is a collection of Packages. ƒ Each package is: Š Focused on important, state-of-the-art algorithms in its problem regime. Š Developed by a small team of domain experts. Š Self-contained: No explicit dependencies on any other software packages (with some special exceptions). Š Configurable/buildable/documented on its own. ƒ Sample packages: NOX, AztecOO, ML, IFPACK, Meros. ƒ Special package collections: Š Petra (Epetra, Tpetra, Jpetra): Concrete Data Objects Š Thyra: Abstract Conceptual Interfaces Š Teuchos: Common Tools. Š New_package: Jumpstart prototype. Trilinos Package Summary

Objective Package(s) Meshing & Spatial Discretizations phdMesh, Intrepid Discretizations Time Integration Rythmos Automatic Differentiation Sacado Methods Mortar Methods Moertel Linear algebra objects Epetra, Jpetra, Tpetra Abstract interfaces Thyra, Stratimikos, RTOp Core Load Balancing Zoltan, Isorropia “Skins” PyTrilinos, WebTrilinos, Star-P, ForTrilinos C++ utilities, (some) I/O Teuchos, EpetraExt, Kokkos, Triutils Iterative (Krylov) linear solvers AztecOO, Belos, Komplex Direct sparse linear solvers Amesos Direct dense linear solvers Epetra, Teuchos, Pliris Iterative eigenvalue solvers Anasazi Solvers ILU-type preconditioners AztecOO, IFPACK Multilevel preconditioners ML, CLAPS Block preconditioners Meros Nonlinear system solvers NOX, LOCA Optimization (SAND) MOOCHO, Aristos 13 What Trilinos is not ƒ Trilinos is not a single monolithic piece of software. Each package: Š Can be built independent of Trilinos. Š Has its own self-contained CVS structure. Š Has its own Bugzilla product and mail lists. Š Development team is free to make its own decisions about algorithms, coding style, release contents, testing process, etc.

ƒ Trilinos top layer is not a large amount of source code: < 2% total SLOC.

ƒ Trilinos is not “indivisible”: Š You don’t need all of Trilinos to get things done. Š Any collection of packages can be combined and distributed. Š Current public release contains only 30 of the 40+ Trilinos packages. 14 Insight from History A Philosophy for Future Directions ƒ In the early 1800’s U.S. had many new territories. ƒ Question: How to incorporate into U.S.? Š Colonies? No. Š Expand boundaries of existing states? No. Š Create process for self-governing regions. Yes. Š Theme: Local control drawing on national resources. ƒ Trilinos package architecture has some similarities: Š Asynchronous maturation. Š Packages decide degree of interoperations, use of Trilinos facilities. ƒ Strength of each: Scalable growth with local control. 15 Trilinos Strategic Goals ƒ Scalable Computations: As problem size and processor counts increase, the cost of the computation will remain nearly fixed. ƒ Hardened Computations: Never fail unless problem essentially Algorithmic intractable, in which case we diagnose and inform the user why the Goals problem fails and provide a reliable measure of error. ƒ Full Vertical Coverage: Provide leading edge enabling technologies through the entire technical application software stack: from problem construction, solution, analysis and optimization.

ƒ Grand Universal Interoperability: All Trilinos packages will be interoperable, so that any combination of packages that makes sense algorithmically will be possible within Trilinos and with compatible external software. ƒ Universal Accessibility: All Trilinos capabilities will be available to Software users of major computing environments: C++, Fortran, Python and the Goals Web, and from the desktop to the latest scalable systems. ƒ Universal Solver RAS: Trilinos will be: Š Integrated into every major application at Sandia (Availability). Š The leading edge hardened, efficient, scalable solution for each of these applications (Reliability). Š Easy to maintain and upgrade within the application environment (Serviceability). Trilinos Statistics

Registered Users by Type (1985 Total) Trilinos Statistics by Relea

36 48 35 219 Developers 33 30 27 232 University 33 Government Automated 28 Regression Tested 19 Personal 11 1121 Industry packages 9 Other 3.91 365 19.25 Downloads (100s) 10.21 7.36 4.40 Release 8.0 (9/07 10.0 9.54 Release 7.0 (9/06 Source lines (100K) 8.95 Release 6.0 (9/05 7.16 5.48 Release 5.0 (3/05 Release 4.0 (6/04 31 General release 27 18 packages 17 Registered Users by Region (1985 Total) 16 22 33 28 Limited release 30 27 97 packages 26 22 274 Europe 38 703 US (except Sandia) Packages in 32 29 Sandia (includes unregistered) repository 26 Asia 22 223 Americas (except US) 0 5 10 15 20 25 30 35 40 Australia/NZ Africa Counts

638

Stats: Trilinos Download Page 10/16/2007. 17 External Visibility ƒ Awards: R&D 100, HPC SW Challenge (04). ƒ www.cfd-online.com: Trilinos A project led by Sandia to develop an object-oriented software framework for scientific computations. This is an active project which includes several state-of-the-art solvers and lots of other nice things a software engineer writing CFD codes would find useful. Everything is freely available for download once you have registered. Very good!

ƒ Industry Collaborations: Boeing, Goodyear, ExxonMobil. ƒ Linux distros: Debian, Mandriva. ƒ Star-P Interface. ƒ SciDAC TOPS-2 partner. ƒ Over 5000 downloads since March 2005. ƒ Occasional unsolicited external endorsements such as the following two-person exchange on mathforum.org: > The consensus seems to be that OO has little, if anything, to offer > (except bloat) to numerical computing. I would completely disagree. A good example of using OO in numerics is Trilinos: http://software.sandia.gov/trilinos/ 18 Trilinos Presentation Forums

ƒ Next Trilinos User Group Meeting: Š Nov 6-8, 2007. Š At Sandia National Laboratories, Albuquerque, NM, USA. ƒ ACTS “Hands-on” Tutorial: Š Aug 19-21, 2008. Š At Lawrence Berkeley Lab, Berkeley, CA, USA. 19 Website

ƒ http:trilinos.sandia.gov Developer content on software.sandia.gov. ƒ Always looking to improve layout, content. ƒ Site was recently redesigned. 20

Whirlwind Tour of Packages

Discretizations Methods Core Solvers 21 Intrepid Interoperable Tools fo Intrepid of Compatible Discretizations r Rapid Development Intrepid offers an innovative software design for compatible discretizations:

ƒ allows access to FEM, FV and FD methods using a common API ƒ supports hybrid discretizations (FEM, FV and FD) on unstructured grids ƒ supports a variety of cell shapes: ƒ standard shapes (e.g. tets, hexes): high-order finite element methods ƒ arbitrary (polyhedral) shapes: low-order mimetic reconstructions ƒ enables optimization, error estimation, V&V, and UQ using fast invasive techniques (direct support for cell-based derivative computations or via automatic differentiation)

k 0 1 2 3 ΛΛk {C ,C0 ,C1 ,C2 }3 ReductionReduction {C ,C ,C ,C } FormsForms DiscreteDiscrete forms forms d,d*,Ë,∧,(,) d,d*,Ë,∧,(,) Cell Data DD,D,D*,*,WW,M,M Operations Cell Data Operations DiscreteDiscrete ops. ops. Reconstruction Higher order Reconstruction General cells

Pullback:Pullback: FEM FEM Direct:Direct: FV/D FV/D

Developers: Pavel Bochev, Denis Ridzal, David Day 22 Rythmos

ƒ Suite of time integration (discretization) methods ƒ Includes: backward Euler, forward Euler, explicit Runge-Kutta, and implicit BDF at this time. ƒ Native support for operator split methods. ƒ Highly modular. ƒ Forward sensitivity computations will be included in the first release with adjoint sensitivities coming in near future.

Developers: Todd Coffey, Roscoe Bartlett 23

Whirlwind Tour of Packages

Discretizations Methods Core Solvers 24 Moertel: Mortar Methods

ƒ Capabilities for nonconforming mesh tying and contact formulations in 2 and 3 dimensions using Mortar methods.

ƒ Mortar methods are types of Lagrange Multiplier constraints that can be used in contact formulations and in non-conforming or conforming mesh tying as well as in domain decomposition techniques.

ƒ Used in a large class of nonconforming situations such as the surface coupling of different physical models, discretization schemes or non- matching triangulations along interior interfaces of a domain.

Developer: Michael Gee 25 Sacado: Automatic Differentiation

ƒ Efficient OO based AD tools optimized for element-level computations

ƒ Applies AD at “element”-level computation Š “Element” means finite element, finite volume, network device,… ƒ Template application’s element-computation code Š Developers only need to maintain one templated code base ƒ Provides three forms of AD Š Forward Mode: • Propagate derivatives of intermediate variables w.r.t. independent variables forward • Directional derivatives, tangent vectors, square Jacobians, when m ≥ n. Š Reverse Mode:

• Propagate derivatives of dependent variables w.r.t. intermediate variables backwards • Gradients, Jacobian-transpose products (adjoints), when n > m. Š Taylor polynomial mode:

Š Basic modes combined for higher derivatives.

Developers: Eric Phipps, David Gay 26

Whirlwind Tour of Packages

Discretizations Methods Core Solvers 27 Teuchos ƒ Portable utility package of commonly useful tools:

Š ParameterList class: key/value pair database, recursive capabilities. Š LAPACK, BLAS wrappers (templated on ordinal and scalar type). Š Dense matrix and vector classes (compatible with BLAS/LAPACK). Š FLOP counters, timers. Š Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc. Š Reference counted pointers / arrays, and more… ƒ Takes advantage of advanced features of C++: Š Templates Š Standard Template Library (STL) ƒ ParameterList: Š Allows easy control of solver parameters. Š XML format input/output.

Developers: Roscoe Barlett, Kevin Long, Heidi Thorquist, Mike Heroux, Paul Sexton, Kris Kampshoff, Chris Baker 28 Kokkos ƒ Collection of several sparse/dense kernels that affect the performance of preconditioned Krylov methods ƒ Goal: Š Isolate key non-BLAS kernels for the purposes of optimization. ƒ Kernels: Š Dense vector/multivector updates and collective ops (not in BLAS/Teuchos). Š Sparse MV, MM, SV, SM. ƒ Serial-only for now. ƒ Reference implementation provided (templated). ƒ Mechanism for improving performance: Š Default is aggressive compilation of reference source. Š BeBOP: Jim Demmel, Kathy Yelick, Rich Vuduc, UC Berkeley. Š Vector version: Cray.

Developer: Mike Heroux 29 Zoltan

ƒ Data Services for Dynamic Applications Š Dynamic load balancing Š Graph coloring Š Data migration Š Matrix ordering ƒ Partitioners: Š Geometric (coordinate-based) methods: • Recursive Coordinate Bisection (Berger, Bokhari) • Recursive Inertial Bisection (Taylor, Nour-Omid) • Space Filling Curves (Peano, Hilbert) • Refinement-tree Partitioning (Mitchell) Š Hypergraph and graph (connectivity-based) methods: • Hypergraph Partitioning • Hypergraph Repartitioning PaToH (Catalyurek) • Zoltan Hypergraph Partitioning • ParMETIS (U. Minnesota) • Jostle (U. Greenwich)

Developers: Karen Devine, Eric Boman, Robert Heaphy 30 Trilinos Common Language: Petra

ƒ Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector)

ƒ Petra1 provides distributed matrix and vector services. ƒ Exists in basic form as an object model: Š Describes basic user and support classes in UML, independent of language/implementation.

Š Describes objects and relationships to build and use matrices, vectors and graphs.

Š Has 3 implementations under development.

1Petra is Greek for “foundation”. 31 Petra Implementations

ƒ Epetra (Essential Petra): Š Current production version. Š Restricted to real, double precision arithmetic. Š Uses stable core subset of C++ (circa 2000). Š Interfaces accessible to C and Fortran users.

ƒ Tpetra (Templated Petra): Š Next generation C++ version. Š Templated scalar and ordinal fields. Š Uses namespaces, and STL: Improved usability/efficiency.

ƒ Jpetra (Java Petra): Š Pure Java. Portable to any JVM. Š Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces.

Developers: Mike Heroux, Rob Hoekstra, Alan Williams, Paul Sexton 32 EpetraExt: Extensions to Epetra

ƒ Library of useful classes not needed by everyone ƒ Most classes are types of “transforms”. ƒ Examples: Š Graph/matrix view extraction. Š Epetra/Zoltan interface. Š Explicit sparse transpose. Š Singleton removal filter, static condensation filter. Š Overlapped graph constructor, graph colorings. Š Permutations. Š -matrix multiply. Š Matlab, MatrixMarket I/O functions. ƒ Most classes are small, useful, but non-trivial to write.

Developer: Robert Hoekstra, Alan Williams, Mike Heroux 33 Thyra

ƒ High-performance, abstract interfaces for linear algebra ƒ Offers flexibility through abstractions to algorithm developers ƒ Linear solvers (Direct, Iterative, Preconditioners) Š Abstraction of basic vector/matrix operations (dot, axpy, mv). Š Can use any concrete linear algebra library (Epetra, PETSc, BLAS). ƒ Nonlinear solvers (Newton, etc.) Š Abstraction of linear solve (solve Ax=b). Š Can use any concrete linear solver library: • AztecOO, ML, PETSc, LAPACK ƒ Transient/DAE solvers (implicit) Š Abstraction of nonlinear solve. Š … and so on.

Developers: Roscoe Bartlett, Kevin Long 34 PyTrilinos

ƒ PyTrilinos provides Python access to Trilinos packages. ƒ Uses SWIG to generate bindings. ƒ Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are support. ƒ Possible to: Š Define RowMatrix implementation in Python. Š Use from Trilinos C++ code. ƒ Performance for large grain is equivalent to C++. ƒ Several times hit for very fine grain code.

Developer: Bill Spotz 35 WebTrilinos

ƒ WebTrilinos: Web interface to Trilinos

Š Generate test problems or read from file. Š Generate C++ or Python code fragments and click-run. Š Hand modify code fragments and re-run.

Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala 36

Whirlwind Tour of Packages

Discretizations Methods Core Solvers 37 IFPACK: Algebraic Preconditioners

ƒ Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves.

ƒ Accept user matrix via abstract matrix interface (Epetra versions). ƒ Uses Epetra for basic matrix/vector calculations. ƒ Supports simple perturbation stabilizations and condition estimation. ƒ Separates graph construction from factorization, improves performance substantially. ƒ Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML.

Developers: Marzio Sala, Mike Heroux 38 : Multi-level Preconditioners

ƒ Smoothed aggregation, multigrid and domain decomposition preconditioning package

ƒ Critical technology for scalable performance of some key apps. ƒ ML compatible with other Trilinos packages: Š Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works.

Š Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO, Belos, Anasazi. ƒ Can also be used completely independent of other Trilinos packages.

Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala 39 Amesos

ƒ Interface to direct solvers for distributed sparse linear systems (KLU, UMFPACK, SuperLU, MUMPS, ScaLAPACK)

ƒ Challenges: Š No single solver dominates Š Different interfaces and data formats, serial and parallel Š Interface often changes between revisions ƒ Amesos offers: Š A single, clear, consistent interface, to various packages Š Common look-and-feel for all classes Š Separation from specific solver details Š Use serial and distributed solvers; Amesos takes care of data redistribution Š Native solvers: KLU and Paraklete

Developers: Ken Stanley, Marzio Sala, Tim Davis 40 AztecOO

ƒ Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,… ƒ Incomplete factorization preconditioners

ƒ Aztec is the workhorse solver at Sandia: Š Extracted from the MPSalsa reacting flow code. Š Installed in dozens of Sandia apps. Š 1900+ external licenses. ƒ AztecOO improves on Aztec by: Š Using Epetra objects for defining matrix and RHS. Š Providing more preconditioners/scalings. Š Using C++ class design to enable more sophisticated use. ƒ AztecOO interfaces allows: Š Continued use of Aztec for functionality. Š Introduction of new solver capabilities outside of Aztec.

Developers: Mike Heroux, Alan Williams, Ray Tuminaro 41 Belos and Anasazi ƒ Next generation linear solvers (Belos) and eigensolvers (Anasazi) libraries, written in templated C++. ƒ Provide a generic interface to a collection of algorithms for solving large-scale linear problems and eigenproblems. ƒ Algorithm implementation is accomplished through the use of abstract base classes (mini interface) and traits classes. Interfaces are derived from these base classes to: Š operator-vector products Š status tests Š orthogonalization Š any arbitrary linear algebra library. ƒ Includes block linear solvers and eigensolvers.

Developers: Heidi Thornquist, Mike Heroux, Chris Baker, Rich Lehoucq, Ulrich Hetmaniuk 42 NOX: Nonlinear Solvers

ƒ Suite of nonlinear solution methods

ƒ NOX uses abstract vector and “group” interfaces: Š Allows flexible selection and tuning of various strategies: • Directions. • Line searches. Š Epetra/AztecOO/ML, LAPACK, PETSc implementations of abstract vector/group interfaces. ƒ Designed to be easily integrated into existing applications.

Developers: Tammy Kolda, Roger Pawlowski 43 LOCA

ƒ Library of continuation algorithms

ƒ Provides Š Zero order continuation Š First order continuation Š Arc length continuation Š Multi-parameter continuation (via Henderson's MF Library) Š Turning point continuation Š Pitchfork bifurcation continuation Š Hopf bifurcation continuation Š Phase transition continuation Š Eigenvalue approximation (via ARPACK or Anasazi)

Developers: Andy Salinger, Eric Phipps 44 MOOCHO & Aristos

ƒ MOOCHO: Multifunctional Object-Oriented arCHitecture for Optimization Š Large-scale invasive simultaneous analysis and design (SAND) using reduced space SQP methods.

Developer: Roscoe Bartlett

ƒ Aristos: Optimization of large-scale design spaces Š Invasive optimization approach based on full-space SQP methods. Š Efficiently manages inexactness in the inner linear system solves.

Developer: Denis Ridzal 45 Full “Vertical” Solver Coverage Trilinos Packages · Optimization Problems:

· Unconstrained: Find ufu∈ℜn that minimizes ( ) MOOCHO, ∈ℜmn ∈ℜ · Constrained: Find yu and that Aristos minimizes fyu( , ) s.t. cyu( , )= 0 = · Transient Problems: Solve fxtxtt(& ( ), ( ), ) 0 ∈==′ tTxxxx[0, ], (0)00 , &(0) Rythmos · DAEs/ODEs: for xt( )∈ℜn , t ∈ [0, T ] nm+ n · Nonlinear Problems: Given nonlinear op cxu ( , )∈ℜ →ℜ =∈ℜn NOX · Nonlinear equations: Solve cx( ) 0 for x ∂c · Stability analysis: For cxu ( , )=∈∋ 0 find space u U singular LOCA ∂x

· Implicit Linear Problems: × Given linear ops (matrices) AB , ∈ℜnn AztecOO, Belos, · Linear equations: Solve Ax=∈ℜ b for x n Ifpack,ML,etc. · Eigen problems: Solve Av=∈ℜ∈ℜλλ Bv for (all) v n , Anasazi · Explicit Linear Problems: Epetra, Tpetra mn×× mn · Matrix/graph equations: Compute yAxAAGA==; ( ); ∈ℜ∈ , G =+αβα =<> ∈ℜn · Vector problems: Compute yxw; xyxy,; , Algorithms Research: 46 Truly Useful Multi-level Methods ƒ Fly-through of next 4 slides. ƒ Theme: Multi-level preconditioning has come of age across broad spectrum of problems. Nonlinear AMG/ADAGIO

¾1542/1414 (Gee,Heinstein,Key,Pierson,Tuminaro) • Large reduction in iteration count. ¾Novel nonlinear AMG • Slow growth as size increases. ƒ ML, NOX, Epetra, Amesos, AztecOO ¾Constraints projected out in F(x) ¾Improved coloring performance by 10x ¾Initial testing ⇒ excellent convergence ¾

# elmts its 91 13 its 559 14 Jacobi 22 1241 25 2331 23 MG 13 3925 30

Pressurize tire with 5 loadsteps 6119 35 ASC SIERRA Applications

SIERRA/Fuego/Syrinx

¾ Helium plume V&V Project ¾ 260K, 16 processor run ¾ ML/GMRES is ~25% faster than without ML ¾As problem size increase, ML expected to be more beneficial

SIERRA/Aria Multiphysics

¾coupled potential/thermal/displacement DP MEMS problem ¾ML reduced solve time (40%) from ~20 minutes to ~12 minutes ¾ compared to actual ANSYS runs & ARIA re-creation ANSYS scheme Premo premo (Latin) – to squeeze (compress)

Compressible flow to determine aerodynamic characteristics for the Nuclear Weapons Complex ƒ Additional Issues that have been addressed Š FEI/Nox/Trilinos interface development Š Block Algorithm Improvements Š Block Gauss-Seidel, Block Grid Transfers

ƒ B61 problem (6.5 million dofs, 64 procs) ƒ Pseudo-transient + Newton ƒ Euler flow, Mach .8 ƒ Linear solver: 173 solves, tolerance= 10-4

ƒ GMRES/ILU Î 7494 sec ƒ GMRES/MG Î 3704 sec ƒ Numerical Issues Š nonlinear path, time step path, setup time, linear sys. difficulty

49 Premo premo (Latin) – to squeeze (compress)

ƒ Falcon problem (13+ Million dofs, 150 procs) ƒ Pseudo-transient + Newton ƒ Euler flow, Mach .75 ƒ Linear solver: 109 solves, tolerance= 10-4

ƒ GMRES/ILU Î 7620 sec ƒ GMRES/MG Î 3787 sec ƒ 10x improvement on final linear solve. ƒ > 5x gains on some problems over entire sequence

ƒ New Grid Transfers (220+K Falcon, 1 proc) ƒ Pseudo-transient + Newton, Euler flow @ Mach .75 ƒ Last linear solve, tolerance= 10-9 ƒ GMRES/MG/old transfers Î 47 its, 49 sec ƒ GMRES/MG/new transfers Î 24 its, 26 sec 50 51 Multi-level Methods Summary

ƒ Solving hard, real problems fast, scalably. ƒ Still need more… 52 Algorithms Research: Specialized Solvers ƒ Next wave of capabilities: Specialized solvers. ƒ Examples in Trilinos: Š Optimal domain decomposition preconditioners for structures: CLAPS Š Mortar methods for interface coupling: Moertel. Š Segregated Preconditioners for Navier-Stokes: Meros. ƒ Examples in Applications: Š EMU (with Boeing). Š Tramonto. 53 •A11 solve easily applied in parallel. Lipid Bi-Layer Problem: •Apply GMRES to S = A22 –A21*inv(A11)*A12 One example •Need a preconditioner for S. (of many variations)

0 0 0 0 0 19n 0 0 0 0 0 0 2n A11 A12 0 0 19n 0 0 0 0

0 F 0 0 3n A 0 A 0 21 0 0022 3n 0 000 54 Preconditioner for S

D11 F A22 = D21 D21 • D11, D22 = O(1), D21 = O(1e-10) • Ignore D21 for preconditioning. • P(S) requires • 2 diagonal scalings, T D11 F • matvec with F. A22 # A22 = • All distributed operations. 0 D21 55 Tramonto Solver Summary

ƒ 3-level linear operator structure. ƒ Matlab to fully parallel: 1 month. ƒ Complex orchestration: Š Preconditioner: 100+ distributed Epetra matrices used in sequence. Š ML, IFPACK, Amesos used on subproblems. ƒ Utilizes 8 Trilinos packages in total. ƒ 566 Lines of Code (Polymer Solver). ƒ Polymorphic Design. 3D Polymer Results

Approximately Linear Performance Improvement For Fixed Size Problem Same Iteration Counts as Processor Count Increases 57 Properties of New Solver

ƒ Uniform distributed mesh → uniform distributed work. ƒ Preconditioner sub-steps naturally parallel: → Results invariant to processor count up to round-off. ƒ Preconditioner requires almost no extra memory: → 4-10X reduction over previous approach. ƒ GMRES subspace and storage reduced 6X-10X or more. Laurie’s Favorite Properties! ƒ Speedup 20-2X. ƒ Solver has: Š No tuning parameters. Š Near linear scaling.

Michael A. Heroux, Laura J. D. Frink, and Andrew G. Salinger. Schur complement based approaches to solving density functional theories for inhomogeneous fluids on parallel computers. SIAM J. Sci. Comput. 2007. 58

Extending Capabilities: Preconditioners, Operators, Matrices

Illustrated using AztecOO as example 59 Epetra User Class Categories

Š Sparse Matrices: RowMatrix, (CrsMatrix, VbrMatrix, FECrsMatrix, FEVbrMatrix)

Š Linear Operator: Operator: (AztecOO, ML, Ifpack)

Š Dense Matrices: DenseMatrix, DenseVector, BLAS, LAPACK, SerialDenseSolver

Š Vectors: Vector, MultiVector

Š Graphs: CrsGraph

Š Data Layout: Map, BlockMap, LocalMap

Š Redistribution: Import, Export, LbGraph, LbMatrix

Š Aggregates: LinearProblem

Š Parallel Machine: Comm, (SerialComm, MpiComm, MpiSmpComm)

Š Utilities: Time, Flops 60 LinearProblem Class

ƒA linear problem is defined by: Š Matrix A : • An Epetra_RowMatrix or Epetra_Operator object. (often a CrsMatrix or VbrMatrix object.) Š Vectors x, b : Vector objects. ƒTo call AztecOO, first define a LinearProblem: Š Constructed from A, x and b. Š Once defined, can: • Scale the problem (explicit preconditioning). • Precondition it (implicitly). • Change x and b. 61 AztecOO

ƒ Aztec is the previous workhorse solver at Sandia: Š Extracted from the MPSalsa reacting flow code. Š Installed in dozens of Sandia apps. ƒ AztecOO leverages the investment in Aztec: Š Uses Aztec iterative methods and preconditioners. ƒ AztecOO improves on Aztec by: Š Using Epetra objects for defining matrix and RHS. Š Providing more preconditioners/scalings. Š Using C++ class design to enable more sophisticated use. ƒ AztecOO interfaces allows: Š Continued use of Aztec for functionality. Š Introduction of new solver capabilities outside of Aztec. 62 A Simple Epetra/AztecOO Program

// Header files omitted… int main(int argc, char *argv[]) { // ***** Create x and b vectors ***** MPI_Init(&argc,&argv); // Initialize MPI, MpiComm Epetra_Vector x(Map); Epetra_MpiComm Comm( MPI_COMM_WORLD ); Epetra_Vector b(Map); b.Random(); // Fill RHS with random #s // ***** Map puts same number of equations on each pe ***** // ***** Create Linear Problem ***** int NumMyElements = 1000 ; Epetra_LinearProblem problem(&A, &x, &b); Epetra_Map Map(-1, NumMyElements, 0, Comm); int NumGlobalElements = Map.NumGlobalElements(); // ***** Create/define AztecOO instance, solve ***** AztecOO solver(problem); // ***** Create an Epetra_Matrix tridiag(-1,2,-1) ***** solver.SetAztecOption(AZ_precond, AZ_Jacobi); solver.Iterate(1000, 1.0E-8); Epetra_CrsMatrix A(Copy, Map, 3); double negOne = -1.0; double posTwo = 2.0;

for (int i=0; i