Tutorial

Nico Schlömer (TU Berlin) for the PRACE summer school 2013, Ostrava.

Slides contributed by Mike Heroux & many Trilinos devs.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, 1 for the United States Department of Energy under contract DE-AC04-94AL85000. Goals of the Tutorial

• Present most packages

• Concept behind Trilinos

• Build, use Trilinos

The enlightenment. Background/Motivation:

What can Trilinos do for you?

3 What is Trilinos?

Object-oriented software framework for… . Solving big complex science & engineering problems . A Project of Projects. . More like Lego™ bricks than MATLAB™

4 Sandia National Labs (US Dept of Energy)

Sandia CSRI Albuquerque, NM (505) 845-7695 Trilinos Contributors

Chris Baker Jonathan Hu Alan Williams Ross Bartlett Joe Kotulski Michael Wolf Pavel Bochev Rich Lehoucq Paul Boggs Kevin Long Past Contributors: Erik Boman Karla Morris Jason Cross Lee Buermann Kurtis Nusbaum Michael Gee Cedric Chevalier Roger Pawlowski Esteban Guillen Todd Coffey Brent Perschbacher Bob Heaphy Eric Cyr Eric Phipps Robert Hoekstra David Day Siva Rajamanickam Kris Kampshoff Karen Devine Lee Ann Riesen Ian Karlin Clark Dohrmann Marzio Sala Sarah Knepper Kelly Fermoyle Andrew Salinger Tammy Kolda David Gay Nico Schlömer Joe Outzen Jeremie Gaidamour Chris Siefert Mike Phenow Mike Heroux Bill Spotz Paul Sexton Ulrich Hetmaniuk Heidi Thornquist Bob Shuttleworth Mark Hoemmen Ray Tuminaro Ken Stanley Russell Hooper Jim Willenbring Vicki Howle Contributions by hour of the day

x

libmesh Contributions by hour of the day

x

Trilinos  R&D 100 Winner  9000 Registered Users.  30,000 Downloads.  Open Source. Laptops to Leadership systems Optimal Kernels to Optimal Solutions: ● 60 Packages.  Geometry, Meshing Binary distributions:  Discretizations, Load Balancing. ● Cray LIBSCI  Scalable Linear, Nonlinear, Eigen, ● Debian, Ubuntu Transient, Optimization, UQ solvers. ● Intel (in process)  Scalable I/O, GPU, Manycore 10 Applications

. All kinds of physical simulations: – Structural mechanics (statics and dynamics) – Circuit simulations (physical models) – Electromagnetics, plasmas, and superconductors – Combustion and fluid flow (at macro- and nanoscales)

. Coupled / multiphysics models

. Data and graph analysis (2D distributions). 11 How Trilinos evolved physicsphysics . Started as linear solvers and distributed objects . Capabilities grew to satisfy application and research needs

L(u)=f L(u)=f discretizations methods Math.Math. model model Numerical math Automatic diff. Time domain Automatic diff. Convert to models that Time domain Domain dec. can be solved on digital Space domain Domain dec. Space domain Mortar methods computers Mortar methods Lh(uh)=fhLh(uh)=fh NumericalNumerical model model solvers core Algorithms LinearLinear PetraPetra Find faster and more NonlinearNonlinear UtilitiesUtilities efficient ways to solve EigenvaluesEigenvalues InterfacesInterfaces numerical models uh=Lh-1uh=Lh-1 fh fh OptimizationOptimization LoadLoad Balancing Balancing AlgorithmsAlgorithms

. Discretizations in space and time . Optimization and sensitivities computationcomputation . Uncertainty quantification Target platforms: Any and all, current and future . Laptops and workstations . Clusters and supercomputers – Multicore CPU nodes – Hybrid CPU / GPU nodes . Parallel programming environments – MPI, OpenMP – Intel TBB, Pthreads – Thrust, CUDA – Combinations of the above . User “skins” – ++ (primary language) – C, , Python – Web (Hands-on demo!) Trilinos Strategic Goals

. Scalable Computations: As problem size and processor counts increase, the cost of the computation will remain nearly fixed.

. Hardened Computations: Never fail unless problem essentially intractable, in Algorithmic which case we diagnose and inform the user why the problem fails and Goals provide a reliable measure of error.

. Full Vertical Coverage: Provide leading edge enabling technologies through the entire technical application software stack: from problem construction, solution, analysis and optimization.

. Grand Universal Interoperability: All Trilinos packages, and important external packages, will be interoperable, so that any combination of packages and external software (e.g., PETSc, ) that makes sense algorithmically will be possible within Trilinos.

. Universal Accessibility: All Trilinos capabilities will be available to users of major computing environments: C++, Fortran, Python and the Web, and from Software the desktop to the latest scalable systems. Goals

. TriBITS Lifecycle: Trilinos will be: – Reliable: Leading edge hardened, scalable solutions for all apps – Available: Integrated into every major application at Sandia – Serviceable: “Self-sustaining”. 14 Unique features of Trilinos

● Huge library of algorithms – Linear and nonlinear solvers, preconditioners, … – Optimization, transients, sensitivities, uncertainty, …

● Growing support for multicore & hybrid CPU/GPU

– Built into the new Tpetra linear algebra objects

● Therefore into iterative solvers with zero effort! – Unified intranode programming model – Spreading into the whole stack:

● Multigrid, sparse factorizations, element assembly… ● Growing support for mixed and arbitrary precisions

– Don’t have to rebuild Trilinos to use it!

● Growing support for huge (> 2B unknowns) problems Capability Leaders: Layer of Proactive Leadership

● Areas: – User Experience (W. Spotz) (May 2012). – Scalable I/O: (R. Oldfield) (Nov 2010). – Framework, Tools & Interfaces (J. Willenbring). – Software Engineering Technologies and Integration (R. Bartlett). – Discretizations (P. Bochev). – Geometry, Meshing & Load Balancing (K. Devine). – Scalable Linear Algebra (M. Heroux). – Linear & Eigen Solvers (J. Hu). – Nonlinear, Transient & Optimization Solvers (A. Salinger).

● Each leader provides strategic direction across all Trilinos packages within area. Trilinos’ software organization

16 Trilinos is made of packages . Not a monolithic piece of software – Like LEGO™ bricks, not Matlab™ . Each package: – Has its own development team and management – Makes its own decisions about algorithms, coding style, etc. – May or may not depend on other Trilinos packages . Trilinos is not “indivisible” – You don’t need all of Trilinos to get things done – Any subset of packages can be combined and distributed – Current public release (11.2) contains 54 of the 60+ Trilinos packages . Trilinos top layer framework – Not a large amount of source code: ~1.5% – Manages package dependencies ● Like a GNU/Linux package manager – Runs packages’ tests nightly, and on every check-in . Package model supports multifrontal development 17 Trilinos Package Summary Objective Package(s)

Meshing & Discretizations STK, Intrepid, Pamgen, Sundance, ITAPS, Mesquite Discretizations Time Integration Rythmos Automatic Differentiation Sacado Methods Mortar Methods Moertel Linear algebra objects Epetra, Jpetra, Tpetra, Kokkos Interfaces Thyra, Stratimikos, RTOp, FEI,

Services Load Balancing Zoltan, Isorropia, Zoltan2 “Skins” PyTrilinos, WebTrilinos, ForTrilinos, Ctrilinos, Optika C++ utilities, I/O, thread API Teuchos, EpetraExt, Kokkos, Triutils, ThreadPool, Phalanx, Trios Iterative linear solvers AztecOO, Belos, Komplex Direct sparse linear solvers Amesos, Amesos2 Direct dense linear solvers Epetra, Teuchos, Pliris Iterative eigenvalue solvers Anasazi, Rbgen ILU-type preconditioners AztecOO, IFPACK, Ifpack2 Solvers Multilevel preconditioners ML, CLAPS, Muelu Block preconditioners Meros, Teko Nonlinear system solvers NOX, LOCA, Piro Optimization (SAND) MOOCHO, Aristos, TriKota, Globipack, Optipack, ROL Stochastic PDEs Stokhos Interoperability vs. Dependence (“Can Use”) (“Depends On”)

. Although most Trilinos packages have no explicit dependence, often packages must interact with some other packages: – NOX needs operator, vector and linear solver objects. – AztecOO needs preconditioner, matrix, operator and vector objects. – Interoperability is enabled at configure time. – Trilinos CMake system is vehicle for: ● Establishing interoperability of Trilinos components… ● Without compromising individual package autonomy. ● Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES option . Architecture supports simultaneous development on many fronts.

19 Software Development and Delivery

20 “Are C++ templates safe? No, but they are good.” Compile-time Polymorphism Templates and Sanity upon a shifting foundation

Software delivery: Template Benefits: • Essential Activity ● Compile time polymorphism. ● True generic programming. How can we: ● No runtime performance hit. • Implement mixed precision algorithms? ● Strong typing for mixed precision. • Implement generic fine-grain ● Support for extended precision. parallelism? ● Many more… • Support hybrid CPU/GPU computations? • Support extended precision? Template Drawbacks: • Explore redundant computations? – Huge compile-time performance hit: • Prepare for both exascale “swim ● But good use of multicore :) lanes”? ● Eliminated for common data types. - Complex notation: ● C++ templates only sane way: Esp. for Fortran & C programmers). ● Can insulate to some extent. • Moving to completely templated Trilinos libraries. • Other important benefits. • A usable stack exists now in Trilinos. 21 Solver Software Stack Phase I packages: SPMD, int/double Phase II packages: Templated

Optimization

Unconstrained:

: MOOCHO

Constrained: n o i t

a LOCA Bifurcation Analysis i t n

Transient Problems e r e

DAEs/ODEs: s

f Ryt hmos e f i i D t

Nonlinear Problems i c NOX v i i t t a ) m o Linear Problems i Anasazi d o s t a

Linear Equations: n Ifpack, ML, etc... u c e a Eigen Problems: A Azt ecOO S ( S

Distributed Linear Algebra Epet r a Matrix/Graph Equations: Vector Problems: Teuchos 22 Solver Software Stack Phase I packages: SPMD, int/double Phase II packages: Templated

Optimization

Unconstrained:

: MOOCHO

Constrained: n o i t

a LOCA Bifurcation Analysis i t n

Transient Problems e r e

DAEs/ODEs: s

f Ryt hmos e f i i D t

Nonlinear Problems i c NOX v i i t t a ) m o Linear Problems i Anasazi d o s t a

Linear Equations: n Ifpack, ML, etc... u c e a Eigen Problems: A Azt ecOO S ( S

Distributed Linear Algebra Epet r a Matrix/Graph Equations: Vector Problems: Teuchos 23 Whirlwind Tour of Packages

Core Utilities Discretizations Methods Solvers

24 Discretizations Which packages?

Shards definition of cell topology Intrepid local (cell-based) FE/FV/FD basis definition; numerical integration; cell geometry; etc. Phalanx decomposition of complex PDE systems into a number of elementary user-defined expressions; efficient management of expression dependencies; hooks to embedded tools, etc. FEI, Panzer user-defined assignment and management of global degrees of freedom; assembly of local PDE discretization data into distributed linear systems; etc. Developers: Pavel Bochev, Denis Ridzal, Alan Williams, Roger Pawlowski, Eric, Cyr, others. Discretization Capability Area Which packages?

Shards Shards • a suite of common tools for topological cell topology data that facilitate interoperability between PDE software • cell definitions (e.g., triangle, Intrepid local PDE hexahedron, etc.) discretization • methods to manage and access information about cell topologies: (1) query adjacencies of subcells Phalanx (2) find subcell permutation w. r. to PDE expression trees global cell (3) create user-defined custom cell topologies FEI, Panzer global DOFs; linear systems Discretization Capability Area Which packages?

Intrepid Shards • physics-compatible cell-local PDE cell topology discretizations • streamlined access to finite element, finite volume and finite difference methods Intrepid • support for a wide range of cell topologies local PDE • compatible finite element spaces of arbitrary discretization degree for H(grad), H(curl), H(div) and L2 spaces • Lagrange-interpolating and modal FE bases Phalanx • prototype for polyhedral FE bases PDE expression trees • prototype for control-volume (CV) FEM • numerical integration: spatial and stochastic • cell geometry tools: volumes, normals, FEI, Panzer tangents, reference-to-physical maps global DOFs; linear systems Discretization Capability Area Which packages?

Phalanx Shards • cell-local field (variable, data) evaluation cell topology kernel specifically designed for general PDE solvers • decomposition of complex PDE systems into Intrepid a number of elementary user-defined local PDE expressions discretization • management of expression dependencies • enables rapid development of large PDEs • user-defined data types and evaluation types Phalanx offer unprecedented flexibility for direct PDE expression trees integration with user applications • they also enable embedded technology such as automatic differentiation for sensitivity FEI, Panzer analysis, optimization and uncertainty global DOFs; linear quantification systems Discretization Capability Area Which packages?

FEI, Panzer Shards • user-defined assignment and management cell topology of global degrees of freedom (DOFs) • assembly of cell-local PDE discretization data into global, distributed linear systems Intrepid • insulate PDE application codes from local PDE linear-algebra issues such as discretization storage and mappings of DOFs to distributed linear equations • support multi-physics problems, allowing for Phalanx arbitrarily complicated PDE discretizations PDE expression trees with multiple DOFs per subcell (edge-based, face-based, node-based and mixed) FEI, Panzer global DOFs; linear systems Rythmos

. Suite of time integration (discretization) methods

. Includes: backward Euler, forward Euler, explicit Runge-Kutta, and implicit BDF at this time. . Native support for operator split methods. . Highly modular. . Forward sensitivity computations will be included in the first release with adjoint sensitivities coming in near future.

Developers: Curt Ober, Todd Coffey, Roscoe Bartlett 30 Whirlwind Tour of Packages

Discretizations Methods Core Solvers

31 Sacado: Automatic Differentiation

. Efficient OO based AD tools optimized for element-level computations

. Applies AD at “element”-level computation – “Element” means finite element, finite volume, network device,…

. Template application’s element-computation code – Developers only need to maintain one templated code base . Provides three forms of AD – Forward Mode: ● Propagate derivatives of intermediate variables w.r.t. independent variables forward ● Directional derivatives, tangent vectors, square Jacobians, when m ≥ n.

– Reverse Mode: ● Propagate derivatives of dependent variables w.r.t. intermediate variables backwards ● Gradients, Jacobian-transpose products (adjoints), when n > m.

– Taylor polynomial mode:

– Developers: EricBasic Phipps, modes Carter combined Edwards for (UQ higher data derivatives. types) 32 Whirlwind Tour of Packages

Discretizations Methods Core Solvers

33 Teuchos . Portable utility package of commonly useful tools:

– ParameterList class: key/value pair database, recursive capabilities. – LAPACK, BLAS wrappers (templated on ordinal and scalar type). – Dense matrix and vector classes (compatible with BLAS/LAPACK). – FLOP counters, timers. – Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc. – Reference counted pointers / arrays, and more… . Takes advantage of advanced features of C++: – Templates – Standard Template Library (STL) . Teuchos::ParameterList: – Allows easy control of solver parameters. – XML format input/output.

Developers: Chris Baker, Roscoe Barlett, Heidi Thornquist, Mike Heroux, Paul Sexton, Kris Kampshoff, Chris Baker, Mark Hoemmen, many others 34 Trilinos Common Language: Petra

. Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector)

. Petra1 provides distributed matrix and vector services. . Exists in basic form as an object model: – Describes basic user and support classes in UML, independent of language/implementation.

– Describes objects and relationships to build and use matrices, vectors and graphs.

– Has 2 implementations under development.

Petra is Greek for “foundation”. 35 Perform redistribution of distributed objects: Petra Object • Parallel permutations. • “Ghosting” of values for local computations. Model • Collection of partial results from remote processors. Base Class for All Distributed Objects: • Performs all communication. • Requires Check, Pack, Unpack methods from derived class.

Abstract Interface for Sparse All-to-AllAbstract Communication Interface to Parallel Machine Graph class for structure-only computations: • Supports construction of pre-recorded• Shameless “plan” for mimicdata-driven of MPI communications. interface. • Reusable matrix structure. • Examples: • Keeps MPI dependence to a single class (through all of Trilinos!). • Pattern-based preconditioners. ● Supports gathering/scatter of •off-processor Allow trivial x/y serial values implementation. when computing y = Ax. • Pattern-based load balancing tools. Basic sparse matrix class: ● Gathering overlap rows for Overlapping Schwarz. • Opens door to novel parallel• Flexible libraries construction (shmem, UPC, process. etc…) ● Redistribution of matrices, vectors, etc… • Arbitrary entry placement on parallel machine.

Describes layout of distributed objects: • Vectors: Number of vectorDense entries Distributed on each processorVector and and Matrices: global ID • Matrices/graphs: Rows/Columns• Simple managed local data by structure.a processor. • Called “Maps” in Epetra. • BLAS-able, LAPACK-able. • Ghostable, redistributable. • RTOp-able. Petra Implementations

. Epetra (Essential Petra): – Current production version. – Restricted to real, double precision arithmetic. – Uses stable core subset of C++ (circa 2000). – Interfaces accessible to C and Fortran users.

. Tpetra (Templated Petra): – Next generation C++ version. – Templated scalar and ordinal fields. – Uses namespaces, and STL: Improved usability/efficiency. – Builds on top of Kokkos manycore node library.

Developers: Chris Baker, Mike Heroux, Rob Hoekstra, Alan Williams, 37 Carter Edwards, Mark Hoemmen, Christian Trott, Siva Rajamanickam EpetraExt: Extensions to Epetra

. Library of useful classes not needed by everyone . Most classes are types of “transforms”. . Examples: – Graph/matrix view extraction. – Epetra/Zoltan interface. – Explicit sparse transpose. – Singleton removal filter, static condensation filter. – Overlapped graph constructor, graph colorings. – Permutations. – Sparse matrix-matrix multiply. – MATLAB, MatrixMarket I/O functions. . Most classes are small, useful, but non-trivial to write. . Migrating to 64-bit ints

Developer: Robert Hoekstra, Alan Williams, Mike Heroux, 38 Chetan Jhurani Kokkos: Node-level Data Classes

. Manycore/Accelerator data structures & kernels . Epetra is MPI-only, Tpetra is MPI+[X+Y]. . Kokkos Arrays: Details tomorrow. – Simple multi-dimensional arrays. – User specifies dimensions and size. Library handles all else. – Very good general performance. . Pretty-good-kernel (PGK) library: – Node-level threaded (X) and vector (Y) sparse and dense kernels. – Plug replaceable with vendor-optimized libraries. . Implement Petra Object Model at Node level: – Comm, Map/Perm, Vector/Multivector, RowMatrix, Operator.

Developer: Mike Heroux, Carter Edwards, Christian Trott, Siva Rajamanickam, etc. 39 Zoltan

. Data Services for Dynamic Applications – Dynamic load balancing – Graph coloring – Data migration – Matrix ordering . Partitioners: – Geometric (coordinate-based) methods: ● Recursive Coordinate Bisection (Berger, Bokhari) ● Recursive Inertial Bisection (Taylor, Nour-Omid) ● Space Filling Curves (Peano, Hilbert) ● Refinement-tree Partitioning (Mitchell) – Hypergraph and graph (connectivity-based) methods: ● Hypergraph Repartitioning PaToH (Catalyurek) ● Zoltan Hypergraph Partitioning ● ParMETIS (U. Minnesota) ● Jostle (U. Greenwich)

Developers: Karen Devine, Eric Boman, Siva R., LeAnn Riesen 40 Thyra

. High-performance, abstract interfaces for linear algebra . Offers flexibility through abstractions to algorithm developers . Linear solvers (Direct, Iterative, Preconditioners) – Abstraction of basic vector/matrix operations (dot, axpy, mv). – Can use any concrete linear algebra library (Epetra, PETSc, BLAS). . Nonlinear solvers (Newton, etc.) – Abstraction of linear solve (solve Ax=b). – Can use any concrete linear solver library: ● AztecOO, Belos, ML, PETSc, LAPACK . Transient/DAE solvers (implicit) – Abstraction of nonlinear solve. – … and so on.

Developers: Roscoe Bartlett, Kevin Long 41 “Skins”

. PyTrilinos provides Python access to Trilinos packages . Uses SWIG to generate bindings. . Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are supported.

Developer: Bill Spotz

. CTrilinos: C wrapper (mostly to support ForTrilinos). . ForTrilinos: OO Fortran interfaces.

Developers: Nicole Lemaster, Damian Rouson

. WebTrilinos: Web interface to Trilinos . Generate test problems or read from file. . Generate C++ or Python code fragments and click-run. . Hand modify code fragments and re-run.

42 Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala, Jim Willenbring Whirlwind Tour of Packages

Discretizations Methods Core Solvers

43 Amesos

. Interface to direct solvers for distributed sparse linear systems (KLU, UMFPACK, SuperLU, MUMPS, ScaLAPACK)

. Challenges: – No single solver dominates – Different interfaces and data formats, serial and parallel – Interface often changes between revisions . Amesos offers: – A single, clear, consistent interface, to various packages – Common look-and-feel for all classes – Separation from specific solver details – Use serial and distributed solvers; Amesos takes care of data redistribution – Native solvers: KLU and Paraklete

Developers: Ken Stanley, Marzio Sala, Tim Davis 44 Amesos2/KLU2

. Second-generation sparse direct solvers package

. Unified interface to multiple solvers, just like Amesos . KLU2: Default direct solver. . Amesos2 features: – Supports matrices of arbitrary scalar and index types – Path to multicore CPU and hybrid CPU/GPU solvers – Thread safe: multiple solvers can coexist on the same node ● Supports new intranode hybrid direct / iterative solver ShyLU – Abstraction from specific sparse matrix representation ● Supports Epetra and Tpetra ● Extensible to other matrix types . Still maturing.

45 Developers: Eric Bavier, Erik Boman, and Siva Rajamanickam AztecOO

. Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,… . Incomplete factorization preconditioners

. Aztec is the workhorse solver at Sandia: – Extracted from the MPSalsa reacting flow code. – Installed in dozens of Sandia apps. – 1900+ external licenses. . AztecOO improves on Aztec by: – Using Epetra objects for defining matrix and RHS. – Providing more preconditioners/scalings. – Using C++ class design to enable more sophisticated use. . AztecOO interfaces allows: – Continued use of Aztec for functionality. – Introduction of new solver capabilities outside of Aztec.

46 Developers: Mike Heroux, Alan Williams, Ray Tuminaro Belos

. Next-generation linear solver library, written in templated C++.

. Provide a generic framework for developing iterative algorithms for solving large-scale, linear problems. . Algorithm implementation is accomplished through the use of traits classes and abstract base classes: – Operator-vector products: Belos::MultiVecTraits, Belos::OperatorTraits – Orthogonalization: Belos::OrthoManager, Belos::MatOrthoManager – Status tests: Belos::StatusTest, Belos::StatusTestResNorm – Iteration kernels: Belos::Iteration – Linear solver managers: Belos::SolverManager

. AztecOO provides solvers for Ax=b, what about solvers for: – Simultaneously solved systems w/ multiple-RHS: AX = B – Sequentially solved systems w/ multiple-RHS: AXi = Bi , i=1,…,t – Sequences of multiple-RHS systems: AiXi = Bi , i=1,…,t

. Many advanced methods for these types of linear systems – Block methods: block GMRES [Vital], block CG/BICG [O’Leary] – “Seed” solvers: hybrid GMRES [Nachtigal, et al.] – Recycling solvers: recycled Krylov methods [Parks, et al.] – Restarting techniques, orthogonalization techniques, …

Developers: Heidi Thornquist, Mike Heroux, Mark Hoemmen, Mike Parks, Rich Lehoucq 47 IFPACK: Algebraic Preconditioners

. Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves.

. Accept user matrix via abstract matrix interface (Epetra versions). . Uses Epetra for basic matrix/vector calculations. . Supports simple perturbation stabilizations and condition estimation. . Separates graph construction from factorization, improves performance substantially. . Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML.

Developers: Marzio Sala, Mike Heroux, Siva Rajamanickam, Alan Williams 48 Ifpack2

. Second-generation IFPACK

. Highly optimized ILUT (60x faster than IFPACK’s!) . Computed factors fully exploit multicore CPU / GPU – Via Tpetra . Path to hybrid-parallel factorizations . Arbitrary precision and complex arithmetic support

Developers: Mike Heroux, Siva Rajamanickam, Alan Williams, Michael Wolf 49 : Multi-level Preconditioners

. Smoothed aggregation, multigrid and domain decomposition preconditioning package

. Critical technology for scalable performance of some key apps. . ML compatible with other Trilinos packages: – Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works.

– Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO, Belos, Anasazi. . Can also be used completely independent of other Trilinos packages. . Muelu: Next generation ML (talked about tomorrow).

Developers: Ray Tuminaro, Jeremie Gaidamour, Jonathan Hu, Marzio Sala, Chris Siefert

50 Anasazi

. Next-generation eigensolver library, written in templated C++.

. Provide a generic framework for developing iterative algorithms for solving large-scale eigenproblems. . Algorithm implementation is accomplished through the use of traits classes and abstract base classes: – Operator-vector products: Anasazi::MultiVecTraits, Anasazi::OperatorTraits – Orthogonalization:Anasazi::OrthoManager, Anasazi::MatOrthoManager – Status tests: Anasazi::StatusTest, Anasazi::StatusTestResNorm – Iteration kernels: Anasazi::Eigensolver – Eigensolver managers: Anasazi::SolverManager – Eigenproblem: Anasazi::Eigenproblem – Sort managers: Anasazi::SortManager

. Currently has solver managers for three eigensolvers: – Block Krylov-Schur – Block Davidson – LOBPCG . Can solve: – standard and generalized eigenproblems – Hermitian and non-Hermitian eigenproblems – real or complex-valued eigenproblems

Developers: Heidi Thornquist, Mike Heroux, Chris Baker, Rich Lehoucq, Ulrich Hetmaniuk 51 NOX: Nonlinear Solvers

. Suite of nonlinear solution methods Tensor Method Broyden’s Method Newton’s Method M = f() x + J d + ---T1 dd () () T c c c MB = f xc + Bcd MN = f xc + Jc d 2

Globalizations Line Search Trust Region Interval Halving Dogleg Quadratic Inexact Dogleg Cubic Moré-Thuente

Jacobian Estimation Implementation • Graph Coloring • Parallel • Finite Difference • OO-C++ • Jacobian-Free • Independent of Newton-Krylov http://trilinos.sandia.gov/packages/nox the linear algebra package! Developers: Tammy Kolda, Roger Pawlowski 52 LOCA

. Library of continuation algorithms

. Provides – Zero order continuation – First order continuation – Arc length continuation – Multi-parameter continuation (via Henderson's MF Library) – Turning point continuation – Pitchfork bifurcation continuation – Hopf bifurcation continuation – Phase transition continuation – Eigenvalue approximation (via ARPACK or Anasazi)

Developers: Andy Salinger, Eric Phipps 53 MOOCHO & Aristos

. MOOCHO: Multifunctional Object-Oriented arCHitecture for Optimization

– Large-scale invasive simultaneous analysis and design (SAND) using reduced space SQP methods.

Developer: Roscoe Bartlett

. Aristos: Optimization of large-scale design spaces

– Invasive optimization approach based on full-space SQP methods. ROL: –NewEfficiently package manages inexactnesswith combined in the inner linearcapabilities system solves.

Developer: Denis Ridzal 54 High-level Integration and Coupling of Trilinos Solver Capabilities:

Putting the Pieces Together

55 Classic vs. New Trilinos Solver Stacks

Milestone: Successful integration of new TRILINOS solver stack into SIERRA low-Mach aerodynamic capability

Comparison of current and new Trilinos solver stack

New Trilinos solver stack (Tpetra, Belos, MueLu, Ifpack2, Zoltan2 and Amesos2) has been integrated into Sierra/Conchas Low Mach and Sierra/STK_nalu code base 56 Abstract Numerical Algorithms

An abstract numerical algorithm (ANA) is a numerical algorithm that can be expressed solely in terms of vectors, vector spaces, and linear operators

Example Linear ANA (LANA) : Linear Conjugate Gradients

• ANAs can be very mathematically sophisticated! • ANAs can be extremely reusable!

Linear Conjugate Gradient Algorithm Types of operations Types of objects

linear operator applications

vector-vector operations

scalar operations

scalar product defined by vector space 57 Driving Solvers through Stratimikos

• Stratimikos created Greek words "stratigiki“ (strategy) and "grammikos“ (linear) • Defines class Thyra::DefaultLinearSolverBuilder. • Provides common access to: ● Linear Solvers: Amesos, AztecOO, Belos, … ● Preconditioners: Ifpack, ML, … ● Reads in options through a parameter list (read from XML?) ● Accepts any linear system objects that provide ● Epetra_Operator / Epetra_RowMatrix view of the matrix ● SPMD vector views for the RHS and LHS (e.g. Epetra_[Multi]Vector objects) • Provides uniform access to linear solver options that can be leveraged across multiple applications and algorithms

Key Points • Stratimikos is an important building block for creating more sophisticated linear solver capabilities! Stratimikos Parameter List and Sublists

Top level parameters ... ...

... L

i Sublists passed n

e

a on to package

r

S code!

o

l

v e

r s ... Every parameter and sublist is ... handled by Thyra code and is fully ... validated!

P

r

e

c o

n

d

i ... t i

o

n

e

r ... s Libraries used by Trilinos Optional:

BLAS, LAPACK, MPI, MKL, yaml-cpp, Peano, CUDA, CUSPARSE, Thrust, Cusp, TBB, Pthread, HWLOC, BinUtils, ARPREC, QD, MPI, Boost, Scotch, OVIS, gpcd, METIS, ParMETIS, LibTopoMap, PaToH, CppUnit, ADOLC, ADIC, TVMET, MF, ExodusII, Nemesis, XDMF, Netcdf, y12m, SuperLUDist, SuperLUMT, SuperLU, Zlib, UMFPACK, MA28, AMD, Csparse, HYPRE, PETSC, BLACS, SCALAPACK, MUMPS, PARDISO_MKL, PARDISO, Oski, TAUCS, ForUQTK, Dakota, HIPS, HDF5, MATLAB, CASK, SPARSKIT, Qt, gtest, OpenNURBS, Portals, CrayPortals, Gemini, InfiniBand, BGPDCMF, Pablo, HPCToolkit, Pnetcdf, Clp, GLPK, qpOASES, Matio, PAPI, Eigen Trilinos Integration into an Application

Where to start? http://trilinos.sandia.gov

61 “How do I…?”

. Build Trilinos?

. Build my application with Trilinos?

. Learn about common Trilinos programming idioms?

. Download / find an installation of Trilinos?

. Find documentation and help?

62 Compiling Trilinos

Fetch from trilinos.sandia.gov

$ ls trilinos-11.2.3-Source $ mkdir build $ cd build $ cmake ../trilinos-11.2.3-Source/ […] ($ ctest) $ make install Building your app with Trilinos

If you are using Makefiles: If you are using CMake: Makefile.export system . CMake FIND_PACKAGE

6 4 Using CMake to build with Trilinos

. CMake: Cross-platform build system – Similar function as the GNU Autotools . Trilinos uses CMake to build . You don’t have to use CMake to build with Trilinos . But if you do: – FIND_PACKAGE(Trilinos …) – Example CMake script in hands-on demo . I find this much easier than hand-writing Makefiles

65 Why build systems are important . Trilinos has LOTS of packages . As package dependencies (especially optional ones) are introduced, more maintenance is required by the top-level packages:

NOX ML Amesos SuperLU

Ifpack New Library New Library

Epetra

EpetraExt

Direct Dependencies Indirect Dependencies NOX either must: • Account for the new libraries in it’s configure script (unscalable) • Depend on direct dependent packages to supply them through export Makefiles CMake in Action

CMAKE_MINIMUM_REQUIRED(VERSION 2.8.8) PROJECT(MyProject CXX)

FIND_PACKAGE(Trilinos REQUIRED) INCLUDE_DIRECTORIES(SYSTEM $ {Trilinos_INCLUDE_DIRS})

SET(MY_EXECUTABLE "myexec")

ADD_EXECUTABLE(${MY_EXECUTABLE} "main.cpp") TARGET_LINK_LIBRARIES(${MY_EXECUTABLE} ${Trilinos_LIBRARIES}) TriBITS: Trilinos/Tribal Build, Integrate, Test System

. Based on Kitware open-source toolset CMake, CTest, and Cdash developed during the adoption by Trilinos but later extended for use in CASL-related VERA, SCALE and other projects. . Built-in CMake-based support for partitioning a project into ‘Packages’ with carefully regulated dependencies with numerous features including: – Automatic enabling of upstream and downstream packages (critical for large projects like Trilinos, SCALE, and CASL) – Integrated MPI and CUDA support – Integrated TPL support (coordinate common TPLs across unrelated packages, common behavior for user configuration, etc.) – Removal of a lot of boiler-plateCMake code for creating libraries, executables, copying files, etc. … . Powerful TRIBITS_ADD_[ADVANCED]_TEST(…) CMake wrapper functions to create advanced test . Integrated support for add-on ‘Repositories’ with add-on ‘Packages’ . TribitsCTestDriverCore.cmake testing driver: – Partitioned package-by-package output to CDash and reporting on a package-by-package basis – Failed packages don’t propagate errors to downstream packages – Integrated coverage and memory testing (showing up on CDash) – Nightly and continuous integration (CI) test driver. . Pre-push synchronous CI testing with the Python checkin-test.py script . Automated almost continuous integration support using checkin-test.py script . Plus: TribitsDashboardDriver system, download-cmake.py and numerous other tools TriBITS: Meta Project, Repository, Packages

Package1 Package2

Package3 Package4 Package7 Package8

Package10 Package5 Package6 Package9

RepoA / ProjectA RepoB RepoC

ProjectD ProjectC

Current state of TriBITS Future changes/additions to TriBITS . Flexible aggregation of Packages from • Combining concepts of TPLs and different Repositories into meta Packages to allow flexible configuration Projects and building . TriBITS directory can be snapshoted • TribitsExampleProject out of Trilinos into stand-alone projects • Trilinos-independent TriBITS (independent of Trilinos) documentation . Being used by CASL VERA software, • Provide open access to and several other CASL-related TribitsExampleProject and therefore software packages TriBITS . egdist: Managing multiple repositories Continuous Integration with CDash Trilinos Availability / Information . Trilinos and related packages are available via LGPL or BSD. . Current release (11.2) is “click release”. Unlimited availability.

. Trilinos Awards: – 2004 R&D 100 Award. – SC2004 HPC Software Challenge Award. – Sandia Team Employee Recognition Award. – Lockheed-Martin Nova Award Nominee.

. More information: – http://trilinos.sandia.gov (soon trilinos.org).

. Annual Forums: – Annual Trilinos User Group Meeting in November @ SNL ● talks and video available for download – Spring Developer Meeting, May @ SNL – European meeting. – SC’XY Linear Algebra Tutorial (with Dongarra, Demmel, Kurzcak).

71 More detailed package info 73

Epetra Package Linear Algebra Package http://trilinos.sandia.gov/packages/epetra/ Typical Flow of Epetra Object Construction

• Any number of Comm objects can exist. Construct Comm • Comms can be nested (e.g., serial within MPI).

• Maps describe parallel layout. Construct Map • Maps typically associated with more than one comp object. • Two maps (source and target) define an export/import object.

Construct x Construct b Construct A • Computational objects. • Compatibility assured via common map. A Simple Epetra/AztecOO Program

// Header files omitted... // ***** Create x and b vectors ***** int main(int argc, char *argv[]) { Epetra_Vector x(Map); // Initialize MPI, MpiComm Epetra_Vector b(Map); MPI_Init(&argc,&argv); // Fill RHS with random #s Epetra_MpiComm Comm( MPI_COMM_WORLD ); b.Random();

// *** Map puts same number of equations on each pe *** // ***** Create Linear Problem ***** int NumMyElements = 1000 ; Epetra_LinearProblem problem(&A, &x, &b); Epetra_Map Map(-1, NumMyElements, 0, Comm); int NumGlobalElements = Map.NumGlobalElements(); // *** Create/define AztecOO instance, solve ** AztecOO solver(problem); solver.SetAztecOption(AZ_precond, AZ_Jacobi); // ***** Create an Epetra_Matrix tridiag(-1,2,-1) ***** solver.Iterate(1000, 1.0E-8);

Epetra_CrsMatrix A(Copy, Map, 3); double negOne = -1.0; double posTwo = 2.0; // ***** Report results, finish ****************** for (int i=0; i