<<

Nektar++: enhancing the capability and application of high-fidelity spectral/hp element methods

David Moxey1, Chris D. Cantwell2, Yan Bao3, Andrea Cassinelli2, Giacomo Castiglioni2, Sehun Chun4, Emilia Juda2, Ehsan Kazemi4, Kilian Lackhove6, Julian Marcon2, Gianmarco Mengaldo7, Douglas Serson2, Michael Turner2, Hui Xu5,2, Joaquim Peiró2, Robert M. Kirby8, Spencer J. Sherwin2

Abstract

Nektar++ is an open-source framework that provides a flexible, high-performance and scalable platform for the development of solvers for partial differential equations using the high-order spectral/hp element method. In particular, Nektar++ aims to overcome the complex implementation challenges that are often associated with high-order methods, thereby allowing them to be more readily used in a wide range of application areas. In this paper, we present the algorithmic, implementation and application developments associated with our Nektar++ version 5.0 release. We describe some of the key software and performance developments, including our strategies on parallel I/O, in situ processing, the use of collective operations for exploiting current and emerging hardware, and interfaces to enable multi-solver coupling. Furthermore, we provide details on a newly developed Python interface that enables a more rapid introduction for new users unfamiliar with spectral/hp element methods, C++ and/or Nektar++. This release also incorporates a number of numerical method developments, in particular: the method of moving frames (MMF), which provides an additional approach for the simulation of equations on embedded curvilinear manifolds and domains; a means of handling spatially variable polynomial order; and a novel technique for quasi-3D simulations (which combine a 2D spectral element discretisation with a 1D Fourier expansion) that permits spatially varying perturbations to the geometry in the homogeneous direction. Finally, we demonstrate the new application-level features provided in this release, namely: a facility for generating high-order curvilinear meshes called NekMesh; a new AcousticSolver for aeroacoustic problems; and a ‘thick’ strip model for the modelling of fluid-structure interaction (FSI) problems in the context of vortex-induced vibrations (VIV). We conclude by commenting on some lessons learned and by discussing some directions for future code development and expansion.

PROGRAM SUMMARY
Manuscript Title: Nektar++: enhancing the capability and application of high-fidelity spectral/hp element methods
Authors: D. Moxey, C. D. Cantwell, Y. Bao, A. Cassinelli, G. Castiglioni, S. Chun, E. Juda, E. Kazemi, K. Lackhove, J. Marcon, G. Mengaldo, D. Serson, M. Turner, H. Xu, J. Peiró, R. M. Kirby, S. J. Sherwin
Program Title: Nektar++
Journal Reference:
Catalogue identifier:
Licensing provisions: MIT
Programming language: C++
Computer: Any PC workstation or cluster
Operating system: Linux/UNIX, macOS, Microsoft Windows
RAM: 512 MB+
Number of processors used: 1-1024+
Supplementary material:

Keywords: spectral/hp element method, computational fluid dynamics
Classification: 12 Gases and Fluids
External routines/libraries: Boost, METIS, FFTW, MPI, Scotch, PETSc, TinyXML, HDF5, OpenCASCADE, CWIPI
Nature of problem: The Nektar++ framework is designed to enable the discretisation and solution of time-independent or time-dependent partial differential equations.
Solution method: spectral/hp element method
Reasons for the new version:*
Summary of revisions:*
Restrictions:
Unusual features:
Additional comments:
Running time: The tests provided take a few minutes to run. Runtime in general depends on mesh size and total integration time.

Keywords: Nektar++, spectral/hp element methods, high-order finite element methods

Preprint submitted to Computer Physics Communications 28th November 2019 1. Introduction combined with open-source efforts to increase their prevalence, mean that simulations across extremely complex geometries are High-order finite element methods are becoming increasingly now possible. popular in both academia and industry, as scientists and tech- Nektar++ is a project initiated in 2005 (the first commit to nological innovators strive to increase the fidelity and accuracy an online repository was made on 4th May 2006), with the aim of their simulations whilst retaining computational efficiency. of facilitating the development of high-fidelity computationally- The spectral/hp element method in particular, which combines efficient, and scalable spectral element solvers, thereby helping the geometric flexibility of classical low-order finite element close the gap between application-domain experts (or users), methods with the attractive convergence properties of spectral and numerical-method experts (or developers). Along with Nek- discretisations, can yield a number of advantages in this re- tar++, other packages that implement such high-order methods gard. From a numerical analysis perspective, their diffusion have been developed in the past several years. Nek5000, de- and dispersion characteristics mean that they are ideally suited veloped at Argonne National Laboratory, implements CG discret- to applications such as fluid dynamics, where flow structures izations mainly for solving incompressible and low-Mach num- must be convected across long time- and length-scales without ber flow problems on hexahedral meshes using the classical spec- suffering from artificial dissipation [1, 2, 3, 4, 5] High-order tral element method of collocated Lagrange interpolants [14]. methods are also less computationally costly than traditional Semtex [15, 16] is a fluid dynamics code that also uses the low-order numerical schemes for a given number of degrees of classical spectral element method in two dimensions, with the freedom, owing to their ability to exploit a more locally com- use of a 1D pseudospectral Fourier expansion for three dimen- pact and dense elemental operators when compared to sparse sional problems that incorporate a homogeneous component of low-order discretisations [6, 7, 8]. In addition, high-order meth- geometry. Nektar++ also supports this joint discretisation in ods in various formulations can be seen to encapsulate other Cartesian coordinates; however Semtex also includes support existing methods, such as finite volume, finite difference (e.g. for cylindrical coordinate systems, where the Fourier modes are summation-by-parts finite difference [9]), finite element, and used in the azimuthal direction, which broadens the range of flux reconstruction approaches [10, 11]. All of these features geometries that can be considered in this setting. deal.II [17] is a make the spectral/hp element method an extremely attractive more generic finite element framework, which likewise restricts tool to practitioners. its element choices to quadrilaterals and hexahedra, but permits As the name suggests, the spectral/hp element method relies the solution of a wide array of problems, alongside the use of on the tesselation of a computational domain into a collection of adaptive mesh refinement. 
Flexi [18] and its spinoff Fluxo [19], elements of arbitrary size that form a mesh, where each element developed at the University of Stuttgart and at the University of is equipped with a polynomial expansion of arbitrary and po- Cologne, implement discontinuous Galerkin methods for flow tentially spatially-variable order [12]. Within this definition, we problems on hexahedral meshes. GNuME, and its NUMO and include continuous Galerkin (CG) and discontinuous Galerkin NUMA components developed at the Naval Postgraduate School, (DG) methods, along with their variants. High-order methods implement both continuous and discontinuous Galerkin methods have been historically seen as complex to implement, and their mainly for weather and climate prediction purposes [20, 21]. adoption has been consequently limited to academic groups PyFR [22], developed at Imperial College London, implements and numerical analysts. This mantra is rapidly being removed the flux reconstruction approach [23] which shares various nu- thanks to the development of open-source numerical libraries merical properties with DG in particular [10, 24]. DUNE im- that facilitate the implementation of new high-fidelity solvers plements a DG formulation, among a wide variety of other for the solution of problems arising across a broad spectrum of numerical methods such as finite difference and finite volume research areas in engineering, biomedicine, numerical weather methods [25]. and climate prediction, and economics. An additional challenge Nektar++ is a continuation of an earlier code Nektar, itself in the use of high-order methods, particularly for problems in- developed at Brown University originally using the C program- volving complex geometries, is the generation of a curvilinear ming language, with some parts extended to use C++. Nek- mesh that conforms to the underlying curves and surfaces. How- tar++ is instead written using the C++ language, and greatly ever, advances in curvilinear mesh generation (such as [13]), exploits its object-oriented capabilities. The aim of Nektar++ is to encapsulate many of the high-order discretisation strategies 1College of Engineering, Mathematics and Physical Sciences, University of mentioned previously, in a readily accessible framework. The Exeter, United Kingdom current release includes both CG and DG methods and, arguably, 2Department of Aeronautics, Imperial College London, United Kingdom its distinguishing feature is its inherent support for complex 3Department of Civil Engineering, Shanghai Jiao Tong University, Shanghai, geometries through various unstructured high-order element China 4Underwood International College, Yonsei University, South Korea types; namely hexahedra, tetrahedra, prisms and pyramids for 5School of Aeronautics and Astronautics, Shanghai Jiao Tong University, three-dimensional problems, and quadrilaterals and triangles for Shanghai, China two-dimensional problems. Both CG and DG can be used on 6Department of Energy and Power Plant Technology, Technische Universität meshes that contain different element shapes (also referred to as Darmstadt, Germany hybrid meshes), and allow for curvilinear element boundaries 7Division of Engineering and Applied Science. California Institute of Tech- nology in proximity of curved geometrical shapes. Along with these 8Scientific Computing and Imaging Institute, University of Utah, USA spatial discretizations, Nektar++ supports so-called quasi-3D 2 simulations in a manner similar to Semtex, where a geomet- Contributors. 
Nektar++ has been developed across more than rically complex 2D spectral element mesh is combined with a a decade, and we would like to explicitly acknowledge the many classical 1D Fourier expansion in a third, geometrically homo- people who have made contributions to the specific application geneous, direction. This mixed formulation can significantly codes distributed with the libraries. In addition to the coauthors enhance computational efficiency for problems of the appropri- of the previous publication [33] we would like to explicitly thank ate geometry [15] and Nektar++ supports a number of differ- the following for their contributions: ent parallelisation strategies for this approach [26]. The time • Dr. Rheeda Ali (Department of Biomedical Engineering, discretization is achieved using a general linear method for- Johns Hopkins University, USA) and Dr. Caroline Roney mulation for the encapsulation of implicit, explicit and mixed (Biomedical Engineering Department, King’s College Lon- implicit-explicit timestepping schemes [27]. While the main don, UK) for their work on the cardiac electrophysiology purpose of the library is to create an environment under which solver; users can develop novel solvers for the applications of their interest, Nektar++ already includes fully-fledged solvers for • Dr. Michael Barbour and Dr. Kurt Sansom (Department of the solution of several common systems, including fluid flows Mechnical Engineering, University of Washington, USA) governed either by the incompressible or compressible Navier- for developments related to biological flows; Stokes and Euler equations; advection-diffusion-reaction prob- lems, including on a manifold, with specific applications to • Mr. Filipe Buscariolo (Department of Aeronautics, Imperial cardiac electrophysiology [28]. One of the main shortcomings College London, UK) for contributions to the incompress- of the spectral/hp element method is related to a perceived lack ible Navier-Stokes solver; of robustness, arising from low dissipative properties, which • Dr. Jeremy Cohen (Department of Aeronautics, Imperial can be a significant challenge for industrial applications. Nek- College London, UK) for work relating to cloud deploy- tar++ implements several techniques to address this problem, ment and the Nekkloud interface; namely efficient dealiasing techniques [29, 30] and spectral van- ishing viscosity [31], that have proved invaluable for particularly • Mr. Ryan Denny (Department of Aeronautics, Imperial challenging applications [32]. College London, UK) for enhancements in the 1D arterial The scope of this review is to highlight the substantial number pulse wave model; of new developments in Nektar++ since the last publication related to the software, released in 2015 and coinciding with • Mr. Jan Eichstädt (Department of Aeronautics, Imperial the version 4 release of Nektar++ [33]. Since this release, over College London, UK) for initial investigations towards 7,000 commits have been added to the main code for the version using many-core and GPU platforms; 5 documented here, with a key focus on expanding the capab- • Dr. Stanisław Gepner (Faculty of Power and Aeronautical ility of the code to provide efficient high-fidelity simulations Engineering, Warsaw University of Technology, Poland) for for challenging problems in various scientific and engineering enhancements in the Navier-Stokes linear stability solver; fields. To this end, the paper is organized as follows. 
After a brief review of the formulation in Section 2, in Section 3 we • Mr. Dav de St. Germain (SCI Institute, University of Utah, present our software and performance developments. This in- USA) for enhancements of timestepping schemes; cludes our strategies on parallel I/O; in situ processing; the use • Mr. Ashok Jallepalli (SCI Institute, University of Utah, of collective linear algebra operations for exploiting current and USA) for initial efforts on the integration of SIAC filters emerging hardware; and interfaces for multi-solver coupling to into post-processing routines; enable multi-physics simulations using Nektar++. Furthermore, we provide details on our new Python interfaces that enable • Prof. Dr. Johannes Janicka (Department of Energy and more rapid on-boarding of new users unfamiliar with either Power Plant Technology), Technische Universität Darm- spectral/hp element methods, C++ or Nektar++. In Section 4, stadt, Germany), for support and development of the acous- we then present recent numerical method developments – in tic solver and solver coupling; particular, the method of moving frames (MMF); recently added support for spatially-variable polynomial order for p-adaptive • Mr. Edward Laughton (College of Engineering, Mathem- simulations; and new ways of incorporating global mappings atics and Physical Sciences, University of Exeter, UK) for to enable spatially variable quasi-3D approaches. In Section 5, testing enhancements and initial efforts on non-conformal we then demonstrate some of the new features provided in our grids; new release, namely: our new facility for generating high-order • Dr. Rodrigo Moura (Divisão de Engenharia Aeronáutica, (curvilinear) meshes called NekMesh; a new AcousticSolver Instituto Tecnológico de Aeronáutica, Brasil) for numer- for aeroacoustic problems; and our development of a ‘thick’ ical developments related to spectral vanishing viscosity strip model for enabling the solution of fluid-structure interac- stabilisation; tion (FSI) problems in the context of vortex-induced vibrations (VIV). We conclude in Section 7 by commenting on some les- • Dr. Rupert Nash and Dr. Michael Bareford (EPCC, Uni- sons learned and by discussing some directions for future code versity of Edinburgh, UK) for their work on parallel I/O; development and expansion. and 3 • Mr. Zhenguo Yan and Mr. Yu Pan (Department of Aero-

nautics, Imperial College London, UK) for development of the compressible flow solver;

2. Methods

In this first section, we outline the mathematical framework ω that underpins Nektar++, as originally presented in [33, 34]. η2 T ξ2 Nektar++ supports a variety of spatial discretisation choices, ω−1 primarily based around the continuous and discontinuous Galer- T kin methods (CG and DG). However, in the majority of cases η1 ξ1 CG and DG use the same generic numerical infrastructure. Here collapsed coordinates reference coordinates we therefore present a brief overview and refer the reader to [12] η ∈ [−1, 1]2 ξ ∈ T for further details, which contains a comprehensive description of the method and its corresponding implementation choices. In the text below we also highlight appropriate chapters and Figure 1: Coordinate systems and mappings between collapsed coordinates η, sections from [12] for the material being discussed. reference coordinates ξ and Cartesian coordinates x for a high-order triangular The broad goal of Nektar++ is to provide a framework for element Ωe. the numerical solution of partial differential equations (PDEs) L of the form u = 0 on a domain Ω, which may be geometrically Following the Galerkin approach, we select U = V. We note complex, for some solution u. Practically, in order to carry out that where problems involve Dirichlet boundary conditions on a a spatial discretisation of the PDE problem, Ω needs to be par- boundary ∂ΩD ⊂ Ω of the form u|∂ΩD (x) = gD(x), we typically titioned into a finite number of d-dimensional non-overlapping enforce this by lifting g as described in [12, §2.2.3 and §4.2.4]. ≤ ≤ D elements Ωe, where in Nektar++ we support 1 d 3, such For illustrative purposes, we assume that L is linear and its that the collection of all elements recovers the original region corresponding weak form can be expressed as: find u ∈ U such (Ω = S Ω ) and for e e , Ω ∩ Ω = ∂Ω is an empty set e 1 , 2 e1 e2 e1e2 that or an interface of dimension d¯ < d. The domain may be embed- ded in a space of equal or higher dimension, dˆ ≥ d, as described a(u, v) = `(v) ∀v ∈ U, (1) in [28]. One of the distinguishing features of Nektar++ is that it supports a wide variety of elemental shapes: namely segments where a(·, ·) is a bilinear form and `(·) is a linear form. in one dimension [12, §2]; triangles and quadrilaterals in two To numerically solve the problem given in Equation 1 with dimensions, and; tetrahedra, pyramids, prisms and hexahedra in the spatial partition of Ω, we consider solutions in an appropriate three dimensions [12, §3 and §4]. This makes it broadly suit- finite-dimensional subspace UN ⊂ U. In a high-order setting, able for the solution of problems in complex domains, in which these spaces correspond to the choice of spatial discretisation on hybrid meshes of multiple elements are generally required. the mesh. For example, in the discontinuous setting we select Nektar++ supports the solution of PDE systems that are either 2 steady-state or time-dependent; in the case of time-dependent UN = {u ∈ L (Ω) | u|Ωe ∈ PP(Ωe)}, cases, there is subsequently a choice to use either explicit, impli- cit or implicit-explicit timestepping. From an implementation where PP(Ωe) represents the space of polynomials on Ωe up to and formulation perspective, steady-state and implicit-type prob- total degree P, so that functions are permitted to be discontinu- lems typically require the efficient solution of a system of linear ous across elemental boundaries. 
In the continuous setting, we 0 equations, whereas explicit-type problems rely on the evaluation select the same space intersected with C (Ω), so that expansions of the spatially discretised mathematical operators. In the fol- are continuous across elemental boundaries. The solution to the δ lowing sections, we briefly outline the support in Nektar++ for weak form in Equation (1) can then be cast as: find u ∈ UN these regimes. such that a(uδ, vδ) = `(vδ) ∀vδ ∈ U (2) 2.1. Implicit-type methods N In this approach we follow the standard finite element deriv- Assuming that the solution can be represented as uδ(x) = P ation as described in [12, §2.2], so that before establishing the n uˆnΦn(x), a weighted sum of N trial functions Φn(x) ∈ UN spatial discretisation, the abstract operator form Lu = 0 is formu- defined on Ω [12, §2.1], the problem then becomes that of find- lated in the weak sense alongside appropriate test and trial spaces ing the coefficients uˆn, which in the Galerkin approach translates V and U. In general, we require at least a first-order derivative into the solution of a system of linear equations. 1 1 To establish the global basis Φ(Ω) = {Φ (x),..., Φ (x)}, we and select, for example, V = H0 (Ω):= {v ∈ H (Ω) | v(∂Ω) = 0}, 1 N where first consider the contributions from each element in the do- main. To simplify the definition of basis functions on each H1(Ω):= {v ∈ L2(Ω) | Dαu ∈ L2(Ω) ∀ |α| ≤ 1}. element, we follow the standard approach described in [12, 4 §2.3.1.2 and §4.1.3] where Ωe is mapped from a standard refer- by d ence space E ⊂ [−1, 1] by a parametric mapping χe : E → Ωe,  1−ξ so that x = χ (ξ). Here, E is one of the supported region shapes p = 0, e  2   1−ξ   1+ξ  1,1 in Table 1 and ξ are d-dimensional coordinates representing pos- φp(ξ) = P (ξ) 0 < p < P,  2 2 p−1 itions in a reference element, distinguishing them from x which  1+ξ p = P, are dˆ-dimensional coordinates in the Cartesian coordinate space. 2 On triangular, tetrahedral, prismatic and pyramid elements, one which supports boundary-interior decomposition and therefore or more of the coordinate directions of a quadrilateral or hexa- improves numerical efficiency when solving the globally as- hedral region are collapsed to form the appropriate reference sembled system. Equivalently, φp(ξ) could also be defined by shape, creating one or more singular vertices within these re- the Lagrange polynomials through the Gauss-Lobatto-Legendre gions [35, 36]. Operations, such as calculating derivatives, map quadrature points which would lead to a traditional spectral the tensor-product coordinate system to these shapes through element method [12, §2.3.4]. In higher dimensions, a tensor Duffy transformations [37] — for example, ωT : T → Q maps product of either basis can be used on quadrilateral and hexa- the triangular region T to the quadrilateral region Q — to allow hedral elements respectively. On the other hand, the use of a these methods to be well-defined. The relationship between collapsed coordinate system also permits the use of a tensor these coordinates is depicted in Figure 1. Note that the singu- product modal basis for the triangle, tetrahedron, prism and pyr- −1 larity in the inverse mapping ωT does not affect convergence amid, which when combined with tensor contraction techniques order and can be mitigated in practice by adopting an alternative can yield performance improvements. 
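For reference, the one-dimensional modified basis referred to above can be written out explicitly (following [12, §3.2.3]) as

\[
\phi_p(\xi) =
\begin{cases}
\dfrac{1-\xi}{2}, & p = 0,\\[4pt]
\left(\dfrac{1-\xi}{2}\right)\left(\dfrac{1+\xi}{2}\right) P^{1,1}_{p-1}(\xi), & 0 < p < P,\\[4pt]
\dfrac{1+\xi}{2}, & p = P,
\end{cases}
\]

where \(P^{1,1}_{p-1}\) denotes a Jacobi polynomial. Only \(\phi_0\) is non-zero at \(\xi=-1\) and only \(\phi_P\) at \(\xi=+1\), which is the boundary-interior decomposition that makes the globally assembled system efficient to solve.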
This aspect is considered choice of quadrature, such as Gauss-Radau points, in order to further in Section 3.3 and [40, 41]. omit the collapsed vertices [12, §4.1.1]. Elemental contributions to the solution may be assembled

The mapping χe need not necessarily exhibit a constant Jac- to form a global solution through an assembly operator [12, obian, so that the resulting element is deformed as shown in §2.3.1 and §4.2.1]. In a continuous Galerkin setting, the as- Figure 1. Nektar++ represents the curvature of these elements sembly operator sums contributions from neighbouring elements 0 by taking a sub- or iso-parametric mapping for χe, so that the to enforce the C -continuity requirement. In a discontinuous curvature is defined using at least a subset of the basis functions Galerkin formulation, such mappings transfer flux values from used to represent the solution field [12, §4.3.5]. The ability to the element interfaces into the global solution vector. For elliptic use such elements in high-order simulations is critical in the sim- operators, Nektar++ has a wide range of implementation choices ulation of complex geometries, as without curvilinear elements, available to improve computational performance. A common one could not accurately represent the underlying curves and sur- choice is the use of a (possibly multi-level) static condensa- faces of the geometry, as demonstrated in [38]. The generation tion of the assembled system [12, §4.1.6 and §4.2.3], where a of meshes involving curved elements is, however, a challenging global system is formed only on elemental boundary degrees problem. Our efforts to generate such meshes, as well as to adapt of freedom. This is supported both for the classical continuous linear meshes for high-order simulations, are implemented in framework, as well as in the DG method. In the latter, this the NekMesh generator tool described in Section 5.1, as well as gives rise to the hybridizable discontinuous Galerkin (HDG) a number of recent publications (e.g. [13, 39]). approach [42], in which a global system is solved on the trace or skeleton of the overall mesh. With the mapping χe and the transformation ωT the discrete δ approximation u to the solution u on a single element Ωe can then be expressed as 2.2. Explicit-type methods Nektar++ has extensive support for the solution of problems δ X  −1  u (x) = uˆnφn χe (x) in a time-explicit manner, which requires the evaluation of dis- n cretised spatial operators alongside projection into the appropri- ate space. As the construction of the implicit operators requires these same operator evaluations, most of the formulation pre- where φ form a basis of P (E); i.e. a local polynomial n P viously discussed directly translates to this approach. We do basis need only be constructed on each reference element. A note however that there is a particular focus on the discontinous one-dimensional order-P basis is a set of polynomials φ (ξ), p as shown in [12, §6.2] for multi-dimensional 0 ≤ p ≤ P, defined on the reference segment, S. The choice of hyperbolic systems of the form basis is usually made based on its mathematical or numerical properties and may be modal or nodal in nature [12, §2.3]. For du + ∇ · F(u) = G(u). two- and three-dimensional regions, a tensorial basis may be dt employed, where the polynomial space is constructed as the tensor-product of one-dimensional bases on segments, quadrilat- This includes the acoustic perturbation equations that we discuss erals or hexahedral regions. In spectral/hp element methods, a in Section 5.2 and the compressible Navier-Stokes system used common choice is to use a modified hierarchical Legendre basis for aerodynamics simulations in Section 5.4. 
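Restating the two displayed relations compactly: on each element the discrete solution is expanded as

\[
u^{\delta}(x)\big|_{\Omega_e} = \sum_{n} \hat{u}_n\, \phi_n\!\left(\chi_e^{-1}(x)\right),
\]

and the explicit-in-time solvers of Section 2.2 target hyperbolic systems of the form

\[
\frac{\mathrm{d}u}{\mathrm{d}t} + \nabla\cdot F(u) = G(u),
\]

of which the acoustic perturbation equations (Section 5.2) and the compressible Navier-Stokes equations (Section 5.4) are particular instances.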
In this setting, on a (a ‘bubble’-like polynomial basis sometimes referred to as the single element, and further assuming G is zero for simplicity of ‘modal basis’), given in [12, §3.2.3] as a function of one variable presentation as in [12, §6.2.2], we multiply the above equation 5 Name Class Domain definition
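Multiplying by a test function \(v\) and integrating by parts on a single element (taking \(G = 0\)) gives the elemental statement

\[
\int_{\Omega_e} v\,\frac{\mathrm{d}u}{\mathrm{d}t}\,\mathrm{d}x
\;+\; \int_{\partial\Omega_e} v\,\tilde{f}_e(u^{-},u^{+})\cdot n\,\mathrm{d}s
\;-\; \int_{\Omega_e} \nabla v\cdot F(u)\,\mathrm{d}x \;=\; 0,
\]

where \(\tilde{f}_e(u^{-},u^{+})\) is the numerically calculated boundary flux discussed below.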

Name            Class      Domain definition
Segment         StdSeg     S = {ξ1 ∈ [−1, 1]}
Quadrilateral   StdQuad    Q = {ξ ∈ [−1, 1]^2}
Triangle        StdTri     T = {ξ ∈ [−1, 1]^2 | ξ1 + ξ2 ≤ 0}
Hexahedron      StdHex     H = {ξ ∈ [−1, 1]^3}
Prism           StdPrism   R = {ξ ∈ [−1, 1]^3 | ξ1 ≤ 1, ξ2 + ξ3 ≤ 0}
Pyramid         StdPyr     P = {ξ ∈ [−1, 1]^3 | ξ1 + ξ3 ≤ 0, ξ2 + ξ3 ≤ 0}
Tetrahedron     StdTet     A = {ξ ∈ [−1, 1]^3 | ξ1 + ξ2 + ξ3 ≤ −1}

Table 1: List of supported elemental reference regions. by an appropriate test function v ∈ U and integrate by parts to • LocalRegions: physical regions in the domain, compos- obtain ing a reference region E with a map χe, extensions of core Z du Z Z operations onto these regions; v dx + v f˜e(u−, u+) · nds − ∇v · F(u) dx = 0. dt Ωe ∂Ωe Ωe • MultiRegions: list of physical regions comprising Ω, In the above, f˜e(u−, u+) denotes a numerically calculated bound- global assembly maps which may optionally enforce con- ary flux, depending on the element-interior velocity u− and its tinuity, construction and solution of global linear systems, neighbour’s velocity u+. The choice of such a flux is solver- extension of local core operations to domain-level opera- specific and may invovle an upwinding approach or use of an tions; and appropriate . Where second-order diffusive terms • SolverUtils: building blocks for developing complete are required, Nektar++ supports the use of a local discontin- solvers. ous Galerkin (LDG) approach to minimize the stencil required for communication (see [43] and [12, §7.5.2]). From a solver In version 5.0, four additional libraries have been included. Each perspective, the implementation of the above is fairly generic, of these can be seen as a non-core, in the sense that they provide requiring only the evaluation of the flux term f˜, conservation additional functionality to the core libraries above: law F(u) and right-hand side source terms G(u). • Collections: encapsulates the implementation of key 2.3. Recap of Nektar++ implementation kernels (such as inner products and transforms) with an In this section, we briefly outline the implementation of these emphasis on evaluating operators collectively for similar methods inside Nektar++. Further details on the overall design elements; of Nektar++, as well as examples of how to use it, can be found in the previous publication [33]. • GlobalMapping: implements a mapping technique that The core of Nektar++ comprises six libraries which are de- allows quasi-3D simulations (i.e. those using a hybrid signed to emulate the general mathematical formulation ex- 2D spectral element/1D Fourier spectral discretisation) to pressed above. They describe the problem in a hierarchical define spatially-inhomogeneous deformations; manner, by working from elemental basis functions and shapes • NekMeshUtils: contains interfaces for CAD engines and through to a C++ representation of a high-order field and com- key mesh generation routines, to be used by the NekMesh plete systems of partial differential equations on a complex com- mesh generator; and putational domain. Depending on the specific application, this then allows developers to choose an appropriate level for their • FieldUtils: defines various post-processing modules that needs, balancing ease of use at the highest level with fine-level can be used both by the post-processing utility FieldCon- implementation details at the lowest. A summary of each lib- vert, as well as solvers for in-situ processing. rary’s purpose is the following: We describe the purpose of these libraries in greater detail in • LibUtilites: elemental basis functions ψ , quadrature p Sections 3.3, 4.3, 5.1 and 3.2 respectively. point distributions ξi and basic building blocks such as I/O handling; • StdRegions: reference regions E along with the defini- 3. 
Software and Performance Developments tion of key finite element operations across them, such as integration, differentiation and transformations; This section reviews the software and performance develop- ments added to Nektar++ since our last major release. We note • SpatialDomains: the geometric mappings χe and factors that a significant change from previous releases is the use of ∂χ ∂ξ , as well as Jacobians of the mappings and the construc- C++11-specific language features throughout the framework. A tion of the topology of the mesh from the input geometry; brief summary of our changes in this area include: 6 • transitioning from various data structures offered in boost large memory requirement and processing time to produce parti- to those now natively available in C++11: in particular, tioned meshes. It also imposes potentially severe restrictions on smart pointers such as shared_ptr, unordered STL con- start-up time of simulations and/or the post-processing of data, tainers and function bindings; hindering throughput for very large cases. Although various approaches have been used to partially mit- • avoid the use of typedef aliases for complex data structure igate this restriction, such as pre-partitioning the mesh before types, in deference to the use of auto where appropriate; the start of the simulation and utilising a compressed XML • similarly, where appropriate we make use of range-based format that reduces file sizes and the XML tree size, these do for loops to avoid iterator typedef usage and simplify not themselves cure the problem entirely. In the latest version syntax; of Nektar++ we address this issue with a new Hierarchical Data Format v5 (HDF5) file format. To design the layout of this file, • use of variadic templates in core memory management and we recall that the structure of a basic Nektar++ mesh requires data structures to avoid the use of syntax-dense prepro- the following storage: cessing macros at compile time. • Vertices of the mesh are defined using double-precision Alongside the many developments outlined here, the major floating point numbers for their three coordinates. Each change in the Nektar++ API resulting from this switch has vertex has a unique ID. further motivated the release of a new major version of the code. • All other elements are defined using integer IDs in a hier- The developments described in this section are primarily geared archical manner; for example in three dimensions edges are towards our continuing efforts to support our users on large-scale formed from pairs of vertices, faces from 3 or 4 edges and high-performance computing (HPC) systems. elements from a collection of faces.

3.1. Parallel I/O This hierarchical definition clearly aligns with the HDF5 data layout. To accommodate the ‘mapping’ of a given ID into a Although the core of Nektar++ has offered efficient parallel tuple of IDs or vertex coordinates, we adopt the following basic scaling for some time (as reported in previous work [33]), one structure: aspect that has been improved substantially in the latest release is support for parallel I/O, both during the setup phase of the • The mesh group contains multi-dimensional datasets that simulation and when writing checkpoints of field data for un- define the elements of a given dimension. For example, steady simulations. In both cases, we have added support for given N quadrilaterals, the quad dataset within the mesh new, parallel-friendly mesh input files and data checkpoint files group is a N ×4 dataset of integers, where each row denotes that use the HDF5 file format [44], in combination with Message the 4 integer IDs of edges that define that quadrilateral. Passing Interface (MPI) I/O, to significantly reduce bottlenecks • The maps group contains one-dimensional datasets that relating to the use of parallel filesystems. This approach enables define the IDs of each row of the corresponding two- Nektar++ to either read or write a single file across all processes, dimensional dataset inside mesh. as opposed to a file-per-rank output scheme that can place sig- nificant pressure on parallel filesystems, particularly during the An example of this structure for a simple quadrilateral mesh is partitioning phase of the simulation. Here we discuss the imple- given in Figure 2. We also define additional datasets to define mentation of the mesh input file format; details regarding the element curvature and other ancillary structures such as bound- field output can be found in [45]. ary regions. One of the key challenges identified in the use of Nektar++ When running in parallel, Nektar++ adopts a domain decom- within large-scale HPC environments is the use of an XML- position strategy, whereby the mesh is partitioned into a subset based input format used for defining the mesh topology. Al- of the whole domain for each process. This can be done either at though XML is highly convenient from the perspective of read- the start of the simulation, or prior to running it. Parallelisation ability and portability, particularly for small simulations, the is achieved using the standard MPI protocol, where each process format does pose a significant challenge at larger scales since, is independently executed and there is no explicit use of shared when running in parallel, there is no straightforward way to memory in program code. Under the new HDF5 format, we extract a part of an XML file on each process. This means that in perform a parallel partitioning strategy at startup, which runs as the initial phase of the simulation, where the mesh is partitioned follows: into smaller chunks that run on each process, there is a need • Each process is initially assigned a roughly equal number for at least one process to read the entire XML file. Even at of elements to read. This is calculated by querying the size higher orders, where meshes are typically coarse to reflect the of each elemental dataset to determine the total number additional degrees of freedom in each element, detailed simula- of elements, and then partitioned equally according to the tions of complex geometries typically require large, unstructured rank of the process and total number of processors. 
meshes of millions of high-order elements. Having only a single process read this file therefore imposes a natural limit to the • The dual graph corresponding to each process’ subdomain strong scaling of the setup phase of simulations – that is, the is then constructed. Links to other process subdomains are maximum number of processes that can be used – due to the established by using ghost nodes to those process’ nodes. 7 maps group

(a) A simple two-dimensional quadrilateral mesh consisting of a single element. (b) Directory-dataset structure that is used for storage of the topological data in the left-hand figure.

Figure 2: An example of a single quadrilateral element grid. In (a), we show the topological decomposition of the element into its 4 edges and vertices. Figure (b) shows a schematic of the filesystem-type structure implemented in HDF5 that is used for storage of this topological information.
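The layout sketched in Figure 2 can be reproduced in a few lines of Python. This is only an illustrative sketch of the described structure: h5py is not a Nektar++ dependency, the file name is arbitrary, and the real on-disk schema contains further datasets (curvature, boundary composites, domain definitions) and may nest or name the groups differently.

# Illustrative sketch (not the exact Nektar++ schema): store the single
# quadrilateral of Figure 2 using a mesh group of defining data and a
# maps group of global IDs, as described in the text above.
import h5py
import numpy as np

with h5py.File("single_quad.h5", "w") as f:
    mesh = f.create_group("mesh")
    maps = f.create_group("maps")

    # mesh datasets: vertex coordinates, edges as vertex-ID pairs, and the
    # quadrilateral as a row of 4 edge IDs (an N x 4 integer dataset).
    mesh.create_dataset("vert", data=np.array(
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]))
    mesh.create_dataset("seg", data=np.array(
        [[0, 1], [1, 2], [2, 3], [3, 0]], dtype=np.int64))
    mesh.create_dataset("quad", data=np.array([[0, 1, 2, 3]], dtype=np.int64))

    # maps datasets: the global ID of each row of the corresponding mesh
    # dataset, so that row i of mesh/quad describes element maps/quad[i].
    maps.create_dataset("vert", data=np.arange(4, dtype=np.int64))
    maps.create_dataset("seg", data=np.arange(4, dtype=np.int64))
    maps.create_dataset("quad", data=np.array([0], dtype=np.int64))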

• The dual graph is passed to the PT-Scotch library [46] to utility is capable of using this format for parallel post-processing perform partitioning in parallel on either the full system or of data. a subset of processes, depending on the size of the graph. 3.2. In-situ processing • Once the resulting graph is partitioned, the datasets are read in parallel using a top-down process: i.e. in three The increasing capabilities of high-performance computing dimensions, we read the volumes, followed by faces, edges facilities allow us to perform simulations with a large number and finally vertices. In the context of Figure 2, this would of degrees of freedom which leads to challenges in terms of consist of reading the quad dataset, followed by the seg post-processing. The first problem arises when we consider the dataset, followed by the vert dataset. storage requirements of the complete solution of these simu- lations. Tasks such as generating animations, where we need • Note that at each stage, each processor only reads the geo- to consider the solution at many time instances, may become metric entities that are required for its own partition, which infeasible if we have to store the complete fields at each time is achieved through the use of HDF5 selection routines instance. Another difficulty occurs due to the memory require- when reading the datasets. ments of loading the solution for post-processing. Although this can be alleviated by techniques such as subdividing the data • The Nektar++ geometry objects are then constructed from and processing one subdivision at a time, this is not possible for these data in a bottom-up manner: i.e. vertices, followed by some operations requiring global information, such as perform- edges, followed by faces and finally volumes, as required ing a C0-projection that involves the inversion of a global mass by each processor. matrix. In such cases, the memory requirements might force the • This concludes the construction of the linear mesh: user to perform post-processing using a number of processing curvature information is stored in separate datasets, and is nodes similar to that used for the simulation. also read at this stage as required for each element. To aid in dealing with this issue, Nektar++ now supports processing the solution in situ during the simulation. The imple- • Finally, ancillary information such as composites and do- mentation of this feature was facilitated by the modular structure main definition are read from the remaining datasets. of our post-processing tool, FieldConvert. This tool uses a pipeline of modules, passing mesh and field data between them, The new HDF5 based format is typically significantly faster to arrive at a final output file. This comprises one or more input than the existing XML format to perform the initial partitioning modules (to read mesh and field data), zero or more processing phase of the simulation. Notably, whereas execution times for modules (to manipulate the data, such as calculating derived the XML format increase with the number of nodes being used quantities or extracting boundary information), and a single (likely owing to the file that must be written for each rank by the output module (to write the data in one of a number of field root processor), the HDF5 input time remains roughly constant. and visualisation formats). 
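The pipeline structure just described can be summarised in a short sketch; the names below are hypothetical stand-ins rather than the actual FieldUtils/FieldConvert API, and are given only to make the input/processing/output flow concrete.

# Conceptual sketch of a FieldConvert-style module pipeline (hypothetical
# names, not the Nektar++ C++ API): one input module, zero or more
# processing modules and a single output module.
from typing import Callable, Dict, List

Field = Dict[str, List[float]]   # e.g. {"u": [...], "v": [...], "p": [...]}

def run_pipeline(input_module: Callable[[], Field],
                 process_modules: List[Callable[[Field], Field]],
                 output_module: Callable[[Field], None]) -> None:
    field = input_module()            # read mesh and field data
    for process in process_modules:   # e.g. compute vorticity, extract a slice
        field = process(field)
    output_module(field)              # write e.g. a .vtu file for visualisation

run_pipeline(lambda: {"u": [1.0, 2.0, 3.0]},
             [lambda f: {**f, "u_squared": [x * x for x in f["u"]]}],
             lambda f: print(sorted(f)))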
To achieve in situ processing, Field- We note that the HDF5 format also provides benefits for the Convert modules were moved to a new library (FieldUtils), post-processing of large simulation data, as the FieldConvert allowing them to be executed during the simulation as well as 8 shared with the FieldConvert utility. The actual execution of the development of spectral element methods is that operator the modules during in situ processing is performed by a new counts can be substantially reduced by using a combination subclass of the Filter class, which is called periodically after a of a tensor product basis, together with a tensor contraction prescribed number of time-steps to perform operations which do technique referred to as sum-factorisation. This technique, ex- not modify the solution field. This filter structure allows the user ploited inside of Nektar++ as well as other higher-order frame- to choose which modules should be used and to set configuration works such as deal.II [17], uses a small amount of temporary parameters. Multiple instances of the filter can be used if more storage to reduce operator counts from O(P2d) to O(Pd+1) at a than one post-processing pipeline is desired. given order P. For example, consider a polynomial interpol- There are many example applications for this new feature. ation on a quadrilateral across a tensor product of quadrature The most obvious is to generate a field or derived quantity, such points ξ = (ξ1i, ξ2 j), where the basis admits a tensor product as vorticity, as the simulation is running. An example of this is decomposition φpq(ξ) = φp(ξ1)φq(ξ2). This expansion takes the given in the supplementary materials Example A.16, in which form the vorticity is calculated every 100 timesteps whilst removing XP XQ the velocity and pressure fields to save output file space, using δ u (ξ1i, ξ2 j) = uˆ pqφp(ξ1i)φq(ξ2 j) the following FILTER configuration in the session file: p=0 q=0

1 P  Q  X X  2 vorticity.vtu   = φp(ξ1i)  uˆ pqφq(ξ2 j) . 3 100   p=0 q=0 4 5 vorticity By precomputing the bracketed term and storing it for each p 6 removefield:fieldname=u,v,p and j, we can reduce the number of floating point operations 7 4 3 8 from O(P ) to O(P ). One of the distinguishing features of Nek- tar++ is that these types of basis functions are defined not only This yields a number of parallel-format block-unstructured VTK for tensor-product quadrilaterals and hexahedra, but also un- files (the VTU format), as described in [47], that can be visu- structured elements (triangles, tetrahedra, prisms and pyramids) alised in appropriate such as Paraview [48] and subsequently through the use of a collapsed coordinate system and appropriate assembled to form an animation. Other example applications basis functions. For more details on this formulation, see [12]. include extracting slices or isocontours of the solution at several The efficient implementation of the above techniques on com- time instants for creating an animation. Since the resulting files putational hardware still poses a significant challenge for practi- are much smaller than the complete solution, there are significant tioners of higher-order methods. For example, Nektar++ was savings in terms of storage when compared to the traditional ap- originally designed using a hierarchical, inheritance-based ap- proach of obtaining checkpoints which are later post-processed. proach, where memory associated with elemental degrees of Another possibility is to perform the post-processing operations freedom is potentially scattered non-contiguously in memory. after the last time-step, but before the solver returns. This way, Although this was more appropriate at the initial time of de- it is possible to avoid the necessity of starting a new process velopment a decade ago, in modern terms this does not align which will have to load the solution again, leading to savings in with the requirements for optimal performance, in which large computing costs. blocks of memory should be transferred and as many opera- tions acted on sequentially across elements, so as to reduce 3.3. Collective linear algebra operations memory access and increase data locality and cache usage. The One of the primary motivations for the use of high-order current efforts of the development team are therefore focused methods is their ability to outperform standard linear methods on redesigns to the library to accommodate this process. In on modern computational architectures in terms of equivalent particular, since version 4.1, Nektar++ has included a library error per degree of freedom. Although the cost in terms of called Collections which is designed to provide this optimisa- floating point operations (FLOPS) of calculating these degrees tion. In the hierarchy of Nektar++ libraries, Collections sits of freedom increases with polynomial order, the dense, locally- between LocalRegions, which represent individual elements, compact structure of higher-order operators lends itself to the and MultiRegions, which represent their connection in either a current hardware environment, in which FLOPS are readily C0 or discontinuous Galerkin setting. The purpose of the library, available but memory bandwidth is highly limited. 
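The operation-count argument above is easy to verify numerically. In the sketch below the matrices B1 and B2 are filled with random values purely as stand-ins for the tabulated basis functions φ_p(ξ_1i) and φ_q(ξ_2j); the point is that the four-index contraction collapses into two successive matrix products once the bracketed term is precomputed.

# Sum-factorisation on a quadrilateral: compare the naive evaluation with
# the factorised form. B1[i, p] and B2[j, q] stand in for phi_p(xi_{1i})
# and phi_q(xi_{2j}); any 1D basis tabulated at quadrature points would do.
import numpy as np

P, Q = 7, 7                    # polynomial order in each direction
n1, n2 = P + 2, Q + 2          # quadrature points per direction
rng = np.random.default_rng(0)

B1 = rng.standard_normal((n1, P + 1))
B2 = rng.standard_normal((n2, Q + 1))
uhat = rng.standard_normal((P + 1, Q + 1))   # modal coefficients

# Naive evaluation: four nested loops, O(P^4)-type cost per element.
u_naive = np.zeros((n1, n2))
for i in range(n1):
    for j in range(n2):
        for p in range(P + 1):
            for q in range(Q + 1):
                u_naive[i, j] += uhat[p, q] * B1[i, p] * B2[j, q]

# Factorised evaluation: precompute the bracketed sum over q, then contract
# over p; two matrix products with O(P^3)-type cost per element.
tmp = uhat @ B2.T      # tmp[p, j] = sum_q uhat[p, q] * phi_q(xi_{2j})
u_fact = B1 @ tmp      # u[i, j]   = sum_p phi_p(xi_{1i}) * tmp[p, j]

assert np.allclose(u_naive, u_fact)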
In this setting, which is described fully in [40], is to facilitate optimal linear the determining factor in computational efficiency, or ability to algebra strategies for large groupings of elements that are of reach peak performance of hardware, is the arithmetic intensity the same shape and utilise the same basis. To facilitate efficient of the method; that is, the number of FLOPS performed for execution across a broad range of polynomial orders, we then each byte of data transferred over the memory bus. Algorithms consider a number of implementation strategies including: need to have high arithmetic intensity in order to fully utilise the • StdMat: where a full-rank matrix of the operator on a computing power of modern computational hardware. standard element is constructed, so that the operator can be However, the increase in FLOPS at higher polynomial orders evaluated with a single matrix-matrix multiplication; must be balanced against the desired accuracy so that execution times are not excessively high. An observation made early in • IterPerExp: where the sum-factorisation technique is eval- 9 uated using an iteration over each element, but geometric where u∗ denotes the received field, u∗∗ the filtered field and factors (e.g. ∂x/∂ξ) are amalgamated between elements; ∆λ the user specified filter width. The filter removes small and scale features a priori and thus reduces the error associated with the transform. Moreover, it does not add unwanted • SumFac: where the sum-factorisation technique is evalu- discontinuities at the element boundaries and imposes a ated across multiple elements concurrently. global smoothing, due to the continuity of the intermediate This is then combined with an autotuning strategy, run at simu- expansion. lation startup, which attempts to identify the fastest evaluation strategy depending on characteristics of the computational mesh • Step 3: A linear interpolation in time can be performed to and basis parameters. Autotuning can be enabled in any sim- overcome larger time scales of the sending application. Due ulation through the definition of an appropriate tag inside the to their identical expansion bases and orders, the resulting NEKTAR block that defines a session file: coefficients can be directly used in the original expansion of the solver. 1 A finer-grained level of control over the Collections setup and As is evident from the above strategy, sending fields to other implementation strategies is documented in the user guide. Per- solvers only requires an application to provide discrete values formance improvements using collections are most readily seen at the requested locations. In Nektar++, this can be achieved in fully-explicit codes such as the CompressibleFlowSolver by evaluating the expansions or by a simpler approximation and AcousticSolver. The vortex pair example defined in Sec- from the immediately available quadrature point values. All tion 5.2 and provided in Example A.15 demonstrates the use of processing is performed by the receiver. The complex handling the collections library. of data transfers is accomplished by the open-source CWIPI library [51], which enables coupling of multiple applications by 3.4. Solver coupling using decentralized communication. 
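Written out, the differential low-pass filter applied in Step 2 amounts to solving a modified Helmholtz problem for the filtered field (Equation (3)):

\[
u^{**} - \left(\frac{\Delta\lambda}{2\pi}\right)^{2} \nabla^{2} u^{**} = u^{*},
\qquad
\left.\frac{\partial u^{**}}{\partial x_i}\right|_{\partial\Omega} = 0,
\]

where \(u^{*}\) is the received field, \(u^{**}\) the filtered field and \(\Delta\lambda\) the user-specified filter width.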
It is based purely on MPI The Nektar++ framework was extended with a coupling inter- calls, has bindings for C, Fortran and Python, handles detection face [49] that enables sending and receiving arbitrary variable of out-of-domain points and has been shown to exhibit good fields at run time. Using such a technique, a coupling-enabled performance [52]. With only CWIPI as a dependency and a solver can exchange data with other applications to model multi- receiver-centric strategy that can be adjusted to any numerical physics problems in a co-simulation setup. Currently, two coup- setup, the implementation of compatible coupling interfaces is ling interfaces are available within Nektar++; a file-based sys- relatively straightforward. tem for testing purposes, and an MPI-based implementation An example result of a transferred field is given in Figure 3. for large-scale HPC implementations. The latter was designed For a hybrid noise simulation [49], the acoustic source term to facilitate coupling Nektar++ solvers with different software depicted at the top was computed by a proprietary, finite volume packages which use other discretization methods and operate flow solver on a high-resolution mesh (∆h < 1.4mm) and trans- on vastly different time- and length-scales. To couple two in- ferred to the Nektar++AcousticSolver, which we describe compatible discretisations, an intermediary expansion is used in Section 5.2. After sampling, receiving, filtering, projection which can serve as a projection between both sides of the field. and temporal interpolation, the extrema of the source term are Coupling is achieved by introducing an intermediate expansion, cancelled out and blurred by the spatial filter. Consequently, a which uses the same polynomial order and basis definitions as much coarser mesh (∆h = 20mm) with a fourth order expansion the parent Nektar++ solver; however, a continuous projection is sufficient for the correct representation of the resulting field, and a larger number of quadrature points than the original expan- which significantly reduces the computational cost of the sim- sion of the Nektar++ solver are used. Based on this intermediate ulation. The corresponding loss of information is well defined representation, the coupling strategy is comprised of three major by the filter width ∆λ and limited to the high-frequency range, steps: which is irrelevant for the given application. • Step 1: The field values are requested from the sending ap- plication at the intermediate expansion’s quadrature points. 3.5. Python interface Here, aliasing can be effectively avoided by an appropriate Although Nektar++ is designed to provide a modern C++ selection of quadrature order and distribution. Point values interface to high-order methods, its use of complex hierarchies that lie outside of the senders’ computational domain can of classes and inheritance, as well as the fairly complex syntax be either replaced by a default value or extrapolated from of C++ itself, can lead to a significant barrier to entry for new their available nearest neighbour. users of the code. At the same time, the use of Python in general scientific computing applications, and data science application • Step 2: The physical values at the quadrature points are areas in particular, is continuing to grow, in part due to its rel- then transformed into modal space. This is achieved by a atively simple syntax and ease of use. 
Additionally, the wider modified forward transform that involves the differential Python community offers a multitude of packages and modules low-pass filter [50]: to end users, making it an ideal language through which dispar- !2 ∆λ ∂u∗∗ ate software can be ‘glued’ to perform very complex tasks with ∗∗ − ∇2 ∗∗ ∗ u u = u , = 0 (3) relative ease. For the purposes of scientific computing codes, 2π ∂xi ∂Ω 10 ω˙ c [Pa/s] ++ 5E7 1.5E8 -5E7 0 -1.5E8 Nektar AcousticSolver ++ Array environment or Py- Nektar cient for large arrays), ++ ffi release. C to ensure that instead of storage ++ to be passed into ++ NumPy mesh) and CAA ( Nektar Nektar mm ndarray 4 5.0 Python bindings to perform a simple Galerkin . 1 ++ < h ∆ Nektar This section highlights our recent developments on numerical Modern scientific computation faces unprecedented demands import NekPy.LibUtilities import as NekPy.StdRegions LibUtil import as numpy StdReg as np nModes = 8 pType = LibUtil.PointsType.GaussLobattoLegendre bType = LibUtil.BasisType.Modified_A = ("Integral print (quad.Integral(fx))) f}".format {:.4 nPts = nModes + 1 pKey = LibUtil.PointsKey(nPts, pType) bKey = LibUtil.BasisKey(bType, nModes, pKey) quad = StdReg.StdQuadExp(bKey, bKey) proj = quad.FwdTrans(fx) fx = np.cos(x) * np.cos(y) methods contained with the 4.1. Method of moving frames on computational simulation in multidimensional complex do- # P=8 Set modes Q=P+1 and quadrature . points # Create -distributed GLL quadrature . points # Create C^0 modified basis on these . points # Create # quadrilateral in expansion each using coordinate this (tensor direction basis ). product #L^2 projection # f(x,y)= of (y) cos (x)* cos quadrilateral onto # . element the x,y and Note are evaluation numpy of () ndarrays cos is performed using . numpy # Integrate function over the . element 4. Developments in Numerical Methods universally used in Pythonfor scientific data computing storage, applications and the classes. This allows an functions directly and vice versa.the Moreover Boost.Python this interface interaction uses to copying data (which could bethis rather interaction ine uses a shareddata structures. memory space Reference counting between is thepersistence then two and used to memory ensure deallocation, data memory depending was on first allocated whether withinthon. the Listing 1: Using the projection and integral on a standard quadrilateral element. x, y = quad.GetCoords() 11 ] ++ ++ lib- 55 C C Array ++ ++ Nektar ] or SWIG [ . These factors Nektar 54 ering both a teach- ff functions and classes ers a set of high-level class, which is almost SciPy ff o ++ and the Gauss-Lobatto-Legendre and C 1 ++ + libraries, o API, any bindings must be hand- P NumPy cient and impractical, as can be ort, handwritten wrappers provide onto a standard quadrilateral ex- C Nektar ++ ff ) ffi y ( . In our particular case, heavy use of shared_ptr API. A perceived drawback of this ap- NumPy.ndarray , using ers a route to handling many of the com- cos 7 ff C Nektar ) ort has been extended to facilitate seamless x = ff ( numpy P cos API also enables the use of higher-performance = , as well as better interoperability with key Python ], which o C ). As can be seen in this example, the aim of the ) 53 y ++ 20mm mesh and fourth order expansion). Slice through a three-dimensional domain. , (1) features such as x 2 = ( API. 
Additionally, the Python bindings make full use of Boost.Python's automatic datatype conversion facilities. In particular, significant effort has been extended to facilitate seamless interaction between the Nektar++ Array class, which is almost universally used within Nektar++ for data storage, and the NumPy ndarray class used in Python scientific computing applications. This allows Nektar++ Array storage to be passed into NumPy functions and classes directly, and vice versa. Moreover, instead of copying data (which could be rather inefficient for large arrays), this interaction uses a shared memory space between the two data structures; reference counting is then used to ensure persistence and correct memory deallocation, depending on whether the memory was first allocated within C++ or Python. The interaction with the NumPy API also enables the use of higher-performance NumPy features and better interoperability with key Python software such as SciPy, so that the bindings serve both as a teaching aide for new users and as a route for connecting Nektar++ to other packages.

4. Developments in Numerical Methods

This section highlights our recent developments on numerical methods contained within the Nektar++ release.

4.1. Method of moving frames

Modern scientific computation faces unprecedented demands on computational simulation in multidimensional complex domains composed of various materials. Examples of this include solving the shallow water equations on a rotating sphere for weather prediction, incorporating biological anisotropic properties of cardiac and neural tissue for clinical diagnosis, and simulating electromagnetic wave propagation on metamaterials for controlling electromagnetic nonlinear phenomena. All of these examples require the ability to solve PDEs on manifolds embedded in higher-dimensional domains. The method of moving frames (MMF) implemented in Nektar++ is a novel numerical scheme for performing such computational simulations in geometrically-complex domains.

Moving frames, principally developed by Élie Cartan in the context of Lie groups in the early 20th century [56, 57, 58], are orthonormal vector bases that are constructed at every grid point in the discrete space of a domain Ω. Moving frames are considered as an 'independent' coordinate system at each grid point, and can be viewed as a dimensional reduction because the number of moving frames corresponds to the dimensionality, independent of the space dimension. In this sense, 'moving' does not mean that the frames are time-dependent, but that they differ in a pointwise sense: i.e. if a particle travels from one point to another, then it may undergo a different frame, which looks like a series of 'moving' frames. More recently this approach has been adapted for practical and computational purposes, mostly in computer vision [59, 60, 61] and the medical sciences [62].

Building such moving frames is easily achieved by differentiating the parametric mapping x of a domain element Ωe with respect to each coordinate axis of a standard reference space, followed by a Gram-Schmidt orthogonalization process. We obtain orthonormal vector bases, denoted as eⁱ, with the following properties:

eⁱ · eʲ = δⁱⱼ,   ‖eⁱ‖ = 1,   1 ≤ i, j ≤ 3,

where δⁱⱼ denotes the Kronecker delta. Moreover, the moving frames are constructed such that they are differentiable within each element and always lie on the tangent plane at each grid point. These two intrinsic properties of the frames imply that any vector or gradient can be expanded on the moving frames as follows:

v = v1e1 + v2e2, ∇u = u1e1 + u2e2.

Applying this expansion to a given PDE enables us to re-express it with moving frames on any curved surface. Then, the weak formulation of the PDE with moving frames, called the MMF scheme, on a curved surface is similar to the scheme in Euclidean space, in the sense that it contains no metric tensor or its derivatives and it does not require the construction of a continuous curved axis in Ω, which often produces geometric singularities. This is a direct result of the fact that moving frames are locally Euclidean. However, the numerical scheme with moving frames results in accurate solutions of PDEs on any type of surface, such as spheres, irregular surfaces, or non-convex surfaces. Some examples of simulations that can be achieved under this approach include conservation laws [63], the diffusion equation [64], the shallow water equations (SWEs) [65], and Maxwell's equations [66]. Representative results from Nektar++ for these equations on the surface of a sphere are shown in Figure 4.

Figure 4: Numerical simulation of the MMF scheme in Nektar++ for several partial differential equations solved on the sphere: (a) conservation laws; (b) diffusion equation; (c) shallow water equations; (d) Maxwell's equations.

Moreover, moving frames have been proven to be efficient for other geometrical realizations, such as the representation of anisotropic properties of media on complex domains [64], incorporating the rotational effect of an arbitrary shape [65], and adapting an isotropic upwind numerical flux to anisotropic media [66]. The accuracy of the MMF scheme with the higher-order curvilinear meshes produced by NekMesh, described in Section 5.1, is reported to be significantly improved for high p, as are conservation properties such as mass and energy after a long time integration, whereas the accuracy of the MMF-SWE scheme on NekMesh meshes is presented as the best among all previous SWE numerical schemes [67]. Ongoing research topics on moving frames are to construct the connections of frames, to compute propagational curvature, and finally to build an atlas (a geometric map with connection and curvature) in order to provide a quantitative measurement and analysis of a flow on complex geometry. Examples of ongoing research topics in this area include electrical activation in the heart [68] and fiber tracking of white matter in the brain.
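To make the construction described above more concrete, the short sketch below builds a frame at a single point of a parametrised surface by normalising the tangent vectors of the mapping and applying one Gram-Schmidt step. It is purely illustrative (plain NumPy, not the Nektar++ implementation), and the spherical parametrisation used in the example is an assumption made for demonstration purposes only.

import numpy as np

def moving_frame(dx_dxi1, dx_dxi2):
    """Build an orthonormal moving frame at one quadrature point of a
    surface element from the two tangent vectors of the parametric mapping
    x(xi1, xi2), via Gram-Schmidt (illustrative sketch only)."""
    e1 = dx_dxi1 / np.linalg.norm(dx_dxi1)
    # Remove the component of the second tangent vector along e1.
    t2 = dx_dxi2 - np.dot(dx_dxi2, e1) * e1
    e2 = t2 / np.linalg.norm(t2)
    # The third frame vector is normal to the tangent plane.
    e3 = np.cross(e1, e2)
    return e1, e2, e3

# Example: a point on the unit sphere parametrised by (theta, phi).
theta, phi = 0.7, 1.2
x = np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])
dx_dtheta = np.array([np.cos(theta) * np.cos(phi), np.cos(theta) * np.sin(phi), -np.sin(theta)])
dx_dphi = np.array([-np.sin(theta) * np.sin(phi), np.sin(theta) * np.cos(phi), 0.0])
e1, e2, e3 = moving_frame(dx_dtheta, dx_dphi)
# Both values are close to zero: the frame is orthonormal and lies in the
# tangent plane of the sphere at this point.
print(np.dot(e1, e2), np.dot(e1, x))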
4.2. Spatially-variable polynomial order

An important difficulty in the simulation of flows of practical interest is the wide range of length- and time-scales involved, especially in the presence of turbulence. This problem is aggravated by the fact that in many cases it is difficult to predict, before conducting the simulation, where in the domain an increase in the spatial resolution is required, while performing a uniform refinement across the domain is computationally prohibitive. Therefore, in dealing with these types of flows, it is advantageous to have an adaptive approach which allows us to dynamically adjust the spatial resolution of the discretisation both in time and in space.

Within the spectral/hp element framework, it is possible to refine the solution following two different routes. h-refinement consists of reducing the size of the elements, as would be done in low-order methods. This is the common approach for the initial distribution of degrees of freedom over the domain, with the computational mesh clustering more elements in regions where small scales are expected to occur, such as boundary layers. The other route is p-refinement (sometimes called p-enrichment), where the spatial resolution is increased by using a higher polynomial order to represent the solution. As discussed in [69], the polynomial order can be easily varied across elements in the spectral/hp element method if the expansion basis is chosen appropriately. In particular, if a basis admits a boundary-interior decomposition, such as the modified C0 basis described in Section 2 or the classical Lagrange interpolant basis, then the variation in polynomial order can be built into the assembly operation between interconnected elements. This allows for a simple approach to performing local refinement of the solution, requiring only the adjustment of the polynomial order in each element.

With this in mind, an adaptive polynomial order procedure has been implemented in Nektar++, with successful applications to simulations of incompressible flows. The basic idea in this approach is to adjust the polynomial order during the solution based on an element-local error indicator. The approach we used is similar to that demonstrated for shock capturing in [70], whereby oscillatory behaviour in the solution field is detected by an error sensor on each element Ωe, calculated as

S_e = ‖u_P − u_{P−1}‖₂² / ‖u_P‖₂²,

where u_P is the solution obtained for the u velocity using the current polynomial order P, u_{P−1} is the projection of this solution onto a polynomial of order P − 1, and ‖·‖₂ denotes the L² norm. After every nsteps time-steps, this estimate is evaluated for each element. For elements where the estimate of the error is above a chosen threshold, P is incremented by one, whereas in elements with low error P is decremented by one, respecting minimum and maximum values for P. The choice of nsteps is critical for the efficiency of this scheme, since it has to be sufficiently large to compensate for the costs of performing the refinement over a large number of time-steps, yet small enough to adjust to changes in the flow. More details on this adaptive procedure for adjusting the polynomial order, as well as its implementation in both CG and DG regimes, are found in [69].
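The sketch below illustrates the error indicator defined above for a single element, assuming for simplicity a one-dimensional expansion in L²-orthogonal Legendre modes, for which the projection onto order P − 1 amounts to discarding the highest-order mode. It is an illustration of the sensor only, not the Nektar++ implementation.

import numpy as np

def adaptive_order_sensor(uhat):
    """Evaluate S_e = ||u_P - u_{P-1}||_2^2 / ||u_P||_2^2 for a 1D Legendre
    modal expansion on [-1, 1].  Since the Legendre modes are L^2-orthogonal
    with ||L_k||_2^2 = 2/(2k + 1), the norms follow from the coefficients."""
    k = np.arange(len(uhat))
    gamma = 2.0 / (2.0 * k + 1.0)      # squared L^2 norm of each mode
    total = np.sum(uhat ** 2 * gamma)  # ||u_P||_2^2
    top = uhat[-1] ** 2 * gamma[-1]    # ||u_P - u_{P-1}||_2^2
    return top / total

# A rapidly decaying spectrum (well-resolved field) gives a small sensor
# value, so the element order could be reduced; a slowly decaying spectrum
# gives a large value, flagging the element for p-enrichment.
smooth = np.array([1.0, 0.5, 0.1, 0.01, 0.001])
rough = np.array([1.0, 0.9, 0.8, 0.7, 0.6])
print(adaptive_order_sensor(smooth), adaptive_order_sensor(rough))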
An example of an application of the adaptive polynomial order procedure is presented in Figure 5, showing the spanwise vorticity and polynomial order distributions for a quasi-3D simulation of the incompressible flow around a NACA 0012 profile at Reynolds number Re = 50,000 and angle of attack α = 6°. The session files to generate this data can be found in Example A.17. It is clear that the regions with transition to turbulence and the boundary layers are resolved using the largest polynomial order allowed, while regions far from the airfoil use a low polynomial order. In this way, the scheme succeeds in refining the discretisation in the more critical regions where small scales are present, without incurring the large computational costs that would be required to uniformly increase the polynomial order. More simply stated, it is possible to specify a different polynomial order in the quadrilateral elements (typically used in the boundary layer discretisation) and in the triangular elements (typically used to fill the outer volume). As a final point, we note that the use of variable polynomial order is not limited to quasi-3D simulations; both CG and DG discretisations fully support all element shape types in 2D and 3D, with parallel implementations (including frequently used preconditioners) also supporting this discretisation option.

4.3. Global mapping

Even though the spectral/hp element spatial discretization allows us to model complex geometries, in some cases it can be advantageous to apply a coordinate transformation in order to solve problems that lie in a coordinate system other than the Cartesian frame of reference. This is typically the case when the transformed domain possesses a symmetry, which allows us to solve the equations more efficiently and compensates for the extra cost of applying the coordinate transformation. Examples of this occur when a transform can be used to map a non-homogeneous geometry to a geometry that is homogeneous in one or more directions, making it possible to use the cheaper quasi-3D approach, where this direction is discretized using a Fourier expansion; and also for problems with moving boundaries, where we can map the true domain to a fixed computational domain, avoiding the need to recompute the system matrices after every time-step.

The implementation of this method was achieved in two parts. First, a new library called GlobalMapping was created, implementing general tensor calculus operations for several types of transformations. Even though it would be sufficient to consider just a completely general transformation, specific implementations for particular cases of simpler transformations are also included in order to improve the computational efficiency, since in these simpler cases many of the metric terms are trivial. In a second stage, the incompressible Navier-Stokes solver was extended, using the functionality of the GlobalMapping library, to implement the methods presented in [71]. Some examples of applications are given in [72, 73]. Embedding these global mappings at the library level allows similar techniques to be introduced in other solvers in the future.

Figure 6 presents an example of the application of this technique, indicating the recirculation regions (i.e. regions where the streamwise velocity is negative) for the flow over a wing with spanwise waviness. In this case, the coordinate transformation removes the waviness from the wing, allowing us to treat the transformed geometry with the quasi-3D formulation. It is important to note that this technique becomes unstable as the waviness amplitude becomes too large; the fully explicit mapping is more sensitive to instability than the semi-implicit mapping, as discussed in [71]. However, in cases where it can be successfully applied, it leads to significant gains in terms of computational cost when compared against a fully 3D implementation. The session files used in this example can be found in Example A.18.
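To give a concrete flavour of such a transformation (an illustrative sketch only, with a hypothetical sinusoidal waviness of amplitude A and spanwise wavelength Lz, rather than the exact mappings employed in [71, 72, 73]), a wavy geometry of this kind can be straightened by a coordinate change of the form

x̄ = x,   ȳ = y − A cos(2πz/Lz),   z̄ = z,

so that the transformed geometry no longer varies in the spanwise direction, the homogeneous Fourier expansion can be applied along z̄, and the metric terms of the mapping account for the original waviness.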

Figure 5: (a) Spanwise vorticity and (b) polynomial order distributions obtained from a simulation using adaptive polynomial order for the flow around a NACA 0012 profile at Re = 50,000 and α = 6°. Taken from [69].

Figure 6: Time-averaged streamwise flow-reversal regions for incompressible flow over a wing with spanwise waviness at Re = 1,000 and α = 12°.

5. Applications

In this section, we demonstrate some of the new features provided in our new release, with a focus on application areas.

5.1. NekMesh

In the previous publication [33], we briefly outlined the MeshConvert utility, which was designed to read various external file formats and perform basic processing and format conversion. In the new release of Nektar++, MeshConvert has been transformed into a new tool, called NekMesh, which provides a series of tools for both the generation of meshes from an underlying CAD geometry, as well as the manipulation of linear meshes to make them suitable for high-order simulations. While MeshConvert was dedicated to the conversion of external mesh file formats, the scope of NekMesh has been significantly broadened to become a true stand-alone high-order mesh generator.

The generation of high-order meshes in NekMesh follows an a posteriori approach proposed in [74]. We first generate a linear mesh using traditional low-order mesh generation techniques. We then add additional nodes along edges, faces and volumes to achieve a high-order polynomial discretisation of our mesh. In the text below, we refer to these additional nodes as 'high-order' nodes, as they do not change the topology of the underlying linear mesh, but instead deform it to fit a required geometry. In this bottom-up procedure, these nodes are first added on edges, followed by faces and finally the volume interior. At each step, nodes are generated on boundaries to ensure a geometrically accurate representation of the model.

A key issue in this process, however, is ensuring that elements remain valid after the insertion of high-order nodes, as this process is highly sensitive to boundary curvature. A common example of this is in boundary layer mesh generation [75], where elements are typically extremely thin in order to resolve the high shear of the flow near the wall. In this setting, naively introducing curvature into the element will commonly push one face of the element through another, leading to a self-intersecting element and thus a mesh that is invalid for computation.

An important contribution of NekMesh to the high-order mesh generation community was presented in references [75, 76], where we alleviate this risk through the creation of a coarse, single-element boundary layer mesh with edges orthogonal to the boundary. The thickness of this layer of elements gives enough room for a valid curving of the 'macro'-elements. After creation of the high-order mesh, a splitting of these boundary elements can be performed using the isoparametric mapping between the reference space and the physical space. We then apply the original isoparametric mapping to construct new elements within the 'macro' element, thereby guaranteeing their validity. This ensures conservation of the validity and quality of the subdivided elements while achieving very fine meshes. An example is shown in Figure 7, where the coarse boundary layer mesh of Figure 8 was split into five layers, using a geometric progression of growth rate r = 2 in the thickness of each layer. The session files used to create the meshes of Figures 8 and 9 can be found in Example A.20.

A complementary approach to avoid invalid or low-quality high-order elements is to optimise the location of the high-order nodes in the mesh. The approach proposed in [13, 77] of a variational framework for high-order mesh optimisation was implemented in NekMesh. In this approach, we consider the mesh to be a solid body, and define a functional based on the deformation of each high-order element. This functional can correspond to physical solid mechanics governing equations, such as linear or non-linear elasticity, but also provides the possibility to accommodate arbitrary functional forms, such as the Winslow equations, within the same framework. Minimising this functional is then achieved through classical quasi-Newton optimisation methods with the use of analytic gradient functions, alongside a Jacobian regularisation technique to accommodate initially-invalid elements. As demonstrated in [13], the approach is scalable and allows the possibility to implement advanced features, such as the ability to slide nodes along a given constrained CAD curve or surface.

Along these lines, much of the development of NekMesh has focused on the access to a robust CAD system for the CAD queries required for traditional meshing operations. Assuming that the CAD is watertight, we note that only a handful of CAD operations are required for mesh generation purposes. NekMesh therefore implements a lightweight wrapper around these CAD queries, allowing it to be interfaced to a number of CAD systems. By default, we provide an interface to the open-source OpenCASCADE library [78]. OpenCASCADE is able to read the STEP CAD file format, natively exported by most CAD design tools, and load it into the system. At the same time, the use of a wrapper means that users and developers of NekMesh are not exposed to the extensive OpenCASCADE API. Although OpenCASCADE is freely available and very well suited to simple geometries, it lacks many of the CAD healing tools required for more complex geometries of the type typically found in industrial CFD environments, which can frequently contain many imperfections and inconsistencies. However, the use of a lightweight wrapper means that other commercial CAD packages can be interfaced to NekMesh if available. To this end, we have implemented a second CAD interface to the commercial CFI CAD engine, which provides a highly robust interface and is described further in references [79, 39, 77].

While users are recommended to create their CAD models in dedicated CAD software, export them in STEP format and load them into NekMesh, they also have the possibility to create their own simple two-dimensional models using one of two tools made available to them. The first tool is an automatic NACA aerofoil generator.
With just three inputs – a NACA code, an angle of attack and dimensions of the bounding box – a geometry is generated and passed to the meshing software. An example is shown in Figure 8 of a mesh generated around a NACA 0012 aerofoil at an angle of attack of α = 15◦.

Figure 7: Example of split boundary layer mesh.

Figure 8: Example of a mesh generated around a NACA 0012 aerofoil.

The other tool is based on the GEO geometry file format of the Gmsh [80] open-source mesh generator. The GEO format is an interpreter for the procedural generation of geometries. NekMesh has been made capable of understanding basic GEO commands, which makes it possible to generate simple two-dimensional geometries. An example is shown in Figure 9 of a mesh generated around a T106C turbine blade: the geometry was created using a GEO script of lines and splines.

Figure 9: Example of a mesh generated around a T106C turbine blade.

5.2. Acoustic solver

Time-domain computational aeroacoustics simulations are commonly used to model noise emission over wide frequency ranges or to augment flow simulations in hybrid setups. Compared with fully compressible flow simulations, they require less computational effort due to the reduced complexity of the governing equations and larger length scales [81]. However, due to the small diffusive terms, as well as the long integration times and distances required for these simulations, highly accurate numerical schemes are crucial for stability [82]. This numerical accuracy can be provided by spectral/hp element methods, even on unstructured meshes in complex geometries, and hence Nektar++ provides a good candidate framework on which to build such an application code.

The latest release of Nektar++ includes a new AcousticSolver, which implements several variants of aeroacoustic models. These are formulated in a hyperbolic setting and implemented in a similar fashion to the compressible Euler and Navier-Stokes equations encapsulated in the CompressibleFlowSolver. Following this implementation guideline, the AcousticSolver uses a discontinuous Galerkin spatial discretisation with modal or nodal expansions to model time-domain acoustic wave propagation in one, two or three dimensions. It implements the operators of the linearized Euler equations (LEE) and the acoustic perturbation equations 1 and 4 (APE-1/4) [83], both of which describe the evolution of perturbations around a base flow state. For the APE-1/4 operator, the system is defined by the hyperbolic equations

∂pᵃ/∂t + c̄² ∇ · ( ρ̄ uᵃ + ū pᵃ/c̄² ) = ω̇_c,    (4a)

∂uᵃ/∂t + ∇( ū · uᵃ ) + ∇( pᵃ/ρ̄ ) = ω̇_m,    (4b)

where u denotes the flow velocity, ρ its density, p its pressure and c the speed of sound. The quantities uᵃ and pᵃ refer to the irrotational acoustic perturbations of the flow velocity and its pressure, with overline quantities such as ū denoting the time-averaged mean. The right-hand-side acoustic source terms ω̇_c and ω̇_m are specified in the session file. This allows for the implementation of any acoustic source term formulation so that, for example, the full APE-1 or APE-4 equations can be obtained. In addition to using analytical expressions, the source terms and base flow quantities can be read from disk or transferred from coupled applications, enabling co-simulation with a driving flow solver. Both LEE and APE support non-quiescent base flows with an inhomogeneous speed of sound. Accordingly, the Lax-Friedrichs and upwind Riemann solvers used in the AcousticSolver employ a formulation which is valid even for strong base flow gradients. The numerical stability can be further improved by optional sponge layers and suitable boundary conditions, such as rigid wall, farfield or white noise.

A recurring test case for APE implementations is the 'spinning vortex pair' [84]. It is defined using two opposing vortices that are each located at a distance r₀ from the center x₁ = x₂ = 0 of a square domain with edge length −100 r₀ ≤ x₁,₂ ≤ 100 r₀. The vortices have a circulation of Γ and rotate around the center at the angular frequency ω = Γ/(4πr₀²) and circumferential Mach number Ma_r = Γ/(4πr₀c). The resulting acoustic pressure distribution is shown in Figure 10a and was obtained on an unstructured mesh of 465 quadrilateral elements with a fifth-order modal expansion (P = 5). The session files used to generate this example can be found in Example A.19. Along the black dashed line, the acoustic pressure shown in Figure 10b exhibits minor deviations from the analytical solution defined in [84], but is in excellent agreement with the results of the original simulation in [83]. The latter was based on a structured mesh with 19,881 nodes and employed a sponge layer boundary condition and spatial filtering to improve the stability. Due to the flexibility and numerical accuracy of the spectral/hp method, a discretization with only 16,740 degrees of freedom was sufficient for this simulation, and no stabilization measures (e.g. SVV or filtering) were necessary to reproduce this result.

Figure 10: Normalized acoustic pressure for Γ/(c r₀) = 1.0 and Ma_r = 0.0795 at t = 1 s: (a) normalized acoustic pressure distribution at t = 1 s, with the mesh shown in light gray and the sampling line as a black dashed line; (b) normalized acoustic pressure pᵃ/p₀ along the sampling line as a function of r/r₀, obtained with the AcousticSolver and compared with the analytical solution of [84].
5.3. Fluid-Structure Interaction (FSI) and Vortex-Induced Vibration (VIV)

Fluid-structure interaction (FSI) modelling poses a great challenge for the accurate prediction of vortex-induced vibration (VIV) of long flexible bodies, as the full resolution of the turbulent flow along their whole span requires considerable computational resources. This is particularly true in the case of large aspect-ratio bodies. Although 2D strip-theory-based modelling of such problems is much more computationally efficient, this approach is unable to resolve the effects of turbulent fluctuations on the dynamic coupling of FSI systems [85, 86]. A novel strip model, which we refer to as 'thick' strip modelling, has been developed using the Nektar++ framework in [87], whose implementation is supported within the incompressible Navier-Stokes solver. In this approach, a three-dimensional DNS model with a local spanwise scale is constructed for each individual strip. Coupling between strips is modelled implicitly through the structural dynamics of the flexible body.

In the 'thick' strip model, the flow dynamics are governed by a series of incompressible Navier-Stokes equations. The governing equations over a general local domain Ωₙ associated with the n-th strip are written as

∂uₙ/∂t + (uₙ · ∇)uₙ = −∇pₙ + (1/Re) ∇²uₙ   on Ωₙ,    (5)

∇ · uₙ = 0   on Ωₙ,    (6)

where the vector uₙ = (uₙ, vₙ, wₙ) denotes the fluid velocity inside the n-th strip, with pₙ being the corresponding dynamic pressure and Re the Reynolds number, which we assume to be constant across all strips. The governing equations are supplemented by boundary conditions of either Dirichlet or Neumann type. In particular, no-slip boundary conditions are applied to the wall of the body, and the velocity of the moving wall is imposed and determined from the transient solution of the structural dynamics equations of motion. A linearized tensioned beam model is used to describe the structural dynamic behavior of the flexible body, which is expressed by the system

ρ_c ∂²η/∂t² + c ∂η/∂t − T ∂²η/∂z² + EI ∂⁴η/∂z⁴ = f.    (7)

In the above, ρ_c is the structural mass per unit length, c is the structural damping per unit length, T is the tension and EI is the flexural rigidity. f denotes the vector of hydrodynamic force per unit length exerted on the body's wall and η is the structural displacement vector.

Homogeneity is imposed in the spanwise direction on the local flow within individual strips, under the assumption that the width of the strips is much shorter than the oscillation wavelength of the excited higher-order modes of the flexible body. This therefore enables the use of the computationally-efficient quasi-3D approach discussed in previous sections within each strip domain, in which two-dimensional spectral elements with piecewise polynomial expansions are used in the (x, y) plane and Fourier expansions are used in the homogeneous z direction. This also requires the assumption of a spanwise oscillation of the flexible body with respect to its full-length scale. As a consequence, the motion variables and fluid forces are expressed as a complex Fourier series, and the tensioned beam model is decoupled into a set of ordinary differential equations, which can be solved simply by a second-order Newmark-β method [88].
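To make the time-integration step concrete, the sketch below shows a generic constant-average-acceleration Newmark-β update for a single decoupled modal equation m q̈ + c q̇ + k q = f(t); this is a textbook form of the scheme and only an illustration of the approach, not the solver's actual implementation, and the modal mass, damping, stiffness and forcing values used in the example are hypothetical.

import numpy as np

def newmark_beta(m, c, k, f, dt, nsteps, q0=0.0, qdot0=0.0,
                 beta=0.25, gamma=0.5):
    """Second-order Newmark-beta integration of m*q'' + c*q' + k*q = f(t)
    for one structural mode (constant average acceleration variant)."""
    q, qdot = q0, qdot0
    qddot = (f(0.0) - c * qdot - k * q) / m
    hist = [q]
    for n in range(1, nsteps + 1):
        t = n * dt
        # Predictors based on the previous step.
        q_pred = q + dt * qdot + dt ** 2 * (0.5 - beta) * qddot
        qdot_pred = qdot + dt * (1.0 - gamma) * qddot
        # Solve the equation of motion at the new time level for the
        # acceleration, then correct displacement and velocity.
        qddot = (f(t) - c * qdot_pred - k * q_pred) / \
                (m + gamma * dt * c + beta * dt ** 2 * k)
        q = q_pred + beta * dt ** 2 * qddot
        qdot = qdot_pred + gamma * dt * qddot
        hist.append(q)
    return np.array(hist)

# Hypothetical modal properties and a harmonic hydrodynamic forcing.
response = newmark_beta(m=1.0, c=0.02, k=4.0,
                        f=lambda t: np.sin(2.0 * t), dt=0.01, nsteps=1000)
print(response[-1])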
A partitioned approach is adopted to solve the coupled FSI system, in which the coordinate mapping technique discussed in Section 4.3 is implemented for the treatment of the moving wall [71].

To illustrate the application of this modelling approach, the VIV of a long flexible cylinder with an aspect ratio of 32π, pinned at both ends, is simulated at Re = 3,900, with 16 thick strips allocated evenly along the axial line of the cylinder. The instantaneous spanwise wake structure is visualized by the vortex-identification Q-criterion in Figure 11. As the figure demonstrates, the distribution of vortex shedding illustrates that a second harmonic mode is excited along the spanwise length and that the turbulent structure is captured well in the local domain of the strips. This emphasises the convincing advantage of providing a highly-resolved description of the hydrodynamics involved in the FSI process. The session files used to run this simulation can be found in Example A.21.

Figure 11: Instantaneous vortex shedding visualized by the vortex-identification Q-criterion (iso-surfaces of the Q-value = [−5, 5]) in body-fitted coordinates: (a) full domain view and zoom-in views of (b) the first strip, (c) the fourth strip and (d) the 16th strip.

5.4. Aeronautical applications

CFD is now an indispensable tool for the design of aircraft engines, and it has become commonplace in the design guidance of new technologies and products [89]. In order for CFD to be effectively adopted in industry, validation and verification is required over a broad design space. This is challenging for a number of reasons, including the range of operating conditions (i.e. Reynolds numbers, Mach numbers, temperatures and pressures), the complexity of industrial geometries (including uncertainty due to manufacturing variations) and their relative motion (i.e. rotor-stator interactions). Even though RANS continues to be the backbone of CFD-based design, the recent development of high-order unstructured solvers and high-order unstructured meshing algorithms, combined with the lowering cost of HPC infrastructures, has the potential to allow for the introduction of high-fidelity transient simulations using large-eddy or direct numerical simulations (LES/DNS) in the design loop, taking the role of a virtual wind tunnel.

As part of our effort to bridge the gap between academia and industry, we have been developing the expertise to analyse turbomachinery cases of industrial interest using Nektar++. A key problem to overcome in these cases is the sensitivity of these simulations to variations in spatial resolution, which requires the use of stabilisation techniques in order to improve robustness. Nektar++ makes use of the spectral vanishing viscosity (SVV) method, originally introduced for the Fourier spectral method by Tadmor [90]. SVV is a model-free stabilization technique that acts at the subgrid-scale level and allows exponential convergence properties to be conserved in sufficiently resolved simulations. Recent developments in this area have focused on a new SVV kernel by Moura et al. [3], which replicates the desirable dispersion and diffusion properties of DG schemes and does not require the manual tuning of parameters found in the classical SVV formulation. More specifically, the dissipation curves of the CG scheme of order P were compared to those of DG at order P − 2, and the DG kernel was determined from minimization of the point-wise L2 norm between these curves. SVV stabilization is combined with spectral/hp dealiasing [29] to eliminate errors arising from the integration of non-linear terms.
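For reference, one common form of the classical exponential SVV damping kernel mentioned above, which is the variant typically tuned by hand through a cut-off mode and an amplitude, can be sketched as follows; the parameter values are illustrative only, and the DG-mimicking kernel of [3] replaces this shape with one fitted to the DG dissipation curves.

import numpy as np

def exponential_svv_kernel(P, cutoff_ratio=0.75):
    """Classical exponential SVV damping kernel Q_k for a modal expansion of
    order P: zero below the cut-off mode, smoothly rising towards one at the
    highest mode.  Illustrative sketch only."""
    k = np.arange(P + 1, dtype=float)
    k_cut = cutoff_ratio * P
    Q = np.zeros_like(k)
    active = k > k_cut
    Q[active] = np.exp(-((k[active] - P) ** 2) / ((k[active] - k_cut) ** 2))
    return Q

print(np.round(exponential_svv_kernel(12), 3))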
A T106A low-pressure turbine vane was investigated at a moderate regime (Re = 88,450), and the convergence properties of the main flow statistics were extensively explored with the aim of developing a set of best practices for the use of spectral/hp element methods as a high-fidelity digital twin [91]. The velocity correction scheme of [92] implemented in the IncNavierStokesSolver is adopted, using the quasi-3D approach discussed in the previous sections and Taylor-Hood type elements in 2D (where spaces of order-P polynomials on each element are used for the velocity components, and order P − 1 for the pressure). Uniform inflow velocity is combined with pitchwise periodicity and high-order outflow boundary conditions [93]. Numerical stability is ensured by employing SVV with the DG kernel in the x-y planes, and the traditional exponential kernel for the spanwise Fourier direction. A representation of the vortical structures is shown in Figure 12: transition to turbulence takes place only in the final portion of the suction surface, where the separated shear layer rolls up due to a Kelvin-Helmholtz instability. The separation bubble remains open and merges into the trailing edge wake, giving rise to large-scale vortical structures. This work was conducted with clean inflow conditions to isolate the effect of the numerical setup on the various flow statistics. However, turbomachinery flows are highly turbulent: subsequent work focused on the treatment of flow disturbances to reproduce more accurately a realistic environment [94]. With this aim, a localised synthetic momentum forcing was introduced in the leading edge region to cause flow symmetry breakdown on the suction surface, and promote anticipated transition to turbulence. This approach yields an improvement in the agreement with experimental data, with no increase in the computational cost.

Figure 12: Instantaneous isosurfaces of Q-criterion (Q = 500) contoured by velocity magnitude, showing the vortical structures evolving on the suction surface and in the wake of a T106A cascade. The computational domain is replicated in the spanwise and pitchwise directions for visual clarity.

With the intent of being able to tackle cases in which compressibility effects are not negligible, there has been an effort in validating the CompressibleFlowSolver for shock-wave boundary layer interaction (SWBLI) configurations. This solver, described in our previous publication [33], formulates the compressible Navier-Stokes equations in their conservative form, discretised using a DG scheme and explicit timestepping methods. In order to regularize the solution in the presence of discontinuities, the right-hand side of the Navier-Stokes equations is augmented with a Laplacian viscosity term of the form ∇·(ε∇q), where q is the vector of conserved variables and ε is a spatially-dependent diffusion coefficient that is defined on each element as

ε = ε₀ (h/p) λ_max S.

Here, ε₀ is an O(1) constant, λ_max is the maximum local characteristic speed, h is a reference length of the element, p its polynomial order, and S a discontinuity sensor value using the formulation of [70]. To benchmark this approach in the context of SWBLI problems, we consider a laminar problem studied experimentally and numerically in [95]. Several authors have studied this SWBLI with slightly different free-stream conditions; here we follow the physical parameters used by [96], where we select a free-stream Mach number Ma = 2.15, a shock angle β = 30.8°, a stagnation pressure p₀ = 1.07 × 10⁴ Pa, a stagnation temperature T₀ = 293 K, a Reynolds number Re = 10⁵ (referred to the inviscid shock impingement location x_sh measured from the plate leading edge), and a Prandtl number Pr = 0.72. Unlike [96], the leading edge is not included in the simulations. The inflow boundary is located at x = 0.3 x_sh, where the analytical compressible boundary layer solution of [97] is imposed. The session files used in this example can be found in Example A.22. At the inlet, the Rankine-Hugoniot relations that describe the incident shock are superimposed over the compressible boundary layer solution. At the top boundary we impose the constant states corresponding to the inviscid post-incident-shock state. At the outlet, in the subsonic part of the boundary layer, a pressure outlet is imposed based on the inviscid post-reflected-shock state conditions. All boundary conditions are imposed in a weak sense through a Riemann solver, as described in [98], and use a coarse grid of 60 × 40 quadrilateral elements at order p = 3. For illustrative purposes, Figure 13 shows a snapshot of the Mach number field. For a more quantitative comparison, Figure 14 compares the skin friction coefficient with those from [99] and [96], and shows fair agreement with the results of [96].

Figure 13: Mach number field of the SWBLI test case (60 × 40 quadrilateral elements, p = 3); configuration based on [95].

Figure 14: Skin friction coefficient C_f versus x/x_sh for the SWBLI test case: blue line, Nektar++ (60 × 40 quadrilateral elements, p = 3); triangles are from [96]; dotted line is the empirical solution of [99].

6. Availability

Nektar++ is open-source software, released under the MIT license, and is freely available from the project website (https://www.nektar.info/). The git repository is freely accessible and can be found at https://gitlab.nektar.info/. Discrete releases are made at milestones in the project and are available to download as compressed tar archives, or as binary packages for a range of operating systems. These releases are considered to contain relatively complete functionality compared to the repository master branch. Docker container images are also available for these releases and the latest build of master, as well as a Jupyter notebook that contains the Python interface of Section 3.5. These can be found on Dockerhub under the repositories nektarpp/nektar and nektarpp/nektar-workbook respectively.

7. Conclusions

In this paper, we have reviewed the latest features and enhancements of the Nektar++ version 5.0 release. A key theme of our work in this release has been to evolve the fundamental design of the software detailed in our previous publication [33] towards providing an enabling tool for efficient high-fidelity simulations in various scientific areas. To this end, this latest version of Nektar++ provides a complete pipeline of tools: from pre-processing with NekMesh and a new parallel I/O interface for mesh and field representations; through new solvers and improvements to existing ones via numerical developments such as spatially variable polynomial order and the global mapping technique; to parallel post-processing and in-situ processing with the FieldConvert utility developments. This gives scientific end-users a tool to enable efficient high-fidelity simulations in a number of fields, such as the applications we discuss in Section 5.

Although this version represents a major milestone in the development of Nektar++, there is still clear scope for future work. A particular area of focus remains the efficient use of many-core CPU and GPU systems, recognising that optimisation and performance on an increasingly diverse range of hardware presents a major challenge. Initial research in this area has investigated the use of matrix-free methods as a potential route towards fully utilising computational hardware even on unstructured grids, by combining efficient sum-factorisation techniques and the tensor-product basis for unstructured elements presented in [36]. From the perspective of code maintainability, we have also investigated various performance-portable programming models in the context of mesh generation [100] and implicit solvers [101]. Looking towards the next major release of Nektar++, we envision the use of these studies as a guideline to implementing efficient operators for the spectral/hp element method, whilst retaining ease of use for the development of increasingly efficient solvers.

Acknowledgements

The development of Nektar++ has been supported by a number of funding agencies including the Engineering and Physical Sciences Research Council (grants EP/R029423/1, EP/R029326/1, EP/L000407/1, EP/K037536/1, EP/K038788/1, EP/L000261/1, EP/I037946/1, EP/H000208/1, EP/I030239/1, EP/H050507/1, EP/D044073/1, EP/C539834/1), the British Heart Foundation (grants FS/11/22/28745 and RG/10/11/28457), the Royal Society of Engineering, European Union FP7 and Horizon 2020 programmes (grant nos. 265780, 671571 and 675008), McLaren Racing, the National Science Foundation (IIS-0914564, IIS-1212806 and DMS-1521748), the Army Research Office (W911NF-15-1-0222), the Air Force Office of Scientific Research and the Department of Energy. HX acknowledges support from the NSF Grants 91852106 and 91841303. SC acknowledges the support of the National Research Foundation of Korea (No. 2016R1D1A1A02937255). KL acknowledges the Seventh Framework Programme FP7 Grant No. 312444 and German Research Foundation (DFG) Grant No. JA 544/37-2.

Appendix A. Supplementary material

File: user-guide.pdf
Figure A.15: User guide for Nektar++ detailing compilation, installation, input format and usage, including examples.

File: insitu.zip
Figure A.16: Nektar++ input files for simulating flow past a cylinder at Re = 80 using the IncNavierStokesSolver. This example uses the HDF5 input format described in Section 3.1 and the in-situ processing facilities of Section 3.2 to generate an animation of the vorticity field and show the von Kármán vortex shedding in this regime.

File: adaptiveOrder.zip
Figure A.17: Nektar++ input files for simulating flow over a NACA0012 wing at Re = 50,000 using the IncNavierStokesSolver. This example uses the adaptive-in-time polynomial order described in Section 4.2 to increase the efficiency of the simulation when compared to a spatially-constant polynomial order.

File: wavyWing.zip
Figure A.18: Nektar++ input files for simulating flow over a wavy NACA0012 wing at Re = 1,000 using the IncNavierStokesSolver. This example uses the mapping technique described in Section 4.3 to perform simulations in a quasi-3D setting.

File: acoustic.zip
Figure A.19: Nektar++ input files for simulating the spinning vortex pair using the AcousticSolver of Section 5.2 at a polynomial order of P = 5, accelerated using the Collections library described in Section 3.3.

Argonne, IL, see https://nek5000.mcs.anl.gov/index.php/MainPage (2008).
[15] H. M. Blackburn, S. Sherwin, Journal of Computational Physics 197 (2) (2004) 759–778.
[16] H. M. Blackburn, D. Lee, T. Albrecht, J. Singh, Computer Physics Communications 245 (2019) 106804.
[17] W. Bangerth, R. Hartmann, G. Kanschat, ACM Transactions on Mathematical Software (TOMS) 33 (4) (2007) 24.
[18] F. Hindenlang, G. J. Gassner, C. Altmann, A. Beck, M. Staudenmaier, C.-D. Munz, Computers & Fluids 61 (2012) 86–93.
[19] G. J. Gassner, A. R. Winters, D. A. Kopriva, Journal of Computational Physics 327 (2016) 39–66.
[20] F. X. Giraldo, M. Restelli, Journal of Computational Physics 227 (8) (2008) 3849–3877.
[21] D. S. Abdi, F. X. Giraldo, Journal of Computational Physics 320 (2016) 46–68.
[22] F. Witherden, A. Farrington, P. Vincent, Computer Physics Communications 185 (2014) 3028–3040. doi:10.1016/j.cpc.2014.07.011.
[23] H. T. Huynh, A flux reconstruction approach to high-order schemes including discontinuous Galerkin methods, in: 18th AIAA Computational Fluid Dynamics Conference, 2007, p. 4079.
[24] Y. Allaneau, A. Jameson, Computer Methods in Applied Mechanics and Engineering 200 (49-52) (2011) 3628–3636.
Figure A.20: NekMesh input files for a NACA0012 aerofoil section and T106C [25] A. Dedner, R. Klöfkorn, M. Nolte, M. Ohlberger, Computing 90 (3-4) turbine blade geometry outlined in Section 5.1. (2010) 165–196. [26] A. Bolis, C. D. Cantwell, D. Moxey, D. Serson, S. Sherwin, Computer File: vivCylFlow.zip Physics Communications 206 (2016) 17–25. [27] P. E. Vos, C. Eskilsson, A. Bolis, S. Chun, R. M. Kirby, S. J. Sherwin, Figure A.21: Nektar++ input files for simulating flow over a flexible cylinder International Journal of Computational Fluid Dynamics 25 (3) (2011) Re = 3,900 using the IncNavierStokesSolver. This example uses the the 107–125. thick strip model outlined in Section 5.3 to reduce computational cost against a [28] C. D. Cantwell, S. Yakovlev, R. M. Kirby, N. S. Peters, S. J. Sherwin, full 3D simulation. Journal of Computational Physics 257 (2014) 813–829. [29] G. Mengaldo, D. De Grazia, D. Moxey, P. E. Vincent, S. Sherwin, Journal File: shockBL.zip of Computational Physics 299 (2015) 56–81. [30] A. R. Winters, R. C. Moura, G. Mengaldo, G. J. Gassner, S. Walch, Figure A.22: Nektar++ input files for simulating a shock boundary-layer inter- J. Peiro, S. J. Sherwin, Journal of Computational Physics 372 (2018) action test case, at a Reynolds number Re = 105, Mach number Ma = 2.15 and 1–21. shock angle β = 30.8◦ as outlined in Section 5.4. [31] R. M. Kirby, S. J. Sherwin, Computer Methods in Applied Mechanics and Engineering 195 (23-24) (2006) 3128–3144. [32] J.-E. W. Lombard, D. Moxey, S. J. Sherwin, J. F. Hoessler, S. Dhandapani, M. J. Taylor, AIAA Journal 54 (2) (2015) 506–518. References [33] C. D. Cantwell, D. Moxey, A. Comerford, A. Bolis, G. Rocco, G. Mengaldo, D. de Grazia, S. Yakovlev, J.-E. Lombard, D. Ekelschot, [1] R. C. Moura, G. Mengaldo, J. Peiró, S. Sherwin, Journal of Computa- B. Jordi, H. Xu, Y. Mohamied, C. Eskilsson, B. Nelson, P. Vos, C. Biotto, tional Physics 330 (2017) 615–623. R. M. Kirby, S. J. Sherwin, Computer Physics Communications 192 [2] R. C. Moura, G. Mengaldo, J. Peiró, S. J. Sherwin, An LES setting (2015) 205–219. doi:10.1016/j.cpc.2015.02.008. for DG-based implicit LES with insights on dissipation and robustness, [34] H. Xu, C. D. Cantwell, C. Monteserin, C. Eskilsson, A. P. Engsig-Karup, in: Spectral and High Order Methods for Partial Differential Equations S. J. Sherwin, Journal of Hydrodynamics 30 (1) (2018) 1–22. ICOSAHOM 2016, Springer, 2017, pp. 161–173. [35] M. Dubiner, Journal of Scientific Computing 6 (4) (1991) 345–390. [3] G. Mengaldo, R. Moura, B. Giralda, J. Peiró, S. Sherwin, Computers & [36] S. J. Sherwin, G. E. Karniadakis, Computer Methods in Applied Mechan- Fluids 169 (2018) 349–364. ics and Engineering 123 (1-4) (1995) 189–229. [4] G. Mengaldo, D. De Grazia, R. C. Moura, S. J. Sherwin, Journal of [37] M. G. Duffy, SIAM Journal on Numerical Analysis 19 (6) (1982) 1260– Computational Physics 358 (2018) 1–20. 1262. [5] P. Fernandez, R. C. Moura, G. Mengaldo, J. Peraire, Computer Methods [38] F. Bassi, S. Rebay, Journal of Computational Physics 138 (2) (1997) in Applied Mechanics and Engineering 346 (2019) 43–62. 251–285. [6] P. E. Vos, S. J. Sherwin, R. M. Kirby, Journal of Computational Physics [39] J. Marcon, J. Peiró, D. Moxey, N. Bergemann, H. Bucklow, M. Gam- 229 (13) (2010) 5161–5181. mon, A semi-structured approach to curvilinear mesh generation around [7] C. D. Cantwell, S. J. Sherwin, R. M. Kirby, P. H. J. Kelly, Computers & streamlined bodies, in: AIAA Scitech 2019 Forum, American Institute Fluids 43 (2011) 23–28. 
of Aeronautics and Astronautics, Reston, Virginia, 2019, pp. 1725–. [8] C. D. Cantwell, S. J. Sherwin, R. M. Kirby, P. H. J. Kelly, Math. Mod. doi:10.2514/6.2019-1725. Nat. Phenom. 6 (2011) 84–96. [40] D. Moxey, C. D. Cantwell, R. M. Kirby, S. J. Sherwin, Comput. Meth. [9] G. J. Gassner, SIAM Journal on Scientific Computing 35 (3) (2013) Appl. Mech. Eng. 310 (2016) 628–645. doi:10.1016/j.cma.2016. A1233–A1253. 07.001. [10] G. Mengaldo, D. De Grazia, P. E. Vincent, S. J. Sherwin, Journal of [41] D. Moxey, R. Amici, R. M. Kirby, Efficient matrix-free high-order finite Scientific Computing 67 (3) (2016) 1272–1292. element evaluation for simplicial elements, under review in SIAM J. Sci. [11] G. Mengaldo, Ph.D. dissertation (2015). Comp. (2019). [12] G. E. Karniadakis, S. J. Sherwin, Spectral/hp element methods for CFD, [42] S. Yakovlev, D. Moxey, S. J. Sherwin, R. M. Kirby, Journal of Scientific Oxford University Press, 2005. Computing 67 (1) (2016) 192–220. [13] M. Turner, J. Peiró, D. Moxey, Computer-Aided Design 103 (2018) [43] B. Cockburn, C.-W. Shu, SIAM Journal on Numerical Analysis 35 (6) 73–91. (1998) 2440–2463. [14] P. Fischer, J. Kruse, J. Mullen, H. Tufo, J. Lottes, S. Kerkemeier, Ar- [44] M. Folk, G. Heber, Q. Koziol, E. Pourmal, D. Robinson, An overview gonne National Laboratory, Mathematics and Computer Science Division, 20 of the HDF5 technology suite and its applications, in: Proceedings of in: New Challenges in Grid Generation and Adaptivity for Scientific the EDBT/ICDT 2011 Workshop on Array Databases, ACM, 2011, pp. Computing, Springer, 2015, pp. 203–215. 36–47. [77] M. Turner, D. Moxey, J. Peiró, M. Gammon, C. Pollard, H. Bucklow, [45] M. Bareford, N. Johnson, M. Weiland, Improving Nektar++ IO Per- Procedia Engineering 203 (2017) 206–218. doi:10.1016/j.proeng. formance for Cray XC Architecture, in: Cray User Group Proceedings, 2017.09.808. Stockholm, Sweden, 2018. [78] Open Cascade SAS, Open Cascade (2019). [46] C. Chevalier, F. Pellegrini, Parallel Computing 34 (6-8) (2008) 318–331. [79] J. Marcon, M. Turner, J. Peiró, D. Moxey, C. Pollard, H. Bucklow, [47] W. J. Schroeder, B. Lorensen, K. Martin, The visualization toolkit: an M. Gammon, High-order curvilinear hybrid mesh generation for CFD object-oriented approach to 3D graphics, Kitware, 2004. simulations, in: 2018 AIAA Aerospace Sciences Meeting, American [48] J. Ahrens, B. Geveci, C. Law, The visualization handbook 717 (2005). Institute of Aeronautics and Astronautics, Reston, Virginia, 2018, pp. [49] K. Lackhove, Hybrid Noise Simulation for Enclosed Configurations, 1403–. doi:10.2514/6.2018-1403. Doctoral thesis, Technische Universität Darmstadt (2018). [80] C. Geuzaine, J.-F. Remacle, International Journal for Numerical Methods [50] M. Germano, Physics of Fluids 29 (6) (1986) 1755. doi:10.1063/1. in Engineering 79 (11) (2009) 1309–1331. doi:10.1002/nme.2579. 865649. [81] T. Colonius, S. K. Lele, Progress in Aerospace Sciences 40 (6) (2004) [51] A. Refloch, B. Courbet, A. Murrone, C. Laurent, J. Troyes, G. Chaineray, 345–416. doi:10.1016/j.paerosci.2004.09.001. J. B. Dargaud, F. Vuillot, AerospaceLab (2011). [82] C. K. W. Tam, Fluid Dynamics Research 38 (9) (2006) 591–615. doi: [52] F. Duchaine, S. Jauré, D. Poitou, E. Quémerais, G. Staffelbach, T. Morel, 10.1016/j.fluiddyn.2006.03.006. L. Gicquel, Computational Science & Discovery 8 (1) (7 2015). doi: [83] R. Ewert, W. Schröder, Journal of Computational Physics 188 (2) (2003) 10.1088/1749-4699/8/1/015003. 365–398. doi:10.1016/S0021-9991(03)00168-2. [53] D. 
Abrahams, R. W. Grosse-Kunstleve, O. Overloading, CC Plus Plus [84] E.-A. Müller, F. Obermeier, The spinning vortices as a source of sound, Users Journal 21 (7) (2003) 29–36. in: AGARD CP-22, 1967, pp. 21–22. [54] P. Peterson, International Journal of Computational Science and Engin- [85] J. Chaplin, P. Bearman, Y. Cheng, E. Fontaine, J. Graham, K. Herfjord, eering 4 (4) (2009) 296–305. F. H. Huarte, M. Isherwood, K. Lambrakos, C. Larsen, et al., Journal of [55] D. M. Beazley, et al., SWIG: An easy to use tool for integrating scripting Fluids and Structures 21 (1) (2005) 25–40. languages with C and C++, in: Tcl/Tk Workshop, 1996, p. 43. [86] R. Willden, J. Graham, Journal of Fluids and Structures 15 (3) (2001) [56] Élie Cartan, Riemannian geometry in an orthogonal frame, World Sci- 659–669. entific Pub. Co. Inc., 2002. [87] Y. Bao, R. Palacios, M. Graham, S. Sherwin, Journal of Computational [57] Élie Cartan, Geometry of Riemannian spaces, Math. Sci. Press, 2001. Physics 321 (2016) 1079–1097. [58] Élie Cartan, La théorie des groupes finis et continus et la géométrie [88] D. Newman, G. Karniadakis, Journal of Fluid Mechanics 344 (1997) différentiellle traitees par la méthode du repère mobile, Gauthier-Villars, 95–136. 1937. [89] G. M. Laskowski, J. Kopriva, V. Michelassi, S. Shankaran, U. Paliath, [59] M. Fels, P. J. Olver, Acta Appl. Math. 51 (2) (1998) 161–213. R. Bhaskaran, Q. Wang, C. Talnikar, Z. J. Wang, F. Jia, Future Directions [60] P. J. Olver, Moving frames – in geometry, algebra, computer vision, of High Fidelity CFD for Aerothermal Turbomachinery Analysis and and numerical analysis. foundations of computational mathematics, in: Design, in: 46th AIAA Fluid Dynamics Conference, Washington, D.C., London Math. Soc. Lecture Note Ser., Cambridge Univ. Press, 2001, pp. USA, 2016, pp. 1–30. 267–297. [90] E. Tadmor, SIAM Journal on Numerical Analysis 26 (1) (1989) 30–44. [61] O. Faugeras, Cartan’s moving frame method and its application to the [91] A. Cassinelli, F. Montomoli, P. Adami, S. J. Sherwin, High Fidelity geometry and evolution of curves in the euclidean, affine and projective Spectral/hp Element Methods for Turbomachinery, in: ASME Paper No. planes, in: J. L. Mundy, A. Zisserman, D. Forsyth (Eds.), Lecture Notes GT2018-75733, 2018, pp. 1–12. in Computer Science, Vol. 825, Springer, 1994. [92] G. E. Karniadakis, M. Israeli, S. A. Orszag, Journal of Computational [62] E. Piuze, J. Sporring, K. Siddiqi, Moving frames for hear fiber recon- Physics 97 (2) (1991) 414–443. struction, in: S. Ourselin, D. Alexander, D. Westin (Eds.), Lecture Notes [93] S. Dong, G. E. Karniadakis, C. Chryssostomidis, Journal of Computa- in Computer Science book series, Vol. 9123, Springer, 2015. tional Physics 261 (2014) 83–105. [63] S. Chun, Journal of Scientific Computing 53 (2) (2012) 268–294. [94] A. Cassinelli, H. Xu, F. Montomoli, P. Adami, R. Vazquez Diaz, S. J. [64] S. Chun, Journal of Scientific Computing 59 (3) (2013) 626–666. Sherwin, On the effect of inflow disturbances on the flow past a linear [65] S. Chun, C. Eskilsson, Journal of Computational Physics 333 (2017) LPT vane using spectral/hp element methods, in: ASME Paper No. 1–23. GT2019-91622, 2019, pp. 1–12. [66] S. Chun, Journal of Computational Physics 340 (2017) 85–104. [95] G. Degrez, C. Boccadoro, J. Wendt, Journal of Fluid Mechanics 177 [67] S. Chun, J. Marcon, J. Peiró, S. J. Sherwin, submitted. (1987) 247–263. [68] S. Chun, C. Cantwell, in preparation. [96] J.-P. Boin, J. Robinet, C. Corre, H. 
Deniau, Theoretical and Computa- [69] D. Moxey, C. D. Cantwell, G. Mengaldo, D. Serson, D. Ekelschot, tional Fluid Dynamics 20 (3) (2006) 163–180. J. Peiró, S. J. Sherwin, R. M. Kirby, Towards p-adaptive spectral/hp [97] F. M. White, Viscous fluid flow, McGraw-Hill New York, 2006. element methods for modelling industrial flows, in: M. L. Bittencourt, [98] G. Mengaldo, D. De Grazia, F. Witherden, A. Farrington, P. Vincent, N. A. Dumont, J. S. Hesthaven (Eds.), Spectral and High Order Methods S. Sherwin, J. Peiro, A guide to the implementation of boundary condi- for Partial Differential Equations ICOSAHOM 2016, Springer Interna- tions in compact high-order methods for compressible aerodynamics, in: tional Publishing, 2017, pp. 63–79. 7th AIAA Theoretical Fluid Mechanics Conference, 2014, p. 2923. [70] P.-O. Persson, J. Peraire, Sub-cell shock capturing for discontinuous galer- [99] E. Eckert, Journal of the Aeronautical Sciences 22 (8) (1955) 585–587. kin methods, in: 44th AIAA Aerospace Sciences Meeting and Exhibit, [100] J. Eichstädt, M. Green, M. Turner, J. Peiró, D. Moxey, Comp. Phys. 2006, p. 112. Comm. 229 (2018) 36–53. [71] D. Serson, J. R. Meneghini, S. J. Sherwin, Journal of Computational [101] J. Eichstädt, M. Vymazal, D. Moxey, J. Peiró, A comparison of the Physics 316 (2016) 243–254. shared-memory parallel programming models OpenMP, OpenACC and [72] D. Serson, J. R. Meneghini, S. J. Sherwin, Computers & Fluids 146 Kokkos in the context of implicit solvers for high-order FEM, under (2017) 117–124. review in Comp. Phys. Comm. (2019). [73] D. Serson, J. R. Meneghini, S. J. Sherwin, Journal of Fluid Mechanics 826 (2017) 714–731. [74] S. J. Sherwin, J. Peiró, Int. J. Numer. Meth. Engng 53 (2002) 207–223. [75] D. Moxey, M. D. Green, S. J. Sherwin, J. Peiró, Computer Methods in Applied Mechanics and Engineering 283 (2015) 636–650. doi:10. 1016/j.cma.2014.09.019. [76] D. Moxey, M. D. Green, S. J. Sherwin, J. Peiró, On the generation of curvilinear meshes through subdivision of isoparametric elements,
