Turbulence simulations: multiscale modeling and

data-intensive computing methodologies

by

Jason Graham

A dissertation submitted to The Johns Hopkins University in conformity with the

requirements for the degree of Doctor of Philosophy.

Baltimore, Maryland

January, 2014

© Jason Graham 2014

All rights reserved

Abstract

In this two-part work, methodologies for the multiscale modeling of complex turbulent flows and data-intensive computing strategies for large-scale turbulent simulations are developed and presented. The first part of this thesis is devoted to the simulation of turbulent flows over objects characterized by hierarchies of length-scales.

Flows of this type present special challenges associated with the cost of resolving small-scale geometric elements. During large eddy simulation (LES), their effects on the resolved scales must be captured realistically through subgrid-scale models.

Prior work performed by Chester et al. [21] proposed a technique called renormalized numerical simulation (RNS), which is applicable to objects that display scale-invariant geometric (fractal) properties. The idea of RNS is similar to that of the dynamic model used in LES to determine model parameters for the subgrid-stress tensor model in the bulk of the flow. In RNS, drag forces from the resolved elements that are obtained during the simulation are re-scaled appropriately by determining drag coefficients that are then applied to specify the drag forces associated with the subgrid-scale elements. In the current work we introduce a generalized framework for

describing and implementing the RNS methodology, thereby extending the methodology first presented by Chester et al. [21]. Furthermore, we present various other possible practical implementations of RNS that differ in important technical aspects related to 1) time averaging, 2) spatial localization, and 3) numerical representation of the drag forces. The new RNS framework is then applied to fractal tree canopies consisting of fractal-like trees with both planar cross-sections and three-dimensional orientations. The results indicate that the proposed time averaged, local, and explicit formulation of RNS is superior to the predecessor formulation as it enables the modeling of spatially non-homogeneous geometries without using a low-level branch-based description and preserves the assumed dynamic similarity through temporal filtering.

In addition, the overall predicted drag force of the non-planar fractal trees is shown to agree well with experimental data. In addition to RNS, a methodology for generating accurate inflow conditions in multiscale turbulence simulations is presented. This technique, called concurrent precursor simulation (CPS), allows the synchronous generation of inflow data from an upstream precursor simulation. This approach is conceptually the same as the standard precursor simulations (Lund et al. [72] and Ferrante and

Elghobashi [35]) used in the past; however, it eliminates the I/O bottleneck of disk reads and writes by transferring sampled data directly between domains using MPI.

Furthermore, issues with recycling time scales of the sampled inflow library are removed since the upstream precursor simulation is performed concurrently with the target simulation. This methodology is applied to a single fractal tree (modeled using RNS)


in turbulent duct flow and to a finite length, developing wind farm. In the second

part of this work, data-intensive computing strategies addressing the large-scale data

problem in direct numerical simulation (DNS) of turbulent flows are presented. DNS

provides the highest fidelity of predicted turbulence data. As a result, these data have

served a vital role in turbulence research, and access to such data is key to continued

development of the field. Classical approaches to the management and dissemination

of these large-scale datasets, however, have proven to be cumbersome and prohibitively

expensive in some instances, thus limiting the usefulness of these data to a broad

community. Therefore, the Johns Hopkins Turbulence Databases (JHTDB) (Perlman

et al. [89] and Li et al. [68]) have been created which expose large-scale turbulence

datasets to the research community worldwide using Web services. The JHTDB

project provides Web service libraries for C, Fortran, and Matlab which allow interaction with the DNS data. The design and implementation of the Matlab interface

along with several examples are presented. Also, the first Web service based, publicly

available channel flow DNS database is produced in this work. The implementation

of the channel flow DNS and construction of the subsequent database are presented.

These data are then used to study the structure and organization of channel flow

turbulence. In this study, the Q criterion [50] is employed to measure vortex sizes and organization. Results indicate good qualitative agreement with theoretical predictions with respect to the presence of large-scale near-wall structures and the preponderance of buffer layer vortices.


Primary Reader: Professor Charles Meneveau

Secondary Readers: Professors Gregory L. Eyink and Randal Burns

Acknowledgments

I first and foremost thank my Lord and Savior Jesus Christ, from whom all blessings proceed, for the opportunity to pursue the endeavors in this work. Without

His guidance and care it would not have been possible to complete this journey. An immense thanks goes to my advisor Professor Charles Meneveau for his patience, kindness, and encouragement throughout my doctoral studies. His direction and thoughtful conversations are also greatly acknowledged. I thank Professors G. L.

Eyink and R. Burns for their many kind and insightful suggestions made throughout this work, for their stimulating lectures, and fruitful collaborations. To Professors A.

Prosperetti, O. Knio, J. Katz, and R. Mittal, I am very grateful for the exceptional courses which they taught and for their endless pursuit of academic excellence.

A great thanks goes to Dr. Edward Givelberg, Kalin Kanov, and the entire

JHTDB team for exciting and rewarding collaborations. For providing access and support to the PoongBack code along with fruitful collaborations, a cordial thanks belongs to Professor Robert Moser, Myoungkyu Lee and Nicholas Malaya of the

University of Texas. To my colleagues Claire VerHulst, Adrien Thormann, and Dr.


Kunlun Bai, I am very thankful for fun and thought-provoking conversations which have certainly added to the richness of this work.

For financial support, I am indebted to the JHU IGERT program on “Modeling

Complex Systems” (NSF grant #0801471) and the NSF grant #CMMI-094153 for supporting this effort.

And finally, to my wife Cindy, for her unwavering support and endless patience–without which this work would not have been possible–I am deeply grateful.

Dedication

To my mom, though departed too soon, your inspiration and love live on in this work. And to Noah, who reminds me each day what it means to live.

Contents

Abstract ii

Acknowledgments vi

List of Tables xiv

List of Figures xv

1 Introduction 1

1.1 Overview ...... 1

1.2 Background and Motivation ...... 5

1.2.1 Multiscale Modeling ...... 5

1.2.2 Data-intensive Computing ...... 12

1.3 Thesis Outline ...... 21

I Multiscale Modeling 23

2 Renormalized Numerical Simulation 24


2.1 Introduction ...... 24

2.2 RNS Framework ...... 25

2.3 LES Implementation ...... 30

2.4 Planar tree canopy test case ...... 32

2.5 RNS formulations ...... 39

2.5.1 Model M1 ...... 40

2.5.2 Model M2 ...... 43

2.5.3 Model M3 ...... 45

2.5.4 Model M4 ...... 47

2.6 Test results ...... 49

2.6.1 RNS quantities ...... 50

2.6.2 Selected flow statistics ...... 55

2.6.3 Temporal averaging time-scale ...... 58

2.6.4 Discussion ...... 59

2.6.5 Grid and RNS Modeling Sensitivity ...... 60

2.7 Applications to canopy consisting of fractal trees with three non-coplanar branches ...... 64

2.7.1 Flow Field and RNS Results ...... 66

2.7.2 Comparison with Experimentally Determined Drag Coefficient 75

2.8 Conclusions ...... 77

3 Concurrent Precursor Simulation 80


3.1 Introduction ...... 80

3.2 Implementation ...... 81

3.3 Application Cases ...... 85

3.3.1 Single fractal tree in turbulent duct flow ...... 86

3.3.2 Finite length wind farm ...... 89

3.4 Conclusion ...... 92

II Data-Intensive Computing 94

4 Johns Hopkins Turbulence Databases 95

4.1 Introduction ...... 95

4.2 Design and Construction ...... 96

4.2.1 Database Cluster ...... 99

4.2.2 Web Services ...... 100

4.3 Channel Flow Database Interpolation and Differentiation Methods . . 102

4.3.1 Spatial Interpolation ...... 103

4.3.2 Spatial Differentiation ...... 108

4.4 Matlab Client Interface ...... 110

4.4.1 Design and Implementation ...... 111

4.4.2 Code Examples ...... 113

4.5 Conclusions ...... 116


5 Channel Flow DNS 118

5.1 Introduction ...... 118

5.2 Governing Equations ...... 119

5.3 Production Simulation ...... 121

5.4 Vortex Analysis ...... 125

5.4.1 Implementation ...... 130

5.4.2 Results ...... 134

5.5 Conclusions ...... 140

6 Concluding Remarks 144

Appendix A LESGO Validation: Flow over wall mounted cubes 148

Appendix B MPI-DB 152

B.1 Introduction ...... 152

B.2 Related Work ...... 154

B.3 Channel Flow Simulation ...... 155

B.4 The MPI-DB software library ...... 157

B.5 MPI-DB Architecture ...... 158

B.6 Fortran Interface Design ...... 160

B.7 Fortran Interface Example ...... 161

B.8 Results ...... 166

B.9 Conclusion ...... 171


Appendix C B-Spline Collocation Method 172

Appendix D Channel Flow DNS: Pressure Solver 175

D.1 Introduction ...... 175

D.2 Pressure Solution ...... 176

D.2.1 Non-Zero Wavemode Solution ...... 176

D.2.2 Zero Wavemode Solution ...... 177

D.3 Validation ...... 178

D.4 Conclusion ...... 183

Vita 200

List of Tables

2.1 Definitions of the tested RNS models. ...... 39
2.2 Time averaged drag coefficient, RNS error, and forces for each of the RNS models when applied to simulation of the "V-tree" canopy. Results shown pertain to the sample tree in the middle of the domain. Along with time, all of the quantities are also averaged across both b elements of the tree. e_b is defined as ‖e_b‖/‖F_b‖; ‖F_R‖ is the total resolved force on the target tree; ‖F_S‖ the total subgrid force on the target tree; ‖F_T‖ the force on the target tree. ...... 59
2.3 Definitions of the case configurations used in the grid and RNS modeling analysis. Listed are the case names, grid resolution, number of resolved generations (N_g), number of grid points across the diameter of the branches in the last resolved generation (N_p), and the canopy averaged total drag forces. ...... 62

4.1 Table of Web service functions for each JHTDB database...... 101

B.1 Summary of grid resolution and Re used for the channel flow simulations. ...... 156
B.2 Simulation test results for various grid sizes ...... 167
B.3 Simulation test results for the 512 × 256 × 256 grid for a variable number of processes ...... 167

List of Figures

1.1 Example of a fractal tree ("3D-V fractal tree"), in which smaller branches occur at increasing elevations as an idealization of a multiple-scale vegetation element interacting with a turbulent boundary layer. ...... 7

2.1 Decomposition of fractal geometry into resolved and subgrid components, as well as between r, β and b elements. Numerically, in the present work, the resolved portions are treated using the immersed boundary method (IBM), while subgrid portions are accounted for using RNS. The distribution of forces within the subgrid portion uses a filtered indicator function χ̃β (see text). ...... 26
2.2 Fractal tree canopy composed of planar V-trees. Shown are the two "resolved" generations g0 and g1. The solid black box indicates the physical domain for the simulations. The entirety of the tree as represented in the simulation is shown in Figure 2.1. ...... 33
2.3 Definitions of RNS reference regions used in the "V-tree" canopy simulations. ...... 36
2.4 Contour plot of instantaneous velocity magnitude along a constant y and constant x plane. ...... 37
2.5 Instantaneous velocity magnitude contours on horizontal planes at the mid-plane heights of branch generations g2, g3 and g4. ...... 38
2.6 Contour plot of instantaneous local force magnitude along a constant-x plane across the middle tree in the domain. ...... 38
2.7 Time series of hydrodynamic forces in the x-direction acting on the target tree for RNS model M2. ...... 51
2.8 Time series of hydrodynamic forces acting on one of the branches (element b1) of the target tree for model M2. The left axis shows the x-direction forces, while the right axis (lower group of lines) shows the y-spanwise direction forces. ...... 52
2.9 Time series of reference velocities measured from element b1 and its descendant β elements in the target tree, for model M2. ...... 52


2.10 Time history of drag coefficient obtained from RNS using models from Table 2.1. For models M1 – M3 the single, global drag coefficients are presented. In the bottom plot, the time-series for M4 are shown corresponding to each branch of the "V-tree", denoted as elements b1 and b2. ...... 55
2.11 Mean values (vertical bars) and standard deviations (error bars) of the computed RNS drag coefficients. ...... 56
2.12 Horizontally averaged (a) mean velocity and (b) turbulent shear stress profiles evaluated from RNS of flow over the "V-tree" canopy, for different RNS formulations M1-M4. ...... 57
2.13 Time history of the RNS drag coefficient obtained for several temporal averaging time-constants. ...... 58
2.14 Mean values of the measured drag forces decomposed into the resolved, subgrid, and total contributions. The error bars indicate the estimated standard error of the mean total drag force due to statistical convergence. ...... 63
2.15 Fractal tree canopy composed of "3D V-trees". The solid black box indicates the computational domain used in the simulations. ...... 65
2.16 Reference regions b and β used when applying RNS to the "3D V-tree" canopy simulations. ...... 67
2.17 Contours of instantaneous velocity magnitude from RNS of flow over a fractal "3D V-tree". Shown are contours along three vertical planes, one at constant y and two at constant x. ...... 68
2.18 Contours of instantaneous velocity magnitude on the branch mid-planes of generations g2, g3 and g4 in the region of unresolved branches of the fractal "3D V-tree". ...... 68
2.19 Instantaneous force magnitude contours along the mid-planes of generations g2, g3 and g4 (same as in Figure 2.18). ...... 69
2.20 (a) Mean streamwise velocity profile, averaged in time and horizontal directions, for simulation of boundary layer flow over a canopy of "3D V-trees" using RNS. A side view of such a tree is also shown in the profile as a reference. (b) Shear stress profiles, including mean turbulence shear stresses as well as dispersive and total stress. ...... 70
2.21 Time series of canopy averaged drag forces. Solid line: force on the resolved branches, dashed line: subgrid-scale force, and small-dashed line: total force. ...... 71
2.22 Vertical profiles of (a) the mean velocity, (b) frontal area density (per unit height) and (c) resulting mean drag force. ...... 72
2.23 Time series of computed forces on the three b elements of the sample tree, during a representative time period. ...... 73
2.24 Mean values (vertical bars) and plus/minus one standard deviation (error bars) of the computed RNS drag coefficients for each of the three branches. ...... 74


2.25 Total drag coefficient computed for an entire tree. The solid line (computed from RNS) is based on the total force exerted on the trees averaged across all trees (canopy averaged). The horizontal (dashed) line is the mean value obtained in a laboratory experiment [43]. ...... 75

3.1 An example CPS of turbulent boundary layer flow over a wall mounted cube. ...... 82
3.2 Schematic showing the domains containing the precursor simulation ("red") and the target simulation ("blue"). The intra-domain decomposition and respective MPI ranks are shown next to the domains. ...... 83
3.3 Schematic showing the splitting of the default MPI COMM WORLD communicator into two communicators associated with the red and blue domains for the CPS implementation. The bridge communicator used during the sampling operation is also shown. ...... 84
3.4 Domain setup for the CPS of a single fractal tree in turbulent duct flow. Note that the third branch for each branch cluster in the second generation is aligned along the line-of-sight direction of the viewing angle. ...... 88
3.5 Instantaneous streamwise velocity along a y-plane for the precursor (red) domain in the CPS of a single fractal tree in turbulent duct flow. ...... 89
3.6 Instantaneous streamwise velocity along three z-planes for the target (blue) domain in the CPS of a single fractal tree in turbulent duct flow. ...... 90
3.7 Instantaneous streamwise velocity in the upstream and downstream domains during the CPS of the developing, finite length wind farm. Figure prepared by Dr. Richard Stevens and reused from Stevens et al. [100]. ...... 91
3.8 Power output comparison of field data from the Horns Rev wind farm and CPS at two grid resolutions. Figure prepared by Dr. Richard Stevens and reused from Stevens et al. [100]. ...... 92

4.1 Schematic of the JHTDB indicating the logical layout of the remote clients, Web server, and the database cluster. Source JHTDB [55]. . . 98 4.2 Visualizations of the forced isotropic turbulence database using the Matlab client interface...... 114

5.1 Friction velocity Reynolds number during the channel flow simulation over the database time interval. ...... 125
5.2 Mean velocity profile in viscous units. Standard values of κ = 0.41 and B = 5.2 are used in the log-law (dashed line) for reference. ...... 126
5.3 Profiles of statistical quantities from the channel flow DNS. ...... 127
5.4 Streamwise power spectral densities at various y+ locations as a function of kx. ...... 128
5.5 Spanwise power spectral densities at various y+ locations as a function of kz. ...... 129


5.6 Q isosurfaces in a sub-region of the channel flow domain for three threshold values: Q=4, Q=16, Q=64. ...... 135
5.7 The joint PDFs of the normalized vortex volume with respect to the log-layer eddy scale and center of mass location for various Q thresholds: a) 0.5, b) 1.0, c) 2.0, d) 4.0, e) 8.0, f) 16.0, g) 32.0, h) 64.0. ...... 137
5.8 Marginal PDFs for the normalized vortex volume with respect to the log-layer eddy scale and center of mass. ...... 139
5.9 Marginal PDF for the normalized vortex volume with respect to the log-layer Kolmogorov scale. ...... 140
5.10 Joint PDFs for the surrogate vortex ellipsoid volume to the vortex volume ratio and center of mass location for various Q thresholds: a) 0.5, b) 1.0, c) 2.0, d) 4.0, e) 8.0, f) 16.0, g) 32.0, h) 64.0. ...... 141
5.11 Marginal PDF of the surrogate vortex ellipsoid volume to the vortex volume ratio. ...... 142

A.1 Domain setup, contours of instantaneous x component velocity, and time averaged streamlines for the wall mounted cubes test-case (streamlines originating near x/h=4 have been seeded at that location). ...... 149
A.2 Mean velocity profiles for the wall mounted cubes case for: (a) x component of velocity at y = 0, (b) x component of velocity at z = 0.5h, and (c) y component of velocity at z = 0.5h. In each figure, the horizontal arrow denotes the x component of the measured reference velocity. ...... 151

B.1 Instantaneous streamwise velocity (vertical contour planes) and vorticity fields (iso-surfaces) for case C3 from Table B.1. The iso-surfaces are colored according to the vertical height. ...... 157
B.2 Throughput of the data ingestion as a function of the grid size. ...... 168
B.3 Throughput of the data ingestion for grid size 512 × 256 × 256 as a function of the number of processes used in the simulation. ...... 168

D.1 Comparison between the numerical (lines) and analytical solutions (symbols) of (D.8) with Dirichlet boundary conditions and two values of k: a) k = 1.1180 and b) k = 18.901. ...... 181
D.2 Comparison between the numerical (lines) and analytical solutions (symbols) of (D.8) with Neumann boundary conditions and two values of k: a) k = 1.1180 and b) k = 18.901. ...... 181
D.3 PDF of the PPE Dirichlet boundary condition residual with the two normalization types. ...... 183

Chapter 1

Introduction

1.1 Overview

Accurate simulations of turbulent flows pose many challenges due to turbulence's inherent complexity and large number of degrees of freedom. Turbulence paradoxically possesses both chaotic and well-ordered characteristics and remains a major unsolved problem in physics even though the governing equations–the Navier-Stokes

(N-S) equations–have been known since the 1800s. The N-S equations for an incompressible fluid are expressed as

$$\frac{\partial \boldsymbol{u}}{\partial t} + \nabla\cdot(\boldsymbol{u}\otimes\boldsymbol{u}) = -\nabla p + \nu\,\nabla^{2}\boldsymbol{u} + \boldsymbol{f}, \qquad \nabla\cdot\boldsymbol{u} = 0 \qquad (1.1)$$


where $\boldsymbol{u}$ is the velocity vector field, $p$ the kinematic pressure, $\nu$ the molecular viscosity, and $\boldsymbol{f}$ a body force. Except for a few idealized flows, no analytical solutions are currently known, leaving numerical solution of the N-S equations as the only recourse. This is especially true for turbulence.

The numerical solutions to the N-S equations may be divided into two camps:

1) direct numerical simulations (DNS) where the N-S equations are solved directly and all turbulence scales are resolved or 2) a modeled approach where turbulence models are applied to capture unresolved turbulence scales. These two approaches are discussed further below.

DNS is a powerful tool for studying turbulent phenomena. In a DNS, all of the turbulent scales of motion are accurately resolved and no turbulence models are employed. As a result, data generated from a DNS are of high quality and provide researchers access to complete information with regard to the turbulent field [81]. The ability to compute all of the relevant degrees of freedom of a turbulent field, however, comes at a cost. DNS is both computationally expensive and typically produces very large data sets that must be properly managed in order to make practical use of the resulting data. This inherent expense is due to the requirement of resolving the dissipative scales (and near-wall viscous scales in wall bounded flows), which decrease as the Re is increased, thus requiring finer computational meshes. Furthermore, the

finer meshes restrict the allowable size of the computational time step in the numerical integration of the governing equations due to numerical stability and accuracy issues.


In isotropic turbulence, for example, the resulting overall complexity of a DNS scales as $Re^{11/4}$ (where $Re = UL/\nu$ is the Reynolds number, $U$ a characteristic turbulence velocity scale, and $L$ a large-scale length of turbulence), leaving researchers the ability to perform DNS only at modest Re. Wall bounded flows are even more expensive due to viscous wall interactions.
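As a rough sketch of where this estimate comes from (standard scaling arguments, stated here under the assumption that the grid must resolve the Kolmogorov scale $\eta$ and that the time step is limited by the Kolmogorov time $\tau_{\eta}$; other assumptions, such as an advective CFL limit, lead to slightly different exponents): the number of grid points per direction scales as $L/\eta \sim Re^{3/4}$, so the spatial degrees of freedom scale as $N^{3} \sim Re^{9/4}$, while the number of time steps per large-eddy turnover scales as $T/\tau_{\eta} \sim Re^{1/2}$, giving

$$\text{cost} \sim N^{3}\,\frac{T}{\tau_{\eta}} \sim Re^{9/4}\, Re^{1/2} = Re^{11/4}.$$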

For high Re or complex flows where DNS cannot be afforded, turbulence modeling is required. There are numerous approaches to turbulence modeling (see Pope [91]); however, the two most commonly used methodologies are Reynolds Averaged Navier-

Stokes (RANS) and large eddy simulation (LES). In RANS, the N-S equations are either time or ensemble averaged. This results in governing equations for the mean

fields where all of the effects of the turbulence on the mean field are modeled. In

LES, the N-S equations are spatially filtered, thereby separating the turbulence into resolved and unresolved (subgrid) scales. As a result, governing equations for the resolved field are produced and the aggregate effects of the subgrid scale turbulence are captured through models.

The LES filtering operation can be expressed as

$$\tilde{f}(\boldsymbol{x}) = \int G_{\Delta}(\boldsymbol{x}-\boldsymbol{x}')\,f(\boldsymbol{x}')\,d\boldsymbol{x}' \qquad (1.2)$$

where $f$ is a fully resolved field of interest (such as velocity), $\tilde{f}$ is the filtered version of this field, and $G_{\Delta}$ the filter kernel at scale $\Delta$ (commonly the grid scale). Applying

this filter to the N-S equations produces the filtered N-S equations, which may be expressed as

$$\frac{\partial \tilde{\boldsymbol{u}}}{\partial t} + \nabla\cdot(\tilde{\boldsymbol{u}}\otimes\tilde{\boldsymbol{u}}) = -\nabla \tilde{p} + \nu\,\nabla^{2}\tilde{\boldsymbol{u}} - \nabla\cdot\boldsymbol{\tau} + \tilde{\boldsymbol{f}}, \qquad \nabla\cdot\tilde{\boldsymbol{u}} = 0. \qquad (1.3)$$

In the filtered N-S equations an additional term $\boldsymbol{\tau}$–the subgrid-scale (SGS) stress tensor–has been introduced. This quantity arises from filtering the non-linear terms of the N-S equations and is expressed as $\boldsymbol{\tau} = \widetilde{\boldsymbol{u}\otimes\boldsymbol{u}} - \tilde{\boldsymbol{u}}\otimes\tilde{\boldsymbol{u}}$. The generation of this additional term is a result of the closure problem [78]. A detailed review of SGS closure models can be found in Meneveau and Katz [78]. In addition to modeling subgrid turbulence scales, flows containing unresolved geometric features also require modeling. These subgrid-scale geometric features may be represented in the resolved field equations as a momentum sink imposed by the body force $\tilde{\boldsymbol{f}}$ [19, 21, 20, 43, 42, 77].
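As one concrete illustration of such a closure (a standard textbook example given here for reference; it is not necessarily the specific SGS model used in this work), the Smagorinsky eddy-viscosity model expresses the deviatoric part of the SGS stress as

$$\tau^{d}_{ij} = -2\,(C_s\Delta)^{2}\,|\tilde{S}|\,\tilde{S}_{ij}, \qquad \tilde{S}_{ij} = \tfrac{1}{2}\left(\partial_j \tilde{u}_i + \partial_i \tilde{u}_j\right), \qquad |\tilde{S}| = \sqrt{2\,\tilde{S}_{ij}\tilde{S}_{ij}},$$

where $C_s$ is a model coefficient. In the dynamic model [39], $C_s$ is not prescribed but is determined during the simulation from the Germano identity relating stresses at the grid and test-filter scales; this is the same idea that RNS borrows to determine the drag coefficient dynamically.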

The first of two parts of this thesis is devoted to the modeling of turbulent flow interacting with multiscale objects which possess both resolved and unresolved geometric features. The methodology is developed and applied to study turbulence generated by fractal trees using LES. In addition, the first part also describes a new methodology for imposing accurate inflow conditions in periodic domains for turbulent flows, which are vital for correct numerical solutions. The second part of this thesis is devoted to the development of data-intensive computing methodologies for

the large-scale data problem in DNS. These methodologies, along with an application to channel flow turbulence, are presented and discussed. In the following sections the background and motivation for both parts are presented.

1.2 Background and Motivation

1.2.1 Multiscale Modeling

Fluid flows involving multiple-scale boundaries can be found in nature, such as wind flow through tree canopies, over rough terrain and natural landscapes, and flow through porous media. Important transport processes of mass, momentum and energy can occur at the interface between fluid and such bounding surfaces. Simulation of

fluid flow in such conditions involves challenges due to the typically large ranges of spatial and temporal scales that must be resolved. As a result, in most applications it is required to employ subgrid-scale models to represent the small scale features while resolving the large scale problem on a computational mesh. In the bulk of turbulent

flow, this is the Large Eddy Simulation (LES) approach. Along the boundaries, additional modeling is required if the boundaries include large ranges of length-scales with topological features that occur at subgrid-scales, when viewed at the resolution of the LES. The present work is devoted to modeling turbulent flow over multiple-scale, tree-like objects.

Fractals provide a useful idealization of multiple-scale objects since they may be

described using simple geometric rescaling rules [74, 6]. Basic fractal objects can serve as surrogates for more random and complex multi-scale objects often found in nature [85], while remaining tractable for systematic study. Fractals have been used to model trees, see e.g. de Langre [26], and the fractal dimension of trees has been found to be mostly between 1.45 and 1.74 [14].

A series of papers studying turbulence in the wake of fractal objects (Staicu et al. [99], Hurst and Vassilicos [51], and Laizet and Vassilicos [64]) shows that there exists a strong coupling between the geometric (multi-scale) features of the fractal object and the turbulence properties observed in the wake downstream. Therefore, in order to accurately simulate momentum transport and turbulent flow dynamics in the presence of fractal objects, it appears desirable to retain relevant information about the multi-scale features of the fractal geometry, while remaining within computationally feasible and affordable approaches.

Figure 1.1 shows an example of a fractal tree to be used in this study. It shares with real vegetation elements the preponderance of small scales (smaller branches) at increasing elevations, and thus idealizes a fractal vegetation element interacting with a turbulent boundary layer. For these reasons this particular tree geometry has also been studied in a laboratory experiment in which the total drag force on such a fractal tree placed in a canopy has been measured [43] (a related study has measured eddy-length scales in the wake of a single such tree [4]). We point out that the simulated fractal object only represents certain features of real trees and, as stated


Figure 1.1: Example of a fractal tree ("3D-V fractal tree"), in which smaller branches occur at increasing elevations as an idealization of a multiple-scale vegetation element interacting with a turbulent boundary layer.

already, must be regarded as a particular idealization. Unlike real trees, the simulated objects are rigid, i.e. they do not 'sway' in the wind; they exhibit deterministic scale-invariance, i.e. each generation is an exact geometric replica; there are no leaves that would break scale-invariance at the smallest scale; etc. With these limitations in mind, we proceed to focus on studying interactions between such objects and turbulent flow.

One of the most important aspects of the interactions between flow and such objects is the associated momentum exchange, the drag force. Classical methods for characterizing the momentum transport due to canopies have been based on drag models using a leaf area index (LAI) description [93, 97, 16, 37], log-law based models that use a roughness length scale to parameterize the entire vegetation canopy [93,

1], and models that consider the canopy as a porous medium characterized by a

prescribed porosity [69]. In the application of these classical methods to the simulation of fractal canopy flows, two notable deficiencies arise: 1) the characterization of the multiple-scale geometrical features by a single length scale, or elevation-dependent length scale (in the case of LAI), and 2) the requirement that model parameters must be known a priori for specification in parameterizations of the drag forces.

Renormalized numerical simulation (RNS) was first introduced in Chester et al.

[21] as a technique for simulating flows over partially resolved multiple-scale (fractal) objects in high-Reynolds number flows. The RNS methodology is a downscaling strategy in which drag forces due to the large, resolved scales are obtained directly from the explicit representation of the resolved objects in the computational domain

(e.g. an immersed boundary method - IBM) and the resolved-scale information is renormalized appropriately to predict the drag forces due to the small, unresolved scales. During the renormalization, geometric and dynamic similarity is invoked and the resulting small-scale information is "repeatedly fed back into the simulation of the large-scale problem" [19]. This recursive, iterative procedure is carried out for the duration of the simulation in order to capture the effects of both the resolved and unresolved geometry. For applications to high Reynolds number flow, the force is modeled using a form drag representation for both the resolved and unresolved forces. These modeled forces depend on a drag coefficient which is assumed to be scale invariant due to geometric similarity and high Reynolds number flow. The drag coefficient need not be specified a priori, since it is dynamically evaluated in a

manner analogous to the dynamic subgrid scale model for LES [39].

In the first applications of RNS (i.e. Chester et al. [21] and Chester and Meneveau [20]), a classic drag model was used that assumed that the instantaneous drag coefficient for each of the small branches is the same as that at the large branches.

Note, however, that in a time-varying turbulent flow, it is more natural to expect only the time-averaged drag coefficient to be the same at various scales, assuming geometric similarity of the (fractal) boundary and (complete) dynamical similarity at high

Reynolds numbers. There is little basis for the stronger assumption of instantaneous similarity. Therefore, it makes sense to develop and test an RNS method that involves time averaging so it can be based on the weaker assumption of similarity between scales in an averaged sense. Thus, as the main purpose of this work, we develop and apply a temporal averaging technique in order to help justify the similarity assumption when using a classical drag model. In addition, we also consider effects of spatial inhomogeneity, whereas the methodology of Chester et al. [21] can only be applied to homogeneous canopies. In the current work, we introduce a local model that may be used to treat heterogeneous tree canopies and complex flow structures interacting with irregular trees. In Chester and Meneveau [20], the authors extended the RNS methodology of Chester et al. [21] for the application of flow over non-planar fractal trees. In their approach, they used the cross-flow principle [65] to preserve kinematic similarity with branches having irregular orientation with respect to the flow direction.

The approach decomposes the drag forces into normal and axial forces with respect to

the tree branches. The primary drawback of this approach arises from the complexity of the implementation, which requires a "low-level", branch-based description of the imposed drag forces. In this work, we present a generalized RNS framework that describes the imposed drag forces in terms of branch groups. These groups, which we call RNS elements, may be constructed in such a way that kinematic similarity is implicitly preserved for non-planar geometry. Moreover, this generalized framework has the additional advantage that it may readily be applied to non-tree fractal geometry such as fractal grids. As part of evaluating the RNS framework presented in this work, we distinguish between explicit and implicit time formulations of RNS. Though not explicitly presented, an implicit time formulation was mentioned in Chester et al.

[21] in which the authors observed stability issues with the implicit formulation which were not found with the explicit formulation. In this work we test whether similar stability limitations with the implicit formulation are observed in the current RNS framework.

In addition to correct modeling strategies, inflow conditions are one of the key ingredients for producing accurate results from numerical simulations. Since analytical realizations of turbulent fields are not known a priori, enforcing correct and accurate inflow conditions for turbulence simulations poses a challenging problem. Techniques such as generating synthetic fields from Fourier modes [24], reduced-order methods such as proper orthogonal decomposition [32, 56], or the superposition of synthetic eddies [90] have been proposed. Another approach is the generation of inflow data

using precursor simulations [72, 35]. In this approach a precursor simulation is performed in which an inflow data library is stored to disk. These data are then loaded and imposed as inflow conditions during the target simulation. Depending on the problem, one approach may prove more attractive than another. Techniques which modulate inflow conditions or impose modeled turbulent structures will require time for the input signature of the imposed inflow field to decay in order to minimize the influences of the inflow technique on the simulation results. In the case of simulations with periodic boundary conditions (e.g. atmospheric boundary layer or channel

flow), the precursor simulation method provides an idealized approach since the precursor simulation may be performed directly without any inflow modulation and the turbulent field can evolve until it becomes fully developed.

In this work, a concurrent precursor simulation has been developed which allows a precursor simulation to be performed concurrently with the target turbulent simulation. Conceptually, the concurrent precursor simulation (CPS) is equivalent to a standard precursor simulation (SPS) mentioned previously. In both cases, a full turbulent simulation is performed generating inflow data for a target simulation. These data are then sampled from the precursor simulation (typically a subregion or plane) and transferred to the target simulation as an inflow condition. The major differences between the two arise in practice. The SPS must be performed before the target simulation is conducted, which extends the overall time of the simulation. Conversely, the CPS performs the precursor and target simulations concurrently. Also, the SPS


stores the sampled precursor data to disk. These data are then read from disk during

the target simulation. Since disk I/O bandwidth is significantly smaller than that of

modern network interconnects and random access memory (RAM), simulation time

for both the precursor and target simulations is hindered for the SPS when compared to the CPS approach. For the CPS, sampled inflow data is transferred directly

from RAM of the precursor simulation to the target simulation using memory copies

thereby minimizing the overhead of the sampling operation. Moreover, since in the

CPS the simulations are performed concurrently, there is no need to “recycle” the

precursor data as may be required in the SPS once a target simulation extends beyond

the temporal extent of the stored inflow data library. This eliminates the introduction

of artificial “recycling” time scales into the target simulation.
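To make the parallel layout concrete, the following sketch shows one way the precursor ("red") and target ("blue") domains can share a single MPI job: the global communicator is split into two intra-domain communicators, and an inter-communicator serves as the bridge used when copying sampled inflow planes from precursor ranks to the corresponding target ranks. This is a minimal illustration only, assuming an even split of ranks and a made-up plane size; the actual CPS implementation described in Chapter 3 may organize its communicators and data exchange differently.

```c
#include <mpi.h>

/* Minimal CPS-style communicator setup (illustrative sketch, not the thesis code).
 * Run with an even number of ranks; ranks 0..n/2-1 form the precursor ("red")
 * domain and the remaining ranks form the target ("blue") domain.              */
int main(int argc, char **argv)
{
    MPI_Comm domain_comm, bridge_comm;
    int world_rank, world_size, local_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int color = (world_rank < world_size / 2) ? 0 : 1;   /* 0 = precursor, 1 = target */

    /* Split the world communicator into the two per-domain communicators. */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &domain_comm);
    MPI_Comm_rank(domain_comm, &local_rank);

    /* Build the "bridge" inter-communicator linking the two domains; the remote
     * leader is rank 0 of the other domain in MPI_COMM_WORLD numbering.        */
    int remote_leader = (color == 0) ? world_size / 2 : 0;
    MPI_Intercomm_create(domain_comm, 0, MPI_COMM_WORLD, remote_leader,
                         /* tag = */ 99, &bridge_comm);

    /* Each time step: precursor ranks send a sampled inflow plane from memory,
     * matching target ranks receive it into their inflow buffer (size made up). */
    enum { PLANE_SIZE = 128 * 64 * 3 };
    double plane[PLANE_SIZE] = { 0.0 };

    if (color == 0) {
        /* ... fill `plane` from the precursor velocity field here ... */
        MPI_Send(plane, PLANE_SIZE, MPI_DOUBLE, local_rank, 0, bridge_comm);
    } else {
        MPI_Recv(plane, PLANE_SIZE, MPI_DOUBLE, local_rank, 0, bridge_comm,
                 MPI_STATUS_IGNORE);
        /* ... apply `plane` as the inflow condition in the target domain ... */
    }

    MPI_Comm_free(&bridge_comm);
    MPI_Comm_free(&domain_comm);
    MPI_Finalize();
    return 0;
}
```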

1.2.2 Data-intensive Computing

The origins of DNS can be traced back to 1972 to the work of Orszag and Patterson

[87] in which a DNS of incompressible flow was performed on a 32³ mesh simulating

isotropic, homogeneous turbulence. Though the simulation was performed only on a

small 32³ mesh, it laid the foundations for the usage of spectral methods [81] crucial for highly accurate simulations of turbulence. During the 1970s and early 1980s, most

DNS were limited to isotropic, homogeneous turbulence or at most flows containing one inhomogeneous direction. Simulations of wall bounded flows were not possible at the time due to the additional computational costs associated with resolving the near


wall viscous scales. It wasn’t until 1987, that the first DNS of plane channel flow

was conducted by Kim et al. [62] at Reτ = 180, where Reτ is the Reynolds number based on the friction velocity and half-channel height. In the decades following, increasing computing power afforded researchers the possibility of simulating ever increasing Re and domain sizes. One consequence of the increasing simulation sizes is the growing data volumes produced by the DNS. For instance, in 2002, the world's largest DNS of isotropic turbulence was conducted by Yokokawa et al. [111]. The simulation was performed on a computational mesh of 4096³, producing an estimated

2 TB per snapshot of velocity and pressure fields. Recently, work has begun by Yeung and Sreenivasan [109] in which an isotropic DNS on a mesh of 8192³ is performed.

Once complete, their simulation will break the long-standing record of Yokokawa et al. [111], thus producing an even larger dataset of an estimated 16 TB per output snapshot. With regard to wall bounded flows, the largest channel flow DNS was recently completed at Reτ = 5200, producing almost 4 TB per snapshot of velocity and pressure fields. This simulation is approximately 10 times larger than the previous largest channel flow DNS of Hoyas and Jiménez [49], which generated 25 TB of raw data.
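These snapshot sizes follow directly from the mesh size, the number of stored fields, and the floating-point precision. As a rough consistency check, assuming four stored fields (three velocity components plus pressure) in 8-byte double precision (an assumption made here for illustration rather than a detail stated in the text):

$$4096^{3} \times 4 \times 8\ \text{B} \approx 2.2\ \text{TB}, \qquad 8192^{3} \times 4 \times 8\ \text{B} \approx 17.6\ \text{TB},$$

which is of the same order as the quoted estimates of roughly 2 TB and 16 TB per snapshot.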

As computing power continues to allow larger simulations to be performed, the data generated from large-scale DNS will only continue to grow. This generation of ever increasing turbulence datasets will continue to put a higher demand on the data storage scheme used for DNS. A data storage approach that is not only sufficient in

storage capacity but also amenable to post-simulation analytics is important. Due to the complexity of the turbulent fields, raw velocity and pressure data provide basic information that must be processed when studying the structure, organization, and topology of the underlying turbulence. Therefore, secondary analyses are required where post-simulation analytics can be performed to study the resulting data. As a result, efficient data layout and high I/O throughput are important qualities of the data storage scheme. Furthermore, the ability to easily perform data manipulations within the data server is important in order to remove low-level details of the data access from the analysis application. Classical non-persistent, flat-file data storage schemes or array-oriented storage schemes such as NetCDF and HDF5 do not easily address these needs. One promising approach, however, to this large-scale data problem is to utilize database technology as the storage medium. An example of this approach is the Johns Hopkins Turbulence Databases, in which the spatio-temporal data from three large-scale DNS of forced isotropic, magnetohydrodynamic, and channel flow turbulence are stored in a public database [89, 68]. For a more detailed survey of data management systems see Appendix B.2.

The Johns Hopkins Turbulence Databases (JHTDB) provide public access to large-scale, turbulent DNS data [89, 68]. These data reside within distributed SQL databases and are made available to anyone in the world, using Web services over the internet.

Remote clients may easily access the data using client software which sends and receives data using the Simple Object Access Protocol (SOAP). In addition to primary


field variables (e.g. velocity, pressure, magnetic field, etc.), secondary calculations

may be performed in-situ within the database cluster to obtain spatial derivatives of the

primary fields, perform particle tracking, and other operations. The JHTDB project

provides client libraries for C, Fortran, and Matlab including example codes that may

be easily extended and adapted for personal research. In addition to the client library,

any programming language that provides SOAP functionality (e.g. Python) may be

used directly with the Web service interface. The JHTDB currently contains three

DNS turbulent datasets: forced isotropic, forced magnetohydrodynamic, and channel

flow turbulence. The databases contain 27 TB, 56 TB, and 48 TB of data, respectively.
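To illustrate the access pattern from a remote client, the sketch below shows a batched point query: a set of (x, y, z) locations and a time are sent in one request and interpolated velocities are returned. The function name, prototype, dataset string, and stub body below are hypothetical stand-ins introduced only for illustration; the actual JHTDB client libraries define their own (documented) function names and signatures.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a JHTDB client call (illustration only; the real C
 * library's names and signatures differ and are documented with that library).
 * Conceptually, the points and time are sent over SOAP and the Web service
 * returns velocities interpolated server-side inside the database cluster.    */
static int jhtdb_get_velocity(const char *token, const char *dataset, float time,
                              int npoints, float points[][3], float velocities[][3])
{
    (void)token; (void)dataset; (void)time; (void)points;
    /* Stub: a real client would issue the SOAP request here. */
    memset(velocities, 0, (size_t)npoints * 3 * sizeof(float));
    return 0;
}

int main(void)
{
    const char *token   = "your-auth-token";   /* issued by the JHTDB project  */
    const char *dataset = "channel";           /* placeholder dataset name     */
    enum { N = 2 };
    float points[N][3] = { {0.1f, 0.0f, 0.2f}, {3.0f, 0.5f, 1.0f} };
    float velocities[N][3];

    /* One remote call evaluates the velocity at all N points at t = 1.0. */
    if (jhtdb_get_velocity(token, dataset, 1.0f, N, points, velocities) != 0)
        return 1;

    for (int i = 0; i < N; ++i)
        printf("u = (%g, %g, %g)\n",
               velocities[i][0], velocities[i][1], velocities[i][2]);
    return 0;
}
```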

The forced isotropic turbulence database is the first database generated for the

JHTDB [89, 68]. This database has been used extensively throughout the research community, including by those who are not members of the JHTDB project. Several examples of these studies include works by Lüthi et al. [73], Gungor and Menon

[47], Holzner et al. [48], Wu and Chang [108], and Cardesa et al. [15]. The second database added to the JHTDB is the forced MHD turbulence database, which has been used, for example, in the study of the breakdown of flux freezing in MHD [33]. The latest database constructed is the channel flow database, which was made publicly available recently [45]. The generation of these data along with their ingestion into the JHTDB are discussed in Chapter 5 of this work.

Direct numerical simulations of turbulent channel flow have played an important role in the study of wall bounded turbulence. A brief history of such DNSs along


with sample cases where the DNS data are used are given below. As mentioned

previously, the first channel flow DNS was performed by Kim et al. [62]. In their work

they compare a large number of turbulence statistics to experimental data giving the

first glance of the applicability of DNS for wall bounded flows. Moreover, the wall

normal, velocity-vorticity formulation presented in their work has been a foundational

methodology and has been used extensively in subsequent DNSs [84, 29, 30, 31,

66]. In a follow up paper, Kim [60] utilized the data from Kim et al. [62] and studied

pressure fluctuations in a channel. Following this, [61] performed a DNS at Reτ = 395.

In this work the authors showed that velocity and vorticity spectra near the channel centerline exhibit local isotropy as predicted by Kolmogorov (1941). These data were then used by Blackburn et al. [10], who studied the topology in turbulent channel flow using the invariants of the velocity gradient tensor. In a similar effort, Jeong et al. [54] utilized the Reτ = 180 data from Kim et al. [62], in which near wall coherent structures were studied using the eduction scheme from Jeong and Hussain [53]. The work of

Moser et al. [84] brought the first generally available datasets for Reτ = 180, 395, and

595. Their work indicated the absence of low-Re effects in the velocity fluctuations for Reτ > 395. In del Álamo and Jiménez [28], extended domains compared to the

Moser et al. [84] cases were utilized for Reτ = 180 and 550 in order to study large scale anisotropic turbulent structures. These large scales could not be captured with the smaller domains used in Moser et al. [84]. Following this, del Álamo and Jiménez [29] studied the spectra of these large scales. Furthermore, statistical datasets were made


available for these simulations. Continuing the analysis of large scale structures, del

Álamo et al. [30] performed DNS for Reτ up to 1900. In that work large domains were used to capture large scale energetic structures while shorter domains with more refined grids were used to study overlap layer structures. The work of Hoyas and

Jiménez [49] produced a channel flow DNS for Reτ = 2003. In their analysis, velocity scalings were compared to the data from del Álamo and Jiménez [29] and del Álamo et al. [30]. Statistical datasets for these simulations were also made publicly available.

Data from Hoyas and Jiménez [49] at Reτ = 934 were compared against experimental data by Monty and Chong [82], who found excellent agreement with the velocity statistics and energy spectra. Panton [88] then used data produced by del Álamo and Jiménez [29], del Álamo et al. [30], and Hoyas and Jiménez [49] to study Re

effects on vorticity fluctuations. This work demonstrated two inner and one outer

scaling for the mean square of vorticity fluctuations. Also in 2009, Klewicki et al.

[63] utilized data sets from Moser et al. [84], Kawamura et al. [59] (at Reτ = 636),

and Hoyas and Jiménez [49] to study the logarithmic behavior of velocity statistics.

These data were used to develop a theory which predicts the von Kármán constant.

A final example of DNS data usage is Gao et al. [38], in which data from del Álamo

et al. [30] and Moser et al. [84] are used to study statistical characteristics of vortex

cores in wall bounded flows.

Datasets mentioned in the previous paragraphs for Reτ = 180, 550, 934, 2003

are publicly available from the UPM Fluid Dynamics Group [105] and ICES [52].


In addition to these data, a dataset for Reτ = 5200 is also being generated [66].

Although these data are publicly available, they are limited to statistical profiles and a small number of coarsely spaced (in time) velocity fields for Reτ up to 934 [105].

The statistical profiles, for instance, are very useful for validation or for comparing data from differing sources and are easily downloaded. The fields, however, are challenging to obtain. The recipient must ship hardware to the location of the data where the

fields may be copied and stored to disk. The hardware is then shipped back to the recipient [52]. Once the data are received, the data must be transformed from spectral space to physical space, thus requiring machines with a sufficient amount of memory, especially for the larger Reτ datasets, to perform the global transformations

[52]. The complex procedures involved in processing the data are also prone to user and equipment error.

In order to allow easy access to turbulent channel flow data, a channel flow DNS of

Reτ = 1000 is performed in this work (Chapter 5) and the resulting velocity and pressure fields are transferred and ingested into the JHTDB (data ingestion is performed by Kalin Kanov). Using the publicly available Web services and client interfaces, researchers may easily retrieve and interact with the data. The construction and implementation of the channel flow database for the JHTDB is discussed in Chapter 4.

Also presented in Chapter 4 is the Matlab client interface developed by the author for the JHTDB Web services. The Matlab client interface provides the ability to interact with the database from within a Matlab session. This ability allows clients to utilize Matlab intrinsic functions such as plotting tools, fast Fourier transform functions, eigenvalue/eigenvector procedures, etc. The details of the implementation and several examples are also discussed in Chapter 4.

In Chapter 5, the channel flow DNS for the JHTDB is presented. Also included in the chapter is a study of coherent vortical structures in wall bounded flows. In the study, a vortex identification scheme (the Q criterion) is used to identify vortical structures within the channel flow data. A brief discussion of vortex identification schemes is given below.

The goal of vortex identification schemes is to correctly extract vortex regions that contain sufficiently strong, well-organized rotation about a filament axis. In general, using the vorticity field directly is inadequate since, in addition to containing well-defined vortical filaments with spiraling patterns, it may also contain dominant background vorticity, as in shear flows, or other high vorticity regions, such as near walls in wall bounded flows, which do not necessarily guarantee vortical motion. To properly identify and track vortical structures, an accurate definition of a vortex core and a discrimination method are needed. Though there is currently no single accepted mathematical definition of a vortex core, several attributes that a vortex core should possess are generally sought; these include:

1. Dominant rate of rotation with respect to rate of strain

2. Local pressure minimum


3. Compact swirling motion about an axis

Numerous definitions have been presented in the past which attempt to satisfy one or more of the attributes listed above. Several of these definitions, based on the local kinematics of the velocity gradient tensor, are: 1) the Q criterion [50], which defines regions with relatively higher rotation rates than strain rates; 2) the λ2 criterion, which

Jeong and Hussain [53] states, “corresponds to the pressure minimum in a plane, when contributions of unsteady irrotational straining and viscous terms in the Navier-

Stokes equations are discarded”; 3) ∆ criterion [22] which is based on finding regions having intense rotation based on complex eigenvalues of the velocity gradient tensor;

4) the λci (swirl strength) criterion [112], which like the ∆ criterion determines regions of intense rotation, but limits the definition to regions with a complex conjugate eigenpair of the velocity gradient tensor; and 5) the enhanced λci criterion [17], which extends the λci criterion to further restrict identified regions to those possessing radially compact swirling patterns.

In the work of Chakraborty et al. [17], all of these methods are compared. It is shown in their work that, in general, these methods identify similar vortical structures in most turbulent flows, although in some instances discrepancies between the methods are observed. For the ∆ criterion, Jeong and Hussain [53] found vortex structures to be more noisy than with the Q or λ2 methods; Chakraborty et al. [17] also observed noisy vortex boundaries for the ∆ criterion compared to the other methods. In forced isotropic flow, Chakraborty et al. [17] report a significantly lower value for vortex region overlap between the ∆ and λci criteria (73.55% overlap) than for the Q and λ2


criteria compared with the λci criterion which both indicate a +99% overlap. Also,

Jeong and Hussain [53] observed differences in identified vortex structures between

the Q and λ2 criteria for “a conically symmetric vortex with axial velocity”; this

difference is also noted by the swirling jet analysis of Chakraborty et al. [17] where it

is shown to arise from differences in the spiraling compactness thresholds implicitly enforced

by the two criteria.

From a pragmatic point of view, the eigenvalue/eigenvector calculations of the λ2, λci, and enhanced λci criteria make them more expensive to compute than the Q and ∆ criteria. These differences in computational cost become most apparent when performing vortex analyses for large domains and/or for many ensembles. Between the Q and ∆ criteria, it has been argued previously that the Q criterion is superior. As a result, in this work the Q criterion is used as the vortex identification method for the analysis in §5.4. In this analysis, vortex regions are identified and assembled using the pointwise vortex definition provided by the Q criterion. The corresponding size, shape, and location of the vortical structures are studied and discussed.
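To make the pointwise definition concrete, the following minimal sketch evaluates $Q = \tfrac{1}{2}(\Omega_{ij}\Omega_{ij} - S_{ij}S_{ij})$ at a single point from a given velocity-gradient tensor. It is an illustration only (not the analysis code of Chapter 5); in practice the velocity gradients would be obtained from the DNS fields, and the thresholded points would then be grouped into connected vortex regions.

```c
#include <stdio.h>

/* Evaluate the Q criterion at one point from the velocity-gradient tensor
 * A[i][j] = du_i/dx_j:  Q = 0.5 * (||Omega||^2 - ||S||^2), where
 * S = 0.5 (A + A^T) is the strain-rate tensor and Omega = 0.5 (A - A^T) the
 * rotation-rate tensor.  Q > 0 marks rotation-dominated regions (Q criterion [50]). */
static double q_criterion(const double A[3][3])
{
    double s2 = 0.0, w2 = 0.0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double S = 0.5 * (A[i][j] + A[j][i]);   /* strain-rate component    */
            double W = 0.5 * (A[i][j] - A[j][i]);   /* rotation-rate component  */
            s2 += S * S;                            /* ||S||^2     = S_ij S_ij  */
            w2 += W * W;                            /* ||Omega||^2 = W_ij W_ij  */
        }
    }
    return 0.5 * (w2 - s2);
}

int main(void)
{
    /* Example: solid-body rotation about z plus weak strain (made-up numbers). */
    const double A[3][3] = { { 0.1, -1.0,  0.0 },
                             { 1.0,  0.1,  0.0 },
                             { 0.0,  0.0, -0.2 } };
    double Q = q_criterion(A);

    /* A vortex point would be flagged when Q exceeds a chosen threshold. */
    printf("Q = %g -> %s\n", Q, Q > 0.0 ? "rotation-dominated" : "strain-dominated");
    return 0;
}
```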

1.3 Thesis Outline

In this work, modeling and computing strategies for turbulence simulations are presented. The first part of this work (Chapters 2-3) is devoted to multiscale modeling

methodologies for LES. In Chapter 2, RNS, which provides a dynamic, downscaling strategy for modeling subgrid scale geometry in LES, is discussed. Following in Chapter 3 is the presentation of the concurrent precursor simulation (CPS) methodology. This methodology provides a novel method for generating accurate inflow conditions for multiscale turbulence simulations. The second part of this work (Chapters 4-5) is focused on data intensive computing strategies for DNS. Discussed in Chapter 4 is an overview of the JHTDB along with the design and construction of the new channel

flow database. Also included in the chapter is a presentation of the Matlab Web service client interface. The channel flow DNS used to generate the

JHTDB channel flow database is presented in Chapter 5. The implementation of the channel flow DNS along with a study of coherent vortical structures are discussed. The concluding remarks of this work are presented in Chapter 6. Additional discussions can be found in the appendices. For example, in Appendix A, a validation case for the

LES code LESGO used in the RNS work is presented. The next three appendices discuss topics from the second part of this work. Appendix B presents a data-intensive computing approach for performing DNS in which a turbulence database (as used in the JHTDB) may be generated during runtime of the DNS. Following this presentation is Appendix C, in which the B-spline collocation method employed in the channel

flow DNS code (PoongBack [66]) used in this work is discussed. In the final appendix

(Appendix D) is a discussion on the pressure solver implemented by the author in

PoongBack; several validation and verification studies are therein presented.

Part I

Multiscale Modeling

Chapter 2

Renormalized Numerical

Simulation1

2.1 Introduction

In this chapter, the RNS methodology first introduced in Chapter 1 is discussed.

The proposed update to the RNS framework is presented in §2.2. An overview of the LES implementation used for the RNS is given in §2.3. Before introducing and comparing the various formulations, a particularly simple tree geometry is presented on which the various formulations will be tested. In this tree geometry all the branches reside on a plane perpendicular to the flow direction. The geometry and simulation set-up for this test case are presented in §2.4. The various RNS formulations are

¹Portions reused with permission from J. Graham and C. Meneveau, Phys. Fluids 24, 125105. Copyright 2012, AIP Publishing LLC.


described in §2.5. In §2.6 we present predicted forces, drag coefficients, and other aspects characterizing the performance of the various formulations. Based on the comparisons among them, we provide arguments about which can be considered the best option. It is then applied to simulations of a fully three-dimensional fractal tree in §2.7, and the total predicted drag force is compared to an available experimental measurement. A summary and conclusions are presented in §2.8.

2.2 RNS Framework

It is convenient to represent unresolved or subgrid drag elements by a momentum sink that extracts a prescribed amount of linear momentum from the flow. In high-Re

flows over blunt objects, the momentum sink or hydrodynamic drag force acting on the fluid is expressed in terms of a quadratic law based on a representative velocity

$\widetilde{U}$ according to

$$ F_D = -\frac{1}{2}\,\rho\, c_d\, A\, |\widetilde{U}|\,\widetilde{U}. \qquad (2.1) $$

Here, $\rho$ is the fluid density, $c_d$ is the drag coefficient, and $A$ a representative surface area. While the representative area may be defined based on resolved geometric features and the reference velocity $\widetilde{U}$ measured directly, the parameter $c_d$ associated with complex multiple-scale structures is not typically known a priori. For self-similar or fractal-like objects, RNS may be used to determine $c_d$ in a dynamic fashion during the simulation.


Figure 2.1: Decomposition of fractal geometry into resolved and subgrid components, as well as between r, β and b elements. Numerically, in the present work, the resolved portions are treated using the immersed boundary method (IBM), while subgrid portions are accounted for using RNS. The distribution of forces within the subgrid portion uses a filtered indicator function χβ (see text).

The RNS framework is based on decomposing the self-similar object into resolved and subgrid-scale (unresolved) geometric features. An illustration of this decomposition applied to a particular fractal tree is shown in Figure 2.1. The large scales near the base of the tree are resolved with the immersed boundary method (IBM), while the remaining scales are modeled using a drag force parameterized with the RNS-determined drag coefficient. The resolved and subgrid-scale regions are decomposed into RNS elements, i.e. geometric components such as branches or branch clusters.

The smallest of the explicitly resolved components are grouped into what will be called “r elements”, while the unresolved components are grouped into “β elements”, each containing one of the largest unresolved components as well as all of its descendants. Furthermore, “b elements” are defined as being composed of the union of any given r element and all of the β elements that are direct descendants of that r element.

The core of the RNS methodology is the statement that the total force associated


with a b element must be equal to the forces from its constituent r and β elements.

The total fluid force acting on the b elements (Figure 2.1) is given by

$$ F_{b,i}(t) = \int_{S_b} \left( -p\,\delta_{ij} + \tau^{v}_{ij} \right) n_j \, dS \qquad (2.2) $$

where $p$ is the pressure, $\tau^{v}_{ij}$ is the viscous stress tensor, $S_b$ is the wetted surface of the entire b element and $n_i$ is the unit normal to this surface. We can decompose the $S_b$ surface into the surfaces corresponding to the r and β elements, and use $F_{r,i}(t) = \int_{S_r} (-p\,\delta_{ij} + \tau^{v}_{ij})\, n_j\, dS$ and $F_{\beta,i}(t) = \int_{S_\beta} (-p\,\delta_{ij} + \tau^{v}_{ij})\, n_j\, dS$ to write

$$ F_b(t) = \sum_{r \in b} F_r(t) + \sum_{\beta \in b} F_\beta(t). \qquad (2.3) $$

This (trivial) identity provides a self-consistency constraint that is analogous to the

Germano identity [39, 78] relating momentum fluxes filtered at various scales. The

identity becomes useful once certain modeling assumptions are introduced for Fb and

Fβ, while the resolved forces, Fr, can be obtained directly from the forces acting on

the numerically resolved portions of the object. In our case, when using the immersed

boundary method, the forces are provided directly during application of IBM. The

force associated with the unresolved elements, Fβ, is expressed using a generic form

drag model according to

$$ F_\beta(t) = -\,c_{d,\beta}(t)\,\Gamma_\beta(t) \qquad (2.4) $$

where $c_{d,\beta}$ is the β elements' drag coefficient, and $\Gamma_\beta = \rho\,|V_\beta|\,V_\beta\, A_\beta/2$. Here $\rho$ is the

fluid density, and Vβ and Aβ are the reference velocity and representative frontal area, respectively, of the β element. The reference velocity is computed by averaging the

fluid velocity over a predefined volume enclosing the β element. Ideally the volume should contain only fluid that interacts with the element, in order to ensure that the reference velocity correctly characterizes the hydrodynamic loading.

Next, we ‘zoom out’, and consider the b elements. The total force acting on these elements can also be expressed in terms of a form drag model according to

$$ F_b(t) = -\,c_{d,b}(t)\,\Gamma_b(t) \qquad (2.5) $$

where $c_{d,b}$ is the drag coefficient appropriate for the b elements, and the vector $\Gamma_b$ is defined according to $\Gamma_b = \rho\,|V_b|\,V_b\, A_b/2$. Here $V_b$ and $A_b$ are the reference velocity and representative frontal area, respectively, of the b element. The reference velocity is computed by averaging the fluid velocity over a predefined volume enclosing the b element that is geometrically similar to (but larger than) that used for the β elements.

If there exists complete geometric similarity between b and β elements (which exists in the case of a deterministic fractal shape shown), and if one assumes sufficiently large Reynolds numbers so that the drag force does not depend on Reynolds number

(or scale), one may assume that both drag coefficients are equal, at least on average, i.e. $c_{d,\beta} = c_{d,b}$ for all $\beta \in b$. In general, however, this equality may not be exactly true


and may require additional considerations. We relegate further discussion to §2.5, which is devoted to several RNS formulations including numerical, spatial, and temporal treatments for the evaluation of $c_{d,\beta}$ and subsequently $c_{d,b}$. For the remainder of this section we assume that $c_{d,\beta}$ and $c_{d,b}$ are already known (their determination is described in §2.5).

In a simulation, it is also necessary to prescribe the spatial distribution of this force. As described in Chester et al. [21], the unresolved hydrodynamic drag force is represented on a point x of the computational grid pertaining to element β as a momentum sink (local force per unit mass) given by

$$ f_\beta(x, t) = -\,\kappa_\beta(t)\,|\tilde{u}(x,t)|\,\tilde{u}(x,t)\,\tilde{\chi}(x) \qquad (2.6) $$

where $\kappa_\beta$ is determined so as to enforce that the total force due to element β is equal to the prescribed force $F_\beta$, $\tilde{u}$ is the local resolved velocity vector at that point, and $\tilde{\chi}$ is the filtered indicator function. The filtered indicator function (illustrated in Figure 2.1) is defined as $\tilde{\chi} = G * \chi$, where $\chi$ is the true indicator function (1 inside the object and 0 outside) and $G$ is a Gaussian filter kernel of width $2\Delta$. The filtered indicator function is the representation of the subgrid-scale geometric information within the computational mesh at resolved scales. The undetermined factor $\kappa_\beta$ is chosen such that the integrated distributed force vector $F_\beta(t) = \int f_\beta(x, t)\, d^3x$ is as close as possible to the total force computed from its parameterized form given by


(2.4) such that

$$ \kappa_\beta(t)\, I_\beta(t) = c_{d,\beta}(t)\,\Gamma_\beta(t) \qquad (2.7) $$

where $I_\beta(t) = \int |\tilde{u}(x,t)|\,\tilde{u}(x,t)\,\tilde{\chi}(x)\, d^3x$. Performing a least-squares minimization of this overdetermined system of equations (since it is a vector equation for a scalar quantity) results in the contraction of each side of (2.7) by $I_\beta$. Solving for $\kappa_\beta$ from the resulting scalar equation yields

$$ \kappa_\beta(t) = c_{d,\beta}(t)\, \frac{\Gamma_\beta \cdot I_\beta(t)}{|I_\beta(t)|^{2}}. \qquad (2.8) $$

Once κβ is determined for each β element, (2.6) is applied as a forcing term in the momentum equation.
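To make the sequence of operations concrete, the following minimal Python sketch evaluates $\kappa_\beta$ from Eq. (2.8) and distributes the corresponding momentum sink of Eq. (2.6) on the grid, assuming the drag coefficient $c_{d,\beta}$ has already been supplied by one of the formulations of §2.5. It is illustrative only: the actual routine in LESGO is not reproduced here, and all function and variable names are hypothetical.

import numpy as np

def apply_rns_subgrid_force(u, chi, cd_beta, V_beta, A_beta, rho, dV):
    """Distribute the beta-element drag force of Eq. (2.6) on the LES grid.

    u       : (3, nx, ny, nz) resolved velocity field
    chi     : (nx, ny, nz) filtered indicator function of the beta element
    cd_beta : drag coefficient supplied by the RNS formulation
    V_beta  : (3,) reference velocity of the beta element
    A_beta  : reference frontal area of the beta element
    rho     : fluid density
    dV      : grid-cell volume
    Returns the local force-per-unit-mass field of shape (3, nx, ny, nz).
    """
    # Gamma_beta = rho |V_beta| V_beta A_beta / 2  (vector)
    Gamma = 0.5 * rho * np.linalg.norm(V_beta) * V_beta * A_beta
    # I_beta = integral of |u| u chi dV  (vector), cf. Eq. (2.7)
    umag = np.sqrt((u ** 2).sum(axis=0))
    I = (umag * u * chi).sum(axis=(1, 2, 3)) * dV
    # kappa_beta from the least-squares solution, Eq. (2.8)
    kappa = cd_beta * np.dot(Gamma, I) / np.dot(I, I)
    # Local momentum sink, Eq. (2.6)
    return -kappa * umag * u * chi

In practice such an evaluation would be carried out once per β element and per time step, with the result added to the forcing term f of Eq. (2.9a).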

2.3 LES Implementation

Simulations are performed using a variant of the JHU-LES code [92, 12, 21] called

LESGO. LESGO solves the filtered, incompressible Navier-Stokes equations for a neutrally buoyant and high-Re flow such that,

$$ \frac{\partial \tilde{u}}{\partial t} + \tilde{u} \cdot \left( \nabla \tilde{u} - \nabla \tilde{u}^{T} \right) = -\,\rho^{-1}\nabla \tilde{p} + \nabla \cdot \tau + \rho^{-1} f + \rho^{-1}\,\Pi\, i \qquad (2.9a) $$


$$ \nabla \cdot \tilde{u} = 0 \qquad (2.9b) $$

where $\tilde{u}$ is the filtered velocity, $\rho$ the fluid density, $\tilde{p}$ the filtered (modified) pressure, $\tau$ the deviatoric component of the subgrid-scale stress tensor, $f$ the forcing term which contains the IBM and RNS forces, and $\Pi = -\,dP/dx$ the applied mean pressure gradient forcing, where $i = (1, 0, 0)$. For all simulations in this study, the scale-

dependent Lagrangian dynamic subgrid stress model [12] is employed.

The governing equations are discretized using a pseudo-spectral method where

spectral discretization is used in the x and y (horizontal) directions and 2nd order

finite differencing in the z (or vertical) direction. Time integration is performed using a 2nd order Adams-Bashforth scheme. Periodic boundary conditions are used along the sides of the domain. A stress-free boundary condition is imposed at the top of the domain, while a rough-wall (log-law) boundary condition is imposed at the bottom surface.
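The text does not write out the time-integration formula; for reference, the standard second-order Adams-Bashforth step for a semi-discretized equation $\partial \tilde{u}/\partial t = \mathcal{R}(\tilde{u})$ is

$$ \tilde{u}^{\,n+1} = \tilde{u}^{\,n} + \Delta t \left( \tfrac{3}{2}\,\mathcal{R}^{\,n} - \tfrac{1}{2}\,\mathcal{R}^{\,n-1} \right), $$

where the superscripts denote time levels; the pressure treatment that enforces Eq. (2.9b) is not shown here.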

Solid objects (resolved scale features) in the domain are represented using the

IBM implementation already described in Chester et al. [21]. This implementation employs a level set description for the representation of the objects in the domain.

The IBM enforces a zero velocity condition inside of solid objects along with a log-law wall shear stress that is applied as a tangential boundary condition at the wall. In

Appendix A, a validation case using the IBM is performed for flow over an array of wall mounted cubes.


2.4 Planar tree canopy test case

For the purpose of testing various RNS formulations to be presented in §2.5, a configuration for flow over a canopy of “planar fractal trees” is used. The branches of each tree are arranged in a “V” configuration. Shown in Figure 2.2 is a canopy of regularly spaced trees consisting of only two resolved generations (g0 and g1).

Figure 2.1 shows the tree containing the filtered representation of the additional subgrid-scale branches. The planar elements provide the multiple-scale features in the y and z directions while remaining relatively simple.

A constant scale factor r = 1/2, coupled to a doubling of each element at each generation, provides a completely self-similar tree with a fractal similarity dimension [74] of $D = \log(N_B)/\log(r^{-1}) = 1$, where $N_B = 2$ is the number of branches of the generator at g0. The tree geometry is described by an iterated function system [34] (IFS), with the similarity contractions $\{ w_i : X \to X \mid i = 0, \ldots, N_B - 1 \}$ where $X$ is a closed subset of $\mathbb{R}^3$. It follows that a branch cluster $B$ at generation $g_n$ is obtained by performing $n$ contraction mappings of the fractal generator $G$ such that $B = w_{i_n} \circ \cdots \circ w_{i_1}(G)$, where

$$ w_i(x) = r\, R_i \cdot x + s_i, \qquad (2.10) $$

$R_i$ is a rotation matrix, and $s_i$ a translation vector. To describe the current geometry, we have $R_i = 1$, $s_0 = (0, -1.179h, h)$ and $s_1 = (0, 1.179h, h)$, where $h$ is the height


Figure 2.2: Fractal tree canopy composed of planar V-trees. Shown are the two “resolved” generations g0 and g1. The solid black box indicates the physical domain for the simulations. The entirety of the tree as represented in the simulation is shown in Figure 2.1.

of the generator. The generator branches have a diameter of d = 0.571h and a length of l = h/cos(θ), where θ = 45° is the skew angle relative to the vertical axis. Each branch is also offset a distance of 0.179h from the center of the branch cluster.
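As an illustration of how such a geometry can be constructed from Eq. (2.10), the following Python sketch composes the two contraction maps of the planar V-tree (r = 1/2, $R_i = 1$, and the translations $s_0$, $s_1$ given above) to produce branch clusters at successive generations. The routine names are hypothetical and the generator is represented here only by a few illustrative points.

import numpy as np

def make_vtree_maps(h, r=0.5):
    # Similarity contractions w_i(x) = r R_i x + s_i of Eq. (2.10)
    # for the planar "V-tree" (R_i = identity, N_B = 2 branches).
    R = np.eye(3)
    s = [np.array([0.0, -1.179 * h, h]),
         np.array([0.0,  1.179 * h, h])]
    return [(r, R, si) for si in s]

def branch_clusters(generator_pts, maps, n):
    # Apply all n-fold compositions w_in o ... o w_i1 to the generator,
    # returning the point set of every branch cluster at generation g_n.
    clusters = [generator_pts]
    for _ in range(n):
        clusters = [(r * (R @ pts.T)).T + s
                    for pts in clusters
                    for (r, R, s) in maps]
    return clusters

h = 1.0
generator = np.array([[0.0, 0.0, 0.0],
                      [0.0, 1.179 * h, h]])   # two points sketching the generator
g2 = branch_clusters(generator, make_vtree_maps(h), n=2)
print(len(g2))   # 4 clusters at generation g2 (N_B = 2 per level)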

The simulated domain is the region indicated by the black box in Figure 2.2 and is defined as $\{(x, y, z) : 0 \le x \le 12h,\ 0 \le y \le 6h,\ 0 \le z \le 8.2h\}$. Three trees were placed in the periodic domain in order to minimize coupling between the turbulence generated by the leading tree and that of the last tree (and vice versa). A uniform spatial discretization of $N_x \times N_y \times N_z = 300 \times 150 \times 206$ is used such that there is a minimum of eight grid points across the diameter of the branches in the last resolved generation. This helps ensure sufficient resolution for the IBM at least as


far as predictions of drag forces are concerned [21, 104]. The flow is forced in the +x

direction using a mean pressure gradient forcing of Π. The normalization parameters

used in the simulations are based on ρ, h and Π such that the time, velocity, and

force scales are defined as,

$$ \tau = \sqrt{\rho\, h\, \Pi^{-1}} \qquad (2.11) $$
$$ u_p = \sqrt{\frac{\Pi\, h}{\rho}} \qquad (2.12) $$
$$ f_p = \frac{\rho\, u_p^{2}}{h} = \Pi \qquad (2.13) $$

respectively.

Boundary conditions along the x–y perimeter are periodic. A stress-free, no-permeability condition is applied at the top boundary. Along the bottom surface, the IBM is used to implement a rough wall, log-law stress [21, 20] with a surface

roughness of $z_o = 10^{-4}h$. The IBM treatment of the bottom surface was required because in the IBM implementation, grid points normal to the solid surfaces at a distance $\delta \le 1.1\Delta$, where ∆ is the grid spacing, are required for evaluation of the tangential stresses applied to the fluid. As a result, there exist points $x_s = (x_s, y_s, z_s)$ on the tree surface near z = 0 such that $z = z_s + \delta n_z < 0$, where $n_z$ is the z component

of the local surface normal, that consequently lie outside the computational domain.

Consequently, it was necessary to place several grid points below the bottom surface of

the simulated domain and use the IBM to represent the bottom surface consistently.


As in prior applications[21, 20], the same log-law stress boundary condition is applied

on the resolved tree branch surfaces, also imposing a constant surface roughness of

$z_o = 10^{-4}h$.

Shown in Figure 2.3 are the RNS reference regions (volumes) for both b elements

and one set of β elements for the target tree in the domain. The base of the ref-

erence region is centered about the base of the bottom most branch cluster within

the respective elements and the top of the region extends to the top of the tree at

z = 2h. Proportionality factors are applied to determine the width (y direction)

and depth (x direction) of the reference region such that the width and depth are

2.0 and 1.143, respectively, times the height. The reference area used for the RNS

calculations is taken to be the frontal area (whose normal is in the −x direction) of the reference regions. The reference velocity for the b and β elements is calculated from the volumetric mean of the velocity inside the reference region for each element.

During the calculation of the reference velocity, the LES velocity is sampled at evenly distributed points within the volume with a spacing at or below the grid spacing. A trilinear interpolation is used to place the LES velocity on these sampling points.
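A minimal sketch of this sampling step is given below, assuming a uniform grid of spacing dx with its origin at the domain corner and a box-shaped reference region; the helper names are hypothetical and boundary handling is omitted.

import numpy as np

def trilinear(field, x, dx):
    # Trilinear interpolation of a scalar field (uniform grid, spacing dx) to point x.
    idx = np.floor(np.asarray(x) / dx).astype(int)
    f = np.asarray(x) / dx - idx              # fractional offsets in each direction
    i, j, k = idx
    val = 0.0
    for di, dj, dk in np.ndindex(2, 2, 2):
        w = ((1 - f[0]) if di == 0 else f[0]) * \
            ((1 - f[1]) if dj == 0 else f[1]) * \
            ((1 - f[2]) if dk == 0 else f[2])
        val += w * field[i + di, j + dj, k + dk]
    return val

def reference_velocity(u, v, w, region_min, region_max, dx, n_sample):
    # Volumetric mean of (u, v, w) over a box-shaped reference region, sampled
    # on n_sample points per direction (spacing at or below the grid spacing).
    axes = [np.linspace(lo, hi, n_sample) for lo, hi in zip(region_min, region_max)]
    V = np.zeros(3)
    for x in axes[0]:
        for y in axes[1]:
            for z in axes[2]:
                V += [trilinear(c, (x, y, z), dx) for c in (u, v, w)]
    return V / n_sample ** 3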

During RNS, the flow field is initialized with a log-law boundary layer profile with superimposed background perturbations. The simulations are allowed to “spinup” for fully developed, statistically steady-state canopy turbulence to be established.

In Figure 2.4, the magnitude of the instantaneous velocity from one of the simulations is shown. These results are from the RNS using model M2 (to be described


Figure 2.3: Definitions of the RNS reference regions used in the “V-tree” canopy simulations.

in detail in §2.5). From Figure 2.4 the structure of the subgrid geometric scales is readily observed within the contour slices along the y–z (constant x) plane, through the low velocity areas highlighted by the black contour regions. Qualitatively, this

clearly highlights the impact of the subgrid-scale force field on the surrounding fluid

where we observe a similar “footprint” in the upper canopy as would be expected

from a direct (resolved) representation of all the branches. Shown in Figure 2.5 are

contours of the magnitudes of the instantaneous velocity on x–y (constant z) planes.

The planes are along the branch mid-planes of generations g2, g3, and g4, which are all within the subgrid-scale region. Vortex shedding due to the parameterized subgrid-scale branch structures is observed, thus showing the ability of the modeling technique to capture physical flow phenomena. Also noted are the scale reduction and increase


Figure 2.4: Contour plot of instantaneous velocity magnitude along a constant y and constant x plane.

in number of the wakes with increasing generation and height. This is a direct result of retaining the multiscale, geometric information in the applied subgrid-scale forcing.

The magnitude of the applied force field as given by (2.6) is shown in Figure 2.6 for the same time step as the previous velocity plots. This figure further illustrates the retention of the multiscale, geometric information by the model, and the smoothed subgrid-scale structure of the tree is clearly visible. We note the non-uniformity in the force distribution within the subgrid region, due to the non-uniformity of the instantaneous velocity distribution, highlighting the local nature of the applied subgrid-scale forcing.


Figure 2.5: Instantaneous velocity magnitude contours on horizontal planes at the mid-plane heights of branch generations g2, g3 and g4.

Figure 2.6: Contour plot of instantaneous local force magnitude along a constant-x plane across the middle tree in the domain.


2.5 RNS formulations

After having presented the numerical simulation technique and some representa- tive views of the flow, in this section we present various alternative formulations to evaluate drag coefficients for the unresolved forces. Four versions will be considered, to be denoted as models M1 through M4, as listed in Table 2.1. There are three different types of treatments we apply to the evaluation of the RNS drag coefficient.

First, the temporal treatment indicates whether or not time averaging is used in the evaluation of the drag coefficient. Second, spatial treatment refers to whether a single drag coefficient (global) is computed for all the objects in the domain (e.g. all trees in a canopy or all branches on a tree), or if a drag coefficient is computed for each of the b elements in the domain (local). Last, the numerical treatment indicates if an explicit or implicit time treatment is used for the drag coefficient calculations.

We present four permutations of the three treatment categories. For each model a simulation is performed using the fractal tree canopy as presented in §2.4.

Table 2.1: Definitions of the tested RNS models.

Model   Temporal        Spatial   Numerical
M1      Instantaneous   Global    Explicit
M2      Averaged        Global    Explicit
M3      Averaged        Global    Implicit
M4      Averaged        Local     Explicit


2.5.1 Model M1

This is the original RNS model presented by Chester et al. [21]. It was based

on an instantaneous approach, computing a single (global) drag coefficient for all

branches, using an explicit dynamic formulation, and a branch-based description. In

this section we recast the original formulation using a more generalized framework,

based on “b” and “β” elements.

The basic relationship between the drag coefficients of each b element and its descendant β elements is written as

$$ c_{d,\beta}(t) = c_{d,b}(t) \qquad (2.14) $$

for all $\beta \in b$. This equality assumes complete dynamic similarity. This assumption breaks down if the subgrid-scale drag becomes affected by viscous drag, since then there is an additional parameter, the Reynolds number, which would differ at the two scales. Here we assume that the Reynolds number is large enough so that even the full subgrid-scale range is in the inertia-dominated regime. For rough boundary layers, this would correspond to a subgrid range that is in the “fully-rough” regime.

Similar arguments were developed and tested for LES using the dynamic rough-wall model [3]. Moreover, Eq. 2.14 also assumes dynamic similarity at any instant t. A limitation of this approach is that for a turbulent flow, complete similarity between branches is only satisfied in a time-averaged sense, whereas here we are assuming


that it holds at each individual time. In using a global definition for the RNS drag

coefficient, the drag coefficient for all b elements reduces to a common global value such that we use cd,b = cd for all b. The time explicit treatment implies that model

M1 uses the drag coefficient $c_d(t - \Delta t)$ determined at the previous time-step ($\Delta t$ is the time step) in the formulation of the force equality. With these assumptions in

mind, the force balance on each b element – as given by (2.3) – may be expressed as,

$$ F_b(t) = \sum_{r \in b} F_r(t) - c_d(t - \Delta t) \sum_{\beta \in b} \Gamma_\beta(t). \qquad (2.15) $$

By virtue of the time-explicit treatment, the right hand side of the above equation is fully specified at every instance in time. Self-consistency also requires that the vector expression given by (2.5) be satisfied, with cd(t) chosen to enforce the equality

in some sense. Following Chester et al. [21] and usual practice for overdetermined

systems as in the dynamic model [71], a least-squares error minimization approach is

applied. The error to be minimized is expressed as,

$$ E(t) = \sum_b e_b \cdot e_b(t) \qquad (2.16) $$

where eb – the RNS error – is defined as,

$$ e_b(t) = F_b(t) + c_d(t)\,\Gamma_b(t) \qquad (2.17) $$


and $F_b$ is given by (2.15). The contraction of $e_b$ implies summation over all spatial components (i.e., $e_b \cdot e_b = |e_b|^2$) and the summation over b is performed over all b elements in the domain. To find $c_d$ which minimizes the error given by (2.16), we compute $\partial E / \partial c_d = 0$ and, after solving for $c_d$, we obtain

$$ c_d(t) = -\,\frac{\sum_b F_b \cdot \Gamma_b(t)}{\sum_b \Gamma_b \cdot \Gamma_b(t)} \qquad (2.18) $$

as the RNS drag coefficient. Note that the definition of the error in Eq. 2.17 refers to

the error in the force corresponding to the entire b element, including both resolved

(r) and unresolved (β) elements. This is in contrast to the approach followed in model

M3, discussed later on, where the error refers to forces acting only on the resolved

(r) elements.

An important other assumption that underlies Eq. 2.14 (complete similarity) is

that effects of distance to the ground are neglected. That is to say, the scale ratio

that effects of distance to the ground are neglected. That is to say, the scale ratio $h r^{g+1}/z$ between the branch length at the first unresolved generation $g+1$ ($h r^{g+1}$) and their height above the ground $z$ is very small, i.e. $h r^{g+1}/z \ll 1$. Since $z$ is of the order of $h$, and $r < 1$, this condition is equivalent to assuming that the length-scales of the last resolved generation are already quite small (or $g \gg 1$). In the applications presented in this study, where due to resolution limitations we use g = 2, this condition is satisfied only marginally. This is similar to LES in which the

filter scale should in principle be chosen to be much smaller than the integral scale,


but still cannot be chosen much smaller due to practical constraints.

2.5.2 Model M2

In this model we apply temporal averaging to the evaluation of the drag force.

The application of temporal averaging is motivated by the basic properties of the drag law

(2.1), which for unsteady turbulent flows refers typically to the relationship between

the time-averaged drag force and time-averaged incoming velocity. While one may

also define an instantaneous drag coefficient based on the instantaneous force, at any

given instant there is no requirement that the instantaneous drag forces at various

scales are matched instant to instant. For model M1 this will be seen to lead to strong fluctuations in the drag coefficient and to large peaks in error in enforcing the basic drag force equivalence from (2.5). Hence, for model M2, we introduce time averaging.

The total error is obtained by applying a time averaging operation to (2.16). For practical purposes during a simulation it is advantageous to use a running temporal

filter. Applying this temporal filter to (2.16), the weighted error is defined as,

$$ E = \int_{-\infty}^{t} \sum_b e_b \cdot e_b(t')\, W(t - t')\, dt' \qquad (2.19) $$

where $W(t - t')$ is a one-sided (to comply with ‘causality’) weighting function such that $\int_{-\infty}^{t} W(t - t')\, dt' = 1$. As with model M1, a least-squares minimization of (2.19)

with respect to cd enables us to find,

$$ c_d(t) = -\,\frac{\mathcal{L}_{F\Gamma}(t)}{\mathcal{L}_{\Gamma\Gamma}(t)} \qquad (2.20) $$

where

$$ \mathcal{L}_{F\Gamma}(t) = \int_{-\infty}^{t} \sum_b F_b \cdot \Gamma_b(t')\, W(t - t')\, dt' \qquad (2.21) $$
$$ \mathcal{L}_{\Gamma\Gamma}(t) = \int_{-\infty}^{t} \sum_b \Gamma_b \cdot \Gamma_b(t')\, W(t - t')\, dt'. \qquad (2.22) $$

The weighting function in (2.21) and (2.22) may be chosen in such a way that the accumulation of data at previous times and integration over $t'$ are avoided. This

simplification is achieved by adopting an exponential function for W (as was done in

Meneveau et al. [79] for averaging along Lagrangian trajectories):

$$ W(t - t') = \frac{1}{T}\, e^{-\frac{t - t'}{T}} \qquad (2.23) $$

where T is a prescribed relaxation time scale that controls the “memory” of the various terms used in the calculation of the drag coefficient. Inserting (2.23) into

(2.21) and (2.22), then differentiating each with respect to t, we obtain the following

expressions:

$$ \frac{d\mathcal{L}_{F\Gamma}}{dt} = \frac{1}{T}\left( \sum_b F_b \cdot \Gamma_b - \mathcal{L}_{F\Gamma} \right) \qquad (2.24) $$
$$ \frac{d\mathcal{L}_{\Gamma\Gamma}}{dt} = \frac{1}{T}\left( \sum_b \Gamma_b \cdot \Gamma_b - \mathcal{L}_{\Gamma\Gamma} \right). \qquad (2.25) $$

These equations may be integrated in time using a standard numerical integration technique; in this study we used Euler integration, for which $\mathcal{L}_{F\Gamma}$ and $\mathcal{L}_{\Gamma\Gamma}$ are expressed as

$$ \mathcal{L}_{F\Gamma}(t) = (1 - \epsilon)\, \mathcal{L}_{F\Gamma}(t - \Delta t) + \epsilon \sum_b F_b \cdot \Gamma_b(t) \qquad (2.26) $$
$$ \mathcal{L}_{\Gamma\Gamma}(t) = (1 - \epsilon)\, \mathcal{L}_{\Gamma\Gamma}(t - \Delta t) + \epsilon \sum_b \Gamma_b \cdot \Gamma_b(t) \qquad (2.27) $$

where $\epsilon = \Delta t / T$. The choice of the relaxation time-scale T will be discussed separately.
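Because the exponential weighting reduces the time integrals to the recursions (2.26)–(2.27), the model-M2 update can be advanced with only two scalars of state. A minimal Python sketch of one such update follows (with NumPy vectors for $F_b$ and $\Gamma_b$); it is illustrative only, and the initialization of the running averages is glossed over.

import numpy as np

def update_global_cd(L_FG, L_GG, F_b, Gamma_b, dt, T):
    # One step of the running exponential averages, Eqs. (2.26)-(2.27),
    # followed by the drag-coefficient evaluation of Eq. (2.20).
    #   L_FG, L_GG : running averages from the previous time step (scalars)
    #   F_b        : list of b-element force vectors F_b(t), cf. Eq. (2.15)
    #   Gamma_b    : list of b-element vectors Gamma_b(t)
    eps = dt / T
    L_FG = (1.0 - eps) * L_FG + eps * sum(f @ g for f, g in zip(F_b, Gamma_b))
    L_GG = (1.0 - eps) * L_GG + eps * sum(g @ g for g in Gamma_b)
    return -L_FG / L_GG, L_FG, L_GG

# Illustrative use with two b elements (values are arbitrary):
F = [np.array([-110.0, 3.0, 0.0]), np.array([-105.0, -2.0, 0.0])]
G = [np.array([140.0, 1.0, 0.0]), np.array([133.0, 2.0, 0.0])]
cd, L_FG, L_GG = update_global_cd(0.0, 0.0, F, G, dt=1e-3, T=0.35)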

2.5.3 Model M3

The previous two global models employ an explicit scheme when evaluating the force in Eq. (2.4). We now present an implicit formulation for the global description, where (also applying scale-invariance) cd,β(t) = cd,b(t) = cd(t). The resulting total


force balance for each b element in the domain then becomes,

$$ F_b(t) = \sum_{r \in b} F_r - c_d(t) \sum_{\beta \in b} \Gamma_\beta(t) \qquad (2.28) $$

with the form drag representation Fb given by (2.5). Consequently, the error is

determined by inserting (2.5) into (2.28) and rearranging terms to obtain

$$ e_b(t) = R_b(t) + c_d(t)\, M_b(t) \qquad (2.29) $$

where $R_b = \sum_{r \in b} F_r$ and $M_b = \Gamma_b - \sum_{\beta \in b} \Gamma_\beta$. Using the definition for the temporally weighted error given by (2.19) and inserting the error expressed by (2.29), we may compute the drag coefficient using, again, a least-squares minimization procedure with respect to $c_d$ to obtain

$$ c_d(t) = -\,\frac{\mathcal{L}_{RM}(t)}{\mathcal{L}_{MM}(t)} \qquad (2.30) $$

where

$$ \mathcal{L}_{RM}(t) = \int_{-\infty}^{t} \sum_b R_b(t') \cdot M_b(t')\, W(t - t')\, dt' \qquad (2.31) $$
$$ \mathcal{L}_{MM}(t) = \int_{-\infty}^{t} \sum_b M_b(t') \cdot M_b(t')\, W(t - t')\, dt'. \qquad (2.32) $$


As in model M2, this is equivalent to first-order differential equations for $\mathcal{L}_{RM}$ and $\mathcal{L}_{MM}$, and applying Euler integration these can be solved according to

$$ \mathcal{L}_{RM}(t) = (1 - \epsilon)\, \mathcal{L}_{RM}(t - \Delta t) + \epsilon \sum_b R_b \cdot M_b(t) \qquad (2.33) $$
$$ \mathcal{L}_{MM}(t) = (1 - \epsilon)\, \mathcal{L}_{MM}(t - \Delta t) + \epsilon \sum_b M_b \cdot M_b(t) \qquad (2.34) $$

where, again, $\epsilon = \Delta t / T$.

Note that unlike the explicit formulations in which the error is based on the entire

b element force including both resolved (r) and unresolved (β) elements, here the

error refers only to the resolved (r) portion. By definition, the resolved portion of

the force is smaller, and could be more prone to zero crossings, which cause

difficulties in practical implementations, compared to formulations based on the total

(b element) force. Also, as a result of being based on different definitions of the error,

the implicit and explicit formulations differ even in the limit of small computational

time-step ∆t.

2.5.4 Model M4

In general, fractal objects contain heterogeneous sub-structures, i.e. not all the b

and β elements need to be the same. Moreover, even if geometrically similar, their

alignment with the incoming flow can be different. Hence, there is motivation for a

more spatially localized approach, to determine a local drag coefficient that differs


for different parts of the fractal. In the current framework, we propose to determine

a separate drag coefficient that differs for each b element in the domain.

The formulation of this model begins with (2.4) where, along with scale-invariance,

we use again a time explicit scheme, i.e. $c_{d,\beta}(t) = c_{d,b}(t - \Delta t)$. Inserting (2.4) with the current drag coefficient definitions into (2.3) results in the total force balance for each b element in the domain being expressed as

$$ F_b(t) = \sum_{r \in b} F_r(t) - c_{d,b}(t - \Delta t) \sum_{\beta \in b} \Gamma_\beta(t) \qquad (2.35) $$

Similarly to model M2, it is required that Fb also satisfy,

$$ F_b(t) = -\,c_{d,b}(t)\,\Gamma_b(t) \qquad (2.36) $$

resulting in the error

$$ e_b(t) = F_b(t) + c_{d,b}(t)\,\Gamma_b(t). \qquad (2.37) $$

Model M4 will, again, involve time averaging. Consequently, the temporally weighted error is expressed as,

$$ E_b = \int_{-\infty}^{t} e_b(t') \cdot e_b(t')\, W(t - t')\, dt' \qquad (2.38) $$

where the summation over b elements previously used in model M2 is omitted. We may then obtain the drag coefficient local to each b element by taking $\partial E_b / \partial c_{d,b} = 0$.


Performing this operation and rearranging terms leads to a drag coefficient defined

as

$$ c_{d,b}(t) = -\,\frac{\mathcal{L}_{F\Gamma,b}(t)}{\mathcal{L}_{\Gamma\Gamma,b}(t)} \qquad (2.39) $$

where

$$ \mathcal{L}_{F\Gamma,b}(t) = \int_{-\infty}^{t} F_b \cdot \Gamma_b(t')\, W(t - t')\, dt', \qquad (2.40) $$
$$ \mathcal{L}_{\Gamma\Gamma,b}(t) = \int_{-\infty}^{t} \Gamma_b \cdot \Gamma_b(t')\, W(t - t')\, dt'. \qquad (2.41) $$

Using again Euler integration to solve the associated first-order differential equations yields

$$ \mathcal{L}_{F\Gamma,b}(t) = (1 - \epsilon)\, \mathcal{L}_{F\Gamma,b}(t - \Delta t) + \epsilon\, F_b \cdot \Gamma_b(t) \qquad (2.42) $$
$$ \mathcal{L}_{\Gamma\Gamma,b}(t) = (1 - \epsilon)\, \mathcal{L}_{\Gamma\Gamma,b}(t - \Delta t) + \epsilon\, \Gamma_b \cdot \Gamma_b(t) \qquad (2.43) $$

where, as before, $\epsilon = \Delta t / T$.
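For the local model the same recursion is simply carried per b element. The Python sketch below keeps the running averages and the previous drag coefficient in a small per-element state, following Eqs. (2.35) and (2.39)–(2.43); the names are hypothetical, and initialization and safeguards (e.g., against a vanishing denominator) are omitted.

def m4_step(state, elements, dt, T):
    # state    : dict  b_id -> {"cd": float, "L_FG": float, "L_GG": float}
    # elements : dict  b_id -> {"F_r": summed resolved force (NumPy vector, shape (3,)),
    #                           "Gamma_beta": list of beta-element Gamma vectors,
    #                           "Gamma_b": Gamma vector of the b element}
    eps = dt / T
    for b, el in elements.items():
        st = state[b]
        # Eq. (2.35): total b-element force using the explicit (previous) c_d,b
        F_b = el["F_r"] - st["cd"] * sum(el["Gamma_beta"])
        # Eqs. (2.42)-(2.43): running exponential averages for this element
        st["L_FG"] = (1 - eps) * st["L_FG"] + eps * (F_b @ el["Gamma_b"])
        st["L_GG"] = (1 - eps) * st["L_GG"] + eps * (el["Gamma_b"] @ el["Gamma_b"])
        # Eq. (2.39): updated local drag coefficient
        st["cd"] = -st["L_FG"] / st["L_GG"]
    return state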

2.6 Test results

In this section, results obtained from using the four variants M1-M4 are presented and discussed. Each of the models is tested for flow over the “V-tree” canopy. During

RNS the forces, both resolved and unresolved, are obtained along with the reference


velocities. These quantities are recorded in addition to the RNS drag coefficient

and the RNS error for each b element. For models using temporal averaging the

relaxation time-scale T is chosen large enough compared to a tree characteristic time-

scale. Specifically, the time-scale τc = h/Uc, where Uc is the mean velocity within the

canopy and h is a characteristic length scale of the trees, can be used as a reference.

During the current analysis we opted to use $3\tau_c \le T \le 5\tau_c$ to ensure averaging over several canopy or tree-scale events. It was checked a posteriori for the results presented

that $U_c \approx 10\,u_p$ (so that $\tau_c = h/U_c \approx 0.1\,\tau$) and that $T/\tau_c$ fell within the desired range, typically $T/\tau_c = 3.5$, or $T = 0.35\,\tau$, where τ is the reference time-scale of the simulation given by Eq. (2.11).

The impact of $T/\tau_c$ on the computed RNS drag coefficient is investigated in §2.6.3.

2.6.1 RNS quantities

The approach is tested in simulations and the resulting drag forces (resolved,

subgrid-scale, and total) on the target tree for model M2 are shown in Figure 2.7.

These results indicate a strongly varying hydrodynamic loading on the tree. It is noted

that the forces due to the subgrid-scale component contribute at least half of the total

load. In Figure 2.8 the forces on one of the two individual b elements, namely element

b1 of the target tree are shown for simulations using model M2. The contribution to

both the x and y components of the force are displayed. The y component force

fluctuates about zero due to the symmetric loading. From the x component force, it is observed that during this period of time, the subgrid contribution is slightly more


Figure 2.7: Time series of hydrodynamic forces in the x-direction acting on the target tree for RNS model M2.

than half of the total force for element b1. The resulting drag forces for the other

RNS models exhibit similar characteristics and are not shown for the sake of brevity.

Shown in Figure 2.9 are time series of the measured reference velocities used in the RNS calculations for model M2. The measured reference velocities show similar characteristics for both the b1 element and its β element descendants. The similarity

between the two β elements can be attributed to the symmetry of their arrangement

with respect to the bulk flow direction. In addition, the similarity of the velocity

signals between the b element and its descendant β elements is somewhat expected

since the β elements occupy subspaces of the parent b element.

The time histories of the drag coefficients computed using RNS for all models are

shown in Figure 2.10. In the top of Figure 2.10 the drag coefficient for model M1


Figure 2.8: Time series of hydrodynamic forces acting on one of the branches (element b1) of the target tree for model M2. The left axis shows the x-direction forces, while the right axis (lower group of lines) shows the y-spanwise direction forces.

Figure 2.9: Time series of reference velocities measured from element b1 and its descendant β elements in the target tree, for model M2.


is shown. These results indicate the presence of very large and fast fluctuations

compared to M2 shown in the second plot from the top. The fluctuations of the

instantaneous model, also observed in Chester et al. [21], can be attributed to the

rapidly varying local flow structures which differ instantaneously among the various scales (for instance, a vortex shedding event on the b element may differ in phase from that on the β elements). Due to the non-trivial implicit feedback mechanism between

that on the β elements. Due to the non-trivial implicit feedback mechanism between

the applied forces and the measured flow field, it has been observed that a strongly

time varying drag coefficient has the propensity to cause negative side-effects on

stability. This is similar to what has been observed in the dynamic subgrid-scale stress

model, in which some kind of averaging is often required. The impact of temporal

averaging is clearly evident as the variability about the mean value is much reduced,

and the overall signal is smoother when compared to that of M1. As expected, and as will be shown more clearly in §2.6.3, the temporal averaging dampens the fluctuations at time scales on the order of, or smaller than, T = 0.35τ.

Comparing models M2 and M3 in Figure 2.10, cd from model M3 displays more small time-scale fluctuations compared to the explicit model M2. Further tests (not shown) have uncovered additional problems with the “implicit” formulation M3: It turns out to be more susceptible to instabilities associated with lack of alignments between modeled and real forces. In the “explicit formulations” one assumes that the total force on a b element is aligned with the corresponding velocity. In the implicit formulation, one must assume that the difference in forces between b element


and β elements, i.e. the r element force, is aligned with the difference of velocities

at both scales. In practice, this condition is more often violated which can lead

to force and velocity differences aligned in the same direction rather than opposing

each other, thus leading to negative drag coefficients. If this occurs, the simulations

tend to become unstable. Hence, the implicit formulation can suffer from problems

of stability, especially when applied to trees with more complex geometry. These

stability issues are in agreement with the findings of Chester et al. [21].

The drag coefficients for the local model, M4, are shown in the bottom plot in

Figure 2.10. Since the local model evaluates different drag coefficients for both b

elements, two time signals are shown, corresponding to elements b1 and b2 (both

branches in the target “V” tree). From the time series it is noted that the drag

coefficients from the local model display somewhat increased variability with respect

to model M2. This is expected since there is no “ensemble” averaging as is done in the global formulation. The increase in variability could, if desired, be offset by increasing the value of T used in the temporal averaging (see §2.6.3).

In addition to the time-signals from Figure 2.10, the mean and standard deviations of the computed RNS drag coefficients are shown in Figure 2.11. Temporal averaging clearly leads to decreased standard deviation as compared to the instantaneous model.

This is in agreement with our previous qualitative findings from the time-signals shown in Figure 2.10. We also observe that the mean values of the drag coefficients are in fairly good agreement with one another with the largest variation from the


Figure 2.10: Time history of drag coefficient obtained from RNS using models from Table 2.1. For models M1 – M3 the single, global drag coefficients are presented. In the bottom plot, the time-series for M4 are shown corresponding to each branch of the “V-tree”, denoted as elements b1 and b2.

group being the implicit model M3, which predicts the lowest mean drag coefficient of the group.

2.6.2 Selected flow statistics

In order to further document the characteristics of models M1 through M4, we compute several basic flow statistics. In Figure 2.12 we show the mean streamwise velocity profiles for each of the RNS formulations. These are averaged both in time as well as across the horizontal planes. For the averaging over horizontal (constant z) planes, no distinction is made between values existing inside or outside of the trees

(inside the resolved part of the tree, the velocities and stresses are zero). Thus, the


Figure 2.11: Mean values (vertical bars) and standard deviations (error bars) of the computed RNS drag coefficients.

averaging is performed over the entire plane. This approach was adopted to allow for

consistency between the averaging operation of the resolved and subgrid-scale regions.

It can be readily seen that a typical boundary layer profile forms in the region

above the canopy. In the upper part of the resolved branch region, 0.5h < z <

1.5h, the flow shows almost constant mean velocity. In the lower region z < h/2 the mean velocity is further reduced due to the presence of the ground surface. In the unresolved-branch region at z > 1.5h, a smooth transition occurs towards the

boundary layer behavior above the canopy.

The mean velocity profiles show very little differences among the various RNS

models used. Especially within the canopy, results are almost indistinguishable.

Model M3 leads to slightly larger mean velocity, perhaps not surprising since it leads


Figure 2.12: Horizontally averaged (a) mean velocity and (b) turbulent shear stress profiles evaluated from RNS of flow over the “V-tree” canopy, for different RNS formulations M1-M4.

to a slightly smaller drag coefficient compared to the other models.

Turbulent shear stress distributions are shown in Figure 2.12 (b) for each of the

RNS formulations. Shown are the horizontally averaged Reynolds shear stress profiles, as well as the dispersive stresses computed using spatial averaging of the mean velocity distributions [107]. Some differences can be observed in the Reynolds and dispersive stress profiles between the models. The total stress remains essentially the same for all cases since the same pressure gradient forcing is applied. It follows a nearly linear profile within the canopy, but with different slopes associated with the resolved and subgrid regions.


Figure 2.13: Time history of the RNS drag coefficient obtained for several temporal averaging time-constants.

2.6.3 Temporal averaging time-scale

In this section, the impact of the temporal averaging time-constant T is explored.

We compare RNS model M2 with three values of $T/\tau_c$ = 0.5, 3.5, and 10 (or $T/\tau$ = 0.05, 0.35, and 1). In Figure 2.13, the time varying RNS drag coefficients are shown for the three time scales. From this figure it can be readily observed that increasing T corresponds to increased smoothing, where fluctuations on the order of or below T are filtered from the signal. It is also seen that the mean values of the drag coefficients are essentially independent of T in the range tested.


Table 2.2: Time averaged drag coefficient, RNS error, and forces for each of the RNS models when applied to simulation of the “V-tree” canopy. Results shown pertain to the sample tree in the middle of the domain. Along with time, all of the quantities are also averaged across both b elements of the tree. ∥e_b∥ is defined as |e_b|/|F_b|; |F_R| is the total resolved force on the target tree; |F_S| the total subgrid force on the target tree; |F_T| the force on the target tree.

Model   c_d     |e_b|/(f_p h^3)   ∥e_b∥ [%]   |F_R|/(f_p h^3)   |F_S|/(f_p h^3)   |F_T|/(f_p h^3)
M1      0.796   4.53              7.55        114.5             77.2              191.3
M2      0.809   4.47              7.43        113.6             77.6              191.1
M3      0.776   4.60              7.55        113.0             79.5              192.4
M4      0.803   4.54              7.41        116.7             79.0              195.7

2.6.4 Discussion

In this section the four formulations of RNS listed in Table 2.1 are discussed.

Each model has been evaluated by performing simulations using the setup presented

in §2.4. Listed in Table 2.2 are time averaged statistics for the drag coefficient and RNS error, and the average magnitudes of forces acting on the middle (target) tree

in the domain. For the drag coefficient, the time explicit models M1, M2, and M4 predict a value of approximately $c_d \approx 0.8$, whereas the implicit model (M3) predicts a slightly lower value of approximately $c_d \approx 0.78$. The values for the RNS errors (both the raw and normalized values) indicate that models M2 and M4 provide the smallest errors, albeit the differences between models are relatively small. From the total tree force ($|F_T|$) we observe that the global models predict a smaller loading than what is found with the local model. This trend is also seen in the total resolved force ($|F_R|$), whereas the total subgrid forces ($|F_S|$) show (relatively) little differences. One significant difference occurs between the temporally averaged models and the


instantaneous formulation. The results (see Figure 2.13) show that the temporal av-

eraging eliminates fluctuations associated with time scales on the order of or smaller

than T . In various other applications, it was found that the implicit formulation is hindered by stability issues when applied to more complex geometries (such as three- dimensional branching objects) in which negative coefficients can result from locally highly misaligned drag forces and velocities. Consequently, the implicit formulation is deemed less desirable since its drawbacks are not offset by any notable benefits.

Finally, the local formulation provides the ability to model heterogeneous structures,

From this comparative analysis it is concluded that model M4 is the recommended

RNS approach, since it is not limited to homogeneous structures and canopies, pro-

vides for a more stable numerical implementation, and allows one to better reproduce

dynamic similarity of drag forces by using temporal averaging.

2.6.5 Grid and RNS Modeling Sensitivity

In this section we discuss the impact of the grid resolution and the number of

resolved and RNS-modeled generations. For grid resolution studies, we perform two

new simulations at two additional resolutions compared to that used in the test case

of 2.4. Moreover, to examine effects of number of resolved and subgrid branch § generations, we consider cases with only a single resolved generation, as well as a case

with three resolved generations. A caveat related to the single resolved generation


case is that the first generation is nearest to the ground, which breaks strict geometric

scale-invariance in the sense that further generations above the tree are not subjected

to the presence of the surface. (Note that this is similar to the dynamic Smagorinsky

model case in standard LES, if one places the grid and test-filters close to the integral

scale of turbulence, where the basic assumption of scale-invariance no longer holds

exactly).

A list of the simulation cases considered is shown in Table 2.3. Each case uses

RNS model M4 with an averaging time-scale of T = 0.35τ. Note that case G2R2 is the configuration used for testing model M4 in §2.4. Statistics for the drag forces were collected over a duration Ts and averaged over the three trees in the canopy.

The higher-resolution cases G3 are computationally expensive, and so Ts could not be extended to very large times for those cases. As shown in prior sections, the computed forces display considerable temporal unsteadiness, and so obtaining fully converged forces for the G3 simulations was not feasible. In order to document the uncertainty associated with statistical convergence, we report the mean values obtained from the available simulation time Ts as well as the uncertainty. The latter is quantified as follows: the force signal over the entire available averaging time Ts is divided into Nc segments of length corresponding to the correlation time scale

Tc of the measured total drag force (Figure 2.7 shows representative signals of such forces). The correlation time-scale Tc is computed from the first zero crossing of the autocorrelation function (itself only an approximated value) of the drag forces. We


Table 2.3: Definitions of the case configurations used in the grid and RNS modeling analysis. Listed are the case names, grid resolution, number of resolved generations (N_g), number of grid points across the diameter of the branches in the last resolved generation (N_p), and the canopy averaged total drag forces.

Case   N_x × N_y × N_z    N_g   N_p   F_x/(f_p h^3)   T_s/τ   e_s
G1R1   150 × 76 × 106     1     8     185             96      4.0
G2R1   300 × 150 × 211    1     16    187             102     5.0
G2R2   300 × 150 × 211    2     8     181             70      4.5
G3R2   600 × 300 × 421    2     16    187             12      10.0
G3R3   600 × 300 × 421    3     8     181             12      10.3

find that among the cases $T_c/\tau$ has an average value of $T_c/\tau = 3$, a value that is employed for the analysis. The force is averaged within each of the $T_c$ segments to obtain $N_c$ different partial averages. Also, the standard deviation $\sigma_c$ among the $N_c$ partial averages is computed. The standard deviation one would expect for the overall average force can then be estimated according to $e_s = \sigma_c / \sqrt{N_c}$, since the $N_c$ partial

mean values are (approximately) uncorrelated. The values of Ts and es for each case are reported in Table 2.3.
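A Python sketch of this uncertainty estimate is given below; it assumes a uniformly sampled force record, takes the first non-positive value of the autocorrelation as its zero crossing, and is illustrative only (the names are hypothetical).

import numpy as np

def standard_error_of_mean(force, dt):
    # Estimate the standard error of the time-averaged force by splitting the
    # record into segments of one correlation time and using e_s = sigma_c / sqrt(N_c).
    f = np.asarray(force, dtype=float)
    fp = f - f.mean()
    acf = np.correlate(fp, fp, mode="full")[f.size - 1:]
    acf = acf / acf[0]
    zero = int(np.argmax(acf <= 0.0))       # index of first zero crossing
    seg_len = max(zero, 1)                  # samples per correlation time T_c
    T_c = seg_len * dt
    N_c = f.size // seg_len
    partial_means = f[:N_c * seg_len].reshape(N_c, seg_len).mean(axis=1)
    sigma_c = partial_means.std(ddof=1)
    return sigma_c / np.sqrt(N_c), T_c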

The resulting time and canopy averaged drag forces for the resolved, subgrid, and total contributions are shown in Figure 2.14. The standard error of mean total drag force is indicated by the error bars. The total drag force is reasonably insensitive to changes in the grid resolution and number of RNS modeled generations, as would be expected. A slight increase in the resolved drag force is observed when doubling

the grid resolution. In both cases of grid refinement an increase of roughly $5 f_p h^3$ is

realized. The increase in resolved drag force with an increase in grid resolution is

attributed to the increased accuracy of the IBM and the improved resolution of the turbulence.


Figure 2.14: Mean values of the measured drag forces decomposed into the resolved, subgrid, and total contributions. The error bars indicate the estimated standard error of the mean total drag force due to statistical convergence.

The current IBM implements a rough-wall log-law stress model for representing the wall shear due to the solid boundary [21]. As the grid resolution is increased, the near-wall effects, namely the separation point, are better captured, which ultimately controls the hydrodynamic drag [40, 23]. While the resolved forces are positively correlated with the change in grid resolution, the subgrid-scale forces are found to decrease with increasing grid resolution. The combined increase in resolved forces and decrease in subgrid forces nearly balances, leading to a nearly constant total drag force with changes in grid resolution. Changes in the level of RNS modeling, in all cases, behave as expected: in each case the resolved force increases as more generations are resolved, and the converse is true for the subgrid forces.


2.7 Applications to canopy consisting of fractal trees with three non co-planar branches

In this section, we apply RNS to the simulation of flow over a canopy composed

of fractal trees of the type shown in Figure 1.2.1. As seen in Figure 1.2.1, the trees

contain branches arranged in an (inverse tripod) configuration with alternating direc-

tion at each generation. For shorthand the trees will be denoted as “3D V-tree”. The

branch diameter and height of the generator are equivalent to that of the trees used

in §2.4. Each branch cluster is composed of three branches, giving a fractal similarity dimension of 1.585 for the entire tree, which is consistent with the range of fractal

dimensions observed for natural tree canopies [83].

In Figure 2.15, the canopy of trees with the two resolved generations g0 and g1 is

shown. The simulation domain is the region indicated by the black box and is defined

as $\{(x, y, z) : 0 \le x \le 12.715h,\ 0 \le y \le 3.668h,\ 0 \le z \le 10.01h\}$. A uniform spatial discretization of $N_x \times N_y \times N_z = 312 \times 90 \times 248$ is used.

The tree geometry is generated using an IFS expressed by (2.10) with Ri given as


Figure 2.15: Fractal tree canopy composed of “3D V-trees”. The solid black box indicates the computational domain used in the simulations.

$$ R_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

with $s_0 = (0, -1.179h, h)$, $s_1 = (1.021h, 0.589h, h)$, and $s_2 = (-1.021h, 0.589h, h)$. When displaying the “complete” fractal tree as in Figure 1.2.1, it is approximated by using nine generations.

For RNS, model M4 (see Table 2.1) is employed. Initially a temporal averaging time constant of $T/\tau_c = 3.5$ (the same as used in the preceding “V-trees” case) was chosen. It was found, however, that the RNS drag coefficients contained large

fluctuations, perhaps due to the stronger interactions among trees in the closer packed

“3D V-tree” canopy. Therefore, the larger value of T/τc = 10 (or T/τ = 1), as tested


in §2.6.3, is used.

In Figure 2.16 we illustrate the various domains (b and β-regions) that are used in

RNS, around the target tree. There are three b element regions shown. Region b1 is the branch that is lone-standing to a side, while b2 and b3 are one in front of the other, to the other side. Each b element contains three β elements, but in the figure (for clarity) only the three β elements corresponding to the b1 element are shown. The

β elements possess the same relative arrangement as b elements, but they are rotated by 180 degrees about the center of their parent b element so that the β1 element is located to the other side. The sizes of these regions are the same as those used in

§2.4, with the addition that here the depth of the regions is taken to be 2h.

2.7.1 Flow Field and RNS Results

In Figure 2.17 contours of instantaneous velocity magnitude on several planes are shown. As can be seen, the structure of the unresolved branches can be recognized by the darker regions (low velocity) in the upper regions of the canopy, on the y–z (constant x) plane contours, although, given the 3D nature of the branch arrangements, they are more difficult to see compared to the planar “V-trees” considered in the preceding section. Instantaneous velocity magnitude contours on various horizontal (constant z) planes are shown in Figure 2.18 (for the same time step as in

Figure 2.17). Instantaneous distributions of the magnitude of the distributed force are shown in Figure 2.19 on the same horizontal planes as those shown in Figure 2.18.


Figure 2.16: Reference regions b and β used when applying RNS to the “3D V-tree” canopy simulations.

The fractal structure of the canopy is clearly evident, even though the filtering of the indicator function removes much of the sharpness and smallest scales of the real fractal tree. It is also interesting to note the non-uniformity of the force distributions as a result of spatial variations of velocity, as also observed in the results from §2.4.

stresses are shown in Figure 2.20. The mean velocity profile possesses the signature

S-shaped profile through out the canopy [2, 107, 98, 5, 110]. In the lower region

of the canopy, the velocity profile tends to remain relatively flat. For canopies of

this type (those with a dense upper canopy and sparser lower canopy) a secondary

maximum [2, 5, 93, 110] or “knee” is typically encountered in the lower canopy due to

horizontal advection [98, 107]. The absence of this secondary maximum in this case


Figure 2.17: Contours of instantaneous velocity magnitude from RNS of flow over a fractal “3D V-tree”. Shown are contours along three vertical planes, one at constant y and two at constant x.

Figure 2.18: Contours of instantaneous velocity magnitude on the branch mid-planes of generations g2, g3 and g4 in the region of unresolved branches of the fractal “3D V-tree”.


Figure 2.19: Instantaneous force magnitude contours along the mid-planes of generations g2, g3 and g4 (same as in Figure 2.18).

is a direct result of the horizontal averaging technique used. The averaging technique we used treats the entire domain as the working fluid, including the volume occupied by the resolved branches, which results in slightly lower mean values in the resolved region. This was done in order to apply consistent averaging in both the resolved and subgrid regions. In the upper canopy, the velocity profile transitions from a flat profile to an exponential profile, which is consistent with other studies that discuss the

S-shaped profile. The Reynolds stress possesses the exponential profile found in canopy

flows with the largest stress in the upper canopy [2, 98, 107, 106, 58, 110]. We also see that the turbulent shear stresses are dominated by the Reynolds stress while the dispersive stresses are found to be small, almost negligible. This is consistent with the space-filling and uniform nature of the cross-sectional footprint of the 3D-V tree


Figure 2.20: (a) Mean streamwise velocity profile, averaged in time and horizontal directions, for simulation of boundary layer flow over a canopy of “3D V-trees” using RNS. A side view of such a tree is also shown in the profile as a reference. (b) Shear stress profiles, including mean turbulence shear stresses as well as dispersive and total stress.

canopy, especially in the upper canopy, resulting in small deviations from the planar average mean velocity [58]. In addition, in the outer flow the dispersive stresses are negligible, consistent with the idea that this canopy appears as a relatively uniform roughness to the outer flow.

Figure 2.21 shows time series of the drag forces exerted on the trees. In this plot, the forces shown are averaged over the four distinct trees in the simulation domain

– for individual trees and branches the fluctuations are even more pronounced. It is possible that the periodic peaks observed in the time series of forces are caused by sweeping events transporting high momentum from the outer flow to within the canopy [36, 97, 94]. Further, more detailed analysis is required to address these events and is left for future study.


Figure 2.21: Time series of canopy averaged drag forces. Solid line: force on the resolved branches, dashed line: subgrid-scale force, and small-dashed line: total force.

The results indicate that the subgrid-scale forces provide a larger contribution to the total loading than the resolved forces. Based on the mean values, the subgrid- scale force provides nearly 75% of the total drag, while possessing only roughly 56% of the exposed surface area, giving a higher force density than what is found by the resolved components. We attribute this finding to the fact that the subgrid region is in the upper canopy which mainly interacts with the external, faster flow resulting in large stress [107, 5, 2]. In the lower canopy the resolved branches are exposed mostly to the slower flow resulting in lower drag forces. We explain this result qualitatively by observing the mean velocity, frontal area (per unit height), and mean drag force.


Figure 2.22: Vertical profiles of (a) the mean velocity, (b) frontal area density (per unit height) and (c) resulting mean drag force.

The frontal area is computed as

$$ a(z) = \begin{cases} N_B^{\,g(z)+1}\, ld/h, & 0 \le z/h \le 2 \\ 0, & \text{otherwise} \end{cases} \qquad (2.44) $$

where the generation g is expressed as a function of the vertical height. In

Figure 2.22, the profiles of these quantities are presented. It is clear that the drag force correlates directly with the higher velocity and larger frontal area found in the upper canopy.

From this figure we observe the highest mean velocity along with the largest exposed surface area density at the top of the canopy. Likewise, the resulting drag force profile peaks sharply in the upper part of the canopy near the outer-flow interface.

Next, time series of the forces on the target tree predicted during the simulation


Figure 2.23: Time series of computed forces on the three b elements of the sample tree during a representative time period.

are presented. For each of the three separate b elements of the target tree, the forces consist of the drag force on r elements (resolved), β elements (subgrid), and the total

b elements. Representative time series are shown in Figure 2.23. These results clearly

indicate a highly unsteady and non-uniform loading on the upper parts of the target

tree. This is caused by the particular tree orientation and spacing of trees in the

canopy. The b1 element is less obstructed by neighboring objects than elements b2

and b3, and is thus more exposed to the free-stream velocity; this results in larger

forces on b1 than for b2 and b3.

Shown in Figure 2.24 are the computed RNS drag coefficients of the target tree.

In the figure are the mean values and standard deviations for each of the b elements.

The averaging was performed over a time period 40 < t/τ < 120 and was started after a "spinup" period in order to avoid including effects from start-up transients.

Figure 2.24: Mean values (vertical bars) and plus/minus one standard deviation (error bars) of the computed RNS drag coefficients for each of the three branches.

From these results we observe that the drag coefficient for element b1 is the largest of the group, with element b2 having the smallest mean value. We attribute the differences among the mean values to the spatially varying flow structure interacting with these elements. As was illustrated in Figure 2.16, each element has unique upstream obstructions that influence the local flow it is exposed to. Aside from the differences in the mean drag coefficient, the variability represented by the error bars is similar among all elements.


Figure 2.25: Total drag coefficient computed for an entire tree. The solid line (computed from RNS) is based on the total force exerted on the trees averaged across all trees (canopy averaged). The horizontal (dashed) line is the mean value obtained in a laboratory experiment [43].

2.7.2 Comparison with Experimentally Determined Drag Coefficient

Next, we document the drag coefficient for an entire tree, computed according to,

$$ C_d = \frac{|F_T|}{0.5\,\rho\, A_{ref}\, V_{ref}^{\,2}} \qquad (2.45) $$

where F_T is the total drag force exerted on the entire tree (i.e., now including the forces involving the resolved generations g0, g1 as well as the unresolved ones), A_ref is the line-of-sight frontal area of the (entire) tree (computed to be 2.726h²), and V_ref is some reference velocity.


To enable comparisons with a laboratory experiment (Graham et al. [43]), the reference velocity is taken to be the mean velocity at the location (x, y, z) = (1.984h, 0, 2.992h) relative to the center of the base of the target tree, the same location where the mean velocity was measured in the laboratory experiment (see below). The total drag force used in (2.45) is the value from Figure 2.21. A time history of the total drag coefficient is shown in Figure 2.25. Its time averaged value, evaluated between 40 < t/τ < 120 when the flow has reached statistically stationary behavior, is Cd = 0.32. It is useful to recall that the various drag coefficient values obtained (0.8 for the planar V-trees, between 2 and 2.5 for the 3D-V tree branches, and 0.32 for the entire 3D-V tree) differ because they are each defined based on different reference areas and velocities.

Laboratory experiments of boundary layer flow over a canopy of fractal trees have been performed in a water tunnel [43] at a Reynolds number of Re_h = V_ref h/ν ≈ 5.5 × 10⁴. The height of the tree generator is h = 50.4 mm. The reference mean velocity measurement was performed using a Pitot probe placed at the same location as that mentioned for the simulation described before. The reference mean velocity was found to be V_ref = 1.1 m/s. Drag force measurements were performed using a load cell at the base of a tree (target tree) within the model canopy. Due to constraints in the tree model construction, which required branches of size more than 1 mm, the fractal tree consists of five generations. The force measurement was scaled with the tree frontal area and reference velocity as in (2.45), and the experimentally determined drag coefficient (time averaged) was found to be Cd = 0.35, i.e., the computationally determined value is approximately 8% less than the measured value, showing reasonably good agreement between the two.

2.8 Conclusions

In this study, renormalized numerical simulation (RNS) is further developed and tested for flow over a regular canopy of fractal trees. The method is based on the fundamental requirement (a generalization of the Germano identity) that the modeled drag force for a given branch must be equal to the sum of (1) the resolved force on the resolved sub-branches and (2) the modeled force on all of the unresolved subgrid-scale branches.

The original formulation of Chester et al. [21] is cast in more general terms and several variants are introduced. These include temporal averaging, spatial localization, and explicit versus implicit formulations. The temporal averaging provides a more consistent framework since the base drag model is applicable for the mean drag force, rather than instantaneous values. To allow applications of RNS to simulating flow over heterogeneous fractal structures, a spatially localized version of RNS is introduced. Even within homogeneous structures, different branches may have different drag coefficients due to different orientations with the flow. This issue is addressed within the local RNS version. Finally, implicit and explicit formulations of RNS are presented, depending on whether the drag force of the unresolved scales is represented by the most recent drag coefficient (explicit), or whether the forces are subtracted and an independent drag coefficient is obtained at every time-step of the simulation

(implicit).

A case study comparing the four variants is then performed using a simplified fractal tree canopy that is composed of planar "V-trees". The results from the various applications show only small differences in the measured drag forces among the four variants. However, the impact of temporal averaging on the computation of the drag coefficient is clearly evident. It is found that the explicit formulations demonstrate good stability characteristics. Conversely, additional tests show that the implicit formulation can suffer from instability when applied to more complex, three-dimensional geometries. Due to the flexibility afforded by the local formulation, along with the stability found in the explicit description, it is therefore concluded that the time-averaged, local, and explicit RNS formulation – model M4 – is the best for practical applications.

The second application described in this chapter is the modeling of the turbulent flow over a canopy consisting of more complex fractal trees, with branches not restricted to a plane. Based on the earlier findings, the time averaged, local, and explicit RNS model (M4) was implemented. Results from this simulation indicate that the flow-canopy interaction is well represented by the modeled subgrid-scale forces. Also, the drag coefficient based on the total tree force computed from RNS compares well with an experimentally determined value.

It bears remembering that the approach as presented here is limited to applications


in high Reynolds number flows, since it is assumed that the drag coefficient, cd, for a

scale-invariant branch of bluff-body cross-section, is independent of Reynolds number.

As pointed out in Chester et al. [21], in applications involving finite Reynolds numbers,

this assumption can be relaxed by allowing for further determination of parameters

associated with possible Re dependencies. Similar generalizations may be developed

to describe possible dependencies upon the parameter h_g/z, describing the length-scale to height ratio, which was assumed to be asymptotically small in the present approach. We also remark that the dynamic roughness length approach described in Anderson and Meneveau [3] to model turbulent flows over rough surfaces with power-law height spectra is similar to RNS, since it too is based on the condition that the force (in that case total surface drag) should be invariant to the scale used to resolve surface height fluctuations.

Chapter 3

Concurrent Precursor Simulation

3.1 Introduction

A concurrent precursor simulation methodology has been developed for producing

accurate inflow conditions for multiscale turbulent simulations. The discussion on the

CPS technique is continued in this chapter from its introduction in Chapter 1. In

§ 3.2, the implementation of this technique is discussed. Also in § 3.2, the fringe method used for enforcing inflow boundary conditions in periodic domains is presented. After the CPS methodology is presented in further detail, it is applied to two cases in § 3.3. The first application case is turbulent duct flow obstructed by a single fractal tree.

The fractal tree used in this simulation is identical to the “3D V-tree” presented in

Chapter 2. Furthermore, the modeling of the upper portion (unresolved branches) is conducted using RNS. For the second application, a CPS is performed for a finite


length wind farm. Conclusions and summary are then given in § 3.4.

3.2 Implementation

Shown in Figure 3.1 is an example case of turbulent boundary layer flow over a single wall mounted cube. In the figure, the velocity magnitude in viscous units is shown. Moreover, the figure demonstrates how the CPS is conducted using two domains: the upstream precursor simulation and the downstream target simulation. At the aft end of the downstream domain is the fringe region, which is used to smoothly transition the wake-region flow toward the flow sampled from the sample region at the aft end of the upstream domain. At each time step during the simulation, data from the sample region are transferred to the target simulation and applied in the fringe region. The figure also notes how the global MPI rank is distributed between the two domains. When the MPI job is launched, the total number of processes is divided by 2 and redistributed between the two domains.

The two main features of CPS, synchronization and direct memory transfers for the sampling operation, are achieved using the message passing interface (MPI). The precursor and target simulations are launched in the multiple program-multiple data

(MPMD) paradigm where a program for the precursor simulation and one for the target simulation are launched together as a single MPI job. The two simulations execute independently from one another save for the synchronous sampling operation


and dynamic time stepping evaluation (if used).

Figure 3.1: An example CPS of turbulent boundary layer flow over a wall mounted cube.

Shown in Figure 3.2 is a schematic of the two domain setup as implemented in LESGO. The upstream, precursor domain is denoted as the "red" domain while the domain of the target simulation is called the

“blue” domain. In the schematic, the slab domain decomposition is shown with the corresponding local processor rank listed in the respective colors next to the domains.

The global processor rank is also listed next to the processor slabs in parentheses.

For the slab domain decomposition there is a one-to-one mapping between the local processor ranks for the data transfers during the sampling operation.

The CPS methodology is implemented in LESGO by first splitting the default

MPI_COMM_WORLD communicator into two communicators associated with the red and


blue domains.

Figure 3.2: Schematic showing the domains containing the precursor simulation ("red") and the target simulation ("blue"). The intra-domain decomposition and respective MPI ranks are shown next to the domains.

In Figure 3.3 this communicator splitting is shown for a case with 8 processes total, with 4 processes used for the red domain and 4 for the blue. Each respective local communicator is created from MPI_COMM_WORLD with MPI_Comm_split and is then used for all intra-domain communication during the LES. To establish inter-domain communication patterns for the sampling operation, a bridge communicator is created with the MPI function MPI_Intercomm_create, which effectively creates the one-to-one mapping based on the local processor rank. This inter-domain communicator is then used when sampling data from the red domain and inserting it into the blue domain.

A convenient feature of this implementation is that only two primary additions


to an existing MPI-based code are required to implement the CPS methodology.

Figure 3.3: Schematic showing the splitting of the default MPI_COMM_WORLD communicator into two communicators associated with the red and blue domains for the CPS implementation. The bridge communicator used during the sampling operation is also shown.

The first is the intra-domain communicator assignment. Once defined, the intra-domain communicator may be used in place of the original MPI communicator to maintain the original communication patterns within the respective domains (i.e., red or blue). The second is the addition of the sampling operation. This addition just requires copying data from the red domain to a contiguous buffer and transferring it to the corresponding process in the blue domain using MPI send and receive calls. These transfers are performed using the established inter-domain communicator, as illustrated in Figure 3.3.
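To make these two additions concrete, the following C fragment sketches the communicator setup and a single sampling exchange. It is an illustrative, self-contained sketch rather than the LESGO implementation (which is written in Fortran): the SPMD color flag, buffer size, and message tag are assumptions introduced here, and only standard MPI routines (MPI_Comm_split, MPI_Intercomm_create, MPI_Send, MPI_Recv) are used.

#include <mpi.h>
#include <stdlib.h>

/* Illustrative CPS communicator sketch (not the LESGO source).  A single
 * SPMD executable is used here for brevity; color = 0 marks the "red"
 * (precursor) half of the ranks and color = 1 the "blue" (target) half.
 * An even total process count is assumed so the two halves match. */
int main(int argc, char **argv)
{
    MPI_Comm local_comm, bridge_comm;
    int world_rank, world_size, color, local_rank;
    const int nbuf = 1024;              /* assumed size of one sampled slab */
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into the red and blue intra-domain communicators. */
    color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &local_comm);
    MPI_Comm_rank(local_comm, &local_rank);

    /* Bridge (inter-domain) communicator: rank 0 of each half is the local
     * leader; the remote leader is addressed by its MPI_COMM_WORLD rank. */
    int remote_leader = (color == 0) ? world_size / 2 : 0;
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         97, &bridge_comm);

    /* One sampling exchange: each red process sends its contiguous sample
     * buffer to the blue process with the same local rank (one-to-one map). */
    buf = malloc(nbuf * sizeof(double));
    if (color == 0) {
        /* ... pack sample-region data into buf ... */
        MPI_Send(buf, nbuf, MPI_DOUBLE, local_rank, 0, bridge_comm);
    } else {
        MPI_Recv(buf, nbuf, MPI_DOUBLE, local_rank, 0, bridge_comm,
                 MPI_STATUS_IGNORE);
        /* ... apply buf in the fringe region ... */
    }

    free(buf);
    MPI_Comm_free(&bridge_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}

In the MPMD mode described above, the same calls apply; the color would instead be implied by which executable a rank belongs to, with the job launched, for example, as "mpirun -np 4 ./precursor : -np 4 ./target" (the colon-separated MPMD syntax supported by common MPI launchers).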

For Fourier spectral collocation methods, imposing inflow conditions requires additional consideration as the periodicity must be maintained while the inflow boundary condition is imposed. Various methods have been implemented which modulate the


simulation procedure to enforce these conditions. The method used in this work is

the fringe technique [9, 96]. The implementation of the fringe technique used here is

expressed as

$$ u(x, y, z) = w(x)\, u_p(x, y, z) + \big(1 - w(x)\big)\, u(L_s, y, z) \qquad (3.1) $$

where x is the streamwise position, u_p the velocity field sampled from the precursor domain, u the LES velocity field, L_s the x location of the start of the fringe region, and w the weighting function. The weighting function is used to slowly transition the velocity field in the fringe region from the velocity at the beginning of the fringe region to the field obtained from the sampling region of the upstream domain. The

definition of w is expressed as

$$ w(x) = \begin{cases} \dfrac{1}{2}\left[1 - \cos\left(\pi\, \dfrac{x - L_s}{L_{pl} - L_s}\right)\right], & L_s \le x < L_{pl} \\[2ex] 1, & L_{pl} \le x < L_e \end{cases} \qquad (3.2) $$

where x indicates the streamwise position and L_pl = L_e − (1/4) L_f, with L_e the end of the fringe region (typically the end of the domain).
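A minimal sketch of how (3.1) and (3.2) might be applied along a line of grid points is given below, assuming for illustration a one-dimensional array of streamwise samples; the array names, spacing dx, and index handling are hypothetical and not taken from LESGO.

#include <math.h>

/* Blend the LES field u toward the precursor sample up inside the fringe
 * region, following (3.1) and (3.2).  Illustrative 1D sketch: u and up are
 * values along x at spacing dx; Ls, Lpl and Le are as defined in the text. */
void apply_fringe(double *u, const double *up, int nx, double dx,
                  double Ls, double Lpl, double Le)
{
    int is = (int)(Ls / dx);   /* index of the start of the fringe region    */
    double uLs = u[is];        /* u(Ls): LES value at the start of the fringe */

    for (int i = is; i < nx; i++) {
        double x = i * dx, w;
        if (x < Lpl)
            w = 0.5 * (1.0 - cos(M_PI * (x - Ls) / (Lpl - Ls)));  /* ramp, (3.2)  */
        else if (x < Le)
            w = 1.0;                                              /* plateau      */
        else
            break;
        u[i] = w * up[i] + (1.0 - w) * uLs;                       /* blend, (3.1) */
    }
}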

3.3 Application Cases

The CPS methodology is applied to two test cases. The first test case discussed is a single fractal tree within turbulent duct flow. The fractal tree is the same tree from


§ 2.7 and is represented using RNS. The second test case is an LES of a finite length wind farm. The wind farm is composed of a regularly spaced array of wind turbines.

The turbines are represented using an actuator disk model. Both of these cases are presented and results from the simulations are discussed in the following two sections.

3.3.1 Single fractal tree in turbulent duct flow

The single fractal tree is placed in a duct containing turbulent flow. The upstream domain of the CPS contains a duct which generates fully developed wall bounded turbulence. The duct walls are represented using the IBM. The fractal tree is identical to one of those used in § 2.7 and is represented using RNS. The dimensions of the duct and tree orientation are identical to those used in the experimental work of Bai et al. [4]. Furthermore, the setup of this case is designed to match the experimental setup. However, the major difference between the setup of the current analysis and the experimental setup is the structure of the inflow. The experimental work applied flow conditioners to generate an augmented boundary layer flow which results in a mean velocity profile that differs from that found in pressure driven duct flow (see [4] for further details). Though care must be taken with direct comparison of the velocity and other turbulent quantities between the experimental results and the numerical results of the current analysis, a qualitative comparison may be safely conducted. Detailed comparison of these data is planned for future studies and is not part of the current work. However, results from this CPS are used to verify the

current implementation.

The configuration of the two domains for the CPS is shown in Figure 3.4. The duct and the fractal tree are shown along with the boundaries of the computational domain. The computational domain along the y and z directions is set slightly larger than the duct perimeter for the IBM implementation. The size of the duct for both domains is defined as {(x, y, z) : 0 ≤ x ≤ 30.0h, 0 ≤ y ≤ 11.111h, 0 ≤ z ≤ 4.5635h} where h is the height of the base generation. The origin of the base of the fractal tree is located at (x, y, z) = (6.5h, 5.5556h, 0.0h) relative to the origin of the duct in the downstream domain. For the RNS in the downstream domain, model M4 is used with a temporal averaging time-scale of T/τ = 1.17 where τ is given by (2.11). The RNS elements for the subgrid scale branches are defined the same as in § 2.7.

In Figure 3.5, the instantaneous streamwise velocity along a y-plane in the precursor (red) domain is shown. From the figure, the structure of the wall bounded turbulence is clearly evident. The sample region in this domain begins at x/h = 26 and extends to the end of the domain. Shown in Figure 3.6 is the x component of velocity at several z-planes. The z-planes are located at the mid-planes of generations g1, g2, and g4. The g1 plane intersects resolved branches which are imposed with the IBM, whereas the g2 and g4 planes intersect the subgrid scale branches modeled during the RNS. Wall generated turbulence is also evident near the top and bottom part of each plane. The complexity of the multiscale wake structure is also obvious, especially in the higher generations.


Figure 3.4: Domain setup for the CPS of a single fractal tree in turbulent duct flow. Note that the third branch of each branch cluster in the second generation is aligned along the line-of-sight direction of the viewing angle.

For the generations with subgrid-scale modeling, individual branches become less apparent and the wake structure is generated primarily from aggregate branch clusters. With regard to the viability of the CPS implementation, the inflow turbulence is clearly imposed. Though the upstream domain is computed independently from the downstream domain, information from the sampling operation is transferred and imposed correctly with the fringe treatment in the downstream domain. Moreover, the ability to generate "realistic" inflow turbulence is also achieved, which is important for the study of flow structure interaction with the multiscale fractal tree.


Figure 3.5: Instantaneous streamwise velocity along a y-plane for the precursor (red) domain in the CPS of a single fractal tree in turbulent duct flow.

3.3.2 Finite length wind farm1

In this application, a developing, finite length wind farm is simulated using a CPS

(performed by Dr. Richard Stevens). In this simulation, atmospheric boundary layer

flow is simulated in the upstream domain while a wind farm composed of an array of wind turbines is located in the downstream domain. The wind turbines are modeled using an actuator disk approach (see Ref. [100]).

The wind farm consists of 13 rows in the streamwise direction and 6 turbines in the periodic spanwise direction. The spacing between the turbines is 7.85D in the streamwise direction and 5.24D in the spanwise direction where D is the turbine rotor

1Portions have been reprinted with author retained re-usage rights from R. J. Stevens, J. Graham, and C. Meneveau, Renew. Energy, accepted (2014). Copyright 2014, Elsevier.


Figure 3.6: Instantaneous streamwise velocity along three z-planes for the target (blue) domain in the CPS of a single fractal tree in turbulent duct flow.


diameter. Both domains are 3.14 × 12.57 × 2 km in the spanwise, streamwise, and vertical directions, respectively, and different resolutions are considered. The length of the fringe region is 1.13 km in the streamwise direction and no turbines are placed in the region close to the start of the fringe region.
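As a rough consistency check of these dimensions (an inference from the quoted spacings, not a value taken from Ref. [100]), if the periodic spanwise direction is assumed to contain exactly the six spanwise spacings, the implied rotor diameter is

$$ D \approx \frac{3.14\ \mathrm{km}}{6 \times 5.24} \approx 100\ \mathrm{m}, $$

so the 13 rows span roughly $12 \times 7.85D \approx 9.4$ km of the 12.57 km streamwise extent, leaving room for the 1.13 km fringe region and the turbine-free buffer ahead of it.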

In Figure 3.7, the instantaneous streamwise velocity is shown on a plane located at the rotor hub height. Also, the domain dimensions and fringe region are denoted in the figure. The elongated wall streaks are evident in both domains. In the downstream domain, the turbines are demarcated by black lines. The turbine wake structures are also evident within the wind farm. Shown in Figure 3.8 is the normalized power output for each row of the wind farm. The plot contains data collected from the Horns Rev wind farm and data from two CPS at differing grid resolutions. The results indicate good agreement between the numerical and field data.

Figure 3.7: Instantaneous streamwise velocity in the upstream and downstream domains during the CPS of the developing, finite length wind farm. Figure prepared by Dr. Richard Stevens and reused from Stevens et al. [100].


[Plot: normalized power output P(row)/P(row 1) versus row number, comparing Horns Rev field data with CPS results at grid resolutions 1024 × 128 × 256 and 1536 × 192 × 384.]

Figure 3.8: Power output comparison of field data from the Horns Rev wind farm and CPS at two grid resolutions. Figure prepared by Dr. Richard Stevens and reused from Stevens et al. [100].

3.4 Conclusion

A concurrent precursor simulation methodology has been developed for generating inflow data for turbulence simulations containing periodic boundary conditions. A

CPS is similar to an SPS, in that an independent turbulent simulation is performed to generate accurate inflow data. However, there are several advantages to the CPS when compared to the SPS. First, the CPS generates the inflow data from the upstream domain synchronously with the target simulation, whereas the SPS must be performed before the target simulation is conducted. Second, the CPS transfers the sampled inflow data by direct memory copies using MPI, thus bypassing the slower disk I/O operations required in the SPS for reading and writing to the inflow library.


As discussed in § 3.3, two applications using CPS have been performed. The first application is turbulent duct flow containing a fractal tree. The results of the analysis qualitatively indicated correct enforcement of the inflow condition with the fringe treatment and sampling operation. Furthermore, the capability to generate wall bounded turbulent inflow conditions is demonstrated. In the second application, a finite length wind farm simulation has been performed using CPS. The results demonstrate the ability to generate proper atmospheric boundary layer inflow data containing the expected near wall streaks important for accurate calculations of the momentum transport within the wind farm. A comparison of numerical data at two grid resolutions and field data from the Horns Rev wind farm shows good agreement.

Part II

Data-Intensive Computing

Chapter 4

Johns Hopkins Turbulence Databases

4.1 Introduction

In this chapter, the JHTDB, first presented in Chapter 1, is discussed. We begin the discussion in § 4.2 with an overview of the JHTDB including the DNS databases, Web services, and client interfaces. For the DNS databases, there are currently three

available: forced isotropic turbulence, isotropic magnetohydrodynamic turbulence,

and the newly added channel flow turbulence database. The channel flow database is

generated from a large-scale DNS of turbulent channel flow at Reτ = 1000 (presented in detail in Chapter 5). For the DNS, we use the PoongBack code [66] to simulate

1/2 of a flow-through time. During the DNS, the data (velocity and pressure fields)

are stored to disk every 5 time steps. These data are then transferred to and ingested into the JHTDB. An overview of the Web services which expose these data to the research community worldwide is presented in § 4.2.2. We then present in § 4.3 the implementation of the spatial interpolation and differentiation schemes used for the

Web service functions for the channel flow database. These schemes are based on the generalized barycentric Lagrange interpolation method of Berrut and Trefethen [8].

The design and implementation of the Matlab client interface for the JHTDB is given in § 4.4. The Matlab interface is based on the intrinsic Matlab SOAP Web service functions and is designed to simplify the sending and receiving of SOAP messages from the JHTDB. Furthermore, the Matlab interface allows direct interaction with the JHTDB from a standard Matlab session, making it not only easy to use but also cross-platform compliant. In addition to the implementation, several examples are presented.

Finally, in § 4.5 a summary of this chapter is given.

4.2 Design and Construction

A primary goal of the JHTDB is to expose large-scale turbulent data to the research community while at the same time providing easy-to-use client interfaces for retrieving and interacting with the data [89, 68]. The client interfaces remove the need for the end user to know details of the data storage such as the SQL query language.

To accomplish this, the JHTDB is composed of two primary components. The first

component is the Web server (front-end) which provides the layer with which remote clients interact. The second component is the database cluster (backend) that contains the turbulence datasets and provides a scalable infrastructure that supports data intensive analyses. This Web service model using a database cluster backend has also been deployed successfully by the Sloan Digital Sky Survey in which multi-terabyte astronomy data archives are available to researchers [101].

In Figure 4.1, a schematic of the JHTDB is shown. Within the figure, the Web server and the database cluster are illustrated with respect to their relative, logical location. At the top of the figure the remote clients are indicated. It can be readily seen that the remote clients are separated from the databases by the Web server, which serves as a "middleman" for processing incoming requests and returning the requested data.

Data for a JHTDB database is generated by a large-scale DNS performed on a high-performance computing facility (i.e., a supercomputer or cluster). For the channel

flow DNS discussed in Chapter 5, the simulation is performed on the supercomputer

Stampede at the Texas Advanced Computing Center [102]. Turbulent data from a

DNS is periodically written to disk (e.g. every 5 or 10 time steps) as standard C or Fortran arrays. These data files are then transferred to an intermediate storage server local to the JHTDB. The data are then manually ingested into the database cluster (data ingestion for the channel flow database is performed by Kalin Kanov; see § 4.2.1 for further details).


Figure 4.1: Schematic of the JHTDB indicating the logical layout of the remote clients, Web server, and the database cluster. Source JHTDB [55].


As mentioned in the previous paragraph, the current method for building one of the JHTDB databases requires multiple "hands-on" steps. After the data are generated, the turbulent data must be transferred to an intermediate storage location where it is then manually ingested into the database. At JHU, a data-intensive computing framework called MPI-DB [41, 44] is currently under development which removes these labor intensive steps. MPI-DB effectively inserts the database storage capability in the memory hierarchy of the HPC system and allows direct interaction with a database. Therefore, a database can be created during simulation runtime.

In Appendix B, MPI-DB is presented along with an application to the generation of a channel flow database during simulation runtime. Note the channel flow DNS discussed in Appendix B is separate from that in Chapter 5.

4.2.1 Database Cluster

The database cluster is composed of a networked database system running Windows 2008 Server and SQL Server 2008 (64-bit). Within the database cluster, data is partitioned across 4-8 nodes. Each node runs 4 virtual database servers in order to leverage the multithreaded capability of the database nodes. As discussed previously, high I/O throughput is important for data processing. To support this requirement, the database cluster is composed of 30 750-GB disks per node, providing an aggregate throughput of 1.0 GB/s for 8 nodes. Data from a turbulence simulation are stored in the database in 83 binary large objects (BLOBs). Each of these blobs is ordered

99 CHAPTER 4. JOHNS HOPKINS TURBULENCE DATABASES according to a (Morton) Z-curve [89]. The Z-curve provides a single index for the spatial location of the data and ensures spatial locality of the storage layout while providing that no location is omitted (i.e. space filling). Spatial locality is important in order to support contiguous data access for typical usage patterns of the databases.

4.2.2 Web Services

The Web services provide a convenient mechanism for remote clients to access and retrieve spatiotemporal data from the turbulent databases. The Web services use

SOAP, which provides a standardized protocol for sending and receiving messages over the internet. Client functions are provided by the Web services which allow spatial and temporal interpolation, differentiation, particle tracking, and other secondary calculations to be performed within the database. These secondary calculations are in addition to client functions to request primary fields. Packaged libraries which allow easy use of the Web service functions are provided for C, Fortran, and Matlab.

Shown in Table 4.1 is a list of functions available for each database.

For spatial interpolation, Lagrange polynomial interpolation methods are used.

The client has the option of no interpolation, 4th order, 6th order, or 8th order interpolation. In the case of the isotropic and MHD databases, interpolation with uniform grid spacing is employed. For the channel flow database a generalized barycentric Lagrange interpolation method [8] is used which accounts for the non-uniform grid spacing in the wall normal direction of the channel. In the case of


Table 4.1: Web service functions available for each JHTDB database.

Function                      Isotropic   MHD   Channel
GetVelocity                       X         X       X
GetMagneticField                            X
GetVectorPotential                          X
GetPressure                       X         X       X
GetVelocityAndPressure            X         X       X
GetForce                          X         X
GetVelocityGradient               X         X       X
GetMagneticFieldGradient                    X
GetVectorPotentialGradient                  X
GetPressureGradient               X         X       X
GetPressureHessian                X         X       X
GetVelocityLaplacian              X         X       X
GetMagneticFieldLaplacian                   X
GetVectorPotentialLaplacian                 X
GetVelocityHessian                X         X       X
GetMagneticFieldHessian                     X
GetVectorPotentialHessian                   X
GetPosition                       X         X
GetBoxFilter                      X         X
GetBoxFilterSGS                   X         X
GetBoxFilterGradient              X         X

spatial differentiation, finite difference methods are used. As with spatial interpolation, standard finite differencing weights for constant grid spacing are applied to the isotropic and MHD databases. The channel flow database uses finite differencing weights determined from the barycentric Lagrange interpolation method. For both spatial interpolation and differentiation of the channel flow database, the weights are precomputed and called from a database table when needed. Temporal interpolation is performed using either piecewise cubic Hermite interpolating polynomial (PCHIP) interpolation or no interpolation, depending on the client program. The spatial interpolation and differentiation formulations for the channel flow database are discussed in § 4.3. For a complete description of the interpolation methods used, see the JHTDB documentation [55].

4.3 Channel Flow Database Interpolation and Differentiation Methods

The JHTDB Web services provide methods for performing spatial interpolation and differentiation within the database. Discussed in the following sections are the formulations of these methods used for the channel flow database.


4.3.1 Spatial Interpolation

Spatial interpolation for domains with non-uniform grid spacing (e.g. the channel

flow domain) is applied using multivariate polynomial interpolation of the barycentric

Lagrange form from Berrut and Trefethen [8]. Using this approach, we are interested in interpolating the field f at point x′. The point x′ is known to exist within the grid cell at location (xm, yn, zp) where (m, n, p) are the cell indices. The cell indices are obtained for the x and z directions, which are uniformly distributed, according to

m = floor(x′/dx) (4.1)

p = floor(z′/dz).

In the y direction the grid is formed by Marsden-Schoenberg collocation points which are not uniformly distributed. Along this direction we perform a search to obtain n such that

$$ \begin{aligned} y_n \le y' < y_{n+1} \quad &\text{if } y' \le 0 \\ y_{n-1} < y' \le y_n \quad &\text{if } y' > 0 \end{aligned} \qquad (4.2) $$


The cell indices are also assured to obey the following:

$$ \begin{aligned} 0 &\le m \le N_x - 2 \\ 0 &\le n \le N_y/2 - 1 \quad &&\text{if } y' \le 0 \\ N_y/2 &\le n \le N_y - 1 \quad &&\text{if } y' > 0 \\ 0 &\le p \le N_z - 2 \end{aligned} \qquad (4.3) $$

where N_x, N_y, and N_z are the number of grid points along the x, y, and z directions, respectively. In the case that x' = x_{N_x-1}, the cell index is set to m = N_x − 2; likewise for the z direction.

The interpolation stencil also contains q points in each direction for an order q interpolant (with degree q − 1). The resulting interpolated value is expressed as

$$ f(\mathbf{x}') = \sum_{i=i_s}^{i_e} \sum_{j=j_s}^{j_e} \sum_{k=k_s}^{k_e} l_x^i(x')\, l_y^j(y')\, l_z^k(z')\, f(x_i, y_j, z_k) \qquad (4.4) $$

where the starting and ending indices are given as

$$ \begin{aligned} i_s &= m - \mathrm{ceil}(q/2) + 1, & i_e &= i_s + q - 1, \\ j_s &= \begin{cases} n - \mathrm{ceil}(q/2) + 1 + j_o & \text{if } n \le N_y/2 - 1 \\ n - \mathrm{floor}(q/2) + j_o & \text{otherwise,} \end{cases} & j_e &= j_s + q - 1, \\ k_s &= p - \mathrm{ceil}(q/2) + 1, & k_e &= k_s + q - 1, \end{aligned} \qquad (4.5) $$

and j_o is the index offset for the y direction depending on the distance from the top and bottom walls. The ceil() function ensures that the stencil remains symmetric about the interpolation point when q is odd. In the case of j_s, the separate treatments for the top and bottom halves of the channel are applied to ensure that the one-sided stencils remain symmetric with respect to the channel center. The value of j_o may be evaluated based upon the y cell index and the interpolation order as

$$ j_o = \begin{cases} \max(\mathrm{ceil}(q/2) - n - 1,\, 0) & \text{if } n \le N_y/2 - 1 \\ \min(N_y - n - \mathrm{ceil}(q/2),\, 0) & \text{otherwise.} \end{cases} \qquad (4.6) $$
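For example (a worked case of (4.5)-(4.6), not a value drawn from the database), consider an 8th-order stencil (q = 8) near the bottom wall with cell index n = 2. Then

$$ j_o = \max(\mathrm{ceil}(q/2) - n - 1,\, 0) = \max(4 - 2 - 1,\, 0) = 1, \qquad j_s = n - \mathrm{ceil}(q/2) + 1 + j_o = 0, \qquad j_e = j_s + q - 1 = 7, $$

so the one-sided stencil starts exactly at the wall rather than extending beyond it.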


The interpolation weights, l_x, l_y, and l_z, are given as

$$ l_\theta^{\xi}(\theta') = \frac{\dfrac{w_\xi}{\theta' - \theta_\xi}}{\displaystyle\sum_{\eta=\xi_s}^{\xi_e} \frac{w_\eta}{\theta' - \theta_\eta}} \qquad (4.7) $$

where θ may either be x, y, or z. The barycentric weights, w_ξ, in (4.7) are given as

$$ w_\xi = \frac{1}{\displaystyle\prod_{\eta=\xi_s,\, \eta\ne\xi}^{\xi_e} (\theta_\xi - \theta_\eta)}. \qquad (4.8) $$

The weights are computed by applying a recursive update procedure as in Berrut and

Trefethen [8]. A slightly modified version of the algorithm in Berrut and Trefethen [8]

is given below (Greg Eyink, personal communication, 2013):

for ξ = ξ_s to ξ_e do
    w_ξ = 1
end for
for ξ = ξ_s + 1 to ξ_e do
    for η = ξ_s to ξ − 1 do
        w_η = (θ_η − θ_ξ) w_η
        w_ξ = (θ_ξ − θ_η) w_ξ
    end for
end for
for ξ = ξ_s to ξ_e do
    w_ξ = 1/w_ξ
end for

To account for the periodic domain along the x and z directions we adjust the i and k indices when referencing f in (4.4) such that

$$ f(\mathbf{x}') = \sum_{i=i_s}^{i_e} \sum_{j=j_s}^{j_e} \sum_{k=k_s}^{k_e} l_x^i(x')\, l_y^j(y')\, l_z^k(z')\, f(x_{i\%N_x}, y_j, z_{k\%N_z}) \qquad (4.9) $$

where % is the modulus operator. The indices for the interpolation coefficients remain the same; however, we use the fact that the grid points are uniformly spaced such that (4.7) becomes

$$ l_\theta^{\xi}(\theta') = \frac{\dfrac{w_\xi}{\theta' - \xi\Delta\theta}}{\displaystyle\sum_{\eta=\xi_s}^{\xi_e} \frac{w_\eta}{\theta' - \eta\Delta\theta}} \qquad (4.10) $$

and similarly for the barycentric weights, (4.8) becomes

$$ w_\xi = \frac{1}{\displaystyle\prod_{\eta=\xi_s,\, \eta\ne\xi}^{\xi_e} (\xi - \eta)\Delta\theta} \qquad (4.11) $$

for the x and z directions. The computation of the barycentric weights for the x and z directions is carried out once (for a given interpolation order) for all grid points using (4.11); for the y direction the barycentric weights are computed for each point using (4.8).
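The following C fragment sketches the one-dimensional building blocks: the recursive weight update listed above and the evaluation of the barycentric form (4.7) at a query point. The array and function names are illustrative only; as described in the text, the database precomputes and stores these weights rather than evaluating them on the fly.

/* Barycentric weights w[0..q-1] for the stencil nodes theta[0..q-1],
 * using the recursive update procedure given above (Berrut & Trefethen [8]). */
void barycentric_weights(const double *theta, double *w, int q)
{
    for (int xi = 0; xi < q; xi++)
        w[xi] = 1.0;
    for (int xi = 1; xi < q; xi++) {
        for (int eta = 0; eta < xi; eta++) {
            w[eta] *= (theta[eta] - theta[xi]);
            w[xi]  *= (theta[xi]  - theta[eta]);
        }
    }
    for (int xi = 0; xi < q; xi++)
        w[xi] = 1.0 / w[xi];
}

/* Evaluate the 1D barycentric interpolant (4.7) of the samples f[0..q-1]
 * at the query point tp; if tp coincides with a node, return that sample. */
double barycentric_eval(const double *theta, const double *w,
                        const double *f, int q, double tp)
{
    double num = 0.0, den = 0.0;
    for (int xi = 0; xi < q; xi++) {
        double d = tp - theta[xi];
        if (d == 0.0)
            return f[xi];
        num += w[xi] / d * f[xi];
        den += w[xi] / d;
    }
    return num / den;
}

In three dimensions, the individual one-dimensional weights obtained this way for each direction are simply multiplied together and summed over the stencil as in (4.4).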


4.3.2 Spatial Differentiation

Spatial differentiation for grids with non-uniform spacing is performed using the

barycentric method of the interpolating polynomial. In one dimension (assuming the

x direction; the same applies for the y and z directions), the interpolant for the field f is given as

$$ f(x) = \sum_{j=i_s}^{i_e} l_x^j(x)\, f(x_j). \qquad (4.12) $$

It follows that the r-th derivative may be computed as

$$ \frac{d^r f}{dx^r}(x) = \sum_{j=i_s}^{i_e} \frac{d^r l_x^j}{dx^r}(x)\, f(x_j). \qquad (4.13) $$

Within the database we compute the derivatives at the grid sites for the FD4NoInt,

FD6NoInt, and FD8NoInt differencing methods where no interpolation is performed. If a sample point is given that does not coincide with a grid point, the derivative at the nearest grid point is computed and returned. For the FD4Lag4 method we compute the derivatives with the FD4NoInt method (at the grid sites) and then these data are interpolated to the interpolation point using the Lag4 interpolation method presented in § 4.3.1.

For evaluating derivatives at the grid sites we follow the method presented in


Berrut and Trefethen [8] such that

$$ \frac{d^r f}{dx^r}(x_i) = \sum_{j=i_s}^{i_e} D^{(r)}_{x,ij}\, f(x_j). \qquad (4.14) $$

where $D^{(r)}_{x,ij} = \dfrac{d^r l_x^j}{dx^r}(x_i)$ is the differentiation matrix [8]. The differentiation matrices for r = 1 and r = 2 are given, respectively, as

$$ D^{(1)}_{x,ij} = \frac{w_j/w_i}{x_i - x_j} \qquad (4.15) $$

$$ D^{(2)}_{x,ij} = -2\, \frac{w_j/w_i}{x_i - x_j} \left[ \sum_{k\ne i} \frac{w_k/w_i}{x_i - x_k} + \frac{1}{x_i - x_j} \right] \qquad (4.16) $$

for $i \ne j$, and

$$ D^{(r)}_{x,jj} = -\sum_{i\ne j} D^{(r)}_{x,ji} \qquad (4.17) $$

when $i = j$, for all $r > 0$. We note that in (4.16) and (4.17) fixes have been applied to the respective equations presented in Berrut and Trefethen [8], i.e., (9.4) and

(9.5). As with the interpolation schemes, the grid point locations for the uniformly distributed directions are expressed as θξ = ξ∆θ, where θ may either be x or z.

For second order mixed derivatives (such as for the pressure Hessian) we compute

the derivatives at the grid sites within the respective plane. When computing the

mixed partials along x and y we have

$$ \frac{d^2 f}{dx\, dy}(x_m, y_n) = \sum_{i=i_s}^{i_e} \sum_{j=j_s}^{j_e} D^{(1)}_{x,mi}\, D^{(1)}_{y,nj}\, f(x_i, y_j). \qquad (4.18) $$


Similar formulae exist for mixed partials along x and z, and y and z.
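A short sketch of assembling the first-derivative matrix from precomputed barycentric weights, using the off-diagonal entries (4.15) and the diagonal closure (4.17), is given below; the array names and row-major storage are illustrative assumptions.

/* First-derivative differentiation matrix for q stencil nodes x[0..q-1]
 * with barycentric weights w[0..q-1]: off-diagonal entries from (4.15),
 * diagonal entries from the row-sum identity (4.17).  D has length q*q
 * and is stored row-major. */
void diff_matrix_first(const double *x, const double *w, int q, double *D)
{
    for (int i = 0; i < q; i++) {
        double rowsum = 0.0;
        for (int j = 0; j < q; j++) {
            if (j == i)
                continue;
            D[i * q + j] = (w[j] / w[i]) / (x[i] - x[j]);   /* (4.15) */
            rowsum += D[i * q + j];
        }
        D[i * q + i] = -rowsum;                              /* (4.17) */
    }
}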

The differencing stencil size depends on the required order of the differencing method and the derivative order, r. In general, the resulting stencil size is determined as

$$ q = \begin{cases} q' + r & \text{for non-symmetric grid distribution about the evaluation point} \\ q' + r - (r+1)\%2 & \text{for symmetric grid distribution about the evaluation point} \end{cases} \qquad (4.19) $$

where q′ is the order of the differencing method. For example, to obtain a 6th order differencing method for the first derivative of f along the x, y, and z directions, a value of q = 7 is required. For the second derivative, the x or z directions require a value of q = 7 whereas the y direction requires q = 8 to achieve a 6th order differencing method.

4.4 Matlab Client Interface1

Client interfaces for the JHTDB Web services are provided to allow easy use of the available Web service functions (see Table 4.1). The currently provided client interfaces are for C, Fortran, and Matlab. In addition to these interfaces, the Web service functions are callable by any programming language that supports SOAP. Matlab has intrinsic SOAP functions that allow direct interaction with the JHTDB. However, they require advanced knowledge by the programmer to utilize them, especially with regard to the data layout of the sent and received SOAP messages. To alleviate this issue, Matlab functions are provided by the client interface that store data using basic Matlab datatypes. The details of this interface design and implementation along with a couple of examples are discussed in the following sections.

1 Portions reprinted with author retained re-usage rights from Yu et al., Journal of Turbulence, 13, 12 (2012). Copyright 2012, Taylor & Francis.

4.4.1 Design and Implementation

The Matlab interface allows clients to interact with the turbulence database di- rectly from a Matlab session. This interface is based on Matlab web service functions which communicate with the database directly using SOAP. All communication with the JHU Turbulence Database Cluster is controlled through the TurbulenceService

Matlab class. This class creates SOAP messages, queries the database, and parses the database response. For each database function a wrapper function has been created to perform the data translation and retrieval. One major advantage of the Matlab interface over its C and Fortran counterparts is the readily available functions and toolboxes that Matlab provides. With the Matlab interface, clients can retrieve sections of spatiotemporal data from the database, view the data with Matlab's plotting tools or perform secondary calculations on the data, all from the same Matlab session.

A standard distribution of Matlab contains a set of functions for creating

(createSoapMessage), sending (callSoapService), and parsing (parseSoapResponse)


SOAP messages. These routines use a W3C compliant Document Object Model

(DOM) approach for constructing and parsing the Extensible Markup Language

(XML) formatted SOAP message. The DOM provides a generic mechanism to create

XML documents. However, while being robust and dynamic, the DOM approach

holds the disadvantage of being computationally inefficient for large XML documents

– this inefficiency becomes a limiting factor for large database queries. To avoid this

critical problem, we have developed faster replacement functions to create and send

the SOAP message, and to parse the SOAP response². Therefore, by performing low-level string operations instead of employing the DOM, we can rapidly build and parse extensive XML documents, leading to a 100× speedup over the original DOM approach. Due to this increase in efficiency, the Matlab interface possesses similar performance characteristics as those of the C and Fortran database interfaces.

The basis for the Matlab database functions is created by using the createClassFromWsdl utility. This utility generates the TurbulenceService Matlab class from the Web Service Definition Language (WSDL) functions of the database web service. These generated files are modified to incorporate the newly developed faster Matlab SOAP routines. The purpose of the TurbulenceService class is to accommodate a request to the database by taking data from Matlab, generating an appropriate SOAP message, sending the message to the database, and finally retrieving and parsing the database response. While providing a direct mechanism for

2Custom SOAP parser developed by visiting scholar Edo Frederix


interacting with the database, the TurbulenceService class returns data packaged in a

Matlab structure array which may not be necessarily intuitive to most Matlab users.

We have, therefore, created wrapper functions which translate the response structure

into directly accessible Matlab vectors.

4.4.2 Code Examples

The following code snippets illustrate the complete mechanism, starting from

user-generated request data and ending with a parsed database response, stored in

response. From a Matlab script, request data will be provided to the getVelocity

TurbulenceService wrapper function as demonstrated in Listing 4.1. This wrapper function calls the TurbulenceService TS_getVelocity function (see Listing 4.2), and translates its structure into a vector of velocity components. The TS_getVelocity

function illustrated in Listing 4.3 assembles the data in a Matlab structure, creates the SOAP message, sends the SOAP message and parses the SOAP response.

For illustration, shown in Figure 4.2(a) is a velocity contour plot of sample data from the turbulence database. The data was retrieved using the getVelocity function from the Matlab interface and the contour plot was generated using Matlab's standard contour plotting tools. In Figure 4.2(b) a visualization of 'worms' is shown in a small subcube of the data at t = 0, using iso-Q surfaces generated using the Matlab implementation of getVelocityGradients to evaluate A_ij, computing the invariant Q = −(1/2) A_ij A_ji at every point in Matlab, and using Matlab 3D plotting tools.


Figure 4.2: Visualizations of the forced isotropic turbulence database using the Matlab client interface. (a) Velocity contour plot generated using the Matlab interface as available for download. (b) Visualization of 'worms' using iso-Q surfaces generated using the Matlab implementation of getVelocityGradients to evaluate A_ij, computing the invariant Q = −(1/2) A_ij A_ji at every point and using Matlab 3D plotting tools.


Listing 4.1: Example call to getVelocity from the Matlab interface

% Set client authentication key
authkey = '...';
% Set target database
dataset = 'isotropic1024coarse';
% Set temporal interpolation scheme
temporal = 'PCHIP';
% Set spatial interpolation scheme
spatial = 'Lag6';

% Create a set of (x,y,z) coordinates to query at a randomly
% chosen time step
points(1:3,:) = ...;
time = 0.002 * randi(1024, 1);

% Call TurbulenceService wrapper to perform getVelocity request at
% specified points
response = getVelocity(authkey, dataset, time, ..., points);

Listing 4.2: Sample getVelocity wrapper function

function response = getVelocity(authkey, dataset, time, ..., points)

    % Create the TurbulenceService object and call TS_getVelocity
    obj = TurbulenceService;
    responseStruct = TS_getVelocity(obj, authkey, dataset, ..., points);

    % Return a vector of velocity components
    response = getVector(responseStruct.GetVelocityResult.Vector3);

end

Listing 4.3: Sample TS_getVelocity TurbulenceService class function

function responseStruct = TS_getVelocity(obj, authkey, dataset, ..., points)

    % Construct a Matlab structure containing the data
    data = struct('points', struct('x', points(1,:), ...), ...);

    % Create the XML document, call the service and parse the response
    soapMessage = createSoapMessage('GetVelocity', data, ...);
    response = callSoapService(URL, soapMessage, ...);
    responseStruct = parseSoapResponse(response);

end


4.5 Conclusions

An overview of the JHTDB is presented in this chapter. As discussed, the JHTDB is composed of two major components: the clustered database backend where the DNS data are stored and the Web server frontend which exposes the data to the research community through Web services. A large-scale DNS was performed for 1/2 of a flow-through time and the output velocity and pressure fields were used to create the channel flow database. The resulting size of the velocity and pressure fields stored in the database is currently 48 TB, though additional data will be added to the database in the near future, resulting in a final size of 96 TB. Web service functions for the JHTDB provide clients the ability to retrieve the spatio-temporal data at any point in space or time within the database. To achieve this, interpolation methods are used.

For temporal interpolation, PCHIP is employed [55]. The implementation of the spatial interpolation methods for the channel flow database, which use the generalized barycentric Lagrange interpolation method [8], is discussed. For functions which require differentiation, the differencing weights of the Web service functions are also generated from the barycentric Lagrange interpolation method. The Matlab client interface for the JHTDB Web services is also presented. It is demonstrated that the Matlab interface provides a portable library that is based on intrinsic Matlab Web service functions using SOAP. Convenient wrappers have been provided in order to aid in generating and parsing the SOAP messages sent to and received from the JHTDB. A custom SOAP parser for Matlab (developed by visiting scholar Edo Frederix) is also


included in the client library. This parser allows an approximate 100× speedup over the intrinsic Matlab SOAP parser. The result is that comparable performance is observed between the Matlab and C/Fortran counterparts. One advantage the Matlab interface provides over the C/Fortran libraries is the plethora of readily available functionality provided by Matlab, which makes complex, secondary analysis with the

JHTDB data easier.

Chapter 5

Channel Flow DNS

5.1 Introduction

A large-scale DNS for the JHTDB is performed in this work. The channel flow

DNS is conducted at Reτ = 1000 for 1/2 of a flow-through time and the output velocity and pressure fields are stored in the JHTDB. In this chapter, we discuss this

DNS in further detail from Chapter 1 and present the implementation used for the simulation. In addition to the production simulation for the JHTDB, a post-simulation analysis using the channel flow data is presented. In this analysis, the coherent vortical structures of the channel flow turbulence are studied, specifically their scales and distributions.

The organization of this chapter is given as follows. We first present the governing equations and solution method for the channel flow DNS in § 5.2. Then we discuss the production DNS in § 5.3. In this section, the setup of the simulation along with statistical results of turbulence quantities are presented. Following this, the vortex analysis introduced in Chapter 1 is addressed. The conclusions to this chapter are then given in § 5.5.

5.2 Governing Equations

The implementation used in the channel flow code uses the formulation presented in Kim et al. [62]. This formulation (which will be denoted as KMM) uses a vertical velocity and vorticity representation by which the pressure calculation is eliminated.

The benefit of this approach is that the issue with specifying the correction pressure boundary condition is avoided, especially when explicit time integration is used.

The momentum equations are expressed as

$$ \frac{\partial \mathbf{u}}{\partial t} = -\nabla p + \mathbf{H} + \nu \nabla^2 \mathbf{u} \qquad (5.1) $$

where $\mathbf{H} = -\nabla\cdot(\mathbf{u}\otimes\mathbf{u}) + \mathbf{F}$ with F being the mean pressure gradient, ν the molecular viscosity, and continuity is given by the divergence-free condition

$$ \nabla\cdot\mathbf{u} = 0. \qquad (5.2) $$

During the transformations applied in the KMM formulation, the pressure Poisson


equation given by

$$ \nabla^2 p = \nabla\cdot\mathbf{H} \qquad (5.3) $$

is utilized. After applying the KMM transformations, the following set of equations

are obtained

$$ \frac{\partial}{\partial t}\nabla^2 v = h_v + \nu \nabla^4 v, \qquad (5.4) $$

$$ \frac{\partial g}{\partial t} = h_g + \nu \nabla^2 g, \qquad (5.5) $$

$$ f + \frac{\partial v}{\partial y} = 0, \qquad (5.6) $$

where

$$ f = \frac{\partial u}{\partial x} + \frac{\partial w}{\partial z}, \qquad (5.7) $$

$$ g = \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}, \qquad (5.8) $$

$$ h_v = -\frac{\partial}{\partial y}\left( \frac{\partial H_x}{\partial x} + \frac{\partial H_z}{\partial z} \right) + \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial z^2} \right) H_y, \qquad (5.9) $$

$$ h_g = \frac{\partial H_x}{\partial z} - \frac{\partial H_z}{\partial x}. \qquad (5.10) $$

The boundary conditions applied and enforced are

$$ v(\pm 1) = 0, \qquad (5.11) $$

$$ g(\pm 1) = 0, \qquad (5.12) $$

$$ f(\pm 1) = 0. \qquad (5.13) $$

The last boundary condition listed is important for solving for the pressure as it implies that the divergence free condition is not only enforced within the domain but also at the boundaries.

The solution procedure begins by solving for v and g from (5.4) and (5.5), respectively. Then, using (5.7) and (5.8), the other velocity components are obtained. For obtaining the pressure field, the PPE given by (5.3) is used to solve for p. Details for the pressure solver can be found in Appendix D. For further details of the numerical implementation of the flow solver see Lee et al. [66].
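As a brief illustration of this recovery step (a standard manipulation for the KMM formulation, sketched here rather than quoted from the PoongBack source), write (5.7) and (5.8) for a single Fourier mode $(k_x, k_z)$ of the homogeneous directions and solve the resulting 2×2 system:

$$ \hat{u} = \frac{-i\,(k_x \hat{f} + k_z \hat{g})}{k_x^2 + k_z^2}, \qquad \hat{w} = \frac{-i\,(k_z \hat{f} - k_x \hat{g})}{k_x^2 + k_z^2}, \qquad \text{with } \hat{f} = -\frac{\partial \hat{v}}{\partial y} \text{ from (5.6),} $$

valid for $(k_x, k_z) \ne (0, 0)$; the mean ($k_x = k_z = 0$) modes of u and w are treated separately.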

5.3 Production Simulation

The turbulent channel flow database is produced from a direct numerical simulation (DNS) of wall bounded flow with periodic boundary conditions in the longitudinal and transverse directions, and no-slip conditions at the top and bottom walls. In the simulation, the Navier-Stokes equations are solved using a wall-normal, velocity-vorticity formulation [62] (see § 5.2). Solutions to the governing equations are provided using a Fourier-Galerkin pseudo-spectral method for the longitudinal and transverse directions and a seventh-order basis-spline (B-spline) collocation method in the wall normal direction (see Appendix C for more information). Dealiasing is performed using the 3/2-rule [86]. Temporal integration is performed using a low-storage, third-


order Runge-Kutta method. Initially, the flow is driven using a constant volume flux

control (imposing a bulk channel mean velocity of U = 1) until stationary condi- tions are reached. Then the control is changed to a constant applied mean pressure gradient forcing term equivalent to the shear stress resulting from the prior steps.

Additional iterations are then performed to further achieve statistical stationarity before outputting fields.

The simulation is performed using the petascale DNS channel flow code (Poong-

Back) developed at the University of Texas at Austin by Prof. Robert Moser's research group [66]. In the wall-normal, velocity-vorticity formulation, the pressure is eliminated from the governing equations. In order to obtain the pressure field for the database, the pressure solver was subsequently implemented by the author in

PoongBack which solves the pressure Poisson equation given as

$$ \nabla^2 p = \nabla\cdot\left[ -\nabla\cdot(\mathbf{u}\otimes\mathbf{u}) \right] \qquad (5.14) $$

where p is the pressure divided by density, and u the velocity. The Neumann boundary

condition, expressed as

$$ \frac{\partial p}{\partial y} = \nu \frac{\partial^2 v}{\partial y^2} \qquad (5.15) $$

where ν is the molecular kinematic viscosity and v the wall-normal velocity component, is used at the top and bottom walls. This calculation is performed independently from the velocity field solution only when outputting fields. The implementation and

122 CHAPTER 5. CHANNEL FLOW DNS

validation of the pressure solver is discussed in detail in Appendix D.

The simulation is performed for approximately 1/2 of a flow-through time (another 1/2 flow-through time will be added in the near future). The three-component velocity

vector and pressure fields are stored every 5 time steps, resulting in 2000 frames of

data. These fields are then inserted into the JHTDB and comprise the channel flow

database presented in § 4. Information regarding the simulation setup and resulting statistical quantities are listed below. Note that the averaging operation for mean

and other statistical quantities is applied in time and over x–z planes.

Simulation parameters

- Domain Length: $L_x \times L_y \times L_z = 8\pi h \times 2h \times 3\pi h$, where h is the half-channel height (h = 1 in dimensionless units)

- Grid: $N_x \times N_y \times N_z = 2048 \times 512 \times 1536$ (wavemodes); $3072 \times 512 \times 2304$ (collocation points); data is stored at the wavemode resolution, i.e. $N_x \times N_y \times N_z = 2048 \times 512 \times 1536$ grid point nodes in physical space.

- Viscosity: $\nu = 5 \times 10^{-5}$ (non-dimensional)

- Mean pressure gradient: dP/dx = 0.0025 (non-dimensional)

- DNS Time step: ∆t = 0.0013 (non-dimensional)

- Database time step: δt = 0.0065 (non-dimensional)

- Time stored: t = [0, 12.993]


Flow statistics averaged over t = [0, 12.993]

- Bulk velocity: Ub = 0.99992

- Centerline velocity: Uc = 1.13195

- Friction velocity: $u_\tau = 4.99857 \times 10^{-2}$

- Viscous length scale: $\delta_\nu = \nu/u_\tau = 1.00029 \times 10^{-3}$

- Reynolds number based on bulk velocity and full channel height: $Re_b = U_b \, 2h/\nu = 3.99970 \times 10^4$

- Centerline Reynolds number: $Re_c = U_c h/\nu = 2.26391 \times 10^4$

- Friction velocity Reynolds number: $Re_\tau = u_\tau h/\nu = 9.99713 \times 10^2$

Grid spacing in viscous units

- x direction: ∆x+ = 12.2683

- y direction at first point: $\Delta y_1^+ = 1.65259 \times 10^{-2}$

- y direction at center: $\Delta y_c^+ = 6.15728$

- z direction: ∆z+ = 6.13416

In the following figures several quantities from the simulation are presented.

Shown in Figure 5.1 is the computed friction Reynolds number for the time interval in the database. In Figure 5.2 the mean velocity is presented along with the standard $U^+$ profiles in the viscous sublayer and log-layer. The viscous and turbulent shear stresses, Reynolds normal stresses, mean pressure, pressure variance, and velocity–pressure covariances are shown in Figures 5.3(a)–5.3(d). In the remaining plots, the power spectral densities of velocity and pressure are presented for various $y^+$ locations. Streamwise spectra are shown in Figure 5.4, whereas spanwise spectra are shown in Figure 5.5.

Figure 5.1: Friction velocity Reynolds number during the database time interval of the channel flow simulation.

5.4 Vortex Analysis

In this analysis, we are interested in identifying regions containing vortical structures and then determining their size and organization. For identifying the vortex structures, the Q criterion is used because it provides similar capability to the enhanced swirling strength method while being less expensive to calculate, since it avoids the eigenvalue calculations required by the enhanced swirling strength method. The computational costs are important in this analysis, since vortical structures are analyzed over many time steps (1000) from the channel flow dataset for a spatial domain size of 128 × 256 × 128 along the x, y, and z directions, respectively. The grid point locations are the same as those used in the DNS (see §5.3) but are confined to a sub-region defined as {(x, y, z) : 0 ≤ x ≤ 0.5πh, 0 ≤ y ≤ h, 0 ≤ z ≤ 0.25πh}. The implementation of this analysis is given below.

Figure 5.2: Mean velocity profile in viscous units. Standard values of κ = 0.41 and B = 5.2 are used in the log-law (dashed line) for reference.


Figure 5.3: Profiles of statistical quantities from the channel flow DNS: (a) mean viscous, turbulent, and total shear stress normalized by the wall stress; (b) velocity covariances in viscous units; (c) mean pressure profile in viscous units; (d) pressure variance and pressure–velocity covariance in viscous units.


Figure 5.4: Streamwise power spectral densities as a function of $k_x$ at various $y^+$ locations: (a) $y^+ = 10.11$, (b) $y^+ = 29.89$, (c) $y^+ = 99.75$, (d) $y^+ = 371.6$, (e) $y^+ = 999.7$.

Figure 5.5: Spanwise power spectral densities as a function of $k_z$ at various $y^+$ locations: (a) $y^+ = 10.11$, (b) $y^+ = 29.89$, (c) $y^+ = 99.75$, (d) $y^+ = 371.6$, (e) $y^+ = 999.7$.

5.4.1 Implementation

The Q criterion is based on the second invariant of the velocity gradient tensor, which is expressed as

$$Q = \frac{1}{2}\left(\Omega_{ij}\Omega_{ij} - S_{ij}S_{ij}\right) \,, \qquad (5.16)$$

where $\Omega_{ij} = \frac{1}{2}(A_{ij} - A_{ji})$ and $S_{ij} = \frac{1}{2}(A_{ij} + A_{ji})$, with $A_{ij}$ the velocity gradient tensor. Vortex filaments are said to exist at locations with Q > 0, which have a relatively larger rotation rate than axial strain rate. Though not always guaranteed, it is assumed in this work that Q > 0 indicates a pressure minimum, which is also indicative of a vortex core [53]. In the original formulation proposed by Hunt et al. [50], an additional pressure constraint is placed on the vortex definition. That additional pressure constraint is omitted here, following Chakraborty et al. [17].
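As an illustration, a minimal sketch of evaluating (5.16) at a single grid point is given below, assuming the nine components of the velocity gradient tensor $A_{ij} = \partial u_i/\partial x_j$ have already been computed (for example, by the finite differences described next); the routine name and interface are hypothetical and are not part of the actual analysis code.

      ! Sketch: evaluate Q = 0.5*(Omega_ij Omega_ij - S_ij S_ij) from a 3x3
      ! velocity gradient tensor A(i,j) = du_i/dx_j at one grid point.
      ! (Illustrative only; name and interface are hypothetical.)
      pure function q_criterion(A) result(Q)
        implicit none
        real(8), intent(in) :: A(3,3)
        real(8) :: Q, S(3,3), W(3,3)
        integer :: i, j
        Q = 0.0d0
        do j = 1, 3
          do i = 1, 3
            S(i,j) = 0.5d0*(A(i,j) + A(j,i))   ! strain-rate tensor S_ij
            W(i,j) = 0.5d0*(A(i,j) - A(j,i))   ! rotation-rate tensor Omega_ij
            Q = Q + 0.5d0*(W(i,j)*W(i,j) - S(i,j)*S(i,j))
          end do
        end do
      end function q_criterion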

When computing the Q field, 6th-order finite differences are employed. The evaluation of Q involves products of the evaluated velocity derivatives. As a result, the calculation of the Q field tends to amplify noise from the velocity derivative evaluations. Therefore, to reduce these issues, the velocity field is first filtered with a box filter of width 4∆, where ∆ is the local grid spacing, before computing the Q field.

Shown in Figure 5.6 are iso-surfaces of the computed Q for one time step (further discussion is provided in §5.4.2). From the figure it may be observed that many of the vortical structures tend to resemble ellipsoids. As a result, we determine surrogate ellipsoids which have the same inertia tensor as each vortical structure in the sample region. The surrogate ellipsoids provide an idealized, tractable object to characterize the vortices and are discussed further in §5.4.2.

A summary of the steps taken for determining the vortex regions and the subsequent calculations is given as follows:

1. Read the velocity field from the database for a given time step

2. Filter the velocity field using a box filter

3. Compute the velocity gradient tensor and subsequently the Q field.

4. Determine the points in the field that satisfy Q ≥ ε, where ε is the Q threshold; these points are labeled as vortex points

5. Find regions that possess contiguous vortex points; these are the vortex regions (a labeling sketch is given after this list)

6. Compute the vortex region volume; this is computed by summing the cell volumes of the vortex points within the region

7. Compute the center of mass of the vortex region

8. Determine the inertia tensor, its eigenvalues, and the surrogate ellipsoid volume

9. Bin the vortex information for histogram evaluation

10. Update the time step and repeat the process
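A minimal sketch of step 5 is given below, assuming the thresholded field has already been reduced to a logical mask of vortex points and that contiguity is defined over the six face neighbors of each cell (an assumption; the actual connectivity used in the analysis is not specified in the text). Region labels are grown with an iterative flood fill, avoiding recursion; the routine and array names are hypothetical.

      ! Sketch: label contiguous vortex points (step 5) with a flood fill over
      ! the six face neighbors. 'mask' is .true. where Q >= eps; 'label' gets a
      ! region id (0 = not a vortex point). Names are hypothetical.
      subroutine label_regions(nx, ny, nz, mask, label, nregions)
        implicit none
        integer, intent(in)  :: nx, ny, nz
        logical, intent(in)  :: mask(nx,ny,nz)
        integer, intent(out) :: label(nx,ny,nz), nregions
        integer :: i, j, k, ii, jj, kk, d, top
        integer :: stack(3, nx*ny*nz)
        integer, parameter :: di(6) = (/ 1,-1, 0, 0, 0, 0 /)
        integer, parameter :: dj(6) = (/ 0, 0, 1,-1, 0, 0 /)
        integer, parameter :: dk(6) = (/ 0, 0, 0, 0, 1,-1 /)
        label = 0
        nregions = 0
        do k = 1, nz
          do j = 1, ny
            do i = 1, nx
              if (mask(i,j,k) .and. label(i,j,k) == 0) then
                nregions = nregions + 1               ! seed a new region
                top = 1
                stack(:,1) = (/ i, j, k /)
                label(i,j,k) = nregions
                do while (top > 0)
                  ii = stack(1,top); jj = stack(2,top); kk = stack(3,top)
                  top = top - 1
                  do d = 1, 6
                    ! visit each in-bounds, unlabeled face neighbor that is a vortex point
                    if (ii+di(d) >= 1 .and. ii+di(d) <= nx .and. &
                        jj+dj(d) >= 1 .and. jj+dj(d) <= ny .and. &
                        kk+dk(d) >= 1 .and. kk+dk(d) <= nz) then
                      if (mask(ii+di(d), jj+dj(d), kk+dk(d)) .and. &
                          label(ii+di(d), jj+dj(d), kk+dk(d)) == 0) then
                        label(ii+di(d), jj+dj(d), kk+dk(d)) = nregions
                        top = top + 1
                        stack(:,top) = (/ ii+di(d), jj+dj(d), kk+dk(d) /)
                      end if
                    end if
                  end do
                end do
              end if
            end do
          end do
        end do
      end subroutine label_regions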


In step 7, the center of mass for a vortex region is computed as

$$\mathbf{x}_c = \frac{\sum_n \mathbf{x}_n \, \Delta V_n}{\sum_n \Delta V_n} \,, \qquad (5.17)$$

where the summation is performed over all vortex points within the vortex region, $\mathbf{x}_n$ is the cell center of the vortex point, and $\Delta V_n$ the cell volume of the vortex point.
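For illustration, a minimal sketch of the volume and center-of-mass accumulation (steps 6 and 7) over the points of one labeled region is given below; the array names and interface are hypothetical.

      ! Sketch: accumulate region volume and center of mass, eq. (5.17).
      ! xc(1:3,n) are cell centers and dV(n) the cell volumes of the vortex
      ! points belonging to one region. Names are hypothetical.
      subroutine region_volume_com(np, xc, dV, vol, com)
        implicit none
        integer, intent(in)  :: np
        real(8), intent(in)  :: xc(3, np), dV(np)
        real(8), intent(out) :: vol, com(3)
        integer :: n
        vol = 0.0d0
        com = 0.0d0
        do n = 1, np
          vol = vol + dV(n)                 ! step 6: region volume
          com = com + xc(:, n) * dV(n)      ! numerator of eq. (5.17)
        end do
        com = com / vol                     ! center of mass, eq. (5.17)
      end subroutine region_volume_com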

In the following step, the inertia tensor for a vortex region is computed as

$$\mathbf{J} = \sum_n \left( \mathbf{r}_n \cdot \mathbf{r}_n \, \mathbf{I} - \mathbf{r}_n \otimes \mathbf{r}_n \right) \Delta V_n \,, \qquad (5.18)$$

where $\mathbf{r}_n = \mathbf{x}_n - \mathbf{x}_c$ and $\mathbf{I}$ is the identity matrix. The ellipsoid of the inertia tensor is then obtained from the eigenvalues $(J_1, J_2, J_3)$ of the inertia tensor using

$$\mathbf{x}^{T} \cdot \Lambda \cdot \mathbf{x} = \phi^2 \,, \qquad (5.19)$$

where

$$\Lambda = \begin{pmatrix} J_1 & 0 & 0 \\ 0 & J_2 & 0 \\ 0 & 0 & J_3 \end{pmatrix} \,, \qquad (5.20)$$

and the corresponding eigenvectors determine the body coordinate system for the vortex region. The ellipsoid equation given in (5.19) is expanded and reorganized as

$$\frac{x^2}{\left(\phi/\sqrt{J_1}\right)^2} + \frac{y^2}{\left(\phi/\sqrt{J_2}\right)^2} + \frac{z^2}{\left(\phi/\sqrt{J_3}\right)^2} = 1 \,. \qquad (5.21)$$

The value of φ is determined in order to produce a surrogate ellipsoid, aligned to the body coordinate system of the vortex region, which has the same inertia tensor as

Λ. The eigenvalues for this ellipsoid are given as

$$J_1 = \frac{4}{15}\pi a b c \,(b^2 + c^2) \,, \qquad (5.22)$$

$$J_2 = \frac{4}{15}\pi a b c \,(a^2 + c^2) \,, \qquad (5.23)$$

$$J_3 = \frac{4}{15}\pi a b c \,(a^2 + b^2) \,, \qquad (5.24)$$

where, from (5.21),

$$a = \frac{\phi}{\sqrt{J_1}} \,, \qquad (5.25) \qquad b = \frac{\phi}{\sqrt{J_2}} \,, \qquad (5.26) \qquad c = \frac{\phi}{\sqrt{J_3}} \,. \qquad (5.27)$$

Inserting a, b, and c from (5.25)–(5.27) into the equations for $J_1$, $J_2$, and $J_3$ and then summing the left-hand sides together and the right-hand sides together to get a single equation for φ results in

$$J_1 + J_2 + J_3 = \frac{8}{15}\pi \left( \frac{\frac{1}{J_1} + \frac{1}{J_2} + \frac{1}{J_3}}{\sqrt{J_1 J_2 J_3}} \right) \phi^5 \,. \qquad (5.28)$$

Then, rearranging terms and solving for φ gives

$$\phi = \left[ \frac{(J_1 + J_2 + J_3)\,\sqrt{J_1 J_2 J_3}}{\frac{8}{15}\pi \left( \frac{1}{J_1} + \frac{1}{J_2} + \frac{1}{J_3} \right)} \right]^{1/5} \,, \qquad (5.29)$$

or, more succinctly, in terms of the invariants of Λ,

$$\phi = \left[ \frac{\mathrm{tr}(\Lambda)\,\sqrt{\det(\Lambda)}}{\frac{8}{15}\pi\, \mathrm{tr}(\Lambda^{-1})} \right]^{1/5} \,. \qquad (5.30)$$

The volume of the surrogate ellipsoid is then computed as

$$V_e = \frac{4}{3}\pi \sqrt{\det(\Lambda^{-1})}\, \phi^3 \,. \qquad (5.31)$$
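Combining (5.25)–(5.31), a minimal sketch of step 8 is given below: given the eigenvalues of the inertia tensor, it returns φ, the semi-axes, and the surrogate ellipsoid volume. The routine name is hypothetical.

      ! Sketch: surrogate-ellipsoid scale phi (eq. 5.29), semi-axes (5.25)-(5.27),
      ! and volume V_e (eq. 5.31) from the inertia-tensor eigenvalues J1, J2, J3.
      subroutine surrogate_ellipsoid(J1, J2, J3, phi, a, b, c, Ve)
        implicit none
        real(8), intent(in)  :: J1, J2, J3
        real(8), intent(out) :: phi, a, b, c, Ve
        real(8), parameter :: pi = 3.14159265358979d0
        phi = ( (J1 + J2 + J3) * sqrt(J1*J2*J3) / &
                ( (8.0d0/15.0d0) * pi * (1.0d0/J1 + 1.0d0/J2 + 1.0d0/J3) ) )**0.2d0
        a = phi / sqrt(J1)
        b = phi / sqrt(J2)
        c = phi / sqrt(J3)
        Ve = (4.0d0/3.0d0) * pi * a * b * c      ! equivalent to eq. (5.31)
      end subroutine surrogate_ellipsoid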

5.4.2 Results

In this section the results of the vortex analysis are discussed. As mentioned previously, vortical structures are determined using the Q criterion for a sub-region of the channel flow domain for 1000 time steps. Shown in Figure 5.6 are the Q isosurfaces for three threshold values. From the figure, the high-speed near-wall streaks are observed. It can be seen that these streaks are aligned with the flow direction. Further from the wall, other vortical structures are observed. These structures appear to take on various orientations, unlike the dominant streamwise orientation of the near-wall streaks. The impact of the Q threshold is also evident from the figure. As the Q threshold is increased, the size of the vortical structures decreases and appears to shrink radially inward. This suggests that each Q surface bounds regions of increasing vortex strength.

Figure 5.6: Q isosurfaces in a sub-region of the channel flow domain for three threshold values: Q=4, Q=16, Q=64.

The volume and center of mass for each vortical structure in the domain are estimated using the procedure listed in the previous section. From these estimates, probability distribution functions (PDFs) of the vortex volume, center of mass, and the ratio of the surrogate ellipsoid volume to the vortex volume are computed for the 1000 time steps. The PDFs of the vortex volumes are computed using two normalizations. The first is based on the log-layer eddy scale, which is considered to be the largest scale in the flow. The second is based on the log-layer Kolmogorov scale, which determines the smallest turbulent scales.

In Figure 5.7, the joint PDFs of the normalized vortex volume with respect to the log-layer eddy scale and center of mass are shown for varying Q thresholds. From the figure it is observed that the normalized vortex volume is particularly small and ubiquitous throughout the channel. For larger Q thresholds, the vortices become more isolated to the near-wall region, along with an increasing expectancy of larger vortex sizes near the wall. The increased near-wall vortex sizes appear to be due to the facts that 1) peak vorticity magnitude is produced at the wall, and 2) the larger Q values tend to isolate the strongest vortices.

The marginal distributions of Figure 5.7 are shown in Figure 5.8. In Figure 5.8(a) the PDFs of the normalized vortex volume with respect to the log-layer eddy scale are shown. From this figure, clear power law behavior is observed for the normalized vortex volume. The power law exponent (qualitatively) seems to be nearly constant, especially for the lower values of the Q threshold, and compares well to the reference distribution (dashed line) with a −4 power law exponent. In Figure 5.8(b), the PDFs for the y location of the center of mass are shown. From this figure, it is observed that


Figure 5.7: The joint PDFs of the normalized vortex volume with respect to the log-layer eddy scale and center of mass location for various Q thresholds: a) 0.5, b) 1.0, c) 2.0, d) 4.0, e) 8.0, f) 16.0, g) 32.0, h) 64.0.


the vortex locations are predominantly near the wall. The peaks of the PDFs appear to occur in the buffer layer region. For increasing y they decay until about the end of the log layer, or y/h ≈ 0.3, after which they remain more or less constant.

The marginal PDFs for the normalized vortex volume with respect to the log-layer Kolmogorov scale (η) are shown in Figure 5.9. The log-layer Kolmogorov scale is expressed as

$$\eta = \delta_\nu \left( \frac{\kappa y}{\delta_\nu} \right)^{1/4} \,, \qquad (5.32)$$

where δν is the viscous length scale determined by δν = h/Reτ . The results in

Figure 5.9 also indicate power law behavior for the normalized vortex volume based

on the local Kolmogorov scale. Furthermore, good overlap between all values of Q

is observed near $50 \lesssim V/\eta^3 \lesssim 150$. In this region the PDFs compare well with the reference distribution (dashed line) possessing a power law exponent of −1, especially for lower values of the Q threshold.

Shown in Figure 5.10 are the joint PDFs of the surrogate vortex ellipsoid volume to vortex volume ratio and the center of mass location. It is observed from this figure that for small Q threshold values, the volume ratio appears nearly constant, though slightly greater than 1, throughout the channel. As the Q threshold is increased, the occurrence of the vortices appears to be restricted closer to the wall. Also, closer to the wall the volume ratio tends to increase; this increase is more pronounced for larger Q thresholds. The marginal distribution for the volume ratio is shown in

Figure 5.11. Note the marginal distribution for the center of mass location is the


same as Figure 5.8(b). From the figure it is observed that there is an initial peak at $V_e/V \approx 0.65$. Then at $V_e/V \approx 1$, the PDF increases rapidly to a peak at $V_e/V \approx 1.2$ for Q thresholds up to 8 and at $V_e/V \approx 1.35$ for Q thresholds of 16 or larger. After the maximum peak in the PDF, all curves decay fairly rapidly to zero.

Figure 5.8: Marginal PDFs for (a) the normalized vortex volume with respect to the log-layer eddy scale and (b) the vortex center of mass.

Figure 5.9: Marginal PDF for the normalized vortex volume with respect to the log-layer Kolmogorov scale.

5.5 Conclusions

The channel flow DNS for the JHTDB is presented in this chapter. The channel

flow DNS is performed using the recently developed PoongBack code [66]. The DNS is carried out for 1/2 of a flow through time and the resulting velocity and pressure

fields are stored in the JHTDB. The statistical profiles computed during the stored


time interval of the DNS show expected behavior. The presence of the viscous sublayer and log-layer is clearly evident in the mean velocity profiles and compares very well with theoretical profiles. The mean viscous and turbulent stresses also indicate the expected behavior, along with the total mean stress, which is nearly linear. The off-linear deviations are due to the limited temporal extent of the DNS, which is currently 1/2 of a flow through time. Pre-production simulation tests indicated that several flow through times are required to achieve convergence with the linear total stress profile.

Figure 5.10: Joint PDFs for the surrogate vortex ellipsoid volume to the vortex volume ratio and center of mass location for various Q thresholds: a) 0.5, b) 1.0, c) 2.0, d) 4.0, e) 8.0, f) 16.0, g) 32.0, h) 64.0.

Figure 5.11: Marginal PDF of the surrogate vortex ellipsoid volume to the vortex volume ratio.

An analysis using the velocity fields from the DNS is performed in this work. For the analysis the size and distance from the wall of discrete vortical structures were studied. It is found that the joint PDF of the normalized vortex volumes with respect

to the log-layer eddy scale and wall distance location were strongly influenced by the threshold used in the Q vortex identification method. For smaller Q thresholds, the vortex volume relative to the log-layer eddy scale is observed to be fairly small throughout most of the channel, except close to the wall where the occurrence of larger structures is found. In all cases the detection of larger structures is confined predominantly to y/h ≤ 0.3. The occurrence of large near-wall structures is most pronounced for larger Q thresholds, where this is believed to be due to the larger

Q values isolating regions of strongest vorticity, which in this case are located at the wall. The marginal PDFs for the normalized vortex volumes based on the log-layer eddy scale are found to exhibit power law behavior with an observed weak dependence of the power law exponent on Q. The vortex locations are found to be most likely in the buffer layer, as illustrated in the marginal PDFs for the vortex barycenters. For the normalized vortex volumes based on the log-layer Kolmogorov scale, power law behavior is also observed. These results indicate an overlap region in the distributions for $50 \lesssim V/\eta^3 \lesssim 150$ for all Q thresholds, thus suggesting common power law prefactors for this region. Conversely, the distributions for the normalized vortex volumes based on the log-layer eddy scale displayed no such overlap region.

Surrogate vortices based on an equivalent inertia tensor are constructed for each of the identified vortex regions. It is noted that the surrogate vortices are predominantly larger than their corresponding vortices by 1.2 to 1.3 times.

Chapter 6

Concluding Remarks

Methodologies for the modeling of turbulent flows interacting with multiscale, complex geometry objects and data-intensive computing strategies for large-scale

DNSs are pursued in this work. The multiscale modeling of flows in the presence of hierarchically organized, fractal objects is presented in the first part of this thesis. In this modeling approach, a technique called RNS is presented and applied to fractal tree canopies. In this work, a generalized framework for the RNS strategy (first introduced by Chester et al. [21]) has been developed and applied to turbulent flow in fractal tree canopies. The proposed RNS implementation enables the modeling of spatially non-homogeneous geometries without using a low-level branch-based description and preserves the assumed dynamic similarity through temporal filtering, which was not guaranteed in the original RNS method. The results from this analysis demonstrate the ability to produce canopy flow characteristics that are observed in


field measurements [107, 98]. Furthermore, comparisons with experimental data from a fractal tree canopy in a water tunnel showed very good agreement with the total drag force imposed on the trees, even with nearly 75% of the total drag carried by the modeled subgrid-scale branches. Also presented is a CPS technique which allows the concurrent generation of inflow data for complex turbulent flows. This method improves the standard precursor simulation approach by eliminating the I/O bottleneck associated with reading and writing to and from a precursor library stored to disk.

The CPS transfers all sampled inflow data via MPI calls. In addition, it removes the need to recycle the precursor library data for simulations that are conducted beyond the temporal extent of the precursor simulation. This methodology was demonstrated for two applications (turbulent duct flow over a single fractal tree and a developing,

finite wind farm in the atmospheric boundary layer) and is shown to perform very well.

In the second part of this thesis, approaches addressing the large-scale data problem in DNS are presented. For this work, a large-scale DNS of turbulent channel flow is performed and the resulting velocity and pressure data are subsequently inserted into the JHTDB. The newly generated publicly available database is accessible to anyone in the world using standard Web services based on SOAP. Discussed in this work is the development and implementation of the Matlab client interface for the JHTDB. The Matlab interface has a distinct advantage over its C and Fortran counterparts due to the readily available Matlab intrinsic functions usable for complex data analysis. Then, the production simulation of the channel flow database along with the DNS implementation are presented. Utilizing the channel flow data, an analysis based on the Q criterion vortex identification method is performed in which the structure and organization of the wall-bounded vortical structures are studied. The results from this study seem to indicate good, qualitative agreement with theoretical predictions with respect to the presence of large-scale near-wall structures and the preponderance of buffer layer vortices.

With regard to future work, several approaches can be taken. For the RNS methodology, techniques to address scale-dependent, multiscale geometry are needed.

Currently the implementation assumes complete scale-invariance. Realistic structures found in nature or in engineering problems will possess a finite cutoff length scale. A dynamic formulation which addresses this scale dependency would be a valuable contribution to the field. Further RNS studies with comparisons to the experimental data of Bai et al. [4] would also provide valuable insights. An RNS simulation for this case has been conducted and presented in Chapter 3. However, as noted before, differences in the inflow structure have led to noticeable discrepancies in the resulting fractal tree wake. Additional work to generate inflow conditions comparable to those in Bai et al. [4] should prove to be a fruitful endeavor. With regard to the large-scale DNS work in this thesis, continued efforts on the vortical structure analysis and identification methods would be valuable. The current method analyzes static structures. The author believes a study which tracks these structures backwards in time using the JHTDB and captures their formation, splitting, and merging would be of great interest to the community.

Appendix A

LESGO Validation: Flow over wall

mounted cubes1

As a code validation, an LES is performed for flow over an array of wall mounted

cubes using a resolution of immersed objects that is similar to that used in RNS in the

main text. The cubes are represented using the IBM such that there is a minimum of

eight grid points across the width of each cube. The LES results of the flow field are

compared with the experimentally determined data of Meinders and Hanjalić [76]. For the simulations, the cubes are arranged in a 2 × 2 matrix as shown in Figure A.1, where periodic boundary conditions on horizontal planes are assumed. The physical domain is specified as {(x, y, z) : 0 ≤ x ≤ 8h, 0 ≤ y ≤ 8h, 0 ≤ z ≤ 3.5h}, where h is the height and width of each cube. A uniform spatial discretization of

1Reused with permission from Graham, J. and Meneveau, C., Phys. Fluids 24, 125105. Copyright 2012, AIP Publishing LLC.


Figure A.1: Domain setup, contours of instantaneous x component velocity, and time averaged streamlines for the wall mounted cubes test-case (streamlines originating near x/h=4 have been seeded at that location).

$N_x \times N_y \times N_z = 64 \times 64 \times 29$ is used.

Along with the arrangement of the four cubes in the simulation domain, contours of the instantaneous velocity (x component) and streamlines from the time-averaged velocity field are shown in Figure A.1. From the contour planes, the local flow structures and their interaction with the cubes are evident. On the plane of constant z, we observe that each cube sits in the wake of its upstream counterpart. From the streamlines, the diversion of the flow around the cubes is seen along with the recirculation region generated in the wake of the cubes (slight asymmetries in the y-direction are due to lack of complete statistical convergence).

Results for mean velocity profiles comparing LES with experimental data are

shown in Figure A.2. For presentation of the profiles, data are shifted such that the


local origin is located at the center of the leading edge of the sample cube. The

velocity is normalized using a reference velocity located at a position of (x, y, z) =

(1.3h, 0, 2.25h) in this local coordinate system. The data from the LES compare well with the experimental data for the given profiles. Both the x and y components of velocity are well represented using the current grid spacing, representative of the coarse

discretization of the smallest resolved branches in the RNS applications presented in

the main text.


Figure A.2: Mean velocity profiles for the wall mounted cubes case for: (a) x component of velocity at y = 0, (b) x component of velocity at z = 0.5h, and (c) y component of velocity at z = 0.5h. In each figure, the horizontal arrow denotes the x component of the measured reference velocity.

Appendix B

MPI-DB1

B.1 Introduction

Scientific simulations generate increasingly large data sets, which need to be stored and exposed to researchers for subsequent analysis. Parallel database systems provide persistent storage of large-scale scientific data sets which allow researchers to perform post-simulation analytics, promote research collaborations through data sharing, as well as extend the lifetime of the data. We believe that MPI will play an important role in enabling complex data analytics computations with data sets stored in the databases. In this chapter we demonstrate a method based on an MPI client-server implementation for storing the output of scientific simulations directly into the database.

This direct creation of the simulation output database automates the previously used

1Portions reused with permission from J. Graham, E. Givelberg, and K. Kanov, EuroMPI ’13 Proceedings of the 20th European MPI Users’ Group Meeting. Copyright 2013, ACM.

semi-manual, time-consuming ingestion process where flat files were initially generated and subsequently ingested into the database using special-purpose scripts that were developed for each simulation. We show that it is possible to ingest the output of large-scale simulations directly into the database without delaying the simulation process. Furthermore, our experiments reveal that significant improvement of the ingest process is possible, enabling complex MPI-driven computations involving data sets stored in the database.

The need to perform increasingly complex analytics with large data sets that are stored in the database led Givelberg et al. [41] to develop the MPI-DB software library and a model for data-intensive computations [41] for HPC applications interacting with databases. The MPI-DB library provides the user with an abstraction layer, thus hiding the database and removing the need to use a database query language.

Its transport layer uses MPI-2 functionality for communications. In this chapter we describe how MPI-DB can be used to stream and ingest data into a database system from a running turbulent channel flow simulation using a data-intensive computing model.

The rest of the chapter is organized as follows. After providing some background in section B.2 we briefly describe the channel flow simulation (section B.3) and the

MPI-DB software library (section B.4). Since many simulation codes, including our channel flow code, are written in Fortran, we created a Fortran interface for MPI-DB, whose object-oriented design we describe in detail in sections B.6 and B.7. This is followed by a description of the computational experiments we carried out (section B.8), the tuning of the ingestion process, and the conclusions in section B.9.

B.2 Related Work

At present, MPI-DB uses a relational database management system as its backend.

Multidimensional arrays are partitioned into small chunks and each chunk is stored as a binary large object (BLOB) in a relational database record. Necessary array manipulation is then performed by MPI-DB. The layered architecture of MPI-DB and the modularity of each layer allow it to be easily extendible to work with different storage systems as the backend. Alternative approaches are described below.

Array-oriented database systems such as SciDB [13], ArrayDB [75] and RasDaMan

[7] aim to make multidimensional arrays, image or raster data first-class DBMS citizens. Language extensions to SQL for manipulating and querying arrays have also been proposed. These include the Array Manipulation Language [75] and the Array Query Language [70]. The focus of these works is to provide database support for array data and allow the manipulation of multidimensional arrays within the database system. The task of ingesting the data into the system and formatting them according to each system's specification is left to the user. In contrast, MPI-DB and its simple interface hide the details of how the data is stored and organized in the database and handle the data-ingestion process transparently to the user. Moreover, MPI-DB can be extended to work with such systems as its backend.

A widely used alternative to database systems for storing array data is file-based storage packages such as NetCDF [95] and HDF5 [103]. They hide the details of the physical organization of the array data, but provide limited manipulation capabilities.

Thus, all manipulation of the array data has to be performed by the application.

NoSQL database systems such as BigTable [18] and Dynamo [27] achieve high scalability and performance and can be effectively used when working with large amounts of structured or unstructured data. These systems provide no APIs for manipulating array data and can only be used as distributed data-stores. As the data ingestion and subsequent retrieval and manipulation process is performed by the application, a system such as MPI-DB, when implemented on top of a NoSQL database system, can be used to effectively exploit the high scalability and performance that these systems provide.

B.3 Channel Flow Simulation

Channel flow is a classical problem in the field of turbulence, possessing the tractable characteristic of homogeneous turbulent statistics while introducing boundary effects due to viscous interactions at the top and bottom walls. In this chapter, we will be performing direct numerical simulations of turbulent channel flow using the data-intensive computing architecture.


In the channel flow simulations in this chapter, we solve the incompressible Navier-

Stokes equations using the wall-normal formulation of Kim et al. [62] as presented

in §5.2. For parallelization, the channel flow implementation employs an MPI-based domain decomposition. The spatial derivatives of the Navier-Stokes equations are computed using a parallel fast Fourier transform algorithm and compact finite difference schemes [67].

The simulation is performed on a domain of size $L_x \times L_y \times L_z = 2\pi\delta \times \pi\delta \times 2\delta$, where δ is the half-channel height. As part of the scalability test that will be discussed in later sections, the grid spacing will be adjusted as specified in Table B.1.

Also contained in the table are the Re values that will be used at the associated

grid resolution. The grid spacing and domain size were chosen in order to maintain

$\Delta x^+ = \Delta y^+ = \Delta z^+ \approx 4.5$ in the center of the channel, where $\Delta x^+ = Re\,\Delta x/\delta$ and $\Delta x$ is the grid spacing in the x direction. The terms $\Delta y^+$ and $\Delta z^+$ have similar definitions. Shown in Figure B.1 are the instantaneous streamwise velocity and vorticity fields from case C3 in Table B.1. From this figure, the complex, near-wall turbulent structures are evident along with the velocity variation across the channel.

Table B.1: Summary of grid resolution and Re used for the channel flow simulations.

    Case    Grid Resolution       Re
    C1      128 × 64 × 64         92
    C2      256 × 128 × 128       184
    C3      512 × 256 × 256       368
    C4      1024 × 512 × 512      735
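For example (a quick consistency check using the values above), for case C3 the streamwise grid spacing is $\Delta x = 2\pi\delta/512 \approx 0.0123\,\delta$, giving $\Delta x^+ = Re\,\Delta x/\delta \approx 368 \times 0.0123 \approx 4.5$, consistent with the target spacing in viscous units.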


Figure B.1: Instantaneous streamwise velocity (vertical contour planes) and vorticity fields (iso-surfaces) for case C3 from Table B.1. The iso-surfaces are colored according to the vertical height.

B.4 The MPI-DB software library

We provide a brief description of the MPI-DB software library, its architecture and capabilities as well as some implementation details. MPI-DB enables the dynamic creation of large scientific databases from the output of simulations during their execution. The data generated during runtime of a simulation are directly streamed to the database, and are easily and efficiently retrievable thereafter from remote clients. MPI-DB has a simple API, which provides database access and capabilities to high-performance computing processes. The manipulation of these data inside the database servers is abstracted away from the user and happens automatically. The simple interface allows users without database knowledge to make use of database technology without making significant changes to their code [41].

B.5 MPI-DB Architecture

MPI-DB has a layered client-server architecture. The clients and the servers make use of the MPI software package for communication and to exchange data on the client side, and additionally to spawn processes on the server side. Each layer of the library handles a specific task. The layers are as follows: data transport layer, data object layer, database access layer, and system management layer.

The data transport layer handles the movement of data between the clients and the servers. At the moment it makes use of two different protocols for this data transfer: UDT and MPI. The MPI-2 standard [80] introduces functionality for client-server interaction, which the MPI-DB library makes use of. However, in our experience most implementations of this standard do not provide cross-platform communication (e.g. Linux to Windows). This is why the data transport layer includes a UDT implementation, based on UDT sockets [46]. This layer, as well as any of the other layers, is easily extendible and can be implemented with a different underlying protocol.

The data object layer provides definitions, access methods and storage representation for the objects exposed to the user. As an example consider our use case of a numerical simulation storing its output in a database. The computation is carried out over a multidimensional array. In MPI-DB the user defines a Domain object, which includes the dimensionality of the array and the spatial extents of each dimension. The user then proceeds to define an Array object over this domain, which additionally has a data type (e.g. int, float, double). After computing the data for the Array object the user can execute a write operation on it, which transparently reshapes and repartitions the array and inserts it into the database according to the database schema.

As the name suggests the database access layer encapsulates the database access and communication. It provides abstractions for bulk insert, data retrieval and in the future will handle the execution of data analysis procedures or the compilation of user-defined functions. It provides a simple and extensible interface and currently includes MySQL and SQL Server implementations.

Finally, the system management layer handles the establishment of connections between the clients and the master MPI-DB server. The master MPI-DB server is the integral part of this layer. It is responsible for monitoring the available resources and spawning the requested number of data object servers on an appropriate number of database servers. Subsequently, each of the clients connects to its allocated data object server, which handles all the client’s requests.


B.6 Fortran Interface Design

We created the Fortran interface as a set of wrapper functions for performing MPI-

DB operations. These wrapper functions perform the correct data type conversion and translation in such a way that eliminates the need for the Fortran programmer to understand the details of the C++ interoperability. Thus, the Fortran interface provides convenient methods through standard subroutine calls for utilizing MPI-DB.

Interoperability with the C++ code of MPI-DB is accomplished using the ISO_C_BINDING module of the Fortran 2003 standard. ISO_C_BINDING gives a convenient method for datatype conversions from Fortran to C types. The interface works in two layers

(these layers are specific to the Fortran interface). The first layer is the translation layer and contains the subroutines that the application code calls directly. Provided to these subroutines are standard Fortran data types and the MPI-DB types which will be discussed below. These subroutines then convert and manipulate the input data appropriately before calling the actual C++ code in the second layer or the bindings layer.
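To illustrate the two-layer structure, a minimal sketch of a hypothetical wrapper is given below: a bindings-layer interface declared with bind(C) via the standard ISO_C_BINDING module, and a translation-layer subroutine that converts Fortran arguments to C types before calling it. The names (mpidb_write_c, MPIDB_Write_sketch) are illustrative only and do not correspond to the actual MPI-DB interface.

      ! Sketch of the two-layer Fortran wrapper structure (names hypothetical).
      module mpidb_sketch
        use iso_c_binding
        implicit none
        interface
          ! Bindings layer: C entry point exposed by the MPI-DB C++ library.
          subroutine mpidb_write_c(array_ptr, n) bind(C, name="mpidb_write_c")
            import :: c_ptr, c_int
            type(c_ptr), value    :: array_ptr
            integer(c_int), value :: n
          end subroutine mpidb_write_c
        end interface
      contains
        ! Translation layer: called directly by the application with Fortran types.
        subroutine MPIDB_Write_sketch(dat, n)
          integer, intent(in)                :: n
          real(c_double), target, intent(in) :: dat(n)
          call mpidb_write_c(c_loc(dat), int(n, c_int))
        end subroutine MPIDB_Write_sketch
      end module mpidb_sketch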

One key attribute of MPI-DB is its object-oriented design. The ability to manage and interact with the MPI-DB data objects from Fortran is achieved by using C-style pointers to the data objects and providing subroutines for operating on these objects. We have created an MPI-DB data type for each of these data objects. Using specific types for each of the data objects increases extensibility by allowing additional attributes to be easily added to the data types through new subtypes. Moreover, the specific types provide the ability to have polymorphic interfaces for the translation layer of the interface, thus minimizing the number of specific subroutines needed by the client application code.

Support for Fortran is provided through the MPI-DB Fortran module and an application library. The MPI-DB module is assembled from several other supporting Fortran modules, which include a Fortran module for the MPI-DB type definitions and then a separate module for each of the MPI-DB types, which themselves contain the translation and binding layers for the respective types. Also included in the

MPI-DB module are the global subroutines such as the MPI-DB initialization and

finalization functions along with overloaded interfaces that apply to multiple data objects.

B.7 Fortran Interface Example

In this section we describe how the Fortran interface for MPI-DB is used with an existing simulation code. Our goal has been to make as few changes as possible to this simulation code and directly stream its output to the database. The following is an example of writing a single array of data to a MySQL database (we also provide support for writing to SQL Server databases).

We begin by first including the MPI-DB module in the program and defining the

MPI-DB data types that will be used. These data types and their usage are discussed


below.

      program ChannelFlow
        use mpidb
        implicit none
        MPIDBtype(Domain_t)          :: domain
        MPIDBtype(DomainPartition_t) :: slab_partition
        MPIDBtype(ArrayType_t)       :: array_type
        MPIDBtype(Array_t)           :: array
        MPIDBtype(Connection_t)      :: conn
        ...
      end program ChannelFlow

The next step is to initialize MPI-DB. The subroutine MPIDB_Start is used to initialize MPI-DB and is used in place of mpi_init for MPI initialization. We also provide subroutines for obtaining the number of processes in the default communicator and the current MPI rank. By default MPI-DB uses the MPI_COMM_WORLD communicator for all MPI communication.

      ...
      ! Initialize MPIDB
      call MPIDB_Start(n, arg)
      call MPIDB_Size(p)
      call MPIDB_Rank(id)
      ...

After MPI-DB is initialized, we create the global Domain object for our data.

The spatial dimensions of the domain are passed as arguments. The last (optional) argument of the MPIDB_Domain constructor call is a boolean flag specifying whether the domain is a global domain (.true.) or a local domain (.false.). A global domain implies that the domain spans across all processes (and will contain distributed arrays), whereas a local domain will be defined for the local process only. Within our domain we create an ArrayType object. The ArrayType object will be used to construct arrays within our domain.

      ...
      ! Create domain and array partition in the domain
      call MPIDB_Domain(domain, n11, n12, n21, n22, n31, n32, .true.)
      call MPIDB_ArrayType(array_type, domain)
      ...

We now establish the connection to the MPI-DB server and create the basic MPI-DB system tables in the database. First, we create the Connection object which will be used for communication to the server. Second, we establish the connection to the database. Since this is performed by all compute processes, we ensure synchronization with the MPIDB_Barrier subroutine. After the connection to the MPI-DB server is established, the database is logged into and the basic tables are created. (Note this last operation is carried out by only one of the computing processes.)

      ...
      ! Create connection object
      call MPIDB_Connection( conn )
      ! Establish connection
      call MPIDB_ConnectServer( conn )
      call MPIDB_Barrier( )
      ! Login to database
      call MPIDB_ConnectDB( conn, host, db, user, passwd, MPIDB_MYSQL_SERVER )
      if ( id==0 ) call MPIDB_CreateBasicTables( conn )
      call MPIDB_Barrier( )
      ...

Once the database connection is established and the database is initialized, we are in a position to start writing data to the database. We begin by generating the ArrayPartition object for the distributed array. This object provides the decomposition scheme for the distributed array that we will create. The first argument of the ArrayPartition constructor is the local slab of each process. The MPI-DB system communicates the domain information among the computing processes, presenting the user application with a single coherent ArrayPartition object. Next, the MPIDB_Array object is created. The Array object holds the actual outgoing data, which is passed to it during its construction. An array mirroring the one we just created on the compute system must now be created in the database. This is done using MPIDB_Create, which will create the location that will receive the array when written from the compute nodes. Finally, the data is written to the database with MPIDB_Write. During this call the outgoing data will be streamed to the database and ingested using a Morton-order, space-filling spatial index [89]. During each time step of the simulation several Array objects are created, written, and destroyed.

      ...
      ! Create array partition object for distributed array
      call MPIDB_ArrayPartition( slab_partition, domain )
      ! Generate array object containing the data in 'dat'
      call MPIDB_Array( array, array_type, slab_partition, dat )
      ! Create the array in the database
      call MPIDB_Create( array, conn )
      ! Write data from the compute nodes to the database
      call MPIDB_Write( array, conn )
      call MPIDB_Barrier( )
      ...

After all data transfers are completed we can clean up allocated memory and simply terminate all MPI-DB communication from the compute clients to the MPI-DB server. The subroutine MPIDB_Delete is an overloaded interface that deletes the provided object using the delete function in C++. Finally, we terminate all MPI communication between the compute nodes with MPIDB_Finish.

      ...
      ! Delete array object including allocated data
      call MPIDB_Delete( array )
      ! Disconnect and clean up
      call MPIDB_DisconnectFromServer( conn )
      ! Finalize all MPI-DB communication
      call MPIDB_Finish( )
      ...
      end program ChannelFlow


B.8 Results

We carried out a series of experiments to evaluate and optimize direct data ingestion. We added data ingestion code using the MPI-DB Fortran bindings (see Section B.7) to the simulation of the turbulent channel flow and measured the performance of the ingestion for various grid sizes (see Table B.2). In all of our experiments presented here the output is generated every 5 time steps.

The test system used for the analysis is comprised of four compute nodes and one database node within a clustered system. Each of the nodes has the following operating system and hardware configuration:

• Scientific Linux 6.2 (kernel 2.6.32)

• Dual Intel Xeon X5650 CPUs with 24 logical cores (simultaneous multithreading enabled)

• 48 GB of system memory

• 10GigE network interconnects

• SATA III SSD local storage (single drive)

For the analysis we used the GCC 4.4 compiler suite with Intel MPI 4.0.3. The database node is running MySQL version 5.1 with the database stored on a local

SSD drive.

We timed the output as seen by the application and summarized the results in

Table B.2. All of the tests reported in Table B.2 were performed with 32 processes


(8 per node). For each grid size we measured the total wall-clock time it takes to compute 5 simulation time steps and the total time to perform the following data ingest into the database. The measurements were performed for several runs and the average results are recorded in the table. The last column of the table is the fraction of the MPI-DB database ingest time divided by the time step duration.

Table B.2: Simulation test results for various grid sizes

    grid size            time step    I/O      I/O fraction
    128 × 64 × 64        6.435        1.078    0.16
    256 × 128 × 128      104.4        1.711    0.016
    512 × 256 × 256      210.5        5.05     0.023
    1024 × 512 × 512     615.5        10.1     0.016

In Table B.3 we report the results of the same measurements for the test simulation with the grid size of 512 × 256 × 256 points with a variable number of processors used. In all cases the proportion of time devoted to the database I/O was only a few percent of the total simulation time. When the problem size and the number of processes were chosen such that each process computed with a sufficiently large portion of the memory, database ingestion was on the order of 2% of the total computation time.

Table B.3: Simulation test results for the 512 × 256 × 256 grid for a variable number of processes

    number of processors    time step    I/O      I/O fraction
    8                       155.0        1.844    0.012
    16                      140.0        3.471    0.024
    32                      210.5        5.055    0.023
    64                      109.0        7.821    0.071

These results are confirmed by measurements of the data ingestion throughput.


Figure B.2: Throughput of the data ingestion as a function of the grid size.

Figure B.3: Throughput of the data ingestion for grid size 512 × 256 × 256 as a function of the number of processes used in the simulation.


We measured the throughput as seen by the application: the computing processes were synchronized at the beginning of the data ingestion and the wall-clock time was measured at the moment when all processes completed I/O. The results are summarized in Figures B.2 and B.3. As before, all tests reported in Figure B.2 were carried out with 32 processes, 8 per node. For the large size of 1024 × 512 × 512 the simulation program effectively achieved a 780 MB/sec throughput. This throughput is larger than the perceived limit set by the approximately 600 MB/sec throughput of the SATA III drive containing the database. The data writes, however, do not appear to be limited by the database throughput as we are performing asynchronous transfers and the reported throughput is based on the transfer time observed by the client application.

In Figure B.3 the throughput for the simulation with the grid size of 512 × 256 × 256 points is recorded as a function of the number of processes used in the computation. In the case of 8 processes the size of the array slab available to each process is large and consequently a higher throughput was achieved. It is also observed that as the number of processes is increased a lower throughput was realized. We attribute this behavior to an increase in relative overhead for the data transmission due to the smaller data chunks handled by each process. In addition, the larger process number for a fixed hardware setup results in increased resource contention. Based on these results, the data transmissions will be scalable as long as each process has a large enough amount of work to perform (i.e. local data chunks are not fine-grained).


Although the MPI-DB data ingestion is fast enough to be used in a simulation

application, it can still be significantly improved. MPI-DB partitions multidimensional arrays into atoms, which are stored as binary large objects (BLOBs) in the database. A small atom size of 4 × 4 × 4 numbers was used in all of our experiments. The partitioning of the data and the SQL query processing have therefore resulted in a large overhead. The MySQL ingestion process executed a separate query for each array atom, and we measured a mere 6.1 MB/sec data ingestion rate per process for the MPI-DB server.

By comparison, we have been able to improve the rate of data ingestion into Microsoft's SQL Server database by implementing bulk ingestion using the data reader, and have achieved ingestion throughput up to 40 MB/sec for the same-sized array atoms. However, so far we have been unable to run MPI client-server between a Linux and a Windows system because we are not aware of any implementation of the MPI-2 standard that currently supports this.

Presently the server-side data ingestion in MPI-DB is clearly the bottleneck, with the rate of ingestion significantly lower than the network transfer rate. This can be corrected by carrying out the ingestion in parallel within the MPI-DB server, and furthermore by using a larger atom size. In separate experiments loading large

BLOBs using 30 parallel threads resulted in up to 2 GB/sec ingestion throughput on our system.


B.9 Conclusion

We demonstrate a method relying on an MPI-2 client-server implementation for direct ingestion of the output of a large-scale numerical simulation into the database. We have implemented a Fortran interface to the MPI-DB library and used it in the turbulent channel flow simulation to evaluate and optimize the process of data ingestion. Our test experiments demonstrate that data ingestion can be performed without delaying the simulation process. This direct data ingestion automates the previously used labor-intensive ingestion process which required developing special-purpose tools for each simulation. The on-the-fly ingestion is possible because the ingestion process is asynchronous and parallel. While our measurements reveal that a good utilization of the high network throughput is achieved, there is large room for improvement in the utilization of the disk I/O bandwidth. Although the current implementation is adequate for direct data ingestion, we expect the improvement in the disk I/O bandwidth utilization to enable complex data analytics computations using the database. In view of the growing need for performing analysis on data stored in databases, we believe that the interaction of scientific computing processes with databases is a promising direction in MPI research.

Appendix C

B-Spline Collocation Method

Basis splines (B-splines) are piecewise polynomials that possess special continuity properties within a given interval (see de Boor [25] for an excellent treatise on B-splines). The construction of B-splines begins with a given, non-decreasing breakpoint distribution on the interval Ξ = [a, b], expressed as ξ = {ξ_i : i = 0, ..., N}. From the breakpoint distribution, a set of B-splines which guarantee continuous derivatives everywhere in Ξ up to order k − 1, where k is the order of the B-spline, may be constructed by properly selecting a set of knot locations within Ξ [11, 25]. These knot locations are denoted by t = {t_i : i = 0, ..., N + k − 1}. The N + 1 Marsden-Schoenberg collocation points defined on Ξ are then determined as the mean location of k − 1 consecutive knots in t and are expressed by x = {x_i : i = 0, ..., N} [57]. In order to ensure $C^{k-1}$ continuity on Ξ, the number of B-splines required is equal to the knot number [11], i.e. n = N + k, where n is the number of B-splines.


For the given knot vector t, the n B-splines of order k are then defined according to the recurrence relation as

$$B_i^k(x) = \frac{x - t_i}{t_{i+k-1} - t_i}\, B_i^{k-1}(x) + \frac{t_{i+k} - x}{t_{i+k} - t_{i+1}}\, B_{i+1}^{k-1}(x) \,, \qquad (C.1)$$

where

$$B_i^1(x) = \begin{cases} 1 & \text{if } t_i \le x < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (C.2)$$

and $i = 0, \ldots, n-1$. The set of B-splines forms a complete basis and may be used to express a function u(x) on Ξ as

$$u(x) = \sum_{i=0}^{n-1} c_i B_i^k(x) \,, \qquad (C.3)$$

n 1 − uj = ciBij (C.4) i=0 X

k and Bij = Bi (xj) is the B-spline matrix. The coefficients ci are then readily expressed

173 APPENDIX C. B-SPLINE COLLOCATION METHOD

as N 1 − 1 ci = Bij− uj . (C.5) j=0 X 1 where Bij− is the inverse of the B-spline matrix from (C.4).

The derivatives of the field u are carried by the B-splines such that the dth deriva- tive of u may be expressed as

N 1 (d) − (d) uj = ciBij (C.6) i=0 X

(d) (d),k where Bij = Bi (xj). Therefore, derivatives are computed by first determining the coefficients ci given by (C.5). Then the derivatives are obtained by taking the inner product of the appropriate B-spline matrix with the coefficients.
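As an illustration of (C.1) and (C.2), a minimal sketch of evaluating a single B-spline of order k at a point x by the recurrence is given below; it assumes a knot vector for which the denominators in (C.1) are nonzero and that the index range stays within the knot array. The routine name and interface are hypothetical.

      ! Sketch: evaluate B_i^k(x) from the recurrence (C.1) with the order-1
      ! splines (C.2) as the base case. Knots are t(0:nknots-1); assumes
      ! nonzero denominators and i+k <= nknots-1. Name is hypothetical.
      recursive function bspline(i, k, x, t, nknots) result(b)
        implicit none
        integer, intent(in) :: i, k, nknots
        real(8), intent(in) :: x, t(0:nknots-1)
        real(8) :: b, w1, w2
        if (k == 1) then
          if (t(i) <= x .and. x < t(i+1)) then       ! eq. (C.2)
            b = 1.0d0
          else
            b = 0.0d0
          end if
        else
          w1 = (x - t(i)) / (t(i+k-1) - t(i))        ! first weight in (C.1)
          w2 = (t(i+k) - x) / (t(i+k) - t(i+1))      ! second weight in (C.1)
          b = w1 * bspline(i, k-1, x, t, nknots) + &
              w2 * bspline(i+1, k-1, x, t, nknots)
        end if
      end function bspline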

It should be noted that the B-spline matrix is compactly supported over n splines (represented by the inner index i). As a result, the B-spline matrix may be represented as a sparse, banded matrix. This feature is especially important for the linear algebra operations needed for computing the coefficients $c_i$ in (C.5) and applying the associated B-spline matrix when computing u or its derivatives.

Appendix D

Channel Flow DNS: Pressure

Solver

D.1 Introduction

In this chapter, the solution procedure used in the pressure solver for the channel

flow DNS is presented. As mentioned in §5.2, the pressure field is not determined during the evaluation of the velocity field. However, it is required for computing certain turbulence quantities of interest (e.g. the pressure-strain correlation of the turbulent kinetic energy equation) and thus is desired for storage in the JHTDB. The vertical velocity and vorticity formulation of Kim et al. [62] invokes the PPE given by

(5.3). From this equation the pressure can be solved for as a boundary-value problem with Neumann boundary conditions.


D.2 Pressure Solution

In the pressure solver, the PPE given by (5.3) is expressed in wavespace along the

x and z directions. An ordinary differential equation (ODE) is then solved for each

kx and kz wavemode. The solution procedure for the non-zero and zero wavemodes

is discussed in the following sections.

D.2.1 Non-Zero Wavemode Solution

We first apply Fourier transforms along the x and z directions to obtain the

transformed PPE as

$$\left( -k_m^2 + \frac{d^2}{dy^2} \right) \hat{p} = i k_x \hat{H}_x + i k_z \hat{H}_z + \frac{d\hat{H}_y}{dy} \,, \qquad (D.1)$$

where $k_m^2 = k_x^2 + k_z^2$ with $k_x$ and $k_z$ the wave numbers along the x and z directions, respectively. The Fourier-transformed Neumann boundary condition is then expressed as

$$\frac{\partial \hat{p}}{\partial y} = \nu \frac{\partial^2 \hat{v}}{\partial y^2} \,. \qquad (D.2)$$

For all but the 0−0 wavemode, these equations provide a second-order ODE along y that may be solved using standard techniques. The solver implemented in PoongBack uses a banded LU-decomposition approach for the B-spline linear algebra operations

(see Appendix C).


Now applying the B-spline representation presented in Appendix C for $\hat{p}$, the transformed PPE in (D.1) becomes

$$\mathbf{c}_{\hat{p}} \cdot \left( -k_m^2 \mathbf{B} + \mathbf{B}^{(2)} \right) = \mathbf{R} \,, \qquad (D.3)$$

where j denotes the index of the collocation points. The above equationb b may then

be expressed as

c L = R (D.4) p ·

where L = k2 B + B(2) . Applying the boundary conditions given in (D.2), the ij − m ij ij   2 (1) ∂ vb left and right hand side matrices become Lij = Bij and Rj = ν ∂y2 become at the j   top and bottom walls, i.e. y = h. The pressure is then computed as ±

p = c B (D.5) p ·

1 where c = L− R. p ·

D.2.2 Zero Wavemode Solution

The 0 0 wavemode can be obtained by integrating (D.1) with respect to y for −

kx = kz = 0 to obtain dp = H (D.6) dy y b b

177 APPENDIX D. CHANNEL FLOW DNS: PRESSURE SOLVER

The nonlinear term H can be written as H (y; 0, 0) = dvvc such that we can integrate y y − dy (D.6) directly to obtainb b

p = vv (D.7) − b c where p( 1; 0, 0) = 0 is applied. − b D.3 Validation

In order to test the numerical implementation of the pressure solver, two validation approaches are taken. In the first approach, two test cases compare numerical results against the analytical solutions of an inhomogeneous Helmholtz equation. In these tests, the same equations are solved, but Dirichlet boundary conditions are applied in one test and Neumann boundary conditions are applied in the other. The numerical solutions for these tests are obtained using the pressure solution algorithm in §D.2.1. Another set of tests is performed to check how well the pressure solver implicitly satisfies the Dirichlet boundary condition while only enforcing the Neumann boundary condition in the PPE. The details of this test are discussed below.

We first solve the inhomogeneous Helmholtz equation given as

$$\left(-k^2 + \frac{d^2}{dy^2}\right) p = g(y) \qquad \text{(D.8)}$$

where $g(y)$ and $k \neq 0$ are known. The boundary conditions are specified as $p(-1) = c$


and $p(1) = d$. The solution for $p$ may be obtained by writing $p = p_h + p_p$, where the homogeneous solution carries the boundary conditions and the particular solution has homogeneous boundary conditions. The resulting homogeneous solution is then given as

$$p_h = a\cosh(ky) + b\sinh(ky) \qquad \text{(D.9)}$$

where $a = (c+d)/(2\cosh(k))$ and $b = (d-c)/(2\sinh(k))$. The particular solution may be obtained by using an appropriate basis projection. Since the particular solution has homogeneous boundary conditions, a sine series is chosen such that

$$p_p = \sum_{n=1}^{\infty} a_n \sin(\pi n y). \qquad \text{(D.10)}$$

The coefficients $a_n$ are then obtained by inserting $p_p$ into (D.8) and taking the appropriate scalar product to obtain

$$a_n = -\frac{1}{(n\pi)^2 + k^2} \left[\int_{-1}^{1} g(y)\sin(\pi n y)\,dy\right]. \qquad \text{(D.11)}$$

The final solution is then given as

$$p(y) = \frac{c+d}{2\cosh(k)}\cosh(ky) + \frac{d-c}{2\sinh(k)}\sinh(ky) + \sum_{n=1}^{\infty} a_n \sin(\pi n y) \qquad \text{(D.12)}$$

where $a_n$ is given by (D.11).

For the second test, we solve the same inhomogeneous Helmholtz equation but in


this case Neumann boundary conditions are applied. The boundary conditions for this case are $\frac{dp}{dy}(-1) = c$ and $\frac{dp}{dy}(1) = d$. Following the same solution procedure as for the Dirichlet boundary condition case, we obtain the same particular solution as given by (D.10) and (D.11). The homogeneous solution is also given by (D.9); however, the coefficients $a$ and $b$ are modified. The resulting final solution is given as

$$p(y) = \frac{d-c}{2k\sinh(k)}\cosh(ky) + \frac{d+c-2\gamma}{2k\cosh(k)}\sinh(ky) + \sum_{n=1}^{\infty} a_n \sin(\pi n y) \qquad \text{(D.13)}$$

where $a_n$ is given by (D.11) with

$$\gamma = \sum_{n=1}^{\infty} (-1)^n \frac{\pi n\, g_n}{(n\pi)^2 + k^2} \qquad \text{(D.14)}$$

and $g_n = -\int_{-1}^{1} g(y)\sin(\pi n y)\,dy$.

The two tests for the different boundary conditions were performed with $g(y) = ky$, $c = -1$, and $d = 2$. The results for the Dirichlet and Neumann boundary conditions are shown in Figures D.1 and D.2, respectively, for two values of $k$. From these plots it is readily observed that the numerical solver reproduces the analytical solution.
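A self-contained way to reproduce the Dirichlet test is sketched below (Python/NumPy assumed; the grid size and truncation are illustrative choices). The analytical solution (D.9)–(D.12) is evaluated with the sine-series coefficients computed in closed form for $g(y) = ky$, and compared against an independent numerical solve; a simple second-order finite-difference discretization is used here as a stand-in for the B-spline collocation solver.

```python
import numpy as np

k, c, d = 1.1180, -1.0, 2.0                       # test parameters from the text

def analytic_dirichlet(y, n_modes=2000):
    """Evaluate eq. (D.12): homogeneous part (D.9) plus the sine series (D.10)-(D.11)."""
    p = (c + d) / (2.0 * np.cosh(k)) * np.cosh(k * y) \
        + (d - c) / (2.0 * np.sinh(k)) * np.sinh(k * y)
    for n in range(1, n_modes + 1):
        # closed form of (D.11) for g(y) = k*y: int_{-1}^{1} k*y*sin(n*pi*y) dy = 2k(-1)^(n+1)/(n*pi)
        an = -2.0 * k * (-1.0) ** (n + 1) / (n * np.pi * ((n * np.pi) ** 2 + k ** 2))
        p = p + an * np.sin(n * np.pi * y)
    return p

# Independent numerical solve of (-k^2 + d^2/dy^2) p = g with p(-1) = c, p(1) = d,
# using a second-order finite-difference stand-in for the B-spline solver.
N = 201
y = np.linspace(-1.0, 1.0, N)
h = y[1] - y[0]
A = np.zeros((N, N))
rhs = k * y
for j in range(1, N - 1):
    A[j, j - 1] = 1.0 / h ** 2
    A[j, j] = -2.0 / h ** 2 - k ** 2
    A[j, j + 1] = 1.0 / h ** 2
A[0, 0] = A[-1, -1] = 1.0                          # Dirichlet rows
rhs[0], rhs[-1] = c, d
p_num = np.linalg.solve(A, rhs)

print("max |analytic - numerical| =", np.abs(analytic_dirichlet(y) - p_num).max())
```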

In the pressure solver, the Neumann boundary condition, as given by (D.2), is

explicitly enforced. There is, however, a Dirichlet boundary condition (obtained by

projecting the momentum equation along the channel wall) that may be applied that

is just as valid as (D.2). The two conditions cannot, however, be enforced simultaneously. For


Figure D.1: Comparison between the numerical (lines) and analytical solutions (symbols) of (D.8) with Dirichlet boundary conditions and two values of k: a) k = 1.1180 and b) k = 18.901.

Figure D.2: Comparison between the numerical (lines) and analytical solutions (symbols) of (D.8) with Neumann boundary conditions and two values of k: a) k = 1.1180 and b) k = 18.901.


this validation test the Dirichlet boundary condition is computed after the pressure

field is obtained. We then perform a posteriori checks to see how well the momentum

equation along the channel wall is satisfied.

For the tests, we use the Dirichlet condition given as:

$$\frac{\partial p}{\partial \tau} = \nu\, \boldsymbol{\tau} \cdot \nabla^2 \mathbf{u} + \boldsymbol{\tau} \cdot \boldsymbol{\Pi} \qquad \text{(D.15)}$$

where $\boldsymbol{\Pi}$ is the mean pressure gradient force. We simply choose $\boldsymbol{\tau}$ along the $x$–$z$ diagonal, resulting in

$$\frac{\partial p}{\partial x} + \frac{\partial p}{\partial z} - \Pi_x = \nu\left(\nabla^2 u + \nabla^2 w\right) \qquad \text{(D.16)}$$

In wavespace, we write the residual for the Dirichlet condition as

$$\hat{R}_D = \underbrace{\frac{\partial \hat{p}}{\partial x} + \frac{\partial \hat{p}}{\partial z} - \Pi_x}_{\hat{R}_D^L} - \underbrace{\nu\left(-k_m^2 + \frac{d^2}{dy^2}\right)\left(\hat{u} + \hat{w}\right)}_{\hat{R}_D^R} \qquad \text{(D.17)}$$

This quantity is computed once the pressure field is obtained from the PPE. We then

transform $\hat{R}_D^L$ and $\hat{R}_D^R$ to physical space and compute two pointwise residuals. The first pointwise residual uses a local normalization given as:

$$R(x,t) = \frac{\left|R_D^L(x,t) - R_D^R(x,t)\right|}{2\max\left(R_D^L(x,t),\, R_D^R(x,t)\right)} \qquad \text{(D.18)}$$


Figure D.3: PDF of the PPE Dirichlet boundary condition residual with the two normalization types: (a) local normalization as defined by (D.18); (b) mean normalization as defined by (D.19).

while the second uses the planar average of $R_D^L$ such that:

$$R(x,t) = \frac{\left|R_D^L(x,t) - R_D^R(x,t)\right|}{\left\langle \left|R_D^L(x,t)\right| \right\rangle_{xz}} \qquad \text{(D.19)}$$
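The two normalizations can be expressed compactly as in the sketch below (Python/NumPy assumed); the wall-plane arrays are synthetic placeholders standing in for the transformed residual fields $R_D^L$ and $R_D^R$, not simulation data.

```python
import numpy as np

# Pointwise residual normalizations (D.18) and (D.19) for the a posteriori
# Dirichlet-condition check. RD_L and RD_R stand for the left- and right-hand
# sides of (D.17) after transforming back to physical space.
rng = np.random.default_rng(0)
RD_L = 1.0 + 0.1 * rng.standard_normal((64, 64))      # placeholder pressure-gradient side
RD_R = RD_L + 1e-3 * rng.standard_normal((64, 64))    # placeholder viscous side

# Local normalization, eq. (D.18): pointwise maximum of the two sides
R_local = np.abs(RD_L - RD_R) / (2.0 * np.maximum(RD_L, RD_R))

# Mean normalization, eq. (D.19): planar (x-z) average of |RD_L|
R_mean = np.abs(RD_L - RD_R) / np.mean(np.abs(RD_L))

# The PDFs of R_local and R_mean correspond to panels (a) and (b) of Figure D.3.
print(R_local.mean(), R_mean.mean())
```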

The PDFs of the residuals given by (D.18) and (D.19) are shown in Figure D.3. From these results it may be concluded that the Dirichlet boundary condition is reasonably satisfied. This is a result of the correct evaluation of the momentum equation by the PoongBack code.

D.4 Conclusion

The implementation of the pressure solver employed in the channel flow DNS discussed in Chapter 5 is presented. The pressure is computed by solving the standard pressure Poisson equation with Neumann boundary conditions. The implementation of this solver is tested using two approaches. In the first approach, numerical results from the solver are compared to analytical solutions of the inhomogeneous Helmholtz equation. These results are found to compare very well between the two solution methods. In the second validation approach, the enforcement of the pressure Dirichlet boundary condition is tested (a posteriori) while only explicitly applying the Neumann condition in the pressure field solution. This test indicated low residuals in the pressure Dirichlet boundary condition.

Bibliography

[1] J. D. Albertson and M. B. Parlange. Surface length scales and shear stress: Implications for land-atmospheric interaction over complex terrain. Water Resour. Res., 35(7):2121–2132, July 1999.

[2] B. Amiro. Comparison of turbulence statistics within three boreal forest canopies. Boundary-Layer Meteorol., 51:99–121, 1990.

[3] W. Anderson and C. Meneveau. Dynamic roughness model for large-eddy simulation of turbulent flow over multiscale, fractal-like rough surfaces. J. Fluid Mech., 679:288–314, 2011.

[4] K. Bai, C. Meneveau, and J. Katz. Near-Wake Turbulent Flow Structure and Mixing Length Downstream of a Fractal Tree. Boundary-Layer Meteorol., 143(2):285–308, Feb. 2012.

[5] D. D. Baldocchi and T. P. Meyers. Turbulence structure in a deciduous forest. Boundary-Layer Meteorol., 43(4):345–364, June 1988.

[6] M. Barnsley. Fractals Everywhere. Academic Press Inc., Boston, 1988.

[7] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and N. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.

[8] J.-P. Berrut and L. N. Trefethen. Barycentric Lagrange Interpolation. SIAM Rev., 46(3):501–517, Jan. 2004.

[9] F. P. Bertolotti, T. Herbert, and P. R. Spalart. Linear and nonlinear stability of the Blasius boundary layer. J. Fluid Mech., 242:441–474, Apr. 1992.

[10] H. M. Blackburn, N. N. Mansour, and B. J. Cantwell. Topology of fine-scale motions in turbulent channel flow. J. Fluid Mech., 310:269–292, Apr. 1996.

[11] O. Botella. A velocity-pressure Navier-Stokes solver using a B-spline collocation method. CTR Annu. Res. Briefs, 1999.

[12] E. Bou-Zeid, C. Meneveau, and M. B. Parlange. A scale-dependent Lagrangian dynamic model for large eddy simulation of complex turbulent flows. Phys. Fluids, 17(025105), 2005.

[13] P. G. Brown. Overview of sciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.

[14] A. R. Burk, editor. New Research on Forest Ecosystems. Nova Science Publishers, 2005.

[15] J. Cardesa, D. Mistry, L. Gan, and J. Dawson. Invariants of the reduced velocity gradient tensor in turbulent flows. J. Fluid Mech., 716:597–615, Jan. 2013.

[16] M. Cassiani, G. G. Katul, and J. D. Albertson. The effects of canopy leaf area index on airflow across forest edges: large-eddy simulation and analytical results. Boundary-Layer Meteorol., 126(3):443–460, 2008.

[17] P. Chakraborty, S. Balachandar, and R. J. Adrian. On the relationships between local vortex identification schemes. J. Fluid Mech., 535:189–214, July 2005.

[18] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2):4:1–4:26, June 2008.

[19] S. Chester. On Dynamic Modeling for Multiscale Turbulence Problems. PhD thesis, Johns Hopkins University, 2006.

[20] S. Chester and C. Meneveau. Renormalized numerical simulation of flow over planar and non-planar fractal trees. Environ. Fluid Mech., 7(4):289–301, July 2007.

[21] S. Chester, C. Meneveau, and M. B. Parlange. Modeling turbulent flow over fractal trees with renormalized numerical simulation. J. Comput. Phys., 225(1):427–448, July 2007.

[22] M. S. Chong, A. E. Perry, and B. J. Cantwell. A general classification of three-dimensional flow fields. Phys. Fluids A Fluid Dyn., 2(5):765, May 1990.

[23] J. Cui, V. C. Patel, and C.-L. Lin. Prediction of Turbulent Flow Over Rough Surfaces Using a Force Field in Large Eddy Simulation. J. Fluids Eng., 125(1):2, 2003.

[24] L. Davidson. Hybrid LES-RANS: Inlet boundary conditions for flows with recirculation. In S.-H. Peng and W. Haase, editors, Advances in Hybrid RANS-LES Modeling, volume 97 of Notes on Numerical Fluid Mechanics and Multidisciplinary Design, pages 55–66. Springer Berlin Heidelberg, 2008.

[25] C. de Boor. A Practical Guide to Splines. Springer, 2001.

[26] E. de Langre. Effects of Wind on Plants. Annu. Rev. Fluid Mech., 40(1):141–168, Jan. 2008.

[27] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In SOSP, 2007.

[28] J. C. del Álamo and J. Jiménez. Direct numerical simulation of the very large anisotropic scales in a turbulent channel. Cent. Turbul. Res. Annu. Res. Briefs, Stanford Univ., pages 329–341, 2001.

[29] J. C. del Álamo and J. Jiménez. Spectra of the very large anisotropic scales in turbulent channels. Phys. Fluids, 15(6):L41, Apr. 2003.

[30] J. C. del Álamo, J. Jiménez, P. Zandonade, and R. D. Moser. Scaling of the energy spectra of turbulent channels. J. Fluid Mech., 500:135–144, Jan. 2004.

[31] J. C. del Álamo, J. Jiménez, P. Zandonade, and R. D. Moser. Self-similar vortex clusters in the turbulent logarithmic region. J. Fluid Mech., 561:329, Aug. 2006.

[32] P. Druault, J.-P. Bonnet, F. Coiffet, J. Delville, E. Lamballais, and S. Lardeau. Generation of three-dimensional turbulent inlet conditions for large-eddy simulation. AIAA J., 42(3):447–456, 2004.

[33] G. Eyink, E. Vishniac, C. Lalescu, H. Aluie, K. Kanov, K. Bürger, R. Burns, C. Meneveau, and A. Szalay. Flux-freezing breakdown in high-conductivity magnetohydrodynamic turbulence. Nature, 497:466–469, May 2013.

[34] K. Falconer. Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons, 1997.

[35] A. Ferrante and S. E. Elghobashi. A robust method for generating inflow conditions for direct simulations of spatially-developing turbulent boundary layers. J. Comput. Phys., 198(1):372–387, July 2004.

[36] J. J. Finnigan. Turbulence in waving wheat. Boundary-Layer Meteorol., 16(2):181–211, June 1979.

[37] J. J. Finnigan, R. H. Shaw, and E. G. Patton. Turbulence structure above a vegetation canopy. J. Fluid Mech., 637:387–424, Oct. 2009.

[38] Q. Gao, C. Ortiz-Dueñas, and E. K. Longmire. Analysis of vortex populations in turbulent wall-bounded flows. J. Fluid Mech., 678:87–123, Apr. 2011.

[39] M. Germano. Turbulence: the filtering approach. J. Fluid Mech., 238:325–336, Apr. 1992.

[40] J. H. Gerrard. An experimental investigation of the oscillating lift and drag of a circular cylinder shedding turbulent vortices. J. Fluid Mech., 11(02):244–256, Mar. 1961.

[41] E. Givelberg, A. Szalay, K. Kanov, and R. Burns. An architecture for a data-intensive computer. In Proc. First Int. Work. Network-aware Data Manag., NDM '11, pages 57–64, New York, NY, USA, 2011. ACM.

[42] J. Graham and C. Meneveau. Modeling turbulent flow over fractal trees using renormalized numerical simulation: Alternate formulations and numerical experiments. Phys. Fluids, 24(125105), 2012.

[43] J. Graham, K. Bai, C. Meneveau, and J. Katz. LES modeling and experimental measurement of boundary layer flow over multi-scale, fractal canopies. In H. Kuerten, B. Geurts, V. Armenio, and J. Fröhlich, editors, Direct Large-Eddy Simul. VIII, volume 15 of ERCOFTAC Series, pages 233–238, 2011.

[44] J. Graham, E. Givelberg, and K. Kanov. Run-time creation of the turbulent channel flow database by an HPC simulation using MPI-DB. In Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI '13, 2013.

[45] J. Graham, K. Kanov, E. Givelberg, R. Burns, G. Eyink, A. Szalay, C. Meneveau, M. K. Lee, N. Malaya, and R. D. Moser. A Web-Services accessible database for channel flow turbulence at Reτ = 1000. In APS DFD, 2013.

[46] Y. Gu and R. L. Grossman. UDT: UDP-based Data Transfer for High-Speed Wide Area Networks. In Comput. Networks, volume 51. Elsevier, May 2007.

[47] A. G. Gungor and S. Menon. A new two-scale model for large eddy simulation of wall-bounded flows. Prog. Aerosp. Sci., 46:28–45, 2010.

[48] M. Holzner, M. Guala, B. Lüthi, A. Liberzon, N. Nikitin, W. Kinzelbach, and A. Tsinober. Viscous tilting and production of vorticity in homogeneous turbulence. Phys. Fluids, 22(061701), 2010.

[49] S. Hoyas and J. Jiménez. Scaling of the velocity fluctuations in turbulent channels up to Reτ = 2003. Phys. Fluids, 18(011702), 2006.

[50] J. C. R. Hunt, A. A. Wray, and P. Moin. Eddies, streams, and convergence zones in turbulent flows. Stud. Turbul. Using Numer. Simul. Databases, 2, pages 193–208, Dec. 1988.

[51] D. Hurst and J. C. Vassilicos. Scalings and decay of fractal-generated turbulence. Phys. Fluids, 19(3):035103, 2007.

[52] ICES, University of Texas. Turbulent Channel Flow DNS Fields Available for Analysis, 2013. URL http://turbulence.ices.utexas.edu/content/Channeldata.html.

[53] J. Jeong and F. Hussain. On the identification of a vortex. J. Fluid Mech., 285:69–94, Apr. 1995.

[54] J. Jeong, F. Hussain, W. Schoppa, and J. Kim. Coherent structures near the wall in a turbulent channel flow. J. Fluid Mech., 332:185–214, Feb. 1997.

[55] JHTDB. Johns Hopkins Turbulence Databases (JHTDB), 2013. URL http://turbulence.pha.jhu.edu.

[56] P. S. Johansson and H. I. Andersson. Generation of inflow data for inhomogeneous turbulence. Theor. Comput. Fluid Dyn., 18(5):371–389, Oct. 2004.

[57] R. W. Johnson. Higher order B-spline collocation at the Greville abscissae. Appl. Numer. Math., 52(1):63–75, Jan. 2005.

[58] G. Katul, L. Mahrt, D. Poggi, and C. Sanz. One- and two-equation models for canopy turbulence. Boundary-Layer Meteorol., 113:81–109, 2004.

[59] H. Kawamura, H. Abe, and K. Shingai. DNS of turbulence and heat transport in a channel flow with different Reynolds and Prandtl numbers and boundary conditions. In Turbul. Heat Mass Transf. 3 (Proceedings Third Intl Symp. Turbul. Heat Mass Transf.), pages 15–32, 2000.

[60] J. Kim. On the structure of pressure fluctuations in simulated turbulent channel flow. J. Fluid Mech., 205:421–451, Apr. 1989.

[61] J. Kim and R. A. Antonia. Isotropy of the small scales of turbulence at low Reynolds number. J. Fluid Mech., 251:219–238, Apr. 1993.

[62] J. Kim, P. Moin, and R. Moser. Turbulence statistics in fully developed channel flow at low Reynolds number. J. Fluid Mech., 177:133–166, Apr. 1987.

[63] J. Klewicki, P. Fife, and T. Wei. On the logarithmic mean profile. J. Fluid Mech., 638:73, Sept. 2009.

[64] S. Laizet and J. C. Vassilicos. Multiscale generation of turbulence. J. Multiscale Model., 1(1):177–196, 2009.

[65] P. J. Lamont and B. L. Hunt. Pressure and force distributions on a sharp-nosed circular cylinder at large angles of inclination to a uniform subsonic stream. J. Fluid Mech., 76(03):519–559, Apr. 1976.

[66] M. Lee, N. Malaya, and R. D. Moser. Petascale direct numerical simulation of turbulent channel flow on up to 786K cores. In Proc. Int. Conf. High Perform. Comput. Networking, Storage Anal. - SC '13, pages 1–11, New York, New York, USA, 2013. ACM Press.

[67] S. K. Lele. Compact finite difference schemes with spectral-like resolution. J. Comput. Phys., 103(1):16–42, Nov. 1992.

[68] Y. Li, E. Perlman, M. Wan, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. J. Turbul., 9(31), 2008.

[69] Z. Li, J. D. Lin, and D. R. Miller. Air flow over and through a forest edge: a steady-state numerical simulation. Boundary-Layer Meteorol., 51:179–197, 1990.

[70] L. Libkin, R. Machlin, and L. Wong. A query language for multidimensional arrays: design, implementation, and optimization techniques. In SIGMOD, 1996.

[71] D. K. Lilly. A proposed modification of the Germano subgrid-scale closure method. Phys. Fluids A, 4(3), 1992.

[72] T. S. Lund, X. Wu, and K. D. Squires. Generation of Turbulent Inflow Data for Spatially-Developing Boundary Layer Simulations. J. Comput. Phys., 140:233–258, 1998.

[73] B. Lüthi, M. Holzner, and A. Tsinober. Expanding the Q–R space to three dimensions. J. Fluid Mech., 641:497–507, Dec. 2009.

[74] B. B. Mandelbrot. The fractal geometry of nature. W. H. Freeman and Company, San Francisco, 1982.

[75] A. P. Marathe and K. Salem. Query processing techniques for arrays. VLDB J., 11(1):68–91, 2002.

[76] E. Meinders and K. Hanjalić. Vortex structure and heat transfer in turbulent flow over a wall-mounted matrix of cubes. Int. J. Heat Fluid Flow, 20(3):255–267, June 1999.

[77] C. Meneveau. Germano identity-based subgrid-scale modeling: A brief survey of variations on a fertile theme. Phys. Fluids, 24(121301), 2012.

[78] C. Meneveau and J. Katz. Scale-invariance and turbulence models for large-eddy simulation. Annu. Rev. Fluid Mech., 32:1–32, 2000.

[79] C. Meneveau, T. S. Lund, and W. H. Cabot. A Lagrangian dynamic subgrid-scale model of turbulence. J. Fluid Mech., 319:353–385, Apr. 1996.

[80] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard (Version 2.2). Technical report, University of Tennessee, Knoxville, Tennessee, 2009.

[81] P. Moin and K. Mahesh. Direct Numerical Simulation: A Tool in Turbulence Research. Annu. Rev. Fluid Mech., 30(1):539–578, Jan. 1998.

[82] J. P. Monty and M. S. Chong. Turbulent channel flow: comparison of streamwise velocity data from experiments and direct numerical simulation. J. Fluid Mech., 633:461, Aug. 2009.

[83] D. R. Morse, J. H. Lawton, and M. M. Dodson. Fractal dimension of vegetation and the distribution of arthropod body lengths. Nature, 314(25), Apr. 1985.

[84] R. D. Moser, J. Kim, and N. N. Mansour. Direct Numerical Simulation of turbulent channel flow up to Reτ = 590. Phys. Fluids, 11(4):943, 1999.

[85] T. Nakayama and K. Yakubo. Fractal Concepts in Condensed Matter Physics. Springer, 2010.

[86] S. A. Orszag. On the Elimination of Aliasing in Finite-Difference Schemes by Filtering High-Wavenumber Components. J. Atmos. Sci., 28(6):1074–1074, Sept. 1971.

[87] S. A. Orszag and G. S. Patterson, Jr. Numerical simulation of turbulence. In M. Rosenblatt and C. Atta, editors, Statistical Models and Turbulence, pages 127–147, 1972.

[88] R. L. Panton. Scaling and correlation of vorticity fluctuations in turbulent channels. Phys. Fluids, 21(11):115104, 2009.

[89] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Proc. 2007 ACM/IEEE Conf. Supercomput., SC '07, pages 23:1–23:11, New York, NY, USA, 2007. ACM.

[90] R. Poletto, T. Craft, and A. Revell. A New Divergence Free Synthetic Eddy Method for the Reproduction of Inlet Flow Conditions for LES. Flow, Turbul. Combust., 91(3):519–539, July 2013.

[91] S. Pope. Turbulent Flows. Cambridge University Press, 1st edition, 2000.

[92] F. Porté-Agel, C. Meneveau, and M. B. Parlange. A scale-dependent dynamic model for large-eddy simulation: application to a neutral atmospheric boundary layer. J. Fluid Mech., 415:261–284, July 2000.

[93] M. R. Raupach and A. S. Thom. Turbulence in and above Plant Canopies. Annu. Rev. Fluid Mech., 13(1):97–129, Jan. 1981.

[94] M. R. Raupach, J. J. Finnigan, and Y. Brunet. Coherent eddies and turbulence in vegetation canopies: The mixing-layer analogy. Boundary-Layer Meteorol., 78(3-4):351–382, Mar. 1996.

[95] R. Rew and G. Davis. NetCDF: an interface for scientific data access. Comput. Graph. Appl. IEEE, 10(4):76–82, 1990.

[96] P. Schlatter, N. A. Adams, and L. Kleiser. A windowing method for periodic inflow/outflow boundary treatment of non-periodic flows. J. Comput. Phys., 206(2):505–535, 2005.

[97] R. H. Shaw and U. Schumann. Large-eddy simulation of turbulent flow above and within a forest. Boundary-Layer Meteorol., 61(1-2):47–64, Oct. 1992.

[98] R. H. Shaw. Secondary wind speed maxima inside plant canopies. J. Appl. Meteorol., 16(5):514–521, May 1977.

[99] A. Staicu, B. Mazzi, J. Vassilicos, and W. van de Water. Turbulent wakes of fractal objects. Phys. Rev. E, 67(6):066306, June 2003.

[100] R. J. Stevens, J. Graham, and C. Meneveau. A concurrent precursor inflow method for Large Eddy Simulations and applications to finite length wind farms. Renew. Energy, 2014 (accepted).

[101] A. S. Szalay, P. Z. Kunszt, A. Thakar, J. Gray, D. Slutz, and R. J. Brunner. Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey. ACM SIGMOD Rec., 29(2):451–462, June 2000.

[102] Texas Advanced Computing Center (TACC). Stampede, 2013. URL https://www.tacc.utexas.edu/stampede/.

[103] The HDF Group. Hierarchical data format version 5, 2010.

[104] Y.-H. Tseng, C. Meneveau, and M. B. Parlange. Modeling Flow around Bluff Bodies and Predicting Urban Dispersion Using Large Eddy Simulation. Environ. Sci. Technol., 40(8):2653–2662, Apr. 2006.

[105] UPM Fluid Dynamics Group. DNS Turbulent Channel Data, 2013. URL http://torroja.dmt.upm.es/turbdata/channels/.

[106] J. D. Wilson. A second-order closure model for flow through vegetation. Boundary-Layer Meteorol., 42(4):371–392, Mar. 1988.

[107] N. R. Wilson and R. H. Shaw. A higher order closure model for canopy flow. J. Appl. Meteorol., 16(11):1197–1205, Nov. 1977.

[108] C. C. Wu and T. Chang. Rank-Ordered Multifractal Analysis (ROMA) of probability distributions in fluid turbulence. Nonlin. Process. Geophys., 18(2):261–268, Apr. 2011.

[109] P. K. Yeung and K. R. Sreenivasan. Progress and opportunities in direct numerical simulations at the next higher resolution. In APS DFD, 2013.

[110] C. Yi. Momentum Transfer within Canopies. J. Appl. Meteorol. Climatol., 47(1):262–275, Jan. 2008.

[111] M. Yokokawa, A. Uno, T. Ishihara, and Y. Kaneda. 16.4-Tflops direct numerical simulation of turbulence by Fourier spectral method on the Earth Simulator. In Proc. SC2002, CD-ROM, 2002.

[112] J. Zhou, R. J. Adrian, S. Balachandar, and T. M. Kendall. Mechanisms for generating coherent packets of hairpin vortices in channel flow. J. Fluid Mech., 387:353–396, May 1999.

Vita

Jason Graham was born in Hammond, LA in 1983. For his undergraduate studies, he attended the Louisiana State University and graduated with a Bachelor of Science degree in Mechanical Engineering in 2006. He began attending Johns Hopkins University in 2009 and received a Master of Science in Engineering degree in 2011. In

2012, he began working at the Johns Hopkins University Applied Physics Laboratory in Laurel, MD.



ASSOCIATION FOR COMPUTING MACHINERY, INC. LICENSE TERMS AND CONDITIONS Jan 11, 2014

This is a License Agreement between Jason Graham ("You") and Association for Computing Machinery, Inc. ("Association for Computing Machinery, Inc.") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Association for Computing Machinery, Inc., and the payment terms and conditions.

All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form.

License Number: 3306240231522
License date: Jan 11, 2014
Licensed content publisher: Association for Computing Machinery, Inc.
Licensed content publication: Proceedings of the 20th European MPI Users' Group Meeting
Licensed content title: Run-time creation of the turbulent channel flow database by an HPC simulation using MPI-DB
Licensed content author: Jason Graham, et al
Licensed content date: Sep 15, 2013
Type of Use: Thesis/Dissertation
Requestor type: Author of this ACM article
Is reuse in the author's own new work?: No
Format: Electronic
Portion: Full article
Will you be translating?: No
Order reference number:
Title of your thesis/dissertation: Turbulence simulations: multiscale modeling and data-intensive computing methodologies
Expected completion date: Jan 2014
Estimated size (pages): 200
Billing Type: Invoice
Billing address: 3609 Ednor Rd, BALTIMORE, MD 21218, United States
Total: 8.00 USD

Terms and Conditions

Rightslink Terms and Conditions for ACM Material

1. The publisher of this copyrighted material is Association for


Computing Machinery, Inc. (ACM). By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at ).

2. ACM reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions.

3. ACM hereby grants to licensee a non-exclusive license to use or republish this ACM-copyrighted material* in secondary works (especially for commercial distribution) with the stipulation that consent of the lead author has been obtained independently. Unless otherwise stipulated in a license, grants are for one-time use in a single edition of the work, only with a maximum distribution equal to the number that you identified in the licensing process. Any additional form of republication must be specified according to the terms included at the time of licensing.

*Please note that ACM cannot grant republication or distribution licenses for embedded third-party material. You must confirm the ownership of figures, drawings and artwork prior to use.

4. Any form of republication or redistribution must be used within 180 days from the date stated on the license and any electronic posting is limited to a period of six months unless an extended term is selected during the licensing process. Separate subsidiary and subsequent republication licenses must be purchased to redistribute copyrighted material on an extranet. These licenses may be exercised anywhere in the world.

5. Licensee may not alter or modify the material in any manner (except that you may use, within the scope of the license granted, one or more excerpts from the copyrighted material, provided that the process of excerpting does not alter the meaning of the material or in any way reflect negatively on the publisher or any writer of the material).

6. Licensee must include the following copyright and permission notice in connection with any reproduction of the licensed material: "[Citation] © YEAR Association for Computing Machinery, Inc. Reprinted by permission." Include the article DOI as a link to the definitive version in the ACM Digital Library. Example: Charles, L. "How to Improve Digital Rights Management," Communications of the ACM, Vol. 51:12, © 2008 ACM, Inc. http://doi.acm.org/10.1145/nnnnnn.nnnnnn (where nnnnnn.nnnnnn is replaced by the actual number).

7. Translation of the material in any language requires an explicit license identified during the licensing process. Due to the error-prone nature of language translations, Licensee must include the following


copyright and permission notice and disclaimer in connection with any reproduction of the licensed material in translation: "This translation is a derivative of ACM-copyrighted material. ACM did not prepare this translation and does not guarantee that it is an accurate copy of the originally published work. The original intellectual property contained in this work remains the property of ACM."

8. You may exercise the rights licensed immediately upon issuance of the license at the end of the licensing transaction, provided that you have disclosed complete and accurate details of your proposed use. No license is finally effective unless and until full payment is received from you (either by CCC or ACM) as provided in CCC's Billing and Payment terms and conditions.

9. If full payment is not received within 90 days from the grant of license transaction, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted.

10. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and publisher reserves the right to take any and all action to protect its copyright in the materials.

11. ACM makes no representations or warranties with respect to the licensed material and adopts on its own behalf the limitations and disclaimers established by CCC on its behalf in its Billing and Payment terms and conditions for this licensing transaction.

12. You hereby indemnify and agree to hold harmless ACM and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license.

13. This license is personal to the requestor and may not be sublicensed, assigned, or transferred by you to any other person without publisher's written permission.

14. This license may not be amended except in a writing signed by both parties (or, in the case of ACM, by CCC on its behalf).

15. ACM hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and ACM (and CCC) concerning this licensing transaction. In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms


and conditions, these terms and conditions shall control.

16. This license transaction shall be governed by and construed in accordance with the laws of New York State. You hereby agree to submit to the jurisdiction of the federal and state courts located in New York for purposes of resolving any disputes that may arise in connection with this licensing transaction.

17. There are additional terms and conditions, established by Copyright Clearance Center, Inc. ("CCC") as the administrator of this licensing service that relate to billing and payment for licenses provided through this service. Those terms and conditions apply to each transaction as if they were restated here. As a user of this service, you agreed to those terms and conditions at the time that you established your account, and you may see them again at any time at http://myaccount.copyright.com

18. Thesis/Dissertation: This type of use requires only the minimum administrative fee. It is not a fee for permission. Further reuse of ACM content, by ProQuest/UMI or other document delivery providers, or in republication requires a separate permission license and fee. Commercial resellers of your dissertation containing this article must acquire a separate license.

Special Terms:

If you would like to pay for this license now, please remit this license along with your payment made payable to "COPYRIGHT CLEARANCE CENTER" otherwise you will be invoiced within 48 hours of the license date. Payment should be in the form of a check or money order referencing your account number and this invoice number RLNK501199323. Once you receive your invoice for this order, you may pay your invoice by credit card. Please follow instructions provided at that time.

Make Payment To: Copyright Clearance Center Dept 001 P.O. Box 843006 Boston, MA 02284-3006

For suggestions or comments regarding this order, contact RightsLink Customer Support: [email protected] or +1-877-622-5543 (toll free in the US) or +1-978-646-2777.

Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required.
