ENHANCING FLUID MODELING WITH TURBULENCE AND ACCELERATION

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Fan Chen

May 2015

Dissertation written by

Fan Chen

B.S., Huazhong University of Science and Technology, 2005

M.S., Huazhong University of Science and Technology, 2007

M.S., Kent State University, 2009

Ph.D., Kent State University, 2015

Approved by

Dr. Ye Zhao, Chair, Doctoral Dissertation Committee

Dr. Feodor Dragan, Members, Doctoral Dissertation Committee

Dr. Arden Ruttan

Dr. Jing Li

Dr. Mina Katramatou

Accepted by

Dr. Javed Khan, Chair, Department of Computer Science

Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

1 Introduction
  1.1 Significance, Challenge and Objectives
  1.2 Methodology and Contribution
  1.3 Publications

2 Background
  2.1 Fluid Simulation
  2.2 Distance Field
  2.3 Fluid Turbulence
  2.4 GPU acceleration in Fluid Modeling

3 Distance Field
  3.1 Introduction
  3.2 Distance Field Transform
    3.2.1 Definition
    3.2.2 Vector-Based Distance Transform
    3.2.3 Our Computational Scheme
  3.3 Active Band Scheme
    3.3.1 Propagation Procedure
    3.3.2 Lifespan of a Point
    3.3.3 Grid Structures and Lifespan Coefficient
  3.4 Computational Procedure
  3.5 Multiple-Segment Distance Transform
  3.6 Results and Discussion

4 Adaptive and Controllable Turbulence Enhancement
  4.1 Introduction
  4.2 Random Forcing
  4.3 Turbulence Synthesis
    4.3.1 Frequency Domain Generation
    4.3.2 Energy Spectrum Control
    4.3.3 Computation
  4.4 Turbulence Integration
  4.5 Conditional and Intermittent Turbulence
  4.6 Experiments

5 Langevin Particles in Flow Simulation
  5.1 Introduction
  5.2 Langevin Model
    5.2.1 Particle Motion: A Random Process
    5.2.2 Generalized Langevin Model
    5.2.3 Flow Turbulence
    5.2.4 Computational Scheme
  5.3 Langevin Particles in Flow Simulation
    5.3.1 Langevin Force
    5.3.2 Particle Evolution
    5.3.3 Turbulence Control
    5.3.4 Simulation Procedure
  5.4 Results
  5.5 Discussion

6 Using GPU in Fluid Modeling
  6.1 GPU computation with CUDA
  6.2 LBM Simulation
    6.2.1 Introduction
    6.2.2 Implementation and Results
  6.3 FTLE
    6.3.1 Introduction
    6.3.2 Implementation and Results
  6.4 Fluid Compression/Decompression
    6.4.1 Compression
    6.4.2 Decompression
    6.4.3 Result

7 Conclusion

BIBLIOGRAPHY

LIST OF FIGURES

1. Floating-Point Operations per Second [1].
2. Vector-based distance propagation.
3. Distance transform on a rectangular grid.
4. Distance transform on a triangle grid.
5. The 12 neighbors of an FCC lattice site form a cuboctahedron (image courtesy of Dr. Feng Qiu).
6. Distance transform on an FCC grid.
7. Computational time for distance transform on different distance ranges.
8. Performance of using different segment sizes.
9. Distance field of two points rendered as isosurfaces with different distance values.
10. Distance field of the Armadillo rendered as isosurfaces with different distance values.
11. Distance field of the bunny rendered as isosurfaces with different distance values.
12. Random vector fields generated for a preferred scale with different deviations.
13. Divergence-free vector fields with two scales. Top: spectrum; bottom: vector field. Scales √2 and 8, with σ1 = σ2 = 0.7.
14. Data flow of the FNS() computation.
15. Snapshots of turbulence enhancement simulations: (a) original coarse simulation; (b) wavelet subgrid turbulence; (c) our subgrid turbulence; (d) vorticity confinement added to (a); (e) wavelet turbulence added to (d); (f) our turbulence added to (d) with q = 0.8; (g) our turbulence added to (d) with q = 0.2; (h) our turbulence added to (d) with q = 0.1.
16. Snapshots of integrating turbulence into a laminar smoke.
17. Snapshots of turbulence enhancement conditioned by the distance to obstacles: (a) a laminar smoke simulation; (b) direct turbulence integration into (a) at the same resolution; (c) finer turbulent behavior achieved by executing the simulation on a coarser grid than (a), while coupling turbulence to the interpolated flow.
18. Snapshots of turbulence enhancement with SPH ((b) and (d)), in comparison with the original simulation ((a) and (c)).
19. Computing the Langevin force F(t): u(t) is the particle velocity, ū(t) is the mean flow velocity, and u(t+1) is the particle's target velocity at the following time step t+1, computed by Eqn. 33.
20. Snapshots of integrating turbulence into a rising smoke flow with two different turbulence levels controlled by the characteristic length scale lm: (a) original flow; (b) vorticity confinement; (c) random forcing; (d) lm = 0.001; (e) lm = 0.003.
21. Snapshots of integrating turbulence into a smoke simulation with diminishing wind. Left: before the wind stops; right: after the wind stops.
22. Snapshots of turbulence enhancement of smoke past obstacles with two different turbulence levels.
23. Snapshots of turbulence enhancement of a flow over a table top with two different turbulence levels.
24. The GPU devotes more transistors to data processing [1].
25. GPU programming model [1].
26. 2D and 3D LBM lattices [2].
27. Flow pattern with FTLE and LCS: (a) red: upward velocity, green: downward velocity; (b)(c) red: high FTLE value, blue: low FTLE value.
28. Smoke animation framework overview.
29. Bidirectional advection for P-frame estimation from two consecutive K-frames. Red and purple arrow lines represent forward and backward advection, respectively.
30. Stream compaction [3].
31. Scan algorithm [3].
32. Scan and scatter [3].

LIST OF TABLES

1. Performance of our distance transform method on multiple data sets. For each model at different grid sizes, we compare the computational time between the multiple-segment method with a segment size l_max^seg = 20 (see Section 3.5) and the method with no segment. Computational time is measured in seconds.
2. Performance report.
3. SRT LBM time comparison between CPU and GPU.
4. MRT LBM time comparison between CPU and GPU.
5. FTLE computation performance.
6. Fluid decompression.

CHAPTER 1

Introduction

1.1 Significance, Challenge and Objectives

Models of fluid phenomena such as water, smoke, gas, and fire are widely used in computer graphics, physics, and other fields. A well-developed research subject, Computational Fluid Dynamics (CFD), provides many advanced numerical methods to simulate fluids.

The foundation of CFD is the Navier-Stokes equations [4], and how to solve these partial differential equations efficiently and correctly has been broadly researched and studied. In computer graphics, fluid simulation places a high premium on realism, but the result only needs to be visually convincing, not necessarily physically exact. At the same time, speed is also very significant in graphics applications.

Therefore, how to efficiently simulate turbulent and realistic fluids has become an important objective for graphics researchers. Most efforts to solve the Navier-Stokes equations fall into two numerical methods: the stable fluid solver [5] and the Lattice Boltzmann Method (LBM) [6]. The stable fluid solver employs semi-Lagrangian advection to guarantee an unconditionally stable result; it is an implicit finite-difference method for solving the NS equations. Although this method can achieve real-time speed at low resolutions, acceleration for high-resolution simulation is still needed. Because of resource and time limitations, it is not practical to simulate fluids at very high resolution, especially in real-time applications. On the other hand, a low-resolution simulation satisfies the speed requirement but cannot provide abundant details and turbulence due to numerical dissipation. Many researchers have endeavored to introduce turbulence for enhancing fluid animations based on the stable solver, using synthetic noise and energy injection to enhance detail and turbulence.

From another perspective, simulation can be accelerated by exploiting the programmability of the graphics processing unit (GPU). Many researchers study how to utilize the GPU to parallelize fluid modeling, so that fluids can still be simulated efficiently at relatively high resolution while achieving abundant details. The Lattice Boltzmann Method (LBM) [6] solves the Navier-Stokes equations explicitly on a lattice, which can be intuitively accelerated by exploiting the programmability of the GPU.

With the arrival of the GPU, many researchers study how to utilize it to accelerate fluid modeling. The finite-time Lyapunov exponent (FTLE) is exploited in fluid animation for controlling fluid behavior; its computation is very intensive but naturally parallel, making it a good fit for GPU acceleration. Another application is fluid compression/decompression. Since there is a high expectation on decompression speed, resorting to the GPU for acceleration is necessary to achieve better performance.

Another important task in fluid simulation is boundary handling, which strongly influences the realism of the fluid. For obstacles inside the fluid, one usually computes their distance fields, which implicitly represent the geometric shapes. Previous methods for computing the distance field cannot achieve fast computation, especially for moving obstacles.

1.2 Methodology and Contribution

Although flow simulation in computer graphics has achieved an astounding appearance for various natural phenomena, graphical fluid solvers are continuously improved to confront the challenges of realistic simulation, energy dissipation, and limited computational resources. To enhance and accelerate fluid modeling, we propose new methods that achieve better visual results and performance, as follows:

• The obstacle boundaries inside a fluid are usually represented by a distance field, and complicated fluid motion is often affected by obstacles inside the flow. Previous methods for computing the distance field cannot achieve fast computation, especially for moving obstacles. We propose a novel distance field transform method based on an iterative scheme adaptively performed on an evolving active band. Our method utilizes a narrow band to store the active grid points being computed. Unlike the conventional fast marching method, we do not maintain a priority queue; instead, we perform iterative computing inside the band. This new algorithm alleviates the programming complexity and the data-structure (e.g., a heap) maintenance overhead, and leads to a computational process amenable to parallelization. While the active band propagates from a starting boundary layer, each grid point stays in the band for a lifespan determined by analyzing the particular geometric properties of the grid structure. In this way, we find the Face-Centered Cubic (FCC) grid to be a good 3D structure for distance transform. We further develop a multiple-segment method for the band propagation, achieving a computational complexity of O(m·N) with a segment-related constant m.

• We propose a new scheme for enhancing fluid animation with controllable turbulence. An existing fluid simulation from an ordinary fluid solver is fluctuated by turbulent variation modeled as a random forcing process. The variation is pre-computed as a sequence of solenoidal noise vector fields directly in the spectral domain, which is fast and easy to implement. The spectral generation enables flexible vortex scale and spectrum control following a user-prescribed energy spectrum, e.g., Kolmogorov's cascade theory, so that the fields provide fluctuations at subgrid scales and/or in preferred large octaves. The vector fields are employed as turbulence forces to agitate the existing flow, where they act as a stimulus of turbulence inside the framework of the Navier-Stokes equations, leading to natural integration and temporal consistency. The scheme also facilitates adaptive turbulence enhancement steered by various physical or user-defined properties, such as strain rate, vorticity, distance to objects, and scalar density, in critical local regions. Furthermore, an important feature of turbulent fluid, intermittency, is created by applying turbulence control during randomly selected temporal periods.

• We develop a new Lagrangian primitive, named the Langevin particle, to incorporate turbulent flow details in fluid simulation. A group of these particles is distributed inside the simulation domain based on a turbulence energy model with turbulence viscosity. Each particle moves obeying the generalized Langevin equation, a well-known stochastic differential equation that describes the particle's motion as a random Markov process. The resulting particle trajectory shows self-adapted fluctuation in accordance with the turbulence energy, while following the global flow dynamics. We then feed Langevin forces back into the simulation based on the stochastic trajectories, which drive the flow with the necessary turbulence. The new hybrid flow simulation method features unrestricted particle evolution requiring minimal extra manipulation after initiation. The flow turbulence is easily controlled, and the total computational overhead of the enhancement is minimal on top of typical fluid solvers.

• To accelerate fluid modeling, we resort to the computational power of the graphics processing unit (GPU). The Lattice Boltzmann Method (LBM) [6] simulates the fluid by solving the Navier-Stokes equations with an explicit numerical scheme. This solver only requires local information from neighbors, which is naturally suitable for GPU acceleration. The finite-time Lyapunov exponent (FTLE), used for extracting structural features from the fluid, seeds particles on each grid point, and the divergence of these particle trajectories measures the separation of the flow. These particles increase the computational intensity, and implementing this algorithm on the GPU greatly improves the speed. Finally, fluid decompression has a higher speed requirement than compression for user applications. The decompression scheme suggested in this dissertation includes frequency transform, velocity reconstruction, advection, and particle generation. Each method in this system is parallelized into a new scheme to accommodate the programmability of the GPU.

1.3 Publications

1. Fan Chen, Ye Zhao. Distance Field Transform with an Adaptive Iteration Method. IEEE International Conference on Shape Modeling and Applications (SMI), pages 111-118, Beijing, China, June 2009.

2. Fan Chen, Ye Zhao, Zhi Yuan. Langevin Particle: A Self-Adaptive Lagrangian Primitive for Flow Simulation Enhancement. Computer Graphics Forum (Eurographics 2011), 30(4).

3. Fan Chen, Ye Zhao, Zhi Yuan. Spectral Modeling of Divergence-Free Vector Fields. IEEE VisWeek, 2010 (poster).

4. Zhi Yuan, Fan Chen, Ye Zhao, Zhiqiang Wang, Sean Reber, Cheng-Chang Lu. Ad Hoc Compression of Smoke Animation. Submitted to IEEE TVCG.

5. Zhi Yuan, Fan Chen, Ye Zhao. Pattern-Guided Smoke Animation with Lagrangian Coherent Structure. ACM Transactions on Graphics (SIGGRAPH Asia 2011), 30(6), 2011.

6. Zhi Yuan, Ye Zhao, Fan Chen. Incorporating Stochastic Turbulence in Particle-Based Fluid Simulation. The Visual Computer, 28(5), 435-444, 2012, Springer.

7. Zhi Yuan, Ye Zhao, Fan Chen. Stochastic Modeling of Light-weight Floating Objects. The 24th International Conference on Computer Animation and Social Agents, May 2011. Extended abstract in ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, March 2011.

8. Ye Zhao, Zhi Yuan, Fan Chen. Enhancing Fluid Animation with Adaptive, Controllable and Intermittent Turbulence. Proceedings of ACM SIGGRAPH/Eurographics Symposium on Computer Animation, July 2010.

CHAPTER 2

Background

2.1 Fluid Simulation

Fluid simulation is generally achieved by solving the famous incompressible Navier-Stokes (NS) equations [4]. These equations, which apply Newton's second law to fluid motion, are popularly used by animators and researchers to simulate fluids. The equations are as follows:

\nabla \cdot \mathbf{u} = 0,   (1)

\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla P + \nu \nabla^2 \mathbf{u} + \mathbf{F}.   (2)

Eqn. 1 is the incompressibility condition, which ensures that the mass of the fluid is conserved. In other words, at any point in the fluid volume, the inflow and outflow velocities must sum to zero.

Eqn. 2 is the momentum equation, describing fluid motion driven by pressure and forces. Here u is the velocity, t the time, P the pressure, ν the kinematic viscosity coefficient, and F the external force. The advection term u · ∇u on the left states that the velocity is advected by itself. The first term on the right means that the velocity changes along the negative pressure gradient: pressure pushes the fluid from high-pressure regions to low-pressure regions. The second term on the right is the diffusion term, which says that the velocity diffuses along its own gradient. The external force F can include, for example, gravity and buoyancy.

These partial differential equations (PDEs) can be solved using different methods. The Lattice Boltzmann Method (LBM) [7,8] solves the NS equations on a lattice; it is well suited to simulating flow with complex geometries and can be easily implemented in parallel environments due to its explicit scheme for solving the PDEs. However, it requires small time steps and a fine discrete grid to achieve stable and decent results, which may decrease the overall speed. Another grid-based solver, the stable solver [5] proposed by Stam, is unconditionally stable by using semi-Lagrangian advection. All these grid-based methods demand a high discretization resolution to achieve realistic results with rich details; therefore, in large scenes or high-definition applications, the computational cost is very expensive, which is not suitable for some real-time applications. The emergence of Lagrangian approaches [9] releases the fluid solver from the limitation of the grid (lattice) by employing particles to model the fluid. Each particle carries material properties as it moves in the fluid, and mass conservation is inherently satisfied. Meanwhile, particle-based methods do not suffer from the numerical dissipation introduced by interpolation operations during the advection step.

Smoothed Particle Hydrodynamics (SPH) [10] was first proposed by Lucy [11] and by Gingold and Monaghan [12]. Reeves [13] then introduced the particle-based scheme into computer graphics. Since then, a large body of work has been done in this field and has achieved great success [14–18].
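To make the semi-Lagrangian advection used by the stable solver [5] concrete, the following C++ sketch traces each grid cell backward along the velocity field and samples the previous field there with bilinear interpolation. The `Grid` helper, the collocated 2D layout, and the clamping at the domain border are illustrative assumptions for this sketch, not the implementation used in this dissertation.

```cpp
#include <vector>
#include <algorithm>

// Minimal 2D scalar field on an N x N grid (illustrative helper).
struct Grid {
    int n;
    std::vector<float> v;
    Grid(int n_) : n(n_), v(n_ * n_, 0.0f) {}
    float& at(int i, int j) { return v[j * n + i]; }
    float sample(float x, float y) const {            // bilinear interpolation
        x = std::clamp(x, 0.0f, float(n - 1));
        y = std::clamp(y, 0.0f, float(n - 1));
        int i0 = int(x), j0 = int(y);
        int i1 = std::min(i0 + 1, n - 1), j1 = std::min(j0 + 1, n - 1);
        float fx = x - i0, fy = y - j0;
        float v00 = v[j0 * n + i0], v10 = v[j0 * n + i1];
        float v01 = v[j1 * n + i0], v11 = v[j1 * n + i1];
        return (1 - fy) * ((1 - fx) * v00 + fx * v10) + fy * ((1 - fx) * v01 + fx * v11);
    }
};

// Semi-Lagrangian advection: q_new(x) = q_old(x - dt * u(x)).
void advect(const Grid& q_old, Grid& q_new, const Grid& u, const Grid& v, float dt) {
    for (int j = 0; j < q_old.n; ++j)
        for (int i = 0; i < q_old.n; ++i) {
            float x = i - dt * u.sample(float(i), float(j));   // backtrace the cell
            float y = j - dt * v.sample(float(i), float(j));
            q_new.at(i, j) = q_old.sample(x, y);               // sample the old field there
        }
}
```

Because the backtraced sample is interpolated from the previous field, the update never amplifies values, which is the source of the unconditional stability mentioned above; the same interpolation is also what introduces the numerical dissipation discussed later in this chapter.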

2.2 Distance Field

In fluid simulation, boundary handling is very important for generating realistic fluid. The representation of an obstacle can affect the final animation of the flow, and an inaccurate representation may make the fluid look unrealistic. An obstacle is usually represented by a distance field [19], an implicit representation of a geometric shape. The generation of distance fields has been an essential research topic in computer vision, graphics and visualization, as well as applied mathematics [20–24].

A distance field can be generated directly by computing the distance from each point to a geometric primitive. An interactive algorithm computes 3D Euclidean distance fields on the GPU [25] by rasterizing the distance vectors from the points on each slice to the primitives.

Related to our research, distance transform is a general approach that forms the distance field by propagation from a starting set computed by direct geometric and analytic algorithms. Iteration methods were first applied to solve the shape-from-shading problem on the whole domain [26,27] in the numerical computing field. Another strategy is to propagate the distances to neighbors with special templates through the domain; the template can be designed based on chamfer distance [19] or, more precisely, on vector distance [28]. Fast marching methods [29–31] compute the arrival time of a wavefront expanding in the normal direction at an active band of grid points, which actually solves the Eikonal equation from a given boundary condition. Zhao et al. [32,33] present a sweeping method that solves the Eikonal equation by Gauss-Seidel iterations with alternating sweeping orderings on rectangular grids or meshes. The fast sweeping method has also been applied to static convex Hamilton-Jacobi equations [34]. Frisken et al. [35] propose a hierarchical computation for distance field generation. For more details, we refer interested readers to good surveys on 2D [36] and 3D distance transforms [37].

Except for the whole-domain iteration methods, these basic algorithms cannot be easily parallelized due to their computational schemes. Several approaches employ specially designed GPU algorithms, such as a tile-based updating scheme for fast marching [38], domain division for the sweeping method [39], and delicate narrow-band packing [40]. Working on polygon meshes, a method based on scan-conversion of the mesh is proposed by Mauch [41,42]; Sigg et al. improve this algorithm with a hardware implementation [43]. Another approach is to construct a Voronoi diagram, which leads to distance field generation for 2D and 3D polygons [44]. Weber et al. present a parallel algorithm for approximating geodesic distances on geometry images [45]. These methods process the geometric elements independently, and thus can utilize the parallel nature of graphics hardware to achieve good performance in distance generation. In comparison, our method does not rely on meshes to represent geometric shapes. Instead, we utilize the inherent parallelism of iteration methods, combined with a wavefront scheme, to further improve performance and flexibility.

2.3 Fluid Turbulence

“Turbulence is an irregular motion which in general makes its appearance in fluids, gaseous or liquid,” stated Taylor and von Kármán in 1937. Turbulence is thus very significant in fluid modeling and a key factor in enhancing the realism of the fluid. However, modeling turbulence is quite difficult due to its irregularity and the high Reynolds number involved (which characterizes the degree of turbulence).

Moreover, these fine-scale details are contingent on the resolution of the computing grid, which is usually restricted by computational resources and performance requirements. Numerical dissipation contributes significant energy loss and further leads to unrealistic damping of details. Alternatively, fully particle-based solvers have been used; e.g., Smoothed Particle Hydrodynamics [46] is employed in a large body of research work such as [14–18]. The pure Lagrangian approach usually needs a large number of Lagrangian primitives (particles) distributed in the domain, and it has not been intensively studied in computer graphics for simulating turbulent smoke.

Advanced numerical schemes. Many advanced numerical schemes have been proposed to solve the governing NS equations with reduced energy damping. The advection step is replaced by Lagrangian fluid-implicit-particles (FLIP) [47] and by higher-order schemes with repeated semi-Lagrangian steps [48,49]. Furthermore, different numerical schemes have been introduced, including the higher-order advection scheme BFECC [48]. Alternative paths include adaptive high-resolution simulation (e.g., [50]), particle fluids (e.g., [14]), and precomputation (e.g., [51]). Enforced circulation preservation [52] based on Stokes' theorem and an energy-preserving scheme [53] in a finite-volume manner provide stable Eulerian solutions on simplicial grids. These methods commonly work on a stationary grid and require solving a large linear system whose complexity grows rapidly with grid size. The computational cost can be reduced by coarsening the grid, in particular for the pressure solver, with an efficient approximation of the pressure gradient on the fine grid [54]. Spatial refinements [50,55,56] adaptively provide high-resolution details in parts of the simulation domain with extra on-the-fly grid manipulation.

Noise field integration. Fluid turbulence manifests stochastic fluctuation, and direct numerical simulation cannot capture very turbulent behavior with this intrinsic nature. Synthetic noises are naturally integrated with the simulated velocities, which reduces cost and creates natural-looking results. A handful of recent approaches utilize Perlin or wavelet noise to generate spectrum-controllable divergence-free fields with the curl operation [57], which are added to the simulated flow fields. Divergence-free fields for artistic simulation are calculated by a fast simulation noise [58]. Beyond fluids, fractal mountains were created in the frequency domain according to a fractal spectrum [59], an idea that can also be applied to fluid turbulence. Forces have also been used in animated fluid control [60–63]. Since such isotropic noises are not directly applicable to fast-evolving anisotropic flow fields, these methods endeavor to manipulate the noises with turbulence parameters computed from special energy transport models. For this purpose, Schechter and Bridson [64] propose a simple linear model, Kim et al. [65] use locally assembled wavelets, and Narain et al. [66] apply an advection-reaction-diffusion equation. Most recently, a particle-based method was developed [67] to create scalable chaotic effects based on a very coarse grid simulation, using a particularly stretched wavelet noise for turbulence production. The energy transport is solved by a two-equation k–ε model on particles to incorporate anisotropic noise and update particle velocities. This method achieves very fast performance by rendering particles directly; however, it aims at creating very chaotic flows and does not perform well for non-turbulent fluids. These models follow complex estimates (e.g., statistical Kolmogorov theory [68]) of turbulence evolution from the simulation results, and they integrate noises in a post-processing stage. Therefore, extra effort must be devoted to making the noise coupling temporally consistent with the evolving simulation.
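As a minimal illustration of the curl-based construction used by these noise methods (e.g., [57]), the sketch below takes the 2D curl of a smooth scalar potential, which yields a divergence-free field by construction; the sinusoidal potential merely stands in for Perlin or wavelet noise and is an assumption of this example.

```cpp
#include <cmath>
#include <cstdio>

// Any smooth scalar potential psi(x, y); a sum of sinusoids stands in for
// Perlin/wavelet noise here (illustrative assumption).
static double psi(double x, double y) {
    return std::sin(1.7 * x) * std::cos(2.3 * y) + 0.5 * std::sin(4.1 * x + 0.9 * y);
}

// 2D curl of psi: u = d(psi)/dy, v = -d(psi)/dx, via central differences.
// The resulting field is divergence-free by construction (du/dx + dv/dy = 0).
static void curlNoise(double x, double y, double& u, double& v) {
    const double h = 1e-4;
    u =  (psi(x, y + h) - psi(x, y - h)) / (2 * h);
    v = -(psi(x + h, y) - psi(x - h, y)) / (2 * h);
}

int main() {
    // Numerically check the divergence at a sample point.
    const double h = 1e-3;
    double uL, vL, uR, vR, uD, vD, uU, vU;
    curlNoise(0.3 - h, 0.7, uL, vL); curlNoise(0.3 + h, 0.7, uR, vR);
    curlNoise(0.3, 0.7 - h, uD, vD); curlNoise(0.3, 0.7 + h, uU, vU);
    double div = (uR - uL) / (2 * h) + (vU - vD) / (2 * h);
    std::printf("divergence ~ %g\n", div);   // should be close to zero
    return 0;
}
```

In 3D the same idea applies with the vector curl of a noise potential; the approach taken in this dissertation instead builds the solenoidal fields directly in the frequency domain (Chapter 4).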

Energy injection. On the other hand, ongoing flows are also enhanced through turbulence energy injection. Vorticity confinement forces computed at all grid sites increase the rolling features of smoke [69]. Later, manually seeded vortex particles carrying an additional vorticity apply similar rotational forces while the particles stream inside the flow [70]. The carried vorticity is modified through a vorticity-velocity form of the NS equation. The method requires very careful seeding, since it might mistakenly impose unnatural rotation on the flow without the guidance of a physical turbulence model. Pfaff et al. [71] sample vortex particles physically on boundary layers for obstacle-induced turbulence. After seeding, the method solves energy transport equations to determine when the particles should increase or reduce their chaotic agitation, and corresponding heuristic rules are developed for particle merging and splitting. It does not handle fluid streams without objects. Another approach uses spectrally generated divergence-free noises to instigate turbulence conditionally, but without energy evolution; this method adds forces in the whole domain, so it may introduce excessive energy into the system and make it unstable.

2.4 GPU acceleration in Fluid Modeling

The computational power of the graphics processing unit (GPU) has grown very fast recently, driven by the demands of real-time 3D graphics. The graphics card is dedicated to speeding up the rendering of graphics data such as polygons and textures, which it processes extraordinarily fast. Compared with the CPU, there is a large discrepancy in floating-point capability. Fig. 1 shows that the peak performance of NVIDIA's GT200 is around 0.9 TFLOP/s, while the contemporary Intel Harpertown reaches around 0.12 TFLOP/s. The multithreaded architecture of the GPU makes it appropriate for highly parallel and computation-intensive applications. Therefore, besides graphics applications, more and more numerical applications and general computational problems resort to this hardware for acceleration [72].

In fluid modeling, the explicit LBM solver only involves local information during numerical computation, so it can be highly parallelized on the GPU. Many researchers have dedicated effort to accelerating LBM with graphics cards. Zhao et al. [73] implement an LBM solver with CG on a single GPU. Mattila et al. [74] optimize the memory layout by adopting a swapping technique. With the advent of the Compute Unified Device Architecture (CUDA), it has become much easier to program on graphics cards, and LBM can also be accelerated with MPI on top of CUDA on GPU clusters [75,76]. The Finite-Time Lyapunov Exponent (FTLE) field, which extracts fluid features such as flow separation and transport barriers, has been used in visualization to better understand fluid flow. The heavy computational load and parallelizability of FTLE can be handled efficiently on the GPU through OpenGL [77]. Brunton et al. [78] reduce the redundancy of particle integrations by approximating the particle flow map to speed up the calculation. Barakat et al. [79] adaptively sample and render the FTLE field depending on the view angle, with on-the-fly computation achieved by optimizing memory management and computational resources.

Figure 1: Floating-Point Operations per Second [1].

Fluid modeling generates 3D, high-resolution, time-varying data sets. The large data size imposes challenges on storing and transmitting the animations, so good compression techniques are in demand. Current GPU-based compression methods [80] are not suitable for fluid data, since they cannot preserve the small-scale details that play important roles in fluid data.

CHAPTER 3

Distance Field

3.1 Introduction

A discrete distance field provides an implicit representation of a geometric shape, defined on a collection of sampling points inside a domain enclosing the shape. Each sampling point stores the smallest distance from itself to the shape of interest. Usually, these sampling points are grid points on a rectangular grid in a 2D or 3D domain. The distance can be defined in terms of arbitrary metrics, while the Euclidean distance is the most popular choice for graphical applications. As an implicit shape modeling scheme, the distance field is widely used in many important applications, such as image segmentation and processing, 3D shape editing, smoothing, morphing, collision detection, topology operations and volume graphics.

Therefore, distance field generation has been an essential research topic in computer vision, graphics and visualization, as well as applied mathematics. A variety of approaches have been proposed to address the problem, which can be described as solving an Eikonal equation. Most recent endeavors adopt a strategy based on the distance field transform. Instead of directly computing the closest distance from every point to the shape, only the grid points belonging to a boundary band close to the shape are computed, from which the remaining distances are evaluated by propagating distances to the rest of the volume. The distance propagation algorithms can be categorized into two main strategies:

Domain Sweeping: Distance propagation starts from some corner of the rectangular grid and covers the whole domain in a predefined sequential order related to the axial directions. For example, for a 3D rectangular domain [0..NX] × [0..NY] × [0..NZ], distance information propagates from (0, 0, 0) to (NX−1, NY−1, NZ−1) by traversing all grid points in order: first along the x-direction, next the y-direction, then the z-direction. Such a distance transform does not consider the arbitrary locations of the starting bands providing the basis of propagation, so one such traversal obviously cannot accomplish the task; typically several passes in different directional orders are required.

Front (contour) Propagation: Taking the initial band into consideration, the propagation starts from the grid points inside the band and transfers the known distance information to their neighbors until the whole domain is computed. This adaptive approach guarantees that the distance transform proceeds in increasing order, implemented by exploiting a special priority data structure (e.g., a sorted list or a heap) that stores a sorted active band. The priority data structure maintains the grid points being used for the transform, distinguished by their distance values. The scheme only retrieves the grid point with the shortest distance from the band and propagates its distance to its neighbors not yet in the band. Thus, it avoids backtracking over previously evaluated grid points, which enables fast marching and correct results.

The sweeping methods go through the N = NX × NY × NZ grid points in several (a constant number of) passes, and thus achieve asymptotic complexity O(N). In comparison, the front propagation methods have complexity O(N log N) due to the heap maintenance effort.

Though the former seems advantageous, the latter provides a more flexible approach when only a particular distance limit needs to be computed. When answering a query such as "give me the points with distance smaller than d_l," it can simply stop computing during propagation and still provide correct results. A small d_l is imposed in many graphical applications, where the front propagation methods can respond faster than the sweeping approaches, which have to complete the computation over the whole domain. Both strategies have been widely studied and have achieved great success; we refer interested readers to a good survey [37] for detailed analysis.

A common disadvantage of both strategies is that neither approach can be easily parallelized, because both are established on an algorithmic basis of sequential processing. The sweeping methods compute the distance transform one grid point after another according to particular traversal sequences, and the front propagation methods process the active band in a sorted sequential order.

The distance transform can, however, be performed in a parallel computational scheme. A straightforward idea goes back to an approach utilizing an iteration strategy [27]: at a time step T, each grid point asks all its neighbors for their current distances, and then updates its own distance to the smallest among the values propagated from these neighbors. The updated value is the new distance of the point at time T+1. When all points have converged, i.e., their values no longer change in consecutive time steps, the distance field of the whole domain is obtained. This strategy may take many steps to reach convergence, resulting in relatively slow computational performance. However, each grid point can be processed concurrently at each time step, which enables parallel computing and is critical for exerting the computational power of modern multi-core CPUs and other parallel architectures. On the other hand, this approach is not adaptive and cannot terminate early for queries with a limited distance value.
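The whole-domain iteration just described can be sketched in a few lines of C++. The scalar (chamfer-style) propagation with 4-neighbors, the flat array layout, and the convergence tolerance are simplifications assumed for this example; the method in this chapter propagates distance vectors instead (Section 3.2.2).

```cpp
#include <vector>
#include <algorithm>

// Whole-domain iterative distance transform (2D grid, 4-neighbor, scalar
// chamfer-style propagation for brevity).  'dist' holds 0 at the starting
// boundary points and a large value elsewhere.
void iterateDistance(std::vector<float>& dist, int nx, int ny, float dx = 1.0f) {
    bool changed = true;
    while (changed) {                                  // repeat until convergence
        changed = false;
        std::vector<float> next = dist;                // Jacobi-style: read old, write new
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                float best = dist[j * nx + i];
                const int di[4] = {1, -1, 0, 0}, dj[4] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {          // ask every neighbor
                    int ni = i + di[k], nj = j + dj[k];
                    if (ni < 0 || nj < 0 || ni >= nx || nj >= ny) continue;
                    best = std::min(best, dist[nj * nx + ni] + dx);
                }
                if (best < next[j * nx + i] - 1e-6f) { // keep the smallest candidate
                    next[j * nx + i] = best;
                    changed = true;
                }
            }
        dist.swap(next);
    }
}
```

Each sweep only reads the previous iterate, so all points can be updated concurrently, which is exactly the parallel-friendly property noted above; the drawback, also noted above, is that every point of the domain is revisited until global convergence.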

In this dissertation, we propose a new approach that offers (1) simple programming and parallel computing capability through an iterative scheme, and (2) the ability to handle particular distance limits of the propagating front, enabling query-based computation.

To provide front propagation features, we use a similar narrow band to manage the active grid points; however, to allow a parallel computing scheme, these points do not carry different priorities as in the fast marching method [31]. Our method instead uses an adaptive iteration on the active band. Without distinguishing the points inside the band by a priority-enabled data structure, the algorithm provides correct distance results only if a grid point stays active in the narrow band until its distance can no longer be updated. Therefore, each grid point is assigned a lifespan that governs its stay inside the band. The lifespan is determined by the structural features of the domain-decomposition grid. We examine the geometric properties of several grid structures to find the theoretical lifespan of an arbitrary grid point, and then use it to control how long a grid point should stay in the band. Moreover, we exploit a multiple-segment narrow-band propagation algorithm to further reduce the complexity and improve the performance.

3.2 Distance Field Transform

3.2.1 Definition

A distance field represents surfaces or curves implicitly, and has been broadly used for shape modeling purposes in computer graphics. A distance field is defined as a scalar field that specifies a distance to a shape, where the distance is usually signed to distinguish between the inside and outside of the shape. A data set X representing a distance field to a surface S is defined as X : R^3 → R with, for p ∈ R^3,

X(p) = \mathrm{sgn}(p)\,\min\{\,|p - q| : q \in S\,\},   (3)

where sgn(p) = 1 (−1) if p is inside (outside) of S, and |·| is the Euclidean norm.

Figure 2: Vector-based distance propagation.

The distance transform computes the distance field for all points p ∈ R^3 from an initial starting set with known distance values. The starting set stores the distances of points in a boundary layer of S, which are computed by geometric or analytical algorithms.
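Read literally, Equation 3 is a brute-force minimization; the sketch below evaluates it for one query point against a sampled surface. The point-cloud representation of S and the inside-test callback are illustrative assumptions of this example.

```cpp
#include <vector>
#include <cmath>
#include <limits>
#include <algorithm>
#include <functional>

struct Vec3 { double x, y, z; };

double distance(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Brute-force evaluation of Eqn. 3:  X(p) = sgn(p) * min{ |p - q| : q in S }.
// 'surface' is a point sampling of S and 'inside' reports whether p lies inside S.
double signedDistance(const Vec3& p, const std::vector<Vec3>& surface,
                      const std::function<bool(const Vec3&)>& inside) {
    double best = std::numeric_limits<double>::infinity();
    for (const Vec3& q : surface)
        best = std::min(best, distance(p, q));
    return (inside(p) ? 1.0 : -1.0) * best;   // sgn(p) = 1 inside, -1 outside (Eqn. 3)
}
```

Evaluating this at every grid point costs on the order of N times the number of surface samples, which is exactly what the transform methods below avoid by computing only a boundary layer directly and propagating the rest.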

3.2.2 Vector-Based Distance Transform

We compute the distance transform from one point to its neighbors by a vector propagation method, in which the distance is represented as a vector from a grid point to its closest point on the surface. As illustrated in Figure 2a, the known distance of P is represented as a vector \vec{CP}, where C is the closest point on the surface to P. When P propagates to its neighbor Q, the distance of Q is computed by

\overrightarrow{CQ} = \overrightarrow{CP} + \overrightarrow{PQ}.   (4)

Here, \vec{PQ} is the constant vector between the two neighbors. Note that the length of \vec{CQ} is used as the distance of Q propagated from P, even though C is, strictly speaking, the closest point to P rather than to Q. This assumption is shared by all distance transform methods and can lead to computational errors related to the grid resolution.

The grid point Q can compute its distance from other neighbors in the same way as from P. For example, in Figure 2b, another neighbor R also provides a candidate distance for Q, represented by \vec{DQ}. The actual distance of Q for the next step is the smallest of these candidate distance vectors.
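A minimal sketch of one propagation step of Eqn. 4: the neighbor's candidate vector is formed and kept only if it is shorter than the receiver's current vector. The vector type and function names below are illustrative.

```cpp
struct Vec3 { double x, y, z; };

Vec3 add(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
double len2(const Vec3& a) { return a.x * a.x + a.y * a.y + a.z * a.z; }

// One propagation step of Eqn. 4.  'CP' is the vector stored at P (from its
// closest surface point C to P), 'PQ' is the fixed offset between neighbors
// P and Q, and 'CQ' is the vector currently stored at Q.  The candidate
// CP + PQ replaces CQ only if it is shorter.
bool propagate(const Vec3& CP, const Vec3& PQ, Vec3& CQ) {
    Vec3 candidate = add(CP, PQ);
    if (len2(candidate) < len2(CQ)) {   // keep the smallest-length distance vector
        CQ = candidate;
        return true;                    // Q's distance was improved this step
    }
    return false;
}
```

In the full method, a point collects such candidates from all of its neighbors within a time step and keeps the shortest one, as described above.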

3.2.3 Our Computational Scheme

The domain sweeping methods control the directions and orders of propagation. That is, Q is only allowed to compute the distance vector from particular neighbors in each pass, so that after a few passes in different orders, every point achieves its shortest distance to the surface. In comparison, the front marching methods only allow the existing point with the smallest distance to find its inactive neighbors and propagate; as a result, each neighboring point is guaranteed to acquire its shortest distance (see [31] for a proof). In the conventional iteration method, each point Q of the whole domain computes many temporary distance vectors from all of its neighbors and chooses the smallest for the next time step. At an arbitrary time step, Q may not yet have achieved its final distance value. After many iteration steps (usually proportional to O(N)), all grid points eventually approach their correct closest distance. Convergence is reached when no point changes its distance value (in practice, within a very small error tolerance) in consecutive steps.

In our approach, the iteration computation is applied only on an active narrow band, a small portion of the whole domain, to improve performance. Furthermore, we propose a new algorithm that does not maintain a priority queue inside the narrow band. Therefore, our method provides an adaptive iteration method for distance transform. In the next section, we describe the details of our active band propagation algorithm and the iteration strategy based on a point's lifespan.

3.3 Active Band Scheme

3.3.1 Propagation Procedure

At the beginning, grid points in a boundary layer closely enclosing the shape of interest obtain their distance vectors by direct geometric computation. This starting set initiates the active narrow band, NB. This band stores all the grid points to which the evolving distance transform front has been propagated and whose distances may still be updated in future steps.

The band evolves over time steps by adding new points and by removing points that have achieved their final distance.

Assuming the distance d_P^T of each point P inside the band NB is known at time step T, each of its neighbors Q is considered. We use the method described in Section 3.2.2 to compute a temporary distance for each Q. In this way, each Q obtains a group of temporary distances and takes the one with the smallest magnitude as its distance d_Q^{T+1} at the next time step T+1. Q can be a point newly added to NB at T+1, or a point already existing in NB before this time step. This illustrates the difference between our method and the fast marching methods.

We face the challenge of deciding when to remove a point from NB. Due to the use of the narrow band, the iteration convergence rule, which checks whether points no longer update their distances, cannot be applied, because that rule is only valid when all points in the whole domain are computed together. Instead, we investigate the structural properties of the grid to define the lifespan of a point (i.e., how long a point should stay in NB). In other words, we want to find the time step at which a point has achieved its correct final distance with no chance of being updated again, and then remove it from NB.

Figure 3: Distance transform on a rectangular grid.

3.3.2 Lifespan of a Point

The lifespan of a point can be determined from the geometry of a grid. As illustrated in Figure

3, for a point Q with distance l to the shape S, all possible closest points on S to Q must reside on a virtual circle (in 3D, a sphere). Consider two possible locations of the closest point, C₁ ∈ S and C₂ ∈ S; each of them transfers the closest-distance information to Q, but in different numbers of time steps depending on the grid structure. When we only allow the distance transform to happen between axial neighbors (neighbors along the x and y axes) at each time step, C₂ will use t(C₂) = l/∆x steps to reach Q, whereas C₁ will use t(C₁) = l(cos(π/4) + sin(π/4))/∆x steps following the pink path. Theoretically, for any possible closest point C(θ) ∈ S at an angle θ, by symmetry, the number of time steps used to transfer the distance information to Q is

t(C(\theta)) = \frac{l(\cos\theta + \sin\theta)}{\Delta x} = \frac{\sqrt{2}\,l\cos(\theta - \pi/4)}{\Delta x}, \quad \theta \in [0, \pi/4].   (5)

From this equation, the minimum and maximum possible numbers of time steps are t^Q_min = t(C(θ = 0)) = t(C₂) and t^Q_max = t(C(θ = π/4)) = t(C₁).

A point Q in the square grid should be kept in the active band NB from t^Q_min to t^Q_max time steps, since its distance may still be updated to a smaller value by the iteration within this time range. From Equation 5, we get

t^Q_{max} = \sqrt{2}\,t^Q_{min} = t^Q_{min} + (\sqrt{2} - 1)\,t^Q_{min}.   (6)

Notice that t^Q_min is the time step at which Q is first added to NB; therefore, we can remove Q from NB after (√2 − 1) t^Q_min further time steps. In this way, we guarantee that a point has reached its final distance upon leaving NB. Finally, when NB is empty, with no new points added and all points removed, the distance field generation is complete.

Obviously, the propagation pattern, i.e., how a point propagates to its neighbors, affects the lifespan. Consider a 2D square grid (Figure 3) in which we allow distance propagation between diagonal neighbors as well as axial neighbors. In this case, the number of time steps from C₁ to Q is t(C₁) = l cos(π/4)/∆x = l/(√2 ∆x), which is now smaller than that from C₂: t(C₂) = l/∆x. Here, we find t^Q_min = t(C₁) and t^Q_max = t(C₂) = √2 t(C₁). Therefore, Equation 6 is still valid for this grid. However, with propagation in the diagonal directions, the speed of the distance transform is improved, since a point can be added to the active band earlier, with a smaller t^Q_min.

From the analysis above, we find that the lifespan of a point in NB lies between t_min and t_max, and that this lifespan is determined by the grid structure. Please note that when a general Eikonal equation is considered, a link between neighboring points carries a weight defined by a particular function; in that case we cannot assume a uniform grid scale ∆x or ∆y, which requires further intensive study and is on our future research agenda.

3.3.3 Grid Structures and Lifespan Coefficient

We have shown, from Equation 6, that a point Q should stay in NB for (√2 − 1) t_f ≈ 0.414 t_f time steps, where t_f = t^Q_min is the time step at which it is first added to NB and is proportional to the length l, as described in Figure 3. A lifespan coefficient can therefore be defined as α ≈ 0.414, which determines the lifespan (α t_f). For a 3D cubic domain, the lifespan coefficient is defined in a similar way as α = (√3 − 1) ≈ 0.732. If l is large, the point will stay in NB for a significant time (0.732 t_f). In the implementation, the lifespan must be an integer number of time steps, Ceiling(α t_f). We can reduce the lifespan of a point by changing the coefficient α to a smaller value, i.e., by using a different grid structure.

For example, we show a 2D triangular grid in Figure 4. Applying a similar geometric analysis, we can derive the minimum and maximum for this grid structure: t_min = l/∆x and t_max = l/(cos(π/12) ∆x), which gives a lifespan coefficient α ≈ 0.04.

For 3D distance transform, we investigate the Face-Centered Cubic (FCC) grid structure, which has better sampling efficiency compared with the cubic grid. An FCC grid consists of simple

Figure 4: Distance transform on a triangle grid.

Figure 5: The 12 neighbors of an FCC lattice site form a cuboctahedron (image courtesy of Dr. Feng Qiu).

Cubic Cartesian (CC) cells with additional sampling points located at the center of each cell face. Each point in the FCC grid has direct links to 12 nearest neighbors, in contrast to 6 in the cubic grid. This is the best angular discretization rate that any 3D regular grid can achieve, since, according to [81], in R³ the maximum number of spheres of radius 1 that can simultaneously touch the unit sphere (the kissing number) is 12. This unique feature has important implications for sampling and interpolation; details can be found in [81]. For example, the 12 links of an FCC grid point are symmetric under rotation and reflection, which presents a relatively simple geometric structure leading to a smaller coefficient α. Figure 5 shows the cuboctahedron defined by the 12 closest neighbors of an FCC site. The FCC grid can easily be created from a cubic grid: as shown in Figure 6a, for each grid point, only the links to its 12 neighbors with link length √2 are allowed, and thus we construct an FCC grid.

Figure 6: Distance transform on an FCC grid.

We use the same geometric analysis to find the lifespan coefficient. Now, for a point Q with distance l to S, the closest point on S to Q must reside on a sphere. Figure 6b illustrates two possible locations, C₁ and C₂, in the neighborhood of Q, where the pink links represent the available transform routes. In this case, Q, C₁ and C₂ are in the same plane. Using the same analysis as for Figure 3, t_max = t(C₂) = √2 t(C₁). Note that C₁ and C₂ could be at other locations with α < 0.414. However, since 0.414 t_f is the maximum possible lifespan, Q has to stay in the active band for the lifespan 0.414 t_f, thus improving α from 0.732 for the cubic grid to 0.414 for the FCC grid.
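The lifespan rule of Sections 3.3.2–3.3.3 can be collected into a small helper: given the grid type and the step t_f at which a point first enters the band, the point stays for Ceiling(α · t_f) further steps. The enum and function names below are illustrative.

```cpp
#include <cmath>

// Lifespan coefficients derived in Sections 3.3.2-3.3.3 (alpha = t_max / t_min - 1).
enum class GridType { Square2D, Triangle2D, Cubic3D, FCC3D };

double lifespanCoefficient(GridType g) {
    const double pi = 3.14159265358979323846;
    switch (g) {
        case GridType::Square2D:   return std::sqrt(2.0) - 1.0;            // ~0.414
        case GridType::Triangle2D: return 1.0 / std::cos(pi / 12.0) - 1.0; // ~0.04
        case GridType::Cubic3D:    return std::sqrt(3.0) - 1.0;            // ~0.732
        case GridType::FCC3D:      return std::sqrt(2.0) - 1.0;            // ~0.414
    }
    return 0.0;
}

// A point first added to the band at step t_f stays for Ceiling(alpha * t_f) more steps.
int lifespanSteps(GridType g, int t_f) {
    return int(std::ceil(lifespanCoefficient(g) * t_f));
}
```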

3.4 Computational Procedure

In recapitulation, our distance transform algorithm is as follows:

1. Initialize the active narrow band NB by analytically computing the distances of a starting set of grid points to the shape S. The points in the starting set belong to an enclosing boundary layer of S. Set all other grid points to the status UNTOUCHED. Initialize the time step T = 0.

2. At time T + 1, for each point P in NB, find each of its neighbors G whose status is not FINISHED.

3. Compute the temporary distance d_G^temp of G from P by Equation 4. Keep the smallest distance for G: d_G^temp = min(d_G^temp, d_G^T).

4. If G is UNTOUCHED, set it to TOUCHED and add it to NB. Assign G a lifespan value f_G = Ceiling(α (T + 1)).

5. Go back to Step 2 until all points P in NB are processed.

6. For each point P in NB, assign d_P^{T+1} = d_P^temp. If T + 1 > f_P, remove it from NB and set it to FINISHED.

7. If NB is empty, exit; else set T = T + 1 and go back to Step 2.

In our method, each point in NB can be processed concurrently. Therefore, we propose a parallel distance transform approach based on the active band.
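The procedure above is sketched in C++ below on a 2D square grid, using scalar propagation instead of the vector form of Eqn. 4 for brevity. The container choices and status bookkeeping are illustrative, and the removal deadline follows the lifespan rule of Section 3.3.2 (a point first added at step t_f leaves the band Ceiling(α · t_f) steps later).

```cpp
#include <vector>
#include <cmath>

enum class Status { Untouched, Touched, Finished };

struct BandPoint { int idx; int firstStep; };  // flat grid index, step of first insertion

// Active-band distance transform on a 2D square grid (4-neighbor, scalar
// distances for brevity).  On entry, the points of the initial boundary layer
// are listed in 'band' with firstStep = 0, marked Touched in 'status', and
// their distances are already written into 'dist'; every other entry of
// 'dist' holds a large value and is Untouched.
void activeBandTransform(std::vector<float>& dist, std::vector<Status>& status,
                         int nx, int ny, std::vector<BandPoint> band,
                         float dx = 1.0f, float alpha = 0.414f) {
    const int di[4] = {1, -1, 0, 0}, dj[4] = {0, 0, 1, -1};
    int T = 0;
    while (!band.empty()) {
        ++T;
        std::vector<BandPoint> next;
        // Propagate from every band point to its non-finished neighbors.
        // The iterations of this loop are independent and could run in parallel.
        for (const BandPoint& p : band) {
            int i = p.idx % nx, j = p.idx / nx;
            for (int k = 0; k < 4; ++k) {
                int ni = i + di[k], nj = j + dj[k];
                if (ni < 0 || nj < 0 || ni >= nx || nj >= ny) continue;
                int q = nj * nx + ni;
                if (status[q] == Status::Finished) continue;
                float cand = dist[p.idx] + dx;          // scalar stand-in for Eqn. 4
                if (cand < dist[q]) dist[q] = cand;     // keep the smallest distance
                if (status[q] == Status::Untouched) {   // newly touched: join the band
                    status[q] = Status::Touched;
                    next.push_back({q, T});
                }
            }
        }
        // Retire points whose lifespan has expired (rule of Section 3.3.2).
        for (const BandPoint& p : band) {
            int deadline = p.firstStep + int(std::ceil(alpha * p.firstStep));
            if (T > deadline) status[p.idx] = Status::Finished;
            else next.push_back(p);
        }
        band.swap(next);
    }
}
```

The inner loop over the band carries no ordering constraints, so it can be distributed across threads, which is the parallelism claimed above.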

3.5 Multiple-Segment Distance Transform

We have described the basic algorithm of our adaptive iteration method. Next, we analyze the complexity of the method and then propose further improvement based on the analysis.

For a 3D domain with NX, NY and NZ points in the three dimensions, we have N = NX × NY × NZ grid points in total. Each point P stays in NB for a lifespan (α t_f), where t_f is proportional to the real distance l. For convenience, we represent the lifespan as (α t_f) = (β l), where β is a new constant coefficient. Over the whole propagation procedure, the points are visited for a total of \sum_{i=0}^{N} (β l_i) steps. Since 0 ≤ l_i ≤ l_max, the total computation time is

\sum_{i=0}^{N} (\beta\, l_i) < \sum_{i=0}^{N} (\beta\, l_{max}) = N\, \beta\, l_{max}.   (7)

Because l_max ∝ O(N), the complexity of the aforementioned algorithm is approximately O(N²).

Here, l_max is the largest possible distance in the domain. Our method can stop when reaching a user-defined l_max; if only a small l_max is required, as in many applications, our algorithm achieves very fast performance. In addition, our algorithm is parallelizable inside the narrow band, so its performance could be further improved with a parallel computing scheme.

Next, we improve the method from another perspective.

The computational performance of our method is related to the distance value. If there are n_d grid points with an arbitrary distance l_d, in the adaptive iteration each such point stays in NB for (β l_d) steps. For a large number n_d of points with a large l_d value, the computation is slow. For instance, for the distance field of a single center point in 3D space, the isosurfaces consist of multiple spheres centered at the point; the distance transform is very fast at the beginning and becomes slower as the radius grows toward a very large l_max. From this observation, we further improve performance by restricting the maximal value of l_d, i.e., l_max, to a fixed value. This leads to a new multiple-segment propagation method for distance transform.

In general, we decompose the whole 3D domain into a number of segments, each of which represents a group of grid points with a specific range of distance values. We only allow the active band, NB, to propagate inside one segment until all the grid points inside the band achieve their final distance. It works as follows:

First, we define a distance limit for the first segment. When the active band propagates to a new point whose distance is larger than the threshold l_max^seg, that neighbor point is not added to NB. Only after all points with distance smaller than l_max^seg are computed do we set the boundary layer of these points as the new starting set of NB and start the distance transform for the next segment. The threshold of the second segment is (2 l_max^seg), and for the i-th segment the threshold is (i l_max^seg). Repeating this procedure for all segments, the new algorithm outputs the distance field for the whole domain.

For each segment, the lifespan of the points is computed based on the initial band from the boundary of the previous segment, instead of the original shape. The maximal lifespan of any point in a segment is therefore (β l_max^seg), no matter which segment it is. As a result, with N_i points in each segment i and s segments in total, the computational time is

\beta \Big( \sum_{i=0}^{N_0} l^{seg}_{max} + \sum_{i=0}^{N_1} l^{seg}_{max} + \dots + \sum_{i=0}^{N_{s-1}} l^{seg}_{max} \Big) = \beta\, l^{seg}_{max} (N_0 + N_1 + \dots + N_{s-1}) = \beta\, l^{seg}_{max}\, N.   (8)

Compared with Equation 7, this l_max^seg is user-defined and smaller than l_max; in this way, we achieve a large speed improvement, from O(N²) to O(m·N) with a constant m = (β l_max^seg). We report the effect of the value m on our distance transform performance in the next section.
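Structurally, the multiple-segment method is a thin driver around the single-segment propagation: each pass is capped at the running threshold i · l_max^seg, and the finished segment's boundary layer seeds the next band. The sketch below assumes a hypothetical `propagateUpTo` callback standing in for a band propagation routine such as the one sketched in Section 3.4, extended with the distance cap.

```cpp
#include <vector>
#include <functional>
#include <utility>

// One segment pass: propagate the active band but refuse to add any point whose
// candidate distance exceeds 'limit'; return the boundary layer of the finished
// segment, which seeds the band of the next segment.  (Hypothetical signature.)
using SegmentPass = std::function<std::vector<int>(std::vector<int> /*seedBand*/,
                                                   float /*limit*/)>;

// Multiple-segment driver (Section 3.5): the threshold of segment i is i * segSize,
// so every point's lifespan is bounded by beta * segSize and the total cost
// becomes O(m * N) with m = beta * segSize.
void multiSegmentTransform(std::vector<int> initialBand, float segSize,
                           float domainMaxDistance, const SegmentPass& propagateUpTo) {
    std::vector<int> band = std::move(initialBand);
    float limit = segSize;                          // threshold of the first segment
    while (limit < domainMaxDistance + segSize && !band.empty()) {
        band = propagateUpTo(band, limit);          // finish all points with distance <= limit
        limit += segSize;                           // move on to segment i + 1
    }
}
```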

3.6 Results and Discussion

We examine our method with many data sets on an ordinary PC with an Intel Core2 CPU 6300 at 1.86 GHz and 2 GB of RAM. To show our computational performance, we use the adaptive iteration method for distance transform on multiple data sets. The examples include artificial data sets with one starting point (i.e., we compute the distance field in the whole domain to a single existing point) and two starting points, respectively. Moreover, we also compute the distance field of several polygonal models: a fandisk, the Stanford bunny, the armadillo, and a teapot.

Table 1: Performance of our distance transform method on multiple data sets. For each model at different grid sizes, we compare the computational time between the multiple-segment method with a segment size l_max^seg = 20 (see Section 3.5) and the method with no segment. Computational time is measured in seconds.

Volume Size    64 × 64 × 64                128 × 128 × 128             256 × 256 × 256
Example        With Seg.   Without Seg.    With Seg.   Without Seg.    With Seg.   Without Seg.
One Point      0.813       1.458           8.157       23.219          97.642      391.423
Two Points     1.078       1.938           11.109      22.766          124.485     409.234
Fandisk        0.952       0.953           9.391       11.797          107.518     171.532
Bunny          0.984       0.969           9.829       11.391          122.298     166.189
Armadillo      1.078       1.094           9.485       11.843          110.313     177.408
Teapot         0.812       0.859           8.937       12.689          117.827     202.907

FCC sampling scheme requires 29.3% fewer samples compared to CC lattice in 3D domain.

In practice, our method achieves good accuracy. In the largest volume size we tested, our method has the maximum error of 0.276 and average error of 0.001 for the one point model in comparison with the analytical result. For the two points model, the maximum error is 0.547 and average error is 0.001.

In Table 2, for each model of different volume sizes, we compare the computational time

seg between the multiple-segment method with a segment size lmax = 20 (see Section 3.5), and the method with no segment. It shows that using the multiple-segment method can improve the

31 speed. This improvement is larger for one point and two point models, which have substantial increase on the number of grid points along with the radius increase, and thus very suitable for multiple-segment acceleration. Please note that for polygonal models, the performance depends on the location and relative size of the model inside the 3D domain, which defines the distance value distribution.

In Figure 7, we show the computational time for the distance transform over different distance ranges, for three data sets. For example, the bunny model uses 9.36 seconds to compute grid points with distance smaller than 20, and uses 16.578 seconds to compute grid points with distance ranging from 20 to 40, and so on. It shows that our speed mainly depends on the distance value; thus, for many applications requiring a limited distance range, we can achieve a very fast response. The downtrend after reaching the peak occurs because the number of grid points decreases when the active band approaches the domain boundary.

[Plot: computational time (in secs) vs. distance range for the One Point, Bunny and Armadillo data sets.]

Figure 7: Computational time for distance transform on different distance ranges.

The performance of our segmented method is related to the segment size, as described in Section 3.5. Figure 8 shows the performance of using different segment sizes (i.e., l_max^seg), as well as no segment, for three data sets. It depicts that generally using a small segment size makes the computation faster.

[Plot: computational time (in secs) vs. segment size (and no segment) for the One Point, Bunny and Armadillo data sets.]

Figure 8: Performance of using different segment sizes.

However, a very small segment size (i.e., l_max^seg = 10) could slow the computation, because the overhead of resetting the narrow band between consecutive segments compromises the speed gain in this case.

Figure 9: Distance field of the two points is rendered as isosurfaces with different distance values.

We use a Marching Cubes method to generate isosurfaces with different distance values for visualization of the results. In Figure 9, the distance field from two starting points consists of two propagating spheres. Figure 10 shows the results for the Armadillo in the 256 × 256 × 256 volume. Figure 11 shows the results for the bunny model.

(a) Origin (b) Distance=2.5 (c) Distance=6 (d) Distance=8.5
Figure 10: Distance field of the Armadillo is rendered as isosurfaces with different distance values.

(a) Origin (b) Distance=2.5 (c) Distance=10.5 (d) Distance=21 Figure 11: Distance field of the bunny is rendered as isosurfaces with different distance values.

CHAPTER 4

Adaptive and Controllable Turbulence Enhancement

4.1 Introduction

Fluid simulation, mostly based on numerically solving the governing Navier-Stokes (NS) equations, has achieved great success in computer graphics, leading to astounding appearances of streaming water, flaming fire, propagating smoke, and more in movies and games.

Recently, many researchers have endeavored to introduce turbulence for enhancing fluid animations. As stated by Taylor and von Kármán in 1937 (at the Royal Aeronautical Society): "Turbulence is an irregular motion which in general makes its appearance in fluids, gaseous or liquid". However, turbulence could also mean "very hard to predict" due to the very large number of degrees of freedom at high Reynolds number (Re). Turbulent fluids exhibit intrinsic fluctuations over a wide range of length and time scales, featuring stochastic and intermittent dynamics.

Strategy and Related Work Direct numerical simulation (DNS) cannot directly model turbulent behavior with a very large Re due to limited computational resources. Furthermore, fast simulation and interaction are very important for animation design and control in computer graphics. Therefore, graphical animations of turbulent fluids typically involve coupling synthetic small-scale (subgrid) noise, modeling chaotic dynamics, to a coarse-grid NS simulator.

This strategy relies on the Reynolds decomposition that breaks the instantaneous velocity field u into a mean (DNS resolved) field U and a rapidly fluctuating component u′. Based on this,

the methodology can be described as

u ⇐ NS(U) ⊕ ST(u′).   (9)

Following this strategy, several successful approaches [64–66, 71, 82, 83] provide various implementations for: a fluid solver NS() simulating the mean flow U, a noise-based procedure ST() synthesizing and evolving the synthetic fluctuation u′, and an integration model ⊕ coupling them together. Here, NS() is usually implemented as a stable solver on a coarse grid.

In noise synthesis ST(), these methods generate turbulence u′ with random functions at various spatial scales and frequencies, sometimes referred to as octaves. A chaotic field was modeled in the frequency domain directly [82–84]. Recently, the curl operation, following Bridson et al. [57], has been applied to noise fields [64, 66] and wavelet vector noise [65]; alternatively, vortex particles belonging to different wave numbers are randomly seeded according to probabilities pre-computed by artificial boundary layers [71]. During the dynamic noise generation, the energy transport among octaves is modeled by a simple linear model [64], an advection-reaction-diffusion PDE [66], locally assembled wavelets [65] or the decay of particles [71]. As a common recipe, the celebrated Kolmogorov 1941 theory (K41) of energy cascade is applied [68]. These methods are built upon two graphical assumptions: A. The K41-inspired energy transport is modeled within the limited grid resolution in NS() and ST(). In fact, the simulation scale is much larger than that of the inertial subrange described in K41 for very high Re flows; B.

The energy cascade happens locally, where the energy is carried by local scalars or particles.

However, the theory actually postulates the spectral statistics of a global energy distribution that may not be spatially localized. Nevertheless, the assumptions, which we will follow, enable turbulence creation and feedback to the mean flow in a graphical way, leading to great success in improving fluid animation techniques.

In the integration operation ⊕, there exist two challenges: one is the magnitude relation between u′ and U, and the other is the temporal evolution of u′ with respect to U. Early work [82–84] advected gas by u′ and U together. The velocity magnitude matching is achieved easily with graphical assumption B: at a location, the kinetic energy of the smallest resolved scale of

U can be used to derive the kinetic energy of u′ from K41, so that the velocity relation is determined. In different implementations, Schechter et al. [64] seeded the resolved energy artificially, Kim et al. [65] used a locally computed kinetic energy, Narain et al. [66] adopted a strain-rate-related viscosity hypothesis, and Pfaff et al. [71] created the confined vorticity also following that hypothesis. The strain-rate-based method is physically meaningful but not always suitable for a graphical animator, who for example wants to introduce boundary effects for a small obstacle not resolved by the existing coarse simulation. In this case, sufficient strain information from U cannot be provided. For this reason, Pfaff et al. [71] sought a solution in a high-resolution pre-computation.

The second challenge is less pleasant to address: the generated fields should temporally evolve with the large-scale flow. To handle this, small-scale u′ fields are deliberately managed with texture distortion detection [65], through an empirical rotation scalar field [64] or by special noise particles [66]. These approaches achieve good results while introducing complexity that originates from implementing ⊕ as a simple vector combination outside of the governing NS equations. Pfaff et al. [71] instead coupled a stable solver with a vortex particle system, in which ⊕ is realized as particle forces. It requires careful management of particles, with the focus on introducing the turbulence well.

In this dissertation, we propose a framework for integrating turbulence into an existing/ongoing

flow suitable for graphical controls. In comparison with direct field addition, our framework avoids the artificial and complex coupling by solving integration inside the NS solvers.

Our Solution We model fluid fluctuation by a random process of adaptive turbulence forcing from a sequence of pre-computed force fields with scale and spectrum control:

• ST(): In retrospect of K41, Kolmogorov assumed that at small scales the flow will be statistically homogeneous and isotropic. Inspired by this, we spectrally synthesize small-scale homogeneous fields with respect to an energy spectrum distribution, which follows K41's −5/3 law or user-prescribed ones. A sequence of synthetic fields is pre-generated and plays the role of random forces.

• ⊕: Instead of being combined with U directly, the synthesized fields represent a chaotic forcing f perturbing the resolved mean flow. Thus, ⊕ is realized in a forced NS simulator (FNS()), inherently leading to smooth feedback and temporal evolution. Eqn. 9 can be rewritten as

u ⇐ FNS(NS(U), ST(f)).   (10)

Moreover, this scheme handles boundaries inside FNS(), where many successful methods exist, releasing ST() from the special operations of previous endeavors.

• NS(): As an independent process from the original simulation, our framework can be combined with a large body of work on NS solvers (e.g., [5, 14, 53, 69]).

Using random forcing is a standard method in physics to study and evaluate fully developed homogeneous and isotropic turbulence [85, 86]. Here we contribute by exploring it for integrating synthetic turbulence with external large-scale flows. This approach is different from simply using a high-resolution fluid simulation. First, the randomness is critical in modeling turbulent dynamics. Second, the random forcing can represent higher-frequency effects not limited by a given high-resolution grid. Furthermore, the method receives input from large-scale flows and controls their effect on the resultant fields. More importantly, as a graphical tool, we practically apply this turbulent forcing only in necessary local areas and/or in appropriate temporal periods, which are defined by user interests, boundaries, strain, vorticity, etc. Furthermore, we can apply random forces with large rotational scales, modeling chaotic fluctuation overlapped with the resolved flow. We therefore not only model small-scale turbulence but also inject manipulative turbulence in large octaves. To make more realistic fluid animation, we further model the temporal intermittency by randomly controlling the turbulence forcing in a heuristic way. Our effort, to the best of our knowledge, is the first attempt in graphics to include this important feature of turbulent fluids.

In summary, we implement an adaptive fluid animation scheme with controllable turbulent behavior, which meets the demands of many interactive applications. Our contributions can be summarized as:

• Random turbulence forcing integrates synthetic turbulent fluctuation with large-scale simulation, with respect to spatial and temporal consistency;

• Controllable turbulence amplitude includes unresolved subgrid fluctuation, and/or overlapped large-scale chaos;

• Spectral synthesis of turbulence forces enables easy implementation and direct spectral control, following arbitrary energy spectrum descriptions. A sequence of small-scale force fields is independently pre-computed without extra simulation overhead and can be reused for different animations;

• Adaptive turbulence takes effect in local areas and/or in particular time ranges, conditioned by physical or user-defined features;

• Intermittent turbulence provides more realistic turbulent fluid animation.

4.2 Random Forcing

Turbulent flows, which are unrepeatable in detail and irregular in both time and space, confound simple attempts to solve them with the ordinary NS equations. This leads to an extension of the understanding of fluid velocity as a random variable. Based on the Reynolds decomposition, the Reynolds-Averaged NS (RANS) equations for incompressible fluid are introduced [87]:

∂U/∂t + div(UU) = −∇P + ν∇²U − div(u′u′), where the divergence (div()) of an additional Reynolds stress tensor, u′u′, describes the underlying stochastic turbulent agitation. As an unknown tensor containing information about the effect of the subgrid scales on the mean flow, it is typically approximated by heuristic models (e.g., under Boussinesq's reasonable hypothesis treating turbulent stress like viscous stress). Although such models capture some of the chaotic nature of real turbulence as small-amplitude disturbances at resolved scales, they are essentially deterministic. Hence, they miss the stochastic effect of random fluctuations at subgrid scales. More general attempts model the Reynolds stress effects by a random process that manifests as random forcing:

∂U/∂t + div(UU) = −∇P + ν∇²U + f.   (11)

The turbulence forcing term f is different from and does not conflict with typical external forces (e.g., buoyancy). It is nonetheless a stochastic instrument to inject turbulent energy. Typically, it is considered as Gaussian random noise that is white in time [86], whose Fourier transform has the property f(w, t)f(w, τ) = E(w)δ(t − τ), where w is the wave number and t and τ are time steps. The overline denotes ensemble averaging, δ() is the Dirac function, and E(w) represents the input energy. Using Eqn. 11, a sequence of random f will naturally satisfy temporal coherence of the resultant turbulence. In this dissertation we apply synthetic force fields to drive the velocity fluctuation integrated with the mean flow in FNS(). However, we do not fully provide a physical solution of RANS. To make the turbulent animation follow the large-scale flow, we control the mean flow input and force agitation with a special feedback scheme, which will be discussed in Sec. 4.4. Next, we first describe the generation of solenoidal f fields with spectral modeling.

4.3 Turbulence Synthesis

We create a divergence-free vector field, v, completely in the Fourier domain by constructing random functions following the frequency domain version of the divergence-free equation.

After an inverse Fourier transform, the resultant field is strictly band limited with single or multiple vortex scales following a prescribed energy spectrum flexibly controlled by users. Its strict compliance with a spectrum design is mathematically guaranteed.

(a) µ = 2√2, σ = 0.2  (b) µ = 2√2, σ = 0.5
Figure 12: Random vector fields generated for a preferred scale µ with different deviations.

4.3.1 Frequency Domain Generation

The Fourier domain form of the divergence-free equation div(v)=0 is:

w · v̂(w) = 0,   (12)

where v̂ = (v̂_x, v̂_y, v̂_z) is the Fourier transform of v, and w = (w_x, w_y, w_z) is the spatial frequency (wave number). We define the vector as

v̂(w) = R₁(w)v₁(w) + R₂(w)v₂(w),   (13)

where R1(w) and R2(w) are two random complex numbers. Here two unit vectors v1 and v2 are orthogonal to w, and also orthogonal to each other:

v₁(w) = ( w_y/√(w_x² + w_y²), −w_x/√(w_x² + w_y²), 0 ),   (14)
v₂(w) = ( w_x w_z/(|w|√(w_x² + w_y²)), w_y w_z/(|w|√(w_x² + w_y²)), −√(w_x² + w_y²)/|w| ),

where |w| is the magnitude of the vector w [88]. The two random numbers are generated as

R₁(w) = S_w e^{iα₁} sin β,   (15)
R₂(w) = S_w e^{iα₂} cos β,

where S_w is a spectrum-controlling parameter at frequency w. We utilize three scalar random numbers α₁, α₂, β ∈ [0, 2π]. This solenoidal field generation strategy, based on the Fourier domain orthogonal projection, has been widely used in physics, as well as by Stam [82, 84].

(a) Kolmogorov energy. (b) Arbitrary energy.
Figure 13: Divergence-free vector fields with two scales. Top: spectrum; Bottom: vector field. µ₁ = √2, µ₂ = 8 and σ₁ = σ₂ = 0.7.

The method was also applied to create 3D Kolmogorov spectrum fields which are added to 2D simulations for large-scale smoke phenomena [83]. Our method generates small-scale force

fields in a similar way. As described in Sec. 4.2, we are able to supply turbulent randomness that is white in time, i.e., it is not necessary to strictly respect temporal continuity and smoothness, which will be implicitly satisfied by the forcing in FNS(), so that we no longer need to model the

4D Fourier field as Stam did. This also gives us freedom to explicitly model intermittency (see

Sec. 4.5). Next, we show how to control energy input in spectral bands.

4.3.2 Energy Spectrum Control

The parameter S_w is related to the energy input at a particular frequency w, which is used to control the total kinetic energy of the resultant vector field, ½⟨v²⟩. Here, ⟨·⟩ represents statistical averaging over the domain. The kinetic energy can be computed in the Fourier domain by integrating ½ v̂v̂∗ over the whole domain Ω, where ∗ denotes the complex conjugate. This computation is achieved by integrating on each spherical area, Λ, with radius |w|:

½⟨v̂v̂∗⟩ = ½ ∫_Ω v̂v̂∗ dΩ = ½ ∫₀^{+∞} ( ∫_Λ v̂v̂∗ dΛ ) d|w| = ½ ∫₀^{+∞} 4π|w|² v̂v̂∗ d|w|.

An energy input E_w is thus computed at each w as E_w = 4π|w|² v̂v̂∗, which determines the total kinetic energy of the vector field by ½⟨v²⟩ = ∫₀^{+∞} E_w d|w|. From Eqns. 13, 14 and 15, we get v̂v̂∗ = R₁R₁∗ + R₂R₂∗ = S_w²(sin²β + cos²β) = S_w². We thus define S_w by

S_w² = E_w / (4π|w|²),   (16)

where E_w is a controllable input for the resultant fields.

Single Scale To provide more flexibility, we generate a single-scale field by

E_w = C_w e^{−(|w|−µ)²/(2σ²)}.   (17)

The Gaussian function defines an energy spectrum concentrated at frequencies that have a magnitude µ, with a corresponding deviation σ determining the degree of concentration. For a field size N, a given magnitude µ approximately models a 3D vortex scale l = (N/2)√3/µ, where √3 is the diagonal factor, and N/2 comes from the conjugate-symmetric implementation in the Fourier domain for achieving inverse transform results as real (non-complex) vectors. Fig. 12a shows 2D results using µ = 2√2 and σ = 0.2. The nearly regular vortex size and energy distribution are due to the small σ = 0.2, which plays a significant role in the vortex appearance. In Fig. 12b, the variation becomes significant when σ = 0.5, due to a looser concentration. The major energy input (i.e., large velocity magnitude visualized by red/yellow colors) focuses on the vortices with the predefined scale µ. This example illustrates using the Gaussian function to flexibly control the vortex scale and energy distribution, with 2D visualization used for clearer representation and better understanding. However, the method works equally well in 3D cases.
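To make the construction of Eqns. 12–17 concrete, the following is a minimal 2D NumPy sketch under assumed naming (synthesize_force_field, a unit C_w, and a 2D shell factor 2π|w| in place of the 3D 4π|w|²); it is not the dissertation's implementation, which builds 3D fields with GPU FFTs, but it instantiates the same idea and checks the divergence-free property spectrally.

import numpy as np

def synthesize_force_field(n=64, mu=8.0, sigma=2.0, seed=0):
    """Divergence-free random 2D field with a Gaussian energy spectrum."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer wave numbers
    wx, wy = np.meshgrid(k, k, indexing="ij")
    wmag = np.sqrt(wx**2 + wy**2)
    wmag[0, 0] = 1.0                              # avoid division by zero at w = 0

    # Energy spectrum E_w concentrated around |w| = mu (Eqn. 17, C_w = 1),
    # converted to the per-mode amplitude S_w (2D analogue of Eqn. 16).
    E_w = np.exp(-(wmag - mu)**2 / (2.0 * sigma**2))
    S_w = np.sqrt(E_w / (2.0 * np.pi * wmag))

    # Unit vector orthogonal to w (2D counterpart of Eqn. 14) times a random
    # complex amplitude (Eqn. 15 collapses to a single R(w) in 2D).
    alpha = rng.uniform(0.0, 2.0 * np.pi, size=(n, n))
    R = S_w * np.exp(1j * alpha)
    vhat_x = R * (wy / wmag)
    vhat_y = R * (-wx / wmag)
    vhat_x[0, 0] = vhat_y[0, 0] = 0.0             # no mean-flow component

    # Taking the real part of the inverse FFT keeps the field divergence-free,
    # since every retained mode satisfies w . vhat(w) = 0.
    vx = np.real(np.fft.ifft2(vhat_x))
    vy = np.real(np.fft.ifft2(vhat_y))
    return vx, vy

if __name__ == "__main__":
    n = 64
    vx, vy = synthesize_force_field(n=n)
    # Spectral divergence check: i*(wx*vhat_x + wy*vhat_y) should vanish.
    k = np.fft.fftfreq(n, d=1.0 / n)
    wx, wy = np.meshgrid(k, k, indexing="ij")
    div = np.fft.ifft2(1j * (wx * np.fft.fft2(vx) + wy * np.fft.fft2(vy)))
    print("max |divergence| =", np.abs(div).max())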

Multiple Scales A multiple-scale field, with two concentrations as an example here, is computed by

E_w = C_{1w} e^{−(|w|−µ₁)²/(2σ₁²)} + C_{2w} e^{−(|w|−µ₂)²/(2σ₂²)}.   (18)

Fig. 13 shows blended rotational behaviors from the two scales, where large-scale (µ₁) vortices are agitated by the small scale (µ₂). Following K41, which suggests that small-scale vortices hold decreasing kinetic energy with the −5/3 law, we define C₁w, C₂w ∝ |w|^(−5/3) (Fig. 13a). In comparison, we also use an arbitrary C₁w = C₂w not obeying this physical rule (Fig. 13b). Consequently, it shows more small-scale turbulence than Fig. 13a (see the brighter µ₂ = 8 spectrum ring in Fig. 13b compared to Fig. 13a). Note that the −5/3 law in K41 describes a continuous decay in the inertial subrange. Here we use the relation between two discrete scales (within a Gaussian kernel range). Though not physically accurate, it leads to easy and meaningful control of chaotic fluid behavior.
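As a small extension of the sketch above, the two-scale spectrum of Eqn. 18 can be produced by summing two Gaussian bands; the helper names below are ours, and the |w|^(−5/3) weights imitate the K41-style decay between the two octaves (setting kolmogorov=False gives the equalized, non-Kolmogorov variant of Fig. 13b).

import numpy as np

def gaussian_band(wmag, mu, sigma):
    return np.exp(-(wmag - mu)**2 / (2.0 * sigma**2))

def two_scale_spectrum(wmag, mu1=np.sqrt(2.0), mu2=8.0, sigma=0.7, kolmogorov=True):
    # Guard against |w| = 0 before applying the power-law weights.
    wmag = np.maximum(wmag, 1e-6)
    c1 = wmag**(-5.0 / 3.0) if kolmogorov else 1.0
    c2 = wmag**(-5.0 / 3.0) if kolmogorov else 1.0
    return c1 * gaussian_band(wmag, mu1, sigma) + c2 * gaussian_band(wmag, mu2, sigma)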

4.3.3 Computation

A sequence of force fields is independently pre-computed with the spectral method. This separation from the mean flow simulation makes it flexible to control and design turbulent effects in a post-processing stage. This differs from previous methods, which create u′ from U at each step. Due to the force integration, our method does not need to generate the force field at each step. In fact, the generated fields can be reused in a simulation. In our experiments, only 25 pre-computed force fields are randomly chosen, leading to good turbulent results. Furthermore, the same sequence of fields can be repeatedly used in different simulations with different incoming flow fields. Since the Fourier domain operations are trivial, the computational complexity is completely bounded by the inverse Fourier transform. Though fast computation is not required for a pre-computation step, the generation can still be completed very fast with the Fast Fourier Transform in O(n log n) and with GPU acceleration. For example, it costs 450 ms and 275 ms for a 128³ and a 64³ grid on an nVidia 8800 GT GPU, respectively.

4.4 Turbulence Integration

Integration Scheme To integrate the pre-computed f into an existing flow, FNS() executes the following operations at each step:

1. Load the velocity field from an existing mean flow simulation NS() at critical local regions of the whole simulation domain;

2. Apply linear interpolation to generate the high-resolution velocity field U from the mean flow input;

3. Create the initial condition of FNS() as qU + (1 − q)u(t), where u(t) is the instantaneous velocity field from the last simulation step;

4. Run the FNS() fluid solver for one step, with the force coupling from a randomly-selected field f;

5. Use the resultant high-resolution flow field u(t + 1) for density advection and rendering;

6. Go to 1.

[Figure 14 diagram: ST() supplies f and NS() supplies U; the blend qU + (1 − q)u initializes FNS(), whose output u feeds back.]
Figure 14: Data flow of FNS() computation.

Figure 14 shows the data flow of the FNS() computation. At each step, the initial velocity

field consists of two components: one is the output instantaneous field (u) from the last simulation step, and the other (U) is acquired from the mean flow by spatial interpolation. The two components are added as qU + (1 − q)u with a control parameter q, and then modified by solving the NS equations with the infusion from a solenoidal force field f. A large q forces the resultant instantaneous flow to be strictly regulated by U. On the contrary, a smaller q makes the turbulence more significant, diverging from U. Our scheme can be viewed as a feedback control, so that the integration provides natural coupling and control flexibility. In comparison, the previous methods that directly couple synthesized noise with U act as a feed-forward control, which requires the special handling of ST() and ⊕ that we discussed before. Finally, the resultant velocity field contributes in its corresponding regions for fluid rendering.
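A minimal sketch of one FNS() step following the data flow of Fig. 14, with hypothetical names (fns_step, force_fields) and SciPy's linear zoom standing in for the interpolation; the final line is only a placeholder explicit force update, whereas the actual step runs a full forced NS solve (advection, diffusion and pressure projection).

import numpy as np
from scipy.ndimage import zoom

def fns_step(U_coarse, u_prev, force_fields, q, dt, rng):
    # Steps 1-2: interpolate the coarse mean flow U to the fine resolution.
    factors = [fs / cs for fs, cs in zip(u_prev.shape, U_coarse.shape)]
    U = zoom(U_coarse, factors, order=1)
    # Step 3: blended initial condition qU + (1 - q)u(t).
    u_init = q * U + (1.0 - q) * u_prev
    # Step 4: one forced solver step with a randomly chosen pre-computed field f.
    f = force_fields[rng.integers(len(force_fields))]
    return u_init + dt * f          # placeholder for the full forced NS solve

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U_coarse = rng.normal(size=(16, 16))           # one velocity component
    u_prev = np.zeros((64, 64))
    forces = [rng.normal(size=(64, 64)) for _ in range(25)]
    u_next = fns_step(U_coarse, u_prev, forces, q=0.2, dt=0.05, rng=rng)
    print(u_next.shape)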

Note that the interpolation might not be needed if an animator plans to add turbulence to the existing flow without using a high-resolution grid. In this case, f is directly applied for stimulating synthetic turbulence in existing octaves of U.

Besides the Eulerian solver, our method can also be directly applied to Lagrangian fluid solvers. We conduct an experiment with the state-of-the-art SPH methods (Smoothed Particle Hydrodynamics), which have been widely explored in computer graphics due to their programming simplicity, various simulation scales and easy boundary handling [14]. Together with the typical pressure and viscosity forces, we impose f on each particle, manifesting the unresolved turbulent behavior. The force coupling strategy is no different from the Eulerian approach, which is described below. To the best of our knowledge, this is the first time in graphics that turbulence enhancement is applied to a pure Lagrangian solver (note that although it introduces vortex particles, [71] still relies mainly on an Eulerian solver).

Force Coupling A stochastic force f should act in the turbulence simulator with respect to the mean flow properties. We match the force f with U at each location, using graphical assumption B as in previous works (see Sec. 4.1). First, we make f · U ≥ 0 by reversing the direction of f if needed. This guarantees that the randomly created turbulent fluctuation will not reverse U dramatically, which would lead to unnatural flow variation effects. Second, we follow the magnitude relation |f| ∝ |u_f|/δt [89], where δt is the turbulence simulation time step length and u_f is the velocity variation introduced by f. Finally, an amplitude relation between u_f and U should be determined: |u_f| = p_c|U|, where p_c is a coupling parameter. Then we achieve

|f| = |u_f|/δt = p_c|U|/δt.   (19)

p_c can be found by applying the velocity cascade relation among octaves of u_f and U [65]. This approach is feasible but increases complexity. It indeed may not be necessary due to the approximation already produced by the graphical assumptions. p_c can be more conveniently defined as an empirical control parameter. In general, if a function is defined as |f| = Ψ(U), Ψ provides a very good tool to control the turbulence integration, and hence the final fluid effects. Besides Eqn. 19, we describe flexible approaches of Ψ in Sec. 4.5.

4.5 Conditional and Intermittent Turbulence

Forcing acts as a stimulus for inducing chaotic effects rather than adding a resultant turbulent field to the global flow, leading to easy implementation of (1) adaptive turbulence only in necessary spatial areas and temporal periods; and (2) conditional turbulent effects with physical or artificial conditions. We apply turbulence forcing within critical areas in a large domain running a global simulation. The turbulent effects will propagate out of the selected local regions through the motion of scalar densities. This approach is very useful for many applications such as interactive games and emergency training.

Conditional Coupling As discussed in Sec. 4.4, a function Ψ determines the integration conditions of the turbulence. We have defined Eqn. 19, which couples turbulence in the whole effective region based on the velocity magnitude. Here we provide several other choices for different animation purposes:

• Strain rate: At each location r, the local strain rate S(U) = √( Σ_{i,j} ((∂U_i/∂r_j + ∂U_j/∂r_i)/2)² ). We define

|f| = Ψ(U) = p_c |w₁|⁻¹ S(U)/δt,   (20)

so that turbulence is initiated at locations with a large rate of change in U.

Figure 15: Snapshots of turbulence enhancement simulations: (a) Original coarse simulation; (b) Wavelet subgrid turbulence; (c) Our subgrid turbulence; (d) Add vorticity confinement to (a); (e) Wavelet turbulence to (d); (f) Our turbulence to (d) with q = 0.8; (g) Our turbulence to (d) with q = 0.2; (h) Our turbulence to (d) with q = 0.1.

• Distance: Obstacles are prone to introduce a high strain rate, thus causing boundary-induced turbulence. While small obstacles may not be fully accounted for in a coarse grid simulation of U, a fluid animation can define

|f| = Ψ(U) = p_c Ramp(D(r)/D₀) |U|/δt,   (21)

where D(r) is the shortest distance from r to the obstacles and D0 is a cutting length.

Ramp() defines a smoothly decreasing function from one to zero for D(r) < D0, and

otherwise it equals zero. Here we link D(r) to boundary layer effects, based on an

observation that the profile of shear stress, which leads to turbulence, is a decreasing

curve of the distance to the boundary surface [87].

• Vorticity: Similar to the strain rate, turbulence is related to the vorticity by

|f| = Ψ(U) = p_c (|ω|/max(|ω|)) H(|ω| − |ω|₀)/δt,   (22)

where the vorticity ω = ∇ × U, H() is the Heaviside step function, and |ω|₀ is a threshold used to control the effects together with p_c.

• Density: Turbulence can be triggered by a function of the scalar density m of the fluid. A simple formula is

|f| = Ψ(U) = p_c (m/max(m)) H(m − m₀) |U|/δt,   (23)

where H() is the Heaviside function and m₀ is a threshold.

These examples illustrate that our solution supplies a framework incorporating a variety of turbulence starters, from physical features to an animator’s discretion, which can be further improved and extended for controllable and interactive animations.
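As an illustration, two of the conditional couplings above can be written as per-cell force-magnitude functions; this is a sketch under our own naming (coupling_velocity, coupling_density) for Eqns. 19 and 23, not the dissertation's code.

import numpy as np

def coupling_velocity(U, p_c, dt):
    """|f| = p_c |U| / dt  (Eqn. 19); U has shape (dim, nx, ny[, nz])."""
    return p_c * np.linalg.norm(U, axis=0) / dt

def coupling_density(U, m, m0, p_c, dt):
    """|f| = p_c (m / max(m)) H(m - m0) |U| / dt  (Eqn. 23)."""
    gate = np.heaviside(m - m0, 0.0)
    return p_c * (m / m.max()) * gate * np.linalg.norm(U, axis=0) / dt

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U = rng.normal(size=(2, 32, 32))      # a 2D mean-flow field
    m = rng.random((32, 32))              # scalar smoke density
    print(coupling_velocity(U, p_c=0.5, dt=0.1).shape)
    print(coupling_density(U, m, m0=0.7, p_c=0.5, dt=0.1).max())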

Figure 16: Snapshots of integrating turbulence to a laminar smoke: (a) Original fluid; (b) Subgrid scale turbulence; (c) Large scale turbulence; (d) Multiple scale turbulence (K41); (e) Multiple scale turbulence (Arbitrary); (f) Density based turbulence.

Intermittency Turbulent fluids show alternations in time between nearly non-turbulent and chaotic behavior, which challenges K41's hypothesis of universality. Many attempts, including by Kolmogorov himself, have been proposed to explain and solve the problem. It is extremely hard to reproduce intermittency physically by DNS. We instead introduce temporal control in the forcing integration to animate intermittent fluids. The fluid behaves with non-turbulent or turbulent dynamics alternately, with randomly varied time intervals ∆t_turb and ∆t_non, respectively. We initiate turbulence coupling in intervals of ∆t_turb, and otherwise use the large-scale flow only. The two intervals are computed, each time they are needed, as scalar random values ∆t_turb ∈ [0, L_turb] and ∆t_non ∈ [0, L_non], where L_turb and L_non control the maximum interval length in time steps. When ∆t_non = 0, two turbulent periods are concatenated. For the whole animation period, the intermittency factor is γ = Σ(∆t_turb)/Σ(∆t_turb + ∆t_non).
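A small sketch of this intermittent scheduling, with assumed names (intermittency_schedule): turbulent and quiet interval lengths are drawn uniformly from [0, L_turb] and [0, L_non], and γ is recovered as the fraction of turbulence-coupled steps.

import numpy as np

def intermittency_schedule(total_steps, L_turb, L_non, seed=0):
    rng = np.random.default_rng(seed)
    turbulent_on = np.zeros(total_steps, dtype=bool)
    t = 0
    while t < total_steps:
        dt_turb = int(rng.integers(0, L_turb + 1))   # turbulence-coupled steps
        dt_non = int(rng.integers(0, L_non + 1))     # quiet (mean-flow only) steps
        turbulent_on[t:t + dt_turb] = True           # dt_non == 0 concatenates periods
        t += dt_turb + dt_non
        if dt_turb + dt_non == 0:
            t += 1                                   # guard against a zero-length draw
    gamma = turbulent_on.mean()                      # = sum(dt_turb)/sum(dt_turb + dt_non)
    return turbulent_on, gamma

if __name__ == "__main__":
    on, gamma = intermittency_schedule(1000, L_turb=20, L_non=40)
    print("intermittency factor gamma =", round(gamma, 3))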

4.6 Experiments

Exp. 1 First, we describe our method in Fig. 15 in comparison to the successful wavelet turbulence enhancement method [65]. A basic stable solver [5] generates very static smoke effects with a coarse 48 × 64 × 48 simulation (Fig. 15a). Wavelet turbulence adds subgrid turbulence with a 2× finer grid (Fig. 15b). Our method creates similar subgrid turbulence in Fig. 15c with µ = 32√3 and p_c = 0.5. Our solution looks relatively more realistic since the forcing can impose chaotic behavior even for this very regular mean flow, while the wavelet approach's effect is contingent on the original U. We then add vorticity confinement [69] to Fig. 15a, producing large-scale rotational behavior (Fig. 15d). In this case, the wavelet turbulence provides good small-scale turbulence (Fig. 15e). In comparison, our method can introduce small-scale turbulence at different levels with the control parameter q (see Fig. 14). In Fig. 15f, with q = 0.8 mainly supplying the mean flow component to FNS(), the enhanced smoke propagates close to the original shape of Fig. 15d, which is similar to the wavelet result. When we decrease q to 0.2, the turbulence becomes significant in Fig. 15g, since more of the enhanced component feeds back to FNS(), inducing amplification effects. In Fig. 15h, q = 0.1 leads to even stronger turbulent effects. On a PC CPU (Intel Core2 6300 1.86GHz 4GB), our turbulent enhancement runs in 12982 ms per step, and the wavelet method uses 4048 ms per step, respectively. We compare these animations side by side in the supplemental movie to illustrate the dynamic difference.

Exp. 2 Next, we execute our animation using q = 0.2 based on a simulated laminar smoke past a sphere with a very coarse grid at 16 × 32 × 16. To better illustrate our approach, no enhancing techniques (e.g., vorticity confinement) are employed. We use a smoke evolving grid with a resolution 4× denser, at 64 × 128 × 64. Turbulence integration is implemented on a local region surrounding the sphere with a resolution of 38 × 76 × 38. Fig. 16 shows snapshots of the integrated turbulence at different scales: (a) the original laminar result; (b) small turbulent variation with a subgrid scale (µ_sub = 16√3) that approximates the grid scaling factor 1/4 (p_c = 0.3); (c) strong turbulent dynamics with a larger scale (µ_l = µ_sub/3, p_c = 0.5); (d) turbulent behavior accommodating finer details than (c), with two coalesced scales (µ_l and µ_sub) following the −5/3 law (p_c = 0.5); (e) reproducing the dynamics of (d) with an arbitrary spectrum law where the two octaves (µ_l and µ_sub) have an equalized energy spectrum (non-Kolmogorov), which further reduces the effects of the large-scale one. Fig. 16b-e use Eqn. 19 for force integration. Finally, in Fig. 16f, the smoke density (Eqn. 23) is used to trigger turbulence. We use σ = 0.5 for the simulations. In the supplemental movie, we compare the smoke effects in the different configurations. We also include multiscale animations using the vorticity (Eqn. 22)

and strain rate (Eqn. 20) based turbulence integration, where chaotic variations appear around the sphere. On the PC CPU, the experiment uses 48 ms per step for the global simulation and 956 ms per step for the interpolation, integration and forced simulation. The density advection costs 342 ms. In comparison, a direct 64 × 128 × 64 simulation consumes 8715 ms, which is 6.5 times slower, and cannot generate the various turbulent results.

Figure 17: Snapshots of turbulence enhancement conditioned by the distance to obstacles: (a) a laminar smoke simulation; (b) direct turbulence integration to (a) at the same resolution; (c) finer turbulent behavior achieved by executing the simulation on a coarser grid than (a), while coupling turbulence to the interpolated flow.

Exp. 3 Another experiment is performed with three obstacles using q = 0.2, as shown in Fig. 17. Here we utilize the distance-based turbulence enhancement condition (Eqn. 21) to approximate boundary-induced chaos. Fig. 17a is the original simulation result with a resolution of 64 × 32 × 50. Fig. 17b presents a turbulent flow obtained by adding turbulence forces to Fig. 17a without any interpolation or subgrid time steps. We use µ = 12√3, σ = 0.7 and p_c = 0.5. The 64 × 32 × 50 simulation (with or without turbulence) runs in 628 ms per frame on the PC CPU, and the added force does not introduce noticeable computational overhead. In Fig. 17c, we instead run the global simulation on a 2× coarser grid at 32 × 16 × 25 (51 ms per frame), and apply turbulence integration on an interpolated grid at 64 × 32 × 50. It shows finer turbulent features compared with Fig. 17b, using a larger µ = 16√3. With this configuration, we also include intermittent turbulent effects in the supplemental movie using L_turb = 20 and L_non = 40 steps.

(a) Origin (b) Turbulent (c) Origin (d) Turbulent
Figure 18: Snapshots of turbulence enhancement with SPH ((b) and (d)), in comparison with the original simulation ((a) and (c)).

Exp. 4 We also perform SPH-based turbulence enhancement. We use 4096 particles to perform a wave simulation and follow [14] for implementation details. Fig. 18 shows the comparison of two snapshots between the turbulent and original animations. Each particle acquires the turbulence force from a 128³ grid with µ = 20 and p_c = 1. Real-time simulation (52 frames per second) is achieved since our enhancement only needs minimal extra computation for the force addition. Nevertheless, it shows good turbulent dynamics in Fig. 18b and Fig. 18d, compared with the original results in Fig. 18a and Fig. 18c, respectively.

Discussion In the experiments, our method provides good turbulent behavior even with very coarse simulations, showing that it can be a fast and convenient tool for creating turbulent animations. The simulation performance depends entirely on the FNS() solver, which is more complex than a simple vector field addition of small and large scales. However, in comparison to previous methods, this integration approach originates from the RANS equation and provides natural and controllable coupling for turbulent effects. Furthermore, in most cases FNS() can be computed only on local and adaptive regions. Our method is independent of the numerical solver, so high-order and Lagrangian solvers can be employed. It also makes it possible to utilize pre-computed and reusable synthetic fields separated from the original simulations.

CHAPTER 5

Langevin Particles in Flow Simulation

5.1 Introduction

Flow simulation in computer graphics has achieved astounding appearances of various natural phenomena. Meanwhile, graphical fluid solvers are continuously improved to confront the challenges of energy dissipation and limited computational resources, in order to provide abundant turbulent details, which are a determining factor of realistic visual effects. In this chapter, we introduce a new Lagrangian primitive, the Langevin particle, for easily and self-adaptively incorporating essential turbulence into ongoing flow simulations for graphical animations.

In this dissertation, we introduce the Langevin particle for modeling and incorporating essential turbulence into flow simulations. In the physics literature, the particle stochastic Lagrangian model has been applied to model turbulent flows as a PDF (Probability Density Function) approach [87], such as for 2D polydisperse two-phase flows [90]. We develop the method to enhance graphical fluid simulation by introducing randomness with controllable particles and feeding back with Langevin forces. A group of Langevin particles impose agitation forces in a self-adaptive manner to inject turbulence energy into flow simulations. Our approach introduces stochastic turbulent effects naturally with the inherent randomness of particle motion, without relying on external Perlin or wavelet noise. Abandoning the vorticity-based forces, it instead creates a different style of turbulence. Furthermore, it automatically adjusts

the intensity of fluctuation, which works well for different kinds of flows, including strong chaos and placid streams. Meanwhile, the particles do not carry extra attributes such as vorticity, and the algorithm does not solve the PDEs of energy transport. These features make the method easy to implement and result in only minimal extra overhead on top of existing solvers.

In particular, a Langevin particle is so named because it moves obeying the generalized Langevin equation (GLE). This stochastic differential equation (SDE), including a random noise input, is widely used in physics for studying the dynamics of classical and quantum fluids. It describes the trajectory of a minute element inside turbulent flows. The trajectory follows the major flow but meanwhile demonstrates a Brownian-style stochastic oscillation. The modeled particle dynamics has two features: (1) Stochastic motion: the particle follows a path consisting of successive random steps while obeying the average path, which is a statistical Markov process that correctly models the physical anomalous behavior of flow-transported microscopic substances; (2) Turbulence-consistent behavior: the particle's flowing behavior is influenced by the turbulence features of the underlying flow. In placid flow regions, the particle streams in a way similar to passive advection. However, in highly turbulent areas, possibly induced by inlet variation or boundary geometry, the particle manifests chaotic abandonment of the mean flow field. Here the fluid turbulence features are computed from the base flow through the turbulent viscosity hypothesis, which is widely used in modeling turbulent flows for its estimation of small-scale fluctuation effects (e.g., in Large-Eddy Simulation).

The GLE-modeled particle path can be viewed as the combined result of a base flow and the measured turbulence. From this observation, we employ the Langevin particles to enhance the flow simulation, which may lack details due to numerical dissipation, a large simulation scale, inaccurate boundary handling, etc. In detail, we apply a Langevin driving force computed from the velocity alteration of a Langevin particle at time steps along the particle's trajectory. The forces feed the otherwise "lost" turbulence back to the simulator, leading to the necessary disturbance of the simulation results. Our method combines the particle-based energy compensation with a random process implemented in the SDE. By adding random perturbation locally and adaptively at necessary locations and times, it differs markedly from previous approaches, making it a good tool for flow simulation enhancement.

In summary, the new Langevin particle method has several desirable properties, including:

• Stochastic nature: the turbulence captures a realistic spirit with the stochastic dynamics modeled by the celebrated Langevin equation. The method provides distinct results over multiple runs from identical configurations.

• Self-adaptive behavior: particles evoke turbulence with physically necessary strength while they move inside flow fields. In calm regions or highly viscous fluids, a Langevin particle spontaneously reduces or ceases its perturbation of the simulation.

• Easy implementation: the added particle motion and forces are easily programmed on top of basic solvers, and the effects can be controlled with meaningful parameters.

Our main contribution is the development of the Langevin particles with these favorable features for easy flow turbulence enhancement in computer graphics. To the best of our knowledge, it is the first time that a stochastic process modeled by a stochastic differential equation is utilized in improving graphical flow simulation. Next, we describe the stochastic particle dynamics and the Langevin equation. Flow simulation with Langevin particles is presented in Sec. 5.3, and we show multiple examples in Sec. 5.4. In Sec. 5.5, we provide a discussion of the approach as well as possible future extensions.

5.2 Langevin Model

Langevin particle motion is governed by the generalized Langevin equation (GLE), which models a particle's velocity with consecutive random deviations from an underlying mean flow.

In some respects, the particle velocity yields a prediction for flow turbulence evolution. Here, we introduce the stochastic Langevin equation and its generalized version in flow turbulence.

5.2.1 Particle Motion: A Random Process

A minute substance, such as a molecule or dust particle, moves in the air not simply following the flow streamline but in a complicated way (e.g., Brownian motion). The trajectory can be described as a "random walk", a fundamental model for temporal random processes, which is of basic interest in a number of fields from physics to economics [91]. A specially designed random path model adopting the Schlick phase function has been successfully applied for dispersive bubbles in graphics [92]. Our method has a similar spirit to this approach, but we use a more complex physically-based computation in modeling particle trajectories, which provides a direct relation to a turbulence model, leading to easy control of particle behavior. A particle's motion inside fluids can be considered as a statistical Markov process. In the terminology of stochastic processes, it is a diffusion process, in particular an Ornstein-Uhlenbeck process whose PDF (probability density function) evolves according to the Fokker-Planck equation. The diffusion process is

60 represented by a stochastic differential equation [93]:

du(t)= D1[u(t), t]dt + D2[u(t), t]dW (t). (24)

Here, u(t) is the stochastic particle velocity. D1[u(t), t] and D2[u(t), t] are the drift and diffusion coefficients, corresponding to the two major particle activities, drift and diffusion, respectively. W(t) is the fundamental Wiener process, with behavior similar to a pure random Brownian motion.

It captures the inherent random perturbation from molecule collisions and thermal fluctuations.

In computation, dW(t) actually provides a normal (Gaussian) distribution of random variations. Because of the randomness, particle paths deviate with stochastic fluctuation, and repeated executions result in nonidentical motions even with the same configuration.

5.2.2 Generalized Langevin Model

The Langevin equation for the velocity of a particle suspended in a turbulent fluid flow is developed based on Eqn. 24:

du(t) = −(3/4) C₀ (ε/k) u(t) dt + √(C₀ε) dW(t).   (25)

The drift and diffusion coefficients are made concrete and linked with parameters (ε, k, C₀) that measure flow turbulence. We postpone the discussion of their definitions until a little later.

The basic Langevin Eqn. 25, however, models only stationary isotropic chaos. In computer graphics applications, our aim is to incorporate disturbance into dynamically evolving fluid

flows. Pope [94] developed the generalized Langevin equation (GLE) to describe the particle behavior in turbulent flows, which is

du(t) = −(1/ρ)(∂⟨P⟩/∂r) dt − (1/2 + (3/4)C₀)(ε/k)(u(t) − ⟨u(t)⟩) dt + √(C₀ε) dW(t),   (26)

where ρ is the fluid density at location r, ⟨P⟩ is the pressure, and ⟨·⟩ represents the stochastic averaging. In this case the diffusion coefficient is modeled by the parameter √(C₀ε), determined by the turbulence energy. The drift is modeled by a relaxation of the velocity u(t) toward the local mean velocity ⟨u(t)⟩, controlled by (1/2 + (3/4)C₀)(ε/k). Here, the equation of motion for a particle also includes a mean force due to the action of the mean pressure gradient on the particle, represented by −(1/ρ)(∂⟨P⟩/∂r).

5.2.3 Flow Turbulence

In the Langevin model, the particle behavior is controlled by flow turbulence properties. In turbulent flows, the instantaneous velocity field u can be divided, with the Reynolds decomposition, into a mean flow velocity ⟨u⟩ and a rapidly fluctuating component u′. Then the Reynolds-Averaged NS (RANS) equation is used to model turbulent flows:

∂⟨u⟩/∂t + div(⟨u⟩⟨u⟩) = −∇P + ν∇²⟨u⟩ − div(⟨u′u′⟩),   (27)

with an additional Reynolds stress, ⟨u′u′⟩, describing the underlying fluctuation. In a turbulence energy model, k represents the turbulent kinetic energy, k = ½⟨u′ · u′⟩, which is half the trace of the Reynolds stress tensor. ε is the rate of dissipation of turbulent kinetic energy.

The Reynolds stress, and hence k and ε, are physically unknown quantities that contain the effects of the subgrid (i.e., unsimulated) scales on the flow. In physics, they can be approximated by different heuristic models such as turbulent viscosity.

62 5.2.4 Computational Scheme

A base numerical simulation is viewed as providing the mean flow field ⟨u⟩. We allow a particle to move obeying Eqn. 26 as if it were inside a turbulent flow. Therefore, we need to evaluate the turbulence, in particular through ε and k, from the mean field. The turbulent viscosity hypothesis is applied, which treats the turbulent stress like a viscous stress with a turbulent viscosity. This viscosity can be modeled by the mixing length model from the rotation-rate tensor w_ij [95]:

ν_t = l_m² ‖w_ij‖,   (28)

w_ij = ½ ( ∂⟨u_i⟩/∂r_j − ∂⟨u_j⟩/∂r_i ).   (29)

Here ‖·‖ is the tensor norm, r is the location, and i, j are coordinate indices. l_m is the characteristic length scale of the flow. Then, we compute the dissipation rate of turbulence, ε, from the norm of the strain rate of the mean flow, S(⟨u⟩), as

ε = ν_t S²(⟨u⟩),   (30)

S²(⟨u⟩) = Σ_{i,j} ( ½ ( ∂⟨u_i⟩/∂r_j + ∂⟨u_j⟩/∂r_i ) )².   (31)

In Eqn. 26, the Langevin particles capture the full turbulence energy transferred (dissipated) from the simulation scale through ε, which is computed here by the turbulence viscosity and strain of flow. It defines the intensity of random deviation for particles.

After obtaining ν_t and ε, we compute k based on another formula for the turbulent viscosity:

k² = ν_t ε / C_µ.   (32)

In these computational equations, two constants are used: C0 =2.1 and Cµ =0.09 [87].
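The following is a minimal NumPy sketch of Eqns. 28–32 on a gridded 2D mean-flow field; the function name and the use of np.gradient for the finite-difference derivatives are our assumptions, not the dissertation's implementation.

import numpy as np

C0, C_MU = 2.1, 0.09

def turbulence_quantities(u_mean, l_m, dx=1.0):
    """u_mean: array of shape (dim, n, n, ...) holding the mean velocity <u>."""
    dim = u_mean.shape[0]
    # grad[i][j] = d<u_i>/dr_j
    grad = [np.gradient(u_mean[i], dx) for i in range(dim)]
    w_sq = np.zeros_like(u_mean[0])
    s_sq = np.zeros_like(u_mean[0])
    for i in range(dim):
        for j in range(dim):
            w_ij = 0.5 * (grad[i][j] - grad[j][i])   # Eqn. 29
            s_ij = 0.5 * (grad[i][j] + grad[j][i])
            w_sq += w_ij**2
            s_sq += s_ij**2                          # Eqn. 31 summand
    nu_t = l_m**2 * np.sqrt(w_sq)                    # Eqn. 28 (tensor norm)
    eps = nu_t * s_sq                                # Eqn. 30: eps = nu_t * S^2
    k = np.sqrt(nu_t * eps / C_MU)                   # Eqn. 32
    return nu_t, eps, k

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    u_mean = rng.normal(size=(2, 32, 32))
    nu_t, eps, k = turbulence_quantities(u_mean, l_m=0.001)
    print(float(eps.mean()), float(k.mean()))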

Finally, the stochastic differential equation Eqn. 26 is discretized with a finite-difference scheme.

The particle velocity is eventually updated as

u(t + 1) − u(t) = −(1/ρ)(∂⟨P⟩/∂r) ∆t − (1/2 + (3/4)C₀)(ε/k)(u(t) − ⟨u(t)⟩) ∆t + √(C₀ε∆t) ξ(t).   (33)

Here ξ(t) is a normally (Gaussian) distributed random variable with mean zero and deviation one, derived from dW(t) ∼ Norm(0, 1). It satisfies ⟨ξ(t)ξ(τ)⟩ = δ(t − τ), where δ() is the Dirac function, making the process a Markov chain. In computation, we utilize the polar form of the Box-Muller transformation to generate random numbers with a normal distribution and construct the needed vector-valued variables from them. Interested readers may refer to [96] for mathematical details.
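A sketch of the discretized update of Eqn. 33 for one particle, with assumed names (langevin_update, grad_p) and NumPy's Gaussian generator substituted for the polar Box-Muller transform mentioned above; ε, k and the pressure gradient are assumed to be interpolated to the particle position beforehand.

import numpy as np

C0 = 2.1

def langevin_update(u, u_mean, grad_p, rho, eps, k, dt, rng):
    """Return the target particle velocity u(t+1) of Eqn. 33."""
    xi = rng.standard_normal(u.shape)                # Norm(0, 1) vector
    drift_p = -(1.0 / rho) * grad_p * dt             # mean pressure-gradient term
    relax = -(0.5 + 0.75 * C0) * (eps / k) * (u - u_mean) * dt
    noise = np.sqrt(C0 * eps * dt) * xi
    return u + drift_p + relax + noise

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    u = np.array([0.2, 0.0, 0.0])                    # particle velocity
    u_mean = np.array([0.5, 0.0, 0.0])               # interpolated mean flow
    grad_p = np.zeros(3)
    print(langevin_update(u, u_mean, grad_p, rho=1.0, eps=0.1, k=0.5, dt=0.05, rng=rng))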

We have made a fairly brief introduction to the Langevin dynamics of particles in turbulent flows. As we focus on the computational scheme, interested readers may refer to two good books for a complete mathematical and physical description [87, 97].

5.3 Langevin Particles in Flow Simulation

Eqn. 33 provides the method of updating one Langevin particle’s velocity. Next, the particles are incorporated into a running flow simulation. The integration is a two-fold problem: one is the particle moving inside the simulated flow; and the other is the particle’s feedback to the simulation for turbulence agitation.

As turbulence enhancement seeks to recover detailed fluctuations, the simulated velocities are considered as the mean flow of an ideal chaotic fluid, i.e., ⟨u⟩ in Eqn. 33. A Langevin-computed velocity tends to relax, from the random perturbation, towards this Eulerian mean velocity. Consequently, zero perturbation (i.e., ε = 0) approximates the passive advection of the particle, except for the pressure gradient action on the particle. Meanwhile, a significant turbulent disturbance (i.e., large ε) will lead to a strong deviation through the random term, √(C₀ε∆t) ξ(t), in the GLE Eqn. 33.

Therefore, the preferred turbulence effects are captured and modeled through the Langevin particle, which is then used to supply agitating forces to the simulation. We name such a force the Langevin force.

Figure 19: Computing the Langevin force F(t). u(t) is the particle velocity, ⟨u(t)⟩ is the mean flow velocity, and u(t + 1) is the particle's target velocity at the following time step t + 1 computed by Eqn. 33.

5.3.1 Langevin Force

Fig. 19 illustrates our algorithm for generating the Langevin force at a particular position, P, along one Langevin particle's path. At a time step t, the blue dashed line shows the local velocity that defines ⟨u⟩ at P. The black dashed line is the particle trace; at this moment, the particle velocity is u(t). Meanwhile, the simulated flow is used to compute k and ε, which evaluate the turbulence. From these values, Eqn. 33 computes the particle's target velocity

u(t + 1) at the following time step t + 1. Next, we compute the Langevin force as

F(t) = α (u(t + 1) − u(t)) / ∆t,   (34)

with a constant α compatible with ∆t for a stable simulation. The force represents our intended fluctuation that ideally should cause the momentum change of this particle, and it is then fed back to the simulation to influence the flow. In Fig. 19, the red dashed vector depicts this computation and the red solid vector is the force F(t) at P. In practice, the force feedback is applied at the neighboring grid sites around P, modulated by a Gaussian kernel with a unit radius.
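A sketch of Eqn. 34 together with one plausible reading of the Gaussian-kernel feedback to neighboring grid sites; langevin_force, splat_force and the exact kernel support are our assumptions, not the dissertation's code.

import numpy as np

def langevin_force(u_target, u_particle, dt, alpha=0.01):
    """F(t) = alpha * (u(t+1) - u(t)) / dt  (Eqn. 34)."""
    return alpha * (u_target - u_particle) / dt

def splat_force(force_grid, pos, F, radius=1.0):
    """Add F to grid cells near pos, weighted by a Gaussian of unit radius."""
    lo = np.maximum(np.floor(pos - radius).astype(int), 0)
    hi = np.minimum(np.ceil(pos + radius).astype(int) + 1,
                    np.array(force_grid.shape[1:]))
    for idx in np.ndindex(*(hi - lo)):
        cell = lo + np.array(idx)
        w = np.exp(-np.sum((cell - pos) ** 2) / (2.0 * radius ** 2))
        force_grid[(slice(None),) + tuple(cell)] += w * F

if __name__ == "__main__":
    grid = np.zeros((3, 16, 16, 16))                  # vector force field
    F = langevin_force(np.array([0.6, 0.0, 0.0]),
                       np.array([0.2, 0.0, 0.0]), dt=0.05)
    splat_force(grid, pos=np.array([7.3, 8.1, 7.9]), F=F)
    print(grid[0].sum())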

5.3.2 Particle Evolution

In our method, the turbulence enhancement is driven by a Langevin particle's own initiative, adapted to the simulated flow. The perturbation, measured in Eqn. 33 by the stochastic term √(C₀ε∆t) ξ(t), is contingent on the flow feature ε. Particles can be initiated at regions with large ε. More importantly, when particles are seeded in or move to non-turbulent areas, ε will have a very small value, and hence the random agitation will self-adapt to a negligible level. Thus we can leave the evolution of particles free of special manipulation after initialization. This feature is very useful for animators, since previous approaches usually require manual work or particularly designed heuristic rules to manage particles during their evolution.

Initialization A general implementation is that we seed a group of particles in the whole domain according to the distribution of ε, which can be realized through a Monte Carlo sampling process. That is, particles have a large probability of being sampled at locations with large ε. However, the magnitude of ε is numerically distributed over a very large range, which makes it hard to uniformly sample particles so that they appear at all the desired locations. For example, ε may have a larger value at inlets than in most areas around objects. However, we may want to sample particles around objects in areas with a local maximum of ε. Therefore, the sampling can also be performed at necessary regions, such as around obstacles, at inlets, or at the animator's choice. In practice, the particle sampling is performed continuously during simulation steps. That is, every s steps we sample m particles in necessary regions. s is set flexibly from 1 to a slightly larger value to control the number of particles and hence the added forces in the system. A small value of m produces good turbulence results in our experiments (see Sec. 5.4). In a simulation, the active particles in the whole system usually number a few hundred.
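A sketch of the Monte Carlo seeding just described, under assumed naming (seed_particles): positions are drawn with probability proportional to ε, optionally restricted to a region mask (e.g. around obstacles or inlets).

import numpy as np

def seed_particles(eps, m, region_mask=None, seed=0):
    rng = np.random.default_rng(seed)
    weights = eps.copy()
    if region_mask is not None:
        weights = weights * region_mask             # e.g. around obstacles/inlets
    p = weights.ravel()
    if p.sum() == 0.0:
        return np.empty((0, eps.ndim), dtype=int)
    p = p / p.sum()
    flat_idx = rng.choice(p.size, size=m, p=p)
    return np.stack(np.unravel_index(flat_idx, eps.shape), axis=-1)

if __name__ == "__main__":
    eps = np.random.default_rng(4).random((32, 32, 32)) ** 4   # skewed, like real eps
    positions = seed_particles(eps, m=2)
    print(positions)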

Moving At each step, an active particle moves according to the flow velocity. Here, the velocity u(t +1) computed by Eqn. 33 should not be used, since it represents the target velocity that is particularly used in assessing the force by Eqn. 34. The target velocity is the “ideal” velocity of a particle inside a real turbulent flow, and the force is the way to propel the simulated flow towards the real flow. The force has to take effect through the NS solver, whose force-added simulation result eventually advects the particle at next step. Moreover, a particle is removed from the particle list when it stops in stationary flow regions or moves out of the domain.

Boundary A Langevin particle handles boundaries in a very easy way. It may be bounced back in a symmetric direction according to an obstacle’s surface normal. In fact, we found that it can be simply eliminated since the newly sampled particles will continue the turbulence enhancement task.

5.3.3 Turbulence Control

The generated turbulent effects are controlled through the parameter l_m in Eqn. 28. Physically, it defines the length scale in a mixing length model of turbulent flow. For our purpose, the length l_m can be specified at will to control the introduced turbulence. Increasing l_m leads to a larger turbulence viscosity ν_t and thus stronger turmoil. l_m can be set as a variable with respect to spatial location, since in physics the characteristic length is not uniform over the whole domain but defined by the geometric configuration.

Meanwhile, the Langevin force magnitude is controlled by α, which is set as α = 0.01 in our experiments. A large value introduces large forces but may easily lead to unnatural flow turmoil. In practice, l_m is a more convenient parameter. Another controllable parameter is the number of particles. Our method needs only a small number of Langevin particles to achieve the desired turbulence, which requires minimal management overhead.

5.3.4 Simulation Procedure

Finally, we specify the pseudo-code of a complete simulation step with Langevin particles in Algorithm 1. Here, the grid-based solver simulates velocities at grid sites, and a linear interpolation is used when necessary for computing attributes at the position P.

5.4 Results

We perform several experiments on a workstation with an Intel Xeon 2.53 GHz CPU and 12 GB of memory. The supplemental movies show the animations of these experiments. We adopt the basic stable solver [5] for the base simulation, not using any advanced improvement techniques, in order to clearly depict our enhancement method.

Figure 20: Snapshots of integrating turbulence to a rising smoke flow with two different turbulence levels controlled by the characteristic length scale l_m: (a) Original flow; (b) Vorticity confinement; (c) Random forcing; (d) l_m = 0.001; (e) l_m = 0.003.

Algorithm 1 A complete simulation step with Langevin particles
 1: Incorporate external forces and Langevin forces
 2: Advection with Semi-Lagrangian
 3: Pressure projection with Poisson solver
 4:
 5: Advect Langevin particles in particle list with the flow
 6: Seed Langevin particles if needed and add to the list
 7:
 8: for each particle LP in the list do
 9:   Read position P and velocity u of LP from the list
10:   Load base flow velocity ⟨u⟩ and pressure gradient
11:   Compute ν_t with Eqn. 28 and S with Eqn. 31 at P
12:   Compute ε with Eqn. 30 and k with Eqn. 32 at P
13:   Compute target particle velocity u_target with Eqn. 33
14:   Compute Langevin force F with Eqn. 34
15:   Feed back F to the simulation
16:   Remove LP if no longer active or out of domain
17: end for
18:
19: Render simulation data

First, we use Langevin particles on a rising smoke flow without internal obstacles, using a grid of 40 × 96 × 40. Fig. 20 shows snapshots of the simulation results with two different turbulence levels. Every two steps, we add only one

Langevin particle at the smoke inlet with a simple Monte Carlo sampling according to ε. That is, a location is given a probability value according to ε, which is compared with a random number to sample a particle. With l_m = 0.001, the original laminar flow is enhanced to a more realistic turbulent smoke (Fig. 20d). Using a larger l_m = 0.003, the turbulence becomes stronger, with intensified fluctuation along the rising path (Fig. 20e). In comparison, Fig. 20b is the result using vorticity confinement [69], which shows relatively regular patterns close to the inlet without introducing randomness; due to the accumulated vortical forces, more small-scale eddies are exhibited towards the top. Fig. 20c utilizes our random forcing method [98], which generates turbulent fluctuation, but the forces are applied everywhere, which is hard to

Figure 21: Snapshots of integrating turbulence into a smoke simulation with a diminishing wind. (a) Original smoke; (b) Langevin particles; (c) Vortex particles. Left: before the wind stops; Right: after the wind stops.

Fig. 21 shows snapshots of integrating turbulence into another smoke simulation, in which a wind blowing from the bottom stops while the smoke is rising. Fig. 21a shows the original simulation with a grid resolution of 50 × 140 × 50. To add turbulence to the originally placid flow, we add two Langevin particles to the simulation every two steps while the wind is blowing, as shown in Fig. 21b (left). When the wind diminishes, the smoke continues rising, but the turbulence becomes quiet due to the loss of agitation energy. Our Langevin particles detect the decreased turbulence energy and model the quiet ascending smoke with their self-adapted fluctuating effects, depicted in Fig. 21b (right).

In comparison, we apply the vortex particles method to the same simulation. Our approach has a comparable efficiency to this method, since both perform a fast computation on a small number of particles. Vortex particles apply vorticity forces to the simulation, as in Fig. 21c (left). Their enhancement of the flow shows more scattered smoke rotations, since the vorticity-based forces agitate the flow without directly respecting the mean flow path. When the wind ceases, vortex particles may continue to swirl the smoke (Fig. 21c (right)). The added vortical energy can disperse the smoke further and make it hardly rise at all, inconsistent with the original simulation. Moreover, the vortex particles are only randomly sampled at initialization and do not introduce stochastic behavior during their motion. On the other hand, vortex particles utilize vortical forces to keep and enhance spatial vortex evolution. The Langevin forces induce random agitation for turbulence, but may affect the vortex structure and evolution, with two drawbacks: (1) nice vortex motions are not directly produced by the diffusive particle motion; (2) no spatial coherence between the particles is exploited. A future direction of our extension is to combine the two approaches for even better results.

Figure 22: Snapshots of turbulence enhancement of a smoke flow past obstacles with two different turbulence levels. (a) Original smoke; (b) lm = 0.001; (c) lm = 0.003.

In Fig. 22, several snapshots illustrate the turbulent effects for a smoke simulation with obstacles. We use a 160 × 80 × 80 simulation grid (Fig. 22a). Every ten simulation steps, we insert three particles at the inlet, five particles around the sphere, and five particles around the cylinder. These Langevin particles add the necessary chaotic dynamics to the original smooth smoke, which suffers from energy damping. We use lm = 0.001 and lm = 0.003 for two different levels of turbulence, as illustrated in Fig. 22b and Fig. 22c.

We also run an experiment of a flow over a table top, where the smoke develops turbulence to the left of the table (Fig. 23). This is a special example used in [67, 71] to illustrate that correct turbulence enhancement does not induce chaos above the table top, thanks to the use of a turbulence model. Our method successfully creates similar results. Here, we use a grid of 128 × 30 × 30 that creates a static flow (Fig. 23a). By initiating ten Langevin particles randomly at the smoke inlet every two steps, Fig. 23b depicts the introduced chaotic smoke. Here, the lm value on the table top is set differently from the lm value of the remaining space, reflecting the physical definition of the characteristic length that is typically used in physics to model flow behavior. This shows that the turbulence model provides a convenient and physically meaningful way to achieve the desired effects. Increasing the particle number to thirty, the result in Fig. 23c shows stronger turbulence with more details.

Figure 23: Snapshots of turbulence enhancement of a flow over a table top. (a) Original smoke; (b) Turbulence enhancement; (c) Enhancement with more particles.

As shown in Table 2, for the grid of 40 × 96 × 40 in Fig. 20, the stable solver runs 390 milliseconds (ms) per step on average, while the extra cost of the Langevin particles is 21 ms per step, including all the computation from line 5 to line 17 in Algorithm 1. The Langevin computing adds only around 5% extra overhead. For the diminishing-wind simulation with the 50 × 140 × 50 grid, the solver uses 951 ms and the particles use 47 ms per step, respectively. For a larger 160 × 80 × 80 grid, the performance of the Fig. 22 simulation is 4515 ms per step for the solver and 117 ms per step for the extra cost; here the extra overhead is about 3% of the simulation. Finally, a smaller 128 × 30 × 30 grid (Fig. 23) uses 357 ms per step for the original solver and 11 ms for the particles. In general, our method has a minimal cost in comparison to the fluid solver, since we only maintain a small number of Langevin particles and the GLE computation runs very fast. The computation of the particle dynamics and force feedback can also be parallelized; however, the performance is measured without parallel acceleration, since the computing speed is mainly determined by the non-parallelized implicit fluid solver.

Example    Grid Resolution    Ave. Time Per Step of Base Simulation    Ave. Time Per Step of Langevin Particles    Langevin Computing Over Base Solver
Fig. 20    40 × 96 × 40       390 ms                                   21 ms                                       5.3%
Fig. 21    50 × 140 × 50      951 ms                                   47 ms                                       4.9%
Fig. 22    160 × 80 × 80      4515 ms                                  117 ms                                      2.7%
Fig. 23    128 × 30 × 30      357 ms                                   11 ms                                       3.1%
Table 2: Performance report.

5.5 Discussion

Turbulence model: The rationale of our method is to re-inject the “missing” turbulence energy, lost to numerical damping or the coarse simulation scale, back into the simulation. Instead of solving the PDEs of a complete k−ε model [87], ε in the GLE is considered as the full energy transferred from the simulated scale (i.e., the turbulence energy production P = ε in the k−ε model), and thus can be computed by Eqn. 30. This treatment implies that a Langevin particle does not need to carry turbulence energy produced at one location to other locations along its path.

Thus, our method relies tightly on the base simulation, and the turbulent results follow the mean flow closely. This trend can be observed by comparing Fig. 22a and Fig. 22c, or Fig. 21b and Fig. 21a. Researchers have endeavored to achieve such flow behavior, which follows designed paths or low-resolution simulation results, with dedicated optimization methods [63, 99].

In comparison, in approaches (e.g., [66, 67]) that solve a full k−ε PDE system, the produced turbulence can be transported along the flow and accumulated with newly created turbulence energy. This feature tends to intensify the noise-coupled turbulence effects, which are preferred in many applications. However, the results sometimes tend to deviate substantially from the base flow, and the scheme cannot easily handle cases where turbulence should be reduced in non-chaotic regions or temporal periods, as stated in [67]. Indeed, the Langevin particle can also be used to carry turbulence attributes and be combined with such a full k−ε model. In the future, we will study a more operational turbulence-modeling framework that lets animators control the turbulence level and its agreement with the base flow in a more intuitive and quantitative way.

Stochastic noise: Our method applies randomness inside the GLE as a stochastic process, which is incorporated locally into the simulation. This is different from existing methods that employ externally created noise fields. Our approach avoids their required treatment for temporal consistency during noise coupling, e.g., using the Jacobian of texture coordinates to handle deformation [65] or using special guiding particles [67].

An advantage of the Langevin method is that it imposes adaptive stochastic agitation on-the-fly with a small group of particles. In comparison, our previous random forcing method [98] pre-generates independent noise fields without knowledge of the underlying base flow. This drawback sometimes creates obvious visual artifacts, since the added random force may adversely drive the flow towards an unnatural direction. In Pfaff et al. [67], a local anisotropic coupling of pre-created noise fields makes the results more visually consistent; however, a very large number of particles is needed for this method to integrate the global noise fields.

Hybrid grid-particle: Our method employs the Langevin particles in addition to a grid-based simulation. We advance beyond previous hybrid methods, e.g., vortex particles, by incorporating adaptive stochastic turbulence. Our method can be viewed as a combination of the noise-based and particle-based enhancement strategies; however, the enhanced dynamics still relies on the grid simulation for turbulence production and the NS solution. We will further apply this scheme to a fully Lagrangian approach, and also explore its use in multi-phase and free-surface problems.

Non-vorticity force: Most force feedback methods employ vorticity-based forces. This approach sometimes adversely diverts the flow from its path with its rotational forcing. In our method, the GLE models an enforced relaxation towards the mean path, so that the enhanced flow follows the major route, as shown in Fig. 21. Meanwhile, a Langevin particle imposes forces that also change the turbulence parameters around itself and may introduce a loop effect of adding forces. In our tests, this loop effect is not serious, because the directions of the added forces are randomly distributed; in comparison, vorticity confinement tends to accumulate additional forces in the rotational direction.

Particle management: The Langevin particles perform turbulence agitation in a self-adaptive manner, which alleviates the management cost to some extent. Nevertheless, the seeding locations, the number of particles, and their removal are manually controlled to produce turbulence effects. The particles need to be seeded in large-ε areas to catch and represent possible turbulence, and the parameter α has to be carefully selected to avoid excessive energy injection. The management of these factors may potentially introduce visible artifacts in the results. The method also requires some effort to maintain the active particle list, which is typical for a Lagrangian approach.

CHAPTER 6

Using GPU in Fluid Modeling

In fluid modeling, the computation can be very fast at low resolution. At high resolution, however, it becomes very time-consuming, which limits its use in computer graphics applications. As GPUs have developed from shader-only programmability to general-purpose computation, more and more applications can be ported from the CPU to the GPU for higher performance. In this dissertation, in order to achieve fast fluid modeling, we accelerate the LBM solver, the FTLE field computation, and fluid decompression.

6.1 GPU computation with CUDA

The GPU is a dedicated graphics rendering device, and the computational power of GPUs grows very fast. With the development of many-core GPUs, many applications can be accelerated on this parallel platform with its large number of cores. The challenge is how to develop those applications on the GPU and utilize its programmability. The Compute Unified Device Architecture (CUDA) [1], invented by Nvidia, provides such a software environment, requiring little extra effort from programmers who know standard languages such as C.

Although the computational ability of multicore CPUs also increases rapidly, the GPU is designed specifically for highly parallel applications. Compared with the CPU, the GPU devotes more transistors to multi-data processing. As shown in Fig. 24 [1], the GPU spends fewer transistors on flow control and data caching per computation unit, but its large number of computation units is well suited to the single-instruction multiple-data (SIMD) model.

Figure 24: The GPU Devotes More Transistors to Data Processing [1].

When programming with CUDA, a kernel, which executes a large number of threads in parallel on the GPU (the device), is called from the CPU (the host). These threads are organized in two levels: grid and block (Fig. 25 [1]). Each grid is divided into blocks, and each block contains threads. Grids and blocks have different memory access restrictions.

Thread Block: Each thread block contains a batch of threads, and all the threads in one block can be synchronized. Each thread is identified by its thread ID, which is only meaningful within its thread block. A block can be laid out as a one-, two-, or three-dimensional array; Fig. 25 shows an example of two-dimensional blocks. Each thread has its own local memory and registers, accessible only by itself, while each thread block has its own shared memory accessible by all the threads inside the block. The maximum number of threads contained in one block is 512.

Grid of Thread Blocks: Each grid contains thread blocks, and each thread block inside one grid is identified by its block ID. The maximum number of blocks contained in one grid is 512, and the maximum number of threads contained in one grid is 512 × 512. If the amount of data exceeds this number, more than one grid can be used. A grid is specified by calling a kernel. However, threads from different blocks cannot communicate or synchronize with each other. Different blocks inside one grid can operate in parallel, which lets the hardware be used efficiently.

Figure 25: GPU programming model [1].

Figure 26: 2D and 3D LBM lattices [2]. (a) D2Q9 lattice; (b) D3Q19 lattice.

6.2 LBM Simulation

6.2.1 Introduction

The Lattice Boltzmann Method explicitly solves the Navier-Stokes equations on a lattice. Each cell of the 2D or 3D lattice only requires property information, such as velocity and density, from its neighbors. The structure of an LBM lattice is usually denoted DαQβ, where α is the dimension and β is the number of neighbor links. Fig. 26 shows typical lattice structures for 2D and 3D. The D2Q9 structure in Fig. 26(a) indicates that each cell has 9 links with its neighbors, including the link to itself. Fig. 26(b) shows the D3Q19 example, in which each cell has 19 links with its neighbors, including the link to itself. Each link has a velocity vector ei and a distribution fi(x, t), where x is the position, t is the time, and i is the index of the neighbor link. Then, for each cell, the macroscopic fluid density ρ and velocity u can be computed as follows:

\rho = \sum_i f_i,    (35)

u = \frac{1}{\rho} \sum_i f_i e_i.    (36)

The discrete form of the Lattice Boltzmann Method is divided into two steps: collision and streaming. Eqn. 37 gives the collision step, while Eqn. 38 is the streaming step. These two steps can be merged into one collision-streaming step, resulting in Eqn. 39.

f_i(r, t^*) = f_i(r, t) - \frac{1}{\tau}\left(f_i(r, t) - f_i^{eq}(\rho, u)\right),    (37)

f_i(r + e_i, t + 1) = f_i(r, t^*),    (38)

f_i(r + e_i, t + 1) = f_i(r, t) - \frac{1}{\tau}\left(f_i(r, t) - f_i^{eq}(\rho, u)\right).    (39)

f_i^{eq} is the local equilibrium distribution defined by the Bhatnagar, Gross, Krook (BGK) model [100] as follows:

f_i^{eq}(\rho, u) = m_i \rho \left(1 + 3(e_i \cdot u) + \frac{9}{2}(e_i \cdot u)^2 - \frac{3}{2}u^2\right),    (40)

with
m_i = 1/3 for i = 0,
m_i = 1/18 for i = 1 ... 6,
m_i = 1/36 for i = 7 ... 18,

where m_i is defined according to the lattice structure. The equilibrium equation redistributes the momentum and locally guarantees mass and momentum conservation in an equilibrium state. The relaxation time \tau in Eqn. 37 is a constant; it determines the fluid viscosity \nu as

\nu = \frac{2\tau - 1}{6}.    (41)

Since τ is the only parameter controlling the fluid behavior, this method is called Single-relaxation-time LBM (SRTLBM). In this model, the collision depends on only one relaxation time.
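For illustration, a minimal sketch of one SRT collision-streaming update (Eqns. 35-40) for a D3Q19 cell is given below. The velocity-set ordering, the array layout (19 planes of size nx·ny·nz), the periodic wrap-around, and the kernel name are assumptions of this sketch; boundary and obstacle handling is omitted.

    #include <cuda_runtime.h>

    // D3Q19 link velocities and weights, ordered as in Eqn. 40: link 0 is the rest
    // link, links 1..6 are the axis neighbors, links 7..18 are the edge neighbors.
    __constant__ int   ex[19] = { 0,  1,-1, 0, 0, 0, 0,  1,-1, 1,-1, 1,-1, 1,-1, 0, 0, 0, 0 };
    __constant__ int   ey[19] = { 0,  0, 0, 1,-1, 0, 0,  1,-1,-1, 1, 0, 0, 0, 0, 1,-1, 1,-1 };
    __constant__ int   ez[19] = { 0,  0, 0, 0, 0, 1,-1,  0, 0, 0, 0, 1,-1,-1, 1, 1,-1,-1, 1 };
    __constant__ float w[19]  = { 1.f/3,
                                  1.f/18,1.f/18,1.f/18,1.f/18,1.f/18,1.f/18,
                                  1.f/36,1.f/36,1.f/36,1.f/36,1.f/36,1.f/36,
                                  1.f/36,1.f/36,1.f/36,1.f/36,1.f/36,1.f/36 };

    // One thread updates one cell: compute rho and u (Eqns. 35-36), relax towards the
    // BGK equilibrium (Eqns. 37, 40), and stream the result to the neighbor cells
    // (Eqn. 38) with periodic wrap-around for simplicity.
    __global__ void srtCollideStream(const float* fIn, float* fOut,
                                     int nx, int ny, int nz, float tau)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        int n = nx * ny * nz;
        if (idx >= n) return;
        int x = idx % nx, y = (idx / nx) % ny, z = idx / (nx * ny);

        float f[19], rho = 0.f, ux = 0.f, uy = 0.f, uz = 0.f;
        for (int i = 0; i < 19; ++i) {
            f[i] = fIn[i * n + idx];          // distributions stored as 19 planes of size n
            rho += f[i];
            ux  += f[i] * ex[i];
            uy  += f[i] * ey[i];
            uz  += f[i] * ez[i];
        }
        ux /= rho; uy /= rho; uz /= rho;      // assumes rho > 0
        float usq = ux * ux + uy * uy + uz * uz;

        for (int i = 0; i < 19; ++i) {
            float eu  = ex[i] * ux + ey[i] * uy + ez[i] * uz;
            float feq = w[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * usq);  // Eqn. 40
            float fPost = f[i] - (f[i] - feq) / tau;                                  // Eqn. 37

            int xn = (x + ex[i] + nx) % nx;   // Eqn. 38: stream to the neighbor cell
            int yn = (y + ey[i] + ny) % ny;
            int zn = (z + ez[i] + nz) % nz;
            fOut[i * n + (zn * ny + yn) * nx + xn] = fPost;
        }
    }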

Based on SRTLBM, the Multiple-relaxation-time LBM (MRTLBM) [101] allows independent relaxation behavior for each moment. Compared with SRTLBM, MRTLBM achieves better numerical stability. After the relaxation in moment space, an inverse transform is applied to map back from the space of moments to the space of distributions. The transformations between the two spaces are denoted as follows:

m = M f,    (42)

f = M^{-1} m,

f = (f_0, f_1, \ldots, f_n)^T,

m = (m_0, m_1, \ldots, m_n)^T,

where f and m are vectors, T represents the matrix transpose, and n is the number of distributions; for example, in a D3Q19 model n is 19. M is a constant matrix determined by the lattice structure. The collision and streaming equations then become

f(r, t^*) = f(r, t) - M^{-1} S \left[m(r, t) - m^{eq}(r, t)\right],    (43)

f_i(r + e_i, t + 1) = f_i(r, t^*).    (44)

Here S is a diagonal matrix holding the relaxation rate of each moment. The BGK model is a special case in which every element of S equals the same constant 1/τ. m^{eq} is the vector of local equilibrium values of the moments.

6.2.2 Implementation and Results

In the implementation of the LBM solver with CUDA, each cell of the nx × ny × nz grid is assigned one thread. Since the streaming and the collision are local operations, it is easy to allocate the operations to the threads. We only use global memory here; therefore, there is no data exchange between threads and we do not have to synchronize their execution as shared memory would require. A cell (i.e., a thread) needs to update itself only after its neighbors finish their computation. To avoid write and read conflicts, we use two global memory buffers: one for the data before each iteration and one for the newly updated data after each iteration. Following the two-level thread hierarchy in CUDA, we assign the second level, the thread block, 512 threads for best performance [1]; the first level then has (nx·ny·nz)/blocksize blocks. If nx·ny·nz cannot be exactly divided by the block size, an extra thread block is assigned for the remaining cells.
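The ping-pong use of two global buffers and the launch configuration described above can be sketched on the host side as follows; the buffer names and the time-loop structure are assumptions of this sketch, and srtCollideStream refers to the collision-streaming kernel sketched in the previous subsection.

    #include <cuda_runtime.h>

    // Collision-streaming kernel from the sketch in Section 6.2.1.
    __global__ void srtCollideStream(const float*, float*, int, int, int, float);

    // Host-side time loop: two global buffers are swapped each step so that reads
    // (fIn) and writes (fOut) never touch the same array within one iteration.
    void runLBM(float* d_fA, float* d_fB, int nx, int ny, int nz, float tau, int steps)
    {
        const int n = nx * ny * nz;
        const int blockSize = 512;                               // threads per block
        const int numBlocks = (n + blockSize - 1) / blockSize;   // extra block for the remainder

        float* fIn  = d_fA;
        float* fOut = d_fB;
        for (int s = 0; s < steps; ++s) {
            srtCollideStream<<<numBlocks, blockSize>>>(fIn, fOut, nx, ny, nz, tau);
            cudaDeviceSynchronize();
            float* tmp = fIn; fIn = fOut; fOut = tmp;            // ping-pong swap
        }
    }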

All the following experiments are run on a workstation with two Intel Xeon E5630 2.53 GHz quad-core CPUs, 12 GB memory, and an NVIDIA Tesla C1060 graphics card. This graphics card has 240 streaming processor cores at 1.3 GHz and 4 GB memory; its peak single- and double-precision floating-point performance is 933 GFLOPS and 78 GFLOPS, respectively. We compare the performance between CPU and GPU for the two LBM solvers.

Both the SRTLBM and MRTLBM experiments use the D3Q19 lattice structure. Table 3 gives the performance of the SRTLBM solver at three resolutions: 32 × 32 × 32, 64 × 64 × 64, and 128 × 128 × 128; the GPU version is around 16 times faster than the CPU version. The time comparison of MRTLBM is given in Table 4 for another three resolutions: 32 × 64 × 32, 64 × 128 × 64, and 128 × 256 × 128. From these results, the GPU version runs about 13 times faster than the CPU version. The different speedup ratios depend on the complexity of the algorithm itself.

Resolution         CPU Time per Frame   GPU Time per Frame   Speedup Ratio
32 × 32 × 32       48 ms                3 ms                 ×16
64 × 64 × 64       374 ms               23 ms                ×16.5
128 × 128 × 128    3135 ms              188 ms               ×16.6
Table 3: SRTLBM time comparison between CPU and GPU.

Resolution         CPU Time per Frame   GPU Time per Frame   Speedup Ratio
32 × 64 × 32       70 ms                5 ms                 ×14
64 × 128 × 64      570 ms               43 ms                ×13.2
128 × 256 × 128    4666 ms              342 ms               ×13.64
Table 4: MRTLBM time comparison between CPU and GPU.

6.3 FTLE

A key challenge in fluid study is to find a “representative pattern” of a fluid flow so that such a pattern acts as an operative vehicle for flow manipulation, such as guiding high-quality animations. One major impediment is that the flow is dynamically evolving; another is the very complex momentum variation over a wide range of spatial scales. We use the finite-time Lyapunov exponent (FTLE) field to discover flow patterns.

6.3.1 Introduction

The Lyapunov exponent has its roots in the theory of dynamical systems; it characterizes the rate of separation of infinitesimally close trajectories. Haller [102] used the finite-time Lyapunov exponent for identifying LCS over a finite time interval. Considering the motion of a Lagrangian particle inside a fluid domain, its trajectory can be described by an ordinary differential equation:

\frac{dp(t)}{dt} = u(p(t), t),    (45)

where p is the position of the particle at time t, and u is the velocity. In the parlance of dynamical systems, the trajectory that takes a particle forward T units in time from its initial position defines a flow map:

\Phi_{t_0}^{t_0+T} := p(t_0 + T).    (46)

It depends on the initial time, t_0, and the integration period, T. The flow map provides a way to compute the amount of local stretching, which is measured by the Cauchy-Green deformation tensor:

\Delta := \left(\frac{d\Phi_{t_0}^{t_0+T}(p)}{dp}\right)^* \left(\frac{d\Phi_{t_0}^{t_0+T}(p)}{dp}\right),    (47)

where * denotes the transpose of a matrix. \Delta is a 2 × 2 matrix for 2D flow or a 3 × 3 matrix for 3D flow, respectively. It is computed on each site of a discrete grid over the fluid domain. Furthermore, \Delta has positive eigenvalues since the matrix is positive definite. The eigenvalues measure the rate of separation of the underlying flow at a location p. In particular, the FTLE value is defined as a time-dependent scalar using the maximum eigenvalue \lambda_{max}:

\sigma_T(p, t) = \frac{1}{|T|} \log \lambda_{max}(\Delta).    (48)

In Eqns. 46-48, the time interval T can indeed be either positive or negative; the Lagrangian particle thus moves forward or backward in time along its trajectory, respectively. For T > 0, the FTLE measures forward trajectory separation and the associated LCS (i.e., local maxima of the FTLE) represents repelling surfaces (stable manifolds) in the flow. If T < 0, the separation is evaluated backward in time and the resulting LCS acts as attracting surfaces (unstable manifolds). Fig. 27b and Fig. 27c display a forward FTLE and a backward FTLE field, respectively. Note that the FTLE fields are dynamically evolving and these are snapshots at one moment; the velocity field at that moment is shown in Fig. 27a.

Fig. 27c displays the major characteristic of the flow above the ball. The integration time T is chosen depending on the amount of detail needed in the resulting FTLE: a large T yields smooth, large-scale structures, but it should not be so large that necessary vortices are ignored. We use T = 1 second in our examples.

Figure 27: Flow pattern with FTLE and LCS. (a) Velocity streamlines: red indicates upward velocity, green downward velocity; (b) Forward FTLE and (c) Backward FTLE: red indicates high FTLE values, blue low FTLE values.

6.3.2 Implementation and Results

From a sequence of low-cost simulation results on a coarse grid, the FTLE is computed numerically at each grid point at a time t. In a 2D domain, for each point p(x, y), four particles are positioned at (x ± τ, y ± τ) with a small deviation τ (e.g., 0.1 is used in our experiments for a unit grid interval); six particles are required for a 3D domain. The particles are traced back through the velocity fields for a period of T, and their stopping positions are used to numerically compute Eqn. 47. The FTLE value σT(p, t) is then obtained from Eqn. 48 by calculating the maximum eigenvalue. The trajectory tracing is implemented in T/δt steps, where δt is the time step size of the NS simulator. For example, in Fig. 27, δt = 0.1 seconds and T is 1 second. We adopt a fourth-order Runge-Kutta integration scheme for the tracing and use linear interpolation in the computation.
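Assuming the traced end positions of the four seed particles are already available, the 2D FTLE value at one grid point (Eqns. 47-48) can be sketched as follows; the function name and the endpoint layout are assumptions of this sketch.

    #include <cmath>

    struct Vec2 { float x, y; };

    // End positions of the particles seeded at (x - tau, y), (x + tau, y),
    // (x, y - tau), (x, y + tau) after tracing for time T.
    float ftleAtPoint(Vec2 xm, Vec2 xp, Vec2 ym, Vec2 yp, float tau, float T)
    {
        // Finite-difference approximation of the flow-map gradient dPhi/dp.
        float a = (xp.x - xm.x) / (2.f * tau);  // dPhi_x / dx
        float b = (yp.x - ym.x) / (2.f * tau);  // dPhi_x / dy
        float c = (xp.y - xm.y) / (2.f * tau);  // dPhi_y / dx
        float d = (yp.y - ym.y) / (2.f * tau);  // dPhi_y / dy

        // Cauchy-Green tensor Delta = J^T J (Eqn. 47), a symmetric 2x2 matrix.
        float d11 = a * a + c * c;
        float d12 = a * b + c * d;
        float d22 = b * b + d * d;

        // Largest eigenvalue of the symmetric 2x2 matrix.
        float mean = 0.5f * (d11 + d22);
        float diff = 0.5f * (d11 - d22);
        float lambdaMax = mean + std::sqrt(diff * diff + d12 * d12);

        // FTLE value (Eqn. 48).
        return std::log(lambdaMax) / std::fabs(T);
    }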

Resolution       CPU Time   GPU Time   Speedup
32 × 64          1.3        0.03       ×43
24 × 32 × 24     5.51       0.06       ×91
32 × 48 × 32     19.8       0.14       ×141
48 × 64 × 48     62.1       0.41       ×151
Table 5: FTLE computation performance.

Computing 3D FTLEs with fourth-order Runge-Kutta (RK4) integration and trilinear interpolation is slow on the CPU; however, the FTLE algorithm is explicit and embarrassingly parallel, so it is easily implemented on the GPU to increase performance.

The computed FTLE fields are stored after the fluid simulation. In practice, we only need to compute and store FTLE fields every β steps (e.g., β = 5) on the GPU, because they change gradually. The trajectory tracing requires the background velocity starting from the initial position of each seed particle, and it is hard to predict which neighbors will be needed for back tracing in different fluid simulations. Therefore, we store all the velocities in GPU global memory instead of loading only the neighbor data into shared memory. For a 3D domain of dimension nx × ny × nz, there are nx × ny × nz × 6 threads in total for the particle tracing kernel. We record the final six positions in another global memory buffer, and the kernel that calculates the FTLE field reads those positions for the final computation.

Table 5 reports the FTLE computation performance. The 2D example with resolution 32 × 64 is shown in Fig. 27, and the GPU version increases the performance 43 times. On a 48 × 64 × 48 grid, the GPU computation is about 150 times faster than the CPU version. According to these experiments, higher resolutions achieve better speedups than lower ones.


Figure 28: Smoke animation framework overview.

6.4 Fluid Compression/Decompression

Smoke animation compression aims to reduce the data size created by physically based smoke simulations. In this section, we rationalize and delineate our compression framework and background, which is illustrated in Fig. 28.

6.4.1 Compression

Inter-frame Compression The simulation generates a series of smoke density fields, which constitute the time frames of the animation. Each frame can be compressed using file or volume compression tools; clearly, the temporal coherence between frames should be further exploited for better performance. One approach is to extend existing 2D video compression techniques (e.g., block-based motion compensation in MPEG) to the compression of 3D density frames. In particular, motion prediction and compensation promotes inter-frame compression by dividing the whole volume into a group of blocks. A block A in the frame at time step t + 1 can then be predicted from some block B in the existing frame at step t, assuming A was moved from B during the time interval. Such block motion prediction is expressed by a motion vector mAB between the source and the destination, computed by a neighborhood search for the best block match. Consequently, the original frame series is divided into key frames and intermediate predictive frames. Original density fields are used for key frames, while each intermediate frame between two consecutive key frames only needs to store the motion vectors of its blocks and the residual, e.g., the difference between block A and block B moved by the motion vector mAB. We call the key frame a “K-Frame” and the predicted intermediate frame a “P-Frame”.

Using Flow Advection for Inter-frame Compression We develop a novel method which utilizes a special bi-directional advection for inter-frame compression. Flow velocity fields are simplified to compute motion vectors over nonuniform blocks. The motion vectors advect key frames to approximate intermediate frames.

Intra-frame Compression Densities in both K-Frames and P-Frames are divided into 3D blocks. The block-based transform is implemented with the discrete cosine transform (DCT). The transform creates sparse data: a large number of coefficients are zero or close to zero. The data size can be further reduced through quantization of the floating-point data to generate more zeros, which leads to lossy but high-rate compression.

6.4.2 Decompression

In decompression, we first apply the necessary operations of lossless decoding, up-sampling, and inverse DCT transform for intra-frame decompression. Second, the motion vectors over the nonuniform blocks are used to reconstruct the approximated velocity field. Then the key frames are bidirectionally advected and blended using the weight maps to create intermediate frames.

Decompression should preferably run at a fast computational speed. Our decompression method is mostly amenable to parallel computing, enabling GPU acceleration.

6.4.2.1 DCT Algorithm

During the decompression, each data block in a K-Frame or P-Frame is decoded by the inverse Discrete Cosine Transform (IDCT). The definition of the inverse transform in three dimensions is given as follows:

B_{lmn} = \sum_{i=0}^{L-1} \sum_{j=0}^{M-1} \sum_{k=0}^{N-1} a_i a_j a_k A_{ijk} \cos\frac{\pi(2l+1)i}{2L} \cos\frac{\pi(2m+1)j}{2M} \cos\frac{\pi(2n+1)k}{2N},    (49)

where A_{ijk} is the matrix to which the IDCT is applied, l ranges from 0 to L-1, m from 0 to M-1, and n from 0 to N-1. The coefficients a_i, a_j, a_k are defined as:

a_i = 1/\sqrt{L} if i = 0, and a_i = \sqrt{2/L} if 1 \le i \le L-1,    (50)

a_j = 1/\sqrt{M} if j = 0, and a_j = \sqrt{2/M} if 1 \le j \le M-1,    (51)

a_k = 1/\sqrt{N} if k = 0, and a_k = \sqrt{2/N} if 1 \le k \le N-1.    (52)

According to this definition, the complexity of the direct IDCT is O(N^6) due to its six nested loops, which is very time-consuming. Therefore, we resort to CUDA to accelerate the computation. CUDA has two levels of threads, thread block and grid, so all threads are distributed into blocks. Similarly, the whole data set is also divided into blocks, and the block size can be 8 × 8 × 8 or 16 × 16 × 16. To apply the DCT algorithm with CUDA, we assign one thread block to one data block, which means that for a block size of 8 × 8 × 8 the thread block size is 512. However, for 16 × 16 × 16 the thread block would be 4096 threads, while CUDA limits the thread block size to 512 for best performance. Therefore, this scheme does not work for all block sizes.

Based on Eqn. 49, the IDCT can be applied along the three directions X, Y, Z at the same time, but it can also be separated into three passes, one along each direction. In order to utilize the two-level thread system in CUDA, we first apply the IDCT on each XY slice along the X and Y directions, and then along the Z direction. As a result, each block is 8 × 8 or 16 × 16 per slice, which does not exceed the limit of 512 threads.
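A sketch of the separable approach is given below: a 1D IDCT pass applied along one line of a block (shown as plain C++; the function name is an assumption of this sketch). Applying this pass along X, then Y, then Z reproduces Eqn. 49.

    #include <cmath>

    // 1D inverse DCT of one line of length L (Eqn. 49 restricted to a single axis).
    // out[l] = sum_i a_i * in[i] * cos(pi*(2l+1)*i / (2L)), with a_0 = 1/sqrt(L)
    // and a_i = sqrt(2/L) for i >= 1 (Eqn. 50).
    void idct1D(const float* in, float* out, int L)
    {
        const float pi = 3.14159265358979f;
        for (int l = 0; l < L; ++l) {
            float sum = 0.f;
            for (int i = 0; i < L; ++i) {
                float a = (i == 0) ? 1.f / std::sqrt((float)L)
                                   : std::sqrt(2.f / (float)L);
                sum += a * in[i] * std::cos(pi * (2 * l + 1) * i / (2.f * L));
            }
            out[l] = sum;
        }
    }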

6.4.2.2 Velocity Field Reconstruction

The velocity field reconstruction (Eqn. 53) can be conducted in a way similar to splatting. Each block works as a splat contributing its motion vector to a grid position, and the splatting of multiple blocks is performed in parallel.

u^*(p) = \frac{\sum_i G(d_i) m_i}{\sum_i G(d_i)},    (53)

where d_i is the distance from p to the block center of m_i, and G is the Gaussian function with mean 0 and standard deviation σ = 1.0. This method makes u^* change smoothly across block boundaries.
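The normalized Gaussian weighting of Eqn. 53 at a single grid position can be sketched as follows; the block-center and motion-vector arrays are assumptions of this sketch.

    #include <cmath>
    #include <vector>

    struct Vec3 { float x, y, z; };

    // Reconstruct u*(p) from block motion vectors m[i] at block centers c[i] (Eqn. 53).
    Vec3 reconstructVelocity(const Vec3& p,
                             const std::vector<Vec3>& c,
                             const std::vector<Vec3>& m)
    {
        const float sigma = 1.0f;                    // standard deviation of G
        Vec3 num = {0.f, 0.f, 0.f};
        float den = 0.f;
        for (size_t i = 0; i < c.size(); ++i) {
            float dx = p.x - c[i].x, dy = p.y - c[i].y, dz = p.z - c[i].z;
            float d2 = dx * dx + dy * dy + dz * dz;            // squared distance d_i^2
            float g = std::exp(-d2 / (2.f * sigma * sigma));   // Gaussian weight G(d_i)
            num.x += g * m[i].x; num.y += g * m[i].y; num.z += g * m[i].z;
            den += g;
        }
        if (den > 0.f) { num.x /= den; num.y /= den; num.z /= den; }
        return num;
    }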

6.4.2.3 Bidirectional Advection

We develop a new method that advects the smoke density fields of the K-Frames to estimate the P-Frames. The advection is performed in both the forward and backward directions based on the reconstructed velocity field. Following the Navier-Stokes equations, the densities are advected by the velocities as

\frac{\partial \rho}{\partial t} = -u \cdot \nabla \rho,    (54)

where u is the velocity at time t and ρ is the density. The advected results are then blended together to create a closer approximation of the original data, so that the P-Frame estimation is optimized. Fig. 29 illustrates our bidirectional advection algorithm.

Figure 29: Bidirectional advection for P-Frame estimation from two consecutive K-Frames. Red and purple arrow lines represent forward advection and backward advection, respectively.

The advection computation can also be parallelized on the GPU because different grid positions can be computed independently. Finally, the blending operation is also parallelizable, as it too is performed over individual grid positions.
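A sketch of one forward-advection step on the GPU is given below: each thread traces its cell backward along the reconstructed velocity and gathers the density with trilinear interpolation, a first-order semi-Lagrangian step consistent with Eqn. 54. The kernel name, array layout, and clamping are assumptions of this sketch; the backward pass advects the next K-Frame with a negated time step, and the blending of the two advected results is a per-cell weighted sum, which is equally parallel.

    #include <cuda_runtime.h>

    __device__ float sampleTrilinear(const float* rho, int nx, int ny, int nz,
                                     float x, float y, float z)
    {
        // Clamp to the valid range and interpolate among the 8 surrounding cells.
        x = fminf(fmaxf(x, 0.f), nx - 1.001f);
        y = fminf(fmaxf(y, 0.f), ny - 1.001f);
        z = fminf(fmaxf(z, 0.f), nz - 1.001f);
        int x0 = (int)x, y0 = (int)y, z0 = (int)z;
        float fx = x - x0, fy = y - y0, fz = z - z0;
        #define AT(i, j, k) rho[((k) * ny + (j)) * nx + (i)]
        float c00 = AT(x0, y0, z0)         * (1 - fx) + AT(x0 + 1, y0, z0)         * fx;
        float c10 = AT(x0, y0 + 1, z0)     * (1 - fx) + AT(x0 + 1, y0 + 1, z0)     * fx;
        float c01 = AT(x0, y0, z0 + 1)     * (1 - fx) + AT(x0 + 1, y0, z0 + 1)     * fx;
        float c11 = AT(x0, y0 + 1, z0 + 1) * (1 - fx) + AT(x0 + 1, y0 + 1, z0 + 1) * fx;
        #undef AT
        float c0 = c00 * (1 - fy) + c10 * fy;
        float c1 = c01 * (1 - fy) + c11 * fy;
        return c0 * (1 - fz) + c1 * fz;
    }

    // Semi-Lagrangian advection (Eqn. 54): look back along u* by dt and gather rho.
    __global__ void advectDensity(const float* rhoIn, float* rhoOut,
                                  const float* ux, const float* uy, const float* uz,
                                  int nx, int ny, int nz, float dt)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        int n = nx * ny * nz;
        if (idx >= n) return;
        int x = idx % nx, y = (idx / nx) % ny, z = idx / (nx * ny);
        float px = x - dt * ux[idx];
        float py = y - dt * uy[idx];
        float pz = z - dt * uz[idx];
        rhoOut[idx] = sampleTrilinear(rhoIn, nx, ny, nz, px, py, pz);
    }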

6.4.2.4 Generate Particle List

After the decompression of the density field, the volume data is sent to the rendering pipeline for visualization. The rendering algorithm we use is half-angle slicing [3], as provided by an Nvidia CUDA SDK sample. This algorithm renders the density as particles, which requires converting the volume data into a particle list. This is easily done in serial, single-threaded code; however, it is not easy to implement on a multi-threaded platform such as the GPU.

Figure 30: Stream compaction [3].

Fortunately, this is a classical parallel primitive known as stream compaction (also called stream reduction), used in widespread applications such as collision detection and sparse matrix compression. Stream compaction filters out the elements of a vector that are unwanted or not of interest. In Fig. 30, all the interesting letters are kept in a new output vector. The volume data in our case can easily be transformed into a vector, and the desired elements are finally kept in a smaller vector. This requires two parallel steps: scan and scatter. The first step needs an extra vector to record, with a flag of 1 or 0, which elements pass the predicate and which are not wanted (see Fig. 32). Then the sum scan of Algorithm 2 is applied to this flag vector; Fig. 31 shows a simple example of how this algorithm works in parallel. For n numbers, there are log2 n scan passes in total; each pass performs n additions, and these n additions are done in parallel. As a result of this scan, the cumulative sum generated from the flag vector provides the new indices of the interesting elements in the final output vector. Combining this with the original data vector and the flag vector (see Fig. 32), the filtered output vector can be generated in parallel.

Figure 31: Scan algorithm [3].

Figure 32: Scan and Scatter [3].

Algorithm 2 Sum Scan (parallel inclusive scan)
1: skip = 1
2: while skip < n do
3:   for k = 0 to n − 1 in parallel do
4:     if k ≥ skip then xout[k] = xin[k − skip] + xin[k]
5:     else xout[k] = xin[k]
6:   end for
7:   swap xin and xout
8:   skip = skip × 2
9: end while
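In practice, the scan-and-scatter compaction can also be expressed with the Thrust library that ships with CUDA. The sketch below collects the indices of all cells whose density exceeds a threshold; the threshold value, functor, and array names are assumptions of this sketch.

    #include <thrust/device_vector.h>
    #include <thrust/copy.h>
    #include <thrust/iterator/counting_iterator.h>

    struct IsVisible
    {
        float threshold;
        __host__ __device__ bool operator()(float density) const
        {
            return density > threshold;   // keep only cells that contribute particles
        }
    };

    // Returns the linear indices of all grid cells with density above `threshold`.
    thrust::device_vector<int> compactParticles(const thrust::device_vector<float>& density,
                                                float threshold)
    {
        int n = (int)density.size();
        thrust::device_vector<int> indices(n);
        // copy_if with a stencil: the counting iterator supplies candidate indices,
        // and the density vector acts as the stencil tested by the predicate.
        auto end = thrust::copy_if(thrust::counting_iterator<int>(0),
                                   thrust::counting_iterator<int>(n),
                                   density.begin(),
                                   indices.begin(),
                                   IsVisible{threshold});
        indices.resize(end - indices.begin());
        return indices;
    }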

Resolution         Inverse DCT     Velocity Construction   Advection        Blending     Generate Particles   Rendering
48 × 64 × 48       4 ms/36 ms      2 ms/21 ms              2 ms/58 ms       0 ms/0 ms    3 ms/8 ms            29 ms
96 × 128 × 96      9.8 ms/80 ms    94 ms/162 ms            192 ms/410 ms    0 ms/4 ms    12 ms/15 ms          30 ms
128 × 192 × 128    25 ms/218 ms    225 ms/1200 ms          224 ms/3120 ms   2 ms/25 ms   30 ms/29 ms          30 ms
Table 6: Fluid decompression performance.

6.4.3 Result

According to Section 6.4.2.1, the implementation of the inverse DCT requires two kernels: one for the X and Y directions and one for the Z direction. Since the whole 3D data set is divided into blocks, we exploit the two-level thread hierarchy in CUDA. For each XY slice, the thread block has either 8 × 8 or 16 × 16 threads. The DCT computation stays inside a block and only requires the data of that block itself; therefore, we can load the block data into shared memory to speed up the calculation. Apart from the DCT kernels, all other kernels use only global memory. The implementation of the bidirectional advection is similar to the trajectory tracing in Section 6.3, but it only requires linear interpolation.

Table 6 reports the performance of each step in the fluid decoder: inverse DCT, velocity construction, advection, blending, and particle generation. To render the final smoke result, we employ the half-angle slicing rendering method [3]. The GPU version performs much better than the CPU version, and the speedup ratio differs with the complexity of each algorithm. For particle generation, the difference between CPU and GPU is not dramatic, since the CPU version is already very fast; the main challenge there is how to parallelize the algorithm and fit it to the GPU architecture. From the results, we observe that the smaller resolutions achieve real-time performance. Further optimization should be developed to obtain better performance at higher resolutions.

CHAPTER 7

Conclusion

In this dissertation, we have proposed solutions to four important and challenging topics in enhancing fluid modeling with turbulence and acceleration: distance field representation of obstacles in fluids, adaptive and controllable turbulence enhancement, Langevin particles, and GPU acceleration in fluid modeling. All of these efforts aim at creating realistic fluid fields quickly, which is significant in computer graphics. In summary, the main contributions of this dissertation can be summarized as follows:

• We propose a novel distance field transform method based on an iterative scheme adaptively performed on an evolving active band. Our method utilizes a narrow band to store the active grid points being computed. Unlike the conventional fast marching method, we do not maintain a priority queue; instead, we perform iterative computing inside the band. This new algorithm alleviates the programming complexity and the data-structure (e.g., a heap) maintenance overhead, and leads to a computational process amenable to parallelization. While the active band propagates from a starting boundary layer, each grid point stays in the band for a lifespan, which is determined by analyzing the particular geometric properties of the grid structure. In this way, we find that the Face-Centered Cubic (FCC) grid is a good 3D structure for distance transform. We further develop a multiple-segment method for the band propagation, achieving a computational complexity of O(mN) with a segment-related constant m;

• This dissertation proposes a new scheme for enhancing fluid animation with controllable turbulence. An existing fluid simulation from an ordinary fluid solver is fluctuated by turbulent variation modeled as a random forcing process. The variation is precomputed as a sequence of solenoidal noise vector fields directly in the spectral domain, which is fast and easy to implement. The spectral generation enables flexible vortex scale and spectrum control following a user-prescribed energy spectrum, e.g., Kolmogorov's cascade theory, so that the fields provide fluctuations at subgrid scales and/or in preferred large octaves. The vector fields are employed as turbulence forces to agitate the existing flow, where they act as a stimulus of turbulence inside the framework of the Navier-Stokes equations, leading to natural integration and temporal consistency. The scheme also facilitates adaptive turbulence enhancement steered by various physical or user-defined properties, such as strain rate, vorticity, distance to objects, and scalar density, in critical local regions. Furthermore, an important feature of turbulent fluid, intermittency, is created by applying turbulence control during randomly selected temporal periods;

• We develop a new Lagrangian primitive, named the Langevin particle, to incorporate turbulent flow details in fluid simulation. A group of these particles is distributed inside the simulation domain based on a turbulence energy model with turbulence viscosity. Each particle moves obeying the generalized Langevin equation, a well-known stochastic differential equation that describes the particle's motion as a random Markov process. The resultant particle trajectory shows self-adapted fluctuation in accordance with the turbulence energy, while following the global flow dynamics. We then feed Langevin forces back to the simulation based on the stochastic trajectories, which drive the flow with the necessary turbulence. The new hybrid flow simulation method features non-restricted particle evolution requiring minimal extra manipulation after initiation. The flow turbulence is easily controlled, and the total computational overhead of the enhancement is minimal on top of typical fluid solvers;

• To achieve fast fluid modeling, we accelerate the LBM solver, the FTLE field computation, and fluid decompression. LBM is an explicit numerical scheme and only requires local information from the neighbors, so it can be parallelized intuitively on the GPU; we accelerate two different models, Single-relaxation-time LBM (SRTLBM) and Multiple-relaxation-time LBM (MRTLBM). The FTLE, which is used to extract the structural features of the fluid, has very intense local computation, and with the GPU's computation capacity the performance can be improved dramatically. For the fluid decompression strategy proposed in this dissertation, we design the entire pipeline on the GPU. The most challenging steps are the inverse DCT transform and the particle list generation. To utilize the CUDA programming model, we process the 3D data by slices for the inverse DCT transform; furthermore, the scan-and-scatter scheme is suitable for generating particles from the 3D grid data.

BIBLIOGRAPHY

[1] Nvidia, “Nvidia cuda compute unified device architecture programming guide version 2.0,” 2008. [2] Y. Zhao, “Lattice boltzmann based pde solver on the gpu,” Vis. Comput., vol. 24, no. 5, pp. 323–333, Mar. 2008. [3] H. Nguyen, Gpu gems 3, 1st ed. Addison-Wesley Professional, 2007. [4] R. Bridson, Fluid Simulation for Computer Graphics. Natick, MA,USA: A. K. Peters, Ltd., 2008. [5] J. Stam, “Stable fluids,” Proceedings of SIGGRAPH, pp. 121–128, 1999. [6] D. Wolf-Gladrow, “A lattice Boltzmann equation for diffusion,” Journal of Statistical Physics, vol. 79, no. 5-6, pp. 1023–1032, 1995. [7] S. Chen and G. Doolean, “Lattice Boltzmann method for fluid flows,” Annual Review of Fluid Mechanics, vol. 30, pp. 329–364, 1998. [8] S. Ubertini, G. Bella, and S. Succi, “Lattice Boltzmann method on unstructured grids: Further developments,” Physical Review E, vol. 68, 016701, 2003. [9] S. Premoze, T. Tasdizen, J. Bigler, A. Lefohn, and R.Whitaker, “Particle-based simula- tion of fluids,” Proceeding of Eurographics, pp. 401–410, 2003. [10] K. Anjyo, P. Beaudoin, P. F. (editors, S. Clavet, and P. Poulin, “Particle-based viscoelas- tic fluid simulation,” 2005. [11] L. B. Lucy, “A numerical approach to the testing of the fission hypothesis,” 1977. [12] R. A. Gingold and J. J. Monaghan, “Smoothed particle hydrodynamics: theory and application to non-spherical stars,” Monthly Notices of the Royal Astronomical Society, vol. 181, no. 181, pp. 375–389, 1977. [13] W. T. Reeves, “Particle systems-a technique for modeling a class of fuzzy objects,” ACM Transactions on Graphics, vol. 2, no. 2, pp. 91–108, 1983. [14] M. M¨uller, D. Charypar, and M. Gross, “Particle-based fluid simulation for interactive applications,” in SCA ’03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation. Aire-la-Ville, Switzerland, Switzerland: Euro- graphics Association, 2003, pp. 154–159. [15] M. M¨uller, R. Keiser, A. Nealen, M. Pauly, M. Gross, and M. Alexa, “Point based animation of elastic, plastic and melting objects,” in SCA ’04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer Animation. Aire-la-Ville, Switzerland, Switzerland: Eurographics Association, 2004, pp. 141–151.

101 [16] S. Clavet, P. Beaudoin, and P. Poulin, “Particle-based viscoelastic fluid simulation,” in SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Com- puter animation. New York, NY, USA: ACM, 2005, pp. 219–228. [17] M. Becker and M. Teschner, “Weakly compressible sph for free surface flows,” in Pro- ceedings of the ACM SIGGRAPH/Eurographics symposium on Computer animation. Aire-la-Ville, Switzerland, Switzerland: Eurographics Association, 2007, pp. 209–217. [18] B. Solenthaler and R. Pajarola, “Density contrast SPH interfaces,” in SCA ’08: Proceed- ings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Aire-la-Ville, Switzerland, Switzerland: Eurographics Association, 2008, pp. 211–218. [19] M. W. Jones and R. Satherley, “Using distance fields for object representation and ren- dering,” Proceedings of Eurographics, pp. 37–44, 2001. [20] D. Cohen-Or, D. Levin, and A. Solomovici, “Three-dimensional distance field metamor- phosis,” ACM Transactions on Graphics, vol. 17, no. 2, pp. 116–141, 1998. [21] Y. Zhou and A. W. Toga, “Efficient skeletonization of volumetric objects,” IEEE Trans- actions on Visualization and Computer Graphics, vol. 5, no. 3, pp. 196–209, 1999. [22] R. N. Perry and S. F. Frisken, “Kizamu: A system for sculpting digital characters,” Proceedings of SIGGRAPH, pp. 47–56, 2001. [23] D. Breen and R. Whitaker, “A level-set approach for the metamorphosis of solid mod- els,” IEEE Transactions on Visualization and Computer Graphics, vol. 7, no. 2, pp. 173–192, 2001. [24] B. A. Payne and A. W. Toga, “Distance field manipulation of surface models,” IEEE Computer Graphics and Applications, vol. 12, no. 1, pp. 65–71, 1992. [25] A. Sud, N. Govindaraju, R. Gayle, and D. Manocha, “Interactive 3d distance field com- putation using linear factorization,” in Proceedings of the 2006 symposium on Interactive 3D graphics and games. New York, NY, USA: ACM, 2006, pp. 117–124. [26] S. Osher and L. Rudin, “Rapid convergence of approximate solutions to shape-from- shading problem,” Technical report, Cognitech Inc., 1993. [27] E. Rouy and A. Tourin, “A viscosity solutions approach to shape-from-shading,” SIAM Journal on Numerical Analysis, vol. 29, no. 3, pp. 867–884, 1992. [28] R. Satherley and M. Jones, “Vector-city vector distance transform,” Computer Vision and Image Understanding, vol. 82, no. 3, pp. 238–254, 2001. [29] J. Tsitsiklis, “Efficient algorithms for globally optimal trajectories,” IEEE Trans. Auto- matic Control, vol. 40, no. 9, pp. 1528–1538, 1995. [30] D. E. Breen, S. Mauch, and R. T. Whitaker, “3D scan conversion of csg models into dis- tance volumes,” Proceedings of Symposium on Volume Visualization, ACM SIGGRAPH, pp. 7–14, 1998. [31] J. Sethian, Level Set Methods and Fast Marching Methods, second ed. Cambridge University Press, 1999.

102 [32] H. Zhao, “Fast sweeping method for eikonal equations,” Math. of Computation, vol. 74, pp. 603–627, 2004. [33] J. Qian, Y.-T. Zhang, and H.-K. Zhao, “Fast sweeping methods for eikonal equations on triangular meshes,” SIAM J. Numer. Anal., vol. 45, no. 1, pp. 83–107, 2007. [34] ——, “A fast sweeping method for static convex Hamilton-Jacobi equations,” J. Sci. Comput., vol. 31, no. 1-2, pp. 237–271, 2007. [35] S. F. Frisken, R. N. Perry, A. P. Rockwood, and T. R. Jones, “Adaptively sampled dis- tance fields: A general representation of shape for computer graphics,” Proceedings of SIGGRAPH 2000, pp. 249–254, 2000. [36] R. Fabbri, L. D. F. Costa, J. C. Torelli, and O. M. Bruno, “2d euclidean distance trans- form algorithms: A comparative survey,” ACM Comput. Surv., vol. 40, no. 1, pp. 1–44, 2008. [37] M. W. Jones, J. A. Baerentzen, and M. Sramek, “3d distance fields: A survey of tech- niques and applications,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 4, pp. 581–599, 2006. [38] W.-K. Jeong and R. Whitaker, “Fast eikonal equation solver for parallel systems,” SIAM conference on Computational Science and Engineering, 2007. [39] W. Zhao, “The fast sweeping method of eikonal. equations and its parallelism.” Master Thesis, INRIA France and Royal Institute of Technology, Sweden, 2003. [40] A. E. Lefohn, J. M. Kniss, C. D. Hansen, and R. T. Whitaker, “A streaming narrow-band algorithm: Interactive computation and visualization of level sets,” IEEE Transactions on Visualization and Computer Graphics, vol. 10, pp. 422–433, 2004. [41] S. Mauch, “Efficient algorithms for solving static hamilton-jacobi equations,” Ph.D. dis- sertation, California Institute of Technology, Pasadena, California, 2003. [42] K. Museth, D. Breen, R. Whitaker, S. Mauch, and D. Johnson, “Algorithms for interac- tive editing of level set models,” Computer Graphics Forum, vol. 24, no. 4, pp. 821–841, 2005. [43] C. Sigg, R. Peikert, and M. Gross, “Signed distance transform using graphics hardware,” Proceedings of the 14th IEEE Visualization 2003, pp. 83–90, 2003. [44] K. Hoff, J. Keyser, M. Lin, D. Manocha, and T. Culver, “Fast computation of generalized voronoi diagrams using graphics hardware,” Proceedings of SIGGRAPH, pp. 277–286, 1999. [45] O. Weber, Y. S. Devir, A. M. Bronstein, M. M. Bronstein, and R. Kimmel, “Parallel algorithms for approximation of distance maps on parametric surfaces,” ACM Trans. Graph., vol. 27, no. 4, pp. 1–16, 2008. [46] J. Monaghan, “Smoothed particle hydrodynamics,” Annul Revision on Progress in Physics, vol. 68, no. 8, pp. 1703–1759, 2005. [47] Y.Zhu and R. Bridson, “Animating sand as a fluid,” in Proceedings of ACM SIGGRAPH. New York, NY, USA: ACM, 2005, pp. 965–972. 103 [48] B. Kim, Y. Liu, I. Llamas, and J. Rossignac, “Advections with significantly reduced dissipation and diffusion,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 1, pp. 135–144, 2007. [49] A. Selle, R. Fedkiw, B. Kim, Y. Liu, and J. Rossignac, “An unconditionally stable mac- cormack method,” J. Sci. Comput., vol. 35, no. 2-3, pp. 350–371, 2008. [50] F. Losasso, F. Gibou, and R. Fedkiw, “Simulating water and smoke with an octree data structure,” ACM Trans. Graph., vol. 23, no. 3, pp. 457–462, 2004. [51] M. Wicke, M. Stanton, and A. Treuille, “Modular bases for fluid dynamics,” ACM Trans. Graph., vol. 28, no. 3, pp. 1–8, 2009. [52] S. Elcott, Y. Tong, E. Kanso, P. Schr¨oder, and M. Desbrun, “Stable, circulation- preserving, simplicial fluids,” ACM Trans. Graph., vol. 26, no. 
1, p. 4, 2007. [53] P. Mullen, K. Crane, D. Pavlov, Y. Tong, and M. Desbrun, “Energy-preserving integra- tors for fluid animation,” ACM Trans. Graph., vol. 28, no. 3, 2009. [54] M. Lentine, W. Zheng, and R. Fedkiw, “A novel algorithm for incompressible flow using only a coarse grid projection,” in SIGGRAPH ’10: ACM SIGGRAPH 2010 papers. New York, NY, USA: ACM, 2010, pp. 1–9. [55] B. E. Feldman, J. F. O’Brien, B. M. Klingner, and T. G. Goktekin, “Fluids in deforming meshes,” in Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation. New York, NY, USA: ACM, 2005, pp. 255–259. [56] G. Irving, E. Guendelman, F. Losasso, and R. Fedkiw, “Efficient simulation of large bod- ies of water by coupling two and three dimensional techniques,” in ACM SIGGRAPH. New York, NY, USA: ACM, 2006, pp. 805–811. [57] R. Bridson, J. Houriham, and M. Nordenstam, “Curl-noise for procedural fluid flow,” in Proceeding of ACM SIGGRAPH. New York, NY, USA: ACM, 2007, p. 46. [58] M. Patel and N. Taylor, “Simple divergence-free fields for artistic simulation,” Journal of Graphics Tools, vol. 10, no. 4, pp. 49–60, 2005. [59] R. F. Voss, “Random fractal forgeries,” in Fundamental Algorithms in Computer Graph- ics, R. Earnshaw, Ed. Springer, 1985, pp. 805–883. [60] R. Fattal and D. Lischinski, “Target-driven smoke animation,” in SIGGRAPH ’04: ACM SIGGRAPH 2004 Papers. New York, NY, USA: ACM, 2004, pp. 441–448. [61] L. Shi and Y. Yu, “Controllable smoke animation with guiding objects,” ACM Trans. Graph., vol. 24, no. 1, pp. 140–164, 2005. [62] N. Th¨urey, R. Keiser, M. Pauly, and U. R¨ude, “Detail-preserving fluid control,” in Pro- ceedings of the 2006 ACM SIGGRAPH/Eurographics symposium on Computer anima- tion, Aire-la-Ville, Switzerland, Switzerland, 2006, pp. 7–12. [63] J. Barbiˇcand J. Popovi´c, “Real-time control of physically based simulations using gentle forces,” ACM Trans. Graph., vol. 27, no. 5, pp. 1–10, 2008.

104 [64] H. Schechter and R. Bridson, “Evolving sub-grid turbulence for smoke animation,” in Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2008, pp. 1–8. [65] T. Kim, N. Th¨urey, D. James, and M. Gross, “Wavelet turbulence for fluid simulation,” in Proceeding of ACM SIGGRAPH. New York, NY, USA: ACM, 2008, pp. 1–6. [66] R. Narain, J. Sewall, M. Carlson, and M. C. Lin, “Fast animation of turbulence using energy transport and procedural synthesis,” in Proceeding of ACM SIGGRAPH Asia. New York, NY, USA: ACM, 2008, pp. 1–8. [67] T. Pfaff, N. Thuerey, J. Cohen, S. Tariq, and M. Gross, “Scalable fluid simulation using anisotropic turbulence particles,” in ACM SIGGRAPH Asia, 2010, p. To appear. [68] U. Frisch, Turbulence: The legacy of A.N. Kolmogorov. Cambridge University Press, 1995. [69] R. Fedkiw, J. Stam, and H. Jensen, “Visual simulation of smoke,” Proceedings of SIG- GRAPH, pp. 15–22, 2001. [70] A. Selle, N. Rasmussen, and R. Fedkiw, “A vortex particle method for smoke, water and explosions,” Proceedings of SIGGRAPH, pp. 910–914, 2005. [71] T. Pfaff, N. Thuerey, A. Selle, and M. Gross, “Synthetic turbulence using artificial boundary layers,” in SIGGRAPH Asia ’09: ACM SIGGRAPH Asia 2009 papers. New York, NY, USA: ACM, 2009, pp. 1–10. [72] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krger, A. E. Lefohn, and T. Pur- cell, “A survey of general-purpose computation on graphics hardware,” 2007. [73] Y. Zhao, “Lattice Boltzmann based PDE solver on the GPU,” The Visual Computer, International Journal of Computer Graphics, To appear, 2008. [74] K. Mattila, J. Hyv¨aluoma, T. Rossi, M. Aspn¨as, and J. Westerholm, “An efficient swap algorithm for the lattice Boltzmann method,” Computer Physics Communications,no. 3, pp. 200–210. [75] Z. Fan, F. Qiu, A. Kaufman, and S. Yoakum-Stover, “Gpu cluster for high performance computing,” in Proceedings of the 2004 ACM/IEEE conference on Supercomputing, ser. SC ’04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 47–. [76] C. Obrecht, F. Kuznik, B. Tourancheau, and J.-J. Roux, “Scalable lattice boltzmann solvers for CUDA GPU clusters,” Parallel Computing, vol. 39, no. 6-7, pp. 259 – 270, 2013. { }{ } [77] C. Conti, D. Rossinelli, and P. Koumoutsakos, “ GPU and APU computations of finite time lyapunov exponent fields,” Journal of Computational{ } { Physics} , vol. 231, no. 5, pp. 2229 – 2244, 2012. [78] S. L. Brunton and C. W. Rowley, “Fast computation of finite-time lyapunov exponent fields for unsteady flows,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 20, no. 1, p. 017503, 2010. [79] s. barakat, C. Garth, and X. Tricoche, “Interactive computation and rendering of finite- time lyapunov exponent fields,” IEEE Transactions on Visualization and Computer 105 Graphics, vol. 18, no. 8, pp. 1368–1380, Aug. 2012. [80] M. Balsa Rodriguez, E. Gobbetti, J. Iglesias Guiti´an, M. Makhinya, F. Marton, R. Pa- jarola, and S. Suter, “A survey of compressed gpu-based direct volume rendering,” in Eurographics State-of-the-art Report, May 2013, to appear. [81] J. H. Conway, N. J. A. Sloane, and E. Bannai, Sphere-packings, lattices, and groups. Springer-Verlag, 1987. [82] J. Stam and E. Fiume, “Turbulent wind fields for gaseous phenomena,” in SIGGRAPH ’93: Proceedings of the 20th annual conference on Computer graphics and interactive techniques. New York, NY, USA: ACM, 1993, pp. 369–376. [83] N. Rasmussen, D. Q. Nguyen, W. Geiger, and R. Fedkiw, “Smoke simulation for large scale phenomena,” ACM Trans. Graph., vol. 22, no. 3, pp. 
703–707, 2003. [84] J. Stam, “A general animation framework for gaseous phenomena,” ERCIM Research Report R047, 1997. [85] J. Laval, B. Dubrulle, and J. C. McWilliams, “Langevin models of turbulence: Renor- malization group, distant interaction algorithms or rapid distortion theory?” Physics Of Fluids, vol. 15, no. 5, 2003. [86] V. Canuto and M. Dubovikov, “A dynamical model for turbulence. i. general formalism,” Physics of Fluids, vol. 8, no. 2, 1996. [87] S. B. Pope, Turbulent Flows. Cambridge University Press, 2000. [88] K. Alvelius, “Random forcing of three-dimensional homogeneous turbulence,” Physics of Fluids, vol. 11, no. 7, pp. 1880–1889, 1999. [89] M. R. Overholt and S. B. Pope, “A deterministic forcing scheme for direct numerical simulations of turbulence,” Comput. Fluids, vol. 27, no. 1, pp. 11–28, 1998. [90] J.-P. Minier, E. Peirano, and S. Chibbaro, “PDF model based on Langevin equation for polydispersed two-phase flows applied to a bluff-body gas-solid flow,” Physics of Fluids, vol. 16, pp. 2419–2431, 2004. [91] B. D. Hughes, Random walks and random environments. Oxford University Press, 1996. [92] D. Kim, O.-y. Song, and H.-S. Ko, “A practical simulation of dispersed bubble flow,” ACM Trans. Graph., vol. 29, no. 4, pp. 1–5, 2010. [93] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Springer, 1992. [94] S. Pope, “A Lagrangian two-time probability density function equation for inhomoge- neous turbulent flows,” Physics of Fluids, pp. 3448–3450, 1983. [95] B. S. Baldwin and H. Lomax, “Thin layer approximation and algebraic model for seper- ated turbulent flows,” American Institute of Aeronautics and Astronautics Meeting, pa- per 78-257, 1978.

106 [96] G. Box and M. Muller, “A note on the generation of random normal deviates,” Annals Math. Stat, vol. 29, pp. 610–611, 1958. [97] W. T. Coffey, Y. P. Kalmykov, and J. T. Waldron, The Langevin Equation: with Applica- tions to Stochastic Problems in Physics, Chemistry and Electrical Engineering. World Scientific, 2nd edition, 2004. [98] Y. Zhao, Z. Yuan, and F. Chen, “Enhancing fluid animation with adaptive, controllable and intermittent turbulence,” Proceedings of ACM SIGGRAPH/Eurographics Sympo- sium on Computer Animation, July, 2010. [99] M. B. Nielsen and B. B. Christensen, “Improved variational guiding of smoke anima- tions,” Computer Graphics Forum, vol. 29, pp. 705–712, 2010. [100] P. L. Bhatnagar, E. P. Gross, and M. Krook, “A model for collision processes in gases. I. small amplitude processes in charged and neutral one-component systems,” Physical Review, vol. 94, no. 3, pp. 511–525, 1954. [101] D. d’Humi´eres, I. Ginzburg, M. Krafczyk, P. Lallemand, and L. Luo, “Multiple- relaxation-time lattice Boltzmann models in three-dimensions,” Philosophical Trans- actions of Royal Society of London, vol. 360, no. 1792, pp. 437–451, 2002. [102] G. Haller, “Distinguished material surfaces and coherent structures in 3D fluid flows,” Physica D, vol. 149, pp. 248–277, 2001.
