
LiU-ITN-TEK-A-14/009-SE

Dynamic Visualization of Space Weather Simulation Data
(Dynamisk visualisering av rymdvädersimuleringsdata)

Victor Sand

2014-05-16

Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University, SE-601 74 Norrköping, Sweden

Dynamic Visualization of Space Weather Simulation Data
(Dynamisk visualisering av rymdvädersimuleringsdata)

Master's thesis in Media Technology, carried out at the Institute of Technology, Linköping University

Victor Sand

Supervisor: Alexander Bock
Examiner: Anders Ynnerman

Norrköping, 2014-05-16

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Victor Sand

Dynamic Visualization of Space Weather Simulation Data

Victor Sand
Civilingenjör Medieteknik
Linköping University

Master's thesis
Goddard Space Flight Center, Maryland, USA
Norrköping, Sweden

June 2014

Abstract

The work described in this thesis is part of the Open Space project, a collaboration between Linköping University, the National Aeronautics and Space Administration and the American Museum of Natural History. The long-term goal of Open Space is a multi-purpose, open-source scientific visualization software.

The thesis covers the research and implementation of a pipeline for preparing and rendering volumetric data. The developed pipeline consists of three stages: a data formatting stage, which takes data from various sources and prepares it for the rest of the pipeline; a pre-processing stage, which builds a tree structure from the raw data; and finally an interactive rendering stage, which draws a volume using ray-casting.

Large parts of the system are built around the use of a Time-Space Partitioning tree, originally described by Shen et al. This tree structure uses an error metric system and an octree-based structure to efficiently choose the appropriate level of detail during rendering. The data storage and structure are similar to those in the GigaVoxels system by Crassin et al. Using a combination of these concepts and constructing the pipeline around them, space weather related volumes have been successfully rendered at interactive rates.

The pipeline is a fully working proof-of-concept for future development of Open Space, and can be used as-is to render space weather data. Many concepts and ideas from this work can be utilized in the larger-scale software project.

Acknowledgements

First of all, I would like to thank my examiner, Professor Anders Ynnerman, for the fantastic opportunity and for keeping the project running. Thanks also to my excellent advisor Alexander Bock for many late hours of support and idea discussions. Your willingness to help and share your vast graphics knowledge has been truly invaluable.

Thank you Masha for your tireless and dedicated work with CCMC and for taking care of us thesis students. Your genuine interest in the project is a requirement for its success! I’m sure the next couple of students will feel just as welcome. Thank you Carter for keeping us busy and for the great private tour of the museum. Bob, thank you for keeping an eye on the big picture!

Aki, thanks for making my commute shorter, my lunches more tasty and my country music knowledge more solid. Nate, thanks for being a bro and thanks Avery for letting me sleep on your floor for a while. Come to Sweden and I'll repay the favors! Thanks Martin for doing a great job during the first stage of the project and thereby making my job easier. Thanks to my many different roommates and friends in Washington D.C. for making my stay so much more than only work. I hope to see many of you again soon!

Many thanks to Holmen AB, Sparbankstiftelsen Alfa and Stiftelsen Anna Whitlocks Minnesfond for the financial help when CSN wouldn't lend me more money. I could not have completed my stay without it.

Finally, thanks to my family for the endless support and encouragement!

Victor
Stockholm, February 2014

Contents

1 Introduction
  1.1 Background
  1.2 Aim and Goals
  1.3 Method
  1.4 Limitations
  1.5 Thesis Structure

2 Background
  2.1 Space Weather
  2.2 Community Coordinated Modeling Center
  2.3 Open Space

3 Previous Work
  3.1 Volume Ray-Casting
  3.2 TSP Tree Acceleration
      Overview and Motivation
      Structure
      Traversal
      Error Metrics
  3.3 Rendering of Large Voxel Datasets
      Data Structure
      Rendering

4 Pipeline Overview
  4.1 Pipeline Stages
  4.2 Inputs and Outputs

5 TSP Tree Implementation
  5.1 Bricks
  5.2 Separation of Structure and Data
  5.3 Memory Layout
  5.4 Error Metrics
  5.5 Pointer Structure
  5.6 Traversal

6 Data Formatting
  6.1 Space Weather Data Sources
      ENLIL
      CDF Data Format
      Kameleon
  6.2 Furnace
  6.3 Voxel Data Format

7 Data Pre-Processing
  7.1 Forge
  7.2 TSP Tree Construction
      Brick Padding
      Octree Construction
      BST Assembling
  7.3 TSP Data Format

8 Rendering
  8.1 Flare
  8.2 TSP Structure and Error Metrics Construction
      TSP Structure Construction
      Error Metrics Calculation
      Error Caching
  8.3 Intra-Frame Pipeline
      View Ray Generation
      TSP Tree Probing
      Brick Uploading
      Ray-Casting
  8.4 Asynchronous Execution
  8.5 Rendering Parameters
  8.6 Cluster Rendering
      SGCT

9 Results
  9.1 Hardware
  9.2 Rendering Benchmarks
  9.3 Error Metrics Benchmarks
  9.4 Visual Results
      Desktop Rendering
      Dome Rendering

10 Discussion and Future Work
  10.1 Visual Quality
      Rendering
  10.2 Interactivity
      Performance
  10.3 Pipeline
      Encapsulation
  10.4 TSP Tree Structure
      Construction
      Storage
  10.5 Data Formats
  10.6 Error Metrics
      Calculation
      Control
  10.7 Rendering

11 Conclusions

References

A Code Samples
  A.1 TSP Tree Traversal
  A.2 Brick Padding
  A.3 Octree Construction
  A.4 BST Assembling
  A.5 TSP Tree Structure Construction
  A.6 Error Metrics
  A.7 Rendering Loop

1 Introduction

This first chapter briefly discusses the background and goals of the work. It also describes the methods used, as well as the structure and limitations of the thesis. Note that the background is described in more detail in chapter 2.

1.1 Background

Open Space is the working title for a project initiated in the fall of 2012. Collaborators in the project are Linköping University, the Community Coordinated Modeling Center (CCMC) at the National Aeronautics and Space Administration (NASA) and the American Museum of Natural History (AMNH). The long-term goal of Open Space is an open-source scientific visualization software with focus on space-related data sources. This software will be capable of producing efficient, accurate and beautiful visualizations of phenomena on a scale ranging from the size of atoms to the size of the entire known universe. The uses for this software will be both scientific and for public dissemination. In order to accomplish this goal, the participants are engaged in a Master's thesis student project. This collaboration enables students from Linköping to be on-site at NASA Goddard Space Flight Center, working close to the NASA scientists and the data sources. Input and feature requests for the project come from all three stakeholders, giving the project a broad purpose that is rooted in research, in space science and in multimedia.


The part of the Open Space project described in this thesis aims to efficiently render time-varying data sets of space weather using volumetric voxel rendering.

1.2 Aim and Goals

One of the most challenging aspects of volumetric rendering is handling large data sets efficiently. Time-varying data sets pose additional challenges due to memory limitations and the need to update the rendering often in order to achieve an animation with an acceptable frame rate. The aim of the work presented in this thesis is to implement an efficient volumetric rendering pipeline, capable of handling large time-varying data sets. The results of this work will later be incorporated into the larger-scale Open Space project. Additionally, the implemented rendering system will have enough functionality to provide visualizations that can be used in presentations, videos et cetera.

1.3 Method

The thesis work will be carried out by implementing a volumetric rendering system from the ground up. The input to this rendering system will be data from space weather simulations. The system will be continuously improved as the work develops. Having basic functionality working early enables an iterative approach, and makes modular implementation and testing easier as more advanced features are added.

1.4 Limitations

Since the thesis focuses on rendering efficiency and the pipeline, less attention is paid to the space weather application domain. Although the software will be capable of rendering arbitrary volumetric data provided the right preprocessing steps are taken, a smaller amount of time is spent on the applications than in the previous prototyping phase (see section 2.3). For the same reasons, the rendering techniques are very simple compared to what is possible today.


1.5 Thesis Structure

To properly familiarize the reader with the subject and the project that this thesis work is a part of, the thesis starts with a brief section on space weather and some background on the collaboration. Then some previous work on Open Space and computer graphics is presented, before the implemented pipeline is described. The chapter "Pipeline Overview" does not go into any implementation details, but is very useful for putting the subsequent chapters in context. After the high-level overview, some time is spent on describing the implementation of one of the main techniques, the Time-Space Partitioning (TSP) tree. These techniques are used in many parts of the pipeline, and are therefore presented early in the thesis. Following the introductory chapters are the three chapters that each describe a different part of the pipeline. Results, future work topics and a discussion of the work are presented last in the main part of the thesis. To further explain some of the implementation details, an appendix with selected code samples is included at the back.


2 Background

This chapter provides context for the thesis by outlining the Open Space project, and briefly discussing what space weather is and how it is being studied.

2.1 Space Weather

The National Research Council explains the concept of space weather in the following way (1).

"Space weather" describes the conditions in space that affect Earth and its technological systems. Our space weather is a consequence of the behavior of the sun, the nature of Earth's magnetic field and atmosphere, and our location in the solar system.

The National Space Weather Program Council has a similar description of the subject and also mentions the effects that space weather can have on earth (2):

"Space weather" refers to conditions on the sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and can endanger human life or health. Adverse conditions in the space environment can cause disruptions of satellite operations, communications, navigation, and electric power distribution grids, leading to a variety of socioeconomic losses.


2.2 Community Coordinated Modeling Center

Given the possible effects on earth, it is desirable to study and predict space weather events. The Community Coordinated Modeling Center (CCMC) at NASA Goddard Space Flight Center works with space weather simulation and forecasting. The center also provides the scientific community with access to the models, and with resources for development and research.

2.3 Open Space

The prototyping phase of the Open Space project resulted in a thesis by Törnros (3). This work contains a thorough summary of the modeling and simulation tools used at CCMC, as well as an overview of pre-existing visualization software. The thesis also presents an approach for visualizing space weather data by means of volumetric rendering and ray-casting. Voreen, an open-source software for interactive volume rendering (4), is used and extended to produce interactive renderings of space weather events. These renderings are done for one time step at a time. Screenshots from Törnros' thesis can be found in figures 2.1 and 2.2. The results of this prototyping phase provide an entry point for the work described in this thesis, where the goal is to enable working with time-varying data sets.


Figure 2.1: Screenshot of a coronal mass ejection event visualization

Figure 2.2: Screenshot of the Voreen workspace

3 Previous Work

This chapter provides a theoretical background to the techniques and concepts used in the implementation, mainly related to volumetric visualization and the rendering of large voxel datasets.

3.1 Volume Ray-Casting

Volume ray-casting is an image order volume rendering technique. This means that the image is produced by iterating over pixels rather than iterating over objects in the scene. To determine the color of each pixel, view rays are sent from the position of the camera through the volume (figure 3.1), and the volume is sampled at points along these rays.

As the samples along each ray are gathered, each intensity is mapped to an RGBA color using a transfer function (figure 3.2). The colors from the transfer function mappings are composited into the final ray color using front-to-back compositing. The equations for updating the accumulated color and opacity C′ and A′, given the mapped color and opacity C and A of the current sample, are given in equation 3.1.

C′_i = C′_{i−1} + (1 − A′_{i−1}) C_i        (3.1)
A′_i = A′_{i−1} + (1 − A′_{i−1}) A_i
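Sketched in Python with a single color channel for brevity (a hypothetical helper, not thesis code), the front-to-back compositing loop reads:

```python
def composite_front_to_back(samples):
    """Accumulate (color, alpha) pairs along a ray, front to back.

    `samples` is a sequence of (C, A) tuples already mapped through the
    transfer function, ordered front to back. Colors are assumed to be
    premultiplied by alpha. The early-termination test is a common
    optimization, not part of equation 3.1 itself.
    """
    acc_c, acc_a = 0.0, 0.0
    for c, a in samples:
        acc_c += (1.0 - acc_a) * c   # C'_i = C'_{i-1} + (1 - A'_{i-1}) C_i
        acc_a += (1.0 - acc_a) * a   # A'_i = A'_{i-1} + (1 - A'_{i-1}) A_i
        if acc_a >= 0.99:            # ray is nearly opaque; stop early
            break
    return acc_c, acc_a
```

Two half-opaque gray samples, for instance, composite to an accumulated opacity of 0.75.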


Figure 3.1: The concept of volume ray-casting. View rays are shot from a virtual eye/camera position through the image plane.

Figure 3.2: Top: The volume is sampled along the view ray. Bottom: Each sampled intensity is mapped to a color using a transfer function.


3.2 TSP Tree Acceleration

While straightforward rendering techniques can be adequate for small data sets, they are often not efficient enough for large amounts of data with high demands on speed. Researchers in the field of volumetric rendering, and 3D graphics in general, continuously strive to improve the efficiency of data handling through the use of various acceleration structures. One such structure is the Time-Space Partitioning (TSP) tree, which is the data structure chosen for this work. Implementation details are described in chapter 5.

Overview and Motivation

The TSP tree was first introduced by Shen et al. (5) and was later improved by Ellsworth et al. (6). It is designed to capture and exploit both temporal and spatial coherency in a time-varying data set. The tree traversal algorithm uses user-supplied error tolerances to choose the correct level of detail at runtime. By separating the time domain from the spatial domain and treating them differently, the scheme can efficiently handle data sets where there is a large discrepancy between the resolutions in those domains. Error metrics are stored in the tree nodes, and the tree can be built once and then used repeatedly.

Structure

A TSP tree uses a complete octree as a skeleton. This octree subdivides the volume until a certain spatial subdivision level has been reached. Each octree node (inner nodes as well as leaves) in turn contains a binary search tree (BST) holding the temporal information for that spatial subdivision. The BST leaves are the individual time steps, and each level above the leaves represents a time span of twice the length. The BST root represents the whole temporal extent; in other words, the root represents an average of the octree node's values over all time steps. The overall structure is illustrated in figure 3.3.
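As a toy illustration of this temporal hierarchy (my own sketch, not thesis code), the BST levels can be built bottom-up, each parent averaging its two children so that the root covers the entire time range. Bricks are reduced to single floats for brevity:

```python
def build_temporal_bst(leaf_bricks):
    """Build the temporal BST levels bottom-up.

    `leaf_bricks` holds one value per time step (a stand-in for a full
    brick); the number of time steps is assumed to be a power of two.
    Each parent is the average of its two children, so the single root
    value averages over all time steps. Returns levels, root first.
    """
    levels = [list(leaf_bricks)]
    while len(levels[0]) > 1:
        below = levels[0]
        above = [(below[i] + below[i + 1]) / 2.0
                 for i in range(0, len(below), 2)]
        levels.insert(0, above)   # prepend the coarser temporal level
    return levels
```

For four time steps [1, 3, 5, 7], the middle level is [2, 6] and the root is [4], matching the "each level doubles the time span" description.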

Traversal

TSP tree traversal starts in the octree. For every octree node, the corresponding BST is traversed top-down until a node with satisfying error metrics is found or a leaf


Figure 3.3: TSP structure (illustrated using a quadtree). The example uses two spatial subdivisions and eight time steps. The top section represents the octree skeleton, and the bottom tree is the binary search tree for one of the octree nodes.

is reached. If the error at the leaf is too big, the traversal continues with the next subdivision level in the octree. See section 5.6 for the full TSP traversal algorithm.

Error Metrics

The concept of error metrics is key to the use of TSP tree techniques. To separate the spatial and temporal domains, two different error metrics are used by the TSP tree algorithm. The spatial error indicates how coherent the voxels within a subvolume are, and the temporal error is a measure of how coherent the voxels are between two or more time steps. The first TSP tree publication (5) uses error metrics based on the scalar values of voxels. To make the error metrics more accurate and more closely related to the visible image, a color-based approach was later introduced (6). The color-based approach is useful whenever a mapping from scalar values to colors is used, for example through transfer functions.

3.3 Rendering of Large Voxel Datasets

One of the more prominent works in the field of voxel graphics is the GigaVoxels system, presented in the Ph.D. thesis by Crassin et al. (7). Their work outlines an extensive pipeline for handling very large sets of data. While the thesis extensively covers the subject of turning traditional scene geometry into voxels (in contrast to working directly with voxels), a sophisticated pipeline for processing the data has also been developed. This pipeline mainly deals with transferring data between system and video memory through a custom GPU paging system.

Data Structure

GigaVoxels makes use of a spatial, octree-based structure for hierarchical space subdivision. The smallest entities in this subdivision, held by the octree nodes, are called bricks: small voxel grids that represent a subvolume at a given subdivision level. Bricks make it possible to exploit the efficient 3D texture capabilities of GPUs while keeping the subdivision flexible. Furthermore, GigaVoxels uses brick pointers. These pointers are separated from the data itself, only pointing to the original data in a brick pool. The hierarchical structure


can be easily represented by using these brick pointers rather than the full data, making the traversal of the structure much faster.

Figure 3.4: Screenshot from the GigaVoxels system. Image from http://gigavoxels.inrialpes.fr/

Rendering

The rendering algorithm in GigaVoxels is split into two passes, both executed in one big GPU kernel. One pass traverses the octree top-down, and the other samples the volume. The level of detail (subdivision level in the octree) is chosen during the traversal step, based on the projected size of the voxels on screen. The system is built for large datasets where the whole scene is far too big for video memory. A GPU paging system loads data on the fly, receiving requests from the ongoing ray sampling pass. A caching system keeps track of recently used bricks, making room for new bricks in GPU memory when needed.
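The caching idea can be illustrated with a minimal LRU sketch. This is emphatically not GigaVoxels' implementation; the `load` callback merely stands in for a disk or PCIe transfer, and the class name is mine:

```python
from collections import OrderedDict

class BrickCache:
    """Minimal LRU cache sketch for the brick paging idea.

    When a requested brick is missing, it is loaded via the supplied
    `load` function; once capacity is exceeded, the least recently used
    brick is evicted to make room, mirroring the behavior described in
    the text at a much smaller scale.
    """
    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load
        self.bricks = OrderedDict()   # brick index -> brick data

    def get(self, index):
        if index in self.bricks:
            self.bricks.move_to_end(index)        # mark as recently used
        else:
            self.bricks[index] = self.load(index) # cache miss: page in
            if len(self.bricks) > self.capacity:
                self.bricks.popitem(last=False)   # evict LRU brick
        return self.bricks[index]
```

With capacity 2, requesting bricks 1, 2, 1, 3 in order evicts brick 2, since brick 1 was touched more recently.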

4 Pipeline Overview

The implemented pipeline consists of three main stages called Furnace (data formatting), Forge (data pre-processing) and Flare (rendering). These three stages are described in detail in chapters 6, 7 and 8. This chapter provides a high-level overview of the whole pipeline.

4.1 Pipeline Stages

Figure 4.1: Overview of the three pipeline stages Furnace, Forge and Flare

The three parts of the pipeline do not interact with each other during runtime, and each stage is run separately. The separation and encapsulation provide efficient and customizable processing of the data, in which each phase refines and prepares the data

for rendering. Furnace funnels various external data sources into a format that the rest of the pipeline can use. Forge takes the output from Furnace and builds the tree structures that Flare then uses during rendering, which is the final step.

4.2 Inputs and Outputs

To achieve encapsulation between the parts of the application, the preparation stages Furnace and Forge output binary files on disk. This means that as soon as a previous step is completed, the next step only needs the produced file and knowledge of its structure. The input to the first step, Furnace, is the volume data to render. This data can be formatted and delivered in many different ways, so a Furnace module needs to be written for each data source. Furnace then outputs a file where the voxel data for all time steps is saved. Forge reads this straightforward time-varying volume data, builds the TSP tree structure and saves it to a new, separate file. The same input data can be used to produce different configurations of the tree. Flare finally reads one of the TSP tree files and renders it.

5 TSP Tree Implementation

Before going into the details of the pipeline stages, it is useful to know how the TSP tree is implemented and how its structure affects different approaches and techniques throughout the software. This chapter describes some implementation details of the TSP tree usage.

5.1 Bricks

A tree structure where the leaves are individual voxels would induce a very large overhead. For this reason, the smallest element in the tree is a brick: a subvolume of voxels. As an example, a volume of 128 × 128 × 128 voxels using 16 × 16 × 16 bricks would have room for 128/16 = 8 bricks per axis. Brick usage is a common concept, used by both the TSP (5, 6) and the GigaVoxels (7) authors.
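The arithmetic in the example is an exact division per axis; a hypothetical helper (names are mine) makes the counts explicit:

```python
def bricks_per_axis(volume_dim, brick_dim):
    """Number of bricks along one axis. Dimensions are assumed to
    divide evenly (typically both are powers of two)."""
    assert volume_dim % brick_dim == 0, "brick size must divide volume size"
    return volume_dim // brick_dim

# The example from the text: a 128^3 volume with 16^3 bricks.
n = bricks_per_axis(128, 16)   # bricks per axis
total = n ** 3                 # bricks per time step at the leaf level
```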

5.2 Separation of Structure and Data

The use of bricks keeps the tree compact enough to store the whole structure in GPU memory during rendering, since the individual voxels are not referenced. The tree structure is separated from the raw voxel data: the tree only keeps track of the brick numbers that correspond to the bricks saved on disk. The nodes in the structure store the brick number and the number of the node's child, along with the error metrics. This approach is similar to the one described by Crassin et al. (7).
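A node in such a structure can be sketched as a small record. The field names below are illustrative, not taken from the thesis implementation:

```python
from dataclasses import dataclass

@dataclass
class TSPNode:
    """Sketch of a tree node that references data instead of owning it.

    The node stores only an index into the brick pool on disk, the
    index of its child, and the two error metrics; the voxel data
    itself never enters the tree structure.
    """
    brick_index: int       # which brick in the on-disk brick pool
    child_index: int       # index of the child node (-1 for a leaf)
    spatial_error: float
    temporal_error: float
```

Keeping nodes this small is what allows the whole skeleton to reside in GPU memory while bricks are streamed in on demand.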


5.3 Memory Layout

The TSP tree was originally described as an "octree of binary search trees", meaning that each node in the octree skeleton contains its own binary search tree, whose nodes in turn contain the data. The implementation in this thesis uses that structure (described in chapter 3) during traversal, but the brick numbering and data ordering that get saved to disk are slightly different. To enable efficient sequential loading of bricks during rendering (chapter 8), the data is instead ordered so that the nodes of the octrees are saved next to each other. This leads to a different pointer structure, one that can be viewed as a "binary search tree of octrees". The brick structure is illustrated in figures 5.1 and 5.2.

Figure 5.1: Conceptual tree layout, differing from the structure originally described. The layout can be thought of as a binary search tree of octrees. The figure uses a quadtree for illustration.

The main benefit of this approach is that bricks within the same BST and octree levels have consecutive brick numbers. As will be discussed in chapter 10, the rotating nature of a Coronal Mass Ejection (CME) event simulation makes spatial filtering more useful than temporal filtering in the current implementation. This means that during runs, bricks will often be chosen from the same node in the BST. Additionally, it is probable that bricks close to each other in the octree will be used simultaneously. By storing the bricks of one temporal filtering step together, the two-stage rendering scheme (see chapter 8) can first identify a sequence of bricks, and the uploading stage can then use a single read operation to fetch them from disk, rather than gathering them from scattered disk locations.


Figure 5.2: Memory layout of the TSP tree, using quadtrees instead of octrees. Numbers correspond to brick indices.

5.4 Error Metrics

The original TSP tree paper by Shen et al. (5) uses the coefficient of variation to indicate the error for a brick. This coefficient is defined as the standard deviation σ divided by the average µ. Shen et al. implement this by first calculating the average voxel value for each brick, then calculating the standard deviation within the same brick, and finally producing said ratio. This approach has several problems. It is desirable for the error to grow the further from the original data (the leaves) an octree node is, but the standard deviation gets lower and lower the more averaged (filtered) the higher-up bricks become. Additionally, dividing by the mean produces very large and varying values when the average gets close to zero. In large, empty areas of the volume the error should be very low, but dividing by small values is numerically unstable. For these reasons, the error metric calculation has been slightly modified.

The spatial error is calculated by first calculating the average v̄_brick for each brick, where n is the number of voxels in a brick (equation 5.1).

v̄_brick = (1/n) Σ_{i=0}^{n−1} v_i        (5.1)

Instead of the standard deviation, a modified version is calculated. This is done by comparing the brick average with the voxel values in the leaf bricks that are covered by this particular brick. The equation for the modified spatial error (5.2) is similar to a

regular standard deviation calculation. Here m stands for the number of covered leaf bricks and n for the number of voxels per brick.

e_spatial = sqrt( (1/(m·n)) Σ_{j=0}^{m−1} Σ_{i=0}^{n−1} (v_{i,j} − v̄_brick)² )        (5.2)

The temporal error metric is calculated by first calculating the average value over time for each voxel (at position i). See equation 5.3, where l is the number of time steps.

v̄_i = (1/l) Σ_{t=0}^{l−1} v_{i,t}        (5.3)

These values are then used to calculate a modified standard deviation per voxel. Subsequently, the voxel standard deviations are averaged per brick. This average is the temporal error metric, illustrated in equation 5.4, where n is the number of voxels per brick and m is the number of covered leaf bricks. Implementation details can be found in section 8.2.

e_temporal = (1/n) Σ_{i=0}^{n−1} sqrt( (1/m) Σ_{j=0}^{m−1} (v̄_i − v̄_{i,j})² )        (5.4)
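As a concrete reading of these metrics, both can be sketched in plain Python. This is an illustrative simplification, not the thesis code: the brick average is computed directly from the covered leaf voxels (which matches the brick's own average when the brick is an exact box filter of its leaves), and the temporal metric is shown at leaf granularity, where each covered value is simply the voxel's value at one time step:

```python
import math

def spatial_error(leaf_bricks):
    """Modified spatial error in the spirit of equation 5.2.

    `leaf_bricks` lists the m covered leaf bricks, each with n voxel
    values; the deviation of all covered voxels from the overall brick
    average is returned as a root-mean-square value.
    """
    m, n = len(leaf_bricks), len(leaf_bricks[0])
    v_bar = sum(sum(brick) for brick in leaf_bricks) / (m * n)
    sq = sum((v - v_bar) ** 2 for brick in leaf_bricks for v in brick)
    return math.sqrt(sq / (m * n))

def temporal_error(voxel_series):
    """Modified temporal error in the spirit of equations 5.3 and 5.4.

    `voxel_series[i]` holds voxel i's values over the covered time
    steps; each voxel's deviation around its temporal average is
    computed, then the per-voxel deviations are averaged over the brick.
    """
    total = 0.0
    for series in voxel_series:
        v_bar = sum(series) / len(series)      # temporal average, eq. 5.3
        var = sum((v_bar - v) ** 2 for v in series) / len(series)
        total += math.sqrt(var)
    return total / len(voxel_series)
```

A brick covering perfectly uniform leaves yields a spatial error of 0, and a voxel that never changes over time contributes 0 to the temporal error, matching the intended behavior in empty or static regions.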

5.5 Pointer Structure

To save space, each tree node stores only one child pointer. The pointer has different meanings to enable traversal of both the octrees and the overall binary search tree: if the node is the root of a binary search tree, the child pointer is used to access the octree children; otherwise, the child pointer points to the node's BST children. The pointer usage is implemented in the traversal scheme.

5.6 Traversal

The rendering algorithm (see chapter 8) uses two separate TSP tree traversal passes. Both passes traverse the tree structure in the same way. For traversing the TSP tree, the high-level approach suggested by Shen et al. (5) is used. A flowchart of the overall TSP tree traversal can be found in figure 5.3.


Figure 5.3: Flowchart of the TSP tree traversal algorithm. OT - Octree, BST - Binary search tree.


For the internal octree traversal, a version of the kd-restart algorithm by Horn et al. (8), adapted for octrees, is used. The algorithm is stackless, which is very useful when traversing a structure on a GPU, where memory and stack depth are limited. The hybrid tree traversal implementation can be found in the code samples, section A.1.
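The full ray traversal is in appendix A.1; as a simplified illustration of why restart-style traversal needs no stack, the following sketch (my own, not thesis code) locates the leaf containing a point in a complete octree using only a loop. A restart traversal re-enters a descent like this from the root each time a ray advances past a leaf:

```python
def descend_to_leaf(point, depth):
    """Stackless point location in a complete octree.

    `point` is (x, y, z) in [0, 1)^3. At each level the child index is
    derived from which half of the current node the point lies in on
    each axis, so the walk from root to leaf is a pure loop with no
    stack. Returns the sequence of child indices (0-7), root to leaf.
    """
    path = []
    x, y, z = point
    for _ in range(depth):
        x, y, z = x * 2, y * 2, z * 2
        cx, cy, cz = int(x), int(y), int(z)   # which half on each axis
        path.append(cx + 2 * cy + 4 * cz)     # standard octree child index
        x, y, z = x - cx, y - cy, z - cz      # re-map into the child cell
    return path
```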

6 Data Formatting

The first pipeline stage, Furnace, extracts volumetric data from various data sources and produces a format that the subsequent stages use. This format is designed to be very general and its main purpose is to act as an abstraction layer, leaving optimizations to procedures later in the pipeline.

6.1 Space Weather Data Sources

While the Open Space software will be capable of handling a large variety of data sources, space weather related data sources are used throughout the thesis work. This benefits CCMC, and is a natural continuation of the first stage of the project.

ENLIL

The main data source for this project is the ENLIL model by Xie et al. (9). The model describes the heliosphere in terms of plasma mass, momentum and energy, among other variables. ENLIL is used to describe Coronal Mass Ejection (CME) events. Simulations using this model can be accessed from the CCMC web site (10), and this project has mainly used a run titled Hong Xie 120312 SH 1 during development.

CDF Data Format

The CCMC space weather event data from simulations is stored in the standardized CDF (Common Data Format) file format. CDF files store the large number of variables that the simulation runs generate, as well as additional metadata.


Kameleon

The tool Kameleon (11), developed and maintained at CCMC, is used to extract data from the CDF files. The software acts as an abstraction layer between the model data and applications. Kameleon provides data access as well as interpolation, allowing applications to extract spatial and temporal data at arbitrary points.

6.2 Furnace

The very first step in the pipeline is formatting the data. Furnace handles this task using custom modules for the different data sources. These sources include ENLIL data in the CDF format, and the ENLIL module uses Kameleon to extract the chosen data. Furnace is configured using a few basic parameters: the input and output locations, the type of data source and the desired dimensions of the output volume.

Figure 6.1: Overview of the data formatting stage Furnace.

6.3 Voxel Data Format

The output from Furnace is called Voxel Data Format (VDF). The volume data is represented by floats and is ordered by time steps. The voxels in each frame are ordered by indices, given by equation 6.1, where xDim, yDim and zDim are the number of voxels along each axis in the volume.

ix,y,z = x + y · xDim + z · xDim · yDim (6.1)
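The linear ordering can be sketched as a small helper (the function name is illustrative). Note that for the index to be unique, the z term must step by one full xy-slice, i.e. xDim · yDim voxels:

```cpp
#include <cstddef>

// Hypothetical helper mirroring equation 6.1: the linear index of voxel
// (x, y, z) in a volume with dimensions xDim x yDim x zDim. The z stride
// is one full xy-slice (xDim * yDim) so every voxel gets a unique index.
inline std::size_t voxelIndex(std::size_t x, std::size_t y, std::size_t z,
                              std::size_t xDim, std::size_t yDim) {
    return x + y * xDim + z * xDim * yDim;
}
```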


The data is stored in a binary file along with some header data. The header describes the type of coordinates (currently Cartesian or spherical), the dimensions and the number of time steps. The VDF file format is described in table 6.1.

Data field            Representation
Grid type             unsigned integer
Number of time steps  unsigned integer
x dimension           unsigned integer
y dimension           unsigned integer
z dimension           unsigned integer
Voxel data            float

Table 6.1: VDF data format
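A minimal sketch of reading and writing the layout in table 6.1 could look as follows. This is not the Furnace source; the struct and field names are assumptions, and only the field order and types come from the table (five unsigned integers followed by raw floats):

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical VDF header mirroring table 6.1; names are illustrative.
struct VdfHeader {
    std::uint32_t gridType;       // e.g. Cartesian or spherical (assumed encoding)
    std::uint32_t numTimesteps;
    std::uint32_t xDim, yDim, zDim;
};

// Write the header followed by the raw voxel data.
void writeVdf(std::ostream& out, const VdfHeader& h,
              const std::vector<float>& voxels) {
    out.write(reinterpret_cast<const char*>(&h), sizeof(h));
    out.write(reinterpret_cast<const char*>(voxels.data()),
              static_cast<std::streamsize>(voxels.size() * sizeof(float)));
}

// Read back only the header; the voxel payload follows in the stream.
VdfHeader readVdfHeader(std::istream& in) {
    VdfHeader h{};
    in.read(reinterpret_cast<char*>(&h), sizeof(h));
    return h;
}
```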


7 Data Pre-Processing

The data needs to be re-formatted from the straightforward input structure into a TSP tree. The second pipeline stage, Forge, takes care of this process and outputs files which the rendering stage can use directly.

7.1 Forge

The input is a file in the VDF format described in chapter 6. Forge can be customized to output TSP trees with different brick sizes. The desired brick size is the only parameter to Forge, apart from input and output file names.

Figure 7.1: Overview of Forge


7.2 TSP Tree Construction

The TSP tree construction process consists of several stages. The stages use temporary files as storage where possible to avoid problems when building trees which are too large to keep in memory at once.

Brick Padding

Bricks will eventually be stored in a 3D texture before getting rendered (see section 8.3 for details). This texture, the brick atlas, does not keep the spatial information intact. The bricks may be uploaded in any order, and therefore a standard 3D texture interpolation will fail when sampling close to brick borders. To handle this, each brick gets padded with a layer of voxels from its spatial neighbors before getting put in the tree structure. The padding is carried out in two steps, illustrated in figure 7.2:

1. Add an extra layer of voxels around the whole volume, copying the closest border voxel. This provides neighbors for the bricks on the border.

2. Treat each brick in isolation, and add an extra layer of voxels around each of the bricks. The added voxels are copies of the neighboring voxels in the volume. Note that this step is only done for the original voxels, not the extra layer we added in the previous step.

Figure 7.2: Example of (2D) padding. Original volume is 4 × 4 with 2 × 2 bricks. The resulting volume (right) is 8 × 8 with 4 × 4 padded bricks, getting the padding from the neighboring pixels or the added outer border.
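The two padding steps can be condensed into one operation: extracting each brick with a one-voxel border using clamp-to-edge lookups into the volume. A hypothetical sketch (function and parameter names are illustrative, and the single-timestep, cubic-volume case is assumed):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch of the brick padding: extract one brick with a
// one-voxel border from the full volume. Clamping the source coordinates
// reproduces both steps at once: interior bricks copy their neighbors'
// voxels, and bricks on the border repeat the outermost volume voxel.
std::vector<float> extractPaddedBrick(const std::vector<float>& volume,
                                      int dim,        // voxels per axis
                                      int brickSize,  // voxels per brick axis
                                      int bx, int by, int bz) { // brick coords
    const int padded = brickSize + 2;
    std::vector<float> brick(static_cast<std::size_t>(padded) * padded * padded);
    for (int z = 0; z < padded; ++z)
        for (int y = 0; y < padded; ++y)
            for (int x = 0; x < padded; ++x) {
                // Source coordinate, shifted by -1 for the border and
                // clamped to the volume (clamp-to-edge behavior).
                int sx = std::min(std::max(bx * brickSize + x - 1, 0), dim - 1);
                int sy = std::min(std::max(by * brickSize + y - 1, 0), dim - 1);
                int sz = std::min(std::max(bz * brickSize + z - 1, 0), dim - 1);
                brick[x + y * padded + z * padded * padded] =
                    volume[sx + sy * dim + sz * dim * dim];
            }
    return brick;
}
```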

28 7.2 TSP Tree Construction

The sampling scheme makes sure that samples are always taken inside or on the border of the original bricks. This ensures that correct interpolation can be done, since the neighboring voxels will be from the original data. The downside of the approach is the added number of voxels. Table 7.1 lists some typical brick sizes and compares the unpadded and padded voxel counts for a 256 × 256 × 256 volume with 256 time steps. As can be seen, padding with an extra layer of voxels results in a significant voxel increase. For the 8 × 8 × 8 brick case, padding almost doubles the volume size. For larger brick sizes, the added overhead is smaller. Sample code for the brick padding can be found in section A.2.

Brick size     Unpadded voxel count  Volume size with padding  Padded voxel count
8 × 8 × 8      9,797,856,768         320 × 320 × 320           19,136,439,000
16 × 16 × 16   9,797,595,136         288 × 288 × 288           13,950,091,512
32 × 32 × 32   9,795,502,080         272 × 272 × 272           11,749,341,240

Table 7.1: Comparison of unpadded and padded voxel counts for a 256×256×256 volume with 256 timesteps

Octree Construction

The first step in building the TSP tree is building one full octree from each time step in the input data. The octree construction is done by first rearranging the data into bricks of the chosen size and padding them (see previous section). These bricks are then given a new index using Z ordering (12). The Z-order (or Morton order) numbering arranges the bricks so that the nodes that will make up the children of a higher-level node end up next to each other. The octree is then built from the bottom up, averaging the bricks in groups of eight to build the parent nodes of the higher levels. The octree construction process is illustrated in figure 7.3. Each octree is saved to a separate file on disk, avoiding limitations on memory. Sample code for the octree construction can be found in section A.3.
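The Z-order numbering itself is a bit interleave of the brick coordinates. A hypothetical sketch (not the Forge implementation; 10 bits per axis assumed, enough for 1024 bricks per axis):

```cpp
#include <cstdint>

// Hypothetical sketch of the Z-order (Morton) numbering used for the
// bricks: the bits of x, y and z are interleaved, so the eight children
// of any octree node receive consecutive indices.
inline std::uint32_t mortonEncode3(std::uint32_t x, std::uint32_t y,
                                   std::uint32_t z) {
    std::uint32_t result = 0;
    for (std::uint32_t bit = 0; bit < 10; ++bit) {
        result |= ((x >> bit) & 1u) << (3 * bit)        // x in bit 0, 3, 6, ...
                | ((y >> bit) & 1u) << (3 * bit + 1)    // y in bit 1, 4, 7, ...
                | ((z >> bit) & 1u) << (3 * bit + 2);   // z in bit 2, 5, 8, ...
    }
    return result;
}
```

With this numbering, the eight bricks of a 2 × 2 × 2 block occupy indices 0 through 7, which is exactly the sibling grouping the bottom-up averaging needs.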

BST Assembling

When the octrees are built, they are assembled into the “binary search tree of octrees” described in chapter 5. This process is relatively simple. First, the leaf level of the BST (corresponding to individual time steps) is constructed by using the individual octrees as leaves. Then, the higher levels are built by averaging the two octrees below, so that each higher level represents a time span of twice the length of the spans on the level beneath it. This process is repeated until the root BST node has been constructed. Sample code for the BST assembling can be found in section A.4.

Figure 7.3: Octrees are built by first giving the bricks a new number, and then averaging nodes to build higher levels until the root is constructed.
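The level-by-level averaging can be sketched as follows. This is a simplified, hypothetical model in which each octree is stood in for by a single value; the real assembly averages whole octrees brick by brick, and a power-of-two number of time steps is assumed:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the temporal BST assembly: the leaf level holds
// one entry per time step (standing in for a full octree each), and every
// higher level averages pairs of children until a single root remains.
// Levels are returned root-first. Assumes a power-of-two leaf count.
std::vector<std::vector<float>> buildTemporalBst(std::vector<float> leaves) {
    std::vector<std::vector<float>> levels{leaves};
    while (levels.front().size() > 1) {
        const std::vector<float>& lower = levels.front();
        std::vector<float> upper(lower.size() / 2);
        for (std::size_t i = 0; i < upper.size(); ++i)
            upper[i] = 0.5f * (lower[2 * i] + lower[2 * i + 1]); // span doubles
        levels.insert(levels.begin(), upper);
    }
    return levels;
}
```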

7.3 TSP Data Format

The output from Forge, TSP files, adds a few header fields to the format inherited from Furnace. The additional values concern brick dimensions. The TSP format is described in table 7.2.

Data field                 Representation
Grid type                  unsigned integer
Number of time steps       unsigned integer
x dimension                unsigned integer
y dimension                unsigned integer
z dimension                unsigned integer
x brick dimension          unsigned integer
y brick dimension          unsigned integer
z brick dimension          unsigned integer
Number of bricks along x   unsigned integer
Number of bricks along y   unsigned integer
Number of bricks along z   unsigned integer
Voxel data                 float

Table 7.2: TSP data format

8 Rendering

The final piece of the pipeline is the rendering. The software producing the final renderings is called Flare, and represents the largest of the three stages.

8.1 Flare

Flare renders TSP files from the pre-processing step. The renderings are customized by choosing a number of parameters and a transfer function, both of which are described later in this chapter.

Figure 8.1: Overview of Flare.


8.2 TSP Structure and Error Metrics Construction

The overall details of the TSP tree structure and the error metrics can be found in chapter 5. This section further describes some of the implementation techniques and refers to source code samples.

TSP Structure Construction

The TSP structure used for traversal is kept in memory, both on the host and the GPU. It is not explicitly stored on disk, so it needs to be constructed before the rendering loop can be initiated. The construction is relatively quick and traverses the whole brick structure on disk once, keeping track of child indices and allocating space for error metrics (to be calculated in the next steps). Code for the construction function can be found in section A.5.

Error Metrics Calculation

The spatial error calculation runs in two passes. The first pass calculates the average color for each brick, and the second pass compares the brick average to the leaves that the current brick covers. The temporal error calculation also uses several passes. The first pass is run to keep track of each voxel’s average value over time. Then the modified standard deviation is calculated per voxel and then averaged over bricks. The error calculation can be omitted. If the user wants no errors, the calculation step is skipped and traversal will always reach the leaves. Equations for error calculation can be found in section 5.4, and code samples in section A.6 of the appendix.
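The two-pass temporal calculation can be sketched as follows. This is a hypothetical stand-in, not the Flare code: the exact modified standard deviation is defined in section 5.4, and a plain per-voxel standard deviation is used here instead. The frame layout and names are assumptions:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the two-pass temporal error for one brick.
// frames[t][v] is voxel v at time step t. Pass 1 computes the per-voxel
// average over time; pass 2 computes the per-voxel deviation from that
// average and averages it over the brick.
float temporalError(const std::vector<std::vector<float>>& frames) {
    const std::size_t T = frames.size();
    const std::size_t V = frames[0].size();
    // Pass 1: per-voxel average value over time.
    std::vector<float> mean(V, 0.0f);
    for (const auto& frame : frames)
        for (std::size_t v = 0; v < V; ++v)
            mean[v] += frame[v] / static_cast<float>(T);
    // Pass 2: per-voxel deviation, then average over the brick.
    float error = 0.0f;
    for (std::size_t v = 0; v < V; ++v) {
        float var = 0.0f;
        for (std::size_t t = 0; t < T; ++t) {
            float d = frames[t][v] - mean[v];
            var += d * d / static_cast<float>(T);
        }
        error += std::sqrt(var) / static_cast<float>(V);
    }
    return error;
}
```

A brick whose voxels are constant over time gets a zero error and can safely represent the whole time span it covers.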

Error Caching

Since the current implementation of the error metrics only depends on the voxel data, the calculated error for a TSP file can be reused. Flare saves the error metrics to a file which is read before subsequent renderings. The file is small and reading is fast. The simple caching approach enables pre-calculation of the error metrics. It is separated from Forge since the user may want to use different kinds of error metrics; in particular, color-based error metrics that depend on the current transfer function rather than the

raw intensity values (6). Such error metrics need to be re-calculated at every change of transfer function, and the mechanism therefore belongs in Flare.

8.3 Intra-Frame Pipeline

The rendering algorithm is two-staged. The data needs to be fetched from disk and uploaded to GPU memory, and a TSP tree probing step is responsible for requesting the right bricks to upload. When the bricks are uploaded, the ray-casting step renders the images. This section describes the intra-frame steps taken in detail.

View Ray Generation

When the model, view and projection matrices are updated, it is time to calculate the directions of the view rays. The algorithm used for this was proposed by Krüger and Westermann (13) and relies on rendering a colored cube. The volume to be rendered is bounded by a cube with its opposing corners in (0, 0, 0) and (1, 1, 1) (see figure 8.2). The cube is colored by letting each corner vertex also represent a color; the two aforementioned corners therefore represent a black and a white corner.

Figure 8.2: Bounding cube vertices

A simple GLSL shader interpolates the corner colors across the faces of the cube, resulting in a fully colored cube where the color at a point on the surface also represents the point's position in space. By rendering this colored bounding cube twice, once with back-face culling and once with front-face culling (figure 8.3), the view rays can be calculated. Given coordinates on the view plane, the direction of a view ray is calculated by taking the difference between the color of the front-facing point and the color of the back-facing point (figure 8.4).

Figure 8.3: Bounding cubes with interpolated colors. Left: Front faces. Right: Back faces.

Figure 8.4: Example of entry and exit point samples and resulting ray direction.
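The ray set-up reduces to a vector difference. A hypothetical sketch (names are illustrative; the actual work happens in shaders and kernels):

```cpp
#include <array>
#include <cmath>

// Hypothetical sketch of the ray set-up: the front-face and back-face
// colors of the bounding cube encode the entry and exit positions, so the
// view ray direction is simply their normalized difference.
std::array<float,3> rayDirection(const std::array<float,3>& entryColor,
                                 const std::array<float,3>& exitColor) {
    std::array<float,3> d{};
    float len = 0.0f;
    for (int i = 0; i < 3; ++i) {
        d[i] = exitColor[i] - entryColor[i];
        len += d[i] * d[i];
    }
    len = std::sqrt(len);
    for (int i = 0; i < 3; ++i) d[i] /= len;   // assumes entry != exit
    return d;
}
```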

TSP Tree Probing

The data resides in a file on disk and needs to be uploaded to GPU memory before the rendering can take place. To consolidate the uploading of all bricks into one single request and thereby minimize transfer overhead, a probing step is used. The TSP tree probing is a dry run of the rendering, where the result is a brick request list rather than a rendered image. The probing uses the same view rays and the same traversal algorithm as the subsequent rendering. Rays are shot through the volume, and every time a brick with acceptable error metrics (or a leaf) is found, the responsible OpenCL kernel increases a value in an array whose indices correspond to brick indices. After the probing, the bricks that will be needed have a count that is higher than zero. This process is illustrated in figure 8.5.

Figure 8.5: Brick counts before and after a tree probing step. Initially, all counts are set to zero. After the probing step, bricks which will be used during rendering will have a count higher than zero. The example uses only two view rays for simplicity and does not show the tree traversal process.
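The probing bookkeeping can be modeled on the host with a few lines. This is a hypothetical sketch, not the OpenCL kernel: each "ray" is just the list of brick indices it decided to use, and the increments that are atomic on the GPU become plain additions:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the probing bookkeeping: each ray increments the
// count of every brick it visits with acceptable error, and the request
// list afterwards contains exactly the bricks with a non-zero count.
std::vector<int> buildRequestList(const std::vector<std::vector<int>>& rays,
                                  std::size_t numBricks) {
    std::vector<int> counts(numBricks, 0);
    for (const auto& ray : rays)
        for (int brick : ray)
            ++counts[static_cast<std::size_t>(brick)];  // atomic on the GPU
    std::vector<int> request;
    for (std::size_t i = 0; i < numBricks; ++i)
        if (counts[i] > 0) request.push_back(static_cast<int>(i));
    return request;
}
```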

Brick Uploading

The brick request list generated by the probing step is scanned, and every brick with a count higher than zero is put into a brick list. While this brick list is built, every added brick is also assigned a coordinate in the brick atlas, the 3D texture that will hold the uploaded bricks. This coordinate is saved in the brick list, thereby mapping every brick to a unique atlas lookup position. Note that the atlas coordinates do not have any spatial meaning; they are only a way to keep track of where the rendering kernel will fetch the data. The data upload occurs in two steps. First, the data from disk is uploaded to an OpenGL Pixel Buffer Object (PBO) that is mapped to system memory. The PBO corresponds to the 3D texture that will store the atlas. The uploading is done by scanning the brick list and placing each brick in the correct spot in the PBO. If the uploading algorithm detects several consecutive bricks to be uploaded, those are read together to avoid disk read overhead. When the PBO is populated with the brick data, an OpenGL 3D texture is built by copying the pixels from the PBO. The 3D texture is then ready for use by the GPU rendering kernel. See figure 8.6 for a schematic overview of the brick upload process.

Figure 8.6: Brick uploading. The bricks in the brick list (top) are read from disk, copied to the right position in the PBO in memory and then uploaded to the 3D texture on the GPU.
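The coalescing of consecutive disk reads can be sketched in isolation. This is a hypothetical helper, not the Flare code; it assumes the request list is sorted by brick index, which holds when the list is produced by a linear scan of the counts:

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch of the read coalescing: runs of consecutive brick
// indices in the (sorted) request list are merged into single disk reads,
// returned as (first brick, run length) pairs.
std::vector<std::pair<int,int>>
coalesceReads(const std::vector<int>& requestList) {
    std::vector<std::pair<int,int>> reads;
    for (int brick : requestList) {
        if (!reads.empty() &&
            reads.back().first + reads.back().second == brick)
            ++reads.back().second;          // brick continues the current run
        else
            reads.push_back({brick, 1});    // start a new run
    }
    return reads;
}
```

Each resulting pair maps to one contiguous read from the TSP file, amortizing the per-read overhead over a whole run of bricks.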

Ray-Casting

The rendering kernel can be launched as soon as the 3D texture is ready. The rendering process traverses the TSP tree in the same way as during probing (see section 3.2) and with the same view parameters, thus visiting the same bricks. The sample position (converted to spherical coordinates if needed) gets translated to the correct texture atlas coordinates and samples the brick in that atlas cell. The samples are composited in the manner described in section 3.1 to render the final color for the sampled view plane coordinate.


8.4 Asynchronous Execution

The rendering of each individual frame must follow the logical order presented above, but the pipeline's bandwidth can be used more efficiently by interleaving the rendering steps of two rendering iterations. Since different tasks in the pipeline are handled by different parts of the system, parallel execution is important for performance. In a simple model, some tasks are handled by the CPU and others by the GPU. Figure 8.7 describes the order in which tasks are processed. Note that the figure does not show the correct relation between execution times. See section 9.2 for measured times, and section A.7 in the appendix for the complete rendering loop code.

Figure 8.7: Simplified overview of the interleaving of rendering steps. Each color represents a different time step. The figure shows two rendering iterations, where the frame at t=2 (in green) gets completely processed at the same time as the neighboring frames get finalized or initiated.
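The interleaved schedule can be modeled as a timeline. This is a hypothetical, purely host-side sketch of the ordering (the real loop drives OpenCL kernels and OpenGL uploads): while frame t is rendered, frame t+1 is already being uploaded.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the interleaved schedule: in each iteration the
// host uploads frame t+1 while the GPU renders frame t, so consecutive
// frames overlap instead of running strictly back to back.
std::vector<std::string> schedule(int numFrames) {
    std::vector<std::string> timeline;
    timeline.push_back("upload 0");                  // fill the pipeline
    for (int t = 0; t < numFrames; ++t) {
        std::string step = "render " + std::to_string(t);
        if (t + 1 < numFrames)                       // overlap with next upload
            step += " | upload " + std::to_string(t + 1);
        timeline.push_back(step);
    }
    return timeline;
}
```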

8.5 Rendering Parameters

The rendering application can be configured using a few different parameters. Below is a list of these parameters and their meaning.

Local OpenCL work size (x and y) Changes the local work size for the OpenCL probing and ray-casting kernels. Can be used to tune performance. See NVIDIA’s OpenCL Best Practices Guide (14) for performance heuristics.

Texture division factor A higher factor than 1 decreases the output texture size, thereby lowering the number of calculated view rays. This factor can be used to easily give up quality to gain speed.

Spatial error tolerance Maximum tolerable spatial error.

Temporal error tolerance Maximum tolerable temporal error.


Error calculation (on/off) If turned off, the error calculation step is skipped and tolerances are set to zero.

Probing step size Step size in the probing kernel.

Ray-casting step size Step size in the ray-casting kernel.

Ray-casting intensity A factor that the final colors get multiplied with. Used to adjust image brightness.

8.6 Cluster Rendering

Large dome theater displays often use a cluster of rendering computers and projectors to be able to render on very large and curved screens. Such a cluster needs to be able to divide the rendering work between its nodes, in such a way that each node renders a portion of the scene without visible seams and artifacts.

SGCT

Simple Graphics Cluster Toolkit (SGCT) is developed at Linköping University. It is a cross-platform C++ library enabling graphics synchronization over a cluster of computers. A rough and basic SGCT integration is used as the rendering system for the work presented in this thesis, enabling dome rendering and stereography as well as standard desktop rendering.

9 Results

9.1 Hardware

Development, rendering and testing have been carried out on a standard desktop computer, equipped with the following hardware:

• 16 GB system memory

• SATA 2.0 SSD drive

• GPU: GeForce GTX 690, two cores with 2 GB RAM each, PCI Express 3.0

9.2 Rendering Benchmarks

Table 9.1 shows a selection of benchmarks of the different steps taken in the rendering loop. Each measure has been determined by averaging a number of runs. The total rendering loop time has been measured when utilizing the asynchronous execution of the rendering, while the other time benchmarks have been measured individually.

9.3 Error Metrics Benchmarks

To benchmark the efficiency of the error metrics approach, three levels of error have been determined, labeled as no, low and high error. The levels have been determined using a combination of subjective visual quality and the approximate share of bricks (out of the total amount in the volume) being used and/or cached while rendering. Both temporal and spatial errors have been accepted at the low and high error levels. The low error level corresponds to relatively poor visual results, but is still usable under some conditions. The high setting produces renderings with very large artifacts and can only be used for benchmarking. The results are shown in table 9.2. The no error level uses 100% of the bricks. The low and high levels use approximately 75% and 40% of the bricks, respectively.

A    B    C     D     E     F      G      H     I        J
128  128  0.02  0.01  0.07  0.019  0.028  0.05  0.00005  0.0002
128  128  0.04  0.03  0.07  0.017  0.010  0.05  0.00005  0.0002
128  16   0.02  0.01  0.14  0.023  0.062  0.10  0.010    0.0002
128  16   0.04  0.03  0.12  0.017  0.022  0.10  0.010    0.0002
128  32   0.02  0.01  0.10  0.016  0.044  0.07  0.013    0.0002
128  32   0.04  0.03  0.08  0.020  0.018  0.07  0.013    0.0002
256  32   0.02  0.01  0.50  0.036  0.069  0.42  0.010    0.0002
256  32   0.04  0.03  0.49  0.016  0.027  0.42  0.010    0.0002
256  64   0.02  0.01  0.41  0.023  0.056  0.39  0.013    0.0002
256  64   0.04  0.03  0.41  0.022  0.024  0.39  0.013    0.0002

A: Number of voxels per axis in full volume
B: Number of voxels per axis in bricks
C: Probing step size
D: Ray-caster step size
E: Total rendering loop time in seconds
F: Probing kernel execution time in seconds
G: Ray caster kernel execution time in seconds
H: Disk to PBO upload time in seconds
I: Read brick request list and build brick list time in seconds
J: Other render loop steps (proxy geometry, textures et cetera) in seconds

Table 9.1: Rendering loop benchmarking. Measurements made while rendering 256 time steps of an ENLIL model.

9.4 Visual Results

This section shows samples from interactive renderings of a CME event.


Brick size  Error level  Total render loop time (s)
16          no           0.12
16          low          0.16
16          high         0.07
32          no           0.10
32          low          0.07
32          high         0.05

Table 9.2: Error metrics benchmarking. Measurements made while rendering 256 time steps of an ENLIL model with 128 voxels per axis.

Desktop Rendering

Figure 9.1 shows three renderings of the same sequence, each using a different transfer function. The three screenshots from each sequence are from the beginning, middle and end of the visible CME event.

Dome Rendering

The Hayden Planetarium at the American Museum of Natural History in New York, USA, houses an immersive fulldome theater. This theater is used for high-quality space productions, both pre-rendered and interactive. AMNH is an important collaborator in the Open Space project, and a test run of a cluster implementation of the rendering software was successfully carried out on-site in the planetarium. See figure 9.2 for a photo of the occasion.


Figure 9.1: Rendering screenshots, each column with a different transfer function.


Figure 9.2: SGCT was used to enable this interactive space weather rendering at the Hayden Planetarium at the American Museum of Natural History in New York, USA.


10 Discussion and Future Work

10.1 Visual Quality

Rendering

The visual quality of the renderings is adequate given the simple rendering approach. While the produced images are correct and informative, it would be desirable to increase the resolution of the volumes further, as volumes of 128 or 256 voxels per axis often will be too low-resolution for real applications.

10.2 Interactivity

Performance

In an interactive application, performance is obviously crucial. A certain framerate has to be reached both to give the user a good viewing experience and to make interactions responsive. If the framerate drops too low, interaction becomes unintuitive and loses its usefulness. The performance measurements have shown that the application can run on a consumer-grade desktop computer and reach good framerates for the volumes used.


10.3 Pipeline

Encapsulation

The choice to split the software into three individual parts has been one of the major decisions in the development. While the pipeline has not yet been fully exercised with different data sources, one of the requirements has been to handle a large variety of them. The encapsulation provides a clean funnel through which new kinds of volumetric data can be added without altering the tree construction or rendering. In the same way, the rendering or the tree construction can be changed without worrying about the other stages of the pipeline. In an experimental proof-of-concept application like the one implemented, this has been very important.

10.4 TSP Tree Structure

Construction

The implementation of the TSP tree construction is relatively straightforward. The focus has been on producing correct and robust results rather than making the process fast. The same priority motivated the use of several temporary files on disk during the creation process. The technique effectively removes many concerns related to memory availability when constructing potentially very large tree structures. Naturally, the trade-off for this capability is speed. A great increase in efficiency could be achieved with a more dynamic solution where fast system memory is used as much as possible, switching to disk only when needed. The algorithm could also benefit greatly from parallelization, but that also requires not depending on the slow disk read/write bandwidth for many operations.

Storage

Storing the raw data on disk and the tree structure in memory has proven to be an efficient solution, also used in projects of larger scale. The time spent on constructing the structure and uploading it to the graphics card is very small compared to the transfer of data or kernel execution, and the reads from the structure in the kernels are also a very small and quick part of the algorithm.


The key to this approach is the use of bricks. The brick concept is fundamental since it provides a way to make the tree structure several orders of magnitude smaller than the complete data set. It also provides a natural domain in which to filter and calculate error metrics, as bricks have a spatial meaning. For caching purposes, it is important that the bricks themselves do not need to know their place in the full structure. This requires that the tree structure is kept in order at all times but, again, the overhead of this structure is very small compared to the benefits of being able to put bricks in arbitrary positions in the texture atlas that gets uploaded to the GPU.

10.5 Data Formats

The data format chosen for the implementation reflects the encapsulation in the pipeline, being somewhat redundant and requiring careful structuring. As we have seen, the individual parts of the pipeline can only communicate using files, so it is very important that the data formats are kept intact to avoid changes in many parts of the software. Considering this, it would be desirable to further break out and abstract the data format definition outside the current pipeline and make the read and write operations more flexible. For example, in a larger-scale implementation, it needs to be easier to add an extra variable to a header.

10.6 Error Metrics

Calculation

The visual and performance-related results have shown that the concept of error metrics can be useful. An increased error tolerance yields a shorter rendering time, as fewer bricks need to be uploaded and traversed. Additionally, the way these metrics are calculated does take spatial and temporal coherence into account. Areas with less variation get more heavily filtered than areas with more change, such as the areas where a CME front develops. However, there is much to improve in this area. Since the background winds in a CME simulation are inherently rotating, filtering in the temporal domain quickly leads to very visible artifacts. The rotation of the magnetic fields needs to be smooth for a good visual experience. While this rotation makes temporal filtering hard in the current software state, there are very large gains to be made with a more sophisticated implementation. If the movements of the background winds can be predicted, it would be possible to reuse large portions of the data by merely changing the position accordingly. There is often no need to update this data as it rarely changes intensity, only position. Another area that needs improvement before the error metrics can be truly useful is the calculation efficiency. As with the TSP tree construction, the implementation is currently very straightforward and unoptimized. Traversing the tree structure several times to calculate averages over both the spatial and temporal domains leads to an unacceptable time complexity. Ellsworth et al. (6) show alternative implementations of error calculations. However, that approach relies on errors based on color, which has proven to be troublesome (see chapter 5). As discussed in section 5.4, the original implementation has been altered. Working with large areas with intensities close to zero leads to numerical problems. The same type of problem arose when exploring the color-based approach. Using color as a reference could be beneficial for visual results, but using colors as ordinal values has its drawbacks. Colors with small intensities (large, black areas) again lead to numerical problems and inconsistencies. Ideally, the error metrics calculation should be able to measure coherency uniformly in dark as well as bright areas.

Control

The current error metrics implementation has two major drawbacks: error tolerances cannot be adjusted in real time, and measurements are not based on color. This means that the efficiency might be visually very different for different transfer functions, and also that it is very hard to see the effects quickly. For future work, a further investigation of color-based and real-time error metrics could prove very useful. Choosing the correct brick size is very important, and the brick approach means that a key trade-off has to be made. With small bricks, the filtering schemes and error calculations are more fine-grained. Errors will be calculated for smaller areas and visual artifacts may be smaller. On the other hand, the tree structure gets larger and the traversal slower. The overhead from duplicating border voxels in the padding step also gets larger, but that might not be a problem unless bricks get very small.


10.7 Rendering

As the focus of this project has been on the pipeline and the preparatory stages, the rendering technique can be improved substantially. While the data request and rendering loop have been developed to fit the pipeline, the ray-casting rendering itself is relatively unsophisticated. There are several ways to improve this. For example, a volume sampling scheme capable of adapting to the detail level of the volume would save a lot of samples and improve the visual quality. It is very important to be able to integrate various kinds of data and objects in the future Open Space project. Planets, spacecraft, text labels and field lines are just some examples of items that could fit into a scene. This has to be taken into consideration when further developing the rendering system and choosing the appropriate methods. As these items, or any other phenomena to be rendered, can have very large variations in scale, adaptiveness is important not just in a volume data set but for all kinds of data in a scene. The dual-loop implementation with one data request pass and one subsequent rendering pass is in principle a simplified version of the advanced approach presented in GigaVoxels (7). While the focus of GigaVoxels is shifted towards static (but large) scenes, there are elements that could prove useful for future work. For example, the GigaVoxels cache manager, using a least recently used updating policy, could prove efficient in combination with a further developed temporal caching approach. The data streaming system in GigaVoxels also takes visibility into account. That is important for scenes originating from mesh data, but not as crucial in volumetric scenes where the whole volume is visible. On the other hand, a rendering scheme that can provide varying levels of detail is desirable. For example, spending less time on far-away voxels, or on voxels that will not contribute to an already saturated viewing ray, would save a lot of processing power.

When using relatively small bricks, the brick padding solution can mean a doubling of the volume size. As smaller bricks might be desirable for fine-grained error calculations, one has to be careful when choosing the size of the bricks. The balance between error metrics control, traversal speed and storage size must be adapted to each application.


11 Conclusions

The renderings produced using the implemented system have been correct, useful and running at interactive rates. The visual quality is good, while allowing for several further improvements. The system can produce these images on a consumer-grade hardware configuration as well as in a clustered environment, showing flexibility and adaptability. As shown before, volumetric rendering is very useful for visualizing space-related data in 3D. For a larger-scale system, such as the future Open Space project, some important areas of improvement can be summarized:

• The error metrics system needs to be more stable, efficient and intuitive. A color-based, real-time solution would be desirable.

• The TSP tree solution can be useful after optimizing the construction stages and the GPU traversal scheme.

• While the basic concepts of the brick uploading, caching and traversal work well, a more mature system needs a dynamic approach that can seamlessly switch between in-memory scenes and disk uploads, which could reduce overhead for small scenes or on systems with large amounts of memory available.

Equally important, there are also specific approaches that have shown to be useful:

• Encapsulation at logical places in the pipeline is important for flexibility and adaptability. This allows certain areas to be improved or changed without affecting the other parts of the system. It is important to decide on these stages early in development.

• Using bricks for the data and brick pointers for the in-memory traversal enables on-demand data uploads, something that is vital for large scenes that do not fit into memory. The choice of brick size is very important, and balances many performance aspects.

• Storing and building the tree structures on disk is important for letting the system scale and handle very large amounts of data.

References

[1] Committee on Solar and Space Physics and Committee on Solar-Terrestrial Research. Space Weather: A Research Perspective, 1997.

[2] The National Space Weather Program Council. The National Space Weather Program: The Strategic Plan, 1995.

[3] M. Törnros. Interactive visualization of space weather data. Master's thesis, Linköping University, Sweden, 2013.

[4] Jennis Meyer-Spradow, Timo Ropinski, Jörg Mensmann, and Klaus Hinrichs. Voreen: A rapid-prototyping environment for ray-casting-based volume visualizations. IEEE Computer Graphics and Applications, 29(6):6–13, 2009.

[5] Han-Wei Shen, Ling-Jen Chiang, and Kwan-Liu Ma. A fast volume rendering algorithm for time-varying fields using a time-space partitioning (TSP) tree. In Proceedings of the Conference on Visualization '99: Celebrating Ten Years, VIS '99, pages 371–377, Los Alamitos, CA, USA, 1999. IEEE Computer Society Press.

[6] David Ellsworth, Ling-Jen Chiang, and Han-Wei Shen. Accelerating time-varying hardware volume rendering using TSP trees and color-based error metrics. In Proceedings of the 2000 IEEE Symposium on Volume Visualization, VVS '00, pages 119–128, New York, NY, USA, 2000. ACM.

[7] Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre, and Elmar Eisemann. GigaVoxels: Ray-guided streaming for efficient and detailed voxel rendering. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, I3D '09, pages 15–22, New York, NY, USA, 2009. ACM.

[8] Daniel Reiter Horn, Jeremy Sugerman, Mike Houston, and Pat Hanrahan. Interactive k-d tree GPU raytracing. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, I3D '07, pages 167–174, New York, NY, USA, 2007. ACM.

[9] Hong Xie, Leon Ofman, and Gareth Lawrence. Cone model for halo CMEs: Application to space weather forecasting. Journal of Geophysical Research, 109(A03109), 2004.

[10] Community Coordinated Modeling Center. http://ccmc.gsfc.nasa.gov/. Accessed: 2014-02-09.

[11] Community Coordinated Modeling Center. Kameleon - conversion, access, interpolation. http://ccmc.gsfc.nasa.gov/downloads/kameleon.pdf, 2006. Accessed: 2014-02-09.

[12] G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. In IBM Germany Scientific Symposium Series, 1966.

[13] J. Krüger and R. Westermann. Acceleration techniques for GPU-based volume rendering. In Proceedings of the 14th IEEE Visualization 2003 (VIS '03), pages 38–, Washington, DC, USA, 2003. IEEE Computer Society.

[14] NVIDIA Corporation. NVIDIA OpenCL Best Practices Guide, 2009.

Appendix A

Code Samples

A.1 TSP Tree Traversal

// OpenCL kernel

// Mirrors struct on host side
struct TraversalConstants {
  int gridType;
  float stepsize;
  int numTimesteps;
  int numValuesPerNode;
  int numOTNodes;
  float temporalTolerance;
  float spatialTolerance;
};

// Return index to left BST child (low timespan)
int LeftBST(int bstNodeIndex, int numValuesPerNode, int numOTNodes,
            bool bstRoot, __global __read_only int *tsp) {
  // If the BST node is a root, the child pointer is used for the OT.
  // The child index is next to the root.
  // If not root, look up in TSP structure.
  if (bstRoot) {
    return bstNodeIndex + numOTNodes;
  } else {
    return tsp[bstNodeIndex*numValuesPerNode + 1];
  }
}

// Return index to right BST child (high timespan)
int RightBST(int bstNodeIndex, int numValuesPerNode, int numOTNodes,
             bool bstRoot, __global __read_only int *tsp) {
  if (bstRoot) {
    return bstNodeIndex + numOTNodes*2;
  } else {
    return tsp[bstNodeIndex*numValuesPerNode + 1] + numOTNodes;
  }
}

// Return child node index given a BST node, a time span and a timestep
// Updates the timespan
int ChildNodeIndex(int bstNodeIndex, int *timespanStart, int *timespanEnd,
                   int timestep, int numValuesPerNode, int numOTNodes,
                   bool bstRoot, __global __read_only int *tsp) {
  // Choose left or right child
  int middle = *timespanStart + (*timespanEnd - *timespanStart)/2;
  if (timestep <= middle) {
    // Left subtree
    *timespanEnd = middle;
    return LeftBST(bstNodeIndex, numValuesPerNode, numOTNodes,
                   bstRoot, tsp);
  } else {
    // Right subtree
    *timespanStart = middle + 1;
    return RightBST(bstNodeIndex, numValuesPerNode, numOTNodes,
                    bstRoot, tsp);
  }
}

// Return the brick index that a BST node represents
int BrickIndex(int bstNodeIndex, int numValuesPerNode,
               __global __read_only int *tsp) {
  return tsp[bstNodeIndex*numValuesPerNode + 0];
}

// Check if a BST node is a leaf or not
bool IsBSTLeaf(int bstNodeIndex, int numValuesPerNode,
               bool bstRoot, __global __read_only int *tsp) {
  if (bstRoot) return false;
  return (tsp[bstNodeIndex*numValuesPerNode + 1] == -1);
}

// Check if an OT node is a leaf or not
bool IsOctreeLeaf(int otNodeIndex, int numValuesPerNode,
                  __global __read_only int *tsp) {
  // CHILD_INDEX is at offset 1, and -1 represents leaf
  return (tsp[otNodeIndex*numValuesPerNode + 1] == -1);
}

// Return OT child index given current node and child number (0-7)
int OTChildIndex(int otNodeIndex, int numValuesPerNode, int child,
                 __global __read_only int *tsp) {
  int firstChild = tsp[otNodeIndex*numValuesPerNode + 1];
  return firstChild + child;
}

float TemporalError(int bstNodeIndex, int numValuesPerNode,
                    __global __read_only int *tsp) {
  return as_float(tsp[bstNodeIndex*numValuesPerNode + 3]);
}

float SpatialError(int bstNodeIndex, int numValuesPerNode,
                   __global __read_only int *tsp) {
  return as_float(tsp[bstNodeIndex*numValuesPerNode + 2]);
}

// Given a point, a box mid value and an offset,
// return the enclosing octree child (0-7)
// Child numbering: bit 0 = +x, bit 1 = +y, bit 2 = +z
int EnclosingChild(float3 P, float boxMid, float3 offset) {
  int child = 0;
  if (P.x >= boxMid + offset.x) child += 1;
  if (P.y >= boxMid + offset.y) child += 2;
  if (P.z >= boxMid + offset.z) child += 4;
  return child;
}

// Update octree offset using the same child numbering
void UpdateOffset(float3 *offset, float boxDim, int child) {
  if (child & 1) offset->x += boxDim;
  if (child & 2) offset->y += boxDim;
  if (child & 4) offset->z += boxDim;
}

// Given an octree node index, traverse the corresponding BST tree and look
// for a useful brick.
bool TraverseBST(int otNodeIndex, int *brickIndex, int timestep,
                 __constant struct TraversalConstants *constants,
                 __global volatile int *reqList,
                 __global __read_only int *tsp) {

  // Start at the root of the current BST
  int bstNodeIndex = otNodeIndex;
  bool bstRoot = true;
  int timespanStart = 0;
  int timespanEnd = constants->numTimesteps;

  // Rely on structure for termination
  while (true) {

    // Update brick index (regardless of whether we use it or not)
    *brickIndex = BrickIndex(bstNodeIndex,
                             constants->numValuesPerNode, tsp);

    // If temporal error is OK
    if (TemporalError(bstNodeIndex, constants->numValuesPerNode,
                      tsp) <= constants->temporalTolerance) {

      // If the OT node is a leaf, we can't do any better spatially so we
      // return the current brick
      if (IsOctreeLeaf(otNodeIndex, constants->numValuesPerNode, tsp)) {
        return true;

      // All is well!
      } else if (SpatialError(bstNodeIndex, constants->numValuesPerNode,
                              tsp) <= constants->spatialTolerance) {
        return true;

      // If spatial failed and the BST node is a leaf,
      // the traversal will continue in the octree (we know that
      // the octree node is not a leaf)
      } else if (IsBSTLeaf(bstNodeIndex, constants->numValuesPerNode,
                           bstRoot, tsp)) {
        return false;

      // Keep traversing BST
      } else {
        bstNodeIndex = ChildNodeIndex(bstNodeIndex, &timespanStart,
                                      &timespanEnd, timestep,
                                      constants->numValuesPerNode,
                                      constants->numOTNodes,
                                      bstRoot, tsp);
      }

    // If temporal error is too big and the node is a leaf,
    // return false to traverse OT
    } else if (IsBSTLeaf(bstNodeIndex, constants->numValuesPerNode,
                         bstRoot, tsp)) {
      return false;

    // If temporal error is too big and we can continue
    } else {
      bstNodeIndex = ChildNodeIndex(bstNodeIndex, &timespanStart,
                                    &timespanEnd, timestep,
                                    constants->numValuesPerNode,
                                    constants->numOTNodes,
                                    bstRoot, tsp);
    }

    bstRoot = false;
  }
}

// Traverse one ray through the volume, build brick list
void TraverseOctree(float3 rayO,
                    float3 rayD,
                    float maxDist,
                    __constant struct TraversalConstants *constants,
                    __global volatile int *reqList,
                    __global __read_only int *tsp,
                    const int timestep) {

  float stepsize = constants->stepsize;
  float3 P = rayO;
  // Keep traversing until the sample point goes outside the unit cube
  float traversed = 0.0;
  while (traversed < maxDist) {

    // Reset traversal variables
    float3 offset = (float3)(0.0);
    float boxDim = 1.0;
    int child;

    // Init the octree node index to the root
    int otNodeIndex = OctreeRootNodeIndex();

    // Start traversing octree
    // Rely on finding a leaf for loop termination
    while (true) {

      // See if the BST tree is good enough
      int brickIndex = 0;
      bool bstSuccess = TraverseBST(otNodeIndex, &brickIndex, timestep,
                                    constants, reqList, tsp);

      if (bstSuccess) {

        // Visit and use brick (e.g. probing or rendering)
        UseBrick(brickIndex);
        // We are now done with this node, so go to next
        break;

      } else if (IsOctreeLeaf(otNodeIndex,
                              constants->numValuesPerNode, tsp)) {
        // If the BST lookup failed but the octree node is a leaf,
        // use the brick anyway (it is the BST leaf)
        UseBrick(brickIndex);
        // We are now done with this node, so go to next
        break;

      } else {
        // If the BST lookup failed and we can traverse the octree,
        // visit the child that encloses the point

        // Next box dimension
        boxDim = boxDim/2.0;

        // Current mid point
        float boxMid = boxDim;

        // Check which child encloses P
        if (constants->gridType == 0) { // Cartesian
          child = EnclosingChild(P, boxMid, offset);
        } else { // Spherical (==1)
          child = EnclosingChild(CartesianToSpherical(P),
                                 boxMid, offset);
        }

        // Update offset
        UpdateOffset(&offset, boxDim, child);

        // Update node index to new node
        otNodeIndex = OTChildIndex(otNodeIndex,
                                   constants->numValuesPerNode,
                                   child, tsp);
      }

    } // while traversing

    // Advance the sample point along the ray
    traversed += stepsize;
    P += stepsize*rayD;

  } // while (traversed < maxDist)
}

A.2 Brick Padding

// Loop over all timesteps
for (unsigned int i=0; i<numTimesteps; ++i) {

  // Allocate space for the non-padded timestep data
  std::vector<float> timestepData(xDim*yDim*zDim,
                                  static_cast<float>(0));

  // Point to the right position in the file stream and read it
  off_t timestepSize = xDim*yDim*zDim*dataSize;
  off_t timestepOffset =
    static_cast<off_t>(i)*timestepSize + headerOffset;
  fseeko(in, timestepOffset, SEEK_SET);
  fread(reinterpret_cast<char*>(&timestepData[0]),
        timestepSize, 1, in);

  // We now have a non-padded time step, and need to pad the borders

  // Allocate space for the padded data
  std::vector<float> paddedData(xPaddedDim*yPaddedDim*zPaddedDim,
                                static_cast<float>(0));

  // Loop over the padded volume that we want to fill
  // xp = "x padded"
  // xo = "x original"
  unsigned int xo, yo, zo;
  for (unsigned int zp=0; zp<zPaddedDim; ++zp) {
    for (unsigned int yp=0; yp<yPaddedDim; ++yp) {
      for (unsigned int xp=0; xp<xPaddedDim; ++xp) {

        if (xp == 0) {
          xo = xp;
        } else if (xp == xPaddedDim-1) {
          xo = xp-2;
        } else {
          xo = xp-1;
        }

        if (yp == 0) {
          yo = yp;
        } else if (yp == yPaddedDim-1) {
          yo = yp-2;
        } else {
          yo = yp-1;
        }

        if (zp == 0) {
          zo = zp;
        } else if (zp == zPaddedDim-1) {
          zo = zp-2;
        } else {
          zo = zp-1;
        }

        paddedData[xp + yp*xPaddedDim + zp*xPaddedDim*yPaddedDim] =
          timestepData[xo + yo*xDim + zo*xDim*yDim];
      }
    }
  }

  // Create a container for the octree leaf level bricks
  std::vector<Brick<float>*> baseLevelBricks(numBricksBaseLevel, NULL);

  // Loop over the volume's subvolumes and create one brick for each
  for (unsigned int zBrick=0; zBrick<zNumBricks; ++zBrick) {
    for (unsigned int yBrick=0; yBrick<yNumBricks; ++yBrick) {
      for (unsigned int xBrick=0; xBrick<xNumBricks; ++xBrick) {

        Brick<float> *brick = Brick<float>::New(xPaddedBrickDim,
                                                yPaddedBrickDim,
                                                zPaddedBrickDim,
                                                static_cast<float>(0));

        // Loop over the subvolume's voxels
        unsigned int xMin = xBrick*xBrickDim;
        unsigned int xMax = (xBrick+1)*xBrickDim - 1 + paddingWidth*2;
        unsigned int yMin = yBrick*yBrickDim;
        unsigned int yMax = (yBrick+1)*yBrickDim - 1 + paddingWidth*2;
        unsigned int zMin = zBrick*zBrickDim;
        unsigned int zMax = (zBrick+1)*zBrickDim - 1 + paddingWidth*2;

        unsigned int zLoc = 0;
        for (unsigned int zSub=zMin; zSub<=zMax; ++zSub) {
          unsigned int yLoc = 0;
          for (unsigned int ySub=yMin; ySub<=yMax; ++ySub) {
            unsigned int xLoc = 0;
            for (unsigned int xSub=xMin; xSub<=xMax; ++xSub) {
              // Look up global index in full volume
              unsigned int globalIndex =
                xSub + ySub*xPaddedDim + zSub*xPaddedDim*yPaddedDim;
              // Set data at local subvolume index
              brick->SetData(xLoc, yLoc, zLoc, paddedData[globalIndex]);
              xLoc++;
            }
            yLoc++;
          }
          zLoc++;
        }

        // Save to octree leaf level
        unsigned int brickIndex =
          xBrick + yBrick*xNumBricks + zBrick*xNumBricks*yNumBricks;
        baseLevelBricks[brickIndex] = brick;
      }
    }
  }
}

A.3 Octree Construction

// Loop over all timesteps
for (unsigned int i=0; i<numTimesteps; ++i) {

  // Container for this timestep's bricks, in octree order
  std::vector<Brick<float>*> octreeBricks(numBricksPerOctree);

  // Use Z-order coordinates to rearrange the base level bricks
  // so that the eight children for each parent node lie
  // next to each other
  for (uint16_t z=0; z<static_cast<uint16_t>(zNumBricks); ++z) {
    for (uint16_t y=0; y<static_cast<uint16_t>(yNumBricks); ++y) {
      for (uint16_t x=0; x<static_cast<uint16_t>(xNumBricks); ++x) {
        unsigned int zOrderIdx =
          static_cast<unsigned int>(ZOrder(x, y, z));
        unsigned int idx =
          x + y*xNumBricks + z*xNumBricks*yNumBricks;
        octreeBricks[zOrderIdx] = baseLevelBricks[idx];
      }
    }
  }

  // Construct higher levels of octree

  // Position for next brick, starting at position beyond base level
  unsigned int brickPos = numBricksBaseLevel;
  // Position for first child to average
  unsigned int childPos = 0;

  while (brickPos < numBricksPerOctree) {
    // Filter the eight children and then combine them to build
    // the higher level node
    std::vector<Brick<float>*> filteredChildren(8, NULL);
    unsigned int c = 0;
    for (unsigned int child=childPos; child<childPos+8; ++child) {
      Brick<float> *filteredChild =
        Brick<float>::Filter(octreeBricks[child]);
      filteredChildren[c++] = filteredChild;
    }
    Brick<float> *newBrick = Brick<float>::Combine(filteredChildren);

    // Free up some memory
    for (auto it=filteredChildren.begin();
         it!=filteredChildren.end(); ++it) {
      delete *it;
      *it = NULL;
    }

    // Set next child pos
    childPos += 8;

    // Save new brick
    octreeBricks[brickPos++] = newBrick;
  }

  // Write octree to file
  for (auto it=octreeBricks.begin(); it!=octreeBricks.end(); ++it) {
    fwrite(reinterpret_cast<char*>(&(*it)->data[0]),
           static_cast<size_t>((*it)->Size()), 1, out);
    // Free memory when we're done
    delete *it;
  }
}

A.4 BST Assembling

// If the number of timesteps is not a power of two, copy the last timesteps
// enough times to make the number a power of two
CheckPowerOfTwo();

// Create base level temp file by reversing the level order

{ // Scoping files

  std::FILE *in = fopen(tempFilename.c_str(), "r");
  if (!in) return false;
  std::FILE *out = fopen(newFilename.c_str(), "w");
  if (!out) return false;

  // Read one octree level at a time, starting from the back of source
  // Write to out file in reverse order

  for (unsigned int ts=0; ts<numTimesteps; ++ts) {

    // Position at end of the current timestep
    off_t octreePos =
      static_cast<off_t>((numOTNodes)*numBrickVals*(ts+1));

    for (unsigned int level=0; level<numOTLevels; ++level) {

      // Number of values in the level currently at the back
      // (reconstructed: the deepest remaining octree level holds
      // 8^(numOTLevels-1-level) nodes)
      unsigned int valuesPerLevel = numBrickVals*
        static_cast<unsigned int>(pow(8, numOTLevels-1-level));
      octreePos -= valuesPerLevel;

      std::vector<float> buffer(valuesPerLevel);

      fseeko(in, octreePos*(off_t)sizeof(float), SEEK_SET);
      size_t readSize =
        static_cast<size_t>(valuesPerLevel)*sizeof(float);
      fread(reinterpret_cast<char*>(&buffer[0]), readSize, 1, in);
      fwrite(reinterpret_cast<char*>(&buffer[0]), readSize, 1, out);
    }
  }

  fclose(in);
  fclose(out);

} // Scoping files

// Create one file for every level of the BST tree structure
// by averaging the values in the one below.
unsigned int numTimestepsInLevel = numTimesteps;
unsigned int numValsInOT = numBrickVals*numOTNodes;
std::vector<float> inBuffer1(numValsInOT);
std::vector<float> inBuffer2(numValsInOT);
std::vector<float> outBuffer(numValsInOT);

size_t OTBytes = static_cast<size_t>(numValsInOT*sizeof(float));
std::string fromFilename = newFilename;
std::string toFilename;

// (reconstructed) Start at the base BST level
unsigned int BSTLevel = numBSTLevels - 1;

do {

  std::stringstream ss;
  ss << BSTLevel - 1;
  std::cout << "Creating level " << BSTLevel << std::endl;
  toFilename = tempFilename + "." + ss.str() + ".tmp";

  // Init files
  std::FILE *in = fopen(fromFilename.c_str(), "r");
  if (!in) return false;
  std::FILE *out = fopen(toFilename.c_str(), "w");
  if (!out) return false;

  fseeko(in, 0, SEEK_END);
  off_t fileSize = ftello(in);
  fseeko(in, 0, SEEK_SET);

  for (unsigned int ts=0; ts<numTimestepsInLevel; ts+=2) {

    // Read two adjacent time steps
    fread(reinterpret_cast<char*>(&inBuffer1[0]), OTBytes, 1, in);
    fread(reinterpret_cast<char*>(&inBuffer2[0]), OTBytes, 1, in);

    // Average time steps
    for (unsigned int i=0; i<numValsInOT; ++i) {
      outBuffer[i] = (inBuffer1[i]+inBuffer2[i])/static_cast<float>(2);
    }

    // Write brick
    fwrite(reinterpret_cast<char*>(&outBuffer[0]), OTBytes, 1, out);
  }

  fromFilename = toFilename;

  fclose(in);
  fclose(out);

  BSTLevel--;
  numTimestepsInLevel /= 2;

} while (BSTLevel != 0);

std::FILE *out = fopen(outFilename.c_str(), "w");

// Write metadata to file
WriteHeader(out);

// Write each level to output
for (unsigned int level=0; level<numBSTLevels; ++level) {

  // Open the temp file for this BST level and find its size
  // (reconstructed: filename construction mirrors the loop above)
  std::stringstream ss;
  ss << level;
  std::string inFilename = tempFilename + "." + ss.str() + ".tmp";
  std::FILE *in = fopen(inFilename.c_str(), "r");
  if (!in) return false;
  fseeko(in, 0, SEEK_END);
  off_t inFileSize = ftello(in);
  fseeko(in, 0, SEEK_SET);

  std::vector<float> buffer((size_t)inFileSize/sizeof(float));
  // Read whole file, write to out file
  fread(reinterpret_cast<char*>(&buffer[0]),
        static_cast<size_t>(inFileSize), 1, in);
  fwrite(reinterpret_cast<char*>(&buffer[0]),
         static_cast<size_t>(inFileSize), 1, out);

  fclose(in);
}
fclose(out);

// Do some checking and data validation
CheckFileSize();
ValidateData();

A.5 TSP Tree Structure Construction

void TSP::Construct() {
  // Structure is saved in int array

  // Loop over the OTs (one per BST node)
  for (unsigned int OT=0; OT<numBSTNodes; ++OT) {

    // (reconstructed) The BST level that this OT belongs to, the index
    // of the OT's first node in the data array, and running counters
    // for the level-by-level octree walk; the original lines were lost
    // in extraction and BSTLevelFromOTIndex is an assumed helper.
    unsigned int BSTLevel = BSTLevelFromOTIndex(OT);
    unsigned int OTNode = OT*numOTNodes;
    unsigned int OTChild = 1;
    unsigned int OTLevel = 0;

    while (OTLevel < numOTLevels) {

      unsigned int numNodesInLevel =
        static_cast<unsigned int>(pow(8, OTLevel));

      for (unsigned int i=0; i<numNodesInLevel; ++i) {

        if (BSTLevel == 0) {

          // Calculate OT child index (-1 if node is leaf)
          int OTChildIndex =
            (OTChild < numOTNodes) ?
              (int)(OT*numOTNodes + OTChild) : -1;
          data[OTNode*NUM_DATA + CHILD_INDEX] = OTChildIndex;

        } else {

          // Calculate BST child index (-1 if node is BST leaf)

          // First BST node of current level
          int firstNode =
            static_cast<int>((2*pow(2, BSTLevel-1)-1)*numOTNodes);
          // First BST node of next level
          int firstChild =
            static_cast<int>((2*pow(2, BSTLevel)-1)*numOTNodes);
          // Difference between first nodes between levels
          int levelGap = firstChild - firstNode;
          // How many nodes away from the first node are we?
          int offset = (OTNode - firstNode)/numOTNodes;

          // Use level gap and offset to calculate child index
          int BSTChildIndex =
            (BSTLevel < numBSTLevels-1) ?
              (int)(OTNode + levelGap + (offset*numOTNodes)) : -1;

          data[OTNode*NUM_DATA + CHILD_INDEX] = BSTChildIndex;

        }

        OTNode++;
        OTChild += 8;
      }

      OTLevel++;
    }
  }
}


A.6 Error Metrics

bool TSP::CalculateSpatialError() {

  unsigned int numBrickVals =
    paddedBrickDim*paddedBrickDim*paddedBrickDim;

  std::string inFilename = config->TSPFilename();
  std::FILE *in = fopen(inFilename.c_str(), "r");
  if (!in) {
    ERROR("Failed to open " << inFilename);
    return false;
  }

  std::vector<float> buffer(numBrickVals);
  std::vector<float> averages(numTotalNodes);
  std::vector<float> stdDevs(numTotalNodes);

  // First pass: Calculate average color for each brick
  INFO("\nCalculating spatial error, first pass");
  for (unsigned int brick=0; brick<numTotalNodes; ++brick) {

    // Read brick
    off_t offset =
      dataPos + static_cast<off_t>(brick*numBrickVals*sizeof(float));
    fseeko(in, offset, SEEK_SET);
    fread(reinterpret_cast<char*>(&buffer[0]),
          static_cast<size_t>(numBrickVals)*sizeof(float), 1, in);

    float average = 0.f;
    for (auto it=buffer.begin(); it!=buffer.end(); ++it) {
      average += *it;
    }

    averages[brick] = average/static_cast<float>(numBrickVals);
  }

  // Spatial SNR stats
  float minError = 1e20f;
  float maxError = 0.f;
  std::vector<float> medianArray(numTotalNodes);

  // Second pass: For each brick, compare the covered leaf voxels with
  // the brick average
  INFO("Calculating spatial error, second pass");
  for (unsigned int brick=0; brick<numTotalNodes; ++brick) {

    // Fetch the brick average and init the "standard deviation"
    float brickAvg = averages[brick];
    float stdDev = 0.f;

    // Build a list of the octree leaf bricks that this brick covers
    std::list<unsigned int> coveredLeafBricks =
      CoveredLeafBricks(brick);

    // If the brick is already a leaf, assign a negative error.
    // Ad hoc "hack" to distinguish leaves from other nodes that happen
    // to get a zero error due to rounding errors or other reasons.
    if (coveredLeafBricks.size() == 1) {
      stdDev = -0.1f;
    } else {

      // Calculate "standard deviation" corresponding to leaves
      for (auto lb=coveredLeafBricks.begin();
           lb!=coveredLeafBricks.end(); ++lb) {

        // Read brick
        off_t offset = dataPos +
          static_cast<off_t>((*lb)*numBrickVals*sizeof(float));
        fseeko(in, offset, SEEK_SET);
        fread(reinterpret_cast<char*>(&buffer[0]),
              static_cast<size_t>(numBrickVals)*sizeof(float), 1, in);

        // Add to sum
        for (auto v=buffer.begin(); v!=buffer.end(); ++v) {
          stdDev += pow(*v - brickAvg, 2.f);
        }

      }

      // Finish calculation
      if (sizeof(float) != sizeof(int)) {
        ERROR("Float and int sizes don't match, can't reinterpret");
        return false;
      }

      stdDev /=
        static_cast<float>(coveredLeafBricks.size()*numBrickVals);
      stdDev = sqrt(stdDev);

    }

    if (stdDev < minError) {
      minError = stdDev;
    } else if (stdDev > maxError) {
      maxError = stdDev;
    }

    stdDevs[brick] = stdDev;
    medianArray[brick] = stdDev;

  }

  fclose(in);

  std::sort(medianArray.begin(), medianArray.end());
  float medError = medianArray[medianArray.size()/2];

  // "Normalize" errors
  float minNorm = 1e20f;
  float maxNorm = 0.f;
  for (unsigned int i=0; i<numTotalNodes; ++i) {
    if (stdDevs[i] > 0.f) {
      stdDevs[i] = pow(stdDevs[i], 0.5f);
    }
    data[i*NUM_DATA + SPATIAL_ERR] =
      *reinterpret_cast<int*>(&stdDevs[i]);
    if (stdDevs[i] < minNorm) {
      minNorm = stdDevs[i];
    } else if (stdDevs[i] > maxNorm) {
      maxNorm = stdDevs[i];
    }
  }

  std::sort(stdDevs.begin(), stdDevs.end());
  float medNorm = stdDevs[stdDevs.size()/2];

  minSpatialError = minNorm;
  maxSpatialError = maxNorm;
  medianSpatialError = medNorm;

  return true;
}


bool TSP::CalculateTemporalError() {

  std::string inFilename = config->TSPFilename();
  std::FILE *in = fopen(inFilename.c_str(), "r");
  if (!in) {
    ERROR("Failed to open " << inFilename);
    return false;
  }

  std::vector<float> meanArray(numTotalNodes);

  // Save errors
  std::vector<float> errors(numTotalNodes);

  // Calculate temporal error for one brick at a time
  for (unsigned int brick=0; brick<numTotalNodes; ++brick) {

    unsigned int numBrickVals =
      paddedBrickDim*paddedBrickDim*paddedBrickDim;

    // Save the individual voxel's average over time and
    // the voxel's standard deviation
    std::vector<float> voxelAverages(numBrickVals);
    std::vector<float> voxelStdDevs(numBrickVals);

    // Read the whole brick to fill the averages
    off_t offset =
      dataPos + static_cast<off_t>(brick*numBrickVals*sizeof(float));
    fseeko(in, offset, SEEK_SET);
    fread(reinterpret_cast<char*>(&voxelAverages[0]),
          static_cast<size_t>(numBrickVals)*sizeof(float), 1, in);

    // Build a list of the BST leaf bricks (within the same octree level)
    // that this brick covers
    std::list<unsigned int> coveredBricks =
      CoveredBSTLeafBricks(brick);

    // If the brick is at the lowest BST level, automatically set the error
    // to -0.1 (enables using -1 as a marker for "no error accepted");
    // Somewhat ad hoc to get around the fact that the error could be
    // 0.0 higher up in the tree
    if (coveredBricks.size() == 1) {
      errors[brick] = -0.1f;
    } else {

      // Calculate standard deviation per voxel, average over brick
      float avgStdDev = 0.f;
      for (unsigned int voxel=0; voxel<numBrickVals; ++voxel) {

        float stdDev = 0.f;
        for (auto leaf=coveredBricks.begin();
             leaf!=coveredBricks.end(); ++leaf) {

          // Sample the same voxel in the covered leaf
          off_t sampleOffset = dataPos + static_cast<off_t>(
            (*leaf*numBrickVals + voxel)*sizeof(float));
          fseeko(in, sampleOffset, SEEK_SET);
          float sample;
          fread(reinterpret_cast<char*>(&sample),
                sizeof(float), 1, in);

          stdDev += pow(sample - voxelAverages[voxel], 2.f);
        }
        stdDev /= static_cast<float>(coveredBricks.size());
        stdDev = sqrt(stdDev);

        avgStdDev += stdDev;
      } // for voxel

      avgStdDev /= static_cast<float>(numBrickVals);
      meanArray[brick] = avgStdDev;
      errors[brick] = avgStdDev;

    }
  } // for all bricks

  fclose(in);

  std::sort(meanArray.begin(), meanArray.end());
  float medErr = meanArray[meanArray.size()/2];

  // Adjust errors using user-provided exponents
  float minNorm = 1e20f;
  float maxNorm = 0.f;
  for (unsigned int i=0; i<numTotalNodes; ++i) {
    if (errors[i] > 0.f) {
      errors[i] = pow(errors[i], 0.25f);
    }
    data[i*NUM_DATA + TEMPORAL_ERR] =
      *reinterpret_cast<int*>(&errors[i]);
    if (errors[i] < minNorm) {
      minNorm = errors[i];
    } else if (errors[i] > maxNorm) {
      maxNorm = errors[i];
    }
  }

  std::sort(errors.begin(), errors.end());
  float medNorm = errors[errors.size()/2];

  minTemporalError = minNorm;
  maxTemporalError = maxNorm;
  medianTemporalError = medNorm;

  return true;
}

A.7 Rendering Loop

bool Raycaster::Render() {

  // Update transformation matrices and bind them to color cube shader
  if (!UpdateMatrices()) return false;
  if (!BindTransformationMatrices(cubeShaderProgram)) return false;

  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

  // Render cube
  glUseProgram(cubeShaderProgram->Handle());
  cubePositionAttrib = cubeShaderProgram->GetAttribLocation("position");
  glFrontFace(GL_CW);
  glEnable(GL_CULL_FACE);

  // Front cube
  glBindFramebuffer(GL_FRAMEBUFFER, cubeFrontFBO);
  glCullFace(GL_BACK);
  glBindVertexArray(cubeVAO);
  glBindBuffer(GL_ARRAY_BUFFER, cubePosbufferObject);
  glEnableVertexAttribArray(cubePositionAttrib);
  glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glDrawArrays(GL_TRIANGLES, 0, 144);
  glDisableVertexAttribArray(cubePositionAttrib);
  glBindBuffer(GL_ARRAY_BUFFER, 0);
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
  glBindVertexArray(0);

  // Back cube
  glBindFramebuffer(GL_FRAMEBUFFER, cubeBackFBO);
  glCullFace(GL_FRONT);
  glBindVertexArray(cubeVAO);
  glBindBuffer(GL_ARRAY_BUFFER, cubePosbufferObject);
  glEnableVertexAttribArray(cubePositionAttrib);
  glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glDrawArrays(GL_TRIANGLES, 0, 144);
  glDisableVertexAttribArray(cubePositionAttrib);
  glBindBuffer(GL_ARRAY_BUFFER, 0);
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
  glBindVertexArray(0);

  glUseProgram(0);

  // Get current and next time step from separate Animator class
  unsigned int currentTimestep;
  unsigned int nextTimestep;
  currentTimestep = animator->CurrentTimestep();
  nextTimestep = animator->NextTimestep();

  // Choose buffers
  BrickManager::BUFFER_INDEX currentBuf, nextBuf;
  if (currentTimestep % 2 == 0) {
    currentBuf = BrickManager::EVEN;
    nextBuf = BrickManager::ODD;
  } else {
    currentBuf = BrickManager::ODD;
    nextBuf = BrickManager::EVEN;
  }

  // When starting a rendering iteration, the PBO corresponding to the
  // current timestep is loaded with the data.

  // Launch traversal of the next timestep
  if (!LaunchTSPProbing(nextTimestep)) return false;

  // While traversal of next step is working, upload current data to atlas
  if (!brickManager->PBOToAtlas(currentBuf)) return false;

  // Make sure the traversal kernel is done
  if (!clManager->FinishProgram("TSPProbing")) return false;

  // Read buffer and release the memory
  if (!clManager->ReadBuffer("TSPProbing", tspBrickListArg,
                             reinterpret_cast<void*>(&brickRequest[0]),
                             brickRequest.size()*sizeof(int),
                             true)) return false;

  if (!clManager->ReleaseBuffer("TSPProbing", tspBrickListArg))
    return false;

  // When traversal of next timestep is done, launch raycasting kernel
  if (!clManager->SetInt("TSPRaycaster", timestepArg, currentTimestep))
    return false;

  // Add brick list
  if (!clManager->
      AddBuffer("TSPRaycaster", brickListArg,
                reinterpret_cast<void*>(
                  &(brickManager->BrickList(currentBuf)[0])),
                brickManager->BrickList(currentBuf).size()*sizeof(int),
                CLManager::COPY_HOST_PTR,
                CLManager::READ_ONLY)) return false;

  if (!clManager->PrepareProgram("TSPRaycaster")) return false;

  if (!clManager->LaunchProgram("TSPRaycaster",
                                winWidth,
                                winHeight,
                                config->LocalWorkSizeX(),
                                config->LocalWorkSizeY()))
    return false;

  // While the raycaster kernel is working, build next brick list and start
  // upload to the next PBO
  if (!brickManager->BuildBrickList(nextBuf, brickRequest)) return false;
  if (!brickManager->DiskToPBO(nextBuf)) return false;

  // Finish raycaster and render current frame
  if (!clManager->ReleaseBuffer("TSPRaycaster", brickListArg))
    return false;
  if (!clManager->FinishProgram("TSPRaycaster")) return false;

  // Render to framebuffer using quad
  glBindFramebuffer(GL_FRAMEBUFFER,
                    SGCTWinManager::Instance()->FBOHandle());

  if (!quadTex->Bind(quadShaderProgram, "quadTex", 0)) return false;

  glDisable(GL_CULL_FACE);

  glUseProgram(quadShaderProgram->Handle());
  quadPositionAttrib = quadShaderProgram->GetAttribLocation("position");
  if (quadPositionAttrib == -1) {
    ERROR("Quad position attribute lookup failed");
    return false;
  }
  glCullFace(GL_BACK);
  glBindVertexArray(quadVAO);
  glBindBuffer(GL_ARRAY_BUFFER, quadPosbufferObject);
  glEnableVertexAttribArray(quadPositionAttrib);
  glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glDrawArrays(GL_TRIANGLES, 0, 6);
  glDisableVertexAttribArray(quadPositionAttrib);
  glBindVertexArray(0);

  glBindFramebuffer(GL_FRAMEBUFFER, 0);

  if (CheckGLError("Quad rendering") != GL_NO_ERROR) {
    return false;
  }

  glUseProgram(0);

  // Window manager takes care of swapping buffers

  return true;
}
