UPTEC IT 12 013 Examensarbete 30 hp Augusti 2012

Efficient Volume Rendering on the Face Centered and Body Centered Cubic Grids

Max Morén

Abstract

Efficient Volume Rendering on the Face Centered and Body Centered Cubic Grids
Max Morén

Teknisk-naturvetenskaplig fakultet, UTH-enheten

In volumetric visualization, the body centered cubic grid (BCC) and its reciprocal, the face centered cubic grid (FCC), are, despite their good sampling properties, not well off regarding available rendering software and tools.

Described in this thesis is the development of an extension for the volume rendering engine Voreen, implementing two recently presented GPU accelerated reconstruction algorithms for these grids, along with a simple nearest neighbor method. These reconstruction methods replace the trilinear reconstruction method used for data stored in a Cartesian cubic grid (CC). The goal is for the produced software to be useful for efficiently visualizing results from experiments with the BCC and FCC grids and thus help make such data easier to observe.

The performance and rendering quality of the new raycasters is measured and compared to Voreen's existing Cartesian cubic raycaster. The experimental results show that the raycasters can render data in the BCC and FCC format at interactive frame rates while maintaining comparable visual quality.

Handledare: Elisabeth Linnér
Ämnesgranskare: Robin Strand
Examinator: Arnold Pears
ISSN: 1401-5749, UPTEC IT 12 013

Summary (Sammanfattning)

In volumetric visualization, the most widely used sampling structure is without doubt the Cartesian grid. This structure is perhaps the simplest way to store three-dimensional scalar fields and has been used for a long time. It has been shown, however, that the sampling properties of this type of grid are far from optimal in all situations. The body centered cubic grid (BCC) and its dual, the face centered cubic grid (FCC), are considerably better in terms of sampling efficiency for band-limited signals. Despite this, tools for rendering data stored in these grids are hard to find.

In this thesis, an extension module is developed for the volume rendering program Voreen. Voreen is a framework for rendering volumetric data, primarily through raycasting. In this work, Voreen is extended with new components that perform raycasting on data in the BCC and FCC formats using two recently presented reconstruction methods. These methods replace the trilinear reconstruction method used for data stored in a Cartesian structure, and they too can produce images in real time.

The performance and image quality of the new components are measured and compared with the existing Cartesian raycaster in Voreen. The results show that, despite some difference in performance, the new software can produce real-time images of comparable quality for both BCC and FCC.


Acknowledgements

I would like to thank my supervisor Elisabeth Linnér and reviewer Robin Strand for their great help and support. Thanks also to Fredrik Nysjö for taking a look at the code and answering my questions regarding GLSL.


Contents

1 Introduction
   1.1 Motivation
   1.2 Why Voreen
   1.3 Previous work

2 Background
   2.1 The BCC and FCC grids
   2.2 Direct volume rendering
      2.2.1 Variants and enhancements
   2.3 Problem description
   2.4 Raycasting in the BCC and FCC grids
   2.5 Reconstruction algorithms
   2.6 Voreen's architecture

3 Method and implementation
   3.1 Steps provided by Voreen
   3.2 Implementation
      3.2.1 BCC DC-spline
      3.2.2 FCC DC-spline
      3.2.3 BCC linear box spline
      3.2.4 BCC nearest neighbor
      3.2.5 FCC nearest neighbor
   3.3 Storage scheme

      3.3.1 Interleaved storage model
      3.3.2 Separate storage model
   3.4 Gradient calculation
      3.4.1 Inside Voreen
      3.4.2 Externally
   3.5 Module and processors
      3.5.1 Step length compensation
   3.6 Utilities
   3.7 Measuring performance
   3.8 Measuring rendering quality

4 Results and discussion
   4.1 Performance
   4.2 Rendering quality
   4.3 Discussion
      4.3.1 Performance
      4.3.2 Rendering quality

5 Conclusions
   5.1 System requirements
   5.2 Known issues
   5.3 Future work
   5.4 Conclusion

Appendices

A BCC fragment shader

B FCC fragment shader

List of Figures

2.1 A BCC and an FCC voxel
2.2 Unit cells of the CC, BCC and FCC grid
2.3 A minimal BCC raycasting network in Voreen

3.1 Known and unknown values in the DC-spline model
3.2 First order neighbors
3.3 Voreen network used for rendering quality analysis
3.4 Example of images used in the rendering quality analysis

4.1 CC and BCC performance comparison
4.2 CC and FCC performance comparison plot
4.3 Marschner-Lobb test signal renders
4.4 Sphere rendering quality analysis plot


List of Tables

3.1 Expensive operations of reconstruction algorithms
3.2 Offsets of the Cartesian sub-grids
3.3 Supported volume types in Voreen
3.4 New volume types


Acronyms

BCC body centered cubic.

CC Cartesian cubic.

FCC face centered cubic.

GPU graphics processing unit.

lerp linear interpolation.

RGBA red-green-blue-alpha.

RMSE root mean square error.

SOCD second-order central differencing.


Chapter 1

Introduction

In visualization, as well as many other areas of computational science, volumetric data is traditionally sampled and stored on a Cartesian cubic (CC) grid. This data structure is simple to handle and understand. Because it is so easily separable into two-dimensional or one-dimensional subsets, many operations such as interpolation can be performed by generalizing methods for lower-dimensional Cartesian grids, e.g. the omnipresent two-dimensional Cartesian grid. Convenience, in other words, is among the biggest strengths of this type of grid.

It has been shown, however, that the sampling properties of the CC grid are far from optimal in some settings. The body centered cubic (BCC) and face centered cubic (FCC) grids, the three-dimensional grids corresponding to the two-dimensional hexagonal grid, have been shown to perform significantly better for band-limited signals. The results in [1] show that sampling such a signal onto a BCC grid provides the same visual quality when reconstructed with about 30% fewer points than the same signal sampled onto a CC grid. A downside of the non-Cartesian grids is their less widespread use, and therefore more limited tool support.

In this thesis a module is developed for the volume rendering engine Voreen [2]. The module adds support for rendering both BCC and FCC volumes and can be used together with the feature rich framework Voreen provides. The performance and visual quality of the new raycasters and their different reconstruction methods are also measured and compared to the existing CC raycaster.

1.1 Motivation

The purpose of this thesis is to create a simple tool for viewing data stored in BCC or FCC format in real time. The tool is aimed at anyone wishing to study and experiment with data in these grids. It is, after all, an important part of an experimental workflow to be able to see the results.

A known existing volume rendering framework is extended in the hope of making the tool flexible and accessible. The extension is made to fit in well with the other features of Voreen, and is therefore modeled after the existing Cartesian raycasters. Effort is also put into not interfering with other existing functionality when modifications have to be made.

1.2 Why Voreen

Voreen was chosen as the platform prior to the start of this thesis work. It is a capable tool recommended by other scientists in this field, and it is partially developed in Sweden, at Linköping University.

Voreen's web site [2] also states that it is designed to be flexible in allowing new visualization techniques to be integrated. It is written in C++ and has a well defined class structure documented using Doxygen. For the user, Voreen also provides more than just standard raycasting; a variety of additional useful features exist, scripting and image processing being some examples.

1.3 Previous work

Tools that render data in BCC and FCC grids are scarce, other than the special purpose proof-of-concept implementations sometimes found together with publications in this area. An open source rendering software called vuVolume supports BCC and FCC through CC pre-conversion. It is however a non-interactive renderer that does not yet make use of graphics processing unit (GPU) acceleration. Furthermore, development in the public repository at http://sf.net/projects/vuvolume seemingly stopped in 2009.

Chapter 2

Background

2.1 The BCC and FCC grids

In a CC grid, samples are positioned at the intersections of equidistant, axis-aligned lines. The BCC and FCC grids can be seen as compositions of multiple such grids, interleaved and translated by an offset. This is just one of multiple ways that these non-Cartesian grids can be decomposed into simpler structures, but it is this model that is used for the techniques in this thesis. In this model, if the inter-voxel distance (grid spacing) of a CC grid is h, the BCC grid contains two such grids, with the second being translated by (h/2, h/2, h/2) in relation to the first. An FCC grid can be built using four h-spaced CC grids, with the last three being translated by (0, h/2, h/2), (h/2, 0, h/2) and (h/2, h/2, 0) in relation to the first.

The Voronoi region of a point (the region where that point is the nearest) in a Cartesian grid is a square box. The voxels are therefore shaped like square boxes, or even cubes if the grid is equally spaced in all dimensions. In a BCC grid the Voronoi region instead takes the shape of a truncated octahedron, and for FCC a rhombic dodecahedron. What these geometric shapes look like can be seen in Figure 2.1. The repeating structures of nearby points for the grids, known as their unit cells, can be seen in Figure 2.2.
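The decomposition above can be sketched in a few lines. The following Python sketch (the helper names are mine, not from the thesis' implementation) enumerates a small BCC and FCC grid as unions of offset CC sub-grids:

```python
# Illustrative sketch: enumerate BCC and FCC points as unions of
# offset Cartesian (CC) sub-grids. Grid spacing h and the ranges
# are arbitrary choices for the example.

def cc_points(offset, n, h=1.0):
    """All points of one CC sub-grid, translated by `offset`."""
    ox, oy, oz = offset
    return {(x * h + ox, y * h + oy, z * h + oz)
            for x in range(n) for y in range(n) for z in range(n)}

def bcc_points(n, h=1.0):
    # Two CC grids, the second shifted by (h/2, h/2, h/2).
    return cc_points((0, 0, 0), n, h) | cc_points((h/2, h/2, h/2), n, h)

def fcc_points(n, h=1.0):
    # Four CC grids; three of them shifted by (0,h/2,h/2), (h/2,0,h/2)
    # and (h/2,h/2,0) relative to the first.
    pts = cc_points((0, 0, 0), n, h)
    for off in [(0, h/2, h/2), (h/2, 0, h/2), (h/2, h/2, 0)]:
        pts |= cc_points(off, n, h)
    return pts

print(len(bcc_points(2)), len(fcc_points(2)))  # 16 32
```

Since the sub-grids never coincide, a BCC block of n³ CC cells holds 2n³ points and an FCC block 4n³ points, which the printed counts confirm for n = 2.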

2.2 Direct volume rendering

A volumetric data set can be rendered as an image using so called direct volume rendering. The technique is direct in the sense that samples are directly extracted and shaded to produce an image, as opposed to fitting geometric primitives to the sampled data before rendering. Direct volume rendering can be done efficiently using the programmable

Figure 2.1: A BCC and an FCC voxel

(a) Unit cell of the CC grid (b) Unit cell of the BCC grid (c) Unit cell of the FCC grid

Figure 2.2: Unit cells of the CC, BCC and FCC grid

shader support in modern graphics cards, which is the variant that is used and described in this thesis. The algorithm casts a ray through every pixel on the projection surface into the scene, sampling at a finite set of points along it. The samples are then converted into colors and finally composited to create the final pixel value seen in the rendered image. The mapping between the scalar values in the volume and colors is done using a transfer function. Generally this function maps what is stored in the volume (often intensity) to a red-green-blue-alpha (RGBA) quadlet. It is possible to omit the transfer function step if such color values are already stored in the volume. The compositing step then merges the sampled colors into one using some type of blending, e.g. alpha blending [3].

2.2.1 Variants and enhancements

Enhancements to the basic algorithm have been invented that can improve both image quality and performance. Performance-wise, as most of the computation time is usually spent sampling the data set, this activity should be minimized. One can stop sampling as soon as the alpha of the current composited color equals or exceeds a threshold, as further samples do not contribute to the final pixel value. This technique, known as early ray termination [3], can be used by raycasters in Voreen.
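The compositing loop with early ray termination can be illustrated as follows. This is a CPU-side Python sketch (the thesis' raycasters do this in GLSL), with the 0.95 threshold chosen arbitrarily for the example:

```python
# Illustrative sketch of front-to-back alpha compositing with early ray
# termination. `samples` is a list of (r, g, b, a) values already produced
# by the transfer function along one ray.

def composite(samples, threshold=0.95):
    r = g = b = a = 0.0
    steps = 0
    for sr, sg, sb, sa in samples:
        # Front-to-back "over" blending: later samples are attenuated
        # by the opacity accumulated so far.
        r += (1.0 - a) * sa * sr
        g += (1.0 - a) * sa * sg
        b += (1.0 - a) * sa * sb
        a += (1.0 - a) * sa
        steps += 1
        if a >= threshold:   # early ray termination
            break
    return (r, g, b, a), steps

# A fully opaque first sample hides everything behind it:
color, steps = composite([(1, 0, 0, 1.0), (0, 1, 0, 1.0)])
print(color, steps)  # (1.0, 0.0, 0.0, 1.0) 1
```

The second sample is never fetched, which is exactly the saving the technique provides when opaque structures are hit early.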

2.3 Problem description

The first step is to find one or more appropriate reconstruction methods that can be implemented on the GPU and in Voreen efficiently enough that the requirement of interactivity can be met. The reconstruction quality should be as good as possible given this performance constraint.

Secondly, a decision has to be made on how to interleave, layer or otherwise store the BCC and FCC data. Creation of tools to generate data in this format is also required. Which storage scheme is most appropriate will depend on the access pattern of the selected reconstruction algorithm or algorithms.

Thirdly, new raycasters must be implemented in Voreen that utilize the BCC and FCC reconstruction methods. As many of the existing tools available in Voreen are grid agnostic, it is useful to maintain compatibility with them. The raycasters should therefore exist in harmony with the rest of the framework as much as possible.

When implemented, the performance and image quality of the new raycasters and their different reconstruction modes is to be measured, and the results presented with a comparison to the existing CC raycaster.

2.4 Raycasting in the BCC and FCC grids

When performing GPU accelerated raycasting in a regular CC grid, the hardware can implicitly reconstruct the signal linearly by simply setting the proper texture filtering mode. This form of interpolation, implemented on all recent GPUs, is called trilinear interpolation. Graphics cards are designed to minimize the cost of the eight texture lookups required, making it nearly as fast as a nearest neighbor lookup on most hardware [4]. The trilinear lookup is however only directly usable for CC data. For data not arranged in this structure, as is the case with BCC and FCC, a custom reconstruction algorithm has to be used. For such custom algorithms, using several nearest neighbor lookups per sample is often necessary.
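As a point of reference, the trilinear reconstruction the hardware performs for CC data can be written out explicitly. This illustrative Python sketch builds it from eight nearest-neighbor lookups and seven lerps:

```python
# What texture filtering does for a CC grid: trilinear interpolation
# from the eight surrounding samples. `volume` is a dict mapping integer
# (x, y, z) positions to values, standing in for a 3D texture.

from math import floor

def lerp(a, b, t):
    return (1 - t) * a + t * b

def trilinear(volume, x, y, z):
    x0, y0, z0 = floor(x), floor(y), floor(z)
    tx, ty, tz = x - x0, y - y0, z - z0
    c = {(i, j, k): volume[(x0 + i, y0 + j, z0 + k)]
         for i in (0, 1) for j in (0, 1) for k in (0, 1)}
    # Interpolate along x, then y, then z (7 lerps in total).
    c00 = lerp(c[(0, 0, 0)], c[(1, 0, 0)], tx)
    c10 = lerp(c[(0, 1, 0)], c[(1, 1, 0)], tx)
    c01 = lerp(c[(0, 0, 1)], c[(1, 0, 1)], tx)
    c11 = lerp(c[(0, 1, 1)], c[(1, 1, 1)], tx)
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz)

# Trilinear reconstruction is exact for multilinear fields such as x + 2y + 3z.
vol = {(x, y, z): x + 2*y + 3*z for x in range(2) for y in range(2) for z in range(2)}
print(trilinear(vol, 0.25, 0.5, 0.75))  # 3.5
```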

2.5 Reconstruction algorithms

For the module developed in this thesis, two reconstruction algorithms are selected for implementation: the DC-spline algorithm by B. Domonkos et al. in [5] for BCC and FCC, and the linear box spline for BCC by B. Finkbeiner et al. in [6], later extended in [7]. Both methods are designed for implementation on the GPU. The DC-spline method exploits the fact that trilinear texture lookups are hardware accelerated and shows good performance despite requiring five trilinear texture lookups. The linear box spline uses non-interpolated lookups but requires only four of them. The rendering quality of the two filters is compared by Domonkos et al. in [8] and is concluded to be quite similar. Both methods promise acceptable performance and image quality, and neither requires the data to be pre-processed before interpolation. The higher order variants that exist for both methods are not implemented due to the performance constraint. While a box spline method for the FCC grid has been presented in [9] by M. Kim et al., it is more computationally expensive and is not designed for the GPU. It is therefore not chosen for implementation.

2.6 Voreen’s architecture

Voreen divides the process of turning a volumetric data set into an image into multiple simpler processing stages. The tasks in these stages are performed by pluggable components called processors. The user connects processors by linking their inputs and outputs together. The resulting graph of connected processors is called a network. Components within a network start processing when their output is required by another processor.

Figure 2.3: A minimal BCC raycasting network in Voreen

In addition to their inputs and outputs, processors can also have properties linked to each other. Properties are user changeable settings that affect the way that data is interpreted and processed, such as camera settings and light position. It is often useful to link properties together when performing compositing or comparisons of images. An example network containing a minimal raycasting setup using the new BCC processor can be seen in Figure 2.3.

Except for a small set of core functionality, the entirety of Voreen's components is contained in loadable modules. The base module contains the most basic components, such as the VolumeSource, Canvas and Single/MultiVolumeRaycaster processors. Another useful module that ships with Voreen is the python module, which provides the user with a Python API that has access to processors and their properties as well as some of Voreen's internals. This is used to measure performance in section 4.1 and to perform the error analysis in section 4.2.

Chapter 3

Method and implementation

In the GPU based implementation of the DC-spline method for BCC proposed in [5], the algorithm uses a total of six trilinear texture lookups and five linear interpolation (lerp) instructions. Generalized to the FCC grid, it requires sixteen trilinear texture lookups and eight lerps.

The linear box spline reconstruction technique for BCC, described and implemented in [7], is also adapted and included. It requires four nearest-neighbor texture lookups.

Nearest-neighbor algorithms are also implemented for both BCC and FCC. These simple interpolation methods do well in terms of performance, but understandably worse than the others in terms of visual quality. Despite this they can be useful in some experiments and can be used as a reference during development in Voreen. An overview of the reconstruction algorithms is given in Table 3.1.

3.1 Steps provided by Voreen

In a Voreen network such as the one in Figure 2.3, the basic setup for performing raycasting is already provided. In a typical ray casting network, the VolumeSource processor loads the volume file and generates an OpenGL three-dimensional texture. The CubeMeshProxyGeometry then sets up a cube with the correct proportions based on the volume connected to its input port. The MeshEntryExitPoints processor then renders the inside and outside of this cube to separate textures available on its output ports. These textures are then used by the raycaster to form entry and exit points for the casted rays, a well known method for raycasting volumes [3].

Thus, the raycaster only needs to bind all the textures and produce an output image using a fragment shader. The rendering algorithm in this shader is a typical ray traversal loop. An entry and exit point is extracted at the position of the current pixel in the

Table 3.1: Number of expensive operations performed in the implementations of the different reconstruction algorithms

Method                  Lerps   Lookups
CC trilinear              0        1
CC nearest neighbor       0        1
BCC nearest neighbor      0        1
BCC linear box spline     0        4
BCC DC-spline             5        6
FCC nearest neighbor      0        1
FCC DC-spline             7       16

entry and exit textures. These are then used to form a direction vector for the ray, which is then traversed using the configured step size. The produced image can then be observed by connecting the output image port of the processor to a Canvas processor.

3.2 Implementation

This section describes, step by step, the concrete implementations used for the chosen reconstruction methods. It attempts to describe the adapted algorithms, including how texture accesses are performed in the shared volume format. The relevant GLSL fragment shader code can be found in Appendix A and Appendix B. The following notation is used, with the two (BCC) or four (FCC) Cartesian sub-grids denoted G_0 to G_3:

• [⋅] denotes the nearest integer function.

• lerp(푎, 푏, 훼) denotes a linear interpolation, defined as (1 − 훼)푎 + 훼푏.

• f_n(x, y, z) denotes a nearest neighbor lookup at a known point in sub-grid texture n.

• lin_n(x, y, z) denotes a trilinear texture lookup at (x, y, z) in sub-grid texture n.

• Δ_n is the offset constant for sub-grid G_n, listed in Table 3.2.

Table 3.2: Offsets of the Cartesian sub-grids

Sub-grid   Δ_n (BCC)          Δ_n (FCC)
G_0        (0, 0, 0)          (0, 0, 0)
G_1        (1/2, 1/2, 1/2)    (0, 1/2, 1/2)
G_2        –                  (1/2, 0, 1/2)
G_3        –                  (1/2, 1/2, 0)

Algorithm 1 Find the nearest BCC points of G_0 and G_1 from p

a ← ([p_x], [p_y], [p_z])
b ← ([p_x − 1/2], [p_y − 1/2], [p_z − 1/2]) + (1/2, 1/2, 1/2)
return (a, b)
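Algorithm 1 can be sketched on the CPU like this (the thesis implements it in GLSL; `nearest_int` implements the [·] nearest-integer function):

```python
# CPU-side sketch of Algorithm 1: nearest G0 and G1 lattice points of a
# BCC grid around p, in units of the sub-grid spacing.

from math import floor

def nearest_int(v):
    # The [.] nearest-integer function, rounding halves upward.
    return floor(v + 0.5)

def nearest_bcc(p):
    px, py, pz = p
    a = (nearest_int(px), nearest_int(py), nearest_int(pz))   # G0 point
    b = (nearest_int(px - 0.5) + 0.5,
         nearest_int(py - 0.5) + 0.5,
         nearest_int(pz - 0.5) + 0.5)                          # G1 point
    return a, b

print(nearest_bcc((0.3, 0.4, 0.6)))  # ((0, 0, 1), (0.5, 0.5, 0.5))
```

Shifting by 1/2, rounding and shifting back snaps p to the offset sub-grid without any branching, mirroring how the shader can stay branch free.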

Algorithm 2 Find the nearest FCC points of G_0, G_1, G_2 and G_3 from p

a ← ([p_x], [p_y], [p_z])
b ← ([p_x], [p_y − 1/2], [p_z − 1/2]) + (0, 1/2, 1/2)
c ← ([p_x − 1/2], [p_y], [p_z − 1/2]) + (1/2, 0, 1/2)
d ← ([p_x − 1/2], [p_y − 1/2], [p_z]) + (1/2, 1/2, 0)
return (a, b, c, d)
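Algorithm 2 can be sketched the same way (again an illustrative Python port, not the thesis' GLSL):

```python
# CPU-side sketch of Algorithm 2: nearest points of the four FCC
# sub-grids G0..G3 around p, in units of the sub-grid spacing.

from math import floor

def ni(v):
    return floor(v + 0.5)    # nearest integer

def nearest_fcc(p):
    x, y, z = p
    a = (ni(x), ni(y), ni(z))                                  # G0
    b = (ni(x), ni(y - 0.5) + 0.5, ni(z - 0.5) + 0.5)          # G1
    c = (ni(x - 0.5) + 0.5, ni(y), ni(z - 0.5) + 0.5)          # G2
    d = (ni(x - 0.5) + 0.5, ni(y - 0.5) + 0.5, ni(z))          # G3
    return a, b, c, d

print(nearest_fcc((0.3, 0.4, 0.6)))
# ((0, 0, 1), (0, 0.5, 0.5), (0.5, 0, 0.5), (0.5, 0.5, 1))
```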

3.2.1 BCC DC-spline

To interpolate the signal in a point 퐬:

1. Find the closest G_0 and G_1 points to s and call them a and b, using Algorithm 1.

2. Find a transformation matrix M that takes us to the standard case where b = a + (1/2, 1/2, 1/2), seen in Figure 3.1(a). To avoid branching, it is defined by applying the sign function (sgn) to the differences of a's and b's coordinates:

    M = diag(sgn(a_x − b_x), sgn(a_y − b_y), sgn(a_z − b_z))

3. Interpolate the signal in the unknown corner points of the cube. Each unknown corner value is the mean of two known values, which collapses into a single trilinear lookup in the corresponding sub-grid texture n:

    f̃(p̃_i) = lerp(f_n(p_i − Δ_n), f_n(p′_i − Δ_n), 1/2) = lin_n(p̃_i − Δ_n)

for each of the four unknown corner points p̃_i, where

    p_{x,y,z} = M a + (1/2)(x, y, z)

4. Using these interpolated values, interpolate the signal in the corner points of the dotted quad s lies in:

    f̃(q_{0,0}) = lerp(f̃(p̃), f̃(p̃′), 2|s_y − b_y|)
    f̃(q_{0,1}) = lerp(f̃(p̃), f̃(p̃′), 2|s_y − b_y|)
    f̃(q_{1,0}) = lerp(f_n(p − Δ_n), f_n(p′ − Δ_n), |s_y − b_y|) = lin_n((a_x, s_y, a_z) − Δ_n)
    f̃(q_{1,1}) = lerp(f_n(p − Δ_n), f_n(p′ − Δ_n), |s_y − b_y| + 1/2) = lin_n((b_x, s_y, b_z) − Δ_n)

The last two corners lie on known sub-grid columns, so their lerps again collapse into single trilinear lookups, giving the six lookups and five lerps of Table 3.1 in total.

5. Finally, bilinearly interpolate the signal in s using the values interpolated at the quad's corner points in step 4:

    f̃(s) = lerp(lerp(f̃(q_{0,0}), f̃(q_{1,0}), w_0), lerp(f̃(q_{0,1}), f̃(q_{1,1}), w_1), w_2)

where each weight w_i is the absolute difference between the corresponding coordinates of s and the quad's corner points.

The relevant part of the GLSL shader source code implementing this can be seen in Appendix A.
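The branchless standardization in step 2 can be illustrated in isolation. This Python sketch uses the sign convention M = diag(sgn(b − a)), chosen so that M maps b − a onto (1/2, 1/2, 1/2); the exact sign convention in the thesis' shader may differ:

```python
# Sketch of the branchless standardization: a diagonal sign matrix
# mirrors the local point configuration onto the standard case
# b = a + (1/2, 1/2, 1/2) without any shader branches.

def sgn(v):
    return (v > 0) - (v < 0)

def mirror_matrix(a, b):
    # Only the diagonal is stored; entry i flips axis i when b lies
    # below a along that axis.
    return tuple(sgn(bi - ai) for ai, bi in zip(a, b))

def apply(m, v):
    return tuple(mi * vi for mi, vi in zip(m, v))

a, b = (1.0, 1.0, 1.0), (0.5, 1.5, 0.5)
M = mirror_matrix(a, b)
print(M, apply(M, (-0.5, 0.5, -0.5)))  # (-1, 1, -1) (0.5, 0.5, 0.5)
```

Because sgn is a built-in GPU instruction, this replaces eight possible orientation cases with a single multiply per component.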

3.2.2 FCC DC-spline

To interpolate the signal in a point 퐬:

1. Find the nearest G_0, G_1, G_2 and G_3 points to s and call them a, b, c and d respectively, using Algorithm 2.

2. Find a transformation matrix M that takes us to the standard case seen in Figure 3.1(b). To avoid branching, it is defined by applying the sign function (sgn) to the differences of the nearest known point coordinates:

    M = diag(sgn(b_x − a_x), sgn(c_y − a_y), sgn(d_z − a_z))

3. Interpolate the signal in the unknown points. Each unknown value is the mean of three pairs of known neighbors, one pair from each of the three sub-grids not containing the point, and each pair again collapses into one trilinear lookup:

    f̃(p̃_i) = (1/3)(lerp(f_j(p, p′), f_j(p′), 1/2) + lerp(f_k(p), f_k(p′), 1/2) + lerp(f_l(p), f_l(p′), 1/2))
            = (1/3)(lin_j(p̃_i − Δ_j) + lin_k(p̃_i − Δ_k) + lin_l(p̃_i − Δ_l))

for each of the four unknown points p̃_i, where j, k and l are the three sub-grids in question,

and where

    p_{x,y,z} = M a + (1/2)(x, y, z)

Figure 3.1: Known (black and white) and unknown (grey) values of the expanded CC grid in standard orientation ((a) BCC, (b) FCC)

4. Using these interpolated values, interpolate the signal in the corner points of the quad:

    f̃(q_{0,0}) = lerp(f̃(p̃), f̃(p̃′), 2|p − a|)
    f̃(q_{0,1}) = lerp(f̃(p̃), f̃(p̃′), 2|p − a|)
    f̃(q_{1,0}) = lerp(f̃(p̃), f̃(p̃′), 2|p − a|)
    f̃(q_{1,1}) = lerp(f̃(p̃), f̃(p̃′), 2|p − a|)

where each weight is the absolute difference taken along the corresponding axis.

5. Finally, bilinearly interpolate the signal in s using the quad's corner values calculated in step 4:

    f̃(s) = lerp(lerp(f̃(q_{0,0}), f̃(q_{1,0}), w_0), lerp(f̃(q_{0,1}), f̃(q_{1,1}), w_1), w_2)

where each weight w_i is the absolute difference between the corresponding coordinates of a and s.

The relevant part of the GLSL shader source code implementing this can be seen in Appendix B.

3.2.3 BCC linear box spline

The GLSL implementation by Finkbeiner et al. in [7] can be adapted with only some slight modifications. As the storage model used by their method differs from the one used in this project, the data access needed to be adapted. In the model used for their algorithm, the Cartesian grid spacing is 2. In such a model, a G_1 point has all odd coordinates, while a G_0 point has all even coordinates. A lookup must be directed to the correct sub-grid texture. A simple translation and selection of texture is made by first scaling the point of interest's coordinates by 2 before interpolating:

퐒 = 2퐬

When a point P is to be sampled, the correct sub-volume can be chosen based on the parity of any of its coordinates:

    f̃(P) = f_1(P − Δ_1)  if P has odd coordinates
    f̃(P) = f_0(P − Δ_0)  otherwise

3.2.4 BCC nearest neighbor

To interpolate the signal in a point p, find the nearest G_0 and G_1 points and return the signal in the one closest to p.

1. First find both the nearest G_0 point, a, and the nearest G_1 point, b, using Algorithm 1.

2. Find q, the one with the smallest distance to p:

    q = a  if ‖p − a‖ < ‖p − b‖
    q = b  otherwise

3. Make a single lookup at q: f̃(p) = f(q)

Note: In the final implementation, for performance reasons, two lookups are always performed and the sample to use is selected by casting the result of the comparison ‖p − a‖ < ‖p − b‖ to an integer. Tests show that this is slightly faster.
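The rule, including the branchless selection trick from the note, can be sketched as follows; `f0` and `f1` are hypothetical lookup functions standing in for the sub-grid texture fetches:

```python
# Sketch of the BCC nearest-neighbor rule with branchless selection:
# both candidates are fetched and the comparison is cast to an index
# (mirroring the GLSL int cast) instead of using an if/else.

def dist2(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def bcc_nearest_sample(p, a, b, f0, f1):
    va, vb = f0(a), f1(b)                        # always two lookups
    closer_to_b = int(dist2(p, b) < dist2(p, a)) # bool -> 0 or 1
    return (va, vb)[closer_to_b]
```

Usage: with `f0 = lambda q: 10.0` and `f1 = lambda q: 20.0`, a point near the G0 corner selects 10.0 and a point near the cell center selects 20.0, without any branch in the selection itself.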

3.2.5 FCC nearest neighbor

To interpolate the signal in a point p, find the nearest G_0, G_1, G_2 and G_3 points and return the signal value in the one closest to p.

1. Find the nearest G_0, G_1, G_2 and G_3 points using Algorithm 2 and call them a, b, c and d respectively.

2. Let A = {a, b, c, d} and select q to be the element in A that is closest to p:

    q = arg min_{r ∈ A} ‖p − r‖

3. Make a single lookup at q: f̃(p) = f(q)
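A minimal sketch of the selection step in Python, with the candidate list produced by Algorithm 2:

```python
# Sketch of the FCC nearest-neighbor rule: pick the candidate from
# A = {a, b, c, d} with the smallest distance to p, then do one lookup.

def fcc_nearest_point(p, candidates):
    def dist2(q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return min(candidates, key=dist2)

q = fcc_nearest_point((0.3, 0.4, 0.6),
                      [(0, 0, 1), (0, 0.5, 0.5), (0.5, 0, 0.5), (0.5, 0.5, 1)])
print(q)  # (0, 0.5, 0.5)
```

In the shader the same argmin is of course unrolled over exactly four candidates rather than expressed as a loop.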

3.3 Storage scheme

As the description above shows, the DC-spline method uses trilinear interpolation, which requires neighbors in a sub-grid to be actual neighbors in the data format. This restriction makes the data format used in [7] impossible to use, since it mixes data from each sub-grid in the same texture space. Fortunately, going the other way is possible, i.e. the box spline can use the data format suggested for use with the DC-spline method.

3.3.1 Interleaved storage model

The suggested implementation in [5] uses an interleaved format that represents values from each grid as separate components of the same texels. The intended use of multiple components in OpenGL is to store color channels together and have the graphics card treat them as one vector. Thankfully, shaders also have access to each vector component individually, which gives the opportunity to store the intensity data as two or four such components for BCC and FCC respectively.

Table 3.3 shows Voreen's supported volume formats and the corresponding OpenGL texture formats they are loaded into. As can be seen, there are seemingly already formats supporting both 8 and 16-bit data with two and four components. However, the only two component model available (LA) later loads the data as a four component 8-bit texture, always assuming that the values are 16-bit unsigned integers. This is undesirable, as values in a 16-bit unsigned volume would have to be recomputed from two 8-bit values in the fragment shader, possibly losing both precision and performance. Furthermore, an 8-bit volume would be even more unusable in this format, as two values would be mistakenly loaded for each texel (RG and BA) and thus also break the coordinate system. The correct data types would be Volume2xInt16 and Volume2xUInt16, which are both recognized by the VolumeGL class and resolved into the OpenGL GL_LUMINANCE16_ALPHA16 format. As Table 3.3 shows however, that code is never reached, as seemingly no part of Voreen makes use of these data types. Intuitively the RawVolumeReader would then pass volumes using the LA model along in this format, but it actually uses Volume4xUInt8 as said previously.

Table 3.3: Supported volume types and their corresponding Voreen and OpenGL internal formats [10]

Model   Format    Voreen format     OpenGL format
I       CHAR      VolumeUInt8       GL_ALPHA8
I       UCHAR     VolumeUInt8       GL_ALPHA8
I       SHORT     VolumeUInt16      GL_ALPHA16
I       USHORT    VolumeUInt16      GL_ALPHA16
I       INT       VolumeUInt16      GL_ALPHA16
I       UINT      VolumeUInt16      GL_ALPHA16
I       FLOAT     VolumeFloat       GL_ALPHA
I       FLOAT8    VolumeUInt8       GL_ALPHA8
I       FLOAT16   VolumeUInt16      GL_ALPHA16
RGBA    UCHAR     Volume4xUInt8     GL_RGBA8
RGBA    USHORT    Volume4xUInt16    GL_RGBA16
RGB     UCHAR     Volume3xUInt8     GL_RGB8
RGB     USHORT    Volume3xUInt16    GL_RGB16
RGB     FLOAT     Volume3xUFloat    GL_RGB
LA      Any       Volume4xUInt8     GL_RGBA8

This behavior is not changed, as it could be in use by something that is not in the public repository. Instead, the DatVolumeReader and RawVolumeReader are extended to understand a new object model, RG. The model's name is chosen to continue the trend of RGB and RGBA, which are also named after what they are finally loaded as in OpenGL. The unused code in VolumeGL that handles Volume2xUInt16 is then re-purposed to store two 16-bit values in the R and G channels (GL_RG16). This takes care of the RG object model using the USHORT sample format. A new similar piece of code is added to the class that does the same for two 8-bit values (GL_RG8), which takes care of the RG object model with the sample format UCHAR.
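The interleaved RG layout can be illustrated with plain lists standing in for the texture upload. The only assumption is the one the format itself makes: the two sub-grids have identical dimensions, so texel i of each can share one RG texel:

```python
# Sketch of the interleaved RG storage model for BCC: the G0 and G1
# values that share a texel index are packed as the R and G components
# of that texel. Flat lists stand in for the OpenGL texture data.

def interleave_rg(grid0, grid1):
    assert len(grid0) == len(grid1), "sub-grids must match in size"
    texels = []
    for v0, v1 in zip(grid0, grid1):
        texels.extend((v0, v1))     # R = G0 value, G = G1 value
    return texels

print(interleave_rg([10, 20], [11, 21]))  # [10, 11, 20, 21]
```

A single fetch of texel i then yields both sub-grid values at once, which is what lets the shader address the pair through one coordinate system.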

3.3.2 Separate storage model

The volumes can also be split into their Cartesian cubic grid components and stored as separate volumes. Doing this requires that the two or four textures are bound to different texture units, as all of them need to be available to the shader simultaneously. It may be more cumbersome for the user to load up to four volumes separately, but this could be worked around by enhancing the support for loading multiple volumes at once. For instance, a volume source processor supporting bulk loading of all necessary volumes could be created.

Table 3.4: New volume types and their corresponding Voreen and OpenGL internal formats

Model   Format    Voreen type       Internal format
RG      CHAR      Volume2xInt8      GL_RG8
RG      UCHAR     Volume2xUInt8     GL_RG8
RG      SHORT     Volume2xInt16     GL_RG16
RG      USHORT    Volume2xUInt16    GL_RG16

3.4 Gradient calculation

To perform shading, Voreen's other shaders normally calculate gradients of sampled values to use as normals with a user-selectable shading method. Gradient calculation on the fly is a heavy operation requiring many neighbor values to be reconstructed for each sample point. The extra lookups required for each sample in the non-Cartesian methods make this process computationally unfeasible. Instead, the gradient for each known point in the grid can be pre-calculated and stored together with its pertaining intensity value. As there is no OpenGL internal format that can store 8 or 16 components, as would be necessary for BCC and FCC respectively, the separate storage model has to be used if the gradients are to be included. The interleaved storage method could be used if the gradients were stored in a separate texture, but this would require interpolation to be performed twice for each sample point, impacting performance. If we use the separate storage scheme, the interleaved intensity grids are split into separate textures, stored together with their corresponding gradient vectors and loaded as GL_RGBA textures. We can then exploit vectorization and interpolate both intensity and gradient values in a single operation.

3.4.1 Inside Voreen

A gradient estimation technique called second-order central differencing (SOCD) [11] can be used for both BCC and FCC. It is a simple and cheap method that can still yield acceptable results. For p = (x, y, z) in Figure 3.2, where h is the grid spacing, it is defined like this:

    ∂f/∂x ≈ (f(q_{1,0,0}) − f(q_{−1,0,0})) / (2h)
    ∂f/∂y ≈ (f(q_{0,1,0}) − f(q_{0,−1,0})) / (2h)
    ∂f/∂z ≈ (f(q_{0,0,1}) − f(q_{0,0,−1})) / (2h)

Figure 3.2: First order neighbors to p

This technique is equivalent to the standard central differencing method performed separately on each of the CC grids that make up a BCC or FCC grid. Even more fortunately, this functionality already exists in Voreen, as can be seen in include/voreen/core/datastructures/volume/gradient.h in the source code [10], and is easily accessible to the user via the VolumeGradient processor. This processor can run the aforementioned algorithm on the input data set, save the resulting gradient volume to memory and disk, and make it available to the network on its output port. To use this for BCC or FCC, the separate CC grids are loaded into separate VolumeSource processors and then connected to a VolumeGradient processor that has its Technique option set to Central differences. To combine intensity values and calculated gradients into one volume, a new, very simple processor called VolumeInterleave is written. It has two volume input ports and one volume output port. The user connects the gradient volume to the first port and the intensity volume to the second, and then the output port to a BCC or FCC raycaster.
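The SOCD stencil is easy to state in code. This Python sketch evaluates it on a single sub-grid, with a callable `f` standing in for the volume lookup:

```python
# Sketch of second-order central differencing (SOCD) on one Cartesian
# sub-grid: each partial derivative uses the two first-order neighbors
# along that axis, at distance h.

def socd_gradient(f, p, h=1.0):
    x, y, z = p
    return ((f((x + h, y, z)) - f((x - h, y, z))) / (2 * h),
            (f((x, y + h, z)) - f((x, y - h, z))) / (2 * h),
            (f((x, y, z + h)) - f((x, y, z - h))) / (2 * h))

# Central differences are exact for linear fields:
g = socd_gradient(lambda q: 2*q[0] + 3*q[1] - q[2], (0.0, 0.0, 0.0))
print(g)  # (2.0, 3.0, -1.0)
```

Running this once per known grid point, per sub-grid, is exactly the precomputation that the VolumeGradient route performs ahead of rendering.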

3.4.2 Externally

If the user wants to use other gradient estimation techniques, this can of course be done outside Voreen. The gradients have to be stored as triplets of unsigned 8-bit or 16-bit integers matching the intensity format. When unsigned, the negative part of their domain has to be shifted into the positive side, which is done by a simple multiplication and an addition:

\[
\widehat{\nabla f} = \frac{1}{2} \, \frac{\nabla f}{\lVert \nabla f \rVert} + \left( \frac{1}{2}, \frac{1}{2}, \frac{1}{2} \right) \tag{3.1}
\]

If we are using signed values, a simple normalization will do:

\[
\widehat{\nabla f} = \frac{\nabla f}{\lVert \nabla f \rVert}
\]

Then the values are ready to be expanded and stored in their respective binary format. OpenGL re-normalizes and shifts the values before they are provided to the shader program. Regardless of signedness, the shader will be presented with values in the range [0, 1]. To restore them to normalized gradients, it therefore performs (3.1) in reverse:

\[
\widehat{\nabla f} = 2 \left( \widehat{\nabla f} - \frac{1}{2} \right)
\]

The gradient is now ready to be used in a standard lighting model computation, such as for Phong lighting.
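The encode/decode round trip above can be sketched in NumPy. These are hypothetical helper names; the encoding corresponds to (3.1) and the decoding to the shader-side inverse:

```python
import numpy as np

def encode_gradient_unsigned(grad):
    """Normalize a gradient and shift it into [0, 1] per (3.1):
    0.5 * g/|g| + (0.5, 0.5, 0.5), ready for unsigned integer storage."""
    n = grad / np.linalg.norm(grad)
    return 0.5 * n + 0.5

def decode_gradient(stored):
    """Shader-side inverse of (3.1): map [0, 1] back to the
    normalized gradient in [-1, 1]."""
    return 2.0 * (stored - 0.5)
```

The decoded vector can then be fed straight into a Phong-style lighting computation.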

3.5 Module and processors

The code is organized into one module, called bcc, that provides two new processors, BCCRaycaster and FCCRaycaster. The first has two input volume ports and the second naturally has four. To these ports the user connects the correct CC sub-grids. Except for their different input mechanism, they function the same way as the standard SingleVolumeRaycaster in Voreen’s base module. They have three image output ports that can be configured to generate renders using different lighting models and compositing modes. The code for the new processors is based on the SingleVolumeRaycaster and the MultiVolumeRaycaster from the base module. To use the single texture format, the user connects a volume only to the first input port. To use the separate texture format, the user connects textures to all the input ports and the raycasters automatically switch to that format. A processor called VolumeInterleave is also added to the base module. Its intended use is interleaving gradients and intensity values when performing gradient calculation inside Voreen, as described in section 3.4.1.
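The packing that VolumeInterleave performs can be sketched in NumPy. This is only a sketch of the data layout (the real processor operates on Voreen Volume objects, and the function name is hypothetical): gradient components go into RGB, intensity into A, matching the GL_RGBA upload described in section 3.4:

```python
import numpy as np

def interleave_volumes(gradients, intensity):
    """Pack a gradient volume of shape (X, Y, Z, 3) and an intensity
    volume of shape (X, Y, Z) into one (X, Y, Z, 4) RGBA-style array:
    gradient in RGB, intensity in A."""
    return np.concatenate([gradients, intensity[..., np.newaxis]], axis=-1)
```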

3.5.1 Step length compensation

The step length for raycasters in Voreen is calculated from a user-defined constant called sample rate. This value affects the rendered image both through the number of samples gathered for each ray and pixel, and because it is used in the compositing step to make blending look similar between different step lengths. For a regular CC volume the step length is defined as 1/max(volume dimensions) × sample rate. This generates a step length that is too long for BCC and FCC volumes, as only one of the sub-grids is taken into account. Therefore, the BCCRaycaster and FCCRaycaster compensate by multiplying the resulting step length with a factor: the ratio of additional samples existing in the sub-grids relative to a pure CC grid. This is equal to the multiplicative inverse of the real cubic root of 2 and 4 for BCC and FCC respectively. In other words, the adjusted step length is calculated like this:

\[
\text{step length}_{\mathrm{BCC}} = \text{step length} \cdot \frac{1}{\sqrt[3]{2}}
\]
\[
\text{step length}_{\mathrm{FCC}} = \text{step length} \cdot \frac{1}{\sqrt[3]{4}}
\]
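The compensation can be sketched as a small helper (hypothetical name, mirroring the factor the raycasters apply to the CC-derived step length):

```python
def compensated_step_length(base_step, grid):
    """Scale the CC-derived step length by the inverse real cubic root
    of the sample-density factor: 1 for CC, 1/cbrt(2) for BCC,
    1/cbrt(4) for FCC."""
    factor = {"cc": 1.0, "bcc": 2.0 ** (-1.0 / 3.0), "fcc": 4.0 ** (-1.0 / 3.0)}
    return base_step * factor[grid]
```

For BCC this shortens the step to about 0.794 of the CC value, for FCC to about 0.630.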

3.6 Utilities

Being able to generate volumes to render is essential during the development of the new raycasters within Voreen. A tool named mkvol is created for this purpose: it generates data sets of mathematical functions, including gradients, in Voreen’s dat-raw format for CC, BCC and FCC.

3.7 Measuring performance

To test the performance of a raycaster, Voreen provides a handy set of functions in its Python library, accessible from a script running within Voreen’s python module. The script benchmark.py that ships with Voreen is modified to load a predefined set of volume files and run them in sequence. Like the original version, the script renders a volume using the chosen raycaster for a given number of frames, spreading a full rotation of the object among them. The average frame rate is then calculated from the amount of time that passed during the rendering of the frames. The modified version outputs the number of voxels in the benchmark volume, the average frame rate, and the total time the test took to run. The results of the various benchmarks run on the new raycasters are found in section 4.1.
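The measurement logic can be sketched as follows. This is not Voreen's Python API; `render_frame` is a hypothetical stand-in for the canvas repaint call, and only the timing arithmetic is taken from the description above:

```python
import time

def benchmark_raycaster(render_frame, frames=100):
    """Render `frames` frames, spreading a full 360-degree rotation
    among them, and return (average FPS, total elapsed seconds)."""
    start = time.perf_counter()
    for i in range(frames):
        render_frame(360.0 * i / frames)  # rotate a little per frame
    elapsed = time.perf_counter() - start
    return frames / elapsed, elapsed
```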

The signal used for generating the test volumes is a simple sphere defined as:

\[
f(\mathbf{p}) =
\begin{cases}
1 - \lvert \mathbf{p} \rvert & \text{if } \lvert \mathbf{p} \rvert < 1.0 \\
0 & \text{otherwise}
\end{cases}
\]
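The sphere signal is straightforward to evaluate; a minimal NumPy version (hypothetical helper name, as a tool like mkvol might use when filling a volume):

```python
import numpy as np

def sphere_signal(p):
    """f(p) = 1 - |p| inside the unit ball, 0 outside.
    Accepts a single point or an array of points (last axis = xyz)."""
    r = np.linalg.norm(p, axis=-1)
    return np.where(r < 1.0, 1.0 - r, 0.0)
```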

3.8 Measuring rendering quality

To measure the rendering quality, produced images are compared to a render of a relatively high resolution CC volume of the same signal. To produce a difference image, Voreen’s BinaryImageProcessor is utilized, rendering an image using a shader that gives every pixel the absolute value of the difference between the corresponding pixels in the high resolution CC render and the render from the processor being measured. A Voreen network that does this can be seen in Figure 3.3, and Figure 3.4 shows an example of a difference image being produced. The network is combined with a script that rotates the volume 360° around the 푦-axis and meanwhile captures a number of renders of the difference image and analyzes them. Three types of error metrics are calculated for every rendered image: maximum error, mean error and root mean square error (RMSE). The mean and sample standard deviation of these values are calculated for a set of different volume sizes. Due to how the step size is calculated, the number of steps per voxel remains constant when switching volume size. The sphere signal is used for generating volumes in this test as well. The results of the comparison of the CC, BCC and FCC raycasters and their reconstruction modes can be seen in section 4.2.
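The three per-image error metrics can be sketched in NumPy (hypothetical helper name; the actual analysis script computes these from the captured difference images):

```python
import numpy as np

def error_metrics(rendered, reference):
    """Return (max error, mean error, RMSE) between a render and the
    high-resolution CC reference image of the same shape."""
    d = np.abs(rendered.astype(float) - reference.astype(float))
    return d.max(), d.mean(), float(np.sqrt((d ** 2).mean()))
```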

Figure 3.3: The Voreen network used to generate a difference image for error analysis, here with the BCC raycaster currently being measured

(a) Truth (b) BCC render of the same signal sampled at 2×6×6×6 resolution using the linear box spline method (c) Difference of (a) and (b)

Figure 3.4: Example of images used in the rendering quality analysis

Chapter 4

Results and discussion

4.1 Performance

In Figure 4.1 the performance of the BCC raycaster is compared to the CC raycaster, and in Figure 4.2 the performance of the FCC raycaster is compared to the CC raycaster. The method used for measuring performance is described in section 3.7. Note that while the number of voxels is chosen as the 푥-axis for these plots, this value does not affect the performance in a straightforward manner. Rather, the dimensions of the volume affect the number of samples used per pixel in the image, as the ray marching step size is the inverse of the largest side of the volume multiplied by the user-defined constant sample rate.

4.2 Rendering quality

Figure 4.4 shows the mean rendering error values over 10 frames when rendering the sphere signal at different sizes. The error bars show the sample standard deviation of the values. The sample rate is set to 4.0 and a banded, monochrome transfer function is used. Figure 4.3 shows renders for visual comparison of the frequently seen Marschner-Lobb signal, presented in [12], which is useful for its varying frequency content. The volumes are cube shaped and as close as possible to two million voxels in size. Shading is enabled using pre-calculated gradients.

[Figure 4.1 plots frames per second (FPS) against millions of voxels for CC nearest neighbor, CC linear, BCC nearest neighbor, BCC DC spline and BCC linear box spline: (a) separate format, (b) interleaved format]

Figure 4.1: Performance comparison of the CC and BCC raycasters and their different reconstruction methods when rendering a sphere as 256×256 pixels without shading

[Figure 4.2 plots frames per second (FPS) against millions of voxels for CC nearest neighbor, CC linear, FCC nearest neighbor and FCC DC spline: (a) separate format, (b) interleaved format]

Figure 4.2: Performance comparison of the CC and FCC raycasters and their different reconstruction methods when rendering a sphere as 256×256 pixels without shading

(a) CC trilinear (126×126×126) (b) BCC DC-spline (2×100×100×100 voxels) (c) BCC linear box spline (2×100×100×100 voxels) (d) FCC DC-spline (4×80×80×80 voxels)

Figure 4.3: Renders of datasets of the Marschner-Lobb test signal [12]

[Figure 4.4 plots four error metrics against voxels for the sphere rendered with a banded transfer function, 10 frames, sample rate 4.0: RMSE, NRMSE, mean error and max error, each comparing CC trilinear, BCC DC-spline, BCC linear box spline and FCC DC-spline]

Figure 4.4: Sphere rendering quality analysis results

4.3 Discussion

4.3.1 Performance

As the results show, the frame rates for the non-Cartesian raycasters can be considered interactive. The frame rate is, as expected, seemingly proportional to the number of samples per pixel, and thus only slower than the CC algorithm by a constant factor. It is important to note that the performance measurement results can only be said to fairly measure this specific implementation. They do not serve as a just comparison of the algorithms themselves, as the algorithms are knowingly implemented in a non-optimal way so that they can all use the same storage scheme.

4.3.2 Rendering quality

The rendering quality proved to be acceptable for the tests performed. Most importantly, as this is the goal of the project, the rendering quality is comparable to, and sometimes even exceeds, that of the CC raycaster. What this mostly tells us is that the quality can be considered good enough for use cases where reconstruction accuracy and image quality are not of utmost importance; a renderer designed for real-time use should not be chosen in such cases anyway. Like the performance measurements, the measured quality is only representative of the filters in this particular implementation and configuration. Only the sphere signal results are presented in this report. Other signals were attempted, but the results were not meaningful, as visible artifacts were present in the reference image as well. To test quality more extensively, a different rendering technique should be used for the reference image. While the sphere can be rendered without visible artifacts in Voreen, this was not possible for other signals at viable volume dimensions and step lengths.

Chapter 5

Conclusions

5.1 System requirements

All the general system requirements for Voreen listed at http://voreen.org/98-System-Requirements.html of course apply. Additionally, the internal formats used for BCC volumes (GL_RG8 and GL_RG16) have been available since OpenGL version 3.0 without the use of an extension [13]. These features are supported by most, though not all, modern hardware and graphics drivers.

5.2 Known issues

• Other raycasters render nothing with an RG volume as input, since the intensity is loaded into the red and green channels instead of the alpha channel. Since this is technically not an error, no error or warning message is shown, which may confuse the user.

• The in the transfer function editor is broken for the BCC and FCC raycaster when using volumes in the interleaved format.

• 12-bit volumes are not supported in the RG sample format.

5.3 Future work

A different storage scheme should be investigated for performance reasons. The single-texture interleaved format used by Finkbeiner et al. in [7] allows for very efficient lookups using a single sampler and just two integer divisions, while still maintaining a packing efficiency of 100%. The method should provide a performance benefit for all algorithms other than the DC-spline. The multi-channel and multi-texture formats used in the current implementation require branching to decide which texture or channel to use. This may force threads to execute both parts of the branch, unnecessarily fetching values from all samplers. More optimizations could likely be found by profiling the shader code and by analyzing the resulting assembly of the GLSL programs.

5.4 Conclusion

Existing methods for efficiently reconstructing a signal on the BCC and FCC grids are sought out and evaluated. The ones deemed most appropriate for an interactive GPU implementation are selected and implemented in Voreen. Minimal changes, non-intrusive in nature, are made to Voreen’s core for the new components to function. Furthermore, the relative performance and rendering quality of the new components is studied. The results show that the new raycasters maintain acceptable frame rates and image quality comparable to the CC raycaster. The DC-spline implementation performs well relative to the CC trilinear method despite requiring five or sixteen times as many texture lookups and several additional linear interpolations. Even while using a suboptimal storage layout, the linear box spline also performs relatively well. As the new processors are modeled after the SingleVolumeRaycaster, they can be used as drop-in replacements in existing raycasting workspaces. In the end, the produced software is able to perform the tasks it is designed for and can also serve as a starting point for similar projects in the future.

Bibliography

[1] Tai Meng, Benjamin Smith, Alireza Entezari, Arthur E. Kirkpatrick, Daniel Weiskopf, Leila Kalantari, and Torsten Möller. On visual quality of optimal 3D sampling and reconstruction. In Proceedings of Graphics Interface 2007, GI ’07, pages 265–272, New York, NY, USA, 2007. ACM.

[2] Voreen Project. Project homepage. Retrieved March 19, 2012, from http://voreen.org/.

[3] J. Kruger and R. Westermann. Acceleration techniques for GPU-based volume rendering. In Proceedings of the 14th IEEE Visualization 2003 (VIS’03), VIS ’03, pages 38–, Washington, DC, USA, 2003. IEEE Computer Society.

[4] Christian Sigg and Markus Hadwiger. Fast third-order texture filtering. In Matt Pharr, editor, GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chapter 20, pages 313–329. Addison-Wesley Professional, 2005.

[5] Balázs Domonkos and Balázs Csébfalvi. DC-splines: Revisiting the trilinear interpolation on the body-centered cubic lattice. In Proceedings of the 15th Vision, Modeling, and Visualization Workshop (VMV), pages 275–282, Siegen, Germany, November 2010.

[6] Bernhard Finkbeiner, Usman R. Alim, Dimitri Van De Ville, and Torsten Möller. High-quality volumetric reconstruction on optimal lattices for computed tomography. Comput. Graph. Forum, 28(3):1023–1030, 2009.

[7] B. Finkbeiner, A. Entezari, D. V. D. Ville, and T. Möller. Efficient volume rendering on the body centered cubic lattice using box splines. Computers & Graphics, 34(4):409–423, 2010.

[8] Balázs Domonkos and Balázs Csébfalvi. 3D frequency domain analysis of reconstruction schemes on the body-centered cubic lattice. & Geometry, Internet-Journal, 13(1), 2011.

[9] Minho Kim, A. Entezari, and J. Peters. Box spline reconstruction on the face-centered cubic lattice. Visualization and Computer Graphics, IEEE Transactions on, 14(6):1523–1530, November–December 2008.

[10] Voreen Project. Source code snapshot of Voreen, revision 41. Retrieved March 19, 2012, from Subversion repository at http://svn.voreen.org/public/voreen-snapshot/?p=41.

[11] Z. Hossain and T. Möller. A novel approach and comparison of normal estimation techniques on body centric cubic (BCC) lattice. Retrieved May 22, 2012, from http://graphics.stanford.edu/~zhossain/cmpt_767/cmpt767-zahid-final.pdf.

[12] Stephen R. Marschner and Richard J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of the conference on Visualization ’94, VIS ’94, pages 100–107, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.

[13] The Khronos Group. OpenGL ARB_texture_rg specification. Retrieved May 2, 2012, from http://www.opengl.org/registry/specs/ARB/texture_rg.txt.

Appendices

Appendix A

BCC fragment shader

#define NORM(v) ((v) / volumeStruct1_.datasetDimensions_)

#ifndef VOLUME_FORMAT_INTERLEAVED
#define SAMPLE vec4
#define TEX0(p) texture(volumeStruct1_.volume_, NORM((p) - g0_off))
#define TEX1(p) texture(volumeStruct2_.volume_, NORM((p) - g1_off))
#define TEX2(p) texture(volumeStruct3_.volume_, NORM((p) - g2_off))
#define TEX3(p) texture(volumeStruct4_.volume_, NORM((p) - g3_off))
#else
#define SAMPLE float
#define TEX0(p) texture(volumeStruct1_.volume_, NORM((p) - g0_off)).r
#define TEX1(p) texture(volumeStruct1_.volume_, NORM((p) - g1_off)).g
#define TEX2(p) texture(volumeStruct1_.volume_, NORM((p) - g2_off)).b
#define TEX3(p) texture(volumeStruct1_.volume_, NORM((p) - g3_off)).a
#endif

const vec3 g0_off = vec3(0.0, 0.0, 0.0);
const vec3 g1_off = vec3(0.0, 0.5, 0.5);
const vec3 g2_off = vec3(0.5, 0.0, 0.5);
const vec3 g3_off = vec3(0.5, 0.5, 0.0);

vec4 reconstructDC(in vec3 p)
{
    vec3 pw = p * volumeStruct1_.datasetDimensions_ - 0.5;

    vec3 v0 = round(pw);                   /* White. */
    vec3 v1 = round(pw + g1_off) - g1_off; /* White-black. */
    vec3 v2 = round(pw + g2_off) - g2_off; /* Black. */
    vec3 v3 = round(pw + g3_off) - g3_off; /* White-white. */

    mat3 flipMatrix =
        mat3(vec3(sign(v2.x - v0.x), 0.0, 0.0),
             vec3(0.0, sign(v3.y - v0.y), 0.0),
             vec3(0.0, 0.0, sign(v1.z - v0.z)));

    vec3 sample;

    sample = v0 + flipMatrix * vec3(0.5, 0.0, 0.0) + 0.5;
    SAMPLE p100 = (TEX0(sample) + TEX3(sample) + TEX2(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.0, 0.5, 0.0) + 0.5;
    SAMPLE p010 = (TEX0(sample) + TEX3(sample) + TEX1(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.0, 0.0, 0.5) + 0.5;
    SAMPLE p001 = (TEX0(sample) + TEX2(sample) + TEX1(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.5, 0.5, 0.5) + 0.5;
    SAMPLE p111 = (TEX2(sample) + TEX1(sample) + TEX3(sample)) / 3.0;

    SAMPLE p000 = TEX0(v0 + 0.5);
    SAMPLE p011 = TEX1(v1 + 0.5);
    SAMPLE p101 = TEX2(v2 + 0.5);
    SAMPLE p110 = TEX3(v3 + 0.5);

    SAMPLE q00 = mix(p000, p010, 2.0 * abs(pw.y - v0.y));
    SAMPLE q11 = mix(p101, p111, 2.0 * abs(pw.y - v0.y));
    SAMPLE q01 = mix(p001, p011, 2.0 * abs(pw.y - v0.y));
    SAMPLE q10 = mix(p100, p110, 2.0 * abs(pw.y - v0.y));

    SAMPLE left  = mix(q00, q01, 2.0 * abs(pw.z - v0.z));
    SAMPLE right = mix(q11, q10, 2.0 * abs(pw.z - v1.z));
    SAMPLE value = mix(left, right, 2.0 * abs(pw.x - v0.x));

#ifndef VOLUME_FORMAT_INTERLEAVED
    value.xyz -= 0.5;
    return value;
#else
    return vec4(0.0, 0.0, 0.0, value);
#endif
}

vec4 reconstructNearest(in vec3 X)
{
    vec3 x = X * volumeStruct1_.datasetDimensions_ - 0.5;

    vec3 j[4];
    j[0] = round(x);
    j[1] = round(x + g1_off) - g1_off;
    j[2] = round(x + g2_off) - g2_off;
    j[3] = round(x + g3_off) - g3_off;

    float jd[4];
    jd[0] = distance(j[0], x);
    jd[1] = distance(j[1], x);
    jd[2] = distance(j[2], x);
    jd[3] = distance(j[3], x);

    SAMPLE value;

    if (jd[0] < jd[1]) {
        if (jd[0] < jd[2]) {
            if (jd[0] < jd[3])
                value = TEX0(j[0] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
        else {
            if (jd[2] < jd[3])
                value = TEX2(j[2] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
    }
    else {
        if (jd[1] < jd[2]) {
            if (jd[1] < jd[3])
                value = TEX1(j[1] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
        else {
            if (jd[2] < jd[3])
                value = TEX2(j[2] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
    }

#ifndef VOLUME_FORMAT_INTERLEAVED
    value.xyz = (value.xyz - vec3(0.5)) * 2.0;
    return value;
#else
    return vec4(0.0, 0.0, 0.0, value);
#endif
}

Appendix B

FCC fragment shader

#define NORM(v) ((v) / volumeStruct1_.datasetDimensions_)

#ifndef VOLUME_FORMAT_INTERLEAVED
#define SAMPLE vec4
#define TEX0(p) texture(volumeStruct1_.volume_, NORM((p) - g0_off))
#define TEX1(p) texture(volumeStruct2_.volume_, NORM((p) - g1_off))
#define TEX2(p) texture(volumeStruct3_.volume_, NORM((p) - g2_off))
#define TEX3(p) texture(volumeStruct4_.volume_, NORM((p) - g3_off))
#else
#define SAMPLE float
#define TEX0(p) texture(volumeStruct1_.volume_, NORM((p) - g0_off)).r
#define TEX1(p) texture(volumeStruct1_.volume_, NORM((p) - g1_off)).g
#define TEX2(p) texture(volumeStruct1_.volume_, NORM((p) - g2_off)).b
#define TEX3(p) texture(volumeStruct1_.volume_, NORM((p) - g3_off)).a
#endif

const vec3 g0_off = vec3(0.0, 0.0, 0.0);
const vec3 g1_off = vec3(0.0, 0.5, 0.5);
const vec3 g2_off = vec3(0.5, 0.0, 0.5);
const vec3 g3_off = vec3(0.5, 0.5, 0.0);

vec4 reconstructDC(in vec3 p)
{
    vec3 pw = p * volumeStruct1_.datasetDimensions_ - 0.5;

    vec3 v0 = round(pw);
    vec3 v1 = round(pw + g1_off) - g1_off;
    vec3 v2 = round(pw + g2_off) - g2_off;
    vec3 v3 = round(pw + g3_off) - g3_off;

    mat3 flipMatrix =
        mat3(vec3(sign(v2.x - v0.x), 0.0, 0.0),
             vec3(0.0, sign(v3.y - v0.y), 0.0),
             vec3(0.0, 0.0, sign(v1.z - v0.z)));

    vec3 sample;

    sample = v0 + flipMatrix * vec3(0.5, 0.0, 0.0) + 0.5;
    SAMPLE p100 = (TEX0(sample) + TEX3(sample) + TEX2(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.0, 0.5, 0.0) + 0.5;
    SAMPLE p010 = (TEX0(sample) + TEX3(sample) + TEX1(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.0, 0.0, 0.5) + 0.5;
    SAMPLE p001 = (TEX0(sample) + TEX2(sample) + TEX1(sample)) / 3.0;

    sample = v0 + flipMatrix * vec3(0.5, 0.5, 0.5) + 0.5;
    SAMPLE p111 = (TEX2(sample) + TEX1(sample) + TEX3(sample)) / 3.0;

    SAMPLE p000 = TEX0(v0 + 0.5);
    SAMPLE p011 = TEX1(v1 + 0.5);
    SAMPLE p101 = TEX2(v2 + 0.5);
    SAMPLE p110 = TEX3(v3 + 0.5);

    SAMPLE q00 = mix(p000, p010, 2.0 * abs(pw.y - v0.y));
    SAMPLE q11 = mix(p101, p111, 2.0 * abs(pw.y - v0.y));
    SAMPLE q01 = mix(p001, p011, 2.0 * abs(pw.y - v0.y));
    SAMPLE q10 = mix(p100, p110, 2.0 * abs(pw.y - v0.y));

    SAMPLE left  = mix(q00, q01, 2.0 * abs(pw.z - v0.z));
    SAMPLE right = mix(q11, q10, 2.0 * abs(pw.z - v1.z));
    SAMPLE value = mix(left, right, 2.0 * abs(pw.x - v0.x));

#ifndef VOLUME_FORMAT_INTERLEAVED
    value.xyz -= 0.5;
    return value;
#else
    return vec4(0.0, 0.0, 0.0, value);
#endif
}

vec4 reconstructNearest(in vec3 X)
{
    vec3 x = X * volumeStruct1_.datasetDimensions_ - 0.5;

    vec3 j[4];
    j[0] = round(x);
    j[1] = round(x + g1_off) - g1_off;
    j[2] = round(x + g2_off) - g2_off;
    j[3] = round(x + g3_off) - g3_off;

    float jd[4];
    jd[0] = distance(j[0], x);
    jd[1] = distance(j[1], x);
    jd[2] = distance(j[2], x);
    jd[3] = distance(j[3], x);

    SAMPLE value;

    if (jd[0] < jd[1]) {
        if (jd[0] < jd[2]) {
            if (jd[0] < jd[3])
                value = TEX0(j[0] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
        else {
            if (jd[2] < jd[3])
                value = TEX2(j[2] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
    }
    else {
        if (jd[1] < jd[2]) {
            if (jd[1] < jd[3])
                value = TEX1(j[1] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
        else {
            if (jd[2] < jd[3])
                value = TEX2(j[2] + 0.5);
            else
                value = TEX3(j[3] + 0.5);
        }
    }

#ifndef VOLUME_FORMAT_INTERLEAVED
    value.xyz = (value.xyz - vec3(0.5)) * 2.0;
    return value;
#else
    return vec4(0.0, 0.0, 0.0, value);
#endif
}
