Hierarchical Multi-resolution Data Structure for Molecular Visualization

DIPLOMA THESIS

submitted in partial fulfillment of the requirements for the degree of

Diplom-Ingenieurin

in

Visual Computing

by

Milena Nowak, BSc Registration Number 0927584

to the Faculty of Informatics at the TU Wien

Advisor: Prof. Dipl.-Ing. Dr.techn. Stefan Bruckner

Vienna, 27th July 2020    Milena Nowak    Stefan Bruckner

Technische Universität Wien, A-1040 Wien, Karlsplatz 13, Tel. +43-1-58801-0, www.tuwien.ac.at

Declaration of Authorship

Milena Nowak, BSc Speckbachergasse 13, 1160 Wien

I hereby declare that I have written this thesis independently, that I have fully acknowledged all sources and aids used, and that all parts of the thesis - including tables, maps, and figures - that are taken from other works or from the Internet, either verbatim or in substance, are clearly marked as quotations with the source indicated.

Vienna, 27th July 2020    Milena Nowak


Acknowledgements

There is a reason the first person who receives thanks is generally the person who supervised the thesis. In Norwegian, the role is described as "leading the way" towards finishing a degree. That job, it turns out, involves a lot of work. Therefore, I would first and foremost like to thank Stefan Bruckner for his patience and clarity. But academic support is not the only kind needed, so my thanks also go to Marte, whose company made long working hours during short autumn days a lot more pleasant, and Rebecca, for her invaluable moral support and feedback. I would also like to thank everyone who made it possible for me to spend most of my degree in Bergen, as well as my family for making it clear that I have their support - and a home - no matter where I am.


Kurzfassung

Biomolecular data sets are not only complex, they also grow as the research field develops. To investigate their properties, experts use, among other things, three-dimensional models of the data sets, which consist of millions of individual molecules. During the visual analysis of these models, it can also be helpful to emphasize structural properties through different forms of representation. There is therefore a need for flexible data structures that make it possible to work efficiently with such data sets. Existing approaches for large point-based data sets mostly use grid-based structures at different resolutions. Other publications concentrate instead on improving surface representations, or on repeating structures within large data sets. We propose an octree data structure that divides the data spatially in such a way that each data block contains a similar number of data points, and that provides several interpolable levels of detail. The data structure is built in a pre-processing step whose result is stored, so it only has to be performed once. In addition, we make rendering more efficient by using a least recently used cache, which manages the loading of visible data blocks, and by view-dependent level of detail rendering. In our evaluation, we show that adapting the rendered level of detail in particular increases frame rates significantly. Combining a higher resolution in the foreground with a reduced amount of data in the background increases speed considerably without compromising visual quality. The options to reduce detail and to load only data blocks that lie within the camera's view frustum make it possible to display data sets that would otherwise exceed the capacity of the resources available to us. The advantage of a spatial subdivision based on the density of the data set, as opposed to a regular subdivision, is particularly evident when using algorithms that take neighborhoods into account. This could be of great benefit for the implementation of a Solvent Excluded Surface (SES) model, one of the most important but computationally expensive representations of such data sets. The proposed data structure is optimized for data sets containing several million points in a single time step.


Abstract

The complexity of biomolecular data sets is both high and still rising. Three-dimensional models of molecules are used in research to test and investigate their properties. Such models can consist of several millions of atoms. Additionally, visual enhancement methods and molecular surface models are helpful when visualizing molecules. There is therefore a demand for efficient and flexible data structures to accommodate such large point-based data sets. Existing solutions in the field of molecular visualization for large data sets include the use of, in most cases, regular grid-based data structures, as well as levels of detail. Other papers focus on repeating structures or on improving the efficiency of surface models. We propose an octree-based data structure that divides space into areas of similar density and provides several levels of detail. Our approach is optimized for a single time-step, moving much of the computational overhead into a pre-processing step. This allows us to speed up frame rates for interactive visualizations using visibility culling, least recently used caching based on the pre-built octree data structure, and level of detail solutions such as depth-based level of detail rendering. In our evaluation, we show that level of detail rendering significantly improves frame rates, especially in the case of distance-based level of detail selection while keeping the amount of detail in the foreground high. Both the possibility to reduce the resolution and the caching strategy that allows us to only upload visible parts of the data set make it possible to render data sets that previously exhausted the capacities of our test set-up. We found the main advantage of a density-based octree, instead of a regular division of space, to be in neighbourhood-based calculations, such as the clustering algorithm required to build levels of detail. This could prove particularly useful for the implementation of a Solvent Excluded Surface (SES) representation model, which would be an important feature to include when developing the framework further.


Contents

Kurzfassung
Abstract
Contents
1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Contribution
  1.4 Structure
2 State of the Art
  2.1 Frameworks and Systems for the Visualization of Biomolecular Data
  2.2 Rendering of Point-based Data
  2.3 Methods for Handling Large Point-based Data Sets
  2.4 Biomolecular Representation Models and Enhancement Methods
  2.5 Summary and Conclusion
3 Data Structure
  3.1 Overview of the Components for CPU Data Handling
  3.2 The Octree Data Structure
  3.3 Summary and Conclusion
4 Rendering
  4.1 Managing and Rendering the Octree Data Structure
  4.2 Smoothly Blending Between LODs
  4.3 Molecular Surface Rendering
  4.4 Visual Enhancement for Molecular Rendering
  4.5 Summary and Conclusion
5 Implementation
  5.1 Libraries
  5.2 Implementation Choices
6 Results
  6.1 Test Data
  6.2 Configuring and Pre-processing the Data Structure
  6.3 Rendering
  6.4 Enhancement Methods
  6.5 Conclusion and Comparison to Similar Solutions
7 Discussion
8 Conclusion
List of Figures
List of Tables
Bibliography

CHAPTER 1

Introduction

The art and science of visualization itself long predates the field of visual computing. Within the literature, several varying definitions of the term visualization exist. Broadly speaking, visualization could be described as the meaningful transformation of data or information into graphical representations. More specifically, according to Senay and Ignatius [SI94], "The primary objective in data visualization is to gain insight into an information space by mapping data onto graphical primitives." The necessary ingredients are therefore data that you want to explore or explain, suitable tools to achieve the desired illustration or rendering, and of course certain skills and knowledge to do it successfully. Traditional visualization techniques generally rely on hand-drawn 2D illustrations. Possibly the most well-known historical example of 2D visualization is an 1869 graphic by Charles Joseph Minard shown in Figure 1.1a. It depicts the march of Napoleon's army into Russia in 1812 and 1813, and according to Tufte [Tuf83], "It may well be the best statistical graphic ever produced."

The same frameworks and techniques that are used to communicate patterns or interesting features in a data set can often also be used to explore data. The term scientific visualization often refers to a specific type of data visualization, created in order to communicate or explore data obtained in a scientific field. Senay and Ignatius [SI94] focus their definition on the aspect of exploration, stating that "Scientific data visualization supports scientists and relations, to prove or disprove hypotheses, and discover new phenomena using graphical techniques." Another similarly famous historical visualization, the map of the cholera outbreak in London in 1854 by Dr. John Snow, is an example of data visualization being used to investigate a hypothesis. Using the statistical visualization shown in Figure 1.1b, Dr. Snow was able to support his hypothesis that the epidemic was caused by water from a contaminated well, and convince the authorities to remove the handle of the well. While Tufte [Tuf97] points out that it is problematic to use density-based visualization techniques without factoring in population density, the graphic


remains famous as it was used to successfully explore and convincingly communicate a hypothesis that was later proven to be correct. Our work is concerned with molecular visualization, which can be considered a specific branch of scientific data visualization. The definition given by Senay and Ignatius is particularly relevant to our work, which focuses on facilitating interactive exploration of molecular data. As we will see in Section 1.1, there are several widely used conventions on how to visualize molecules using computer graphics. An example of non-digital 3D data visualization from the field of molecular graphics is the original so-called ball-and-stick model, developed by August Wilhelm Hofmann. He built the models for explaining structural formulae out of croquet balls and presented them while lecturing in London in 1865 (see Travis [Tra92]). Versions of the ball-and-stick model are still used in current molecular visualizations in the field of computer graphics.

1.1 Background

Before moving on to the question of how we use scientific visualization in the field of biomolecules, a look at the basics of the matter to be visualized provides some useful context. In the introductory remarks of his book on molecular cell biology, Lodish [Lod00] explains the convoluted structures biological matter consists of. Within an organism, there are organs, which consist of tissue, those in turn consist of cells, and cells are composed of atoms. All biological systems are made up of the same types of atoms and are organized based on similar principles at the cellular level. Biomolecules are divided into macromolecules, such as proteins, lipids, and nucleic acids, and small molecules. Molecules are the material from which cells are built, but they also have other functions, such as carrying messages from cell to cell. Albeit a very short summary of the complex topic of molecular biology, this should provide some idea of why improving the understanding of what he calls "the molecules of life" is important. Lodish suggests that in order to learn about biological systems, we have to regard them one segment at a time. In order to do that, it is helpful to have suitable visualizations at one's disposal. To make molecular structures intelligible, including their properties and interactions, is the primary purpose of molecular visualization according to Kozlíková et al. [KKF+17]. They also provide an overview of the history of molecular illustrations prior to the digital age. Before (interactive) visualizations could be created by computer graphics, depictions were drawn by hand. Atoms were already illustrated as spheres in the 17th century. In 1873, van der Waals [VdW73] approximated the volume occupied by individual atoms and molecules using experiments. The resulting approximate radii for chemical elements have been used to visualize atoms and molecules ever since. More elaborate atomic models described in the early 20th century led to suggestions of more detailed illustrations. However, the majority of recent papers on molecular visualizations we reviewed in the context of this work still use van der Waals spheres as the basis of their visualizations.


(a) The visualization referred to as "Napoleon's March", designed by Minard in 1869, is a much-cited example of data or information visualization. (From Tufte [Tuf83], who considers this to be an excellent statistical graphic)

(b) Dr. Snow used a map-based visualization of a part of London that was hit particularly hard by the cholera epidemic in 1854 to convince authorities of his hypothesis that the origin of the epidemic was a contaminated well. (From Tufte [Tuf97])

(c) Illustrations of the so-called ball-and-stick molecular model made from croquet balls that were introduced by Hofmann in 1865. (From Travis [Tra92])

Figure 1.1: Historical examples of data visualizations

According to Francoeur [Fra02], the earliest efforts to use computer graphics to create interactive representations of molecular structures were made by a team around Cyrus Levinthal at the Massachusetts Institute of Technology in the mid-1960s. Techniques in the subfield of the interactive visualization of biomolecular structures have mostly been developed during the last two decades (see Kozlíková et al. [KKF+17]). Marschner and Shirley [MS15] define computer graphics as "any use of computers to create and manipulate images", which includes technical illustrations and animations. When the aim is an interactive application, as is the case in our work, the particular challenge is to create the images within the required time frame, that is, generally in real-time. Martin [Mar65] points out that different authorities define the term real-time differently. So rather than specifying any particular frame rate, he defines a real-time system as one which receives and processes data, and returns the results "sufficiently quickly to affect the functioning of the environment at that time", which is a rather system-oriented definition. Laplante and Ovaska [LO11] introduce the term with the claim that most people would understand real-time to mean something along the lines of "at once" or "instantaneously". As an intuitive definition, they call a system interactive when, in order to avoid system failure, it needs to process information within a specified interval, which is similar to the definition given by Martin [Mar65]. The literature we reviewed generally uses the term real-time without giving a precise threshold in terms of frames per second (fps), but rather provides the reader with the achieved frame rates. As an approximate guideline, we can point to the speed at which the human eye can process images, which is 10-12 images per second, or the speeds used in filmmaking history, 16 or 24 frames per second (see Read and Meyer [RM00]).

1.2 Motivation

The basic question of why biomolecules are worth investigating should, among other possible answers, be answerable by the simple fact that they are the building blocks the human body is made of. When it comes to why we need visualizations, Jones et al. [JJS05] point out that there are chemical phenomena that require visualization tools to investigate and describe them, as they are not obvious without visualization. They also note that such tools can be used to visually communicate complex molecular interactions and dynamics that are difficult to describe using only words. In particular, the interactive visualization of complex molecular data is highly valued by domain experts. Craig et al. [CMB13] emphasize that interactive animations can communicate structural information much more effectively than static images. An important computational aspect of the complexity of molecular data is the amount of data to be processed at interactive rates. Molecular structures can contain millions of atoms, creating very large data sets. For example, the largest single molecule we use in our tests contains 2,440,800 atoms, while our largest artificially created test set contains 122,040,000 atoms. As we will see in Chapter 2 and Chapter 3, there are certain limitations to the amount of data that can be processed at the same time. While the exact constraints depend on the system, the fact that there is a limit remains true across hardware and software capabilities.
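To give a rough sense of the memory footprint these numbers imply, the following back-of-envelope sketch (our own illustration, assuming a hypothetical minimal encoding of four 32-bit floats per atom for position and radius; the actual per-atom storage in our implementation differs) shows that the largest test set already approaches two gigabytes of raw coordinate data, before levels of detail or any auxiliary structures are added:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical minimal encoding: position (x, y, z) and radius as 32-bit floats.
    const std::uint64_t bytesPerAtom    = 4 * sizeof(float);   // 16 bytes
    const std::uint64_t largestMolecule = 2440800;             // largest single test molecule
    const std::uint64_t largestTestSet  = 122040000;           // largest artificial test set

    std::printf("largest molecule: %.1f MB\n",
                largestMolecule * bytesPerAtom / (1024.0 * 1024.0));
    std::printf("largest test set: %.2f GB\n",
                largestTestSet * bytesPerAtom / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```

Even under this deliberately optimistic assumption, the synthetic test set occupies roughly 1.8 GB, which illustrates why per-block culling and level of detail become necessary.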


According to our research, atoms in most molecular data sets are represented as point-based data (see Chapter 2). It is therefore possible to take inspiration from existing solutions for large point-based data sets. However, molecular data sets pose some specific challenges that are not necessarily relevant for other point-based data sets. In particular, several different surface models for molecular representations have been proposed and are requested by domain experts. Notable surface models include the Solvent Accessible Surface (Lee and Richards [LR71]), Solvent Excluded Surface (Richards [Ric77]), Ligand Excluded Surface (Lindow et al. [LBH14]), and the Gaussian Surface (Blinn [Bli82]). These are, in various degrees, computationally expensive and partially dependent on neighborhood queries (for more detail see Chapter 2 and Chapter 4 or refer to Kozlíková et al. [KKF+17]). Our aim is to implement a data structure for point-based molecular data that is efficient for neighborhood queries, and flexible in rendering the data.

1.3 Contribution

We propose a hierarchical multi-resolution structure based on an octree and levels of detail. The data structure, including levels of detail, is built in a pre-processing step once, and then stored for later use. All levels are therefore available at runtime, allowing smooth interpolation between resolutions. Many of the octree-based solutions we came across in our research divide the data set in a regular way and use the internal nodes of the octrees to store different levels of detail. In contrast to this approach, we divide the data set spatially on the original level of detail. The spatial extent of an individual block or node in the tree is determined by the number of data points within it. A block in a sparse region can thus have a larger spatial extent or bounding box than one in a dense region. We store the data for all levels of detail in the leaf nodes. This approach combines many of the advantages of regular grids, such as the possibility to "pick and mix" different resolutions for individual blocks, with advantages of hierarchical spatial division, for example greater control over the amount of data stored in each leaf. We demonstrate the use of our modular structure for optimization strategies including coarse visibility culling, mixed levels of detail, depth-dependent level of detail, and data management via a least recently used (LRU) cache. As our basic surface rendering model, we provide an implementation of the Gaussian Surface Model as proposed by Bruckner [Bru19]. In addition, we include enhancement options such as depth of field and ambient occlusion. The project is implemented in C++ and OpenGL.
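A minimal sketch of the kind of node layout this implies is shown below. It is illustrative only; names such as OctreeNode and AtomLOD-style containers are placeholders and do not mirror the actual implementation. The key point is that interior nodes carry only bounds and children, while leaf nodes carry one atom array per level of detail, so resolutions can be mixed per block and interpolated at runtime.

```cpp
#include <array>
#include <memory>
#include <vector>

struct Atom { float x, y, z, radius; };
struct AABB { float min[3]; float max[3]; };

// Hypothetical node layout: interior nodes only hold bounds and children,
// leaf nodes store their block's atoms at every pre-computed level of detail.
struct OctreeNode {
    AABB bounds;                                          // spatial extent of this block
    std::array<std::unique_ptr<OctreeNode>, 8> children;  // all empty for leaf nodes
    std::vector<std::vector<Atom>> lods;                  // lods[0] = original resolution
    bool isLeaf() const { return children[0] == nullptr; }
};
```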

1.4 Structure

After analyzing the current state of the art in Chapter 2, we first introduce our data structure (Chapter 3), and then go on to rendering aspects of our implementation (Chapter 4). In the chapter concerning the data structure, we focus mainly on the CPU side of the implementation, explaining the division of data into an octree, and the clustering-based level of detail calculation. The chapter on rendering includes both GPU and CPU components, including the LRU cache, level of detail blending, and screen-space


enhancement and surface methods. In a separate implementation chapter (Chapter 5), we provide additional information about libraries, the choice of important data types, and other factors that concern the concrete implementation in C++/OpenGL, rather than algorithms and concepts. Finally, we provide numerical and visual results (Chapter 6), and discuss applications and limitations (Chapter 7).

CHAPTER 2

State of the Art

In our work, we aim to render large sets of biomolecular data at interactive rates, and build a framework that allows the efficient calculation of neighborhood-based algorithms. Rendering biomolecular data has a long history in visualization and computer graphics. Individual atoms are usually defined by their position and rendered as spheres, so they can be considered point-based data. Similar data sets based on points are also used for other visualization tasks, for example fluid simulations. The individual data points are commonly called particles, especially when they are the result of a simulation. We start this chapter by giving an overview of previous work in the field of biomolecular visualization. In the following section, we take a closer look at point-based rendering methods. As our goal is to enable real-time rendering of large data sets, we look into previous work specifically targeting large point-based data. Here, we pay particular attention to the proposed data structures and level of detail methods. In the final part of this chapter, we cover research on the representation of biomolecular data, as well as enhancement methods for point-based data.

2.1 Frameworks and Systems for the Visualization of Biomolecular Data

The data we visualize is biomolecular data consisting of up to a couple of millions of atoms. Our aim is interaction in real-time, and in our current implementation, we focus on a single time-step. In this section, we look at implementations that propose solutions to very similar challenges. In a recent state of the art report, Kozlíková et al. [KKF+17] give an extensive overview of developments and results in the field of visualization of biomolecular structures. Many researchers publish the results of their work in the form of toolkits. It is outside the scope of this thesis to compile a comprehensive list, so we only mention some of the


most well-known and relevant available tools. The following papers give a more detailed overview of toolkits: Chavent et al. [CLK+11] focus on tools and implementations that use the GPU to accelerate the visualization of large molecular data sets. Johnson and Hertig [JH14] provide a well-structured overview of tools and methods for the visual analysis and communication of biomolecular data from a user's point of view. Dubbeldam et al. [DVVC19] present a survey of existing (bio-)chemical tools and visualization software. An early example of a toolkit for molecular visualization is RASMOL, a molecular visualization tool presented by Sayle and Milner-White [SMW95]. One of the most frequently cited tools is VMD by Humphrey et al. [HDS96]. It is mainly used for molecular dynamics simulations. Both Chimera by Pettersen et al. [PGH+04] and PyMOL by Schrödinger et al. [Sch15] are popular extendable frameworks for molecular data mostly written in Python. Herraez [Her06] proposes JMol as a tool to investigate biomolecular data, but also emphasizes possibilities to integrate it into educational web pages. Paraview by Ahrens et al. [AGL05] is frequently used for large molecular data sets, though it is a visualization tool made to handle large point-based volume data sets generally. Megamol by Grottel et al. [GKM+14] is another framework with the capability to render different kinds of point-based data, though its focus is on molecular data. They use a combination of GPU rasterization, ray-casting of glyphs, and image-space filtering to interactively render millions of atoms. Though we focus on a single time-step in our implementation, molecular dynamics simulation implementations often apply similar optimization strategies to ours, as they usually have to handle very large data sets. The main difference is that the trade-off for computationally expensive pre-processing is more problematic when rendering animations. Hao et al. [HVS04], for example, choose to use a simpler data structure, as they aim to render multiple time steps without a significant amount of pre-processing. Instead, they focus their efforts on occlusion culling schemes and levels of detail. Max [Max04] renders large molecular data sets at interactive rates. He also focuses on a single time step, and achieves interactivity by using a hierarchical level of detail data structure. He uses individual atoms as leaves in his tree. In contrast, we use leaf nodes to group the data into neighborhood clusters. Sigg et al. [SWBG06] use GPU splatting of molecular data. Their implementation is somewhat similar to ours, as they use the same kind of data and visualize it based on vdW surfaces. However, they do not use any advanced data structure and only render data sets of up to several thousands of atoms in real-time. Sharma et al. [SKNV04] present an approach to interactively visualize large atomistic data sets. They use a hierarchical view frustum-culling algorithm based on an octree data structure to remove atoms outside the field of view. Then, they select atoms which have a high probability of being visible. The selected atoms are rendered as spheres at various LODs, with a resolution derived from an exponential function of the distance from the viewer. They do not include any advanced surface visualization techniques in their viewer. Their octree consists of three levels. Each octree node contains the coordinate bounds of its subspace.


Figure 2.1: Octree data structure by Sharma et al. [SKNV04] that divides the data set into three levels of detail, as illustrated in their publication.

A node at level 2 contains a pointer to a structure that stores the atom data within its bounds, as illustrated in Figure 2.1. This is a similar approach to ours, though we do not use a fixed number of levels. Instead, we subdivide each region until no node contains more than the given maximum number of atoms.

Grottel et al. [GRDE10] present a visualization framework for molecular dynamics simulations. They choose a simple data structure and rather focus on optimization strategies. They employ several different methods: data quantization, data caching in video memory, and a two-level occlusion culling strategy involving hardware occlusion queries and maximum depth mipmaps. As molecular dynamics usually implies time-dependent data sets, a particular aim is to alleviate the data transfer bottleneck. Lindow et al. [LBH12] render large molecular data sets by making use of repetitive components found in molecular data. According to the authors, their work is closely related to that of Sharma et al. [SKNV04], and Grottel et al. [GRDE10]. They store atoms of individual components in a grid. The grid is then stored in a three-dimensional texture. This texture is then rendered using ray-casting in an approach similar to methods proposed for volume rendering. Materials are rendered at atomic scale, bridging five orders of magnitude in scale from the smallest details to the overall structure of the data sets. They exploit the fact that biological structures consist of many recurring molecular substructures. Each recurring component is represented by a single grid. All instances of a component are rendered by applying different transformations to that grid.


The authors use a ray-casting algorithm similar to classical voxel rendering methods which utilizes the bounding box of the data. Ray-casting is implemented in the fragment shader. Occlusion culling is accomplished implicitly on the atomic level, based on the underlying grid. Additionally, they use deferred shading to emphasize the global shape of biological structures. Falk et al. [FKE13], who extend the algorithm proposed by Lindow et al. [LBH12], use the same kind of data sets, which consist of many instances of only a couple of different molecules. The biggest change they propose is hierarchical ray-casting. When a traversed grid cell only covers one pixel, the algorithm does not render the data, but only determines whether the cell is empty or not by making a texture lookup. The same principle is used on entire molecules in the scene when their bounding box only covers one pixel. Le Muzic et al. [LMPSV14] optimize this approach by using a LOD scheme and by building visualization elements on the fly rather than using a grid, which allows them to alter atom positions dynamically. They store atom positions in a texture buffer, and use tessellation and geometry shaders to emit atoms, trying to maximize the amount of atoms emitted per vertex call. In addition, they also make use of repetitive structures. Parulek et al. [PJR+14] focus on supporting computationally expensive surface abstractions in real-time for large individual molecules. They achieve that by building a hierarchy based on spatial clustering in a bottom-up approach. Clusters are formed based on their location, and each cluster is represented by a sphere with a radius that bounds all particles inside the cluster. For the next level, the error threshold is raised, and the new spheres are used as input for the clustering algorithm. The process is stopped when only a single cluster remains. Then, they combine that hierarchy with surface abstractions at different levels of computational complexity to build a seamless model. Guo et al. [GNL+15] implement a view-dependent macro-molecule rendering framework, which includes levels of detail. They build their hierarchical model using a spatial clustering method similar to the one by Parulek et al. [PJR+14] and cluster atoms until only a single one is left. In order to find the most suitable distance metric, they compare their object space error as shown in Figure 2.2. LOD selection and view frustum culling are performed on the CPU. Like Lindow et al. [LBH12] and Falk et al. [FKE13], they use repetitive structures found in macro-molecules to optimize performance. For their GPU ray-casting, they use the method proposed by Grottel et al. [GKM+14].

The grid-based approach by Matthews et al. [MEK+17] is optimized for visual enhancement effects such as shadows and ambient occlusion. As their data structure needs to be recomputed every time the scene changes in order to ensure the quality of visual effects, they build it on the GPU. They present the results of their method with molecules of up to about 300,000 atoms. It is important to note that quantitative results of frameworks aimed at optimizing computationally expensive visual effects cannot be directly compared to implementations that focus on utilizing repetitive structures or on rendering a maximum amount of simple spheres at interactive rates. Similarly, the difference between rendering individual molecules and repetitive molecular structures should be noted.


Figure 2.2: Comparisons of the object space error of different distance metrics by Guo et al. [GNL+15]

2.2 Rendering of Point-based Data

Classical representations of molecular data are based on spheres. Early examples that are still in use include van der Waals representations, as proposed by van der Waals [VdW73], and the ball-and-stick representations, first used by Hofmann [Hof65] in lectures at the Royal Institution of Great Britain. Though proposed approaches vary, they usually fall into one of three categories. The data is either polygonized using algorithms like Marching Cubes (Lorensen and Cline [LC87]), sampled into a density volume representation, or spheres are rendered directly using splatting. A splat or glyph is a two-dimensional graphical element that is projected onto the 2D viewing plane for each voxel or point. According to Ibrahim et al. [IWR+17], ray-casting using per-fragment rays based on splats for each particle has been accepted as the state of the art, at least in molecular dynamics visualization. However, different rendering techniques, acceleration strategies, and data structures have always been developed in parallel. In a recent report, Reina et al. [RGE19] cover the development of point-based visualization in the last decade.

Polygonal meshes generally yield satisfying visual results, but the approach creates a large number of triangles, and does not scale well beyond a couple of thousand particles. Müller et al. [MCG03] implement both point splatting and Marching Cubes-based surface reconstruction for SPH simulations. They find the results achieved by iso-surface triangulation via marching cubes more visually convincing. It does, however, slow down rendering, which is not desirable in real-time rendering of large data sets.


Hao et al. [HVS04] use occlusion culling to optimize rendering of time-varying molecules. They build per-frame occlusion maps on-the-fly. Atoms from the sorted list of potential atoms for each frame are projected onto the image plane in a front-to-back order. Instead of checking for overlap of the circles representing atoms, they use two nested squares, one that completely fits within the circle, which is used to build the occlusion map, and one that covers the entire circle to check if the atom has been blocked by previously rendered atoms. They also include a method that chooses an appropriate tessellation resolution according to the amount of space the atom takes up in the final rendering. Keiser et al. [KAD+06] propose an alternative meshing strategy using 3D Delaunay triangulation of particles. Both methods improve performance, but even the reduced mesh is still computationally expensive to render, compared with other representations. Research in recent years has mainly focused on very large and dynamic data sets, so tessellation has mostly been replaced by other methods.

Sampling particles into a volume allows the use of GPU-accelerated volume rendering, using optimizations such as early ray termination and empty space skipping as proposed by Krüger and Westermann [KW03]. Most methods sample point-based data on some type of regular grid. Quiao et al. [QEE+05] visualize point-based atomic simulation data by sampling to a Cartesian grid, taking advantage of graphics texture memory. Navrátil et al. [NJB07] use a regular grid, extracting iso-surfaces using Marching Cubes, which they then render using the toolkit ParaView (Ahrens et al. [AGL05]). Their method allows them to visualize several hundred time-steps containing two million particles. Fraedrich et al. [FAW10] sample particle data inside the view volume into a perspective grid. They compare order-dependent splatting to volume ray-casting, as well as a hybrid approach. In their hybrid method, they add iso-surfaces for fine details which, according to their research, cannot be rendered accurately using splatting. Lindow et al. [LBH12] render large molecular data sets by making use of repetitive components found in molecular data. They store atoms of individual components in a grid, which is then stored in a three-dimensional texture. The grid is rendered using a ray-casting method based on the work of Hadwiger et al. [HSS+05]. Falk et al. [FKE13] extend this algorithm by hierarchical ray-casting, which omits grid cells that cover only part of a pixel. Reichl et al. [RTW13] visualize large SPH simulations using a compressed octree grid, streaming data asynchronously to the GPU. They perform decompression on the GPU, which allows them to integrate it into GPU-based out-of-core volume ray-casting. Knoll et al. [KWN+14] propose a multi-core CPU method for direct volume rendering of particle data using radial basis function kernels. They apply it to both astrophysical and molecular data sets. Schatz et al. [SMK+16] propose a hybrid multi-scale rendering architecture for very large data sets. Large parts of the data set are rendered as a hierarchical density volume, while fine details are visualized using direct particle rendering.

Splatting was one of the first methods used for point-based data, and is still relevant today. Most of the earliest works, for example Csuri et al. [CHP+79] and Reeves [Ree83], use a form of particle splatting to implement special effects, such as smoke. Westover [Wes89] introduces an interactive volume rendering method based on splatting.

Laur and Hanrahan [LH91] propose hierarchical splatting, also for volumetric data sets. They combine the splatting method with a progressive refinement algorithm, based on a pyramidal volume representation. Rusinkiewicz and Levoy [RL00] use hierarchical splatting as a point rendering method for meshes. They construct a hierarchy of bounding spheres based on a k-d tree to represent the surface. Extending this approach, Max [Max04] proposes a method for molecular data. Instead of an arbitrary hierarchy, the natural structure of molecular data is used. In order to better approximate the shapes of objects, the paper proposes the use of ellipsoids instead of spheres. Hopf and Ertl [HE03] extend the method to Smoothed Particle Hydrodynamics (SPH) simulations. They build a hierarchical data structure using PCA clustering.

Reina and Ertl [RE05] build a molecular dynamics visualization framework based on the work of Hopf and Ertl [HE03]. They generate implicit surfaces to render splats directly on the GPU, thus reducing the bottleneck between CPU and GPU. Toledo and Levy [TL04] propose ray-tracing based GPU implementations of new graphics primitives like spheres, cylinders, and ellipsoids. Intersection and lighting are computed in a fragment shader based on an analytic representation of the primitives. Bajaj et al. [BDST04] also render primitives directly on the GPU. For spheres, they use the texture-based method described in one of NVIDIA's Cg Tutorials by Fernando and Kilgard [FK03]. Sigg et al. [SWBG06] propose an algorithm for GPU-accelerated rendering of quadratic surfaces and apply it to molecular data. Each quadratic primitive is represented and rendered as a single vertex. That allows them to compute a screen-space bounding box in a vertex shader with perspective correctness. In our work, we use the same basic approach, rendering one vertex per atom. In order to draw the quad representing the atom correctly, we use the bounding box of the atom defined by its radius and the position of its center and compute the ray-sphere intersection analytically.
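The analytic intersection itself is standard; the following sketch (written in C++ for readability here, although in a renderer the equivalent computation would typically live in the fragment shader) solves the quadratic for a ray against a sphere and returns the nearest positive hit, if any:

```cpp
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Returns the smallest positive ray parameter t at which origin + t*dir
// (dir assumed normalized) hits the sphere, or nothing if the ray misses it.
std::optional<float> intersectSphere(const Vec3& origin, const Vec3& dir,
                                     const Vec3& center, float radius) {
    const Vec3  oc = sub(origin, center);
    const float b  = dot(oc, dir);                   // half of the quadratic's b term
    const float c  = dot(oc, oc) - radius * radius;
    const float discriminant = b * b - c;
    if (discriminant < 0.0f) return std::nullopt;    // ray misses the sphere
    const float sqrtD = std::sqrt(discriminant);
    float t = -b - sqrtD;                            // near intersection
    if (t < 0.0f) t = -b + sqrtD;                    // origin inside sphere: take far hit
    if (t < 0.0f) return std::nullopt;               // sphere is behind the ray
    return t;
}
```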

Later approaches focus on optimizing splatting methods for even larger data sets. Gribble et al. [GIK+07] propose interactive ray tracing for glyph-based representations on multilevel grids to visualize large time-varying data sets. Their approach is based on the coherent grid traversal algorithm proposed by Wald et al. [WIK+06]. Fraedrich et al. [FSW09] sort SPH data into a page-tree and render it out-of-core. Pages are blocks of data, usually grouped by spatial proximity. To render them, Fraedrich et al. rasterize particle splats, creating a proxy geometry based on equilateral triangles. They work with data sets containing more than 10 billion particles. Falk et al. [FGE10] propose a combination of splatting, texture slicing and ray-casting to render large dynamic particle simulations. Grottel et al. [GRDE10] also base their visualization on glyphs. They propose to use splats and a deferred shading pass that estimates their normal vectors in image space instead of ray-casting individual glyphs. Another paper by Grottel et al. [GKSE12] proposes an ambient occlusion algorithm for molecular dynamics based on density information collected in real-time. They build a density volume by splatting a sphere for each particle into the corresponding volume cell. The density volume is then used to create the ambient occlusion effect in object-space.


In their framework MegaMol, Grottel et al. [GKM+14] render spherical glyphs using point-based GPU ray-casting in their basic rendering mode. Le Muzic et al. [LMPSV14] save all the required atom positions in a texture buffer. Each atom is represented by a single vertex, the center of the atom. They are rendered as spheres using splatting in the fragment shader. In order to test their framework, they created a scene containing 30 billion atoms, which they are able to render at 10 fps using levels of detail. Wald et al. [WKJ+15] implement a balanced k-d tree data structure for large point-based data sets. They use CPU ray-tracing to render the primitives encoded in the tree. Zirr and Dachsbacher [ZD17] propose a GPU-based voxelization technique for particle sets such as SPH data. They render splats on a perspective grid and then compress the obtained information by only storing the entry and exit surface depths for each ray. In order to accelerate ray-casting, they use a screen-aligned implicit quadtree with different levels corresponding to perspective grids with decreasing resolution. They note that the approach can be distributed across multiple GPUs. Xiao et al. [XZY17] combine particle splatting, ray-casting and surface normal estimation techniques. Splatting is used to accelerate ray-casting while surface normals are estimated using Principal Component Analysis. They also employ GPU-based ray-casting.

2.3 Methods for Handling Large Point-based Data Sets

Most publications that focus on very large particle data sets contain time-steps, such as molecular dynamics visualizations and SPH simulations. Cosmological data sets, such as those used by Schatz et al. [SMK+16], contain up to trillions of particles. Our current framework is aimed at molecular data sets containing millions of atoms. An important question in handling large data sets is when to process data. Whether it should be pre-processed, streamed, or processed on demand depends both on the type of data, and on the requirements of the framework. Our implementation relies on pre-processing to build the data structure. Pre-processing has to be done once for each new data structure. As we save the calculated structure to a binary file, it can subsequently be loaded instead of calculating it again. Contrary to this approach, on-demand methods only process the data the visualization requires whenever it is needed. This can slow down the application, but may also decrease the total amount of computations and space necessary. Data requests can be visibility-driven or query-driven. Streaming is similar to on-demand visualization, except that the data is not demanded, but sent when it becomes available, i.e., when calculations are finished. Already available parts of the data set can be rendered on-the-fly, without the delay of having to wait for the entire data set to be processed. This can speed up the creation of visualizations considerably. The disadvantage is that the user has less control over available data than in an on-demand based system. Large data sets are commonly divided into smaller sub-sets. Several terms are used, but in the context of this work, we will refer to these sets as pages. Paging makes the amount of data easier to handle, both in the pre-processing and the rendering stage. It is especially important for extreme-scale data sets that are too large to fit into memory.


Data structures that divide large data sets into smaller sub-sets can be used to implement visibility culling. Blocks of data that are entirely invisible can be discarded early, saving computational cost. Sharma et al. [SKNV04] visualize data sets consisting of up to a billion atoms. They build an octree data structure, which they use for hierarchical view frustum culling. Zhu et al. [ZCW+04] propose a method for distributed rendering of large molecular data sets. Their approach uses a grid and focuses on partitioning the data set and distributing computations in order to handle large data sets. Gribble et al. [GIK+07] propose interactive ray-tracing of multilevel grids. They achieve interactive frame rates for data of up to 35 million particles. Fraedrich et al. [FSW09] traverse their hierarchical data structure every frame in order to determine the nodes that have to be rendered. Their test data set contains 10 billion particles, and they manage to achieve interactive frame rates. Grottel et al. [GRDE10] use a two-level occlusion culling strategy for large molecular dynamics visualization. They first perform per-cell culling to reduce the upload to the GPU. After that, a maximum depth mipmap is used to cull individual glyphs in the geometry processing stage. We sort our data into an octree data structure, and employ a page-based culling strategy where we discard pages when their bounding box is entirely outside the view frustum.
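A minimal version of such a page-level test is sketched below (our own illustration; extraction of the six frustum planes from the view-projection matrix is assumed to have happened elsewhere, and the identifiers are placeholders): a page is discarded as soon as its axis-aligned bounding box lies completely on the negative side of any frustum plane.

```cpp
#include <array>

struct Plane { float nx, ny, nz, d; };   // plane equation: n·p + d >= 0 means "inside"
struct Box   { float min[3]; float max[3]; };

// Conservative test: returns false only if the box lies entirely outside one
// of the six frustum planes; such a page can safely be skipped for rendering.
bool isPossiblyVisible(const Box& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        // Pick the box corner farthest along the plane normal ("positive vertex").
        const float px = (p.nx >= 0.0f) ? box.max[0] : box.min[0];
        const float py = (p.ny >= 0.0f) ? box.max[1] : box.min[1];
        const float pz = (p.nz >= 0.0f) ? box.max[2] : box.min[2];
        if (p.nx * px + p.ny * py + p.nz * pz + p.d < 0.0f)
            return false;   // box completely behind this plane
    }
    return true;            // intersecting or inside (may still be occluded)
}
```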

2.3.1 Common Data Structures

Despite the continuous growth of hardware capacities, data sets can become too large to fit into GPU memory, and for extreme-scale data even CPU memory. In that case, the data needs to be broken up into smaller parts. When using paging strategies to render data sets that are too large to fit into memory, parts of the data are resident out-of-core, and have to be fetched before rendering. Usually, the currently required data is requested. A working set that fits into memory is then composed and uploaded or updated as needed. Many processing methods require information from the neighbors of individual points. Going through all data points in a set is computationally expensive, especially for very large data sets. Reducing the number of points that have to be considered in each step can speed up rendering considerably. Samet [Sam90] gives a comprehensive overview of different data structures for point-based data. Which data structure is used depends on the type of data, as well as the priorities of the researchers. The most common types of data structures for point-based data are regular grids, and different variations of trees. Some researchers build structures tailored to a specific data type. Bajaj et al. [BDST04], for example, do not use a conventional data structure, but sort their data into a structure they call Flexible Chain Complex, which is based on the hierarchy in Protein Data Bank (PDB) files. In this section, we focus on the two most common types, i.e., grid and tree data structures. We start with some examples of grid-based data structures, but focus in particular on implementations based on octrees, as we use such a data structure in our own work. Harada et al. [HKK07] dynamically construct a sliced data structure for particle-based fluid simulations. They divide the voxel grid into slices perpendicular to one axis, and define bounding boxes for each slice.


Figure 2.3: Data slices shown in Harada et al. [HKK07]. They compare a fixed grid data structure (left) with the dynamic grid they propose (right).

Memory for the grid is dynamically allocated, achieving improved cache efficiency, as it culls most of the empty voxels. Figure 2.3 shows a comparison between a fixed and a dynamically allocated grid. In addition to presenting the conceptual data structure, they implement and test it both on the CPU and GPU. Gribble et al. [GIK+07] use hierarchical multilevel regular grids for particle-based volumetric data. As they want to optimize their method for time-varying data sets, grids are constructed on-the-fly. Grottel et al. [GRDE10] use ray-casting on a regular grid, which has the same extent as the bounding box of the data set. They employ GPU ray-casting and deferred shading with smooth normal vector generation. They argue that the simplicity of the data structure allows a higher number of occlusion queries. Fraedrich et al. [FAW10] sample particles onto a uniform 3D grid. Instead of the simulation domain, the grid is fixed to the view volume. Therefore, only the view frustum is discretized. Spacing between grid vertices along the view ray increases logarithmically, resulting in a decreasing sampling rate along the viewing direction. Another implementation that uses a regular grid is presented by Matthews et al. [MEK+17]. They investigate both a fixed grid and a compact grid for visualization of molecular trajectories. Due to more efficient memory access, they conclude that the compact grid outperforms the fixed grid for large proteins. Rusinkiewicz and Levoy [RL00] implement a hierarchy of bounding spheres based on a k-d tree to represent the surface of a solid object. They use it for view frustum culling, back-face culling, a level of detail scheme, and rendering. Pfister et al. [PZVBG00] sample geometric models into point primitives called Layered Depth Cubes (LDC). These primitives are then stored in an octree data structure.

The lowest level of the octree contains the resolution acquired during sampling. Blocks on higher levels are constructed by sub-sampling by a factor of two. During rendering, the octree is traversed from lowest to highest resolution blocks. View-frustum culling is performed using the block's bounding box. Hopf and Ertl [HE03] argue that hierarchical data structures impose higher memory requirements, so they decouple the cluster hierarchy from the actual point data. They store the information about points in a continuous array, while the hierarchical data structure only stores pointers. In order to reduce memory requirements, they store point coordinates relative to cluster coordinates. Figure 2.4 conceptually shows the hierarchy levels. The finest level, shown on the right side, contains the actual point information.

Losasso et al. [LGF04] propose an unrestricted octree grid to refine and coarsen particle data sets to simulate water and smoke. Max [Max04] builds a tree hierarchy with individual atoms as leaves. Coarser levels of detail are represented as ellipsoidal objects, and are stored at the internal nodes of the tree. Sharma et al. [SKNV04] visualize data sets containing up to a billion atoms. Their octree consists of three levels. Nodes at level 2 contain pointers to the atoms stored in the corresponding subspace. Lee et al. [LPK06] use a bounding tree to achieve view-dependent mesh simplification as well as view frustum culling. Raschdorf and Kolonko [RK09] compare different data structures for large particle simulation data sets. They present a hybrid data structure consisting of a loose octree combined with local neighborhood lists. Reichl et al. [RTW13] visualize large SPH simulations using a compressed octree grid from which they stream data asynchronously to the GPU. Wald et al. [WKJ+15] adapt the balanced k-d tree proposed by Bentley [Ben75], and use CPU ray-tracing to render it. Rather than using the tree to store different levels of detail, they use it to reorder the original data set.

We implement an octree data structure in our framework. The resolution of the tree is determined by a specified maximum number of points per page. Contrary to many implementations, data, including level of detail resolutions, is only stored in leaf nodes.
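The subdivision criterion can be summarized in a few lines. The sketch below is illustrative only and builds on the hypothetical Atom, AABB, and OctreeNode definitions from the sketch in Section 1.3: a block is split into eight octants whenever it still holds more atoms than the configured maximum per page. Splitting at the spatial center of the bounds is one simple choice; the actual pre-processing additionally computes the level of detail resolutions for each finished leaf.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>
// Builds on the Atom, AABB, and OctreeNode sketch from Section 1.3.

// Recursively split a block until it holds at most maxAtomsPerPage atoms.
void subdivide(OctreeNode& node, std::vector<Atom> atoms, std::size_t maxAtomsPerPage) {
    if (atoms.size() <= maxAtomsPerPage) {
        node.lods.push_back(std::move(atoms));   // LOD 0; coarser levels are built later
        return;
    }
    const float cx = 0.5f * (node.bounds.min[0] + node.bounds.max[0]);
    const float cy = 0.5f * (node.bounds.min[1] + node.bounds.max[1]);
    const float cz = 0.5f * (node.bounds.min[2] + node.bounds.max[2]);

    std::vector<Atom> octants[8];                // bucket atoms by octant of the center
    for (const Atom& a : atoms) {
        const int index = (a.x > cx ? 1 : 0) | (a.y > cy ? 2 : 0) | (a.z > cz ? 4 : 0);
        octants[index].push_back(a);
    }
    for (int i = 0; i < 8; ++i) {
        auto child = std::make_unique<OctreeNode>();
        child->bounds.min[0] = (i & 1) ? cx : node.bounds.min[0];
        child->bounds.max[0] = (i & 1) ? node.bounds.max[0] : cx;
        child->bounds.min[1] = (i & 2) ? cy : node.bounds.min[1];
        child->bounds.max[1] = (i & 2) ? node.bounds.max[1] : cy;
        child->bounds.min[2] = (i & 4) ? cz : node.bounds.min[2];
        child->bounds.max[2] = (i & 4) ? node.bounds.max[2] : cz;
        subdivide(*child, std::move(octants[i]), maxAtomsPerPage);
        node.children[i] = std::move(child);
    }
}
```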

2.3.2 Levels of Detail

Visualizing several million atoms individually can become quite computationally expensive, even when rendering simple surface models. However, display resolutions limit the amount of information that can be shown at once, so optimization is possible. If the view is "zoomed in", which means that atoms are visible in detail, large parts of the data set are outside the view frustum, and can be culled. In a "zoomed out" view, less or no data may be culled, but many individual atoms are clustered in a space that is rendered as less than a single pixel. That means that the data structure can be shown at a lower resolution, without great impact on the visual result. In such cases, or when the only goal is to visualize the general shape of a molecule without need for details, level of detail strategies can be used. Keiser et al. [KAD+06] argue that the resolution of the chosen level of detail should be high enough to show fine-scale surface detail, while reducing the number of particles as much as possible in order to minimize the computational overhead.


Figure 2.4: The hierarchy levels as implemented and illustrated by Hopf and Ertl [HE03]. The illustrated hierarchy is decoupled from the storage of the underlying point data. Points are stored in a continuous array, while the hierarchical data structure illustrated in the figure only stores pointers.


Figure 2.5: Level of detail construction as implemented by Fraedrich et al. [FAW10]. The data structure is constructed bottom-up. Particles are copied to the next level as long as their diameter is larger than the grid sampling resolution; those that are smaller are merged.

They implement a multi-resolution particle-based fluid simulation. Levels of detail are created by splitting and merging individual particles.

For general point-based data, levels of detail are usually created by clustering data points, based on the positions and in some cases radii of the points. Most level of detail structures are built bottom-up. Pfister et al. [PZVBG00] store the resolution acquired during sampling at the lowest level of the octree. Blocks on higher levels are constructed by sub-sampling by a factor of two. During rendering, the levels are traversed from lowest to highest resolution. Hopf and Ertl [HE03] create their LOD hierarchy using principal component analysis splits. The multilevel grid proposed by Gribble et al. [GIK+07] also constitutes a level of detail scheme. The original particles are stored at the finest level. The resolution of the grid is determined such that the number of cells is a multiple of the total number of particles. Coarser levels are imposed over the previous level, where each block corresponds to M×M×M blocks of the underlying level. In the implementation discussed in the publication, they use a two-level hierarchy. Fraedrich et al. [FAW10] also build their data structure bottom-up.


Figure 2.6: At increasing distance from the camera, Le Muzic et al. [LMPSV14] skip atoms, adapting the radius accordingly.

As long as the diameter of a particle is larger than the grid sampling resolution, it is copied to the next level, as shown in Figure 2.5 on the left. Particles that have diameters smaller than that are merged and enlarged to the grid spacing. Reichl et al. [RTW13] build their level of detail hierarchy bottom-up in a similar way. However, they store it in an adaptive octree data structure, rather than a hierarchical grid. Le Muzic et al. [LMPSV14] use tessellation shaders to lower the number of rendered atoms according to increasing camera distance. Figure 2.6 shows the same molecule rendered at different levels of detail. One of their goals is to implement smooth transitions between neighboring LODs. They sort the atoms by increasing distance from the center of the molecule's bounding box. Based on that distance, they decide how many atoms are skipped and increase the radius linearly. For molecular visualization, it is possible to use the natural hierarchy of biomolecular data to implement levels of detail. This approach is proposed by several papers, such as the Flexible Chain Complex by Bajaj et al. [BDST04]. In their method, each level is associated with a geometric representation. Atoms are rendered as single spheres, and residues are represented by a bounding sphere. Secondary structures are represented by sets of cylinders and helices. Van der Zwan et al. [VDZLBI11] implement continuous transitions between three separate visual abstraction models: the vdW surface, the so-called ball-and-stick model, and a cartoon rendering. Waltemate et al. [WSB14] also base their visualization on the natural characteristics of biomolecular data. They present an interactive tool, which makes it possible to combine the mesoscopic and molecular level in cell visualization. The result is a visualization of whole cells, with an interactive magnifier that allows the user to investigate the structure and behavior on a molecular level. The method they propose is based on instantly computed local parameterizations that are used to map patches of membrane structures onto regions selected by the user. They render the mesoscopic level based on triangle meshes, and the atomic level using a splatting technique based on local ray-casting of spheres.

Parulek et al. [PJR+14] propose a level of detail concept that combines three different surface models in one visualization. They blend SES, Gaussian kernels, and van der Waals surfaces, based on linear interpolation, and implement a shading scheme that allows them to create seamless transitions between the representations. Guo et al. [GNL+15] use a volume-based distance metric to select appropriate levels of detail depending on the viewpoint.
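As a simple illustration of view-dependent selection (our own sketch, not the scheme of any particular cited paper), a level of detail index can be derived from the distance between the camera and a block, clamped to the number of available levels:

```cpp
#include <algorithm>
#include <cmath>

// Pick a level of detail for a data block: 0 = full resolution.
// Each successive level is chosen once the block is roughly twice as far away,
// so the on-screen point density stays approximately constant.
int selectLOD(float distanceToCamera, float fullDetailDistance, int lodCount) {
    if (distanceToCamera <= fullDetailDistance) return 0;
    const int lod = static_cast<int>(std::log2(distanceToCamera / fullDetailDistance)) + 1;
    return std::min(lod, lodCount - 1);
}
```

The threshold distance and the doubling rule are assumptions for the sake of the example; any monotonic mapping from distance (or projected size) to level index follows the same pattern.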

2.4 Biomolecular Representation Models and Enhancement Methods

Atoms in molecular data sets can be treated like any other type of point-based data. Depending on the complexity of the scene and the capability of the hardware, simple point- or sphere-based representations may be the most suitable, or indeed the only possible visualization model. In many cases, however, a lot of additional insight can be gained by using more complex models.

2.4.1 Representation Models

The van der Waals (vdW) surface is defined by the union of spheres proportional to the covalent radius of the individual atoms. It is simple and efficient, and therefore the most commonly used representation for atomic data. Additionally, it serves as a basis for many other surface models. Lee and Richards [LR71] propose Solvent Accessible Surfaces (SAS) as an extension of vdW surfaces. They illustrate which regions of a molecule can be accessed by a solvent. Conceptually, the surface is created at the position of the probe's center while rolling over the surface, which is the same as extending the vdW radius of each atom by the radius of the probe. The disadvantage of the SAS is that it inflates the surface of the molecule. Richards [Ric77] addresses this issue with the introduction of Solvent Excluded Surfaces (SES). Instead of using the center of the solvent sphere, its boundary is used. This means that the volume of the resulting molecular surface is closer to the vdW surface, while still illustrating accessibility. Solvent Excluded Surfaces show the boundary of the volume with respect to a specific solvent. They are very useful in the exploration and development of substances, e.g., when developing and testing pharmaceuticals. Therefore, they are included in most molecular visualization toolkits. However, the calculation of an SES tends to be computationally expensive. It can either be computed by discretizing space, or by determining the implicit surface equations of its surface patches. Lindow et al. [LBPH10] implement a parallelized version of the contour-buildup algorithm originally proposed by Totrov and Abagyan [TA96]. Krone et al. [KGE11] also propose a parallelized optimization of the contour-buildup algorithm on the GPU. They subdivide calculation tasks in order to optimize them for parallel processing. Parulek and Viola [PV12] present a method that calculates the SES while allowing the user to change parameters interactively. They use local neighborhoods to compute implicit functions representing the surface. Hermosilla et al. [HKG+17] develop a grid-based GPU implementation using ray-marching, which allows smooth transitions between different levels of detail.

21 2. Stateofthe Art

interaction, they computeanSES using alow-resolution gridinreal-time, and refine the surface progressively in the background. Anothersurface representation is the Molecular Skin Surface (MSS), which appliesthe skin surface algorithm proposedbyEdelsbrunner [Ede99]tovdW spheres.The MSS hasthe advantage of being C1-continuous (so the derivative does not change wheretwo curvedsurfaces cross). Thedisadvantage on theotherhand is thatithas no biophysical background.While an SESapproximates asolventmolecule withasphere, Lindowet al. [LBH14]propose to usethe ligand’s actual vdWsurface instead,showing the surface thataparticular ligandcan access in more detail. This so-calledLigand ExcludedSurface is even more computationally expensive thanthe SES, so it is usually onlyapplied when pre-computation is feasible.Ininteractivecontexts, the SESorevensimpler models are still preferredbymost researchers. Blinn[Bli82]proposes an approximation of molecular surfaces using aGaussian convo- lutionkernel, commonly known as Metaballs. Parulek and Brambilla[PB13]presenta modelthatclosely resembles theSES,while approaching the rendering performance of the Gaussianmodel. It is based on iterativeblending of implicit functions,and is visual- izedusingGPU-based ray-casting. Krone et al.[KSES12]compute interactivesurface representations of large dynamic particledatasets.They build avolumetricdensity mapbased on Gaussian kernels,using aGPU-accelerated marchingcubes algorithm. Structuraldetails canbeadjustedinteractively.Inour framework, we usethe method presented by Bruckner [Bru19]. It is an image-space approachtothe computationof Gaussianmolecular surfaces. As themolecular surface only needs to be computed for visible parts of the molecule,unnecessary computations can be skipped. An on-the-fly list-baseddata structure is constructed for everyframe.Tocreate the list, the atomsin the molecule are rendered as spheresintwo separaterenderingpasses. The first pass is usedtoidentify all intersecting vander Waalsspheres forevery viewing ray. In the second pass,the sphere of influencefor eachatom is rendered in the same way. Foreach pixel, alinked list of the intersections of the spheresofinfluenceisstored. Those are the only spheres thatpotentially contribute to thevisible surface. In order to minimize computationsinoccluded regions further, visibility-based raytraversalisused.

2.4.2 Enhancement Methods for Point-based Data

Different shading methods can be used to enhance complex molecular visualizations. Color can be mapped either to individual chemical elements, or to other properties, such as which amino acid chain an atom belongs to. In addition, effects such as depth of field, ambient occlusion and edge-cueing can guide the viewer's eye to important areas of the visualization and combat visual clutter.

Ambient occlusion describes the amount of ambient light that reaches each point in a scene. Objects on the inside of structures, or that are occluded by direct neighbors, are darker, so ambient occlusion helps the viewer to correctly judge the depth of structures. Tarini et al. [TCM06] propose a set of techniques to enhance the user's understanding of rendered molecules. In order to compute ambient occlusion, they use several render passes, creating shadow-maps from the z-buffer. They achieve a contour line effect by drawing a line around each rendered primitive. The thickness of the lines can be rendered dependent on the difference in depth between the primitives it separates. Additionally, they propose to draw halos around atoms. The bigger the distance between a point in the halo and its background, the more opaque the algorithm draws it. Grottel et al. [GKSE12] implement an object-space ambient occlusion algorithm based on local neighborhood information. Their approach is based on the aggregation of particle data into a coarse resolution density volume, which is then used to calculate the ambient occlusion factor. Staib et al. [SGG15] propose an illumination model that supports transparency, ambient occlusion, surface reflection, and ambient illumination based on the emission-absorption model of volume rendering. It runs in real-time for millions of particles and is based on analytic solutions to the volume rendering integral. McGuire et al. [MOBH11] present a screen space ambient occlusion method. Rather than approximating the equations governing indirect illumination in an efficient but error-prone way, they use the full radiometric model until the screen-space sampling step. Simplification is still possible by using the fall-off function to cancel expensive operations. Skånberg et al. [SVGR15] aim to communicate the spatial arrangement of molecular structures, but also to visualize pairwise atom interaction strengths. In order to achieve that, they propose an analytic approach for capturing ambient occlusion and interreflections of molecular structures in real-time. The neighborhood search required to calculate the influence of atoms is based on a grid data structure. The cell size can be set and defines a cut-off distance for the interreflections. Matthews et al. [MEK+17] also use ambient occlusion to enhance depth, and shadows to help the user perceive relative motions of parts of the protein. Their algorithm is similar to that of Skånberg et al. [SVGR15]. They use a regular grid to accelerate shadow casting, but find neighbors per atom rather than per fragment.

Another enhancement method, widely used to draw attention to a certain region or feature, is depth of field. Depth of field is an optical effect known from photography, where objects on and around a focus plane are sharp, while the rest is blurred. The blurring factor is called the circle of confusion. It can be calculated based on focal length, distance, and aperture. The closer the focused object is to the camera, the narrower the field of focus. Examples of this method can be found in several publications. One example is the method proposed by Falk et al. [FKE13]. They use an image space method to create the effect. They calculate the size of the circle of confusion, which is then used to sample a mipmap chain of the color and depth buffer. In our framework, we implement a depth of field effect based on the work presented by Bukowski et al. [BHOM13].

2.5 Summary and Conclusion

Both the field of biomolecular visualization and large point-based data sets have been researched widely and successfully. However, biomolecular visualization methods generally either focus on smaller data sets, or rely heavily on a priori structural knowledge about the data. Methods aimed at point-based data sets, on the other hand, can usually handle extreme scale data efficiently, but do not provide any of the surface methods that are often expected in biomolecular research. We therefore conclude that there is a need for a framework that can handle large data sets without any particular knowledge about the structure of the data and that nonetheless allows the use of surface models and enhancement methods. The most common data structures for handling large data sets appear to be regular grids and, to a lesser extent, various forms of trees, in particular octrees. Levels of detail for point-based data are most often implemented using a bottom up approach to spatial clustering. As a main surface model, biomolecular visualization methods generally use vdW surfaces and often additionally include the possibility of computing the SES. Popular enhancement methods for point-based data include in particular ambient occlusion and depth of field.

CHAPTER 3

Data Structure

The goal of our implementation is to render large sets of molecular data at interactive rates, in order to help users to better understand the data. As discussed in the previous chapter, there are several different approaches to similar problems. Our solution is a novel combination of several established approaches. We divide the description and discussion of our methodology into two parts. In this chapter, we focus on the underlying data structure, while rendering aspects of our implementation are covered in a separate chapter.

Large data sets require structure management, especially when the system is expected to perform at interactive rates. The goal is to reduce the size of the working set, i.e., the data that needs to be processed at a specific time, while keeping as much information available as necessary or possible. As it is unlikely that all details of a large data set need to be visible at the same time, it is possible to manage the data in such a way that only the parts that are currently required are fetched and processed. Similarly, some parts of large data sets are usually in the far distance, which means that they only occupy a very small portion of any given screen, or they are at least partially covered. Therefore, rendering data at different resolutions can also be part of a solution.

Implementations that target extreme-scale data sets, such as Fraedrich et al. [FSW09], need to solve the problem that the data does not fit into the memory of current workstations. In order to make visualizations of the entire data sets feasible, they have to be divided in a suitable way. Many implementations provide level of detail schemes, which reduce the amount of data that needs to be processed for a given frame, without necessarily reducing the visual quality. We aim to optimize rendering for data sets of up to several millions of atoms, without requiring additional information about the data set. As we will see in Chapter 6, achievable frame rates are highly dependent on the amount of points or atoms that are actually within the view frustum for the current frame. Therefore, camera position and level of detail, on which the number of visible data points depends, have a greater influence on the performance than the number of atoms in the original data set. This makes it somewhat difficult to directly compare numbers. Implementations that rely on repeating instances of molecules, such as for example Le Muzic et al. [LMPSV14], can render up to billions of atoms. Those that focus on graphical aspects, such as enhancement or surface models, like for example Matthews et al. [MEK+17] or Parulek et al. [PJR+14], do not go beyond a few hundred thousand atoms in their test sets. What we are investigating is how to close this gap, and develop a data structure and rendering setup that makes it possible to render several tens of millions of atoms without prior structural information, using different molecular surface models, and applying enhancement effects.

As we want our data structure to be able to handle large data sets, we need to divide the data into manageable blocks. There are two main bottlenecks where very large data sets can become problematic. Processing or pre-processing steps that depend on neighborhood queries can easily get out of hand if the algorithm needs to visit each individual point of a large data set. Also, the amount of data that can be processed in the rendering step itself is limited by hardware capacities. Very often, however, not the entire data set is visible at once. Therefore, the answer to both of these issues can be a spatial division of the data. As the density within molecular data sets varies spatially, we want our data structure to provide the option to control the maximum amount of data in each page (spatial block). We also want pages to contain approximately the same, or at least a similar amount of data. Our final choice of data structure is a density based octree with pre-calculated levels of detail.

3.1 Overview of the Components for CPU Data Handling

The main contribution of our work is a data structure that divides large data sets into manageable blocks, which facilitate efficient calculations of neighborhood-based algorithms. Our design is based on a core CPU-based data structure, as well as a set of other components that manage data and rendering. Figure 3.1 illustrates the main components that are involved in handling the data and constructing the structure. The black arrows illustrate how components are related. Gray components above the dashed line are mainly concerned with rendering and front-end operations, which are covered in Chapter 4. In this chapter, we focus on components responsible for data management on the CPU, which are illustrated below the dashed line in Figure 3.1. The protein is part of the scene component, which is in turn part of the viewer component. At any given time, there is only one instance of each of these components. The viewer contains renderers. The renderer responsible for drawing the data points that represent atoms or clusters is called sphere renderer. There can be several different types of renderers. Their display methods are called consecutively. We use a separate renderer to render bounding boxes.

Figure 3.1: The main components involved in handling the data structure (Viewer, SphereRenderer, Scene, Protein, Page, Cache). Blue indicates where the actual (per point) data is located. The black arrows show which components contain each other.

Data from the .pdb file (Protein Data Bank, Berman et al. [BWF+00]) is read into our framework in the protein component. As we expect our implementation to be able to handle large data sets, we divide them spatially. The page component represents the building blocks of the resulting data structure. We store a single page, the root of the octree, in the protein component. As we want to control the maximum number of atoms on a single page, the depth of the octree depends on the density of the data structure. Each page that has more than a specified amount of atoms within its bounds is subdivided and stores pointers to its eight child nodes. The actual data points are stored in the pages that are leaf nodes. This allows us to mix pages of different spatial extent, but with a similar amount of data, which we take advantage of in our cache implementation. The cache component manages which data is uploaded to the GPU. It implements a least recently used cache which is responsible for a list of visible pages and for replacing the ones that are no longer needed.

3.2 The Octree Data Structure

Our main requirements for a data structure are that it has to divide the data in a way that makes it possible to handle data efficiently, i.e., by reducing bottlenecks, as well as make computationally expensive neighborhood-based calculations feasible. We propose an octree-based data structure. It divides the data set spatially into hierarchically organized units, the pages. Within these pages, we save the data at different resolutions, or levels of detail. The data structure is built in a pre-processing step. In this section, we explain the design and implementation at the core of the data structure, the pages and the octree they form, as well as the construction of the levels of detail.

Structuring available data into blocks that are manageable, both in terms of memory and the computation of computationally expensive neighborhood queries, can be achieved in several ways. For a comprehensive overview of data structures for point-based data, we refer to Samet [Sam90]. The most common choices are regular grids, and hierarchical tree-based data structures. Several variations of both have been tested and proposed.

A grid data structure is a simple and fast way to divide space. Regular and perspective grid solutions are very common in implementations that render their data by sampling it into a volume (for example Qiao et al. [QEE+05], Navratil et al. [NJB07], Knoll et al. [KWN+14], etc.). They are also used in some related implementations that render atoms as spheres directly on the GPU (Grottel et al. [GRDE10], Lindow et al. [LBH12], Falk et al. [FKE13], Matthews et al. [MEK+17]). The disadvantage of regular grids is that they allow little control over the amount of data in each block. It is of course possible to define the size of the cells by specifying a maximum number of points in each cell instead of spatial units. The drawback of adapting the cell size to the densest areas in the data set is that it results in an unnecessarily fine division of space in sparse regions, especially if data points are distributed unevenly. In our framework, we opt for a hierarchical data structure, as it provides greater control over the division of space.

Tree-based data structures are another common choice, especially for implementations that include level of detail schemes. Several variations of trees have been proposed in the context of point-based rendering. Though octrees are most common, k-d trees (Rusinkiewicz and Levoy [RL00], Wald et al. [WKJ+15]) are also used. The child nodes of an octree node divide the space into eight smaller cubes (four squares in the two-dimensional case). In regions with sparse data, lower subdivisions can be used, providing the ability to control resolution. K-d trees are also based on hierarchical space partitioning. While octrees are always partitioned cubically, the splitting planes in k-d trees can be placed arbitrarily in space. Space is divided by one axis-aligned split plane per level of depth, recursively subdividing space into a binary tree. Each node is split into two halves, alternating between axes. In order to achieve a balanced k-d tree, points in the data set are used to partition space. Geometric primitives are usually stored in leaf nodes. In k-d trees, it is not possible to control the number of neighboring cells that have to be examined when using nearest neighbor queries, as split planes can be placed anywhere in the node. Octrees, on the other hand, divide space in a regular manner. Each node is either a leaf, or is equally divided into eight sub-nodes. This makes the divisions more predictable, which makes the data structure easier to handle. It also makes octrees a good choice when a more even division of the data set in terms of the amount of points per page is a priority.

Most solutions that use octree data structures in combination with a level of detail scheme that we have come across in our research use the nodes throughout the octree to store level of detail data. Often, the original data is stored in the leaves, and coarser levels of detail in internal nodes. This approach creates a structure which is very similar to regular grids with several different resolutions linked together. While this can be very efficient in many situations, the disadvantage is that pages or grid cells at different levels have different spatial extents, which makes mixing levels of detail more difficult. It also makes it harder to achieve a similar amount of data points per node. We use the hierarchical division of space somewhat differently. Instead of storing coarser levels of detail in the internal nodes, we store all levels of detail in the leaves. Space is divided depending on atom density in the original data. Internal nodes simply contain pointers to their children.

Figure 3.2: Flow of data while creating the data structure. Blue indicates where actual point data is saved. The protein component manages the data structure which consists of nested page components.

3.2.1 Dividing the Data Set into Pages

In a pre-processing step, the set of data points is processed, divided, and stored, building the data structure that is then used throughout interactive rendering. Figure 3.2 illustrates the flow of data through the components. The protein component reads in the original data. At this stage, the data for each point consists of a spatial position and a radius. The protein stores a single page, which serves as the basis for the octree, and which is filled with data and sub-pages in the pre-processing step. The page component is responsible for both subdividing data, and calculating clusters to build levels of detail.

We read atom positions, and information about molecules, in the Protein Data Bank (PDB) file format. For each atom, we get the spatial position, and the element name from the file, which we then look up in a table in order to get the radius. This information is stored in a four-component vector. When being read from the .pdb file, the atoms of such a single time-step are stored in one vector. This vector is then handed over to the basic page, the root of the octree, where the data is divided into sub-octrees, according to the specified maximum number of atoms per page. The maximum number of atoms possible in our current setup is 32,768 (2^15). This is due to numerical limits which are explained in Section 3.2.3. We did not find the limit of 2^15 atoms problematic, as the clustering algorithm we use to build our levels of detail is dependent on neighborhood comparisons and becomes very inefficient for larger pages.
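To make the shape of this per-atom record concrete, the following C++ sketch assembles such four-component vectors from parsed positions and element names. The type and function names, as well as the radius table, are illustrative assumptions and not the exact code of our framework.

#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// One data point: x, y, z and, at this stage, the plain radius in the fourth component.
using Vec4 = std::array<float, 4>;

// Hypothetical lookup table from the element name in the .pdb file to a radius.
const std::unordered_map<std::string, float> elementRadius = {
    {"H", 1.2f}, {"C", 1.7f}, {"N", 1.55f}, {"O", 1.52f}, {"S", 1.8f}};

// Builds the vector of atoms for a single time-step.
std::vector<Vec4> buildAtomVector(const std::vector<std::array<float, 3>>& positions,
                                  const std::vector<std::string>& elements)
{
    std::vector<Vec4> atoms;
    atoms.reserve(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        auto it = elementRadius.find(elements[i]);
        float radius = (it != elementRadius.end()) ? it->second : 1.5f; // fallback radius
        atoms.push_back({positions[i][0], positions[i][1], positions[i][2], radius});
    }
    return atoms;
}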


Algorithm 3.1: Spatial data division. If the number of points in a page exceeds the maximum number of allowed points, it gets eight sub-pages and the data points are divided according to their spatial position. This is done until all pages fulfill the size limit.
Input: Maximum number of atoms per page max, current page p containing a vector of atoms, lowerBound and pageSize

if size of atoms > max then
    create array of 8 subPages in p
    foreach a ∈ atoms do
        pos = (a.xyz − lowerBound) / (pageSize / 2)
        insert a into the vector of atoms of the sub-page at position pos in subPages
    end
    foreach page ∈ subPages do
        run the spatial data division algorithm on page
    end
else
    return atoms
end

The pseudo-code shown in Algorithm 3.1 sums up the division of space into pages. Starting with the basic page saved in the protein component, we check if the length of the vector of atoms assigned to it exceeds the maximum allowed number of atoms per page. If it does not, the algorithm ends. If it does, we create a sub-octree, i.e., a two by two by two array of sub-pages that have half the page size of the parent page in each direction. Then, we iterate over all atoms, and sort them into the sub-pages according to the formula given in Algorithm 3.1. For each sub-page, we recursively repeat the process. When the number of atoms assigned to a page no longer exceeds the limit, it is considered a leaf page, and the vector of atoms is saved, instead of creating sub-pages. Once the number of atoms on a page falls under the allowed maximum, we proceed to the creation of the levels of detail, as described in Section 3.2.2.

Figure 3.3 shows a simplified illustration of the subdivision of a data set, with a limit of 3 atoms per page. The basic page contains more than 3 atoms, and is subdivided. After the first subdivision, two sub-pages still exceed the limit. Pages are subdivided depth first, so we check all children of the first of these two sub-pages recursively, before moving on to check its siblings. When no page contains more than 3 atoms, we have created the leaves of all branches and the final structure is reached.

The structure of a page can be summed up as follows. Each page saves its position, size, lower and upper bound. A leaf node additionally stores an array containing the atoms that fall within its bounds at different levels of detail. Inner nodes have pointers to their child nodes.
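The following C++ sketch illustrates the recursive subdivision of Algorithm 3.1 under the assumption of cubic pages whose edge length is halved per level; the Page structure and its member names are placeholders chosen for illustration, not our actual implementation.

#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

using Vec4 = std::array<float, 4>; // x, y, z, radius

// Illustrative page node: either a leaf holding atoms, or an inner node with eight children.
struct Page {
    std::array<float, 3> lowerBound{};
    float pageSize = 0.0f;                          // edge length of the cubic page
    std::vector<Vec4> atoms;                        // only filled in leaf pages
    std::array<std::unique_ptr<Page>, 8> children;  // only filled in inner pages

    void subdivide(std::size_t maxAtoms)
    {
        if (atoms.size() <= maxAtoms)
            return; // leaf page: keep the atoms, levels of detail are built afterwards

        const float half = pageSize / 2.0f;
        for (int i = 0; i < 8; ++i) {
            children[i] = std::make_unique<Page>();
            children[i]->pageSize = half;
            children[i]->lowerBound = {lowerBound[0] + float(i & 1) * half,
                                       lowerBound[1] + float((i >> 1) & 1) * half,
                                       lowerBound[2] + float((i >> 2) & 1) * half};
        }
        // Sort every atom into the child whose octant contains it (cf. Algorithm 3.1).
        for (const Vec4& a : atoms) {
            int ix = std::min(1, int((a[0] - lowerBound[0]) / half));
            int iy = std::min(1, int((a[1] - lowerBound[1]) / half));
            int iz = std::min(1, int((a[2] - lowerBound[2]) / half));
            children[ix + 2 * iy + 4 * iz]->atoms.push_back(a);
        }
        atoms.clear();                      // the data now lives in the children
        for (auto& child : children)
            child->subdivide(maxAtoms);     // recurse until all pages fulfill the size limit
    }
};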


Figure 3.3: Illustration of page subdivision with a maximum of 3 atoms per page. (a) Basic page, (b) Subdivision 1, (c) Subdivision 2, (d) Subdivision 3, (e) Final structure.

Figure 3.4: Octree structure for the molecule 3j3q. The blue lines show the bounding boxes of pages. In denser areas, more subdivisions are used.

Figure 3.4 shows the leaf nodes of the hierarchical data structure for the molecule 3j3q, which contains 2,440,800 atoms, making it the largest individual molecule currently available in the Protein Data Bank. The maximum number of atoms per page is set to 32,768, the largest number possible in our current setup. The figure shows the molecule at the original resolution. Additionally, the bounding boxes of the pages are illustrated in blue. In denser areas, the molecule is divided into smaller sub-pages, while empty and sparse areas are not subdivided any further.

3.2.2 Calculating Levels of Detail

Levels of detail are a common method to increase flexibility and improve performance. For point-based data, they are usually calculated bottom-up, using a variation of hierarchical clustering. In order to calculate the clusters that form our coarser levels of detail, we use the hierarchical clustering implementation by Müllner [M+13].

Several researchers (Max [Max04], Bajaj et al. [BDST04], Van der Zwan et al. [VDZLBI11], Waltemate et al. [WSB14]) propose using the natural hierarchy of molecules, rather than purely spatial approaches. This method provides the user with additional structural information, as well as a number of different levels of detail. However, the approach is not particularly flexible, as it relies on very specific information, and cannot easily provide intermediate levels of resolution. The main goal of our LOD implementation is to optimize performance, and enable us to render larger data sets, not necessarily the integration of additional structural information. That is why we chose the more flexible approach of hierarchical clustering. Structural levels of detail could be considered in addition to our existing LOD scheme in a future development step.

Both our own tests, and the results achieved by Parulek et al. [PJR+14], show that density-based clustering does not perform well on the type of data we use. Atoms within most molecules do not show a significant variation in distribution. Therefore, density-based clustering often results in only a single cluster for a large part of the molecule. Hierarchical agglomerative clustering, such as the fastcluster algorithm proposed by Müllner [M+13] which we use, is more suitable. Guo et al. [GNL+15] argue that the algorithm has two limitations: according to their experiments, it is difficult to find the most suitable parameters, and it requires hierarchical abstraction. They use a volume-based distance metric in order to reduce the screen-space error. However, we found the results satisfying, and the control over parameters sufficient.

Hierarchical agglomerative methods cluster input data based on a dissimilarity index, starting with each point in a single cluster. Usually, the index is stored in a matrix, which is by definition reflexive and symmetric. Another possibility is the so-called stored data approach, where input points are handed to the clustering algorithm in vector form, and dissimilarity is specified implicitly. The algorithm by Müllner provides two options to define the size of clusters. The user can either specify the desired number of clusters, or a cut-off distance.

Algorithm 3.2 shows a simplified pseudocode of the hierarchical clustering algorithm. We calculate the dissimilarity matrix based on the Euclidean distance between two respective points. Additionally, the algorithm either needs to know a cut-off distance, or the number of desired clusters. In order to specify a reasonable number of desired clusters manually, one would need some information about the structure of the molecule. Specifying a cut-off distance is more flexible, as the user does not need any particular information beforehand. Therefore, we found it more suitable for our needs and a variety of molecules. In our implementation, we use a cut-off distance of c = 2^l, where l is the level of detail being calculated.

Müllner [M+13] implements four modes of linkage for hierarchical clustering: Single-linkage (nearest neighbor), Complete, Average and Median. Figure 3.5 shows the results of the different distance calculation formulas for the molecule 2btv at levels 1 and 5.


Algorithm 3.2: Fastcluster by Müllner [M+13], based on a nearest neighbor approach. The algorithm needs a pair-wise dissimilarity (distance) index, as well as the number of points to be clustered, and returns a list of labels that divide the points into clusters.
Input: A pair-wise distance index D, the number of points to be clustered C
Output: Output list L, number of points per cluster P

create list of nearest neighbors nn
calculate priority queue of indices Q, with mindist as keys
for j ← 0 to N − 1 do
    a ← min element of Q
    b ← nn[a]
    while dist(a, b) ≠ mindist(a) do
        recalculate nearest neighbors
    end
    add (a, b) to L
    update C
    remove a from Q
    for remaining elements do
        update distance matrix using Formula 3.1
    end
    for remaining elements x where nn[x] ∈ {a, b} do
        update nn[x]
    end
end

$$ d(I \cup J, K) = \frac{n_I \, d(I, K) + n_J \, d(J, K)}{n_I + n_J} \tag{3.1} $$

Algorithm 3.2 shows a pseudocode of the hierarchical clustering method. For a given set of nodes, a pair of closest points is determined. Those nodes n1 and n2 are then joined


Figure 3.5: Levels of detail for the molecule 2btv. (a) Average level 1, (b) Median level 1, (c) Single level 1, (d) Complete level 1, (e) Average level 5, (f) Median level 5, (g) Single level 5, (h) Complete level 5.

Figure 3.6: Number of atoms at levels 1 and 5 and building time, showing the number of resulting clusters for 2btv for level 1 (red) and 5 (green), as well as the time it takes to build the data structure using each clustering method (blue).


Figure 3.7: Levels of detail for the molecule 2btv. (a) Full resolution, (b) Level 1, (c) Level 2, (d) Level 3, (e) Level 4, (f) Level 5, (g) Level 6, (h) One sphere per page. In the original resolution in the top left, the molecule contains 49,061 atoms, while level 7 on the lower right contains two data points, one per page.

into a new node, which replaces the two original nodes. For n1 and n2, the labels and distance are added to the output. In the next step, the dissimilarity (distance) from the new node to all others in the set is calculated. This is repeated until all nodes are clustered based on the cut-off distance or desired number of clusters. The result is a list of labels for all input points. All points that are labeled with the same number are part of the same cluster, and become a single data point in the new, coarser level of detail. Figure 3.7 shows the resulting levels of detail for the molecule 2btv.

In order to create the new data points from the labeled cluster data, we need to find a position and a radius for each new point. Additionally, we have to choose one point in each cluster that will be the "heir" to the parent on the coarser level. The "heir" is the point into which all other points in the cluster transition when smoothly blending between levels. This information is necessary in order to achieve a smooth transition between different levels of detail, as described in more detail in Chapter 4. We choose the point with the shortest distance to the center, i.e., the position of the new point representing the cluster. We determine the position of the new point $p_c$ by calculating the mean of all points in the cluster, $p_c = \frac{1}{n}\sum_{i=1}^{n} p_i$. Both Parulek et al. [PJR+14] and Müllner [M+13] choose a radius that covers all contributing atoms. We found that this method blows up the atom's volume unnecessarily. Individual data points on coarser levels of detail no longer represent any actual atoms, but rather a cluster of several atoms or sub-clusters. The relevant factor is therefore how visually similar the calculated LODs


Figure 3.8: Molecule 1sva. Comparing a radius covering all points in the cluster vs. the average distance, using the original resolution (orange) vs. level 5 (white). (a) When covering the centers of all points in the previous level, the overall volume of the molecule increases with each level. (b) Using the average distance between points as a basis for the radius of the new sphere, the volume remains closer to the original.

are to the original set of atoms. Rather than letting the radius cover all points in the cluster, we calculate the new radius based on the average distance using the formula in Equation 3.2, where $r_c$ is the radius of the data point on the coarser level of detail that represents the cluster, $n$ is the number of points in the cluster, $p_i$ is the position of a point in the cluster with the radius $r_i$, and $p_c$ is the position of the new data point, which is the average of all positions within the cluster.

$$ r_c = \frac{\sum_{i=1}^{n} \left( \frac{\mathrm{distance}(p_i, p_c)}{2} + r_i \right)}{n} \tag{3.2} $$

Figure 3.8 compares our proposed method to a radius that covers all contributing atoms. The only exception we make is for the final level, which is always one sphere per page. In this case, we do choose a radius that covers all spheres on the previous level.
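As an illustration, the following C++ sketch builds one coarser-level data point from the members of a cluster: the position is the mean of all member positions, the radius follows Equation 3.2 as reconstructed above, and the heir is the member closest to the cluster center. All names are hypothetical and the code is a simplified sketch rather than our actual implementation.

#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec4 = std::array<float, 4>; // x, y, z, radius

// One coarser-level data point derived from a cluster.
struct ClusterPoint {
    Vec4 point{};          // mean position, radius from Equation 3.2
    std::size_t heir = 0;  // index (within the cluster) of the point closest to the center
};

ClusterPoint buildClusterPoint(const std::vector<Vec4>& members)
{
    ClusterPoint c;
    const float n = float(members.size());

    // Position of the new point: mean of all member positions.
    for (const Vec4& p : members) {
        c.point[0] += p[0] / n;
        c.point[1] += p[1] / n;
        c.point[2] += p[2] / n;
    }

    // Radius (Equation 3.2) and heir: the member closest to the cluster center.
    float closest = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < members.size(); ++i) {
        const Vec4& p = members[i];
        const float d = std::sqrt((p[0] - c.point[0]) * (p[0] - c.point[0]) +
                                  (p[1] - c.point[1]) * (p[1] - c.point[1]) +
                                  (p[2] - c.point[2]) * (p[2] - c.point[2]));
        c.point[3] += (d / 2.0f + p[3]) / n;
        if (d < closest) { closest = d; c.heir = i; }
    }
    return c;
}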

3.2.3 Storage and Encoding of Data Points

For each point, we need to store both the radius, and the parent ID, which links the point and all its siblings to the point that represents the cluster they belong to on the next level of detail. The first three components of the four-component vector that stores each individual point are needed to define the point's spatial position. Therefore, all additional information has to be stored in a single 32 bit floating point value. We use 16 bit to store the radius, and 1 to flag the central point in the cluster, i.e., the "heir", leaving 15 to store the parent ID. This allows us to store 32,768 distinct parent IDs, which defines the limit of how many atoms can be in a single page. Figure 3.9 illustrates how the data for a single point is stored.

Figure 3.9: A single data point stored in a 4 x 32 bit component vector. The first three components hold x, y, and z; the fourth packs the central point flag, the encoded radius, and the parent ID into its 32 bits.

We encode the radius to use those 16 bits of storage more efficiently. The smallest possible radius for an atom is defined as 1.0, and we assume that even cluster points of accumulated atoms will not exceed a radius of 50.0. As a 15 bit variable can store 32,768 unique values, an encoded radius $r_e$ is calculated for each point's radius $r$ using Equation 3.3. This maps $r$ to a value between 0 and 32,768. In the shader, the encoding is reversed before using the radius to render the corresponding sphere.

$$ r_e = 32768 \cdot \frac{r - 1}{50 - 1} \tag{3.3} $$
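A minimal C++ sketch of this packing step, assuming the bit order of Figure 3.9 (flag, encoded radius, parent ID from the most significant bit downwards); the exact bit order and the function name are assumptions for illustration.

#include <cstdint>
#include <cstring>

// Packs the per-point metadata into the fourth 32 bit component, following the
// layout of Figure 3.9: central point flag (1 bit), encoded radius (16 bit),
// parent ID (15 bit).
float packPointData(float radius, bool isHeir, std::uint32_t parentId)
{
    // Equation 3.3: map the radius from [1, 50] to [0, 32768].
    const std::uint32_t encodedRadius =
        std::uint32_t(32768.0f * (radius - 1.0f) / (50.0f - 1.0f));

    const std::uint32_t bits = (std::uint32_t(isHeir) << 31) |
                               ((encodedRadius & 0xFFFFu) << 15) |
                               (parentId & 0x7FFFu);

    // Reinterpret the packed bits as a float so they fit into the vec4 component;
    // the shader reverses both steps before rendering the sphere.
    float packed = 0.0f;
    std::memcpy(&packed, &bits, sizeof(bits));
    return packed;
}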

3.3 Summary and Conclusion

In this chapter, we presented the concept of our data structure and its place within the framework. We implement an octree constructed by nesting page components that encapsulate the data points. The extent of a page's bounding box is determined by the given maximum number of points in a page, as well as the density in the region. Additionally, our data structure provides levels of detail that are calculated using agglomerative hierarchical clustering. Levels of detail are linked together using a parent ID, which makes it possible to smoothly interpolate between them, as we will see in the following chapter, where we will also see some of the advantages of dividing space hierarchically based on the density of a region. As the individual pages can be used to limit the amount of data that needs to be considered when applying methods based on neighborhood queries, as well as loaded and rendered per block, our data structure allows us to deal with large data sets that would be problematic without an advanced data structure.


CHAPTER 4

Rendering

When visualizing molecular data, it is important to allow the user to investigate both the overall structure, and small details. Understanding the structure as a whole requires the user to be able to interact with it in real time. In order to make it easier to see structural details, enhancement methods are helpful. This includes both general visual enhancement methods that help to guide the viewer's eyes to important details, such as depth of field and ambient occlusion, and surface representation methods that are specific to molecular data. In the first part of this chapter, we describe how we use our data structure to render data sets. In part two, we give an overview of the enhancement methods implemented in the framework.

4.1 Managing and Rendering the Octree Data Structure

The aim of our work is to render several million atoms in real time, without requiring additional structural information, while facilitating enhancement and surface representation methods. We use a least recently used (LRU) cache, implemented on the CPU, to manage the data structure described in Chapter 3. For any given frame, a certain number of pages are visible, and need to be in memory. This sub-set of data is called the working set.

Figure 4.1 shows the main components involved in rendering the data structure. Components in gray are responsible for back-end data handling, which was discussed in the previous chapter. This chapter is concerned with the components above the dashed line, which are responsible for managing and rendering the data structure that was created in a pre-processing step. The sphere renderer, which is a part of the viewer, manages the rendering process. It has access to the viewer, which contains a scene, in which a protein is stored. The cache, which handles the data using pages of the octree, is accessed by the sphere renderer via the other components. The figure also shows the sphere renderer's


Figure 4.1: Rendering components. The main components responsible for the rendering process are shown above the dashed line. Attached to the SphereRenderer are the component's most important methods (display, updateBuffer, setLevelOfDetail).

three most important methods for the rendering process, shown as ellipses connected to the component which contains them.

4.1.1 Memory Management Using an LRU Cache

When rendering large data sets on a screen with its limited resolution with the entire data set in view, individual spheres at some point become so small that they cover less than a single pixel. Additionally, there is a high chance for data points in the back to be covered up by other geometry. While zooming in to a part of the data set to look at details, on the other hand, parts of the data sets are invisible as they remain outside the view frustum. Therefore, it is hardly ever necessary to render the entire, large data set at the same time at full resolution.

In our pre-processing step, we have already spatially divided our data set into pages. This allows us to look at entire chunks of the data set at once, instead of having to check visibility for all individual atoms. We also know the maximum size of each block, as it is possible to choose that when building the data structure. Therefore, we can now use information about the system's limits, or decide how much of its capacity we want our program to use, to determine a maximum amount of pages that can be uploaded to the GPU at the same time. If the allocated capacity is reached, a decision has to be made about which pages to discard. This is where caching comes in as a memory management strategy. The cache helps to efficiently update and manage which parts of the data set are needed. Data can simply stay in memory, even when it is not visible, until we run out of memory, and the space is needed by another page.

All types of cache have some limit to their memory. When this limit is reached, something has to be removed in order to find space for a new entry. Depending on the application, there are different schemes that can be used. Least recently used (LRU) caches get their name from the fact that they remove the least recently used entry in order to make space for a new one. In this context, least recently used means the page that has not been requested or rendered for the longest amount of time. Consecutive frames generally contain a similar sub-set of data, so data points that have been accessed recently have a high probability of being visible again in the next frame. Pages that have been out-of-frame longest, on the other hand, are more likely to be located in a completely different area of the data set, and are therefore less likely to be needed for the next frame. LRU caches are common in volume rendering (Hadwiger et al. [HSS+05], Reichl et al. [RTW13]), though they have been used in direct particle rendering as well (for example Fraedrich et al. [FSW09]). According to Beyer et al. [BHP15], another common strategy is to use a combination of least recently used and most recently used (MRU) caching. In the suggested strategy, LRU is used as long as there is enough space in the cache. If the working set becomes too large for the cache, the program switches to MRU, to reduce cache thrashing, i.e., competition for available slots, resulting in cache misses or overwriting of data blocks that are needed. In our implementation, we choose to implement an LRU cache, and render the molecule at a lower resolution when we run out of cache space instead.

We implement the LRU cache on the CPU side. Usually, an LRU cache is based on two basic data structures, one that implements a queue, and one which provides hash functionality. The implementation of our LRU cache is based on a list and an unordered map. As a key in the map, we use pointers to the pages. As value, we store a vector containing the level of detail at which the data is uploaded, as well as the position in the cache. If there are unused slots available in the cache at the time when the page is added, the saved position is simply the position of the entry in the map. In case of a full cache, we replace the least recently used page, that is the one at the bottom of the list, with the new page, and assign the original position of the replaced page to the new page. This information is used when we use the LRU to update the buffers.
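A minimal sketch of such a list-plus-map LRU cache; the class and member names are illustrative, and the value is reduced to a slot index and a level of detail rather than the full value vector described above.

#include <cstddef>
#include <list>
#include <unordered_map>

struct Page; // the octree page described in Chapter 3

// Illustrative LRU cache: a list keeps the usage order (front = most recently used),
// an unordered map provides O(1) lookup from a page pointer to its cache entry.
class LruPageCache {
public:
    explicit LruPageCache(std::size_t capacity) : capacity_(capacity) {}

    // Marks a page as most recently used and returns its slot (buffer position).
    // A new page either takes a free slot or the slot of the least recently used page.
    std::size_t touch(Page* page, int levelOfDetail)
    {
        auto it = entries_.find(page);
        if (it != entries_.end()) {
            order_.splice(order_.begin(), order_, it->second.orderIt); // move to front
            it->second.levelOfDetail = levelOfDetail;
            return it->second.slot;
        }
        std::size_t slot = entries_.size();    // next unused slot, if one is left
        if (entries_.size() >= capacity_) {
            Page* victim = order_.back();      // least recently used page
            slot = entries_[victim].slot;      // the new page inherits the victim's slot
            entries_.erase(victim);
            order_.pop_back();
        }
        order_.push_front(page);
        entries_[page] = Entry{order_.begin(), slot, levelOfDetail};
        return slot;
    }

private:
    struct Entry {
        std::list<Page*>::iterator orderIt;
        std::size_t slot = 0;
        int levelOfDetail = 0;
    };
    std::size_t capacity_ = 0;
    std::list<Page*> order_;
    std::unordered_map<Page*, Entry> entries_;
};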

Figure 4.2 shows the molecule 3j3q. The pages' bounding boxes are colored according to the age of the page. Pages shown in green have been accessed recently, while pages shown in red would be the first to be replaced, in case of a full buffer and cache.

4.1.2 Updating GPU Buffers Using the LRU

On the GPU side, we use two buffers that store two neighboring levels of detail, which can be interpolated in the shader. We update them using the LRU cache, and a list of visible pages for the current frame.


Figure 4.2: Molecule 3j3q with colored bounding boxes for the pages of the octree. The pages that were least recently accessed are shown in red, the most recently accessed pages in green.

To update the rendering list, i.e., the list of at least partially visible pages for the current frame, we perform coarse view frustum culling by checking each page's bounding box against the view frustum. A page that is entirely outside the view frustum is not added to the rendering list, which effectively means that we are discarding invisible parts of the data for the current frame. After checking if the bounding box of a page lies within the view frustum, we update the LRU cache for all pages that are at least partially visible. The LRU cache saves a new pointer, if the page is not yet resident. If it already exists in the cache, it reorders the page pointers in the list, thus marking the page as most recently used. To the second buffer, we upload the data for the next coarser level of detail. We want to have two neighboring resolutions available in the shader in order to be able to blend levels of detail smoothly. Both buffers are updated each frame, using the same LRU cache and list of visible pages.

Figure 4.3 illustrates three situations that can arise when adding a page to the cache. The upper row represents the cache, the lower row a buffer it manages. Blue pages have already been added to the rendering list for this frame. In the first case on top of the figure, there is unused space left, so we do not have to delete currently unused data in the buffer. Instead, we assign it to the next available slot in the buffer. The second case in the middle shows a full buffer. However, the number of pages added for the current frame is still lower than the total number of entries allowed in the cache. We replace


Figure 4.3: Cache and buffer update. The first case in the top row shows a cache with available space left; in the case shown in the middle, the cache is full, but parts of the data are not currently needed; and the last illustration shows a cache that is too small to hold all currently visible pages.

the least recently used page, and assign its slot, in this case number 13, to the new page. The last case on the bottom of the figure shows a full cache. The rendering list for the current frame exceeds the total number of allowed entries in the cache. In this case, we do not replace the least recently used entry, as it is needed in the current frame. Instead, we stop adding pages to the rendering list. The user has to choose to either reduce the resolution, or zoom in to the relevant area of the data set, so that some areas become invisible. It should be noted that, if the entire capacity of the GPU is used, this extreme case only occurs when rendering data sets of several tens or hundreds of millions of atoms, depending on the system.

Which position in the buffer we write the data to is also managed by the cache. For each page in the rendering list, we check if it is already resident in the buffer by looking it up in the cache. If it is not, but there is still empty space available, it is assigned a position. The offset of a newly added page's data in a buffer is defined as its position in the cache


Figure 4.4: The buffer offset is calculated based on the original cache position and the size of buffer chunks.

multiplied by the maximum number of atoms any page at that level of detail contains, as illustrated in Figure 4.4. This offset remains the same until the page is thrown out of the cache. Even if a page is temporarily outside the view frustum, its data remains in the buffers, where it can be used again at need, without having to upload it again. For example, if we add a new page with a maximum number of atoms of 100 at level 0 and 50 at level 1, and the page is added to the cache at position 4, the offset when uploading to the first buffer is 400, and that for the second is 200.
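The offset arithmetic can be summed up in a small sketch; the function and parameter names are illustrative.

#include <cstddef>

// Offsets are expressed in data points, as in the text: the slot a page occupies in
// the cache, multiplied by the maximum page size at the level stored in each buffer.
struct BufferOffsets {
    std::size_t finerLevel;
    std::size_t coarserLevel;
};

BufferOffsets offsetsForCacheSlot(std::size_t cacheSlot,
                                  std::size_t maxAtomsFinerLevel,
                                  std::size_t maxAtomsCoarserLevel)
{
    // E.g. slot 4 with at most 100 atoms at level 0 and 50 at level 1 yields 400 and 200.
    return {cacheSlot * maxAtomsFinerLevel, cacheSlot * maxAtomsCoarserLevel};
}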

When we remove the least recently used page in order to make space for a new page, we need to know which part of the buffer the data associated with the deleted page resides in. This part of the buffer will be overwritten with the data of the new page. That is why the cache saves the original position of a page in the value vector. When we replace a page, the new page gets assigned the original "position" of the page it replaces, and we overwrite the old page's data in the buffers, using the offset derived from the position and the maximum number of points for a page at the currently used level of detail.

The basic level of detail is the most detailed resolution we want to be able to display. Usually, that is the original resolution. As we show the entire molecule as a default view when we start the visualization, the basic level of detail is set to the highest resolution at which all atoms of the molecule can fit into the available space. The user can later change the basic level of detail in the interface. However, that clears the LRU cache and doing so can therefore decrease performance for very large molecules. The basic level of detail determines the size of buffer chunks, which is why we clear both the cache and the buffers when it is changed.

In addition to the basic level, we define the rendering level, which is the level of detail that we actually show. It can be the same as the basic level, or any level coarser than that. If, for example, we have enough available space for the buffers to view the molecule at the original resolution (level 0), but want to view it at level 2 to increase the frame rate, we can change the rendering level. Instead of replacing and reshaping the entire buffer, which is a costly and inefficient thing to do frequently, we simply re-use buffer chunks and replace each block with its level 2 data when it is needed. Therefore, we only recommend changing the basic level to anything other than full resolution when the GPU actually runs out of memory.

The rendering level also has to be taken into account when updating the cache and buffers. Therefore, we also save the difference between the basic level and the rendering level in the value vector associated with a cache entry. If, for example, the global basic level is 1, but we uploaded the data at rendering level 2, we save the number 1, indicating that the resolution saved in the buffer differs from the basic level of detail by one level. When we try to add a page to the cache, we have to check that value as well to see whether it is saved at the desired resolution. If it is not, we update the value vector, and return the offset of that page, which we then use to replace the data in the cache with the new resolution.

Figure 4.5 shows a schematic illustration of the buffers for a given frame. All pages shown in blue are currently at least partially visible, while pages illustrated in red are completely outside the view frustum. The smaller pattern indicates pages rendered at a finer resolution, while the ones covered in coarser dots are saved at a lower resolution. The red pages without a pattern are neither visible, nor resident in the buffer, while the two red-dotted pages were visible in a previous frame, and have not been deleted. Though they are not rendered for the illustrated frame, the data associated with these pages is still in the buffer, as there is space available in the buffer and cache, shown in white in the illustration.

4.1.3 Rendering for Data Sets of Different Sizes

Initially, when loading a data set, the entire molecule is in view. We render it at full resolution if possible, and at the lowest available resolution if not. The user can later adjust the position of the protein, as well as choose the level of detail. The goal is to make rendering as efficient as possible. Data that actually needs to be rendered at a given time has to be prioritized when uploading data to the GPU, while invisible data should be discarded as early as possible.

As our standard rendering method, we use the list of visible pages to draw them individually. We also support drawing the entire buffer in a single draw call instead, but in that case, we do not have access to the information that links points to their parents,


Figure 4.5: The cube on the left is a two-dimensional representation of an octree data structure with different-sized pages. Blue pages are currently at least partially within the view frustum (green), red pages outside it. The dots represent the level of detail at which the data is saved in the buffer. The two red dotted pages are in the buffers because they were previously visible.

as parent IDs are relative to the page, not the entire buffer, so we cannot interpolate between levels of detail.

Though all individual molecules currently available at the Protein Data Bank easily fit into memory using our test hardware, we want our system to be able to handle larger data sets. Therefore, we provide a set of options for very large data sets. The size of the buffer slots is determined by the maximum number of atoms in a single page at the current basic level. Therefore, the number of pages we can upload also depends on the basic level of detail. By default, that is if the memory capacity allows it, we render the entire data set at the original resolution when starting the program. The maximum number of pages allowed in the cache is generally equal to the total number of pages in the current data set, and the minimum level of detail is 0, or the original resolution. However, the user has the option of limiting the number of rendered pages manually later. If the minimum LOD is for example set to level 2 instead, levels 2 and above can be shown, but not the more detailed levels 0 and 1. The maximum number of pages of a certain resolution that we can fit onto the GPU can be calculated depending on available memory. If the entire data set at original resolution is too large to fit into memory, we run a GPU memory test to determine that limit for all levels of detail. We then show the resulting limitations in the GUI, giving the user the option of choosing whether to limit the maximum amount of pages, or the minimum resolution. Limiting the number of pages means that only parts of the large data set can be shown at any time. By default, we render the data set at the coarsest available level of detail if the original resolution is not possible. If the user wants to see more details, they have to change it back manually. The reason we implemented it like that, rather than automatically showing the highest possible resolution even for very large data sets, is that it slows down the system. Additionally, if the data set is very large, and all of it is visible, it is probable that details are too small to be seen anyway. We find it more convenient to be able to handle the entire data set at a "fast" resolution, and change to a more detailed view after zooming into an interesting area.

4.2 Smoothly Blending Between LODs

We want to be able to show more than one level of detail at the same time. Figure 4.6 illustrates the transition from one level of detail to another. As described in Section 3.2.2, we use a flag in the vector's fourth component to mark the point that is closest to the center point of the cluster. The transformation from one level to the next happens on a scale from 0 to 1, where 0 is the higher resolution level (level 1 in the example) and 1 the lower resolution level (level 2 in this case). Between these two states, all children gradually migrate towards the parent position, i.e. the center of the cluster. For all points except the "heir", the radius decreases from its original value towards 0. The radius of the "heir", that is the point closest to the center, increases from its own original radius to the radius of the parent sphere.

Alternatively, one could let all spheres grow and migrate closer to each other without singling out an heir. While that strategy saves the hassle of defining an heir, it leads to issues when rendering Gaussian surfaces. The optimized algorithm proposed by Bruckner [Bru19], which we use to calculate the Gaussian surface, determines where a surface has to be drawn by creating an intersection list. The surface is only created in areas where spheres overlap. If we let all spheres in a cluster migrate towards the center, they overlap more and more, the closer we get to the new level of detail. The more spheres overlap, the larger the surface area. Towards the end of the transition, all spheres are at approximately the same position, and have the same size, leading to almost 100% overlap of many spheres. When we reach the new level, all spheres in the cluster are removed in favor of the single sphere that represents them on the new level. Once that step is performed, there is no overlap anymore, which means that no Gaussian surface is created either, leading to a sudden reduction in the size of the surface. As the goal is a smooth transition, we therefore opted to choose a single heir instead. Thus, all other spheres can be gradually shrunk and then removed, which means that right before the swap, there is usually only a single sphere to be replaced. Even if one or two of its neighbors are left, they are at this point both tiny and within the radius of the now enlarged heir. Therefore, they do not visibly impact the Gaussian surface, which allows for a smooth transition.

The data structure containing the pages at all levels of detail is stored in a nested structure of pages, pointers, and arrays, containing the data in the leaves of the octree. On the GPU, only two levels of detail are available at each given time for each loaded page. It would be possible to upload all available levels, but as the maximum number of data points per page is limited to 2^15, it is never necessary to interpolate between more than two levels of detail within a single page. Instead, we choose the level of detail


Figure 4.6: Molecule 1sva. Transition of the entire molecule between level 0 and 1. (a) Level 0, (b) Stage 1 (interpolated level of about 0.25), (c) Stage 2 (interpolated level of about 0.75), (d) Level 1.

that is uploaded to the GPU per page. As mentioned in the previous section, we have a basic level of detail that determines the division of the buffer, and a render level that determines at which level of detail the page is actually rendered. The rendering level can never be a more detailed LOD than the basic level. Apart from this limitation, it can be changed interactively by the user, or determined depending on the frame rate, or distance. The distance-based option reduces the level of detail with increasing distance from the camera, while the frame rate based mode reduces the detail once the frame rate drops below a specified threshold.
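The per-point blending described in this section can be sketched as follows; the actual interpolation runs in the shader, but the logic is shown here in C++ for readability, with illustrative names.

#include <array>

struct BlendedSphere {
    std::array<float, 3> position;
    float radius;
};

// Blends one point of the finer level towards its parent on the coarser level.
// t runs from 0 (finer level fully visible) to 1 (coarser level fully visible).
BlendedSphere blendTowardsParent(const std::array<float, 3>& childPosition, float childRadius,
                                 const std::array<float, 3>& parentPosition, float parentRadius,
                                 bool isHeir, float t)
{
    BlendedSphere out{};
    for (int i = 0; i < 3; ++i) // all children migrate towards the parent position
        out.position[i] = (1.0f - t) * childPosition[i] + t * parentPosition[i];
    // The heir grows into the parent radius, all other points shrink towards zero.
    out.radius = isHeir ? (1.0f - t) * childRadius + t * parentRadius
                        : (1.0f - t) * childRadius;
    return out;
}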

4.2.1 Distance-based LOD Adjustment

We are trying to reduce the number of rendered spheres in order to speed up the rendering process, while impacting the visual quality as little as possible. Spheres that are further away from the camera are rendered smaller than closer spheres in perspective projection. Additionally, they have a higher chance of being at least partially covered by other spheres. Reducing the resolution in the back is less likely to be perceived by the user. We can choose the resolution per page, depending on the distance between the camera and the center of the page's bounding box. However, we want to avoid visible artifacts and sudden changes between neighboring pages. Therefore, we additionally smoothly interpolate between levels in the shader.


On the CPU, we choose which level of detail to upload to the buffers, depending on the distance to the camera. The basic level of detail for the entire molecule, l_b, is by default level 0, that is the original resolution, but can be coarser if the molecule is too large to fit into memory. The user can choose the distance threshold t, and the amount of tolerance ε, in the interface. On the CPU, only the distance threshold is used. For all pages where the bounding box center has a distance to the camera d_p which is greater than that threshold, the next coarser level of detail is uploaded to the buffers. If the basic rendering level for the entire molecule is 1, the level in its secondary buffer would be 2, while for all pages beyond the threshold, level 2 would be uploaded to the first, and level 3 to the second buffer. The information at which level of detail the atoms of a page are uploaded is needed in the shader in order to interpolate between individual points. We encode it based on the difference to the molecule's basic rendering level as shown in Equation 4.1. If the basic level of detail is used for a page, the difference is 0; if it is beyond the threshold and the next coarser level is used, the difference is 1. Currently, our system only supports interpolating between three levels of detail. However, it could easily be expanded to several levels of detail if data sets with greater depth differences required it.

\[
\Delta l_p =
\begin{cases}
0, & \text{if } d_p \le t\\
1, & \text{if } d_p > t
\end{cases}
\tag{4.1}
\]
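To make the rule concrete, the following minimal sketch selects the pair of levels uploaded to the two buffers for a page. The names Page, uploadLevels, and boundingBoxCenterDistance are hypothetical, and maxLevel denotes the number of available levels of detail.

#include <algorithm>
#include <utility>

// Hypothetical page interface: distance from the camera to the bounding box center.
struct Page { float boundingBoxCenterDistance; };

// Returns the pair of levels uploaded to the two buffers (finer, coarser),
// where lb is the basic level of the molecule and t the distance threshold.
std::pair<int, int> uploadLevels(const Page& page, int lb, float t, int maxLevel)
{
    int levelUp = (page.boundingBoxCenterDistance > t) ? 1 : 0;   // Equation 4.1
    int finer   = std::min(lb + levelUp, maxLevel - 1);
    return { finer, finer + 1 };   // the secondary buffer is always one level coarser
}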

In the shader, we use both the distance threshold and the tolerance to achieve a smooth transition, using the mechanism described in Section 4.2. Here, we deal with individual atoms or points representing several atoms, rather than entire pages. Figure 4.7 illustrates the interpolation between three levels in the shader. The pages shown in white are those with center points closer to the camera than the cut-off distance, while the centers of the gray pages are further away. The dashed line in the center of the figure shows the threshold. The blue area around it, defined as t ± ε (where ε denotes the tolerance), shows the tolerance. All points that fall within that area are rendered at the overlapping level. If the white pages are rendered between level 0 and 1, and the gray pages between 1 and 2, everything within the blue area is rendered at level 1. Equation 4.2 shows how the rendered level of a point is decided. For each atom (point), we have the option of interpolating between the two levels in the buffer, one finer and one coarser. Which LODs those are is decided on the CPU, see Equation 4.1. As described in Section 4.2, interpolation happens on a scale from 0 to 1, where 0 is the finer and 1 the coarser level of detail. The cases shown in Equation 4.2 are marked with numbers from 1 to 5 in Figure 4.7. In cases 1 and 5, the finer and coarser available level are rendered, respectively. Case 2, which corresponds to the thick blue line in the illustration, uses the level the pages have in common, for example level 1 in the example where we interpolate between levels 0 and 2. Cases 3 and 4, the areas between the dashed blue lines and the solid blue line, are the areas where actual interpolation happens. The closer the atom or point is to the camera within the extent of the tolerance area ε, the closer to 0 its interpolation result.


It has to be noted that even with our interpolation system, abrupt level changes can happen if the tolerance area is too small. Figure 4.7 shows a relatively small ε. The red lines show where abrupt changes would happen, as the pages shown in white would be rendered at level 1 and the gray pages at level 2. The correct solution to this issue is simply to choose an appropriate size for ε. In practice, the difference is usually not visible, as points in the part of the molecule facing away from the camera are mostly covered by the points or atoms closer to the camera.

\[
i =
\begin{cases}
0, & \text{if } \Delta l_p = 0 \text{ and } d \le t - 2\varepsilon \text{ (case 1)}\\
(d - t + 2\varepsilon)/\varepsilon, & \text{if } \Delta l_p = 0 \text{ and } t - 2\varepsilon < d < t - \varepsilon \text{ (case 3)}\\
1, & \text{if } \Delta l_p = 0 \text{ and } d \ge t - \varepsilon \text{ (case 2)}\\
0, & \text{if } \Delta l_p = 1 \text{ and } d \le t + \varepsilon \text{ (case 2)}\\
(d - t - \varepsilon)/\varepsilon, & \text{if } \Delta l_p = 1 \text{ and } t + \varepsilon < d < t + 2\varepsilon \text{ (case 4)}\\
1, & \text{if } \Delta l_p = 1 \text{ and } d \ge t + 2\varepsilon \text{ (case 5)}
\end{cases}
\tag{4.2}
\]

4.2.2 Frame Rate-Based LOD Adjustment

One of the declared goals of our framework is to achieve interactive rates. We implemented a method that chooses at which level of detail the molecule is rendered, depending on the frame rate. The idea behind this mode is that a user might wish to investigate the overall structure of a very large molecule in real time, without necessarily caring about the finer details at this stage. For large data sets, it might not be possible to render the entire structure at the frame rate the user desires at the original resolution. We let the user specify a target frame rate via the interface. If the average frame rate, as calculated based on the last 64 frames, is lower than the specified limit, a coarser level of detail is chosen for the entire molecule. This method changes the rendering level, not the basic level, so the cache structure remains the same. The method contains a counter that measures the time that has passed since the last change of level. As operations such as updating the atoms saved in the cache to the new rendering level have to be performed, it takes a couple of seconds for the frame rate to stabilize after a change of level. If the frame rate drops below the threshold, we check the counter to see whether at least 10 seconds have passed since the last time the level was changed by the frame rate based rendering method before changing it again. We lower the resolution one level at a time.
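A minimal sketch of this control loop is given below; the names are illustrative, frame times are assumed to be kept in a ring buffer of the last 64 frames, and a larger rendering level index stands for a coarser resolution.

#include <array>
#include <chrono>
#include <cstddef>

struct FrameRateLOD {
    std::array<float, 64> frameTimes{};          // seconds per frame, ring buffer
    std::size_t cursor = 0;
    std::chrono::steady_clock::time_point lastChange = std::chrono::steady_clock::now();

    // Returns the possibly lowered rendering level for the next frame.
    int update(float frameTime, float targetFps, int renderLevel, int maxLevel)
    {
        frameTimes[cursor] = frameTime;
        cursor = (cursor + 1) % frameTimes.size();

        float sum = 0.0f;
        for (float ft : frameTimes) sum += ft;
        if (sum <= 0.0f) return renderLevel;     // not enough samples yet
        float averageFps = frameTimes.size() / sum;

        auto now = std::chrono::steady_clock::now();
        bool cooledDown = now - lastChange > std::chrono::seconds(10);

        // Lower the resolution one level at a time, at most every 10 seconds.
        if (averageFps < targetFps && cooledDown && renderLevel < maxLevel) {
            ++renderLevel;
            lastChange = now;
        }
        return renderLevel;
    }
};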

4.3 Molecular Surface Rendering

Though enhancement methods and surface models are not the main focus of our implementation, we include basic methods as a proof of concept. The atoms or cluster spheres can be rendered using the van der Waals or a Gaussian Surface Model. For structural and depth enhancement, our framework includes depth of field, ambient occlusion, and edge enhancement. This section focuses on GPU aspects of rendering, as we implement the effects largely in screen space. Figure 4.8 gives an overview of the shader passes in the sphere renderer. Gray boxes represent passes for optional screen space effects.


Figure 4.7: Illustration of the smooth transition between levels in distance based rendering. Pages with a center point more than a defined distance from the camera are uploaded at a coarser LOD (dotted pages). Within a parameter-defined area around that distance, we interpolate levels of detail based on the distance.

Figure 4.8: Render passes in the sphere renderer (sphere, spawn, shade, gauss, DOF, blend; optional: SSAO, halo). Shaders shown in gray are optional, the ones in white necessary for our basic rendering process.


The van der Waals Surface Model approximates the molecular surface by rendering individual atoms as spheres using a center position and their van der Waals radii. Our rendering framework is centered around a Gaussian Surface Model as proposed by Bruckner [Bru19], which is based on vdW spheres. The surface calculation is performed in image space, so the algorithm is output sensitive. The influence the calculations have on the achievable frame rate is determined by the atoms on the image plane, rather than the size of the molecule. That makes it particularly suitable for very large structures. The basis of a Gaussian molecular surface is the following density function:

\[
\rho(x) = \sum_i e^{-s\,\frac{\lVert x - c_i \rVert^2}{r_i^2}}
\]

c_i and r_i are atom position and radius, s is a scaling factor. The final surface is a union of the Gaussian surface and the vdW surface. Therefore, we can smoothly blend between the two, adjusting the result by tweaking the parameters. In our implementation, the user can do this by changing the sharpness factor and thereby the atoms' sphere of influence. According to Liu et al. [LCL15], it is possible to achieve a good approximation of the SES and SAS with the correct parameters. To evaluate the density function in image space, an intersection list is created. We define each atom's sphere of influence, which functions as a cutoff radius. Outside the sphere of influence, an atom does not contribute to the density function. Therefore, the density function only needs to be calculated in areas where these spheres overlap and are not occluded. The size of the sphere of influence is determined as follows, based on the threshold t and the minimum number of contributing atoms at a position N:

\[
\hat{r}_i = r_i \sqrt{\frac{\ln\!\left(\frac{N}{t}\right)}{s}}
\]
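For reference, the density evaluation and the cutoff radius can be written down directly. The sketch below is CPU-side and purely illustrative (the actual evaluation happens in image space in the shader), and the exact cutoff criterion in [Bru19] may be formulated slightly differently.

#include <cmath>
#include <vector>
#include <glm/glm.hpp>

struct Atom { glm::vec3 center; float radius; };

// Gaussian density rho(x) = sum_i exp(-s * |x - c_i|^2 / r_i^2)
float gaussianDensity(const glm::vec3& x, const std::vector<Atom>& atoms, float s)
{
    float rho = 0.0f;
    for (const Atom& a : atoms) {
        glm::vec3 d = x - a.center;
        rho += std::exp(-s * glm::dot(d, d) / (a.radius * a.radius));
    }
    return rho;
}

// Sphere of influence: outside this radius an atom's contribution drops below t/N
// and is ignored when building the intersection list.
float influenceRadius(float atomRadius, float s, float t, float N)
{
    return atomRadius * std::sqrt(std::log(N / t) / s);
}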

We determine the intersection list in a separate rendering pass. The spheres of influence are rendered in the same way as the van der Waals spheres: a single vertex is rendered per atom, and a quad constructed based on the sphere's screen space bounding box. For van der Waals spheres, we stop after the first intersection, as our framework does not support transparency. For each sphere of influence that is not occluded by van der Waals spheres, we save the necessary information as an entry in the list. It contains near and far intersection, center position, the index of the previous sphere entry, as well as the information that is usually stored in the fourth component, i.e. radius, parent ID and heir flag. Bruckner [Bru19] proposes an adaptation of selection sort for ray traversal. As that algorithm guarantees that the first i elements are correctly sorted after the i-th iteration, and we only need the first intersection with the Gaussian surface, we can stop once that intersection has been found.


Algorithm 4.1: Visibility-driven ray traversal [Bru19]
Input: array of count intersection point indices ii, referring to the entries containing information such as near and far about the spheres of influence

ss = 0
for i ← 0 to count − 1 do
    k = i
    for j ← i + 1 to count − 1 do
        if near_ii[j] < near_ii[k] then
            k = j

To calculate surface intersections, we use sphere tracing in the interval from near_{i+1} to far_{j−1}.
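The essence of the traversal is a selection-sort-style loop with early termination: in every iteration the remaining entry with the smallest near value is moved to the front, its interval is traced, and the loop stops as soon as a surface intersection is found. The following is a simplified sketch of that control flow, not a reproduction of Bruckner's algorithm; traceInterval is a placeholder for the actual sphere tracing of the density function.

#include <cstddef>
#include <utility>
#include <vector>

struct Entry { float near, far; /* center, radius, ... */ };

// Placeholder: sphere traces [enter, exit] and reports whether the
// Gaussian iso-surface is hit inside the interval.
bool traceInterval(float enter, float exit)
{
    (void)enter; (void)exit;
    return false;
}

// Processes the per-pixel intersection list front to back and stops at the
// first surface hit, so only a prefix of the list ever needs to be sorted.
void traverse(std::vector<Entry>& list)
{
    for (std::size_t i = 0; i < list.size(); ++i) {
        // Selection step: bring the entry with the smallest near value to position i.
        std::size_t k = i;
        for (std::size_t j = i + 1; j < list.size(); ++j)
            if (list[j].near < list[k].near)
                k = j;
        std::swap(list[i], list[k]);

        // Trace the interval covered by this sphere of influence.
        if (traceInterval(list[i].near, list[i].far))
            return;   // first intersection found, remaining entries can be ignored
    }
}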

Depth Based Surface

In our implementation, the user can set the sharpness factor s in the interface, which determines the extent of the surface. The radius of the sphere of influence is calculated by multiplying the radius of the atom with a radius scale factor r_s, which is calculated using the sharpness and the contributing atoms constant c_a. For a standard value of c_a = 32.0, the radius scaling factor is determined as follows:

\[
r_s = \sqrt{\frac{\log\!\left(c_a\, e^{s}\right)}{s}}
\tag{4.3}
\]

Our depth based surface rendering mode changes the radius scaling factor using the same principles as for distance based LOD rendering (see Section 4.2.1). The user can choose threshold and tolerance in the interface. All spheres closer to the camera than the threshold are rendered using the radius scaling factor resulting from the chosen sharpness. In the area between threshold and threshold + tolerance, the scaling factor is linearly decreased to 1, which results in a sphere of influence of the same size as the radius of the sphere itself. As in the case of level of detail rendering, spheres that are further away from the camera usually take up less screen space and are more likely to be at least partially occluded. They are therefore natural candidates when selectively decreasing resolution or rendering details. Even the optimized molecular surface we use is computationally expensive for large spheres of influence. So for larger surfaces, decreasing calculations on spheres further from the camera is worth the extra calculations required, which is shown in more detail in Section 6.4.3.
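A sketch of how the radius scaling factor can be attenuated with camera distance, using the scaling factor of Equation 4.3 and the same threshold/tolerance parameters as for distance based LOD selection; the function names are illustrative.

#include <algorithm>
#include <cmath>

// Radius scaling factor from sharpness s and contributing atoms constant ca,
// written as sqrt((ln(ca) + s) / s) = sqrt(ln(ca * e^s) / s), see Equation 4.3.
float radiusScale(float s, float ca = 32.0f)
{
    return std::sqrt((std::log(ca) + s) / s);
}

// Linearly decrease the scaling factor towards 1 between threshold and
// threshold + tolerance, so distant spheres of influence shrink to the
// van der Waals radius.
float attenuatedRadiusScale(float distance, float threshold, float tolerance,
                            float s, float ca = 32.0f)
{
    float rs = radiusScale(s, ca);
    float f  = std::clamp((distance - threshold) / tolerance, 0.0f, 1.0f);
    return rs + f * (1.0f - rs);   // f = 0: full extent, f = 1: scale factor of 1
}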

4.4 Visual Enhancement for Molecular Rendering

Visual enhancement methods are used to help the viewer to better understand both details and the overall structure of the data. All of the enhancement methods we include in our framework are screen space methods, based on different variations of blurring passes. In addition, we integrate an ambient occlusion library.

4.4.1 Depth of Field

In photography, depth of field is a result of light rays being refracted in the camera's lens. Both photographic cameras and the human eye are limited in the range of depth they can perceive in focus at the same time. Therefore, we are used to seeing focus as a depth cue. Two of the main reasons depth of field is implemented in computer graphics are its use as a depth cue, and to make scenes and objects look more realistic. A realistic looking molecule does not exist to the human eye, given that molecules are too small for the human eye to see. Depth of field can, however, play an important role when trying to understand three-dimensional structures, irrespective of whether or not we can observe them in the real world. In particle based data visualization, depth of field is often used to emphasize certain parts of the structure by blurring the back- and foreground.

We adapted the implementation by Bukowski et al. [BHOM13] for our framework. Figure 4.9 shows their illustration of the algorithm. It is based on two blur passes, one horizontal and one vertical. In order to achieve the effect, a rendered image of the scene is blurred using a blur kernel to sample and add up neighboring pixels. Two different blurred images are created, one for the background, and one for the foreground with a stronger blurring effect. The factor that decides which parts of an image are blurred and which sharp is called circle of confusion (CoC). It is calculated using the parameters aperture, focal length, and focal distance. We store the calculated value in the alpha channel of the blurred texture. The final depth of field image is calculated in a separate shading pass.

In addition, we implement an auto focus option, which sets the region in focus approximately to the current mouse position. We set the focal distance to the z value at the position of the mouse pointer at the time the user presses the auto focus button.
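The circle of confusion can be derived per pixel from the camera parameters. The sketch below uses the standard thin-lens formula as an illustration; Bukowski et al. [BHOM13] work with a screen-space approximation of the same quantity, so this is not a reproduction of their shader.

#include <algorithm>
#include <cmath>

// Thin-lens circle of confusion (in the same units as focalLength) for an
// object at distance 'depth'. fStop = focalLength / apertureDiameter.
float circleOfConfusion(float depth, float focalDistance, float focalLength,
                        float fStop, float maxCoC)
{
    float aperture = focalLength / fStop;
    float coc = aperture * focalLength * std::abs(depth - focalDistance)
              / (depth * (focalDistance - focalLength));
    return std::min(std::abs(coc), maxCoC);   // later stored in the alpha channel of the blurred texture
}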


Figure 4.9: Illustration of the depth of field algorithm from the publication by Bukowski et al. [BHOM13]

Figure 4.10 shows two examples of depth of field blurring. Both screenshots were taken at the same position and with the same focal length. Screenshot 4.10a uses an F-stop of 0.7, highlighting only a few selected atoms, while screenshot 4.10b, with an F-stop of 11, only blurs distant atoms, while everything closer to the camera is in focus.

Figure 4.10: Depth blur on the molecule 6qz0. (a) Blur F-stop 0.7, (b) Blur F-stop 11.

4.4.2 Screen Space Enhancement

Ambient occlusion helps to better understand and illustrate the structure by darkening surfaces that are on the inside of a structure. Our framework contains different versions of ambient occlusion and edge enhancement. We include the HBAOPlus library for ambient occlusion. Additionally, we implement two versions based on a blurring pass, which the user can tweak using parameters.

Ambient Occlusion Using the HBAOPlus Library

The NVIDIA HBAO+ (Horizon Based Ambient Occlusion) library is a screen space ambient occlusion library. It is an updated version of the work presented by Bavoil and Sainz [BS09]. It is mainly designed to produce realistic ambient occlusion shadowing. The library uses the depth buffer as an approximation of the scene. Rays are traced directly in 2D, and the ambient occlusion is approximated from these 2D rays. They use spherical coordinates, with the zenith axis oriented parallel to the Z axis in eye space. They compute incremental horizon angles while stepping forward along a line in eye space, and integrate the ambient occlusion based on the horizon angles. In our framework, we wrap the library in the SSAO component. The component has its own display method, which we call from within the sphere renderer if the user chooses to enable ambient occlusion. In the SSAO component, we also set the parameters. Figure 4.11 shows the ambient occlusion effect achieved by the library. Realistic ambient occlusion enhances the structure, and is therefore useful for our purposes. It is not necessarily the only worthwhile technique to use though, given the fact that our goal is to make it easier to understand the structure at hand, rather than to light a human-recognizable scene in a realistic way.

Blur-based Structure Enhancement

In addition to the library, we implement screen space enhancement based on two blurring methods, which achieve slightly different results. The first version is a separable Gaussian blur shader. It is based on an incremental Gaussian blurring kernel, as described by Turkowski [Tur07], and a GLSL shader published on a blog by Hay [Hay10].

1 https://callumhay.blogspot.com/2010/09/gaussian-blur-shader-glsl.html


Figure 4.11: Ambient occlusion using the HBAO+ library

It is used in a post-processing step and takes the output from the previous rendering pass as texture input. The method uses two passes, one horizontal and one vertical, in which the texture is sampled in the given direction. The sampled points are summed up and weighted using the incremental Gaussian coefficients. The incremental Gaussian coefficients are stored in a three-dimensional vector and their initial values are calculated as shown in Equation 4.4, where σ can be chosen by the user in the interface.

\[
x = \frac{1}{\sqrt{2\pi}\,\sigma}, \qquad y = \exp\!\left(-\frac{1}{2\sigma^2}\right), \qquad z = y^2
\tag{4.4}
\]
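The point of the incremental formulation is that the Gaussian weights can be produced inside the sampling loop with two multiplications per tap instead of evaluating the exponential each time. A CPU-side sketch of the weighting scheme (the actual implementation lives in the GLSL blur shader), operating on a simple array of samples:

#include <cmath>
#include <cstddef>
#include <vector>

// One-dimensional incremental Gaussian blur over 'samples', centered at index
// 'center', with 'taps' samples taken on each side (coefficients of Equation 4.4).
float incrementalGaussianBlur(const std::vector<float>& samples,
                              std::size_t center, int taps, float sigma)
{
    float x = 1.0f / (std::sqrt(2.0f * 3.14159265f) * sigma);  // initial weight
    float y = std::exp(-0.5f / (sigma * sigma));               // per-step factor
    float z = y * y;                                           // factor of the factor

    float sum = samples[center] * x;
    float weightSum = x;

    for (int i = 1; i <= taps; ++i) {
        x *= y;   // next Gaussian weight, no exp() needed
        y *= z;   // update the multiplier incrementally
        if (center >= static_cast<std::size_t>(i)) {
            sum += samples[center - i] * x;
            weightSum += x;
        }
        if (center + i < samples.size()) {
            sum += samples[center + i] * x;
            weightSum += x;
        }
    }
    return sum / weightSum;
}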

Figure 4.12 shows examples with two different sets of parameters. The images on the left are without enhancement, while the right side shows images created using the Gaussian blur-based enhancement method. In Figure 4.12b, a smaller kernel size is used, which results in more localized blurring and thereby limits the effect more to the edges. The image shown in Figure 4.12d is achieved using a larger kernel size, which results in an additional effect that is closer to ambient occlusion and enhances the structures at the core of the molecule.

We also implement another type of blur-based enhancement, which uses a different type of blurring effect. The halo blur shader is based on the works of Bruckner and Gröller [BG07], as well as Rong and Tan [RT06]. Rong and Tan propose a flooding algorithm for Voronoi diagrams. Bruckner and Gröller use a similar approach to create halo fields to enhance volumetric data sets. Each pass, a neighborhood of eight pixels is sampled for each point in the image. At pass n, the sampled point is 2n pixels away from the point under consideration. The main difference between this blurring method and the Gaussian blur is that the strength of the effect is determined by the number of passes. Conceptually, the passes are used in a similar way as in the jump-flooding algorithm by Rong and Tan [RT06].


Figure 4.12: Examples of Gaussian blur based enhancement on the molecule 5odv. (a) No enhancement, (b) Lambda 20, sigma 20, kernel 2, (c) No enhancement, (d) Lambda 12, sigma 4, kernel 8.

The distance at which we sample the texture depends on which pass it is. Each pass, we sample pixels closer to our target pixel, until, during the final pass, we sample direct neighbors.

Figure 4.13 shows the results of this rendering method. With fewer passes, such as for example 2 passes in Figure 4.13b, finer details are enhanced. More passes result in a stronger blur, as shown for 32 passes in Figure 4.13d. The result of 16 passes is shown in Figure 4.13c. Here, the edges become thicker, resulting in an effect that emphasizes larger structures. The disadvantage is that some details are lost in the edges.

Figure 4.13: Examples of halo-based ambient occlusion. (a) Screenshot without the effect, (b) Halo with 2 passes and lambda 50, (c) Screenshot without the effect, (d) Halo with 16 passes and lambda 11.

4.5 Summary and Conclusion

In this chapter, we introduce the methods we use to render the data structure we introduced in the previous chapter. The basic method relies on an LRU cache, a list of visible pages for each frame, and buffers which we use to upload two different levels of detail to the GPU. The cache has a fixed capacity and saves pointers to the pages that have been added, as well as the pages' rendering level and index in the buffer. We also explain different options for blending levels of detail and look at the special case of very large data structures. In the second part of the chapter, we look at surface models, in particular the Gaussian Surface Model, and screen space enhancement methods.

CHAPTER 5

Implementation

In this chapter, we explain the libraries, tools and technologies used in our implementation. The code is written in C++ and OpenGL. Additionally, we use some libraries and base parts of our implementation on previous work.

Figure 5.1 shows an overview of components and libraries. The components involved in rendering and data handling have already been explained in previous sections. Here, they are shown in the context of the entire framework. The viewer is the core of all visualization tasks. It is connected to the data handling components via the scene. It also contains interactors and renderers. Currently, there is one interactor that handles a camera, and two main renderers, the sphere renderer and the bounding box renderer. However, the system is expandable, and can contain as many interactors and renderers as necessary. Libraries are shown in connection with the components that use them. As they interact with most of the components, the GL libraries are illustrated without individual connections to components for the sake of readability.

Figure 5.1: System architecture

5.1 Libraries

Many tasks our framework has to handle are standard operations, for which we use libraries when they are not part of the core of our research. In this section, we give an overview of the libraries and how they are integrated into the implementation.

glbinding and globjects

We use glbinding and globjects as wrappers for OpenGL. Globjects is an object oriented wrapper library, which provides an interface for the programmable graphics pipeline. OpenGL objects are mapped directly to C++ classes. Glbinding is a cross-platform C++ binding for OpenGL.

GLM

GLM (OpenGL Mathematics) is a library based on the OpenGL Shading Language (GLSL) specifications. It implements classes and functions in C++, following the naming conventions found in GLSL. We mostly use it for data types such as matrices, but also some of its functions, such as transformations to radians.

GLFW

GLFW is an OpenGL utility library for creating windows, contexts and surfaces and receiving input and events. We use it to create our window, handle keyboard events and measure time.

ImGui

We use the ImGui library to build our graphical user interface. User interface elements are created and drawn each frame, hence the name immediate mode GUI.

1 https://glbinding.org/
2 https://globjects.org/
3 https://glm.g-truc.net/
4 https://www.glfw.org/
5 https://github.com/ocornut/imgui


LodePNG

The LodePNG library handles image loading. We use it in the SphereRenderer to load environment maps and in the viewer to save screenshots.

fastcluster

As it is used in part of our core functionality, the fastcluster library has already been described in Chapter 3. It is a collection of several variations of the fast cluster algorithm. We use it in the page component to calculate levels of detail based on iterative clustering.

HBAOPlus

As mentioned in Chapter 4, the HBAOPlus library is used for ambient occlusion. It is encapsulated in the SSAO component. That component handles the parameters for the library. It has its own display method, which is called by the sphere renderer when the user chooses to activate ambient occlusion. The display method requires the view and projection matrix, a framebuffer, as well as depth and normal texture.

5.2 Implementation Choices

In this section, we go through the architecture of our implementation and discuss concrete implementation choices for important components.

6 https://lodev.org/lodepng/
7 https://github.com/dmuellner/fastcluster
8 https://www.geforce.com/hardware/technology/hbao-plus

5.2.1 Data Structure

On the CPU side, the actual point-based data is stored within nested page components, as marked in Figure 5.1. The single instance of a protein component within our scene handles the basic page, i.e., the root of the octree, via a shared pointer. A std::shared_ptr<page> manages shared ownership of an object of the type page. The advantage of smart pointers in C++ is that they manage the destruction of the object they point to. The object's memory is deallocated when the last pointer pointing to it is either destroyed or assigned to something else. In contrast to other smart pointers such as std::unique_ptr, it allows several pointers to point to the same object.

A single page contains a data structure for sub-pages, and one for data points. For each individual page, only one of them is used at any given time. If the page is an internal node in the octree, it stores references to its sub-pages. We store this sub-octree as a 2 by 2 by 2 array of references: std::shared_ptr<page>[2][2][2]. The advantage of a multidimensional array over other structures, such as vectors, is that we can access them in a way that reflects the page's spatial position. Accessing a page in the array using the pattern [1][1][1], for example, not only gives us the reference we need, but also provides information about which area of the parent page it covers.

If a page is a leaf node in the data structure, it does not contain any sub-pages, so no references are stored. Instead, we store the atoms and level of detail cluster points. We store them in a vector of vectors: std::vector<std::vector<glm::vec4>>. Individual data points are saved as glm::vec4. As the points are later uploaded to the GPU, it is convenient to use the data structures of the GLM library, which provides classes and functions in C++ with the same naming conventions and functionalities that GLSL offers. All points at a particular level of a page are saved in a std::vector, so there is a std::vector for each level of detail. The vectors for the different levels are stored in another vector, resulting in the final data structure std::vector<std::vector<glm::vec4>>. The advantage of vectors compared to other data structures, such as arrays, is that we do not need to define their size beforehand. While we do know the maximum number of atoms that can be on a page, most pages are not filled to the maximum, which would leave unused space in an array of pre-defined size. Right now, we calculate a fixed number of levels of detail, but using a vector rather than an array leaves the option of extending it if an additional, coarser level is required.
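Condensed into code, the page component described above could look roughly as follows; the member and method names are illustrative, and the real class additionally stores bounding box information and identifiers.

#include <cstddef>
#include <memory>
#include <vector>
#include <glm/glm.hpp>

class page {
public:
    // One possible leaf test: an internal node always has children set.
    bool isLeaf() const { return subPages[0][0][0] == nullptr; }

    // Internal node: 2 x 2 x 2 children, indexed by their spatial octant.
    std::shared_ptr<page> subPages[2][2][2];

    // Leaf node: one std::vector of points (glm::vec4) per level of detail.
    std::vector<std::vector<glm::vec4>> lodPoints;

    const std::vector<glm::vec4>& atoms(std::size_t level) const { return lodPoints[level]; }
};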

5.2.2 Cache

Our LRU cache implementation is based on a std::list and a std::unordered_map. The list contains a std::pair consisting of a pointer to the page (std::shared_ptr<page>) and a glm::ivec2 for the position and rendering level information. As a key, the map also uses the pointer to the page, and as a value an iterator over the list.

The C++ data structure std::list is a doubly linked list. In contrast to vectors and arrays, it does not store elements at contiguous memory locations. Therefore, insertions and deletions are more efficient, because instead of moving elements in memory, only pointers are changed. This also reflects the way we designed buffer access. While the data of individual pages is stored contiguously, only parts of the memory that have not been used recently are replaced when necessary. Instead, we use index information to point to memory locations. This trait is advantageous in our implementation, because we push pages to the front of the queue each time they are either used or added, to keep track of least recent use.

For the map component, we use a std::unordered_map, which is the C++ standard library's implementation of a hash map. In contrast to std::map, which is based on a binary search tree, it uses a hash table to store data. The order in which the data is added to the cache is tracked by the list, so we do not need the entries in the map to be sorted, nor do we need to know an individual entry's neighbors. We update the map only if the cache does not contain the requested page yet, or when we have to delete a page in order to free up space for new pages. The job of the map is to store the values associated with the key, without any requirements of order. Therefore, the unordered map is the most suitable type of map.

There are two values associated with rendering a page. Both the rendering level and the offset position in the buffer are integer values. As a map only stores a single value per key, we wrap them in a glm::ivec2. Though a std::vector would work just as well, glm::ivec2 limits the data that can be stored to two integer values, which is all we want to allow.
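A minimal sketch of the resulting cache; capacity handling, buffer slot reuse, and error handling are omitted, and the method names are illustrative rather than our exact interface.

#include <list>
#include <memory>
#include <unordered_map>
#include <utility>
#include <glm/glm.hpp>

class page; // see Section 5.2.1

class LRUCache {
    using PagePtr = std::shared_ptr<page>;
    using Entry   = std::pair<PagePtr, glm::ivec2>;   // page + (buffer position, render level)

    std::list<Entry> order;                                          // front = most recently used
    std::unordered_map<PagePtr, std::list<Entry>::iterator> lookup;

public:
    // Returns the stored (position, level) and marks the page as recently used.
    glm::ivec2 get(const PagePtr& p)
    {
        auto it = lookup.at(p);
        order.splice(order.begin(), order, it);   // move the entry to the front
        return it->second;
    }

    void put(const PagePtr& p, glm::ivec2 positionAndLevel)
    {
        if (auto it = lookup.find(p); it != lookup.end())
            order.erase(it->second);
        order.emplace_front(p, positionAndLevel);
        lookup[p] = order.begin();
    }

    // Evicts the least recently used page and returns it, e.g. to reuse its buffer slot.
    PagePtr evict()
    {
        PagePtr victim = order.back().first;
        lookup.erase(victim);
        order.pop_back();
        return victim;
    }
};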

5.2.3 Buffers

To upload data points to the GPU, we need a data structure that does not have a size limit, except the actual hardware constraints. We use the globjects/Buffer.h wrapper for OpenGL buffer objects. As we want to be able to interpolate between levels of detail, we use two buffers to store point positions of the current basic level and the next coarser LOD. For each of them, we store a unique pointer: std::unique_ptr<globjects::Buffer>. The initial size of the buffers is determined at startup, and they are resized when the basic level of detail is changed by the user. An alternative would be to store position data in Texture Buffer Objects. The maximum size of a Buffer Object is generally only limited by the memory of the GPU. Though the size limit is usually higher than for one-dimensional textures, Texture Buffer Objects do have a size limitation. However, as mentioned in Chapter 2, there are several papers with similar challenges to ours that do use textures.

Each time a new page is loaded into the buffer, we replace the data in the relevant buffer area with the points in the new page using setSubData for both buffers. The buffer containing the basic rendering level is bound to a Vertex Array Object. As a standard rendering method, we use drawArrays on sections of the buffer based on offsets corresponding to individual pages. We iterate over all pages we want to render for a given frame and call the method drawPage. The method gets the pointer to the page we are currently drawing, as well as the shader to be used. We then fetch the two values stored in the cache, i.e. the page's position in the buffer and the rendering level. We draw the relevant part of the buffer using drawArrays from offset to end. The offset is calculated based on the page's position in the buffer and the maximum atoms per page for the current basic rendering level (which corresponds to the buffer slot size), and the end is the number of atoms in the current page, so we do not draw the entire slot, just the actual points. We also upload the page position to the shader, because we need it for the second buffer, which we upload to an Interface Block of type buffer, as shown in Listing 5.2. On the C++ side we bind the buffer using m_lowerLOD->bindBase(GL_SHADER_STORAGE_BUFFER, 3). In a geometry shader, we use the page position and parent ID to connect the correct parent to points. For each rendered point, we need to get the parent, the point representing the cluster it belongs to, in order to achieve a smooth transition. Listing 5.3 shows the lines of code relevant to retrieving the position and radius of that point.


Listing 5.1: DrawPage method

// Fetch the buffer slot index and the page's rendering level from the cache.
glm::vec2 values = viewer()->scene()->protein()->cache()->get(page);
long offset = values[0] * viewer()->scene()->protein()->maxAtomsPerPage(m_baselevel);

// The shader needs the page position and the difference to the molecule's rendering level.
shader->setUniform("pagePosition", (uint)values[0]);
shader->setUniform("levelUp", values[1] - m_renderlevel);

// Draw only the points actually contained in this page, not the whole buffer slot.
long end = page->atoms(values[1]).size();
m_vao->drawArrays(GL_POINTS, offset, end);

Listing 5.2: Interface Block

layout(std430, binding = 3) buffer lodBlock
{
    Position positions[];
};

Listing 5.3: Get parent point

#define DECODERADIUS(a) ((a / 32768.0f) * (50))

// The parent ID is encoded in bits 16-30 of the point's fourth component.
int sphereID = floatBitsToUint(gl_in[0].gl_Position.w);
int parentID = bitfieldExtract(sphereID, 16, 15);
glm::vec3 pos = positions[parentID + maxAtomsPerPage * pagePosition].pos;

// The parent's own fourth component holds its quantized radius in bits 0-15.
sphereID = floatBitsToUint(positions[parentID + maxAtomsPerPage * pagePosition].rad_id);

int encodedRadius = bitfieldExtract(sphereID, 0, 16);
float sphereRadius = DECODERADIUS(encodedRadius);

First, we extract the parent ID from the fourth component of our data point. As IDs are defined per page, not for the entire buffer, we have to combine the ID with the same offset we used to render the basic rendering level in order to get the parent's position. The parent point's radius is then extracted from its fourth component and decoded using a macro.
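The same decoding can be expressed on the CPU side. The sketch below mirrors Listing 5.3; the assumption that the remaining top bit carries the heir flag is ours and is marked as such in the code.

#include <cstdint>
#include <cstring>

// Decodes the packed fourth component of a data point (see Listing 5.3):
// bits 0-15 hold the quantized radius, bits 16-30 the parent index within the page.
struct PackedInfo { float radius; unsigned parentId; bool heir; };

PackedInfo decodeFourthComponent(float w)
{
    std::uint32_t bits;
    std::memcpy(&bits, &w, sizeof(bits));               // same as floatBitsToUint in GLSL

    PackedInfo info;
    unsigned encodedRadius = bits & 0xFFFFu;             // bitfieldExtract(bits, 0, 16)
    info.radius   = (encodedRadius / 32768.0f) * 50.0f;  // DECODERADIUS
    info.parentId = (bits >> 16) & 0x7FFFu;              // bitfieldExtract(bits, 16, 15)
    info.heir     = (bits >> 31) != 0u;                  // assumption: top bit marks the heir
    return info;
}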

CHAPTER 6

Results

In this chapter, we present the results of our work, compare different settings, and analyze various parts of the implementation. Our tests focus on the two most important aspects of our solution: the performance of the data structure, especially for large data sets, and the effects of our levels of detail solution, both visually and in terms of performance. Additionally, we take a look at the impact of the data set structure on neighborhood based algorithms and the parameters of the enhancement methods. All test results are generated using an NVIDIA GeForce GTX 960M/PCIe/SSE2 graphics card and an Intel Core i7-7500U @ 2.70 GHz with 16 GB RAM. Everything is rendered in full HD (1920 by 1080 pixels).

6.1 Test Data

In order to test our framework in different situations and investigate how well our implementation scales, we put together a small collection of test data. Our main set consists of eight different-sized molecules, shown in Figure 6.1. The largest molecule currently found in the PDB database is 3j3q, which represents the structure of an HIV-1 capsid. It contains 2,440,800 atoms. To test scalability beyond that, we use two artificially created molecular data sets consisting of 10 and 50 instances of 3j3q, containing 24,408,000 and 122,040,000 atoms, respectively.

6.2 Configuring and Pre-processing the Data Structure

Our implementation focuses on rendering a single time-step. Building the octree and calculating levels of detail is done in a pre-processing step. This speeds up rendering, at the cost of a longer start-up time. In this section, we look at how pre-processing computation scales for different-sized molecules and the impact of data structure configurations.


Figure 6.1: Molecules in our test set with the number of atoms they contain. (a) 1sva: 15,983, (b) 2btv: 49,061, (c) 5odv: 82,656, (d) 6ncl: 305,842, (e) 6o2s: 351,728, (f) 4v99: 546,120, (g) 6qz0: 1,161,565, (h) 3j3q: 2,440,800.

The time it takes to build the octree data structure including levels of detail increases linearly with the number of atoms in the data set, as shown in Figure 6.2a. There are several reasons why one might have to calculate data sets that are larger than the largest single molecule currently available in the PDB database: the database continues to be extended and new molecules added, one may want to render entire cells, or molecular dynamics simulations with several time-steps. Therefore, we look at how the construction of the data structure scales for larger data sets. In Figure 6.2b, we show the time required to build the data set per atom for 26 molecules and two artificial large data sets. Here, we see that the time required to calculate the data structure including levels of detail actually increases on a per-atom basis. This is due to the overhead of sorting atoms into pages. However, it has to be noted that a simple nearest neighbor algorithm that checks every point in the data set to find neighbors has a quadratic execution time. By sorting the atoms into pages with a maximum size, the maximum number of neighbors an atom needs to be compared to is 2^15 − 1, even for very large data sets, which results in a linear increase in the time it takes to compute the LOD data structure. The same advantage can be leveraged when calculating any other type of neighborhood-based algorithm. We still have to sort all atoms and traverse nested pages, which is why the time does increase per atom, but because of the limit on the number of neighborhood queries per atom, the total execution time increases linearly, not quadratically.

Figure 6.2c shows that the time it takes to load a pre-processed data set also increases linearly with size. However, the overall time required to load a pre-processed data set is negligible compared to building it from scratch each time a data set is loaded. As we assume that the same data sets are going to be loaded and viewed several times, it is therefore worthwhile to store the pre-processed data sets. Figure 6.2d shows the time it takes to load a data set relative to the amount of time required to build the same data set. The figure is based on the same data as Figures 6.2a and 6.2c. As building the data structure is a significant overhead for large data structures, we save the data structure into a *.lod file once it is completed.

Figure 6.2: Both building and loading times increase with the size of the data set. As loading only takes a small fraction of the time required to build them, it is well worth saving the data structures. (a) The time it takes to build the data structure increases with the number of atoms in the data set. (b) As LOD creation requires neighborhood queries, the time it takes to build them also increases on a per-atom basis. (c) The time it takes to load pre-processed data also increases with the size of the data set. (d) Pre-processed data can be loaded very quickly compared to building the data structure from scratch.

While the most significant factor determining the building, loading, and rendering times is the size of the data set, page sizes can also have an impact. We compare page sizes ranging from the maximum possible atom count per page of 32,768 down to a minimum of just 256 atoms per page. The results are shown in Figure 6.3. The time it takes to load the data structure remains virtually the same for different page sizes. When looking closely, it becomes visible that the page sizes that result in lower frame rates are very slightly faster when it comes to loading the data sets. However, the effect is negligible. Our system is also optimized for runtime, rather than start-up time, so the more interesting results are changes in frame rates. We show the average FPS using different page sizes on a set of molecules in Figure 6.3b. While the exact result is also influenced by the structure of the molecule, larger page sizes perform better in most cases. For our remaining tests, we keep the maximum number of atoms per page set to 32,768 (2^15).

Figure 6.3: When comparing loading time and FPS, we found that larger pages tend to perform slightly better. (a) The size of pages has very little impact on the time it takes to load a data set. (b) On average, larger pages result in higher frame rates.

6.3 Rendering

The main objective of our framework is to render large molecules at interactive frame rates, while keeping the quality of visual enhancement and the level of detail as high as possible.

While frame rates are the most natural quantitative factor when analyzing rendering frameworks, they vary strongly depending on various influences. Apart from hardware and implementation overhead, which are usually constant when comparing data sets within a single framework, the screen fill rate is highly significant. Figure 6.4 shows the same molecule, using the same settings for enhancement methods, etc., at different zoom levels, resulting in a different screen fill rate. In this case, the frame rate lies between 9 and 74 FPS, mostly due to the raycasting implementation. For frame rate based performance tests across data sets, we therefore use a standardized position, where the molecule is centered and scaled in such a way that its bounding box is maximized relative to the viewport when using an orthographic camera. The size of the viewport in our tests is full HD, that is 1920 by 1080 pixels, and we use a perspective camera with 60 degrees field of view. As a standard setup, all tests are conducted using the positions shown in Figure 6.1. Additionally, we use a Gaussian surface with a sharpness of 16 (see Section 6.4.3), but no depth blur or ambient occlusion, unless specified otherwise.

Figure 6.4: Different screen filling (6qz0). (a) 9 FPS on average, (b) 27 FPS on average, (c) 59 FPS on average, (d) 74 FPS on average.

6.3.1 Level of Detail Rendering

The main motivation for using levels of detail is to reduce the amount of data that has to be rendered in order to achieve a performance improvement, as well as making it possible to render data sets that are too large to be handled by a specific set of hardware. In this section, we show how much the number of spheres is reduced for each level, and how much the frame rate increases. We also visually investigate the perceived loss of detail for coarser levels of detail. Table 6.1 shows the total number of atoms per level of detail for our test sets. The corresponding screenshots are shown in Figure 6.5. The last column in the table shows the number of pages into which the data is sorted, which also represents the last, coarsest level of detail, where we use a single sphere per page; this level is not shown in the figure. Because we use two buffers, with the secondary one coarser than the primary buffer, this last level cannot be chosen as basic level. The numbers in the table are visualized in Figure 6.6. In Figure 6.6a, we show the total amount of data points per molecule and LOD, providing an overview of relative sizes. As that makes it hard to see the details for smaller data sets and coarse LODs, Figure 6.6b shows the amount of points per level next to each other and on a logarithmic scale. Figure 6.6c shows the average amount of points per LOD relative to other LODs. While the exact reduction depends on the structure and density of the molecule, the average reduction between neighboring LODs is between 40 and 60 percent.

Figure 6.5: LODs (top to bottom: 1sva, 2btv, 3j3q, 4v99, 5odv, 6ncl, 6o2s, 6qz0)


Figure 6.6: Comparing the relative and absolute amount of points per LOD for our test data sets. (a) Total amount of points in the test molecules across LODs. (b) Amount of points separated by level and shown on a logarithmic scale for easier comparison. (c) The amount by which the number of atoms is reduced per LOD depends on the structure of the molecule. Generally, points are reduced by about 40-60 percent.


Name   Level 0   Level 1   Level 2   Level 3   Level 4   Level 5   Level 6   Pages
1sva     15983      7240      2205       853       409       228       142       1
2btv     49061     22508      7315      2812      1282       689       409       8
3j3q   2440800   1171809    375594    148123     68567     38951     24011     260
4v99    546120    251943     80547     32012     14882      7945      4870      36
5odv     82656     31540      6760      2777      1216       705       428       8
6ncl    305842    138484     44268     17337      7901      4180      2493      36
6o2s    351728    158129     52977     20600      9407      5094      3045      63
6qz0   1161565    520817    171045     67621     30994     16604     10003     131

Table 6.1: Numbers of atoms per level of detail

Our general rendering test for all levels, shown in Figure 6.7, confirms that all molecules currently available in the Protein Data Bank can be rendered at interactive rates. The maximum amount of atoms is set to the highest possible level of 32,768 atoms per page. Frame rates for the standard views range from 27 to 84 FPS in the original resolution. As we want our framework to support even larger structures, as well as more complicated enhancement methods, the jump in performance between level 0 and 1 for 3j3q, which contains approximately 2.4 million atoms, from 27 to 55 FPS, is particularly interesting. In Section 6.3.4, we investigate artificial data sets based on several instances of that molecule.

Figure 6.7: Frame rates achieved for the data sets at different levels

6.3.2 Caching and Per-page Rendering

Our framework has the capability to render the data structure per page, using a list of visible pages, or to render everything contained within a single buffer at once. In Figure 6.8a, we compare the two rendering methods. In both cases, the frame rates were measured with the molecules in the position shown in Figure 6.1, with the entire molecule in view. For smaller molecules that contain few pages, it does not make a difference whether we render per page or the entire buffer, but for larger molecules, rendering per page is more effective in our framework. Additionally, features such as blending between levels of detail rely on per-page offsets, which means that if we want to use them, we need to render per page. Our standard method is optimized for per-page rendering, as the buffer is divided into blocks when uploading per page, which leaves empty space (see Figure 6.8b). In order to verify that in our framework, per-page rendering for large data sets is also faster than rendering a full, dense buffer, we compare frame rates for all test sets at the original resolution. As discussed in Section 6.3.4, we also need to divide the data set at some point, when, depending on the hardware, the molecule no longer fits into memory in its entirety. For the remaining tests shown in this chapter, we render blocks of data corresponding to pages individually.

Figure 6.9 demonstrates that only the pages that are actually needed are uploaded. In screenshot 6.9a, the entire molecule 4v99, divided into 36 pages, is loaded at the original resolution. The colors represent their order in the cache, from oldest (red) to newest (green). Usually, missing pages are automatically loaded when the perspective changes and they become visible. In order to demonstrate that only pages that are at least partially visible are uploaded to the buffer, we first zoomed into a position where only part of the data set is visible. Then, we cleared the buffer and disabled cache updating. After that, we zoomed out again to view the entire molecule, without allowing the program to load missing pages. Figure 6.9b shows which pages were already loaded in the previous view, as well as those that are missing, which are enclosed by gray bounding boxes. In order to observe this, we have to manually clear the cache, as pages that become invisible are not automatically deleted, unless the cache runs out of space.

6.3.3 Selecting the Level of Detail

We want to use levels of detail in a way that increases performance as much as possible or necessary, while keeping the required visual quality. Partially, visual quality can be parameterized and automated, but the user has to have the option of choosing or at least influencing the visual result, as the requirements may differ according to the task the user wishes to perform. We provide a set of options that combine manual input and automatic calculation to control the level of detail.


Figure 6.8: Rendering the entire buffer vs. page blocks. (a) Per-page rendering compared to rendering the entire buffer for all molecules at level 0. (b) The buffer is divided into page blocks, which can be drawn individually.


Figure 6.9: Page replacement (molecule 4v99). Only visible blocks were loaded at a previous position. Stopping the buffer from updating allows us to interact with the molecule and look at which pages were not needed in the previous view. (a) All 36 pages, level 0. (b) Pages are loaded when they are needed.

Manual LOD Selection

The user can set a target frame rate in the interface. When the user chooses a level in the interface, the basic level of detail is changed. This means that the whole buffer is cleared and divided into new chunks, based on the maximum number of atoms per page for the newly set level of detail. For small and medium sized molecules, including all examples in our test set, this task does not cause a visible delay. For very large data sets, such as the examples discussed in Section 2.3, it can take several seconds or more. However, very large data sets are the main application for this option. If the user wants to interactively study the entire structure of a data set that is too large to fit into memory at the original resolution, they can choose to set a lower basic rendering level, which allows all pages to fit into memory at the same time.

The smooth transition slider changes the rendering level. The buffer partition is left as it is, but whenever a page becomes visible, the data of the selected rendering level is uploaded instead of the current basic level. The slider starts at the basic level of detail and goes up to the highest level (lowest resolution) available in the system. For example, if the basic level is set to 1, the rendering level cannot be a finer resolution, but starts at 1, as the points contained in a single page at level 0 do not necessarily fit into the buffer slots based on the maximum size of level 1. The slider allows for a smooth transition between all levels, starting with the basic level. If the slider is for example set to 2.5, the molecule will be rendered somewhere between level 2 and 3, as shown in Figure 6.10.


Figure 6.10: Smooth transition (molecule 4v99). We calculate a finite number of discrete levels of detail, but we can interpolate all steps in between them. (a) Level 2, (b) Level 2.5, (c) Level 3.

Frame Rate Dependent LOD Selection

The timeout window of 10 seconds before the level is reduced to the next lower resolution is based on the observed amount of time in which the frame rate stabilizes in most cases. This mode only lowers the resolution; it does not automatically increase to a finer resolution, even if the target frame rate allows it. Though automatically returning to a better resolution might be convenient in some situations, we prioritize avoiding frequent jumps between levels, which slow the rendering down. This mode also changes the render level, not the basic level.

View Dependent LOD Selection and Interpolation

The idea behind the view dependent level of detail rendering is to reduce the number of atoms towards the back of the scene, as seen from the camera. They are rendered smaller in a perspective projection, and are more likely to be covered by other atoms. Figure 6.11 shows the use of view dependent LODs. In this case, the interpolation is between level 0, the original data, and level 2. Data points close to the camera are rendered at level 0, with a gradual transition to level 1; the pages in the back have level 2 as their rendering level and gradually transition to level 3. While a clear difference is visible between the original resolution shown in 6.11a and level 1 next to it (6.11b), it is hard to see the difference between the original data 6.11a and the mixed levels 6.11c. The average frame rate increases, as fewer spheres have to be rendered, while the data points in the foreground remain the same. Figure 6.11d shows where level 0 (orange) and where level 1 (violet) is used.

6.3.4 Rendering Very Large Data Sets

In order to test our solution beyond the molecules currently available in the Protein Data Bank, we use two artificial data sets. Data set 1, shown in Figure 6.12b, consists of 10 instances of 3j3q, the largest molecule currently available in the PDB. It approximately represents the maximum data set size we can fit into memory as a whole using our test hardware. The exact number of data points that can be uploaded to the GPU not only depends on the hardware itself, but also on whether or not any other processes are using part of it at the same time. Figure 6.12a shows data set 2, consisting of 50 instances of 3j3q. In total, this example contains 122,040,000 atoms, which we cannot load into a buffer at the same time. Instead, it is rendered at level 2 in the screenshot, which is the maximum resolution at which we can show all its pages at the same time. The hierarchical data structure is pre-computed using the standard maximum page size of 2^15.

Level   Max pages   Max atoms per page
0           11071                32750
1           25802                15740
2           76011                 5067
3          183294                 1996
4          365466                  933
5          612167                  536
6         1569797                  341

Table 6.2: Maximum number of pages that fit into the buffer and maximum number of atoms per page for each level of detail

View   Active pages   Basic level   Render level   FPS   Frozen   Rendering
1             12886             6              6    39       no       page
2              9471             2              2     7       no       page
3              9471             2              4    23       no       page
4              1743             0              0     5       no       page
5              2934             1              1     4      yes     buffer
6       13729 (all)             3              4    22       no       page

Table 6.3: Data set at different views and LODs

We conduct a number of tests on the special case of a data set that does not fit into the buffer at full resolution on our system. Figure 6.13 contains six views at various resolutions of test data set 2 (Figure 6.12a). Table 6.2 gives an overview of the maximum number of pages that fit into the buffer at the same time using the capacity of our hardware. The right hand column shows the maximum number of atoms per page for each level of detail, while the second column shows how many pages could fit into the buffer at the given level. The number of pages in this data set, when dividing it into pages of up to 2^15 atoms per page, is 13,729. Table 6.3 summarizes the configuration details of the individual views shown in Figure 6.13. The column "active pages" refers to how many pages are in the current render list. The render list contains the pages that are at least partially visible in the current frame, unless the buffer is frozen, stopping it and the render list from being updated. The basic level of detail is the level that determines buffer partition, while the render level refers to the resolution at which the data points are actually rendered. The last two columns indicate if the buffer is frozen, and whether we draw the scene page by page, or render the entire buffer.


At startup, we load all pages in the data set. If it exceeds the hardware capacity, the basic level is set to 6. View 1 in Figure 6.13a shows this initial setup. In this view, 12,886 of 13,729 pages are active, i.e. at least partially within the view frustum. At 39 FPS, the frame rate is interactive, but the reduced resolution is clearly visible at level 6. View 2 in Figure 6.13b shows a partial view where about 69% of pages are active, though that does not necessarily correspond to 69% of the actual data being visible. Here, the basic rendering level is set to 2, the highest resolution at which we can view the entire data set. View 3 in Figure 6.13c is identical to Figure 6.13b, except that the rendering level (but not the basic level) is set to 4, which more than triples the frame rate.

View 4 in Figure 6.13d shows part of the data set at the original resolution. In order to view part of the data at a higher resolution than is possible for the entire data set at once, the user has to zoom into the section they would like to view in detail first, and then set the level to the desired resolution. This limits the cache to the maximum amount of pages that can fit into the buffer at that level. When the user moves the camera, pages are replaced using the LRU principle. If the camera is moved in such a way that more pages become active than fit into the cache, the remaining pages are ignored for that frame. We find this behavior preferable to automatically reducing the level of detail, as it is easy to accidentally zoom out too much, thereby triggering an undesired level change. Deleting and restructuring the buffer causes a delay, especially for very large data sets. Switching between basic levels, i.e. restructuring the buffer, should be avoided if it is not what the user actually intends. View 5 in Figure 6.13e shows a rendering of the entire buffer while it is frozen. As discussed in Figure 6.9, data that remains in the buffer even though it is not needed is only visible if the entire buffer is rendered at once, rather than per page. In this case, we see the part of the data set that we were viewing when the buffer was frozen on the right side. On the left, we see leftover data from a previous view. The final view 6 (Figure 6.13f) shows the entire data set with the basic level of detail set to 3, and the rendering level set to 4, which results in 22 FPS, a frame rate usable for interaction.

6.4 Enhancement Methods

In this section, we demonstrate the enhancement methods included in our framework. The included methods are in screen space, which means that they are applied on top of a primary rendering pass. We mainly focus on visual results, and the impact parameter settings have on frame rates.

6.4.1 Blur-based Enhancement Methods for Ambient Occlusion and Edge Enhancement

We summarize different effects based on screen space blurring in Figure 6.14. In all images in the figure, a Gaussian surface with a sharpness of 16 is used. For reference, Figures 6.14a and 6.14b show the molecule with no effects and with the HBAO+ library, respectively. The effects based on a Gaussian blur pass are shown in Figures 6.14c, 6.14d, and 6.14e.


View   Method            Kernel/passes   Lambda   Sigma
a      none              -               -        -
b      library           -               -        -
c      gaussian          2               10       20
d      gaussian          10              20       20
e      gaussian          10              18       2
f      library + gauss   4               20       10
g      halo              2               21       -
h      halo              4               50       -
i      halo              64              2        -
j      library + halo    2               50       -

Table 6.4: Rendering details for Figure 6.14

Figures 6.14g, 6.14h, and 6.14i show the effects based on halo blurring. Figures 6.14f and 6.14j show a combination of the effect produced by the library and the Gaussian or halo shader. In general, the effect that can be achieved by the two methods is rather similar. It works effectively for edge enhancement of different thicknesses, depending on the strength of the blur and the parameters. The strength of the blur is influenced by the size of the kernel, or number of passes, depending on the method (see for example 6.14c vs. 6.14d), but also the parameters lambda and sigma. Figure 6.14e looks very similar to Figure 6.14c, except for a stronger enhancement of lines around the molecule compared to internal borders, even though it uses the same kernel size as Figure 6.14d. The internal structures enhanced by the library (Figure 6.14b) are hardly visible in Figure 6.14a. When applying halo blurring with a large number of passes, as in Figure 6.14i, where 64 pixels are sampled around each point, the shader achieves an ambient occlusion effect. Compared to the ambient occlusion library, it has the advantage of exclusively darkening deeper structures at the center of the molecule, making them more clearly visible. However, the darkening around the edge of the molecule, where the contrast caused by blurring is strongest, hides details in that area. In conclusion, the blurring based enhancement shaders would have to be developed further in order to sensibly replace the library for ambient occlusion, but they do work well for edge enhancement. We also found a combination of edge enhancement and blur shaders, as demonstrated in Figures 6.14f and 6.14j, useful to investigate larger and smaller structures within the molecule at the same time.

6.4.2 Screen-space Depth of Field Effect

The depthoffield effect canbeused to drawattention to specific areas. Itsparameters work likethe settingsofacamera. The user can set the focallength, andthe F-stop. Additionally,the maximum circle of confusion (CoC) radius can be set.Figure 6.15and table 6.5 showdifferent settings for thedepth of field shading.Fromone view to the


View  Focal distance  F-stop  Max CoC  Effect
a     -               -       -        -
b     1.4             1.2     6        -
c     1.4             11      6        -
d     4.9             1.2     6        -
e     1.4             1.2     1.5      -
f     1.4             1.2     14       -
g     1.4             1.2     6        HBAO+
h     1.4             1.2     6        Gauss (4/20/9)

Table 6.5: Rendering details for Figure 6.15

From one view to the next, only a single parameter is changed. The focal distance determines where, relative to the camera, the focus lies, while the F-stop determines how deep the area in focus is. On an optical camera, a low value for the F-stop means an open aperture, which lets more light in, but also scatters the light rays, so the area in focus is narrower, which we reflect in our blurring formula. The maximum circle of confusion radius determines the strength of the blur in the blurred areas. A larger radius means that more neighboring pixels have to be considered when blurring. Therefore, this is the only parameter that directly affects the frame rate. It is also possible to combine the effect with ambient occlusion (Figure 6.15g) or edge enhancement (Figure 6.15h). In that case, the render passes for both effects are executed consecutively.
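
To illustrate how the three parameters interact, the following sketch computes a per-pixel blur radius from the standard thin-lens circle of confusion and clamps it to the maximum CoC. This is a generic thin-lens sketch, not the exact formula of the shader we use (which follows Bukowski et al. [BHOM13]); the focal length value and the omitted world-to-pixel scale factor are assumptions.

    def blur_radius(depth, focal_distance=1.4, f_stop=1.2, max_coc=6.0,
                    focal_length=0.05):
        """Per-pixel circle of confusion, clamped to the maximum radius.

        depth:          eye-space distance of the shaded fragment.
        focal_distance: distance at which the image is perfectly sharp.
        f_stop:         aperture number; smaller values give a shallower depth of field.
        max_coc:        upper bound on the blur radius, the only parameter that
                        directly influences the cost of the blur pass.
        focal_length:   assumed lens focal length; a scale factor converting the
                        result to pixels is omitted for brevity.
        """
        aperture = focal_length / f_stop   # aperture diameter of the virtual lens
        coc = abs(aperture * focal_length * (depth - focal_distance) /
                  (depth * (focal_distance - focal_length)))
        return min(coc, max_coc)

Lowering the F-stop widens the aperture and therefore narrows the region around the focal distance in which the computed radius stays small, which is the behavior described above.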

6.4.3 Rendering the Gaussian Surface Model

Though the Gaussian surface is a less computationally expensive alternative to the full Solvent Excluded Surface, it is the most expensive of the methods we currently provide in our framework. The dominant factor for its performance is the extent of the surface, as that factor determines the length of the intersection list that has to be checked for each ray. We use the method proposed by Bruckner [Bru19]. A detailed performance analysis of the algorithm itself is outside the scope of this thesis. In this section, we analyze the method in the context of our work. Figure 6.16 shows the Gaussian surface with different sharpness parameters and the resulting frame rates. For the same view and settings, the frame rates range from 29 to 3 FPS. In most use cases, the goal of rendering a Gaussian surface is to approximate an SES, so the sharpness value of 0.5 may not be a particularly realistic case. However, it does demonstrate how dependent the rendering performance is on the parameter of the Gaussian surface. The method proposed by Bruckner [Bru19] is output sensitive, so its performance depends on its own parameters and on the number of spheres in the view frustum for the current frame, rather than on the total number of atoms in the molecule. The only difference between the images in Figure 6.16 is the sharpness factor. This parameter influences the size of the neighborhood that is considered when determining overlapping spheres, and thereby the number of spheres in the intersection list.

As we use a screen-space method to build the molecular surface, the data structure used on the CPU is not directly relevant for its performance. The only part of our hierarchical data structure that influences it directly, both in terms of visual results and frame rates, are the levels of detail. Figure 6.17 shows three versions of molecule 5odv, using a sharpness factor of 2.5. The images in the top row show the entire screenshot, the row below a cropped-out detail. Figures 6.17a and 6.17d show the molecule at the original resolution, without any optimization. The resulting frame rate is 12 FPS. In Figures 6.17b and 6.17e, the extent of the molecular surface is attenuated based on the camera distance with a threshold of 113 and a tolerance of 87, resulting in a frame rate of 15.5 FPS. For the last view, 6.17c and 6.17f, we use both distance-based attenuation and distance-based LOD, with a threshold of 173 and a tolerance of 17, and achieve a frame rate of 21.5 FPS. While there are some visible differences, especially between the detail views 6.17d and 6.17f, users can set the parameters themselves, and in some cases a slight loss of detail could well be considered acceptable, given that in this case the frame rate is almost doubled.
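
For context, a Gaussian surface is typically defined as an isosurface of a sum of Gaussian kernels centered at the atoms. The following is one common Blinn-style formulation, written here for illustration rather than as the exact definition used in [Bru19]:

    D(\mathbf{x}) \;=\; \sum_{i=1}^{N} \exp\!\left( -s \left( \frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{r_i^2} - 1 \right) \right),
    \qquad
    S \;=\; \{\, \mathbf{x} \;\mid\; D(\mathbf{x}) = 1 \,\}

where c_i and r_i are the center and radius of atom i and s is the sharpness. A smaller sharpness widens each kernel, so more atoms contribute to the density at any given point, which lengthens the per-ray intersection lists and is consistent with the frame-rate drop from sharpness 10 down to 0.5 observed in Figure 6.16.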

6.5 Conclusion and Comparison to Similar Solutions

Due to variations in hardware and a lack of consistent standards, direct comparisons of measurements such as frames per second are often not meaningful. As mentioned above, frame rates also depend on other factors, notably the perspective, so even renderings of the same molecule using the same hardware cannot necessarily be compared directly. Therefore, we mention noteworthy differences to a selection of closely related recent work, and try to give a general comparative overview rather than presenting a direct comparison of frame rates.

Grottel et al. [GRDE10] use a combination of optimization strategies that include caching, two-level culling, and deferred shading. Figure 6.18a shows the results they present in their paper, for data sets ranging from 107,391 to 100,000,000 points. The main difference to our work is that they do not use a hierarchical data structure. They focus on molecular dynamics simulations, and therefore find the overhead of creating it prohibitive. As our main optimization strategy relies on levels of detail that have to be pre-calculated, a hierarchical data structure is more suitable, and the overhead negligible in our case.

Sharma et al. [SKNV04] do use an octree and levels of detail. However, their octree is limited to three levels, and they use polygons to render their spheres. Levels of detail in their case refer to the resolution of the sphere geometry, rather than the number of spheres. Their framework manages to render very large data sets (see Figure 6.18b), especially given that their research was already published in 2004. However, they only render points as spheres, and do not include or discuss molecular surface models or enhancement methods. The advantage of their LOD solution compared to ours is that it can be determined on the fly. Our method has the advantage of actually reducing the number of spheres that even have to be considered for culling. Figure 6.18b shows the results achieved in their framework, Atomsviewer, without optimization, using the octree, and using a parallel and distributed version.


Their work is a standard to which most other relevant works refer. Contrary to many solutions proposed afterwards, we return to a similar data structure, while using more recent techniques when it comes to rendering.

Parulek et al. [PJR+14], on the other hand, focus their work on molecular surfaces. The largest molecule they test, at 58,674 atoms, is relatively small compared to other implementations that include tests with up to billions of atoms. However, they present a more advanced surface model than any of the other closely related implementations we found. According to their results, their mixed model, which includes levels of detail and three different surface models, achieves up to 20x the frame rate compared to a full SES representation. Our optimizations do not boost rendering as much, but our system does not (yet) include the more computationally expensive SES, so direct comparisons are not possible. We do, however, use the same clustering method to build our levels of detail. Our surface representation setup is not as advanced as theirs, but we do provide a data structure that is capable of rendering much larger data sets, which could be extended to include such a visualization model.

The work done by Matthews et al. [MEK+17] is similar to ours in that they also focus on a single instance, and investigate the use of data structures to boost performance. They employ a grid-based data structure. The main difference is that they focus on molecular trajectories, so they avoid pre-computations. They achieve interactive results while constructing the data structure on the fly, including ambient occlusion enhancement. However, the largest data set they render in their tests consists of 316,404 atoms. They do not include levels of detail, as the required clustering is computationally expensive. For molecular dynamics or animations, their work is more performant and suitable than ours, but it is much more limited in the data set sizes it is capable of handling.

A lot of recent work in molecular visualization focuses on instance-based rendering (Guo et al. [GNL+15], Le Muzic et al. [LMPSV14], Falk et al. [FKE13], Lindow et al. [LBH12]). These approaches achieve very high absolute results in terms of how many millions or billions of atoms the framework can render. However, this approach is limited to very large data sets that do in fact contain repetitive structures. Lindow et al. [LBH12] mention in their future work section that they recommend the inclusion of LODs and occlusion culling in their framework, which Falk et al. [FKE13] provide. Guo et al. [GNL+15] use a very similar type of hierarchical clustering to ours in order to build levels of detail. They prove that it can be used with instance rendering. None of the given examples use an octree data structure, which we argue is more convenient, because it divides space more equally in terms of density. As the papers only provide results for entire scenes, containing up to millions of repeating instances, frame rates are hard to compare. Our system scales well for large non-repetitive structures, well beyond the scale of the basic molecules used in testing the instance-based rendering systems. In our opinion, it would however be worthwhile to additionally extend our framework to handle repetitive structures.


Figure 6.11: View-dependent level of detail (molecule 4v99). The user can set a distance from the camera and a threshold. The resolution of the molecule is reduced in areas far from the current camera position, saving rendering time. (a) Level 0, 43 FPS on average. (b) Level 1, 59 FPS on average. (c) Mixed, 51 FPS on average. (d) Interpolation; yellow: level 0, violet: level 2.


Figure 6.12: Large test data sets. (a) 50 instances of 3j3q at level 2. (b) 10 instances of 3j3q at the original resolution.

Figure 6.13: Artificial data set with 50 instances of 3j3q, shown as Views 1 to 6 in panels (a) to (f).


Figure 6.14: 6ncl with different enhancement effects. (a) No effects. (b) Library. (c), (d), (e) Gauss. (f) Library + Gauss. (g), (h), (i) Halo. (j) Library + halo.

Figure 6.15: 6o2s with depth of field effects. (a) No effects. (b) F-stop 1.2. (c) F-stop 11. (d) Focal distance 4.9. (e) Max CoC 1.5. (f) Max CoC 14. (g) HBAO+. (h) Gaussian edge enhancement.


Figure 6.16: Gaussian surface (5odv). (a) Sharpness: -, 29 FPS. (b) Sharpness: 10, 25.5 FPS. (c) Sharpness: 2, 15.5 FPS. (d) Sharpness: 0.5, 3 FPS.

Figure 6.17: Gaussian surface (5odv). (a) Level 0, no optimization, 12 FPS. (b) Level 0, sharpness attenuation, 15.5 FPS. (c) Attenuation and distance-based LOD, 21.5 FPS. (d), (e), (f) Details of (a), (b), and (c).


Figure 6.18: Results achieved in related work. (a) Grottel et al. [GRDE10], data sets D1 (107,391 points) to D5 (100,000,000 points). (b) Sharma et al. [SKNV04].

CHAPTER 7

Discussion

In this thesis, we present a rendering system for large-scale molecular data sets, including levels of detail and enhancement methods. Our approach is a novel combination of existing techniques, intended to bridge the gap between medium-sized molecular visualization schemes and extreme-scale data sets that either rely on repeating structures or do not provide visual enhancement methods.

7.0.1 Summary and Contributions

The main contributions of this thesis are a hierarchical, multi-resolution data structure, and a least recently used caching system tailored to it. We believe that our proposed combination can bridge a gap between research focusing on advanced rendering methods for medium-sized molecules, and research concerned with extreme-scale data sets, but with much more limited visualization options. Our tests were conducted on data sets containing up to over 100 million individual atoms. While our artificial large-scale data sets do contain several instances of the same molecule, we render them as a single molecule. At about 2.44 million atoms, 3j3q may be the largest single structure to be found in the Protein Data Bank right now, but it is certainly not the most complex molecular data set anyone will ever want to render. It therefore pays off to have a framework capable of rendering much larger structures, without having any prior information about their components.

We propose to use our data structure in a way that combines the advantages of hierarchically divided space and linear buffers. Contrary to many implementations that use a regular grid as a basic data structure, the pages in our octree contain a similar and controllable amount of atoms each. The advantage of this structuring becomes evident when calculating clusters in order to build levels of detail. It makes the maximum size and the approximate maximum computation time for neighborhood-based algorithms on each page predictable and balanced. As discussed in Section 3.2.2, a division of space is necessary for some tasks, such as, in our case, the hierarchical clustering algorithm. The division of space into density-based pages that result in fewer and more predictable units can be effectively used whenever neighborhood queries are necessary.


Atoms on the same page are guaranteed to be neighbors within a given maximum distance. While we do not currently use this feature, octrees also make it relatively easy to access neighboring pages, compared to other trees such as the kd-tree. Therefore, they extend the possibilities of neighborhood queries even beyond neighbors on the same page.

The LRU cache makes sure that the buffer is managed efficiently. Only pages that are needed are ever uploaded to the GPU, and they remain there until the space is needed by something else. This does not make much of a difference for small to medium sized molecules, but is highly relevant when dealing with very large data sets. Our system of differentiating between the basic level of detail and the rendering level makes it possible to quickly switch between the advantages of more detailed and coarser, faster resolutions, without having to upload all the data from scratch. For large data sets, uploading to the GPU is one of the main bottlenecks. We provide a flexible approach to levels of detail that also makes it possible to blend different resolutions per page, or based on distance measurements.
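
The density-based subdivision rule described above can be illustrated with the following sketch: a page is split into its eight children whenever it holds more atoms than a defined maximum, so that every leaf page ends up with a similar, bounded number of atoms. This is a simplified illustration under assumed names, not the framework's actual implementation; the limit of 3 atoms is taken from the didactic example in Figure 3.3, while the framework uses a much larger, user-controllable limit.

    from dataclasses import dataclass, field

    MAX_ATOMS_PER_PAGE = 3  # illustrative limit, as in Figure 3.3

    @dataclass
    class Page:
        center: tuple          # center of the page's bounding cube
        half_size: float       # half the edge length of the cube
        atoms: list = field(default_factory=list)      # (x, y, z) positions
        children: list = field(default_factory=list)   # empty for leaf pages

        def insert(self, atom):
            if self.children:
                self._child_for(atom).insert(atom)
                return
            self.atoms.append(atom)
            if len(self.atoms) > MAX_ATOMS_PER_PAGE:
                self._split()

        def _split(self):
            # Create the eight octants and redistribute the atoms; dense regions
            # are therefore subdivided more often than sparse ones.
            h = self.half_size / 2.0
            for dx in (-h, h):
                for dy in (-h, h):
                    for dz in (-h, h):
                        c = (self.center[0] + dx, self.center[1] + dy, self.center[2] + dz)
                        self.children.append(Page(center=c, half_size=h))
            for atom in self.atoms:
                self._child_for(atom).insert(atom)
            self.atoms = []

        def _child_for(self, atom):
            index = ((atom[0] > self.center[0]) * 4 +
                     (atom[1] > self.center[1]) * 2 +
                     (atom[2] > self.center[2]))
            return self.children[index]

Because every leaf holds at most a fixed number of atoms, neighborhood-based computations such as the clustering used to build the levels of detail take a bounded, predictable amount of time per page.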

7.0.2 Limitations

When it comes to the scale of data sets, there are two main limitations. Firstly, the data set, including all the different LOD resolutions, has to fit into CPU memory in order for our implementation to be able to handle it. Secondly, we assume that the coarsest level of detail fits into GPU memory. While our system is designed not to crash even if the coarsest level exceeds memory, we only tested it up to a maximum size of about 100 million atoms. This data set fits into the GPU at LOD 2, which contains approximately a quarter of the original atoms.

Most other frameworks for molecular visualization provide the option of rendering a Solvent Excluded Surface, which our framework currently lacks. We provide an implementation of the less computationally expensive Gaussian surface, and assume that our neighborhood-based division of the data set would be an asset when implementing the SES, but it is not included in the current version of the framework.

Our page-based buffer slots allow us to mix different levels of detail, which is highly useful when trying to find an optimal compromise between maximum resolution in the foreground and reduced detail in the background. It also helps to avoid computations where they have no effect. However, it also leaves buffer space unused. Therefore, we do not use the entire capacity of GPU memory. It is possible in our system to upload a data set as a whole instead of per page, but this comes at the expense of losing the advantages we described. Levels of detail are calculated per page, so if it is desirable to upload everything without unused space on the GPU, the data has to be uploaded at the original resolution, without advantages such as block-wise visibility culling.

When it comes to our implementation of visual effects, limitations include the fact that we do not provide transparency, and currently disregard additional molecular information such as chains and residues.

7.0.3 Future Work

There are several aspects of our work that could be extended to address some of the limitations. We think that it would be worthwhile to extend our data structure in such a way that it can be used for repeating structures. While one of the features that sets our proposed solution apart from other recently presented work in the field is the fact that we do not rely on the existence of, or prior knowledge about, repeating structures, there is no reason not to combine the advantages of both approaches. The other source of very large molecular data sets are molecular dynamics simulations. Our level of detail scheme makes a pre-processing step necessary, which is not ideal for animation-based visualizations. However, we believe that it would be possible to develop a smart pre-processing system that makes use of the similarity of consecutive time-steps.

When it comes to visual effects, the main component that would be desirable to have is the Solvent Excluded Surface. Implementing an optimized version of it would be the next step in that department. There are also some additional effects that we think could profit from a data structure divided into equal neighborhoods, such as the interreflection effect proposed by Skånberg et al. [SVGR15]. Additionally, there are a couple of tweaks to our existing system that could be worth exploring. One such example is the interactive, manual selection of resolutions and effects. Selecting bounding boxes of pages, or drawing areas on the screen, and assigning resolutions and effects to them, could give researchers some useful and dynamic tools, especially for very large data sets or computationally expensive methods such as the SES. This type of effect could make excellent use of the flexibility of mixed-block rendering, where areas in focus could be replaced with denser, high-resolution blocks, while the areas that are not being investigated could be rendered at a lower resolution to increase frame rates.


CHAPTER 8

Conclusion

Many different solutions for rendering biomolecular data sets in particular, and large point-based data sets in general, already exist. In recent years, there has been a lot of research into rendering whole cells containing billions of atoms by taking repetitive structures into account, based mostly on the work of Lindow et al. [LBH12]. However, these publications only work on structures that do contain repeating structures, and they require additional structural information. In our research into the state of the art of solutions targeting large biomolecular data sets, we observed a certain lack of attention for very large data sets without additional structural information that still need to be rendered at interactive frame rates, while using computationally expensive surface models.

We see two specific advantages of our proposed data structure. The division of space into blocks containing a similar amount of data, which can be controlled by the user, is advantageous for algorithms that rely on neighborhood queries. Depending on the method, only the block itself, or the block and its direct neighbors, need to be taken into account. Blocks of equal size require approximately the same amount of time to calculate, which makes calculations predictable. Moreover, there are several possible applications for the flexible buffer updating system that allows us to mix different levels of detail.

The main disadvantage of our proposed solution is the computationally expensive pre-processing step, which makes it impractical for more than one time-step. However, we feel that the data structure we propose is a valuable contribution to the area of large point-based data sets, especially when neighborhood-based calculations are expected, or when rendering at different resolutions in a single frame is required. Of course, the data sets we presented as challenging to our system could already be rendered without an advanced data structure on systems with higher capacity. But while it can be expected that hardware capacities will continue to increase, so will the complexity of the data sets that need to be visualized. If the current maximum amount of atoms per page no longer makes sense for a system with higher capacity, simply adding a few bits, in order to store longer IDs, would make it possible to save more points per page. The system remains the same, whether for a few thousand atoms, or entire cells, organs, or complex organisms.


Even on a system with high capacity, the possibility of using simple neighborhood-based algorithms with a linear increase in execution time, instead of an exponential increase, is worthwhile.

List of Figures

1.1 Historical examples of data visualizations ...... 3

2.1 Octree data structure by Sharma et al. [SKNV04] that divides the data set into three levels of detail, as illustrated in their publication ...... 9
2.2 Comparisons of the object space error of different distance metrics by Guo et al. [GNL+15] ...... 11
2.3 Data slices shown in Harada et al. [HKK07]. They compare a fixed grid data structure (left) with the dynamic grid they propose (right) ...... 16
2.4 The hierarchy levels as implemented and illustrated by Hopf and Ertl [HE03]. The illustrated hierarchy is decoupled from the storage of the underlying point data. Points are stored in a continuous array, while the hierarchical data structure illustrated in the figure only stores pointers ...... 18
2.5 Level of detail construction as implemented by Fraedrich et al. [FAW10]. The data structure is constructed bottom-up. Particles are copied to the next level as long as their diameter is larger than the grid sampling resolution, those that are smaller are merged ...... 19
2.6 At increasing distance from the camera, Le Muzic et al. [LMPSV14] skip atoms, adapting the radius accordingly ...... 20

3.1 The main components involved in handling the data structure. Blue indicates where the actual (per point) data is located. The black arrows show which components contain each other ...... 27
3.2 Flow of data while creating the data structure. Blue indicates where actual point data is saved. The protein component manages the data structure, which consists of nested page components ...... 29
3.3 Illustration of page subdivision with max. 3 atoms per page ...... 31
3.4 Octree structure for the molecule 3j3q. The blue lines show the bounding boxes of pages. In denser areas, more subdivisions are used ...... 31
3.5 Levels of detail for the molecule 2btv ...... 34
3.6 Number of atoms at levels 1 and 5 and building time, showing the number of resulting clusters for 2btv for level 1 (red) and 5 (green), as well as the time it takes to build the data structure using that method (blue) ...... 34

3.7 Levels of detail for the molecule 2btv. In the original resolution in the top left, the molecule contains 49,061 atoms, while level 7 on the lower right contains two data points, one per page ...... 35
3.8 Molecule 1sva. Comparing the radius covering all points in the cluster vs. the average distance, using the original resolution (orange) vs. level 5 (white) ...... 36
3.9 A single data point stored in a 4x32 bit component vector ...... 37

4.1 Rendering components. The main components responsible for the rendering process are shown above the dashed line. Attached to the SphereRenderer are the component's most important methods ...... 40
4.2 Molecule 3j3q with colored bounding boxes for the pages of the octree. The pages that were least recently accessed are shown in red, the most recently accessed pages in green ...... 42
4.3 Cache and buffer update. The first case in the top row shows a cache with available space left, in the case shown in the middle the cache is full, but parts of the data are not currently needed, and the last illustration shows a cache that is too small to hold all currently visible pages ...... 43
4.4 The buffer offset is calculated based on the original cache position and the size of buffer chunks ...... 44
4.5 The cube on the left is a two-dimensional representation of an octree data structure with different-sized pages. Blue pages are currently at least partially within the view frustum (green), red pages outside it. The dots represent the level of detail at which the data is saved in the buffer. The two red dotted pages are in the buffer because they were previously visible ...... 46
4.6 Molecule 1sva. Transition of the entire molecule between level 0 and 1 ...... 48
4.7 Illustration of the smooth transition between levels in distance based rendering. Pages with a center point more than a defined distance from the camera are uploaded at a coarser LOD (dotted pages). Within a parameter-defined area around that distance, we interpolate levels of detail based on the distance ...... 51
4.8 Render passes. Shaders shown in gray are optional, the ones in white necessary for our basic rendering process ...... 51
4.9 Illustration of the depth of field algorithm from the publication by Bukowski et al. [BHOM13] ...... 55
4.10 Depth blur on the molecule 6qz0 ...... 56
4.11 Ambient occlusion using the HBAO+ library ...... 57
4.12 Examples of Gaussian blur based enhancement on the molecule 5odv ...... 58
4.13 Examples of halo-based ambient occlusion ...... 59

5.1 System architecture ...... 61

6.1 Atoms in our test set with the number of atoms they contain ...... 68
6.2 Both building and loading times increase exponentially with the size of the data set. As loading only takes a small fraction of the time required to build them, it is well worth saving the data structures ...... 69

6.3 When comparing loading time and FPS, we found that larger pages tend to perform slightly better ...... 70
6.4 Different screen filling (6qz0) ...... 71
6.5 LODs (top to bottom: 1sva, 2btv, 3j3q, 4v99, 5odv, 6ncl, 6o2s, 6qz0) ...... 72
6.6 Comparing the relative and absolute amount of points per LOD for our test data sets ...... 73
6.7 Frame rates achieved for the data sets at different levels ...... 75
6.8 Rendering the entire buffer vs. page blocks ...... 76
6.9 Page replacement (molecule 4v99). Only visible blocks were loaded at a previous position. Stopping the buffer from updating allows us to interact with the molecule and look at which pages were not needed in the previous view ...... 77
6.10 Smooth transition (molecule 4v99). We calculate a finite number of discrete levels of detail, but we can interpolate all steps in between them ...... 78
6.11 View-dependent level of detail (molecule 4v99). The user can set a distance from the camera and a threshold. The resolution of the molecule is reduced in areas far from the current camera position, saving rendering time ...... 85
6.12 Large test data sets ...... 86
6.13 Artificial data set with 50 instances of 3j3q ...... 86
6.14 6ncl with different enhancement effects ...... 87
6.15 6o2s with depth of field effects ...... 88
6.16 Gaussian surface (5odv) ...... 89
6.17 Gaussian surface (5odv) ...... 89
6.18 Results achieved in related work ...... 90

List of Tables

6.1 Numbers of atoms per level of detail ...... 74
6.2 Numbers of atoms per level of detail ...... 79
6.3 Data set at different views and LODs ...... 79
6.4 Rendering details for Figure 6.14 ...... 81
6.5 Rendering details for Figure 6.15 ...... 82


Bibliography

[AGL05] James Ahrens, Berk Geveci, and Charles Law. Paraview:Anend-usertool forlarge data visualization.InThe VisualizationHandbook,pages 717–731, 2005. doi: 10.1016/B978-012387582-2/50038-1.

[BDST04] ChandrajitBajaj,Peter Djeu, Vinay Siddavanahalli, and AnthonyThane. Texmol: Interactivevisual explorationoflarge flexible multi-component molecular complexes. In ProceedingsofIEEE Visualization,pages 243–250, 2004. doi: 10.1109/VISUAL.2004.103.

[Ben75] Jon LBentley.Multidimensional binarysearchtrees usedfor associative searching. Communications of the ACM,18(9):509–517,1975. doi: 10. 1145/361002.361007.

[BG07] Stefan Bruckner and Eduard Gröller. Enhancing depth-perception with flexiblevolumetrichalos. IEEE Transactions on Visualization and Computer Graphics,13(6):1344–1351, 2007. doi: 10.1109/TVCG.2007.70555.

[BHOM13] Mike Bukowski, PadraicHennessy, Brian Osman, and Morgan McGuire. The skylandersswapforce depth-of-field shader.InGPU Pro4:Advanced Rendering Techniques,pages 175–186, 2013.

[BHP15] JohannaBeyer, MarkusHadwiger, andHanspeter Pfister.State-of-the-art in gpu-based large-scale volumevisualization. Computer Graphics Forum, 34(8):13–37, 2015. doi: 10.1111/cgf.12605.

[Bli82] James FBlinn. Ageneralizationofalgebraic surface drawing. ACM Transactions on Graphics,1(3):235–256, 1982. doi: 10.1145/357306. 357310.

[Bru19] Stefan Bruckner.Dynamic visibility-driven molecular surfaces. Computer Graphics Forum,38(2):317–329, 2019. doi: 10.1111/cgf.13640.

[BS09] LouisBavoil and MiguelSainz. Multi-layer dual-resolutionscreen-space ambientocclusion. In SIGGRAPHTalks,pages 45–45, 2009. doi: 10. 1145/1597990.1598035.

[BWF+00] Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The protein data bank. Nucleic Acids Research, 28(1):235–242, 2000. doi: 10.1093/nar/28.1.235.

[CHP+79] CharlesCsuri, Ron Hackathorn,Richard Parent, Wayne Carlson, and Marc Howard. Towards an interactivehigh visual complexityanimation system. In Proceedings of ACMSIGGRAPH,pages 289–299, 1979. doi: 10.1145/965103.807458.

[CLK+11] Matthieu Chavent,Bruno Lévy,Michael Krone,Katrin Bidmon,Jean- Philippe Nominé, Thomas Ertl, and Marc Baaden.Gpu-powered tools boost molecular visualization. Briefings in Bioinformatics,12(6):689–701, 2011. doi: 10.1093/bib/bbq089.

[CMB13] Paul ACraig, Lea Vacca Michel, and Robert CBateman. Asurveyofeduca- tional uses of molecular visualization freeware. Biochemistry andMolecular Biology Education,41(3):193–205, 2013. doi: 10.1039/B5RP90005K.

[DVVC19] David Dubbeldam, JocelyneVreede, Thijs JH Vlugt,and Sofia Calero. Highlights of (bio-) chemicaltools and visualization software forcomputa- tional science. Current OpinioninChemicalEngineering,23(1):1–13, 2019. doi: 10.1016/j.coche.2019.02.001.

[Ede99] Herbert Edelsbrunner. Deformable smooth surface design. Discrete &Com- putationalGeometry,21(1):87–115, 1999. doi: 10.1007/PL00009412.

[FAW10] Roland Fraedrich,Stefan Auer, and Rudiger Westermann.Efficienthigh- qualityvolume renderingofsph data. IEEE Transactions on Visualization andComputer Graphics,16(6):1533–1540, 2010. doi: 10.1109/TVCG. 2010.148.

[FGE10] MartinFalk,SebastianGrottel, and Thomas Ertl.Interactiveimage-space volumevisualizationfor dynamic particlesimulations.InProceedings of SIGRAD,pages 35–43, 2010.

[FK03] Randima Fernando andMark JKilgard. TheCgTutorial: Thedefini- tive guide to programmable real-time graphics.Addison-Wesley Longman Publishing Co.,Inc., 2003.isbn: {9780321194961}.

[FKE13] MartinFalk, Michael Krone, and Thomas Ertl.Atomistic visualization of mesoscopicwhole-cell simulations using ray-casted instancing. Computer Graphics Forum,32(8):195–206, 2013. doi: 10.1111/cgf.12197.

[Fra02] Eric Francoeur.Cyruslevinthal, the klugeand the origins of interactive molecular graphics. Endeavour,26(4):127–131, 2002. doi: 10.1016/ S0160-9327(02)01468-0.

[FSW09] Roland Fraedrich, Jens Schneider, and Rüdiger Westermann. Exploring the millennium run - scalable rendering of large-scale cosmological datasets. IEEE Transactions on Visualization and Computer Graphics, 15(6):1251–1258, 2009. doi: 10.1109/TVCG.2009.142.

[GIK+07] Christiaan PGribble, Thiago Ize, Andrew Kensler, Ingo Wald,and Steven G Parker. Acoherentgrid traversal approachtovisualizingparticle-based sim- ulation data. IEEE Transactions on Visualizationand Computer Graphics, 13(4):758–768, 2007. doi: 10.1109/TVCG.2007.1059.

[GKM+14] Sebastian Grottel,Michael Krone, Christoph Müller, Guido Reina, and ThomasErtl. Megamol—a prototyping framework forparticle-basedvi- sualization. IEEETransactions on Visualizationand Computer Graphics, 21(2):201–214, 2014. doi: 10.1109/TVCG.2014.2350479.

[GKSE12] Sebastian Grottel, MichaelKrone, Katrin Scharnowski, andThomas Ertl. Object-space ambientocclusion formolecular dynamics. In Proceedings of IEEE PacificVis,pages 209–216, 2012. doi: 10.1109/PacificVis. 2012.6183593.

[GNL+15] Dongliang Guo, Junlan Nie, Meng Liang, Yu Wang,Yanfen Wang,and Zhengping Hu. View-dependent level-of-detail abstraction for interactive atomistic visualization of biological structures. Computers&Graphics, 52(C):62–71, 2015. doi: 10.1016/j.cag.2015.06.008.

[GRDE10] Sebastian Grottel,Guido Reina,Carsten Dachsbacher, and Thomas Ertl. Coherentculling and shading forlarge molecular dynamics visualiza- tion. Computer Graphics Forum,29(3):953–962, 2010. doi: 10.1111/j. 1467-8659.2009.01698.x.

[Hay10] Callum Hay. Gaussian blurshader (glsl),2010. Accessed: 2020-07-28, https://callumhay.blogspot.com/2010/09/ gaussian-blur-shader-glsl.html.

[HDS96] William Humphrey,AndrewDalke, and Klaus Schulten. Vmd: visual moleculardynamics. Journal of Molecular Graphics,14(1):33–38, 1996. doi: 10.1016/0263-7855(96)00018-5.

[HE03] Matthias Hopf and Thomas Ertl.Hierarchicalsplatting of scattered data. In ProceedingsofIEEE Visualization,pages 433–440, 2003. doi: 10.1109/ VISUAL.2003.1250404.

[Her06] AngelHerraez. Biomolecules in the computer:Jmoltothe rescue. Bio- chemistry and MolecularBiologyEducation,34(4):255–261, 2006. doi: 10.1002/bmb.2006.494034042644.

[HKG+17] Pedro Hermosilla, Michael Krone, Victor Guallar, Pere-Pau Vázquez, Àlvar Vinacua, and Timo Ropinski. Interactive gpu-based generation of solvent-excluded surfaces. The Visual Computer, 33(6):869–881, 2017. doi: 10.1007/s00371-017-1397-2.

[HKK07] Takahiro Harada, Seiichi Koshizuka, and Yoichiro Kawaguchi. Sliced data structure for particle-based simulations on gpus. In Proceedings of GRAPHITE, pages 55–62, 2007. doi: 10.1145/1321261.1321271.

[Hof65] August Hofmann. On the combining power of atoms. In Proceedings of the Royal Institution, pages 401–430, 1865.

[HSS+05] Markus Hadwiger, Christian Sigg, Henning Scharsach, Khatja Bühler, and Markus Gross. Real-time ray-casting and advanced shading of discrete isosurfaces. Computer Graphics Forum, 24(1):303–312, 2005. doi: 10.1111/j.1467-8659.2005.00855.x.

[HVS04] Xuejun Hao, Amitabh Varshney, and Sergei Sukharev. Real-time visualization of large time-varying molecules. In Proceedings of the High-Performance Computing Symposium, pages 109–114, 2004.

[IWR+17] Mohamed Ibrahim, Patrick Wickenhäuser, Peter Rautek, Guido Reina, and Markus Hadwiger. Screen-space normal distribution function caching for consistent multi-resolution rendering of large particle data. IEEE Transactions on Visualization and Computer Graphics, 24(1):944–953, 2017. doi: 10.1109/TVCG.2017.2743979.

[JH14] Graham T Johnson and Samuel Hertig. A guide to the visual analysis and communication of biomolecular structural data. Nature Reviews Molecular Cell Biology, 15(10):690–698, 2014. doi: 10.1038/nrm3874.

[JJS05] Loretta L Jones, Kenneth D Jordan, and Neil A Stillings. Molecular visualization in chemistry education: the role of multidisciplinary collaboration. Chemistry Education Research and Practice, 6(3):136–149, 2005.

[KAD+06] Richard Keiser, Bart Adams, Philip Dutré, Leonidas J Guibas, and Mark Pauly. Multiresolution particle-based fluids. Technical report, Swiss Federal Institute of Technology Zurich, Department of Computer Science, 2006. doi: 10.3929/ethz-a-006780981.

[KGE11] Michael Krone, Sebastian Grottel, and Thomas Ertl. Parallel contour-buildup algorithm for the molecular surface. In Proceedings of IEEE BioVis, pages 17–22, 2011. doi: 10.1109/BioVis.2011.6094043.

[KKF+17] Barbora Kozlíková, Michael Krone, Martin Falk, Norbert Lindow, Marc Baaden, Daniel Baum, Ivan Viola, Julius Parulek, and Hans-Christian Hege. Visualization of biomolecular structures: State of the art revisited. Computer Graphics Forum, 36(8):178–204, 2017. doi: 10.1111/cgf.13072.

[KSES12] Michael Krone, John E Stone, Thomas Ertl, and Klaus Schulten. Fast visualization of gaussian density surfaces for molecular dynamics and particle system trajectories. In Proceedings of EuroVis (Short Papers), pages 67–71, 2012. doi: 10.2312/PE/EuroVisShort/EuroVisShort2012/067-071.

[KW03] JensKrugerand Rüdiger Westermann. Acceleration techniques forgpu- based volumerendering.InProceedings of IEEE Visualization,pages 287–292, 2003. doi: 10.1109/VIS.2003.10001.

[KWN+14] Aaron Knoll, Ingo Wald, Paul Navratil, Anne Bowen, Khairi Reda, Michael EPapka, and KellyGaither.Rbf volumeray casting on mul- ticore and manycore cpus. Computer Graphics Forum,33(3):71–80, 2014. doi: 10.1111/cgf.12363.

[LBH12] Norbert Lindow, Daniel Baum, and Hans-Christian Hege. Interactive renderingofmaterialsand biologicalstructures on atomicand nanoscopic scale. Computer Graphics Forum,31(3pt4):1325–1334, 2012. doi: 10. 1111/j.1467-8659.2012.03128.x.

[LBH14] Norbert Lindow,DanielBaum,and Hans-Christian Hege.Ligand ex- cluded surface: Anew type of molecular surface. IEEETransactions on Visualization and Computer Graphics,20(12):2486–2495, 2014. doi: 10.1109/TVCG.2014.2346404.

[LBPH10] Norbert Lindow, Daniel Baum,Steffen Prohaska, and Hans-Christian Hege.Accelerated visualization of dynamicmolecular surfaces. Computer Graphics Forum,29(3):943–952, 2010. doi: 10.1111/j.1467-8659. 2009.01693.x.

[LC87] William ELorensen and Harvey ECline.Marching cubes: Ahighresolution 3d surface construction algorithm. ACMSIGGRAPH Computer Graphics, 21(4):163–169, 1987. doi: 10.1145/37401.37422.

[LCL15] Tiantian Liu, Minxin Chen, and BenzhuoLu. Parameterization formolec- ular gaussian surfaceand acomparison study of surfacemesh genera- tion. Journal of MolecularModeling,21(5):113, 2015. doi: 10.1007/ s00894-015-2654-9.

[LGF04] Frank Losasso, FrédéricGibou, andRon Fedkiw.Simulating water and smokewith an octreedatastructure. ACMTransactions on Graphics, 23(3):457–462, 2004. doi: 10.1145/1015706.1015745.

[LH91] David Laur andPat Hanrahan.Hierarchical splatting: Aprogressive refinementalgorithmfor volumerendering. ACMSIGGRAPHComputer Graphics,25(4):285–288, 1991. doi: 10.1145/122718.122748.

[LMPSV14] Mathieu Le Muzic, Julius Parulek, Anne-Kristin Stavrum, and Ivan Viola. Illustrative visualization of molecular reactions using omniscient intelligence and passive agents. Computer Graphics Forum, 33(3):141–150, 2014. doi: 10.1111/cgf.12370.

[LO11] Phillip ALaplante andSeppoJOvaska. Real-time systems designand analysis:tools forthe practitioner.John Wileyand Sons, 2011. isbn: 978-0470768648.

[Lod00] Harvey F. Lodish. MolecularCellBiology.W.H.Freeman, 2000.isbn: 978-1464109812.

[LPK06] Jun Lee, Sungjun Park, andJee-In Kim. View-dependent rendering of large-scale molecularmodelsusinglevel of detail. In Proceedings of the International ConferenceonHybrid Information Technology,pages 691–698, 2006.

[LR71] ByungkookLee andFrederic MRichards. The interpretationofprotein structures: estimation of staticaccessibility. Journal of Molecular Biology, 55(3):379–400, 1971. doi: 10.1016/0022-2836(71)90324-X.

[M+13] DanielMüllneretal. fastcluster: Fast hierarchical, agglomerativeclustering routines for rand python. Journal of Statistical Software,53(9):1–18, 2013. doi: 10.18637/jss.v053.i09.

[Mar65] James Martin. Design of real-time computer systems.Prentice Hall,1965. isbn:978-1114207875.

[Max04] Nelson Max.Hierarchical molecular modelling with ellipsoids. Journal of Molecular Graphicsand Modelling,23(3):233–238, 2004. doi: 10.1016/j. jmgm.2004.07.001.

[MCG03] Matthias Müller,David Charypar, and Markus Gross. Particle-based fluid simulation for interactive applications.InProceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation,pages 154– 159, 2003. doi: 10.2312/SCA03/154-159.

[MEK+17] NickMatthews,Robert Easdon,AkioKitao,StevenHayward,and Stephen Laycock. High quality rendering of proteindynamicsinspace fillingmode. Journal of Molecular Graphics and Modelling,78(1):158–167, 2017. doi: 10.1016/j.jmgm.2017.09.017.

[MOBH11] Morgan McGuire,BrianOsman,Michael Bukowski,and Padraic Hennessy. The alchemyscreen-space ambient obscurance algorithm. In Proceedingsof the ACMSIGGRAPH Symposium on HighPerformanceGraphics,pages 25–32, 2011. doi: 10.1145/2018323.2018327.

[MS15] Steve Marschner and Peter Shirley. Fundamentals of computer graphics. CRC Press, 2015. isbn: 978-1568814698.

[NJB07] Paul Navratil, Jarrett Johnson, and Volker Bromm.Visualization of cos- mological particle-baseddatasets. IEEE Transactions on Visualization and Computer Graphics,13(6):1712–1718, 2007. doi: 10.1109/TVCG.2007. 70526.

[PB13] Julius Parulekand Andrea Brambilla. Fast blending scheme for molecular surface representation. IEEE Transactions on Visualizationand Computer Graphics,19(12):2653–2662, 2013. doi: 10.1109/TVCG.2013.158.

[PGH+04] Eric FPettersen,ThomasDGoddard,Conrad CHuang,GregorySCouch, DanielMGreenblatt, Elaine CMeng,and Thomas EFerrin. Ucsf chimera—a visualization system for exploratory researchand analysis. Journal of Computational Chemistry,25(13):1605–1612, 2004. doi: 10.1002/jcc. 20084.

[PJR+14] Julius Parulek, Daniel Jönsson, TimoRopinski,Stefan Bruckner, Anders Ynnerman, andIvanViola. Continuous levels-of-detail and visual abstraction forseamlessmolecular visualization.33(6):276–287, 2014. doi: 10.1111/ cgf.12349.

[PV12] Julius Parulek and Ivan Viola. Implicit representation of molecular surfaces. In Proceedings of IEEE PacificVis,pages 217–224, 2012. doi: 10.1109/ PacificVis.2012.6183594.

[PZVBG00] Hanspeter Pfister, Matthias Zwicker, Jeroen VanBaar, andMarkus Gross. Surfels:Surface elements as rendering primitives. In ProceedingsofACM SIGGRAPH,pages 335–342, 2000. doi: 10.1145/344779.344936.

[QEE+05] WeiQiao,David SEbert, Alireza Entezari, Marek Korkusinski, and Gerhard Klimeck. Volqd: Directvolume renderingofmulti-million atomquantum dot simulations.InProceedingsofIEEE Visualization,pages 319–326, 2005. doi: 10.1109/VISUAL.2005.1532811.

[RE05] Guido Reina and Thomas Ertl.Hardware-accelerated glyphs formono-and dipoles in moleculardynamics visualization. In Proceedings of EuroVis, pages 177–182, 2005. doi: 10.2312/VisSym/EuroVis05/177-182.

[Ree83] William TReeves.Particle systems—atechnique formodeling aclassof fuzzy objects. ACMTransactions On Graphics,2(2):91–108,1983. doi: 10.1145/357318.357320.

[RGE19] GuidoReina, PatrickGralka,and Thomas Ertl.Adecade of particle-based scientific visualization. The EuropeanPhysical Journal Special Topics, 227(14):1705–1723, 2019. doi: 10.1140/epjst/e2019-800172-4.

[Ric77] Frederic M Richards. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering, 6(1):151–176, 1977. doi: 10.1146/annurev.bb.06.060177.001055.

[RK09] Steffen Raschdorfand Michael Kolonko. Loose octree: adatastructure for the simulation of polydisperse particlepackings. Technical report, Clausthal UniversityofTechnology,2009.

[RL00] Szymon Rusinkiewicz and Marc Levoy.Qsplat: Amultiresolutionpoint renderingsystem forlarge meshes. In Proceedings of ACMSIGGRAPH, pages 343–352, 2000. doi: 10.1145/344779.344940.

[RM00] Paul Readand Mark-Paul Meyer. Restoration of motion picturefilm. Elsevier, 2000. isbn:978-0750627931.

[RT06] Guodong Rong andTiow-Seng Tan. Jump flooding in gpuwith applications to voronoi diagram anddistancetransform. In Proceedingsofthe Symposium on Interactive 3D Graphics and Games,pages 109–116, 2006. doi: 10. 1145/1111411.1111431.

[RTW13] Florian Reichl, Marc Treib, and RüdigerWestermann.Visualizationof big sph simulations via compressed octree grids. In Proceedings of the IEEEInternational ConferenceonBig Data,pages 71–78, 2013. doi: 10.1109/BigData.2013.6691717.

[Sam90] Hanan Samet. The design and analysis of spatial data structures.Addison- Wesley Reading, MA, 1990. isbn:978-0201502558.

[Sch15] LLC Schrödinger.The pymol molecular graphics system,version1.8. 2015.

[SGG15] Joachim Staib,Sebastian Grottel, and Stefan Gumhold. Visualization of particle-based data withtransparency andambient occlusion. Computer Graphics Forum,34(3):151–160, 2015. doi: doi:10.1111/cgf.12627.

[SI94] Hikmet Senayand EveIgnatius. Aknowledge-basedsystem forvisualization design. IEEE ComputerGraphics and Applications,14(6):36–47, 1994. doi: 10.1109/38.329093.

[SKNV04] Ashish Sharma, RajivKKalia, Aiichiro Nakano, and PriyaVashishta. Scal- ableand portable visualizationoflarge atomisticdatasets. Computer Physics Communications,163(1):53–64, 2004. doi: 10.1016/j.cpc.2004.07. 008.

[SMK+16] Karsten Schatz, Christoph Müller, Michael Krone,Jens Schneider, Guido Reina,and ThomasErtl. Interactivevisualexploration of atrillion parti- cles. In Proceedings of the IEEE Symposium on LargeData Analysis and Visualization,pages 56–64, 2016. doi: 10.1109/LDAV.2016.7874310.

[SMW95] Roger A Sayle and E James Milner-White. Rasmol: biomolecular graphics for all. Trends in Biochemical Sciences, 20(9):374–376, 1995. doi: 10.1016/S0968-0004(00)89080-5.

[SVGR15] RobinSkånberg,Pere-PauVazquez, Victor Guallar, and TimoRopinski. Real-time molecular visualization supporting diffuseinterreflections and ambientocclusion. IEEETransactions on Visualizationand Computer Graphics,22(1):718–727, 2015. doi: 10.1109/TVCG.2015.2467293.

[SWBG06] Christian Sigg, TimWeyrich, Mario Botsch, and Markus HGross.Gpu- basedray-castingofquadratic surfaces.InProceedingsofthe Symposium on Point-BasedGraphics,pages 59–65, 2006. doi: 10.2312/SPBG/SPBG06/ 059-065.

[TA96] Maxim Totrovand RubenAbagyan. The contour-buildup algorithm to calculate the analytical molecular surface. Journal of Structural Biology, 116(1):138–143, 1996. doi: 10.1006/jsbi.1996.0022.

[TCM06] Marco Tarini, Paolo Cignoni,and Claudio Montani. Ambient occlusion and edge cueing for enhancing real time molecular visualization. IEEE Transactions on Visualizationand Computer Graphics,12(5):1237–1244, 2006. doi: 10.1109/TVCG.2006.115.

[TL04] Rodrigo Toledo and Bruno Levy.Extending the graphicpipelinewith new gpu-accelerated primitives.Technical report, INRIA, 2004.

[Tra92] AnthonySTravis. Augustwilhelm hofmann (1818–1892). Endeavour, 16(2):59–65, 1992. doi: 10.1016/0160-9327(92)90003-8.

[Tuf83] Edward RTufte. TheVisualDisplayofQuantitative Information.Graphics Press,1983. isbn: 1930824130.

[Tuf97] Edward RTufte. Visualand statistical thinking: Displays of evidencefor making decisions.Graphics PressCheshire, CT,1997. isbn:978-0961392130.

[Tur07] KenTurkowski. Incremental computation of thegaussian.InGPUGems 3,pages 877–890, 2007.

[VdW73] Johannes DiderikVan derWaals. OverdeContinuïteit van den Gas-en Vloeistoftoestand.Sijthoff,1873.

[VDZLBI11] Matthew VanDer Zwan, Wouter Lueks, Henk Bekker, andTobiasIsenberg. Illustrativemolecular visualization with continuous abstraction. Computer Graphics Forum,30(3):683–690, 2011. doi: 10.1111/j.1467-8659. 2011.01917.x.

[Wes89] Lee Westover. Interactive volume rendering. In Proceedings of the Chapel Hill Workshop on Volume Visualization, pages 9–16, 1989. doi: 10.1145/329129.329138.

[WIK+06] Ingo Wald, ThiagoIze, Andrew Kensler, Aaron Knoll, and Steven G Parker. Raytracinganimated scenesusing coherent gridtraversal. ACM Transactions on Graphics,25(3):485–493, 2006. doi: 10.1145/1141911. 1141913.

[WKJ+15] Ingo Wald, AaronKnoll, Gregory PJohnson, Will Usher, Valerio Pascucci, and MichaelEPapka. Cpuray tracing largeparticledatawith balanced pkd trees.InProceedings of IEEE SciVis,pages 57–64, 2015. doi: 10. 1109/SciVis.2015.7429492.

[WSB14] Thomas Waltemate,Björn Sommer, and Mario Botsch. Membranemapping: combining mesoscopic and molecular cell visualization. In Proceedings of the Eurographics WorkshoponVisualComputing for Biologyand Medicine, pages89–96, 2014. doi: 10.2312/vcbm.20141187.

[XZY17] XiangyunXiao, ShuaiZhang,and XuboYang. Real-time high-quality surface rendering forlarge scaleparticle-basedfluids. In Proceedings of the ACMSIGGRAPHSymposium on Interactive 3D Graphics and Games, pages1–8, 2017. doi: 10.1145/3023368.3023377.

[ZCW+04] HuabingZhu,TonyKai-Yun Chan,Lizhe Wang, Wentong Cai,and Simon See. Aprototypeofdistributed molecular visualization on computational grids. FutureGeneration Computer Systems,20(5):727–737, 2004. doi: 10.1016/j.future.2003.11.023.

[ZD17] TobiasZirr andCarstenDachsbacher. Memory-efficienton-the-fly voxeliza- tion and rendering of particledata. IEEE Transactions on Visualizationand Computer Graphics,24(2):1155–1166, 2017. doi: 10.1109/TVCG.2017. 2656897.
