MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY

Engine for Molecule Visualization in a Web Browser

MASTER’S THESIS

Jaromír Svoboda

Brno, spring 2014

Declaration

I hereby declare that this thesis is my original work, which I have written on my own. All sources, references and literature used or excerpted during its preparation are properly cited and listed with complete references to their sources.

Jaromír Svoboda

Advisor: RNDr. David Sehnal

Acknowledgement

I would like to thank my supervisor RNDr. David Sehnal for patient guidance and informed advice throughout writing this thesis.

Abstract

The main focus of this master’s thesis is the design and implementation of a lightweight molecular visualization engine (called LiveMol) in the form of a JavaScript library. The engine uses the widely adopted WebGL API to display GPU-accelerated graphics in web browsers. Because complex protein molecules are large, the primary goal is high performance. Furthermore, the design of LiveMol lets users extend the core functionality by defining custom coloring schemes for molecule models or by implementing completely new visualization modes.

Keywords

molecular visualization, protein, secondary structure, WebGL, LiveMol

Contents

I INTRODUCTION
  1 Introduction
II CURRENT DEVELOPMENTS
  2 Current Molecular Visualization Software
    2.1 Desktop Applications
      2.1.1 Visual Molecular Dynamics
      2.1.2 PyMOL
      2.1.3 RasMol/OpenRasMol
      2.1.4 BALL/BALLView
      2.1.5 Gabedit
      2.1.6 QuteMol
      2.1.7 Avogadro
    2.2 JavaScript-based Web Applications
      2.2.1 Jmol/JSmol
      2.2.2 ChemDoodle
  3 WebGL and Related Technologies
    3.1 HTML5
    3.2 OpenGL ES 2.0
    3.3 WebGL
      3.3.1 Security
    3.4 Relevant JavaScript Features
      3.4.1 RequestAnimationFrame
      3.4.2 Typed Arrays
      3.4.3 Web Workers
III THEORY
  4 Chemical Background
    4.1 Molecule Structure
      Atom
      Bond
      Molecule
      Amino Acid
      Protein
    4.2 Protein Structure
      Primary Structure
      Secondary Structure
      Tertiary Structure
      Quaternary Structure
  5 Molecular Visualization Methods
    5.1 Van der Waals Method
    5.2 Balls and Sticks
    5.3 Sticks
    5.4 Lines
    5.5 Ribbon
    5.6 Cartoon
  6 Transformations
    6.1 Linear Transformation
      6.1.1 Vector Space
      6.1.2 Linear Map
    6.2 Affine Transformation
      6.2.1 Affine Space
      6.2.2 Affine Map
      Essential 3D Matrix Transformations
  7 Splines
    7.1 Bézier Curve
    7.2 Bézier Spline
    7.3 Continuity
    7.4 Frenet-Serret formulas
IV IMPLEMENTATION
  8 Tools
    8.1 TypeScript
    8.2 Three.js
    8.3 jQuery
  9 Design
    9.1 Input Format
      Atoms
      Bonds
      Residues
      Helices
      Sheets
    9.2 Internal Architecture
      9.2.1 General Design
      9.2.2 Asynchronous Model Generation
      9.2.3 Materials and Shaders
    9.3 Visualization Modes
    9.4 Extensibility
      9.4.1 Theme
      9.4.2 Visualization Mode
      9.4.3 Extension Example: Tunnels Visualization
V RESULTS AND DISCUSSION
  10 Performance Tests
    10.1 Model Generation
    10.2 Frame Rate
    10.3 Memory Consumption
  11 Limitations
    11.1 Memory
    11.2 Mesh Generation Time
    11.3 Highlighting
  12 Potential Extensions
    12.1 Additional Visualization Modes
    12.2 Complex Shaders
    12.3 Non-interactive Rendering
VI CONCLUSION
  13 Conclusion
VII APPENDIX
  A Compilation
    A.1 Command-line TypeScript Compiler
    A.2 Visual Studio 2013
  B Attached Files

PART I

INTRODUCTION

1 Introduction

The considerable technological advancements of the last decades enabled completely new approaches to many problems and gave rise to several scientific disciplines, many of which use information technology to store, retrieve, organize and analyze information in unprecedented ways. Bioinformatics, one of the most successful of these fields, applies such methods to biological data, primarily molecular-level structures such as proteins.

A major achievement in biology, facilitated by technological progress, is the experimental determination (and public availability) of the structure of many proteins, nucleic acids, etc. For instance, the Protein Data Bank has accumulated a wealth of such data: more than 100 000 structures at the time of writing of this thesis. Since a molecular structure is typically characterized by the coordinates of individual atoms, human comprehension requires some kind of visualization.

The thesis aims to design and develop a lightweight molecular (especially protein) visualization engine called LiveMol, capable of displaying molecular data as well as calculated protein features such as tunnels. Furthermore, it is required to run in major web browsers (without any third-party plug-ins) and needs to be efficient enough to process even very large proteins. To that end, the engine uses WebGL (Web Graphics Library), whose recent widespread adoption and outstanding performance make it a viable option.

The first part introduces current molecular visualization software: the traditional desktop applications as well as the web-based alternatives. An overview of key technologies such as JavaScript and WebGL follows.

In the theoretical part, I briefly describe the chemical and biological components the library works with, various visualization methods and relevant mathematical concepts.

The chapters concerning implementation present the essential tools used throughout development and the library design, including features such as the implemented visualization modes and extensibility. The final part gives an account of the engine’s performance, its limitations and potential extensions.

PART II

CURRENT DEVELOPMENTS

2 Current Molecular Visualization Software

In the following sections I present a brief overview of the most popular current applications used for viewing, analyzing or modifying molecular structures.

2.1 Desktop Applications

Unless noted otherwise, the applications mentioned below support all major desktop platforms (Windows, Linux and Mac OS X).

2.1.1 Visual Molecular Dynamics

Often abbreviated as VMD, Visual Molecular Dynamics [1] is among the most advanced programs. It is distributed free of charge and is open source under the UIUC Open Source License [2]. VMD can process a wide variety of formats (more than 60) thanks to numerous built-in parsing plug-ins, and it offers many visualization modes and a collection of structure manipulation and visualization tools.

The authors provide a guide to the software architecture and source code, facilitating the creation of additional modules (VMD is written in C++) and plug-ins (using Tcl/Tk and Python). Besides adding support for other file formats, user plug-ins can implement new user interface features, molecule modifications, visualization methods, analyses and simulations. An extensive set of plug-ins is available at [3].

2.1.2 PyMOL

Despite its somewhat unintuitive GUI (Graphical User Interface), PyMOL [4] is perhaps the most popular molecular visualization software. The source code is available under the Python License [5]; executables are free for students and teachers. Other users can either purchase a commercial subscription (including support) or compile an executable from the source code. It primarily generates high-quality images and animations but also features many tools for analysis and editing of molecules. Its popularity also stems from its extensibility: although the application is written mostly in C/C++, the embedded Python interpreter allows for easy scripting and plug-in creation. A multitude of third-party plug-ins exist; some of the most notable are CAVER, GPSSpyMOL and MOLE 2.0 [6].

2.1.3 RasMol/OpenRasMol

Originally developed in 1995, RasMol [7] pioneered molecular visualization software on the desktop. The last version (2009) offers all standard visualization modes and is available under either the RASMOL license [8] or the GNU General Public License [9].

2.1.4 BALL/BALLView

BALL [10] (Biochemical Algorithms Library) is a C++ framework consisting of various algorithms, data structures and classes useful for developing biochemical applications: file import/export, analysis, modification, visualization, a Python scripting interface, etc.

BALLView uses the Qt GUI toolkit to build a stand-alone molecule visualization application on top of the BALL framework. It features many advanced visualization methods as well as molecule editing tools and Python scripting. Both the BALL framework and the BALLView application are open source and licensed under the GNU Lesser General Public License [11].

2.1.5 Gabedit

Gabedit [12] provides a common GUI for several bioinformatics software packages (MOLPRO, GAMESS, Orca, OpenMopac, Q-Chem, etc.) and is distributed under a permissive BSD License [13]. It includes tools for visualization, molecule building and analysis.

2.1.6 QuteMol

Although slightly dated (last stable release in 2007) and not supported on Linux, QuteMol [14] offers lightweight, high-quality visualizations of molecules in the PDB file format under the GNU General Public License [9].

2.1.7 Avogadro

Primarily a molecular editor, Avogadro [15] has all the standard features, such as various visualization modes, wide import/export format support, a plug-in architecture and Python scripting. It is licensed under the GNU General Public License [9].

2.2 JavaScript-based Web Applications

2.2.1 Jmol/JSmol

Jmol [16] differs from the majority of the aforementioned applications in that it is written in Java, making it not only cross-platform by default but also suitable for web deployment via Java applets. It is also much more lightweight than many of the preceding programs, having virtually no features besides molecule visualization.

A fallback solution (JSmol), aimed at platforms that do not support Java (iPad, iPhone) or Java applets (Android), was recently implemented in JavaScript using HTML5/WebGL.

The desktop application, the Java applet, the JavaScript alternative and a module that can be integrated into any Java application (JmolViewer) are free and open source under the GNU Lesser General Public License [11].

2.2.2 ChemDoodle

The ChemDoodle [17] application takes yet another approach: chemical drawing and publishing. It offers tools for 2D sketching of molecules, chemical reactions and glassware, and for calculating several properties of the drawn objects.

From the point of view of this thesis, the associated JavaScript library (ChemDoodle Web Components) is more noteworthy. Similarly to JSmol, it uses WebGL to render 3D visualizations of molecular data without any third-party plug-ins. Unlike the commercial desktop ChemDoodle, the Web Components toolkit is released under the GNU General Public License [9] and is available free of charge.

3 WebGL and Related Technologies

WebGL [18] (Web Graphics Library) enables GPU-accelerated (GPU stands for Graphics Processing Unit) 3D rendering in a web browser without any third-party plug-ins.

3.1 HTML5

WebGL is not part of the HTML5 (HyperText Markup Language version 5) standard [19][20] (although the first of the two competing specifications mentions it), but it relies strongly upon it. Among many other improvements to HTML, the standard introduces the new canvas element. The canvas offers two contexts: a 2D context providing methods and properties to draw and manipulate raster graphics, and a 3D context exposing the WebGL API (Application Programming Interface). Both are exposed as DOM (Document Object Model) interfaces.
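Requesting a context from a canvas can be sketched as follows. This is a minimal illustration, not code from LiveMol: the helper name `acquireWebGLContext`, the `CanvasLike` type and the element id `viewport` are assumptions made for the example. The `getContext` method and the context names `"webgl"` and `"experimental-webgl"` (a prefixed name used by some browsers at the time) belong to the canvas API.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
// In a browser, an HTMLCanvasElement satisfies it.
interface CanvasLike {
  getContext(contextId: string): unknown;
}

// Context names to try, in order of preference.
const WEBGL_CONTEXT_NAMES = ["webgl", "experimental-webgl"];

// Hypothetical helper: returns the first WebGL context the canvas provides,
// or null when none of the names is supported.
function acquireWebGLContext(canvas: CanvasLike): unknown {
  for (const name of WEBGL_CONTEXT_NAMES) {
    const context = canvas.getContext(name);
    if (context) {
      return context;
    }
  }
  return null;
}

// Browser usage (assumes a <canvas id="viewport"> element exists):
// const canvas = document.getElementById("viewport") as HTMLCanvasElement;
// const gl = acquireWebGLContext(canvas) as WebGLRenderingContext | null;
// A different canvas could request the raster context with getContext("2d").
```

Returning null instead of throwing leaves it to the caller to decide how to react when WebGL is unavailable, for example by showing a message.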

3.2 OpenGL ES 2.0

Both the WebGL and OpenGL ES (Open Graphics Library for Embedded Systems) standards are developed and maintained by the Khronos Group [21]. The WebGL design builds upon the OpenGL ES 2.0 [22] standard, an adaptation of OpenGL suited specifically to mobile or otherwise restricted platforms (low performance, limited battery life, etc.).

Version 2.0 of OpenGL ES introduces essential changes to the rendering pipeline, comparable to the transition from OpenGL 3.0 to 3.1, most notably the move from a fixed-function pipeline to a programmable pipeline. While the programmable pipeline gives the developer much greater control over rendering, it requires writing shaders¹, making development more difficult and less intuitive.

1. Programs written in GLSL (OpenGL Shading Language) running on GPU, called for every vertex and pixel.

3.3 WebGL

WebGL [23][24] closely follows OpenGL ES version 2.0², with minor differences due to in-browser restrictions and JavaScript memory management. Like OpenGL, WebGL provides rather low-level functionality, making it flexible and powerful but cumbersome at the same time. Some of the most popular libraries that abstract away the rendering details are Three.js, PhiloGL and GLGE.

Nowadays, all major web browsers implement the WebGL standard (Chrome 9+, Firefox 4+, Opera 12+, Safari 5.1+, Internet Explorer 11+). Most browsers adopted WebGL early on; until recently, Internet Explorer was the only exception. The Microsoft Security Response Center (MSRC) expressed concerns about WebGL security [25] and advised against widespread WebGL support, and others even demonstrated specific security issues [26].

3.3.1 Security

WebGL directly exposes hardware functionality to the web in a previously unheard-of way. Various vulnerabilities may occur in OEM (Original Equipment Manufacturer) components. According to the MSRC report, the suggested solution of blacklisting (and blocking) demonstrably unsafe hardware/software configurations would disrupt the user experience. Safe web browsing would in fact require the latest GPU drivers.

Moreover, the operating system and GPU infrastructure were not designed to resist malicious shaders and geometries; they presume trusted applications. Attacker-supplied shaders can take so long to render as to effectively cause a DoS (Denial of Service) attack. At the same time, access to buffers containing rendered data was not originally restricted, leading to privacy concerns.

WebGL took several measures to mitigate these worries. It enforces additional restrictions on shaders (compared to OpenGL ES 2.0) to prevent out-of-range array accesses. Cross-Origin Resource Sharing requires explicit permission from the client application as well as the server. Nevertheless, individual browsers are advised to implement safeguards against unacceptably long rendering times. The possibility of a DoS attack remains the only known major vulnerability [27].

2. The current working draft of WebGL 2.0 follows changes made in OpenGL ES version 3.0.

3.4 Relevant JavaScript Features

The popularity of the JavaScript language has led to its use even in scenarios it was not originally designed for, WebGL being perhaps the most demanding.

3.4.1 RequestAnimationFrame

Animation or interactivity of any kind depends on frequent screen refreshes. JavaScript developers traditionally used the setInterval³ or setTimeout⁴ methods to repeatedly execute the redrawing routine. However, the spread of interactive web sites showed that these two methods were not equal to the task. Nowadays, the vast majority of browsers offer an alternative method, requestAnimationFrame⁵ [28], designed specifically for animation.

While the implementation details differ from browser to browser, the improvements (over setInterval or setTimeout) generally include synchronization with other animations, suspending the animation in background tabs, adjusting the frequency to the display refresh rate and preventing the queuing of old frames waiting to be rendered. Animations driven by requestAnimationFrame are smoother overall and better optimized for performance and battery life.
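A redraw loop built on this mechanism might look like the sketch below. The helper `runRenderLoop`, the `FrameScheduler` type and the `renderScene` function mentioned in the usage comment are hypothetical names chosen for illustration. The scheduler is passed in as a parameter so the loop can also run outside a browser; in a browser it would simply wrap `window.requestAnimationFrame`.

```typescript
// Signature shared by window.requestAnimationFrame: the callback receives a
// timestamp in milliseconds and the call returns a request id.
type FrameScheduler = (callback: (timestampMs: number) => void) => number;

// Hypothetical helper: calls `draw` once per scheduled frame with the time
// elapsed since the previous frame, until `draw` returns false.
function runRenderLoop(
  schedule: FrameScheduler,
  draw: (elapsedMs: number) => boolean
): void {
  let previous: number | null = null;

  const onFrame = (timestampMs: number): void => {
    const elapsed = previous === null ? 0 : timestampMs - previous;
    previous = timestampMs;
    if (draw(elapsed)) {
      // Request the next frame only while drawing should continue.
      schedule(onFrame);
    }
  };

  schedule(onFrame);
}

// Browser usage (assumes a `renderScene` function exists):
// runRenderLoop((cb) => window.requestAnimationFrame(cb), (dt) => {
//   renderScene(dt);
//   return true;
// });
```

Each frame is requested from inside the previous one, so no frames pile up when drawing is slow, in contrast to a fixed-delay setInterval timer.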

3. Calls a function repeatedly with a fixed delay.
4. Calls a function after a specified time interval.
5. Requests the browser to call a function before the next screen redraw.

3.4.2 Typed Arrays

To mitigate the poor performance of JavaScript when communicating with other APIs (such as WebGL) through binary data (caused by multiple type conversions along the way), the Khronos Group designed a standard allowing JavaScript to handle binary data with well-known byte layouts directly. The new ArrayBuffer [29] type, representing a generic fixed-length binary buffer, can be accessed as an array of a specific type⁶ by creating a particular view.
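The relationship between a buffer and its views can be illustrated with a short sketch. The vertex layout (three 32-bit floats per vertex) and all variable names are assumptions made for this example; `ArrayBuffer`, `Float32Array` and `Uint8Array` are the standard typed-array types.

```typescript
// One buffer holding three vertices, each with x, y, z as 32-bit floats.
const FLOATS_PER_VERTEX = 3;
const vertexCount = 3;
const vertexBuffer = new ArrayBuffer(
  vertexCount * FLOATS_PER_VERTEX * Float32Array.BYTES_PER_ELEMENT
);

// A Float32Array view reads and writes the buffer as 32-bit floats.
const positions = new Float32Array(vertexBuffer);
positions.set([0, 0, 0, 1, 0, 0, 0, 1, 0]);

// A second view over the same memory exposes the raw bytes; nothing is copied.
const rawBytes = new Uint8Array(vertexBuffer);

// A view may also cover only part of a buffer: here the second vertex.
const secondVertex = new Float32Array(
  vertexBuffer,
  1 * FLOATS_PER_VERTEX * Float32Array.BYTES_PER_ELEMENT,
  FLOATS_PER_VERTEX
);
secondVertex[1] = 2.5; // the write is also visible through `positions`
```

A view such as `positions` is the kind of object that WebGL calls like `gl.bufferData` accept, which avoids converting ordinary JavaScript arrays on every upload.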

3.4.3 Web Workers

Complex applications often need to perform computationally intensive tasks. Executing them in the main thread (which also handles the user interface) naturally implies a loss of interactivity (i.e. an unresponsive GUI). Although JavaScript lacks language-level multithreading, major browsers implement the Web Workers interface [30], which allows separate script files (with their own context) to run in the background.

Nevertheless, since Web Workers are not a language-level feature (rather, they are an API implemented by the browser), they have two major drawbacks: background scripts operate in a separate context, and they communicate with the main thread through serialized data⁷ (causing memory overhead).

6. 8-bit, 16-bit or 32-bit signed or unsigned integers, or 32-bit or 64-bit floats.
7. There are exceptions: some browsers allow binary data (a Typed Array instance) to be passed directly by a simple change of reference.

PART III

THEORY

4 Chemical Background

In the following section I present the essential chemical structures [31] the engine is designed to visualize.

4.1 Molecule Structure

Atom

The smallest unit of a chemical element, consisting of protons and neutrons (together forming the central, dense and positively charged nucleus) and electrons (surrounding the nucleus).

Bond

An electromagnetic attraction between electrons and nuclei or between dipoles, giving rise to multi-atomic molecules. The most common types are the relatively strong (intramolecular) covalent, metallic and ionic bonds, and the comparatively weaker dipole-dipole interactions, hydrogen bonds and van der Waals forces.

Molecule

A group of atoms held together by strong bond(s); the smallest particle of a chemical compound. Strictly speaking, a molecule is a neutral group of atoms; however, the term is often used more loosely and applied even to non-neutral groups.

Amino Acid

An amino acid [32] consists of amine and carboxylic acid functional groups and a variable side chain (whose structure depends on the particular amino acid). Specifically, an α-amino acid is such a molecule with the formula H₂N−CHR−COOH¹, where NH₂ represents the amine group, COOH the carboxylic acid group and R the side chain.

1. Both amino and carboxylic groups are attached to the same carbon atom.

Protein

A linear chain of amino acids (residues) joined by peptide bonds². The alternating sequence of oxygen, carbon and nitrogen atoms is called the backbone. A protein consists of one or more such chains forming a biologically active unit.

4.2 Protein Structure

Proteins [32] vary greatly in size (ranging from hundreds to hundreds of thousands of atoms); consequently, several levels of abstraction are used to describe their features.

Primary Structure

Primary structure refers to the unique sequence of amino acids making up the protein. It is usually recorded as a sequence of amino acid abbreviations starting at the amino terminus³ (N-terminus) and ending at the carboxyl terminus (C-terminus).

Secondary Structure

The amino acid chain folds over itself, forming either definite secondary structures (stabilized by hydrogen bonds) or random coils. The two most common secondary structures found in proteins are the α-helix (a right-handed coil with 3.6 residues per turn) and the β-sheet (several parallel or anti-parallel β-strands⁴).

Tertiary Structure

For the protein to attain its biological function, the secondary structures arrange spatially into a rather stable overall shape (preserved by disulfide bridges and hydrogen bonds). The spatial configuration of all atoms in the molecule (tertiary structure) is described by their Cartesian coordinates.

2. A peptide bond is a covalent bond formed through the release of a water molecule composed of H+ from the amine group and OH− from the carboxylic group of adjacent amino acids.
3. The termini are named after the free group.
4. An almost fully extended stretch of backbone, 3 to 10 amino acids in length.

Quaternary Structure

The quaternary structure characterizes the spatial arrangement of chains when the protein contains more than one chain.

5 Molecular Visualization Methods

The chapter briefly introduces the most common ways to visualize molecules. The presented methods show the secondary and tertiary structure of proteins; visualizing surfaces, tunnels, cavities, etc. is beyond the scope of the thesis.

5.1 Van der Waals Method

The van der Waals method represents each atom of a molecule as a hard sphere. The chemical element of the atom determines both the color and the radius of the sphere: the color follows some color scheme¹, and the radius is the van der Waals radius of the element.

Since a covalent bond is generally shorter than the sum of the corresponding van der Waals radii, overlapping spheres signify the existence of a bond between the two atoms. Overall, the van der Waals method illustrates the volume of the molecule but hides its inner structure.

Figure 5.1: Van der Waals (sphere) representation of microsomal cytochrome P450 3A4 [34] (PDB ID 1TQN) molecule generated by PyMol 1.7.0

1. A predetermined element/color dictionary, for instance the CPK coloring [33].

5.2 Balls and Sticks

The balls and sticks model depicts atoms as hard spheres colored by element (as in the van der Waals method), with either a constant radius (across elements) or a radius determined by the corresponding van der Waals radius². In contrast to the van der Waals method, the spheres don’t overlap and the inner structure isn’t hidden. Bonds are represented by cylinders of smaller radius than any of the spheres, usually split in half with each part colored the same as the adjacent sphere.

5.3 Sticks

Unlike the two previous methods, the sticks visualization mode doesn’t display atoms. Their positions and elements are nonetheless apparent, since all bonds are visualized in the same way as in the balls and sticks method. The only difference is that it uses capsules (cylinders with hemispherical ends) instead of plain cylinders. The resulting model still conveys all the available information (except for atoms without any bond) but is less cluttered.

Figure 5.2: Balls and sticks representation of 1TQN molecule generated by PyMol 1.7.0

Figure 5.3: Sticks representation of 1TQN molecule generated by PyMol 1.7.0

2. A fraction thereof.

5.4 Lines

Similarly to the sticks method, the lines model omits atoms and depicts only bonds, this time using plain lines, once again split in half and colored according to the corresponding atoms. Such a visualization makes depth perception difficult (thin lines have no shading and their intersections are ambiguous); on the other hand, it is much less performance-intensive.

5.5 Ribbon

Whereas the previously mentioned methods can visualize molecules in general, the ribbon visualization [35] (or alpha trace) is intended specifically for proteins. It shows only the bonds between consecutive C-alpha carbon atoms (either as cylinders or lines), capturing the basic spatial arrangement of the backbone chain while hiding other details.

Alternatively, a smooth curve can be interpolated through the backbone instead of displaying individual bonds.

Figure 5.4: Lines representation of 1TQN molecule generated by PyMol 1.7.0

Figure 5.5: Ribbon representation of 1TQN molecule generated by PyMol 1.7.0

5.6 Cartoon

Cartoon mode [35][36] improves on ribbon by highlighting the two most prevalent secondary structures using special shapes: cylindrical spiral ribbons indicate α-helices and thick arrows (showing the direction³ and twist of the strand) indicate β-sheets. Sections of the backbone forming neither an α-helix nor a β-sheet are represented by a thin tube. Optionally, atoms of non-standard residues may be visualized using lines or sticks.

Figure 5.6: Cartoon representation of 1TQN molecule generated by PyMol 1.7.0

3. By convention, the backbone is oriented from the amino terminus to the carboxyl terminus.

6 Transformations

The aforementioned visualization methods generally fall into two categories: those consisting of many instances of primitive objects (e.g. van der Waals, sticks) and those created by extruding a shape along a spline (e.g. cartoon). The first category uses matrix transformations extensively to distribute the primitives.

6.1 Linear Transformation

6.1.1 Vector Space

A vector space [37][38] over a field F is a set V together with operations +, · satisfying the following conditions for all elements x, y, z ∈ V and a, b ∈ F:

\[
\begin{aligned}
x + y &= y + x && \text{(commutativity)} \\
x + (y + z) &= (x + y) + z && \text{(vector addition associativity)} \\
x + 0 &= 0 + x = x && \text{(additive identity)} \\
x \cdot 1 &= 1 \cdot x = x && \text{(multiplicative identity)} \\
x + (-x) &= 0 && \text{(additive inverse)} \\
a \cdot (b \cdot x) &= (a \cdot b) \cdot x && \text{(scalar multiplication associativity)} \\
(a + b) \cdot x &= a \cdot x + b \cdot x && \text{(scalar sum distributivity)} \\
a \cdot (x + y) &= a \cdot x + a \cdot y && \text{(vector sum distributivity)}
\end{aligned}
\]

6.1.2 Linear Map

A function f : V → W, where V and W are vector spaces over a field F, is a linear map (or linear transformation) if the following conditions hold for any x, y ∈ V and a ∈ F:

\[
\begin{aligned}
f(x + y) &= f(x) + f(y) && \text{(additivity)} \\
f(a \cdot x) &= a \cdot f(x) && \text{(degree 1 homogeneity)}
\end{aligned}
\]

If both V and W are finite-dimensional and have defined bases, any linear transformation f : V → W can be expressed as a matrix, a format well suited to computation (e.g. linear transformations are composed simply by matrix multiplication). For example, a linear mapping Rⁿ → Rᵐ corresponds to the matrix multiplication A · v, where A is an m × n matrix and v is the mapped vector.

However, from the homogeneity condition (with a = 0) it follows that the zero element of V (0_V) is mapped to the zero element of W (0_W):

\[ f(0_V) = f(0 \cdot 0_V) = 0 \cdot f(0_V) = 0_W \]

Replacing the abstract notion of a vector space with a coordinate space (R³ for instance), a linear map fixes the origin; translation therefore cannot be represented as a linear transformation.
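As a brief worked illustration (the symbol b for a fixed nonzero translation vector is introduced here for the example), consider the translation f(x) = x + b on a vector space V. It moves the zero element and also violates additivity:

\[
f(0_V) = 0_V + b = b \neq 0_V
\]

\[
f(x + y) = x + y + b \neq x + y + 2b = f(x) + f(y)
\]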

6.2 Affine Transformation

Affine transformations [39] remove the previously mentioned limitation of linear transformations (i.e. the fixed origin).

6.2.1 Affine Space

An affine space A with an underlying vector space V is a set together with a map V × A → A, (v, a) ↦ v + a, satisfying the following conditions for all elements x, y ∈ V and a ∈ A:

\[
\begin{aligned}
x + (y + a) &= (x + y) + a && \text{(associativity)} \\
0 + a &= a && \text{(left identity)} \\
V \to A : v &\mapsto v + a \ \text{is bijective} && \text{(uniqueness)}
\end{aligned}
\]

Informally, an affine space differs from a vector space in that it lacks a distinguished origin point.


6.2.2 Affine Map

A function f : A → B (from affine space A to affine space B) is an affine map if the transformation g defined by

\[ f(Q) - f(P) = g(Q - P) \]

is linear for any P, Q ∈ A.

In other words, an affine transformation consists of a linear transformation and a translation; it does not have to map the origin to the origin. An affine transformation f of a vector x to a vector y can therefore be represented as

\[ y = f(x) = A \cdot x + b \tag{6.1} \]

where A is a linear transformation and b is a translation vector.

Using homogeneous¹ instead of Cartesian coordinates (e.g. 3D Euclidean space becomes 4D projective space) in the form of an augmented matrix makes it possible to represent both the linear transformation and the translation as a single matrix multiplication. The following equation expresses the transformation f of (6.1):

\[
\begin{bmatrix} y \\ 1 \end{bmatrix} =
\begin{bmatrix} A & b \\ 0 \cdots 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ 1 \end{bmatrix}
\]

1. A point (x1, x2, ..., xn−1, xn) in homogeneous coordinates corresponds to (x1/xn, x2/xn, ..., xn−1/xn) in Cartesian coordinates.

Essential 3D Matrix Transformations

Applying the concept of an affine map finally allows us to represent all the transformations [40] needed to position and modify the primitives to create complex molecular models.

Scaling point (x, y, z) to (x', y', z') using scale factor (s_x, s_y, s_z):

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

Translating point (x, y, z) to (x', y', z') by vector (t_x, t_y, t_z):

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

Rotating point (x, y, z) to (x', y', z') by an angle θ around an arbitrary axis defined by unit vector (a_x, a_y, a_z):

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix}
t a_x^2 + c & t a_x a_y + s a_z & t a_x a_z - s a_y & 0 \\
t a_x a_y - s a_z & t a_y^2 + c & t a_y a_z + s a_x & 0 \\
t a_x a_z + s a_y & t a_y a_z - s a_x & t a_z^2 + c & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

where c = cos(θ), s = sin(θ), t = 1 − cos(θ).

7 Splines

The other category of common visualizations (besides those consisting of many duplicated primitives) comprises mainly various extrusions along splines [41]. The most prominent representatives of such visualizations are ribbon (see 5.5) and cartoon (see 5.6).

7.1 Bézier Curve

A Bézier curve [42] B_{P0P1...Pn} is a parametric curve (with parameter t ∈ [0, 1]) given by control points P0, P1, ..., Pn. It is defined recursively as

\[
\begin{aligned}
B_{P_0}(t) &= P_0 \\
B_{P_0 P_1 \ldots P_n}(t) &= (1 - t) \cdot B_{P_0 P_1 \ldots P_{n-1}}(t) + t \cdot B_{P_1 P_2 \ldots P_n}(t)
\end{aligned}
\]

and explicitly as

\[
B_{P_0 P_1 \ldots P_n}(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i} t^i P_i
\]
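Both definitions can be checked against each other numerically. The sketch below is illustrative: the names `bezierRecursive`, `bezierExplicit`, `lerp` and `binomial` and the sample control points are assumptions made for the example. The recursive version mirrors the definition literally, which is fine for a handful of control points but costs exponential time as their number grows.

```typescript
type Point = number[];

// Linear interpolation (1 - t) * a + t * b, per coordinate.
function lerp(a: Point, b: Point, t: number): Point {
  return a.map((value, i) => (1 - t) * value + t * b[i]);
}

// Direct transcription of the recursive definition.
function bezierRecursive(points: Point[], t: number): Point {
  if (points.length === 1) {
    return points[0];
  }
  const withoutLast = bezierRecursive(points.slice(0, -1), t);
  const withoutFirst = bezierRecursive(points.slice(1), t);
  return lerp(withoutLast, withoutFirst, t);
}

// Binomial coefficient "n choose k", built up multiplicatively.
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Explicit definition: a weighted sum of the control points.
function bezierExplicit(points: Point[], t: number): Point {
  const n = points.length - 1;
  const result = points[0].map(() => 0);
  points.forEach((p, i) => {
    const weight = binomial(n, i) * Math.pow(1 - t, n - i) * Math.pow(t, i);
    p.forEach((value, axis) => {
      result[axis] += weight * value;
    });
  });
  return result;
}

// A cubic curve in the plane, evaluated at its midpoint by both definitions.
const cubic: Point[] = [[0, 0], [1, 2], [3, 2], [4, 0]];
const midRecursive = bezierRecursive(cubic, 0.5); // (2, 1.5)
const midExplicit = bezierExplicit(cubic, 0.5); // (2, 1.5)
```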

7.2 Bézier Spline

Constructing complex curves simply by increasing the order of a Bézier curve (i.e. adding more control points) is impractical due to the lack of local control¹ and the computational difficulty of solving high-degree polynomials.

Generally, a spline is a piecewise-defined polynomial function with a high degree of smoothness at the knots² (joints). In the case of a Bézier spline, the pieces are Bézier curves.

Let a nondecreasing sequence t1, t2, ..., tk, where ti ∈ [0, 1] for i ∈ {1, 2, ..., k}, determine the positions of the internal knots (joints), and let k + 1 sets P^j, j ∈ {1, ..., k + 1}, of points P^j_0, P^j_1, ..., P^j_n define the Bézier curve constituting the jth spline segment. The Bézier spline can then be defined as follows:

\[
S(t) =
\begin{cases}
S_1(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i} t^i P^1_i & t \in [0, t_1] \\
S_2(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i} t^i P^2_i & t \in (t_1, t_2] \\
\quad \vdots \\
S_{k+1}(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i} t^i P^{k+1}_i & t \in (t_k, 1]
\end{cases}
\]

1. Each control point influences the whole curve.
2. The connections of polynomial pieces.

7.3 Continuity

Each of the Bézier curves making up the spline is independent of any other, points of subsequent Bézier curves need to be explicitly positioned to ensure spline continuity [43]. Two related notions describe continuity of a curve (or a spline consisting of curves): parametric and geometric continuity [44][45]. A curve has parametric continuity of degree n (Cn) if all deriva- tives up to nth order agree at any point of the curve. Namely, C0 signifies that the curve isn’t interrupted, C1 the continuity of first derivatives, C2 the continuity of second derivatives, etc. The notion of geometric continuity involves the particular shape of the curve. A curve is said to be Gm-continuous if it can be repa- rameterized to have Cn continuity. Specifically, G0 continuity at the joint point of two curves means the coordinates agree, G1 implies G0 and parallel tangent lines, G3 respectively requires G2 and common center of curvature at the joint point. To attain C0-continuity (as well as G0-continuity) between two Bézier curves P (t),Q(t) of the same order, defined by sets of points (P0,P1, ..., Pn) and (Q0,Q1, ..., Qn), the last point of the first curve must be identical to the first point of the second curve:

\[ Q_0 = P_n \tag{7.1} \]

$C^1$-continuity requires $C^0$-continuity and $Q'(0) = P'(1)$, therefore:

\[ Q_1 - Q_0 = P_n - P_{n-1} \tag{7.2} \]

Combining (7.1) and (7.2) and expressing $Q_1$ gives:

\[ Q_1 = 2P_n - P_{n-1} \]


Similarly, $C^2$-continuity demands $C^1$-continuity as well as $Q''(0) = P''(1)$, giving:

\[ Q_2 - 2Q_1 + Q_0 = P_n - 2P_{n-1} + P_{n-2} \tag{7.3} \]

And combining (7.1), (7.2) and (7.3) we get:

\[ Q_2 = 4P_n - 4P_{n-1} + P_{n-2} \]
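The three conditions translate directly into code. The following TypeScript sketch computes the first three control points of the second curve from the last three control points of the first one; the function and type names are assumptions made for illustration only.

```typescript
// Illustrative sketch only; the names are not part of LiveMol.
type Vec3 = [number, number, number];

// Linear combination a*p + b*q + c*r of three points.
function combine(a: number, p: Vec3, b: number, q: Vec3, c: number, r: Vec3): Vec3 {
    return [
        a * p[0] + b * q[0] + c * r[0],
        a * p[1] + b * q[1] + c * r[1],
        a * p[2] + b * q[2] + c * r[2],
    ];
}

// Given P_{n-2}, P_{n-1} and P_n of the first curve, returns Q_0, Q_1 and Q_2
// of a curve of the same degree that joins it with C2 continuity.
function c2Continuation(pn2: Vec3, pn1: Vec3, pn: Vec3): [Vec3, Vec3, Vec3] {
    const q0 = combine(1, pn, 0, pn1, 0, pn2);  // Q0 = Pn                    (7.1)
    const q1 = combine(2, pn, -1, pn1, 0, pn2); // Q1 = 2Pn - Pn-1
    const q2 = combine(4, pn, -4, pn1, 1, pn2); // Q2 = 4Pn - 4Pn-1 + Pn-2
    return [q0, q1, q2];
}
```

For example, with $P_{n-2} = (0, 0, 0)$, $P_{n-1} = (1, 1, 0)$ and $P_n = (2, 1, 0)$ the function yields $Q_0 = (2, 1, 0)$, $Q_1 = (3, 1, 0)$ and $Q_2 = (4, 0, 0)$, which satisfies (7.3): both sides equal $(0, -1, 0)$.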

7.4 Frenet-Serret formulas

In order to create a 3D mesh from a Bézier spline, an imaginary profile (a 2D shape) is extruded along the spline. We use the Frenet-Serret apparatus [46][47] to describe the position and rotation of the profile while tracing the spline.

Define the unit vector T tangent to the curve, oriented in the direction of movement; the unit normal vector N; and their cross product B (the binormal vector). Together they form a Frenet-Serret or TNB frame. The Frenet-Serret theorem gives the relationship between the derivatives of T, N, B (with respect to arc length), the torsion τ and the curvature κ of a continuous curve:

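\[
\begin{bmatrix} T' \\ N' \\ B' \end{bmatrix}
=
\begin{bmatrix} 0 & \kappa & 0 \\ -\kappa & 0 & \tau \\ 0 & -\tau & 0 \end{bmatrix}
\begin{bmatrix} T \\ N \\ B \end{bmatrix}
\]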
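In practice the frame is usually computed from the first and second derivatives of the curve rather than from the differential equations. The sketch below shows one common way to do so (T from the first derivative, B from the cross product of both derivatives, N as B × T); it is an illustrative assumption and not necessarily the procedure used by LiveMol.

```typescript
// Illustrative sketch only; the names are not part of LiveMol.
type Vec3 = [number, number, number];

function cross(a: Vec3, b: Vec3): Vec3 {
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ];
}

function normalize(v: Vec3): Vec3 {
    const len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    return [v[0] / len, v[1] / len, v[2] / len];
}

// TNB frame from the first (d1) and second (d2) derivative at one curve point.
// The frame is undefined where d1 and d2 are parallel or zero (no curvature).
function frenetFrame(d1: Vec3, d2: Vec3): { T: Vec3; N: Vec3; B: Vec3 } {
    const T = normalize(d1);
    const B = normalize(cross(d1, d2));
    const N = cross(B, T);
    return { T, N, B };
}
```

For the unit circle (cos t, sin t, 0) at t = 0 the derivatives are (0, 1, 0) and (−1, 0, 0), which gives T = (0, 1, 0), B = (0, 0, 1) and N = (−1, 0, 0), i.e. the normal points towards the center of the circle.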

PART IV

IMPLEMENTATION

8 Tools

The chapter presents the essential implementation tools: the programming language and the main libraries.

8.1 TypeScript

JavaScript lacks several useful features (especially for large-scale application development, an area it wasn't designed for); TypeScript [48] aims to alleviate some of its shortcomings without giving up any of its advantages.

TypeScript is a free open-source programming language developed by Microsoft and licensed under Apache License 2.0 [49]. It compiles to JavaScript (itself an implementation of ECMAScript 5 [50]) but closely conforms to the ECMAScript 6 [51] working draft (with the exception of static typing, which TypeScript supports and ECMAScript 6 does not). Since TypeScript is a strict superset of JavaScript, any JavaScript program is valid TypeScript as well.

TypeScript enables conventional class-based object-oriented programming (rather than JavaScript's prototype-based approach). It includes features such as optional static typing (through type annotations)¹, type inference, generics, interfaces, classes, namespaces and a new anonymous function syntax. All of these help to make source code more structured, maintainable and easier to write², while still compiling to pure JavaScript.
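A few of the listed features are shown in the short sketch below: an interface, a class with type annotations, a generic container and the arrow function syntax. The types are hypothetical and serve only as an illustration; they are not LiveMol's actual classes.

```typescript
// Hypothetical types for illustration; not LiveMol's actual classes.
interface Positioned {
    position: [number, number, number];
}

class AtomRecord implements Positioned {
    // Parameter properties declare and initialise the fields in one step.
    constructor(
        public id: number,
        public symbol: string,
        public position: [number, number, number]
    ) {}
}

// A generic container: the element type T is checked at compile time.
class Registry<T extends { id: number }> {
    private items: { [id: number]: T } = {};

    add(item: T): void {
        this.items[item.id] = item;
    }

    get(id: number): T | undefined {
        return this.items[id];
    }
}

const atomRegistry = new Registry<AtomRecord>();
atomRegistry.add(new AtomRecord(1, "N", [-30.07, 8.178, -13.891]));

// Arrow function; the return type (string) is inferred by the compiler.
const labelOf = (a: AtomRecord) => a.symbol + "#" + a.id;
```

A call such as `atomRegistry.add("N")` is rejected at compile time, whereas plain JavaScript would only fail (or silently misbehave) at run time.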

8.2 Three.js

Three.js [52] is a free open source [53] JavaScript library designed for general 3D graphics programming. On top of abstracting from the low-level details of WebGL³, it provides limited fall-back rendering

1. Type information is often supplied in separate header files available for many popular JavaScript libraries. 2. Thanks to advanced IDE (Integrated Development Environment) support facilitated by static typing. 3. In a way that still grants full access to GLSL shading capabilities.


options (regular HTML5 canvas or SVG⁴ canvas) for platforms without WebGL support.

The library organizes the rendering data and processes in an object-oriented way into scenes consisting of meshes (combinations of geometry and material), cameras and lights. Moreover, it provides many useful presets (materials, cameras, geometries, visual effects), a particle system, animation support, data export/import and useful 3D math functions.

8.3 jQuery

The free and open source jQuery [54][55] is the most popular JavaScript library. It simplifies client-side JavaScript application development in several ways: advanced DOM element selection and manipulation, improved event handling, Ajax support [56], animation, etc. LiveMol primarily uses the jQuery implementation of the promise design pattern [57].

4. Scalable Vector Graphics

9 Design

The chapter provides a brief overview of the overall architecture and some interesting features of the LiveMol library.

9.1 Input Format

Parsing various molecular file formats [58] isn't the goal of the thesis; instead, the library uses a custom JSON (JavaScript Object Notation) format. The top-level object¹ has the following relevant properties: Atoms, Bonds, Residues, Helices and Sheets.

Listing 9.1: Simplified example of input format

 1  { "Atoms":{
 2      "1":{
 3        "Id":1,
 4        "Symbol": "N",
 5        "Position":[-30.07,8.178,-13.891],
 6        "SerialNumber":1,
 7        "RecordType": "ATOM",
 8        "Name": "N",
 9        "ResidueSequenceNumber":28,
10        "ResidueName": "HIS"}},
11    "Bonds":[{
12        "A":3320,
13        "B":3810,
14        "Type": "Metallic"}],
15    "Residues":[{
16        "Name": "SER",
17        "Chain": "A",
18        "SerialNumber":29,
19        "Atoms":[11,12,13,14,15,16],
20        "CAlpha":12,
21        "CarbonylOxygen":14}],
22    "Helices":[{
23        "StartResidue":{
24          "Chain": "A",
25          "SerialNumber":31},
26        "EndResidue":{
27          "Chain": "A",
28          "SerialNumber":36}}],
29    "Sheets":[{
30        "StartResidue":{
31          "Chain": "A",
32          "SerialNumber":71},
33        "EndResidue":{
34          "Chain": "A",
35          "SerialNumber":76}}]
36  }

1. An object is a set of name/value pairs.

Atoms
The Atoms object (line 1 of listing 9.1) contains records representing all the atoms in the molecule, indexed by atom id. Each atom entry identifies the chemical element, position, id, serial number², name³, residue sequence number and residue name of the atom in question.

Bonds
Bonds (line 11 of listing 9.1) is an array of bond entries holding information about the bonds in the molecule: each entry identifies the two bonded atoms (by id) and the type of the bond.

Residues
In the Residues array (line 15 of listing 9.1), every residue is characterized by its name (matching the residue name in atom entries), the identifier of the chain it belongs to, its serial number, the indices of its atoms and the indices of its C-alpha carbon and carbonyl oxygen.

Helices
To communicate the determined secondary structures of a protein, each entry of the Helices array (line 22 of listing 9.1) contains the chain identifier and residue serial number of the two residues that mark the start and end of the helix.

2. From the original molecular file format, usually the same as the atom id. 3. Unique within the given residue.


Sheets
Similarly to Helices, the Sheets array (line 29 of listing 9.1) describes the start and end residues of sheets on a particular chain.
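The structure of the input can be summarized by a set of TypeScript interfaces. The property names below follow listing 9.1; the interface names and the helper function summarizeInput are assumptions introduced for this sketch and need not match the declarations in the LiveMol sources.

```typescript
// Property names follow listing 9.1; interface names and the helper
// function are assumptions made for this sketch.
interface AtomEntry {
    Id: number;
    Symbol: string;
    Position: [number, number, number];
    SerialNumber: number;
    RecordType: string;
    Name: string;
    ResidueSequenceNumber: number;
    ResidueName: string;
}

interface BondEntry { A: number; B: number; Type: string; }

interface ResidueEntry {
    Name: string;
    Chain: string;
    SerialNumber: number;
    Atoms: number[];
    CAlpha: number;
    CarbonylOxygen: number;
}

interface ResidueRef { Chain: string; SerialNumber: number; }
interface StructureEntry { StartResidue: ResidueRef; EndResidue: ResidueRef; }

interface MoleculeInput {
    Atoms: { [id: string]: AtomEntry };
    Bonds: BondEntry[];
    Residues: ResidueEntry[];
    Helices: StructureEntry[];
    Sheets: StructureEntry[];
}

// Parses the JSON text and counts the entries of each kind.
function summarizeInput(json: string) {
    const m = JSON.parse(json) as MoleculeInput;
    return {
        atoms: Object.keys(m.Atoms).length,
        bonds: m.Bonds.length,
        residues: m.Residues.length,
        helices: m.Helices.length,
        sheets: m.Sheets.length,
    };
}
```

Note that the type assertion only informs the compiler; it does not validate the data at run time, so a client that accepts untrusted input would still need explicit checks.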

9.2 Internal Architecture

9.2.1 General Design

As stated before, LiveMol isn't a stand-alone application. Instead, it provides molecular visualization capabilities in the form of a library that other JavaScript applications can easily incorporate to display molecules. Consequently, LiveMol has no graphical user interface (GUI) of its own; it is controlled through its API⁴, and the specific binding of API functions to a GUI depends on the client application.

Furthermore, the engine needs to be able to smoothly visualize even large molecules and work as unobtrusively as possible: computationally intensive tasks (e.g. geometry generation) that could cause the application GUI to become unresponsive must be delegated to background threads⁵.

The internal representation of chemical data (the Molecule class) closely conforms to the input format structure (see 9.1). The classes Atom, Bond and Residue hold the same information as their counterpart entries in the input format, and the Structure class describes both helices and sheets collectively. Its generateRibbons method calculates the particular shape of the cartoon model⁶.

MoleculeDrawing represents the particular generated drawing (geometries and materials) of a molecule. It contains a reference to the Molecule instance holding the visualized data; the relevant Three.js objects (meshes, scene); a visualization mode identifier; several dictionaries describing the relationship between mesh vertices and the displayed atoms, bonds or residues; and a Theme instance. Moreover, its applyTheme method changes the theme of the drawing.

The central component of LiveMol is the Scene class. An instance

4. Application Programming Interface 5. Using Web Worker API (see 3.4.3). 6. Other models can be determined trivially from atoms and bonds properties of the Molecule class.


of Scene represents the canvas where the models of molecules are displayed and provides the API to control and use the engine. It encompasses the functions for creating, removing and manipulating molecules and molecule drawings as well as a way to subscribe to predefined events.

Figure 9.1: Simplified class diagram illustrating the core functionality.

9.2.2 Asynchronous Model Generation

The library uses the Web Worker API (see 3.4.3 and 9.4.2) to generate the molecule model (an instance of the MoleculeDrawing class) in a background thread. The createModelAsync method initiates the model creation and returns a promise [57]. That way, the client application can either add the finished model to the LiveMol scene (using the addModel method) or discard it at will (for example, if the user canceled the model generation).
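The add-or-discard decision can be sketched as follows. The sketch swaps in the standard Promise for the jQuery promise mentioned above and replaces the Web Worker round trip with a timer; DrawingStub, SceneSketch, createModelAsyncSketch and showUnlessCancelled are hypothetical stand-ins, and the real createModelAsync and addModel signatures may differ.

```typescript
// Hypothetical stand-ins; the real LiveMol methods may have other signatures.
interface DrawingStub {
    mode: string;
    bondCount: number;
}

// Stands in for createModelAsync: resolves once the "background" work is done.
function createModelAsyncSketch(mode: string, bondCount: number): Promise<DrawingStub> {
    return new Promise<DrawingStub>((resolve) => {
        // The timer replaces the round trip to the Web Worker.
        setTimeout(() => resolve({ mode, bondCount }), 0);
    });
}

class SceneSketch {
    models: DrawingStub[] = [];

    // Stands in for addModel.
    addModel(model: DrawingStub): void {
        this.models.push(model);
    }
}

// Adds the finished model to the scene unless the user cancelled meanwhile.
// Resolves to true if the model was added and to false if it was discarded.
function showUnlessCancelled(
    scene: SceneSketch,
    pending: Promise<DrawingStub>,
    isCancelled: () => boolean
): Promise<boolean> {
    return pending.then((model) => {
        if (isCancelled()) {
            return false; // the finished model is simply dropped
        }
        scene.addModel(model);
        return true;
    });
}
```

The point of the design is that the generation itself never touches the scene; the caller keeps full control over what happens with the result.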


9.2.3 Materials and Shaders

Each molecule representation consists of only a few meshes. The obvious practice of representing each chemical item (atom, bond or residue) with a separate mesh results in very poor performance due to the Three.js internal implementation of objects⁷. All such objects must be merged into a few large ones to achieve sufficient performance (i.e. tens of frames per second).

As a result, parts of a mesh that represent different chemical items differ in color. While Three.js provides predefined materials capable of coloring parts of the mesh differently, they aren't designed to deal with meshes this large (e.g. a sticks model of a large molecule consists of tens of millions of triangles). A feasible solution involves storing the coloring information in a binary buffer (as opposed to a JavaScript array) and implementing a custom GLSL shader to render colors according to the buffer.

Generally, shaders consist of two subprograms run on the GPU that determine the properties of vertices (vertex shader) and pixels (fragment shader) from the supplied buffers. LiveMol uses two shaders: a flat shader and a Phong shader [59].

The flat shader simply renders each pixel in the corresponding color, passing on the value from the buffer. Because of the virtual lack of shading (i.e. all colors have the same brightness), the spatial arrangement and shape may be difficult to distinguish without viewing the mesh from more than one viewpoint. On the other hand, the colors are clear and not obscured by overly dark shades.

The Phong shader assigns each pixel a darker or lighter shade of the color from the buffer according to the direction of light and the vertex normal vectors of the face⁸. Computing shading in this way is a fast alternative to more advanced (and computationally intensive) methods such as ray tracing or ambient occlusion.

7. No more than a few hundred meshes can be effectively displayed at the same time because of the significant overhead caused by swapping the buffers of every object. 8. Each triangle face has three vertex normal vectors corresponding to its vertices.

9.3 Visualization Modes

At the time of writing the thesis, LiveMol implements four essential visualization modes (lines, sticks, cartoon and charges mode) supporting custom coloring using themes (see 9.4.1):

Lines (see 5.4) displays all bonds as simple thin lines without shading, colored according to the bonded atoms.

Figure 9.2: Lines visualization of 1TQN colored by CPK (see 5.1).

Sticks (see 5.3) models each bond as a simple 3D geometry. In order to display even the largest molecules (primarily to reduce the memory footprint of such models), the engine implements several “levels of detail”. Each level of detail causes the bond to be represented by a different geometry with a progressively increasing number of triangles, from a simple octahedron to an increasingly detailed capsule. The following images show sticks models at several levels of detail colored by the CPK scheme.


Figure 9.3: Level of detail 1 sticks model of 1TQN.
Figure 9.4: Level of detail 1 (detail).

Figure 9.5: Level of detail 3 sticks model of 1TQN.
Figure 9.6: Level of detail 3 (detail).

Figure 9.7: Level of detail 5 sticks model of 1TQN.
Figure 9.8: Level of detail 5 (detail).


Cartoon (see 5.6) uses ribbons and coils to depict the secondary structure of a molecule. They are created by extruding a rectangle, ellipse or circle along a spline given by the C-alpha carbon atoms of residues (see 4.1).

Charges is a modification of the usual sticks and balls model (see 5.2). Compared to sticks and balls, it colors bonds (sticks) in a slightly different way (a smooth transition rather than a sharp separation between halves) and allows the user to determine the size of atoms (balls) through the theme. The name comes from its primary application: visualizing electric charges of atoms.

Figure 9.9: Cartoon visualization of 1TQN.
Figure 9.10: Charges visualization of 1TQN.

9.4 Extensibility

LiveMol enables users to simply extend its visualization capabilities in two ways: adding custom themes or implementing completely new visualization modes.

9.4.1 Theme

A visualization of a molecule conveys information in two ways: the shape (geometry) and the color (material). The mode of visualization determines the shape (e.g. thick ribbons and coils in the case of the cartoon mode or capsules in the case of the sticks mode), but choosing the color is much less straightforward: the same geometry can represent many distinct features of a molecule simply by applying different colors to its parts.

To this end, LiveMol utilizes the concept of a theme, a single object describing the coloring of geometries. A theme encompasses functions that are given a particular chemical item (atom, bond or residue) as an argument and return the color of the relevant geometry part. The theme object can be easily supplied by the client application to color the generated geometries in any conceivable way. Additionally, the functions determining colors can access external data (i.e. data other than those supplied by LiveMol).
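A theme of this kind might look as follows. The sketch colors atoms by partial charge taken from an external table and writes the colors into a typed array through a vertex map; the interface ThemeSketch, the function applyThemeSketch and the charge values are made up for illustration, and LiveMol's actual Theme interface may differ.

```typescript
// Hypothetical shapes for illustration; LiveMol's Theme interface may differ.
type RGB = [number, number, number];

interface AtomInfo {
    id: number;
    symbol: string;
}

interface ThemeSketch {
    atomColor(atom: AtomInfo): RGB;
}

// External data the theme closes over (made-up partial charges).
const partialCharges: { [atomId: number]: number } = { 1: -0.4, 2: 0.4 };

// Colors atoms from blue (negative) through white (neutral) to red (positive).
const chargeTheme: ThemeSketch = {
    atomColor(atom: AtomInfo): RGB {
        const q = Math.max(-1, Math.min(1, partialCharges[atom.id] || 0));
        return q >= 0 ? [1, 1 - q, 1 - q] : [1 + q, 1 + q, 1];
    },
};

// Writes the color of each atom into every vertex mapped to that atom.
// The color buffer holds three numbers (r, g, b) per vertex.
function applyThemeSketch(
    theme: ThemeSketch,
    atoms: AtomInfo[],
    vertexMap: { [atomId: number]: number[] },
    colors: Float32Array
): void {
    for (const atom of atoms) {
        const rgb = theme.atomColor(atom);
        for (const vertex of vertexMap[atom.id] || []) {
            colors.set(rgb, vertex * 3);
        }
    }
}
```

Because only the color buffer changes, switching the theme does not require regenerating the geometry.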

9.4.2 Visualization Mode

In the initial stage of molecule visualization, the engine needs to generate the particular geometry, record the relationship between the geometry and the relevant chemical items of the molecule⁹ (e.g. the faces corresponding to a particular atom) and use the recorded relationships to color the generated geometry according to some theme. Since these tasks generally can't be carried out sufficiently quickly, they need to be performed in a background thread (see 3.4.3) so as not to interfere with the GUI thread.

Therefore, to add a new visualization mode, one has to implement a Web Worker script complying with the LiveMol API. Namely, it must understand the simple internal representation of the molecule it is given, be able to produce the geometry, mappings and colors of the model and communicate them to LiveMol in a defined way.
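The core of such a worker is a pure function from the molecule data to typed arrays. The sketch below builds a lines-like geometry and the vertex-to-atom mapping (colors are omitted for brevity). The message shapes WorkerInput and WorkerOutput, the function name and the choice of splitting each bond at its midpoint are assumptions for illustration; the actual protocol expected by LiveMol is defined by its API.

```typescript
// Hypothetical message shapes; the real LiveMol worker protocol may differ.
type Vec3 = [number, number, number];

interface WorkerInput {
    positions: { [atomId: number]: Vec3 };
    bonds: { a: number; b: number }[];
}

interface WorkerOutput {
    vertices: Float32Array;   // x, y, z per vertex
    vertexToAtom: Uint32Array; // atom id per vertex (the mapping)
}

// Each bond becomes two line segments meeting at its midpoint (an assumption),
// so that each half can later be colored by the atom it belongs to.
function buildLinesSketch(input: WorkerInput): WorkerOutput {
    const bondCount = input.bonds.length;
    const vertices = new Float32Array(bondCount * 4 * 3);
    const vertexToAtom = new Uint32Array(bondCount * 4);
    input.bonds.forEach((bond, i) => {
        const pa = input.positions[bond.a];
        const pb = input.positions[bond.b];
        const mid: Vec3 = [(pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2, (pa[2] + pb[2]) / 2];
        const points: Vec3[] = [pa, mid, mid, pb];
        const owners = [bond.a, bond.a, bond.b, bond.b];
        points.forEach((p, j) => {
            vertices.set(p, (i * 4 + j) * 3);
            vertexToAtom[i * 4 + j] = owners[j];
        });
    });
    return { vertices, vertexToAtom };
}

// Inside an actual Web Worker script the function would be wired up roughly as:
//   self.onmessage = (e) => postMessage(buildLinesSketch(e.data));
```

Keeping the generation in a pure function also makes the mode easy to test outside the browser.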

9.4.3 Extension Example: Tunnels Visualization

Particularly interesting features of many biologically active molecules are tunnels (channels), i.e. access/egress paths to their interior voids. Among others, the MOLE 2.0 [6] and MOLEonline 2.0¹⁰ [60] applications compute such channels.

9. The engine uses such maps for coloring and highlighting parts of the geometry. 10. The MOLEonline 2.0 in-browser visualization requires the Java plug-in.

A simple extension to LiveMol enables it to process the elementary 3D file format¹¹ used by the aforementioned applications to export the calculated tunnel geometries.

Figure 9.11: Sticks visualization of 1TQN and its tunnels.
Figure 9.12: Cartoon visualization of 1TQN and its tunnels.

11. Containing vertex positions and face indices.

PART V

RESULTS AND DISCUSSION

10 Performance Tests

The chapter presents performance assessments of the library from several points of view: the time it takes to generate a particular model (i.e. geometry and colors), the frame rate of the visualization and the memory footprint.

The tests were executed using Google Chrome version 34 and Mozilla Firefox version 29 on two hardware configurations: configuration A (desktop computer) and configuration B (notebook), both running Windows 8.1 Pro 64-bit.

Configuration A: Intel Core i7 3820 (3.60 GHz), 16 GB DDR3 RAM, NVIDIA GeForce GTX 690.
Configuration B: Intel Core i7 4700MQ (2.40 GHz), 16 GB DDR3 RAM, NVIDIA GeForce GT 730M.

10.1 Model Generation

The following tables show approximately how long it takes to create a model of a molecule. The presented times are arithmetic means of 10 consecutive tests.

Time (s)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines              0.096   0.203   0.391   0.711   1.782   2.643   3.304   3.771
Sticks (LOD¹ 1)    0.153   0.257   0.508   0.924   1.851   3.374   3.942   5.124
Sticks (LOD 2)     0.181   0.452   1.014   1.452   3.129   4.337   5.817  10.863
Sticks (LOD 3)     0.271   0.872   1.352   2.231   5.868   8.316      -²       -
Sticks (LOD 4)     0.349   1.195   1.688   3.448   6.574       -       -       -
Sticks (LOD 5)     0.495   1.984   2.502   5.120       -       -       -       -
Cartoon            0.206   0.336   0.909   1.968   3.310   3.174   4.019   3.530
Charges            0.311   0.870   1.378   2.755   5.121       -       -       -

Table 10.1: Generation times in Chrome 34 using configuration A.

1. Level of detail 2. Can’t be generated due to memory limit imposed by browser (see 10.3).


Time (s)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines              0.124   0.242   0.539   0.903   2.317   3.131   4.104   4.512
Sticks (LOD 1)     0.231   0.312   0.655   1.271   2.514   4.034   5.079   7.502
Sticks (LOD 2)     0.253   0.521   1.267   1.713   3.697   5.514   7.435  13.813
Sticks (LOD 3)     0.348   1.057   1.740   2.891   6.730   9.894       -       -
Sticks (LOD 4)     0.450   1.579   2.315   3.939   8.045       -       -       -
Sticks (LOD 5)     0.627   2.346   3.201   6.420       -       -       -       -
Cartoon            0.241   0.503   1.103   2.471   4.003   3.891   5.316   4.884
Charges            0.392   1.041   1.746   3.221   6.304       -       -       -

Table 10.2: Generation times in Chrome 34 using configuration B.

Time (s)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines              0.087   0.181   0.304   0.711   1.941   2.809   3.833   4.274
Sticks (LOD 1)     0.141   0.234   0.451   0.924   1.997   3.817   4.729   6.351
Sticks (LOD 2)     0.169   0.403   0.847   1.452   3.640   5.019   7.442  13.160
Sticks (LOD 3)     0.235   0.810   1.034   2.231   7.104       -       -       -
Sticks (LOD 4)     0.309   1.074   1.471   3.448       -       -       -       -
Sticks (LOD 5)     0.451   1.314   2.971   5.120       -       -       -       -
Cartoon            0.179   0.291   0.631   2.170   3.866   3.784   5.634   4.764
Charges            0.287   0.678   1.571   3.017   6.410       -       -       -

Table 10.3: Generation times in Firefox 29 using configuration A.

Time (s)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines              0.101   0.181   0.351   0.929   2.164   3.629   4.203   4.795
Sticks (LOD 1)     0.179   0.234   0.534   1.215   2.241   5.407   5.477   7.841
Sticks (LOD 2)     0.232   0.403   1.324   1.867   4.312   6.382   9.231  15.078
Sticks (LOD 3)     0.301   0.810   1.894   2.597   9.003       -       -       -
Sticks (LOD 4)     0.378   1.074   2.140   4.352       -       -       -       -
Sticks (LOD 5)     0.543   1.314   3.330   6.174       -       -       -       -
Cartoon            0.217   0.291   0.836   2.576   4.499   4.396   7.115   5.897
Charges            0.365   0.678   1.867   3.436   7.537       -       -       -

Table 10.4: Generation times in Firefox 29 using configuration B.


Figure 10.1: A graph of model generation times for lines and sticks (LOD 1) modes in Chrome 34 using configuration A.

In conclusion, the observed time complexity of the model generation algorithms is O(n), where n equals the number of chemical bonds in the case of the lines, sticks and charges modes or the number of secondary structures in the case of the cartoon mode. Such results were expected because the algorithms process each model element (i.e. a chemical bond or a secondary structure) in constant time.
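The linear trend can be checked with an ordinary least-squares fit. The sketch below uses the lines row of table 10.1 and, as an assumption, takes the atom count as a proxy for the number of bonds (the tables list only atom counts); the function name linearFit is illustrative.

```typescript
// Atom counts and "Lines" generation times (s) from table 10.1.
// Assumption: the atom count serves as a proxy for the number of bonds.
const atomCounts = [1805, 5009, 11045, 20082, 43542, 60539, 80172, 98573];
const linesSeconds = [0.096, 0.203, 0.391, 0.711, 1.782, 2.643, 3.304, 3.771];

// Ordinary least-squares fit y = slope * x + intercept with the
// coefficient of determination r2.
function linearFit(xs: number[], ys: number[]) {
    const n = xs.length;
    const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
    const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
    let sxy = 0;
    let sxx = 0;
    let syy = 0;
    for (let i = 0; i < n; i++) {
        const dx = xs[i] - meanX;
        const dy = ys[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    const slope = sxy / sxx;
    return { slope, intercept: meanY - slope * meanX, r2: (sxy * sxy) / (sxx * syy) };
}

const linesFit = linearFit(atomCounts, linesSeconds);
```

For these data the slope comes out at roughly 0.04 ms per atom with a coefficient of determination close to 1, which is consistent with linear growth.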

10.2 Frame Rate

The following frame rates are averages of the rates measured while rotating the model (with the whole model visible) for 10 seconds. The browser's requestAnimationFrame takes the display refresh rate into account, hence the upper limit of 60 frames per second.


Frame Rate (FPS³)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines                 60      60      60      60      60      60      60      60
Sticks (LOD 1)        60      60      60      60      60      60      60      60
Sticks (LOD 2)        60      60      60      60      60      60      60      60
Sticks (LOD 3)        60      60      60      60      60      60       -       -
Sticks (LOD 4)        60      60      60      60      60       -       -       -
Sticks (LOD 5)        60      60      60      60       -       -       -       -
Cartoon               60      60      60      60      60      60      60      60
Charges               60      60      60      60      60       -       -       -

Table 10.5: Frames per second in Chrome 34 using configuration A.

Frame Rate (FPS⁴)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines                 60      60      60      60      60      60      57      54
Sticks (LOD 1)        60      60      60      60      57      58      39      34
Sticks (LOD 2)        60      60      60      60      32      30      24      17
Sticks (LOD 3)        60      60      60      49      17      11       -       -
Sticks (LOD 4)        60      60      60      34      12       -       -       -
Sticks (LOD 5)        60      60      35      11       -       -       -       -
Cartoon               60      60      60      60      47      49      46      51
Charges               60      60      60      35      14       -       -       -

Table 10.6: Frames per second in Chrome 34 using configuration B.

Overall, the frame rate clearly depends on model size and GPU performance. However, the WebGL implementation in browsers is sufficiently efficient to allow displaying even very large models on modest (i.e. mobile) graphics cards.
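An average frame rate of the kind reported in this section can be derived from the timestamps that the browser passes to successive requestAnimationFrame callbacks. The sketch below shows the calculation only; the function name averageFps is an assumption, and the thesis does not state how exactly the measurement was implemented.

```typescript
// Average frame rate from the timestamps (in milliseconds) of successive
// animation frames. N timestamps delimit N - 1 frame intervals.
function averageFps(timestampsMs: number[]): number {
    if (timestampsMs.length < 2) {
        return 0;
    }
    const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
    return ((timestampsMs.length - 1) * 1000) / elapsedMs;
}

// In a browser the timestamps could be collected like this:
//   const stamps: number[] = [];
//   const tick = (now: number) => { stamps.push(now); requestAnimationFrame(tick); };
//   requestAnimationFrame(tick);
```

For instance, five timestamps spaced 50 ms apart delimit four frames in 200 ms, which corresponds to 20 frames per second.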

10.3 Memory Consumption

Each molecule visualization consists of vertices (points in 3D space) that define either lines⁵ or triangle faces⁶. Furthermore, each vertex

5. Two vertices denote line ends. 6. Three vertices specify triangle corners.

holds information about its color and (in the case of vertices specifying triangle faces) its normal vector⁷. The following tables present the vertex count and the estimated⁸ memory consumption of particular molecule visualizations.

Vertices (thousands)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines                  6      21      42      81     167     261     326     397
Sticks (LOD 1)       116     378     752    1465    3014    4698    5870    7146
Sticks (LOD 2)       349    1133    2257    4394    9043   14095   17609   21439
Sticks (LOD 3)       872    2833    5643   10990   22609   35237       -       -
Sticks (LOD 4)      1221    3966    7900   15386   31652       -       -       -
Sticks (LOD 5)      2093    6799   13543   26376       -       -       -       -
Cartoon               90     231     780    1518    2784    2247    2846    1968
Charges             1200    3458    7467   13765   29524       -       -       -

Table 10.7: Thousands of vertices defining the visualization.

Memory (MB)
Molecule            2BBA    2RFK    3QS8    4KZG    4GF5    3J5X    4CR3    1JJ2
Atom Count          1805    5009   11045   20082   43542   60539   80172   98573
Lines               0.15    0.48    0.96    1.86    3.83    5.97    7.46    9.08
Sticks (LOD 1)      3.99   12.97   25.83   50.31  103.49  161.30  201.51  245.35
Sticks (LOD 2)     11.98   38.90   77.49  150.86  310.48  483.91  604.54  736.06
Sticks (LOD 3)     29.94   97.26  193.74  377.31  776.21 1209.77       -       -
Sticks (LOD 4)     41.92  136.16  271.23  528.24 1086.69       -       -       -
Sticks (LOD 5)     71.86  233.42  464.97  905.55       -       -       -       -
Cartoon             3.09    7.93   26.79   52.13   95.59   77.12   97.71   67.58
Charges            41.19  118.73  256.36  472.57 1013.64       -       -       -

Table 10.8: Typed arrays’ size of the visualizations.
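The values of table 10.8 can be approximated from table 10.7 with simple arithmetic. The sketch below assumes that 1 MB in the table means 2^20 bytes and that lines mode stores six numbers per vertex (position and color, no normal vector); both assumptions are inferred from the tables rather than stated in the text, and the function name is illustrative.

```typescript
const BYTES_PER_FLOAT32 = 4;

// Size of the vertex data in MB (assumed here to mean 2^20 bytes).
function modelMegabytes(vertexCount: number, floatsPerVertex: number): number {
    return (vertexCount * floatsPerVertex * BYTES_PER_FLOAT32) / (1024 * 1024);
}

// 1JJ2, sticks (LOD 1): about 7146 thousand vertices, nine numbers each.
const sticksMb = modelMegabytes(7146000, 9);

// 1JJ2, lines: about 397 thousand vertices, six numbers each (assumption).
const linesMb = modelMegabytes(397000, 6);
```

Both results agree with the tabulated 245.35 MB and 9.08 MB to within the rounding of the vertex counts to thousands.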

7. The shader uses the normal vector of a vertex to calculate shading. 8. A vertex is defined by nine 32-bit numbers: 3D position, RGB color and 3D normal vector (except in lines mode). The vertex data are stored in a typed array (see 3.4.2), without the additional overhead of JavaScript objects.

11 Limitations

Using WebGL GPU-accelerated graphics, even very large proteins (such as 1JJ2, consisting of 98573 atoms and 99256 bonds) can be displayed and viewed smoothly on a modest hardware configuration. Nevertheless, the preceding performance tests hint at some limitations of the presented solution.

11.1 Memory

The primary encountered limitation concerns memory. Although a JavaScript application can in theory use all the memory the system has to offer, in practice the script becomes unresponsive much sooner due to internal restrictions of the particular JavaScript engine. The precise upper limit on memory usage depends on the browser, OS and hardware configuration; throughout the testing, scripts consistently became unresponsive when memory consumption exceeded 1.3 GB in Chrome and 1.05 GB in Firefox.

On the conservative assumption that the model itself can occupy at most 1 GB, and since each vertex is given by nine 32-bit numbers (see 10.3), approximately 7 457 000 bonds can be displayed in lines mode and 414 000 bonds in sticks mode (level of detail 1).
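The two figures follow from dividing the memory budget by the size of one bond. The sketch below assumes 1 GB = 2^30 bytes and takes the number of vertices per bond (4 in lines mode, 72 in sticks mode at level of detail 1) from the ratio of vertices to bonds for 1JJ2 in table 10.7; these inputs are inferred and not stated explicitly in the text.

```typescript
// Largest number of bonds whose vertex data fit into the given budget.
function maxBonds(budgetBytes: number, verticesPerBond: number, floatsPerVertex: number): number {
    const bytesPerBond = verticesPerBond * floatsPerVertex * 4; // 32-bit numbers
    return Math.floor(budgetBytes / bytesPerBond);
}

const ONE_GB = 1024 * 1024 * 1024; // assumed to mean 2^30 bytes

// Vertices per bond inferred from table 10.7 (397k and 7146k vertices
// for the 99256 bonds of 1JJ2).
const maxLinesBonds = maxBonds(ONE_GB, 4, 9);   // 7456540
const maxSticksBonds = maxBonds(ONE_GB, 72, 9); // 414252
```

Rounded to thousands, the results match the limits of 7 457 000 and 414 000 bonds given above.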

11.2 Mesh Generation Time

Another difficulty of displaying large molecules comes from the fundamentally linear-time generation of the visualization mesh. Creating large geometries can take more than ten seconds (see 10.1). Although the generation itself runs in a background thread, other potentially time-consuming tasks must be executed in the main thread¹, causing a momentarily unresponsive GUI.

1. The background thread (Web Worker) operates in separate context — it can’t for instance manipulate the WebGL context.

11.3 Highlighting

LiveMol supports selecting (highlighting) the elements of a molecule using the mouse. To that end, it casts a ray given by the positions of the camera and the mouse, searches for a collision of that ray with the molecule geometry and interprets it using pregenerated maps expressing the relationships between individual geometry faces and chemical items (see 9.4.2).

The currently utilized implementation of raycasting (the one supplied by Three.js) unfortunately runs in linear time. Therefore, the performance of the highlighting feature is satisfactory² only in the case of relatively small molecules/models (hundreds of thousands of vertices).

Upon a future implementation of the raycasting algorithm using a multidimensional data structure such as an octree [61], which enables element location in logarithmic time, the performance is expected to improve significantly.
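The linear cost is easy to see in a straightforward implementation: every face has to be tested against the ray. The sketch below uses the well-known Möller–Trumbore ray-triangle test inside a linear scan; it is a generic illustration with assumed names and makes no claim about how Three.js implements raycasting internally.

```typescript
// Generic illustration; not the Three.js implementation.
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
];

// Möller–Trumbore test: distance along the ray to the triangle, or null on a miss.
function rayTriangle(origin: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3): number | null {
    const e1 = sub(v1, v0);
    const e2 = sub(v2, v0);
    const p = cross(dir, e2);
    const det = dot(e1, p);
    if (Math.abs(det) < 1e-12) return null; // ray parallel to the triangle
    const inv = 1 / det;
    const s = sub(origin, v0);
    const u = dot(s, p) * inv;
    if (u < 0 || u > 1) return null;
    const q = cross(s, e1);
    const v = dot(dir, q) * inv;
    if (v < 0 || u + v > 1) return null;
    const t = dot(e2, q) * inv;
    return t > 0 ? t : null;
}

// Linear scan over all faces: returns the index of the nearest hit face or -1.
function pickFace(origin: Vec3, dir: Vec3, faces: [Vec3, Vec3, Vec3][]): number {
    let best = -1;
    let bestT = Infinity;
    faces.forEach((f, i) => {
        const t = rayTriangle(origin, dir, f[0], f[1], f[2]);
        if (t !== null && t < bestT) {
            bestT = t;
            best = i;
        }
    });
    return best;
}
```

The returned face index would then be translated to an atom, bond or residue through the pregenerated maps; a spatial structure such as an octree would replace only the linear scan in pickFace.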

2. Running in a small fraction of second.

12 Potential Extensions

12.1 Additional Visualization Modes

LiveMol implements only a fraction of the possible ways to visualize a protein or a molecule in general. The most notable omissions are the van der Waals and ribbon methods (see 5). Beyond these, various other visualization methods could depict further aspects of a molecule.

12.2 Complex Shaders

Due to performance considerations, the library uses only simple (and fast) shaders (see 9.2.3). While such an approach results in high performance (i.e. smooth visualization of large molecules), more complex shaders would enable more advanced visualization features (at least for small molecules) such as sheen, transparency or textures.

12.3 Non-interactive Rendering

LiveMol provides interactive (GPU-accelerated) molecule visualization. Although somewhat beyond the scope of the library, advanced visualization applications often offer an optional non-interactive rendering mode producing high-quality pictures¹ or animations. Forgoing the interactivity requirement (i.e. having to render tens of frames per second) allows a completely different approach: rendering exceptionally detailed images of arbitrary resolution. Moreover, since the non-interactive rendering engine can use the CPU (Central Processing Unit) instead of the GPU (and because the CPU lacks the restrictions of the GPU), it can apply advanced rendering techniques such as ray-traced lighting or ambient occlusion.

1. The pictures presented in chapter 5, for instance, were rendered using PyMOL's non-interactive mode.

PART VI

CONCLUSION

13 Conclusion

The primary goal of the thesis was to develop an interactive molecular visualization library/engine. To enhance user experience and portability, the engine is browser-based and does not depend on any third-party plug-ins. Additionally, it needs to be lightweight and easily embeddable into other web applications. Due to the potential size of the visualized molecules, the main consideration when designing the library was its performance.

In order to fulfill the requirements (e.g. portability, performance), the implemented library (called LiveMol) utilizes WebGL, a widely adopted and sufficiently efficient way to display GPU-accelerated graphics in a web browser.

The thesis outlines alternative software solutions and relevant technologies. It briefly introduces the chemical background, currently popular methods of visualizing molecules (especially proteins) and essential mathematical concepts. Part of the thesis gives an account of the overall design and interesting features of the library. Finally, various performance tests are presented, and the limits of the proposed solution as well as its potential extensions are analyzed.

The resultant LiveMol library implements four commonly used visualization modes but emphasizes extensibility: adding a new mode is straightforward, and custom themes allow users to modify visualization colors at will. The performance tests show that LiveMol can display large molecules in adequate quality.

PART VII

APPENDIX

A Compilation

The source files of LiveMol are sorted into two categories: the ones forming the core library (LiveMol_core folder) and the Web Worker scripts (LiveMol_workers folder). The core files should be compiled into one JavaScript file, while the Web Worker scripts must be compiled into separate files.

The jQuery and Three.js definition files (in the lib folder) are required for successful compilation; the folder structure should be as follows:

• LiveMol
    – lib
    – LiveMol_core
    – LiveMol_workers

A.1 Command-line TypeScript Compiler

A command-line TypeScript compiler (tsc) is available at [62] as a Node.js [63] package.

In order to concatenate the output files into a single one, we can use the --out option; its argument is the name of the output file. Another convenient option is --removeComments (discards the comments from the original TypeScript files). For the full list of tsc options, see [64].

Therefore, the command to compile the core library (called from the LiveMol_core folder) into LiveMol.js is as follows:

tsc --removeComments --out LiveMol.js Atom.ts Bond.ts Element.ts Events.ts Molecule.ts MoleculeDrawing.ts Residue.ts Ribbon.ts Scene.ts Shaders.ts Structures.ts Themes.ts

More generally, to compile all TypeScript files in the current folder into LiveMol.js:

dir *.ts /b /s > ts-files.txt
tsc --out LiveMol.js @ts-files.txt
del ts-files.txt

The separate Web Worker scripts (in the LiveMol_workers folder) can be compiled using the following commands¹:

tsc --removeComments CartoonWorker.ts
tsc --removeComments LinesWorker.ts
tsc --removeComments MeshWorker.ts
tsc --removeComments SticksWorker.ts
tsc --removeComments StructureWorker.ts

A.2 Visual Studio 2013

The current version of the TypeScript plug-in for Visual Studio 2013 (Update 2; available at [62]) unfortunately doesn't allow the developer to compile the TypeScript files in the required way (i.e. combine some, leave others separate). One can nevertheless use the post-build events to execute essentially the same commands as those mentioned previously (they differ only in using the Visual Studio path variables).

To compile the core library:

dir "$(SolutionDir)\...\*.ts" /b /s > ts-files.txt
tsc --out "$(SolutionDir)\...\LiveMol.js" @ts-files.txt
del ts-files.txt

To compile the Web Worker scripts:

tsc "$(SolutionDir)\...\CartoonWorker.ts"
tsc "$(SolutionDir)\...\LinesWorker.ts"
tsc "$(SolutionDir)\...\MeshWorker.ts"
tsc "$(SolutionDir)\...\SticksWorker.ts"
tsc "$(SolutionDir)\...\StructureWorker.ts"

1. Alternatively, the command for compiling all files in a folder can be used, omitting the --out option.

B Attached Files

• Source code of LiveMol in LiveMol_source

    – Core library files in LiveMol_source/LiveMol_core
    – Web Workers in LiveMol_source/LiveMol_workers
    – TypeScript definition files of essential libraries in lib

• Compiled core library and Web Worker scripts in bin

• Necessary JavaScript libraries in lib_js

• VS 2013 project of sample application in LiveMol_example

• Text of the thesis, LaTeX source files and pictures in thesis

Bibliography

[1] Theoretical and Computational Biophysics Group. VMD Soft- ware. http://www.ks.uiuc.edu/Research/vmd, 2013. [Online; accessed 18-April-2014]. [2] Theoretical and Computational Biophysics Group. University of Illinois Open Source License. http://www.ks.uiuc.edu/ Research/vmd/plugins/pluginlicense.html, 2006. [Online; accessed 19-April-2014]. [3] Theoretical and Computational Biophysics Group. VMD Plug- ins. http://www.ks.uiuc.edu/Research/vmd/plugins, 2013. [Online; accessed 19-April-2014]. [4] Inc. Schrodinger. PyMOL. http://www.pymol.org, 2014. [Online; accessed 27-April-2014]. [5] Python Software Foundation. Python License. https:// docs.python.org/2/license.html, 2014. [Online; ac- cessed 19-April-2014]. [6] David Sehnal, Radka Svobodová Vaˇreková, Karel Berka, Lukáš Pravda, Veronika Navrátilová, Pavel Banáš, Crina-Maria Ionescu, Michal Otyepka, and Jaroslav Koˇca. Mole 2.0: ad- vanced approach for analysis of biomacromolecular channels. Journal of Cheminformatics, 5(39), 2013. [7] Bernstein and Sons. RasMol and OpenRasMol. http:// .org, 2009. [Online; accessed 17-April-2014]. [8] Bernstein and Sons. RasMol License. https://gnu.org/ licenses/gpl.html, 2005. [Online; accessed 17-April-2014]. [9] Free Software Foundation. GNU General Public License. https://gnu.org/licenses/gpl.html, 2007. [Online; ac- cessed 16-April-2014]. [10] Hans-Peter Lenhof, Oliver Kohlbacher, and Andreas Hilde- brandt. BALL. http://www.ball-project.org, 2014. [On- line; accessed 29-April-2014].


[11] Free Software Foundation. GNU Lesser General Public License. https://www.gnu.org/licenses/lgpl.html, 2007. [Online; accessed 19-April-2014].

[12] Abdul-Rahman Allouche. Gabedit. http://gabedit.sourceforge.net/, 2011. [Online; accessed 20-April-2014].

[13] Abdul-Rahman Allouche. License for Gabedit. https://sites.google.com/site/allouchear/Home/gabedit/license, 2011. [Online; accessed 17-April-2014].

[14] Marco Tarini and Paolo Cignoni. QuteMol. http://qutemol.sourceforge.net, 2007. [Online; accessed 17-April-2014].

[15] Marcus Hanwell et al. Avogadro. http://avogadro.cc/wiki/Main_Page, 2014. [Online; accessed 17-April-2014].

[16] Jmol development team. Jmol. http://jmol.sourceforge.net, 2014. [Online; accessed 19-April-2014].

[17] iChemLabs. ChemDoodle. http://www.chemdoodle.com, 2014. [Online; accessed 21-April-2014].

[18] Khronos Group. WebGL Specification. http://www.khronos.org/registry/webgl/specs/latest/1.0, 2014. [Online; accessed 20-April-2014].

[19] Web Hypertext Application Technology Working Group. HTML Living Standard. http://www.whatwg.org/specs/web-apps/current-work/#is-this-html5, 2014. [Online; accessed 20-April-2014].

[20] World Wide Web Consortium. HTML5 Candidate Recommendation. http://www.w3.org/TR/html5, 2014. [Online; accessed 20-April-2014].

[21] Khronos Group. Khronos Group. http://www.khronos.org, 2014. [Online; accessed 29-April-2014].

[22] Khronos Group. OpenGL ES Common Profile Specification. http://www.khronos.org/registry/gles/specs/2.0/es_full_spec_2.0.25.pdf, 2010. [Online; accessed 20-April-2014].

[23] Diego Cantor and Brandon Jones. WebGL Beginner’s Guide. Packt Publishing, Birmingham, 2012.

[24] Kouichi Matsuda and Rodger Lea. WebGL Programming Guide: Interactive 3D Graphics Programming with WebGL. Addison-Wesley Professional, Boston, 2013.

[25] Microsoft Security Response Center. WebGL Considered Harmful. http://blogs.technet.com/b/srd/archive/2011/06/16/webgl-considered-harmful.aspx, 2011. [Online; accessed 20-April-2014].

[26] James Forshaw, Paul Stone, and Michael Jordon. WebGL - More WebGL Security Flaws. http://www.contextis.com/research/blog/webgl-more-webgl-security-flaws/, 2011. [Online; accessed 20-April-2014].

[27] Khronos Group. WebGL Security. http://www.khronos.org/webgl/security, 2013. [Online; accessed 21-April-2014].

[28] World Wide Web Consortium. Timing Control for Script-based Animations. http://www.w3.org/TR/animation-timing, 2013. [Online; accessed 16-April-2014].

[29] Khronos Group. Typed Array Specification. http://www.khronos.org/registry/typedarray/specs/latest, 2013. [Online; accessed 9-April-2014].

[30] World Wide Web Consortium. Web Workers. http://www.w3.org/TR/workers, 2012. [Online; accessed 15-April-2014].

[31] Wolfgang Demtröder. Atoms, Molecules and Photons: An Introduction to Atomic-, Molecular- and Quantum Physics. Springer, Berlin, first edition, 2002.


[32] John McMurry. Organic Chemistry. Cengage Learning, Stamford, eighth edition, 2011.

[33] Robert Corey and Linus Pauling. Molecular models of amino acids, peptides, and proteins. Review of Scientific Instruments, 24(8):621–627, 1953.

[34] Jason Yano, Michael Wester, Guillaume Schoch, Keith Griffin, David Stout, and Eric Johnson. The Structure of Human Microsomal Cytochrome P450 3A4 Determined by X-ray Crystallography. http://www.jbc.org/content/279/37/38091.full, 2004. [Online; accessed 25-April-2014].

[35] Jane S. Richardson. Schematic drawings of protein structures. Methods in Enzymology, (115):359–380, 1985.

[36] InterKnowlogy. 3D Molecule Viewer. http://3dmoleculeviewer.codeplex.com, 2008. [Online; accessed 5-May-2014].

[37] Serge Lang. Linear algebra. Springer-Verlag, New York, first edition, 1987.

[38] Paul Halmos. Finite-dimensional vector spaces. Springer-Verlag, New York, first edition, 1974.

[39] Marcel Berger. Geometry I. Springer, Berlin, first edition, 1987.

[40] Philip Schneider and David Eberly. Geometric Tools for Computer Graphics. Morgan Kaufmann, San Francisco, first edition, 2003.

[41] Gerald Farin. Curves and surfaces for computer-aided geometric design. Elsevier Science and Technology Books, Amsterdam, fourth edition, 1997.

[42] Ph. Barry. Encyclopedia of Mathematics: Bézier curve. http://www.encyclopediaofmath.org/index.php/B%C3%A9zier_curve, 2012. [Online; accessed 29-April-2014].


[43] Neil Dodgson. Bézier Curves. https://www.cl.cam.ac.uk/teaching/2000/AGraphHCI/SMEG/node3.html, 2000. [Online; accessed 29-April-2014].

[44] Brian Barsky and Tony DeRose. Geometric continuity of parametric curves: Three equivalent characterizations. Computer Graphics and Applications, 9(6):60–68, 1989.

[45] Serge Lang. Undergraduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1997.

[46] Andrew Hanson. Quaternion Frenet Frames: Making Optimal Tubes and Ribbons from Curves. http://www.cs.indiana.edu/pub/techreports/TR407.pdf, 2007. [Online; accessed 29-April-2014].

[47] E.L. Evtushik. Encyclopedia of Mathematics: Moving-frame method. http://www.encyclopediaofmath.org/index.php?title=Moving-frame_method&oldid=17828, 2011. [Online; accessed 29-April-2014].

[48] Microsoft Corporation. TypeScript Language Specification Version 1.0. http://www.typescriptlang.org/Content/TypeScript%20Language%20Specification.pdf, 2014. [Online; accessed 1-May-2014].

[49] The Apache Software Foundation. Apache License, Version 2.0. http://www.apache.org/licenses/LICENSE-2.0.html, 2004. [Online; accessed 29-April-2014].

[50] Ecma International. ECMAScript Language Specification 5.1. http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf, 2011. [Online; accessed 29-April-2014].

[51] Ecma International. Draft Specification for Ecma-262 Edition 6. http://wiki.ecmascript.org/doku.php?id=harmony:specification_drafts, 2014. [Online; accessed 29-April-2014].


[52] Jos Dirksen. Learning Three.js: The JavaScript 3D Library for WebGL. Packt Publishing, Birmingham, 2013.

[53] Ricardo Cabello. Three.js License. https://github.com/mrdoob/three.js/blob/master/LICENSE, 2014. [Online; accessed 1-May-2014].

[54] The jQuery Foundation. jQuery License. https://jquery.org/license, 2014. [Online; accessed 30-April-2014].

[55] Jonathan Chaffer and Karl Swedberg. Learning jQuery - Fourth Edition. Packt Publishing, Birmingham, 2013.

[56] Mozilla Developer Network. Ajax. https://developer.mozilla.org/en-US/docs/AJAX, 2014. [Online; accessed 30-April-2014].

[57] Jake Archibald. JavaScript Promises. http://www.html5rocks.com/en/tutorials/es6/promises, 2014. [Online; accessed 30-April-2014].

[58] development team. OpenBabel Supported Formats. http://openbabel.org/wiki/Category:Formats, 2006. [Online; accessed 1-May-2014].

[59] B. T. Phong. Illumination for computer generated pictures. Communications of ACM, 18(6):311–317, 1975.

[60] Karel Berka, Ondřej Hanák, David Sehnal, Pavel Banáš, Veronika Navrátilová, Deepti Jaiswal, Crina-Maria Ionescu, Radka Svobodová Vařeková, Jaroslav Koča, and Michal Otyepka. Moleonline 2.0: interactive web-based analysis of biomacromolecular channels. Nucleic Acids Research, 40(Web Server issue):W222–W227, 2012.

[61] Hanan Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco, first edition, 2006.

[62] Microsoft Corporation. TypeScript. http://www.typescriptlang.org, 2014. [Online; accessed 10-May-2014].


[63] Joyent Inc. Node.js. http://nodejs.org, 2014. [Online; accessed 10-May-2014].

[64] Joyent Inc. TypeScript Compiler. https://www.npmjs.org/package/typescript-compiler, 2014. [Online; accessed 10-May-2014].
