Engine for Molecule Visualization in a Web Browser
MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY

Engine for Molecule Visualization in a Web Browser

MASTER’S THESIS

Jaromír Svoboda

Brno, spring 2014

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Jaromír Svoboda

Advisor: RNDr. David Sehnal

Acknowledgement

I would like to thank my supervisor RNDr. David Sehnal for patient guidance and informed advice throughout writing this thesis.

Abstract

The main focus of this master’s thesis is the design and implementation of a lightweight molecular visualization engine (called LiveMol) in the form of a JavaScript library. The engine utilizes the widely adopted WebGL API to display GPU-accelerated graphics in web browsers. Due to the size of complex protein molecules, the primary goal is high performance. Furthermore, the design of LiveMol enables users to extend the core functionality by simply defining custom coloring schemes of molecule models or implementing completely new visualization modes.

Keywords

molecular visualization, protein, secondary structure, WebGL, LiveMol

Contents

I INTRODUCTION
1 Introduction

II CURRENT DEVELOPMENTS
2 Current Molecular Visualization Software
  2.1 Desktop Applications
    2.1.1 Visual Molecular Dynamics
    2.1.2 PyMOL
    2.1.3 RasMol/OpenRasMol
    2.1.4 BALL/BALLView
    2.1.5 Gabedit
    2.1.6 QuteMol
    2.1.7 Avogadro
  2.2 JavaScript-based Web Applications
    2.2.1 Jmol/JSmol
    2.2.2 ChemDoodle
3 WebGL and Related Technologies
  3.1 HTML5
  3.2 OpenGL ES 2.0
  3.3 WebGL
    3.3.1 Security
  3.4 Relevant JavaScript Features
    3.4.1 RequestAnimationFrame
    3.4.2 Typed Arrays
    3.4.3 Web Workers

III THEORY
4 Chemical Background
  4.1 Molecule Structure
    Atom
    Bond
    Molecule
    Amino Acid
    Protein
  4.2 Protein Structure
    Primary Structure
    Secondary Structure
    Tertiary Structure
    Quaternary Structure
5 Molecular Visualization Methods
  5.1 Van der Waals Method
  5.2 Balls and Sticks
  5.3 Sticks
  5.4 Lines
  5.5 Ribbon
  5.6 Cartoon
6 Transformations
  6.1 Linear Transformation
    6.1.1 Vector Space
    6.1.2 Linear Map
  6.2 Affine Transformation
    6.2.1 Affine Space
    6.2.2 Affine Map
      Essential 3D Matrix Transformations
7 Splines
  7.1 Bézier Curve
  7.2 Bézier Spline
  7.3 Continuity
  7.4 Frenet-Serret formulas

IV IMPLEMENTATION
8 Tools
  8.1 TypeScript
  8.2 Three.js
  8.3 jQuery
9 Design
  9.1 Input Format
    Atoms
    Bonds
    Residues
    Helices
    Sheets
  9.2 Internal Architecture
    9.2.1 General Design
    9.2.2 Asynchronous Model Generation
    9.2.3 Materials and Shaders
  9.3 Visualization Modes
  9.4 Extensibility
    9.4.1 Theme
    9.4.2 Visualization Mode
    9.4.3 Extension Example: Tunnels Visualization

V RESULTS AND DISCUSSION
10 Performance Tests
  10.1 Model Generation
  10.2 Frame Rate
  10.3 Memory Consumption
11 Limitations
  11.1 Memory
  11.2 Mesh Generation Time
  11.3 Highlighting
12 Potential Extensions
  12.1 Additional Visualization Modes
  12.2 Complex Shaders
  12.3 Non-interactive Rendering

VI CONCLUSION
13 Conclusion

VII APPENDIX
A Compilation
  A.1 Command-line TypeScript Compiler
  A.2 Visual Studio 2013
B Attached Files

PART I
INTRODUCTION

1 Introduction

The considerable technological advancements of the last decades enabled completely new approaches to many problems and gave rise to several scientific disciplines, many of them using information technology to store, retrieve, organize and analyze information in unprecedented ways. Bioinformatics, one of the most successful such fields, applies these methods to biological data, primarily molecular-level structures such as proteins.

A major achievement in biology, facilitated by technological progress, concerns experimentally determining (and making publicly accessible) the structure of many proteins, nucleic acids, etc. For instance, the Protein Data Bank accumulates a wealth of such data: more than 100 000 structures at the time of writing of this thesis. Since the molecular structure is typically characterized by coordinates of individual atoms, human comprehension requires some kind of visualization.

The thesis aims to design and develop a lightweight molecular (especially protein) visualization engine called LiveMol, capable of displaying molecular data as well as calculated protein features such as tunnels. Furthermore, it is required to run in major web browsers (without any third-party plug-ins) and needs to be sufficiently efficient to process even very large proteins. To that end, the engine utilizes WebGL (Web Graphics Library), whose recent widespread adoption and outstanding performance make it a viable option.

The first part introduces current molecular visualization software: the traditional desktop applications as well as the web-based alternatives. An overview of key technologies such as JavaScript and WebGL follows.

In the theoretical part, I briefly mention the chemical and biological components the library works with, various visualization methods and relevant math concepts.

The chapters concerning implementation present essential tools used throughout the development and the library design, including features such as implemented visualization modes or extensibility.

The final part gives an account of the engine's performance, its limitations and potential extensions.

PART II
CURRENT DEVELOPMENTS

2 Current Molecular Visualization Software

In the following sections I present a brief overview of the most popular current applications used for viewing, analyzing or modifying molecular structures.

2.1 Desktop Applications

Unless noted otherwise, the applications mentioned below support all major desktop platforms (Windows, Linux and Mac OS X).

2.1.1 Visual Molecular Dynamics

Often abbreviated as VMD, Visual Molecular Dynamics [1] is among the most advanced programs. It is distributed free of charge and open-sourced under the UIUC Open Source License [2].
VMD can process a wide variety of formats (more than 60 of them) thanks to numerous built-in parsing plug-ins, and it provides many visualization modes along with a collection of structure manipulation and visualization tools.

The authors provide a guide to the software architecture and source code, facilitating the creation of additional modules (VMD is written in C++) and plug-ins (using Tcl/Tk and Python). Besides adding support for other file formats, users’ plug-ins can implement new user interface features, molecule modifications, visualization methods, analyses and simulations. An extensive set of plug-ins is available at [3].

2.1.2 PyMOL

Despite its somewhat unintuitive GUI (Graphical User Interface), PyMOL [4] is perhaps the most popular molecular visualization software. The source code is available under the Python License [5]; executables are free for students and teachers. Other users can either purchase a commercial subscription (including support) or compile an executable from the source code. It primarily generates high-quality images and animations but also features many tools for analysis and editing of molecules. The popularity also stems from the extensibility of the program: although the application is written mostly in C/C++, the embedded Python interpreter allows for easy scripting and plug-in creation. A multitude of third-party plug-ins exist; some of the most notable ones are CAVER, GPSSpyMOL and MOLE 2.0 [6].

2.1.3 RasMol/OpenRasMol

Originally developed in 1995, RasMol [7] pioneered molecular visualization software on the desktop. The last version (2009) offers all standard visualization modes and is available under either the terms of the RASMOL license [8] or the GNU General Public License [9].
2.1.4 BALL/BALLView

The BALL [10] (Biochemical Algorithms Library) C++ framework consists of various algorithms, data structures and classes useful for developing biochemical applications: file import/export, analysis, modification, visualization, a Python scripting interface, etc. BALLView utilizes