Computational Environment for Cellular Imaging By

COMPUTATIONAL ENVIRONMENT FOR CELLULAR IMAGING BY FLUORESCENCE MICROSCOPY GERNOT STOCKER DOCTORAL THESIS Graz University of Technology Institute for Genomics and Bioinformatics Petersgasse 14, 8010 Graz, Austria Graz, February 2007 Abstract In the postgenomic era, high-throughput screening methods have identified a large number of genes which still lack functional characterization. Fluorescence microscopy can be used to localize labeled proteins from those genes in determining their functional role in cellular processes. In order to perform a systematic analysis of the generated microscopy images, structured storage of the data and organized documentation of its experiments are inevitable. To address this problem a platform independent computational environment for handling microscopy data, named Scientific MIcroscopy Laboratory Environment (SMILE), was developed. The project oriented workflow of a typical experiment guides the design of the software for data storage, data retrieval and consistent chronological documentation. This approach inher- ently leads to structured recording of the experiment, conserving the complete experimental context of data records at any time. State-of-the-art software technology was used to imple- ment a multi-tiered J2EE application offering an intuitive Web-based user interface. The SMILE application back-end can be accessed through a generic programming interface and is open for integration of external imaging tools, such as ImageJ. Computationally intensive tasks for server-side image processing and analysis are transparently delegated to an adequate high performance computing infrastructure, and results are stored within the underlying relational database management system. The SMILE computational environment offers researchers an integrative software platform for scientific collaboration, while hiding the necessary complexity of data management and IT infrastructure. Because of its modular and flexible architecture, SMILE is ready to face the future challenges of microscopy data management. Keywords: microscopy, experiment workflow, SMILE database, J2EE, high performance computing i Publications This thesis was based on the following publications, as well as upon unpublished work: Papers J. Rainer, F. Sanchez-Cabo, G. Stocker, A. Sturn, Z. Trajanoski. CARMAweb: comprehensive R- and BioConductor-based Web service for microarray data analysis. Nucleic Acids Res. 34: W498-W503 (2006) PMID: 16845058 C. Vogl, F. Sanchez-Cabo, G. Stocker, S. Hubbard, O. Wolkenhauer, Z. Trajanoski. A fully Bayesian model to cluster gene-expression profiles. Bioinformatics. 21 Suppl 2: ii130-ii136 (2005) PMID: 16204092 M. Maurer, R. Molidor, A. Sturn, J. Hartler, H. Hackl, G. Stocker, A. Prokesch, M. Scheideler, Z. Trajanoski. MARS: microarray analysis, retrieval, and storage system. BMC Bioinformatics. 6: 101-101 (2005) PMID: 15836795 H. Hackl, M. Maurer, B. Mlecnik, J. Hartler, G. Stocker, D. Miranda-Saavedra, Z. Trajanoski. GOLDdb: Genomics of lipid-associated disorders database. BMC Genomics. 5: 93-93 (2004) PMID: 15588328 G. Stocker, D. Rieder, Z. Trajanoski. ClusterControl: a Web interface for distributing and moni- toring bioinformatics applications on a Linux cluster. Bioinformatics. 20: 805-807 (2004) PMID: 14751976 G. G. Thallinger, S. Trajanoski, G. Stocker, Z. Trajanoski. Information management systems for pharmacogenomics. Pharmacogenomics. 3: 651-667 (2002) PMID: 12223050 Conference proceedings and poster presentations J. Hartler, G. G. Thallinger, G. Stocker, A. Sturn, T. Burkard, E. Korner,¨ K. Mechtler, Z. Tra- janoski, Z. Management and Analysis of Proteomics LC-MS/MS Data. Fourth International Symposium of the Austrian Proteomics Platform, Seefeld in Tirol. 2007 Jan 28. J. Hartler, G. G. Thallinger, G. Stocker, A. Sturn, T. Burkard, E. Korner,¨ T. Fuchs, K. Mechtler, Z. Trajanoski, Z. MASPECTRAS: Web-based System for Storage, Retrieval, Quantification and Analysis of Proteomic LC MS/MS Data. Third International Symposium of the Austrian Pro- ii COMPUTATIONAL ENVIRONMENT FOR CELLULAR IMAGING BY FLUORESCENCE MICROSCOPY teomics Platform, Seefeld in Tirol. 2006 Jan 16. J. Hartler, G. G. Thallinger, G. Stocker, A. Sturn, T. Burkard, E Korner,¨ T. Fuchs, K. Mechtler, Z. Trajanoski. MASPECTRAS: Web-based System for Storage, Retrieval, and Analysis of Pro- teomic LC MS/MS Data. HUPO 4th Annual World Congress, Munich. 2005 Aug 29. F. Sanchez-Cabo, H. Hackl, S. Hubbard, G. Stocker, Z. Trajanoski, O. Wolkenhauer, C. Vogl. A Fully Bayesian Model to Cluster Gene Expression Profiles. ISMB/ECCB 2004, Glasgow. 2004 Jul 31- Aug 04. G. Stocker, J. McNally, Z. Trajanoski. SMILE: Scientific microscopy lab environment. OEGBMT Symposium fur¨ Biomedizinische Technik, Graz, Austria. 2004 Nov 12-13. H. Hackl, T. Burkard, C. Paar, R. Fiedler, A. Sturn, G. Stocker, R. M. Rubio, J. Quackenbush, A. Schleiffer, F. Eisenhaber, Z. Trajanoski. Large scale gene expression analysis and functional annotation of adipocyte differentiation. Keystone Symposia: Molecular Control of Adipogenesis and Obesity, Banff, Canada. 2004 Mar 04-10. H. Hackl, T. Burkard, C. Paar, R. Fiedler, A. Sturn, G. Stocker, R. Rubio, J. Quackenbush, A. Schleiffer, F. Eisenhaber, Z. Trajanoski. Large Scale Expression Profiling and Functional An- notation of Adipocyte Differentiation. First International Symposium of the Austrian Proteomics Platform. Seefeld, Austria. 2004 Jan 26. iii Contents 1 Introduction . 1 1.1 Workflow analysis ................................... 2 1.2 Objectives ........................................ 9 2 Results . 10 2.1 Overview ........................................ 10 2.2 Scientific MIcroscopy Laboratory Environment (SMILE) .............. 11 2.2.1 Functionality overview ............................. 11 2.2.2 Architectural design and implementation ................... 15 2.2.3 Universal back-end infrastructure ....................... 19 2.3 JClusterService ..................................... 22 2.3.1 Analysis module for SMILE .......................... 23 2.3.2 Application in comparative transcriptomics and genomics ......... 23 2.3.3 Application in proteomics ........................... 24 2.4 Production environment ................................ 25 3 Discussion . 26 4 Outlook . 30 5 Methods . 32 5.1 Computational system components .......................... 32 5.1.1 Java 2 Enterprise Edition J2EE ........................ 32 5.1.2 Enterprise Java Beans (EJB) ......................... 36 5.1.3 Spring, Java/J2EE Application Framework .................. 39 5.1.4 Relational database management system .................. 46 5.1.5 Object relational persistence with Hibernate ................. 47 iv COMPUTATIONAL ENVIRONMENT FOR CELLULAR IMAGING BY FLUORESCENCE MICROSCOPY CONTENTS 5.1.6 J2EE Web services .............................. 49 5.1.7 Tapestry, a component oriented Web framework .............. 52 5.1.8 Model Driven Architecture ........................... 56 5.1.9 AndroMDA ................................... 57 5.1.10 Software configuration management ..................... 62 5.1.11 High performance computing with Rocks Clusters Linux .......... 64 5.2 Basic overview of fluorescence microscopy ..................... 67 A Bibliography . 69 B Glossary . 83 C Acknowledgments . 86 D Publications . 87 v Chapter 1 Introduction The sequencing of the human genome [International Human Genome Sequencing Consortium, 2004; Venter et al., 2001] identified thousands of as yet uncharacterized genes. With the local- ization of their transcribed proteins in subcellular components, it is possible to draw conclusions about the functional roles of these genes [O’Rourke et al., 2005]. Due to the steady development of fluorescence microscopy, reliable tools for assessing protein locations have become available [Giepmans et al., 2006]. Experiments using such methods to address significant biological questions result in image data which must be processed, analyzed, quantified and stored over long time periods on an adequate infrastructure. The necessity for a data management system is clear, and several initiatives were started to standardize the way of describing and storing microscopy data output [Goldberg et al., 2005; Swedlow et al., 2003, 2006]. Several database systems for specific applications [Carazo et al., 1999; Gonzalez-Couto et al., 2001; Kals et al., 2005; Liang et al., 2002] have been developed, but most of these focus on a targeted experimental setup and are not universally applicable. The Open Microscopy Environment (OME) project [Swedlow et al., 2003] is the most actively developed scientific database project for universal microscopy data management, devoting no- ticeable effort into standardizing methods of image storage [Goldberg et al., 2005]. OME has in- troduced a new tiff-based file format, which contains XML formatted data about the creation and development of an image. Several microscopy software packages from industry and academia are already able to read the format and to access OME’s server back-end. Technologically, OME is offering a Perl interface, used for back-end image storage and the implementation of a data-centric lightweight Web interface. Recently, OME was extended by the J2EE-based server application OMERO, which also handles the management of image data and is accessed by a standalone Java client, named Schoola. OME is particularly focused on handling image data but lacks an easy-to-use Web interface for data acquisition during the development of experi- 1 COMPUTATIONAL ENVIRONMENT FOR CELLULAR IMAGING BY FLUORESCENCE

Computational Environment for Cellular Imaging By

Ioc Containers in Spring

Design Pattern Interview Questions

Dependency Injection with Unity

Designpatternsphp Documentation Release 1.0

JAVA CONCURRENCY ONLINE TRAINING – Java2aspire

Semantic Description of Services and Service Factories for Ambient Intelligence Rémi Emonet

Sun Certified Enterprise Architect for Java EE Study Guide / Mark Cade, Humphrey Sheil

J2EE Architecture and Patterns in Enterprise Systems

How to Reduce Costs of Business Logic Maintenance

Web Applications Engineering: Web Design Patterns and Guideline

[ Team Lib ] Crawford and Kaplan's J2EE Design Patterns Approaches

Autofac Documentation Release 6.0.0