Faculty of Physics and Applied Computer Science

Master thesis

Marek Pomocka

major: applied computer science

specialisation: computer techniques in science and technology

Data source registration in the Virtual Laboratory

Supervisor: Marian Bubak, Ph.D. Consultants: Piotr Nowakowski, M.Sc. Daniel Harężlak, M.Sc.

Cracow, September 2009

Aware of criminal liability for making untrue statements I declare that the following thesis was written personally by myself and that I did not use any sources but the ones mentioned in the dissertation itself.

Cracow, September 2009

The subject of the master thesis and the internship of Marek Pomocka, 5th-year student, major in Applied Computer Science, specialisation in computer techniques in science and technology

The subject of the Master Thesis: Data source registration in the Virtual Laboratory

Supervisor: Marian Bubak, Ph.D. Reviewer: Piotr Gronek, Ph.D. Place of the internship: Academic Computer Centre Cyfronet AGH, Cracow

Programme of the Master Thesis and the Internship

1. Discussion with the supervisor and consultants on realization of the thesis.

2. Collecting and studying the references relevant to the thesis topic.

3. The internship:

• getting to know the environment of Virtual Laboratory and the problem to be solved
• learning the necessary programming languages
• identifying project requirements and possible implementation technologies
• drafting the design
• discussion with the supervisor on the proposed design
• preparation of the Internship report.

4. Specifying detailed software requirements.

5. Prototyping possible solutions.

6. Making decisions regarding the implementation.

7. Creating complete design plan.

8. Implementing the solution.

9. Correctness tests, measuring performance and software limits.

10. Final analysis of the problem and of the extent to which the created software solves it; conclusions – discussion and final approval by the thesis supervisor.

11. Typesetting the thesis.

Dean’s office delivery deadline: 30 September 2009

Acknowledgements

I would like to express my thanks to Marian Bubak and Piotr Nowakowski for their invaluable help, guidance, advice and thoughtfulness. Furthermore, I would like to thank David and Gillian Crowther for their language help. I dedicate this thesis to my mother who was always with me.

Contents

1 Definitions, acronyms and abbreviations 11 1.1 Acronyms and abbreviations ...... 11 1.2 Definitions ...... 12

2 Introduction 17 2.1 Motivation ...... 17 2.2 Objectives ...... 23 2.3 Organization of the thesis ...... 24

3 Background 26 3.1 The GridSpace platform ...... 26 3.2 GridSpace Engine deployment ...... 32 3.3 The Virtual Laboratory ...... 36 3.4 Data access in ViroLab ...... 41 3.5 Other projects based on GridSpace platform ...... 50 3.6 Storage services in gLite ...... 56

4 Needs to be addressed / Problems to be solved 65 4.1 Providing access to EGEE/WLCG data sources ...... 65 4.2 Integration with the GridSpace Engine ...... 65 4.3 Automation of certificate management ...... 66 4.4 Extending the DSR plug-in to enable registration of LFC data sources ..... 66

5 Related work 67 5.1 Other virtual laboratories ...... 67 5.2 Attempts to make the Grid service-oriented ...... 73 5.3 Data access and persistence in Grid projects ...... 75 5.4 Libraries providing access to gLite data resources ...... 77

6 General software requirements 79 6.1 Scope ...... 79 6.2 Product perspective ...... 79 6.3 Product functions ...... 81 6.4 User characteristics ...... 81 6.5 Constraints ...... 82 6.6 Assumptions and dependencies ...... 82

7 Detailed requirements 83 7.1 Functional requirements ...... 83 7.2 User interfaces ...... 85 7.3 Software interfaces ...... 86 7.4 Performance requirements ...... 95 7.5 Software system attributes ...... 95

8 Design description 98 8.1 Design decisions ...... 98 8.2 Organization of Design description ...... 100 8.3 Identified stakeholders and design concerns ...... 101 8.4 Design views ...... 101 8.4.1 Composition ...... 103 8.4.2 Logical ...... 105 8.4.3 Dependency ...... 118 8.4.4 Information ...... 123 8.4.5 Interface ...... 124 8.4.6 Interaction ...... 127

9 Verification and validation 131 9.1 Functional tests ...... 131 9.2 Performance tests ...... 140

10 Conclusions 149 10.1 Summary ...... 149 10.2 Future work ...... 149

11 References 151

A LFC Data Source – User guide 176 A.1 Data access workflow: registering the data source, storing credentials, using the data source from a script ...... 176 A.2 DACConnector LFC DS specific constructors ...... 177 A.3 LFC Data Source methods ...... 178

CGW’09 abstract 183

List of Tables

1 Acronyms and abbreviations ...... 11 2 Definitions ...... 12 3 Examples of Grid computing applications ...... 19 4 Functional requirements ...... 84 5 User interface requirements ...... 85 6 Software interface requirements ...... 88 7 Synopsis of LFC DS non-functional requirements ...... 96 8 Design concerns and views addressing them ...... 102 9 Identified stakeholders and their design concerns ...... 102 10 Design viewpoints specifications ...... 102 11 LFCDS client library↔LFCDS server performance test ...... 145 12 GScript LFC connector↔LFCDS server performance test ...... 146 13 GScript LFC connector↔LFCDS server performance test over WAN ...... 148

List of Figures

1 GridSpace Engine in Virtual Laboratory environment ...... 27 2 A process of executing an experiment from Experiment Repository ...... 28 3 Three levels of Grid Operation Invoker abstraction [33]...... 30 4 Grid Operation Invoker architecture and external components, with which it communicates [33]...... 31 5 GrAppO architecture [152]...... 31 6 agiLe MONitoring ADherence Environment (leMonAdE) architecture divided into two parts: Infrastructure monitoring and Application Monitoring [152]. .. 32 7 Virtual Laboratory framework conceptual components...... 38 8 Experiment pipeline – one of the central ideas behind Virtual Laboratory [108]. 39 9 PROToS architecture [27]...... 40 10 Layered view onto ViroLab architecture. On top there are three kinds of users: experiment developers, scientists and clinical virologists using dedicated inter- faces that, in turn, communicate with runtime components that manage com- putational and data resources located in Grid, clusters or individual computers [198]...... 41 11 A more technical view of the ViroLab structure with all main constituents illus- trated [108]...... 41 12 Cooperation model between experiment (application) creators and users of these experiments [46, 109]...... 42

7 13 Interactions between components during execution of a sample experimental plan with source code was provided from listing 1 [46]...... 42 14 Architecture of data access in ViroLab...... 44 15 DAC2 data access workflow as described in the text...... 45 16 A DSR form that appears when adding a new data source...... 45 17 DSR form for providing data source credentials...... 46 18 Data source connector hierarchy in DAC2...... 47 19 DAS security mechanisms [16, 19]...... 49 20 Data integration scenarios in ViroLab Data Access Services [18]...... 50 21 Structure of GREDIA middleware [133]...... 51 22 Architecture of Appea platform [44]...... 52 23 An overview of GREDIA data management services [14]...... 53 24 ChemPo architecture [202]...... 54 25 Structure of PL-Grid ...... 55 26 Filenames in gLite ...... 58 27 Catalogues in gLite [138] ...... 59 28 Client tools for interacting with gLite storage [1] ...... 63 29 Execution of gfal_open function [1] ...... 64 30 Virtual Laboratory for e-Science architecture (figure from [238]) ...... 67 31 myExperiment architecture – figure shared on myExperiment website by David de Roure, myExperiment director, using Creative Commons Attribution-Share Alike 3.0 Unported License ...... 69 32 Grid File Sharing System (GFISH) architecture [232] ...... 74 33 Inferno namespace exporting and importing (figure created on basis of present- ation from Inferno website) ...... 76 34 gLite data management application and command line interfaces – blue color indicates those that are depreciated [47] ...... 78 35 LFC DS (indicated by yellow color) in the context of Virtual Laboratory .... 80 36 LFC DS in the realm of EGEE/WLCG Grid ...... 80 37 LFC DS Use Case diagram ...... 81 38 Conceptual view onto proposed design of LFC DS ...... 101 39 Composition of LFC DS system. DACConnector, DAC2 DSRConnectivity, DSR EPE Plugin, DSR Plugin DSRConnectivity and DSR are components that ex- isted before creation of LFC DS ...... 104 40 Logical view onto LFCDS server component ...... 110 41 Logical view onto LFCDS client library ...... 110 42 Class diagram DSR EPE Plugin LFCDS Form. Classes not directly connected to operation of LFC DS were excluded from diagram...... 111

8 43 DAC2 class diagram after integration with LFC DS. Classes not directly related to LFC DS are omitted...... 111 44 Class diagrams: LfcDsProperties, LongOutputBean, PathInputBean, LfcDsItem, StoreFileBean, LfcDsOutputStream, UserProxyDetails, DacLfcCommands and ILfcCommands...... 112 45 Class diagrams: LfcCommonParametersBean, LfcDsException and LfcDsServer. 113 46 Class diagram: LfcDsClient ...... 114 47 Class diagram: LfcDsEditForm and PasswordDialog. For LfcDsEditForm private attributes were omitted for brevity...... 115 48 Class diagram: DSR Plugin DSRConnectivity – private attributes were omitted for brevity. In addition, only added methods are shown; modified methods or those that existed previously are excluded...... 116 49 Class diagrams: DACConnector, DACConnector, SourceParameters, and DAC2 DSRConnectivity ...... 117 50 LFCDS client library – dependency graph ...... 119 51 Component diagram depicting dependencies between system components .... 120 52 LFCDS server – dependency graph ...... 121 53 DAC2 – dependency graph ...... 122 54 DSR – database schema ...... 123 55 User interface for registering LFC data sources ...... 124 56 Demonstration of DSR EPE Plugin LFC DS Edit Form validation mechanisms . 125 57 Tree view onto data sources registered in Virtual Laboratory ...... 126 58 Data source selection form ...... 126 59 Initialization of LFC DS connector – sequence diagram ...... 127 60 A sample LFC command – in this case, listFiles command ...... 128 61 Reading file from Grid – sequence diagram ...... 129 62 Sending file to Grid – sequence diagram ...... 130 63 Verification tests – TestNG report ...... 138 64 Test log from verification tests ...... 139 65 LFCDS Java client library↔LFCDS server performance test: sending and re- trieving file from Grid – linear scale ...... 144 66 LFCDS Java client library↔LFCDS server performance test: sending and re- trieving file from Grid – logarithmic scale ...... 144 67 GScript LFC connector↔LFCDS server performance test: sending and retriev- ing file from Grid – linear scale ...... 145 68 GScript LFC connector↔LFCDS server performance test: sending and retriev- ing file from Grid – logarithmic scale ...... 146

9 69 GScript LFC connector↔LFCDS server performance test over WAN: sending and retrieving file from Grid – linear scale ...... 147 70 GScript LFC connector↔LFCDS server performance test over WAN: sending and retrieving file from Grid – logarithmic scale ...... 147

1 Definitions, acronyms and abbreviations

Note: If you have not found the term you are looking for, please check one of these glossaries: [63, 116–119, 234], the Abbreviations and acronyms chapter of [150] or the glossary chapter of [47].

1.1 Acronyms and abbreviations

Below, the table of acronyms used throughout the thesis is presented. Some definitions can be found in the subsequent section.

Table 1: Acronyms and abbreviations

BDII – Berkeley Database Information Index
DAC – Data Access Client
DAC2 – Data Access Client 2
DAS – VL Data Access Services
DSR – Data Source Registry
DSS – Decision Support System
EGEE – Enabling Grids for E-sciencE
EPE – Experiment Planning Environment
ExpRepo – Experiment Repository
GREDIA – GRid enabled access to rich mEDIA content
GScript – GridSpace Script
GSEC – GSEngine Client
GSEngine – GridSpace Engine
GSES – GSEngine Server
GSI – Grid Security Infrastructure
GUID – Grid Unique Identifier
HLA – High Level Architecture
LCG – LHC Computing Grid
LFC – LCG File Catalog
LFCDS – LFC Data Source
LHC – Large Hadron Collider
OGSA – Open Grid Services Architecture
OGSA–DAI – Open Grid Services Architecture Data Access and Integration
PKI – Public Key Infrastructure
RFIO – Remote File Input/Output
SRS – Software Requirements Specification


SURL – Storage URL
TURL – Transport URL
URL – Uniform Resource Locator
VDT – Virtual Data Toolkit
ViroLab – “ViroLab” Virtual Laboratory project
VL – Virtual Laboratory
VO – Virtual Organization
WLCG – Worldwide LHC Computing Grid

1.2 Definitions

Table 2: Definitions

Berkeley Database Information Index (BDII) – Metadata service used in EGEE. It is an equivalent to the Globus Metadata Directory Service (MDS) [85]. The BDII service is based on a catalogue service using the LDAP [235] protocol and a database backend. The structure of the BDII is hierarchical. At the lowest level, information providers deliver service-related data which is then consolidated into a site BDII service. The site BDII service is queried by Top Level BDIIs (TL BDIIs) to create a complete view of the whole infrastructure. Each TL BDII exposes information about the entire Grid. [22, 83]

ChemPo – “The ChemPo project develops a computational chemistry portal which facilitates the use of numerous packages (e.g. Gaussian or NAMD) deployed on the Grid infrastructure.” from [61]

Clinician (in ViroLab terminology) – A healthcare professional who executes a ViroLab experiment or uses the DSS in order to decide how to treat a particular patient. [177, section 2.4]


Data Access Client (DAC) – First generation of the data access component for the GSEngine. At the time of writing this document, the DAC component is being upgraded to a version that takes advantage of the Data Source Registry [18, 20, 108].

Data Access Client 2 (DAC2) – “A complete rebuild of the Data Access Client, taking into account the capabilities provided by the Data Source Registry.” [60]

Data Source Registry (DSR) – Registry of data sources used by the GSEngine DAC2. The information stored in the registry includes the type of the data source, its technology (e.g. DAS, MySQL [227], WebDAV [75] or PostgreSQL [229]), the URL, credentials and user access rights.

DSR plug-in – EPE plug-in that enables the developer to manage data sources registered in the DSR.

Enabling Grids for E-sciencE (EGEE) – A series of projects (EGEE-I, EGEE-II and EGEE-III) funded by the European Commission whose purpose is to construct a production Grid infrastructure for researchers of many scientific disciplines, along with a lightweight Grid middleware (gLite) for this infrastructure. [13, 98]

Experiment (in ViroLab terminology) – An experiment, or in-silico experiment, is a process that combines data and computations in order to obtain results [63]. In other words, a dynamic scenario (see [150, section 1.1.2]).

Experiment developer (in ViroLab terminology) – A computer science professional who creates experiment plans, often with the help of domain scientists. [177, section 2.4], [63]

Experiment Planning Environment (EPE) – The ViroLab EPE is an Eclipse-based tool for managing the development process of experiment plans. It is one of the two main components of the ViroLab presentation layer – the second one is the ViroLab portal. [96, 97]

gLite – gLite is a Grid middleware produced by the EGEE project. It integrates several distributions, including LCG and VDT. Currently, it can be installed on Scientific Linux 3, 4 and 5. [47, 138, 140]


Globus Toolkit (GT) – Globus Toolkit is an open source software toolkit developed by the Globus Alliance. It is intended for building Grid systems and applications. [88, 89]

Grid – A few definitions of the Grid are recognized [150, sec. 1.2.1], i.e. two definitions produced by Foster and Kesselman: “A Grid is a system that coordinates resources that are not subject to centralized control using standard, open, general-purpose protocols and interfaces to deliver nontrivial qualities of service.” [87] “A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” [91] and IBM’s definition: “Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities.” [150, sec. 1.2.1]

Grid enabled access to rich media content (GREDIA) – A project funded by the EC whose objective is to create a Grid application development platform with support for the design, implementation and deployment of secure Grid business applications. Its two prototype applications are in the fields of banking and journalism. [14, 15, 31, 44, 45, 133, 136, 137, 212]

GridFTP – GridFTP is a protocol based on the FTP protocol, developed by the Globus Alliance. It is GSI enabled and optimized for usage in the Grid environment. [4]

GridSpace Engine (GSEngine) – GridSpace Engine is the main component of the ViroLab Virtual Laboratory. It is responsible for executing experiments and resource orchestration. It is the backend of the Virtual Laboratory. [58, 107]

GridSpace Script (GScript) – Script executed by the GSEngine, written in the JRuby language [86, sec. 1.2.1]. In ViroLab a GScript is the main part of an experiment plan. [96, 153, 154]


GSIFTP – Former name for GridFTP. [215]

LCG File Catalog (LFC) – File catalog that maintains mappings between LFN(s), GUID and SURL(s). [1, 205], [47, chapter 7.4]

LFC Data Source (LFCDS) – Software developed as part of this thesis.

LHC Computing Grid (LCG) – LCG is a middleware system whose original purpose was to allow scientists involved in Large Hadron Collider experiments to efficiently run their programs in a distributed environment. It is a complete set of software for creating Grid systems. [38, 139]

Open Grid Services Architecture (OGSA) – An architecture built on concepts and technologies from the Grid and Web services communities. It defines a uniform exposed service semantics – a Grid service – and standard mechanisms for creating, naming, and discovering transient Grid service instances. OGSA also defines, in terms of WSDL interfaces, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, authorization, and notification. [92–94, 208]

Open Grid Services Architecture Data Access and Integration (OGSA-DAI) – Globus Alliance project that produces a web services framework for accessing and integrating data resources. The OGSA-DAI web services can be deployed within a Grid environment. [10, 129]

Proxy Certificate – From the RFC: “The term Proxy Certificate is used to describe a certificate that is derived from, and signed by, a normal X.509 Public Key End Entity Certificate or by another Proxy Certificate for the purpose of providing restricted proxying and delegation within a PKI based authentication system.” [220]

Remote File Input/Output (RFIO) – Protocol used to access the CASTOR Mass Storage System. [47, sec. 7.2.1]


ViroLab Virtual Laboratory (VL, ViroLab) – The thesis author found two definitions: ViroLab is a Grid-based decision-support system for infectious diseases, intended for individualized drug ranking in human immunodeficiency virus (HIV) diseases. [196] “The virtual laboratory is a set of integrated components that, used together, form a distributed and collaborative space for science. Multiple, geographically-dispersed laboratories and institutes use the virtual laboratory to plan, and perform experiments as well as share their results. The term experiment in this context means a so-called in-silico experiment - that is, a process that combines data and computations in order to obtain new knowledge on the subject of an experiment.” [213]

Virtual Data Toolkit (VDT) – VDT is a collection of Grid software (Condor-G, Globus, VOMS) along with its dependencies. It also includes Tomcat, MySQL and Apache plus many other software components. [104]

VL Data Access Services (DAS) – ViroLab-specific type of data source. It is an aggregation of hospital data accessed using OGSA-DAI. [17, 18, 20]

VLRuntime – Former name of the GSEngine.

Science is what we understand well enough to explain to a computer. Art is everything else we do.

Donald Knuth

When we had no computers, we had no programming problem either. When we had a few computers, we had a mild programming problem. Confronted with machines a million times as powerful, we are faced with a gigantic programming problem.

Edsger W. Dijkstra

2 Introduction

2.1 Motivation

The work of a contemporary scientist no longer resembles the work of a scientist from the beginning of the twentieth century. Mathematicians very rarely use pen and paper to solve mathematical equations, tending to use programs like Mathematica [228], Maple, Matlab (though the latter is more oriented towards engineers) or their open source counterparts like Octave and Maxima1. Furthermore, proving mathematical theorems is no longer purely intellectual work. An example would be the four color theorem [11, 12], which was proved in 1976 using a computer program that checked all 1936 special cases of maps. An increasing number of both general purpose and dedicated programs are applied in a researcher’s everyday work. I gave the example of mathematics, but this trend applies to almost all fields of science and technology, with physicists seldom analyzing data on paper, preferring to utilize data analysis software such as ROOT. Engineers rarely crash cars to check their safety; usually a simulation is more than satisfactory. Moreover, it can sometimes provide more detailed information than the actual crash test, alongside techniques such as visualization, computer stress analysis, computational fluid dynamics (CFD), computer aided design (CAD) or, more generally, computer aided engineering (CAE). Thanks to telecommunications, working with research tools at a distance is also becoming increasingly widespread. The thesis author recalls his personal experience during the first beam day at CERN, where he had the pleasure of being in this remarkable place. He was in a large conference room where employees not directly related to the main event could observe its progress on a large screen. Many observers were watching remote consoles on their laptops to see the results from research facilities, while the team in the CERN Control Center (CCC) was conducting the first beam trial. A significant example of remote usage of scientific apparatus is the use of satellites: Hey and Trefethen [113] state that European

1My former mathematics professor was very fond of Maxima. I suppose he used some kind of GUI, like WxMaxima, as it is very hard to use from the command line, in my opinion.

Space Agency (ESA) satellites generate 100 Gigabytes of data per day. However, the document cited is relatively old, so that figure may now be even larger. The Hubble Space Telescope can also be mentioned here, because few research facilities used until now have given so much insight into our universe from the astrophysics point of view. The Hubble Ultra Deep Field photograph is just one of its breakthrough results, and probably even more can be expected from the Webb Space Telescope, which is planned to be more advanced. As industry and research centers have advanced, computer techniques have become firmly established. Nowadays it is difficult to imagine that integrated circuits were once designed using large masks; today hardware description languages like Verilog and VHDL are used for this purpose, and additionally analog electronics are often checked using programs like SPICE before being built. Even historians, whose discipline may seem very humanistic, use statistical tools (quantitative history) and employ computer technology for collaboration and sharing of documents. An example is the Codex Sinaiticus project [211], which includes the oldest preserved complete copy of the New Testament – handwritten 1600 years ago – and which has been published collaboratively on the Internet by The British Library, the National Library of Russia, St. Catherine’s Monastery and Leipzig University Library. This phenomenon is called ‘application pull’ [196]: computer technology becomes ubiquitous in the world of science and scientists strive to solve more and more problems with the help of these technologies. If we take into account an experimental discipline, such as physics, we can note that simulation, in addition to theory and experiment, has become a third way to practice science. On the other hand in medicine, a predominantly empirical discipline because of the extreme complexity of the systems it deals with, next to the terms in vivo and in vitro yet another term appears: in silico [196, 230]. The practice of science through computing is the essence of today’s buzzword: e-Science. The interest in computer technology among researchers from different disciplines is a natural consequence of the possibility of process automation and rapid processing of large amounts of data, with the prospect of reaching goals that could not have been achieved using previously available technology. With the increasing computerization of equipment and the large rise in accuracy, it follows that the amount of data to be processed by computers will grow dramatically [113]. The existing classical model of computing is not able to meet these tasks. Very few supercomputers in the world are able to process data as huge as the human genome, and even greater sizes may need to be handled if there are more dimensions of data. The increasing efficiency of computers in accordance with Moore’s law, which pleases everyone, is not able to provide the CPU, memory, disk and bandwidth resources required for processing an escalating amount of research data, because the data volume grows much faster [113].

and continents. These technologies have been named ‘Grid technologies’ after the electrical grid, where by plugging a plug into an outlet we have access to electricity without worrying where it comes from and who provides it. Similarly, ‘Grid technologies’ aim to provide a researcher with computing power and storage resources, services, data from sensors, research results and knowledge. A scientist does not need to worry who delivers them2; his concern is the value of the service provided. Thanks to the virtualization of resources, ‘Grid technologies’ have enabled the use of the infrastructure of many different institutions and individuals (desktop Grids) to solve some problems of enormous complexity [115]. The usefulness of ‘Grid technologies’ has been confirmed by a number of applications from various fields of science and technology. Some examples are presented in table 3.

Table 3: Examples of Grid computing applications

AEC3 – InteliGrid [69, 70], Conflet Framework [176]
Air pollution simulation – int.eu.grid4 [195], LSAPM5 [210]
Astrophysics simulations – MUSE6 [183], G-HLAM [115]
Bioinformatics – myGrid [90, 203, 204, 230], LITBIO7 [142], GADU8 [186], SigWin-detector [120], The Virtual Instrument [52], HIPCAL and HUGOREP [39], Taverna [167], EUChinaGrid [148, 149, 179, 180]
Climate modeling – The Earth System Grid (ESG) [37]
Creating computer films – Big Buck Bunny9 [157], VirtualRenderer10 [182]
Design and optimization of casting processes – PartnerGrid [30]
Design of drugs, biopolymers, biomaterials and pesticides – CancerGrid [81], OpenMolGRID [193]

2Although it may not be completely true for research results and knowledge, as we need to know their provenance. 3Architecture, engineering and construction 4Interactive European Grid 5Large Scale Air Pollution Model 6Multiscale Multiphysics Scientific Environment 7Laboratory for Interdisciplinary Technologies in Bioinformatics 8Genome Analysis and Database Update system 9The “Big Buck Bunny” was rendered using network.com, the Sun Grid compute utility service. However, Foster [87] does not qualify Sun Grid Engine as a Grid due to its centralized control of the hosts it manages. See the Grid definition in table 2. 10Grid renderer based on SunFlow [84, section 5], MOCCA [147] and the Java Media Framework (JMF). The software was created by the thesis author for the Students’ Scientific Association Session, section Applied Computer Science, in 2008. Do not confuse it with other software with the same name [219].


Data mining – GridMiner [40–42], DataMiningGrid [200], DMGA [207], ESSE11 [239]
Earth sciences – DEGREE [218]
FEM analysis – ParallelNuscaS [170, 171]
Flood forecasting – CROSSGRID [155]
Forest fire simulation – Medigrid [175]
General technical computing – GBPM12 [126]
Heat Transfer Simulation – Grid Approach to Heat Transfer Simulation in Atomistic-continuum Model [2]
HEP13 – ATLAS14 [74, 100, 178], int.eu.grid [76], RMOST15 [143]
Life and medical sciences – VL-e16 [169, 226], MediGRID [79], Interactive Grid-Access for Ultrasound CT [111], G-HLAM17 [189]
N-body simulation – G-HLAM [188]
Neural simulation – System of Parallel and Biologically Realistic Neural Simulation [187], Liquid State Machines and Large Simulations of Mammalian Visual System [145]
Parameter study – Saleve [77], P-GRADE [128], AppLeS [51]
Predictive maintenance – DAME18 [121]
Searching large data sets – DAME [23], Ant-Home [125]
Videoconferencing – GlobalMMCS19 [222], DiProNN [185]
Visualization – GVK20 [135], River Soca Project [221], Medigrid [175], Multimodal Grid Visualization Framework [225], GVid [181], UniGrids21 [36]

The Grid infrastructure available today is impressive, with many Grids having been established. These include EGEE, DEISA, Grid’5000, TeraGrid, Open Science Grid, National Grid Service, D-

11Environmental Scenario Search Engine 12GRID Based Parallel MATLAB 13High Energy Physics 14A Toroidal LHC ApparatuS 15Remote Monitoring and Online Steering Tool 16Virtual Laboratory for e-Science 17Grid HLA Management System 18Distributed Aircraft Maintenance Environment 19Global Multimedia Collaboration System 20Grid Visualization Kernel 21Uniform Interface to Grid Services

Grid, NAREGI and China Grid [150, sec. 1.2.2]. In addition to traditional Grids there are desktop Grids, e.g. BOINC22 [7], XtremWeb [82], SZTAKI Desktop Grid [127], DG-ADAJ23 [172, 173] and Entropia [55]. Some of them have attracted a large community of volunteers who share their computer resources, particularly BOINC – 330,000 hosts [8] – and SZTAKI DG – 12,000 users donating more than 23,000 desktop machines [24]. Applications running on these machines affect disciplines of science just as important as those served by the traditional Grids, some examples being the search for cancer drugs [80], climate prediction [199] or research in digital signal processing [209]. The progress in setting up the infrastructure for e-Science, Grid software and hardware, has been named the ‘technology push’. This advancement in computer technology has made it possible, at least in theory, for today’s infrastructure to meet some of the greatest challenges of science. But to dream of solving problems of the scale “from biological cells made of thousands of molecules, the immune systems built from billions of cells, to our society of more than 6 billion individuals interacting” [196], or of simulating complex systems such as a galaxy made up of hundreds of billions of stars [115], there is a need for integration of scientific applications and databases with the Grid infrastructure. This is a huge integration problem. Sloot et al. [196] argue that a system-level approach is needed. The authors say that the bottom-up approach, i.e. creating applications that are independent and incompatible with each other and then integrating them, is definitely a wrong path. They justify their opinion by the fact that in the latter case, even if we succeed in integrating the applications, the problem of collaboration and interaction will remain. For the purpose of bridging the gap between ‘application pull’ and ‘technology push’, i.e. to utilize the great prospects of Grid technology, the ViroLab Virtual Laboratory was created as a joint effort of several universities, hospitals, research institutes and companies (for more information, see [213]). Its pilot application is a collaborative decision support system (DSS) for the treatment of infectious diseases, with an emphasis on HIV infections. The DSS is already in the production stage and will soon be deployed in hospitals. A vision of this system has been widely presented in [196], while the results are contained in [198]. To effectively manage data stored in heterogeneous EGEE/WLCG grid resources, the following data catalogues have been developed in recent years: the European Data Grid Replica Location Service (EDG RLS) [35, 160], the File Replica Manager (FiReMan) [163] and the LCG File Catalog (LFC) [35]. Experimental data challenges showed limitations and performance problems in EDG RLS, which was the motivation for creating the latter two catalogues and withdrawing RLS. The creators of FiReMan and the LFC took the HEP and biomedical communities into account as target users. Kunszt et al. [138] admitted: “Most importantly, the initial two application groups to work with gLite are the High Energy Physics and Biomedical communities,

22Berkeley Open Infrastructure for Network Computing 23Desktop GRID – Adaptive Distributed Application in Java

for whom data are stored mostly in files.”24 An example of the efforts made to adapt Grid storage to the requirements of grid medical users is the introduction of the Encrypted Data Storage (EDS) [1, 95]. Its design can be summarized as follows: the ARDA Metadata Catalogue (AMGA) is used to store relational data of medical images along with patient information, while the HYDRA library encrypts and decrypts files and is also responsible for producing and storing security keys. A special extended version of the Storage Resource Manager (SRM) interface has been developed – SRM DICOM – which is compatible both with the EGEE/WLCG grid and with the DICOM25 protocol. The EDS allows medical DICOM images retrieved from computed tomography (CT) or nuclear magnetic resonance (NMR) machines26 to be stored and transferred safely. However, these solutions do not solve the “difficulty of use” problem that affects gLite storage services. FiReMan provides a web-services interface, which cannot be said of the LCG File Catalog. The LFC interfaces of the highest abstraction level are the LCG-utils Command Line Interface (CLI), the Python and Perl GFAL27 and LCG-utils bindings, along with the related C application interfaces. No service-oriented API is available at the highest abstraction level in the case of LFC; web-services APIs are available only at the Storage Resource Manager (SRM) interface level28. Abadie et al. [1] argue that “Regardless of whether a grid user is a physicist, physician or an engineer, they should all be able to use the client utilities to access the gLite services and in particular the storage system”. Surprisingly, scientific disciplines not normally related to computer science are among those with the most enormous storage and processing demands when it comes to computational research. These include computational chemistry and biology. Computational scientists, as opposed to computer scientists, do not necessarily have a broad information technology background, especially in the field of grid computing. They are experts in their own discipline, e.g. physics, human physiology, pharmacy, biology, chemistry or environmental sciences. Nevertheless, these experts would benefit most from grid technology. Therefore, it is essential to help them employ grid resources in their fieldwork for the benefit of science and humanity. Nonetheless, scientific users encounter many obstacles in accessing Grid services, the first of which is obtaining a Grid certificate. It is an intricate and error-prone procedure which requires both patience (the certificate will not arrive immediately) and some

24The authors probably thought of DICOM26 images stored in files. Experiences with the ViroLab project showed that biomedical information stored in relational databases is equally pervasive [18]. 25Digital Imaging and Communications in Medicine 26The TeleDICOM [48] project is worth mentioning here. It has been developed by students and alumni of the AGH University “Grupa.NET” scientific circle. TeleDICOM, although not a Grid project, shares some of the Grid ideas. It is a distributed system allowing for interactive and collaborative work on medical documentation in the form of image files.

technical skills, e.g. the generation of PKCS#12 certificates to be used in a browser requires knowledge of openssl command line parameters. A second complication is the management of grid certificates, the generation of proxy certificates and keeping user credentials secure. Finally, data handling through the command line interface is somewhat cumbersome, requiring remote login to a UI29, sending files to storage elements (SE), publishing them in the LFC catalog and downloading files to the UI in order to be able to perform operations on them. These operations impose unnecessary burdens: gLite data services are difficult to use for non-computer scientists. The purpose of this thesis project is to relieve some of this strain from medical and scientific users by providing a service-oriented API for the LFC catalog, managing user grid certificates and integrating the created API with the Virtual Laboratory, which is a comfortable grid environment designed especially for them.

2.2 Objectives

The objectives of the dissertation and the related project can be summarized in the following four constituents:

Adding support for data sources available through the LFC catalogue. This involves creating an API for experiment developers that allows effortless manipulation of these data sources, in particular reading and writing data, browsing directories, deleting files and directories, and retrieving some of the document attributes – specifically their sizes30. This is the main goal of the thesis; the accompanying goals are enumerated in the ensuing items, and a usage sketch is given after this list.

Reorganization of the Data Source Registry (DSR) so that it is possible to store all requisite information about data sources of the new type along with the appropriate user credentials.

Extending the DSR EPE plug-in to support browsing data sources of the new type and to allow registering such data sources together with the relevant user credentials.

Integration with the GridSpace Engine, in whose context the DAC2 data access layer operates.
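To make the first objective more tangible, the following GScript (JRuby) fragment sketches how a registered LFC data source might be used from an experiment. Only the DACConnector class and the listFiles operation appear in the actual design (see chapters 7 and 8 and appendix A); the constructor arguments and the remaining method names are hypothetical placeholders, not the final API.

```ruby
# Hypothetical usage sketch; method names other than listFiles are illustrative only.
lfc = DACConnector.new("lfcds", "lfc.example-site.org")             # handle to a registered LFC data source
lfc.storeFile("/grid/vlab/user/input.dat", File.read("input.dat"))  # send a local file to the Grid
puts lfc.listFiles("/grid/vlab/user").inspect                       # browse a directory
puts lfc.getFileSize("/grid/vlab/user/input.dat")                   # one of the retrievable attributes
data = lfc.getFile("/grid/vlab/user/input.dat")                     # read the file back
lfc.deleteFile("/grid/vlab/user/input.dat")                         # remove it from the catalogue
```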

29Computer from which the Grid can be accessed. 30Dr. Maciej Malawski proposed retrieving these file properties.

2.3 Organization of the thesis

Chapter 3. Background In chapter 3 I outline what has been done by the ViroLab team within this project and in other endeavours that employ the GSEngine: GREDIA and ChemPo. Section 3.1, “The GridSpace platform”, discusses the architecture of the GSEngine – the engine on which Virtual Laboratory experiments are executed – revealing what led the system designers to select a particular language for expressing experiments, explaining the techniques the GSEngine uses to execute and optimize remote operations on the Grid, and the strategy it uses to conceal the specifics of implementation technologies. Section 3.3, “The Virtual Laboratory”, delineates the conceptual vision of the Virtual Laboratory and identifies the modules directly related to its operation: the Provenance Tracking System (PROToS), the Query Translation Tool (QUaTRO), the Experiment Management Interface (EMI) and the Experiment Planning Environment (EPE). Section 3.5 portrays GREDIA and ChemPo, i.e. further undertakings making use of the GridSpace Engine, while section 3.4 comments on the ViroLab data access layer, including the VL Data Access Services (DAS), the Data Source Registry and the Data Access Client 2 (DAC2).

Chapter 4. Needs to be addressed / Problems to be solved Chapter 4 presents the challenges that must be tackled by the thesis author, together with the author’s perspective on them. Section 4.1 portrays the organization of data access in gLite, taking the LFC catalogue into account, while section 4.2 demonstrates various alternatives for providing access to the LFC and to files published in it. Section 4.3 illustrates the difficulties with the management of users’ grid certificates, their protection and usage, with an accompanying discussion of feasible resolutions of these problems. Finally, section 4.4 demonstrates the current shape of the EPE DSR plug-in and the new requirements it needs to fulfil. In brief, chapter 4 sketches the project requirements as an informal discussion; a formalized description is delivered in chapters 6 and 7.

Chapter 5. Related work Chapter 5 alludes to miscellaneous projects which touch upon matters comparable to those mooted in the dissertation. Section 5.1 refers to other virtual laboratories, such as myExperiment, Triana, Kepler and the more low-level gLite, whereas section 5.2 talks about undertakings that strive to make the Grid more service-oriented, for instance the Open Grid Services Architecture (OGSA) and Semantic OGSA (S-OGSA). Thereupon, section 5.3 sheds light on how diverse Grid projects read and write data, which is noteworthy in the analysis of scientific literature31 carried out by the thesis author. An overwhelming majority of projects still store data in relational, XML or, occasionally, object databases located outside the Grid; this approach is of no interest from the thesis point of view and therefore will not be discussed. The projects of interest in the thesis are those which store and read data on the Grid, and several such projects will be discussed. Besides these projects, sundry grid file systems

31Cracow Grid Workshop 2004 – 2007 (CGW’04 – CGW’07)

will be identified, alongside cloud computing file systems, as cloud computing is an area to a certain extent linked to grid computing. Eventually, section 5.4 elucidates diverse libraries providing access to gLite storage resources, such as the LFC C/C++ API, the Grid File Access Library (GFAL), some low-level application interfaces and wrappers in assorted programming languages.

Chapters 6. General software requirements and 7. Detailed requirements present the requirements to be met by the software developed within the thesis.

Chapter 8. Design description – this illuminates the chosen architecture of the LFC Data Source, highlights its decomposition into design entities and illustrates the dependencies between the entities together with their internal structure and interaction patterns. Furthermore, it communicates the component interfaces: everything designers, programmers and testers need to know in order to use the functions delivered by the entities correctly.

Chapter 9. Verification and validation Chapter 9 describes the testing approach for functional and performance tests. Both types of tests are divided into those that assess the LFC DS connector and those that test the LFC DS client library.

Chapter 10. Conclusions Section 10.1 summarizes the achievements of the thesis project and how they were attained, while section 10.2 analyses potential extensions of LFC DS, presenting possible improvements such as ‘fine grained role-based security’. An additional important aspiration would be to provide superior performance and scalability. An element that could also be taken into consideration when envisioning further enhancement is a more generic API, accessible from languages other than Java. Ancillary refinements are also deliberated.

Computer science is no more about computers than astronomy is about telescopes.

Edsger W. Dijkstra

3 Background

I gave an overview of grid computing and the motivation for developing virtual laboratories in the Motivation section of the previous chapter. This chapter will focus on our Virtual Laboratory and the software developed by the ViroLab consortium, especially three of its members: ACC Cyfronet32 (GSEngine, EPE, EMI, GRR, DSR, AppRepo, GrAppO, MOCCA, security components), GridwiseTech (ViroLab Portal, VO management, security components) and HLRS33 (VL Data Access Services – DAS).

3.1 The GridSpace platform

GridSpace Engine [58, 107], abbreviated GSEngine, is the runtime environment of the Virtual Laboratory. Indeed, it was formerly termed the Virtual Laboratory Runtime (VLRuntime). At the release of version 0.2.6 its name was changed to GridSpace Engine to reflect the generality of this software, i.e. the fact that it can be used in a broader spectrum of problems than those related to the Virtual Laboratory. The aim of the GSEngine is to enable access to computing and storage resources and to coordinate the execution of experiments written in the GScript language, i.e. JRuby extended with capabilities provided by specialized GSEngine components. Thanks to dedicated libraries, GSEngine facilitates interactive execution and monitoring of dynamic execution scenarios, otherwise called experiments. There are different methods of providing the source of an experiment to the GSEngine (see figure 1):

• Executing the experiment code line by line using a dedicated API.

• Passing the whole source code using the API.

• Using a command line client to pass the experiment code.

• Finally, one can load an experiment script from the experiment repository [109], which is a software component based on Subversion (see figure 2). It is the most common way of executing experiments when they reach production stage.
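Regardless of how the code reaches the engine, an experiment plan is an ordinary JRuby script, so even a trivial script such as the purely illustrative sketch below is valid input for the GSEngine; real experiments additionally use the dedicated libraries (GOI, DAC) introduced later in this section.

```ruby
# Minimal, purely illustrative experiment plan: GScript is plain JRuby,
# so standard Ruby constructs are available alongside the Virtual
# Laboratory libraries (not used in this toy example).
samples = (1..10).map { |i| i * i }
mean = samples.inject(0) { |sum, x| sum + x } / samples.size.to_f
puts "Mean of squares: #{mean}"
```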

32Academic Computer Centre Cyfronet AGH 33High Performance Computing Center Stuttgart


 

            

                            

Figure 1: GridSpace Engine in the Virtual Laboratory environment. The figure illustrates the role of the GSEngine Server, which orchestrates access to data and computational resources. In addition, various GSE client tools are portrayed, cf. figure 1 in [58].

As Ciepiela et al. [58] indicate, the main goal of creating the GSEngine was to separate the client programs that assist in planning and executing experiments from the engine that actually effectuates them. This allows the GSEngine to be shared independently of the users’ machines, empowering it to conduct long-running experiments on a user’s behalf while taking advantage of grid resources. Such an approach to performing ‘in silico’ experiments gives dispersed groups of users, possibly connecting to the GSEngine from mobile devices, the opportunity to carry out calculation-intensive experiments. Projects such as Triana, Kepler and myGrid made workflows available to the users as a means to specify the experiment execution plan. An alternative approach is to use a scripting language for that purpose, which was the choice of projects such as Athena34 [100], where Python is used as a ‘glue’ language, and Geodise, which employs Matlab scripts. The Virtual Laboratory authors, by contrast, chose the JRuby language. There are several reasons that led them to this decision:

• The JRuby project is distributed under CPL/GPL/LGPL licenses, which makes it suitable

34ATLAS software framework


   

   

 

      

Figure 2: A process of executing an experiment from Experiment Repository. Application Repository, in ViroLab terminology termed ‘Experiment Repository’ or ‘ExpRepo’, is used to share subsequent versions of experiments. After experiment submission by an experiment developer (1), the experiment becomes available to experiment users and other developers. When they pass an experiment execution request to GSEngine (2), the experiment code is downloaded (3), evaluated (4a) and the results are streamed to the client tool during execution (4b). Eventually, the experiment ends and GSEngine sends its status and response to the client (5) [58].

for the GSEngine, which is issued under the GPL licence. Bubak et al. [43] emphasised that, because of the project’s research character, they preferred FLOSS software35.

• Numerous libraries written for the Java platform are accessible from the JRuby language (see the short illustration after this list).

• JRuby is a very expressive, purely object-oriented programming language, allowing logic of any complexity to be articulated; adding new functionality is further simplified by its well-developed metaprogramming facilities [86].
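As a brief illustration of the second point, the snippet below shows standard JRuby–Java interoperability; it is independent of the Virtual Laboratory itself and uses only JDK classes.

```ruby
# Any Java class on the classpath can be used directly from a JRuby (GScript) script.
require 'java'

fmt = java.text.SimpleDateFormat.new('yyyy-MM-dd HH:mm')
puts fmt.format(java.util.Date.new)   # a Java object driven from Ruby code

list = java.util.ArrayList.new
list.add('GridSpace')                 # Ruby strings convert to java.lang.String automatically
puts list.get(0)
```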

The GSEngine, as previously mentioned, contains modules providing access to grid resources: the Grid Operation Invoker (GOI) facilitates the execution of remote operations on the Grid, while the Data Access Client is a façade for access to typical data resources, for instance MySQL and PostgreSQL relational databases, unstructured data sources, e.g. WebDAV, and atypical, specialized resources, e.g. Data Access Service (DAS) aggregations [18]. Apart from the GOI and DAC libraries, there is a component that makes it possible to request run parameters during script execution, for instance a patient ID. From the developer’s point of view it allows forms to be created dynamically from the script code, which is a very convenient feature. In addition, libraries for streaming results to the client tools exist.

35Free Libre/Open Source Software

Outside of the GSEngine, in the context of the Virtual Laboratory, client tools have been developed: the Experiment Planning Environment (EPE) helps design the experiment plans, while the Experiment Management Interface (EMI) serves the purpose of performing and managing experiments by end users. These are present only in the case of the Virtual Laboratory; other projects that employ the GSEngine provide different tools, e.g. in the GREDIA project the role of the EPE is occupied by the Application Execution Planning Tool – abbreviated AEPT – or the Developer GUI. Among other responsibilities, an important GSEngine task is to manage user sessions, which allow Single Sign On (SSO) access to computational and data resources. Apart from these features, the GSEngine monitors access to data and execution of grid operations, and collects log messages and the status of the performed experiments, so as to convey this information to the monitoring tools and client programs.

Grid Operation Invoker After this short introduction to the GSEngine I will present the Grid Operation Invoker [32, 34, 154]; the DAC will be discussed in section 3.4. The goal that the VL team members endeavoured to achieve when envisioning GOI was to raise grid operations to a similarly high level of abstraction as that of ordinary JRuby methods [33], which is a complicated matter due to the diversity of grid middlewares. Bubak et al. [43] admit that, apart from the support of divergent types of users and the heterogeneity of resources, it was one of the biggest challenges to be unravelled. Despite the difficulties, the creators of GOI succeeded and delivered to the experiment developer a high-level object-oriented API leveraging the following technologies:

1. WebServices based

• Stateless, based purely on SOAP and WSDL
• Stateful extension of Web Services: the Web Services Resource Framework (WSRF)

2. Component technologies: MOCCA [151], ProActive [50]

3. Job-oriented systems: EGEE and DEISA

The GOI authors tackled the assortment of grid technologies by introducing three levels of abstraction (see figure 3). Every Grid Object is an abstract entity which can perform a set of operations36 that are invoked from the GScript but executed on remote machines located somewhere on or outside the Grid. Every Grid Object can have a number of implementations in a variety of technologies, with each implementation representing the same function. Similarly, each implementation may have an assortment of instances running on grid resources. Machine load, class of equipment, as well as speed of network connection may be dissimilar; consequently,

36In object-oriented programming ‘an operation’ is sometimes described as an act of sending a message to an object. Ruby also supports such a means of operation invocation using the ‘send’ method semantics.

discrete instances of a given Grid Object may well operate with disparate performance. To relieve the user of the dilemma of deciding which instance to choose, the Grid Application Optimizer (GrAppO) selects the best instance for executing operations; the user needs to know only the characteristics of the Grid Object that they use, i.e.:

1. Whether it is stateful or stateless.

2. If the method invocations are synchronous or asynchronous

3. If the objects are shared by other users or solely by the user.



 

         

  

               

       

Figure 3: Three levels of Grid Operation Invoker abstraction [33].

GOI is a lightweight library creating Grid Object proxies that in turn carry out remote method invocations in the appropriate technologies. The GOI adapters are written in the JRuby language and call relevant Java libraries for specialized operations. An analogous approach has been chosen in the DAC, with another similarity being the usage of an external Data Source Registry, which contains information about data sources and user credentials. GOI, on the other hand, uses the Grid Resource Registry (GRR), which provides Grid Object technology particulars (figure 4). The role of the GRR and DSR can be likened to the role of an Enterprise Service Bus in business applications developed in conformity with the Service Oriented Architecture model. Apart from the high-level API to Grid Objects, experiment developers also have the possibility of using lower-level application interfaces: they can bypass GrAppO by passing an instance ID, or choose a technology adapter without the help of GrAppO, which in the case of the higher-level API is selected automatically.
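The listing below sketches what this looks like from the experiment developer's perspective. It assumes a GObj factory method in the style reported for GOI; the exact class and method names, the gem identifier and the invoked operation are illustrative assumptions and may differ from the actual API.

```ruby
# Sketch of a high-level Grid Operation Invoker call from GScript; names are illustrative.
gobj = GObj.create('cyfronet.gridspace.gem.ExampleGem')  # GrAppO picks the best running instance
result = gobj.run_analysis('input-data')                 # invoked like an ordinary JRuby method,
                                                         # dispatched to WS/WSRF/MOCCA/ProActive
                                                         # or a job-oriented backend
puts result
# The lower-level API (not shown) lets the developer pin a concrete instance or
# technology adapter instead of relying on GrAppO.
```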

30 

       

  

 

Figure 4: Grid Operation Invoker architecture and external components with which it communicates [33].

The Grid Application Optimizer [132, 152], or GrAppO, is an optimization engine for the GridSpace Engine, responsible for making the most effective use of grid resources. GrAppO is underpinned by systems for monitoring [26, 57] and for collecting provenance data, with its decisions being taken on the basis of information retrieved from the Grid Resource Registry (GRR), the agiLe MONitoring ADherence Environment (leMonAdE) and the Provenance Tracking System (PROToS) [223]. GrAppO offers three modes of optimization: short-, medium- and far-sighted (figure 5). leMonAdE is illustrated in figure 6. The Data Access Client, which is also one of the core

 



  





 



 

Figure 5: GrAppO architecture [152].


Figure 6: agiLe MONitoring ADherence Environment (leMonAdE) architecture divided into two parts: Infrastructure monitoring and Application Monitoring [152].

3.2 GridSpace Engine deployment

The engine of the GridSpace Platform may be incorporated into a user’s application, started from the command line as a local instance, or launched as a remotely accessible server, which can be contacted using a dedicated client library or a client command line tool. Each of these possibilities will be discussed in this section.

GSEngine command line tools Shell scripts that fall into this category are:

• gsel – GSEngine Evaluate Locally
• gseql – GSEngine Entity Query Local
• gsdql – GSEngine Data Query Local
• gsec – GSEngine Evaluation Client
• gseqc – GSEngine Entity Query Client
• gsdqc – GSEngine Data Query Client
• gses – GSEngine Server
• gsep – GSEngine Proxy
• maketrusted
• dotrust

The commands above are available for both Windows and Linux and are shipped in three packages (‘xxx’ indicates the version number):

gsengine-client-vl-xxx: gsec, gseqc, gsdqc, dotrust – The main component of this bundle is the command line client gsec, which connects to a remote GSEngine Server and passes GScript code to it. Additionally, commands (gseqc, gsdqc) that use the data access façade of a remote GSEngine Server are provided. Before running the client, it is necessary to add the server certificate to the trust store – this is the purpose of the dotrust script.

gsengine-vl-xxx: gses, gsel, gseql, gsdql, maketrusted – This package ships with the GSEngine Server: a local embedded version (gsel) and a remotely accessible server (gses), together with a utility (maketrusted) to generate server key pairs with a self-signed certificate, and tools to access the data access façade of a local GSEngine Server (gseql and gsdql).

gsengine-proxy-vl-xxx: gsep, maketrusted, dotrust – GSEngine Proxy (gsep) is a module that acts like a server from the point of view of the client and, like a client, passes messages between the actual client and server. The package also contains the maketrusted and dotrust scripts, which serve the same purpose as in the bundles above: before using GSEngine Proxy, a server key pair together with a certificate needs to be generated (using maketrusted) for use with the actual client, and the actual server certificate (i.e. the certificate of the server that is the destination of the messages) must be added to the trust store using dotrust in order to enable communication with that server. Both scripts are required because GSEngine Proxy communicates with both the client and the server.

Installation of the bundles above first involves extraction of the bundle archive into a directory of the user’s choice, while the subsequent steps depend on which package the user wants to install. If the user wishes to install a local embedded GSEngine, the Java-style properties file config/engine.properties must be configured, adjusting values such as the path to the JRuby interpreter, the RMI registry port where the GSEngine JMX37 server will be registered, the application correlation id (acid) prefix and the credentials to the stores containing results and metadata. A user wanting to utilize their own GridSpace infrastructure, i.e. security providers, data, metadata, result and ontology stores, resource registries, application repositories etc., must modify config/gridspace.properties.xml appropriately, usually substituting the URLs in this file with those pointing to their own services. If the GSEngine server is to be used remotely, then in addition to the steps above, generating a key pair with a certificate is required; maketrusted is used for this purpose, and the only parameter needed is a name to be used as the certificate subject and file name. On the other hand, when installing a GSEngine client, apart from extracting the bundle, the only step required to make the installation valid is to execute the dotrust script, adding a server certificate to the GSEngine Client trust store. No additional configuration is required. The GSEngine Client bundle is compact compared with the GSEngine Server package – 1.7 MB compared to 50.4 MB (as of version 0.8.1_5) – because the client ships only with the necessary libraries. Therefore, an end-user does not have to install heavyweight software with many configuration options. Moreover, and most importantly, such an installation avoids problems with server certification: as mentioned, the only mandatory step for the client is to add the server certificate to the trust store using the simple dotrust command. This is a very modest requirement compared to analogous collaborative virtual laboratory engines, e.g. myExperiment [67].

37Java Management Extensions

In myExperiment, a user wanting to connect to the myExperiment service is equipped with an OAuth library whose configuration involves many steps and prerequisites. In particular, the user has to unpack the oauth4myexp.tar bundle into a web folder of a web server that supports PHP38. Then the user has to determine the URL of the deployed web application, e.g. http:////oauth4myexp/. Subsequently, the user logs into the myExperiment server, opens the OAuth page, clicks Register Client Application, enters its details, specifically Name, Main Application URL, Callback URL and optionally Support URL, and chooses which API calls the client application will be able to perform in the Permissions section [162]. After completing the form and successfully registering the client application, the user obtains a so-called Consumer Key and Consumer Secret. The next step is configuration of the recently deployed OAuth PHP application – this is done by going to its URL followed by the config_generator.php suffix, e.g. http:////oauth4myexp/config_generator.php, pasting the Consumer Key together with the Consumer Secret, and clicking the Get Access Token button, which redirects back to the myExperiment website so as to authorize the access token for the client application. After accessing the myExperiment website for the second time, the user checks the Authorize Access checkbox and clicks Save Changes, which redirects him or her back to the configuration generator page. On the resulting page, the user is presented with a Base64 encoded configuration, which is pasted into the configuration file Config.php in the directory of the PHP client application. Afterwards, the user loads the PHP application again – on successful connection to the myExperiment server, the Connected to Server field is displayed, indicating that the client PHP application can make API calls. Bearing this procedure in mind, it is not an exaggeration to say that GSEngine Client deployment is straightforward compared to other solutions.

As regards installation of GSEngine Proxy, it involves generation of a server key pair and certificate, which will be added to the trust stores of clients connecting to this proxy. Additionally, dotrust must be invoked with the destination server certificate, so that the proxy will be able to connect to it. GSEngine Proxy passes messages back and forth between client and server; moreover, it manages a set of GSEngine Servers acting as workers.

With regard to the executables provided by the aforementioned packages, there are scripts to pass GScript code to a GSEngine Server or to invoke GSEngine commands interactively, there are commands to access the data access façade, and there are the already discussed scripts to generate key pairs and certificates and to accept certificates. The commands enabling evaluation of code by GSEngine are gsel and gsec. The former executes GScript in a local embedded version of GSEngine, while gsec connects to a remote GSEngine Server. GScript code is provided either as a local file name, by specifying an application URI to be downloaded from the application repository, or interactively using the system console. On the other hand, gseql, gsdql, gseqc and gsdqc are used for querying the GSEngine data access façade either locally (gseql, gsdql) or remotely (gseqc, gsdqc). Finally, gses is used to launch a remotely accessible GSEngine Server instance.

38Hypertext Preprocessor

For more information, in particular regarding the command line arguments of these scripts, the reader is advised to consult [62].

GSEngine API Another option of GSEngine deployment is to incorporate it into the user’s application and to use its capabilities programmatically. If the application dependencies are managed using Maven, it is sufficient to add certain Maven artifacts to pom.xml, in particular gsengine-api and, depending on the mode of operation, gsengine-core for the local embedded version of GSEngine or gsengine-client for a GSEngine client connecting to a server. The Maven repository location and complete XML code snippets can be found in [62]. Otherwise, if a developer does not use Maven, they can add the GSEngine dependencies manually by downloading the GSEngine bundles: gsengine-vl-xxx in the case of the local embedded server or gsengine-client-vl-xxx when accessing a remote server, and afterwards adding the content of the lib directories to the project CLASSPATH. After adding the dependencies, the developer has access to the GSEngine interpreter using the cyfronet.gridspace.engine.impl.interpreter.EmbeddedInterpreter class for the embedded interpreter and cyfronet.gridspace.engine.client.RemoteInterpreter for the remote interpreter; both are subclasses of cyfronet.gridspace.engine.AbstractInterpreter, which defines the evaluate method. As with the command line client, when using a client library that connects to a remote GSEngine server, appropriate server certificates must be present in a trust store, which is passed as an argument to the RemoteInterpreter constructor or to a constructor of the remote data access façade, depending on which class the developer uses. The API for execution of GScript applications accepts parameters similar to its command line counterparts, i.e. besides the obvious server URL and port, there are the user handle, application URI, global constants, arguments, log level and several more (although the optimization policy has no counterpart parameter in the command line tool). What differs most from the command line tools is the ability to receive evaluation callbacks. These include notifications about completion of the application, about the application setting its status, about data being written to the output or error stream (the data is passed to the callback method as an argument), and notifications about results being stored or exceptions being raised. Additionally, GSEngine expresses various requests by invoking a callback, e.g. a demand for providing input, displaying content, or providing a file or an additional script. Furthermore, the interactive mode of GSEngine operation retrieves GScript source code by means of a callback. Another capability of the GSEngine API is aborting a running GScript application; to do this, the developer passes the application correlation id returned by the evaluate method of cyfronet.gridspace.engine.AbstractInterpreter. Apart from the ability to invoke script code, the developer has access to the data access façade using the cyfronet.gridspace.engine.dataaccess.DataAccessFacade class, which enables queries over data sources and retrieving entities from a data source schema (a table in a relational database) – for more information the reader is advised to consult [62].

The API described above is a Java API. As far as the JRuby GSEngine API is concerned, some information, especially about dynamically generated forms and about runtime objects and properties, can be found in [62], while detailed information about the GSEngine JRuby API can be found in its RDoc documentation.
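To give a feel for the programmatic deployment described above, a purely illustrative JRuby sketch follows. Only the class name EmbeddedInterpreter and the evaluate method are taken from this section; the no-argument constructor, the single-argument evaluate call and its return value are assumptions that should be verified against [62] and the RDoc documentation.

# Illustrative sketch only – the constructor and the evaluate signature are
# assumed, not quoted from the GSEngine documentation.
require 'java'
java_import 'cyfronet.gridspace.engine.impl.interpreter.EmbeddedInterpreter'

interpreter = EmbeddedInterpreter.new       # assumed no-argument constructor
script = File.read('experiment.rb')         # GScript source of an experiment
acid = interpreter.evaluate(script)         # assumed to return the application correlation id
puts "Submitted application, correlation id: #{acid}"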

3.3 The Virtual Laboratory

Virtual Laboratory – advancing treatment and research on HIV39 One of the reasons why the HIV-1 virus is pernicious to humans is the fact that it kills the T-helper cells (Th), which carry the CD4 antigen (more than 90% of lymphocytes possess the CD4 glycoprotein). In the absence of treatment, the disease may lead to a diminution in the number of Th lymphocytes to a level below 200 cells per µL. As a result, the human immune system loses its ability to defend itself against pathogens, leading to Acquired Immunodeficiency Syndrome – AIDS. “The human immunodeficiency virus (HIV) and other retroviruses show extensive genomic variation, which is primarily due to error-prone replication by the viral reverse transcriptase (RT) enzymes.” [64]. This is the root stumbling block in finding drugs and vaccines against the HIV virus and other retroviruses40. Despite this complication, there have been diverse attempts to treat HIV infections:

• Disruption of the virus replication process by inhibiting the activity of the reverse transcriptase (RT) enzyme, which is the principal action of drugs such as AZT41. Unfortunately, resistant mutations emerge promptly.

• Taking advantage of the extensive genomic variation of retroviruses, which has the potential to cause an error catastrophe [78]. This phenomenon occurs when the quantity of virus mutations is so enormous that the virus loses its genetic identity and effectiveness; KP-1212 is a drug that tries to exploit it [112].

• Nowadays, the most successful HIV therapy is HAART – Highly Active Antiretroviral Therapy. It combines at least three antiretroviral drugs; the initial stage will usually include “efavirenz or a ritonavir-boosted protease inhibitor plus 2 nucleoside reverse transcriptase inhibitors (tenofovir/emtricitabine or abacavir/lamivudine)” [110]. The combination of medications inhibits the creation of drug-resistant virus mutations.

• Attempts to boost Th production by averting the physiological42 involution of the thymus in HIV-infected patients by administering growth hormone (GH). Napolitano et al. [161] report that their therapy caused an upsurge in Th production of 30%.

39Human Immunodeficiency Virus
40A retrovirus is an RNA virus that replicates itself using reverse transcriptase (RT), creating DNA from its RNA. One of the major known retroviruses is HIV-1.
41Cluster of differentiation 4
42i.e. being in accord with and characteristic of the normally functioning human organism.

The HAART therapy, though the most effective at the moment, must be matched individually to the person receiving treatment. A wrong choice of medication may cause reduced susceptibility of HIV to drugs and immunity to further treatment. Moreover, according to Stoica et al. [206], “The development of new antiretroviral therapies for HIV is at an impasse”. At this point, the ViroLab project overcame the aforementioned difficulties by alleviating the risk of wrongly prescribed drugs, not only in the context of a single person but also in the perspective of complex human interactions.

The ViroLab Virtual Laboratory [108, 197] delivers a platform for cooperation between scientists of different disciplines, located in distant places around the world but sharing the same scientific ambitions. The main goal in establishing this platform, which brings together the efforts of computer scientists, virologists, epidemiologists and experienced physicians, is to help advance HIV research. The system integrates biomedical information on viruses, i.e. facts on proteins and mutations, patients (which virus mutations they are infected with), treatment (drugs administered) and literature (interpretations of drug resistance).

ViroLab copes with HIV medicine decision processes on all levels of detail [198]: from the molecular level [190], through molecule groups, the cell level and the whole immune system, to networks of human interactions (see figure 7). For instance, at the molecular level, molecular dynamics (MD) simulations are performed concerning how drug compositions behave in the presence of virus proteins, with binding affinities being calculated to estimate reactions to drugs and an ‘individual transmission parameter’43. Sloot et al. present the successes of their simulations in [198], asserting that they can model all phases of infection: from acute, through chronic and drug treatment, to the onset of AIDS, and that their results correspond to clinical data. Similarly, they reported that their simulations of human interactions by means of complex networks very accurately recreated the development of HIV infections in the United States. Without doubt, it is an example of conducting simulations at all scales, as formulated in [115]. Based upon the precision of these simulations, the authors of [198] formulated a supposition that the models they elaborated will help in making informed decisions regarding treatment and in impeding the proliferation of the HIV virus. They have provided virology scientists with powerful tools to investigate the impact of various strategies and drug therapies.

43Probability of infection during sexual contact.

Virtual experiments One of the central ideas behind the Virtual Laboratory is the process of conducting experiments – the experiment pipeline (see figure 8). An experiment (or in-silico experiment) is a process that combines data and computation in order to obtain results; in other words, a ‘dynamic scenario’. In the profession of a biologist or chemist, an experiment is carried out using available substrates and processes to acquire new knowledge. Likewise, in an in-silico experiment, an experiment creator exploits available data sources and computational resources, the result of which is new knowledge, just as in traditional experiments. Unlike the output of typical computer programs, knowledge gained from experiments, whether in vivo, in vitro or in silico, must have a verifiable origin. In order to apply knowledge in important, life-threatening matters, e.g. deciding which combination of drugs will be able to stop replication of the virus mutation a given patient is infected with, a clinician must be able to verify the origin of the information. For that purpose, the Provenance Tracking System (PROToS) was brought into being within the ViroLab project to store provenance data, together with the QUery TRanslation tOols (QUaTRO), which enable medical users to perform provenance queries on clinical data integrated with ViroLab, and the Semantic Event Aggregator, a component for building ontologies from monitoring data [25–29, 177, 223]. The PROToS architecture is depicted in figure 9.

Figure 7: Virtual Laboratory framework conceptual components. They can be used separately or through the ViroLab drug ranking decision support system that integrates them into one application [198, 217].

Types of users Users, according to the Virtual Laboratory concept [46, 156], are divided into:

• Clinicians, employing the DSS in their clinical practice to better treat HIV-positive patients.

• Scientists, i.e. clinical researchers, virologists and epidemiologists, who are both creators and users of experiments that analyse federated datasets to obtain new knowledge, useful when making recommendations for clinical decisions and in supporting their research.

Figure 8: Experiment pipeline – one of the central ideas behind Virtual Laboratory [108].

• Experiment developers – computer scientists whose role is to support research scientists in implementing experimental plans, to produce new computational services, to integrate multifarious computational and data services into the ViroLab infrastructure, and to create new tools that take advantage of this infrastructure.

Groups of users, together with the software they use, are shown in figure 10.

Architecture The VL structure is presented in figure 11. Some of the components shown in the diagram will be described, beginning with the Experiment Planning Environment (EPE). EPE was created as an aid for creating experimental plans. It is based on the Eclipse Rich Client Platform (Eclipse RCP) [96, 97] and combines the following components: Domain Ontology Browser, Grid Resource Registry Plug-in, Data Source Registry Plug-in and GScript Editor. The Domain Ontology Browser assists in searching for an appropriate grid service that can fulfil a particular user need. The GRR plug-in enables browsing for accessible Grid Objects which can be used in experimental code; in addition, it is capable of generating a code snippet that accesses the Grid Object selected by the user. The DSR plug-in, on the other hand, enables browsing, modifying and adding new data sources together with the credentials used to access them. Lastly, the GScript Editor [96] provides syntax highlighting and code completion with support for specific ViroLab features, such as support for Grid Objects. The ViroLab portal, also termed the Patient Treatment Support tool (PTS) [46], is based on the GridSphere portal and provides the following user interfaces: Experiment Management Interface (EMI) [97], Database Browser [21] (see section 3.4), Grid Resource Registry (GRR) Browser, VO management portlet, Drug Ranking System, Literature Mining, QUery TRanslation tOols (QUaTRO) [25, 28] and Binding Affinity Calculator (BAC) [190], some of which I will explain. Firstly, using EMI a user can load an experiment, execute it and download results. As section 3.1 indicated, the Experiment Repository (ExpRepo) supports storing and sharing subsequent versions of experiments.

Figure 9: PROToS architecture [27].

An example of such an experiment is shown in listing 1 – the experimental code comes from [46], and figure 13 illustrates the interactions between ViroLab components during its execution. Access to data is implemented using an older version of the Data Access Client; the newer, currently used notation, which exploits the Data Source Registry (DSR) features, will be reviewed in section 3.4. The cooperation model [109] between experiment creators and users of these experiments is shown in figure 12, with a detailed investigation of the collaboration aspect given by Tirado-Ramos [216]. The Drug Ranking System (DRS), in some documents [46] termed the Decision Support Ranking Service, provides algorithms and databases to study HIV drug resistance, such as Retrogram, REGA, Stanford HIV DB and ANRS, which allow for predicting drug interactions within specific regions of the virus: reverse transcriptase or protease. Patient data is drawn from the DAS component (see section 3.4).

Security With regard to security, ViroLab provides a Single Sign On mechanism. In the case of ViroLab, it is based on the Shibboleth framework with suitable extensions developed by the ViroLab team to support non-web applications. The implementation of security mechanisms in ViroLab has been covered extensively by Jan Meizner et al. [156]. The authors of that paper assert that valuable ViroLab resources must be protected from theft or destruction. These include medical databases, trust stores with user credentials, source codes of experiments together with their results (also intellectual property, possibly obtained after months of research and simulations), as well as computational power and network bandwidth.

Figure 10: Layered view of the ViroLab architecture. At the top there are three kinds of users: experiment developers, scientists and clinical virologists, using dedicated interfaces that, in turn, communicate with runtime components that manage computational and data resources located in the Grid, clusters or individual computers [198].

3.4 Data access in ViroLab

Data access in ViroLab is possible using various means, both from the ViroLab portal and from GScript, which is used for expressing experimental code. The following components, implemented as portlets, provide data access from the ViroLab portal:

• QUaTRO [25, 28] provides means for executing queries against data repositories and provenance collection systems using terms from the virology domain. It can be used to express queries with respect to PROToS (the Provenance Tracking System) and the Virtual Laboratory Data Access Services (DAS).

• Database Browser [21] aids users in browsing databases of patients, drugs and virus mutations. In addition, users can look through database schemas, execute SQL queries, sort the returned results using an assortment of criteria, as well as save data as XML, HTML or CSV, or print the records. Database Browser is a user interface for DAS, which is covered further in this section.
Figure 11: A more technical view of the ViroLab structure with all main constituents illustrated [108].

Figure 12: Cooperation model between experiment (application) creators and users of these experiments [46, 109].
Figure 13: Interactions between components during the execution of a sample experimental plan whose source code is provided in listing 1 [46].


An alternative means of accessing data, as mentioned above, is to use GScript; this is possible using the Data Access Client (now in version 2). DAC is a library written in the JRuby language, which additionally utilizes libraries coded in Java to obtain access to miscellaneous data sources, including databases, data sources available through the WebDAV interface and data accessible through the Virtual Laboratory Data Access Services (DAS). DAC is underpinned by the Data Source Registry (DSR), currently implemented as a MySQL database. A graphical front-end to the DSR is the DSR plug-in of the EPE environment. The architecture of data access in ViroLab is presented in figure 14. I will discuss data access beginning with browsing data sources and registering a new data source, and then using them from the GScript level.

Listing 1: Sample experimental plan (from [46]). See figure 13 illustrating interactions between ViroLab components during its execution.

patientID = DataRequester.new.getData("Provide patient's ID") #1a
region = DataRequester.new.getData("Region (\"rt\" or \"pro\")") #1b
nucleoDB = DACConnector.new("das",
  "virolab.hlrs.de:8081/wsrf/services/virolab/DataAccessService", "", "", "")
sequences = nucleoDB.executeDistributedQuery(
  "select nucleotides from nt_sequence where patient_ii=#{patientID.to_s};") #2
mutationsTool = GObj.create("RegaDBMutationsTool")
mutationsTool.align(sequences, region) #3
mutations = mutationsTool.getResult #4
drs = GObj.create("DrugResistanceService")
result = drs.drs("retrogram", region, 100, mutations) #5
puts result #6

The programming environment of experiment developers – let us name them ‘programmers’ for brevity – is the Experiment Planning Environment (EPE). A programmer uses EPE for coding experiments in the GScript language, searching for grid services, and cooperating with other developers and experiment users by correcting inaccuracies or errors they identify and taking into account suggestions they submit using the Experiment Management Interface (EMI) feedback form. In addition, EPE allows for publishing experiments in the Experiment Repository. As mentioned before, EPE contains a plug-in which enables browsing data sources, registering new ones and storing user credentials.

Data access workflow In order to be able to look through DSR records and add new entries, the programmer needs to log in to the Virtual Laboratory using the EPE login form. After successful authentication and choosing the DSR plug-in view, a categorized list of data sources appears. By clicking a particular data source, the programmer may view or edit its information, as well as change the credentials linked to this data source. In the list view, they also have the possibility of adding a new source. A diagram illustrating the data access workflow is provided in figure 15. When adding a new data source, the programmer chooses its type (‘structured’ or ‘unstructured’) and then the data source technology (e.g. PostgreSQL), as shown in figure 16, which presents one of the DSR plug-in forms. Depending on the technology, the programmer can provide varied information on the particular data source. Some of it is typical and occurs often, e.g. a URL or a schema name. A field that is, and always will be, required for every data source, unless the DAC architecture changes, is the data source ‘handle’, a symbolic name used when instantiating a data source in experiment code. It is the means by which a programmer refers to the data source in code.

Figure 14: Architecture of data access in ViroLab.

Furthermore, a programmer has the option of providing his or her credentials that are needed to access a particular data source (see figure 17). What is more, the interface allows for specifying whether the credentials supplied by the user can be shared with other authenticated users. In DAC terminology, such credentials are called ‘static’. After accepting the changes, the newly added data source becomes visible to other programmers. They can choose a data source from the list and supply their own credentials if they want to use it but its credentials have not been made static. A question arises: how does one utilize a data source in experiment code? To this end, a programmer adds the following line:

require 'cyfronet/gridspace/dac2/dac_connector.rb'

at the beginning of their script. This gives the programmer access to the DACConnector class, which is used to instantiate data sources. By calling the DACConnector.new method with a data source handle, the programmer creates a new instance of the appropriate data source connector. The Virtual Laboratory, as a grid project, takes steps to make data access occur on a Single Sign On (SSO) basis. Therefore, as mentioned earlier, the programmer stores their credentials in the DSR and does not have to provide them when running a script; it suffices that the programmer is logged in, and the DSRConnectivity module of DAC will download the credentials from the DSR.

Figure 15: DAC2 data access workflow as described in the text.

Figure 16: A DSR form that appears when adding a new data source.

Conversely, if the programmer did not provide credentials for a particular data source and credentials are required, DACConnector checks whether static credentials for this data source exist, i.e. whether someone made their credentials available to other authenticated users. If such credentials exist, DACConnector instantiates a connection to the data source; otherwise an exception is thrown, stating that static credentials have not been found and that the programmer should provide their credentials as parameters of the DACConnector.new method. In that case, instead of using credentials stored in the DSR, the programmer, after providing the data source handle as the first argument, may pass the login as the second and the password as the third argument of the new method. In this way, data source instantiation is carried out in most cases. An additional API, covering the LFC DS component developed as part of this thesis, is presented in appendix A. In the case of the Data Access Services (DAS) data source, Single Sign On is provided not by the DSR but by DAS itself [21], i.e. it is sufficient that a user holds a valid Shibboleth handle. The Policy Decision Point (PDP) service decides whether an authenticated user may execute operations or query data. As the second argument of DAS data source initialization, the programmer may provide an alternative Shibboleth handle if they want to perform operations on behalf of another user. Apart from the 1-, 2- and 3-argument constructors, a 4-argument constructor is available when instantiating an LFC DS data source; the reader may peruse it in appendix A.2.

Figure 17: DSR form for providing data source credentials.

The DACConnector class, aside from its constructors, provides methods that perform operations on an instantiated data source. Before the conception of LFC DS, the list of methods was as follows:

• executeQuery(query)

• executeUpdate(query)

• storeFile(payload, filename)

• getFile(filename)

• deleteFile(filename)

Devising LFC DS required adding new methods, which are discussed in detail in appendix A.3. DACConnector instantiates a data source on the basis of the handle supplied by the user and the information about the data source represented by that handle, returned by the DSR. If it is, for instance, a WebDAV source, DACConnector creates an object of the DAVDataSource type, which plays the role of a connector to the WebDAV data source, i.e. it translates invocations of DACConnector methods into invocations of WebDAV-specific libraries that connect to the WebDAV server. If it is a MySQL data source, a MySQLDataSource object is instantiated, which in turn is a connector to a MySQL database, etc.

DACConnector, while instantiating a data source connector, passes to it the information received from the DSR. As a consequence, the user does not have to supply this information in method invocations, as was the case in the previous Data Access Client version. After the DACConnector.new method finishes successfully, a reference to the connector object is preserved in the @source object variable for further method invocations. If a user sends DACConnector an executeQuery(query) message (in other words, executes the executeQuery(query) method of a DACConnector instance), the DACConnector instance will send that message to the data source connector whose reference, as mentioned before, it holds in the @source variable. If the data source connector pointed to by @source supports such a message, it performs the appropriate operations and returns results, or throws an exception if the operation failed for some reason. On the contrary, if the connector does not support such a message, an exception indicating this is thrown. The connector hierarchy is illustrated in figure 18.
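To illustrate this from the experiment developer’s perspective, a minimal GScript sketch is given below. The handles and the query are hypothetical; only the require path, the constructor and the method names (executeQuery, storeFile, getFile) come from the description above.

require 'cyfronet/gridspace/dac2/dac_connector.rb'

# Instantiate a connector by its DSR handle; credentials are fetched from the
# DSR (or shared 'static' credentials are used), so none need to be passed here.
patients = DACConnector.new("patients_db")               # hypothetical handle of a relational source
rows = patients.executeQuery("select * from patient")    # delegated to the underlying connector

# A file-oriented source (e.g. a WebDAV one) registered under another handle
# would expose the file operations listed above instead.
store = DACConnector.new("results_store")                # hypothetical handle of a file-oriented source
store.storeFile("experiment output", "results.txt")
content = store.getFile("results.txt")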

Figure 18: Data source connector hierarchy in DAC2.

Data Access Services (DAS) The Virtual Laboratory Data Access Services deserve a detailed explanation. Their mission is to provide integrated, secure access to the patient databases of hospitals participating in the ViroLab consortium. The DAS authors, while conceiving this component, faced the dilemma of how to render integrated access to databases belonging to different organizations. Paweł Płaszczak [184] asserts that the volume of data was massive not in terms of gigabytes but in terms of its sensitivity, since leakage of patient data would undoubtedly be a legal threat for the hospitals. On the other hand, creating one central database was not considered because, as Płaszczak asserted, medical institutions are as possessive about their data as software corporations are about their intellectual property rights. Federated Single Sign On [156] was chosen as an alternative to a Public Key Infrastructure (PKI). In Federated SSO, every organization is responsible for granting or revoking access to its data and for confirming the identity of its members. Firstly, this allows

many organizations to join a project. Secondly, every organization has full control over its data. Nevertheless, the Federated SSO solution, besides its advantages, also has some shortcomings, one of which is the possibility of data being stolen by a member of a trusted organization. Assel and Kanyocu [16, 19] considered this issue and, to avert situations of this kind, they formulated a security policy which can be defined very precisely. They employed the eXtensible Access Control Markup Language (XACML)44 for the purpose of managing the data access policy and created a user-friendly interface for generating, uploading and modifying access policies. Apart from the graphical interface dedicated to DAS, it is relevant to mention the ViroLab component for virtual organization management, which is available as a portlet through the ViroLab portal and whose graphical interface has been implemented by GridwiseTech [105] using Adobe Flex, thus taking advantage of the capabilities provided by contemporary 3D graphics. The interface is based on the virtual organization (VO) idea, which makes management of permissions more efficient. For managing data access policies for DAS data sources, a modified version of the Policy Decision Point (PDP) is utilized, which is a product of TrustCoM, another European project [68]. The PDP has been implemented as a web service and is responsible for controlling every data access request to DAS data. The PDP makes decisions based on access policies, deciding whether a particular user can be given access to a particular resource or whether the user may run the queries posed to this resource. The overall plan of this mechanism is portrayed in figure 19.

Meizner et al. [156] assert that securing data access to DAS follows a two-step approach: a user interested in accessing DAS must first pass through the ViroLab security policy defined in the Security Assertion Markup Language (SAML), then through the policy delineated in XACML. The dissertation author, based upon his knowledge, is of the opinion that an additional step should be mentioned in securing DAS data sources: aside from the steps described in [156] and [16, 19] (identification by the Identity Provider (IdP), authorization by ShibAuthAPI and the consent of the Policy Decision Point), permission to access the data must also be granted by the Data Source Registry. At present, the DSR access policy is quite primitive: access to a data source can be given only to its owner or to every authenticated ViroLab user, with no intermediate access granting levels. Perhaps, in the future, the capabilities of DSR security policies will be extended. Furthermore, the dissertation author recommends that any reader interested in the security aspects of ViroLab take into account the publication of Meizner et al. [156], which describes this in detail.

DAS is based on the Globus Toolkit [88], the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) [10, 129] and the aforementioned security architecture of ViroLab. A challenge that DAS successfully unravelled was integrating into ViroLab dispersed and heterogeneous data resources belonging to different institutions. One of the problems was the fact that data was stored in relational databases whose formats were dissimilar, even though they concerned the same entities, i.e. patients and the virus genotypes with which they are infected.

44The eXtensible Access Control Markup Language (XACML) is a language standardized by the Organization for the Advancement of Structured Information Standards (OASIS).

Figure 19: DAS security mechanisms [16, 19].

For instance, the Catholic University of Rome in Italy, which has been involved in HIV resistance research since 1999, stores patient examination results and drug resistance assays in dedicated relational databases based on Microsoft Access. According to Assel et al. [18], the process of transforming such databases could be very complex. In order to minimize the complexity, a common database schema, to be installed in every participating hospital, was sought; the RegaDB HIV Data and Analysis Management Environment was chosen, and the solution envisioned by the DAS authors can be delineated as follows: a hospital may either use a private RegaDB installed behind its firewall, in a so-called Demilitarized Zone (DMZ), or utilize a collaborative RegaDB located with some trusted partner and accessed over an encrypted connection (see figure 20). DAS consists of three subsystems responsible for the following concerns: Data Resource Discovery, Data Access and Data Transformation. Data Resource Discovery virtualizes the locations of data resources – applications reference data sources by logical names using the Meta Query Language (MQL), as the DAS authors call it. The Data Access Module provides interfaces enabling the usage of relational and XML databases. A noteworthy fact is that access to relational and XML databases is possible both using the DAS component and using the GSEngine Data Access Client (DAC). The choice is at the discretion of the user, although the thesis author recommends DAC, as it is a more general interface, not limited to the Virtual Laboratory but useful in every project that employs GSEngine. A reader interested in the DAS architecture can find more information in [18].

Data Source Registry The DSR is a solution that aims at relieving programmers from remembering the particulars of access to various data sources and strives to provide a Single Sign On mechanism to these sources.

Figure 20: Data integration scenarios in ViroLab Data Access Services [18]. mechanism to these sources. In the first generation of DAC, the programmer had to provide every detail regarding a data source with which they wished to connect and can be seen in listing 1. DAC2 is a second version of DAC that was completely rebuilt to take advantage of DSR and was created by Piotr Nowakowski, the main author of DAC. Ideally, using DAC2 and DSR is sufficient to provide data source handle in order to be able to use a particular data source. On the other hand, DACConnector enables programmers to override some data that is stored in DSR during data source initialization or to provide this data in the case of its absence. An example of such data overriding can be seen in appendix A.2, where it is done using LFC DS constructors. With regard to DSR implementation, currently it is a secured MySQL data- base, although it may possibly be implemented as a software component in the future. Access to DSR is clearly divided in DAC source code and is performed only in dsr_connectivity.rb file, which makes the probable change of DSR implementation, in terms of source code update required, less costly.

3.5 Other projects based on GridSpace platform

Several undertakings can be referenced here, among them GREDIA, ChemPo and PL-Grid.

GREDIA The aims of GREDIA are, firstly, the creation of middleware intended for business grid applications and, secondly, the production of two pilot applications: one in the domain of journalism and one in the area of banking [212]. The devised middleware comprises the following components: the Application Execution Assistant – Appea [44, 45], the Framework for Intelligent Virtual Organisations (FiVO) [137] and a data management layer, a so-called ‘virtual work space’ that binds together the nodes participating in the project.

Figure 21: Structure of GREDIA middleware [133].

This virtual workspace is a form of Data Grid secured from intrusion, in which written and annotated multimedia files, spreadsheets etc. are stored and can then be discovered by users. Moreover, every user who is a part of this virtual space is able to specify who can access the data that he or she makes accessible. Thanks to the absence of a central server and the basing of the infrastructure on a peer-to-peer architecture, the system has no single point of failure and provides fast data transfers. This is possible because every node can not only be a consumer of services, such as data searching, but may also contribute storage capacity and data services [133]. The structure of GREDIA is illustrated in figure 21. A reader interested in the specifics of the implementation can find further information in the publication of Asiki et al. [14].

The dissertation author believes that the Appea framework mentioned previously has many characteristics in common with the Virtual Laboratory software. Let us look at its architecture shown in figure 22; the similarity to figure 11, representing the ViroLab structure, is obvious. In place of the Experiment Repository we can see a Scenario Repository, also noticing the existence of a Grid Resource Registry. However, GREDIA accentuates support for grid data management services, the important elements being: the Distributed Replica Location Service (DRLS), which maps logical file names to physical ones; the Metadata Overlay, which is a P2P DHT45-based overlay storing metadata files; a Metadata Service, which assigns metadata files to peers from the Metadata Overlay and processes Appea or web-client queries; and finally the Data Service, which stores data in the Storage Overlay and returns streams enabling data download with the GridTorrent protocol (see figure 23).

Figure 22: Architecture of the Appea platform [44].

Those services have been detailed in the article of Konstantinou et al. [133].

ChemPo According to Sterzel, Szepieniec and Harężlak [201, 202], the Grid can be applied directly to conformational analysis, numerical frequency computations, zero point vibrational averaging, determination of chemical reaction paths or potential energy surfaces (PES) etc., all being computation-intensive tasks or tasks that operate on large data sets. In order to help scientists make effective use of Grid resources, they built an environment for performing such chemical calculations on the Grid. The environment manages computational processes together with experimental data and strives not to distract scientists with technology and not to change their way of thinking. The project has a web portal front-end based on Web 2.0 techniques, e.g. the Google Web Toolkit (GWT). The portal character of the project is also the origin of its name: ChemPo – Grid Web Portal for Chemists. Besides the usage of GWT as a user interface technology, ChemPo makes extensive use of GSEngine for job management. The architecture of the project is presented in figure 24. Currently, ChemPo enables the usage of Gaussian, one of the most popular commercial chemistry packages, although the authors plan integration with GAMESS and NAMD.

45Distributed Hash Table

Figure 23: An overview of GREDIA data management services [14].


PL-Grid – Polish Infrastructure for Supporting Computational Science in the European Research Space PL-Grid is an emerging nationwide Polish project whose purpose is to create a grid-computing infrastructure for scientists, serving local needs while also enabling future international collaboration. Within the scope of the project, tools are being created that allow for the design and execution of scientific applications on computational resources using dispersed data. The inception of PL-Grid is an answer to the resolutions of the e-IRG (e-Infrastructures Reflection Group), which was established in 2003 by the European Commission to promote the consistent formation of the European Grid. Moreover, besides the e-IRG objectives, PL-Grid realizes the goals of national plans regarding the informatization and development of Poland [159]. Currently, the project is progressing into the production stage – information on its website (http://www.plgrid.pl/) indicates that the PL-Grid operation portal will open soon (the operation portal is a place where, among other services, user account creation will be possible). Furthermore, PL-Grid represents Poland in the European Grid Initiative (EGI), which is a strategy and goal for the long-term sustainability of grid infrastructures in Europe [123]. Its approach is the establishment of a federated model bringing together National Grid Initiatives (NGI) to build a common Grid infrastructure, which will replace EGEE when its third phase ends in 2010. The planned computational power is 215 Tflops (about 5000 processors), while the disk space provided by PL-Grid will be 2500 TB.

Figure 24: ChemPo architecture [202].

Moreover, two additional, separate infrastructures are provided: one for testing and one for development [131, 159]. Additionally, integration with local computer clusters belonging to various research establishments is possible. As regards the realization of this project, tasks are distributed among several Polish universities and research institutions that already operate major computing centres and have personnel experienced in earlier grid computing projects, with the apportionment as follows:

• Project management – ACC CYFRONET AGH (Krakow)

• Hardware infrastructure – TASK46 (Gdansk)

• Operations centre – ACC CYFRONET AGH (Krakow)

• Development of e-infrastructure software and user’s tools – PSNC47 (Poznan)

• Training and users’ support – ICM48 (Warsaw)

• Security of the infrastructure – WCSS49 (Wroclaw)

From its inception, PL-Grid has been integrated with the worldwide Grid, in particular with systems resulting from the EGEE and DEISA projects. The PL-Grid software includes (see figure 25):

• User tools, such as portals, systems for application monitoring, results visualization etc. Importantly, these include:

46Academic Computer Centre in Gdansk (CI TASK)
47Poznan Supercomputing and Networking Centre
48Interdisciplinary Centre for Mathematical and Computational Modelling
49Wroclaw Centre for Networking and Supercomputing

Figure 25: Structure of PL-Grid.

– Migrating Desktop – environment for job and file management together with results visualization.
– Grid Commander – a file manager.
– g-Eclipse – environment for operators and developers.
– Vine Toolkit – tools for creating web applications.
– The aforementioned GridSpace platform, which is used for constructing applications using a high-level scripting language. It will be one of the most supported user and developer tools. Furthermore, another Virtual Laboratory based on this platform is being built for the PL-Grid project, with many functionalities similar to those found in ViroLab, e.g. a provenance tracking system, portal, grid resource registry etc., although certain differences exist. An example is the security system – at the time of writing this dissertation, it had not been decided whether it will be based on Shibboleth or on a different security system.

• Programming libraries.

• Virtual Organizations system: certificate and account subsystem, resource usage accounting and security subsystem. In particular, FiVO – Grid Virtual Organisation Semantic Framework – software enabling VO contract negotiations. FiVO is one of the results of the GREDIA project.

• Data management systems: metadata catalogues, replica management catalogues, file transfer service.

• Systems for managing jobs, monitoring services and infrastructure, handling software licenses and administering local resources. Some of these include:

– GEMINI2 – a system for monitoring applications in a Grid environment.
– X2R – a system for integration of relational database management systems, LDAP data sources and XML databases into an integrated semantic knowledge base.
– StorMon – a system monitoring performance parameters of mass storage.
– ACARM (Alert Correlation, Assessment and Reaction Module), whose purpose is collecting and correlating security alerts gathered by Intrusion Detection Systems (IDS) located in the network infrastructure.

Specialized scientific software packages are also planned to be supported, including those from the field of physics (Meep), numerical computations and simulations (MATLAB), biology (AutoDock, Gromacs, NAMD) and the domain of quantum chemistry (ACES II, ADF, Dalton, GAMESS, Gaussian, Molcas, Molpro, NWChem, Siesta, TURBOMOLE).

3.6 Storage services in gLite

gLite data storage approach The initial user groups targeted by the authors of the gLite storage and catalogue services were the High Energy Physics50 and Biomedical communities [138]. They deemed that these communities store their data primarily in files, which, as already noted in the Motivation, may not be true, since many biomedical communities store genomic data in relational databases. As a second rationale for providing file semantics to Grid storage, they put forward the view that these semantics are well understood by prospective consumers and providers of storage services, as opposed to generic data objects, which can have many definitions among varied application groups. They discarded the option of imposing a distributed world-wide file system such as AFS51 on each site participating in the EGEE grid, since they were of the opinion that the gLite middleware should work with locally available hardware and software. As an alternative, Kunszt et al. [138] declared that, to deal with the peculiarities of individual storage systems, they required all Grid-aware storage to implement the Storage Resource Manager (SRM) interface [194]. The collection of services providing file access and storage forms a ‘gLite Storage Element’ (SE). SE functionality can be summarized by the following constituents [138]:

50 Abadie et al. [1] assert that the Large Hadron Collider project, which is expected to be one of the main consumers of the EGEE/WLCG Grid infrastructure, will generate 15 PB of data per year. Normally, the roughly 1 PB/s generated by the detectors is reduced to ca. 100 MB/s by multi-level trigger systems. However, the data generation rate may be up to 1.5 GB/s.
51 Andrew File System

1. Storage back-end with related hardware and drivers

2. Implementation of Storage Resource Manager service

3. A transfer service

4. gLite File I/O service

5. Supplementary logging and security services.

File names Files can be replicated at many sites to speed up file access operations. Therefore a need arose for location-independent logical file names (LFNs), which can be assigned by users. Ideally, users would refer only to those logical names and never to physical names, which contain location-dependent information, such as which storage element holds a particular file or what protocol to use when accessing the file. The second requirement that became apparent from the usage of replicas was the necessity of a mechanism that identifies which replicas represent the same file. A central service managing unique identifiers could be employed. Nevertheless, the authors provided a better, decentralized solution: upon creation, each file obtains a unique, unalterable ID, termed the Grid Unique ID (GUID). As a result, many replicas may represent the same file identified by a GUID. Applications may use either GUIDs or LFNs to identify files. Replicas are identified by Site URLs (SURLs) [138], which some sources, e.g. [1, 47], term Storage URLs, although [138] uses the StURL abbreviation for that name. A SURL specifies which Storage Element to contact when accessing data and can be passed to the SRM interface as an argument. Finally, a Transport URL (TURL) is a file name giving the necessary information to obtain or write to a particular replica, including protocol, hostname, port and path. TURLs are valid only for a short period of time after they have been retrieved. To recapitulate, the following file names exist in a gLite Grid:

• LFN – Logical File Name. LFNs are mutable, human-readable names and exist in a global, hierarchical namespace, with each Virtual Organization having its own namespace.

• GUID – Global Unique Identifier. GUIDs are

– constructed using the UUID mechanism [141], which guarantees their uniqueness.
– immutable – once a file obtains a GUID, neither the GUID nor the file can be modified; otherwise consistency would be lost.

• SURL – Site URL. SURLs, also denoted as Physical File Names (PFNs), indicate an instance of a replica and are accepted by the SRM interface. According to Kunszt et al. [138], Storage URL (StURL) is a term used for an actual file name inside a storage system, whilst Site URL (SURL) is a logical name.

• Symlinks – Symbolic links that point to another LFN. They have weak consistency, i.e. they may point to a nonexistent LFN and can create cycles. In addition, operations on the target LFN do not update symlinks.

• TURL – Transport URL. TURLs are valid URIs with all requisite information to access a physical file on a storage element.

A one-to-many relationship exists between GUIDs and LFNs, between GUIDs and SURLs, and between LFNs and symlinks (see figure 26). The formats of the file names in gLite, according to [47], are as follows:

LFN lfn:<alias>, e.g. lfn:importantResults/Test1240.dat. In the case of the LCG File Catalogue, the filename format is lfn:/grid/<VO>/<directory>/<filename>

GUID guid:<36_bytes_unique_string>, e.g. guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d

SURL <prefix>://<SE_hostname>/<path>. In the case of the sfn prefix, the format is sfn://<SE_hostname>/<path_to_file>, for instance sfn://tbed0101.cern.ch/data/dteam/doe/file1. On the other hand, SRM-managed storage elements (SEs) often maintain a virtual file system, so no such assumption about the physical path can be made. An example SURL for an SRM-managed SE could be: srm://srm.cern.ch/castor/cern.ch/grid/dteam/doe/file1

TURL <protocol>://<hostname>/<path>, e.g. gsiftp://tbed0101.cern.ch/data/dteam/doe/file1

Figure 26: Filenames in gLite
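The GUID format listed above is simply the 36-character textual form of a standard UUID, prefixed with guid:. The following minimal C sketch (an illustration only, not code taken from gLite) shows how such an identifier can be generated with the widely available libuuid library:

#include <stdio.h>
#include <uuid/uuid.h>          /* libuuid; link with -luuid */

int main(void)
{
    uuid_t uu;
    char text[37];              /* 36 characters plus the terminating NUL */

    uuid_generate(uu);          /* create a new unique identifier */
    uuid_unparse(uu, text);     /* e.g. 38ed3f60-c402-11d7-a6b0-f53ee5a37e1d */

    printf("guid:%s\n", text);  /* gLite-style GUID string */
    return 0;
}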

Catalogue types Although most catalogue implementations couple these services in one server, there are conceptually four types of catalogues (see figure 27):

Figure 27: Catalogues in gLite [138]

1. File Catalogue – exposes operations such as creating directories, symlinks, renaming or deleting files and folders and registering grid files under logical file names. In brief, it provides operations on the LFN namespace.

2. Metadata Catalogue (MC) – the MC interface provides operations to set, get and query metadata, i.e. data associated with LFNs.

3. Replica Catalogue (RC) – the RC manages the list of replicas of a file identified by a GUID.

4. Combined Catalogue – unites operations on several catalogues to provide higher-level functionality, e.g. creating or deleting files. For instance, creating a new file involves storing replicas in storage elements, associating the replicas’ Site URLs with a particular GUID in the replica catalogue and, finally, creating a logical file name and connecting it with the GUID in the file catalogue. Combined catalogues must maintain a consistent state during all operations.

Usually, it makes sense not to divide these catalogues into separate entities; therefore, combined catalogues are most often used. As regards security supported by catalogues, Access Control Lists (ACLs) can be associated with files.

Catalogue implementations One of the results of the European Data Grid (EDG) project [192] was the Replica Location Service (EDG RLS) [49], which is a catalogue based on the web services model that enables management of distributed replicated data and related services, for example movement and replication of data, optimization of access etc. The architecture of RLS is divided into the Local Replica Catalogue (LRC) and the Replica Location Index (RLI). The former maintains information about replicas at a single site, while the latter contains information retrieved from various LRCs that is updated only occasionally and thus may not be up to date. An additional constituent, the Metadata Catalogue Service, allows users to define mappings between

LFNs and GUIDs. In addition, it stores information such as file size, owner and creation date. The Replica Optimization Service concentrates on the selection of the best replica of a file for a given job with regard to location, storage, latencies etc. Cameron et al. [49] admitted that they did not test their software under heavy load from concurrent clients – they narrowed down their performance tests to a single LRC. Further testing [9] demonstrated that EDG RLS suffered from slow insertion and query rates, which limit the performance of the entire system [160]. Two projects were introduced to supersede EDG RLS due to its performance limitations: the LCG File Catalogue (LFC) [1], produced by the Data Management team at CERN for the Worldwide LHC Computing Grid (LCG) project, and the File and Replica Management (FiReMan) [163] catalogue, launched by Enabling Grids for E-sciencE (EGEE), a European Commission funded endeavour. LFC is stateful and connection-oriented and offers increased performance compared to EDG RLS, as Munro et al. [160] indicated. It supplies a transactions API, which allows transactions to be started, committed or aborted. Furthermore, it allows for sessions to reduce the overhead of establishing an SSL connection before each operation. The implementation of LFC has been done solely in C using multi-threading [1]. It was shown [160] that LFC is faster for single operations than FiReMan, probably due to its modest communication overhead compared to the FiReMan SOAP API. Santos and Koblitz [191] argue that SOAP is 2 to 5 times slower than a corresponding TCP implementation. On the other hand, when operations are executed in bulk, FiReMan comes first, which is most probably caused by efficient use of Oracle database functionalities. LFC appears not to use specifics of particular database management systems, since tests using both Oracle and MySQL yield comparable results. On the contrary, FiReMan – a catalogue server whose logic is written mainly in Oracle PL/SQL stored procedures [163] – sustains much better performance when its database back-end is Oracle than when it is MySQL [160]: since the stored procedures are written for Oracle, the usage of MySQL causes the whole logic to be executed within the Tomcat servlet container. Another difference between the two catalogue implementations is that FiReMan, in contrast to LFC, follows a service-oriented approach – clients convey messages via SOAP over HTTP(S) to an Axis application running inside Tomcat. With regard to security, authentication to both catalogues is performed using X.509 grid certificates, which is acceptable as these certificates are the standard security mechanism of the gLite middleware. With respect to file permissions, both Access Control Lists (ACLs) and UNIX-style file permissions are supported by each of the two catalogues. In addition, the mentioned catalogues expose a virtual hierarchical filesystem namespace, operations and file semantics as described in the paragraphs above. Abadie et al. [1] report that LFC uses the Virtual Organization Membership Service (VOMS) [53] for authorization; the same is true for FiReMan. In particular, using ACLs a user can grant access to users and services specified by Distinguished

Name (DN)52 or VOMS attributes (VO membership or groups)53. A comparison of authorization services in several grid middlewares, namely Globus Toolkit 4, gLite and UNICORE, can be found in [106]. With respect to means of access, LFC can be contacted using a C library with Python and Perl bindings or using a command line interface (CLI) somewhat analogous to UNIX shell commands [1]. As mentioned in the Motivation, a web service Data Location Interface (DLI) is also available, although it does not support authentication and is read-only – that is because it is not intended for end users but for the Workload Management Service.
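To illustrate the C library mentioned above, the sketch below lists the contents of a single LFC directory. It assumes the POSIX-like lfc_opendir/lfc_readdir/lfc_closedir calls of the LFC client API (the header name and exact signatures may differ between gLite releases), a hypothetical /grid/dteam directory and an LFC_HOST environment variable pointing at the catalogue server:

#include <stdio.h>
#include <dirent.h>
#include "lfc_api.h"                        /* LFC client API (assumed header name) */

int main(void)
{
    const char *path = "/grid/dteam";       /* hypothetical VO directory in the LFN namespace */
    lfc_DIR *dir = lfc_opendir(path);
    struct dirent *entry;

    if (dir == NULL) {
        perror("lfc_opendir");              /* e.g. LFC_HOST not set or no valid proxy */
        return 1;
    }
    while ((entry = lfc_readdir(dir)) != NULL)
        printf("%s\n", entry->d_name);      /* print each catalogue entry name */

    lfc_closedir(dir);
    return 0;
}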

Storage elements As previously mentioned in the ‘gLite data storage approach’ paragraph, the Storage Resource Manager (SRM) interface was conceived to make storage technologies transparent to VOs. In addition, SRM brings a web service interface to storage, providing functionality to upload files to a storage element, extract or delete them, e.g. srmPut(), srmGet(), srmAdvisoryDelete() [205]. Moreover, five constituents that make up a storage element have been identified. There are at least four SE implementations: CASTOR 2, dCache, DPM and the Medical Data Manager (MDM) [158]. The name CASTOR stands for the CERN Advanced STORage manager. As Stewart et al. [205] indicate, it was designed around a mass storage tape system, and therefore is not appropriate for exploitation at sites without this facility. CASTOR 2 provides a single namespace for file management; it supports the rfio and root protocols for LAN access, while SRM and GridFTP are used over wide area networks. A key component of CASTOR 2 is a stager that administers the disk pools in front of the tape system and facilitates access by using a scheduler plugin – the LSF batch system scheduler is utilized for this purpose. A valuable feature of CASTOR is its capability to dynamically replicate frequently used files and to switch access on a currently open file to a less busy replica. As regards CASTOR monitoring, it is performed using both LEMON (LHC Era Monitoring) and the logging features of the Oracle database management system. dCache is a storage element implementation developed by Deutsches Elektronen-Synchrotron (DESY) in collaboration with Fermilab. It endeavours to provide means for storing and retrieving large amounts of data among a number of heterogeneous server nodes. dCache exposes a single namespace view of all files under its administration. When a tape backend is connected to dCache, it becomes a hierarchical storage manager (HSM), i.e. it moves frequently used data from tape to disk and then back to tape. The former namespace used by dCache was PNFS (Perfectly Normal File System), while the current filesystem implementation is Chimera. File access is possible using the dcap (dCache access) protocol or xroot, while WAN access is possible using the GridFTP and SRM protocols. dCache load-balances the system by replicating frequently used files. An interesting feature from a reliability perspective is the ability of an administrator to control the number of replicas of each file, e.g. state that there must

52 Distinguished Names (DNs) are found in the subject of a grid certificate.
53 These attributes are found in VOMS-enabled grid certificates.

be between n and m replicas available in each separate pool. With regard to the installation of dCache, significant integration with WLCG YAIM54 has been provided. The Disk Pool Manager (DPM), introduced by the LCG project at CERN, puts emphasis on ease of configuration and maintenance, with Tier-2 centres in mind. DPM is written entirely in C, shares much of its nameserver code with CASTOR, and exposes GridFTP for WAN transfers and rfio for LAN access. A database backend is required for DPM to operate; both MySQL and Oracle are supported. Since DPM is mostly envisaged as storage element software with ease of installation and administration in mind, support for YAIM is provided. In 2007, CASTOR was installed at CERN and four WLCG Tier-1 centres, managing 50 million files and 5 PB of storage, while dCache was used at approximately 40 WLCG sites and DPM at 70 [205]. The last storage element software mentioned, the Medical Data Manager (MDM), is a bridge between DICOM-compliant storage and the gLite middleware, translating grid file read operations into DICOM transactions. The use of grid services enables a unified view of data stored in dispersed DICOM servers [158].

File transfer service The gLite File Transfer Service (FTS) is an infrastructure service intended to facilitate data movements. Users’ transfers are assigned to channels, which are unidirectional links between sites. Channels may be dedicated or non-dedicated: a dedicated channel is a point-to-point link between two sites, while a non-dedicated channel links groups of sites. As regards data transfer requests, a SOAP over HTTPS interface is provided to clients, enabling submission of transfer jobs and polling for their statuses. FTS is backed by a MySQL or Oracle database, which is a central and critical component, as the state of the service is kept there. The SOAP server, on the other hand, is stateless and can be load-balanced. Other constituents of FTS are VO agents, which are daemons that apply VO-specific policies to transfer jobs, e.g. a retry policy in case of transfer failure. Finally, channel agents carry out the actual transfers, interacting with SRM and GridFTP servers.

Information systems The information and monitoring service for gLite storage is provided by the Berkeley Database Information Index (BDII), which keeps track of both static information, e.g. existing storage and computing elements, the number of available CPUs and the supported Virtual Organizations, and dynamic information, for instance how much free space is available on a specified storage element or how many free CPUs a given computing element possesses. BDII is an LDAP-based information system, with services on the Grid publishing information about hosts under their administration in LDAP. A Site BDII (SBDII) aggregates this information at site level. Finally, Top Level BDIIs (TL BDIIs) query lower-level Site BDIIs, creating a complete view of the whole infrastructure. Unlike the components mentioned in earlier paragraphs, communication with BDII does not require authentication – every client has read-only access.

54 YAIM – YAIM Ain’t an Installation Manager. As http://yaim.info/ indicates, “The aim of YAIM is to provide a simple installation and configuration method that can be used to set up a simple Grid Site but can be easily adapted and extended to meet the need of larger sites.”

Figure 28: Client tools for interacting with gLite storage [1]

Client utilities Figure 28 shows client tools for interacting with gLite storage, in particular

• LCG-utils: a CLI interface, C library and Python and Perl modules – the highest level of abstraction, enabling storing, replicating, deleting and copying files.

• Grid File Access Library (GFAL) – a C library providing a POSIX interface to storage on the Grid.

• File Transfer Service (FTS)

• SRM SOAP interface

Figure 29 depicts the various operations performed with gLite components on execution of the gfal_open function from the GFAL library. Firstly, the LFC catalogue is contacted to obtain the list of replicas for a given GUID. Secondly, BDII is queried to acquire the version of the SRM interface to use. Subsequently, a TURL is obtained from the DPM storage element using the SRM interface. Finally, the file can be opened using one of the access protocols, e.g. gsirfio – a secure RFIO.
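From the application programmer’s point of view this whole sequence is hidden behind a single call. The sketch below – a minimal example assuming the POSIX-style gfal_open/gfal_read/gfal_close functions described in section 5.4, a hypothetical LFN and a properly configured gLite User Interface – reads a Grid file and writes it to standard output; the LFC, BDII and SRM interactions listed above all happen inside gfal_open:

#include <stdio.h>
#include <fcntl.h>
#include "gfal_api.h"                       /* GFAL (assumed header name) */

int main(void)
{
    char buffer[4096];
    int n;
    /* hypothetical logical file name; a GUID, SURL or TURL could be supplied instead */
    int fd = gfal_open("lfn:/grid/dteam/doe/file1", O_RDONLY, 0);

    if (fd < 0) {
        perror("gfal_open");                /* catalogue lookup or SRM negotiation failed */
        return 1;
    }
    while ((n = gfal_read(fd, buffer, sizeof(buffer))) > 0)
        fwrite(buffer, 1, n, stdout);       /* dump the file contents */

    gfal_close(fd);
    return 0;
}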

Figure 29: Execution of gfal_open function [1]

4 Needs to be addressed / Problems to be solved

Chapter 3 recounted the achievements of the Virtual Laboratory: its philosophy, concepts, middleware, components, services and architecture. Furthermore, some introductory information regarding the storage services addressed by this thesis, namely the gLite data services, has been provided. This chapter states the challenges that are to be addressed by the thesis author.

4.1 Providing access to EGEE/WLCG data sources

Access to EGEE storage services may be useful to projects such as ViroLab, the PL-Grid Virtual Laboratory, ChemPo and other Grid projects. However, it involves many difficulties and prerequisites. Firstly, in order to use the client utilities mentioned above, a so-called gLite User Interface (UI) is required, i.e. a computer with gLite installed, configured and connected to the EGEE/WLCG Grid. Furthermore, gLite is available only for certain Linux distributions (e.g. Scientific Linux). Secondly, configuration of this software is not an easy task and requires substantial administrative and procedural work when attaching a computer to the EGEE/WLCG grid. Moreover, it is not possible to access EGEE/WLCG storage from computers without a valid gLite installation. Therefore, users usually obtain shell accounts on some gLite UI server in order to use EGEE services, and use these services remotely, logging in through SSH. Similarly, in order to programmatically access gLite storage services, software making use of these services must also be executed on a gLite UI, which is a restrictive requirement. Additionally, every user needs to go through a long and error-prone procedure of obtaining a grid certificate, generating a key pair and completing several request forms. This deters users from employing Grid services in their research work and additionally imposes a learning curve needed to work from the Linux console and to use the gLite command line interface. To paraphrase the Jargon File, from the user’s perspective these actions may be perceived as a “pointless activity, which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you’re working on.” Therefore, it would be beneficial if users could use these services without satisfying so many prerequisites. Moreover, since many powerful end-user tools exist inside Virtual Laboratories [122], integration with EGEE storage services would be useful, as there would be no need to switch between two environments: the gLite command line and the Virtual Laboratory tools.

4.2 Integration with the GridSpace Engine

Integration with the existing DAC2 infrastructure is one of the main requirements, and the solution should be consistent with the existing GridSpace Engine data access approach. Further complexity arises from the integration of a new data source type into the existing Data Source Registry schema, which does not take into account so many configuration options or

specific needs regarding access to catalogues and storage elements, together with managing certificate information. Examination showed that the DSR schema had to be reorganized to enable integration with EGEE/WLCG data sources. Nonetheless, merely altering the database schema is only a small part of the work to be done, since the database access layer had to be rewritten so as to take advantage of the new schema. Changes in the database access layer were needed in both DAC2 and the DSR plug-in. A supplementary need to be addressed was the reorganization of the DAC2 data access API, as it did not allow the typical catalogue operations that were to be provided by the new data source, nor did DAC2 supply constructors that could initialize the new data source with a variety of credentials in case they are not downloaded from the DSR. Therefore, the modules that dispatch data access requests to the various data source connectors had to be revised and altered.

4.3 Automation of certificate management

The management of users’ certificates was a complex issue and difficult decisions had to be taken so as not to compromise security. Another issue was the automation of proxy certificate generation, so that users would not have to create proxies manually. Additionally, credentials must be stored in the DSR and the proper credential must be provided when a Grid operation is invoked – a grid certificate, private key and private key passphrase must be conveyed if the operation is the generation of a proxy certificate. In contrast, if it is a regular Grid operation, e.g. accessing the LFC catalogue or accessing a file, the proxy certificate must be sent; when the proxy is not present or has expired, it should be generated and saved in the DSR. Finally, an important issue regarding certificate management is the need for communication encryption and maximum security of user files, especially private keys and certificates, if they are stored temporarily.

4.4 Extending the DSR plug-in to enable registration of LFC data sources

A number of core requirements of the solution have already been identified; however, satisfying all the mentioned requirements still will not make the solution usable if there is no means of adding, deleting and updating data source information. Of course, one could edit the Data Source Registry contents directly using database utilities. Nevertheless, if we want a user-friendly solution, a graphical user interface for DSR LFC data sources is mandatory.

5 Related work

In chapter 4, the dissertation author put forward several goals that are to be achieved by this thesis project, together with a discussion of possible solutions that could satisfy these requirements. This chapter mentions several projects that touch upon subject matter comparable to this thesis. In particular, section 5.1 recounts other Grid-based virtual laboratories that have been created, section 5.2 presents various efforts to make the Grid service-oriented and section 5.3 describes how some Grid projects handle data access and storage. Finally, section 5.4 touches upon libraries providing access to gLite storage.

5.1 Other virtual laboratories
Figure 30: Virtual Laboratory for e-Science architecture (figure from [238])

Virtual Laboratory for e-Science (VL-e) VL-e [238] is a project which aims at providing generic functionalities that support a wide range of specific e-Science application environments and at setting up a research infrastructure for evaluating diverse ideas. The VL-e consortium is composed of a number of Dutch scientific and business partners. Some VL-e usage scenarios include modeling and managing workflow templates, browsing distributed resources, integrating third-party workflow systems and composing and executing application workflows. The VL-e tools are presented in figure 30 and comprise the Virtual Resource Browser (VBrowser) to interactively access miscellaneous distributed resources, manipulate data, start applications and monitor resources. Furthermore, they include the FRIPS tool that supports interactive execution of parameter sweep applications, the WS-VLAM workflow system, which enables scientists to design and monitor workflow execution, and a server-side engine for scheduling and enactment of workflows – the workflow bus [237]. Some applications of VL-e include the Real Time Monitor,

which tracks projects participating in the EGEE and WLCG grids, a medical visualization application for planning shoulder replacement, the Virtual Lab for functional MRI (VL-fMRI) applications to facilitate storage, analysis and sharing of fMRI data, and the Bird Avoidance Model that helps aircraft avoid collisions with birds.

VLAB The VLAB [168] virtual laboratory is a research project which has been developed by the Poznań Supercomputing and Networking Center since 2002. VLAB enables users to access scientific instruments connected through the PIONIER optical network. Currently, this apparatus includes:

• two NMR spectrometers, located at the Institute of Bioorganic Chemistry of the Polish Academy of Sciences and at Adam Mickiewicz University in Poznań

• a 32 m radio telescope situated in Piwnice near Toruń, owned by the Radio Astronomy Department of the Nicolaus Copernicus University in Toruń, Poland, and a second radio telescope positioned in Mexico City

• a Freeze Atmospheric Dryer, which is a custom device built by the Faculty of Process and Environmental Engineering of the Technical University of Łódź, Poland.

The VLAB design is composed of 3 layers: Access, Grid and Monitoring. The Access Layer encompasses components responsible for user interaction, including a web portal and a data input interface. The Grid Layer communicates with grid middleware; in particular, it delegates computational tasks to the Globus toolkit and collects results together with response messages. The Monitoring layer contains a scheduler, a user account module and a system monitoring component. VLAB allows for so-called dynamic measurement scenarios, which are workflows specifying a set of computational tasks and experiments performed using remotely accessible apparatus. Such a workflow is designed and submitted using the Scenario Submission Application (SSA) and executed by the Scenario Management Module. In addition to remote access to equipment, VLAB encompasses the Digital Science Library (DSL) – a product of the PROGRESS project. DSL is a distributed data management system allowing users to store results of experiments and associated materials. Apart from basic remote access to instruments, the ambition of VLAB is to give added value by combining results from several devices, e.g. radio telescopes, to provide higher resolution of the entire measurement.

myExperiment The myExperiment Virtual Research Environment for collaboration and sharing of experiments [65, 67] was launched in November 2007. It is a project led by the University of Southampton and the University of Manchester, which endeavors to provide a “workflow bazaar” for workflow management systems and for other scientific assets, such as academic papers, PowerPoint slides, input and output data, service invocation logs, Visio diagrams55 and various other types of files.

Figure 31: myExperiment architecture – figure shared on myExperiment website by David de Roure, myExperiment director, using Creative Commons Attribution-Share Alike 3.0 Unported License

Authors of myExperiment anticipate that workflows will become part of a scholarly knowledge cycle, i.e. a process of publishing scientific results and reusing these results by the scientific community. The community noticed scientists moving from writing stand-alone applications to reusable workflows, and then sharing those workflows using emails, wikis and personal websites, which was thought to be quite cumbersome. In order to streamline the sharing process, they created a collaborative environment where scientists can distribute their workflows. The myExperiment authors perceive workflows not only as a way of describing

55 e.g. figure 31 with the myExperiment architecture is produced from a Visio drawing that has been shared by David de Roure, myExperiment director, using the Creative Commons Attribution-Share Alike 3.0 Unported License

computational processes, but also as a means of communicating methodology and know-how, and a method to avoid reinvention and to propagate best practices [66]. In order to convince scientists to publish their research work in the myExperiment online system, they put emphasis on attributes that are important to scientists, namely credit, attribution and license. With regard to the design approach, a system was created that combines social networking, wikis and workflow sharing with reviewing and recommendation capabilities. In addition, users can inspect workflows by extracting their metadata, identifying the services used and previewing the workflows graphically. A distinguishing feature of myExperiment is support for packs, which are groups of workflows together with related files that are prepared for sharing. The myExperiment service is accessible both through web pages and through its API; the usage of the latter enabled the creation of Google Gadgets and Facebook applications. Additionally, researchers are equipped with the capability of enacting their workflows ‘in the cloud’ by submitting a collection of research objects to be processed remotely by myExperiment. An interesting feature of the project is ‘social metadata’, which is composed of attributions, creditations, favorites, ratings, reviews, citations, comments, tags and policies. With regard to implementation, the main application of myExperiment is built using the Ruby on Rails framework, while the workflow enactment engines, database server, search server and mail server are external – the main application connects to them. Authentication can be done either using the username/password method or via OpenID services. As regards interfaces, HTML, RESTful XML, RSS and ATOM interfaces are supplied, with REST authentication provided by the OAuth library. The Ruby on Rails application is deployed on a Mongrel Cluster, while static content is served by Apache. The database system used is MySQL, while the search server is Solr. In addition, Nagios tools are employed for monitoring. Furthermore, myExperiment functionality has been extended by several projects, such as BioCatalogue, the SKUA astronomy project and the NEMU music analysis project, which was possible thanks to the open source character of the project. Usage statistics by De Roure et al. [66] show that over the period January–July 2008 the myExperiment site received 60000 page views in 13500 visits by 8581 unique visitors, with workflows being downloaded 50934 times. Some other interesting figures are also presented in the referenced publication. myExperiment is definitely a successful endeavor that has attracted a fairly large scientific community, especially from the field of bioinformatics.

myGrid myGrid is a middleware for the Semantic Grid led by the University of Manchester, which enables biologists to perform and manage in silico experiments, and thereafter explore and exploit experiment results. The myGrid goals are similar to those of the ViroLab Virtual Laboratory, namely management of personal biological data and co-ordination of resources to manage virtual organizations of people, data, tools and machines. A research practice in which myGrid may be helpful is one where there is a need to repeatedly co-ordinate tools to produce results – “tasks that take minutes of computational time, actually take days to run manually” [204]. Stevens et al.

[204] presented their solution via a pilot application to explore Williams-Beuren syndrome (WBS), which is a rare disorder, a microdeletion in a region of human chromosome 7, characterized by a unique set of physical and behavioral features. As the authors recall, before applying myGrid, results were obtained by manually interacting with bioinformatics services on the Web and manually copying results in order to form the input of a subsequent task. Intermediate results were saved on a local file system, while their origin, relevance and status were noted by hand in a lab book. However, such an approach resulted in a rapid growth of files which are difficult to track manually, and myGrid’s ambition is to solve this problem. In order to shift the manual procedure into the myGrid environment, several steps need to be followed. Firstly, all bioinformatics applications employed must be made available as web services. Secondly, a user represents a bioinformatics process in a declarative way using the Simple Conceptual Unified Flow Language (Scufl) in the Taverna environment, which is a workbench enabling the editing of Scufl workflows – both are projects created as part of the myGrid endeavor. Thirdly, the created workflows are enacted using the Freefluo workflow enactment engine. Intermediate and final results are saved either in the local file system or in the myGrid information repository (mIR). In order to support verification of the origin or provenance of large sets of results, a common experimental information model and automated provenance recording are utilized. The former adopts the life science identifier (LSID), which is a class of universal resource name (URN); workflow inputs, intermediate results and outputs are all allocated an LSID, after which retrieval of metadata associated with these items is possible. The latter, the automatic provenance recording module, records process provenance, which is a log describing which services were employed to generate data. Additionally, relationships between data are identified. For viewing and mining results, the Haystack desktop application for browsing multiple views of Resource Description Framework (RDF) information is employed.

Linked Environments for Atmospheric Discovery (LEAD) LEAD [73] is a US National Science Foundation (NSF) funded venture whose objective is to create a cyberinfrastructure for mesoscale meteorology, allowing for grid-based, on-demand design and enactment of dynamic workflows in the domain of meteorology, with the ability of dynamic adaptation to changing requirements, e.g. a rapidly moving tornado or a flood. The foundations of LEAD are the Web Portal, which is the major entry point to the applications, the ARPS Data Assimilation System (ADAS) for data quality control and assimilation, the myLEAD metadata catalogue service, Weather Research and Forecast (WRF) – an atmospheric prediction and simulation model, ADaM (Algorithm Development and Mining) for mining data and the Integrated Data Viewer (IDV) – a desktop application for visualization of a variety of multidimensional geophysical data. The principal components of LEAD are the following subsystems: user, data, tools, orchestration and grid, each responsible for one aspect of the system.

Kepler Kepler [5, 224] is a scientific workflow management system which allows for the design, execution and deployment of workflows. Kepler equips the user with a library of reusable components, called actors, which perform computations, e.g. signal processing, or provide access to data; for instance, a relational database actor facilitates access to relational databases. Input and output ports are defined for each actor and can be linked into a directed acyclic graph which specifies the data flow between actors. A graphical user interface empowers users with an easy workflow construction mechanism. Generic Web service actors allow for the utilization of services defined using the Web Service Description Language (WSDL), while Grid service actors provide means for certificate-based authentication, Grid job submission and Grid-based data access using OGSA interfaces. Additionally, support for specialized data transformation actors, e.g. XSLT and XQuery, has been enabled, while a harvester capability provides means of importing a whole set of related services from web pages or a Universal Description, Discovery and Integration (UDDI) repository.

Triana Triana [56, 224], a part of the GridLab [130] project, is a graphical Problem Solving Environment (PSE) whose basic unit of operation is a component – a Java class representing an algorithm or process with an identifying name, input and output ports, a number of optional name/value arguments and a single process method [224]. In order to write components in languages other than Java, appropriate wrapping code must be provided. Triana is flow-based – data arriving at the component input triggers its execution. Multiple inputs indicate that execution will be suspended until all inputs arrive at the component or, if the developer wishes, execution will trigger immediately. Execution in Triana is decentralized, with data and control flow messages being sent through communication pipes. The internal workflow representation is object-based – each component or task has an accompanying Java object. Instead of the common Directed Acyclic Graph model, a Directed Cyclic Graph (DCG) model is used, i.e. cyclic connections are allowed within the Triana language. As with many workflow engines, the external format is an XML file. Triana provides interoperability with external workflow language representations such as BPEL4WS through pluggable language readers and writers.

Other workflow systems The aforementioned systems are only a few of the many workflow design and enactment environments. Some alternative implementations include [224]:

• Condor DAGMan, which uses a Directed Acyclic Graph (DAG) to represent a set of tasks – nodes symbolize tasks while edges symbolize the dependencies. DAGMan submits jobs to Condor in an order specified by the DAG and processes the results. DAGMan is the base workflow scheduler used by other Grid workflow systems, for instance Chimera, Pegasus and P-GRADE.

• UNICORE. In this project, launched by the German Ministry for Education and Research, the DAG model is used for job description. UNICORE has a 3-tier architecture: user, server

and batch subsystem level. At the user level, users create jobs independently of the system where the jobs will run. The UNICORE server level tackles managing resources, executing jobs and returning results to users, while the batch subsystem tier deals with the destination systems, with their batch systems and storage.

• Chimera and Pegasus in GriPhyN. GriPhyN is a US National Science Foundation project to support large-scale data management in physics experiments, for instance gravitational wave physics or high energy physics. GriPhyN proposes the concepts of an abstract workflow and a concrete workflow. Chimera is a virtual data system combining a virtual data catalogue with a Virtual Data Language (VDL) interpreter that translates user requests into data definition and query operations on a database. Chimera is used to produce Abstract Workflows (AWs) that are specified using the DAG XML description (DAX) language. Pegasus, on the other hand, is exploited to map AWs onto computational Grids, thereby creating Concrete Workflows (CWs). Subsequently, CWs, which are executables combined with runtime information, are submitted to DAGMan for enactment.

• Many more workflow systems not discussed here exist, e.g. GridAnt, ASKALON, GridFlow, GSFL, BPEL, McRunJob, Symphony, P-GRADE, ScyFlow, GALE, WebFlow, Collaborative Application Specification Tool (CAST) and Grid-WFS. For a comprehensive list and comparison of grid workflow systems, the reader is referred to [233].

gLite gLite [47] is the software of the EGEE and WLCG projects, which share much of their infrastructure; the former aims to create a geographically distributed computing infrastructure available to computational scientists, while the latter aspires to provide infrastructure for simulation, analysis and processing of data from the Large Hadron Collider (LHC) experiments. Authentication is provided using X.509 certificates, while authorization services are supported by the Virtual Organization Membership Service (VOMS). gLite is job-oriented – jobs are specified using the Job Description Language (JDL), which is based on the Classified Advertisement (ClassAd) language. Jobs are submitted to resource brokers that route them to particular computing elements, which are usually cluster farms. A user may interrogate the status of a job, cancel it or retrieve its results after completion. Data management has already been discussed in section 3.6 of chapter 3.

The projects recounted in this section are just examples of virtual laboratories that enable performing in silico experiments. They vary in the degree to which they support the computational scientist in their work; many are workflow-based, since numerous Grid practitioners believe that workflow-based software is the best way of utilizing Grid capabilities.

5.2 Attempts to make the Grid service-oriented

Historically, the Grid was not service-oriented, but rather job-oriented. Modern approaches try to follow the service-oriented paradigm, since it decreases the cost of integrating software components

that were developed in isolation from each other and enables faster adaptation to changing requirements by employing an on-demand computing model. In addition, wrapping legacy systems with service-oriented interfaces helps avoid rewriting existing code.

Open Grid Services Architecture (OGSA) OGSA [92, 144] is composed of three main elements: the Open Grid Services Infrastructure (OGSI), OGSA services and OGSA schemas. The basic building block of an OGSA Grid is a Grid Service, which is a Web service that conforms to a set of conventions regarding its interface definition and behavior defined by OGSA. Grid Services are addressable, potentially stateful and transient. Grid Services are created using the createService factory method, which returns an invariant Grid Service Handle (GSH) and an initial Grid Service Reference (GSR), which may change over the service’s lifetime.

Open Grid Services Architecture Data Access and Integration (OGSA-DAI) The goal of OGSA-DAI [10] is to bring a consistent service interface for data access and integration to databases exposed to the Grid, concealing dissimilarities of these systems such as database driver technology, formatting and delivery mechanisms. OGSA-DAI achievements include consistent access to multiple database paradigms (relational and XML), support for incremental and bulk data delivery from services and files, full integration with existing Grid authentication and data transport, as well as ongoing standardization. OGSA-DAI facilitates easier design of federation middleware, hiding much of the heterogeneity of the underlying data resources. On top of OGSA-DAI, OGSA-DQP (a distributed query service) has been built, which enables distributed queries over nodes obtained from the Grid.
Figure 32: Grid File Sharing System (GFISH) architecture [232]

Grid File Sharing System (GFISH) Yaodong et al. [232] have developed GFISH (Grid File Sharing system), which includes a server providing a web service API for the LFC catalogue and a related Java client with Grid user credentials retrieved from a MyProxy server. They

implemented the server using gSOAP, while utilizing Axis on the client side, thus introducing significant transmission overhead. GFISH [231, 232] is a noteworthy project because its goals are similar to the objectives of this thesis project. In particular, GFISH’s ambition is to provide pseudo LCG file access commands, such as lcg-cp, lcg-cr etc. (see figure 32). A more recent paper [231] indicates that the authors enhanced their solution by dividing communication into two channels: currently, metadata operations, e.g. listing directories, moving or replicating files and locating physical file addresses, are performed through the GFISH WS server, while actual data transfer is achieved using the GridFTP protocol embedded in the CoG jglobus tool.

5.3 Data access and persistence in Grid projects

As previously mentioned in the ‘Organization of the thesis’ section (2.3), an overwhelming majority of Grid projects still store data in relational, XML or, occasionally, object databases located outside of the Grid, which is of no interest from the thesis’ point of view. Nevertheless, some projects take advantage of Grid-enabled storage and data management services, with a few examples being broached in this section.

ATLAS experiment The ATLAS project has developed its own distributed data management system, termed Don Quijote (DQ2) [100, 205], which manages file-based data of all types – in particular event data and conditions data, together with user-defined file sets. It groups file-based data into datasets, with a set of catalogues storing information on their location, constituent files and metadata. These include the following catalogues: the dataset repository, which is a catalogue of datasets, the dataset selection catalogue, the dataset content catalogue and the data location catalogue. Datasets possess a changeability state – they can be open or frozen (locked permanently) – with a data subscription service enabling users and sites to acquire data updates in an automated way via ‘subscriptions’ to mutable datasets. In addition to the catalogues mentioned, according to Stewart et al. [205] the LFC is queried very frequently. Don Quijote is deployed at the Tier-0 site and the 10 Tier-1 sites that participate in the ATLAS experiment.

EUChinaGrid One of the ambitions of EUChinaGrid was to find proteins of potential pharmacological application. The genomic part of this project had the objective of identifying stretches of genomic sequences of potential biological function that are not present in known protein and genetic databases. Piwowar et al. [179] report that they used the LFC catalogue available to their VO to store input data for the experiment, in order to enable access to this data on all machines participating in the computation. The sequences being the focus of the experiment were grouped in sets of about 100000, stored on a storage element and registered in the LFC catalogue. The main script automated the necessary work, including data transformation, execution of BLAST and copying output files to their destination. A dedicated portal was developed as a user interface enabling selection of appropriate files and job submission. Piwowar et al. [179] assert that they were able

to carry out the whole experiment in 38 hours with an average resource consumption of 126 CPUs.

InteliGrid Within the scope of the InteliGrid project, a Document Management System has been developed [71], which is based on grid middleware services, namely OGSA-DAI and the Grid Authorization Service (GAS). OGSA-DAI has been employed to integrate various back-end document storage systems: relational database management systems and WebDAV-based servers.

PALADIN The Paladin project [101–103] is interesting from this dissertation’s point of view, since it addresses the issue of dynamic data source integration in a Grid environment and, more importantly, because it developed a data source registry – a component that is difficult to find in existing projects, but is also present in our Virtual Laboratory architecture. In particular, Göres and Dessloch [103] developed the Paladin data source registry for registration and discovery of data sources. The Paladin DSR implements the Paladin Metamodel (PMM) [101], which is based on “typed, attributed multigraphs to represent atomic and complex features, types and relationships” [103] of data source schemas. The Paladin DSR stores information on data sources to be exploited in the schema matching process, which is performed by the ScheMaF framework, also a part of the Paladin project. On the other hand, our Virtual Laboratory Data Source Registry (DSR) is intended for storing data source information together with accompanying credentials, in order to automate access to these data sources. Although the purposes and realizations of both solutions are different, the general idea of storing structured information on data sources is similar.

Figure 33: Inferno namespace exporting and importing (figure created on the basis of a presentation from the Inferno website)

Grid, cloud computing and distributed file systems Apart from LFC and FiReMan, which may be perceived as distributed file systems, several others provide file or replica catalogue functionality. These include DFSgc [3], the Globus Data Replication Service (DRS), Giggle [54],

the IGOR file system [6] and the Grid Virtual Directory System (VDS) [59]. In addition, numerous large-scale distributed file systems exist, e.g. the Lustre filesystem, the Hadoop Distributed File System, the Google File System [99], the IBM General Parallel File System [124], Microsoft DFS and Sun ZFS. An interesting example of a distributed file system is the file system of Inferno [72], which enables a single-rooted namespace over a variety of resources connected to the network, specifically computers, databases, cameras and others.

5.4 Libraries providing access to gLite data resources

lcg_util C API This is a C library that provides the same functionality as the LCG command line programs, which in fact invoke lcg_util API functions. According to Burke et al. [16], it should cover most needs of user applications. The lcg_util API interacts with the LFC catalogue and is independent of the underlying storage technology. lcg_util API functions begin with the lcg_ prefix, e.g. lcg_cp, lcg_cr. In addition to the simple commands, there are lcg_util functions that use a buffer for complete error messages – they are formed by appending ‘x’ to the function name, e.g. lcg_cpx; functions with a timeout – formed by appending ‘t’; and functions with both features, whose names end with ‘xt’.

Grid File Access Library (GFAL) GFAL is a high-level C library bringing a POSIX-style interface for input/output operations on Grid files, concealing interactions with Storage Resource Managers, Storage Elements and the LFC catalogue. However, it is lower-level than lcg_util. GFAL function names begin with the gfal_ prefix, e.g. gfal_read, gfal_close. A user can supply GUID, LFN, SURL and TURL names as arguments to GFAL. Both the GFAL and lcg_util APIs need certain environment variables to be set if they are to contact storage elements and the LFC catalogue: LCG_GFAL_VO – the Virtual Organization name, LCG_GFAL_INFOSYS – a comma-separated list of BDII hostnames and ports, and LFC_HOST – the LFC server address.
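The sketch below shows the environment such a program typically expects; the hostnames are placeholders and, in practice, these variables are usually exported by the gLite UI shell profile rather than set in code:

#include <stdlib.h>

int main(void)
{
    /* environment required by GFAL and lcg_util (placeholder values) */
    setenv("LCG_GFAL_VO", "dteam", 1);                       /* Virtual Organization name */
    setenv("LCG_GFAL_INFOSYS", "bdii.example.org:2170", 1);  /* BDII hostname:port list */
    setenv("LFC_HOST", "lfc.example.org", 1);                /* LFC server address */

    /* ... GFAL or lcg_util calls would follow here ... */
    return 0;
}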

Lower-level APIs gLite also provides some lower-level APIs, although their use is discouraged [47]. They include the LFC_client API, RFIO API, gsidcap API, edg-gridftp Globus API, SRM API, and the edg-rm, edg-rmc and edg-rlc APIs (see figure 34).

Java lcg_util and GFAL API wrappers The SEE-Grid and Gilda projects developed wrappers for the aforementioned C libraries using the Java Native Interface (JNI). The Gilda GFAL Java API (also termed APIGFAL) provides GFAL functionality through three classes: GFalDirectory, GFalFile and GFalUtilities, while the File Management Java API provides means to interact with gLite storage at a higher level, allowing not only for data access but also for LFC catalogue manipulation. The SEE-Grid File Management Java API exposes LFC operations with the following classes: LFCDataStorage, LFCAliasItem, LFCDirectoryItem, LFCFileItem and a few helper classes, such as LFCFileMode, SEList and ItemIterator.
Figure 34: gLite data management application and command line interfaces – blue color indicates those that are deprecated [47]

ChemPo LFC command wrappers The ChemPo project [202] provided wrappers for the SEE-Grid Java File Management API calls. ChemPo executes every data access or data management command in a separate Java Virtual Machine (JVM), thus enabling commands to run with a different set of environment variables, which in turn makes it possible to act on behalf of another user.

6 General software requirements

This chapter provides background for chapter 7 – Detailed requirements – and the following chapters of the dissertation. It defines the environment in which the component being designed will have to operate, the external modules and software it will need to communicate with, and the users it will serve. Furthermore, it contains an overview of the component’s functions. However, it does not provide information about its decomposition into modules or its implementation – this is the purpose of chapter 8.

6.1 Scope

The thesis concerns the LFC Data Source (LFC DS) component, which will enable Virtual Laboratory users to access and manage data administered by EGEE/WLCG storage services, in particular the LCG File Catalogue and storage elements, using the GScript Virtual Laboratory language. Moreover, it will manage their EGEE/WLCG credentials and provide means for registering EGEE/WLCG data sources. LFC DS is intended to provide only core functionality, leaving out advanced gLite features such as replication. The benefits of using LFC DS for GScript developers can be summarized as follows:

• short learning curve

• interoperability with other JRuby code

• less effort put into programming EGEE/WLCG data access

• integration with Virtual Laboratory software

The main advantage for Virtual Laboratory users is support for a large storage infrastructure and access to files that are already present on gLite storage.

6.2 Product perspective

As the Recommended Practice for Software Requirements Specifications [236] suggests, this subsection should connect the requirements of a larger system with the functionality of the component being developed. It is therefore noteworthy that access to the EGEE infrastructure is one of the design goals of the Virtual Laboratory project (see figure 10). Most of the envisaged components have already been created; one of the very few features that still needs to be added is integrated access to EGEE/WLCG data sources, and this is the focal purpose of the LFC Data Source component. The components, modules and systems with which LFC DS will have to be integrated within the scope of the Virtual Laboratory include (see figure 35):

• Data Access Client, version 2 (DAC2)

• Data Source Registry

• One of the user interfaces, if it is decided that an integrated LFC DS user interface is desirable

• Security system

Figure 35: LFC DS (indicated by yellow color) in the context of Virtual Laboratory

Figure 36: LFC DS in the realm of EGEE/WLCG Grid

Furthermore, LFC DS will have to operate in the realm of the EGEE/WLCG Grid (see figure 36), performing data management and data access operations. The Virtual Laboratory and the EGEE/WLCG Grid are separate systems, and how to carry out simultaneous operation in these two distinct worlds is a design decision to be considered.

6.3 Product functions

The use case diagram in figure 37 summarizes the foreseen functions of LFC DS.

Figure 37: LFC DS Use Case diagram

6.4 User characteristics

Final users of LFC DS are mostly computational scientists who have some programming knowledge. However, their main field of expertise is their scientific domain, such as bioinformatics or computational chemistry. They know the fundamentals of JRuby programming syntax and are acquainted with the Virtual Laboratory environment. In addition, they are often in contact with a professional computer scientist who helps them solve technical problems, e.g. when they come across an exception they do not understand. On the other hand, the GridSpace Engine software and the accompanying LFC DS component are installed by a system administrator who has considerable knowledge of the Virtual Laboratory, the GSEngine environment, the UNIX command line and networking. Nevertheless, the system administrator's interaction with LFC DS is limited to installing and configuring the software, starting services and resolving problems encountered during its operation.

6.5 Constraints

Interviews with Virtual Laboratory team members identified several restrictions that LFC DS must follow:

• LFC DS module must not incorporate too many dependencies into GSEngine

• LFC DS should not store temporary files on a server where GSEngine operates

• Devised API must be simple

• LFC DS must automatically manage user credentials

Another consideration that is beyond question is that the transfer of private keys and other sensitive data must be encrypted, and any temporary files containing sensitive data should be kept no longer than needed. Furthermore, a legal constraint is that the libraries used by the project must fall under the FLOSS56 category in order to conform to the copyright policy of the GridSpace Engine, as mentioned in section 3.1.

6.6 Assumptions and dependencies

The requirements specification that follows rests on the following assumptions:

• Access to gLite UI with all gLite libraries in place is provided.

• Virtual Laboratory infrastructure is established, in particular there is access to Data Source Registry, security components and DAC2 data access layer.

• Finally, an imperative assumption is that at least some users possess valid Grid credentials enabling them to use the EGEE/WLCG Grid.

56 Free/Libre Open Source Software

7 Detailed requirements

The following chapter provides detailed requirements for the LFC DS component. Each requirement holds a unique ID composed of one of the strings UI, SI, FR, SC or NF, indicating a user interface requirement (UI), software interface requirement (SI), functional requirement (FR), user scenario (SC) or non-functional requirement (NF) respectively, followed by '-' and a number.

7.1 Functional requirements

With regard to the functionalities of EGEE storage services that need to be provided to the Virtual Laboratory, the dissertation author had several discussions with Cyfronet development team members and came to the conclusion that the paramount goals of the devised API for accessing EGEE/WLCG storage resources are simplicity and accessibility. This is also the approach used in the DAC2 data access architecture, where configuration of data sources is required only once and can be done by a qualified person. Such a configured data source can then be used by a number of computational scientists who do not need to specify every detail concerning the data source, as this information is downloaded from the DSR; this mechanism eliminates the burden of remembering various endpoint URLs and technology-dependent information. Similar ideas should be employed when designing services for accessing the EGEE/WLCG Grid. Moreover, the actual file operations that need to be supported include obtaining a file represented by a logical file name from the Grid, storing a file in the Grid filesystem, obtaining the size of a file, creating a directory in a file catalogue and deleting a file or directory. File permissions are not required to be supported, as they may intimidate users; transparent access without file permissions is a better solution in this case. Additionally, the user should not be required to specify which storage element is used for accessing a replica or saving a file. If no automatic optimization mechanism is employed, there needs to be a default storage element used for each EGEE/WLCG data source.

Another step in collecting requirements was a review of existing code accessing EGEE storage resources within the ChemPo project, which gave the author valuable information on the actual servers being contacted and services being used; in particular, our VO uses the LCG File Catalogue as its grid file catalogue and VOMS-enabled grid certificates are employed. The configuration parameters required by the ChemPo software were the location of certificate repositories, the VOMS directory, the VOMS server, the Site Berkeley Database Information Index host, the LFC host, the storage element URL to be used, the locations of user private keys, grid certificates, proxy certificates, passwords and the Virtual Organization name to be used. The usage of LFC implied that the FiReMan web-service interface could not be used and another solution had to be envisaged. In chapter 8, various possibilities of integrating these services into the GridSpace Engine using existing gLite client utilities are discussed.

As already mentioned in chapter 4, one of the issues is automating the generation of proxy certificates, so that users would not have to create them manually, together with managing users' credentials. The software requirements must therefore address this issue.

However, this does not exhaust the requirements of certificate management. Many users wishing to take advantage of Grid resources will not want to go to the trouble of generating a proper key pair, applying for a certificate or waiting several weeks or even months for a certificate to arrive. Often, the need to perform computation is urgent and there is no time for these activities. Furthermore, many non-computer scientists are intimidated by the procedures for applying for and receiving grid certificates, which creates doubts as to whether users will follow the procedure correctly. A solution in such a situation would be for the GridSpace data access system to allow an authenticated user to perform Grid operations without the need to provide a certificate; the system would use the certificate of another user who had agreed that it could be used by other members of the group involved. Of course, this is a security compromise; however, when used within VO boundaries and only by authenticated users, tracing any damage caused by data operations performed on behalf of the user who shared the certificate is limited to authenticated users. Therefore, such a mode of operation makes sense only when the number of scientists within a collaboration is small. To scale this solution to a larger number of users, tight control over who can perform operations using a particular certificate is of paramount importance. However, due to the complications this modification would imply for the whole DAC2 infrastructure, and because such a feature is not essential to the goals of the thesis project, these functionalities are left for future work, as this facet is only necessary if adaptation of the solution to larger virtual organizations is required.

Table 4: Functional requirements

FR-1: Provision access to EGEE/WLCG storage resources
FR-1.1: Support obtaining a file represented by a logical file name from the Grid
FR-1.2: Support storing a file in the Grid filesystem
FR-1.3: Support obtaining the size of a file
FR-1.4: Support creating a directory in a file catalogue
FR-1.5: Support deleting a file or directory
FR-1.6: Support listing directories
FR-1.7: Support checking for the existence of files and directories
FR-1.8: Omit support for file permissions
FR-1.9: The user should not be required to provide a storage element name in method invocations; it should be remembered by the system for each LFC data source
FR-1.10: Let users refer to files only using logical file names (LFN)
FR-1.11: Use the LCG File Catalogue for catalogue operations
FR-1.12: Support standard gLite storage elements: dCache, CASTOR and DPM
FR-2: Create a mechanism that eliminates the burden of remembering various endpoint URLs and technology-dependent information
FR-3: Manage users' credentials using the Data Source Registry (DSR)
FR-4: Allow for credentials sharing
FR-4.1: Allow marking a credential as being available to other authenticated users
FR-4.2: If credentials are not found for the current user, automatically search for and use credentials marked for sharing
FR-5: Automate Grid proxy generation

7.2 User interfaces

Users will be able to interact with LFC DS using a GScript-based interface, which is described in section 7.3, and a graphical user interface (GUI) that satisfies the requirements delineated in table 5. Their rationale includes management of user Grid credentials and management of information regarding LFC data sources. The following terms are used within the list of user interface requirements:

• private key – a user private key that can be used to generate a Grid proxy certificate

• proxy certificate – a temporarily generated certificate that allows for authentication and authorization in gLite environment

• grid certificate – a certificate signed by Certificate Authority (CA) confirming particular user’s identity and that he or she is entitled to use Grid services

All requirements presented in table 5 are to be verified by manual examination of GUI.

Table 5: User interface requirements

UI-1: Ability to upload or remove a private key, or to check whether it has been uploaded.
UI-2: Ability to upload or remove a grid certificate, or to check whether it has been uploaded.
UI-3: Ability to upload or remove a proxy certificate, or to check whether it has been uploaded.
UI-4: Ability to set the passphrase that can be used to decrypt the private key, remove it from the system, or check whether it has been set.
UI-5: Ability for the user to decide whether his or her credentials can be used for data access by other authenticated users.
UI-6: Capability to add a new LFC data source with the information necessary for other LFC DS components to utilize it.
UI-7: Capability to edit or delete existing LFC data sources.
UI-8: Checking for simple errors in fields provided by the user, before submitting them to the system
UI-8.1: checking whether the user has provided mandatory fields
UI-8.2: verifying that host names provided by the user are valid
UI-8.3: checking the validity of numeric input fields
UI-10: Providing contextual help.
UI-11: Capability of browsing the list of LFC data sources.
UI-12: Securing the user interface by the Virtual Laboratory security system.
UI-13: Integration with existing Virtual Laboratory components for registering and updating data source information.

7.3 Software interfaces

Software interface requirements are listed in table 6 and they refer to DACConnector methods that will be accessible from GScript code. When perusing these requirements, the reader is advised to note the following statements and conventions:

• When a requirement refers to a path, it means an LFC catalogue path without the "/grid/vo_name/" part (the last "/" may or may not be left out), as specified in SI-1.

• A convention for distinguishing class and instance methods from Programming Ruby: The Pragmatic Programmers’ Guide, Second Edition [214] has been employed, namely ClassName.method_name is used to indicate a class method while ClassName#method_name is used to denote an instance method.

• DACConnector instance method invocations presented here apply only to DACConnector class instances initialized with a data source handle that refers to an LFC data source.

• If a requirement makes reference to a user, it denotes a user who executes a GridSpace script invoking DACConnector methods, or a script developer, depending on context.

• handle-name is a data source handle referring to an LFC data source

All requirements presented in table 6 are to be verified using test methods created using a chosen testing framework.
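Before the requirements themselves, the following hypothetical GScript (JRuby) fragment illustrates how the interface specified in table 6 is intended to be used; my_lfc is a made-up data source handle and the paths and payload are placeholders, not part of an actual experiment.

  ds = DACConnector.new('my_lfc')                         # SI-2: credentials are looked up in the DSR
  ds.create_directory('experiments/run_01')               # SI-6: path given without '/grid/vo_name/'
  payload = 'example content'.to_java_bytes               # JRuby helper producing a Java byte array
  ds.store_file(payload, 'experiments/run_01/in.dat')     # SI-15
  puts ds.size('experiments/run_01/in.dat')               # SI-17: file size in bytes
  ds.list_files('experiments/run_01').each do |item|      # SI-13: directory listing
    puts "#{item.get_name} #{item.is_directory}"
  end
  ds.open_file('experiments/run_01/in.dat', :r) do |io|   # SI-14: Ruby-style block access
    header = io.read(128)                                 # read the first 128 bytes via a Ruby IO object
  end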

Table 6: Software interface requirements

SI-1: LFC DS methods represent paths by providing only the text after the "/grid/vo_name/" part, where vo_name is the Virtual Organization of the LFC catalogue of the data source being used. The last "/" of the "/grid/vo_name/" text may be present or omitted.
  Rationale: Allows users to type less when writing a script, by reusing information stored by LFC DS.

SI-2: If a valid credential is stored for a user, or some other user of the Virtual Laboratory has agreed that their certificate may be used by other members of the collaboration, invocation of the method DACConnector.new(handle-name) should initialize a DACConnector object with a reference to an LFC DS connector, enabling a script to perform subsequent LFC DS method invocations. If user credentials are not present, an exception should be thrown and DACConnector should not be initialized.
  Rationale: Enables initialization of LFC DS in a fully automated way, without credentials being explicitly specified by the user.

SI-3: DACConnector.new(handle-name, password), where password is the passphrase to the Grid private key stored for a user, should initialize a DACConnector object with a valid reference to an LFC DS connector if all credentials, with the possible exception of the passphrase and proxy certificate, are stored for the user. If they are not stored, an exception should be thrown and DACConnector should not be initialized.
  Rationale: Enables a script to use credentials previously provided by the user when the passphrase to the private key has not been provided.

SI-4: DACConnector.new(handle-name, proxy), where proxy contains the contents of a valid user proxy certificate represented as a JRuby string, should initialize a DACConnector object with a valid reference to an LFC DS connector object, regardless of whether credentials are stored for the user or not.
  Rationale: Allows a script to initialize a data source connector using a proxy certificate, so that there is no need to have the user's Grid credentials previously provided to LFC DS in order to use its functionality.

SI-5: DACConnector.new(handle-name, userkey, usercert, key-passphrase), where userkey contains the private key used for Grid proxy certificate generation, usercert is the user's Grid certificate and key-passphrase is the passphrase to decrypt userkey, should initialize a DACConnector object with a valid reference to an LFC DS connector object, regardless of whether credentials are stored for the user or not. userkey, usercert and key-passphrase are represented as JRuby strings.
  Rationale: Enables a script to utilize LFC DS functionality when credentials have not previously been provided by the user, but access to these credentials is possible from the GridSpace script.

SI-6: DACConnector#createDirectory(path) or DACConnector#create_directory(path), where path is a string constructed by concatenating an existing directory name, a slash "/" and the directory to be created, should attempt to create the directory specified by path in the data source's LFC catalogue under the "/grid/vo_name" directory, where vo_name is the data source's Virtual Organization, returning true on success and false otherwise.
  Rationale: Enables creation of LFC directories.

SI-7: DACConnector#createDirectory(path, child_directory) or DACConnector#create_directory(path, child_directory), where path is an existing directory and child_directory is a directory to be created in the path folder, should attempt to create the directory specified by path+"/"+child_directory in the data source's LFC catalogue under the "/grid/vo_name" folder, where vo_name is the data source's Virtual Organization, returning true on success and false otherwise.
  Rationale: Enables creation of LFC directories.

SI-8: DACConnector#delete(path) or DACConnector#deleteFile(path) attempts to delete the file or directory specified by path, returning true on success and false otherwise.
  Rationale: Allows for deletion of files and directories.

SI-9: DACConnector#isDirectory(path), DACConnector#is_directory(path) and DACConnector#directory?(path) check whether the directory denoted by path exists in the data source's LFC catalogue and return true or false respectively.
  Rationale: Enables checking for directory existence without listing the parent folder and testing whether the directory in question is contained in the returned listing.

SI-10: DACConnector#exist?(path), DACConnector#exist(path), DACConnector#exists(path) and DACConnector#exists?(path) check whether an item denoted by path exists in the data source's LFC catalogue and return true or false respectively.
  Rationale: Enables checking for item existence without listing the parent folder.

SI-11: DACConnector#file?(path), DACConnector#is_file(path) and DACConnector#isFile(path) check whether an item denoted by path exists in the data source's LFC catalogue and represents a file, returning true or false respectively.
  Rationale: Enables checking for file existence without listing the parent folder and testing whether the file in question is contained in the returned listing.

SI-12: DACConnector#get_file(path) and DACConnector#getFile(path) obtain a file denoted by path from Grid storage, returning it as a Java byte array. The methods throw an exception in case of file unavailability.
  Rationale: 1. Allows for loading a file directly into a variable in GScript code. 2. There is no need to remember to close the file when writing GScript code.

SI-13: DACConnector#list_files(path) and DACConnector#listFiles(path) return a list of directory items of the directory denoted by path, allowing for:
  • getting the item name by the ClassOfDirectoryItem#get_name and ClassOfDirectoryItem#getName methods
  • checking whether an item is a directory or a file by using the ClassOfDirectoryItem#is_directory and ClassOfDirectoryItem#isDirectory methods
where ClassOfDirectoryItem is some internal class representing directory items which is not required to be exposed to the user.
  Rationale: Allows for directory listings.

SI-14: DACConnector#open(path, mode), DACConnector#open_file(path, mode) and DACConnector#openFile(path, mode), each with an optional Ruby block taking one block argument, attempt to open a particular Grid file for reading or writing depending on mode, which may be one of the following values:
  • :r, :read, "r", "read" indicate opening for reading
  • :w, :write, "w", "write" denote opening for writing.
If an optional block is supplied, a Ruby IO object is passed to the code contained in the block. After the code is executed, the Grid file is closed. On the other hand, if a block has not been supplied, the method should return the Ruby IO object to the caller, leaving the responsibility of closing the file to the invoking script. If the file is opened for reading, data is streamed to the machine invoking the GridSpace script as a result of invocations of Ruby IO stream reading methods. Conversely, invocations of Ruby IO stream writing methods on a file opened for writing will cause data to be streamed out of the GSEngine machine.
  Rationale: 1. Allows for Ruby-like file access. 2. Enables access to large files.

SI-15: DACConnector#store_file(payload, path) and DACConnector#storeFile(payload, path) attempt to store the contents of the Java byte array payload in a Grid file denoted by path, returning true on success and false otherwise.
  Rationale: 1. Allows for storing the contents of a variable in a Grid file. 2. There is no need to remember to close the file when writing GScript code.

SI-16: DACConnector#zero?(path) checks whether a file denoted by path exists and has a length of zero bytes, returning true or false respectively.

SI-17: DACConnector#size?(path), DACConnector#size(path) and DACConnector#getSize(path) retrieve the size of a Grid file specified by path.
  Rationale: Allows for retrieving file sizes.

SI-18: Enabling consistency between LFC DS and the DAC2 data access infrastructure, i.e. making sure that DAC2 data sources do not use different method names for operations similar to those that LFC DS provides.

SI-19: The user should not be exposed to functions providing Grid credential management, with the exception of providing credentials within specific constructors (see SI-2, SI-3 and SI-4). In particular, proxy certificates should be generated and managed without user intervention. Therefore, GScript software interfaces providing credential management should not be created.
  Rationale: 1. Concealing the particulars of Grid credential management. 2. Reducing the number of steps required to access Grid data.

7.4 Performance requirements

Communication overhead If network communication needs to be employed, protocols which introduce significant performance overhead, such as SOAP, should be avoided for transmission of file contents. On the other hand, they are acceptable for sending commands to be invoked, since such operations are less costly in terms of data transmitted.

Command execution overhead Because execution of data management commands, especially access to storage elements, is time-consuming, no significant requirements regarding command execution overhead need to be met; even eliminating a small execution overhead would not make the entire command execution much faster.

Number of simultaneous data access requests to be supported The quantity of simultaneous data access requests should be limited only by hardware capabilities. However, for the purpose of system validation, this figure should be at least 5.

Ability to access large files The LFC DS server must support files of at least 1 GB in size, without being constrained by available memory.

7.5 Software system attributes

Security An important issue regarding certificate management is maximum security of user files, especially private keys and certificates, if they are stored temporarily or sent over the network. In particular, if a need for storing temporary files arises, access rights should be set appropriately and sensitive files deleted as soon as they are no longer required. Furthermore, if sensitive data, such as user credentials, needs to be sent over the network, strong encryption should be used.

Maintainability Since the system is meant to be used and extended in the context of emerging Grid projects, such as PL-Grid, complete documentation of its design and functions is expected; this requirement is mostly met by the dissertation itself. Moreover, it would be beneficial if certain components of the system were reusable. Such modularity would be helpful for other projects if only part of the LFC DS functionality is of interest to them, e.g. only access to EGEE/WLCG storage without automatic credentials management. In addition, a proven solution for managing project dependencies would help to ease incorporating LFC DS into other projects or to adapt LFC DS when the environment in which it operates changes. Furthermore, logging of operations should also be possible in order to track any problems that may occur.

Portability Most importantly, LFC DS must not compromise the portability of the GridSpace Engine, which is platform-independent. If portability across platforms is not possible, LFC DS should be split into parts that are portable and parts that are platform-dependent, so that those that are not portable remain external to the GridSpace Engine.

Testability Tests should be provided in order to check validity of software installation.

Summary of non-functional requirements Sections 7.4 and 7.5 identified the non-functional requirements of the project. They are summarized in table 7.

Table 7: Synopsis of LFC DS non-functional requirements

NF-1: Efficient communication protocol used for data streaming, if a need to employ network communication arises. (Verification: verification test & code inspection)
NF-2: At least 5 simultaneous data access requests supported. (Verification: verification test)
NF-3: Ability to access large files (at least 1 GB) must be supported by the LFC DS server. (Verification: verification test)
NF-4: Strong communication encryption if sensitive data is sent over the network. (Verification: code inspection)
NF-5: If temporary files are used, the system should set appropriate file permissions if they contain sensitive data and delete them as soon as they are no longer required. (Verification: code inspection)
NF-6: Complete documentation. (Verification: documentation inspection)
NF-6.1: Software requirements specification
NF-6.2: Design description
NF-6.3: Documentation of user interfaces
NF-6.4: Documentation of software interfaces
NF-6.5: Documentation of reusable components that are artifacts of the project
NF-6.6: Installation guide
NF-7: Logging of operations. (Verification: code inspection)
NF-8: Support for managing dependencies. (Verification: code inspection)
NF-9: Not to compromise the portability of the GridSpace Engine. (Verification: code inspection)
NF-10: Provide some modularity and reusability, enabling incorporation of only some parts of the functionality into other projects. (Verification: code inspection)
NF-11: Tests enabling verification of the validity of an LFC DS installation. (Verification: code inspection)

8 Design description

Firstly, this chapter examines various design decisions that could satisfy the requirements stated in chapters 6 and 7, together with considerations on their applicability and value. After identifying the advantages and disadvantages of each solution, decisions are made on which to use. Subsequently, section 8.4 shows how the software will be structured and how it will operate.

8.1 Design decisions

With regard to providing access to EGEE storage resources, an uncomplicated service-oriented means of accessing these services could be the solution; however, it is not provided with the default gLite installation. A service-oriented approach has been successfully applied to a number of legacy applications, e.g. COBOL, CL, ILE or RPG programs on IBM i (formerly known as IBM System i or iSeries), saving many man-hours invested in these applications. If such an approach had not been employed, these applications would have had to be rewritten from scratch, imposing immense development cost. Therefore, it is highly probable that such a solution will also succeed in the scope of our projects. However, the SOAP protocol involves too much communication overhead; its usage would perhaps be arguable if the transferred data were highly structured and small in volume, but when dealing mostly with files, which may be quite large, a more compact protocol is advised, as already indicated in requirement NF-1.

With respect to integration with the GridSpace Engine and the placement of LFC DS in the data source hierarchy, after a consultation with Cyfronet members it was decided that the new GScript data source connector would fall under the Unstructured data sources category in the DAC2 data access layer (see figure 18). With regard to the configuration previously mentioned in section 4.2, all data source configuration will be stored in the Data Source Registry.

An interesting issue is the means of access to gLite storage services – several alternatives were considered that finally led to the decision of employing the service-oriented paradigm; this paragraph presents the reasoning. Firstly, data access could be performed directly from a library which is a dependency of a DAC2 connector. Although simple to realize, this would impose unacceptable requirements on the GridSpace Engine and would limit its installation environment to Linux with gLite installed, which would definitely be too high a compromise (see NF-9). Another option would be to require the user to possess a valid account on a gLite UI and to provide some means of remote command execution, e.g. a user would upload a private key generated to allow access to their account; the GridSpace Engine would then execute gLite commands, logging into the user's account using the uploaded credentials. Although SSH provides means for limiting the commands available when logging in with a particular key, so that in the case of a private key being stolen the possible actions would be restricted, many users would still show reluctance. Furthermore, users would still have to perform some intricate procedures of obtaining access to a gLite UI, generating and adding private keys (see FR-3 and FR-5), not only to access the Grid, but also to utilize the envisaged Virtual Laboratory gLite data access service. Although this is an advancement compared with the formerly mentioned work-around, it is not the ultimate solution for satisfying prospective users. An architectural choice that could satisfy the needs without compromising GridSpace portability and usability would be to create a dedicated server installed on a gLite UI that would act as a data server. However, it involves many implementation complications, an example of which is the sending of large files, which induces the need to incorporate some kind of streaming to avoid OutOfMemory errors (NF-3). Additionally, there should be a utility to concurrently generate certificates without access to actual users' accounts, i.e. only one account on a gLite UI would be utilized. However, many distinct certificates belonging to various users would have to be generated and managed, which is impossible from a single process, since certificate locations are controlled using environment variables – therefore, a multi-process application is the only option to implement this idea. Furthermore, tracing errors will be difficult, as with any remote application accessing native functionality which is composed of multiple processes. Despite the implementation difficulties, it appears to be the only plausible choice for solving all the concerns.

With regard to credentials management, initially a MyProxy [134, 164] server was considered as a solution. Nevertheless, it was discarded by other team members and it was decided that certificate information would be stored in the Data Source Registry (see 4.3 and FR-3). As already noted in 4.3 and by FR-5, one of the issues was automation of the generation of proxy certificates, so that users would not have to create them manually. Normally, they are produced using the voms-proxy-init command, with the Virtual Organization name provided with the –voms option. The locations of the user private key, the grid certificate and the path where the generated proxy certificate will be stored are specified using environment variables. It is impossible to produce proxy certificates concurrently from the same process, and it is also impossible to invoke commands that access gLite data sources within a single process. Therefore, as already mentioned, a multi-process application had to be created which executes Grid operations in separate processes. Moreover, a supervising process has to take care of the file locations where certificates will be stored and ensure that proper environment variables are set for each new process spawned, with each variable pointing to valid user credentials, i.e. existing certificate and private key files. Additionally, proper credentials must be provided when a Grid operation is invoked – a grid certificate, private key and private key passphrase must be conveyed if the operation is the generation of a proxy certificate. On the other hand, if the Grid operation is e.g. accessing the LFC catalogue or accessing a file, the proxy certificate must be sent; when the proxy is not present or has expired, it should be generated and saved in the DSR (FR-5).

In order to satisfy requirement NF-4, Transport Layer Security (TLS) or tunnelling may be used. Nevertheless, an inherent feature of TLS is the necessity to manage server certificates; therefore, tunnelling was chosen as the solution providing communication encryption. The mechanism implemented to fulfil requirement NF-5 is planned to restrict UNIX file permissions immediately after creation of a temporary file and to delete the file when the server method that is using it finishes.

Since integration with existing infrastructure is a key requirement (UI-13), instead of writing a standalone application, one of the Virtual Laboratory user interfaces ought to be extended. A portlet in the Virtual Laboratory portal could satisfy this need but, on the other hand, the portal is mostly intended for final end-users, so presenting them with somewhat obscure options would confuse rather than help them. Another option is to extend the Experiment Planning Environment, which is intended for experiment developers who are, to a certain extent, acquainted with computer science. However, since they are often computational rather than computer scientists, any possible means of providing simplicity should be pursued. Moreover, integrating the graphical interface of the registry of LFC data sources into EPE will bring this interface close to the environment where experimental code employing LFC data sources is produced, making this user interface helpful when searching for existing LFC data sources and updating, deleting or adding new ones during a development cycle. Additionally, the existing DSR-plugin, which already manages existing data sources of various kinds, mostly relational databases and sources accessible using a WebDAV interface, is the most promising target of integration. Nevertheless, the wizards for adding and updating the aforementioned data sources are quite monolithic and therefore not easy to extend. Furthermore, they manage information such as host name, port, schema name, username and password, but these values are totally different for LFC data sources. Therefore, the alternatives for consideration are rewriting these wizards from scratch, taking into account the new data source type, or creating a separate wizard with a distinct invocation mechanism. The former is time consuming, while the latter would compromise consistency. Therefore, a hybrid approach is needed: to maintain the invariable invocation mechanism while amending the monolithic design into a modular one, so that a rewrite of the already created wizards will not be necessary. Apart from changes in the user interface, many changes must be made to the database layer of the DSR-plugin in order to sustain the current capabilities of managing existing types of data sources, since the DSR schema needs to be changed, as already indicated in earlier sections. Additionally, new methods must be added to the database access layer to supply the functionality necessary for the new LFC data source wizard.

With regard to documentation requirements, NF-6.1, NF-6.2, NF-6.4 and NF-6.5 will be satisfied by respective chapters of the dissertation, while NF-6.3 and NF-6.6 will be fulfilled both by the dissertation and by appropriate wiki pages on the Virtual Laboratory website. To summarize, figure 38 delineates a conceptual view onto the proposed design.
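The per-process proxy generation scheme described above can be sketched as follows. This is an illustration under assumed conventions, not the actual LFC DS code; the X509_* variable names and paths are assumptions, and passphrase handling is omitted for brevity.

  # Generate a VOMS proxy for one user in an isolated child process, so that several
  # users' proxies can be produced concurrently from a single server account.
  def generate_proxy(key_path, cert_path, proxy_path, vo)
    env = {
      'X509_USER_KEY'   => key_path,    # location of the user's private key
      'X509_USER_CERT'  => cert_path,   # location of the user's Grid certificate
      'X509_USER_PROXY' => proxy_path   # where the generated proxy certificate is written
    }
    pid = Process.spawn(env, 'voms-proxy-init', '--voms', vo)  # VO name passed via --voms
    Process.wait(pid)
    $?.success?
  end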

8.2 Organization of Design description

The remainder of this chapter is organized by means of views57 and viewpoints, as recommended by [114]. At first, design stakeholders and their concerns are identified. Each viewpoint specifies the design concerns being its topic, the design elements that are defined or used by the viewpoint, and a set of conventions stating how the design will be conveyed, including the design language. Consequently, design views address design concerns from a specified design viewpoint. Maier et al. [146] allude to an analogy of a view from the civil engineering domain: buildings have several views, such as front, top, side, electrical, plumbing and floor plans.

57 The terms design concern, design element, design stakeholder, view and viewpoint were defined by [114].

Figure 38: Conceptual view onto the proposed design of LFC DS

8.3 Identified stakeholders and design concerns

Design concerns are identified by DC-number strings. Tables 8 and 9 indicate design stakeholders, design concerns and the addressing viewpoints.

8.4 Design views

Design views are governed by apposite design viewpoints, each defined in relevant sections of [114] as shown in table 10.

DC-1: Composition and modular assembly of systems in terms of subsystems and components (addressed by the Composition view)
DC-2: Static structure, reuse of types and implementations (Logical view)
DC-3: Interconnection, sharing, and parameterization (Dependency view)
DC-4: Persistent information (Information view)
DC-5: Service definition, service access (Interface view)
DC-6: Object communication and messaging (Interaction view)

Table 8: Design concerns and views addressing them

Developer who wants to modify or extend LFC DS: all provided design concerns
Developer wanting to incorporate some of the reusable components into their work: DC-1, DC-3, DC-4 and DC-5
Developer adapting LFC DS to a different environment: DC-3 and DC-4

Table 9: Identified stakeholders and their design concerns

Composition viewpoint: [114, section 5.3]
Logical viewpoint: [114, section 5.4]
Dependency viewpoint: [114, section 5.5]
Information viewpoint: [114, section 5.6]
Interface viewpoint: [114, section 5.8]
Interaction viewpoint: [114, section 5.10]

Table 10: Design viewpoints specifications

8.4.1 Composition

LFC DS solution (figure 39) is composed of DACConnector (figure 43), which is a GSEngine component that enables data access and management from GScript code, EPE DSR Plugin (figures 42, 55, 56, 57 and 58) – an EPE plugin that allows for browsing and registration of data sources and user Grid credentials, Data Source Registry (DSR) (figure 54) which stores information on data sources and credentials, and LFCDS Server (figure 40), which is a gateway enabling access to EGEE/WLCG Grid. Moreover, DACConnector includes a reference to LFCDS library (figure 41) which connects to LFCDS server, a reference to LFCDS connector represented by JRuby class LFCDataSource (figure 49), which is a class dedicated to managing access to LFC data sources and usage of Grid credentials, and to DSRConnectivity instance, that encapsulates methods for communicating with DSR. Both DACConnector and DSRConnectivity were extended with methods specific to LFC DS. In a similar instance, EPE DSR Plugin exploits its own DSRConnectivity module also dedicated to communications with DSR; EPE DSRConnectivity was expanded with a richer set of methods than DAC2 DSRConnectivity. In particular, DAC2 DSRConnectivity is mostly responsible for reading data – with one exception being updating proxy certificates. On the other hand, EPE DSRConnectivity must handle not only reading data, but also data updates and registering new data sources and credentials. A distinct part of EPE DSR Plugin is LFCDS Form – a dedicated form for registering LFC data sources and uploading Grid credentials. As far as DSR is concerned, its additional constituent is LFCDS Schema, i.e. a schema that is dedicated to storing information regarding Grid data sources and credentials. Figure 39 depicts the aforementioned components with relationships of inclusion and usage.

Figure 39: Composition of the LFC DS system. DACConnector, DAC2 DSRConnectivity, DSR EPE Plugin, DSR Plugin DSRConnectivity and DSR are components that existed before the creation of LFC DS

8.4.2 Logical

This clause presents the division of components into classes and then depicts their internal subdivision into methods. Firstly, class diagram on figure 40 shows the structure of LFCDS server. As illustration suggests, LfcDsServer class (also see figure 45) plays the main role in the operation of LFCDS server. LfcDsServer is a consumer of services provided by other entities. In particular, it uses LfcDsProperties for reading values of properties stored in standard Java properties files. In the case of LFCDS server the properties file name is server.properties. However, in current implementation server administrator, who is the only person who will modify it, provides its values by substituting appropriate values in lfcds.properties file read by Maven [166], which propagates these changes to two configuration files: server.properties and test.properties saving administrator the encumbrance of keeping both files up to date, since many properties they use are common. On the other hand, DacLfcCommands (see figure 44) class is a class extending LfcCommands type from ChemPo project. LfcCommands, not presented on the drawing, supplies wrappers with specific Grid operations, such as downloading a file or sending a file to storage element and registering same in LFC catalogue. LfcCommands class achieves it by using LfcExecutor class from ChemPo project, that executes each command in a separate Java Virtual Machine (JVM) with specific UNIX environment (see figures 61 and 62). In addition to extending LfcCommands, DacLfcCommands class provides two methods that were not provided by LfcCommands, in partic- ular delete, getSize and exists. LfcDsServer implements ILfcCommands interface (see figure 44) which specifies a set of operations that LFCDS server provides. Apart from data access and management commands, ILfcCommands is used for generation of proxy certificates and retrieving certificate attributes. These attributes are stored in UserProxyDetails object and sent to calling client. StoreFileBean, PathInputBean and LongOutputBean are Java beans that transfer specific data when sent by LfcExecutor to DacLfcCommands, whereas LfcCommonParametersBean is used to encapsulate common data that may be useful in most data access and management Grid commands; namely the user proxy certificate, which is used by Grid software for authentication and authorization purposes, LFC host, indicating LCG File Catalogue server to be contacted, Site BDII (see sec- tion 3.6), which is a server that informs Grid File Access Library (GFAL)58 about particulars of storage elements (see figure 29), Virtual Organization Name and path which is a common argument of data management commands. Furthermore, LfcDsItem class envelops information about items retrieved from LFC directory, their path and whether they are files or directories – other information, such as file permissions is omitted as it is required by FR-1.8. An important component in LFCDS server structure is LfcDsOutputStream, which was developed as a part of larger data streaming scheme. Its role is to remotely invoke a LFCDS server method that sends file to Grid and deletes temporary file on server when streaming

58GFAL works underneath software exploited by LFCDS server

105 file to LFCDS server finishes. If it is not possible, it throws an LfcDsException from client library, which in turn passes it to LFCDS connector notifying user about the problem. If LfcDsOutputStream had not been used and default callback of RMIIO library had been utilized, a message about the problem would have not been conveyed – when callback executes, there is no way of returning information. LfcDsOutputStream is utilized on client side; however, it is present in the server class hierarchy, because it is one of the contents of StoreFileBean sent to client library when it invokes storeFileInit method of the server (see figure 62 that addresses client↔server interaction during writing file to Grid). LfcDsException is not only raised when sending a file to Grid fails, but it is instantiated whenever a problem with input data occurs that was not detected by LFCDS client or a server side method which encounters difficulties performing requested action. The aforementioned exception is also raised by LfcDsClient, a principal class of LFC client library, if it meets some impediments connecting LFCDS server or it detects a mistake in user’s request. As can be seen in figure 41, ILfcCommands interface is used by both LfcDsOutputStream (the connection with ILfcCommands has not been shown on figure 40) and LfcDsClient. For these two classes ILfcCommands defines methods which can be remotely invoked on LfcDsServer. LFCDS client library does not define any additional classes. However, there is an enormous difference between LFCDS client and LFCDS server, when it relates to dependencies required (compare figures 50 and 52 illustrating dependencies of these two components in terms of Maven artifacts). Because of LFCDS lightweight library, it can be incorporated into software that could benefit from a communication with LFCDS server – LFCDS solution is not limited to GSEngine and its DAC2 data access layer. In fact, it can be used by any Java application. Figure 42 depicts EPE DSR-Plugin classes that play a role in the operation of LFC DS solution. LfcDsEditForm is a graphical user interface form created using Visual Editor [174] user interface builder with several other functionalities added manually. These include validation of inputs, dynamically disabling and enabling buttons, changing group and button captions depending on the context in which the form was invoked, i.e. whether it was a request for edition of existing LFC data source or addition of a new data source of this type and naturally, application logic. Initially, the form was invoked with separate buttons and menu commands. However, Piotr Nowakowski, main developer of EPE DSR-Plugin, replaced previous design which composed of one wizard into “two-form” approach, i.e. when user request registering a new file (figure 57) or edition of existing one, the SelectSourceTypeDialog form enabling a user to choose a data source type (structured or unstructured) and technology appears (figure 57). This form, in turn, invokes apposite form responsible for managing registration of concrete data sources and credentials. Thus, he enabled inclusion of other data source wizards in an integrated way and only one button and menu item suffices to invoke registration dialog of any type of data source that emerges. PasswordDialog pops up when user clicks [Set] button near the password label (it would be visible on screenshots if Grid credentials were not loaded). A reference to

106 DSRConnectivity is passed by SelectSourceTypeDialog to LfcDsEditForm, which then uses it for searching, updating, deleting and adding new entries of LFC data sources, Grid credentials and server connections. Methods added to EPE DSR Plugin DSRConnectivity in order to enable these operations are delineated in figure 48. On the other hand, ShibConnectivity plays a role in LFC DS operation by providing a user handle to LfcDsEditForm, which enables identification of the user in context of DSR. DAC2 (see figure 43), a data access layer of Virtual Laboratory is logically decomposed into several classes, each yielding autonomous functionalities – in particular, similar to EPE DSR Plu- gin, it delegates connectivity responsibilities to DSRConnectivity (figure 49) and ShibConnectiv- ity or ChempoConnectivity classes, this time written in JRuby language [86]. ChempoConnectivity was added as part of the dissertation to enable access to DAC2 data access layer within ChemPo custom built GSEngine. Methods provided by both ChempoConnectivity and ShibConnectivity are as follows: getParams, getRawHandle and getUserHandle. While getParams is the necessary initialization of ShibConnectivity, in ChempoConnectivity that method does nothing – the common interface was left untouched in order to decrease code changes necessary, so that getParams method is invoked regardless of security provider. A careful reader may discern sim- ilarity between DACConnector methods shown in figure 49 and those specified in table 6. Indeed, this is the same set of methods – DACConnector is the central class of DAC2, which provides GScript developers interface for data access. As part of the thesis project, DACConnector inter- face was significantly expanded. In particular, all methods and aliases specified for LFC DS component in table 6, with the exception of initialize, getFile, storeFile and deleteFile (which remained for compatibility) were added. Earlier, when the majority of data sources were relational, most operations were performed using executeQuery and executeUpdate methods, and therefore, such a rich API had not been mandatory. However, with the introduction of LFC connector operations, such as directory creation, deletion of directories and files (achieved using single delete method), together with methods to be used for data streaming, such a need arose and DACConnector API was extended. In addition, several alias methods with Ruby style notation, such as exist?, file? and zero? with a question mark at the end which indicates that a method returns logical value or methods with an underscore instead of usual Java camel- CaseNotation. This makes the API more Ruby-like. However, a change that made LFC DS method invocations most Ruby-oriented was the introduction of block argument into openFile method. In consequence, DACConnector#open method gives the impression of being standard Ruby open invocation executed on a Ruby File. Moreover, since DACConnector#open method returns a subclass of Ruby IO converted from Java InputStream or OutputStream depends upon whether a file was opened for reading or writing. Therefore, a complete impression of standard Ruby IO is given; thus shortening the learning curve significantly for developers or computational scientists already knowing Ruby. SourceParameters, another class of DAC2, is a bean containing data that is passed by

107 DACConnector to connector objects. SourceParameters methods relevant to LFC DS have been depicted in figure 49. On the other hand, for DAC2 DSRConnectivity all methods have been shown, since only getCertData, updateProxy and getStaticCertData have been appended. On the other hand, both EPE DSR Plugin and DAC2 DSRConnectivity class method set required not only augmentation, but also refactoring. This is the result of changing DSR structure. However, such change was mandatory in order to enable registration of LFC data source, which had totally dissimilar information needs, a fact that can be observed by analyzing the current DSR schema illustrated on figure 54. Finally, LFCDataSource supplies concrete implementation of LFC DS connector. A feature built into LFCDataSource component is a mechanism for checking whether a certificate is valid – it uses LFC DS server getProxyDetails method for this purpose. If a proxy Grid certificate is present in DSR and valid (it is assumed no longer valid, if it has less than an hour to expire), the LFCDataSource utilizes it for Grid operations. Otherwise, a new proxy is generated and saved in DSR using DACConnector#generateProxy method, as shown in figure 59. With regard to internal logical organization of each of the aforementioned design entit- ies, in terms of methods they provide and private variables they contain, the simplest organ- ization is of bean files: StoreFileBean, LongOutputBean, PathInputBean, UserProxyDetails, LfcCommonParametersBean and SourceParameters – they contain a single private field with ac- companying get and set methods for these variables. On the other hand, LfcDsOutputStream implements methods defined by itsOutputStream superclass. DacLfcCommands is a class, whose functionality is mostly provided by a class higher in inheritance hierarchy – LfcCommands. How- ever, as it was mentioned, three specific methods are implemented in it – as with LfcCommands, LfcExecutor from ChemPo project has been utilized to achieve the functionality of execut- ing commands in separate JVM. LfcDsException is a standard exception class extending Java Exception. In contrast to these Java classes, LfcDsServer has a more complex structure. It was designed in such a way, that it should not impose any threading problems. In particular, the only variable its methods share, is the log used for logging its functions (see NF-7). Sev- eral private auxiliary static methods: getTempProxyFile, parametersBeanToLfcConfiguration, createTempDirectory, cleanAfterOperations and restrictFilePermission serve other meth- ods by providing them with common functionality. Because log is the only object-level variable, LfcDsServer may be safely shared by many clients without worrying about concurrency prob- lems. The methods exposed to clients are those that were specified by ILfcCommands interface. Each of them has its own logic, but a common scheme of operation is creating DacLfcCommands instance and invoking one of its methods, catching exceptions, logging them and wrapping by LfcDsException. Often temporary files are stored during execution of these methods with createTempDirectory, restrictFilePermissions and cleanAfterOperations static methods be- ing used. Most often, temporary files and directories are deleted when methods finish (regard- less of exceptions that occur). However, with getFile, storeFileInit and StoreFileFinish,

However, with getFile, storeFileInit and storeFileFinish, i.e. the methods that incorporate streaming, the responsibility to delete the temporary file is transferred either to a callback (this approach was used for the getFile method) or to a special-purpose Java OutputStream, namely LfcDsOutputStream, whose close method causes the associated temporary file to be sent to the Grid; subsequently, the directory and files that are no longer required are deleted. The main method of LfcDsServer configures the Cajo library class Remote with the endpoints specified in the server.properties file; it then binds a newly created instance of LfcDsServer so that its methods can be invoked by remote clients. parametersBeanToLfcConfiguration is a static method executed by many other LfcDsServer methods – it translates the LFC DS configuration bean, LfcCommonParameters, into an LfcConfiguration valid for ChemPo LfcCommands. One of its roles is to create temporary files containing the proxy certificates that were passed to the server as byte arrays and to store the resulting filenames in the LfcConfiguration, so that they can be used by the ChemPo LFC command wrappers.

As regards LfcDsClient, its main responsibility is to abstract server communication. A long constructor provides LfcDsClient with information on server endpoints and the specific data access configuration, which then does not have to be provided with each method invocation. During initialization, carried out by the LfcDsClient constructor, it creates a TransparentItemProxy item from the Cajo framework that enables communication with the LFC DS server. Subsequent method invocations utilize both the LfcDsClient data stored in its private fields and the parameters supplied by the user. It is noteworthy that the client library automatically translates the path provided by the calling object (e.g. the user's script) into a valid LFC path, i.e. when the user specifies some_path as a path, LfcDsClient prepends /grid/vo_name/ to it. If the user-supplied path begins with a slash ('/'), that character is removed first.

The internal structure of LfcDsEditForm is quite simple. It contains many private user-interface building methods, such as createCredentialGroup, createServersGroup or createComboConnList, and some utility methods, such as isHandleUnique, isDataSourceNameUnique, validateConnData and connExists, but the main logic is contained in button callback methods, which are created while building the user interface and are therefore not visible on the diagram. The public methods of the LfcDsEditForm class are intended for communication with calling code, e.g. the showDialog method causes the LfcDsEditForm dialog to be created and displayed. The method returns 0 on success and other values otherwise – a convention used by other wizards in the DSR EPE Plugin. Both the EPE and DAC2 DSRConnectivity classes were extended to satisfy the needs of LFC DS and the new DSR schema. DSRConnectivity is, in either case, a class encapsulating SQL code in several methods, each dedicated to one purpose. The private methods of this class are utilities used to make the other method bodies shorter by reusing common functionality.

Hopefully, this clause has given the reader a deeper insight into the logical decomposition of LFC DS into components, classes and methods, and into how these are reused among the design entities. The next clause presents how LFC DS design entities depend on each other and on external resources, mainly software libraries.
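To make the Ruby-like flavour of this interface concrete, the following short GScript fragment combines the methods discussed above. It is only a sketch: the data source handle my-lfc-ds and the file names are illustrative placeholders, while the methods themselves (createDirectory, open with a block, exist?, delete) are those listed in table 6 and exercised in the functional test in section 9.1.

require 'cyfronet/gridspace/dac2/dac_connector.rb'

ds = DACConnector.new('my-lfc-ds')            # credentials and connection details are taken from the DSR
ds.createDirectory('experiments')             # relative paths are resolved under /grid/vo_name/

ds.open('experiments/results.txt', :w) do |f| # block form: the stream is closed automatically
  f.puts 'first line of results'
end

puts ds.exist?('experiments/results.txt')     # Ruby-style predicate alias, returns true/false
ds.open('experiments/results.txt', :r) { |f| f.each { |line| puts line } }

ds.delete('experiments/results.txt')          # a single delete method handles both files and directories
ds.delete('experiments')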

Figure 40: Logical view onto LFCDS server component

  





    

Figure 41: Logical view onto LFCDS client library

Figure 42: Class diagram DSR EPE Plugin LFCDS Form. Classes not directly connected to operation of LFC DS were excluded from diagram.



  

  

 

    

Figure 43: DAC2 class diagram after integration with LFC DS. Classes not directly related to LFC DS are omitted.

Figure 44: Class diagrams: LfcDsProperties, LongOutputBean, PathInputBean, LfcDsItem, StoreFileBean, LfcDsOutputStream, UserProxyDetails, DacLfcCommands and ILfcCommands.

Figure 45: Class diagrams: LfcCommonParametersBean, LfcDsException and LfcDsServer.

Figure 46: Class diagram: LfcDsClient

Figure 47: Class diagram: LfcDsEditForm and PasswordDialog. For LfcDsEditForm private attributes were omitted for brevity.

Figure 48: Class diagram: DSR Plugin DSRConnectivity – private attributes were omitted for brevity. In addition, only added methods are shown; modified methods or those that existed previously are excluded. 

                                               



                                    

Figure 49: Class diagrams: DACConnector, SourceParameters, and DAC2 DSRConnectivity

8.4.3 Dependency

Figure 51 illustrates dependencies among design entities and the services they provide to each other. A component diagram notation [165] has been chosen. Going from the left, the reader may notice two communication libraries, Cajo and RMIIO, both of which are RMI-based frameworks. During the prototyping phase, the Cajo library was chosen for overall communication, since it was discovered that with this library it is relatively simple to have communicating applications run behind firewalls. RMIIO is used for the same purpose, i.e. to facilitate communication through firewalls. Note that in figures 61 and 62 there are no server→client callbacks – all communication is initiated by the client; even when the server sends data to the client (figure 61), this is accomplished by the client pulling the data, not by the server pushing it. Another rationale for using RMIIO is that it provides fault-tolerant streaming, a valuable feature which it achieves by retrying requests in case of communication errors. Both libraries are Open Source, which makes them suitable for integration with the LFC DS project. An interesting fact shown in figure 51 is that the ShibConnectivity instance connects to ShibRPC, while ChempoConnectivity – an alternative implementation of the security mechanism – does not communicate with any ChemPo-specific security service. This is because ChempoConnectivity uses GS_USER_ID, which holds a unique identifier used to distinguish users. The main interest of figure 51 lies in the interfaces that each component requires and provides, and in how these needs are fulfilled by interconnecting the components.

On the other hand, figures 50, 52 and 53 represent dependency graphs of the components in terms of the requisite Maven artifacts from the Cyfronet Maven repository and their scope (compilation or test). The EPE DSR Plugin has been omitted, since it does not use Maven for dependency management. However, it also has dependencies, which include the following plugins: cyfronet.gridspace.api – version 0.4.0, cyfronet.gridspace.gisde.auth – version 1.1.3, cyfronet.gridspace.voconfig.plugin.preferences – version 0.6.0, org.eclipse.ui, and org.eclipse.core.runtime. In particular, the artifacts related to JSAGA, CoG and the VOMS Java API shown in figure 52 are utilized for manipulating Grid certificates, while the LFC API from the ChemPo project wraps the SEE-Grid Java File Management library, providing means for accessing Grid data sources and managing entries in the LFC Catalogue. The DAC2 dependencies portrayed in figure 53 are mostly those related to accessing various types of data sources, including the Virtual Laboratory Data Access Service (DAS), eXist Native XML Database, MySQL, HSQLDB and PostgreSQL. The dependencies added by the LFC DS client are also apparent; however, there are not many of them, as can also be seen in figure 50 – a small number of dependencies incorporated into GSEngine is required by the 4th constraint listed in section 6.5. None of the artifacts required by the LFC DS client library is platform dependent; thus, requirement NF-9 has been met. Furthermore, by using Maven for compilation and dependency management in three of the four collaborating LFC DS components (DAC2, LFC DS server and LFC DS client), requirement NF-8 is partly met. Taking into consideration the fact that Eclipse, which is the platform of the EPE DSR Plugin, has its own mechanism for managing dependencies, it can be said that requirement NF-8 has been fulfilled in its entirety.
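For illustration, the snippet below sketches how a ChempoConnectivity-style class can satisfy the common connectivity interface while resolving the user handle from GS_USER_ID. The class name and the assumption that GS_USER_ID is delivered as an environment variable are mine; the actual GSEngine mechanism may differ.

class SimpleChempoConnectivity
  def getParams(*args)
    # intentionally a no-op: kept only so the common interface can be called
    # regardless of which security provider is in use
  end

  def getUserHandle
    ENV['GS_USER_ID'] or raise 'GS_USER_ID is not set'
  end

  def getRawHandle
    getUserHandle   # assumption: the raw handle equals the GS_USER_ID value
  end
end

conn = SimpleChempoConnectivity.new
conn.getParams
puts conn.getUserHandle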

 

       

   

Figure 50: LFCDS client library – dependency graph

Figure 51: Component diagram depicting dependencies between system components      

         

   

   121 

     

   

   

           

Figure 52: LFCDS server – dependency graph      

       

   

   

    122

 

     

                     

Figure 53: DAC2 – dependency graph

8.4.4 Information

This clause contains the specification of data that is stored for the purpose of LFC DS operation. In particular, figure 54 delineates the Data Source Registry database schema. Before the introduction of LFC DS, the DataSources table contained all the information needed by data sources. During adaptation of the DSR to incorporate the new data source type, the DataSources table was split into RelationalDataSource and LFCDataSources, which required some reorganization of the primary and foreign key relationships; data migration from the earlier to the newer schema was nevertheless successful. Moreover, in order to be consistent with this naming, the DataSourceCredentials table has been renamed to RelationalDataSourceCredentials. Furthermore, the LFCDSConnections table has been added, which maintains information about available LFC DS server connections. Additionally, LFCCertData stores user Grid credentials.
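As an illustration of how DSRConnectivity encapsulates SQL in single-purpose methods, the JRuby sketch below reads a stored proxy for a given user from the LFCCertData table. The column names (user_handle, proxy) and the JDBC connection string are hypothetical placeholders; the authoritative schema is the one shown in figure 54.

include Java

def get_cert_data(user_handle)
  conn = java.sql.DriverManager.get_connection('jdbc:postgresql://dsr-host/dsr', 'dsr_user', 'secret')
  stmt = conn.prepare_statement('SELECT proxy FROM LFCCertData WHERE user_handle = ?')
  stmt.set_string(1, user_handle)
  rs = stmt.execute_query
  rs.next ? rs.get_string('proxy') : nil   # nil when no credentials are registered for the handle
ensure
  conn.close if conn
end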

                

                       

    

           

Figure 54: DSR – database schema

8.4.5 Interface

Figure 55 illustrates the user interface that enables registration of LFC data sources. It is invoked by the DSR EPE Plugin when a user requests creation or editing of an LFC data source. The meaning of the “LFC data source parameters” fields, namely “LCG File Catalogue”, “Berkeley Database Information Index” and “Storage element”, has been explained in section 3.6. The “Your credentials” group allows uploading and removing grid user credentials from the DSR, together with specifying whether they are available to other authenticated users. An LFC DS Server connection is a connection to an LFC DS Server running somewhere on a gLite UI. Normally, the user chooses the server to use from a list; if the intended server is not listed, a new entry can be added. LFC DS Server connection information is usually conveyed to the user by the administrator who installed LFC DS. Figure 56 demonstrates the validation mechanisms incorporated into the form, and figure 57 illustrates the DSR EPE Plugin view onto data sources registered in the Virtual Laboratory. Finally, figure 58 presents the data source type selection form that is invoked when a user requests adding a new data source. The user interface forms shown in figures 57 and 58 were created by Piotr Nowakowski. With regard to software interfaces, they have already been specified in section 7.3.

Figure 55: User interface for registering LFC data sources

Figure 56: Demonstration of DSR EPE Plugin LFC DS Edit Form validation mechanisms

Figure 57: Tree view onto data sources registered in Virtual Laboratory

Figure 58: Data source selection form

8.4.6 Interaction

Interaction is one of the more interesting aspects of LFC DS. Before performing data access and management operations, the LFC DS connector must initialize itself with Grid credentials. If a Grid proxy certificate is present in the DSR, it is used for LFC and data access operations; if it is not, it is generated and stored in the DSR (see figure 59). Subsequently, a user may execute the commands listed in table 6. Figure 60 depicts the interaction of LFC DS components when a command does not require streaming. On the other hand, when executing one of the open methods, the interaction scheme is different. Figure 61 presents a simplified sequence diagram of getFile method execution, while figure 62 addresses the case of sending a file to the Grid. All classes, with the exception of RemoteOutputStream, RemoteInputStream and LfcWorker, have been discussed in the Logical design view. RemoteOutputStream and RemoteInputStream are classes of the RMIIO library providing streaming functionality, while LfcWorker is a ChemPo class that executes the actual Grid data access code it receives from LfcCommands, which communicates with it via a socket.
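The decision made during this initialization can be summarized by the following sketch. The helper calls (proxy_from_dsr, proxy_expiry_time, generate_and_store_proxy) are hypothetical stand-ins for the DSRConnectivity and generateProxy operations described above.

ONE_HOUR = 60 * 60

def usable_proxy(handle)
  proxy = proxy_from_dsr(handle)                       # nil when no proxy is stored in the DSR
  if proxy && proxy_expiry_time(proxy) - Time.now > ONE_HOUR
    proxy                                              # the stored proxy is still considered valid
  else
    generate_and_store_proxy(handle)                   # otherwise generate a new proxy and save it in the DSR
  end
end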

        

 

 

  

  



 

Figure 59: Initialization of LFC DS connector – sequence diagram

Figure 60: A sample LFC command – in this case, listFiles command  

      

 



  

   



 

  129    

  





   



   



  



Figure 61: Reading file from Grid – sequence diagram  

     

 









    



    130

 

 

  



 

 

      



Figure 62: Sending file to Grid – sequence diagram

9 Verification and validation

Verification and validation have been performed both using the client library and using GScript client code. The GScript client was tested both as a standalone library and in conjunction with GSEngine, i.e. by executing code using the GSEngine interpreter. The LFCDS server, LFCDS connector and client Java library were run on the ChemPo server (chempo.grid.cyfronet.pl). A test over a WAN connection has also been performed with the following configuration: the LFCDS server running on the EGEE CESNET gLite UI in the Czech Republic (host: ui1.egee.cesnet.cz), tunneling commands executed on the GREDIA server (gredia.cyfronet.pl), and the GScript client run on the ChemPo machine.

9.1 Functional tests

Approach Functional tests with fine granularity were possible only when testing the LFC DS client Java library, using the TestNG testing framework, which enables specifying test dependencies. In the case of the LFC connector this was not possible; therefore, one large test was executed instead. The listing included below provides the code used for testing LFC connector interaction with the LFC DS server.

LFC connector functional test 1 # Author: Marek Pomocka 2 3 require ’cyfronet/gridspace/dac2/dac_connector.rb’ 4 5 ### 6 ### This is a test file for LFC Data Source connector. 7 ### 8 9 def LFCDSTest(ds) 10 # File names can start with or without a slash. Both are mapped to /grid/vo_name/path 11 puts ”delete ’mpomocka/test_lfcds’ = #{ds.delete(’mpomocka/test_lfcds’)}” 12 #puts ”createDirectory ’/’,’mpomocka’ = #{ds.createDirectory(’/’,’mpomocka’)}” 13 puts ”createDirectory(’mpomocka/test_lfcds’) finished ”+(ds.createDirectory(”mpomocka/ test_lfcds”)==true ? ”successfully”:”unsuccessfully”) 14 puts ”directory? ’/mpomocka/test_lfcds’ = #{ds.directory? ”/mpomocka/test_lfcds”}” 15 puts ”createDirectory(’/mpomocka/test_lfcds’,’test_dir) finished ”+(ds.createDirectory(”/ mpomocka/test_lfcds”,”test_dir”)==true ? ”successfully”:”unsuccessfully”) 16 puts ”directory? ’/mpomocka/test_lfcds’ = #{ds.directory? ”/mpomocka/test_lfcds”}” 17 puts ”file? ’/mpomocka/test_lfcds/test_file1.txt’ = #{ds.file? ”/mpomocka/test_lfcds/ test_file1.txt”}” 18 puts ”storeFile ’mpomocka/test_lfcds/test_file1.txt’ command finished ”+(ds.storeFile(” TEST file 1 cOnTeNtS”.to_java_bytes, ”mpomocka/test_lfcds/test_file1.txt”)==true ? ” successfully”:”unsuccessfully”) 19 puts ”file? ’/mpomocka/test_lfcds/test_file1.txt’ = #{ds.file? ”/mpomocka/test_lfcds/ test_file1.txt”}” 20 puts ”file? ’/mpomocka/test_lfcds/test_file2.txt’ = #{ds.file? ”/mpomocka/test_lfcds/ test_file2.txt”}”

131 21 f = ds.open(”/mpomocka/test_lfcds/test_file2.txt”,:write) 22 f.puts ”First line of the file file 2” 23 f.puts ”Second line of the file file 2” 24 f.close 25 ds.open(”/mpomocka/test_lfcds/test_file3.txt”,:w) do |f| 26 f.puts ”Another way to write to a file” 27 f.puts ”Note that close is not necessary” 28 end 29 puts ”exist? ’/mpomocka/test_lfcds/test_file2.txt’ = #{ds.exist? ”/mpomocka/test_lfcds/ test_file2.txt”}” 30 puts ”getFile ’/mpomocka/test_lfcds/test_file1.txt’ = #{String.from_java_bytes ds.getFile( ”mpomocka/test_lfcds/test_file1.txt”)}” 31 puts ”test_file2.txt contents:” 32 f = ds.open(”/mpomocka/test_lfcds/test_file2.txt”,:read) 33 f.each {|line| puts line} 34 f.close 35 ds.open(”/mpomocka/test_lfcds/test_file3.txt”, ”r”) do |file| 36 file.each {|line| puts line} 37 end 38 puts ”getSize /mpomocka/test_lfcds/test_file1.txt ”+ds.getSize(”mpomocka/test_lfcds/ test_file1.txt”).to_s 39 puts ”getSize /mpomocka/test_lfcds/test_file2.txt ”+ds.getSize(”mpomocka/test_lfcds/ test_file2.txt”).to_s 40 l=ds.listFiles(”/mpomocka/test_lfcds/”) 41 l.each do |item| 42 puts item.get_name + ” is a ” + if item.is_directory then ”directory” else ”file” end 43 end 44 puts ”delete command executed on a file finished ”+(ds.delete(”mpomocka/test_lfcds/ test_file2.txt”)?”successfully”:”unsuccessfully”) 45 puts ”delete command executed on a directory finished ”+(ds.delete(”mpomocka/test_lfcds”)? ”successfully”:”unsuccessfully”) 46 end 47 48 begin 49 # 1 argument: handle - obvious 50 # 2 arguments: handle and password to the private key - useful if a user does not want to 51 # keep password in the DSR 52 # 2 arguments: handle and proxy - if someone has not provided their credentials 53 # in the DSR, but want to use the data source. 54 # Note: these two method above are distinguished by the length of the second argument 55 # (if more than 300 bytes, it is assumed to be a proxy) 56 # 4 arguments: handle, private key, grid certificate and password to the private key 57 # - useful if one wants to use the LFC data source, but not registered their 58 # credentials in the DSR _and_ has not generated the proxy - proxy is being saved 59 # in the DSR if the user has an entry in the database 60 61 ### One argument constructor - everything is in the DSR 62 ds = DACConnector.new(”lfcds-test”); 63 puts ”Successfully instantiated LFC data source (1 arg)” 64 LFCDSTest(ds) 65 66 ## 2 argument constructor - handle and password to the private key 67 ds = DACConnector.new(”lfcds-test”,”your_password”)

132 68 puts ”Successfully instantiated LFC data source (2 args)” 69 LFCDSTest(ds) 70 71 ## 2 argument constructor - handle and proxy 72 ds = DACConnector.new(”lfcds-test”,IO.read(”C:/Users/Marek/Documents/cert/x509up_u506”)) 73 puts ”Successfully instantiated LFC data source (2 args - 2nd one a proxy)” 74 LFCDSTest(ds) 75 76 ## 4 argument constructor - handle, private key, grid certificate and password to the private key 77 ds = DACConnector.new(”lfcds-test”,IO.read(”C:/Users/Marek/Documents/cert/userkey.pem”), 78 IO.read(”C:/Users/Marek/Documents/cert/usercert.pem”), # change to file names stored in your computer 79 ”your_password”) 80 puts ”Successfully instantiated LFC data source (4 args)” 81 LFCDSTest(ds) 82 end

Output of this script is as follows (for brevity, the output produced by lines 66–81 has been omitted):

Successfully instantiated LFC data source (1 arg)
delete ’mpomocka/test_lfcds’ = false
createDirectory(’mpomocka/test_lfcds’) finished successfully
directory? ’/mpomocka/test_lfcds’ = true
createDirectory(’/mpomocka/test_lfcds’,’test_dir) finished successfully
directory? ’/mpomocka/test_lfcds’ = true
file? ’/mpomocka/test_lfcds/test_file1.txt’ = false
storeFile ’mpomocka/test_lfcds/test_file1.txt’ command finished successfully
file? ’/mpomocka/test_lfcds/test_file1.txt’ = true
file? ’/mpomocka/test_lfcds/test_file2.txt’ = false
exist? ’/mpomocka/test_lfcds/test_file2.txt’ = true
getFile ’/mpomocka/test_lfcds/test_file1.txt’ = TEST file 1 cOnTeNtS
test_file2.txt contents:
First line of the file file 2
Second line of the file file 2
Another way to write to a file
Note that close is not necessary
getSize /mpomocka/test_lfcds/test_file1.txt 20
getSize /mpomocka/test_lfcds/test_file2.txt 61
test_dir is a directory
test_file1.txt is a file
test_file2.txt is a file
test_file3.txt is a file
delete command executed on a file finished successfully
delete command executed on a directory finished successfully

On the other hand, the subsequent listing includes the TestNG test case that was used for the functional test of the LFC DS client library interacting with the server.

Functional test of LFC DS client library interacting with LFC DS server 1 package cyfronet.gridspace.dac2.lfcds; 2 3 import java.io.ByteArrayOutputStream; 4 import java.io.File; 5 import java.io.FileInputStream; 6 import java.util.List; 7 import org.apache.log4j.Logger; 8 9 import cyfronet.gridspace.dac2.lfcds.client.LfcDsClient; 10 import cyfronet.gridspace.dac2.lfcds.exceptions.LfcDsException; 11 import org.testng.annotations.*; 12 13 /** 14 * @author Marek Pomocka 15 * 16 */ 17 public class LfcDsServerTest { 18 private static final Logger log = Logger.getLogger(LfcDsServerTest.class); 19 private static final String USER_CERT = TestProperties.getInstance().getProperty(” user.cert”); 20 private static final String USER_KEY = TestProperties.getInstance().getProperty(” user.key”); 21 private static final String CERT_PASSWORD = TestProperties.getInstance().getProperty (”cert.password”); 22 private static final String USER_DIR = TestProperties.getInstance().getProperty(” user.directory”); 23 private static final String TEST_DIR = TestProperties.getInstance().getProperty(” test.directory”); 24 private static final String TEST_PATH = USER_DIR+”/”+TEST_DIR; 25 private static final String TEST_FILE_CONTENTS = ”TEST file contents”; 26 static LfcDsClient cl; 27 28 private ByteArrayOutputStream certBytes; 29 private ByteArrayOutputStream keyBytes; 30 31 @BeforeSuite 32 void setUp() throws Exception { 33 FileInputStream certFile=new FileInputStream(new File(USER_CERT)); 34 FileInputStream keyFile=new FileInputStream(new File(USER_KEY)); 35 certBytes=new ByteArrayOutputStream(); 36 keyBytes=new ByteArrayOutputStream(); 37 LfcDsClient.copyLarge(certFile,certBytes); 38 LfcDsClient.copyLarge(keyFile,keyBytes); 39 certFile.close(); 40 keyFile.close(); 41 } 42 43 @Test

134 44 public void testServerConnection() throws LfcDsException { 45 log.info(”Testing server connection”); 46 log.info(”Connection parameters:”); 47 log.info(” user.host = ”+TestProperties.getInstance().getProperty(”client. host”)); 48 log.info(” user.port = ”+TestProperties.getInstance().getProperty(”client. port”)); 49 cl = new LfcDsClient(”//”+TestProperties.getInstance().getProperty(”client. host”)+”:”+ 50 TestProperties.getInstance().getProperty(”client.port”)+”/ LfcDsServer”, 51 TestProperties.getInstance().getProperty(”streaming.port”), 52 TestProperties.getInstance().getProperty(”user.vo”), 53 TestProperties.getInstance().getProperty(”lfc.host”), 54 TestProperties.getInstance().getProperty(”sbdii.host”), 55 TestProperties.getInstance().getProperty(”se.url”), 56 keyBytes.toByteArray(), certBytes.toByteArray(), null); 57 // First method to try whether connection works. 58 // Furthermore, it deletes earlier test artifacts if there are any 59 log.info(”Trying to remove earlier test directory (if exists)”); 60 log.info(”Directory ” +(cl.delete(TEST_PATH)?””:”not”)+” removed”); 61 log.info(”Server connection working”); 62 } 63 @Test (dependsOnMethods={”testServerConnection”}) 64 public void testProxyGeneration() throws LfcDsException { 65 log.info(”Testing proxy generation”); 66 assert cl.checkProxyValidity() == false; 67 cl.generateProxy(CERT_PASSWORD); 68 assert cl.checkProxyValidity() == true; 69 log.info(”Proxy generation passed”); 70 } 71 @Test (dependsOnMethods={”testProxyGeneration”}) 72 public void testDirectoryExists() throws LfcDsException { 73 log.info(”Testing ’directoryExists’ method”); 74 assert cl.directoryExists(USER_DIR) == true; 75 assert cl.directoryExists(”asojdfioasjfrpFASKFAJSLDFJA/FASIDFJAS324234”) == false; 76 log.info(”’directoryExists’ method test passed”); 77 } 78 @Test (dependsOnMethods={”testProxyGeneration”}) 79 public void testExists1() throws LfcDsException { 80 log.info(”Testing method ’exists’ -- test 1”); 81 assert cl.exists(USER_DIR) == true; 82 assert cl.exists(”asojdfioasjfrpFASKFAJSLDFJA/FASIDFJAS324234”) == false; 83 log.info(”’exists’ method test 1 passed”); 84 } 85 @Test (dependsOnMethods={”testDirectoryExists”}) 86 public void testCreateDirectory() throws LfcDsException { 87 log.info(”Testing directory creation”); 88 assert cl.createDirectory(USER_DIR, TEST_DIR) == true; 89 assert cl.createDirectory(USER_DIR, TEST_DIR) == false; 90 assert cl.directoryExists(TEST_PATH) == true; 91 log.info(”Directory creation test passed”);

135 92 } 93 @Test (dependsOnMethods={”testCreateDirectory”,”testExists1”}) 94 public void testStoreFile() throws LfcDsException { 95 log.info(”Testing file creation”); 96 assert cl.storeFile(TEST_PATH, ”test_file1.txt”, TEST_FILE_CONTENTS.getBytes ()) == true; 97 assert cl.directoryExists(TEST_PATH+”/”+”test_file1.txt”) == false; 98 assert cl.exists(TEST_PATH+”/”+”test_file1.txt”) == true; 99 log.info(”File creation test passed”); 100 } 101 @Test (dependsOnMethods={”testStoreFile”}) 102 public void testFileExists() throws LfcDsException { 103 log.info(”Testing method ’fileExists’”); 104 assert cl.fileExists(TEST_PATH+”/”+”test_file1.txt”) == true; 105 assert cl.fileExists(TEST_PATH) == false; 106 assert cl.fileExists(”asfjaskfjaskdfjRRU3242394/FASDKFczxlkcjz/asfasd”) == false; 107 log.info(”’fileExists’ method test passed”); 108 } 109 @Test (dependsOnMethods={”testStoreFile”}) 110 public void testExists2() throws LfcDsException { 111 log.info(”Testing method ’exists’ -- test 2”); 112 assert cl.exists(TEST_PATH+”/”+”test_file1.txt”) == true; 113 log.info(”’exists’ method test 2 passed”); 114 } 115 @Test (dependsOnMethods={”testStoreFile”}) 116 public void testGetFile() throws LfcDsException { 117 log.info(”Testing method ’getFile’”); 118 String s=new String(cl.getFile(TEST_PATH+”/”+”test_file1.txt”)); 119 assert s.equals(TEST_FILE_CONTENTS); 120 log.info(”’getFile’ method test passed”); 121 } 122 @Test (dependsOnMethods={”testStoreFile”}) 123 public void testGetSize() throws LfcDsException { 124 log.info(”Testing method ’getSize’”); 125 long l=cl.getSize(TEST_PATH+”/”+”test_file1.txt”); 126 assert l == TEST_FILE_CONTENTS.length(); 127 log.info(”’getSize’ method test passed”); 128 } 129 @Test (dependsOnMethods={”testStoreFile”}) 130 public void testListFiles() throws LfcDsException { 131 log.info(”Testing method ’listFiles’”); 132 cl.storeFile(TEST_PATH, ”test_file2.txt”, ”Test 2 file -- contents”.getBytes ()); 133 cl.createDirectory(TEST_PATH, ”test_dir1”); 134 List l=cl.listFiles(TEST_PATH); 135 assert l.size() == 3; 136 for(LfcDsItem item: l){ 137 assert item.getName().equals(”test_file1.txt”) || item.getName(). equals(”test_file2.txt”) || 138 item.getName().equals(”test_dir1”); 139 if (item.getName().equals(”test_file1.txt”)) 140 assert item.isDirectory()==false;

136 141 if (item.getName().equals(”test_file2.txt”)) 142 assert item.isDirectory()==false; 143 if (item.getName().equals(”test_dir1”)) 144 assert item.isDirectory()==true; 145 } 146 log.info(”’listFiles’ method test passed”); 147 } 148 @Test (dependsOnMethods={”testListFiles”,”testGetSize”,”testGetFile”, 149 ”testExists2”,”testFileExists”}) 150 public void testDeleteFile() throws LfcDsException { 151 log.info(”Testing method ’deleteFile’”); 152 assert cl.fileExists(TEST_PATH+”/”+”test_file2.txt”) == true; 153 assert cl.delete(TEST_PATH+”/”+”afasdfasdfr243142”) == false; 154 assert cl.delete(TEST_PATH+”/”+”test_file2.txt”) == true; 155 assert cl.fileExists(TEST_PATH+”/”+”test_file2.txt”) == false; 156 log.info(”’deleteFile’ method test passed”); 157 } 158 @Test (dependsOnMethods={”testListFiles”}) 159 public void testDeleteEmptyDirectory() throws LfcDsException { 160 log.info(”Testing deletion of empty directory”); 161 assert cl.directoryExists(TEST_PATH+”/”+”test_dir1”) == true; 162 assert cl.delete(TEST_PATH+”/”+”test_dir1”) == true; 163 assert cl.directoryExists(TEST_PATH+”/”+”test_dir1”) == false; 164 log.info(”Deletion of empty directory succeeded”); 165 } 166 @Test (dependsOnMethods={”testDeleteFile”,”testDeleteEmptyDirectory”}) 167 public void testDeleteDirectoryWithContents() throws LfcDsException { 168 log.info(”Testing deletion of directory with contents”); 169 assert cl.directoryExists(TEST_PATH) == true; 170 assert cl.delete(TEST_PATH) == true; 171 assert cl.directoryExists(TEST_PATH) == false; 172 log.info(”Deletion of directory with contents succeeded”); 173 } 174 @AfterSuite 175 void tearDown() throws Exception { 176 cl.delete(TEST_PATH); 177 cl.disconnect(); 178 log.info(”Client disconnected”); 179 } 180 }

Results

Figure 63: Verification tests – TestNG report

 

------

TESTS

------

Running TestSuite

0 INFO LfcDsServerTest - Testing server connection

3 INFO LfcDsServerTest - Connection parameters:

3 INFO LfcDsServerTest - user.host = chempo.grid.cyfronet.pl

4 INFO LfcDsServerTest - user.port = 2000

204 INFO LfcDsServerTest - Trying to remove earlier test directory (if exists)

891 INFO LfcDsServerTest - Directory not removed

892 INFO LfcDsServerTest - Server connection working

900 INFO LfcDsServerTest - Testing proxy generation

9791 INFO LfcDsServerTest - Proxy generation passed

9798 INFO LfcDsServerTest - Testing ’directoryExists’ method

12300 INFO LfcDsServerTest - ’directoryExists’ method test passed

12302 INFO LfcDsServerTest - Testing directory creation

15779 INFO LfcDsServerTest - Directory creation test passed

15781 INFO LfcDsServerTest - Testing method ’exists’ -- test 1

18280 INFO LfcDsServerTest - ’exists’ method test 1 passed

18282 INFO LfcDsServerTest - Testing file creation

30551 INFO LfcDsServerTest - File creation test passed

30553 INFO LfcDsServerTest - Testing method ’exists’ -- test 2

32071 INFO LfcDsServerTest - ’exists’ method test 2 passed

32075 INFO LfcDsServerTest - Testing method ’fileExists’

36123 INFO LfcDsServerTest - ’fileExists’ method test passed

36125 INFO LfcDsServerTest - Testing method ’getFile’

44777 INFO LfcDsServerTest - ’getFile’ method test passed

44779 INFO LfcDsServerTest - Testing method ’getSize’

46048 INFO LfcDsServerTest - ’getSize’ method test passed

46053 INFO LfcDsServerTest - Testing method ’listFiles’

57734 INFO LfcDsServerTest - ’listFiles’ method test passed

57736 INFO LfcDsServerTest - Testing deletion of empty directory

62497 INFO LfcDsServerTest - Deletion of empty directory succeeded

62500 INFO LfcDsServerTest - Testing method ’deleteFile’

69758 INFO LfcDsServerTest - ’deleteFile’ method test passed

69762 INFO LfcDsServerTest - Testing deletion of directory with contents

77190 INFO LfcDsServerTest - Deletion of directory with contents succeeded

78179 INFO LfcDsServerTest - Client disconnected

Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 78.869 sec

Figure 64: Test log from verification tests

9.2 Performance tests

Approach Performance of LFC DS was assessed using the client Java library and the LFCDS GScript connector. The listing below presents the test code used when testing interaction of the client Java library with the LFCDS server.

Performance test of LFC DS client library interacting with LFC DS server 1 package cyfronet.gridspace.dac2.lfcds; 2 3 import java.io.ByteArrayOutputStream; 4 import java.io.File; 5 import java.io.FileInputStream; 6 import java.io.InputStream; 7 import java.io.OutputStream; 8 import java.io.PrintStream; 9 import java.util.Random; 10 11 import org.apache.log4j.Logger; 12 13 import cyfronet.gridspace.dac2.lfcds.client.LfcDsClient; 14 import org.testng.annotations.*; 15 16 /** 17 * @author Marek Pomocka 18 * 19 */ 20 public class PerformanceTest { 21 private static final Logger log = Logger.getLogger(LfcDsServerTest.class); 22 private static final String USER_CERT = TestProperties.getInstance().getProperty(” user.cert”); 23 private static final String USER_KEY = TestProperties.getInstance().getProperty(” user.key”); 24 private static final String CERT_PASSWORD = TestProperties.getInstance().getProperty (”cert.password”); 25 private static final String USER_DIR = TestProperties.getInstance().getProperty(” user.directory”); 26 private static final String TEST_DIR = TestProperties.getInstance().getProperty(” test.directory”); 27 private static final String TEST_PATH = USER_DIR+”/”+TEST_DIR; 28 static LfcDsClient cl; 29 30 private ByteArrayOutputStream certBytes; 31 private ByteArrayOutputStream keyBytes; 32 33 @Test 34 public void testPerformance() throws Exception { 35 FileInputStream certFile=new FileInputStream(new File(USER_CERT)); 36 FileInputStream keyFile=new FileInputStream(new File(USER_KEY)); 37 certBytes=new ByteArrayOutputStream(); 38 keyBytes=new ByteArrayOutputStream(); 39 LfcDsClient.copyLarge(certFile,certBytes); 40 LfcDsClient.copyLarge(keyFile,keyBytes);

140 41 certFile.close(); 42 keyFile.close(); 43 log.info(”Connection parameters:”); 44 log.info(” user.host = ”+TestProperties.getInstance().getProperty(”client. host”)); 45 log.info(” user.port = ”+TestProperties.getInstance().getProperty(”client. port”)); 46 cl = new LfcDsClient(”//”+TestProperties.getInstance().getProperty(”client. host”)+”:”+ 47 TestProperties.getInstance().getProperty(”client.port”)+”/ LfcDsServer”, 48 TestProperties.getInstance().getProperty(”streaming.port”), 49 TestProperties.getInstance().getProperty(”user.vo”), 50 TestProperties.getInstance().getProperty(”lfc.host”), 51 TestProperties.getInstance().getProperty(”sbdii.host”), 52 TestProperties.getInstance().getProperty(”se.url”), 53 keyBytes.toByteArray(), certBytes.toByteArray(), null); 54 cl.generateProxy(CERT_PASSWORD); 55 cl.delete(TEST_PATH); 56 assert cl.createDirectory(USER_DIR, TEST_DIR) == true; 57 58 PrintStream ps=new PrintStream(new File(”performance_test_results.txt”)); 59 ps.println(”# LFCDS performance test results”); 60 ps.println(”# file size, sending time, downloading time”); 61 for (int sz=1;sz<=2048;sz*=2) { 62 ps.print(””+sz+” ”); 63 log.info(”Sending file -- ” + sz + ”MB”); 64 Random r = new Random(); 65 byte[] mb=new byte[1024*1024]; 66 long start = System.currentTimeMillis(); 67 OutputStream os=cl.storeFileAsStream(TEST_PATH, ”test_big_file”); 68 long totalRandTime=0; 69 for (int i=0;i

141 89 } 90 is.close(); 91 elapsedTimeMillis = System.currentTimeMillis()-start; 92 log.info(”” + sz + ”MB file retrieved in ”+elapsedTimeMillis+” miliseconds”) ; 93 ps.print(””+elapsedTimeMillis); 94 ps.println(); 95 cl.delete(TEST_PATH +”/”+”test_big_file”); 96 } 97 ps.close(); 98 } 99 }

On the other hand, the following listing presents the code used for assessing performance of the GScript connector interacting with the LFCDS server.

LFC connector performance test 1 # Author: Marek Pomocka 2 3 require ’cyfronet/gridspace/dac2/dac_connector.rb’ 4 include Java 5 6 def test_streaming_performance(ds) 7 test_dir=”mpomocka_temp” 8 test_file=”test_big_file” 9 ds.delete test_dir 10 ds.create_directory test_dir 11 file_sizes=(0..11).collect {|x| 2**x } 12 bytes = Java::byte[1024*1024].new 13 r=java.util.Random.new() 14 buf=String.new 15 File.open(”performance_test_results.txt”,”w”) do |test_results| 16 test_results.puts ”file format: size in MB, sending time, ”+ 17 ”random text generating time, downloading time” 18 file_sizes.each do |file_size| 19 test_results.print file_size.to_s + ” ” 20 start_time = Time.now 21 text_generating_time=0 22 ds.open(test_dir+”/”+test_file, ”w”) do |f| 23 file_size.times do 24 t1=Time.now 25 r.nextBytes(bytes) 26 s=String.from_java_bytes bytes 27 t2=Time.now 28 text_generating_time+=(t2-t1) 29 f.write s 30 end 31 end 32 end_time = Time.now 33 test_results.print((end_time - start_time).to_s + ” ”) 34 test_results.print(text_generating_time.to_s + ” ”) 35 start_time = Time.now

142 36 ds.open(test_dir+”/”+test_file, ”r”) do |f| 37 file_size.times { f.read(1024*1024,buf)} 38 end 39 end_time = Time.now 40 ds.delete(test_dir+”/”+test_file) 41 test_results.print((end_time - start_time).to_s + ” ”) 42 test_results.puts 43 test_results.flush 44 end 45 end 46 end 47 48 ds = DACConnector.new(”lfcds-test”); 49 puts ”Successfully instantiated LFC data source” 50 test_streaming_performance(ds)

Java client library test results Figures 65 and 66, together with table 11, illustrate the results of the Java client library↔LFCDS server performance tests. Both client and server were located on the ChemPo machine.

GScript LFC connector test results Figures 67 and 68, together with table 12, show the results of the GScript LFC connector↔LFCDS server performance tests. As with the Java client library test, both client and server were located on the ChemPo machine. It is noteworthy that the upload and download times of the Java client library and the GScript LFC connector are comparable.

Communication over WAN An additional performance test of the LFC connector has been performed over a Wide Area Network. In particular, the LFCDS server was located in the CESNET networking centre in the Czech Republic, while the GScript client was run on the ChemPo machine situated in ACC Cyfronet. Tunneling was performed by the GREDIA server, also situated in ACC Cyfronet. Figures 69 and 70, together with table 13, present the results of these tests.


Figure 65: LFCDS Java client library↔LFCDS server performance test: sending and retrieving file from Grid – linear scale


Figure 66: LFCDS Java client library↔LFCDS server performance test: sending and retrieving file from Grid – logarithmic scale

File size [MB]    Upload time [s]    Download time [s]
   1                   17.169             14.669
   2                   18.261             15.990
   4                   20.763             16.460
   8                   26.184             18.751
  16                   38.035             21.047
  32                   61.108             27.929
  64                  103.873             41.050
 128                  187.853             62.932
 256                  379.492            114.283
 512                  709.921            216.484
1024                 1491.528            420.444
2048                 3016.697            820.362

Table 11: LFCDS Java client library↔LFCDS server performance test
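For orientation, the effective throughput for the largest file in table 11 can be derived directly from the measured times; the short snippet below performs this calculation (the values are copied from the table).

size_mb    = 2048.0
upload_s   = 3016.697
download_s = 820.362
puts format('upload:   %.2f MB/s', size_mb / upload_s)     # ~0.68 MB/s
puts format('download: %.2f MB/s', size_mb / download_s)   # ~2.50 MB/s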


Figure 67: GScript LFC connector↔LFCDS server performance test: sending and retrieving file from Grid – linear scale


Figure 68: GScript LFC connector↔LFCDS server performance test: sending and retrieving file from Grid – logarithmic scale

File size [MB]    Upload time [s]    Download time [s]
   1                   10.705              8.925
   2                   11.842              9.537
   4                   15.173             10.170
   8                   19.859             12.483
  16                   31.093             14.865
  32                   51.466             22.100
  64                   92.058             34.960
 128                  175.523             62.315
 256                  341.087            116.957
 512                  695.070            245.934
1024                 1458.043            493.427
2048                 2714.133            936.395

Table 12: GScript LFC connector↔LFCDS server performance test


Figure 69: GScript LFC connector↔LFCDS server performance test over WAN: sending and retrieving file from Grid – linear scale


Figure 70: GScript LFC connector↔LFCDS server performance test over WAN: sending and retrieving file from Grid – logarithmic scale

File size [MB]    Upload time [s]    Download time [s]
   1                   27.451             20.121
   2                   46.827             32.728
   4                   80.361             62.482
   8                  158.904            108.517
  16                  229.096            250.209
  32                  586.697            414.049
  64                 1222.293            949.451
 128                 2549.077           1904.185
 256                 4831.703           3650.851
 512                 9588.798           7237.884
1024                18226.778          14857.206

Table 13: GScript LFC connector↔LFCDS server performance test over WAN

10 Conclusions

10.1 Summary

Nearing the end of this dissertation, it is noteworthy that the goals delineated in chapters 6 and 7 have been successfully achieved and that a significant level of usability has been attained. Tests showed that file upload and download times depend linearly on file size, with scalability up to 2 GB and probably more, although larger file uploads and downloads have not been tested. In addition, the validation tests supplied with the LFC DS product may help system administrators validate their installation of the LFC DS software, detecting problems early, before the installation is deployed into production. LFC DS adds high value to the GridSpace Engine, allowing for comfortable and efficient access to Grid data sources, eliminating the burden of managing various technology-dependent information and automatically managing user credentials. At the moment, the LFC DS software is being integrated by the ChemPo computational chemistry team into their in-silico experiments utilizing the Gaussian software package. The role of LFC DS in this project is to enable searching the Gaussian catalogue, processing Grid files that are results of experiments, and downloading them in order to be visualized in the GridSpace environment. More applications among the scientific community are anticipated, since the LFC DS software has proven to be efficient and reliable while not compromising simplicity; LFC DS promises to make scientific work more productive by helping researchers focus on real scientific problems rather than on the technology they use.

10.2 Future work

Future extensions of LFC DS and the DAC2 layer should address the expressiveness of security policies in order to make the software suitable for larger collaborations; as already mentioned in chapter 7, the current “all or nothing” security policy is only appropriate for small groups of up to roughly ten persons. In addition, providing a Web Service API could benefit projects written in languages other than Java, since the current communication mechanism is available exclusively to the Java platform. Although performance is satisfactory, further scalability and performance improvements may also be pursued.

Another feature, perhaps not strictly necessary but interesting in terms of functionality, would be support for pseudo memory-mapped files (mmap). A native memory-mapped file facility for local files is already provided by the MMAP Ruby gem, which is available only on UNIX machines, and an example of a distributed filesystem that offers memory-mapped file support is the IBM General Parallel File System (GPFS). In the case of LFC DS, a pseudo mmap is feasible because Ruby allows operator overloading: such an implementation would use the [] operator to access remote Grid storage in much the same way as local memory, as sketched at the end of this section. Depending on the chosen architecture and the client→server→Grid interaction mechanism, it could also simplify the construction of parallel applications that communicate through shared files: the server would host file chunks, or entire files downloaded from the Grid, while clients would access the file, caching its contents in local memory and propagating changes to fragments shared with other clients. Once the file is closed by all clients, it would be propagated back to Grid storage. Parallel applications that could take advantage of this technique include, for instance, cellular automata and differential equation solvers exchanging boundary data. Performance would probably not compete with HPC machines, but with careful design the solution could scale to very large files. Another application of such functionality would be database management systems, which access data files mostly in a record-oriented manner and frequently use the mmap function when the operating system provides it.
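
A minimal sketch of this pseudo-mmap idea is given below, assuming a hypothetical Grid file object that exposes download and upload methods (for example one obtained from the LFC DS connector). It only overloads the [] and []= operators on a whole-file buffer; the chunking, sharing and write-back policies discussed above are deliberately left out to keep the illustration short.

  # Sketch of a pseudo memory-mapped Grid file built on Ruby operator overloading.
  # NOTE: the grid_file object and its download/upload methods are hypothetical
  # placeholders for the real LFC DS client calls.
  class PseudoMmap
    def initialize(grid_file)
      @grid_file = grid_file
      @buffer = grid_file.download     # fetch the whole file contents as a String
      @dirty = false
    end

    # Read bytes like local memory: mmap[offset] or mmap[offset, length].
    def [](offset, length = 1)
      @buffer[offset, length]
    end

    # Write bytes in place and remember that the buffer must be flushed.
    def []=(offset, data)
      @buffer[offset, data.bytesize] = data
      @dirty = true
    end

    # Propagate local modifications back to Grid storage when the file is closed.
    def close
      @grid_file.upload(@buffer) if @dirty
      @dirty = false
    end
  end

In a fuller design the buffer would be split into chunks held by the LFC DS server, so that several clients could map the same Grid file concurrently and exchange modified fragments through the server before the file is written back to the Grid.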

Stand on the shoulders of giants

11 References

[1] L. Abadie, P. Badino, J.-P. Baud, J. Casey, A. Frohner, G. Grosdidier, S. Lemaitre, G. Mccance, R. Mollon, K. Nienartowicz, D. Smith, and P. Tedesco. Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007. doi: 10.1109/MSST. 2007.4367964.

[2] W. Alda, M. Białoskórski, R. Górecki, and J. Rybicki. Grid Approach to Heat Transfer Simulation in Atomistic-continuum Model. In Marian Bubak, Michał Turała, and Kazi- mierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[3] Carlos de Alfonso, Miguel Caballer, José V. Carrión, and Vicente Hernández. DFSgc: Distributed File System for Multipurpose Grid Applications and Cloud Computing. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’08, October 2008, Krakow, Poland, 2008. ACC-Cyfronet AGH.

[4] William Allcock, John Bresnahan, Rajkumar Kettimuthu, Michael Link, Catalin Du- mitrescu, Ioan Raicu, and Ian Foster. The Globus Striped GridFTP Framework and Server. In SC ’05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page 54, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 1-59593-061-2. doi: http://dx.doi.org/10.1109/SC.2005.72.

[5] I. Altintas, E. Jaeger, Kai Lin, B. Ludaescher, and A. Memon. A Web service composition and deployment framework for scientific workflows. In Web Services, 2004. Proceedings. IEEE International Conference on, pages 814–815, July 2004. doi: 10.1109/ICWS.2004. 1314956.

[6] B. Amann, B. Elser, Y. Houri, and T. Fuhrmann. IgorFs: A Distributed P2P File System. In Peer-to-Peer Computing , 2008. P2P ’08. Eighth International Conference on, pages 77–78, Sept. 2008. doi: 10.1109/P2P.2008.19.

[7] David P. Anderson. BOINC: A System for Public-Resource Computing and Storage. In GRID ’04: Proceedings of the 5th IEEE/ACM International Workshop on Grid Comput- ing, pages 4–10, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695- 2256-4. doi: http://dx.doi.org/10.1109/GRID.2004.14.

[8] David P. Anderson, Eric Korpela, and Rom Walton. High-Performance Task Distribution for Volunteer Computing. In E-SCIENCE ’05: Proceedings of the First International Conference on e-Science and Grid Computing, pages 196–203, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2448-6. doi: http://dx.doi.org/10.1109/E-SCIENCE.2005.51.

[9] J. Andreeva, A. Anjum, T. Barrass, D. Bonacorsi, J. Bunn, P. Capiluppi, M. Corvo, N. Darmenov, N. DeFilippis, F. Donno, G. Donvito, G. Eulisse, A. Fanfani, F. Fanzago, A. Filine, C. Grandi, J.M. Hernandez, V. Innocente, A. Jan, S. Lacaprara, I. Legrand, S. Metson, H. Newman, D. Newbold, A. Pierro, L. Silvestris, C. Steenberg, H. Stockinger, L. Taylor, M. Thomas, L. Tuura, T. Wildish, and F. VanLingen. Distributed Computing Grid Experiences in CMS. Nuclear Science, IEEE Transactions on, 52(4):884–890, Aug. 2005. ISSN 0018-9499. doi: 10.1109/TNS.2005.852755.

[10] M. Antonioletti, M. Atkinson, R. Baxter, A. Borley, N.P.C. Hong, B. Collins, N. Hard- man, A.C. Hume, A. Knox, M. Jackson, et al. The Design and Implementation of Grid Database Services in OGSA-DAI. Concurrency and Computation: Practice & Experience, 17(2):357–376, 2005.

[11] K. Appel and W. Haken. A proof of the four color theorem. Discrete Math, 16(2):179–180, 1976.

[12] K. Appel and W. Haken. The solution of the four-color-map problem. Scientific American, 237(4):108–121, 1977.

[13] Owen Appleton and Diter Kranzlmüller. EGEE - Status and Future of the World’s Largest Multi-Science Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’05, November 2005, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[14] Athanasia Asiki, Katerina Doka, Ioannis Konstantinou, Antonis Zissimos, and Nectarios Koziris. A Distributed Architecture for Multi-Dimensional Indexing and Data Retrieval in Grid Environments. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[15] Athanasia Asiki, Katerina Doka, Ioannis Konstantinou, Antonis Zissimos, Dimitrios Tsou- makos, Nectarios Koziris, and Panayiotis Tsanakas. A grid middleware for data manage- ment exploiting peer-to-peer techniques. Future Gener. Comput. Syst., 25(4):426–435, 2009. ISSN 0167-739X. doi: http://dx.doi.org/10.1016/j.future.2008.09.005.

[16] Matthias Assel and Onur Kalyoncu. Dynamic Access Control Management for Distributed Biomedical Data Resources. In Paul Cunningham and Miriam Cunningham, editors, eChallenges e-2008 Conference, Collaboration and the Knowledge Economy: Issues, Applications, Case Studies, pages 1592–1599. IOS Press, October 2008.

[17] Matthias Assel, Bettina Krammer, and Aenne Loehden. Management and Access of Biomedical Data in a Grid Environment. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[18] Matthias Assel, Bettina Krammer, and Aenne Loehden. Data Access and Virtualiza- tion within ViroLab. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[19] Matthias Assel, Onur Kalyoncu, and Yi Pan. Approaching Fine-grain Access Control for Distributed Biomedical Databases within Virtual Environments. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’08, October 2008, Krakow, Poland, 2008. ACC-Cyfronet AGH.

[20] Matthias Assel, Piotr Nowakowski, and Marian Bubak. Integrating and Accessing Medical Data Resources within the ViroLab Virtual Laboratory. In ICCS ’08: Proceedings of the 8th international conference on Computational Science, Part III, pages 90–99, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 978-3-540-69388-8. doi: http://dx.doi.org/10. 1007/978-3-540-69389-5_12.

[21] Matthias Assel, David van de Vijver, Pieter Libin, Kristof Theys, Daniel Harężlak, Breanndann O Nuallain, Piotr Nowakowski, Marian Bubak, Anne-Mieke Vandamme, Stijn Imbrechts, Raphael Sangeda, Tao Jiang, Dineke Frentz, and Peter Sloot. A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides, Martin Hofmann-Apitius, Mathias Freudigmann, Se- bastian Claudius Semler, Yannick Legré, and Mary Kratz, editors, Proceedings of Health- Grid 2009, Studies in Health Technology and Informatics, volume 147, pages 51–61. IOS Press, 2009. doi: 10.3233/978-1-60750-027-8-51.

[22] J. Astalos, Ł. Flis, M. Radecki, and W. Ziajka. Performance Improvements to BDII - Grid Information Service in EGEE. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[23] J. Austin, R. Davis, M. Fletcher, T. Jackson, M. Jessop, B. Liang, and A. Pasley. DAME: Searching Large Data Sets Within a Grid-Enabled Engineering Application. Proceedings of the IEEE, 93(3):496–509, March 2005. ISSN 0018-9219. doi: 10.1109/JPROC.2004. 842746.

[24] Zoltán Balaton, Gabor Gombás, Péter Kacsuk, Adam Kornafeld, József Kovács, Csaba Attila Marosi, Gabor Vida, Norbert Podhorszki, and Tamás Kiss. SZTAKI Desktop Grid: a Modular and Scalable Way of Building Large Computing Grids. In IPDPS, pages 1–8. IEEE, 2007.

[25] Bartosz Baliś, Marian Bubak, Michał Pelczar, and Jakub Wach. Provenance Query- ing for End-Users: A Drug Resistance Case Study. In ICCS ’08: Proceedings of the 8th international conference on Computational Science, Part III, pages 80–89, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 978-3-540-69388-8. doi: http://dx.doi.org/10. 1007/978-3-540-69389-5_11.

[26] Bartosz Baliś, Marian Bubak, and Michał Pelczar. From Monitoring Data to Experiment Information - Monitoring of Grid Scientific Workflows. In E-SCIENCE ’07: Proceedings of the Third IEEE International Conference on e-Science and Grid Computing, pages 77–84, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-3064-8. doi: http://dx.doi.org/10.1109/E-SCIENCE.2007.36.

[27] Bartosz Baliś, Marian Bubak, Michał Pelczar, and Jakub Wach. Provenance Tracking and Querying in ViroLab. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[28] Bartosz Baliś, Marian Bubak, and Jakub Wach. User-Oriented Querying over Repositories of Data and Provenance. In E-SCIENCE ’07: Proceedings of the Third IEEE International Conference on e-Science and Grid Computing, pages 187–194, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-3064-8. doi: http://dx.doi.org/10.1109/ E-SCIENCE.2007.81.

[29] Bartosz Baliś, Marian Bubak, Michał Pelczar, and Jakub Wach. Provenance Tracking and Querying in the ViroLab Virtual Laboratory. In CCGRID ’08: Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid, pages 675–680, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3156-4. doi: http://dx.doi.org/10.1109/CCGRID.2008.83.

[30] J. Bart and A. Weisbecker. Services in Fraunhofer Enterprise Grids. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[31] Bartosz Kryza and Łukasz Dutka and Renata Słota and Jacek Kitowski. Supporting Knowledge-based Dynamic Virtual Organizations with Contracts. In Paul Cunningham and Miriam Cunningham, editors, Expanding the Knowledge Economy: Issues, Applica- tions, Case Studies, Amsterdam, The Netherlands, 2007. IOS Press.

[32] Tomasz Bartyński. Remote execution of delegated operations with support for automatic selection among multiple communication protocols. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2008.

[33] Tomasz Bartyński, Maciej Malawski, and Marian Bubak. Invocation of Grid Operations in the ViroLab Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[34] Tomasz Bartyński, Maciej Malawski, Tomasz Gubała, and Marian Bubak. Universal grid client: Grid operation invoker. In Roman Wyrzykowski, editor, Parallel Processing and Applied Mathematics, 7th International Conference, PPAM 2007, Gdansk, Poland, September 2007, Revised Selected Papers, Lecture Notes in Computer Science. Springer, 2007.

[35] Jean-Philippe Baud, James Casey, Sophie Lemaitre, Caitriana Nicholson, David Smith, and Graeme Stewart. LCG Data Management: From EDG to EGEE . In UK eScience All Hands Meeting Proceedings, Nottingham, UK, 2005.

[36] K. Benedyczak, A. Nowiński, K. S. Nowiński, and P. Bała. Interactive Visualization Using the UNICORE Grid Middleware. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[37] D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini, B. Drach, I. Foster, P. Fox, et al. The Earth System Grid: Supporting the Next Generation of Climate Modeling Research. Proceedings of the IEEE, 93(3):485–495, 2005.

[38] I. Bird and R.W.L. Jones. LHC computing grid: Technical design report. Technical report, CERN. Geneva. LHC Experiments Committee; LHCC, 2005.

[39] Christophe Blanchet, Alexis Michon, Krystyna Zakrzewska, and Richard Lavery. Grid Solving a Bioinformatics Challenge: a First Step to Anchoring the Nucleosome. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[40] P. Brezany, I. Janciak, A. Wöhrer, and A M. Tjoa. GridMiner: A Framework for Know- ledge Discovery on the Grid – from Vision to Design and Implementation. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[41] P. Brezany, I. Janciak, and A. M. Tjoa. Data Mining on the Grid: Perspective from the GridMiner Experience. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’05, November 2005, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[42] P. Brezany, I. Janciak, and A. Min Tjoa. GridMiner: a Fundamental Infrastructure for Building Intelligent Grid Systems. In Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on, pages 150–156, Sept. 2005. doi: 10.1109/ WI.2005.68.

[43] Marian Bubak, Tomasz Gubała, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, and P.M.A. Sloot. Collaborative virtual laboratory for e-health. In P. Cunningham and M. Cunningham, editors, Expanding the Knowledge Economy: Issues, Applications, Case Studies, eChallenges e-2007 Conference Proceedings, pages 537–544, Amsterdam, 2007. IOS Press. ISBN 978-1-58603-801-4. URL http://www.science.uva.nl/research/scs/papers/archive/Bubak2007a.pdf.

[44] Marian Bubak, Daniel Harężlak, Piotr Nowakowski, Tomasz Gubała, and Maciej Malawski. Appea: A Framework for the Design and Development of Business Applic- ations on the Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[45] Marian Bubak, Daniel Harężlak, Piotr Nowakowski, Tomasz Gubała, and Maciej Malawski. Appea: A Platform for Developments and Execution of Grid Applications. In P. Cunningham and M. Cunningham, editors, Expanding the Knowledge Economy: Issues, Applications, Case Studies, eChallenges e-2007 Conference Proceedings, pages 123–130, Amsterdam, 2007. IOS Press. ISBN 978-1-58603-801-4.

[46] Marian Bubak, Tomasz Gubała, Maciej Malawski, Bartosz Baliś, Włodzimierz Funika, Tomasz Bartyński, Eryk Ciepiela, Daniel Harężlak, Marek Kasztelnik, Joanna Kocot, Dariusz Król, Piotr Nowakowski, Michał Pelczar, Jakub Wach, Matthias Assel, and Alfredo Tirado-Ramos. Virtual Laboratory for Development and Execution of Bio- medical Collaborative Applications. In CBMS ’08: Proceedings of the 2008 21st IEEE International Symposium on Computer-Based Medical Systems, pages 373–378, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3165-6. doi: http://dx.doi.org/10.1109/CBMS.2008.47.

[47] Stephen Burke, Simone Campana, Elisa Lanciotti, Patricia Méndez Lorenzo, Vincenzo Miccio, Christopher Nater, Roberto Santinelli, and Andrea Sciabà. gLite 3 User Guide. Manual Series, Worldwide LHC Computing Grid, 2009.

[48] J. Cala, L. Czekierda, M. Nowak, and K. Zieliński. The Practical Experiences with Deployment of Advanced Medical Teleconsultation System over Public IT Infrastructure. In Computer-Based Medical Systems, 2008. CBMS ’08. 21st IEEE International Symposium on, pages 349–354, June 2008. doi: 10.1109/CBMS.2008.130.

[49] D. Cameron, J. Casey, L. Guy, P. Kunszt, S. Lemaitre, G. McCance, H. Stockinger, K. Stockinger, G. Andronico, W. Bell, et al. Replica management in the european datagrid project. Journal of Grid computing, 2(4):341–351, 2004.

[50] D. Caromel, C. Delbe, A. Di Costanzo, and M. Leyton. ProActive: an integrated platform for programming and running applications on grids and P2P systems. Computational Methods in Science and Technology, 12(1):69–77, 2006.

[51] H. Casanova, G. Obertelli, F. Berman, and R. Wolski. The AppLeS Parameter Sweep Template: User-level middleware for the Grid. Scientific Programming, 8(3):111–126, 2000.

[52] H. Casanova, F. Berman, T. Bartol, E. Gokcay, T. Sejnowski, A. Birnbaum, J. Don- garra, M. Miller, M. Ellisman, M. Faerman, et al. The Virtual Instrument: Support for Grid-Enabled Mcell Simulations. International Journal of High Performance Computing Applications, 18(1):3, 2004.

[53] R. Alfieri, R. Cecchini, V. Ciaschini, Á. Frohner, A. Gianoli, K. Lőrentey, and F. Spataro. VOMS, an Authorization System for Virtual Organizations. In Proceedings of the 1st European Across Grids Conference, Santiago de Compostela, pages 13–14, 2003.

[54] A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kes- selman, P. Kunszt, M. Ripeanu, B. Schwartzkopf, H. Stockinger, K. Stockinger, and B. Tierney. Giggle: A Framework for Constructing Scalable Replica Location Ser- vices. In Supercomputing, ACM/IEEE 2002 Conference, pages 58–58, Nov. 2002. doi: 10.1109/SC.2002.10024.

[55] A. A. Chien. Architecture of a commercial enterprise desktop Grid: the Entropia system. Grid Computing: Making the Global Infrastructure a Reality, pages 337–350, 2003.

[56] David Churches, Gabor Gombas, Andrew Harrison, Jason Maassen, Craig Robinson, Matthew Shields, Ian Taylor, and Ian Wang. Programming scientific and distributed workflow with Triana services: Research Articles. Concurr. Comput.: Pract. Exper., 18 (10):1021–1037, 2006. ISSN 1532-0626. doi: http://dx.doi.org/10.1002/cpe.v18:10.

[57] Eryk Ciepiela. Monitoring of Component-Based Applications. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2007.

[58] Eryk Ciepiela, Joanna Kocot, Tomasz Gubała, Maciej Malawski, Marek Kasztelnik, and Marian Bubak. GridSpace Engine of the ViroLab Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[59] M. Ciglan, B. Simo, M. Maliska, P. Slizik, and L. Hluchy. Grid Virtual Directory System (VDS) – User Centric Approach to Data Management in Medigrid Project. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’05, November 2005, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[60] ACC Cyfronet. DAC2 GForge site. https://gforge.cyfronet.pl/projects/dac2/, July 2009.

[61] ACC Cyfronet. GridSpace. http://gs.cyfronet.pl/, June 2009.

[62] ACC Cyfronet. GSEngine User Manual for version 0.8.x. http://virolab.cyfronet.pl/trac/vlruntime/wiki/GSEngineUserManual-0.8, June 2009.

[63] ACC Cyfronet. ViroLab Glossary. http://virolab.cyfronet.pl/trac/vlvl/wiki/Glossary, July 2009.

[64] A. T. Das and B. Berkhout. Efficient extension of a misaligned tRNA-primer during replication of the HIV-1 retrovirus. Nucleic Acids Res., 23:1319–1326, Apr 1995.

[65] D. De Roure, C. Goble, and R. Stevens. Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows. In e-Science and Grid Computing, IEEE International Conference on, pages 603–610, Dec. 2007. doi: 10.1109/E-SCIENCE.2007. 29.

[66] D. De Roure, C. Goble, J. Bhagat, D. Cruickshank, A. Goderis, D. Michaelides, and D. Newman. myExperiment: Defining the Social Virtual Research Environment. In eScience, 2008. eScience ’08. IEEE Fourth International Conference on, pages 182–189, Dec. 2008. doi: 10.1109/eScience.2008.86.

[67] David De Roure, Carole Goble, and Robert Stevens. The design and realisation of the myexperiment virtual research environment for social sharing of workflows. Fu- ture Generation Computer Systems, 25(5):561–567, May 2009. ISSN 0167739X. doi: 10.1016/j.future.2008.06.010.

[68] T. Dimitrakos, M. Wilson, and S. Ristol. TrustCoM-A Trust and Contract Management Framework enabling Secure Collaborations in Dynamic Virtual Organisations. ERCIM News, 59:59–60, 2004.

[69] M. Dolenc, V. Stankovski, and Z. Turk. InteliGrid Project: A Vision of Engineering on the Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[70] M. Dolenc, Z. Turk, P. Katranuschkov, K. Kurowski, and M Hannus. Towards Grid Enabled Engineering Collaboration Environment. Proceedings of the Tenth International Conference on Civil, Structural and Environmental Engineering Computing, B.H.V. Top- ping (Editor), Civil-Comp Press, 2005.

[71] M. Dolenc, K. Kurowski, M. Kulczewski, and A. Gehre. InteliGrid Document Man- agement System: an Overview. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[72] S. Dorward, R. Pike, D.L. Presotto, D. Ritchie, H. Trickey, and P. Winterbottom. Inferno. In Compcon ’97. Proceedings, IEEE, pages 241–244, Feb 1997. doi: 10.1109/CMPCON. 1997.584718.

[73] K.K. Droegemeier, D. Gannon, D. Reed, B. Plale, J. Alameda, T. Baltzer, K. Brew- ster, R. Clark, B. Domenico, S. Graves, E. Joseph, D. Murray, R. Ramachandran, M. Ramamurthy, L. Ramakrishnan, J.A. Rushing, D. Weber, R. Wilhelmson, A. Wilson, M. Xue, and S. Yalda. Service-oriented environments for dynamically interacting with mesoscale weather. Computing in Science & Engineering, 7(6):12–29, Nov.-Dec. 2005. ISSN 1521-9615. doi: 10.1109/MCSE.2005.124.

[74] G. Duckeck and Roger W. L. Jones. ATLAS computing: Technical design report. Tech- nical report, CERN. Geneva. LHC Experiments Committee; LHCC, 2005.

[75] L. Dusseault. RFC 4918: HTTP Extensions for Web Distributed Authoring and Version- ing 11 (WebDAV). Technical report, RFC, IETF, June 2007.

[76] L. Dutka, K. Korcyl, K. Zieliński, J. Kitowski, R. Słota, W. Funika, K. Bałos, L. Skital, and B. Kryza. Interactive European Grid Environment for HEP Application with Real Time Requirements. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[77] P. Dóbé, R. Kápolnai, and I. Szeberényi. Saleve: Supporting the Deployment of Para- meter Study Tasks in the Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[78] M. Eigen. Error catastrophe and antiviral strategy. Proc. Natl. Acad. Sci. U.S.A., 99:13374–13376, Oct 2002.

[79] J. Falkner and A. Weisbecker. Integration of Applications in MediGRID. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[80] Zoltan Farkas, Robert Lovas, and Peter Kacsuk. CancerGrid: Enterprise Desktop Grid Solution with Workflow Support for Anti-Cancer Drug Design. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[81] Zoltan Farkas, Robert Lovas, and Peter Kacsuk. CancerGrid: Enterprise Desktop Grid Solution with Workflow Support for Anti-Cancer Drug Design. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[82] G. Fedak, C. Germain, V. Neri, and F. Cappello. XtremWeb: a generic global comput- ing system. In Cluster Computing and the Grid, 2001. Proceedings. First IEEE/ACM International Symposium on, pages 582–587, 2001. doi: 10.1109/CCGRID.2001.923246.

[83] Laurence Field. Berkeley Database Information Index V5. https://twiki.cern.ch/twiki//bin/view/EGEE/BDII, July 2009.

[84] Travis Fischer, John Hughes, and Andy van Dam. Milton. Master’s thesis, Brown University, Providence, R.I., 2009.

[85] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, and S. Tuecke. A Directory Service for Configuring High-performance Distributed Computations. In Proceedings of the 6th IEEE Symposium on High Performance Distributed Computing, pages 365–375. IEEE Computer Society Press, 1997.

[86] David Flanagan and Yukihiro Matsumoto. The Ruby Programming Language. O’Reilly, 2008. ISBN 9780596516178.

[87] I. Foster. What is the grid? a three point checklist. GRID today, 1(6):22–25, 2002.

[88] I. Foster. Globus toolkit version 4: Software for service-oriented systems. Journal of Computer Science and Technology, 21(4):513–520, 2006.

[89] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. Interna- tional Journal of High Performance Computing Applications, 11(2):115, 1997.

[90] I. Foster and C. Kesselman. Knowledge Integration: In Silico Experiments in Bioinformatics. In The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 2004.

[91] I. Foster, C. Kesselman, et al. The grid: blueprint for a future computing infrastructure, 1999.

[92] I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. In Open Grid Service Infrastructure WG, Global Grid Forum, June 2002.

[93] I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Djaoui, A. Grimshaw, B. Horn, F. Maciel, F. Siebenlist, R. Subramaniam, et al. The open grid services architecture. The Grid2: Blueprint for a New Computing Infrastructure, pages 215–257, 2004.

[94] I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Djaoui, A. Grimshaw, B. Horn, F. Maciel, F. Siebenlist, R. Subramaniam, et al. The open grid services architecture, version 1.0. In Global Grid Forum, volume 29, 2005.

[95] Ákos Frohner on behalf of the Grid DM Team. Medical Data Management. In CERN - JRA1 All Hands meeting, 2007. Presentation slides.

[96] Włodzimierz Funika and Piotr Pęgiel. GScript Editor as Part of the ViroLab Presentation Layer. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[97] Włodzimierz Funika, Daniel Harężlak, Dariusz Król, Piotr Pęgiel, and Marian Bubak. User Interfaces of the Virolab Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[98] F. Gagliardi and M.-E. Begin. EGEE - providing a production quality grid for e-science. In LGDI ’05: Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pages 88–92, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7803-9228-0. doi: http://dx.doi.org/10.1109/LGDI.2005.1612472.

[99] S. Ghemawat, H. Gobioff, and S.T. Leung. The Google file system. ACM SIGOPS Operating Systems Review, 37(5):29–43, 2003.

[100] Santiago Gonzalez de la Hoz, Luis March Ruiz, and Dietric Liko. Experience with Atlas Distributed Analysis Tools. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[101] Jürgen Göres. Pattern-based information integration in dynamic environments. In Database Engineering and Application Symposium, 2005. IDEAS 2005. 9th International, pages 125–134, July 2005. doi: 10.1109/IDEAS.2005.42.

[102] Jürgen Göres. Towards dynamic information integration. Lecture Notes in Computer Science, 3836:16, 2005.

[103] Jürgen Göres and Stefan Dessloch. Discovering data sources in a dynamic Grid envir- onment: Research Articles. Concurr. Comput. : Pract. Exper., 19(16):2109–2124, 2007. ISSN 1532-0626. doi: http://dx.doi.org/10.1002/cpe.v19:16.

[104] Open Science Grid. Virtual Data Toolkit. http://vdt.cs.wisc.edu/, July 2009.

[105] GridwiseTech. GridwiseTech in the ViroLab Project. http://www.gridwisetech.com/virolab, 2009.

[106] C. Grimm and M. Pattloch. Use Cases for Authorization in Grid-Middleware. D-Grid Technical Report, Version, 1.3, September 2006.

[107] T. Gubała and M. Bubak. GridSpace – Semantic Programming Environment for the Grid. In Roman Wyrzykowski, Jack Dongarra, Norbert Meyer, and Jerzy Wasniewski, ed- itors, Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005 Poznan, Poland, September 11-14, 2005 Revised Selected Papers (Lecture Notes in Computer Science), Secaucus, NJ, USA, 2006. Springer-Verlag New York, Inc. ISBN 3540341412.

[108] Tomasz Gubała, Bartosz Baliś, Maciej Malawski, Marek Kasztelnik, Piotr Nowakowski, Matthias Assel, Daniel Harężlak, Tomasz Bartyński, Joanna Kocot, Eryk Ciepiela, Dari- usz Krol, Jakub Wach, Michał Pelczar, Wlodzimierz Funika, and Marian Bubak. ViroLab Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[109] Tomasz Gubała, Marek Kasztelnik, Maciej Malawski, and Marian Bubak. Development and execution of collaborative application on the virolab virtual laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[110] S.M. Hammer, J.J. Eron Jr, P. Reiss, R.T. Schooley, M.A. Thompson, S. Walmsley, P. Cahn, M.A. Fischl, J.M. Gatell, M.S. Hirsch, et al. Antiretroviral treatment of adult HIV infection: 2008 recommendations of the International AIDS Society-USA panel. Jama, 300(5):555, 2008.

[111] M. Hardt, N.V. Ruiter, and M. Zapf. Interactive Grid-Access for Ultrasound CT. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[112] K. S. Harris, W. Brabant, S. Styrchak, A. Gall, and R. Daifuku. KP-1212/1461, a nucleoside designed for the treatment of HIV by viral mutagenesis. Antiviral Res., 67: 1–9, Jul 2005.

[113] Tony Hey and Anne Trefethen. The Data Deluge: An e-Science Perspective. Grid computing-making the global infrastructure a reality, pages 809–824, 2003.

[114] Rich Hilliard (editor). IEEE Standard for Information Technology – Systems Design – Software Design Descriptions. IEEE Std 1016-2009, pages c1–40, 2009. doi: 10.1109/IEEESTD.2009.5167255.

[115] A. G. Hoekstra, S. F. Portegies Zwart, M. Bubak, and P. M. A. Sloot. Towards Distributed Petascale Computing. Arxiv preprint astro-ph/0703485, 2007.

[116] Stephen J. Huffman (Editor). IEEE Standard Glossary of Computer Networking Termin- ology. IEEE Std 610.7-1995, Jun 1995.

[117] IEEE Standards Board. IEEE Standard Glossary of Computer Applications Terminology. ANSI/IEEE Std 610.2-1987, May 1987.

[118] IEEE Standards Board. IEEE Standard Glossary of Data Management Terminology. IEEE Std 610.5-1990, Aug 1990.

[119] IEEE Standards Board. IEEE Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990, Dec 1990.

[120] M.A. Inda, A.S.Z. Belloum, M. Roos, D. Vasunin, C. de Laat, L.O. Hertzberger, and T.M. Breit. Interactive Workflows in a Virtual Laboratory for e-Bioscience: The SigWin- Detector Tool for Gene Expression Analysis. In e-Science and Grid Computing, 2006. e-Science ’06. Second IEEE International Conference on, pages 19–19, Dec. 2006. doi: 10.1109/E-SCIENCE.2006.261103.

[121] T. Jackson, J. Austin, M. Fletcher, and M. Jessop. Delivering a grid enabled distributed aircraft maintenance environment (DAME). In Proceedings of the UK e-Science All Hands Meeting, 2003.

[122] Tomasz Jadczyk. Bioinformatics Applications in the Virtual Laboratory. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2009.

[123] Bob Jones. EGEE status and plans. In HEPiX Spring 2008, CERN, Geneva, Switzerland, May 2008. Presentation slides.

[124] T. Jones, A. Koniges, and R.K. Yates. Performance of the IBM general parallel file system. In Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pages 673–681, 2000. doi: 10.1109/IPDPS.2000.846052.

[125] U. Jovanovič, J. Močnik, M. Novak, G. Pipan, and B. Slivnik. Using Ant Colony Optim- ization for Collaborative (Re)Search in Data Grids. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[126] K. Shechtman and M. Vainstein and M. Bercovier. Matlab on grid: a progress report. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’05, November 2005, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[127] P. Kacsuk, A. Marosi, J. Kovács, Z. Balaton, G. Gombás, G. Vida, and Á. Kornafeld. SZTAKI Desktop Grid - a Hierarchical Desktop Grid System. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[128] P. Kacsuk, G. Sipos, A. Tóth, Z. Farkas, G. Kecskeméti, and G. Hermann. Defining and Running Parametric Study Workflow Applications by the P-GRADE Portal. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[129] K. Karasavvas, M. Antonioletti, M.P. Atkinson, N.P.C. Hong, T. Sugden, A.C. Hume, M. Jackson, A. Krause, and C. Palansuriya. Introduction to OGSA-DAI Services. Lecture Notes in Computer Science, 3458:1–12, 2005.

[130] Gabrielle Allen, Kelly Davis, Konstantinos N. Dolkas, Nikolaos D. Doulamis, Tom Goodale, Thilo Kielmann, André Merzky, Jarek Nabrzyski, Juliusz Pukacki, Thomas Radke, Michael Russell, John Shalf, and Ian Taylor. Enabling Applications on the Grid – A GridLab Overview. International Journal of High Performance Computing Applications, 17:449–466, 2003.

[131] Jacek Kitowski. Structure and Status of National Grid Initiative in Poland. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’08, October 2008, Krakow, Poland, 2008. ACC-Cyfronet AGH.

[132] Joanna Kocot and Iwona Ryszka. Optimization of Grid Application Execution. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2007.

[133] Ioannis Konstantinou, Katerina Doka, Athanasia Asiki, Antonis Zissimos, and Nectarios Koziris. Gredia Middleware Architecture. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[134] D. Koufil and J. Basney. A credential renewal service for long-running jobs. In Grid Computing, 2005. The 6th IEEE/ACM International Workshop on, pages 6 pp.–, Nov. 2005. doi: 10.1109/GRID.2005.1542725.

[135] D. Kranzlmüller, H. Rosmanith, P. Heinzlreiter, and M. Polak. Interactive Virtual Reality on the Grid. In Distributed Simulation and Real-Time Applications, 2004. DS-RT 2004. Eighth IEEE International Symposium on, pages 152–158, Oct. 2004. doi: 10.1109/ DS-RT.2004.25.

[136] Bartosz Kryza, Łukasz Dutka, Renata Słota, Jan Pieczykolan, and Jacek Kitowski. GVOSF: Grid Virtual Organization Semantic Framework. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[137] Bartosz Kryza, Łukasz Dutka, Renata Słota, and Jacek Kitowski. Supporting Manage- ment of Dynamic Virtual Organizations in the Grid through Contracts. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[138] P. Kunszt, P. Badino, A. Frohner, G. McCance, K. Nienartowicz, R. Rocha, and D. Rodrigues. Data storage, access and catalogs in gLite. In LGDI ’05: Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pages 166–170, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7803- 9228-0. doi: http://dx.doi.org/10.1109/LGDI.2005.1612487.

[139] M. Lamanna. The LHC computing grid project at CERN. Nuclear Inst. and Methods in Physics Research, A, 534(1-2):1–6, 2004.

[140] E. Laure, S. Fisher, A. Frohner, C. Grandi, P. Kunszt, A. Krenek, O. Mulmo, F. Pacini, F. Prelz, J. White, et al. Programming the Grid with gLite. Computational Methods in Science and Technology, 12(1):33–45, 2006.

[141] P.J. Leach and R. Salz. UUIDs and GUIDs. IETF draft specification, 1998.

[142] LITBIO. Laboratory for Interdisciplinary Technologies in Bioinformatics. http://www.litbio.org/, July 2009.

[143] D. Lorenz, P. Buchholz, C. Uebing, W. Walkowiak, and R. Wismüller. Online Steering of HEP Grid Applications. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[144] Z. Luo, J. Zhang, and R.M. Badia. Service Grid for Business Computing. In M. P. Bekakos, G. A. Gravvanis, and H. R. Arabnia, editors, Grid Technologies Emerging from Distributed Architectures to Virtual Organizations. WIT Press, 2006.

[145] Grzegorz M. Wójcik and Wiesław A. Kamiński. Liquid State Machines and Large Simula- tions of Mammalian Visual System. In Marian Bubak, Michał Turała, and Kazimierz Wi- atr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[146] M.W. Maier, D. Emery, and R. Hilliard. Software architecture: introducing IEEE Stand- ard 1471. Computer, 34(4):107–109, Apr 2001. ISSN 0018-9162. doi: 10.1109/2.917550.

[147] M. Malawski, D. Kurzyniec, and V. Sunderam. MOCCA - Towards a Distributed CCA Framework for Metacomputing. In Proceedings of 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05) - Joint Workshop on High-Performance Grid Computing and High-Level Parallel Programming Models - HIPS-HPGC, April 4-8, 2005, Denver, Colorado, USA, page 174a. IEEE Computer Society Press, 2005.

[148] M. Malawski, T. Szepieniec, M. Kochanczyk, M. Piwowar, and I. Roterman-Konieczna. The Quest for Pharmacology Active Never Born Proteins within the EUChinaGRID Project. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[149] M. Malawski, T. Szepieniec, M. Kochanczyk, M. Piwowar, and I. Roterman. An approach to protein folding on the grid – EUChinaGrid experience. Bio-Algorithms & Med-Systems – BAMS, 2007.

[150] Maciej Malawski. Component-based methodology for programming and running scientific applications on the grid. PhD thesis, AGH University of Science and Technology in Krakow, Poland, 2008.

[151] Maciej Malawski, Marian Bubak, Michał Placek, Dawid Kurzyniec, and Vaidy Sunderam. Experiments with distributed component computing across Grid boundaries. In Proceedings of the HPC-GECO/CompFrame workshop in conjunction with HPDC 2006, pages 109–116, Paris, France, June 2006. URL http://www.icsr.agh.edu.pl/mambo/docman/task,doc_download/gid,17/Itemid,69/.

[152] Maciej Malawski, Joanna Kocot, Eryk Ciepiela, Iwona Ryszka, and Marian Bubak. Op- timization of application execution on the virolab virtual laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[153] Maciej Malawski, Tomasz Gubała, Marek Kasztelnik, Tomasz Bartyński, Marian Bubak, Francoise Baude, and Ludovic Henrio. High-Level Scripting Approach for Building Component-Based Applications on the Grid. In Marco Danelutto, Paraskevi Fragopoulou, and Vladimir Getov, editors, Making Grids Work. Springer Publishing Company, Incorporated, 2008.

[154] Maciej Malawski, Tomasz Bartyński, and Marian Bubak. Invocation of operations from script-based Grid applications. In Future Generation Computer Systems. Elsevier, 2009. doi: 10.1016/j.future.2009.05.012. In Press, Accepted Manuscript.

[155] Martin Maliska, Branislav Simo, and Ladislav Hluchý. The Workflow Engine for the CROSSGRID Flood Forecasting Application. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[156] Jan Meizner, Maciej Malawski, Eryk Ciepiela, Marek Kasztelnik, Daniel Harężlak, Piotr Nowakowski, Dariusz Król, Tomasz Gubała, Włodzimierz Funika, Marian Bubak, Tomasz Mikołajczyk, Paweł Płaszczak, Krzysztof Wilk, , and Matthias Assel. ViroLab Security and Virtual Organization Infrastructure. In Y. Dou, R. Gruber, and J. Joller, editors, APPT 2009, Advanced Parallel Processing Technologies 8th International Symposium, Rapperswil, Switzerland, Proceedings, LNCS 5737, pages 230–245. Springer-Verlag Berlin Heidelberg, August 24-25 2009.

[157] Sun’s Network.com Renders Computer-Animated Movie “Big Buck Bunny”. http://www.sun.com/aboutsun/pr/2008-06/sunflash.20080602.1.xml, July 2009.

[158] J. Montagnat, D. Jouvenot, C. Pera, A. Frohner, P. Kunszt, B. Koblitz, N. Santos, and C. Loomis. Bridging clinical information systems and grid middleware: a Medical Data Manager. Studies in health technology and informatics, 120:14, 2006.

[159] Zofia Mosurska and Kazimierz Wiatr. PL-Grid - Koncepcja budowy ogolnopolskiej in- frastruktury Gridowej [PL-Grid - building the Polish grid infrastructure - in Polish]. Biuletyn Informacyjny Pracowników AGH, 170, October 2007.

[160] C. Munro, B. Koblitz, N. Santos, and A. Khan. Measurement of the LCG2 and Glite File Catalogue’s Performance. Nuclear Science, IEEE Transactions on, 53(4):2228–2232, Aug. 2006. ISSN 0018-9499. doi: 10.1109/TNS.2006.877857.

[161] L. A. Napolitano, D. Schmidt, M. B. Gotway, N. Ameli, E. L. Filbert, M. M. Ng, J. L. Clor, L. Epling, E. Sinclair, P. D. Baum, K. Li, M. L. Killian, P. Bacchetti, and J. M. McCune. Growth hormone enhances thymic function in HIV-1-infected adults. J. Clin. Invest., 118:1085–1098, Mar 2008.

[162] David Newman. OAuth – myExperiment. http://wiki.myexperiment.org/index.php/Developer:OAuth, September 2008.

[163] Krzysztof Nienartowicz. gLite FiReMan. In Very Large Data Bases – VLDB, 2006. Presentation slides.

[164] J. Novotny, S. Tuecke, and V. Welch. An online credential repository for the Grid: MyProxy. In High Performance Distributed Computing, 2001. Proceedings. 10th IEEE International Symposium on, pages 104–111, 2001. doi: 10.1109/HPDC.2001.945181.

[165] Object Management Group. Unified Modeling Language (OMG UML), Superstruc- ture. V. 2.2. Object Management Group, 2009.

[166] T. O’Brien, J. Casey, B. Fox, B. Snyder, J. Van Zyl, and E. Redmond. Maven: The Definitive Guide. Sonatype, 2008.

[167] Tom Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Green- wood, Tim Carver, Kevin Glover, Matthew R. Pocock, Anil Wipat, and Peter Li. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20 (17):3045–3054, 2004. ISSN 1367-4803. doi: http://dx.doi.org/10.1093/bioinformatics/ bth361.

[168] M. Okon, D. Kaliszan, M. Lawenda, D. Stoklosa, T. Rajtar, N. Meyer, and M. Stroinski. Virtual Laboratory as a Remote and Interactive Access to the Scientific Instrumentation Embedded in Grid Environment. In e-Science and Grid Computing, 2006. e-Science ’06. Second IEEE International Conference on, pages 124–124, Dec. 2006. doi: 10.1109/ E-SCIENCE.2006.261057.

[169] S.D. Olabarriaga, A.J. Nederveen, and B.O. Nuallain. Parameter Sweeps for Functional MRI Research in the ”Virtual Laboratory for e-Science” Project. In Cluster Computing and the Grid, 2007. CCGRID 2007. Seventh IEEE International Symposium on, pages 685–690, May 2007. doi: 10.1109/CCGRID.2007.82.

[170] T. Olas and R. Wyrzykowski. Method for Mapping FEM Computations onto Cluster Grid Architectures. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[171] Tomasz Olas and Roman Wyrzykowski. Porting Thermomechanical Applicationsto CLUSTERIX Environment. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Po- land, 2004. ACC-Cyfronet AGH.

[172] Richard Olejnik, Bernard Toursel, Marek Tudruj, Eryk Laskowski, and Iyad Alshabani. Optimized Java Computing as an Application for Desktop Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[173] Richard Olejnik, Bernard Toursel, Marek Tudruj, Eryk Laskowski, Iyad Alshabani, and Lukasz Maśko. DG-ADAJ: a Java for Desktop Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’05, November 2005, Krakow, Poland, 2005. ACC-Cyfronet AGH.

[174] D. Orme and J. Winchester. The Eclipse Visual Editor. Creating Eclipse-based GUI builders. Dr Dobb’s Journal-Software Tools for the Professional Programmer, pages 73–75, 2006.

[175] Eva Pajorová, Ladislav Hluchý, and Ján Astalos̆. 3D Geo-visualization Service for Grid- oriented Applications of Natural Disasters. In Marian Bubak, Michał Turała, and Kazi- mierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[176] Daniel Pasztuhov and Imre Szeberenyi. New Approach to Design UI for Grid Applications. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[177] Michał Pelczar. Recording application executions enriched with domain semantics of computations and data. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2008.

[178] Jan Pieczykolan, Lukasz Dutka, Krzysztof Korcyl, Tomir Kryza, and Jacek Kitowski. Grid support for A Toroidal LHC ApparatuS (ATLAS). In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[179] Monika Piwowar, Tomasz Szepieniec, Ewa Matczyńska, and Irena Roterman. Identific- ation of “Never Born” Protein Traces in Human Chromosome 1 with Using Grid Envir- onment – Preliminary Analysis. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[180] Monika Piwowar, Tomasz Szepieniec, and Irena Roterman. Massive Identification of Similarities in DNA Materials Organized in Grid Environment. Bio-Algorithms & Med- Systems – BAMS, 2007.

[181] Martin Polak, Dieter Kranzmüller, and Jens Volkert. GVid – A Dynamic Grid Videoservice for Advanced Visualization. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[182] Marek Pomocka. System “VirtualRenderer” do Renderowania Filmów Komputerowych Oparty o Technologie Gridowe – [“VirtualRenderer” – a System for Rendering Com- puter Films based on Grid Technologies]. In Leszek Kurcz and Andrzej Gołdasz, editors, Sesje Studenckich Kół Naukowych. Materiały XLV Sesji Pionu Hutniczego: streszczenia referatów; program Sesji; informacje o kołach naukowych – [Sessions of Students’ Sci- entific Circles], Krakow, Poland, May 2008. AGH University of Science and Technology, Wydawnictwo Studenckiego Towarzystwa Naukowego.

[183] S. Portegies Zwart, S. McMillan, S. Harfst, D. Groen, M. Fujii, B.Ó. Nualláin, E. Gleb- beek, D. Heggie, J. Lombardi, P. Hut, et al. A multiphysics and multiscale software environment for modeling astrophysical systems. New Astronomy, 14(4):369–378, 2009.

[184] Paweł Płaszczak. Securing highly distributed data collections. http://bigdatamatters.com/bigdatamatters/2009/07/web-applications-security.html, July 2009.

[185] Tomás̆ Rebok. DiProNN: Distributed Programmable Network Node Architecture. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[186] A. Rodriguez, D. Sulakhe, E. Marland, V. Nefedova, N. Maltsev, M. Wilde, and I. Foster. A Grid-enabled service for high-throughput genome analysis. In Workshop on Case Stud- ies on Grid Applications, 2004.

[187] Jan Ruthe, Grzegorz M. Wójcik, Wiesław A. Kamiński, Dorota Stanisławek, Michał Żukowski, and Marek Falski. New System of Parallel and Biologically Realistic Neural Simulation. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceed- ings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC- Cyfronet AGH.

[188] K. Rycerz, M. Bubak, M. Malawski, and P. Sloot. A Framework for HLA-based Interactive Simulations on the Grid. Simulation, 81(1):67, 2005.

[189] K. Rycerz, M. Bubak, M. Malawski, and P. Sloot. Grid Support for HLA-Based Col- laborative Environment for Vascular Reconstruction. In e-Science and Grid Computing, 2006. e-Science ’06. Second IEEE International Conference on, pages 48–48, Dec. 2006. doi: 10.1109/E-SCIENCE.2006.261132.

[190] S. K. Sadiq, D. Wright, S. J. Watson, S. J. Zasada, I. Stoica, and P. V. Coveney. Automated molecular simulation based binding affinity calculator for ligand-bound HIV-1 proteases. J Chem Inf Model, 48:1909–1919, Sep 2008.

[191] N. Santos and B. Koblitz. Metadata services on the Grid. Nuclear Inst. and Methods in Physics Research, A, 559(1):53–56, 2006.

[192] B. Segal, L. Robertson, F. Gagliardi, and F. Carminati. Grid computing: the European Data Grid Project. In Nuclear Science Symposium Conference Record, 2000 IEEE, volume 1, pages 2/1 vol.1–, 2000. doi: 10.1109/NSSMIC.2000.948988.

[193] Sulev Sild, Andre Lomaka, and Uko Maran. OpenMolGRID: QSAR/QSPR Application in Grid Environment. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[194] A. Sim, A. Shoshani, P. Badino, O. Barring, JP Baud, F. Donno, M. Litmaath, T. Perelmutov, D. Petravick, E. Corso, et al. The Storage Resource Manager Interface Specification Version 2.2. In Open Grid Forum, 2007.

[195] Branislav Simo, Viera Sipkova, Martin Gazak, and Ladislav Hluchý. Interactive Air Pollution Simulation in int.eu.grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[196] Peter M. A. Sloot, Alfredo Tirado-Ramos, Ilkay Altintas, Marian Bubak, and Charles Boucher. From Molecule to Man: Decision Support in Individualized E-Health. Computer, 39(11):40–46, 2006. ISSN 0018-9162. doi: http://dx.doi.org/10.1109/MC.2006.380.

[197] Peter M.A. Sloot, Alfredo Tirado-Ramos, Gokhan Ertaylan, Breanndan O Nuallain, D. Van de Vijver, Charles A. Boucher, and Marian Bubak. VIROLAB: a Distributed Decision Support System for Viral Disease Treatment. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[198] PMA Sloot, P.V. Coveney, G. Ertaylan, V. Müller, CA Boucher, and M. Bubak. HIV decision support: from molecule to man. Philosophical Transactions A, 367(1898):2691, 2009.

[199] DA Stainforth, T. Aina, C. Christensen, M. Collins, N. Faull, DJ Frame, JA Kettlebor- ough, S. Knight, A. Martin, JM Murphy, et al. Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature, 433(7024):403–406, 2005.

[200] V. Stankovski, M. Swain, V. Kravtsov, T. Niessen, D. Wegener, M. Rohm, J. Trnkoczy, M. May, J. Franke, A. Schuster, and W. Dubitzky. Digging Deep into the Data Mine with DataMiningGrid. Internet Computing, IEEE, 12(6):69–76, Nov.-Dec. 2008. ISSN 1089-7801. doi: 10.1109/MIC.2008.122.

[201] Mariusz Sterzel and Tomasz Szepieniec. Enabling Commercial Chemical Software on EGEE Grid – Gaussian VO. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[202] Mariusz Sterzel, Tomasz Szepieniec, and Daniel Harężlak. Grid Web Portal for Chemists. In EGEE User Forum, Catania, Italy, March 2009. Presentation slides.

[203] R. D. Stevens, A. J. Robinson, and C. A. Goble. myGrid: personalised bioinformatics on the information grid. Bioinformatics, 19 Suppl 1:i302–304, 2003.

[204] RD Stevens, H.J. Tipney, CJ Wroe, TM Oinn, M. Senger, PW Lord, CA Goble, A. Brass, and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid, 2004.

[205] Graeme A Stewart, David Cameron, Greig A Cowan, and Gavin McCance. Storage and data management in EGEE. In ACSW ’07: Proceedings of the fifth Australasian symposium on ACSW frontiers, pages 69–77, Darlinghurst, Australia, Australia, 2007. Australian Computer Society, Inc. ISBN 1-920-68285-X.

[206] Ileana Stoica, S Kashif Sadiq, Catherine V Gale, and Peter V Coveney. Virtual Physiological Human research initiative: the future for rational HIV treatment design? Future HIV Therapy, 2(5):419–425, 2008. doi: 10.2217/17469600.2.5.419. URL http://www.futuremedicine.com/doi/abs/10.2217/17469600.2.5.419.

[207] Alberto Sánchez, María S. Pérez, Pierre Gueant, José M. Peña, and Pilar Herrero. DMGA: A Generic Brokering-Based Data Mining Grid Architecture. In Werner Dubitzky, editor, Data Mining Techniques in Grid Computing Environments, pages 201–219. Wiley, 2008. doi: 10.1002/9780470699904.ch12.

[208] D. Talia. The Open Grid Services Architecture: where the grid meets the Web. Internet Computing, IEEE, 6(6):67–71, Nov/Dec 2002. ISSN 1089-7801. doi: 10.1109/MIC.2002. 1067739.

[209] Andrzej Tarczyński, Tamas Kiss, Gabor Tersztyanszki, Thierry Delaitre, Dongdong Qu, and Stephen Winter. Application of grid computing for designing a class of optimal periodic nonuniform sampling sequences. Future Gener. Comput. Syst., 24(7):763–773, 2008. ISSN 0167-739X. doi: http://dx.doi.org/10.1016/j.future.2008.02.005.

[210] A. Thandavan, C. Sahin, and V. N. Alexandrov. Experiences with the Globus Toolkit on AIX and Deploying the Large Scale Air Pollution Model as a Grid Service. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’04, December 2004, Krakow, Poland, 2004. ACC-Cyfronet AGH.

[211] The British Library, Leipzig University Library, St Catherine’s Monastery, and The National Library of Russia. Codex Sinaiticus. http://www.codexsinaiticus.org/, July 2009.

[212] The GREDIA Consortium. The GREDIA Project Grid Enabled Access to Rich Media Content. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[213] The ViroLab Consortium. ViroLab, a Virtual Laboratory for Decision Support in Viral Diseases Treatment. http://www.virolab.org/, July 2009.

[214] Dave Thomas, Chad Fowler, and Andy Hunt. Programming Ruby: The Pragmatic Programmers’ Guide, Second Edition. Pragmatic Bookshelf, October 2004. ISBN 0974514055.

[215] Keith Thomson. RE: [Globus-discuss] “gsiftp” vs. “gridftp”. http://www.globus.org/mail_archive/discuss/2003/04/msg00380.html, 2003.

[216] A. Tirado-Ramos. Collaboratories on the Grid, Collaborative Software Architectures for Interactive Biomedical Applications. PhD thesis, University of Amsterdam, 2007.

[217] A. Tirado-Ramos, P.M.A. Sloot, and M. Bubak. Grid-based Interactive Decision Support in BioMedicine. Grid Computing for Bioinformatics and Computational Biology, page 225, 2007.

[218] Viet D. Tran and Ladislav Hluchý. Application Management in Earth Science. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Work- shop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[219] TrueArt. Products – Plug-ins – VirtualRender. http://www.trueart.pl/?URIType=Directory&URI=Products/Plug-ins/VirtualRender, July 2009.

[220] S. Tuecke, V. Welch, D. Engert, L. Pearlman, and M. Thompson. Internet X.509 public key infrastructure (PKI) proxy certificate profile. RFC3820, June, 2004.

[221] M. Šterk, I. Leben, E. Milošev, and G. Pipan. “River Soca Project” – Interactive Visualization of Massive Amount of Data with a Grid-based Engine. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[222] A. Uyar, W. Wu, H. Bulut, and G. Fox. Service-oriented architecture for a scalable videoconferencing system. In Pervasive Services, 2005. ICPS ’05. Proceedings. International Conference on, pages 445–448, July 2005. doi: 10.1109/PERSER.2005.1506564.

[223] Jakub Wach. Collection and Storage of Provenance Data. Master’s thesis, AGH University of Science and Technology in Krakow, Poland, 2008.

[224] L. Wang, W. Jie, and H. Zhu. State-of-the arts: workflow management for Grid computing. In M. P. Bekakos, G. A. Gravvanis, and H. R. Arabnia, editors, Grid Technologies Emerging from Distributed Architectures to Virtual Organizations. WIT Press, 2006.

[225] R. Watson, S. Maad, and B. Coghlan. Multiscale Multimodal Visualization on a Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’06, October 2006, Krakow, Poland, 2006. ACC-Cyfronet AGH.

[226] Adianto Wibisono, Zhiming Zhao, Adam Belloum, and Marian Bubak. Towards a Virtual Laboratory for Interactive Parameter Sweep Applications on the Grid. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW’07, October 2007, Krakow, Poland, 2007. ACC-Cyfronet AGH.

[227] M. Widenius, D. Axmark, and P. DuBois. MySQL reference manual. O’Reilly & Associates, Inc. Sebastopol, CA, USA, 2002.

[228] Stephen Wolfram. The Mathematica Book, Fifth Edition. Wolfram Media, 2003.

[229] J.C. Worsley and J.D. Drake. Practical PostgreSQL. O’Reilly Media, Inc., 2002.

[230] C. Wroe, C. Goble, M. Greenwood, P. Lord, S. Miles, J. Papay, T. Payne, and L. Moreau. Automating Experiments Using Semantic Data on a Bioinformatics Grid. Intelligent Systems, IEEE, 19(1):48–55, Jan-Feb 2004. ISSN 1541-1672. doi: 10.1109/MIS.2004.1265885.

[231] Cheng Yaodong, Gang Chen, Yongjian Wang, and Shuaijie Wang. Deploying HEP Applications on Multiple Grid Infrastructures. In Grid and Cooperative Computing, 2008. GCC ’08. Seventh International Conference on, pages 632–641, Oct. 2008. doi: 10.1109/GCC.2008.78.

[232] Cheng Yaodong, Wang Lu, Liu Aigui, and Cheng Gang. Sharing LCG files across different platforms. In Journal of Physics: Conference Series, volume 119, page 062024. Institute of Physics Publishing, 2008.

[233] Jia Yu and Rajkumar Buyya. A Taxonomy of Workflow Management Systems for Grid Computing. Technical report, Journal of Grid Computing, 2005.

[234] Jia Yu and Rajkumar Buyya. A Taxonomy of Scientific Workflow Systems For Grid Computing. SIGMOD Rec., 34(3):44–49, 2005. ISSN 0163-5808. doi: http://doi.acm.org/10.1145/1084805.1084814.

[235] Kurt D. Zeilenga. Lightweight Directory Access Protocol (LDAP): Technical Specification Road Map. RFC4510, June, 2006.

[236] Valerie E. Zelenty (Editor). IEEE Recommended Practice For Software Requirements Specifications. IEEE Std 830-1998, Oct 1998.

[237] Zhiming Zhao, S. Booms, A. Belloum, C. de Laat, and B. Hertzberger. VLE-WFBus: A Scientific Workflow Bus for Multi e-Science Domains. In e-Science and Grid Computing, 2006. e-Science ’06. Second IEEE International Conference on, pages 11–11, Dec. 2006. doi: 10.1109/E-SCIENCE.2006.261095.

[238] Zhiming Zhao, A. Belloum, M. Bubak, and B. Hertzberger. Support for Cooperative Experiments in VL-e: From Scientific Workflows to Knowledge Sharing. In eScience, 2008. eScience ’08. IEEE Fourth International Conference on, pages 329–330, Dec. 2008. doi: 10.1109/eScience.2008.120.

[239] Mikhail Zhizhin, Eric Kihn, Vassily Lyutsarev, Sergei Berezin, Alexey Poyda, Dmitry Mishin, Dmitry Medvedev, and Dmitry Voitsekhovsky. Environmental Scenario Search and Visualization. In GIS ’07: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, pages 1–10, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-914-2. doi: http://doi.acm.org/10.1145/1341012.1341047.

A LFC Data Source – User guide

The LFC Data Sources allow you to access EGEE/WLCG storage resources through a simple Ruby API.

A.1 Data access workflow: registering the data source, storing credentials, using the data source from a script

The data access workflow is as follows:

1. Register the data source in the Data Source Registry (DSR). The information needed includes:

a. Connection details of the LFC Data Source server – a gateway to EGEE/WLCG data resources.

b. Addresses and ports of the following servers: the LCG File Catalog (LFC) server, the Berkeley Database Information Index (BDII) and the default storage element, which will be used to store new files.

c. Your Virtual Organization name.

2. Optionally, you can store your credentials in the DSR. This will allow you and other users (if you permit) to access the data sources without specifying credentials in the script. In order to make your credentials usable by other users, you must mark them as static. The information needed to enable credential-free access from the script is one of the following:

a. Your grid proxy certificate – note that a proxy certificate is usually valid for only one day, so this is a short-term solution. On the other hand, it allows you to use Single Sign-On authentication when accessing data sources without the need to store your private key, grid certificate and password in the DSR.

b. Alternatively, you can store your complete credentials in the DSR: the private key, the password to the private key, and the grid certificate. If your private key is encrypted with a passphrase that you do not want other people (e.g. a system administrator) to see, you may re-encrypt your private key with another passphrase of your choice for this purpose:

openssl rsa -in userkey.pem -des3 -out userkey.pem.new

c. The third option is to store only the grid certificate and private key in the DSR, without the passphrase. In order to use the data source, you will then have to provide the passphrase in the DACConnector constructor.

3. Once you have provided information about the data source and credentials (or someone else has done it for you), you may access the LFC data sources in your GScript files by providing the data source handle and, optionally, credentials, as in the sketch below.
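As an illustration, here is a minimal GScript sketch of step 3; the handle lfc-voce and the file path are placeholders, and it assumes that the data source and static credentials have already been registered in the DSR (the constructors and methods used are described in the following sections):

# connection details and credentials are taken from the DSR entry for this handle
ds = DACConnector.new("lfc-voce")
# read a small text file from the Grid and print its contents
puts String.from_java_bytes(ds.get_file("username/important_project/readme.txt"))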

A.2 DACConnector LFC DS-specific constructors

The LFC Data Source provides the constructors shown below; they are usually invoked through DACConnector.new. Note that every argument passed to the new method is a String. In the following examples the strings lfc-voce and lfc-egee are used as example handles, and ds is a local variable holding the reference to the instantiated LFC DS connector.

1-argument constructor You provide only the data source handle. Grid credentials must be stored in the DSR in order to use this constructor. As noted before, the credentials may be yours or those of other users who declared them static, i.e. permitted other authenticated users to use them. For example:

ds = DACConnector.new("lfc-voce")

2-argument constructor, second argument: password The first parameter is the data source handle. The second is the private key passphrase. This is useful when you chose option 2.c, i.e. stored only the private key and grid certificate but did not save the private key passphrase.

ds = DACConnector.new("lfc-voce", "your_passphrase")

2-argument constructor, second argument: proxy certificate You may also provide a String with a proxy certificate as a second argument. The LFC Data Source connector will distinguish passwords from proxy certificates by their length. Anything that is longer than 300 characters will be assumed to be a proxy certificate. Example:

ds = DACConnector.new("lfc-egee", IO.read("C:/x509up_u506"))
# IO.read is used to load a file into a String

or

ds = DACConnector.new("lfc-egee", IO.read("/tmp/x509up_u506"))

4-argument constructor: handle, private key, grid certificate and private key passphrase You will probably use this constructor if you do not have a valid proxy or credentials stored in the DSR. You may also be interested in this method if you want to override your DSR credentials. Note that if some or all of your credentials are stored in the DSR, the “side effect” of using this method will be the storage of a newly generated proxy in the DSR. However, if your credentials are not stored in the DSR, the created proxy certificate will not be stored there. All of the arguments are Strings. To load the contents of a file into a String, you may use the IO.read method shown before, as in this example:

ds = DACConnector.new("lfc-voce", IO.read("C:/userkey.pem"), \
    IO.read("C:/usercert.pem"), "your_password")

or

ds = DACConnector.new("lfc-voce", \
    IO.read("/home/username/.globus/userkey.pem"), \
    IO.read("/home/username/.globus/usercert.pem"), \
    "your_password")

A.3 LFC Data Source methods

At this stage, we have created an instance of the LFC Data Source. Now we can invoke methods that operate on files and on the LCG File Catalog. A useful point worth noting is that, with the LFC data source connector, all paths begin with /grid/vo_name/; however, you do not have to provide this prefix in your commands. For instance, the path

/grid/voce/username/important_project/experiment_data

can be expressed in either of two ways:

username/important_project/experiment_data

or

/username/important_project/experiment_data

The beginning slash is optional.

The LFC methods are accessible both in camelCase notation and in ruby_notation. They also have numerous aliases, listed below, and you are welcome to use whichever name you prefer.

Methods may raise a DAC2Exception if the LFC DS connector detects an error in the invocation parameters. If the LFC DS connector does not see any problem with the parameters, it passes the invocation to the LFC DS client – a Java library used to connect to the LFC DS server. If this Java client cannot connect to the LFC DS server, or receives an exception from the server, it reports the problem to you as an LfcDsException. The LFC DS server throws an exception if for some reason it cannot execute the requested operation, e.g. it cannot retrieve the contents of a file. You may prevent some exceptions by checking the existence of files or directories beforehand (a short sketch of this pattern is given with the exist? and file? methods below). This checking is not done on the client side because communication with the LFC catalog, although faster than access to storage elements, still incurs a noticeable performance cost.

Knowing the constructors used by the LFC DS connector, the path specification convention and the exceptions that may be thrown, let us move on to the description of the methods implemented by the LFC DS connector.

LFC DS connector methods

createDirectory(path) createDirectory(path, child_directory) Aliases: create_directory

Creates a new directory specified by path (one-argument version), or creates a directory named child_directory inside the directory given by path (two-argument version). Returns true on success, false otherwise. Examples:

# use of an alias
ds.create_directory "some_directory/another_directory"
# use of the beginning slash '/'
ds.createDirectory "/some/lengthy/path/some_directory"
# two-argument example
ds.create_directory "some/lengthy/path", "some_directory"
# two-argument example with parentheses
ds.create_directory("some/long/path", "some_directory")

delete(path) Aliases: deleteFile – for backward compatibility with scripts that use deleteFile

Deletes a file or directory. Returns true on success, false otherwise. Examples:

ds.delete("some/long/path/some_directory")
ds.delete "some/long/path/some_file"

directory?(path) Aliases: isDirectory, is_directory

Returns true if the item denoted by the path exists and is a directory; false otherwise.

ds.directory?("some/path/file")      # would return false
ds.isDirectory "some/path/directory" # would return true if the directory exists

exist?(path) Aliases: exist, exists, exists?

Returns true if the item represented by the path passed as an argument exists; false otherwise.

file?(path) Aliases: isFile, is_file

Returns true if the item indicated by the path passed as an argument exists and is a file; false otherwise.
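As an illustration, here is a short defensive sketch combining these predicates with exception handling, as mentioned in the previous subsection; the path is a placeholder, and getFile, used here to read the file, is described next:

path = "username/important_project/experiment_data/results.txt"
if ds.exist?(path) && ds.file?(path)
  begin
    puts String.from_java_bytes(ds.getFile(path))
  rescue Exception => e
    # server-side failures are reported as LfcDsException; rescuing the generic
    # Exception keeps this sketch independent of the exact exception class
    puts "Could not read #{path}: #{e.message}"
  end
else
  puts "#{path} does not exist or is not a regular file"
end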

getFile(path) Aliases: get_file

Returns a Java byte array representing the contents of a file. If the file does not exist, an exception is thrown. In order to convert the Java byte array to a String you may use the String.from_java_bytes method, as in the following example:

String.from_java_bytes \
    ds.getFile("some/directory/test_lfcds/test_file1.txt")

Although streaming is used to download the contents of the file, the creation of large byte array objects may cause OutOfMemory errors. If you are accessing files of several hundred megabytes, you are advised to use the openFile method instead, which is also convenient because it returns a Ruby IO object. Note that changes in the returned array will not be reflected in the file unless you save them using the storeFile method (although memory-mapped-file functionality of this kind might be very useful).

getSize(path) Aliases: size?, size, get_size

Returns the size of the file represented by path; this information is retrieved from the LFC catalog. Examples:

ds.size? "/some/path/some_big_file.dat"
ds.getSize("some/long/path/some/other/file.mov")

listFiles(path) Aliases: list_files

Returns a list of LfcDsItem objects. Each of these items responds to the is_directory (or isDirectory) method, which tells you whether the item represents a directory or a file. In addition, each item responds to the getName (or get_name) method, which returns the base name of a file, i.e. without the directory part. You may iterate through the returned list in order to list the files available in a directory. Example:

l = ds.listFiles("/foo/bar/test_lfcds/")
l.each do |item|
  puts item.get_name + " is a " + \
    if item.is_directory then "directory" else "file" end
end

The execution of the script above might yield the following results:

Test_file1.txt is a file
Test_dir is a directory
Test_file2.txt is a file

openFile(path, mode) { optional block } Aliases: open, open_file

The mode parameter can be one of the following values:

• :r, :read, "r", "read" – indicate that the file will be opened in read mode

• :w, :write, "w", "write" – indicate that the file will be opened in write mode

Neither the LFC DS connector nor the LFC DS server supports read-write mode; you must choose whether you want to read from a file or write to a file. When opening a file for writing, if the file denoted by the path already exists, an exception is thrown: you must delete the previous version of a file before you attempt to write a new version.

The path indicates the location of the file to open. When a file is opened for reading, the requested file is downloaded from the Grid onto the LFC DS server (not the same server which runs GSEngine, but another one that may be installed on a separate machine). This method returns a remote input stream for this file, which is converted to a Ruby IO stream by the LFC connector. After you finish reading the file, you release it by invoking the close method of the returned Ruby stream. The close method causes the temporary file stored in the temporary directory on the LFC DS server to be deleted; if you forget to do this, it will be removed when the LFC DS is restarted some time in the future. If you use the optional block, the file will be closed for you automatically when the block ends, so using openFile with a block may be the preferred option. Example:

# the openFile method used with the alias "open" and a block argument
ds.open("/foo/bar/test_lfcds/test_file3.txt", "r") do |file|
  file.each {|line| puts line}
end
# example of a file opened and closed explicitly
f = ds.open("/foo/bar/test_lfcds/test_file2.txt", :read)
f.each {|line| puts line}
f.close

The file is streamed to you by the LFC DS server after it has been downloaded from the Grid, so you should be able to access very large files using the methods described above. Nevertheless, the machine on which the LFC DS server runs must have enough storage to hold the file in its temporary directory.

As regards writing a file, the commands are similar. The difference is that, as opposed to a file opened for reading, a file opened for writing is first streamed to the LFC DS server, which then writes the stream to a file in its temporary directory. When you close the remote stream, the file is sent to the Grid and registered in the LFC catalog. If, for some reason, the file cannot be stored or registered in the LFC, an LfcDsException is thrown. A typical situation in which this may occur is when you attempt to write to a file that is already registered in the LFC catalog. As with a file opened for reading, the return value of openFile in write mode is a reference to a remote stream that you can use to manipulate the file. An example:

f = ds.open_file "foo/bar/test_lfcds/test_file2.txt", :write
f.puts "First line of the file file 2"
f.puts "Second line of the file file 2"
f.close # remember to close the stream

# openFile invoked with a block
ds.openFile("foo/bar/test_lfcds/test_file2.txt", :w) do |f|
  f.puts "Another way to write to a file"
  f.puts "Note that close is not necessary"
end
# here you do not have to close the stream - it is done for you

As you can see in the example above, you do not have to close the file explicitly if you use a block argument – in this example, the code between do and end. You could also use { and } if you prefer, although curly braces are conventionally used for one-line blocks:

ds.open("foo/bar/test_lfcds/test_file2.txt", "w") \
  { |file| file.puts("A short file") }
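Because overwriting is not supported, a file that may already be registered in the LFC has to be removed before it is rewritten. The following is a minimal sketch of this delete-before-write pattern; the path is a placeholder:

path = "foo/bar/test_lfcds/report.txt"
# remove any previous version first, since writing to an already registered path
# raises an LfcDsException
ds.delete(path) if ds.exist?(path)
ds.open(path, :w) do |f|
  f.puts "Fresh contents of the report"
end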

storeFile(payload, path) Aliases: store_file

This method stores a file whose contents are passed as a Java byte array in the payload parameter. As with the openFile method, the contents of the file are first streamed to the LFC DS server and stored in a temporary directory; next they are sent to the Grid and registered in the LFC catalog under the path specified by the client. True is returned when all of those operations succeed, false otherwise. Examples:

# Note the "to_java_bytes" method which enables you
# to turn a Ruby String into a Java byte array
ds.storeFile("TEST file 1 contents".to_java_bytes, \
    "foo/bar/test_lfcds/test_file1.txt")
ds.store_file "TEST file 2 contents".to_java_bytes, \
    "foo/bar/test_lfcds/test_file2.txt"

If you are sending large files to the Grid, the openFile method may be more suitable, as creating large byte arrays may cause OutOfMemory errors.

zero?(path)

Returns true if the file indicated by the path exists and has a length of 0 bytes. Example:

ds.zero? "/foo/bar/some/path/empty_file.txt"
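To close this guide, here is a minimal sketch that combines several of the methods described above; the handle lfc-voce and all paths are placeholders, and the sketch assumes a data source registered as described in Section A.1:

ds = DACConnector.new("lfc-voce")

dir = "username/important_project/run_01"
ds.create_directory(dir) unless ds.exist?(dir)

# store a small result file, then check its size as recorded in the LFC catalog
ds.store_file("energy = 42.0\n".to_java_bytes, "#{dir}/result.txt")
puts ds.get_size("#{dir}/result.txt")

# list the directory contents
ds.list_files(dir).each do |item|
  puts "#{item.get_name} (#{item.is_directory ? 'directory' : 'file'})"
end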

Abstract accepted for Cracow Grid Workshop 2009 (CGW'09) Conference

Integrating EGEE Storage Services with the Virtual Laboratory

Marek Pomocka (1), Piotr Nowakowski (2), Marian Bubak (3,4)
(1) Faculty of Physics and Applied Computer Science AGH, Krakow, Poland
(2) ACC CYFRONET AGH, Krakow, Poland
(3) Institute of Computer Science AGH, Krakow, Poland
(4) Informatics Institute, University of Amsterdam, The Netherlands

The advent of Grid technologies has enabled research at a pace not achievable using earlier methods, by facilitating easier access to high-end computing and data resources. However, employing Grids in scientific work is still the domain of highly skilled researchers able to tackle the complexity of the Grid environment. Although there have been successful endeavors that strive to provide a mature scientific environment [1, 2, 3, 4] for scientific disciplines not normally related to computer science, fundamental obstacles still prevent scientific communities from adopting Grids. These include the complexity of Grid security solutions, such as the Grid Security Infrastructure (GSI), and intricate access to core Grid services, e.g. data catalogues and storage resources.

Our work aims to minimize the learning curve for access to Grid data services, specifically to LCG File Catalogue (LFC) storage elements and GSI, concealing most technical details. The API we have devised creates the abstraction of working with local files with no intervening GSI, i.e. with no Grid certificate-related operations, although the user works with files stored on the Grid with all GSI mechanisms in place.

As regards other projects that deal with comparable issues, the Credential Mapping Service [5] allows mapping one security system onto another, e.g. Kerberos authentication tokens onto GSI certificates. Similarly, in our solution, Shibboleth handles are automatically mapped to GSI certificates, relieving users from the burden of managing their own credentials. Furthermore, Yaodong et al. [6] have developed GFISH (Grid File Sharing system), which includes a server providing a web service API for the LFC catalogue and a related Java client, with Grid user credentials retrieved from a MyProxy server. They implemented the server using gSOAP while utilizing Axis on the client side, thus introducing significant transmission overhead. Our approach is also service-oriented; however, we relied on RMI-based protocols and libraries, namely the Cajo library for overall communication and RMIIO for streaming. To provide secure transmission, our solution employs SSH tunneling; thus we avoid the need to generate server certificates and to manage keystores (which is an inherent feature of Transport Layer Security).

Our development effort did not commence from scratch. Instead, we built on previous work, such as the ChemPo [1] LFC command wrappers and the data access infrastructure prepared for the ViroLab [3, 4] project, specifically DAC2 [7] and the Data Source Registry (DSR). We have extended the DSR so that it is able to store Grid user credentials and information on new data source types, prepared a server that acts as a gateway between DAC2 and EGEE/WLCG, developed a client library that communicates with this server and, finally, developed a new DAC2 GScript [8] interface which makes use of the aforementioned components.

The result of our work is a new, convenient API for managing and accessing files on the Grid, which automates certificate management and mimics local file access and directory operations; e.g. a user requesting a file from the Grid is handed a Ruby IO reference that points to a remote input or output stream. Last but not least, the client API is independent of the gLite software, which makes it more accessible to end users and does not impose additional dependencies on the GridSpace Engine [8] – the Virtual Laboratory [3, 4] runtime. Future work might include providing fine-grained security. In addition, further tailoring of the API to specific scientific scenarios may prove very valuable.

Acknowledgements This work has been partly supported by the European Commission ViroLab Project [43] Grant 027446, Polish SPUB-M grant, the AGH grant 11.11.120.777, and ACC CYFRONET-AGH grant 500-08, as well as the Polish national PL-Grid project.

References
1. Mariusz Sterzel, Tomasz Szepieniec, and Daniel Harężlak. Grid Web Portal for Chemists. In EGEE User Forum, Catania, Italy, March 2009. Presentation slides.
2. Mariusz Sterzel and Tomasz Szepieniec. Enabling Commercial Chemical Software on EGEE Grid – Gaussian VO. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop - CGW'06, October 2006, Krakow, Poland, 2007. ACC-Cyfronet AGH.
3. PMA Sloot, P.V. Coveney, G. Ertaylan, V. Müller, CA Boucher, and M. Bubak. HIV decision support: from molecule to man. Philosophical Transactions A, 367(1898):2691, 2009.
4. Marian Bubak et al. Virtual Laboratory for Collaborative Applications. In: M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
5. Mehran Ahsant, Esteban Talavera Gonzalez, and Jim Basney. Security Credential Mapping in Grids. In 2009 International Conference on Availability, Reliability and Security, pages 481–486, 2009.
6. C. Yaodong, W. Lu, L. Aigui, and C. Gang. Sharing LCG files across different platforms. In Journal of Physics: Conference Series, volume 119, page 062024. Institute of Physics Publishing, 2008.
7. Matthias Assel, David van de Vijver, Pieter Libin, Kristof Theys, Daniel Harezlak, Breanndann O Nuallain, Piotr Nowakowski, Marian Bubak, Anne-Mieke Vandamme, Stijn Imbrechts, Raphael Sangeda, Tao Jiang, Dineke Frentz, and Peter Sloot. A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides, Martin Hofmann-Apitius, Mathias Fredigmann, Sebastian Claudius Semler, Yannick Legre, and Mary Kratz, editors, Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51–61.
8. M. Malawski, T. Bartynski, and M. Bubak. Invocation of operations from script-based grid applications. Future Generation Computer Systems, In Press, Accepted Manuscript, 2009. Available: http://dx.doi.org/10.1016/j.future.2009.05.012

[Figure: architecture diagram showing the Experiment Planning Environment (EPE), GSEngine with DAC2 and the DACConnector, the Data Source Registry (DSR) with the LFC DS plug-in, ShibConnectivity and ShibRPC commands, the LFC DS client library, the LFC DS server with temporary storage and a gLite UI, and the EGEE/WLCG storage back-ends (LFC server, LCG Disk Pool Manager, dCache, CASTOR), together with a sample GScript accessing data through the LFC DS connector.]
Figure 1: Conceptual view of our solution together with a sample script accessing data and operating on LFC catalogue