Prototype of a Portal for Scientific Data Access Based on Scala

MASTER’S THESIS Prototype of a Portal for Scientific Data Access Based on Scala carried out at Course Programme Information Technologies and IT Marketing By: Florian Topf PID: 1210320012 Graz,onDecember16,2013 ............................. Florian Topf Declaration of Authorship I hereby declare, that I have written this thesis without any help from others and without the use of documents and aids other than those stated, that I have mentioned all used sources and that I have cited them correctly according to established academic citation rules. .................................... Florian Topf i Ehrenwörtliche Erklärung Ich erkläre ehrenwörtlich, dass ich die vorliegende Arbeit selbstständig und ohne fremde Hilfe verfasst, andere als die angegebenen Quellen nicht benützt und die benutzten Quellen wörtlich zitiert sowie inhaltlich entnommene Stellen als solche kenntlich gemacht habe. .................................... Florian Topf ii Acknowledgements The RTD work described in this thesis was accomplished within the EU FP7-SPACE project IMPEx (Integrated Medium for Planetary Exploration, project number 262863). The author is thankful to all contributions and suggestions made to the IMPEx portal project by the participating data providers in particular with regard to the establish- ment of web service interfaces and metadata trees. The overall process of elaborating this thesis and the IMPEx portal subsequently was supervised by the IMPEx project manager DI Tarek Al-Ubaidi from the IWF (Institut für Weltraumforschung/Space Research Institute) of the Austrian Academy of Sci- ences in Graz. Additionally Prof. DI Manfred Steyer gave guideance with his didactic role at the Campus 02 and as supervisor of this thesis. The author would also like to thank Martin Odersky and his team for the excellent documentation of the programming language Scala and related libraries provided at the dedicated website1 which in particular supported the preliminary development of the IMPEx portal described in this thesis. Florian Topf Graz, on December 16, 2013 1Scala language website: http://www.scala-lang.org/ iii Abstract Recent studies by the author have shown that the “Integrated Medium for Planetary Exploration” (IMPEx) FP7-SPACE project is evolving towards becoming a fully oper- ational Virtual Observatory (VO). The service-oriented architecture provides state-of- the-art web services and a robust XML data model which enables the exploitation of observational- and simulated data collated from research performed within planetary plasma- and magnetospheric physics. The author shows in this thesis primarily that the elaborated infrastructure of IMPEx is extendable with new services, complying with the IMPEx data model and proto- col. Secondly, the author focuses on a recently revived software paradigm: functional programming. This paradigm originates from initial attempts to develop a general language for eval- uating mathematical expressions: lambda calculus. This thesis examines the features provided by this formalism and how they are transformed into a modern programming language. It is shown using the example of the object-functional language Scala how functional aspects are merged into a wider spread paradigm in software development: object-oriented programming. The result of this thesis is a centralised portal for IMPEx which enables the usage of IMPEx web services with frameworks available in the scope of Scala. It is shown how functional aspects support distributed and scalable software with their concise syntax and semantics, as well as handling of XML structured documents. It is concluded that Scala can be considered a new base technology for the development of more integrated simulation environments enabling complex scientific workflows. iv Kurzfassung Vorangegangene Studien des Autors haben gezeigt, dass sich das FP7-SPACE Projekt “Integrated Medium for Planetary Exploration” (IMPEx) in Richtung eines vollstän- dig definierten Virtuellen Observatorium (VO) entwickelt. Die serviceorientierte Ar- chitektur stellt state-of-the art Webservices und ein robustes XML Datenmodell zur Verfügung, welche die Nutzung von Observations- und Simulationsdaten aus For- schungsergebnissen im Bereich der Plasma- und Magnetosphärenphysik ermöglicht. Der Autor zeigt in dieser Arbeit primär wie diese erarbeitete Infrastruktur mit neu- en Services erweiterbar ist, die ebenfalls mit dem IMPEx Datenmodell und Protokoll konform sind. Sekundär beschäftigt sich der Autor mit einem kürzlich wiederbeleb- ten Softwareparadigma: Funktionale Programmierung. Dieses Paradigma hat ihren Usprung in den ersten Versuchen eine allgemeine Spra- che für die Evaluierung von mathematischen Ausdrücken zu entwickeln: das Lambda- Kalkül. Diese Arbeit untersucht die Eigenschaften dieses Formalismus und zeigt auf, wie diese in eine moderne Programmiersprache transformiert werden. Es wird am Beispiel der objekt-funktionalen Sprache Scala gezeigt wie funktionale Aspekte mit einem weitaus mehr verbreiteten Paradigma verschmolzen werden können: Objekt- orientierte Programmierung. Das Ergebnis dieser Arbeit ist ein zentralisiertes IMPEx Portal, welches die Verwen- dung von IMPEx Webservices mit Hilfe der entsprechenden Frameworks in Scala er- möglicht. Es wird gezeigt, wie funktionale Aspekte mit deren kompakten Syntax und Semantik die Entwicklung von verteilter und skalierbarer Software unterstützen sowie die Handhabung von XML Dokumenten vereinfachen. Es wird zum Schluss ge- kommen, dass Scala als neue Basistechnologie für die Entwicklung von integrierten Simulationsumgebungen gesehen werden kann, welche in weiterer Folge komplexe wissenschaftliche Workflows ermöglichen. v Contents 1. Introduction 2 1.1. PresentStatusandMotivation. ... 2 1.2. ConceptualFormulation . 4 1.3. ObjectivesofthisThesis . .. 4 1.4. ApproachandMethodology. 5 1.5. CompositionofthisThesis. .. 5 1.6. WorkdonebytheAuthor .......................... 6 2. Elements of Functional Programming 7 2.1. LambdaCalculusandtheTuringMachine . ... 8 2.1.1. Lambda-Expressions . 8 2.1.2. Lambda-Conversion . 10 2.1.3. Lambda-Definability . 11 2.2. FunctionalStyle ................................ 13 2.3. ExpressionsandTypes . 15 2.3.1. PrimitiveTypes ............................ 17 2.3.2. Polymorphism ............................ 17 2.4. EvaluationStrategies . .. 18 2.4.1. Call-By-Name............................. 18 2.4.2. Call-By-Value ............................. 19 2.5. DataStructures ................................ 20 2.5.1. Tuples ................................. 20 2.5.2. Lists .................................. 21 2.5.3. Trees .................................. 22 2.5.4. Arrays ................................. 23 2.5.5. AbstractDataTypes . 24 2.6. AdvancedFunctionalFeatures . .. 25 3. Application to Object-oriented Programming 28 3.1. TheScalaProgrammingLanguage . 29 3.1.1. FunctionsandEvaluation . 30 3.1.2. AbstractionsandClasses . 32 3.1.3. TypesandPolymorphisms . 35 3.1.4. CollectionsandOperations . 39 3.2. FunctionalJavaScript . .. 44 4. Evaluation of Object-functional Capabilities 48 4.1. DomainSpecificLanguagesandParsers . ... 49 vi Contents 4.2. ParallelandConcurrentProgramming . .... 54 4.2.1. ActorsandtheAkkaframework . 55 4.2.2. ScalaFuturesandUseCases . 57 4.3. Side-effectsandBasicI/O . .. 58 4.4. ScalaWebFrameworks............................ 62 5. The IMPEx Portal 68 5.1. PurposeandAims .............................. 69 5.2. UserRequirements .............................. 71 5.2.1. SearchCapabilities . 72 5.2.2. VisualisationCapabilities . .. 75 5.2.3. CommunicationCapabilities . 77 5.2.4. ProcessingCapabilities. 79 5.2.5. UserInterfaceRequirements . 81 5.2.6. BackendandAdministrationfeatures . .. 84 5.3. ArchitecturalDesign . 86 5.3.1. OverallArchitecture . 86 5.3.2. TheIMPExRegistry . .. .. .. .. .. .. .. 91 5.3.3. TheIMPExMessagingAPI . 96 6. Results and Discussion 102 7. Conclusions and Outlook 106 A. IMPEx Configuration 109 B. IMPEx Portal Map 112 C. IMPEx Portal Main Classes 120 Glossary 122 List of Figures 126 List of Tables 127 Listings 128 Bibliography 129 1 1. Introduction The EU FP7-SPACE project “Integrated Medium for Planetary Exploration” (IMPEx) is currently evolving towards a standardised Virtual Observatory (VO) and is reach- ing a major milestone in development of distinct services and databases which were specified in Topf (2012a). After a challenging process of defining homogenous web service interfaces and elaborating a robust data model for specific semantics, the project is now entering its first public review phase. This thesis is a product of this phase as it is discovering new possibilities to provide the actual IMPEx services to a wider audience for scientific and educational purposes through one single entry point, the IMPEx portal. 1.1. Present Status and Motivation Currently the IMPEx project is in its implementation phase after definition of the IMPEx architecture which is comprised of three distinct service layers according to Topf (2012a, pp. 25): the user layer, the tool layer and the resource layer. The basis of the IMPEx infrastructure is formed by the resource layer which provides access to stateless data services through metadata registries for observational and simulation data. The data services are comprised of searchable XML trees compliant with the IMPEx data model described in Topf (2012a, pp. 30). These metadata files are accessible through the database endpoints described in the IMPEx configuration file (see appendix A). On top of the resource layer all database endpoints

Load more