PROVENANCE SUPPORT FOR SERVICE-BASED INFRASTRUCTURE

Shrija Rajbhandari

School of Computer Science, Cardiff University

This thesis is submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

April 2007

Declaration

This work has not previously been accepted in substance for any degree and is not concurrently submitted in candidature for any degree.

Signed (candidate) Date

STATEMENT 1

This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD.

Signed (candidate) Date

STATEMENT 2

This thesis is the result of my own investigations, except where otherwise stated. Other sources are acknowledged by explicit references.

Signed (candidate) Date

STATEMENT 3

I hereby give consent for my thesis, if accepted, to be available for photocopying and for interlibrary loan, and for the title and summary to be made available to outside organisations.

Signed (candidate) Date

Abstract

Service-based architectures represent the next evolutionary step in the development of e-science, namely, the transformation of the Internet from a commercial marketplace to a mechanism for sharing multidisciplinary scientific resources. Although scientists in many disciplines have become increasingly reliant on distributed computing technologies for data processing and dissemination, the record of the processing history and origin of a data product, that is its data provenance, is often nonexistent, incomplete or impossible to recover by potential users. This thesis aims to address data provenance issues in service-based environments, particularly to answer how a scientist who performs a workflow execution in such an environment can (1) document the data provenance for a data item created by the execution, and (2) use the provenance documentation as a recipe to re-execute the workflow. This thesis proposes a provenance model for delivering data provenance support in a service-based environment. Through the use of an example scenario of a scientific workflow in the Astrophysics domain, we explore and identify components of the provenance model.

The provenance model proposes a technique to collect and record data provenance for service-based workflow executions. The technique facilitates the collection of data provenance of a workflow execution at runtime. In order to record the collected data provenance, the thesis also proposes a specification to represent provenance that describes the processing history whereby a piece of data was derived. The thesis also proposes query interfaces that allow recorded provenance to be queried, formulates a technique to construct provenance graphs, and supports the re-execution of past workflows. The provenance representation specification, the collection technique, and the query interfaces have been used to implement a prototype system to demonstrate the proposed model. The thesis also experimentally evaluates the scalability of the components implemented.

Contents

Title Page
Declaration
Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Research Objectives and Approach
  1.3 Research Contributions
  1.4 Outline of the Thesis

2 Literature Review
  2.1 Introduction
  2.2 Provenance in Query-based Data Processing Systems
  2.3 Provenance in Domain-Specific Applications
  2.4 Provenance Middleware and Provenance in Other Application Systems
  2.5 Granularity of Provenance
  2.6 Use and Benefits of Provenance
    2.6.1 Data Quality Benefits
    2.6.2 Data Processing Benefits
  2.7 Service Oriented Architecture and Provenance
    2.7.1 Identifying Specific Tasks in a Provenance System
    2.7.2 Provenance Web Services
  2.8 Summary

3 Provenance Model for a Web Process
  3.1 Introduction
    3.1.1 Architecture Contributions
  3.2 An Example Scenario
  3.3 Provenance Model
    3.3.1 Identifying and Representing Provenance
    3.3.2 Capturing and Recording Provenance
    3.3.3 Provenance Querying and Reasoning
  3.4 Summary

4 Provenance Representation and Capture in SOAs
  4.1 Introduction
  4.2 Provenance Modelling
    4.2.1 Identifying the service instance of a process
    4.2.2 Identifying process-Provenance
    4.2.3 Identifying service-Provenance
    4.2.4 Identifying Data Link
    4.2.5 Provenance Format (p-format)
  4.3 Provenance Collection Service
    4.3.1 Provenance Recording Interface
  4.4 Summary

5 Provenance Querying and Analysis Tool
  5.1 Introduction
  5.2 Process Provenance Query Interface
    5.2.1 Process Provenance Graph Construction
    5.2.2 Process Re-Execution
  5.3 Provenance Reasoning Query Interface
    5.3.1 Query Data Command
    5.3.2 Query Data Filter
  5.4 Summary

6 Provenance Prototype: Implementation
  6.1 Introduction
  6.2 Architecture of the Provenance Collection Service
  6.3 Interface Implementation of the Provenance Query Service
  6.4 Summary

7 Evaluation
  7.1 Introduction
  7.2 Scalability of the Provenance Collection Service
    7.2.1 Summary of Setup
  7.3 Evaluation of the Provenance Query Service
    7.3.1 Setup Summary
    7.3.2 Workflow Re-execution by Simultaneous Clients
  7.4 Summary

8 Future Work and Conclusions
  8.1 Research Summary
  8.2 Research Contributions
  8.3 Research Directions
  8.4 Research Publications

Bibliography

A RDFS of Provenance Format

B service-Provenance XML Schema

C BPEL Workflow

List of Figures

1.1 An example of Bioclimatic Modelling workflow from [60]
1.2 The Living Document Concept

2.1 Architecture of Provenance Web Services

3.1 Simple Scenario Example
3.2 Provenance Model
3.3 Interaction between Engine and Web services
3.4 Expressed data flow of services
3.5 Interactions between Components
3.6 Interactions between Query Components

4.1 Data model mapped to RDF statements of subject-predicate-object form
4.2 RDFS data model
4.3 Model for identifying service instance of a process
4.4 Model for an abstract process
4.5 Model for identifying constituents of a service instance
4.6 Model for service activity
4.7 Model for Service Activity
4.8 Model for the I/O messages
4.9 Model to identify the flow of data
4.10 Model to identify input data
4.11 Model to identify output data
4.12 Asynchronous data collection for process execution
4.13 Service instance given by an RDF triple
4.14 RDF instance representing the p-format for sample process PR1
4.15 Representation of hasDataLink property in the p-format for sample process PR1

5.1 Representation of data links of service instances within a process instance by identifying each input and output for a service instance with its source and target, respectively, and the data IDs
5.2 Provenance Reasoning Query
5.3 P-Format Source Model
5.4 Query Data Command Model
5.5 Query 1
5.6 Query 1 result
5.7 Query 2
5.8 Query 2 result

6.1 Architecture of Prototype Implementation of the PCS
6.2 Interface for Process Invocation and Provenance Submission in the PCS
6.3 Architecture of the Prototype Implementation of the PQS
6.4 Interface in the PQS for querying process IDs displaying ten results at a time
6.5 Interface to re-execute a process

7.1 The Web services and workflow used to generate data provenance for the experiments. The rectangular boxes denote the Web services implemented to demonstrate the Astrophysics Example Workflow from the example scenario presented in Chapter 4. The arrows denote the dataflow occurring in the workflow
7.2 Setup of the components for Single Service
7.3 Setup of the components for Composite Service
7.4 Single Service Recording: Invocation Computation Time
7.5 Single Service Recording: Memory Usage
7.6 Single Service Recording: Approximate Time taken by PCS
7.7 Single Service Recording in : Invocation Computation Time
7.8 Composite Service Recording: Invocation Computation Time
7.9 Composite Service Recording: Memory Usage
7.10 Composite Service Recording: Invocation Computation Time
7.11 Setup and flow of data for the queries
7.12 Average query response time for two types of query as the retrieved result set increases
7.13 Query result file size for two types of query as the retrieved result set increases
7.14 Average query response time per client as the number of concurrent clients performing the two kinds of query increases
7.15 Total response time for all the clients as the number of concurrent clients performing the two kinds of query increases
7.16 Average re-execution response time as the number of simultaneous clients re-executing a past workflow increases

List of Tables

2.1 Major Provenance Systems
2.2 Provenance System Evaluation against a set of Criteria

7.1 Input parameters used for each DustCloud service invocation and the corresponding service-Provenance instance generated in a file which has varying output data size
7.2 Input parameters used (for DustCloud and Telescope services) for each complex Astrophysics Workflow invocation and the corresponding service-Provenance data generated by the Provenance Collector as an XML log file

Acknowledgments

I thank my first supervisor, Professor David W. Walker, for his help and advice throughout the PhD process. I would also like to thank him for his insightful comments and feedback, and for always being wonderfully positive about my work. He has always been there whenever I needed any help. I am very grateful to him for reading my thesis many times, despite his many professional commitments. I could not have completed this thesis without his help.

I would like to convey my most special gratitude to my second supervisor, Professor W. A. Gray. He created an opportunity for me to do a PhD in the School of Computer Science, Cardiff University. I want to thank him for the opportunity, because I could not have achieved it without his support and leadership.

I thank Dr. Omer Rana for being incredibly understanding and supportive during my PhD write-up process.

I thank Dr. Ian Taylor for hiring me as a student researcher for the Triana project that allowed me to undertake this research.

I owe many thanks to the administrative and technical staff of the School of Computer Science, Cardiff University, particularly Mrs. Margaret Evans and Mrs. Helen Williams, for their support.

I thank my partner Vikas Deora for helping me and having confidence in me even when I didn't. I would also like to thank him for taking good care of me during difficult times.

Finally, I thank my father Shiva Rajbhandari and my mother Rupa Rajbhandari, and dedicate this achievement to them for their foremost support. Their blessings and encouragement have enabled me to be positive.

Chapter 1

Introduction

1.1 Motivation

Multidisciplinary scientific research is often both data intensive and distributed.

Scientific investigation and processing relies as much on the effective broadcasting of data between dispersed study sites and collaborating groups as on the conclusions of written publications. In recent years, many scientists have become increasingly reliant on distributed computing technologies as an essential part of their everyday research for data processing and dissemination. Thus, the increasing trend towards the sharing and communication of scientific data is evolving with the growing data processing capabilities of computing research environments. This is contributing to an increase in the propagation of data in various scientific disciplines. Increased transparency in access to data has made researchers realize that the essential documentation of derived scientific data shared online is often incomplete or inadequate. This makes electronically published scientific journal articles the only source of annotations or descriptions of the studies or experiments carried out to produce such scientific data.

Although the concept of sharing distributed scientific data and resources amongst geographically distributed groups is not new, the increasing adoption of Service Oriented Architectures (SOA) based on Grid and Web Services [35, 48, 56] makes the vision of automatic discovery, composition, and consumption of distributed resources more realistic, and supports the benefits of Internet standards and common infrastructure to produce optimal efficiencies for intra- and inter-organization computing [54], compared with traditional approaches. SOA technologies have made feasible the use of distributed and heterogeneous resources for scientific disciplines, such as Bioinformatics [32], Astrophysics [75], and Earth Science [98]. Many research scientists make use of such resources in their experimental workflows by using innovative workflow management systems. The concept of workflow, applied to scientific computing, is concerned with the movement of data and the execution of tasks (e.g., on distributed resources) through a work process [106]. This describes how tasks are structured, what their relative execution order is, and how data flows between tasks. An example is a computational experiment performed by a biodiversity scientist that allows the prediction of how species will be distributed under a changing climate [60]. In Figure 1.1, given a set of locality data for a species, a climate preference profile is produced by referring to present-day climate data to produce a “climate envelope”. This is then used with a specific selected Open Modeller (OM) algorithm by interpolating the climatic data at the points of locality of specimens, producing a bioclimatic model. Such distributions projected upon a world map allow the determination of where a conservation priority area should be in the future for that species.

Figure 1.1: An example of Bioclimatic Modelling workflow from [60].

When performing such workflows, scientists may wish to (1) reuse a workflow and data; and (2) at some point share the produced data with their fellow researchers. Usually such data products are published with metadata that includes the format of the data and a description of what the data represents. This helps researchers, as well as others, to discover and access the archived data products. The appropriate use of such derived scientific data, and the reuse of the workflows that generated the data, relies on understanding the origin and the processing history of the data, that is its provenance. Just as a genealogical chart provides documentation that reveals the ancestry of an individual, the provenance of an item describes how it was derived from its source. Data provenance refers to the documentation of the processing history of the executed workflow that led to a particular data product.

A scientific article may be a part of the data provenance since it describes the output data's provenance, for example the methods by which such data has been produced. Provenance can be considered in terms of the input dataset and the resultant data associated with a scientific process. Typically, researchers maintain private records of provenance in file structures and notebooks which are, in many cases, sharable only by other members of the same organization or project.

Although derived data products may be publicly available, the scientific tools, programs or processes, and the source data used, are normally not available for public use. Journal articles may be the only public source of information about the origin of a data product and how a particular result was obtained. We believe that such published documents should be provenance-enabled, to allow scientific journal articles to become partly dynamic. For example, somebody reading an article online would be able to rerun a simulation for which results are presented, possibly with different input parameters, by clicking a button in the provenance-enabled article (see Figure 1.2). The reader of the article can actually use such a capability to re-execute the work, and view and evaluate the results produced on-the-fly, without being aware of how the results have been re-created. We refer to such an interactive document as a “Living Document” [103]. In order to make such a concept a reality, researchers must furnish data products with a special type of metadata that provides an understanding of the processing history that can be used to re-execute the process online.

Such metadata, once composed appropriately, provides a view of the processing chain through which the data was derived, i.e., the data provenance.

This vision of a “Living Document” has motivated the research presented in this dissertation. The aim is to capture, represent, and manage the provenance information for a data product so that the provenance can be used to provide a “recipe” for future experiments relating to that data, and also to enable the re-execution of the scientific processes that originally created the data in a service-oriented framework.

Figure 1.2: The Living Document Concept

1.2 Research Objectives and Approach

The preceding discussion illustrates that data provenance is important information that needs to be retained in any scientific experiment in such a way that it can be used to reproduce the original results. To make the “Living Document” concept concrete, a provenance system that caters to a service-based environment is needed. Thus, the objectives of this thesis include the achievement of the following goals:

• The study of existing provenance systems in various scientific application domains.

• The application of provenance systems to workflow enactments in SOAs.

• The investigation and design of methods to enable (1) automatic collection and recording of data provenance for workflows in an SOA and (2) querying such provenance for re-executions, and the evaluation of the proposed provenance system.

The first general purpose of this thesis consists of bibliographic research into existing provenance systems in different domains. The study of different provenance systems in different application domains, with or without SOA techniques, will provide us with a general view of data provenance and provenance systems. The benefits and use of data provenance need to be identified and analyzed in order to evaluate provenance system requirements. This will encompass how a provenance system would best fit within the conceptual vision of an SOA by presenting a provenance system as a Web Service. This directs us to the main research objectives, as follows:

1. To design a Provenance Model to represent the functionality for: (1) a provenance collection service, and (2) a provenance query service of a provenance system in an SOA.

2. Modelling provenance (i.e., producing a provenance format or p-format) for the structured representation of provenance for workflow executions in an SOA.

3. Use of the provenance collection service component to experiment with automated techniques to collect and record the provenance of derived data from a workflow execution.

4. Investigate methods to browse and query the provenance to be used in the provenance query service for process re-execution and recreation of the processing chain.

5. Perform evaluations of the functionality of the provenance system to better understand its performance and scalability.

In order to meet these objectives, we make use of Web Services, since they give us a suitable framework with which to achieve our purpose of enabling support for data provenance in a service-based distributed environment. The work is presented with an example workflow scenario that represents data analysis and processing in the Astrophysics domain.

1.3 Research Contributions

The main focus of this thesis is to support data provenance by handling the automatic recording of composed Web Services executions, and to query such provenance to recreate and re-execute past processes.

The primary contribution of this thesis is the provenance model, which incorporates provenance support within Web Services. The model consists of two service components: data provenance (1) collection/recording and (2) querying. We identify metadata that incorporates the provenance documentation, also showing that it is possible to represent and record such provenance for a process execution in a way that it can be queried to re-execute past workflows and construct a provenance graph displaying the processing chain of the derived data. We also allow the intermediate data of an executed workflow to be captured to support enhanced querying capabilities. By evaluating our approach, we have provided a scalable and structured way of automatic collection and recording of data provenance that represents the processing history of a process.

1.4 Outline of the Thesis

Chapter 2 details the study of existing provenance-aware systems. In particular, eight provenance systems have been analyzed. The analysis has resulted in the identification of provenance requirements in various types of application processes and domains. The analysis also outlines common benefits arising from the use of data provenance in many application areas. Thus, this chapter first provides us with the descriptions of different provenance systems before formulating our proposed approach. Five of the provenance systems are also evaluated based on a set of criteria (i.e., based on the system's operational model and characteristics). Based on the study carried out, this chapter also presents evaluations to propose a provenance system for an SOA environment. It presents a higher vision of a provenance support framework within an SOA, with the notion of exposing a provenance system as a Web Service that can be consumed.

Chapter 3 describes our proposed provenance model based on an SOA. An example scenario of a scientific workflow in the Astrophysics domain is presented to identify and explain the components of the model in the subsequent chapters. We identify and describe different classifications for the representation of provenance in a way that provides the processing chain that produced a piece of data. This chapter also presents the high-level interactions between the model components for capturing, recording and querying data provenance.

Chapter 4 describes how the different classifications of the representation of provenance in the previous chapter can be modelled, i.e., we define the data structure used to represent the classified types. Based on the modelled structures, a common format is produced that represents the documentation of a process that is recorded in the provenance archive. Using the example scenario presented in chapter 3, a simple process is presented in this chapter to describe the provenance capturing and recording mechanism with use of the presented provenance format.

Chapter 5 describes the querying component of our provenance model. The construction of the provenance graph and the re-execution of past processes are discussed. This chapter also presents some examples of how different provenance questions can be answered with the proposed provenance format.

Chapter 6 presents the software implementation that provides the functionality discussed in the previous chapters. This is followed by chapter 7, which provides an evaluation of the scalability of the implementation of the provenance collection and recording components. Both are illustrated using the enactments of the workflow implementation of the example scenario described in chapter 3.

Finally, chapter 8 concludes this document by restating the main contributions of this thesis and presents proposals for further work. A list of related publications is also included.

Chapter 2

Literature Review

2.1 Introduction

The issue of data provenance is not a new problem, as is evident from the large body of related research in the past few years, which has led to prototype systems that archive and retrieve the provenance of processed data. Provenance-related research in various scientific domains, where the usefulness of provenance is linked to the granularity at which it is collected, has notably increased in recent years [3, 84, 91, 112].

This chapter provides a review and analysis of the importance and use of provenance in various domains and application areas. The criteria for this evaluation are intended to facilitate a coherent understanding of the operations and interactions in the different classes of provenance-enabled models and systems. Table 2.1 summarizes eight major research projects on provenance-enabled systems with their example domains, application type, and provenance applicability. In the literature, some provenance systems are domain-specific; some systems differ in terms of the granularity at which provenance is collected; and others are specific to a particular application architecture, such as database systems and semantic search engines. In reviewing the literature we categorize provenance in terms of:

1. Query-based Systems.

2. Domain-specific Systems.

3. Provenance Middleware.

This chapter provides a detailed discussion of provenance in the context of related work. Furthermore, we discuss provenance granularity and the potential benefits of provenance gathered through the evaluation of different provenance-enabled applica­ tions and systems. The chapter is organized as follows. Sections 2.2, 2.3 and 2.4 discuss the relevant research that falls into the above three categories, respectively.

Section 2.6 outlines the importance of provenance and its use in fight of the literature discussion. Section 2.7 presents a brief introduction to SOA and a comparative eval­ uation of some of the relevant provenance systems in terms of their operational model and characteristics. This survey highlights the finding that most current provenance systems that cater to the SOA framework are either representative of a particular domain application or inadequate in essential system characteristics. Section 2.7.1 discusses the key areas of development necessary for the establishment of a prove­ nance system in the context of emerging technologies and standards for an SOA environment. In particular, the entities involved in the development of a provenance system, and the high level interactions between them, are identified. Chapter 2: Literature Review 12

System              | Domain           | Application and Process Type         | Provenance Use and Benefits
Chimera [1]         | Astrophysics     | Service-based workflow enactment     | Audit trail and data replication
CMCS [2]            | Chemical Science | Informatics-based chemistry research | Information about data products and updates
ES3 [44]            | Earth Science    | Script-based workflow enactment      | Data lineage information
myGrid [3]          | Bioinformatics   | Service-based workflow enactment     | Re-enactment of workflows and updates
PASOA [4]           | General          | Service-based workflow enactment     | Data lineage information based on asserted causality relationships
Inference Web [104] | General          | Query-based information retrieval    | A form of justification and placing some degree of trust in the results
Tioga [11]          | General          | Database query                       | Weak inversion to investigate faulty and anomalous data
Trio [105]          | Earth Science    | Database query                       | Update propagation and efficient warehouse recovery

Table 2.1: Major Provenance Systems

2.2 Provenance in Query-based Data Processing Systems

Data processing refers to the means by which processes consume and manipulate data sources in order to bring about the transformation of the data product. Provenance information for a data product revolves around two main concepts: the original data sources, and the transformations that they underwent to generate the data product. In any systems architecture, data is always being consumed, processed, transformed and copied. In query-based systems, provenance issues are focused on tracing the provenance of data items where the data item is produced or retrieved by querying data archives. The use of provenance in two different query-based systems will now be summarized.

Provenance in Database Systems

The purpose of lineage information about a data item is to find the source data that produced it [108]. Provenance in database systems focuses on the problem of data lineage. The problem of data lineage for a given data item can be summarized as determining the original source data items from which the data item was derived, and the processes by which it was produced [40]. The data lineage problem is relevant in data warehousing systems, where the source data goes through a series of transformation steps during analysis and mining to perform data integration, and is modelled as queries over multiple data sources. Cui and Widom [39] focus on tracing the lineage information of materialized data views and general transformations of data items in data warehouses stored in relational databases. A materialized view assembles or represents data contained in other database tables and views but, unlike a database view, it contains actual data. In a data warehousing system, since the remote data is cached at the warehouse, the remote data appears as local to the users of the warehouse. Queries are written in terms of materialized views to perform analysis on the warehouse data. For example, consider an analyst wanting to build a warehouse table listing computer products that had a significant sales jump in the last quarter. For this, a complex query needs to be executed on specific warehouse data derived from, for example, Product and Order source tables. That is, the warehouse defines a materialized view Sales Jump, where the view definition is expressed as an SQL query. Using the source tables as inputs, the query performs a series of transformations to produce the Sales Jump table, i.e., the result table. Cui and Widom's work revolves around developing lineage tracing algorithms that can automatically determine the source data from tuples in the result table (a tuple is a row in the table, e.g., a row for a product “Sony VAIO laptop” in the Sales Jump table). They propose to do this by storing additional relevant information from the query and using it for lineage tracing, i.e., the lineage tracing algorithm uses stored information to execute lineage tracing queries. Thus, the work focuses on a data item lineage derivation where view tuples of interest can be traced down to the sources, e.g., tuples and tables [40]. These lineage tracing algorithms are integrated as a part of the Trio [105] data model, an extended relational DBMS.

Apart from lineage tracing of data items to their source, some important aspects of fine-grained lineage tracing are described by Woodruff and Stonebraker [108]. Through fine-grained lineage tracing, their research aims to provide a capability for scientists to identify and investigate the source of errors and anomalies in data items. They particularly address the problem of tracing the origin of a single element in large arrays of data that went through a series of transformations, for example, to know which pixels of which images were used to construct a given image. Woodruff and Stonebraker proposed methods that are performed within a DBMS. This type of fine-grained lineage processing has been implemented in the Tioga [11] database visualization tool, which is built on top of the POSTGRES DBMS with a “drag and drop” approach for programming and managing databases. To enable fine-grained lineage tracing, all the user-defined algorithms and their additional functions are registered and stored in Tioga. The additional functions registered are the weak inversion and verification functions that are processed within the DBMS to execute lineage queries. An inversion approach similar to that described in [40] is adopted, and is termed the weak inversion technique for lineage tracing. A weak inversion technique is intended to provide an “approximate lineage” because not all the functions and algorithms are completely invertible. This implies that the weak inversion function provides a flawed but still useful mapping from the output of a given function to the database element input. For functions that cannot be inverted without referring to input data, a verification function with access to the input data for the original function is introduced, which is applied to the output data for further refinement and mapping.
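The general shape of this technique can be illustrated with a sketch in which plain functions stand in for the weak inversion and verification functions that Tioga registers inside the DBMS; the downsampling operation is an invented example, not one from the Tioga work.

def downsample(image):
    # The forward function f: each output pixel averages a 2x2 input block.
    return [[sum(image[2*i + di][2*j + dj]
                 for di in (0, 1) for dj in (0, 1)) / 4.0
             for j in range(len(image[0]) // 2)]
            for i in range(len(image) // 2)]

def weak_inverse(i, j):
    # Weak inversion f*: maps an output pixel to the (approximate, possibly
    # over-estimated) set of input pixels that could have contributed to it.
    return [(2*i + di, 2*j + dj) for di in (0, 1) for dj in (0, 1)]

def verify(image, out_value, candidates, tol=1e-9):
    # Verification function: refines the estimate by consulting the input
    # data, discarding candidates inconsistent with the observed output.
    block = [image[r][c] for r, c in candidates]
    return candidates if abs(sum(block) / len(block) - out_value) < tol else []

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
output = downsample(image)
candidates = weak_inverse(1, 0)                 # lineage of output pixel (1, 0)
print(verify(image, output[1][0], candidates))  # [(2, 0), (2, 1), (3, 0), (3, 1)]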

Buneman et al. provide an assessment of the issues of data provenance and annotation in shared and distributed scientific databases [29]. Scientific databases are usually “curated” by adding annotations, classifications, and error correction through human intervention. Some databases may contain data items that could be copies of some source database or created by processing the source database. These data sources are likely to be frequently updated. Buneman and his co-authors argue that in this case, the curated source database could not notify other databases when updates to the source occur, nor could the databases that are copied or created from the source be aware of any updates made to the source. Both the source and the copied or created databases lack the ability to track the change histories, provide annotation support, and broadcast such information across all records that are related in some way. Provenance and annotations are necessary in an environment in which data are repeatedly copied and transformed, so the processed data items can be tracked back to the curated source databases, as well as to propagate the annotations on the derived data back to the source database [31]. In [29], the authors argue that such shared scientific databases that change over time require coordination between the interacting databases to maintain data consistency and traceability.

Buneman et al. [30] have defined two terms: “Why-Provenance” and “Where-Provenance”. Why-Provenance is the data lineage that provides the reason a particular data item was generated, and specifies what part of the database contributed to its creation, i.e., a set of tuples, and why the source data is in the database. Where-Provenance is the location of the source data items that created the item of data. Buneman et al. propose a deterministic data model that allows the unique identification of a piece of data through a path [27]. This model provides an explicit notion of location that helps to describe the where-provenance. The authors have presented research issues and limitations of provenance in current DBMS techniques, and discuss them in terms of scientific databases in application domains such as molecular biology, linguistics, and ecology, where the provenance problem is a challenge.

Provenance in the Semantic Web

Provenance, as it relates to the Web, refers to the explanation of the information returned by a web application. The users are unaware of the sources from which the information is derived. The provenance answers questions such as: what sources were used, when were they updated, was the information derived and, if so, how was it derived, and can the user rely on the sources? This type of information is needed to understand and trust the information [70].

McGuinness and Silva [71] point out the lack of support for provenance in the current Semantic Web, and introduce the Inference Web (IW), which addresses the problem of provenance by providing explanations of the answers queried over the Semantic Web. The Inference Web aims to provide an infrastructure by which the query answers are made transparent, with explanations that describe the path that derived the answers, as users may obtain query answers from systems that manipulate data and derive information that was implicit rather than explicit. [71] provides various sets of requirements for the development of the IW infrastructure, and [70] defines users of the IW, such as retrieval engines and hybrid programs such as crawlers, merging ontologies and combining knowledge-based systems. The Inference Web provides a Proof Markup Language (PML), based on the OWL specification, to represent knowledge provenance or meta-information of the different sources used to derive a query answer and its derivation history [72]. A web-based registry for storing, manipulating and returning such knowledge provenance and derivation history, used to enhance explanations, is also presented, and provides a set of APIs to convert the long and complex PML to a short, understandable explanation. A Semantic Web based inference or search engine is used in support of the Inference Web that presents the knowledge provenance and derivation history as proofs and explanations of any responses to queries.

McGuinness and co-workers have extended their work on the Inference Web towards the notion of trust and justification of the answers retrieved from the Web. They argue that the source meta-information given with the retrieved answer, which is used to provide explanations, may not be enough to explain the processing steps that generated the answer. In this case, the importance of the reasoning process used to generate the answer, termed the knowledge process information, was recognized [73]. Providing the answer with optimal additional information about the sources that were used, the basis on which the answers were selected, and the process used to generate the answers would be essential for the user to trust the answers. A trust infrastructure, IWTrust, is introduced in [110], where the authors discuss how trust values of the sources, together with the meta-information, can provide a better justification and trustworthiness of the answers generated on the Web.

2.3 Provenance in Domain-Specific Applications

Many research projects focusing on provenance seek to improve scientific collaboration by means of computing environments that capture a generic experiment. Such provenance research usually caters to a specific domain and application system. In the early 1990s, some of the major work on provenance was in the area of Geographic Information Systems (GIS). Lanter [65] contributed to the design of a meta-database for recording GIS procedures and retrieving the lineage of data products within GIS applications. Knowing the quality of a result dataset is critical in GIS. This can be determined via lineage tracing to the source dataset that was used to derive it [66]. This helps GIS users to determine the fitness for use of the data for their application. The most important GIS operation is the overlay of different spatial data sets (e.g., stored in a database system) to produce a new data set. For example, a biologist might want to determine what variables affect the population of dolphins [79]. A GIS enables information to be associated with a feature on a map and the creation of new relationships that can determine the suitability of various sites for development, evaluate environmental impacts, and so on. Lanter's Lineage Information Program (LIP) provides lineage tracing for GIS operations in Arc/Info (where ARC handles where the features are on a map, while the INFO component handles the feature descriptions and how each feature is related to others [45]) by examining user input at the command line and from a graphical user interface. Lanter also explored the use of data lineage to optimize the size of spatial databases [66], compare spatial analyses of GIS applications [67], and examine the propagation of error through GIS applications [64]. Another system that incorporates lineage tracing for GIS processes is the Geo-Opera workflow-based system [13]. In Geo-Opera, the data files and transformations (GIS programs or scripts) are distributed and reside outside the system. Such transformations, or external objects, are registered in the Geo-Opera system before they are executed, and tracked as task objects. The relationships between internal task objects are obtained by using control flow connectors to set the order of execution. The system provides lineage recording by using data attributes to point to the latest inputs/outputs of a data transformation.

The Earth System Science Workbench (ESSW) [50] project, now called Earth Science System Server (ES3), proposes a data storage and management infrastructure which allows researchers to publish their large data sets from environmental models and global satellite-derived image data [51]. The workbench provides a framework for defining and collecting earth science metadata which is based on a conceptual core composed of science objects. These are processes, processing steps, science models, inputs and outputs. In ES3, the meta-model is for tracking the documented processing history of experiments or workflows (referred to as lineage metadata). The project has built a client-server application called Lab Notebook that stores the lineage metadata for experiment steps and their associated science objects. A scientist needs to define metadata templates, formed of DTDs that define XML instance documents, to publish science objects, which are specific to ES3's fixed set of science objects.

In ES3 the processing of data products, and the metadata and lineage recording, is based on scripts. ES3 depends on the script writer to use the templates and libraries to record the metadata for workflow runs. The data products are produced via the scripts that transform the input data (i.e., binary files), where the input data products and scripts are referred to as software objects. Each software object has a uniquely identifiable metadata object that contains the details about the software object. The metadata objects exist separately from the software objects, so that the same metadata objects can be referred to for different workflow invocations that use the same software objects [23]. Thus, a metadata object corresponds to each software object in a workflow invocation, and the invocation produces a new data product for which a metadata object is also created. The metadata objects about a workflow invocation are recorded in an XML format in such a way that the lineage of a particular data item can be traced through the parent-child relationships of metadata objects.
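The parent-child traversal that such records enable can be sketched as follows. ES3 stores metadata objects as XML; the Python structures, identifiers, and field names below are invented stand-ins for illustration only.

# Hypothetical metadata objects: each records the software object it
# describes and the metadata objects of the inputs that produced it.
metadata_objects = {
    "md-raw":    {"object": "satellite_image.bin", "parents": []},
    "md-script": {"object": "calibrate.pl",        "parents": []},
    "md-cal":    {"object": "calibrated.bin",      "parents": ["md-raw", "md-script"]},
    "md-map":    {"object": "land_cover_map.bin",  "parents": ["md-cal"]},
}

def lineage(md_id):
    # Walk parent links to recover the full derivation history of an object.
    seen, stack = [], [md_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(metadata_objects[current]["parents"])
    return [metadata_objects[m]["object"] for m in seen]

print(lineage("md-map"))
# ['land_cover_map.bin', 'calibrated.bin', 'calibrate.pl', 'satellite_image.bin']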

In [33], a scientific resource management (SRM) architecture is proposed for managing the distributed scientific resource metadata for an environmental system. Its aim is mainly to publish scientific data and programs, thereby making them available on the web. The experiments, or the scientific workflows carried out using these programs, and the data also need to be registered in the “experiments database”. The architecture is currently considering the use of Web Services: publishing the programs as Web Services and capturing experiment provenance by keeping track of SOAP messages. The Scientific Publication Model (SPM) [34] is the meta-model (schema) behind the SRM architecture, and is used to provide a semantic representation of scientific resources (data, programs), describing the programs and the associated theory behind them as “the model”.

The Collaboratory for Multi-scale Chemical Science (CMCS) project provides a multi-scale informatics toolkit that focuses on “on-demand” metadata creation to support the collaborative management of data, metadata, and data relationships. Generic tools have been developed in the CMCS project to view and browse provenance relationships, and to use them for scoped notifications and searches. CMCS uses the Scientific Annotation Middleware (SAM) (see section 2.4) project tools for provenance management. There is no facility for the automated collection of provenance from workflow execution in CMCS. The provenance, or lineage, is collected via DAV-aware applications in the workflow, or entered manually by the scientists through a web interface [74], and stored in the SAM repository. Provenance properties can be queried from SAM using generic WebDAV [7] clients.

The myGrid project provides middleware application tools to support in silico experiments in the biology domain, modelled as workflows in a Web Service environment [3]. In silico experiments use databases and computational procedures, rather than laboratory experiments. The middleware developed in myGrid is a set of bioinformatics-specific scientific services that provide for data and computational analysis. myGrid includes resource discovery, semantic descriptions of services, workflow enactment, and metadata and provenance management, thereby enabling the execution of complex bioinformatics computations in a service-oriented environment, and also addressing the semantic complexity of the domain. myGrid workflows are written in an XML-based language called XScufl, and executed using the open source Freefluo/Taverna workflow engine [111].

2.4 Provenance Middleware and Provenance in Other Application Systems

Chimera [49] is the GriPhyN Virtual Data System (VDS) that allows virtual data products and procedures to be described, represented and discovered. Chimera supports the capture and reuse of the lineage of derived data (“virtual data”) produced by computations. It captures the lineage in the form of derivation steps for datasets, and uses it for audit tracing, the comparison of datasets, and also to manage the automatic and on-demand re-derivation of derived datasets. Chimera supports data-intensive scientific analysis such as high energy physics simulations (for example, the Compact Muon Solenoid experiment at CERN), the search for galaxy clusters in the Sloan Digital Sky Survey [17], and genome analysis [91]. A set of web interfaces is provided in [112] to interact with the Virtual Data System to query, reuse, share, and trace the lineage of data products. This is being applied in a large collaborative learning project called QuarkNet [19].

The VDS architecture is based on the Chimera Virtual Data Schema, where transformation elements describe programs, and the arguments describe data input/output. It presents a high-level language, the Virtual Data Language (VDL), that supports data definitions and query statements (for databases) for constructing workflows as directed acyclic graphs (DAGs), which, when executed on a Data Grid, create a specific data product [49]. The VDL workflows are stored in a Virtual Data Catalog (VDC). The invocations of these workflow procedures are also recorded in the VDC with relevant runtime information about the process, and contain an annotation schema to represent the provenance. The VDL is also used to query the VDC to discover the lineage or computational pipeline that created a particular data object. The main purpose of maintaining such a description is for tracking how the data product was created, and to recreate the data product by recreating the DAG of distributed computations that can then be submitted to the Grid.
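The essence of this re-derivation, recreating a DAG of computations from recorded derivation steps and replaying it, can be sketched as follows. The derivation records are hypothetical; VDL itself is a declarative language, and in VDS the reconstructed DAG is submitted to the Grid rather than executed locally.

# Hypothetical derivation records: each derived dataset names the
# transformation that produced it and the datasets it consumed.
derivations = {
    "clusters.dat": {"transform": "find_clusters", "inputs": ["sky_survey.dat"]},
    "plot.png":     {"transform": "render_plot",   "inputs": ["clusters.dat"]},
}

def replay(target, run):
    # Re-derive `target` by first re-deriving its inputs (depth-first),
    # mirroring the reconstruction of the derivation DAG for submission.
    # (A real system would memoize datasets that are already re-derived.)
    record = derivations.get(target)
    if record is None:          # a raw input: nothing to recreate
        return
    for inp in record["inputs"]:
        replay(inp, run)
    run(record["transform"], record["inputs"], target)

replay("plot.png", lambda t, ins, out: print(f"{t}({ins}) -> {out}"))
# find_clusters(['sky_survey.dat']) -> clusters.dat
# render_plot(['clusters.dat']) -> plot.png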

The Scientific Annotation Middleware (SAM) is a set of components and services that enable researchers, applications, problem solving environments (PSE) and software agents to create metadata and annotations about data objects, and to document the semantic relationships between them [63]. SAM allows applications to encode metadata within files or to manage metadata at the level of individual relationships, as desired. An Electronic Laboratory Notebook (ELN) is used with the SAM to develop an initial set of SAM-based notebook services to search and browse data and provenance information about data (such as static texts, images, and dynamic images), and also to add provenance about the data. The open source Electronic Laboratory Notebook is a collaborative, distributed, web-based notebook system, designed to provide researchers with a means to record and share their primary research notes and data. This project's aim is to enable the sharing of scientific records among portals and problem-solving environments, software agents, scientific applications, and electronic notebooks that include annotations and data provenance about the recorded scientific data.

The Collaboratory for Multi-scale Chemical Science (CMCS) research project is one of the projects using the SAM for their pedigree implementation in Grid environments [74]. The CMCS brings together leaders in scientific research and technological development across multiple U.S. Department of Energy (DOE) laboratories, other government laboratories and academic institutions to develop an open “knowledge grid” for multi-scale informatics-based chemistry research [97]. Provenance support is provided by adding metadata to files stored in a SAM repository, and SAM also provides configurable, automated metadata extraction and translation of uploaded resources. SAM publishes messages of events whenever a resource is accessed or modified in the SAM server under two topics, one for changes to the data or metadata and one for queries (e.g., a request to view a particular resource) [95]. SAM acts as an open storage system and does not stipulate any specific format for the data and metadata it handles. Thus, it provides an open sharable tool to record resources generated through, e.g., a PSE, and allows researchers to add different types of metadata and annotations about the resources [59]. The provenance about the workflow, or what procedures were invoked within the PSE to generate a particular data product, can either be manually recorded by the PSE user or may be automatically generated within the PSE and then recorded in SAM. Thus, SAM is a middleware storage system and does not participate in the extraction of metadata during the workflow invocation within PSEs or applications.

A middleware system is presented in [89] based on an e-notebook abstraction. The e-notebooks are distributed amongst the users in the research groups, and can record data, and its derivation and transformations, directly via manual user input or from connections to the instruments used. The data and provenance stored in an e-notebook server are represented as a DAG which can be shared with other e-notebook users. The DAG may have nodes representing multiple e-notebooks to show the many individuals participating in a process. When creating a node in a DAG to represent the derivation of a data item, the creator must digitally sign the node to provide support for trust views and credential tracking. This e-notebook approach to provenance recording, along with the user's credential information, provides an interesting way of assessing the data's credibility.

2.5 Granularity of Provenance

In certain domains, the usefulness of provenance depends on the level of granularity of the process documentation which is collected by the system. Granularity of documentation refers to the level of detail with which provenance is recorded. A process here means any task performed whose lineage may be documented. In an experimental process, if all the instructions about the scripts used are recorded, then this is a fine-grained documentation of the experimental process (compared with recording only the name of the script).

In [108] and [69], fine-grained provenance is recorded about attributes or tuples in a database that represent individual pixels or array elements, respectively. A technique is provided in [69] to trace the lineage in arbitrary array computations. An algorithm called sub-pushdown is used, which requires the operations or algorithms used to produce a dataset to be described in terms of Array Manipulation Language (AML) operations. The sub-pushdown algorithm has been implemented in a prototype array database system called ArrayDB, in which the provenance of an array can be retrieved. This answers fine-grained questions such as: what points in the intermediate datasets contribute to a point in the derived dataset?
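The flavour of such fine-grained tracing, though not the sub-pushdown algorithm itself, can be conveyed by composing per-operation lineage functions backwards through a pipeline; the two array operations below are invented stand-ins for AML operators.

# Each operation is paired with a lineage function mapping an output
# coordinate to the input coordinates it depends on.

def shift_lineage(i):          # output[i] = input[i + 1]
    return [i + 1]

def smooth_lineage(i):         # output[i] = avg(input[i - 1 : i + 2])
    return [i - 1, i, i + 1]

pipeline = [shift_lineage, smooth_lineage]   # operations applied in this order

def trace(point, pipeline):
    # Push a set of output points back through the pipeline, producing the
    # contributing points in each intermediate dataset and, finally, the source.
    points = {point}
    for lineage_fn in reversed(pipeline):
        points = {p for q in points for p in lineage_fn(q)}
        print("contributing points:", sorted(points))
    return points

trace(5, pipeline)
# contributing points: [4, 5, 6]   (in the intermediate dataset after the shift)
# contributing points: [5, 6, 7]   (in the source dataset)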

2.6 Use and Benefits of Provenance

The motivation behind much research in the area of provenance is the benefits it provides to users in the application system domain. Provenance support in scientific computations allows, for example, the verification of derived data products and the analysis of error propagation, and is a source of information to ensure the integrity and quality of data products. Based on the major benefits that provenance support provides in different application domains, we now outline these benefits under two categories.

2.6.1 Data Quality Benefits

The provenance of a data product can be used to estimate data quality and reliability based on the process that produced the data and the source data used in its derivation. A geographic metadata standard (that includes a lineage specification) called the Spatial Data Transfer Standard (SDTS) [90] was produced in 1992 for transferring geospatial data between different application groups and geographic information systems (GIS). Here, the lineage information is attached to the data and transferred as part of a data quality report. Such lineage information is provided so that potential data users can be protected from faulty information and assumptions about the data transformation process, and misinformation on the accuracy of measurements. The propagation of source data errors through GIS data-transformation functions is the focus of [99], which describes lineage-based quality enhancement tools that can be used to improve the quality of derived data products. The issue of data quality becomes more critical as errors introduced by faulty data or misconfigured instruments can grow as they propagate to data derived from them. Errors made at a very low level may never be identified once the data has been integrated and replicated many times, unless a detailed lineage is recorded.

In [18] a case is investigated in which the manipulation and misrepresentation of genome data rendered worthless the research carried out using this faulty data. This formal investigation highlights the need for a data verification and integrity system in the research community. Genome data are the protein, DNA and RNA sequences that are annotated by bioinformaticians in academia and public research institutions to give meaning to the sequence data. The annotations are usually performed by referring to the annotations of similar sequences. The source of annotations is usually not recorded, so any annotation error is likely to be propagated throughout the database [24, 42, 62]. Lineage metadata about the data, such as the transformations applied to create it or the source of its parent data, can assist the data user in establishing the authenticity of the data and in avoiding low quality data sources. It provides justification for using the data, enhances the interpretation of data, reduces false data precision, and broadcasts data reliability, accuracy, suitability and currency.

2.6.2 Data Processing Benefits

Using provenance to record the processing history can be beneficial in supporting audit trails, data quality control, and the detection of error sources and faulty data sources. This type of provenance also provides processing "recipes" that can be modified to rerun results from a complete or partial process chain, or to compare the analytical processes of two different experiments. This section describes the benefits of provenance in supporting the management of data processing in the business and scientific communities.

Audit Trails

Provenance serves as a means to perform an audit trail on a piece of data by tracking the origins of, the interrelationships between, and the transformations performed on, the data as it moves through distributed processing steps that may cross organizational boundaries. Audit trails are essential for:

• Knowledge retention: Without provenance, every time a scientific research project or contract ends, critical information is lost about what has been done in terms of computational and laboratory-based research experiments. Such information is important as a reference point for future research and for sharing and using knowledge from the project. Provenance also helps in the version management of data products. For example, provenance helps identify reasons for variations in different versions of data products derived from the same source data at different times.

• Impact analysis: Any changes in the algorithms used for a computational experiment, or in a laboratory environment, can ripple across the entire procedure. A provenance audit trail permits a big-picture view, and allows an assessment of what must be done to accommodate this change. One recurring use of provenance is to backtrack and locate the source data, or a point in the process, that is the cause of errors found in derived data, and apply relevant corrections [53].

• Regulatory or industry requirements: Pharmaceutical companies must guarantee that data has not been corrupted in moving from one system to another. Provenance information is particularly important in patenting drug discoveries. Financial organizations must establish auditing to trace the fate of every penny. E-business service providers must protect the privacy of customers. Such certainty is not possible without recording provenance and a comprehensive audit trail.

Process Repetition Recipes

Provenance information containing the processing steps and the source data provides a recipe to recreate a scientific workflow or experiment, and thereby to recreate a data product. If the provenance record contains sufficient context information related to the data's collection and transformation, such as the algorithms and instruments used and their configuration, it may be possible to repeat the data derivation procedure. Repeatability requires the availability of resources similar to those used when the original data was created. The derivation may be repeated to maintain the currency of derived data when some of the source data changes, or if the processing algorithms were modified or updated. It is possible to control such re-execution by only repeating the sections affected by the changes in data or operations [21]. Such recipes generated from provenance information are also advantageous and convenient when modified to suit current processing needs and to rerun the process sequence [108].

Modifications made for the purpose of re-enactment can involve changes to instrument configurations or to input parameters, or may use different source data to perform a comparison or "what if" analysis. Re-execution works in a similar way in the maintenance of views in data warehousing systems, where, following any changes in source tuples, database views derived from underlying source tables and views need to be updated [47]. In some cases it may be cost-effective to maintain provenance and re-derive data "on the fly" rather than replicate it, but in other cases the re-execution cost is too high and time-consuming to justify the large amount of data processing.
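As an illustration of recipe-style re-execution, the following sketch replays a recorded process description with optionally overridden parameters; the record layout and service names are hypothetical, not those of any particular system.

    # Minimal sketch (hypothetical structures) of using a provenance record as a
    # re-execution recipe: replay the recorded steps, optionally overriding
    # parameters for "what if" analysis.

    REGISTRY = {  # assumed mapping from recorded service names to callables
        "dust_cloud_model": lambda params: [params["density"] * x for x in range(4)],
        "scale": lambda params, data: [params["factor"] * x for x in data],
    }

    recipe = {  # a recorded provenance document: ordered steps with their inputs
        "steps": [
            {"service": "dust_cloud_model", "params": {"density": 2.0}},
            {"service": "scale", "params": {"factor": 0.5}},
        ]
    }

    def re_execute(recipe, overrides=None):
        """Replay recorded steps; `overrides` maps step index -> new parameters."""
        data = None
        for i, step in enumerate(recipe["steps"]):
            params = {**step["params"], **(overrides or {}).get(i, {})}
            fn = REGISTRY[step["service"]]
            data = fn(params) if data is None else fn(params, data)
        return data

    print(re_execute(recipe))                                   # original run
    print(re_execute(recipe, overrides={0: {"density": 3.5}}))  # "what if" run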

In summary, provenance promotes the repeatability and reproducibility of experiments. Scientific experiments are often repeated, and interesting results are generated during more than one run of an experiment. A sufficiently detailed record of the data derivation path, one that includes large amounts of metadata and intermediate results, would allow:

1. Other researchers to repeat and validate the experiment.

2. The author of the experiment to repeat the process with different configuration parameters.

In repeating a previously completed process, provenance could be used to apply exactly the same methods, steps and resources, while supplying different configuration parameters to process either the original data or other input data. Reproducibility of an experiment, on the other hand, is possible when exactly the same configurations (original raw data from the same version of the database, same tools, same algorithms and versions) are applied to produce exactly the same results.

Informative documentation

Provenance is the documentation of a process, providing information on derived data that can then be the basis for information discovery and sharing. Data of interest can be located by queries on its provenance, and the effort of repeating a process can be avoided if the same derivation has already been performed. Archiving annotations along with provenance can help to interpret the data in the context in which it was intended, especially for derived data that may be used some time in the future [61]. Annotations by third-party users of the data and its provenance could have added benefits in better understanding the data and processing steps [21]. This gives a clear understanding of data that is specific to the user's application domain. Provenance can also be browsed as a derivation tree, or in other graphical forms, and act as a starting point for exploring other metadata about the data and processes.

Thus, the most obvious importance and use of data provenance is the dissemination of knowledge. The ability to share the techniques and procedures of experiments within a domain is valuable for scientists working in that domain. This gives research scientists a new paradigm for sharing distributed scientific resources. Using the provenance of past experiments to learn from history and apply best practice helps scientists to design and analyze their own experiments. An example is a scenario taken from the Centre for Proteomic Research [77], where multiple experiments with different configurations are conducted to successfully identify proteins from a given sample. Provenance of such experiments would ideally inform later experiments about the sample material, by providing information on the configuration parameters of laboratory machines and the process that led to the successful protein identification. Furthermore, provenance contributes a great deal to scientists when sharing a result dataset. For example, when a scientist would like to study a derived dataset, the provenance of how, when, and by whom that data was produced is vital in assessing the integrity of the dataset.

2.7 Service Oriented Architecture and Provenance

Service-based infrastructures are at an early stage of evolution and are emerging

as the next phase in supporting e-science and e-business. Such services are generally

referred to as e-services, where the main idea is to encapsulate an organization’s

functionality within an appropriate interface and advertise it as independent Web

Services [76]. Widely known as an SOA, this is an information systems architecture

that enables the creation of applications that are built by combining loosely-coupled

and interoperable services. The architecture is not tied to a specific technology and

may be implemented using a wide range of technologies including RPC, CORBA, and Web Services [35].

While in some cases a Web Service may be used in a stand-alone form, it is normal in the context of an SOA to combine and link several Web Services together to create new functionality in the form of a Web process or application. Such a composite process supplies a desired outcome or encapsulates a business process. In the context of e-science, desired outcomes could be large sets of statistical data. Due to the distributed nature of the computation, which draws on a collection of information resources and Web Services, it becomes hard to keep track of how and where a certain piece of data has been derived. This creates the need for data provenance support in Web process execution in SOA environments.

As provenance support in service-based infrastructure is a relatively new area, to our knowledge there are no current standards for data provenance support of Web Services. With reference to the literature discussed, it would seem that every research group involved in the evaluation and analysis of a provenance-enabled, service-based workflow system makes use of provenance infrastructure that is specific to the domain requirements and scenarios of that research project. It appears highly unlikely that different groups follow a similar approach during the system design and development process. Table 2.2¹ outlines the evaluation of some of the provenance systems against a set of criteria. The criteria are based on the provenance systems' operational model and characteristics. It can be seen that the predominant approach adopted by them is not truly representative of a dynamic framework, and some research lacks either domain independence in terms of its provenance data model or support for workflow re-execution. The criteria are set to reflect the requirements for the provenance system modelling and development in this thesis. Those requirements will be elaborated in section 3.2.

¹Here, ✓ means the criterion is present, ✗ means the criterion is not present, and — means unknown.

                                     Chimera   CMCS   ES3   myGrid   PASOA
    Service-Based                       ✓        ✗      ✗      ✓        ✓
    Domain Independent                  ✗        ✗      ✗      ✗        ✓
    Abstract Composing of Workflow      ✓        —      —      ✓        ✗
    Run-Time Recording                  ✓        —      ✓      ✓        ✓
    Workflow Re-Execution               ✓        ✗      ✗      ✓        ✗

Table 2.2: Provenance System Evaluation against a set of Criteria

The research presented in this thesis intends to meet all the outlined criteria.

2.7.1 Identifying Specific Tasks in a Provenance System

Having specified the use and benefits of provenance and the criteria for designing a Provenance System, the next step is to specify the tasks that the user wishes to perform, aided by the provenance system. These tasks will describe explicitly the nature of the interaction between the user and the system. The design and development of the system will depend on the specific tasks that are defined for the system. We present different representative tasks that the user of a provenance system might wish to perform. These tasks will illustrate the components required in the design of a framework for a provenance system, particularly to provide the basis for supporting the living document concept introduced in chapter 1.

1. A user wants to retain all the experimental information to have a complete historical record. The first and most obvious task is to record three aspects of the experiment: a) the resultant data sets, b) the processing steps that led to the resultant data sets, and c) the original input data sets and parameters used. For example, a scientist processing a raw image data set would want to record not only the result data, but also the metadata on 'what' and 'where' about the tools, algorithms, instrument configuration parameters, and the raw image data used to generate the result. These are needed, for example, to perform any re-executions of the procedures.

2. A user wants to browse all recorded details of previously performed workflow processes. For example, revisiting previous experiments would allow a bioinformatician to compare various protein sequence results to draw some conclusions and learn from history.

3. A user would like to annotate workflows with human-readable descriptions of an experiment and the conclusions that were drawn from it. For example, if an experiment is performed with several runs with different input parameters, or with updated original data every time, then the scientist might like to highlight such details in a written report that would be linked to the experiment.

4. A user wants to validate a workflow, and wants to know if the experiment

still produces appropriate results. For example, a scientist may come across

derivation path information of an experiment run by a third party and wants

to repeat the experiment to check its validity.

5. A user wants to run a workflow process a number of times with different sets

of configuration input parameters. For example, rerunning an experiment more

than once would allow the scientist to search for a desired result, or to analyse

the results of multiple experiments.

6. A user may want to reuse the methods and steps of a previous experiment to reproduce exactly the same results.

These tasks illustrate several ways in which users might interact with the provenance system. Among them, the first is the primary one, as it makes possible the subsequent tasks. Having identified the key tasks that the user will perform with the system, the next section presents a high-level model that illustrates how the Provenance System may operate in an SOA environment.

2.7.2 Provenance Web Services

Recent developments in Web Services are leading to the emergence of platforms to support virtual communities of e-services on the Internet. Incorporating a Provenance System as a Web Service would be advantageous, so that the system can be used as a Provenance Service providing the necessary "provenance" functionalities for scientists performing experiments in an SOA environment. In this way a Provenance Service may be used like any other Web Service, enabling invocations of desired Web Services and recording the provenance of such invocations.

In order to illustrate the interaction of scientific users with the Provenance Service, we present our high-level architecture of a provenance support framework in Fig. 2.1.

This service-oriented framework is characterized by a client being able to request provenance services from several service providers that host provenance systems.

The approach illustrated in Fig. 2.1 provides a high level of flexibility in selecting the desired provenance service that satisfies the client's requirements. This view is consistent with the use of Web Services for discovering services, and for interacting between the client and the service providers. Thus, the model assumes that there are a number of service providers of provenance services for process execution, and each provenance system is exposed as a Web Service.

Figure 2.1: Architecture of Provenance Web Services (a client uses Web Service discovery and process composition to interact with Provenance Services 1 and 2, a workflow engine, and other Web Services)

The interaction procedure for this model is as follows.

1. The client discovers one or more Web Services from the service registry.

2. The client selects the Web Services based on its inputs and outputs.

3. All the selected services are then composed by the client using the process

composition specification language that best suits him or her.

4. Like any other Web Services, the Provenance Service is also advertised in the

registry for discovery. The service provider would advertise its Provenance

Service along with its functionality, information about the workflow engine,

and the version of the language specification it supports.

5. The client discovers all the available provenance services, and then selects a provenance service that matches their requirements, assuming that the selected provenance service is trusted by the client. The provenance service and the client may exchange authentication details to enable security measures. This enables the provenance service to provide a personalized view of process provenance for the client.

6. A client wishing to capture the provenance of his/her process execution submits

the file describing the Web Services composition to the provenance service.

7. The Provenance Service is responsible for:

(a) Interacting with the workflow engine for the execution of the composite

workflow.

(b) Capturing the provenance of the process execution.

(c) Recording the captured provenance of the process.

8. The Provenance Service returns the final result data of the execution with the

process’s unique identification number informing the client that the processing

task is complete.

9. The client is allowed to browse and query the recorded provenance of the processes that s/he previously executed.

10. The client is able to validate previously-run processes, the incorporated Web Services, and the returned output data through re-execution of the process via its provenance. The client can also change the input parameters of the process during re-execution to perform "what-if" analyses. A sketch of this interaction pattern is given below.
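The sketch below illustrates steps 6–10 of this hypothetical protocol in simplified form; the class and method names (ProvenanceService, submit, re_execute, and the stand-in Engine) are illustrative only and do not correspond to an actual implementation.

    # Illustrative sketch (hypothetical API) of the client/Provenance Service
    # interaction in steps 6-10 above: submit a composed workflow, receive the
    # result plus a process identifier, then query and re-execute by identifier.

    import uuid

    class ProvenanceService:
        def __init__(self, engine):
            self.engine = engine          # workflow engine used for enactment
            self.records = {}             # provenance database stand-in

        def submit(self, composition, inputs):
            pid = str(uuid.uuid4())       # unique process identification number
            result = self.engine.execute(composition, inputs)   # step 7(a)
            self.records[pid] = {"composition": composition,    # steps 7(b), 7(c)
                                 "inputs": inputs, "result": result}
            return pid, result            # step 8

        def query(self, pid):
            return self.records[pid]      # step 9

        def re_execute(self, pid, new_inputs=None):
            rec = self.records[pid]       # step 10: replay, optionally "what if"
            return self.engine.execute(rec["composition"],
                                       new_inputs or rec["inputs"])

    class Engine:                          # trivial stand-in workflow engine
        def execute(self, composition, inputs):
            return sum(inputs)             # pretend the composition sums inputs

    ps = ProvenanceService(Engine())
    pid, result = ps.submit("workflow.bpel", [1, 2, 3])
    print(result, ps.re_execute(pid, new_inputs=[4, 5, 6]))  # 6 15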

The above model is a hypothetical illustration of the functioning of the provenance system as a Web Service. Current research and development in SOAs has enabled this model by providing the functionality of the coordinating entities in the model, namely, Web Service discovery and interaction. This includes providing the infrastructure for Web Services registries, e.g., UDDI [78], the service access interface Web Service Description Language (WSDL) [100], and an interaction protocol, such as SOAP [101], at the communication layer. For composing independent Web Services, composition languages such as the Service Workflow Language (SWFL) [57] and the Business Process Execution Language for Web Services (BPEL4WS) [16] are widely used.

There is also an emerging acknowledgment of the need for negotiation, trust, and measuring Quality of Service (QoS) which could eventually be used in deciding which service to select in such an environment [41].

Although this thesis assumes an SOA infrastructure, it does not intend to focus on issues relating to service selection and discovery in the model, which are adequately addressed by current research and developments in industry, academia and standards organizations [8]. Instead, this thesis addresses the question of how data provenance support can be enabled within an SOA by using its infrastructure and technologies, and focuses on the Provenance Service. The research presented in this thesis mainly concerns the provenance modelling and the interactions of the Provenance Service. Provenance modelling refers to identifying the types of provenance required about processes in an SOA, forming a provenance representation model. Interaction refers to the methods by which the entities in the model communicate: (1) the clients, the Provenance Service, and the workflow engine communicating with one another to perform provenance capture during workflow execution, and (2) communication within the Provenance Service for recording and querying provenance about processes and for the re-execution of past processes.

2.8 Summary

In this chapter we have discussed the current applications and operations of provenance systems. The importance and use of provenance is seen in various systems and application domains, ranging from databases, to the Semantic Web, to workflow applications. Our evaluation of provenance systems, particularly for workflow application systems, suggests that either (1) they do not represent a domain-independent and service-based operational model, or (2) they lack essential features, for example, the ability to use the provenance to repeat a workflow. Thus, some service-based provenance systems do not cater for the re-execution of workflows. The focus of this thesis is to design a provenance system that provides a domain-independent provenance data model and also caters for workflow re-execution functionality within a service-based environment. We have identified specific functionality for designing the provenance system and presented a model for provenance services that is consistent with the use of Web Services in an SOA environment.

In the following chapters of this thesis we present a provenance model for providing provenance support, characterized by the capture and recording of provenance information about processes, and the use of the provenance of past processes in different ways within an SOA environment. As identified in this chapter, the two major issues of (a) modelling the representation of provenance, and (b) the interaction requirements for the Provenance Service must be addressed to successfully and practically enable provenance support for processes in an SOA. The next chapter presents our provenance model that defines the architecture for a Provenance Service to provide such provenance support.

Chapter 3

Provenance Model for a Web Process

3.1 Introduction

In a service-oriented approach to distributed computing, application resources are regarded as services available on the network that, in collaboration, provide a comprehensive and flexible system solution. Web Services research has provided considerable advances towards the service-oriented vision by allowing Web Services to be automatically discovered and dynamically bound across organizational boundaries. In addition, by assembling these individual Web Services into complex workflows, newer and more useful Web processes can be created. Web Services composition is an increasingly important theme in research into SOA for the Grid. Although research into various issues relevant to Web Services and Grid computing has deepened, retaining the data provenance of dynamically assembled Web processes in such an open environment has not been adequately addressed.

In a service-oriented environment, the provenance of a piece of resultant data accounts not only for the transformations that occurred in the original data itself, but also for all the processing steps that led to the resultant data. Discovering and composing individual Web Services to form a composite service representing a workflow process in a current SOA can lead to a situation where the desired end results are obtained from such dynamically formed Web processes, but the explanation of how we ended up with such results remains unknown. Without knowledge of how resultant data is obtained, and what it represents, it is impossible to assess its usage and importance. In such a situation it becomes important to capture and record the provenance that leads to the derivation of resultant data; this would, for example, allow a user to study his or her past actions. The data provenance consists of the entire processing history. This includes the identification of the origin of the raw data sets, information on the instruments that generated or recorded the original data and the parameters that were set, as well as all the processes that have been applied in the transformation of such data sets. Most research work emphasizes the semi-automatic or manual recording of data provenance, and usually the provenance models are specific to domains and research projects, thus making it difficult to apply the models and algorithms to other application areas or to offer more general support for data provenance. The subject of exploiting the recorded data provenance is explored only to the extent that is required by the domain research project. Thus, as discussed in chapter 2, although the significance of data provenance is being realized in many projects, there is currently very limited architectural-level support for representing, recording and exploiting data provenance.

From our vision of provenance Web Services discussed in section 2.7.2, we have developed the architecture for a Provenance Service that we present in this chapter. The architecture provides support for provenance in a service-based environment, and incorporates provenance capabilities consisting of two main functionalities:

• The ability to collect and archive adequate provenance about the transformation

of data occurring during invocation of Web Services, for example, a composite

service executed via a workflow engine.

• Allowing the recorded provenance to be accessible and viewable via generic browsers and manipulated through query interfaces.

The architecture presented in this chapter focuses on the requirements of the provenance data for complex Web process execution from a user perspective. For example, how is the provenance information about a process collected, represented and recorded, and how is the provenance data queried and reasoned with? It provides the capabilities that help users by preserving adequate process provenance information for (1) tracing the derived data's origin and (2) exploiting and manipulating provenance in numerous ways, such as the recreation and re-execution of a process. Our architecture also provides flexibility to cope with different domains without affecting the underlying provenance recording and representation mechanisms.

3.1.1 Architecture Contributions

1. Added support for provenance in Web Services environments by providing a provenance service implementation which can be discovered and consumed like any other Web Service.

2. A Provenance Collection Service (PCS) that is capable of capturing and recording the provenance of a Web process. The PCS is able to collect the provenance of all Web Services involved in a Web process execution.

3. Modelled the representation of provenance for a Web process that is captured by the PCS. A predefined structure (the provenance schema) is utilized by the PCS in order to represent the captured process provenance and record it in the provenance database. The recording mechanism depends on this predefined structure.

4. A Provenance Query Service (PQS) has been added to the model that provides

ways to query the archived process provenance data, allowing the documented

provenance to be viewed and navigated.

5. The PQS can exploit the archived process provenance by allowing the re-execution of an entire process by means of its retrieved provenance. This means that the PQS uses exactly the same services and data as used during a prior process execution, allowing verification of previously run processes and their associated services and data. The PQS allows, for example, the performance of "what if" styles of analysis on past processes. In addition, it provides tools for the recreation and analysis of a process, e.g., to verify that the data sent from a sender is the same as the data received by the receiver.

3.2 An Example Scenario

In this section we present an example scenario that clarifies the vision of provenance Web Services as discussed in section 2.7.2. We mentioned earlier, in section 2.6, various examples to identify system goals, but here we put forward an example scientific process scenario that demonstrates the use of a process provenance service. This example process is a common scenario for many astronomy applications.

Figure 3.1: Simple Scenario Example (the Dust Cloud Model and the Telescope Model each feed an FFT Algorithm; the two FFT outputs are convolved and an inverse FFT is computed. 1) Take the telescope measurement parameters and input a set of parameters to the numerical models to produce system-generated signal data. 2) Compare the result from the process with the observed real data samples. 3) Repeat this process with different parameters until the comparison and analysis of the output against the real data is satisfactory.)

Astrophysicists seek to gain a better understanding of astronomical objects and events by comparing observations with the output of advanced numerical models. Many research activities in Astronomy and other scientific disciplines are centred on issues such as data representation, storage, retrieval and reuse of data and analysis. Figure 3.1 presents a simple scenario in the astrophysics domain, where an astrophysicist analyses sample data collected from a telescope. The observed data from the telescope is usually stored in some file system. An example of the observed data is the celestial infrared signals from bodies in space, such as a dust cloud that surrounds a region of star formation. These observed data are compared by the astrophysicist with mathematical models that are intended to represent the observed astronomical object. The scientists would like to use various algorithms to compute the numerical models and compare the results with the observed data. For instance, various different parameters, such as density, are used as inputs to the dust cloud model to generate the model data. This data is then convolved with the telescope "beam". This requires the Fourier transforms of the model data and the telescope beam to be formed using a Fast Fourier Transform (FFT) algorithm [80]. The two FFTs are then multiplied together in a pairwise fashion and the inverse Fourier transform is computed to give the final convolution, which is displayed in a graph. A scientist can then compare the observed data with the convolved model (which is what would be observed through the telescope if the astronomical object was as described by the numerical model). If necessary the scientist can then modify the model parameters, compute a new model output, convolve it with the telescope beam, and again compare the numerical results with the observed data. Repetition like this with different parameters in the model allows scientists to perform "what if" analyses, and the comparisons allow the scientist to progressively improve the fit between the data and the model. Figure 3.1 shows all these complex mathematical models and operations combined together, representing a processing step in the data analysis followed by an astrophysicist.

This example gives rise to several "provenance" requirements. In this regard, this section first discusses a simplified list of the provenance requirements; in the following section, the provenance model is presented that provides the functionality to meet these requirements. The requirements are discussed as follows.

Process Modelling: In any process support system, one of the key questions is how to express the experimental process or compose the different steps in some way.

Process composition in an SOA is possible in two ways:

1. Static composition: use a description language to express the experiment process

that can afterwards be executed to generate the intended result, for example

using BPEL4WS to describe a workflow for execution.

2. Dynamic composition: This involves selecting algorithms or models and execut­

ing one step at a time by retaining outputs at each step. In this case, selecting

the next step in the process may be based on the scientist’s observation of the

output from the previous step.

Although dynamism is preferable in terms of composition flexibility, in the case of a large-scale and time-consuming data analysis a structured process construction is desirable. Also, without an initial abstract construction of the experiment, support for workflow reusability is lacking. Even though both ways of process composition are effective for modelling with the provenance model, given the intrinsic modularity of the experimental process, static composition is preferable as it provides additional information through an abstract process description. The composition language must be structured, allow nesting and facilitate reusability. In addition, on account of the complex execution environment, the language must provide support for controlled data flow between different steps and for exception handling. Exception handling means a reliable mechanism must be provided to cope with any deviation from the prescribed behaviour, for example, by aborting the execution of the entire process. The language must allow the identification and registration of external objects and programs to the corresponding workflow execution engine. In the case of the astronomical data analysis experiment, both the algorithms and the astronomical data will be distributed resources that are external to the provenance system and the execution engine.

It is assumed that the algorithms for the mathematical models are distributed individual Web Services advertised in a registry, i.e., a UDDI registry. With the help of available tools these services can be found by querying the registry, and depending on the requirements specific services are selected and used in the experimental process. As mentioned in section 2.7.2, the discovery procedure relies on several service requirements and criteria; discussion of this area is not in the scope of this dissertation. After finding the location of the services that are to be used, the next step is to perform some form of composition of the individual Web Services.
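The difference between the two composition styles can be illustrated with a small sketch; the service stand-ins below (dust_cloud_model, fft) are hypothetical placeholders for services discovered from the registry, and the "engine" is a trivial interpreter, not BPEL4WS.

    # Sketch only (hypothetical helpers): the two composition styles described
    # above, with plain Python callables standing in for discovered Web Services.

    def dust_cloud_model(density):       # stand-ins for registry services
        return [density * x for x in range(8)]

    def fft(signal):
        return [abs(x) for x in signal]  # placeholder transform, not a real FFT

    # Static composition: the whole process is described up front (cf. BPEL4WS)
    # and handed to an engine for execution.
    static_workflow = [("dust_cloud_model", {"density": 2.0}), ("fft", {})]

    def run(workflow):
        data, env = None, {"dust_cloud_model": dust_cloud_model, "fft": fft}
        for name, params in workflow:
            fn = env[name]
            data = fn(**params) if data is None else fn(data, **params)
        return data

    result = run(static_workflow)

    # Dynamic composition: the scientist picks each next step after inspecting
    # the previous output.
    model_out = dust_cloud_model(density=2.0)
    if max(model_out) > 10:              # decision made by observing the output
        result = fft(model_out)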

Recording, Querying and Analysis Capabilities: The astronomical data analysis process result is a derived data product that cannot be interpreted and recreated without provenance information describing the processing steps used for its creation and the initial data used in the process. This leads to the main problem of lineage tracking. Recording such provenance information answers typical questions such as "which algorithms are used by process W", "which process uses algorithm X", "which dataset is used to get result Y" and "which result may change if the dataset Z is updated". In addition, given the abstract workflow description, a subsequent analysis of the provenance information may be used to evaluate whether the abstract workflow description has been adhered to. Operations to propagate changes are required in the system so that any changes can be reflected by re-executing the process, producing new sets or versions of results. When the resources are distributed, it is rarely possible to rely on automatic notification of changes from the resource providers. The only way to get information about any changes or updates is either by checking at set time intervals or only when re-execution of the past experiment is triggered. The derived datasets are often large, so the system should support archiving the derived datasets separately from the processing steps, but link every program/algorithm with pointers to its associated datasets. This would be truly useful for efficient data management if functionality is provided to track and retrieve data dependencies amongst different processes.
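As an illustration of such lineage queries, the sketch below answers them over a flat list of assumed provenance records; the record layout is hypothetical and far simpler than the representation developed later in this thesis.

    # Minimal sketch (assumed record layout) of answering the lineage questions
    # above: each record links a process run to the algorithm it invoked and the
    # datasets it read and wrote.

    records = [
        {"process": "W1", "algorithm": "FFT", "inputs": ["modelData"],
         "outputs": ["Y"]},
        {"process": "W1", "algorithm": "Convolution", "inputs": ["Y", "beam"],
         "outputs": ["Y2"]},
        {"process": "W2", "algorithm": "FFT", "inputs": ["Z"], "outputs": ["Y3"]},
    ]

    def algorithms_used_by(process):      # "which algorithms are used by W?"
        return {r["algorithm"] for r in records if r["process"] == process}

    def processes_using(algorithm):       # "which process uses algorithm X?"
        return {r["process"] for r in records if r["algorithm"] == algorithm}

    def results_affected_by(dataset):     # "which result may change if Z changes?"
        affected, frontier = set(), {dataset}
        while frontier:                   # follow derivations transitively
            nxt = {o for r in records for o in r["outputs"]
                   if set(r["inputs"]) & frontier}
            frontier = nxt - affected
            affected |= nxt
        return affected

    print(algorithms_used_by("W1"))   # {'FFT', 'Convolution'}
    print(processes_using("FFT"))     # {'W1', 'W2'}
    print(results_affected_by("Z"))   # {'Y3'}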

The existence of a common terminology becomes crucial in the astronomy domain in order to understand the traced lineage of data produced by a third person. Definitions of common terminologies are emerging within each domain; e.g., the Gene Ontology in the biology domain aims to provide a controlled vocabulary that can be used to describe any organism [38]. Such vocabularies use synonyms, and the majority of terms have a textual definition stating references to its source. In the case of the Astronomy community, the Unified Content Descriptor (UCD) is one of the vocabularies in use, and defines many core astrophysics units, measurements and concepts [12]. These efforts are likely to converge into a standard with the recent service-oriented research in this domain [25]. An agreed common terminology helps in designing analytical tools, and it is very useful for these tools to be available with descriptive vocabularies. Re-execution of a previous experiment could encounter problems, such as an algorithm/service (tool) being unavailable or moved, preventing the re-execution task. To avoid this, it is desirable for the system to have a mechanism first to identify semantically similar algorithms that could be used so the re-execution could be completed, and secondly to select one subject to fixed constraints such as high throughput. The latter could be based on the requirement of processing larger datasets.

3.3 Provenance Model

In this section, a provenance model is proposed that integrates the client-server and Web Services models. The provenance model shown in Figure 3.2 is driven by the Provenance Collection Service (PCS), which uses a workflow execution engine to enact the pre-defined workflow by invoking the Web Services in the concrete workflow description. The PCS is exposed as a Web Service with a client GUI interface for workflow invocation, and is able to capture and record the workflow metadata. Thus, by integrating the client-server and Web Services models, the provenance model endeavours to facilitate the capture of the necessary provenance data during the execution of a distributed workflow. The model also provides a structured way of representing the captured process provenance, which is then recorded in the provenance database.

The Provenance Query Service (PQS) in the model is used to access the recorded process provenance, allowing users to exploit the provenance information in a number of ways. A Provenance Server is a server machine that hosts both the PCS and the PQS. The model assumes that a mix of client-server and Web Service approaches can lead to more flexible mechanisms for provenance support in a distributed service-oriented environment.

Figure 3.2: Provenance Model (the Provenance Server hosts the PCS and the PQS behind an interface; the PCS drives the workflow engine, which invokes the Web Services; the provenance schema and the provenance database support the recording and querying)

At the conceptual level, the provenance model operates by capturing the provenance of a Web process that integrates various Web Services (composed and submitted by a user), interacting with the execution engine to gather the provenance information about the Web process execution. The operation of the model is shown with the scientific example scenario illustrated in section 4.2. When the scientist wants to run a composed process and capture its provenance and the data passing between the services, s/he initializes the concrete workflow description using the PCS. The PCS acts as an interface that interacts with the workflow engine and documents the Web process execution. This assumes that the scientist has previously decided to use a specific provenance service based on his/her requirements, the capabilities of the provenance service, and the level of trustworthiness of the service provider.

We have presented an overview of a provenance model to support provenance recording for process execution. This provenance model combines the best aspects of the Web Services model and the client-server approach by incorporating a Web Service framework within the provenance system. Usually the scientist is responsible for documenting the experimental process manually while performing the experiment, and for storing the large amounts of data generated in each step of the process. Here, however, the PCS and PQS in the provenance model provide the infrastructure necessary to build a Web process provenance collector and give the scientist an automated provenance recording and utilization mechanism. The key to the successful operation of our model is the ability to interact with the workflow engine to capture and record the provenance during process execution, and the ability to exploit the stored provenance. In the next subsection, the identified components of process documentation are outlined and the representation of the provenance of a process is discussed. Following this, the capturing and recording capability in the provenance model is discussed. Finally, the query interface for browsing the documented process, and the various functionalities it provides, are described.

3.3.1 Identifying and Representing Provenance

In chapter 1, it was stated that the provenance of a piece of data is represented in a computer system by appropriately documenting the process that led to its creation. In this section, the key elements that form the representation of provenance in an SOA are introduced; further refinement will ultimately lead to data types for provenance representation in chapter 4. The documentation, or provenance information, of a process that led to a data product is categorized into three types: involved services, data and passed parameters, and the data flow arrangement.

Involved Services: The provenance of the involved Web Services refers to the syntactic metadata giving the location, description and access information for an instance of a service. Dynamic information, such as the time of execution and the state of the hosting machine or environment, could also be part of the service instance's related provenance. This also includes the types of inputs the service accepts and the output it returns. In addition, it could entail static information on service providers, for example the provider name, provider profile, physical location and domain information. Capturing provenance about the services involved in a Web process is largely dependent on what information the service providers make available about their services and whether it is accessible to a provenance service. The instances of services involved in a Web process are related to the data that are transformed during their invocation instances.

Data and passed parameters: This refers to the original data that are supplied to the appropriate services for transformation and the parameter values that are passed. Data also refers to the intermediate results that are acquired from all the involved services interacting with each other. Considering the example scenario in section 3.2, the telescope data and the model-generated data are the original data that are transformed, whereas the telescope setting values and other values needed for the model data generation are the parameter data passed to the services. This corresponds to the messages that are consumed and generated by the involved services during execution. Documenting the intermediate data, e.g., the data generated by the FFT algorithms in the example, may be unnecessary for re-execution, but it provides additional support, for example, to identify the point of failure in the case of an incomplete process.

Data Flow arrangement: This is the actual arrangement of the composed services to form a Web process that can be executed to generate the final result. The information about how the services are linked to create the Web process determines how the data flows through the process to create the result. A workflow description language or scripts can be used to describe aspects of the Web process, i.e., describing the sequence of the service execution at an abstract level. Thus the file containing the workflow description is important, and its storage location is metadata that needs to be retained. This provides an "abstract" workflow description outlining which services must be involved in a workflow execution and in which order they should be executed. The abstract workflow description may contain constructs for conditional execution. For example, say service A is expected to output an integer data item (when executed) that is described with a variable dv1. Say two cases are specified in the abstract process description for dv1: (a) if (dv1 > 100), then execute a succeeding service B, and (b) if (dv1 < 100), then execute a succeeding service C. Due to such conditions, during the actual execution either service B or C would be executed depending on the output from A, i.e., the value of dv1.

It is important to extract a "concrete" workflow description specifying the particular service instances that were used in a particular Web process enactment. Capturing links or relationships between the service instances is possible by capturing the provenance information in a standard format, thus automatically providing a concrete workflow description. A simple way to build links is to capture the interactions between services via the actual dataflow occurring during the workflow enactment; a sketch of how a conditional abstract description resolves to a concrete one is given below.
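The sketch promised above resolves the conditional example into a concrete description; the structures are hypothetical and serve only to show how the branch on dv1 determines which service instances appear in the concrete workflow.

    # Hedged sketch (hypothetical structures) of resolving an abstract workflow
    # with a conditional into a concrete one: service A outputs dv1, and either
    # B or C is executed depending on whether dv1 > 100.

    services = {  # stand-ins for the service instances bound at enactment time
        "A": lambda: 142,                       # outputs the integer dv1
        "B": lambda v: f"B processed {v}",      # branch taken when dv1 > 100
        "C": lambda v: f"C processed {v}",      # branch taken when dv1 < 100
    }

    abstract = {"first": "A",
                "branch": {"var": "dv1", "if_gt_100": "B", "else": "C"}}

    def enact(abstract):
        dv1 = services[abstract["first"]]()
        chosen = (abstract["branch"]["if_gt_100"] if dv1 > 100
                  else abstract["branch"]["else"])
        result = services[chosen](dv1)
        concrete = [abstract["first"], chosen]  # the instances actually invoked
        return result, concrete

    result, concrete = enact(abstract)
    print(result, concrete)   # "B processed 142" ['A', 'B']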

We further categorize the three components above to provide specific definitions that determine various provenance elements in SOAs.

Definition (service-Provenance) The provenance information of service instances that are associated with the process; the service-Provenance must include information that allows each service instance to be uniquely identified.

Definition (process-Provenance) The documentation of a Web process that consists of one or more service-Provenance items from services involved in the process. It also specifies the links between the service instances in the process. A process-Provenance must include information that allows a process instance to be uniquely identified.

The three components mentioned previously form the basis for identifying the elements of the provenance representation for process execution. It should be noted that a given process-Provenance provides a representation of the multiple pieces of data produced by the involved service instances, which ultimately represent the final piece of data or result of a process. When a process-Provenance is created and recorded, it captures the steps in a process in three parts: (1) the process as a whole, which may contain an abstract workflow description and manually-entered information about the workflow and the required data or parameters, (2) the instances of the independent services involved in the process and, (3) the dataflow during the interactions between the service instances.
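As an illustration only, the three parts might be represented with structures along the following lines; this is a hedged sketch with hypothetical field names, not the provenance schema defined in chapter 4.

    # Hedged sketch of the three-part process-Provenance described above;
    # the actual data types are defined in chapter 4.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ServiceProvenance:          # one per service instance in the process
        instance_id: str              # uniquely identifies the service instance
        activity: Dict                # operation invoked, timing, status, ...
        message_contents: Dict        # copies of input/output messages

    @dataclass
    class ProcessProvenance:          # one per Web process enactment
        process_id: str               # uniquely identifies the process instance
        abstract_workflow: str        # location of the workflow description file
        services: List[ServiceProvenance] = field(default_factory=list)
        data_links: List[Dict] = field(default_factory=list)  # flow between instances

    p = ProcessProvenance(
        process_id="proc-0001",
        abstract_workflow="http://example.org/workflows/astro.bpel",  # hypothetical
        services=[ServiceProvenance("svcA-1",
                                    {"operation": "op2", "status": "OK"},
                                    {"inputs": ["d1"], "outputs": ["d3"]})],
        data_links=[{"data": "d3", "source": "svcA-1", "target": "svcB-1"}],
    )
    print(p.process_id, [s.instance_id for s in p.services])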

In the context of an SOA, messages are sent from one service to another during interactions. Capturing the messages that are sent between the services in a standard form allows the entire process for the computation of some data to be represented. With such information one can verify, recreate, and re-execute the process, compare it with similar executions, and evaluate the captured concrete workflow against the abstract workflow, i.e., determine whether the workflow was actually executed according to the abstract description. Describing such messaging between service instances is at the core of documenting the Web process.

In our model, the workflow engine is the mechanism for executing a Web process.

The interactions of the services present in the composite service described by the abstract workflow occur through the workflow engine. We assume that the engine is responsible for interacting with each service as specified in the abstract workflow description.

Figure 3.3: Interaction between Engine and Web services (the Engine exchanging numbered messages with Service A and Service B)

Figure 3.3 shows how the interactions occur through the exchange of messages between the workflow engine (i.e., the Engine) and the two services (i.e., remote Web Services). Here, the numbers represent the sequence of the interactions. The following three assumptions are made: (1) the Engine is an interface point of access to a composite service (based on an abstract workflow description) exposed by the underlying engine; (2) the composite service itself is a Web Service with an operation or function that is invoked by a user to execute the process; and (3) the composite service consists of services A and B (any data in the message received during the interaction with A may be sent to B as specified by the abstract description). This demonstrates that any flow of data between the independent services as described in the abstract workflow description takes place through the engine, and no direct interaction occurs between the services involved in the process. The input data in the message sent to a service during an interaction may be processed and transformed into some output data by the operation or function of the service (e.g., op2 of service A in Figure 3.3).

The above assumptions are used to describe the process-Provenance, which consists of a set of service-Provenance items, where each service-Provenance describes a service instance in terms of (1) Service activity and (2) Message contents. Using the Figure 3.3 example, how the flow of data occurs in the process is also established and described via the Data link.

Definition (Service activity) A service activity refers to the function or operation that is being performed by a service instance to accomplish a particular task and any other dynamic information linked to the instance of this operation, e.g., execution time [109]. It may also contain static information, e.g., service ownership [86].

A service may be able to perform one or more functions or operations, but here it is assumed that only one operation is used for a service instance. This is because only one operation can be processed at a particular service invocation. Alternatively, a service referred to in a process may be using other services that are hidden behind that service. Capturing provenance from such hidden services may only be possible if they are provenance-aware, i.e., able to record messages exchanged through some mechanism to the provenance database. Service activity should contain information such as whether the service was invoked successfully.

Definition (Message contents) This is the contents of the messages exchanged between the instances of services. A message contents consists of the messages that are inputs received and outputs sent by a given service for its particular instance.

Which messages are captured depends on the application domain that is performing the message exchanges. Usually the structure and data type of such messages are specified in the service description and are application-specific. We do not intend to define or identify the messages that are exchanged, but simply aim to capture and retain a copy of the messages exchanged between two services. So, the message contents may hold a copy of the input and output data for a service instance. A service may send data that is too large or considered confidential; in that case, instead of the actual data, the service may send a pointer and/or other information about the data.

Definition (Data link) This forms a part of the process-Provenance that specifies the link between the service instances in the process to identify the flow of data between the services.

Links (flows of data) are established from the information obtained during the invocations of the services in the process. A particular input/output data item for a service instance must be associated with information that uniquely identifies the data. For a service instance: (1) data received (inputs) from a particular service has that service as the "source" of the data, and (2) data sent (outputs) to a particular service has that service as the "target" of the data. The inputs and outputs contained in the message contents are identified for a service instance. It is essential to have information giving the source and target (e.g., URL addresses) of a particular input and output data item, respectively. Apart from this, a unique identifier for each input and output is crucial when determining the flow of data between the services.

    Services    Data Link   Input dataset                 Output dataset
    Engine      DL 1        d1 = i1 + source(User);       d1 = i1 + target(Service A);
                            d2 = i2 + source(User);       d2 = i2 + target(Service B);
                            d4 = i4 + source(Service B)   d4 = i4 + target(User)
    Service A   DL 2        d1 = i1 + source(Engine)      d3 = i3 + target(Service B)
    Service B   DL 3        d2 = i2 + source(Engine);     d4 = i4 + target(Engine)
                            d3 = i3 + source(Service A)

    Keys: d = data item; i = ID inserted for the data item

Figure 3.4: Expressed data flow of services (left: the data in the message exchanges of Figure 3.3 between the Engine, Service A and Service B; right: the actual flow of data between the services via the Engine; below: the data link table)

As mentioned earlier, the flow of data occurs via the engine through the processing of the abstract workflow description, and no direct interaction between the services occurs. Figure 3.4 shows how the flow of data is determined. The left-hand side of Figure 3.4 depicts the data in the message exchanges that are sent during the interactions in Figure 3.3. The right-hand side of Figure 3.4 depicts the actual flow of data during the interactions of the services with the engine. Here, let op be the specific operation or function at each service that may transform a particular input data item into a particular output data item. A unique ID i is inserted to represent a data item d. For example, the input data d2 and d3 in an interaction between the Engine and Service B on the left-hand side of Figure 3.4 have unique IDs i2 and i3 respectively.

This illustrates that in our model, we denote the flow of data through the information that uniquely identifies that data, and this flow does not happen directly through interactions between the two services. For example, d3 identified with ID i3 is the data in the message sent from Service A to the Engine during an interaction, but the flow of d3 occurs from Service A to B via the Engine, as is expressed in the diagram on the right-hand side of Figure 3.4.

Figure 3.4 displays the ideal case in our model for purely functional services, which do not maintain persistent state across invocations. The same approach generalises to persistent services: the data in an output message can be a function of the data received (input message) at that instance of the service invocation. In the lower part of Figure 3.4 is a table with a symbolic representation of the data flow expressed for each service's invocation instance. For each service invocation instance, the data link information consists of the uniquely identified input and output data with their source and target, respectively.

Using such information, one can easily navigate through, or recreate, the flow of data that occurred in a process. For example, to identify the data link of output data d3 and determine its flow from service A to service B: (1) the target address of the output d3 in the service A instance is identified as the address of the service B instance for that process, (2) search for the input data d3 with ID i3 in service B, and (3) match the source address of input data d3 in service B with the address of the service A instance. A detailed discussion of how the flow of data is represented is given in section 5.2.1.
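The three matching steps can be expressed directly in code; the sketch below uses hypothetical addresses and a simplified record layout covering only the entries needed for d3.

    # Hedged sketch of the three-step matching described above, over the
    # data-link records of Figure 3.4 (simplified to the entries for d3).

    service_a = {"address": "http://host/serviceA",      # hypothetical addresses
                 "outputs": [{"id": "i3", "data": "d3",
                              "target": "http://host/serviceB"}]}
    service_b = {"address": "http://host/serviceB",
                 "inputs":  [{"id": "i3", "data": "d3",
                              "source": "http://host/serviceA"}]}

    def flow_established(data_id, sender, receiver):
        # (1) the sender's output with this ID must name the receiver as target
        out = next(o for o in sender["outputs"] if o["id"] == data_id)
        if out["target"] != receiver["address"]:
            return False
        # (2) find the input with the same ID at the receiver
        inp = next(i for i in receiver["inputs"] if i["id"] == data_id)
        # (3) its source must match the sender's address
        return inp["source"] == sender["address"]

    print(flow_established("i3", service_a, service_b))   # True: d3 flowed A -> B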

Hence, data link information in the process-Provenance denotes the flow of data between the services, whereas message contents in the service-Provenance denotes the actual content of all the input/output data for a service instance. Such flows of data are the core elements to reconstruct functional data dependencies in an execu­ tion. Thus, the service-Provenance captures the activity and state of services and the content of messages, and the flow of data established in a process forms part of the process-Provenance and is associated with the service instances.

3.3.2 Capturing and Recording Provenance

In this section, a conceptual discussion is presented of the PCS component in the provenance model. Our model, partly based on the client-server model, captures and records documentation about Web process invocations. The mechanism to capture and record provenance is part of the client application and the workflow engine. This means that the independent and distributed services that may be invoked within a process are not concerned with the documentation of their interactions with the workflow engine during their lifetimes.

In chapter 2, the notion of a Provenance Service was introduced that eventually led to the provenance model. The documentation of a single execution of a Web process, stored in a provenance database, is handled by the PCS of a particular provenance service. Utilizing a provenance service helps make a workflow engine “provenance aware”.

Now, we discuss the main theme of the model: how the recording is achieved via interactions between components. Figure 3.5 illustrates the different components within the model and the interactions that occur between them. The PCS can synchronously process the recording of provenance information in the provenance database. Because of the limited control over the workflow execution engine, the interaction between the PCS and the engine is carried out in a synchronous manner, meaning that once the engine starts to execute the submitted Web process, the recording program is prevented from doing any processing until the current execution completes. In other words, upon sending a message, the sender PCS waits until the receiving engine processes it and returns the result of the process execution. The workflow engine, upon receiving the request, processes the inputs, and it is assumed that the partner Web Services are invoked synchronously based on the abstract process description.

The provenance model provides a GUI for the PCS component through which the user instantiates the process execution; this in turn activates the PCS to capture and record essential and reusable provenance information about the process. The PCS component makes use of the predefined structure to represent such provenance and records it in the provenance database.

Adopting the synchronous approach shown in Figure 3.5, the provenance of a Web process is recorded in two phases.

• Initially the collection begins with the submission of data, input parameters, the process script file, and the URI address of the composite process through the GUI interface. The PCS is activated and creates a unique identifier for the process that is recorded in the provenance database. The recording program starts collecting information received via the interface, labels it appropriately, and stores it in memory along with the unique process identifier. Then it sends a message to the engine to start the execution of the process, along with all the required inputs. The PCS then waits for the engine to return the result of the process execution. Once the PCS receives the result from the workflow engine, it stores the result in memory and then returns this result to the GUI interface. With this event completed, an internal thread is triggered within the PCS that starts recording the captured provenance in the provenance database for the given unique process identifier.

[Figure 3.5 (sequence diagram): the GUI Interface sends the input data to the PCS, which invokes the Workflow Engine; the engine exchanges request/response messages with the Web Service, keeping copies of the messages, and returns the final result; the PCS then requests the copied messages and submits the provenance for the recorded process identifier to the Provenance Database.]

Figure 3.5: Interactions between Components

• The result returned by the workflow engine is the final result of the process that is executed. As discussed in section 3.3.1, the engine itself acts as a client that interacts with and invokes autonomous services incorporated within the abstract process description. The data that are exchanged between the engine and the services for a process are the intermediate data that need to be captured and recorded in the provenance database in the standard format. An engine plug-in is deployed that functions as a middleware service consisting of (1) a data collector that locally records a copy of all the messages that are exchanged during the engine's interactions with the services involved in a given process, and (2) a collector interface that provides an interface to allow communication between the engine and the PCS component, for example, for querying the captured messages. Once the procedure of returning the final result to the GUI interface and recording it in the provenance database is completed, the PCS starts to query the middleware service residing in the engine for the copy of the recorded messages. The message copies are sent by the middleware service to the PCS. The PCS then (1) gets the most recent (last recorded) process identifier from the provenance database, and (2) records such intermediate messages of the process.

The provenance model uses a standard structure to represent process documentation, which determines the approach that results in the most effective recording. The above discussion illustrates the operation of the PCS component of the provenance model that is involved in recording the process documentation. However, simply recording the provenance may not be sufficient for a successful provenance model.

The effective exploitation of the recorded provenance is as essential as its existence.

Without mechanisms for exploiting provenance information, the question of why we need to record provenance would remain unanswered. Consider the case where a process is required to be re-executed to check the validity of a dataset generated in the past. In this case, in order for re-execution for validation purposes to succeed, the provenance recorded during the original execution must be sufficient for the task. When re-executing past processes, using exactly the same data, parameters and services (provided by the recorded provenance data) may result in a different set of outputs, which may be due, for example, to modified algorithms or services. This may lead to various uncertain conclusions. The ability to provide appropriate and effective explanations of the results generated from the re-execution task would give a basis for any further actions that need to be taken. For example, by comparing the intermediate outputs generated from the re-execution task with the original records of the process, one can identify which intermediate data differ, i.e., identify the service(s) affecting the final output during the re-execution. Thus, various kinds of reasoning on the recorded provenance involve querying and comparing information that provides a certain level of explanation. In the following section we formalize the query interfaces within the provenance model, which facilitate the use of recorded provenance in different ways.

3.3.3 Provenance Querying and Reasoning

We discuss the ability to reason about and exploit the recorded provenance of a Web process in the provenance model, provided by the Provenance Query Service (PQS). As discussed above, this in turn forms the basis for subsequent use of the provenance information for a workflow execution. The PQS supports two query interfaces: (1) the process provenance query interface, through which the contents of an identified process-Provenance can be retrieved, and (2) the provenance reasoning query interface, which allows the querying user to retrieve the provenance of application entities. In this section we introduce these two interfaces.

Process Provenance Query Interface

To retrieve the actual process provenance that makes up the provenance of a Web process instance, the process provenance query interface is used. This interface gives direct access to the process documentation contents, by allowing the querying user to search over, and retrieve parts of, the process-Provenance. Provenance query results include the related service-Provenance data keys.

The process provenance query interface allows the querying user to perform the following operations:

• Retrieve the unique identifiers for all the process instances.

• Retrieve the contents of a process-Provenance for a given unique process identifier.

• Retrieve all the service-Provenance recorded, based on the unique identifier of a service.

• Retrieve all the dataflow in the service-Provenance recorded with a given unique process identifier or a service identifier.

The actual results of a query depend on the contents of the provenance database to which the query is sent, because the query will only return data contained in that provenance database, and on the access control restrictions placed on the querying user by the database. As the amount of data returned may be large, the process provenance query interface should allow for the iterative retrieval of query results. By this mechanism, a querying user should be able to process the results in manageable chunks.
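For illustration, such chunked retrieval could be realised along the following lines in Java; the class, interface and method names here are hypothetical and not part of the implemented prototype.

    import java.util.List;

    /** A minimal sketch of chunked result delivery; names are illustrative. */
    public class ChunkedResults {

        /** Callback through which the querying user consumes one manageable chunk at a time. */
        public interface ChunkConsumer<T> {
            void consume(List<T> chunk);
        }

        /** Hands a (potentially large) result list to the consumer in fixed-size chunks. */
        public static <T> void deliver(List<T> results, int chunkSize, ChunkConsumer<T> consumer) {
            for (int from = 0; from < results.size(); from += chunkSize) {
                int to = Math.min(from + chunkSize, results.size());
                consumer.consume(results.subList(from, to));
            }
        }
    }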

[Figure 3.6 (sequence diagram): the GUI interface sends a queryRequest to the PQS, which forwards it to the Provenance Database and receives the query results; the PQS then returns the result data to the GUI interface, where a display mechanism presents the query results.]

Figure 3.6: Interactions between Query Components

In addition to this, the process provenance query interface includes a query display mechanism to transform the process provenance query results and to display them to the user in a way that they can most easily process. Three main methods of displaying the queried process provenance results are provided, based on the previously discussed query operations: (1) simple and structured textual display; (2) tree display; and (3) a graphical display for recreating the high-level behaviour of the process.

Figure 3.6 depicts the interactions of the PQS with other components when used by the querying user. Temporary storage is created locally at the PQS during a query to handle the large query results appropriately.

Ideally, the process provenance query interface should allow more than the above minimum operations, so that queries can be used to search for and retrieve more process-Provenance data meeting different criteria, e.g., to retrieve all identifiers and descriptions of process-Provenance of a particular type. The process provenance query interface is not specified more fully in this thesis because there is a range of query languages already available that can be used to query a set of stored data, and the ideal one, and the different types of queries, will depend on the application domain and the format of the process-Provenance.

3.4 Summary

In this chapter we have presented a provenance model that is driven by two components that support provenance requirements in a Service-Oriented Architecture. The PCS component performs the collection of data provenance about Web process enactments, and stores it in a provenance database using a predefined provenance data structure. The PCS forms the basis for specifying provenance about the results produced from executions of processes in a service-based environment. There is an established need for documenting the processing history of datasets produced in such an environment, and this is a significant contribution of the provenance model.

The Provenance Query Service (PQS) provides querying interfaces with respect to the provenance structure. Existing work on provenance recording frameworks in a Service-Oriented Architecture provides a more distributed recording and storage approach [55], where the services interacting with each other are responsible for recording their interactions. Modelling the work in this way, based on interactions, suffers from the disadvantage of requiring the creation of a very complex query to retrieve a provenance trace (i.e., a process provenance) or other simple information. This is due to the lack of an indexing mechanism in their provenance information storage system. Our model incorporates a simple provenance representation and recording mechanism to support the application of simple and effective queries on the stored provenance.

This chapter has presented the functioning of the provenance model and its provenance representation strategy for Web processes. We have discussed the interactions between the components of our provenance model.

The question that we have not addressed thus far is how the identified provenance representation about a process can be modelled, that is, the data structure used to define each type of provenance. Thus, in the next chapter we first present the provenance model to define types of provenance about a process to produce a provenance structure, and then discuss the provenance collection process of the PCS that allocates the collected provenance to the appropriate provenance types defined in this thesis.

Chapter 4

Provenance Representation and Capture in SOAs

4.1 Introduction

In chapter 3 we presented our provenance model, which consists of a service for enabling the capture of provenance in a standard form during the enactment of a workflow or process in a service-oriented environment. The provenance model supports the need for provenance handling capabilities by:

• Identifying the architectural need for automation in capturing the provenance of workflows enacted in an SOA.

• Using a standard form to represent and record the captured provenance.

• Using a combination of the client-server model and the Web service model to capture and record provenance.

The provenance model addresses the need for capturing and recording provenance in a Web services environment by providing mechanisms for capturing, representing and recording the provenance of workflows in a standard way. The key to our model is its ability to automatically and accurately capture the provenance information of enacted workflows in a standard form so that the concrete description of a workflow can be retrieved using the provenance information.

In section 3.3.1, the representation of provenance was discussed, and the notion of service-Provenance was introduced and defined as the captured provenance information of a service instance that pertains to a process. In addition, process-Provenance was defined as an instance of a process composed from one or more service instances.

Two different constituents of service-Provenance were identified: service activity and message contents. As part of process-Provenance, data link was also identified. This section considers how these representations of provenance can be modelled, i.e., the data structure used to represent each type of provenance is defined. Based on the data models, a common structure is produced to structure the documentation of a process in the provenance database.

The provenance collecting and recording functionalities of the PCS are presented using the provenance model. These components were discussed in chapter 3, which also focused on their interactions.

This chapter is organised as follows. Section 4.2 presents the provenance model that is used to represent process documentation in a standard format. Having presented the provenance structure, Section 4.3 discusses how this structure is used when collection and recording is performed for a running or executed process (discussed in Chapter 3).

4.2 Provenance Modelling

It should be noted that, in general, models used to represent provenance could be defined in different languages, such as XML or RDF [87]. To depict the models, we adopt the graphical representation of RDF schema which is also the method we use to encode the structure to represent the documentation of a process. The models are explained with a simple example data model to illustrate the process documentation structure and use of RDF schema.

[Figure 4.1 (data model): the process instance http://myExample.com/BpelProcess/PR1 is linked by hasServiceInstance to http://myExample.com/DustCloudService; each carries a description ("Example process" and "Generates dust cloud model data with a specified density and distribution function such as Gaussian", respectively). The lower half shows the same model as an rdf:Statement with subject, predicate and object.]

Figure 4.1: Data model mapped to RDF statements of subject-predicate-object form

Example data model: The top half of Figure 4.1 presents a data model denoting the asserted provenance for an instance of a process with a service and related descriptions. This can be mapped to an RDF model, which is based upon the idea of making statements about resources in the form of a subject-predicate-object expression, called a triple in RDF terminology.

The subject of an RDF statement is a resource, possibly named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes (Bnodes) or anonymous nodes; they are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal. In the lower half of Figure 4.1, the above example data model is mapped to a simple RDF statement stating that a service, named DustCloudService, has been invoked within a process named PR1.

[Figure 4.2 (RDFS schema): the classes pd:processProvenance, pd:serviceInstanceProfile, pd:DataLink and sd:serviceProvenance and the properties pd:hasServiceInstance, pd:hasDataLink, pd:description and sd:description, shown as instances/subclasses of rdfs:Class, rdfs:Property and rdfs:Resource, with the PR1/DustCloudService example as instances.]

Figure 4.2: RDFS data model

Figure 4.2 is a graphical illustration of the process documentation schema modelled using an RDFS data model. Here, the instance PR1 is defined to be of rdf:type pd:processProvenance, the DustCloudService instance is of rdf:type sd:serviceProvenance, and properties are defined such as pd:hasServiceInstance, where pd and sd are prefixes used to identify the URI of the process documentation namespace.

Thus in RDF, an English statement could be expressed using URIs to identify:

• A process instance, e.g., PR1, identified by http://myExample.com/BpelProcess/PR1

• Kinds of things, e.g., process provenance, identified by pd:processProvenance

• Properties of those things, e.g., the service instance and description, identified by pd:hasServiceInstance and sd:description, respectively.

• Values of those properties, e.g., Generate dust cloud... as the value of the description property of DustCloudService.

The same approach is applied to document a process which invokes more than one service, by applying the property pd:hasServiceInstance. To incorporate the link established between the service instances based on the flow of data in the process, a property pd:hasDataLink is defined. The data link information for a process is associated with the service instances. This is the initial structure in which all the models that are described later in this section are contained, i.e., a number of properties are described to structure the documentation of a process. The structure allows the identification of the context in which provenance capture of a service invocation is made and its association with a process. The structure organizes the service-Provenance in a manner that allows the provenance of a piece of data to be retrieved. Thus, the captured provenance of a process is recorded in the provenance database using this structure.

[Figure 4.3 (RDF schema): the properties pd:processId (rdfs:domain pd:processProvenance, rdfs:range rdfs:Literal), sd:serviceId (rdfs:domain sd:serviceProvenance, rdfs:range rdfs:Literal) and pd:hasServiceInstance (rdfs:domain pd:serviceInstanceProfile, rdfs:range pd:processProvenance), the latter recording the sequence of services in the process.]

Figure 4.3: Model for identifying service instance of a process

4.2.1 Identifying the service-Provenance of a process

Each service-Provenance is defined in the context of a service invocation associated with a process. A service activity consists of the operation and state of a specific service instance, the message contents consist of the details of the data received and sent by a service instance, and the data link contains information that relates a service to the other services to which data flows. Therefore, in order to model the different parts of service-Provenance and process-Provenance, a way to relate service-Provenance to a process must first be identified.

In Figure 4.3, we specify our model for referring to a process by a pd:processId and a service by an sd:serviceId, both as RDF properties. A process-Provenance is identified by the address (URI) of the process, which sends and receives interaction messages with the involved services, and the pd:processId. A particular invoked service is related to the process through the relationship pd:hasServiceInstance. The service-Provenance is identified by the service's address (URI location) and the sd:serviceId. An instance (in RDF triple form) of this data structure must be present for a process instance and for every service-Provenance captured and recorded in the provenance database.

Definition (Process Identifier): A process identifier is a globally unique value for identifying a given process instance.

Definition (Service Identifier): A service identifier is a globally unique value for identifying each service instance associated with a process.

The unique IDs are universally unique identifiers (UUIDs) and are generated using methods that format UUIDs according to the DCE UUID convention [6].
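For illustration, such identifiers can be produced in Java with the standard UUID class; this is a sketch, and the exact generation method used in the prototype follows the DCE convention cited above.

    import java.util.UUID;

    /** Illustrative generator for process and service identifiers. */
    public class Identifiers {

        /** Returns a new globally unique identifier as a string. */
        public static String newId() {
            // java.util.UUID produces RFC 4122 identifiers; the prototype formats
            // its identifiers according to the DCE UUID convention [6].
            return UUID.randomUUID().toString();
        }
    }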

4.2.2 Identifying process-Provenance

Before discussing the constituent parts of service instances pertaining to a process that is being captured and recorded, the PCS must also retain the information about the abstract process description and its creator. As discussed in Chapter 3, this meta-information is useful for matching the concrete process that is documented with the abstract process.

This meta-information is defined as RDF properties that exist in the pd:processProvenance domain and have typed values, e.g., rdfs:range rdfs:Literal, as depicted in Figure 4.4. The pd:creator may consist of the name of the abstract process creator, and pd:creatorID is the unique identity given to that creator. The pd:abstractProcessLocation gives the location of the file containing the abstract description within the provenance database. The instance of the abstract description must be recorded separately in the provenance database when the process is instantiated by the PCS.

[Figure 4.4 (RDF schema): the properties pd:creator, pd:creatorID and pd:abstractProcessLocation, each with rdfs:domain pd:processProvenance and rdfs:range rdfs:Literal.]

Figure 4.4: Model for an abstract process

Note that a process instance itself is a composite service, and thus conforms to the service-Provenance structure, particularly the sd:ServiceActivity and sd:MessageContents that are discussed below. Therefore, in addition to recording the process-specific information, the PCS must also record provenance-specific information about the process in the context of a service.

4.2.3 Identifying service-Provenance

As described in Chapter 3, the PCS must record (and may exchange with the invoked services) provenance-specific information relating to a particular process and its constituent services, for the process documentation to be usable when querying. For example, the PCS must use a unique identifier and the address of a service for certain interactions with the service. For this purpose, such information can be created during each service instantiation and a link created to the corresponding process instance.

[Figure 4.5 (RDF schema): the classes sd:ServiceActivity and sd:MessageContents, each an rdfs:subClassOf sd:serviceProvenance.]

Figure 4.5: Model for identifying constituents of a service instance

A service instance contains an sd:serviceId and a number of properties that are defined within two categories of service-Provenance. These two parts of service-Provenance are defined as being of RDF type rdfs:Class, and are sub-classes of sd:serviceProvenance, as depicted in Figure 4.5.

Identifying Service Activity

The class sd:ServiceActivity contains properties to define the state and service-specific static information for a particular service instance, such as the service interface URL and the start and end times of an interaction. Figure 4.6 outlines the structure for representing such static information. The sd:wsdlURL contains the location of the service interface, sd:serviceName contains the name defined for that service, and sd:serviceOperationName contains the operation executed in the context of the service instance. It should be noted that a service may have more than one operation, but only one is invoked for a service instance, and the PCS must be able to capture and record this.

[Figure 4.6 (RDF schema): the properties sd:serviceName, sd:wsdlURL and sd:serviceOperationName, each with rdfs:domain sd:ServiceActivity and rdfs:range rdfs:Literal.]

Figure 4.6: Model for service activity

The PCS must also capture the dynamic information, such as the time of the service invocation made by the engine and the time it received the response. In a similar manner, this structure is defined as shown in Figure 4.7. The sd:serviceStatus must be assigned a value based on the response by the workflow engine in its interaction with the service. This type of dynamic provenance information must also be recorded for a process instance by mapping the structure in Figure 4.7 to the pd:processProvenance domain.

[Figure 4.7 (RDF schema): the properties sd:startTime, sd:endTime and sd:serviceStatus, each with rdfs:domain sd:ServiceActivity and rdfs:range rdfs:Literal.]

Figure 4.7: Model for Service Activity

Identifying Message Contents

Each service instance consists of inputs and outputs that are contained in the messages exchanged during the interactions with the workflow engine. As discussed in Chapter 3, a reply from an interaction with a service may not contain the actual data but a reference pointing to them. Thus, the message contents are modelled in such a way as to support the ability to include: (1) a copy of the actual messages exchanged, for example, a SOAP message; and (2) specific data within the messages to identify the dataflow. This section discusses the former, which is mandatory (particularly for a process where the initial inputs and the final outputs are necessary, for example to perform a what-if analysis using the process). Note that the copy of a SOAP message may also contain attachments. We are not concerned with processing the messages but only with retaining a copy.

[Figure 4.8 (RDF schema): the properties sd:inputDataset and sd:outputDataset (rdfs:domain sd:MessageContents, rdfs:range rdfs:Resource) and sd:inputContent and sd:outputContent (rdfs:domain sd:MessageContents, rdfs:range rdfs:Literal).]

Figure 4.8: Model for the I/O messages

First, inputs and outputs are defined for a service-Provenance as RDFS properties sd:inputDataset and sd:outputDataset, contained in domain sd:MessageContents and with rdfs:range resource, as depicted in Figure 4.8. Using a blank node, each is linked to the input and output messages for a service instance, whose structure is also depicted in Figure 4.8, where two RDFS properties sd:inputContents and sd:outputContents are defined that have rdfs:range literal within the sd:MessageContents domain. The literals must contain values that are copies of the actual messages exchanged for a service instance during its interactions with the engine. The same must also be defined for a process instance, as the process itself is a composite service that consists of input and output messages.

[Figure 4.9 (RDF schema): the properties sd:inputSourceIs, sd:inputId, sd:outputTargetIs and sd:outputId, each with rdfs:domain pd:DataLink and rdfs:range rdfs:Literal.]

Figure 4.9: Model to identify the flow of data

4.2.4 Identifying Data Link

The context information to identify the flow of data conforms to the inputs and outputs of the service instance, and the structure is shown in Figures 4.9, 4.10 and 4.11. As discussed in section 3.3.1, for a service instance, a message received during an interaction with the engine may contain data inputs from different sources. Section 3.3.1 also described how data flow can be determined from such information.

[Figure 4.10 (RDF schema): the properties sd:inputName, sd:inputType and sd:inputValue, each with rdfs:domain sd:inputDataset and rdfs:range rdfs:Literal.]

Figure 4.10: Model to identify input data

Each input in such a message must be identified with sd:inputName, sd:inputType and sd:inputValue of domain sd:inputDataset, depicted in Figure 4.10. The sd:inputValue must hold the actual data or parameters of the input message. In order to provide the data link, each input data must be identified with sd:inputId and sd:inputSourceIs of domain pd:DataLink (Figure 4.9). The sd:inputSourceIs must contain the address of the sender of the data, and sd:inputId must have a unique identifier for that data. Similarly for sd:outputValue, the sd:outputTargetIs and sd:outputId must have the address of the data receiver and the unique key, respectively.

The structure defined differentiates data from different sources and targets contained in the input and output messages for that service instance. By querying and matching this information, particularly sd:inputId and sd:inputSourceIs with the sd:outputId and sd:outputTargetIs of the other service instances, one can determine the dataflow.

[Figure 4.11 (RDF schema): the properties sd:outputName, sd:outputType and sd:outputValue, each with rdfs:domain sd:outputDataset and rdfs:range rdfs:Literal.]

Figure 4.11: Model to identify output data

Thus, during recording, the PCS must use the same unique identifier for a data item that is received from one service (source) and sent to another service (target). From this the flow of the data can be identified. Specifically for this purpose, such information can be created during each service instantiation, and the data flow link identified by comparing the data in an input message with all the data in the messages received from previous invocations.

4.2.5 Provenance Format (p-format)

Up to this point the architecture has assumed that the Provenance Database contains a collection of RDF statements defining the documentation of a process. This collection can be viewed as a structure, or format, that defines how different service instances are related to a process instance, and identifies how a piece of data is derived from a process. Different RDF properties represent process and service instance-specific provenance information as predicates in RDF statements. These are structured in such a way that all the properties link the information collected to the appropriate pd:processProvenance or sd:serviceProvenance domain. Thus, the PCS must record in the Provenance Database the provenance format (p-format) that depicts a process instance with all its linked statements.

4.3 Provenance Collection Service

In section 3.3.2 a middleware service was introduced, and the interactions of the PCS components in collecting and recording the provenance of a Web process were discussed. This section discusses the recording functionalities of the PCS supported by the provenance server, and the collection process of the middleware service.

4.3.1 Provenance Recording Interface

The PCS consists of a provenance recording interface for recording the provenance collected about a process, in the standard format, in the Provenance Database. As discussed in chapter 3, recording occurs in two phases on the client side where provenance is collected. In the first phase, information about the process invocation itself is collected. In the second phase, information about the invocations of the services involved in the process is collected. The PCS provides an interface that collects manually entered data about a process, and also initiates the process execution. The PCS also collects and records the input data and the resultant data returned by the process. For a process composed of services, a middleware component named the Provenance Collector gathers data about each service invocation.

[Figure 4.12: the workflow engine invoking the DustCloudService (op1) and FFT services for process PR1, with the Provenance Collector writing each invocation's data to a temporary log file between the start and finish of the process; a log entry records, e.g., the service's unique ID (upid:A360FC50-4412-11DB-86A8-B697014E3974), its name (DustCloudService), its WSDL location (http://192.168.0.3:8080/axis/services/DustCloud?wsdl), timestamps (Thu Aug 24 22:04:50 BST 2006), and a copy of the returned dustcloudReturn data.]

Figure 4.12: Asynchronous data collection for process execution

The example process PR1 in Figure 4.12 consists of two services, DustCloud and FFT, that expose operations (methods) op1 and op2, respectively. The DustCloud service takes one data set (consisting of three input parameters) and produces one data set as output. The FFT service takes two data sets (where d2 is an input parameter set in the abstract process) and produces one data set. The dataflow is represented by an abstract process that is executed by the workflow engine. Each service is invoked by the workflow engine and the flow of data is mediated through the engine. All invocations and dataflow activities during the execution of a process occur at the boundaries that appear between the start and finish of the process depicted in Figure 4.12.

Figure 4.12 depicts a Provenance Collector that collects the required data about the invocations of a process execution. The Provenance Collector asynchronously collects the messages exchanged during service invocation and other data relevant to the invocation. This is done by intercepting and instrumenting Web service requests and responses and writing information about the Web services to a temporary local log file. The temporary log file is generated for a particular process execution and consists of data for all service invocations for that process. The Provenance Collector is implemented as an Axis handler that is installed into the Axis-based workflow engine (application server) that is hosting the monitored Web process. This handler is given control when the engine client application invokes a Web service, in order to capture the intermediate data produced for the process.
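A minimal sketch of such a handler is given below (Axis 1.x API); the class name and log-file handling are illustrative rather than the prototype's exact code.

    import java.io.FileWriter;

    import org.apache.axis.AxisFault;
    import org.apache.axis.MessageContext;
    import org.apache.axis.handlers.BasicHandler;

    /** Intercepts each Web service request/response and appends a copy to a temporary log file. */
    public class ProvenanceCollectorHandler extends BasicHandler {

        public void invoke(MessageContext msgContext) throws AxisFault {
            try {
                // The SOAP message currently flowing through the engine (request or response).
                String soapCopy = msgContext.getCurrentMessage().getSOAPPartAsString();
                FileWriter log = new FileWriter("provenance-tmp.log", true); // append mode
                log.write(soapCopy);
                log.write('\n');
                log.close();
            } catch (Exception e) {
                throw AxisFault.makeFault(e); // surface the failure as an AxisFault
            }
        }
    }

In a deployment, the handler would be registered in the engine's request and response flows through the Axis deployment descriptor.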

The time dimension allows the temporal ordering of the service invocations as they occur, and makes local log file construction independent of the order in which the invocations are propagated by the Provenance Collector. In the absence of a single globally synchronized clock, we determine the time dimension by associating a logical time stamp, which is a counter called the process invokeStep, as an element for each invocation. The central workflow engine maintains this invokeStep and assigns it for each service invocation event as it occurs. The invokeStep is an integer that has no relation to absolute time. Each process has a "logical clock" that starts at 0 and is incremented by 1 on each service invocation event. This sequences the service invocations in the process example in Figure 4.12 as PR1-DustCloud-FFT, as seen along the time axis of the invocation chain.

[Fragment of the temporary log entry table: XML element invokeStep = 1, serviceName = DustCloudService.]

[Figure 4.14 (RDF graph): uri:MessageContents resources for the DustCloudService and FFTService instances, each linked to uri:inputDataset and uri:outputDataset nodes holding copies of the input and output messages, together with their uri:inputId/uri:outputId and uri:inputSourceIs/uri:outputTargetIs values.]

Figure 4.14: RDF instance representing the p-format for sample process PR1

[Figure 4.15 (RDF graph): the process instance uri:PR1 (http://myExample.com/BpelProcess/PR1), identified by uri:processId, connected by uri:hasDataLink statements directly to the input and output datasets of the DustCloudService and FFTService instances, with their inputId/outputId and inputSourceIs/outputTargetIs values.]

Figure 4.15: Representation of hasDataLink property in the p-format for sample process PR1

The recording via the PCS is designed to be synchronous, so that the process-Provenance and service-Provenance are recorded to the selected Provenance Database after the completion of the process execution. The process-Provenance and its related service-Provenance (i.e., a p-format) are stored in the Provenance Database as an RDF "model" identified with a unique name that is the same as the globally unique identifier created for that process. This allows, for a given unique model name, the p-format for the process with that identifier to be retrieved. The Jena Semantic Web Toolkit [36] for the Java platform was used to create and record such an RDF model representing the p-format. Figure 4.14 depicts an RDF instance representing the provenance format for a part of the example process PR1 (Figure 4.12), showing how the resources are connected with properties defining relationships between the two service instances. These statements represent the process-based view that can also be related to a data-derivation view. The hasDataLink property is defined to establish a direct map to the datasets of the service instances in the process. Figure 4.15 shows how the hasDataLink is used in the p-format to provide a map to the datasets of the service instances. Here, the RDF statements representing the datasets are not repeated to assign the data link for that process; instead, additional RDF statements are created that map the data link for the process directly to the datasets. This allows direct access to the necessary information when constructing the dataflow between the services within the process.

4.4 Summary

This chapter first discussed our proposal to model the provenance of a service-based application. We presented a model for how process-Provenance and service-Provenance, and the data contained in them, can be identified, as well as a model of dataflow that is separate from the interactions between the engine and the services. The use of RDF triples was proposed, together with the vocabulary of properties that expresses a process instance, the service instances contained in it, and the relationships in the intermediate data of the process instance. The complete structure with the collection of RDF properties that facilitates the specification of the provenance data for a process is viewed as the provenance format or p-format. The provenance format provides a common knowledge base to represent the provenance about an enacted process, so the PCS and PQS can utilize this common p-format for recording and querying the provenance information, respectively. The RDF schemas for both the service-Provenance and process-Provenance (i.e., the p-format) are attached as Appendix A, and have been checked using a trial version of the Altova SemanticWorks tool [15] for validity and well-formedness.

The PCS has also been discussed in this chapter, particularly the Provenance Collector, which intercepts the intermediate data produced by service invocations for a process execution and records them in a temporary XML log file. We demonstrated with an example process how the XML log file is processed by the PCS to map the service instance data to RDF triples of the p-format, and discussed formulating RDF triples to represent the data link associated with the process. The service-Provenance XML schema is attached as Appendix B of this thesis, and was created using the XMLSpy tool [14].

In summary, we have presented a provenance representation model for an SOA that is referred to as the p-format. We have also discussed the provenance recording interface of the PCS component, focusing on the functionality of the Provenance Collector that collects service invocation logs for processes, and how these logs are mapped to the p-format standard, which includes the data link information, so as to record them in the Provenance Database. Having addressed the provenance representation standard and recording needs in an SOA, we now focus on the provenance query requirements for using the recorded provenance about processes, by specifying the functional requirements of the query interfaces of the PQS.

Chapter 5

Provenance Querying and Analysis Tool

5.1 Introduction

Chapter 3 introduced the Provenance Query Service (PQS) and the functionality of its query interfaces: (1) the process provenance query interface; and (2) the provenance reasoning query interface. In this chapter, the functional requirements of the process provenance query interface and the provenance reasoning interface are specified. The query interface provides support for querying RDF data using SPARQL [43].

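For illustration, the sketch below executes a SPARQL query against recorded provenance using Jena, retrieving process identifiers and descriptions; the query and property names are simplified assumptions rather than the prototype's exact vocabulary.

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Model;

    public class ProcessProvenanceQuery {

        /** Lists every recorded process instance with its identifier and description. */
        public static void listProcesses(Model provenanceDb) {
            String q =
                "PREFIX pd: <http://www.cs.cf.ac.uk/user/S.Rajbhandari/provenance/> \n"
              + "SELECT ?process ?id ?description WHERE { \n"
              + "  ?process pd:processId ?id . \n"
              + "  OPTIONAL { ?process pd:description ?description } \n"
              + "}";
            QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), provenanceDb);
            try {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.nextSolution();
                    System.out.println(row.get("id") + "  " + row.get("description"));
                }
            } finally {
                qe.close();
            }
        }
    }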

5.2 Process Provenance Query Interface

The process provenance query interface is used to retrieve the documentation of a process that makes up the provenance of a workflow result. It provides access for the user to perform two types of queries on the selected provenance database:

• retrieve all the identifiers and descriptions for all the process provenance in the provenance database.

• retrieve a process provenance with a specific identifier provided by the querying user.

This allows a user to access the globally unique identifiers and descriptions of all the process instances present in the provenance database, so that the user can identify and retrieve the p-format for a specific process. The retrieved p-format provenance data that describes a process instance can be used to: (1) construct a process provenance graph to visualize the previously executed process in a way that the querying user can interpret appropriately; and (2) re-execute the process to verify the result of the execution instance for that process.

5.2.1 Process Provenance Graph Construction

Constructing a graph of activities after their occurrence is important because it allows a user to reason about the high-level behaviour of the workflow and the interconnection between services and data flow within and across a process. In this section we describe the algorithm by which provenance graphs can be constructed from a p-format describing the series of tasks performed during a process instance.

The p-format that is queried with a given unique identifier is based on RDF triple statements. Given access to the process execution instance containing all the statements (i.e., a process-Provenance and one or more service-Provenance instances), it is possible for the PQS to construct the process graph for the execution. A process graph reconstructed from provenance data may be represented as a directed acyclic graph. The nodes of the process provenance graph are services, and data sets form the edges representing the dataflow (within the workflow engine and according to the abstract process description). Thus, the edges, according to their direction arrow, represent data consumed or produced by that service. Additional information present in the service instances is represented as attributes of the nodes and edges.

Thus, the algorithm to construct a process provenance graph using the p-format is similar to an algorithm used to construct a graph given a set of node-edge pairs. Given a retrieved p-format for the process, the process provenance graph is constructed by identifying the service instances and the data links between them (as formulated and discussed in chapter 4). Suppose that, for a given process instance's unique identifier, we query a set of service-Provenance instances (SPI), where each service-Provenance instance sp has a set of inputs I and outputs O, as follows:

SPI = {sp_1, sp_2, ..., sp_n}
I = {spin_1, spin_2, ..., spin_m}
O = {spot_1, spot_2, ..., spot_o}

where each sp_i (1 ≤ i ≤ n) is a service instance that has spin_j (1 ≤ j ≤ m) and spot_k (1 ≤ k ≤ o) as its inputs and outputs, respectively. For the purpose of constructing the process provenance graph, a service instance sp and its inputs spin and outputs spot are represented as follows:

sp = (sn, sid, sw, sop)
spin = (inm, it, iv, iid, isr)
spot = (on, ot, ov, oid, otg)

where sn is the service name, sid is the unique identifier of the service, sw is the WSDL location, and sop is the service operation. For each input of a service instance, inm is the input name, it is the input type, iv is the actual input value, iid is the ID for the input, and isr is the source of the input. Similarly, for each output, on is the output name, ot is the output type, ov is the actual output value, oid is the ID for the output, and otg is the target of the output. We refer to these components using dot notation; that is, sp.sn refers to the service name of sp, and spin.inm refers to an input name for the service instance sp.

Our task of constructing a process provenance graph from this information is a two-step procedure. First, nodes (i.e., rectangular boxes) are constructed referring to the service instances, identified by service name and the other components. Thereafter, edges are constructed corresponding to the nodes by referring to the components of the inputs and outputs of the service instances. These two steps are discussed as follows:

1. Construct graph nodes for all the sp. The process provenance graph algorithm will first query all the service instances for a given process instance and create nodes that are displayed as rectangular boxes with all the components of the service instances. Constructing nodes is a straightforward process. The following procedure explains how this is performed by a part of the process graph algorithm:

       variables: SPI = {sp_1, ..., sp_n}, Node

       1 SET Node to an empty set
       2 FOR each sp_i (1 ≤ i ≤ n)
       3     Node = Node ∪ {createNode(sp_i.sn, sp_i.sid, sp_i.sw, sp_i.sop)}
       4 RETURN Node

   The algorithm first traverses the set of service instances and creates a node for each by attaching all its components, e.g., the service ID that allows each node to be uniquely identified (line 3). This process continues until all the nodes are created.

2. Determine and construct graph edges connecting the nodes. In order to construct the edges that connect the nodes (created in Step 1) to depict the high-level behaviour of the process, it is necessary to traverse the inputs and outputs of each service instance. With reference to Figure 3.4 of chapter 3, we produce Figure 5.1 as an example to explain this step. In Figure 5.1, the three service instances are represented with their corresponding unique IDs. It is assumed that the unique IDs of the service instances are used as the values to represent the corresponding source of an input and target of an output. Consider the service instance Service B, uniquely identified with ID ServiceB_uid:3 and having two spin (i.e., input data), d2 and d3. The inputs have spin.iid (i.e., input ID component) values i2 and i3, respectively, and spin.isr (i.e., input source component) values Engine_uid:1 and ServiceA_uid:2, respectively.

[Figure 5.1 (example): the instances Engine_uid:1, ServiceA_uid:2 and ServiceB_uid:3 with labelled data items, e.g., ServiceB's input d2 has id = i2 and source = Engine_uid:1, while its input d3 has id = i3 and source = ServiceA_uid:2.]

Figure 5.1: Representation of data links of service instances within a process instance, by identifying each input and output for a service instance with its source and target, respectively, and the data IDs.

Each input's components, specified in each service instance Node sp, are used to determine which of the service instance Nodes created in Step 1 matches as the source of the input for that Node sp. This constructs a link between the two service instance Nodes. That is, the process graph algorithm performs the following:

   variables: Node = {sp_1, sp_2, ..., sp_n}, I = {spin_1, spin_2, ..., spin_m},
              O = {spot_1, spot_2, ..., spot_o}, Edge

   1  SET Edge to an empty set
   2  FOR each sp_i (1 ≤ i ≤ n), where n = number of nodes
   3      targetNode = getNode(sp_i.sid, sp_i.sn)
   4      FOR each spin_j of sp_i (1 ≤ j ≤ m)
   5          IF (spin_j.isr == sp_i'.sid, for some 1 ≤ i' ≤ n)
   6              THEN IF (sp_i.sid == spot_k'.otg AND spin_j.iid == spot_k'.oid,
                          for some spot_k' of sp_i', 1 ≤ k' ≤ o)
   7                  THEN src = sp_i'.sid AND trg = sp_i.sid
   8                      data = spin_j.iv
   9                      sourceNode = getNode(src, sp_i'.sn)
   10                     Edge = Edge ∪ {createEdge(sourceNode, targetNode, data)}
   11 DISPLAY Node AND Edge
   12 STOP

where s_i^p.spin_j.isr == s_i'^p.sid means that each input source value for a Node (targetNode in line 3) is compared with the sid of all the Nodes and must be equal to one of them (line 5). For example, if an input d3 of a particular service Node with ID = ServiceB_uid:3 has source = ServiceA_uid:2, then the Node with the ID ServiceA_uid:2 is selected at this stage as a source Node. Once a matching source Node is selected, s_i^p.sid == s_i'^p.spot_k'.otg and s_i^p.spin_j.iid == s_i'^p.spot_k'.oid express the fact that the service ID and the input ID for that targetNode must be equal to one of the output target and output ID values of the matched source Node (line 6). For example, the input data d3 of the Node with ID = ServiceB_uid:3 has input ID = i3; the output of the Node with ID = ServiceA_uid:2 that has target = ServiceB_uid:3 and ID = i3 then confirms that the source of this input data is the Node with ID = ServiceA_uid:2. The edge is constructed with the identified source and target nodes for that data (line 10). This is a fairly straightforward process, assuming that the algorithm handles any duplicate edges that may occur during the edge creation and display.
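To make the matching rules above concrete, the following Java sketch reproduces the edge-construction step. It is a minimal sketch: the ServiceInstance, Node and Edge types and their fields are hypothetical stand-ins for the p-format components used in the algorithm (sid, sn, iid, isr, iv, oid, otg), not the thesis's actual classes.

    import java.util.*;

    // Hypothetical classes mirroring the p-format components used in the algorithm.
    class Input  { String iid, isr, iv; }   // input ID, input source, input value
    class Output { String oid, otg; }       // output ID, output target
    class ServiceInstance {
        String sid, sn;                     // service ID, service name
        List<Input> inputs = new ArrayList<Input>();
        List<Output> outputs = new ArrayList<Output>();
    }
    class Node { String sid, sn; Node(String sid, String sn) { this.sid = sid; this.sn = sn; } }
    class Edge { Node source, target; String data;
                 Edge(Node s, Node t, String d) { source = s; target = t; data = d; } }

    class ProcessGraphBuilder {
        // Step 1 result: one Node per service instance, keyed by its unique service ID.
        private final Map<String, Node> nodes = new HashMap<String, Node>();

        Node getNode(String sid, String sn) {
            Node n = nodes.get(sid);
            if (n == null) { n = new Node(sid, sn); nodes.put(sid, n); }
            return n;
        }

        // Step 2: link each input of each service instance to the instance whose
        // output targets it and carries the same data ID (lines 5 and 6 above).
        List<Edge> buildEdges(List<ServiceInstance> instances) {
            List<Edge> edges = new ArrayList<Edge>();
            for (ServiceInstance target : instances) {
                Node targetNode = getNode(target.sid, target.sn);
                for (Input in : target.inputs) {
                    for (ServiceInstance source : instances) {
                        if (!in.isr.equals(source.sid)) continue;        // line 5
                        for (Output out : source.outputs) {
                            if (target.sid.equals(out.otg)               // line 6
                                    && in.iid.equals(out.oid)) {
                                Node sourceNode = getNode(source.sid, source.sn);
                                edges.add(new Edge(sourceNode, targetNode, in.iv));
                            }
                        }
                    }
                }
            }
            // Duplicate edges, if any occur, would be filtered here before display.
            return edges;
        }
    }

The nested loops mirror lines 2 to 10 of the algorithm; keying the node map on the unique service ID corresponds to Step 1's requirement that each node be uniquely identifiable.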

The above descriptions explain how a process graph is constructed with the provenance information structured using the p-format. A graphical representation of an executed process created in such a way can be used for a visual comparison with the abstract process description. This enables what was planned to be compared with what actually happened after the enactment of a process. This is useful, particularly when the abstract process description contains conditions such as switch, as discussed in section 3.3.1.

5.2.2 Process Re-Execution

Re-execution is a way of verifying the data product derived from a process execution. Re-execution of a past process mainly serves two purposes:

1. To verify if a result is still up-to-date, meaning whether the information in an

input data set used by the process has been updated with new data. In many

scientific domains existing data are updated in a database with new data based

on recent research findings. In this case, any results from experiments that

were run using the old data may be considered worthless and execution with

the updated data becomes desirable in many cases. Such execution may be

recorded as a new run of an experiment. Re-execution will either verify the

original result, or generate a different result - indicating that an input data

set has changed. The provenance database can then be updated with the new

result data.

2. To perform a “what-if” style of analysis on the process by changing the input

parameters, and setting the algorithm inputs of the services, to investigate

interesting results.

In this section the way in which a process can be re-executed given its provenance information is discussed. The re-execution can be performed based on the constructed process provenance graph that defines the “actual” process or by making use of the

“abstract” process description. We propose the latter, as one of our aims is to analyze the process with varying input parameters, which provides the flexibility to make use of any conditions within the abstract process that may produce interesting results from reruns. Such reruns may be recorded, if necessary, in order to produce a process provenance graph for a high-level comparison of different runs (i.e., with different input parameters) of the same process.

5.3 Provenance Reasoning Query Interface

The provenance reasoning query interface accepts a provenance reasoning query request and responds with the provenance reasoning query results. A provenance reasoning query request defines a search for the provenance of an application element, and the provenance reasoning query result represents provenance for invocations of that element at different instants in time. The query results for an element must be associated with a particular instant in time because the element may have been invoked at a number of different times by different processes. For example, a service element may have different provenance for its invocations within different workflow enactments that happened at different times. For this reason, the provenance for an element must start at a particular instant in time. Thus, the provenance database may contain the records of various process instances that share the same elements across different instances. In our recording model, the p-format about a process enactment may be recorded any time after completion of the process enactment. The time instants recorded in the provenance database are the start time and the end time of the process, captured during its enactment.

A provenance reasoning query request consists of two parts: the query data command and the query data filter, which are discussed below (and depicted in Figure 5.2):

Definition 6.1 (Query Data Command) The query data command searches over the contents of the provenance database in order to find the records of a specific element for which the querying user wants to retrieve the provenance at a given instant.

Definition 6.2 (Query Data Filter) A query data filter is the set of criteria specified by the querying user in order to include only the required information in the reasoning query results. For example, specifying whether any given element in a process provenance should be included in the query result constrains the reasoning query results.

Figure 5.2: Provenance Reasoning Query (a pqs:ProvenanceReasoningQuery within the pqs:ProvenanceQueryService is composed of a pqs:QueryDataCommand and a pqs:QueryDataFilter)

5.3.1 Query Data Command

The query data command allows provenance questions such as “What is the provenance of element service S at instant I?” to be posed to the provenance database. It provides the user with an identification mechanism so the element is identified in a way that the provenance database can interpret. From the perspective of the PQS, a query data command defines a search for process-Provenance and service-Provenance data items in RDF statements in the process documentation. A query data command is made up of:

• A search over the p-format for the instances where an element may occur.

• A search over the contents of the process-Provenance or service-Provenance to

retrieve the data items which are the provenance records of the element.

All the RDF statements belonging to a process instance containing the process-

Provenance and its related service-Provenance are parts of the p-format represented as an RDF model in the provenance database. Thus, a single search over the p-format can be performed to include any object or subject of RDF triples. The querying user specifies each part of the search as follows:

Identifying Instants

Following the formulation of provenance in a system based on the service invocation instances obtained from processing an abstract process, described in chapter 3, there are two ways in which to identify a recorded instant in the past:

• The instant at which a process is invoked.

• The instant at which a service is invoked.

These are apparent in the process-Provenance and service-Provenance of the p-format.

Note that there are one or more instances of a service, and each is associated with a process instant. The searches in the p-format are as follows:

• On the properties of a process-Provenance.

• On the properties of a service-Provenance associated with a process-Provenance.

The identities of the services involved in the invocation of a process are apparent in the service ID, service name and WSDL URL properties of the service-Provenance.

The services' association with a process is apparent through the hasServiceInstance property.

Identifying Data Items

The element for which the querying user wants to find the provenance must be present in the process-Provenance or the service-Provenance in order for the PQS to find it. The p-format defines the copies of the actual messages for a service instance in the properties inputContents and outputContents, as discussed in section 4.2.3.

The element may not always be present as an exact copy of the data itself, but may instead appear as a reference or be inferred by application-specific structures in the p-format. For example, an application message may specify a filename to refer to a data item contained in it, instead of the data itself, and the querying user may wish to get the provenance of the data, rather than the file. It is therefore dependent on the preferred query language the application uses to query application-specific messages.

The query data command includes a search over the contents of p-formats in the database to retrieve the data items that comprise the documentation of the element.

This search is expressed in a particular query language. The PQS supports query languages for the RDF data language, as RDF is our default data structure to document a process. However, application messages may have used a different format, so the PCS may have recorded the data language of the actual messages differently. For example, SOAP messages (in XML format) require a different query language, such as X-Path [88], if it is necessary to retrieve application-specific information in the messages. Therefore, a query data command may specify a query language mapping between the data language used for a p-format and the data language required for the search.

Definition 6.3 (Query Language Mapping) Query Language Mapping defines how to transform one data language to another data language to support the search with a given query language.

In our case, as the default data language used is RDF, any search with a query language other than RDF is handled with a combination of queries, as discussed in the next section. For example, a search with an X-Path query language requires first that the relevant p-formats be retrieved in the XML language; the search is then executed on this.
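As an illustration of such a mapping, the standard javax.xml.xpath API can evaluate an X-Path expression over a p-format that has been retrieved in its XML form. This is only a sketch: the file name and the element names in the expression are hypothetical, not the actual schema of Appendix B.

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class XPathSearchSketch {
        public static void main(String[] args) throws Exception {
            // Assume the relevant p-format has already been retrieved in its XML
            // form, e.g., the temporary log produced during enactment
            // (file name illustrative).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse("pformat-log.xml");

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Illustrative expression: the value of every input named "densityType".
            NodeList hits = (NodeList) xpath.evaluate(
                    "//input[inputName='densityType']/inputValue/text()",
                    doc, XPathConstants.NODESET);

            for (int i = 0; i < hits.getLength(); i++) {
                System.out.println(hits.item(i).getNodeValue());
            }
        }
    }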

Combination of Provenance Queries

Primarily, a p-format represents a process instance that consists of process-Provenance and its associated service-Provenance. A provenance database consists of one or more p-formats defining different process instances. A query data command is a search for an element within this provenance database. In many cases it may be appropriate to express the query data command as a combination of provenance queries, i.e., the results from one provenance reasoning query are searched over by another provenance reasoning query. The query data command is identified as the search to be performed for a given element and the range of p-format or data over which it will search. This data is referred to as the p-format source, and can have one of two possible forms, as depicted in Figure 5.3.

• The contents of the provenance database that consist of process instances as

RDF models. That is, each RDF model is the provenance about a process

represented in a p-format.

• The results of another provenance reasoning query that can be in the p-format

form of either RDF statements or XML.

Definition 6.4 (P-Format Source) A p-format source is the expression of the data over which a search for an element is executed.

Figure 5.3: P-Format Source Model (a pqs:PFormatSource is either an RDF model of a p-format or an XML element)

Query Data Command Model

The model showing the query data command is presented in Figure 5.4. The

Search element specifies the search for any query element, in a chosen query language, over the database for the data items within the p-formats that represent process executions. There can be from 0 to n query elements specified for the given search. The PFormatSource represents the set of p-formats over which the query will be executed, and this may be in either of the forms discussed previously. The

QueryLanguageMapping specifies how the contents of the p-formats are mapped to the data language for the search, particularly, when used for a provenance reasoning query over the results of a previous query.

Figure 5.4: Query Data Command Model

5.3.2 Query Data Filter

The set of data items identified by a query data command's search over the provenance database could be vast, and much of it may be irrelevant to a querying user. Therefore, we need to allow the querying user to specify the scope of the provenance query, i.e., to define what documentation is relevant enough to be part of the results. This is the purpose of the query data filter.

Provenance Reasoning Query Examples

In order to answer provenance-related queries in our proposed RDF-based provenance representation, we ask a set of provenance questions and determine the queries that return the provenance reasoning query results. The example process PR1 in Figure 4.12 is mapped to Figure 5.1, depicting the process's data links, where ServiceA

and ServiceB are the DustCloud service and the FFT service, respectively. The process instance, or p-format, from this example is used; a set of provenance questions, the corresponding provenance reasoning query requests and their results are as follows.

1) Retrieve the process that led to d4 (i.e., the processing steps that produced the data d4).

• Input: the output data d4's unique ID in a run of PR1.

• Output: a set of process runs and data that led to d4.

The query requires that all the invocation events and data that contribute to the creation of d4 during the run of PR1, directly or indirectly, should also be returned, rather than only those that contribute directly. Thus, this query is realized in two steps:

Step 1: First get all unique models from the provenance database. For each model

containing RDF graphs of a process instance, match the given ID of d4 with the value

of property sd:outputId. Return the unique model name that satisfies the match.

Result: This returns the unique model name “PR1:upid:620-A841F-10671F3” created for process instance “uri:BpelProcess/PR1”.

Step 2: The query is executed over the returned model by including the unique ID

of output d4. The following describes the query:

1. For a given unique ID of output d4, the service instance that produced the output is identified. This is done by using the FILTER expression in the query.

2. With the service instance identified, its inputDataset is retrieved, which may consist of one or more inputs. For each input, inputName, inputSourceIs and inputValue are retrieved.

3. Consider the earlier assumption that the inputSourceIs property consists of

the unique ID of the service instance that is the source of this input. By

comparing the unique IDs of each service instance (within the process) with the

value of the inputSourceIs, the matching service instance for each input (i.e.,

the source of the input) is retrieved.

4. For each service instance retrieve the inputContents, outputContents, serviceName, and serviceOperationName. This query is shown below in Figure 5.5.

Figure 5.5: Query 1 (a SPARQL query selecting ?id, ?outid, ?inputSource, ?inputName, ?inputValue, ?sourceService, ?sName, ?sOperation, ?wsdl, ?inputContents and ?outputContents, with FILTER expressions matching the given unique output ID and matching each input's source to the corresponding service instance)

Result: As shown below, this query returns two service instances (sources of inputs for the output with ID d4, of MyFFTService) and the input data MyFFTService

consumed to produce d4. This also includes all the information specified in the query.

Note that, in the query result in Figure 5.6, the inputValue is large, so only part of

the data is kept.

Figure 5.6: Query 1 result (the two input sources of MyFFTService's output d4: the DustCloudServiceService instance, which supplied the ydata input, and the enacting convolveProcess, which supplied the isign input, together with each source's service name, operation, WSDL location and SOAP message contents)

The result of query 1 shows only the two input sources and information about

these sources. Thus, this query answers only part of the provenance question, which

can easily be rectified by extending query 1 to retrieve the input dataset of these sources, and the sources of these retrieved inputs, and so forth. Such a query would return a large dataset, and the way results are presented is repetitive and hard to understand. Thus, a similar second query, which uses one of the input sources' data from the query 1 result, is executed as follows:

(Query listing: a SPARQL query selecting ?id, ?inputSource, ?inputName, ?inputid and ?inputValue for the service instance whose unique ID is "DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6", retrieving each of its inputs with its source, ID, name and value.)

This query retrieves the input dataset for the service instance with inputSourceIs DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6 (the output of which is one of the inputs to MyFFTService). The result of this query is shown below, where three inputs are retrieved.

id: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6
  inputSource: convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53
  inputName: n
  inputid: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6/001
  inputValue: 30

id: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6
  inputSource: convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53
  inputName: densityType
  inputid: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6/002
  inputValue: gaussian

id: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6
  inputSource: convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53
  inputName: widthParameter
  inputid: DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6/003
  inputValue: 0.3

The inputs' source information in this result can be used to perform another similar query. Such a succession of queries ends when there is no input source information in the retrieved result. Performing such queries enables a step-by-step approach for tracing a derived output back to its preceding service instances based on input source information.
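A minimal Jena sketch of this step-by-step trace is shown below. The sd: prefix follows the serviceProvenance schema URL used in the recorded queries, but the property names sd:serviceID and sd:hasInput, and the exact query shape, are illustrative assumptions rather than the thesis's actual schema.

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Model;
    import java.util.ArrayList;
    import java.util.List;

    public class ProvenanceTracer {
        static final String SD =
            "http://users.cs.cf.ac.uk/S.Rajbhandari/provenance/serviceProvenance#";

        // One step of the succession: query the inputs of the given service
        // instance and return the IDs found in their inputSourceIs properties.
        static List<String> getInputSources(Model model, String serviceId) {
            String q = "PREFIX sd: <" + SD + "> "
                     + "SELECT ?src WHERE { "
                     + "  ?s sd:serviceID \"" + serviceId + "\" ; sd:hasInput ?i . "
                     + "  ?i sd:inputSourceIs ?src . }";
            List<String> sources = new ArrayList<String>();
            QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), model);
            try {
                for (ResultSet rs = qe.execSelect(); rs.hasNext(); ) {
                    sources.add(rs.nextSolution().get("src").toString());
                }
            } finally {
                qe.close();
            }
            return sources;
        }

        // Follows the inputSourceIs links backwards; the trace ends naturally when
        // a service instance has no recorded input source, as described in the text.
        static void trace(Model model, String serviceId) {
            for (String src : getInputSources(model, serviceId)) {
                System.out.println(serviceId + " derived from " + src);
                trace(model, src);
            }
        }
    }

Calling trace with the producing service instance's ID prints one derivation link per step and stops, as described above, when getInputSources returns no further sources.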

Note that, in answering query 1 in cases where only the output name is provided (e.g., fftResult), we constrain the scope of the query to a particular workflow run PR1 in order to avoid presenting too many results, although this can easily be adapted for querying over the whole provenance repository when a result is produced by runs of different processes.

2) Retrieve the process that led to d4, excluding everything prior to DustCloudService.

• Input: The unique ID of output data d4 and the scope of the query to exclude

all the provenance that is prior to DustCloudService.

• Output: All the data generated for the service instances after the DustCloudService instance.

As in the first step of query 1, after the particular process instance is found, the result provenance is constrained to exclude service instances prior to “DustCloudService”.

The query is described as follows:

1. For the given output d4’s ID, get the inputName and inputValue for that service

instance and also retrieve the input sources of the input data.

2. Filter the input sources (service instances) to get their serviceName.

3. For the given service instances retrieve the relevant data and the input sources.

This includes the service instance that has the name “DustCloudServiceService”.

4. If the retrieved input sources of a service instance are “false” (i.e., the property

does not exist), or if the service instance has the name “DustCloudService”,

then go to 5, else go to 6.

5. Query ends.

6. Perform a similar query to that described in step 3 with the retrieved input sources of the service instance.

Query 2 below shows steps 1 and 2 of the above query description.

Figure 5.7: Query 2 (a SPARQL query selecting DISTINCT ?service, ?inputSource, ?inputName, ?inputValue and ?inputSourceName for the output with the given unique ID, joining each input's source to its serviceName)

Result: This query returns one service instance, “MyFFTService”, and the relevant input data with its corresponding sources. The input sources' serviceName values are also retrieved by query 2 (Figure 5.7). The results are shown in Figure 5.8.

Figure 5.8: Query 2 result (the inputs of MyFFTService: ydata, supplied by DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6, and isign, supplied by convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53)

Using the information about the two retrieved input sources (see Figure 5.8) from the query 2 result, a second query is performed as follows.

(Query listing: a SPARQL query over the serviceProvenance schema at http://users.cs.cf.ac.uk/S.Rajbhandari/provenance/serviceProvenance#, selecting ?id, ?inputSource, ?inputName and ?inputValue, filtered on the two unique IDs DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6 and convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53, with an OPTIONAL pattern for the inputs' inputSourceIs property.)

This query satisfies step 3 of the query description. It makes the retrieval of inputSourceIs optional, so the absence of input source information for any service instance will be known. The result of this query is as follows.

convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53 (no inputSource recorded):
  densityDC = gaussian
  widthDC = 0.3
  points = 60
  pointsDC = 30
  type = sine

DustCloudServiceService:upid:DF63A580-A4BF-11DB-BE9F-817CF9F549A6
(inputSource: convolveProcess:upid:DA60FE20-A4BF-11DB-BE9F-AB8C2F088E53):
  n = 30
  densityType = gaussian
  widthParameter = 0.3

The result lists the inputs of the two service instances. It shows that the input source information for the service instance “convolveProcess” is not present, and the name of the other one is “DustCloudService”. Any service instance before DustCloudService is not needed. Thus, this query satisfies step 4 of the query description to answer this provenance question. The example queries illustrate that answering provenance questions can require the execution of more than one query.

5.4 Summary

In this chapter we have discussed the functional requirements of a Provenance Query Service (PQS) whose query interfaces support querying the provenance documentation about processes stored as p-formats. An algorithm for constructing a process provenance graph has also been presented that mainly utilizes the properties described for the inputs and outputs of service instances to construct the data links that exist between service instances in a process. A process re-execution tool was also discussed in this chapter. The provenance graph construction tool provides a means of displaying the high-level behaviour of an executed process. The re-execution tool helps in verifying the process and its results, and allows what-if analyses to be performed on the process by enabling the entry of different input parameters during reruns. The combination of these two tools enables scientists to verify past processes as well as to compare different runs of the same process visually.

We also discussed the provenance reasoning query interface that models how the content of selected p-formats can be retrieved. The interface describes how the provenance reasoning query request to search for any element in the p-format is formed.

The provenance reasoning query interface provides support for RDF and XML query languages, namely SPARQL [43] for RDF, and X-Query [102] and X-Path [88] for XML. A query language mapping and the combination of provenance queries are introduced to enable searches for an element that require (1) execution of more than one query, and (2) use of both RDF and XML query languages when necessary. We also presented some example queries using the SPARQL query language to demonstrate how a particular data item whose provenance is described as a p-format can be successfully traced to its sources, or how it was derived; that is, to the service instance and the input data consumed by this service instance to produce the data item, and to the preceding service instances whose outputs were used as the inputs to this service instance.

In summary, this chapter provides a comprehensive model for querying provenance and a strategy for (1) tracing the provenance of data items by utilizing the stored provenance information (i.e., p-format) about processes executed in an SOA environment, and (2) exploiting stored provenance information about past processes to re-create process behaviour, and to re-execute them to verify results and analyze the process.

In order to demonstrate the application of our p-format in the context of the PCS and PQS components, we have implemented a prototype provenance system in an SOA environment. We have implemented the automated collection and recording of provenance about processes that uses the p-format, and interfaces to support querying of the recorded provenance. The architecture and implementation of this prototype are presented in chapter 6. We have thus far presented the theoretical models to provide support for provenance in an SOA environment. Chapter 6 and chapter 7 present the implementations and experimental evaluation of the developed model.

Chapter 6

Provenance Prototype: Implementation

6.1 Introduction

The preceding chapters in this thesis have presented the theoretical aspects of our research, namely, the Provenance Model, the collection of data provenance and the format for recording provenance, and the querying capabilities needed to enable provenance support in a service-oriented environment. This chapter presents the implementation of the Prototype Provenance System in a service-based environment.

The prototype encompasses the concepts developed in this research and validates the feasibility of the implementation of the models proposed in this thesis. It demonstrates the ability of the Provenance Collection Service and the provenance format to support automatic recording of data provenance in a standard format for a composed Web process execution. It also provides interfaces for querying, re-execution and provenance graph construction. The implementation provides the basis for experimental validation and analysis of the provenance recording techniques and the Provenance Model that will be discussed in Chapter 8.

This chapter is organized as follows. In Section 6.2, the architecture and operation of our Prototype Provenance System is presented. This section discusses the implementation of the components that support the provenance collection and the p-format which was described earlier in Chapters 4 and 5. These components primarily include the Provenance Collector, the Provenance Recorder, and the Workflow Engine. Section 6.3 presents the implementation of the query interface, and the tools for provenance graph construction and workflow re-execution (described in Chapters 4 and 5).

6.2 Architecture of the Provenance Collection Service

This section presents the architecture (shown in Figure 6.1) and operations of our prototype implementation of the Provenance Service for the collection and recording of the provenance of a Web process execution in a service-oriented environment. It should be noted that a service-oriented environment typically has multiple service providers hosting different provenance services. Our implementation is a prototype system that provides a Provenance Service as a proof-of-concept.

The prototype consists of the following components:

Figure 6.1: Architecture of Prototype Implementation of the PCS

• MySQL Database. This component stores the provenance documentation that

is captured automatically at runtime from process executions. In our implementation a mySQL relational database server [5] is used for storing

the RDF triples represented by the p-format, i.e., the RDF schema discussed in

Chapter 4 and specified in Appendix A.

• ActiveBPEL Workflow Engine. The ActiveBPEL engine [9] is an open-source

implementation of a Business Process Execution Language (BPEL) [16] engine,

written in Java. It reads BPEL process definitions (and other inputs such as

WSDL [100] files) and creates representations of BPEL processes. BPEL is an

XML language for describing business process behaviour based on Web services.

The ActiveBPEL engine is used to deploy and invoke the workflows created

using the BPEL standard. The ActiveBPEL engine supports the invocation of

deployed BPEL processes as Web services.

Figure 6.2: Interface for Process Invocation and Provenance Submission in the PCS

• Dynamic Service Invoker. A web interface is provided that allows users to enter

the location of the abstract description of a single or composite Web service that

they wish to invoke. The Web service’s WSDL document from this location is

processed and HTML forms are automatically generated based on the inputs and

available operations described in the WSDL. Thus, by selecting an operation

and entering the input parameters, the Web service is executed and the result

is returned to the user. This implementation provides a general user interface

for invoking single or composite Web services, and during which the Provenance

Collector and the Provenance Recorder, discussed later in this section, are used.

The operation of this component involves dynamic invocation of Web services

and provenance submission via the interface shown in Figure 6.2. The interface

is implemented using JavaServer Pages (JSP) [94]. The Apache Web Service

Invocation Framework (WSIF) [83] is used to enable the dynamic invocation

of Web services. The framework allows maximum flexibility by interacting

with abstract representations of Web services through their WSDL description

instead of working directly with the different SOAP messaging framework APIs

[81, 85], and is independent of how the Web service is implemented. The Web

and XML Services Utility Library (WS/XSUL) [46] is an extended WSIF API

that may be used in our implementation to include support for complex data types (defined using XML Schemas) in WSDL. A sketch of this style of dynamic invocation appears after this list.

• Provenance Collector. The Provenance Collector applies the interactions discussed in Chapter 4 to record the invocations of the services involved in the

process in a temporary log file. The log file is structured in a standard way by

using XML tags (see Appendix B for service-Provenance XML schema) defined

as properties in the service-Provenance RDF schema. This is implemented using client-side Axis handler [82] APIs to log SOAP messages associated with service invocations; a sketch of such a handler appears after this list.

The Provenance Collector (PC) is deployed with the ActiveBPEL workflow

engine in the Apache Tomcat Server (V5.5.12) [96] in order to intercept and log

the intermediate messages of the services involved in an enacted process. The

Provenance Collector is activated from within the engine so the engine acts as

the client sending messages for invoking services. Thus, the PC processes each

service’s outgoing message first and then the incoming message. A logical clock

is also implemented that increases by integer value 1 each time the Provenance

Collector is called as a result of a service invocation. This determines, for a

given process enactment (occurring in the engine), how many services were

invoked and in what order. It saves the logs as an XML file on the server

side and the URL location of this file may be sent in the SOAP header to the

client that enacted the process. The Provenance Collector for the engine is an

added optional functionality that captures intermediate data for a process. The

Provenance Collector is also used within the Dynamic Service Invoker, together

with the Provenance Recorder, to record the initial inputs and the final output

for a process enactment.

• Provenance Recorder. The Provenance Recorder uses the provenance RDFS,

or p-format, presented in Chapter 4 to generate process documentation. For

a process execution instance, the process documentation or data provenance

generated is uniquely identified in the database. This component primarily

provides two methods:

1. To generate and record RDF triples describing the process-Provenance

that includes the inputs and the final output (using service-Provenance

properties) for a process returning the unique process ID.

2. To generate and record the service-Provenance for the service instances

and the data-links in the process-Provenance. This is done by processing

the temporary log (XML file) and the BPEL document of the enacted

process.

The first method is mandatory, whereas the second is used only when the Provenance Collector at the workflow engine end is activated, i.e., it returns the XML

file location in the SOAP header of the process’s response SOAP message. By

processing the XML file containing service instances for a process and using

Jena APIs [36], RDF triples of service-Provenance instances are created and

stored with process-Provenance as a Jena model in the database. A Jena model

is identified by a model name in the database to which the RDF triples are allocated. A unique process ID is created and stored as the model name to identify

a process instance and, hence, all the RDF triples describing the provenance

documentation for that process. A sketch of this recording step appears after this list.
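The three sketches below illustrate, in turn, the dynamic invocation performed by the Dynamic Service Invoker, the message interception performed by the Provenance Collector, and the recording step performed by the Provenance Recorder. First, dynamic invocation with WSIF; the operation and message part names are taken from the DustCloud example of Figure 6.2 and stand in for whatever the user-supplied WSDL actually declares.

    import org.apache.wsif.*;

    public class DynamicInvokerSketch {
        public static void main(String[] args) throws WSIFException {
            // Obtain the service through its WSDL description rather than a static stub.
            WSIFServiceFactory factory = WSIFServiceFactory.newInstance();
            WSIFService service = factory.getService(
                    "http://192.168.0.4:8080/axis/services/DustCloud?wsdl",
                    null, null,   // service namespace and name: take the first in the WSDL
                    null, null);  // port type namespace and name: likewise

            WSIFPort port = service.getPort();
            WSIFOperation op = port.createOperation("yValues"); // operation chosen by the user

            // Part names and values follow the DustCloud example of Figure 6.2.
            WSIFMessage in = op.createInputMessage();
            in.setObjectPart("densityType", "gaussian");
            in.setObjectPart("widthParameter", new Double(0.22));
            in.setObjectPart("n", new Integer(100));

            WSIFMessage out = op.createOutputMessage();
            WSIFMessage fault = op.createFaultMessage();
            if (op.executeRequestResponseOperation(in, out, fault)) {
                System.out.println(out.getObjectPart("yValuesReturn"));
            }
        }
    }

Second, the interception mechanism, sketched as a client-side Axis 1.x handler; the log-writing helper is illustrative and its details are elided.

    import org.apache.axis.AxisFault;
    import org.apache.axis.MessageContext;
    import org.apache.axis.handlers.BasicHandler;

    public class ProvenanceCollectorHandler extends BasicHandler {

        // Logical clock: incremented by 1 each time the handler is called, so the
        // log records how many services were invoked and in what order.
        private static int logicalClock = 0;

        public void invoke(MessageContext ctx) throws AxisFault {
            try {
                int tick;
                synchronized (ProvenanceCollectorHandler.class) { tick = ++logicalClock; }
                // The handler sees each service's outgoing message first, then the
                // incoming message, as described above.
                String soap = ctx.getCurrentMessage().getSOAPPartAsString();
                appendToLog(tick, soap);
            } catch (Exception e) {
                throw AxisFault.makeFault(e);
            }
        }

        // Illustrative helper: wraps the message in service-Provenance XML tags and
        // appends it to the temporary log file on the server side (details elided).
        private void appendToLog(int tick, String soap) { }
    }

Third, the recording step using the Jena 2 database API; the database URL follows the one shown in Figure 6.2, while the credentials and the model name are placeholders.

    import com.hp.hpl.jena.db.DBConnection;
    import com.hp.hpl.jena.db.IDBConnection;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.ModelMaker;

    public class ProvenanceRecorderSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");
            IDBConnection conn = new DBConnection(
                    "jdbc:mysql://localhost/provenancedb",  // database URL from Figure 6.2
                    "user", "password",                     // placeholder credentials
                    "MySQL");

            // One Jena model per process instance; the unique process ID is used as
            // the model name, identifying all RDF triples of that process's provenance.
            ModelMaker maker = ModelFactory.createModelRDBMaker(conn);
            Model model = maker.createModel("exampleProcess:upid:PLACEHOLDER-ID");

            // RDF triples generated from the temporary XML log and the BPEL document
            // of the enacted process would be added to the model here.

            model.close();
            conn.close();
        }
    }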

To summarize, the implementation involves the integration of the following technologies and languages: Java [92], Jena (a Semantic Web framework for Java), the Resource Description Framework (RDF) [87], Apache Axis [82] to implement Java Web services, the Business Process Execution Language (BPEL4WS) standard, and the

ActiveBPEL workflow engine [9]. The primary language used was Java. RDF was used to represent the provenance from an enacted process, based on an RDF schema

(p-format). The Provenance Recorder used the Jena packages, which provide an API and tool for automating the mapping between the RDF triples of the captured provenance and mySQL database objects. The API handles all the details of RDF parsing and formatting, and the structure of the RDF storage within the database.

The algorithms required for the Astrophysics example workflow presented in Chapter 4 were implemented according to the details in [80], and deployed as Axis Web services. An abstract BPEL process description was created using the ActiveBPEL designer [10] (a BPEL development environment), as depicted in Appendix C. The BPEL process description specifies what the process can do and the inputs and outputs of each of the parties (i.e., Web services) involved, but does not describe how anything gets done, i.e., it does not reveal their internal behaviour. This BPEL document is deployed in the ActiveBPEL workflow engine and executed using dynamic service invocation to experimentally validate the provenance model presented in Chapter 4.

This concludes the discussion of the operation of the different components of the Provenance Collection Service that has been developed to perform the actual collection and recording of provenance for process enactments in a service-oriented environment. The implementation and operation of the Provenance Query Service will now be presented in Section 6.3.

6.3 Interface Implementation of the Provenance Query Service

This section discusses the implementation and operation of the Provenance Query

Service (PQS). This is a part of the Provenance Service that has been developed to enable the querying and exploitation of data provenance. Furthermore, the PQS supports the re-execution and re-creation of past workflows to facilitate the verification of their results, and allows “what-if” analyses to be carried out on the process instances.

The languages and tools used in the implementation of the PQS include Java,

JSP and Jena. The SPARQL Query Language for RDF [43] is a query language for extracting information from RDF graphs or triples. SPARQL is used both for running standalone queries and for programmatically calling Jena's SPARQL capabilities directly. SPARQL queries were created and executed with Jena via the jena.query package by passing in the Query String to execute and the Jena Model to run it against. Because the data for the query is provided programmatically, the query does not need a FROM clause. The PQS has been developed with web interfaces to perform different queries, re-execution and visual re-creation of processes stored as

RDF triples in the Provenance Database. The PQS implementation architecture is

depicted in Figure 6.3.
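The pattern for running such a query programmatically is sketched below; the model argument is the Jena model of one process instance retrieved from the Provenance Database, and the query string is supplied by the caller.

    import com.hp.hpl.jena.query.Query;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Model;

    public class SparqlRunner {
        // Executes a SELECT query against the Jena model of one process instance.
        public static void run(Model model, String queryString) {
            Query query = QueryFactory.create(queryString);
            // The model is supplied programmatically, so the query needs no FROM clause.
            QueryExecution qexec = QueryExecutionFactory.create(query, model);
            try {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.nextSolution();
                    System.out.println(row);
                }
            } finally {
                qexec.close();
            }
        }
    }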

The components of the PQS are as follows:

• Query Processor. The Query Processor contains a set of different query tasks that

enable simple data provenance queries. The Query Processor uses the methods

that implement the Query Result Format to provide query results in four different formats: plain text, RDF triple, RDF/XML, and tree view.

Figure 6.3: Architecture of the Prototype Implementation of the PQS

The Query

Processor can perform the following query tasks, which are presented as links

in the web interface:

— Query all the process IDs. This task is responsible for initiating the

database connection and querying the process IDs that represent process

instances stored as Jena models in the database. A mechanism is provided

to query only ten IDs at a time and a link is given to return the next ten

IDs, and so forth. Each ID has a hyperlink which when clicked returns the

XML/RDF view of that particular process instance. The web interface is

depicted in Figure 6.4.

— Query with a given process ID. This interface allows the user to enter the

process ID to get the data provenance for that particular process. The

user can select the format to view the returned result, which demonstrates

the use of the Query Result Format component.

— Query with SPARQL. This interface allows users to pass a SPARQL query

string in two ways: (1) for a particular process ID or model, and (2) for

executing the query on all the process instances present in the Provenance

Database. The primary objective of this implementation is to facilitate

experimental studies of the performance of different queries with an increasing amount of provenance data in the database. These experimental

results are presented and analyzed in Chapter 8.

• Provenance Database Connector. This component uses the request sent through

the web interface to build a connection with the mySQL database that contains

the data provenance as RDF triples. The connection object is given to the

Query Processor.

• Process Re-execution. The Process Re-execution component uses the query

task (that retrieves process instances) determined by the Query Processor to

re-execute a particular process. Web interfaces have been developed to display

the process location, the original inputs and generated output for the process.

These data are placed in an automatically generated form that is used for the

re-execution task. The form consists of two buttons:

1. The re-execute button is for enabling re-execution of the process with the

same input parameters, and performs a check to determine whether the original output matches the currently generated output, for verification purposes.

Figure 6.4: Web interface listing the process IDs in the Provenance Database

Figure 6.5: Interface to re-execute a process

2. The What-if Analysis button is used to perform the re-execution of the

process with changed input parameters.

The implementation was validated using the stored process instances of the

Astrophysics workflow example introduced in Chapter 4. The output of the

example process can be visualized as a graph plot. This was done using the

third-party PlotWS [52] service. PlotWS is an Axis-based Web service for

drawing graphs, which has been implemented as a wrapper around gnuplot [107]

and exposes a subset of gnuplot’s functionality. A form-based web interface was

incorporated as shown in Figure 6.5, where the Plot the Outputs to Compare

button can be used to view both the original and current outputs, and returns

two 2D graphs (SVG images) to enable a visual comparison of the two outputs.

The What-if Analysis has the same interface as that shown in Figure 6.5 for

plotting graphs, but without the function to match the outputs.
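The verification performed by the re-execute button reduces to an equality check between the recorded output and the regenerated one. A minimal sketch, with a hypothetical method name and with the success message taken from the interface of Figure 6.5:

    public class ReExecutionCheck {
        // Compares the output recorded in the Provenance Database with the output
        // produced by re-running the process with the same input parameters.
        public static String verify(String originalOutput, String currentOutput) {
            if (originalOutput.equals(currentOutput)) {
                return "The Re-execution output generated is the same as the original output.";
            }
            return "The Re-execution output differs from the original: an input data set may have changed.";
        }
    }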

6.4 Summary

This chapter has presented the implementation of a prototype provenance service that incorporates the collection and querying of data provenance for workflow enactments in a service-oriented environment. The implementation demonstrates the feasibility of the provenance representation language, and the interactions of the components and SOA technologies that support the collection, recording and query manipulation of data provenance. This chapter has also discussed the implementation and operation of the Provenance Service that has been developed to demonstrate the Provenance Model and its components and features, including the automated recording of data provenance during process execution. The Provenance

Service provides an appropriate basis for experimental validation of the concepts developed in this thesis, including the performance and scalability of provenance recording and querying as the size of the Provenance Database grows, as well as the re-execution of the queried processes.

Chapter 7

Evaluation

7.1 Introduction

The implementation of our Provenance System and its main components were presented in Chapter 6. The Provenance System incorporates the conceptual Provenance Model presented in this thesis to enable provenance support in service-oriented environments. The system is based on an SOA infrastructure and includes a recording mechanism for automatic collection and storage of data provenance about process executions, and query interface tools to perform the re-execution and re-creation of workflows to aid in the verification of past processes and to perform “what-if” style analyses. The Provenance Model proposed in Chapter 4 aims to support the collection of data provenance, and the querying and re-execution of process executions, by using a combination of client-server and Web service models. The PCS performs dynamic invocation of processes, and the collection and recording of data provenance about such invocations. Recording data provenance is important from the perspective

of the notion of a “Living Document”, introduced in Chapter 1, since re-execution of a recorded process enables clients to view and evaluate process results produced on-the-fly. The Provenance Model's PCS and PQS are therefore implicitly targeted towards providing functionalities to conform to the notion of a living document. In order to support the recording of data provenance in a structured form, data provenance representation formats were modelled, and collection mechanisms and component interactions were proposed in Chapter 4. A collection and query mechanism for recording data provenance about the execution of a service-based workflow (i.e., in an SOA environment), and the re-execution of such recorded workflows, respectively, were developed. The questions that need to be addressed in the context of the mechanisms in the proposed model are:

• Scalability of the PCS. The scalability of the PCS for collection and recording

data provenance is important to process execution, since the collection of data

provenance is active at run-time and recorded in the database. Good scalability

would reflect the PCS’s ability to handle the collection of data provenance for

complex workflows with minimal execution overhead.

• Performance of re-execution in the PQS. The PQS component can be used to

query the data provenance of an enacted process in order to re-execute it. In

such cases, it is important to establish that the query tasks result in reasonable

performance (measured by the response time) as the number of concurrent

clients increases.

The provenance system has been implemented to enable the experimental evaluation of the above issues, and thereby to analyze the Provenance Model embodied in the PCS and the PQS. The Provenance System was implemented with supporting user interfaces for the PCS and PQS. Thus, to evaluate these two components in terms of the issues discussed above, in this chapter experiments are conducted that use a set of client applications, and the results are analysed.

This chapter is organized as follows. In Section 7.2 the experimental evaluation of the scalability of the collection and recording of data provenance in the PCS is presented. In Section 7.3 the experimental evaluation of the performance of query re-execution is presented. First, experimental results are presented that compare the process running time when the provenance collection and recording functions are used with the running time without these functions. Then the experimental results of clients concurrently querying and re-executing processes are presented in order to analyse the performance of the PQS component. Finally, Section 7.4 concludes this chapter.

7.2 Scalability of the Provenance Collection Service

The primary purpose of this experiment is to show empirically that, by using the PCS component, data provenance about a process executing in a service-oriented environment can be scalably collected and recorded. The two workflow scenarios used in the evaluation are described and presented in Figure 7.1.

Figure 7.1: The Web services and workflow used to generate data provenance for the experiments. The rectangular boxes denote the Web services implemented to demonstrate the Astrophysics Example Workflow from the example scenario presented in Chapter 4. The arrows denote the dataflow occurring in the workflow. (1) Single Service Process: the input parameter 'number of points' is increased for each test of 30 successive invocations to generate larger output data from the dust cloud algorithm. (2) Composite Service (a process with 9 services): the 'number of points' input parameter for both the Dust Cloud and Telescope services is increased for each test of 30 successive invocations to generate larger final output data from the workflow.

7.2.1 Summary of Setup

This section presents a summary of the setup for the experiments. The com­ ponents of the Provenance Service Eire deployed on a laptop running Windows XP with service pack 2 with an Intel Pentium M processor operating at 1.7Ghz, and lGbytes of memory. The components include the PCS, the ActiveBPEL engine de­ ployed in an Apache Tomcat 5.1 server, and the mySQL database server. The client applications that activate the PCS to collect and record provenance also run on the same machine. The Web service implementations for the experiments are deployed on a separate Windows XP desktop PC with an Intel Pentium processor operating at 2.80Ghz, and 512Mbytes of memory. The two machines are connected through

GigaBit Ethernet.

Experiments were performed to evaluate the performance of the provenance recording code of the PCS for documenting provenance of process executions, using the process scenarios depicted in Figure 7.1. The scenarios are based on algorithms and mathematical models from the Astrophysics domain, and an example scenario is discussed in Chapter 4. The PCS was used with two setups of the components: (1) without the engine, as shown in Figure 7.2, for the single service, and (2) with the ActiveBPEL engine, as shown in Figure 7.3, for the composite service.

Figure 7.2: Setup of the components for Single Service

1. Single Service. In this experiment a single service, such as the DustCloud service (Figure 7.1(1)), is invoked and its provenance recorded at run-time in (a) the file system and (b) the Provenance Database (i.e., the mySQL database).

Figure 7.3: Setup of the components for Composite Service

The record of a single service invocation consists of only an instance of the

service-Provenance. The experiment to record provenance to the file system is

performed by creating an instance of the service-Provenance RDF file for each

service invocation. This is done to demonstrate the approximate collection times

of service-Provenance instances compared with the service invocation instances

when provenance is not collected. The DustCloud algorithm takes input parameters and uses APIs from Java Math and the Java 2D Graph Package [26]

to generate output data with X and Y data points that may be plotted as a

graph. In order to get variation in the results of the tests performed, the DustCloud service's input parameter (the number of points) is increased for each

test producing a service-Provenance record. This increases the total invocation

time due to the increasing processing time taken by the service to produce output of increasing size. For example, if the number of points input is 100, the

DustCloud service produces an output array of 200 data points (i.e., 100 times

2), and the values depend on the width input parameter. Table 7.1 shows the

data used to perform the tests.

Input Parameter    Generated File Size (Kbytes)
400                19
800                34
1200               48
1600               66
2000               82
2400               97

Table 7.1: Input parameters used for each DustCloud service invocation and the corresponding service-Provenance instance generated in a file, which has varying output data size.

The tests to evaluate the performance of recording service invocations in the mySQL database are performed by increasing the number of service-Provenance

instances in the database. For this experiment, the invocation tests are performed with fixed input parameters for the DustCloud service such that each

test returns an instance of service-Provenance with a record size of 7 Kbytes.

2. Composite Service. A complex workflow composed of nine Web services (Figure 7.1(2)) representing a scenario in the Astrophysics domain has been developed to test the performance of the Provenance Collection Service. The

implementations of the Web services described in the composite service in Fig­

ure 7.1(2) are used, and an abstract workflow constructed using BPEL4WS

(abstract because the Web services implementations are not embedded in the Chapter 7: Evaluation 144

Input Parameters Generated File Size (Mbytes) 400/400 0.44 800/600 0.51 1200/600 1.23 1600/600 1.42 2000/600 1.89

Table 7.2: Input parameters used (for DustCloud and Telescope services) for each complex Astrophysics Workflow invocation and the corresponding service-Provenance data generated by the Provenance Collector as an XML log file.

BPEL document which only provides references through the WSDL interfaces).

The BPEL document, and the WSDL documents for the Web services, are de­

ployed in the activeBPEL engine that provides an accessible WSDL interface

for the Astrophysics Example Workflow (AEW) as a composite service. This

AEW BPEL document is attached as Appendix C of this thesis. A Provenance

Collector component (discussed in Section 4.3) is deployed within the engine

to collect provenance about the Web service instances invoked by the engine.

The invocation record of a composite service, such as the AEW, consists of

an instance of process-Provenance and nine service-Provenance instances. The

service-Provenance data for a process are collected by the Provenance Collector

asynchronously in an XML file as a temporary log. After completion of a given

process invocation, the XML file is then processed by the PCS to be recorded

as service-Provenance instances (i.e., in RDF) in the MySQL database. Similar

to the single service case, the AEW invocations are performed to collect and

record the provenance at run-time, both in the file system and in a mySQL Chapter 7: Evaluation 145

database. The tests are performed to demonstrate the approximate collection

times of the process-Provenance instances compared with the AEW process in­

vocation times when the Provenance Collector component is not active in the

engine. Table 7.2 shows the data used to perform the tests and the size of the

XML file generated for each test.
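As a minimal sketch of the file-based recording step in the single service case, one service-Provenance instance can be built and serialized with the Jena API as follows; the namespace URI and property names are illustrative assumptions, not the exact p-format vocabulary.

    import java.io.FileOutputStream;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Resource;

    // Sketch: build one service-Provenance instance in memory and write it
    // out as an RDF/XML file, one file per service invocation.
    public class FileRecordingSketch {
        public static void record(String invocationId, String input, String output)
                throws Exception {
            Model model = ModelFactory.createDefaultModel();
            String sd = "http://example.org/sd#";   // assumed namespace prefix
            Resource sp = model.createResource(sd + "serviceProvenance/" + invocationId);
            sp.addProperty(model.createProperty(sd, "serviceId"), "DustCloud");
            sp.addProperty(model.createProperty(sd, "inputMessage"), input);
            sp.addProperty(model.createProperty(sd, "outputMessage"), output);
            FileOutputStream out = new FileOutputStream("sp-" + invocationId + ".rdf");
            model.write(out, "RDF/XML");            // serialize the instance
            out.close();
        }
    }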

Benchmarks were performed, and the time taken to execute the recording code of the PCS was measured for various execution tests in both the single service and composite service scenarios.

[Figure 7.4: Single Service Recording: Invocation Computation Time. The graph plots the total invocation time (seconds), with and without the PCS, against the number of points in the input parameter of the service.]

Results: Single Service Scenario

The results obtained for the experiments that recorded the provenance for the single service case in the file system and the MySQL database are now presented. For the experiments performed with the file system, the results are shown as graphs (in Figures 7.4 and 7.5) that illustrate the total time taken, and the memory used, to invoke the DustCloud service when (1) service-Provenance is actively being collected and recorded at run-time, and (2) no provenance is collected and recorded. For each graph, the x-axis shows the value of the number of points parameter passed to the DustCloud service. The y-axis in Figure 7.4 denotes the total invocation response time (i.e., the time taken from sending a request to receiving the result back), and the y-axis in Figure 7.5 denotes the total memory usage for the service invocations for the different tests. The client applications that perform the invocation tests are set up so that the recorded service-Provenance data do not persist across multiple runs. That is, each invocation generates a new file, so there is no recording time overhead (which may occur when recording in a storage system with growing data).

[Figure 7.5: Single Service Recording: Memory Usage. The graph plots memory usage, with and without the PCS, against the number of points in the input parameter of the service.]

Each point plotted in the graphs represents the average time of 30 successive service invocation tests. Each set of 30 successive invocation tests was carried out with the given input number of points, both with and without recording provenance. As the size of the data output for each test increases with the increasing number of points input (Table 7.1), the corresponding amount of memory used to collect and record it also increases, as shown in Figure 7.5. The memory usage represents the amount of heap memory used by the Java Virtual Machine executing the recording code of the PCS. The values have been collected using the Java Management Extensions (JMX) [93]. The graph shows a linear trend, as is expected for the problem sizes used in the tests.
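A minimal sketch of this measurement, assuming the standard JMX MemoryMXBean (the work argument stands in for the PCS recording code being measured):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    // Sketch: sample JVM heap usage around the code under measurement.
    // Garbage collection between the two samples can perturb a single
    // reading, which is why the reported figures are averaged over 30 runs.
    public class HeapProbe {
        public static long heapUsedBy(Runnable work) {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            long before = memory.getHeapMemoryUsage().getUsed();
            work.run();                              // e.g., the PCS recording code
            long after = memory.getHeapMemoryUsage().getUsed();
            return after - before;                   // bytes of heap consumed
        }
    }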

[Figure 7.6: Single Service Recording: Approximate Time taken by PCS. The graph plots the difference in response time (seconds) against the number of points in the input parameter of the service.]

In Figure 7.4, the two curves, indicating service invocation response times with and without recording provenance, rise in parallel; the increase occurs because the service's processing time is increasing. The difference between the response times (in seconds) plotted in these two curves is shown in Figure 7.6, where the x-axis denotes the number of points, as shown in Table 7.1. This provides the approximate times taken by the PCS to record service-Provenance instances as RDF files. As expected, the approximate time, or overhead, taken by the PCS increases with the increasing number of points, but the increase is negligible since the amount of data to record is not that large. In a real-world application, the overhead of using the PCS may depend on the size and amount of the input and output data to be recorded, but not on the processing time of the services used. For example, a scientific application might involve the evaluation of a multidimensional Fourier transform. This consists of a series of one-dimensional FFTs, each of a few hundred or a few thousand points. Thus, the problem size considered here in the composite service (Figure 7.1(2)) represents a one-dimensional FFT service used in the Fourier transformation of data generated from the DustCloud service. Doing a full two- or three-dimensional problem would make the FFT calculation take longer. However, because the processing time for an N-point FFT is O(N log N) and the time to move data in and out of the FFT service is O(N), increasing the problem size would increase the time spent in the FFT service relative to the time spent moving data between services, the workflow engine, and the PCS. Thus, the overhead for recording provenance will be small for larger problems as well.
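The scaling argument can be made explicit. Writing a and b for constants that absorb the per-point compute and transfer costs (an illustrative simplification), the ratio of data-movement time to FFT time is

    \frac{T_{\text{transfer}}(N)}{T_{\text{compute}}(N)} = \frac{bN}{aN\log N} = \frac{b}{a\log N} \longrightarrow 0 \quad \text{as } N \to \infty,

so the relative cost of moving, and hence recording, the data shrinks as the problem size grows.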

The second experiment for single service provenance recording was performed using the MySQL database (used as the Provenance Database). The results are shown in the graph in Figure 7.7, which shows the total response time to invoke the DustCloud service and record the provenance about the invocations as the number of service-Provenance records in the database increases. The x-axis shows the number of service-Provenance records (instances) present in the database. The y-axis denotes the total time to invoke the service and record an instance of service-Provenance.

[Figure 7.7: Single Service Recording in Database: Invocation Computation Time. The graph plots total time, with and without provenance recording, against the number of service-Provenance records in the database.]

Each value plotted in the graph represents the average of 30 successive service invocation tests. Each set of 30 successive invocation tests was carried out for the total number of existing service-Provenance records in the database shown on the x-axis. The time to record provenance for a single service invocation in the database shows a steady increase with respect to the increasing number of database records. The additional time incurred when recording in the database is partly due to the time taken to connect to the database and to open and update the Jena model used to store provenance data in RDF format.
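A rough sketch of this recording path, assuming Jena 2's relational persistence API (the JDBC URL, credentials, model name, and property names are illustrative):

    import com.hp.hpl.jena.db.DBConnection;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.ModelMaker;
    import com.hp.hpl.jena.rdf.model.Resource;

    // Sketch: open (or create) a named Jena model backed by MySQL and add one
    // service-Provenance statement. Connecting to the database and opening
    // the model is the overhead discussed above.
    public class RecordProvenance {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");
            DBConnection conn = new DBConnection(
                    "jdbc:mysql://localhost/provenance", "user", "password", "MySQL");
            ModelMaker maker = ModelFactory.createModelRDBMaker(conn);

            // One Jena model per process instance, named by its unique Process ID.
            Model model = maker.openModel("process-1234", false);
            String sd = "http://example.org/sd#";    // assumed namespace prefix
            Resource svc = model.createResource(sd + "serviceInstance/42");
            svc.addProperty(model.createProperty(sd, "serviceId"), "DustCloud");
            model.close();
            conn.close();
        }
    }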

Results: Composite Service Scenario

The results obtained from the experiments performed using the complex Astrophysics Example Workflow will now be discussed. The response time results for AEW invocations, and the corresponding memory usage, are presented in Figures 7.8 and 7.9, respectively, when (1) the Provenance Collector deployed in the engine is actively recording copies of the messages exchanged during the interactions between the engine and the Web services, together with other relevant provenance information, in an XML file as a temporary log; and (2) the Provenance Collector is inactive, so that no provenance data are logged. The x-axis shows the number of points passed as input parameters to the AEW composite service (n/m means there are n points in the DustCloud model and m points in the telescope model). The y-axis in Figure 7.8 denotes the total response time, and the y-axis in Figure 7.9 the total memory usage, for the AEW invocations for the different tests. In the invocation tests the data collected in the XML file do not persist across multiple tests. That is, at the start of every test run the database is empty, and a new temporary XML file is created to log provenance for each process invocation.

[Figure 7.8: Composite Service Recording: Invocation Computation Time. The graph plots total time, with and without the PCS, against the number of points in the input parameters of the services.]

[Figure 7.9: Composite Service Recording: Memory Usage. The graph plots memory usage, with and without the PCS, against the number of points in the input parameters of the services.]

As in the single service evaluation, each value plotted represents the average of 30 successive AEW invocation tests. Each set of 30 successive tests was carried out with and without the collection of intermediate data (i.e., SOAP messages) by the Provenance Collector. The tests were performed with the number of points input parameters presented in Table 7.2. The increase in the response time at point 1200/600 is as expected, because the problem size input to each of the FFT services in the AEW is 4026 points, compared to 1056 points in the preceding tests. It can be observed that the curves indicate a negligible effect on the total response time of the AEW invocations when the Provenance Collector is actively logging the provenance about all the service invocations. The corresponding memory usage in Figure 7.9 is the heap memory of the Java Virtual Machine, collected using JMX with Java's garbage collector active. The memory usage is largely dependent on the workflow engine's requirements. The Tomcat server in which the engine is deployed also requires a certain amount of memory to run, which is not included in this graph. The workflow engine requires a certain minimum memory to run a given workflow, and the memory usage without the PCS shows the minimum amount of memory used to execute the AEW, as shown in Figure 7.9. This memory requirement will, however, fluctuate with the amount of data, or problem size, moving between the services and the engine, as discussed earlier. Memory usage with the PCS support is also shown. The results are as expected: the memory usage grows linearly with the complexity of the problem.

The difference in the response times, in seconds, between the two curves is plotted in Figure 7.10, which gives the approximate time taken by the Provenance Collector to intercept and collect data about Web service invocations in the engine for the AEW process. Here, the x-axis denotes the number of points of the input parameters (Table 7.2). This provides the approximate time, or overhead, incurred by adding the PCS support. The results in this graph indicate that the time taken by the PCS increases with the increasing number of points, or problem size, as expected. Negative and positive error bars are also plotted for each point, indicating the minimum and maximum approximate times.

It should be noted that the recording of the service-Provenance instances and the data links using the p-format occurs after the invocation of the AEW composite service is completed and the final result is returned. If, instead of the logging approach that has been adopted, the Provenance Collector recorded the service-Provenance instances in RDF during the run-time of the AEW invocation, the response times would be larger, judging by the considerably higher time to record service-Provenance for a single service invocation depicted in Figure 7.6.

[Figure 7.10: Composite Service Recording: Invocation Computation Time. The graph plots the difference in response time (seconds) against the number of points in the input parameters of the services.]

7.3 Evaluation of the Provenance Query Service

This section presents experiments that show the effects of clients simultaneously performing various provenance queries and process re-executions using the Provenance Query Service. The experiments also demonstrate how the performance of the PQS varies as the size of the provenance records retrieved increases, which is shown with various queries. The experiments investigate the scalability of the PQS.

7.3.1 Setup Summary

The hardware and software setup is the same as in Section 7.2.1. The client applications implemented to demonstrate the simultaneous-use scenarios that exercise the components of the PQS run on the same machine as the PCS. The PQS is evaluated using the p-format records of the composite service of the Astrophysics Example Workflow (AEW) presented in Figure 7.1(2). The process instances in p-format are stored as RDF statements, using the Jena API, as Jena models in the MySQL database, based on the default Jena2 database schema [37]. Each Jena model, or record, stored in the database is given a name, which is the unique Process ID. The queries are performed using an API called ARQ, a SPARQL processor for Jena. This API allows SPARQL [43] queries to be executed over an instance of the Dataset Java class, constructed using a DatasetFactory Java object. A Dataset represents a default RDF graph and zero or more named graphs. A client application is created that queries the existing named Jena models in the database and represents them as named graphs of the Dataset object, in order to execute the SPARQL queries over the models representing process instances. Figure 7.11 shows the setup and flow of data for the queries in the following experiments.

[Figure 7.11: Setup and flow of data for the queries. Clients call the PQS on the host machine, which executes SPARQL queries over Jena models through the Jena API, backed by the MySQL database server.]
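A minimal sketch of this query path, assuming Jena 2 with ARQ (the namespace prefix and the processId literal are illustrative, not the exact p-format vocabulary):

    import com.hp.hpl.jena.query.DataSource;
    import com.hp.hpl.jena.query.DatasetFactory;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSetFormatter;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelMaker;

    // Sketch: wrap named Jena models (one per process instance) as named
    // graphs of a Dataset and run a SPARQL query over them with ARQ.
    public class QueryProvenance {
        public static void runQuery(ModelMaker maker, String[] processIds) {
            DataSource dataset = DatasetFactory.create();
            for (String id : processIds) {
                Model m = maker.openModel(id, false);
                dataset.addNamedModel(id, m);        // one named graph per model
            }
            String sparql =
                "PREFIX pd: <http://example.org/pd#> " +   // assumed prefix
                "SELECT ?s ?p ?o " +
                "WHERE { GRAPH ?g { ?s ?p ?o . ?s pd:processId \"1234\" } }";
            QueryExecution qe = QueryExecutionFactory.create(sparql, dataset);
            try {
                ResultSetFormatter.out(System.out, qe.execSelect());
            } finally {
                qe.close();
            }
        }
    }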

The Size of the Results of a Provenance Query

This experiment measures the query response time as the number of provenance records retrieved increases. The experiments are performed with two queries on the Provenance Database, which is loaded with 200 instances of AEW enactments. Each AEW instance has one process-Provenance and ten service-Provenance instances. Thus, the provenance database consists of 200 process-Provenance and 2000 service-Provenance instances. A web client interface (presented in Chapter 7) is provided to query for one or more provenance records using the SPARQL query language. Using this client interface, two types of query, specifying the RDF properties (1) pd:processId and (2) sd:serviceId, were performed to retrieve provenance query results, each as an XML document. The query performed with the RDF property pd:processId retrieves all the RDF statements of a matching process instance record, consisting of a process-Provenance and 10 service-Provenance instances. The service ID query retrieves only the service-Provenance instances as literal values, including the service activity and message contents. For each type of query, the client executes the query once to retrieve a query result. The client fetches between 1 and 100 process instances, or Jena models, out of the 200 available, to perform the query over the dataset. Each type of query is averaged over 30 trial runs, and the query response time and the corresponding query result file size for each trial are plotted in Figures 7.12 and 7.13, respectively.

The x-axis in the graphs shows the number of Jena models the single client retrieves from the database, over which the queries are performed. The y-axis in Figure 7.12 shows the average response time. The bar graphs in Figure 7.13 represent the query result sizes for each of the corresponding query experiments. It can be seen from the graphs that the response time for the query on service ID shows a linear trend: beyond 15 retrieved result sets the curve rises, giving an average response time of 14.45 seconds, and from that point up to 100 models it shows a gradual rise, reaching a response time of 46.98 seconds. Note that the service ID query type's average response time is slightly higher than that of the process ID query. This is because of the way the service ID query is constructed, which takes more time to process. The query result size for the process ID query type is higher than that of the service ID query type. This is because the process ID query retrieves all the RDF triples, with URIs, for the process instances, whereas the service ID query retrieves only the RDF literal values of the service instances, making the file size smaller. It can also be noticed that the response time is high for the amount of data queried in these tests. This is because of the lack of optimization of SPARQL, and also because the indexing mechanism of Jena is not optimal.

[Figure 7.12: Average query response time for the two types of query as the number of retrieved result sets increases.]

[Figure 7.13: Query result file size for the two types of query as the number of retrieved result sets increases.]

Simultaneous Querying by Clients

This experiment measures the average query response time as the number of concurrent clients querying the Provenance Database for provenance records increases. It is performed to evaluate the scalability of the PQS as the number of simultaneous querying clients increases from 1 to 32. The two types of query used in the previous section are performed over ten Jena models. Thus, the querying client application retrieves 10 process-Provenance and 100 service-Provenance instances, respectively. Both types of query are executed over the ten models for an increasing number of simultaneous querying clients. Each query is averaged over 30 trial runs for the given number of concurrent querying clients. The experimental results are presented in Figures 7.14 and 7.15.
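The concurrent-client harness can be sketched as follows; runQuery stands in for one PQS query over the ten Jena models, and all names are illustrative:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch: N clients issue the same query concurrently; the mean
    // per-client response time is returned, and in the experiments above
    // this measurement is itself averaged over 30 trials.
    public class ConcurrentQueryBenchmark {
        public static double averageResponseTime(int clients) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(clients);
            List<Future<Long>> times = new ArrayList<Future<Long>>();
            for (int i = 0; i < clients; i++) {
                times.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long start = System.currentTimeMillis();
                        runQuery();                  // one client's PQS query
                        return System.currentTimeMillis() - start;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> t : times) total += t.get();
            pool.shutdown();
            return total / (double) clients;         // average per client, ms
        }
        private static void runQuery() { /* placeholder for the PQS query call */ }
    }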

[Figure 7.14: Average query response time per client as the number of concurrent clients performing the two kinds of query increases.]

[Figure 7.15: Total response time for all the clients as the number of concurrent clients performing the two kinds of query increases.]

The x-axis in both graphs shows the number of simultaneous querying clients. The y-axis in Figure 7.14 denotes the average response time for each querying client as the number of concurrent clients increases along the x-axis. The y-axis in Figure 7.15 denotes the total response time for all the simultaneous clients; this is the total time to complete the 30 iterations, and is accordingly scaled by a factor of 30 compared with the y-axis in Figure 7.14. The two line plots show the total response times for the two types of query, and the bar charts represent the average query response time for each client as the number of simultaneous clients increases. From the results it can be observed that the plots show a linear trend as the number of querying clients increases beyond 10, taking an average of 33.72 seconds and 92.05 seconds to retrieve the provenance trace, given the process ID, for 10 and 32 clients respectively. The results for the query with service ID similarly indicate that the query component within the PQS has good scalability as the number of simultaneous clients increases. As in the previous experiment, the response time is high because of the use of SPARQL.

7.3.2 Workflow Re-execution by Simultaneous Clients

This experiment evaluates the scalability of the re-execution of queried workflow instance records, which is part of the PQS. It is performed to demonstrate the scalability of the re-execution code and the workflow engine as the number of simultaneous re-executing clients increases from 1 to 70. The example workflow instance is first queried from the provenance database for its re-execution by the clients. The re-execution of the AEW is performed with the same input parameters for each client. Here, only the response time of the re-execution task is taken into account. Each re-execution response time is averaged over 30 trials for the given number of simultaneous clients. The hardware and network configuration and the setup used for the experiment are the same as described in Section 7.2.1 and in Figure 7.3, except that the PCS is not included here.

[Figure 7.16: Average re-execution response time as the number of simultaneous clients re-executing a past workflow increases. The average time per client is plotted on the left y-axis and the total time for all clients on the right y-axis.]

The results of this experiment are presented in Figure 7.16, which shows the average re-execution response time along the left y-axis as the number of simultaneous clients increases along the x-axis. The times are averaged over 30 iterations, and the total time for all the clients to complete the 30 iterations is shown on the right y-axis. The PQS timings show a sub-linear trend as the number of clients increases beyond 10, taking an average of 15.32 seconds and 86.11 seconds, respectively, to re-execute the queried workflow for 10 and 70 clients. As seen from the graph, the slope decreases as the number of clients goes beyond ten, which may be due to the operating system or the machine hosting the workflow engine reaching its threshold. Thus, the response time for re-execution is highly dependent on the hosting environment of the workflow engine and the Web services. During the 30 iterations of re-execution for the increasing numbers of clients, no errors were encountered, which demonstrates the stability and reliability of the ActiveBPEL engine used in the experiments.

7.4 Summary

This chapter has presented experimental results that validate the theoretical concepts proposed in this thesis. It has been shown that the automated techniques developed to collect and record the provenance of process enactments in a service-oriented environment are effective and scalable, thereby establishing the validity of these techniques for capturing and storing provenance. The effect of intercepting process invocations by the Provenance Collection Service component has been experimentally established, and it has been shown that the increase in the response time of the process invocation is negligible. In real-world cases, this enables larger problem sizes to be handled effectively by the PCS. The experiment conducted on the asynchronous recording of the collected provenance information in the database also indicates reasonable performance. The results obtained from the Provenance Query Service experiments have established the good scalability and performance of the PQS component, particularly for simultaneous clients querying and re-executing processes. Due to the lack of optimization of SPARQL, the response time is high for the given result datasets. This problem can be addressed by using appropriate SQL queries directly on the MySQL database.

The experimental results have therefore established that the primary components of our provenance model satisfy the requirements of a data provenance facility in a service-oriented environment. The proposed provenance model has been evaluated with a real scientific workflow example, and it has been demonstrated that it facilitates the capture and recording of data provenance for process invocations, so that past processes can be re-executed and verified. This brings the thesis to its concluding chapter, where the contributions of this research will be summarised and future directions of our work will be outlined.

Chapter 8

Future Work and Conclusions

8.1 Research Summary

The research presented in this thesis has focussed on addressing the specific requirements of the data provenance of Web processes in service-oriented environments. The emergence of the Service-Oriented Architecture (SOA) paradigm as the dominant infrastructure for e-service delivery, and the realization that support for data provenance is both necessary and viable for SOAs, has recently generated significant research interest in data provenance support in SOAs. Discussion in the data provenance research community (see, for example, [22, 28]) has revealed that numerous researchers in a variety of scientific disciplines are recognizing the importance of providing provenance for their data products to research partners and/or other potential data users. These discussions have also revealed the relevance of different application-level views of provenance, and demonstrate that operational systems to achieve provenance-aware applications are not yet common. A recent provenance workshop [20] has reported on provenance research being conducted that is inspired by previous research relating to both provenance and workflow. Very little research has contributed towards providing a provenance-aware framework in a service-based environment; most work has focused on workflow applications with provenance-tracking facilities for scientific domains.

In an open, large-scale and distributed environment of the type used in numerous disciplines of modern computational science, a provenance system helps scientific users to trace and evaluate research results of interest. In an SOA, there is a significant distinction between the abstract workflow document (or workflow specification) and the provenance document about the results produced from the invocation of that workflow. The work presented in this thesis has sought to elucidate the interrelations and differences between these two concepts, so that a system designed to support provenance tracing in an SOA is generic and well suited to the needs of research scientists using a service-based infrastructure in their everyday computational investigations. This thesis has proposed a model to provide support for provenance in service-based environments, and has advanced the state of the art by realizing the necessity of a generic and simple provenance system for use by research scientists. This thesis has focused on addressing the constraints imposed on scientists who need to retain a history of their computations when interacting in service-based environments.

This chapter concludes the thesis by summarizing the research contributions of this work in Section 8.2, and by outlining the future directions that we intend to pursue in Section 8.3.

8.2 Research Contributions

The primary theme of this thesis concerns how researchers performing scientific experiments in a service-based environment can best compose and express data provenance for their new scientific data products, and how they can query and reuse that provenance. This thesis has contributed to the state of the art of provenance support in service-oriented environments in a broad and non-domain-specific way. This section presents the precise research contributions of this thesis with reference to the research objectives posed in Chapter 1. These contributions are as follows:

• Development of the provenance architectural model. The thesis has proposed a provenance model that supports the requirements of a data provenance facility for distributed, service-based workflows. These requirements are to provide a capability for capturing and storing data provenance, and for querying, exploiting and reusing the captured provenance information. The provenance model combines features of the widely used Web service and client-server models, which facilitates the exposure of parts of the PCS and PQS components of the provenance model as Web services.

• Development of a provenance format for representing the data provenance of workflow executions in an SOA. Although there are standards to represent the provenance of a document, such as the Dublin Core [58], there is no Web Services standard for representing provenance information, especially for services and process enactments. Some existing provenance representation techniques are either domain-specific or technically too complex [84]. For example, the model presented in [84] is based on records about interactions between services. There, provenance assertions about interactions are required to be recorded by all the involved services, based on unique identifiers of the interactions that are generated and passed between the services. The various provenance assertions carry different identifiers, which require in-depth learning to understand and use the recording mechanism and to perform queries. This thesis has proposed a provenance representation model, called the p-format, that is simple to learn, extend, and adapt. The extensibility and flexibility of the p-format is due to the RDF data structure that has been used, which enables the creation of the new vocabulary needed to add additional meaningful provenance information with little or no change in the PCS and PQS components. The p-format is designed, in particular, to structure provenance for atomic and composite services, and for the data links that occur between the services of a composite service execution. Thus, the p-format facilitates the structuring and retrieval of the provenance for a piece of data in a way that is meaningful and interpretable by humans.

• Development of a PCS with automated capturing and recording of data provenance. Scientific research by its nature often involves complex computational processing of sensitive and large-scale data sets in a distributed environment. This necessitates a mechanism for collecting provenance in an SOA with minimal impact on the total time to process the complex computation. This dissertation has developed a provenance collection mechanism that intercepts and logs provenance information at the different service invocation points in the execution of a composite process. The logs for a process execution are processed following the completion of the execution, to record the provenance in the Provenance Database according to the provenance format. The time expended on assigning and recording provenance information in the database does not significantly burden the process enactment itself. Existing provenance support in SOAs addresses provenance recording but does not focus on the need to record provenance with little user interaction and execution overhead. This thesis has developed the PCS component to automatically collect and record provenance for process enactments. In order to support implementations, logs are provided as XML documents, which can then be processed by the PCS for asynchronous provenance recording.

• Development of the PQS for querying, re-execution, and re-creation of workflows. This thesis has also developed the PQS component, which provides the ability to query and reuse the recorded data provenance of process executions. The PQS component provides interfaces to retrieve process instances. In addition, the interface supports the re-execution of processes with different input parameters. Existing provenance research in SOAs has also provided querying tools, but few deal with the re-execution of previous workflows, and those that do are either application-specific or do not cater for a service-based environment. This thesis provides a mechanism for the re-execution of past processes that fulfills the requirements of research scientists to verify results and to perform what-if analyses on the processes. This assumes that the required services exist at the time of re-execution.

• Experimental evaluations of the PCS and PQS. Finally, the PCS and PQS components of our provenance model have been evaluated, and experimental results have been presented and analyzed.

The above discussion has highlighted the principal contributions of this thesis. Future directions of this research work will now be briefly discussed.

8.3 Research Directions

The previous section has described the primary contributions of our research; this section concludes the thesis by outlining areas for future work.

The dissertation has focussed on modelling, recording and querying the provenance of workflow enactments in a service-based environment. Motivated by the idea of a “Living Document”, as introduced in Chapter 1, we have considered the recording of provenance such that the captured provenance enables the re-execution of past workflows. At the start of the re-execution of a past workflow, problems may arise from services being unavailable, moved, or no longer existing at that point in time, thereby causing the re-execution to fail. Enhancing the re-execution model to tackle such problems, by searching for and selecting a similar service (i.e., one that performs the same operation) and substituting it in the workflow to be re-executed, is being considered. The approach to searching may be closely related to the increasingly active area of semantic-based service description, discovery, and matching. We intend to study the searching and matching of semantically described services, and to develop an integrated approach that provides a way to handle re-execution failures caused by unavailable, moved or terminated services. We also intend to consider the notion of “smart” re-execution, where only the services whose inputs have changed are re-executed in the workflow. This is valuable for scientists investigating larger workflows, as it reduces the re-execution time and subsumes the concept of checkpointing. Incorporating this requires the workflow engine to interpret the provenance data, which therefore requires further investigation.

While part of the work in this thesis is specific to the re-execution of past workflows, a future extension providing re-execution without using BPEL may be considered. Although the PCS was utilized in Section 7.2 to record the provenance of a service invocation without using BPEL, further investigation of the provenance data model is required in this case to see how re-execution could be supported.

Future development of the PCS for automatic provenance recording will move in the direction of creating APIs for recording data provenance directly by the researcher performing the experiments. This would make it easier for researchers to use the data provenance recording functionality directly, and would help to include additional information, for example causality relationships, through manual assertions, since some information cannot be automatically recorded and interpreted. This approach would complement the trend toward increasing online propagation of scientific data by providing data provenance consisting of human-understandable details that explain the workflow process that led to a particular piece of data.

The implementation presented to re-create workflows using a graphical display provides a high-level view of what happened to produce some piece of data. Applying this as a tool to enable researchers to place some degree of trust in the produced data is being considered. In our implementation of the PQS we have experimented with multiple aspects of exploiting data provenance. We intend to investigate how trust in an enacted workflow may be established and modelled with respect to data provenance.

Data provenance has thus far been the method of choice for enhancing scientific workflows and applications that deal with the production of data with added scientific value. On the one hand, the distributed Web-based approaches adopted by researchers focus on supporting the needs of many researchers in setting up an infrastructure for sharing scientific data and metadata in order to propagate scientific resources. On the other hand, service-based adaptation of scientific workflows has the potential to bring data provenance infrastructure of the type presented here to a level at which scientific data can be verified by re-executing the workflow that produced that data. This benefits researchers trying to study published works, by giving greater insight into the research of others, and by bringing new opportunities and challenges to the research community. Future extensions of our research on provenance support for service-based scientific workflows will focus on the exploitation of data provenance. One focus will be the investigation of better ways to automatically generate the data provenance of workflows, in order to capture richer semantics that would help a later search for a method to solve a particular problem. For example, given a set of data provenance records of workflows, it would be interesting to answer provenance questions such as: how can a system discover or synthesize a workflow if the only input the user provides is a desired outcome (one that may only be formed by combining parts of workflow recipes and data from enacted workflows [68])?

8.4 Research Publications

Different aspects of this research have been validated and presented in the proceedings of peer-reviewed international conferences:

• Shrija Rajbhandari and David W. Walker. Support for Provenance in a Service-based Computing Grid. In Proceedings of the UK e-Science All Hands Meeting, Nottingham, UK, 2004.

• Shrija Rajbhandari and David W. Walker. Incorporating Provenance in Service Oriented Architecture. In Proceedings of the 2nd International Conference on Next Generation Web Services Practices (NWeSP), Seoul, South Korea, 2006.

Bibliography

[1] ATLAS Chimera. http://grid.uchicago.edu/atlaschimera/, 2004.

[2] Collaboratory for Multi-Scale Chemical Science. http://cmcs.org/home.php, 2004.

[3] myGrid Project. http://www.mygrid.org.uk/, 2004.

[4] Provenance Aware Service Oriented Architecture (PASOA) Project. http://www.pasoa.org/, 2004.

[5] MySQL Database Server. http://www.mysql.com/, 2005.

[6] Universally Unique Identifier (UUID). http://en.wikipedia.org/wiki/UUID, 2006.

[7] Web-based Distributed Authoring and Versioning (WebDAV). http://www.webdav.org/, 2006.

[8] The World Wide Web Consortium (W3C). http://www.w3.org/, 2007.

[9] Active Endpoints, Inc. ActiveBPEL open source engine. http://www.active-endpoints.com/active-bpel-engine-overview.htm, 2005.

[10] Active Endpoints, Inc. ActiveBPEL designer. http://www.active-endpoints.com/active-bpel-designer.htm, 2006.

[11] Alexander Aiken, Jolly Chen, Michael Stonebraker, and Allison Woodruff. Tioga-2: A direct manipulation database visualization environment. In IEEE Proceedings of the 12th International Conference on Data Engineering, 1996.

[12] International Virtual Observatory Alliance. An IVOA Standard for Unified Content Descriptor. http://www.ivoa.net/Documents/latest/UCD.html, 2005.

[13] Gustavo Alonso and Claus Hagen. Geo-Opera: Workflow Concepts for Spatial Processes. In Proceedings of the GIS/LIS Conference, pages 144-153, 1997.

[14] Altova. Altova XMLSpy 2007, the industry-standard XML editor. http://www.altova.com, 2007.

[15] Altova. SemanticWorks 2007, visual Semantic Web tool. http://www.altova.com/downloadtrialsemanticworks.html, 2007.

[16] Tony Andrews, Francisco Curbera, Hitesh Dholakia, Yaron Goland, Johannes Klein, Frank Leymann, Kevin Liu, Dieter Roller, Doug Smith, Satish Thatte, Ivana Trickovic, and Sanjiva Weerawarana. Business Process Execution Language for Web Services (BPEL4WS). ftp://www6.software.ibm.com/software/developer/library/ws-bpel.pdf, 2005.

[17] James Annis, Yong Zhao, Jens Voeckler, Michael Wilde, Steve Kent, and Ian Foster. Applying Chimera virtual data concepts to cluster finding in the Sloan Sky Survey. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1-14, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

[18] M. Beasley, S. Datta, H. Kogelnik, H. Kroemer, and D. Monroe. Report of the investigation committee on the possibility of scientific misconduct in the work of Hendrik Schön and co-authors. Technical report, 2002.

[19] Marjorie Berdeen, Eric Gilbert, Thomas Jordan, Paul Nepywoda, Elizabeth Quigg, Michael Wilde, and Yong Zhao. The QuarkNet/Grid collaborative learning e-lab. In Second International Workshop on Collaborative and Learning Applications of Grid Technology and Grid Education, 2005.

[20] Dave Berry, Peter Buneman, Ian Foster, and James Frew. Proceedings of the International Provenance and Annotation Workshop (IPAW 2006). http://www.ipaw.info/ipaw06/proceedings.html, Chicago, Illinois, May 2006.

[21] Dave Berry, Peter Buneman, Michael Wilde, and Yannis Ioannidis. Workshop on data provenance and annotation, December 2003. National e-Science Institute, Edinburgh.

[22] Dave Berry, Peter Buneman, Michael Wilde, and Yannis Ioannidis. Workshop on data provenance and annotation. http://www.nesc.ac.uk/esi/events/304/, December 2003.

[23] Rajendra Bose and James Frew. Composing lineage metadata with XML for custom satellite-derived data products. In 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, 21-23 June 2004.

[24] S. E. Brenner. Errors in genome annotation. Trends in Genetics, volume 15, pages 132-133, 1999.

[25] D. O. Briukhov, L. A. Kalinichenko, and V. N. Zakharov. Diversity of domain descriptions in natural science: virtual observatory as a case study. In Proceedings of the 7th Russian Conference on Digital Libraries (RCDL 2005), Yaroslavl, Russia, 2005.

[26] Leigh Brookshaw. Java 2D Graph Package. http://www.sci.usq.edu.au/staff/leighb/graph/, 2006.

[27] Peter Buneman, Alin Deutsch, and Wang-Chiew Tan. A deterministic model for semistructured data. In Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, pages 14-19, 1999.

[28] Peter Buneman and Ian Foster. Workshop on Data Derivation and Provenance. http://www-fp.mcs.anl.gov/~foster/provenance/, Chicago, October 17-18, 2002.

[29] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. Data provenance: Some basic issues. In Foundations of Software Technology and Theoretical Computer Science, volume 1974/2000, page 87, New Delhi, India, December 2000. Lecture Notes in Computer Science, Springer-Verlag.

[30] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. Why and where: A characterization of data provenance. In International Conference on Database Theory (ICDT 2001), volume 1973/2001, pages 316-330. Lecture Notes in Computer Science, Springer-Verlag, January 2001.

[31] Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. On propagation of deletions and annotations through views. In PODS '02: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 150-158, New York, NY, USA, 2002. ACM Press.

[32] Nicola Cannata, Flavio Corradini, and Emanuela Merelli. A resourceomic grid for bioinformatics. Future Generation Computer Systems, 23(3):510-516, 2007.

[33] Maria Claudia Cavalcanti, Marta Mattoso, Maria Luiza Campos, Eric Simon, and Francois Llirbat. An architecture for managing distributed scientific resources. In 14th International Conference on Scientific and Statistical Database Management (SSDBM), 2002.

[34] Maria Claudia Cavalcanti, Marta Mattoso, Maria Luiza Campos, Francois Llirbat, and Eric Simon. Sharing scientific models in environmental applications. In SAC '02: Proceedings of the 2002 ACM Symposium on Applied Computing, pages 453-457, New York, NY, USA, 2002. ACM Press.

[35] Kishore Channabasavaiah, Kerrie Holley, and Edward Tuggle. Migrating to a service-oriented architecture. IBM developerWorks. http://www-128.ibm.com/developerworks/library/ws-migratesoa/, December 2003.

[36] Hewlett-Packard Development Company. Jena: a Semantic Web framework for Java. http://jena.sourceforge.net/, 2004.

[37] Hewlett-Packard Development Company. Jena2 database interface - database layout. http://jena.sourceforge.net/DB/layout.html, 2004.

[38] The Gene Ontology Consortium. Gene Ontology database. http://www.geneontology.org/.

[39] Y. Cui and J. Widom. Practical lineage tracing in data warehouses. In Proceedings of the 16th International Conference on Data Engineering (ICDE'00), San Diego, California, February 2000.

[40] Yingwei Cui, Jennifer Widom, and Janet L. Wiener. Tracing the lineage of view data in a warehousing environment. ACM Transactions on Database Systems, volume 25, pages 179-227, June 2000.

[41] Vikas Deora, J. Shao, W. A. Gray, and N. J. Fiddian. Modelling quality of service in service oriented computing. In Second IEEE International Symposium on Service-Oriented System Engineering (SOSE'06), pages 95-101. IEEE Computer Society, November 2006.

[42] D. Devos and A. Valencia. Intrinsic errors in genome annotation. Trends in Genetics, volume 17, pages 429-431, 2001.

[43] W3C Working Draft. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/, 4 October 2006.

[44] University of California Environmental Information Laboratory. The Earth System Science Server (ES3). http://essw.bren.ucsb.edu/projects/proj-essw.htm, 2004.

[45] Environmental Systems Research Institute, Inc. Understanding GIS - the Arc/Info method. http://www.ciesin.columbia.edu/docs/005-331/005-331.html, 1992.

[46] Extreme Lab, Indiana University. Web and XML Services Utility Library (WS/XSUL). http://www.extreme.indiana.edu/xgws/xsul/, 2004.

[47] Hao Fan. Tracing data lineage using AutoMed schema transformation pathways. In BNCOD 19: Proceedings of the 19th British National Conference on Databases, pages 50-53, London, UK, 2002. Springer-Verlag.

[48] Ian T. Foster. The anatomy of the Grid: Enabling scalable virtual organizations. In First IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2001), pages 6-7, Brisbane, Australia, May 15-18 2001.

[49] Ian T. Foster, Jens-S. Vockler, Michael Wilde, and Yong Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In 14th International Conference on Scientific and Statistical Database Management (SSDBM), pages 37-46, 2002.

[50] James Frew and Rajendra Bose. Earth Science Workbench: A data management infrastructure for earth science products. In SSDBM '01: Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management, pages 180-189, Washington, DC, USA, July 18-20 2001.

[51] James Frew and Jeff Dozier. Data management for earth system science. SIGMOD Record (ACM Special Interest Group on Management of Data), 26(1):27-31, 1997.

[52] G-Hydroflex. PlotWS, Southampton Regional e-Science Centre. http://www.soton.ac.uk/ghydflex/plotws/, 2006.

[53] Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. Improving data cleaning quality using a data lineage facility. In Design and Management of Data Warehouses, page 3, 2001.

[54] Dan Gisolfi. Web service architecture: Part 1. http://www.ibm.com/developerworks/webservices/library/ws-arc1/, April 2001.

[55] Paul Groth, Simon Miles, and Luc Moreau. PReServ: Provenance recording for services. In Simon J. Cox and David W. Walker, editors, Proceedings of the UK e-Science All Hands Meeting 2005, published on CD, Nottingham, UK, September 2005.

[56] W3C Working Group. Web services architecture. http://www.w3.org/TR/ws-arch/, February 2004.

[57] Yan Huang. Service Workflow Language (SWFL). http://users.cs.cf.ac.uk/Yan.Huang/GridWF/SWFL.htm, 2003.

[58] Dublin Core Metadata Initiative. Dublin Core Metadata Element Set, Version 1.1. http://dublincore.org/documents/dces/, 2006.

[59] Jens Schwidder, Tara Talbott, and James D. Myers. Bootstrapping to a semantic grid. In Proceedings of the Semantic Infrastructure for Grid Computing Applications Workshop at the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2005), May 9-12 2005.

[60] A. Jones, R. White, N. Pittas, W. Gray, T. Sutton, X. Xu, O. Bromley, N. Caithness, F. Bisby, N. Fiddian, M. Scoble, A. Culham, and P. Williams. BiodiversityWorld: An architecture for an extensible virtual laboratory for analysing biodiversity patterns. In UK e-Science All Hands Meeting, EPSRC, pages 759-765, Nottingham, UK, September 2003.

[61] J. Zhao, C. A. Goble, M. Greenwood, C. Wroe, and R. Stevens. Annotating, linking and browsing provenance logs for e-science. In 1st Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, in conjunction with the 2nd International Semantic Web Conference, 20th October 2003.

[62] Peter Karp. What we do not know about sequence analysis and sequence databases. Bioinformatics, pages 753-754, 1998.

[63] Pacific Northwest National Laboratory. Scientific Annotation Middleware (SAM). http://collaboratory.emsl.pnl.gov/software/sam/, 2006.

[64] D. P. Lanter. A lineage meta-database program for propagating error in geographic information systems. In Proceedings of the GIS/LIS Conference, pages 144-153, 1990.

[65] D. P. Lanter. Design of a lineage-based meta-database for GIS. Cartography and Geographic Information Systems, pages 255-261, 1991.

[66] D. P. Lanter. A lineage meta-database approach toward spatial analytic database optimization. Cartography and Geographic Information Systems, pages 112-121, 1993.

[67] D. P. Lanter. Comparison of spatial analytic applications of GIS. In Environmental Information Management and Analysis: Ecosystem to Global Scales, pages 413-425, 1994.

[68] Shalil Majithia, David W. Walker, and W. A. Gray. Automated composition of semantic grid services. In Proceedings of the UK e-Science All Hands Meeting, Nottingham, UK, 2004.

[69] Arunprasad P. Marathe. Tracing lineage of array data. In Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management (SSDBM), pages 69-78, 18-20 July 2001.

[70] Deborah L. McGuinness and Paulo Pinheiro da Silva. Inference Web: Portable and shareable explanations for question answering. In Proceedings of the American Association for Artificial Intelligence, Spring Symposium Workshop on New Directions for Question Answering, pages 67-71. AAAI Press, March 2003.

[71] Deborah L. McGuinness and Paulo Pinheiro da Silva. Infrastructure for web explanations. In Proceedings of the 2nd International Semantic Web Conference (ISWC2003), pages 113-129. Springer, October 2003.

[72] Deborah L. McGuinness and Paulo Pinheiro da Silva. Explaining answers from the Semantic Web: The Inference Web approach. Journal of Web Semantics, volume 1, pages 397-413, October 2004.

[73] Deborah L. McGuinness and Paulo Pinheiro da Silva. Trusting answers from web applications. In Mark T. Maybury, editor, New Directions in Question Answering. AAAI/MIT Press, October 2004.

[74] James D. Myers, Carmen Pancerella, C. Lansing, K. L. Schuchardt, and B. Didier. Multi-scale science: supporting emerging practice with semantically derived provenance. In ISWC 2003 Workshop: Semantic Web Technologies for Searching and Retrieving Scientific Data, October 20 2003.

[75] Grid Physics Network. GriPhyN Project. http://www.griphyn.org/, 2006.

[76] University of Chicago. Open Grid Services Architecture (OGSA). http://www.globus.org/ogsa/, 2007.

[77] University of Southampton. Centre for Proteomic Research. http://www.proteome.soton.ac.uk/, 2004.

[78] OASIS Open. Universal Description Discovery and Integration (UDDI) Specification Version 3. http://uddi.org, 2006.

[79] G. Scott Owen. Geographical Information Systems (GIS). http://www.siggraph.org/education/materials/HyperVis/applicat/gis/gis.htm, 1999.

[80] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.

[81] Apache Web Services Project. Apache SOAP (WS-SOAP). http://ws.apache.org/soap/, 2003.

[82] Apache Web Services Project. Axis. http://ws.apache.org/axis/, 2004.

[83] Apache Web Services Project. Web Services Invocation Framework (WSIF). http://ws.apache.org/wsif/, 2004.

[84] EU Provenance Project. Enabling and supporting provenance in grids for complex problems. http://gridprovenance.org, 2006.

[85] SOAP Lite Project. SOAP::Lite. http://www.soaplite.com/, 2005.

[86] Shrija Rajbhandari, Ali Shaikh Ali, Ian Wootten, and Omer F. Rana. Evaluating provenance-based trust for scientific workflows. In Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid06), pages 365-372, Singapore, May 2006.

[87] W3C RDF Core Working Group. Resource Description Framework (RDF). http://www.w3.org/RDF/, 2004.

[88] W3C Recommendation. XML Path Language (XPath) 2.0. http://www.w3.org/TR/xpath, 2007.

[89] P. Ruth, D. Xu, B. K. Bhargava, and F. Regnier. E-notebook middleware for accountability and reputation based trust in distributed data sharing communities. In Proceedings of the 2nd International Conference on Trust Management, Oxford, UK, 2004. LNCS, Springer.

[90] Spatial Data Transfer Standard (SDTS). U.S. Geological Survey. http://mcmcweb.er.usgs.gov/sdts/SDTS_standard_nov97/partlbl2.html, 2001.

[91] D. Sulakhe, A. Rodriguez, M. D'Souza, M. Wilde, V. Nefedova, I. Foster, and N. Maltsev. GNARE: an environment for grid-based high-throughput genome analysis. In Cluster Computing and the Grid, CCGrid 2005, IEEE International Symposium on, pages 455-462. IEEE International Press, 2005.

[92] Sun Microsystems, Inc. Java 2 Software Development Kit (J2SDK). http://java.sun.com/, 2005.

[93] Sun Microsystems, Inc. Java Management Extensions (JMX). http://java.sun.com/products/JavaManagement/, 2005.

[94] Sun Microsystems, Inc., Sun Developer Network (SDN). JavaServer Pages (JSP) Technology 2.0. http://java.sun.com/products/jsp/, 2003.

[95] Tara Talbott, Michael Peterson, Jens Schwidder, and James D. Myers. Adapting the electronic laboratory notebook for the semantic era. In Proceedings of the 2005 International Symposium on Collaborative Technologies and Systems (CTS 2005), May 15-20 2005.

[96] The Apache Software Foundation. Apache Tomcat, http://tomcat.apache.org/,

2006.

[97] The National Collaboratories Program, U.S. Department of Energy. Collabo-

ratory for Multi-Scale Chemical Science (CSMC). http://cmcs.org/index.php,

2005.

[98] EGEE User and Application Portal. Earth science application, http://www.eu-

egee.org/.

[99] H. Veregin and Peter D.Lanter. Data-quality enhancement techniques in layer-

based geographic information systems. In Computers, Environment and Urban

Systems, pages 23-36, 1995.

[100] World Wide Web Consortium (W3C). Web Services Descriptipn Lan-

guage(WSDL)l.l. http://www.w3.org/TR/wsdl, 15 March, 2001.

[101] World Wide Web Consortium (W3C). SOAP version 1.2 specification, messag­

ing framework. http://www.w3.org/TR/soapl2-partl/, 24 June, 2003.

[102] W3C Recommendation. XQuery 1.0: An XML Query Language,

http://www.w3.org/TR/xquery/, 2007.

[103] D. W. Walker, M. Li, O. F. Rana, M. S. Shields, and Y. Huang. The software architecture of a distributed problem-solving environment. Concurrency: Practice and Experience, 12(15):1455-1480, December 2000.

[104] Inference Web. Semantic Web infrastructure for provenance and justification. http://www.inference-web.org/, 2006.

[105] Jennifer Widom. Trio: A system for integrated management of data, accuracy, and lineage. In Second Biennial Conference on Innovative Data Systems Research (CIDR), pages 262-276, January 2005.

[106] Wikipedia. Workflow definition. http://en.wikipedia.org/wiki/Workflow.

[107] Thomas Williams, Colin Kelley, Russell Lang, Dave Kotz, John Campbell, Gershon Elber, and Alexander Woo. Gnuplot. http://www.gnuplot.info/, 2006.

[108] Allison Woodruff and Michael Stonebraker. Supporting fine-grained lineage in a database visualization environment. In Proceedings of the 13th IEEE International Conference on Data Engineering, 1997.

[109] Ian Wootten, Shrija Rajbhandari, and Omer Rana. Automatic assertion of actor state in service oriented architectures. In IEEE International Conference on Web Services, Salt Lake City, Utah, USA, July 2007.

[110] Ilya Zaihrayeu, Paulo Pinheiro da Silva, and Deborah L. McGuinness. IWTrust: Improving user trust in answers from the web. In Proceedings of the 3rd International Conference on Trust Management (iTrust 2005), Rocquencourt, France, October 2005. Springer.

[111] Jun Zhao, Carole Goble, Robert Stevens, and Sean Bechhofer. Semantically linking and browsing provenance logs for e-science. In Proceedings of the 1st International Conference on Semantics of a Networked World, pages 155-176, Paris, France, January 2004.

[112] Yong Zhao, Michael Wilde, Ian Foster, Jens Voeckler, James Dobson, Eric Gilbert, Thomas Jordan, and Elizabeth Quigg. Virtual data grid middleware services for data-intensive science. Concurrency and Computation: Practice and Experience, pages 455-462, 2005.

Appendix A

RDFS of Provenance Format

This appendix shows the RDF Schema (RDFS) used to structure the provenance of a web process.
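Because the listing survives only in fragments (summarised below), the following RDF/XML instance is a minimal illustrative sketch of the format rather than the schema itself: the names serviceInstanceProfile, MessageContents and DataFlow are taken from the recovered fragments, while the pd namespace URI, the linking properties (pd:message, pd:source, pd:target) and all resource URIs are hypothetical.

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:pd="http://provenance.example.org/pd#">

      <!-- One profile per service invocation in the executed process;
           profiles are recorded in the order the services ran. -->
      <pd:serviceInstanceProfile rdf:about="urn:run1:FFTService">

        <!-- The message content exchanged by this service instance -->
        <pd:message>
          <pd:MessageContents rdf:about="urn:run1:FFTService:input"/>
        </pd:message>

        <!-- Data-flow link: this service's input is the output of the
             service instance that produced it -->
        <pd:dataFlow>
          <pd:DataFlow>
            <pd:source rdf:resource="urn:run1:PadZero:output"/>
            <pd:target rdf:resource="urn:run1:FFTService:input"/>
          </pd:DataFlow>
        </pd:dataFlow>

      </pd:serviceInstanceProfile>
    </rdf:RDF>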

[The RDFS listing itself did not survive the source scan. The legible fragments name a pd namespace prefix; a serviceInstanceProfile element, described as the sequence of services in the process; MessageContents and DataFlow elements; and one schema comment: "To provide the data flow link between the service instances in the workflow, we link the input and output data of this service with the source and target of the data received and sent."]

Appendix B

service-Provenance XML Schema

[The service-Provenance XML Schema listing did not survive the source scan.]

Appendix C

BPEL Workflow

[Only fragments of the BPEL process document and of the WSDL files for the constituent services survive in the source scan; the legible pieces are reproduced below, with garbled characters repaired.]

    <!-- END OF FLOW-1 -->

    <wsdl:input name="yValuesRequest">

    <wsdl:definitions targetNamespace="http://131.125.49.136:8080/axis/services/telescopeData"
        xmlns="http://schemas.xmlsoap.org/wsdl/"
        xmlns:apachesoap="http://xml.apache.org/xml-soap"
        xmlns:impl="http://131.125.49.136:8080/axis/services/telescopeData"
        xmlns:intf="http://131.125.49.136:8080/axis/services/telescopeData"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">

    <wsdl:operation name="telData">

    <wsdl:definitions targetNamespace="http://131.125.49.136:8080/axis/services/PadZero"
        xmlns="http://schemas.xmlsoap.org/wsdl/"
        xmlns:apachesoap="http://xml.apache.org/xml-soap"
        xmlns:impl="http://131.125.49.136:8080/axis/services/PadZero"
        xmlns:intf="http://131.125.49.136:8080/axis/services/PadZero"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">

    <wsdl:definitions targetNamespace="http://localhost:8080/axis/services/MyFFT"
        xmlns="http://schemas.xmlsoap.org/wsdl/"
        xmlns:apachesoap="http://xml.apache.org/xml-soap"
        xmlns:impl="http://localhost:8080/axis/services/MyFFT"
        xmlns:intf="http://localhost:8080/axis/services/MyFFT"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
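For orientation, here is a minimal sketch of how the telescopeData definitions above might continue. Only the target namespace, the telData operation, the input name yValuesRequest and the part name yValuesReturn come from the recovered material; the message structure, the xsd:string part type and the portType name are assumptions.

    <wsdl:definitions targetNamespace="http://131.125.49.136:8080/axis/services/telescopeData"
        xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
        xmlns:impl="http://131.125.49.136:8080/axis/services/telescopeData"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">

      <!-- The request is assumed to carry no parts; the response part name
           yValuesReturn appears in a copy rule in the diagrams below, but
           its xsd:string type is an assumption. -->
      <wsdl:message name="yValuesRequest"/>
      <wsdl:message name="yValuesResponse">
        <wsdl:part name="yValuesReturn" type="xsd:string"/>
      </wsdl:message>

      <!-- The portType name is hypothetical -->
      <wsdl:portType name="TelescopeDataPortType">
        <wsdl:operation name="telData">
          <wsdl:input name="yValuesRequest" message="impl:yValuesRequest"/>
          <wsdl:output name="yValuesResponse" message="impl:yValuesResponse"/>
        </wsdl:operation>
      </wsdl:portType>
    </wsdl:definitions>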

[Figure: BPEL designer views of the convolution workflow (two screenshots; legend: Sequence, Flow, Assign inputs and outputs, Receive process request, Invoke Web service, Process reply). The legible activity labels are: receiveRequest; S1-DustCloud invoking DustCloudService; assignData1-OutputDustCloud-to-PowerOfTwo and assignData2-OutputTelescope-to-PowerOfTwo; invokePowerOfTwoService; OutputPOT-To-ZEROPAD1 and OutputPOT-To-ZEROPAD2; OutputDustCloud-To-ZP1 and OutputTelescope-To-ZP2; flow-2 containing the sequences S3-FFTService-seq1 and S4-FFTService-seq2 with assign-OutputZP1-To-FFT1, assign-OutputZP2-To-FFT2, invoke-FFTService1 and invoke-FFTService2; FFT1ToConvolve and FFT2ToConvolve; invoke-ConvolveService; ConvolveToInvFFT; FFTsign1-from-expression and FFTsign2-from-expression; invoke-InverseFFTService; and replyResponse. One legible copy rule: <from part="yValuesReturn" variable="outputDC"/>.]
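Taken together, the labels above suggest the overall shape of the process. The following BPEL4WS 1.1 skeleton is a reconstruction under that reading, not the thesis's actual document: the activity names and their ordering are taken from the diagrams, while the process name is invented and the required partnerLink, portType, operation and variable attributes are omitted for brevity.

    <!-- Reconstruction from the recovered diagram labels; see the caveats above. -->
    <process name="ConvolutionProcess"
             xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
      <sequence>
        <receive name="receiveRequest"/>

        <!-- Fetch the dust-cloud data -->
        <invoke name="S1-DustCloud"/>   <!-- DustCloudService -->

        <!-- Pad both data sets to a power-of-two length -->
        <assign name="assignData1-OutputDustCloud-to-PowerOfTwo"/>
        <assign name="assignData2-OutputTelescope-to-PowerOfTwo"/>
        <invoke name="invokePowerOfTwoService"/>

        <!-- Zero-pad both series -->
        <assign name="OutputPOT-To-ZEROPAD1"/>
        <assign name="OutputPOT-To-ZEROPAD2"/>

        <!-- Transform the two series in parallel -->
        <flow name="flow-2">
          <sequence name="S3-FFTService-seq1">
            <assign name="assign-OutputZP1-To-FFT1"/>
            <invoke name="invoke-FFTService1"/>
          </sequence>
          <sequence name="S4-FFTService-seq2">
            <assign name="assign-OutputZP2-To-FFT2"/>
            <invoke name="invoke-FFTService2"/>
          </sequence>
        </flow>

        <!-- Convolve the spectra, then invert the transform -->
        <assign name="FFT1ToConvolve"/>
        <assign name="FFT2ToConvolve"/>
        <invoke name="invoke-ConvolveService"/>
        <assign name="ConvolveToInvFFT"/>
        <invoke name="invoke-InverseFFTService"/>

        <reply name="replyResponse"/>
      </sequence>
    </process>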