Data Life Cycle: Introduction, Definitions and Considerations
EUDAT, Sept. 25, 2014
Prof. Peter Fox (@taswegian, #twcrpi)
Tetherless World Constellation Chair (Earth and Environmental Science / Computer Science / Cognitive Science / IT and Web Science)
Rensselaer Polytechnic Institute, Troy, NY USA

Life Cycle

Definitions
• Data (management) life-cycle broad elements:
  – Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)
  – Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future
  – Preservation: Process of retaining usability of data in some source form for intended and unintended use
• Stewardship: Process of maintaining integrity for acquisition, curation, preservation
• BUT…

Definitions ctd.
• (Data) Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

Digital Curation Centre

MIT DDI Alliance Life Cycle

Data-Information-Knowledge Ecosystem
[Figure labels: Producers, Consumers, Experience, Data, Information, Knowledge, Creation, Presentation, Integration, Gathering, Organization, Conversation, Context]

Data Life Cycle embedded in Research Life Cycle
• Information Life Cycle
• Knowledge Life Cycle

Type of knowledge created
• Tacit (created and stored informally):
  – Human memory
  – Localized, e.g. hard drive of the computer
  – Movement of tacit information into a formalized structure
• Explicit (created and stored formally):
  – Network shared
  – Network Web site/intranet
  – Informal knowledge-management system
  – Document-management system
  – Formal KM system

Curation…
[Figure labels: Producers, Consumers, Quality Control, Quality Assessment, Fitness for Purpose, Fitness for Use, Trustee, Trustor]

Acquisition meets science

Enough with the unlabelled ARROWS and INTERFACES!

Workflows and Life Cycles
• Yann covered much of this but the question remains:
  – Why is the life cycle not “just” a workflow?
  – Well, it is: sort of…
• Workflows give internal “provenance”
• To capture embedding of data life cycle in research life cycle + external provenance
• Provenance in this data pipeline / life cycle?
• Provenance is metadata in context
• What context?
  – Who you are?
  – What you are asking?
  – What you will use the answer for?
(Slide footer: 20080602 Fox VSTO et al.)

Modeling a Provenance Use Case – data workflow
• What calibrations have been applied to this image?
[Figure: data workflow. Node labels: Raw Image, Optics Calibration, Flat-field Calibration, Angle of Incidence Calibration, Calibration Process, Data, Junk Data Filter, Data Filtering Process, Data Product. Legend: Solar Science concepts, Data Processing concepts, Provenance concepts]

What calibrations have been applied to this image?
• We construct a query that returns any individuals with type Calibration used as the InferenceRule in the justification from any artifact the current artifact was derived from.
• We assume that any calibration applied to an artifact the current artifact was derived from can also be considered as “applied” to the current artifact, and that the wasDerivedFrom property is transitive.
[Figure: provenance graph. Node labels: #RawImage, #Intermediate 1, #Intermediate 2, #Image, #_A0, #_A1, #_A2, #Flat Field Calibration, #Angle of Incidence Calibration, Calibration. Edge labels: rdf:type, hasInferenceRule, wasDerivedFrom]

Concept Alignment (PML)
[Figure labels: Instrument, Source, Observation Period, DateTime, SourceUsage, hasSourceUsage, Data Capture Rule, Justification, Raw Data, Conclusion, NodeSet, Calibration Rule, hasAntecedentList, Engine, Data Calibration, Data Product]

Plug for provenance in life cycle
• Provenance concepts describe how domain concepts are related
• Domain and provenance models should be independent, but aligned
• Aligning with a well-supported provenance model can enhance interoperability and tool support
• Aligned knowledge base supports complex multi-domain query and search

Life cycle is a complex issue but no longer intractable
• Must be
  – Managed
  – Modelled
  – Documented
  – Contextualized
• As part of the use case, but also often outside it (pre-condition, trigger, …)
[Slide labels: Information models, Provenance]
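
The provenance use case above ("What calibrations have been applied to this image?") can be sketched in a few lines of Python. This is only an illustration, not the query used in the talk: the triples, the identifier spellings (e.g. `#FlatFieldCalibration`) and the helper names are assumptions, and `hasInferenceRule` is attached directly to each artifact rather than to a separate justification node (#_A0, #_A1, #_A2). What it does show is the stated assumption that wasDerivedFrom is transitive, so calibrations applied upstream count as applied to the current artifact.

```python
# Toy triple store (subject, predicate, object). All identifiers are
# hypothetical and only loosely follow the labels on the slide.
TRIPLES = {
    ("#Intermediate1", "wasDerivedFrom", "#RawImage"),
    ("#Intermediate2", "wasDerivedFrom", "#Intermediate1"),
    ("#Image", "wasDerivedFrom", "#Intermediate2"),
    # Simplification: the rule is attached to the artifact itself,
    # not to a separate justification node.
    ("#Intermediate1", "hasInferenceRule", "#FlatFieldCalibration"),
    ("#Intermediate2", "hasInferenceRule", "#AngleOfIncidenceCalibration"),
    ("#Image", "hasInferenceRule", "#JunkDataFilter"),
    ("#FlatFieldCalibration", "rdf:type", "Calibration"),
    ("#AngleOfIncidenceCalibration", "rdf:type", "Calibration"),
    ("#JunkDataFilter", "rdf:type", "Filter"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def ancestors(artifact):
    """Artifacts reachable through wasDerivedFrom, treated as transitive."""
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        for parent in objects(current, "wasDerivedFrom"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def calibrations_applied(artifact):
    """Rules of type Calibration used for the artifact or any ancestor."""
    found = set()
    for node in {artifact} | ancestors(artifact):
        for rule in objects(node, "hasInferenceRule"):
            if "Calibration" in objects(rule, "rdf:type"):
                found.add(rule)
    return found


if __name__ == "__main__":
    print(sorted(calibrations_applied("#Image")))
```

In this sketch the junk-data filter is not reported for `#Image` because its type is `Filter`, not `Calibration`; in an RDF store the same traversal would more likely be expressed as a query over the wasDerivedFrom relation than as hand-written graph walking.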