
IBM Research - Haifa

Long Term Digital Preservation: An IT Perspective

Simona Rabinovici-Cohen, IBM Research - Haifa
June 22, 2011

Presented at TRANSISTOR 2011: Preservation techniques and methodologies for digital audiovisual works, Crete, June 22-25, 2011
http://www.haifa.il.ibm.com/projects/storage/datastores/index.html
© 2011 IBM Corporation

IBM Research – Over 3,000 Researchers Worldwide


Identity Card

° Haifa research lab – IBM’s largest Research facility outside the US – Employs ~500 researchers – Spans many IBM Research strategy areas

° Storage Systems research group – Our mission is to advance the state of the art of IBM’s storage systems and management products – We conduct research in advanced storage functionalities – Very active in data preservation – Partner in the concluded CASPAR EU project – Lead of the ENSURE EU project – Lead standardization efforts in SNIA


Agenda

° Background – The Challenge of LTDP – Preservation Approaches – The OAIS Standard

° Haifa Storage Tools for LTDP – Preservation DataStores (PDS) – LTDP Assessment Tool – SIRF Standardization – PDS Cloud

° Publications

The Long Term Digital Preservation (LTDP) Challenge

° These documents were created by pre-digital societies. The media and information content are still interpretable.

Dead Sea Scroll, ~70AD. Media: Copper. Language: Hebrew.

Mayan Glyph, Palenque ~630AD.

° This information was created a few years ago. – Will the media last for 20 years? – Will it be possible to access, interpret and present the data in 20 years? 50? 100?


The Incredible Growth of Digital Data

° IDC Report 3/2007: 6-fold growth in 4 years
– 2006 – 161 exabytes (10^18 bytes) of data was created
  • 3 million times the information in all the books ever written
  • 12 stacks of books from Earth to the Sun
  • You could wrap the Earth 4 times
– 2010 – 988 exabytes

° IDC Report 5/2010: 44-fold growth in 11 years
  • 2009 – 0.8 zettabytes (10^21 bytes)
  • 2010 – 1.2 zettabytes
  • 2020 – 35 zettabytes
  • 1 ZB is a pile of DVDs over 250000km high
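The fold-growth figures can be checked from the quoted volumes. The sketch below is only an arithmetic illustration: the helper names are made up for this example, and the compound annual rate is a derived quantity that the IDC reports as quoted here do not state.

```python
def fold_growth(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Volumes quoted on the slide, in exabytes (1 zettabyte = 1000 exabytes).
report_2007 = fold_growth(161, 988)        # 2006 -> 2010, 4 years
report_2010 = fold_growth(800, 35_000)     # 2009 -> 2020, 11 years

print(f"2006-2010: {report_2007:.1f}-fold, "
      f"{annual_growth_rate(161, 988, 4):.0%} per year")
print(f"2009-2020: {report_2010:.1f}-fold, "
      f"{annual_growth_rate(800, 35_000, 11):.0%} per year")
```

The ratios come out close to the rounded "6-fold" and "44-fold" headlines.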

Analog vs. Digital Preservation of Information

[Chart: preservation tasks compared for Analog and Digital information (legend), against time spans of Decades, Centuries (Hard) and Millennium (Very Hard); the rotated task labels are not recoverable.]

The Digital Dark Age: Data Sustainability Paradox

° The world becomes more digital with richer interpretations and usages for the data – Growth in “born-digital” data: HDTV, digital cameras, healthcare devices, imaging – Conversion of formerly analog information to digital: Films, voice calls, TV signals ° But preservation of digital data is much more difficult than analog data

Paradox: As the world becomes digital, the world’s data is in greater danger of being lost! Our ability to store digital bits increases, but our ability to store them over time decreases!


What’s Driving LTDP? – The SNIA Survey

Top External Factors Driving Long-Term Retention Requirements: Legal Risk, Compliance Regulations, Business Risk

[Bar chart: percent of respondents, axis 0% to 60%; bar values are not recoverable. Factors and the category shown for each:]
– Preservation of business history: Other
– Protection of customer privacy: Security Risk
– Protection of business or intellectual assets: Business Risk, Security Risk
– Retaining history for competitiveness or protection: Business Risk
– Protection from compliance or legal fines: Compliance Requirements
– Meeting regulatory requirements: Compliance Requirements
– Meeting regulatory requirements: Legal Risk
– Concern with litigation protection: Legal Risk

Source: SNIA-100 Year Archive Requirements Survey, January 2007.

What does Long-Term Mean?

[Bar chart: retention periods >100 Years, >50-100 Years, >21-50 Years, >11-20 Years, >7-10 Years and >3-6 Years, axis 0% to 40%; bar values in extraction order: 18.3%, 38.8%, 13.1%, 15.7%, 12.3%, 1.9%.]

Retention of more than 20 years is required by 70% of respondents.

What’s Driving LTDP? – Data for Future Use

° Limone sul Garda yielded a new drug through gene preservation – 1000 residents – Many are long-lived (40 residents live 100+) – No thickening of blood vessels, even if cholesterol is high

– Many residents have the gene that generates the A1-milano protein – A1-milano quickly removes fat from the arteries, carrying it to the liver – A new drug for cardiovascular diseases was discovered

° Can we do this with digital data? Should we preserve data for future unknown benefit?


Bit Preservation vs. Logical Preservation

° Bit preservation – ability to restore the bits in the presence of storage media degradation, storage media obsolescence, and environmental catastrophes like fire, flooding, etc.
  • The life-span of disks: 3-5 years, tapes: 5-10 years, CDs and DVDs: 10-20 years
– Products exist to some extent – copy services, refreshment, error correcting codes modules
° Logical preservation – preserving the understandability and usability of the data in the future
– Current technologies for computer hardware and software may not exist anymore, and the users of the data may not be born yet
– How does one ensure the provenance of the data?
– How does one ensure only legitimate users can access the data?
– Technology is still in research phase
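To make the media life-spans concrete, the following sketch counts how many copies to fresh media a single retention period would require. It is a back-of-the-envelope illustration only: the function name and the assumption that each medium is replaced exactly at the end of its life-span are simplifications introduced for this example.

```python
import math

# Shortest and longest media life-spans in years, as quoted on the slide.
MEDIA_LIFESPAN_YEARS = {
    "disk": (3, 5),
    "tape": (5, 10),
    "CD/DVD": (10, 20),
}


def migrations_needed(retention_years: int, lifespan_years: int) -> int:
    """Copies to fresh media needed to cover the retention period.

    The first medium covers the first `lifespan_years`; every further
    life-span requires one migration (simplifying assumption).
    """
    return max(0, math.ceil(retention_years / lifespan_years) - 1)


for medium, (shortest, longest) in MEDIA_LIFESPAN_YEARS.items():
    fewest = migrations_needed(100, longest)
    most = migrations_needed(100, shortest)
    print(f"{medium}: {fewest} to {most} migrations over 100 years")
```

Even under these optimistic assumptions, a 100-year retention requirement implies many refresh cycles, which is why bit preservation relies on automated copy and refreshment services.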


Preservation Approach - Museum

° Museum Approach to LTDP
– Original state of content and rendering devices preserved
– Maintained and operational
– Pros
  • No loss of information
– Cons
  • Expensive
  • Time bounded
  • Not scalable
  • Warranty + spare parts


Preservation Approach - Emulation

° Emulation Approach to LTDP
– Adapt rendering device by emulating it
  • Up to date software + computers
– Pros
  • Reduces problem to preserving emulation platform
  • Cost proportional to number of rendering formats
– Cons
  • Upfront investment
  • Only for data coupled with software
  • Does not allow new interpretations of the data


Preservation Approach - Migration

° Migration Approach to LTDP
– Migrate to newer formats
– Pros
  • Less investment when data ingested
  • Allows new uses of data
– Cons
  • Can introduce noise
  • Cost proportional to data size
  • Continuous cost


Preservation Approach - Descriptive

° Descriptive Approach to LTDP
– Add metadata to fully describe representation of data
  • Allows writing code in future to process format
– Pros
  • No loss of information
  • Minimal assumptions on future
  • Delays cost until needed
– Cons
  • Doesn’t support proprietary formats
  • May have future high cost

[Diagram: Capture, Storage, Render, with Data and Metadata.]


Preservation Approach - Encapsulation

° Encapsulation Approach to LTDP
– Group together data and related metadata
  • Includes instructions to enable future interpretation
– Pros
  • Most flexible
  • Consistent with everything but Museum approach
  • OAIS compliant
– Cons
  • Doesn’t tell you what to do


Open Archival Information System (OAIS)

° ISO standard reference model (ISO 14721:2003)
° Provides fundamental ideas, concepts and a reference model for long-term archives
° Archival Information Package (AIP) – a logical structure for the preservation object that needs to be stored to enable future interpretation
° Content Data Object (CDO) – raw data to be preserved

[Figures: OAIS Functional Model; OAIS Information Model, showing the AIP.]

OAIS AIP Logical Structure

[Diagram: an AIP consists of Content Information (Content Data Object and Representation Information, which may itself have Representation Information) and Preservation Descriptive Information (Reference, Provenance, Context, Fixity, Access Rights); cardinalities shown: 0-1, 1-*.]

° Content Data Object – the raw data that is the focus of the preservation.
° Representation Information (RepInfo) – the information required to interpret the raw data to its designated community.
° Reference – globally unique and persistent identifiers for the content information.
° Provenance – the history and the origin of the content information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated.
° Context – documents the reason for creation of the content information and its relationship to its environment.
° Fixity – a demonstration that the particular content information has not been altered in an undocumented manner.
° Access Rights – the information that identifies the access restrictions pertaining to the Content Information, including the legal framework, licensing terms, and access control.
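The AIP logical structure can be pictured as a small set of nested records. The sketch below is an illustrative data model only: the class and field names are chosen for this example and are not taken from the OAIS specification or from any PDS interface, and the list cardinalities are an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepresentationInformation:
    description: str
    # RepInfo may itself need RepInfo to be interpreted (recursive link).
    rep_info: Optional["RepresentationInformation"] = None


@dataclass
class ContentInformation:
    content_data_object: bytes                   # the raw data to preserve
    rep_info: List[RepresentationInformation]    # what is needed to interpret it


@dataclass
class PreservationDescriptiveInformation:
    reference: str                               # persistent identifier
    provenance: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    fixity: List[str] = field(default_factory=list)
    access_rights: List[str] = field(default_factory=list)


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptiveInformation


# Hypothetical example values, for illustration only.
aip = ArchivalInformationPackage(
    content=ContentInformation(
        content_data_object=b"...raw bytes of the preserved object...",
        rep_info=[RepresentationInformation("Specification of the file format")],
    ),
    pdi=PreservationDescriptiveInformation(
        reference="example:aip:0001",
        provenance=["Created by the submitting organization"],
        fixity=["CRC32 of the content data object"],
    ),
)
print(aip.pdi.reference)
```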


Preservation in a Nutshell with OAIS

[Diagram: an Object to Preserve and its Metadata enter as a Submission Information Package (SIP); an Archival Information Package (AIP) is created; the AIP and “tools” are extracted into a Dissemination Information Package (DIP), yielding a Usable Preserved Object. Logical Preservation sits on top of Bit Preservation.]

Logical preservation elements, with examples:
– Archival Information Package (AIP), e.g. AIP for Word 2003 SP3 document
– Logical Transformations, e.g. doc to pdf/a transformation
– Emulators, e.g. VM image with Office 2003 SP3
– Descriptive MetaData: information needed to make AIP usable, e.g. (1) spec of format (2) text summary


Agenda

° Background – The Challenge of LTDP – Preservation Approaches – The OAIS Standard

° Haifa Storage Tools for LTDP – Preservation DataStores (PDS) – LTDP Assessment Tool – SIRF Standardization – PDS Cloud

° Publications

What do we have in Haifa?

° Infrastructure: Preservation DataStores (PDS)
– CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
  • PDS provides the storage infrastructure of CASPAR EU project
– Archiving: Long term retention capabilities to existing systems
  • Demo of preservation support to enterprise content management (ECM) and archiving systems
– ENSURE: Enabling kNowledge, Sustainability, Usability and Recovery for Economic Value
  • Examining use of cloud for preservation infrastructure in ENSURE EU project

° Assessment: Long Term Digital Preservation Assessment (LTDPA) – Research tool to evaluate organization’s ability to preserve its digital resources. Based upon emerging standard audit checklists (ISO 14721)

° Standards: Storage Networking Industry Association (SNIA) – IBM co-chairs the Long Term Retention technical working group – LTR develops a Self-contained Information Retention Format (SIRF)


Preservation Aware Storage: A New Storage Paradigm

Preservation Aware Storage: a storage component of a digital preservation system that has built-in support for preservation.

° While traditional storage supports bit preservation at most, preservation aware storage supports logical preservation as well.
° Preservation Aware Storage supports offloading functionality to the storage layer
– Decrease the probability of data loss
– Simplify the applications
– Provide improved performance and robustness


Preservation DataStores (PDS) Overview

° OAIS–based preservation-aware storage that supports LTDP ° Offload OAIS-based functionality to: – Decrease probability of data loss – Simplify the applications – Provide improved performance and robustness ° Manage preservation metadata ° Supports automation of preservation processes

° PDS is the storage infrastructure of EU project CASPAR ° PDS is available at alphaWorks - http://www.alphaworks.ibm.com/tech/pds


PDS Main Functionality

– AIP generation - generation of preservation metadata and creation of AIPs with various packaging formats • Metadata enrichment - automatic extraction of metadata from the submitted content data and addition of RepInfo and Preservation Descriptive Information (PDI)

– Data transformations - provide the ability to load transformation modules (storlets), apply them on AIPs and generate new AIP versions • Storlets are restricted modules with predefined interfaces used to execute data intensive functions, e.g., transformations, fixity calculation

– Fixity management – flexible periodic fixity (integrity) checks where multiple loadable fixity modules can be used and the fixity values are stored in a standard PREMIS (v2) format

– RepInfo management - allows sharing, search and categorization of RepInfos


CASPAR and PDS

• PDS was deployed in ESA for GOME data
• PDS was deployed in ASemantics
• PDS is available at AlphaWorks

AIP Generation Example: Preserving an Audiovisual Object

° Object – A Windows Media Player video clip ° “Kia playing violin for gesture analysis on 2007-02-16 at 12:42:17 in ICSRiM - University of Leeds” – This is part of i-Maestro project demonstration and shows violin bowing visualization. – This is a synchronized version of video, sound and 3d motion for analysis of violin bowing.


AIP Contents

[Diagram of the AIP contents:]
– Content Information: Content Data Object; Representation Information (with its own Rep Info)
– Preservation Descriptive Information: Reference; Provenance; Context; Fixity; Access Rights


Content Data Object

Video clip in Windows Media Video format


Representation Information

Record 1: Windows Media Player Homepage (URL)

Record 2: Notes sheet (PDF)

Provenance

[XML listing with overlapping text in the source; only the following fields are legible.]

Record 1: Pre-ingest
– Description: Kia playing violin for gesture analysis
– Performance date: 2007-02-16T12:42:17.490000Z
– Performance place: ICSRiM - University of Leeds
– This is part of i-Maestro project
– Violin bowing visualisation
– Format: WMV
– This is a synchronised version of video, sound and 3d motion for analysis of violin bowing

Record 2: Ingest – Ingest AIP

Record 3: Access – Access AIP

Record 4: Access – Access AIP

Legible timestamps: Tue May 19 21:33:21 IDT 2009; Thu May 20 09:23:34 IDT 2088
Legible validation results: FixityValidation=PASSED; ReferenceValidation=NOT_PERFORMED; ReferenceValidation=PASSED


Context

Record 1 http://www.i-maestro.org/

Record 2 http://www.icsrim.org.uk/


Fixity

Record 1: External null CRC32 3e80d89c

Record 2: Internal Tue May 19 21:29:38 IDT 2009 CRC32 3e80d89c
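The two fixity records pair an externally supplied CRC32 with one computed internally at ingest. A minimal sketch of such a check using Python's standard `zlib.crc32` is shown below; the function names and the ingest-time comparison policy are assumptions made for this example, not the PDS fixity module interface.

```python
import zlib
from typing import Optional


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of `data` as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def check_fixity(data: bytes, recorded: str) -> bool:
    """True if the content still matches a previously recorded CRC32."""
    return crc32_hex(data) == recorded.lower()


def ingest_fixity(data: bytes, external_crc32: Optional[str]) -> str:
    """Compute the internal fixity value at ingest.

    If the submitter supplied an external value, it must agree with the
    computed one (assumed policy for this sketch).
    """
    internal = crc32_hex(data)
    if external_crc32 is not None and external_crc32.lower() != internal:
        raise ValueError("external fixity value does not match content")
    return internal


content = b"123456789"                      # standard CRC32 test input
recorded = ingest_fixity(content, "CBF43926")
print(recorded, check_fixity(content, recorded))
```

A periodic fixity check would rerun `check_fixity` against the stored value. CRC32 detects accidental corruption only; a cryptographic hash would be needed to guard against deliberate tampering.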

Transformation with Storlets

Step: ingest original AIP (through PDS Web Services)

[Diagram: the PDS Server holds the Original AIP, made up of original content, original RepInfo and original PDI*.]

*Preservation Descriptive Information

Transformation with Storlets

Step: load transformation (through PDS Web Services)

[Diagram: next to the Original AIP, the PDS Server holds a Transformation AIP, made up of the transformation module (content), RepInfo for transformed content, PDI of transformation and RepInfo of transformation.]

Transformation with Storlets

Step: invoke transform AIP (through PDS Web Services)

[Diagram: the PDS Server applies the Transformation AIP to the Original AIP and produces a New AIP, made up of the Transformed Content, RepInfo for Transformed Content and PDI generated by PDS.]
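The three steps above (ingest, load transformation, invoke) can be sketched as follows. This is a toy model under stated assumptions: a storlet is represented as a plain Python function from bytes to bytes, and the `AIP`, `TransformationAIP` and `transform` names are hypothetical and do not reflect the actual PDS web service API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a storlet is modelled as a function from bytes to bytes.
Storlet = Callable[[bytes], bytes]


@dataclass(frozen=True)
class AIP:
    aip_id: str
    content: bytes
    rep_info: str
    provenance: Tuple[str, ...] = ()


@dataclass(frozen=True)
class TransformationAIP:
    aip_id: str
    module: Storlet                 # the transformation module (content)
    rep_info_for_output: str        # RepInfo for the transformed content


def transform(original: AIP, transformation: TransformationAIP, new_id: str) -> AIP:
    """Apply the storlet to the original AIP and return a new AIP version.

    The original AIP is left untouched; the new AIP records the
    transformation as a provenance event generated by the storage layer.
    """
    new_content = transformation.module(original.content)
    event = f"transformed from {original.aip_id} using {transformation.aip_id}"
    return AIP(
        aip_id=new_id,
        content=new_content,
        rep_info=transformation.rep_info_for_output,
        provenance=original.provenance + (event,),
    )


# Toy transformation: upper-casing text stands in for a format migration.
original = AIP("aip-1", b"hello", "plain ASCII text")
to_upper = TransformationAIP("xform-1", lambda data: data.upper(), "upper-case ASCII text")
new_aip = transform(original, to_upper, "aip-2")
print(new_aip.content, new_aip.provenance)
```

Keeping the original AIP immutable and emitting a new version mirrors the slides, where the Original AIP remains in the PDS Server next to the New AIP.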

The LTDPA tool helps in assessing the capabilities of an archival organization to deliver Long Term Digital Preservation services

Based on OCLC RLC audit checklist, the tool helps in evaluating the compliance level of Organization, Processes and Technology with the OAIS reference model and best practices.

The Tool holds:
° Knowledge of expert community & ISO Best Practices
° Assessment checklist
° Evaluation metrics

The Tool enables:
° web based data collection
° quantitative analysis
° report generation
° common repository buildup & usage

The Tool can be used for:
° Identifying gaps between Current State vs. Best Practice or vs. Desired State
° Comparative analysis and Industry benchmarks
° Knowledge Transfer

LTDPA Tool Workflow

1 Gather Data
Within an LTDPA engagement, an engagement definition is set up, indicating individual client respondents. The respondents can then log in using the LTDPA web survey tool, fill out surveys and save their responses back to the LTDPA Repository.

2 Analyze & Report
With all of the client responses stored in the database, the engagement leader can now load the collective set of client results, view basic statistical results, analyze the data, and export diagrams and data from the LTDPA Repository to MS Office formats for deliverable creation.

3 Historical & Benchmarking Analysis
Since all the data gets stored in the single repository, the results from a given engagement can be used again, either as a time sequence when the assessment is performed again at that same client, or as part of a benchmarking exercise, etc.
‹ Year over Year
‹ Group vs Group
‹ Client vs Industry
‹ Etc.

Based on CAT Overview presentation, Matt Callery, IBM Research, Fall 2006

“Client X” LTDP Assessment Summary – Current-State and Desired State Maturity-Levels

[Chart: maturity-level index; content not recoverable.]

Self-contained Information Retention Format (SIRF)

° Being developed by Storage Networking Industry Association (SNIA), Long Term Retention (LTR), Technical Working Group (TWG) – Co-chaired by IBM and Symantec

° SIRF is a logical container format appropriate for long-term storage of digital information – Preserves collections of objects and their relationships – Includes generic metadata that can be extended with domain specific information for fast access – Can be mapped to and physically migrated between a wide variety of underlying storage systems

° SIRF use cases and requirements document is released for public review – http://www.snia.org/tech_activities/publicreview


An Analogy

° Standard archival box (photo courtesy Oregon State Archives)
– Archivists gather together a group of related items, known as a collection
– Collection is placed in a physical box container
– The box is labeled with information about its content, e.g., name and reference number, date, contents description, destroy date
  • And there’s an online (XML) finding aid
– When contents are migrated, they’re added to the box

° SIRF is the digital equivalent
– Logical container for a set of (digital) preservation objects and a catalog
– The SIRF catalog contains metadata related to the entire contents of the container as well as to the individual objects
– SIRF standardizes the information in the catalog

SIRF Components

A SIRF container includes: ° A magic object: identifies SIRF container and its version ° Numerous preservation objects that are immutable ° A catalog that is – Updatable – Contains metadata to make container and preservation objects portable into the future without external functions
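The three components can be illustrated with a toy in-memory container. This is a conceptual sketch only: the class, its methods and the catalog layout are invented for this example and do not follow the SIRF specification, which was still under development when this talk was given.

```python
from types import MappingProxyType


class SirfLikeContainer:
    """Toy container with a magic object, immutable objects and an updatable catalog."""

    def __init__(self, version: str):
        # Magic object: identifies the container format and its version (read-only).
        self.magic = MappingProxyType({"format": "SIRF", "version": version})
        self._objects = {}                                # immutable preservation objects
        self.catalog = {"container": {}, "objects": {}}   # updatable metadata

    def add_object(self, object_id: str, data: bytes, metadata: dict) -> None:
        """Store a new preservation object; existing objects are never overwritten."""
        if object_id in self._objects:
            raise ValueError("preservation objects are immutable")
        self._objects[object_id] = bytes(data)
        self.catalog["objects"][object_id] = dict(metadata)

    def get_object(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def update_catalog(self, object_id: str, **changes) -> None:
        """Only the catalog entry changes; the stored object does not."""
        self.catalog["objects"][object_id].update(changes)


container = SirfLikeContainer(version="0.1")
container.add_object("obj-1", b"scanned page", {"format": "TIFF"})
container.update_catalog("obj-1", last_fixity_check="2011-06-22")
print(container.magic["format"], container.catalog["objects"]["obj-1"])
```

Separating immutable objects from an updatable catalog lets the metadata evolve (new fixity checks, new relationships) without touching the preserved bits.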


Agenda

° Background – The Challenge of LTDP – Preservation Approaches – The OAIS Standard

° Haifa Storage Tools for LTDP – Preservation DataStores (PDS) – LTDP Assessment Tool – Standardization – PDS Cloud

° Publications

Cloud Computing: Hottest Topic in the Industry…


Cloud Technologies

° Usage – Amazon S3 stores over 260 billion objects today and 1 trillion by 2012 – 15% of all digital data will be in the cloud by 2020 (IDC) with another 20% touching the cloud ° Why is it so appealing for preservation? – Cost, availability, scalability

Cloud Computing: What’s Driving it?

1. Cost Reduction
– Cloud: Highly virtualized with many users sharing the same hardware

2. Technology Maturity Cycle
– New: Wow, it works!
– Business: Focus higher in the solution stack
– Cloud: Companies who are moving to the cloud are focusing on their business, not technology.

3. Payment model: Pay per use to reduce bar of adoption
– Cloud: Pay per use with immediate provisioning

Preservation in the Cloud: Is it an option? Or can it be made into an option?

° Relevance to LTDP
– Can YouTube become the long-term-digital-preservation solution for videos?
– Can GoogleDocs become the long-term-digital-preservation solution for documents?
– Etc.
° There’s a lot missing today
– Security and SLA guarantees
– Access and performance model (% downloads vs % uploads)
– Auditability, compliance and regulatory
– Long term trust in the cloud provider
– Preservation layer

ENSURE: Enabling kNowledge Sustainability, Usability and Recovery for Economic value

° ENSURE is a recently started (Feb ’11) FP7, Call 6 EU Project, coordinated by IBM in the area of digital preservation

° There is a need to take a more business/industry-oriented focus

° ENSURE addresses this need by focusing on HCLS and finance use cases

° In addressing this need, ENSURE’s specific objectives are driven by the needs of businesses and regulatory compliance

Overview of ENSURE’s area for innovation

° Ability to compose different quality solutions at different costs
° Apply cutting edge ICT to digital preservation solutions
° Preservation Lifecycle Management of environmental changes, evolution of ontologies, quality of digital objects etc. over time
° Content-aware long-term data protection

Benefits and issues in using a cloud model for digital preservation

Cloud Security Requirements
– Support for object provenance, certification, auditing, …
– Trust over time (SLAs, events)
– Changes over time

Cloud Technology Requirements
– Multi-cloud support (export, replication)
– Programmatic visibility
– Computation near data
– Integration with lifecycle management

The Benefits of Clouds:
• Scalable in number of objects and size of data
• Pay-as-you-go
• Sharing across geographic domains – available anywhere

PDS Cloud: Extending PDS to the Cloud

[Diagram: AIP.]

° Map OAIS AIP and the links among AIPs to the cloud data model ° Multi cloud support while considering self-containment and self-describing implications ° Study the use of computational cloud storage for preservation (PDS Storlets) ° Support flexible integrity (fixity) checks and auditing capabilities ° Map preservation policies to the cloud including use of offline media
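One way to picture mapping an AIP onto a cloud data model is to store the content as a blob and carry the PDI, the links to other AIPs and a fixity value as object metadata, replicated across providers. The sketch below is speculative: `InMemoryCloud` is a stand-in for a real object store, and the key layout, metadata fields and function names are hypothetical rather than the PDS Cloud design.

```python
import json
import zlib
from typing import Dict, List


class InMemoryCloud:
    """Stand-in for a cloud object store: named blobs with key/value metadata."""

    def __init__(self, name: str):
        self.name = name
        self.blobs = {}

    def put(self, key: str, data: bytes, metadata: Dict[str, str]) -> None:
        self.blobs[key] = (data, dict(metadata))

    def get(self, key: str):
        return self.blobs[key]


def store_aip(clouds: List[InMemoryCloud], aip_id: str, content: bytes,
              pdi: dict, links: List[str]) -> None:
    """Write one self-describing copy of the AIP to every cloud."""
    metadata = {
        "pdi": json.dumps(pdi, sort_keys=True),
        "links": json.dumps(links),               # links among AIPs, by identifier
        "crc32": format(zlib.crc32(content), "08x"),
    }
    for cloud in clouds:
        cloud.put(f"aip/{aip_id}", content, metadata)


def audit(clouds: List[InMemoryCloud], aip_id: str) -> Dict[str, bool]:
    """Recompute the fixity value of each replica and compare with its metadata."""
    report = {}
    for cloud in clouds:
        data, metadata = cloud.get(f"aip/{aip_id}")
        report[cloud.name] = format(zlib.crc32(data), "08x") == metadata["crc32"]
    return report


clouds = [InMemoryCloud("cloud-a"), InMemoryCloud("cloud-b")]
store_aip(clouds, "0001", b"content bytes", {"reference": "example:aip:0001"}, ["0000"])
print(audit(clouds, "0001"))
```

Because each replica carries its own metadata, any single copy remains self-describing if one provider disappears, which is the self-containment concern raised above.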


Agenda

° Background – The Challenge of LTDP – Preservation Approaches – The OAIS Standard

° Haifa Storage Tools for LTDP – Preservation DataStores (PDS) – LTDP Assessment Tool – Standardization – PDS Cloud

° Publications

Publications

° “Towards SIRF: Self-contained Information Retention Format” – 4th Annual International Systems and Storage Conference (SYSTOR), May 30-June 1, 2011
° “Preservation DataStores” chapter in “Advanced Digital Preservation” book – www.springer.com/978-3-642-16808-6
° “Using XFDU for CASPAR Information Packaging” – OCLC Systems & Services: International Digital Library Perspectives, Vol. 26 No. 2, 2010
° “Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage” – USENIX First Workshop on the Theory and Practice of Provenance (TaPP), February 23, 2009, San Francisco
° “Preservation DataStores: New Storage Paradigm for Preservation Environments” – IBM Journal of Research and Development on Storage Technologies and Systems, Volume 52, Number 4/5, 2008
° “Preservation DataStores: Architecture for Preservation Aware Storage” – IEEE Conference on Mass Storage Systems and Technologies (MSST), September 2007, San Diego, USA
° “The Need for Preservation Aware Storage - A Position Paper” – ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems, Volume 41, Issue 1 (Jan 2007), pp 19-23
° “Towards OAIS-Based Preservation Aware Storage - A White Paper” – http://www.haifa.il.ibm.com/projects/storage/datastores/public.html


Many Thanks … to the IBM Haifa Team !!

Shimon Agassi Ealan Henis Aner Hamama Orit Edelstein Michael Factor John Marberg Kenneth Nagin Dalit Naor Leeat Ramati Petra Reshef Shahar Ronen Eliot Salant

