Database Preservation: the Dbpreserve Approach

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 12, 2015 Database Preservation: The DBPreserve Approach Arif Ur Rahman Gabriel David and Muhammad Muzammal and Cristina Ribeiro Department of Computer Science Departamento de Engenharia Informatica´ Bahria University, Islamabad Faculdade de Engenharia, Universidade do Porto Pakistan Rua Roberto Frias, Porto, Portugal Abstract—In many institutions relational databases are used of data over time. Selecting a preservation strategy, tools and as a tool for managing information related to day to day formats for preserving data is a complicated task. Typically, activities. Institutions may be required to keep the information decisions depend on the aims for given settings and insti- stored in relational databases accessible because of many reasons tutional needs [5]. However, study shows that the selection including legal requirements and institutional policies. However, of open formats is better for preservation than proprietary the evolution in technology and change in users with the passage formats whenever possible [18]. Data stored in open formats of time put the information stored in relational databases in danger. In the long term the information may become inaccessible may be easily accessible in the long term. New software may when the operating system, database management system or the be developed for accessing data stored in open formats in application software is not available any more or the contextual case the existing software becomes obsolete. Moreover, widely information not stored in the database may be lost thus affecting used formats should be used for preservation as they raise the the authenticity and understandability of the information. prospect that they will continue to be used for a long time [28]. In addition to this, formats for which a variety of writing and This paper presents an approach for preserving relational databases for the long-term. The proposal involves migrating rendering tools are available should be given preference [20]. a relational database to a dimensional model which is simple Metadata inherent to data and about the whole environment to understand and easy to write queries against. Practical around them should be collected. It includes all the technical transformation rules are developed by carrying out multiple case metadata like the software used for creation and preservation studies. One of the case studies is presented as a running example of data and non-technical metadata like the people involved in the paper. Systematic implementation of the rules ensures no in the creation of data and the reasons behind the creation. loss of information in the process except for the unwanted details. Another good practice may be to minimize the dependence The database preserved using the approach is converted to an of digital objects on users and software from the operational open format but may be reloaded to a database management environment and preservation environment. system in the long-term. Keywords—Database Preservation, Transformation Rules Different types of digital objects are created and managed by organizations for their operation including text documents, images, graphics, audio, videos, databases, websites and soft- I. INTRODUCTION ware. Different preservation approaches are required to pre- There are many advantages of working digitally. Digitally serve different types of digital objects as their nature and stored data is easily accessible, manageable and helps in structure are different [4], [25]. For example emulation may providing faster and better services to users than their paper be used as a preservation strategy if it is required to provide based counterparts. However, with the benefits also come some access to obsolete software but if the requirement is to provide trade-offs. For example the constant change in software and access to data then migration may be used. The focus of this hardware technologies affect the data. This change may turn paper is on relational database preservation. data unreadable as the old hardware, operating system and application software used for creating, storing and managing Software engineering good practices enforce the three lay- may not be supported by the latest technology. However, ers approach to information systems design. The interface layer data may be relevant for the purpose of evidence of activi- is accessible to users and get access to the data layer through ties, institutional memory, scientific significance or historical the business rules layer. The data layer is implemented in a significance and therefore preserving them may be required. relational database. Relational databases are complex digital Sometimes keeping organizational data is not a choice but it is objects with a well-defined data structure. They are based on mandated by law or by organizational policy. An example is the the formal foundations of relational modeling proposed by [9]. duration for which a university should keep the grades obtained They organize collections of data items into formally-described by students in the courses they studied. This information is no tables. They are designed using the rules of normalization longer required in day-to-day matters after a student completes which is a step-by-step reversible process of replacing a his degree but it may be required as an evidence in the long- given collection of relations by successive collections in which term. the relations have a progressively simpler and more regular structure [6], [16]. Normalization involves splitting large data Keeping in view the significance of data, digital preser- tables into smaller and smaller tables, until reaching a point vation may be required for ensuring the constant availability in which all functional dependencies among columns are 255 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 12, 2015 dependencies on the primary key of the corresponding table. is produced following the model migration approach which Thus, preserving the uniqueness of the primary key ensures proposes to migrate a relational database to a dimensional the uniqueness of the representation of the fact subject to model in the preservation procedure. Moreover, practical trans- the dependency. This helps databases to be consistent and formation rules are proposed which help in carrying out the efficient in capturing facts. However, because of normalization model migration procedure. The AIP is stored for the long- the model of a database may become too fragmented and term which may be accessed in the future using a simple to difficult to understand. use and platform-independent tool. The proposed approach is a step further on the existing approaches for database The interface and business rules layers are implemented in preservation. code (triggers, functions, stored procedures and application). It is a fact that preserving code is an even harder issue than II. RELATED WORK preserving data. Code is normally platform-dependent hence the requirement is to preserve the whole engine required to Significant research has already been conducted for pre- run it. Otherwise, there is a danger of losing the derived infor- serving relational databases for the long-term. The conclu- mation which the code is able to generate. The preservation of sions discard approaches like building technology museums code may refer to the generic software preservation problem for preserving specimens of machines, system software and which has been addressed in several ways e.g. emulation and applications, in all their main versions, so that the backups technology preservation, but none of them makes it platform- of every significant system could be used whenever required. independent [11], [19], [23]. The interface part dealing with Emulation is another approach which suggests simulating the presentation and interaction aspects is less relevant from an old hardware or software in new machines [10], [13]. However information preservation perspective. Part of the business rules it is not a permanent solution as technology changes very deal with access control, compliance with organizational policy fast and writing new emulators is required whenever a change of the new transactions, and other operational aspects which in technology occurs. More promising research suggests the are not very relevant as well. However, part of the business conversion of a database into an open and neutral format with rules contain functions able to compute complex derived data a significant amount of semantics associated, hence making which may be absent in the actual database. An example of it independent of the details of the actual DBMS. Such an this is a ranking function for scholarship granting. approach is the software-independent archival of relational database approach presented in the sequel. Data stored in database systems are vulnerable to loss because it may become inaccessible and unreadable when the A. SIARD software needed to interpret them or the hardware on which that software runs becomes obsolete or are lost. Data may also The Swiss Federal Archives (SFA) proposed the Software- become difficult to understand if the contextual information Independent Archival of Relational Databases (SIARD) for needed to interpret them is not known or is lost. Furthermore, if preserving relational databases for the future [12]. A software the structure of the information is too complex

Database Preservation: the Dbpreserve Approach

Solutions for the Chora of Metaponto Publication Series

Database Preservation DPC Training Course

File Format Guidelines for Management and Long-Term Retention of Electronic Records

Curriculum Vitae KEITH W

Preserving Geospatial Data

Preserving Transactional Data and the Accompanying Challenges Facing Companies and Institutions That Aim to Re-Use These Data for Analysis Or Research

Table of Contents in Alphabetical Order

Database Archiving Review

Monumental Computations. Digital Archaeology of Large Urban And

Developing Cultural Heritage Preservation Databases Based on Dublin Core Data Elements

Significant Properties in the Preservation of Relational Databases

Preserving the Documents of the Past and Making Them Accessible to the Future! Volume 45, Number 1 (176) July 2017