Preserving Queries Within Relational Databases

Preserving Queries within Relational Databases By: Somaia Alaofe 2139515 [email protected] College of Science and Engineering Supervisor: Dr Denise de Vries March 2018 “Submitted to the School of Computer Science, Engineering, and Mathematics in the Faculty of Science and Engineering in partial fulfilment of the requirements for the degree of master of computer science at Flinders University – Adelaide Australia.” i DECLARATION "I certify that this work does not incorporate without acknowledgment any material previously submitted for a degree or diploma in any university; and that to the best of my knowledge and belief it does not contain any material previously published or written by another person except where due reference is made in the text." Signature Date ___________________________ ___________________________ Somaia Alaofe 09/03/2018 ii ABSTRACT Most scientific research is stored in digital formats; however, with technology obsolescence, important resources for future research can be lost. Relational databases (RDBs), including component tables, relationships, and queries, are an important digital format for preserving research results. Therefore, preserving databases for a long time, regardless of any software or hardware issues, is critical. Previous studies have developed database preservation but have focused on preserving data, relationships, and keys rather than reports, pages, and queries within a database. Therefore, this study proposes a strategy for long-term digital preservation of queries within RDBs because queries are a major component of providing accurate, relevant data that lead to better insights, better decision making, and improved business. This research surveyed literature on RDB preservation and identified a gap in query preservation. There are a range of challenges in place regarding the variation of manipulation languages between RDBs, and four database management systems were surveyed to determine how queries can be manipulated on each platform. The proposed solution analyses relational algebra trees to preserve queries and researches eXtensible Markup Language (XML) to present relational algebra trees in XML format. Therefore, this research suggests an optimal approach to query preservation in various RDB platforms. A preservation methodology for queries is proposed based on converting all data manipulation language representation among various RDBs into one format: an XML file. This XML file does not preserve the data manipulation language itself but rather the relational algebra trees of queries. Thus, the logic of queries will be preserved in both human and computer readable formats. iii ACKNOWLEDGMENTS I would like to take this opportunity to first and foremost thank God for being my strength and guide in the writing of this thesis. Also, I would like to thank my husband Badr and my parents for their continuous support and encouragement throughout my research. This achievement would not have been possible without them. I would like to express my sincere gratitude to my advisor Dr Denis De Vries for her continuous support of my thesis, as well as her patience, motivation, enthusiasm and immense knowledge. Also, I thank Dr Carl Mooney who has supported my thesis through the database sample that he has provided me with. Finally, I would like to thank my wonderful children, Leen and Hassan, for always making me smile and for understanding those times when I had to work on this thesis instead of spending time with them. I hope I have been a good mother and that I have not lost too much during the tenure of my study. iv TABLE OF CONTENTS DECLARATION ii ABSTRACT iii ACKNOWLEDGMENTS iv TABLE OF CONTENTS v LIST OF FIGURES vii LIST OF TABLES viii CHAPTER ONE: INTRODUCTION 9 1.1 Background .......................................................................................................................................... 9 1.2 Research Problem .............................................................................................................................. 10 1.3 Research Objective ............................................................................................................................ 10 1.4 Research Questions ........................................................................................................................... 10 1.5 Research Significance......................................................................................................................... 10 1.6 Research Methodology ...................................................................................................................... 11 1.7 Thesis Structure ................................................................................................................................. 11 CHAPTER TWO: LITERATURE REVIEW 13 2.1 Introduction ....................................................................................................................................... 13 2.2 RDBs ................................................................................................................................................... 13 2.2.1 Digital Data 13 2.2.2 Databases 14 2.2.3 RDBs 14 2.2.4 Queries 14 2.3 Long-Term Preservation .................................................................................................................... 16 2.3.1 Preservation Context 16 2.3.2 Current Implementation Practices 16 2.3.3 Repositories 16 2.3.4 Emerging Database Preservation Technologies 17 2.3.5 Challenges to Data Preservation 17 2.4 Relational algebra .............................................................................................................................. 18 2.4.1 Algebra Definition 18 2.5 Limits of Existing Practices ................................................................................................................. 19 2.6. Optimisation ..................................................................................................................................... 19 2.7 Research Gaps .................................................................................................................................... 19 CHAPTER THREE: METHODOLOGY 21 3.1 Introduction ....................................................................................................................................... 21 v 3.2 Research Procedures ......................................................................................................................... 21 3.3 RDB Characteristics ............................................................................................................................ 22 3.3.1 MS Access 22 3.3.2 MS SQL Server 24 3.3.3 MySQL 26 3.3.4 Oracle 27 3.4 Survey Results .................................................................................................................................... 29 3.5 Proposed Solution .............................................................................................................................. 32 3.5.1 Relational Algebra 32 3.5.2 XML as a Preservation Strategy 33 3.6 Discussion .......................................................................................................................................... 34 CHAPTER FOUR: CASE STUDY 36 4.1 Introduction ....................................................................................................................................... 36 4.2 Queries Translation ............................................................................................................................ 36 4.2.1 Ms Access 37 4.2.2 MS SQL Server 40 4.2.3 MYSQL 43 4.2.4 Oracle 48 CHAPTER FIVE: CONCLUSION 54 5.1 Conclusion .......................................................................................................................................... 54 5.2 Future Work ....................................................................................................................................... 54 REFERENCES 56 Appendices 63 Appendix: Conversion Algorithm ............................................................................................................. 63 vi LIST OF FIGURES Figure 1. MS Access architecture (Oracle, 2013) .......................................................................... 23 Figure 2. Relational algebra tree for previous expression.............................................................. 33 Figure 3. RA tree for select query represented by union ............................................................... 37 Figure 4. RA tree for select query represented by minus ............................................................... 39 Figure 5. RA tree for select query in MS SQL Server .................................................................... 40 Figure 6. RA tree for update query in MS SQL Server .................................................................. 42 Figure 7. RA tree for select query in MY SQL ............................................................................... 43 Figure 8. RA tree for second select query in MY SQL ................................................................... 43 Figure 9. RA tree for third select query in MY SQL ........................................................................ 44 Figure 10. RA tree for fourth select query in MY SQL ................................................................... 45 Figure 11. RA tree for update query in MY SQL ...........................................................................

Load more