HIERARCHICAL AND SEMANTIC DATA MANAGEMENT AND QUERYING FOR PATIENT RECORDS AND PERSONAL PHOTOS

By
BRENDAN DAVID ELLIOTT

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy

Dissertation Adviser: Dr. Z. Meral Özsoyoğlu

Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2009

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of Brendan David Elliott, candidate for the Doctor of Philosophy degree *.

(signed) Z. Meral Özsoyoğlu (chair of the committee)
Daniela Calvetti
H. Andy Podgurski
Guo-Qiang Zhang
Gultekin Özsoyoğlu

(date) September 19, 2008

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables
List of Figures
Acknowledgements
Abstract
1 Introduction
  1.1 Part I: Pedigree Data Management (Chapters 2-6)
  1.2 Part II: Semantic Personal Photo Management (Chapters 7-9)
  1.3 Part III: Semantic Query Processing (Chapters 10-12)
Part I: Pedigree Data Management
2 An Overview of Pedigree Data Management
  2.1 Pedigrees
  2.2 Pedigree Querying
  2.3 Previous Work
  2.4 Pedigree Modeling
  2.5 Discussion
3 PQL: A Language for Querying Pedigree Data
  3.1 Query starting steps
  3.2 Basic axis steps:
  3.3 Gendered steps
  3.4 Simple Attributes
  3.5 Conditional Steps—predicates
  3.6 Set expression steps
  3.7 User-defined (macro) steps
  3.8 Aggregate Functions
  3.9 Combined use with XPath
  3.10 More Examples
  3.11 Discussion
4 Evaluation of PQL Queries using NodeCodes
  4.1 Labeling for PQL: NodeCodes for Pedigree graphs
    4.1.1 Representing Paths with NodeCodes
    4.1.2 Query Evaluation with NodeCodes
  4.2 NodeCode Updates
    4.2.1 Child insertion
    4.2.2 Progenitor insertion
    4.2.3 Missing link insertion
    4.2.4 Merging pedigrees
  4.3 Synthetic Data Generation
  4.4 Experimental Results
    4.4.1 Experimental data:
    4.4.2 Experiments on Real Data
    4.4.3 Experiments on Synthetic Data
  4.5 Discussion
5 Efficient Evaluation of Inbreeding Queries
  5.1 Inbreeding
  5.2 Review of Pedigree Graph Structure and NodeCodes
  5.3 Inbreeding Calculations
  5.4 Inbreeding Coefficient
  5.5 Calculating Inbreeding Coefficient with NodeCodes
    5.5.1 Identifying Common Ancestors
    5.5.2 Identifying pairs of paths from common ancestors
    5.5.3 Identifying Overlapping Pairs of Paths
    5.5.4 Complexity of Algorithm:
  5.6 Experiments
    5.6.1 Experimental Data
    5.6.2 Experimental Setup
    5.6.3 Experimental Results
  5.7 Discussion
6 Family NodeCodes for Inbreeding Queries
  6.1 Family-level Graph
    6.1.1 Family-level Graph Structure
    6.1.2 Scalability of Family-level Pedigree Graphs
  6.2 Family-level