Technologies for Main Memory Data Analysis
Total Page:16
File Type:pdf, Size:1020Kb
TECHNOLOGIES FOR MAIN MEMORY DATA ANALYSIS DISSERTATION FOR THE AWARD OF THE DOCTORAL DIPLOMA ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS 2020 Marios D. Fragkoulis Department of Management Science and Technology Athens University of Economics and Business ii Department of Management Science and Technology Athens University of Economics and Business Email: [email protected] Copyright 2017 Marios Fragkoulis This work is licensed under a Creave Commons Aribuon-ShareAlike 4.0 Internaonal License. iii Supervised by Professor Diomidis Spinellis iv To my parents Contents 1 Introducon 1 1.1 Context ......................................... 1 1.2 Problem statement ................................... 2 1.3 Proposed soluon and contribuons .......................... 2 1.4 Research methodology ................................. 3 1.5 Thesis outline ...................................... 3 2 Related Work 6 2.1 Orthogonal persistence ................................. 7 2.1.1 Background ................................... 7 2.1.2 Scope ...................................... 8 2.1.3 Surveys on orthogonal persistence ....................... 9 2.1.4 Dimensions of persistent systems ....................... 11 2.1.4.1 Persistence implementaon ..................... 12 2.1.4.2 Data denion and manipulaon language ............. 12 2.1.4.3 Stability and resilience ........................ 12 2.1.4.4 Protecon .............................. 12 2.1.5 Persistent systems performing transacons .................. 15 2.1.5.1 Persistent operang systems .................... 15 2.1.5.2 Persistent programming systems .................. 16 2.1.5.3 Persistent stores ........................... 17 2.1.6 Persistent systems performing distributed processing ............. 18 2.1.6.1 Persistent operang systems .................... 19 2.1.6.2 Persistent programming systems .................. 19 2.1.6.3 Persistent stores ........................... 20 2.1.7 Performance Opmisaons .......................... 22 2.2 Programming language query support ......................... 23 2.2.1 Query languages and interacve interfaces for object-oriented data ..... 24 2.2.1.1 Database query languages for object-oriented data ........ 24 2.2.1.2 Query comprehensions to programming language collecons ... 25 2.2.1.3 Ad-hoc queries to main memory D(B)MSs ............. 25 2.2.1.4 Ad-hoc queries to object collecons ................ 26 2.2.2 Relaonal representaons for objects ..................... 26 2.2.2.1 Object-relaonal mapping ...................... 26 2.2.2.2 Relaonships and associaons as rst class programming language constructs .............................. 27 2.2.2.3 Relaonal interface to programming language data structures ... 27 2.2.3 N1NF relaonal data model .......................... 28 2.3 Diagnosc tools ..................................... 29 v vi 2.3.1 Operang system relaonal interfaces ..................... 29 2.3.2 Kernel diagnosc tools that use built-in performance counters ........ 30 2.3.3 Kernel diagnosc tools that use program instrumentaon techniques .... 30 2.3.4 Applicaon diagnosc tools .......................... 31 3 Query interface design 33 3.1 Relaonal representaon of objects .......................... 33 3.1.1 Mapping has-a associaons .......................... 34 3.1.2 Mapping many-to-many associaons ..................... 35 3.1.3 Mapping is-a associaons ........................... 36 3.2 A domain specic language for dening relaonal representaons of data structures 37 3.2.1 Struct view denion .............................. 37 3.2.2 Virtual table denion ............................. 41 3.3 Mapping a relaonal query evaluaon to the underlying object-oriented environment 42 3.4 Relaonal algebra extension for virtual table instanaons .............. 45 3.5 Soware architecture .................................. 45 4 Query interface implementaon 49 4.1 Generave programming ................................ 49 4.2 Virtual table implementaon .............................. 50 4.3 SQL support ....................................... 50 4.4 Query opmizaons ................................... 51 4.5 Query interface ..................................... 51 4.6 Embedding i in applicaons ........................... 52 4.7 Loadable module implementaon for the Linux kernel ................ 53 4.7.1 Query interfaces ................................ 54 4.7.2 Security ..................................... 54 4.7.3 Synchronized access to kernel data structures ................. 55 4.7.3.1 Denion of consistency ...................... 55 4.7.3.2 Synchronizaon in i ...................... 55 4.7.3.3 Limitaons .............................. 55 4.7.4 Reliability .................................... 56 4.7.5 Deployment and maintenance ......................... 56 4.7.6 Portability .................................... 57 4.8 Implementaon for the Valgrind framework ...................... 57 5 Empirical Validaon 58 5.1 Data analysis of C++ applicaons ............................ 58 5.1.1 Method ..................................... 58 5.1.2 Use cases .................................... 59 5.1.3 Presentaon of measurements ......................... 62 5.1.3.1 LOC measurements ......................... 63 5.1.3.2 CPU execuon me measurements ................. 63 5.1.3.3 Query memory use measurements ................. 63 5.1.4 Results ..................................... 64 5.2 Diagnoscs in the Linux kernel ............................. 65 5.2.1 Method ..................................... 65 5.2.2 Use cases .................................... 65 5.2.2.1 Operaon integrity ......................... 65 5.2.2.2 Security audits ............................ 68 vii 5.2.2.3 Performance ............................. 70 5.2.3 Presentaon of measurements ......................... 70 5.2.3.1 Query execuon eciency ..................... 71 5.2.3.2 Impact on system performance ................... 71 5.2.4 Results ..................................... 72 5.2.4.1 Query execuon eciency ..................... 72 5.2.4.2 Impact on system performance ................... 73 5.3 Analysis of memory proles in the Valgrind instrumentaon framework ....... 75 5.3.1 Method ..................................... 75 5.3.2 Use cases .................................... 75 5.3.2.1 Memcheck .............................. 76 5.3.2.2 Cachegrind .............................. 80 5.3.2.3 Callgrind ............................... 81 5.3.3 Presentaon of measurements ......................... 82 5.3.4 Results ..................................... 83 5.4 User study ........................................ 86 5.4.1 Soware typology ............................... 87 5.4.2 Movaon and quesons ........................... 87 5.4.3 Experimental design .............................. 89 5.4.3.1 Users ................................. 89 5.4.3.2 Seng ................................ 89 5.4.3.3 Tasks ................................. 89 5.4.4 Experimental evaluaon ............................ 90 5.4.5 Experimental results .............................. 91 5.4.6 User interface preferences ........................... 91 5.4.7 Threats to validity ................................ 93 5.4.8 Limitaons ................................... 93 6 Conclusions and Future Work 94 6.1 Summary of results ................................... 94 6.2 Overall contribuon ................................... 95 6.3 Future work ....................................... 96 6.4 Conclusions ....................................... 96 A Appendix 98 A.1 C++ and SQL code for the queries used in the evaluated applicaons ......... 98 A.1.1 Stellarium .................................... 98 A.1.2 QLandKarte ................................... 101 A.1.3 CScout ..................................... 106 A.2 Idenfy possible privilege escalaon aacks with AWK ................ 107 A.3 Dynamic diagnosc tasks via Systemtap scripts .................... 108 A.4 Diagnosc tasks with i , Systemtap, and DTrace ................. 110 Bibliography 112 List of Figures 1.1 The research methodology followed .......................... 4 2.1 Map of research work related to this thesis ...................... 6 3.1 has-a associaon .................................... 34 3.2 Has-one and has-many associaons between data structures are normalized or de- normalized in the virtual relaonal schema. ...................... 34 3.3 Many-to-many associaon ............................... 36 3.4 Inheritance and subtype polymorphism support .................... 37 3.5 Full support of polymorphic containers ......................... 37 3.6 syntax in notaon ................................ 38 3.7 The C++ i under the library’s namespace .................. 47 3.8 Soware architecture .................................. 48 4.1 Web-based query interface ............................... 52 4.2 Steps for plugging i in an applicaon. ...................... 53 5.1 Class diagram and virtual table schema for the Stellarium, QLandkarte, and CScout applicaons ....................................... 60 5.2 Linux kernel models ................................... 65 5.3 Fixed query cost depending on number of s. .................... 73 5.4 Impact on system performance ............................. 74 5.5 Workow for embedding i to Valgrind ...................... 75 5.6 Memcheck’s data structure model and relaonal representaon ........... 77 5.7 Cachegrind’s data structure model and relaonal representaon ........... 84 5.8 Callgrind’s data structure model and relaonal representaon ............ 85 5.9 i ’s scalability ..................................