The Pennsylvania State University Schreyer Honors College

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PERFORMANCE COMPARISON OF THREE DATALOG ENGINES COREY CAPOOCI SPRING 2018 A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Computer Engineering with honors in Computer Engineering Reviewed and approved* by the following: Gang Tan Associate Professor of Computer Science and Engineering Thesis Supervisor John Sampson Assistant Professor of Computer Science and Engineering Honors Adviser * Signatures are on file in the Schreyer Honors College. i ABSTRACT Datalog is a lightweight deductive database system that uses a logic programming language in order to construct queries and to update the database [10]. Programmers can implement the deductive database and logic programming language into a single system called a Datalog engine. Researchers are utilizing Datalog engines for problems such as program analysis, information extraction, and network monitoring. Due to a renewed popularity in the Datalog language, it would useful to understand the performance of multiple engines when faced with large datasets. By exposing each engine to the same problems and datasets, the experiment hopes to find a correlation between engine performance and problem size using the execution time of the queries. The results of the experiment were not entirely consistent. Amongst the three Datalog engines, Soufflé, PA-Datalog, and Datalog Educational System (DES), tested, the results show that Soufflé performs the best for the connectivity algorithm, strongly connected components algorithm, and weakly connected components algorithm. For these algorithms, PA-Datalog and DES performs comparably to Soufflé for some of the less dense graphs. PA-Datalog performs the best for the shortest path algorithm especially for graphs with larger node and density values. Overall, the results revealed that none of the Datalog engines consistently outperforms the others, but for some algorithms, choosing the correct engine can make an appreciable difference in execution time. Therefore, determining the best Datalog engine is meaningful for certain applications, but hard to predict without initial testing. ii TABLE OF CONTENTS LIST OF FIGURES ..................................................................................................... iv LIST OF TABLES ....................................................................................................... v ACKNOWLEDGEMENTS ......................................................................................... vii Chapter 1 Introduction ................................................................................................. 1 Chapter 2 Background ................................................................................................. 3 Datalog ............................................................................................................................. 3 Datalog Engines ............................................................................................................... 5 PA-Datalog ............................................................................................................... 5 Soufflé ...................................................................................................................... 6 Datalog Educational System .................................................................................... 7 Graph Algorithms ............................................................................................................ 8 Connectivity ............................................................................................................. 8 Strongly Connected Components ............................................................................. 9 Weakly Connected Components .............................................................................. 10 Shortest Path ............................................................................................................. 11 Chapter 3 Experiment Design and Methodologies ...................................................... 12 Experiment Layout ........................................................................................................... 12 Sample Data Generation .................................................................................................. 15 Chapter 4 Results ......................................................................................................... 17 Chapter 5 Summary and Conclusion ........................................................................... 21 Appendix A Performance Results for Connectivity Algorithm .................................. 22 Appendix B Performance Results for Strongly Connected Components Algorithm . 26 Appendix C Performance Results for Weakly Connected Components Algorithm ... 30 Appendix D Performance Results for Shortest Path Algorithm ................................. 34 Appendix E Code for PA-Datalog .............................................................................. 38 Appendix F Code for Soufflé ...................................................................................... 41 Appendix G Code for Datalog Educational System ................................................... 44 Appendix H Python Script for Random Graph Generation and File Creation ........... 46 iii BIBLIOGRAPHY ........................................................................................................ 51 iv LIST OF FIGURES Figure 1. Example Datalog Code ............................................................................................. 4 Figure 2. Connectivity Code .................................................................................................... 8 Figure 3. Strongly Connected Components Code .................................................................... 9 Figure 4. Weakly Connected Components Code ..................................................................... 10 Figure 5. Shortest Path Code ................................................................................................... 11 Figure 6. Density Equation for a Graph ................................................................................... 13 Figure 7. Connectivity Performance Graph ............................................................................. 17 Figure 8. Strongly Connected Components Performance Graph ............................................. 18 Figure 9. Weakly Connected Components Performance Graph .............................................. 19 Figure 10. Shortest Path Performance Graph ........................................................................... 20 v LIST OF TABLES Table 1. Graph Characteristics for Weakly Connected Components, Strongly Connected Components, and Connectivity ........................................................................................ 14 Table 2. Graph Characteristics for Shortest Path ..................................................................... 14 Table 3. Connectivity Results for 50 Nodes and 0.005 Graph Density ................................... 22 Table 4. Connectivity Results for 50 Nodes and 0.25 Graph Density ..................................... 22 Table 5. Connectivity Results for 50 Nodes and 0.5 Graph Density ....................................... 23 Table 6. Connectivity Results for 50 Nodes and 0.75 Graph Density ..................................... 23 Table 7. Connectivity Results for 100 Nodes and 0.005 Graph Density ................................. 23 Table 8. Connectivity Results for 100 Nodes and 0.25 Graph Density ................................... 24 Table 9. Connectivity Results for 100 Nodes and 0.5 Graph Density ..................................... 24 Table 10. Connectivity Results for 100 Nodes and 0.75 Graph Density ................................. 24 Table 11. Connectivity Results for 200 Nodes and 0.005 Graph Density ............................... 25 Table 12. Connectivity Results for 200 Nodes and 0.25 Graph Density ................................. 25 Table 13. Strongly Connected Components Results for 50 Nodes and 0.005 Graph Density . 26 Table 14. Strongly Connected Components Results for 50 Nodes and 0.25 Graph Density ... 26 Table 15. Strongly Connected Components Results for 50 Nodes and 0.5 Graph Density ..... 27 Table 16. Strongly Connected Components Results for 50 Nodes and 0.75 Graph Density ... 27 Table 17. Strongly Connected Components Results for 100 Nodes and 0.005 Graph Density 27 Table 18. Strongly Connected Components Results for 100 Nodes and 0. 25 Graph Density 28 Table 19. Strongly Connected Components Results for 100 Nodes and 0.5 Graph Density ... 28 Table 20. Strongly Connected Components Results for 100 Nodes and 0.75 Graph Density . 28 Table 21. Strongly Connected Components Results for 200 Nodes and 0.005 Graph Density 29 Table 22. Strongly Connected Components Results for 200 Nodes and 0.25 Graph Density . 29 Table 23. Weakly Connected Components Results for 50 Nodes and 0.005 Graph Density .. 30 Table 24. Weakly Connected Components Results for 50 Nodes and 0.25 Graph Density .... 30 vi Table 25. Weakly Connected Components Results for 50 Nodes and 0.5 Graph Density ...... 31 Table 26. Weakly Connected Components Results for 50 Nodes and 0.75 Graph Density .... 31 Table 27. Weakly Connected Components Results for 100 Nodes and 0.005 Graph Density 31 Table 28. Weakly Connected Components

Load more