Replication Aspects in Distributed Systems
Thesis submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
by
Shantanu Sharma
Submitted to the Senate of Ben-Gurion University of the Negev
February 17, 2016
Beer-Sheva

Replication Aspects in Distributed Systems
Thesis submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
by
Shantanu Sharma
Submitted to the Senate of Ben-Gurion University of the Negev
Approved by the advisor:
Approved by the Dean of the Kreitman School of Advanced Graduate Studies:
February 17, 2016
Beer-Sheva

This work was carried out under the supervision of Professor Shlomi Dolev
In the Department of Computer Science
Faculty of Natural Science

Research-Student's Affidavit when Submitting the Doctoral Thesis for Judgment
I, Shantanu Sharma, whose signature appears below, hereby declare that:
X I have written this Thesis by myself, except for the help and guidance offered by my Thesis Advisors.
X The scientific materials included in this Thesis are products of my own research, culled from the period during which I was a research student.
Student's name: Shantanu Sharma    Signature:    Date: February 17, 2016

Acknowledgements
First, I would like to thank almighty “God” who gave me the inspiration to pursue research as my beautiful career and to have rendezvous with two distinguished professors, Prof. Shlomi Dolev and Prof. Jeffrey D. Ullman, during the whole journey of my graduate study. Without Prof. Dolev’s guidance, this work would be impossible. I am heartily thankful to Prof. Dolev for giving me a golden opportunity to work with him and his large research group. He introduced me to the worlds of “Self-Stabilization” and “MapReduce.” He encouraged me to work on MapReduce, which was a black-box for me in the beginning. Undoubtedly, I am the luckiest student to have him as my Doctoral degree’s advisor. It is extremely rare to have a mentor like him who is very cool and brilliant. It is not an exaggeration that I learnt a lot from him in my personal and professional life. I would also like to thank Prof. Jeffrey D. Ullman for assisting me in understanding the models of MapReduce. He worked very closely with me on all research papers related to MapReduce. It was great to have continuous guidance from Prof. Ullman throughout the duration of my PhD. He encouraged me to think differently and allowed frequent discussion on almost everything. He always provided me thoughtful insights and spent a lot of time improving my writing skills. Prof. Ullman is extremely energetic, and I treasure the time when we discussed Meta-MapReduce (Chapter 6) over an entire day, while we both were in the United States (Prof. Ullman had arrived from India in the same morning). Beyond the thesis work, he helped me in non-technical aspects as well; I learnt how to become a better human being and a better student/researcher of computer science. I hope that I will be able to sustain his acquaintances throughout my professional and personal life. I am thankful to Prof. Foto N. Afrati (National Technical University of Athens), who also discussed key-points in our research papers related to MapReduce. Prof. Afrati gave many remarks to better understand and design efficient algorithms for MapReduce and their proofs. I am also thankful to Prof. Jonathan Ullman (Northeastern University) for his significant contribution in our data cube related paper (Chapter 8). I had the opportunity to work with Prof. Ephraim Korach (BGU), who helped me in developing mathematical proofs. I was delighted to work with Prof. Elad M. Schiller (Chalmers University of Technology), who explained the core of distributed computing systems, in addition to papers on self-stabilizing communication. I am also thankful to Prof. Ehud Gudes (BGU) for assisting me over a long period of time and involving me in EMC’s World-Wide-Hadoop project. Prof. Gudes was also active during two survey papers under the same project. I am also pleased to work with other students of Prof. Dolev: Ariel Hanemann who worked on self-stabilizing communication, Philip Derbeko who helped me in a review paper related to privacy aspects of MapReduce, Yin Li who assisted me in figuring out security issues in MapReduce, and Nisha Panwar who helped me to explore a new area, the fifth generation of mobile communication. Beyond the scope of this thesis, I am very thankful to Ximing Li and, again, Nisha Panwar. Nisha helped me in many aspects that are completely beyond this thesis. I would like to thank the Head of Department of Computer Science and all the secretaries of the department who helped me at various stages. This part is incomplete without acknowledging my master’s advisor, Prof. 
Awadhesh Kumar Singh (National Institute of Technology Kurukshetra). Prof. Singh is the person who showed me the right path for my career during my master's degree. His endless contributions made me aware of the technical world. I am also thankful to Prof. Sukumar Ghosh (University of Iowa), who inspired me to pursue higher studies abroad. I was also honored that Prof. Maria Potop-Butucaru (University Pierre et Marie Curie), Prof. Ajoy K. Datta (University of Nevada, Las Vegas), Prof. Keren Censor-Hillel (Technion), Prof. Neeraj Mittal (The University of Texas at Dallas), Dr. Marco Pistoia (IBM TJ Watson Research Center), and Prof. Moshe Vardi (Rice University) provided opportunities to present my research work. Most importantly, I must thank my family for their support, patience, and understanding of my need to focus my time and effort on my thesis. Without my family, I would not be here on this Earth to complete this thesis, let alone any other accomplishments in the past or future.

Table of Contents
Abstract i
1 Introduction 1 1.1 End-to-End Communication Algorithms ...... 1 1.2 MapReduce ...... 2 1.3 Overview of the Tasks Investigated ...... 4 1.4 Our Contributions and Thesis Outline ...... 6
I Replication Aspects in a Communication Algorithm 10
2 Background of a Self-Stabilizing End-to-End Communication Algorithm 11 2.1 Unreliable Communication Channels ...... 12 2.2 The Interleaving Model ...... 13 2.3 The Task ...... 14
3 Self-Stabilizing End-to-End Algorithm 15 3.1 A First Attempt Solution ...... 15 3.2 Self-Stabilizing End-to-End Algorithm (S2E2C) ...... 18
II Replication Aspects in MapReduce 21
4 Intractability of Mapping Schemas 22 4.1 Preliminarily and Motivating Examples ...... 22 4.2 Mapping Schema and Tradeoffs ...... 24 4.3 Intractability of Finding a Mapping Schema ...... 26
5 Approximation Algorithms for the Mapping Schema Problems 28 5.1 Preliminary Results ...... 28 5.2 Optimal Algorithms for Equal-Sized Inputs ...... 31 5.3 Generalizing the Technique for the Reducer Capacity q > 3 and Inputs of Size ≤ q/k, k > 3 ...... 37 5.4 Generalizing the AU method ...... 40 5.5 A Hybrid Algorithm for the A2A Mapping Schema Problem ...... 45 5.6 Approximation Algorithms for the A2A Mapping Schema Problem with an Input > q/2 ...... 46 5.7 An Approximation Algorithm for the X2Y Mapping Schema Problem .... 49
6 Meta-MapReduce 50 6.1 The System Setting ...... 51 6.2 Meta-MapReduce: Description ...... 52 6.3 Extensions of Meta-MapReduce ...... 57
7 Interval Join 62 7.1 Preliminaries ...... 62 7.2 Unit-Length and Equally Spaced Intervals ...... 65 7.3 Variable-Length and Equally Spaced Intervals ...... 67
8 Computing Marginals of a Data Cube 72 8.1 Preliminaries ...... 72 8.2 Computing Many Marginals at One Reducer ...... 74 8.3 The General Case ...... 80
III Replication Aspects in Secure and Privacy-Preserving MapReduce 82
9 Security and Privacy Aspects in MapReduce 83 9.1 Security and Privacy Challenges in MapReduce ...... 84 9.2 Privacy Requirements in MapReduce ...... 85 9.3 Adversarial Models for MapReduce Privacy ...... 85
10 Privacy-Preserving Computations using MapReduce 87 10.1 Motivating Examples ...... 87 10.2 System Settings ...... 88 10.3 Creation and Distribution of Secret-Shares of a Relation ...... 90 10.4 Count Query ...... 93 10.5 Search and Fetch Queries ...... 95 10.6 Equijoin ...... 99 10.7 Range Query ...... 103
11 Conclusion and Future Work 105
Bibliography 108
A Pseudocodes of the Self-Stabilizing End-to-End Communication Algorithm 118 A.1 Detailed Description of Algorithms 5 and 6 ...... 119 A.2 Correctness of Algorithms 5 and 6 ...... 120
B Proof of NP-Hardness of Mapping Schema Problems (Chapter 4) 128
C Pseudocodes of Approximation Algorithms for Mapping Schema Problems and their Proofs (Chapter 5) 132 C.1 Preliminary Proofs of Theorems on Lower and Upper Bounds ...... 132 C.2 Lower Bounds for Equal-Sized Inputs ...... 133 C.3 Algorithm for an Odd Value of the Reducer Capacity (Algorithm 7A) . . . 134 C.4 Algorithm for an Even Value of the Reducer Capacity (Algorithm 7B) . . . 135 C.5 Proof of Lemmas and Theorems related to Algorithms 7A and 7B . . . . . 136 C.6 The First Extension to the AU method (Algorithm 8) ...... 139 C.7 The Second Extension to the AU method (Algorithm 9) ...... 141 C.8 A Theorem related to A Big Input ...... 142 C.9 Theorems related to the X2Y Mapping Schema Problem ...... 142
D Proofs of Theorems related to Meta-MapReduce (Chapter 6) 144
E Proof of Theorems related to Interval Join (Chapter 7) 146 E.1 Proof of Theorems and Algorithm related to Unit-Length and Equally Spaced Intervals ...... 146 E.2 Proof of Theorems and Algorithm related to Variable-Length and Equally Spaced Intervals ...... 150
F Proof of Theorems related to Computing Marginals of a Data Cube (Chapter 8) 154 F.1 Related Work on Data Cube ...... 154 F.2 Proof of Theorems ...... 154
G Pseudocodes and Theorems related to Privacy-Preserving MapReduce-based Computations (Chapter 10) 159 G.1 Algorithm for Creating Secret-Shares ...... 159 G.2 Count Algorithm ...... 159 G.3 Single Tuple Fetch Algorithm ...... 161 G.4 Multi-tuple Fetch Algorithm ...... 162 G.5 Proof of Theorems related to Privacy-Preserving Equijoin ...... 165 G.6 Pseudocodes related to Range Query ...... 167

Abstract
While hardware and software failures are common in distributed computing systems, replication is a way to achieve failure resiliency (availability, consistency, isolation, and reliability) by creating and maintaining several copies of hardware, data, and computing protocols. In this thesis, we propose several approaches that deal with replication of data (or messages) to design a self-stabilizing end-to-end communication algorithm and models for MapReduce computations. The first part presents a self-stabilizing end-to-end communication algorithm that delivers messages in FIFO order over a non-FIFO, duplicating, and bounded-capacity channel. The duplication of messages is an essential replication aspect used to overcome loss of messages, but at the same time it introduces a challenging end-to-end control problem that is addressed in this work. The new algorithm delivers messages to the designated receiver exactly once, in the same order as they were sent by the sender, without omission or duplication. The second part deals with the design of models and algorithms for MapReduce. In MapReduce, an input is replicated to several reducers, and hence, the replication of inputs dominates the communication cost, which is a performance measure of a MapReduce algorithm. A new MapReduce model is introduced here, where, for the first time, the realistic setting in which inputs have non-identical sizes is taken into account. The new model opens the door to many useful algorithmic challenges, some of which are solved in the sequel. In particular, we present two classes of matching problems, namely the all-to-all and the X-to-Y matching problems, and show that these matching problems are NP-hard in terms of optimization of the communication cost. We also provide several near-optimal approximation algorithms for both problems. Next, we provide a new algorithmic approach for MapReduce algorithms that decreases the communication cost significantly, while taking into account the locality of data and of mappers and reducers. The suggested technique avoids the movement of data that does not participate in the final output. In the second part, we also evaluate the communication cost of solving the problem of interval join of overlapping intervals and provide 2-way interval join algorithms that respect the reducer memory space. We also investigate the lower bounds on the communication cost for the interval join problem for different types of intervals.
Finally, we consider the problem of computing many marginals of a data cube at one reducer. We find the lower bound on the communication cost for the problem and provide several almost optimal algorithms for computing many marginals. The third part focuses on the security and privacy aspects of MapReduce. We identify security and privacy challenges and requirements in the scope of MapReduce, while considering a variety of adversarial capabilities. Then, we provide a privacy-preserving technique for MapReduce computations based on replication of information in the form of Shamir's secret-sharing. We show that though the creation of secret-shares of each value results in increased replication, it makes it impossible for an honest-but-curious (or a malicious) cloud to learn the database and the computations. We provide MapReduce-based privacy-preserving algorithms for count, search, fetch, equijoin, and range-selection queries.
Chapter 1
Introduction
A distributed system consists of independent computing nodes that aim to solve problems over distributed information. The distributed system creates and maintains several copies of identical data at different computing nodes, known as replicas, either as a consequence of the computing protocol or as a part of the system design. The mechanism of creating and maintaining replicas is known as replication. Replication (Chapter 14 of [36] and Chapter 16 of [59]) is a way to achieve availability, consistency, isolation, and reliability of data and computing protocols. In other words, the fault-tolerant nature of data and computing protocols can be accomplished using replication of the data and the computing protocols. However, replication mechanisms require careful coordination, execution, and agreement among the replicated data and computing protocols. In this thesis, we focus on replication of data in both aspects, i.e., the replication of data as a consequence of the underlying redundancy choice of the system infrastructure (the replication of packets that leads to a non-FIFO reception), and the replication of data as a part of the system design. Specifically, we focus on replication of packets in end-to-end communication channels to provide a self-stabilizing end-to-end communication algorithm, and on replication of data in MapReduce to provide the desired output and privacy-preserving computations. In this chapter, we provide an overview of end-to-end communication algorithms (in Section 1.1), an overview of the MapReduce framework (in Section 1.2), the problem statements of the thesis (in Section 1.3), and our contributions (in Section 1.4).
1.1 End-to-End Communication Algorithms
End-to-end communication and data-link algorithms are fundamental for any network protocol [103], where a sender transmits messages over unreliable communication links to a receiver in an exactly-once fashion. In addition, errors are introduced during the
transmission of packets. Noise in transmission media is also a significant source of errors that result in omission, duplication, and reordering of messages during transmission. Therefore, most end-to-end communication and data-link algorithms assume an initial synchronization between senders and receivers. In addition, error detecting and correcting techniques are employed as an integral part of the transmission in communication networks. Still, when there is a large volume of communication sessions, the probability that an error will not be detected increases, which leads to a possible malfunction of the communication algorithm. In fact, it may lead the algorithm to an arbitrary state from which the algorithm may never recover unless it is self-stabilizing [42]. Afek and Brown [4] present a self-stabilizing alternating bit protocol (ABP) for FIFO packet channels without the need for initial synchronization. Self-stabilizing token passing was used as the basis for self-stabilizing ABP over unbounded capacity and FIFO-preserving channels in [62, 48]. Dolev and Welch [54] consider the bare network to be a dynamic network with FIFO non-duplicating communication links, and use source routing over the paths to cope with crashes. In [23], a self-stabilizing unit-capacity data link over a FIFO physical link is assumed. Cournier et al. [37] consider a snap-stabilizing algorithm [29] for message forwarding over a message-switched network. They ensure one-time delivery of the emitted message to the destination within a finite time using a destination-based buffer graph and assuming underlying FIFO packet delivery. In the context of dynamic networks and mobile ad hoc networks, Dolev et al. [53, 51, 52] presented self-stabilizing algorithms for token circulation, group multicast, group membership, resource allocation, and estimation of network size. Dolev et al. [43] presented a self-stabilizing data-link algorithm for reliable FIFO message delivery over bounded non-FIFO and non-duplicating channels. The algorithm of [43] delivers a message to a receiver exactly once and is also applicable to an arbitrary state of dynamic networks of bounded capacity that omit and reorder packets. This algorithm deals only with duplication of messages by the sender; however, the algorithm is not able to handle duplication of packets by the (possibly overlay) channel. To the best of our knowledge, there is no algorithm that handles duplication of packets by the channel.
1.2 MapReduce
Data mining operations on large-scale data require dividing the data over several machines and then merging the output results. Such a parallel execution provides outputs in a timely manner. However, there are several open challenges to be considered in such a parallel computing model, including distribution of data, failure of machines, ordering of the outputs, scalability of the system, load balancing, and synchronization among machines.
In order to resolve these issues, a programming model, MapReduce [38], was introduced by Google in 2004. MapReduce executes parallel processing over large-scale data using a cluster of computing nodes, without any costly and dedicated computing node such as a supercomputer. Since then, MapReduce has become a de facto standard for parallel processing over large-scale data, and other companies, including Yahoo, Amazon, and Facebook, are also using MapReduce. Details about MapReduce can be found in Chapter 2 of [76]. Applications and models of MapReduce. Matrix multiplication [14], similarity join [106, 111, 13, 24], detection of near-duplicates [87], interval join [31], spatial join [66, 65, 104], graph processing [11, 85], pattern matching [79], data cube processing [91, 100, 107], skyline queries [12], k-nearest-neighbors finding [115, 80], star-join [117], theta-join [92, 116], and image-audio-video-graph processing [113] are a few applications of MapReduce in the real world. Some models for an efficient MapReduce computation are presented by Karloff et al. [72], Goodrich [61], Lattanzi et al. [73], Pietracaprina et al. [98], Goel and Munagala [60], Ullman [105], Afrati et al. [15, 18], and Fish et al. [57].
1.2.1 Overview of MapReduce
MapReduce, see Figure 1.1, works in two phases: the Map phase and the Reduce phase, where two user-defined functions, namely, the map function and the reduce function, are executed over large-scale data. In MapReduce, the data is represented in the form of ⟨key, value⟩ pairs.
[Figure 1.1 shows the execution: a master process and some number of worker processes are forked (Step 1); the master assigns map tasks and reduce tasks (Step 2); mappers read the input splits split1, split2, ..., splitm of the original input data and execute the map tasks (Step 3); reducers read the intermediate data and execute the reduce tasks, one reducer per key k1, k2, ..., kr, producing Output 1, ..., Output r (Step 4). The left half of the figure is the map phase and the right half is the reduce phase.]
Figure 1.1: An execution of a MapReduce algorithm.
The Map Phase. A MapReduce computation starts from the Map phase where a user-defined map function works on a single input and produces intermediate outputs of
the form of ⟨key, value⟩ pairs. A single input, for example, can be a tuple of a relation. An application of the map function to a single input is called a mapper. Several mappers execute in parallel and provide intermediate outputs of the form of ⟨key, value⟩ pairs. The Reduce Phase. The Reduce phase provides the final output of MapReduce computations. The Reduce phase executes a user-defined reduce function on its inputs, i.e., the outputs of the Map phase. An application of the reduce function to a single key and its associated list of values is called a reducer. Since there are several keys in the intermediate output, there are also multiple reducers that work in parallel. Word count example. Word count is a traditional example to illustrate a MapReduce computation, where the task is to count the number of occurrences of each word in a collection of documents. In this example, the original input data is a collection of documents, and a single input is a single document. Each mapper takes a document and implements a user-defined map function that results in a set of ⟨key, value⟩ pairs {⟨w1, 1⟩, ⟨w2, 1⟩, ..., ⟨wn, 1⟩}, where each key, wi, represents a word of the document, and each value is 1. The reduce task is executed subsequently, where a user-defined reduce function aggregates all the occurrences of a particular word wi and outputs a ⟨wi, ni⟩ pair, where wi is a word that appears in at least one of the given documents, and ni is the total number of occurrences of wi in all the given documents. Recall that there is a reducer for each word, wi, and this reducer adds up all the occurrences of the word wi. Apache Hadoop [1] is a well-known and widely used open-source software implementation of MapReduce for distributed data storage and processing over large-scale data. More details about Hadoop and its Hadoop distributed file system (HDFS) may be found in Chapter 2 of [78]. YARN [2] is the latest version of Hadoop (Hadoop 0.23); details about YARN may be found in [90]. The current MapReduce and Hadoop systems are designed to process data at a single location, i.e., locally distributed processing. Thus, they are not able to process data at geo(graphically)-distributed multiple clusters. There are some frameworks based on MapReduce for processing geo-distributed datasets without moving them to a single location, and these frameworks are reviewed in our paper [44].
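To make the word-count description concrete, the sketch below gives minimal map and reduce functions in Python, together with a tiny driver that groups the intermediate ⟨key, value⟩ pairs by key. The function and variable names are illustrative only and do not correspond to any specific Hadoop API.

```python
from collections import defaultdict

def map_fn(document):
    """Map: emit a <word, 1> pair for every word of one document."""
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: sum all occurrences of a single word."""
    return (word, sum(counts))

def word_count(documents):
    # Map phase: one mapper per document.
    intermediate = defaultdict(list)
    for doc in documents:
        for word, one in map_fn(doc):
            intermediate[word].append(one)
    # Reduce phase: one reducer per distinct key (word).
    return [reduce_fn(word, counts) for word, counts in intermediate.items()]

print(word_count(["to be or not to be", "to do"]))
# [('to', 3), ('be', 2), ('or', 1), ('not', 1), ('do', 1)]
```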
1.3 Overview of the Tasks Investigated
In this thesis, we deal with six problems concerned with data replication for designing communication protocols, computation protocols, and system frameworks, as follows: Problem 1: Self-stabilizing end-to-end communication over duplicating channels. The first problem deals with data replication for designing a self-stabilizing end-to-end communication protocol. The first problem is how to design an algorithm for a bounded
capacity, duplicating, and non-FIFO network, while (i) ensuring delivery of exactly one copy of each packet in the same order as it was sent, (ii) handling corruption, omission, and duplication of messages by the channel, (iii) ensuring applicability to dynamic networks, and (iv) not worrying about starting configurations. The solution to this problem is given in Chapters 2 and 3. Problem 2: Different-sized inputs in MapReduce. All the following problems deal with replication of data and computations in MapReduce. We define two problems where exactly two inputs are required for computing an output: All-to-All problem. In the all-to-all (A2A) problem, a list of inputs is given, and each pair of inputs corresponds to one output. X-to-Y problem. In the X-to-Y (X2Y) problem, two disjoint lists X and Y are given, and each pair of elements ⟨xi, yj⟩, where xi ∈ X, yj ∈ Y, ∀i, j, corresponds to one output. Computing common friends on a social networking site and the drug-interaction problem [105] are examples of A2A problems. Skew join is an example of an X2Y problem. A mapping schema defines a MapReduce algorithm. A mapping schema assigns inputs to reducers, so that no reducer exceeds its maximum capacity (i.e., a constraint on the list of inputs that can be assigned to a reducer) and all pairs of inputs (in the A2A problem) or all pairs of X-to-Y inputs (in the X2Y problem) meet in some reducer. We, for the first time, consider that the inputs may have non-identical sizes. An important and real parameter to measure the performance of a MapReduce algorithm is the communication cost — the total amount of data that has to be transferred from the map phase to the reduce phase. The communication cost comes with a tradeoff in the degree of parallelism at the reduce phase. A mapping schema is optimal if there is no other mapping schema with a lower communication cost. In this scope, we investigate how to construct optimal mapping schemas, or good approximations, for the A2A and the X2Y problems. The solution to this problem is given in Chapters 4 and 5. Problem 3: Meta-MapReduce. In the third problem, we are interested in reducing the amount of data to be transferred to the site of the cloud executing MapReduce computations (and the amount of data transferred from the map phase to the reduce phase). In several problems, the final output depends on only some of the inputs, and hence it is not required to send the whole input data to the site of the mappers, followed by (intermediate output) data from the map phase to the reduce phase. Hence, the next problem is how to design an algorithmic approach for MapReduce algorithms that takes into account the localities of data and computations, while avoiding the movement of data that does not participate in the final output. The solution to this problem is given in Chapter 6. Problem 4: Interval join. This problem deals with designing communication-cost
efficient algorithms for the problem of interval join of overlapping intervals, where two relations X and Y are given. Each relation contains binary tuples that represent intervals, i.e., each tuple corresponds to an interval and contains the starting-point and ending-point of this interval. The problem lies in assigning each pair of intervals ⟨xi, yj⟩, where xi ∈ X and yj ∈ Y, ∀i, j, such that the intervals xi and yj share at least one common time, to reducers, while minimizing the communication cost. The solution to this problem is given in Chapter 7. Problem 5: Computing marginals of a data cube. A data cube [63, 67, 64] is a useful tool for analyzing high-dimensional data. A marginal of a data cube is the aggregation of the data in all those tuples that have fixed values in a subset of the dimensions of the cube. In this scope, the problem is how to compute all the marginals of a data cube using a single round of MapReduce, while minimizing the communication cost. The solution to this problem is given in Chapter 8. Problem 6: Secret-shared private MapReduce. The last problem is how to provide information-theoretically secure data and computation outsourcing and query execution using MapReduce. If we can build a technique for information-theoretically secure data and computation outsourcing, and for executing MapReduce-based queries on the outsourced data, then a user can retrieve only the desired result without involving the source of the data. In addition, during the execution of queries, the users and the clouds cannot breach the privacy of the data and computations. The solution to this problem is given in Chapters 9 and 10.
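As an illustration of the pairing requirements behind Problems 2 and 4, the sketch below enumerates, in plain Python, the pairs that a mapping schema must bring together: every pair of inputs for the A2A problem, and every overlapping pair ⟨xi, yj⟩ for the interval join. It is only a specification of which pairs must meet at some reducer, not an assignment of pairs to reducers; the names are illustrative.

```python
from itertools import combinations

def a2a_pairs(inputs):
    """All-to-all: every pair of inputs must meet at some reducer."""
    return list(combinations(inputs, 2))

def overlapping_interval_pairs(X, Y):
    """Interval join: pairs (x, y), x in X, y in Y, that share a time point.
    Intervals are (start, end) tuples; they overlap iff
    x.start <= y.end and y.start <= x.end."""
    return [(x, y) for x in X for y in Y
            if x[0] <= y[1] and y[0] <= x[1]]

print(a2a_pairs(["i1", "i2", "i3"]))
# [('i1', 'i2'), ('i1', 'i3'), ('i2', 'i3')]
print(overlapping_interval_pairs([(0, 2), (5, 7)], [(1, 4), (8, 9)]))
# [((0, 2), (1, 4))] -- only the first pair of intervals overlaps
```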
1.4 Our Contributions and Thesis Outline
Here, we provide an outline of the thesis, which is divided into three parts. The thesis is based on our six research papers [47, 9, 6, 10, 16, 50] and three survey papers [44, 39, 95]. The first part focuses on replication aspects in terms of the duplication of messages that is used to overcome loss of messages and to design a self-stabilizing end-to-end communication algorithm. The second part focuses on replication aspects in MapReduce in terms of the duplication of data and MapReduce computations; specifically, the focus is on (i) building a new model for MapReduce, where, for the first time, the realistic setting in which inputs have non-identical sizes is considered, (ii) designing an algorithmic technique that takes into account the localities of data and computation sites, and (iii) solving the problems of interval join of overlapping intervals and of computing the marginals of a data cube. The third part focuses on replication aspects in terms of creating secret-shares of data for ensuring privacy-preserving MapReduce-based computations in the public clouds.
Part I
Chapter 2. In this chapter, we provide the assumptions behind the development of the self-stabilizing end-to-end communication algorithm, such as unreliable communication channels, the interleaving model, and asynchronous executions. Chapter 3. This chapter begins with a description of a simple end-to-end communication algorithm, which is self-stabilizing. However, this algorithm creates a huge overhead on the network. Nevertheless, the simple self-stabilizing end-to-end communication algorithm provides an outline for understanding our proposed algorithm, which has the following characteristics: (i) it is able to deliver a message to a designated receiver exactly one time; (ii) it is applicable to dynamic networks of bounded capacity that replicate, omit, and reorder packets; and (iii) it is applicable to an arbitrary state of the network. The content of this chapter appeared in SSS 2012 [47], and the full version of this paper is under review in a journal.
Part II
Chapter 4. In this chapter, we focus on techniques to decrease the communication cost in MapReduce computations and introduce the term reducer capacity. The communication cost can be optimized by minimizing the amount of data transferred, which is directly dependent on the number of reducers. However, a reducer cannot hold inputs whose sum of sizes is greater than the capacity of the reducer, which we call the reducer capacity. A single reducer of big enough capacity can hold all the inputs and results in the minimum communication cost and the minimum replication rate (the number of reducers to which an input is sent). However, the use of a single reducer results in no parallelism at the reduce phase and increases the clock time to finish the MapReduce job. In this chapter, we consider different-sized inputs and show the relevance of the reducer capacity by considering two classes of problems. We show that these two problems are NP-hard in terms of optimization of the communication cost, and hence, we cannot achieve the optimal communication cost. We also study three tradeoffs: (i) a tradeoff between the reducer capacity and the number of reducers; (ii) a tradeoff between the reducer capacity and the parallelism at the reduce phase; and (iii) a tradeoff between the reducer capacity and the communication cost. Chapter 5. We present several near-optimal approximation algorithms for both the problems studied in Chapter 4. Specifically, we first present preliminary results and a bin-packing-based approximation algorithm that provides near-optimal mapping schemas. The bin-packing-based approximation algorithm applies a bin-packing algorithm to the inputs, and then treats each of the bins as a single input. In addition, we present algorithms, which are based on the bin-packing-based algorithm, to construct optimal mapping schemas in certain cases. For each algorithm, we find upper bounds on the communication cost, the
replication rate, and the number of reducers. The content of Chapter 4 and this chapter appeared in DISC 2014 as a brief announcement [5] and in BeyondMR 2015 as an extended abstract [8]. The full version of this paper is accepted in ACM Transactions on Knowledge Discovery from Data (TKDD) [9]. Chapter 6. The federation of cloud and big data activities is the next challenge, where MapReduce should be modified to avoid (big) data migration across remote (cloud) sites. This is exactly the scope of this chapter, where only the data essential for obtaining the result is transmitted, reducing communication and preserving data privacy as much as possible. In this chapter, we propose an algorithmic technique for MapReduce algorithms, called Meta-MapReduce, that decreases the communication cost by allowing us to process and move metadata to clouds and from the map phase to the reduce phase. In addition, we explore hashing and filtering techniques for reducing the communication cost and attempt to move only relevant data to the site of the mappers and reducers. The ability to use these algorithms depends on the capability of the platform, but many systems today, such as Spark [114], Pregel [84], or recent implementations of MapReduce, offer the necessary capabilities. The content of this chapter appeared in SSS 2015 as a brief announcement [6]. An extended abstract of this paper is under review in a conference/workshop. Chapter 7. In the previous three chapters, we restricted a reducer from holding inputs whose sum of sizes is more than the capacity of the reducer. In this chapter, we consider equal-sized inputs, and hence, all the reducers hold an equal number of inputs. We now define the reducer size [15] as the maximum number of inputs that can be assigned to a reducer. We find the lower bound on the replication rate and the communication cost for the problem of interval join of overlapping intervals. We extend the algorithm for interval join proposed in [31] while taking the reducer size into account. We consider three types of intervals: unit-length and equally spaced; variable-length and equally spaced; and equally spaced with a specific distribution of the various lengths. The content of this chapter appeared in BeyondMR 2015 [10], and the full version of this paper is under review in a journal. Chapter 8. In this chapter, we consider the problem of computing the data cube marginals of a fixed order k, which we call kth-order marginals (i.e., all those marginals that fix n − k dimensions of an n-dimensional data cube and aggregate over the remaining k dimensions), using a single round of MapReduce. We show that the replication rate is minimized when the reducers receive all the necessary inputs to compute one marginal of higher order. We define the problem in terms of covering sets of k dimensions with sets of a larger size m, a problem studied under the name "covering numbers" [22, 34]. We present several algorithms (for different values of k and m) that meet or come close to yielding the minimum possible replication rate for a given reducer size. The content of this chapter [16] is under
review in a conference/workshop.
Part III
Chapter 9. While MapReduce is not directly related to the cloud, nowadays several public clouds, e.g., Amazon Elastic MapReduce, Google App Engine, IBM's Blue Cloud, and Microsoft Azure, enable users to perform MapReduce computations in the cloud without considering physical infrastructures and software installation. Thus, the deployment of MapReduce in public clouds enables users to process large-scale data in a cost-effective manner and establishes a relationship between two independent entities, i.e., clouds and MapReduce. However, public clouds do not guarantee the rigorous security and privacy of computations or of stored data. In this chapter, we highlight security and privacy challenges in MapReduce, privacy requirements for MapReduce, and adversarial models in the context of privacy in MapReduce. The content of this chapter appeared in Elsevier Computer Science Review [39]. Chapter 10. The main obstacle to providing a privacy-preserving framework for MapReduce in adversarial (public) clouds is computational and storage efficiency. An adversarial cloud may breach the privacy of data and computations. Hence, in this chapter, we are interested in a secure, privacy-preserving, and storage-efficient technique for executing MapReduce computations in the clouds. We look at information-theoretically secure data and computation outsourcing and query execution using MapReduce. Specifically, our focus is on four types of privacy-preserving queries: count, search and fetch, equijoin, and fetching tuples with a value belonging to a range. By developing privacy-preserving data and computation outsourcing techniques, a user receives only the desired result without knowing the whole database; moreover, the clouds are also unable to learn the database and the query. The content of this chapter appeared in DBSec 2016 as an extended abstract [50], and the full version of this paper is under review in a journal.
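Since Chapter 10 builds on replicating each value as Shamir secret-shares, a minimal sketch of share creation and reconstruction is given below. It works over a small prime field and uses illustrative parameter names; the thesis's actual share-creation algorithm is the one given in Appendix G.

```python
import random

PRIME = 2_147_483_647  # a prime; the field must exceed every secret value

def make_shares(secret, threshold, num_shares):
    """Split `secret` into `num_shares` Shamir shares; any `threshold` of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    # Share i is the random polynomial evaluated at x = i (i = 1..num_shares).
    return [(x, sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME)
            for x in range(1, num_shares + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from `threshold` shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(secret=42, threshold=2, num_shares=3)  # e.g., one share per cloud
print(reconstruct(shares[:2]))  # 42; any single share alone reveals nothing about 42
```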
Part I
Replication Aspects in a Communication Algorithm
Chapter 2
Background of a Self-Stabilizing End-to-End Communication Algorithm
Contemporary communication and network technologies enhance the need for automatic recovery. Having a self-stabilizing, predictable, and robust basic end-to-end communication primitive for dynamic networks facilitates the construction of high-level applications. Such applications are becoming extremely essential nowadays, when countries' main infrastructures, such as the electrical smart-grid, water supply networks, and intelligent transportation, are based on cyber-systems. We can abstract away the exact network topology, dynamicity, and churn, and provide (efficient) exactly-once message transmission using packets by considering communication networks that have bounded capacity and yet allow omissions, duplications, and reordering of packets. In practice, error detection is a probabilistic mechanism that may not detect a corrupted message, and therefore, the message can be regarded as legitimate, driving the system to an arbitrary state after which availability and functionality may be damaged forever, unless there is human intervention. There is a rich research literature about Automatic Repeat reQuest (ARQ) techniques for obtaining fault-tolerant protocols that provide end-to-end message delivery. However, when initiating a system in an arbitrary state, a non-self-stabilizing algorithm provides no guarantee that the system will reach a legal state after which the participants maintain a coherent state. Fault-tolerant systems that are self-stabilizing [42, 40] can recover after the occurrence of transient faults, which can drive the system to an arbitrary system state. The system designers consider all configurations as possible configurations from which the system is started. One significant challenge is to provide an ordering for messages transmitted between the Sender and the Receiver. Usually, new messages are identified by a new message number; a number greater than all previously used numbers. Counters of 64-bits,
or so, are usually used to implement such numbers. Such designs were justified by claiming that 64-bit values suffice for implementing (practically) unbounded counters. However, a single transient fault may cause the counter to reach the upper limit at once. In this chapter, we describe our assumptions about the system and network for building a self-stabilizing end-to-end communication protocol.
2.1 Unreliable Communication Channels
We consider a (communication) graph of N nodes (or processors), p1, p2, ..., pN. The graph has (direct communication) links, (pi, pj), whenever pi can directly send packets to its neighbor, pj (without the use of network layer protocols). The system establishes bidirectional communication between the Sender, ps, and the Receiver, pr, which may not be connected directly. Namely, between ps and pr there is a unidirectional (communication) channel (modeled as a packet set) that transfers packets from ps to pr, and another unidirectional channel that transfers packets from pr to ps.
When node pi sends a packet, pckt, to node pj, the operation send adds a copy of pckt to the channel from pi to pj, as long as the system follows the assumption about the upper bound on the number of packets in the channel, where (pi = ps) ∧ (pj = pr) or (pi = pr) ∧ (pj = ps). We intentionally do not specify (the possibly unreliable) underlying mechanisms that are used to forward a packet from the Sender to the Receiver, e.g., flood routing and shortest path routing, as well as packet forwarding protocols. Once pckt arrives at pj, pj triggers the receive event, and deletes pckt from the channel set. We assume that when node pi sends a packet, pckt, infinitely often through the channel from pi to pj, node pj receives pckt infinitely often. Our proposed self-stabilizing algorithm is oblivious to the channel implementation, which we model as a packet set that has no guarantees for reliability or FIFO order preservation. We assume that, at any given time, the entire number of packets in the system does not exceed a known bound, which we call capacity. This bound can be calculated by considering the possible number of network links, the number of system nodes, the (minimum and maximum) packet size, and the amount of memory that each node allocates for each link. Thus, at any time the sent packets may be omitted, reordered, and duplicated, as long as the system does not violate the channel capacity bound. Note that transient faults can bring the system to consist of arbitrary, and yet capacity-bounded, channel sets from which convergence should start and consistency be regained.
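The sketch below models such an unreliable, bounded-capacity channel as a plain Python multiset on which an adversary may omit, duplicate, or reorder packets between send and receive. It is only an illustration of the assumptions above; the class name, probabilities, and the bound-enforcement policy are ours, not part of the thesis.

```python
import random

class BoundedLossyChannel:
    """A packet set with at most `capacity` packets; no FIFO or reliability guarantees."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = []          # unordered multiset of packets in transit

    def send(self, pckt):
        # A send adds a copy only while the capacity bound is respected.
        if len(self.packets) < self.capacity:
            self.packets.append(pckt)

    def environment_step(self):
        # The adversary may omit, duplicate (within the bound), or reorder packets.
        if self.packets and random.random() < 0.2:
            self.packets.pop(random.randrange(len(self.packets)))      # omission
        if self.packets and len(self.packets) < self.capacity and random.random() < 0.2:
            self.packets.append(random.choice(self.packets))           # duplication
        random.shuffle(self.packets)                                    # reordering

    def receive(self):
        # Delivery removes an arbitrary packet from the channel set.
        return self.packets.pop() if self.packets else None
```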
2.2 The Interleaving Model
Self-stabilizing algorithms do not terminate [42]. The non-termination property can be easily identified in the code of a self-stabilizing algorithm: the code is usually a do forever loop that contains communication operations with the neighbors. An iteration is said to be complete if it starts in the loop’s first line and ends at the last.
Every node, pi, executes a program that is a sequence of (atomic) steps, where a step starts with local computations and ends with a communication operation, which is either send or receive of a packet. For ease of description, we assume the interleaving model, where steps are executed atomically; a single step at any given time. An input event can either be a packet reception or a periodic timer going off triggering pi to send. Note that the system is asynchronous. The non-fixed spontaneous node actions and node processing rates are irrelevant to the correctness proof.
The state, si, of a node pi consists of the values of all the variables of the node, including the set of all incoming communication channels. The execution of an algorithm step can change the node's state and the communication channels that are associated with it. The term (system) configuration is used for a tuple of the form (s1, s2, ..., sN), where each si is the state of node pi (including packets in transit for pi). We define an execution (or run)
R = c0, a0, c1, a1, ... as an alternating sequence of system configurations, cx, and steps, ax, such that each configuration cx+1 (except the initial configuration c0) is obtained from the preceding configuration, cx, by the execution of the step ax. We often associate the step index notation with its executing node pi using a second subscript, i.e., aix. We represent the omissions, duplications, and reordering using environment steps that are interleaved with the steps of the processors in R.
2.2.1 Asynchronous executions that allow progress
We say that an asynchronous execution, R, allows progression when every algorithm step that is applicable infinitely often in R is executed infinitely often in R. Moreover, we require that R allows progression with respect to communication. Namely, pi's infinitely often send operations of a packet, pckt, to pj, imply infinitely often receive operations of pckt by pj. Thus, the communication graph may often change and the communication delays may change, as long as they respect the upper bound, N, on the number of nodes, the network capacity, and the above requirements. We allow any churn rate, assuming that joining processors reset their own memory, and by that prevent the introduction of information about packets other than the ones that exist in {p1, p2, ..., pN}, i.e., respecting the assumed bounded packet capacity of the entire network.
When considering system convergence to legal behavior, we measure the number of asynchronous rounds. We define the first asynchronous round in an execution R as the shortest prefix, R′, of R in which node pi sends at least one packet to pj via their communication channel, and pj receives from this channel at least one packet that was sent from pi, where (pi = ps ∧ pj = pr) or (pi = pr ∧ pj = ps). The second asynchronous round, R″, is the first asynchronous round in R's suffix that follows the first asynchronous round, R′, and so on. Namely, R = R′ ∘ R″ ∘ R‴ ∘ ..., where ∘ is the concatenation operator.
2.3 The Task
We define the system's task by a set of executions called legal executions (LE) in which the task's requirements hold. A configuration c is a safe configuration for an algorithm and the task of LE provided that any execution that starts in c is a legal execution, which belongs to LE. An algorithm is self-stabilizing with relation to the task LE when every (unbounded) execution of the algorithm reaches a safe configuration with relation to the algorithm and the task. The proposed self-stabilizing end-to-end communication (S2E2C) algorithm (Chapter 3) provides FIFO and exactly-once delivery guarantees for bounded networks that omit, duplicate, and reorder packets within the channel. Moreover, the algorithm considers arbitrary starting configurations and ensures error-free message delivery. In detail, given a system execution, R, and a pair, ps and pr, of sending and receiving nodes, we associate the message sequence sR = im0, im1, im2, ..., which is fetched by ps, with the message sequence rR = om0, om1, om2, ..., which is delivered by pr. Note that we list messages according to the order in which they are fetched (from the higher-level application) by the Sender; thus, two or more (consecutive or non-consecutive) messages can be identical. The S2E2C task requires that for every legal execution, R ∈ LE, there is an infinite suffix, R′, in which infinitely many messages are delivered, and sR′ = rR′. It should be noted that packets are not necessarily received by the Receiver in their correct order, but eventually it holds that the Receiver delivers the messages to its application layer in the order in which they were fetched by the Sender from its application layer. When demonstrating safety properties, such as the order of message delivery, we consider asynchronous executions that allow progression, which can include omission steps. Note that an adversarial execution can include (selective) packet omission in a way that will prevent packet exchange between the Sender and the Receiver. Thus, when demonstrating liveness properties, such as how long it takes the system to reach a legal execution, we consider nice executions. That is, we say that an asynchronous execution R that allows progression is nice when it does not include any omission step.
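A minimal way to read the task definition operationally: given the sequence of messages fetched by the Sender and the sequence delivered by the Receiver in some suffix of an execution, the task holds when the two sequences coincide. The toy checker below is our own illustration of that comparison, not part of the thesis.

```python
def satisfies_s2e2c_task(fetched_suffix, delivered_suffix):
    """FIFO, exactly-once delivery: the delivered sequence must equal the fetched one,
    with no omission, duplication, or reordering."""
    return list(delivered_suffix) == list(fetched_suffix)

print(satisfies_s2e2c_task(["m0", "m1", "m1", "m2"], ["m0", "m1", "m1", "m2"]))  # True
print(satisfies_s2e2c_task(["m0", "m1", "m2"], ["m0", "m2", "m1"]))              # False: reordered
print(satisfies_s2e2c_task(["m0", "m1"], ["m0", "m0", "m1"]))                    # False: duplicated
```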
Chapter 3
Self-Stabilizing End-to-End Algorithm
In this chapter, we first provide a simple (first attempt) end-to-end communication algorithm that is self-stabilizing and copes with network faults, such as packet omissions, duplications, and reordering. This first attempt algorithm has a large overhead, but it prepares the presentation of our proposal for an efficient solution (Section 3.2) that is based on error correcting codes.
3.1 A First Attempt Solution
We regard two nodes, ps and pr, as sender, and respectively, receiver; see our first attempt sketch of an end-to-end communication protocol in Figure 3.1. The goal is for ps to fetch messages, m, from its application layer, send m over the communication channel, and for pr to deliver m to its application layer exactly once and in the same order by which the
Sender fetched them from its application layer. The Sender, ps, fetches the message m and starts the transmission of (2·capacity + 1) copies of m to pr, and pr acknowledges m upon arrival. These transmissions use distinct labels for each copy, i.e., (2·capacity + 1) labels for each of m's copies. The Sender, ps, does not stop retransmitting m's packets until it receives (capacity + 1) distinctly labeled acknowledgment packets from pr; see details in Figure 3.1.
Let us consider the set of packets X = {⟨ai, ℓ, dat⟩}ℓ∈[1, 2·capacity+1] that pr receives during a legal execution, where ai = 0, as in the example that appears in Figure 3.1. We note that X includes a majority of packets that have the same value of dat, because the channel can add at most capacity packets (due to channel faults, such as message duplication, or transient faults that occurred before the starting configuration), and thus, ps has sent at least (capacity + 1) of these packets, i.e., the majority of the arriving packets to pr have originated from ps, rather than from the communication channel between ps and pr (due
15 Self-Stabilizing End-to-End Communication in Dynamic Networks 9
1 2 Acknowledgment packet set Packet set 3 (ACK_set) (packet_set) 2, 1 ai, lbl, dat ai=0,lbl [1,2 capacity+1] 0, 1, dat 4 h i h i ∈ · h i 2, 2 0, 2, dat 5 h i h i p AltIndex = 0 LastDeliveredIndex = 2 p 6 s r 7 2, capacity + 1 ldai, lbl ldai=2,lbl [1,2 capacity+1] 0, 2 capacity + 1, dat 8 h i h i ∈ · h · i 9 The communication channels do not indicate to their receiving ends whether the 10 The communicationtransmitted packets channels were subjects do not to duplication, indicate omission to their or reordering. receiving The ends algorithm whether the 11 facilitates the correct delivery of m by letting ps send (2 capacity + 1) copies of · transmittedthe packets message werem = subjectdat to pr to, and duplication, requiring pr omissionto receive (2 orcapacity reordering.+ 1) packets, The algorithm 12 h i · where the majority of them are copies of m. Namely, ps maintains an alternating 13facilitates the correct delivery of m by letting ps send (2 capacity + 1) copies of 14 index, AltIndex [0, 2], which is a counter that is incremented· in modulo 3 every the messagetimemm=is fetcheddat ∈ andto byp , that and allow requiring recovery fromp to an receive arbitrary starting(2 capacity configuration.+ 1) packets, 15 r r Moreover, hps transmitsi to pr a set of packets,m ai, lbl, dat ,p where· ai = AltIndexalternating, 16where theand majoritylbl are packet of them labels are that copies distinguish of this. packeth Namely, amongi alls maintains of m’s copies. an The 17index , AltIndexexample illustrated[0, 2] above, which shows is that a counterwhen transmitting that is the incremented packet set 0 modulo, 1, dat , 3 every 0, 2, dat , ∈..., 0, 2 capacity + 1, dat , the alternating index, 0, distinguishes{h betweeni 18time m is fetchedh i andh by· that allow recoveryi} from an arbitrary starting configuration. 19 this transmission set, and its predecessor set, which has the alternating index 2, as wellp as the successorp sets, which has the alternatingai, lbl, index dat 1. This transmissionai = AltIndex ends 20Moreover, s transmits to r a set of packets, , where , and once pr receives a packet set, 0, `, dat ` [1,2hcapacity+1], thati is distinctly labeled by 21lbl are packet` with labels respect that to the distinguish alternating{h thisindexi} packet 0.∈ (Note· among that when all receiving of m’s a copies.packet with The a example 22illustrated abovelabel that shows exists in that the receivedwhen transmitting packet set, the Receiver the packet replaces set the existing0, 1, dat packet, 0, 2, dat , with the arriving one.) During legal executions, the set of received{h packets includesi h i 23... , 0, 2 capacity + 1, dat , the alternating index, 0, distinguishes between this 24 h ·a majority of packets thati} have the same value of dat. When such a majority indeed transmissionexists, set,pr anddelivers itsm predecessor= dat . After set, this decision, whichp hasr updates the alternatingLastDeliveredIndex index 2,0 as well as 25 as the value of the lasth deliveredi alternating index. ← 26the successorThe sets, correct which packet transmission has the alternating depends on the index synchrony1. This of m’s transmission alternating index ends at once pr the sending-side, and LastDeliveredIndex on the Receiver side, as well as the packets 27receives a packet set, 0, `, dat ` [1,2 capacity+1], that is distinctly labeled by ` with 28 that pr accumulates{h in packeti}set∈r. The· Sender repeatedly transmits this packet set respect tountil the alternatingit receives (capacity index+ 1)0. 
distinctly (Note labeled that when acknowledgment receiving packets, a packetldai, lbl with, a label 29 h i that exists infrom the the received Receiver for packet which it set, holds the that Receiverldai = AltIndex replaces. The theReceiver existing acknowledges packet with the 30 the Sender for each incoming packet, ai, lbl, dat , using acknowledgment packet 31arriving one.)ldai, During lbl , where legalldai executions,refers to the value, the sethLastDeliveredIndex of receivedi packets, of the last includes alternating a majority of 32 indexh fori which there was a receiving-side message delivery to the application layer. packets that have the same value of dat. When such a majority indeed exists, pr delivers Thus, with respect to the above example, p does not fetch another application 33m = dat p LastDeliveredIndexs 0 34 layer. After message this before decision, it receivesr atupdates least (capacity + 1) acknowledgment packets;as each the value of h icorresponding to one of the (2 capacity + 1) packets that pr received from←ps. On the 35the last delivered alternating index.· receiving-side, pr delivers the message, m = dat , from one of the (capacity+1) (out of 36The correct(2 packetcapacity transmission+ 1)) distinctly labeled depends packets on thath thei have synchrony identical dat ofandmai’svalues. alternating After index at · 37 this delivery, pLastDeliveredIndexr assigns ai to LastDeliveredIndex, resets its packet set and restarts the sending-side,accumulating and packets, ai , lbl , dat , for whichonLastDeliveredIndex the Receiver side,= asai . well as the packets 38 h 0 0 0i 6 0 39that pr accumulates in packet_setr. The Sender repeatedly transmits this packet set until 40it receives (capacity + 1) distinctly labeled acknowledgment packets, ldai, lbl , from Fig. 1 An end-to-end communication protocol (first attempt) h i 41the Receiver for which it holds that ldai = AltIndex. The Receiver acknowledges the 42Sender for each incoming packet, ai, lbl, dat , using acknowledgment packet ldai, lbl , 43 h i h i 44where ldaihasrefers to follow to the reception value, LastDeliveredIndex of at least (capacity +, 1), of out the of last (2 capacity alternating+ 1), index for distinctly labeled packets, pckt = x + 1, , dat , in the sequence.· This must 45which there was a receiving-side messageh delivery∗ toi the application layer. Thus, with 46respect tobe the due above to m example,’s (sending-side)p does fetch, not fetchps transmission another application of m’s packets, layer messagepckt = before x + 1, , dat , from p tos p , and m’s (receiving-side) delivery. 47 h ∗ (capacityi s+ 1)r 48it receives atThe least above first-attemptacknowledgment solution delivers each packets; message each exactly corresponding once in to one 49of the (2 capacity + 1) packets that pr received from ps. On the receiving-side, pr its· (sending-order) while producing a large communication overhead. The 50delivers the message, m = dat , from one of the1 (capacity + 1) (out of (2 capacity + h i · 511) ) distinctly labeled packets that have identical dat and ai values. After this delivery, 52 53p r assigns ai to LastDeliveredIndex, resets its packet set and restarts accumulating 54packets, ai0, lbl0, dat0 , for which LastDeliveredIndex = ai0. 55 h i 6 56 57 Figure 3.1: An end-to-end communication protocol (first attempt) 58 59 60 61 62 16 63 64 65 to channel faults or transient faults that occurred before the starting configuration). 
The protocol tolerates channel reordering faults, because the Sender fetches one message at a time and does not fetch another before it receives an acknowledgment for the delivery of the current one. The protocol marks each packet with a distinct label in order to allow a packet selection that is based on majority in the presence of duplication faults. The above first-attempt solution delivers each message exactly once in its sending order while producing a large communication overhead. The proposed solution (Section 3.2) uses error correction codes and has a smaller overhead. It fetches a number of messages, m, on the sending-side, concurrently transmits them to the other end after transforming them into packets that are protected by error correction codes, and then delivers them in their sending order without omission or duplication. We explain how to circumvent the difficulty that the communication channel can introduce up to capacity erroneous packets by considering the problem of having up to capacity erroneous bits in any packet.
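To make the first-attempt protocol concrete, the following is a minimal sketch of its sending- and receiving-side logic in Python. The names (CAPACITY, Sender, Receiver) and the driver loop at the end are illustrative assumptions, not the thesis's pseudocode; the sketch only demonstrates the alternating index, the 2·capacity + 1 copies, the majority rule, and the capacity + 1 acknowledgments described above.

```python
from collections import Counter

CAPACITY = 3                       # assumed channel capacity (illustrative)
COPIES = 2 * CAPACITY + 1          # copies of each message the Sender transmits


class Sender:
    def __init__(self):
        self.alt_index = 0                         # AltIndex in [0, 2]

    def packets_for(self, dat):
        """The (2*capacity + 1) distinctly labeled copies of dat."""
        return [(self.alt_index, lbl, dat) for lbl in range(1, COPIES + 1)]

    def ack_round_done(self, acks):
        """Fetch the next message only after (capacity + 1) distinctly labeled
        acknowledgments carry ldai equal to the current AltIndex."""
        matching = {lbl for (ldai, lbl) in acks if ldai == self.alt_index}
        if len(matching) >= CAPACITY + 1:
            self.alt_index = (self.alt_index + 1) % 3   # incremented modulo 3
            return True
        return False


class Receiver:
    def __init__(self):
        self.last_delivered_index = 2              # LastDeliveredIndex
        self.packet_set = {}                       # lbl -> (ai, dat)

    def on_packet(self, ai, lbl, dat):
        """Accumulate packets whose alternating index differs from the last
        delivered one; deliver once a majority of copies agree."""
        if ai != self.last_delivered_index:
            self.packet_set[lbl] = (ai, dat)       # a repeated label replaces the stored copy
        delivered = None
        if self.packet_set:
            (ai_major, dat_major), count = Counter(self.packet_set.values()).most_common(1)[0]
            if count >= CAPACITY + 1:              # majority of the 2*capacity+1 copies
                delivered = dat_major
                self.last_delivered_index = ai_major
                self.packet_set.clear()
        return (self.last_delivered_index, lbl), delivered


# Toy run over a loss-free in-memory "channel".
sender, receiver = Sender(), Receiver()
acks = []
for pkt in sender.packets_for("hello"):
    ack, msg = receiver.on_packet(*pkt)
    acks.append(ack)
    if msg is not None:
        print("delivered:", msg)
print("sender may fetch the next message:", sender.ack_round_done(acks))
```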
3.1.1 Error correction codes for payload sequences
Error correction codes [89] can mitigate bit-flip errors in (binary) words, where such words can represent the payload of a single data packet or, as we show here, can be used to help recover wrong words in a sequence of them. These methods add redundant information when encoding data, so that after an error occurrence, the decoding procedure is able to recover the originally encoded data without errors. Namely, an error correction code ec() encodes a payload w (a binary word) of length wl into a codeword c = ec(w) of length cl, where cl > wl. The payload w can later be restored from c′ as long as the Hamming distance between c′ and c is less than a known error threshold, t_ecc, where the Hamming distance between c′ and c is the smallest number of bits that one has to flip in c in order to get c′. Existing methods for error correction codes can also be used for a sequence of packets and their payloads, see Figure 3.2. These sequences are encoded on the sender-side and sent over a bounded capacity, omitting, duplicating and non-FIFO channel, before being decoded on the receiver-side. On that side, the originally encoded payload sequence is decoded, as long as the error threshold is not smaller than the channel capacity, i.e., t_ecc ≥ capacity. This method replaces the issue of having up to capacity erroneous packets with the problem of having up to capacity erroneous bits in any packet. This problem is solved by using error correction codes to mask the erroneous bits. The proposed solution allows correct message delivery even though up to capacity of the packets
are erroneous, i.e., packets that appeared in the channel (due to transient faults that occurred before the starting configuration) rather than being added to the channel by the Sender, or due to channel faults during the system execution.
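As a toy illustration of the error threshold t_ecc discussed above, the following sketch decodes by choosing the nearest codeword of a small codebook; a codeword can be recovered from a corrupted copy as long as fewer than half of the codebook's minimum pairwise Hamming distance of its bits were flipped. The codebook itself is an arbitrary assumption for illustration, not a code used in the thesis.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary words."""
    return sum(x != y for x, y in zip(a, b))

# Assumed toy codebook: two payloads w encoded as 6-bit codewords c = ec(w).
# Its minimum pairwise distance is 6, so up to t_ecc = 2 flipped bits are correctable.
CODEBOOK = {"0": "000000", "1": "111111"}

def decode(c_received: str) -> str:
    """Return the payload w whose codeword ec(w) is closest to the received word."""
    return min(CODEBOOK, key=lambda w: hamming(CODEBOOK[w], c_received))

assert decode("010010") == "0"   # two bit flips, still recovered
assert decode("101111") == "1"
```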
3.2 Self-Stabilizing End-to-End Algorithm (S2E2C)
We propose an efficient S2E2C algorithm that fetches a number of messages, m, and encodes them according to the method presented in Figure 3.2. The Sender then concurrently transmits m's encoded packets to the receiving end until it can decode and acknowledge m. Recall that the proposed method for error correction can tolerate communication channels that, while in transit, omit, duplicate and reorder m's packets, as well as add up to capacity packets to the channel (due to transient faults that have occurred before the starting configuration rather than packets that the Sender adds to the channel during the system execution). We show how the Sender and the Receiver can use the proposed error correction method for transmitting and acknowledging m's packets. The pseudocode of the Sender (Algorithm 5) and of the Receiver (Algorithm 6) is presented in Appendix A. Reliable, ordered, and error-checked protocols in the transport layer, such as TCP/IP, often consider the delivery of a stream of octets between two network ends. The algorithm presented in Figure 3.2 considers a transport layer protocol that repeatedly fetches another part of the stream. Upon each such fetch, the protocol breaks that part of the stream into m sub-parts, and the protocol refers to these m sub-parts as the application layer messages. Note that the size of each such message can be determined by the (maximal) payload size of packets that the Sender transmits to the Receiver, because the payload of each transmitted packet needs to accommodate one of the n-bit words that are the result of transposing the m stream sub-parts. The S2E2C algorithm extends the first-attempt end-to-end communication protocol (Figure 3.1), i.e., the Sender, p_s, transmits the packets ⟨ai, lbl, dat⟩, and the Receiver, p_r, acknowledges using the ⟨ai, lbl⟩ packets, where ai ∈ [0, 2] is the state's alternating index, and lbl are packet labels that are distinct among all of the packets that are associated with messages m = ⟨dat⟩. Moreover, it uses the notation of the proposed error correction method (Figure 3.2), i.e., the Sender fetches batches of pl application layer messages of length ml bits that are coded by n-bit payloads that tolerate up to capacity erroneous bits. The Sender, p_s, fetches pl (application layer) messages, m = ⟨dat⟩, encodes them into n (distinctly labeled) packets, ⟨ai = AltIndex, lbl, dat⟩, according to the proposed error correction method (Figure 3.2), and repeatedly transmits these n (distinctly labeled) packets to p_r until p_s receives from p_r (at least) (capacity + 1)
[Figure 3.2 illustration: pl messages of ml bits each are encoded by the error correcting code into pl codewords of n bits each (n > ml); the resulting bit matrix is transposed, so the i-th packet (labeled 1 to n and carrying AltIndex as ai) consists of the i-th bit of every codeword, yielding n packets of pl bits each.]
The (sending-side) encoder considers a batch of (same length) messages as a (bit) matrix, where each message (bit representation) is a matrix row. It transposes this matrix by sending the matrix columns as encoded data packets. Namely, the Sender fetches [m_j]_{j∈[1,pl]}, a batch of pl messages from the application layer, each of length ml bits, and calls the function Encode([m_j]_{j∈[1,pl]}). This function is based on an error correction code, ecc, that for an ml-bit word, m_j, codes an n-bit word, cm_j, such that cm_j can bear up to capacity erroneous bits, i.e., ecc's error threshold, t_ecc, is capacity. The function then takes these pl (length n bits) words, cm_j, and returns n (length pl bits) packet payloads, [pyld_k]_{k∈[1,n]}, that are the columns of a bit matrix in which the j-th row is cm_j's bit representation; see the image above for illustration. The (receiver-side) uses the function Decode([pyld′_k]_{k∈[1,n]}), which requires n packet payloads, and assumes that at most capacity of them are erroneous, i.e., differ from [pyld_k]_{k∈[1,n]} because they are packets that appeared in the channel (due to transient faults that occurred before the starting configuration) rather than being added to the channel by the Sender, or because of channel faults during the system execution. This function first transposes the arrived packet payloads, [pyld′_k]_{k∈[1,n]} (in a similar manner to Encode()), before using ecc for decoding [m_j]_{j∈[1,pl]} and delivering the original messages to the Receiver's application layer. Namely, when the Receiver accumulates n distinctly labeled packets, capacity of the packets may be wrong or unrelated. However, since the i-th packet, out of the n distinctly labeled packets, encodes the i-th bits of all the pl encoded messages, if the i-th packet is wrong, the decoder can still decode the data of the original pl messages, each of length ml < n. The i-th bit in each encoded message may be wrong; in fact, capacity of the packets may be wrong, yielding capacity bits that may be wrong in each encoded message. However, due to the error correction, all the original pl messages of length ml can be recovered, so the Receiver can deliver the correct pl messages in the correct order. Note that in this case, although the channel may reorder the packets, the labels maintain the sending order, because the i-th packet is labeled with i. In this proposed solution, the labels also facilitate duplication fault-tolerance, because the Receiver always holds at most one packet with label i, i.e., the latest.
Figure 3.2: Packet formation from messages
(distinctly labeled) acknowledgment packets ⟨ldai′, lbl′⟩, for which, after convergence, ldai′ = AltIndex. The Receiver repeatedly transmits the acknowledgment packets
⟨ldai′, lbl′⟩, which acknowledge the messages in the previous batch that it had delivered to its application layer and that had the alternating index ai = LastDeliveredIndex. Note that the Receiver repeatedly sends (capacity + 1) acknowledgment packets as a response to the n received packets, rather than to a particular packet that has arrived. Namely, p_r accumulates arriving packets, ⟨ai, lbl, dat⟩, whose alternating index, ai, is different from the last delivered one, LastDeliveredIndex. Moreover, once p_r has n (distinctly labeled) packets, {⟨ai, ℓ, dat⟩ : ℓ ∈ [1, n], ai ≠ LastDeliveredIndex}, the Receiver p_r updates LastDeliveredIndex according to ai, as well as uses the proposed error correction method (Figure 3.2) for decoding m before delivering it.
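The transposition inside Encode()/Decode() can be sketched as follows. The per-message error correction code is abstracted as a pair of callables (ecc_encode, ecc_decode), since the concrete code is left unspecified in the text; the 3x bit-repetition code used at the end, the list-of-bits representation, and all names are illustrative assumptions, not the thesis's implementation.

```python
from typing import Callable, List

Bits = List[int]

def encode(messages: List[Bits], ecc_encode: Callable[[Bits], Bits]) -> List[Bits]:
    """Encode pl messages of ml bits each into n packet payloads of pl bits each.
    Row j of the matrix is the n-bit codeword of message j; the returned payloads
    are the matrix columns, so packet i carries the i-th bit of every codeword."""
    codewords = [ecc_encode(m) for m in messages]            # pl rows, n columns
    n = len(codewords[0])
    return [[cw[i] for cw in codewords] for i in range(n)]   # n columns of pl bits

def decode(payloads: List[Bits], ecc_decode: Callable[[Bits], Bits]) -> List[Bits]:
    """Transpose the n received payloads back into pl candidate codewords and let
    the error correction code repair them: if at most `capacity` payloads are
    erroneous, each codeword has at most `capacity` flipped bits, which ecc_decode
    is assumed to tolerate (its error threshold t_ecc >= capacity)."""
    pl = len(payloads[0])
    codewords = [[payload[j] for payload in payloads] for j in range(pl)]
    return [ecc_decode(cw) for cw in codewords]

# Toy usage with a 3x bit-repetition code (one flipped bit per codeword is tolerated,
# so capacity = 1 in this toy setting).
def rep_encode(m: Bits) -> Bits:
    return [b for b in m for _ in range(3)]

def rep_decode(c: Bits) -> Bits:
    return [int(sum(c[3 * i:3 * i + 3]) >= 2) for i in range(len(c) // 3)]

msgs = [[1, 0, 1], [0, 0, 1]]               # pl = 2 messages of ml = 3 bits
packets = encode(msgs, rep_encode)          # n = 9 packets of 2 bits each
packets[4] = [1 - b for b in packets[4]]    # corrupt one whole packet
assert decode(packets, rep_decode) == msgs  # the original batch is still recovered
```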
Note that p_s transmits to p_r a set of n of m's packets (distinctly labeled with respect to a single alternating index), which the channel can omit, duplicate and reorder. Thus, once p_r receives n packets, p_r can use the proposed error correction method (Figure 3.2) as long as their alternating index is different from the last delivered one, LastDeliveredIndex, because at least (n − capacity) of these packets were sent by p_s. Similarly, p_r transmits to p_s a set of (capacity + 1) of m's acknowledgment packets (distinctly labeled with respect to a single alternating index). Thus, once (capacity + 1) of m's acknowledgment packets (with ldai matching AltIndex) arrive at the sending-side, p_s can conclude that at least one of them was transmitted by p_r, as long as their alternating index, ldai, is the same as the one used for m, AltIndex. The correctness arguments show that eventually we reach an execution in which the Sender fetches a new message batch infinitely often, and the Receiver delivers the messages fetched by the Sender before it fetches the next message batch. Thus, every batch of pl fetched messages is delivered exactly once, because after delivery the Receiver resets its packet set and changes its LastDeliveredIndex to be equal to the alternating index of the Sender. The Receiver stops accumulating packets from the Sender (whose alternating index is LastDeliveredIndex) until the Sender fetches the next message batch and starts sending packets with a new alternating index. Note that the Sender only fetches new messages after it gets (capacity + 1) distinctly labeled acknowledgments, ⟨ldai, lbl⟩ (whose alternating index, ldai, equals p_s's AltIndex). When the Receiver holds n (distinctly labeled) packets out of which at most capacity are erroneous, it can convert the packets back to the original messages (see Figure 3.2). The correctness of Algorithms 5 and 6 is given in Appendix A.2.
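A minimal sketch of the receiving-side rule just described, under the same illustrative assumptions as the earlier sketches (the values of n and capacity, the label range 1..n, and the decode() callable are placeholders, not fixed by the text):

```python
N = 9                 # assumed number of distinctly labeled packets per batch
CAPACITY = 1          # assumed channel capacity

class S2E2CReceiver:
    def __init__(self, decode):
        self.decode = decode                 # error-correcting decoder (Figure 3.2)
        self.last_delivered_index = 0        # LastDeliveredIndex
        self.packet_set = {}                 # lbl -> payload for the current batch

    def on_packet(self, ai, lbl, payload):
        """Accumulate packets of a new batch; once n distinct labels are held,
        decode and deliver the batch, then acknowledge with the new index."""
        delivered = None
        if ai != self.last_delivered_index:
            self.packet_set[lbl] = payload
            if len(self.packet_set) == N:
                payloads = [self.packet_set[l] for l in sorted(self.packet_set)]
                delivered = self.decode(payloads)        # pl messages, in order
                self.last_delivered_index = ai
                self.packet_set.clear()
        # (capacity + 1) distinctly labeled acknowledgments are (re)sent per batch.
        acks = [(self.last_delivered_index, l) for l in range(1, CAPACITY + 2)]
        return acks, delivered
```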
Part II
Replication Aspects in MapReduce
Chapter 4
Intractability of Mapping Schemas
In the second and the third parts of the thesis, we will investigate impacts of replications in designing models and algorithms for MapReduce. A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this output. Reducers have a capacity, which limits the sets of inputs that they can be assigned. However, individual inputs may vary in terms of size. We consider mapping schemas where input sizes are part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is the communication cost — the total amount of data that is required to move from the map phase to the reduce phase. The communication cost can be optimized by minimizing the number of copies of inputs sent to the reducers. In this chapter, we consider a family of problems where it is required that each input meets with each other input in at least one reducer. In this chapter, we define and prove that the all-to-all mapping schema problem is NP-hard for z > 2 identical capacity reducers (in Sections 4.2.1 and 4.3.1). Also, we define the X-meets-Y mapping schema problem and prove that the same problem is NP-hard for z > 1 identical capacity reducers (in Sections 4.2.2 and 4.3.2).
4.1 Preliminaries and Motivating Examples
Communication cost. An important performance measure for MapReduce algorithms is the amount of data transferred from the mappers (the processes that implement the map function) to the reducers (the processes that implement the reduce function). This is called the communication cost. The minimum communication cost is, of course, the size of the desired inputs that provide the final output, since we need to transfer all these
inputs from the mappers to the reducers at least once. However, we may need to transfer the same input to several reducers, thus increasing the communication cost. Depending on various factors of our setting, each reducer may process a larger or smaller amount of data. The amount of data each reducer processes, however, affects the wall-clock time of our algorithms and the degree of parallelization. If we send all data to one reducer, then we have low communication (equal to the size of the data) but a low degree of parallelization, and thus the wall-clock time increases. Reducer capacity. The maximum amount of data a reducer can hold is a constraint when we build our algorithms. We define the reducer capacity to be the upper bound on the sum of the sizes of the values that are assigned to the reducer. For example, we may choose the reducer capacity to be the size of the main memory of the processor on which the reducer runs, or we may arbitrarily set a low reducer capacity if we want high parallelization. We always assume in this chapter that all the reducers have an identical capacity, denoted by q. Examples.
Example 4.1 Computing common friends. An input is a list of friends. We have such lists for m persons. Each pair of lists of friends corresponds to one output, which will show us the common friends of the respective persons. Thus, it is mandatory that the lists of friends of every two persons are compared. Specifically, the problem is: a list F = {f_1, f_2, ..., f_m} of m lists of friends is given, and each pair of elements ⟨f_i, f_j⟩ corresponds to one output, the common friends of persons i and j; see Figure 4.1.
[Figure 4.1 illustration: m lists of friends f_1, ..., f_m; each pair ⟨f_i, f_j⟩ yields one output, the common friends of persons i and j.] [Figure 4.2 illustration: relations X(A, B) and Y(B, C) and their join; the value b_1 of the joining attribute B appears in many tuples of both relations.]
Figure 4.1: Computing common friends example. Figure 4.2: Skew join example for a heavy hitter, b_1.
Example 4.2 Skew join of two relations X(A, B) and Y(B, C). The join of relations X(A, B) and Y(B, C), where the joining attribute is B, provides output tuples ⟨a, b, c⟩, where (a, b) is in X and (b, c) is in Y. One or both of the relations X and Y may have a large number of tuples with an identical B-value. A value of the joining attribute B that occurs many times is known as a heavy hitter. In skew join of X(A, B) and Y(B, C), all
the tuples of both the relations with an identical heavy hitter should appear together to provide the output tuples.
In Figure 4.2, b_1 is considered a heavy hitter; hence, it is required that all the tuples of X(A, B) and Y(B, C) with the heavy hitter, B = b_1, appear together to provide the desired output tuples, ⟨a, b_1, c⟩ (a ∈ A, b_1 ∈ B, c ∈ C), each of which depends on exactly two inputs.
4.2 Mapping Schema and Tradeoffs
Our system setting is an extension of the standard system setting [15] for MapReduce algorithms, where we consider, for the first time, inputs of different sizes. In this section, we provide formal definitions and some examples to show the tradeoff between communication cost and degree of parallelization.
Mapping Schema. A mapping schema is an assignment of the set of inputs to some given reducers so that the following two constraints are satisfied:
• A reducer is assigned inputs whose sum of sizes is less than or equal to the reducer capacity q.
• For each output, we must assign its corresponding inputs to at least one reducer in common.
Optimal Mapping Schema. A mapping schema is optimal when the communication cost is minimum. The number of reducers we use is often minimal for an optimal mapping schema, but this may not always be the case. We offer insight about the communication cost and the number of reducers used in Examples 4.3 and 4.4.
Tradeoffs. The following tradeoffs appear in MapReduce algorithms and in particular in our setting:
• A tradeoff between the reducer capacity and the number of reducers. For example, a large reducer capacity allows the use of a smaller number of reducers.
• A tradeoff between the reducer capacity and the parallelism. For example, if we want to achieve a high degree of parallelism, we set a low reducer capacity.
• A tradeoff between the reducer capacity and the communication cost. For example, in case the reducer capacity is equal to the total size of the data, then we can use one reducer and have minimum communication (of course, this comes at the expense of parallelization).
In the subsequent subsections, we present two mapping schema problems, namely the A2A mapping schema problem and the X2Y mapping schema problem. In addition, the reader will also see the impact of the above-mentioned tradeoffs in the mapping schema problems and ways for obtaining an optimal mapping schema.
4.2.1 The All-to-All Mapping Schema Problem
An instance of the A2A mapping schema problem consists of a list of m inputs whose input size list is W = {w_1, w_2, ..., w_m} and a set of z identical reducers of capacity q. A solution to the A2A mapping schema problem assigns every pair of inputs to at least one reducer in common, without exceeding q at any reducer.
w1 = w2 = w3 = 0.20q, w4 = w5 = 0.19q, w6 = w7 = 0.18q
The first way to assign inputs (non-optimum communication cost), using six reducers:
{w1, w2, w3, w4}, {w3, w4, w5, w6}, {w1, w2, w5, w6}, {w3, w4, w7}, {w1, w2, w7}, {w5, w6, w7}
The second way to assign inputs (optimum communication cost), using three reducers:
{w1, w2, w3, w4, w7}, {w1, w2, w5, w6, w7}, {w3, w4, w5, w6, w7}
Figure 4.3: An example to the A2A mapping schema problem.
Example 4.3 We are given a list of seven inputs I = {i_1, i_2, ..., i_7} whose size list is W = {0.20q, 0.20q, 0.20q, 0.19q, 0.19q, 0.18q, 0.18q} and reducers of capacity q. In Figure 4.3, we show two different ways to assign the inputs to reducers. The best we can do to minimize the communication cost is to use three reducers. However, there is less parallelism at the reduce phase as compared to when we use six reducers. Observe that when we use six reducers, all reducers have a lighter load, since each reducer holds inputs of total size less than 0.8q. The communication cost for the second case (3 reducers) is approximately 3q, whereas for the first case (6 reducers) it is approximately 4.2q. Thus, in tradeoff, in the 3-reducer case we have low communication cost but also a lower degree of parallelization, whereas in the 6-reducer case we have high parallelization at the expense of the communication cost.
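A short script to sanity-check Example 4.3: it verifies that an assignment is a valid mapping schema (capacity respected and every pair of inputs meeting in some reducer) and totals its communication cost. The two assignments are transcribed from Figure 4.3; exact totals depend on the rounded sizes, so they only approximate the figures quoted above.

```python
from itertools import combinations

q = 1.0
size = {1: 0.20, 2: 0.20, 3: 0.20, 4: 0.19, 5: 0.19, 6: 0.18, 7: 0.18}  # w_i in units of q

def is_mapping_schema(reducers, size, q):
    """Every reducer within capacity, and every pair of inputs shares a reducer."""
    if any(sum(size[i] for i in r) > q for r in reducers):
        return False
    covered = {pair for r in reducers for pair in combinations(sorted(r), 2)}
    return all(pair in covered for pair in combinations(sorted(size), 2))

def communication_cost(reducers, size):
    return sum(size[i] for r in reducers for i in r)

six_reducers = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6}, {3, 4, 7}, {1, 2, 7}, {5, 6, 7}]
three_reducers = [{1, 2, 3, 4, 7}, {1, 2, 5, 6, 7}, {3, 4, 5, 6, 7}]

for name, schema in [("6 reducers", six_reducers), ("3 reducers", three_reducers)]:
    assert is_mapping_schema(schema, size, q)
    print(name, round(communication_cost(schema, size), 2), "x q")
# prints roughly 4.0 q for the six-reducer assignment and 2.9 q for the three-reducer one
```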
[Figure 4.4 illustration: inputs of list X have sizes w1 = w2 = 0.25q, w3 = w4 = 0.24q, w5 = w6 = 0.23q, w7 = w8 = 0.22q, w9 = w10 = 0.21q, w11 = w12 = 0.20q; inputs of list Y have sizes w′1 = w′2 = 0.25q, w′3 = w′4 = 0.24q. The first way assigns the inputs using 12 reducers; the second way uses 16 reducers.]
Figure 4.4: An example to the X2Y mapping schema problem.
4.2.2 The X2Y Mapping Schema Problem
An instance of the X2Y mapping schema problem consists of two disjoint lists X and Y and a set of identical reducers of capacity q. The inputs of the list X are of sizes w_1, w_2, ..., w_m, and the inputs of the list Y are of sizes w′_1, w′_2, ..., w′_n. A solution to the X2Y mapping schema problem assigns every two inputs, the first from one list, X, and the second from the other list, Y, to at least one reducer in common, without exceeding q at any reducer.
Example 4.4 We are given two lists, X of 12 inputs and Y of 4 inputs (see Figure 4.4), and reducers of capacity q. We show that we can assign each input of the list X with each input of the list Y in two ways. In order to minimize the communication cost, the best way is to use 12 reducers. Note that we cannot obtain a solution for the given inputs using fewer than 12 reducers. However, the use of 12 reducers results in less parallelism at the reduce phase as compared to when we use 16 reducers.
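Analogously, a quick validity check for the X2Y problem only needs the cross-list pairs. The sketch below is generic; the sizes in the usage lines are arbitrary illustrative assumptions, not the assignment of Example 4.4.

```python
from itertools import product

def is_x2y_schema(reducers, size_x, size_y, q):
    """Capacity respected, and every (x, y) pair with x from X and y from Y
    meets in at least one reducer. Inputs are named so that X and Y stay disjoint."""
    for r in reducers:
        if sum(size_x.get(i, 0) + size_y.get(i, 0) for i in r) > q:
            return False
    return all(any(x in r and y in r for r in reducers)
               for x, y in product(size_x, size_y))

# Toy usage (assumed sizes): X = {x1, x2}, Y = {y1}, capacity q = 1.
size_x, size_y = {"x1": 0.4, "x2": 0.4}, {"y1": 0.3}
assert is_x2y_schema([{"x1", "y1"}, {"x2", "y1"}], size_x, size_y, q=1)
assert not is_x2y_schema([{"x1", "y1"}], size_x, size_y, q=1)   # (x2, y1) never meets
```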
4.3 Intractability of Finding a Mapping Schema
In this section, we show that the A2A and the X2Y mapping schema problems do not possess polynomial-time solutions. In other words, we show that assigning every two required inputs to a minimum number of identical-capacity reducers, so as to obtain solutions to the A2A and the X2Y mapping schema problems, cannot be achieved in polynomial time.
4.3.1 NP-hardness of the A2A Mapping Schema Problem
A list of inputs I = {i_1, i_2, ..., i_m} whose input size list is W = {w_1, w_2, ..., w_m}, and a set of identical reducers R = {r_1, r_2, ..., r_z}, are an input instance to the A2A mapping schema problem. The A2A mapping schema problem is a decision problem that asks whether or not there exists a mapping schema for the given input instance such that every input, i_x, is assigned with every other input, i_y, to at least one reducer in common. An answer to the A2A mapping schema problem is "yes" if for each pair of inputs ⟨i_x, i_y⟩ there is at least one reducer that holds both of them. In this section, we prove that the A2A mapping schema problem is NP-hard in the case of z > 2 identical reducers. In addition, we prove that the A2A mapping schema problem has a polynomial solution for one and two reducers. If there is only one reducer, then the answer is "yes" if and only if the sum of the input sizes, Σ_{i=1}^{m} w_i, is at most q. On the other hand, if q < Σ_{i=1}^{m} w_i, then the answer is "no." In the case of two reducers, if a single reducer is not able to accommodate all the given inputs, then there must be at least one input that is assigned to only one of the reducers, and hence,
this input is not paired with all the other inputs. In that case, the answer is "no." Therefore, we achieve a polynomial solution to the A2A mapping schema problem for one and two identical-capacity reducers. We now consider the case of z > 2 and prove that the A2A mapping schema problem for z > 2 reducers is at least as hard as the partition problem.
Theorem 4.5 The problem of finding whether a mapping schema of m inputs of different input sizes exists, where every two inputs are assigned to at least one of z ≥ 3 identical-capacity reducers, is NP-hard.
The proof appears in Appendix B.
4.3.2 NP-hardness of the X2Y Mapping Schema Problem
Two lists of inputs, X = {i_1, i_2, ..., i_m} whose input size list is W_x = {w_1, w_2, ..., w_m} and Y = {i′_1, i′_2, ..., i′_n} whose input size list is W_y = {w′_1, w′_2, ..., w′_n}, and a set of identical reducers R = {r_1, r_2, ..., r_z}, are an input instance to the X2Y mapping schema problem. The X2Y mapping schema problem is a decision problem that asks whether or not there exists a mapping schema for the given input instance such that each input of the list X is assigned with each input of the list Y to at least one reducer in common. An answer to the X2Y mapping schema problem is "yes" if for each pair of inputs, the first from X and the second from Y, there is at least one reducer that has both of those inputs. The X2Y mapping schema problem has a polynomial solution for the case of a single reducer. If there is only one reducer, then the answer is "yes" if and only if the sum of the input sizes, Σ_{i=1}^{m} w_i + Σ_{i=1}^{n} w′_i, is at most q. On the other hand, if q < Σ_{i=1}^{m} w_i + Σ_{i=1}^{n} w′_i, then the answer is "no." Next, we prove that the X2Y mapping schema problem is an NP-hard problem for z > 1 identical reducers.
Theorem 4.6 The problem of finding whether a mapping schema of m and n inputs of different input sizes, belonging to list X and list Y, respectively, exists, where every two inputs, the first from X and the second from Y, are assigned to at least one of z ≥ 2 identical-capacity reducers, is NP-hard.
The proof appears in Appendix B.
Chapter 5
Approximation Algorithms for the Mapping Schema Problems
In this chapter, we will provide approximation algorithms for the A2A and the X2Y mapping schema problems.
5.1 Preliminary Results
Since the A2A mapping schema problem is NP-hard, we start by looking at special cases and developing approximation algorithms to solve it. We propose several approximation algorithms for the A2A mapping schema problem that are based on bin-packing algorithms, selection of a prime number p, and division of inputs into two sets based on their sizes. Each algorithm takes the number of inputs, their sizes, and the reducer capacity (see Table 5.2). The approximation algorithms have two cases depending on the sizes of the inputs, as follows:
1. Input sizes are upper bounded by q/2.
2. One input is of size, say w_i, greater than q/2 but less than q, and all the other inputs have size less than or equal to q − w_i. In this case most of the communication cost comes from having to pair the large input with every other input. Of course, if the total size of the two largest inputs is greater than the given reducer capacity q, then there is no solution to the A2A mapping schema problem, because these two inputs cannot be assigned to a single reducer in common.
Parameters for analysis. We analyze our approximation algorithms on the communication cost, which is the sum of all the bits that are required, according to the mapping schema, to be transferred from the map phase to the reduce phase. Table 5.1 summarizes all the results in this chapter. Before describing the algorithms,
we look at lower bounds for the above parameters as they are expressed in terms of the reducer capacity q and the sum of sizes of all inputs s.

Cases | Theorem | Communication cost | Approximation ratio
The lower bounds for the A2A mapping schema problem
Different-sized inputs | 5.1 | s²/q |
Equal-sized inputs | 5.4 | m(m−1)/(q−1) |
The lower bound for the X2Y mapping schema problem
Different-sized inputs | 5.20 | 2·sum_x·sum_y/q |
Optimal algorithms for the A2A mapping schema problem (equal-sized inputs)
Algorithm for reducer capacity q = 2 | 5.9 | m(m−1) | optimal
Algorithm for reducer capacity q = 3 | 5.9 | m(m−1)/2 | optimal
The AU method: when q is a prime number | 5.9 | m(m−1)/(q−1) | optimal
Non-optimal algorithms for the A2A mapping schema problem and their upper bounds
Bin-packing-based algorithm, not including an input of size > q/2 | 5.3 | 4s²/q | 1/4
Algorithm 7 | 5.12 | (q/2k)·(sk/(q(k−1)))·(sk/(q(k−1)) − 1) | 1/(k−1)
Algorithm 8: the first extension of the AU method | 5.14 | qp(p+1) + z′ | q/(q+1)
Algorithm 9: the second extension of the AU method | 5.18 | q²·(q(q+1))^(l−1) | (q^l − 1)/(q(q−1)(q+1)^(l−1))
Bin-packing-based algorithm considering an input of size > q/2 | 5.19 | (m−1)·4s²/(mq) + 2s²/q |
A non-optimal algorithm for the X2Y mapping schema problem and its upper bounds
Bin-packing-based algorithm, q = 2b | 5.21 | 4·sum_x·sum_y/b | 1/4
Approximation ratio: the ratio between the optimal communication cost and the communication cost obtained from an algorithm. Notations: s: sum of all the input sizes; q: the reducer capacity; m: the number of inputs; sum_x: sum of input sizes of the list X; sum_y: sum of input sizes of the list Y; p: the nearest prime number to q; l > 2; k > 1.
Table 5.1: The bounds for algorithms for the A2A and the X2Y mapping schema problems.
Theorem 5.1 (Lower bounds on the communication cost and number of reducers) For a list of inputs and a given reducer capacity q, the communication cost and the number of reducers, for the A2A mapping schema problem, are at least s²/q and s²/q², respectively, where s is the sum of all the input sizes.
The proof of the theorem is given in Appendix C.1.
5.1.1 Bin-packing-based approximation
Our general strategy for building approximation algorithms is as follows: we use a known bin-packing algorithm to place the given m inputs into bins of size q/k. Assume that we need
Algorithms | Inputs
Non-optimal algorithms for the A2A mapping schema problem
Bin-packing-based algorithm | Any number of inputs of any size
Algorithm 7 | Any number of inputs of size at most q/k, k > 3
Algorithm 8: the first extension of the AU method | p² + p·l + l inputs, p + l = q, l > 2
Algorithm 9: the second extension of the AU method | q^l inputs, l > 2 and q a prime number
A non-optimal algorithm for the X2Y mapping schema problem
Bin-packing-based algorithm, q > 2 | Any number of inputs of any size
Notations: w_i and w_j: the two largest inputs of a list; p: the nearest prime number to q; w_k: the largest input of a list X; w′_k: the largest input of a list Y.
Table 5.2: Reducer capacity and input constraints for different algorithms for the mapping schema problems.
x bins to place the m inputs. Now, each of these bins is considered as a single input of size q/k for our problem of finding an optimal mapping schema. Of course, the assumption is that all inputs are of size at most q/k. First-Fit Decreasing (FFD) and Best-Fit Decreasing (BFD) [33] are the most notable bin-packing algorithms. The FFD or BFD bin-packing algorithm ensures that all the bins (except at most one bin) are at least half-full. There also exists a pseudo-polynomial bin-packing algorithm, suggested by Karger and Scott [71], that can place the m inputs in as few bins as possible of a certain size. As an example, let us discuss in more detail the case k = 2. In this case, since the reducer capacity is q, any two bins can be assigned to a single reducer. Hence, the approximation algorithm uses at most x(x−1)/2 reducers, where x is the number of bins; see Figure 5.1 for an example. For this strategy, a lower bound on the communication cost depends also on k, as follows:
w1 = w2 = w3 = 0.20q, w4 = w5 = 0.19q, w6 = w7 = 0.18q
Four bins, each of size q/2: {w1, w2}, {w3, w4}, {w5, w6}, {w7}
Six reducers: {w1, w2 | w3, w4}, {w1, w2 | w5, w6}, {w1, w2 | w7}, {w3, w4 | w5, w6}, {w3, w4 | w7}, {w5, w6 | w7}
Figure 5.1: Bin-packing-based approximation algorithm.
1Note that this is not an optimal mapping schema for the given inputs.
Theorem 5.2 (Lower bound on the communication cost) Let q > 1 be the reducer capacity, and let q/k, k > 1, be the bin size. Let the sum of the given input sizes be s. The communication cost, for the A2A mapping schema problem, is at least s·(sk/q − 1)/(k − 1).
The proof of the theorem is given in Appendix C.1. The communication cost in the above theorem, as expected, is larger than the one in Theorem 5.1, where no restriction to a specific strategy was taken into account. Example for k = 2. Let us apply our strategy to the case where k = 2, i.e., we have the algorithm: (i) we do bin-packing to put the inputs in bins of size q/2; and (ii) we provide a mapping schema for assigning each pair of bins to at least one reducer. Such a schema is easy and has been discussed in the literature (e.g., [105]). FFD and BFD bin-packing algorithms provide an 11/9 · OPT approximation ratio [70], i.e., if an optimal bin-packing algorithm needs OPT bins to place m inputs in bins of a given size q/2, then FFD and BFD bin-packing algorithms always use at most 11/9 · OPT bins of an identical size (to place the given m inputs). Since we require at most x·(x−1)/2 reducers for a solution to the A2A mapping schema problem, the algorithm requires at most (11/9 · OPT)²/2 reducers. Note that, here in this case, OPT does not indicate the optimal number of reducers to assign m inputs that satisfy the A2A mapping schema problem; OPT indicates the optimal number of bins of size q/2 that are required to place the m inputs. The following theorem gives the upper bounds that this approximation algorithm achieves on the communication cost and the number of reducers.
Theorem 5.3 (Upper bounds on the communication cost and number of reducers for k = 2 using bin-packing) The above algorithm, using a bin size b = q/2 where q is the reducer capacity, achieves the following upper bounds: the number of reducers and the communication cost, for the A2A mapping schema problem, are at most 8s²/q² and at most 4s²/q, respectively, where s is the sum of all the input sizes. The proof of the theorem is given in Appendix C.1.
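A minimal sketch of the k = 2 strategy just described: First-Fit Decreasing packs the inputs into bins of size q/2, and every pair of bins then goes to one reducer (any two half-size bins fit within capacity q, and inputs packed into the same bin also meet, since each bin appears in some reducer). This is only an illustration of the strategy, not the pseudocode from the appendices; on Example 4.3's sizes it reproduces the four bins and six reducers of Figure 5.1.

```python
from itertools import combinations

def first_fit_decreasing(sizes, bin_capacity):
    """Pack items (id -> size) into bins using First-Fit Decreasing."""
    bins, loads = [], []
    for item, w in sorted(sizes.items(), key=lambda kv: -kv[1]):
        for b, load in enumerate(loads):
            if load + w <= bin_capacity:
                bins[b].append(item)
                loads[b] += w
                break
        else:
            bins.append([item])
            loads.append(w)
    return bins

def a2a_via_bins(sizes, q):
    """Bin-packing-based A2A schema for inputs of size <= q/2:
    pack into bins of size q/2, then assign every pair of bins to a reducer."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [set(bins[0])]
    return [set(b1) | set(b2) for b1, b2 in combinations(bins, 2)]

# Example 4.3's inputs (sizes in units of q = 1.0): yields 4 bins and 6 reducers.
sizes = {1: 0.20, 2: 0.20, 3: 0.20, 4: 0.19, 5: 0.19, 6: 0.18, 7: 0.18}
print(a2a_via_bins(sizes, 1.0))
```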
5.2 Optimal Algorithms for Equal-Sized Inputs
As we explained, looking at inputs of the same size makes sense because we imagine the inputs being bin-packed into bins of size q/k, for k ≥ 2, and, once this is done, we can treat the bins themselves as items of unit size to be sent to the reducers. Thus, in this section, we shift the notation so that all inputs are of unit size and q is some small integer, e.g., 3.
In this section, we provide optimal algorithms for q = 2 (in Section 5.2.1) and q = 3 (in Section 5.2.2). Afrati and Ullman [18] provided an optimal algorithm for the A2A mapping schema problem where q is a prime number and the number of inputs is m = q². We extend this algorithm to m = q² + q + 1 inputs (in Section 5.2.3), and this extension also meets the lower bound on the communication cost. We will generalize these three algorithms in Sections 5.3 and 5.4. In this setting, by minimizing the number of reducers, we minimize communication, since each reducer is more-or-less filled to capacity. So, we define r(m, q) to be the minimum number of reducers of capacity q that can solve the all-pairs problem for m inputs. The following theorem sets a lower bound on r(m, q) and the communication cost for this setting.
Theorem 5.4 (Lower bounds on the communication cost and number of reducers) For a given reducer capacity q > 1 and a list of m inputs of size one, the communication cost and the number of reducers (r(m, q)), for the A2A mapping schema problem, are at least m·(m−1)/(q−1) and at least (m/q)·(m−1)/(q−1), respectively. The proof of the theorem is given in Appendix C.2.
5.2.1 Reducer capacity q = 2
Here, we offer a recursive algorithm and show that this algorithm not only obtains the bound r(m, 2) ≤ m(m−1)/2, but does so in a way that divides the reducers into m − 1 "teams" of m/2 reducers, where each team has exactly one occurrence of each input. We will use these properties of the output of this algorithm to build an algorithm for q = 3 in the next subsection. The recursive algorithm. We are given a list A of m inputs. The intention is to have all pairs of inputs from list A partitioned into m − 1 teams, with each team containing exactly m/2 pairs and each input appearing exactly once within a team. Hence, we will use m(m−1)/2 reducers for assigning the pairs of inputs. We split A into two sublists A1 and A2 of size m/2 each. Suppose we have the m/2 − 1 teams for a list of size m/2. We will take the m/2 − 1 teams of A1 and the m/2 − 1 teams of A2 and "mix them up" in a rather elaborate way to form the m − 1 teams for A: Let the inputs of A1 and A2 be {g_1, g_2, g_3, ..., g_{m/2}} and {h_1, h_2, h_3, ..., h_{m/2}}, respectively. We will form two kinds of teams, teams of kind I and teams of kind II, as follows: Teams of kind I. We will form m/2 teams of kind I by taking one input from A1 and one input from A2. For example, the first team for
A is {(g_1, h_1), (g_2, h_2), (g_3, h_3), ..., (g_{m/2}, h_{m/2})}, the second team for A is {(g_1, h_2), (g_2, h_3), (g_3, h_4), ..., (g_{m/2}, h_1)}, and so on. Teams of kind II. We will form the remaining m/2 − 1 teams, having m/2 reducers in each. In teams of kind I, each pair (reducer) contains one input from A1 and one input from A2.
Now we produce pairs with each pair having both inputs from A1 or both from A2. In order to do that, we recursively divide A1 into two sublists and perform the same operation that we performed for the teams of kind I. The same procedure is recursively implemented on A2.
Example 5.5 For m = 8, we form 7 teams. First we form teams of kind I. We divide 8 inputs into two lists A1 and A2. After that, we take one input from A1 and one input from
A2, and create 4 teams, see Figure 5.2a. Now, we recursively follow the same rule on each sublist, A1 and A2, and create 3 remaining teams of kind II, see Figure 5.2b.
Team 1: (1,5) (2,6) (3,7) (4,8)    Team 2: (1,6) (2,7) (3,8) (4,5)    Team 3: (1,7) (2,8) (3,5) (4,6)    Team 4: (1,8) (2,5) (3,6) (4,7)
Team 5: (1,3) (2,4) (5,7) (6,8)    Team 6: (1,4) (2,3) (5,8) (6,7)    Team 7: (1,2) (3,4) (5,6) (7,8)
(a) Teams of kind I: Teams 1–4.    (b) Teams of kind II: Teams 5–7.
Figure 5.2: The teams for m = 8 and q = 2.
The teams for this example also appear in Figure 5.3: they are shown there in non-bold face fonts (the first two elements of each triplet, drawn from inputs 1–8) in Teams 1 through 7.
The following theorem is easy to prove.
Theorem 5.6
1. In each team an input appears only once.
2. In each team all inputs appear.
3. There are m − 1 teams, which is the minimum possible.
4. This is an optimal mapping schema that assigns inputs to reducers.²
This works if the number of inputs is a power of two. We can use known techniques to make it work with good approximation in general.
²Interested readers may see the proof of Algorithm 7A in Appendix C.5.1 for the proof of statements (1) and (2). An input gets paired with the remaining m − 1 inputs, and by statement (2) an input can appear exactly once in a team. Hence, we need m − 1 teams to assign all the pairs of an input, so statement (3) is true. Since a team holds m/2 reducers that provide m/2 pairs and there are in total m(m−1)/2 pairs of inputs, we are using m(m−1)/2 reducers. Thus, this is an optimal mapping schema.
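A minimal sketch of the recursive construction for q = 2 when m is a power of two; the function name is illustrative. For m = 8 it reproduces the seven teams of Figure 5.2, and the final assertion checks that all pairs are covered exactly once.

```python
from itertools import combinations

def q2_teams(inputs):
    """Recursive construction for q = 2: returns m-1 teams of m/2 reducers (pairs),
    with every pair of inputs in exactly one reducer and every input appearing
    exactly once per team. Assumes len(inputs) is a power of two, at least 2."""
    m = len(inputs)
    if m == 2:
        return [[(inputs[0], inputs[1])]]                      # one team, one reducer
    a1, a2 = inputs[:m // 2], inputs[m // 2:]
    # Teams of kind I: pair the inputs of A1 with cyclically shifted inputs of A2.
    kind_one = [[(a1[i], a2[(i + s) % (m // 2)]) for i in range(m // 2)]
                for s in range(m // 2)]
    # Teams of kind II: merge the recursively built teams of A1 and A2.
    kind_two = [t1 + t2 for t1, t2 in zip(q2_teams(a1), q2_teams(a2))]
    return kind_one + kind_two

teams = q2_teams(list(range(1, 9)))                            # m = 8, as in Example 5.5
assert len(teams) == 7 and all(len(team) == 4 for team in teams)
assert ({frozenset(p) for team in teams for p in team}
        == {frozenset(p) for p in combinations(range(1, 9), 2)})
```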
5.2.2 Reducer capacity q = 3
Here, we present an algorithm that constructs an optimal mapping schema for q = 3. Our recursive algorithm starts from the mapping schema constructed in the previous subsection for q = 2. We showed there that for q = 2, we can not only obtain the bound r(m, 2) ≤ m(m−1)/2, but that we can do so in a way that divides the reducers into m − 1 teams of m/2 reducers each, where each team has exactly one occurrence of each input. Now, we split the m inputs into two disjoint sets: set A and set B. Suppose m = 2n − 1. Set A has n inputs and set B has n − 1 inputs. We start with the n inputs in set A, and create n − 1 teams of n/2 reducers, each reducer getting two of the n inputs in A, by following the algorithm given in Section 5.2.1. Next, we add to all reducers in one team another input from set B. I.e., in a certain team we add to all n/2 reducers of this team a certain input from set B, and thus, we form a triplet for each reducer. Since there are n − 1 teams, we can handle another n − 1 inputs. This is the start of a solution for q = 3 and m = 2n − 1 inputs. To complete the solution, we add the reducers for solving the problem for the n − 1 inputs of the set B. That leads to the following recurrence:
r(m, 3) = n(n − 1)/2 + r(n − 1, 3), where m = 2n − 1;  r(3, 3) = 1.
We solve the recurrence for m a power of 2, and it exactly matches the lower bound of r(m, 3) = m(m−1)/6. Moreover, notice that we can prove that this case is optimal either by proving that r(m, 3) = m(m−1)/6 (as we did above) or by observing that every pair of inputs meets exactly in one reducer. This is easy to prove. Hence the following theorem:
Theorem 5.7 This algorithm constructs an optimal mapping schema for q = 3.
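To make the recurrence concrete, here is how it unwinds for m = 15 (so n = 8), a worked check that is consistent with Example 5.8 below:
r(15, 3) = 8·7/2 + r(7, 3), and r(7, 3) = 4·3/2 + r(3, 3) = 6 + 1 = 7,
hence r(15, 3) = 28 + 7 = 35 = 15·14/6, matching the lower bound of Theorem 5.4.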
Example 5.8 An example is shown in Figure 5.3. We explained how this figure is constructed for q = 2 (the non-bold entries). Now we use the algorithm just presented to construct the 35 (= 15·14/6) reducers. We explain below in detail how we construct these 35 reducers. We are given 15 inputs (I = {1, 2, ..., 15}). We create two sets, namely A of y = 8 inputs and B of x = 7 inputs, and arrange (y − 1) × y/2 = 28 reducers in the form of 7 teams of 4 reducers in each team. These 7 teams assign each input of the set A with all other inputs of the set A and all the inputs in the set B, as follows. We pair every two inputs of the set A and assign them to exactly one of the 28 reducers, as we explained in Section 5.2.1.
I = {1, 2, ..., 15}, A = {1, 2, ..., 8}, B = {9, 10, ..., 15}
Team 1: (1,5,9) (2,6,9) (3,7,9) (4,8,9)    Team 2: (1,6,10) (2,7,10) (3,8,10) (4,5,10)    Team 3: (1,7,11) (2,8,11) (3,5,11) (4,6,11)    Team 4: (1,8,12) (2,5,12) (3,6,12) (4,7,12)
Team 5: (1,3,13) (2,4,13) (5,7,13) (6,8,13)    Team 6: (1,4,14) (2,3,14) (5,8,14) (6,7,14)    Team 7: (1,2,15) (3,4,15) (5,6,15) (7,8,15)
I1 = {9, 10, ..., 15}, A1 = {9, 10, 11, 12}, B1 = {13, 14, 15}
Team 8: (9,11,13) (10,12,13)    Team 9: (9,12,14) (10,11,14)    Team 10: (9,10,15) (11,12,15)    An additional reducer: (13,14,15)
Figure 5.3: An example of a mapping schema for q = 3 and m = 15.
Once every pair of the y = 8 inputs of the set A is assigned to exactly one of the 28 reducers, we assign the i-th input of the set B to all four reducers of the (i − 8)-th team. Thus, e.g., input 10 is assigned to the four reducers of Team 2. Now these 28 reducers ensure that each pair of inputs from set A meets in at least one reducer, and that each pair of inputs, one from A and one from B, meets in at least one reducer. Thus, it remains to build more reducers so that each pair of inputs (both) from set
B meets. According to the recursion we explained, we break set B into sets A1 and B1, of size 4 and 3 respectively, and we apply our method again. In particular, we create two sets, A1 = {9, 10, 11, 12} of y1 = 4 inputs and B1 = {13, 14, 15} of x1 = 3 inputs. Then, we arrange (y1 − 1) × y1/2 = 6 reducers in the form of 3 teams of 2 reducers in each team. We assign each pair of inputs of the set A1 to these 6 reducers, and then the i-th input of the set B1 to both reducers of a team; see Team 8 to Team 10.
The last team is constructed so that all inputs in B1 meet at the same reducers (since B1 has only 3 elements and 3 is the size of a reducer, one reducer suffices for this to happen).
Open Problem. Now the interesting observation is that if we can argue that the resulting reducers can be divided into (m − 1)/2 teams of m/3 reducers each (with each team having one occurrence of each input), then we can extend the idea to q = 4, and perhaps higher.
5.2.3 When q or q − 1 is a prime number
An algorithm that provides a mapping schema for reducer capacity q, where q is a prime number, and m = q² inputs is suggested by Afrati and Ullman in [18]. This method meets the lower bound on the communication cost. We call this algorithm the AU method. For the sake of completeness, we provide an overview of the AU method; interested readers may refer to [18]. We divide the m inputs into q² equal-sized subsets (each with m/q² inputs) that are arranged in a q × q square Q. The subset in row i and column j is represented by S_{i,j}, where 0 ≤ i < q and 0 ≤ j < q.
We now organize q(q + 1) reducers in the form of q + 1 teams of q players (or reducers) in each team. Note that the sum of sizes of the inputs in each row and column of the square Q is exactly q. The teams are numbered from 0 to q, and the reducers in each team are numbered from 0 to q − 1. We first arrange inputs for team q. Since the sum of the sizes in each column of the square Q is q, we place one column of the square Q in one reducer of team q. Now we place the inputs in the remaining teams. We use a modulo operation for the assignment of each subset to each team. The subset S_{i,j} is assigned to reducer r of each team t, 0 ≤ t < q, such that (i + t·j) modulo q = r. An example for q = 3 and m = 9 is given in Figure 5.4.
[Figure content: the 3 × 3 square Q of subsets S_{0,0}, ..., S_{2,2}, and the resulting assignment of the subsets to Teams 0–3, each consisting of three reducers (numbered 0, 1, 2).]
Figure 5.4: The AU method for the reducer capacity p = 3 and m = 9.
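A minimal sketch of the assignment rule just described, for a prime q and m = q² unit-size inputs, where the cell (i, j) stands for the subset S_{i,j}; the function and variable names are illustrative. The checks at the end confirm the q(q + 1) reducers of q cells each and that every pair of inputs meets in at least one reducer, reproducing the structure of Figure 5.4 for q = 3.

```python
from itertools import combinations, product

def au_method(q):
    """Teams 0..q-1 use the rule (i + t*j) mod q = r; team q groups the columns.
    Returns a dict (team, reducer) -> set of cells (i, j) of the q x q square."""
    reducers = {}
    for t in range(q):
        for i, j in product(range(q), repeat=2):
            reducers.setdefault((t, (i + t * j) % q), set()).add((i, j))
    for j in range(q):                                   # team q: one column per reducer
        reducers[(q, j)] = {(i, j) for i in range(q)}
    return reducers

q = 3
reducers = au_method(q)
assert len(reducers) == q * (q + 1) and all(len(r) == q for r in reducers.values())
cells = list(product(range(q), repeat=2))
assert all(any(a in r and b in r for r in reducers.values())
           for a, b in combinations(cells, 2))           # every pair meets somewhere
```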
Total required reducers. The AU method uses q(q + 1) reducers, which are organized in the form of q + 1 teams of q reducers in each team, and the communication cost is q²(q + 1). A simple extension of the AU method. Now, we can extend the AU method as follows: we can add q + 1 additional inputs, add one of them to each reducer, and add one more reducer that has the q + 1 new inputs. For a prime p, that gives us reducers of capacity q = p + 1 and m = p² + p + 1 inputs, i.e., r(p² + p + 1, p + 1) = p(p + 1) + 1 = p² + p + 1. If you substitute m = p² + p + 1 and q = p + 1, you can check that this also meets the bound of r = m(m−1)/(q(q−1)). In Figure 5.5, we show a mapping schema for this extension of the AU method for q = 4 and m = 13.
q = 4, m = 13, A = {1, 2, ..., 9}, B = {10, 11, ..., 13}
Team 1: (1,4,7,10) (2,5,8,10) (3,6,9,10)    Team 2: (1,5,9,11) (2,6,7,11) (3,4,8,11)    Team 3: (1,6,8,12) (2,4,9,12) (3,5,7,12)    Team 4: (1,2,3,13) (4,5,6,13) (7,8,9,13)    An extra reducer: (10,11,12,13)
Figure 5.5: An optimum mapping schema for q = 4 and m = 13 by extending the AU method.
In conclusion, in this section we have shown the following:
Theorem 5.9 We can construct optimal mapping schemas for the following cases:
1. q = 2.
2. q = 3.
3. q being a prime number and m = q².
4. q − 1 being a prime number and m = (q − 1)² + q.
Open Problem: Can we generalize the last idea to get optimal schemas for more cases?
Approximation Algorithms for the A2A Mapping Schema Problem. We can use the optimal mapping schemas of Section 5.2 to construct good approximations of mapping schemas in many cases. The general techniques we will use in this section move along the following dimensions/ideas:
• Assuming that there are no inputs of size greater than q/k, construct bins of size q/k, treat each of the bins as a single input of size 1, and assume the reducer capacity is k. Then apply one of the optimal techniques of Section 5.2 to construct a mapping schema. These algorithms are presented in Sections 5.3 and 5.5.
• Getting inspiration from the methods developed (or only presented, in the case of the AU method) in Section 5.2.3, we extend the ideas to construct good approximation algorithms for inputs that are all of equal size (see Sections 5.4.1 and 5.4.2).
Thus, in Sections 5.3, 5.4, and 5.5, we will give several such techniques and show that some of them construct mapping schemas close to the optimal. To that end, we have already shown a schema based on bin-packing algorithms in Section 5.1.1.
5.3 Generalizing the Technique for Reducer Capacity q > 3 and Inputs of Size ≤ q/k, k > 3
In this section, we will generalize the algorithm for q = 3 given in Section 5.2.2 and present an algorithm (Algorithm 7) for inputs of size less than or equal to q/k, for k > 3. For simplicity, we assume throughout this section that k divides q evenly.
5.3.1 Algorithm 7A
We divide Algorithm 7 into two parts based on whether the value of k is even or odd. Algorithm 7A considers the case in which k is an odd number. The pseudocode of Algorithm 7A is given in Appendix C.3. Algorithm 7A works as follows: it first places all the given inputs, say m′ of them, into some bins, say m bins, each of size q/k, where k > 3 is an odd number. Thus, a reducer can hold an odd number of bins. After placing all the m′ inputs into m bins, we can treat each of the m bins as a single input of size one and the reducer capacity to be k. Now, it is easy to turn the problem into a case similar to the case of q = 3. Hence, we divide the m bins into two sets A and B, and follow an approach similar to that given in Section 5.2.2.
Aside. Equivalently, we can consider q to be odd and the inputs to be of unit size. In what follows, we will continue to use q, which is an odd number, as the reducer capacity and
assume all inputs (that are actually bins containing inputs) are of unit size.
Example 5.10 If q = 30 and k = 5, then we can pack given inputs to some bins of size 6. Hence, a reducer can hold 5 bins. Equivalently, we may consider each of the bins as a single input of size 1 and q = 5.
For understanding of Algorithm 7A, an example for q = 5 is presented in Figure 5.6, where we obtain m = 23 bins (that are considered as 23 unit-sized inputs) after implementing a bin-packing algorithm to given inputs.
I = {1, 2, ..., 23}; A = {1, 2, ..., 16}, arranged into y/2 = 8 groups: (1,2), (3,4), (5,6), (7,8), (9,10), (11,12), (13,14), (15,16); B = {17, 18, ..., 23}
3, 4 11, 12 17 3, 4 13, 14 18 3, 4 15, 16 19 3, 4 9, 10 20
5, 6 13, 14 17 5, 6 15, 16 18 5, 6 9, 10 19 5, 6 11, 12 20
7, 8 15, 16 17 7, 8 9, 10 18 7, 8 11, 12 19 7, 8 13, 14 20
Team 1 Team 2 Team 3 Team 4
1, 2 5, 6 21 1, 2 7, 8 22 1, 2 3, 4 23 17, 18 19, 20 21
3, 4 7, 8 21 3, 4 5, 6 22 5, 6 7, 8 23 17, 18 19, 20 22
9, 10 13, 14 21 9, 10 15, 16 22 9, 10 11, 12 23 17, 18 19, 20 23
11, 12 15, 16 21 11, 12 13, 14 22 13, 14 15, 16 23 21, 22, 23
Team 5    Team 6    Team 7    Additional reducers for the set B
Figure 5.6: Algorithm 7 – an example of a mapping schema for q = 5 and 23 bins.
Algorithm 7A consists of six steps, as follows:
1. Implement a bin-packing algorithm: Implement a bin-packing algorithm to place all the given m′ inputs into bins of size q/k, where k > 3 and the size of every input is less than or equal to q/k. Let m bins be obtained; now each of the bins is considered as a single input.
2. Division of bins (or inputs) into two sets, A and B: Divide the m inputs into two sets: A of size y = (q/2)·(2m/(q+1) + 1) and B of size x = m − y.
3. Grouping of inputs of the set A: Group the y inputs into u = y/(q − ⌈q/2⌉) disjoint groups, where each group holds (q−1)/2 inputs. (We consider each of the u (= y/(q − ⌈q/2⌉)) disjoint groups as a single input that we call a derived input. By making u disjoint groups³ (or derived inputs) of the y inputs of the set A, we turn the case of any odd value of q into a case
3We suppose that u is a power of 2. In case u is not a power of 2 and u > q, we add dummy inputs each of q 1 size −2 so that u becomes a power of 2. Consider that we require d dummy inputs. If groups of inputs of q 1 the set B each of size −2 are less than equal to d dummy inputs, then we use inputs of the set B in place of dummy inputs, and the set B will be empty.
38
where a reducer can hold only three inputs: the first two inputs are a pair of the derived inputs and the third input is from the set B.)
4. Assigning groups (inputs of the set A) to some reducers: Organize (u − 1) × u/2 reducers in the form of u − 1 teams of u/2 reducers in each team. Assign every two groups to one of the (u − 1) × u/2 reducers. To do so, we will prove the following Lemma 5.11.
Lemma 5.11 Let q be the reducer capacity, and let the size of each input be (q−1)/2. Each pair of u = 2^i, i > 0, inputs can be assigned to 2^i − 1 teams of 2^(i−1) reducers in each team.⁴
5. Assigning inputs of the set B to the reducers: Once every pair of the derived inputs is assigned, assign the i-th input of the set B to all the reducers of the i-th team.
6. Use the previous steps on the inputs of the set B: Apply the above-mentioned steps 1–4 on the set B until there is a solution to the A2A mapping schema problem for the x inputs.
Algorithm correctness. The algorithm correctness appears in Appendix C.5.1.
Theorem 5.12 (The communication cost obtained using Algorithm 7A or 7B) For a given reducer capacity q > 1, k > 3, and a set of m inputs whose sum of sizes is s, the communication cost, for the A2A mapping schema problem, is at most (q/2k)·(sk/(q(k−1)))·(sk/(q(k−1)) − 1).
The proof of the theorem is given in Appendix C.5.
Approximation factor. The optimal communication cost (from Theorem 5.2) is s·(⌊sk/q⌋ − 1)/(k − 1) ≈ (s²/q)·k/(k−1), and the communication cost of the algorithm (from Theorem 5.12) is (q/2k)·(sk/(q(k−1)))·(sk/(q(k−1)) − 1). Thus, the ratio between the optimal communication and the communication of our mapping schema is approximately 1/(k − 1).
5.3.2 Algorithm 7B
For the sake of completeness, we include the pseudocode of the algorithm for handling the case in which k is an even number. We call it Algorithm 7B, and its pseudocode is given in Appendix C.4. In this algorithm, we are given m′ inputs of size less than or equal to q/k, where k ≥ 4 is an even number. Similar to Algorithm 7A, Algorithm 7B first places all the m′ inputs into m bins, each of size q/k, where k > 2 is an even number. Thus, a reducer can hold an even number of bins. After placing all the m′ inputs into m bins, we can treat each of the m bins as a single input of size one and the reducer capacity to be k. Now, we easily turn this problem into a case similar to the case of q = 2. Hence, we divide the m bins into two sets A and B, and follow an approach similar to that given in Section 5.2.1.
4The proof appears in Appendix C.5.
Example 5.13 If q = 30 and k = 6, then we can pack the given inputs into some bins of size 5. Hence, a reducer can hold 6 bins. Equivalently, we may consider each of the bins as a single input of size 1 and q = 6.
Note. Algorithms 7A and 7B differ in how well we can pack the inputs into bins of odd or even size. To understand this point, consider q = 30 and m′ = 46. For simplicity, we assume that all the inputs are of size three. Now, consider k = 5: we will use 23 bins, each of size 6, and apply Algorithm 7A. On the other hand, consider k = 6: we will use 46 bins, each of size 5, and apply Algorithm 7B.
5.4 Generalizing the AU method
In this section, we extend the AU method (Section 5.2.3) to handle more than q² inputs, when q is a prime number, via Algorithms 8 and 9. Recall that the AU method can assign each pair of q² inputs to reducers of capacity q. We provide two extensions: (i) take m = p² + p·l + l identical-sized inputs and assign these inputs to reducers of capacity p + l = q, where p is the nearest prime number to q, in Section 5.4.1, and (ii) take m = q^l inputs, where l > 2, and assign the inputs to reducers of capacity q, in Section 5.4.2. The pseudocodes of Algorithms 8 and 9 are given in Appendix C.6 and Appendix C.7, respectively.
5.4.1 When we consider the nearest prime to q
We provide an extension to the AU method that handles m = p² + p·l + l identical-sized inputs and assigns them to reducers of capacity p + l = q, where p is the nearest prime number to q. We call it the first extension to the AU method (Algorithm 8). Algorithm 8: The First Extension of the AU method. We extend the AU method by increasing the reducer capacity and the number of inputs; see Algorithm 8. Consider that the AU method assigns p² identical-sized inputs to reducers of capacity p, where p is a prime number. We add l(p + 1) inputs and increase the reducer capacity to p + l (= q). In other words, m identical-sized inputs and the reducer capacity q are given. We select a prime number, say p, that is nearest to q such that p + l = q and p² + l(p + 1) ≤ m. Also, we divide the m inputs into two disjoint sets A and B, where A holds at most p² inputs and B holds at most l(p + 1) inputs. Algorithm 8 consists of six steps, as follows:
1. Divide the given m inputs into two disjoint sets A of y = p² inputs and B of x = m − y inputs, where p is the nearest prime number to q such that p + l = q and p² + l(p + 1) ≤ m.
2. Perform the AU method on the inputs of the set A by placing the y inputs into p + 1 teams of p bins in each team, where the size of each bin is p.
3. Organize p(p + 1) reducers in the form of p + 1 teams of p reducers in each team, and assign the j-th bin of the i-th team of bins to the j-th reducer of the i-th team of reducers.
4. Group the x inputs of the set B into u = x/(q − p) disjoint groups.
5. Assign the i-th group to all the reducers of the i-th team.
6. Use Algorithm 7A or Algorithm 7B to pair the inputs of the set B with each other, depending on whether the value of q is an odd or an even number, respectively.
Note that when we perform the above-mentioned step 3, we assign each pair of inputs of the set A to the p(p + 1) reducers, and such an assignment uses p capacity of each reducer. Now, each of the p(p + 1) reducers has q − p remaining capacity that is used to assign the i-th group of inputs of the set B. Thus, all the inputs of the set A are assigned with all the m inputs. Algorithm correctness. The algorithm correctness appears in Appendix C.6.1.
Theorem 5.14 (The communication cost obtained using Algorithm 8) Algorithm 8 requires at most p(p + 1) + z reducers, where z = 2l²(p + 1)²/q², and results in at most qp(p + 1) + z′ communication cost, where z′ = 2l²(p + 1)²/q, q is the reducer capacity, and p is the nearest prime number to q.
The proof of the theorem is given in Appendix C.6. When l = q − p equals one, we have already provided an extension of the AU method in Section 5.2.3, and in this case, we have an optimum mapping schema for q and m = q² + q + 1 inputs.
Approximation factor. The optimal communication cost using the AU method is q²(q + 1). Thus, the difference between the communication of our mapping schema (q²(q + 1) + z′, when assuming p is equal to q) and the optimal communication is z′. We can see two cases, as follows:
1. When q is large. Consider that q is greater than the square or cube of the maximum difference between any two prime numbers. In this case, z′ will be very small, and we will get an almost optimal ratio.
2. When q is very small. In this case, z′ plays a role as follows: the number of inputs in the set B will be at most (p + 1)l < q². Thus, the ratio becomes q/(q + 1).
5.4.2 For input size m = q^l
We also provide another extension to the AU method that handles m = q^l identical-sized inputs and assigns them to reducers of capacity q, where q is a prime number and l > 2. We call it the second extension to the AU method (Algorithm 9; refer to Appendix C.7).
Algorithm 9: The Second Extension of the AU method. The second extension to the AU method (Algorithm 9) handles the case of m = q^l inputs, where l > 2, and the reducer capacity q, where q is a
prime number. Nevertheless, m inputs that are fewer than but close to q^l can also be handled by Algorithm 9 by adding dummy inputs such that m = q^l, l > 2. Algorithm 9 consists of two phases, as follows:
The first phase: creation of a bottom-up tree. Here, we present a simple example for the bottom-up tree's creation for q = 3 and m = 3^4; see Figure 5.7.
Figure 5.7: The second extension of the AU method (Algorithm 9): Phase 1 – Creation of the bottom-up tree.
Example 5.15 (Bottom-up tree creation) A bottom-up tree for m = q^l = 3^4 identical-sized inputs and q = 3 is given in Figure 5.7. Here, we explain how we constructed it. The height of the bottom-up tree is l − 1, and the last ((l − 1)th) level has the m inputs in the form of m/q² matrices of size q × q. Note that we have m/q columns at the last level, which hold the m inputs; these m/q columns are called the input columns. We create the tree in bottom-up fashion, where the (l − 2)th level has m/q³ matrices, each of whose cell values refers to an input column of the (l − 1)th level. We use the notation c^i_j to refer to the jth column of the ith level. Note that each column c^i_j at level i holds q columns (c^{i+1}_{(j−1)q+1}, c^{i+1}_{(j−1)q+2}, . . ., c^{i+1}_{jq}) of the (i + 1)th level. In general, there are m/q^{l−i+1} matrices at level i, each of whose cell values, c^i_j, refers to a column, c^{i+1}_j, of the (i + 1)th level.
Following that, the bottom-up tree for m = 3^4 identical-sized inputs and q = 3 has height 3. The last level ((l − 1)th = 3rd) has 81 inputs in the form of m/q² = 9 matrices of size 3 × 3. Note that we have m/q = 27 columns at the 3rd level, called the input columns. The (l − 2)th = 2nd level has m/q³ = 3 matrices, each of whose columns, c²_j, refers to q = 3 columns (c³_{(j−1)q+1}, c³_{(j−1)q+2}, c³_{jq}) of the 3rd level. Further, the root node is at level 1, each of whose columns, c¹_j, refers to q = 3 columns (c²_{(j−1)q+1}, c²_{(j−1)q+2}, c²_{jq}) of the 2nd level.
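The column indexing of the bottom-up tree can be sketched in a few lines of Python (our own formulation of Example 5.15): a column j at level i stands for the q columns (j − 1)q + 1, . . ., jq of level i + 1, and the columns of the last level hold the original inputs.

```python
# A compact sketch of the bottom-up tree of Example 5.15: a column j at level i
# represents the q columns (j-1)q+1 .. jq of level i+1, and the columns of the
# last level (l-1) are the input columns, each holding q of the m = q^l inputs.

def children(j, q):
    """Level-(i+1) columns represented by column j of level i (1-indexed)."""
    return list(range((j - 1) * q + 1, j * q + 1))

def inputs_of_column(j, q, level, l):
    """Original inputs (numbered 1..q^l) reachable from column j at the given level."""
    cols = [j]
    for _ in range(level, l - 1):            # expand down to the input columns
        cols = [c for col in cols for c in children(col, q)]
    return [i for col in cols for i in children(col, q)]   # last expansion yields inputs

if __name__ == "__main__":
    q, l = 3, 4                               # m = q^l = 81, as in Figure 5.7
    assert len(inputs_of_column(1, q, 1, l)) == q ** (l - 1)   # a root column covers 27 inputs
    assert inputs_of_column(1, q, l - 1, l) == [1, 2, 3]       # an input column holds q inputs
    print("input columns at level", l - 1, ":", (q ** l) // q) # m/q = 27 input columns
```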
The second phase: creation of an assignment tree. The assignment tree is created in top-down fashion. Our objective is to assign each pair of inputs to a reducer, where the inputs are arranged in the input columns of the bottom-up tree. If we can assign each pair of input columns (of the bottom-up tree) in the form of (q × q)-sized matrices, then the implementation of the AU method on each such matrix results in an assignment of every pair of inputs to reducers. Hence, we try to pair all the input columns by creating a tree called the assignment tree. Here, we present a simple assignment tree for m = 3^4 and q = 3 (see Figure 5.8).
Figure 5.8: The second extension of the AU method (Algorithm 9): Phase 2 – Creation of the assignment tree.
Example 5.16 (Assignment tree creation) The root node of the bottom-up tree becomes the root node of the assignment tree. Recall that the root node of the bottom-up tree is a q × q matrix. First, consider the root node to understand how the AU method is used to create the assignment tree. Consider that each cell value of the root node matrix is of size one, and we have (q + 1) teams of q bins (of size q) in each team. Our objective in using the AU method on the root node matrix is to assign each pair of cell values ⟨c²_x, c²_y⟩ to the q(q + 1) bins, which results in an assignment of every pair of cell values ⟨c²_x, c²_y⟩ to a bin.
Now, we create matrices by using these bins (the bins created by the AU method's implementation on the root node), which hold the indices of columns of the second level (c²_x) of the bottom-up tree. We take each bin and its q indices, and we replace each index c²_j with the q columns c³_{(j−1)q+1}, c³_{(j−1)q+2}, . . ., c³_{jq}, which results in q(q + 1) matrices of size q × q; these q(q + 1) matrices become child nodes of the root node. Now, we consider each such matrix separately and perform the same operation as we did for the root node.
In this manner, the AU method creates (q(q + 1))^{i−1} child nodes (that are matrices of size q × q) at the ith level of the assignment tree, and they create (q(q + 1))^i child nodes (matrices of size q × q) at the (i + 1)th level of the assignment tree.
Recall that there are m/q input columns at the (l − 1)th level of the bottom-up tree that hold the original m inputs. The implementation of the AU method on each (q × q)-sized matrix of the (l − 2)th level of the assignment tree pairs the input columns at the (l − 1)th level of the assignment tree. Further, the AU method's implementation on each matrix of the (l − 1)th level assigns every pair of the original inputs to q^l × (q + 1)^{l−1} reducers at the lth level, which has reducers in the form of (q(q + 1))^{l−1} teams of q reducers in each team.
For m = 3^4 identical-sized inputs and q = 3, we take the root node of the bottom-up tree (Figure 5.7), which becomes the root node of the assignment tree. We implement the AU method on the root node and assign each pair of cell values (c²_j, 1 ≤ j ≤ 9) to a bin of size q. Each cell value c²_j of the bins is then replaced by the q = 3 columns c³_{(j−1)q+1}, c³_{(j−1)q+2}, . . ., c³_{jq}, which results in an assignment of each pair of columns of the second level of the bottom-up tree. For clarity, we are not showing bins. For the next, 3rd, level, we again implement the AU method on all 12 matrices at the 2nd level and get 144 matrices at the third level. The matrices at the 3rd level pair all the input columns (of the bottom-up tree). The AU method's implementation on each matrix of the 3rd level assigns each pair of original inputs to reducers. For clarity, we are only showing all the matrices and teams at levels 3 and 4, respectively.
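The AU method itself is defined in Section 5.2.3 and is not reproduced here. As a hedged illustration, the Python sketch below uses one classical construction with the same guarantee for a prime q (the parallel line classes of the affine plane): the q² cells of a q × q matrix are grouped into q + 1 teams of q bins of size q so that every pair of cells shares at least one bin. Whether this is exactly the AU construction is an assumption on our part; the sketch only matches the stated interface (q² inputs, q(q + 1) bins/reducers of capacity q).

```python
# A hedged sketch of one classical pair-covering construction for a prime q:
# q+1 teams (slopes 0..q-1 plus the vertical direction) of q bins (lines), so
# that every pair of the q^2 cells of a q x q matrix meets in some bin of size q.

from itertools import combinations

def teams_of_bins(q):
    """Return (q+1) teams, each a list of q bins, each bin a list of q cells."""
    teams = []
    for s in range(q):                                   # teams for slopes 0..q-1
        teams.append([[(x, (s * x + b) % q) for x in range(q)] for b in range(q)])
    teams.append([[(b, y) for y in range(q)] for b in range(q)])   # vertical team
    return teams

if __name__ == "__main__":
    q = 5
    bins = [frozenset(b) for team in teams_of_bins(q) for b in team]
    cells = [(x, y) for x in range(q) for y in range(q)]
    assert len(bins) == q * (q + 1) and all(len(b) == q for b in bins)
    assert all(any({u, v} <= b for b in bins) for u, v in combinations(cells, 2))
    print(f"q = {q}: every pair of {q * q} cells meets in one of {len(bins)} bins")
```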
Figure 5.9: An assignment tree created using Algorithm 9.
The assignment tree uses the root node of the bottom-up tree, and we implement the AU method on the root node, which results in q(q + 1) child nodes at level two. Each child node is a q × q matrix, and the columns of all the q(q + 1) matrices provide all pairs of the cell values of the root node matrix. At level i, the assignment tree has (q(q + 1))^{i−1} nodes; see Figure 5.9. The height of the assignment tree is l, where the (l − 1)th level has all pairs of input columns and the lth level has a solution to the A2A mapping schema problem for m inputs. Algorithm correctness. Algorithm 9 satisfies the following Lemma 5.17:
Lemma 5.17 The height of the assignment tree is l, and the lth level of the assignment tree assigns each pair of inputs to reducers.
Theorem 5.18 (The communication cost obtained using Algorithm 9) Algorithm 9 requires at most q × (q(q + 1))^{l−1} reducers and results in at most q² × (q(q + 1))^{l−1} communication cost.
The proof of the theorem is given in Appendix C.7.
Approximation factor. The optimal communication is m(m − 1)/(q − 1) (see Theorem 5.4). Replacing m with q^l, we get q^l(q^l − 1)/(q − 1). Thus, the ratio between the optimal communication and the communication of our mapping schema is (q^l − 1)/(q(q − 1)(q + 1)^{l−1}). We can see two cases:
1. When q is large. Then we drop the constant 1, and the ratio is approximately equal to 1/q.
2. When q is very small compared to q^l. Then the ratio is q^l/(q(q − 1)(q + 1)^{l−1}). For q = 5, the inverse of the ratio is approximately (6/5)^{l−1}. This is already acceptable for practical applications if we think that the size of the data is 5^l; thus l may as well be l = 9, in which case this ratio is approximately 4.3. For q = 2 and q = 3 we already have optimal mapping schemas. Our conjecture is that there are optimal schemas for q = 4 and q = 5 as well, even by using the techniques developed and presented here.
5.5 A Hybrid Algorithm for the A2A Mapping Schema Problem
In the previous sections, we provided algorithms for different-sized and for almost equal-sized inputs. The hybrid approach considers both different-sized and almost equal-sized inputs together. The objective of the hybrid approach is to place inputs to two different-sized bins, and then to consider each of the bins as a single input. Specifically, the hybrid approach uses the previously given algorithms (the bin-packing-based approximation algorithm and Algorithms 7A, 7B, 8, and 9). We divide the given m inputs into two disjoint sets according to their input sizes, and then use the bin-packing-based approximation algorithm and Algorithms 7A, 7B, 8, or 9, depending on the sizes of the inputs.
Hybrid Algorithm. We divide the m inputs into two sets: A, which holds every input i of size q/3 < w_i ≤ q/2, and B, which holds all the inputs of sizes less than or equal to q/3. The algorithm consists of four steps, as follows:
1. Use the bin-packing-based approximation algorithm to place all the inputs of:
(a) the set A to bins of size q/2, and each such bin is considered as a single input of size q/2 that we call a big input. Consider that x big inputs are obtained.
(b) the set B twice: first, to bins of size q/2, where each bin is considered as a single input of size q/2 that we call a medium input; and second, to bins of size q/3, where each bin
is also considered as a single input of size q/3 that we call a small input. Consider that y medium and z small inputs are obtained.
2. Use x(x − 1)/2 reducers to assign each pair of big inputs.
3. Use x × y reducers to assign each big input with each medium input.
4. Use the AU method, Algorithm 7A, 8, or 9 on the z small inputs, depending on the case, to assign each pair of small inputs.
We present an example to illustrate the hybrid algorithm in Figure 5.10.
Figure 5.10: An example to show the working of the hybrid algorithm. We are given 15 inputs, where inputs i1 to i4 are of sizes greater than q/3, and all the other inputs are of sizes less than or equal to q/3.
Note that the use of x(x − 1)/2 reducers assigns each pair of original inputs whose sizes are between q/3 and q/2. Also, by using x × y reducers, we assign each big input (i.e., original inputs whose sizes are between q/3 and q/2) with each original input whose size is less than q/3. Further, the AU method, Algorithm 7A, 8, or Algorithm 9 assigns each pair of original inputs whose sizes are less than or equal to q/3.
Algorithm correctness. The algorithm correctness shows that every pair of inputs is assigned to reducers. Specifically, it shows that each pair of the big inputs is assigned to reducers, each of the big inputs is assigned to reducers with each of the medium inputs, and each pair of the small inputs is assigned to reducers.
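The counting in steps 2 and 3 above can be sketched as follows (a rough Python sketch with our own helper names and hypothetical input sizes); the small-small pairs of step 4 are then handed to the AU method or Algorithms 7A/7B/8/9 and are not repeated here.

```python
# A rough sketch of the first steps of the hybrid algorithm: split inputs by
# size, pack them with first-fit decreasing, and count the reducers needed for
# big-big and big-medium pairs; small-small pairs go to the other algorithms.

def ffd(sizes, bin_size):
    bins = []
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= bin_size:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins

def hybrid_counts(sizes, q):
    A = [w for w in sizes if q / 3 < w <= q / 2]      # set A: inputs in (q/3, q/2]
    B = [w for w in sizes if w <= q / 3]              # set B: inputs of size <= q/3
    x = len(ffd(A, q / 2))                            # big inputs (bins of size q/2)
    y = len(ffd(B, q / 2))                            # medium inputs
    z = len(ffd(B, q / 3))                            # small inputs
    return x, y, z, x * (x - 1) // 2 + x * y          # reducers before small-small pairs

if __name__ == "__main__":
    sizes = [5, 5, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1]      # hypothetical sizes, q = 12
    print(hybrid_counts(sizes, 12))
```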
5.6 Approximation Algorithms for the A2A Mapping Schema Problem with an Input > q/2
In this section, we consider the case of an input of size w_i, q/2 < w_i < q; we call such an input a big input. Note that if there are two big inputs, then they cannot be assigned to
a single reducer, and hence, there is no solution to the A2A mapping schema problem. We assume that m inputs of different sizes are given. There is a big input, and all the remaining m − 1 inputs, which we call the small inputs, have size at most q − w_i. We consider the following three cases in this section:
1. The big input has size w_i, where q/2 < w_i ≤ 2q/3,
2. The big input has size w_i, where 2q/3 < w_i ≤ 3q/4,
3. The big input has size w_i, where 3q/4 < w_i < q.
The communication cost is dominated by the big input. We consider three different cases of the big input to provide efficient algorithms in terms of the communication cost, where the first two cases can assign inputs to almost an optimal number of reducers, which results in almost minimum communication cost. We use the previously given algorithms to provide a solution to the A2A mapping schema problem for the case of a big input.
A simple solution is to use the FFD or BFD bin-packing algorithm to place the small inputs to bins of size q − w_i. Now, we consider each of the bins as a single input of size q − w_i. Let x bins be used. We assign each of the x bins to one reducer with a copy of the big input. Further, we assign the small inputs to bins of size q/2, and consider each of such bins as a single input of size q/2. Now, we can assign each pair of bins (each of size q/2) to reducers. In this manner, each pair of inputs is assigned to reducers.
The big input of size q/2 < w_i ≤ 2q/3. In this case, we assume that the small inputs have size at most q/3. We use the FFD or BFD bin-packing algorithm, the AU method (Section 5.2.3), and Algorithms 8 and 9 (Section 5.4). We proceed as follows:
1. First assign the big input with the small inputs.
(a) Use a bin-packing algorithm to place the small inputs to bins of size q/3. Now, we consider each of the bins as a single input of size q/3.
(b) Consider that x bins are used. Assign each of the bins to one reducer with a copy of the big input.
2. Depending on the number of bins, we use the AU method and Algorithms 8 and 9 to assign each pair of the small inputs to reducers.
An example is given in Figure 5.11, where we place the small inputs to 9 bins of size q/3 and assign each of the bins to one reducer with a copy of the big input. Further, we implement the AU method on the 9 bins to assign each pair of the small inputs.
Figure 5.11: An example to show an assignment of a big input of size q/2 < w_i ≤ 2q/3 with all the remaining inputs of sizes less than or equal to q/3.
The big input of size 2q/3 < w_i ≤ 3q/4. In this case, we assume that the small inputs have size at most q/4. We use a bin-packing algorithm and Algorithm 7B (Section 5.3). We proceed as follows:
1. First assign the big input with the small inputs.
(a) Use a bin-packing algorithm to place the small inputs to bins of size q/4.
(b) Consider that x bins are used. Assign each of the bins to one reducer with a copy of the big input.
2. Depending on the number of bins, we use Algorithm 7B to assign each pair of small inputs.
The big input of size 3q/4 < w_i < q. In this case, we assume that the small inputs have size at most q − w_i. We use a bin-packing algorithm and place the small inputs to bins of size q − w_i. We then place each of the bins to one reducer with a copy of the big input. Note that we have not yet assigned each pair of small inputs. In order to assign each pair of small inputs, we use the bin-packing-based approximation algorithm (Section 5.1.1) or Algorithms 7A–9, depending on the sizes of the small inputs.
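The "simple solution" described at the beginning of this section can be sketched as follows (our own Python code with hypothetical sizes): the big input rides along with bins of size q − w_i, and the small inputs are then paired among themselves via bins of size q/2, two bins per reducer.

```python
# A sketch of the simple solution for one big input: pair the big input with
# bins of size q - w_big, then pair the small inputs among themselves via bins
# of size q/2 (two such bins fit into one reducer of capacity q).

from itertools import combinations

def ffd(sizes, bin_size):
    bins = []
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= bin_size:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins

def simple_big_input_schema(w_big, small, q):
    reducers = []
    for b in ffd(small, q - w_big):               # each bin rides with the big input
        reducers.append(("big", tuple(b)))
    half_bins = ffd(small, q / 2)                 # pair the small inputs among themselves
    reducers += list(combinations(range(len(half_bins)), 2))
    return reducers

if __name__ == "__main__":
    q, w_big = 10, 6                              # hypothetical sizes for illustration
    small = [3, 3, 2, 2, 2, 1, 1]                 # every small input has size <= q - w_big
    print(len(simple_big_input_schema(w_big, small, q)), "reducers")
```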
Theorem 5.19 (Upper bounds from algorithms) For a list of m inputs with a big input, i, of size q/2 < w_i < q and for a given reducer capacity q, q < s′ < s, an input is replicated to at most m − 1 reducers for the A2A mapping schema problem, and the number of reducers and the communication cost are at most m − 1 + 8s²/q² and (m − 1)q + 4s²/q, respectively, where s′ is the sum of all the input sizes except the size of the big input and s is the sum of all the input sizes.
The proof of the theorem is given in Appendix C.8.
Approximation factor. The optimal communication cost (from Theorem 5.1) is s²/q, and the communication cost of the algorithm (from Theorem 5.19) is (m − 1)q + 4s²/q. Thus, the ratio between the optimal communication and the communication of our mapping schema is approximately s²/(mq²).
5.7 An Approximation Algorithm for the X2Y Mapping Schema Problem
We propose an approximation algorithm for the X2Y mapping schema problem that is based on bin-packing algorithms. Two lists, X of m inputs and Y of n inputs, are given. We assume that the sum of the input sizes of the list X, denoted by sum_x, and of the list Y, denoted by sum_y, is greater than q. We analyze the algorithm on the criteria (the number of reducers and the communication cost) given in Section 5.1. We look at the lower bounds in Theorem 5.20, and Theorem 5.21 gives an upper bound from the algorithm. The bounds are given in Table 5.1.
Theorem 5.20 (Lower bounds on the communication cost and number of reducers) For a list X of m inputs, a list Y of n inputs, and a given reducer capacity q, the communication cost and the number of reducers, for the X2Y mapping schema problem,
are at least 2·sum_x·sum_y/q and 2·sum_x·sum_y/q², respectively.
The proof of the theorem is given in Appendix C.9.
Bin-packing-based approximation algorithm for the X2Y mapping schema problem. A solution to the X2Y mapping schema problem for different-sized inputs can be achieved using bin-packing algorithms. Let two lists, X of m inputs and Y of n inputs, be given. The algorithm will not work when one list holds an input of size w_i and the other list holds an input of size greater than q − w_i, because these two inputs cannot be assigned to a single reducer in common. Let the size of the largest input, i, of the list X be w_i; hence, all the inputs of the list Y have size at most q − w_i. We place the inputs of the list X to bins of size w_i, and let x bins be used to place the m inputs. Also, we place the inputs of the list Y to bins of size q − w_i, and let y bins be used to place the n inputs. Now, we consider each of the bins as a single input, and a solution to the X2Y mapping schema problem is obtained by assigning each of the x bins with each of the y bins to reducers. In this manner, we require x · y reducers.
Theorem 5.21 (Upper bounds from the algorithm) For a bin size b, a given reducer capacity q = 2b, and with each input of the lists X and Y being of size at most b, the number of reducers and the communication cost, for the X2Y mapping schema problem, are at most
4·sum_x·sum_y/b² and at most 4·sum_x·sum_y/b, respectively, where sum_x is the sum of the input sizes of the list X, and sum_y is the sum of the input sizes of the list Y.
The proof of the theorem is given in Appendix C.9.
Approximation factor. The optimal communication is 2·sum_x·sum_y/q. Thus, the ratio between the optimal communication and the communication of our mapping schema is 1/4.
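A small Python sketch of the bin-packing-based X2Y algorithm just described (our own code, hypothetical input sizes): pack X into bins sized by its largest input w_i, pack Y into bins of size q − w_i, and give every (X-bin, Y-bin) pair its own reducer.

```python
# A small sketch of the bin-packing-based X2Y algorithm: pack X into bins of
# size w_i (the largest input of X), pack Y into bins of size q - w_i, and send
# every (X-bin, Y-bin) pair to its own reducer, requiring x * y reducers.

from itertools import product

def ffd(sizes, bin_size):
    bins = []
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= bin_size:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins

def x2y_schema(X, Y, q):
    w = max(X)
    assert max(Y) <= q - w, "no common reducer can hold the two largest inputs"
    x_bins, y_bins = ffd(X, w), ffd(Y, q - w)
    return list(product(range(len(x_bins)), range(len(y_bins))))

if __name__ == "__main__":
    X = [4, 3, 3, 2, 2]          # hypothetical input sizes
    Y = [2, 2, 1, 1, 1, 1]
    print(len(x2y_schema(X, Y, q=8)), "reducers")
```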
Chapter 6
Meta-MapReduce
In the previous chapter, we investigated the impact on the communication cost when the locations of the input data and of the MapReduce computations are identical. However, an identical location of data and mappers-reducers cannot always be guaranteed. It may be, for example, that a user has a single local machine and wants to enlist a public cloud to help in data processing. Consequently, in such cases, it is required to move data to the location of mappers-reducers. Interested readers may refer to examples of MapReduce computations where the locations of data and mappers-reducers are different in our review paper [44]. In order to motivate and demonstrate the impact of different locations of data and mappers-reducers, we consider a real example of Amazon Elastic MapReduce¹. Amazon Elastic MapReduce (EMR) processes data that is stored in Amazon Simple Storage Service (S3)², where the locations of EMR and S3 are not identical. Hence, it is required to move data from S3 to the location of EMR. However, moving the whole dataset from S3 to EMR is not efficient if only a small specific part of it is needed for the final output. In this chapter, we are interested in minimizing the data transferred in order to avoid communication and memory overhead, as well as to protect data privacy as much as possible. In MapReduce, we transfer inputs to the site of mappers-reducers from the site of the user, and then several copies of inputs from the map phase to the reduce phase in each iteration, regardless of their involvement in the final output. If only a few inputs are required to compute the final output, then it is not communication efficient to move all the inputs to the site of mappers-reducers, and then the copies of the same inputs to the reduce phase. There are some works that consider the location of data [94, 96] in a restrictive manner and some works [15, 41, 86] that consider data movement from the map phase to the reduce phase.
Motivating Example: Equijoin of two relations X(A, B) and Y (B,C). Problem statement: The join of relations X(A, B) and Y (B,C), where the joining attribute is B,
1http://aws.amazon.com/elasticmapreduce/ 2http://aws.amazon.com/s3/
provides output tuples ⟨a, b, c⟩, where (a, b) is in X and (b, c) is in Y. In the equijoin of X(A, B) and Y (B,C), all tuples of both the relations with an identical value of the attribute B should appear together at the same reducer for providing the final output tuples. Communication cost analysis: We now investigate the impact of different locations of the relations and mappers-reducers on the communication cost. In Figure 6.1, the communication cost for joining the relations X and Y — where X and Y are located at two different clouds and the equijoin is performed on a third cloud — is the sum of the sizes of all three tuples of each relation, which are required to move from the location of the user to the location of mappers, and then from the map phase to the reduce phase. Consider that each tuple is of unit size; hence, the total communication cost is 12 for obtaining the final output.
Figure 6.1: Equijoin of relations X(A, B) and Y (B,C).
However, if only a few tuples have an identical B-value in both the relations, then it is wasteful to move the whole relations from the user's location to the location of mappers, and then the tuples from the map phase to the reduce phase. In Figure 6.1, two tuples of X and two tuples of Y have a common B-value (i.e., b1). Hence, it is not efficient to send the tuples having values b2 and b3, and by not sending the tuples with B-values b2 and b3, we can reduce the communication cost.
6.1 The System Setting
The system setting is an extension of the settings presented in Section 4.2, where we consider for the first time the locations of data and mappers-reducers. The setting is suitable for a variety of problems where at least two inputs are required to produce an output. In order to produce an output, we use the definition of the mapping schema, given in Section 4.2. The Model. The model is simple but powerful and assumes the following:
1. Existence of systems such as Spark, Pregel, or modern Hadoop.
2. A preliminary step at the user site, which owns the dataset, for finding metadata³ that has a smaller memory size than the original data.
3. Approximation algorithms (given in Chapter 5), at the cloud or at the global reducer in the case of Hierarchical MapReduce [82]. The approximation algorithms assign outputs of the map phase to reducers while regarding the reducer capacity. Particularly, in our case, the approximation algorithms will assign metadata to reducers in such a manner that the size of the actual data at a reducer will not exceed the reducer capacity and all the inputs that are required to produce an output are assigned to one reducer in common.
It should be noted that we are enhancing MapReduce and not creating an entirely new framework for large-scale data processing; thus, Meta-MapReduce is implementable in state-of-the-art MapReduce systems such as Spark, Pregel, or modern Hadoop.
6.2 Meta-MapReduce: Description
The idea behind the proposed technique is to process metadata at mappers and reducers, and to process the original required data only at the required iterations of a MapReduce job at reducers. In this manner, we suggest processing metadata at mappers and reducers at all the iterations of a MapReduce job. Therefore, the proposed technique is called Meta-MapReduce.⁴ We need to redefine the communication cost, which was defined in Section 4.1, to take into account the size of the metadata, the total amount of the (required) original data, and the different locations of data and computations.
The communication cost for metadata and data. In the context of Meta-MapReduce, the communication cost is the sum of the following:
Metadata cost: The total amount of metadata that is required to move from the location of users to the location of mappers (if the locations of data and mappers are different) and from the map phase to the reduce phase in each iteration of a MapReduce job.
Data cost: The total amount of required original data that is needed to move to reducers at the required iterations of a MapReduce job.
In Meta-MapReduce, users send metadata to the site of mappers, instead of the original data; see Figure 6.2. Now, mappers and reducers work on metadata, and at the required iterations of a MapReduce job, reducers call the required original data from the site of users (according to the assigned ⟨key, value⟩ pairs) and provide the desired result.
³The term metadata is used here in a different manner than usual: it represents a small subset of the dataset, which varies according to the task. The selection of metadata depends on the problem.
⁴The proposed algorithmic technique is similar to Bloomjoin [83, 25]. However, due to the distributed nature of data in a MapReduce job, it is not trivial to apply Bloom filters in MapReduce [75].
The detailed execution of Meta-MapReduce is given below:
Figure 6.2: Meta-MapReduce algorithmic approach.
STEP 1 Users create a master process that creates map tasks and reduce tasks at different compute nodes. A compute node that processes a map task is called a map worker, and a compute node that processes a reduce task is called a reduce worker.
STEP 2 Users send metadata, which varies according to the assigned MapReduce job, to the site of mappers. Also, the user creates an index, which varies according to the assigned job, on the entire database. For example, in the case of equijoin (see Figure 6.1), a user sends metadata for each of the tuples of the relations X(A, B) and Y (B,C) to the site of mappers. In this example, metadata for a tuple i (⟨a_i, b_i⟩, where a_i and b_i are values of the attributes A and B, respectively) of the relation X includes the size of all non-joining values (i.e., |a_i|⁵) and the value of b_i. Similarly, metadata for a tuple i (⟨b_i, c_i⟩, where b_i and c_i are values of the attributes B and C, respectively) of the relation Y includes the size of all non-joining values (i.e., |c_i|) with b_i (remember that the size of b_i is much smaller than the size of a_i or c_i). In addition, the user creates an index on the attribute B of both the relations X and Y.
STEP 3 In the map phase, a mapper processes an assigned input and provides some number of ⟨key, value⟩ pairs, which are known as intermediate outputs, where a value is the total size of the corresponding input data (which is included in the metadata). The master process is then informed of the location of the intermediate outputs. For example, in the case of equijoin, a mapper takes a single tuple i (e.g., ⟨|a_i|, b_i⟩) and generates some ⟨b_i, value⟩ pairs, where b_i is a key and a value is the size of the tuple i (i.e., |a_i|). Note that in the original equijoin example, a value is the whole data associated with the tuple i (i.e., a_i).
⁵The notation |a_i| refers to the size of an input a_i.
STEP 4 The master process assigns reduce tasks (by following a mapping schema) and provides information about the intermediate outputs, which serve as inputs to reduce tasks. A reducer is then assigned all the ⟨key, value⟩ pairs having an identical key, by following a mapping schema for the assigned job. Now, reducers perform the computation and call⁶ only the required data if there is only one iteration of the MapReduce job. On the other hand, if a MapReduce job involves more than one iteration, then reducers call the original required data at the required iterations of the job (we discuss multi-round MapReduce jobs using Meta-MapReduce in Section 6.3.3). For example, in the case of equijoin, a reducer receives all the ⟨b_i, value⟩ pairs from both the relations X and Y, where a value is the size of the tuple associated with the key b_i. Inputs (i.e., intermediate outputs of the map phase) are assigned to reducers by following a mapping schema for equijoin such that a reducer is not assigned more original inputs than its capacity, and after that, reducers invoke the call operation. Note that
a reducer that receives at least one tuple with key bi from both the relations X and Y produces outputs and requires original input data from the user’s site. However, if the
reducer receives tuples with the key b_i from a single relation only, the reducer does not request the original input tuples, since these tuples do not participate in the final output.
Following Meta-MapReduce, we now compute the communication cost involved in the equijoin example (see Figure 6.1). Recall that without using Meta-MapReduce, a solution to the equijoin problem (in Figure 6.1) requires 12 units of communication cost. However, using Meta-MapReduce for performing the equijoin, there is no need to send the tuple ⟨a_3, b_2⟩ of the relation X and the tuple ⟨b_3, c_3⟩ of the relation Y to the location of computation. Moreover, we send metadata of all the tuples to the site of mappers, and intermediate outputs containing metadata are transferred to the reduce phase, where reducers call only the desired tuples having the value b_1 from the user's site. Consequently, a solution to the problem of equijoin has only 4 units of cost plus a constant cost for moving metadata using Meta-MapReduce, instead of 12 units of communication cost.
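The cost comparison above can be replayed with a toy Python simulation (our own code; tuple values and unit sizes follow Figure 6.1): with plain MapReduce every tuple moves twice, whereas under Meta-MapReduce reducers see only per-key metadata and call just the tuples whose B-value appears in both relations.

```python
# A toy simulation of the equijoin example: plain MapReduce moves every tuple
# twice (user -> mappers, map -> reduce), while Meta-MapReduce moves metadata
# and calls only the tuples whose B-value occurs in both relations.

from collections import defaultdict

X = [("a1", "b1"), ("a2", "b1"), ("a3", "b2")]
Y = [("b1", "c1"), ("b1", "c2"), ("b3", "c3")]

def plain_cost(tuple_size=1):
    return 2 * (len(X) + len(Y)) * tuple_size           # every tuple moves twice

def meta_mapreduce_cost(tuple_size=1):
    by_key = defaultdict(lambda: [0, 0])                 # reducers see only metadata
    for _, b in X:
        by_key[b][0] += 1
    for b, _ in Y:
        by_key[b][1] += 1
    called = sum(nx + ny for nx, ny in by_key.values() if nx and ny)
    return called * tuple_size                           # plus a small metadata cost

if __name__ == "__main__":
    print("plain MapReduce:", plain_cost(), "units;",
          "Meta-MapReduce (data only):", meta_mapreduce_cost(), "units")
```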
Theorem 6.1 (The communication cost for join of two relations) Using Meta-MapReduce, the communication cost for the problem of join of two relations is at most 2nc + h(c + w) bits, where n is the number of tuples in each relation, c is the maximum size of a value of the joining attribute, h is the number of tuples that actually join, and w is the maximum required memory for a tuple.
The proof of the theorem is given in Appendix D.
6The call operation will be explained in Section 6.2.1.
Problems | Section | Theorem | Communication cost using Meta-MapReduce | Communication cost using MapReduce
Join of two relations | 6.2 | 6.1 | 2nc + h(c + w) | 4nw
Skewed values of the joining attribute | 6.2.2 | 6.2 | 2nc + rh(c + w) | 2nw(1 + r)
Join of two relations by hashing the joining attribute | 6.3.2 | 6.3 | 6n·log m + h(c + w) | 4nw
Join of k relations by hashing the joining attributes | 6.3.3 | 6.4 | 3knp·log m + h(c + w) | 2knw
(n: the number of tuples in each relation; c: the maximum size of a value of the joining attribute; r: the replication rate; h: the number of tuples that actually join; w: the maximum required memory for a tuple; p: the maximum number of dominating attributes in a relation; m: the maximal number of tuples in all given relations.)
Table 6.1: The communication cost for joining of relations using Meta-MapReduce.
6.2.1 The Call function
In this section, we will describe the call function that is invoked by reducers to obtain the original required inputs from the user's site to produce outputs. All the reducers that produce outputs require the original inputs from the site of users. Reducers can know whether they will produce outputs or not after receiving intermediate outputs from the map phase, and they then inform the corresponding mappers from which they have fetched these intermediate outputs (for simplicity, we can say that all reducers that will produce outputs send 1 to all the corresponding mappers to request the original inputs, and otherwise send 0). Mappers collect the requests for the original inputs from all the reducers and fetch the original inputs, if required, from the user's site by accessing the index file. Remember that in Meta-MapReduce, the user creates an index on the entire database according to the assigned job; refer to STEP 2 in Section 6.2. This index helps to access the required data that reducers want without doing a scan operation. Note that the call function can be easily implemented on recent implementations of MapReduce, e.g., Pregel and Spark.
For example, we can consider our running example of equijoin. In the case of equijoin, a reducer that receives at least one tuple with the key b_i from both the relations X(A, B) and Y (B,C) requires the original input from the user's site, and hence, the reducer sends 1 to the corresponding mappers. However, if the reducer receives tuples with the key b_i from a single relation only, the reducer sends 0. Consider that the reducer receives ⟨b_i, |a_i|⟩ of the relation X and ⟨b_i, |c_i|⟩ of the relation Y. The reducer sends 1 to the corresponding mappers that produced the ⟨b_i, |a_i|⟩ and ⟨b_i, |c_i|⟩ pairs. On receiving the requests for the original inputs from the reducer, the mappers access the index file to fetch a_i, b_i, and c_i, and then the mappers provide a_i, b_i, and c_i to the reducer.
6.2.2 Meta-MapReduce for skewed values of the joining attribute
Consider two relations X(A, B) and Y (B,C), where the joining attribute is B and the size of all the B values is very small as compared to the size of values of the attributes A and C. One or both of the relations X and Y may have a large number of tuples with an identical B-value. A value of the joining attribute B that occurs many times is known as a heavy hitter. In skew join of X(A, B) and Y (B,C), all the tuples of both the relations with an identical heavy hitter should appear together to provide the output tuples.
Figure 6.3: Skew join example for a heavy hitter, b1.
In Figure 6.3, b_1 is a heavy hitter; hence, it is required that all the tuples of X(A, B) and Y (B,C) with the heavy hitter, b_1, should appear together to provide the output tuples ⟨a, b_1, c⟩ (a ∈ A, b_1 ∈ B, c ∈ C), which depend on exactly two inputs. However, due to a single reducer — for joining all tuples with a heavy hitter — there is no parallelism at the reduce phase, and a single reducer takes a long time to produce all the output tuples of the heavy hitter. We can restrict reducers in a way that they can hold many tuples, but not all the tuples with the heavy-hitter value. In this case, we can reduce the time and use more reducers, which results in a higher level of parallelism at the reduce phase. But there is a higher communication cost, since each tuple with the heavy hitter must be sent to more than one reducer. We can solve the problem of skew join using Meta-MapReduce.
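One standard way to realize the restriction described above — hold many, but not all, heavy-hitter tuples per reducer — is to split each side into groups and give every pair of groups to a reducer; whether the thesis uses exactly this construction is not shown here, so the following Python sketch (our own code) is only an illustration of the trade-off between parallelism and replication.

```python
# A hedged sketch of one standard skew-join layout: X-tuples and Y-tuples that
# carry the heavy hitter are split into gx and gy groups, and each of the
# gx * gy reducers receives one group from each side, so no reducer holds all
# heavy-hitter tuples but every (X, Y) pair still meets; an X-tuple is
# replicated gy times and a Y-tuple gx times.

from itertools import product

def skew_join_assignment(x_tuples, y_tuples, gx, gy):
    reducers = {}
    for i, j in product(range(gx), range(gy)):
        reducers[(i, j)] = ([t for k, t in enumerate(x_tuples) if k % gx == i],
                            [t for k, t in enumerate(y_tuples) if k % gy == j])
    return reducers

if __name__ == "__main__":
    x_tuples = [f"a{i}" for i in range(6)]     # X-tuples sharing the heavy hitter b1
    y_tuples = [f"c{j}" for j in range(4)]     # Y-tuples sharing b1
    r = skew_join_assignment(x_tuples, y_tuples, gx=3, gy=2)
    assert all(any(a in xs and c in ys for xs, ys in r.values())
               for a in x_tuples for c in y_tuples)      # every pair meets somewhere
    print(len(r), "reducers; replication of an X-tuple:", 2)
```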
Theorem 6.2 (The communication cost for skew join) Using Meta-MapReduce, the communication cost for the problem of skew join of two relations is at most 2nc+rh(c+w) bits, where n is the number of tuples in each relation, c is the maximum size of a value of the joining attribute, r is the replication rate, h is the number of distinct tuples that actually join, and w is the maximum required memory for a tuple.
The proof of the theorem is given in Appendix D.
6.3 Extensions of Meta-MapReduce
We have presented the Meta-MapReduce framework for different locations of data and mappers-reducers. However, some extensions are required to use Meta-MapReduce for geo-distributed data processing, for handling large sizes of values of joining attributes, and for handling multi-round computations. In this section, we will provide three extensions of Meta-MapReduce.
6.3.1 Incorporating Meta-MapReduce in geo-distributed MapReduce
There are some extensions of the standard MapReduce for processing geo-distributed data at their locations. Details about geo-distributed MapReduce-based data processing frameworks are given in [44]. G-MR [69], G-Hadoop [108], and Hierarchical MapReduce (HMR) [82] are three implementations for geo-distributed data processing using MapReduce. These implementations assume that a cluster processes data using MapReduce and provides its outputs to one of the clusters, which provides the final outputs (by executing a MapReduce job on the received outputs of all the clusters). However, the transmission of the outputs of all the clusters to a single cluster for producing the final output is not efficient if not all the outputs of a cluster participate in the final output.
We can apply the Meta-MapReduce idea to systems such as G-MR, G-Hadoop, and HMR. Note that we do not change the basic functionality of these implementations. We take our running example of equijoin (see Figure 6.4, where we have three clusters, possibly on three continents; the first cluster has two relations U(A, B) and V (B,C), the second cluster has two relations W (D,B) and X(B,E), and the third cluster has two relations Y (F,B) and Z(B,G)) and assume that the data exist at the site of mappers in each cluster. In the final output, reducers perform the join operation over all the six relations, which share an identical B-value. The following three steps are required for obtaining the final outputs using an execution of Meta-MapReduce over G-MR, G-Hadoop, and HMR.
STEP 1 Mappers at each cluster process input data according to the assigned job and provide ⟨key, value⟩ pairs, where a value is the size of an assigned input. For example, in Figure 6.4, a mapper at Cluster 1 provides outputs of the form ⟨b_i, |a_i|⟩ or ⟨b_i, |c_i|⟩.
STEP 2 Reducers at each cluster provide partial outputs by following an assigned mapping schema, and the partial outputs, which contain only metadata, are transferred to one of the clusters, which will provide the final outputs. For example, in the case of equijoin, reducers at each cluster provide partial output tuples
as ⟨|a_i|, b_i, |c_i|⟩ at Cluster 1, ⟨|d_i|, b_i, |e_i|⟩ at Cluster 2, and ⟨|f_i|, b_i, |g_i|⟩ at Cluster 3 (by following a mapping schema for equijoin). The partial outputs of Cluster 1 and Cluster 3 have to be transferred to one of the clusters, say Cluster 2, for obtaining the final output.
STEP 3 A designated cluster for providing the final output processes all the outputs of the clusters by implementing the assigned job using Meta-MapReduce. Reducers that provide the final output call the original input data from all the clusters. For example, in equijoin, after receiving the outputs of Cluster 1 and Cluster 3, Cluster 2 implements two iterations for joining tuples. In the first iteration, the outputs of Clusters 1 and 2 are joined (by following a mapping schema for equijoin), and in the second iteration, the outputs of Cluster 3 and the output of the previous iteration are joined at reducers. A reducer in the second iteration provides the final output as ⟨|a_i|, b_1, |c_i|, |d_i|, |e_i|, |f_i|, |g_i|⟩ and calls all the original values of a_i, c_i, d_i, e_i, f_i, and g_i for providing the desired output, as suggested in Section 6.2.1.
Figure 6.4: Three clusters, each with two relations.
Communication cost analysis. In Figure 6.4, we are performing the equijoin in three clusters, and assuming that the data is available at the site of mappers in each cluster. In addition, we consider that each value takes two units of size; hence, any tuple, for example ⟨a_i, b_i⟩, has size 4 units. First, each of the clusters performs an equijoin within the cluster using Meta-MapReduce. Note that using Meta-MapReduce, there is no need to send any tuple from the map phase to the reduce phase within the cluster, while G-MR, G-Hadoop, and HMR do transfer data from the map phase to the reduce phase, and hence, result in 76 units of communication cost. Moreover, in G-MR, G-Hadoop, and HMR, the transmission of two tuples (⟨a_3, b_2⟩, ⟨a_4, b_2⟩) of U, one tuple (⟨b_2, c_2⟩) of V, two tuples (⟨d_2, b_2⟩, ⟨d_3, b_3⟩) of W, two tuples (⟨b_2, e_3⟩, ⟨b_4, e_4⟩) of X, two tuples (⟨f_2, b_5⟩, ⟨f_3, b_6⟩) of Y, and one tuple (⟨b_7, g_3⟩) of Z from the map phase to the reduce phase is useless, since they do not participate in the final output.
After computing outputs within the cluster, metadata of the outputs (i.e., the sizes of tuples associated with the key b_1 and the key b_2) is transmitted to Cluster 2. Here, it is important to note
that tuples with the value b_1 provide the final outputs. Using Meta-MapReduce, we will not send the complete tuples with the value b_2; hence, we also decrease the communication cost, while G-MR, G-Hadoop, and HMR send all the outputs of the first and third clusters to the second cluster. After receiving the outputs from the first and the third clusters, the second cluster performs two iterations as mentioned previously, and in the second iteration, a reducer for the key b_1 provides the final output. Following that, the communication cost is only 36 units. On the other hand, the transmission of outputs with data from the first cluster and the third cluster to the second cluster and performing two iterations result in 132 units of communication cost. Therefore, G-MR, G-Hadoop, and HMR require 208 units of communication cost, while Meta-MapReduce provides the final results using 36 units of communication cost.
6.3.2 Large size of joining values
We have considered that the sizes of joining values are very small as compared to the sizes of all the other, non-joining values. For example, in Figure 6.1, the sizes of all the values of the attribute B are very small as compared to all the values of the attributes A and C. However, considering a very small size of values of the joining attribute is not always realistic. All the values of the joining attribute may also require a considerable amount of memory, which may be equal to or greater than the sizes of the non-joining values. In this case, it is not useful to send all the values of the joining attribute with the metadata of the non-joining attributes. Thus, we enhance Meta-MapReduce for handling the case of large sizes of joining values. We consider our running example of the join of two relations X(A, B) and Y (B,C), where the size of each b_i is large enough that the value of b_i cannot be used as metadata. We use a hash function to obtain a short identifier (that is unique with high probability) for each b_i. We denote by H(b_i) the hash value of the original value of b_i. Here, Meta-MapReduce works as follows:
STEP 1 For all the values of the joining attribute (B), use a hash function such that an
identical b_i in both of the relations has a unique hash value with a high probability, and b_i and b_j, i ≠ j, receive two different hash values with a high probability.
STEP 2 For all the other, non-joining attributes' values (values corresponding to the attributes A and C), find metadata that includes the size of each of the values.
STEP 3 Perform the task using Meta-MapReduce, as follows: (i) Users send the hash values of joining attributes and the metadata of the non-joining attributes. For example, a user
sends hash value of bi (H(bi)) and the corresponding metadata (i.e., size) of values ai
or c_i to the site of mappers. (ii) A mapper processes an assigned tuple and provides intermediate outputs, where a key is H(b_i) and a value is |a_i| or |c_i|. (iii) Reducers
call all the values corresponding to a key (hash value), and if a reducer receives
metadata of a_i and c_i, then the reducer calls the original input data and provides the final output. Note that there is a possibility that two different values of the joining attribute have an identical hash value; hence, these two values are assigned to a reducer. However, the reducer will recognize these two different values when it calls the corresponding data. The reducer then notifies the master process, and a new hash function is used.
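A minimal Python sketch of the hashing step (our own code; the helper names and the 32-bit identifier length are assumptions): mappers emit H(b_i) as the key together with only the sizes of the non-joining values, and a reducer decides to call the original tuples only when it sees metadata from both relations.

```python
# A small sketch of Meta-MapReduce with large joining values: the key is a
# short hash H(b_i) of the joining value, and the value carries only metadata
# (the size of the non-joining value); reducers call original tuples only when
# metadata from both relations arrives under the same key.

import hashlib
from collections import defaultdict

def H(value, nbytes=4):
    """A short identifier for a (large) joining value; collisions are possible but rare."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:nbytes], "big")

def map_X(a, b):                     # a tuple (a, b) of X(A, B)
    return (H(b), ("X", len(a)))     # metadata only: |a|

def map_Y(b, c):                     # a tuple (b, c) of Y(B, C)
    return (H(b), ("Y", len(c)))     # metadata only: |c|

if __name__ == "__main__":
    pairs = [map_X("a" * 1000, "key-1"), map_Y("key-1", "c" * 800), map_Y("key-2", "c" * 5)]
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for key, values in groups.items():
        call = {rel for rel, _ in values} == {"X", "Y"}   # metadata from both relations?
        print(key, "-> call original tuples" if call else "-> skip")
```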
Theorem 6.3 (The communication cost when joining attributes are large) Using Meta-MapReduce for the problem of join where the values of joining attributes are large, the communication cost for the problem of join of two relations is at most 6n·log m + h(c + w) bits, where n is the number of tuples in each relation, m is the maximal number of tuples in the two relations, h is the number of tuples that actually join, and w is the maximum required memory for a tuple.
The proof of the theorem is given in Appendix D.
6.3.3 Multi-rounds computations
We show how Meta-MapReduce can be incorporated in a multi-round MapReduce job, where the values of joining attributes are as large as the values of non-joining attributes. In order to explain the working of Meta-MapReduce in a multi-iteration MapReduce job, we consider an example of the join of four relations U(A, B, C, D), V (A, B, D, E), W (D,E,F ), and X(F, G, H), and perform the join operation using a cascade of two-way joins.
STEP 1 Find dominating attributes in all the relations. An attribute that occurs in more than one relation is called a dominating attribute [17]. For example, in our running example, the attributes A, B, D, E, and F are dominating attributes.
STEP 2 Implement a hash function over all the values of dominating attributes so that all the identical values of dominating attributes receive an identical hash value with a high probability, and all the different values of dominating attributes receive different hash values with a high probability.
For example, identical values of a_i, b_i, d_i, e_i, and f_i receive an identical hash value, and any two values a_i and a_j, such that i ≠ j, probably receive different hash values (a similar case exists for different values of the attributes B, D, E, F).
STEP 3 For all the other, non-dominating joining attributes' values (an attribute that occurs in only one of the relations), we find metadata that includes the size of each of the values.
STEP 4 Now perform the 2-way cascade join using Meta-MapReduce and follow a mapping schema, according to the problem, for assigning inputs (i.e., outputs of the map phase) to reducers. For example, in the equijoin example, we may join the relations as follows: first, join the relations U and V, and then join the relation W to the outputs of the join of the relations U and V. Finally, we join the relation X to the outputs of the join of the relations U, V, and W. Thus, we join the four relations using three iterations of Meta-MapReduce, and in the final iteration, reducers call the original required data.
Example: Following our running example, in the first iteration, a mapper produces ⟨H(a_i), [H(b_i), |c_i|, H(d_i)]⟩ after processing a tuple of the relation U and ⟨H(a_i), [H(b_i), H(d_i), H(e_i)]⟩ after processing a tuple of the relation V (where H(a_i) is a key). A reducer corresponding to H(a_i) provides ⟨H(a_i), H(b_j), |c_k|, H(d_l), H(e_z)⟩ as outputs. In the second iteration, a mapper produces ⟨H(d_i), [H(a_i), H(b_j), |c_k|, H(e_z)]⟩ and ⟨H(d_i), [H(e_i), H(f_i)]⟩ after processing the outputs of the first iteration and the relation W, respectively. Reducers in the second iteration provide output tuples by joining tuples that have an identical H(d_i). In the third iteration, a mapper produces ⟨H(f_i), [H(a_i), H(b_i), |c_i|, H(d_i), H(e_i)]⟩ or ⟨H(f_i), [|g_i|, |h_i|]⟩, and reducers perform the final join operations. A reducer, for the key H(f_i), receives |g_i| and |h_i| from the relation X and the output tuples of the second iteration, and provides the final output by calling the original input data from the location of the user.
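STEP 1 above is easy to make precise; the following tiny Python helper (our own code) extracts the dominating attributes from the four example schemas — only the values of these attributes are hashed before the cascade of two-way joins.

```python
# A tiny sketch of STEP 1: a dominating attribute is one that occurs in more
# than one relation schema; only values of dominating attributes are hashed.

from collections import Counter

def dominating_attributes(schemas):
    counts = Counter(attr for attrs in schemas.values() for attr in attrs)
    return {a for a, c in counts.items() if c > 1}

if __name__ == "__main__":
    schemas = {"U": ("A", "B", "C", "D"), "V": ("A", "B", "D", "E"),
               "W": ("D", "E", "F"), "X": ("F", "G", "H")}
    print(sorted(dominating_attributes(schemas)))   # ['A', 'B', 'D', 'E', 'F']
```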
Theorem 6.4 (The communication cost for k relations and when joining attributes are large) Using Meta-MapReduce for the problem of join where the values of joining attributes are large, the communication cost for the problem of join of k relations, each of the relations with n tuples, is at most 3knp·log m + h(c + w) bits, where n is the number of tuples in each relation, m is the maximal number of tuples in the k relations, p is the maximum number of dominating attributes in a relation, h is the number of tuples that actually join, and w is the maximum required memory for a tuple.
The proof of the theorem is given in Appendix D.
Chapter 7
Interval Join
In the previous Chapters 4, 5, and 6, we focused on inputs having different sizes, and accordingly, put a constraint in terms of the reducer capacity. Now, we will focus on identical-sized inputs. Thereby, all the reducers may hold an identical number of inputs. A model to investigate the communication cost in the case of identical-sized inputs is proposed by Afrati et al. [15]. In this model, the term reducer size is defined as the maximum number of inputs that can be assigned to a reducer. In this chapter, we will use the model proposed by Afrati et al. [15] and focus on the problem of interval join of overlapping intervals that is a type of the X2Y mapping schema problem. We will study three cases, where we consider three types of intervals: (i) unit-length and equally spaced, (ii) variable-length and equally spaced, and (iii) equally-spaced with specific distribution of the various lengths. MapReduce-based 2-way and multiway interval join algorithms of overlapping intervals without regarding the reducer size are presented in [31]. However, the analysis of a lower bound on replication of individual intervals is not presented; neither is an analysis of the replication rate of the algorithms offered therein.
7.1 Preliminaries
7.1.1 An example
Employees involved in the phases of a project. We show an example to illustrate temporal relations (a relation that stores data involving timestamps), intervals, and the need for interval join of overlapping intervals. Consider two (temporal) relations (i) Project(Phase, Duration) that includes several phases of a project with their durations, and (ii) Employee(EmpId, Name, Duration) that shows data of employees according to their involvement in the project’s phases and their durations; see Figure 7.1. Here, the
62 duration of a phase or the duration of an employee’s involvement in a phase is given by an interval.
푃ℎ푎푠푒 퐷푢푟푎푡푖표푛 퐸푚푝퐼푑 푁푎푚푒 퐷푢푟푎푡푖표푛 푃푟표푗푒푐푡
Requirement 1-Mar – 푒1 U 1-Apr – RA Analysis 1-May 1-June D (RA) 푒2 V 1-June – C Design (D) 1-Apr – 1-Aug 1-June 퐸푚푝푙표푦푒푒 푒3 W 1-Apr – 푒 Coding (C) 1-May – 1-July 1 1-Aug 푒2 푒4 X 1-Mar – 1-June 푒3 푒 푒5 Y 1-Mar – 4 1-Aug 푒 5 1-Mar 1-Apr 1-May 1-June 1-July 1-Aug 푃푟표푗푒푐푡 퐸푚푝푙표푦푒푒
Figure 7.1: Two temporal relations (Project(Phase, Duration) and Employee(EmpId, Name, Duration)) and their representation on a time diagram.
It is interesting to find all the employees that worked in more than one phase of the project. Formally, a query: find the name of all employees who worked in more than one phase of the project; requires us to join the relations to find all overlapping intervals of the relations. The answer to this query for our example will be a list of four employees such as employees U, W, X, and Y.
7.1.2 Formal problem statement
We consider the problem of interval join of overlapping intervals, where two relations X and Y are given. Each relation contains binary tuples that represent intervals, i.e., each tuple corresponds to an interval and contains the starting-point and ending-point of this interval. Each pair of intervals x , y , where x X and y Y , i, j, such that intervals h i ji i ∈ j ∈ ∀ xi and yj share at least one common time, corresponds to an output.
7.1.3 The setting
i i i i i A (time) interval, i, is represented by a pair of times [Ts ,Te ], Ts < Te , where Ts is the i starting-point of the interval i and Te is the ending-point of the interval i. The duration i i i i from Ts to Te is the length of the interval i. Two intervals, say interval i with [Ts ,Te ] j j and interval j with [Ts ,Te ], may be related based on several operations between them, i i j j j j i i e.g., after (Ts < Te < Ts < Te ), before (Ts < Te < Ts < Te ), and overlap (T i T j < T i T j, T j T i < T j T i, T i < T i = T j < T j, or T i = T j < T i = T j). s ≤ s e ≤ e s ≤ s e ≤ e s e s e s s e e Interested readers can refer to Figure 1 of [31].
We are interested in the overlap operation among intervals. Two intervals are called overlapping intervals if the intersection of the intervals is nonempty, one interval contains the other, or the intervals are superimposed.
Mapping Schema. A MapReduce job can be described by a mapping schema [15]. A mapping schema assigns each input (i.e., an interval in this problem) to a number of reducers (via the formation of key-value pairs) so that:
1. No reducer is assigned more than q inputs (intervals in this case).
2. For each output (i.e., a pair of overlapping intervals, in this problem), there must exist a reducer that receives the corresponding pair of inputs (i.e., overlapping intervals) that participate in the computation of this output.
Here, the point (1) puts a constraint on each reducer, defined as the reducer size. The point (2) provides a mapping between inputs and outputs so that the algorithm produces the desired results. The replication rate of a mapping schema is the average number of key-value pairs for each interval and is a significant performance parameter in a MapReduce job. We analyze here lower and upper bounds on the replication rate for the problem of overlapping intervals.
Parameters for bound analysis. We now define two parameters to analyze the problem of interval join of overlapping intervals, as follows:
1. Replication rate. Intervals need to be replicated to different numbers of reducers. We therefore consider the number of reducers to which each individual input is sent; the replication rate [15] is a parameter that captures the average replication of intervals. Formally, the replication rate is the average number of key-value pairs created for an interval.
2. Communication cost. The communication cost is the total number of bits that must be transferred from the map phase to the reduce phase.
Table 7.1 summarizes all the results in this chapter.
The lower bounds:
  Unit-length and equally spaced intervals — Theorem 7.1 — replication rate at least 2n/(qk)
  Variable-length and equally spaced intervals — Theorem 7.6 — replication rate at least 2lmin/(qs)
The upper bounds:
  Unit-length and equally spaced intervals — Algorithm 10, Theorem 7.4 — replication rate at most 4n/((q − n/k)k)
  Variable-length and equally spaced (big-small) intervals — Algorithm 10(a), Theorem 7.9 — replication rate at most 4lmin/((q − lmin/s)s)
Table 7.1: The bounds for interval joins of overlapping intervals.
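To make the above definitions concrete, here is a small Python sketch (the function names and data layout are my own illustration, not part of the thesis): it checks whether a given assignment of intervals to reducers is a valid mapping schema for reducer size q and, if so, returns its replication rate.

    def overlaps(a, b):
        """True if two (start, end) intervals share at least one common time."""
        return a[0] <= b[1] and b[0] <= a[1]

    def check_mapping_schema(xs, ys, assignment, q):
        """xs, ys: lists of (start, end) intervals from relations X and Y.
        assignment: dict mapping a reducer id to the set of tagged inputs
        ('X', interval) or ('Y', interval) sent to it.
        Returns the replication rate if the assignment is a valid mapping
        schema for reducer size q, and None otherwise."""
        # Condition (1): no reducer is assigned more than q inputs.
        if any(len(inputs) > q for inputs in assignment.values()):
            return None
        # Condition (2): every overlapping (x, y) pair meets at some reducer.
        for x in xs:
            for y in ys:
                if overlaps(x, y) and not any(
                        ('X', x) in inputs and ('Y', y) in inputs
                        for inputs in assignment.values()):
                    return None
        # Replication rate: average number of key-value pairs per interval.
        total_key_values = sum(len(inputs) for inputs in assignment.values())
        return total_key_values / (len(xs) + len(ys))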
7.2 Unit-Length and Equally Spaced Intervals
We start with a special case of unit-length and equally spaced intervals. In reality, we are not expecting the data to be so regular, but looking at this case helps us to derive a lower bound on the replication rate for any general algorithm that handles arbitrary collections of intervals. Two relations X and Y, each of n unit-length intervals, are given. We assume that all the intervals have their starting-points in [0, k), i.e., there is no interval that starts before 0 or at k. Thus, the space between every two successive intervals is k/n < 1. In other words, the first interval starts at time 0, the second interval starts at time k/n, the third interval starts at time 2k/n, and the last (nth) interval starts at time k − k/n; see Figure 7.2.
Figure 7.2: An example of unit-length and equally spaced intervals, where n = 9 and k = 2.25.
The output we want to produce is the set of all pairs of intervals such that one interval overlaps with the other interval in the pair. The problem is not really interesting if all these intervals exist in the input. The real assumption is that some fraction of them exist, and the reducer size q is selected so that the expected number of inputs that actually arrive at a given reducer is within the desired limit, e.g., no more than what can be processed in main memory. Recall that a solution to the problem of interval join of overlapping unit-length and equally spaced intervals is a mapping schema that assigns each pair of overlapping intervals, one from X and one from Y, to at least one reducer in common, without exceeding q inputs at any one reducer.
Since every two consecutive intervals have an equal space (k/n), an interval x_i ∈ X that is not at one of the ends¹ overlaps with at least 2⌊1/(k/n)⌋ + 1 = 2⌊n/k⌋ + 1 intervals of Y, because:
1. n/k intervals of the relation Y have their ending-points between the starting-point and the ending-point of x_i.
2. n/k intervals of the relation Y have their starting-points between the starting-point and the ending-point of x_i.
3. There is an interval y_i ∈ Y that has the same end-points as x_i.
¹Specifically, x_i does not have its starting-point before 1 or after k − 1.
In this section, we will show a lower bound on the replication rate and the communication cost for interval join of overlapping unit-length and equally spaced intervals. Then, we provide an algorithm, its correctness, and an upper bound on the replication rate obtained by the algorithm.
Assumption. For the remainder of this chapter, we assume for simplicity that k divides n evenly.
Theorem 7.1 (Minimum replication rate) Let there be two relations: X and Y, each of them containing n unit-length and equally spaced intervals in the range [0, k), and let q be the reducer size. The replication rate for joining each interval of the relation X with all its overlapping intervals of the relation Y is at least 2n/(qk).
The proof of the above theorem appears in Appendix E.1.
Theorem 7.2 (Minimum communication cost) Let there be two relations: X and Y, each of them containing n unit-length and equally spaced intervals in the range [0, k), and let q be the reducer size. The communication cost for joining each interval of the relation X with all its overlapping intervals of the relation Y is at least 4n²/(qk).
The proof of the above theorem appears in Appendix E.1.
Algorithm 10. An algorithm for 2-way interval join is given in [31], without regard to the reducer size and without any analysis of bounds on the replication of an interval. We include the concept of the reducer size and modify the algorithm given in [31] for realistic scenarios. Two relations X and Y, each of them containing n unit-length and equally spaced intervals, are the inputs to the algorithm. Recall that it is expected that not all possible intervals are present. We divide the time-range from 0 to k into equal-sized blocks of length w = (q − c)/(4c), where c = n/k. Suppose this partitioning of the time-range yields P blocks. We now arrange P reducers, one for each block. We consider a block p_i, 1 ≤ i ≤ P, and assign all the intervals of the relation X that exist in the block p_i to the ith reducer. In addition, we assign all the intervals of the relation Y that have their starting- or ending-point in the block p_i to the ith reducer.
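The map phase implied by Algorithm 10 can be sketched as follows; this is only an illustration under the assumptions that intervals are given as (start, end) pairs and that the block length w has already been fixed as above (the function name and key-value layout are hypothetical).

    import math

    def algorithm_10_map(xs, ys, k, w):
        """Emit (reducer, input) pairs for Algorithm 10.  The time-range [0, k)
        is cut into blocks of length w; reducer b owns block [b*w, (b+1)*w)."""
        num_blocks = math.ceil(k / w)
        pairs = []
        for x in xs:                                   # relation X
            first = int(x[0] // w)                     # every block that x crosses
            last = min(int(x[1] // w), num_blocks - 1)
            for b in range(first, last + 1):
                pairs.append((b, ('X', x)))
        for y in ys:                                   # relation Y: only the blocks
            start_block = int(y[0] // w)               # holding y's starting- or
            end_block = min(int(y[1] // w), num_blocks - 1)  # ending-point
            for b in {start_block, end_block}:
                pairs.append((b, ('Y', y)))
        return pairs

Each reducer then pairs the X-intervals that cross its block with the Y-intervals whose starting- or ending-point falls in its block, and outputs the pairs that actually overlap.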
Theorem 7.3 (Algorithm correctness) Let there be two relations: X and Y, each of them containing n unit-length and equally spaced intervals in the range [0, k). Let w be the length of a block, and let q = 4wc + c be the reducer size, where c = n/k. Algorithm 10 assigns each pair of overlapping intervals to at least one reducer in common.
The proof of the above theorem appears in Appendix E.1.
From Algorithm 10, it is clear that an interval of the relation X is assigned to the reducers corresponding to the blocks that the interval crosses, and an interval of the relation Y is assigned to at most two reducers. Now, we will present an upper bound on the replication rate and the communication cost for interval join of overlapping unit-length and equally spaced intervals.
Theorem 7.4 (Maximum replication rate) Let there be two relations: X and Y, each of them containing n unit-length and equally spaced intervals in the range [0, k). Let w be the length of a block, and let q = 4wc + c < 2n be the reducer size, where c = n/k. The replication of an interval, for joining each interval of the relation X with all its overlapping intervals of the relation Y, is (i) at most 2 for w ≥ 1 and (ii) at most 4c/(q − c) for w < 1.
The proof of the above theorem appears in Appendix E.1.
Theorem 7.5 (Maximum communication cost) Let there be two relations: X and Y, each of them containing n unit-length and equally spaced intervals in the range [0, k). Let w be the length of a block, and let q = 4wc + c < 2n be the reducer size, where c = n/k. The communication cost for joining each interval of the relation X with all its overlapping intervals of the relation Y is (i) at most 4n for w ≥ 1 and (ii) at most 8nc/(q − c) for w < 1.
The proof of the above theorem appears in Appendix E.1.
7.3 Variable-Length and Equally Spaced Intervals
Now, we focus on a realistic scenario where two relations X and Y, each of them containing n intervals, are given such that the intervals can have non-identical lengths but adjacent intervals have equal spacing. We assume that the first interval starts at time 0 and that the space (s < 1) between the beginnings of successive intervals is the same, but the endpoints will have quite different spacing; see Figure 7.3, where the relation X has 6 intervals and the relation Y also has 6 intervals. A solution to the problem of interval join of overlapping variable-length and equally spaced intervals is a mapping schema such that each pair of overlapping intervals is sent to at least one reducer in common without exceeding q inputs at any one reducer. We consider two types of intervals, as follows:
1. Big and small intervals: one of the relations, say X, holds intervals of length l and the other relation, say Y, holds intervals of length l′ ≥ l; we call the intervals of the relations X and Y small intervals and big intervals, respectively.
2. Different-length intervals: all the intervals of both relations are of different lengths.
Figure 7.3: An example of big and small length but equally spaced intervals, where n = 6 and s = 0.7.
Throughout this section, we will use the following notations: lmax : the maximum length of an interval, lmin : the minimum length of an interval, and w: length of a block.
7.3.1 Big and small intervals
In this section, we consider a special case of variable-length and equally spaced intervals, where all the intervals of the two relations X and Y have length lmin and lmax, respectively, such that lmin ≤ lmax; see Figure 7.3. We call the intervals of the relations X and Y small intervals and big intervals, respectively. Since every two successive intervals have an equal space, s, an interval x_i ∈ X of length lmin can overlap with at least 2lmin/s + 1 intervals of the relation Y, because:
1. lmin/s intervals of the relation Y have their ending-points between the starting- and the ending-points of x_i.
2. lmin/s intervals of the relation Y have their starting-points between the starting- and the ending-points of x_i.
3. There is an interval y_i ∈ Y that has an identical starting-point as x_i.
Now, we provide a lower bound on the replication rate for interval join of overlapping big- and small-length but equally spaced intervals.
Assumption. For the remainder of this chapter, we assume for simplicity that s divides lmin evenly.
Theorem 7.6 (Minimum replication rate) Let there be two relations: X containing n small and equally spaced intervals and Y containing n big and equally spaced intervals, and let q be the reducer size. Let s be the spacing between every two successive intervals, and let lmin be the length of the smallest interval. The replication rate for joining each interval of the relation X with all its overlapping intervals of the relation Y is at least 2lmin/(qs).
The proof of the above theorem appears in Appendix E.2.
Theorem 7.7 (Minimum communication cost) Let there be two relations: X containing n small and equally spaced intervals and Y containing n big and equally spaced intervals, and let q be the reducer size. Let s be the spacing between every two successive intervals,
and let lmin be the length of the smallest interval. The communication cost for joining each interval of the relation X with all its overlapping intervals of the relation Y is at least 4n·lmin/(qs).
The proof of the above theorem appears in Appendix E.2.
Algorithm 10(a). Algorithm 10(a) for interval join of overlapping intervals of a relation X of small and equally spaced intervals and a relation Y of big and equally spaced intervals works in a way similar to Algorithm 10. However, Algorithm 10(a) creates P blocks of the time-range (from 0 to ns), each of length w = (q − c)/(4c), where c = lmin/s. After that, we follow the same procedure as followed in Algorithm 10. Note that in Algorithm 10(a), small intervals are assigned to several reducers corresponding to the blocks that they cross, and large intervals are assigned to only two reducers, corresponding to the blocks of their starting- and ending-points.
Theorem 7.8 (Algorithm correctness) Let there be two relations: X containing n small and equally spaced intervals and Y containing n big and equally spaced intervals. Let w be the length of a block, let s be the spacing between every two successive intervals, and let lmin be the length of the smallest interval. Let q = 4wc + c be the reducer size, where c = lmin/s. Algorithm 10(a) assigns each pair of overlapping intervals to at least one reducer in common.
We can prove the above theorem in a way similar to Theorem 7.3.
Theorem 7.9 (Maximum replication rate) Let there be two relations: X containing n small and equally spaced intervals and Y containing n big and equally spaced intervals. Let s be the spacing between every two successive intervals, let w be the length of a block, and let lmin be the length of the smallest interval. Let q = 4wc + c be the reducer size, where c = lmin/s. The replication rate for joining each interval of the relation X with all its overlapping intervals of the relation Y is (i) at most 2 for w ≥ lmin and (ii) at most 4c/(q − c) for w < lmin.
The proof of the above theorem appears in Appendix E.2.
Theorem 7.10 (Maximum communication cost) Let there be two relations: X containing n small and equally spaced intervals and Y containing n big and equally spaced intervals. Let s be the spacing between every two successive intervals, let w be the length of a block, and let lmin be the length of the smallest interval. Let q = 4wc + c be the reducer size, where c = lmin/s. The communication cost for joining each interval of the relation X with all its overlapping intervals of the relation Y is (i) at most 4n for w ≥ lmin and (ii) at most 8nc/(q − c) for w < lmin.
The proof of the above theorem appears in Appendix E.2.
7.3.2 Different-length intervals
We consider a case of different-length and equally spaced intervals. Let there be two relations: X and Y, each of them containing n different-length intervals, and let s be the spacing between every two successive intervals; see Figure 7.4. For this case, the lower bound on the replication rate for joining each interval of the relation X with all its overlapping intervals of the relation Y is the same as given in Theorem 7.6.
Figure 7.4: An example of different-length but equally spaced intervals, where n = 6 and s = 0.7.
Algorithm 10(b). Algorithm 10(b) for interval join of overlapping different-length and equally spaced intervals, which belong to two relations X and Y , each of them containing n intervals, works identically to Algorithms 10 and 10(a). Let lmax be the length of the largest interval. Algorithm 10(b) divides the time-range from 0 to ns into P blocks, each
of length w = (q − c)/(4c), where c = lmax/s. After that, we follow the same procedure as followed in Algorithm 10.
Theorem 7.11 (Algorithm correctness) Let there be two relations: X and Y, each of them containing n different-length and equally spaced intervals. Let w be the length of a block, let s be the spacing between every two successive intervals, and let lmax be the length of the largest interval. Let q = 4wc + c be the reducer size, where c = lmax/s. Algorithm 10(b) assigns each pair of overlapping intervals to at least one reducer in common.
We can prove the above theorem in a way similar to Theorem 7.3.
7.3.3 An upper bound for the general case
In this section, we show an algorithm and an upper bound on the replication rate for the problem of interval join of variable-length but equally spaced intervals. We use the following notations: 1. T : the length of time in which all intervals exist, i.e., all intervals begin at some time greater than or equal to 0 and end by time T .
2. n: the number of intervals in each of the two relations, X and Y.
3. S: the total length of all the intervals in one relation.
4. w: the length of time corresponding to one reducer, i.e., we divide T into T/w equal-length segments, each of length w.
Algorithm 10(c). Algorithm 10(c) works in a manner similar to Algorithms 10, 10(a), and 10(b). But this algorithm does more than Algorithms 10, 10(a), and 10(b): it finds all intervals that intersect, regardless of whether they overlap, are superimposed, or are related in any other way. We divide the time-range into T/w equal-sized blocks and arrange T/w reducers, one for each block. After that, we follow the same procedure as followed in Algorithm 10.
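In Theorems 7.12 and 7.13 below, the reducer size is q = (3nw + S)/T. The following small sketch (the function name is hypothetical) simply inverts this relation to obtain the block length w and the number of reducers T/w for a target reducer size q.

    import math

    def blocks_for_general_case(q, n, S, T):
        """Given reducer size q, the number n of intervals per relation, the total
        interval length S, and the time-range T, return the block length w implied
        by q = (3*n*w + S)/T and the number of reducers T/w."""
        w = (q * T - S) / (3 * n)        # solve q = (3*n*w + S)/T for w
        if w <= 0:
            raise ValueError("q is too small; we need q > S/T")
        return w, math.ceil(T / w)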
Theorem 7.12 (Algorithm correctness) Let there be two relations: X and Y, each of them containing n intervals. Let S be the total length of all the intervals in one relation, let w be the length of a block, let T be the length of time in which all intervals exist, and let q = (3nw + S)/T be the reducer size. Algorithm 10(c) assigns each pair of overlapping intervals to at least one reducer in common.
The proof of the above theorem appears in Appendix E.2.1.
Theorem 7.13 (Replication rate) Let there be two relations: X and Y, each of them containing n intervals. Let S be the total length of all the intervals in one relation, let w be the length of a block, let T be the length of time in which all intervals exist, and let q = (3nw + S)/T be the reducer size. The replication rate for joining each interval of the relation X with all its overlapping intervals of the relation Y is at most (3/2) · S/(qT − S).
The proof of the above theorem appears in Appendix E.2.1.
Chapter 8
Computing Marginals of a Data Cube
In this chapter, we continue to explore the communication cost in the case of identical-sized inputs, as we did in Chapter 7, and focus on the problem of computing marginals of a data cube. A data cube is a tool for analyzing high-dimensional data. For example, consider Figure 8.1, which shows a 3-dimensional data cube with dimensions such as Time, User, and City. This cube stores the total number of users accessing social media at a particular time across different cities. The number of dimensions can be increased by adding dimensions such as Day, Month, Year, and Site_name. A user may easily answer a query such as: the total number of users from New York and London accessing the Website at 8am, 9am, and 10am. Related work on data cubes is presented in Appendix F.1.1
8.1 Preliminaries
8.1.1 Marginals
Consider an n-dimensional data cube and the computation of its marginals by MapReduce. A marginal of a data cube is the aggregation of the data in all those tuples that have fixed values in a subset of the dimensions of the cube. We assume this aggregation is the sum, but the exact nature of the aggregation is unimportant in what follows. If the value in a dimension is fixed, then the fixed value represents the dimension. If the dimension is aggregated, then there is a * for that dimension. The number of dimensions over which we aggregate is the order of the marginal.
Example 8.1 Suppose n = 5, and the data cube is a relation DataCube(D1, D2, D3, D4, D5, V). Here, D1 through D5 are the dimensions, and V is the value that is aggregated.
1I am thankful to Prof. Foto Afrati, Prof. Jeffrey Ullman, and Prof. Jonathan Ullman who helped me a lot to write the content of this chapter.
Figure 8.1: An example of a 3-dimensional data cube, with axes Time and City (New York, London, Tel Aviv) and the number of Users as the stored values.
SELECT SUM(V) FROM DataCube WHERE D1 = 10 AND D3 = 20 AND D4 = 30;
will sum the data values in all those tuples that have value 10 in the first dimension, 20 in the third dimension, 30 in the fourth dimension, and any values in the second and fifth dimensions of a five-dimensional data cube. We can represent this marginal by the list [10, ∗, 20, 30, ∗], and it is a second-order marginal.
We make the simplifying assumption that in each dimension there are d different values. In practice, we do not expect to find that each dimension really has the same number of values. For example, if one dimension represents Amazon customers, there would be millions of values in this dimension. If another dimension represents the date on which a purchase was made, there would “only” be thousands of different values.
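As a purely illustrative aside (not one of the thesis algorithms), a marginal written in this list notation can be computed from a collection of tuples as follows; the tuple layout and the function name are assumptions.

    def marginal(tuples, spec):
        """tuples: iterable of (d_1, ..., d_n, v); spec: a list with a concrete
        value for each fixed dimension and '*' for each aggregated dimension.
        Returns the marginal, i.e., the sum of v over all matching tuples."""
        return sum(t[-1] for t in tuples
                   if all(s == '*' or t[i] == s for i, s in enumerate(spec)))

    # The SQL query above corresponds to:
    # total = marginal(data_cube_tuples, [10, '*', 20, 30, '*'])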
8.1.2 Mapping schema
We define a mapping schema in order to compute all the kth-order marginals, as follows: A mapping schema assigns a set of inputs (the tuples or points of the data cube, here) to a number of reducers so that 1. No reducer is associated with more than q inputs, and 2. For every output (the values of the marginals), there is some reducer that is associated with all the inputs that output needs for its computation. Here, the point (1) puts a constraint on each reducer, defined as the reducer size. The point (2) provides a mapping between inputs and outputs so that the algorithm produces the desired results.
8.1.3 Naïve solution: computing one marginal per reducer
Consider the problem of computing all the marginals of a data cube in the above model. If we are not careful, the problem becomes trivial. The marginal that aggregates over all dimensions is an output that requires all d^n inputs of the data cube. Thus, q = d^n is necessary to compute all the marginals. But that means we need a single reducer as large as the entire data cube, if we are to compute all marginals in one round. As a result, it only makes sense to consider the problem of computing a limited set of marginals in one round.
The kth-order marginals are those that fix n − k dimensions and aggregate over the remaining k dimensions. To compute a kth-order marginal, we need q ≥ d^k, since such a marginal aggregates over d^k tuples of the cube. Thus, we could compute all the kth-order marginals with q = d^k, using one reducer for each marginal. We can represent the problem in terms of a mapping schema such that there are d^n inputs, d^(n−k) · (n choose k) outputs, each representing one of the marginals, and q = d^k. Each output is connected to the d^k inputs over which it aggregates. Each input contributes to (n choose k) marginals – those marginals that fix n − k out of the n dimensions in a way that agrees with the tuple in question. That is, for q = d^k, we can compute all the kth-order marginals with a replication rate r equal to (n choose k). For q = d^k, there is nothing better we can do. However, when q is larger, we have a number of options, and the purpose of this chapter is to explore these options.
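A minimal sketch of the naïve map function just described, assuming each cube tuple is represented as (d_1, ..., d_n, v); the key names one kth-order marginal, so each input is replicated (n choose k) times, and the reducer simply sums the values it receives.

    from itertools import combinations

    def naive_map(cube_tuple, n, k):
        """Key-value pairs for one data-cube tuple (d_1, ..., d_n, v) when every
        kth-order marginal gets its own reducer (q = d^k).  The key names the
        marginal: '*' in the k aggregated dimensions, fixed values elsewhere."""
        dims, v = cube_tuple[:n], cube_tuple[-1]
        for agg in combinations(range(n), k):          # dimensions to aggregate
            key = tuple('*' if i in agg else dims[i] for i in range(n))
            yield (key, v)                             # each reducer sums its values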
8.2 Computing Many Marginals at One Reducer
We study the tradeoff between reducer size and the replication rate for computing the kth-order marginals.
8.2.1 Covering marginals
We want to compute all kth-order marginals, but we are willing to use reducers of size q = d^m for some m > k. If we fix any n − m of the n dimensions of the data cube, we can send to one reducer the d^m tuples of the cube that agree with those fixed values. We then can compute all the marginals that have n − k fixed values, as long as those values agree with the n − m fixed values that we chose originally.
Example 8.2 Let n = 7, k = 2, and m = 3. Suppose we fix the first n − m = 4 dimensions, say using values a1, a2, a3, and a4. Then we can cover the d marginals [a1, a2, a3, a4, x, ∗, ∗] for any of the values x that may appear in the fifth dimension. We can also cover all marginals [a1, a2, a3, a4, ∗, y, ∗] and [a1, a2, a3, a4, ∗, ∗, z], where y and z are any of the possible values for the sixth and seventh dimensions, respectively. Thus, we can cover a total of 3d second-order marginals at this one reducer. That turns out to be the largest number of marginals we can cover with one reducer of size q = d^3.
8.2.2 From marginals to sets of dimensions
To understand why the problem is more complex than it might appear at first glance, let us continue thinking about the simple case of Example 8.2. We need to cover all second-order marginals, not just those that fix the first four dimensions. If we had one team of d^4 reducers to cover each four of the seven dimensions, then we would surely cover all second-order marginals. But we do not need all (7 choose 2) = 21 such teams. Rather, it is sufficient to pick a collection of sets of four of the seven dimensions, such that every set of five of the seven dimensions contains one of those sets of size four.
In what follows, we find it easier to think about the sets of dimensions that are aggregated, rather than those that are fixed. So we can express the situation above as follows. Collections of second-order marginals are represented by pairs of dimensions – the two dimensions such that each marginal in the collection aggregates over those two dimensions. These pairs of dimensions must be covered by sets of three dimensions – the three dimensions aggregated over by one third-order marginal. Our goal, which we will realize in Example 8.3 below, is to find a smallest set of tripletons such that every pair chosen from seven elements is contained in one of those tripletons.
In general, we are faced with the problem of covering all sets of k out of n elements by the smallest possible number of sets of size m > k. Such a solution leads to a way to compute all kth-order marginals using as few reducers of size d^m as possible. We will refer to the sets of k dimensions as marginals, even though they really represent teams of reducers that compute large collections of marginals with the same fixed dimensions. We will call the larger sets of size m handles. The implied MapReduce algorithm takes each handle and creates from it a team of reducers that are associated, in all possible ways, with fixed values in all dimensions except for those dimensions in the handle. Each created reducer receives all inputs that match its associated values in the fixed dimensions.
Example 8.3 Call the seven dimensions ABCDEFG. Then here is a set of seven handles (sets of size three), such that every marginal of size two is contained in one of them:
ABC, ADE, AFG, BDF, BEG, CDG, CEF
To see why these seven handles suffice, consider three cases, depending on how many of A, B, and C are in the pair of dimensions to be covered.
Case 0: If none of A, B, or C is in the marginal, then the marginal consists of two of D, E, F, and G. Note that all six such pairs are contained in one of the last six of the handles.
Case 1: If one of A, B, or C is present, then the other member of the marginal is one of D, E, F, or G. If A is present, then the second and third handles, ADE and AFG, together pair A with each of the latter four dimensions, so the marginal is covered. If B is present, a similar argument involving the fourth and fifth of the handles suffices, and if C is present, we argue from the last two handles.
Case 2: If the marginal has two of A, B, and C, then the first handle covers the marginal.
Incidentally, we cannot do better than Example 8.3. Since no handle of size three can cover more than three marginals of size two, and there are (7 choose 2) = 21 marginals, clearly seven handles are needed.
As a strategy for evaluating all second-order marginals of a seven-dimensional cube, let us see how the reducer size and replication rate compare with the baseline of using one reducer per marginal. Recall that if we use one reducer per marginal, we have q = d^2 and r = (7 choose 5) = 21. For the present method, we have q = d^3 and r = 7. That is, each tuple is sent to the seven reducers that have the matching values in dimensions DEFG, BCFG, and so on, each set of attributes on which we match corresponding to the complement of one of the seven handles mentioned in Example 8.3.
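For concreteness, here is a minimal sketch of the map function of this handle-based scheme for n = 7, k = 2, and m = 3, using the seven handles of Example 8.3; the tuple layout and names are assumptions, not the thesis’s notation.

    HANDLES = ["ABC", "ADE", "AFG", "BDF", "BEG", "CDG", "CEF"]
    DIMS = "ABCDEFG"

    def handle_map(tuple_):
        """Key-value pairs for one tuple (d_A, ..., d_G, v) of a 7-dimensional
        cube: one pair per handle, keyed by the values of the four dimensions
        NOT in the handle, so q = d^3 and r = 7."""
        values = dict(zip(DIMS, tuple_[:7]))
        v = tuple_[-1]
        for handle in HANDLES:
            fixed = tuple((d, values[d]) for d in DIMS if d not in handle)
            # The reducer for this key computes every second-order marginal that
            # aggregates over two of the three dimensions in `handle`.
            yield (fixed, (tuple_[:7], v))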
8.2.3 Covering numbers
We define C(n, m, k) to be the minimum number of sets of size m out of n elements such that every set of k out of the same n elements is contained in one of the sets of size m. For instance, Example 8.3 showed that C(7, 3, 2) = 7. C(n, m, k) is called the covering number in [22]. The numbers C(n, m, k) guide our design of algorithms to compute kth-order marginals. There is an important relationship between covering numbers and replication rate, that justifies our focus on constructive upper bounds for C(n, m, k).
Theorem 8.4 If q = d^m, then we can solve the problem of computing all kth-order marginals of an n-dimensional data cube with r = C(n, m, k).
The proof of the theorem is given in Appendix F.2.
8.2.4 First-order marginals
The case k = 1 is quite easy to analyze. We are asking how many sets of size m are needed to cover each singleton set, where the elements are chosen from a set of size n. It is easy to see that we can group the n elements into ⌈n/m⌉ sets so that each of the n elements is in at least one of the sets, and there is no way to cover all the singletons with fewer than this number of sets of size m. That is, C(n, m, 1) = ⌈n/m⌉. For example, if n = 7 and m = 2, then the seven dimensions ABCDEFG can be covered by four sets of size 2, such as AB, CD, EF, and FG.
8.2.5 2nd-order marginals covered by 3rd-order handles
The next simplest case is C(n, 3, 2), i.e., covering second-order marginals by third-order marginals. First we will look at the lower bound on the number of handles that are required to cover all the second-order marginals.
Theorem 8.5 (Lower bound on the number of handles) For an n-dimensional data cube, where the marginals are of size two, the minimum number of handles is n(n − 1)/6.
The proof of the theorem is given in Appendix F.2.
We now present an algorithm for constructing 3rd-order handles so that all the 2nd-order marginals of a data cube of n = 3^i, i > 0, dimensions are covered. We will show that the upper bound on the number of handles obtained by the algorithm meets the lower bound on the number of handles to cover all the marginals.
Algorithm 1: A data cube of n = 3^i, i > 0, dimensions is the input to Algorithm 1. Algorithm 1 constructs a set of handles of size three that can cover all the marginals of size two. Algorithm 1 uses the following recurrence:
C(n, 3, 2) ≤ (n/3)² + 3 × C(n/3, 3, 2)
(a) ⟨D1, D4, D7⟩, ⟨D1, D5, D9⟩, ⟨D1, D6, D8⟩, ⟨D2, D4, D9⟩, ⟨D2, D5, D8⟩, ⟨D2, D6, D7⟩, ⟨D3, D4, D8⟩, ⟨D3, D5, D7⟩, ⟨D3, D6, D9⟩; (b) ⟨D1, D2, D3⟩, ⟨D4, D5, D6⟩, ⟨D7, D8, D9⟩
Figure 8.2: The handles of size three for a 9-dimensional data cube.
Theorem 8.6 (Upper bound on the number of handles) For an n-dimensional data cube, where n = 3^i, i > 0, and the marginals are of size two, the number of handles is bounded above by C(n, 3, 2) ≤ n²/6, n = 3^i, i > 0.
Algorithm 1: Constructing handles for C(3^i, 3, 2), i > 0
Input: A data cube of n = 3^i, i > 0, dimensions
Output: A set of handles of size three so that all the marginals of size two are covered
1. Divide the n = 3^i, i > 0, dimensions into three groups, where each group holds p = 3^(i−1) dimensions. In particular, the first group holds the first p dimensions (D1, D2, ..., Dp), the second group holds the next p dimensions (Dp+1, Dp+2, ..., D2p), and the last group holds the remaining p dimensions (D2p+1, D2p+2, ..., D3p). For example, for n = 9, we have three groups as follows: ⟨D1, D2, D3⟩ are in the first group, ⟨D4, D5, D6⟩ are in the second group, and ⟨D7, D8, D9⟩ are in the third group.
2. Find three dimensions out of the n dimensions. Select one dimension from each group using the following rule: (i + j + k) = 0 mod p, where the ith dimension belongs to the first group, the jth dimension belongs to the second group, and the kth dimension belongs to the third group. By this step, we construct p² handles. See Figure 8.2a, where we selected three dimensions, one from each group, in the case of a 9-dimensional data cube.
3. Apply the algorithm to each group. See Figure 8.2b, where we apply Algorithm 1 in each group.
The proof of the theorem is given in Appendix F.2. The result of the above theorem is a known result in block design [20]. Note that all optimal values of C(n, 3, 2) up to n = 13 are given in [22].
Corollary 8.7 If q = d^3 and n is a power of 3, then we can compute all second-order marginals with a replication rate of n²/6 − n/6.
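A possible implementation of the recursive construction of Algorithm 1 is sketched below; the function name and the 0-based position arithmetic are my own, the selection rule (i + j + k) = 0 mod p is applied to positions within the three groups, and the input is assumed to have a power-of-3 number of dimensions. For a 9-dimensional cube it produces 12 handles, matching Corollary 8.7.

    from itertools import combinations

    def handles_3i(dims):
        """Construct handles of size three covering all marginals of size two,
        assuming len(dims) is a power of 3 (the recurrence of Algorithm 1)."""
        n = len(dims)
        if n == 3:
            return [tuple(dims)]
        p = n // 3
        g1, g2, g3 = dims[:p], dims[p:2 * p], dims[2 * p:]
        # Step 2: one dimension from each group, with positions summing to 0 mod p,
        # which gives p*p cross-group handles.
        out = [(g1[i], g2[j], g3[k])
               for i in range(p) for j in range(p) for k in range(p)
               if (i + j + k) % p == 0]
        # Step 3: apply the algorithm within each group.
        for g in (g1, g2, g3):
            out += handles_3i(g)
        return out

    dims = ["D%d" % i for i in range(1, 10)]       # a 9-dimensional cube
    hs = handles_3i(dims)
    assert all(any(set(pair) <= set(h) for h in hs)
               for pair in combinations(dims, 2))  # every 2nd-order marginal covered
    print(len(hs))                                 # 12 = 9*9/6 - 9/6 (Corollary 8.7)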
8.2.6 A slower recursion for 2nd-order marginals
There is an alternative recursion for constructing handles that offers solutions for C(n, 3, 2). This recursion is not as good asymptotically as that of Section 8.2.5; it uses approximately n²/4 rather than n²/6 handles. However, this recursion gives solutions for any n, not just those that are powers of 3. Algorithm 2 provides a solution to any value of n and uses the following recurrence: C(n, 3, 2) ≤ n − 2 + C(n − 2, 3, 2)
Theorem 8.8 (Upper bound on the number of handles) For an n-dimensional data cube, where the marginals are of size two, the number of handles is bounded above by
C(n, 3, 2) ≤ (n − 1)²/4, where n ≥ 5 and n is odd.²
Algorithm 2: Constructing handles for C(n, 3, 2)
Input: A data cube of n dimensions
Output: A set of handles of size three so that all the marginals of size two are covered
1. Create n − 2 handles such that each of these handles has the last two dimensions and one of the first n − 2 dimensions. For example, for n = 7, we construct 5 handles by following this step; see Figure 8.3a.
2. Recursively create handles for the first n − 2 dimensions. For example, for n = 7, we recursively construct 4 handles; see Figure 8.3b.
(a) ⟨D1, D6, D7⟩, ⟨D2, D6, D7⟩, ⟨D3, D6, D7⟩, ⟨D4, D6, D7⟩, ⟨D5, D6, D7⟩; (b) ⟨D1, D4, D5⟩, ⟨D2, D4, D5⟩, ⟨D3, D4, D5⟩, ⟨D1, D2, D3⟩
Figure 8.3: Constructing handles of size 3 for a 7-dimensional data cube.
The proof of the theorem is given in Appendix F.2.
8.2.7 Covering 2nd-order marginals with larger handles
We want to cover dimensions by sets of size larger than three; i.e., we wish to cover second-order marginals by handles of size m ≥ 4. We can generalize the technique of Section 8.2.6. Algorithm 3 provides a solution to any value of m and uses the following recurrence: C(n, m, 2) ≤ n − (m − 1) + C(n − (m − 1), m, 2)
Algorithm 3: Constructing handles for C(n, m, 2), m ≥ 4
Input: A data cube of n dimensions
Output: A set of handles of size m so that all the marginals of size two are covered
1. Create n − (m − 1) handles such that each of these handles has the last m − 1 dimensions and one of the first n − (m − 1) dimensions. For example, for n = 8, we construct 5 handles by following this step; see Figure 8.4a.
2. Recursively create handles for the first n − (m − 1) dimensions. For example, for n = 8, we recursively construct 3 handles; see Figure 8.4b.
²When n is even, the number of handles is bounded above by C(n, 3, 2) ≤ (n − 1)²/4, where n ≥ 8.
(a) ⟨D1, D6, D7, D8⟩, ⟨D2, D6, D7, D8⟩, ⟨D3, D6, D7, D8⟩, ⟨D4, D6, D7, D8⟩, ⟨D5, D6, D7, D8⟩; (b) ⟨D1, D3, D4, D5⟩, ⟨D2, D3, D4, D5⟩, ⟨D1, D2, D3, D4⟩
Figure 8.4: Constructing handles of size 4 for an 8-dimensional data cube.
Theorem 8.9 (Upper bound on the number of handles) For an n-dimensional data cube, where the marginals are of size two, the number of handles is bounded above by
C(n, m, 2) ≤ n²/(2(m − 1)), where n ≥ 5.
The proof sketch of the above theorem appears in Appendix F.2.
8.3 The General Case
Finally, we present Algorithm 4 for C(n, m, k) that works for all n and for all m > k. However, it does not approach the lower bound, but it is significantly better than using one handle per marginal. This method generalizes Algorithm 2. Algorithm 4 uses the following recurrence: C(n, m, k) ≤ (n − m + k − 1 choose k − 1) + C(n − m + k − 1, m, k)
Algorithm 4: Constructing handles for C(n, m, k)
Input: A data cube of n dimensions
Output: A set of handles of size m so that all the marginals of size k < m are covered
1. Create (n − m + k − 1 choose k − 1) handles such that each of these handles has the last m − k + 1 dimensions and any k − 1 dimensions out of the first n − (m − k + 1) dimensions.
2. Recursively create handles for the first n − (m − k + 1) dimensions.
We claim that every marginal of size k is covered by one of these handles. If the marginal has at least one dimension out of the last m − k + 1 dimensions, then it has at most k − 1 dimensions out of the first n − (m − k + 1) dimensions. Therefore, it is covered by the handles from step (1). And if the marginal has no last dimensions, then it is surely covered by a handle from step (2).
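The recurrence of Algorithm 4 translates directly into a short recursive construction; the sketch below (hypothetical function name, 0-based dimension indices) follows the two steps above and can be checked exhaustively for small n, m, and k.

    from itertools import combinations

    def construct_handles(n, m, k):
        """Handles of size at most m over dimensions 0..n-1 such that every set
        of k dimensions is contained in some handle (the recurrence of Algorithm 4)."""
        if n <= m:
            return [frozenset(range(n))]             # one handle covers everything
        tail = frozenset(range(n - (m - k + 1), n))  # the last m-k+1 dimensions
        head = range(n - (m - k + 1))                # the first n-(m-k+1) dimensions
        # Step 1: the tail plus any k-1 of the head dimensions.
        out = [tail | frozenset(c) for c in combinations(head, k - 1)]
        # Step 2: recurse on the head dimensions.
        out += construct_handles(len(head), m, k)
        return out

    hs = construct_handles(7, 3, 2)
    assert all(any(set(marginal) <= h for h in hs)
               for marginal in combinations(range(7), 2))
    print(len(hs))   # 9 handles for n = 7, which equals (n-1)^2/4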