File System Virtualization and Service for Grid Data Management
FILE SYSTEM VIRTUALIZATION AND SERVICE FOR GRID DATA MANAGEMENT

By

MING ZHAO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

© 2008 Ming Zhao

To my wife and my parents

ACKNOWLEDGMENTS

First and foremost, I would like to express my deepest gratitude to my advisor, Prof. Renato Figueiredo, for his excellent guidance throughout my Ph.D. study. I am greatly indebted to him for providing me such an exciting research opportunity, constantly supporting me in matters big and small, and patiently giving me time and helping me grow. I would also like to gratefully and sincerely thank Prof. José Fortes for his exceptional leadership of the ACIS lab and his invaluable advice in every aspect of my academic pursuit. I have learned enormously from them about research, teaching, and advising, and they will always be examples for me to follow in my future career.

I am very grateful to the other members of my supervisory committee, Prof. Sanjay Ranka and Prof. Tao Li, for taking time out of their busy schedules to review my work. Their constructive criticism and comments have been very helpful in improving my work and are highly appreciated. I also wish to thank Prof. Alan George and Prof. Oscar Boykin for their advice and help.

My heartfelt thanks and appreciation are extended to my current and former fellow ACIS lab members. The lab is where I have obtained solid support for my research, gained precious knowledge and experience, and grown from a student to a professional. I am especially thankful to my colleague and good friend, Prapaporn, for her careful proofing of the manuscript as well as our close collaboration in the DDDBMI project. A special note of gratitude is due to Dr. Gang Rong at Tsinghua University, my formal advisor on my master's degree study, for encouraging and assisting me to pursue my Ph.D. overseas.
Finally, and most importantly, I would like to thank my family; I owe everything I have achieved to them. My parents' unwavering belief in me and unending care for me are what have made me the person I am today. My brother, Hui, has been my best friend since my childhood and has always been there when I need him. My wife, Jing, has been both a loving companion and a supportive colleague, bringing endless inspiration, joy, and passion into my life. It is truly wonderful to have had her sharing every moment with me over the past five years, and I look forward to walking hand in hand with her through the new journey ahead of us.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Application-Transparent Grid-Wide Data Access
  1.2 Application-Tailored Grid Data Provisioning
  1.3 Service-Based Autonomic Data Management

2 BACKGROUND AND RELATED WORK
  2.1 Typical Grid Data Management Approaches
  2.2 Traditional Distributed File Systems
  2.3 Application-Tailored Grid File Systems
    2.3.1 Need for Application-Tailored Enhancements
    2.3.2 Caching and Consistency
    2.3.3 Security
      2.3.3.1 Security in distributed file systems
      2.3.3.2 Security in grid systems
    2.3.4 Fault Tolerance
  2.4 Service-Oriented and Autonomic Data Management
  2.5 Support for Distributed Virtual Machines

3 DISTRIBUTED FILE SYSTEM VIRTUALIZATION
  3.1 User-Level Proxy-Based Virtualization
    3.1.1 Architecture
    3.1.2 NFS-Based GVFS
      3.1.2.1 User-level NFS proxy
      3.1.2.2 Multi-proxy GVFS
  3.2 Evaluation
    3.2.1 Setup
    3.2.2 Stat
    3.2.3 IOzone
    3.2.4 PostMark

4 APPLICATION-TAILORED DISTRIBUTED FILE SYSTEMS
  4.1 Motivating Examples
  4.2 Performance
    4.2.1 Client-Side Disk Caching
      4.2.1.1 Design
      4.2.1.2 Deployment
      4.2.1.3 Application-tailored configurations
      4.2.1.4 Evaluation
    4.2.2 Multithreaded Data Transfer
      4.2.2.1 Design and implementation
      4.2.2.2 Evaluation
  4.3 Consistency
    4.3.1 Architecture
    4.3.2 Invalidation Polling Based Cache Consistency
      4.3.2.1 Protocol
      4.3.2.2 Bootstrapping
      4.3.2.3 Failure handling
    4.3.3 Delegation Callback Based Cache Consistency
      4.3.3.1 Delegation
      4.3.3.2 Callback
      4.3.3.3 State maintenance
      4.3.3.4 Failure handling
    4.3.4 Evaluation
      4.3.4.1 Setup
      4.3.4.2 Make
      4.3.4.3 PostMark
      4.3.4.4 Lock
      4.3.4.5 Software repository
      4.3.4.6 Scientific data processing
  4.4 Security
    4.4.1 Secure Tunneling Based Private Grid File System
      4.4.1.1 Secure data tunneling
      4.4.1.2 Security model
      4.4.1.3 Evaluation
    4.4.2 The SSL-Enabled Secure Grid File System
      4.4.2.1 Design
      4.4.2.2 Implementation
      4.4.2.3 Deployment
      4.4.2.4 Evaluation
  4.5 Fault Tolerance
    4.5.1 Virtualization of Data Sets
    4.5.2 Replication Schemes
    4.5.3 Application-Transparent Failover
    4.5.4 Evaluation

5 APPLICATION STUDY: SUPPORTING GRID VIRTUAL MACHINES
  5.1 Architecture
  5.2 Virtual Machine Aware Data Transfer
  5.3 Integration with VM-Based Grid Computing
  5.4 Evaluation
    5.4.1 Setup
    5.4.2 Performance of Application Executions within VMs
    5.4.3 Performance of VM Cloning

6 SERVICE-ORIENTED AUTONOMIC DATA MANAGEMENT
  6.1 Service-Based Data Management
    6.1.1 Architecture
    6.1.2 The WSRF-Based Data Management Services
      6.1.2.1 File system service
      6.1.2.2 Data scheduler service
      6.1.2.3 Data replication service
    6.1.3 Application-Tailored Data Sessions
      6.1.3.1 Grid data access and file transfer
      6.1.3.2 Cache consistency
      6.1.3.3 Fault tolerance
    6.1.4 Security Architecture
    6.1.5 Usage Examples
      6.1.5.1 Virtual machine based grid computing
      6.1.5.2 Workflow execution
  6.2 Autonomic Data Management
    6.2.1 Autonomic Data Scheduler Service
    6.2.2 Autonomic Data Replication Service
      6.2.2.1 Data replication degree and placement
      6.2.2.2 Data replica regeneration
    6.2.3 Autonomic File System Service
      6.2.3.1 Client-side file system service
      6.2.3.2 Server-side file system service
    6.2.4 Evaluation
      6.2.4.1 Setup
      6.2.4.2 Autonomic session redirection
      6.2.4.3 Autonomic cache configuration
      6.2.4.4 Autonomic data replication

7 CONCLUSION
  7.1 Summary
  7.2 Future Work
    7.2.1 Performance
    7.2.2 Intelligence
    7.2.3 Integration

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
4-1 Overhead of private GVFS for the LaTeX and SPECseis benchmarks
5-1 Performance of parallel VM cloning

LIST OF FIGURES

Figure
2-1 Typical NFS setup
2-2 Architecture of an autonomic element