Remote Data Access in Scientific Computing

Item Type text; Electronic Dissertation

Authors Choi, Illyoung

Citation Choi, Illyoung. (2020). Remote Data Access in Scientific Computing (Doctoral dissertation, University of Arizona, Tucson, USA).

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.


Link to Item http://hdl.handle.net/10150/656767

REMOTE DATA ACCESS IN SCIENTIFIC COMPUTING

by

Illyoung Choi

Copyright © Illyoung Choi 2020

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF COMPUTER SCIENCE

In Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2020


THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Illyoung Choi, titled Remote Data Access in Scientific Computing,

and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.


Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.


ACKNOWLEDGEMENTS

I would like to thank my Ph.D. advisor, John Hartman. His guidance and encouragement made this dissertation possible, and he was a good mentor.

I also owe thanks to Larry Peterson, Bonnie Hurwitz, and Nirav Merchant. I was able to finish this long Ph.D. journey because of the opportunities they gave me and their support. I would also like to thank my Ph.D. committee members, Michelle Strout and Beichuan Zhang, for their feedback.

I would also like to thank my collaborators on the work in this dissertation. During my time with the Syndicate project and the Hurwitz Lab, I worked with many talented people. I am grateful to Jude Nelson, Zack Williams, and Alise Ponsero for their efforts.

Lastly, I would like to thank my wife, Mihyun Lim. I would not have been able to finish the Ph.D. without her patience and sacrifice.


TABLE OF CONTENTS

LIST OF FIGURES ...... 6
LIST OF TABLES ...... 11
ABSTRACT ...... 12
CHAPTER 1. INTRODUCTION ...... 14
1.1 DATA STAGING ...... 16
1.2 ON-DEMAND DATA TRANSFER ...... 19
1.3 DATA TRANSFER OPTIMIZATIONS ...... 21
1.4 CONTRIBUTIONS ...... 23
CHAPTER 2. RELATED WORK ...... 25
2.1 DATA STAGING ...... 25
2.2 ON-DEMAND DATA TRANSFER ...... 28
2.3 DATA AND COMPUTATION CO-SCHEDULING ...... 31
2.4 LIMITATIONS OF EXISTING WORK ...... 33
CHAPTER 3. SYNDICATE ...... 35
3.1 SYNDICATE ARCHITECTURE ...... 36
3.1.1 PROTOCOL ...... 39
3.1.2 DATA PUBLISHING ...... 40
3.1.3 DATA ACCESS ...... 41
3.2 APPLICATION TO SCIENTIFIC COMPUTING ...... 42
3.3 SYNDICATE DATASET MANAGER (SDM) ...... 46
3.4 EXPERIMENTAL SETUP ...... 52
3.5 EXPERIMENTAL RESULTS ...... 55
3.5.1 BLOCK SIZE SENSITIVITY ...... 55
3.5.2 INTERCONTINENTAL DATA ACCESS ...... 57
3.5.3 INTERSTATE DATA ACCESS ...... 62
3.6 SYNDICATE LIMITATIONS ...... 66
3.7 CONCLUSION ...... 67
CHAPTER 4. STARGATE ...... 68
4.1 DESIGN CONSIDERATIONS ...... 68
4.2 ARCHITECTURE ...... 70
4.2.1 PEER-TO-PEER ARCHITECTURE ...... 72
4.2.2 PRODUCER AND SUBSCRIBER CLUSTERS ...... 75
4.2.3 STARGATE VOLUME ...... 77
4.2.4 STARGATE PROTOCOL ...... 79
4.2.5 MULTI-TIER CACHING ...... 85
4.2.6 DATA TRANSPORT ...... 86
4.2.7 COMPUTATION, CACHE, AND TRANSFER CO-LOCATION IN SUBSCRIBER CLUSTERS ...... 90
4.2.9 TRANSFER AND STORAGE CO-LOCATION IN PRODUCER CLUSTERS ...... 95
4.2.10 SCALABILITY AND FAULT-TOLERANCE ...... 97
4.3 EVALUATION ...... 98
4.3.1 CONCURRENT DATA TRANSFER BANDWIDTH ...... 98
4.3.2 TRANSFER SERVICE NODE DETERMINATION ...... 101
4.3.3 DATA ACCESS BENCHMARKS ...... 105
4.3.4 COMPARISON TO SYNDICATE ...... 115
4.4 CONCLUSION ...... 117
CHAPTER 5. CONCLUSION ...... 118
REFERENCES ...... 121


LIST OF FIGURES

Figure 3.1: Syndicate architecture. Syndicate interconnects geo-distributed users and storage via Syndicate Gateways. An Acquisition Gateway provides data access to data repositories (data sources). A User Gateway provides data access interfaces to various compute environments. A Replica Gateway stores user data. The Metadata Service is a central service that maintains information about users, volumes, and gateways. ...... 37

Figure 3.2: Multi-tier data caching in Syndicate. Syndicate overcomes WAN bandwidth limitations by caching reusable data in a cache hierarchy. ...... 41

Figure 3.3: Syndicate facilitates co-locating compute tasks and data caches in Hadoop. Syndicate enables Hadoop computations to access remote datasets efficiently by scheduling tasks based on cache locality. Syndicate causes the Hadoop job scheduler to distribute tasks over a cluster according to the data caches in User Gateways. The black dotted arrows represent accesses to locally cached blocks that are satisfied by User Gateways. The red dotted arrows represent accesses to uncached blocks that are satisfied by the CDN. ...... 45

Figure 3.4: Syndicate Dataset Manager architecture. The Syndicate Dataset Manager makes scientific data publishing and mounting easier. The Catalog Service manages the metadata of published datasets that are necessary for search and mounting. Datasets published via SDM-Publish can be easily searched and mounted via SDM-Mount. ...... 48

Figure 3.5: Example SDM-Publish configuration file for publishing a dataset stored on an external FTP repository. The Dataset Configuration section describes the dataset, how to publish it, and how to access it. The AG Driver Configuration section describes the data source. The Data Publisher's Private Key section contains the publisher's private key, which Syndicate uses to authenticate the publisher. ...... 51

Figure 3.6: A comparison of the metagenomic workflow elapsed times for different block sizes. Increasing the block size improves bandwidth; however, there are diminishing returns beyond 16MB. ...... 57

Figure 3.7: Intercontinental workflow elapsed times with cold caches. GNU Wget Parallel outperformed other tools providing on-demand data transfer. ...... 58

Figure 3.8: Intercontinental workflow elapsed times with warm caches. Syndicate with warm caches had performance comparable to staging the dataset to local disk. ...... 59

Figure 3.9: Intercontinental workflow elapsed times using Syndicate with warm and cold caches. Syndicate with warm caches (both SCW and SW) is much faster than with cold caches (SCC). Their performance is comparable to a local hard disk (HW). ...... 61

Figure 3.10: Interstate workflow elapsed times with cold caches. A staging method (iGet) outperformed other tools providing on-demand data transfer. ...... 63

Figure 3.11: Interstate workflow elapsed times with warm caches. Syndicate with warm caches had performance comparable to staging to a local SSD. ...... 64

Figure 3.12: Interstate workflow elapsed times using Syndicate with warm and cold caches. Syndicate with warm caches (both SCW and SW) is much faster than with cold caches (SCC). Their performance is comparable to a local SSD (SSW). ...... 65

Figure 4.1: Stargate deployments on compute and storage clusters. When Stargate is deployed on a compute cluster, a computation task on a node accesses data via the Stargate client on the same node. The Stargate client communicates with the co-located Stargate server to process data access requests. When Stargate is deployed on a storage cluster, a Stargate server on a node makes data accesses to the co-located data server. Blue arrows represent data accesses. Purple arrows represent peer-to-peer communication between Stargate servers. ...... 71

Figure 4.2: Inter-node communication between a Stargate client and servers. When an application accesses data blocks spread over multiple cluster nodes, a Stargate client communicates with multiple Stargate servers in the cluster. ...... 72

Figure 4.3: A Stargate cluster can be a producer, a subscriber, or both. Producers provide read-only data access to their subscribers. Subscribers provide data access to their applications. A subscriber cluster may have multiple subscriptions to different producers. Blue arrows illustrate data accesses. Green arrows indicate subscriptions and red arrows indicate data flow. ...... 76

Figure 4.4: A Stargate volume contains data from different producer clusters. Each cluster is represented by a subdirectory in the volume. A cluster's exported files are shown in the volumes of its subscriber clusters. Access to data in the volume is redirected to the producer cluster for that data. The red arrows indicate subscriptions to producer clusters. ...... 78

Figure 4.5: The Stargate protocol is used between subscriber clusters and producer clusters. Stargate subscriber and producer clusters communicate via a REST-based Stargate protocol. Every Stargate server in a producer cluster implements the REST API for concurrent data access. ...... 80

Figure 4.6: Information stored in a recipe. A recipe contains file metadata and block information. File metadata contains general information about the file, such as file name, length, and the hash algorithm used for block naming. Block information contains per-block metadata, such as name, starting offset, length of block content, and locations in storage. Stargate uses the recipe to translate filenames and offsets into block names. ...... 81

Figure 4.7: Overlaps between consecutive blocks. Stargate creates small overlaps between consecutive blocks to facilitate line-oriented or record-oriented computations. ...... 83

Figure 4.8: Transfer request queue and thread pool in a node. Both prefetch requests and on-demand requests are placed in the transfer request queue. The transfer thread pool has multiple transfer threads and dequeues transfer requests from the queue when there are idle threads. On-demand requests are prioritized over prefetch requests when dequeuing. Each node maintains a queue and a thread pool independently. ...... 87

Figure 4.9: Intra-node and inter-node on-demand requests. An intra-node request occurs when an on-demand request is made by a Stargate client on the same node as its transfer and cache node; the request is queued and processed on the same node. An inter-node on-demand request occurs when an on-demand request is made by a Stargate client on a different node from its transfer and cache node; this requires intra-cluster traffic for data access. ...... 89

Figure 4.10: Staging data to a storage service does not ensure co-location of the compute task, cache, and transfer task. Using staging tools with a storage service (e.g., HDFS) does not ensure that the compute task, cache, and transfer task for a block are co-located on the same node, thus causing unnecessary intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A data block is staged to a node using a data transfer tool. (2) The transferred data block is stored in a distributed storage that does not ensure the data is stored on the same node as the transfer node. (3) A compute task is co-located with the data. ...... 91

Figure 4.11: Data access platforms providing on-demand data transfer do not ensure co-location of the compute task, cache, and transfer task when they are poorly integrated with a caching service. When poorly integrated with a caching service, such platforms do not ensure that the compute task, cache, and transfer task for a block are co-located on the same node, thus causing unnecessary intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A compute task accesses data. (2) The access triggers on-demand data transfer of the data on the same node. (3) The transferred data is cached by a caching service co-located on the same cluster after it is delivered to the compute task; however, this does not ensure that the data is cached on the same node. ...... 92

Figure 4.12: Prefetching makes it more challenging to co-locate the compute task, cache, and transfer task. Prefetching makes it challenging to co-locate a block's transfer, cache, and compute tasks on the same node to avoid intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A compute task is scheduled by a task scheduler in the compute framework without knowing which node will store the data it accesses, because the data is remote. The assignment of the compute task is internal information that is not shared with the cache and transfer services. (2) The data is prefetched on a different node from the compute task because the prefetcher does not know the compute task assignment. (3) The transferred data is cached on the same node as the transfer task. (4) The compute task accesses the data cached on a different node. ...... 93

Figure 4.13: Compute, cache, and transfer co-location in Stargate. Stargate pre-determines which nodes will cache a particular data block. It then co-locates the block's transfer and compute tasks on the same node to avoid intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) Stargate pre-determines which node will cache a data block. (2) The transfer task for the data block is co-located with the cache. (3) The compute task that processes the data block is co-located with the cache. ...... 94

Figure 4.14: Total bandwidth use for concurrent data transfer with different numbers of transfer nodes. DryDistCP and DistCP transfer data in the same manner, but DryDistCP does not write data to local storage (HDFS). A single transfer node cannot make full use of the cluster's network bandwidth for either DistCP or DryDistCP. The local storage (HDFS) is a bottleneck for DistCP. Increasing the number of transfer nodes improved the performance of both DistCP and DryDistCP. ...... 100

Figure 4.15: The numbers of local and remote block accesses for TestDFSIO+ benchmarks with different service node determination strategies. The Primary Node Strategy ensures that all blocks are local. Conversely, the Head Node Strategy ensures that all blocks are remote. The Fixed Node Strategy uses pre-determined node mappings to reuse connections, causing most blocks to be remote. ...... 102

Figure 4.16: Elapsed times of TestDFSIO+ benchmarks with different service node determination strategies. The Primary Node Strategy was the fastest, the Head Node Strategy was the slowest, and the Fixed Node Strategy was in the middle. The performance is directly related to the number of local blocks (Figure 4.15). ...... 103

Figure 4.17: Elapsed times of benchmarks with intercontinental data access. DistCP was the slowest of the benchmarks. Stargate outperformed DistCP and WebHDFS. Stargate's performance was comparable to HDFS for the TestDFSIO+ and TeraSort benchmarks. In general, caching improved Stargate's performance slightly. ...... 109

Figure 4.18: WAN traffic required for different data access methods. WebHDFS had the most traffic. Stargate Cold required the same traffic as DistCP. Stargate Warm required approximately 70% of the input data to be transferred. Stargate Hot is not shown because it created no WAN traffic. ...... 111


LIST OF TABLES

Table 3.1: SDM-Mount commands. SDM-Mount provides commands for listing, searching, mounting, and unmounting datasets. ...... 52

Table 4.1: I/O workloads and elapsed times of the benchmarks. TestDFSIO+ has the lightest workload and the shortest elapsed time. TeraSort has a heavy workload and therefore a longer elapsed time than TestDFSIO+. Libra's workload depends on k. ...... 106

Table 4.2: Compute cluster costs. Costs are calculated based on the runtimes of the benchmarks. Costs displayed outside of brackets are calculated using the price per hour ($3.135), which is how GCP charges. Costs in brackets are estimated using the price per second ($3.135/3600) to aid understanding. Theoretical costs calculated using the price per second show that Stargate costs less than DistCP and WebHDFS. Stargate Cold saved $0.02 for TeraSort, $0.08 for Libra k=1, and $0.11 for Libra k=8 per run compared to WebHDFS. Stargate Hot saved $0.05 for TeraSort, $0.08 for Libra k=1, and $0.12 for Libra k=8 per run. In reality, however, only DistCP cost an extra $3.135, for Libra k=8: its elapsed time exceeded an hour, so it had to pay for two hours. ...... 113

Table 4.3: WAN traffic costs. Stargate Cold saved $0.08 for TeraSort, $0.24 for Libra k=1, and $0.32 for Libra k=8 per run compared to WebHDFS. Stargate Hot saved $4.32 for TeraSort, $4.48 for Libra k=1, and $4.56 for Libra k=8 per run. Each experiment used 50GB datasets. ...... 114

Table 4.4: Data access performance comparison between Syndicate and local HDFS. The experiment ran a simple genomic file reader on 22GB of input data. Syndicate had warm caches holding the entire input data on local disk. Syndicate was 62% slower than HDFS. ...... 116


ABSTRACT

Remote data access is becoming increasingly common in scientific computing. The trends toward collaborative research, big data, and cloud computing have increased the need to transfer large-scale datasets between geographically separated systems. The net result is that scientific computing is increasingly reliant on data transfers over wide-area networks (WANs), which are often high-cost and low-bandwidth. Furthermore, cluster computing has become the dominant platform for scientific computing, and remote data access is even more complicated when the systems are clusters of computers.

This dissertation presents two file systems, Syndicate and Stargate, that provide remote data access over a WAN for scientific computing. Syndicate uses on-demand block-based data transfer augmented with prefetching and multi-tier caching to address the challenges of large-scale data transfer over a low-bandwidth WAN. However, Syndicate is not designed to run on a cluster and therefore has limitations in network bandwidth use and caching.

Stargate, its successor, implements a content-addressable protocol and intra-cluster data caching to avoid redundant data transfers. Stargate performs cluster-to-cluster concurrent data transfers that make more efficient use of network bandwidth. In addition, Stargate uses a novel approach that co-locates computations and transfers to achieve efficient data access in cluster computing, addressing the limitations of Syndicate. For example, Syndicate was 62% slower than local HDFS with warm caches, whereas Stargate's performance was comparable to local HDFS. On benchmarks with heavy workloads, Stargate was 7% faster than WebHDFS and only 8% slower than local HDFS. Stargate also outperformed staging using DistCP. In addition, Stargate's caches effectively trade high-cost WAN traffic for low-cost LAN traffic.

This dissertation explores the viability of on-demand data transfer in scientific computing on clusters. Stargate's performance and reduction in WAN traffic show that the on-demand data transfer approach can be an acceptable solution for remote data access in scientific computing.


CHAPTER 1. INTRODUCTION

The transfer of large-scale datasets between geographically separated systems is a challenge in scientific computing [1]. Collaborative research necessitates data transfers between widely separated researchers [2], and the increased use of the cloud for computation and data storage also causes more data transfers between remote systems [3]. The net result is that scientific computing is increasingly reliant on data transfers over wide-area networks (WANs). This is problematic because WANs are often high-cost and low-bandwidth [4], [5].

Scientific computing is also becoming more data-centric, which makes efficient remote data access even more critical. Today's scientific research often relies on machine learning [6], [7] and big data analytics [8], [9], both of which process large amounts of data. The increasing amount of data makes efficient data transfer over a low-bandwidth WAN even more important.

Scientific research often makes use of multiple datasets, and staging and maintaining them can be difficult. Multiple datasets are used in experiments for data association [10] or reference [11]. There is significant complexity in staging datasets from multiple administrative domains, and maintaining local copies of datasets from different sources is non-trivial.

Cluster computing, which has become the dominant platform for scientific computing [12], [13], is making efficient remote data access more challenging. First, it is difficult to efficiently orchestrate independent services, such as storage, distributed computing, and data transfer services, in a scientific workflow. These services make independent local decisions (e.g., which node in a storage cluster will store data to be written) that are often not coordinated between them, making efficient orchestration challenging. Second, it is difficult to make efficient use of distributed computing resources for data transfers in a cluster. Orchestrating distributed resources is more difficult than on a single machine because there are many complications, such as nondeterminism, workload balancing, and failure recovery.

This dissertation presents work on two file systems, called Syndicate and Stargate, that provide efficient remote data access over a WAN for scientific computing. Syndicate is an existing general-purpose cloud storage service that I modified to better support scientific computing. It leverages existing cloud components for storing and transferring distributed user data. Syndicate provides on-demand data transfer to remote datasets and implements transfer optimizations to address the challenges of large-scale scientific data transfer over a low-bandwidth WAN. However, Syndicate is not designed to run on a cluster and therefore has limitations in network bandwidth use and caching.

Stargate is Syndicate's successor, which I designed and implemented specifically as a data transfer service for cluster computing environments. Stargate addresses the limitations of Syndicate through a content-addressable protocol, cluster-to-cluster data transfers, an intra-cluster cache, and co-scheduling of computations, caching, and transfers within a cluster. This dissertation presents the architectures of Syndicate and Stargate, and shows how researchers can benefit from these file systems in their scientific research.


1.1 DATA STAGING

Currently, the dominant method of dealing with a low-bandwidth WAN is to stage the data by copying it to high-bandwidth local storage prior to running scientific computations. In doing so, staging isolates subsequent computations from the impact of a low-bandwidth WAN.

For example, staging can be used in a big data analysis workflow that analyzes an ocean virome genomic dataset, TARA Ocean Virome [14], on an HPC cluster. The dataset contains genome sequences of ocean viromes sampled from 26 sites in different oceans. It is hosted by the European Bioinformatics Institute (EBI) [15] and is approximately 551GB in total size. In the analysis, the sequences in the dataset are compared with a reference database, NCBI RefSeq [16], stored at the National Center for Biotechnology Information (NCBI) [17], to determine the presence of viruses. The complete reference database (Release 202) consists of approximately 1170GB in 19,010 files. Prior to running the workflow, the researchers download the virome dataset from the EBI FTP server (ftp://ftp.sra.ebi.ac.uk/vol1/) and the reference database from the NCBI FTP server (https://ftp.ncbi.nlm.nih.gov/refseq/) to a local storage service, such as the General Parallel File System (GPFS) [18], in the HPC cluster using curl. After the download, they perform the sequence analysis on the HPC cluster using the local copies of the datasets.

The size of the datasets and the WAN bandwidth are problematic. Downloading the large datasets over a WAN may take a long time; for example, assuming the datasets (approximately 1720GB in total size) are transferred via a WAN that has 100Mbps of bandwidth, downloading them requires approximately 39 hours. After the download, however, the subsequent sequence analysis does not require data transfer over the WAN. During the analysis, data accesses are made to the local copies of the datasets in the local storage service; thus, data access is performed via a high-bandwidth LAN (e.g., 10Gbps).
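As a sanity check on that estimate, the arithmetic can be written out directly. The following is a minimal Python sketch using only the figures from the example above; the nominal bandwidth is treated as fully sustained, with no protocol overhead or retries.

    # Back-of-the-envelope staging time for the example datasets.
    dataset_gb = 551 + 1170              # TARA Ocean Virome + NCBI RefSeq, ~1720GB
    wan_mbps = 100                       # assumed WAN bandwidth, megabits per second
    total_megabits = dataset_gb * 1000 * 8
    seconds = total_megabits / wan_mbps
    print(f"{seconds / 3600:.1f} hours") # ~38.2 hours, i.e., approximately 39 hours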

To further reduce the effect of WAN bandwidth, the researchers may retain local copies of the datasets for future experiments. The local copies can be reused when the same analysis is re-run with different applications or parameters. It is also possible to reuse portions of the datasets for a new analysis. For example, the NCBI RefSeq reference database can be reused for different analyses, such as analyzing the Pacific Ocean Virome [10] dataset.

Despite its advantages, staging has several drawbacks. First, data transfers must finish before the computations begin. This isolates the computations from the impact of a low-bandwidth WAN, but it precludes overlapping computations with data transfers. Compute resources (CPU and RAM) sit idle while the datasets are staged. Similarly, network resources sit idle during the computation.

Second, staging may unnecessarily transfer data that are ultimately unused by the computation, wasting network bandwidth and local storage. For example, some parts of the reference database may not be used during the data analysis in the example: sequences of soil viruses in the reference database are not required because the input dataset likely contains only marine viruses. However, it is often challenging to know in advance which portions of a dataset a computation will require. An overestimation results in unnecessary transfers of data that are never used, while an underestimation causes the workflow to fail. The surefire approach is to download the entire dataset, which may waste network bandwidth and local storage.

Third, staging may incur unnecessary intra-cluster data transfers. In the example, the cluster node that stages the datasets and the node that stores them may not be the same. Staging is often performed on a head node and the transferred data are stored on storage nodes in the cluster. As a result, the data must be transferred within the cluster from the staging node to the storage node.

Similarly, the cluster node that stores the data and the node that performs the computation may not be the same. This occurs when using networked storage, such as Dell EMC Isilon [19]. Although intra-cluster data transfers are performed via a LAN that has higher bandwidth than a WAN, they can be eliminated by co-locating transfer, storage, and computation on the same node in the cluster.

Lastly, staging creates local copies of entire datasets, which requires having enough local storage to hold all of the data. For example, the workflow above requires at least 1720GB of storage to hold the local copies of the datasets. Maintaining the large local copies increases the storage cost of the experiment. In addition, the local copies require active management: ensuring their consistency is non-trivial, as it requires researchers to frequently check for differences (e.g., hash values or timestamps) between the local copies and the remote originals.


1.2 ON-DEMAND DATA TRANSFER

An alternative to staging is to transfer data on demand when they are accessed. Data access is provided via a file system interface (e.g., POSIX), and any accessed data are transferred from remote storage as needed. Unlike staging, on-demand data transfer only transfers the data that a computation accesses and does not require local storage to hold all of the data. In addition, on-demand transfers do not create local copies that must be managed. With these advantages, on-demand data transfer has been used selectively in local research environments.

On-demand data transfer is often used in the cloud environment [20], [21]. For example, it can be used in a big data analysis workflow that analyzes a human genome dataset, 1000 Genomes [22], using Amazon EMR [23]. The dataset contains genome sequences of thousands of individuals from different global populations. It is stored in Amazon S3 [24] and is approximately 1485GB in size. During the analysis, the sequences in the dataset are compared against the individuals to measure genetic distances. The dataset stored on Amazon S3 is accessed directly by the computation running on Amazon EMR.

In this example, data are transferred on-demand when they are accessed during the computation. The transfer is performed via a high-bandwidth LAN (e.g., 10Gbps) within a data center. Since the LAN bandwidth is higher than the SATA SSD bandwidth (e.g., 6Gbps), data access bandwidth is not a bottleneck. The example does not require making local copies of the dataset and storage space to hold it because the dataset is transferred on-demand.


On-demand data transfer is also often used in a lab environment. For example, it can be used in a big data analysis workflow that analyzes web logs using Hadoop [25]. The log files are collected by web servers and stored on a network storage system. The Hadoop cluster and the storage are located in the same data center, and the log files are accessed directly during the Hadoop computation via a LAN.

As in the previous example, data are transferred on-demand when they are accessed during the computation. Configuring a centralized storage, called a Data Lake, is a widely used way to provide access to multiple computing clusters, such as Spark [26] and Hadoop clusters. Because the storage is not co-located with the Hadoop cluster, every data access requires LAN traffic; however, the high-bandwidth LAN ensures high data access bandwidth. During the analysis, staging and management of local copies are not necessary.

Despite these advantages, on-demand transfers are only used in particular environments where storage and compute clusters are located in the same data center, because they can be less efficient than staging.

On-demand data transfer has several drawbacks. First, it can reduce the effective network bandwidth. Data access by computations may result in many small data transfers requiring frequent request-response exchanges, in which case network latency dominates the exchange latency and the data transfer bandwidth. For example, a computation making 4KB reads on 1MB of input data requires 256 request-response exchanges. Assuming a round-trip network latency of 20ms, the application must wait a total of 5120ms for the exchanges. A computation making 64KB reads on the same data, however, requires only 16 request-response exchanges, for a total of 320ms. Therefore, large reads are necessary to improve data transfer bandwidth; however, computations may make small reads, causing on-demand data transfer to reduce data access bandwidth.
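The effect of read size on waiting time can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; it counts round trips and ignores transfer time itself.

    # Round-trip cost of synchronous request-response exchanges.
    input_bytes = 1 * 1024 * 1024            # 1MB of input data
    rtt_ms = 20                              # assumed round-trip network latency
    for read_size in (4 * 1024, 64 * 1024):  # 4KB reads vs. 64KB reads
        exchanges = input_bytes // read_size
        print(f"{read_size // 1024}KB reads: {exchanges} exchanges, "
              f"{exchanges * rtt_ms}ms of waiting")
    # 4KB reads: 256 exchanges, 5120ms of waiting
    # 64KB reads: 16 exchanges, 320ms of waiting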

Second, data transfer over a low-bandwidth network can increase the overall runtime of computations. A computation must wait until its requested data are received. Transferring data over a low-bandwidth network increases the transfer time and hence the waiting time of the computation, which in turn increases the computation's overall runtime.

Third, multiple accesses to the same data result in redundant data transfers. Because on-demand data transfer does not make local copies, any access to data requires a data transfer. This can cause redundant transfers of repeatedly accessed data, increasing the number of data transfers relative to staging. Of course, network bandwidth and latency also affect the runtime of the computations.

1.3 DATA TRANSFER OPTIMIZATIONS

There are five well-known techniques that improve data access bandwidth: block-based data transfer, parallel data transfers, prefetching, caching, and deduplication. These techniques are widely used in data transfer systems and storage services for efficient data transfer. The systems described in this dissertation implement all of them, so they are explained in this section as background.

Block-based data transfer - Block-based data transfer mitigates the effect of small data accesses by applications. Accesses to a data block can be serviced from a buffer that temporarily caches the block. This prevents the small data transfers that reduce effective data transfer bandwidth.
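The idea can be illustrated with a small Python sketch; it is not any particular system's implementation, and fetch_block is a hypothetical stand-in for the remote protocol. Reads of arbitrary size are satisfied from whole buffered blocks, so repeated small reads within a block trigger only one transfer.

    # Sketch of block-based data transfer with a simple read-through buffer.
    BLOCK_SIZE = 64 * 1024

    class BlockReader:
        def __init__(self, fetch_block):
            self.fetch_block = fetch_block   # hypothetical: (block_index) -> bytes
            self.buffer = {}                 # block_index -> block contents

        def read(self, offset, length):
            data = bytearray()
            first = offset // BLOCK_SIZE
            last = (offset + length - 1) // BLOCK_SIZE
            for idx in range(first, last + 1):
                if idx not in self.buffer:
                    # One whole-block transfer services any number of small reads.
                    self.buffer[idx] = self.fetch_block(idx)
                block = self.buffer[idx]
                start = max(offset - idx * BLOCK_SIZE, 0)
                end = min(offset + length - idx * BLOCK_SIZE, len(block))
                data += block[start:end]
            return bytes(data)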

Parallel data transfers - Parallel data transfers increase network bandwidth utilization. They enable more efficient use of available network bandwidth by pipelining and interleaving transfers, which makes them especially useful for transferring data over a high-latency network.
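A minimal sketch of the technique, again assuming a hypothetical fetch_block function that performs one block transfer: a thread pool keeps several transfers in flight at once, so round-trip latency is paid concurrently rather than serially.

    # Parallel block transfers: overlap round trips instead of serializing them.
    from concurrent.futures import ThreadPoolExecutor

    def fetch_blocks_parallel(fetch_block, block_indices, workers=8):
        # pool.map preserves the order of block_indices in the returned list.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetch_block, block_indices))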

Prefetching - Prefetching enables overlapping of computation and data transfers. While a computation is processing a data block, subsequent data blocks can be prefetched concurrently. This improves network utilization by interleaving on-demand and prefetching data transfers. It also reduces access latency if the prefetch schedule corresponds to the data access order of the computation.
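The overlap can be sketched as follows, with hypothetical fetch_block and process functions: while the computation processes block i, a background worker is already transferring block i+1.

    # Sequential prefetching: transfer of block i+1 overlaps processing of block i.
    from concurrent.futures import ThreadPoolExecutor

    def process_with_prefetch(fetch_block, process, num_blocks):
        with ThreadPoolExecutor(max_workers=1) as prefetcher:
            pending = prefetcher.submit(fetch_block, 0)
            for i in range(num_blocks):
                block = pending.result()       # waits only if the prefetch lagged
                if i + 1 < num_blocks:
                    pending = prefetcher.submit(fetch_block, i + 1)
                process(block)                 # runs while block i+1 transfers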

Caching - Caching reduces the number of data transfers. Transferred data are cached in local storage so that only data that are not in the cache are transferred from the remote storage. This reduces the number of data transfers for repeated accesses to the same data. However, it is impractical to cache a very large dataset in its entirety due to the space demands on the local storage, so only part of the dataset is cached; increasing the local storage capacity can reduce the number of redundant transfers.

Deduplication - Deduplication further reduces the number of data transfers. A file or a dataset may contain redundant content, or duplicate content found in another file or dataset. Deduplication detects duplicate content so that only unique content is transferred, thereby improving the efficiency of caching. Deduplication is especially useful for detecting changes made to a dataset: only the changed parts (chunks) are transferred.

These techniques are most useful when transferring data over a low-bandwidth WAN; thus, some of them, such as prefetching, caching, and deduplication, are also called WAN optimization techniques. They are often regarded as essential requirements for cloud services, where users and services are connected via the high-latency, low-bandwidth Internet [27], [28].

1.4 CONTRIBUTIONS

On-demand data transfer has not been a widely used approach in scientific computing over a WAN due to the drawbacks described above. This dissertation, however, explores the efficacy of on-demand data transfer in scientific computing as an alternative to staging. This dissertation claims the following contributions:


- This dissertation evaluates the viability of on-demand data transfer in scientific computing. It addresses the drawbacks of on-demand data transfer with several transfer optimization techniques. It describes how on-demand data transfer is implemented in Syndicate and Stargate, and compares their performance to existing data transfer tools used for staging.

- This dissertation presents an efficient cluster-to-cluster data transfer design. Stargate has a simple cluster-to-cluster transfer scheduling design that performs parallel data transfers and reduces unnecessary LAN traffic to improve data transfer bandwidth between clusters.

- This dissertation presents an efficient co-scheduling design for data access in cluster computing. Stargate integrates the distributed compute framework, intra-cluster caching, and the data transfer service within a cluster to reduce unnecessary LAN traffic and improve data access bandwidth in scientific computing.

These contributions support the thesis that the on-demand data transfer approach can be an acceptable solution for remote data access in scientific computing. Properly implemented, on-demand data transfer can improve both data access bandwidth and the usability of systems. In doing so, Stargate offers a platform for researchers performing scientific computations in geo-distributed locations.


CHAPTER 2. RELATED WORK

This chapter explores the existing body of work on remote data access. It reviews existing data transfer tools and systems that can be used for remote data access in scientific computing, as well as systems that provide data and computation co-scheduling for efficient data access on a cluster.

2.1 DATA STAGING

The most popular data transfer tools for staging data are curl [29] and GNU Wget [30]. They are provided on most Linux/Unix systems and are used for downloading datasets stored on Web or FTP servers. Because many national institutes (e.g., NIH data repositories [31]) use Web or FTP servers for sharing scientific datasets, these tools are frequently used in scientific computing. However, they cannot access storage other than Web or FTP servers due to their limited protocol support, and they do not implement any transfer optimization techniques, such as parallel data transfers, caching, or deduplication.

Peer-to-peer data transfer protocols, such as CoopNet [32], BitTorrent [33], and Slurpie [34], can be used for accessing large datasets. These protocols take advantage of cached copies on neighboring peers to increase data download bandwidth. Therefore, they work especially well for accessing popular data, such as video files or Linux images, that are likely to be cached on neighboring peers. For the same reason, however, the protocols are less attractive for accessing scientific datasets, which are relatively unpopular. In addition, their data transfer bandwidth relies on their peers, so it can be unstable compared to data transfers via cloud services. Nevertheless, there are some examples of sharing scientific datasets via these protocols, such as Academic Torrents [35]. Unlike Syndicate and Stargate, however, they provide neither integration with compute frameworks nor efficient cluster-to-cluster data transfer.

Globus [36] is a popular data transfer and management service in scientific computing, widely deployed at universities and national institutes for sharing scientific datasets. Globus uses the GridFTP [37] protocol, an extension of the FTP protocol. GridFTP provides efficient file transfer by using concurrent data transfers and automatically optimizing the TCP configuration (e.g., window size and buffer size). Globus also provides efficient cluster-to-cluster data transfers by scheduling data transfers between cluster nodes [38]. In addition, Globus provides access to a variety of storage through a protocol conversion server, called the Globus Connect Server. Because of this, Globus has been widely deployed in scientific computing environments. Unlike Syndicate and Stargate, however, Globus does not provide integration with compute frameworks; data must be staged separately prior to scientific computation.

Data transfer tools for cloud storage services are also used in scientific computing. Amazon S3 (Simple Storage Service) [24] and Google Cloud Storage [39] are commercial cloud storage services that host public scientific datasets [40], [41]. Their data transfer tools, AWS CLI for Amazon S3 and gcloud for Google Cloud Storage, provide good data transfer performance because of the high network bandwidth of their infrastructures, and they implement parallel data transfers. However, the use of cloud storage services for sharing scientific data is not yet widely adopted because datasets must be imported into these systems, which is often impractical.

iCommands is a data transfer tool for iRODS [42], a data grid used as storage for scientific computing. The Texas Advanced Computing Center (TACC) [43] and CyVerse [44] use iRODS as storage for user data and for sharing community datasets [45], [46]. iRODS not only provides reliability and scalability as a storage service for large-scale research data, but also provides a novel feature called the rule engine. The rule engine enables programmers to define custom actions on file system events, which is useful for integrating the storage with computing services, such as the CyVerse Discovery Environment [47]. For example, one can define a rule that performs data analysis when input files are stored in iRODS. Its data transfer tool, iCommands, uses parallel data transfers and makes direct access to the storage node that stores the data in iRODS; thus, it provides good data transfer performance for staging.

DistCP [48] is a data transfer tool for the Hadoop Distributed File System (HDFS) [49]. DistCP is implemented in Hadoop and performs concurrent data transfers to stage remote data on a local HDFS. DistCP can access any storage service that implements the Hadoop Compatible File System (HCFS) [50] interface, such as Amazon S3 [24], GlusterFS [51], and Lustre [52]. DistCP does not implement any other transfer optimizations, such as caching or deduplication. It also performs data transfer tasks at the granularity of files, which is not ideal for datasets consisting of fewer files than cluster nodes. In addition, because it is implemented in Hadoop, it incurs significant overheads for scheduling and initiating transfers. These factors result in inefficient use of network bandwidth.

Syncopy (Synchronous Copy) [53] is a data transfer tool for HDFS used to replicate data across multiple HDFS clusters. Unlike DistCP, which transfers entire files, Syncopy only transfers the data appended to files between the HDFS clusters, making efficient use of network bandwidth. It achieves this by augmenting the file-appending API of HDFS. However, Syncopy provides no advantage for initial file transfers because the files are transferred in their entirety, and it does not avoid transferring redundant portions of files.

2.2 ON-DEMAND DATA TRANSFER

NFS [54] is a widely used network file system. It enables users to mount remote datasets on a local directory hierarchy, allowing computations to access the remote datasets via the POSIX file system interface. The data are transferred on demand as the computations access them. However, NFS does not implement any transfer optimization techniques for low-bandwidth WANs, so transfers of large datasets via a WAN are less efficient. For example, NFS typically uses 64KB block-based transfers, so it suffers from the many-small-reads problem when transferring large datasets over a high-latency, low-bandwidth WAN.

CurlFtpFS [55] also allows users to mount remote datasets stored on Web or FTP servers on a local directory hierarchy. CurlFtpFS relies on curl for data transport, but it offers on-demand data transfer and therefore does not have the drawbacks of staging. However, it does not implement any transfer optimization techniques, so it has the drawbacks of on-demand data transfer.

irodsFS (also known as iRODS FUSE) [56] is a client for the iRODS data grid. irodsFS allows users to mount remote datasets on a local directory hierarchy for on-demand data transfer. It uses block-based data transfer with a user-configurable block size, parallel data transfers, and prefetching for improved data transfer bandwidth. However, it has two issues. First, it requires manual configuration on every node of a cluster, which is common to all file system clients based on FUSE [57]. Second, data transfers are relayed by a central server (iCAT) between the client and the data servers that actually store the data. This makes it less efficient than the other iRODS data transfer tool, iCommands, which transfers data between the client and the data servers directly.

DavFS2 [58] is a FUSE-based client that implements the WebDAV [59] protocol. It provides disk-based caching and prefetching. In addition, its protocol, WebDAV, is based on HTTP so web caches and CDNs can be used for caching frequently accessed data blocks. Due to this advantage, WebDAV is often layered on other storage systems (e.g., iRODS) for data access over the Internet.

CernVM-FS [60] is a file system that delivers software packages across geo-distributed computing infrastructures for research collaboration. It provides efficient data transfer via a content-addressable protocol, so that redundant data are transferred only once (deduplication). Also, because it uses HTTP for data transfers, web caches and CDNs can be used for caching. CernVM-FS, however, has two drawbacks that make it rarely used for large data transfers. First, it uses small variable-size chunks for deduplication, which is suboptimal for transferring large amounts of data; CernVM-FS is designed for the efficient transfer of software packages, which are relatively small compared to scientific datasets. Second, existing datasets stored on other storage must be imported into CernVM-FS, which is often impractical.

WebHDFS [61] is a data access service for HDFS. It runs on all nodes in an HDFS cluster and provides a REST interface for data access, so it supports remote data access. Unlike other file systems and data transfer tools, WebHDFS exposes the layout of the data blocks stored in HDFS. Using this block information, clients can make direct access to the HDFS storage node (data node) on which the data are stored. This not only reduces data access latency but also load-balances data access requests. WebHDFS is often used as the data access interface for an HDFS-based Data Lake, a centralized storage for data collected by other services or devices. WebHDFS does not implement any transfer optimization techniques, so it is not ideal for a low-bandwidth WAN environment; its effectiveness for remote data access across a WAN has not been studied.

WANDisco Fusion [62] provides data access to remote HDFS clusters. The system's main design goal is to provide data availability in the face of cluster failures; therefore, it replicates data between federated clusters, and data access is provided on top of the data replication. The system is a hybrid of staging and on-demand data transfer: it actively makes local copies of remote data, as staging does, and accesses are served from a local copy if one exists, while on-demand data transfers are used to access data that are not yet locally copied. Because the system uses a proprietary data transfer protocol, it is unknown what features the protocol implements for efficient data transfer over a WAN.

OpenDedup [63] is a scalable storage system built on top of existing cloud storage services. It implements content-addressable storage for efficient management of storage space and efficient data transfer via deduplication. It allows users to mount data stored on OpenDedup on a local directory hierarchy for on-demand data transfer. However, the system is mainly designed to archive data on cloud storage rather than to provide on-demand access to remote datasets. It lacks integration with external data sources, requiring data import, which is impractical for large datasets.

Most distributed file systems provide on-demand data transfer between the client and the data servers directly. For example, Ceph [64], the General Parallel File System (GPFS) [18], Lustre [52], and GlusterFS [65] are distributed file systems widely used in scientific computing. These systems provide on-demand data access within a data center over a LAN with high scalability and availability. However, their efficacy for remote data access over a WAN is not well studied. Moreover, these file systems require data import, which is impractical for large datasets in external data repositories.

2.3 DATA AND COMPUTATION CO-SCHEDULING

Octopus [66] provides data-aware job scheduling in a virtualized data center. It co-locates jobs that access the same input data on the same node to maximize data reusability. The jobs access input data stored remotely, causing on-demand data transfers, and the transferred data are cached in the node's RAM for reuse. The system provides an integration between a compute framework and remote storage for efficient data access. However, it is not designed for efficient data transfer over a low-bandwidth WAN. Also, it caches entire files, which is less than ideal for datasets containing large files.

BAD-FS [67] provides scheduling for the Condor [68] job scheduling system that takes remote data access into consideration. Its scheduler takes full control of job scheduling, data transfer, and caching for efficient data access. BAD-FS makes efficient use of clusters by making direct data transfers between a data node and a compute node, each in a different cluster. However, BAD-FS mainly relies on staging; input files are staged to compute nodes before a job begins.

A co-scheduling system by Romosan et al. [69] schedules Condor jobs according to data location. The system maintains a data cache in a compute cluster and takes the layout of the cache into consideration for job scheduling. However, the system is not designed to transfer large datasets over a low-bandwidth WAN. In addition, similar to BAD-FS, the system relies on staging.

Another co-scheduling system, by Chen et al. [70], schedules tasks of a Pegasus workflow in a multi-data-center environment. It assumes a workflow that spans multiple geo-distributed data centers. Its task scheduler uses a novel scheduling algorithm based on graph partitioning that takes data placement and computation/storage constraints into consideration to reduce the overall data transfer cost. However, the system assumes that compute tasks can be run remotely, so it can schedule a task on the remote data center close to the data it accesses instead of requiring remote data access. Also, the system relies on an existing data staging tool in Pegasus.

2.4 LIMITATIONS OF EXISTING WORK

Existing data transfer tools for staging have several drawbacks, as described in Section 1.1. Staging precludes overlapping computations with data transfers, which may increase the runtimes of workflows. It may waste network bandwidth and local storage by transferring and storing data that are ultimately unused. Unnecessary intra-cluster data transfers may occur between staging and storage nodes, and between storage and compute nodes. The local copies of entire datasets that staging creates also require active management. This dissertation addresses these drawbacks by taking an alternative approach, on-demand data transfer, and evaluates its viability in scientific computing.

Systems that provide on-demand data transfer enable scientific computations to access remote datasets without staging, and therefore do not share staging's drawbacks. However, existing systems do not provide adequate functionality to substitute for staging in scientific computing. Many of them do not provide optimizations for data transfer over a low-bandwidth WAN. Others require existing datasets on other storage to be imported, which is impractical. Their data access performance is also a concern. This dissertation addresses these problems to make on-demand data transfer useful and efficient in scientific computing.


Some systems provide efficient scheduling of scientific computations and remote data access using co-scheduling. However, some of them rely on existing staging tools to transfer data to a compute cluster, which incorporates the drawbacks mentioned above. Others use on-demand data transfer to provide remote data access within a data center over a LAN rather than over a low-bandwidth WAN. This dissertation augments the idea of co-scheduling to provide efficient remote data access over a WAN.


CHAPTER 3. SYNDICATE

Data transfer between geographically separated systems and users over WANs is increasingly common due to the rise of cloud computing. However, efficient remote data access is challenging: the high cost and low bandwidth of WANs are a bottleneck in data transfer, and data access requires mediation between remote systems and users.

Syndicate [71] is a virtual cloud storage service built out of already-deployed commodity components, such as cloud storage, edge caches, and data repositories. It leverages external storage (e.g., iRODS and Amazon S3) for storing data written by users (e.g., analysis output), while providing on-demand data transfer for datasets stored on data repositories (e.g., GenBank [72]). Syndicate can be configured with web caches (e.g., Varnish Cache [73] and Squid [74]) or CDNs (e.g., Akamai CDN [75] and Amazon CloudFront [76]) to improve data access bandwidth. Because it mediates between remote repositories and users for data transfers, it is also a data transport service.

Jude Nelson at Princeton University designed the system architecture and implemented its core functionality for his doctoral research [77]. He designed and implemented Syndicate's core components and the communication protocols between them. His original system architecture and implementation details are presented in Section 3.1.

My main contribution with respect to Syndicate was to adapt it to scientific computing. I contributed to the development of storage and data repository drivers for iRODS and FTP; the drivers enable access to existing scientific datasets and the storing of research output (user data). I also contributed to the development of user interfaces, such as a Hadoop plugin and a REST server. My contributions to Syndicate are presented in Section 3.2. In addition, I explored Syndicate's application to scientific computing: I developed a scientific dataset delivery platform, called the Syndicate Dataset Manager, to make Syndicate easier to use. The Syndicate Dataset Manager is presented in Section 3.3.

3.1 SYNDICATE ARCHITECTURE

This section describes the original design and implementation of Syndicate. The key idea of Syndicate is to build a storage service that spans multiple organizational boundaries.

Syndicate consists of four components: the Metadata Service, User Gateway, Replica Gateway, and Acquisition Gateway (Figure 3.1). These components interoperate to create a Syndicate Volume that organizes data from the underlying storage. Each component has different responsibilities:

Metadata Service - The Metadata Service maintains information about users, volumes, and gateways, as well as file metadata. This component runs in the cloud (e.g., Google App Engine [78]) for high availability and scalability.

User Gateway - A User Gateway runs on the same system as computations and provides an interface for the computations to access volume data. A User Gateway does not store data persistently; instead, user data created or modified by a User Gateway are sent to Replica Gateways for storage. A User Gateway caches recently accessed data blocks on its local disk to improve performance.

Figure 3.1: Syndicate architecture. Syndicate interconnects geo-distributed users and storage via Syndicate Gateways. An Acquisition Gateway provides data access to data repositories (data sources). A User Gateway provides data access interfaces to various compute environments. A Replica Gateway stores user data. The Metadata Service is a central service that maintains information about users, volumes, and gateways.

Replica Gateway - A Replica Gateway is responsible for storing data blocks created or modified by User Gateways to external storage (e.g., iRODS, Dropbox, and Amazon S3). A Replica Gateway can be located in the same system as a User Gateway, in the cloud, or close to the external storage. A Replica Gateway also serves the data it stores to User Gateways. Optionally, a web cache or CDN can be configured with a Replica Gateway to improve read performance.

Acquisition Gateway - An Acquisition Gateway runs close to external storage and makes datasets stored in the external storage available to User Gateways for read-only access. An Acquisition Gateway imports only the metadata of the external datasets into the Metadata Service. Optionally, web caches or CDNs can be configured with an Acquisition Gateway to improve read bandwidth and scalability.

There are two ways of adding user data to Syndicate. First, computations may write new data

to a volume via a User Gateway. In this case, the data are routed to a Replica Gateway that

writes to external storage. Second, data can be read from external storage via an Acquisition

Gateway. Unlike other file systems, external data are not copied into Syndicate; data are instead

fetched on demand from the external storage by the Acquisition Gateway. This functionality is called publishing to distinguish it from importing (making a copy of) the data as is done in other file systems.

Publishing is advantageous for providing access to large existing datasets. Publishing only requires the Metadata Service to maintain the metadata of the files in the dataset. Because the metadata is small relative to the dataset itself, publishing is much faster than importing the dataset. In addition, publishing does not require storing a copy of the entire dataset.


3.1.1 PROTOCOL

Syndicate transfers data to a User Gateway when the data are accessed by computations. When the dataset is accessed, the requested data are transferred from the external storage (or the web cache or the closest CDN, if available) to the User Gateway on the local computer. Transporting data on demand avoids copying the entire dataset to local storage before the computation begins. Also, data management for local copies is eliminated because on-demand data transfer does not create a permanent local copy of the dataset.

A potential downside of on-demand transport is that it may cause many small data transfers over the high-latency, low-bandwidth WAN. If, for example, the computation makes many small accesses, each access could trigger its own small transfer, greatly reducing the effective data transfer bandwidth.

To avoid this problem, Syndicate transfers data in large fixed-size blocks. The default block size is 64KB, and it is configurable per volume; larger block sizes (e.g., 512KB or 1MB) can be used to improve data transfer performance. When data are read, the blocks containing the requested data are transferred if they are not present in the local cache.
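To make the block-based read path concrete, the following Python sketch maps an application read at a byte offset onto whole-block transfers, assuming a fixed per-volume block size. The cache and fetch_block names are hypothetical and do not correspond to actual Syndicate APIs.

    BLOCK_SIZE = 64 * 1024  # default 64KB; configurable per volume

    def read(cache, fetch_block, offset, length):
        # Return `length` bytes starting at `offset`, transferring whole
        # blocks for any portion not already in the local cache.
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        data = bytearray()
        for block_no in range(first, last + 1):
            if block_no not in cache:          # only uncached blocks are transferred
                cache[block_no] = fetch_block(block_no)
            data.extend(cache[block_no])
        start = offset - first * BLOCK_SIZE
        return bytes(data[start:start + length])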

Syndicate uses an HTTP-based protocol for communication between gateways and the

Metadata Service. When Syndicate communicates with storage or computations, external protocols are translated to the Syndicate protocol by the gateways that are located close to the external systems (Figure 3.1) to improve data transfer performance. The data and control plane protocols of Syndicate are based on HTTP to take advantage of existing web and cloud infrastructures. For example, Syndicate runs the Metadata Service on cloud web services (e.g.,


Google App Engine [78]) for availability and scalability. Web caches (e.g., [73]) or CDNs (e.g.,

Amazon CloudFront [76]) can be configured to improve data access bandwidth and reduce the data access workload on storage.

3.1.2 DATA PUBLISHING

A Syndicate user can publish an external dataset to a Syndicate volume via an Acquisition

Gateway. A Syndicate user who does this is the publisher of the dataset. The publisher must have read permission on the external dataset and write permission to the Syndicate volume in which the dataset will be published.

The Acquisition Gateway can support a variety of data repositories that use different protocols for data access. This is done by Acquisition Gateway drivers that implement

repository-specific functionality. For example, an FTP driver implements the FTP protocol to provide

access to FTP repositories. The Acquisition Gateway driver API allows developers to add

support for new data repositories and protocols.

The publisher may run the Acquisition Gateway anywhere that it can access the dataset.

However, it is beneficial to run it close to the dataset to reduce data access latency. The publisher

instantiates an Acquisition Gateway and publishes the dataset to the volume. After publishing,

the dataset becomes accessible to all Syndicate users who have permission to access the volume.

The Acquisition Gateway relays all data accesses to the dataset to the external data repository

storing the dataset.


3.1.3 DATA ACCESS

Data access to a Syndicate volume is done via a User Gateway. A User Gateway provides data access interfaces to a variety of compute environments. For example, the Syndicate CLI is a User

Gateway that provides a command-line data access interface. Syndicate users can list directory entries, read and write files, and delete files via the tool. Syndicate FUSE is a User Gateway that enables users to mount Syndicate volumes on a local directory hierarchy.

Figure 3.2: Multi-tier data caching in Syndicate. Syndicate overcomes the WAN bandwidth limitations by caching reusable data in a cache hierarchy.

Syndicate implements multi-tier caching to increase data reusability and overcome the bandwidth limitations of a WAN (Figure 3.2). WAN bandwidth is a potential bottleneck when

User Gateways access data from Acquisition Gateways or Replica Gateways. Syndicate addresses this by caching the most recently accessed data in a gateway’s memory and local storage. The caches allow computations to reuse recently accessed data via the gateways co-


located on the same machine. In addition, web caches can be co-located on the same network to provide data caching to multiple machines. In this case, computations on the same network can benefit from the cache. Cloud cache services, such as CDNs, can also be configured to cover

even larger scales and more machines.

3.2 APPLICATION TO SCIENTIFIC COMPUTING

The original Syndicate design has several benefits when used in scientific computing as a dataset

delivery platform. First, Syndicate can provide access to datasets stored in data repositories in

multiple administrative domains using Acquisition Gateways. In addition, Syndicate can support

various data transport protocols used by the data repositories by implementing Acquisition

Gateway drivers. Second, Syndicate provides high-performance data access to geo-distributed

data repositories over a WAN. Syndicate implements multi-tier caching for maximum data

reusability and block-based data transfer for efficient use of network bandwidth. Third,

Syndicate provides on-demand data transfer that does not require manual data staging. Lastly,

Syndicate can support various compute environments, such as HPC and Hadoop, using User

Gateways. Syndicate provides native data access interfaces for those environments to enable

computations running on them to access datasets directly.

However, Syndicate is not ideal as a dataset delivery platform for scientific computing.

Syndicate does not make use of network bandwidth efficiently because Syndicate lacks several

performance features that are common in high performance storage, such as parallel data

transfers and prefetching. Thus, Syndicate showed poor performance compared to existing data transfer tools, such as Wget and curl.

To improve data transfer bandwidth, I made three improvements to Syndicate. First, HTTP connections for data transfers are reused to reduce overhead. The original User Gateway implementation created a new HTTP connection for each data transfer. Although establishing a single HTTP connection takes little time, doing it for every data transfer adds up, resulting in high data access latency. To address this issue, the User Gateway now caches connections for reuse.

Second, larger data blocks are used to improve data access bandwidth. On-demand transfer using small data blocks is inefficient: it requires more request-response exchanges, which prevents efficient use of network bandwidth. To address this, I modified Syndicate to support data blocks of up to 32MB instead of its original 1MB limit.

Lastly, data blocks are prefetched for efficient use of network bandwidth. Transferring blocks one at a time prevents efficient use of network bandwidth because of the lack of pipelining. Even worse, a single data access requires two on-demand data transfers in sequence: the first between the User Gateway and the Acquisition Gateway, and the second between the Acquisition Gateway and the data repositories. I implemented prefetching to avoid this problem. Once an application has read n sequential blocks (n is configurable), Syndicate begins prefetching the next m blocks (m is configurable) in the file. Syndicate ceases prefetching if the application performs a non-sequential access. Prefetching lets Syndicate use HTTP pipelining, which enables more efficient use of network bandwidth.
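The following Python sketch illustrates the prefetch policy just described: after n sequential block reads, the next m blocks are scheduled for transfer, and a non-sequential access resets the run. The class and the fetch_async callback are illustrative only, not part of the Syndicate codebase.

    class Prefetcher:
        def __init__(self, n, m, fetch_async):
            self.n, self.m = n, m            # both configurable
            self.fetch_async = fetch_async   # schedules a background block transfer
            self.run = 0                     # length of the current sequential run
            self.last = None                 # last block number read

        def on_read(self, block_no):
            if self.last is not None and block_no == self.last + 1:
                self.run += 1
            else:
                self.run = 1                 # non-sequential access resets the run
            self.last = block_no
            if self.run >= self.n:           # begin prefetching the next m blocks
                for b in range(block_no + 1, block_no + 1 + self.m):
                    self.fetch_async(b)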

To enable researchers to access scientific datasets stored on data repositories, I implemented two Acquisition Gateway drivers. Many national institutes (e.g., NIH data repositories [31]) use

FTP servers for sharing scientific datasets. iRODS is also used by Texas Advanced Computing

Center (TACC) [43] and CyVerse [44] for sharing public scientific datasets [45], [46]. FTP and iRODS drivers enable computations to make access to a variety of public scientific datasets.

I also implemented two new User Gateways. First, the Syndicate REST User Gateway provides a REST interface for data access. Computing services, such as the CyVerse Discovery

Environment [47], can make use of the interface to access data in Syndicate volumes. Second, the Syndicate Hadoop plugin allows users to mount Syndicate volumes in Hadoop. It relies on the Syndicate REST User Gateway for data access. In a Hadoop cluster a Syndicate REST User

Gateway is deployed on each node in the cluster to provide local compute tasks with access to data stored on Syndicate volumes. The plugin can be also used for other Hadoop-compatible computing platforms, such as Spark [79] and HBase [80].

The Syndicate Hadoop plugin makes use of cache layout information from the User Gateway to facilitate co-locating tasks and data. Figure 3.3 illustrates the task placement in Hadoop. The

Hadoop task scheduler makes a data layout query to the plugin in order to determine which node should run a Hadoop task based on the data it requires. The Cache Monitor in the plugin collects information about the cache layout in the Syndicate REST User Gateways running in the cluster.

In response to the data layout query from the Hadoop task scheduler, the Cache Monitor returns the cache layout, causing the scheduler to co-locate computations with the data they require. Unlike the HDFS data layout, the cache layout may contain false positives or false negatives because the cache status changes over time through cache eviction or the addition of new blocks by other tasks on the same node. Cache misses are satisfied by web caches, CDNs, or remote Acquisition Gateways.
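The following Python sketch shows how such a data layout query might be answered from the Cache Monitor's view of per-node cached blocks. The data structures and names are hypothetical; a possibly stale answer is acceptable because a miss is still satisfied remotely.

    def block_locations(cache_layout, path, block_no):
        # cache_layout: {node_name: {file_path: set of block numbers}} collected
        # from the Syndicate REST User Gateways in the cluster.
        # The result may contain false positives or negatives because caches
        # change between polls; misses fall back to the CDN or the Acquisition
        # Gateway, so correctness is unaffected.
        return [node for node, files in cache_layout.items()
                if block_no in files.get(path, set())]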

Figure 3.3: Syndicate facilitates co-locating compute tasks and data caches in Hadoop. Syndicate enables Hadoop computations to access remote datasets efficiently by task scheduling based on cache locality. Syndicate causes the Hadoop job scheduler to distribute tasks over a cluster according to data caches in User Gateways. The black dotted arrows represent accesses to


locally cached blocks that are satisfied by User Gateways. The red dotted arrows represent accesses to uncached blocks that are satisfied by the CDN.

I deployed Syndicate to a metagenomics research cloud, iMicrobe [81]. iMicrobe is built on

top of the computing infrastructures of CyVerse, Texas Advanced Computing Center (TACC),

and University of Arizona. Syndicate is deployed between the CyVerse DataStore and a Hadoop

cluster in University Information Technology Services (UITS). The CyVerse DataStore runs

iRODS and stores microbial datasets. The Hadoop cluster is used to perform large-scale sequence analyses, such as Libra [82]. The CyVerse DataStore and the Hadoop cluster are connected

via a high-bandwidth University of Arizona campus network. In the deployment Syndicate

allows remote data access without staging the datasets. This simplifies the iMicrobe workflows

by eliminating staging. However, the high-bandwidth campus network prevents evaluating the

effectiveness of Syndicate for data transfer over a low-bandwidth WAN. This led to the

development of a new platform, the Syndicate Dataset Manager (SDM), to deploy Syndicate on a

larger scale.

3.3 SYNDICATE DATASET MANAGER (SDM)

Publishing an external dataset via an Acquisition Gateway is a labor-intensive process. As a

storage service, Syndicate provides numerous features to meet users’ needs. However, this makes configuration complicated. Even simple publishing requires multiple steps. First, a

publisher creates a new volume for the dataset to be published. Second, the publisher creates an

Acquisition Gateway configuration that can access the external storage, and instantiates an


Acquisition Gateway using the configuration. Third, the publisher creates a User Gateway

configuration that allows User Gateways instantiated by other Syndicate users to access the

volume. Fourth, the publisher configures the User Gateways to use CDNs or web caches to

access the dataset, if desired. Lastly, the publisher shares the volume name and access

information, including the User Gateway configuration, with potential users out-of-band, such as

via e-mail. There is no built-in service for discovering published datasets and their volumes.

The Syndicate Dataset Manager (SDM) [83] is a scientific dataset delivery platform that

makes use of Syndicate for data transport. The Syndicate Dataset Manager addresses the configuration difficulty and provides a user-friendly interface for ease of use by both publishers

and users. It allows publishers to publish scientific datasets via a simple command and a

configuration file. It also allows researchers to discover, mount on a local directory hierarchy,

and access scientific datasets on-demand via a command line tool.

Syndicate Dataset Manager is inspired by package managers in Linux, such as Yellowdog

Updater Modified (YUM) [84] and Advanced Package Tool (APT) [85]. Package managers

provide quick searching of software packages and simple commands for installing and uninstalling them. Their ease of use inspired the design of the Syndicate Dataset Manager.

The Syndicate Dataset Manager architecture consists of three components: Catalog Service,

SDM-Publish, and SDM-Mount (Figure 3.4). The Syndicate Dataset Manager components have the following responsibilities:


Figure 3.4: Syndicate Dataset Manager architecture. The Syndicate Dataset Manager makes scientific data publishing and mounting easier. The Catalog Service manages the metadata of published datasets that are necessary for search and mounting. Datasets published via SDM- Publish can be easily searched and mounted via SDM-Mount.

Catalog Service - The Catalog Service maintains dataset metadata, such as Syndicate volume names and User Gateway configurations. This component is responsible for distributing access information for published scientific datasets to Syndicate Dataset Manager users. This addresses the difficulty in locating and mounting new datasets. This component runs in the cloud for high availability and scalability.

SDM-Publish - The SDM-Publish augments the Syndicate Acquisition Gateway to simplify the dataset publishing process. Publishing a dataset is done via a command-line tool and a

configuration file that describes the dataset and how to access it. During the publishing process, the tool automatically registers the access information with the Catalog Service.

SDM-Mount - The SDM-Mount augments the Syndicate User Gateway to simplify the dataset mounting process. The tool retrieves access information for published scientific datasets from the Catalog Service. Researchers discover and mount published scientific datasets via the tool.

The Catalog Service provides a REST interface for dataset registration and retrieval of access information for datasets. When a publisher publishes a dataset, SDM-Publish registers its access information automatically to the Catalog Service via the REST interface. When a researcher lists or mounts datasets, the SDM-Mount retrieves the access information from the Catalog Service.

The Catalog Service maintains several pieces of information about published datasets. It contains a table that stores a unique identifier for the dataset, a short description including search keywords, the Syndicate username of the publisher, the name of the Syndicate Volume that contains the dataset, and the URL of the Metadata Service for the Syndicate Volume. This information is essential for searching and accessing datasets via Syndicate.
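As an illustration, a catalog record and its registration might look like the following Python sketch. The field names, endpoint, and hosts are hypothetical, since the exact REST schema of the Catalog Service is not specified here.

    import requests  # third-party HTTP client, assumed available

    record = {
        "id": "refseq",                          # unique dataset identifier
        "description": "NCBI RefSeq genomes",    # short description with search keywords
        "publisher": "alice",                    # Syndicate username of the publisher
        "volume": "refseq_volume",               # Syndicate Volume containing the dataset
        "ms_url": "https://ms.example.org",      # Metadata Service URL for the volume
    }
    # SDM-Publish registers the record during the publishing process.
    requests.post("https://catalog.example.org/datasets", json=record)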

The Catalog Service also maintains information about web caches and CDNs for the published datasets. Syndicate uses web caches and CDNs for improved access bandwidth and scalability. The Catalog Service stores the URL prefix of a cache site and the URL to the

Acquisition Gateway for the dataset. User Gateways access the dataset via the URL prefix of the


cache site and the Acquisition Gateway’s URL for the dataset. The cache site receives the access and extracts the Acquisition Gateway’s URL from the request URL. The cache site pulls the data block from the Acquisition Gateway on a cache miss. The Catalog Service also stores the GPS location of each cache site. Using these GPS locations, SDM-Mount automatically selects the geographically closest cache site for data access. The closest cache site is assumed to have the fewest network hops and therefore to provide the fastest access.
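One plausible implementation of this selection, sketched in Python, computes great-circle (haversine) distances from the user to each cache site's stored GPS location and picks the minimum; the exact metric SDM-Mount uses is not specified here.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(a, b):
        # Great-circle distance between two (lat, lon) pairs in kilometers.
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(h))

    def closest_cache_site(user_gps, cache_sites):
        # cache_sites: list of (url_prefix, (lat, lon)) from the Catalog Service.
        return min(cache_sites, key=lambda s: haversine_km(user_gps, s[1]))[0]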

The SDM-Publish simplifies the dataset publishing process via a command-line tool and a configuration file that describes the dataset and how to access it (Figure 3.5). The configuration file stores the publisher’s Syndicate authentication information, the dataset name, description of the dataset, the address of the Acquisition Gateway, and access information of the storage. The command-line tool automatically creates a new volume and an Acquisition Gateway to publish the dataset and a User Gateway for access. It then registers the dataset information to the Catalog

Service. Lastly, it publishes the dataset using an Acquisition Gateway that is containerized via

Docker [86].


Figure 3.5: Example SDM-Publish configuration file for publishing a dataset stored on an external FTP repository. The Dataset Configuration section describes the dataset, how to publish it, and how to access it. The AG Driver Configuration section describes the data source. The Data Publisher’s Private Key section contains the publisher’s private key, which Syndicate uses to authenticate the publisher.

Datasets published via SDM-Publish are anonymously accessible. Because many scientific datasets are publicly accessible, there is no reason to control access to the datasets via Syndicate.

Providing anonymous access simplifies the volume and the gateway configurations.

The SDM-Mount provides searching and mounting of datasets via a command-line interface.

It provides commands for listing, searching, mounting, and unmounting of datasets (Table 3.1).


The tool automatically acquires the necessary access information for the dataset from the Catalog

Service and launches a User Gateway to mount the dataset. In this way it simplifies data access in scientific computing and allows researchers to start experiments quickly.

Command    Description                                         Example

ls         List name and description of available datasets    $ sdm ls

search     Search datasets by a keyword                       $ sdm search ncbi

mount      Mount the given dataset                            $ sdm mount refseq /mnt/refseq

unmount    Unmount a dataset on the given mount_path          $ sdm unmount /mnt/refseq

clear      Clear local cache of data blocks                   $ sdm clear

ps         Display mount status                               $ sdm ps

Table 3.1: SDM-Mount commands. SDM-Mount provides commands for listing, searching, mounting, and unmounting of datasets.

3.4 EXPERIMENTAL SETUP

Syndicate performance is evaluated using a genomic data analysis workflow on a well-known genomic dataset. The evaluation compares the elapsed times of the workflow using Syndicate and several existing data access methods. In addition, the impact of caching on

Syndicate’s performance is measured.


The 1000 Genomes [22] dataset used in the experiments contains genome sequences of

thousands of individuals from different global populations. Because the entire dataset is very large (approximately 1480GB), for expediency the experiments are performed on a subset consisting of 29 files containing a total of 10.05GB of data.

Mash (v2.1) [87] is a bioinformatics tool that measures the genetic distance between genome sequence files. By analyzing the distance between sequence files from different environments, researchers find correlations and discover new insights for further research. Mash reads entire genome sequence files in a dataset sequentially, then it generates k-mers from the sequences. It subsamples k-mers using the MinHash algorithm [88] to reduce the number of k-mers used in the computation. The subsampled k-mers are compared to compute a genetic distance between genome sequence files using Jaccard similarity. During its run, Mash reads each genome file in the dataset sequentially and in its entirety. Experiments used the default configuration for Mash.

The choice of Mash favors the staging methods for several reasons. First, Mash reads the entire input dataset, which ensures that the staging methods do not transfer unused data. Second, the on-demand data transfer methods are handicapped because they transfer data in individual requests, requiring request-response exchanges for every data block. Third, Mash is computationally less intensive than other bioinformatics tools, such as Simka and Libra. This increases the impact of the data transfer bandwidth on the overall running time, making it important to use the bandwidth efficiently. Although it is less favorable to


Syndicate, Mash is used to evaluate Syndicate because it is a popular tool for metagenomic sequence analysis.

In the experimental setup Amazon CloudFront [76] is used for the CDN. A new CDN

instance was created for each cold cache experiment to ensure that the caches were initially empty. For warm cache experiments, each test was performed twice on the same CDN and the second run measured.

The experiments evaluate Syndicate in two usage scenarios that use different data sources and computational resources for the genomic sequence analysis. In each scenario the efficiency of different data access methods is compared. Each result is the average of three runs of the experiment.

Syndicate performance is compared against several existing data access methods — GNU

Wget (v1.19.4), CurlFtpFS (v0.9.2), iGet (of iCommands v4.2.4), and irodsFS (also known as iRODS FUSE, v2.1.1). GNU Wget and iGet are used for staging the dataset. Staging is accomplished by copying the dataset to a local disk before running Mash on the local copy. In addition, the experiments measure GNU Wget running in parallel using GNU Parallel [89] with four parallel processes — the same number of threads as used in Syndicate. CurlFtpFS, irodsFS

and Syndicate are used for on-demand data transfer. The dataset is mounted on a local directory

hierarchy and accessed directly by Mash.


Three systems were used for the experimental setups: a local workstation, instances of

Amazon EC2, and an instance of CyVerse Atmosphere. Mash is run on the local workstation and

the instance of Amazon EC2. Acquisition Gateways ran on the instances of Amazon EC2 and

CyVerse Atmosphere. Details of their configurations are described in each scenario. The local

workstation has two six-core AMD Opteron 2435 processors, 32GB of RAM, and a 500GB hard disk with Ubuntu 14.04 LTS installed. The setup used t2.large instances of AWS EC2 that have two

vCPUs, 8GB RAM, and 30GB SSD per instance. Lastly, the setup used a tiny1 instance of

CyVerse Atmosphere that has one vCPU, 4GB of RAM, and 30GB SSD with Ubuntu 14.04 installed. Syndicate was run inside a Docker container for simplified deployment.

3.5 EXPERIMENTAL RESULTS

The experiments first evaluated Syndicate performance as a function of the block size to

determine the optimal size for scientific computing. Then, Syndicate is evaluated by comparing

performance of the workflow with several existing data access methods in two scenarios —

intercontinental data access and interstate data access.

3.5.1 BLOCK SIZE SENSITIVITY

The data block size is an important configuration parameter that affects data transfer bandwidth.

Small data blocks reduce effective data transfer bandwidth through frequent request-response

exchanges. On the other hand, large data blocks may increase the latency of data access by


blocking computations until the block has been received. Large blocks can also increase data transfer overhead if part of the transferred data is ultimately unused.

Syndicate block size can be configured by the volume owner on a per-volume basis. In

Syndicate Dataset Manager the publisher of a dataset becomes the owner of the volume in which the dataset is published. Thus, the publisher needs to determine the proper block size because it affects data transfer performance of the dataset.

In the experiments Mash is run on an AWS EC2 instance in the US East Region in Virginia (US).

The dataset is stored in CyVerse Data Store in Arizona (US). An Acquisition Gateway is deployed on an instance of CyVerse Atmosphere in the same datacenter as the CyVerse

DataStore.

As can be seen in Figure 3.6, increasing the block size improves bandwidth. Network bandwidth is used more efficiently when transferring larger blocks. Small data blocks decrease data transfer bandwidth by requiring more request-response exchanges; they have a detrimental effect on the total elapsed time of Mash. As a result, using 32MB blocks was 644% faster than using 1MB blocks. This shows that Syndicate’s original 1MB block size limit is clearly too small for transferring scientific datasets. However, there are diminishing returns as the block size increases. I used 32MB blocks in the experiments.


Figure 3.6: A comparison of the metagenomic workflow elapsed times for different block sizes. Increasing the block size improves bandwidth. However, there are diminishing returns beyond 16MB.

3.5.2 INTERCONTINENTAL DATA ACCESS

In this scenario a researcher in the US performs genomic sequence analysis on the 1000

Genomes Dataset using Mash. The dataset is hosted by the European Bioinformatics Institute

(EBI) located in London and is accessible anonymously via the EBI FTP server. The analysis is performed on a local workstation in Arizona (US), and the data are transferred from the EBI FTP server in London (UK) to the workstation via the Internet.


An Acquisition Gateway is deployed on AWS EC2 in the EU Region, chosen because it is

located in London near the EBI FTP server. This enables low-latency (less than 20ms), high-

bandwidth data transfer between the Acquisition Gateway and the EBI FTP server.

Elapsed times of the experiments are shown in Figure 3.7 and 3.8. Computation time (red) is

the time to complete the computation when input data are fully resident in RAM. Transfer time

(blue) is the elapsed time minus the computation time.

Figure 3.7: Intercontinental workflow elapsed times with cold caches. GNU Wget Parallel outperformed other tools providing on-demand data transfer.


Figure 3.8: Intercontinental workflow elapsed times with warm caches. Syndicate with warm caches had comparable performance to staging the dataset to local disk.

Figure 3.7 shows the elapsed times of the experiment using different data access methods with cold caches. Despite its popularity, GNU Wget (WC) did not perform well, running 24%

slower than CurlFtpFS (CC) and 17% slower than Syndicate (SC). This is because GNU Wget

does not transfer data in parallel. GNU Wget Parallel (WPC) had a 32% improvement over GNU

Wget (WC) by running multiple simultaneous instances of GNU Wget using GNU Parallel.

CurlFtpFS (CC) and Syndicate (SC) had comparable performance, although CurlFtpFS (CC) had a

5% shorter elapsed time than Syndicate (SC). This is due to the Acquisition Gateway, which

increases the transfer overhead. The Acquisition Gateway sits between the compute system and

the data source and relays data accesses between them; thus, it increases data access latency.


Similarly, Syndicate-CDN (SCC) includes overheads incurred by the CDN, and as a result,

Syndicate-CDN (SCC) was 7% slower than Syndicate without a CDN (SC).

Figure 3.8 shows the results with warm caches. Hard disk (HW) runs Mash on a local copy

of the dataset stored on a hard disk drive formatted with ext4. This represents an upper bound on Syndicate’s performance, as Syndicate caches its data on the local disk. With a warm cache Syndicate has the entire dataset cached on the local drive; therefore, Syndicate (SW) was only 5% slower than local HDD storage (HW). The overhead is mainly caused by FUSE; Vangoor et al. [90] found that FUSE causes about a 5% overhead on local data accesses.

Syndicate-CDN (SCW) also had comparable performance to local HDD storage (HW) and

Syndicate (SW). Thanks to the CDN’s high-bandwidth data access, Syndicate-CDN

(SCW) completed the workflow only 9% slower than the local HDD storage (HW). This is somewhat surprising because Syndicate-CDN (SCW) does not have a local cache and only uses the CDN’s caches.

The result is even more surprising because it completed faster than the fastest staging method,

Wget Parallel (WPC in Figure 3.7). When accessing popular datasets, the datasets are likely to be cached in a CDN; Syndicate-CDN (SCW) illustrates this situation. Assuming popular datasets are cached in the CDN, Syndicate-CDN (SCW) can complete the workflow 20% faster than

Wget Parallel (WPC in Figure 3.7).


Figure 3.9: Intercontinental workflow elapsed times using Syndicate with warm and cold caches. Syndicate with warm caches (both SCW and SW) is much faster than with cold caches (SCC). Their performance is comparable to local Hard disk (HW).

Figure 3.9 provides a more detailed analysis of Syndicate’s performance with a CDN. When

Syndicate has cold caches (SCC), all data accesses result in cache misses that are satisfied by the external storage (the EBI FTP server) via the CDN and the Acquisition Gateway. Syndicate is

40% faster with warm caches (SCW) vs. cold caches (SCC). The performance of Syndicate with warm CDN caches (SCW) was only 4% slower than when Syndicate had the entire dataset cached on a local disk (SW).

The experiments also measured several other popular data staging tools — Curl (v7.35) and

GridFTP (v6.0). However, the Curl and GridFTP results are omitted from the figures because


their performance was largely the same as GNU Wget — Curl was only 4% slower and GridFTP

was only 2% faster than GNU Wget. Unlike other tools, GridFTP extends the FTP protocol (e.g.,

parallel data transfers) for improved performance. However, the extensions could not be used in the experiments because the EBI FTP server runs the standard FTP protocol, which is incompatible with them.

3.5.3 INTERSTATE DATA ACCESS

In this scenario a researcher in Virginia (US) performs the same analysis as the first scenario. In

these experiments, however, the dataset is stored on the CyVerse DataStore in Arizona (US). The

researcher uses a cloud computing resource in AWS EC2, US East Region, for the analysis. The

data are transferred from the CyVerse Data Store in Arizona to the EC2 instance in Northern

Virginia across the United States via the Internet.

An Acquisition Gateway is deployed on CyVerse Atmosphere in Arizona. CyVerse

Atmosphere is located in the same data center as the CyVerse Data Store that is the data source;

thus, it ensures high-bandwidth data transfer between the Acquisition Gateway and the data source.


Figure 3.10: Interstate workflow elapsed times with cold caches. A staging method (iGet) outperformed other tools providing on-demand data transfer.

Figure 3.10 shows the performance with cold caches. iGet (CC) had the best performance because it transfers the entire dataset to a local SSD using a block-based parallel data transfer protocol. Syndicate (SC) and irodsFS (RC) were more than 57% slower than iGet (CC).

Syndicate and irodsFS both implement block-based parallel data transfers; however, they do not achieve performance comparable to iGet (CC) for two reasons. First, the data transfer threads of

Syndicate and irodsFS were sometimes idle because they only transfer data blocks that are requested by a computation. Prefetching partially addresses this, but they only prefetch a fixed number of blocks. Second, iGet transfers directly from the iRODS data servers via a file transfer

API provided by iRODS. However, the API cannot be used for transferring partial data blocks


required by Syndicate and irodsFS because the API only supports whole-file transfers. Instead,

Syndicate and irodsFS use a byte-range data transfer API provided by iRODS; their data transfers are handled by an interface server (called iCAT) and relayed to the data servers.

As in the first scenario, Syndicate-CDN (SCC) had slightly slower performance than Syndicate

(SC) as the CDN incurred overhead.

Figure 3.11: Interstate workflow elapsed times with warm caches. With warm caches, Syndicate had performance comparable to staging to a local SSD.

Figure 3.11 shows the performance with warm caches. Local SSD storage (SSW) is an upper bound on Syndicate’s performance because it assumes that the dataset is already staged in the local SSD storage. The experiment runs Mash on a local copy of the dataset stored on the SSD.


Therefore, there was no performance impact from the WAN. As in the first scenario, Syndicate

(SW) and Syndicate-CDN (SCW) had good performance with warm caches. Syndicate-CDN

(SCW) was 18% slower than local SSD storage (SSW). However, thanks to the CDN, Syndicate-

CDN (SCW) was 17% faster than iGet (CC in Figure 3.10). The CDN provided much higher data access bandwidth than remote CyVerse DataStore. Because the CDN is geo-distributed, data access is satisfied by the closest cache site, ensuring high bandwidth data access.

Figure 3.12: Interstate workflow elapsed times using Syndicate with warm and cold caches. Syndicate with warm caches (both SCW and SW) is much faster than with cold caches (SCC). Their performance is comparable to a local SSD (SSW).

Figure 3.12 provides a more detailed analysis of Syndicate’s performance when it is configured with a CDN. As in the first scenario, Syndicate with warm caches (SCW) was 51% faster than with cold caches (SCC). Syndicate’s performance with warm CDN caches (SCW) was

comparable to Syndicate with the entire dataset cached on a local SSD (SW). The CDN’s high

data access bandwidth allowed Syndicate with warm CDN caches (SCW) to be only 6% slower

than Syndicate with warm local caches (SW).

3.6 SYNDICATE LIMITATIONS

A limitation of the Syndicate architecture is that it is challenging to orchestrate multiple

Acquisition Gateways. An Acquisition Gateway can publish a dataset to a single volume.

Publishing hundreds of datasets requires hundreds of Acquisition Gateways; it is challenging to

manage all of them, and it is probably not practical to run them all on the same machine. In addition, it is challenging to publish an entire data repository using a single Acquisition Gateway because the Acquisition Gateway will likely be overloaded. Publishing an entire data repository using multiple Acquisition Gateways is also challenging because each Acquisition Gateway is configured and runs independently.

In Syndicate a User Gateway cannot access data cached by other User Gateways running in

the same cluster. Each User Gateway manages its data cache independently and does not

cooperate. This causes a User Gateway to make an expensive remote data access to the

Acquisition Gateway even if the data block is already cached in another node in the same cluster.

This often happens when Syndicate is deployed on a Hadoop cluster. Hadoop may create multiple copies of the same computation (called attempts) and distribute them over different cluster nodes for redundancy in case of failures and to mitigate the effect of slow nodes. Each


attempt accesses the same data block, causing the User Gateways on the nodes to transfer the

same data block repeatedly from the remote Acquisition Gateway.

3.7 CONCLUSION

Syndicate is a general-purpose storage system built on existing cloud services. This dissertation

explored its application to scientific computing for delivering datasets efficiently between geo- distributed collaborators and computing systems. Syndicate shows good data access performance with a CDN and outperforms existing data access methods. Coupled with its ease-of-use, this makes Syndicate an attractive platform for performing scientific workflows on remote datasets.

However, Syndicate revealed several limitations when used in cluster computing environments, such as Hadoop. Acquisition Gateways on a cluster do not cooperate for scalability. Also, User Gateways cannot share caches within a cluster to avoid redundant data

transfers. To address these limitations, I designed a new data transfer service called Stargate that

has native support for cluster computing.


CHAPTER 4. STARGATE

Stargate is a data transfer service that provides efficient remote access over a WAN to geo-distributed scientific datasets on clusters. Stargate is a successor to Syndicate that addresses the limitations of Syndicate on clusters. Stargate inherits several features from Syndicate, such as on-demand data transfer, multi-tier caching, and integration with various external storage and compute systems. However, Stargate adds features that address Syndicate’s limitations, including improved scalability, improved data reusability through intra-cluster caching, and more efficient use of the cluster’s network bandwidth.

4.1 DESIGN CONSIDERATIONS

This dissertation rethinks Syndicate’s design to create a new system better optimized for scientific computing on clusters. Stargate’s design was influenced by the following considerations:

On-demand data transfer - Like Syndicate, Stargate must provide on-demand transfers for accessing datasets from existing storage systems. Neither importing the datasets into a new storage system nor staging them to local storage is practical for large datasets.

Read-only access - Stargate must provide read-only access to datasets. Read-only access is sufficient because scientific computations typically do not update datasets in-place. Unlike

Syndicate, however, Stargate is not responsible for storing user data. User data can be stored in other local storage, such as HDFS, for high performance access or in network storage systems in the same data center, such as Dell EMC Isilon [19].


Low-bandwidth WAN - Stargate must provide high-performance data access to datasets that are geo-distributed over a low-bandwidth WAN. Like Syndicate, Stargate must implement

transfer optimization techniques that improve data transfer bandwidth.

Service orchestration - Stargate must orchestrate independent services to provide efficient data access. Scientific computing makes use of multiple independent computing services, such as storage, transfer service and compute framework. Because these services run independently,

Stargate must coordinate these services to achieve efficient end-to-end data access.

Cluster computing - Stargate must make efficient use of cluster resources. Stargate runs on clusters in which the computing resources are distributed over multiple machines. Stargate must make efficient use of the distributed resources, including network bandwidth, RAM, and disk space. Concurrent data transfer makes more efficient use of the cluster’s network bandwidth than transferring through a single node. Distributed data caching balances the use of RAM and disk space across nodes. Stargate must implement these features, which Syndicate lacks, to increase resource utilization and improve performance on a cluster.

Producer-subscriber model - Stargate must provide simple access control via a producer-subscriber model. Syndicate provides access control via access-control lists (ACLs) as in a general-purpose file system. However, this makes it challenging to control access from multiple administrative domains [91]. External data and users in different administrative domains must be converted to Syndicate data and users, and Syndicate must control complex use-cases, such as when a User Gateway modifies data published via an Acquisition Gateway. However, this

complexity is unnecessary for providing read-only remote access to scientific data. Stargate instead implements a simple role-based model, the producer-subscriber model, which simplifies access control for both administrators and users.

Scalability and fault-tolerance - Stargate must provide scalability and fault-tolerance.

Scientific computing is often performed on clusters with thousands of nodes for large-scale data analysis. Therefore, Stargate must scale to handle large volumes of data accesses and tolerate sudden node failures.

4.2 ARCHITECTURE

A Stargate cluster is co-located with a compute cluster (e.g., a Hadoop cluster) or a storage cluster (e.g., an HDFS cluster) (Figure 4.1). Each node in the Stargate cluster runs a Stargate server, and the servers coordinate to provide concurrent data transfers and distributed data caching for efficient data access. The Stargate servers implement a peer-to-peer architecture (explained below) that does not require centralized control. Applications access data via a file system interface provided

by Stargate clients.


Figure 4.1: Stargate deployments on compute and storage clusters. When Stargate is deployed on a compute cluster, a computation task on a node accesses data via the Stargate client on the same node. The Stargate client communicates with the co-located Stargate Server to process data access requests. When Stargate is deployed on a storage cluster, a Stargate Server on a node makes data accesses to the co-located data server. Blue arrows represent data accesses. Purple arrows represent peer-to-peer communication between Stargate servers.

A Stargate client can communicate with any Stargate server in the cluster. This occurs when

an application (e.g., a command-line tool for reading a file) makes access to data blocks or

caches spread over multiple cluster nodes (Figure 4.2). However, this does not occur frequently

in distributed computing frameworks, such as Hadoop, because Stargate co-locates compute tasks with the data blocks on the same node.


Figure 4.2: Inter-node communication between a Stargate client and servers. When an application makes access to data blocks spread over multiple cluster nodes, a Stargate client will communicate with multiple Stargate servers in the cluster.

4.2.1 PEER-TO-PEER ARCHITECTURE

A Stargate cluster does not have a central server. In other systems, such as Hadoop or

Spark, a central server is typically used for cluster membership and for storing a variety of metadata (e.g., metadata of published files). However, the central server may become a

bottleneck for large data access requests as well as a single point of failure. To avoid these

problems, Stargate servers implement a peer-to-peer architecture and the functions of a central

server are distributed over the Stargate servers in the cluster.

Because Stargate implements a peer-to-peer architecture, Stargate servers in a cluster must

discover each other for peering. Stargate uses a simple peering method that uses known members

of a cluster. The first Stargate server initializes a new Stargate cluster. Additional Stargate servers can join the cluster via any Stargate server that is already a member (e.g., the first node).

Stargate uses a distributed hash table (DHT) [92] on the Stargate cluster for storing internal data (e.g., cluster information, metadata of published files, and data caches). The DHT runs on the same cluster as Stargate and stores data in a distributed fashion. The DHT stores data as key-value pairs. Data are partitioned based on keys: hashes of the keys are used to divide the keys evenly among a set of partitions, and the partitions are then assigned to the DHT nodes. When the number of nodes changes (e.g., on a node join or failure), the affected partitions are redistributed among the nodes, which balances the data workload.
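A minimal Python sketch of this hash-based partitioning follows. Stargate delegates the real implementation to its DHT (Apache Ignite, described below), so the sketch is illustrative only.

    import hashlib

    NUM_PARTITIONS = 1024  # fixed, independent of cluster size

    def partition_of(key: str) -> int:
        # Hashing spreads keys evenly across the partitions.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

    def owner_of(key: str, nodes: list) -> str:
        # Partitions, not individual keys, are mapped to nodes; when membership
        # changes, only the affected partitions move, which rebalances the
        # data workload.
        return nodes[partition_of(key) % len(nodes)]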

The DHT provides three data storage modes (partition, replication, and partition-with-replicas) for different use cases. Stargate uses all three modes to handle data access effectively and to provide scalability and fault-tolerance for different types of data.

In the partition mode data (key-value pairs) are partitioned by their keys and distributed over the cluster nodes. The DHT maintains a single copy of data on the cluster; thus, data stored on a node are lost in a node failure. Although this mode has low storage overhead, it is only useful for storing data that are not critical to the system, such as the data caches. Data caches are not critical because if they are lost the data are still accessible from the producer cluster.


In replication mode data are replicated on every node. This mode provides not only data durability, but also low-latency data access by reading data from local replicas. Therefore, this mode is useful for storing data that are vital to the system and accessed frequently. For example, a producer cluster stores the cluster configuration information using the replication mode because it is very important and frequently accessed by Stargate internally. However, this mode has high storage overhead because the data are replicated on every node. Thus, it is only practical for storing small amounts of data.

Lastly, in the partition-with-replicas mode data are partitioned as in the partition mode, but a fixed number of replicas is created. The replicas are distributed over the cluster nodes, so the data are not lost in a node failure. The DHT ensures that the required number of replicas is maintained. Like replication mode, this mode is useful for storing data that are critical to the system, though with lower storage overhead and lower fault tolerance than full replication. Stargate uses this mode to store the metadata for published files, which is critical to the system but too large to store in replication mode.

Stargate implements the clustering and the DHT using Apache Ignite [92], an open-source distributed in-memory store. However, Stargate is not locked into this specific DHT service, because clustering and the data storage modes are common DHT functionality; Stargate could be implemented using any other DHT or NoSQL service.


4.2.2 PRODUCER AND SUBSCRIBER CLUSTERS

A Stargate cluster is either a producer, a subscriber, or both. Figure 4.3 illustrates the relationship between producer and subscriber clusters. A producer cluster stores data and provides read-only data access to the subscriber clusters. A subscriber cluster provides data access to its applications. Stargate redirects applications’ data accesses to the producer cluster.

Each Stargate cluster has a unique name that is used to distinguish data publishers and for authentication during the subscription process. The cluster name is given by the cluster administrator and has a human-readable format — UOFA_COMPSCI_HADOOP, for example.

A subscription to a producer cluster is made using a subscription URL. A subscription URL is auto-generated by Stargate in the producer cluster and shared with potential subscribers out-of-band, such as via e-mail. For non-anonymous access, a subscriber cluster’s name and password must be pre-registered with the producer cluster for authentication. The password is created by the subscriber cluster, and different passwords can be used for different producer clusters. The cluster’s name and password must also be shared with the producer cluster out-of-band.


Figure 4.3: A Stargate cluster can be either a producer or a subscriber, or both. Producers provide read-only data access to their subscribers. Subscribers provide data access to their applications. A subscriber cluster may have multiple subscriptions to different producers. Blue arrows illustrate data accesses. Green arrows indicate subscriptions and red arrows indicate data flow.

During the subscription process, the producer cluster authenticates the subscriber cluster using the cluster name and password. For anonymous access, the producer cluster allows any subscriber clusters to subscribe.


4.2.3 STARGATE VOLUME

A subscriber cluster creates a unified volume that contains data from the producer clusters to

which it subscribes. Stargate clients access remote datasets via the volume. Each

producer cluster is represented by a subdirectory in the volume. Access to the subdirectories (e.g.,

listing files, getting file attributes, and reading file contents) is directed to the producer cluster

for that subdirectory.

There are two advantages of representing each producer cluster with a subdirectory. First, it

is easy to find the producer cluster for data access by simply looking at its path. For example,

when an access is made to a file at /Cluster_A/TOV/sample1.fasta, Stargate finds the

producer cluster Cluster_A and redirects the access to the cluster by looking at the path.

Second, it is easy to build a unified volume and keep it up-to-date. Datasets published by

producer clusters may be very large (e.g., containing hundreds of thousands of files) and updates can be made to the datasets at any time. This makes it challenging for the subscriber cluster to

build a unified volume and keep it up-to-date. To address this, Stargate does not build sub-trees

under the producer cluster’s subdirectories in the unified volume. In the unified volume, the

subdirectories are symbolic representations and all accesses to the subdirectories are redirected to

the producer cluster.
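A minimal Python sketch of the path resolution described in the first advantage above, with illustrative names:

    def resolve(path: str):
        # The first path component names the producer cluster; the rest is
        # the path within that cluster's exported data.
        parts = path.strip("/").split("/", 1)
        cluster = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        return cluster, rest

    # resolve("/Cluster_A/TOV/sample1.fasta") -> ("Cluster_A", "TOV/sample1.fasta")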


Figure 4.4: A Stargate volume contains data from different producer clusters. Each cluster is represented by a subdirectory in the volume. A cluster’s exported files are shown in the volumes of its subscriber clusters. Access to data in the volume is redirected to the producer cluster for the data. The red arrows indicate subscriptions to producer clusters.

Figure 4.4 illustrates the relationship between three Stargate clusters and their volumes.

Cluster_A is both a producer and a subscriber. The Cluster_A volume contains data from both local storage (Cluster_A) and the producer cluster (Cluster_C) that it subscribes to. This allows users to make data accesses in the same fashion regardless of whether they are local or remote.

Cluster_B is a subscriber and its volume contains data from the two producer clusters (Cluster_A and Cluster_C) to which it subscribes. Since Cluster_B does not export data, the Cluster_B volume does not contain local storage. Cluster_C is a producer. The Cluster_C volume only contains local storage because it does not have any subscriptions.


Stargate provides eventual consistency for volume data. While a subscriber cluster is

accessing data in a producer cluster, the data can be updated via other storage interfaces (e.g., a

local process could modify a published file). This can occur because some data transport systems,

such as FTP, do not have exclusive access to the underlying storage. However, eventual

consistency suffices for two reasons. First, updates to scientific datasets are infrequent. Second, the subscriber cluster detects inconsistencies via the content-addressable protocol explained below, so it can retry until the data become consistent.

4.2.4 STARGATE PROTOCOL

Stargate subscriber clusters access data on Stargate producer clusters via a REST-based Stargate

protocol (Figure 4.5). Every Stargate server in the producer cluster implements the REST API to

enable concurrent data access. Stargate servers in the subscriber clusters run REST clients to

communicate with the servers in the producer clusters via the REST API. Stargate servers in the

subscriber clusters perform data transfers in parallel, on the assumption that concurrent transfers make more efficient use of the cluster’s network bandwidth than transfers through a single node.

Like Syndicate, the Stargate protocol uses HTTP and HTTPS to take advantage of the

existing web infrastructure, such as web caches and CDNs. In addition, web monitoring services

(e.g., Uptime [93]) can be used to monitor the health and performance of Stargate.


Figure 4.5: The Stargate protocol is used between subscriber clusters and producer clusters. Stargate subscriber and producer clusters communicate via a REST-based Stargate protocol. Every Stargate server in a producer cluster implements the REST API for concurrent data access.

The Stargate protocol relies on content-addressable naming, allowing data blocks to be

retrieved based on their contents. A file is chunked into large fixed-size blocks (e.g., 64MB), and

each block’s name is derived from its contents so that blocks with the same contents have the

same name. Stargate, by default, uses the SHA-256 cryptographic hash of a block as its name.

Because Stargate uses a strong cryptographic hash function, it is nearly impossible to have the

same hash value for blocks with different contents.


Content-addressable naming has two advantages. First, it prevents reading inconsistent data blocks: modifying a file results in different block names, so access via a stale block name fails. In addition, received block content can be verified against the block name by checking the hash of the content. Second, it enables deduplication: identical blocks have the same name, eliminating redundant data transfers and cached copies.
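The naming and verification steps can be sketched in a few lines of Python; the helper names are illustrative, but the use of SHA-256 matches Stargate's default.

    import hashlib

    def block_name(content: bytes) -> str:
        # Blocks with identical contents get identical names (deduplication).
        return hashlib.sha256(content).hexdigest().upper()

    def verify(name: str, content: bytes) -> bool:
        # A mismatch means the published file changed (stale block name) or
        # the transfer was corrupted; the subscriber can refetch the recipe
        # and retry.
        return block_name(content) == name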

Figure 4.6: Information stored in a recipe. A recipe contains file metadata and block information. The file metadata contains general information about the file, such as the file name, length, and the hash algorithm used for block naming. The block information contains per-block metadata, such as the block name, starting offset, length of the block content, and locations in storage. Stargate uses the recipe to translate file names and offsets into block names.

The file metadata in Stargate includes a list of block names called a recipe. Figure 4.6 illustrates information stored in a recipe. Stargate uses the recipe internally to translate the file names and offsets used by applications into the block accesses that are used by the Stargate

Protocol. In the protocol, blocks are referenced by their names in data request URLs. For


example, the URL to access block FF00FF22... stored on node0.abc.com is

http://node0.abc.com/api/data/FF00FF22….

The recipe also contains block location information that indicates which nodes in the cluster

store which blocks. The block location information is used by the subscriber cluster to determine

which node in the producer cluster to contact for access to a particular block. This reduces the

amount of intra-cluster traffic in the producer cluster.
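The following Python sketch shows how a recipe could translate an application's (file, offset) read into a block request URL; the record layout mirrors Figure 4.6, and the field names are illustrative.

    def block_for_offset(recipe, offset):
        # Each block record carries its name, starting offset, length, and
        # the producer nodes that store it.
        for block in recipe["blocks"]:
            if block["offset"] <= offset < block["offset"] + block["length"]:
                return block
        raise ValueError("offset beyond end of file")

    def block_url(block):
        node = block["locations"][0]   # contact a node that stores the block
        return "http://" + node + "/api/data/" + block["name"]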

Stargate uses large fixed-size block chunking to improve data transfer bandwidth. There are data transfer protocols that use small (e.g., a few kilobytes) variable-size block chunking methods to maximize data deduplication, such as CernVM-FS [60]. Generally, decreasing the block size increases the odds of finding duplicated blocks in a dataset, thus increasing the cache hit rate. However, a smaller block size results in smaller data transfers, which uses network bandwidth less efficiently. Increasing the block size decreases the chance of deduplication but increases network efficiency by requiring fewer request-response exchanges (Figure 3.6).

Moreover, chunking the files of a large dataset into large blocks produces fewer blocks, resulting in smaller recipes that require less traffic to transfer. In addition, the datasets used in the evaluation of Stargate do not contain a meaningful number of duplicate blocks even at small block sizes, because they consist of randomly generated data or DNA sequences with few duplicates.


Figure 4.7: Overlaps between consecutive blocks. Stargate creates small overlaps between consecutive blocks to facilitate line-oriented or record-oriented computations.

Stargate implements an optimization to facilitate line-oriented or record-oriented computations. These computations require the compute task for a data block to read the

beginning of the subsequent block when a line or record spans the block boundary. If

implemented naively, this results in the compute task reading two blocks for every block it

processes. Stargate avoids this problem by creating small overlaps between consecutive blocks

(e.g., 16KB of overlap between 64MB blocks) when chunking a file into blocks (Figure 4.7).
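A minimal sketch of the boundary arithmetic, assuming the 64MB block size and 16KB overlap mentioned above (the class name is hypothetical):

    final class OverlapChunker {
        static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64MB
        static final long OVERLAP = 16L * 1024;           // 16KB

        // Block i covers [i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE + OVERLAP),
        // i.e., it carries the first 16KB of block i + 1, so a task whose
        // last line or record spans the boundary can finish it without
        // fetching another block.
        static long blockStart(int i) { return i * BLOCK_SIZE; }

        static long blockEnd(int i, long fileLength) {
            return Math.min((i + 1) * BLOCK_SIZE + OVERLAP, fileLength);
        }
    }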

Stargate creates a new recipe when a new file is published or a published file is modified.

Publishing a new file triggers the creation of the recipe for the file. Stargate also monitors file system update events in storage (e.g., via HDFS inotify) to detect updates to published files.

When an update is detected, Stargate recreates the recipe for the updated file.
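As an illustration, a minimal sketch of detecting such updates via the HDFS inotify API is shown below, assuming HDFS as the underlying storage. The RecipeWatcher class and recreateRecipe() hook are hypothetical, but HdfsAdmin.getInotifyEventStream() is the standard HDFS inotify entry point:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
    import org.apache.hadoop.hdfs.client.HdfsAdmin;
    import org.apache.hadoop.hdfs.inotify.Event;
    import org.apache.hadoop.hdfs.inotify.EventBatch;

    public class RecipeWatcher {
        public static void watch(URI hdfsUri) throws Exception {
            HdfsAdmin admin = new HdfsAdmin(hdfsUri, new Configuration());
            DFSInotifyEventInputStream events = admin.getInotifyEventStream();
            while (true) {
                EventBatch batch = events.take(); // blocks until events arrive
                for (Event event : batch.getEvents()) {
                    if (event.getEventType() == Event.EventType.CLOSE) {
                        // A CLOSE event signals that a file finished being written.
                        Event.CloseEvent close = (Event.CloseEvent) event;
                        recreateRecipe(close.getPath()); // hypothetical hook
                    }
                }
            }
        }

        static void recreateRecipe(String path) { /* rebuild and store the recipe */ }
    }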


The created recipes are stored in the producer DHT. The recipes can be queried by file name

or block name. Search by file name is used to find the recipe for a particular file. Search by block name is used when a subscriber cluster requests a data block by name from a producer cluster.

The producer cluster retrieves recipes that contain the given block name to locate the block in the

underlying storage. There will be multiple recipes containing the given block name if their files have duplicate content. Stargate can obtain the requested block content using any of the retrieved recipes. However, Stargate selects the block location that is co-located with the block’s transfer task to reduce data access latency and intra-cluster traffic.

When a Stargate cluster is configured with a CDN (or a web cache), a URL to access a data block has two parts: the CDN part and the Stargate part. For example, the URL to access block

FF00FF22... stored on node n01.abc.com via the CDN cdn.cloud.com is http://cdn.cloud.com/n01.abc.com/api/data/FF00FF22.... The data access is sent to the CDN using the CDN part (http://cdn.cloud.com); on a cache miss, the CDN forwards the access to the Stargate producer node using the second part

(n01.abc.com/api/data/FF00FF22…). The URL contains the domain name of the

Stargate producer node in the second part because the CDN does not understand the Stargate protocol, so the CDN cannot determine to which producer node to forward the access.
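A minimal sketch of this two-part URL construction, reusing the example host names from the text (the helper name is hypothetical):

    final class CdnUrls {
        // The CDN part routes the request; the embedded producer host tells
        // the CDN where to forward the request on a cache miss.
        static String urlVia(String cdnHost, String producerNode, String blockName) {
            return "http://" + cdnHost + "/" + producerNode + "/api/data/" + blockName;
        }
    }

    // urlVia("cdn.cloud.com", "n01.abc.com", "FF00FF22...") yields
    // "http://cdn.cloud.com/n01.abc.com/api/data/FF00FF22..."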

The Stargate protocol has five notable features. First, its block-based data transfer avoids a request-response exchange on every application read. Stargate uses a large block size (e.g.,

64MB) to improve data transfer performance over high-latency WANs. Second, the Stargate

84

protocol facilitates line-oriented or record-oriented computations by having small overlaps between consecutive blocks. Third, the Stargate protocol enables deduplication because blocks

with the same content have the same name. Fourth, the Stargate protocol enables efficient WAN

usage because requests for the same block use the same URL, allowing web caches and CDNs to

cache blocks easily. Lastly, the Stargate protocol prevents reading inconsistent file contents via content-addressable naming.

4.2.5 MULTI-TIER CACHING

Stargate provides multi-tier caching to improve data reusability and reduce WAN traffic. The

bandwidth limitation of a WAN is a potential bottleneck for data transfers between Stargate

clusters. Stargate addresses this by caching the most recently accessed data blocks in the

subscriber cluster to avoid redundant WAN transfers. Stargate’s intra-cluster data caching is

implemented using the DHT. Data caches are partitioned and distributed over cluster nodes

based on the block names, which are hashes of the block contents. This enables balanced

distribution of blocks, which allows efficient resource usage. Distributed compute frameworks, such as Hadoop, create a compute task per data block, so balanced distribution of blocks also allows balanced distribution of compute tasks, resulting in efficient data access. Additionally,

Stargate supports web caches that serve requests from clusters on the same LAN, and CDNs that serve even larger scales and more clusters.
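The following minimal sketch illustrates the idea of partitioning cached blocks over cluster nodes by block name; a simple hash-mod scheme is shown for clarity, whereas Stargate's actual DHT may partition differently (all names are hypothetical):

    import java.util.List;

    final class CachePlacement {
        // The block name is a hex-encoded content hash, so its leading digits
        // are effectively uniform; map them onto the node list.
        static String cacheNodeFor(String blockName, List<String> nodes) {
            long prefix = Long.parseLong(blockName.substring(0, 8), 16);
            int index = (int) Math.floorMod(prefix, (long) nodes.size());
            return nodes.get(index);
        }
    }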

Stargate’s intra-cluster data caching uses either the local disk or RAM as storage. Using

RAM for the cache improves access performance, but requires the cluster to have sufficient


RAM. Using the local disk for the cache offers lower performance than using RAM, but it enables

Stargate to have a much larger cache.

4.2.6 DATA TRANSPORT

A Stargate subscriber cluster fetches data from a producer on demand. On-demand data transfer

does not require staging datasets, and it does not require data management for local copies

because permanent local copies of data are not created. Stargate translates a data access by an application on a subscriber cluster into the Stargate protocol, and sends data requests to the

producer cluster. The requested data is transferred from the nearest cache or storage to the subscriber cluster.

On-demand data transfer has a potential drawback in that data accesses may be sporadic, greatly reducing the data transfer bandwidth. Transferring blocks one at a time makes inefficient use of network bandwidth due to the lack of pipelining. The problem is even worse over high-latency, low-bandwidth WANs. To address this problem, Stargate implements prefetching. Each

Stargate node performs prefetching independently, and Stargate ensures that a data block is prefetched only once by determining the prefetch node based on which node will cache the block

(explained below in Section 4.2.7). When an application requests a block, Stargate begins prefetching the next n blocks of the file.
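A minimal sketch of this prefetching policy follows; the window size and all names are hypothetical, and Stargate's actual value of n may differ:

    import java.util.List;

    final class Prefetcher {
        static final int N = 4; // prefetch window (illustrative)

        private final String localNode;

        Prefetcher(String localNode) { this.localNode = localNode; }

        // Called on every block access; begins prefetching the next N blocks.
        void onBlockAccess(List<String> blockNames, int accessedIndex) {
            for (int i = accessedIndex + 1;
                 i <= accessedIndex + N && i < blockNames.size(); i++) {
                // Only the node that will cache a block prefetches it, so each
                // block is prefetched at most once cluster-wide.
                if (cacheNodeFor(blockNames.get(i)).equals(localNode)) {
                    enqueuePrefetch(blockNames.get(i));
                }
            }
        }

        private String cacheNodeFor(String blockName) { /* DHT lookup */ return localNode; }
        private void enqueuePrefetch(String blockName) { /* add to transfer queue */ }
    }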


Figure 4.8: Transfer request queue and thread pool in a node. Both prefetch requests and on-demand requests are placed in the transfer request queue. The transfer thread pool has multiple transfer threads and dequeues transfer requests from the queue when there are idle threads. On-demand requests are prioritized over prefetch requests when dequeuing. Each node maintains a queue and a thread pool independently.

Unlike Syndicate, Stargate does not wait until several consecutive blocks have been accessed before prefetching. Syndicate performs prefetching only when an application accesses a file in sequential order. However, Stargate cannot determine whether an application accesses a file in sequential order because its compute tasks are distributed over the cluster nodes and make


distributed accesses to the file; the tasks' block accesses arrive unordered. Because of this,

Stargate begins prefetching on every block access.

Stargate schedules data transfers using a transfer request queue and a transfer thread pool

(Figure 4.8). There are two types of transfer requests: prefetch requests and on-demand requests.

Stargate prioritizes on-demand requests over prefetch requests to reduce the amount of time

computations wait for data blocks. Both prefetch requests and on-demand requests are placed in

the transfer request queue, which is a priority queue. The transfer thread pool has multiple

transfer threads and dequeues transfer requests from the queue when there are idle threads.

Blocks for on-demand requests are simultaneously stored in the intra-cluster cache and sent to the computations waiting for them. Prefetched blocks are stored in the intra-cluster cache for later access.
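A minimal sketch of such a per-node scheduler using a priority queue and a thread pool; all names are hypothetical and the details are simplified:

    import java.util.concurrent.PriorityBlockingQueue;

    final class TransferScheduler {
        enum Kind { ON_DEMAND, PREFETCH } // declaration order makes ON_DEMAND sort first

        static final class Request implements Comparable<Request> {
            final Kind kind;
            final String blockName;
            Request(Kind kind, String blockName) { this.kind = kind; this.blockName = blockName; }
            public int compareTo(Request o) { return kind.compareTo(o.kind); }
        }

        private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();

        TransferScheduler(int threads) {
            for (int i = 0; i < threads; i++) {
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            Request r = queue.take(); // on-demand requests dequeue first
                            transfer(r.blockName);    // fetch the block and cache it
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // exit on shutdown
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }
        }

        void submit(Kind kind, String blockName) { queue.put(new Request(kind, blockName)); }

        private void transfer(String blockName) { /* fetch from producer or cache tier */ }
    }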

There is a transfer request queue and a transfer thread pool on each node. Stargate separates transfer requests per node, and each node handles them independently; there is no central service for scheduling transfer requests, so each node must schedule its own.

Stargate determines which node in the subscriber cluster handles a transfer request based on which node will cache the block (explained below in Section 4.2.7). The node that transfers the block is called the transfer node, and the node that caches the block is called the cache node. Stargate ensures that the transfer node and the cache node for a data block are the same to reduce access latency and unnecessary intra-cluster traffic.


Figure 4.9: Intra-node and inter-node on-demand requests. An intra-node request occurs when an on-demand request is made by a Stargate client on the same node as its transfer and cache node. The request is queued and processed on the same node. An inter-node on-demand request occurs when an on-demand request is made by a Stargate client on a different node from its transfer and cache node. This requires intra-cluster traffic for data access.

There are two types of on-demand requests handled by a subscriber cluster: intra-node

requests and inter-node requests (Figure 4.9). An intra-node request occurs when an on-demand request is made by a Stargate client on the same node as the transfer and cache node. The request

is queued in the same node and the data transfer and access are processed on the node locally.


An inter-node on-demand request occurs when an on-demand request is made by a Stargate

client on a different node from its transfer and cache node. The inter-node on-demand request

requires an intra-cluster transfer to deliver the cached data to the originating node. Stargate avoids

inter-node on-demand requests by co-locating computation, cache, and transfer on the same node.

The co-location is explained in the next section.

Stargate performs concurrent data transfers to make full use of the cluster's network bandwidth. Performing data transfers using only one of the cluster's nodes makes it difficult to saturate the network bandwidth because nodes in a subscriber cluster may have multiple network

paths to a producer cluster via different switches or routers. Concurrent data transfers using all nodes in the cluster enable more effective use of the cluster’s network bandwidth.

4.2.7 COMPUTATION, CACHE, AND TRANSFER CO-LOCATION IN SUBSCRIBER CLUSTERS

Compute frameworks, such as Hadoop, co-locate tasks with the data blocks they access to

improve performance and reduce intra-cluster network traffic. The task scheduler queries the

storage system to determine which data blocks are stored on which nodes. The scheduler then

assigns the compute task for a block to a node that stores the block. Stargate also relies on the

design of the compute frameworks to provide improved remote data access.

The design of compute frameworks presents a problem when the data blocks are stored in

remote clusters. Ideally, the transfer task that fetches a file block from a producer cluster should


be co-located on the same node as the compute task for that block, and the block should be

cached on the same node.

Figure 4.10: Staging data to a storage service does not ensure co-location of the compute task, cache, and transfer task. Using staging tools with a storage service (e.g., HDFS) does not ensure that the compute task, cache, and transfer task for a block are co-located on the same node, thus causing unnecessary intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A data block is staged to a node using a data transfer tool. (2) The transferred data block is stored in a distributed storage system that does not ensure the data is stored on the same node as the transfer node. (3) A compute task is co-located with the data.


Figure 4.11: Data access platforms providing on-demand data transfer do not ensure co-location of the compute task, cache, and transfer task when they are poorly integrated with a caching service. When data access platforms providing on-demand data transfer are poorly integrated with a caching service, they do not ensure that the compute task, cache, and transfer task for a block are co-located on the same node, thus causing unnecessary intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A compute task accesses data. (2) The access triggers on-demand data transfer of the data on the same node. (3) The transferred data is cached by a caching service co-located on the same cluster after it is delivered to the compute task; however, this does not ensure that the data is cached on the same node.

Unfortunately, using caching or storage services on the same cluster does not ensure that the compute, transfer, and cache for a data block are co-located on the same node (Figures 4.10 and 4.11): each service has its own data partitioning mechanism, and existing platforms do not integrate compute, caching, and transfer tightly. This causes unnecessary intra-cluster traffic, as each block must be moved from the node that transfers it to the node that accesses it.


Figure 4.12: Prefetching makes it more challenging to co-locate the compute task, cache, and transfer task. Prefetching makes it challenging to co-locate a block's transfer, cache, and compute tasks on the same node to avoid intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) A compute task is scheduled by the task scheduler in the compute framework without knowing which node will store the data it accesses, because the data is remote. The assignment of the compute task is internal information that is not shared with the cache and transfer services. (2) The data is prefetched on a different node from the compute task because the prefetcher does not know the task assignment. (3) The transferred data is cached on the same node as the transfer task. (4) The compute task accesses the data cached on a different node.

In addition, co-locating tasks with their data blocks presents a more complex problem for systems such as Stargate that provide on-demand data transfer. The task scheduler in the compute framework, such as Hadoop, cannot co-locate the compute task with the data block


because the blocks are remote. One possibility is to co-locate the compute task with the transfer task. However, this is not feasible due to prefetching, which may transfer the block in advance of creating the compute task (Figure 4.12).

Figure 4.13: Compute, cache, and transfer co-location in Stargate. Stargate pre-determines which nodes will cache a particular data block. It then co-locates the block's transfer and compute tasks on the same node to avoid intra-cluster traffic. The numbers in the boxes show the sequence of events. (1) Stargate pre-determines which node will cache a data block. (2) The transfer task for the data block is co-located with the cache. (3) The compute task that processes the data block is co-located with the cache.

Stargate solves this problem by pre-determining which nodes will cache a particular data block (Figure 4.13). When the task scheduler queries Stargate to determine current block locations, Stargate returns future block locations instead, as determined by the DHT. This causes


the task scheduler to schedule a compute task on the proper cache node, avoiding intra-cluster traffic. Stargate also co-locates the transfer task on the cache node to avoid intra-cluster traffic.
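The following minimal sketch illustrates the idea of answering a block location query with the future cache node; cacheNodeFor() is a hypothetical DHT lookup and the port is illustrative, so this is not Stargate's actual client code:

    import java.util.List;
    import org.apache.hadoop.fs.BlockLocation;

    final class FutureLocations {
        // Answer a scheduler's block location query with the node that WILL
        // cache each block, so the compute task lands on the future cache node.
        BlockLocation[] locationsFor(List<String> blockNames, long[] offsets, long[] lengths) {
            BlockLocation[] locs = new BlockLocation[blockNames.size()];
            for (int i = 0; i < locs.length; i++) {
                String cacheNode = cacheNodeFor(blockNames.get(i));
                locs[i] = new BlockLocation(
                    new String[] { cacheNode + ":50010" }, // name (host:port), port illustrative
                    new String[] { cacheNode },            // host the scheduler targets
                    offsets[i], lengths[i]);
            }
            return locs;
        }

        private String cacheNodeFor(String blockName) { /* DHT lookup */ return "node0"; }
    }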

4.2.9 TRANSFER AND STORAGE CO-LOCATION IN PRODUCER CLUSTERS

Data transfer tools and systems use several strategies to select a node in the producer cluster to

service a block transfer. These strategies balance several considerations: data access bandwidth, design complexity, and connection establishment overhead. The choice of strategy impacts the overall data transfer bandwidth.

By default, Stargate uses a Primary Node Strategy that co-locates the transfer service node

with a block’s storage node to improve data access bandwidth. However, there are two other design alternatives: a Head Node Strategy, which has a simple design, and a Fixed Node Strategy, which reuses network connections.

The Primary Node Strategy selects the node in the Stargate producer cluster that stores the original copy of the data block. Should that node be inaccessible, it randomly selects one of the nodes with replicas of the data block, if one exists. A file recipe includes locations of the data blocks — including both the original copy and replicas — in the producer cluster. Stargate uses the block locations to determine which node in the producer cluster will serve a particular block.

This reduces unnecessary intra-cluster network traffic in the producer cluster and provides

improved data access performance.


The Primary Node Strategy relies on the data layout in the producer cluster for load-

balancing. The strategy assumes that the original copies of data blocks are evenly distributed over the storage nodes. Many data transfer services and storage systems, such as WebHDFS, iRODS iCommands, and GlusterFS [65], use similar strategies that co-locate the transfer with the data to determine the service node.

The Head Node Strategy selects a head node in the Stargate producer cluster (such as

NameNode in a Hadoop cluster) for data access regardless of the block location. Because a head node usually does not store any data blocks, the requested data block must be transferred from another node in the Stargate producer cluster to the head node. The strategy is very inefficient as it requires intra-cluster transfers; however, some systems, such as HttpFS, implement this design for simplicity. Proxy servers also implement a similar design to improve security by preventing external access to internal nodes.

The Fixed Node Strategy selects a node in the Stargate producer cluster using Stargate’s Access Node Mapping. The Access Node Mapping maintains one-to-one mappings between nodes in the subscriber and producer clusters. The mapping is built in a sequential manner — the n-th node in the subscriber cluster is mapped to the n-th node in the producer cluster — for

simplicity. Then, the mapping is used to determine a node to connect to in a producer cluster for

metadata access (e.g., recipes and file metadata). However, a block transfer may generate intra-

cluster traffic because the strategy does not ensure that the node chosen to serve a block actually

stores the block. On the other hand, the strategy enables reuse of network connections, saving connection


establishment overhead. Web cache servers and load balancers use similar strategies that reuse

previous connections for fast response.
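A minimal sketch contrasting the three strategies; all names are hypothetical and reachability checking is elided:

    import java.util.List;
    import java.util.Random;

    final class ServiceNodeSelector {
        private static final Random RANDOM = new Random();

        // Primary Node Strategy: contact the node storing the original copy;
        // fall back to a random replica holder if the primary is unreachable.
        static String primaryNode(String originalNode, List<String> replicaNodes) {
            if (isReachable(originalNode)) return originalNode;
            return replicaNodes.get(RANDOM.nextInt(replicaNodes.size()));
        }

        // Head Node Strategy: always contact the head node (e.g., the
        // NameNode), regardless of where the block is stored.
        static String headNode(String headNodeAddress) {
            return headNodeAddress;
        }

        // Fixed Node Strategy: the n-th subscriber node always contacts the
        // n-th producer node, allowing network connections to be reused.
        static String fixedNode(int subscriberNodeIndex, List<String> producerNodes) {
            return producerNodes.get(subscriberNodeIndex % producerNodes.size());
        }

        private static boolean isReachable(String node) { return true; /* elided */ }
    }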

Stargate uses the Primary Node Strategy by default because experimentally it has better

performance than the other strategies. The other strategies are also configurable. The results and

discussion are presented in the evaluation section.

4.2.10 SCALABILITY AND FAULT-TOLERANCE

The peer-to-peer design of Stargate has benefits in both scalability and fault-tolerance. It does

not have a central server that can be a bottleneck for large data access requests or a single point

of failure. Instead, Stargate’s control and data are distributed over all the nodes in a cluster.

Stargate can increase service capacity by adding more nodes. Stargate uses the DHT for

storing data, including data caches. Therefore, adding more nodes increases the capacity of the

DHT, enabling Stargate to store more data. Stargate also serves data access and performs transfers

between clusters using all nodes in the cluster. Thus, adding more nodes improves the data access bandwidth.

Scalability and fault-tolerance are achieved through three strategies. First, Stargate makes

backup copies of important data. Stargate uses the replication mode of the DHT for storing

frequently accessed small data (e.g., the manifest of files in a file system). This provides not only

quick access to the data, but also robustness in the event of a failure. Second, medium-size data (e.g., a list of


data block hashes) are stored using the partition-with-replicas mode of the DHT. For robustness,

Stargate makes multiple replicas of the data (e.g., three replicas) to avoid losing data in a failure.

Third, Stargate stores data caches using the partition mode of the DHT. Stargate does not

replicate data caches because they are large and not required for correctness. Therefore, Stargate

does not recover the data caches after a failure to avoid the overhead of replication.

4.3 EVALUATION

In this section, Stargate’s design is evaluated. First, this dissertation evaluates the efficacy of

concurrent data transfer. Second, the efficacies of the different transfer service node

determination strategies are compared. Lastly, Stargate’s performance is evaluated by comparing

its performance with several data transfer methods.

4.3.1 CONCURRENT DATA TRANSFER BANDWIDTH

Stargate’s design relies on the assumption that concurrent data transfer makes more efficient use of a cluster’s network bandwidth than using a single node in the cluster. In this experiment, the efficacy of concurrent data transfer is evaluated. The network bandwidth is measured as the number of data transfer nodes is varied.


EXPERIMENTAL SETUP

The experiments use two clusters in Google Cloud Platform, one in Virginia and the other in

California. The cluster in Virginia is a producer cluster in which the data blocks are stored in

HDFS. The cluster in California is a subscriber cluster running Hadoop.

The two clusters have identical machine configurations: one Hadoop Master node (HDFS

NameNode and Hadoop SchedulerNode) and up to eight Hadoop Slave nodes (HDFS DataNodes and Hadoop WorkNodes). Each Hadoop Master node runs on an n1-standard-2 instance that has two vCPUs, 7.5GB RAM, two SSD persistent disks (each with 6000 IOPS), and 10Gbps of maximum network bandwidth. Each Hadoop Slave node runs on an n1-standard-8 instance that has eight vCPUs, 30GB RAM, two SSD persistent disks (each with 6000 IOPS), and 10Gbps of

maximum network bandwidth.

The concurrent data transfer bandwidth is measured by varying the number of data transfer

nodes (HDFS DataNodes and Hadoop WorkNodes). 50GB of data is transferred between the

clusters via DistCP, which is a data transfer tool for Hadoop. DistCP performs concurrent data

transfers and stores the transferred data on HDFS; therefore, the bandwidth of HDFS may impact

the overall transfer bandwidth. To isolate network performance, the tool is modified to not store

the transferred data on HDFS, and named DryDistCP. The experiment is performed with both

DistCP and DryDistCP to determine how the disk bandwidth affects total bandwidth. The

average of three runs is reported for each data transfer.


EXPERIMENTAL RESULTS

Figure 4.14: Total bandwidth used for concurrent data transfer with different numbers of transfer nodes. DryDistCP and DistCP transfer data in the same manner, but DryDistCP does not write data to local storage (HDFS). A single transfer node cannot make full use of the cluster's network bandwidth for either DistCP or DryDistCP. The local storage (HDFS) is a bottleneck for DistCP. Increasing the number of transfer nodes improved the performance of both DistCP and DryDistCP.

Figure 4.14 shows the total data transfer bandwidth of the two data transfer tools as a function of the number of transfer nodes. For both DistCP and DryDistCP the bandwidth increased as the number of data transfer nodes increased. The bandwidth of DryDistCP on a single node in the cluster was only 403MBPS versus 611MBPS with eight transfer nodes. The bandwidth of

DistCP on a single transfer node was only 81MBPS because HDFS was a bottleneck. Adding


more transfer nodes helped to overcome this bottleneck. The result shows the efficacy of concurrent data transfer and why it is used in the Stargate design.

4.3.2 TRANSFER SERVICE NODE DETERMINATION

There are several strategies to determine a service node in a producer cluster. The three strategies

introduced in this dissertation, the Primary Node Strategy, the Head Node Strategy, and the

Fixed Node Strategy, are evaluated to understand their impact on data transfer bandwidth. The

experiments measure the numbers of local block accesses and remote block accesses in the

producer cluster using these different strategies. The experiments also measure the elapsed times of the benchmark with the different strategies to show the relationship between the number of remote data accesses and the data access bandwidth.

EXPERIMENTAL SETUP

The experiments use the same cluster setup as the previous experiment: two clusters in Google

Cloud Platform, one in Virginia and the other in California. But this time, each cluster has eight

Hadoop Slave nodes (HDFS DataNodes and Hadoop WorkNodes).

The experiments run the TestDFSIO+ benchmark on a 50GB input. TestDFSIO+ is an

augmented version of TestDFSIO [94], which is a Hadoop benchmark for measuring storage I/O

performance. TestDFSIO has a light computational workload so elapsed times are mainly

determined by storage I/O performance. Input data for TestDFSIO is generated by its write mode.


However, TestDFSIO has a bug that only writes zeroes for generated data, so I fixed the tool to

write random data and named the new version TestDFSIO+. Input to the experiments is 50 1GB

files created by the TestDFSIO+ write mode. TestDFSIO+ measures the time to read and write

files. However, because Stargate provides read-only access, only read performance is measured in the experiments. The average of three runs is reported for each benchmark.

EXPERIMENTAL RESULTS

Figure 4.15: The numbers of local and remote block accesses for TestDFSIO+ benchmarks with different service node determination strategies. The Primary Node Strategy ensures that all blocks are local. Conversely, the Head Node Strategy ensures that all blocks are remote. The Fixed Node Strategy uses pre-determined node mappings to reuse connections, causing most blocks to be remote.


Figure 4.15 shows the numbers of data blocks accessed locally (Local blocks) and remotely

(Remote blocks) during the TestDFSIO+ benchmark. In this context “local” means the node in the producer cluster that served the block did so from its local storage, whereas “remote” means the node had to fetch the block from another node in the cluster. As anticipated, the Primary

Node Strategy ensured all blocks were local. Conversely, all blocks were remote in the Head

Node Strategy. For the Fixed Node Strategy most, but not all, data blocks were remote.

Assuming the data blocks are evenly distributed over n cluster nodes, this strategy accesses 1/n of the blocks locally. The experiments used eight data nodes, so approximately 1/8 of the blocks were local.

Figure 4.16: Elapsed times of TestDFSIO+ benchmarks with different service node determination strategies. The Primary Node Strategy was the fastest, the Head Node Strategy was the slowest, and the Fixed Node Strategy was in the middle. The performance is directly related to the number of local blocks (Figure 4.15).


Figure 4.16 shows the elapsed times of the TestDFSIO+ benchmark with the different strategies. The Primary Node Strategy was the fastest because it ensures that blocks are local.

The Head Node Strategy was the slowest since all blocks are remote. The increased latency increased the elapsed time of the benchmark. The Fixed Node Strategy performance was in the middle because it had more local blocks than the Head Node Strategy, but most blocks were remote.

The Primary Node Strategy was the best among the three service node determination strategies in the experiment. The elapsed times of the benchmarks were directly related to the number of local blocks. This shows that data layout in storage must be considered in data transfer scheduling. Based on its performance, Stargate uses the Primary Node Strategy.

However, a drawback of the Primary Node Strategy is that it potentially requires each subscriber node to create a network connection to every producer node. This may be infeasible if there are too many producer nodes. There were only eight producer nodes in the experiments, so the strategy did not create an excessive number of network connections. However, if the producer cluster had thousands of nodes, establishing that many connections would become a significant overhead, increasing data access latency. In this case, the Fixed Node Strategy may be a better option. I have not yet explored the effectiveness of the Fixed Node Strategy at larger scales.


4.3.3 DATA ACCESS BENCHMARKS

Stargate was evaluated by comparing its performance with two widely-used Hadoop data access methods, DistCP and WebHDFS. DistCP is a data transfer tool for Hadoop that makes a local copy of the data in HDFS before running the benchmarks. WebHDFS provides access to a remote dataset via on-demand data transfers. The experiments measure the elapsed times of these data access methods on three different Hadoop benchmarks that have different computational workloads. The experiments also measure WAN traffic required for the benchmarks.

EXPERIMENTAL SETUP

The experiments use two clusters in Google Cloud Platform across continents, one in Virginia

(US) and the other in Singapore. The cluster in Virginia is a producer cluster with the data stored in HDFS. The cluster in Singapore is a subscriber cluster running Hadoop. The two clusters have the same setup as the previous experiment: one Hadoop Master node (HDFS NameNode and

Hadoop SchedulerNode) and eight Hadoop Slave nodes (HDFS DataNodes and Hadoop

WorkNodes).

BENCHMARKS

Three benchmarks are used in the experiments. The first benchmark is TestDFSIO+. Input to the

experiments is 50 1GB files created using the TestDFSIO+ write mode.


The second benchmark is TeraSort [95], a Hadoop benchmark that measures sorting performance. Input to the experiments is 50 1GB files created using TeraGen.

The third benchmark is Libra, a Hadoop-based bioinformatics tool that measures the genetic distance between genome sequence files. It is used to evaluate Stargate’s performance running a realistic scientific computation. The benchmark measures Libra’s first stage, which creates an inverted index of k-mers from each genome sequence file, because this stage is I/O and compute intensive. A k-mer is a substring of the genetic sequence of length k. Increasing k results in more unique k-mers, increasing the I/O and compute workloads. The experiments run Libra with k values of 1 and 8, using a subset of the Tara Ocean Viromes dataset from the 2009-2011 expedition consisting of four files containing approximately 50GB of data.

                   Map Input (MB)   Shuffle (MB)   Reduce Output (MB)   Elapsed Time
    TestDFSIO+     51200            0.004          0                    0:00:26
    TeraSort       51200            53248          51200                0:30:15
    Libra (k=1)    51503            0.283          0.124                0:17:08
    Libra (k=8)    51503            430            3.068                0:36:37

Table 4.1: I/O workloads and elapsed times of the benchmarks. TestDFSIO+ has the lightest workload and the shortest elapsed time. TeraSort has a heavy workload and therefore a longer elapsed time than TestDFSIO+. Libra’s workload depends on k.

Table 4.1 shows the I/O workload and elapsed times of the benchmarks running on HDFS.

TestDFSIO+ has the lightest workload as it simply counts the number of bytes in its input and


produces very small intermediate and final outputs. Therefore, TestDFSIO+ is the fastest.

TeraSort has a heavy workload as it generates a large amount of intermediate output, and

produces final output the same size as the input. Therefore, it is slower than TestDFSIO+.

Libra’s workload is dependent on k. Increasing the k value results in more k-mers, increasing intermediate output for combining and shuffling. This increases the I/O workload, increasing the total elapsed time.

DATA ACCESS METHODS

In these experiments Stargate is compared with WebHDFS, an HTTP/REST interface to HDFS; DistCP, a Hadoop data transfer tool used for staging; and local HDFS, which has all data in the local cluster and serves as the best-case scenario.

WebHDFS runs on all HDFS nodes, including the HDFS NameNode. A data access on a node that does not have the requested data is automatically redirected, using HTTP redirection, to a node that does.

WebHDFS is used for both staging and on-demand data transfer. When it is used for staging, DistCP transfers the data, which is then stored in local HDFS. For on-demand data transfer, the benchmarks run directly on WebHDFS.

DistCP is a Hadoop data staging tool that is implemented in MapReduce and performs concurrent data transfers between HDFS clusters. A transfer task is created for each file, limiting


the number of concurrent transfers to the number of files. This can be inefficient when there are

fewer files than cluster nodes, or the files are of variable sizes.

Stargate is evaluated with three cache configurations: Stargate Cold, Stargate Warm, and

Stargate Hot. Stargate Cold represents the worst-case scenario in which no data are cached in the

subscriber cluster. All caches are cleared before each test. For Stargate Warm and Hot, an additional test is run before the real test to warm up Stargate’s intra-cluster cache. For Stargate

Warm, 70% of the blocks are randomly evicted after the warm-up run, whereas for Stargate Hot, all blocks remain in the cache. Because the experimental setup has the cache large enough to hold all the blocks, Stargate Warm ensures that 30% of the blocks will be reused in the real test.

Stargate Hot presents the best-case scenario in which all data are cached. This is the case when a researcher reruns an analysis for the same dataset with different parameters (e.g., Libra with different values of k) or multiple researchers in the same cluster access the same dataset (e.g., reference datasets).

Local HDFS is used as the best-case scenario. It assumes that the datasets are already in the local HDFS, so all data access is performed within the cluster. Hadoop’s computation and data co-scheduling avoids unnecessary data transfers over the LAN; most data access is performed within a machine. Recently accessed data blocks are also cached in RAM to further improve data access bandwidth. Since the experimental datasets are small enough to be cached in RAM, Local HDFS serves data access from RAM during the experiments. Therefore, it is the best-case scenario for data access. It is also equivalent to assuming that WAN bandwidth is infinite, so data can be transferred to the compute cluster’s RAM instantly.

EXPERIMENTAL RESULTS

Figure 4.17: Elapsed times of benchmarks with intercontinental data access. DistCP was the slowest of the data access methods. Stargate outperformed DistCP and WebHDFS. Stargate’s performance was comparable to HDFS for the TestDFSIO+ and TeraSort benchmarks. In general, caching improved Stargate’s performance slightly.

Figure 4.17 shows the elapsed times of the experiments. The performance of local HDFS serves as an upper bound because it is the best-case scenario in which all data are accessed locally.

DistCP was the slowest. This is because staging required not only transferring the data but also writing it to HDFS. Data writes in HDFS incur significant overheads for updating


information in the NameNode. Both WebHDFS and Stargate Cold had better performance than

DistCP because they do not write data to HDFS. Another reason for DistCP’s result is that it must wait until staging completes before beginning the computations; this precluded overlapping computations with data transfers. For short-lived benchmarks such as TestDFSIO+ the staging time was significant, accounting for 182 of the 204 seconds of elapsed time (89.2%).

In addition, DistCP performed poorly when running Libra. Although all the benchmarks used approximately the same amount of data (50GB), with Libra the data were stored in only four files. This meant that only four of the eight cluster nodes participated in the data transfer, reducing overall performance. Staging the Libra input data took 618.7% longer (1308 seconds) than staging the TestDFSIO+ input data (182 seconds) consisting of fifty files, even though they were approximately the same total size.

Stargate Cold, Warm, and Hot showed good performance compared to WebHDFS for all benchmarks. Significantly, Stargate Cold was 7% faster than WebHDFS for Libra k=1 even with cold caches (1302 seconds vs. 1390 seconds). This shows that Stargate’s transfer optimization features, such as prefetching and co-locating transfer, cache, and computation, allowed for more efficient data access over a WAN.

Stargate Hot showed performance comparable to HDFS. In this scenario all input data are cached in a Stargate node’s RAM and accessed by the local benchmarks’ compute tasks with the help of Stargate’s task co-location. This resulted in good performance. However, Stargate is implemented as a service that communicates with Hadoop via REST/HTTP. This incurs an

overhead when accessing local caches, degrading cache access performance. This caused

Stargate Hot to have the same performance as HDFS for TestDFSIO+ (23 seconds) and TeraSort

(2335 seconds). As anticipated, Stargate Warm was faster than Stargate Cold and slower than

Stargate Hot.

Overall, Stargate had good performance. It was comparable to HDFS for all workloads, even with cold caches. Stargate Cold was up to 7% faster than WebHDFS (for Libra k=1) and at most 8% slower than HDFS (for Libra k=8) among the benchmarks with heavy workloads. Stargate Hot also had the same performance as HDFS for the TestDFSIO+ and TeraSort benchmarks.

Figure 4.18: WAN traffic required for different data access methods. WebHDFS had the most traffic. Stargate Cold required the same traffic as DistCP. Stargate Warm required transferring approximately 70% of the input data. Stargate Hot is not shown because it did not create WAN traffic.

Figure 4.18 shows WAN traffic between a Stargate producer and subscriber cluster during the benchmarks. It shows that Stargate reduces WAN traffic, which in turn reduces the cost of scientific computing. Stargate uses data caching to trade high-cost WAN traffic for low-cost

LAN traffic. HDFS and Stargate Hot are not included in the figure because they do not create

WAN traffic.

WebHDFS caused more WAN traffic than DistCP and Stargate Cold for two reasons. First,

Hadoop may create redundant compute tasks for the same data block, called attempts. Multiple attempts per data block improve redundancy and performance, and they typically occur when running Hadoop jobs with long elapsed times, such as TeraSort and Libra. Multiple attempts for the same data block run on different nodes, resulting in more WAN traffic for WebHDFS. Stargate’s intra-cluster cache avoids this problem. Second, line-oriented and record-oriented Hadoop tasks often read the beginning of the next block to find the end of the line or record. For example, the TeraSort input record size is 100 bytes, which does not align with the 64MB HDFS block size. Therefore, the last record in a block spans the block boundary, forcing the task to read the beginning of the next block. This caused additional WAN transfers in WebHDFS.

Stargate also reduces the costs of the experiments. The overall cost of an experiment is the sum of the prices for running the compute cluster and for WAN traffic; LAN traffic is not included because most cloud platforms do not charge for it. The price for the compute cluster is


proportional to the runtime of the experiment assuming the cluster configuration is not changed

between experiments. The price for WAN traffic is also proportional to the total volume of the traffic required during the experiments.

                     TestDFSIO+        TeraSort          Libra (k=1)       Libra (k=8)
    HDFS             $3.135 ($0.02)    $3.135 ($2.03)    $3.135 ($1.05)    $3.135 ($2.28)
    DistCP           $3.135 ($0.18)    $3.135 ($2.19)    $3.135 ($2.19)    $6.270 ($3.42)
    WebHDFS          $3.135 ($0.08)    $3.135 ($2.08)    $3.135 ($1.21)    $3.135 ($2.59)
    Stargate Cold    $3.135 ($0.08)    $3.135 ($2.06)    $3.135 ($1.13)    $3.135 ($2.48)
    Stargate Warm    $3.135 ($0.06)    $3.135 ($2.05)    $3.135 ($1.13)    $3.135 ($2.47)
    Stargate Hot     $3.135 ($0.02)    $3.135 ($2.03)    $3.135 ($1.13)    $3.135 ($2.47)

Table 4.2: Compute cluster costs. Costs are calculated based on the runtimes of the benchmarks. Costs outside parentheses are calculated using the price per hour ($3.135), as GCP charges by the hour. Costs in parentheses are estimated using the price per second ($3.135/3600) for comparison. The theoretical per-second costs show that Stargate is cheaper than DistCP and WebHDFS. Stargate Cold saved $0.02 for TeraSort, $0.08 for Libra k=1, and $0.11 for Libra k=8 per run compared to WebHDFS. Stargate Hot saved $0.05 for TeraSort, $0.08 for Libra k=1, and $0.12 for Libra k=8 per run. In reality, however, only DistCP cost more than $3.135, for Libra k=8: its elapsed time exceeded an hour, so it incurred a two-hour charge. Overall, Stargate was cheaper than DistCP and WebHDFS in both theoretical and actual costs.

The compute cluster used for the benchmarks cost $3.135 per hour. The Hadoop

Master node (HDFS NameNode and Hadoop SchedulerNode) cost $0.095 per hour and each of

the eight Hadoop Slave nodes (HDFS DataNodes and Hadoop WorkNodes) cost $0.38 per hour.
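For reference, the hourly cluster price follows directly from the per-node prices: 1 × $0.095 + 8 × $0.38 = $0.095 + $3.04 = $3.135 per hour, the figure used throughout Table 4.2.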


Table 4.2 shows costs of running the compute cluster required for the benchmarks. Stargate was

generally less expensive than WebHDFS and DistCP.

Another possibility is to stage the datasets using a small cluster, then run the benchmarks on

the full cluster. This may reduce staging cost; however, it requires an additional cost for

transferring data between the staging cluster and the compute cluster. The number of nodes in the

staging cluster can impact the staging bandwidth, as shown in Figure 4.14. Also, it doubles disk writes (the data is written once on the staging cluster and again on the compute cluster), which is a bottleneck in data staging (Figure 4.14); this can also increase the elapsed times of the benchmarks. Therefore,

this dissertation does not consider this scenario to simplify the cost comparison.

                     TestDFSIO+   TeraSort   Libra (k=1)   Libra (k=8)
    HDFS             $0           $0         $0            $0
    DistCP           $4.24        $4.24      $4.24         $4.24
    WebHDFS          $4.24        $4.32      $4.48         $4.56
    Stargate Cold    $4.24        $4.24      $4.24         $4.24
    Stargate Warm    $2.96        $2.96      $2.96         $2.96
    Stargate Hot     $0           $0         $0            $0

Table 4.3: WAN traffic costs. Stargate Cold saved $0.08 for TeraSort, $0.24 for Libra k=1, and $0.32 for Libra k=8 per run compared to WebHDFS. Stargate Hot saved $4.32 for TeraSort, $4.48 for Libra k=1, and $4.56 for Libra k=8 per run. Each experiment used 50GB datasets.

Stargate’s data caching trades WAN traffic for LAN traffic. This is important because WAN traffic is generally more expensive than intra-cluster LAN traffic. For example, Google Cloud


Platform charges $0.08/GB for inter-continental egress transfer, while intra-cluster data transfer is free. Table 4.3 shows costs of traffic required for the benchmarks. Stargate was generally less expensive than WebHDFS even without a cache.
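As a check on the rates: the $4.24 entries in Table 4.3 correspond to $4.24 / ($0.08/GB) = 53GB of billed WAN traffic, approximately the 50GB dataset plus block overlaps and protocol overhead; likewise, the Stargate Warm entries correspond to $2.96 / ($0.08/GB) = 37GB, roughly the 70% of blocks that must be re-fetched after the warm-up eviction.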

Overall, Stargate Cold generated about the same amount of WAN traffic as DistCP. Stargate Cold generated less than 0.1% more WAN traffic than DistCP due to the overlap between data blocks. However, this did not make a noticeable difference in networking costs.

Stargate also deduplicates data to reduce WAN traffic. As mentioned above, TestDFSIO has a bug that only writes zeroes for generated input data; thus, its input data contains duplicate blocks. There are only two unique data blocks — one block with 64MB + 16KB of zeros and the other block with 64MB of zeros at the end of the file. This caused Stargate Cold to generate only

133MB of WAN traffic for TestDFSIO. TestDFSIO+ writes random data for its generated input; thus, its input data does not contain any duplicate blocks. This caused Stargate to transfer the entire 52.06GB of input data over the WAN. The result shows that Stargate can deduplicate data to reduce WAN traffic. Stargate could not deduplicate data for TeraSort and Libra, however, because their input data are randomly generated data or DNA sequences that do not contain many duplicates.

4.3.4 COMPARISON TO SYNDICATE

Stargate is designed to address Syndicate’s limitations on clusters. However, I did not make a direct performance comparison between Syndicate and Stargate for two reasons. First, the


Syndicate Acquisition Gateway only runs on a single node in a cluster, leading to an unfair comparison with Stargate, which runs on all nodes. Second, deploying Syndicate on a cluster is too complicated. Syndicate required specific versions of shared libraries as dependencies. This caused version conflicts between libraries, preventing me from installing

Syndicate on the platforms used in the experiments.

                              HDFS     Syndicate
    Elapsed Time (Seconds)    176      285
    Read Bandwidth (MBPS)     128.61   79.42

Table 4.4: Data access performance comparison between Syndicate and local HDFS. The experiment ran a simple genomic file reader on 22GB of input data. Syndicate had hot caches that held the entire input data on the local disk. Syndicate was 62% slower than HDFS.

Nevertheless, it is possible to infer Stargate’s performance improvements compared to

Syndicate. I performed an experiment that compared Syndicate’s performance to local HDFS.

The experiment used a simple genomic file reader on 22GB input data for a benchmark.

Syndicate had hot caches — the entire input data was cached on the local disk. Table 4.4 shows the result. Syndicate was 62% slower than HDFS even with all of the input data cached. This is because Syndicate caches data on disk while HDFS caches in RAM. In comparison, Stargate’s performance was the same as HDFS for the TestDFSIO+ benchmark with all of the input data cached in RAM.


4.4 CONCLUSION

Stargate is a data transfer service for clusters that provides efficient remote data access over a

WAN. Stargate implements a content-addressable protocol, an intra-cluster cache, and prefetching to offset the drawbacks of on-demand data transfer over a WAN. In addition, Stargate supports cluster computing natively by implementing cluster-to-cluster concurrent data transfers and co-locating computation, cache, and transfers. In doing so, Stargate provides more efficient remote data access in cluster computing.

Stargate’s performance shows that it can be a good platform for remote data access in scientific computing on clusters. When it had warm intra-cluster caches, Stargate showed the same performance as HDFS for Hadoop I/O benchmarks. This shows the usefulness of the multi-tier caching design for scientific computing, where datasets are repeatedly analyzed in different experiments. In addition, even with cold caches Stargate’s performance was comparable to staging: Stargate with cold caches was 7% faster than an existing data transfer tool, WebHDFS.

Moreover, Stargate’s caches effectively trade high-cost WAN traffic for low-cost LAN traffic.

Stargate’s performance, on-demand data transfer, and reduction in WAN traffic show that the Stargate design is effective in scientific computing.


CHAPTER 5. CONCLUSION

Remote data access over wide-area networks (WANs) is becoming increasingly common in

scientific computing. The trends toward collaborative research, big data, and cloud computing

have increased the need to transfer large-scale datasets between geographically separated systems. The challenge is that WANs are often high-cost and low-bandwidth. Furthermore, cluster computing has become the dominant platform for scientific computing, making efficient data transfer even more complicated.

This dissertation explores the designs of new file systems to address this challenge. To validate these designs, I presented two implementations, Syndicate and Stargate. Both systems aim for efficient remote data access over WANs.

Syndicate uses on-demand block-based data transfer augmented with prefetching and multi-tier caching to address the challenges of large-scale data transfer over a low-bandwidth WAN.

However, Syndicate is not designed to run on a cluster; therefore, it has limitations in network bandwidth use and caching.

Stargate, its successor, is a data transfer service for clusters that provides efficient remote data access over a WAN. Stargate integrates lessons learned from Syndicate: it adopted Syndicate’s core design principles of multi-tier caching and on-demand data transfer. In addition, it implements a content-addressable protocol and intra-cluster data caching to address the challenges of large data transfers over a WAN. Stargate also has native support for


cluster computing to address the limitations of Syndicate. Stargate performs cluster-to-cluster

concurrent data transfers that make more efficient use of network bandwidth. In addition,

Stargate uses a novel approach that co-locates computation, cache, and transfers to achieve efficient data access in cluster computing.

The prototype Stargate implementation supports Hadoop and HDFS. However, the Stargate design is not tied to these systems. For example, any distributed compute framework, such as

MPI [96] or HBase [80], can work with Stargate. By implementing a Stargate client, any compute framework can use Stargate for remote data access. In addition, any storage system, such as local storage, iRODS, or GlusterFS, can work with Stargate. By implementing a Stargate storage abstraction, any storage system can use Stargate for publishing scientific datasets.

To evaluate the file system designs, this dissertation presents experimental results for both

systems. Syndicate and Stargate both showed good data access performance improvements and

reduction of WAN traffic when used with warm CDN caches or warm intra-cluster caches.

Syndicate with warm CDN caches was only 9% slower than local HDD storage; it outperformed

existing data transfer methods. However, Syndicate has limitations in networking bandwidth use

and caching in cluster computing. Syndicate was 62% slower than HDFS with warm caches. In

comparison, Stargate with warm intra-cluster caches showed the same performance as HDFS for

Hadoop I/O benchmarks. This shows the usefulness of multi-tier caching design for scientific

computing where datasets are repeatedly analyzed in different experiments. In addition, even with cold caches Stargate’s performance was comparable to staging: Stargate with cold caches was 7% faster than an existing data transfer tool, WebHDFS. Moreover, Stargate’s caches effectively trade high-cost WAN traffic for low-cost LAN traffic. Stargate’s performance, on-demand data transfer, and reduction in WAN traffic make it a good platform for providing remote data access to scientific computations on clusters.


REFERENCES

[1] S. Kaisler, F. Armour, J. A. Espinosa, and W. Money, “Big Data: Issues and Challenges Moving Forward,” in 2013 46th Hawaii International Conference on System Sciences, Jan. 2013, pp. 995–1004.

[2] S. Teasley and S. Wolinsky, “Scientific collaborations at a distance,” Science, vol. 292, no. 5525, pp. 2254–2255, Jun. 2001.

[3] C. Yang, Q. Huang, Z. Li, K. Liu, and F. Hu, “Big Data and cloud computing: innovation opportunities and challenges,” International Journal of Digital Earth, vol. 10, no. 1, pp. 13–53, Jan. 2017.

[4] S. Sundaresan, W. de Donato, N. Feamster, R. Teixeira, S. Crawford, and A. Pescapè, “Broadband Internet Performance: A View from the Gateway,” in Proceedings of the ACM SIGCOMM 2011 Conference, Toronto, Ontario, Canada, 2011, pp. 134–145.

[5] A. Vulimiri, C. Curino, P. B. Godfrey, T. Jungblut, J. Padhye, and G. Varghese, “Global analytics in the face of bandwidth and regulatory constraints,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), 2015, pp. 323–336.

[6] X. L. Dong and T. Rekatsinas, “Data Integration and Machine Learning: A Natural Synergy,” in Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, May 2018, pp. 1645–1650.

[7] J. VanderPlas, A. J. Connolly, Ž. Ivezić, and A. Gray, “Introduction to astroML: Machine learning for astrophysics,” in 2012 Conference on Intelligent Data Understanding, Oct. 2012, pp. 47–54.

[8] K. Y. He, D. Ge, and M. M. He, “Big data analytics for genomic medicine,” International Journal of Molecular Sciences, vol. 18, no. 2, p. 412, 2017.

[9] F. F. Costa, “Big data in biomedicine,” Drug Discovery Today, vol. 19, no. 4, pp. 433– 440, Apr. 2014.


[10] B. L. Hurwitz and M. B. Sullivan, “The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology,” PLoS One, vol. 8, no. 2, p. e57355, Feb. 2013.

[11] D. E. Wood and S. L. Salzberg, “Kraken: ultrafast metagenomic sequence classification using exact alignments,” Genome Biology, vol. 15, no. 3, p. R46, Mar. 2014.

[12] M. Baker and R. Buyya, “Cluster computing at a glance,” High Performance Cluster Computing: Architectures and Systems, vol. 1, no. 3–47, p. 12, 1999.

[13] R. Buyya, M. Baker, D. C. Hyde, and D. Tavangarian, “Cluster Computing,” in Euro-Par 2000 Parallel Processing, 2000, pp. 1115–1117.

[14] J. R. Brum et al., “Ocean plankton. Patterns and ecological drivers of ocean viral communities,” Science, vol. 348, no. 6237, p. 1261498, May 2015.

[15] S. Hunter et al., “EBI metagenomics—a new resource for the analysis and archiving of metagenomic data,” Nucleic Acids Research, vol. 42, no. D1, pp. D600–D606, Jan. 2014.

[16] K. D. Pruitt, T. Tatusova, and D. R. Maglott, “NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins,” Nucleic Acids Research, vol. 35, no. Database issue, pp. D61–5, Jan. 2007.

[17] “National Center for Biotechnology Information,” https://www.ncbi.nlm.nih.gov/

[18] “GPFS: a parallel file system,” IBM International Technical Support Organization, 1998. http://ps-2.kev009.com/rs6000/redbook-cd/sg245165.pdf.

[19] “EMC Isilon OneFS,” https://pdfs.semanticscholar.org/698e/60e86524c4308d0ef31731a549f4f08dbbba.pdf.

[20] “AWS Lake Formation,” https://aws.amazon.com/blogs/aws/aws-lake-formation-now- generally-available/.

[21] “Cloud Storage as a data lake,” https://cloud.google.com/solutions/build-a-data-lake-on- gcp.


[22] L. Clarke et al., “The 1000 Genomes Project: data management and community access,” Nature Methods, vol. 9, no. 5, pp. 459–462, Apr. 2012.

[23] “Amazon EMR,” https://aws.amazon.com/emr.

[24] “Amazon S3,” https://aws.amazon.com/s3/.

[25] “Apache Hadoop,” http://hadoop.apache.org/.

[26] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” HotCloud, vol. 10, no. 10–10, p. 95, 2010.

[27] M. B. Nirmala, “Cloud based big data analytics: WAN optimization techniques and solutions,” Computational Intelligence for Big Data Analysis, 2015. https://link.springer.com/chapter/10.1007/978-3-319-16598-1_11.

[28] M. B. Nirmala, “WAN Optimization Tools, Techniques and Research Issues for Cloud- Based Big Data Analytics,” in 2014 World Congress on Computing and Communication Technologies, Feb. 2014, pp. 280–285.

[29] “curl,” https://curl.haxx.se/.

[30] “GNU Wget,” https://www.gnu.org/software/wget/.

[31] “NIH Data Sharing Repositories,” https://www.nlm.nih.gov/NIHbmic/domain_specific_repositories.html.

[32] V. N. Padmanabhan and K. Sripanidkulchai, “The case for cooperative networking,” 2001. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.4706&rep=rep1&type=pdf.

[33] B. Cohen, “Incentives build robustness in BitTorrent,” in Workshop on Economics of Peer- to-Peer systems, 2003, vol. 6, pp. 68–72.

[34] R. Sherwood, R. Braud, and B. Bhattacharjee, “Slurpie: a cooperative bulk data transfer protocol,” in IEEE INFOCOM 2004, Mar. 2004, vol. 2, pp. 941–951.

[35] “Academic Torrents,” https://academictorrents.com/.


[36] I. Foster, “Globus Online: Accelerating and Democratizing Science through Cloud-Based Services,” IEEE Internet Computing, vol. 15, no. 3, pp. 70–73, May 2011.

[37] B. Allcock et al., “Data management and transfer in high-performance computational grid environments,” Parallel Computing, vol. 28, no. 5, pp. 749–771, May 2002.

[38] J. Bresnahan and I. Foster, “An Architecture for Dynamic Allocation of Compute Cluster Bandwidth,” Department of Computer Science, University of Chicago, 2006. http://www.linuxclustersinstitute.org/conferences/archive/2007/PDF/bresnahan_23331.pdf.

[39] “Google Cloud Storage,” https://cloud.google.com/storage/.

[40] “Open Data on AWS,” https://aws.amazon.com/opendata.

[41] “Cloud Life Sciences Public Datasets,” https://cloud.google.com/life-sciences/docs/resources/public-datasets.

[42] A. Rajasekar, M. Wan, R. Moore, and W. Schroeder, “A prototype rule-based distributed data management system,” in HPDC workshop on Next Generation Distributed Data Management, 2006, vol. 102. http://www.academia.edu/download/48433718/DICE_RODs-paper.pdf.

[43] “Texas Advanced Computing Center,” https://www.tacc.utexas.edu/.

[44] “CyVerse,” https://cyverse.org/.

[45] “Data Commons,” https://datacommons.cyverse.org/.

[46] “TACC Wrangler Data Portal,” https://portal.wrangler.tacc.utexas.edu/collcatapp/collections/.

[47] “CyVerse Discovery Environment,” https://de.cyverse.org/de/.

[48] “DistCp,” https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html.

[49] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop Distributed File System,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), May 2010, pp. 1–10.

[50] “HCFS,” https://cwiki.apache.org/confluence/display/HADOOP2/HCFS.

[51] E. B. Boyer, M. C. Broomfield, and T. A. Perrotti, “GlusterFS: One storage server to rule them all,” Los Alamos National Laboratory (LANL), Los Alamos, NM (United States), 2012. https://www.osti.gov/biblio/1048672.

[52] P. Schwan, “Lustre: Building a file system for 1000-node clusters,” in Proceedings of the 2003 Linux Symposium, 2003, vol. 2003, pp. 380–386.

[53] T. Yeh and H. Lee, “Enhancing Availability and Reliability of Cloud Data through Syncopy,” in 2014 IEEE International Conference on Internet of Things (iThings), and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom), Sep. 2014, pp. 125–131.

[54] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, “Design and implementation of the Sun network filesystem,” in Proceedings of the Summer USENIX conference, 1985, pp. 119–130.

[55] “CurlFtpFS,” http://curlftpfs.sourceforge.net/.

[56] “iRODS FUSE,” https://github.com/irods/irods_client_fuse.

[57] “FUSE (Filesystem in Userspace),” https://github.com/libfuse/libfuse.

[58] “davfs2,” https://wiki.archlinux.org/index.php/Davfs2.

[59] E. J. Whitehead and M. Wiggins, “WebDAV: IETF standard for collaborative authoring on the Web,” IEEE Internet Computing, vol. 2, no. 5, pp. 34–40, Sep. 1998.

[60] J. Blomer, C. Aguado-Sánchez, P. Buncic, and A. Harutyunyan, “Distributing LHC application software and conditions databases using the CernVM file system,” Journal of Physics: Conference Series, vol. 331, no. 4, p. 042003, Dec. 2011.

[61] “WebHDFS,” https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html.

[62] “WANDisco Fusion,” https://docs.wandisco.com/bigdata/wdfusion/2.15/.

[63] “OpenDedup,” http://opendedup.org/odd/.

[64] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn, “Ceph: A Scalable, High-performance Distributed File System,” in Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, Washington, 2006, pp. 307–320.

[65] “Gluster,” https://www.gluster.org/.

[66] S. Tumanova, O. Irzak, L. Sun, and S. Margel, “Octopus: efficient data intensive computing on virtualized datacenters,” in Proceedings of the 6th International Systems and Storage Conference, 2013.

[67] J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny, “Explicit Control in the Batch-Aware Distributed File System,” in NSDI, 2004, vol. 4, pp. 365–378.

[68] M. J. Litzkow, M. Livny, and M. W. Mutka, “Condor-a hunter of idle workstations,” University of Wisconsin-Madison Department of Computer Sciences, 1987. https://minds.wisconsin.edu/bitstream/handle/1793/58896/TR730.pdf?sequence=1.

[69] A. Romosan, D. Rotem, A. Shoshani, and D. Wright, “Co-Scheduling of Computation and Data on Computer Clusters,” in SSDBM, 2005, pp. 103–112.

[70] J. Chen, J. Zhang, and A. Song, “Efficient Data and Task Co-Scheduling for Scientific Workflow in Geo-Distributed Datacenters,” in 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD), Aug. 2017, pp. 63–68.

[71] J. Nelson and L. Peterson, “Syndicate: Democratizing Cloud Storage and Caching Through Service Composition,” in Proceedings of the 4th Annual Symposium on Cloud Computing, Santa Clara, California, 2013, pp. 46:1–46:2.

[72] “GenBank,” https://www.ncbi.nlm.nih.gov/genbank/.

[73] “Varnish HTTP Cache,” http://varnish-cache.org/.

[74] “Squid,” http://www.squid-cache.org/.

[75] “Akamai,” https://www.akamai.com/.

[76] “Amazon CloudFront,” https://aws.amazon.com/cloudfront/.

[77] J. C. Nelson, “Wide-area Software-defined Storage,” Ph.D. dissertation, Princeton University, 2018.

[78] “Google App Engine,” https://cloud.google.com/appengine/.

[79] “Apache Spark,” https://spark.apache.org/.

[80] “HBase,” https://hbase.apache.org/.

[81] K. Youens-Clark et al., “iMicrobe: Tools and data-driven discovery platform for the microbiome sciences,” Gigascience, vol. 8, no. 7, Jul. 2019.

[82] I. Choi, A. J. Ponsero, M. Bomhoff, K. Youens-Clark, J. H. Hartman, and B. L. Hurwitz, “Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons,” Gigascience, vol. 8, no. 2, Feb. 2019.

[83] I. Choi, J. Nelson, L. Peterson, and J. H. Hartman, “SDM: A Scientific Dataset Delivery Platform,” presented at the 2019 IEEE 15th International Conference on e-Science, Sep. 2019.

[84] “Yum (Yellowdog Updater, Modified),” http://www.phy.duke.edu/~rgb/General/yum_HOWTO/yum_HOWTO.pdf.

[85] D. Blackman, “Debian Package Management, Part 1: A User’s Guide,” Linux Journal, vol. 2000, no. 80es, Nov. 2000.

[86] “Docker,” https://www.docker.com/.

[87] B. D. Ondov et al., “Mash: fast genome and metagenome distance estimation using MinHash,” Genome Biology, vol. 17, Jun. 2016.

[88] A. Z. Broder, “On the resemblance and containment of documents,” in Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171), 1997, pp. 21–29.

[89] O. Tange, “GNU Parallel: the command-line power tool,” The USENIX Magazine, vol. 36, no. 1, pp. 42–47, 2011.

[90] B. K. R. Vangoor, V. Tarasov, and E. Zadok, “To FUSE or Not to FUSE: Performance of User-Space File Systems,” in FAST, 2017, pp. 59–72.

[91] N. Meghanathan, “Review of access control models for cloud computing,” Computer Science & Information Science, vol. 3, no. 1, pp. 77–85, 2013.

[92] “Apache Ignite,” https://ignite.apache.org/.

[93] “Uptime,” https://uptime.com/.

[94] “TestDFSIO,” https://community.pivotal.io/s/article/Running-DFSIO-MapReduce-benchmark-test.

[95] “TeraSort,” https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/examples/terasort/package-summary.html.

[96] D. W. Walker and J. J. Dongarra, “MPI: A standard message passing interface,” Supercomputer, vol. 12, pp. 56–68, 1996.
