
PRACTICAL ERASURE CODES FOR STORAGE SYSTEMS

THE STUDY OF ENTANGLEMENT CODES, AN APPROACH THAT PROPAGATES REDUNDANCY TO INCREASE RELIABILITY AND PERFORMANCE

Thèse soutenue le 19 mai 2017 à la Faculté des Sciences chaire de Systèmes Complexes programme doctoral Université de Neuchâtel pour l’obtention du grade de Docteur ès Sciences par Verónica del Carmen Estrada Galiñanes

acceptée sur proposition du jury: Professeur Peter Kropf, président du jury Université de Neuchâtel, Suisse Professeur Pascal Felber, directeur de thèse Université de Neuchâtel, Suisse Professeur Patrick T. Eugster, rapporteur TU Darmstadt, Allemagne Professeur Ethan L. Miller, rapporteur Université de California à Santa Cruz, États-Unis Neuchâtel, UNINE, 2017

Faculté des Sciences Secrétariat-décanat de Faculté Rue Emile-Argand 11 2000 Neuchâtel – Suisse Tél : + 41 (0)32 718 21 00 E-mail : [email protected]

IMPRIMATUR POUR THESE DE DOCTORAT

La Faculté des sciences de l'Université de Neuchâtel autorise l'impression de la présente thèse soutenue par

Madame Verónica del Carmen Estrada Galiñanes

Titre:

“Practical Erasure Codes for Storage Systems: The Study of Entanglement Codes, an Approach that Propagates Redundancy to Increase Reliability and Performance”

sur le rapport des membres du jury composé comme suit:

• Prof. Pascal Felber, directeur de thèse, UniNE
• Prof. Peter Kropf, UniNE
• Prof. Ethan L. Miller, University of California, Santa Cruz, USA
• Prof. Patrick Eugster, TU Darmstadt, D

Neuchâtel, le 06.06.2017 Le Doyen, Prof. R. Bshary

Imprimatur pour thèse de doctorat www.unine.ch/sciences

He explained that an Aleph is one of the points in space that contains all other points. — Jorge Luis Borges (1899-1986), Argentine writer.

This work is dedicated to Vinton Cerf for the Internet, Brewster Kahle for the Internet Archive, David S. H. Rosenthal for his blog on digital preservation, my brother Alejandro D. V. Estrada for preserving our family photos, and Rudolf E. Martin, whose love and support sustained me throughout the years . . .

Acknowledgements

First and foremost, I would like to thank my advisor Prof. Pascal Felber for his motivation, patience, dedication and extraordinary support throughout this project. In addition to my advisor, I would like to thank the other members of my thesis committee: Prof. Ethan L. Miller, Prof. Patrick Eugster and Prof. Peter Kropf for their insightful feedback and hard questions.

This thesis would have been impossible without the support of the Swiss National Science Foundation, which promotes collaborative research and encourages young researchers. Many thanks to the researchers that made the “Trustworthy Cloud Storage” Sinergia project possible, in particular, from EPFL DIAS Prof. Anastasia Ailamaki, from EPFL DSLAB Prof. George Candea, from ETH Systems Security Group Prof. Srdjan Capkun, from University of Lugano Prof. Fernando Pedone, and from University of Neuchâtel Prof. Pascal Felber. Many thanks to the Department of Computer Science from the University of Houston, to the Goldman Research Group from the European Bioinformatics Institute, the ETH Systems Group, and the Department of Mathematics from the University of Zürich for inviting me to discussions and as a seminar speaker. Many thanks to all the organizations that made possible my frequent attendance at conferences, workshops and other events: Usenix, Google, ACM-W Europe, ACM, Bureau égalité des chances UNINE, and the Internet Archive. And many thanks to CUSO and the Program REGARD for the organization of a variety of courses, ateliers and winter schools.

I would like to thank the University of Neuchâtel for giving me a wonderful workplace and opening the door to a large network of resources. In addition to a professional network, this period gave me some lifetime friends. Thanks to Dr. Etienne Riviére, who was my first contact with the SNSF Sinergia project that motivated this thesis, and for his kind support of some of my summer projects like GSoC. Thanks to Dr. Hugues Mercier and Dr. Valerio Schiavoni for pushing me with questions outside my comfort zone. Thanks to Prof. Elisa Gorla, who made me aware that career planning is an important aspect of a research proposal. Thanks to Dr. Relinde Jurrius for her thoughtful and pedagogical comments. She has the ability to listen carefully, avoiding preconceived ideas. Thanks to Muriel Besson for her dedication to PhD students in the mentoring program. Thanks to Dr. Anita Sobe, Dr. Patrick Marlier, Andrei Lapin and Rafael Pires for being great teaching assistant partners. Thanks to Dr. José Valerio for his collaboration on the Tahoe prototype that preceded my research for this thesis. Thanks to Maria Carpen-Amarie and Mirco Koher for reviewing some parts of this thesis. Thanks to Emilie Auclair and the other secretaries who helped me with the administrative work.


Thanks to Yarco, Mascha and to all my colleagues for listening to my rehearsals, reading my unfinished work, sharing with me lunch time, coffee breaks, game nights, via ferratas and a plethora of adventures throughout all these years. Thanks to Raph and to Pascal for introducing me to the Alps and for rescuing me when it was necessary.

I would like to thank the University of California, Santa Cruz, the Jack Baskin School of Engineering, the Center for Research in Storage Systems (CRSS), and the Storage Systems Research Center (SSRC) for hosting my 6-month doc mobility research project. Many thanks to the faculty and associated researchers, in particular to Prof. Ethan L. Miller, Prof. Darrell D. E. Long, Dr. Andy Hospodor, Dr. Ahmed Amer, Dr. Peter Alvaro, Prof. Thomas Schwarz and Prof. Carlos Maltzahn. During my stay, I had the opportunity to attend many seminars, hear interesting talks and discuss with prestigious guests. Many thanks to the participants of the weekly SSRC seminar meetings. Thanks to Cynthia McCarley for her help with administrative issues. Thanks to the International Student and Scholar service office, in particular to Parinaz Zartoshty and Keri Toma, for being very kind and helpful towards my inquiries. Thanks to Ivo Gimenez for the nice talk after my PhD crisis and for organizing the social lab events. Thanks to my colleagues, special thanks to Yan Li, Ana McTaggart and Rahul Talari for integrating me in the group, and to Andy, who donated his house for my farewell party.

I would like to thank the many researchers, engineers and friends who volunteered their time asking me plenty of questions, answering my questions, reading my drafts, identifying unclear parts, criticizing my work, but above all caring about my research and, consequently, encouraging my writing. Special thanks to my co-author Prof. Jehan-François Pâris, who drew my attention towards simple entanglements. Many thanks to Prof. Ruediger Urbanke, who welcomed me in his office to discuss coding theory. Many thanks to Dr. Kevin Greenan and Dr. Jay J. Wylie, who answered my emails on reliability. Many thanks to Dr. Jay Lofstead for being a good listener. Many thanks to Prof. James S. Plank for all his contributions to the field of erasure coding and for answering my emails regarding the jerasure library. Many thanks to Dr. Luciano Bello and Kragen Javier Sitaker for discussing entanglement codes and the Prolog tool to verify minimal erasure patterns. Many thanks to Théo Beigbeder for help with the translations. Many thanks to Rudolf Martin for several discussions. Many thanks to Dr. Luiz Barroso for responding to my enquiries about availability in datacenters. Many thanks to Tim Feldman for answering my emails regarding storage. Many thanks to Dr. Jossy Sayir for his honest advice. Many thanks to Dr. Sage Weil and engineers for clarifying some aspects of the Ceph system’s architecture. Many thanks to my mentors at SIGCOMM 2015, Dr. Paolo Costa and Prof. John Byres. During conferences, poster sessions, doctoral workshops and other events I had the opportunity to discuss different aspects of my research with a large audience; in particular, many thanks, in no particular order, to: Dr. Cheng Huang, Dr. Donald Porter, Dr. Jesus Carretero, Pat Helland, Prof. Christof Fetzer, Prof. Torsten Braun, Prof. Joachim Rosenthal, Dr. Danny Harnik, Dr. Dalit Naor, Dr. Hillel Kolodner, Dr. Eitan Yaakobi, Dr. John Wilkes, Dr. Brent Welch, Prof. Willy Zwaenepoel, Dr. Antonia Wachter-Zeh, Prof. Gustavo Alonso, Dr. Erik Riedel, Robert Krahn, Jonnahtan Saltarin, Prof. Luc Gauthier, Charles Randall and to all my anonymous reviewers.

Many thanks to all the people who are working on projects to preserve digital information. Two projects that caught my attention during my research were the Internet Archive and the LOCKSS Program based at Stanford Libraries. I’d like to thank the people behind the Internet Archive, in particular Brewster Kahle, John Gonzalez, and Wendy Hanamura, who received my visits a couple of times. And thanks to the Chief Scientist of the LOCKSS Program, Dr. David S. H. Rosenthal, for his constant dedication to his blog about archival storage.

Indirectly, many people prepared me for this PhD journey. In particular, I would like to express my deepest gratitude to all my former professors and instructors, especially to Prof. Gabriel Venturino for his classes on circuit analysis and to Prof. Ignacio Alvarez-Hamelin for introducing me to complex systems. Special thanks to my master's advisor Prof. Aki Nakao; without his trust it would have been impossible to do my master's in Japan and my life would be completely different. Many thanks to Dr. Carlos Rosito for his efforts to ensure that my scholarship application would arrive before the deadline. Many thanks to Dr. Roberto Perazzo for his encouragement and good advice. Many thanks to Carlos Benitez for his support.

I would like to thank my family and friends, whose companionship and support helped me to persevere with my goal. In particular, my special thanks go to my brave mother, who gave me the best she could with her scarce resources, and to my father, for becoming a role model in surviving, and a cheerful companion and guide in recent years. Thanks to my international extended family; many thanks to Lucia san, Iiri san, Gloria san, and Kayo chan for their hospitality and truly kind hearts. Thanks to Graciela, Bibi, Sara, Juan, Romi, Sofi, Citlali, Xochi, Yectli, the whole Correntin troop: tio Rudi, Alice, Roberto, Anita, Erica, Silvia, Caro, Diego and Vero. Thanks to my friends, who accepted my ghost mode while I was surviving heavy loads of work and trips. Thanks to Flor, Regi, Maxi, Marce, Louise, Cèline, Diego, Carla, Eva, Nada, Konstantina, Alfonso, Mami, Ale, Laura, Claudia, Denise, Jorge, Cris, Guil Ovi, Paula, Emi Ovi, Anahi, Fede and Martu for making me feel welcome and at home wherever I go. Thanks to the women of the Japanische Frauenchor of Munich, special thanks to Hiroko, for all the happiness that singing brought to my life and for the warm farewell party. Thanks to my Santa Cruz housemates Kira, John, the “surfer Genevois” Greg, the “tea expert” Paul, Peter “Mowgli”, Joelyn, Jim, Brandy, Terry, Megan and her mother, and the HH ghost. Special thanks to Barbara, Caro, and Lean, who were with me in Ezeiza when I left Argentina. Thanks to Sana for understanding my crazy working hours. Thanks to Ian and Cindy for bringing good humor and peaceful energy. Thanks to Mitra for sharing many rides on the carousel and bringing many delightful moments to my life. To conclude, “total thanks” to Rudi for his tremendous generosity and for being my learning partner on the most important things. One of the big things that we experienced as a far-apart couple is that during my PhD we travelled the equivalent of 6.5 trips around the Earth to meet each other.

Neuchâtel, 19 mai 2017 V. E. G.


Preface

This dissertation is submitted for the degree of Doctor of Philosophy at the University of Neuchâtel. The research described herein was conducted under the supervision of Professor Pascal Felber in the Department of Computer Science, University of Neuchâtel, in the periods June 2012 - January 2016 and August 2016 - May 2017. A part of this research was conducted abroad in the Storage Systems Research Center, Jack Baskin School of Engineering, University of California, Santa Cruz, in the period February 2016 - July 2016.

This work is to the best of my knowledge original, except where acknowledgements and references are made to previous work. Neither this, nor any substantially similar dissertation has been or is being submitted for any other degree, diploma or other qualification at any other university.

Neuchâtel, 19 mai 2017 V. E. G.


Abstract

This dissertation deals with the design of practical erasure codes for storage systems. Hardware and logical disk failures are a common source of system failures that may lead to data loss. Nevertheless, it is predicted that spinning disks will remain the standard storage medium in large datacenters. Cloud storage needs efficient codes to become reliable despite its low-cost components. As systems scale in size and complexity, their properties and requirements may change. When data ages, it is usually moved to dedicated archives. Yet the boundaries between storage systems and archives are getting diffuse as we move into applications that require low-latency access, such as mining data from large scientific archives. Moreover, the centralized approach of cloud backup services brings privacy and economic concerns. Some studies suggest that cooperative peer-to-peer networks are more sustainable for the long term. But peer-to-peer nodes and spinning disks share an undesirable property: both are unreliable. The motivation for this study is to design flexible and practical codes that can provide high fault-tolerance to improve data durability and availability even in catastrophic scenarios.

Survivability comes through the strength built with redundancy. It is difficult to devise a solution based on classic codes that considers all aspects of dependability: availability, reliability, safety, integrity and maintainability. Compromises are generally found through the complex combination of many techniques. This thesis argues that codes that are based exclusively on the use of parallel networks (such as replication) or mainly on the use of serial networks (as seen in the split and expand operations behind classic erasure codes) do not leverage all the resources available in a system. Entanglement codes create redundancy by tangling new data blocks with old ones, building entangled data chains that are woven into a growing mesh of interdependent content. We propose: 1) open and closed entanglements as more reliable alternatives to mirroring, 2) alpha entanglements to achieve extremely high fault-tolerance with low storage overhead and low repair costs, and 3) spigot codes to reduce the space footprint of entangled data without significant loss of the entanglement’s properties. These codes can leverage storage and bandwidth resources efficiently by exploiting the combinatorial power of network reliability. Furthermore, their flexible design based on virtual chains of entangled data yields a scalable and suitable solution to accommodate future requirements. Finally, due to the combinatorial power of entangled data, overall dependability is boosted.

Key words: erasure codes, data entanglement, redundancy, replication, combinatorics, network reliability, fault tolerance, survivability, availability, scalable systems, distributed systems, storage systems, long-term storage, archival storage


Résumé

Cette dissertation traite de la conception de codes d’effacement pratiques pour les systèmes de stockage. Les défaillances de disque physiques et logiques sont une source commune d’erreurs; cependant, il est prédit que les disques dur à plateaux vont rester le support de stockage standard dans les grands centres de données. Le stockage dans le cloud a besoin de codes efficaces pour devenir fiables malgré ses composants à faible coût. À mesure que les systèmes augmentent en taille et en complexité, leurs propriétés et leurs exigences peuvent changer. Lorsque les données vieillissent, elles sont habituellement déplacées vers des archives dédiées. Pourtant, les limites entre les systèmes de stockage et les archives deviennent diffuses à mesure que nous évoluons vers des applications nécessitant un faible temps de latence, telles que l’exploration de données des grandes archives scientifiques. De plus, la centralisation des services de sauvegarde dans le cloud amène des préoccupations en matière de protection de la vie privée et de coûts de maintenance. Certaines études suggèrent que les réseaux coopératifs peer-to-peer sont plus durables à long terme. Mais les noeuds peer-to-peer et les disques dur partagent une propriété indésirable : aucun des deux ne sont assez fiables pour un déploiement à grande échelle. La motivation de cette étude est de concevoir un code flexible et pratique qui offre une haute tolérance aux pannes pour améliorer la durabilité et la disponibilité des données même dans des scénarios de type "catastrophe".

La survie des données est basée sur la robustesse et la redondance. Il est difficile de concevoir une solution basée sur des codes classiques qui tiennent compte de ces cinq aspects : disponibilité, fiabilité, sécurité, intégrité et maintenabilité. Les compromis sont généralement obtenus par la combinaison de différentes techniques. Cette thèse cherche à montrer que les techniques de codage basées exclusivement sur l’utilisation de réseaux parallèles (comme la réplication) ou concentrés sur l’utilisation de réseaux en série (comme dans les opérations de division et d’expansion utilisées dans les codes classiques d’effacement) ne permettent pas d’exploiter toutes les ressources disponibles dans le système. Les codes d’enchevêtrement (entanglement codes) créent une redondance en enroulant de nouveaux blocs de données avec les anciens, en construisant des chaînes de données enchevêtrées qui sont tissées dans un maillage croissant de contenu interdépendant. Nous proposons : 1) chaîne enchevêtrements, qui peuvent être ouverts ou fermés, comme des alternatives plus fiables que la mise en miroir de disques, 2) des enchevêtrements alpha pour obtenir une tolérance de panne extrêmement élevée avec un faible coût de stockage et des coûts de réparation faibles, et 3) des codes de robinet (spigot codes) pour réduire l’empreinte spatiale à partir de données enchevêtrées sans perte significative des propriétés de l’enchevêtrement. Ces codes peuvent exploiter efficacement le stockage et les ressources de bande passante en exploitant la puissance combinatoire de la fiabilité du réseau. En outre, leur conception flexible basée sur des chaînes virtuelles de données enchevêtrées offre une solution évolutive et adaptée aux besoins futurs. Enfin, en raison de la puissance combinatoire des données enchevêtrées, dans l’ensemble, les cinq aspects sont renforcés.

Mots clefs : codes d’effacement, enchevêtrement des données, redondance, réplication, combinatoire, fiabilité du réseau, tolérance aux pannes, survie, disponibilité, systèmes évolutifs, systèmes distribués, systèmes de stockage, stockage à long terme, stockage d’archives

Zusammenfassung

Diese Dissertation beschäftigt sich mit der Gestaltung von praktischen Löschcodes für Speichersysteme. Hardware und logische Festplattenfehler sind eine häufige Quelle von Systemfehlern, die zu Datenverlust führen kann. Es wird jedoch vorausgesagt, dass Festplatten das Standard-Speichermedium in grossen Rechenzentren bleiben würden. Cloud-Storage benötigt effiziente Codes, um trotz seiner kostengünstigen Komponenten zuverlässig zu werden. Da Systeme in Grösse und Komplexität wachsen, können sich ihre Eigenschaften und Anforderungen ändern. Veraltete Daten werden normalerweise auf dedizierte Archive transferiert. Die Grenzen zwischen Speichersystemen und Archiven werden immer diffuser mit zunehmendem Einsatz von Anwendungen, die eine kurze Latenzzeit erfordern, wie z.B. Data-Mining aus grossen wissenschaftlichen Archiven. Darüber hinaus bringt der zentralisierte Ansatz von Cloud-Backup-Diensten Bedenken hinsichtlich der Privatsphäre und Wartungskosten auf lange Sicht. Einige Studien deuten darauf hin, dass kooperative Peer-to-Peer-Netzwerke langfristig nachhaltiger sind. Aber Peer-to-Peer-Knoten und Festplatten haben eine unerwünschte Eigenschaft: beide sind unzuverlässig. Die Motivation für diese Studie ist es, flexible und praktische Codes zu entwickeln, die eine hohe Fehlertoleranz zur Verfügung stellen können um die Datenlanglebigkeit und -verfügbarkeit auch in Katastrophenszenarien zu verbessern.

Überlebensfähigkeit eines Speichersystems kommt durch die Stärke, die mit Redundanz gebaut wird. Es ist schwierig, eine Lösung zu entwickeln, die auf klassischen Codes basiert und die die folgenden fünf Aspekte berücksichtigt: Verfügbarkeit, Zuverlässigkeit, Sicherheit, Integrität und Wartbarkeit. Kompromisse werden in der Regel durch die komplexe Kombination von vielen Techniken gefunden. Diese These argumentiert, dass Codierungstechniken, die ausschliesslich auf der Verwendung von parallelen Netzwerken (wie z. B. Replikation) oder hauptsächlich auf der Verwendung von seriellen Netzwerken basieren (wie z. B. Split- und Erweiterungsoperationen in klassischen Löschcodes) nicht alle verfügbaren Ressourcen in einem System nutzen. Verschränkung-Codes (Entanglement-Codes) schaffen Redundanzen durch die Verknüpfung neuer Datenblöcke mit alten, wodurch eine verschränkte Datenkette in ein wachsendes Netz von voneinander abhängigen Inhalten gewebt wird. Wir schlagen vor: 1) offen und geschlossene Verschränkungsketten sind zuverlässigere Alternativen als Spiegelung, 2) Alpha-Verschränkung-Codes, um eine extrem hohe Fehlertoleranz mit niedrigem Speicheraufwand und niedrigen Reparaturkosten zu erzielen, und 3) Zapfencodes (Spigot-Codes), um den Platzbedarf von verschränkten Daten zu reduzieren ohne signifikanten Verlust der Eigenschaften der Verschränkung. Diese Codes können Speicher- und Bandbreitenressourcen effizient nutzen, indem sie die kombinatorische Stärke der Netzwerkzuverlässigkeit nutzen. Darüber hinaus liefert ihr flexibles Design basierend auf virtuellen Ketten von verschränkten Daten eine skalierbare und geeignete Lösung für zukünftige Anforderungen. Schliesslich wird aufgrund der kombinatorische Stärke der verschränkten Daten, alles in allem, die Vertrauenswürdigkeit gesteigert.

Stichwörter: Löschcodes, Datenverschränkung, Redundanz, Replikation, Kombinatorik, Netzwerkzuverlässigkeit, Fehlertoleranz, Überlebensfähigkeit, Verfügbarkeit, skalierbare Systeme, verteilte Systeme, Speichersysteme, Langzeitarchivierung, Archivspeicher

Resumen

Esta tesis trata del diseño de códigos de borrado que sean prácticos para sistemas de almacenamiento de datos. Los errores de disco, sean estos ocasionados por una falla de hardware o una falla lógica, son una fuente común de errores de sistema que pueden ocasionar pérdida de datos. Sin embargo, se predice que los discos duros giratorios seguirían siendo el medio de almacenamiento estándar en grandes centros de datos. El almacenamiento en la nube necesita códigos eficientes para que el sistema sea confiable a pesar de sus componentes de bajo costo. A medida que los sistemas varían en tamaño y complejidad, sus propiedades y requerimientos pueden cambiar. Cuando los datos envejecen, se los suele mover a archivos dedicados. Sin embargo, las fronteras entre los sistemas de almacenamiento y los archivos se vuelven difusas a medida que avanzamos hacia aplicaciones que requieren un acceso de baja latencia, como la minería de datos en grandes archivos científicos. Además, el enfoque centralizado de los servicios de respaldo en la nube trae consigo preocupaciones relacionadas con la privacidad y los costos de mantenimiento a largo plazo. Algunos estudios sugieren que las redes cooperativas peer-to-peer son más sostenibles a largo plazo. Pero los nodos peer-to-peer y los discos giratorios comparten una propiedad indeseable: ambos son poco confiables. La motivación para este estudio es diseñar códigos flexibles y prácticos que pueden proporcionar una alta tolerancia a fallos para mejorar la durabilidad y disponibilidad de datos incluso en escenarios catastróficos.

La supervivencia de los datos se logra con la robustez que ofrece un almacenamiento de datos redundante. Es difícil concebir una solución basada en códigos clásicos que considere estos cinco aspectos: disponibilidad, fiabilidad, seguridad, integridad y mantenibilidad. Los compromisos se encuentran generalmente a través de la compleja combinación de varias técnicas. Esta tesis argumenta que las técnicas de codificación que se basan exclusivamente en el uso de redes paralelas (como la replicación) o principalmente en el uso de redes seriales (como se observa en las operaciones de división y expansión usadas en los códigos de borrado clásicos) no aprovechan todos los recursos disponibles en un sistema. Los códigos de entrelazado de datos (entanglement codes) crean redundancia al mezclar nuevos bloques de datos con los antiguos, construyendo cadenas de datos entrelazados que se tejen en una creciente malla de contenido interdependiente. Proponemos: 1) cadenas de datos entrelazados abiertas y cerradas como alternativas más confiables que el uso de discos espejados, 2) códigos de entrelazados alfa (alpha entanglement codes) para lograr una tolerancia a fallos extremadamente alta con bajos costos de almacenamiento y reparación, y 3) códigos de grifo (spigot codes) para reducir aún más los gastos de almacenamiento sin pérdida significativa de las propiedades de los códigos de entrelazado. Estos códigos pueden aprovechar los recursos de almacenamiento y ancho de banda eficientemente aprovechando la potencia combinatoria de la confiabilidad de los elementos que conforman la red. Además, su diseño flexible basado en cadenas virtuales de datos entrelazados proporciona una solución escalable y adecuada para adaptarse a los requisitos futuros. Por último, debido a la potencia combinatoria de los datos entrelazados se refuerzan los cinco aspectos arriba mencionados.

Palabras clave: códigos de borrado, entrelazado de datos, datos enredados, redundancia, replicación, combinatoria, confiabilidad de red, tolerancia a fallos, supervivencia, disponibilidad, sistemas escalables, sistemas distribuidos, sistemas de almacenamiento, almacenamiento a largo plazo, archivos

Contents

Acknowledgements

Preface

Abstract (English)

Abstract (French)

Abstract (German)

Abstract (Spanish)

List of figures

List of tables

1 Introduction
   1.1 Contributions
   1.2 Thesis Organization

2 Background and Related Work
   2.1 Introduction
   2.2 Basic Concepts of Information Theory
      2.2.1 Trellis Diagram vs. Helical Lattice
      2.2.2 Error Correcting Codes vs. Erasure Codes
      2.2.3 General Notions of Fault-Tolerance
      2.2.4 Channel Capacity
      2.2.5 MDS vs. non-MDS Codes
      2.2.6 Practical Erasure Codes
      2.2.7 Capacity-approaching Codes
   2.3 Codes for Storage Applications
      2.3.1 The “Ideal Code”
      2.3.2 RAID
      2.3.3 Reed-Solomon Codes
      2.3.4 LRC
      2.3.5 Weaver Codes
      2.3.6 Flat XOR-based Codes
      2.3.7 Regeneration Codes
      2.3.8 Pyramid Codes
      2.3.9 Additional Related Work
   2.4 A Tale of the Storage Sector
      2.4.1 Reliable Cloud Storage
      2.4.2 Approaching Eternal Storage
      2.4.3 More Challenges and Open Problems
   2.5 Data Entanglement
      2.5.1 Tangler
      2.5.2 Dagster
      2.5.3 Theoretical Framework
      2.5.4 Network Coding
      2.5.5 Entangled Cloud Storage
      2.5.6 Friendstore
   2.6 Summary

3 Simple Entanglements
   3.1 Introduction
   3.2 Simple Entanglement Layouts
      3.2.1 Open Entanglements
      3.2.2 Closed Entanglements
   3.3 Array Organizations
      3.3.1 Full Partition Write
      3.3.2 Block-level Striping
   3.4 Vulnerability Analysis
      3.4.1 Open Entanglements
      3.4.2 Closed Entanglements
      3.4.3 Mirrored Organizations
   3.5 Reliability Analysis
   3.6 Possible Extensions
      3.6.1 Open Entanglements
      3.6.2 Closed Entanglements
   3.7 Summary

4 Complex Entanglements
   4.1 Introduction
   4.2 Helical Entanglement Codes
      4.2.1 DNA-like codes
      4.2.2 Redundancy Propagation
      4.2.3 Expanding Horizons
   4.3 Alpha Entanglement Codes
      4.3.1 Strands
      4.3.2 Three Parameters to Propagate Information
      4.3.3 Code Rate
      4.3.4 Storage Overhead
      4.3.5 Helical Lattice
      4.3.6 The Family of Triple Entanglements
      4.3.7 Analogy with Other Codes
   4.4 Encoder
   4.5 Decoder
      4.5.1 Overview of the Repair Algorithms
      4.5.2 Repair Details
      4.5.3 Reads During Failure Mode
   4.6 Evaluation
      4.6.1 Initial Assessment
      4.6.2 Simulations
   4.7 Summary

5 Fault Tolerance and Reliability
   5.1 Introduction
   5.2 Code Settings and Fault Tolerance
      5.2.1 Problem Complexity
      5.2.2 Basic Notation and Preliminaries
      5.2.3 Primitive Forms
      5.2.4 Complex Forms
      5.2.5 Minimal Erasure Patterns
      5.2.6 Lower and Upper Bounds
      5.2.7 Results
   5.3 Reliability Analysis
      5.3.1 Reliability Concepts and Notation
      5.3.2 Model Assumptions
      5.3.3 Reliability Computation
      5.3.4 Brief Discussion
   5.4 Summary

6 Beyond Entanglement Codes
   6.1 Introduction and Motivation
   6.2 Background
   6.3 Entanglement Chains, Benefits and Limitations
   6.4 Spigot Codes
      6.4.1 Spigot(-, d, EC)
      6.4.2 Spigot(-, p, κ, EC)
      6.4.3 Spigot(-, p, λ, EC)
      6.4.4 Spigot(+, pp, σ, EC)
   6.5 Evaluation
   6.6 Summary

7 Towards Entangled Storage Systems
   7.1 Use Case: A Geo-Replicated Backup System
      7.1.1 Redundancy Scheme
      7.1.2 Failure Mode
   7.2 Use Case: Disk Arrays
      7.2.1 Entangled Mirror
      7.2.2 RAID-AE
   7.3 Discussion of Strategies for Realistic Implementations
      7.3.1 Integrating with an Existing System vs Building from Scratch
      7.3.2 What to Entangle
      7.3.3 Where to Entangle
      7.3.4 Where to Store Blocks
      7.3.5 How to Reclaim Storage
      7.3.6 Which Code Parameters
   7.4 Summary

8 Conclusion
   8.1 Research Limitations and Future Directions
      8.1.1 Beyond α = 3
      8.1.2 Security Assessment
      8.1.3 Reliability Assessment
      8.1.4 Block Placements
      8.1.5 Entanglements for Communications
      8.1.6 Performance Measurements
   8.2 Foresight and Hindsight

A Finding Erasure Patterns
   A.1 Example
   A.2 Results

Bibliography

Curriculum Vitae

List of Figures

1.1 Classical redundancy methods
1.2 Redundancy through data entanglement

3.1 Open and closed entanglements
3.2 Array organizations
3.3 Failures of open entanglements
3.4 Failures of closed entanglements
3.5 Failures of mirrored organizations
3.6 State transition diagram
3.7 Five-year reliability
3.8 Extension Open
3.9 Extension Closed

4.1 HEC lattice
4.2 DNA-like codes
4.3 Fine-grain and coarse-grain view of α-entanglements
4.4 Single and double entanglements with different settings
4.5 Generalized version of the DNA-like code
4.6 Helical Lattice
4.7 Examples of pp-tuples and dp-tuples
4.8 AE(3,2,5) encoder
4.9 Failure scenarios
4.10 Global repair method in action
4.11 Recovery tree
4.12 Metric: Data loss
4.13 Metric: Vulnerable data
4.14 Metric: Single failures

5.1 Primitive forms
5.2 Complex forms
5.3 Minimal erasure patterns having the theoretical lower bound for |ME(x)|
5.4 Critical erasure patterns
5.5 Redundancy propagation
5.6 |ME(2)| for different code settings
5.7 |ME(4)| for different code settings
5.8 Serial and parallel configurations in an entangled storage system
5.9 Graph-transformation method
5.10 Cut-sets method

6.1 The implications of code settings in practical storage applications
6.2 Overview of entanglement chains
6.3 Spigot(-, p, κ, EC)
6.4 Spigot(-, p, λ, EC)
6.5 Spigot(+, pp, σ, EC)
6.6 Spigot codes evaluation on data loss
6.7 Spigot codes evaluation on repair rounds

7.1 Repair process in a geo-replicated backup
7.2 Lead windows for AE(3,2,5) and for AE(3,7,7)
7.3 Round robin placements for AE(3,2,5) using a disk array per each distinct strand
7.4 Code parameters and their impact on write performance

A.1 Feasible regions to search for MEs

List of Tables

3.1 Probabilities that an array will not be able to tolerate a failure of certain size

4.1 Entanglement Rules - Input in Triple Entanglements
4.2 Entanglement Rules - Output in Triple Entanglements
4.3 Registers state for AE(3,2,5) encoder
4.4 List of elements used by decoder algorithm
4.5 Data loss and redundancy loss
4.6 Redundancy schemes
4.7 Table representation for code RS(8,2)
4.8 Table representation for code AE(3,2,5)
4.9 Metric: Code performance

5.1 Minimal cut sets of the structure shown in Figure 5.10
5.2 Comparison of reliabilities for various erasure codes

6.1 Selected entanglement and spigot codes

7.1 Operations involved during the repairs

A.1 Input parameters for pattern searching rules
A.2 Fault tolerance for various code settings (s,p)


1 Introduction

The more ambitious plan may have more chances of success [. . . ] provided it is not based on a mere pretension but on some vision of the things beyond those immediately present. — George Pólya (1887-1985), Stanford Professor Emeritus.

The urge to handle big data is boosting all kinds of storage-related research. Storage efficiency is a primary concern for the storage community. However, other major issues emerge in the long term. For instance, a retrospective look at the current disparity between technological advances shows that while disk capacity has been growing steadily, the improvements in input-output operations per second (IOPS) lag behind. Furthermore, recent research efforts towards the design and implementation of dependable systems have increased the awareness of the complex interaction of many factors involved in the survivability of a system. Redundancy methods play a key role in the design of dependable systems. Coding theory is an area of active research with a large variety of open problems to solve.

Current coding techniques are poorly suited to consider all aspects of dependability: availability, reliability, safety, integrity and maintainability. As a result, systems grow in complexity due to the need to combine multiple subsystems or layers to address different problems. There is a considerable amount of literature on codes that optimally satisfy the industry's de facto two-to-three-failure fault tolerance, but very little is known about practical codes that go beyond this standard. A storage system that guarantees high reliability with a low replication factor, i.e., low fault tolerance, must incur significant maintenance costs. Even if that economic equation works for the cloud storage industry, other sectors would benefit from codes that provide higher fault tolerance efficiently. A growing body of literature has examined the bandwidth overhead caused by coding. Much is known about optimal trade-offs between storage and repair bandwidth, yet little is known about codes with high fault tolerance, since research has tended to focus on codes that cause little storage overhead (typically less than two times storage). That restriction leads to only a partial understanding of efficiency across a wide range of code settings. Recent work mostly proposes simple optimizations based on classical codes. One example is the design of codes that improve locality (a property that defines the cost to repair a single failure) based on Reed-Solomon codes. Finally, codes are being designed, implemented, and re-adapted by researchers from various fields. The domain-specific languages used by mathematicians, cryptographers, graph theoreticians, engineers and computer scientists may become a barrier to understanding.

In spite of the challenges mentioned above, humans and machines will keep generating a deluge of raw and processed data. Many storage applications have reached the exabyte scale, for example scientific applications at CERN (Peters and Janyst, 2011), DNA storage (Goldman et al., 2013) and exabyte radio astronomy (Vermij et al., 2014), meaning that it is wise to rethink the technology that supports those systems. In addition, it becomes important to investigate solutions that go beyond short-term goals, to consider data preservation and discuss how to enable data sharing repositories in the long term. Big data has a positive impact on the generation of knowledge. Access to data improves our life by enhancing our interactions in diverse domains such as science, business, and democracy. The potential of sharing data can be seen in multiple tangible cases. For example, on the Franco-Swiss border, the experiments conducted with the Large Hadron Collider at CERN generate tens of PB per year. CERN's data is used by a global community of scientists. The ultimate aim of data preservation is to share data with future generations.

Redundancy schemes are necessary to provide secure and long-term data storage. Most existing storage systems store data redundantly using replication, RAID or erasure coding. Although these solutions are effective, the sole use of redundancy does not guarantee all the aspects of dependability. During the design and implementation of a system, a large number of compromises are made in order to find the right balance between storage cost, rebuild time, and repair bandwidth. Hence, the success of current methods rests on continuous monitoring and repair operations to keep data safe.

My main argument is that, to provide high availability and high fault tolerance with an efficient use of storage and bandwidth resources in a decentralized or centralized storage system, we need to go beyond the current models of redundancy. If we dig deeper into the foundations of information theory, we find that the generation of redundancy in classical coding methods is largely based on a model that considers the statistics of communications that move data from source to destination, but ignores the failure model and complexities of large-scale distributed systems.

The aim of this thesis is to provide a model to store data redundantly based on data entanglements and to examine the benefits and limitations of this new model in the context of storage applications. Entanglement codes create redundancy by tangling new data blocks with old ones, creating chains of entangled data that are combined into a mesh. The elements that form this mesh can be mapped to storage locations in a one-to-one or a many-to-many relationship, in such a way that redundancy is continuously propagated across the system as new information is ingested. The coding mechanism used for data entanglement is lightweight and efficient, essentially based on exclusive-or operations.

Figure 1.1 – Classical redundancy methods for storage systems. Replication exploits the basic notion of parallelism and replicas, while storage-optimal MDS codes such as Reed-Solomon codes expand the information with redundant blocks and resourcefully combine them, creating multiple paths made of “virtual” blocks.

A major innovation is how entanglement codes weave a virtual layer of interdependent content (and storage devices) that scales with the size of the system. Dependable systems will benefit from multiple paths to access and recover data, efficient repair even in the event of catastrophic failures, and data integrity since the growing layer of interconnected storage elements is an inherent tamper deterrent mechanism, which additionally makes it possible to audit the content. Both models are illustrated in Figure 1.1 and Figure 1.2 respectively. In engineering, the redundancy of a system is explained in terms of how the components are arranged in serial and parallel combinations. If elements are arranged in series, it means that the system needs all elements to satisfactorily perform the task. Elements that are arranged in parallel increase the chances that the system will operate successfully. At least one path that connects input to output is necessary to read data in the figures.
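In engineering terms, these two arrangements have standard reliability expressions (textbook formulas, not specific to any code in this thesis), where R_i denotes the reliability of the i-th component:

\[ R_{\text{series}} = \prod_{i=1}^{n} R_i, \qquad R_{\text{parallel}} = 1 - \prod_{i=1}^{n} (1 - R_i). \]

For instance, with three independent components of reliability 0.9, a serial path survives with probability 0.9^3 = 0.729, while a parallel arrangement survives with probability 1 - 0.1^3 = 0.999; codes gain their strength by building many partially overlapping paths that combine both effects.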

In Figure 1.1, replication exploits the basic notion of parallelism and replicas, while MDS codes such as Reed-Solomon codes expand the information with redundant blocks and resourcefully combine them by creating multiple paths made of “virtual” blocks. The replication example has three replicas of the same document, and the system can successfully answer requests if at least one replica is available. The erasure coding example shows a setting RS(6,2) that divides a document into six data blocks and expands them with two more blocks. The underlying mathematics of the coding method make it possible to recover the document with any six blocks. For that reason, the figure shows many possible paths formed with a serial combination of six blocks. The so-called virtual blocks do not need to be physically replicated, although they belong to more than one path. The combinatorial power of classical codes gives the system more chances to satisfactorily respond to a read request. Both options, three-way replication and the erasure coding example, tolerate two failures, but erasure coding requires 1.33x storage while replication requires 3x storage. However, there are two main problems with classical coding: 1) each time a single failure occurs, the decoder needs a prohibitive amount of bandwidth (6x overhead for the example provided), and 2) erasure coding techniques are optimal when the code is long, which is typical in digital transmissions but not in storage applications. The network bandwidth that is consumed during single-failure repairs with classical codes like the one in the example is usually not acceptable for storage applications.

Figure 1.2 – Redundancy through data entanglement. The process creates concentric paths for each data block. The paths that are close to the centre have fewer elements in serial combinations, while the paths that are more distant require more blocks but increase the chances of success in catastrophic failures.

In Figure 1.2 we observe a block-centric view of the virtual layer created by entanglements. The example was generated with a setting α = 2, which means that each data block will generate two additional encoded blocks. The process of entangling data creates concentric paths for each data block. The paths that are close to the centre have fewer elements in serial combinations, while the paths that are more distant require more blocks but increase the chances of success in catastrophic failures. Since the paths are interconnected, there are multiple possible combinations to repair data. More details about the principle of entanglements will be given in the thesis.
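As a back-of-the-envelope illustration of the trade-offs discussed above, the short Python sketch below compares storage overhead and the data that must be read to repair a single lost block for the three example schemes (three-way replication, RS(6,2) and an α = 2 entanglement). The numbers for the entanglement case follow the descriptions given later in the thesis, namely that each data block generates α parities and that a single repair combines only two blocks; the function names are illustrative only, not part of any implementation described here.

```python
# Rough comparison of the redundancy schemes used as examples in this chapter.
# Storage overhead = total stored bytes / user bytes.
# Single-repair reads = blocks read to rebuild one lost block.

def replication(copies=3):
    return {"storage_overhead": copies,        # every block stored 'copies' times
            "single_repair_reads": 1}          # copy any surviving replica

def reed_solomon(k=6, m=2):
    return {"storage_overhead": (k + m) / k,   # RS(6,2): 8/6 ~ 1.33x
            "single_repair_reads": k}          # any k surviving blocks rebuild a lost one

def alpha_entanglement(alpha=2):
    return {"storage_overhead": 1 + alpha,     # each data block generates alpha parities
            "single_repair_reads": 2}          # one XOR of two adjacent chain blocks

for name, scheme in [("3-way replication", replication()),
                     ("RS(6,2)", reed_solomon()),
                     ("AE, alpha=2", alpha_entanglement())]:
    print(f"{name:18s} overhead={scheme['storage_overhead']:.2f}x "
          f"single-repair reads={scheme['single_repair_reads']}")
```

The sketch only restates the figures quoted in the text; it shows why the comparison in this chapter is framed as a trade-off between storage footprint and repair traffic rather than a single metric.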


1.1 Contributions

Untangling code approaches in the context of storage systems. The background and related work for this thesis comprise a body of approaches originating from groups motivated by different aims: some seeking answers to theoretical questions and some trying to address practical concerns. We provide a survey to understand the general panorama of coding and storage systems. It contains the results of studies on a large number of coding approaches, storage systems, and discussions with experts in the field. This review seeks to reconcile theory and practice using approaches from the information theory, cryptography and storage communities. The first section guides the reader into information theory concepts to pave the way for understanding erasure coding approaches. The second section goes into more detail on coding techniques that became mainstream for storage applications and other codes that contributed substantially to our understanding. The third section gives the big picture of the storage sector and provides an account of the challenges and open problems in the area. The final section covers data entanglement.

Simple Entanglement Codes. Despite recent advances in solid state storage, the lower cost of magnetic disk drives causes them to remain today the most widespread storage medium in large data centers. One of the main disadvantages of these drives is their poor reliability. We propose simple entanglements as an alternative to mirroring for log-structured append-only storage systems. We investigate open and closed entanglements, two simple data distribution layouts for disk arrays. Both techniques use equal numbers of data and parity drives and generate their parity data by computing the exclusive-or (XOR) of the most recently appended data with the contents of their last parity drive. They have the same space overhead as mirroring. While open entanglements maintain an open chain of data and parity drives, closed entanglements include the exclusive-or of the contents of their first and last data drives. We evaluate the five-year reliabilities of open and closed entanglements for two different array sizes and drive failure rates. Our results show that open entanglements provide much better five-year reliabilities than mirroring, and reduce the probability of data loss by at least 90 percent over a period of five years. Closed entanglements perform even better and reduce the probability of data loss by at least 98 percent.
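The following minimal Python sketch illustrates the parity rule just described: each parity drive holds the XOR of the most recently appended data with the previous parity, and the closed variant additionally ties the first and last data drives together. It operates on in-memory byte strings rather than real drives; the helper names, the choice of d1 as the very first parity, and the exact way the closed chain is tied are assumptions of this sketch, not the authoritative layouts, which are given in Chapter 3.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def entangle(data_drives, closed=False):
    """Open: p1 = d1, then p_i = d_i XOR p_{i-1}.
    Closed (one reading of the text): the first parity also ties in the last
    data drive, p1 = d1 XOR dn, so the chain forms a ring.
    Returns one parity block per data block, matching the equal data/parity count."""
    parities = []
    prev = data_drives[-1] if closed else None     # assumed seed for the closed chain
    for d in data_drives:
        prev = d if prev is None else xor(d, prev)
        parities.append(prev)
    return parities

drives = [bytes([i]) * 4 for i in range(1, 5)]     # four toy 4-byte "drives"
print(entangle(drives))                            # open entanglement parities
print(entangle(drives, closed=True))               # closed entanglement parities
```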

Alpha Entanglement Codes. We introduce alpha entanglement codes, a mechanism to propagate redundant information across a large-scale distributed system and create a virtual storage layer of highly interconnected elements. Alpha entanglement codes generalize the concepts explored in simple and helical entanglement codes, our prior work on data entanglement. A central idea in the coding process is the creation of entanglement chains that alternate two elements: data blocks and parity blocks. Parity blocks contain redundant information from previous elements in the chain. Thus, the chain reveals the temporal relationship between blocks, since the most recent blocks are always written at the head. The encoder builds chains by computing the XOR of two consecutive blocks at the head of a chain and inserts the output adjacent to the last block. Since unencoded data blocks are stored in the system, there is no need for decoding in the absence of failures. The code has three parameters. The parameter α increases storage overhead linearly (by defining the number of chains in which the data block participates) but increases the possible paths to recover data exponentially. The other two parameters increase fault tolerance even further, without the need for additional storage, by controlling the vicinity of the data blocks.
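A compact sketch of the chain-building step just described: for each of the α chains a data block belongs to, the encoder XORs the block with the parity currently at the head of that chain and appends the result next to it. The sketch deliberately ignores the other two code parameters (which govern how blocks are spread over the different strands), seeds each chain with an all-zero parity, and routes every block into the same α chains; all names are illustrative, so this is only the basic propagation step, not the full encoder presented in Chapter 4.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class Chain:
    """An entanglement chain: alternating parity and data blocks, newest at the head."""
    def __init__(self, block_size: int):
        self.blocks = [bytes(block_size)]            # assumed all-zero seed parity

    def extend(self, data: bytes) -> bytes:
        new_parity = xor(self.blocks[-1], data)      # XOR of the head parity with the new data block
        self.blocks += [data, new_parity]            # output inserted adjacent to the last block
        return new_parity

def entangle(data_blocks, alpha=2, block_size=4):
    """Each data block participates in alpha chains and thus produces alpha parities."""
    chains = [Chain(block_size) for _ in range(alpha)]
    parities = [[c.extend(d) for c in chains] for d in data_blocks]
    return chains, parities

blocks = [bytes([i]) * 4 for i in range(1, 4)]
chains, parities = entangle(blocks, alpha=2)
print(parities)                                      # alpha parities per data block
```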

Entanglement codes have a strong positive impact on data durability due to their ability to efficiently repair single and catastrophic failures. This novel approach can be used to design a robust redundancy scheme for a survivable storage system that would provide high availability, durability and integrity despite external and internal threats, with little maintenance. We present the model, the encoder and decoder algorithms, and analysis and simulations to understand the code behaviour and compare them with other codes in practical settings.

Unlike other codes, entanglements take advantage of the combinatorial effect while being less dependent on the availability of nodes in a distributed system. Classical coding is more efficient than replication due to the combinatorial effect. However, in peer-to-peer systems, the negative effects of poor peer availability hurt classical coding, since the repair process depends on the availability of multiple locations. Alpha entanglements are less affected by peer availability since only two blocks are used for repairs, and they also benefit from the combinatorial effect due to the redundancy propagation.

Reliability and Fault Tolerance. Due to the complex relationships built by entanglement codes, the analysis of fault tolerance and reliability is far from being a trivial task. We study this topic based on the properties of the graph that represents the relationships between the entangled data. The main goal of this study is to understand the impact that code setting variations have on fault tolerance. The code parameters regulate the shape of the entanglement graph; therefore, they have an effect on the way redundant data is propagated. The graph is resilient to a large set of erasure patterns, however some data loss is possible when certain patterns appear. Furthermore, the code is irregular, meaning that it can tolerate some patterns of a certain size but not all of them. We analyze erasure patterns and present upper and lower bounds for the data propagation method. To further complete the assessment, we present a reliability study based on combinatorial methods to provide a comparison with other codes in a theoretical context.

Spigot Codes. Spigot codes are built on top of entanglement codes to tackle a limitation in the way that alpha entanglement codes grow in storage overhead. The storage efficiency decreases by a factor of 1/(α + 1), where α is the number of chains in which a data block participates. We investigate the suitability of different techniques to increase the efficiency of entanglement codes. The outcome of this study is spigot codes, a solution to increase the code rate and reduce the space footprint of entangled data without significant loss of the entanglement's properties. We study two techniques based on code puncturing and XOR coding and present four spigot codes.
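Written out, the code rate and storage requirement implied by this factor are (my reading of the statement above; Chapter 4 defines these quantities precisely):

\[ \text{rate}(\alpha) = \frac{1}{1+\alpha}, \qquad \text{storage overhead} = (1+\alpha)\times, \]

so single, double and triple entanglements (α = 1, 2, 3) store 2x, 3x and 4x the user data, which is the growth that spigot codes aim to mitigate.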


1.2 Thesis Organization

The roadmap for this dissertation is as follows:

Chapter 2 provides the general panorama of the past-to-present highlights of information theory, the storage sector, and the research community, with the aim of clarifying misunderstandings. We discuss challenges and open problems in the storage sector.

Chapter 3 presents simple entanglement layouts for disk arrays. Notwithstanding their simple approach to entanglement, the study of five-year reliability shows that entangled arrays outperform mirrored arrays. This chapter is largely based on our publication in the IPCCC'16 proceedings (Estrada-Galinanes et al., 2016).

Chapter 4 introduces complex entanglements, comprising our initial work on helical entanglement codes and its more advanced form: alpha entanglement codes. This chapter presents the entanglement model, the encoder and decoder algorithms, and evaluations done in a simulated environment. Helical entanglement codes were published in SSS'13 (Estrada-Galinanes and Felber, 2013), our recent work on alpha entanglement codes was submitted for publication (Estrada-Galinanes et al., 2017b), and the research framework used to run the simulations will appear as an extended abstract in SYSTOR'17 (Estrada-Galinanes, 2017).

Chapter 5 details the patterns that define the fault tolerance of entanglement codes and presents upper and lower bounds for the data propagation method. To further complete the assessment, we present a reliability study based on combinatorial methods. The first part of this chapter delves deeper into the study of alpha entanglements and the second part extends the work published in CLUSTER'15 (Estrada-Galinanes and Felber, 2015b).

Chapter 6 presents spigot codes, a solution based on entanglement codes to improve storage efficiency without significant loss of the entanglement's properties. This chapter is largely based on recently submitted work (Estrada-Galinanes et al., 2017a).

Chapter 7 presents a guideline for implementing entanglement codes. Alpha entanglement codes are not restricted to a particular storage architecture. In particular, we present two use cases: the first is a cooperative storage system built on top of a decentralized database; the second showcases a centralized storage architecture based on disk arrays. This chapter discusses topics related to the implementation of entanglement codes based on ideas presented in FAST'15 and '16 Work-in-Progress Session talks (Estrada-Galinanes and Felber, 2015a; Estrada-Galinanes and Felber, 2016), the Eurosys Doctoral Workshop (Estrada-Galinanes, 2016) and technical reports for my doctoral program (Estrada Galinanes, 2013; Estrada Galinanes, 2015).

Chapter 8 concludes the study by providing a summary of the material presented in this dissertation, discusses its significance and prospects for different communities, and elaborates on future research directions.


2 Background and Related Work

This chapter gives the general panorama of the past-to-present highlights of information theory, the storage sector, and the research community. The main goal is to fill holes in the literature and clarify commonalities and differences between research conducted by different communities and our work. We strive to provide a historical and theoretical perspective to frame the evolution of redundancy methods while establishing connections with our work. We do not pretend to cover all topics in detail to avoid making the chapter too dense. Instead, we focus on the most relevant work and outline significant contributions and limitations of the approaches taken by other authors. Additionally, we discuss data entanglement and the state of the art at the time we took up this line of research in 2012. Other authors have actively studied the effects of tangling (mixing) data for network coding, as well as for storage systems. Later chapters expand on many of the concepts presented here and introduce additional work.

2.1 Introduction

Last year would have been Claude Shannon's 100th birthday. Before Shannon, communication was strictly an engineering discipline. Some of the most well-known codes of that time were Morse code and Pulse Code Modulation. In general, standard techniques were the result of heuristic methods and “black art” without a unifying theory.

Shannon’s landmark paper (Shannon, 1948) describes the main components of a communication system and provides a method to quantify information. Shannon developed the foundations for compression (source coding) and digital communications (channel coding). He established a framework for optimal transmission in the presence of noise. Due to his contributions, he is considered the father of information theory. His legacy has a continuous impact on modern communications. Information theory provides the elementary notions for data encoding methods used in current systems.

In the 1970s, part of the community believed that “coding is dead”. In an infamous workshop talk, coding theorists were compared with rats abandoned in a confined space, deprived of reality for a long time (Lucky, 1993). The forecast was wrong: coding theory was not dead. Since then, however, there has been a significant change in the techniques used to design new codes.

Algebraic coding was found inadequate for designing codes that approach the Shannon limit. Instead, random coding or “probabilistic coding” became popular, since almost every code with potentially infinite blocks can approach the Shannon limit. Random codes are good, but their decoding complexity requires exponential resources (MacKay, 2003). Consequently, research focus was redirected towards codes that have a structure that facilitates implementation, particularly to reduce encoding and decoding complexity. Further developments used graph theory to construct codes. Acyclic Tanner graphs (Tanner, 1981) are extremely efficient for decoding, albeit with suboptimal performance.

In its beginnings, the advances of information theory were largely focused on the fundamental problem of reliable communications. The invention of the IBM hard disk and later storage developments (Goddard and Lynott, 1970) expanded the use of coding theory. As David MacKay explained, the communication channel is an abstraction that does not necessarily mean moving bits from one location to another; for example, it could refer to a hard disk (MacKay, 2003). Coding for storage systems became an active research area.

2.2 Basic Concepts of Information Theory

There are different approaches to design codes. This section details some notions and definitions of information theory and, where appropriate, their connection with the concepts discussed in this thesis.

Algebraic coding is an approach to design codes based on linear algebraic equations. The use of algebraic coding is in decline. Other common approaches are:

• the generator matrix that describes how to convert the source bits into the transmitted bits;

• the parity check matrix that states the constraints associated with a valid codeword; and

• the trellis diagram that specifies the valid codewords with paths in a graph.

These approaches are used selectively; for example, the trellis diagram is useful for linear block codes and convolutional codes. In a few words, the main difference between linear block codes and convolutional codes is that the former encode data blocks without memory, while the latter encode bits and the encoding depends on previous states.

How are they related to the entanglement codes introduced in this thesis? The output of entanglement codes is obtained by encoding the input with previously encoded data that could be retrieved from different and remote storage locations or obtained from the state of the encoder register. Entanglement codes can encode a stream of bits as well as blocks, but we only know a practical implementation for block encoding/decoding. Besides, large blocks make more sense for storage systems. Therefore, the encoding and decoding methods explained in this thesis use data blocks, except for the example of the practical hardware encoder presented in Section 4.4, which explicitly shows how bit encoding works.

2.2.1 Trellis Diagram vs. Helical Lattice

A trellis diagram is a well-known graph used for decoding convolutional codes. In Chapter 4, we define a helical lattice to model the redundancy propagation and the possible paths to recover a data chunk from the storage system. Information theoreticians may find superficial similarities with the trellis diagram. In spite of the apparent similarities, a helical lattice should not be confused with a trellis diagram. The lattice diagram describes a storage overlay that shows the dependencies between storage devices/nodes. In terms of information theory, a storage system is an unconstrained channel in which new devices/nodes can appear at any time. The lattice length is variable and the elements of the lattice (nodes and edges) can be mapped to storage locations in a one-to-one or a many-to-many relationship. Most importantly, a trellis is used for efficient maximum-likelihood decoding, while decoding for entanglement codes works in a very different way.

2.2.2 Error Correcting Codes vs. Erasure Codes

A characteristic of storage systems is that the failure location is generally known.¹ Hence, the decoder knows exactly which information is missing. In entanglement codes, the faulty storage location can be mapped to the lattice and repair algorithms can make use of this information. Codes that make use of this information are called erasure codes. Conversely, the location of errors is not known in typical digital communications, when bits are transmitted from source to destination through a noisy channel. Codes that decode data without knowing the position of the error are called error correcting codes. Typically, error correcting codes need twice as many redundant symbols as erasure codes to correct a single error: intuitively, an erasure reveals its own position, whereas an error must first be located and then corrected.

¹Storage systems usually monitor the status of nodes/storage devices.

2.2.3 General Notions of Fault-Tolerance

We give some notions to determine the fault tolerance of a linear code with parameters (n,k), where k is the number of information symbols and n − k the number of redundant symbols obtained using linear algebraic equations. Perhaps the first step in designing an error-correcting code is to verify the number of error patterns and the number of possible syndromes enabled by the code equations. The syndrome, the minimum distance and the weight enumerator are fundamental concepts to characterize the fault tolerance of a code.

A syndrome is the pattern of violations of the parity-check rules (MacKay, 2003), i.e., the read syndrome explains a certain flipped-bit pattern that does not correspond to a codeword. A syndrome of zero means that all parity-check rules are satisfied. A t-error-correcting code must provide a distinct syndrome for all possible error patterns of weight up to t in the N = n transmitted bits. Borrowing the explanation and example provided in MacKay’s book, the syndrome for an (n,k) code is a list of M bits, hence the maximum possible number of syndromes is 2^M with M = n − k, and the number of possible patterns of weight up to two is the sum of all combinations of 2-size patterns, 1-size patterns and 0-size patterns, as in

$\binom{N}{2} + \binom{N}{1} + \binom{N}{0}$

To illustrate with numbers, a (14,8) code has 2^6 = 64 distinct syndromes, but the number of possible patterns of weight up to t = 2 is 106. Therefore, it is not a 2-error-correcting code.

The minimum distance of a code, also denoted as the Hamming distance d, is an important property related to the previous concept. It is the smallest separation between two of its codewords u and v. Let us say that u = 1010010 is the transmitted word for the source bits 1010 and that v = 1011001 is the transmitted word for the source bits 1011; then the distance d = 3 says that u − v = 0001011 has three non-zero elements. The distance also tells us the number of substitutions required to convert the codeword u into v. As a rule, the number of errors that the code guarantees to correct is t = ⌊(d − 1)/2⌋. Finding the minimum distance of a linear code is NP-hard (Vardy, 1997).
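As a quick check of the counting argument above, the following short Python sketch (my own illustration, not code from the thesis) compares the number of available syndromes with the number of error patterns of weight up to t, and computes the Hamming distance of the example codewords:

    from math import comb

    # A t-error-correcting (n, k) code needs a distinct syndrome for every error
    # pattern of weight <= t, but only 2^(n-k) syndromes are available.
    def can_be_t_error_correcting(n: int, k: int, t: int) -> bool:
        syndromes = 2 ** (n - k)                            # M = n - k parity bits
        patterns = sum(comb(n, w) for w in range(t + 1))    # patterns of weight <= t
        print(f"(n={n}, k={k}): {syndromes} syndromes vs {patterns} patterns of weight <= {t}")
        return syndromes >= patterns

    def hamming_distance(u: str, v: str) -> int:
        return sum(a != b for a, b in zip(u, v))

    can_be_t_error_correcting(14, 8, 2)    # 64 syndromes vs 106 patterns -> not 2-error-correcting
    d = hamming_distance("1010010", "1011001")
    print("d =", d, "-> guaranteed correctable errors t =", (d - 1) // 2)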

In coding theory, the weight enumerator of a code, A(w), also known as the weight or distance distribution, is used as a reference to express the probability of correct decoding, decoding error and decoding failure. It is the number of codewords that have weight w, where the weight is the number of “1”s in a codeword.

2.2.4 Channel Capacity

Shannon was interested in measuring a code’s performance. He solved a fundamental problem for achieving efficient communications. The channel capacity, also known as Shannon’s limit, is the maximum (non-zero) rate R = C at which information can be transmitted through a given channel with an arbitrarily small probability of error p_b. Shannon’s limit tells us how much redundant information is needed to transmit data through a noisy channel. When Shannon’s noisy-channel coding theorem is applied to storage systems, it can determine how many disk drives are needed to store data reliably, e.g., with p_b = 10^-15.
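As a concrete instance that the text leaves implicit (the standard binary symmetric channel with crossover probability p, taken from textbook information theory rather than from the thesis), the capacity is

$C = 1 - H_2(p), \qquad H_2(p) = p \log_2 \tfrac{1}{p} + (1 - p) \log_2 \tfrac{1}{1 - p},$

and the noisy-channel coding theorem states that any rate R < C is achievable with arbitrarily small error probability, while no rate above C is.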

2.2.5 MDS vs. non-MDS Codes

A linear (n,k,d) code that takes as input words of length k and generates codewords of length n is said to be a maximum distance separable (MDS) code if d = n − k + 1, i.e., every k columns of the generator matrix are linearly independent. MDS codes can tolerate all erasures that are smaller than the Hamming distance. In the context of storage systems, the codeword length n is usually dropped. Codes are denoted by (k,m)-MDS, where m = n − k is the number of arbitrary failures that the code can tolerate. When a file or any data object is encoded, the parameter k refers to the number of blocks that are created. In other words, encoding a file requires splitting it into k blocks and expanding it with m extra blocks. The file is decoded with any k blocks out of the total k + m blocks. It is worth noticing that data replication can be expressed using the same notation, with k equal to one and m equal to the number of replicas.
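As a minimal illustration of the (k,m) notation (my own toy example, not a construction from the thesis), a single XOR parity over k data blocks forms a (k,1) MDS code: any k of the k + 1 stored blocks suffice to rebuild the object, and replication is simply the k = 1 case.

    from typing import List, Optional

    def encode_xor(data_blocks: List[bytes]) -> List[bytes]:
        """Return the k data blocks plus one XOR parity block: a (k, m=1) code."""
        parity = bytearray(len(data_blocks[0]))
        for block in data_blocks:
            parity = bytearray(a ^ b for a, b in zip(parity, block))
        return list(data_blocks) + [bytes(parity)]

    def decode_xor(stored: List[Optional[bytes]]) -> List[bytes]:
        """Recover the k data blocks when at most one of the k+1 blocks is missing (None)."""
        missing = [i for i, b in enumerate(stored) if b is None]
        assert len(missing) <= 1, "a (k, 1) code tolerates a single erasure"
        if missing:
            size = len(next(b for b in stored if b is not None))
            rebuilt = bytearray(size)
            for b in stored:
                if b is not None:
                    rebuilt = bytearray(x ^ y for x, y in zip(rebuilt, b))
            stored[missing[0]] = bytes(rebuilt)
        return [bytes(b) for b in stored[:-1]]

    blocks = [b"AAAA", b"BBBB", b"CCCC"]      # k = 3 data blocks
    stored = encode_xor(blocks)               # k + m = 4 blocks spread over 4 locations
    stored[1] = None                          # lose any single block...
    assert decode_xor(stored) == blocks       # ...and decode from the remaining k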

Measuring reliability in systems using non-MDS erasure codes is more difficult because the codes do not behave regularly. This irregular fault tolerance means that the decoder can restore data from some erasure patterns but fails with other patterns of the same size. Greenan and co-authors observed that understanding those patterns is of particular relevance to maximize reliability when mapping encoded data to storage devices (Greenan et al., 2008a).

2.2.6 Practical Erasure Codes

A widely accepted definition of practical codes was given by MacKay: Practical codes are code families that can be encoded and decoded in time and space polynomial in the blocklength (MacKay, 2003).

Shannon’s theorem requires the blocklength to be large. A caveat is that the majority of code implementations require computations that are exponential in the blocklength.

The definition is precise enough to facilitate comparisons between codes. Yet, we would like to expand the meaning of practical codes to include our vision for long-term storage applications. In this thesis, the term “practical erasure codes” conveys three major aspects of this work: 1) the efficient use of storage and bandwidth resources to encode and decode data; 2) the flexibility to accommodate future requirements; and 3) an effort to bridge the gap between theory and practice (many codes are good in theory, but in practice only some configurations—sometimes requiring “magic numbers”—work).

2.2.7 Capacity-approaching Codes

Low-density parity-check (LDPC) codes (Gallager, 1962) are linear codes. LDPC codes were invented by Gallager in the early 1960s but neglected for many decades because they were judged impractical to implement; they are now widely used (Nagaraj, 2017). Because they are described by a low-density graph, their parity-check matrix is sparse, but the codes are not systematic (a term defined later in this chapter). Sparse matrix computations are faster than dense matrix computations, e.g., those of Reed-Solomon codes (presented in the next section); however, deriving an efficient encoder is expensive. Techniques for efficient encoding appeared later (MacKay et al., 1999; Richardson and Urbanke, 2001). A recent study on their applications in magnetic drives and memory storage systems was done by Jeon (Jeon, 2010). Richardson et al. presented irregular LDPC codes that perform very close to Shannon’s limit (Richardson et al., 2001).

A problem indicated by Plank and Thomason is that the LDPC literature is “heavy in theory but light in practice” (Plank and Thomason, 2003). Their study addressed five interesting questions, all related to the performance of LDPC codes in practical implementations considering finite systems with smaller n. They verified that the good behavior of LDPC codes is asymptotic and stated that finding good LDPC codes with smaller n is a “black art”. They tried to elucidate trade-offs between Reed-Solomon codes and LDPC codes and concluded that there are practical situations in which the former outperforms the latter and vice versa. Finally, they raised the issue of patent infringement for experimental use of patented codes, since an exception that protects researchers had recently been questioned in court.

Parallel concatenated convolutional codes (Benedetto and Montorsi, 1996) have some similarities with our work. Convolutional codes depart from the linear block codes discussed previously. The input sequence is combined using simple additions in GF(2), i.e., exclusive-or (XOR) operations. The codewords have no predefined length, but the codes have a constraint length. The encoder processes the source input in a series of k memory registers and shifts the state of one register to the next at each clock cycle. The n outputs are XOR combinations of different memory registers. Multiple linear-feedback shift registers can be used to generate convolutional codes. The constraint length K indicates the code’s depth, which corresponds to the maximum number of states of the encoder (memory registers). Convolutional codes are described with a trellis diagram.
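To make the shift-register mechanism concrete, here is a toy rate-1/2 convolutional encoder in Python (my own sketch using the textbook generator taps 111 and 101; the thesis does not prescribe any particular generators):

    def conv_encode(bits, generators=(0b111, 0b101), K=3):
        """Toy rate-1/2 convolutional encoder: constraint length K, two XOR outputs per input bit."""
        state = 0                                          # shift register contents
        out = []
        for b in bits:
            state = ((state << 1) | b) & ((1 << K) - 1)    # shift the new input bit in
            for g in generators:
                # each output bit is the parity (XOR) of the register taps selected by g
                out.append(bin(state & g).count("1") % 2)
        return out

    print(conv_encode([1, 0, 1, 1]))   # two output bits per input bit -> rate 1/2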

Turbo codes (Berrou et al., 1993) are high-performance capacity-approaching codes. They are composed of various elements: convolutional codes, commonly two, and interleavers. The interleaver applies a permutation to the input before it is fed into a convolutional code. A common example consists of a systematic encoder and two convolutional codes connected in parallel using interleavers, and has rate 1/3. Puncturing is a technique to achieve higher code rates by periodically eliminating some output parity bits. It has been reported that puncturing reduces the code performance (Chatzigeorgiou et al., 2009).

The Viterbi algorithm (Viterbi, 1967) is one of the most well-known decoders for convolutional and turbo codes. It seeks the path in the trellis with minimal cost, i.e., the path formed by the most probable state sequence. As indicated by other authors, the running time of Viterbi grows exponentially with the constraint length of the code. To solve this problem, they proposed the lazy Viterbi algorithm, which under reasonable noise conditions has a run time independent of the constraint length (Feldman et al., 2002). None of these decoding algorithms is applicable to entanglement codes, which are designed to locally decode individual blocks even if the system distributes encoded blocks across geographically dispersed storage nodes.


2.3 Codes for Storage Applications

In this section, we focus on coding techniques that became mainstream for storage applications and on other codes that contributed substantially to our understanding. Many distributed storage systems split files into fixed-size blocks, encode them with erasure codes, and store all blocks on potentially geographically distributed servers (Kubiatowicz et al., 2000; Wilcox-O’Hearn and Warner, 2008; Sathiamoorthy et al., 2013). In centralized storage solutions, as in RAID, the system acts in more or less the same way. The core of the RAID technology is to split (stripe) files into blocks, compute parities, and distribute the resulting blocks across the disks in an array to increase reliability and performance. The term stripe describes a complete (connected) set of data and parity elements that are dependently related by parity computation relations. In the common parlance of the coding community, such a collection corresponds to a codeword of length n. The size of the stripe n may then correspond directly to the number of disks in an array.
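The following Python sketch (my own simplified illustration, assuming one XOR parity per stripe and a RAID-5-like rotation; the block size and disk count are arbitrary) shows how a file is striped into codewords of length n = k + 1 and spread across the disks:

    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

    def stripe_file(data: bytes, k: int, block_size: int, n_disks: int):
        """Split data into stripes of k blocks + 1 XOR parity, mapped round-robin to n_disks."""
        data += b"\x00" * ((-len(data)) % (k * block_size))     # pad to a whole number of stripes
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        disks = [[] for _ in range(n_disks)]
        for s in range(0, len(blocks), k):
            stripe = blocks[s:s + k] + [xor_blocks(blocks[s:s + k])]   # codeword of length k + 1
            for j, blk in enumerate(stripe):
                disks[(s // k + j) % n_disks].append(blk)              # rotate parity across disks
        return disks

    layout = stripe_file(b"some file contents" * 10, k=3, block_size=8, n_disks=4)
    print([len(d) for d in layout])    # blocks stored on each disk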

2.3.1 The “Ideal Code”

There is a certain agreement in the research community about the list of desirable properties for storage codes (Oggier, 2013). We briefly summarise here the attributes that are most mentioned in the articles surveyed for this chapter. However, it is rare (if not impossible) to find previous work that discusses the complete list of features.

High fault-tolerance. Since the main purpose of encoding data is to increase reliability, we give this feature more consideration. Erasure codes that provide regular fault tolerance tolerate all erasure patterns that are smaller than the Hamming distance d. In his book, MacKay warned about the obsession with the Hamming distance. The problem is that many code purists focus on codes that can guarantee to correct a well-defined number of failures t, a value that, as we have discussed before, is derived from d. This vision has led to defining codes with bounded-distance decoders. MacKay declared that such a vision is a defeatist attitude and observed that such codes cannot reach Shannon’s limit; only codes that tolerate failures beyond t can achieve it.

Codes that provide irregular fault tolerance can tolerate more failures, but those failures are not arbitrary. The Hamming distance becomes an indicator of the minimum size of the erasures that are not tolerated. Since irregular codes are not MDS codes, i.e., they are not storage optimal, they are not very attractive in theory. However, storage researchers actively investigate them as they can have interesting value in practice. In storage applications, for a given storage overhead, we look for codes that offer the highest probability that an encoded object can be recovered after a certain time.

Systematic. This subclass of encoders generates at its output exact replicas of the input. In other words, the generator matrix contains the identity matrix. This feature is certainly desirable, since it means that in the absence of failures, data can be retrieved directly. Latency and bandwidth consumption worsen if decoding is needed.

Low coding complexity. In a storage application, coding is performed on write requests. An efficient design has to moderate CPU overheads, disk I/O, and other resource requirements that may increase latency and/or reduce throughput.

High code rate. The code rate is the proportion of actual data (excluding the encoded redundancy) transmitted by a certain code. This criterion is useful to compare code efficiency. Yet, rateless codes (discussed in Chapter 6) do not have a fixed code rate.

As indicated by Hafner (Hafner, 2005), the term “code rate” carries connotations of digital transmissions, hence, he preferred the term “efficiency”. Although we agree with the first clause, we tend to disagree about the use of the term efficiency since the meaning is too broad. In the presence of orthogonal code techniques, efficiency is difficult to measure.

Low reconstruction cost. Read requests in a faulty environment and maintenance tasks trigger repair algorithms. It is indispensable to moderate the use of resources. The repair cost is normally computed by evaluating the network bandwidth consumed, disk I/O, the number of contacted nodes during repair, and the time to re-code a block.

Efficient degraded read. Degraded reads occur when the system is affected by failures but can still give some service. Reads in degraded mode may happen when some storage locations are not available due to scheduled maintenance. Efficient decoding for degraded read is desirable to delay recovery operations during transient failures.

Exact repair. An exact repair algorithm reconstructs exactly the missing blocks. It follows that, if the code is systematic, the property is maintained after data reconstruction.

Good locality. A (k, n−k) code has block locality r when each code block is a function of at most r other code blocks of the code. The locality of a code indicates the number of blocks needed to repair single block failures. It is therefore desirable to have small locality to reduce the repair cost.

2.3.2 RAID

RAID arrays were the first disk array organizations to use erasure coding in order to protect data against disk failures (Patterson et al., 1988; Chen et al., 1994; Schwarz and Burkhard, 1992). The acronym RAID stands for “Redundant Arrays of Inexpensive Disks” and comes from the title of the groundbreaking 1988 paper “A Case for Redundant Arrays of Inexpensive Disks”, in which Patterson, Gibson and Katz presented five settings for arrangements of independent disks to increase the performance and reliability of a system. The word “inexpensive” implies that a RAID system could be built with commodity elements. Eventually, hardware RAID controllers became highly sophisticated and, accordingly, expensive. Often, the word inexpensive is replaced by independent.

Disk arrays can increase the performance of a system because individual requests are served faster when multiple disks collaborate to increase throughput. In addition, the possibility of answering multiple requests in parallel gives the chance to increase the system’s I/O rate. Some form of redundancy is required to compensate for the increased combined disk failure rate when writing to disk arrays. RAID was proposed to increase the performance and reliability of large arrays of inexpensive disks. While RAID levels 3, 4 and 5 only tolerate single disk failures, RAID level 6 organizations use (n − 2)-out-of-n codes to protect data against double disk failures (Burkhard and Menon, 1993). EvenOdd (Blaum et al., 1995), Row-Diagonal Parity (Corbett et al., 2004; Gang et al., 2008) and the Liberation Codes (Dongarra et al., 2009b) are three implementations of RAID level 6 that use only XOR operations to construct their parity information. Huang and Xu proposed a coding scheme correcting triple failures (Huang and Xu, 2008). Two-dimensional RAID arrays, or 2d-Parity arrays, were investigated by Schwarz (Schwarz, 1994) and Hellerstein et al. (Hellerstein et al., 1994), who noted that these arrays tolerated all double disk failures but did not investigate how they reacted to triple or quadruple disk failures. More recently, Lee patented a two-dimensional disk array organization with prompt parity updates in one dimension and delayed parity updates in the second dimension (Lee, 2004). The codes presented in Chapter 3, simple entanglements, bear a slight resemblance to SSPiRAL (Survivable Storage using Parity in Redundant Array Layouts) (Amer et al., 2008) layouts, which use simple parity computations to provide high reliability and maintainability. Every SSPiRAL arrangement is defined by its number of unique data nodes (its degree), the number of nodes that contribute to constructing a parity node (its x-order), and the total number of nodes. For instance, a SSPiRAL layout of degree 3 and x-order 2 would use two nodes to build a parity node, and consists of three data nodes and two to four parity nodes. Researchers have also explored RAID for storage at a petabyte scale (Fan et al., 2009; Fan et al., 2011).

A caveat with RAID is that rebuilds may take hours or days. Failures during rebuilds can cause permanent damage. An indicator of potential problems is the accumulation of reallocated sectors that progressively deteriorate a disk. This number can be used to build proactive defenses (Ma et al., 2015). A strategy to reduce repair time is workload distribution and parallelism. Holland and Gibson established criteria on parity declustering to maximize parallelism during recovery (Holland and Gibson, 1992), but their solution does not change the fact that all disks are involved in the repair process. For the case of single failures, maximizing repair distribution may not be convenient unless the code has good locality.

Many agree that there is still a future for collections of cheap spinning disks; however, there are practical limitations on the size of the arrays. Large arrays require redundancy and maintenance mechanisms that are expensive. It is not practical to construct large arrays of hundreds or thousands of disks using RAID6 unless the large arrays are further organized in small redundancy groups.

Another concern is that the market has changed substantially in the last decades. The disparity between technological advances introduces more challenges. On one side, the increase in areal density causes significant drops in the price per GB. Disk manufacturers continue to introduce techniques such as shingled writing, heat-assisted and microwave-assisted magnetic recording, and bit-patterned media to push for even greater storage densities (Shiroishi et al., 2009). On the other side, while disk capacity has grown steadily, the improvements in IOPS lag behind (Brewer et al., 2016). As a consequence, there is a bottleneck in read and write operations. A study on I/O-optimal recovery sheds some light on the trade-offs between storage overhead and I/O recovery for XOR-based erasure codes but also leaves many open problems (Khan et al., 2011).

A recent novel approach is RESAR (Schwarz et al., 2016). RESAR is focused on surviving double failures with low storage overhead and efficient updates. It provides high flexibility for repair and its storage overhead is equivalent to a declustered version of RAID6. As stated by the authors, it has superficial similarities to helical entanglement codes or weaver codes. Entanglement codes are designed to cope with large amounts of failures and mostly focus on permanent storage, hence an update requires re-inserting the data at the head of the chains. A point that we have in common is that we use a virtual layer to model the relationships between data and parities, which is then mapped to storage devices. On the other side, the RESAR layout has minimal erasure patterns of small size, like the “barbell” and the “triangle” (both of size three), that do not appear in a helical lattice.

2.3.3 Reed-Solomon Codes

Reed-Solomon codes (Reed and Solomon, 1960) are an important class of MDS codes. They are built with a dense generator matrix and, since they operate over the Galois field GF(2^r), many attempts have been made to optimize the computations (Plank and Xu, 2006; Greenan et al., 2007a; Greenan et al., 2008b; Plank et al., 2013). In the context of storage, a common notation is to indicate the parameters “k” and “m” as in (k, m) RS codes.

Reed-Solomon codes became a sort of de-facto industry standard for erasure coding, and particularly for archiving data. Perhaps the main reason for their broad acceptance is that they were already an established solution for digital communications when the storage market started to take off. Recently, they have become very popular for data archival applications due to their space efficiency. A limitation of RS encoded systems is the cost involved in repairing a single failure. The decoder has to obtain k fragments to recover from one erasure. This cost is reflected in the number of nodes that need to be contacted, the network bandwidth overhead, and the time to repair.

2.3.4 LRC

Local reconstruction codes, e.g., (6,2,2) LRC, are implemented in Windows Azure Storage (WAS) (Huang et al., 2012). Locally repairable codes, e.g., (10,6,5) LRC, are implemented in HDFS-Xorbas, a variation of the Hadoop Distributed File System used at Facebook (Sathiamoorthy et al., 2013). Both are codes built on top of RS codes that support local repair by creating additional local parities. They relax the MDS constraints in order to reduce the network overhead and the time to repair fragments. Besides, they slightly increase the number of erasure patterns that the code can tolerate beyond the Hamming distance. For instance, (6,2,2) LRC is capable of decoding 3 arbitrary failures. But their main focus is on the reduction of reconstruction time when only one fragment is missing.

LRC parameters may create some confusion. In spite of the identical notation, (6,2,2) LRC corresponds to a (k,l,r) LRC, which is built with k = 6 data blocks (x_1, ..., x_6); the data blocks are divided into l = 2 local groups, each with one parity computed from a group of k/l = 3 data blocks, plus r = 2 global parities that are calculated as in a (6,2) RS code. In the case of HDFS-Xorbas, (10,6,5) LRC corresponds to a (k, n−k, r) LRC, which is built with k = 10 data blocks, uses n − k = 6 parities, and has locality r = 5. In the next chapters, we will see that entanglement codes preserve locality while considerably increasing tolerance to non-arbitrary failures.
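To make the locality benefit concrete, here is a small Python sketch (my own illustration under simplifying assumptions: the two global parities of the real (6,2,2) LRC are Reed-Solomon coded and are not computed here; only the XOR local parities are) showing that a single lost data block is repaired from its local group with 3 reads instead of the k = 6 reads a plain (6,2) RS code would need:

    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

    k, l = 6, 2                                        # 6 data blocks, 2 local groups of k/l = 3
    data = [bytes([i]) * 4 for i in range(k)]
    groups = [data[0:3], data[3:6]]
    local_parities = [xor_blocks(g) for g in groups]   # one XOR parity per local group
    # (the r = 2 global RS parities are omitted in this sketch)

    # Repair data block 4 (in the second group) from its local group only:
    surviving = [data[3], data[5], local_parities[1]]
    rebuilt = xor_blocks(surviving)
    assert rebuilt == data[4]
    print("local repair used", len(surviving), "reads instead of", k)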

According to their authors, LRC are optimized for the reconstruction of data blocks in WAS, but Weaver codes (Hafner, 2005), HoVer codes and Stepped combination codes (Greenan et al., 2010) can reconstruct parities more efficiently. Entanglement codes are designed for optimal reconstruction of any kind of block. We find this property more realistic, since failures degrade the redundancy in a storage system without causing any selective damage.

2.3.5 Weaver Codes

Weaver codes are (in general) non-MDS, XOR-based codes designed for storage efficiency and high fault tolerance (Hafner, 2005). The code construction is described by a weave pattern that is repeated using rotational symmetry. The author presented two weaver constructions and one ad-hoc code. A featured characteristic is the constrained parity in-degree, which bounds the complexity of computations and improves the locality of the code, particularly for efficient repairs. The author strongly emphasized the benefits of the symmetrical design. Symmetry is observed at three levels:

• parity in-degree: every parity is computed from the same number of data blocks;

• data out-degree: every data block contributes to a fixed number of parities; and

• some codes have parity in-degree = data out-degree.

Weaver codes take various interesting approaches. Here, we discuss the similarities to our work. Hafner studied combinations of horizontal and vertical codes. Horizontal codes calculate parities either from data blocks or from parity blocks but never from both, while vertical codes compute parities from a group that contains both data and parities. The weaver scheme permits flexible selections for the stripe size, but not all values are valid. On the contrary, in entanglement codes, the stripe can have any length and it can grow by adding more storage capacity. A feature that both codes have in common is the constrained parity in-degree. Additionally, we share Hafner’s approach regarding symmetry in the design. But weaver codes achieve a fault tolerance of at most 12 failures. The scheme to weave data and parities is not very clearly explained. The evaluation showed that only one of the constructions achieves 50% efficiency and tolerates up to 10 failures; other constructions have poor efficiency. We will show that entanglement codes can achieve higher fault tolerance in Chapter 5 and that spigot codes can increase the code rate (hence achieve higher “efficiency”) in Chapter 6.

2.3.6 Flat XOR-based Codes

Stepped combination codes and HD-combination codes are flat XOR-based codes (Greenan et al., 2010). Their authors point out that flat XOR-based codes are small LDPC codes. Flat XOR-based codes are defined as erasure codes in which each element (data or parity) is stored on a separate device and each parity is obtained by XOR-ing some subset of data elements. In a follow-up paper (Wylie, 2011a), Wylie described the techniques used to identify the most fault-tolerant codes in a large corpus. It considered only systematic codes with small parameter values (k + m ≤ 30). As he explained, a brute-force approach that tries all erasure patterns to decide which code is more fault-tolerant is prohibitively expensive, as there are 2^(k·(k+m)) codes and each one has 2^(k+m) possible failure patterns. Wylie devised an algorithm to determine fault tolerance while reducing the corpus to a feasible size by noticing that there are only 2^(k·m) systematic codes and, because many of them are isomorphic to one another, the search space is reduced even further. In fact, his approach reduced an initial search space of 2^81 codes to a corpus of 49,215 erasure codes. The corpus (Wylie, 2011b) includes the only MDS codes that are flat XOR-based codes (replication and RAID4) and all other non-MDS codes that are most fault-tolerant. We observe that the corpus contains only one code that has Hamming distance d > 10: replication with k = 1, m = 10, and d = 11.
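In the same brute-force spirit (though only feasible for tiny parameters), the following Python sketch (my own, using an arbitrary 3+2 example code, not one from the cited corpus) decides whether an erasure pattern is tolerated by checking that the surviving generator-matrix columns still have rank k over GF(2); it also exposes the irregular fault tolerance discussed earlier, since some but not all patterns of a given size are tolerated:

    from itertools import combinations

    def rank_gf2(vectors, nbits):
        """Rank over GF(2) of a list of column vectors given as nbits-bit integers."""
        basis = [0] * nbits
        rank = 0
        for v in vectors:
            cur = v
            for bit in reversed(range(nbits)):
                if not (cur >> bit) & 1:
                    continue
                if basis[bit] == 0:
                    basis[bit] = cur
                    rank += 1
                    break
                cur ^= basis[bit]
        return rank

    def decodable(gen_cols, erased, k):
        """True if the k data symbols are recoverable after losing the columns in `erased`."""
        surviving = [c for i, c in enumerate(gen_cols) if i not in erased]
        return rank_gf2(surviving, k) == k

    # Example flat XOR code: k = 3 data symbols d0, d1, d2 and m = 2 parities
    # p0 = d0 ^ d1, p1 = d1 ^ d2. Bit i of a column means d_i contributes to it.
    gen_cols = [0b001, 0b010, 0b100, 0b011, 0b110]

    for size in range(1, len(gen_cols) + 1):
        patterns = list(combinations(range(len(gen_cols)), size))
        tolerated = sum(decodable(gen_cols, set(p), 3) for p in patterns)
        print(f"erasures of size {size}: {tolerated}/{len(patterns)} tolerated")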

2.3.7 Regeneration Codes

The seminal paper on regeneration codes was the first study on network coding that undertook the repair bandwidth problem (Dimakis et al., 2010). The paper presented evidence of the significant bandwidth savings of regeneration codes and a theoretical characterization of the trade-off between storage and repair bandwidth. The repair bandwidth is reduced by downloading portions of the data stored in surviving nodes. This paper drew the attention of the research community and further studies have been conducted, for example (Fragouli et al., 2008; Oggier and Datta, 2011). In particular, we highlight studies that discuss regeneration codes with the interesting property of exact repair (Suh and Ramchandran, 2009; Rashmi et al., 2011). Finding coding coefficients is an active research problem. Authors from the systems community indicated that regeneration codes are not widely understood by the community; moreover, they require a complex parameterization (Jiekak et al., 2013).


2.3.8 Pyramid Codes

Pyramid codes combine local and global parities to increase read efficiency during failures (Huang et al., 2007; Huang et al., 2013). Huang observed that many codes are available only in limited configurations, whereas pyramid codes can be constructed with arbitrary storage overhead. A problem with pyramid codes is the algebraic equations used to find code constructions. An improvement was seen in the LRC codes for HDFS-Xorbas, a subsequent work by the first author. As indicated in that work, the coding equation coefficients of pyramid codes are discovered through a search algorithm whose complexity grows exponentially with the length of the code. Nevertheless, pyramid codes were implemented for an archival storage system (Wildani et al., 2009). In this implementation, the use of a global parity, denoted as “über-parity”, increased fault tolerance by up to 7 additional disk failures.

2.3.9 Additional Related Work

HoVer codes are XOR-based codes that combine horizontal and vertical codes and tolerate up to four disk failures (Hafner, 2006). The author presented constructions for two-failure fault tolerance, but showed that some parameter combinations cannot tolerate three disk failures, and only superficially studied the case of four disk failures. His evaluations were conducted on disk arrays on the order of 300 disks. Greenan et al. introduced large stripes using parity codes across two-way mirrored groups for non-volatile RAM (NVRAM) applications (Greenan et al., 2007b). Staircase codes combine notions of recursive convolutional coding and linear block codes and are designed for reliable communications of streaming sources (Smith et al., 2012).

2.4 A Tale of the Storage Sector

In 1997, Douglas Engelbart received the A. M. Turing Award; for more details see the ACM webpage (Bardini, 2017). Engelbart was a pioneering engineer and inventor who contributed to many hardware and software innovations. A noteworthy example of his visionary ideas is his scale change principle. As stated on the ACM webpage, Engelbart’s “scale change principle asserts that as a complex system increases in scale, it changes not only in size but also in its qualities”. The scalability problem brings us directly to the challenges faced in the storage market. In fact, the storage scale changed abruptly in the last 20 years, and such a change suggests rethinking the standard solutions. The internet of things (IoT), the cloud storage industry, and the high-performance computing (HPC) sector generate large volumes of data known as “big data”. Large scientific research projects in the areas of genetics (Goldman et al., 2013), physics (Peters and Janyst, 2011) and astronomy (Vermij et al., 2014) need exascale storage. Failures in production parallel systems were not common enough to justify investments in fault tolerance in the mid-2000s; however, in 2009 researchers recognized the urgent need for disruptive fault-tolerance mechanisms for peta/exascale systems (Dongarra et al., 2009a). Five years later, the same authors updated their survey, indicating positive progress.


They identified five active research areas: hardware failure characterization, fault-handling standard model, fault handling improvement at all stages, programming abstractions for resilience, and standardized evaluation (Cappello et al., 2014).

Throughout these years, the research and open-source communities attempted to design scalable reliable storage systems. The following list, which is representative but by no means comprehensive, is intended to illustrate the variety of options that exist in the literature.

Cedar Cedar is a file system developed at Xerox’s lab and reimplemented with a log structure to improve robustness and performance (Hagmann, 1987).

Sprite The Sprite log-structured file system is designed to have disk areas for free writing while using the log structure to reduce the recovery time after system crashes (Rosenblum and Ousterhout, 1992).

Elephant The Elephant file system introduced the idea of protecting users from acciden- tal file deletions by keeping the most important file versions of all users (Santry et al., 2000).

Oceanstore Oceanstore is an architecture for global-scale persistent storage with two repli- cation levels and access control via encryption keys (Kubiatowicz et al., 2000).

PASIS The PASIS architecture is designed for survivable storage systems that provide high availability, durability and integrity (Wylie et al., 2000).

CFS The cooperative file system is a peer-to-peer read-only storage system (Dabek et al., 2001).

Venti Venti is the first system that considered archival storage as a first-class citizen and it introduced the concept of write-once storage (Quinlan and Dorward, 2002).

Lustre The Lustre file system is a parallel file system often used in high-performance computing (Braam and Zahir, 2002).

Farsite Farsite is a federated scalable distributed system that assures reliability using random replication and encryption for privacy (Adya et al., 2002).

GFS The Google file system is designed as a scalable distributed storage system to meet Google application workloads goals (Ghemawat et al., 2003).

Glacier Glacier is a decentralized system designed to endure correlated failures while relying on non-persistent storage (Haeberlen et al., 2005).

Ceph The Ceph distributed file system is designed to facilitate metadata management, allowing object allocation on heterogeneous object storage devices (OSDs), and to adapt to scientific computing workloads (Weil et al., 2006a).


Panasas The Panasas distributed file system is designed to scale in performance, using parallel access to OSDs, and for installations as large as 500 nodes (Welch et al., 2008).

Tahoe Tahoe, the least-authority file system, is designed using cryptography to assure confidentiality and integrity, and erasure coding to tolerate failures (Wilcox-O’Hearn and Warner, 2008).

POTSHARDS Potshards is a long-term archival storage with keyless security (Storer et al., 2009).

HDFS The Hadoop distributed file system is designed for very large datasets with the goal of streaming data to user applications at high bandwidth (Shvachko et al., 2010).

Some systems are designed for mutable data while others assume immutability. Typically, an archival storage system enforces a write-once policy, at least for the subset of data that is considered immutable. Mutable data requires a consistency model to handle updates, as in Oceanstore, CFS and GFS. But mutable data is orthogonal to a system that guarantees long-term durability, hence this topic is outside the scope of this thesis. However, if resources are not unlimited, methods to reclaim storage are necessary even with the assumption of immutable data. Some common practices for space optimisation are coalescing duplicate blocks and the use of leases to bound the lifetime of objects. Leases can be used for space efficiency, as seen in Glacier, and also for reducing management overhead, as in GFS. The Elephant file system proposed the use of file-grain user-specified retention policies. At the core of all strategies to build a reliable system resides the redundancy (encoding) method, which needs to be integrated in the storage architecture.

2.4.1 Reliable Cloud Storage

A general study about cloud environments was done by Armbrust (Armbrust et al., 2010). Cloud environments are prone to failures and attacks (Cachin et al., 2009). BlueSky (Vrable et al., 2012) elaborated on the challenges of bridging the gap between client/server applications and the cloud and provided various metrics.

From the conception to the commissioning and operation of a dependable and secure system, non-trivial decisions that involve trade-offs are taken. Some design criteria and system customizations have a strong impact on the provided service and its performance. However, a survey and benchmarking of cloud storage services (Dropbox, Amazon Cloud Drive, Microsoft SkyDrive, Lacie Wuala) showed that there was no clear winner (Drago et al., 2013). From the user perspective, perhaps the trickiest aspect of cloud storage services is the network latency when accessing remote data. Despite the fact that cloud services are built using concepts of distributed storage systems, data usually resides in a datacenter that may be located on the other side of the globe.


A main question is the future role of cheap spinning disks in very large scale datacenters, which was addressed in a recent Google whitepaper (Brewer et al., 2016). Google’s position became clear: hard drives arranged in large collections will remain due to their lower price. The authors added that the growth rates in capacity/$ of hard drives and SSDs were relatively close. CERN indicated that disks are the main cause of server failures (77%) (Espinal et al., 2014).

Large arrays of disks increase throughput and system’s I/O rate but the combined failure rate increases too. Many cloud-based centralized solutions use replication or RAID-like techniques, the latter often built with erasure coding (Plank et al., 1997). Normally, data is initially replicated, then erasure coding techniques are applied in the background. Low latency requirements are relaxed as data ages; thus, in many applications replicas are deleted. Using erasure codes to archive data brings considerable storage savings. According to the literature, the common settings for erasure codes have less than 2x redundancy.

A problem with erasure coded data is that nobody wants to spread coded data across datacenters due to the inefficient use of bandwidth during repairs. According to L. Barroso (personal communication, March 11 2016), it is preferable to have enough erasure coded blocks in one locale to be able to retrieve data efficiently. But erasure codes cannot cope with network partitions if data is confined to one location. Local-area and wide-area network failures commonly occur, although wide-area network failures are less documented (Bailis and Kingsbury, 2014). To support datacenter failures, Google and Facebook combine hybrid redundancy schemes. For instance, Google uses multi-cell replication to survive datacenter failures, and RS encoding for local failures (Ford et al., 2010). The Facebook f4 system uses geo-replicated XOR-based codes to provide datacenter fault tolerance, and RS encoding within a single datacenter (Muralidhar et al., 2014). RS encoding is storage efficient, although when used in combination with replication the overall storage overhead is significant.

Many storage solutions have bandwidth constraints that put limitations on the code parameter values. A system that uses RS(k,m) codes tolerates m failures, but requires k I/O accesses and k · B bandwidth to repair a single failure of B bytes, where B is the fixed size of the block. Many experts attempted to reduce the repair overhead (Papailiopoulos and Dimakis, 2014; Khan et al., 2012; Huang et al., 2012; Sathiamoorthy et al., 2013). To avoid serious performance issues, real-world systems advocate small values: RS(6,3) in Google’s single-cell scheme (Ford et al., 2010), RS(10,4) in Facebook’s f4 system (Muralidhar et al., 2014) and k + m ≤ 20 in Microsoft Azure (Huang et al., 2012). These practical limitations have consequences: the code characteristics can change radically. For example, theoretically, optimal Reed-Solomon codes can protect data against massive correlated failures, but in practice they can tolerate only a few failures. A concern is that the time between disk replacements shows significant levels of correlation, as shown by experts (Schroeder and Gibson, 2007). Power outages and other sources of temporary failures cause correlated failures in real storage systems (Chansler, 2012; Dean, 2010). Although temporary failures may not cause permanent data loss, high availability applications need redundancy methods that can cope with these failures too. Indeed, systems can keep data safe only with continuous monitoring and repair operations that increase the system cost. Unfortunately, when discussing redundancy methods, most of the arguments focus only on the storage cost.
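A back-of-the-envelope Python sketch (my own arithmetic; the 256 MB block size is an arbitrary assumption, not a figure from the cited systems) makes the single-failure repair cost of k reads and k · B bytes of traffic concrete for the configurations quoted above:

    def rs_repair_cost(k, m, block_mb=256):
        """Cost of repairing ONE lost block of size block_mb under an RS(k, m) layout."""
        return {"tolerated failures": m,
                "reads": k,
                "repair traffic (MB)": k * block_mb,
                "storage overhead": f"{(k + m) / k:.2f}x"}

    for name, (k, m) in {"RS(6,3)": (6, 3),        # Google single-cell scheme
                         "RS(10,4)": (10, 4),      # Facebook f4
                         "3-way replication": (1, 2)}.items():
        print(name, rs_repair_cost(k, m))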

2.4.2 Approaching Eternal Storage

Cold storage was initially associated with tape backups stored in at least two places to avoid hazards. The physical media was a finite resource with limited access, i.e., characterized by sequential access to data. Usually, a small collection of cartridges was used in a customized backup schedule that typically combined full and incremental backups and reused the media according to retention policies. It was an error-prone manual process, and automated solutions turned out to be expensive for small enterprises.

Two paradigm shifts assured the position of archival storage as a first-class citizen. One was the design of worldwide-scale distributed systems based on commodity hardware. The proliferation of cheap storage devices and the bureaucracy associated with time sharing on large computers facilitated the increasing popularity of distributed systems. Plan 9 (Pike et al., 1995) is a remarkable distributed system created in the mid ’80s by Bell Labs. A second paradigm shift was to consider that storing data eternally is possible. Stable storage is associated with write-once read-many (WORM) devices. Plan 9 engineers considered a WORM jukebox an “unlimited resource”; in other words, it was expected that a full jukebox would be replaced by a new jukebox with improved technology, hence more storage capacity. Archival storage gained significant importance due to the vertical and horizontal expansion of storage facilities during that time. Venti, a block-level network storage system intended for archival data, soon became a model for other systems, which offered similar features or slight variations.

The features provided by an archival storage system can vary considerably, and even the boundaries between data archiving and data storage become diffuse. For example, the Venti specifications promise access times comparable to non-archival data. Another example is the PASIS architecture, a study done by Carnegie Mellon researchers on survivable information storage systems that provide durability, integrity and high availability. Durability clearly depends on the redundancy methods used in the system, as was previously discussed. Integrity is a feature that is worth discussing in detail; we do so in the next subsection. We observe that high availability is not typically considered in archival systems, although it becomes increasingly necessary to provide good access to data that ages with time. For example, Facebook created the f4 system (Muralidhar et al., 2014) dedicated to managing warm data. Their work showed the benefits of treating data differently according to its temperature. Data is assigned a temperature level (hot, warm or cold) according to its usage pattern. The considerable amount of data in the warm category (over 65 PB) is a good reason to design a system optimized for this category. Another interesting approach for achieving high availability is the recent paper on the “procode” scheme, a proactive erasure code that combines disk health monitoring and adjusts the replication factor accordingly (Li et al., 2016).


Eternal storage is often discussed by librarians in the context of digital preservation systems. The field of digital preservation sets standards for ingesting and disseminating information (Rosenthal et al., 2005). Preserving content means that the format in which the data was encoded will remain valid and readable in the future; otherwise the format must be updated. The LOCKSS project, “lots of copies keep stuff safe”, was initiated in 1999 at the Stanford University Libraries to safeguard digital assets. They developed a peer-to-peer system that replicates data across peers, has self-healing mechanisms and protects the system against free-loading and theft (Maniatis et al., 2005). The Internet Archive project is a non-profit public digital library that, among other services, crawls the Internet to back up website pages. Researchers documented that its architecture was maintained by only five employees at least until 2009 (Jaffe and Kirkpatrick, 2009). A big concern in the community is how to keep the cost low and how to estimate the cost of long-term storage (Rosenthal et al., 2012). Another issue is the risk of censorship attacks. Many proposals from the academic and open-source communities focus on anti-censorship measures (some of them will be discussed in the next section).

Numerous studies have been conducted on unconventional storage media such as synthesized DNA or crystals. We do not discuss them since none of them seems to be economically feasible in the short term.

2.4.3 More Challenges and Open Problems

Building a reliable storage system is a complex task. As we strive to design a code that is practical and flexible, we must insist that storage savings are only one angle of all the existing challenges and potential issues that may appear in a long-term storage system. These are significant problems that are often overlooked.

Data Scrubbing

Data scrubbing is a process to check for any kind of errors and inconsistencies and to correct them before data loss occurs. Scrubbing has a positive impact on handling failed disk sectors and reducing the probability of silent data corruption. Just reading data helps to verify its existence. In addition, checking data signatures helps to detect corruption. All data must be scrubbed frequently to maintain a consistent, acceptable level of reliability. But the process consumes energy, may reduce performance, and scrubbing one disk may take hours.
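To give a rough sense of why a single scrub pass takes hours, here is a small Python sketch (my own illustrative numbers: a 150 MB/s sustained read rate is assumed, not a figure from the thesis):

    def scrub_hours(capacity_tb, throughput_mbps=150):
        """Hours for one full sequential read of a disk (1 TB taken as 10^6 MB)."""
        return capacity_tb * 1e6 / throughput_mbps / 3600

    for cap in (4, 8, 16):
        print(f"{cap} TB disk at 150 MB/s: ~{scrub_hours(cap):.1f} h per full scrub")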

Researchers have studied how frequent scrubbing should be. An opportunistic policy, which only scrubs powered-on disks three times a year on average, was found to be the most efficient for archival systems (Schwarz et al., 2004). In archival systems, hard disks are powered off most of the time. Scrubbing is activated if the disk is powered on to access some data. The process may degrade read performance, but low-latency expectations in archival systems are relaxed. On the contrary, general purpose storage systems usually have their disks powered on all the time. In that case, scrubbing plans avoid impacting the system’s performance, i.e., the process is activated only during disk idle time. According to T. Schwarz (personal communication, April 10 2017), one of the assumptions of that study was that additional disk utilization would cause more errors, but that is probably not an important factor. However, he adds, disks with constant high utilization tend to have a higher degree of failure. Some researchers suggested that one scrubbing per week was not enough to reduce the errors adequately and presented some solutions to reduce the rate of undetected data corruption by two or three orders of magnitude (Rozier et al., 2009). Scrubbing models depend on statistical data, which is hard to obtain. A study from Google presents failure characteristics in large disk populations and the impact of self-monitoring analysis and reporting technology (SMART) parameters (Pinheiro et al., 2007). The authors conclude that an accurate predictive failure model should use signals that go beyond the SMART parameters.

The amount of data that can be corrupted is becoming more significant with the increase in disk areal capacity. A study conducted by CERN engineers showed that, in normal readings, RAID controllers do not check data integrity by reading the parity (Panzer-Steindel, 2007). The disks became 3x more stressed when the command to verify data was executed.

Why does data scrubbing matter? The storage sector is interested in understanding workloads (Riska and Riedel, 2006; Basak et al., 2016). In particular, disk manufacturers target disks to different segments according to their annualized workload rate; for example, a Seagate enterprise-capacity (nearline) disk is designed for a heavy workload rate with a limit of 550 TB/year, while disks for light workload rates have a 180 TB/year limit. T. Feldman (personal communication, May 4 2016) mentioned that clients with light-workload applications that write cold data sometimes chose heavy workload rate disks only because of their scrubbing policies. It remains an open question whether high fault tolerance codes could effectively reduce the frequency of scrubbing. A positive answer may lead to promising savings in drive acquisition as well as lower bandwidth requirements, and further reduce operating costs by reducing the energy consumption caused by excessive scrubbing work.

Integrity

To avoid misunderstandings, we must emphasize that there are two main types of data corruption in the context of integrity. One type is the silent data corruption that disk scrubbing tries to detect. A second type of data corruption is due to malicious behaviour. WORM devices help to protect data when an attacker has limited physical access. But more powerful attacks might be conducted by a disgruntled employee or a dictatorial government. Generally speaking, an attacker needs to modify the data, the metadata and all the redundant data stored in the system to make data tampering undetectable. We can assert, without any specific attack model, that tampering is relatively easy if data is less redundant and the storage architecture is more centralized. Since the majority of storage systems only keep one or two copies of data, tampering is in many cases straightforward, i.e., it could occur without being noticed. Log analysis creates high overhead and is not always useful.


Tamper-proof storage usually assumes that a fingerprint of the content, i.e., a distinctive identifying characteristic, is known so that integrity can be verified. However, this assumption is not straightforward to guarantee in certain environments. Yet, without fingerprints, how does a third party verify that certain content was not tampered with, either by the data owner or by the data holder?

In the next section, we discuss the original work on entanglement as a method to make tampering with data difficult. One important question to address for storage systems is whether the system allows undetectable modifications. This kind of problem is commonly addressed using cryptographic solutions. Building a tamper-evident mechanism using emergent properties of the coding technique is a novel approach. The concept of entangled data is promising since it makes it more difficult to modify data undetectably. It opens the opportunity to build a tamper-evident mechanism.

Economics

We have already discussed various aspects of storage economics: the role of cheap spinning disks in datacenters, market segmentation according to workload, and the pervasive obsession with storage efficiency in the design of redundancy methods. By pervasive obsession, we mean that other costs remain hidden when we focus only on capacity savings. For instance, maintenance costs may constitute a hindrance to archival projects. A storage system that guarantees high reliability with a low replication factor, i.e., low fault tolerance, must incur significant maintenance costs. It is fair to assume that if the system had more fault tolerance it would delay repairs and reduce maintenance operations. Overall, there is little information on this subject. To give a rough idea, Microsoft published that the estimated cost of hardware repairs (mostly due to hard disk failures) was over a million dollars for a datacenter with more than 100,000 servers (Vishwanath and Nagappan, 2010). Some researchers studied replication strategies for global-scale distributed systems to design a low-cost replication maintenance strategy. They proposed Carbonite, a wide-area replication method that maintains replication with low network traffic (Chun et al., 2006). A concern is that their assumptions seem unrealistic, as they considered uncorrelated failures only.

Another important question to ask is whether the cloud is economically feasible from the perspective of a client interested in long-term storage. Rosenthal’s conclusion is negative: the price of cloud services is expensive for the long term (Rosenthal et al., 2012). He showed that cloud services did not reduce their prices fairly, in line with the historical decreasing trend in raw disk prices. Clients easily become caught between the storage price and the bandwidth price, since the cost of migrating data to a different provider could become prohibitive for an archival storage project. It has become a worldwide concern that, although cloud storage systems take advantage of decentralized algorithms, in a sense the cloud is too centralized.

Decentralized cooperative storage models such as the LOCKSS project are more interesting solutions for long-term storage. In cooperative storage, each node shares some resources, i.e., bandwidth, storage and/or computation. Peer-to-peer based storage solutions have been widely explored by industry, researchers and the open source community; some examples are: Oceanstore (Kubiatowicz et al., 2000), CFS (Wylie et al., 2000), Farsite (Adya et al., 2002), Freenet (Clarke et al., 2001), Gnutella (Ripeanu, 2001), Wuala (Mager et al., 2012), and more recently IPFS (Protocol Labs, 2014). But, in general, the peer-to-peer model has gradually declined in favor of centralized solutions. Perhaps two reasons that caused its decline were the lack of incentives for node owners to share resources and the lack of an effective redundancy method for distributed networks. The first problem could be solved with novel blockchain and smart contract technologies (Christidis and Devetsikiotis, 2016; Kosba et al., 2016). The second issue underlines the relevance of research on new encoding methods.

Redundancy methods lie at the root of achieving high availability using unreliable nodes. Replication is widely used due to the high bandwidth requirements of erasure coding, especially to rebuild single failures. In addition, it is easier to use replicas as a strategy to improve the download time of popular files. The simplest approach is that any node that participates in the path to download a file may keep a copy of the file and become a new source. In contrast, unpopular files have less redundancy and are likely to disappear as they age. Such behavior is a "natural" strategy to reclaim space. For long-term storage, this strategy does not make sense since old files tend to be unpopular.

Better strategies to replicate files in peer-to-peer networks are needed. Researchers identified trade-offs between high reliability and cross-system bandwidth (Blake and Rodrigues, 2003). One problem is the bandwidth cost of node departures, which cause data transfers to other nodes. Another problem is the sheer number of replicas that are necessary for high availability. The authors found that providing high availability using unreliable nodes required 120 replicas; even with erasure coding the redundancy remained high (equivalent to 15 replicas). We notice that centralized architectures suffer the same trade-offs, but a big difference is that centralized solutions often limit redundancy to three times the storage overhead, or even lower values for erasure-coded data.

Block Placements

Storage systems use mapping algorithms to store and locate blocks according to a placement policy and the available resources. The search for a scalable solution has been studied in three main directions: table-based, rule-based and pseudorandom-based distribution. Advantages and disadvantages of each strategy are indicated in (Miranda et al., 2014). A common solution is a pseudo-random placement based on a hash algorithm, but some hashing methods produce unbalanced environments and waste resources. The "balls into bins" model is a simple approach to study the efficiency of a placement scheme. Suppose that the system has m balls and n bins. If the task is to place each ball into a bin by choosing the destination at random, the maximum number of balls in any bin can be estimated using the formulas presented in (Raab and Steger, 1998).
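To make the model concrete, the sketch below simulates the balls-into-bins experiment and reports the average maximum bin load; the function name, parameters and trial count are our own illustrative choices, and the closed-form estimates of Raab and Steger should be preferred for analysis.

```python
import random
from collections import Counter

def average_max_load(m_balls, n_bins, trials=200, seed=1):
    """Estimate the expected maximum bin load when m balls are placed
    uniformly at random into n bins (a toy model of pseudo-random placement)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(n_bins) for _ in range(m_balls))
        total += max(counts.values())
    return total / trials

# With m = n, the classical result predicts a maximum load of roughly ln n / ln ln n.
print(average_max_load(1000, 1000))
```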


Why do placements matter to the design of practical codes? Because the distribution of blocks impacts the system's reliability. A caveat is that very little is known about how placement policies affect the fault tolerance of different encoding methods, and more studies are required.

More Technicalities

The decisions made during the design and implementation of a storage system involve compromises. Finding good compromises is a difficult task, especially for systems that behave dynamically. For example, one critical decision is to choose the right code parameters for RAID-like systems. Reed-Solomon code parameters (k,m) are not variable: the system must re-encode all data if k and m change. Thus, changing reliability requirements may become extremely expensive. Reed-Solomon codes, as well as other codes based on them, are not flexible in their parameters.

Another problem is the space overhead created by small files encoded with systems using large k values. This well-known problem occurs because many solutions apply the same redundancy settings to all documents to simplify data management. In summary, encoding methods that have flexible parameters may simplify decisions during the system's design and implementation and improve the system's scalability.

Sustainability

The main difference between the cloud industry and the HPC sector is that cloud providers want to provide a reliable service using cheap unreliable components instead of expensive supercomputers. This method is also attractive to the HPC sector as it allows substantial economies of scale. In fact, a hot research topic is the HPC and Big Data convergence discussed by Carretero in a recent lecture (Carretero, 2017). He highlighted that sustainability is a major challenge in large-scale complex systems. He further added that sustainability should be the result of a "holistic approach to manage the whole ecosystem" and that sustainability metrics are needed to understand how all the factors (data management, programmability, resilience, energy efficiency, scalability, etc.) affect the system.

2.5 Data Entanglement

Censorship resistance and storage systems that provide a reliable service in the presence of unreliable nodes are topics widely investigated, for example: the eternity service (Anderson et al., 1996), Publius (Waldman et al., 2000), Freenet (Clarke et al., 2001), the free haven project (Dingledine et al., 2001), and Farsite (Adya et al., 2002). Data entanglement was a novel approach that proposed to break the one-to-one relationship between a block and a document and turn it into a one-to-many relationship, i.e., one block becomes part of many documents. Tangler (Waldman and Mazieres, 2001) and Dagster (Stubblefield and Wallach, 2001) used

entanglement as an anti-censorship mechanism in document storage systems. As suggested by the researchers who undertook the first theoretical framework for data entanglement (Aspnes et al., 2007), an efficient entanglement method is not straightforward to devise.

We observe that entanglement introduces extra redundancy to the system and, hence, also contributes to its robustness. We investigated different novel approaches to entangle data efficiently, with the objective of achieving high levels of document dependency and of using the created redundancy to increase file durability. The first outcome was helical entanglement codes, an entanglement method that creates interdependencies between content to protect data against failures (Estrada-Galinanes and Felber, 2013). Since then, we have been actively researching data entanglement to propagate redundant information across a storage system and provide multiple paths to recover unavailable content. The results of this study are documented in this thesis.

2.5.1 Tangler

Tangler (Waldman and Mazieres, 2001) consists of three main components: a publisher program that breaks the file into fixed-size (16K) blocks, a reconstruction program that reconstructs the file from fetched blocks, and a server daemon that distributes and retrieves blocks. The system assumes that all participating servers know each other and interact in a peer-to-peer fashion. Tangler uses consistent hashing to route queries among servers and relies on cryptographic hashes (SHA-1 and hash trees) to certify data published by the servers. Participants can inject blocks into a public pool that feeds the entanglement algorithm. The publisher program performs the entanglement using polynomial interpolation. The entanglement input is two random blocks from the public pool and one data block of the file, producing two new server blocks that are injected into the pool. A data block is reconstructed with any three of the four associated server blocks.
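The following toy sketch illustrates the interpolation step on single field elements rather than 16K blocks; the prime field, the fixed x-coordinates and the function names are illustrative assumptions, not Tangler's actual parameters.

```python
# Toy sketch of Tangler-style entanglement via Lagrange interpolation over a
# prime field (assumption: single integers stand in for 16K blocks).
P = 2**31 - 1  # prime modulus, chosen only for convenience

def lagrange_eval(points, x):
    """Evaluate the unique polynomial passing through `points` at `x` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def entangle(data, pool_a, pool_b):
    """Combine a data value (at x=0) with two pool blocks (at x=1, x=2) and
    return two new server blocks (evaluations at x=3 and x=4)."""
    points = [(0, data), (1, pool_a), (2, pool_b)]
    return [(x, lagrange_eval(points, x)) for x in (3, 4)]

def reconstruct(server_blocks):
    """Recover the data value from any three of the four server blocks."""
    return lagrange_eval(server_blocks[:3], 0)

pool = [(1, 1234567), (2, 7654321)]            # the two blocks taken from the pool
new_blocks = entangle(42, 1234567, 7654321)    # two freshly injected server blocks
assert reconstruct(pool + new_blocks) == 42    # any three of the four blocks suffice
assert reconstruct([pool[0]] + new_blocks) == 42
```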

2.5.2 Dagster

Dagster (Stubblefield and Wallach, 2001) has three architectural components: an anonymous channel between clients and servers, an out-of-band channel for document announcements, and the publishing server. Dagster's XOR-based entanglement model proposes to intertwine files with a number of pre-existing documents (the c factor) while constructing a directed acyclic graph of dependencies. The user can announce the document after some time, with the hope that legitimate blocks are linked with the file's blocks. When retrieving a document, the user has the additional cost of reading all the c documents that were used in the entanglement process.
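A minimal sketch of the XOR-based scheme is shown below, assuming c = 2 and small random blocks; the block size, pool and function names are our own illustrative choices.

```python
import random, secrets

BLOCK = 16                                              # illustrative block size
pool = [secrets.token_bytes(BLOCK) for _ in range(4)]   # pre-existing server blocks

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def publish(data, c=2):
    """Entangle `data` with c existing blocks and store only the combination."""
    deps = random.sample(pool, c)          # the c documents the file now depends on
    combined = data
    for d in deps:
        combined = xor(combined, d)
    pool.append(combined)
    return deps, combined                  # retrieval needs all c dependencies too

def retrieve(deps, combined):
    data = combined
    for d in deps:
        data = xor(data, d)
    return data

original = secrets.token_bytes(BLOCK)
deps, combined = publish(original)
assert retrieve(deps, combined) == original
```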


2.5.3 Theoretical Framework

A study on document entanglement (Aspnes et al., 2007) observes the limitations of Tangler and Dagster, in particular their weak share-block notion of entanglement. Deleting a document affects only a small number of entangled files in Dagster and almost none in Tangler. The study introduces a theoretical framework for entanglement and defines two interesting concepts:

1. document dependency: if document di depends on document dj, then whenever dj cannot be recovered, neither can di;

2. all-or-nothing integrity: a storage system is all-or-nothing when document dependency exists between every di and dj, i.e., the system provides the maximum achievable dependency.

2.5.4 Network Coding

Mixing data is also used for network coding. In network coding, intermediate nodes are allowed to send linear combinations of previously received packets to increase robustness and potentially increase network throughput (Fragouli et al., 2006). One immediate application of network coding is sensor networks. Since the initial proposal of network coding as a solution for distributed storage (Dimakis et al., 2006), several studies have been conducted, for example (Gkantsidis and Rodriguez, 2005; Kamra et al., 2006; Zhang et al., 2006; Wang et al., 2006). Dimakis and other co-authors conducted a review of the literature on this topic (Dimakis et al., 2011).

2.5.5 Entangled Cloud Storage

Our initial work on entanglement codes (Estrada Galinanes, 2013) is, to the best of our knowledge, the first work that proposed data entanglement for permanent data storage in a cloud environment prone to failures and attacks. A complete section of our full paper on helical entanglement codes (Estrada-Galinanes and Felber, 2013) describes how entanglement codes may be used in a distributed storage system. It presents the architectural components of such a system and the interactions between the different parties involved in the storage and retrieval operations. More details on helical entanglement codes are given in Chapter 4. Our recent ideas towards entangled storage systems are discussed in Chapter 7.

The security aspects of cloud storage attracted the attention of the cryptography community. The paper titled "Entangled Cloud Storage" (Ateniese et al., 2016) presents an entangled cloud storage based on the notion of all-or-nothing integrity. The entangled encoding output has the size of the encoded data when the file is large enough, though it is not clear what happens with small files. Their discussion about practical implementation includes content previously discussed in our architecture, such as integrity and coordination. Overall, their solution is far away from being practical and there is no evidence that it can be implemented in a real cloud storage system.

A contemporary research approach is step-archives (Mercier et al., 2015). Step-archives combines data entanglement and erasure correcting codes for archiving data permanently with the goals of data integrity and tamper resistance. The authors studied the resilience to attacks of two models: proximity entanglement and random entanglement and provided a comparison between both approaches. Proximity entanglement uses a sliding window that limits which set of documents are used during the entanglement process. The sliding window seems identical to the notion of “lead window”, presented in our work for helical entanglement codes and briefly discussed in Section 7.3.4.

2.5.6 Friendstore

Friendstore (Tran et al., 2008) is a cooperative backup system based on trusted nodes. It proposes mixing users' data using a redundancy scheme called xor(1,2), which simultaneously provides redundancy for documents owned by different users. The novel idea in this paper is to trade bandwidth for storage by mixing data. Consequently, it only encodes data when disk space is limited.

2.6 Summary

For several decades, reliable digital communications received much attention from information theoreticians. Meanwhile, coding techniques received considerable interest from systems engineers who build reliable storage systems. Despite the shared interest in reliability and coding, there is a huge gap between the information theory, storage and cryptography communities. The problems we are solving have many angles and belong to an interdisciplinary domain, but when these angles are treated independently it becomes more difficult to design solutions under realistic assumptions.

In more detail, the implementation of efficient redundancy mechanisms in complex storage systems is challenged by the existence of problems not considered in the theory. In addition, the scale change principle asserts that as a complex system increases in scale, it changes not only in size but also in its qualities. A growing body of literature has examined the storage overhead and repair costs of redundancy schemes but neglected other challenges for storage systems at the exabyte scale. Very little is known about the impact of data scrubbing on the reliability of storage systems or about the hidden costs of current redundancy mechanisms, e.g., hybrid redundancy schemes to tolerate network partitions or the maintenance cost to assure high reliability in spite of implementations that have low fault-tolerance.

The code construction approach based on algebraic coding became less popular in the information theory field, but it is often used to find codes for storage applications. The asymptotic properties of LDPC codes do not work well in many storage applications since they use short-length codes (small k). Finding good code parameters is still a black art. Computer scientists tried exhaustive computation approaches to find codes with high fault-tolerance. Ad-hoc code constructions are abundant in the literature, although their impact is arguable due to their lack of flexibility.

To conclude, we could not find in the literature codes that offer the combination of high fault-tolerance and flexibility that entanglement codes provide. Mixing data is a promising solution to protect data against failures and attacks and increase performance in permanent storage systems.

3 Simple Entanglements

This chapter presents the study of open and closed entanglements, two simple data distribution layouts for log-structured append-only storage systems. Both techniques use equal numbers of data and parity drives and generate their parity data by computing the exclusive-or (XOR) of the most recently appended data with the contents of their last parity drive. While open entanglements maintain an open chain of data and parity drives, closed entanglements include the exclusive-or of the contents of their first and last data drives. We evaluate the five-year reliabilities of open and closed entanglements for two different array sizes and drive failure rates. Our results show that open entanglements provide much better five-year reliabilities than mirroring, and reduce the probability of data loss by at least 90 percent over a period of five years. Closed entanglements perform even better and reduce the probability of data loss by at least 98 percent. This chapter is largely based on our publication in the IPCCC'16 proceedings (Estrada-Galinanes et al., 2016).

3.1 Introduction

Despite recent advances in solid state storage, the lower cost of magnetic disk drives ensures that they remain today the most widespread storage medium in large data centers. One of the main disadvantages of these drives is their poor reliability (Bairavasundaram et al., 2007; Beach, 2014; Pinheiro et al., 2007; Schroeder and Gibson, 2007). As a result, all disk-based long-term storage solutions incorporate enough data redundancy to be able to reconstruct the data stored on any individual drive. These solutions are as diverse as mirroring, RAID level 5 (Chen et al., 1994; Patterson et al., 1988), RAID level 6 (Schwarz and Burkhard, 1992; Burkhard and Menon, 1993), two-dimensional RAID arrays (Hellerstein et al., 1994; Schwarz et al., 2016), and SSPiRAL arrays (Amer et al., 2008).

Entanglements trade space for increased reliability and faster updates, especially in the case of log-structured append-only storage systems. Simple entanglements require equal numbers of data and parity drives; therefore, they have the same space overhead as mirroring. In return, a simple open entanglement chain with 2n drives will tolerate the failure of any one of its drives and the simultaneous failure of any two of them, except for the last two drives, which is much better than a mirrored organization. At the same time, appending a block to the entanglement will require one read and two writes, while RAID level 6 and two-dimensional RAID arrays will require two reads and three writes.

We present here a full probabilistic analysis of the reliability offered by simple entanglements. We note first that entanglement chains can be closed to increase their reliability and show that the conversion process is both fast and reversible. We then model each entanglement as a Markov chain under standard stochastic assumptions and use our model to investigate the behavior of open and closed entanglement chains for four different scenarios, namely:

1. A small array consisting of 20 fairly reliable drives with a mean time to failure (MTTF) of 200,000 hours. These disks would fail at the rate of 4.28 percent per year.

2. The same array with drives having an MTTF of only 35,000 hours. These disks would fail at the rate of 25 percent per year.

3. A medium-size array consisting of 50 fairly reliable drives with an MTTF of 200,000 hours.

4. The same array with drives having an MTTF of only 35,000 hours.

Our results show that open entanglements provide much better five-year reliabilities than mirroring and reduce the probability of a data loss by at least 90 percent over a period of five years, which corresponds to the maximum useful lifetime of most consumer-class drives (Beach, 2014). Closed entanglements perform even better and reduce that probability by at least 98 percent.

The remainder of this chapter is organized as follows. Section 3.2 introduces simple entanglements. Section 3.3 discusses possible array organizations. Section 3.4 discusses the vulnerability of open and closed entanglements to double, triple, and quadruple drive failures. Section 3.5 introduces our Markov model and presents the results of our investigation. Section 3.6 sketches possible extensions, and Section 3.7 presents a summary.

3.2 Simple Entanglement Layouts

The idea behind simple data entanglement is to intertwine data and parity blocks using the bitwise XOR operation, with the goal of refreshing previously calculated parities and increasing the scope of redundant information. The process builds a chain that associates old information with the new data. In this way, old blocks become indirectly dependent on blocks that will be inserted in the future, without the need to run the algorithm more than once.

A pseudocode description of the simple entanglement encoder algorithm is shown in Algorithm 3.1. The encoder has time complexity O(n), with n being the number of data blocks that are entangled.


Algorithm 3.1 — Simple entanglement encoder.
1: pool: Queue containing blocks ready to be encoded
2: while ¬ISEMPTY(pool) do
3:     u ← GETPARITY()            ▷ Get from cache
4:     v ← DEQUEUE(pool)          ▷ Get oldest element
5:     w ← ENTANGLE(u, v)         ▷ Compute u ⊕ v
6:     WRITEDATA(v)               ▷ Write to array
7:     WRITEPARITY(w)             ▷ Write to cache and array
8: end while

The space complexity is O(2n), which is equivalent to saying that the storage capacity is halved. In other words, simple entanglements cause a storage overhead equivalent to mirroring. To improve performance, the encoder uses a write-through cache to keep in memory the last parity used in the entanglement. Data blocks are kept in a queue and they are encoded sequentially in their arrival order. We use the following notations:

di is a data block. Any object that is stored in the system is split into one or more di. The index i indicates the order in which the blocks are entangled and their position in the chain.

pi,j is the parity block associated with data blocks di and dj.

Di is a drive that stores data blocks.

Pi is a drive that stores parity blocks.

The general expression that defines a data entanglement is

$p_{i,i+1} \leftarrow d_i \oplus p_{i-1,i} ,$   (3.1)

for i greater than zero.

We define two classes of entanglements, namely, open and closed entanglement chains (Figure 3.1).

3.2.1 Open Entanglements

The encoder algorithm creates a never-ending chain, see Figure 3.1a. As long as space is available, the encoder will keep adding entangled blocks at the right extremity of the continuously growing chain. When the encoder starts for the first time, there is no parity block p0,1 to use on the right-hand side of Equation 3.1. The encoder will instead copy d1 into p1,2, which is equivalent to assuming the existence of a fictitious parity block p0,1 that only contains zeroes. The value of the next parity block, p2,3, is computed using Equation 3.1, that is $p_{2,3} \leftarrow d_2 \oplus p_{1,2}$.
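The sketch below is a minimal in-memory rendition of Algorithm 3.1 and Equations 3.1–3.2, assuming blocks are byte strings held in Python lists; the class and method names are ours and physical drives are not modelled.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class OpenEntanglement:
    """Toy open entanglement chain (0-based indices: data[i] stands for d_{i+1})."""
    def __init__(self, block_size=4):
        self.data, self.parities = [], []
        self.last_parity = bytes(block_size)    # fictitious all-zero parity p_{0,1}

    def append(self, d):
        p = xor(d, self.last_parity)            # Equation 3.1: new parity = d XOR previous parity
        self.data.append(d)
        self.parities.append(p)
        self.last_parity = p                    # write-through cache

    def recover_data(self, i):
        """Rebuild the i-th data block from its two adjacent parities (Equation 3.2)."""
        left = self.parities[i - 1] if i > 0 else bytes(len(self.parities[i]))
        return xor(left, self.parities[i])

chain = OpenEntanglement()
for block in [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]:
    chain.append(block)
assert chain.recover_data(1) == b"\x10\x20\x30\x40"
```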

(a) Open entanglements
(b) Closed entanglements

Figure 3.1 – Open and closed entanglement chains

Similarly, the decoding equations are derived from Equation 3.1. They result from the associative property of the exclusive-or (XOR) operator and the iterative process that entangles blocks while building the entanglement chain. If a data block is not available, it can be rebuilt using Equation 3.2,

$d_i \leftarrow p_{i-1,i} \oplus p_{i,i+1} .$   (3.2)

In addition, there are two ways of recovering parity blocks. One way is using Equation 3.1 and the other is using Equation 3.3:

$p_{i,i+1} \leftarrow d_{i+1} \oplus p_{i+1,i+2} .$   (3.3)

Note that this second expression does not apply to the last parity block pn,n+1 of an entanglement chain, as neither di+1 nor pi+1,i+2 exist. As a result, the content of the last data block dn will be irrecoverably lost if both blocks dn and pn,n+1 are lost.

3.2.2 Closed Entanglements

Closing the entanglement chain eliminates this fatal double failure by entangling the last data block dn with the first data block d1. A closed entanglement is built exactly as an open entanglement until there are no more data to be added and the chain can be closed, see Figure 3.1b. When this happens, the content of the first parity block is recomputed using Equation 3.4:

$p_{1,2} \leftarrow d_1 \oplus p_{n,1} ,$   (3.4)

where pn,1 is the last parity block of the entanglement. As a result, the content of pn,1 can now be reconstituted using $p_{n,1} \leftarrow d_1 \oplus p_{1,2}$. The content of parity p2,3 will remain unchanged, but it will now be defined as $p_{2,3} \leftarrow d_2 \oplus d_1$, where d1 replaces p1,2.

(a) Full-partition write with unknown entanglement class (not yet defined)
(b) Block-level striping with open entanglement chain
(c) Block-level striping with closed entanglement chain

Figure 3.2 – Array organizations in combination with open and closed entanglements

As the conversion process is O(1) with respect to the length of the chain, closing an open entanglement is both fast and easy. The reverse process is even simpler: it only requires overwriting the content of parity block p1,2 with the content of data block d1. As a result, it is always possible to add elements to a closed entanglement chain by reopening it.
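Building on the OpenEntanglement sketch above (an assumption of this illustration, not the thesis code), closing and reopening reduce to overwriting a single parity block, which is why both operations are O(1):

```python
def close_chain(chain):
    """Equation 3.4: recompute the first parity as p_{1,2} = d_1 XOR p_{n,1}."""
    chain.parities[0] = xor(chain.data[0], chain.parities[-1])

def reopen_chain(chain):
    """Reverse the closure: restore the bootstrap copy p_{1,2} = d_1."""
    chain.parities[0] = chain.data[0]

close_chain(chain)     # the chain now tolerates all double failures
reopen_chain(chain)    # new blocks can be appended again
```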

3.3 Array Organizations

The entanglement algorithm presented in the previous section organizes blocks into a log. There are several choices for writing the blocks to disks. We sketch here two possible array organizations: full-partition write and block-level striping. As both organizations will require equal numbers of data drives and parity drives, all arrays will have an even number of drives.

3.3.1 Full Partition Write

In this approach, blocks are written sequentially on the same drive. The process does not spread content across drives. Instead, it waits until the current drive is full before writing to a new drive. This data layout is best suited for archival applications with a very low read-to-write ratio. Most of the drives will remain idle and can be powered off as in a massive array of idle disks (MAID) configuration (Colarelli and Grunwald, 2002), which could result in significant energy savings.

Another advantage of the approach is its scalability as drives can be added to the array at any time. The minimum requirement is two drives, but this would correspond to a mirror configuration. To create a true entanglement chain, at least four drives are needed. In addition, a closed entanglement chain that can tolerate all double drive failures would require at least three data drives, for a total of six drives.

The main limitation of this data layout is its write penalty as three drives are involved in the operation. To complete a write, the system needs to read a parity from an extant parity drive, XOR that parity with the new data and write the new data block and the new parity block on two different drives. A new parity is computed for every write. The general encoder, Algorithm 3.1, uses a write-through cache to avoid the extra read. The problem is that we would need a cache large enough to store the contents of a whole parity drive in order to eliminate the read penalty.

Figure 3.2a shows an example of a ten-disk array with four active drives. As writes only involve the last two most recently written drives of the array, the array could grow indefinitely if the application requires it.

3.3.2 Block-level Striping

This second approach distributes data over all available drives to improve performance. New content is split into data blocks of size b. These data blocks are then spread across some or all of the n data drives. Block-level striping requires deciding between an open and a closed entanglement when the array is set up. Figure 3.2 shows instances of both organizations.

Their main difference is the way parities p1,2 and p2,3 are calculated. In an open chain, unless otherwise stated, the elements stored in the same disk or partition are part of the same growing chain. Elements located n hops apart are written to the same disk. Figure 3.2b shows how to calculate p6,7. It is the second element stored in drive P1. The figure also shows how to use p6,7 to compute p7,8, which is stored in P2. The special elements p1,2 and p2,3 are calculated only once. Figure 3.2c shows how to calculate the special elements in a closed chain. Each closed chain forms a circular stripe. One of the chains is not yet finished; therefore, its element p1,2 is not yet calculated. As a result, all blocks stored in the same disk will be part of independent stripes.

3.4 Vulnerability Analysis

In this section, we will evaluate the probabilities that open entanglements, closed entanglements and mirrored organizations will not be able to tolerate double, triple and quadruple drive failures. Our focus will be on entanglements using the full-partition write approach, as it simplifies our analysis.


(a) Open entanglements - Type A failure
(b) Open entanglements - Type B failure
(c) Open entanglements - Type C failure

Figure 3.3 – The three irreducible fatal failure patterns of open entanglements

In all three cases, we will begin by searching for irreducible failure patterns, that is, the specific failure patterns that involve a minimum number of failed drives (Corderí et al., 2010).

3.4.1 Open Entanglements

As we can see on Figure 3.3, open entanglements exhibit three irreducible failure patterns, namely:

1. The failure of the last data drive of the entanglement, say, drive Dn in our example, and its associated parity drive Pn. We will call this failure a type A failure.

2. The failure of two consecutively numbered data drives, say, drives D2 and D3 in our example, and the parity drive in between the two failed data drives, that is, parity drive P2. We will call this failure a type B failure.

3. The failure of two data drives, say data drives D1 and D3 in our example, and all the parity drives between them, say parity drives P1 and P2 in our example. We will call this failure a type C failure.

In all three cases, the array will lack the information to reconstruct the content of the failed data drive(s). Consider now an open entanglement with 2n drives and assume it experiences the simultaneous failure of two of its drives. Out of the $\binom{2n}{2}$ possible double failures, only the type A failure will result in a data loss. As a result, the probability α that the entanglement will not tolerate a double drive failure is:

$\alpha = \frac{1}{\binom{2n}{2}} .$

The triple failures that will result in a data loss include:

1. The type A double failure mentioned above combined with the failure of any of the 2n − 2 remaining drives.

2. Any of the n − 1 type B failures mentioned above.

Hence, the probability β that the entanglement will not tolerate a triple drive failure is:

$\beta = \frac{3n - 3}{\binom{2n}{3}} .$

The quadruple failures that will result in a data loss include:

1. The type A double failure mentioned above combined with the failure of two of the 2n − 2 remaining drives, for a total of $\binom{2n-2}{2}$ fatal quadruple failures.

2. Any of the n − 1 type B failures mentioned above combined with the failure of any of the remaining 2n − 3 drives, for a total of (n − 1)(2n − 3) fatal quadruple failures. We need to subtract one from that product so as not to count twice the failure of drives Dn−1, Pn−1, Dn and Pn.

3. Any of the n − 2 type C failures involving a data drive Di, a data drive Di+2, and the parity drives Pi and Pi+1.

Hence, the probability γ that the entanglement will not tolerate a quadruple drive failure is:

$\gamma = \frac{\binom{2n-2}{2} + (n - 1)(2n - 3) - 1 + (n - 2)}{\binom{2n}{4}} .$

3.4.2 Closed Entanglements

Closed entanglements can tolerate all double drive failures without experiencing a data loss. As we can see on Figure 3.4, they share the same B and C irreducible failure patterns as open entanglements and have an additional irreducible triple failure that involves drives Dn, Pn and P1. We will call this failure a type D failure.

Given that closed entanglements tolerate all double failures without data loss, the probability α that the entanglement will not tolerate a double failure is zero.


(a) Closed entanglements - Type B failure
(b) Closed entanglements - Type C failure
(c) Closed entanglements - Type D failure

Figure 3.4 – The three irreducible fatal failure patterns of closed entanglements

The triple failures that will result in a data loss include:

• Any of the same n − 1 type B triple failures as open entanglements.

• One type D triple failure.

As a result, the probability β that the entanglement will not tolerate a triple drive failure is:

$\beta = \frac{n}{\binom{2n}{3}} .$

In the same way, the quadruple failures that will result in a data loss are:

• Any of the n − 1 type B failures combined with the failure of any of the remaining 2n − 3 drives.

• The single type D failure combined with the failure of any of the remaining 2n − 3 drives.

• Any of the n − 2 type C failures involving a data drive Di, a data drive Di+2, and the parity drives Pi and Pi+1.

The probability γ that the entanglement will not tolerate a quadruple drive failure is:

$\gamma = \frac{n(2n - 3) + (n - 2)}{\binom{2n}{4}} .$



Figure 3.5 – The sole irreducible failure pattern of mirrored organizations

3.4.3 Mirrored Organizations

As we can see on Figure 3.5, the sole irreducible failure pattern for mirrored organizations is the failure of a mirrored pair. As a result, the probability α that a mirrored organization with 2n drives will not tolerate a double drive failure is:

$\alpha = \frac{n}{\binom{2n}{2}} .$

The triple failures that will result in a data loss consist of the failure of any of the n mirrored pairs combined with the failure of any of the 2n − 2 remaining drives. Hence, the probability β that the mirrored organization will not tolerate a triple drive failure is:

$\beta = \frac{n(2n - 2)}{\binom{2n}{3}} .$

In the same way, the number of quadruple failures that will result in a data loss comprises the failure of any of the n mirrored pairs combined with the failure of two of the 2n − 2 remaining drives. We must however subtract from this total the $\binom{n}{2}$ quadruple failures consisting of the failure of two mirrored pairs of drives, in order not to count them twice. As a result, the probability γ that the organization will not tolerate a quadruple drive failure is:

$\gamma = \frac{n\binom{2n-2}{2} - \binom{n}{2}}{\binom{2n}{4}} .$
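As a quick cross-check of the closed forms derived in this section, the sketch below evaluates α, β and γ for the three organizations; the function name and its string argument are our own conventions.

```python
from math import comb

def vulnerability(n, organization):
    """Probabilities (alpha, beta, gamma) that an array of 2n drives cannot
    tolerate 2, 3 or 4 simultaneous drive failures (Section 3.4 closed forms)."""
    if organization == "open":
        a = 1 / comb(2*n, 2)
        b = (3*n - 3) / comb(2*n, 3)
        g = (comb(2*n - 2, 2) + (n - 1)*(2*n - 3) - 1 + (n - 2)) / comb(2*n, 4)
    elif organization == "closed":
        a = 0.0
        b = n / comb(2*n, 3)
        g = (n*(2*n - 3) + (n - 2)) / comb(2*n, 4)
    else:  # mirrored organization
        a = n / comb(2*n, 2)
        b = n*(2*n - 2) / comb(2*n, 3)
        g = (n*comb(2*n - 2, 2) - comb(n, 2)) / comb(2*n, 4)
    return a, b, g

for org in ("open", "closed", "mirrored"):
    print(org, vulnerability(10, org))      # 20-drive arrays
```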

3.5 Reliability Analysis

Estimating the reliability of a storage system means estimating the probability R(t) that the system will operate correctly over the time interval [0, t] given that it operated correctly at time t = 0. Computing that function requires solving a system of linear differential equations, a task that becomes quickly intractable as the complexity of the system grows. A simpler option is to use instead the five-year reliability of the array. As this value is typically very close to 1, we will express it in "nines" using the equation number_of_nines = −log10(1 − Rd), where Rd is the five-year reliability of the array. Thus, a reliability of 99.9 percent would be represented by three nines, a reliability of 99.99 percent by four nines, and so on.



Figure 3.6 – State transition probability diagram

Open entanglements:     $\alpha = \frac{1}{\binom{2n}{2}}$,   $\beta = \frac{3n-3}{\binom{2n}{3}}$,   $\gamma = \frac{\binom{2n-2}{2} + (n-1)(2n-3) - 1 + (n-2)}{\binom{2n}{4}}$

Closed entanglements:   $\alpha = 0$,   $\beta = \frac{n}{\binom{2n}{3}}$,   $\gamma = \frac{n(2n-3) + (n-2)}{\binom{2n}{4}}$

Mirrored organizations: $\alpha = \frac{n}{\binom{2n}{2}}$,   $\beta = \frac{n(2n-2)}{\binom{2n}{3}}$,   $\gamma = \frac{n\binom{2n-2}{2} - \binom{n}{2}}{\binom{2n}{4}}$

Table 3.1 – Probabilities α, β and γ that an entangled/mirrored array will not be able to tolerate double, triple and quadruple failures respectively.

We first develop a generic Markov model that applies to open entanglements, closed entanglements and mirrored organizations. The specific behavior of each fault-tolerant drive array is represented by the four parameters m, α, β, and γ, where m = 2n is the number of disks in the array and α, β, and γ are the respective probabilities that the array will not tolerate the simultaneous failures of two, three or four drives. These values were calculated in the previous section and are summarized in Table 3.1. In all three cases, we assume that the probability that the array will survive a quintuple drive failure is small enough to be neglected.

The model consists of an array of drives with independent failure modes. Whenever a drive fails, a repair process is immediately initiated for that drive. Should several drives fail, this repair process will be performed in parallel on those drives. We assume that drive failures are independent events and are exponentially distributed with rate λ. In addition, we require repairs to be exponentially distributed with rate µ. Both hypotheses are necessary to represent our system by a Markov process with a finite number of states.

Figure 3.6 displays the state transition probability diagram. State <0> is the initial state where all m drives are operational and no drive has failed. Should any of the drives fail, the system would move to state <1> with an aggregate failure rate mλ. Since some double failures can be fatal, the two possible failure transitions from state <1> are:


1. A transition to the data loss state with rate α(m − 1)λ, where the actual value of the α parameter depends on the specific storage organization, as computed in the previous section.

2. A transition to state <2> with rate (1 − α)(m − 1)λ.

In the same way, the two failure transitions from state <2> are:

1. A transition to the data loss state with rate β(m − 2)λ, where the actual value of the β parameter depends on the specific storage organization.

2. A transition to state <3> with rate (1 − β)(m − 2)λ.

Following the same pattern, the two failure transitions from state <3> are:

1. A transition to the data loss state with rate γ(m − 3)λ, where the actual value of the γ parameter depends on the specific storage organization.

2. A transition to state <4> with rate (1 − γ)(m − 3)λ.

As we did not take into account the possibility that the array could survive a quintuple failure, there is a single failure transition leaving state <4>.

Recovery transitions are more straightforward: they bring the array from state <4> to state <3>, then from state <3> to state <2> and so on until the system returns to its initial state <0>.

The Kolmogorov system of differential equations that describes the behavior of the storage organization is:

$\frac{dp_0(t)}{dt} = -m\lambda\, p_0(t) + \mu\, p_1(t)$

$\frac{dp_1(t)}{dt} = -\big((m-1)\lambda + \mu\big)\, p_1(t) + m\lambda\, p_0(t) + 2\mu\, p_2(t)$

$\frac{dp_2(t)}{dt} = -\big((m-2)\lambda + 2\mu\big)\, p_2(t) + (1-\alpha)(m-1)\lambda\, p_1(t) + 3\mu\, p_3(t)$

$\frac{dp_3(t)}{dt} = -\big((m-3)\lambda + 3\mu\big)\, p_3(t) + (1-\beta)(m-2)\lambda\, p_2(t) + 4\mu\, p_4(t)$

$\frac{dp_4(t)}{dt} = -\big((m-4)\lambda + 4\mu\big)\, p_4(t) + (1-\gamma)(m-3)\lambda\, p_3(t),$

where $p_i(t)$ is the probability that the system is in state <i>, with the initial conditions $p_0(0) = 1$ and $p_i(0) = 0$ for $i \neq 0$. Observing that the mean time to data loss (MTTDL) of the system is given by:

$\mathrm{MTTDL} = \sum_{i=0}^{4} p_i^*(0) ,$

where $p_i^*(s)$ is the Laplace transform of $p_i(t)$, we compute the Laplace transforms of the above equations and solve them for s = 0 and a fixed value of m. We then use this result to compute the MTTDL of our system and convert this MTTDL into a five-year reliability, using the equation:

$R_d = \exp\!\left(-\frac{d}{\mathrm{MTTDL}}\right) ,$   (3.5)

where d is a five-year interval expressed in the same units as the MTTDL. Observe that the above equation implicitly assumes that the long-term failure rate 1/MTTDL does not significantly differ from the average failure rate over the first five years of the array.
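A direct numerical alternative to the Laplace-transform derivation is to solve the expected-time-to-absorption system of the Markov chain in Figure 3.6; the sketch below does this with NumPy, and the function name, its arguments and the two-day repair example are our own assumptions.

```python
import numpy as np

def five_year_reliability(m, alpha, beta, gamma, mttf_hours, mttr_hours):
    """Return (MTTDL in hours, five-year reliability in nines) for the
    Markov model of Figure 3.6."""
    lam, mu = 1.0 / mttf_hours, 1.0 / mttr_hours
    Q = np.zeros((5, 5))                     # transient states <0>..<4>
    survivable = [m*lam, (1-alpha)*(m-1)*lam, (1-beta)*(m-2)*lam, (1-gamma)*(m-3)*lam]
    for i, rate in enumerate(survivable):
        Q[i, i+1] = rate                     # failures that the array survives
    for i in range(1, 5):
        Q[i, i-1] = i * mu                   # parallel repairs
    fatal = [0, alpha*(m-1)*lam, beta*(m-2)*lam, gamma*(m-3)*lam, (m-4)*lam]
    for i in range(5):
        Q[i, i] = -(Q[i].sum() + fatal[i])   # total exit rate, incl. data loss
    t = np.linalg.solve(-Q, np.ones(5))      # expected times to absorption
    mttdl = t[0]
    r5 = np.exp(-5 * 8760 / mttdl)           # Equation 3.5 with d = 5 years
    return mttdl, -np.log10(1 - r5)

# Example: 20 drives, MTTF = 200,000 h, 2-day repairs, reusing the vulnerability()
# sketch of Section 3.4 (an assumption) to obtain alpha, beta and gamma:
# print(five_year_reliability(20, *vulnerability(10, "open"), 200_000, 48))
```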

We analyzed the reliability of both open and closed entanglements and compared them with the reliability of mirrored solutions under four different array configurations, namely:

1. A small array of 20 drives and a drive MTTF of 200,000 hours.

2. The same array with a drive MTTF of 35,000 hours.

3. A medium size array of 50 drives and a drive MTTF of 200,000 hours.

4. The same array with a drive MTTF of 35,000 hours.

A drive MTTF of 200,000 hours corresponds to an annual failure rate of 4.28 percent and represents what can be expected from an array built with very good drives. The annual failure rate is calculated from the inverse of the MTTF expressed in years. A drive MTTF of 35,000 hours corresponds to an annual failure rate of 25 percent. While this failure rate is pathological, it is neither exceptional nor confined to disks of dubious origin. Beach (Beach, 2014) reported just such a rate for a batch of 539 disks coming from a reputable manufacturer.

Figure 3.7 summarizes the findings. It shows the five-year reliabilities of the four configurations described above for mean time to repair (MTTR) varying between half a day and one week. Since reliabilities are expressed in nines, each unit increment on the vertical scale corresponds to a 90 percent reduction of the probability of a data loss.

As we can see, open entanglements provide much better five-year reliabilities than mirroring and reduce the probability of a data loss by 90 percent for the small array and by 98 percent for the large array. The very good performance of open entanglements for large arrays should not surprise us given that these entanglements only have a single fatal double failure.

As expected, closed entanglements perform even better and reduce the probability of a data loss:

• by at least 99.87 percent for the small array and a 200,000-hour MTTF,

• by at least 99.93 percent for the large disk array and a 200,000-hour MTTF,


(a) 20-disk array with MTTF of 200,000 hours   (b) 50-disk array with MTTF of 200,000 hours
(c) 20-disk array with MTTF of 35,000 hours   (d) 50-disk array with MTTF of 35,000 hours

Figure 3.7 – Five-year reliability for different arrays

• by at least 99.21 percent for the small disk array and a 35,000-hour MTTF,

• by at least 98.68 percent for the large disk array and a 35,000-hour MTTF.

We should note that these are minimum values observed for very large drive repair times. The improvements observed for a more reasonable two day drive repair time vary between 99.79 and 99.98 percent.

Two features of our model may also affect the accuracy of our results. First, we assumed that drive failures were independent events, which is not always true. Second, we assumed that all quintuple drive failures were fatal. This assumption is fairly reasonable as long as the quintuple disk failures remain rare events, which is certainly true for small arrays consisting of drives with high MTTFs and low MTTRs. This is less true for larger arrays especially if their drives have lower MTTFs and higher MTTRs. As these arrays will experience more drive failures and have their failed drives remain for longer periods of time in that state, quintuple drive failures will become less uncommon. Assuming that all these failures are fatal will then provide pessimistic evaluations of the array five-year reliability.




Figure 3.8 – An open entanglement that tolerates all double failures

3.6 Possible Extensions

While both open and closed entanglements provide much higher five-year array reliabilities than mirroring, there are circumstances where even higher levels of data protection must be sought. This could be the case if the array includes disk drives that exhibit high failure rates or if geographic considerations result in longer repair times. Let us show how we can address that issue by eliminating all fatal double drive failures in the case of open entanglements and all fatal triple drive failures in the case of their closed counterparts.

3.6.1 Open Entanglements

As we have seen in Section 3.4, an open entanglement with n data drives and n parity drives will not tolerate the combined failure of its last data drive Dn and its associated parity drive, drive Pn.

As we can see on Figure 3.8, the simplest solution is to mirror data drive Dn, thus adding an additional data drive D′n. The outcome will be an array with 2n + 1 drives that would tolerate double drive failures and provide the same reliability as a closed entanglement with 2n drives.

3.6.2 Closed Entanglements

Recall that the sole fatal triple failures in closed entanglements are (a) type B triple failures involving a data drive Dk, its associated parity drive Pk, and the next data drive Dk+1, and (b) a single type D triple failure involving Dn, Pn and P1. So, if we number sequentially all data drives starting with D1, all type B triple fatal failures will involve both an odd-numbered and an even-numbered data drive. Thus, any mechanism that allows the recovery of any failed odd-numbered data drive will eliminate all fatal type B triple failures.

To achieve this goal, we group all odd-numbered data drives into pairs (D1, D3), (D5, D7), and so on, and add to each pair an extra parity drive Qj such that:

$Q_j = D_{4k+1} \oplus D_{4k+3} ,$

where $0 \leq k \leq \left\lfloor \frac{n-3}{4} \right\rfloor$ and n is the number of data disks in the entanglement. In half of the cases, the pairing process will leave the last odd-numbered data drive alone. Then, two cases need to be considered:


Figure 3.9 – A closed entanglement that tolerates all triple failures

1. If n is odd, the last odd-numbered data drive is drive Dn. We do not need extra protection since the drive is already entangled with the first drive of the chain, namely drive D1.

2. If n is even, the last odd-numbered data drive is drive Dn−1. To protect data against the simultaneous failure of data drives Dn−2 and Dn−1 and parity drive Pn−2, we must mirror drive Dn−1 and have

$Q_x = D_{n-1} .$

Figure 3.9 illustrates this last case.
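The pairing rule of this subsection can be written down compactly; the sketch below lists, for a given n, which odd-numbered data drives feed each extra parity drive, with a single-drive group meaning that the drive is simply mirrored (the function name is ours).

```python
def extra_parity_groups(n):
    """Groups of odd-numbered data drives protected by the extra parity drives
    Q_j of Section 3.6.2, for a closed entanglement with n data drives."""
    groups = [(4*k + 1, 4*k + 3) for k in range((n - 3) // 4 + 1)]
    last_odd = n - 1 if n % 2 == 0 else n
    if all(last_odd not in g for g in groups) and n % 2 == 0:
        groups.append((last_odd,))      # unpaired D_{n-1}: mirror it
    # if n is odd, the unpaired drive is D_n, already entangled with D_1
    return groups

print(extra_parity_groups(6))    # [(1, 3), (5,)]
print(extra_parity_groups(8))    # [(1, 3), (5, 7)]
print(extra_parity_groups(9))    # [(1, 3), (5, 7)]  (D_9 needs no extra parity)
```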

3.7 Summary

We have investigated the reliability of two varieties of simple entanglements, namely open entanglements and closed entanglements. Both variants use equal numbers of data and parity drives and generate their parity data by computing the exclusive-or (XOR) of the most recently appended data with the contents of their last parity drive. Our study reveals that open and closed entanglements provide better five-year reliability than mirroring, reducing the probability of data loss by at least 90 and 98 percent, respectively. Furthermore, these techniques are very efficient and simple to implement, either in software or hardware, and are hence of practical interest for cloud storage systems.

4 Complex Entanglements

The main contribution of this chapter is the introduction of alpha entanglement codes, a mechanism to propagate redundant information across a large-scale distributed system and create a virtual storage layer of highly interconnected elements. Our solution mitigates the challenges of constructing a practical erasure code to archive big data. By "practical", we mean having reasonable trade-offs between security, resource usage and performance. The code has three parameters: one increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need for additional storage. As a result, entanglement codes can be used to design a robust redundancy scheme for a survivable storage system that would provide high availability, durability and integrity despite external and internal threats, with little maintenance. Given the complexity of the topic, this chapter presents the model, analysis and simulations needed to understand the code behaviour to some extent, and the next chapter presents the evaluation of reliability and fault tolerance. This chapter compiles and expands the following publications: helical entanglement codes, published in the SSS'13 proceedings (Estrada-Galinanes and Felber, 2013); our recent work on alpha entanglement codes, submitted for publication (Estrada-Galinanes et al., 2017b); and the research framework used for simulations, which will appear as an extended abstract in SYSTOR'17 (Estrada-Galinanes, 2017).

4.1 Introduction

Our work on Helical Entanglement Codes (HEC) (Estrada-Galinanes and Felber, 2013) introduces a technique that propagates redundant information across a storage system and provides multiple paths to recover unavailable content. HEC embraces two ideas: the creation of interdependencies among all content in a storage system (data entanglement) and the propagation of implicit redundant information using helical chains of data and parity blocks. The goal is to mix previously stored parity blocks with new data blocks to accumulate redundant information from previous blocks. The design combines multiple strands forming a helical lattice; hence, it tolerates more failures than simple entanglements. The entanglement computes the exclusive-or (XOR) of two blocks: a parity and a data block. Every newly inserted block is entangled with three parities. The three outputs contribute to enlarging three strands and to propagating redundant information. The pattern is repetitive, i.e., the new parity block is used together with a newcomer block as input for the next XOR operation. Our starting point was p-HEC, an entanglement with two horizontal strands and p double-helix strands that resembles DNA topology. HEC was later extended to alpha entanglement codes.

Alpha entanglement codes, AE(α,s,p), are a family of erasure codes built on data entanglement (Stubblefield and Wallach, 2001; Waldman and Mazieres, 2001; Aspnes et al., 2007; Estrada-Galinanes and Felber, 2013). They propagate redundant information in more directions than existing codes. The parameter α indicates the number of redundant blocks created per unit of useful (non-redundant) information. Tuning the parameter α increases the possible paths to recover data exponentially while increasing the storage overhead linearly. The codes are classified in α-entanglement families: single entanglements (α = 1) compute 1 parity per data block, double entanglements (α = 2) compute 2 parities, triple entanglements (α = 3) compute 3 parities, and so on. For example, a triple entanglement with s = 2 and p = 5 is equivalent to the 5-HEC code described in our preliminary work (Estrada-Galinanes and Felber, 2013). The parameters s and p indicate the number of horizontal and helical strands that define a helical lattice. Tuning s or p does not modify the storage overhead generated by the encoder. In other words, these two parameters can increase fault-tolerance even further, without the need for additional parities.

This chapter presents the modelling, simulation and assessment of complex entanglement codes presented in three works (Estrada-Galinanes and Felber, 2013; Estrada-Galinanes and Felber, 2015b; Estrada-Galinanes et al., 2017b). Complex entanglements are generalized by alpha entanglement codes. Overall, this chapter focuses on double and triple entanglements, and it gives some ideas for future research on n-tuple entanglements. A deeper evaluation to determine the fault-tolerance of complex entanglements is presented in the next chapter.

4.2 Helical Entanglement Codes

The use of document entanglement was proposed for censorship-resistant systems. We exploit this technique here to generate and propagate redundant information across storage devices. In preliminary work, we proposed helical entanglement codes (HEC) (Estrada-Galinanes and Felber, 2013). The entanglement algorithm creates interdependencies between old, current, and future content with only three operations. It propagates redundant information in regular patterns forming a lattice, as shown in Figure 4.1 and explained in more detail in Section 4.3.5. Figure 4.2 is a snapshot of the relationships within the connectivity graph that focuses on data block (node) 25. The algorithm keeps building connections as more data blocks are ingested by the system. The figure shows how three open entanglement chains like the ones shown in Figure 3.1a are intertwined to create the lattice. For simplicity, it highlights the connections of data block 25, but every data block participates in three chains. Although this visualization does not show the whole picture, it helps to understand the information flow from one node to another.



Figure 4.1 – HEC lattice. This graph shows how data block dependencies build a helical lattice. Nodes represent data blocks and edges represent parity blocks. Each helical strand is shown with a different thickness and colour.

More details about helical entanglement codes and their DNA-like appearance are presented in the next subsection.

4.2.1 DNA-like codes

The structure that models the relationship between data blocks resembles DNA helical strands; thus, we use the terms helical and horizontal strands to refer to the entanglement chains. Note that the normal organic forms of double-stranded DNA are both right-handed, but our codes combine multiple left- and right-handed helical (double-helix) strands. The idea of storing data in DNA chains to create DNA-based storage solutions is being investigated by various authors (Church et al., 2012; Goldman et al., 2013; Bornholt et al., 2016). Their approach consists in writing data on synthesized DNA chains, which is radically different from helical entanglement codes.

A simplified view of the helical lattice is shown in Figure 4.2. This graph focuses on how a data block that was ingested by the system at a certain time is connected to old and future data. It also shows the parity blocks that are created by the entanglement process. In particular, when data block 25 arrives, it is XORed with parity blocks C, A, and E, generating the parity blocks D, B, and F, respectively. In this view, it is easier to observe an important characteristic of the entanglement process: the helical strands meet periodically on the same horizontal strand. In our example, they meet at node 25 and again at node 35. The period corresponds to the number of distinct double-helix strands p used by the entanglement algorithm (see Figure 4.1). Consequently, the distance between meeting points is p data blocks. For that reason, p-HEC is another way of referring to helical entanglement codes. We will see in the next chapter that increasing p has a positive impact on the fault-tolerance of the code.


[Figure 4.2: the current data block 25 connected to old and future data through parities A–F; the helical strands meet the horizontal strand again at node 35, p data blocks later.]

Figure 4.2 – DNA-like codes. Another view of the graph in Figure 4.1 showing the three strands used during the entanglement process (encoding) of one data block (25). The encoder uses parities C, A, and E to generate the parities D, B, and F, respectively. This figure emphasizes the periodic behavior of helical strands that meet every p data blocks on the horizontal strand. For simplicity, only some data blocks (circles) have labels and parity blocks (diamonds) are explicitly shown.

4.2.2 Redundancy Propagation

A DNA-like entanglement graph has the positive effect of bounding the propagation of redundant information. The bound exists because the parities situated between the double-helix strand meeting points cannot be regenerated if they are all lost together with the data blocks at the meeting points. A bound is necessary to protect all data blocks in a fair way. If propagation were not bounded, older blocks would increase their fault tolerance at the expense of weakly protected recent blocks. Additionally, the strands create parallel paths to recover data as they diverge and converge periodically. Therefore, fault tolerance is reinforced at a local level. Finally, the double-helix propagation occurs for all data blocks. In that sense, redundant information keeps propagating outside the bounds previously mentioned, hence increasing the number of failure patterns that the structure can tolerate.

4.2.3 Expanding Horizons

Later in this chapter, we present evidence of the higher survivability of entangled data in comparison with classical redundancy approaches. The DNA-like model was useful to define an efficient way of doing entanglements; however, it is not a general version of entanglement codes. It entangles each data block with three strands and uses two horizontal strands. That triggers the question: what happens if we use a different number of strands? The single strand case defines the simple entanglements presented in the previous chapter, but more studies were needed to explore different settings. Alpha entanglement codes build on and improve our previous work to provide a general model for practical entanglement codes. In the next section, we define more families of entanglement codes and evaluate them in more detail.

4.3 Alpha Entanglement Codes

α-entanglements(s,p) is a family of erasure codes that tangles data and redundant blocks with the goal of increasing the scope of redundant information by means of propagation. The encoder builds chains of blocks, called strands, that alternate data and redundant blocks. The entanglement function computes the exclusive-or (XOR) of two consecutive blocks at the tail of a strand and inserts the output adjacent to the last block. The families of codes introduced in this work are:

1. Single entanglements (α = 1), denoted by 1-entanglements, use a single horizontal strand. They are equivalent to the simple entanglements defined in Chapter 3.

2. Double entanglements (α = 2), denoted by 2-entanglements(s,p), use horizontal strands and a single class of helical strand.

3. Triple entanglements (α = 3), denoted by 3-entanglements(s,p), use horizontal strands and two classes of helical strands.

4. n-Tuple entanglements (α = n), denoted by n-entanglements(s,p), use a horizontal strand and n − 1 classes of helical strands.

A compact way to describe alpha entanglements is AE(α,s,p). The previously defined p-HEC method corresponds to the family of triple entanglements with s = 2, in short AE(3,2,p). Figure 4.3 presents the fine- and coarse-grain view of α-entanglements. At the fine-grain level, the encoder and its parameter settings determine important attributes that are analyzed later in this chapter. At the coarse-grain level, the entanglement encoder algorithm deterministically creates the lattice connections that expand the potential use of redundant blocks.

This chapter focuses on double and triple entanglements, and it gives some ideas for future research on n-tuple entanglements.

4.3.1 Strands

Each individual strand corresponds to a single entanglement chain. The complex combination of strands brings additional properties and increases fault tolerance. One of the emergent properties of α-entanglements is that irreducible failure patterns from single entanglements become innocuous since data blocks are reconstructed through any of the α strands in which they participate.

There are three classes of strands: the horizontal (H), the right-handed (RH) and the left-handed (LH) strands. The last two are helical strands.


[Figure 4.3: fine-grain and coarse-grain views for single (α = 1), double (α = 2), and triple (α = 3) entanglements; each bucket holds a data block and its parities.]

Figure 4.3 – Fine-grain and coarse-grain view of α-entanglements. The left side of the figure shows a bucket that contains a data block and the parities generated by the encoder. The right side shows how the buckets are interconnected. The connections between top and bottom buckets are not shown.

The helical strands take the names right-handed and left-handed because the first encoder algorithm, 5-HEC, i.e., a triple entanglement with s = 2 and p = 5, describes a graph that resembles the illustrations of the DNA double helix. Subsequent designs are better described with a helical lattice, where the RH strands connect all H strands from top to bottom, with a certain slope, and jump to a subsequent position on the top H strand, and the LH strands define similar connections but going from bottom to top. The position after the jump is determined by the number of helical strands. In a 3D plane, the helical strands revolve around a central horizontal axis and grow towards the right.

A lattice is composed of s horizontal strands and α − 1 helical strand classes. Hence, the total number of strands is given by the expression

s + (α − 1) · p .   (4.1)

More details about the strand connectivity are given in the following subsections.
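As a worked instance of Equation 4.1 (using the AE(3,5,5) lattice that appears later in Figure 4.6):

    s + (α − 1) · p = 5 + (3 − 1) · 5 = 15 strands.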

Slope

Finally, this work considers helical strands that connect horizontal strands with a diagonal of slope 1. In addition, the number of RH and LH strands is balanced. In other words, 2-entanglements are composed of one class of helical strands (RH or LH) and 3-entanglements are composed of both classes of strands. A lattice created with 3-entanglements and a single class of helical strands, for example one RH strand with slope 1 and one RH strand with slope 2, is not studied.

4.3.2 Three Parameters to Propagate Information

Alpha entanglements expand the options available to propagate data by adding extra parameters.

Parameter α

This parameter quantifies the number of parities generated for each data block. It also describes the number of strands in which one data block participates.

Tuning α impacts the resilience of the structure. More parities create more redundancy; on the other hand, the storage overhead increases.

This work focuses on codes with α ∈ [1,3]. We are still investigating entanglement codes with α > 3. It is not clear how to connect the extra helical strands to form a regular mesh. In double and triple entanglements the RH and LH strands are defined along a diagonal of slope 1. One possible option is to add additional helical strands with a different slope.

Parameters s and p

The total number of strands in the system is defined by the other two parameters: p, which determines the number of helical strands, and s, which defines the number of horizontal strands, for details see Figure 4.4.

Tuning these two parameters impacts the structure’s resilience without adding storage overhead. A valid parameter setting depends on α and the lattice topology.

• Single entanglements, namely 1-entanglements, are formed with only one chain of entanglements; therefore, s = 1 and p = 0.

• For entanglements with α > 1, the only constraint to create a lattice is p ≥ s. An invalid setting, i.e., p < s, causes a deformed lattice.

Figure 4.4 gives more insight into simple and double entanglements. Figure 4.4a is equivalent to Figure 3.1a for open entanglements. The examples for double entanglements show different code settings such as one horizontal strand (s = 1) as in Figures 4.4b and 4.4c, or two horizontal strands as in Figure 4.4d. When p = 1 as in Figure 4.4b, the helical and horizontal strands are identical. Hence, the entanglement algorithm basically builds a simple entanglement chain and duplicates the parities.

57 Chapter 4. Complex Entanglements

Figure 4.4 – α-entanglements in more detail: single and double entanglements with different settings: (a) 1-entanglements, (b) 2-entanglements(1,1), (c) 2-entanglements(1,2), (d) 2-entanglements(2,2).

When p ≥ 2 as in Figure 4.4d, each strand is built with different parities, which are computed by XORing different data blocks.

Our previous simplified view of the helical lattice shown in Figure 4.2 is now updated with the new parameters in Figure 4.5.

4.3.3 Code Rate

The code rate indicates the proportion of real information (data blocks) to the total number of blocks. When all data blocks are stored in the system, the code rate is computed with

1 / (α + 1) ,   (4.2)

[Figure 4.5: AE(α,s,p) view of the current data block connected to old and future data through parities A–F, with s, α and p labelled.]

Figure 4.5 – Generalized version of the DNA-like code. Alpha entanglement codes incorporate more parameters to tune data propagation.

and the expression for systems that do not store data blocks is

1 / α .   (4.3)
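For example, as a worked instance of Equations 4.2 and 4.3 for triple entanglements (α = 3):

    1 / (α + 1) = 1/4 when the data blocks are stored, and 1 / α = 1/3 when they are not.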

Different approaches to increase the code rate of entanglement codes are investigated in Chapter 6.

4.3.4 Storage Overhead

For some applications, the storage overhead caused by α entanglements might be high. We propose three strategies to enhance the code rate. A first option is to start with a low α and increase the value later as required. A second option is to puncture the code. Puncturing is a standard technique used in coding theory in which, after encoding, some of the parities are not stored in the system. Our study on puncturing and other XOR-based techniques is presented in Chapter 6. Finally, a third option could be to replace the XOR-based entanglement function with a more compact solution like Reed-Solomon to build “fat” data nodes. In our model, a data node is equivalent to one data block and all (data and parity) blocks have the same size. Such a “fat” node would aggregate many data blocks and increase the code rate since, in the same lattice region, more data blocks can be encoded. Our preliminary studies showed that a change of the entanglement function (fine-grain entanglement) has a negative impact on the size of failure patterns that cause data loss. To make this option possible, further analysis at the coarse-grain entanglement level is needed.


4.3.5 Helical Lattice

A graph G(V,E) consisting of a set of vertices V and a set of edges E describes the connections created by the entanglement process. By vertex we mean a node di that represents a data block (d-block), whereas an edge pi,j is an abstraction of a parity block (p-block). The graph is regular with degree d = 2·α. The edge pi,j connects data vertices di and dj. The underlying parity block contains derived information of the data block di and other data blocks previously used in the lattice.

The set of vertices V is divided into three disjoint subsets,

V = Vt ∪ Vc ∪ Vb ,   (4.4)

with Vt the subset of top nodes, Vc the subset of central nodes, and Vb the subset of bottom nodes, defined as follows:

Vt = {di ∈ V | i ≡ 1 (mod s), s > 1},   (4.5)
Vc = {di ∈ V | i ≢ 1 (mod s), i ≢ 0 (mod s)},   (4.6)
Vb = {di ∈ V | i ≡ 0 (mod s), s > 1}.   (4.7)

Note that for single entanglements, where s = 1, all nodes are considered central.

A flat representation of this graph resembles a lattice. This two-dimensional model has a predefined height determined by the parameter s, i.e., the number of horizontal strands. The lattice grows uniformly from top to bottom “rows” and left to right “columns”. Using the horizontal strands as a reference, the first d-block is added to H1 (top), the second is added to H2 (central), and so on, until Hs (bottom). When more d-blocks are added to the system, the sequence is repeated in the next column to the right. Data blocks di are labeled with a unique identifier i corresponding to the block’s arrival order. As a result, for any di and dj connected by the same horizontal strand, with j > i, the difference j − i is a multiple of s.

Why is it called a helical lattice? The helical strands connect blocks from different rows and columns, spanning an area defined by s·p, the so-called lead window. A lead window describes the interval that it takes one helical strand to revolve around an imaginary central axis. More details on the lead window are given in Section 7.3.4. Given a flat representation, once a right-handed helical strand connects a block from the bottom row, the next connection is located in the top row. Conversely, once a left-handed helical strand connects a block from the top row, the next connection is located in the bottom row. A three-dimensional model would have the shape of a cylinder. However, a simple two-dimensional model suffices to visualize the data entanglements. In the two-dimensional model, we relax the graph connectivities by leaving the top and bottom rows disconnected.
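A minimal sketch of the classification given by Equations 4.5-4.7 (the function name is ours, not part of the system):

    def classify(i: int, s: int) -> str:
        """Return the location of data node d_i in a lattice with s horizontal strands."""
        if s == 1:
            return "central"      # single entanglements: all nodes are central
        if i % s == 1:
            return "top"          # Eq. 4.5
        if i % s == 0:
            return "bottom"       # Eq. 4.7
        return "central"          # Eq. 4.6

    # For the AE(3,5,5) lattice of Figure 4.6: d21 is top, d22 central, d25 bottom.
    assert [classify(i, 5) for i in (21, 22, 25)] == ["top", "central", "bottom"]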

From the perspective of a storage system, the helical lattice is a virtual storage layer, somewhat akin to the one proposed in RESAR (Schwarz et al., 2016). The helical lattice models the redundancy propagation and the possible paths to recover a data chunk from a storage system.


[Figure 4.6: nodes d1–d40 arranged column by column in five horizontal strands H1–H5, crossed by helical strands RH1–RH5 and LH1–LH5; the parities incident to d26 (p21,26, p22,26, p25,26, p26,31, p26,32, p26,35) are highlighted.]

Figure 4.6 – Lattice for triple entanglements with s = 5 rows and p = 5 columns or diagonals. A node di participates in three strands and, therefore, has degree d = 6. Colored vertices are at one hop of node d26. d21 is a top node, d22 is a central node and d25 is a bottom node.

Note that a lattice only makes sense for entanglements with α > 1, since simple entanglements build a single chain.

Spanning Subgraphs

The set of edges in G is divided into α disjoint subsets,

E = E1 ∪ E2 ∪ ··· ∪ Eα ,   (4.8)

where E1 is the sole set of horizontal edges, also denoted EH, and Ei with i ∈ [2,α] is a subset of helical edges.

Each of the α edge subsets forms a spanning subgraph Si(V, Ei), whose set of vertices contains all the vertices of the original graph G. Due to the way information is propagated through each disjoint subset, this property allows the distribution of redundant information in α independent availability zones to tolerate network partitions.

4.3.6 The Family of Triple Entanglements

Figure 4.6 shows a helical lattice for triple entanglements with s = 5 rows and p = 5 columns or diagonals. A data node di participates in three strands and, therefore, has degree d = 6. Consequently, each node has six neighbor nodes. For example, the colored vertices d21, d22, d25, d31, d32, d35 are at one hop of node d26. Nodes are classified according to their location in the lattice, for example: d21 is a top node, d22 is a central node and d25 is a bottom node.


di is tangled with ph,i, and the h index is:

Node di location   H strand   RH strand             LH strand
top                i − s      i − s·p + (s² − 1)    i − (s − 1)
central            i − s      i − (s + 1)           i − (s − 1)
bottom             i − s      i − (s + 1)           i − s·p + (s − 1)²

Table 4.1 – Entanglement rules to determine the triple entanglement’s input.

The entanglement of di creates pi,j, and the j index is:

Node di location   H strand   RH strand             LH strand
top                i + s      i + s + 1             i + s·p − (s − 1)²
central            i + s      i + s + 1             i + s − 1
bottom             i + s      i + s·p − (s² − 1)    i + s − 1

Table 4.2 – Entanglement rules to determine the triple entanglement’s output.

According to Equation 4.1, the total number of strands that build this lattice is 15. The set of edges is composed of three subsets (see Equation 4.8), E = EH ∪ ERH ∪ ELH, labelled horizontal, right-handed and left-handed, respectively. Additionally, each subset can be split into more disjoint subsets according to the parameters s and p. In the figure, each Ei contains 5 subsets, e.g., EH = EH1 ∪ EH2 ∪ EH3 ∪ EH4 ∪ EH5.

Given a node di, the incident edges, and therefore the nodes that are at one hop of di, are predetermined by the rules specified in Tables 4.1 and 4.2, explained in Section 4.4.

Algorithm 4.1 shows a Prolog code snippet that uses the arithmetic rules given in Tables 4.1 and 4.2 to check connections in triple entanglement lattices. The same code can be used to prove connections for double entanglements if one of two helical strands is removed. It assumes that vertices in the grid are labeled with consecutive numbers, in vertical order and from left to right, starting with the node d1 on the top left corner. For example, the vertex adjacent to d1 in the first row and second column is d6.

The code can be used to check the correct incident edges to di for a given lattice. As an example, we get the list of all incident edges to the top node d26 highlighted in Figure 4.6 using:

?- findall(Y, edge(5, 5, 26, Y, T), L).
L = [31, 21, 32, 25, 35, 22].

In the command above, we use the Prolog predicate findall() to list the index x of all possible px,26 and p26,x for the AE(3,5,5) lattice. When x < 26, the index goes on the left, as in p21,26, p22,26 and p25,26. When x > 26, the index goes on the right, as in p26,31, p26,32 and p26,35. The rule edge requires two parameters, S and P, to indicate the number of lattice rows (5) and columns (5), respectively. The third parameter is the node’s index, 26. These first three parameters must be specified as in the example given above. The fourth parameter Y corresponds to the missing index of a parity p26,Y or pY,26. The fifth parameter T indicates the strand and the position of the missing index. Since there are three strands, and two possible positions for the index (left and right), there are six valid atoms {h, hi, rh, rhi, lh, lhi}, where h, rh and lh indicate the strand and the suffix i indicates “inverse”.


Algorithm 4.1 — Check Lattice Connections

top(X, S) :- mod(X, S) =:= 1, S > 1.
bottom(X, S) :- mod(X, S) =:= 0, S > 1.
central(X, S) :- mod(X, S) > 1.

edge(S, _, X, Y, h)  :- Y is X + S.
edge(S, _, X, Y, hi) :- Y is X - S.

edge(S, P, X, Y, rh) :-  Y is X + S + 1, (top(X, S); central(X, S))
                       ; Y is X + S*P - S^2 + 1, bottom(X, S)
                       ; Y is X + P, S =:= 1
                       .
edge(S, P, X, Y, rhi) :- Y is X - (S*P - S^2 + 1), top(X, S)
                       ; Y is X - (S + 1), (central(X, S); bottom(X, S))
                       ; Y is X - P, S =:= 1
                       .
edge(S, P, X, Y, lh) :-  Y is X + S*P - (S - 1)^2, top(X, S)
                       ; Y is X + S - 1, (bottom(X, S); central(X, S))
                       ; Y is X + P, S =:= 1
                       .
edge(S, P, X, Y, lhi) :- Y is X - (S*P - (S - 1)^2), bottom(X, S)
                       ; Y is X - (S - 1), (top(X, S); central(X, S))
                       ; Y is X - P, S =:= 1
                       .

The last three atoms are used to search from the missing index towards the beginning of the lattice. For instance, edge(5,5,26,Y,h) outputs 31 for edge p26,31, but edge(5,5,26,Y,hi) outputs 21 for edge p21,26. The predicate findall() replaces T by all valid atoms and lists all possible values Y in the output L. Checking inexistent edges gives false.

Due to the arithmetic involved, this code snippet gives invalid answers for nodes very close to the beginning of the grid. This limitation does not affect the edge validation. In fact, the entanglement algorithm uses fictitious parity blocks for all p0,i. These fictitious parity blocks contain zeros and they are used when there are not enough blocks in the system.

4.3.7 Analogy with Other Codes

Alpha entanglement codes encode each data block α times using different parity blocks. Although different, they have similarities with coding schemes that code new information with information already coded. In serial concatenated codes, a series of codes are applied one after the other. In convolutional codes with feedback, the output of the shift registers is fed back and mixed with the new inputs. In staircase codes, coding progresses in a staircase fashion, and at each step new information is mixed with old information and old coded data to generate new coded data. Turbo codes and parallel concatenated codes encode the same data many times in parallel after interleaving it (Berrou et al., 1993; Berrou and Glavieux, 1996; Benedetto and Montorsi, 1996; Berrou and Glavieux, 2003). Alpha entanglement codes use

63 Chapter 4. Complex Entanglements

Figure 4.7 – Examples of entanglement tuples based on the lattice shown in Figure 4.6. The pp-tuples associated to d26 are {(p21,26, p26,31), (p25,26, p26,32), (p22,26, p26,35)}. The dp-tuples associated to p26,31 are {(d26, p21,26), (d31, p31,36)}.

parallel sub-coders, but each of them processes different subsets of data. For example, in parallel concatenated convolutional codes (PCCC) with feedback, the encoder is applied to two permutations of the same input. In other words, if the input is “abacad” one sub-coder receives “abacad”, and the other would receive the permutation “acbaad”. Instead, in α-entanglements(s,p), the input “abacad” is distributed across s sub-coders and each sub-coder generates α outputs (and the systematic one). The next section illustrates how the setting 3-entanglement(2,5) works.

4.4 Encoder

The encoder computes α parity blocks for each data input. The core of the encoder algorithm for complex entanglements is not very different from simple entanglements. The equation to compute a parity is a generalized version of Equation 3.1 given in the previous chapter,

pi,j ← di ⊕ ph,i ,   (4.9)

for i greater than zero. The values for the indexes h and j are determined by the entanglement rules.

Tables 4.1 and 4.2 determine the encoder’s input and output, respectively. The tables have three columns, one for each strand, and three rows for top, central and bottom nodes. For example, they can be used to determine the input/output parities of the RH entanglement for d26 of Figure 4.6. According to the illustration, d26 belongs to the RH1 strand with incident parities p25,26 and p26,32. Table 4.1 determines the h index for parity ph,26 as h = 26 − 5·5 + (5² − 1) = 25. Table 4.2 determines the j index for parity p26,j as j = 26 + 5 + 1 = 32. Note that single entanglements only use a small rule subset: the central location and H strand rules. In this particular case, Equation 4.9 is identical to Equation 3.1. Double entanglements need only the rules for the H strand and either the RH or the LH strand. Finally, triple entanglements make use of all rules.
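The following sketch (our own illustration; the function names are not from the thesis code) reproduces the RH-strand rules for top nodes from Tables 4.1 and 4.2 and the entanglement step of Equation 4.9, checked against the d26 example above:

    def rh_input_index(i: int, s: int, p: int) -> int:
        """Table 4.1, top node, RH strand: h such that p_{h,i} is the encoder input."""
        return i - s * p + (s * s - 1)

    def rh_output_index(i: int, s: int) -> int:
        """Table 4.2, top node, RH strand: j such that the encoder creates p_{i,j}."""
        return i + s + 1

    assert rh_input_index(26, 5, 5) == 25    # p25,26 feeds the RH entanglement of d26
    assert rh_output_index(26, 5) == 32      # the RH entanglement of d26 creates p26,32

    def entangle(d_i: bytes, p_hi: bytes) -> bytes:
        """Equation 4.9: the new parity is the XOR of the data block and the input parity."""
        return bytes(x ^ y for x, y in zip(d_i, p_hi))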

The ultimate goal of the entanglement process is to create multiple tuples. An entanglement tuple is a two-element set that is used to re-create a data or a parity block. There are two types:


pp-tuple It is a 2-tuple composed of two consecutive parity blocks that are part of the same strand. Each data block has α associated pp-tuples, one for each strand class.

dp-tuple It is a 2-tuple composed of one data block and one adjacent parity block on the same strand. Each parity block has 2 associated dp-tuples, both belonging to the same strand.

[Figure 4.8: block diagram of a systematic encoder with two parallel sub-coders; the inputs di and di+1 pass through memory registers (XH0, XLH0–XLH3, YLH0, XRH0, YRH0–YRH3, YH0) and XOR gates to produce the strand parities pi−2,i, pi−8,i+1, pi−1,i, pi−2,i+1, pi−7,i, and pi−1,i+1.]

Figure 4.8 – A possible encoder for a helical lattice built with triple entanglements, s = 2 and p = 5.

Figure 4.7 presents some examples.
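In code form, the tuples from the Figure 4.7 caption could be represented as plain Python tuples (a toy representation, not the system’s actual data model):

    # pp-tuples of d26 in the AE(3,5,5) lattice of Figure 4.6: one per strand class.
    pp_tuples_d26 = [
        (("p", 21, 26), ("p", 26, 31)),   # horizontal strand
        (("p", 25, 26), ("p", 26, 32)),   # right-handed strand
        (("p", 22, 26), ("p", 26, 35)),   # left-handed strand
    ]

    # dp-tuples of p26,31: both lie on the same (horizontal) strand.
    dp_tuples_p26_31 = [
        (("d", 26), ("p", 21, 26)),
        (("d", 31), ("p", 31, 36)),
    ]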

The lattice design allows the computation of α·s parities in parallel. Full column writes are possible since the helical lattice grows from top to bottom and from left to right; hence, all the needed blocks are available to encode a column simultaneously. A column is composed of s data blocks di with i ∈ [c + 1, c + s] for some constant c multiple of s. For further optimization, the entanglement process needs to keep in memory α·s parities; if s = p, these parities were calculated in the previous entanglement. More details are given in Section 7.3.6.

Encoder Example

Figure 4.8 presents a schematic design diagram for a 3-entanglement(2,5) encoder. This encoder could be implemented in software or hardware. The design illustrates the analogy to parallel concatenated convolutional codes given in Section 4.3.7. The figure shows two (s = 2) identical sub-coders in parallel, each of them processing a distinct data block input.

[Table 4.3 lists, for clock ticks 1–10, the inputs di and di+1, the contents of the horizontal (XH0, YH0), right-handed (XRH0, YRH0–YRH3) and left-handed (XLH0–XLH3, YLH0) strand registers, and the output parities pi−2,i, pi−1,i+1, pi−7,i, pi−2,i+1, pi−1,i, and pi−8,i+1.]

Table 4.3 – State of the registers and parity outputs (pi,j) for the encoder shown in Figure 4.8. Some columns correspond to registers that store the result of XORing one of the inputs with one of the outputs. Since the output is fed back into the system, a table value that has "+1" indicates the cases when the feedback line is positive, i.e., the output pi,j = 1. For example, register XH0 = di ⊕ pi−2,i has the value "1+1" at time 8 because pi−2,i(8) = 1.

Data blocks are identified with a sequential index number i. The naming rule does not imply that di and di+1 arrive at a different time. Both inputs are processed at the same time. Each of these sub-coders has three (α = 3) pipelines (one per strand). In addition, there is a cross feedback between sub-coders. This cross-feedback is needed to construct the helical strands (RH and LH outputs). Basically, the input sequence is kept in register memories. In this case, 12 registers are needed, more precisely, 6 registers in each of the 2 (s) sub-coders: a) one register for the horizontal strand, b) four (p − 1) registers for the helical strand that revolves around a virtual central axis of the lattice and meets the same horizontal strand at a future time, and c) one additional register for the helical strand that generates the parity that is immediately used in the next step.

Table 4.3 shows the state of the memory registers and outputs corresponding to this encoder example. For the sake of simplicity, we assume that a block is 1-bit long. In real applications, block encoding, e.g., 1 MB, can be performed in parallel by multiple encoders. A clock controls the movement of data from one register to another. At each clock tick, the bits are shifted and combined to form the output parities. Initially, all registers contain 0. Their value is updated when new data arrives. In the example, the encoder processes a finite amount of data, but the encoder can process data continuously.

The example shows how the stream 1010011001110110100101 is encoded. At each tick, two digits are moved into the inputs (di and di+1). Equivalently, we could divide the message into two input streams: the one containing the odd bits is assigned to di and the second stream, which is formed with the even bits, is assigned to di+1. Internally, the encoder uses Equation 4.9 to generate the parities for each strand with each of the two inputs. Therefore, at each clock tick the encoder releases the six parities used in the right part of the equation. Since the code is systematic, the output lines include the inputs too. The output parities, at a certain time, were generated with data block inputs from different clock ticks. The actual block input is indicated in the left part of the output parity index, which was obtained with Table 4.1.

The registers allow keeping the new parities (the left part of the same equation) in memory until the proper clock tick enables re-encoding them with future inputs and releasing them. Some lines are formed with a serial combination of registers. The content of each register is transferred from left to right at each clock tick. In other words, in a serial combination of registers X0 and X1, the value of X1(t) is X0(t − 1) at time t. Each line’s output is fed back into the first register of the same or another line, which is preceded by a XOR gate. According to the truth table of XOR, the feedback can change the register value only if it is “1”. Table 4.3 indicates these cases by adding “+1” in all the first registers (XH0, YH0, XRH0, YRH0, XLH0, YLH0). Finally, at clock tick 10, the registers keep their values until the encoder processes the next incoming stream.

4.5 Decoder

In the absence of faults, read requests are answered directly with the uncoded data. Decoding is an alternative way to obtain data through redundant sources with a twofold purpose: recover data in the presence of system faults and allow load-balanced reads.

This section presents three mechanisms for decoding data in the presence of system faults, namely local, zonal, and global repair. The size and characteristics of the failure event dictate which method is appropriate. The methods are valid for entanglements with α ≥ 2, except for local repair, which applies to single entanglements too.

System faults can impact various elements of the helical lattice. The possible scenarios are roughly categorized into three groups:

1. Small local damage affects only a small region with a single data node, available or not, as its epicenter. It occurs when certain conditions are met: (a) the maximum number of missing incident edges is α − 1, and (b) each missing edge has at least one of its associated dp-tuples available. Exceptionally, two incident edges that belong to the same strand for entanglements with α = 2 are part of this category, provided that the second condition is met. The restrictions assure that repair will succeed in one round and all elements can be repaired in parallel.

2. Medium zonal damage affects various consecutive elements on the same strand to a degree that decoding requires more than one round. Each missing data node must be part of a small local damage. It is called a zonal damage since this damage may impact on one availability zone as described in Section 4.3.5. Availability zones are implementation details discussed in Section 7.3.4.

3. Large global damage affects a wide area of close nodes and could be a combination of the other two damage categories. All erasure patterns that are not part of the previous categories and require more than one repair round fall in this category.


[Figure 4.9: lattice section between reference nodes d1, d71, d141 (top row) and d7, d77, d147 (bottom row), with damaged regions 1–4 and their periodic repetitions 1′, 2′, 3′.]

Figure 4.9 – System faults impact the helical lattice and create different scenarios: regions 1 and 2 are small local damaged areas, region 3 is a medium zonal damage, region 4 is a global damage. In the example, lattice elements are mapped regularly to identical storage devices, causing damages to appear periodically. The nodes d1, d7, d71, d77, d141, and d147 are given as reference positions.

The scenarios are tentatively classified as small, medium and large regions, although the number of elements in each scenario is variable. Examples for the three scenarios are illustrated in Figure 4.9. In addition, the figure shows periodic damages that occur if lattice elements are mapped regularly to identical storage devices, i.e., a damaged region will appear at regular intervals in the structure.

4.5.1 Overview of the Repair Algorithms

The basic function of the repair algorithm is to recover the structure of the helical lattice by recreating the exact missing blocks. We assume that one or more repair agents trigger repair operations according to system policies, which are not discussed in this document. According to the scenarios described before, there are three repair approaches.

Local Repair

Local repairs can recover small local damaged areas, and other small erasure patterns, with a success rate of 100% as long as tuples associated with all missing elements are available. Successful repair means that all elements are recoverable. The expression used to re-create a data block is a general form of Equation 3.2, given in the previous chapter for simple entanglements,

di ← ph,i ⊕ pi,j .   (4.10)

Values for the parameters are taken from one of the α pp-tuples associated with the missing data node di.

The repair method to re-create a parity block uses Equation 4.9, with the values for parameters taken from one of the two dp-tuples associated with the missing parity.

The entanglement tuples are obtained with the rules defined in Tables 4.1 and 4.2.
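A minimal sketch of the two local repair operations (Equations 4.10 and 4.9); the tuple arguments are the raw block contents, and selecting which available tuple to use is left to the caller:

    def repair_data(pp_tuple):
        """Eq. 4.10: rebuild a missing data block d_i from an available pp-tuple (p_{h,i}, p_{i,j})."""
        p_hi, p_ij = pp_tuple
        return bytes(x ^ y for x, y in zip(p_hi, p_ij))

    def repair_parity(dp_tuple):
        """Eq. 4.9: rebuild a missing parity block from an available dp-tuple (d_i, p_{h,i})."""
        d_i, p_hi = dp_tuple
        return bytes(x ^ y for x, y in zip(d_i, p_hi))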


Region   Nodes   Tuples              Edges    Tuples
1        d23     (p15,23, p23,31)    p17,23   (d17, p11,17)
         -       -                   p23,30   (d30, p30,37)
2        -       -                   p25,32   (d25, p18,25)
         -       -                   p32,39   (d39, p39,46)
3        d35     (p28,35, p35,42)    p22,35   (d22, p16,22)
         d41     (p34,41, p41,48)    p35,41   (d35, p22,35)
         d47     (p40,47, p47,54)    p41,47   (d41, p35,41)
         d53     (p46,53, p53,60)    p47,53   (d47, p41,47)
         d59     (p52,59, p59,66)    p53,59   (d53, p47,53)
         d65     (p58,65, p65,72)    p59,65   (d59, p53,59)
         d71     (p64,71, p71,78)    p65,71   (d71, p71,84)

Table 4.4 – List of lost nodes and edges from regions 1-3 shown in Figure 4.9 and the elements used by the decoder algorithm (pp-tuple or dp-tuple). The symbol "-" indicates that the tuple is not available in the first decoding round.

Zonal Repair

This method combines local repairs recursively to recover a medium zonal damage with a success rate of 100%. The total number of recursive calls is computable under the assumption that the repair algorithm works on a snapshot of the system, or that faults do not occur while the repair algorithm is running. It is at most ⌈n/2⌉ + 1, with n being the number of missing edges. Repairs occur at both extremities of the damaged strand section and proceed towards the centre. The maximum number of recursive calls is reached when the edges at the extremities do not have any dp-tuple available and one extra step is needed to repair data blocks.

Global Repair

This method combines local and zonal repairs to recover from large damages. A straightforward repair strategy is a greedy algorithm. The downside of greedy repairs is that the algorithm may first repair the nodes with less risk of permanent loss. If failures occur during repairs, the most vulnerable nodes might become irrecoverable. Then, if the system does not have the resources to fix all local repairs at once, it may be safer to use a local-wise strategy. A local-wise strategy uses a weight function W(di) that measures node risk to prioritize repairs. The simplest weight function calculates the number of missing incident edges. An improved weight function W(di, ho) would consider the states of neighbor nodes up to a certain number of hops ho.
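A possible form of the simplest weight function (a sketch under the definition above; incident_edges is an assumed lookup derived from Tables 4.1 and 4.2, and available is the set of readable blocks):

    def weight(d, incident_edges, available):
        """W(d_i): number of missing incident edges of data node d."""
        return sum(1 for e in incident_edges(d) if e not in available)

    # A local-wise strategy would then repair the most at-risk nodes first, e.g.:
    #   for d in sorted(missing_nodes, key=lambda n: weight(n, incident_edges, available), reverse=True):
    #       ...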

Global repairs have a success rate that ranges from 0 to 100%. The success rate is affected by the appearance of irreducible failure patterns. More information about irreducible failures is given in the next chapter.


Algorithm 4.2 — Global decoder with a greedy strategy.

1: K : list of missing elements’ keys k1, k2, ...   ▷ k is the index of a data node dk or an edge pk
2: R ← ∅   ▷ Create and initialize a sublist for elements that meet the conditions for local repairs
3: repeat
4:     for each k ∈ K do
5:         if ISLOCALLYREPAIRABLE(k) then   ▷ Verify if conditions for local repairs are met
6:             R ← k   ▷ Append the key k to R
7:         end if
8:     end for
9:     for each k ∈ R do
10:        if ISNODE(k) then
11:            tPP ← GETPP-TUPLE(k)   ▷ tPP = (p, p′) is an available pp-tuple associated with dk
12:            dk ← p ⊕ p′   ▷ dk is the recovered data block
13:        else
14:            tDP ← GETDP-TUPLE(k)   ▷ tDP = (d, p) is an available dp-tuple associated with pk
15:            pk ← d ⊕ p   ▷ pk is the recovered parity block
16:        end if
17:    end for
18:    R ← ∅
19: until ISEMPTY(K)   ▷ If there are no missing elements, go to sleep

4.5.2 Repair Details

Given the lattice in Figure 4.9 and assuming it corresponds to AE(3,7,7), we can describe how the repair methods work. Regions 1 and 2 are small locally repairable regions; hence, all tuples are listed as available. Region 3 is an example of a medium zonal damage. We know that, for region 3, the algorithm will be successful in at most 5 recursive calls. Region 4 is an example of a global damage. Table 4.4 enumerates the missing elements and, for each of them, one of their associated tuples to use during the repairs of regions 1 to 3.

The repair method can repair all missing elements from regions 1 and 2 indicated in Figure 4.9 simultaneously. Region 1 has three missing elements: one data node, d23, and two incident edges, p17,23 and p23,30, from different strands. Region 2 has two missing elements: two edges, p25,32 and p32,39, from the same strand. Both regions can be repaired in a single round using the available tuples indicated in Table 4.4.

Region 3 has fourteen missing elements: seven data nodes and seven edges from the same strand. The first repair round recovers all data nodes d35, d41, d47, d53, d59, d65, and d71 and the edge p22,35. The second round recovers p35,41 and p65,71. The third round recovers p41,47 and p59,65. Finally, the fourth round recovers p47,53 and p53,59. If the damaged area is not affected by more failures during repairs, then the algorithm succeeds after four calls.

Region 4 is a larger area that comprises multiple damages and, therefore, it starts a global repair process. Algorithm 4.2 presents pseudocode to repair the missing elements in a greedy fashion. The first for loop at line 4 creates a list with all the elements that are locally repairable. The second for loop at line 9 proceeds to repair them. The process is repeated until the list of missing elements becomes empty. Figure 4.10 illustrates how the greedy algorithm repairs most of the missing elements after the first invocation.

[Figure 4.10: two views of the global damage in region 4 (between d141 and d147); shading distinguishes the elements being repaired from the elements used in local repairs, before and after the first invocation.]

Figure 4.10 – The left picture identifies the elements involved in local repairs using a greedy strategy. The right picture shows the results after the first invocation.

[Figure 4.11: recovery tree rooted at the lost block d26; the first level uses the H-strand pp-tuple {p21,26, p26,31}, deeper levels use dp-tuples such as {d31, p31,36} and pp-tuples from the RH or LH strands, down to depth h = 4 (leaves include d11, d16, d21, d31, d36, d41).]

Figure 4.11 – A recovery tree upon global damage in an AE(3,5,5) lattice. The root d26 is recovered if all leaves of the recovery tree are available (or can be recursively regenerated). For simplicity, only d-blocks have labels. This tree is initiated with pp-tuples from the H strand. Only at inner d-block nodes can the strand be changed.

4.5.3 Reads During Failure Mode

What happens if the list of missing blocks is not known? This can happen, for instance, if a user is trying to retrieve one block from an unavailable location. Additionally, other locations could be unavailable without the user’s awareness. The greedy algorithm presented in Algorithm 4.2 is useful for recovering the system from failure mode, but it requires knowledge of the blocks that are not available; hence, it is suited to internal system maintenance. However, global repairs may not be triggered, especially if the locations are only temporarily unavailable. From the perspective of a user, it is important to recover a specific block even if the system is not working as expected.

The process of recovering a single block affected by a global damage is described by a recovery tree. The tree leverages the inherent redundancies in a helical lattice. The process is initiated when none of the α pp-tuples that facilitate a local repair is available or complete. In essence, each pp-tuple is recovered by repairing each of its constituent p-blocks, which are recovered by means of dp-tuples. Recursively, the components of dp-tuples are regenerated using the corresponding tuples.


Figure 4.11 illustrates a possible recovery tree for node d26. The recovery tree is valid for the AE(3,5,5) lattice shown in Figure 4.6. The root d26 is recovered if all leaves of the recovery tree are available (or can be recursively regenerated). The example requires a tree of depth h = 4 to read d26. This tree can be initiated with pp-tuples from any of the three strands. Once a strand is chosen, the same strand is used to do repairs at different tree levels. Only at inner d-block nodes does the strand change. The reason for this is that p-blocks have only two ways of being recovered and both of them are defined on the same strand. Because d-blocks belong to α strands, each inner d-block diversifies the possible trees, creating multiple options. The parent node of an inner d-block is a missing p-block. That leaves α − 1 pp-tuple options for each inner d-block.
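The recursion behind the recovery tree can be sketched as follows (our own simplified illustration, not the thesis implementation: tuples_of is an assumed callback returning the pp-tuples of a d-block or the dp-tuples of a p-block, available maps block identifiers to their contents, and the depth bound plays the role of the tree depth h; a real implementation would also avoid revisiting blocks):

    def read_block(bid, available, tuples_of, depth=4):
        """Return the content of block `bid`, rebuilding it recursively if it is missing."""
        if bid in available:
            return available[bid]
        if depth == 0:
            return None                                  # give up on this branch
        for left, right in tuples_of(bid):               # pp-tuples for d-blocks, dp-tuples for p-blocks
            a = read_block(left, available, tuples_of, depth - 1)
            b = read_block(right, available, tuples_of, depth - 1)
            if a is not None and b is not None:
                return bytes(x ^ y for x, y in zip(a, b))    # Eqs. 4.9 / 4.10
        return None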

4.6 Evaluation

This section presents preliminary experiments done for 5-HEC codes and simulations to compare alpha entanglement codes with (k,m)-codes.

4.6.1 Initial Assessment

The experiment consists of performing the actual encoding of a 700 MB file with three different methods. The file is split into 700 blocks of 1 MB. The three encoders are: 5-HEC, 4-way replication and (4,12) RS. The 5-HEC encoder and the 4-way replicator generate three redundant blocks for each individual data block. The (4,12) RS encoder takes four data blocks and generates 12 redundant blocks. For all cases, the system has 700 · 4 total blocks. The purpose is to show the ability of different codes to recover data after a massive failure event.

A failure scenario is generated by choosing blocks at random and deleting all selected blocks. The percentage of deletions is gradually increased for each scenario. For the case of replication, data loss occurs when the four existing copies of one block are deleted. In the RS-encoded file, data loss occurs when 13 blocks of a stripe of 4 + 12 blocks are deleted. In the 5-HEC-encoded file, data loss occurs when deleted blocks form an erasure pattern not tolerated by the lattice.
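The scenario generator can be reproduced with a few lines (a sketch of the experiment driver only; count_unrecoverable stands for the code-specific recovery check, which is where 5-HEC, replication and RS differ):

    import random

    def run_scenario(total_blocks, percent_deleted, count_unrecoverable, seed=0):
        """Delete `percent_deleted`% of the blocks uniformly at random and report data loss."""
        rng = random.Random(seed)
        n_deleted = int(total_blocks * percent_deleted / 100)
        deleted = set(rng.sample(range(total_blocks), n_deleted))
        return count_unrecoverable(deleted)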

Table 4.5 shows the percentage of the file blocks that remain unrecoverable after trying to recover them with the remaining blocks. The results provide evidence of the strong positive impact of entanglement on data durability. With 55% of deleted blocks in the 5-HEC structure, only a small percentage of data cannot be recovered. Data loss is due to the appearance of minimal erasure patterns with size 11. We tried different configurations of HEC and found that we could recover all data when a 200-HEC was used. A 200-HEC lattice generates the same number of encoded blocks per data block as the 5-HEC lattice. The main difference is that one of the strands in which the data block participates will not grow until 400 (2 · 200) blocks are added to the system. Because the blocks located at the end of the lattice are not fully entangled with other data blocks, these blocks require some additional protection. In other words, a 200-HEC lattice increases the overhead generated for the last 400 blocks, for example by replicating them. Although 400 additional copies may seem a large number, it becomes insignificant when we consider a system with millions of blocks.


Percentage of deleted blocks         5      15     25     35      45      55

5-HEC
  Data loss                          0.00   0.00   0.00   0.00    0.00    2.41
4-way replication
  Data loss                          0.00   0.05   0.25   1.72    4.41    9.26
  Vulnerable data                    0.05   1.27   5.20   10.78   20.29   31.08
(4,12) Reed-Solomon
  Data loss                          0.00   0.00   0.00   0.10    0.39    3.19
  Vulnerable data                    0.00   0.00   0.00   0.15    0.88    14.56

Table 4.5 – Data loss and redundancy loss.

We also indicate the percentage of data that is vulnerable, i.e., that lost all redundant blocks. Data vulnerability is an interesting metric that can be used to estimate the risk of data loss if there are more failures. This measurement was limited to classical codes, as it was difficult to measure the loss of redundancy in a system where all content is interdependent. Another limitation of this study is that placements are not considered, i.e., we assume that each block is stored in a distinct location. Finally, this study only considers 700 blocks. It is expected that even 200-HEC may lose some data if the system stores a very large number of blocks. In other words, it is more likely that a pattern that causes data loss will appear in a larger sample.

Our recent evaluations done in a simulated environment extend this study.

4.6.2 Simulations

We run experiments over millions of synthetically generated blocks without performing any actual encoding or decoding of files. The simulations allow us to extend our previous study on data loss and vulnerable data in an efficient manner. This evaluation takes into account the distribution of blocks in a certain number of locations to investigate random placements. Each location could be a disk in a RAID-like system or a storage node in a peer-to-peer network. Locations may potentially fail or be temporarily unavailable. These experiments are conducted over many redundancy methods, which cause different percentages of storage overhead. Comparing a larger number of codes lets us address the question of how much redundancy is needed to keep data safe in unreliable environments without requiring maintenance. Finally, we measure the impact of single failures for different redundancy schemes. The tools used in this study are an ongoing effort that we plan to make available online (Estrada-Galinanes, 2017).

Examples of unreliable environments. An unreliable environment could be a peer-to-peer (p2p) network where nodes join and leave frequently.


Cost              RS(10,4)   RS(8,2)   RS(5,5)   RS(4,12)   AE(1,-,-)   AE(2,2,5)   AE(3,2,5)
Extra storage     40%        25%       100%      300%       100%        200%        300%
Single failures   10         8         5         4          2           2           2

Table 4.6 – Redundancy schemes.

Stripe   Block   Location   Available   Repaired
100      0       54         True        False
100      8       98         True        False

Table 4.7 – Table representation for code RS(8,2): block values less than eight represent data blocks and values eight and nine represent parities.

Redundancy helps to protect data from real node departures and to maintain high availability when hosts are offline. A relevant work on p2p storage showed that large-scale cooperative storage is limited by unreasonable cross-system bandwidth (Blake and Rodrigues, 2003). This study brings interesting insights from real world p2p networks, although some results need to be updated to current hardware trends and Internet connections. The authors confirmed that node availability in a p2p network is very variable. They add that redundancy maintenance consumes most of the nodes’ resources. Another relevant remark from the same work is that for achieving high availability (6 nines) the system needs a replication factor of up to 120, while erasure codes require 15 times the storage. In archival storage systems, data durability is an endeavour that depends on the engineering aspects of the system but on the economics too (Adams et al., 2011; Rosenthal et al., 2012). To give a rough idea of maintenance cost, the cost of hardware repairs (mostly due to hard disk failures) for a datacenter with more than 100,000 servers was estimated to be over a million dollars by a Microsoft paper published in 2010 (Vishwanath and Nagappan, 2010).

Selected redundancy schemes. We study four different settings for Reed-Solomon codes and three different settings for alpha entanglement codes, see Table 4.6. Replication is not included in this study since the cost of storage overhead to obtain comparable fault tolerance values is too high.

Simulation model. The framework is designed to study what happens with data in environments that either accumulate plenty of failures before repairs take place or where a large number of failures happen all at once. The design is based on four principles: a) it should be easy to adapt to classical (k,m)-codes and alpha entanglement codes, b) it should allow us to reproduce results, c) it should help us find answers to questions that were not considered during the development of the simulation, and d) it should be easy to query and reply fast. Our choice was to generate records that represent all the blocks stored in the system and store them in a database to run diverse SQL queries easily.

We consider these four major aspects:


Left Id   Right Id   Block Type   Location   Available   Repaired
26        26         d            41         True        False
26        31         h            7          True        False

Table 4.8 – Table representation for code AE(3,2,5): d-block d26 and p-block p26,31 on the H strand.

data  Tables 4.7 and 4.8 show a simplified version of the tables that contain X-encoded blocks, where X is any of the codes listed in Table 4.6. Each table has two or three columns to identify a block. For example, RS codes use “Stripe” and “Block”, and alpha entanglements use “Left Id”, “Right Id” and “Block Type”. Column “Location” contains a number that determines the location of the block. Column “Available” indicates the availability of the block; its value is updated after failures and repairs. Finally, column “Repaired” indicates if the block was repaired. All experiments are done with 1 million data blocks, which means that the number of stripes is different for each code setting, e.g., RS(8,2) creates 125,000 stripes and RS(5,5) creates 200,000 stripes.

placements  A simple but realistic way of mapping data blocks to storage devices is random placement. This work presents results for a medium size system with 100 distinct locations. Therefore, each block is assigned a random number from 0 to 99. The model can be adapted to support other placement policies like the round-robin strategy or the copyset model, which are discussed in Section 7.3.4.

failures  A disaster is simulated by changing the availability of a certain percentage of locations. These figures are given in percentages, e.g., 10-50%. One unavailable location impacts the availability of many blocks. The column “Available” is updated with this new information.

repairs  For each unavailable block, a process updates the available column if there are enough blocks to repair it. Some codes require recursive repairs until all blocks are available or irrecoverable failure patterns are found. The columns “Available” and “Repaired” are updated accordingly.
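A condensed sketch of the placements and failures steps (the real tool stores the per-block records in a database and runs SQL queries; the function and variable names here are ours):

    import random

    def place_blocks(n_blocks, n_locations=100, seed=0):
        """Random placement: assign each block a location number from 0 to n_locations - 1."""
        rng = random.Random(seed)
        return [rng.randrange(n_locations) for _ in range(n_blocks)]

    def disaster(block_locations, percent, n_locations=100, seed=1):
        """Mark `percent`% of the locations unavailable; return the per-block 'Available' column."""
        rng = random.Random(seed)
        down = set(rng.sample(range(n_locations), int(n_locations * percent / 100)))
        return [loc not in down for loc in block_locations]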

Limitations. Spreading blocks evenly over nodes is a well-known scalability problem that affects load balance and system reliability. When locations are selected at random, not all stripes have their blocks "stored" in different locations. For example, in the case of RS(10,4), the original 1 million data blocks resulted in 0.4 million additional encoded blocks. The 1.4 million blocks were distributed to 100 locations with a mean of 14,000 blocks per site and a standard deviation σ = 130.88. In a total of 100,000 stripes, only 38,429 had their 14 blocks distributed to different locations. The rest of the stripes were spread over fewer distinct locations, given as locations (stripes): 8 (5), 9 (39), 10 (475), 11 (3,746), 12 (17,076), 13 (40,230). As expected, with more locations the blocks of the same stripe are better distributed across different locations, e.g., using 1,000 locations, 91,167 stripes have their 14 blocks in different locations. While, at first sight, the distribution of Reed-Solomon coded blocks is not even for a medium size system (100 locations), we have run other simulations with a larger number of locations (1,000) and the comparisons remain close to the ones presented here. An important remark is that the random distribution of α-entanglement encoded blocks in a medium size system may affect the code performance more severely since more blocks are interdependent.

[Figure 4.12 plots "How many data blocks are lost?" for a medium size system (100 locations) using a best effort placement policy: data loss after repairs (number of data blocks, log scale) versus disaster size (% of unavailable locations) for RS(10,4), RS(8,2), RS(5,5), RS(4,12), AE(1,-,-), AE(2,2,5), and AE(3,2,5).]

Figure 4.12 – Metric: Data loss. Data blocks that the decoder failed to repair.


Metric: Data loss. A stripe is damaged when more than m blocks are destroyed. The data loss metric only counts the data blocks that are at unavailable locations; other available data blocks that belong to damaged stripes are not counted as lost. Figure 4.12 shows that AE(3,2,5) outperforms RS(4,12) even though both are using the same storage overhead. Data loss for AE(1,-,-) is one order of magnitude higher than for RS(5,5), a code that uses the same storage overhead, but the gap between both curves decreases when the number of unavailable locations increases. Double entanglements excel at repairing blocks and their storage requirement is equivalent to using triplication. As expected, the curves show that data loss is lower as more storage is used to generate redundancy. Data loss is high when using RS(10,4), RS(8,2), RS(5,5) and AE(1,-,-); in all these settings the additional storage overhead is limited to 100%. These four settings may yield inadequate protection and low availability in unreliable environments unless other measures are taken.

Metric: Vulnerable data. We define the metric of vulnerable data to be the data blocks that are not protected by any other redundant block after the repair process recovers lost data. It illustrates how the level of redundancy in the system decays when only minimal maintenance operations are done. Minimal maintenance happens when the decoder repairs unavailable data blocks but not unavailable parities. However, some unavailable parities are repaired if they are collocated in the same stripe with an unavailable data block. A storage system can receive little maintenance under the following common scenarios: data block repairs are given priority, there is a lack of incentive to recover parities, or missing parities may be difficult to detect when repairs are triggered by data block read failures.

[Figure 4.13 plots "How much data is susceptible to loss? Redundancy degradation": blocks without redundancy (% of data blocks) versus disaster size (% of unavailable locations) for RS(10,4), RS(8,2), RS(5,5), RS(4,12), AE(1,-,-), AE(2,2,5), and AE(3,2,5).]

Figure 4.13 – Metric: Vulnerable data. Data blocks without redundancy.

Minimal maintenance can pose a threat to a big portion of the data in a storage system. Figure 4.13 shows the high percentage of data blocks that remain without redundancy when blocks are decoded with Reed-Solomon codes. We see that RS(5,5) performs worse than AE(1,-,-) and leaves more data without redundancy when failures affect more than 20% of the storage locations. This result explains why in Figure 4.12 the gap between the data loss curves for both codes becomes closer in large disaster scenarios. The RS(4,12) code is the only one comparable to the high protection provided by AE(3,2,5).

Our results are somewhat related to the impact of the combinatorial and peer availability effects mentioned by previous authors (Lin et al., 2004). These two effects explain the factors that benefit erasure codes and replication in p2p systems. Erasure codes benefit from the combinatorial effect; however, they are more dependent on the availability of multiple locations. The authors also mention that when peer availability is low, replication can be better than erasure codes. Alpha entanglements are less affected by the peer availability effect since only two blocks are used for repairs, but they benefit from the combinatorial effect too.

Metric: Single failures cost. Single-failure repairs are very expensive for erasure coded data. When one or more locations are unavailable, stripes are affected unevenly; however, the majority of them will have a single data block loss. In MDS codes, repairing a single failure is expensive as it requires k times the bandwidth overhead and I/O operations at k locations. The concept of single failures in alpha entanglements is orthogonal to the way single failures are seen in RS codes. Single failures are cheap for entangled data since a single data block is repaired with a fixed "k=2" (two parity blocks) for any code setting. Repairs in alpha entanglements are performed in several rounds. Figure 4.14 shows the percentage of data blocks that


[Figure: y-axis: single failures (% single/total loss); x-axis: disaster size (% of unavailable locations); curves: RS(4,12), AE(1,-,-), AE(2,2,5), AE(3,2,5).]

Figure 4.14 – Metric: Single failures. What part of repairs are single-failure repairs?

Code        10%  20%  30%  40%  50%
AE(1,-,-)     6    7    9   10   10
AE(2,2,5)     3    6    9   17   30
AE(3,2,5)     3    4    7   10   15

Table 4.9 – Metric: Code performance in AE codes: Number of repair rounds to recover all repairable data blocks with 10% to 50% unavailable locations.

required single-failure repairs. For alpha entanglements, we computed the ratio between all single failures solved at the first round and all the data loss repaired in the system. The figure shows that a large portion of the failures are single failures. Alpha entanglements allow efficient recovery from single failures by using parallel local repairs. For reference, we compute the same proportion (without considering rounds) for RS(4,12), which is the code with the highest locality among our selected RS code settings and superior to other locally repairable code constructions like the HDFS-Xorbas implementation (Sathiamoorthy et al., 2013). For RS codes, the repair efficiency improves when the percentage of unavailable locations increases, causing the number of single failures to drop.

Metric: Code performance. This metric is specific to alpha entanglement codes. At each round, our AE decoder computes one XOR between two available blocks for every data or parity block that is repaired. When a data block cannot be repaired at the first round, the decoder will do it at the second round if another required data or parity block becomes available. Figure 4.14 shows that most of the data is repaired at the first round. The number of blocks repaired per round decreases abruptly; normally the last round repairs only 1-2 blocks. But, how many rounds are needed to recover all repairable data blocks? This question is addressed in Table 4.9. This table shows that increasing α reduces the number of rounds needed to recover the lattice. For large failure scenarios (40-50%) the improvement is significant. The number of rounds required by the code setting AE(1,-,-) remains constant after reaching a failure scenario in which 40% of the locations are unavailable. The reason for this behavior is that the code is not able to recover more blocks even with more repair rounds.
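The round-based behavior can be sketched with a toy XOR decoder (a simplified model written for illustration, not the thesis implementation): every parity is assumed to entangle exactly two data blocks, and the loop repeats local repairs until no further block can be recovered.

def repair_rounds(data_ok, parity_ok, edges):
    """edges: all parity edges (i, j) of the lattice, one per p-block p_{i,j}.
    data_ok: set of available d-block ids; parity_ok: set of available edges.
    Returns the number of rounds and the d-blocks that remain unrecoverable."""
    all_data = {v for e in edges for v in e}
    rounds = 0
    while True:
        new_data, new_parity = set(), set()
        # A missing d-block is repaired with one XOR: d_j = p_{i,j} XOR d_i.
        for (i, j) in edges:
            if (i, j) in parity_ok:
                if i in data_ok and j not in data_ok:
                    new_data.add(j)
                elif j in data_ok and i not in data_ok:
                    new_data.add(i)
        # A missing parity is repaired from its two available d-blocks.
        for (i, j) in edges:
            if (i, j) not in parity_ok and i in data_ok and j in data_ok:
                new_parity.add((i, j))
        if not new_data and not new_parity:
            break
        data_ok |= new_data
        parity_ok |= new_parity
        rounds += 1
    return rounds, all_data - data_ok

# Example: a chain d1-d2-d3-d4 where only d1 survives and all parities are
# available; each round repairs the next block, so three rounds are needed.
print(repair_rounds({1}, {(1, 2), (2, 3), (3, 4)}, [(1, 2), (2, 3), (3, 4)]))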


4.7 Summary

We have presented the modelling, simulation and assessment of complex entanglements. Alpha entanglement codes are a practical and flexible erasure code solution designed to increase the reliability, performance and integrity of a storage system using redundancy propagation. We have shown that entangled data can survive catastrophic failures, the redundancy level does not degrade as fast as in Reed-Solomon coded data, single-failure repair is optimal and the number of rounds required to recover missing data is low. The encoder and decoder are lightweight and efficient, essentially based on exclusive-or operations, and offer promising trade-offs between security, resource usage and performance. In the following chapters, we analyze in more detail the reliability and fault tolerance of the codes, as well as the features that make entanglements an attractive solution for storage systems.

In the last two decades, many peer-to-peer systems have risen and fallen in popularity. The current research trend on erasure coding is to reduce the storage overhead as much as possible while improving the efficiency of repairs. However, the main interest of the industry is in reducing storage costs. If we consider the potential savings on maintenance costs, alpha entanglements make sense in a wide variety of scenarios.


5 Fault Tolerance and Reliability

This chapter presents additional studies to understand the fault tolerance and reliability of systems built with complex entanglement codes. We show the impact that code setting variations have on fault tolerance. Since the code parameters regulate the shape of the entanglement lattice, they specify the way redundant data is propagated. This study presents the outstanding effects of emergent properties of redundancy propagation. Even though the lattice is resilient to a large set of erasure patterns, some data loss is possible when certain patterns are present. Furthermore, the code is irregular, meaning that it can tolerate some patterns of a certain size but not all of them. We analyze erasure patterns and present upper and lower bounds for the data propagation method. To further complete the assessment we present a reliability study based on combinatorial methods. The first part of this chapter delves deeper into the study of alpha entanglements and the second part extends the work published in CLUSTER’15 (Estrada-Galinanes and Felber, 2015b).

5.1 Introduction

As explained by Shannon (Shannon, 1958), communication theory deals with reliability challenges analogous to the ones that appear when building reliable computers. In information theory, one wishes for reliable message transmission even if the reliability of transmitting a single symbol is poor. When building a large scale computing machine, a single error in the components, wiring or programming may cause incorrect behavior at the output. To introduce the circuit networks built with entanglement codes, we take the opportunity to recognize von Neumann, who made tremendous advances in the design of reliable computing machines by contributing to the field of automata theory. As mentioned by von Neumann (Von Neumann, 1956), Turing (Turing, 1937) and later McCulloch and Pitts (McCulloch and Pitts, 1943) proposed that automata are best suited to study intuitionistic logic, where the logical propositions can be electronic networks or idealized neural networks. McCulloch and Pitts’ thesis is that in a nervous system “for every net behaving under one assumption, there exists another net which behaves under the other and gives the same result, although perhaps not in the same time”. Von Neumann’s work was influenced by the biological model of computation in the

human brain. He also designed self-reproducing systems in order to accelerate the process of reprogramming the EDVAC (Von Neumann, 1993) machine. One of his models, which was described in an unfinished work, may suggest biological reproduction at the cellular or molecular level. His model is represented by an infinite array of cells, and the cells can be in twenty-nine possible states. The state of a cell at a given time is a function of its value at the preceding time and the values of its four neighbor cells at the preceding time. The state of the cell changes as time progresses, and with proper construction it is possible to make a certain group of neighbor cells act as a living entity that can keep its identity (a pattern of "active" cells), moving and causing other cells to take on a similar pattern.

The above-mentioned work brought enormous contributions to many areas. In particular, for the design of a highly reliable system, it means that an engineer could combine components in serial and parallel circuits in such a way that the whole system is more reliable than its component parts. This brings us to the circuit design in complex entanglement codes. Complex entanglement codes build a complex network of interdependent elements that propagates redundant information to neighbor elements as time progresses and new elements are “ingested” by the system. The circuits and cycles defined in a lattice permit reading an element following a certain path or another path, perhaps longer and therefore requiring more computation. The possible paths are determined by the entanglement code parameters. The paths themselves do not completely determine the system’s reliability since, at the application level, the engineer has the freedom to design different mapping algorithms to allocate the elements to the storage devices that build a reliable storage system.

In this chapter, we seek to complete the assessment of complex entanglements with an analysis of failure patterns. The motivation is to understand how the code parameters impact the fault tolerance of the code. Finally, we complete our evaluations with a reliability study based on combinatorial methods. We undertook this study to compare helical entanglement codes with other redundancy methods. Classical redundancy methods such as replication and Reed-Solomon codes can be modeled as parallel and “k out of n” redundant systems, respectively. Entanglement codes are not an exception. They can be modeled as a combination of serial and parallel paths. Based on these serial/parallel configurations, system reliability can be computed and expressed as a failure rate (Faraci, 2006).

5.2 Code Settings and Fault Tolerance

Codes that provide regular fault tolerance tolerate all erasure patterns that are smaller than the Hamming distance d. In contrast, codes that provide irregular fault tolerance tolerate more than the "m" failures tolerated by (k,m)-codes, but the extra tolerated failures are not arbitrary. Irregular codes are usually compared according to the distribution of these patterns. In information theory, the weight distribution of a code is given by computing the number of codewords of Hamming weight i for d ≤ i ≤ n, with n the length of the code. Several researchers came up with a variety of other metrics while studying irregular codes in the context of storage

applications. A number of studies used the minimal erasure list (MEL) and the fault tolerance vector (FTV). A minimal erasure is an irreducible pattern that causes the loss of data and parity blocks. The MEL (Wylie and Swaminathan, 2007) is the enumeration of all minimal erasures in irregular XOR-based flat codes. In that work, Wylie proposed the MEL algorithm. The FTV (Greenan et al., 2008a) aggregates the information from the minimal erasure list and computes the probability that data is lost given a number of failures. The FTV is the complement of the conditional probability vector (Hafner and Rao, 2006), which expresses the present state at state k in a reliability Markov model.

α-entanglements are irregular codes. We use a variation of Wylie’s study to gain more knowledge about the pattern size and its impact on data block loss. Wylie focused on characterizing the size of failure patterns without making the distinction between pattern size and actual data loss. Our goal is to characterize the size of the minimal erasure patterns that cause a certain amount of data loss. Our modification includes the notion that an erasure pattern of size y blocks only has x data blocks, with y > x. Ideally, we want patterns with y ≫ x since that means a high fault tolerance, which decreases the probability that the decoder fails, and when it fails only a small fraction of the blocks are data blocks. For many patterns we can increase the ratio y/x using the entanglement code parameters. An outstanding characteristic is that the entanglement parameters s and p permit increasing the ratio y/x without generating more storage overhead or increasing the repair cost of single failures. The increase in fault tolerance when tuning the code parameters is evidenced by a cross-study comparison of the size of minimal erasure patterns for multiple code settings. This study does not identify all erasure patterns for α-entanglements. To minimize the burden of such a task, we concentrate only on the most relevant patterns to determine which of them have lower and upper bounds for the redundancy propagation achieved by entanglement codes. The term lower bound means the smallest possible size for an arbitrary minimal erasure pattern that impacts x data blocks, i.e., fault tolerance for x cannot become worse (since changing the code parameters cannot reduce the size for patterns affecting exactly x d-blocks). The existence of upper bounds implies that there are patterns that impact x d-blocks and maintain their size when increasing s and/or p, i.e., the lattice structure at a local level only changes with α.

The motivation is to understand how the code parameters impact the fault tolerance of the code. If the erasure pattern becomes larger but the number of actually erased d-blocks remains constant, this is evidence that the code’s fault tolerance has improved.

5.2.1 Problem Complexity

We could not find in the literature a technique that helps us to identify erasure patterns with the level of detail described above. Our results are obtained by visual inspections of the lattice, and verifications can be done with a tool that was implemented in the Prolog language (see Appendix A). The problem is tractable if we evaluate the size of ME for small numbers of x d-blocks. We limit our study to x ∈ [2,8] in many different code settings. These are the

smallest and yet the most relevant patterns for entanglements with α ≤ 3. The influence of the smallest patterns on fault tolerance is easily observed in the FTV metric, whose i-th entry is calculated as

FTV_i = e_i / C(n, i),     (5.1)

where e_i is the total number of patterns of size i that cause data loss and the denominator C(n, i) is the total number of possible patterns of size i in a code of length n. In particular, if we assume that e_i is small, it is safe to focus only on the small patterns because the denominator becomes very large when the number of d-blocks tends to infinity. At a threshold i = j the numerator and denominator become comparable again, but those cases are not interesting for the task of identifying the lower and upper bounds mentioned before. Discarding the larger patterns simplifies our analysis while leaving space to deepen the study on how a particular pattern evolves with different code settings. To further reduce the problem complexity, we adopt a divide-and-conquer approach. It is easier to start with the analysis of the most basic patterns that cause data loss in single entanglements, namely, primitive forms. Then, we combine them to construct more complex patterns that serve to understand pattern behaviors.
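As a small illustration of Equation (5.1), the snippet below (Python; the e_i counts are invented for the example, not measured values) shows why small patterns dominate: the denominator C(n, i) grows very quickly with n for small i.

from math import comb

def ftv(e, n):
    """e: dict mapping pattern size i -> number e_i of erasure patterns of size i
    that cause data loss; n: code length (total number of blocks)."""
    return {i: e_i / comb(n, i) for i, e_i in e.items()}

# Hypothetical counts: even if e_6 >> e_3, the FTV entry for i = 3 dominates
# because C(1000, 3) << C(1000, 6).
print(ftv({3: 1, 6: 40}, n=1000))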

Concerning the selection of the graph model, the Tanner graph (Tanner, 1981) is the predominant graph representation among studies for constructing long codes from one or more shorter codes. The Tanner graph is a bipartite graph composed of variable nodes and check nodes. It has a compact design but it does not distinguish between our p-blocks and d-blocks. Our first attempt to provide a Tanner graph representation for entanglement codes was abandoned since both kinds of blocks would be represented identically as variable nodes. This abstraction does not permit the detailed study that we want to conduct. Consequently, this pitfall prevents us from using Wylie’s MEL algorithm since the program takes as input the code’s Tanner graph. Instead, we use a lattice graph like the representation given in Figure 4.6 and identify the induced subgraphs that are erasure patterns. We find that this is the most convenient way of showing the impact of code parameters on the size of erasure patterns.

To conclude, Wylie and Greenan conducted quantitative analyses of the fault tolerance of many XOR-based codes (Wylie and Swaminathan, 2007; Greenan et al., 2008a). That gives a complete idea of the size and quantity of the failure patterns for a particular code. We opt for a qualitative analysis on a small set of erasure patterns. That leaves holes in our knowledge but sheds light on an important aspect of the code that would be hidden in a quantitative study like those conducted by other researchers. Having said that, both approaches are valid and complementary.

5.2.2 Basic Notation and Preliminaries

We say that the lattice tolerates a certain erasure pattern if our decoder successfully recovers all the blocks that form the pattern. In this section we study the erasure patterns which cannot be repaired. The terms erasure pattern and minimal erasure (ME) are a variation

of Wylie (Wylie and Swaminathan, 2007). MEs are irreducible patterns in the sense that the removal of any of their blocks (when we do not specify, the term block refers indistinctly to d-blocks and p-blocks) opens the possibility to recover the erased blocks. In the context of alpha entanglements, an erasure pattern ME(x) is a set of erased blocks that results in x ≥ 2 irrecoverable d-blocks. Erasure patterns that result in a single irrecoverable d-block are disregarded. Erasure patterns composed of a single d-block can only occur at the beginning or end of an open entanglement chain or lattice. However, those can be protected with additional replication; hence, they are discarded for this study.

We introduce the notation |ME(x)|_A = y to say that the size of the minimal erasure that causes the loss of x d-blocks using code A is y. Data loss is due to irreversible damage in the lattice structure defined by a code A with parameters α, s and p. If the size of the pattern for a certain x becomes bigger by changing s and p, it means that we are increasing fault tolerance without increasing the storage or repair costs. Considering Equation 5.1, if now

|ME(x)| = y′, the pattern will be counted at FTV_j instead of FTV_i, where i = y, j = y′ and y′ > y. In other words, a larger pattern means that the failure needs to impact more elements in the system to cause the same data loss. The storage overhead and repair costs remain constant since α has the same value, i.e., the number of parities in the system is the same and single failures are repaired with the same number of blocks.

Our study shows that in many cases there is a positive correlation between x and y, but it unveils erasure patterns that have a critical behavior with some code parameters. The study is limited to the most relevant erasure patterns, i.e., the ones that cause a small amount of data loss and hence have a small |ME(x)|. Finally, we distinguish two types of nodes that describe the state of the nodes (d-blocks) that are members of an erasure pattern: dead nodes are irrecoverable nodes, members of the pattern; live nodes are available or otherwise repairable nodes, which are not actually counted in |ME(x)| but serve to define the graph that describes the pattern.

In single entanglements, if a live node is erased, it becomes a dead node. But in lattices, if a live node is erased, it may be repairable through any of the possible α directions that can be taken at any data node.

5.2.3 Primitive Forms

Primitive forms are the simplest erasure patterns. The following definitions provide a basic understanding of the conditions that need to be met to experience data loss:

Definition 5.2.1 (Permanently erased node - dead node). Let G = (V,E) be a lattice built with α-entanglements. A node v_i ∈ V is permanently erased, namely a dead node, iff v_i and its α incident edges e_ij are permanently erased.


Figure 5.1 – Primitive forms cause the loss of two data blocks in single entanglements. Black filled nodes and thick edges indicate unavailable elements that describe a failure pattern. The shortest form I is built only with dead members. The larger form II includes inner live members.

Note that the edges e_ij belong to α disjoint edge sets.

Definition 5.2.2 (Permanently erased edge). Let G = (V,E) be a lattice built with α-entanglements, let E = E_1 ∪ E_2 ∪ ··· ∪ E_α, and let E′ ⊂ E_k be a set whose edges describe a path of adjacent edges that connects two endpoint dead nodes. An edge e_ij ∈ E′ is permanently erased iff ∀ e_ij ∈ E′, e_ij is erased.

Now, let’s define a primitive form as follows,

Definition 5.2.3 (Primitive form). Let G = (V,E) be a lattice built with α-entanglements, let E = E_1 ∪ E_2 ∪ ··· ∪ E_α, let E′ ⊂ E_k, and let S_P ⊂ V be a subset of adjacent nodes containing at least two nodes in a path described by E′. A connected induced subgraph G[S_P] = (S_P, E′) is a primitive form iff at least the two endpoint nodes are dead and all the edges e_ij ∈ E′ are permanently erased in the lattice G.

Figure 5.1 shows two primitive forms. Form I has size 3 because it involves the loss of two data nodes (d-blocks) sharing an incident edge (p-block), whereas Form II has size 6 as it is composed of two dead nodes and four edges defining the path that connects them.

Primitive forms can cause data loss only in single entanglements. To elaborate more on this topic we define complex forms and minimal erasure patterns.

5.2.4 Complex Forms

A complex form is a connected graph built with multiple primitive forms. It is useful to represent erasure patterns, with the characteristic that the live nodes are not counted in the size of the pattern.

Definition 5.2.4 (Complex form). Let the lattice be a graph G = (V,E) constructed by α-entanglements with α ≥ 2, let S_Pi be a primitive form, let S_C ⊂ V be S_C = S_P1 ∪ S_P2 ∪ ··· ∪ S_Pj, and let G[S_C] = (S_C, E′) be the connected induced subgraph of G. A subgraph H = (S_C, E′′) of G[S_C] is


Figure 5.2 – Four examples of complex forms. The figure shows the changes to the primitive form ME(2) with different code parameters.

a complex form iff the edge set E′′ contains the edges incident to a node pair belonging to S_C, namely all the edges e_ij incident to dead nodes and the edges e_kl incident to at least one live node if (v_k, v_l) ∈ S_Pi.

Figure 5.2 illustrates the definition of complex forms by presenting four examples from distinct lattice configurations. Complex form A is a subgraph of a double entanglement lattice AE(2,1,1). Complex forms B-D are subgraphs of triple entanglement lattices with different s and p parameters: AE(3,1,1), AE(3,1,4) and AE(3,4,4), respectively. We can make two observations:

Multiple paths increase fault tolerance. First, increasing α increases the number of paths from one vertex to another, hence it increases the degree of dead nodes. The cardinality of the edge set of a given form, denoted |E(H)|, increases as α increases: |E(I)|_{α=1} = 1 < |E(A)|_{α=2} = 2 < |E(B)|_{α=3} = 3.

Longer paths increase fault tolerance. Second, increasing s and p increases the length of a path connecting two dead nodes. The number of edges in the horizontal path increases as p increases, efficiently increasing |E(H)|: |E(B)|_{α=3,s=1,p=1} = 3 < |E(C)|_{α=3,s=1,p=4} = 6, and the number of edges in the helical paths increases as s increases: |E(C)|_{α=3,s=1,p=4} = 6 < |E(D)|_{α=3,s=4,p=4} = 12. |E(H)| grows efficiently because the number of p-blocks needed to increase the code parameters s and p remains constant.

5.2.5 Minimal Erasure Patterns

A minimal erasure pattern ME(x)_A is an irreducible pattern for code setting A that causes the loss of x d-blocks in A. It is represented with a complex form H = (V,E), but if V includes live nodes, |ME(x)| does not include them. Live nodes are not counted in the size of the minimal erasure pattern since their availability does not impact the availability of the other elements in the graph. Because live nodes are part of other strands, which are not part of the pattern, they can easily be repaired in case of being erased. On the contrary, the decoder cannot recover the dead nodes because all the edges that connect them are destroyed. In this way,

the size of a minimal erasure pattern is |ME(x)| = x + δ, where δ is the minimum number of permanently erased edges in a pattern with x dead nodes.

At this stage, we can recap what we know about minimal erasure patterns.

Theorem 5.2.1. Let every v_i ∈ S_P be a dead node. A primitive form G[S_P] is a minimal erasure pattern ME(x) iff α = 1.

Proof. Seeking a contradiction, suppose that G = (V,E) is a lattice constructed with α > 1, and suppose that the primitive form G[S_P] = (S_P, E′) is a minimal erasure pattern. According to Definition 5.2.1, a dead node v_i has its α incident edges permanently erased. But node v_i has α − 1 edges e_ij ∉ E′; this contradicts the supposition that G[S_P] is a minimal erasure pattern.

Theorem 5.2.1 tells us that Form I (Figure 5.1) is a minimal erasure pattern. In fact, Form I is the sole irrecoverable triple failure with |ME(2)| = 3 presented in the vulnerability analysis done for single entanglements in Chapter 3. Form II is irreducible in the sense that all erased elements must be part of the pattern, and if one is removed all other elements can be recovered. However, its size |ME(2)| = 6 is larger than the former since the graph contains live nodes that augment the number of edges in the pattern.

All the complex forms shown in Figure 5.2 are minimal erasure patterns for the indicated lattice configurations. Now, it is possible to complete our previous observations:

Multiple paths increase fault tolerance. |ME(2)|_{α=1} = 3 < |ME(2)|_{α=2} = 4 < |ME(2)|_{α=3} = 5. As a consequence, an arbitrary pattern of three elements may cause data loss in a single entanglement but not in lattices with α > 1. Note that the compared values with α > 1 have s = 1 and p = 1.

Longer paths increase fault tolerance. The improvement is shown in the examples of Figure 5.2: |ME(2)|_{α=3,s=1,p=1} = 5 < |ME(2)|_{α=3,s=1,p=4} = 8 < |ME(2)|_{α=3,s=4,p=4} = 14.

5.2.6 Lower and Upper Bounds

In summary, we know that a larger α degree results in more paths, and larger s and/or p parameters result in longer paths. But are there any limits? We search for lower and upper bounds on the size of ME(x) to completely characterize erasure patterns. Our study was verified on 19 different lattices with x ∈ [3,8]. In an entanglement lattice (α > 1), the theoretical lower bound for |ME(x)| is nothing else than the number of dead nodes x plus the number of edges in an α-regular graph defined with x vertices.

Theorem 5.2.2 (Lower bounds for propagation).

|ME(x)| ≥ x + |E[α-regular graph]|.

Proof. In a lattice built with α-entanglements, a node v_i is permanently erased if v_i and its α incident edges are permanently erased. An edge is permanently erased if ∀ e_ij ∈ E(G[S_Pk]), the edges are permanently erased. To erase x nodes with the minimum number of elements, all the primitive forms G[S_Pi] must be the shortest possible (ideally with length 1) and the nodes must describe the smallest possible complex form to represent a minimal erasure pattern ME(x). The smallest theoretical complex form is an α-regular graph with x vertices (nodes).

The number of edges of an α-regular graph can be computed from Euler’s theorem (sometimes known as the handshaking theorem), which we transcribe here:

Theorem (Degree sum formula).

∑_{v ∈ V} deg_G(v) = 2|E|,

where deg_G(v) is the degree of a vertex v. This theorem says that the sum of all vertex degrees is equal to twice the number of edges, since each edge contributes exactly 2 to the sum. In particular, for r-regular graphs, if r is odd then the number of edges must be divisible by r.

We have to be careful when computing |ME(x)| when α and x are simultaneously odd. Euler’s theorem tells us that if the sum of the degrees of vertices with odd degree α is even, there must be an even number of those vertices. In other words, a regular graph with odd degree and an odd number of vertices does not exist. For our purpose, we found examples of complex forms ME(x) for α and x both odd that meet the lower bound in Theorem 5.2.2. Those graphs are not α-regular graphs. They have one or more nodes with degree r ∈ [α, 2α]. To compute the theoretical lower bound when the node degree α is odd and the number of erased nodes is odd, the number of edges must be a multiple of α.

Corollary 5.2.1 (From Theorem 5.2.2). The lower bound for |ME(x)| when α and x are odd is computed with |ME(x)| ≥ x + |E[α-regular graph]|.

Proof. From Euler’s degree sum theorem, we know that it is not possible to find an r-regular graph G = (V,E) with odd r and odd |V|. But if the graph has one or more outlier nodes v with degree deg(v) > r adjacent to nodes u with degree deg(u) = r, the number of edges is equivalent to the edges computed with Euler’s theorem for an ideal r-regular graph, since edges must be counted only once.
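The lower bound can be turned into a small computation. The sketch below (Python) follows one reading of Theorem 5.2.2 and Corollary 5.2.1: the edge count of an α-regular graph on x vertices is αx/2, rounded up to the next multiple of α when α and x are both odd; it reproduces the values listed in Figure 5.3.

from math import ceil

def me_lower_bound(alpha, x):
    """Theoretical lower bound for |ME(x)| = x + |E[alpha-regular graph]|."""
    edges = alpha * x / 2
    if alpha % 2 == 1 and x % 2 == 1:
        # Correction of Corollary 5.2.1: no alpha-regular graph exists, so the
        # edge count is rounded up to the next multiple of alpha.
        edges = alpha * ceil(edges / alpha)
    return x + int(edges)

# For alpha = 3 and x = 3..8 this yields 9, 10, 14, 15, 19, 20 (cf. Figure 5.3).
print([me_lower_bound(3, x) for x in range(3, 9)])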

Figure 5.3 shows minimal erasure patterns ME(x) with x ∈ [3,8] for triple entanglement lattices. These examples have the characteristic that they have the smallest number of edges, hence they meet the lower bound in Theorem 5.2.2. It is not possible to find another subgraph for the same lattice configuration with a smaller size. But the examples may not be unique as


[Figure 5.3 panels: |ME(3)| = 9 (α=3, s=1, p=1); |ME(4)| = 10 (α=3, s=2, p=2); |ME(5)| = 14 (α=3, s=2, p=2); |ME(6)| = 15 (α=3, s=3, p=3); |ME(7)| = 19 (α=3, s=3, p=3); |ME(8)| = 20 (α=3, s≥3, p≥3, p≥s).]

Figure 5.3 – Examples for ME(x) with x ∈ [3,8] having the theoretical lower bound for |ME(x)|. If x is odd, the ME(x) graphs have outlined vertices that lost more than α incident edges. If one of the "extra" edges is removed from the subgraph, its neighbor vertex is recoverable, and recursively all the rest. Isomorphic subgraphs, which have identical |ME(x)|, may exist. In all the examples for odd α, the |ME(x)| for odd x was computed according to Corollary 5.2.1 and with the corrections indicated above. For example, |ME(5)| = 5 + |E[3-regular graph]| uses a 3-regular graph with 5 nodes although it does not exist. Using Euler’s theorem for an ideal 3-regular graph, |E| = 7.5, or |E| = 9 when it is corrected to be a multiple of α; as a result, |ME(5)| = 14. The real graph has a node v_1 with deg(v_1) = 4 and a node v_2 with deg(v_2) = 5. Various cases for odd |V| are shown in the figure with outlined vertices to indicate the exceptions.

Our formula for lower bounds is in agreement with the observations given for Figure 5.2. For triple entanglements, the complex form B is the graph that meets the lower bound for |ME(2)|. When the lattice changes with larger s and/or p, |ME(2)| increases and thus fault tolerance improves.

A final observation regarding lower bounds is that the code’s fault tolerance is most impacted by the smallest |ME(x)| over all x. For certain configurations, there exists x > y such that |ME(x)| < |ME(y)|. Figure 5.4 shows an example with |ME(4)| < |ME(2)| in the code AE(3,2,9).

The next topic to address is the upper limit for |ME(x)|. To put it in other terms, we want to determine the ME(x) whose size |ME(x)| = c remains invariant with respect to lattice configuration changes. To understand upper limits we need to study the limitations embedded in the lattice geometry.


[Figure 5.4 panels: |ME(2)| = 15 (α=3, s=2, p=9) and |ME(4)| = 14 (α=3, s=2, p=9).]

Figure 5.4 – Critical erasure patterns: |ME(4)| is smaller than |ME(2)| for the same code settings.

Lattice Geometry Upper Bounds

The geometry of the lattice that models the redundancy propagation is defined by the code parameters (α,s,p). In very rough terms, s defines the number of rows, p defines the number of distinct columns and α defines the number of directions in which information is propagated. In a description that has a close resemblance with computational physics, the geometry can be described using its propagation units: "linear", "square", "cubic", and generally "α-hypercubic". A lattice of n nodes is composed of n distinct propagation units, with the ones at the right extremity not yet completed. The lattice grows regularly while the incomplete propagation units become closer to completeness and new incomplete ones are added. Figure 5.5 shows examples of square and cubic propagation units.

A formal definition for propagation units is:

Definition 5.2.5 (Propagation unit). Let the lattice be a graph G = (V,E) constructed by α-entanglements, and let Ω ⊂ V be a subset of nodes with |Ω| = 2^α. The nodes v ∈ Ω are a propagation unit iff they are the vertices of one of the α-hypercubes embedded in the lattice geometry and described by the α-regular induced subgraph G[Ω] = (Ω, E′) whose edge set satisfies |E′| = α·2^(α−1).

Having established the definition for propagation units, we enunciate the following lemma:

Lemma 5.2.1 (Number of primitive patterns in a complex form). Let H = (V,E) be a complex form in a lattice built with α-entanglements. The maximum number of primitive forms that form H is α·2^(α−1).

Proof. A complex form is defined for x dead nodes. We know that the complex form must include α primitive forms for each added dead node. At the lower bound, each primitive form is composed of a single edge. The primitive form connects two dead nodes, but the handshaking theorem tells us that the primitive form has to be counted only once. A closed figure that connects x elements in subsets of α elements is an α-hypercube. An α-hypercube contains α·2^(α−1) edges.



Figure 5.5 – Redundancy propagation occurs along α directions through overlapping propagation units of size 2^α.

Thus, the upper bound for propagation based on the definition of propagation units is:

Theorem 5.2.3 (Upper bounds for propagation). The induced subgraph G[Ω] is a minimal erasure pattern ME(x) with x = 2^α and |ME(x)| = 2^α + α·2^(α−1), invariant in s and p unless s < α.

Proof. In a lattice built with α-entanglements, a node v is permanently erased if v and its α incident edges, belonging to α different primitive forms, are permanently erased. Using Definition 5.2.5, we know that the set Ω contains the vertices of an α-hypercube. The α-hypercube is an α-regular graph on the vertices v ∈ Ω, each with deg(v) = α, with each of its incident edges belonging to α different paths. If all the elements of G[Ω] are erased, the decoder cannot recover them. Using Definition 5.2.4 and Lemma 5.2.1, ∀ S_Pi ⊂ E(Ω), |S_Pi| = 1. Thus, the α-hypercube is the smallest theoretical complex form. The connections of a propagation unit are embedded in the lattice geometry and its size does not change with s and p, with the exception of cases where s < α because of changes in the lattice topology.

Note that the upper limit for propagation is not tight:

Corollary 5.2.2. Each node v in the lattice belongs to 2^α propagation units that share some vertices.


Proof. Theorem 5.2.3 says that the information spread in G[Ω] is unique. If all the elements of G[Ω] are erased, the decoder cannot recover them with the partial redundant information outside G[Ω]. On the other hand, each node belongs to adjacent propagation units. Because the encoding is done in a regular fashion and a propagation unit has 2^α nodes, we can safely infer that each node can take different positions in 2^α adjacent propagation units.

An upper limit for propagation is the existence of an erasure pattern that causes data loss even though partial information about the erased blocks is spread in the lattice. This upper limit prohibits the use of s and p to increase the size of this particular erasure pattern, |ME(2^α)| = 2^α + α·2^(α−1). There are, however, other patterns that are still improved by tuning the code parameters. Most importantly, the notion of redundancy propagation still applies across the boundaries between adjacent propagation units.
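A quick check of the numbers behind Theorem 5.2.3 (a worked computation, not new results): the propagation-unit pattern erases x = 2^α d-blocks and has size 2^α + α·2^(α−1), independently of s and p (when s ≥ α).

def hypercube_pattern_size(alpha):
    """Size of the propagation-unit erasure pattern ME(2^alpha) of Theorem 5.2.3."""
    dead_nodes = 2 ** alpha               # vertices of the alpha-hypercube
    erased_edges = alpha * 2 ** (alpha - 1)
    return dead_nodes + erased_edges

print(hypercube_pattern_size(2))   # 4 + 4 = 8   (square, double entanglements)
print(hypercube_pattern_size(3))   # 8 + 12 = 20, matching |ME(8)| = 20 in Figure 5.3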

All-or-nothing Integrity Bounds

Initial work on entanglements (Aspnes et al., 2007) gave a definition of a storage system with all-or-nothing integrity (AONI) to formalize the notion of an AONI storage system, in which either all users can recover the data or no user does. That approach was a first step in the development of entanglement theory; however, it is not well suited for practical applications in storage systems since it hides the complications and costs of providing AONI to all data blocks (or documents). We revised the definition of all-or-nothing integrity to restrict the property to a subset of blocks and adapted the definition to our entanglement codes. In this case, the notation Pr_E(Q) denotes the probability that an event Q occurs when E is the erasure pattern in the lattice caused by a system’s failure. The recovery vector r summarizes which data blocks can be recovered with the decoding algorithm, with r_i = 1 if the data block is recovered and zero otherwise. For example, a subset of blocks d_1, d_2 and d_3 has the recovery vector r = 100 if after the decoder finishes its task only d_1 is recovered.

Definition 5.2.6 (AONI). A subset of d-blocks D = {d_1, d_2, ..., d_n} that belongs to a helical lattice encoded with α-entanglement codes is all-or-nothing with respect to a class of erasure patterns ℰ iff for all E ∈ ℰ,

Pr_E(r = 0^n ∨ r = 1^n) = 1.

With this groundwork in place, we can now state a corollary from Theorem 5.2.3 for the upper bounds of all-or-nothing-integrity:

Corollary 5.2.3 (AONI bounds). Let Ω be an arbitrary propagation unit of a lattice built with α-entanglements. Its α-regular induced subgraph G[Ω] has the all-or-nothing property.

It asserts that a set of data blocks that defines a propagation unit in a lattice for a certain α is protected with all-or-nothing integrity.


[Figure: size |ME(2)| (y-axis, 4-20) vs. p (x-axis, 2-8) for AE(2,2,p), AE(2,3,p), AE(3,2,p), AE(3,3,p).]

Figure 5.6 – |ME(2)| increases with larger s and p.

[Figure: size |ME(4)| (y-axis, 4-20) vs. p (x-axis, 2-8) for AE(2,2,p), AE(2,3,p), AE(3,2,p), AE(3,3,p).]

Figure 5.7 – |ME(4)| remains constant for α = 2, and increases with s for α = 3.

Proof. Let Ω = {d_1, d_2, ..., d_n} be a propagation unit. Then, Theorem 5.2.3 says that G[Ω] is a minimal erasure pattern E that belongs to the class ME(2^α) with |ME(2^α)| = 2^α + α·2^(α−1). Let E′ be a pattern that contains all elements of E except one (arbitrarily chosen). Because a minimal erasure pattern is irreducible, E′ ∉ ME(2^α) ⇒ Pr_{E′}(r = 1^n) = 1. Then, for all E ∈ ME(2^α), Pr_E(r = 0^n ∨ r = 1^n) = 1. Thus, Ω is all-or-nothing with respect to the class of erasure patterns ME(2^α).

5.2.7 Results

The curves presented here are examples of how the size of minimal erasure patterns changes with different code settings. Figure 5.6 reveals the positive correlation between the size of the minimal erasure pattern |ME(2)| and the code parameters. The study uses step increments in the code parameters α ∈ [2,3], s ∈ [2,3] and p ∈ [2,8]. Figure 5.7 illustrates the size changes for |ME(4)|. This example shows the upper bound in propagation due to the square and cube embedded in the lattices for α = 2 and α = 3, respectively. For double entanglements the curves are completely flat, but for triple entanglements the size of the pattern is smaller when s = p. This is due to the appearance of minimal erasure patterns ME(4) similar to the graph presented in Figure 5.4. The same figure shows that for triple entanglements, |ME(4)| increases with s. In both figures we can observe that |ME(x)| is minimal when s = p. Although the code can provide more fault tolerance when p increases, the setting s = p cannot be judged as good or bad per se. Trade-offs between performance and reliability should be considered to choose the best parameters.

Appendix A presents more results and an algorithm to verify patterns.

5.3 Reliability Analysis

In this section we assess entanglement codes from a different angle. This time the focus is on the reliability of an entangled system. System reliability can be defined as the probability that the system operates correctly during a period t. We note that correct operation in a storage system implies that the system is able to recover data even in the presence of node and disk failures. The experiments and simulations described in Section 4.6 showed the high data survivability of entangled data. We found that entanglement codes performed better than classical codes in many realistic storage scenarios. This section expands the comparison.

Estimating the reliability of a system means estimating the probability R(t) that the system will operate correctly over the time interval [0, t] given that it operated correctly at time t = 0. If the system is made of two redundant independent elements arranged in parallel, the estimation is straightforward, but the computation can become very difficult in a complex network. A widespread mathematical model to estimate reliability is the Markov model. The assumption behind this technique is that the process is memoryless. In other words, the transition to a future state of a system is controlled only by its current state. Moreover, it assumes that failures and repairs are Poisson (exponentially) distributed, i.e., failures and repairs occur at a constant average rate. Another assumption is that the number of events occurring in a certain time interval is independent of the number of events occurring in any other interval. The equations that handle transitions include the probabilities that the system will survive a certain number of failures. Thus, the model requires knowing all erasure patterns that cause data loss. For more information, we refer to the Electronic reliability design handbook MIL-HDBK-338-B (Handbook, 1982).

The study of fault tolerance for simple entanglement codes is relatively easy. The number of erasure patterns relevant for the study is low and constrained to combinations of missing elements in a single chain. This analysis has been done in previous work (Estrada-Galinanes and Felber, 2015b). Knowing the patterns permits the estimation of the system reliability with Equation 3.5. Chapter 3 presented the estimated reliability for various disk array configurations using a Markov model. The Markov model for simple entanglements uses a six-state transition probability diagram that represents the possible states the system can be in.

The task of designing a Markov model for complex entanglements is cumbersome. The number of erasure patterns relevant for the study is larger than for simple entanglements. As presented in the previous section, measuring fault tolerance in detail is difficult. The number of relevant erasure patterns grows exponentially with the number of chains. Besides, complex entanglements are designed to tolerate correlated failures. But the common assumption for Markov models that failures are independent events and exponentially distributed is not adequate to prove that complex entanglements can tolerate catastrophic failures.


[Figure: Entangled storage system. A read(d106) request can be served by d106 itself or via local repair with one of its pp-tuples: tRH = p103,106 ⊕ p106,113, tLH = p97,106 ⊕ p106,107, tH = p104,106 ⊕ p106,108.]

Figure 5.8 – Serial and parallel configurations in an entangled storage system. Each strand connection is a serial configuration composed of the elements of one of the pp-tuples (tRH, tLH or tH). The subsystem is available if both disk nodes are operational. The read request is satisfied if at least one of the four parallel subsystems is available.

The techniques used in this section are based on graph models of communication networks, as discussed in (Shooman, 2003). We estimate the reliability of a system that responds to one-block (data block) read requests. A system that stores data redundantly can respond with the actual data or by using one or more combinations of redundant blocks to decode the requested data. Chapter 4 presented multiple ways of recovering the same content for entanglement codes. Basically, the number of possible recovery trees for the same d-block root grows exponentially with α.

5.3.1 Reliability Concepts and Notation

Our targeted systems consist of a combination of elements in serial and parallel configurations. We begin our analysis by constructing the two fundamental diagrams. Each element or unit represents a disk node that is contacted during the process to read a d-block or a p-block. We use the term disk node to refer indistinctly to a storage device or to a distributed storage node. A serial configuration requires all units for a proper operation. In contrast, a parallel configuration is made with redundant units and requires only one unit to operate correctly. The purpose of the block diagram is to determine all possible paths that connect the input to the output. In our case, an input can be interpreted as a block read request and the output as the requested block.

We define the total system reliability for mission time T as the probability that the system

succeeds, R(T) = P{System Succeeds}. If the system has many components, the formula is expressed using the probability of success of each element, R(T) = P_s(X_1 and X_2 and ... and X_n > T). The probability that a system fails is derived from the equations above, P_f{System Fails} = 1 − P_s{System Succeeds}. To simplify the notation, we will assume that the mission time is T = 1 and omit the time reference in the formulas. Ultimately, the common way to refer to reliability is with the number of nines. For instance, if the estimation of reliability for the mission time unit yields R = 99.999%, we say the system has five nines reliability.

The probability that a disk node i provides the requested block is P(x_i), and the probability that it fails to satisfy the request is P(x̄_i). If failures are independent, the basic formulas for a serial and a parallel configuration of two nodes are:

P_s(series) = P(x_1)·P(x_2)     (5.2)
P_s(parallel) = 1 − P(x̄_1)·P(x̄_2)     (5.3)

When failures are not independent, the conditional probabilities must be considered in the equations.

An entangled storage system combines storage units in series and parallel configurations, see Figure 5.8. The example illustrates the read operation of one d-block by fetching the d-block itself or by performing local repairs through pp-tuples (tRH, tLH or tH). Based on Equation 5.3, the expression for the four-branch parallel configuration shown in the figure becomes:

P_s(parallel) = 1 − P(x̄_1)·P(t̄_RH)·P(t̄_LH)·P(t̄_H)     (5.4)

where P(t̄_RH) = P(t̄_LH) = P(t̄_H) are calculated using Equation (5.2). We finally introduce the cut-set method. A cut is a set of edges such that the graph becomes disconnected if they are removed. In particular, we are interested in the minimal cut-sets, whose edges do not form subsets that are also cut-sets. This method is interesting because there are typically fewer cut-sets than tie-sets, the alternative method that takes into account the sets of edges that connect the input to the output.
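The following sketch (Python, with illustrative names) evaluates Equations (5.2)-(5.4) for the local-repair view of Figure 5.8, assuming independent disk nodes with the same success probability p; it ignores global repairs, which add further parallel paths.

def series(*probs):
    """Equation (5.2): all units of a serial configuration must succeed."""
    r = 1.0
    for p in probs:
        r *= p
    return r

def parallel(*probs):
    """Equation (5.3): a parallel configuration fails only if every branch fails."""
    q = 1.0
    for p in probs:
        q *= 1.0 - p
    return 1.0 - q

p = 0.95
tuple_ok = series(p, p)                               # a pp-tuple needs both of its disk nodes
read_ok = parallel(p, tuple_ok, tuple_ok, tuple_ok)   # Equation (5.4)
print(round(read_ok, 6))                              # 0.999954, local repairs only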

The basic formula for computing reliability using cut-sets is:

R_in_out = 1 − P(C_1 + C_2 + ... + C_j)     (5.5)

It requires an expansion of 2^j terms (Colbourn, 1993). If the events are independent, however,


Equation (5.5) can be approximated. A lower bound can be computed as (Shooman, 1990):

R_in_out ≥ 1 − [P(C_1) + P(C_2) + ... + P(C_j)]     (5.6)

Reliability analysis of a large and complex system may become intractable without the help of a divide-and-conquer methodology. The number of calculations necessary for this study was significantly reduced by using a hierarchical decomposition into subsystems and a combination of the equations mentioned above. Similar treatments have been proposed, for instance, in (Rosenthal, 1977) and (Li and Haimes, 1992).

We now indicate how we perform the system decomposition. Our starting points are the components involved in the retrieval of one d-block. We consider local and global repairs (see repairs in Chapter 4). At the local repair level, there are three tuples that can regenerate the d-block in case the disk node that contains the actual d-block fails. If several disk nodes fail, a global repair process may be necessary. At the global level, many other nodes can provide the same content. Based on the recovery trees presented in the previous chapter, we design various reliability block diagrams. As a short reminder, a recovery tree is a binary tree that indicates the blocks that are used during a repair process and the order in which they are combined (see global repairs in Chapter 4). The root of the tree is the repair target (d-block). The leaves are the available blocks used in the recovery of the root and all inner nodes. For a certain root, many recovery trees are possible. In particular, we indicate the first pp-tuple that was used to build the tree (the elements of the tuple are the root’s children). This section discusses two reliability diagrams based on recovery trees built with the tH and tRH pp-tuples.

5.3.2 Model Assumptions

The computation of α-entanglement reliability depends on three factors: the recovery tree depth h considered for the global repair process, the number of independent resources d in the system (disk nodes), and the code parameters (α,s,p). Note that there is a trade-off between h and d. A small d increases the chances that blocks used in the repair process are in the same failure domain. If many blocks are stored in the same failure domain, the effective depth of the recovery tree may decrease. Finally, the code parameters determine the fault tolerance, as seen in the previous section. In an ideal scenario, each block in the system is stored in a different resource and the three code parameters are large.

Data blocks are encoded using AE(3,2,5). Each d-block and its 3 encoded p-blocks are assigned to distinct disk nodes with a round-robin placement policy. This policy simplifies the reliability computation since we know in advance which blocks are located in the same unit. We restricted the number of distinct disk nodes in the system to d = 80. That means that when resources are exhausted, the next incoming block in the system will share the storage location with other blocks relatively close in the lattice. This number of distinct disk nodes was not chosen arbitrarily; it is derived from the code parameter p = 5, with 80 = 16p (Estrada-Galinanes and Felber, 2015b).



Figure 5.9 – Graph-transformation method. Reduced reliability diagram for d106 using tH .


Figure 5.10 – Cut-sets method. Reliability diagram for d106 using tRH. Distinct shades indicate an element’s location: blocks with identical shade are located in the same resource, whereas white blocks are stored in independent distinct units.

This formula ensures that we have enough blocks to perform a global repair. In agreement with the placements, we limit the tree depth to h = 4. The recovery trees used in this study have deliberately pruned nodes and cycles are avoided, e.g., a child node cannot be used to recover a parent node if both blocks are stored on the same disk node.
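As an illustration of the placement assumption (a hypothetical sketch, not the code used for the study), a round-robin policy over d = 80 disk nodes can be expressed as a simple modular assignment in the order blocks are produced, which guarantees that a d-block and its three p-blocks land on four distinct nodes.

D = 80  # number of distinct disk nodes; 80 = 16p for p = 5

def round_robin_placement(blocks, d=D):
    """blocks: block identifiers listed in production order.
    Returns a dict mapping each block to a disk node id."""
    return {block: index % d for index, block in enumerate(blocks)}

# Example with d106 and its outgoing parities (identifiers as in Figure 5.8):
# four consecutive indices modulo 80, hence four distinct nodes.
print(round_robin_placement(["d106", "p106,107", "p106,108", "p106,113"]))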

We present two reliability block diagrams based on the recovery trees. Any path from input to output is valid to repair a d-block as long as all disk nodes that are part of the path are available. Note that the diagrams could be expanded with additional paths using different placement policies, e.g., with random placements and a larger d.

We assume that all units have the same probability of failure, with an unknown probability density function, and that disk failures are independent. For numerical evaluation, we use P(x_i) = p = 0.95 and P(x̄_i) = q = 0.05 for time t = t_1.

5.3.3 Reliability Computation

We start with the analysis of the block diagram with tH shown in Figure 5.9. This diagram corresponds to one of the possible recovery trees for d106. In the example, the tuple tH was used to decide the root’s children.

The reliability expression is the probability of the union of all events containing a valid path to

connect the input to the output. For example, one valid path to generate d106 at the output is {p104,106, p108,110, d108}.

We assume that all components used in the repair process reside on distinct disk nodes. However, all disk nodes are considered homogeneous and with identical probability of success. This second assumption allows us to reduce the diagram and perform fewer calculations.

We opt for graph transformations (Rozenberg, 1997) to divide the diagram structure into smaller subsystems. Hence, the structure is reduced using combinations of serial and parallel transformations. As a result, it becomes simpler to compute the reliability for each subsystem

and combine their contributions into the final reliability expression. The figure highlights subsystems in grey. We reuse the calculation for that subsystem when it appears in other parts of the diagram. The contribution of subsystem A (from Figure 5.9) is:

R_A = 1 − (1 − p)(1 − p^2)^2     (5.7)

Expansion of Equation (5.7) yields:

R_A = p^5 − p^4 − 2p^3 + 2p^2 + p     (5.8)

By carefully applying the formulas for serial and parallel configurations, the reliability offered by the H strand is:

R_H = p^2·((p − 1)·p·(p^8 − 3p^7 + 7p^5 − 4p^4 − 4p^3 + 2p^2 + p + 1)·((p − 1)·p·(p^2 − 2) + 1) − 1)^2     (5.9)

One can finally numerically evaluate reliability. For instance, for p = 0.95, it yields:

R_H ≈ 0.999926     (5.10)
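A short numerical check of Equations (5.7)-(5.10) (Python; it relies on the reading of Equation (5.9) reconstructed above):

p = 0.95

R_A = 1 - (1 - p) * (1 - p**2) ** 2                                    # Equation (5.7)
assert abs(R_A - (p**5 - p**4 - 2*p**3 + 2*p**2 + p)) < 1e-12          # Equation (5.8)

poly = p**8 - 3*p**7 + 7*p**5 - 4*p**4 - 4*p**3 + 2*p**2 + p + 1
R_H = p**2 * ((p - 1) * p * poly * ((p - 1) * p * (p**2 - 2) + 1) - 1) ** 2   # Equation (5.9)

print(round(R_A, 6), round(R_H, 6))   # 0.999525 0.999926, in line with Equation (5.10)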

The block diagram based on tRH (Figure 5.10) and its counterpart based on tLH (omitted) require judgement due to the placement policies. The helical RH- and LH-strands connect blocks that are more distant in time (and in the lattice). In our environment, we only guarantee that a section of the lattice containing 80 blocks resides on independent disk nodes. Therefore, some of the more distant blocks will inevitably be stored on the same disk node, hence reducing the lattice resilience.

We use the cut-set method to analyse the block diagram based on tRH . By using a different method, we can validate previous results. Table 5.1 enumerates all minimal cut sets.


Cut set                                               Term
C1:  S103_106, F103, S101_103, S102_103               q^4
C2:  S103_106, F103, S101_103, S103_112               q^4
C3:  S103_106, F103, S103_105, S102_103               q^4
C4:  S103_106, F103, S103_105, S103_112               q^4
C5:  S103_106, S96_103, S93_96, S86_93                q^4
C6:  S103_106, S96_103, S93_96, F93                   q^4
C7:  S103_106, S96_103, F96, S94_96, S87_96           q^5
C8:  S103_106, S96_103, F96, S94_96, S96_97           q^5
C9:  S103_106, S96_103, F96, S96_98, S87_96           q^5
C10: S103_106, S96_103, F96, S96_98, S96_97           q^5
C11: S106_113, S113_116, S116_123, F123               q^4
C12: S106_113, S113_116, S116_123, S123_126           q^4
C13: S106_113, S113_116, F116, S114_116, S107_116     q^5
C14: S106_113, S113_116, F116, S114_116, S116_117     q^5
C15: S106_113, S113_116, F116, S116_118, S107_116     q^5
C16: S106_113, S113_116, F116, S116_118, S116_117     q^5
C17: S106_113, F113, S111_113, S112_113               q^4
C18: S106_113, F113, S111_113, S113_122               q^4
C19: S106_113, F113, S113_115, S112_113               q^4
C20: S106_113, F113, S113_115, S113_122               q^4

Table 5.1 – Minimal cut sets of the structure shown in Figure 5.10.

The reliability lower bound using Equation (5.6) is:

R_RH = R_LH ≥ 1 − (8q^5 + 12q^4)    (5.11)

Substitution yields:

R_RH = R_LH ≥ 0.999922    (5.12)

As expected, the difference between the two methods is minimal. We simply use Equation (5.11) to compute an approximation of the final expression, which considers the d-block itself and the three diagrams built on recovery trees:

R_in_out ≥ 1 − q (8q^5 + 12q^4)^3    (5.13)

By using a value of q = 0.05, we obtain a reliability of 13 nines.

R_in_out with p = 0.95, q = 0.05
AE(3,2,5)        ≈ 1 − q (8q^5 + 12q^4)^3                     13 nines
4-replication    1 − q^4                                       5 nines
(4,12) RS        Σ_{r=k}^{n} C(n, r) p^r (1 − p)^(n−r)         14 nines

Table 5.2 – Comparison of reliabilities for various erasure codes.

5.3.4 Brief Discussion

The relevance of our analytical results becomes apparent when we compare them with other erasure codes. Replication is essentially a parallel configuration and Reed-Solomon is an r-out-of-n configuration. Table 5.2 provides values for some specific configurations.

In our estimations we assume a restricted environment of only 80 storage devices. Therefore, the calculations lead to a conservative estimate of the reliability, and we can expect even higher resilience when using more resources and deeper recovery trees.
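The values in Table 5.2 can be reproduced with a short script. The sketch below assumes that "(4,12) RS" denotes an r-out-of-n configuration with k = 4 data blocks and m = 12 parities (so n = 16); the other expressions are taken directly from the table.

```python
# Sketch: reproducing the reliability comparison of Table 5.2 (p = 0.95, q = 0.05).
# "(4,12) RS" is read as an r-out-of-n configuration with k = 4, n = k + m = 16
# (our assumption); the other formulas come from the text.
from math import comb, log10

p, q = 0.95, 0.05

# Probability of losing a d-block under each scheme.
loss_ae = q * (8*q**5 + 12*q**4)**3                                  # AE(3,2,5), Eq. (5.13)
loss_rep = q**4                                                      # 4-replication: all copies fail
k, n = 4, 16
loss_rs = sum(comb(n, r) * p**r * q**(n - r) for r in range(0, k))   # fewer than k blocks survive

for name, loss in [("AE(3,2,5)", loss_ae), ("4-replication", loss_rep), ("(4,12) RS", loss_rs)]:
    print(f"{name:15s} reliability 1 - {loss:.3e}  (~{int(-log10(loss))} nines)")
```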

5.4 Summary

The analysis of fault tolerance gave us a good idea of the impact of the code parameters on the size of failure patterns. This chapter has shown the benefits of redundancy propagation as well as established its boundaries. A larger α implies a larger node degree, which translates into more node connectivity (multiple paths). A larger s and/or p increases the length of the path between two endpoint dead nodes. These last two parameters permit increasing fault tolerance without generating more storage overhead (only α affects the number of parities per data block) or increasing the repair cost of single failures (any code setting repairs single failures with two blocks). Our qualitative study establishes the boundaries for redundancy propagation. Generalizing our knowledge of failure patterns is difficult, e.g., there are critical patterns that for certain settings are not the smallest pattern but become the smallest when the code parameters change. A quantitative approach is complementary to our study. One approach could be brute-force analysis; however, the search space grows exponentially with the code parameters. That being said, it could allow us to better understand the behavior of the codes.

The assessment of entanglement codes concludes with a reliability analysis based on combinatorial methods. We applied two methods, graph transformations and the cut-set method, to verify the results. This study permits the comparison with other codes from a theoretical perspective. Our study undersells entanglement codes: in our settings entanglement codes achieve 13 nines and Reed-Solomon 14 nines. However, data survivability was superior for entanglement codes in the practical experiments and simulations of the previous chapter. That shows the important role of placements and of the number of failure domains considered in the study. In the practical experiments the assumption was that every block was stored in a different location. In the simulations, 1 million d-blocks and their corresponding p-blocks were distributed in 100 distinct locations using random placements. Here, blocks were distributed with a round-robin placement policy and the length of the recovery trees was pruned.


6 Beyond Entanglement Codes

Entanglement codes propagate redundant information across storage locations in a centralized or decentralized system to provide multiple paths to recover unavailable content. The entanglement can be a simple chain of alternated unencoded data blocks and parity blocks, or become more complex as in the case of helical entanglement codes, in which each data block participates in three chains. In that sense, adding chains of entangled data has the same cost as adding a replica for each data block that is part of the chain. For some applications, this level of replication may be too expensive. Spigot codes are built on top of entanglement codes to increase their code rate and reduce the space footprint of entangled data. We study two techniques based on code puncturing and XOR coding and present four spigot codes. This chapter is largely based on recently submitted work (Estrada-Galinanes et al., 2017a).

6.1 Introduction and Motivation

Data entanglements are a mechanism to propagate redundant information across a large scale distributed system and create a virtual storage layer of highly interconnected elements. Simple entanglement chains (Estrada-Galinanes et al., 2016) require equal numbers of data and parity drives; therefore, they have the same space overhead as mirroring. Complex entanglements can be defined by combining multiple simple entanglement chains into a mesh, as in the case of helical entanglement codes (Estrada-Galinanes and Felber, 2013).

Erasure codes based on data entanglement are a promising solution for archiving data since they provide reasonable trade-offs between security, resource usage and performance. A unique feature of entanglement codes is that propagation can be used to improve fault tolerance without increasing the storage overhead or the repair cost of the code. In that sense, the storage overhead is limited by the number of entanglement chains in which a data block participates. Participation means contributing a parity block that extends the chain. For example, in the case of helical entanglement codes, each data block participates in three chains; therefore the storage system requires three times more capacity to store the redundant data. Even though helical entanglement codes can be tuned to provide extremely high fault tolerance, the "entry space requirements" are expensive for some applications. For instance, at the time of this writing, the Internet Archive (Jaffe and Kirkpatrick, 2009) has only two mirrors: the official mirror at archive.org and a second mirror at the Bibliotheca Alexandrina. The organization estimates that building a third, partial mirror in Canada would cost millions of dollars (Brewster, 2016).

[Figure 6.1 sketch: two Reed-Solomon notations. Top: RS(n, k, t) corrects t random/burst errors, e.g., RS(255, 247, 4) corrects 4 errors during the transmission of 255 symbols (a 25-bit noise burst affecting 4 symbols). Bottom: RS(k, m = n − k) corrects m arbitrary erasures, e.g., RS(10, 4) stripes a file into 10 blocks and adds 4 redundant blocks. Trade-off labels in the figure: +performance, +reliability, +repair single-failures, +storage overhead.]

Figure 6.1 – Understanding different notations for Reed-Solomon codes and the limitations for valid "k" and "m" in practical solutions for storage systems (usually k + m ≤ 20). Repairing blocks in settings with large values of "k" and "m" is cumbersome when blocks are distributed across many storage devices.

The motivation behind the design of spigot codes is to reduce the space footprint of entangled data. This chapter proposes two methods for constructing spigot codes on top of entanglement codes. The first approach is puncturing, a standard technique used in coding theory in which, after encoding, some of the encoded symbols are not stored in the system. The second approach is compaction, which reduces the number of encoded symbols by combining them. The two approaches are not mutually exclusive and can be combined in different ways to trade space for fault tolerance and repair cost. We study various spigot code constructions.

6.2 Background

Rateless codes permit varying the block length to adapt transmission to the channel's noise level. The first known practical construction is Luby transform (LT) codes (Luby, 2002). Raptor codes are the first implementation with linear time complexity for encoding and decoding (Shokrollahi, 2006). Rateless codes are usually compared with a water fountain that can supply an infinite number of drops, each of size l bits, a metaphor for encoded packets. If a message of size K·l is encoded with this water fountain, it is necessary to collect in a bucket a number of drops slightly larger than K to decode the message (MacKay, 2005). One application is a data carousel that streams a file continuously during a certain time. Rateless codes are sparse-graph codes for channels with erasures. Channels with erasures are relevant for Internet packet transmissions as well as for distributed storage. Two versions of Raptor codes became Request for Comments standards of the Internet


Engineering Task Force: R10 (Shokrollahi et al., 2007) and RFC 6330 for RaptorQ (Luby et al., 2011). The standards fix the maximum number of source symbols to 8,192 for R10 and 56,403 for RaptorQ. This limitation is related to the problem of finding an optimal degree distribution that assures the successful decoding of the symbols.

On the other side of the spectrum are maximum distance separable (MDS) codes, which achieve optimal rate. A linear code is said to be an (n, k, d) MDS code if it takes as input words of length k to generate codewords of length n, and d is the minimum number of bits that separates two distinct codewords. Reed-Solomon codes, (k,m)-RS codes, are a remarkable class of MDS codes. They tolerate m = n − k arbitrary erasure patterns. During encoding, a file is striped into k blocks and expanded with m extra blocks. The file is recovered with any k blocks out of k + m. The code rate k/(k + m) indicates the proportion between the amount of real information and the total amount of symbols transmitted on the channel.

One of the goals in the design of entanglement codes was to relax the constraints behind a valid (k,m) setting that is practical for storage applications. An ideal code would have flexibility in its parameter values so the storage system can handle different workloads, heterogeneous storage devices, and different file sizes efficiently. Furthermore, it would have encoding and decoding algorithms that do not depend on code constructions with a specific valid degree distribution. To elaborate a bit more on the current constraints for (k,m) settings, we need to refer the reader to the different applications of RS codes. Figure 6.1 provides examples of RS code applications and illustrates the different notations for RS settings and the meaning behind them. RS codes are defined over Galois Fields, denoted GF(2^w), that have n = 2^w − 1 words or code symbols of w bits length, and 2t out of the total words are used as parity or check words. For example, a code defined over GF(2^8) has 255 words of 8 bits length (1 byte). The top image presents the use of RS to tolerate random and burst errors in data transmission and for writing bytes to one storage device. The setting RS(n=255, k=247, t=4) can protect 247 symbols since the decoder can correct up to 4 symbols whose positions are not known. The example shows how a burst error of 25 consecutive bits that fits inside 4 symbols can be repaired with RS codes. The middle image illustrates the typical setting in RAID-like systems (Plank et al., 1997). Blocks are distributed over different storage devices. The failure model is different from the previous case because the position of the error is usually known. Hence, we talk about erasures and, in this case, the decoder can correct up to m = n − k erasures. In RAID-like systems, the object to store, e.g., a file, is striped into k blocks and expanded with m blocks. That brings a simplification and a second meaning to the code notation, e.g., the setting RS(k=10, m=4) indicates that the system distributes the blocks in a stripe of k + m = 14 storage devices. When decoding is done in a distributed environment, the repair cost associated with the bandwidth and I/O operations constrains RS settings to small practical values. Other codes similar to or based on RS codes inherit the same limitation. However, large values of "k" and "m" can increase performance and reliability, which is highly beneficial for many storage applications.
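As a concrete illustration of the erasure notation, the following minimal sketch computes the basic figures for the RS(k=10, m=4) setting mentioned above; it is plain arithmetic, not an RS implementation.

```python
# Sketch: basic figures for a (k, m)-RS setting used in a RAID-like system.
k, m = 10, 4           # RS(k=10, m=4): file striped into k blocks, m redundant blocks
n = k + m              # stripe width: blocks distributed over n storage devices

code_rate = k / (k + m)        # proportion of real information
tolerated_erasures = m         # any m missing blocks can be reconstructed
repair_reads = k               # a single lost block is rebuilt from k surviving blocks

print(f"stripe of {n} devices, code rate {code_rate:.2f}, "
      f"tolerates {tolerated_erasures} erasures, "
      f"single-block repair reads {repair_reads} blocks")
```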


[Figure 6.2 sketch: an endless chain E_i with tail and head, alternating d-blocks d1, d2, … and p-blocks p1,2, p2,3, …; in simple entanglements each d-block belongs to 1 horizontal chain (H), while in complex entanglements (RH and LH chains such as ERH1 and ELHp) each d-block belongs to α chains.]

Figure 6.2 – An entanglement chain Ei alternates d-block and p-block elements. Simple entanglements (SE) are defined by a single horizontal chain. Complex entanglements (CE) define a lattice composed of multiple horizontal and helical chains. The example corresponds to a 7-HEC lattice composed of 7 pairs of ERH and ELH chains.

6.3 Entanglement Chains, Benefits and Limitations

The foundations of entanglement codes are entanglement chains or strands, and the two core elements are data blocks, d-blocks, and parity blocks, p-blocks. The entanglement chains alternate d-blocks and p-blocks, which are represented by a node and an edge, respectively. The chain reveals the temporal relationship between blocks since the most recent blocks are always written at the head. The encoder builds chains by computing the exclusive-or (XOR) of two consecutive blocks at the head of a strand and inserts the output adjacent to the last block. Since unencoded data blocks are stored in the system, there is no need for decoding in the absence of failures. Figure 6.2 illustrates an entanglement chain and the simple and complex entanglements that are built with it. A p-block of the chain, e.g., p4,5, is the result of XORing the previous two elements: p4,5 = d4 ⊕ p3,4. Later, the same p-block could be repaired by performing the same operation or by XORing the next two elements: p4,5 = d5 ⊕ p5,6. A d-block could be repaired by XORing the two adjacent elements, for example: d5 = p4,5 ⊕ p5,6. Simple entanglements are constructions of single chains. Multiple simple entanglements can coexist in a storage system, but any d-block belongs to only one chain. Complex entanglements combine horizontal and helical (right-handed and left-handed) chains into a lattice. The purpose of combining multiple chains in a lattice is to increase exponentially the number of paths that the repair algorithm can take to repair d-blocks. The example shows the 7-HEC lattice composed of two horizontal chains and p = 7 double-helix strands (chains) built with a helical entanglement code. Even though the total number of chains is 2 + 7·2 = 16 chains, each d-block belongs to one horizontal chain and one double-helix chain. Note that a p-HEC lattice is equivalent to a lattice built with alpha entanglements AE(3,2,p), where p ≥ 2. Since most of the studies conducted in this chapter only vary p, we prefer the p-HEC notation as it is more compact.
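Since the whole construction rests on XOR arithmetic, the rules above can be checked with a toy script. The sketch below is our own illustration (not the thesis prototype); it assumes the first parity of the chain is seeded with the first d-block.

```python
# Sketch: encoding and repairing blocks of a simple entanglement chain.
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 16                                        # toy block size
d = {i: os.urandom(BLOCK) for i in range(1, 8)}   # d-blocks d1..d7

# Build the chain: p_{i,i+1} = d_i XOR p_{i-1,i}; first parity seeded with d1
# (initialization choice ours for the toy example).
p = {(1, 2): d[1]}
for i in range(2, 7):
    p[(i, i + 1)] = xor(d[i], p[(i - 1, i)])

# Repair a lost d-block from its two adjacent parities: d5 = p4,5 XOR p5,6.
assert xor(p[(4, 5)], p[(5, 6)]) == d[5]

# Repair a lost p-block either backwards or forwards:
assert xor(d[4], p[(3, 4)]) == p[(4, 5)]          # p4,5 = d4 XOR p3,4
assert xor(d[5], p[(5, 6)]) == p[(4, 5)]          # p4,5 = d5 XOR p5,6
print("chain repairs verified")
```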

Entanglement codes are designed to relax the constraints behind practical (k,m) settings for

storage systems. The entanglement encoder creates a virtual storage layer of highly interconnected elements (encoded packets). Elements can then be mapped to storage devices using different placement algorithms. The virtual storage layer grows as more elements are encoded and integrated into a mesh of entangled blocks. In this context, the traditional code parameters k and m are relaxed, bringing important benefits, specifically:

1. There is no overhead from striping a small file into a fixed number of k blocks, because each d-block is encoded individually.

2. Blocks can be distributed across a larger number of storage devices that can be accessed in parallel to improve throughput.

3. The repair cost of single blocks is extremely low. Single failure repair is done by the computation of a bitwise XOR operation on two blocks.

4. The entanglement parameters define the interdependencies between elements and how redundancies are propagated. As a result, the parameters can improve fault tolerance without requiring more storage overhead. In many disaster scenarios, entanglement codes outperform RS codes.

A caveat is that Equation 4.2 tells us that the code rate is computed as

1 / (α + 1),

where α is the number of chains that each block belongs to. In other words, the system stores α p-blocks and 1 d-block each time a d-block is encoded. The storage overhead caused by inserting d-blocks into one additional chain is 100%. That means that storage overhead does not grow gradually in the way that (k,m) settings allow. Instead, increasing α means increasing the storage overhead in steps of 100%, the equivalent of adding one more replica in a replicated system. Simple entanglements, α = 1, are equivalent to mirroring; helical entanglement codes, α = 3, are equivalent to a four-way replicated system. This "entry space requirement" is expensive for some applications.

6.4 Spigot Codes

Spigot codes are built on top of entanglement codes. Using two techniques, we can reduce the space footprint of entangled data. Basically, spigot codes trade space for time because less space overhead impacts the repair time and the length of time during which the system will not suffer data loss. The repair algorithm of entanglement codes recovers data in one or more rounds, depending on the failure pattern. While the repair approach can be iterative or recursive, if multiple blocks are missing, it may be necessary to repair some blocks first. It is reasonable to think that less redundancy means that the number of rounds required for a full repair will increase. Additionally, we expect that a system without maintenance degrades faster

when fewer redundant blocks are available. Finally, spigot codes allow trading off space for time temporarily. It is possible to get back to the full benefits of entanglement codes without the need to re-encode all data.

We describe the two techniques used in spigot codes:

Puncturing. This is a standard technique used in coding theory in which, after encoding, some of the redundant symbols are deleted to reduce the blocklength of the code (Ha and McLaughlin, 2003; Pishro-Nik and Fekri, 2007). This technique can easily be applied to d-blocks and/or p-blocks. We define three codes: spigot(-,d,EC), spigot(-,p,κ,EC) and spigot(-,p,λ,EC), with EC being the entanglement code that is the supplier (mother code) for the spigot. The principle used to construct the spigot is to eliminate blocks; therefore, all such spigots are identified with "-". Decoding spigot codes is equivalent to decoding entanglement codes.

XOR Coding. This method aggregates parities from various strands by XORing two or more parities together and keeping only the resulting new block. The notation for spigot codes built using XOR coding is spigot(+,pp,σ,EC). The principle used to construct the spigot is to aggregate blocks; therefore, all such spigots are identified with "+". Decoding spigot codes requires decoding the aggregated blocks first, followed by a normal decoding for entanglement codes.

6.4.1 Spigot(-, d, EC)

The most trivial way of puncturing entanglement codes is to delete all d-blocks and keep only the p-blocks. The code rate corresponds to Equation 4.3:

1 / α.

The total space savings result from comparing the new code rate with the original code rate of the entanglement code. In helical entanglement codes, which are defined for α = 3, the total space savings are 25%. Deleting d-blocks implies that the code becomes non-systematic, and every read has the overhead of decoding the d-blocks. Therefore, this option may not be the best choice for read-intensive applications. Placement considerations may reduce the overhead. For instance, if all the p-blocks belonging to the same strand are stored sequentially on the same device, the two p-blocks necessary to regenerate one d-block can be accessed faster. The notation spigot(-,d,EC) indicates "-" for puncturing, "d" since the targets of puncturing are d-blocks, and EC is the inner entanglement code used to supply the spigot; for example, spigot(-,d,5-HEC) is a spigot code built using 5-HEC to encode d-blocks and delete them afterwards.
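As a back-of-the-envelope check of the 25% figure, the sketch below compares the number of stored blocks per d-block before and after puncturing; it assumes that "total space savings" means the relative reduction of the stored footprint.

```python
# Sketch: space savings of spigot(-,d,EC) relative to its supplier entanglement code.
def footprint_entanglement(alpha: int) -> float:
    return alpha + 1          # 1 d-block + alpha p-blocks per encoded d-block

def footprint_spigot_d(alpha: int) -> float:
    return alpha              # d-blocks punctured, only the alpha p-blocks remain

for alpha in (1, 2, 3):
    before = footprint_entanglement(alpha)
    after = footprint_spigot_d(alpha)
    savings = 1 - after / before
    print(f"alpha={alpha}: {before}x -> {after}x stored, savings {savings:.0%}")
# alpha=3 (helical entanglement codes) gives 4x -> 3x, i.e., 25% savings.
```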

6.4.2 Spigot(-, p, κ, EC)

In this code, puncturing is applied to p-blocks on the horizontal strands. We define κ as the puncturing factor, which is any integer in the range [0, 2^p − 1], where p is one of the parameters of the entanglement code.


[Figure 6.3 sketch: κ = (11)_10 → 01011, repeated along a p = 5 lattice; κ = (42)_10 → 0101010, repeated along a p = 7 lattice.]

Figure 6.3 – The edges (p-blocks) from horizontal chains marked with colored stripes are deleted according to κ. The examples shown correspond to spigot(-,p,11,5-HEC) and spigot(-,p,42,7-HEC).

For example, when the supplier code for the spigot is 5-HEC, κ ∈ [0, 31]. Learning which p-blocks are deleted is straightforward from the binary representation of κ, as seen in Figure 6.3. The puncturing factor determines which p-blocks are deleted in a sequence of p p-blocks on the horizontal strand. To illustrate, one of the examples in the figure corresponds to spigot(-,p,42,7-HEC). The binary form of κ = (42)_10 is 0101010. The positions of the zero digits indicate the parities that are removed. The pattern is repeated to the right until the end of the lattice. The same pattern is mirrored on both horizontal strands.

The reduction achieved for a certain κ is derived from the Hamming weight W of the factor κ written in binary format. The Hamming weight is the distance of κ to the zero word, which corresponds to the number of "1" digits in the binary form of κ, e.g., W(κ) = 3 for κ = 42. The code rate computed with Equation 4.2 is modified to

1 / (α + 1 − W(κ)/p).    (6.1)

The storage savings are less than those of spigot(-,d,EC) unless W(κ) = p (which corresponds to κ = 2^p − 1). The total space savings are 9% for spigot(-,p,20,5-HEC) and approximately 14.3% for spigot(-,p,85,7-HEC).
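A small helper makes the κ bookkeeping explicit: it expands κ over p bit positions, reports the Hamming weight W(κ), and evaluates the modified code rate of Equation (6.1). The helper names are ours, and the savings computation assumes the footprint-reduction reading used above.

```python
# Sketch: bookkeeping for spigot(-,p,kappa,EC) puncturing on horizontal strands.
def kappa_pattern(kappa: int, p: int) -> str:
    """Binary form of kappa over p positions (the pattern repeats along the strand)."""
    return format(kappa, f"0{p}b")

def hamming_weight(kappa: int) -> int:
    return bin(kappa).count("1")

def code_rate(alpha: int, kappa: int, p: int) -> float:
    """Modified code rate, Equation (6.1): 1 / (alpha + 1 - W(kappa)/p)."""
    return 1.0 / (alpha + 1 - hamming_weight(kappa) / p)

alpha = 3                                    # 7-HEC supplier code
kappa, p = 85, 7                             # spigot(-,p,85,7-HEC)
print(kappa_pattern(kappa, p))               # 1010101, repeated along the lattice
print(hamming_weight(kappa))                 # W(85) = 4
full = 1.0 / (alpha + 1)                     # supplier code rate, Equation (4.2)
punct = code_rate(alpha, kappa, p)
print(f"savings ~{1 - full / punct:.1%}")    # ~14.3%, as stated in the text
```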

Puncturing can be applied simultaneously to d-blocks and p-blocks to achieve more significant savings. For instance, the total space savings in spigot(-,dp,20,5-HEC) would be 34% and approximately 39.3% for spigot(-,dp,85,7-HEC) .

6.4.3 Spigot(-, p, λ, EC)

In this code, puncturing is applied to p-blocks on helical chains. We define the vector λ = [λ_1, λ_2, ..., λ_p], where the i-th element is the corresponding puncturing factor for the helical chains ERHi and ELHi.

[Figure 6.4 sketch: λ = (1,2,1,2,1) for p = 5; λ_1 = 1 → 01 and λ_2 = 2 → 10, where the low-order bit corresponds to the RH chain and the high-order bit to the LH chain.]

Figure 6.4 – Spigot(-,p,(1,2,1,2,1),5-HEC): p-blocks from helical chains are deleted according to λ = (1,2,1,2,1). The convention to identify which edges are pruned is to start counting on the edges that are at the right side of the node in a p section of the lattice and then continue on the same strand as indicated in the figure.

The puncturing factor is any integer in the range [0, 2^s − 1], where s is an entanglement parameter that indicates the number of horizontal chains. If the supplier code is 5-HEC, λ_i is in the range [0, 3]. In fact, all spigots that use a supplier code p-HEC have λ_i ∈ [0, 3], as they are built with two horizontal strands (s = 2), but other types of entanglements may use more strands. Learning which p-blocks are deleted is straightforward from the binary representation of λ, as seen in Figure 6.4. For example, the puncturing factor λ_1 = 1, written in binary form as 01, determines which p-blocks are deleted in a sequence of 2 p-blocks on ERH1 and ELH1. The puncturing pattern is repeated until the end of the chain.

The code rate computed with Equation 4.2 is modified to

1 / (2 + (α − 1)(1 − Σ_{i=1}^{p} W(λ_i) / (s·p))),    (6.2)

for entanglement codes with α = 2 or α = 3. An entanglement made with α = 2 is similar to α = 3 but with only one helical chain. Simple entanglements (α = 1) do not have helical chains. More complex entanglements need a per-case study since they may require different puncturing factors. The second term of the denominator calculates the storage savings in each of the (α − 1)·p helical chains.

A slight variation is to apply λ to only one type of helical chain (RH or LH). For this case, we use the notation spigot(-,p,λ',EC). Equation 6.2 should be modified accordingly.

6.4.4 Spigot(+, pp, σ, EC)

This code uses an XOR coding scheme with some similarity to the one proposed for Facebook's f4 storage system (Muralidhar et al., 2014). In their case, they use the scheme to reduce storage requirements while providing enough redundancy to survive the failure of one datacenter. They achieve this goal by keeping the result of XORing blocks from two different volumes located at different locations in a third location. We do not discuss location, since mapping blocks into the physical storage layer depends on the use case application.


[Figure 6.5 sketch: two simple entanglement chains, EH1 (d1, d3, d5, … with parities p1,3, p3,5, p5,7, p7,9, …) and EH2 (d2, d4, d6, … with parities p2,4, p4,6, p6,8, p8,10, …), are combined by XORing their parities, e.g., p1,4 = p1,3 ⊕ p2,4, p3,6 = p3,5 ⊕ p4,6, p5,8 = p5,7 ⊕ p6,8, yielding the spigot code based on XOR coding.]

Figure 6.5 – Spigot(+,pp,2,SE): Two simple entanglement chains are aggregated into a single chain to save storage space. Odd and even labels for d-blocks help to identify their source chain.


Our scheme reduces two or more horizontal chains into a single chain (with multiple d-blocks), as indicated in the example of Figure 6.5. The σ coefficient indicates how many horizontal chains are aggregated into the stored chain. Its value is any factor of s, greater than one, for reductions that affect two or more chains from the same lattice. In the example, the spigot code keeps the same number of d-blocks but cuts the number of p-blocks in half. The scheme could also be applied to multiple simple entanglement chains. In that case, the σ coefficient indicates how many simple chains are involved and its value is any number greater than one.

The chain of a spigot code based on an XOR scheme alternates groups of σ d-blocks and single p-blocks, where each p-block aggregates the information of σ p-blocks. That means that single blocks are repaired with σ + 1 blocks instead of two blocks as in a normal entanglement chain. For example, the repair of d-block d5 from the spigot(+,pp,2,SE) shown in Figure 6.5 is done with d5 = d6 ⊕ p3,6 ⊕ p5,8. Similarly, the d-block d6 is repaired with d6 = d5 ⊕ p3,6 ⊕ p5,8. Parity blocks still have two possible ways of being repaired, but the equations include σ + 1 terms. For example, the repair of p-block p3,6 is done with p3,6 = d3 ⊕ d4 ⊕ p1,4 or with p3,6 = d5 ⊕ d6 ⊕ p5,8.

The code rate computed with Equation 4.2 is modified to

1 / (α + 1/σ).    (6.3)

That means that spigot(+,pp,2,5-HEC) achieves 12.5% of total space savings. In the case of three simple entanglements, spigot(+,pp,3,SE), the space savings are 37.5%. For more savings, a cross-lattice XOR coding scheme is possible. The code rate computed with Equation 4.2 for the cross-lattice scheme is modified to

1 / (α + 1/(l·σ)),    (6.4)

with l being the number of lattices. For instance, a cross-lattice scheme with l = 2 and spigot(+,pp,2,5-HEC) reduces the total of four horizontal chains of two 5-HEC lattices into one and achieves 18.75% of total space savings.
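The repair equations of the aggregated chain can be verified with the same XOR arithmetic. The sketch below is a toy reconstruction of Figure 6.5 (the seeding of the first parity of each chain with its first d-block is our assumption) that checks the repairs of d5, d6 and p3,6.

```python
# Sketch: repairs in spigot(+,pp,2,SE), following the construction of Figure 6.5.
import os

def xor(*blocks: bytes) -> bytes:
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

BLOCK = 16
d = {i: os.urandom(BLOCK) for i in range(1, 11)}      # d1..d10

# Chain EH1 holds the odd d-blocks, EH2 the even ones; first parities seeded
# with the first d-block of each chain (toy initialization).
p = {(1, 3): d[1], (2, 4): d[2]}
for i in (3, 5, 7):
    p[(i, i + 2)] = xor(d[i], p[(i - 2, i)])          # EH1: p3,5 p5,7 p7,9
for i in (4, 6, 8):
    p[(i, i + 2)] = xor(d[i], p[(i - 2, i)])          # EH2: p4,6 p6,8 p8,10

# Aggregated parities actually stored by the spigot code.
P14 = xor(p[(1, 3)], p[(2, 4)])
P36 = xor(p[(3, 5)], p[(4, 6)])
P58 = xor(p[(5, 7)], p[(6, 8)])

assert xor(d[6], P36, P58) == d[5]                    # d5 = d6 ^ p3,6 ^ p5,8
assert xor(d[5], P36, P58) == d[6]                    # d6 = d5 ^ p3,6 ^ p5,8
assert xor(d[3], d[4], P14) == P36                    # p3,6 = d3 ^ d4 ^ p1,4
assert xor(d[5], d[6], P58) == P36                    # p3,6 = d5 ^ d6 ^ p5,8
print("spigot repairs verified")
```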


Entanglement Codes     SE (α = 1)        5-HEC (α = 3)
Extra storage          100%              300%
Single failures        2                 2

Spigot Codes           (+,pp,2,SE)       (+,pp,2,5-HEC)
XOR Code               σ = 2             σ = 2
Extra storage          50%               250%
Single failures        3                 2

Spigot Codes           (-,d,5-HEC)       (-,p,λ',5-HEC)
Puncturing             —                 λ' = (2,2,2,2,2)
Extra storage          200%              200%
Single failures        2                 2

Table 6.1 – Selected entanglement and spigot codes.

Finally, spigot(+,pp,σ,EC) can be used in combination with puncturing p-blocks on helical strands.

6.5 Evaluation

To evaluate spigot codes we compare them with their supplier codes to see the price to pay for a higher-rate code. We characterize how spigot codes perform after a catastrophic failure and we measure data loss after repairs. With fewer redundant blocks the repair algorithm may need more time to repair data. Thus, we study how many additional rounds the repair of a spigot code needs.

Our framework lets us observe what happens with data in environments that either accumulate plenty of failures before repairs take place or where a large number of failures happen all at once. All d-blocks and p-blocks get a random number from 0 to 99 that represents the storage location. A disaster is simulated by changing the availability of a certain number of locations and trying to repair the missing data blocks. These figures are given in percentages (10-50%).
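The framework can be summarized as a small Monte Carlo loop. The sketch below is a heavily simplified stand-in (a hypothetical repair_fn, no lattice logic) that only shows how disasters are injected and how losses and rounds are measured; it is not the simulator used for the experiments.

```python
# Sketch: skeleton of the disaster simulation (illustrative only).
import random

N_LOCATIONS = 100                      # medium size system

def simulate(blocks, disaster_pct, repair_fn):
    """Assign blocks to locations, fail a percentage of them, then try to repair."""
    placement = {b: random.randrange(N_LOCATIONS) for b in blocks}
    failed = set(random.sample(range(N_LOCATIONS), N_LOCATIONS * disaster_pct // 100))
    missing = {b for b, loc in placement.items() if loc in failed}
    rounds = 0
    while True:                        # repair proceeds in rounds until it stalls
        recovered = repair_fn(missing)
        if not recovered:
            break
        missing -= recovered
        rounds += 1
    return len(missing), rounds        # data loss after repairs, repair rounds used
```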

Repair of single failures and storage overhead. To increase the code rate of the mentioned entanglement codes we used spigot codes according to Table 6.1. Our baselines are simple entanglement codes SE (α = 1) and complex entanglement codes built with helical entanglement codes: a 5-HEC setting with α = 3. The spigot(+,pp,2,SE) achieves a total space saving of 25% with respect to the 2x storage requirement of SE codes. The spigot(+,pp,2,5-HEC) achieves a total space saving of 12.5% with respect to the 4x storage requirement of 5-HEC codes. The spigot(-,d,5-HEC) and spigot(-,p,λ',5-HEC) each achieve a total space saving of 25% with respect to the 4x storage requirement. The cost of repairing single failures rises to three blocks (instead of two) only for spigot(+,pp,2,SE). In the other codes there are other chains with enough p-blocks to guarantee single-failure repairs using two blocks.

Data loss and repair rounds. The repair algorithm proceeds in rounds, in each of which more d-blocks and p-blocks are recovered. It stops when it is no longer possible to recover missing blocks. Figure 6.6 illustrates data loss for the entanglement codes and spigot codes of Table 6.1.

[Figure 6.6 plot: "How many data blocks are lost?" — medium size system (100 locations) using a best effort placement policy; data loss after repairs (number of d-blocks, log scale from 10^0 to 10^6) versus disaster size (10–50% of unavailable locations) for SE α=1, 5-HEC α=3, spigot(+,pp,2,SE), spigot(+,pp,2,5-HEC), spigot(-,d,5-HEC) and spigot(-,p,λ',5-HEC).]

Figure 6.6 – d-blocks that could not be repaired.

[Figure 6.7 plot: "How many repair rounds are needed?" — medium size system (100 locations) using a best effort placement policy; number of repair rounds (0–40) versus disaster size (10–50% of unavailable locations) for the same codes.]

Figure 6.7 – Number of repair rounds until data loss.

An interesting observation is that spigot(+,pp,2,5-HEC) saves 12.5% storage and performs very well in catastrophic disasters, very close to its supplier code 5-HEC. SE and its spigot(+,pp,2,SE) do not perform well in large disasters without maintenance, but it may be worth studying this spigot in more detail to replace mirroring in applications that have some maintenance. The differences between the two spigot codes based on puncturing suggest that it is preferable to delete d-blocks instead of p-blocks to maintain higher reliability while reducing space. That comes at the price of a non-systematic code that requires decoding each d-block. Figure 6.7 gives an idea of how the number of rounds increases with less redundancy and larger disasters.

6.6 Summary

Our study shows that code puncturing and XOR encoding are good techniques to increase the code rate of entanglement codes. Spigot codes can be combined to achieve considerable space savings. In this chapter we presented examples that reduce up to 50% of the storage

requirements of vanilla entanglement codes. As seen in the study of repair rounds, our examples do not increase the repair overhead much until large-scale disasters that affect 30%, or in some cases 40%, of the storage locations. The simulations show the flexibility of entanglement codes and their spin-off codes. More studies are needed to find efficient combinations for each particular application.

7 Towards Entangled Storage Systems

This chapter is intended to be a guideline for implementing entanglement codes. We present some use cases and discuss domains in which entanglement codes could be applied. Alpha entanglement codes are not restricted to a particular storage architecture. In particular, we present two use cases: the first is a cooperative storage system built on top of a decentralized database; the second showcases a centralized storage architecture built using disk arrays. We finish this chapter by discussing practical aspects of entanglement codes and lessons learned while prototyping our ideas. This chapter discusses topics related to the implementation of entanglement codes based on preliminary ideas presented in FAST'15 and '16 Work-in-Progress Session talks (Estrada-Galinanes and Felber, 2015a; Estrada-Galinanes and Felber, 2016), the EuroSys Doctoral Workshop (Estrada-Galinanes, 2016) and technical reports for my doctoral program (Estrada Galinanes, 2013; Estrada Galinanes, 2015).

7.1 Use Case: A Geo-Replicated Backup System

In this example a community of users creates a cooperative storage network to share storage and bandwidth resources. Users may need to back up different amounts of data. They keep their own data in their local nodes and upload redundant information to geographically distributed nodes. The storage capacity used by each user has to be agreed at the beginning and renegotiated as needed. The system is a two-tier architecture that aims at supporting efficient data protection and integrity. The lower tier is composed of storage nodes that share space to store parity blocks from other users. The upper tier consists of broker nodes that entangle and disentangle data. In this example a single node has both roles. The nodes are organized in a loosely connected cluster. In this context, the term "node" refers to these cluster nodes; the text clarifies when it refers to vertices in the entanglement lattice graph. To avoid confusion, we use the term d-blocks for lattice vertices and, since the lattice's edges represent parity blocks, we use the term p-blocks. It is worth noticing that the design of cooperative storage requires consideration of various types of attacks, such as free-riding abuses, that are beyond the scope of this work. In particular, the problem of free riding in the context of a collaborative Internet backup was studied by other authors (Lillibridge et al., 2003).


[Figure 7.1 sketch: 13 storage nodes arranged in a ring operating in failure mode; the lattices at Node 1 and Node 2 degrade differently, and the tuples tH, tRH and tLH used for the repair at node 1 are highlighted.]

Figure 7.1 – Failure mode when three storage nodes fail. Lattices show different degradation at each node. The repair process at node 1 uses the tuples indicated with different shades.

7.1.1 Redundancy Scheme

Each user manages their own entanglement lattice via the broker, hence multiple lattices coexist in the system. Moreover, the system could keep lattices with different settings. In the simplest scenario, the broker is a service running on the user's machine. In more complex scenarios, the broker could be a super node that encodes data for a group of users. When a user wants to back up a document, he sends the file to the broker. The broker splits the file into d-blocks and encodes them. To encode data, it needs to fetch some parities from other storage nodes, or keep some parities in memory for performance. Blocks are mapped to the storage nodes using a deterministic or random placement algorithm. A unique key that identifies the block is used to locate the block. In the next section we discuss this key in more detail. In a failure-free environment, users can access their data directly from their local machines. In other words, the broker does not need to decode (disentangle) data.

Memory Requirements. When parities are kept in memory, the memory footprint of the broker is linear in the number of distinct strands that create the entanglement lattice. For instance, the code setting AE(3,2,5) requires keeping in memory the last 12 generated parities. More specifically, the memory requirements are related to the total number of strands in the system (see Equation 4.1). This is because new parities are added to the head of these strands.

Recovery when a Broker Crashes. If the broker crashes, the most recent parities kept in memory will be lost. The broker must then retrieve them again from other storage nodes.

7.1.2 Failure Mode

For this application, the two events that trigger data decoding are when users do not have access to their own locally stored d-blocks or when their lattices deteriorate due to other nodes' failures. How node failures are detected and notified is not discussed here, but there is abundant literature on this topic (Rhea et al., 2004; Lakshman and Malik, 2010; Huan and Nakazato, 2015). Figure 7.1 shows a scenario in which three nodes are unavailable and illustrates how the incident impacts multiple elements in distinct lattices. The size and characteristics of the incident dictate the patterns formed on the lattices. Parity block repair is automatically distributed, assuming that all users will be interested in the regeneration of their lattices to maintain the same level of redundancy for their data. If a node is not able to repair its lattice, other nodes can do the repairs on its behalf.


Steps                       Example
1. Obtain dp-tuple id:      {key21, key16,21}, {key26, key26,31}
2. Choose p-block id:       key16,21
3. Compute location key:    n5
4. Get block:               p16,21
5. Repair block:            p21,26

Table 7.1 – Operations involved during the repairs.

During the regeneration of the parities, the node that performs the repairs uses pp-tuples (two p-blocks) and dp-tuples (one d-block and one p-block). Figure 7.1 indicates the tuples that are used at node 1. Table 7.1 enumerates the operations involved during the regeneration of the parities located in the faulty nodes 2, 6 and 11. To illustrate with an example, it presents the steps followed while repairing the parity block p21,26. Steps 1-3 and 5 are simple local operations. Step 1: Node 1 computes the dp-tuple ids associated with the missing parity. The id is a unique identification associated with each lattice element; for example, it could be a hash. Step 2: Node 1 chooses one tuple, e.g., {key21, key16,21}, according to its availability status or other node constraints. Step 3: Node 1 computes the location of p-block p16,21 (node 5); d-block d21 is stored locally. Step 4: Node 1 gets the parity from node 5. Step 5: Finally, Node 1 recovers the p-block by computing p16,21 ⊕ d21. The recovered p-block should be restored to the network by uploading it to another node. Similar steps are followed to recover the other missing elements.
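The five steps of Table 7.1 map naturally onto a small routine at the repairing node. The sketch below is schematic, with hypothetical helpers (dp_tuples_for, location_of, fetch_block, store_block); it is not the prototype's code.

```python
# Sketch: regenerating a missing p-block, following the steps of Table 7.1.
def repair_parity(missing_parity_id, local_blocks, dp_tuples_for,
                  location_of, fetch_block, store_block):
    # Step 1: obtain the dp-tuple ids associated with the missing parity.
    candidates = dp_tuples_for(missing_parity_id)
    for d_key, p_key in candidates:
        # Step 2: choose a tuple whose d-block is locally available.
        if d_key not in local_blocks:
            continue
        # Step 3: compute the location key of the remote p-block.
        node = location_of(p_key)
        # Step 4: get the p-block from that node.
        remote_parity = fetch_block(node, p_key)
        if remote_parity is None:
            continue
        # Step 5: repair by XORing the local d-block with the fetched p-block,
        # then restore the result to the network on another node.
        repaired = bytes(a ^ b for a, b in zip(local_blocks[d_key], remote_parity))
        store_block(missing_parity_id, repaired)
        return repaired
    return None                        # fall back to other tuples / other nodes
```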

7.2 Use Case: Disk Arrays

Disk arrays can increase the performance of a system because individual requests are served faster when multiple disks collaborate to increase throughput. In addition, the possibility of answering multiple requests in parallel increases the system's I/O rate. Some form of redundancy is required to compensate for the increased combined disk failure rate when writing to disk arrays. RAID organizations (Patterson et al., 1988) are a well-established solution for the industry and the home user to increase the performance and reliability of large arrays of inexpensive disks, but the exabyte scale redefines storage needs. Although there is still a future for cheap spinning disks in data centers and home solutions, the market has changed substantially in the last decades. Some of the problems that the community faces are:

1. Rebuild time may take hours or days. Systems that tolerate 2 failures dominate the market. More failures during rebuilds are a source for data loss.

2. The disparity between technology development means that disk capacity has grown


steadily while bandwidth lags behind.

3. Research on RAID lost popularity in favor of cloud-based storage solutions, even though RAID is still in use for many reasons, e.g., privacy. Google Trends (Choi and Varian, 2012) reveals that the popularity of "cloud storage" equalled that of "RAID disks" in August 2009. Since then, searches for cloud storage have continued to increase.

4. Failure rate is mostly related to system’s size (Schroeder and Gibson, 2010).

5. Failure correlation is frequently disregarded. The assumption that failures are independent and that time between failures is exponentially distributed is not valid (Schroeder and Gibson, 2007). Failure correlation should be considered.

RAID stands for redundant arrays of independent¹ disks. We propose to rethink disk arrays with the use of alpha entanglement codes. In this context, it makes sense to drop the term "independent" and replace it with "interdependent". Note that this use case is valid for log-structured, append-only storage systems.

7.2.1 Entangled Mirror

In Chapter 3, we discussed two array organizations based on simple entanglement. Simple entanglements require equal numbers of data and parity drives; therefore, the array will have the same space overhead as mirroring. A quick recap of that discussion provides completeness to this subsection.

Full partition. In this approach, blocks are written sequentially on the same drive. The process does not spread content across drives. Most of the drives will remain idle and can be powered off as in a massive array of idle disks (MAID) configuration (Colarelli and Grunwald, 2002), which could result in significant energy savings.

Block-level striping. This second approach distributes data over all available drives to improve performance.

In an entanglement chain, blocks that are located at the extremities have less redundancy. This problem has more impact on full partition, where the content at the extremity of the chain is equivalent to all the data stored in one disk. With block-level striping, the amount of data at the extremity of the chain is equivalent to the block size. The same chapter presented open and closed entanglements, where the latter addresses this problem. We showed that in full partitions both solutions provide better five-year reliability than mirroring, reducing the probability of data loss by 90 and 98 percent, respectively.

¹ Patterson's paper used the word "inexpensive", but since hardware controllers became highly sophisticated, the term "independent" was gradually adopted.


7.2.2 RAID-AE

One misconception often held is that storing data in disk arrays or do-it-yourself backup is an outdated solution. The growing cloud storage market and the scalability issues of RAID make organizations consider moving data to the cloud. However, in-house storage is still used by many industries, e.g., the financial market, government, and research institutions. On the other side, many cloud storage services are built on infrastructures that use RAID-like storage solutions. For example, Backblaze's online backup service uses a RAID system built on top of Reed-Solomon codes (Beach, 2015).

There are three ways to increase the redundancy of data when configuring disk arrays: mirroring (discussed previously), parities and erasure codes. Reed-Solomon codes became a sort of de-facto industry standard for erasure coding, particularly for archiving data. Perhaps the main reason for their broad acceptance is that, when the storage market began to grow, they were already an established solution for digital communications. In brief, Reed-Solomon codes are storage efficient and well understood, and open-source libraries are available. The main problem is the bandwidth and I/O cost during the repair process. An attractive solution commonly seen in RAID-like systems are constructions based on combinations of Reed-Solomon blocks that are optimally locally repairable (Papailiopoulos and Dimakis, 2014).

Scalability. RAID-AE can make improvements for both horizontal scalability and vertical scalability. In a RAID-AE array, it is possible to add more disk units to convert, for example, a 10-disk array into an 11-disk array, or to add more capacity to each of the disks in a 10-disk array. Depending on the implementation, both actions can be done dynamically without interrupting the array and without re-encoding data.

Never-ending stripe. Since entanglement chains can always grow, creating a never-ending stripe, they change the classical notion of a stripe. For clear examples we use RAID5, but with more complex RAID organizations the situation remains the same. The way parities are computed is based on a fixed-width stripe, e.g., in a 6+1 disk RAID5, one parity unit is computed using 6 data units. When one disk is added to the array, the new 7+1 disk RAID5 array requires re-encoding parities: one parity unit is now computed using 7 data units. Additionally, there is a large penalty when a disk fails because of the bandwidth and I/O overhead required to repair each missing unit. Parity declustering (Holland and Gibson, 1992) reduces the rebuild time by distributing repairs across devices in a large array, but this technique cannot reduce the overhead mentioned above. We argue that the scope for major improvements is limited by the stripe size. Increasing the length of a stripe has a prohibitive effect on repairing data.

Degraded reads. The stripe size impacts degraded reads in RAID-like solutions. In data center environments where disks are distributed across different machines and racks, the performance of degraded reads matters. It is desirable to overcome temporarily unavailable data, in many cases single failures due to software updates, scheduled restarts, etc.

Other features. RAID-AE can be implemented to provide distributed repairs and load balancing for read-intensive workloads. One important difference in our model is that we do not assume failure independence, and because the parameter α can be changed in the future, the system can scale in fault tolerance. In sum, RAID-AE permits different arrangements with trade-offs between capacity overhead, network bandwidth overhead, rebuild time and fault tolerance.

7.3 Discussion of Strategies for Realistic Implementations

Our long-term goal is to implement entanglement codes in real-world storage applications. This section discusses further aspects of the codes that can provide the grounds for future implementations of an entangled storage system. Alpha entanglement codes are not restricted to a particular storage architecture. They could be implemented in a DHT-based peer-to-peer collaborative storage system, in redundant disk arrays, in the cloud, or even in a cloud of clouds. Discussing all the complex characteristics of each implementation is far beyond the scope of this thesis, but we address some questions that have come up during our investigations.

The implementation of a prototype built on top of a distributed hash table (DHT), with similarities to the use case presented in Section 7.1, was a practical way of identifying technical challenges related to integrating entanglement codes with an existing system. The rationale for using a DHT is that data blocks can be placed on geographically distributed nodes, possibly under distinct administrative domains, which fail independently and hence provide good tolerance against faults and attacks. Some of the topics discussed in this section come from the experience gained in building this prototype and other entanglement-based solutions.

7.3.1 Integrating with an Existing System vs Building from Scratch

This section presents a brief account of our attempts to implement entanglement-based solutions.

Cassandra-HEC

Cassandra-HEC is a prototype built atop Apache Cassandra (Lakshman and Malik, 2010). Cassandra is a data store written in Java that uses a hybrid model between key-value and column-oriented databases. It was initially developed by Facebook and subsequently released as open source. Nodes are organized in a cluster/ring, where each node plays the same role so that there is no single point of failure. Our deployment is a two-tier architecture: the lower tier is composed of Cassandra storage nodes and the upper tier consists of broker nodes that entangle and disentangle data to provide high data reliability. Brokers can reside in the same network as the Cassandra nodes to reduce the bandwidth cost, and a single node can act as both a broker and a storage server. Cassandra-HEC takes its name after helical entanglement codes. The system relies on entanglement codes to store data redundantly instead of Cassandra's built-in replication. The Cassandra replication factor is set to 1, which means there is only one

copy of each row across the cluster.

The major challenge was to implement our own placement policies. Cassandra uses a partitioner to determine how data is distributed across the nodes in the cluster. The partitioner plays a key role in Cassandra's architecture to assure scalability. The application supports three partitioners: Murmur3Partitioner, RandomPartitioner and ByteOrderedPartitioner; however, the last two are only provided for backwards compatibility. By default, data is distributed across nodes using the Murmur3Partitioner, a consistent hashing strategy whereby keys are hashed to determine the node responsible for storing the associated value.

From our discussions with Cassandra experts, we found that non-default partitioners are highly discouraged in order to maintain load balance. The problem with this policy is that some applications, including ours, need more control over placements and, at least in our case, we can handle load balance in a different way. Our aim was to test our block allocation policy, which maps blocks deterministically and evenly to different failure domains. Eventually, we implemented it using the ByteOrderedPartitioner, but we do not know whether support for this partitioner will continue in future versions.

Ceph

Ceph is a storage system designed for scalability, performance and reliability. It is the outcome of research done at the Storage Systems Research Center², University of California, Santa Cruz. Recently the project was acquired by Red Hat, which continues its development as an open-source project. It has support for replication and for locally repairable codes based on Reed-Solomon. It is notably used by CERN, Deutsche Telekom, Cisco and various universities.

We discussed with Ceph engineers the possibility of implementing helical entanglements. Again, the main challenge was the lack of support for flexible placements, where flexible placement means fine-grained policy enforcement. At that time, there were plans to provide more flexibility, but we are not sure how those plans developed. Giving more flexibility to the customer increases the risk that a bad configuration causes problems, e.g., like the load balance issue mentioned by the Cassandra engineers. However, discussions with Ceph engineers suggested that they are open to supporting more flexible schemes. In fact, entanglement codes were considered a good example of the kind of applications Ceph should support with a flexible placement policy³.

Ceph maps objects to placement groups (PGs), and each group is mapped to object storage devices (OSDs). The main purpose of the PGs is to reduce the amount of per-object metadata. Additionally, PGs improve load balance; for that reason, at least 100 PGs are recommended. Our concern is mapping objects to a large number of OSDs. The idea of data entanglement

² http://www.ssrc.ucsc.edu, Mar 2016
³ Discussions between Ceph developers about helical entanglement codes and flexible placements: https://web.archive.org/web/20170327085721/http://www.spinics.net/lists/ceph-devel/msg29903.html

is that all blocks in the system are somehow connected to each other. Therefore, blocks must be spread evenly over a large number of OSDs. Ceph was initially designed for mirroring replicas. Recently the system incorporated support for Reed-Solomon and locally repairable codes, but none of these codes require access to a large number of OSDs. In fact, these codes only require distributing a group of 10-20 blocks in distinct failure domains.

HEC CoC

A cloud of clouds (Slamanig and Hanser, 2012; Bessani et al., 2013) is an architecture that combines multiple cloud vendors to achieve at least one of these goals: avoid the lock-in problem, increase fault tolerance and availability, increase bandwidth, and improve privacy. The lock-in problem (Armbrust et al., 2009) occurs when the client finds it prohibitively expensive to change providers, an action that requires migrating all data from one cloud to another. Bandwidth charges are one of the strategies used to discourage customers from jumping to a cheaper service. Even if not common, cloud providers experience failures that affect their services badly; hence, dispersing data across multiple providers increases fault tolerance and availability. Providers that offer unlimited cloud storage usually put limits on the upload/download bandwidth. Finally, privacy is a big concern for cloud adoption by organizations and home users (Zhou et al., 2010).

HEC CoC is a proof of concept for a cloud of clouds based on entanglement codes. A small demo was presented during our annual Sinergia project meeting in March 2015 at ETH Zurich. This application was built from scratch, without any knowledge of the placement policies enforced by the selected cloud providers. Data is encoded with helical entanglements and distributed with a coarse-grained placement policy to three different cloud storage services: Google Drive⁴, Microsoft OneDrive⁵, and Dropbox⁶. At the time of the demo, Dropbox was outsourcing its storage to the Amazon cloud.

We used graph colouring to distribute blocks across the three providers. It is a coarse-grained placement strategy because the remote placements are not known. However, it is safe to assume that the blocks assigned to the same provider are stored with additional redundancy, since providers have their own reliability mechanisms. On our side, the graph colouring assures that if one provider is not available, data can be repaired using blocks stored at the other two destinations. Additionally, HEC-CoC can be used for privacy if the graph colouring is done with the constraint that no provider receives enough information to recover the data. The use of entanglement codes as an information dispersal algorithm opens new research opportunities. Other researchers explored data dispersal algorithms to create cloud-of-clouds storage networks (Li et al., 2014).

⁴ https://www.google.com/drive/, Mar 2016
⁵ https://onedrive.live.com/about/en-us, Mar 2016
⁶ https://www.dropbox.com, Mar 2016


7.3.2 What to Entangle

The first question that arises while implementing the encoder is which blocks are going to be entangled together, i.e., located in the near vicinity of the lattice. The entanglement encoder is fed with blocks, but the algorithm does not know anything about them. What is the data source of the blocks? Is the system mixing blocks of one large file? Is the system mixing blocks from files of the same user or from different users? Do the blocks have equal size?

The answers depend on many factors such as the purpose of the application, the workload, resource contention, etc. In the particular case of Cassandra-HEC, the user stores the files in a special backup directory. This directory is frequently scanned by a service that reads the files and prepares them for the entanglement. Files are striped and blocks are accumulated in a storage pool. The encoder takes elements at random to do the entanglement. Blocks have identical size (1 MB). It is expected that more variables in the system, like coexisting variable-size blocks, translate into additional metadata. But we can predict that the regular behaviour of the entanglement algorithm reduces the effort needed to handle variable-size blocks from diverse sources.

7.3.3 Where to Entangle

There is not a single answer regarding the location of the encoder. We suggest possible options by considering the following use cases:

• Use case 1: In a geo-replicated backup data is encoded at the client.

• Use case 2: In a RAID-AE entanglements could be performed by a hardware RAID controller using the encoder presented in Section 4.8.

• Use case 3: In a confederate geo-replicated backup, data is encoded by middleware nodes. This entanglement model will allow the entanglement of files from various users.

• Use case 4: In a highly centralized datacenter, data is encoded at the backend servers. Clients transmit plain (or encrypted) but not encoded data to the servers.

7.3.4 Where to Store Blocks

As we previously mentioned, block placement policies are an important aspect in the design of an entangled storage system. Previous work on placements is based on traditional reliability methods in which the size of redundant groups is small. For example, replication requires the distribution of k copies among distinct storage devices, where k is typically two or three. On the other hand, systems that use traditional erasure codes like Reed-Solomon split the data into k fragments and generate m encoded blocks. As a result, only k + m blocks must be distributed to different devices, with k + m being in the order of 20 devices.

[Figure: two lattices with their lead windows marked; the panels are annotated with α = 3, s = 2, p = 5 (2·5 d-blocks) and α = 3, s = 7, p = 7 (7·7 d-blocks on the same strand).]

Figure 7.2 – Lead windows for AE(3,2,5), i.e., 5-HEC, and for AE(3,7,7).

Entanglement codes behave substantially differently from conventional approaches since all blocks in the system are somehow connected to each other. A straightforward practice is to store all redundant data in different failure domains. Yet, the total number of storage devices in any realistic installation is limited. Nevertheless, the number of blocks that should be stored in different devices can be restricted to the size of two adjacent lead windows. A lead window is defined by the code parameters s and p, i.e., it is the interval during which one helical strand revolves around an imaginary axis. This interval is part of the extended near vicinity for one d-block. Figure 7.2 shows lead windows for 5-HEC and another setting.

The repair process works iteratively using elements of the near vicinity of the missing element and expands the region when there are not enough blocks for repair success. The near vicinity of border d-blocks falls in two lead windows. The repair process handles the lead window as a sliding window. Therefore, the elements (d-blocks and p-blocks) that are located in a lattice region defined by two adjacent lead windows should be stored in different devices. The number of elements in a lead window is computed from the code parameters as in

(α + 1) · s · p, (7.1)

since one window encloses s · p d-blocks and for each d-block there are α p-blocks. For instance, a 5-HEC lattice requires at least 80 (2 · 4 · 2 · 5) devices to assure that the actual size of an erasure pattern does not become smaller than its theoretical size. To wrap up, if the effective size of a pattern becomes smaller, the effects of redundancy propagation are reduced. Moreover, smaller erasure patterns are more likely to appear. For that reason, it is undesirable that neighbor elements are stored in the same device (failure domain).
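The helper below restates Equation 7.1 in Python and checks it against the 5-HEC example; it is a direct transcription of the formula, nothing more.

def lead_window_elements(alpha: int, s: int, p: int) -> int:
    # one lead window encloses s*p d-blocks plus alpha p-blocks per d-block
    return (alpha + 1) * s * p

def min_failure_domains(alpha: int, s: int, p: int) -> int:
    # elements of two adjacent lead windows should go to distinct devices
    return 2 * lead_window_elements(alpha, s, p)

assert lead_window_elements(3, 2, 5) == 40
assert min_failure_domains(3, 2, 5) == 80   # the 5-HEC example above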

We consider three different allocation methods:

Round Robin Placement. This policy uniformly distributes blocks in distinct disks of an array using circular order.


[Figure: disk arrays for strands RH4 (4 disks) and H1 (10 disks), with d-blocks 1–20 and parities such as p7,9, p7,10, p27,29 and p27,30 assigned to disks H1:4 and RH4:2 as the lattice keeps growing.]

Figure 7.3 – Round robin placements for AE(3,2,5) using a disk array per each distinct strand. The image shows the disk array for right-handed strand RH4 and horizontal strand H1. Parity blocks are explicitly shown for the sake of clarity. As the lattice grows, blocks that are at a constant distance are stored in the same storage device. For example, p7,9 is stored in the fourth disk of the H1 array. The next p-block p9,11 is stored in the fifth disk. By writing p-blocks in circular order, the next p-block to be written in the fourth disk is p27,29, which is at a 2·p d-block distance on the horizontal strand.

Figure 7.3 shows AE(3,2,5) with the round robin policy in which each strand has its own array. For example, a 10-disk array stores the parities of the H1 strand. Disk four (H1:4) stores p7,9 and any parity located at a multiple of 10 hops on the same strand, e.g., p27,29. The minimum length of the array is determined by the number of elements of a given strand within a two-lead-window interval. This requirement assures that all elements that are at a short distance will be stored in different failure domains. The figure also shows a 4-disk array that stores the parities of the RH4 strand. The RH4 array is shorter than the H1 array since there are only two p-blocks inside a lead window that belong to RH4, while there are five p-blocks that belong to H1. The setting AE(3,2,5), according to Equation 4.1, requires twelve arrays to store the parities of the twelve distinct strands and one array to store the d-blocks, in total 80 disks.
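A minimal sketch of the per-strand round robin mapping, assuming that parities written on a strand are simply numbered in the order they are produced; the array lengths follow the AE(3,2,5) example (10 disks for H1, 4 for RH4).

def rr_disk(array_len: int, parity_seq_no: int) -> int:
    """Disk index (0-based) for the n-th parity written on a given strand."""
    return parity_seq_no % array_len

# On H1 (10-disk array), the 4th parity written (p7,9) and the 14th (p27,29)
# share the same disk, mirroring the example of Figure 7.3.
assert rr_disk(10, 3) == rr_disk(10, 13) == 3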

It is worth mentioning that each strand class (H, RH, or LH) contains all the necessary information to regenerate the storage content. That means that all the arrays associated with the same strand class, e.g., the five RHx arrays in our previous example, could be placed in one geographical location and the arrays associated with a distinct strand class placed in another location to tolerate network partitions.

The Cassandra-HEC prototype was implemented using round robin placements, assigning blocks to a smaller set of nodes as in a RAID-like system. As previously mentioned, a problem with this placement approach is that the implementation might be difficult, especially if entanglement codes have to be integrated with an existing system.

As we became interested in the implementation of entanglement codes in peer-to-peer networks, a simple but realistic way of mapping data blocks to storage devices is a random placement policy.

Random Placement. This policy is perhaps the most common way of distributing blocks among a set of devices. This approach has been proposed and studied in theory (Karger et al., 1997) and many systems use a consistent hashing variant: Chord (Stoica et al., 2001), Sorrento (Tang et al., 2004), Crush (Weil et al., 2006b), Dynamo (DeCandia et al., 2007), Cassandra (Lakshman and Malik, 2010). Spreading blocks evenly over nodes is a well-known scalability problem that affects load balance and system reliability. The search for a scalable solution has been studied in three main directions: table-based, rule-based and pseudorandom-based distribution. Prior work from other authors studied the advantages and disadvantages of each strategy (Miranda et al., 2014).

We use random placements for the simulations used to evaluate complex entanglements and spigot codes. Those studies demonstrated that entanglements can be used with random placements and still achieve high reliability. They also opened the possibility of using entanglements in peer-to-peer networks. An interesting follow-up study would be to look into the problem of replica placement in dynamic environments.
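For reference, the random placement used in the simulations can be reduced to a mapping like the one below; the hash-based variant is only one way to obtain a repeatable uniform assignment and is an assumption, not a description of the simulator internals.

import hashlib

def random_location(block_id: str, n_locations: int = 100) -> int:
    """Map a block identifier uniformly onto one of n_locations."""
    digest = hashlib.sha1(block_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_locations

# Example: place a d-block and one of its parities among 100 locations.
print(random_location("d7"), random_location("p7,9"))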

Copyset replication (Cidon et al., 2013) is an alternative approach to random placements. As explained by the authors, combinatorics has an important role in the probability of data loss. They gave the example of a system with 9 nodes and a replication factor of 3, in which nodes are split into three copyset groups: {1,2,3}, {4,5,6}, and {7,8,9}. Such systems can experience data loss if the three nodes that are part of the same copyset group fail simultaneously. On the contrary, in random placements and with sufficient blocks stored in the system, any combination of three nodes failing simultaneously can cause data loss. Obviously, the same principle applies to our round robin placements. Distributing data in the same disk arrays reduces the number of fatal node combinations compared with random placements. However, this reasoning is not conclusive, as the total number of blocks and the total number of storage devices must be included in the probabilistic model.
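The contrast can be made concrete with a small sketch of the 9-node example recalled above: with copysets there are only three fatal 3-node combinations, whereas with random placement (and enough blocks) any of the 84 possible triples can be fatal.

from itertools import combinations

NODES = list(range(1, 10))                    # nodes 1..9, replication factor 3
COPYSETS = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]

def fatal_triples_copyset():
    # data is lost only if all members of one copyset fail together
    return [c for c in combinations(NODES, 3) if set(c) in COPYSETS]

def fatal_triples_random():
    # with random placement and enough blocks, any triple can be fatal
    return list(combinations(NODES, 3))

print(len(fatal_triples_copyset()), "vs", len(fatal_triples_random()))   # 3 vs 84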

7.3.5 How to Reclaim Storage

On the one hand, entanglement codes are designed for permanent storage; therefore, the lattice design is not adequate for deleting random data. On the other hand, the lattice grows regularly as new blocks are added to the head of the structure, while the tail contains the oldest data. This design is optimal for implementing an entangled storage system that reclaims space by deleting aged data, as commonly occurs in a log-structured file system (Rosenblum and Ousterhout, 1992).

Reclaiming space requires the definition of a retention duration policy, which is the minimum amount of time that the data is preserved in the system. This policy could be associated to files, to a group of files, or to files belonging to a certain user. What follows is related to a question we answered before: "what to entangle". The retention duration must be specified before data ingestion to allow colocating blocks with equal retention policy on the same entangled structure.


[Figure: two code settings, α = 3, s = 5, p = 10 and α = 3, s = 10, p = 10, showing the parities computed on the rh-, h- and lh-strands for writes at subsequent times t0–t3 and which buckets can be sealed.]

Figure 7.4 – Write performance for two code settings. When s = p, full-writes are optimized since all the parities needed to complete triple-entanglements and seal the buckets are available in memory. When p > s, one option is to do full-writes for s elements; a second option is to write buckets partially with the parities available in memory.

In this way, the system is composed of multiple lattice structures with different retention policies and fault tolerance strategies (each lattice could be built with different code settings). A retention period could be extended by reinserting data at the head of the lattice in case it needs to be preserved for a longer time than anticipated.

Cassandra-HEC is a rolling storage database. Its built-in time-to-live attribute is useful for reclaiming space. Each block is stored for a certain amount of time or until the database reaches a certain level of its capacity. The maximum lifetime of blocks in our implementation is capped at 20 years by Cassandra's time-to-live (TTL) attribute.
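A hedged sketch of how a retention policy could be turned into a TTL at ingestion time; the helper and the routing by (policy, TTL) are hypothetical, only the 20-year cap reflects the limit mentioned above.

MAX_TTL_SECONDS = 20 * 365 * 24 * 3600   # roughly the 20-year cap mentioned above

def retention_to_ttl(retention_days: int) -> int:
    """Translate a retention duration into a TTL value in seconds."""
    return min(retention_days * 24 * 3600, MAX_TTL_SECONDS)

# Blocks sharing a retention policy would be routed to the same lattice, e.g.:
lattice_id = ("backup", retention_to_ttl(365))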


7.3.6 Which Code Parameters

In RAID terminology, a full write or full-stripe write occurs when the full stripe is written at once. This operation optimizes the writing overhead that occurs in parity update operations. Even though entanglement codes do not have such a notion of stripes, write optimizations exist. The values of the code parameters s and p impact the writing performance. When s = p, the number of data blocks that can be fully entangled in parallel operations is maximized.

We define a bucket as a container for a data block and its α output parities. A bucket also receives α input parities that are used by the encoder to compute the outputs. A sealed bucket is a bucket where the α parities are already computed, see Figure 7.4. In other words, a sealed bucket contains a fully entangled data block. This is possible only when the α input parities needed in the process are already computed and kept in memory. The memory requirement for full-writes is O(N), where N is the number of parities computed in the full-write. The illustration shows that when s < p, many buckets have partial inputs, allowing us to compute some parities but not to seal the bucket.
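A minimal sketch of the bucket abstraction, assuming α = 3 and assuming that the output parity of a strand is the XOR of the data block with the input parity of that strand (the XOR-chain rule used throughout this work); the Bucket class and the strand labels are illustrative.

from dataclasses import dataclass, field
from typing import Dict, Optional

STRANDS = ("h", "rh", "lh")   # alpha = 3

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

@dataclass
class Bucket:
    data: bytes
    inputs: Dict[str, Optional[bytes]] = field(default_factory=lambda: dict.fromkeys(STRANDS))
    outputs: Dict[str, Optional[bytes]] = field(default_factory=lambda: dict.fromkeys(STRANDS))

    def feed(self, strand: str, parity: bytes) -> None:
        """Receive an input parity and derive the output parity of that strand."""
        self.inputs[strand] = parity
        self.outputs[strand] = xor(self.data, parity)

    @property
    def sealed(self) -> bool:
        """A sealed bucket has all alpha output parities computed."""
        return all(p is not None for p in self.outputs.values())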

7.4 Summary

This chapter discussed technical challenges in the design of an entangled storage system. In particular, it provided useful insights for the block placement problem, shared lessons learned during the prototyping of entanglement codes and gave recommendations for future research directions. It considered various environments in which entanglement codes could be used: a centralized storage solution in a datacenter, a cloud-of-clouds backup solution, an array of disks, and a geo-replicated backup service. It raised six questions that may appear in the design of an entangled storage system and provided possible answers considering a variety of conditions.

8 Conclusion

I say that the Library is unending. The idealists argue that the hexagonal rooms are a necessary form of absolute space, or at least, of our intuition of space [...] Let it suffice now for me to repeat the classic dictum: The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible. — The Library of Babel, Jorge Luis Borges (1899-1986), Argentine writer.

As systems scale in size and complexity, it seems wise to review what we know. By questioning the conventional methods to create redundant information we discover new things and learn how to harness redundancy. The concept of redundancy is powerful,1 and survivability comes through the strength built with redundancy. Storage systems store data redundantly to ensure that the content will survive failures. To answer how long it will survive we would have to study its reliability, but reliability measurements are sometimes too optimistic. Nevertheless, many systems are built with two-failure fault tolerance based on the assumption that the probability of a third or fourth failure is extremely low. A low level of fault tolerance ignores many issues such as high availability in wide area networks, correlated failures like power outages, datacenter failures and data corruption. The pitfall of the current redundancy paradigm is that fault tolerance increases linearly with storage in the case of replication, and proportionally with bandwidth and input/output operations in the case of classic codes.

A dependable system is trustworthy to its users. It is a challenge to build redundancy methods that fully leverage system resources and consider all aspects of dependability: availability, reliability, safety, integrity and maintainability. To compensate, systems grow in complexity trying to find good compromises, some problems are ignored, and the costs are rarely discussed.

1 The original phrase, “This thing called redundancy is a powerful tool”, belongs to Paul Baran.

We propose entanglement codes to have more options when trying to address these limitations. This new alternative creates an unending layer of interdependent resources. This layer can coexist with other layers, each with its own security requirements (code settings), to protect content in a potentially ab aeterno storage system.

This dissertation deals with the design of practical erasure codes for storage systems. The research challenges related to coding require interdisciplinary studies that bridge different areas of expertise to design practical solutions under realistic assumptions. Chapter 2 examines extremely different topics to find unapparent connections in an attempt to close the gap between the information theory, storage and cryptography communities. Thus, this chapter contributes an extensive literature review. The survey is organized in four main sections: information theory background, codes for storage applications, the storage sector, and data entanglements. This holistic approach prepares us to understand coding methods from different angles and find synergies between existing codes. Particular attention is paid to the entanglement of user files, proposed by other authors to protect files against censorship. When used as a censorship deterrent mechanism, the rationale was to make deletions expensive. And that was the original goal of our study too. But when the connection between entanglements as censorship resistant mechanisms and entanglements as redundancy mechanisms for dependable systems became more prominent, we set our goal for the second option. The connection resides in the fact that the anti-censorship method’s goal is to cascade deletions, i.e., destroying a file destroys other files as well, whereas the dependable method’s goal is to recover data from whatever remains available in the system. In both cases, there is a set of documents (or data blocks) that has all-or-nothing integrity: either all elements survive or none of them does. Entanglement models became worthy of study with the aim of leveraging interdependencies between content and proposing practical erasure coding solutions. From Chapter 3 through the end of the dissertation we reduce the complex space to the realm of storage systems, in particular to the study of entanglement codes that propagate redundancy across storage locations. Simple entanglements, complex entanglements and beyond address the questions of how to connect data blocks and leverage these connections to build a highly fault tolerant, long-term storage solution in a cost-efficient way.

Open and closed entanglements are uncomplicated entanglement codes. In spite of their relative simplicity, the model is powerful. The idea behind simple data entanglement is to intertwine data and parity blocks using the bitwise XOR operation with the goal of refreshing previously calculated parities and increasing the scope of redundant information. The process builds a chain that associates old information to the new data. In this way, old blocks become indirectly dependent on blocks that will be inserted in the future, without the need to run the algorithm more than once. Simple entanglements require equal numbers of data and parity drives; therefore, they have the same space overhead as mirroring. We have studied open and closed entanglements, two simple data distribution layouts for log-structured append-only storage systems. While open entanglements maintain an open chain of data and parity drives, closed entanglements include the exclusive-or of the contents of their first and last data drives. We have described two possible array organizations: full-partition write and block-level striping.

The full-partition approach writes blocks sequentially on the same drive. This data layout is best suited for archival applications with a very low read-to-write ratio. Block-level striping distributes data over all available drives to improve performance. We have used Markov models to evaluate the five-year reliabilities of open and closed entanglements for two different array sizes and drive failure rates, and argued that open and closed entanglements provide better reliability than mirroring.
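The chain rule can be summarized in a few lines of Python; this is a sketch that assumes fixed-size blocks and abstracts away the drive layout discussed above.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def open_entanglement(data_blocks, block_size=1 << 20):
    """Parity chain p1..pn with p_i = d_i XOR p_(i-1), starting from p_0 = 0."""
    parities, prev = [], bytes(block_size)
    for d in data_blocks:
        prev = xor(d, prev)
        parities.append(prev)
    return parities

def closing_parity(first_block: bytes, last_block: bytes) -> bytes:
    # closed entanglements add the XOR of the first and last data drives
    return xor(first_block, last_block)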

Complex entanglements combine simple entanglements to build a helical lattice. This study leads to the definition of helical entanglement codes and their generalized model, alpha entanglement codes. At the core of this complex mechanism there are simple rules to propagate redundant information across a large-scale distributed system. The rules create a virtual storage layer of highly interconnected elements. We have studied the fault tolerance, evaluated the impact of entangled data in different simulated environments and argued that entanglement codes offer high fault tolerance due to the redundancy propagation controlled by three parameters. The parameter α classifies the code families into single-entanglements (equivalent to simple entanglements), double-entanglements and so on. The possible repair combinations grow exponentially with the value of α. At the same time, the storage overhead of the code grows linearly with the same parameter, e.g., double-entanglements are equivalent to three-way replication and triple-entanglements are equivalent to four-way replication. The other two parameters increase fault tolerance even further by controlling the vicinity of the blocks, without the need for additional storage or changes to the single-failure repair cost. We have conducted a theoretical analysis of the fault tolerance and reliability of entanglements. We compared them with classical codes and argued that alpha entanglements are practical codes that excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. The interdependencies between content offer additional data integrity as they make it more difficult to modify data undetectably. We have run experiments to answer how much redundancy is needed to keep data safe in unreliable environments without requiring maintenance and showed that entanglement codes outperform classical codes in many disaster recovery scenarios.

Our final contribution addresses a limitation of entanglement codes. Classical codes can increase storage overhead in a gradual manner controlled by their settings, e.g., an RS(6,2) increases storage by a factor of about 1.33x. However, the only parameter that controls storage overhead in entanglements is α, which can only increase in integer steps. For example, single-entanglements add one encoded block per data block, double-entanglements add two encoded blocks; in other words, the storage overhead increases in steps of 100%. This “entry space requirement” is expensive for some applications. We have investigated the suitability of different techniques to increase their code rate and reduce the space footprint of entangled data. Our new solution, spigot codes, presents four types of code puncturing and XOR schemes built on top of entanglement codes. The studied examples reduce up to 50% of the storage requirements of vanilla entanglement codes without significant loss of entanglement’s properties. The number of repair rounds in a spigot code does not increase significantly unless a disaster affects many locations (in the order of 30% or, in some cases, 40% of the storage locations).

The simulations provide evidence in support of the flexibility of entanglement codes and their derived codes to cover different needs.

8.1 Research Limitations and Future Directions

This section reports the limitations of this study and suggests possible directions.

8.1.1 Beyond α = 3

In this study we do not cover entanglement codes with α > 3. We can safely speculate that the fault tolerance would improve substantially. We are confident that the upper bounds for fault tolerance are defined by the size of an α-hypercube. This upper bound impacts the size of minimal erasure patterns |ME(x)| with x = 2^α; for example, for x = 16, redundancy propagation would be enclosed in a tesseract (4-cube) composed of 16 nodes (d-blocks) and 32 edges (p-blocks). In addition, for larger α values we expect improvements in repair performance. However, it is not clear how to connect the extra helical strands. In double- and triple-entanglements the rh- and lh-strands are defined along a diagonal of slope 1. One possible option is to add additional helical strands with a different slope.
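The expected upper bound can be computed directly; the sketch below only restates the standard node and edge counts of an α-hypercube, which match the square, cube and tesseract discussed in this section.

def hypercube_bound(alpha: int):
    """Nodes (d-blocks) and edges (p-blocks) of the alpha-hypercube."""
    return 2 ** alpha, alpha * 2 ** (alpha - 1)

assert hypercube_bound(2) == (4, 4)     # square: 4 d-blocks, 4 p-blocks
assert hypercube_bound(3) == (8, 12)    # cube
assert hypercube_bound(4) == (16, 32)   # tesseract, as stated above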

8.1.2 Security Assessment

The simulation framework used a random distribution of failures. The study covers a large number of arbitrary failure patterns that are part of large failure scenarios in which up to 50% of the storage content is not available. However, random failures do not give full information about the survivability of entangled data under certain attack models. This raises the question of whether entanglement codes provide a strong mechanism against censorship.

Entanglements as a censorship deterrent mechanism are effective if destroying a document is expensive for the attacker. A priori, we know that the upper bounds of propagation might become the Achilles heel for a powerful adversary that tries to destroy particular files. We have seen that in double-entanglements the upper bound is a square. It is trivial to verify that it is composed of 4 d-blocks and 4 p-blocks. It is plausible that entanglement codes have limitations to deal with malicious attacks. One could argue that destroying 7 additional blocks to target 1 block does not seem very expensive. But such an argument suffers from a number of pitfalls. To begin with, it assumes that the censor knows which blocks are neighbors. Entanglements can be done by the client, in a middleware layer or at the storage backend. Moreover, they can be done intra-file or inter-file, from the same user or from multiple users. In this way, the implementation can substantially change the attacker's options. As an additional remark, we cannot ignore that security measures make the system costlier. Entanglement codes are designed as a practical solution, therefore the main goal is to protect data against the most common failures. Finally, the upper bounds of propagation are a cube in triple-entanglements and a tesseract for quadruple-entanglements.

By increasing the number of elements exponentially and reducing the storage overhead with spigot codes built on top of the original method, entangled data may have enough protection against malicious attacks. This research direction seems promising.

Finally, a malicious attacker may try to tamper with data. Notwithstanding that evaluating the robustness of an entangled storage system against malicious attacks was not a primary goal in this research, data integrity is nonetheless an emergent property of entangled data. Intuitively, if an attacker wants to tamper with a certain data block, he has to manipulate at least all the p-blocks that were computed in the α chains to which the block belongs. Clearly, the age of the data becomes a natural deterrent mechanism. We cannot rule out other more complex mechanisms of data manipulation that may appear in a particular implementation.

8.1.3 Reliability Assessment

We did not conduct a Markov model analysis for complex entanglements. Doing a detailed analysis of failure patterns, as we did for simple entanglements, is extremely hard. As an alternative, we have evaluated reliability based on combinatorial methods. In particular, we have estimated the probability that the system will successfully answer a block read request. An entangled system has a large number of possible paths to reply to the same request. To reduce cumbersome calculations caused by all existing paths we have pruned many of them. Consequently, our results undersell entanglement codes, especially when the result is compared with classic erasure codes that consider all existing paths (since their computation is well known). Finally, an additional limitation of this study is that the final result was given for a certain code setting. It is not easy to generalize the formula to other settings, and all the computations would have to be redone for other settings. However, the obtained formula is useful to initially assess the code and compare it against its alternatives.

8.1.4 Block Placements

We have developed different tools that allow testing connections, verifying failure patterns and characterizing code performance. This environment is an ideal research playground for examining entanglement codes and other erasure codes in order to run comparisons and choose the best coding parameters. This simulation framework is designed to study what happens with data in environments that either accumulate plenty of failures before repairs take place or where a large number of failures happen all at once. A caveat is that when distributing redundant data in a large-scale storage system, the number of locations and the way data is distributed play an important role in the chances that data survives failures. Spreading blocks evenly over storage locations is a well-known scalability problem that affects load balance and system reliability.

In the experiments that consider placements, we ran simulations with 100 distinct locations.


Since we used random placements, it is possible that some or all of the blocks that are needed in the repair of other blocks are assigned to the same location. When this situation happens, the system becomes less resilient to failures. As expected, we have found that some redundancy in classic codes is degraded due to this problem. The analysis was documented in the previous chapters. We believe that our findings compare well with other solutions that use random placements; however, we should sound a note of caution with regard to the results of classic coding. The number of distinct locations that we used is not enough to support a fair distribution and we did not do anything to restore balance before running our experiments. When running experiments with 1000 distinct locations, most of the blocks were fairly distributed. Despite the lack of perfect balance caused by fewer locations, no significant differences were found with 1000 locations.

For the case of entanglement codes, running experiments with 100 locations seemed a priori to violate redundancy even more than in classic codes. For instance, RS(10,4) requires 14 distinct locations for a fair distribution, whereas AE(3,2,5) requires at least 80 distinct locations. Since entanglement codes have a radically different repair model, they can take advantage of a large number of locations. Initially we thought that the imbalance caused by a restricted number of locations would be counterproductive for entanglements. However, a closer inspection of single-failure repairs revealed that entanglement codes are able to take advantage simultaneously of local, zonal and global repairs.

Further evaluations on the impact of placement rules in different system sizes are needed. We are developing a playground to simulate placements and other relevant factors that have an impact on a storage system. This work-in-progress may become a useful resource for researchers and engineers working on the design of reliable storage systems.

8.1.5 Entanglements for Communications

The study of entanglement codes was focused on addressing problems for storage systems. Perhaps the schematic encoder design given in Chapter 4 was the closest we got to a solution that can be used for data communication. We provided tables that show how the code works even at the bit level. As the focus of the study was on storage solutions, we were unable to investigate how to decode entanglement codes at the bit level without knowing the position of the error, which would be needed to use them as error correcting codes. Further studies are needed to find out whether entanglements are useful codes for error correction in data transmission applications. In particular, it may be useful to determine if probabilistic decoding gives good results on entangled data.

8.1.6 Performance Measurements

Our research failed to give results on coding/decoding speed. In our first set of experiments the simulations performed encodings and decodings but did not consider storage locations. Therefore, the whole process was performed locally without taking into consideration blocks distributed in geographically distant locations.

Most of the evaluations are based on simulations that do not perform any actual encoding. This method was chosen because it is faster and encoding is not necessary to study fault tolerance, placement policies and the performance of the code when solving single failures.

Although our simulations do not capture actual coding/decoding speed, we know that entanglement codes have a low coding and decoding complexity and are based on lightweight XOR operations. It can thus be reasonably assumed that very efficient encoder/decoder implementations are possible. This assumption appears to be supported by a short demo of the HEC-CoC prototype given in our annual research project meetings. There, we showed that files could be encoded and decoded in a reasonable time despite the fact that blocks were stored in and retrieved from multiple clouds.

8.2 Foresight and Hindsight

The task that initiated this study was to mix data blocks in order to improve the safety of a trustworthy storage system. During the time I was struggling to find efficient ways of connecting data blocks, I had no initial constraints that could bias my decisions, except for three conversations that resonated with me and influenced the design of entanglement codes.

The first conversation was with Prof. G. Venturino. It occurred when he taught me circuit analysis many years ago. To teach me a lesson, he asked me to solve the well-known nine dots problem. The goal of this game is to connect all the dots using four straight lines. The solution to this puzzle is thought to be the origin of the cliché thinking outside the box. This is due to the fact that the puzzle is solved by connecting dots with lines that go beyond the box limits. That was the premise adopted in the design of entanglement codes to connect blocks from the top and bottom borders of the helical lattice.

The second conversation was with my former advisor Prof. A. Nakao. During my master's studies we discussed important decisions in life. He mentioned that all paths are identical. There are many possible paths that connect the beginning to the end of life. All paths are like semi-circumferences connecting the poles of a sphere. In entanglement codes, two blocks die when all the paths (semi-circumferences) that connect them die.

The third conversation was with my husband R. Martin. My first approach was to connect blocks using a growing tree model but the results were not good. He asked me if I tried other connections on multiple directions. Suddenly, I visualized a grid.

Finally, the notion of redundancy propagation became promising and I used three parameters to control it. The parameter p controls the connections between the top and bottom of the lattice, the parameter s controls the length of the “semi-circumferences”, and α controls the directions.

It is impossible to end this thesis without connecting our findings with Paul Baran's work. Paul Baran (1926-2011) was a pioneer in the design of computer networks. He is well known for his study on communications in distributed networks (Baran, 1964).

He studied survivability for communications that occurred on a data transmission network composed of unreliable links. In his work, he argued that: “instead of using redundancy of coding we shall make use of redundancy of connectivity”. In this context, the redundancy level indicates the connectivity of a station (node) in a grid, in that sense similar to the α parameter. He made the distinction between a perfect switching network, in which all stations are connected, and diversity of assignment, an alternative commonly used method. In diversity of assignment, a number of independent paths are selected between each pair of stations that need to communicate reliably. He claimed that the benefits of a perfect switching network are passed on to the survivability of the system and conducted studies that supported his claims. He concluded that perfect switching provides an upper bound of expected system performance for a gridded network and that the bound is approached with redundancy levels of 3 or 4. We draw attention to Baran's work since what he argued about redundancy in distributed communications in 1964 has striking similarities to what we argue about redundancy in distributed storage in 2017.

Classic coding and replication behave like the diversity of assignment case. For many storage applications, these methods are useful, for instance, when systems need updates and do not want to keep a log. But long-term storage systems would benefit from entanglement codes. The outcomes of our study have attracted attention from academia and industry. Entanglements carry many implications for system designs. We addressed many of them in the previous chapter. We foresee numerous opportunities and benefits in extending this study.

To conclude, this dissertation aims to give its reader multiple paths for studying and applying entanglement codes.

A Finding Erasure Patterns

This appendix extends the results given in Section 5.2.7. It provides more details about the method to find the smallest minimal erasure patterns and includes a table that characterizes fault tolerance for additional code settings. The table results are restricted to minimal erasure patterns that cause loss of up to 8 data nodes. ME(1) is not considered, since all single-failures are repairable.

We assume that most, if not all, of the minimal erasure patterns formed with an even number of dead nodes fall into two categories:

1. Patterns that describe a connected subgraph inside of a region, namely search space A, constrained by height = s and width ≤ p. The area outside of the region is considered "safe", i.e., there are no missing elements connected to the subgraph. ME(4) and ME(6) shown in Figure 5.3 are straightforward examples for patterns that are found in search space A.

2. Patterns that describe a connected graph but do not cause data loss inside a safe region constrained by height = s and width ≤ p − 1, namely search space B. The critical patterns presented in Figure 5.4 are examples for patterns found in this region.

This hypothesis helps us to reduce the feasible regions in which to conduct our search; both regions are illustrated in Figure A.1. The initial node di that fixes the relative position of the search space could be any valid top node in the lattice (except those located at the origin or end). Due to the regular lattice construction, the absolute position is irrelevant. The study of one lattice section is valid for the whole lattice.

Minimal patterns ME(x) that are formed with an odd number x of dead nodes are more difficult to determine and the search regions may not help. In some cases, we succeeded by using ME(x − 1) as a baseline and adding a new node that is connected with (at least) α baseline nodes via distinct strands.

We chose the Prolog language because it is a general-purpose logic programming language that allows us to implement very fast logic statements based on the definitions presented in Chapter 5.

[Figure: the two feasible regions, search space A (width d ≤ p) and search space B (width d ≤ p − 1), each surrounded by a safe zone; the example is drawn for s = p = 3.]

Figure A.1 – Feasible regions to search for MEs

Algorithm A.1 — Verify ME edge and vertex set

% me_edge/5 succeeds if every edge in the list is a permanently erased edge.
me_edge(S,P,L1,L2,[H]) :- H = X-Y
    , once(parity_loss(S,P,X,Y,_,L1,L2))
    , !
    .
me_edge(S,P,L1,L2,[H|T]) :- H = X-Y
    , once(parity_loss(S,P,X,Y,_,L1,L2))
    , me_edge(S,P,L1,L2,T)
    .
% me_vertex/5 succeeds if every node in the list is a permanently erased node.
me_vertex(S,P,L1,L2,[H]) :- H = X
    , once(data_loss(S,P,X,L1,L2))
    , !
    .
me_vertex(S,P,L1,L2,[H|T]) :- H = X
    , once(data_loss(S,P,X,L1,L2))
    , me_vertex(S,P,L1,L2,T)
    .

Besides, the problem of finding patterns is well suited for backtracking algorithms, something that Prolog manages efficiently. A backtracking algorithm finds all or some solutions that satisfy certain constraints by incrementally adding candidates to a partial solution and abandoning candidates that cannot possibly lead to a valid solution.

Algorithm A.1 provides a means to verify failure patterns found by visual inspection. In general terms, it checks whether blocks are repairable recursively until all the elements in the pattern are recognized as repairable, or an irrecoverable failure pattern remains. We say that a data node dk is locally recoverable if the repair process only uses the set of edges incident to the unavailable node without requiring a recursive call, and locally irrecoverable if not. A data node dk is globally irrecoverable if the recursive repair is not successful. In other words, the repair fails for a set of locally irrecoverable data nodes, including dk, if they cannot propagate their redundant information to other available nodes that would enable a successful global repair. This set of irrecoverable nodes is a complex form (Definition 5.2.4).

In particular, Algorithm A.1 implements two backtracking rules to verify the list of edges (me_edge) and the list of nodes (me_vertex) of a potential minimal erasure pattern. The backtracking rule me_edge(S,P,L1,L2,[H|T]) requires as input parameters the code settings (s,p), the set of all nodes that are lost, L1, the set of all edges that are lost, L2, and the set of all edges potentially included in an ME, [H|T]; for our purpose L2=[H|T]. The rule me_vertex(S,P,L1,L2,[H|T]) requires the same input, except that the last list contains the set of vertices potentially belonging to an ME; for our purpose L1=[H|T]. The third and fourth inputs can contain a large set of unavailable elements that describes the state of the lattice. The rules output true if the elements listed in the input are necessary to define an erasure pattern. The backtracking rules maintain a list of potential solutions based on intermediate verifications using auxiliary rules. The rules are based on definitions presented in Chapter 5.

Input parameter: Description
S: Indicates the number of lattice rows, i.e., the code setting s.
P: Indicates the number of lattice columns, i.e., the code setting p.
X: Indicates the index i for data nodes di, or the left index i for edges pi,j.
Y: Indicates the right index j for edges pi,j.
atoms {h, rh, lh}: h, rh and lh indicate the strand. The rule operates from an initial position towards the end of the lattice. The initial position is considered the left index i of pi,j. The rule outputs j when it was not given as a parameter.
atoms {hi, rhi, lhi}: hi, rhi and lhi indicate the strand, albeit the rule operates in the inverse direction, from an initial position towards the beginning of the lattice. The initial position is considered the right index j of pi,j. The rule outputs i when it was not given as a parameter.
L1: Indicates the set of unavailable nodes.
L2: Indicates the set of unavailable edges.

Table A.1 – Input parameters for pattern searching rules.

The auxiliary rules given in Algorithm A.2 are the rule data_loss, based on Definition 5.2.1 (permanently erased node, i.e., dead node), and the rules parity_loss, based on Definition 5.2.2 (permanently erased edge). Rules parity_loss appear in lines 7, 11, and 15. Rule data_loss appears in line 19 and is complemented with the rules tuple_loss appearing in lines 24, 28, and 32. Rules tuple_loss are intermediate steps to verify whether a pp-tuple that enables a locally repairable process is damaged, i.e., an edge is unavailable. More details on the input parameters are given in Table A.1. Note that the basic rule edge used by the auxiliary rules is implemented in Algorithm 3.1 appearing in Chapter 4. The given code works for α = 3 but it is easily adapted to α = 2 by removing the rules that refer to the left-handed strand lh, since only one helical strand suffices for α = 2.

A.1 Example

We present an example that shows how to use Algorithm A.1 to verify a particular ME(x). To examine ME(6) for the code setting (α = 3, s = 3, p = 3), we explore the search spaces A and B. Space A is constrained by height = 3 and width ≤ 3. Since the goal is to find the minimal erasure pattern inside the non-safe zone A, it makes sense to reduce the space even more. We use a space A defined with height = 3 and width = 2. In this reduced space there are exactly six nodes.

Algorithm A.2 — Auxiliary rules for Algorithm A.1

1  % auxiliary rules used by Algorithm A.1 (α = 3)
2  loss(X,L1) :- memberchk(X,L1).
3  loss(X,Y,L2) :- memberchk(X-Y,L2).
4  double_loss(X,Y,right,L1,L2) :- loss(Y,L1); loss(X,Y,L2).
5  double_loss(X,Y,left,L1,L2) :- loss(Y,L1); loss(Y,X,L2).
6
7  parity_loss(S,_,X,Y,h,L1,L2) :- loss(X,Y,L2)
8      , edge(S,_,X,W,hi), double_loss(W,X,right,L1,L2)
9      , edge(S,_,Y,Z,h), double_loss(Z,Y,left,L1,L2)
10     .
11 parity_loss(S,P,X,Y,rh,L1,L2) :- loss(X,Y,L2)
12     , edge(S,P,X,W,rhi), double_loss(W,X,right,L1,L2)
13     , edge(S,P,Y,Z,rh), double_loss(Z,Y,left,L1,L2)
14     .
15 parity_loss(S,P,X,Y,lh,L1,L2) :- loss(X,Y,L2)
16     , edge(S,P,X,W,lhi), double_loss(W,X,right,L1,L2)
17     , edge(S,P,Y,Z,lh), double_loss(Z,Y,left,L1,L2)
18     .
19 data_loss(S,P,X,L1,L2) :- loss(X,L1)
20     , tuple_loss(S,_,X,h,L1,L2)
21     , tuple_loss(S,P,X,rh,L1,L2)
22     , tuple_loss(S,P,X,lh,L1,L2)
23     .
24 tuple_loss(S,_,X,h,L1,L2) :-
25     edge(S,_,X,Y,h), parity_loss(S,_,X,Y,h,L1,L2)
26     ; edge(S,_,X,Z,hi), parity_loss(S,_,Z,X,h,L1,L2)
27     .
28 tuple_loss(S,P,X,rh,L1,L2) :-
29     edge(S,P,X,Y,rh), parity_loss(S,P,X,Y,rh,L1,L2)
30     ; edge(S,P,X,Z,rhi), parity_loss(S,P,Z,X,rh,L1,L2)
31     .
32 tuple_loss(S,P,X,lh,L1,L2) :-
33     edge(S,P,X,Y,lh), parity_loss(S,P,X,Y,lh,L1,L2)
34     ; edge(S,P,X,Z,lhi), parity_loss(S,P,Z,X,lh,L1,L2)
35     .


The initial position in the lattice, say node d22, is arbitrarily chosen. The induced subgraph S = {V, E} constructed with all the nodes inside the non-safe zone is our minimal erasure pattern candidate:

V = {d22, d23, d24, d25, d26, d27}
E = {p22,25, p23,26, p24,27, p22,26, p23,27, p24,25, p22,27, p23,25, p24,26}

To verify this pattern using Algorithm A.1 we adapt the node and edge index notation. Vertices are indicated with the index only, e.g., 22. Edges are indicated with X-Y, e.g., 22-25. Input lists are enclosed in square brackets. The order of elements in the list is irrelevant. Now, we can call the rules (where V and E are replaced by the lists described above):

me_vertex(3,3,V,E,V).
me_edge(3,3,V,E,E).

Each rule outputs true if the last parameter contains the list of elements that are necessary to define a minimal pattern. The rules verify necessary conditions, not sufficient. If we remove one of the elements, the output remains true. On the contrary, if we add an element that is not required to define an erasure pattern the output will be false.

Using space A, we found for the setting (α = 3, s = 3, p = 3) that the minimal erasure pattern has size |ME(6)| = 15 (the total number of elements in both sets).

To test inside space B, we use a subgraph similar to the examples shown in Figure 5.4. As shown in the figure, this pattern is formed by a sequence of x/2 dead nodes, followed by p − x/2 live nodes and another set of x/2 dead nodes, all aligned consecutively on the same strand, together with the corresponding edge connections that connect the dead nodes, building an erasure pattern. In this case, the safe zone is defined by the live nodes located in the middle of the sequence. An example is given here:

V = {d22, d25, d28, d31, d34, d37}
E = {p22,25, p25,28, p31,34, p34,37, p22,26, p26,30, p30,31, p25,29, p29,33, p33,34, p28,32, p32,36, p36,37, p22,27, p27,29, p29,31, p25,30, p30,32, p32,34, p28,33, p33,35, p35,37}

Since the live nodes are not considered in the definition of a minimal erasure pattern, they are not listed in the vertex set. The rules validate that these sets form a minimal erasure pattern ME(6) in space B. But, in this case, |ME(6)| = 28. Clearly, the smallest failure pattern is the one found in the first search. Notably, its size corresponds to the lower bound for redundancy propagation given in Theorem 5.2.2. The existence of a smaller subgraph that causes the loss of six data nodes inside space A, built with code AE(3,3,3), is impossible.
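A quick arithmetic check of the two sizes reported above (the size of an ME is the total number of elements in its vertex and edge sets):

V_A = ["d22", "d23", "d24", "d25", "d26", "d27"]
E_A = ["p22,25", "p23,26", "p24,27", "p22,26", "p23,27", "p24,25",
       "p22,27", "p23,25", "p24,26"]
assert len(V_A) + len(E_A) == 15        # ME(6) found in search space A

nodes_B, edges_B = 6, 22                # the space-B example lists 6 nodes and 22 edges
assert nodes_B + edges_B == 28          # ME(6) found in search space B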


A.2 Results

This section presents Table A.2, which summarizes our results. The table provides the size of the minimal failure patterns found using the pattern finding method described in Section A.1. It illustrates the irregularity of fault tolerance in a helical lattice. It also unveils some regularities, listed here:

Lower bounds. If s = p, the pattern ME(x) with x = 2·s determines the lowest possible fault tolerance. Although the code can provide more fault tolerance when p increases, this setting cannot be judged as good or bad per se. Trade-offs between performance and reliability should be considered to choose the best parameters. More details can be found in Section 7.3.6.

Fault tolerance. Generally, fault tolerance increases as s and p increase, except for the upper bound cases.

Upper bounds. There are some patterns that appear due to the high connectivity between nodes. For example, the ME(8) shown in Figure 5.3 is an upper bound. Note that its size cannot be modified by increasing s or p. The only way to overcome the upper bound and increase fault tolerance is to increase α.

Safe zones. Some ME(x) found in space B can be used as a baseline to find ME(y) with y > x. For those cases, we do not restrict the search space anymore. We try to accommodate the extra y − x nodes in a valid pattern that shares the largest number of edges.

Live nodes. Patterns that include live nodes result in larger ME(x)s. Hence, patterns found in space B are larger than the lower bound due to the appearance of live nodes, i.e., making the distance between dead nodes larger.


ME(x) size, with x data loss

(s,p)    ME(2)  ME(3)  ME(4)  ME(5)  ME(6)  ME(7)  ME(8)

α = 1
REF.       3      5      7      9     11     13     15
(1,−)      3      5      7      9     11     13     15

α = 2
REF.       4      6      8     10     12     14     16
(2,2)      6     11      8     11     13     16     18
(2,3)      7     13      8     12     13     16     18
(2,4)      8     15      8     13     13     17     18
(2,5)      9     17      8     14     13     17     18
(3,3)      8     15      8     13     12     15     18
(3,4)      9     17      8     14     13     18     18
(3,5)     10     19      8     15     13     19     18
(4,4)     10     19      8     15     13     19     16
(5,5)     12     23      8     17     13     21     18

α = 3
REF.       5      9     10     14     15     19     20
(2,2)      8     10     10     14     17     20     22
(2,3)      9     17     14     21     22     21     22
(2,4)     10     19     14     22     22     29     22
(2,5)     11     21     14     23     22     30     22
(3,3)     11     21     18     27     15     19     20
(3,4)     12     13     18     28     21     23     20
(3,5)     13     25     18     29     21     23     20
(4,4)     14     27     22     34     22     23     20
(5,5)     17     33     26     41     29     23     20

Table A.2 – Fault tolerance for various code settings (s,p). The table shows the size of minimal erasure patterns |ME(x)| with x ∈ [2,8] for valid settings of single-, double-, and triple-entanglements with s ∈ [2,5] and p ∈ [2,5]. The reference values (REF.) indicate the lower bound for a particular α.


Bibliography

Adams, I., Miller, E. L., and Rosenthal, D. S. (2011). Using storage class memory for archives with DAWN, a durable array of wimpy nodes. Technical Report UCSC-SSRC-11-07, University of California, Santa Cruz.

Adya, A., Bolosky, W. J., Castro, M., Cermak, G., Chaiken, R., Douceur, J. R., Howell, J., Lorch, J. R., Theimer, M., and Wattenhofer, R. P.(2002). Farsite: Federated, available, and reliable storage for an incompletely trusted environment. ACM SIGOPS Operating Systems Review, 36(SI):1–14.

Amer, A., Long, D. D., Schwarz, T., and Paris, J.-F. (2008). Increased reliability with SSPiRAL data layouts. In 2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems, pages 1–10. IEEE.

Anderson, R. et al. (1996). The eternity service. In Proceedings of PRAGOCRYPT, volume 96, pages 242–252.

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al. (2010). A view of cloud computing. Communications of the ACM, 53(4):50–58.

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., Lee, G., Patterson, D. A., Rabkin, A., Stoica, I., and Zaharia, M. (2009). Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley.

Aspnes, J., Feigenbaum, J., Yampolskiy, A., and Zhong, S. (2007). Towards a theory of data entanglement. Theoretical Computer Science, 389(1):26–43.

Ateniese, G., Dagdelen, O., Damgård, I., and Venturi, D. (2016). Entangled cloud storage. Future Gener. Comput. Syst., 62(C):104–118.

Bailis, P. and Kingsbury, K. (2014). The network is reliable: An informal survey of real-world communications failures. ACM Queue, 12(7), July 2014. Also appears in Communications of the ACM, 57(9):48–55, September 2014.


Bairavasundaram, L. N., Goodson, G. R., Pasupathy, S., and Schindler, J. (2007). An analysis of latent sector errors in disk drives. In ACM SIGMETRICS Performance Evaluation Review, volume 35, pages 289–300. ACM.

Baran, P. (1964). On distributed communications networks. IEEE Transactions on Communications Systems, 12(1):1–9.

Bardini, T. (2017). A. M. Turing Awards: Douglas Engelbart. ACM website.

Basak, J., Wadhwani, K., and Voruganti, K. (2016). Storage workload identification. Trans. Storage, 12(3):14:1–14:30.

Beach, B. (2014). What hard drive should I buy? Backblaze Blog, January 2014.

Beach, B. (2015). Backblaze open sources Reed-Solomon erasure coding source code.

Benedetto, S. and Montorsi, G. (1996). Design of parallel concatenated convolutional codes. IEEE Transactions on Communications, 44(5):591–600.

Berrou, C. and Glavieux, A. (1996). Near optimum error correcting coding and decoding: Turbo-codes. IEEE Transactions on communications, 44(10):1261–1271.

Berrou, C. and Glavieux, A. (2003). Turbo codes. Encyclopedia of Telecommunications.

Berrou, C., Glavieux, A., and Thitimajshima, P. (1993). Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. In Communications, 1993. ICC'93 Geneva. Technical Program, Conference Record, IEEE International Conference on, volume 2, pages 1064–1070. IEEE.

Bessani, A., Correia, M., Quaresma, B., André, F., and Sousa, P. (2013). DepSky: Dependable and secure storage in a cloud-of-clouds. ACM Transactions on Storage (TOS), 9(4):12.

Blake, C. and Rodrigues, R. (2003). High availability, scalable storage, dynamic peer networks: Pick two. In HotOS, volume 3, page 1.

Blaum, M., Brady, J., Bruck, J., and Menon, J. (1995). EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Transactions on Computers, 44(2):192–202.

Bornholt, J., Lopez, R., Carmean, D. M., Ceze, L., Seelig, G., and Strauss, K. (2016). A DNA-based archival storage system. ACM SIGOPS Operating Systems Review, 50(2):637–649.

Braam, P.J. and Zahir, R. (2002). Lustre: A scalable, high performance file system. Cluster File Systems, Inc.

Brewer, E., Ying, L., Greenfield, L., and Cypher, R. (2016). Disks for data centers. Technical report, Google Research.


Brewster, K. (2016). Help us keep the archive free, accessible, and reader private. https://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/.

Burkhard, W. A. and Menon, J. (1993). Disk array storage system reliability. In Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on, pages 432–441. IEEE.

Cachin, C., Keidar, I., and Shraer, A. (2009). Trusting the cloud. ACM SIGACT News, 40(2):81–86.

Cappello, F., Geist, A., Gropp, W., Kale, S., Kramer, B., and Snir, M. (2014). Toward exascale resilience: 2014 update. Supercomputing frontiers and innovations, 1(1):5–28.

Carretero, J. (2017). HPC and big data convergence. Lecture at CUSO Winter School in Computer Science, Champery, Switzerland.

Chansler, R. J. (2012). Data availability and durability with the Hadoop distributed file system. The USENIX Magazine.

Chatzigeorgiou, I., Rodrigues, M. R., Wassell, I. J., and Carrasco, R. A. (2009). Analysis and design of punctured rate-1/2 turbo codes exhibiting low error floors. IEEE Journal on Selected Areas in Communications, 27(6).

Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., and Patterson, D. A. (1994). RAID: High-performance, reliable secondary storage. ACM Computing Surveys (CSUR), 26(2):145–185.

Choi, H. and Varian, H. (2012). Predicting the present with google trends. Economic Record, 88(s1):2–9.

Christidis, K. and Devetsikiotis, M. (2016). Blockchains and smart contracts for the internet of things. IEEE Access, 4:2292–2303.

Chun, B.-G., Dabek, F.,Haeberlen, A., Sit, E., Weatherspoon, H., Kaashoek, M. F.,Kubiatowicz, J., and Morris, R. (2006). Efficient replica maintenance for distributed storage systems. In NSDI, volume 6, pages 4–4.

Church, G. M., Gao, Y., and Kosuri, S. (2012). Next-generation digital information storage in DNA. Science, 337(6102):1628–1628.

Cidon, A., Rumble, S. M., Stutsman, R., Katti, S., Ousterhout, J. K., and Rosenblum, M. (2013). Copysets: Reducing the frequency of data loss in cloud storage. In USENIX Annual Technical Conference, pages 37–48.

Clarke, I., Sandberg, O., Wiley, B., and Hong, T. W. (2001). Freenet: A distributed anonymous information storage and retrieval system. In Designing Privacy Enhancing Technologies, pages 46–66. Springer.

Colarelli, D. and Grunwald, D. (2002). Massive arrays of idle disks for storage archives. In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11. IEEE Computer Society Press.


Colbourn, C. J. (1993). Analysis and synthesis problems for network resilience. Mathematical and computer modelling, 17(11):43–48.

Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. (2004). Row-diagonal parity for double disk failure correction. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies, pages 1–14.

Corderí, I., Schwarz, T., Amer, A., Long, D. D., and Pâris, J.-F. (2010). Self-adjusting two-failure tolerant disk arrays. In Petascale Data Storage Workshop (PDSW), 2010 5th, pages 1–5. IEEE.

Dabek, F., Kaashoek, M. F., Karger, D., Morris, R., and Stoica, I. (2001). Wide-area cooperative storage with cfs. SIGOPS Oper. Syst. Rev., 35(5):202–215.

Dean, J. (2010). Evolution and future directions of large-scale storage and computation systems at google. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10, pages 1–1, New York, NY, USA. ACM.

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. (2007). Dynamo: amazon’s highly available key-value store. ACM SIGOPS Operating Systems Review, 41(6):205–220.

Dimakis, A. G., Godfrey, P. B., Wu, Y., Wainwright, M. J., and Ramchandran, K. (2010). Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551.

Dimakis, A. G., Prabhakaran, V., and Ramchandran, K. (2006). Decentralized erasure codes for distributed networked storage. IEEE/ACM Transactions on Networking (TON), 14(SI):2809–2816.

Dimakis, A. G., Ramchandran, K., Wu, Y., and Suh, C. (2011). A survey on network codes for distributed storage. Proceedings of the IEEE, 99(3):476–489.

Dingledine, R., Freedman, M. J., and Molnar, D. (2001). The free haven project: Distributed anonymous storage service. In Designing Privacy Enhancing Technologies, pages 67–95. Springer.

Dongarra, J., Tourancheau, B., and Cappello, F. (2009a). Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities. The International Journal of High Performance Computing Applications, 23(3):212–226.

Dongarra, J., Tourancheau, B., and Plank, J. S. (2009b). The raid-6 liber8tion code. The International Journal of High Performance Computing Applications, 23(3):242–251.

Drago, I., Bocchi, E., Mellia, M., Slatman, H., and Pras, A. (2013). Benchmarking personal cloud storage. In Proceedings of the 2013 conference on Internet measurement conference, pages 205–212. ACM.

Espinal, X., Adde, G., Chan, B., Iven, J., Presti, G. L., Lamanna, M., Mascetti, L., Pace, A., Peters, A., Ponce, S., et al. (2014). Disk storage at cern: Handling lhc data and beyond. In Journal of Physics: Conference Series, volume 513, page 042017. IOP Publishing.

Estrada-Galinanes, V. and Pâris, J.-F. (2017a). Spigot codes. Submitted for publication.

Estrada Galinanes, V. (2013). Towards eternal storage using data entanglement. Technical report, Institut für Informatik und angewandte Mathematik.

Estrada Galinanes, V. (2015). The entangled storage project: Building a trustworthy long-term data storage. Technical report, Institut für Informatik und angewandte Mathematik.

Estrada-Galinanes, V. (2016). Trustworthy entangled storage. Eurosys Doctoral Workshop.

Estrada-Galinanes, V. (2017). A research playground for examining erasure codes. In Pro- ceedings of the 10th ACM International Systems and Storage Conference, New York, NY, USA. ACM.

Estrada-Galinanes, V. and Felber, P. (2013). Helical entanglement codes: An efficient approach for designing robust distributed storage systems. In Symposium on Self-Stabilizing Systems, pages 32–44. Springer.

Estrada-Galinanes, V. and Felber, P. (2015a). Changing the redundancy paradigm: Challenges of building an entangled storage. Usenix FAST - Work in Progress Session.

Estrada-Galinanes, V. and Felber, P. (2015b). Ensuring data durability with increasingly interdependent content. In 2015 IEEE International Conference on Cluster Computing, pages 162–165. IEEE.

Estrada-Galinanes, V. and Felber, P. (2016). Override raid: Redundant arrays of interdependent disks. Usenix FAST - Work in Progress Session.

Estrada-Galinanes, V., Miller, E., and Felber, P. (2017b). Alpha entanglement codes: Practical erasure codes to archive big data in unreliable environments. Submitted for publication.

Estrada-Galinanes, V., Pâris, J.-F., and Felber, P. (2016). Simple data entanglement layouts with high reliability. In 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).

Fan, B., Tantisiriroj, W., Xiao, L., and Gibson, G. (2009). Diskreduce: Raid for data-intensive scalable computing. In Proceedings of the 4th Annual Workshop on Petascale Data Storage, pages 6–10. ACM.

Fan, B., Tantisiriroj, W., Xiao, L., and Gibson, G. (2011). Diskreduce: Replication as a prelude to erasure coding in data-intensive scalable computing. SC11.

Faraci, V. (2006). Calculating failure rates of series/parallel networks. The Journal of Alion’s System Reliability Center.

Feldman, J., Abou-Faycal, I., and Frigo, M. (2002). A fast maximum-likelihood decoder for convolutional codes. In Vehicular Technology Conference, 2002. Proceedings. VTC 2002-Fall. 2002 IEEE 56th, volume 1, pages 371–375. IEEE.

Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., and Quinlan, S. (2010). Availability in globally distributed storage systems. In OSDI, pages 61–74.

Fragouli, C., Le Boudec, J.-Y., and Widmer, J. (2006). Network coding: an instant primer. ACM SIGCOMM Computer Communication Review, 36(1):63–68.

Fragouli, C., Soljanin, E., et al. (2008). Network coding applications. Foundations and Trends® in Networking, 2(2):135–269.

Gallager, R. (1962). Low-density parity-check codes. IRE Transactions on information theory, 8(1):21–28.

Gang, W., Xiaoguang, L., Sheng, L., Guangjun, X., and Jing, L. (2008). Generalizing rdp codes using the combinatorial method. In Network Computing and Applications, 2008. NCA’08. Seventh IEEE International Symposium on, pages 93–100. IEEE.

Ghemawat, S., Gobioff, H., and Leung, S.-T. (2003). The google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43.

Gkantsidis, C. and Rodriguez, P.R. (2005). Network coding for large scale content distribution. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 4, pages 2235–2245. IEEE.

Goddard, W. and Lynott, J. (1970). Direct access magnetic disc storage device. US Patent 3,503,060.

Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E. M., Sipos, B., and Birney, E. (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized dna. Nature, 494(7435):77–80.

Greenan, K., Miller, E. L., and Wylie, J. (2008a). Reliability of flat xor-based erasure codes on heterogeneous devices. In Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008), pages 147–156.

Greenan, K. M., Li, X., and Wylie, J. J. (2010). Flat xor-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs. In Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1–14. IEEE.

Greenan, K. M., Miller, E. L., and Schwarz, S. T. J. (2008b). Optimizing galois field arithmetic for diverse processor architectures and applications. In Modeling, Analysis and Simulation of Computers and Telecommunication Systems, 2008. MASCOTS 2008. IEEE International Symposium on, pages 1–10. IEEE.

Greenan, K. M., Miller, E. L., and Schwarz, T. J. (2007a). Analysis and construction of galois fields for efficient storage reliability. Tech. Rep.

Greenan, K. M., Miller, E. L., Schwarz, T. J., and Long, D. D. (2007b). Disaster recovery codes: increasing reliability with large-stripe erasure correcting codes. In Proceedings of the 2007 ACM workshop on Storage security and survivability, pages 31–36. ACM.

Ha, J. and McLaughlin, S. W. (2003). Optimal puncturing of irregular low-density parity-check codes. In Communications, 2003. ICC’03. IEEE International Conference on, volume 5, pages 3110–3114. IEEE.

Haeberlen, A., Mislove, A., and Druschel, P. (2005). Glacier: Highly durable, decentralized storage despite massive correlated failures. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation-Volume 2, pages 143–158. USENIX Association.

Hafner, J. L. (2005). Weaver codes: Highly fault tolerant erasure codes for storage systems. In FAST, volume 5, pages 16–16.

Hafner, J. L. (2006). Hover erasure codes for disk arrays. In Dependable Systems and Networks, 2006. DSN 2006. International Conference on, pages 217–226. IEEE.

Hafner, J. L. and Rao, K. (2006). Notes on reliability models for non-mds erasure codes. IBM Res. rep. RJ10391.

Hagmann, R. (1987). Reimplementing the cedar file system using logging and group commit. SIGOPS Oper. Syst. Rev., 21(5):155–162.

Handbook, E. R. D. (1982). MIL-HDBK-338B: Electronic Reliability Design Handbook, October 1998. US Department of Defense.

Hellerstein, L., Gibson, G. A., Karp, R. M., Katz, R. H., and Patterson, D. A. (1994). Coding techniques for handling failures in large disk arrays. Algorithmica, 12(2-3):182–208.

Holland, M. and Gibson, G. A. (1992). Parity declustering for continuous operation in redun- dant disk arrays. SIGPLAN Not., 27(9):23–35.

Huan, W. and Nakazato, H. (2015). Failure detection in p2p-grid system. IEICE TRANSACTIONS on Information and Systems, 98(12):2123–2131.

Huang, C., Chen, M., and Li, J. (2007). Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Network Computing and Applications, 2007. NCA 2007. Sixth IEEE International Symposium on, pages 79–86. IEEE.

Huang, C., Chen, M., and Li, J. (2013). Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. ACM Transactions on Storage (TOS), 9(1):3.

Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., and Yekhanin, S. (2012). Erasure coding in windows azure storage. In Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), pages 15–26.

Huang, C. and Xu, L. (2008). Star: An efficient coding scheme for correcting triple storage node failures. IEEE Transactions on Computers, 57(7):889–901.

Jaffe, E. and Kirkpatrick, S. (2009). Architecture of the internet archive. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, page 11. ACM.

Jeon, S. (2010). Low-Density Parity-Check Codes for Data Storage and Memory Systems. PhD thesis, Carnegie Mellon University Pittsburgh, PA.

Jiekak, S., Kermarrec, A.-M., Le Scouarnec, N., Straub, G., and Van Kempen, A. (2013). Regenerating codes: A system perspective. ACM SIGOPS Operating Systems Review, 47(2):23–32.

Kamra, A., Misra, V., Feldman, J., and Rubenstein, D. (2006). Growth codes: Maximizing sensor network data persistence. SIGCOMM Comput. Commun. Rev., 36(4):255–266.

Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and Lewin, D. (1997). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 654–663. ACM.

Khan, O., Burns, R. C., Plank, J. S., and Huang, C. (2011). In search of i/o-optimal recovery from disk failures. In HotStorage.

Khan, O., Burns, R. C., Plank, J. S., Pierce, W., and Huang, C. (2012). Rethinking erasure codes for cloud file systems: minimizing i/o for recovery and degraded reads. In FAST, page 20.

Kosba, A., Miller, A., Shi, E., Wen, Z., and Papamanthou, C. (2016). Hawk: The blockchain model of cryptography and privacy-preserving smart contracts. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 839–858. IEEE.

Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., et al. (2000). Oceanstore: An architecture for global-scale persistent storage. ACM SIGPLAN Notices, 35(11):190–201.

Lakshman, A. and Malik, P. (2010). Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40.

Lee, W. S. (2004). Two-dimensional storage array with prompt parity in one dimension and delayed parity in a second dimension. US Patent 6,675,318.

Li, D. and Haimes, Y. Y. (1992). A decomposition method for optimization of large-system reliability. IEEE Transactions on Reliability, 41(2):183–188.

Li, M., Qin, C., Lee, P. P., and Li, J. (2014). Convergent dispersal: Toward storage-efficient security in a cloud-of-clouds. In HotCloud.

Li, P., Li, J., Stones, R. J., Wang, G., Li, Z., and Liu, X. (2016). Procode: A proactive erasure coding scheme for cloud storage systems. In Reliable Distributed Systems (SRDS), 2016 IEEE 35th Symposium on, pages 219–228. IEEE.

Lillibridge, M., Elnikety, S., Birrell, A., Burrows, M., and Isard, M. (2003). A cooperative internet backup scheme. In Proceedings of the annual conference on USENIX Annual Technical Conference, pages 3–3. USENIX Association.

Lin, W., Chiu, D. M., and Lee, Y. (2004). Erasure code replication revisited. In Peer-to-Peer Computing, 2004. Proceedings. Fourth International Conference on, pages 90–97. IEEE.

Luby, M. (2002). Lt codes. In Proceedings of the 43rd Symposium on Foundations of Computer Science, FOCS ’02, pages 271–, Washington, DC, USA. IEEE Computer Society.

Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T., and Minder, L. (2011). Proposed standard rfc 6330 - raptorq forward error correction scheme for object delivery. Technical report, Internet Engineering Task Force.

Lucky, R. W. (1993). Lucky Strikes... Again: (Feats and Foibles of Engineers). John Wiley & Sons.

Ma, A., Traylor, R., Douglis, F., Chamness, M., Lu, G., Sawyer, D., Chandra, S., and Hsu, W. (2015). Raidshield: Characterizing, monitoring, and proactively protecting against disk failures. Trans. Storage, 11(4):17:1–17:28.

MacKay, D. J. (2003). Information theory, inference and learning algorithms. Cambridge University Press.

MacKay, D. J. (2005). Fountain codes. IEE Proceedings-Communications, 152(6):1062–1068.

MacKay, D. J., Wilson, S. T., and Davey, M. C. (1999). Comparison of constructions of irregular gallager codes. IEEE Transactions on Communications, 47(10):1449–1454.

Mager, T., Biersack, E., and Michiardi, P. (2012). A measurement study of the wuala on-line storage service. In Peer-to-Peer Computing (P2P), 2012 IEEE 12th International Conference on, pages 237–248. IEEE.

Maniatis, P., Roussopoulos, M., Giuli, T. J., Rosenthal, D. S., and Baker, M. (2005). The lockss peer-to-peer digital preservation system. ACM Transactions on Computer Systems (TOCS), 23(1):2–50.

McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133.

Mercier, H., Augier, M., and Lenstra, A. K. (2015). Step-archival: Storage integrity and anti- tampering using data entanglement. In Information Theory (ISIT), 2015 IEEE International Symposium on, pages 1590–1594. IEEE.

Miranda, A., Effert, S., Kang, Y., Miller, E. L., Popov, I., Brinkmann, A., Friedetzky, T., and Cortes, T. (2014). Random slicing: Efficient and scalable data placement for large-scale storage systems. ACM Transactions on Storage, 10(3).

Muralidhar, S., Lloyd, W., Roy, S., Hill, C., Lin, E., Liu, W., Pan, S., Shankar, S., Sivakumar, V., Tang, L., et al. (2014). f4: Facebook’s warm blob storage system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 383–398.

Nagaraj, S. V. (2017). Review of algebraic coding theory revised edition by elwyn berlekamp. SIGACT News, 48(1):23–26.

Oggier, F. (2013). On coding techniques for networked distributed storage systems. Lecture at the First European Training School on Network Coding, ICT Cost Action IC1104.

Oggier, F. and Datta, A. (2011). Self-repairing homomorphic codes for distributed storage systems. In INFOCOM, 2011 Proceedings IEEE, pages 1215–1223. IEEE.

Panzer-Steindel, B. (2007). Data integrity. CERN/IT.

Papailiopoulos, D. S. and Dimakis, A. G. (2014). Locally repairable codes. IEEE Transactions on Information Theory, 60(10):5843–5855.

Patterson, D. A., Gibson, G., and Katz, R. H. (1988). A case for redundant arrays of inexpensive disks (raid). In Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, SIGMOD ’88, pages 109–116, New York, NY, USA. ACM.

Peters, A. J. and Janyst, L. (2011). Exabyte scale storage at cern. In Journal of Physics: Conference Series, Part 5: Computing Fabrics and Networking Technologies, volume 331. IOP Publishing.

Pike, R., Presotto, D., Dorward, S., Flandrena, B., Thompson, K., Trickey, H., and Winterbottom, P. (1995). Plan 9 from bell labs. Computing systems, 8(2):221–254.

Pinheiro, E., Weber, W.-D., and Barroso, L. A. (2007). Failure trends in a large disk drive population. In FAST, volume 7, pages 17–23.

Pishro-Nik, H. and Fekri, F. (2007). Results on punctured low-density parity-check codes and improved iterative decoding techniques. IEEE Transactions on Information Theory, 53(2):599–614.

Plank, J. S. et al. (1997). A tutorial on reed-solomon coding for fault-tolerance in raid-like systems. Softw., Pract. Exper., 27(9):995–1012.

Plank, J. S., Greenan, K. M., and Miller, E. L. (2013). Screaming fast galois field arithmetic using simd instructions. In FAST, pages 299–306.

Plank, J. S. and Thomason, M. G. (2003). On the practical use of ldpc erasure codes for distributed storage applications. Technical Report CS-03–510.

Plank, J. S. and Xu, L. (2006). Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications. In Network Computing and Applications, 2006. NCA 2006. Fifth IEEE International Symposium on, pages 173–180. IEEE.

Protocol Labs (2014). Interplanetary file system (ipfs - the permanent web). https://github.com/ipfs/ipfs.

Quinlan, S. and Dorward, S. (2002). Venti: A new approach to archival storage. In FAST, volume 2, pages 89–101.

Raab, M. and Steger, A. (1998). “balls into bins”—a simple and tight analysis. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 159–170. Springer.

Rashmi, K. V., Shah, N. B., and Kumar, P. V. (2011). Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-matrix construction. IEEE Transactions on Information Theory, 57(8):5227–5239.

Reed, I. S. and Solomon, G. (1960). Polynomial codes over certain finite fields. Journal of the society for industrial and applied mathematics, 8(2):300–304.

Rhea, S., Geels, D., Roscoe, T., Kubiatowicz, J., et al. (2004). Handling churn in a dht. In Proceedings of the USENIX Annual Technical Conference, volume 6, pages 127–140. Boston, MA, USA.

Richardson, T. J., Shokrollahi, M. A., and Urbanke, R. L. (2001). Design of capacity-approaching irregular low-density parity-check codes. IEEE Transactions on Information Theory, 47(2):619–637.

Richardson, T. J. and Urbanke, R. L. (2001). Efficient encoding of low-density parity-check codes. IEEE Transactions on Information Theory, 47(2):638–656.

Ripeanu, M. (2001). Peer-to-peer architecture case study: Gnutella network. In Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 99–100. IEEE.

Riska, A. and Riedel, E. (2006). Disk drive level workload characterization. In Proceedings of the annual conference on USENIX’06 Annual Technical Conference, pages 9–9. USENIX Association.

Rosenblum, M. and Ousterhout, J. K. (1992). The design and implementation of a log- structured file system. ACM Transactions on Computer Systems (TOCS), 10(1):26–52.

Rosenthal, A. (1977). Computing the reliability of complex networks. SIAM Journal on Applied Mathematics, 32(2):384–393.

Rosenthal, D. S., Robertson, T. S., Lipkis, T., Reich, V., and Morabito, S. (2005). Requirements for digital preservation systems: A bottom-up approach. arXiv preprint cs/0509018.

Rosenthal, D. S., Rosenthal, D. C., Miller, E. L., Adams, I. F., Storer, M. W., and Zadok, E. (2012). The economics of long-term digital storage. Memory of the World in the Digital Age, Vancouver, BC.

Rozenberg, G. (1997). Handbook of graph grammars and computing by graph transformation, volume 1. World Scientific.

Rozier, E. W., Belluomini, W., Deenadhayalan, V., Hafner, J., Rao, K., and Zhou, P. (2009). Evaluating the impact of undetected disk errors in raid systems. In Dependable Systems & Networks, 2009. DSN’09. IEEE/IFIP International Conference on, pages 83–92. IEEE.

Santry, D. S., Feeley, M. J., Hutchinson, N. C., Veitch, A. C., Carton, R. W., and Ofir, J. (2000). Deciding when to forget in the elephant file system. SIGOPS Oper. Syst. Rev., 34(2):18–19.

Sathiamoorthy, M., Asteris, M., Papailiopoulos, D., Dimakis, A. G., Vadali, R., Chen, S., and Borthakur, D. (2013). Xoring elephants: Novel erasure codes for big data. In Proceedings of the 39th international conference on Very Large Data Bases, PVLDB’13, pages 325–336. VLDB Endowment.

Schroeder, B. and Gibson, G. (2010). A large-scale study of failures in high-performance computing systems. IEEE Transactions on Dependable and Secure Computing, 7(4):337–350.

Schroeder, B. and Gibson, G. A. (2007). Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In FAST, volume 7, pages 1–16.

Schwarz, T., Amer, A., Kroeger, T., Miller, E. L., Long, D. D. E., and Pâris, J.-F. (2016). RESAR: Reliable storage at exabyte scale. In Proceedings of the 24th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2016).

Schwarz, T. J. and Burkhard, W. A. (1992). Raid organization and performance. In ICDCS, pages 318–325.

Schwarz, T. J., Xin, Q., Miller, E. L., Long, D. D., Hospodor, A., and Ng, S. (2004). Disk scrubbing in large archival storage systems. In Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2004 (MASCOTS 2004). Proceedings. The IEEE Computer Society’s 12th Annual International Symposium on, pages 409–418. IEEE.

Schwarz, T. J. E. (1994). Reliability and performance of disk arrays. PhD thesis, University of California, San Diego.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27:379–423.

Shannon, C. E. (1958). Von neumann’s contributions to automata theory. Bull. Amer. Math. Soc, 64:123–129.

Shiroishi, Y., Fukuda, K., Tagawa, I., Iwasaki, H., Takenoiri, S., Tanaka, H., Mutoh, H., and Yoshikawa, N. (2009). Future options for hdd storage. IEEE Transactions on Magnetics, 45(10):3816–3822.

Shokrollahi, A. (2006). Raptor codes. IEEE transactions on information theory, 52(6):2551–2567.

Shokrollahi, A., Luby, M., Watson, M., and Stockhammer, T. (2007). Proposed standard rfc 5053 - raptor forward error correction scheme for object delivery. Technical report, Internet Engineering Task Force.

Shooman, M. L. (1990). Probabilistic Reliability: An Engineering Approach. Krieger, Melbourne, FL.

Shooman, M. L. (2003). Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design. John Wiley & Sons.

Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The hadoop distributed file system. In Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pages 1–10. IEEE.

Slamanig, D. and Hanser, C. (2012). On cloud storage and the cloud of clouds approach. In Internet Technology And Secured Transactions, 2012 International Conference for, pages 649–655. IEEE.

Smith, B. P., Farhood, A., Hunt, A., Kschischang, F. R., and Lodge, J. (2012). Staircase codes: Fec for 100 gb/s otn. Journal of Lightwave Technology, 30(1):110–117.

Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review, 31(4):149–160.

Storer, M. W., Greenan, K. M., Miller, E. L., and Voruganti, K. (2009). Potshards—a secure, recoverable, long-term archival storage system. ACM Transactions on Storage (TOS), 5(2):5.

Stubblefield, A. and Wallach, D. S. (2001). Dagster: censorship-resistant publishing without replication. Rice University, Dept. of Computer Science, Tech. Rep. TR01-380.

Suh, C. and Ramchandran, K. (2009). Exact regeneration codes for distributed storage repair using interference alignment. arXiv preprint arXiv:1001.0107.

Tang, H., Gulbeden, A., Zhou, J., Strathearn, W., Yang, T., and Chu, L. (2004). A self-organizing storage cluster for parallel data-intensive applications. In Supercomputing, 2004. Proceedings of the ACM/IEEE SC2004 Conference, pages 52–52. IEEE.

Tanner, R. (1981). A recursive approach to low complexity codes. IEEE Transactions on information theory, 27(5):533–547.

Tran, D. N., Chiang, F., and Li, J. (2008). Friendstore: cooperative online backup using trusted nodes. In Proceedings of the 1st Workshop on Social Network Systems, pages 37–42. ACM.

Turing, A. M. (1937). On computable numbers, with an application to the entscheidungsproblem. Proceedings of the London mathematical society, 2(1):230–265.

Vardy, A. (1997). Algorithmic complexity in coding theory and the minimum distance problem. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 92–109. ACM.

Vermij, E., Fiorin, L., Hagleitner, C., and Bertels, K. (2014). Exascale radio astronomy: Can we ride the technology wave? In International Supercomputing Conference, pages 35–52. Springer.

Vishwanath, K. V. and Nagappan, N. (2010). Characterizing cloud computing hardware relia- bility. In Proceedings of the 1st ACM symposium on Cloud computing, pages 193–204. ACM.

Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE transactions on Information Theory, 13(2):260–269.

Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98.

Von Neumann, J. (1993). First draft of a report on the edvac. IEEE Annals of the History of Computing, 15(4):27–75.

Vrable, M., Savage, S., and Voelker, G. M. (2012). Bluesky: A cloud-backed file system for the enterprise. In Proceedings of the 10th USENIX conference on File and Storage Technologies, pages 19–19. USENIX Association.

Waldman, M. and Mazieres, D. (2001). Tangler: a censorship-resistant publishing system based on document entanglements. In Proceedings of the 8th ACM conference on Computer and Communications Security, pages 126–135. ACM.

Waldman, M., Rubin, A. D., and Cranor, L. F. (2000). Publius: A robust, tamper-evident censorship-resistant web publishing system. In 9th USENIX Security Symposium, pages 59–72.

Wang, D., Zhang, Q., and Liu, J. (2006). Partial network coding: Theory and application for continuous sensor data collection. In Quality of Service, 2006. IWQoS 2006. 14th IEEE International Workshop on, pages 93–101. IEEE.

Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D., and Maltzahn, C. (2006a). Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 307–320. USENIX Association.

Weil, S. A., Brandt, S. A., Miller, E. L., and Maltzahn, C. (2006b). Crush: Controlled, scalable, decentralized placement of replicated data. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 122. ACM.

Welch, B., Unangst, M., Abbasi, Z., Gibson, G. A., Mueller, B., Small, J., Zelenka, J., and Zhou, B. (2008). Scalable performance of the panasas parallel file system. In FAST, volume 8, pages 1–17.

Wilcox-O’Hearn, Z. and Warner, B. (2008). Tahoe: the least-authority filesystem. In Proceedings of the 4th ACM international workshop on Storage security and survivability, pages 21–26. ACM.

Wildani, A., Schwarz, T. J., Miller, E. L., and Long, D. D. (2009). Protecting against rare event failures in archival systems. In Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009. MASCOTS’09. IEEE International Symposium on, pages 1–11. IEEE.

Wylie, J. J. (2011a). Finding the most fault-tolerant flat xor-based erasure codes for storage systems. In Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on, pages 1788–1792. IEEE.

Wylie, J. J. (2011b). List of most fault-tolerant flat xor-based erasure codes for storage systems. Technical report, HP Labs, Tech. Rep. HPL-2011-217.

Wylie, J. J., Bigrigg, M. W., Strunk, J. D., Ganger, G. R., Kiliccote, H., and Khosla, P.K. (2000). Survivable information storage systems. Computer, 33(8):61–68.

Wylie, J. J. and Swaminathan, R. (2007). Determining fault tolerance of xor-based erasure codes efficiently. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’07), pages 206–215. IEEE.

Zhang, X., Neglia, G., Kurose, J., and Towsley, D. (2006). On the benefits of random linear coding for unicast applications in disruption tolerant networks. In Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, 2006 4th International Symposium on, pages 1–7. IEEE.

Zhou, M., Zhang, R., Xie, W., Qian, W., and Zhou, A. (2010). Security and privacy in cloud computing: A survey. In Semantics Knowledge and Grid (SKG), 2010 Sixth International Conference on, pages 105–112. IEEE.

AUTHOR BIO

Verónica del Carmen Estrada Galiñanes joined the Complex Systems Group at the University of Neuchâtel, Switzerland, in 2012, supervised by Prof. Pascal Felber. She holds a Master’s degree in Applied Computer Science from the University of Tokyo, Japan, obtained in 2011, a graduate degree as Specialist in Cryptography and Security from the Higher Education Army Institute, Argentina, obtained in 2005, and an Electronic Engineering degree from the University of Buenos Aires, Argentina, obtained in 2003. Before starting her scientific career, she worked for several years in both the private and the public sector.

Verónica’s research interests include storage, security and distributed systems. In 2008, she received a Monbukagakusho scholarship from MEXT, the Japanese Ministry of Education, Culture, Sports, Science and Technology. Her research has also received support from Usenix, ACM and Google. In 2015, she received an SNF Doc.Mobility grant to spend six months as a visiting scholar at the Storage Systems Research Center, University of California, Santa Cruz, collaborating with Dr. Ethan Miller and other researchers associated with the lab. Her project was the design of next-generation erasure coding methods. She serves the community by volunteering at events and in projects that motivate and inspire students, such as ACM XRDS Crossroads, the Rotary Club and Google Summer of Code.
