Towards More Scalability and Flexibility for Distributed Storage Systems
Total Page:16
File Type:pdf, Size:1020Kb
Towards more scalability and flexibility for distributed storage systems These` de doctorat de l’Universite´ Paris-Saclay prepar´ ee´ a` Tel´ ecom´ ParisTech Ecole doctorale n◦580 Ecole Doctorale Sciences et Technologies de l’Information et de la Communication (ED STIC) Specialit´ e´ de doctorat : Informatique NNT : 2019SACLT006 These` present´ ee´ et soutenue a` Paris, le 15 Fevrier´ 2019, par GUILLAUME RUTY Composition du Jury : Andre-Luc´ Beylot Professeur, ENSEEIHT (IRIT) Rapporteur Stefano Secci Professeur, CNAM Rapporteur Raouf Boutaba Professeur, University of Waterloo Examinateur Nadia Boukhatem Professeur, Telecom ParisTech (LTCI) President´ Damien Saucez Charge´ de Recherche, INRIA Examinateur Jean-Louis Rougier Professeur, Telecom ParisTech (LTCI) Directeur de these` Andre´ Surcouf Distinguished Engineer, Cisco Systems (PIRL) Co-encadrant de these` Mark Townsley Fellow, Cisco Systems (PIRL) Invite´ TELECOM PARISTECH DOCTORAL THESIS Towards more scalability and flexibility for distributed storage systems Author: Supervisor: Guillaume Ruty Dr. Jean-Louis Rougier A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy in the Cisco Systems Paris Innovation and Research Lab (PIRL) Laboratoire Traitement et Communication de l’Information (LTCI) May 1, 2019 i Declaration of Authorship I, Guillaume Ruty, declare that this thesis titled, “Towards more scalability and flexibility for distributed storage systems” and the work presented in it are my own. I confirm that: • This work was done wholly or mainly while in candidature for a re- search degree at this University. • Where any part of this thesis has previously been submitted for a de- gree or any other qualification at this University or any other institu- tion, this has been clearly stated. • Where I have consulted the published work of others, this is always clearly attributed. • Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work. • I have acknowledged all main sources of help. • Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself. Signed: Date: 01/05/2019 ii “La maturité de l’homme est d’avoir retrouvé le sérieux qu’on avait au jeu quand on était enfant.” Alain Damasio, La Horde du Contrevent iii TELECOM PARISTECH Abstract Laboratoire Traitement et Communication de l’Information (LTCI) Doctor of Philosophy Towards more scalability and flexibility for distributed storage systems by Guillaume Ruty iv The exponentially growing demand for storage puts a huge stress on tra- ditionnal distributed storage systems. While storage devices’ performance keep improving over time, current ditributed storage systems struggle to keep up with the rate of data growth, especially with the rise of cloud and big data applications. Furthermore, the performance balance between stor- age, network and compute devices has shifted and the assumptions that are the foundation for most distributed storage systems are not true anymore. This dissertation explains how several aspects of such storage systems can be modified and rethought to make a more efficient use of the resource at their disposal. It presents 6Stor, an original architecture that uses a dis- tributed layer of metadata to provide flexible and scalable object-level stor- age, then proposes a scheduling algorithm improving how a generic storage system handles concurrent requests. Finally, it describes how to improve legacy filesystem-level caching for erasure-code-based distributed storage systems, before presenting a few other contributions made in the context of short research projects. Les besoins en terme de stockage, en augmentation exponentielle, sont difficilement satisfaits par les systèmes de stockage distribué traditionnels. Même si les performances des disques continuent à s’améliorer, les systèmes de stockage distribué actuels peinent à suivre le croissance du nombre de données requérant d’êtres stockées, notamment à cause de l’avènement des applications de big data. Par ailleurs, l’équilibre de performances entre dis- ques, cartes réseau et processeurs a changé et les suppositions sur lesquelles se basent la plupart des systèmes de stockage distribué actuels ne sont plus vraies. Cette dissertation explique de quelle manière certains aspects de tels sys- tèmes de stockages peuvent être modifiés et repensés pour faire une utilisa- tion plus efficace des ressources qui les composent. Elle présente 6Stor, une architecture de stockage nouvelle qui se base sur une couche de métadon- nées distribuée afin de fournir du stockage d’objet de manière flexible tout en passant à l’échelle. Elle détaille ensuite un algorithme d’ordonnancement des requêtes permettant à un système de stockage générique de traiter les requêtes de clients en parallèle de manière plus équitable. Enfin, elle décrit comment améliorer le cache générique du système de fichier dans le con- texte de systèmes de stockage distribué basés sur des codes correcteurs avant de présenter des contributions effectuées dans le cadre de courts projets de recherche. v Acknowledgements The work presented here could not have been done without the help and support of many people. I would first and foremost like to thank my advisors, Jean-Louis Rougier and André Surcouf for their continuous support and insight as well as for their good company. They made these 3 years feel like 1 and really focused my attention on the relevant topics when I started to feel lost in the diversity of subjects at hand. I would also like to thank Aloys Augustin and Victor Nguyen, who joined the 6Stor project as developpers under a Cisco tech fund. Aloys really fleshed out the crude code base that I wrote as a first prototype of 6Stor. We also had lengthy discussions about certain design or implementation details of 6Stor during which his insight helped me elaborate the global architecture. He also implemented RS3 – our storage scheduler – in 6Stor’s storage servers. Victor mainly worked on the 6Stor block device implementation and on the erasure-code based storage system replica cache. This section could not go without a hearty mention to Cisco and to Mark Townsley, who founded and runs Cisco’s PIRL (Paris Innovation and Re- search Lab), and recruited me first as a research intern then as a PhD student, and without whom this work would simply not exist. He has consistently been a driving force behind 6Stor, from the project’s origins to the end of my PhD. The same mention goes to Jérome Tollet, who piqued my curiosity on numerous occasions and subjects during our car rides or coffee breaks, and participated to the elaboration of RS3 with Aloys and me, in addition of be- ing a merry desk neighbour. My final thanks go to my fellow PhD students and friends, namely Jacques Samain, Yoann Desmouceaux, Marcel Enguehard, Mohammed Hawari and Hassen Siad. Whether we gathered around the lunch table, the coffee ma- chine or the babyfoot, they always kept the occasional dullness at bay and heavily contributed to making these 3 years truly special. vi Contents Declaration of Authorshipi Abstract iv Acknowledgementsv 1 What you should know about distributed storage systems6 1.1 The different types of distributed storage system architectures6 1.1.1 Network Attached Storage (NAS) and Storage Area Net- work (SAN).........................7 1.1.2 Peer-to-Peer (P2P) networks................7 1.1.3 Distributed Hash Tables (DHTs).............9 1.1.4 Master-Slaves architectures................ 13 1.1.5 Summarize......................... 16 1.2 Reliability in distributed storage systems............ 16 1.2.1 Mirroring.......................... 16 1.2.2 Replication......................... 17 1.2.3 Erasure Codes........................ 19 1.2.4 Erasure codes and replication: what is the trade-off.. 20 1.3 Consistency and consensus.................... 22 1.3.1 Theoretical frameworks.................. 22 Consistency and Availability: the CAP theorem.... 22 Database characteristics: ACID and BASE........ 24 Client-centric and data-centric consistency models... 25 1.3.2 Consensus and consistency: how to reach it....... 27 Consensus algorithms: Paxos and Raft.......... 27 Latency and Consistency, the (N,W,R) quorum model. 28 1.4 Examples of distributed storage systems............. 31 2 6Stor 33 2.1 Why we built 6Stor from scratch................. 34 2.1.1 Software layering...................... 34 2.1.2 Architectural reasons.................... 34 2.1.3 Ceph............................. 34 2.1.4 GFS.............................. 36 2.1.5 Scaling the metadata layer and embracing the hetero- geneity............................ 37 2.2 6Stor architecture.......................... 38 2.2.1 Architecture Description.................. 38 2.2.2 Attributing IPv6 prefixes to MNs............. 39 vii 2.2.3 6Stor: An IPv6-centric architecture............ 40 2.2.4 Description of basic operations.............. 42 2.2.5 Consistency......................... 45 2.3 Expanding or shrinking the cluster without impacting the clus- ter’s performance.......................... 47 2.3.1 Storage Nodes........................ 47 2.3.2 Metadata Nodes...................... 48 2.3.3 Availability and data transfer............... 48 2.4 Coping with failures: reliability and repair model....... 50 2.4.1 Reliability.......................... 50 2.4.2 Reacting to failures..................... 50 Short failure......................... 50 Definitive failure.....................