Université de Neuchâtel Faculté des Sciences Institut d’Informatique

On Reducing Latency in Geo-Distributed Systems through State Partitioning and Caching

par

Raluca Halalai

Thèse

présentée à la Faculté des Sciences pour l’obtention du grade de Docteur ès Sciences

Acceptée sur proposition du jury:

Prof. Pascal Felber, directeur de thèse Université de Neuchâtel, Suisse

Prof. Philippe Cudré-Mauroux, Université de Fribourg, Suisse

Prof. Fernando Pedone, Université de la Suisse italienne, Suisse

Prof. Etienne Rivière, Université catholique de Louvain, Belgique

Dr. Valerio Schiavoni, Université de Neuchâtel, Suisse

Prof. François Taïani, Université de Rennes 1, France

Soutenue le 14 mai 2018

Faculté des Sciences Secrétariat-décanat de Faculté Rue Emile-Argand 11 2000 Neuchâtel – Suisse Tél : + 41 (0)32 718 21 00 E-mail : [email protected]

IMPRIMATUR POUR THESE DE DOCTORAT

La Faculté des sciences de l'Université de Neuchâtel autorise l'impression de la présente thèse soutenue par

Madame Raluca HALALAI

Titre:

“On Reducing Latency in Geo-Distributed Systems through State Partitioning and Caching”

sur le rapport des membres du jury composé comme suit:

• Prof. Pascal Felber, directeur de thèse, Université de Neuchâtel, Suisse • Dr Valerio Schiavoni, Université de Neuchâtel, Suisse • Prof. François Taïani, Université de Rennes 1, France • Prof. Fernando Pedone, Université de la Suisse italienne, Lugano, Suisse • Prof. Etienne Rivière, Université catholique de Louvain, Belgique • Prof. Philippe Cudré-Mauroux, Université de Fribourg, Suisse

Neuchâtel, le 15 mai 2018 Le Doyen, Prof. R. Bshary


Acknowledgements

This work would not have been possible without the people who have supported me throughout this journey. I am deeply grateful to my advisor, Pascal Felber, for his patience and wisdom, and for always encouraging me while also giving me the freedom to become independent in pursuing my goals. I feel very fortunate to have worked with him. I thank my outstanding thesis committee: Philippe Cudré-Mauroux, Fernando Pedone, Etienne Rivière, Valerio Schiavoni, and François Taïani. Their insightful comments and advice helped crystallize the vision of this thesis. All the work presented here is the result of collaboration with many incredibly bright people, to whom I am thankful for the many insightful discussions that brought clarity to the most difficult problems. Last but not least, I thank my family and friends for their patience, support, and encouragement throughout these years.

Résumé

Les systèmes distribués modernes sont de plus en plus grands et sont déployés dans plusieurs régions géographiques. L’objectif final de tels systèmes est de fournir des services à leurs utilisateurs avec une haute disponibilité et de bonnes performances. Cette thèse propose des techniques pour réduire la latence perçue par les utilisateurs.

Pour commencer, nous considérons les systèmes qui utilisent la technique de réplication de machines à états afin de garantir la cohérence des données. La réplication de machines à états copie un service à plusieurs emplacements et coordonne les répliques afin de sérialiser toutes les commandes émises par les clients. La coordination à grande échelle a un impact significatif sur la performance du système. Nous étudions comment le partitionnement d’état peut aider à améliorer les performances sans affecter la sémantique du système. Premièrement, nous formalisons les conditions dans lesquelles un service est partitionnable et proposons une approche de partitionnement d’état générique. Nous partitionnons un service de coordination géo-distribué et montrons qu’il surpasse son homologue non partitionné, tout en offrant les mêmes garanties. Nous appliquons ensuite le partitionnement d’état pour construire un système de fichiers géo-distribué, dont les performances sont comparables à celles d’implémentations de référence de l’industrie. Nous augmentons notre système avec un partitionnement d’état dynamique, qui s’adapte à la charge de travail. Notre évaluation montre que le partitionnement d’état dynamique a un impact positif sur les performances de notre système de fichiers.

Finalement, nous étudions le compromis entre la latence et les coûts de stockage dans les systèmes de stockage qui utilisent des techniques de codage d’effacement. Afin d’améliorer les performances de lecture, les systèmes de stockage utilisent des caches proches des clients. Cependant, les stratégies de mise en cache traditionnelles ne sont pas conçues pour les particularités du codage d’effacement et ne sont pas bien adaptées à ce scénario. Nous proposons un algorithme pour mettre en cache des données codées et nous l’utilisons pour implémenter un système de mise en cache basé sur Memcached. Notre algorithme reconfigure le cache en fonction de la charge de travail et peut surpasser la performance des politiques de mise en cache traditionnelles comme Least Recently Used et Least Frequently Used.

Mots clés : systèmes géo-distribués, cohérence, partitionnement d’état, mise en cache, codage d’effacement

Abstract

Modern distributed systems are increasingly large, spanning many datacenters from different geographic regions. The end goal of such systems is to provide services to their users with high availability and good performance. This thesis proposes approaches to reduce the access latency perceived by end users.

First, we focus on systems that rely on the state machine replication approach in order to guarantee consistency. State machine replication copies a service at multiple physical locations and coordinates the replicas, possibly located in distant regions, in order to serialize all requests issued by clients. Coordination at large scale has a significant impact on the performance of the system. We investigate how state partitioning can help improve performance without breaking the semantics of the system. First, we formalize conditions under which a service is partitionable and propose a generic state partitioning approach. We build a partitioned geo-distributed coordination service and show that it outperforms its non-partitioned counterpart, while providing the same guarantees. We further apply state partitioning in order to build a geo-distributed file system, which performs comparably to other de-facto industry implementations. We augment our system with dynamic state partitioning, which moves files among data centers in order to adapt to workload patterns. Our experiments show that performing state partitioning on the fly has a positive impact on the performance of the file system when the workload exhibits access locality.

Second, we investigate the tradeoff between latency and storage cost in storage systems that employ erasure coding techniques. In order to improve read performance, storage systems often use caches that are close to clients. However, traditional caching policies are not designed for the particularities of erasure coding and are not well-suited for this scenario. We propose an algorithm for caching erasure-coded data and use it to implement a caching layer based on Memcached in front of the Amazon S3 storage system. Our caching algorithm reconfigures the cache based on workload patterns and is able to outperform traditional caching policies such as Least Recently Used and Least Frequently Used.

Keywords: geo-distributed systems, strong consistency, state partitioning, caching, erasure coding

Contents

1 Introduction 1
1.1 Problem definition 2
1.2 Proposed solution 2
1.2.1 State partitioning in geo-distributed systems 2
1.2.2 Workload-aware state partitioning 3
1.2.3 Caching tailored to erasure-coded storage systems 3
1.2.4 Summary of results 4
1.3 Dissertation plan 5

2 Background and Related Work 7
2.1 CAP theorem 7
2.2 Replication mechanisms for strong consistency 8
2.3 State partitioning 9
2.4 Adaptive state partitioning 11
2.5 Consistency in distributed file systems 12
2.5.1 File systems with strong consistency 12
2.5.2 File systems with weak consistency 14
2.6 Storage cost-aware systems 14
2.6.1 Erasure coding in storage systems 15
2.6.2 Caching 15
2.6.3 Caching erasure-coded data 17

3 State Partitioning in Geo-Distributed Systems 19
3.1 Introduction 19
3.2 System model 20
3.3 Partitioning theorems 23
3.4 Protocols 24
3.4.1 Initial construction 24
3.4.2 A queue-based construction 25
3.4.3 Ensuring disjoint access parallelism 25
3.5 ZooFence 28
3.5.1 Overview 28
3.5.2 Client-side library 29
3.5.3 Executor 29
3.6 Evaluation 31
3.6.1 Concurrent queues service 31
3.6.2 BookKeeper 33
3.7 Summary 34

4 Workload-Aware State Partitioning 35
4.1 Introduction 35
4.2 System model and definitions 36
4.3 GlobalFS: a partitioned file system 37
4.3.1 Architecture 37
4.3.2 Partitioning and replication 39
4.3.3 Example deployment 39
4.3.4 Execution modes 40
4.3.5 Failure handling 42
4.4 Workload-aware state partitioning in GlobalFS 43
4.4.1 Overview 43
4.4.2 Mapping files to partitions 45
4.4.3 Moving files 46
4.5 Implementation 48
4.5.1 Client 48
4.5.2 Atomic multicast 49
4.5.3 Metadata replicas 49
4.5.4 Data store 50
4.5.5 Partitioning oracle 51
4.6 Evaluation 51
4.6.1 Setup 51
4.6.2 Comparison against de-facto industry implementations 52
4.6.3 Benefit of workload-aware partitioning 53
4.7 Summary 55

5 Cache Replacement Policy for Erasure-Coded Data 57
5.1 Introduction 57
5.2 A case for caches tailored to erasure-coded data 58
5.2.1 Interplay between erasure coding and caching 58
5.2.2 The Knapsack Problem 60
5.3 Design 60
5.4 Algorithm 62
5.4.1 Generating caching options 63
5.4.2 Choosing the cache contents 64
5.5 Evaluation 66
5.5.1 Setup 66
5.5.2 Agar compared to other caching policies 67
5.5.3 Influence of cache size and workload 69
5.5.4 Cache contents 71
5.6 Summary 71

6 Conclusion 73

List of Publications 75

Bibliography 77

List of Figures

3.1 Partitioning a producer–consumer service leads to performance improvement. 20
3.2 ZooFence design. 28
3.3 Comparison of ZooFence against ZooKeeper. 32
3.4 BookKeeper performance (from left to right, the amount of written entries is 100, 250, 500 and 1000). 34

4.1 Overall architecture of GlobalFS. 37
4.2 Partitioning in GlobalFS: P0 is replicated in all regions, while P1, P2, P3 are each replicated in one region. 40
4.3 Integration of the partitioning oracle within GlobalFS. 44
4.4 Assigning files to partitions in GlobalFS. 45
4.5 Moving file /1/11/b from P1 to P2. 47
4.6 GlobalFS deployment spanning three regions. 48

5.1 An erasure-coded storage system spanning six AWS regions. 58
5.2 Average read latency when caching a variable number of chunks. The relationship between the number of chunks cached and the latency improvement obtained is non-linear. 59
5.3 Design of Agar. We show how Agar integrates with a typical erasure-coded storage system and zoom in on the components of an Agar region-level deployment. 61
5.4 Average read latency when using Agar vs. LRU- and LFU-based caching vs. Backend. 68
5.5 Hit ratio when using Agar vs. LRU- and LFU-based caching. 68
5.6 Agar vs. different caching systems and the backend. 69
5.7 Cumulative distribution of the object popularity using Zipfian workloads with different skews. The y axis shows the cumulative percentage of requests in the workload that refer to objects on the x axis (e.g., x = 5, y = 40% means that the most popular 5 objects account for 40% of requests). 70
5.8 Cache contents in different scenarios. 71

List of Tables

4.1 Types of partitions in GlobalFS. 39
4.2 Operations in GlobalFS. 42
4.3 Execution times for several real-world benchmarks on GlobalFS with operations executed over global and local partitions. Execution times are given in seconds for NFS, and as relative times w.r.t. NFS for GlobalFS, GlusterFS and CephFS. ∗Note that GlusterFS does not support deployments with both global and local partitions; thus, we report results from two separate deployments. 53
4.4 Execution of with workload-aware state partitioning enabled. 54
4.5 Execution of with workload-aware state partitioning disabled. 54
4.6 Building the Apache Web Server using with workload-aware state partitioning enabled. 54
4.7 Building the Apache Web Server using with workload-aware state partitioning disabled. 55

5.1 Read latency from the point of view of Frankfurt. 64

1 Introduction

Distributed systems are becoming increasingly integrated in everyday life. This has been enabled by the widespread adoption of the Internet, which currently brings together over half of the world’s population [1]. Nearly a quarter of a billion new users came online in 2017, much of this growth being driven by more affordable smartphones and mobile data plans [2]. The rapid growth of the Internet has resulted in increasingly large systems that run over it, up to 1000x the size of anything previously built. Each 10x growth in scale typically requires new system-specific techniques [3], making it challenging for system designers to keep up.

Many large-scale systems are built on top of cloud computing platforms, such as Amazon Web Services (AWS) and Google’s Cloud Platform, which provide basic building blocks running on clusters of commodity hardware. Many things can go wrong: hardware components fail, systems outgrow their allocated resources, entire datacenters go offline due to natural disasters, etc. To prevent data loss, distributed systems employ redundancy, either in the form of replication or by using more complex data redundancy mechanisms, such as erasure coding.

Distributed systems need to work around the limitation formulated by the CAP theorem [4]. No distributed system is safe from network failures, so partition tolerance is generally expected. When network partitions occur, systems need to choose between availability, i.e., servicing user requests, and consistency, i.e., guaranteeing that data remains in sync across datacenters. Many designs proposed in the literature [5, 6, 7, 8, 9] sacrifice consistency for availability. While this might work well for domain-specific applications, which can anticipate and provide resolution methods for conflicts, other general-purpose applications require strong consistency, as it is (i) more intuitive for users and (ii) does not require human intervention in case of conflicts.

In this thesis, we address questions related to the way one may design and build distributed systems that span many geographically distant regions. We focus on reducing the access latency perceived by the user in (i) systems that provide strong consistency guarantees at large scale (Chapter 3 and Chapter 4), and (ii) systems that use complex data redundancy mechanisms to ensure high availability (Chapter 5).

1.1 Problem definition

This thesis addresses the problem of reducing latency in geo-distributed systems. By “geo-distributed systems” we mean systems that span many datacenters from geographically distant regions. Many popular systems deployed over the Internet fall into this category: search engines (e.g., Google, Bing), online payment services (e.g., PayPal), social networks (e.g., Facebook, Twitter), video sharing (e.g., YouTube), and many others.

We focus on answering two classes of questions that lie at the core of building large systems that are reliable and serve user requests with low latency.

How to enforce strong consistency at large scale without crippling performance? State machine replication (SMR) is a general approach for building fault-tolerant and consistent distributed systems by replicating servers and coordinating client interactions with server replicas. Serializing client commands has a significant impact on performance and also impedes scalability.

How to lower access latency in storage cost-aware geo-distributed systems? Distributed storage systems represent the backbone of many Internet applications. They aim to serve content to end users with high availability and low latency. Storage systems guarantee high availability either by replicating data across multiple regions or by employing more complex mechanisms such as erasure coding. Storage cost-aware systems prefer the latter, as it achieves equivalent levels of data protection with less storage overhead. On the flip side, erasure coding incurs higher access latency. Caching is the go-to approach for reducing read latency, but it is less straightforward for erasure-coded data, as the cache needs to determine not only which data items to store, but also how many blocks for each of them.

1.2 Proposed solution

We explore techniques to reduce latency in fault-tolerant geo-distributed systems. For systems built using the state machine replication approach, our goal is to reduce the amount of coordination between replicas located in distant regions. For systems that employ erasure-coded data, we aim to speed up read operations by relying on specialized caches.

1.2.1 State partitioning in geo-distributed systems

Distributed services form the basic building blocks of many contemporary systems. A large number of clients access these services, and when a client performs a command on a service, it expects the service to be responsive and consistent. The state machine replication approach offers both guarantees: by replicating the service on multiple servers, the commands are wait-free despite failures, and by executing them in the same order at all replicas, they are sequentially consistent. However, since SMR serializes all commands, it does not leverage the intrinsic parallelism of the workload. This results in poor performance, especially at large scale.

A promising approach to tackle this problem is to partition the state of the service and distribute the partitions among replicas [10]. When the workload is fully parallel, the scale-out of the partitioning approach is optimal. Hence, we look further into this approach.

We devote Chapter 3 to a principled study of geo-distributed service partitioning. First, we look into conditions under which it is possible to partition a service, and then propose a general approach to build dependable and consistent partitioned services. We assess the practicability of our approach by building a partitioned coordination service at scale.

1.2.2 Workload-aware state partitioning

We next focus on answering the following question: “How can state partitioning dynamically adapt to workload patterns?”, in the context of a geo-distributed file system. In this work, we extend our approach from Chapter 3 and apply it to geo-distributed file systems. To this end, we adapt state partitioning to the particularities of POSIX file system semantics. In order to adapt to the changing workloads of a general-purpose file system, we also propose a mechanism that can dynamically alter the partitioning scheme.

Most previous designs for distributed file systems [5, 6, 7] provide weak consistency guarantees. Weak consistency has two main drawbacks. First, it exposes execution anomalies to users. Second, it induces programming complexity, as system developers need to anticipate conflicts and provide appropriate resolution mechanisms. While weak consistency models might be suitable for applications without strict requirements and domain-specific applications, they are not appropriate for general-purpose applications such as a file system. Strong consistency suits file systems, as it is both more intuitive for users and does not require human intervention in case of conflicts.

We use state partitioning to build a POSIX-compliant distributed file system that spans geographically diverse regions. Partitioning the file system tree allows us to exploit locality in access patterns without compromising consistency. We use this distributed file system to illustrate the practical implications of state partitioning, and to explain how partitioning can dynamically adapt to workload patterns. Workload-aware state partitioning introduces new challenges, which we address in Chapter 4.

1.2.3 Caching tailored to erasure-coded storage systems

In the third part of this thesis, we explore techniques to lower access latency in storage cost-aware geo-distributed systems. Distributed storage systems are an integral part of many Internet applications. They span geographically diverse regions in order to serve content to end users with high availability and low latency.

High availability. Distributed data stores provide high availability by relying on data redundancy, which consists in either (i) replicating data across multiple sites or (ii) using more elaborate approaches such as erasure coding [11].

Replication implies storing full copies of the data in different physical locations. This approach is simple to implement and provides good read performance, as users can get data from the nearest data site. However, replication suffers from high storage overheads.

By contrast, erasure coding achieves equivalent levels of data protection with less storage overhead but is more complex to set up and operate. The key idea is to split an object into data blocks (or chunks) and store these chunks at different physical locations. A client only needs to retrieve a subset of the blocks to reconstruct the object. Erasure coding is thus efficient in terms of storage space and bandwidth, but incurs higher access latency, as users now need to contact more distant data sites to reconstruct the data.

Low latency. Many systems employ caching in order to lower access latency. Caching implies storing a subset of data in memory that is faster or closer to users than the original source. In the case of storage systems that span geographically diverse regions, caching decreases access latency by bringing data closer to end users. The application of caching to erasure-coded data is, however, not obvious, as the presence of blocks offers many caching configurations that traditional caching policies such as Least Recently Used (LRU) or Least Frequently Used (LFU) are unable to capture or exploit. More precisely, a caching mechanism for erasure-coded data should be able not only to decide which data items to cache, but also how many blocks to cache for each data item.
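To make this decision space concrete, the following sketch enumerates, for one object, the candidate "cache d of k chunks" configurations together with the memory each one uses and a rough latency benefit. It is illustrative only: the object name, chunk count, popularity, and the per-chunk savings are assumptions, and it is not the algorithm developed in Chapter 5.

    # Hypothetical illustration of the decision space when caching erasure-coded data.
    # For each object we may cache between 0 and k chunks locally; every extra cached
    # chunk changes both the memory footprint and the expected read latency.
    # All numbers below are made up for illustration.

    CHUNK_SIZE_MB = 1
    K = 6                                      # chunks needed to reconstruct an object

    def caching_options(obj_name, popularity, latency_saving_ms):
        """Enumerate (object, cached_chunks, memory_cost_mb, expected_benefit) tuples."""
        options = []
        for cached in range(1, K + 1):
            memory = cached * CHUNK_SIZE_MB
            # Weight the saving by how often the object is requested; the real
            # relationship between cached chunks and latency is non-linear and measured.
            benefit = popularity * latency_saving_ms[cached - 1]
            options.append((obj_name, cached, memory, benefit))
        return options

    savings = [20, 35, 45, 50, 52, 53]         # assumed saving (ms) per number of cached chunks
    for option in caching_options("object-42", popularity=0.3, latency_saving_ms=savings):
        print(option)

A caching policy for erasure-coded data must then select, across all objects, the subset of such options that fits in the cache and maximizes the overall benefit.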

In Chapter 5, we investigate how a tailored caching mechanism can help minimize access latency in storage systems that span many geographical regions and use erasure coding to ensure data availability. The main challenge resides in designing the caching policy, namely deciding which data to cache and how many blocks to cache for each data item. We design and prototype a caching system that periodically reconfigures the cache based on workload patterns.

1.2.4 Summary of results

To sum up, this thesis makes the following contributions towards reducing access latency in geo-distributed systems:

• Reducing access latency through state partitioning. We formulate conditions under which a service can be partitioned and propose a general partitioning algorithm. To validate our partitioning approach, we build a geo-distributed coordination service based on Apache ZooKeeper [12]. Our evaluation shows that, when the workload exhibits access locality, partitioning the state of a service reduces latency by up to 75% compared to a non-partitioned deployment, while offering the same core guarantees.

• Workload-aware state partitioning. We build a geo-distributed file system using state partitioning and compare it against de-facto industry implementations. Experimental results show that our partitioned file system performs on par with the others. We further augment the partitioned file system with support for on-the-fly state partitioning, which allows the system to adapt to workload patterns by moving files around. Our preliminary results show that being able to move files according to access patterns has a positive effect on the performance of our geo-distributed file system; moving files reduces read latency by around 88% compared to static partitioning.

• Caching erasure-coded data. We propose an approach and an algorithm to optimize the cache configuration for erasure-coded data. Our caching algorithm relies on a dynamic programming approach to optimize the cache configuration under a given workload. We implemented a prototype of a caching system using our algorithm and integrated it in a basic geo-distributed storage system. Our evaluation shows that our algorithm can achieve 16% to 41% lower latency than systems that use classical caching policies, such as LRU and LFU.

1.3 Dissertation plan

This thesis is organized as follows. In Chapter 2, we describe techniques employed by current state-of-the-art geo-distributed systems to ensure high availability and low latency. In addition, we survey consistency models and describe their performance trade-offs. Chapter 3 describes a general state partitioning approach to build strongly consistent systems at scale without crippling performance. We use this state partitioning approach in Chapter 4 to build a geo-distributed file system and show that adapting the partitioning to workload patterns benefits performance. Chapter 5 proposes a new algorithm for caching erasure-coded data and evaluates the access latency for read-only workloads. In Chapter 6, we conclude.


2 Background and Related Work

This chapter provides some background for our work, as well as a brief overview of its context. We describe the challenges and trade-offs that lie behind building large-scale distributed systems and give an overview of the related work that addresses them. Our goal is to highlight how the algorithms and approaches proposed in this thesis complement and advance the state of the art.

2.1 CAP theorem

Eric Brewer’s CAP theorem [4] formulates the idea that there is a fundamental trade-off between consistency and availability when network partitions inevitably occur in distributed systems. The CAP theorem states that any distributed system can simultaneously have at most two of three desirable properties:

• Consistency, i.e., for every request, the system returns a response that is correct in accordance to the system specification;

• Availability, i.e., each request eventually receives a response;

• Partition tolerance, i.e., the system is able to function as expected even when servers are split in separate groups that do not communicate.

Communication among servers is unreliable in practical deployments, so network partitions are bound to happen. Therefore, for any practical deployment, system designers need to choose between availability and consistency. System designers and researchers explore the various points along this trade-off between availability and consistency in a wide variety of novel distributed systems.

On the one hand, there are systems that choose strong consistency and settle for best-effort availability. Strong consistency means that all clients of the system observe the same view of the system state. This means that all system servers need to agree on the state and employ coordination protocols for any state change. However, since the servers can no longer coordinate with each other during network partitions, they sacrifice availability.

On the other hand, there are systems that choose to provide high availability over strong consistency. Coordination is no longer done at every state change, but rather lazily, in a best-effort manner. When network partitions occur, service is still provided to clients by local servers, but since servers cannot coordinate, they may expose different views to their respective clients. Furthermore, if the clients change state in their respective servers, this state may diverge and need reconciliation when the network partitioning ends. Weak consistency models have two main drawbacks: (i) users get to witness execution anomalies, and (ii) programming complexity increases, as system designers need to anticipate conflicts and provide specialized resolution mechanisms.

In this thesis we focus on strong consistency, as it has more desirable qualities for general-purpose distributed systems. Strong consistency is more easily understood by both developers of systems and clients. We focus on addressing the latency and storage overhead of strongly consistent distributed systems.

2.2 Replication mechanisms for strong consistency

There are two fundamental approaches to maintain replicated services in a consistent state: primary-backup replication [13], proposed by Alsberg and Day in 1976, and state machine replication [14], introduced by Lamport in 1978.

Primary-backup replication. The primary-backup replication model consists of one primary replica and multiple secondary replicas (or backups). The primary serves all client requests that modify state, thus totally ordering them. Periodically, the primary sends state updates to the secondaries. The secondary replicas, in turn, monitor the primary in order for one of them to take over if the primary crashes.

State machine replication (SMR). The SMR model allows clients to issue requests to different service replicas. Replicas coordinate in order to agree on a total execution order for client operations that modify state. The main challenge in implementing SMR is to ensure that the client-issued operations are executed in the same order at all replicas, despite concurrency and failures.
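As a minimal sketch of the SMR model (illustrative code, not any particular system's implementation; the agreement protocol itself is abstracted away as an already-ordered log), the following shows why applying the same commands in the same order at every replica keeps all replicas in the same state.

    # Minimal, illustrative sketch of state machine replication: a deterministic
    # state machine applied to a totally ordered command log. The ordering step
    # (e.g., a consensus protocol such as Paxos) is abstracted away; only the
    # "apply" side is shown.

    class KVStateMachine:
        def __init__(self):
            self.state = {}

        def apply(self, command):
            op, key, *rest = command
            if op == "put":
                self.state[key] = rest[0]
            elif op == "delete":
                self.state.pop(key, None)

    ordered_log = [("put", "x", 1), ("put", "y", 2), ("delete", "x")]

    replicas = [KVStateMachine() for _ in range(3)]
    for command in ordered_log:            # every replica applies the same order
        for replica in replicas:
            replica.apply(command)

    assert all(r.state == {"y": 2} for r in replicas)

Determinism plus a common order is exactly what makes the replicas interchangeable from the client's point of view, and also what makes the ordering step the performance bottleneck discussed below.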

A large body of research focuses on implementing SMR protocols that are resilient to different types of failures. The most well-known SMR algorithm is based on the Paxos [15] consensus protocol, which was proposed by Lamport in the 1980s. Paxos is able to tolerate crash failures. In 1999, Castro and Liskov proposed PBFT [16], a consensus protocol that is able to tolerate Byzantine failures, i.e., arbitrary failures. PBFT sparked great interest in BFT protocols, but with little practical adoption to this day.

With the advent of Internet-scale services, there has been renewed interest in SMR [17], both from academia and industry. Several optimizations have been proposed to make traditional SMR protocols viable in wide-area deployments. Fast Paxos [18] allows commands to be committed in fewer communication steps when there are no concurrent updates. Ring Paxos [19] exploits the characteristics of IP multicast networks for implementing efficient replica coordination (i.e., total order multicast) in this environment. Finally, Raft [20] is equivalent to Paxos in terms of fault tolerance and performance, but it is considered easier to understand.

The strong consistency guarantees provided by SMR have a performance cost: because it serializes all commands, SMR does not leverage the intrinsic parallelism of the workload. Several directions have been investigated to limit the impact SMR has on performance:

1. Operations that do not change the service state can be executed at a single replica. This approach implies dropping linearizability (i.e., ordering operations across all clients based on time) in favor of sequential consistency (i.e., ordering operations in a way that preserves the program order of each client), but such a limitation is unavoidable in a partially asynchronous system [21].

2. SMR can leverage the commutativity of updates to improve response time. This strategy exhibits a performance improvement of at most 33% in comparison to the baseline [22].

3. One can partition the state of the service and distribute the partitions among servers [10].

In this thesis, we focus on the third approach, which has fewer limitations than the other two alternatives, and hence, we consider it to be the most promising.

2.3 State partitioning

Replication distributes copies of the system state across different geographical regions. Besides fault tolerance, this provides the advantage that read-only client requests, which do not modify state, can be served locally by each server, reducing the latency perceived by clients. However, requests that change state are slow, as they require coordination across the regions, which implies several rounds of message exchanges over high-latency connections.

State partitioning allows dynamic trade-offs between performance and availability of data. By reducing the degree of replication for some parts of the state, the amount of coordination required to keep that state in sync is also decreased and the performance of requests that modify it increases. This comes at the cost of fault tolerance: at the extreme, if state is partitioned such that a single server stores it, then the state will be lost if that server crashes.

With state partitioning, some state can be replicated globally, ensuring fault tolerance and fast reads across all geographic regions, at the cost of slow writes, while other state can be replicated locally in a subset of regions, ensuring fast reads and writes in those regions, at the cost of slower reads from other regions and a lower degree of fault tolerance. This maps well to workloads that exhibit high locality, i.e., if there is some state that is most often accessed in specific regions.
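The sketch below illustrates this idea with a hypothetical partition map (the region and partition names are made up; this is not the ZooFence or GlobalFS implementation): a write only needs to be ordered among the regions that replicate the partition it touches.

    # Illustrative partition map: each partition is replicated in a subset of regions.
    # A write to a partition only requires coordination among that partition's regions,
    # so locally replicated partitions avoid wide-area round trips, while the fully
    # replicated partition keeps maximum fault tolerance at the cost of slower writes.

    PARTITION_REPLICAS = {
        "global": {"eu-west", "us-east", "ap-south"},   # fully replicated state
        "eu":     {"eu-west"},                          # state mostly accessed from Europe
        "us":     {"us-east"},                          # state mostly accessed from the US
    }

    def regions_to_coordinate(partition):
        """Regions that must agree on the order of writes to this partition."""
        return PARTITION_REPLICAS[partition]

    print(regions_to_coordinate("eu"))      # {'eu-west'}: local, fast write
    print(regions_to_coordinate("global"))  # all regions: slower but disaster-tolerant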

When a system employs state partitioning, a fully parallel workload translates into optimal scale-out. Hence, we consider state partitioning as the most promising direction towards improving the performance of geo-distributed systems that rely on SMR. We investigate state partitioning in Chapters 3 and 4.

Several scientific publications (e.g., [23, 24]) observed that it is not necessary to order commuting service operations. This observation was used recently by Kraska et al. [25] in order to build a distributed database system.

Some approaches [26, 27] leverage application semantics to compute dependencies among commands in order to parallelize SMR. Commands are assigned to groups such that commands within a group are unlikely to interfere with each other. Eve [27] allows replicas to execute commands from different groups in parallel, verifies if replicas can reach agreement on state and output, and rolls back replicas if necessary. Parallel State-Machine Replication (P-SMR) [26] proposes to parallelize both the execution and the delivery of commands by using several multicast groups that partially order commands across replicas. These two techniques aim at speeding up the execution of commands at each replica by enabling parallel execution of commands on multi-core systems. The approach we describe in Chapter 3 is orthogonal to this body of work: it improves performance by enabling different replicated state machines to execute commands in parallel without agreement.

Marandi et al. [10] employ Multi-Ring Paxos [28] to implement consistent accesses to disjoint parts of a shared data structure. However, by construction, if an invariant is maintained between two or more partitions, the approach requires that a process receives all the messages addressed to the groups, sacrificing Disjoint Access Parallelism (DAP). (The DAP property captures the requirement that unrelated operations progress independently, without interference, even if they occur at the same time.) The approaches in [29] and [30] construct a shared tree in a purely asynchronous system under strong eventual consistency. In both cases, however, the tree structure is replicated at all replicas.

Concurrently with our work described in Chapter 3, Bezerra et al. [31] described an approach to partition a shared service. To execute a command, the client multicasts it to the partitions in charge of the state variables read or updated by that command. Each partition executes its part of the command, waiting (if necessary) for the results of other partitions. In comparison to our approach in Chapter 3, this solution (i) does not take application semantics into account, which in some cases leads to an unnecessary convoy effect, and (ii) requires approximating in advance the set of partitions touched by a command. In its ZooKeeper use case (Volery), this approach stores the tree at all replicas, and commands that modify the structure of the tree (create and delete) are sent to all replicas. In contrast, ZooFence (presented in Chapter 3) exploits application semantics to split the tree into overlapping sub-trees, one stored at each partition.

The transactional paradigm is a natural candidate to partition a concurrent tree (e.g., [32]). In ZooFence, there is no need for transactional semantics because the implementation of the tree is hierarchical. A transactional history is serializable when its serialization graph is acyclic [33]. Theorem 3.2 (Chapter 3) can be viewed as the characterization of strictly serializable histories over abstract operations.

Ellen et al. [34] prove that no universal construction can be both DAP and wait-free in the case where the implemented shared object can grow arbitrarily. We pragmatically sidestep this impossibility result in our algorithms by bounding the size of the partition.

2.4 Adaptive state partitioning

State partitioning improves the scalability and performance of SMR-based systems by splitting state among replicas in a manner that reduces the amount of coordination required. Devising a good state partitioning scheme requires workload knowledge. In Chapter 3 and the first part of Chapter 4 we assume that workload patterns do not change and devise static partitioning schemes. However, when the workload patterns change, the partitioning scheme must adapt in order to preserve the benefits of state partitioning.

Dynamically adapting state partitioning to workload patterns must overcome two main challenges. First, the original guarantees of the SMR-based system should be preserved. Second, the reconfiguration should be transparent to the users and have minimal impact on the performance of the system.

Dynamic state partitioning Dynamic Scalable State Machine Replication (DS-SMR) [35] is a technique that allows a partitioned SMR-based system to reconfigure its data-to-partitions assignment on the fly. Similarly to the dynamic state partitioning approach we describe in Chapter 4, DS-SMR repartitions the state based on workload patterns. It uses a centralized oracle to organize partitioning, as well as a client-side cache to reduce communication with the oracle. The main goal of DS-SMR is to increase throughput by balancing operations across several partitions. However, unlike DS-SMR, our dynamic partitioning approach is tailored to POSIX-compliant geo-distributed file systems, like the one we build in Chapter 4. This choice requires several design decisions that differ from DS-SMR’s. We need to build a distributed oracle, rather than a centralized one, in order to minimize latency in a geo-distributed system. Our partitioning function needs to move data close to the clients accessing it most often, instead of randomly for load balancing. In order to satisfy the file system semantics, there are several conditions that our partitioning must comply with (e.g., parent nodes must be informed of any new or deleted children; therefore, parent nodes must be shared across partitions).

Elastic state partitioning The elastic state partitioning technique proposed by Nogueira et al. [36] differs from our work presented in Chapter 4 in several ways. First, the elastic state partitioning protocol works for Byzantine fault-tolerant SMR-based systems, while we only focus on crash-tolerant systems. Second, unlike our dynamic state partitioning approach, which targets redistributing state across a predefined set of partitions, elastic state partitioning focuses on increasing or, respectively, decreasing the number of partitions at runtime. Third, we monitor workload patterns and take into account access locality when reconfiguring partitions, while the approach proposed by Nogueira et al. focuses on load, in terms of storage capacity and throughput.

2.5 Consistency in distributed file systems

Distributed file systems are an integral part of many of today’s workflows, making them the focus of a large body of research from both industry and academia. In Chapter 4, we apply state partitioning to build a strongly consistent geo-distributed file system. This section provides an overview of existing designs for distributed file systems, which we categorize based on the consistency guarantees they provide.

2.5.1 File systems with strong consistency

Strong consistency is a natural fit for distributed file systems. File systems are general-purpose applications, which makes it difficult for system designers to anticipate conflicts and provide appropriate resolution mechanisms. Unlike weak consistency models, strong consistency is more intuitive for users and does not require human intervention in case of conflicts.

In Chapter 4, we describe GlobalFS, a distributed file system that leverages atomic multicast to order operations that modify state. We partition state, i.e., the file system structure, by assigning each file or directory to a set of regions. Atomic multicast coordinates the operations corresponding to each file/directory, allowing for locality while still providing consistent operations over the whole file system.

CalvinFS [37] is a multi-site distributed file system built on top of Calvin [38], a transactional database. Metadata is stored in main memory across a shared-nothing cluster of machines. File operations that modify multiple metadata elements execute as distributed transactions. CalvinFS supports linearizable writes and reads using a global log to totally order transactions. While the global log can scale in terms of throughput, it does not allow for exploiting locality. On the other hand, GlobalFS provides strong consistency guarantees while allowing files to be either locally or globally replicated, thus giving users the option to choose between availability (disaster tolerance) and performance (throughput and latency).

CephFS [39] is a file system implementation atop the Ceph distributed block storage [40]. It uses independent servers to manage metadata and to link files and directories to blocks stored in the block storage. CephFS is able to scale the set of metadata servers up and down and to change the file system partitioning at runtime for load balancing through its CRUSH [41] extension. Nevertheless, geographical distribution over Amazon’s EC2 is discouraged by the CephFS developers [42].

The Google File System (GoogleFS) [43] stores data on a swarm of slave servers. It maintains metadata on a logically centralized master, replicated on several servers using state machine replication and total ordering of commands using Paxos [44]. It does not consider the case of a file system spread over multiple datacenters and the associated partitioning. MooseFS [45] is designed around a similar architecture and has the same limitations.

HDFS [46] is the distributed file system of the Hadoop framework. It is optimized for read-dominated workloads. Data is replicated and sharded across multiple data nodes. A name node is in charge of storing and handling metadata. As for GoogleFS, this node is replicated for availability. The HDFS interface is not POSIX-compliant and it only implements a subset of the specification via a FUSE interface. QuantcastFS [47] is a replacement for HDFS that adopts the same internal architecture. Instead of three-way replication, it exploits Reed-Solomon erasure coding to reduce space requirements while improving fault tolerance.

GeoFS [48] is a POSIX-compliant file system for WAN deployments. It exploits user-defined timeouts to invalidate cache entries. Clients pick the desired consistency for files and metadata, as with WheelFS’s semantic cues [49].

Red Hat’s GFS/GFS2 [50] and GlusterFS [51] support strong consistency by enforcing quorums for writes, which are fully synchronous. GlusterFS can be deployed across WAN links, but it scales poorly with the number of geographical locations, as it suffers from high-latency links for all write operations.

XtreemFS [52] is a POSIX-compliant system that offers per-object strong-consistency guarantees on top of a set of independent volumes. Metadata is replicated by metadata servers (MRCs), while files themselves are replicated by object storage devices (OSDs). Write and read operations are coordinated by the OSDs themselves, without involving metadata servers.

BlueSky [53] is a filesystem backed by storage from cloud providers (e.g., Amazon S3). Clients access the filesystem through a local, non-replicated, proxy. The proxy can cache data and use its local disk to improve the response time for local clients. Concurrent access from multiple proxies is left as future work.

SCFS [54] is another distributed filesystem backed by cloud storage. It uses a distributed coordination service (e.g., Paxos) to handle metadata and provide strong consistency on top of eventually consistent storage. SCFS provides an optimization for non-shared files, where a client can lock a file to bypass the coordination service until the file is closed. Otherwise, all operations go through the single coordination service.

PVFS [55] can be adapted to support linearizability guarantees [56] by delegating the storage of the file system’s metadata to Berkeley DB [57], which uses Paxos to totally order updates to its replicas.

2.5.2 File systems with weak consistency

There are several distributed file systems for high-performance computing clusters, such as PVFS, PVFS2/OrangeFS [58], [59], and FhGFS/BeeGFS [60]. These systems have specific (e.g., MPI-based) interfaces and target read-dominated workloads. GIGA+ [61] implements eventual consistency and focuses on the maintenance of very large directories. It complements the OrangeFS cluster-based file system.

ObjectiveFS [62] relies on a backing object store (typically Amazon S3) to provide a POSIX- compliant file system with read-after-write consistency guarantees. If deployed on a WAN, ObjectiveFS suffers from long round-trip times for operations such as fsync that need to wait until data has been safely committed to S3.

Close-to-open consistency was introduced along with client-side caching mechanisms for the Andrew File System (AFS) and is available in its open-source counterpart OpenAFS [63]. This was a response to previous distributed file system designs such as LOCUS [64], which offered strict POSIX semantics but with poor performance. Close-to-open semantics are also used by NFS [65], HDFS [46], and WheelFS [49].

OCFS [66] is a distributed file system optimized for the Oracle ecosystem. It provides a cache consistency guarantee by exploiting Linux’s O_DIRECT. Its successor, OCFS2 [66], supports POSIX while guaranteeing the same level of cache consistency.

2.6 Storage cost-aware systems

The last challenge we address in this thesis is minimizing the storage overhead in geo-distributed storage systems. Distributed data stores are core components of many cloud applications. It is, therefore, crucial to serve content to end users with high availability and low latency.

The first property, high data availability, is achieved through redundancy. To that end, distributed storage systems either replicate data across multiple sites or use more elaborate approaches such as erasure coding [11]. Replication implies storing full copies of the data at different sites. This approach is simple to implement and provides good performance, as users can get data from the nearest data site. However, replication suffers from high storage and bandwidth overheads.

By contrast, erasure coding achieves equivalent levels of data protection with less storage overhead but is more complex to set up and operate. The key idea is to split an object into k data blocks (or “chunks”), compute m redundant chunks using an encoding scheme, and store these chunks at different physical locations. A client only needs to retrieve any k of the k + m chunks to reconstruct the object. Erasure coding is thus efficient in terms of storage space and bandwidth, but unfortunately incurs higher access latency, as users now need to contact more distant data sites to reconstruct the data.

The second property, low read latency, is in turn attained in systems through caching. The principle consists in storing a subset of the available content in memory that is faster or closer to users than the original source. In the case of systems that span geographically diverse regions, caching decreases access latency by bringing data closer to end users.

We first provide background on the use of erasure coding in distributed storage systems and caching, the go-to approach for improving performance.

2.6.1 Erasure coding in storage systems

Erasure coding is a well-established data protection mechanism. It splits data into k blocks (or chunks), and computes m redundant blocks using an erasure code. This process allows clients of a storage system to reconstruct the original data with any k of the resulting k + m blocks. Erasure codes deliver high levels of redundancy (and hence reliability) without paying the full storage cost of plain replication. This conciseness has made them particularly attractive for implementing fault-tolerant storage back-ends [11, 67].
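As a toy illustration of the "any k out of k + m" property, the sketch below uses k data chunks and a single XOR parity chunk (i.e., m = 1); this is a deliberate simplification, not the Reed-Solomon codes used in this thesis, but the reconstruction principle is the same.

    # Toy erasure code: k data chunks plus one XOR parity chunk (m = 1).
    # Any k of the k + 1 chunks suffice to rebuild the object; Reed-Solomon codes
    # generalize this to arbitrary m.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, k: int):
        """Split data into k equally sized chunks and append one XOR parity chunk."""
        chunk_len = -(-len(data) // k)                  # ceil(len(data) / k)
        padded = data.ljust(k * chunk_len, b"\0")
        chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
        parity = chunks[0]
        for chunk in chunks[1:]:
            parity = xor_bytes(parity, chunk)
        return chunks + [parity]                        # k + 1 chunks in total

    def rebuild(chunks, missing_index):
        """Recover the missing chunk by XOR-ing the k remaining ones."""
        remaining = [c for i, c in enumerate(chunks) if i != missing_index]
        rebuilt = remaining[0]
        for chunk in remaining[1:]:
            rebuilt = xor_bytes(rebuilt, chunk)
        return rebuilt

    blocks = encode(b"geo-distributed storage", k=4)
    damaged = blocks[:2] + [None] + blocks[3:]          # chunk 2 is lost
    assert rebuild(damaged, missing_index=2) == blocks[2]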

Beyond the inherent costs of coding and decoding, however, erasure codes lead to higher latency, notably within geo-distributed systems. This is because they only partially replicate data, in contrast to a full replication strategy. Thus, they are more likely to force clients to access remote physical locations to obtain enough blocks to decode the original data. Because of this performance impact, some systems relegate erasure codes to the archiving of cold, rarely-accessed data and resort back to replication for hot data (e.g., Windows Azure Storage [68] uses this mechanism). Other systems, like HACFS [69], adopt different erasure codes for hot and cold data: a fast code with low recovery cost for hot data, and a compact code with low storage overhead for cold data. This dual-code strategy helps alleviate the computational complexity of decoding data, but HACFS still suffers from latency incurred by accessing data blocks from remote locations.

Throughout Chapter 5 we use Reed-Solomon (RS) codes [70]. Despite being some of the oldest error correcting codes, RS codes are still widely used in storage systems and digital communications. This is due to two main advantages. First, RS codes are capable of correcting both random and burst errors. Second, there exist efficient decoding algorithms for data encoded with RS.

2.6.2 Caching

The classical way to prioritize “hot” data over “cold” data is via a separate caching layer: caches store hot data and are optimized to serve that data quickly. Since cache memory is limited, caching works well for systems whose workloads exhibit a considerable degree of locality. Internet traffic typically follows a probability distribution that is highly skewed [71, 72, 73]. In particular, workloads from Facebook and Microsoft production clusters have shown that the top 5% of objects can be seven times more popular than the bottom 75% [74]. This observation implies that a small number of objects are more likely to be accessed and would benefit more from caching. The Zipfian distribution has been widely used for representing such workloads; for instance, Breslau et al. [71] modeled the popularity of Web content using a Zipfian distribution.
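For reference, the small computation below (pure Python; the object count and skew values are assumptions, not measurements from the cited studies) shows how the cumulative request share of the most popular objects grows under a Zipf popularity distribution, which is the kind of skew these workloads exhibit.

    # Cumulative request share of the top objects under a Zipf(s) popularity
    # distribution over n objects (n and the skew values below are assumptions).

    def zipf_probabilities(n, s):
        weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
        total = sum(weights)
        return [w / total for w in weights]         # probs[0] = most popular object

    def top_share(n=1000, s=1.0, top_fraction=0.05):
        probs = zipf_probabilities(n, s)
        top_k = max(1, int(n * top_fraction))
        return sum(probs[:top_k])                   # share of requests hitting the top objects

    for skew in (0.8, 1.0, 1.2):
        print(f"skew={skew}: top 5% of objects receive {top_share(s=skew) * 100:.1f}% of requests")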

A caching policy represents a heuristic that is applied at a local caching node to pick an eviction candidate, when the cache is full. Jin et al. [75] identify three categories of caching policies, based on different properties of Internet-specific workloads:

1. temporal access locality,

2. access frequency,

3. a mix between the previous two.

We extend this classification with a fourth category, as suggested by DaSilva et al. [76], that takes into account the size of objects.

Least Recently Used (LRU) LRU is a policy that relies on temporal access locality, which assumes that an object that has been recently requested is likely to be requested again in the near future. Thus, when the cache is full, LRU chooses the least recently accessed object as the eviction candidate. The main advantage of LRU is that it automatically adapts to changes in access patterns. For example, Mokhtarian et al. [77] propose an LRU-based solution for caching in planet-scale video CDNs. While LRU is simpler to implement, Agar (the system we propose in Chapter 5) can bring better latency improvements.
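As a reference point, a minimal LRU cache can be built on an ordered dictionary: every access moves the entry to the most-recently-used end, and evictions remove the least-recently-used end. This is a textbook sketch, not the implementation used in this thesis or in the cited systems.

    # Textbook LRU cache sketch (illustrative only): a hit moves the entry to the
    # most-recently-used end; when the cache is full, the least-recently-used
    # entry is evicted.
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()

        def get(self, key):
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)           # mark as most recently used
            return self.entries[key]

        def put(self, key, value):
            if key in self.entries:
                self.entries.move_to_end(key)
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)    # evict the least recently used entry

    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")                                  # "a" becomes most recently used
    cache.put("c", 3)                               # evicts "b"
    assert cache.get("b") is None and cache.get("a") == 1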

Least Frequently Used (LFU) LFU is a policy that relies on metadata that captures the object access history: the most popular objects are kept in the cache, at the expense of the less popular ones. LFU works best when the access pattern does not change much over time. The main challenge for LFU is to minimize the amount of metadata needed while still taking good decisions regarding which objects to cache. A recent example of an LFU-based policy is TinyLFU [78], a frequency-based cache admission policy: rather than deciding which object to evict, it decides whether it is worth admitting an object into the cache at the expense of the eviction candidate. Like Agar, TinyLFU is dedicated to caches subjected to skewed access distributions. TinyLFU builds upon Bloom filter theory and maintains an approximation of statistics regarding the access frequency of objects that have been requested recently. Unlike TinyLFU, Agar is designed for erasure-coded data and tries to optimize the entire cache configuration rather than taking decisions per object. However, we believe that Agar can benefit from some of the optimizations in TinyLFU (e.g., space saving, aging mechanism) to make it more scalable.

16 Hybrid LRU/LFU Some policies aim to combine the advantages of LRU and LFU by taking into account both the popularity and temporal locality of access in order to determine an optimal cache configuration. For example, WLFU [79] takes decisions based on statistics from the W most recent requests, rather than keeping track of the entire object access history. WLFU uses LFU by default to choose which object to evict from the cache; if there are multiple objects with the same popularity score, WLFU uses LRU to break the tie and evict the least recently accessed object.

Largest File First (LFF) Since objects on the Web vary dramatically in size, some papers advocate extending traditional caching policies to take the object size into consideration. GreedyDual-Size [80] combines recency of reference with object size and retrieval cost. An object is assigned an initial value based on its size and retrieval cost; this value is updated when the object is accessed again. Unlike Agar, GreedyDual-Size does not take the popularity of an object into account. GDSF [81] is a popularity-based extension to GreedyDual-Size. Like GreedyDual-Size, it computes the cost of objects based on the recency of access and the size of an object, but it also takes into account access frequency. In GDSF, larger objects have a higher cost and are thus more likely to be chosen as eviction candidates. LRU-SP [82] is an LRU extension that takes into account both the size and the popularity of an object. It is based on Size-Adjusted LRU [83], a generalization of LRU that sorts cached objects by their cost-to-size ratio and greedily evicts those with the lowest ratio, and on Segmented LRU [84], a caching strategy designed to improve disk performance by assigning objects with different access frequencies to different LRU queues. LRV [85] also takes into account the size, recency, and frequency of objects, but it is known for its large number of parameters and implementation overhead. While Agar draws inspiration from these approaches, none of them addresses the problems brought by keeping data coded. For example, we found that greedy algorithms are not suitable choices among Agar’s caching options (see Chapter 5).

2.6.3 Caching erasure-coded data

CAROM [67] is an LRU-based caching scheme tailored for erasure-coded cloud file systems, whose workloads are known to exhibit temporal locality. CAROM considers both read and write operations and needs to provide strong consistency guarantees. CAROM totally orders writes by assigning each object to a primary data center, which becomes solely responsible for the object (encoding it and distributing the chunks during writes, and storing the chunks during reads). CAROM mainly addresses the problem of supporting writes in erasure-coded systems, while Agar addresses the problem of optimizing the cache configuration during reads.

Concurrently with our work, Aggarwal et al. [86] developed Sprout to address the caching of erasure-coded chunks. They develop an analytical model for the latencies of retrieving data and solve an integer optimization problem to find the cache parameters (cache contents) that minimize the latency. Agar differs in its approach to the problem, mapping it to the Knapsack problem. In the end,

both Sprout and Agar obtain approximate solutions to the problem, as solving the optimization exactly is computationally intensive and impractical in a large system. While Sprout had only been evaluated in simulation at the time of this work, we deployed and evaluated the Agar prototype across a wide-area network in Amazon Web Services. We look forward to further experimental validation of Sprout in order to draw conclusions on how Agar's strategy compares.

Rashmi et al. recently proposed EC-Cache [87], which applies online erasure coding to objects stored in cluster caches. In contrast, Agar is a stand-alone caching system that augments multi-site erasure-coded storage systems with caches, which it populates based on live information regarding data popularity and access latency to the different storage sites.

3 State Partitioning in Geo-Distributed Systems

3.1 Introduction

This chapter focuses on improving the performance and scalability of large-scale distributed systems that rely on state machine replication (SMR). SMR guarantees that a service is (1) responsive – by replicating the service at multiple sites such that it remains available even in the presence of failures, and (2) consistent – by having service replicas execute client commands in the same order. In practice, SMR is used to store configuration and control information for large-scale distributed systems. Read-only operations can be served locally with low latency. However, SMR incurs a performance penalty for write operations, as replicas need to agree on the order in which to execute commands. The cost of coordination increases with the scale of the service, as latency is larger when replicas are deployed at distant data centers.

We address this problem by partitioning the state of the service and distributing the resulting partitions among servers. Our approach allows system administrators to choose the degree of replication for different partitions. Some partitions could still be fully replicated, while others could be local to the data centers from where they are being accessed most frequently. The use of our partitioning approach is transparent to applications, since the partitioned service offers the exact same semantics and API as the original.

We observe empirically that partitioning can significantly improve the performance of a service. Figure 3.1 shows the benefits of partitioning in the context of a distributed producer–consumer service. The classic producer–consumer problem describes two client types, producers and consumers, which respectively insert objects into and retrieve objects from a shared queue. We deployed a distributed producer–consumer service in our university's cluster, using four virtual machines with 2 GB RAM and 2 CPU cores each. We first replicated the service such that all clients access one global queue. Next, we split the service into partitions (initially two, then four) and deployed one queue per partition, in addition to the global one. We consider a workload in which accesses are 80% local (i.e., they only affect the queue in the local partition).

[Figure: aggregated throughput (number of ops/sec) as a function of the number of clients (1 to 32), comparing deployments with 1, 2, and 4 partitions.]

Figure 3.1: Partitioning a producer–consumer service leads to performance improvement.

The aggregated number of operations increases almost linearly with the number of clients. For 32 clients, the service split into four partitions outperforms the non-partitioned one by a factor of 1.6. This is because all servers need to coordinate when accessing the global queue. On the other hand, when using partitioning, 80% of the commands are solely executed against local queues; this requires little or no coordination among servers, depending on the scheme used to partition the service.

The remainder of this chapter begins by formally defining our system model and the notion of partition. Next, we formulate specific conditions under which it is possible to partition a service. We present a general algorithm to build a dependable and consistent partitioned service and assess its practicability using the ZooFence coordination service – an application of our principled approach to Apache ZooKeeper [12].

The material in this chapter is adapted from previously published work [88].

3.2 System model

A service is specified by some serial data type. The serial data type defines the possible states of the service, the operations (or commands) to access it, as well as the response values from these commands. Formally, a serial data type is an automaton S = (States, s^0, Cmd, Values, τ) where States is the set of states of S, s^0 ∈ States its initial state, Cmd the commands of S, Values the response values, and τ : States × Cmd → States × Values the transition relation. A command c is total if States × {c} is in the domain of τ. Command c is deterministic if the restriction of τ to States × {c} is a function. Hereafter, we assume that all commands are total and deterministic. We use the .st and .val selectors to respectively extract the state and the response value components of a transition, i.e., given a state s and a command c, τ(s, c) = (τ(s, c).st, τ(s, c).val). Function τ⁺ is defined by the repeated application of τ, i.e., given a sequence of commands σ = 〈c_1, ..., c_{n≥1}〉 and a state s:

τ⁺(s, σ) ≜ τ(s, c_1) if n = 1, and τ⁺(τ(s, c_1).st, 〈c_2, ..., c_n〉) otherwise.

Two commands c and d commute, written c ≍ d, when in every state s we have:

τ⁺(s, 〈c, d〉).st = τ⁺(s, 〈d, c〉).st,
τ⁺(s, 〈d, c〉).val = τ⁺(s, c).val, and
τ⁺(s, 〈c, d〉).val = τ⁺(s, d).val.

For any two commands c and d, we write c = d when in every state s, τ(s, c) = τ(s, d). By extension, for some command c and some sequence σ = 〈c_1, ..., c_{n≥2}〉, we write c = σ when τ⁺(s, 〈c_1, ..., c_n〉) = τ(s, c).

To illustrate the above notations, let us consider a bank account equipped with the usual withdraw and deposit operations. We define States as ℕ, with s^0 = 0. A deposit operation d(10) updates s to s + 10 and returns OK. In case the bank prohibits overdrafts, a withdraw operation w(x) returns NOK if s < x; otherwise it changes s to s − x.
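As a concrete reading of these definitions, the following Java sketch (ours; the class and method names are illustrative) encodes the bank-account serial data type, with the transition function returning both the .st and the .val components.

// Illustrative encoding of a serial data type; names are ours, not from the thesis prototype.
public class BankAccount {
    // A transition result bundles τ(s,c).st and τ(s,c).val.
    public static final class Transition {
        public final int st;        // next state (balance)
        public final String val;    // response value
        Transition(int st, String val) { this.st = st; this.val = val; }
    }

    public static final int S0 = 0;  // initial state

    // τ for a deposit command d(x): always succeeds.
    public static Transition deposit(int s, int x) {
        return new Transition(s + x, "OK");
    }

    // τ for a withdraw command w(x): fails (NOK) if the balance is insufficient.
    public static Transition withdraw(int s, int x) {
        if (s < x) return new Transition(s, "NOK");
        return new Transition(s - x, "OK");
    }

    public static void main(String[] args) {
        Transition t1 = deposit(S0, 10);      // s = 10, val = OK
        Transition t2 = withdraw(t1.st, 5);   // s = 5,  val = OK
        Transition t3 = withdraw(t2.st, 50);  // s = 5,  val = NOK (overdraft forbidden)
        System.out.println(t1.val + " " + t2.val + " " + t3.val);
    }
}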

History We consider a global time model and some bounded set of client processes that may fail-stop by crashing. A history is a sequence of invocations and responses of commands by the clients on one or more services. When command c precedes d in history h, we write c <_h d. We use timelines to illustrate histories. For instance, the timeline below depicts history h1.

[History h1: process p invokes d(10) and receives OK; process q invokes w(5) and receives OK.]

Following [89], histories have various properties according to the way invocations and responses interleave. For the sake of completeness, we recall these properties in what follows. A history h is complete if every invocation has a matching response. A sequential history h is a non-interleaved sequence of invocations and matching responses, possibly terminated by a non-returning invocation. When a history h is not sequential, we say that it is concurrent. A history h is well-formed if (i) h|p is sequential for every client process p, (ii) for every command c, c is invoked at most once in h, and (iii) for every response res_i(c)v there exists an invocation inv_i(c) that precedes it in h.¹ A well-formed history h is legal if for every service S, h|S is both complete and sequential, and, denoting 〈c_1, ..., c_{n≥1}〉 the sequence of commands appearing in h|S, if for some command c_k a response value appears in h|S, it equals τ⁺(s^0, 〈c_1, ..., c_k〉).val.

¹ For some service or client x, h|x is the projection of history h over x.

Linearizability Two histories h and h′ are said to be equivalent if they contain the same set of events. Given a service S and a history h of S, h is linearizable [89] if it can be extended (by appending zero or more responses) to some complete history h′ equivalent to a legal and sequential history l of S with <_h ⊆ <_l. In such a case, history l is named a linearization of h.

Partition Given a finite family of services (S_k)_{1≤k≤n}, the synchronized product of (S_k)_k is the service defined by (∏_k States_k, (s_1^0, ..., s_n^0), ∪_k Cmd_k, ∪_k Values_k, τ) where, for every state s = (s_1, ..., s_n) and every command c in some Cmd_k, the transition function τ is given by τ(s, c) = ((s_1, ..., τ_k(s_k, c).st, ..., s_n), τ_k(s_k, c).val). Given a service S, the family (S_k)_{1≤k≤n} is a partition of S when its synchronized product satisfies (i) States ⊂ ∏_k States_k, (ii) s^0 = (s_1^0, ..., s_n^0), and (iii) for every command c, there exists a unique sequence σ in ∪_k Cmd_k, named the sub-commands of c, satisfying σ = c. The partition (S_k)_k is said to be consistent when it implements S.

To illustrate the notion of partition, let us go back to our banking example. A simplistic bank service allows its clients to withdraw and deposit money on an account, and to transfer money between two accounts. We can partition this service into a set of branches, each holding one or more accounts. A transfer of an amount x between accounts i and j is modeled as the sequence of sub-commands 〈w_i(x).d_j(x)〉, where w_i(x) represents a withdrawal of the amount x from account i, and d_j(x) represents a deposit of the amount x into account j. However, precautions must be taken when concurrent commands occur on the partitioned service.

For instance, the following history should be forbidden by the concurrency control mechanism to avoid money creation. If p deposits 10 in account 1, the concurrent withdrawal attempts of p and q from account 1 cannot both succeed as the balance is not sufficient. Note that the third operation executed by p is depositing 10 in another account (account 2).

[History h2: process p invokes d1(10), w1(10), and d2(10), all returning OK; process q concurrently invokes w1(10), which also returns OK.]

In the section that follows, we characterize precisely when the partition of a service is correct.

² A high-level view is generally constructed via a refinement mapping from the states of S to the states of T [91].

3.3 Partitioning theorems

When there is no invariant across the partition and every command is a valid sub-command for one of its parts, the partition is strict. We first establish that a strict partition is always consistent.

Theorem 3.1 Consider a service S and a partition (S_k)_k of S. If both ∏_k States_k = States and ∪_k Cmd_k = Cmd hold, then (S_k)_k is a consistent partition of S.

The above theorem is named the locality property of linearizability [89]. It states that the product of linearizable implementations of a set of services is linearizable. From the perspective of service partitioning, this suggests that each part should be implemented as a replicated state machine. Such an idea forms the basic building block of our protocols.

When commands contain several sub-commands, Theorem 3.1 does not hold anymore. Nevertheless, it is possible to state a similar result when constraining the order in which sub-commands interleave. This is the observation made by [10], despite a small omission in the original paper. Below, we show where the error occurs and propose a corrected and extended formulation. To state our results, we first need to introduce the notion of conflict graph.

Definition 3.1 (Conflict Graph) Consider a history h of a partition (S_k)_k of some service S. The conflict graph of S induced by h is the graph G_h = (V, E) such that V contains the set of commands executed in h, and E is the set of pairs of commands (c, d), with c ≠ d, for which there exist two sub-commands c_i and d_j executed on some S_k such that c_i <_{h|S_k} d_j.

In [10], the authors claim that the partition (S_k)_k is consistent provided that h|S_k is linearizable for each part S_k and G_h is acyclic. Unfortunately, this characterization is incorrect because G_h does not take into account the causality with which commands are executed in h. We argue this point with a counter-example. Let us consider again that our banking service is partitioned into a set of branches. Clients p and q execute the three commands 〈w_1(10).d_2(10)〉, w_1(5), and w_2(5) as in history h3, where account 1 is initially provisioned with the amount 10.

[History h3: process p invokes w_1(10) and then d_2(10), both returning OK; process q invokes w_1(5) and then w_2(5), both returning NOK.]

Since both withdrawals of client q fail, history h3 is not linearizable. However, G_{h3} remains acyclic since it does not capture that process q creates the order w_1(5) < w_2(5).

Definition 3.2 (Semantic Graph) Consider a history h of a partition (S_k)_k of some service S. The semantic graph of S induced by h is the graph G_h = (V, E) such that V contains the set of commands that appear in h, and E is the set of pairs (c, d), with c ≠ d, for which either (i) there exist two non-commuting sub-commands c_i and d_j in some part S_k such that c_i <_{h|S_k} d_j, or (ii) c precedes d, where we say that c precedes d when all the sub-commands of c precede all the sub-commands of d in h.

Algorithm 1 Base construction – code at client p
1: invoke(c) :=
2:   let S_k such that c ∈ Cmd_k
3:   return M(S_k).invoke(c)

In contrast to the notion of conflict graph, a semantic graph takes into account the commutativity of sub-commands. To understand why, assume that the banking service allows unlimited overdraft. In such a case, any interleaving of the sub-commands would produce a linearizable history. The partitioning theorem that follows generalizes this observation. It states that the acyclicity of non-commuting sub-commands in the semantic graph is a sufficient condition to attain consistency.

Theorem 3.2 A partition (S_k)_k of a service S is consistent if for every history h of (S_k)_k, there exists some linearization l of h such that the semantic graph of S induced by l is acyclic.
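Checking the condition of Theorem 3.2 on a concrete linearization amounts to a cycle test on the induced graph. The generic sketch below is our own illustration, with commands represented as strings and edges as an adjacency map; it performs the test with a depth-first search.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Generic cycle test over a command graph (conflict or semantic graph) given as an
// adjacency map; the representation is ours, chosen only for illustration.
public class GraphAcyclicity {

    public static boolean isAcyclic(Map<String, Set<String>> edges) {
        Set<String> done = new HashSet<>();     // vertices whose exploration is finished
        Set<String> onPath = new HashSet<>();   // vertices on the current DFS path
        for (String v : edges.keySet()) {
            if (!done.contains(v) && hasCycle(v, edges, done, onPath)) {
                return false;
            }
        }
        return true;
    }

    private static boolean hasCycle(String v, Map<String, Set<String>> edges,
                                    Set<String> done, Set<String> onPath) {
        onPath.add(v);
        for (String w : edges.getOrDefault(v, Set.of())) {
            if (onPath.contains(w)) return true;                        // back edge: cycle
            if (!done.contains(w) && hasCycle(w, edges, done, onPath)) return true;
        }
        onPath.remove(v);
        done.add(v);
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = new HashMap<>();
        g.put("c", Set.of("d"));   // edge c -> d
        g.put("d", Set.of("c"));   // edge d -> c, closing a cycle
        System.out.println(isAcyclic(g));  // prints false
    }
}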

3.4 Protocols

Building upon our previous theorems, this section describes several constructions to partition a shared service. Our presentation follows a refinement process. We start with an initial construction requiring that strictly disjoint services form the partition, then we introduce a more general technique that can accommodate any type of partitioning. Our last construction improves parallelism at the cost of constraining how the partition is structured. To ease the presentation of our algorithms, we shall be assuming hereafter that sub-commands are idempotent and that no two sub-commands in the same command access the same part. Nevertheless, all of our algorithms can be easily adapted to handle the cases where such properties do not hold.

3.4.1 Initial construction

We depict in Algorithm 1 a first construction when the partition (S_k)_k of service S is strict. This algorithm makes use of a mapping M satisfying that, for every S_k, M(S_k) is a replicated state machine implementing S_k. When a client p executes a command c on S, it uses M to retrieve the state machine implementing S_k, where S_k is the service on which c executes (line 2). Then, client p invokes the command on M(S_k) and returns the result of this invocation (line 3).

Since (S_k)_k is strict and M(S_k) is a linearizable implementation of S_k, Algorithm 1 implements a consistent partition of S by Theorem 3.1. Besides, the implementation of (S_k)_k obtained through Algorithm 1 is wait-free [92]. This property is inherited from the underlying replicated state machines that support Algorithm 1. In addition, this base construction is optimal in terms of scalability since, when clients access the parts uniformly, the throughput of the partitioned service is |(S_k)_k| times the throughput of S.
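A minimal Java rendering of Algorithm 1's client-side dispatch could look as follows; the ReplicatedStateMachine and PartitionMap interfaces are hypothetical stand-ins for the mapping M and for the "c ∈ Cmd_k" membership test, not part of any actual prototype.

import java.util.Map;

// Minimal sketch of the base construction (Algorithm 1); the interfaces are illustrative.
public class StrictPartitionClient {

    // A linearizable replicated state machine implementing one part S_k.
    public interface ReplicatedStateMachine {
        Object invoke(Object command) throws InterruptedException;
    }

    // Decides which part a command belongs to (the "c ∈ Cmd_k" test).
    public interface PartitionMap {
        String partOf(Object command);
    }

    private final PartitionMap partitionMap;
    private final Map<String, ReplicatedStateMachine> m;  // plays the role of M

    public StrictPartitionClient(PartitionMap partitionMap,
                                 Map<String, ReplicatedStateMachine> m) {
        this.partitionMap = partitionMap;
        this.m = m;
    }

    // invoke(c): find the part S_k with c ∈ Cmd_k, then forward c to M(S_k).
    public Object invoke(Object command) throws InterruptedException {
        String part = partitionMap.partOf(command);
        return m.get(part).invoke(command);
    }
}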

3.4.2 A queue-based construction

In what follows, we refine Algorithm 1 to handle the case where multiple sub-commands compose a command. A naive solution would consist in modifying Algorithm 1 so that when the client process p executes a command c = 〈c_1, ..., c_n〉, it applies in order all the sub-commands c_1, ..., c_n to the appropriate part. Such an approach however fails since (i) an invariant may link different parts of the partition, and (ii) if client p crashes in the middle of its execution, not all the parts will reflect the effects of command c.

Algorithm 2 depicts a solution to deal with these two issues. This algorithm ensures that either all the sub-commands of a command execute, or none of them, and that the state of the partitioned service remains consistent. It is based on a shared FIFO queue abstraction (variable Q) and an eventual leader election service (variable Ω). Clients use Q to submit the commands they wish to execute. Submitted commands are then executed in the queue order by the leader elected by Ω.

In more detail, our algorithm works as follows. Upon invoking a command c = 〈c_1, ..., c_n〉, a client p appends c to the queue Q, then it starts participating in the leader election (line 8). In case p is elected, it processes the commands in Q (line 15). For each such command d, p executes all the sub-commands of d once every non-commuting command before d has been executed (line 15). The result of the last sub-command of d is stored as the response of d in the queue Q (lines 17 to 21). This pattern is repeated until the leader, which might not be p, executes command c.

The leader election service Ω allows a process to register (line 8) and to unregister (line 10). This service guarantees that eventually (i) only registered processes are elected, and (ii) at least one correct process considers itself the leader. We note here that property (ii) was previously mentioned in [93], and that Ω is a form of restricted leader election [94]. This makes Ω strictly weaker than the leader oracle used in consensus [95].

3.4.3 Ensuring disjoint access parallelism

Both the protocol of [10] and Algorithm 2 order submitted commands through some global shared object: an instance of the Ring Paxos protocol in the case of [10], and a shared queue for Algorithm 2. As a consequence, the synchronization cost of executing a command is related to the number of concurrent commands. This defeats the primary goal of partitioning, which is to scale up the service by leveraging parallelism for commands that access different parts of the service. Such a property is named disjoint-access parallelism (DAP) in the literature.

Algorithm 2 Queue-based construction – code at client p
1: Shared Variables:
2:   Ω                          ▷ a leader election
3:   Q                          ▷ an atomic queue
4:
5: invoke(c) :=
6:   r ← ⊥
7:   Q ← Q ◦ (c, r)
8:   Ω.register()
9:   wait until r ≠ ⊥
10:  Ω.unregister()
11:  Q ← Q \ (c, r)
12:  return r
13:
14: when p = Ω.leader()
15:  let (d, r′) ∈ Q : ∀(e, r̂) < (d, r′) : r̂ ≠ ⊥ ∨ d ≍ e
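A rough Java transcription of the client side of Algorithm 2 is sketched below, under the simplifying assumption that the shared queue and the leader election are available as in-process objects (in the real construction they are shared, fault-tolerant abstractions); the leader-side scan of the queue is reduced to a single, greatly simplified step that ignores commutativity.

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Rough, local-only transcription of the queue-based construction; names are ours.
public class QueueBasedClient {
    public interface LeaderElection {
        void register();
        void unregister();
    }

    // A pending command together with the slot where its response will be written.
    public static final class Pending {
        final Object command;
        final AtomicReference<Object> result = new AtomicReference<>();
        final CountDownLatch done = new CountDownLatch(1);
        Pending(Object command) { this.command = command; }
    }

    private final ConcurrentLinkedQueue<Pending> queue;  // stands in for the shared queue Q
    private final LeaderElection omega;                  // stands in for Ω

    public QueueBasedClient(ConcurrentLinkedQueue<Pending> queue, LeaderElection omega) {
        this.queue = queue;
        this.omega = omega;
    }

    // invoke(c): append (c, ⊥) to Q, take part in the election, wait for the response.
    public Object invoke(Object command) throws InterruptedException {
        Pending p = new Pending(command);
        queue.add(p);            // Q ← Q ◦ (c, ⊥)
        omega.register();
        p.done.await();          // wait until r ≠ ⊥
        omega.unregister();
        queue.remove(p);         // Q ← Q \ (c, r)
        return p.result.get();
    }

    // Leader-side helper: executes the oldest pending command with a supplied function
    // and publishes its result (simplified: does not check commutativity).
    public void leaderStep(Function<Object, Object> execute) {
        Pending head = queue.peek();
        if (head != null && head.done.getCount() > 0) {
            head.result.set(execute.apply(head.command));
            head.done.countDown();
        }
    }
}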

Definition 3.3 (Disjoint-Access Parallelism) Consider an algorithm A implementing a partition (S_k)_k of some service S. We say that A is disjoint-access parallel when, in each of its histories h, there exists a linearization l of h such that if processes p and q, concurrently executing commands c and d, contend on some shared object in A, then there exists a non-directed path linking c to d in the conflict graph of l.

Algorithm 3 depicts our construction of a DAP consistent partitioning. For each part S_k, we assign respectively a queue Q[k] and a leader election Ω[k]. When a client p invokes a command c = 〈c_1, ..., c_m〉, it iteratively executes each sub-command c_i on the appropriate replicated state machine. To that end, client p adds c_i to the queue Q[k] and then joins leader election Ω[k] to execute all the sub-commands in Q[k]. The helping mechanism in Algorithm 3 is similar to the one we employed in Algorithm 2: when p is the leader and a sub-command d_j occurs before c_i in the queue Q[k], p must first execute d_j, as well as the sub-commands following it, before it can execute c_i (lines 18 to 24). This pattern ensures the correctness of our construction in the case where the following property holds:

(P1) There exists an ordering ≪ of (S_k)_k such that for any two sub-commands c_i and c_j accessing respectively parts S_k and S_{k′}, if c_i precedes c_j in c then S_k ≪ S_{k′} holds.

Unfortunately, property P1 does not hold for every partition (S_k)_k of a service S. For instance, in our banking example a transfer may move money in either direction between two branches, so no ordering of the parts satisfies P1. We therefore refine Algorithm 3 in two ways. First, we require that the mapping M satisfies the following property:

Algorithm 3 DAP construction – code at client p
1: Shared Variables:
2:   Ω                          ▷ an array of leader election objects
3:   Q                          ▷ an array of atomic queues
4:
5: invoke(c) :=
6:   return invoke_sub(c)
7:
8: invoke_sub(c_i) :=
9:   r ← ⊥
10:  let S_k : c_i ∈ Cmd_k
11:  Q[k] ← Q[k] ◦ (c_i, r)
12:  Ω[k].register()
13:  wait until r ≠ ⊥
14:  Ω[k].unregister()
15:  Q[k] ← Q[k] \ (c_i, r)
16:  return r
17:
18: when p = Ω[l].leader()      ▷ for some l
19:  let (d_j, r′) ∈ Q[l] : ∀(e, r̂) < (d_j, r′) : r̂ ≠ ⊥ ∨ d_j ≍ e

(P2) Function M returns a set of replicated state machines for each S_k such that for any two sub-commands c_i and c_j accessing respectively parts S_k and S_{k′}, if c_i precedes c_j in c then either M(S_k) ⊆ M(S_{k′}) or the converse holds.

Second, upon executing a sub-command c_i (at line 20 in Algorithm 3), we apply c_i in some canonical order to all the replicated state machines in M(S_k) before returning the value r″. These two modifications ensure that the premises of Theorem 3.2 hold for any partition of some shared service S.

Going back to the design of a partitioned banking service, applying P2 requires the addition of a special account t replicated at all branches, such that when a money transfer occurs between two accounts in different branches, money goes through account t, i.e., 〈w_i(x).d_t(x).w_t(x).d_j(x)〉.

Algorithm 3 with property P2 is the general method we employ to partition a service. We implemented it in ZooFence, where we partition the shared tree interface exposed by the Apache ZooKeeper coordination service. By partitioning the tree, ZooFence reduces contention and leverages the locality of operations. Both effects contribute to improving the latency of operations and increasing the overall throughput. We describe ZooFence in detail in the next section.

3.5 ZooFence

In this section, we present an application of our principled partitioning approach to the popular Apache ZooKeeper [12] coordination service. The resulting system, named ZooFence, orchestrates several independent instances of ZooKeeper. The use of ZooFence is transparent to applications: it offers the exact same semantics and API as a single instance of ZooKeeper. However, the design of ZooFence allows avoiding synchronization between parts when it is not necessary. This reduces the impact of convoy effects that synchronization causes [97].

The partitioning of the ZooKeeper service follows the approach introduced in Algorithm 3. ZooFence splits the tree structure between multiple ZooKeeper instances. Commands that access distinct parts of the tree run in parallel on distinct instances, while guaranteeing both strong consistency and wait-freedom. Commands that access a single part of the tree run on a single instance. This section presents the main components of ZooFence, discusses our design choices with regard to Algorithm 3, then details some specific aspects of its implementation.

3.5.1 Overview

Figure 3.2 depicts the general architecture of ZooFence. The system has four components: (i) A set of independent ZooKeeper instances that ZooFence orchestrates; (ii) A client-side library; (iii) A set of queues storing commands that need to be executed on multiple instances; and (iv) A set of executors that fetch commands from the queues, delegate them to the appropriate instances, and return the result to the calling clients.

[Architecture diagram: clients (Client 1, 2, 3) access ZooFence through ZooFenceLib; commands flow through ZooFence queues to ZooFence executors, which forward them to the ZooKeeper instances i1, i2, i3.]

Figure 3.2: ZooFence design.

3.5.2 Client-side library

ZooFence clients execute commands through a client-side library implementing the ZooKeeper API. This interface consists of a set of commands accessing a concurrent tree data structure composed of znodes. A znode in the tree stores some data, and is accessible via a path as in UNIX filesystems. Znodes can be persistent, meaning they belong to the tree until a client explicitly deletes them, or ephemeral, in which case they are automatically removed once the client that created them disconnects or crashes. In addition, znodes can be sequential. For such znodes, the system automatically appends a monotonically increasing counter to their names at creation time. A client can manipulate a znode through read or write commands. Read commands, e.g., exists, getChildren or getData return a sub-state of the shared tree without modifying it. Write commands such as create, delete or setData, modify the state of the tree and return metadata information to the client.

Clients execute a command on one or more of the orchestrated ZooKeeper instances according to a flattening function. This function plays the role of function M in Algorithm 3 and maps paths to ZooKeeper instances. Multiple flattening functions can be used. The choice of a flattening function depends on how the application accesses the concurrent tree structure. When a client executes a command c on a znode n, ZooFence determines, using the flattening function, the set of ZooKeeper instances I = f(n) on which c is executed. Command c is trivial when f(n) returns a single ZooKeeper instance. In such a case, the command is directly forwarded to that instance. If c is non-trivial, it is inserted into the appropriate queue. The executor associated with the queue forwards the command to the corresponding ZooKeeper instances, then returns the result to the client.

The flattening function f satisfies property P2. This means that if a znode n with parent p is mapped to a set of instances I = f(n), then I is also a subset of f(p). To understand why, consider that znode /a is mapped to instances {i1, i2, i3}, and /a/b to {i1, i2}. In case a client executes create(/a/b), because both instances i1 and i2 hold a copy of /a, the creation of /a/b succeeds if and only if /a was created previously. This ensures that a znode n with parent p is in the tree if and only if p also exists in the tree. The next section provides additional details on the internals of ZooFence, explaining how we maintain this key invariant.
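A flattening function can be as simple as a longest-prefix lookup over a table of rules. The sketch below is purely illustrative (the rules, instance identifiers, and method names are ours, not the ZooFence implementation) and includes the containment check corresponding to property P2.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative flattening function; the rules and instance identifiers are hypothetical.
public class FlatteningFunction {

    // Longest-prefix rules: a path maps to the instance set of its longest matching prefix.
    private final Map<String, Set<String>> rules = new HashMap<>();

    public FlatteningFunction() {
        rules.put("/", Set.of("i1", "i2", "i3"));   // fully replicated znodes
        rules.put("/site1", Set.of("i1"));          // local to site 1
        rules.put("/site2", Set.of("i2"));          // local to site 2
    }

    // Returns f(n): the set of ZooKeeper instances replicating the znode at 'path'.
    public Set<String> instancesOf(String path) {
        String best = "/";
        for (String prefix : rules.keySet()) {
            if (path.startsWith(prefix) && prefix.length() > best.length()) {
                best = prefix;
            }
        }
        return rules.get(best);
    }

    // Property P2 for a child znode and its parent: f(child) must be a subset of f(parent).
    public boolean satisfiesP2(String childPath, String parentPath) {
        return instancesOf(parentPath).containsAll(instancesOf(childPath));
    }
}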

3.5.3 Executor

The core component of ZooFence is the executor. Each executor implements the logic of the sub-command execution mechanism we depicted at lines 18 to 24 in Algorithm 3. There is one executor for each set of ZooKeeper instances replicating a common path. For instance, in the above example, there is one executor for /a, replicated at {i1, i2, i3}, and one for /a/b, replicated at {i1, i2}. As explained previously, each executor is associated with a FIFO queue. When a client executes a non-trivial command, it adds that command to the corresponding queue according to the flattening function. The executor scans the queue in order to retrieve the next command c it has to execute. Then, it forwards c to all the associated ZooKeeper

instances, merges their results, and sends the final result back to the client before deleting c from the queue.

Queue synchronization. Using multiple executors improves the performance of ZooFence, but requires additional synchronization. To illustrate this point, let us consider again that /a is replicated at {i1, i2, i3} and /a/b at {i1, i2}. The set of instances associated with znode /a/b has a smaller cardinality than the set of instances associated with its parent, /a; we call /a/b a fringe znode. Assume that a client attempts to delete /a, while another client concurrently attempts to delete /a/b. Due to the tree invariant, the deletion of /a succeeds only if it does not have any children. In the scenario above, if the deletion of znode /a/b finishes on i1, but not on i2, before the deletion of znode /a is executed, then /a would be deleted from i1, but not from i2, leaving replicas in an inconsistent state. We solve the problems related to fringe znodes by synchronizing queues following the approach depicted at lines 21 to 22 in Algorithm 3. Upon creation of such a znode, the executor adds the command to the parent queue. Upon deletion, the executor first executes the command then, in case of success, it adds the command to the parent queue. When adding a command to the parent queue, the executor waits for a result before returning.

Failure recovery mechanism. The executor is a dependable component of ZooFence. To ensure this guarantee, we replicate each executor and employ the same leader election mechanism as in Algorithm 3. When the previously elected executor is unresponsive or has crashed, ZooFence nominates a new one and resumes the execution of commands. ZooFence prevents inconsistencies that might result, as follows: (i) we ensure idempotency at the client side; and (ii) we use the command semantics to resume incomplete commands on the associated ZooKeeper instances.

Queue monitoring. Executors retrieve commands from their respective queues and keep them in a local cache. Instead of actively polling the queue for new commands, executors use the ZooKeeper event notification mechanism, watches, which are triggered when the queues are modified. Executors check their local caches every time the watch set on their queue is triggered; if the cache is empty, the executor performs a getChildren command to repopulate its cache.
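The sketch below illustrates this watch-based pattern with the standard ZooKeeper Java client; the queue path, class name, and error handling are our own simplifications, not the actual ZooFence code.

import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Illustrative executor-side queue monitoring; the queue path is hypothetical.
public class QueueMonitor implements Watcher {

    private final ZooKeeper zk;
    private final String queuePath;  // e.g., "/queues/executor-a" (hypothetical layout)

    public QueueMonitor(ZooKeeper zk, String queuePath) {
        this.zk = zk;
        this.queuePath = queuePath;
    }

    // Fetches the pending command znodes and re-arms the watch in the same call.
    public List<String> fetchPending() throws KeeperException, InterruptedException {
        return zk.getChildren(queuePath, this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
            try {
                List<String> pending = fetchPending();  // repopulate the local cache
                // ... hand the pending commands over to the executor loop ...
            } catch (KeeperException | InterruptedException e) {
                // in a real executor the fetch would be retried
            }
        }
    }
}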

Asynchronous commands. Each executor retrieves commands from its local cache and forwards them to its set of ZooKeeper instances. We use asynchronous commands between an executor and its ZooKeeper instances, regardless of the type of command issued by the ZooFence client. The only difference between synchronous and asynchronous client commands is local, in the ZooFence client library; for synchronous commands, the library blocks and waits for the reply to arrive. Since ZooKeeper guarantees FIFO order even for asynchronous commands, this optimization, which is transparent to the clients, increases the throughput of the executor without changing the semantics of synchronous commands.

Colocation. An executor is a stand-alone process that runs on a different machine than clients; it executes at the same site as the ZooKeeper instance that hosts its corresponding queue.

Our prototype implementation of ZooFence is written in Java and contains around 2,500 SLOC. It is built upon the out-of-the-box ZooKeeper 3.4.5 distribution. Clients transparently instantiate a ZooFence deployment when the connection string contains multiple ZooKeeper instances separated by a “|” character, e.g., “127.0.0.1:2181 | 127.0.0.1:2182”. As discussed in Section 3.5.2, ZooFence clients forward non-trivial commands to executors via queues. Client commands are serialized and stored in persistent sequential znodes; we use the sequential attribute offered by ZooKeeper to assign a sequence number to each command in the queue. Queues are stored in dedicated administrative ZooKeeper instances. Executors are identified by ephemeral znodes, also stored in administrative ZooKeeper instances. The choice of the administrative ZooKeeper instances among the existing instances is currently not automated and is left as future work. Clients open TCP connections to executors before putting commands in queues. When commands complete, executors send the results over the respective connections.
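For illustration, enqueuing a serialized command as a persistent sequential znode with the plain ZooKeeper Java client could look as follows; the znode layout, the connection handling, and the assumption that the queue znode already exists are simplifications of ours, not the actual ZooFence code.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative client-side enqueue of a serialized command; paths are hypothetical.
public class CommandQueueClient {

    private final ZooKeeper adminZk;  // connection to an administrative ZooKeeper instance

    public CommandQueueClient(ZooKeeper adminZk) {
        this.adminZk = adminZk;
    }

    // Appends a command to the queue; ZooKeeper appends a monotonically increasing counter.
    public String enqueue(String queuePath, byte[] serializedCommand)
            throws KeeperException, InterruptedException {
        return adminZk.create(queuePath + "/cmd-",
                              serializedCommand,
                              ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT_SEQUENTIAL);
    }

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 10_000, event -> { });
        CommandQueueClient client = new CommandQueueClient(zk);
        String znode = client.enqueue("/queues/global",
                                      "create /a/b".getBytes(StandardCharsets.UTF_8));
        System.out.println("enqueued as " + znode);  // e.g., .../cmd-0000000042
        zk.close();
    }
}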

3.6 Evaluation

We evaluate our ZooFence prototype in two scenarios: a synthetic concurrent queue service deployed at multiple geographically-distributed sites, and the BookKeeper distributed logging service [98]. We compare ZooFence against vanilla ZooKeeper deployments in terms of throughput, latency and scalability.

3.6.1 Concurrent queues service

In this experiment, we show that ZooFence can leverage access-pattern locality to improve both the throughput and the responsiveness of distributed applications. To that end, we emulate a geographically-distributed deployment. Each site consists of dual-core machines with 2 GB RAM, and hosts one ZooKeeper instance and several clients. Inside a site, machines communicate using a (native) gigabit network. Between sites, we set the round-trip delay to approximately 50 ms. Our experiment uses a single executor, collocated with the administrative ZooKeeper on an eight-core machine at a different site than all clients.

In this environment, we deploy a service consisting of five concurrent queues. Four of the queues exhibit strong locality, and are used exclusively by clients from the same site (i.e., queue1 is used only by clients from site1, queue2 from site2, etc.). We refer to these queues as site queues. The fifth queue is used by clients from all sites. We refer to it as the geo-distributed queue. We vary the number of clients within each site from one to eight. Each client runs two producer processes. Producers mostly create znodes in their site queue, but occasionally write to the geo-distributed queue as well. We note that this is a write-heavy workload, which represents a worst-case scenario for both ZooFence and ZooKeeper.

Our experiments compare the performance of three different deployments:

[Figure: top row, panels (a)–(c): throughput (#ops/sec) vs. number of clients (1–32) for locality 0%, 85%, and 100%; bottom row, panels (d)–(f): latency (ms) vs. number of clients for the same locality levels. Each panel compares ZooFence, vanilla ZooKeeper, and manually-partitioned ZooKeeper.]

Figure 3.3: Comparison of ZooFence against ZooKeeper.

(1) ZooFence with a flattening function that assigns site queues to local ZooKeeper instances, and the geo-distributed queue to all instances.

(2) A vanilla ZooKeeper deployment, which involves all the available ZooKeeper instances: a leader and three followers, with the synchronization bound set to 175 ms. All queues are stored by this ZooKeeper deployment. This is the baseline for our experiments – how ZooKeeper would typically be used today.

(3) A manually-partitioned ZooKeeper. Each machine runs two ZooKeeper instances: one instance stores the site queue, and the other one stores the geo-distributed queue; the latter is a ZooKeeper instance covering all sites. This deployment is the optimum an experienced ZooKeeper administrator can achieve: writes accessing the geo-distributed queue are broadcast to all sites, otherwise they are served locally.

Figure 3.3 presents the latency and the throughput of the system for each of the three deployments when we vary the number of clients and the percentage of operations on the geo-distributed queue.

As shown by figures 3.3(b) and 3.3(e), with a locality of 85%, ZooFence is close to the manually-partitioned ZooKeeper. This happens because in both deployments most operations occur inside a site, avoiding cross-site communication in most cases. The vanilla ZooKeeper deployment exhibits lower throughput and higher latency because all queues are replicated across all sites. Since the leader ZooKeeper propagates updates to its followers, the inter-site communication penalty cannot be avoided.

When locality is maximum, ZooFence is identical to the manually-partitioned ZooKeeper

(figures 3.3(c) and 3.3(f)). The vanilla ZooKeeper deployment performs significantly worse because it fully replicates all znodes.

Finally, when all operations are performed on the geo-distributed queue (Figures 3.3(a) and 3.3(d)), the performance of ZooFence becomes worse than that of the other two deployments, both in terms of throughput and latency. This is due to the overhead incurred by our execution mechanism for operations on replicated znodes: the executor fetches commands from the execution queue, delegates them to the responsible ZooKeeper instances based on the flattening function, and forwards the result to the client.

The executor itself can become saturated and degrade performance, in terms of both throughput and latency. In this experiment, we used a single executor, deployed on the same VM as the one hosting the administrative ZooKeeper. The executor becomes saturated at around 60 operations per second, as shown in Figure 3.3(a) and Figure 3.3(b) (only 15% of operations go through the executor, and 15% * 400 = 60). This is mainly due to our design choice of making ZooFence modular, on top of ZooKeeper. This choice implies that our system does not benefit from optimizations such as batching, which require tighter integration. Since the executor is in a separate site, it pays the latency penalty twice for global operations, by communicating with both the ZooKeeper instances and the clients, which limits performance.

We have performed a similar experiment for read workloads, where conclusions are similar. ZooFence allows local reads to exhibit low latency, and not interfere with the performance of remote partitions.

Overall, our experiments show that ZooFence performs close to an optimal deployment when access patterns exhibit strong locality. ZooFence can enable even inexperienced administrators to obtain good performance without the burden of partitioning the state manually.

3.6.2 BookKeeper

This section presents experimental results that assess the performance gains of ZooFence over ZooKeeper. To that end, we compare the two systems in a cluster deployment when supporting the BookKeeper logging service [98]. Below, we first give a brief description of BookKeeper, then detail our experimental protocol and comment on our results.

BookKeeper is a distributed system that reliably stores streams of records. Each stream is stored in a write-ahead log, or ledger, that contains multiple entries. A ledger is replicated at one or more dedicated storage servers, named bookies. BookKeeper uses the ZooKeeper tree structure to store the metadata of the ledgers, as well as the set of available bookies. The tree also ensures that a single client writes to a ledger at a time. BookKeeper supports close-to-open semantics: when writing to a ledger, a client appends new entries at a quorum of its replicas. Changes to an open ledger are visible to a concurrent client only when the ledger is closed.

Our experiments consist of evaluating the maximal throughput of BookKeeper when clients concurrently open ledgers and start appending entries to them. In all our experiments, we employ 18 bookies with a replication factor of 3. Such a setting ensures that the system can sustain the failure of one bookie without interruption. We vary both the length of an entry and the frequency at which clients create a new ledger. This last parameter is controlled by the number of entries each client writes to a ledger before closing it.

[Figure: BookKeeper throughput (×1000 ops/sec) for ZooFence (Zf) and ZooKeeper (Zk); panel (a) uses small entries (128 and 256 bits), panel (b) uses large entries (1024 and 2048 bits).]

Figure 3.4: BookKeeper performance (from left to right, the amount of written entries is 100, 250, 500 and 1000).

Figures 3.4(a) and 3.4(b) present a detailed performance comparison of BookKeeper when using ZooKeeper and ZooFence. We vary the length of an entry from 128 to 2048 bits, and the number of written entries from 100 to 1000. In both figures, Zk stands for ZooKeeper, while Zf means ZooFence; the throughput is measured as the total number of operations per second. When clients write 1000 entries (or more, not shown on the plots), the two systems achieve close performance. In such a case, the throughput is limited by either the bookies or the network. During our experiments, the MTU is set to 1500 bytes. This explains the performance gap between large and small entries. On the other hand, when clients concurrently open more ledgers, fast operations on the metadata storage matter. In such a case, because ZooFence provides parallel accesses to the shared tree, it outperforms ZooKeeper. The difference increases as clients access new ledgers more frequently. In our experiments, ZooFence improves the throughput of BookKeeper by up to 45%.

3.7 Summary

In this chapter, we proposed a partitioning approach to improve the performance and scalability of distributed services. We validated our approach by building ZooFence, a system that automatically partitions ZooKeeper. ZooFence reduces contention and increases the overall throughput of applications using ZooKeeper, especially in geo-distributed scenarios. We assessed the practicability of a prototype implementation of ZooFence on two benchmarks: a concurrent queue service and the BookKeeper distributed logging engine. Our experiments show that when locality is maximal, ZooFence improves ZooKeeper performance by almost one order of magnitude.

4 Workload-Aware State Partitioning

4.1 Introduction

In this chapter, we apply the state partitioning technique described in Chapter 3 to design GlobalFS, a POSIX-compliant distributed file system that spans geographically-diverse regions. We use GlobalFS to illustrate the practical implications of state partitioning, and to explain how partitioning can dynamically adapt to workload patterns.

Challenges of building a strongly consistent distributed system. Like other distributed systems that rely on state machine replication, distributed file systems need to choose between availability and consistency when network partitions occur, in order to work around the limitations formulated by the CAP theorem [4]. State-of-the-art designs, like Oceanstore [5] and Ivy [6], sacrifice consistency in favor of availability; such systems must be able to anticipate conflicts and provide suitable conflict-resolution mechanisms in order to reconcile diverging replicas. However, distributed file systems are general-purpose applications, i.e., they support a variety of task flows; strong consistency is thus better suited, as it is more intuitive for users and does not require human intervention in case of conflicts.

Enforcing strong consistency at large scale has a significant impact on the latency of the file system, as replicas from different regions need to coordinate and agree on the order in which to execute commands. As a consequence, performance decreases as the number of regions spanned by the file system increases.

How state partitioning can help. Partitioning the state of the file system allows us to exploit workload locality without compromising consistency. We propose a partitioning model in which files are placed according to access patterns (e.g., in the same region as their most frequent users), as well as four execution modes corresponding to the operations that can be performed in the file system: (1) single-partition operations, (2) multi-partition uncoordinated operations, (3) multi-partition coordinated operations, and (4) read-only operations. While single-partition and read-only operations can be implemented efficiently by accessing a single region, the other two classes require interactions across multiple regions.

As the workload patterns may change, we further explore the idea of performing the state

partitioning on the fly. Workload-aware state partitioning introduces new challenges, which we address in this work:

• How to apply partitioning to the semantics of the POSIX file APIs?

• How to build and integrate within GlobalFS a dynamic partitioning component that moves data around based on client access patterns?

• How to build a low-latency data store?

The rest of this chapter begins by defining the system model. We first present GlobalFS, a distributed file system using a novel design that supports partitioning; we explain how the file system tree can be partitioned and replicated across regions. Next, we describe how workload-aware state partitioning works in the context of GlobalFS. In the evaluation, we show that GlobalFS compares favorably to de-facto industry implementations of distributed file systems, and that adding support for workload-aware partitioning in GlobalFS has positive effects on performance when the workload exhibits access locality.

This chapter expands on previously published work [99], with a focus on the dynamic partitioning functionality and the low-latency data store, which are my main contributions to this project.

4.2 System model and definitions

System model. We assume a distributed system composed of interconnected processes that communicate by message passing. There is an unbounded set of client processes and a bounded set of server processes. Processes may fail by crashing, but do not experience arbitrary behavior (i.e., no Byzantine failures).

Client and server processes are deployed at datacenters distributed over different geographic regions. Processes located in the same region communicate with low latency, while processes residing at different regions experience larger latency. Links are seemingly reliable: if both the sending and the receiving processes are non-faulty, then every message sent is eventually received.

The system is partially synchronous [100]: it is initially asynchronous and eventually becomes synchronous. The time when the system becomes synchronous is called the global stabilization time (GST) and is unknown to the processes. Before GST, there are no bounds on the time it takes for messages to be transmitted and operations to be executed. After GST, such bounds exist but are unknown, and the system is assumed to remain synchronous forever. In practice, “forever” means long enough for the atomic multicast protocol to make progress (i.e., deliver messages).

Definitions. We define two consistency models that are central to this work:

Sequential consistency. A system is sequentially consistent if there is a way to reorder the client commands in a sequence that (i) respects the semantics of the commands as defined in their sequential specification, and (ii) respects the ordering of commands as defined by each client [101].

Causal consistency. A system is causally consistent if the result of read operations respect the causal ordering of events as defined by the “happens-before” relation [14].

4.3 GlobalFS: a partitioned file system

This section presents GlobalFS, a distributed file system that uses state partitioning to achieve geographical scalability while providing strong consistency guarantees. We designed GlobalFS to be POSIX-compliant, i.e., it supports atomic operations without enforcing close-to-open semantics. Specifically, GlobalFS guarantees sequential consistency for update operations and causal consistency for read operations.

Designing strongly consistent distributed systems that provide good performance requires careful trade-offs. In more detail, ensuring strong consistency at scale is slow since it orders commands across replicas, potentially residing at geographically distant regions. State partitioning can help in this scenario by limiting coordination, in most cases, to replicas within the same region. The original approach we explore in this work is to trade the performance of global operations, spanning multiple regions, for the scalability of intra-region operations.

Next, we describe the overall architecture of GlobalFS and how the file system can be parti- tioned and replicated across datacenters. We then explain GlobalFS execution modes, as well as failure handling.

4.3.1 Architecture

GlobalFS consists of five components, depicted in Figure 4.1: client interface, partitioning oracle, data store, metadata management, and atomic multicast.

[Architecture diagram, from top to bottom: Applications, Client interface (FUSE), Partitioning oracle, Metadata management, Data store, Atomic multicast, Network.]

Figure 4.1: Overall architecture of GlobalFS.

Client interface. The client interface provides a file system API supporting a subset of POSIX.1-2001 [102]. GlobalFS implements file system operations sufficient to manipulate files and directories. It supports file-specific operations: mknod, unlink, open, read, write, truncate, symlink, readlink; directory-specific operations: mkdir, rmdir, opendir, readdir; and general-purpose operations: stat, chmod, chown, rename, and utime. It supports symbolic links, but not hard links.

Partitioning oracle. The partitioning oracle governs how files are assigned to partitions: files are either local – i.e., assigned to the partition that the oracle resides in, or global – i.e., shared among all partitions. File system clients contact their closest oracle (known in advance or discovered dynamically) to query it regarding the partitions where files of interest reside. The oracle takes into account client access patterns and a partitioning policy predefined by the system administrator to decide when to move files between partitions.

Data store. Clients divide the content of files into data blocks (dblocks) and store them in the data store. The data store provides a linearizable key-value store with primitives to read (get) and create (put) data items. It is implemented as a collection of distributed hash tables (DHTs), with one instance of the data store per datacenter. Maintenance of the data in the DHT is simple and efficient given the assumption that data blocks are immutable and block ids are unique: there is no need to enforce write consistency between replicas. Horizontal scalability is achieved by mapping keys to different sets of replicas inside the DHT [8, 7, 103, 104].
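The sketch below is an in-memory stand-in for the get/put interface of the data store; deriving block ids from the block contents is one simple way to obtain unique ids for immutable blocks, and is an assumption of ours for illustration rather than a description of the GlobalFS implementation.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory stand-in for the data store; the real system is a DHT per datacenter.
public class BlockStore {
    private final Map<String, byte[]> blocks = new ConcurrentHashMap<>();

    // put: stores a dblock and returns its unique id.
    public String put(byte[] dblock) {
        String id = hash(dblock);
        blocks.putIfAbsent(id, dblock.clone());  // re-writing the same id is a no-op
        return id;
    }

    // get: returns the dblock for an id, or null if unknown to this replica.
    public byte[] get(String id) {
        byte[] b = blocks.get(id);
        return b == null ? null : b.clone();
    }

    private static String hash(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BlockStore store = new BlockStore();
        String id = store.put("file contents".getBytes(StandardCharsets.UTF_8));
        System.out.println(id + " -> " + new String(store.get(id), StandardCharsets.UTF_8));
    }
}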

Metadata management. Like most contemporary distributed file systems (e.g., [43, 46, 55, 59]), GlobalFS decouples metadata from data storage. Metadata is handled by the metadata management layer. Each file has an associated inode block (iblock) containing the file’s metadata (e.g., its size, owner, and access rights) and pointers to its data blocks; iblocks are mutable and maintained by the metadata servers.

Atomic multicast. In GlobalFS, every update operation is ordered by atomic multicast [28]. Atomic multicast is a one-to-many communication abstraction that implements the notion of groups. Servers subscribe to one or more groups and every message multicast to a group g will be delivered by the processes that subscribe to g. Let relation < be defined such that m < m′ iff there is a process that delivers message m before message m′. Atomic multicast ensures that (i) if a process delivers m, then all non-faulty processes that subscribe to the same group deliver m (agreement); and (ii) relation < is acyclic (order). The (partial) order property implies that if processes p and q deliver messages m and m′, then they deliver them in the same order.

It is important to understand the difference between atomic broadcast, as implemented by Paxos [44] and its variants (e.g., [19, 105, 106]), and atomic multicast. With atomic broadcast, for every pair of delivered messages m and m′, either m < m′ or m′ < m. With atomic multicast,

it is possible that neither m < m′ nor m′ < m. This is the case, for example, if m and m′ are multicast to groups g and g′, respectively, and no process subscribes to both groups. Partially ordering messages, as defined by atomic multicast, is a fundamental requirement of scalable distributed systems.
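An interface capturing this abstraction might look as follows (our own sketch, not the GlobalFS API). A metadata replica of a local partition would typically subscribe both to its partition's group and to g_all, so that it delivers single-partition and global commands in a consistent partial order.

import java.util.Set;

// Illustrative interface for the atomic multicast abstraction; names are ours.
public interface AtomicMulticast<M> {

    // Multicasts m to every process that subscribes to at least one group in 'groups'.
    void multicast(Set<String> groups, M message);

    // Implemented by the layer above; deliveries respect agreement and the acyclic
    // partial order described in the text.
    interface DeliveryHandler<M> {
        void deliver(M message);
    }

    // Subscribes the local process to a group and registers its delivery handler.
    void subscribe(String group, DeliveryHandler<M> handler);
}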

4.3.2 Partitioning and replication

GlobalFS partitions the file system tree among regions according to client access patterns. The partitioning oracle assigns files that exhibit locality in their access patterns to local partitions, replicated at datacenters inside the region from which those files are most frequently accessed. Files that are read-mostly or frequently accessed from multiple partitions are assigned to a global partition that is replicated across regions.

In this setup, a file in the global partition can be read from any region, resulting in high throughput and low latency for read operations. Updating a file in the global partition, however, involves all regions. Local partitions, on the other hand, can provide high throughput and low latency for both reads and updates, as long as the client is close to its location. Both local and global partitions can tolerate the failure of an entire datacenter. Moreover, the global partition can tolerate the failure of all datacenters in a region (i.e., a disaster). Table 4.1 summarizes the two partition types in GlobalFS.

Partition   Replication       Performance                  Fault tolerance
Global      across regions    best for reads               disaster
Local       within region     best for reads & writes      datacenter crash

Table 4.1: Types of partitions in GlobalFS.

When access patterns change, the file system needs to adapt by dynamically performing state partitioning. Specifically, the partitioning oracle needs to (i) identify changes in access patterns and (ii) move files around without breaking the consistency guarantees of GlobalFS. We address dynamic state partitioning in Section 4.4.

4.3.3 Example deployment

Figure 4.2 illustrates how state partitioning applies to GlobalFS. We consider a deployment involving three regions, R1, R2, and R3, each with three datacenters. GlobalFS is split into four partitions, P0, ..., P3, such that P0 is replicated in datacenters across all regions (i.e., the global partition), while each partition Pi, 1 ≤ i ≤ 3, is replicated in datacenters from region Ri (i.e., the local partitions).

[Figure: file system tree rooted at /; /bin, /etc and other top-level entries belong to partition P0, while the subtrees under /1, /2, and /3 belong to partitions P1, P2, and P3, respectively.]

Figure 4.2: Partitioning in GlobalFS: P0 is replicated in all regions, while P1, P2, P3 are each replicated in one region.

In this scenario, clients and servers (metadata and data store) are distributed across the regions. In addition to the metadata associated with the region’s partition, each datacenter also hosts an instance of the data store. More precisely, the metadata for the directory /1 and all its contents (recursively) are stored only in P1. In the same manner, /2 and /3 are respectively mapped to P2 and P3. Files not contained in any of these directories (e.g., /, /bin, /etc) are in partition P0.

Each region has its own partitioning oracle, responsible for providing hints regarding the location of files to clients from that region. For instance, if a client from the region corresponding to P1 wants to access /bin, it queries the oracle from the same region about the partitions that /bin resides at. In response, the oracle indicates that /bin is shared with other partitions, and thus resides at P0 – which is replicated across regions. Hence, the client accesses /bin locally.

4.3.4 Execution modes

GlobalFS divides file system operations into four classes and defines an execution mode for each. These execution modes provide the basis for the implementation of each file system operation. They define how the client interacts with the different partitions of GlobalFS in order to successfully execute an operation. We next detail each GlobalFS execution mode without accounting for the case in which the partitioning oracle performs dynamic state partitioning; adapting to workload patterns is discussed in Section 4.4.

a) Single-partition operations: A single-partition operation modifies metadata stored in a single partition. As a consequence, operations in this class are multicast to the group associated with the concerned partition and, when delivered, executed locally by the replicas. The execution of a single-partition operation follows state-machine replication [107]: each replica delivers a command and executes it deterministically. The following operations are single-partition in GlobalFS, where the terms child and parent are used to refer to a node and the directory that contains it.

• chmod, chown, truncate, open, and write; • mknod, unlink, symlink, and mkdir when the parent and child are in the same partition; and

40 • rename, when the origin, origin’s parent, destination, and destination’s parent are in the same partition.

Note that while a single-partition operation in a local partition involves only servers in

one region, a single-partition operation in the global partition (multicast to group g_all) involves servers in all regions of the system.

b) Uncoordinated multi-partition operations: An uncoordinated multi-partition operation accesses metadata in more than one partition, but the operation's execution at each partition can complete without any input from the other partitions involved. The partial ordering of atomic multicast is sufficient to guarantee consistency: partitions will independently reach the same decision in regards to success or failure. This is similar to the notions of independent transactions in Granola [108] or one-shot transactions in H-Store [109].

To execute an operation that concerns multiple partitions P1, P2, ..., Pn, the operation is atomically multicast to all replicas of all involved partitions. Upon delivery, each replica of an involved partition Pi executes the operation. To reach replicas in multiple partitions, the operation is multicast to group g_all; if a replica delivers an operation it is not concerned about, the replica simply discards the operation. The following file system commands are implemented as uncoordinated multi-partition operations:

• mknod, unlink, symlink, mkdir, and rmdir when the parent and child are in different partitions.

c) Coordinated multi-partition operations: Some operations require partitions to exchange information. In GlobalFS, this may happen in the case of a rename (i.e., moving the location of a file or directory). In this case, file metadata has to be moved from the origin's partition to the destination's partition. As a result, a rename may involve up to four partitions, given by the placement of the origin, origin's parent, destination, and destination's parent. Consequently, a rename operation might fail in one of the partitions (e.g., the origin does not exist) but not in the others. To execute a coordinated multi-partition operation, the client multicasts the operation to all concerned partitions (i.e., to multicast group g_all). Upon delivery of the operation, the involved partitions exchange information about the command and whether it can or cannot be locally executed. This mechanism is similar to the signals exchanged in S-SMR [110]. In the case of a rename, the file's attributes and list of block identifiers need to be sent to the destination partition. As in classic two-phase commit protocols, the command is only executed if all involved partitions agree that it can be executed.

d) Read-only operations: Read-only operations are executed by a single metadata replica and data store server.¹ For read-only operations, GlobalFS provides causal consistency.

¹GlobalFS does not implement atime (i.e., time of last access), as recording the time of the last access would essentially turn every read into a write operation to update the file's access time.

This is not straightforward to ensure since a client may submit a write operation against one server and later issue a read operation against a different server, or even read from two separate servers. When the second server is contacted, it may not have applied the required updates yet. We use an approach inspired by vector clocks [111, 112], where clients and replicas keep a vector of counters, with one counter per system partition. In the example shown in Figure 4.2, clients and replicas keep a vector with four entries, associated with partitions P0,...,P3. Every request sent by a client contains v_c, the client's current vector, and each reply from a replica includes the replica's vector, v_r. A read is executed by a replica only when v_r[i] ≥ v_c[i], i being the object's partition. The idea is that the replica knows whether it is running late, in which case it must wait to catch up before executing the request.

When a replica receives an update operation from a client, the client's vector v_c is atomically multicast together with the operation. Upon delivery of the command by a replica of Pi, entry v_r[i] is incremented. Every other entry j in the replica's vector is updated according to the delivered v_c, whenever v_c[j] > v_r[j]. Clients update their vector on every reply, setting v_c[i] to v_r[i] whenever v_r[i] > v_c[i], for each entry i; this rule is sketched in code after Table 4.2.

The following file system commands are implemented as read-only operations: read, getdir, readlink, open (read-only), and stat. Table 4.2 summarizes the performance characteristics of GlobalFS operations.

Operation                        Partitions     Multicast       Performance
Read-only                        one            no multicast    1st (best)
Single-partition                 one            g_all or g_i    2nd
Uncoordinated multi-partition    two or more    g_all           3rd
Coordinated multi-partition      two or more    g_all           4th (worst)

Table 4.2: Operations in GlobalFS.
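The vector-based read and merge rules described above can be captured in a few lines. The sketch below uses illustrative class and field names and is not the GlobalFS implementation.

import java.util.Arrays;

// Minimal sketch of the per-partition vector counters described above.
// A replica serves a read on partition i only once its own counter has
// caught up with the client's; clients merge replica vectors on every reply.
final class VersionVector {
    final int[] v;

    VersionVector(int partitions) {
        this.v = new int[partitions];
    }

    // Replica-side check: safe to serve a read on partition i for a client
    // whose vector is 'client' only if v_r[i] >= v_c[i].
    boolean canServeRead(int partition, VersionVector client) {
        return this.v[partition] >= client.v[partition];
    }

    // Replica-side update upon delivering an update for partition i:
    // increment the local entry i, and merge the client's other entries.
    void applyUpdate(int partition, VersionVector client) {
        v[partition]++;
        for (int j = 0; j < v.length; j++) {
            if (j != partition && client.v[j] > v[j]) {
                v[j] = client.v[j];
            }
        }
    }

    // Client-side merge of a replica's vector carried in a reply.
    void mergeReply(VersionVector replica) {
        for (int i = 0; i < v.length; i++) {
            v[i] = Math.max(v[i], replica.v[i]);
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(v);
    }
}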

4.3.5 Failure handling

In this section, we discuss the fault tolerance guarantees of GlobalFS for each of its components.

Client. Client failures during a write or a file delete operation can leave “dangling” data blocks inside the data store. Data blocks that have no metadata pointing to them are unreachable and can be removed from the data store via garbage collection.

Metadata layer. GlobalFS uses state machine replication to build a reliable metadata layer within each partition. Metadata replicas use atomic multicast to totally order client commands; if a replica executes a command, other replicas in the same group will also execute the command. GlobalFS uses Multi-Ring Paxos as its atomic multicast (described in more detail in Section 4.5.2); as long as one replica and a quorum of acceptors are available in each of the groups, the whole file system is available for writing and reading.

The recovery of a metadata replica is handled by installing a replica checkpoint and replaying missing commands [113]. Coordinated multi-partition commands require one extra step: replicas in the involved partitions need to exchange information before deciding whether the command can execute or not. A recovering replica, upon replaying a coordinated multi-partition command, requests this information from replicas in the other partitions. To allow for this, whenever a replica sends information out regarding a coordinated command, it also stores this information locally.

Data store. Each key-value pair stored by the backend is replicated on f + 1 storage nodes. This implies that up to f storage nodes can fail concurrently without affecting data block availability. To tolerate disasters (e.g., the failure of entire datacenters, potentially all datacenters from one region), GlobalFS can replicate data (and its corresponding metadata) in different regions.

Partitioning oracle. Similar to the metadata layer, the partitioning oracle relies on state machine replication within each partition. The recovery of a partitioning oracle is handled using checkpointing and write-ahead logs. Write-ahead logs protect against deadlocks, which may occur if an oracle crashes during a move operation: the oracle locks the path corresponding to a file before attempting to move it (see Section 4.4.3), so a crash in the middle of a move would otherwise leave the path locked indefinitely.

4.4 Workload-aware state partitioning in GlobalFS

In this section, we examine the partitioning oracle, a component that enables GlobalFS to perform state partitioning dynamically. The partitioning oracle has three main functions (sketched as an interface after the list):

1. assigns files to partitions according to a predefined policy,

2. answers client queries regarding the location of files,

3. moves files between partitions in order to adapt to workload patterns.
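A minimal sketch of these three functions as a Java interface follows; the names below are hypothetical and do not correspond to the actual GlobalFS RPC interface (which is exposed via Thrift, see Section 4.5).

import java.util.Set;

// Hypothetical interface reflecting the three oracle functions listed above.
interface PartitioningOracle {
    // (1) Assign a (new) file to partitions according to the configured policy.
    Set<String> assignPartitions(String path);

    // (2) Answer a client query: which partitions hold the file, and which
    //     partitions are involved when updating it (the parent may differ).
    OracleHint lookup(String path);

    // (3) Move a file into this oracle's local partition to follow the workload.
    void moveToLocalPartition(String path);
}

// Hint returned to clients, as described in Section 4.4.1.
final class OracleHint {
    final Set<String> readPartitions;
    final Set<String> updatePartitions;

    OracleHint(Set<String> readPartitions, Set<String> updatePartitions) {
        this.readPartitions = readPartitions;
        this.updatePartitions = updatePartitions;
    }
}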

The challenge in supporting dynamic state partitioning is maintaining the strong consistency guarantees provided by GlobalFS. We next explain how the partitioning oracle works within GlobalFS and how we address the aforementioned challenge.

4.4.1 Overview

Figure 4.3 shows how the partitioning oracle integrates within the overall architecture of GlobalFS. The partitioning oracle consists of multiple components: open files cache, requests monitor, partitioning policy, partitioning logic, and path-to-partitions map. These components interact with each other in order to partition and replicate the file system tree in accordance with client access patterns.

Figure 4.3: Integration of the partitioning oracle within GlobalFS.

In order to access a file, a client needs to know the partitions responsible for storing it. The client contacts the partitioning oracle from the closest region and requests this information. The contacted oracle examines the path of the file and replies to the client with a hint, which includes: (1) the set of partitions responsible for the file, and (2) the set of partitions involved in updating the file. The latter is necessary because updating (or removing) a file also impacts the metadata of its parent directory, which might reside in different partitions. If the file of interest has been moved, the oracle is aware of the new partitions and communicates this information to the client.

The partitioning oracle decides when to move files between partitions based on client access patterns and a policy defined by the system administrator. Client access patterns within a region are inferred by a requests monitor component based on information received from clients regarding the number of reads and writes performed against each accessed file. The partitioning oracle's decision-making component (partitioning logic in Figure 4.3) takes into account the policy defined by the system administrator to determine where files should reside: e.g., if the writes performed on a file from a remote partition exceed a policy-defined threshold, the oracle will initiate a move operation in order to bring that file into its own partition.

Each client maintains an open files cache in order to reduce the latency associated with querying a remote partitioning oracle. The open files cache keeps track of the oracle hints regarding the files that are currently open. Hence, a client only contacts the oracle (i) when opening a file or (ii) when an opened file has been moved. In addition, each client remembers the most recently accessed partition and tries to access a file by contacting this partition first. Since most file system workloads exhibit access locality [114], this optimization saves bandwidth and decreases access latency.

4.4.2 Mapping files to partitions

Figure 4.4 provides insight into how the partitioning oracle assigns files to partitions. We consider the deployment from Section 4.3.3, where GlobalFS spans three regions and is split in four partitions: P0 is replicated in all regions, while each Pi is replicated in one region Ri.

(a) Cumulative view of partitioning oracles.
(b) Perspective of oracle from P1.
(c) Perspective of oracle from P3.

Figure 4.4: Assigning files to partitions in GlobalFS.

Figure 4.4(a) shows how the GlobalFS state (i.e., the file system tree) is divided among partitions. Shaded files are replicated across all regions (i.e., belong to P0) and clear files are replicated within a single partition Pi, 1 ≤ i ≤ 3. Examples of shared files include: (i) the root directory (/), (ii) system directories such as /bin and /etc, which are typically read from all regions but rarely written, and (iii) directory /2/21, which has a child /2/21/b in P1 and another child /2/21/c in P2.

The partitioning oracle from a region assigns files to partitions by examining their paths. The main challenge is to determine whether a file is: (i) new to the file system overall, or (ii) only new to the partition in which the oracle resides. The partitioning oracle addresses this challenge by looking up the partitions corresponding to each of the file's parent directories.

For example, in Figure 4.4(b), the oracle local to P1 assigns file /1/11/a to partition P1, as its parent directory /1/11 is known to belong to P1. On the other hand, the same oracle (local to P1) marks /2/21 as shared, since it does not know its parent directory /2, and only knows that the root directory / is shared. The shared directory /2/21 is also visible to the oracle deployed at P3, as shown in Figure 4.4(c).

For performance reasons, the partitioning oracle in each region only keeps track of a subset of the entire file system tree, focusing on the files assigned to its own partition. If an oracle receives a request that does not map to its own region, the oracle forwards the request to the appropriate region. In order to avoid broadcasting the request and putting strain on the entire system for each such "miss", we use the invariant that any updates to "fringe nodes", i.e., to nodes where the set of partitions changes from that of the parent, are submitted to the inter-region atomic multicast. This means that whenever a fringe node is created or deleted, this information reaches all regions and all oracles. The oracles store these files in their view of the file system. For example, Figure 4.4(c) shows that partition P3 is aware of /2/21, because it represents a fringe node that has children in both partitions P1 and P2.

The way the oracle assigns files to partitions determines the performance of update operations. If a file belongs exclusively to a partition (e.g., file /1/11/a belongs to P1), updates are fast since they rely on the instance of atomic multicast running in partition P1. A file that is shared among partitions leads to slow updates, as it requires using an atomic multicast instance spanning all regions. For example, creating or removing files from directory /2/21 is slow, as it modifies the metadata of a shared directory.

4.4.3 Moving files

The partitioning oracle decides when to move a file into its local partition based on (i) the partitioning policy predefined by the system administrator, and (ii) workload patterns observed by its requests monitor. First, the partitioning policy provides thresholds for when files should be moved. Second, the requests monitor uses file access information from clients in its region to infer workload patterns; file access information includes the number of reads and updates that clients have performed and against which files.

Before opening a file, a client contacts the oracle in order to request the partition it needs to access. At this point, the oracle decides whether to move that file to its local partition or not. If it decides to move the file, the oracle will first perform the move, then reply to the client with the new partition.

The decision of whether a file should be moved is local; the oracle relies exclusively on the partitioning policy and on information from the requests monitor, and does not exchange information with oracles residing at different partitions. Moving a file to the local partition consists of four steps (sketched in code after the list), during which the oracle takes the role of a specialized client from the point of view of GlobalFS and uses the global atomic multicast instance to update metadata replicas across regions:

1. The oracle locks the path corresponding to the file of interest on all partitions; it does so with the help of the global atomic multicast instance that runs across regions. The lock prevents destructive operations on the node or any of its direct descendants.

2. The oracle contacts the closest partition that owns the file in order to retrieve its metadata, as well as the metadata corresponding to its parent directory. Retrieving this information enables metadata replicas in the local region to update their own view of the file system hierarchy to reflect the move. For instance, in Figure 4.5(a), to move file /1/11/b from P1 to P2, the oracle in P2 needs to retrieve the metadata corresponding to /1/11/b and its parent directory /1/11 from partition P1; hence, directory /1/11 becomes shared, as shown in Figure 4.5(b). In addition, the oracle retrieves the data blocks corresponding to /1/11/b from the backend data store of P1.

3. The oracle updates the metadata replicas across all involved regions to reflect the move operation. In our example, metadata replicas from P2 write the retrieved metadata for /1/11 and /1/11/b, while metadata replicas from P1 remove the metadata corresponding to the moved file /1/11/b and mark directory /1/11 as shared. Finally, the oracle stores the data blocks corresponding to the moved file /1/11/b in the data store of P2.

4. The oracle unlocks the path corresponding to the moved file, meaning that the file can now be modified or moved to a new partition if necessary.
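Under the assumptions above, the four steps could be sketched as follows; the interfaces stand in for the global atomic multicast instance and the data stores and are not the actual GlobalFS API.

// Illustrative stand-ins for the global atomic multicast and the data stores.
interface GlobalMulticast {
    void lockPath(String path);
    void unlockPath(String path);
    void applyMove(String path, String sourcePartition, String destPartition, byte[] metadata);
}

interface RemotePartition {
    byte[] fetchMetadata(String path);   // file + parent metadata
    byte[][] fetchBlocks(String path);   // the file's data blocks
}

interface LocalDataStore {
    void putBlocks(String path, byte[][] blocks);
}

final class FileMover {
    private final GlobalMulticast gall;

    FileMover(GlobalMulticast gall) {
        this.gall = gall;
    }

    void move(String path, RemotePartition source, LocalDataStore destStore,
              String sourcePartition, String destPartition) {
        gall.lockPath(path);                              // step 1: lock the path on all partitions
        try {
            byte[] metadata = source.fetchMetadata(path); // step 2: fetch file and parent metadata
            byte[][] blocks = source.fetchBlocks(path);   //         plus the file's data blocks
            destStore.putBlocks(path, blocks);            //         store the blocks locally
            gall.applyMove(path, sourcePartition, destPartition, metadata); // step 3: update replicas
        } finally {
            gall.unlockPath(path);                        // step 4: release the lock
        }
    }
}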

(a) File system tree before move.
(b) File system tree after move.

Figure 4.5: Moving file /1/11/b from P1 to P2.

Once a file has been moved from a source partition (e.g., P1) to a destination partition (e.g., P2), the metadata replicas in the source partition need to remember that the file no longer belongs to their partition. This information is necessary in order to properly reply to clients trying to access the node in the future with stale partitioning information (either due to concurrency or to client-side caching of partitioning information). The oracle needs to be able to distinguish between the case where the file is absent from the node's metadata because it never existed and the case where the file existed on the partition in the past but has since been moved.

For example, a client mistakenly trying to access the moved file /1/11/b by contacting partition P1 should not receive a "File not found" response from the oracle, but rather be redirected to P2. We prevent this situation by having the metadata replicas keep track of the files that they have "lost" due to move operations. This enables any replica contacted by a client regarding file /1/11/b to return an appropriate exception, indicating that the file has been moved to a different partition.

4.5 Implementation

In this section, we discuss the implementation details of the main components of GlobalFS. Figure 4.6 depicts a GlobalFS deployment that spans three regions, each hosting a partitioning oracle, metadata replicas, and data storage servers. The local and global rings represent multicast groups used to replicate metadata within and across regions.


Figure 4.6: GlobalFS deployment spanning three regions.

4.5.1 Client

Clients access files through a file system in user space (FUSE) implementation [115]. FUSE is a loadable kernel module that provides a file system API to user space programs, allowing non-privileged users to create and mount a file system without writing kernel code. According to [116], FUSE is a viable option in terms of performance for implementing distributed file systems.

Clients use ZooKeeper [12] to locate the partitioning oracle in their closest partition, as well as the metadata replicas.

When using FUSE, every system call directed at the file system is translated to one or more callbacks to the client implementation. In GlobalFS, most FUSE callbacks have an equivalent RPC (remote procedure call) available in the metadata servers. A client discovers to which metadata replica or data store it needs to direct a given operation by (1) using its local open files cache, (2) optimistically contacting the most recently accessed partition, or (3) contacting the closest partitioning oracle. When a client has the option of sending a command to more than one destination (metadata replica or data store), it chooses the closest one (with the lowest latency).

In most non-synthetic workloads, the most frequently used FUSE callback is getattr; this is due to the semantics imposed by the POSIX standard [102]. To reduce the execution time of getattr, each client remembers the most recently accessed partition. When accessing a file that is not in the open files cache, the client first optimistically contacts a metadata replica from the most recently accessed partition. If the workload exhibits locality, this approach saves a communication round-trip with the partitioning oracle.
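This three-step lookup could be sketched as follows; the types and method names are illustrative (the real client is a FUSE binding talking to Thrift-based servers).

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the lookup order described above: open files cache first, then
// the most recently accessed partition, then the partitioning oracle.
final class PartitionLookup {
    interface Oracle { String partitionOf(String path); }
    interface MetadataClient { boolean owns(String partition, String path); }

    private final Map<String, String> openFilesCache = new ConcurrentHashMap<>();
    private final Oracle oracle;
    private final MetadataClient metadata;
    private volatile String lastPartition;

    PartitionLookup(Oracle oracle, MetadataClient metadata) {
        this.oracle = oracle;
        this.metadata = metadata;
    }

    String resolve(String path) {
        String cached = openFilesCache.get(path);           // (1) open files cache
        if (cached != null) {
            return cached;
        }
        String recent = lastPartition;                       // (2) most recently accessed partition
        if (recent != null && metadata.owns(recent, path)) {
            return remember(path, recent);
        }
        return remember(path, oracle.partitionOf(path));     // (3) closest partitioning oracle
    }

    private String remember(String path, String partition) {
        openFilesCache.put(path, partition);
        lastPartition = partition;
        return partition;
    }
}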

4.5.2 Atomic multicast

We use URingPaxos [117], a unicast implementation of Multi-Ring Paxos [28], which implements atomic multicast by composing multiple instances of Paxos to provide scalable performance. Each multicast group is mapped to one Paxos instance. A message is multicast to one group only. Processes that subscribe to multiple groups use a deterministic merge procedure to define the delivery order of the messages such that processes deliver common messages in the same relative order. Processes in each Paxos instance are disposed in a logical directed ring in order to achieve high throughput.

In our setup, we keep a global ring that includes all metadata replicas in the system, as illustrated in Figure 4.6. This ring implements the g_all group discussed in Section 4.3.4. Each other group is implemented by a ring that includes only the replicas in the same region.
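As an illustration of how commands map to rings, a helper along the following lines could pick the destination group; the group naming (g_all, g_P1, ...) is an assumption made here for readability and simply mirrors the rules of Section 4.3.4.

import java.util.Set;

// Sketch: a command touching a single local partition is multicast to that
// partition's ring; anything involving the global partition or several
// partitions goes to the global ring (g_all).
final class RingSelector {
    static final String GLOBAL_RING = "g_all";

    static String ringFor(Set<String> involvedPartitions) {
        if (involvedPartitions.size() == 1) {
            String only = involvedPartitions.iterator().next();
            return only.equals("P0") ? GLOBAL_RING : "g_" + only; // e.g., g_P1
        }
        return GLOBAL_RING;
    }

    public static void main(String[] args) {
        System.out.println(ringFor(Set.of("P1")));        // g_P1
        System.out.println(ringFor(Set.of("P1", "P2")));  // g_all
    }
}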

4.5.3 Metadata replicas

Metadata in GlobalFS is kept by replicated servers, using state machine replication [107]. Replicas can be part of multiple multicast groups; in our prototype, each replica is a Multi-Ring Paxos learner. When a replica delivers a command, the replica checks whether it should execute the command by using the partitioning information, which piggybacks on the command. The file system metadata is kept in memory by the replica and the sequence of commands is stored by Multi-Ring Paxos acceptors. Replicas can be configured to keep their state in memory or on disk, with asynchronous or synchronous disk writes.

The file system is represented as a tree of nodes. There are three node types: directory, file, and symbolic link. A directory node stores the directory properties (e.g., owner, permissions, times) and a hash table of its children nodes, indexed by name. A file node keeps the file properties and a list of blocks representing its contents. Symbolic link nodes only need to store the node properties and the target path of the link.
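A minimal sketch of the three node types follows; the field choices are illustrative and do not reflect the actual classes.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the metadata tree node types described above.
abstract class Node {
    String owner;
    int permissions;
    long modificationTime;    // common attributes kept by every node
}

final class DirectoryNode extends Node {
    // Children are indexed by name, as described above.
    final Map<String, Node> children = new HashMap<>();
}

final class FileNode extends Node {
    // Ordered list of block identifiers making up the file contents.
    final List<String> blocks = new ArrayList<>();
}

final class SymlinkNode extends Node {
    String targetPath;        // only the link target is needed besides attributes
}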

The metadata replicas are implemented in Java and expose a remote interface to the clients via Thrift [118].

4.5.4 Data store

GlobalFS is designed to support any back-end data store that exposes a typical key-value store API and provides linearizability. We have implemented several versions of the data store.

Splay-based data store. We implemented the data store using Splay [119], a framework that facilitates the design, deployment and testing of large-scale distributed applications in Lua 5.1. In Splay, applications are run by a set of daemons distributed on one or several testbeds; daemons execute in a sandboxed environment that shields the host system. A controller manages applications, offering multi-criterion resource selection, deployment control, and churn management by reproducing the system's dynamics from traces or synthetic descriptions.

The data store is organized as a ring-based distributed hash table (DHT) and uses consistent hashing for data placement. We implemented two DHT versions, which differ in terms of the distributed lookup protocol used to locate the node that stores a particular data item: (i) Chord [120]-based DHT – each storage server maintains a partial membership of other servers on the ring, leading to multi-hop lookups, and (ii) one-hop DHT – each storage server maintains a full membership of other servers on the ring, allowing one-hop lookups. The one-hop design, similar to Cassandra [8] or Dynamo [7], provides good horizontal scalability and stable performance. Depending on the application requirements and fault model, data may be stored persistently on disk or maintained in memory.

Each block is assigned to the first server whose logical identifier follows the block identifier on the ring. A block is replicated as r copies, by copying it onto the r − 1 successors (i.e., the servers that immediately follow the first server on the ring). This ensures data availability with up to r − 1 simultaneous failures. Servers periodically check for the availability of copies of their blocks on their successors and create additional copies when necessary. Similarly, servers periodically check for their predecessor's availability and take over the responsibility for their ranges upon failure, also creating additional copies. We note that the blocks stored in the DHT are only written once. Clients contact the DHT via any of its proxy servers. The proxy creates the r copies of the block, using the slower link from the client to send the block only once.
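The placement rule (first successor on the ring, plus the next r − 1 servers) can be sketched as follows. The code is written in Java for consistency with the other examples, whereas the actual data store is implemented in Lua on top of Splay; the class and method names are illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of consistent-hashing placement: a block is assigned to the first
// server whose identifier follows the block identifier on the ring, and is
// then copied onto the next r-1 successors.
final class Ring {
    private final TreeMap<Long, String> servers = new TreeMap<>();

    void addServer(long id, String name) {
        servers.put(id, name);
    }

    List<String> replicasFor(long blockId, int r) {
        List<String> result = new ArrayList<>();
        if (servers.isEmpty()) {
            return result;
        }
        Long key = servers.ceilingKey(blockId);        // first successor of the block id
        if (key == null) {
            key = servers.firstKey();                  // wrap around the ring
        }
        while (result.size() < Math.min(r, servers.size())) {
            result.add(servers.get(key));
            key = servers.higherKey(key);
            if (key == null) {
                key = servers.firstKey();              // keep walking the ring
            }
        }
        return result;
    }
}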

We added several performance optimizations to our Splay-based DHT:

• Separate data and control. We use separate channels for data (i.e., file content) and control messages (i.e., command, key, data size). Control messages are sent via Splay RPCs, while data is sent via raw TCP sockets.

• In-kernel data transfers. Our data store uses in-kernel data transfers via the splice system call, which we implemented as C modules. We support multiple types of in-kernel data transfers: (1) file-to-socket transfers, when the proxy gets a data block from the client; (2) socket-to-socket transfers, when the proxy passes a data block on to a storage node; and (3) socket-to-file transfers, when a storage node writes the data block in a persistent manner. In-kernel data transfers are problematic, as Lua 5.1 does not support yielding the thread from C and thus transfers cannot be interleaved.

• Reusing TCP connections. Having a pool of reusable TCP connections among storage nodes avoids the cost of establishing a TCP connection for each transfer.

LevelDB-based data store. A second version of our data store, described in [99], is implemented in Go and uses LevelDB [121] as its storage backend. Since the Go runtime environment, unlike Lua 5.1's, supports true multithreading, we used this version in our experimental evaluation.

4.5.5 Partitioning oracle

Each region hosts a partitioning oracle that keeps track of (i) files assigned to that region, (ii) files shared with other regions, and (iii) files "lost" due to being moved to a different region. The partitioning oracle exposes an API that allows the client to request the partitions corresponding to a file path and to periodically send information on its access patterns. The communication between client and oracle is reliable, via buffered TCP connections. Since the client is multi-threaded, it maintains a pool of connections to its closest partitioning oracle. In order to contact the oracle, the client requests a connection from the connection pool; after receiving a reply from the oracle, the client releases the connection back into the pool.

4.6 Evaluation

In the initial version of this work (Pacheco et al. [99]), we evaluated GlobalFS using a static partitioning function (hardcoded in the clients and metadata replicas). Static partitioning requires knowing access patterns in advance; otherwise, the function cannot provide a good partitioning of the file system tree, leading to high access latency. With an appropriate partitioning function, GlobalFS delivers good performance and scales well for all local operations.

In this thesis, we conduct an experimental evaluation with real-world applications. First, we show how GlobalFS compares against three widely used distributed file systems: NFS [65], GlusterFS [51], and CephFS [39]. Next, we evaluate the impact of adding support for workload-aware partitioning on the performance of GlobalFS.

4.6.1 Setup

We evaluate GlobalFS using Amazon's EC2 platform. We deploy virtual machines (VMs) in multiple EC2 regions. For each region, we distribute servers and clients in three separate availability zones to tolerate datacenter failures. We used r3.large and c3.large instance types, both with 2 virtual CPUs and 32 GB of SSD storage, and with 15.25 and 3.75 GiB of memory, respectively [122]. We use r3.large instances for servers and c3.large instances for clients.

When not using workload-aware partitioning, GlobalFS uses a hardcoded partitioning function which assumes that access patterns are known in advance and do not change. Clients use the partitioning function to compute to which metadata replica or data store to direct a given operation. In turn, when a replica delivers a command, it checks whether it should execute the command by using the partitioning function.

We configure the atomic multicast layer based on Multi-Ring Paxos to use in-memory storage. In this setup, fault tolerance relies on having a majority of replicas that are always alive (i.e., never crash). We use the data store implementation described in [99], as it has been shown to sustain enough throughput not to constitute a bottleneck in our GlobalFS evaluation.

4.6.2 Comparison against de-facto industry implementations

We consider a GlobalFS deployment that spans nine Amazon EC2 regions and evaluate its performance using real-world workloads executed on global and local partitions. We compare the results against three widely used distributed file systems: NFS (v4.1), GlusterFS (v3.7), and CephFS (v0.94). Our objective is to show that, while providing stronger guarantees, GlobalFS compares favorably to de-facto industry implementations.

We configure NFS with a single shared directory mounted remotely by the same clients. The NFS server runs in the us-west-2 region. Since we disable all caching features in GlobalFS, the NFS clients correspondingly mount the remote directory using the lookupcache=none,noac,sync options.

We use FUSE-based bindings for GlobalFS, GlusterFS, and CephFS. We chose the compilation of two well-known open-source projects as workload: the bc numeric processing language (v1.06) and the Apache httpd web server (v2.4.12). These two projects differ in the size of their compressed archives (278 kB and 6 MB), the number of shipped files (94 and 2,452), and the lines of ANSI C code to compile (8,510 and 157,575). They expose different workloads to the underlying file system and are often used as benchmarks [123].

Table 4.3 embeds the operations breakdown of the system calls issued by the different commands (decompress, configure, and compile) used for these experiments. We evaluate GlobalFS either within a global or a local partition and compute the average over 3 distinct executions. All file systems are mounted by 9 clients spread equally across 3 regions, but the workload is executed on a single client. We use equivalent settings for GlusterFS and CephFS. (Note that GlusterFS experiments over the global partition are executed only once due to AWS budget constraints.) For NFS, all clients mount a shared directory, and a client co-located with the service executes the commands. For GlusterFS we evaluate two different deployments, local (one region) and global (three regions). Each deployment consists of a distributed/replicated volume on top of regular storage bricks, one in each of the availability zones of the given EC2 regions. We deployed CephFS only in a single region (3 storage daemons, 1 metadata server, and 3 clients) because a deployment across regions would require forfeiting strong consistency [42]. We set the replication factor of GlusterFS and CephFS to 3.

Command (workload)                  NFS         GlobalFS           GlusterFS          CephFS
                                                global    local    global*   local    local
tar xzvf bc-1.06.tgz (bc)           1.94 s      47.09×    1.36×    149.05×   1.63×    0.17×
configure (bc)                      5.32 s      44.66×    2.02×    45.67×    0.96×    0.56×
make -j 10 (bc)                     5.9 s       29.90×    2.38×    49.34×    1.17×    0.63×
make (bc)                           13.14 s     20.73×    1.16×    55.20×    0.92×    0.30×
gzip -d httpd-2.4.12.tgz (httpd)    3.87 s      117.12×   2.47×    284.75×   0.37×    0.11×
tar xvf httpd-2.4.12.tar (httpd)    60.01 s     41.46×    1.08×    99.17×    0.12×    0.14×
configure --prefix=/tmp (httpd)     29.32 s     49.35×    2.04×    56.53×    1.34×    0.33×
make -j 10 (httpd)                  714.37 s    2.74×     0.52×    139.68×   0.87×    0.48×
make (httpd)                        3432.72 s   1.82×     0.36×    83.72×    0.50×    0.64×

[Per-command operations breakdown (access, open, read, write, fstat, lstat, lseek, close) shown as bar charts in the original table.]

Table 4.3: Execution times for several real-world benchmarks on GlobalFS, with operations executed over global and local partitions. Execution times are given in seconds for NFS, and as relative times w.r.t. NFS for GlobalFS, GlusterFS, and CephFS. *Note that GlusterFS does not support deployments with both global and local partitions; thus, we report results from two separate deployments.

We observe that GlobalFS performs consistently better than GlusterFS when operating across regions. GlobalFS performs competitively against the other file systems across the whole suite of benchmarks. Indeed, GlobalFS is up to 50.9x faster than GlusterFS in compiling Apache httpd over the global partition. Note that for the same benchmark on a local partition, GlobalFS is actually faster than NFS. When evaluating GlusterFS and CephFS we use their default, out-of-the-box configuration. Both are heavily optimized systems and some optimizations are on by default (e.g., clients in CephFS use write-back caching, which improves write performance by batching small writes). As expected, the performance penalty for accessing the global partition is higher for write-dominated workloads (extracting an archive, configuring the software package). For read-dominated or compute-intensive (make) operations, this overhead decreases because read operations can be completed locally. For comparison purposes, we also tested HDFS (v2.6) with FUSE bindings on a local partition with some of the benchmarks and observed performance of the same order as GlobalFS and GlusterFS (e.g., 2.12x slower for the first command, compared to 1.36x and 1.63x, respectively).

Our real-world benchmarks demonstrate that GlobalFS performs on par with widely adopted distributed file systems while ensuring a stronger consistency model, supporting replication, and allowing users to benefit from locality thanks to its partitioning model.

4.6.3 Benefit of workload-aware partitioning

In this experiment, we deploy GlobalFS across three Amazon EC2 regions, each corresponding to a partition (us-west-2, us-east-1, and eu-west-1). Our workload consists of recursively searching through all Apache httpd source files using the grep utility. We mount the file system on clients in all three partitions, then copy the source code into the us-west-2 partition. After this, all files are local to that partition. Next, we run grep from a different partition and measure the total execution time with and without our workload-aware partitioning algorithm. We configure our oracle to use an "eager" partitioning policy: when it learns that a client reads a file residing at a remote partition, the oracle immediately attempts to move the file to its local partition. Our goal is to show that adding support for workload-aware partitioning in GlobalFS has positive effects on performance.

Table 4.4 shows the time to execute the search operation from different regions, with workload-aware state partitioning enabled. First, we use a client from us-west-2; the search operation is local and, thus, fast (19 s). Second, we use a client from us-east-1; this results in moving all files to the partition corresponding to us-east-1 and an execution time roughly 200x larger than in the previous case. If we repeat the second case, we notice that the execution time is much smaller (211 s versus 3,853 s), since the files have already been moved to us-east-1. However, the execution time is still roughly 10x larger than in the first case because our partitioning oracle does not also move directories.

Client location    Data location             Time         Move performed
us-west-2          us-west-2                 19 s         No
us-east-1          us-west-2 → us-east-1     1h 4m 13s    Yes
us-east-1          us-east-1                 3m 31s       No

Table 4.4: Execution of the grep workload with workload-aware state partitioning enabled.

We next disable workload-aware state partitioning and repeat the experiment. Table 4.5 shows that performing the search from the remote region (us-east-1) without moving files results in around 50% lower latency than the initial remote search with moving enabled. However, every subsequent operation from the remote region (us-east-1) is equally slow. This suggests that workload-aware state partitioning can be beneficial for the performance of GlobalFS for workloads that exhibit locality of access. Files that are frequently accessed from multiple partitions are better shared.

Client location    Data location    Time      Move performed
us-west-2          us-west-2        19 s      No
us-east-1          us-west-2        29m 4s    No

Table 4.5: Execution of the grep workload with workload-aware state partitioning disabled.

Client location    Data location             Duration      Move performed
us-west-2          us-west-2 → us-east-1     7h 13m 20s    Yes
us-east-1          us-east-1                 2h 7m 9s      No

Table 4.6: Building the Apache web server with workload-aware state partitioning enabled.

Client location    Data location    Duration      Move performed
us-west-2          us-west-2        22m 15s       No
us-east-1          us-west-2        7h 48m 29s    No

Table 4.7: Building the Apache web server with workload-aware state partitioning disabled.

4.7 Summary

This chapter focused on workload-aware state partitioning in the context of GlobalFS, a geographically distributed file system that accommodates locality of access, scalable performance, and resiliency to failures without sacrificing strong consistency. GlobalFS builds on two abstractions: single-site linearizable data stores and atomic multicast; this modular design was crucial to handle the complexity of the development, testing, and assessment of GlobalFS. Our evaluation reveals that (1) GlobalFS outperforms other geographically distributed file systems that offer comparable guarantees, and (2) workload-aware state partitioning can be beneficial for the performance of GlobalFS for workloads that exhibit locality of access.


5 Cache Replacement Policy for Erasure-Coded Data

5.1 Introduction

This chapter focuses on large-scale storage systems that ensure high availability using erasure coding instead of replication. Whereas replication implies storing full copies of data in several data centers, erasure coding splits an object into k data blocks (or "chunks"), computes m redundant chunks using an encoding scheme, and stores these chunks at different physical locations. A client needs to retrieve any k of the k + m chunks in order to reconstruct the object. Erasure coding represents a middle ground between storing full copies of data and having no redundancy in the system. It requires less storage space, but also incurs higher access latency, as some of the k necessary chunks may need to be retrieved from remote regions. This is a typical use case for caching.
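As a quick illustration with the parameters used later in this chapter (k = 9, m = 3), any 9 of the 12 chunks suffice to rebuild an object, so the storage overhead is (k + m)/k ≈ 1.33, compared to 3 for three full replicas. The comparison below is a simple arithmetic sketch, not part of the evaluated system.

// Quick arithmetic check of the erasure-coding trade-off described above.
final class ErasureOverhead {
    public static void main(String[] args) {
        int k = 9, m = 3;                       // data and redundant chunks
        double overhead = (double) (k + m) / k; // storage blow-up factor
        System.out.println("chunks needed to read: " + k + " of " + (k + m));
        System.out.println("erasure-coding storage overhead: " + overhead); // 1.33...
        System.out.println("3-way replication overhead: " + 3.0);           // for comparison
    }
}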

The application of caching to erasure-coded data is not straightforward, as having objects split into blocks offers many caching configurations that traditional caching policies such as Least Recently Used (LRU) or Least Frequently Used (LFU) are unable to capture or exploit. More precisely, a caching mechanism for erasure-coded data should be able to decide:

1. Which objects to cache?

2. How many blocks to cache for each of the objects?

We propose Agar, a caching mechanism developed specifically for erasure-coded data, and investigate how it can help minimize read latency in storage systems that span many geographical regions and use erasure coding to increase data availability. The name Agar is a reference to an online game¹ in which each player controls a cell that aims to gain as much mass as possible at the expense of other cells. Similar to the game, in our system, objects deemed valuable are able to "expand" in terms of the number of data chunks cached, causing less valuable objects to "shrink" or even disappear from the cache. Agar explores the trade-off between the storage cost of locally caching chunks of an object and the expected latency improvement. We adapt a dynamic programming approach designed for the "Knapsack" problem and use it to optimize the cache configuration.

¹http://agar.io/

The remainder of this chapter begins by discussing the rationale for designing a dynamic caching system dedicated to erasure-coded data. Next, we describe the design of the Agar caching system and detail its cache configuration algorithm. We compare Agar against alternative caching systems using the classic LRU and LFU cache replacement policies.

The material in this chapter is adapted from previously published work [124].

5.2 A case for caches tailored to erasure-coded data

We use a practical example to highlight the interplay between erasure coding and caching policies. We notice that caching erasure-coded data is akin to the Knapsack problem and explain why a solution more complex than a greedy algorithm is needed.

5.2.1 Interplay between erasure coding and caching

Figure 5.1 shows an erasure-coded object store deployed on top of Amazon Web Services (AWS). Each region hosts an S3 bucket as persistent backend and a memcached server running on a large EC2 instance (2 vCPUs, 8 GiB RAM) as caching layer. We populate the backend with 300 × 1 MB objects, encoded using a Reed-Solomon scheme with k = 9 data chunks and m = 3 redundant chunks (the total storage size, including redundancy, is thus 400 MB). For each object, the resulting twelve chunks are distributed among regions in a round-robin manner, with each S3 bucket storing two. The cache in each region is large enough (500 MB) to accommodate our complete working set, in practice emulating an infinite cache. This means that all requests for a given object are cache hits, except for the first one.

Figure 5.1: An erasure-coded storage system spanning six AWS regions.

[Plot: average read latency (ms) versus the number of data chunks cached, for the Frankfurt and Sydney clients.]

Figure 5.2: Average read latency when caching a variable number of chunks. The relationship between the number of chunks cached and the latency improvement obtained is non-linear.

Our experiment involves two clients: one in Frankfurt and one in Sydney. We use a customized version of the YCSB client [125], modified to support erasure coding [126]. Each client performs 1,000 read operations on a pool of 300 × 1 MB objects. The read operations are generated from a Zipfian distribution with a skew exponent of 1.1. We measure the average latency experienced by both clients in 6 scenarios, in which we vary the number of data chunks c retained in local cache instances for each object, from c = 0 to c = k = 9. Our goal is to better understand how the partial replication allowed by erasure coding impacts access latency.

The first scenario (c = 0) is an extreme baseline case that does not use the caching layer: clients read data chunks directly from the backend and decode them. The remaining scenarios store c chunks of each retrieved object in the memcached instance of the region the request originates from, for c ∈ {1, 3, 5, 7, 9}. The most distant chunks are cached first for each object, in effect progressively decreasing in each experiment the number of AWS regions clients must access in order to reconstruct an object.

Figure 5.2 shows that the gains in latency are not proportional to the number of chunks retained in the cache:

• If only a few chunks are cached (e.g., up to 3 for Frankfurt), benefits remain minimal. This is because in this case, the overall latency is dominated by the last and slowest chunk that the client still needs to retrieve from a remote location.

• Conversely, once a critical mass of chunks has been cached (e.g. 7 chunks in the case of both clients), caching more chunks brings only minimal returns. This is because the latency incurred by the slowest block(s) is masked by the delay required to retrieve the closest block(s).

Where these turning points lie further depends on the position of the client, and on the latency and bandwidth experienced between regions: Sydney can already greatly benefit from 3 locally cached chunks, while this level of caching makes very little difference to Frankfurt.

Our findings carry important lessons on caching applied to erasure-coded storage, thus opening the way for advanced caching policies, such as the one proposed in this chapter (section 5.4). To sum up:

• The improvement in latency is not a linear function of the number of cached data blocks; caching more blocks is not necessarily going to make the system faster.

• Finding the optimal number of blocks to cache for each object is difficult to achieve beforehand, as it depends on many external factors (requests distribution, network state, object popularity) that change over time.

5.2.2 The Knapsack Problem

The problem of caching erasure-coded data can be interpreted as a variant of the century-old Knapsack problem [127], which seeks to maximize the value of a set of elements packed into a container of bounded capacity. In our scenario, the container is a local cache, elements are blocks, and the value of individual blocks corresponds to the overall latency improvement that local clients will perceive over the entirety of their requests if this block is cached, i.e., how much faster it will be for clients to retrieve the data item from the cache instead of the backend. The weight of a block is the space it occupies in the cache.

In the absence of erasure coding, choosing which data items to cache in a storage system is an instance of the 0/1 Knapsack Problem, where each element is unique (i.e., you cannot put more than one of each element in the knapsack). Greedy algorithms do not generally work well for 0/1 Knapsack and can err by as much as 50% from the optimal value [128].

Nonetheless, by splitting data items into chunks, erasure coding greatly increases the complexity of the corresponding Knapsack problem. The problem might at first appear related to Fractional Knapsack [129], where it is possible to put a fraction of an element into the knapsack (and for which greedy algorithms yield optimal solutions). However, (i) the non-linear dependency between fractions of value (latency improvement) and weight (cached blocks), as shown in §5.2.1, and (ii) the finite choice of fractions (i.e., options dictated by the choice of erasure coding parameters) make the problem closer to 0/1 Knapsack than to Fractional Knapsack, and introduce the need for a tailored algorithm to select which blocks to cache. In the following, we therefore propose to adapt a dynamic programming solution used for 0/1 Knapsack to the problem of caching erasure-coded blocks.

5.3 Design

This section introduces Agar, a caching system we specifically developed for erasure-coded data. We designed Agar around the use case described in §5.2.1: an erasure-coded object store deployed across several regions. Agar maintains independent caches in each region, along with components that implement a dynamic caching algorithm.

Unlike a caching eviction policy that decides which object to remove from the cache, Agar estimates the popularity of individual objects, as well as potential latency gains in order to pre-compute a static cache configuration that will be used during a fixed period; this period is a system parameter and depends on how rapidly access patterns are expected to change. Agar’s design exploits three core assumptions:

• Access patterns vary across regions, so caches from different regions require different configurations, and do not require coordination.

• Access patterns vary over time, so we need to periodically recompute the configuration of each individual cache.

• Individual objects do not have to be read entirely from the backend or entirely from the cache. Thus, Agar supports partial caching, and can benefit from partial cache hits.

The goal of Agar is to find a good trade-off between the number of chunks to cache and the overall latency improvement for each object. Latency improvement depends on the position of the client relative to the servers that store the content of interest and the access trend in the nearest region. Similar to the LFU cache eviction policy, Agar requires statistics regarding object popularity and an estimation of the latency cost incurred when reading data chunks from individual regions.

Figure 5.3: Design of Agar. We show how Agar integrates with a typical erasure-coded storage system and zoom in on the components of an Agar region-level deployment.

61 Figure 5.3 provides an overview of the components of Agar and shows how they fit within a typical erasure-coded storage system. We envision that Agar has nodes deployed within most of the regions that the storage system spans (represented as orange cubes in Figure 5.3). In our current design, Agar nodes from different regions do not collaborate with each other. On these nodes, Agar sits as a layer between clients and the region’s backend instance, and contains four key components (zoomed-in frame in Figure 5.3):

1. Region Manager maintains a high-level overview of the storage system’s topology, i.e., the regions that it spans and the policy it uses to distribute data chunks among regions. (We assume a round-robin distribution policy.) The region manager periodically measures how much it takes to read a data chunk from each region and uses this information to estimate the latency improvement that clients would get if blocks from that region were cached locally, thus removing the need to access that region.

2. Request Monitor listens for client requests and computes statistics regarding the pop- ularity of each object over a predefined time interval (currently fixed). Agar uses an exponentially weighted moving average to keep track of popularity over time (more details in §5.4). Before a client reads an object, it contacts the request monitor asking for hints regarding where to get the data chunks necessary to reconstruct that object. The request monitor forwards such requests to the cache manager and updates the statistics on object popularity. In our current design, the request monitor is involved in each operation that clients perform. This did not increase latency in our experiments because Agar resides geo- graphically close to the clients. For large deployments, we believe that techniques like TinyLFU’s [78] approximate access statistics can avoid the request monitor becoming a bottleneck, while maintaining similar effectiveness.

3. Cache Manager periodically computes the ideal cache configuration (i.e., what objects to cache and how many chunks for each) based on object popularity statistics from the requests monitor and information about the system’s backend deployment from the region manager. It provides hints regarding what data chunks should reside in the cache to the requests monitor, which forwards them to the client. The cache manager runs our dynamic algorithm to compute the ideal cache configuration (described in §5.4).

4. Cache provides volatile bounded storage where clients store chunks of erasure-coded data, according to the hints received from the requests monitor.

5.4 Algorithm

This section describes the algorithm that Agar’s cache manager runs periodically in order to propose a static cache configuration. In more details, the cache manager chooses which data objects to cache and how many chunks to store for each of them.

62 As explained in §5.2.2, we map the problem of choosing which data chunks to cache to a Knapsack optimization problem. We compute the weight of a cached item as the amount of space that it would occupy in the cache (i.e., the number of its chunks that are cached), while the value is the overall latency improvement that caching the respective blocks would bring to the system.

Our caching algorithm works in two steps:

1. it first generates caching options, and then

2. it chooses a subset of caching options to define the cache contents.

5.4.1 Generating caching options

A caching option is a hypothetical configuration that captures the implications of caching a specific set of chunks for an object. Each caching option contains:

• a key identifying the object it corresponds to;

• a set of data chunks to cache for that object;

• a weight, given by the number of data chunks to cache;

• a value, computed as the overall latency improvement that caching this set of blocks would bring to the system.

For each object known to the request monitor, we iteratively generate caching options. For the sake of simplicity, we have not included the pseudo-code for this step. Each caching option includes a weight, which varies between 1 and k data chunks; recall that a client needs exactly k chunks to reconstruct an object, so it does not make sense to cache more. The algorithm needs to choose data chunks to put in each of the configurations. The cache stores the blocks that would be retrieved in the common case by the client, i.e., a client would not attempt to retrieve the furthest m blocks unless there are failures. Thus, the algorithm first discards the m blocks that are furthest away from the cache in terms of latency, because in the common case (without failures) those would not need to be accessed by clients. Caching items implies downloading them a priori; therefore, we optimize the latency penalty incurred by a cache miss by not caching the furthest blocks. The configurations are then filled with data blocks, from the most distant remaining data sites, until reaching the associated weight.
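Under these assumptions (blocks sorted by the latency of their region, the m furthest blocks discarded, remaining blocks cached most-distant-first), the block selection for a caching option of weight w could be sketched as follows; the types and names are illustrative and not the Agar implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the block-selection rule described above: drop the m blocks that
// are furthest from the local cache (they are not fetched in the failure-free
// case), then fill an option of weight w with the most distant remaining blocks.
final class BlockSelection {
    record Block(String region, long latencyMs) {}

    static List<Block> blocksToCache(List<Block> allBlocks, int m, int w) {
        List<Block> sorted = new ArrayList<>(allBlocks);
        sorted.sort(Comparator.comparingLong(Block::latencyMs).reversed());
        List<Block> needed = sorted.subList(m, sorted.size());  // the k blocks actually read
        return List.copyOf(needed.subList(0, Math.min(w, needed.size())));
    }
}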

After choosing the set of blocks in each caching option, the algorithm needs to compute the associated values. To that end, we compute an estimation of the overall latency improvement that adopting a certain caching option would bring. This is computed as popularity × latency_improvement.

We compute the popularity of individual objects using an exponentially weighted moving average:

popularity_key^i = α · freq_key^i + (1 − α) · popularity_key^(i−1),

where key identifies the object to which the caching option corresponds, i indicates the time period when this computation is done, freq_key^i is the access frequency for that object in the current time period, and α is the weighting coefficient (0.8 in our experiments).
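For instance, a popularity counter with α = 0.8 behaves as follows (illustrative sketch; class and method names are assumptions):

// Exponentially weighted moving average used for object popularity,
// following the formula above (alpha = 0.8 in the experiments).
final class Popularity {
    private final double alpha;
    private double value;   // popularity carried over from previous periods

    Popularity(double alpha) {
        this.alpha = alpha;
    }

    double update(double accessFrequencyThisPeriod) {
        value = alpha * accessFrequencyThisPeriod + (1 - alpha) * value;
        return value;
    }

    public static void main(String[] args) {
        Popularity p = new Popularity(0.8);
        System.out.println(p.update(100)); // first period: 0.8*100 + 0.2*0 = 80.0
    }
}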

To compute the latency improvement, the algorithm needs to know the latency to each backend region. The region manager computes this by retrieving several data blocks from each region in a warm-up phase. Using this data, the algorithm computes the latency improvement as the difference between the latencies to the most distant region that is contacted when the set of blocks in a caching option are cached versus when they are not. This assumes that the client requests blocks in parallel and that the requests do not interfere with each other.

Table 5.1: Read latency from the point of view of Frankfurt.

Frankfurt    Dublin    N. Virginia    Sao Paulo    Tokyo       Sydney
80 ms        200 ms    600 ms         1,400 ms     3,400 ms    4,600 ms

Example. We consider the deployment in Figure 5.1 and assume our algorithm is running on an Agar node hosted in Frankfurt. The algorithm first estimates the latencies of getting blocks from each backend region, shown in Table 5.1. There are five different caching options possible for an object identified by key1, storing 1, 3, 5, 7, and 9 blocks, respectively. For the option with weight 1, the algorithm chooses to cache the block from Tokyo. Since the client only needs k blocks to reconstruct the object, our algorithm discards the m = 3 blocks that are furthest away: two from Sydney and one from Tokyo. To compute the value corresponding to this caching option, it first takes the popularity of key1. Suppose for simplicity that this is the first iteration of the algorithm, so the previous popularity is 0, and the current frequency of key1 is 100. Thus, the popularity is 80 (0.8 × 100 + 0.2 × 0). The estimated latency improvement is 2,000 ms, computed as the latency difference between the furthest region contacted when the block is not cached (Tokyo) and the furthest one contacted when the block is cached (Sao Paulo). Then, the value is 80 × 2,000 = 160,000. Similarly, for the second option the algorithm chooses to cache the blocks from Sao Paulo and Tokyo, and the value is 80 × (1,400 − 600) = 64,000.

5.4.2 Choosing the cache contents

Our algorithm uses the generated caching options to compute a cache configuration. Algorithm 4 shows the pseudocode for choosing the contents of the cache. The algorithm uses a dynamic programming approach: it computes intermediate configurations for subsets of objects and then iteratively improves these intermediate solutions as it considers new objects.

Algorithm 4: Algorithm that computes a static configuration for the cache, based on the weight and value of each caching option.

 1: function POPULATE(Keys, AllOptions, CacheSize)
 2:     ▷ AllOptions — set of caching options for all keys
 3:     ▷ Keys — set of keys sorted in decreasing value order
 4:     ▷ CacheSize — available cache size
 5:     ▷ MaxV — associative array [Size → Config]
 6:     MaxV[0] ← EMPTYCONFIG()                  ▷ Initial state
 7:     ▷ Iterate through keys in decreasing value order
 8:     for Option ∈ ORDERBY(AllOptions, Keys) do
 9:         ▷ Improve config but keep the same weight
10:         for Config ∈ MaxV do
11:             RELAX(Config, Option, AllOptions)
12:         end for
13:         ▷ Improve config by adding option at the end
14:         for Config ∈ MaxV do
15:             Let W = Config.Weight + Option.Weight
16:             Let V = Config.Value + Option.Value
17:             Let C = MaxV[W]                  ▷ Add new if missing
18:             if C.Value < V then
19:                 ADDTOCONFIG(C, Option)
20:             end if
21:         end for
22:     end for
23:
24:     return MaxV[CacheSize]
25: end function

In Algorithm 4, MaxV is an associative array storing intermediate cache configurations. In particular, MaxV[size] holds the best configuration discovered so far for a cache of size size. New configurations are implicitly created upon first access to a key, i.e., if no configuration has yet been stored under MaxV[size], an empty configuration is added on-the-fly on line 17. There are two methods through which intermediate configurations are improved:

1. Relaxation (Algorithm 4, lines 10–12). The concept of relaxation is similar to the one in graph theory (e.g., Dijkstra's algorithm). RELAX checks whether a new caching option Option can replace an existing one in an intermediate configuration, yielding a better overall value. The replacement can be total: an object already in the configuration is completely evicted in favor of the new option; or partial: the old option is only partially evicted, keeping fewer blocks in the configuration. Algorithm 5 shows the pseudocode for the relaxation method.

2. Addition (Algorithm 4, lines 14–21). The ADDTOCONFIG method (Algorithm 4, line 19) adds a new option at the end of an existing configuration, increasing its weight. If there is already another configuration with this new weight, it is replaced only if it has a lower value.

Algorithm 5 Relaxation function that improves a configuration's value without increasing its total weight.
 1: function RELAX(Config, Option, AllOptions)
 2:     BestConfig ← Config
 3:     for OldOption ∈ Config.Options do
 4:         ▷ Replace OldOption with alternative option O for the same key, but with a lower weight W, making room for Option
 5:         Let W = OldOption.Weight − Option.Weight
 6:         Let O = SEARCHOPTION(AllOptions, W, OldOption.Key)
 7:         Let V = Config.Value − OldOption.Value + O.Value + Option.Value
 8:         if BestConfig.Value < V then
 9:             BestConfig ← REPLACEANDADD(Config, OldOption, Option)
10:         end if
11:     end for
12:     Config ← BestConfig
13: end function
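To make the dynamic-programming structure concrete, the sketch below (our own Java rendition, not the prototype's code) implements the addition step of Algorithm 4 over a weight-indexed table of intermediate configurations. It deliberately omits the RELAX step and, instead of returning MaxV[CacheSize] as on line 24, returns the best configuration whose total weight fits in the cache, since without relaxation that exact slot may be empty; it is therefore a simplified approximation of the full algorithm.

    import java.util.*;

    // Simplified, illustrative version of Algorithm 4 (addition step only; RELAX omitted).
    class CachingOption {
        final String key; final int weight; final long value;
        CachingOption(String key, int weight, long value) {
            this.key = key; this.weight = weight; this.value = value;
        }
    }

    class Config {
        final Map<String, CachingOption> options = new HashMap<>();  // at most one option per key
        int weight;
        long value;

        Config copyWith(CachingOption o) {                           // clone and append an option
            Config c = new Config();
            c.options.putAll(options);
            c.options.put(o.key, o);
            c.weight = weight + o.weight;
            c.value = value + o.value;
            return c;
        }
    }

    class Populate {
        static Config populate(List<CachingOption> allOptions, int cacheSize) {
            Map<Integer, Config> maxV = new HashMap<>();               // MaxV: weight -> best config so far
            maxV.put(0, new Config());                                 // initial empty configuration
            allOptions.sort((a, b) -> Long.compare(b.value, a.value)); // decreasing value order
            for (CachingOption option : allOptions) {
                for (Config config : new ArrayList<>(maxV.values())) { // snapshot: we insert while iterating
                    if (config.options.containsKey(option.key)) continue;
                    int w = config.weight + option.weight;
                    if (w > cacheSize) continue;
                    long v = config.value + option.value;
                    Config current = maxV.get(w);
                    if (current == null || current.value < v) {
                        maxV.put(w, config.copyWith(option));          // better configuration for weight w
                    }
                }
            }
            return maxV.values().stream()                              // best configuration that fits
                       .max(Comparator.comparingLong((Config c) -> c.value))
                       .orElseThrow();
        }
    }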

5.5 Evaluation

We built a prototype of Agar and used it to perform a thorough quantitative evaluation. We compare Agar against caching systems that use the classical Least Recently Used (LRU) and Least Frequently Used (LFU) cache replacement policies, while varying the amount of data kept in the cache from one chunk to a whole replica (i.e., k chunks).

We first describe our experimental setup in §5.5.1. Then we answer the following questions:

• How does Agar compare to other caching policies (§5.5.2)?

• How do the cache size and workload influence the performance of Agar (§5.5.3)?

• What do the cache contents look like in Agar (§5.5.4)?

5.5.1 Setup

We implemented our Agar prototype in Java and integrated it in the deployment shown in Figure 5.1. For our experiments, we modified the YCSB client [125]:

• First, we added support for erasure coding via the Longhair library [126]. On a write operation, the client encodes the object and writes the resulting chunks to S3 buckets concurrently. On a read operation, the client requests data chunks in parallel and, after it has received k chunks, it decodes them (see the sketch after this list). The read latency measured by our modified YCSB client accounts for reading a full object, and not just a chunk.

• Second, we added support for Agar. The YCSB client communicates with Agar in order to know which regions to contact. The client is also responsible for writing data to caches. This operation does not impact the latency measurements, as it is done in a separate thread pool, and not concurrently with reads. We deploy YCSB clients on large EC2 instances in the same regions as Agar.
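The read path of the first modification could look roughly as follows; this is a minimal sketch with placeholder fetchChunk and decode methods (the actual client issues S3 or Agar requests and decodes with Longhair), shown only to illustrate the "request all chunks in parallel, decode once k arrive" structure:

    import java.util.*;
    import java.util.concurrent.*;

    // Illustrative read path: request chunks in parallel, decode once k have arrived.
    // fetchChunk() and decode() are placeholders, not the prototype's actual calls.
    class ParallelChunkReader {
        private final ExecutorService pool = Executors.newCachedThreadPool();

        byte[] read(String key, List<String> regions, int k) throws Exception {
            CompletionService<byte[]> pending = new ExecutorCompletionService<>(pool);
            for (String region : regions) {
                pending.submit(() -> fetchChunk(region, key));   // one request per chunk location
            }
            List<byte[]> chunks = new ArrayList<>();
            while (chunks.size() < k) {
                chunks.add(pending.take().get());                // block until the next chunk arrives
            }
            return decode(chunks, k);                            // reconstruct the full object
        }

        private byte[] fetchChunk(String region, String key) { return new byte[0]; /* GET from cache or S3 */ }
        private byte[] decode(List<byte[]> chunks, int k) { return new byte[0]; /* erasure decoding */ }
    }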

We use several customized versions of the YCSB client, which differ in terms of reading strategy:

• Agar—reads content via our Agar caching system.

• Backend—reads content directly from the S3 buckets.

• LRU—reads content via a cache that stores a predefined number of erasure-coded chunks for each data record and supports the Least Recently Used policy. For our experiments, we rely on memcached's LRU policy.

• LFU—reads content via a cache that stores a predefined number of erasure-coded chunks and supports the Least Frequently Used cache replacement policy. This client includes an additional proxy component that tracks request frequency for each object.

Unless stated otherwise, we use a read-only workload that follows a Zipfian distribution with skew factor 1.1, a cache size of 10 MB (which fits ten full objects of 9 chunks each), and a cache reconfiguration period of 30 seconds for Agar and LFU. All results represent averages of 5 runs. Each run contains 1,000 reads. Each YCSB instance is configured to run 2 clients; each client uses a thread pool to request chunks in parallel.

5.5.2 Agar compared to other caching policies

In this experiment, we compare the latency and cache hit ratio of Agar to those of the LFU and LRU policies with a fixed number of chunks. We show that Agar can use the cache more efficiently, obtaining better performance than classical policies.

As shown in §5.2.1, the trade-offs regarding the number of chunks to store are different, depending on the region where the client runs. Therefore, we run this experiment using clients in two regions:

1. In the first scenario, we deploy our clients at Frankfurt, which has a relatively central position in our deployment, and is rather close to another region: Dublin.

2. In the second scenario, we deploy our clients at Sydney, which represents the opposite of Frankfurt, being far away from all other regions.

Figure 5.4 shows how Agar compares to LRU, LFU, and the backend, in terms of average read latency. Agar adapts to the particularities of each site and consistently outperforms the classical policies.


(a) Frankfurt (b) Sydney

Figure 5.4: Average read latency when using Agar vs. LRU- and LFU-based caching vs. Backend.

In Frankfurt, Agar obtains 15% lower latency than LFU-7, which is the next best policy (Least Frequently Used, caching 7 chunks for each object). Agar has an average latency of 416 ms versus LFU-7’s 489 ms. When compared to the worst-performing setup, LRU-1, Agar yields 41% lower latency.

In Sydney, Agar obtains 8.5% lower latency than LFU-9, which is the next best policy. Agar obtains a latency of 736 ms, versus LFU-9’s 803 ms.

We also examined the hit rates that the different policies obtained in this experiment. Figure 5.5 shows the hit ratio, computed as the number of cache hits – total hits (all blocks were read from the cache) or partial hits (only a subset of blocks were read from the cache) – divided by the number of requests issued.


(a) Frankfurt (b) Sydney

Figure 5.5: Hit ratio when using Agar vs. LRU- and LFU-based caching.

As expected, storing fewer chunks per object leads to higher hit rates, as high as 76%. However, storing fewer chunks also yields a smaller overall latency improvement. Agar finds a good trade-off between storage cost and latency improvement for each object, storing more chunks for popular items and fewer chunks for less popular ones. Overall, Agar's hit ratio is higher than that of the LRU and LFU policies storing 7 or 9 chunks for each object (§5.5.4 provides more insight into how Agar manages its cache).

In this experiment, we showed that Agar outperforms classical policies like LRU or LFU. Moreover, unlike static policies, Agar can adapt to the particularities of the workload. Agar obtains this performance gain over LRU and LFU by carefully managing the trade-off between the size each object occupies in the cache and the overall latency improvement.

5.5.3 Influence of cache size and workload

In this experiment, we study the impact of external factors on the performance of Agar and its competitors—LRU and LFU. In the previous experiment, we kept the cache size and workload pattern fixed, while in this experiment we vary them to evaluate how the different policies react. We run this experiment using clients deployed at Frankfurt.


(a) Varying cache size.


(b) Varying workload.

Figure 5.6: Agar vs. different caching systems and the backend.

We vary the cache size between 5 MB (fits 5 full objects) and 100 MB (fits 100 full objects), while keeping the workload fixed (Zipfian, with skew 1.1). Figure 5.6(a) shows the average read latency.

When the cache is very small, there is little room for optimization, but Agar can still outperform all alternatives by more than 6.5%. As cache size increases, so does the advantage of Agar: it obtains 15% lower latency than alternatives for a 10 MB cache size, and 16% for 20 MB. When the size increases beyond that, the cache becomes large enough to fit all popular data and Agar's lead starts to decrease: 12% for 50 MB and 1% for 100 MB. Overall, Agar outperformed LFU and LRU over a wide range of deployment scenarios, with the cache size ranging from 1% to 25% of the backend (5 to 100 MB).


Figure 5.7: Cumulative distribution of the object popularity using Zipfian workloads with different skews (0.5, 0.8, 1.1, and 1.4). The y axis shows the cumulative percentage of requests in the workload that refer to the objects on the x axis (e.g., x = 5, y = 40% means that the most popular 5 objects account for 40% of requests).

Next, we keep the cache size fixed at 10 MB and vary the workload. First, we experiment with a workload that follows a uniform request distribution. Then, we experiment with Zipfian workloads with different skews; the skew is the coefficient that determines how concentrated the popularity is: a higher skew means that fewer items account for an increasingly large share of the requests. Figure 5.7 shows how the skew influences the popularity of the objects in the workload (an illustrative computation is sketched below). Since the object size is 1 MB, it is also easy to see how much of the total data (300 MB) can fit in a cache of a given size (the horizontal axis can be interpreted as cache size as well).
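For reference, the concentration of requests shown in Figure 5.7 can be approximated with a few lines of code. The sketch below is illustrative only: it uses the textbook Zipf formula, which may differ slightly from YCSB's generator, assumes 300 objects of 1 MB each as in our setup, and prints the share of requests captured by the 5 most popular objects under each skew:

    // Illustrative computation of the Zipf request distribution used in Figure 5.7.
    class ZipfShare {
        public static void main(String[] args) {
            int objects = 300;                                   // 300 objects of 1 MB each
            double[] skews = {0.5, 0.8, 1.1, 1.4};
            for (double s : skews) {
                double norm = 0;
                for (int i = 1; i <= objects; i++) norm += 1.0 / Math.pow(i, s);
                double top5 = 0;                                  // cumulative share of the top 5 objects
                for (int i = 1; i <= 5; i++) top5 += 1.0 / Math.pow(i, s) / norm;
                System.out.printf("skew %.1f: top-5 objects receive %.0f%% of requests%n", s, 100 * top5);
            }
        }
    }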

Figure 5.6(b) shows the average read latency of Agar and its alternatives for different workloads. When the workload follows a uniform distribution, all clients perform similarly. The cache hit rates are very small because all data items are equally popular; therefore, the choice of caching policy makes no significant difference. When the skew of the Zipfian distribution is low, the workload pattern is similar to a uniform distribution and thus, the same effect is observed.

As the skew of the Zipfian distribution increases, however, some elements become more popular and caching them has a higher impact on the overall latency. Agar and LFU are the quickest to benefit from this type of workload and can lead to lower overall latencies, with Agar taking a 5.8% lead for skew 0.8, 7.2% for 0.9, 13% for 1.0, and peaking at 15% for 1.1. As the skew becomes larger, only a small subset of objects accounts for most of the reads in the workload, and LFU-9 can catch up to Agar, since all of the highly popular items fit in the 10 MB cache. At skew 1.4, Agar's lead starts to decrease, dropping to 14%.

In this experiment, we showed how Agar compares to LRU and LFU when the workload distribution and the cache size vary. Agar consistently outperforms static policies when system designers are cost-conscious and cache size is limited, as it can better adapt to the environment where it runs and optimize the cache contents accordingly.

5.5.4 Cache contents

Figure 5.8: Cache contents in different scenarios (Frankfurt and Sydney clients, 5 MB and 10 MB caches; objects cached with 9, 7, 5, or 1 blocks).

In this experiment, we take an inside look into how Agar manages the cache contents. We take snapshots of the data that Agar chooses to cache for clients running in Frankfurt and Sydney, and for cache sizes of 5 MB and 10 MB.

Figure 5.8 shows the distribution of object sizes in Agar’s cache. There are several interesting aspects to note. First, Agar diversifies the contents of the cache, rather than having the majority of the cache filled by a certain object size. Second, for each scenario Agar chooses to manage its cache differently. This again argues for a dynamic policy, such as Agar.

Finally, despite the diminishing returns of storing entire replicas (9 blocks in cache), Agar chooses to allocate a significant fraction of the cache to them. This is explained by the high skew, which means that a few objects are so popular that the difference between disk and memory latency becomes important.

5.6 Summary

In this chapter, we argued for the need for a caching system specifically developed for erasure-coded data, providing high availability with low latency and without the need to store full object replicas. We designed and implemented Agar, a caching system tailored for erasure-coded data, and explained how it integrates with a typical storage system. Agar uses a dynamic programming approach inspired by the Knapsack problem to optimize the cache configuration under a given workload. We compared our prototype with the LFU and LRU strategies and showed that Agar consistently outperforms them, obtaining 16%–41% lower latency.


6 Conclusion

Geo-distributed systems span many datacenters and are accessed by users from different geographic regions. These systems often rely on replication to place service replicas in all regions and on data redundancy to bring data close to the users that access it the most. This thesis proposed techniques to reduce the access latency perceived by end users.

First, we proposed an approach that reduces access latency through state partitioning. We defined the conditions under which a service can be partitioned and introduced a generic algorithm for partitioning the state of a service. Our approach allows system administrators to choose the degree of replication for different partitions. We built a geo-distributed coordination service and partitioned its state. For workloads that exhibit access locality, our partitioning technique reduced the latency perceived by users by up to 75% compared to a typical, non-partitioned deployment.

We also built a geo-distributed file system using state partitioning. Our evaluation showed that our file system performs on par with de facto industry implementations. We further explained how to add support for performing state partitioning dynamically in order to adapt to workload patterns. Our preliminary results showed that being able to move files according to access patterns has a positive effect on the performance of our geo-distributed file system; read latency is reduced by up to 88%.

Finally, we proposed an algorithm that optimizes the cache configuration for erasure-coded data, which is useful in the context of geo-distributed storage systems that aim to reduce storage cost. We used a dynamic programming approach to optimize the cache configuration for a given workload. We integrated a caching system prototype that uses our algorithm into a geo-distributed storage system and showed that our algorithm achieves 16% to 41% lower latency than systems that use the LRU and LFU caching policies.


List of Publications

1. R. Halalai, P. Felber, A.-M. Kermarrec, F. Taïani. Agar: A Caching System for Erasure-Coded Data, 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017. (Rank A, Acceptance rate: 16.9%)

2. L. Pacheco, R. Halalai, V. Schiavoni, F. Pedone, E. Rivière, P. Felber. GlobalFS: A Strongly Consistent Multi-Site File System, 35th Symposium on Reliable Distributed Systems (SRDS), 2016. (Rank A, Acceptance rate: 32.5%)

3. R. Halalai, P. Sutra, E. Rivière, P. Felber. ZooFence: Principled Service Partitioning and Application to the ZooKeeper Coordination Service. 33rd International Symposium on Reliable Distributed Systems, 2014. (Rank A)

4. L. Charles, P. Felber, R. Halalai, E. Rivière, P. Felber, V. Schiavoni, J. Valerio. An Overview of New Features in the SPLAY Framework for Simple Distributed Systems Evaluation. University of Neuchâtel, 2012. (Technical report)


Bibliography

[1] “Internet World Stats Usage and Population Statistics.” https://www.internetworldstats. com/emarketing.htm. Accessed: 2018-04. [2] “Digital in 2018.” https://wearesocial.com/blog/2018/01/global-digital-report-2018. Accessed: 2018-04. [3] J. Ousterhout, “Cs 142 large-Scale Web Applications.” https://web.stanford.edu/~ouster/ cgi-bin/cs142-fall10/lecture.php?topic=scale. Accessed: 2018-04. [4] S. Gilbert and N. Lynch, “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services,” SIGACT News, vol. 33, pp. 51–59, June 2002. [5] J. Kubiatowicz, D. Bindel, P.Eaton, Y. Chen, D. Geels, R. Gummadi, S. Rhea, W. Weimer, C. Wells, H. Weatherspoon, and B. Zhao, “OceanStore: An architecture for global-scale persistent storage,” ACM SIGPLAN Notices, vol. 35, no. 11, pp. 190–201, 2000. [6] A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, “Ivy: a read/write peer-to-peer file system,” in OSDI, 2002. [7] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Siva- subramanian, P.Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” in SOSP, 2007. [8] A. Lakshman and P.Malik, “Cassandra: a decentralized structured storage system,” ACM SIGOPS Operating Systems Review, vol. 44, Apr. 2010. [9] MongoDB. https://www.mongodb.org. [10] P.J. Marandi, M. Primi, and F.Pedone, “High performance state-machine replication,” in IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN, 2011. [11] H. Weatherspoon and J. Kubiatowicz, “Erasure coding vs. replication: A quantitative comparison,” in IPTPS, Springer-Verlag, 2002. [12] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, “Zookeeper: Wait-free coordination for internet-scale systems,” in Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIX Annual Techical Conference, USENIX Association, 2010. [13] P.Alsberg and J. Day, “A principle for resilient sharing of distributed resources,” in 2nd Int. Conf. on Software Engineering, ICSE, 1976.

77 [14] L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Commu- nications of the ACM, vol. 21, no. 7, pp. 558–565, 1978. [15] L. Lamport, “Paxos made simple,” SIGACTN: SIGACT News (ACM Special Interest Group on Automata and Computability Theory), vol. 32, 2001. [16] M. Castro and B. Liskov, “Practical byzantine fault tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI ’99, pp. 173–186, USENIX Association, 1999. [17] T. Chandra, R. Griesemer, and J. Redstone, “Paxos made live: An engineering per- spective,” in Proceedings of the twenty-sixth annual ACM symposium on principles of distributed computing (PODC), pp. 398–407, 2007. [18] L. Lamport, “Fast paxos,” Distributed Computing, vol. 19, no. 2, pp. 79–103, 2006. [19] P. J. Marandi, M. Primi, N. Schiper, and F. Pedone, “Ring Paxos: A High-Throughput Atomic Broadcast Protocol,” in DSN, 2010. [20] D. Ongaro and J. Ousterhout, “In search of an understandable consensus algorithm,” in Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC’14, pp. 305–320, USENIX Association, 2014. [21] H. Attiya and J. L. Welch, “Sequential consistency versus linearizability,” ACM Trans. Comput. Syst., vol. 12, pp. 91–122, 1994. [22] P.Sutra and M. Shapiro, “Fast Genuine Generalized Consensus,” in Proceedings of the 30th IEEE International Symposium on Reliable Distributed Systems (SRDS), pp. 255–264, 2011. [23] F.Pedone and A. Schiper, “Handling message semantics with generic broadcast proto- cols,” Distributed Computing, vol. 15, 2002. [24] L. Lamport, “Generalized consensus and paxos,” Tech. Rep. MSR-TR-2005-33, Microsoft, March 2005. [25] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete, “MDCC: Multi-data center consistency,” in 8th ACM European Conference on Computer Systems, EuroSys, 2013. [26] P.J. Marandi, C. E. Bezerra, and F.Pedone, “Rethinking state-machine replication for parallelism,” in 34th International Conference on Distributed Computing Systems, ICDCS, July 2014. [27] M. Kapritsos, Y. Wang, V. Quema, A. Clement, L. Alvisi, and M. Dahlin, “All about eve: Execute-verify replication for multi-core servers,” in 10th USENIX Conference on Oper- ating Systems Design and Implementation, OSDI, 2012. [28] P.J. Marandi, M. Primi, and F.Pedone, “Multi-Ring Paxos,” in DSN, 2012. [29] G. Oster, P.Urso, P.Molli, and A. Imine, “Data Consistency for P2P Collaborative Editing,” in ACM Conference on Computer-Supported Cooperative Work, CSCW, Nov. 2006. [30] N. Preguiça, J. M. Marquès, M. Shapiro, and M. Letia, “A commutative replicated data type for cooperative editing,” in 29th International Conference on Distributed Computing Systems, ICDCS, pp. 395–403, 2009. [31] C. E. Bezerra, F.Pedone, and R. van Renesse, “Scalable state-machine replication,” in 45th International Conference on Dependable Systems and Networks, DSN, June 2014.

78 [32] M. K. Aguilera, W. Golab, and M. A. Shah, “A practical scalable distributed b-tree,” Proc. VLDB Endow., vol. 1, pp. 598–609, Aug. 2008. [33] P.A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987. [34] F.Ellen, P.Fatourou, E. Kosmas, A. Milani, and C. Travers, “Universal constructions that ensure disjoint-access parallelism and wait-freedom,” in ACM Symposium on Principles of Distributed Computing, PODC, 2012. [35] L. H. Le, C. E. Bezerra, and F. Pedone, “Dynamic scalable state machine replication,” in 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016. [36] A. Nogueira, A. Casimiro, and A. Bessani, “Elastic state machine replication,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 9, 2017. [37] A. Thomson and D. J. Abadi, “CalvinFS: Consistent WAN replication and scalable meta- data management for distributed file systems,” in FAST, 2015. [38] A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast distributed transactions for partitioned database systems,” in ACM SIGMOD, 2012. [39] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” in OSDI, 2006. [40] Ceph block storage. http://ceph.com/ceph-storage/block-storage/. [41] S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn, “CRUSH: Controlled, scalable, decentralized placement of replicated data,” in ACM/IEEE SC, 2006. [42] Email exchange on CephFS mailing list. https://www.mail-archive.com/ceph-users@ lists.ceph.com/msg23788.html. [43] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The ,” in SOSP, 2003. [44] L. Lamport, “The part-time parliament,” ACM Transactions on Computer Systems (TOCS), vol. 16, pp. 133–169, May 1998. [45] MooseFS. https://www.moosefs.org. [46] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop Distributed File System,” in IEEE MSST, 2010. [47] M. Ovsiannikov, S. Rus, D. Reeves, P. Sutter, S. Rao, and J. Kelly, “The ,” in VLDB, 2013. [48] G. Liu, L. Ma, P. Yan, S. Zhang, and L. Liu, “Design and Implementation of GeoFS: A Wide-Area File System,” IEEE NAS, 2014. [49] J. Stribling, Y. Sovran, I. Zhang, X. Pretzer, J. Li, M. F.Kaashoek, and R. Morris, “Flexible, Wide-Area Storage for Distributed Systems with WheelFS.,” in NSDI, 2009. [50] K. W. Preslan, A. P.Barry, J. E. Brassow, G. M. Erickson, E. Nygaard, C. J. Sabol, S. R. Soltis, D. C. Teigland, and M. T. O’Keefe, “A 64-bit, shared disk file system for linux,” in IEEE MSST, 1999. [51] A. Davies and A. Orsaria, “Scale out with GlusterFS,” Linux Journal, vol. 2013, Nov. 2013. [52] F. Hupfeld, T. Cortes, B. Kolbeck, J. Stender, E. Focht, M. Hess, J. Malo, J. Marti, and E. Cesario, “The XtreemFS architecture—a case for object-based file systems in grids,”

79 Concurrency and Computation: Practice and Experience, vol. 20, no. 17, pp. 2049–2060, 2008. [53] M. Vrable, S. Savage, and G. M. Voelker, “Bluesky: a cloud-backed file system for the enterprise,” in FAST, 2012. [54] A. Bessani, R. Mendes, T. Oliveira, N. Neves, M. Correia, M. Pasin, and P. Verissimo, “SCFS: a shared cloud-backed file system,” in USENIX ATC, 2014. [55] P.H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, “PVFS: A parallel file system for linux clusters,” in 4th Annual Linux Showcase and Conference, 2000. [56] D. Stamatakis et al., “Scalability of replicated metadata services in distributed file sys- tems,” in DAIS, 2012. [57] M. A. Olson, K. Bostic, and M. Seltzer, “Berkeley DB,” in USENIX ATC, 1999. [58] PVFS2. http://www.pvfs.org. [59] P.Schwan, “Lustre: Building a file system for 1000-node clusters,” in Linux Symposium, 2003. [60] BeeGFS. http://www.beegfs.com. [61] S. Patil and G. Gibson, “Scale and concurrency of GIGA+: File system directories with millions of files,” in FAST, 2011. [62] ObjectiveFS. http://objectivefs.com. [63] M. Satyanarayanan, “Scalable, secure, and highly available distributed file access,” Computer, vol. 23, no. 5, pp. 9–18, 1990. [64] B. Walker, G. Popek, R. English, C. Kline, and G. Thiel, “The LOCUS distributed ,” in SOSP, 1983. [65] Sun Microsystems, Inc., “NFS: Network file system protocol specification,” RFC 1094, Mar. 1989. [66] “Oracle Cluster File System (OCFS),” in Pro Oracle Database 10g RAC on Linux, pp. 171– 200, Apress, 2006. [67] Y. Ma, T. Nandagopal, K. P.N. Puttaswamy, and S. Banerjee, “An ensemble of replication and erasure codes for cloud file systems,” in INFOCOM, IEEE, 2013. [68] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P.Gopalan, J. Li, and S. Yekhanin, “Erasure Coding in Windows Azure Storage,” in ATC, USENIX, 2012. [69] M. Xia, M. Saxena, M. Blaum, and D. A. Pease, “A Tale of Two Erasure Codes in hdfs,” in FAST, USENIX Association, 2015. [70] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, 1960. [71] L. Breslau, P.Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and Zipf-like distri- butions: Evidence and implications,” in INFOCOM, IEEE, 1999. [72] F. Figueiredo, F. Benevenuto, and J. M. Almeida, “The tube over time: Characterizing popularity growth of YouTube videos,” in WSDM, ACM, 2011. [73] G. Szabo and B. A. Huberman, “Predicting the popularity of online content,” Communi- cations of the ACM, vol. 53, no. 8, 2010.

80 [74] G. Ananthanarayanan, A. Ghodsi, A. Warfield, D. Borthakur, S. Kandula, S. Shenker, and I. Stoica, “PACMan: Coordinated Memory Caching for Parallel Jobs,” in NSDI 12, USENIX, 2012. [75] S. Jin and A. Bestavros, “GreedyDual*: Web caching algorithms exploiting the two sources of temporal locality in web request streams,” in WCW, 2000. [76] G. D. S. Silvestre, Designing Adaptive Replication Schemes for Efficient Content Delivery in Edge Networks. PhD thesis, Universite Pierre et Marie Curie, 2013. [77] K. Mokhtarian and H.-A. Jacobsen, “Caching in video CDNs: Building strong lines of defense,” in EuroSys, ACM, 2014. [78] G. Einziger and R. Friedman, “TinyLFU: A highly efficient cache admission policy,” in PDP, IEEE, 2014. [79] G. Karakostas and D. Serpanos, “Exploitation of different types of locality for Web caches,” in ISCC, IEEE, 2002. [80] P.Cao and S. Irani, “Cost-aware WWW proxy caching algorithms,” in USITS, USENIX, 1997. [81] L. Cherkasova, “Improving WWW proxies performance with Greedy-Dual-Size- Frequency caching policy,” tech. rep., HP Technical Report, 1998. [82] K. Cheng and Y. Kambayashi, “LRU-SP: A size-adjusted and popularity-aware LRU replacement algorithm for Web caching,” in COMPSAC, 2000. [83] C. Aggarwal, J. L. Wolf, and P.S. Yu, “Caching on the World Wide Web,” Transactions on Knowledge and Data Engineering, vol. 11, no. 1, 1999. [84] R. Karedla, S. J. Love, and B. G. Wherry, “Caching strategies to improve disk system performance,” Computer, vol. 27, no. 3, 1994. [85] L. Rizzo and L. Vicisano, “Replacement policies for a proxy cache,” Journal IEEE/ACM TON, vol. 8, no. 2, 2000. [86] V. Aggarwal, Y.-F.R. Chen, T. Lan, and Y. Xiang, “Sprout: A functional caching approach to minimize service latency in erasure-coded storage,” in Poster session at the ICDCS, IEEE, 2016. [87] K. V. Rashmi, M. Chowdhury, J. Kosaian, I. Stoica, and K. Ramchandran, “EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding,” in OSDI, USENIX Association, 2016. [88] R. Halalai, P.Sutra, E. Riviere, and P.Felber, “Zoofence: Principled service partitioning and application to the zookeeper coordination service,” in 2014 IEEE 33rd International Symposium on Reliable Distributed Systems, pp. 67–78, 2014. [89] M. Herlihy and J. Wing, “Linearizability: a correcteness condition for concurrent objects,” ACM Trans. on Prog. Lang., vol. 12, July 1990. [90] L. Lamport, “On interprocess communication. part i: Basic formalism,” Distributed Computing, vol. 1, no. 2, 1986. [91] M. Abadi and L. Lamport, “The existence of refinement mappings,” in 3rd Annual IEEE Symposium on Logic in Computer Science, LICS, July 1988. [92] M. Herlihy, “Wait-free synchronization,” ACM Trans. on Prog. Lang. and Systems, vol. 11, Jan. 1991.

81 [93] P.Sutra and M. Shapiro, “Fault-tolerant partial replication in large-scale database sys- tems,” in European Conf. on Parallel Computing, Euro-Par, Springer, 2008. [94] M. Raynal and J. Stainer, “From a store-collect object and Ω to efficient asynchronous consensus,” in European Conf. on Parallel Computing, Euro-Par, Springer, 2012. [95] T. D. Chandra, V. Hadzilacos, and S. Toueg, “The weakest failure detector for solving consensus,” J. ACM, vol. 43, no. 4, pp. 685–722, 1996. [96] A. Israeli and L. Rappoport, “Disjoint-access-parallel implementations of strong shared memory primitives,” in 13th Annual ACM Symposium on Principles of Distributed Com- puting, PODC, 1994. [97] K. Birman, G. Chockler, and R. van Renesse, “Toward a Cloud Computing research agenda,” ACM SIGACT News, vol. 40, pp. 68–80, June 2009. [98] Apache Software Foundation, “Apache Bookkeeper.” https://bookkeeper.apache.org/. Accessed: 2018-04. [99] L. Pacheco, R. Halalai, V. Schiavoni, F. Pedone, E. Riviere, and P. Felber, “Globalfs: A strongly consistent multi-site file system,” in 2016 IEEE 35th Symposium on Reliable Distributed Systems(SRDS), vol. 00, pp. 147–156, Sept. 2016. [100] C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the presence of partial synchrony,” Journal of the ACM, vol. 35, no. 2, pp. 288–323, 1988. [101] H. Attiya and J. Welch, Distributed Computing: Fundamentals, Simulations and Ad- vanced Topics. John Wiley & Sons, 2004. [102] IEEE, “IEEE Std 1003.1-2001 Standard for Information Technology — Portable Operating System Interface (POSIX) Base Definitions, Issue 6,” 2001. [103] A. Rowstron and P.Druschel, “Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility,” in SOSP, 2001. [104] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Wide-area cooperative storage with CFS,” in SOSP, 2001. [105] L. Lamport, “Fast Paxos,” Distributed Computing, vol. 19, no. 2, pp. 79–103, 2006. [106] D. Ongaro and J. Ousterhout, “In search of an understandable consensus algorithm,” in USENIX ATC, 2014. [107] F.B. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Computing Surveys, vol. 22, no. 4, pp. 299–319, 1990. [108] J. Cowling and B. Liskov, “Granola: Low-overhead distributed transaction coordination,” in USENIX ATC, 2012. [109] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P.C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi, “H-Store: a high-performance, distributed main memory transaction processing system,” in VLDB, 2008. [110] C. E. Bezerra, F.Pedone, and R. Van Renesse, “Scalable state-machine replication,” in DSN, pp. 331–342, IEEE, 2014. [111] C. J. Fidge, “Timestamps in Message-Passing Systems that Preserve the Partial Ordering,” in 11th Australian Computer Science Conference, pp. 55–66, 1988. [112] M. Raynal, A. Schiper, and S. Toueg, “The causal ordering abstraction and a simple way to implement it,” Inf. Process. Lett., vol. 39, pp. 343–350, 1991.

82 [113] S. Benz, P.J. Marandi, F.Pedone, and B. Garbinato, “Building global and scalable systems with atomic multicast,” in IFIP/ACM Middleware, 2014. [114] D. Roselli, J. R. Lorch, and T. E. Anderson, “A comparison of file system workloads,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference, USENIX Association, 2000. [115] File System in User Space (FUSE). http://fuse.sourceforge.net/. [116] V. Tarasov, A. Gupta, K. Sourav, S. Trehan, and E. Zadok, “Terra incognita: On the practicality of user-space file systems,” in USENIX HotStorage, 2015. [117] URingPaxos. https://github.com/sambenz/URingPaxos. [118] Apache Thrift. https://thrift.apache.org. [119] L. Leonini, E. Rivière, and P. Felber, “SPLAY: Distributed systems evaluation made simple,” in NSDI, 2009. [120] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F.Kaashoek, F.Dabek, and H. Bal- akrishnan, “Chord: A scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Transactions on Networking, vol. 11, pp. 17–32, Feb. 2003. [121] LevelDB. https://github.com/google/leveldb. [122] AWS EC2 instances types. http://aws.amazon.com/ec2/instance-types/. [123] V. Tarasov, S. Bhanage, E. Zadok, and M. Seltzer, “Benchmarking file system benchmark- ing: It *is* rocket science,” in USENIX HotOS, 2011. [124] R. Halalai, P.Felber, A. Kermarrec, and F. Taïani, “Agar: A caching system for erasure- coded data,” in 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, June 5-8, 2017, pp. 23–33, 2017. [125] “Yahoo! Cloud Serving Benchmark.” https://github.com/brianfrankcooper/YCSB. Ac- cessed: 2016-12-01. [126] “Fast Cauchy Reed-Solomon Erasure Codes in C.” https://github.com/catid/longhair. Accessed: 2016-12-01. [127] T. Dantzig, Number, the Language of Science. New York: Free Press, 1954. [128] M. Kedia, “Lecture on Knapsack.” https://www.cs.cmu.edu/afs/cs/academic/class/ 15854-f05/www/scribe/lec10.pdf. Accessed: 2016-12-01. [129] S. Martello and P.Toth, Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Inc., 1990.
