
Université catholique de Louvain

Louvain School of Engineering

Computing Science Engineering Department

Designing an elastic and scalable application

Promoter: Pr. Peter Van Roy
Readers: Pr. Marc Lobelle, Boris Mejías

Master thesis presented for the obtention of the grade of master in computer engineering, option networking and security, by Xavier De Coster and Matthieu Ghilain.

Louvain-la-Neuve

Academic year 2010 - 2011

Acknowledgments

The Bwitter team would like to thank Pr. Peter Van Roy for his help and insightful comments. We also want to thank Boris Mejías for his guidance and availability during the whole project. We thank Florian Schintke, member of the Scalaris development team, for his help during our analysis of Scalaris and the numerous answers he provided to our questions. We also thank Quentin Hunin for his support and constructive feedback during the last few weeks of writing. Finally, we would also like to thank our families, our friends and our girlfriends, Inès and Lorraine, for their unconditional support and encouragement.

Abstract

The amount of traffic on web-based social networks is very difficult to predict. In order to avoid wasting resources during low-traffic periods or being overloaded during peak periods, it would be interesting to adapt the amount of resources dedicated to the service. In this work we detail the design and implementation of our own social network application, called Bwitter. Our first goal is to make Bwitter's performance scale with the number of machines we dedicate to it. Our second goal is linked to the first: we want to make Bwitter elastic, so that it can react to flash crowds by adding resources to handle the load, without suspending its services. To achieve the desired scalability and elasticity, Bwitter is implemented on a scalable key/value datastore with transactional capabilities running on the Cloud. In our tests we study the behaviour of Bwitter using the Scalaris datastore, with both running on Amazon's Elastic Compute Cloud. We show that the performance of Bwitter increases almost linearly with the number of resources we allocate to it. Bwitter is also able to improve its performance significantly in a matter of minutes.

Contents

Part I: The Project

1 Introduction
  1.1 Social networks
  1.2 Scalable Data Stores
  1.3 The Cloud
  1.4 The Bwitter project
    1.4.1 Twitter
    1.4.2 Bwitter
    1.4.3 Contributions
  1.5 Roadmap

2 State-of-the-art
  2.1 Scalable datastores
    2.1.1 Key/value Stores
    2.1.2 Document Stores
    2.1.3 Extensible Record Stores
    2.1.4 Relational Databases
  2.2 Peer-to-peer systems
  2.3 DHT
  2.4 Study of scalable key/value stores properties
    2.4.1 Network topology
    2.4.2 Storage abstraction
    2.4.3 Replication strategy and consistency model
    2.4.4 Transactions
    2.4.5 Churn
    2.4.6 Security
  2.5 The Cloud
  2.6 Conclusion

3 The Architecture
  3.1 The requirements
    3.1.1 Non-Functional requirements
    3.1.2 Functional requirements
    3.1.3 Conclusion
  3.2 Architecture
    3.2.1 Open peer-to-peer architecture
    3.2.2 Cloud Based architecture
    3.2.3 The popular value problem
    3.2.4 Conclusion

4 The Datastore
  4.1 The datastore choice
    4.1.1 Identifying what we need
    4.1.2 Our two choices
  4.2 General Design
  4.3 Design of the datastore
    4.3.1 Key uniqueness
    4.3.2 Push approach design details
    4.3.3 The Pull Variation
    4.3.4 Conclusion
  4.4 Running multiple services using the same datastore
    4.4.1 The unprotected data problem
    4.4.2 Key already used problem
    4.4.3 Conclusion

5 Algorithms and Implementation
  5.1 Implementation of the cloud based architecture
    5.1.1 Open peer-to-peer implementation
    5.1.2 First cloud based implementation
    5.1.3 Final cloud based implementation
  5.2 Nodes Manager
  5.3 Scalaris Connections Manager
    5.3.1 Failure handling
  5.4 Bwitter Request Handler
    5.4.1 The push approach
    5.4.2 The pull approach
    5.4.3 Theoretical comparison of Pull and Push approach
  5.5 Conclusion

6 Experiments
  6.1 Working with Amazon
    6.1.1 Choosing the right instance type
    6.1.2 Choosing an AMI
    6.1.3 Instance security group
    6.1.4 Constructing Scalaris AMI
  6.2 Working with Scalaris
    6.2.1 Launching a Scalaris ring
    6.2.2 Scalaris performance analysis
  6.3 Bwitter tests
    6.3.1 Experiment measures discussion
    6.3.2 Push design tests
    6.3.3 Pull scalability test
    6.3.4 Conclusion: Pull versus Push
  6.4 Conclusion

7 Conclusion
  7.1 Further work

Part II: The Annexes

8 Beernet Secret API
  8.1 Without replication
    8.1.1 Put
    8.1.2 Delete
  8.2 With replication
    8.2.1 Write
    8.2.2 CreateSet
    8.2.3 Add
    8.2.4 Remove
    8.2.5 DestroySet

9 Bwitter API
  9.1 User management
    9.1.1 createUser
    9.1.2 deleteAccount
  9.2 Tweets
    9.2.1 postTweet
    9.2.2 reTweet
    9.2.3 reply
    9.2.4 deleteTweet
  9.3 Lines
    9.3.1 addUser
    9.3.2 removeUser
    9.3.3 allUsersFromLine
    9.3.4 allTweet
    9.3.5 getTweetsFromLine
    9.3.6 createLine
    9.3.7 deleteLine
    9.3.8 getLineNames
  9.4 Lists
    9.4.1 addTweetToList
    9.4.2 removeTweetFromList
    9.4.3 getTweetsFromList
    9.4.4 createList
    9.4.5 deleteList
    9.4.6 getListNames

10 The paper

Part I

The Project


Chapter 1

Introduction

The web 2.0 offers many new services to the users of the Internet. They can now share, generate and upload content online faster and more easily than ever before. All those services require computing, bandwidth and storage resources. Predicting the required amount of those resources can be tricky, especially if a service wants to avoid wasting them while at the same time being able to face high usage peaks. We are going to take a closer look at the scalability and elasticity of perhaps the most famous of those web 2.0 services, namely social networks.

1.1 Social networks

Social networks such as Facebook and Twitter are an increasingly popular way for people to interact and express themselves. Facebook, for instance, has 600 million active users [6]. People can now create content and easily share it with other people. Social networks have become a means of communication in their own right, used by politicians, artists and brands to easily reach large communities and promote themselves or their products. They also allow people to quickly organise social events like barbecues, or even nationwide revolutions like what happened in Tunisia [21, 40] or Egypt [16]. Social networks are also a powerful tool to communicate during natural disasters. Twitter and Facebook were very useful to find updates from relatives and friends when the mobile phone networks and some of the telephone landlines collapsed in the hours following the magnitude 8.9 earthquake in Japan. The US State Department even used Twitter to publish emergency numbers [35]. Other examples are the Haiti [18] and Chile [9] earthquakes, which were covered in real time thanks to social networks, with photos sent out to the rest of the world directly via Twitter. It is thus critical that social networks do not crash when their users need them the most. However, the servers of those social networks can only handle a given number of simultaneous requests; if there are too many requests, the servers become overloaded. A typical result of overloading is Twitter suspending its services and displaying the "Fail Whale" shown in Figure 1.1.

Figure 1.1: "Lifting a Dreamer", aka the Fail Whale, illustration by Yiying Lu, displayed when Twitter is overloaded.

Avoiding overload efficiently is a tricky problem, as the load is related to many social factors, some of which are impossible to predict. For instance, we want to be able to handle the large number of people sending Christmas or New Year wishes, but also those reacting to natural disasters. This is why we turn towards scalable and elastic solutions, allowing the system to add and remove resources on the fly in order to fit the required load. Social networks are also platforms where users share personal information, destined to be seen only by some specific peers. Other personal information, such as contact details, is sometimes stored in the system too. More and more users are beginning to worry about who ultimately has access to this information and what can be done with it. It is thus important to have a system that is secure and enforces the privacy of the end user.

1.2 Scalable Data Stores

The web 2.0 called for a different kind of database than the previous Relational Database Management System (RDBMS) solutions. It needed data stores able to host huge amounts of data and handle many parallel requests at the same time. There are now numerous scalable and elastic storage solutions answering this demand. These scalable data stores make it possible to store increasingly more data and to handle more requests as we allocate more resources to them, because they have been built to share their load across the different machines allocated to them. Those data stores also have elastic properties, allowing them to add or remove resources to gracefully upscale or downscale without having to be rebooted. This elasticity is crucial in order to upscale to face sudden increases in traffic, but also to downscale when the hype is over in order to avoid wasting resources.

As our work revolves around the scalability and elasticity of social network applications, we are bound to work with those scalable data stores. Many different kinds of scalable data stores exist, and we present them in our state-of-the-art in Chapter 2.

1.3 The Cloud

The cloud is a phenomenon that is hard to ignore these days, as most web applications tend to rely on it in order to provide their services. The cloud refers to on-demand resources such as storage, bandwidth and processing power, but also to on-demand services such as mail or word processing [2]. Computation can thus be transferred from the users' machines, as was the case in the past, to the machines forming the cloud. This allows users to have machines with very little computational resources or storage but still be able to execute heavy calculations or store huge amounts of data. A typical analogy for the usage of cloud resources is the usage of public utilities such as water or electricity. Specialised companies provide those services at a fraction of the price it would cost us if we needed to deploy and maintain all the required infrastructure ourselves. The cloud is thus the ideal platform to use if we do not want to invest in costly hardware and maintenance. This is especially true if we do not know beforehand whether our service will be successful. We can start small and only pay for a small amount of resources. If our service is popular we can easily grow by requesting more resources and thus paying a higher price. But if our service does not manage to attract many people, we have not wasted our money investing in powerful servers. Furthermore, the resources the cloud offers are elastic, meaning you can increase or decrease them on the fly and only pay for the amount you really need to keep your service going. We are going to use the scalability and elasticity properties of the cloud during our work, which is why we detail it further in our state-of-the-art in Chapter 2.

1.4 The Bwitter project

Bwitter is a lighter version of Twitter, the famous social network. Some readers might be unfamiliar with Twitter, so we will introduce it briefly before going further with the description of Bwitter.

1.4.1 Twitter

Twitter is a micro-blogging system that allows users to post small text messages of 140 characters called tweets. An enormous number of tweets is posted each day: according to Twitter themselves [33], 177 million tweets were posted in March 2011, and the record is 6,939 tweets per second, reached 4 seconds after midnight in Japan on New Year's Day.

Users can choose to display the messages of other users they find interesting in their line by following them. In Figure 1.2, you can see the home screen of Twitter with user Zulag (aka Xavier De Coster, co-author of this Master thesis) logged in; the messages of the users he follows are displayed as a stream in his "Timeline".

Figure 1.2: Home screen of Twitter’s web interface.

Twitter offers additional functionalities beyond this message posting: for instance, a user can reply to or retweet (share) any message he wants. He can also address a message directly to another user by starting his message with "@destinationUser". The main difference of Twitter, and now also Google+1, in comparison to other social networks such as Facebook is the asymmetry of the social connections. This means that the connection does not go in both directions: a user A can follow a user B without user B having to follow user A. This is unlike the Facebook system, where two users become "Friends" and automatically see each other's updates. This behaviour encourages Twitter to be used as a place where fans can follow their favourite stars, to such a point that 10% of the users account for 90% of the traffic [17]. Tweets can also be tagged by users using hashtags. The tweets containing a hashtag are automatically added to a group of tweets associated with this hashtag.

1 https://plus.google.com/, last accessed 13/08/2011

1.4.2 Bwitter

We decided to develop Bwitter as an elastic and scalable social network application and to study how it behaves when faced with flash crowds and heavy traffic. Bwitter is an open source version of Twitter based on a scalable data store and developed to run on a highly elastic cloud architecture. Bwitter thus presents functionalities similar to Twitter's. We chose Twitter because it is one of the more basic social networks and because it is now incredibly famous. The data store used by Bwitter is a key/value store, which we present in detail in Chapter 4. Bwitter was designed so that other services could run on this data store without interfering with each other's data. Bwitter is developed in multiple loosely coupled layers, allowing for maximal modularity. We added an optional cache layer on top of the key/value data store in order to maximise performance. Bwitter manages the cloud machines on which it runs as well as the data store nodes, restarting them when needed. During the implementation we took advantage of existing and proven technologies, leading to an efficient and robust implementation.

1.4.3 Contributions

The main contributions of this work are:

• Design of a scalable social network for microblogging.
• Improvement of Beernet's API.
• Helping to improve the bootstrapping of Scalaris and studying its behaviour on the Amazon Elastic Compute Cloud.

During the development of Bwitter we identified some potential improvements to one of the datastores we were using, namely Beernet [19, 23], and designed a new API allowing users to protect and manage the rights to the stored data. This new API supporting secrets is now implemented and supported in Beernet version 0.9. In order to further understand the behaviour of Bwitter, we did performance tests with the Scalaris [29] data store on Amazon's Elastic Compute Cloud (EC2), testing its scalability and elasticity. We also studied the impact of the machine resources, the number of parallel requests and conflicting operations on Scalaris' performance. During our discussions with the developers of Scalaris, we helped them locate an instability in the booting of their system. Ultimately, we implemented two different designs for Bwitter and tested both on Amazon's EC2, showing very good scalability and elasticity properties. During the course of the development we presented a demo of our project at the Beernet stand of the "Foire du Libre", held on the 6th of April at Louvain-la-Neuve2.

2“Foire du Libre” is a fair celebrating open source software and organised by the Louvain-li-nux: http://www.louvainlinux.be/foire-du-libre/, last accessed 05/08/2011

We have also co-written an article, along with Peter Van Roy and Boris Mejías, entitled "Designing an Elastic and Scalable Social Network Application", in which we detail some of the observations and design decisions developed in this master thesis. This article, which can be found in Chapter 10 of our annexes, has been accepted for The Second International Conference on Cloud Computing, GRIDs, and Virtualization3, organised by IARIA and held from the 25th to the 30th of September 2011 in Rome, Italy.

1.5 Roadmap

We start with our state-of-the-art in Chapter 2, where we discuss the different technologies we used and explored during the development of Bwitter, such as scalable data stores and cloud services.

We then identify the main requirements of our project and discuss the general architecture of Bwitter in Chapter 3. We explain why we chose to base it on the cloud instead of letting it run in the wild on an open peer-to-peer system. In this chapter we also explain how a cache could solve potential problems due to values being too popular.

The next step is to take an in-depth look at the data store we are going to use, in Chapter 4. We detail our main objectives in terms of data representation and explain how we decided to store the different data abstractions we use in our data store. We also take a look at how we can avoid conflicts between two different applications using the same data store.

We detail the different modules composing the Bwitter system in Chapter 5, highlighting their purpose and the main algorithms developed to implement them. We also compare in more depth the two different approaches, push and pull, for our application's most crucial functions: posting and reading tweets. We end this chapter with a global overview of the implemented architecture and detail how the different modules fit together.

We carry on with a series of experiments in Chapter 6. We start by testing Scalaris and measuring the impact of a few chosen parameters on its performance, scalability and elasticity. We then continue by measuring the performance, scalability and elasticity of Bwitter and compare the results for the push and pull approaches.

We finish this master thesis with a conclusion in Chapter 7, where we reflect on the achieved work, the lessons learned and the possible further improvements that could be made to our application. In the annexes you will find the new API we designed for Beernet, the API of Bwitter and a section for our mathematical demonstrations.

3CLOUD COMPUTING 2011, http://www.iaria.org/conferences2011/CLOUDCOMPUTING11.html, last accessed 13/08/2011

Chapter 2

State-of-the-art

In this section we take a look at the relevant technologies that could be useful to the Bwitter project. We start with the different existing scalable datastores, in order to decide which kind is most appropriate for our application. From there, we take a closer look at peer-to-peer systems and their look-up performance, and further study the properties of Distributed Hash Tables (DHT). Finally, we give an overview of the different services the cloud has to offer.

2.1 Scalable datastores

We start our state-of-the-art with a section about scalable datastores. As our application is going to rely heavily on a datastore, it is important to understand the different kinds that are available today, as well as their pros and cons [7]. There are several kinds of scalable datastores available, each with their own specificities, but four main classes can be put forward: Key/value Stores, Document Stores, Extensible Record Stores and Relational Databases. We are going to compare the functionalities they provide and the way they achieve scalability. Most of those datastores do not provide ACID properties, but BASE properties. ACID stands for Atomicity, Consistency, Isolation, Durability, and BASE stands for Basically Available, Soft state, Eventually consistent. This eventual consistency is often said to be a consequence of Eric Brewer's CAP theorem [29], which states that a system can have only two out of the three following properties: consistency, availability, and partition-tolerance. Most of the scalable datastores decide to give up consistency, but some of them opt for more complex trade-offs.

2.1.1 Key/value Stores

These are the simplest kind of datastore: they store values at user-defined indexes called keys and behave as hash tables. They are very useful if you need to look up objects based on only one attribute; otherwise you might want to use a more complex datastore. Some key/value stores provide key/set abstractions, allowing multiple values to be stored at a single key. Key/value stores all support insert, delete, and lookup operations, but they also generally provide a persistence mechanism and additional functionalities such as versioning, locking and transactions. Replication can be synchronous or asynchronous; the second option allows faster operations, but some updates may be lost on a crash and consistency cannot be guaranteed. Their scalability is ensured through key distribution over the nodes, and some present ACID properties. In conclusion, this solution, by its simplicity, allows the system to scale easily. But every rose has its thorn: this simplicity comes at the cost of poor data structure abstractions. Notable examples are Scalaris, Riak, Voldemort, Redis and Beernet.
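The following sketch (plain Python, with names of our own choosing; it is not the API of Scalaris, Beernet or any of the stores just cited) models this abstraction as a small in-memory class, including the optional key/set variant offered by some stores.

```python
class KeyValueStore:
    """Minimal in-memory model of the key/value abstraction (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Storing under an existing key overwrites the previous value.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    # Some stores (Beernet, Redis, OpenDHT) additionally expose key/set pairs:
    def add_to_set(self, key, value):
        self._data.setdefault(key, set()).add(value)


store = KeyValueStore()
store.put("user:zulag:name", "Xavier De Coster")
store.add_to_set("user:zulag:followers", "matthieu")
print(store.get("user:zulag:name"))
```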

Figure 2.1: Data organisation in a key/value datastore.

2.1.2 Document Stores

These systems store documents and index them. A document can be seen as an object with attribute names that are dynamically defined for each document at runtime. Those attributes are not necessarily predefined in a global schema, unlike, for instance, in SQL, which imposes defining the schema beforehand. Moreover, those attributes can be complex, meaning that nested and composite values are allowed. It is possible to explicitly define indexes to speed up searches. Replication is asynchronous in order to increase the speed of the operations. Often, scalability is ensured by reading only one replica, thus sacrificing strong consistency, but some document stores, like MongoDB, can obtain scalability without that compromise. MongoDB allows splitting parts of a collection across several nodes in order to increase scalability instead of relying on replication. This technique is called sharding.

Figure 2.2: Data organisation in a document store.

A popular abstraction, called domain, database, collection or bucket depending on the document store, is often provided to allow the user to group documents together. Users can query collections based on multiple attribute-value constraints. Document stores are useful to store different kinds of objects and to make queries on attributes those objects share. Other notable examples are CouchDB, SimpleDB and TerraStore.
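To make the schema-free model concrete, here is a toy sketch, in plain Python, of a document collection and of a query on shared attributes; the documents and the find helper are purely illustrative and not tied to MongoDB, CouchDB or any other product.

```python
# Toy document collection: each document is a dict with its own attributes,
# no global schema is imposed, and values may be nested.
employees = [
    {"_id": 1, "name": "Alice", "address": {"city": "Brussels", "country": "BE"}},
    {"_id": 2, "name": "Bob", "email": "bob@example.org"},
]

def find(collection, **constraints):
    """Return the documents whose top-level attributes match all constraints."""
    return [doc for doc in collection
            if all(doc.get(attr) == value for attr, value in constraints.items())]

print(find(employees, name="Alice"))
```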

2.1.3 Extensible Record Stores

These systems, also known as wide column stores, and probably motivated by Google's success with BigTable, store extensible records. Extensible records are hybrids between tuples, which are simple rows of relational tables with predefined attribute names, and documents, which have attribute names defined on a per-record basis. Indeed, extensible record stores have families of attributes defined in a global schema, but inside these families new attributes can be defined at run-time. The extensible record store data model relies on rows and columns that can be partitioned vertically and horizontally across nodes to ensure scalability. Rows are split across nodes based on the primary key; usually they are grouped by key range and not randomly. Columns of a table are distributed across nodes based on user-defined "column groups", which regroup attributes that are usually best stored together on the same node. For instance, all the attributes of an employee concerning his address (address, city, country) will be placed in one column group, and all the attributes concerning the means of contacting him (email, phone number, fax number) will be stored in another column group. Like document stores, extensible record stores are useful to store different kinds of objects and to make queries on shared attributes. Moreover, they can provide higher throughput, at the cost of a bit more complexity for the programmer when defining the column groups. Notable examples are HBase, Cassandra and HyperTable.
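A small sketch of how the extensible-record model can be pictured: the column families are fixed by a global schema, while the columns inside a family are created per row at run time. The layout and helper below are our own illustration, not the API of HBase, Cassandra or HyperTable.

```python
# Column families are fixed by the global schema; columns inside a family are per-row.
COLUMN_FAMILIES = ("address", "contact")

table = {}  # row key -> {column family -> {column -> value}}

def put(row_key, family, column, value):
    if family not in COLUMN_FAMILIES:
        raise ValueError(f"unknown column family: {family}")
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

put("employee:42", "address", "city", "Louvain-la-Neuve")
put("employee:42", "contact", "email", "someone@example.org")
# All 'address' columns of a row belong to the same family and are stored together.
print(table["employee:42"]["address"])
```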

Figure 2.3: Data organisation in an extensible record store.

2.1.4 Relational Databases

These systems store, index, and query tuples via the well-known SQL interface. They offer less flexibility than document stores and extensible record stores, because tuples are fixed by a global schema defined during the design of the database. Moreover, the classical relational database model is not well suited for scalability [29]. There are several proposed solutions to scale the database [13], but they all suffer from disadvantages. A classical solution is to use a master/slave approach to distribute the work: the slaves handle the reads and the master server is responsible for the writes. The first drawback is eventual consistency: each slave has its own copy of the data, and even if we normally have near real-time replication, we do not have the strong consistency which is sometimes needed. The second immediate drawback is that the master server quickly becomes a bottleneck when the amount of writes increases. Cluster computing solutions improve on this by using the same data for several nodes, but with only one node responsible for writing. They thus provide strong consistency, but the bottleneck problem remains. Finally, the shared-nothing architecture, introduced by Google [10], should scale to an infinite number of nodes, because each node shares nothing at all with the other nodes. In this approach, each node is responsible for a different part of the database and has its own memory, disk and CPU. In order to divide the database, which is sometimes called sharding the database, we split the tables into several non-overlapping tables and dispatch these tables to different shards, which thus share nothing, so that the load is divided between them. Usually the cutting of the tables is done horizontally. This means that different rows are assigned to different shards given a partition criterion [39] based on the value of a primary key. The partition criterion can be range partitioning (the shard is responsible for a range of keys), list partitioning (the shard is responsible for a given list of keys) or hash partitioning (the hash of the key determines the shard responsible for the key). To achieve redundancy each shard is replicated; in MySQL Cluster [20], for example, each shard is replicated twice. But to implement this solution correctly, several challenges have to be solved. In particular, how do we partition the data into multiple non-overlapping shards with the load fairly divided between them? The answer to this question is closely related to the application area. The splitting is natural if, for example, the table to split contains data for American and European people, but in most cases it can be quite tricky.
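The three partition criteria mentioned above can be summarised in a few lines; the shard count, key ranges and country list below are arbitrary illustrative choices, not values taken from MySQL Cluster or any other system.

```python
import hashlib

NUM_SHARDS = 4

def shard_by_hash(primary_key: str) -> int:
    """Hash partitioning: the hash of the key determines the responsible shard."""
    digest = hashlib.sha1(primary_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def shard_by_range(primary_key: int) -> int:
    """Range partitioning: each shard owns a contiguous range of keys (here 1000 keys)."""
    return (primary_key // 1000) % NUM_SHARDS

EUROPEAN = {"BE", "FR", "DE"}

def shard_by_list(country_code: str) -> int:
    """List partitioning: the shard is chosen from an explicit list of key values."""
    return 0 if country_code in EUROPEAN else 1

print(shard_by_hash("user:zulag"), shard_by_range(2421), shard_by_list("BE"))
```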

Figure 2.4: In a relational database data can be subdivided and accessed via fixed fields.

Those relational databases have presented improved horizontal scalability, provided that the operations do not span many nodes. While they are not as scalable as the previously mentioned datastores, they might be in the near future. The appeal of relational databases is obvious: they have a well-established user base and support from their community, which means there are already multiple existing tools ready to be used with them. Furthermore, they have ACID properties, which generally makes life easier for the programmer. Notable examples are MySQL Cluster, VoltDB, Clustrix, ScaleDB, ScaleBase and NimbusDB.

2.2 Peer-to-peer systems

We also decided to take a close look at peer-to-peer (P2P) systems. They are an interesting alternative to classical client/server systems because they allow a more efficient use of resources like bandwidth, CPU and memory. This is because every peer is equivalent in the application and has a dual client/server role, and can therefore serve content like a classical server, sharing the load between the members of the network. Moreover, because of this dual role, the availability of the content increases with the network size, encouraging scalability of the system, which is a property we are very much interested in. P2P systems also have the crucial property that they have neither a central point of failure nor a central point of coordination, which often becomes a bottleneck when the system needs to grow. These properties are extremely important in distributed computing because they increase the robustness of the system as well as its scalability. There are three main categories of P2P systems [31], which vary according to their topologies and their look-up performance. The first and oldest relies on a central index maintaining a mapping between the file references and the peers holding the files. This index is managed by central servers that provide the look-up service. This contradicts what we just said about peer equivalence and implies that this generation is not a true peer-to-peer system. A peer wanting to access some file must first connect to this server to find the peers responsible for this data, and then it can connect to the peer holding the data. This is shown in Figure 2.5. This is the solution developed by Napster, the famous file-sharing system.

Figure 2.5: P2P system relying on a central index to look up files: A) Searching Node (0) asks the Central Server (CS) where it can find a given file. B) The CS gives the address of node 3 to node 0. C) Node 0 retrieves the file directly from node 3.

The second category does not rely on any server to perform queries and has an unstructured topology. The connections between the peers in the network are therefore established arbitrarily. In this category of P2P systems, there is no relation between a node and the data for which it is responsible. It follows that the look-up mechanism must be a flooding-like mechanism. In Gnutella, the flooding algorithm has a limited scope in order to limit the number of messages exchanged. Therefore, it can happen that a value present in the network is not found: a query may not reach the peer holding the value because the flooding diameter was too small. This is illustrated in Figure 2.6.

Figure 2.6: P2P system using flooding to look up files: A) Searching Node (0) floods the network with a request for a file B) A query reaches node 2 which hosts a corresponding file and responds directly to 0. Note that if a query has a time to live of 2 and if nodes 1, 2 and 3 host a corresponding file, only nodes 1 and 2 will respond to 0 as 3 is too far away from 0.

In order to provide look-up consistency, the flooding diameter must be N, with N being the number of peers in the network; however, this would not scale in large systems. In order to resolve this problem, the third generation of P2P systems changed from an unstructured to a structured topology, drastically improving look-up performance. Distributed hash tables (DHT) are the most frequent abstraction used by P2P systems with a structured topology. We take a closer look at them in the next section.

2.3 DHT

DHTs were designed in order to solve the look-up problem present in many P2P systems [3]. They provide the same operations to store and retrieve key/value pairs as a classical hash table. A key is what identifies a value; the value is the data you want to associate with this key. As an example, consider a movie named "Why DHTs are fun.avi" and the actual file containing the movie: the key would logically be the title of the film and the value is the file. Each peer in a DHT system can handle key look-ups and key/value pair storing requests, avoiding the bottleneck of central servers. Another problem addressed by those systems is the division of the responsibility for the key/value pairs between the peers. Each key/value pair and each peer has an identifier. The identifier domain can be anything; taking the example of a Chord-like DHT, the identifier is an integer between 0 and N, where N is a chosen parameter. Those identifiers are used to determine which peers are responsible for which key/value pairs. Each peer has an interval it is responsible for; this interval is computed based on its identifier and the other peers in the network. Taking the example of Chord again, a peer is responsible for all the identifiers between its own identifier and the identifier of the next peer in the network, the latter not included. A peer stores all the key/value pairs with an identifier in its interval. The identifiers are most often computed using a consistent hash function. Assuming each peer has an associated IP address, its identifier is computed by applying this function to its IP address. Some systems allow a peer to choose its identifier. The identifier of a key/value pair is computed by taking the hash of its key. The use of a consistent hash function to compute identifiers allows a roughly fair division of the key space between peers, which is a crucial point for scalability. Moreover, this kind of hash function has the advantage that adding a peer to the system does not cause a lot of identifiers to be remapped to other peers, which improves the elasticity of the system. DHTs, as said in the peer-to-peer section, are the third generation of peer-to-peer systems. Compared to the previous generation, they mainly solve the scalability problems of the look-up mechanism. Indeed, we now have a relation between the key of a value and a peer, which permits better look-up performance by routing the look-up request to the responsible peer instead of flooding the network, which was not scalable.
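To fix ideas, the sketch below shows one way this mapping of keys to peers can be computed, assuming a small identifier space, SHA-1 as the consistent hash function, and the responsibility rule described above (a peer owns the identifiers from its own up to, but not including, its successor's). It illustrates the principle only and is not the code of any particular DHT.

```python
import hashlib

ID_SPACE = 2 ** 16  # size of the identifier space (toy value)

def identifier(name: str) -> int:
    """Consistent hash of a peer address or of a key into the identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ID_SPACE

# Peer identifiers obtained by hashing their IP addresses.
peers = sorted(identifier(addr) for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def responsible_peer(key: str) -> int:
    """A peer owns the identifiers from its own id up to, but not including,
    the id of the next peer on the ring (with wrap-around)."""
    kid = identifier(key)
    owners = [p for p in peers if p <= kid]
    return owners[-1] if owners else peers[-1]

print(responsible_peer("Why DHTs are fun.avi"))
```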

2.4 Study of scalable key/value stores properties

Bwitter is built on top of a key/value datastore. Key/value datastores are systems that implement a DHT and offer other services on top of it. We now compare possible design choices when implementing systems offering a DHT abstraction. The comparison is based on the following criteria: consistency model, replication strategy, storage abstraction, network topology, churn, transactional support and finally security.

2.4.1 Network topology

The network topology refers to how the peers are organised in the network, and there may be important differences between the different DHT implementations. It is also a crucial design point, because it deeply influences the performance of the look-up mechanism as well as the fault tolerance of the network. We will take a look at some important network topologies. In Chord-like topologies [30], nodes are organised in a ring (see Figure 2.7) and keep a list of successors and predecessors as well as a routing table, which is filled with fingers chosen according to various policies. We call a finger a reference to another peer in the system; usually it is the IP address of this peer. The size of the routing table varies among systems: Chord keeps log2(N) fingers, where N is the number of nodes in the system, while DKS, which is a generalization of Chord, keeps logk(N) fingers, where k is a predefined constant. This is a trade-off between better look-up performance and bigger routing tables. We summarize the most common choices in Table 2.1. Each Chord node also keeps log2(N) successors in its successor list, in order to recover from node failures. This topology is widespread because it allows efficient routing as well as easy self-organisation upon joins, leaves and failures.

Beernet's topology is similar to Chord's but differs in one crucial point. In Chord, nodes must be connected with their direct predecessor; in Beernet they only need to know the key of their predecessor, creating a branch when a node cannot join its direct predecessor. This property is the reason why the topology of Beernet is called the relaxed ring (see Figure 2.7). Indeed, when a node does not have the link towards its predecessor the ring is not perfect. This topology is more resistant because it makes fewer assumptions while preserving consistent look-up. You can find more information about Beernet's topology in [19].

Scalaris currently relies on a Chord topology too. The Scalaris team is currently working on adopting another Chord-like topology called Chord#, which is very much like classic Chord except that it stores keys in lexicographical order. Furthermore, the routing is not done in the key space but rather in the node space. This allows range queries and allows the application to choose where to place the data in the ring [32].

Number of fingers        Look-up performance
O(1)                     O(n)
O(log(n))                O(log(n)/log(log(n)))
O(log(n))                O(log(n)) (more common)
O(√N)                    O(1)

Table 2.1: Number of fingers versus Look-up performances for N nodes in the network.
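As a complement to the table, the classical Chord rule for choosing fingers can be written down explicitly. The sketch below only computes the identifiers a node aims its fingers at, assuming an m-bit identifier space; finding the peers actually responsible for those identifiers is the look-up mechanism itself.

```python
M = 16  # bits of the identifier space, so identifiers live in [0, 2**M)

def finger_targets(node_id: int) -> list:
    """Chord aims finger i at identifier (n + 2**i) mod 2**M, for i = 0 .. M-1."""
    return [(node_id + 2 ** i) % (2 ** M) for i in range(M)]

# Each successive finger doubles the distance covered on the ring, which is what
# yields O(log N) routing hops with O(log N) routing-table entries.
print(finger_targets(1000)[:5])
```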

Chord, as well as Beernet, does not take advantage of the underlying physical topology. Pastry [28], Tapestry and Kademlia [15] also assume a circular key space, but try to tackle this problem by keeping a list of nodes which they can reach with low latency. They choose their fingers giving preference to nodes in that list.

We finally detail the topology of CAN [24], because it differs significantly from the topology of the other DHTs. Nodes are organised so that they divide a virtual d-dimensional Cartesian coordinate space. Each node is responsible for a part of this space. In order to join the network, a node, which we call A, chooses a random point in the space. It then contacts the node responsible for this point, called B. Finally, B splits its zone in two, giving the half of the zone it was responsible for to A. Nodes only maintain routes towards their immediate neighbours. In CAN, two nodes are neighbours if their zones touch along d − 1 dimensions. To picture this, imagine a square (2 dimensions) divided into rectangular chunks, which correspond to zones; two nodes would be neighbours if the rectangles they are responsible for have an edge in common. According to the results in [24], for a d-dimensional space partitioned into n zones, the average routing path length is (d/4) × n^(1/d) hops, and the average path length is thus O(n^(1/d)). You can observe that the average path length decreases as the number of dimensions increases, but this comes at the cost of higher space complexity for maintaining the routing tables. Moreover, each join and leave becomes more costly as the number of dimensions increases: the number of neighbours of a node increases, and thus the complexity of maintaining routing table consistency grows. However, the topology is not linked with the physical topology of the nodes. You can see an example of how nodes are organised in CAN for d = 2 in Figure 2.7, where each rectangle represents a zone controlled by a node.

Figure 2.7: From left to right, the ring overlay (Chord), the relaxed ring overlay (Beernet) and a 2-dimensional CAN overlay.

2.4.2 Storage abstraction

As mentioned before, key/value stores allow all the operations provided by classical hash tables on key/value pairs, namely look-up, store and delete operations. To be clear, a key is uniquely associated with a value: storing another value with the same key will erase any previously stored value. Beernet, Redis [25] and OpenDHT [26] additionally allow working with key/set pairs, where each key can be associated with a set that can contain multiple values; a look-up on a key associated with a set thus returns all the values in the set. OpenDHT only works with key/set pairs, which leads to more complex algorithms for the applications using it.

15 2.4.3 Replication strategy and consistency model

In order to provide redundancy, these systems often provide replication services. Those vary according to the guarantees they offer: improved reliability of the system and/or availability. Replication is done by storing a value at k different nodes instead of only one; k is called the replication factor. Beernet and Scalaris offer symmetric replication with strong consistency, using a transactional layer built on top of their DHT implementation. Strong consistency means that read operations always return the latest correctly written value; this is achieved by always writing to and reading from a majority of the replica set. In symmetric replication [12], each node identifier is associated with a set of (k − 1) other node identifiers, which we call the replica set. When using replication, a key/value pair is stored at the node responsible for the identifier of "key" and at all the nodes which are responsible for an identifier inside the replica set. Nodes maintain routes towards the nodes with "symmetric identifiers" so that they can directly contact any of the replicas of the key/value pairs they are responsible for. Strong consistency between replicas does not come for free: each time a value is accessed, a majority of the replicas must be contacted. Thus, in such a scheme, it is not possible to increase the availability of the content through replication. We address this problem in section 3.2.3. Beernet currently does not handle the restoration of the replication factor when a node fails abruptly.

CAN does not have consistency problems because it works with immutable content, meaning that values cannot be updated. This is a clear limitation when implementing a social network where updates are frequent. CAN proposes replication through what they call realities. A node, when joining the network, joins r coordinate spaces and is in charge of a different zone in each space; each coordinate space is called a reality. When a key/value pair is added, it is added in all the realities. Therefore, because the nodes are in charge of different zones in different realities, different nodes are in charge of the newly added pair. To create these realities, a different hash function is applied to map the node to different coordinates in each reality. This strategy, like every strategy relying on different hash functions, has two major drawbacks compared to symmetric replication [12]. First, the inverse of the hash function is not computable. Therefore, it is not possible to recover the original key before hashing, while it is needed to fetch the value from the remaining replicas. Moreover, because of the distribution properties of hash functions, and even if it were possible to find the inverse, the other replicas would be spread all over the remaining nodes. This would force the node in charge of restoring the replication factor to contact a multitude of nodes. In conclusion, because we cannot find the inverse of the hash function, the replication degree of pairs decreases at each node failure.

Pastry uses a different approach based on leaf sets, which is close to the successor set approach. As for CAN, Pastry assumes that values are immutable, so there is no problem of consistency between the replicas, but this comes at the cost of no updates of values. Pastry stores the replicas at the nodes that have the closest ids with respect to the value's key. So if the replication factor is k, you have k/2 replicas before and after the key. In

the successor set approach, all the replicas are stored at the k successors of the key. This strategy allows the replication factor to be maintained, because it is possible to find the other replicas, contrary to the CAN strategy. But the algorithms to maintain the replication factor are expensive compared to the cheap symmetric replication strategy [12].
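Symmetric replication as used here boils down to one formula: with an identifier space of size M and a replication factor k, the replica set of identifier i is {(i + j·M/k) mod M : j = 0 … k−1}. A small sketch with toy numbers:

```python
ID_SPACE = 2 ** 16   # size of the identifier space (toy value)
K = 4                # replication factor

def replica_set(identifier: int) -> list:
    """Symmetric replication: k identifiers spaced evenly around the ring."""
    step = ID_SPACE // K
    return [(identifier + j * step) % ID_SPACE for j in range(K)]

# Any member of the set can recompute the whole set from its own identifier,
# which is what allows a surviving replica to restore the replication factor.
print(replica_set(12345))
```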

2.4.4 Transactions

While not many key/value datastores offer the possibility to do transactions, it is a crucial feature. "A transaction is a group of operations that have the following properties: atomic, consistent, isolated, and durable (ACID)"1. A transaction can have two outcomes: abort or commit. When a transaction commits, we can be sure that all the operations inside the transaction have been done successfully. On the other hand, if a transaction aborts, we know none of the operations have been done. We know of only two key/value datastores that implement transactions: Beernet and Scalaris. Transactions are usually achieved using a two-phase commit (2PC) algorithm. The two phases are the validation phase and the write phase. Both phases are supervised by a Transaction Manager (TM), while all the nodes responsible for the involved items become Transaction Participants (TP). During the validation phase, the TM tries to lock the involved resources on every TP. If the TM receives an abort message, the operation is aborted. Otherwise, the TM sends a commit message to all the TPs, making the update permanent and releasing the locks.

Figure 2.8: Two-Phase Commit protocol (left) reaching termination and (right) not reaching termination, image taken from [19].

A serious problem could arise if the TM fails during this operation: the locks would not be released, as you can see in Figure 2.8. This is why some systems, such as Beernet and Scalaris, decided to add replicated Transaction Managers (RTMs) that can take over in case the TM fails. This transaction algorithm is based on the Paxos consensus algorithm. Beernet adds a phase to the 2PC algorithm before registering the locks. In the first phase, the client, who is the original TM, does read and write operations without taking any locks. In a second phase, and before committing the transaction, it registers

1 MSDN, what is a transaction? http://msdn.microsoft.com/en-us/library/aa366402(VS.85).aspx, last accessed 13/08/2011.

with a group of replicated transaction managers that can, as said before, take over the transaction if the main TM fails. It then does the prepare phase of the 2PC and sends a message to all the TPs in order to take the locks on the required items. The TPs send the result of the transaction to each of the RTMs, which then send their results to the main TM. The TM can then decide to commit or abort the transaction if a majority of the RTMs have voted the same way. When the TM has taken its decision, it sends a final message to the TPs so that they can release the locks. This algorithm is said to be eager because modifications are done optimistically, before requesting any lock, in the read phase. The algorithm assumes that the majority of the TPs and the TM survive during the transaction. You can find more details about this algorithm in Jim Gray and Leslie Lamport's article "Consensus on transaction commit" [14].
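A much simplified sketch of the plain 2PC flow described above, with a single transaction manager, no replicated transaction managers and no Paxos, only to make the prepare/commit message exchange explicit:

```python
class Participant:
    """A transaction participant (TP): a node holding one of the involved items."""

    def __init__(self, name):
        self.name = name
        self.locked = False    # lock on the local item
        self.prepared = False  # did this TP vote commit for the current transaction?

    def prepare(self) -> bool:
        # Validation phase: try to lock the local item and vote commit or abort.
        if self.locked:
            return False       # item already locked -> vote abort
        self.locked = True
        self.prepared = True
        return True

    def commit(self):
        # Write phase: make the update permanent, then release the lock.
        self.locked = False
        self.prepared = False

    def abort(self):
        # Nothing was written yet; just release the lock if this TP took it.
        if self.prepared:
            self.locked = False
            self.prepared = False


def two_phase_commit(participants) -> str:
    """The transaction manager (TM) side of plain 2PC."""
    votes = [tp.prepare() for tp in participants]
    if all(votes):
        for tp in participants:
            tp.commit()
        return "commit"
    for tp in participants:
        tp.abort()
    return "abort"


print(two_phase_commit([Participant("tp1"), Participant("tp2")]))  # -> commit
```

As Figure 2.8 illustrates, the weakness of this basic form is precisely that a crash of the single TM between the two phases leaves the locks held, which is what the replicated transaction managers address.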

2.4.5 Churn

We define churn as John Buford, Heather Yu and Eng K. do in "P2P Networking and Applications" [5]: churn is "the arrival and departure of peers to and from the overlay, which changes the peer population of the overlay". This is important for DHTs that want to have good elastic properties in order to handle high rates of churn.

Let us take a look at how a classical Chord-like DHT handles the joining of nodes. As in any peer-to-peer network, a joining node needs to know how to contact a node already in the network. It first contacts this node, which routes it towards the node responsible for inserting it into the ring. This last node is the successor in the ring of the joining node, thus the node whose identifier follows the identifier of the joining node. There are now two steps to perform to enter the ring: contact the successor to warn it that its predecessor has changed, and contact the predecessor to warn it that the joining node is its new successor. This is not robust, as a failure of the nodes, or a networking problem that prevents the new node from reaching its predecessor, can create a broken ring. Beernet solved this problem with its relaxed ring, as explained when discussing the network topology in section 2.4.1. It adds a phase to this protocol during which the joining node signals to the successor node that it has correctly contacted the predecessor. The successor can then remove its pointer to the old predecessor, as you can see in Figure 2.9. Therefore, this algorithm maintains look-up consistency and tolerates network failures. After joining the ring, the new node has to retrieve the key/value pairs it is responsible for. It can do so by contacting its successor, which was in charge of those values before.

When a node wants to leave the ring, the opposite operations are done. If it is a gentle leave, the node sends the data to the nodes now responsible for the values it hosted, and it tells its neighbours to update their pointers. However, if it is an abrupt leave, the other nodes have to detect the absence of the node and have to execute more complex algorithms to find the remaining nodes responsible for the data the missing node hosted. This operation varies a lot according to the replication strategy, as explained in the replication strategy point. In any case, it is a heavy and complex operation that should be avoided if possible by leaving the network gently.

Figure 2.9: The join algorithm: A) Q contacts the successor R. B) R accepts the insertion and replies with P's address; R now considers Q as its predecessor but keeps P in its predecessor list; Q contacts the predecessor P. C) Q tells P that it is its new successor and P accepts it. D) Q tells R the insertion was successful and R drops P from its predecessor list. Image taken from [19].

It is thus obvious that, while those mechanisms ensure the survivability of the system in an environment where nodes can fail or disconnect abruptly, the performance is going to be better if the nodes use gentle leaves.

2.4.6 Security

There are numerous known attacks against DHT-based systems [34]. Many DHTs are able to work under the assumption that the number of malicious nodes stays lower than a certain fraction f of the total number of nodes. In the case of Sybil attacks, a malicious user inserts many malicious nodes into the system in order to go over that limit. Once the attacker has enough malicious nodes in the system, it can easily interfere with the routing and replication algorithms. In the case of Eclipse attacks, a malicious node can "eclipse" a correct node by manipulating all the neighbours responsible for pointing to that node so that they skip it, meaning no one can access it anymore. Those attacks can lead to routing and storage disruption if malicious nodes work together to deny requests or to return different values than the ones expected. Assuring security in such systems when they are running in open environments such as the Internet is thus a tough challenge. Note that those attacks are possible only if the DHT accepts nodes from untrusted users. While most of the DHT-based systems we know do not currently provide such a security level, we have good reasons to believe those issues are being worked on. Still, we need to keep those issues in mind when designing our architecture. If a malicious user has access to the datastore, he can also try to delete, edit, or forge data, causing damage to the application using that data. Those attacks are generally

avoided by using capability-based security. The idea is that if the attacker does not know where to look, he will not be able to find the data, as it is stored at unguessable keys. OpenDHT goes even further and offers a secret mechanism, allowing users to associate a secret with a given value. If anyone wants to delete that value, he has to provide that secret. Note that in OpenDHT you cannot replace a value by another, as multiple values can be stored at a given key. Doing a put on a key will thus only add the value to the set.
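The two ideas of this paragraph, unguessable keys and OpenDHT-style delete secrets, can be sketched as follows; the function names and the storage layout are ours and purely illustrative.

```python
import hashlib
import os

store = {}  # key -> (value, hash of the delete secret)

def put_with_secret(value):
    """Store a value at an unguessable key; return (key, secret) as capabilities."""
    key = os.urandom(16).hex()      # unguessable key: knowing it grants access
    secret = os.urandom(16).hex()
    store[key] = (value, hashlib.sha256(secret.encode()).hexdigest())
    return key, secret

def delete(key, secret):
    """Deletion succeeds only if the presented secret matches the stored hash."""
    entry = store.get(key)
    if entry is None:
        return False
    if hashlib.sha256(secret.encode()).hexdigest() != entry[1]:
        return False
    del store[key]
    return True

key, secret = put_with_secret("private draft")
print(delete(key, "wrong secret"), delete(key, secret))  # -> False True
```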

2.5 The Cloud

Bwitter is destined to run on the cloud in order to take advantage of its scalable and elastic nature. Everyone has heard about the cloud, but ultimately many different definitions exist, so we make explicit here the definition of the cloud that we are going to use throughout this work. We use the National Institute of Standards and Technology (NIST) definition of cloud computing [22]:

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models."

The five essential characteristics mentioned are on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. On-demand self-service means that the users can adjust the amount of resources whenever they need to, without having to go through a service provider's employee. Broad network access means that the resources can be accessed from a broad range of mechanisms or devices. Resource pooling means that the resources of the provider can be assigned and reassigned to different clients dynamically in order to meet the clients' requirements in the most effective way. Rapid elasticity means that resources can be allocated and removed in a transparent way in order to meet the amount of resources needed by the client. Measured service means that the resources provided are monitored and can be recorded transparently.

The three service models are Cloud Software, Cloud Platform and Cloud Infrastructure as a Service (SaaS, PaaS, IaaS). In the SaaS case, the client has access to software running on the cloud, but has no access to the underlying cloud infrastructure; the application can usually be accessed via a web browser. In the PaaS case, the client is able to deploy applications and manage them on the cloud infrastructure, but does not manage the infrastructure itself. Finally, in the IaaS case, the client can manage the basic resources such as network, processing power and storage, and can furthermore deploy applications and manage them. These models are compared in Figure 2.10. The four deployment models are private, community, public and hybrid cloud. A

Figure 2.10: The three service models compared to a classic model, image taken from [8].

private cloud is owned by an organisation and used only by it, unlike a community cloud, which is shared between a few selected organisations. Those solutions may provide better privacy than a public cloud maintained by an organisation that sells its services to end users or other organisations. A hybrid cloud is a combination of at least two clouds that remain different entities but are bound together in order to allow data and application portability.

2.6 Conclusion

In this chapter we have explored the different types of scalable datastores. We studied the DHTs more deeply, and more particularly the ones offering transactions. The technological advancements in those fields make it possible to build an efficient and robust implementation of a Twitter-like system on top of a peer-to-peer system, taking advantage of their assumed scalability and elasticity properties. In the next chapter we describe two possible architectures for Bwitter.

Chapter 3

The Architecture

In this section we are going to present the architecture of our application. The platform on which an application is based can have an important impact on its architecture. We thus explore the repercussions of having the application run either on a peer-to-peer network based on the users' machines, or on a stable cloud-based platform. The two solutions lead to two radically different architectures in terms of performance and accessibility of the different layers, as well as in terms of security concerns. But before developing the architecture, we take a look at the different requirements of our application, both functional and non-functional.

3.1 The requirements

Bwitter is designed to be a secure social network based on Twitter, and while it looks relatively simple at first sight, it hides some complex functionalities. We included almost all of those functionalities in Bwitter and decided to add some others. We describe here the relevant functionalities that will help us analyse the design of the system, highlight the differences between a centralised and a decentralised architecture, study the feasibility of overcoming the problems described above, and test the system's behaviour when faced with heavy traffic and flash crowds.

3.1.1 Non-Functional requirements

Product requirements

• Scalability: We are facing a system that is continuously growing in terms of users [4] and traffic [33]. It is thus crucial that our system’s performance increases almost linearly with the number of machines we allocate to it; this is known as horizontal scalability. We are not interested here in vertical scalability, i.e. adding or removing resources (CPU, RAM, disk) from an individual machine, as it is harder to achieve dynamically and usually more costly than horizontal scaling.

• Elasticity: As we explained, the capacity of a social network application must be able to vary in real time to follow its load. Such applications must sometimes face high peaks of demand for short periods, but do not need the corresponding amount of resources the rest of the time. A fixed number of nodes is therefore inefficient: to handle peaks of load, one has to over-provision the data center. This is why our system needs to be able to scale up when demand is high and to scale down easily when the peak is over, to avoid wasting resources.

• Fault tolerance, availability and integrity: The system has to be fault tolerant, meaning that even if some machines fail the system as a whole is still able to function. The integrity of the data and the availability of the service also have to be ensured, as they are major requirements of every social network.

• Security: Bwitter must ensure authenticity, integrity and confidentiality of the data posted by users over the whole system. No malicious user should be able to forge, edit or delete data in the system. Finally, Bwitter must forbid access to confidential data such as passwords. These requirements must hold even with Bwitter’s code released as open source.

• Lightness of the application: The end user should only need a fast and light interface performing little computation. The goal is to be as portable as possible, so that smartphones and other devices with less computing power can also use our application. This implies that the heavy computations should be done on the server side.

• Performance: We need good performance for a large number of small reads and writes. Indeed, small values are frequently read, written and updated in social network applications.

Organizational requirements

• Modularity: Our project should be built from different modules, and it should be possible to easily replace one layer by another based on clearly defined interfaces. For instance, the graphical user interface (GUI) module could be desktop based or web based and the main application should not see any difference.

• Open source: We want our project to be released in the wild with its source code available to anyone wanting to experiment with it. This also means that the libraries we use in the development of our system should be open source.

• Use existing technologies: We do not want to re-invent everything on our own, so we decided to rely on existing open source tools during our development.

3.1.2 Functional requirements

Nomenclature

There are only a few core concepts on which our application is based:

• A tweet is basically a short message with additional meta information. It contains a message of up to 140 characters, the author’s username and a timestamp of when it was posted. If the tweet is part of a discussion, it keeps a reference to the tweet it answers and also keeps references to the tweets that are replies to it (a minimal class sketch of these objects is given after this list).

• A user is anybody who has registered in the system. A few pieces of information about the user are kept in the datastore, such as his complete name and the MD5 hash of his password, used for authentication.

• A line is a collection of tweets and users. The owner of the line can define which users he wants to associate with the line. The tweets posted by those users are from then on displayed in this line. This allows a user to have several lines, each with its own theme and associated users.
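
To make this nomenclature concrete, the following sketch shows how the three concepts could be represented as plain Java objects before being serialised (see Section 4.2). Class and field names are illustrative assumptions, not the exact schema of our implementation.

import java.util.ArrayList;
import java.util.List;

// Illustrative data holders only; field names are assumptions, not the thesis' exact schema.
class Tweet {
    String message;          // at most 140 characters
    String author;           // username of the poster
    long   postedAt;         // timestamp (GMT, second precision)
    String inReplyTo;        // key of the parent tweet, null if this is not a reply
    List<String> replies = new ArrayList<>(); // keys of the tweets answering this one
}

class User {
    String username;         // unique identifier in the system
    String realName;
    String passwordMd5;      // MD5 hash of the password, used for authentication
}

class Line {
    String name;             // chosen by the owner, e.g. "coolpeople"
    String owner;            // username of the line's owner
    List<String> members = new ArrayList<>();   // usernames whose tweets feed the line
    List<String> tweetRefs = new ArrayList<>(); // references to tweets displayed in the line
}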

Basic operations

There are many different social networks today, and while each has its own particularities, a few core operations to share, publish or discuss content are almost always present. Based on our own use of social networks and on Twitter’s functionalities, we identified a restricted set of operations our social network has to support.

• Post a tweet: A user can publish a message by posting a tweet. The application posts the tweet in the lines to which the user is associated. This way all the users following him have the tweet displayed in their lines.

• Retweet a tweet: When a user likes a tweet from another user, he can decide to share it by retweeting it. This has the effect of “sending” the retweet to all the lines to which the user is associated. The retweet is displayed in the lines as if the original author posted it, but with the retweeter’s name indicated.

• Reply to a tweet: A user can decide to reply to a tweet. Replying adds a reference to the reply inside the initial tweet. Additionally, a reply keeps a reference to the tweet to which it responds. This makes it possible to rebuild the whole conversation tree.

• Create a line / a list: A user can create additional lines / lists with custom names to regroup specific users / tweets.

• Add and remove users from a line: A user can associate a new user to a line; from then on, all the tweets this newly added user posts will be included in the line. A user can also remove a user from a line: he will then no longer see the tweets of this user in his line and will not receive his new tweets either. Note that if a user re-adds a previously removed user, the tweets posted while that user was still associated to the line will re-appear.

• Add and remove a tweet from a list: A user can store a tweet in a list to be able to retrieve it easily later. The user can also decide later to remove this tweet from the list.

• Read tweets: A user can read the tweets of a line in packs. The size of those packs is a parameter; for example, we can decide to retrieve tweets in packs of 20. He can also refresh a line or a list to retrieve the tweets that have been posted since his last refresh.

3.1.3 Conclusion

We have just presented the requirements and functionalities of our application. The most important requirements are scalability, elasticity, availability and security. The next section details the two possible architectures we elaborated based on these requirements.

3.2 Architecture

As previously mentioned, we now present two different scalable architectures for our application. In both architectures, our application is decomposed into three loosely coupled layers, as shown in Figure 3.1: from top to bottom, the Graphical User Interface (GUI), Bwitter, which handles the operations described in section 3.1.2, and the scalable datastore. The datastore is distributed amongst multiple nodes that we call datastore nodes. In the next chapter, we present Beernet and Scalaris, the two datastores that we have used.

Figure 3.1: Comparison of the architectures. Left) Cloud based architecture. Right) Open peer-to-peer architecture.

This architecture is very modular: each layer can be changed as long as it respects the API of the layer above. We now have to decide where the datastore will run. We have two options: either let the datastore nodes run on the users’ machines, or run them on the cloud. These lead to two radically different architectures: the open peer-to-peer architecture and the cloud-based architecture. In both architectures we try to achieve a secure solution, as building an insecure application would not be realistic. Indeed, if a malicious user could reveal personal information or steal someone’s identity, our application would be both pointless and dangerous. We finally compare the two architectures based on the requirements elaborated in the previous section.

3.2.1 Open peer-to-peer architecture

In a fully decentralised architecture, the user runs a datastore node and the Bwitter application on his machine. The Bwitter application makes requests directly to this local datastore node. Ideally this local datastore node should not be restricted to the Bwitter application, but should also be accessible to other applications. The problem with this approach is that the user can bypass protection mechanisms enforced at a higher level by accessing the datastore’s low level functions. Usually this is not a problem, as untrusted users would not know at which key the data is stored and so cannot compromise it. But in our case, the data has to be at known keys so that the application can dynamically retrieve it. This means that any user understanding how our application works would be able to delete, edit or forge lines, users, tweets and references. This would be a security nightmare.

We tried to tackle this problem with the secret mechanism we designed to enrich Beernet’s interface, which is presented later. But while it prevents users from editing or deleting data they did not create themselves, we could not prevent them from forging elements. To avoid this we need a way to authenticate every piece of data posted by a user. This could be done by enforcing authentication at the datastore level, but this is a feature that is not always provided. We could also do this at the application layer. Indeed, assuming that each user has a public and private key pair, we could authenticate all the data posted using asymmetric cryptography. However, this would require a cryptographic operation for each read and write. It would also force users to store their private and public keys either on the datastore, on their local machine, or a mix of both. A possible solution would be to have users store their public key in the datastore at a public location, so anyone needing the public key can retrieve it easily. The private key of a user would be stored at a private location that only he can find back, for example using a key that is the hash of his password concatenated with his username. Additionally, a sealed local cache could be maintained on the user’s machine containing his private key and the public keys of all the users with whom he has contacts. This cache avoids the constant reloading of all the needed keys each time the user wants to use the application. Furthermore, public keys are values that seldom change. If a cryptographic problem is encountered while using a key from the cache, the key is reloaded from the datastore in order to avoid problems due to cache corruption or a public key changed by its owner.

Even with those mechanisms in place, we have to enforce security at the datastore level. Beernet uses encryption to communicate between different nodes to avoid leaking confidential information. But anyone could add modified Beernet nodes behaving maliciously. Aside from the usual attacks presented in our state-of-the-art, a corrupted node could be modified to reveal all the secrets inside the requests going through it. Scalaris faces the same problem, as its code is widely available too. We thus have to make sure that the code running the datastore node is not modified, so we need a mechanism that enforces remote attestation as described in [38]. This can be done by using a Trusted Platform Module (TPM) [37], which provides cryptographic code signatures in hardware, on the users’ machines in order to be able to prove to other datastore nodes that the client’s node is trustworthy. Until a datastore node has a way to tell for sure that it can trust another datastore node, we are in a dead end. This is especially true for Beernet’s new secret mechanism described in section 4.4.1, as anyone stealing the secret of another user can erase any data posted by that user.

Assuming that a Twitter session is short, there could be a problem if our application is the only one running on top of our datastore. Indeed, it would result in nodes frequently joining and leaving the network with a short connection time. Each of those changes in the topology of our datastore modifies the keys for which the nodes are responsible and triggers key/value pair reallocations, leading to an important and undesirable churn. This would not be an ideal environment for a DHT.
Furthermore, as we saw in the state-of-the-art, DHT based datastores such as Beernet and Scalaris are still exposed to attacks such as Sybil and Eclipse attacks if they accept malicious nodes.

In our requirements we stated that the system has to be fault tolerant and that the integrity of the data must be preserved. The integrity of the data is guaranteed thanks to the replication at the datastore level. Because this environment is not stable, we need a higher replication factor than usual. The impact is twofold. First, peers are responsible for more keys, worsening the already important churn. Secondly, each transaction involves more peers, which degrades the overall performance of the system. In conclusion, this solution has the advantage of providing free computing power that automatically grows with the number of users. But scalability, elasticity and security are compromised due to the lack of control over the machines and to the difficulty of controlling direct access to the datastore by users. We now take a look at the alternative architecture based on the cloud.

3.2.2 Cloud Based architecture

With this architecture the Bwitter and datastore nodes run on a cloud platform. A Bwitter node is a machine running Bwitter and generally also a datastore node. This solution offers good elastic properties assuming we have an efficient cloud service, meaning that we can quickly obtain machines ready for use. We can thus add or remove Bwitter and datastore nodes to meet the demand, optimizing our use of the machines. This solution also allows us to keep a stable DHT, as nodes are not subject to high churn as was the case in the first architecture we presented. Hence, a lower replication factor is acceptable, which should boost the performance. Moreover, communications between nodes should be much quicker in a cloud infrastructure than between nodes spread over the world, which in turn increases performance. Finally, all the nodes are managed by us, so no Eclipse or Sybil attacks are possible in this case.

Using this solution we do not have all the security issues we had with the open peer-to-peer architecture. Indeed, the users no longer have direct access to the datastore nodes, but have to go through a Bwitter node, which limits their possible actions to the operations defined in section 3.1.2. Furthermore, the communication channel between the GUI and the Bwitter nodes can guarantee the authenticity of the server and the encryption of the data being transmitted, for instance using HTTPS. Bwitter requires users to be authenticated to modify their data, which provides data integrity and authenticity. For instance, Bwitter does not permit a user to delete a tweet that he did not post, or to post a tweet using the username of someone else. The malicious revelation of user secrets due to a corrupted node is not relevant anymore as the datastore is fully under our control.

The cloud based architecture is more secure and more stable, and offers obvious advantages for scalability and elasticity. This is why we have finally chosen to implement this solution; we now take a closer look at how the layer stack is built.

The lowest layer, the datastore, runs on the cloud and is hidden from the outside, which means no user can access it directly; all the attacks targeting the datastore are thus avoided. Indeed, all the accesses to the datastore are done via Bwitter. This layer is monitored in order to detect overload and, taking advantage of the cloud, datastore nodes are added and removed on the fly to meet the demand.

The intermediate layer, Bwitter, also runs on the cloud and communicates with the datastore nodes and the GUIs. A Bwitter node is connected to several datastore nodes. It has an internal load balancer that dispatches work fairly over the datastore nodes. The load balancer is the Scalaris Connection Manager (SCM) that we present in section 5.3 of the implementation chapter. In practice, the Bwitter nodes are not accessible directly; they are accessed through a fast and transparent reverse proxy that splits the load between Bwitter nodes. We also designed a module that runs in parallel with the SCM and that we call the Node Manager (NM). It is responsible for the bootstrapping of the ring as well as adding nodes if needed. However, we do not have any module responsible for deciding when a new node should be launched. The Bwitter nodes offer a REST-like (Representational State Transfer [27]) API to the higher layer. This means, among other things, that they are completely stateless, which is important because it improves the clarity of the code and makes it easier to produce bug-free code. Being stateless means that the application does not have to keep information for each client. It can thus scale more easily with the number of clients and also allows requests from the same client to be dispatched to different nodes, removing the burden of managing sessions. Some values can be frequently accessed in a social network, so a caching system is crucial to achieve decent performance. We thus decided to add a cache at this level in order to reduce the load on the datastore. We go into more detail about the cache in the next section. A similar cache mechanism in the decentralized architecture would not be useful. Indeed, the advantage of the cache is that it contains values that are likely to be accessed by several users; if there is only one user accessing it, the gain will probably be very small.

The top layer is the GUI; it connects to a Bwitter node using a secure connection channel that guarantees the authenticity of the Bwitter node and encrypts all the communications between them. Multiple GUI modules can, of course, connect to the same Bwitter node. The GUI layer is the only one running on the client machine.

3.2.3 The popular value problem

Describing the problem

Given the properties of our datastores, both based on DHTs, a key/value pair is mapped to f nodes, where f is the replication factor, chosen according to the desired redundancy level. This implies that if a key is frequently requested, the nodes responsible for it can be overloaded while the rest of the network is mostly idle. Therefore, adding additional machines is not going to improve the situation. It is not uncommon on Twitter to have wildly popular tweets that are retweeted by thousands of users. In the worst cases, retweets can be seen as an exponential phenomenon, as all the users following the retweeter are susceptible to retweet it too.

The solution: use an application cache

Adding nodes does not solve the problem because the number of nodes responsible for a key/value pair does not change. In order to reduce the number of requests reaching those nodes, we have decided to add a cache with a Least Recently Used (LRU) replacement strategy at the application level. This cache keeps the last values read. We keep, associated with each key/value pair in the cache, a timestamp indicating the last time the value was read. When we face a cache miss, we evict from the cache the pair that has the oldest timestamp. This solves the retweet problem because the application has the tweet in its cache from the first request to read the popular tweet onwards. This tweet stays in the cache because users frequently request it. This way we reduce the load on the nodes responsible for the tweet and automatically increase the availability of popular values.

We have to take into account that values are not immutable: they can be deleted and modified. It is thus necessary to have a mechanism to “refresh” the values inside the cache. A naive solution would be to do active polling on the datastore to detect changes to the key/value pairs stored in the cache. This would be quite inefficient, as there are several values, like tweets, that almost never change. In order to avoid polling, we need a mechanism that warns us when a change is made to a key/value pair stored in the cache. The datastore must thus allow an application to register to a key/value pair and to receive a notification when this value is updated. Our application cache then registers to each key/value pair that it currently holds, and when it receives a notification from the datastore indicating that a pair has been updated it updates its corresponding replica. This mechanism has the big advantage of removing unnecessary polling requests. Notifications are asynchronous, so the replicas in the cache can have different values at a given moment, leading to an eventually consistent model for the reads. It is still possible to bypass the cache if strong consistency is needed, but this is application dependent. On the other hand, writes do not go through the cache but directly to the datastore, which keeps strong consistency for the writes inside the datastore. This is an acceptable trade-off, as we do not need strong consistency for most of the reads in Bwitter. For example, it is not a problem to see a deleted tweet in the line of a user for a small period of time.

Beernet, as described in [19], offers such a notification mechanism, making it possible to design an efficient eventually consistent cache. Scalaris however does not provide such a feature, so we needed another solution to avoid active polling. We decided to use a time to live of one minute for the values in the cache, meaning that a value is removed from the cache one minute after it was first read. This way any value read from the cache is at most one minute out of date, which is not a problem.
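
As an illustration, a minimal version of such an application cache could look as follows in Java, combining LRU eviction with the one-minute time to live used with Scalaris. The class and its loader interface are sketches under our own naming, not the actual Bwitter implementation.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal LRU cache with a time to live (illustrative sketch only).
class AppCache<K, V> {
    private static final long TTL_MS = 60_000;    // one minute, as chosen for Scalaris
    private final int capacity;
    private final Function<K, V> loader;           // fetches the value from the datastore on a miss
    private final Map<K, Entry<V>> entries;

    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    AppCache(int capacity, Function<K, V> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // An access-ordered LinkedHashMap gives us LRU eviction of the least recently read pair.
        this.entries = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > AppCache.this.capacity;
            }
        };
    }

    synchronized V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || System.currentTimeMillis() - e.loadedAt > TTL_MS) {
            // Miss or expired entry: reload from the datastore and (re)insert it.
            V fresh = loader.apply(key);
            entries.put(key, new Entry<>(fresh, System.currentTimeMillis()));
            return fresh;
        }
        return e.value;                             // reads may be up to one minute out of date
    }
}

A Bwitter node would consult such a cache for reads, while writes go directly to the datastore as explained above.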

3.2.4 Conclusion

We have presented two different possible architectures: the open peer-to-peer and the cloud based architecture. We summarize in Table 3.1 the differences between the two solutions.

Security. Open peer-to-peer architecture: no control on the DHT, leading to numerous security flaws. Cloud based architecture: full control on the DHT, which is hidden from users; the attack surface is much smaller.

DHT control and stability. Open peer-to-peer architecture: high, uncontrollable and undesirable churn; connections between nodes can be really bad. Cloud based architecture: much stabler environment and possible control on the number of nodes in order to scale up and down.

Costs. Open peer-to-peer architecture: costs are supported by the users (maintenance of a DHT node). Cloud based architecture: high costs, but directly proportional to the resources needed.

Performance. Open peer-to-peer architecture: the number of nodes is normally proportional to the number of users, but the “quality” of the nodes is uncertain. Cloud based architecture: nodes are well connected and the cloud guarantees their performance; control allows optimization.

Cache. Open peer-to-peer architecture: no possible improvement of the performance using a cache. Cloud based architecture: high potential for performance increases using a cache.

Table 3.1: Comparison between the open peer-to-peer architecture and the cloud based architecture.

We have opted for the cloud based architecture as it has numerous advantages compared to the open peer-to-peer one. From a performance point of view, it has better network properties, less churn, a smaller replication factor, and finally a cache can be added to boost the performance. Moreover, the security requirements are hard to achieve in the open peer-to-peer architecture, while most of the security problems are solved simply by moving to the cloud architecture. The only obvious advantage of the peer-to-peer solution is that it is free. In the next chapter we take a look at the datastore we are using and how we represent our data in it.

Chapter 4

The Datastore

In this chapter we take a closer look at the datastores we are going to use: Beernet and Scalaris. From there we identify the design guidelines we followed to build the datastore schema, and then detail that schema. We end the chapter by discussing the problem of running several services on the same datastore, which brings us to the secret API we have designed for Beernet.

4.1 The datastore choice

4.1.1 Identifying what we need

As we saw in the state of the art, there are several types of datastores: key/value stores, document stores, extensible record stores and relational databases. We have only a few types of objects to store in our datastore, namely lines, lists, users and tweets. Furthermore, we do not need complex operations like the joins and queries available in RDBMSs. We want to use a simpler data model to avoid the unnecessary burden of maintaining complex structures. Moreover, we want the most scalable and elastic solution possible, and RDBMS-like systems were shown not to be efficient in those fields. For all those reasons we opted for key/value stores, and more precisely key/value stores with transactional capabilities. Transactions allow us to pack several operations together and execute them atomically: a transaction either executes all those operations successfully, or none of them if it aborts. This allows us to generate unique keys and maintain the integrity of our data structures. Suppose we want to store a value Bar at a key Key; nothing guarantees that something else is not already stored at Key. We thus do two operations: operation A, a look-up on Key; operation B, if the response of operation A was “not found”, we store Bar at Key. But this is not correct on its own, because nothing guarantees that no other operation C on Key happens between operations A and B. We must thus run operations A and B in a single transaction so that no operation C can come in between them. During our discussion of the datastore design in section 4.3, we use the transactional support

to generate unique IDs using counters that we read and increment atomically.

Persistence is a key requirement that we do not address in Bwitter. Unfortunately, the key/value datastores that fulfil our other requirements do not provide persistence. Scalaris is planning to add this feature, but it is still in the development phase. We could use a parallel datastore to do backups as Twitter does [36], but we do not address this problem.

The datastore must be robust in the sense that it must be capable of handling a lot of churn without failing. This is crucial in the case of our fully distributed architecture. Indeed, machines would not be under our control and a large number of machines would constantly join and abruptly leave the system. Our datastore should be able to manage those abrupt leaves, which behave similarly to machine failures, to ensure no data is lost. As we decided to go with the cloud based architecture, we work in an environment where the machines provided are not expected to fail abruptly. Robustness is thus still critical, but the datastore can use more complex algorithms to recover from failures as those are less likely to happen. Although most of the machine leaves and joins are under control, those operations must be efficient in order to have an elastic application. Handling churn correctly means that the datastore must maintain correct routing between the peers as well as the replication factor.
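
Coming back to the look-up-then-store example above (operations A and B), the following sketch shows how it could look in Java against a minimal transactional key/value interface. The TxStore interface and its method names are hypothetical placeholders for illustration, not the actual Beernet or Scalaris API.

import java.util.function.Function;

// Hypothetical minimal transactional key/value interface (illustration only).
interface TxStore {
    interface Tx {
        String read(String key);                 // returns null if the key is unused
        void write(String key, String value);
    }
    <T> T inTransaction(Function<Tx, T> body);   // aborts and retries are handled by the store
}

class StoreIfAbsent {
    // Returns true if "bar" was stored at "key", false if the key was already taken.
    static boolean storeIfAbsent(TxStore store, String key, String bar) {
        return store.inTransaction(tx -> {
            if (tx.read(key) != null) {          // operation A: look up the key
                return false;                    // somebody already used it
            }
            tx.write(key, bar);                  // operation B: store the value
            return true;                         // A and B commit atomically, so no C can interleave
        });
    }
}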

4.1.2 Our two choices

There are several key/value datastores available, but only two offer transactional capabilities: Beernet and Scalaris. Both fulfil our datastore requirements but differ on some points. We now introduce these two datastores.

Beernet

Beernet [19, 23] is a transactional, scalable and elastic peer-to-peer key/value datastore built on top of a DHT. Peers in Beernet are organized in a relaxed Chord-like ring [30] and keep O(log(N)) fingers for routing, where N is the number of peers in the network. This relaxed ring is more fault tolerant than a traditional ring, and its robust join and leave algorithms for handling churn make Beernet a good candidate to build an elastic system. Any peer can perform lookup and store operations for any key in O(log(N)). The key distribution is done using a consistent hash function, roughly distributing the load among the peers. These two properties are strong advantages for system scalability compared to solutions like the client/server model. Beernet provides transactional storage with strong consistency, using different data abstractions. Fault-tolerance is achieved through symmetric replication, which has several advantages, not detailed here, compared to leaf-set and successor-list replication strategies [11]. In every transaction, a dynamically chosen transaction manager (TM) guarantees that if the transaction is committed, at least a majority of the replicas of an item store the latest value of the item. A set of replicated TMs guarantees that the transaction does not rely on the survival of the TM leader. Transactions can involve several items. If the transaction is committed, all items are modified. Updates

are performed using optimistic locking. With respect to data abstractions, Beernet provides not only key/value pairs as in Chord-like networks, but also key/value sets with non-blocking add operations, as in OpenDHT-like networks [26]. The combination of these two abstractions provides more possibilities for designing and building the datastore, as we explain in Section 4.3. Moreover, key/value sets are lock-free in Beernet, providing better performance for set operations.

Elasticity in Beernet

We previously explained that to prevent overloading, the system needs to scale up, allocating more resources to be able to answer an increase in user requests. Once the load of the system gets back to normal, the system needs to scale down to release unused resources. We briefly explain how Beernet handles elasticity in terms of data management.

Scale up: When a node j joins the ring between peers i and k, it takes over part of the responsibility of its successor, more specifically all keys from i to j. Therefore, data migration is needed from peer k to peer j. The migration involves not only the data associated with keys in the range ]i, j], but also the replicated items symmetrically matching that range. Other NoSQL datastores such as HBase [1] do not trigger any data migration when new nodes are added to the system, showing better performance when scaling up.

Scale down: There are two ways of removing nodes from the system: by gently leaving and by failing. It is very reasonable to consider gentle leaves in cloud environments, because the system explicitly decides to reduce its size. In such a case, it is assumed that the leaving peer j has enough time to migrate all its data to its successor, which becomes the new responsible node for the key range ]i, j], i being j’s predecessor. Scaling down due to the failure of peers is much more complicated, because the new responsible node for the missing key range needs to recover the data from the remaining replicas. The difficulty comes from the fact that the value of the application keys is unknown, since the hash function is not bijective. Therefore, the peer needs to perform a range query, as in Scalaris [29], but based on the hash keys. Another complication is that replica sets are not based on key ranges, but on each single key.
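
A small sketch of the responsibility range involved when scaling up, under the assumption that identifiers are plain integers on the hash circle; it only covers the primary key range ]i, j] taken over by the joining peer, not the symmetrically replicated items mentioned above.

// Which keys move when a peer j joins between its predecessor i and its successor k:
// j takes over the range ]i, j], previously held by k. The hash space is a circle,
// so the range test must handle wrap-around. Identifiers are illustrative longs here.
class RingRange {
    // true if key lies in the half-open interval ]from, to] on the identifier circle
    static boolean inRange(long key, long from, long to) {
        if (from < to) return key > from && key <= to;
        if (from > to) return key > from || key <= to;    // interval wraps around zero
        return true;                                      // from == to: a single peer owns everything
    }

    // Keys the joining peer j must receive from its successor (i is j's predecessor).
    static boolean migratesToJoiningPeer(long key, long i, long j) {
        return inRange(key, i, j);
    }
}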

Scalaris

Much like Beernet, Scalaris offers a transactional, scalable and elastic peer-to-peer key/value datastore and is also built on top of a DHT [29]. Scalaris is currently based on a traditional Chord ring, with a possible upgrade to Chord#. While not as fault tolerant as Beernet, Scalaris is a good candidate for building elastic systems too. Lookup and store operations have the same complexity, O(log(N)), where N is the number of peers in the network. Currently the key distribution is done using a hash function, but keys could be ordered lexicographically after the upgrade to Chord#.

As in Beernet, Scalaris provides transactional storage with strong consistency, and fault-tolerance is achieved through symmetric replication. Transactions are taken care of by a local Transaction Manager associated with the node to which the user is connected. Transactions are executed optimistically: a transaction is first executed completely on the associated node, and its result is then stored at the responsible nodes if it succeeded. Besides the classical key/value pairs, Scalaris also supports key/value lists as a data abstraction. Lists, as opposed to Beernet sets, are not lock-free and there is no add operation on lists. In order to add an element to a list atomically, we must, in a single transaction, read the list, add the element to it, and write it back to Scalaris. Lists are thus a convenient abstraction that spares the programmer from developing his own parsing system, but they do not offer any performance improvement.
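
The read-modify-write pattern just described could look as follows. The TxListStore interface is a simplified stand-in used for illustration; it is not the actual Scalaris Java API.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Atomic append to a key/value list via read-modify-write inside one transaction.
interface TxListStore {
    interface Tx {
        List<String> readList(String key);         // empty list if the key does not exist yet
        void writeList(String key, List<String> value);
    }
    <T> T inTransaction(Function<Tx, T> body);     // the store aborts and retries on conflicts
}

class ListAppend {
    static void append(TxListStore store, String key, String element) {
        store.inTransaction(tx -> {
            List<String> current = new ArrayList<>(tx.readList(key)); // read the whole list
            current.add(element);                                      // modify it locally
            tx.writeList(key, current);                                // write it back atomically
            return null;
        });
    }
}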

Conclusion

Beernet and Scalaris both fit our needs with their elasticity and scalability properties and their native data abstractions. Unfortunately, due to some unexpected problems with Beernet we were forced to continue with Scalaris alone. This was disappointing, as we had been working closely with Boris Mejías, the developer of Beernet, to further improve his system with the richer API presented in section 4.4.1.

4.2 General Design

The design of the datastore is closely linked to our application requirements. Hence before going straight into the design of the datastore, we take some time to explain the guidelines we elicited from the requirements to build the datastore’s schema. Some choices might be unclear now but they will be clarified when we present the algorithms in Chapter 5.

Make reads cheap

While designing the lines we had to decide whether we should favour the reads or the writes. If we privilege the reads, we push the information to the line and put the burden on the write: in this case the “post tweet” operation adds a reference to the tweet in the lines of each follower; we call this the push approach. On the other hand, we could privilege the writes: in this case, we pull the information and build the lines each time a user wants to read them, by fetching all the tweets posted by the users he follows and reordering them; we call this the pull approach. As people do more reads than posts on social networks, and based on the assumption that each posted tweet is read at least once, we opted to make reads cheaper than writes and thus privileged the push approach. However, we also study the pull approach and compare it with the push approach when we present our algorithms in Chapter 5 and our experiments in Chapter 6.

Do not store tweets in the lines but references

There is no need to replicate the whole tweet inside each line, as a tweet could potentially contain a lot of information and should be easy to delete. Therefore, we prefer to store references to tweets. To delete a tweet, the application only has to edit the stored tweet and does not need to go through every line that could contain it. When loading the tweet, the application can see whether it has been deleted or not.

Minimise the changes to an object

We want the objects to be as immutable as possible to enable caching. This is why we avoid storing potentially dynamic information inside the objects, and rather keep a pointer to it. For instance, tweets are only modified when we delete them; this is why a reply to a tweet should not modify the tweet object itself.

Do not make users load unnecessary things

Loading the whole line each time we want to see the new tweets would result in an unnecessarily high number of exchanged messages and would be highly bandwidth consuming. This is why we decided to cut lines, which are in fact just big sorted sets, into subsets of x tweets that can be organised in a linked-list fashion, where x is a tunable parameter. Set fragmentation is done differently depending on the chosen design of the datastore; this is explained later in the algorithms section.

Retrieving tweets in order

Users want to retrieve the most recently posted tweets first; tweets are thus dated to allow ordering. Tweets must be stored so that getting the most recent ones is easy and efficient. We have built an algorithm that guarantees the correct ordering of the tweets inside our lines, even in the presence of network reordering and failures.

Filtering the references

When a user is dissociated from a line, we do not want our application to keep displaying the tweets he posted previously. We decided not to scan the whole line to remove all the references added by this user. Instead, we remove the user from the list of users associated with the line, and filter the references based on this list before fetching the corresponding tweets.
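
A minimal sketch of this filtering step, assuming a TweetRef holder with a poster field as described later for line references; class and field names are ours for illustration.

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Reference to a tweet as stored in a line (illustrative fields only).
class TweetRef {
    String poster;      // username used for filtering
    String tweetKey;    // key of the referenced tweet in the datastore
}

class LineFilter {
    // Keep only references whose poster is still associated with the line,
    // so the tweets of removed users are hidden without rewriting the line.
    static List<TweetRef> visibleRefs(List<TweetRef> lineChunk, Set<String> associatedUsers) {
        return lineChunk.stream()
                        .filter(ref -> associatedUsers.contains(ref.poster))
                        .collect(Collectors.toList());
    }
}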

Only encrypt sensitive data

Most of the data in Twitter is not private so there would be no point in encrypting it. Only the sensitive data such as the passwords of the users should be protected by encryption when stored in the datastore.

Simple data structures

We believe that having complex data structures is not a good idea in a key/value store. Indeed, in order to maintain them we need to use transactions, and those are more likely to fail if updating a data structure requires accessing a lot of different keys at the same time.

4.3 Design of the datastore

The design of the datastore is an important part of the project. In our case it is more complicated than for a classical database because we do not have high level data structures like database tables. As a reminder, Beernet and Scalaris both provide two different data structures, key/value pairs and key/set (or key/list) pairs, the latter allowing multiple values to be stored at the same key.

As we wanted an easy way to store and retrieve Java objects from the datastore, we decided to serialize them. When serialized, Java objects are transformed into Strings conforming to the XML format. After serialization they are stored as values in our datastore. This has the advantage that we can easily recover Java objects if needed later, or directly respond to Bwitter requests in XML without even deserializing those objects. Moreover, XML has the advantage of being a widely used format, and thus a lot of existing libraries handle it. The process to add something to our datastore is the following: create a Java object, serialize it, choose a unique key and finally store the key/value pair in the datastore. We deliberately avoid talking about robustness and the shared key space for now, as we dedicate two sections to those problems after the details of our design.

Our first attempt to design a social network on a key/value pair datastore was based on references. We say that it was based on references because everything, except the user object, was stored at random and meaningless keys. The user profiles contained references to the other objects belonging to the user. For example, the lines of a user were kept in a user set whose reference was kept in the user object. After some thought we decided to drop the random keys and references, and replaced them with a design based on human-understandable and computable keys. The key space layout now looks like a file directory. We do not need to follow a chain of references to access an object anymore; objects can be addressed directly. This also removes the burden of managing the references, which in turn reduces the number of operations needed and improves performance. Moreover, the old design had a bigger space complexity because it had to store references to every object, from the user profile down to the object itself. Thanks to this simple addressing, it is also easier to write clear code and avoid bugs. Note that throughout this section, when we talk about keys, the variable parts of the key are written in bold characters while the static parts are not.

We have two different datastore designs: one for the push approach and one for the pull approach.

http://www.w3.org/XML/, last accessed 14/08/2011

The push approach pushes the information posted to the readers, while the pull approach retrieves it from the poster. We focus on the push approach because we believe it is the best adapted to our application, and we only briefly describe the pull design.

4.3.1 Key uniqueness

For now we assume that only Bwitter is running on the datastore. We must still ensure key uniqueness to avoid unwanted overwriting of data. In order to do so, information must be kept in the datastore for each key already used, and this information must be stored at a known location. We separate the datastore into several groups of objects, for example the tweets of a user, a line of a user, sets of tweets, etc. For each of those groups we keep track of the number of objects in the group, so that we can forge a new key for each new object. Each group must have a unique base key from which we can create new unique keys for the members of the group. As an example, we show how we add a new tweet to the tweets already posted by a user. We assume that the tweets of a user are stored under the base key “/user/username/tweets/” (where username is the username of the user), called tweetBase, and that the number of his tweets is stored at “tweetBase/size”. The following pseudo code adds a new tweet to the tweets already posted by that user.

addNewTweet {
    begin transaction
        x = Read("tweetBase/size")
        x++
        Write(x, "tweetBase/size")
        Store the new tweet at the key "tweetBase/x"
    end transaction
}

This ensures that we always use unique keys when adding a new object to a group. The drawback is that all object additions go through the same key, where the number of objects is stored. Any two parallel transactions that add an object to the same group thus conflict. Therefore, it is important to keep this limitation in mind while designing our data structures. We still have a problem: we just stated that we need the base keys to be unique. We require that the username of a user be unique, which allows us to create unique base keys for each user. This uniqueness can easily be checked when each user registers.
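
The same allocation written in Java, against the same kind of simplified transactional interface as in the earlier sketch of Section 4.1.1 (repeated here to keep the example self-contained). The interface and its method names are hypothetical; only the key layout follows the schema above.

import java.util.function.Function;

// Hypothetical minimal transactional key/value interface (illustration only).
interface TxStore {
    interface Tx {
        String read(String key);
        void write(String key, String value);
    }
    <T> T inTransaction(Function<Tx, T> body);
}

class TweetAllocator {
    // tweetBase is "/user/<username>/tweets/" in the schema used above.
    static String addNewTweet(TxStore store, String tweetBase, String serializedTweet) {
        return store.inTransaction(tx -> {
            // Assumption of this sketch: the counter was initialised when the user registered.
            int x = Integer.parseInt(tx.read(tweetBase + "size"));
            x++;
            tx.write(tweetBase + "size", Integer.toString(x));   // bump the counter
            String tweetKey = tweetBase + x;
            tx.write(tweetKey, serializedTweet);                  // store the tweet at the fresh key
            return tweetKey;                                      // both writes commit atomically
        });
    }
}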

4.3.2 Push approach design details

Users

The user object, represented in Figure 4.1, contains the real name of the user and his registration date. Any other personal information could be added later to this object. We store the user object at “user/username”.

Figure 4.1: User profile object of user “Paul”.

We store the hashed password of the user at the key “user/username/password”. We use it to authenticate every operation that involves writing. We store this value on its own as it is requested more often than the user’s other personal information.

We propose to add a special structure, shown in Figure 4.2, that allows searching for users. Indeed, searches are not well supported in a key/value store because application keys are not organized in lexical order in the ring, but according to a hash function. We thus group in the same key/set (key/list) pair the real names that share some prefix; we make no difference between upper and lower case. The user search tree we propose is a binary search tree; we made this choice because it is an efficient structure for insertion and retrieval. Leaf nodes contain mappings between real names and usernames, allowing the username of a user to be found from his real name. Indeed, people do not necessarily know someone’s username, and we identify users by their username. Therefore, this structure is crucial for users to easily find people they know in our system. All the leaf nodes together cover the whole alphabet. Parent nodes do not contain any search information; they only keep references to their children. Leaf nodes have an approximate maximum size: when the size of a leaf node reaches this limit, we add two children to it and split its responsibility interval between the two children. We did not develop a formal algorithm for this search tree due to lack of time; it is thus not present in our implementation.

Figure 4.2: Username search tree.

Lines and lists

Lines and lists are really similar; we thus only detail lines, because lists are simply lines without any associated users. A line has a set of tweets and a set of associated users. In practice, and as stated in the main guidelines, tweets are not stored in lines; instead we store references to them. Those references contain a date, the username of the poster used for filtering, the username of the original poster if it is a retweet, and the key of the referenced tweet, as can be seen in Figure 4.3.

Figure 4.3: Reference to tweet object to be stored in a line or list.

Sets of usernames are not split like tweet sets, because they are always read in their entirety when used. We also keep a set containing all the line and list names so that we can easily retrieve them (see Figure 4.4).

Figure 4.4: Left) Lines set of user “Paul”. Right) User set of the “coolpeople” line of user “Paul”.

The set of tweets associated with a line or list can become very big. Taking into account our main design guidelines, we do not store it in one set but as a list of chunks organized in chronological order from most recent to oldest, as can be seen in Figure 4.5. The head is at a fixed location (/user/username/line/linename/head), which allows us to quickly add an element to this set and read the latest tweets. The other chunks are located at a fixed base key (/user/username/line/linename) to which we concatenate a number called chunkNbr. The chunk with chunkNbr equal to 0 is the oldest. The newest chunk has a chunkNbr equal to the value contained at the key “/user/username/line/linename/size” minus 1. It is thus easy to access any chunk of the line. This may not be obvious at the moment, but the number of tweets in each chunk is of great importance, as it influences the complexity of the algorithms we present in the next section.

Figure 4.5: Top) Number of chunks in the “coolpeople” line of user “Paul”. Bottom) The head chunk and two chunks of the “coolpeople” line of user “Paul”.
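
A small sketch of the corresponding key construction, following the layout above and Table 4.1; the reading order from the head down to chunk 0 is our reading of the text, not verbatim code from our implementation.

import java.util.ArrayList;
import java.util.List;

// Key construction for a chunked line: head at .../head, numbered chunks from 0 (oldest)
// to size-1 (newest), and the chunk count at .../size.
class LineKeys {
    static String base(String username, String lineName) {
        return "/user/" + username + "/line/" + lineName;
    }
    static String headKey(String username, String lineName)  { return base(username, lineName) + "/head"; }
    static String sizeKey(String username, String lineName)  { return base(username, lineName) + "/size"; }
    static String chunkKey(String username, String lineName, int chunkNbr) {
        return base(username, lineName) + "/" + chunkNbr;
    }

    // Order in which chunks would be read to show the most recent tweets first:
    // the head, then chunk size-1, size-2, ... down to chunk 0.
    static List<String> newestFirst(String username, String lineName, int size) {
        List<String> keys = new ArrayList<>();
        keys.add(headKey(username, lineName));
        for (int chunkNbr = size - 1; chunkNbr >= 0; chunkNbr--) {
            keys.add(chunkKey(username, lineName, chunkNbr));
        }
        return keys;
    }
}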

Topost set

The Topost set, represented in Figure 4.6, contains references to lines (keys of the lines in the datastore) in which the user must post references to his tweets. We do not store the whole reference to a line because some parts of the reference are constant. Instead, we store what is needed to find the line back: the name of the line and the username of the line’s owner. As was the case for the lines, the Topost set is fragmented using the same technique. Each of its chunks contains at most nbrOfFollowersPerChunk references; this is a parameter that has to be tuned and is further discussed in section 6.3.2 of our experiment chapter. Moreover, each chunk also has a counter, used to implement the post tweet algorithm robustly. This counter has a value between -1 (included) and the number of tweets that the user has posted (excluded). In Figure 4.6, you can notice that the tweets of the owner of the Topost set were not correctly posted for all the chunks: the counter values differ between the chunks, indicating that some tweets remain to be posted. We add another counter that is used to remember the tweet number of the last tweet that was correctly posted; this counter is also initialized to -1. In this example, assuming Paul has already posted 12 tweets, we can see that one tweet still needs to be posted for chunk 0 and two for chunk 1.

Tweet

The messages the users post are called tweets. As mentioned before, a tweet is a small message of at most 140 characters. The tweet object contains a message field as well as a poster field. Moreover, some tweets can be retweeted; to handle this situation we added an original-author field that contains the name of the original author of the tweet. This field is null if the tweet is not a retweet. Tweets are also dated with second precision; the time used when storing in the datastore is Greenwich Mean Time (GMT) for the whole system, and it is up to the GUI layer to adapt the time to the local time zone when

Figure 4.6: Different parts of the Topost set of user “Paul”. Top left) Number of chunks in the Topost set. Top right) Global counter of correctly posted tweets. Center) Chunk counters of correctly posted tweets. Bottom) Chunks of the Topost set.

displaying the tweet. A field indicates whether this tweet was deleted by its owner. Finally, users can reply to tweets. We want to be able to recover the complete conversation given one tweet; therefore we keep a reference to a potential parent and to a set of children. An example is shown in Figure 4.7: Tweet2 is a response to Tweet1, Tweet3 is a response to Tweet2, Tweet6 is a response to Tweet4, and so on. Tweets are stored only once in the datastore; we made this choice in order to make their deletion easier and to minimize the data stored in the datastore.

Figure 4.7: Conversation Tree.

The key of a new tweet is the concatenation of the prefix “/user/username/tweet/” with the number of tweets already posted by the user. The schema of tweet number 42 posted by the user “Paul” is shown below in Figure 4.8.

Figure 4.8: Left) Tweet number 42 object of user “Paul”. Right) Number of tweets of user “Paul”.

4.3.3 The Pull Variation

As explained in the introduction, we have also decided to experiment with a variation of the push based design, and to observe how the system would behave if we decided to pull the information instead of pushing it. As this was not our primary goal, we decided to focus on the design of the datastore with only the push approach in mind, making it as efficient as possible. Afterwards, we tried to fit the pull variation in. This went very well, as the pull approach borrows a great majority of the building blocks and even mechanisms of the push approach. We now store the references only at the owner side; we explain how those tweets are retrieved in the algorithms chapter. Furthermore, those references are kept grouped by timestamp, meaning that the tweets posted during the same time frame, for instance the same hour, are grouped together. The timestamp is of the form: 05/06/11 15 h 26 min 03 s GMT, with some fields set to zero according to the chosen time granularity. For instance, if we want the references to be grouped by hour we would have a timestamp of this form: 05/06/11 15 h 00 min 00 s GMT. The full key looks like this: /user/username/tweet/timestamp. We also have to store the subscription date of the user in order to compute the equivalent of the chunk numbers. This date will be stored at the key: user/username/starttime.
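
A sketch of how a posting time could be truncated to the chosen granularity before being used in the key, assuming hourly granularity and a day/month/year reading of the timestamp quoted above; the formatter pattern is an assumption for illustration, not the thesis’ actual code.

import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Grouping pull-design references by time frame: the posting time is truncated to the
// chosen granularity (here one hour) and formatted before being appended to the key.
class PullKeys {
    private static final DateTimeFormatter BUCKET_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yy HH 'h' mm 'min' ss 's' 'GMT'");

    static String referenceKey(String username, ZonedDateTime postedAt) {
        ZonedDateTime bucket = postedAt.withZoneSameInstant(ZoneOffset.UTC)
                                       .truncatedTo(ChronoUnit.HOURS);   // hourly granularity
        return "/user/" + username + "/tweet/" + BUCKET_FORMAT.format(bucket);
    }
}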

Object | Type | Key | Description
User profile | Value | /user/username | User profile with user information
Password | Value | user/username/password | Hashed password of the user
Topost set chunk | Set | /user/username/topost/chunkNbr | Set of lines where the user has to post the references to his tweets
Topost chunk counter | Value | /user/username/topost/chunkNbr/counter | Counter associated to each chunk of the Topost set
Topost set size | Value | /user/username/topost/size | Number of chunks in the Topost set of a user
Last tweet correctly posted | Value | /user/username/topost/lasttweetposted | Tweet number of the last tweet correctly posted
Tweet | Value | /user/username/tweet/tweetNbr | Tweet object containing the message
Replies to tweet | Set | /user/username/tweet/tweetNbr/children | Replies to the tweet
Tweet counter | Value | /user/username/tweet/size | Number of tweets posted by a user
Lines set | Set | /user/username/linenames | Names of the lines of the user
Line chunk | Set | /user/username/line/linename/chunkNbr | Chunk of a line containing tweet references
Line chunk counter | Value | /user/username/line/linename/size | Number of chunks in the line (head not counted)
Line users | Set | /user/username/line/linename/users | Users associated to a line
Lists set | Set | /user/username/listnames | Names of the lists of the user
List chunk | Set | /user/username/list/listname/chunkNbr | Chunk of a list containing tweet references
List chunk counter | Value | /user/username/list/listname/size | Number of chunks in the list

Table 4.1: Keys used in the datastore for the push design
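
As an illustration of how the schema of Table 4.1 translates into code, a few of the key builders could look as follows; the exact helpers in our implementation may differ, and the line and list keys follow the same pattern as the sketch in the previous section.

// A few key builders derived from Table 4.1 (illustrative helpers only).
class PushKeys {
    static String userKey(String username)                  { return "/user/" + username; }
    static String tweetKey(String username, int tweetNbr)   { return "/user/" + username + "/tweet/" + tweetNbr; }
    static String tweetCounterKey(String username)          { return "/user/" + username + "/tweet/size"; }
    static String repliesKey(String username, int tweetNbr) { return "/user/" + username + "/tweet/" + tweetNbr + "/children"; }
    static String topostChunkKey(String username, int chunkNbr)        { return "/user/" + username + "/topost/" + chunkNbr; }
    static String topostChunkCounterKey(String username, int chunkNbr) { return "/user/" + username + "/topost/" + chunkNbr + "/counter"; }
    static String linesSetKey(String username)              { return "/user/" + username + "/linenames"; }
}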

4.3.4 Conclusion

You can find in Table 4.1 a summary of the kinds of keys we use in our datastore for the push design. Every key used is of course unique. Remember that the variable parts of the keys are written in bold while the rest is static. Our datastore design was rebuilt several times in order to meet the important criteria we have fixed: simplicity, scalability and clarity. We have built a structure for the lines that allows the latest tweets to be retrieved easily in chronological order. We cut the lines and the Topost set into chunks because those two can be very big (billions of tweets and millions of followers). We have also designed a structure to efficiently search for users in the system. Concerning which of the pull and push approaches is the best, the intuition is that push is better for reads while pull is better for writes. We compare the two approaches theoretically when discussing the algorithms in Chapter 5, and test them in the experiments of Chapter 6.

4.4 Running multiple services using the same datastore

There are numerous situations where multiple applications may want to share the same datastore. For instance, we could easily imagine a globally distributed datastore deployed in a peer-to-peer environment being used for multiple applications, exactly as we suggested in our first architecture. This would encourage users to let the datastore node run longer and would mitigate the heavy churn problem we would face if those users only used the datastore for our Bwitter application: they would launch it to consult their latest tweets or to post a tweet, and then close it right away. Although we do not face this churn problem in the cloud or any other stable environment, the remark is also valid there. Indeed, an application’s plugins may want to store additional data that does not interfere with the data of the main program, while still being able to access it. So while we could restrict access to the datastore to Bwitter alone, this would be a clear limitation. We are thus going to take a closer look at the problem of sharing the datastore, and particularly the keyspace. After some thought, we reduced the problem of sharing the keyspace to two smaller problems: keys already used, and accidental or malicious data erasure. We explored different ways to solve those problems at the datastore level. Even though we did not use those solutions, it is still relevant to present our work and conclusions here. Note that this work was only done on Beernet and not on Scalaris, due to our privileged collaboration with Boris Mejías, the developer of Beernet, since the beginning of our project.

4.4.1 The unprotected data problem

Early in the process, we elicited a crucial requirement: the integrity of the data posted by the users on Bwitter must be preserved. A classical mechanism, though not without flaws, is to use a capability based approach: data is stored at randomly generated keys, so that other applications and users cannot erase a value because they simply do not know at which key it is stored. However, in applications where content has to be publicly available, we cannot protect all our values by simply using unguessable keys. For example, Bwitter allows any unknown user to add his name to the Topost set of another user in order to subscribe to his tweets. This list must not only be readable by any user but also writable by any user. In practice, we would use the set abstraction provided by Beernet to implement this list. Any user needs the possibility to add an element to the set, but it should be impossible for anyone but the creator of the set and the user who added the value to remove that value. The problem is that Beernet does not allow any form of authentication, so key/value pairs are unprotected. Hence, anybody who is able to send requests to Beernet can modify and delete any data previously stored. We detail here several solutions that we have imagined to solve this problem.

Safe environment assumption

At first, we assume Beernet is running on the cloud and that the nodes are managed by an entity other than the applications running on top of it. This means that nobody can add nodes except this entity and that the communications between the different nodes cannot be spied on. Indeed, Beernet inter-node communications are done on a LAN inaccessible from outside the cloud. Moreover, we assume that the communications between Beernet nodes and applications are encrypted, so nobody is able to spy on them.

Cooperation between applications

The most naive solution is to assume that all the applications running on Beernet are written without bugs and are respectful of each other. This means that the applications check, each time they want to write a Key1/Value1 pair, that no other Key1/Value2 pair with the same key was already written by another application. Additionally, this operation has to be run in a transaction to avoid race conditions. This should normally not induce too much performance overhead, because applications will usually run transactions each time they store a value using the transactional replicated storage of Beernet. In order to be able to perform this check, each time a value is stored, a piece of information identifying the application that posted the value must be manually added to the value by the posting application. This solution makes a strong assumption, and even if this assumption holds it adds complexity to the code of each application running on Beernet. Indeed, applications need to parse each value they read and add information to each value posted.

Data protected by secrets

We now lift the assumption made in the previously presented solution. We assume that several applications are running on top of Beernet, are not respectful of each other and thus do not cooperate. We would like to enable an application A to protect the values it posted from being overwritten by an application B. This is not possible without the help of Beernet because the two applications can access Beernet freely and are not cooperating. We have thus designed a solution that enhances the API of Beernet: we enable an application to protect a key/value pair it posted using a secret chosen by the application itself. This secret is needed if another operation tries to modify or delete the value associated with the newly protected key. Because Beernet is running in a secure environment, secrets will not leak from Beernet; a malicious user can still try to guess the application secret, but it is the application's responsibility to use secrets that are hard to discover. A similar secret mechanism was developed for OpenDHT [26], which made it possible to attach a removal secret so that the secret is requested when a delete operation is performed. In a very similar fashion, Beernet's secret mechanism allows applications to share values with other applications while keeping them protected. Application A can now write a value and protect it against editing and deleting using a secret. Without this secret anyone can still read the value but cannot edit or delete it.

The secret mechanism developed for Beernet goes further: the sets are now protected by secrets too and offer much more flexibility. Three different secrets can be used to protect the different parts of a set. First, the Set Secret, which is one of the two secrets associated with the set itself when it is created. It can be seen as a master key allowing its owner to do all the desired operations on the set. The creator of the set can thus destroy the set along with all its contents, insert items into the set, but also delete each item contained in the set separately. Secondly, the Write Secret, which is the other secret associated with the set itself when it is created. This secret is required to add an item to the set; this way the creator of the set can decide to whom he gives the right to add items to his set. Finally, the Value Secret, associated with a given item in the set. This secret protects a single item in the set against editing and deleting, which means that only the user who added the item and the owner of the set can delete it. This secret is set by the user who adds the value to the set.

This new way to protect sets makes it easy to implement numerous applications based on user-posted content. Comments on blogs, for instance, become extremely easy: the author of the blog can give other users the permission to add comments to an entry. All the users can then see the comments posted by their peers but can only edit the comments they posted themselves. The author can manage the comments as he also has the right to delete and edit them. This is only a short and simplistic example, but we are convinced this new secret mechanism will make the development of more complex applications much easier.

New semantics using secrets We need three new kinds of fields, one for each secret, in addition to the existing Key and Val fields. Those new fields are automatically set to NO SECRET when applications use the functions of the old API that do not take any secrets. NO SECRET is a reserved value of Beernet indicating that there is no secret. As an example, we show the difference for the put function. It used to be:

put(K:Key V:Val)

Stores the value Val associated with the key Key at the peer responsible for the hash of Key. This operation can have two results, “commit” or “abort”. The operation returns “commit” if:

• there is nothing stored associated with the key Key, or there is a value stored previously by a put operation;

• the value has successfully been stored.

Otherwise the operation returns “abort” and nothing is changed.

The new version is now:

put(S:Secret K:Key V:Val)

Stores the triplet (Hash(Secret) Key Val) at the peer responsible for the hash of Key. This operation can have two results, “commit” or “abort”. The operation returns “commit” if:

• there is nothing stored associated with the key Key, or there is a triplet stored previously by a put operation;

• there is no triplet (Secret1 Key Val1) stored at the peer responsible for the hash of Key such that Hash(Secret) != Hash(Secret1);

• the value has successfully been stored.

Otherwise the operation returns “abort” and nothing is changed. If no value is specified for Secret, Beernet assumes the call is equivalent to put(S:NO SECRET K:Key V:Val).
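To illustrate these semantics, the sketch below shows, in Java and with purely hypothetical names, the kind of check a storage node could perform before overwriting a protected triplet; Beernet's actual implementation is of course different and this is only a conceptual sketch.

// Illustrative sketch only (hypothetical Java types): the check a storage node
// could perform for the secret-protected put. NO_SECRET plays the role of the
// reserved value described above.
class SecretProtectedStore {
    static final String NO_SECRET = "NO_SECRET";

    static class StoredTriplet {
        final String secretHash;
        final Object value;
        StoredTriplet(String secretHash, Object value) {
            this.secretHash = secretHash;
            this.value = value;
        }
    }

    private final java.util.Map<String, StoredTriplet> store = new java.util.HashMap<String, StoredTriplet>();

    // Mimics the semantics of put(S:Secret K:Key V:Val): returns "commit" or "abort".
    synchronized String put(String secret, String key, Object value) {
        String newHash = hash(secret);
        StoredTriplet existing = store.get(key);
        // Abort if a triplet protected by a different secret is already stored at Key.
        if (existing != null && !existing.secretHash.equals(newHash)) {
            return "abort";
        }
        store.put(key, new StoredTriplet(newHash, value));
        return "commit";
    }

    private String hash(String secret) {
        // Placeholder for a real one-way hash function.
        return Integer.toHexString(secret.hashCode());
    }
}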

The whole new API of Beernet now contains a secure version of put, write, add, remove and delete, and also allows explicit set creation. The full semantics of the new API can be found in the annexes, in Chapter 8.

4.4.2 Key already used problem

At the moment, in Beernet and in Scalaris, as in all key/value stores we know of, there is only one key space. This means that multiple services have to share it, and if a service uses a key another service cannot use it anymore. For some applications, not being able to use a given key can be very annoying, as keys may have a defined meaning and the application expects to find a certain type of information at a certain kind of key. This can be solved by designing more complex algorithms at the application level, but this adds complexity not directly linked to the application, which is, in our opinion, a bad idea. Sharing a key space can thus create problems if multiple services want to use the exact same keys. For instance, if another service decides to store the usernames of its users at the keys “user/username”, we have a conflict with our Bwitter application: the two applications cannot both have a user with the same username. This problem cannot be solved with the secrets mechanism we proposed, but it can be solved using a capability-based approach. This was not the case for the unprotected data problem we just presented; indeed, the goal here is not to protect the data but to avoid key conflicts between applications. The simplest way to avoid using the same keys is to prepend a differentiation number to every key. When an application wants to start using Beernet, it generates a random root key, for instance 93981452. From then on the application only uses keys starting with 93981452. If we can be confident enough that no other application will use this root key, we can assume that we are working with our own key space. We can thus design the application accordingly, removing the burden of complex algorithms to recover from a key already being used. In RFC 4122² the authors describe how to generate globally unique identifiers; we could use those identifiers as root keys, since the chance that such a key would be used twice on the same datastore is infinitesimal. This approach is also valid if you want to hide data from some applications or users: guessing the root key is conceivable but in practice not possible.

2Can be found at http://www.ietf.org/rfc/rfc4122.txt
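As an illustration of this capability-based approach, the following sketch (Java, with hypothetical class and method names) generates a random root key with java.util.UUID and prefixes every application key with it.

import java.util.UUID;

// Illustrative sketch (hypothetical names): every key the application sends to the
// datastore is prefixed with a random root key, so that two applications sharing
// the same key space are extremely unlikely to collide.
public class KeyspacePrefix {
    private final String rootKey;

    public KeyspacePrefix() {
        // Random (version 4) UUID as defined in RFC 4122, used as the root key.
        this.rootKey = UUID.randomUUID().toString();
    }

    // Builds the key actually stored, e.g. "<rootKey>/user/alice" instead of "user/alice".
    public String key(String applicationKey) {
        return rootKey + "/" + applicationKey;
    }

    public static void main(String[] args) {
        KeyspacePrefix bwitterKeys = new KeyspacePrefix();
        System.out.println(bwitterKeys.key("user/alice"));
    }
}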

4.4.3 Conclusion

In this section we have addressed two problems that arise when multiple applications share the same key space, namely the unprotected data problem and the key already used problem. The first problem was solved with the secret mechanism that we designed for Beernet, which is now implemented in Beernet version 0.9. Key/value pairs and key/value sets can now be protected by a secret needed to modify or delete those values. We even proposed a finer granularity at the set level: it is possible to create a set controlled by one person but that can be read and written by several. This can be done while preventing users other than the managers from modifying or deleting values posted in the set. The second problem is solved thanks to the capability-based approach. Thanks to those two mechanisms, we can run multiple applications in parallel on the same Beernet instance without any interference between them.

Chapter 5

Algorithms and Implementation

This chapter contains four sections. We first show the implementation of the cloud based architecture we detailed in section 3.2.2. We then take a closer look at our three main modules: the Nodes Manager, the Scalaris Connection Manager and the Bwitter Request Handler. The Nodes Manager is responsible for launching the machines needed, as well as performing remote operations on those machines. The task of the Scalaris Connection Manager is to control the access of Bwitter to Scalaris. We finish this chapter by presenting all the algorithms we have designed for the Bwitter Request Handler. Those algorithms were designed to work with a key/value datastore supporting transactions. We also make a theoretical estimation of the number of reads and writes performed by Bwitter for a given social network.

5.1 Implementation of the cloud based architecture

We did not arrive at the current implementation of Bwitter directly; we first went through two other implementations that share several similarities with the current one. In this section we briefly describe those first two implementations, as they are an integral part of our project, and finish by detailing the third and final version.

5.1.1 Open peer-to-peer implementation

The first version implemented the open peer-to-peer architecture we presented in section 3.2.1. In this solution it was necessary to protect data from malicious or unintentional modification at the datastore level. This is why we developed the secrets mechanism for Beernet described in section 4.4.1. The secrets were used by Bwitter to protect user data. This version was stateful, meaning that the client had to establish a session by logging in before being able to use the functions offered by the Beernet API. This was not really practical because the Beernet nodes had to remember all the clients that were connected. Moreover, the load balancer had to be configured to always direct the same client to the same Bwitter node. This first version was never fully implemented and only reached the draft state.

5.1.2 First cloud based implementation

Along the way we realized, as explained in section 3.2.4, that the cloud architecture was much better suited to our project. We thus made heavy changes to our implementation and came up with the second version of our application. Due to unexpected maturity problems of Beernet, we were not able to test our implementation with it, so it ran on an emulated DHT. This implementation was fully operational and even had a functional GUI. It was presented at the “Foire du Libre” held on the 6th of April in Louvain-la-Neuve1, where visitors could try it at the Beernet stand. As time went by, it became apparent that we would not be able to use Beernet for our implementation, so we decided to switch to Scalaris. Furthermore, after some preliminary tests on our second implementation, we identified some heavy changes to be made to our Bwitter API. They were caused by the decision to get rid of the sessions we were maintaining for our users and to have an API closer to the Representational State Transfer (REST) principles [27]. This change in the interface, combined with the need to switch to a new scalable database, made us decide to start a fresh third implementation.

Figure 5.1: View of our global architecture, highlighting the three main layers: the GUI, Bwitter and Scalaris.

1“Foire du Libre” is a fair celebrating open source software and organised by the Louvain-li-nux: http://www.louvainlinux.be/foire-du-libre/, last accessed 05/08/2011

5.1.3 Final cloud based implementation

We will now present the final version of our application implementing the cloud based architecture we detailed in section 3.2.2. You can see in Figure 5.1 a full representation of our implementation.

The GUI

We currently do not have a fully functional GUI but a minimal one demonstrating the important features of our application. Indeed, we focused on the design of other aspects of our implementation; we thus leave the complete implementation of the GUI as future work. We could not adapt the previous version of the GUI as it was designed for an old version of our application using a significantly different version of the API. The GUI was implemented using the Flex technology from Adobe2. This technology allows the creation of Rich Internet Applications (RIA). We decided to create a GUI that could be accessed through a web browser so that it could be used directly with any operating system and even with smartphones. A screenshot of this basic GUI can be seen in Figure 5.2.

Figure 5.2: The GUI of our second implementation.

2http://www.adobe.com/products/flex/, last accessed 05/08/2011

Bwitter layer

This is our main layer; it contains a Nginx3 load balancer, a Tomcat4 server, the Bwitter Request Handler (BRH), the Nodes Manager (NM), the Scalaris Connections Manager (SCM) and a cache system. The Nginx load balancer is not really part of our implementation: we did not modify it, and the only thing needed in order to use it is to configure it with the IP addresses of the Bwitter nodes. As those are stateless, no other special configuration is needed. The Tomcat 7.0 application server uses Java servlets from Java EE to offer a web-based API and relays the requests to the BRH. The Tomcat servers are accessed through a reverse proxy server, in this case the Nginx load balancer, which is said to support 10,000 concurrent connections. This Nginx load balancer can be configured to serve static content, for example the GUI application, as well as to do load balancing for the Bwitter nodes. The connection of the GUI to the web-based API is performed over HTTPS in order to guarantee a secure channel. We currently have the BRH, NM and SCM running on Amazon; they are detailed in sections 5.2, 5.3 and 5.4.

The cache The SCM uses Ehcache v2.4.05 as its cache system in order to increase performance and mitigate the popular value problem we discussed in section 3.2.3. Note that we have one cache per Bwitter node and that the caches are not synchronised. The values in the caches have a time to live of one minute so that they are refreshed periodically. Values are added to the cache during read operations, not during write operations. The cache only keeps three different kinds of values in memory: tweets, passwords and references to tweets; all the other elements are accessed directly through Scalaris. As previously explained, the tweets were designed to be as immutable as possible so that they can be cached. The references to tweets are static too and are used in the posting recovery mechanism. The passwords are used very often: for each post we must fetch the hash of the password stored in the system in order to verify that the password provided is the correct one. The three elements cited above are only accessed through the cache if they are accessed via a transaction in which only they are involved; this is done in order to keep the strong consistency properties in the other cases. For example, in the first pseudo code below, the two elements would be accessed through Scalaris.

{
  begin transaction
    tweet t = read(someTweetKey)
    write(Some key, Some value)
  end transaction
}

3http://nginx.net/, last accessed 08/08/2011 4http://tomcat.apache.org/, last accessed 08/08/2011 5http://ehcache.org/, last accessed 08/08/2011

In this example the tweet can be accessed through the cache because it is the only element involved in the transaction.

{
  begin transaction
    tweet t = read(someTweetKey)
  end transaction

  begin transaction
    write(Some key, Some value)
  end transaction
}
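As an illustration of how such a per-node cache could be configured, the sketch below uses the Ehcache 2.x API with a one-minute time to live; the cache name and sizes are assumptions, not our actual configuration.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Illustrative sketch: a local, unsynchronised cache with a one-minute time to live,
// as described above. The cache name and sizes are assumptions.
public class TweetCacheSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        // name, max elements in memory, overflow to disk, eternal, TTL (s), TTI (s)
        Cache tweetCache = new Cache("tweets", 10000, false, false, 60, 60);
        manager.addCache(tweetCache);

        // Values are only added during reads: cache the tweet just fetched from Scalaris.
        tweetCache.put(new Element("/user/alice/tweet/42", "tweet object fetched from Scalaris"));

        // Later reads involving only this element can hit the cache instead of Scalaris.
        Element cached = tweetCache.get("/user/alice/tweet/42");
        if (cached != null) {
            System.out.println(cached.getObjectValue());
        }
        manager.shutdown();
    }
}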

Scalaris layer

The lowest layer is the Scalaris layer, which is accessed and managed via the SCM and the NM. We started the development of our system with Scalaris version 0.2.3 and switched to version 0.3.0 when it was released on the 15th of July, as it gave better performance and corrected some bugs.

5.2 Nodes Manager

The Nodes Manager (NM) was designed to facilitate our tests and to allow us to easily control nodes. The NM can start Bwitter nodes as well as Scalaris nodes. We mainly use it to start Scalaris nodes to form the initial ring for our tests and to start additional Scalaris nodes during our elasticity tests. As we will further explain during our experiments in chapter 6, we are working with the Amazon cloud infrastructure. We made heavy use of the Java API6 Amazon offers in order to control the nodes, as it is closely linked to the tasks the NM performs. Indeed, this API allows starting new machines on the cloud and checking the state of the machines associated with an account. We list below the main tasks the NM performs and briefly describe how we realized them. As just said, the NM can be used to start new Bwitter nodes, but we did not design any mechanism to detect when nodes should be added or removed. There are different kinds of observable behaviours preceding flash crowds in social networks, and it should be possible to study them in order to predict flash crowds, but we did not do so; we decided instead to focus on other aspects of our system.

Start new machines The NM can send commands to Amazon in order to start new machines of a given type (Scalaris nodes or Bwitter nodes). We must fix the security group (which indicates which ports must be open on a machine), the location of the machine (east America, Europe, ...), the type of instance (s1.small, c1.medium, ...) and finally the security keys used to access the machines remotely.

6http://docs.amazonwebservices.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/ ec2/AmazonEC2.html, last accessed 07/08/2011

It is also possible to add tags to machines (in a key/value pair fashion) to identify them more easily. We use these tags to make a clear distinction between the Bwitter nodes and the Scalaris nodes.

Wait for machines to be started Once the command to start the machines has been sent to Amazon, it is necessary to wait for the machines to be running. We do this by regularly requesting the states of all the instances of the Amazon account and waiting until all the machines are in the running state. It is important to understand that the objects returned by Amazon API calls are not updated dynamically. This means that an object representing an instance may not accurately represent this instance and must be refreshed regularly to avoid working with stale information. Machines can be in five states: running (the machine is started), shutting down (the machine is stopping), pending (the machine is being started), stopped (the machine is stopped but can be restarted) and terminated (the machine is stopped and cannot be started anymore).
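The following sketch illustrates this kind of polling with the Amazon Java API; the structure, credentials and polling period are only indicative of the approach, not our actual code, and error handling is omitted.

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.DescribeInstancesResult;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;

// Illustrative sketch: poll the account until every instance reports the "running"
// state, refreshing the information at every iteration because the objects returned
// by the API are snapshots, not live views.
public class WaitForRunningInstances {

    public static void waitUntilAllRunning(AmazonEC2Client ec2) throws InterruptedException {
        while (true) {
            boolean allRunning = true;
            DescribeInstancesResult result = ec2.describeInstances();
            for (Reservation reservation : result.getReservations()) {
                for (Instance instance : reservation.getInstances()) {
                    if (!"running".equals(instance.getState().getName())) {
                        allRunning = false;
                    }
                }
            }
            if (allRunning) {
                return;
            }
            Thread.sleep(5000); // arbitrary polling period
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Credentials are placeholders.
        AmazonEC2Client ec2 = new AmazonEC2Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        waitUntilAllRunning(ec2);
    }
}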

Check machine reachability Some machines can be unreachable even if they are in the running state. We do not know the reasons why machines are sometimes unreachable, but we noticed that machines in Amazon sometimes do not respond to ping requests even from inside the private LAN of their own security group. In addition, they sometimes respond to ping requests but not to SSH because they did not correctly initialize their security keys at boot time. This problem gave us a lot of trouble during the tests: when it happens it is necessary to reboot the machine and sometimes to restart the test.

Launch a fresh Scalaris ring Once the machines are launched we still need to start Scalaris on them. This requires first creating the configuration file, whose main use is to indicate which nodes are already in the ring. By default we then stop any remaining instances of Scalaris running on those machines before restarting it. The first node is launched and the configuration file for the other nodes is built. We then launch the remaining nodes sequentially, one every 2 seconds. Finally we wait a small period of time, proportional to the size of the ring, to let it stabilise, after which the fresh ring is ready.

Add nodes to an existing ring This is similar to starting a new ring, except that several nodes are already in the ring; the configuration file therefore contains several nodes and not only the first one.

Reboot a node Particularly with version 0.2.3 of Scalaris, which we used at the beginning of our work, we frequently ended up in situations where a node was correctly inserted in the ring but it was impossible to perform any write on it. We thus created a function to restart such a Scalaris node and insert it again in the ring so that the test could continue normally. This usually happens right after the insertion of the node, so we perform a series of dummy writes to test whether the node is correctly bootstrapped, and if it is not we restart it. In version 0.3.0 of Scalaris this bug is nearly nonexistent.

Most of those functions require performing remote actions. In order to send files and run commands on remote machines we use the Java Runtime class combined with SSH and SCP. It was necessary to use the two options “-o UserKnownHostsFile=/dev/null” and “-o StrictHostKeyChecking=no” with SSH and SCP in order to skip the check of whether the host had already been seen before; otherwise the execution stalls because SSH waits for an answer that never comes. For example, to stop Scalaris on a machine, we run the following command using the exec method of the Runtime object.

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i BwitterXM.pem [email protected] "sudo killall beam"
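The sketch below illustrates how such a remote command can be issued from Java with Runtime.exec; the helper class, the user name and the host are placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Illustrative sketch: run a remote command over SSH with the two options mentioned
// above, using Runtime.exec. The key file, user and host are placeholders.
public class RemoteCommand {

    public static int run(String host, String command) throws Exception {
        String[] cmd = {
            "ssh",
            "-o", "UserKnownHostsFile=/dev/null",
            "-o", "StrictHostKeyChecking=no",
            "-i", "BwitterXM.pem",
            "ubuntu@" + host,
            command
        };
        Process process = Runtime.getRuntime().exec(cmd);
        BufferedReader output = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = output.readLine()) != null) {
            System.out.println(line); // echo the remote output locally
        }
        return process.waitFor();     // exit status of the remote command
    }

    public static void main(String[] args) throws Exception {
        // Stop any running Scalaris (Erlang) virtual machine on the remote host.
        run("ec2-host.example.com", "sudo killall beam");
    }
}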

We use a thread pool in order to run several commands and send files in parallel to improve throughput, which is a must because the time to launch a complete ring can otherwise be very long. In conclusion, the NM allows us to automatically and efficiently launch Scalaris rings. It was a valuable tool during our tests, as we needed to start a new ring for each test we did. Once a ring is launched we still have to connect to it; we explain how we do this in the next section, which is dedicated to the Scalaris Connections Manager.

5.3 Scalaris Connections Manager

The Scalaris Connections Manager (SCM) is implemented in a producer/consumer fashion. The producers are the Bwitter functions: they produce small pieces of work which use Scalaris functions, which we call Scalaris runnables (SRs). SRs typically contain one Scalaris transaction, but they can contain several provided that the failure of one of them does not introduce an inconsistent state. The SCM stores them in a blocking FIFO queue, and the consumers, which we call Scalaris workers (SWs) and which are managed by the SCM, access this queue to execute the SRs. Bwitter functions can efficiently wait until the result of an SR is computed. Accesses to the SRs are synchronised, and the SWs notify any function that was waiting for the result of an SR as soon as it is computed or as soon as the execution of the SR is aborted. This design allows the Bwitter layer running on top of Scalaris to easily make parallel requests to different Scalaris nodes without having to manage any connections or threads: taking a big task and splitting it into several SRs pushed onto the queue of the SCM does the job. We show in the next chapter, in section 6.2.2, that controlling the number of connections to Scalaris nodes is important to get the best performance. Opening too many connections increases the degree of conflicts and does not improve performance; conversely, having too few connections lowers performance. We thus want to control the number of connections to Scalaris and avoid opening and closing them, as this needlessly consumes resources. Moreover, a connection cannot be used by several threads: it can only handle one transaction at a time, otherwise unknown errors start appearing. It is thus crucial to correctly control the access to a connection so that only one SW accesses it at a time. We solve this problem by associating a dedicated connection to each SW. We could

have solved it differently: the SCM could have managed a pool of connections instead of a pool of SWs and dispatched arriving work to a new thread with a free connection. We believe our solution is better because we do not need to create a new thread for each SR. A thread is created only once, when a SW is created, which tremendously limits the time spent managing the life cycle of threads. We show below, in Figure 5.3, a drawing of the architecture of the whole SCM connected to Scalaris nodes.

Figure 5.3: Scalaris Connections manager connected to Scalaris nodes.

It is possible to ask the SCM to add a new SW to the existing ones or to remove a SW on the fly; this does not need to be statically configured. The SCM always connects the new SW to the Scalaris node that has the lowest number of connections. It does so by associating a Scalaris node with the SW; the SW is then responsible for opening the connection to that Scalaris node. The SWs are responsible for managing the connection they have opened: they automatically reconnect if the connection is lost and they also restart a Scalaris node if it has crashed. This must be done carefully because several SWs can be using the same Scalaris node; we thus synchronized them so that only one SW is responsible for restarting a dead Scalaris node. The state machine of a SW can be seen in Figure 5.4; it highlights the different states and the events leading from one state to another. The SW starts by trying to connect to its Scalaris node; if too many connection attempts fail, the SW restarts the node and retries. Once the SW is connected it waits for SR jobs to be run on the Scalaris node, runs them, retrieves the results and waits for another SR. If the connection with the Scalaris node is lost the SW tries to reconnect.

Figure 5.4: State machine of a Scalaris Worker.
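To make the producer/consumer structure more concrete, the sketch below shows a heavily simplified version of it in Java; the class names are hypothetical and the real SCM additionally handles reconnection, node restarts and the SR retry logic.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of the SCM structure: Bwitter functions push Scalaris runnables
// (SRs) onto a blocking FIFO queue, and each Scalaris worker (SW) owns one dedicated
// connection and executes the SRs it takes from the queue.
public class ConnectionsManagerSketch {

    private final BlockingQueue<FutureTask<Object>> queue = new LinkedBlockingQueue<FutureTask<Object>>();

    // Producer side: wrap an SR so that the calling Bwitter function can block on its result.
    public FutureTask<Object> submit(Callable<Object> scalarisRunnable) {
        FutureTask<Object> task = new FutureTask<Object>(scalarisRunnable);
        queue.add(task);
        return task;
    }

    // Consumer side: one worker thread per dedicated Scalaris connection.
    public void startWorker(final String scalarisNode) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                // A real SW would open its connection to scalarisNode here and
                // reconnect or restart the node on failure.
                while (true) {
                    try {
                        FutureTask<Object> task = queue.take(); // blocks until an SR arrives
                        task.run();                             // executes the SR and stores its result
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public static void main(String[] args) throws Exception {
        ConnectionsManagerSketch scm = new ConnectionsManagerSketch();
        scm.startWorker("scalaris-node-1");
        FutureTask<Object> result = scm.submit(new Callable<Object>() {
            public Object call() {
                return "result of one Scalaris transaction";
            }
        });
        System.out.println(result.get()); // blocks until the SR has been executed
    }
}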

An important thing to notice with this design is that an SR can never create another SR, add it to the blocking queue of the SCM and wait for its result. This situation can indeed create deadlocks. Take the simple case where we only have one SW: it takes an SR, called sr1, and executes it. If sr1 creates another SR, called sr2, adds it to the blocking queue and waits for its result, we have a deadlock. Indeed, no SW will ever execute sr2 as our only SW is already busy with sr1.

5.3.1 Failure handling

An SR can fail but, as you can see in Figure 5.4, the SW will run the SR several times before aborting it and taking a new one. This implies that SRs must be designed in such a way that when they fail they do not introduce partial state in the database and can thus be restarted without any risk. Being able to restart jobs at this low layer is important because it simplifies the algorithms that are running on top of it: they are not forced to develop their own strategy to recover from failures of Scalaris operations. This is needed if we want to avoid aborting high-level tasks too often; those tasks can be quite complex and contain several SRs, which increases the probability that at least one SR fails. We only throw an exception at the higher level when the SR has failed several times. Algorithms running on top of the SCM can then decide whether they want to completely abort the task they were running or to resend the SR to the SCM.
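The retry behaviour of a SW can be summarised by a loop of the following form (illustrative sketch only; the exception type and the maximum number of trials k are assumptions).

import java.util.concurrent.Callable;

// Illustrative sketch: run an SR up to k times before reporting the failure, as a SW
// does. SRs must leave no partial state behind when an attempt fails, so retrying is safe.
public class RetryingExecution {

    public static <T> T runWithRetries(Callable<T> scalarisRunnable, int k) throws Exception {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= k; attempt++) {
            try {
                return scalarisRunnable.call();
            } catch (Exception e) {
                lastFailure = e; // aborted transaction or connection problem: try again
            }
        }
        // Only after k failed attempts is the failure propagated to the layer above,
        // which can then abort its task or resubmit the SR.
        throw lastFailure;
    }
}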

5.4 Bwitter Request Handler

In this section we detail the most important algorithms we used in the Bwitter Request Handler: posting a tweet, reading tweets and deleting tweets. We have developed two different approaches for posting and reading tweets: the push and the pull. As represented in Figure 5.5, the push approach (on the left) is when the user who posts a tweet is responsible for inserting the references inside the lines of his followers. The pull approach (on the right) is when the user fetches all the references himself from the lines of the people he follows.

Figure 5.5: Representation of the read (dotted arrows) and write (full arrows) processes. The tweets are posted in lines (rectangles) and read from them. Left) Push design with one line per reader. Right) Pull design with one line per writer.

Note that in the following algorithms we never post the whole tweet object into the lines of the followers but rather a reference to it. So when we say we post a tweet in someone's line we mean the tweet reference. As explained in the previous chapter, a reference contains the posting date, the username of the author or retweeter, the username of the original poster if it is a retweet, and the key of the referenced tweet. This limits the amount of redundant data stored and makes it possible to easily edit or delete a tweet, as explained in section 5.4.1 where we detail the operations to delete a tweet. All the algorithms we designed use the Scalaris Connection Manager we just presented. We do not develop any recovery mechanism to handle the failure of the execution of one of the SRs involved in an algorithm: we assume that the recovery mechanism of the Scalaris Connection Manager is sufficient, so that failures will be scarce. A failure of one SR can thus make a higher level function abort in some cases, forcing the user to restart the operation manually.

The algorithms are developed to work on a transactional key/value datastore with a list abstraction. However, this abstraction is not strictly necessary; we could use a classical key/value datastore and develop our own parsing to simulate lists. An important fact to keep in mind is that concurrent reads on the same key do not conflict in Scalaris: two parallel reads in two different transactions will not lead to an abort. On the other hand, two parallel transactions with one writing a key and the other writing or reading the same key will conflict, and this can lead to an abort. In this section, we explicitly detail the pseudo code for posting and reading tweets for the push and pull approaches, and we compare the number of Scalaris operations they need. We do not take into account the fact that some operations can fail; the number of operations in the worst case can easily be computed from the number of operations in the normal case by multiplying it by a factor of k, where k is the maximum number of trials for one SR. We use the notation SR(some piece of code) to make explicit that a piece of code is executed by the SCM inside an SR; such a call is non-blocking. To get the result of a piece of code executed inside an SR we write result = SR(some piece of code), which blocks until the result is computed or an exception is thrown.

5.4.1 The push approach

Post a Tweet

Posting a tweet is a core function of our application; it is thus important to have an efficient as well as robust way to post tweets. Our tweet posting algorithm must be able to handle the failure of a datastore node as well as the failure of an application node during the posting of the tweet. This algorithm must also scale with the number of followers: it is necessary to take into account that some users can have millions of followers. Below you can see the skeleton of the algorithm; it is composed of three main parts: posting the tweet object, posting the references to the followers' lines and updating the value of the last tweet correctly posted. Those parts are detailed in the next subsections. The algorithm can be adapted for retweets and replies to tweets, but we do not detail this here as it is very similar.

postNewTweet(posterName, msg){
  // First step: post the tweet object. If this step succeeds the tweet
  // will eventually be posted everywhere.
  tweetNbr = SR(posttweet(posterName, msg))

  // Second step: produce the chunkProcessors. Each of them is an SR responsible
  // for posting the remaining tweets to the followers' lines for a given chunk
  // of the Topost set until it reaches tweetNbr.
  SRlist = SR(produceChunkProcessors(posterName, tweetNbr))

  // Add all the chunkProcessors produced to the SCM.
  foreach sr in SRlist {
    add sr to the SCM
  }

  // Wait for the termination of the chunkProcessors and check that none failed
  // (all the tweets up to tweetNbr correctly posted for all the followers).
  failed = false
  foreach sr in SRlist {
    try {
      // Block until the result is computed; no real result is returned,
      // we just check that nothing went wrong.
      result = sr.get
    } catch (exception) {
      failed = true
    }
  }

  // Third step: if none of the previously launched chunkProcessors has failed,
  // mark the tweets as posted up to tweetNbr.
  if (!failed) {
    SR(markTheTweetsAsPosted(posterName, tweetNbr))
  }
}

The algorithm starts with the posting of the tweet; if this first step finishes without errors we can guarantee that the tweet will eventually be posted on the lines of all the followers. Otherwise we abort the posting of the tweet and the user must manually restart it. The second step is responsible for pushing the information to the lines of the followers. This is the heavy part of the job, so we decided to cut it into several independent SRs that run on several Scalaris nodes in parallel. We added a repair mechanism which logs the operations successfully performed in order to recover from failures during this part. Finally, the last step marks the tweet as correctly posted on all the lines. As just mentioned, only the first step is needed to have the tweet eventually posted to all the followers: subsequent executions of the algorithm will automatically repair previously started work that failed. This repair is done during the “Post the references” phase detailed later in this section.

Post the tweet object This first step is executed as one SR to guarantee atomicity. As mentioned, if it succeeds the tweet will eventually be posted on all the lines. A tweetNbr uniquely identifies one tweet for a given user. As you can see below, it is attributed at the creation of the tweet.

posttweet(posterName, msg){
  begin transaction
    tweetNbr = read(/user/posterName/tweet/size)
    postingDate = currentDate()
    tweetReference = buildTweetRef(posterName, tweetNbr, postingDate)
    tweet = buildTweet(posterName, tweetNbr, postingDate)
    write(/user/posterName/tweet/tweetNbr/reference, tweetReference)
    write(/user/posterName/tweet/size, tweetNbr+1)
    write(/user/posterName/tweet/tweetNbr, tweet)
  end transaction
  return tweetNbr
}

You can notice that we save the tweet reference in order to easily recover from failure later.

Post the references We now explain the next step of the posting algorithm which is posting the references on the lines of the followers. This step repairs any previously started tweet posting which has failed after the first step. It can also be run for this purpose only. This repair mechanism is needed as this part of the algorithm is highly subject to failures. Indeed, it writes to the line of every follower. Therefore, it can potentially conflict with followers reading their lines and other posters posting their tweets. This step is cut in two substeps: the first substep is to create the chunkPro- cessors and the second one is to execute them. Some stars have millions of followers, it would thus not be scalable to do the whole work in one big transaction. Therefore, we split the work into several SRs run on dif- ferent Scalaris nodes. Now remember that the Topost set is cut in several chunks. We associate one SR to each chunk of the Topost set. It is responsible of posting all the remaining tweets (which usually limit to the new tweet posted if there were no failures before) to all the followers in its attributed chunk. We call an SR with this precise task a chunkProcessor.A chunkProcessor stops when it reaches the tweetNbr with which he was initialized. tweetNbr corresponds to the tweet number of the last tweet that the chunkProcessor must post to the lines. If a chunkProcessor finishes with- out error, we are sure that all the tweets up to tweetNbr are correctly posted for this Topost chunk. The pseudo code below details the creation of the chunkProcessors.

//tweetNbr is the last tweet to post on the lines. produceChunkProcessors(posterName, tweetNbr){

// First we do a check in order to verify if the job is not already done, this step can be skipped as it is just an optimisation. This test is equivalent to test if each counter associated with a chunk of the Topost set has a value at least equivalent to tweetNbr but is quicker as only one key must be accessed. begin transaction lastTweetNbrCorrectlyProcessed = read(/user/posterName/tweet/processed) end transaction if(lastTweetNbrCorrectlyProcessed >= tweetNbr) return new emptyList

// Read the number of chunk in the Topost set. begin transaction nbrOfToPostSetChunks = read(/user/posterName/topost/size) / topostSetChunkSize +1 end transaction

// Create the different chunkProcessors. chunkIndex = 0 SRlist = new emptyList while(chunkIndex < nbrOfToPostSetChunks){ SRlist.add(new chunkProcessor(posterName, chunkIndex, tweetNbr)) chunkIndex++

}

return SRlist }

You can notice that we only create the chunkProcessors in this part of the algorithm and do not execute them. They are executed by the main posting algorithm detailed at the beginning of this section. Indeed, chunkProcessors are SRs and, as explained at the end of section 5.3 where we present the SCM, an SR can never wait for the result of another SR it launched, because this can create deadlocks. We detail below the algorithm of a chunkProcessor, which makes explicit how we post the references to all the lines of the followers contained in a chunk. chunkProcessor(chunkIndex, tweetNbr, posterName){ while(true){ begin transaction //Compare the value of the current chunkCounter with tweetNbr; if chunkCounter is bigger or equal the job is done. chunkCounter = read(/user/posterName/topost/chunkIndex/counter) if(chunkCounter >= tweetNbr) return

//Get the reference corresponding to the next tweet that is not yet posted. The tweet number of this tweet is equal to chunkCounter+1, as chunkCounter is the number of the last correctly posted tweet for this Topost set chunk. tweetReference = read(/user/posterName/tweet/(chunkCounter+1)/reference)

//Read the Topost set chunk containing the references to the followers' lines. lineKeys = read(/user/posterName/topost/chunkIndex)

//Finally we add the reference to all the lines referenced in the chunk; the function called must execute in the same transaction as the current one. foreach lineKey in lineKeys { addReferenceToLine(tweetReference, lineKey) } //Record that tweet chunkCounter+1 is now posted for this chunk. write(/user/posterName/topost/chunkIndex/counter, chunkCounter+1) //If chunkCounter has reached tweetNbr exit the loop. if(chunkCounter+1 >= tweetNbr) return end transaction } }

We must now detail the addReferenceToLine function, which is responsible for posting a particular reference to the line of a follower. Remember that the lines containing the references are divided into chunks too. For this part we thus have two choices: either we cut the line while posting, or we put the burden of cutting at another moment (for example when a user reads his new tweets).

Triggered cutting In the “triggered cutting” solution we do not cut the line during the posting. Indeed, we prefer to do it at another moment in order to lighten the post tweet function, which is already quite subject to failures. The only necessary operation is thus to add the tweet reference to the head.

addReferenceToLine(tweetReference, lineKey){ add(lineKey/head, tweetReference) }

The line must thus be cut at another moment. We chose to do it when a user reads his tweets. Indeed, the only operation needed to check whether the head must be cut is to read the head, which is almost always what the read tweet operation does anyway, as the head chunk contains the latest tweets. Hence the algorithm presented below must be run each time a user reads his tweets. Most of the time the algorithm does not impose any overhead on readings, as the head must only be cut when it is full. By taking advantage of the read tweet operation, we can avoid reading the head again during the cutting mechanism. However, we present below a version of the cut mechanism where we read the head, in order to show the complete algorithm. In a real implementation the head would be given as an argument. splitHead(lineKey){ begin transaction headChunk = read(lineKey/head)

// While the head is too big we transfer nbrTweetsPerChunk to a new chunk. headChanged = false nbrOfChunkCreated = 0 if(headChunk.size <= nbrTweetsPerChunk) return ;

// Number of chunks in the line excluding the head. nbrOfChunkInLine = read(lineKey/nbrchunks)

while(headChunk.size > nbrTweetsPerChunk){ headChanged = true // Remove the nbrTweetsPerChunk oldest tweets from the headChunk; this does not modify the datastore, just our local copy. newChunk = removeOldest(headChunk, nbrTweetsPerChunk)

// Write the new chunk in the line write(lineKey/(nbrOfChunkInLine+nbrOfChunkCreated), newChunk) nbrOfChunkCreated++ }

// If the head has changed write the new head and update the number of chunks . if(headChanged){ write(lineKey/head, headChunk) write(lineKey/nbrchunks, nbrOfChunkInLine+nbrOfChunkCreated) } end transaction }

In conclusion, we can observe that posting on the line is really easy because you only have to add an element to a set, but you have to pay the price later when splitting the line.

Cutting the line while posting We now present the addReferenceToLine version where we cut the line while posting. We add the tweet in the head of the line, and, if

the head is full, we flush the head and create a new chunk. The overhead for cutting is thus paid while posting but not at each post. addReferenceToLine(tweetReference, lineKey){ // Read the head. headList = read(lineKey/head)

// Check if the head is full and we need to create a new chunk. if(headList.size >= nbrTweetsPerChunk){ // Replace the head by a fresh one containing only the new tweet. newList = new emptyList newList.add(tweetReference) write(lineKey/head, newList) chunkNumber = read(lineKey/nbrchunks)

// Write the new number of chunk in the line and the old head to the new chunk . write(lineKey/chunkNumber, headList) write(lineKey/nbrchunks, chunkNumber + 1) } else { headList.add(tweetReference) write(lineKey/head, headList) } }

Observe that we usually do not do more operations than in the triggered cutting as we only need to make a new chunk when the head is full. Thus adding a reference to a line takes 1 read and 1 write usually and occasionally 2 reads and 3 writes.

Chronological ordering We have shown how to post tweets on lines in a reliable and efficient way; however, some tweets might be misplaced due to application failures during the posts or latency in the network. We propose an improvement to maintain strong chronological ordering between the tweets in all situations. The idea is to have a date associated with each chunk of a line. This date would be equal to the posting date of the newest tweet of the previous chunk. This way, when we add a tweet to a chunk we check that its posting date is newer than the date associated with the chunk. If it is not the case we walk back through the line, find the first chunk for which it is true and add the tweet to this chunk. This means that we can have more than nbrTweetsPerChunk tweets in a chunk, but this has no repercussions on the other algorithms. We can adapt the two algorithms described above to impose chronological ordering as just explained, but we do not detail it here. It complicates the posting algorithm, which should be as light as possible in order to achieve the best scalability. We believe that it is not absolutely crucial to have perfect ordering between the tweets and that we thus should not make the post algorithm even more complex.

Mark tweet as correctly posted This is the final step of the algorithm. If everything succeeded before, we can be sure that the tweets of the user up to tweetNbr are correctly posted on the lines of the followers present in his Topost set at the time of

the posting. We can thus update the lastTweetNbrCorrectlyProcessed variable to tweetNbr. As already mentioned, this step is not mandatory and could be skipped; it only permits testing more efficiently, in the produceChunkProcessors part of the postNewTweet algorithm, that the tweets were correctly posted. We must take into account that several runs of the postNewTweet algorithm can be running concurrently. This can happen if a user posts two tweets quickly or if the recovery part of the algorithm was called in response to some event. It is thus crucial to test the value of lastTweetNbrCorrectlyProcessed before erasing it with tweetNbr; indeed, another run posting a newer tweet (and thus a tweet with a higher tweetNbr than the one we are working on) may have just written a newer value for lastTweetNbrCorrectlyProcessed.

markTheTweetsAsPosted(posterName, tweetNbr){
  begin transaction
    lastTweetNbrCorrectlyProcessed = read(/user/posterName/tweet/processed)
    if(lastTweetNbrCorrectlyProcessed < tweetNbr)
      write(/user/posterName/tweet/processed, tweetNbr)
  end transaction
}

Theoretical performance analysis This algorithm is heavy, which is normal: in the push approach we favor the reads and put the burden on the writes. Let us try to get an idea of how many operations these algorithms need. When we talk about operations we mean reads and writes on Scalaris; we have observed while testing that writes and reads on Scalaris take approximately the same time. An operation adding a value to a list using Scalaris requires a read and a write, because there are no built-in operations on sets. We now analyse the three steps of the algorithm. The first step is the posting of the tweet in the datastore; it requires 4 operations (1 read and 3 writes) in one transaction to post the tweet object, post the tweet reference and update the tweetNbr. The second step is the posting of the references in all the lines. This is the heaviest step; the number of operations depends on the number of followers (nbrFollowers) that a given user has. We first check the lastTweetNbrProcessed (one read): the job is done if the check indicates that everything is correctly posted, which can happen during recovery and concurrent posting. Assuming that we are in the normal situation where everything goes correctly, we have one tweet to post and all the previous tweets were correctly posted on all the lines. We read the size of the Topost set (one read); then we can dispatch the work for each chunk of the Topost set. So we need two reads to create the chunkProcessors. Each chunk of the Topost set requires one transaction; the size of the transaction (the number of keys it works on) depends on the number of followers per chunk of the Topost set (nbrOfFollowersPerChunk). The size of the transaction for each chunk is proportional to nbrOfFollowersPerChunk and the number of transactions is inversely proportional to nbrOfFollowersPerChunk. We assume we had no failures previously and that there is only one tweet to post for all the chunks. Each chunkProcessor thus

reads and writes its counter one time. This requires 2 × nbrOfTopostSetChunks operations or, equivalently, 2 × nbrFollowers/nbrOfFollowersPerChunk operations. We must then post the reference on the line of each follower in the Topost set. The complexity of this operation depends on whether we cut the line while writing or not. If we do not cut it, we only need 2 operations to update the head with the new reference. If we cut while posting, we must occasionally also create a new chunk and flush the head, which requires 3 additional operations. On average we must cut the line every nbrTweetsPerChunk tweets; we thus assume that the amortised cutting cost for one posting is 3/nbrTweetsPerChunk operations. Those operations must be done for every follower, so we finally get 2 × nbrFollowers operations for the posting without cutting and nbrFollowers × (2 + 3/nbrTweetsPerChunk) operations for the posting with cutting. Although we do not cut the lines while posting in the first option, we would like to compute the overhead of cutting those lines at another moment, for example while reading the new tweets, as this allows us to avoid the extra read of the head. For each new chunk created while cutting the head we need to do a write (thus in total nbrNewChunk writes). We must also flush the head (one write) and update the number of chunks in the line (one read and one write). We thus have 3 + nbrNewChunk operations. If we consider that a reader reads his tweets regularly, nbrNewChunk is generally equal to 1. The last step only requires 2 operations in one transaction: one to read the lastTweetProcessed and one to update it. To summarize, we have a different number of operations (nbOp) to perform depending on whether we decide to cut while reading or while writing:

Cutting while reading:

nbOp = 8 + 2 × nbrFollowers + 2 × nbrFollowers/nbrOfFollowersPerChunk
     = 8 + nbrFollowers × (2 + 2/nbrOfFollowersPerChunk)     (5.1)

Cutting while writing:

nbOp = 8 + nbrFollowers × (2 + 3/nbrTweetsPerChunk) + 2 × nbrFollowers/nbrOfFollowersPerChunk
     = 8 + nbrFollowers × (2 + 2/nbrOfFollowersPerChunk + 3/nbrTweetsPerChunk)     (5.2)

Difference between the two techniques:

Diff = nbrFollowers × 3/nbrTweetsPerChunk     (5.3)
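As a purely illustrative example with assumed values nbrFollowers = 10000, nbrOfFollowersPerChunk = 100 and nbrTweetsPerChunk = 50, cutting while reading costs nbOp = 8 + 10000 × (2 + 2/100) = 20208 operations per posted tweet, cutting while writing costs 8 + 10000 × (2 + 2/100 + 3/50) = 20808 operations, and the difference is 10000 × 3/50 = 600 operations, i.e. about 3% of the total.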

The difference between the two is small, but we believe that the overhead introduced for cutting while posting can have side effects. The number of operations involved in the posting of a tweet is mainly influenced by two parameters that we can control: nbrOfFollowersPerChunk and nbrTweetsPerChunk. Increasing nbrOfFollowersPerChunk reduces the burden induced by the management of the counter associated with each chunk of the Topost set. However, it also makes transactions more complex because each chunkProcessor works on more elements. On the other hand, increasing nbrTweetsPerChunk means that we cut the line less often, but we waste resources each time we update a chunk because we are forced to load a bigger chunk. We made some tests, in the experiment chapter at section 6.3.2, to observe the impact of the nbrOfFollowersPerChunk parameter. As pointed out previously, if we want to guarantee strong ordering of the tweets between the chunks of a line we have to perform more operations. With the cutting-while-posting solution we would need to read one additional structure at each tweet insertion. In the normal case the tweet will be inserted in the head chunk because it is newer than all the ones previously posted. However, sometimes we have to insert the tweet in an older chunk. This means that we have to walk back and find the adequate chunk, involving in the worst case as many operations as there are chunks in the line; this case is however very unlikely to happen. With the triggered cutting solution we would not need any additional operations during the insertion because we always insert the tweet in the first chunk; the burden related to the walk back would be transferred to the splitHead function.

Delete a tweet

If the real tweet were posted on all the lines, it would be necessary for the delete operation to find all the lines where the tweet was posted and to remove the tweet from all of them. This would be really impractical for several reasons. First, you would need to find where a particular tweet was posted: it is not enough to know the lines where a tweet was posted, you must also find the chunk of the line in which the tweet was posted. You must thus either maintain this information for each tweet or walk through all the chunks of the line in order to find and delete the tweet. This is why we post references to the tweets in the lines. To delete a tweet we only need to access the tweet object that is located at a given key and mark it as deleted. The BRH checks this mark when fetching a tweet and discards the tweet if it has been marked as deleted.

Reading tweets

We now explain how we fetch tweets from the lines. Users on social networks usually want to retrieve the latest news and less frequently walk back to find older posts. We thus assume that users want to retrieve the tweets from the newest to the oldest, so we do not load the whole line; instead we load only some tweets from it. Because lines are already cut into chunks, it is natural to fetch one chunk of the line at a time, starting with the first chunk of the line, called the head, which contains the newest tweets. However, it is possible to access one given chunk of the line directly if needed. The first chunk of the line can be accessed directly because the head is at a fixed location. We suppose that the line is already cut when we read it. If we want to access the chunk that follows the head, we have to retrieve the number of chunks in the line, compute the key of the penultimate chunk and request it. The next step is to filter the references in order to discard the tweets posted by users we do not follow anymore. Indeed, we never remove the tweets posted by a user from a line. This means that all the tweets that were posted while we were following a user we no longer follow will stay forever on the line. It also implies that if we decide to follow a user again, his tweets will reappear on the line. Chunks only contain references to tweets, so we still have to fetch the tweets using the references remaining after the filtering. Once we have retrieved the tweets we filter out the deleted tweets. You can notice that we are forced to load the tweets before we can filter out the dead tweets, as the references do not indicate whether the tweet is deleted or not. Once the filtering has been done we can return the pack of remaining tweets. We present below the pseudo code we have implemented to read nbrTweets tweets from a line. This code is run in an SR. To avoid complicating the code we do not show the recovery mechanism inside it. In the implementation, while we are fetching the tweets we do not abort the operation if we cannot fetch a tweet; instead we just skip it. The SR fails only if other data needed to fetch the tweets is not accessible; one missing tweet, on the other hand, does not compromise the rest of the operation. We could also split the SR in two parts if we wanted to add the cutting mechanism: the first part of the SR would read the head and split it if needed, then give the current head as an argument to the second part of the algorithm, removing the need to read it again.

getTweetsFromLine(nbrTweets, linename, username){
  refList = read(/user/username/line/linename/head)

chunkIndex = read(/user/username/line/linename/nbrchunks) - 1 while(refList.size < nbrTweets && chunkIndex>-1 ){ //Read the current chunk refList.add(read(/user/username/line/linename/chunkIndex)) chunkIndex-- }

users = read(/user/username/line/linename/users) filter(refList, users)

tweets = new tweetList

for each tweetRef in refList{ tweet = read(/user/tweetRef.posterName/tweet/tweetRef.tweetNbr) if(! tweet.isDeleted) tweets.add(tweet) } orderTweetsFromNewestToOldest(tweets) return tweets }

Having the pseudo code, we can, as we did for the posting algorithm, compute the number of operations needed on Scalaris. The number of chunks we read depends on nbrTweets and nbrTweetsPerChunk. We read the number of chunks in the line (1 read). We then read nbrTweets/nbrTweetsPerChunk chunks to get nbrTweets tweet references. Then, to filter the users, we must retrieve the user list associated with the line (one read). We must then do nbrTweets reads (minus the number of tweets associated with users that are no longer on the line) to get the real tweets. Considering that all the tweets we fetched were posted by users still associated with the line, the result is:

nbOp = 2 + nbrTweets + nbrTweets/nbrTweetsPerChunk     (5.4)

The heavy part is thus the fetching of the tweets. Had we posted the tweets themselves instead of references, we would obtain 2 + nbrTweets/nbrTweetsPerChunk operations, tremendously reducing the number of operations to do (but not the amount of data to fetch); however, the delete tweet operation would have been much more complex, as we explained before.
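As a purely illustrative example with assumed values nbrTweets = 100 and nbrTweetsPerChunk = 50, reading a line costs nbOp = 2 + 100 + 100/50 = 104 operations, of which the 100 reads of the tweet objects clearly dominate; storing full tweets in the lines would bring this down to 2 + 100/50 = 4 operations, at the cost of a much more complex delete operation.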

Add a User to a line

We explain here how we add a newly followed user (newfollowed) to an existing line (linename) of a user (username). We first check whether newfollowed is not already in the set of users associated with linename (one read). If it is not already present we add it (one write). Once this is done we add a reference to linename in the Topost set of newfollowed (one read and one write). We also create an object containing a reference towards the chunk of the Topost set in which we added the reference to linename, so that we can easily remove it later. Note that those are not the same chunks as the ones we use to divide lines. In total we thus have a cost of 3 writes and 2 reads. Sometimes we must also create a new chunk; in this case we must update the number of chunks and thus add 2 writes and one read. addUserToLine(username, linename, newfollowed){ SR( begin transaction users = read(/user/username/line/linename/users) if(newfollowed belongs to users) return

users.add(newfollowed)

73 write(/user/username/line/linename/users,users) lasttopostchunk = read(/user/newfollowed/toposet/nbrchunks))-1

reflist = read(/user/newfollowed/topostset/lasttopostchunk)

// We must create a new chunk if(reflist.size >= nbrOfFollowersPerChunk){ lasttopostchunk++ reflist = newList write(/user/newfollowed/topostset/nbrchunk, lasttopostchunk+1) //Create the counter associated with the chunk lastTweetNbr = read(/user/newfollowed/tweets/size)-1 write(/user/newfollowed/topostset/lasttopostchunk/counter, lastTweetNbr) } reflist.add(new ref(username, linename)) write((/user/newfollowed/topostset/topostchunk/, reflist)

// Write the chunk of the topost we posted in for easy removal. write(/user/username/newfollowed/linename/, lasttopostchunk) end transaction ) }

Remove a user from a line

We now want to remove a user (followingUsername) from a line (linename). We first remove followingUsername from the set of users associated with linename (one read and one write). We then read the object (see "Add a user to a line") containing the number of the chunk of the Topost set in which we added the reference to linename and delete it (one read and one write). We can then locate the chunk and remove the reference from it (one read and one write). In total this makes 3 reads and 3 writes. Note that we do not modify the number of chunks in the Topost set even if a chunk becomes empty; we do not want to remap the keys of already existing chunks, which depend on this number of chunks.

    removeUserFromLine(username, linename, followingUsername){
        SR(
            begin transaction
            users = read(/user/username/line/linename/users)
            if(! followingUsername belongs to users) return
            users.remove(followingUsername)
            write(/user/username/line/linename/users, users)

            topostchunk = read(/user/username/followingUsername/linename/)
            delete(/user/username/followingUsername/linename/)

            reflist = read(/user/followingUsername/topostset/topostchunk)
            reflist.remove(new ref(username, linename))
            write(/user/followingUsername/topostset/topostchunk, reflist)
            end transaction
        )
    }

Create a user

The first thing to do when creating a user is to check whether another user with the desired username is already registered in the system. To do so we check whether there is already a value at the key "/user/username". If there is, we conclude that a user with this username is already registered and the user creation is aborted. Otherwise a user object containing all the information of the user is created and stored at this key.
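The check-then-create logic can be illustrated with a minimal sketch. The snippet below is not the Bwitter implementation; it only mimics the test-and-set semantics with an in-memory map (in the real system the check and the store are executed inside a single Scalaris transaction so that two concurrent creations of the same username cannot both succeed). The class and key format are ours, for illustration only.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class CreateUserSketch {
        // In-memory stand-in for the datastore; keys follow the "/user/<username>" scheme.
        private final ConcurrentMap<String, Object> store = new ConcurrentHashMap<>();

        /** Returns true if the user was created, false if the username is already taken. */
        public boolean createUser(String username, Object userObject) {
            // putIfAbsent only stores the value when no value exists at the key,
            // which is the behaviour we need from the transactional check.
            return store.putIfAbsent("/user/" + username, userObject) == null;
        }

        public static void main(String[] args) {
            CreateUserSketch sketch = new CreateUserSketch();
            System.out.println(sketch.createUser("alice", "alice's profile")); // true
            System.out.println(sketch.createUser("alice", "impostor"));        // false, name taken
        }
    }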

5.4.2 The pull approach

As in section 4.3.3 of the datastore design chapter, we tried to re-use as many of the building blocks and mechanisms of the push approach as possible while still having efficient alternative algorithms. With the pull approach only the mechanisms needed to post and retrieve tweets are heavily modified. Other basic mechanisms, such as adding a user, are simplified because there is no Topost set to keep up to date anymore and some fields no longer need to be initialised. Finally, some simple mechanisms such as deleting tweets are exactly the same.

Post a tweet

The tweet itself is posted in much the same way as with the push approach explained in section 5.4.1. The difference is that the user now posts the tweet reference at a single location, which varies with time. This means that all the tweets posted during a given time frame are grouped together and accessed via the same rounded timestamp. We call the set containing all the tweet references corresponding to a time frame a postTimeFrame. The timestamp is rounded to the desired time granularity by setting some of its fields to 0, as explained in section 4.3.3 of the datastore chapter.

    posttweet(posterName, msg){
        SR(
            begin transaction
            tweetNbr = read(/user/posterName/tweet/size)
            tweet = buildTweet(posterName, msg, tweetNbr)
            write(/user/posterName/tweet/tweetNbr, tweet)
            write(/user/posterName/tweet/size, tweetNbr+1)
            postingDate = currentDate()
            tweetReference = buildTweetRef(posterName, tweetNbr, postingDate)
            references = read(/user/posterName/tweet/timestamp)

            references.add(tweetReference)
            // Write the reference to the given postTimeFrame
            write(/user/posterName/tweet/timestamp, references)
            end transaction
            return tweetNbr
        )
    }

As expected, the post tweet operation is much lighter in this case, with only 5 operations in total (2 reads and 3 writes). One could wonder why we still post references instead of full tweets. The reason comes from the algorithm used to read tweets: as we explain in the next section, a time frame must be read for each followed user, so we wanted to keep the size of a time frame small.

Reading the tweets

This operation is now heavier as it has to retrieve the references from each author. We have kept the chunk number format for the sake of simplicity and compatibility with the existing API. Chunk 0 is the very first chunk associated with the user; the timestamp of this chunk is the rounded registration time of the user. For example, if a user registered at 05/06/11 15 h 00 min 00 s GMT and the time granularity is one hour, when he requests chunk 2 he fetches all the tweets posted between 05/06/11 17 h 00 min 00 s GMT and 05/06/11 17 h 59 min 59 s GMT by all the users he follows. If a negative chunk number is requested, the latest chunk is returned along with its real chunk number. In the same fashion as the post tweet in the push approach, we create a series of smaller tasks: one SR per followed user, responsible for fetching the tweets of that user. The tweets fetched are those in the chunk corresponding to the chunk number cNbr given as argument to the function getTweetsFromLine, described below.

    getTweetsFromLine(username, linename, cNbr){
        // First step
        // Produce a list of SRs to add to the SCM, each SR takes care of one user
        SRlist = SR(produceLineProcessors(username, linename, cNbr))

        // Second step
        // Add all the SRs produced to the SCM and get their tweets
        for each sr in SRlist
            add sr to the SCM
        for each sr in SRlist {
            try {
                // Block until the result is computed
                result.add(sr.getTweets)
            } catch (exception) {
                // The tweets of a user could not be retrieved, in this case we abort the reading.
                return null
            }
        }

        chronologicalSort(result)

        return result
    }

produceLineProcessors This part creates one lineProcessor per followed user. A lineProcessor takes as argument the key of the chunk that it must fetch. To compute the key of the chunk we must convert cNbr to a date because, as already explained, lines are fragmented according to time, so each chunk in a line corresponds to a specific date (a sketch of this conversion is given after the code below).

    produceLineProcessors(username, linename, cNbr){
        SR(
            startTime = read(/user/username/starttime)
            dateKey = chunkToDate(startTime, cNbr)
            users = read(/user/username/line/linename/users)

            SRlist = new emptyList
            for each user in users{
                SRlist.add(new lineProcessor(dateKey, user))
            }
            return SRlist
        )
    }
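To make the chunk-to-date mapping concrete, here is a minimal sketch of what a chunkToDate helper could look like. It is not taken from the Bwitter code; it assumes that timestamps are epoch milliseconds and that the time granularity is one hour, and it simply rounds the registration time down to the granularity and adds cNbr time frames.

    import java.util.concurrent.TimeUnit;

    public class ChunkToDateSketch {
        // Assumed granularity: one hour per postTimeFrame.
        private static final long GRANULARITY_MS = TimeUnit.HOURS.toMillis(1);

        /** Maps a chunk number to the (rounded) timestamp used as part of the key. */
        static long chunkToDate(long registrationTimeMs, int cNbr) {
            // Chunk 0 corresponds to the registration time rounded down to the granularity.
            long chunkZero = (registrationTimeMs / GRANULARITY_MS) * GRANULARITY_MS;
            // Chunk cNbr starts cNbr time frames later.
            return chunkZero + cNbr * GRANULARITY_MS;
        }

        public static void main(String[] args) {
            long registration = System.currentTimeMillis();
            // With hour granularity, chunk 2 starts two hours after chunk 0.
            System.out.println(chunkToDate(registration, 2) - chunkToDate(registration, 0));
        }
    }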

lineProcessor This part fetches the tweets posted by a given user during the dateKey time frame. Note that no ordering is done at this stage, as all the tweets are ordered at the end of getTweetsFromLine. As for the previous reading tweets operation, we do not abort an SR if one of the tweets is not accessible, but rather ignore the error, as this does not compromise the rest of the operations. This case should happen very rarely: tweet objects, once stored, are only modified when the author wants to delete them; otherwise they are only read, and reads do not conflict and should thus not abort.

    lineProcessor(dateKey, username){
        SR(
            refList = read(/user/username/tweet/dateKey)

            tweets = new tweetList
            for each tweetRef in refList{
                tweet = SR(read(/user/tweetRef.posterName/tweet/tweetRef.tweetNbr))
                if(! tweet.isDeleted) tweets.add(tweet)
            }
            return tweets
        )
    }

Theoretical performance analysis The whole getTweetsFromLine operation is estimated to perform 2 + nbrFollowing + nbrRetrievedTweets basic Scalaris operations, where nbrFollowing is the number of users followed and nbrRetrievedTweets the total number of tweets retrieved. Indeed we need one read to determine dateKey and one read to determine the users we follow. Then, to fetch the references, we must read for each followed user the chunk corresponding to dateKey, hence nbrFollowing operations. Finally we must do nbrRetrievedTweets operations to read the tweets corresponding to the references we just read.
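As a concrete illustration (the numbers are arbitrary), a user following 30 people and retrieving 20 tweets needs 2 + 30 + 20 = 52 Scalaris operations for one read of his line with the pull approach.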

Why not store the built line chunks An alternative approach would be to keep the work done and store the built chunk once it has been read. This would avoid building the same chunk of a line several times. The application could simply check whether a given chunk has already been built and, if so, retrieve the references from the previously stored chunk. This might sound like an interesting optimisation, but we have to keep in mind the way our application is going to be used. Users almost never re-read tweets they have already read; they usually want to see the latest posted tweets. This means that they are going to load the latest chunk to see if there are new references in it. This latest chunk has to be rebuilt from scratch anyway, so storing the previous tweet references does not speed up this operation. Furthermore, trying to keep previously built chunks of the line increases the number of checks and operations to perform when the references are not already on the follower side, which is precisely the case when reading previously unread tweets. Finally, the most obvious advantage of not storing line chunks is that it decreases the space complexity. We thus decided against this solution because it complicates the implementation, increases the amount of data to keep in the system and slows down the most used operations in order to speed up rarely used ones.

5.4.3 Theoretical comparison of Pull and Push approach

We are first going to compare the two approaches based on the complexities we computed in the previous section. We then try to give an intuition of the impact of those complexities on the behaviour of Bwitter when used by simulated users.

Summary of the complexities

We now compare the complexity of the push and pull approaches for the two main operations, postTweet and getTweetsFromLine. Below we present a summary of those operations for both the push and pull approaches.

• Push - postTweet:
  nbOp = 8 + nbrFollowers × (2 + 3/nbrTweetsPerChunk + 2/nbrOfFollowersPerChunk)    (5.5)

• Pull - postTweet:
  nbOp = 5    (5.6)

• Push - getTweetsFromLine:
  nbOp = 2 + nbrTweets + nbrTweets/nbrTweetsPerChunk    (5.7)

• Pull - getTweetsFromLine:
  nbOp = 2 + nbrFollowings + nbrRetrievedTweets    (5.8)

Before we start, here is a reminder of the different terms involved:

• nbrFollowers: number of users the user is followed by (pull/push).
• nbrFollowing: number of users the user follows (pull/push).
• nbrTweets: minimum number of tweets we want to retrieve (push).
• nbrTweetsPerChunk: number of tweets in one chunk (push).
• nbrRetrievedTweets: number of tweets retrieved in one get (pull).
• nbrOfFollowersPerChunk: number of followers in a chunk of the Topost set (push).
• time granularity: the time frame corresponding to a post chunk (pull).

As announced, the post operation in the push approach is obviously much heavier than in the pull: the time to post a tweet using the pull design is constant, while in the push approach it depends on the number of users that follow you. Conversely, the read is lighter in the push approach: it depends on the number of tweets retrieved, as in the pull, but its complexity does not grow with nbrFollowings as it does in the pull. The two designs thus each have their heavy operation, the post for the push and the read for the pull. However, we believe the former is more resistant to failures because it does not need to succeed immediately and can be recovered later, whereas the latter must read from all the followed users successfully in order to produce a result. If we further consider that a user does not like to wait, the push is more reasonable: after the first step of the post we can already report that the operation was a success, while for the read in the pull we must wait until the end of the whole operation before responding to a request. On the other hand, push operations involve many more conflicts, as they do far more writes than pull operations. We have not yet determined whether, from a complexity perspective, it is better to use the push or the pull. To this end we compare them according to the number of followers and followings and the read and write rates.

Theoretical Bwitter simulation

We now simulate the two designs we have presented. The simulation is aimed at estimating the global number of operations performed by the system and determining which design is the best as a function of the (unknown) number of followers and read rate. This simulation does not take into account failures during the algorithms, the size of the data transferred, or the complexity of the transactions (number of keys involved in a transaction).

Description of the problem As they stand, the operations are not directly comparable. In the push approach we fetch at least a specified number of tweets, whereas in the pull approach we retrieve an arbitrary number of tweets, depending on how many tweets were posted during a given time frame. The two operations are thus semantically different, and their complexities naturally depend on different parameters. We would like to compare them in terms of the total number of operations performed on Scalaris. The main problem is that the parameters of the system are unknown. Each user of Bwitter is different; we define a user in terms of his behaviour, using four parameters:

• postingRate: the rate at which a user posts new tweets.
• readRate: the rate at which a user reads his tweets.
• nbrFollowers: the number of followers a user has.
• nbrFollowings: the number of followings a user has.

Moreover, in order to estimate the number of operations performed we must fix all the design parameters involved in the complexities, namely nbrTweets, nbrTweetsPerChunk and nbrFollowersPerChunk for the push, and time granularity and nbrRetrievedTweets for the pull.

Assumptions The parameters we just described vary a lot between users and are unknown; we did not find any precise statistics about the usage of Twitter. Because we would still like to give an idea of the performance of our two designs as a function of those parameters, we fixed them to values chosen according to the following assumptions:

(1) Users read their tweets more often than they post new ones.

(2) Most of the users on Twitter have more followings than followers; we call them fans. Other users have a lot of followers compared to the number of users they follow; we call them stars. This means that nbrFollowers is smaller for fans than for stars.

(3) Users are only interested by new tweets and when a user reads his tweets he reads all the new tweets.

(4) readRate is the same for all the users and is the average of the read rates of each user in the real network. Because we cannot compute it as we do not have the figures, we take it as a parameter of the simulation.

(5) postingRate is the same for all the users and is the average of the posting rates of each user in the real network. Because we cannot compute it as we do not have the figures, we take it as a parameter of the simulation.

(6) nbrFollowings is the same for all the users.

The first three assumptions come from observing how Twitter is used. Generally, users that connect to a Twitter application read their messages more often than they post new ones. Also, 1% of the users of Twitter are responsible for 50% of its content; this observation motivated our distinction between the star and fan behaviours. The last three assumptions were made in order to simplify the following development.

Properties of the simulated system We define in this section two properties derived from the assumptions made above. Those properties fix some of the simulation parameters defined before.

First property: The number of new tweets a user reads when he reads his tweets is constant and equal to nbrOfNewTweets.

nbrOfNewTweets = (postingRate × nbrFollowings) / readRate

First, notice that nbrOfNewTweets is constant because postingRate, nbrFollowings and readRate are fixed. We derived this property from (3), (4), (5) and (6). It allows us to fix nbrTweets (push) and nbrRetrievedTweets (pull) to nbrOfNewTweets. We also decided to read only one chunk (push) and one postTimeFrame per following (pull). This choice was made in order to simplify the simulation, which is already rather complex. This constraint helps us fix the time granularity and nbrTweetsPerChunk. Concerning the push approach, it is easy to fix nbrTweets as it is a parameter of the function call. In order for the tweets to be packed in the same number of chunks at each read call, it is sufficient to choose nbrTweetsPerChunk to be a multiple of nbrOfNewTweets, or put differently:

nbrTweetsPerChunk % nbrOfNewTweets = 0

We decided to fix nbrTweetsPerChunk to nbrOfNewTweets. Therefore, we need to read exactly one chunk in order to have the new tweets at each read operation.
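As a concrete illustration with arbitrary numbers: if each user posts at postingRate = 0.25 (one tweet every four time units), reads at readRate = 1 (once per time unit) and follows nbrFollowings = 40 users, then nbrOfNewTweets = 0.25 × 40 / 1 = 10, so nbrTweets, nbrRetrievedTweets and nbrTweetsPerChunk are all fixed to 10 tweets per read.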

Concerning the pull approach, we cannot directly influence how many tweets are read when performing a read tweet operation. However we can fix the time granularity so that:

time granularity % (1/readRate) = 0

This ensures that all the new tweets are always in the last postTimeFrame. We chose to fix the time granularity to 1/readRate. This gives the smallest chunk possible (no unused references loaded) while fulfilling the property just stated. Please note that the choice of the time granularity does not have any direct influence on the simulation; we merely wanted to show that our design can be tuned to meet the simulation constraints.

Second property: Each user has the same number of followers (nbrFollowers is fixed). We now argue that this second property is not restrictive. Notice that it is aimed at simplifying assumption (2); in other words, it claims that there is no distinction between stars and fans, or any other way of distinguishing users based on their nbrFollowers. This distinction is in fact not needed. Our simulation estimates the global number of operations performed by the system according to some user profile, and thanks to the two properties below we can affirm that having some users with more followers than others has no influence on the total number of operations.

(7) The postingRate and readRate are the same for all users (which is exactly what we assumed in (4) and (5)).

(8) The complexities of the operations in the two designs are linear with respect to the number of followers and followings (this can be observed by remembering that nbrTweetsPerChunk and nbrFollowersPerChunk are constant parameters).

Property (8) states that one additional follower only increases the load a user puts on the system by a constant amount (the same for all users) for each operation he performs. Consequently, moving a follower from one user to another does not change the total load put on the system, provided all users perform the same number of operations; this last condition is exactly what property (7) states. If (7) did not hold, we could have a system with one user having many followers but a posting rate equal to 0, and another user with few followers and a non-zero postingRate. The first user would not generate any posting load as he never posts, but transferring one of his followers to the second user would change the total load put on the system. To summarize, thanks to (7) and (8) we can always move followers from users having more followers to users having fewer followers without changing the total number of operations performed on the network. We thus proved that it is not necessary to distinguish between stars and fans.

In conclusion, the two properties we just defined fix the following relations between the simulation parameters.

• nbrFollowers = nbrFollowings
• (postingRate × nbrFollowings) / readRate = nbrOfNewTweets = nbrTweets = nbrRetrievedTweets = nbrTweetsPerChunk
• time granularity = 1/readRate

The simulations We now explain the final details of the simulation. Below are the formulas we use to simulate Bwitter; the first formula is for the push design and the second for the pull design.

Push:

nbOp = postingRate × (8 + nbrFollowers × (2 + 3/nbrNewTweets + 2/nbrOfFollowersPerChunk)) + readRate × (3 + nbrNewTweets)    (5.9)

Pull:

nbOp = postingRate × 5 + readRate × (nbrFollowings + 2 + nbrNewTweets)    (5.10)

Those formulas compute the number of operations performed with respect to the readRate, the postingRate and the nbrFollowers. Recall that an operation is a transactional read or write. Because we do not simulate operations other than reading and posting tweets, there is a direct relation between the two rates: if we normalize them, readRate + postingRate = 1. We thus make readRate vary from 0 to 1, with postingRate varying accordingly. We define nbrUsers as the number of users in the system. We chose nbrFollowers which, as already stated, represents the mean number of followers each user has, and thus also his number of followings, since nbrFollowings = nbrFollowers. Because we had no idea of the value of this number we chose some arbitrary values; the higher it is, the more socially connected the users in our system are. Finally we must fix the last unknown parameter, nbrOfFollowersPerChunk. This parameter only appears in the push design; the number of operations needed for a post decreases as it increases. The problem is that it is difficult to fix a value for it: we cannot neglect its influence, but we cannot decently set it very high either, because the number of keys involved in the posting transactions grows linearly with it. We thus made a compromise and set it to 20. We summarise below the values of the parameters.

• nbrUsers = 100
• nbrFollowers = 10, 30, 70
• nbrOfFollowersPerChunk = 20
• (postingRate × nbrFollowings) / readRate = nbrOfNewTweets = nbrTweets = nbrRetrievedTweets = nbrTweetsPerChunk
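A small program makes it easy to evaluate these formulas. The sketch below is not the code used for the thesis; it simply evaluates formulas 5.9 and 5.10 for a range of read rates with the parameter values listed above (nbrOfFollowersPerChunk = 20, nbrFollowers = nbrFollowings), assuming normalized rates.

    public class PushPullSimulation {
        // Formula 5.9: operations generated per user action in the push design.
        static double pushOps(double readRate, double nbrFollowers, double followersPerChunk) {
            double postingRate = 1.0 - readRate;                          // rates are normalized
            double nbrNewTweets = postingRate * nbrFollowers / readRate;  // first property
            return postingRate * (8 + nbrFollowers * (2 + 3 / nbrNewTweets + 2 / followersPerChunk))
                    + readRate * (3 + nbrNewTweets);
        }

        // Formula 5.10: operations generated per user action in the pull design.
        static double pullOps(double readRate, double nbrFollowings) {
            double postingRate = 1.0 - readRate;
            double nbrNewTweets = postingRate * nbrFollowings / readRate;
            return postingRate * 5 + readRate * (nbrFollowings + 2 + nbrNewTweets);
        }

        public static void main(String[] args) {
            int[] followers = {10, 30, 70};
            for (int f : followers) {
                for (double r = 0.05; r < 1.0; r += 0.05) {
                    System.out.printf("followers=%d readRate=%.2f push=%.1f pull=%.1f%n",
                            f, r, pushOps(r, f, 20), pullOps(r, f));
                }
            }
        }
    }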

We have plotted the results of our simulation in Figure 5.6. Lines go in pairs (one push and one pull); lines with the same weight correspond to the same number of followers. We have indicated the relevant intersections with big red dots.

Figure 5.6: Number of Scalaris operations with respect to the read rate: comparison between the pull and the push approach for nbrFollowers = 10, 30 and 70.

First, you can observe that all the lines of the pull approach are parallel; this means that nbrFollowers increases the number of operations by a constant amount whatever the readRate. We can also see that the number of operations in the pull approach does not vary much with the readRate, which may be surprising at first. Even more surprisingly, it decreases slowly as the readRate increases. Secondly, we can see that, as expected, as the readRate increases the push approach becomes more and more interesting. The smaller nbrFollowers is, the higher the readRate must be before the push approach becomes more interesting than the pull. If you observe the red dots you can see that some kind of asymptote seems to appear, indicating that below some readRate the push approach is never the good choice. We thus plotted the curve defined by the intersections of the pull and push lines in Figure 5.7 to confirm this intuition. We kept on the plot the lines already shown before to better visualize what the curve represents. The curve shows the intersections for nbrFollowers between 4 and 300; values of nbrFollowers smaller than 4 give intersections at a readRate bigger than 1, which does not make sense.

Figure 5.7: Intersection of the push/pull lines for nbrFollowers between 4 and 300.

This curve can be used to determine which design is theoretically the best according to nbrFollowers and readRate. We can observe an asymptote around readRate = 0,7. We did the math for nbrFollowers = 30000 and obtained readRate = 0,672. We can also note that once nbrFollowers is higher than 70 the black curve becomes nearly vertical. This means that for a readRate bigger than 0,672 and nbrFollowers bigger than 70 the push approach is theoretically always the best in terms of number of Scalaris operations performed.

Conclusion

In conclusion, we have compared the push and the pull theoretically according to an unknown mean nbrFollowers and an unknown readRate. We have seen that we could find a value of the readRate under which the pull approach is always the best. However, above this value and with nbrFollowers bigger than 70, the push approach is the best. It seems safe to assume that social networks like Twitter are in the second case. Moreover, the read algorithm in the pull is heavier and we must wait for its termination in order to respond to a given call, which is not the case for the posting in the push. Based on those observations, we believe the push approach is better suited to a social network like Twitter. We will see in the next chapter whether the practical tests confirm this conclusion.

5.5 Conclusion

In this chapter we detailed the main modules of our implementation. The NM is a powerful tool that allows us to manage the different machines we need to run Scalaris nodes, and the SCM allows us to easily dispatch work on those nodes. The BRH is the module on which we spent the most time and attention in order to design the simplest and fastest algorithms, and we believe we have minimised the complexity of our most used algorithms. Finally, our theoretical comparison between the push and pull approaches strengthens our belief that the push approach is probably better suited to our application. In the next chapter we run tests on Scalaris and on Bwitter's pull and push variations.

Chapter 6

Experiments

This chapter details the experiments we performed on Scalaris and Bwitter. The first part describes the Amazon Elastic Compute Cloud, the platform on which we ran all our tests. We then detail the tests we perform on Scalaris and Bwitter; for both we run scalability and elasticity tests. We start in the second section with Scalaris, as the results of the Bwitter tests are heavily influenced by those of Scalaris. Bwitter is tested in the third part: we study the influence of a cache and of the nbrOfFollowersPerChunk parameter for the push, then test the scalability and elasticity of our Bwitter push solution, and finally study the scalability of the pull approach before concluding.

6.1 Working with Amazon

We do not want to simulate the cloud platform ourselves, as we feel it would not reflect the way our application would ultimately be used. We thus decided to work with the Amazon Elastic Compute Cloud (Amazon EC2) because it is a professional and realistic work environment.

6.1.1 Choosing the right instance type

An instance is a virtual machine running on a physical machine. It is characterized by four attributes: CPU, network capabilities (we sometimes say IO capacity), RAM and storage capacity. The last attribute is the least interesting to us as none of our tests use persistent storage. While working on the Amazon cloud infrastructure, we used four kinds of instances: the standard micro, the standard small, the standard large and the high CPU medium instance. The micro instance is the smallest possible Amazon instance. It provides minimal CPU and IO capacity. The micro instance can consume up to 2 EC2 Compute Units for short periods of burst, which is not enough to run Scalaris correctly. According to Amazon, an EC2 Compute Unit is equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. You can find more information about Amazon instances and EC2 Compute Units on the Amazon website1. We show the description of Amazon's micro instance in Table 6.1.

                  Standard Micro        Standard Small        Standard Large        High CPU Medium
                  instance              instance              instance              instance

Memory            613 MB                1.7 GB                7.5 GB                1.7 GB

Compute           Up to 2 EC2 Compute   1 EC2 Compute Unit    4 EC2 Compute Units   5 EC2 Compute Units
                  Units (for short      (1 virtual core with  (2 virtual cores      (2 virtual cores
                  periodic bursts)      1 EC2 Compute Unit)   with 2 EC2 Compute    with 2.5 EC2 Compute
                                                              Units each)           Units each)

Storage           EBS storage only      160 GB instance       850 GB instance       350 GB instance
                                        storage               storage               storage

Platform          32-bit or 64-bit      32-bit                64-bit                32-bit

I/O Perf          Low                   Moderate              High                  Moderate

API name          t1.micro              m1.small              m1.large              c1.medium

Table 6.1: Characteristics of the different Amazon instance types we use during the tests.

The small instance is just above the micro; it provides moderate IO performance and fixed CPU. Small instances were well suited to run up to 18 Scalaris nodes, but showed some CPU and IO limitations when we used a high number of connections and/or nodes. As for the micro, the characteristics of the small instance can be found in Table 6.1. Most of the tests use small instances to run the Scalaris nodes, as they are rather cheap and efficient, but we could have benefited from instances with higher CPU and network capabilities, as shown later.

1 Amazon EC2 FAQs, http://aws.amazon.com/ec2/faqs/, last accessed 27/07/2011

We also use the large instance, which has better network performance than the two others, and the high CPU medium instance, which has the same network performance but much higher CPU performance. Those two instances are used for special tests, when we suspect that some behaviours can be explained by the lack of performance of the other instances. At first we tried to work with the micro machines, but they turned out not to be powerful enough to support Scalaris and the operations we wanted to perform. Those preliminary measures are thus not relevant and we only detail our experiments and results with the other instances we presented.

6.1.2 Choosing an AMI

Instances need to have an associated Amazon Machine Image (AMI). AMIs can use two kinds of storage: instance (AMI) storage and the Elastic Block Store (EBS). The first does not allow the user to stop and restart the machine; once the machine is stopped all the modifications made are lost. The second works like a normal personal computer: you can restart the machine and the changes made before are still present. We use the EBS solution because it allows us to easily create custom images from existing AMIs and store them, which is not possible with classical AMI storage.

6.1.3 Instance security group

Amazon instances all belong to a security group, which defines several firewall settings for the instances. For the sake of simplicity, we have allowed all TCP connections as well as all ICMP messages between the nodes.

6.1.4 Constructing Scalaris AMI

We started from the AMI with ID ami-06ad526f, a 32-bit image of Ubuntu 11.04 (Natty Narwhal)2. The first step is to install all the packages needed to build Scalaris: the Java JDK, Erlang, make, Subversion and Ant. We ran the following commands to install the required packages.

sudo apt-get install erlang
sudo apt-get install make
sudo apt-get install openjdk-6-jdk
sudo apt-get install ant
sudo apt-get install subversion

We then installed the latest version (0.3.0) of Scalaris, downloaded from the SVN.

svn checkout http://scalaris.googlecode.com/svn/trunk/
cd /home/ubuntu/trunk/
sudo ./configure
sudo make install
sudo make install java

2 Can be found at http://uec-images.ubuntu.com/releases/11.04/release/, last accessed 27/07/2011

We also modified the starting scripts of Scalaris a little and added some scripts to restart Scalaris easily on a machine. Once all those steps were performed, the new AMI was ready to run Scalaris.

6.2 Working with Scalaris

We now detail the procedure to launch Scalaris and the different tests we did on it before testing our Bwitter application.

6.2.1 Launching a Scalaris ring

The first thing to do is to modify the “scalaris.local.cfg” file, which is located in the bin folder of Scalaris. The two important lines shown below must be modified.

{mgmt_server, {{127,0,0,1}, 14194, mgmt_server}}.
{known_hosts, [{{127,0,0,1}, 14195, service_per_vm}]}.

The mgmt_server, known_hosts and service_per_vm parts must not be modified, otherwise Scalaris will not work correctly; nodes do not connect correctly when those values are changed. You must replace the IP address of the first line with the IP address of the node running the management server (mgmt_server). 14194 is the port on which the management server runs; note that you can change it. The second line contains the known hosts, i.e. the other DHT nodes already inserted in the ring. Each known host is identified by an IP address and the port on which it listens. Below is an example of configuration.

{mgmt_server, {{192,168,1,1}, 14194, mgmt_server}}.
{known_hosts, [{{192,168,1,1}, 14195, service_per_vm},
               {{192,168,1,2}, 14195, service_per_vm},
               {{192,168,1,3}, 14195, service_per_vm},
               {{192,168,1,1}, 14200, service_per_vm}]}.

In this configuration, one machine (192.168.1.1) runs the management server and two DHT nodes (on ports 14195 and 14200). Launching the nodes is quite simple: the three following commands are used respectively to run the management server, the first node and another DHT node. The "scalarisctl" binary is located in the bin folder of Scalaris.

./scalarisctl -n mgmt_server@hostname -p 14195 -y 8000 -m start
./scalarisctl -n FirstNodeName@hostname -p 14195 -y 8000 -s -f start
./scalarisctl -n AnotherNodeName@hostname -p 14195 -y 8000 -s start

Note that each node has a name, which is needed to communicate with Scalaris nodes. The mapping between the node name and its location (IP address and port) is done by the epmd server, which is launched automatically with Scalaris. It is possible to launch several Scalaris nodes on the same machine; they only need to have different node names. The node name is set with the "-n" parameter. In fact only the part before the @ is the true name, but fixing the hostname is important if you want to avoid

communication problems when using the Java API for Scalaris. Indeed, Java does not resolve hostnames the same way Erlang does, and Scalaris is written in Erlang. Fixing the hostname prevents Erlang from choosing it itself, and using the same hostname in Java avoids the problem. The parameter "-p" fixes the port on which the DHT nodes communicate, which is important in order to configure the firewall settings. The parameter "-y" fixes the port on which the webserver runs; this webserver is not mandatory but makes debugging easier, as you can do put/get operations directly from its webpage. You can also get a visual representation of the complete ring from the webpage of the management server. Finally, the parameters "-m", "-f" and "-s" are used respectively to start the management server, the first node and a normal DHT node.

6.2.2 Scalaris performance analysis

Before doing any test directly related to Bwitter, we need some important information about Scalaris itself in order to understand our future results. Our first analysis focuses on the connection strategy used to communicate with Scalaris nodes. We then perform scalability and elasticity tests based on those results. Scalaris is configured with a replication factor of 4. Scalaris does not allow choosing the consistency level between replicas and thus always guarantees strong consistency. This means that read and write operations are always done in a transaction, and that read and write operations conflict if they work on the same keys. However, concurrent reads of the same value do not conflict, which is important to keep in mind during the tests. One important precision is that we only run one instance of Scalaris per machine. We decided to do so because the Scalaris developers told us that having more than one instance per node was less stable and only slightly increases the overall performance of the system; moreover, small Amazon instances might not be powerful enough to handle more than one instance of Scalaris. During our tests with Scalaris we take two measures: the time, in milliseconds, taken to perform 20000 operations and the number of operations that failed during the test. We do not apply any restart strategy: if an operation fails we report it and execute the next operation. We then compute the throughput and the failure percentage, defined respectively by equations 6.1 and 6.2. We have chosen to show the throughput as it is easier to analyze and closer to what we want to measure than the raw time; moreover, time can be difficult to interpret on its own and cannot be compared with other tests' results unless exactly the same number of operations is done. The failure percentage has the advantage of being easily comparable for others performing similar tests.

Throughput = number of Scalaris operations successfully performed / total measured time    (6.1)

Failure percentage = (number of failed operations / number of operations performed) × 100    (6.2)
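As a concrete illustration with made-up numbers: if 19 500 out of 20 000 operations succeed and the run takes 50 seconds, equation 6.1 gives a throughput of 19500/50 = 390 operations per second, and equation 6.2 gives a failure percentage of 500/20000 × 100 = 2,5%.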

Before presenting our tests we want to point out that the Amazon instances do not provide a constant level of performance. This means that the performance of the Scalaris nodes varies from one run to the other: we do not use the same physical machines all the time, but virtual machines whose performance can vary over time. We had many tests to run, so it was not possible to repeat each test several times. However, because we did many tests, we could detect when a result deviated too much from what we had already observed; in that case we restarted the test. Note that we do not detail all the tests we did on Scalaris, as some of them were only meant to familiarize ourselves with the system. We present only the most relevant ones, those that give the reader the broadest view of Scalaris' behaviour.

Connection strategy test

This test aims at evaluating the impact on performance of the number of parallel connections the dispatcher maintains towards a single Scalaris node. A connection is a TCP connection towards a Scalaris node which can be used to make sequential requests. The word sequential is important, as concurrent requests over the same connection trigger errors: Scalaris does not distinguish between the different requests, which thus get mixed if sent concurrently. The dispatcher is the node that sends operations to Scalaris nodes. We decided to run the dispatcher on a different machine than the Scalaris nodes because later we run our Bwitter nodes on dedicated machines; we believe that the overhead of Bwitter could perturb the execution of Scalaris, which is already quite heavy. Our guess is that the conflict level (conflictLevel) plays an important role in the optimal number of connections. We define the conflictLevel of a set of operations as the probability that a random pair of operations in the set conflict if they occur at the same time. Having more connections increases the probability that two conflicting operations occur at the same time, leading to their failure. We designed a benchmark with a fixed number of nodes (we chose 18, the maximum number of nodes we could launch in the test environment we were provided) and some predefined conflict levels, and made the number of connections vary for each value of conflictLevel. This benchmark consists of 20000 random operations, with as many reads as writes, operating on a random key inside a given pool of keys. The value written is always the constant String "test". The conflictLevel is inversely proportional to the number of keys on which we work: the smaller the number of different keys, the higher the chances that two parallel operations work on the same key and thus conflict. We believe that 20000 operations are enough for small variations not to influence the overall results. We decided that the best way to connect the dispatcher is to have a symmetric connection strategy with respect to each node, which makes sense as each node is supposed to be equivalent to the others.

Mathematically speaking this means that:

|number of connections to n1 − number of connections to n2| ≤ 1    ∀ n1, n2 ∈ set of nodes

We apply this symmetric connection strategy during all our tests. Please also note that, in order to avoid side effects, we shut down the whole ring between runs and start with a fresh ring each time. This test uses small Amazon instances for the dispatcher and the Scalaris nodes. The results are summarized in Figure 6.1 and Figure 6.2.
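A minimal sketch of such a symmetric assignment, assuming a plain round-robin over the node addresses (the class and method names are ours, not Bwitter's): distributing the connections in round-robin order guarantees that the connection counts of any two nodes differ by at most one.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SymmetricConnections {
        /** Assigns nbrConnections connections to nodes so that counts differ by at most 1. */
        static Map<String, Integer> assign(List<String> nodes, int nbrConnections) {
            Map<String, Integer> connectionsPerNode = new HashMap<>();
            for (String node : nodes) {
                connectionsPerNode.put(node, 0);
            }
            for (int i = 0; i < nbrConnections; i++) {
                // Round-robin: the i-th connection goes to node i modulo the number of nodes.
                String node = nodes.get(i % nodes.size());
                connectionsPerNode.put(node, connectionsPerNode.get(node) + 1);
            }
            return connectionsPerNode;
        }

        public static void main(String[] args) {
            // 7 connections over 3 nodes: one node gets 3 connections, the other two get 2.
            System.out.println(assign(List.of("n1", "n2", "n3"), 7));
        }
    }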

Figure 6.1: Read and write throughput with respect to the number of connections.

As we can see in Figure 6.2, the conflictLevel has a clear impact on the performance. The number of failed operations increases with the conflictLevel, leading to a lower throughput, as we can see in Figure 6.1. The number of failed operations also increases with the number of connections: it becomes obvious that having fewer connections lowers the number of failed operations in an environment where operations can conflict. We can distinguish two parts in Figure 6.1: the part before we reach as many connections as nodes, and the part after. We call this rupture the break point. In the first part (except for a conflictLevel equal to 0.1), the number of operations per second increases almost linearly with the number of connections. We thus deduce the following property: in normal conditions, where the conflictLevel is not tremendously high, it is necessary to use as many connections as nodes in order to fully take advantage of those nodes' power.

Figure 6.2: Failure percentage with respect to the number of connections.

In the second part, the throughput varies with the conflictLevel. When the conflict level is low, the throughput increases with the number of connections per node up to a certain point and then eventually decreases again, below the value measured at the break point. We believe the increase is due to the higher load put on the Scalaris nodes, while the decreasing part can be explained by the growing number of failures observed. We also believe that having only one dispatcher may not be enough: the network capacity of Amazon small instances is only moderate and the traffic towards Scalaris nodes increases with the throughput, so it is possible that we reached the maximum throughput for one dispatcher. Finally, the throughput does not increase with the number of connections per node in cases of very high conflict levels. For example, with a conflictLevel equal to 0,02 the throughput drops directly after the break point. For the line with a conflictLevel equal to 0,1, the throughput increase stops even before the break point and the throughput then keeps decreasing. This indicates that, if the conflictLevel is really high, the optimal number of connections is below the number of nodes, despite the fact that Scalaris could handle more parallel requests. We thus conclude that up to a given level of conflict between operations we must use at least as many connections as there are nodes. Using more connections also increases the throughput, but not as drastically, and it depends on the environment in which we are working.

Connection strategy conclusion: In the light of the tests we did, we have shown the crucial influence of the number of connections as well as the conflictLevel on the throughput and failure percentage. In a highly conflicting environment, it might be a good idea to reduce the number of connections a little. However, when operations almost never conflict, a higher number of connections can significantly increase the performance because it allows putting a higher load on Scalaris. Choosing the right number of connections is really difficult, as it requires estimating the conflict level, which is an application dependent parameter; moreover, results could also have been different for another number of nodes. We finally conclude that we must use at least as many connections as there are nodes: in most practical situations the conflictLevel is not high enough to justify going below this number.

Scalability test

Scalaris is claimed to be a scalable system. Although we could have simply accepted this claim, we wanted to verify it in our own environment, as it is really important for understanding the next tests.

First scalability test with one dispatcher and small instances: We performed 20000 writes on random keys and then read each of the keys we had just written. The conflictLevel should be close to 0 as keys are chosen randomly using Java's Math.random() function. We measure the time taken for all the writes and reads with respect to the number of Scalaris nodes, which we vary from 4 to 18, maintaining only one connection per node. As for the connection test, we use small Amazon instances for all the nodes (dispatcher and Scalaris nodes). The results can be found in Figure 6.3. We can clearly observe that the throughput increases with the number of nodes. It seems to increase more slowly when the number of nodes becomes higher: of the 70% throughput increase we observe between 4 and 18 nodes, 45% is already obtained between 4 and 8 nodes.

Second scalability test with one dispatcher and medium instances: We were surprised by the slowdown at the end of the previous test. Our assumption is that the small instances are not powerful enough to handle a ring of that size. We thus repeated the test with medium instances for the Scalaris nodes, the other parameters remaining the same. The results of this test can be found in Figure 6.4. We can see a general improvement in performance with more powerful machines, but again a decrease in scalability with a higher number of nodes. However, this decrease is not as marked and happens a few nodes later than in the previous case, around 10 nodes instead of 8. The performance of the machines certainly plays a role but is probably not the cause of this decrease; our guess is that some networking delays appear because we only use one dispatcher.

Figure 6.3: Throughput for 20000 Scalaris operations with respect to the number of Scalaris nodes, results for small instances and conflict level of 0.

Figure 6.4: Throughput for 20000 Scalaris operations with respect to the number of Scalaris nodes, results for small and medium instances and conflict level of 0.

Third scalability test with 2 dispatchers and small instances: Looking at our logs we noticed that the time the nodes spend waiting for a new job after finishing the previous one had an impact on this scalability. This time increases with the number of nodes in the ring, as the dispatcher must keep more nodes busy. Networking delays are thus probably the source of this problem. We now want to measure the magnitude of this impact. Our idea was to add another dispatcher in order to increase the load on the Scalaris nodes, so we performed a series of tests to measure the impact of having two dispatchers instead of one. In the first series of runs we have one dispatcher maintaining two connections with each Scalaris node, while in the second series of runs we have two dispatchers each maintaining one connection per Scalaris node. Note that we use two connections in the first case because we want to have the same number of parallel requests in the two tests. In order to widen our view of Scalaris' behaviour we opted for a conflictLevel equal to 0,004. We thus do 20000 Scalaris operations in total (20000 for the single dispatcher, and 10000 per dispatcher when we use two), with as many reads as writes, and make them overlap. Our results can be found in Figures 6.5 and 6.6.

Figure 6.5: Throughput for 20000 Scalaris operations with respect to the number of Scalaris nodes, results for one and two dispatchers on small instances and conflict level of 0,004.

As we can see in Figure 6.5, the throughput does not seem to be much affected by the addition of a second dispatcher, even though we can notice a clear difference once the ring has more than 8 nodes. The difference, however, seems too small to conclude that the scalability issues are due to the increasing waiting time of the nodes. Surprisingly, we see in Figure 6.6 that the failure percentage is always higher with the single dispatcher.

Figure 6.6: Failure percentage for 20000 Scalaris operations with respect to the number of Scalaris nodes, results for one and two dispatchers on small instances and conflict level of 0,004.

Final scalability test with 4 dispatchers and small instances: Finally, we want to see whether a single dispatcher with higher network capacity is a better choice than several small dispatchers with moderate network capacity. We invite you to consult Table 6.1 to recall the specifications of the small and large instances: the large instance offers far better performance than the small instance in every respect. We decided to make a final test with a really small conflictLevel, equal to 0,00007, and made the number of nodes vary from 8 to 16. Once more we chose another conflict level in order to widen our view. We again do 20000 operations in total: in the case of a single dispatcher it performs 20000 operations, and in the case of 4 dispatchers each performs 5000 operations. The single dispatcher has 4 connections per node and the 4 dispatchers use one connection per node each. Every dispatcher is connected to every Scalaris node. The results are shown in Figure 6.7. We do not show the failure percentages because their value is nearly 0 and their variation is not relevant. Our first observation is that the performance of all the tests increases linearly, meaning that all the configurations scale correctly. Then we can observe that using a small or a large dispatcher has no effect on the performance. This means that a small instance should be powerful enough to manage at least 16 × 4 connections to Scalaris, and that there are special conditions in the Amazon cloud that limit the networking performance with one dispatcher. We believe the 4 small dispatchers outperform the two other configurations because they can send new jobs to the Scalaris nodes more quickly than a single dispatcher. This confirms the results we obtained in the previous test.

Figure 6.7: Throughput for 20000 Scalaris operations with respect to the number of Scalaris nodes, results for one small, one large and four small dispatchers and conflict level of 0,00007.

We thus reach the conclusion that, while increasing the number of connections to Scalaris can increase the performance, it is sometimes necessary to have several dispatchers to put enough load on Scalaris. We can finally observe that with 4 dispatchers the throughput approximately doubles as the number of nodes doubles, indicating really good scalability.

Comparison with the Scalaris developers' scalability tests: We discussed with Florian Schintke, a member of the Scalaris development team, about their scalability tests. They use a different approach than ours and do not perform any conflicting operations. For instance, they make the number of nodes vary and have 10 clients per node; each client begins by initializing a random key and then does 1000 increments on this key. The probability of conflict between operations is thus infinitesimally small. They also used more powerful machines than ours and were not working on the Amazon cloud. Figure 6.8 shows one of the results Florian Schintke sent us. We can clearly see that Scalaris scales correctly. However, their tests are rather different from ours for several reasons. First, they use completely different infrastructures. Secondly, most of our tests work with a non-zero conflictLevel, which is important for us as we know that Bwitter will obviously work with conflicting values. Finally, we do not run our dispatcher on the same machines as the Scalaris nodes. We believe it is not realistic for us to have the Bwitter nodes (the equivalent of the dispatcher in our tests) directly on the Scalaris nodes, as this would perturb Scalaris nodes that can potentially already be under high load. Furthermore, we would reduce the benefit gained from the cache by having more Bwitter nodes.

Figure 6.8: Increment Benchmark test of the Scalaris developers.

Final words on scalability and the connection strategy: We have concluded that Scalaris is scalable, as the performance clearly improves with the number of nodes. We explain the performance slowdown at higher numbers of nodes by the fact that the load we put on the Scalaris nodes is not high enough. To increase the load we have three possibilities: increase the number of connections, use several dispatchers, or improve the networking performance of the environment. Using several dispatchers gives slightly better results than having only one. Therefore, we believe that beyond a certain number of connections managed by a dispatcher it is a good idea to add another dispatcher to get better scalability. We were limited in the number of machines at our disposal to do all the tests we wanted; we believe that the results would have been more explicit if we could have reached a higher maximum number of nodes. Scalability is also limited by the conflictLevel: the higher the conflictLevel, the fewer connections and parallel requests we can use without the number of failures exploding, as shown by the connection test.

Elasticity test

Test description Until now we worked with a constant number of nodes during each test. In order to react to flash crowds, we need Scalaris to be elastic enough so that the throughput can be increased quickly. The detection of the flash crowd is not part of the test; we consider that the flash crowd starts at the beginning of the test. We then have to decide what the best strategy is to handle this flash crowd. To determine it we observe the throughput as well as the failure percentage during the whole test. The final throughput reached is also important to us, as well as the total number of operations performed during the whole test, in order to determine which behaviour is the best during the churn period.

Parameters We have observed that Scalaris scales well from 6 to 18 nodes, and we are going to test different ways to get from 6 to 18 nodes under high load. We use one dispatcher to send a constant number of parallel requests to Scalaris. This dispatcher is also responsible for adding the new nodes to the ring. Note that it takes between 45 and 200 seconds to start a new node using the Amazon API. The dispatcher periodically samples the number of operations correctly performed as well as the number of failures, which allows us to plot the evolution of the throughput and failure percentage over time. We now present the different strategies we try. Each strategy is defined by a number of nodes to add at each adding point and a constant time between adding points. For each strategy we wait one minute before adding the first node so that we can observe what happens before and after.

(1) We do nothing in order to have a standard measure to compare with the other results.

(2) One node added after one minute and then no more.

(3) One node added every minute until we reach eighteen nodes.

(4) Two nodes added every minute until we reach eighteen nodes.

(5) Two nodes added every two minutes until we reach eighteen nodes.

(6) Six nodes added every five minutes until we reach eighteen nodes.

(7) Twelve nodes added after one minute.

We believe that with those strategies we have covered almost all reasonable behaviours: doing nothing, adding nodes regularly, and adding many nodes at the same time but waiting longer before the next addition. We must point out that those strategies are targets; it may not be possible to add nodes as quickly as planned, so we will most probably observe jitter in the nodes' starting times. Each strategy can be seen as a simple scheduling loop, sketched below.
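A minimal sketch of such a scheduling loop as it could be run by the dispatcher; the NodesManager interface and its startScalarisNodes method are hypothetical names for illustration only, not our actual Nodes Manager.

import java.util.concurrent.TimeUnit;

// Hypothetical node-management interface; the real Nodes Manager differs.
interface NodesManager {
    void startScalarisNodes(int count); // launch EC2 instances and start Scalaris on them
}

public class AddingStrategy {
    // A strategy is "add nodesPerStep nodes every stepSeconds seconds until
    // targetNodes is reached", starting one minute into the test.
    static void run(NodesManager manager, int initialNodes, int targetNodes,
                    int nodesPerStep, long stepSeconds) throws InterruptedException {
        TimeUnit.SECONDS.sleep(60);                    // observe the initial plateau first
        int current = initialNodes;
        while (current < targetNodes) {
            int toAdd = Math.min(nodesPerStep, targetNodes - current);
            manager.startScalarisNodes(toAdd);         // actual start times may jitter
            current += toAdd;
            TimeUnit.SECONDS.sleep(stepSeconds);
        }
    }
}

Strategy (4), for instance, would correspond to run(manager, 6, 18, 2, 60), while strategy (7) adds all twelve nodes in a single step.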

We summarize below the parameters of the test:

• 1 connection per node
• nbrInitialData = 2000
• 15 minutes of test
• conflictLevel = 1/250 (so all the operations work on a pool of 250 keys)
• 6 nodes running initially
• 1 minute before adding the first node(s)
• Large instance dispatcher
• Small instance Scalaris nodes
• Successful and failed operations sampled every 20 seconds

According to the Scalaris developers, at the time of writing, nodes buffer the requests arriving while they are inserted in the ring and start responding to them as soon as they are correctly inserted. The parameter nbrInitialData is a special parameter aimed at simulating previous content on the Scalaris nodes. Indeed, in order to maintain the replication factor, new Scalaris nodes must retrieve the values they become responsible for when they are added to the ring. This adds an overhead during each insertion of nodes in the ring. We wanted to take this overhead into account and be able to tune it with the parameter nbrInitialData. Before the test starts we add nbrInitialData key/value pairs to the ring. The keys used are random and the value is always the same: a constant String of 360448 random characters. We have chosen nbrInitialData equal to 2000, which means that quite a lot of data must be transferred to the Scalaris nodes before starting the test. We have observed that this initialization phase takes approximately 5 minutes. Several tasks run on the dispatcher: one responsible for checking that operations are correctly done, one that sends time statistics, and the management of the Scalaris Connection Manager and of the Nodes Manager, which are both heavy tasks. This is why we have chosen to use a large dispatcher.
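A minimal sketch of this seeding step is shown below. The StringStore interface and its write method are assumptions made for illustration; they do not correspond to the actual Scalaris API used in our implementation.

import java.util.Random;
import java.util.UUID;

// Hypothetical minimal key/value client; the real Scalaris Java API differs.
interface StringStore {
    void write(String key, String value);
}

public class InitialDataSeeder {
    // Writes nbrInitialData random keys, all mapping to the same constant string of
    // valueLength random characters, so that nodes joining the ring later must
    // transfer a significant amount of data to maintain the replication factor.
    static void seed(StringStore store, int nbrInitialData, int valueLength) {
        Random random = new Random(42);                  // fixed seed for repeatability
        StringBuilder sb = new StringBuilder(valueLength);
        for (int i = 0; i < valueLength; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        String constantValue = sb.toString();            // e.g. 360448 characters
        for (int i = 0; i < nbrInitialData; i++) {
            store.write("init-" + UUID.randomUUID(), constantValue);
        }
    }
}

With the values used in this test, the call would be seed(store, 2000, 360448).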

Remarks on the test environment: Before getting to the results, we make two general remarks. First, the Amazon cluster is unstable: sometimes some machines are not reachable (ping not working). Secondly, sometimes the Ubuntu AMI we are using does not correctly initialize the SSH keys and SSH is thus not working; we spotted this problem very late and could not correct it, as it would have been necessary to modify the AMI we used and it was too late to redo all the tests. When faced with one of those two problems we are forced to reboot the machine on the fly, which is quicker than launching a new one but still takes some time and CPU. We consider that this overhead is part of the test. This was not a problem with the previous tests, as the launching of the ring was done during the initialisation phase; this is indeed the first test where we need to launch a new machine at run time.

Scalaris elasticity test results: Figure 6.9 shows the evolution of the throughput for the different strategies; the numbers in the legend of the graph correspond to the numbering of the strategies presented above. The throughput is computed from the data collected every 20 seconds: the throughput at time x is thus equal to the average throughput between x-20 seconds and x. Blue points on the graph mark the moments where we begin to start new instances; note that the number of instances started depends on the strategy. Red points mark the moments where Scalaris is started on the nodes, that is, when the command is launched, not when the node is effectively inserted in the ring. We first comment on each strategy separately.

(1) During this strategy we do not add any node and thus keep the ring size at the initial value of 6 nodes. As you can see the throughput stays stable during the whole test.

(2) In this strategy we add only one node. We start the adding procedure 60 seconds after the beginning of the test. We can see that this procedure has an impact on the performance: the graph shows that the throughput decreases during the insertion, but this is not due to Scalaris churn, as Scalaris is not started on the node before 120 seconds. At that time, the throughput increases by a small amount and stays stable until the end of the test. The node was thus quickly operational and we did not notice a drop in performance after Scalaris starts.

(3) We tried to add one node every minute but we could only start 6 nodes out of the 12 planned. Indeed, launching one node correctly takes a certain amount of time, which varies from approximately 45 seconds to 3 minutes. The throughput is more chaotic as we regularly add nodes and, as we saw in the previous strategy, between the moment we start inserting a new node and the moment Scalaris is effectively started on it, the throughput drops. Again, as observed in strategy (2), the throughput increases directly after Scalaris is started on the node. Finally, we remark that it was not possible to observe the stabilization because nodes are added too regularly and not all the nodes were added.

(4) Here we add two nodes every minute. This time we could add 10 nodes out of 12. The throughput once again increases regularly with the addition of nodes while being perturbed by it. It finishes at a higher value than (3) simply because more nodes could be reached by the end. The throughput reaches a fairly high value but is not stable at the end of the test.

(5) We could only add 6 nodes here, which is nearly the same as with strategy (3). However, here we added two nodes at a time (where we added one in (3)) and waited twice as long between additions (120s instead of 60s). We can observe some periodicity in the additions and see that this strategy regularly reaches the same throughput as the third one. This is confirmed at the end of the test, where both eventually reach the same throughput with the same number of nodes.

(6) We increased the number of nodes per addition to 6. We first add 6 nodes at 60s; they were ready at 160s and we directly see a large increase in the throughput and a quick stabilization. We observe the same behavior for the second addition of 6 nodes and finally reach a stable throughput around 560 ops/s. Something odd is that it should reach the same throughput as (7). Indeed, the last node addition was done at 560s and, as we have seen, the throughput stays stable from that point, showing no indication that it will ever increase. Our guess is that the physical placement of the machines creates some special conditions limiting the number of messages that can be exchanged between nodes and lowering the throughput. This is indeed possible as each test is run with different nodes.

Figure 6.9: Throughput with respect to time for the seven strategies presented, with a large dispatcher and small Scalaris nodes for a conflict level of 0,004.

Figure 6.10: Failure percentage with respect to time for the seven strategies presented, with a large dispatcher and small Scalaris nodes for a conflict level of 0,004.

(7) In this last strategy we add 12 nodes directly at 60s and Scalaris is started on those nodes at 120s. Between 60s and 120s we see a decrease in the throughput which seems proportional to the number of nodes added; this is normal as the amount of work involved in starting 12 nodes grows with the number of nodes to start. We can observe that this decrease is of about 25% of the throughput. However, as soon as the nodes' startup is finished and Scalaris is booted, the throughput explodes and quickly reaches a stable value at 630 ops/s. We can confirm that this value corresponds to the stable throughput for 18 nodes: it is close to what we obtained in the connection strategy test of section 6.2.2, for which we measured an average throughput of 650 ops/s.

We now summarize the results obtained by observing the throughput evolution for each strategy. First, we notice that during the adding period (during which we perform the following tasks: launching the nodes on Amazon, periodically calling the Amazon API to check the instances' state, sending the necessary files, and retrieving from the nodes the information necessary to launch Scalaris) the performance is lowered by a factor proportional to the number of nodes. However, launching several nodes at the same time is less time consuming, as Amazon starts all the nodes in parallel and the time waited per node is thus divided by the number of nodes. Secondly, after Scalaris is started on the nodes, and despite the fairly large amount of initial data, nodes are almost instantly ready to operate: the throughput in all the strategies increases directly after Scalaris is started on the nodes. We believe this is because the amount of initial data is too small to observe any performance drop. Moreover, this throughput is quite stable. We must also note that several strategies could neither reach 18 nodes nor stabilize because the test length was too short. This is not a problem, as other strategies had already shown better results than those and reached the best stable state possible (7); the conclusions would thus not have been different. Finally, we decided that the last strategy was the best according to the throughput evolution, as it allows us to quickly reach a very high and stable throughput with only limited disturbance.

We now look at the average throughput of each strategy during the test in Figure 6.11. This criterion is important in order to know which strategy maintains the best average service during the 15 minutes we have to react to the flash crowd. It is obvious that the last strategy outperforms the others, which is not surprising given the evolution of the throughput we just observed. We still have to observe the failure percentage evolution, which may give some indication of Scalaris's instabilities. We can see in Figure 6.10 that the failure percentage grows with the number of nodes. As for the throughput, we thus observe after each node addition an increase of the failure percentage which is proportional to the number of nodes added. This is what we observed in all our tests: increasing the number of connections increases the number of failures. There is thus no reason to penalize the solutions with higher failure percentages.

Figure 6.11: Mean throughput results for the seven strategies presented, with a large dispatcher and small Scalaris nodes for a conflict level of 0,004.

Conclusion: We conclude that the best strategy is to add all the nodes at the same time: it is the quickest way to increase the throughput, it gives the best average throughput over 15 minutes, and it does not present a failure percentage higher than usual for this number of connections. The results are thus very encouraging, as it was indeed possible to go from 6 nodes to 18 nodes in only two minutes with a loss of only approximately 25% while the nodes were being started. Moreover, as soon as Scalaris is started on the nodes, the throughput reaches a value close to what we obtained before in section 6.2.2. It would have been interesting to test with higher values of nbrInitialData to try to observe a loss of performance during the insertion of Scalaris nodes in the ring, but we lacked the time to perform those tests.

6.3 Bwitter tests

Now that we have looked at the performance of Scalaris, we can study Bwitter keeping those results in mind. As explained previously, we have implemented two different approaches: the pull and the push. We are going to test and comment on both in this section. However, we will focus on the push approach as it is the one we finally selected as the best approach; a later section will be dedicated to the pull approach. Therefore, unless we explicitly specify otherwise, we are talking about the push approach. We will start by showing the impact of the application cache we use in order to solve the popular value problem. We then make a test to show the influence of nbrOfFollowersPerChunk, the number of followers per chunk of the Topost set. Then we test the scalability and elasticity of the system we have implemented.

6.3.1 Experiment measures discussion

In this section we explain which data we measured during our tests in order to clarify it for the rest of the experiment section.

Measures taken

The following tests are aimed at determining the best design and parameter choices. We thus want to measure the performance of the different configurations we propose, but we are also interested in determining how successfully the operations were performed. We do two types of operations: reading and posting tweets. Those operations have different success conditions and restart strategies, which we detail below.

First, we discuss the tweet posting operation. This operation is assumed to fail only when the first step of the algorithm fails. Indeed, performing this step correctly ensures that the tweet will eventually be posted to all the lines, assuming the recovery mechanism is triggered or another tweet is posted by the same user. If the first step fails, we restart the operation at the test level and do not count it as another operation; if any of the remaining steps fails we do not trigger the recovery mechanism. This means that all the tweets posted during the tests are always stored in the system but might not be posted to all the lines. However, we have noticed that only a negligible number of SRs aborted, which indicates that tweets are successfully posted on the lines most of the time. Secondly, concerning the reading of tweets, we do not abort the whole operation if one tweet is not available. This should almost never happen because, as shown in the previous tests, concurrent reads are not conflictual. Moreover, tweets are frequently read from the cache, lowering the probability of failure even more. We restart the operation only if an error occurs when accessing the line containing the tweet references.

We now describe the most relevant measures we took during our tests. We took more measures in order to help us understand some results and to verify that everything was working correctly; however, those usually do not help to understand the results and would only clutter the text.

• Time: We measure the total time in milliseconds needed to perform the requested number of operations.

• SRs run: This is the number of SRs that were performed during the whole test. Indeed, the tweet posting operations are split into several SRs. We take this measure in order to compare it with the number of restarted SRs and the number of aborted SRs. A restarted SR is not counted in the SRs run.

• SRs restarted: This is the number of SRs that were restarted by the Scalaris Workers; remember that they restart an SR a given number of times, which we have fixed to 10, before aborting it. We use this value in conjunction with the SRs aborted and the SRs run in order to compute the failure percentage.

• SRs aborted: This is the number of SRs that were aborted by the Scalaris Workers. When an SR is aborted, the Bwitter operation that created the SR gets an exception. If the number of aborted SRs is low we can be sure that the Bwitter operations were successfully performed. In fact, the number of aborted SRs is extremely low: we got approximately two aborted operations in total during the tests presented here. This is mainly due to our aggressive restart strategy; as just stated, we retry a failed SR 10 times before aborting it. We thus do not present this measure in our results.

• Cache hits: This indicates the number of times a read was successfully performed from the cache. Each cache hit avoids a transactional read on Scalaris.

• Cache misses: This indicates the number of times the cache was accessed and no entry was found. This number is usually quite low compared to the cache hits, as we frequently access the same data because the simulated network is small.

You could wonder why we did not measure the failures at the Bwitter level. In fact, we did not get any failure of any Bwitter operation during the tests we did. We thus decided to measure the failures at the layer below, the Scalaris Connection Manager layer. This measure is precise enough to compare the degree of failures between the different tests we did. As was the case with Scalaris, we rather present our results in terms of throughput and failure percentage.

• Throughput: Our Bwitter tests generally consist of a given number of operations. When we talk about an operation we mean one of the two described in the previous point: posting a tweet or reading tweets. Depending on the test settings, those operations can be more or less heavy. The throughput measure is the number of operations per second achieved by the tested configuration. We believe this is the best way to determine which configuration is the best for a given test, as we feel it fairly measures the global throughput of the whole system.

\[
\text{Throughput} = \frac{\text{number of Bwitter operations successfully performed}}{\text{measured total time}}
\]

• Failure percentage: The failure percentage is the number of restarted SRs divided by the total number of SRs attempted (run plus restarted). We only take into account the restarted SRs because, as said above, the number of aborted SRs is negligible.

\[
\text{Failure percentage} = \frac{\text{SRs restarted}}{\text{SRs run} + \text{SRs restarted}} \times 100
\]
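As a small illustration of how these two measures can be derived from the counters collected during a test, here is a sketch; the field names are our own and do not correspond to the actual test harness.

public class TestMeasures {
    long operationsSucceeded;   // Bwitter operations successfully performed
    long totalTimeMillis;       // measured total test time
    long srsRun;                // SRs performed (restarted SRs not counted here)
    long srsRestarted;          // SRs restarted by the Scalaris Workers

    // Operations per second achieved by the tested configuration.
    double throughput() {
        return operationsSucceeded / (totalTimeMillis / 1000.0);
    }

    // Restarted SRs as a percentage of all SR executions (run + restarted).
    double failurePercentage() {
        return 100.0 * srsRestarted / (srsRun + srsRestarted);
    }
}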

Measuring the time

The time is always measured with the System.currentTimeMillis function from Java. We use the following pseudo code to measure the time taken by a piece of code.

long timeAtStart = System.currentTimeMillis();
codeWeWantToProfile();
long executionTime = System.currentTimeMillis() - timeAtStart;

This method does not take into account that we are working in a concurrent environment. Imagine we want to measure the time taken by an operation. When the test consists only of operations of the same type this is not a problem: we can measure the total time of the test and divide it by the number of operations performed. However, if we mix operations of different types (for example posting tweets and reading tweets) we cannot use this method. Indeed, a tweet-posting thread can be preempted by a tweet-reading thread, and some of the time spent in the reading thread will then be accounted to the posting thread that was preempted. We did not solve this problem and so we did not measure the time taken by a single operation. Ultimately, we are more interested in the time taken to perform a given number of operations than in the mean time of one type of operation.

6.3.2 Push design tests

The parameters

All the tests are based on a simulation of Bwitter's use. Between each test we restart Bwitter and Scalaris in order to avoid side effects from previously run tests. This is time consuming because Scalaris is not persistent and we need to initialize Bwitter with some data so that the tests are as realistic as possible. We have two phases: the initialization phase and the main phase. In the first phase, we create the users and one line for each of them, and we add the owner of the line to it. We also add a number of followers to each line in order to simulate social connections; we use a hash function in order to choose which users a given user should follow. Finally, each user posts some tweets to create data on the lines. This phase is never taken into account in the results we present, and in order to have comparable results it is exactly the same for all the tests. In the second phase, we perform the two kinds of operations described previously: post a tweet and read tweets. We decided to only read the tweets contained in the head chunk, as this is what users usually want to access. The second phase is finished after a predefined number of operations have been successfully performed. We fixed this number of operations at 20000 because, as was the case for Scalaris, we feel that 20000 operations are significant enough so that small variations do not influence the overall results. The throughput is computed based on this phase. In contrast to the first phase, this second phase is not static: the operations are performed in a different order each time the test is run, and the number of operations of each type varies a little. We made this choice because we wanted to avoid creating an artificial pattern by fixing the order of the operations, and because we believe it is the best way to simulate the real use of Bwitter. Below we detail the parameters we use for the social network simulation, Scalaris and Bwitter. Some values are fixed and others are variable; we will not repeat the fixed parameters in each test, so if you need more information about a particular parameter please refer to this section. In the tests we only detail parameters that are not fixed or that differ from the values given here. We could not find any precise numbers about Twitter's use. We thus decided to create two different social networks that, according to us, should be close to reality. The parameters associated with those two configurations are listed in Table 6.2.

                                   Heavy network   Light network
Number of users                    2000            4000
Lines per user                     1               1
Users followed                     50              25
Tweets per user at beginning       1               1
Users followed / Number of users   0,025           0,00625

Table 6.2: Social network parameters, part 1.

It is not possible to simulate a network as big as Twitter; we were thus forced to simulate a smaller network. However, the initialization phase for those two networks is already quite long. The names we have chosen for those two networks are significant: the heavy network is denser than the light one. The heavy network overestimates the real complexity of a network like Twitter in order to avoid presenting better results than a real world network would give. Indeed, we have chosen nbrUsers and nbrFollowers in order to have a dense network, which complicates the task of Bwitter. You can notice that the ratio (users followed / number of users) is quite high. This ratio of 0,025 means that each user follows 2,5% of all the users in the network, which implies a quite high level of conflict between concurrent operations. This ratio is the equivalent of the conflict level in the Bwitter tests. We believe the light network is closer to reality, because it is absurd to imagine that each user follows 2,5% of the users in the network. We thus designed this other network, which has a smaller ratio (users followed / number of users) equal to 0,00625, to see how our application reacts to different levels of conflict. We now detail the parameters related to Scalaris, grouped in Table 6.3.

Scalaris node type            Small instance
Number of Scalaris nodes      Varies from 4 to 18
Connections per node          Usually one, can vary during the tests
Number of trials per SR       10
Number of parallel requests   Usually 20, varies with the total number of connections to Scalaris nodes

Table 6.3: Scalaris parameters.

We can use a maximum of 20 nodes during the experiments, taking into account both Scalaris nodes and Bwitter nodes; however, we use at most 19 for historical reasons. In order to maintain a high load during all our tests, we constantly make 20 operations in parallel. If we use a higher number of connections per node, we increase this value so that it is always higher than the number of connections to Scalaris nodes. Finally, we have configured the Scalaris Connection Manager so that each SR is restarted 10 times before being aborted; a minimal sketch of this retry policy is given below.
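The following sketch illustrates such a restart policy; the SR interface, its execute method, and the exception handling are illustrative assumptions, not the actual Scalaris Connection Manager code.

// Hypothetical interface for an SR executed against Scalaris; illustration only.
interface SR {
    void execute() throws Exception;   // throws if the transaction fails or conflicts
}

public class SRRunner {
    static final int MAX_TRIALS = 10;  // number of trials per SR used in our tests

    // Runs an SR, restarting it up to MAX_TRIALS times before giving up.
    static void runWithRetries(SR sr) throws Exception {
        Exception last = null;
        for (int trial = 0; trial < MAX_TRIALS; trial++) {
            try {
                sr.execute();          // counted as "SR run" on success
                return;
            } catch (Exception e) {
                last = e;              // counted as "SR restarted"
            }
        }
        throw last;                    // counted as "SR aborted"
    }
}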

We now present the Bwitter parameters, grouped in Table 6.4.

Dispatcher / Bwitter node type   Small or Large instance
tweetchunksize                   30
nbrOfFollowersPerChunk           20

Table 6.4: Bwitter application parameters.

We have two Bwitter application parameters to fix, namely tweetchunksize and nbrOfFollowersPerChunk. They are likely to have an impact on the results of our tests, as they influence the number of tweets read and the number of operations involved in a write. We have chosen them so that the first tweet chunk contains a decent number of tweets in order to have relevant tests. With a tweetchunksize of 30, we estimate the number of tweets in the head chunk at the start of the test to be around 20; indeed, each user should have received around 50 tweets on his line during the initialisation phase.

Real system with stars and fans

In order to stick as closely as possible to reality, we have decided to populate our system with two kinds of users: stars and fans. Indeed, on Twitter, some users have far more followers than users they follow, while the others follow more people than they have followers. We fixed the proportion of stars in the system at 10%, the rest of the users being fans. For each user in the system, 75% of the users he follows are stars. An example of a simulated network can be seen in Figure 6.12.

Figure 6.12: Simulated social network with social connections between users, each user follows 3 users. Left) Random following pattern. Right) Nodes 2 and 4 are stars and each user has a 2/3 probability per connection to follow a star.

Furthermore, users tend to do more reads than posts when visiting social networks. We took this behaviour into account as well by allowing the ratio of tweet-reading operations to the total number of operations to be fixed. We use the parameters listed in Table 6.5 for all the tests; a sketch of the corresponding user and operation selection follows the table.

Stars percentage                              10% of users are stars
Percentage of stars among the users followed  75% of the users followed
Read percentage                               80% of the operations are reads

Table 6.5: Social network parameters, part 2.
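A minimal sketch of how this mix could be generated during the main phase; the class and method names are illustrative assumptions and this is not our actual simulator, which additionally uses a hash function to make the following pattern deterministic.

import java.util.Random;

public class WorkloadGenerator {
    static final double STAR_FRACTION = 0.10;             // 10% of users are stars
    static final double STAR_FOLLOW_PROBABILITY = 0.75;   // 75% of followed users are stars
    static final double READ_FRACTION = 0.80;             // 80% of operations are reads

    final Random random = new Random();
    final int nbrUsers;
    final int nbrStars;

    WorkloadGenerator(int nbrUsers) {
        this.nbrUsers = nbrUsers;
        this.nbrStars = (int) (nbrUsers * STAR_FRACTION); // users [0, nbrStars) are stars
    }

    // Picks a user to follow: a star with probability 0.75, otherwise a fan.
    int pickUserToFollow() {
        if (random.nextDouble() < STAR_FOLLOW_PROBABILITY) {
            return random.nextInt(nbrStars);
        }
        return nbrStars + random.nextInt(nbrUsers - nbrStars);
    }

    // Picks the next operation type: read tweets with probability 0.8, else post a tweet.
    boolean nextOperationIsRead() {
        return random.nextDouble() < READ_FRACTION;
    }
}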

Cache influence

With this test we show that a cache mechanism is not optional but in fact crucial for the performance of the system. We made two runs of our Bwitter simulation, one with the cache and one without. The parameters used for the two runs are the ones we just fixed, with the exception of those listed in Table 6.6.

Type of social network      Heavy network
Dispatcher / Bwitter node   Large instance
Number of Scalaris nodes    18
Connections per node        1

Table 6.6: Parameters changed for the cache test.

We have put the test results in Table 6.7. Remember that we have set a time to live of 1 minute for the elements in the cache. Those elements thus stay at most 1 minute in the cache before being ejected, meaning that a deleted tweet can remain visible for a maximum of 1 minute. The cache is big enough to keep all the elements of the test that can be cached. This would probably not be the case in a real situation; if we must remove an element from the cache because it is full, we use a least recently used strategy, as explained in section 3.2.3.

                                        Without cache   With cache
Time taken for all the operations (s)   863s            492s
Throughput (ops/s)                      23,15 ops/s     40,59 ops/s
Failure percentage                      1,32%           3,18%
Cache hits                              /               250704
Cache misses                            /               4431

Table 6.7: Performance comparison with and without application cache.

Obviously the cache is the quicker option; it allows us to nearly double the number of operations performed per second. This noticeable performance improvement is explained entirely by the frequent accesses to the cache. The cache is mainly used to access tweets and passwords. Assuming the tweets are in the cache, we avoid X transactions to Scalaris, where X is the number of tweets read in one read operation. We saw during the previous tests that reading a value from Scalaris takes approximately 1,5 ms with 18 nodes, while the cache statistics indicate that the mean access time to the cache is 0,006 ms. It is thus theoretically 250 times faster! Given that we have 250704 hits, we gain (1,5 − 0,006) × 250704 ms = 374551 ms ≈ 375 s over the whole test. The difference between the two test times is 371 s; the cache is thus indeed the main factor improving the performance.

A side effect of using the cache is a higher failure percentage: it goes from 1,32% to 3,18%, which is still a very good result, meaning that almost all the Scalaris operations were correctly performed. This is probably due to the higher number of concurrent postings caused by the cache usage: the tweet reading operations are a lot quicker and thus we have more concurrent tweet posting operations than without the cache, which implies more conflicts. Indeed, we did a quick test and observed that when only reading tweets we end up with a failure percentage of 0, whereas when only posting tweets we had a failure percentage of 30%. Our assumption that more concurrent tweet postings are responsible for this increase in failure percentage is thus reasonable. In conclusion, the cache improves the global performance. The tweet reading algorithm mainly benefits from the cache, making reads even faster, which was our goal. We could probably still optimize the cache usage but decided not to focus on this part. The following tests will thus all use the cache described here.
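As an illustration of the cache behaviour described above (a 1-minute time to live combined with least-recently-used eviction), here is a minimal sketch built on java.util.LinkedHashMap; it is not our actual cache implementation.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal TTL + LRU cache sketch; not the actual Bwitter cache implementation.
public class TtlLruCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long insertedAt;
        Entry(V value, long insertedAt) { this.value = value; this.insertedAt = insertedAt; }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Entry<V>> map;

    public TtlLruCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true gives LRU ordering; the eldest entry is evicted when full.
        this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    // Returns null on a miss or when the entry is older than the TTL (1 minute in our tests).
    public synchronized V get(K key) {
        Entry<V> entry = map.get(key);
        if (entry == null) return null;                           // cache miss
        if (System.currentTimeMillis() - entry.insertedAt > ttlMillis) {
            map.remove(key);                                      // expired: eject and report a miss
            return null;
        }
        return entry.value;                                       // cache hit
    }
}

With a maximum size large enough to hold all cacheable elements and a TTL of one minute, this reproduces the two properties used in the test above.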

Number of followers in a chunk of the topost set.

Before starting the scalability test we were curious to know the influence of nbrOfFollowersPerChunk on the performance of our system in practice. We first list some theoretical elements that should help us understand the results; then we run a simulation to see whether they are verified in a real test. Recall that the higher the nbrOfFollowersPerChunk, the higher the number of keys involved in a write transaction, but the lower the number of necessary transactions. Moreover, transactions involving more keys are in general more likely to fail. Making use of our theoretical analysis, we compute, using Equation 5.2, that we need respectively 174, 110, 102 and 98 Scalaris operations to do a single write for values of nbrOfFollowersPerChunk of 1, 5, 10 and 20. Those results are displayed in Figure 6.13.

\begin{align*}
nbOp &= 8 + nbrFollowers \times \left(2 + \frac{3}{nbrTweetsPerChunk} + \frac{2}{nbrOfFollowersPerChunk}\right) \\
     &= 8 + 40 \times \left(2 + \frac{3}{20} + \frac{2}{nbrOfFollowersPerChunk}\right) \\
     &= 8 + 80 + 6 + \frac{80}{nbrOfFollowersPerChunk} \\
     &= 94 + \frac{80}{nbrOfFollowersPerChunk} \qquad (6.3)
\end{align*}

Figure 6.13: Number of Scalaris operations needed to perform a Bwitter “post tweet” operation with respect to the number of followers per chunk.
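Assuming the formula above, the values quoted in the text can be reproduced with a few lines of code; this is only a quick check, not part of the test harness.

public class NbOpCheck {
    // Scalaris operations needed for a single "post tweet", per Equation 6.3,
    // with nbrFollowers = 40 and nbrTweetsPerChunk = 20.
    static int nbOp(int nbrOfFollowersPerChunk) {
        return 94 + 80 / nbrOfFollowersPerChunk;
    }

    public static void main(String[] args) {
        for (int chunkSize : new int[] {1, 5, 10, 20}) {
            System.out.println(chunkSize + " -> " + nbOp(chunkSize)); // 174, 110, 102, 98
        }
    }
}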

We now want to evaluate the impact of this parameter in practice. We simulate a social network with a higher level of conflict than the two already presented, in order to have a clearer view of this impact: the conflict levels of the heavy and light networks are 0,025 and 0,00625 respectively, while in this test it is equal to 0,06. We measure the time needed to perform 10000 operations for different values of nbrOfFollowersPerChunk. We summarize the simulation parameters in Table 6.8 and present our results in Figures 6.14 and 6.15.

Bwitter node / Dispatcher          Small
Number of Scalaris nodes           10
Number of Bwitter operations       10000
Number of users                    700
Users followed                     40
Users followed / Number of users   0,06

Table 6.8: Parameters changed for the Topost set influence test.

Figure 6.14: Time measured to perform 10000 Bwitter operations with respect to the number of followers per chunk, results for small instances and conflict level of 0,06.

We can see that the time drops a lot between one and five followers per chunk. This is not surprising, as the number of operations to do per tweet posted decreases a lot between one and five, as shown in Figure 6.13. The time difference is entirely explained by the lower number of operations done for one tweet posting; indeed, the cost of the read operation stays the same whatever nbrOfFollowersPerChunk. However, if we follow this reasoning the time should continue to decrease between 5 and 20, while it actually seems to stagnate and even increase slightly at 20. We can explain this by looking at Figure 6.15, which plots the failure percentage and shows a big increase of the failure percentage between 5 and 20. Indeed, as we mentioned in the introduction of this section, the bigger nbrOfFollowersPerChunk, the bigger the number of keys involved per transaction during a tweet posting, and larger transactions induce more conflicts and thus more failures. The advantage of having fewer structures to manage at higher values of nbrOfFollowersPerChunk thus seems to be compensated by the number of failures, which also increases with nbrOfFollowersPerChunk; this is why we observe this stagnation at the end of the graph. In conclusion, we should not use a value too small for nbrOfFollowersPerChunk, as in this case the number of operations increases a lot and the time thus explodes. On the other hand, we should not use a value too high either, as it quickly increases the number of failures and thus the time. This is why we decided to use a value of 20 for nbrOfFollowersPerChunk in all our following tests: it seems a good compromise. It is maybe not the best choice, but at least it seems to be a wise one.

Figure 6.15: Failure percentage for 10000 Bwitter operations with respect to the number of followers per chunk, results for small instances and conflict level of 0,06.

Scalability tests

With this test we evaluate the scalability of our application. We run our simulation with the parameters described at the beginning of this section for different numbers of nodes, using the heavy network presented above. We do not know for sure what the best connection strategy would be, as the degree of conflict of our simulation is hard to evaluate. We thus test with a small dispatcher with one connection per node (1) and with a small dispatcher with two connections per node (2). We also test with a large dispatcher and one connection per node (3), as a small dispatcher may not be powerful enough to handle the Bwitter tasks and a lot of Scalaris connections. Our results are grouped in Figures 6.16 and 6.17: the first shows the throughput with respect to the number of nodes and the second plots the failure percentage.

Figure 6.16: Throughput for 20000 Bwitter operations on a heavy network with respect to the number of Scalaris nodes, results for one small dispatcher with one connection per node, one small dispatcher with two connections per node and one large dispatcher with one connection per node.

Figure 6.17: Failure percentage for 20000 Bwitter operations on a heavy network with respect to the number of Scalaris nodes, results for one small dispatcher with one connection per node, one small dispatcher with two connections per node and one large dispatcher with one connection per node.

From Figure 6.16 we can see that (2) does not scale well: the throughput first increases until 12 nodes and then decreases to a level below the throughput reached at 4 nodes. Its failure percentage is more than twice the one of (1) and (3), which seems to indicate that there are too many connections toward Scalaris. A simulation with a smaller conflict level could have benefited from a higher number of connections, but we did not test it. The throughput of (1) grows at a regular pace until 14 nodes and then seems to slow down. We observed the same behavior during the Scalaris scalability tests, but it is more obvious here. Its failure percentage grows linearly with the number of nodes, which is expected. Configuration (3) gives far better results in terms of throughput than (1) and (2): it grows very well until 16 nodes and suddenly falls at 18 nodes. However, the gap between 14 and 16 nodes seems higher than usual, so we believe this situation was created by exceptional conditions.

We deduce from the observation of the throughput of (1), (2) and (3) that a small dispatcher cannot handle both the Bwitter tasks and the Scalaris related tasks. We have indeed observed with Amazon's basic monitoring tools that the CPU as well as the network were used a lot more during these Bwitter tests than during the Scalaris tests. This is not surprising, as the values and keys used are bigger than during the Scalaris tests and Bwitter performs various additional tasks. It therefore indicates that it is necessary to use more powerful machines than Amazon's small instances for the Bwitter nodes. The failure percentage grows slowly and is nearly the same as for (1) until we reach 12 nodes. From 12 to 18 nodes, (3) sees its failure percentage growing faster. This is probably because it has more CPU and network capacity and can thus run more transactions in parallel, which creates more conflicts. However, this seems to indicate that the gain of adding one node will decrease slowly as the number of nodes grows. This is not surprising and does not indicate a scalability problem. Indeed, during this test we increased the number of parallel operations while keeping the number of users stable. Normally, the number of machines grows with the size of the social network and thus with the number of users, but a user should not follow more users simply because there are more users in the network. In conclusion, Bwitter is scalable, but we need the Bwitter nodes to be powerful enough to handle the necessary number of connections toward Scalaris while performing the Bwitter tasks. We now make a final scalability test with a simulated social network with a smaller conflict level which, we believe, is closer to reality. We only run the tests with one large dispatcher and one connection per node. The parameters changed for this test are given in Table 6.9.

Bwitter node / Dispatcher   Large
Number of Scalaris nodes    Varies from 4 to 18
Connections per node        1
Network type                Heavy and Light network

Table 6.9: Parameters changed for the push scalability test.

This means that we now have a conflict level of 25/4000 = 0,00625. We show in Figures 6.18 and 6.19 the results of the test as well as the results for the more dense social network so that we can more easily compare the two.

Figure 6.18: Throughput for 20000 Bwitter operations with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

We observe, as expected, better performance with a smaller conflict level. The failure percentage increases much more slowly than before, which explains the tremendous gain in performance. Looking at the two Bwitter scalability tests, we can see a pretty clear correlation between the failure percentage and the conflict level. With 18 nodes and this conflict level we finally reach 66 ops/s, which means around 13 tweets posted/s and 53 reads/s. If we make a small computation and assume a user posts 3 tweets a day and reads his tweets 12 times a day, we estimate that we can handle 380162 users with only 19 machines (roughly 13,2 posts/s × 86400 s per day ÷ 3 posts per user per day; the read rate leads to the same bound). This is obviously overestimated and not precise, but even a quarter of this number would be a good result. During those tests we observed good scalability properties for the large dispatchers; the small dispatchers were too short on resources. As for the Scalaris scalability test, we saw that a high conflict level reduces the throughput and lowers the gain obtained from adding a machine. We now test Bwitter's elasticity.

Figure 6.19: Failure percentage for 20000 Bwitter operations with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

Elasticity tests

The scalability tests of Bwitter have shown good scalability results from 4 to 18 nodes for both the heavy and the light network. However, we have decided to use the light one: the throughput increases faster with the number of nodes with this network, so we believe it is easier to observe elasticity with it. Concerning nbrInitialData, defined during the elasticity tests on Scalaris, we have decided to increase its value up to 20000. Indeed, the Scalaris elasticity test did not seem to show any instability after node additions, so we decided to try to increase the impact of the churn. The initialisation phase is very long: it takes approximately 45 minutes to post the initial data and an additional 40 minutes to initialize Bwitter related data such as followers, tweets and so on. It was thus not possible to push the amount of initial data much higher, though we would have liked to. We keep the seven strategies we defined during the elasticity tests on Scalaris and start with 6 initial nodes. However, the results should be quite different, because we used much more initial data and Bwitter adds an important CPU and network overhead compared to the Scalaris operations we did before. We present the results in Figures 6.20 and 6.21. As for the last elasticity test, we present the evolution of the throughput as well as the failure percentage; we also indicate with blue dots the moment we start the machines on Amazon and with red dots the moment at which Scalaris is started on the nodes.

Figure 6.20: Throughput with respect to time, Bwitter results for the seven presented strategies on Scalaris small instances with a large dispatcher and a light network.

Figure 6.21: Failure percentage with respect to time, Bwitter results for the seven presented strategies on Scalaris small instances with a large dispatcher and a light network.

We also indicate the final number of nodes reached by each strategy in Table 6.10.

Strategy      1   2   3   4   5   6   7
Nodes added   0   1   5   8   8   12  12

Table 6.10: Number of nodes inserted in the ring at the end of the test.

First, we can observe that the throughput is much more unstable than during the Scalaris elasticity test. The first reason is that the measure we take is much more volatile. Secondly, we have put a lot more initial data in the system; this may slow down Scalaris at times, delaying some read or post tweet operations that then only complete at the next sample, which creates large gaps between two measures. Thirdly, Bwitter operations are much heavier than the operations we did during the Scalaris scalability test, which may also have an impact on the results.

We will not discuss each strategy in detail as we did for Scalaris; instead we make some general comments. We can observe that the first strategy's throughput varies a lot (between 20 and 30) all along the test, which means that even without adding any node the throughput is quite variable. We can also see that, as for the Scalaris elasticity test, between the moment we start instances on Amazon and the moment Scalaris is started on the nodes, the throughput slows down. The addition of nodes is once again directly effective, and the throughput in general increases. We also observe that most of the strategies had not stabilized by the end of the test and that their throughputs still vary a lot. But, as expected, the strategies that added the most nodes during the test reached the highest throughput. Since the throughput varies a lot, it is not representative to choose a strategy according to the final throughput; we thus turn toward the average throughput, represented in Figure 6.22, which is much easier to analyze. As we can see, strategies 6 and 7, which reach the highest number of nodes at the end, also have the highest average throughput. Strategies 4 and 5 have a similar average throughput, but 4 has a higher one because it adds its nodes before 5 and can thus benefit sooner from the new nodes. As we can see in Figure 6.21, the failure percentage also varies a lot, and when it reaches a peak the throughput naturally drops. We see that when the number of nodes grows the failure percentage also varies much more. We suppose the peaks are an effect of the Scalaris stabilisation algorithm run periodically.

So, once again, our conclusion is that the quicker you add nodes, the quicker you increase the throughput and the higher the average throughput obtained during the test. However, we can observe that strategy (7) was less stable than during the Scalaris elasticity test. So perhaps, if we could have performed elasticity tests with more nodes, we would have observed that adding all the nodes at the same time was not a good idea. To conclude, we can say that, with our current resources, adding all the nodes at the same time seems to be the best strategy.

Figure 6.22: Average throughput, Bwitter results for the seven presented strategies on Scalaris small instances with a large dispatcher and a light network.

6.3.3 Pull scalability test

In this final section we test the scalability of the pull approach. We use exactly the same parameters as those described at the beginning of this section. As for the other scalability tests, we vary the number of nodes from 4 to 18, use one connection per node and perform 20000 Bwitter operations. We simulate the heavy and the light networks. Those parameters are summarized in Table 6.11.

Bwitter node / Dispatcher      Large
Number of Scalaris nodes       Varies from 4 to 18
Connections per node           1
Number of Bwitter operations   20000
Network type                   Heavy and Light network
Users followed                 40

Table 6.11: Parameters changed for the pull scalability test.

The heavy network should give much worse results than the other one. Indeed, from the theoretical analysis, we know that the complexity of the read operation grows linearly with the number of followed users for the pull approach. We read one chunk (here one time frame) as we did for the push. We have set the time frame to one day, which is a reasonable choice for a real application, so all the tweets posted end up in the same chunk. It may thus seem unfair compared to the push approach, which flushes the head when it is full, while in the pull we are forced to read all the tweets that were posted during the day. However, because we use a cache, this side effect is strongly mitigated: most of the tweets are in the cache and its read access is really quick. The pull and push simulations are thus comparable. As the large dispatcher gave better results for the push approach, we decided to run this test with a large dispatcher as well. Concerning the Scalaris nodes we use, as usual, the small instances. We show the throughput and the failure percentage for the heavy and the light network in Figures 6.23 and 6.24.

Figure 6.23: Throughput for 20000 Bwitter operations with the pull approach with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

The pull approach presents excellent scalability for the two networks: the throughput increases perfectly linearly with the number of nodes. This good behavior is due to the failure percentage, which grows extremely slowly with the number of nodes. This seems to indicate that it can handle a really high number of nodes. The low failure percentage is the consequence of the low number of writes involved in the pull version of the tweet posting. Remember that the pull only writes the tweet reference in one place and that when followers read their tweets they do not perform any write; operations in the pull are thus nearly not conflictual at all. The failure percentage for the light network seems to increase a lot at 16 nodes, but it is only a visual effect: it only increases by approximately 0,05%. For the same parameters, namely those described at the beginning of this section, we go from around 18000 to 250000 Scalaris operations. This is due to the high number of reads and low number of writes in our test. As predicted, the reads require many more operations when using the pull design.

Figure 6.24: Failure percentage for 20000 Bwitter operations with the pull approach with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

6.3.4 Conclusion: Pull versus Push

We want to say some final words about the pull and the push approaches. We can only compare the scalability tests, because we did not perform elasticity tests for the pull. The results can be directly compared because we used the same parameters for the two approaches. We show, as usual, the throughput and the failure percentage of the push and the pull for the two networks we tested; they are shown in Figures 6.25 and 6.26. We can observe that the push approach outperforms the pull in terms of throughput for both network types. The throughput also increases faster with the number of nodes in the push approach. However, we can see that the throughput increase for the push, as already observed, seems to slow down when we reach a high number of nodes. This is not the case for the pull approach, which grows more steadily. The push approach will thus probably reach a scalability limit sooner than the pull.

Figure 6.25: Throughput for 20000 Bwitter operations with the push and pull approach with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

Concerning the failure percentage, it is much higher in the push approach and increases much more quickly with the number of nodes. This explains why the increase of the throughput in the push approach slows down with the number of nodes. The pull approach does not present this problem, as its failure percentage grows very slowly and is extremely low. We thus conclude that the two approaches have their pros and cons. The push approach presents a much better scalability, but at the cost of a higher failure percentage. Both approaches scale well, but the pull does not seem to slow down, which seems to indicate that the pull would be the most appropriate for a very high number of nodes. However, this last conclusion is purely hypothetical; we would need much larger scale tests in order to confirm this intuition.

Figure 6.26: Failure percentage for 20000 Bwitter operations with the push and pull approach with respect to the number of Scalaris nodes for the heavy and the light network, for one large dispatcher with one connection per node.

6.4 Conclusion

In this chapter we have shown how to configure Amazon and Scalaris. We performed a series of tests concluding that Scalaris running on Amazon was indeed scalable and elastic. We then performed a series of tests on Bwitter for both the push and the pull approaches, and demonstrated that both scale very well, the push approach presenting a quicker increase of performance but with a failure percentage growing much faster. Finally, we showed that our system, based on the push approach, was able to significantly improve its performance within 15 minutes while facing a high load.

Chapter 7

Conclusion

Our goal was to design and develop a scalable and elastic implementation of a social network application on top of a key/value datastore. Looking at the results detailed in the previous chapter, we are confident we have reached our goal. Indeed, we developed an implementation of our pull and push designs, and both showed good scalability results. The elasticity was only tested for the push approach, and we showed it was possible to quickly improve performance while assuring a good level of service. All those tests were achieved under real world assumptions using Amazon's Elastic Compute Cloud infrastructure. The implementation was realized with the goal of being as close as possible to a real social network application; we thus took care to protect user data and to avoid security flaws. During our work with Beernet and its main developer Boris Mejías, we identified the basic requirements to allow different services to run on the same DHT without interfering with each other. Those led to the discovery of some potential improvements for Beernet's API, which are now implemented in version 0.9. This new API allows users to protect and grant limited rights to their data by using a system of secrets. Before testing Bwitter, we also heavily tested Scalaris in order to understand the future Bwitter test results. We first showed the importance of choosing the right number of connections. Afterwards, we studied its scalability in depth and tried different strategies in order to evaluate the elasticity of Scalaris on Amazon's EC2. It was shown to be highly scalable and elastic. Besides this work, we have also co-written an article, along with Peter Van Roy and Boris Mejías, entitled "Designing an Elastic and Scalable Social Network Application". In this article we detail some of the observations and design decisions developed in this master thesis. This article, which can be found in Chapter 10 of our annexes, has been accepted for The Second International Conference on Cloud Computing, GRIDs, and Virtualization1, organized by IARIA and held from the 25th to the 30th of September 2011 in Rome, Italy.

1CLOUD COMPUTING 2011, http://www.iaria.org/conferences2011/CLOUDCOMPUTING11.html, last accessed 13/08/2011

7.1 Further work

On multiple occasions during the tests, we concluded that it would have been interesting to perform the tests with more nodes in order to get a better idea of the scalability and the elasticity. Indeed, during this work our tests were limited to 20 machines, and while Bwitter displayed good performance in this environment, it would have been interesting to increase the number of machines in order to approach a more realistic scale. We also believe the flash crowd detection mechanism is an interesting subject to study. Indeed, during our research, we noticed that there are sometimes telltale behaviours in the network before a high peak of activity. It would thus be interesting to try to design a mechanism based on those social behaviours in order to predict heavy loads and allocate machines before the peak. We did not study downscaling elasticity in our work because, according to the Scalaris developers, their system does not yet handle graceful shutdowns in version 0.3.0. It would thus be interesting to observe and test Bwitter on Scalaris once this feature is implemented, in order to study its behaviour. We did not address load balancing between Bwitter nodes, but it could be interesting to develop an algorithm to decide which requests should be forwarded to which Bwitter node in order to share the load between them. Following the same idea, some requests, like tweets posted by stars, are quite heavy; it might also be a good idea to split such work between the Bwitter nodes and not only between the Scalaris nodes. Finally, the load balancer of the Scalaris Connection Manager could be improved in order to decide which SR should be executed so as to decrease the conflict between SRs executed concurrently.

132 Bibliography

[1] Apache. Apache hbase, frontpage. http://hbase.apache.org, 2011. [Online; accessed 28-June-2011].

[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Sto- ica, and Matei Zaharia. Above the clouds: A berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of Califor- nia, Berkeley, Feb 2009. URL http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/ EECS-2009-28.html.

[3] Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking up data in p2p systems. Commun. ACM, 46:43–48, February 2003. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/606272.606299. URL http://doi.acm.org/10.1145/606272.606299.

[4] Shea Bennett. Twitter passes 300 million users, seeing 9.2 new registrations per sec- ond. (allegedly.). http://www.mediabistro.com/alltwitter/twitter-300-million-users b9026, 2011. [Online; accessed 28-June-2011].

[5] John Buford, Heather Yu, and Eng Keong Lua. P2P Networking and Applica- tions. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008. ISBN 0123742145, 9780123742148.

[6] Nicholas Carlson. Facebook has more than 600 million users, goldman tells clients. http://www.businessinsider.com/ facebook-has-more-than-600-million-users-goldman-tells-clients-2011-1, 2011. [Online; accessed 28-June-2011].

[7] Rick Cattell. Scalable sql and nosql data stores. ACM SIGMOD Record, 39(4), dec 2010.

[8] Chris Clayton. Standard cloud taxonomies and windows azure. http://blogs.msdn.com/b/cclayton/archive/2011/06/07/standard-cloud-taxonomies-and-windows-azure.aspx, 2011. [Online; accessed 26-July-2011].

[9] Technology Expert. Twitter proves itself again, in chilean earthquake. http:// www.tech-ex.net/2010/02/twitter-proves-itself-again-in-chilean.html, 2010. [Online; accessed 28-June-2011].

[10] Code Futures. Database sharding. http://www.codefutures.com/database-sharding/, 2011. [Online; accessed 28-June-2011].

[11] Ali Ghodsi. Distributed k-ary System: Algorithms for Distributed Hash Tables. PhD thesis, KTH – Royal Institute of Technology, Stockholm, Sweden, dec 2006.

[12] Ali Ghodsi, Luc Alima, and Seif Haridi. Symmetric replication for structured peer-to-peer systems. In Gianluca Moro, Sonia Bergamaschi, Sam Joseph, Jean-Henry Morin, and Aris Ouksel, editors, Databases, Information Systems, and Peer-to-Peer Computing, volume 4125 of Lecture Notes in Computer Science, pages 74–85. Springer Berlin / Heidelberg, 2007. URL http://dx.doi.org/10.1007/978-3-540-71661-7 7.

[13] Ali Ghodsi, Luc Onana Alima, and Seif Haridi. Symmetric replication for structured peer-to-peer systems. In Proceedings of the 2005/2006 interna- tional conference on Databases, information systems, and peer-to-peer computing, DBISP2P’05/06, pages 74–85, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 978-3-540-71660-0. URL http://portal.acm.org/citation.cfm?id=1783738.1783748.

[14] Jim Gray and Leslie Lamport. Consensus on transaction commit. ACM Trans. Database Syst., 31:133–160, March 2006. ISSN 0362-5915. doi: http://doi.acm. org/10.1145/1132863.1132867. URL http://doi.acm.org/10.1145/1132863.1132867.

[15] Sameh El-Ansary and Seif Haridi. An overview of structured overlay networks. Handbook on Theoretical and Algorithmic Aspects of Sensor, Ad Hoc Wireless, and Peer-to-Peer Networks, 2005.

[16] Abigail Hauslohner. Is egypt about to have a facebook revolution? http://www. time.com/time/world/article/0,8599,2044142,00.html, 2011. [Online; accessed 28- June-2011].

[17] Bill Heil and Mikolaj Piskorski. New twitter research: Men follow men and nobody tweets. http://blogs.hbr.org/cs/2009/06/new twitter research men follo.html, 2009. [Online; accessed 28-June-2011].

[18] Rachelle Matherne. Social media coverage of the haiti earthquake. http://sixestate. com/social-media-coverage-of-the-haiti-earthquake/, 2010. [Online; accessed 28- June-2011].

[19] Boris Mejías and Peter Van Roy. Beernet: Building self-managing decentralized systems with replicated transactional storage. IJARAS: International Journal of Adaptive, Resilient, and Autonomic Systems, 1(3):1–24, July-Sept 2010. ISSN 1947-9220. doi: 10.4018/jaras.2010070101.

[20] MySQL. Mysql cluster. http://www.mysql.com/products/cluster/, 2011. [Online; accessed 28-June-2011].

[21] John Naughton. Yet another facebook revolution: why are we so surprised? http:// www.guardian.co.uk/technology/2011/jan/23/social-networking-rules-ok, 2011. [On- line; accessed 28-June-2011].

[22] Peter Mell and Timothy Grance. The nist definition of cloud computing (draft). Recommendations of the National Institute of Standards and Technology, 2011.

[23] Programming Languages and Distributed Computing Research Group, UCLou- vain. Beernet: pbeer-to-pbeer network. http://beernet.info.ucl.ac.be, 2009. URL http://beernet.info.ucl.ac.be.

[24] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer commu- nications, SIGCOMM ’01, pages 161–172, New York, NY, USA, 2001. ACM. ISBN 1-58113-411-8. doi: http://doi.acm.org/10.1145/383059.383072. URL http: //doi.acm.org/10.1145/383059.383072.

[25] Redis. Redis. http://redis.io/, 2011. [Online; accessed 28-June-2011].

[26] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. Opendht: a public dht service and its uses. SIGCOMM Comput. Commun. Rev., 35:73–84, August 2005. ISSN 0146- 4833. doi: http://doi.acm.org/10.1145/1090191.1080102. URL http://doi.acm.org/ 10.1145/1090191.1080102.

[27] Alex Rodriguez. Restful web services: The basics. https://www.ibm.com/ developerworks/webservices/library/ws-restful/, 2008. [Online; accessed 13-August- 2011].

[28] Antony Rowstron and Peter Druschel. Storage management and caching in past, a large-scale, persistent peer-to-peer storage utility. SIGOPS Oper. Syst. Rev., 35: 188–201, October 2001. ISSN 0163-5980. doi: http://doi.acm.org/10.1145/502059. 502053. URL http://doi.acm.org/10.1145/502059.502053.

[29] Thorsten Schütt, Florian Schintke, and Alexander Reinefeld. Scalaris: reliable transactional p2p key/value store. In Proceedings of the 7th ACM SIGPLAN workshop on ERLANG, ERLANG '08, pages 41–48, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-065-4. doi: http://doi.acm.org/10.1145/1411273.1411280. URL http://doi.acm.org/10.1145/1411273.1411280.

[30] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Bal- akrishnan. Chord: A scalable peer-to-peer lookup service for internet applica- tions. SIGCOMM Comput. Commun. Rev., 31:149–160, August 2001. ISSN 0146- 4833. doi: http://doi.acm.org/10.1145/964723.383071. URL http://doi.acm.org/ 10.1145/964723.383071.

[31] Chunqiang Tang, Zhichen Xu, and Mallik Mahalingam. psearch: information retrieval in structured overlays. SIGCOMM Comput. Commun. Rev., 33:89–94, January 2003. ISSN 0146-4833. doi: http://doi.acm.org/10.1145/774763.774777. URL http://doi.acm.org/10.1145/774763.774777.

[32] G. Tselentis, J. Domingue, A. Galis, A. Gavras, and D. Hausheer. Towards the Future Internet: A European Research Perspective. IOS Press, Amsterdam, The Netherlands, 2009. ISBN 1607500078, 9781607500070.

[33] Twitter. #numbers. http://blog.twitter.com/2011/03/numbers.html, 2011. [Online; accessed 28-June-2011].

[34] Guido Urdaneta, Guillaume Pierre, and Maarten Van Steen. A survey of dht security techniques. ACM Comput. Surv., 43:8:1–8:49, February 2011. ISSN 0360- 0300. doi: http://doi.acm.org/10.1145/1883612.1883615. URL http://doi.acm.org/ 10.1145/1883612.1883615.

[35] Harry Wallop. Japan earthquake: how twitter and facebook helped. http://www.telegraph.co.uk/technology/twitter/8379101/ Japan-earthquake-how-Twitter-and-Facebook-helped.html, 2011. [Online; ac- cessed 28-June-2011].

[36] Evan Weaver. Improving running components. http://www.slideshare.net/Eweaver/ improving-running-components-at-twitter, 2009. [Online; accessed 28-June-2011].

[37] Wikipedia. Trusted platform module. http://en.wikipedia.org/wiki/Trusted Platform Module, 2011. [Online; accessed 28-June-2011].

[38] Wikipedia. Trusted computing. http://en.wikipedia.org/wiki/Trusted computing# Remote attestation, 2011. [Online; accessed 28-June-2011].

[39] Wikipedia. Partition (database). http://en.wikipedia.org/wiki/Partition (database), 2011. [Online; accessed 28-June-2011].

[40] Ethan Zuckerman. The first twitter revolution? http://www.foreignpolicy.com/ articles/2011/01/14/the first twitter revolution, 2011. [Online; accessed 28-June- 2011].

Part II

The Annexes


Chapter 8

Beernet Secret API

8.1 Without replication

8.1.1 Put

put(S:Secret K:Key V:Val)

Stores the triplet (Hash(Secret) Key Val) at the node responsible for the hash of Key. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is nothing stored associated with the key Key, or there is a triplet stored previously by a put operation;
• there is no triplet (Secret1 Key Val1) stored at the node responsible for the hash of Key such that Hash(Secret) ≠ Hash(Secret1);
• the value has successfully been stored.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for Secret, Beernet will assume the call is equivalent to put(S:NO SECRET K:Key V:Val).
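
To make the call pattern concrete, here is a minimal sketch in Java. Beernet is actually driven through its Oz socket API, so the BeernetKV interface and its method names are assumptions made purely for illustration; only the commit/abort rules follow the description above.

// Hypothetical Java-side view of the put operation; in reality Beernet is
// accessed through an Oz socket protocol, so this interface is illustrative only.
interface BeernetKV {
    /** Returns "commit" or "abort", following the put semantics described above. */
    String put(String secret, String key, String value);
}

class PutSketch {
    static void demo(BeernetKV beernet) {
        // Nothing is stored under this key yet, so the put commits.
        String r1 = beernet.put("aliceSecret", "user/alice", "profile-v1");
        // Updating with the same secret commits as well.
        String r2 = beernet.put("aliceSecret", "user/alice", "profile-v2");
        // A different secret does not hash to the stored Hash(Secret1),
        // so the put aborts and the stored value is left untouched.
        String r3 = beernet.put("otherSecret", "user/alice", "hijacked");
        // Omitting the secret is treated as the reserved NO SECRET value.
        String r4 = beernet.put(null, "public/key", "unprotected value");
        System.out.println(r1 + " " + r2 + " " + r3 + " " + r4); // commit commit abort commit
    }
}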

8.1.2 Delete

delete(S:Secret K:Key)

Deletes the triplet (Hash(Secret1) Key Val) stored at the node responsible for the hash of Key. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is a triplet (Hash(Secret1) Key Val) stored with a put operation at the node responsible for the hash of Key;
• Hash(Secret) = Hash(Secret1);
• the triplet has successfully been deleted.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for Secret, Beernet will assume the call is equivalent to delete(S:NO SECRET K:Key).

8.2 With replication

8.2.1 Write

write(S:Secret K:Key V:Val)

Stores the triplet (Hash(Secret) Key Val) at the majority of the replicas; updating the value gives a new version number to the triplet. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is nothing stored associated with the key Key, or there is a triplet stored previously by a write operation at the majority of the replicas;
• there is no triplet (Secret1 Key Val) where Hash(Secret) ≠ Hash(Secret1) stored in the majority of the replicas;
• the triplet has been correctly stored in the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for Secret, Beernet will assume the call is equivalent to write(S:NO SECRET K:Key V:Val).

8.2.2 CreateSet

createSet(SS:SSecret K:Key S:Secret)

Stores the triplet (Hash(SSecret) Key Hash(Secret)) at the majority of the replicas. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is nothing stored associated with the key Key in the majority of the replicas;
• there is no triplet (Hash(SSecret1) Key Hash(Secret1)) stored in the majority of the replicas yet;
• the triplet has been correctly stored in the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for SSecret or Secret, Beernet will set those values to NO SECRET.

8.2.3 Add

add(S:Secret K:Key SV:SValue V:Val)

Adds the quadruplet (Hash(Secret) Key Hash(SValue) Val) to the set referenced by the key Key in the majority of the replicas. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is no triplet (Hash(SSecret1) Key Hash(Secret1)) stored at the majority of the replicas with Hash(Secret1) ≠ Hash(Secret);
• there is no quadruplet (Hash(Secret2) Key Hash(SValue2) Val) with Hash(SValue2) ≠ Hash(SValue) stored in the majority of the replicas;
• the quadruplet has successfully been stored in the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed. Note that if no triplet (Hash(SSecret1) Key Hash(Secret1)) was previously stored at this key by createSet, Beernet will assume the call is equivalent to createSet(SS:NO SECRET K:Key S:Secret) followed by add(S:Secret K:Key SV:SValue V:Val), where NO SECRET is a reserved value of Beernet. If no value is specified for Secret or SValue, Beernet will set those values to NO SECRET.
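
As an illustration of how a set could be built with these two operations, here is a small Java sketch. The BeernetSets interface, its method names, and the mapping of a Bwitter line onto a set are assumptions; only the secret-checking rules follow the descriptions above.

// Hypothetical Java-side view of the replicated set operations; illustrative only.
interface BeernetSets {
    String createSet(String setSecret, String key, String addSecret);
    String add(String addSecret, String key, String valueSecret, String value);
}

class SetSketch {
    static void demo(BeernetSets beernet) {
        // Create the set: "ownerSecret" will later be needed to destroy it,
        // "addSecret" to add elements to it.
        beernet.createSet("ownerSecret", "line/alice/main", "addSecret");
        // Each element carries its own value secret, needed to remove it again.
        beernet.add("addSecret", "line/alice/main", "ref42Secret", "tweetRef:42");
        // Adding with a secret that does not match the one registered by
        // createSet aborts and leaves the set unchanged.
        String r = beernet.add("wrongSecret", "line/alice/main", "s", "tweetRef:43");
        System.out.println(r); // "abort"
    }
}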

8.2.4 Remove

remove(S:Secret K:Key SV:SValue V:Val)

If no value is provided for Val, this means we are dealing with a key/value pair and not a key/value set, and so SValue is not evaluated. It deletes the triplet (Hash(Secret1) Key Val1) stored at the majority of the replicas. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is a triplet (Hash(Secret1) Key Val1) stored with a write operation at the majority of the replicas;
• Hash(Secret) = Hash(Secret1);
• the triplet has successfully been deleted from the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed.

If a value is provided for Val, this means we are dealing with a value in a set and SValue will be checked. It deletes the quadruplet (Hash(Secret1) Key Hash(SValue1) Val1) stored at the majority of the replicas. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is a quadruplet (Hash(Secret1) Key Hash(SValue1) Val1) stored with an add operation and there is a triplet (Hash(SSecret1) Key Hash(Secret1)) stored with a createSet operation at the majority of the replicas;
• Val = Val1;
• Hash(Secret) = Hash(Secret1);
• Hash(SValue) = Hash(SValue1) or Hash(SValue) = Hash(SSecret1);
• the quadruplet has successfully been deleted from the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for Secret or SValue, Beernet will set those values to NO SECRET.
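
The two behaviours of remove can be summarised with the following sketch; again the Java interface is an assumption, and only the matching rules come from the description above.

// Hypothetical Java-side view of remove; illustrative only.
interface BeernetRemove {
    String remove(String secret, String key, String setValueSecret, String value);
}

class RemoveSketch {
    static void demo(BeernetRemove beernet) {
        // No value given: the key is treated as a plain key/value pair, and the
        // call commits only if the secret matches the one used by write.
        beernet.remove("aliceSecret", "user/alice", null, null);
        // Value given: the key is treated as a set; the exact value must match,
        // together with its value secret (or the set's SSecret).
        beernet.remove("addSecret", "line/alice/main", "ref42Secret", "tweetRef:42");
    }
}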

8.2.5 DestroySet

destroySet(SS:SSecret K:Key)

Deletes the triplet (Hash(SSecret1) Key Hash(Secret1)) and all the quadruplets (Hash(Secret1) Key Hash(SValue1) Val) at the majority of the replicas. This operation can have two results, "commit" or "abort". The operation returns "commit" if:

• there is a triplet (Hash(SSecret1) Key Hash(Secret1)) stored at the majority of the replicas;
• Hash(SSecret) = Hash(SSecret1);
• the triplet and quadruplets have successfully been deleted at the majority of the replicas.

Otherwise the operation returns "abort" and nothing is changed. If no value is specified for SSecret, Beernet will assume the call is equivalent to destroySet(SS:NO SECRET K:Key).

Chapter 9

Bwitter API

9.1 User management

9.1.1 createUser

public void createUser(String userName, String password, String realName)

Creates a user with his personal information.

Parameters:

• userName - the userName of the user; may not contain spaces.
• password - the password of the user; has to be at least 8 characters long and must contain at least one number and one special character (not one of the 26 letters of the alphabet).
• realName - the full name of the user; must contain a first and last name.

Throws:

• UserAlreadyUsed - if there already exists a user with this userName.
• PassWordTooWeak - if the password does not meet the requirements.
• UserNameInvalid - if either the userName or realName does not meet the requirements.
• ActionNotDoneException - if there was another problem during the operation.
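
A minimal usage sketch follows. How a Bwitter instance is obtained is an assumption; the method and exception names are the ones documented above.

class CreateUserSketch {
    // Illustrative only: reacts to each documented failure case.
    static void register(Bwitter bwitter) {
        try {
            // The password contains a digit and a special character, as required.
            bwitter.createUser("alice", "s3cur3!pwd", "Alice Liddell");
        } catch (UserAlreadyUsed e) {
            // pick another userName and retry
        } catch (PassWordTooWeak | UserNameInvalid e) {
            // report the invalid input back to the user
        } catch (ActionNotDoneException e) {
            // some other problem occurred in the datastore; retry later
        }
    }
}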

9.1.2 deleteAccount

public boolean deleteAccount(String userName, String password)

Deletes the account of the user along with his lists and lines. Also deletes all the tweets this user posted.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ActionNotDoneException - if there was another problem during the operation.

9.2 Tweets

9.2.1 postTweet

public void postTweet(String userName, String password, String msg)

Posts the message so that it is displayed in all the lines following the user.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• msg - a String containing the message.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.
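
The same credential-passing pattern applies to every write operation; a short sketch for postTweet (the Bwitter instance is again assumed):

class PostTweetSketch {
    // Illustrative only: credentials travel with every call, as required above.
    static void tweet(Bwitter bwitter) {
        try {
            bwitter.postTweet("alice", "s3cur3!pwd", "Hello from Bwitter!");
        } catch (BadCredentials e) {
            // wrong userName/password combination
        } catch (ValueNotFound e) {
            // a value needed by the operation could not be read from the datastore
        } catch (ActionNotDoneException e) {
            // any other failure; the tweet was not posted
        }
    }
}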

9.2.2 reTweet

public void reTweet(String userName, String password, String tweetID)

Posts the referenced tweet as a retweet so that it is displayed in all the lines following the user.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• tweetID - reference of the tweet to retweet.

Throws:

• ActionAlreadyPerformed - if this action has already been performed previously.
• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.2.3 reply

public void reply(String userName, String password, String msg, String tweetID)

Posts a new tweet with msg as message so that it is displayed in all the lines following the user. The new tweet contains a reference to its parent tweet, referenced by tweetID, and is added to its children.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• msg - a String containing the message.
• tweetID - reference of the tweet to which to reply.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.2.4 deleteTweet

public void deleteTweet(String userName, String password, int tweetnbr)

Deletes the tweet of the user with the specified number.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• tweetnbr - number of the tweet to delete.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3 Lines

9.3.1 addUser

public void addUser(String userName, String password, String lineName, String newFollowingUserName)

Adds the specified user to the specified line. From now on, all the tweets posted by the specified user will be displayed in the specified line.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• lineName - name of the line to which the user should be added.
• newFollowingUserName - name of the user that should be added.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3.2 removeUser

public void removeUser(String userName, String password, String lineName, String followingUserName)

Removes the specified user from the specified line. From now on, the tweets posted by the specified user will no longer be displayed in the specified line.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• lineName - name of the line from which the user should be removed.
• followingUserName - name of the user that should be removed.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3.3 allUsersFromLine

public Collection allUsersFromLine(String userName, String lineName)

Retrieves all the users followed in the specified line owned by the specified user.

Parameters:

• lineName - name of the line.
• userName - name of the user owning the line.

Returns:

A Collection of Strings containing all the userNames of the users followed in the specified line.

Throws:

• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3.4 allTweet

public Collection allTweet(String userName)

Retrieves all the tweets from the specified user. Should only be used for testing the application.

Parameters:

• userName - name of the user.

Returns:

A LinkedList of all the Tweets of the user, ordered chronologically.

Throws:

• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3.5 getTweetsFromLine

public TweetChunk getTweetsFromLine(String userName, String lineName, int cNbr, String date)

Retrieves the tweets from the chunk with the number equal to cNbr from the line lineName of the user userName that were posted after date. If date is null, all the tweets from the chunk are returned. If cNbr is negative, the last chunk from the line is returned.

Parameters:

• lineName - name of the line.
• userName - name of the user owning the line.
• cNbr - number of the chunk of the line you want to read. The chunks are ordered from oldest to most recent, with the most recent chunk having the highest number.
• date - String representing the limit date, with the format "05/06/11 15 h 26 min 03 s GMT".

Returns:

A TweetChunk containing a LinkedList of Tweets ordered chronologically and the number of the chunk in which they are stored.

Throws:

• ParseException - if the date does not have the correct format and could not be parsed.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.
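
To show how the chunk and date arguments are meant to be combined, here is a small paging sketch. The TweetChunk accessors (getChunkNumber, getTweets) and the assumption that chunk numbers start at 0 are hypothetical; the negative-cNbr and date conventions are the ones documented above.

import java.util.Collection;

class ReadLineSketch {
    // Illustrative only: refresh the newest tweets, then walk back through
    // older chunks of the line.
    static void read(Bwitter bwitter, String lastSeenDate) throws Exception {
        // A negative chunk number returns the most recent chunk; passing the
        // last seen date filters out tweets already displayed.
        TweetChunk latest = bwitter.getTweetsFromLine("alice", "timeline", -1, lastSeenDate);
        display(latest.getTweets());

        // Older tweets live in chunks with smaller numbers (assumed to start at 0).
        for (int cNbr = latest.getChunkNumber() - 1; cNbr >= 0; cNbr--) {
            TweetChunk older = bwitter.getTweetsFromLine("alice", "timeline", cNbr, null);
            display(older.getTweets());
        }
    }

    static void display(Collection tweets) { /* render the tweets */ }
}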

9.3.6 createLine

public void createLine(String userName, String password, String lineName)

Creates a new line with the specified name, with the specified user as owner.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• lineName - name of the new line to create.

Throws:

• LineAlreadyExists - if the user already has a line with the same name.
• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.
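
Putting the line operations together, a typical "follow a user" flow could look like this; the sketch only chains calls documented in this chapter and assumes a Bwitter instance is available.

import java.util.Collection;

class FollowSketch {
    // Illustrative only: create a topical line, follow a user with it, and
    // check the result.
    static void followBob(Bwitter bwitter) throws Exception {
        bwitter.createLine("alice", "s3cur3!pwd", "distributed-systems");
        // From now on, every tweet bob posts is displayed in this line.
        bwitter.addUser("alice", "s3cur3!pwd", "distributed-systems", "bob");

        // bob now appears among the users followed in the new line.
        Collection followed = bwitter.allUsersFromLine("alice", "distributed-systems");
        System.out.println(followed);
    }
}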

9.3.7 deleteLine

public void deleteLine(String userName, String password, String lineName)

Deletes the specified line owned by the user.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• lineName - name of the line to be deleted; note that the userline and timeline cannot be deleted.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.3.8 getLineNames

public Collection getLineNames(String userName)

Retrieves the names of all the lines of the user.

Parameters:

• userName - the userName of the owner of the lines.

Returns:

A LinkedList of Strings containing the names of all the lines.

Throws:

• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4 Lists

9.4.1 addTweetToList

public void addTweetToList(String userName, String password, String listName, String tweetID)

Adds the referenced tweet to the specified list.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• listName - name of the user's list.
• tweetID - reference to the tweet to add to the list.

Throws:

• ActionAlreadyPerformed - if the tweet has already been added to the list previously.
• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4.2 removeTweetFromList

public void removeTweetFromList(String userName, String password, String listName, String tweetID)

Removes the referenced tweet from the specified list.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• listName - name of the user's list.
• tweetID - reference of the tweet to remove from the list.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4.3 getTweetsFromList

public TweetChunk getTweetsFromList(String userName, String listName, int cNbr, String date)

Retrieves the tweets from the chunk with the number equal to cNbr from the list listName of the user userName that were posted after date. If date is null, all the tweets from the chunk are returned. If cNbr is negative, the last chunk from the list is returned.

Parameters:

• listName - name of the list.
• userName - name of the user owning the list.
• cNbr - number of the chunk of the list you want to read. The chunks are ordered from oldest to most recent, with the most recent chunk having the highest number.
• date - String representing the limit date, with the format "05/06/11 15 h 26 min 03 s GMT".

Returns:

A TweetChunk containing a LinkedList of Tweets ordered chronologically and the number of the chunk in which they are stored.

Throws:

• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4.4 createList

public void createList(String userName, String password, String listName)

Creates a new list with the specified name, with the user as owner.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• listName - name of the new list to create.

Throws:

• ListAlreadyExists - if the user already has a list with the same name.
• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4.5 deleteList

public void deleteList(String userName, String password, String listName)

Deletes the specified list.

Parameters:

• userName - the userName of the user performing the operation.
• password - the password of the user performing the operation.
• listName - name of the list to be deleted; note that the favoritelist cannot be deleted.

Throws:

• BadCredentials - if the provided userName does not exist or if the password does not match the userName.
• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

9.4.6 getListNames

public Collection getListNames(String userName)

Retrieves the names of all the lists of the user.

Parameters:

• userName - name of the user owning the lists.

Returns:

A LinkedList of Strings containing the names of all the lists.

Throws:

• ValueNotFound - if a critical value needed to perform the operation could not be retrieved.
• ActionNotDoneException - if there was another problem during the operation.

Chapter 10

The paper

During the course of our project we have co-written an article, along with Peter Van Roy and Boris Mejías, entitled "Designing an Elastic and Scalable Social Network Application". The contents of this paper are based on our second implementation of Bwitter, which we detail in Section 5.1.2. It is thus not fully representative of our final implementations and design choices. This article has been accepted for The Second International Conference on Cloud Computing, GRIDs, and Virtualization1, organized by IARIA and held from the 25th to the 30th of September 2011 in Rome, Italy. The submitted version of this paper can be found on the next page.

1CLOUD COMPUTING 2011, http://www.iaria.org/conferences2011/CLOUDCOMPUTING11.html, last accessed 13/08/2011

Designing an Elastic and Scalable Social Network Application

Xavier De Coster, Matthieu Ghilain, Boris Mejías, Peter Van Roy
ICTEAM institute, Université catholique de Louvain
Louvain-la-Neuve, Belgium
{decoster.xavier, ghilainm}@gmail.com, {boris.mejias, peter.vanroy}@uclouvain.be

Abstract—Central server-based social networks can suffer architectures on which this social network can run, one fully from overloading caused by social trends and make the service distributed based on peer-to-peer and one centralised based momentarily unavailable preventing users to access it when on the cloud. We will then finish with the implementation they most want it. Central server-based social networks are not adapted to face rapid growth of data or flash crowds. of our prototype in Section VII and a small conclusion at In this work we present a way to design a scalable, elastic Section VIII. and secure Twitter-like social network application build on the II.A QUICK OVERVIEW OF REQUIRED OPERATIONS top of Beernet, a transactional key/value datastore. By being scalable and elastic the application avoids both overloading and Bwitter is designed to be a secure social network based wasting resources by scalung up and down quickly. on Twitter. Twitter is a microblogging system, and while it Keywords-Scalable; elastic; social network; design. looks relatively simple at first sight it hides some complex functionalities. We included almost all of those in Bwitter I.INTRODUCTION and added some others. We will only depict the relevant Social networks are an increasing popular way for people functionalities here that will help us to analyse the design to interact and express themselves. People can now create of the system and the differences between a centralised and content and easily share it with other people. The servers of decentralised architecture. those services can only handle a given number of requests A. Nomenclature at the same time, so if there are too many requests the There are only a few core concepts on which our appli- server can become overloaded. Social networks thus have to cation is based. A tweet is basically a short message with predict the amount of load they will have to face in order to additional meta information. It contains a message up to 140 have enough resources at their disposal. Statically allocating characters, the author’s username and a timestamp of when resources based on the mean utilisation of the service would it was posted. If the tweet is part of a discussion, it keeps lead to a waste during slack periods and overloading during a reference to the tweet it is an answer to and also keeps peak periods. Twitter (http://www.twitter.com) shows the the references towards tweets that are replies to it. A user “Fail Whale” graphic whenever overloading occurs. This is a is anybody who has registered in the system. A few pieces tricky situation as this load is related to many social factors, of information about the user are kept in memory by the some of which are impossible to predict. For instance we application, such as her complete name and her password, want to be able to handle the high amount of people sending used for authentication. A line is a collection of tweets and Christmas or New Year wishes but also reacting to natural users. The owner of the line can define which users he wants disasters. This is why we want to turn towards scalable and to associate with the line. The tweets posted by those users elastic solutions, allowing the system to add and remove will be displayed in this line. This allows a user to have resources on the fly in order to fit the required load. In several lines with different topics and users associated. this work we are going to focus on the design of a social network with elastic and scalable infrastructure: Bwitter, a B. 
Basic operations secure Twitter-like social network built on Beernet [1], a 1) Post a tweet: A user can publish a message by posting scalable key/value store. In the next section we will overview a tweet. The application will post the tweet in the lines to the basic required operations for a social network. We will which the user is associated. This way all the users following then explain why we chose Beernet for this project in her have the tweet displayed in their line. Section III and how to run multiple services on top of it in 2) Retweet a tweet: When a user likes a tweet from an Section IV, in this section we will also discuss some possible other user she can decide to share it by retweeting it. This improvements for DHTs in order to increase their security will have the effect of “sending” the retweet to all the lines and offer a richer application programming interface. We to which the user is associated. The retweet will be displayed then take a closer look at the design of our application in in the lines as if the original author posted it but with the Section V. In Section VI we will compare two types of retweeter’s name indicated. 3) Reply to a tweet: A user can decide to reply to a tweet. scalable and elastic key/value store providing transactional This will include a reference to the reply tweet inside the storage with strong consistency providing those data abstrac- initial tweet. Additionally a reply keeps a reference to the tions could be used too. tweet to which it responds. This allows to build the whole conversation tree. IV. RUNNING MULTIPLE SERVICES ON BEERNET 4) Create a line: A user can create additional lines with Multiple services running on the same DHT can conflict custom names to regroup specific users. with each other. We will now discuss two mechanisms 5) Add and remove users from a line: A user can asso- designed to avoid those conflicts. ciate a new user to a line, from then on all the tweets this newly added user posts will be included in the line. A user A. Protecting data with Secrets can also remove a user from a line, she will then not see the Early in the process, we elicited a crucial requirement. tweets of this user in her line anymore and will not receive The integrity of the data posted by the users on Bwitter her new tweets either. must be preserved. A classical mechanism, but not without 6) Read tweets: A user can read the tweets from a line flaws, is to use a capability-based approach. Data is stored at by packs of 20 tweets. She can also refresh the tweets of a random generated keys so that other applications and users line to retrieve the tweets that have been posted since her using Beernet cannot erase others values because they do not last refresh. know at which keys these values are stored. But in Bwitter, some information must be available for everybody and thus III.WHY BEERNET? keys must be known by all users, meaning that we cannot use Beernet [2] is a transactional, scalable and elastic peer-to- random keys. For example, any user must be able to retrieve peer key/value data store build on the top of a DHT. Peers the user profile of another user, it must thus know the key in Beernet are organized in a relaxed Chord-like ring [3] at which it is stored. The problem is that Beernet does not and keep O(log(N)) fingers for routing. 
This relaxed ring is allow any form of authentication so key/value pairs are left more fault tolerant than a traditional ring and its robust join unprotected, meaning that anybody able to make requests to and leave algorithm to handle churn make Beernet a good Beernet can modify or delete any previously stored data. candidate to build an elastic system. Any peer can perform We make a first and naive assumption that services lookup and store operations for any key in O(log(N)), where running on Beernet are bug free and respectful of each other. N is the number of peers in the network. The key distribution They thus check at each write operation that nothing else is is done using a consistent hash function, roughly distributing stored at a given key otherwise they cancel the operation. the load among the peers. These two properties are a strong Thanks to the transactional support of Beernet the check and advantage for scalability of the system compared to solutions the write can be done atomically. This way we can avoid like client/server. race conditions where process A reads, the process B reads, Beernet provides transactional storage with strong con- both concluding that there is nothing at a given key and both sistency, using different data abstractions. Fault-tolerance is writing a value leading to the lost of one of the two writes. achieved through symmetric replication, which has several This assumption is not realistic and adds complexity to the advantages that we will not detail here compared to leaf- code of each application running on Beernet. We thus relax set and successor list replication strategy [4]. In every it and assume that Beernet is running in a safe environment transaction, a dynamically chosen transaction manager (TM) like the cloud, which implies that no malicious node can guarantees that if the transaction is committed, at least the be added to Beernet. We allow any application to make majority of the replicas of an item stores the latest value requests directly to any Beernet node from the Internet. We of the item. A set of replicated TMs guarantees that the designed a mechanism called “secrets” to protect key/value transaction does not rely on the survival of the TM leader. pairs and key/value sets stored on Beernet enriching the Transactions can involve several items. If the transaction is existing Beernet API. committed, all items are modified. Updates are performed Applications can now associate secrets to key/value pairs using optimistic locking. and key/value sets they store. This secret is not mandatory, With respect to data abstractions, Beernet provides not if no secret is provided a “public” secret is automatically only key/value-pairs as in Chord-alike networks, but also added. This secret is needed to modify or delete what is key/value sets, as in OpenDHT-alike networks [5]. The com- stored at the key protected. For instance we could have the bination of these two abstractions provides more possibilities following situation. A first request stores at the key bar the in order to design and build the database, as we will explain value foo using the secret ASecret, then another request tries in Section V. Moreover, key/value sets are lock-free in to store at key bar another value using a secret different Beernet, providing better performance. from ASecret. 
Because secrets are different Beernet rejects We opted for Beernet because of those native data ab- the last request, which will thus have no effect on the data stractions and its elastic and scalability properties. But any store. A similar mechanism has been implemented for sets, allowing to dissociate the protection of the set as a whole the application only has to edit the stored tweet and does and the values it contains. not need go through every line that could contain the tweet. Secrets are implemented in Beernet and have been tested When loading the tweet the application can see if it has been through our Bwitter application. A similar but weaker mech- deleted or not. anism is proposed by OpenDHT [5]. Complete information 3) Minimise the changes to an object: We want the concerning the new secret API can be found at Bwitter’s objects to be as static as possible to enable cache systems. web site (http://bwitter.dyndns.org/). This is why we do not store potentially dynamic information B. Dictionaries inside the objects but rather have a pointer in them, pointing to a place where we could find the information. For instance, At the moment in Beernet, as in all key/value stores we Tweets are only modified when we delete them, if there is a know, there is only one key space. This can cause problems reply to them, the ID of the new child is stored in a separated if multiple services use the same key. For instance two set. services might design their database storing the user profiles at a key equal to the username of a user. This means they can 4) Do not make users load unnecessary things: Loading not both have a user with the same username. This problem the whole line each time we want to see the new tweets cannot be solved with the secrets mechanism we proposed. would result in an unnecessarily high number of messages We thus propose to enhance the current Beernet API with exchanged and would be highly bandwidth consuming. This multiple dictionaries. A dictionary has a unique name and is why we decided to cut lines, which in fact are just big refers to a key-space in Beernet. A new application can sorted set, into subsets, which are sets of x tweets, that can create a dictionary as it starts using Beernet. It can later be organised in a linked list fashion, where x is a tunable create new dictionaries at run-time as needed, which allows parameter. This way the user can load tweets in chunks of the developpers to build more efficient and robust imple- x tweets. The first subset contains all the references to the mentation. Dictionaries can be efficiently created on the fly tweets posted since the last time the user retrieved the line, in O(log(N)) where N is the number of peers in the Beernet it can thus be much larger than x tweets, it is not a problem network. Moreover dictionaries do not degrade storing and as users generally want to check all the new tweets when reading performance of Beernet. If two applications need to they consult a line. The cutting is then done as follows: the share data they just have to use the same dictionary. This application removes the x oldest references from the first has not yet been implemented, but API and algorithms are set, posts them in an new subset and repeats the operation currently being designed. An open problem is how to avoid until the loaded first set is smaller than x. malicious applications to access the dictionary of another 5) Retrieving Tweets in order: Due to the cutting mech- application. 
anism and delays in the network we can not be sure that V. DESIGN PROCESS each reference contained in a subset is strictly newer than the references stored in the next subset. So we also retrieve We will now present our design choices and explain how the tweet references from this one and only select the first we relieve machines hosting popular values. 20 newest references before fetching the tweets. A. Main directions 6) Filtering the references: When a user is dissociated We will start by discussing the main design choices we from a line we do not want our application to still display made for our implementation. the tweets he posted previously. We decided not to scan 1) Make reads cheap: While designing the construction the whole line to remove all the references added by this mechanism of the lines we were faced with the following user, but rather remove the user from the list of the users choice: Either push the information and put the burden on associated with the line and filter the references-based on the write, making the “post tweet” operation add a reference this list before fetching the corresponding tweets. to the tweet in the lines of each follower. Or pulling the 7) Only encrypt sensitive data: Most of the data in Twit- information and build the lines when a user wants to read ter is not private so there would be no point in encrypting them, by fetching all the tweets posted by the users he it. Only the sensitive data such as the password of the users follows and reordering them. As people do more reads than should be protected by encryption when stored. writes on social networks, based on the assumption that each posted tweet is at least read one time, we opted to make 8) Modularity: Even if our whole design and architecture reads cheaper than writes. relies on the features and API offered by Beernet it is always 2) Do not store full tweets in the lines but references: better to be modular and to define clear interfaces so we can There is no need to replicate the whole tweet inside each replace a whole layer by an other easily. For instance any line, as a tweet could be potentially contain a lot of in- other DHT could easily be used, provided it supports the formation and should be easy to delete. To delete a tweet same data abstractions or they can be simulated. B. Improving overall performance adding a cache but it could be replaced by any key/value store with similar 1) The popular value problem: Given the properties of properties. As a remainder the data store must provide read- the DHT, a key/value pair is mapped to a node or f /write operations on values and sets as well as implementing nodes, where f is the replication factor, depending of the the secrets we described before. This architecture is very redundancy level desired. This implies that if a key is modular, each layer can be changed assuming it respects frequently requested, the nodes responsible for it can be the API of the layer above. We now have to decide where overloaded while the rest of the network is mostly idle Beernet will run. We have two options, either let the Beernet and adding additional machines is not going to improve the nodes run on the users’ machines or run them on the situation. It is not uncommon on Twitter to have wildly cloud, leading to two radically different architectures: the popular tweets that are retweeted by thousands of users. completely decentralised architecture and the cloud-based In the worst case the retweets can be seen as exponential architecture. 
phenomenon as all the users following the retweeter are A. Completely decentralised architecture susceptible to retweet it too. 2) Use an application cache as solution: Adding nodes In a fully decentralised architecture the user runs a Beer- will not solve the problem, because the number of nodes net node and the Bwitter application on her machine. The responsible for a key/value pair will not change. In order to Bwitter application will do requests directly to this local reduce this number of requests we have decided to add a Beernet node. Ideally this local Beernet node should not be cache with a LRU replacement strategy at the application restricted to the Bwitter application but should also be acces- level. This solves the retweet problem because now the sible for other applications. The problem with this approach application, which is in charge of several users, will have is that the user can bypass protection mechanism enforced in its cache the tweet as soon as one of its user reads the at higher level by accessing DHT low level functions of popular tweet. This tweet will stay in the cache because the Beernet. Usually this is not a problem as untrusted users users frequently make requests to read it. This way we will would not know at which key the data is stored and thus reduce the load put on the nodes responsible for the tweet. can not compromise it. But in our case the data has to We now have to take into account that values are not be at known keys so that the application can dynamically immutable, they can be deleted and modified. A naive retrieve them. This means that any user understanding how solution would be to do active pulling to Beernet to detect our application works would be able to delete, edit or forge changes to the key/value pair stored in the cache. This would lines, users, tweets and references. This would be a security be quite inefficient as there are several values, like tweets, nightmare. that almost never change. In order to avoid pulling we need We tried to tackle this problem with the secret mecha- a mechanism that warns us when a change is done to a nism we designed to enrich Beernet’s interface. While this key/value pair stored in the cache. Beernet, as described prevented the users to edit or delete data they did not create in [1], allows an application to register to a key/value pair themselves we could not prevent them to forge elements. To and to receive a notification when this value is updated. Our avoid this we needed a way to authenticate every data posted application cache will thus register to each key/value pair by a user. There are cryptographic mechanisms to enforce that it actually holds and when it receives a notification from this and ways to efficiently manage the keys but they are Beernet indicating that a pair has been updated it will update outside the scope of this paper. its corresponding replicas. This mechanism has the big Even with those mechanisms in place we have to en- advantage of removing unnecessary requests. Notifications force security at the DHT level. Beernet uses encryption to are asynchronous, so the replicas in the cache can have communicate between different nodes to avoid confidential different values at a given moment, leading to an eventual information leak. But anyone could add modified Beernet consistency model for the reads. On the other hand writes do nodes behaving maliciously. 
Aside usual attacks [6], a not go through the cache but directly to Beernet, this allows corrupted node could be modified to reveal all the secrets to keep strong consistency for the writes inside Beernet. inside the requests going through it. We thus have to make This is an acceptable trade off as we do not need strong sure that the code running the Beernet node is not modified, consistency for reads inside a social network. so we need a mechanism that enforce remote attestation as described in [7]. This can be done by using a TPM, which VI.ARCHITECTURE provides cryptographic code signature in hardware, on the We will present two different scalable architectures for users’ machine in order to be able to prove to other Beernet our application. In both architectures our application is nodes that the client’s node is a trustworthy node. Until a decomposed in three loosely coupled layers. From top to Beernet node has a way to tell for sure it can trust another bottom, the Graphic User Interface (GUI), Bwitter used Beernet node we are in a dead end. Indeed anyone stealing to handle the operations described in Section II and the the secret of another user can erase any data posted by the key/value data store. For this last layer we use Beernet, user. Assuming that a Twitter session time is short, this can the Bwitter nodes will not be accessible directly, they will be a problem if our application is the only one running on be accessed through a fast and transparent reverse proxy that the top of Beernet. Indeed it will result in nodes frequently will be in charge of doing load balancing between Bwitter joining and leaving the network with a short connection nodes. At the moment Bwitter nodes use sessions to identify time. Each of those changes in the topology of Beernet the users, so the reverse proxy is forced to keep track of the will modify the keys for which the nodes are responsible sessions in order to be able to map the same client to the triggering key/value pairs reallocation itself leading to an same Bwitter node. We plan to change this behavior to offer important and undesirable churn. This would not be an ideal a completely REST Bwitter API. environment for a DHT. The top layer is the GUI, it connects to a Bwitter node using a secure connection channel that guarantee B. Cloud-based architecture the authenticity of the Bwitter node and encrypts all the With this architecture the Bwitter and the Beernet nodes communications between the GUI and the Bwitter node. will run on the cloud, which is an adequate environment Multiple GUI modules can connect to the same Bwitter for scalable and elastic applications. We can thus easily add node. The GUI layer is the only one running on the client or remove Bwitter and Beernet nodes to meet the demand, machine. increasing the efficiency of the network. A Bwitter node is a machine running Bwitter but generally also a Beernet node. This solution also allows us to keep a stable DHT as nodes C. Elasticity are not subject to high churn as it was the case in the first We previously explained that to prevent the Fail Whale architecture we presented. error, the system needs to scale up to allocate more resource Using this solution we do not have all the security issues to be able to answer an increase of user requests. Once the we had with the fully decentralised architecture. This is load of the system gets back to normal, the system needs to because the users do not have direct access to the Beernet scale down to release unused resources. 
We briefly explain nodes anymore but have to go through a Bwitter node how a ring-based key/value store needs to handle elasticity and can only perform operations defined in Section II. in terms of data management. We are currently working on Furthermore, the communication channel between the GUI making the elastic behaviour more efficient in Beernet. and the Bwitter node can guarantee authenticity of the server and encryption of data being transmitted, for instance using 1) Scale up: When a node j joins the ring in between https. Bwitter requires users to be authenticated to access or peers i and k, it takes over part of the responsibility modify their data. Doing so we provide data integrity and of its successor, more specifically all keys from i to j. authenticity because, for instance, Bwitter does not allow Therefore, data migration is needed from peer k to peer j. a user to delete a tweet that he did not post or to post The migration involves not only the data associated to keys a tweet using the username of someone else. The security in the range ]i, j], but also the replicated items symmetrically problem concerning possible revelations of user secrets due matching the range. Other noSQL databases such as HBase to a malicious node is not relevant anymore as our DHT is (http://hbase.apache.org) do not trigger any data migration fully under our control. upon adding new nodes to the system, showing better The cloud-based architecture is thus more secure and performance scaling up. stable, this is why we have finally chosen to implement this 2) Scale down: There are two ways of removing nodes solution, we now take a closer look at how the layer stack from the system: by gently leaving and by failing. It is very is build. Note that in spite of our researches we did not find reasonable to consider gently leaves in cloud environments, any information about current architecture so we because the system explicitly decides to reduce the size of are not able to compare both architectures. the system. In such case, it is assumed that the leaving peer As said before the Beernet layer runs on the cloud, this j has time enough to migrate all its data to its successor layer is monitored in order to detect flash crowds and who becomes the new responsible for the key range ]i, j], Beernet nodes will be added and removed on the fly to meet being i the predecessor. the demand. Scaling down due to the failure of peers is much more The intermediate layer, also running on the cloud, is complicated because the new responsible of the missing Bwitter, it communicates with Beernet and the GUIs. This key range needs to recover the data from the remaining layer can be put on the same machine as a Beernet node or replicas. The difficulty comes from the fact that the value on another machine. Normally there should be less Bwitter of application keys is unknown, since the hash function is nodes than Beernet nodes. One Bwitter node is associated to not bijective. Therefore, the peer needs to perform a range a Beernet node but can be re-linked to another Beernet node query, as in Scalaris [8], but based on the hash keys. Another if it goes down. Each Bwitter node should be connected to a complication is that there are no replica sets based on key different Beernet node in order to share the load. In practice ranges, but on each single key. VII.IMPLEMENTATION technology (http://www.adobe.com/products/flex). This GUI uses the web API we developed to access Bwitter. 
We have implemented a prototype based on our cloud- based architecture. Sources are freely available at [bwit- VIII.CONCLUSION ter.dyndns.org]. We will now detail how we actually imple- Our goal was to build a new system able to withstand flash mented it. You can see a full schema of our implementation crowd by relying on an elastic and scalable architecture. This in Figure 1. allows us to add resources to face heavier traffic and avoid As explained, our architecture has three main layers. The waste of resources. DHT layer is implemented using Beernet, build in Oz v1.3.2 While the prototype is not yet totally finished our whole (http://www.mozart-oz.org/) enhanced with the secret mech- design is totally scalable, meaning we do not have single anism. Beernet is accessible through a socket API , we used absurdly huge operations due to the high number of users it to communicate with the Bwitter layer. An alternative test one might follow or be followed by. We avoid overloading version of the data store layer used for testing the application specific machines because we do not rely on any global is also made available at http://bwitter.dyndns.org. keys and use our cache mechanism to prevent the retweet problem. Some preliminary scalability tests have been done on Amazon and are encouraging. During the implementation we also came across two potentially important improvements for key/value stores, namely duplicating the key space using multiple dictionaries and the protection of data via secrets, with the last one now implemented in Beernets latest release.

REFERENCES [1] B. Mej´ıas and P. Van Roy, “Beernet: Building self-managing decentralized systems with replicated transactional storage,” IJARAS: International Journal of Adaptive, Resilient, and Autonomic Systems, vol. 1, no. 3, pp. 1–24, July-Sept 2010.

[2] Programming Languages and Distributed Computing Research Group, UCLouvain, “Beernet: pbeer-to-pbeer network,” http: //beernet.info.ucl.ac.be, 2009. [Online]. Available: http:// Figure 1. Implementation structure scheme beernet.info.ucl.ac.be

At the top of the Bwitter layer is a Tomcat 7.0 application [3] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Bal- akrishnan, “Chord: A scalable peer-to-peer lookup service server (http://tomcat.apache.org) using java servlets from for internet applications,” SIGCOMM Comput. Commun. Rev., java EE. Bwitter is accessible from the internet through an vol. 31, pp. 149–160, August 2001. API that The Bwitter layer is connected to the bottom layer using sockets to communicate with an Oz agent controlling [4] A. Ghodsi, L. O. Alima, and S. Haridi, “Symmetric replication Beernet. The Bwitter nodes are accessible remotely via for structured peer-to-peer systems,” in Proceedings of the 2005/2006 international conference on Databases, information an http API, finally we would like to make it completely systems, and peer-to-peer computing, ser. DBISP2P’05/06. conform to REST API. The Tomcat servers are not directly Berlin, Heidelberg: Springer-Verlag, 2007, pp. 74–85. accessed, they are accessed through a reverse proxy server, in this case nginx (http://wiki.nginx.org), which is told to [5] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, “Opendht: a public dht service support 10k concurrent connections. This nginx server is and its uses,” SIGCOMM Comput. Commun. Rev., vol. 35, pp. in charge of serving static content as well as doing load 73–84, August 2005. balancing for the Tomcat servers. This load balancing is performed so that messages of a same session are always [6] G. Urdaneta, G. Pierre, and M. v. Steen, “A survey of dht mapped to the same Tomcat server, this is necessary as security techniques,” ACM Computing Surveys, vol. 43, no. 2, jan 2011. authentication is needed to perform some of the Bwitter operations and we did not want to share the state of the [7] Wikipedia, “Trusted computing,” http://en.wikipedia.org/wiki/ Trusted computing #Remote attestation, 2011, [Online; ac- users sessions between the Bwitter nodes for performance \ \ \ reasons. The connection to the web-based API is performed cessed 28-June-2011]. using https to meet the secure channel requirement of our [8] T. Schutt,¨ F. Schintke, and A. Reinefeld, “Scalaris: reliable architecture. transactional p2p key/value store,” in Proceedings of the 7th The last layer is the GUI, we decided to implement it ACM SIGPLAN workshop on ERLANG, ser. ERLANG ’08. as a Rich Internet Application (RIA), using the Adobe Flex New York, NY, USA: ACM, 2008, pp. 41–48.