Concurrent Programming for Scalable Web Architectures
Diploma Thesis VS-D01-2012
Institute of Distributed Systems
Faculty of Engineering and Computer Science
Ulm University

Benjamin Erb
April 20, 2012

This thesis has been written by Benjamin Erb in 2011/2012 as a requirement for the completion of the diploma course Media Informatics at Ulm University. It has been submitted on April 20, 2012.

Benjamin Erb
Mail: [email protected]
WWW: http://www.benjamin-erb.de

Institute of Distributed Systems
Faculty of Engineering and Computer Science, Ulm University
James-Franck-Ring 89081 Ulm, Germany

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Title Image
Bustling Beijing by Trey Ratcliff, released under CC-BY-NC-SA 2.0 license.
http://www.flickr.com/photos/stuckincustoms/5069047950/
http://creativecommons.org/licenses/by-nc-sa/2.0/

Icon Sets
Iconic (http://somerandomdude.com/work/iconic/) by P.J. Onori, released under CC-BY-SA 3.0 license.
http://creativecommons.org/licenses/by-sa/3.0/
Picol (http://picol.org) by Melih Bilgil, released under CC-BY-SA 3.0 license.
http://creativecommons.org/licenses/by-sa/3.0/

Abstract

Web architectures are an important asset for various large-scale web applications, such as social networks or e-commerce sites. Being able to handle huge numbers of users concurrently is essential, thus scalability is one of the most important features of these architectures. Multi-core processors, highly distributed backend architectures and new web technologies force us to reconsider approaches for concurrent programming in order to implement web applications and fulfill scalability demands.
While focusing on different stages of scalable web architectures, we provide a survey of competing concurrency approaches and point to their appropriate uses.

Preface

About this Work

My name is Benjamin Erb and I have been studying Media Informatics at Ulm University since 2006. This work represents a major requirement for the completion of my diploma course in 2012. For a long time, I have taken a great interest in web technologies, scalable architectures, distributed systems and concurrent programming. As a consequence, the topic of my thesis covers most of these interests. In fact, it considers the overlap of these subjects when it comes to the design, implementation and programming of scalable web architectures. I hope to provide a comprehensive introduction to this topic, which I have missed so far. As such a primer might also be interesting for many others, I am happy to release my thesis under a free license to the general public.

This thesis incorporates both academic research papers and practically oriented publications and resources. In essence, I aimed for a survey of different approaches from a conceptual and theoretical perspective. Hence, a quantitative benchmarking of concrete technologies was out of scope for my work. Also, the breadth of the subjects covered only allowed for a brief overview. The bibliography and the referenced resources provide a good starting point for further reading.

I am very interested in your feedback, your thoughts on the topic and your ideas! Feel free to contact me (http://www.benjamin-erb.de) or get in touch with me via Twitter: @b_erb

Acknowledgements

First of all, I would like to thank my advisor Jörg Domaschka for his steady and precious support. He gave me substantial liberties, but was always available with plenty of good advice and formative chit-chat when I needed it. I also thank my supervisors, Prof. Dr. Hauck and Prof. Dr.
Weber, especially for allowing me to work on such a conceptual and comprehensive topic for my thesis. After all, both of them need to be held responsible for my interests in these topics to some degree. I want to thank all of my proofreaders, in alphabetical order: Christian Koch, Katja Rogers, Linda Ummenhofer, Lisa Adams, Matthias Matousek, Michael Müller, Nicolai Waniek and Timo Müller. I also want to thank Jan Lehnardt and Lena Herrmann, who reinforced my decision to release this thesis. Finally, I want to say to everyone who, directly or indirectly, helped and supported me while I wrote this thesis: Thank you!

A labyrinth of symbols... An invisible labyrinth of time.
-- Jorge Luis Borges, The Garden of Forking Paths

Contents

1 Introduction
  1.1 Motivation
  1.2 Scope of this Thesis
  1.3 Methodology of the Study
  1.4 Road Map
2 The World Wide Web, Concurrency and Scalability
  2.1 The World Wide Web
    2.1.1 Uniform Resource Identifiers
    2.1.2 The Hypertext Transfer Protocol
    2.1.3 Web Formats
  2.2 Web Applications
    2.2.1 Web Sites
    2.2.2 Web Services
  2.3 Concurrency
    2.3.1 Concurrency and Parallelism
    2.3.2 Models for Programming Concurrency
    2.3.3 Synchronization and Coordination as Concurrency Control
    2.3.4 Tasks, Processes and Threads
    2.3.5 Concurrency, Programming Languages and Distributed Systems
  2.4 Scalability
    2.4.1 Horizontal and Vertical Scalability
    2.4.2 Scalability and other Non-functional Requirements
    2.4.3 Scalability and Concurrency
    2.4.4 Scalability of Web Applications and Architectures
  2.5 Summary
3 The Quest for Scalable Web Architectures
  3.1 Traditional Web Architectures
    3.1.1 Server-Side Technologies for Dynamic Web Content
    3.1.2 Tiered Architectures
    3.1.3 Load-Balancing
  3.2 Cloud Architectures
    3.2.1 Cloud Computing
    3.2.2 PaaS and IaaS Providers
  3.3 An Architectural Model for Scalable Web Infrastructures
    3.3.1 Design Guidelines and Requirements
    3.3.2 Components
    3.3.3 Critical Reflection of the Model
  3.4 Scaling Web Applications
    3.4.1 Optimizing Communication and Content Delivery
    3.4.2 Speeding up Web Site Performance
  3.5 Summary
4 Web Server Architectures for High Concurrency
  4.1 Overview
    4.1.1 Request Handling Workflow
    4.1.2 The C10K Problem
    4.1.3 I/O Operation Models
  4.2 Server Architectures
    4.2.1 Thread-based Server Architectures
    4.2.2 Event-driven Server Architectures
    4.2.3 Combined Approaches
    4.2.4 Evaluation
  4.3 The Case of Threads vs. Events
    4.3.1 The Duality Argument
    4.3.2 A Case for Threads
    4.3.3 A Case for Events
    4.3.4 A Conflation of Distinct Concepts
    4.3.5 Conclusion
  4.4 Summary
5 Concurrency Concepts for Applications and Business Logic
  5.1 Overview
  5.2 Concurrency Based on Threads, Locks and Shared State
    5.2.1 The Implications of Shared and Mutable State
    5.2.2 Case Study: Concurrency in Java
    5.2.3 Multithreading and Locks for Concurrent Application Logic
  5.3 Concurrency via Software Transactional Memory
    5.3.1 Transactional Memory
    5.3.2 Software Transactional Memory
    5.3.3 The Transactional Memory / Garbage Collection Analogy
    5.3.4 Case Study: Concurrency in Clojure
    5.3.5 STM for Concurrent Application Logic
  5.4 Actor-based Concurrency
    5.4.1 The Actor Model
    5.4.2 Actor Implementations for Concurrent Programming
    5.4.3 Programming with Actors
    5.4.4 Case Study: Concurrency in Scala
    5.4.5 Actors for Concurrent Application Logic
  5.5 Event-driven Concurrency
    5.5.1 Event-driven Architectures
    5.5.2 Single-threaded Event-driven Frameworks
    5.5.3 Case Study: Concurrency in node.js
    5.5.4 Event-driven Concurrent Application Logic
  5.6 Other Approaches