Towards a Solution to the Red Wedding Problem

Christopher S. Meiklejohn
Université catholique de Louvain
Instituto Superior Técnico

Heather Miller
Northeastern University
École Polytechnique Fédérale de Lausanne

Zeeshan Lakhani
Comcast Cable

Abstract

Edge computing promises lower latency interactions for clients operating at the edge by shifting computation away from Data Centers to Points of Presence which are more abundant and located geographically closer to end users. However, most commercially available infrastructure for edge computing focuses on applications without shared state. In this paper, we present the Red Wedding Problem, a real-world scenario motivating the need for stateful computations at the edge. We sketch the design and implementation of a prototype database for operation at the edge that addresses the issues presented in the Red Wedding Problem and present issues around implementing our solution on commercial edge infrastructure due to limitations in these offerings.

1 Edge Computing

Edge computing promises lower latency interactions for clients operating at the edge by shifting computation away from Data Centers (DCs) to Points of Presence (PoPs) which are more abundant and located geographically closer to end users. Not only do PoPs reduce latency, but they also serve to alleviate load on origin servers, enabling applications to scale to sizes previously unseen.

Recently, large-scale cloud providers have started providing commercial access to this edge infrastructure through the use of "serverless" architectures, giving application developers the ability to run arbitrary code at these PoPs. To support this, client code is normally executed inside of a transient container where the application developer must resort to using storage that is typically provided only at the DC. Therefore, to take most advantage of the edge, application developers are incentivized to use these architectures for applications where there is no shared state, thereby ruling out one of the most common types of applications developed today: collaborative applications which rely on replicated, shared state.

The first generation of edge computing services provided by Content Delivery Networks (CDNs) let customers supply code that will be run at PoPs during the HTTP request-response cycle. Akamai [1] allows customers to author arbitrary JavaScript code that operates in a sandboxed environment and is permitted to modify responses before the content is returned to the user. Fastly [7] gives users the ability to extend their CDN with similar rewrite functionality by authoring code in the Varnish Configuration Language, which is also executed in a restricted environment. In both cases, customers are not allowed to inspect or modify content at the PoP and can only modify requests to and from the origin and to and from the end user.

Second generation services, such as Amazon's Lambda [2] and its extension to the edge, Lambda@Edge, enable the customer to write stateless functions that either run inside one of Amazon's regional availability zones or at any number of the hundreds of PoPs it has through its CloudFront caching service. While Lambda provides binary execution in a restricted containerized environment, Lambda@Edge is limited to Node.JS applications. Both Google's Cloud Functions and Microsoft Azure's Functions at IoT Edge provide similar services to Lambda and Lambda@Edge.

This paper defines a real-world scenario for motivating stateful computations at the edge, titled the Red Wedding Problem. We sketch the design and implementation of a prototype eventually consistent, replicated peer-to-peer database operating at the edge using Amazon Lambda that is aimed at addressing the issues presented by the Red Wedding Problem. In realizing our prototype on real-world edge infrastructure, we ran into numerous limitations imposed by Amazon's infrastructure that are presented here.

2 The Red Wedding Problem

We present the Red Wedding Problem¹,²: an industry use case presented to us by a large commercial CDN around the handling of traffic spikes at the edge.

Game of Thrones [9] is a popular serial fantasy drama that airs every Sunday night on HBO in the United States. During the hours leading up to the show, throughout the hour-long episode, and into the hours following the airing, the Game of Thrones Wiki [6] experiences a whirlwind of traffic spikes. During this occurrence, articles related to characters that appear in that episode, the page for the details of the episode, and associated pages for varying concepts referenced in the episode undergo traffic spikes—read operations upon viewing related pages—and write spikes while updating related pages as events in the episode unfold.

While handling read spikes is what CDNs were designed for, the additional challenge of handling write spikes is what makes the Red Wedding Problem interesting. More specifically, the Red Wedding Problem requires that the programmer be made aware of and optimize for the following constraints:

• Low latency writes. By accepting and servicing writes at the PoP, users experience lower latency requests when compared to an approach that must route writes to the origin;

• Increased global throughput. By accepting writes at the PoP, and avoiding routing writes through to the origin DC, write operations can be periodically sent to the origin in batches, removing the origin DC as a global throughput bottleneck.

However, several design considerations of the Red Wedding Problem make the problem difficult to solve.

• Storing state. How should state be stored at the edge, especially when leveraging "serverless" infrastructures at the edge which are provided by most cloud providers today?

• Arbitrating concurrency. How should concurrent writes be arbitrated when accepting writes at the edge to minimize conflicts and maximize batching?

• Application logic. As clients do not communicate with the database directly in most applications, how should application logic be loaded at the edge?

¹ Private communication, Fastly.
² Many examples of the Red Wedding Problem exist: live European football game commentary posted on is one such example.

3 Solving the Red Wedding Problem

Our proposed solution is presented in Figure 1. In solving the Red Wedding Problem, we had the following design considerations:

• Application logic at the edge. Application logic for authentication and authorization, as well as for mapping user requests into database reads and writes, must also live at the PoP;

• Elastic replica scalability at the PoP. To avoid moving the bottleneck from the origin to the PoP, there must be elastic scalability at the PoP that supports the creation of additional replicas on demand;

• Inter-replica communication. Under the assumption that communication within the PoP is both low latency and cost-effective, as traffic never needs to leave a facility, replicas executing within the PoP should be able to communicate with one another for data sharing. This avoids expensive round-trip communication with the origin server;

• Convergent data structures. As modifications will be affecting data items concurrently, the data model needs to support objects that have well-defined merge semantics to ensure eventual convergence;

• Origin batching. To alleviate load on the origin server, updates should be able to be combined to remove redundancy and leverage batching to increase throughput when propagating updates periodically to the origin server.

Our solution assumes that application code is containerized and can be deployed to the PoP to interpose on requests to the origin, enabling the application to scale independently at the edge. As demand from end users ramps up, more instances of the application are spawned to handle user requests. These instances of the application generate read and write traffic that is routed to the database. In traditional architectures, these application instances would normally communicate with a database operating at the DC.

We use a containerized database to enable data storage at the PoP. Each of these database instances would be scaled in the same manner as the application code running at the PoP, and state would be bootstrapped either from other replicas operating in the same facility or from the DC. These instances of the database would be instantiated on demand and interpose on read and write operations from the application code inside of the PoP. By leveraging other replicas, the system can increase the probability of using the freshest state possible.

Finally, concurrent data structures can be used at the edge to avoid conflicting updates that cannot be merged. For instance, several previous works [5, 13, 3] have identified how to build abstract data types where all operations commute or have predefined merge functions to avoid issues where changes do not commute.
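To make the merge requirement concrete, the following minimal sketch (in TypeScript, purely illustrative; our prototype is the Erlang key-value store described in Section 4) shows a state-based grow-only set whose merge is a set union. Because union is commutative, associative, and idempotent, replicas that accept concurrent writes at different PoPs converge regardless of the order in which anti-entropy delivers state.

```typescript
// Minimal state-based grow-only set (G-Set) CRDT sketch.
// Illustrative only: the prototype in this paper embeds an Erlang
// key-value store [10, 11], not this TypeScript code.

class GSet<T> {
  private elements = new Set<T>();

  // Adds are performed at the local replica and propagated asynchronously.
  add(value: T): void {
    this.elements.add(value);
  }

  has(value: T): boolean {
    return this.elements.has(value);
  }

  // Merge is a set union: commutative, associative, and idempotent,
  // so replicas converge no matter how anti-entropy orders deliveries.
  merge(other: GSet<T>): void {
    for (const value of other.elements) {
      this.elements.add(value);
    }
  }

  values(): T[] {
    return [...this.elements];
  }
}

// Two replicas accept concurrent writes at different PoPs...
const popA = new GSet<string>();
const popB = new GSet<string>();
popA.add("edit:arya");
popB.add("edit:jon");

// ...and an anti-entropy exchange in either direction converges them.
popA.merge(popB);
popB.merge(popA);
console.log(popA.values(), popB.values()); // both contain both edits
```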

[Figure 1 (diagram). Traditional architecture: writes routed to origin, reads serviced by the PoP cache. Ideal architecture: reads and writes handled by the PoP, state propagated between Lambda invocations, state batched to the origin server. CloudFront PoPs cf1 through cf4, regions us-west-1 and eu-west-1.]

Figure 1: Architecture diagram with both the traditional and ideal architectures presented, where (traditional) writes are sent to the origin and reads are serviced by the edge; and (ideal) writes are stored in transient database instances at the edge as the load increases.
4 Implementation

For our prototype, we chose Amazon's Lambda as a starting point over traditional persistent VMs. The reason for this choice is that functions written for Amazon's Lambda environment can easily be extended to run in Amazon's edge environment, Lambda@Edge. If we had chosen traditional VMs, our application would be constrained to operate only in Amazon's DCs.

Our prototype enables us to store content inside of transient serverless invocations using Lambda. To achieve this, we have taken an open source distributed key-value database [10, 11] written in Erlang and have embedded it inside of a Node.JS application that is deployed to Lambda in several AWS regions.

4.1 Lambda

Amazon's Lambda is a stateless computation service for end users, allowing them to upload functions that will be invoked in response to triggered events. With Lambda, users specify the events that uploaded code should respond to, and users' application code is guaranteed to automatically scale, when necessary, by creating additional invocations to deal with increasing demand.

Invocations in Lambda are designed to be opaque. Users upload a compressed file containing their application code and, upon the first event, the application code will be decompressed and executed inside of a container. When the invocation completes, the container is paused until the next invocation; however, after a period of somewhere between 5 and 60 minutes, the container will be terminated. Containers will be reused for invocations when possible, but concurrent invocations will cause additional containers to be created on demand. This mechanism is transparent to the end user, as the user only interacts on a per-event level.

As a result of this, invocations are incentivized to be completely stateless, or authorized and able to store and fetch required state from an external service, such as S3 or DynamoDB, as Amazon recommends.
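This lifecycle is what makes transient state at the edge possible at all: module-level state in the deployed package survives warm invocations of the same container and disappears when the container is reclaimed. The following hypothetical TypeScript handler (not our Erlang prototype) illustrates this behavior.

```typescript
// Hypothetical Lambda handler sketching container reuse.
// State initialized outside the handler lives as long as the container:
// it survives "warm" invocations but is lost when Lambda reclaims the
// container (after roughly 5 to 60 minutes of inactivity, per Section 4.1).

let replicaState: Map<string, string> | undefined;
let invocationCount = 0;

export async function handler(event: { key?: string; value?: string }) {
  if (replicaState === undefined) {
    // Cold start: this is where a database replica would bootstrap,
    // e.g. from peers in the same facility or from the origin DC.
    replicaState = new Map<string, string>();
  }
  invocationCount += 1;

  if (event.key !== undefined && event.value !== undefined) {
    replicaState.set(event.key, event.value);
  }

  return {
    statusCode: 200,
    body: JSON.stringify({
      invocationCount, // greater than 1 only when the container was reused
      storedKeys: [...replicaState.keys()],
    }),
  };
}
```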

4.2 Prototype

We built an eventually-consistent, replicated, peer-to-peer database that runs in Lambda. The interface provided to the user is Redis-like; read and update operations, identified by a unique key and a data structure type, are issued against the database.

Data structures provided by the data store are Conflict-Free Replicated Data Types [13] (CRDTs), distributed data structures with a merge function to ensure that state remains convergent in the event of concurrent modifications at multiple locations. CRDTs come in a variety of different flavors, providing distributed versions of many common data types, e.g., booleans, sets, dictionaries. Operations are always performed at a local replica and asynchronously propagated to other nodes in the system using an anti-entropy [4] protocol; this prevents blocking the system for write acknowledgements.

When the Lambda function is invoked, an instance of the database starts up. There can be multiple instances of the database running in each Amazon region. Every time a database instance is invoked, it kicks off a compulsory anti-entropy session. The databases bootstrap one another by running anti-entropy sessions over AMQP. All data structures in the database are CRDTs in order to avoid conflicting updates.

4.3 Limitations of Lambda

In building our prototype on Lambda, we ran into a number of complications. We discuss those here.

Inter-replica communication. Inter-replica communication is required for the compulsory anti-entropy sessions amongst database replicas. Lambda does not allow processes to bind ports inside of the container they are executing in. Therefore, our key-value store was unable to open sockets and receive incoming connections from other nodes in the system. To work around this limitation, we used an external message queue service that provided AMQP and sent all of the inter-replica communication over this transport layer. Given that nodes could not accept incoming connections, we established connections out to an external AMQP broker and reused those connections to receive traffic from other nodes.

Concurrent invocations. Utilizing multiple invocations at the same time is key to elastic scalability and the operation of multiple replicas. Lambda's unique design does not provide the developer any insight into the number of containers that are currently being invoked, and scale-out is transparent as demand calls for it. Therefore, we needed a mechanism for all nodes to identify one another in order to route messages to each other and participate in an anti-entropy session. To achieve this, we set a single topic on the AMQP broker for membership communication. On this specific channel, a single CRDT set containing all of the members of the cluster gets periodically broadcast to all nodes in the system. Upon receipt of this membership information, each node updates its local view of membership.
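The following sketch shows the shape of this workaround. It assumes a reachable AMQP broker and the amqplib Node.js client, and it simplifies the membership structure to a grow-only set; the broker URL and exchange name are placeholders, and the actual prototype performs this exchange from its embedded Erlang node.

```typescript
// Sketch of the outbound-only AMQP workaround (assumed stack: an external
// AMQP broker plus the amqplib client; details differ in the real prototype).
import * as amqp from "amqplib";

const BROKER_URL = process.env.BROKER_URL ?? "amqp://broker.example.com"; // hypothetical
const MEMBERSHIP_EXCHANGE = "membership"; // single shared topic for membership

// Local view of cluster membership, kept as a grow-only set so that
// broadcasts can be merged regardless of arrival order.
const membership = new Set<string>();
const nodeId = `replica-${process.pid}-${Date.now()}`;

export async function joinCluster(): Promise<void> {
  // Lambda cannot accept inbound connections, so every replica dials out.
  const connection = await amqp.connect(BROKER_URL);
  const channel = await connection.createChannel();

  await channel.assertExchange(MEMBERSHIP_EXCHANGE, "fanout", { durable: false });
  const { queue } = await channel.assertQueue("", { exclusive: true });
  await channel.bindQueue(queue, MEMBERSHIP_EXCHANGE, "");

  // Merge every membership broadcast into the local view (set union).
  await channel.consume(
    queue,
    (msg) => {
      if (msg === null) return;
      const peers: string[] = JSON.parse(msg.content.toString());
      for (const peer of peers) membership.add(peer);
    },
    { noAck: true }
  );

  // Periodically broadcast our view, including ourselves, to all peers.
  membership.add(nodeId);
  setInterval(() => {
    const payload = Buffer.from(JSON.stringify([...membership]));
    channel.publish(MEMBERSHIP_EXCHANGE, "", payload);
  }, 5_000);
}
```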
Per-invocation work. Keeping the instances alive long enough to perform anti-entropy is important for scaling the throughput of the system and ensuring no updates are lost. Each Lambda has a maximum invocation time of 300 seconds. Therefore, we ensure that a Lambda is invoked per-region faster than that interval, and after servicing a request, each invocation performs a compulsory anti-entropy session with its peers to populate incoming replica state and establish that existing replicas disseminated their state before termination. Scaling out the number of concurrent invocations only required increasing this interval per-region.

4.4 Lambda@Edge Challenges

Operating our key-value store on Lambda was the necessary first step towards moving to Lambda@Edge, Amazon's extension of Lambda to CloudFront PoPs. Once running on Lambda@Edge, we should be able to interpose on writes to and from the origin servers through CloudFront and, therefore, provide low-latency writes to clients, serviced from their local and closest PoP, rather than from their origin server.

In planning our migration to Lambda@Edge, we ran into a number of other complications.

Message brokering at the edge. Inter-replica communication at the PoP is important for keeping the anti-entropy sessions as efficient as possible and increasing system scalability. Since Lambda invocations cannot communicate with one another directly, Lambdas must communicate through their closest Amazon region and availability zone, inflating latency costs to the local availability zone for anti-entropy between instances. As nodes will communicate solely with their local broker, and therefore may not observe messages in the same order as other nodes in the system, our convergent CRDT-based model is essential.

Application code. Execution of application code at the edge enables our system to keep most of the processing local, within the PoP, and not rely on the DC. Therefore, our design assumes that users will be able to run a component of their application at the PoP.

4.5 Infrastructure Support

In hypothesizing which features would have allowed us to better construct our prototype, we would highlight the following, currently unsupported, features.

Instance visibility. Awareness and insight into what the other instances are executing at the edge would assist applications in concurrency control or data sharing.

Notification of termination. Giving applications a way to register and relay a notification before instance termination would allow instances to hand off data or tasks to other running instances in advance.

Local communication. Providing a message broker in the PoP or providing the ability to directly pass messages between different invocations would allow for lower latency communication between replicas.

4.6 Preliminary Evaluation

We set up an experiment to ensure that we had communication working between different Lambda invocations. Our experiment follows a simple workflow: a Lambda function is deployed that, when invoked by an HTTP event, performs an anti-entropy session with the other concurrently invoked replicas and then sleeps for 20 seconds before termination.
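A hypothetical TypeScript handler with the same shape as this workflow is sketched below: it services the event, performs an anti-entropy exchange (stubbed here), and then holds the invocation open for 20 seconds so that an overlapping request must be served by an additional concurrent container.

```typescript
// Hypothetical handler mirroring the experiment workflow in Section 4.6:
// perform anti-entropy with concurrently running replicas, then keep the
// invocation alive for 20 seconds so a subsequent request cannot be served
// by this (busy) container and must spawn a new concurrent invocation.

const HOLD_MILLISECONDS = 20_000;

// Placeholder for the prototype's anti-entropy exchange over AMQP.
async function antiEntropySession(): Promise<void> {
  // exchange CRDT state with peer replicas via the external broker
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

export async function handler(_event: unknown) {
  await antiEntropySession();     // populate incoming state, disseminate ours
  await sleep(HOLD_MILLISECONDS); // prohibit termination until 20 seconds elapse
  return { statusCode: 200, body: "ok" };
}
```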
Lambda overhead. Upon invoking a Lambda for the first time after deployment, we observed that it takes 7 seconds for the container to be deployed and the application code to be executed. Subsequent invocations of that Lambda, once paused after the initial invocation, occur after less than 1 second when measuring from request to application code. We did not have the ability to measure with more precise granularity using the API provided by Amazon. We also observed that instances usually stay paused from anywhere between 2 to 5 minutes after the last invocation before they are reclaimed and the next request is forced to pay the 7 second initialization cost.

Concurrent invocations. In order to have concurrent invocations of the same Lambda function, a request must be issued while an existing invocation is being processed; otherwise, the request will go to a paused instance. To achieve this, we set a timer inside of the application that prohibits termination of a request until 20 seconds have elapsed, providing ample time to issue a subsequent request that will guarantee additional invocations.

Function wrapper overhead. Erlang is not one of the directly supported platforms on Lambda. Therefore, following Amazon's recommendation, we wrap our application in a Node.JS wrapper that spawns it as a child process. This introduces additional overhead: communication from the outer Node.JS wrapper to the Erlang process adds an observed 5 to 20 milliseconds.

Inter-replica latency. We observed that the latency for sending 64-byte payloads between two Lambda invocations located in the same DC using our AMQP transport layer is roughly 2 to 3 milliseconds. We assume that if an AMQP provider were available in the Lambda@Edge setting, it would provide similar performance.
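The wrapper arrangement behind the "function wrapper overhead" measurement can be sketched as follows. The binary path, arguments, and line-based framing are hypothetical; the point is that the wrapper spawns the embedded runtime once per container and proxies each request over stdin/stdout, which is where the observed 5 to 20 milliseconds comes from.

```typescript
// Hedged sketch of the Node.JS wrapper pattern from Section 4.6: spawn the
// embedded runtime as a child process once per container and proxy requests
// to it over stdin/stdout. Binary name, arguments, and framing are
// illustrative; the paper does not specify them.
import { spawn, ChildProcessWithoutNullStreams } from "child_process";
import * as readline from "readline";

let child: ChildProcessWithoutNullStreams | undefined;
let lines: readline.Interface | undefined;

function ensureChild(): { proc: ChildProcessWithoutNullStreams; out: readline.Interface } {
  if (child === undefined || lines === undefined) {
    // Spawned once per container; reused across warm invocations.
    child = spawn("./erlang-release/bin/kvstore", ["foreground"]); // hypothetical path
    lines = readline.createInterface({ input: child.stdout });
  }
  return { proc: child, out: lines };
}

// Send one request line and resolve with the next response line. This
// stdin/stdout hop is the source of the observed 5 to 20 ms of overhead.
function roundTrip(request: string): Promise<string> {
  const { proc, out } = ensureChild();
  return new Promise((resolve) => {
    // Assumes one in-flight request at a time, as in a single Lambda invocation.
    out.once("line", (response: string) => resolve(response));
    proc.stdin.write(request + "\n");
  });
}

export async function handler(event: { command?: string }) {
  const reply = await roundTrip(JSON.stringify(event.command ?? "ping"));
  return { statusCode: 200, body: reply };
}
```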
5 Related Work

ExCamera [8] is a system for parallel video encoding developed on Lambda that uses a work-queue pattern for partitioning work across many concurrent invocations. ExCamera relies on the use of a rendezvous server for passing messages between concurrent invocations to work around the inter-replica communication problem. State in the ExCamera system is stored in Amazon S3, rather than in the instances themselves. Finally, ExCamera was primarily designed for work dispatch and as-fast-as-possible concurrent processing by a large number of Lambda instances, whereas our design focuses on transient state management at the edge to reduce real-time latency costs during traffic spikes.

CloudPath [12] is a system designed for path computing: a generalization of edge computing where nodes with different compute and storage capabilities are arranged hierarchically based on their proximity to the edge. Both state and processing are partitioned and replicated in the hierarchy, enabling application optimizations by moving processing closer to the end user.

Requiring more infrastructure support, each node in the hierarchy is a DC composed of application servers for hosting containerized stateless application code, servers for caching read operations, and a Cassandra cluster for data storage. Similarly, operations occur against local replicas and state is persisted at the root. Replicas are always bootstrapped by one or more ancestors in the hierarchy. Writes are timestamped and sequenced at the root, and concurrent operations are arbitrated by picking the largest timestamp: this may result in shipping operations that will be lost due to arbitration or are redundant.

6 Conclusion

Recent innovations in publicly available edge computing services, such as Amazon's Lambda@Edge, enable developers to run application code closer to end users, taking advantage of proximity resulting in both lower latency interactions and increased scalability. However, most of the services available only enable stateless computation at the edge or require that collaborative applications that operate on shared state use a data store located at one of the provider's data centers, thereby reducing the full potential of edge computing. This paper presents an alternative design for a system that enables these interactions at the edge.

Acknowledgements. This work is partially funded by the LightKone project in the European Union Horizon 2020 Framework Programme under grant agreement 732505 and by the Erasmus Mundus Doctorate Programme under grant agreement 2012-0030.

References

[1] AKAMAI TECHNOLOGIES. Akamai: Cloudlets. https://cloudlets.akamai.com/. Accessed: 2018-03-14.

[2] AMAZON WEB SERVICES. AWS Lambda. https://aws.amazon.com/lambda/. Accessed: 2018-03-14.

[3] BURCKHARDT, S., FÄHNDRICH, M., LEIJEN, D., AND WOOD, B. P. Cloud types for eventual consistency. In Proceedings of the 26th European Conference on Object-Oriented Programming (Berlin, Heidelberg, 2012), ECOOP'12, Springer-Verlag, pp. 283–307.

[4] DEMERS, A., GREENE, D., HAUSER, C., IRISH, W., LARSON, J., SHENKER, S., STURGIS, H., SWINEHART, D., AND TERRY, D. Epidemic algorithms for replicated database maintenance. In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing (1987), ACM, pp. 1–12.

[5] ELLIS, C. A., AND GIBBS, S. J. Concurrency control in groupware systems. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (New York, NY, USA, 1989), SIGMOD '89, ACM, pp. 399–407.

[6] FANDOM. Game of Thrones Wiki. http://gameofthrones.wikia.com/wiki/Game_of_Thrones_Wiki. Accessed: 2018-03-19.

[7] FASTLY. Fastly Edge SDK. https://www.fastly.com/products/edge-sdk. Accessed: 2018-03-14.

[8] FOULADI, S., WAHBY, R. S., SHACKLETT, B., BALASUBRAMANIAM, K., ZENG, W., BHALERAO, R., SIVARAMAN, A., PORTER, G., AND WINSTEIN, K. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In NSDI (2017), pp. 363–376.

[9] HOME BOX OFFICE. Official website for the HBO series Game of Thrones. https://www.hbo.com/game-of-thrones. Accessed: 2018-03-19.

[10] MEIKLEJOHN, C., AND VAN ROY, P. Lasp: A language for distributed, coordination-free programming. In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming (2015), ACM, pp. 184–195.

[11] MEIKLEJOHN, C. S., ENES, V., YOO, J., BAQUERO, C., VAN ROY, P., AND BIENIUSA, A. Practical evaluation of the Lasp programming model at large scale: An experience report. In Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming (New York, NY, USA, 2017), PPDP '17, ACM, pp. 109–114.

[12] MORTAZAVI, S. H., SALEHE, M., GOMES, C. S., PHILLIPS, C., AND DE LARA, E. CloudPath: A multi-tier cloud computing framework. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing (New York, NY, USA, 2017), SEC '17, ACM, pp. 20:1–20:13.

[13] SHAPIRO, M., PREGUIÇA, N., BAQUERO, C., AND ZAWIRSKI, M. Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems (2011), Springer, pp. 386–400.