From the Outside Looking In: Probing Web APIs to Build Detailed Workload Profiles

Nan Deng, Zichen Xu, Christopher Stewart and Xiaorui Wang
The Ohio State University

Abstract

Cloud applications depend on third party services for features ranging from networked storage to maps. Web-based application programming interfaces (web APIs) make it easy to use these third party services but hide details about their structure and resource needs. Due to this lack of implementation-level knowledge, cloud applications have little information when a third party service breaks or is improperly implemented. This paper outlines research to extract workload details from data collected by probing web APIs. The resulting workload profiles will provide early warning signs when a web API has a broken component. Such information could be used to build feedback loops that cope with the high response times of slow web APIs, and it will also help developers choose between competing web APIs. The challenge is to extract profiles by assuming only that the systems underlying web APIs use common cloud computing practices, e.g., auto scaling. In early results, we used blind source separation to extract per-tier delays in multi-tier storage services from response times collected by API probes. We modeled median and 95th percentile delay within 10% error at each tier. Finally, we set up two competing storage services, one of which used a slow key-value store. We probed their APIs and used our profiles to choose between the two. We showed that looking at response times alone could lead to the wrong choice and that detailed workload profiles provided helpful data.

1 Introduction

Cloud applications enrich their core content by using services from outside, third party providers. Web application programming interfaces (web APIs) enable such interaction, allowing providers to define and publish protocols to access their underlying systems. It is now common for cloud applications to use 7 to 25 APIs for features ranging from storage to maps to social networking [13]. For providers, web APIs strengthen brand and broaden user base without the cost of programming new features. In 2013, the Programmable Web API index grew by 32% [7], indexing more than 11,000 APIs.

Web APIs hide the underlying system's structure and resource usage from cloud application developers, allowing API providers to manage resources as they see fit. For example, a storage API returns the same data whether the underlying system fetched the data from DRAM or disk. However, when API providers manage their resources poorly, applications that use their API suffer. Early versions of the Facebook API slowed one application's page load times by 75% [25]. When Facebook's API suffered downtime, hundreds of applications, including CNN and Gawker, went down as well [33]. While using a web API, developers would like to know if the underlying system is robust. That is, will the API provide fast response times during holiday seasons? How will its resource needs grow over time? Depending on the answers, developers may choose competing APIs or use the API sparingly [13, 33].

An API's workload profile describes its canonical resource needs and can be used to answer such what-if questions. Based on an API's recent profiles, cloud applications can adjust their behavior to either mask the high response times of slow APIs or take advantage of fast APIs. Prior research on workload profiling used 1) white box methods, e.g., changing the OS to trace request contexts across distributed nodes [22, 26], or 2) black box methods that inferred resource usage from logs [29]. Both approaches require data collected within the API provider's system but, as third parties, providers have strong incentives to release only favorable data about their service. Without trusted inside data, workload profiles must be forged by probing the API and collecting data outside of the underlying system (e.g., client-side observed response times).

In this paper, we propose research on creating workload profiles for web APIs. Taken by itself, data collected by probing web APIs under-constrains the wide range of systems that could produce such data. However, when we combine that data with the constraints imposed by common cloud computing practices, we can create usable and accurate workload profiles. One practice that we have exploited is auto scaling, which bounds queuing delays and makes processing time a key factor affecting observed response times. In early work, we have found success profiling processing times with blind source separation methods. Specifically, we used observed response times as input to independent component analysis (ICA) and extracted normalized processing times in multi-tier systems. These per-tier distributions are our workload profiles.

We validated our profiles with a multi-tier storage service. We used CPU usage thresholds to scale out a Redis cache and a database on demand. Our profiles captured 50th, 75th and 95th percentile service times within 10% of direct measurements. We then showed that our profiles can help developers choose between competing APIs by setting up two storage services. One used Apache ZooKeeper as a cache instead of Redis, a mistake reported in online forums [6, 8]; ZooKeeper is a poor choice for an object cache because it fully replicates content on all nodes. We lowered the request arrival rate for the service with the ZooKeeper cache such that our API probes observed lower average and 95th percentile response times compared to the other service. These response times could be misleading because the service that used Redis was more robust to increased request rates. Fortunately, our workload profiles revealed a warning sign: tier-1 processing times on the service using ZooKeeper had larger variance than expected. This signaled that too many resources, i.e., not just DRAM on a single node, were involved in processing.

The remainder of this paper is arranged as follows: We discuss the cloud computing practices that make web API profiling tractable in Section 2. We make the case for blind source separation methods in Section 3 and then present promising early results with ICA in Section 4. Related work on workload profiling is covered in Section 5. We conclude by discussing future directions for the proposed research.

2 The Cloud Constrains Workloads

Salaries for programmers and system managers can make up 20% of an application's total expenses [32]. Web APIs offer value by providing new features without using costly programmer time. However, slow APIs can drive away customers. Shopping carts abandoned due to slow response times cost $3B annually [14]. Web APIs that increase response times can hurt revenues more than they reduce costs. Developers could use response times measured by probing the API to assess the API's value. However, response times reflect current usage patterns; if request rates or mixes change, response times may change a lot. The challenge for our research is to extract profiles that apply to a wide range of usage patterns.

A key insight is that common cloud computing practices constrain a web API's underlying systems. Web APIs hosted on the cloud implicitly release data about their system design. In this section, we describe paradigms widely accepted as best practices in cloud computing; they constrain underlying system structures and resource usage enough to extract usable workload profiles.

Tiered Design: The systems that power web APIs must support concurrent requests. They use distributed and tiered systems in which each request traverses a few nodes across multiple tiers (a tier is a software platform, e.g., Apache Httpd) and tiers spread across many nodes. Client-side observed response times are mixtures of per-tier delays. Multiple tiers confound response times since relatively slow tiers can be masked by other tiers, hiding the effect of the slow tier on response time [30]. In the cloud, tiers are divided by Linux processes, containers or virtual machines, so each tier's resource usage can be tracked independently.

Auto Scaling: APIs hosted in the cloud can add and remove resources on demand. Such auto scaling reduces variability in queuing delay, i.e., the portion of response time spent waiting for access to resources at each tier. Since per-tier delays and their variance can be reduced by auto scaling [17, 18, 20], it can further reduce the visibility of a poorly implemented component to outsiders. Meanwhile, the stability of per-tier delays under auto scaling [17] gives users the opportunity to collect and analyze more consistent response times with less concern about changes in the per-tier delay distributions.

Make the Common Case Fast: To keep response times low, API providers trim request data paths. In the common case, a request touches as few nodes and resources as possible, with each tier performing only operations that affect the request's output. Well implemented APIs make the common case as fast as possible and uncommon cases rare. This design philosophy skews processing times; imbalanced processing time distributions are inherently non-Gaussian.
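This skew can be checked directly on probe data. The sketch below is a minimal illustration rather than part of the original study: it computes skewness and excess kurtosis of client-side response times, and the 0.5 thresholds are arbitrary values chosen for the example. Non-Gaussianity is also the property that the blind source separation methods in Section 3 rely on.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def looks_non_gaussian(response_times, skew_thresh=0.5, kurt_thresh=0.5):
    """Return True if the sample is visibly skewed or heavy-tailed
    (positive excess kurtosis), as "common case fast" designs tend to be."""
    rt = np.asarray(response_times, dtype=float)
    return abs(skew(rt)) > skew_thresh or kurtosis(rt) > kurt_thresh

# Synthetic heavy-tailed sample standing in for probe response times.
sample = np.random.lognormal(mean=0.0, sigma=1.0, size=5000)
print(looks_non_gaussian(sample))   # expected: True
```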

Alternative Research Directions: Our research treats data sharing across administrative domains as a fundamental challenge. An alternative approach would enable data sharing by building trusted data collection and dissemination platforms. Developers would prefer APIs hosted on such platforms, and robust APIs would be used most often. The challenge would be enticing API providers to use the platform. Another approach would have API providers support service level agreements with punitive consequences for poor performance. We believe that approaches based on inferring unknown workload profiles, enabling data sharing, or enriching SLAs all provide solid research directions.

3 Blind Source Separation

Blind Source Separation (BSS) describes statistical methods that 1) accept signals produced by mixing source signals as input, 2) place few constraints on the way source signals are mixed, and 3) output the source signals. Real world applications of BSS include magnetic resonance imaging, electrocardiography, telecommunications, and, famously, speech source separation. The most widely used BSS methods include independent component analysis (ICA), principal component analysis (PCA), and singular value decomposition (SVD), all of which are commonly taught in graduate courses [12, 23].

Workload profiling for web APIs aligns well with BSS. First, second- and third-order statistics can enrich first-order response times collected from the client; response times alone can mislead developers. Second, there is a wide range of BSS methods distinguished by their constraints on source signals and mixing methods. The research challenge is to figure out which BSS methods yield usable workload profiles, not to devise new statistical methods, and the systems community can best answer this question. Finally, BSS methods can reach a wide range of web developers who may have encountered BSS during graduate studies or online courses. Given the cost savings from avoiding web APIs that perform poorly, developers will likely find it worthwhile to install BSS libraries, which have been written in many languages from MATLAB to Java to C.

3.1 Web API profiling using ICA

In early work, we have used ICA to profile per-tier delays. The input to ICA is a time signal, usually denoted as x. This input signal x is a linear transformation of all sources, i.e., x = As, where the mixing matrix A does not change over time. The output of ICA is s, i.e., the (normalized) source signals. The number of input signals must be greater than or equal to the number of source signals. The key theory behind ICA is the central limit theorem, which states that a signal created by summing two independent source signals is closer to a Gaussian distribution than either source signal, provided the source signals are not Gaussian.

Let us return to the constraints imposed by the cloud computing practices discussed in Section 2. Making the common case fast leads to imbalanced, non-Gaussian processing times. Auto scaling ensures that processing times, not queuing delays, are a key factor influencing response times [17]. Finally, tiered design suggests that a request's response time is a sum (x = As) across tiers. Below, we describe exactly how we used ICA to profile APIs from observed response times.

System Model: Users interact with web APIs by sending HTTP requests and receiving responses. The system underlying the API uses tiered design and auto scaling. Response times observed at the client side (i.e., by developers) are the sum of delays caused by repeated processing at each tier inside the API's backend systems. Formally, the delay of tier i can be considered a random variable s_i.

Recall that ICA requires at least as many input signals as source signals. To acquire multiple signals at each point in time, we concurrently probe the API with multiple request types. Requests are of the same type if their access frequencies are the same across each tier. This means that for request type j, the response time is x_j = a_j^T s, where s = (s_1, ..., s_N)^T is a vector of random variables, N is the number of tiers, and a_j is a constant vector specific to request type j. Intuitively, the weight vector a_j reflects the frequency with which each tier is called during request execution, and s_i reflects the tier's average delay [29]. Suppose there are M types of requests; each observation is the response time of a request of a certain type. The problem is then: given that we can collect an arbitrary number of observations, can we recover the per-tier delays (s_i) of the system?

ICA requires non-Gaussian and independently distributed source signals. It also depends on simultaneous observations of multiple mixtures. Cloud-based services meet these requirements. In the common case, OS and background jobs do not interfere with request executions, but when they do, they cause fat, non-Gaussian tails at each tier [28]. Also, per-tier delays are largely independent because different tiers usually run on separate virtual machines and are scaled independently. Finally, in most systems, average per-tier delays change on the order of minutes, not milliseconds. This fact helps us make simultaneous observations by issuing several requests of different types concurrently. These concurrent requests trigger roughly the same per-tier delays, which makes each observation a linear transformation of the per-tier delays. Using the notation defined above, an observation is a vector of response times x = As, where A = (a_1, ..., a_M)^T. By collecting a series of observations, we can apply ICA to these response times and recover the per-tier delay distributions.

Limitations: ICA recovers the shape of each source signal but not its energy. To predict response times, we would need to shift and scale the output. More generally, BSS methods provide less data, which limits the workload profiles. However, as we will soon show, normalized distributions suffice to identify some warning signs. Also, ICA does not match recovered distributions to tiers; we currently do this manually. With a library of expected per-tier distributions, we could use information gain or earth mover's distance (EMD) to automate this process.
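The recovery step described above can be sketched end to end. The following is a minimal simulation, not the code used in our experiments: the per-tier delay distributions, the mixing weights, and the two request types are synthetic stand-ins, and scikit-learn's FastICA plays the role of the ICA solver. As noted under Limitations, the recovered components come back unordered and unscaled.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_obs = 20000                                   # simultaneous probe rounds

# Hidden per-tier delays (independent, non-Gaussian), in milliseconds.
s_cache = rng.lognormal(mean=0.0, sigma=0.4, size=n_obs)    # tier 1: cache
s_db = 3.0 + rng.pareto(a=3.0, size=n_obs)                   # tier 2: database
S = np.column_stack([s_cache, s_db])

# Mixing matrix A: one row per request type, one column per tier.
# Request type 0 mostly hits the cache; type 1 misses and hits the DB twice.
A = np.array([[1.0, 0.1],
              [1.0, 2.0]])

# Each probe round observes one response time per request type: x = A s.
X = S @ A.T

# Recover the (normalized) per-tier delay signals from response times alone.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)        # columns are unordered and unscaled

print("estimated mixing matrix (up to permutation and scale):")
print(ica.mixing_)
```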

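Matching the recovered components to tiers, the step that remains manual above, could be automated roughly as sketched below. The reference library of expected normalized per-tier delay samples is hypothetical, and SciPy's one-dimensional Wasserstein distance stands in for the EMD; this is one plausible realization, not a finished design.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def match_components_to_tiers(components, library):
    """Greedily assign each recovered component to the tier whose expected
    (normalized) delay distribution is closest in earth mover's distance.

    components: dict of name -> 1-D sample of a recovered component
    library:    dict of tier name -> 1-D sample of the expected delay shape
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)   # map to [0, 1]

    assignment, free_tiers = {}, set(library)
    for comp_name, comp in components.items():
        tier = min(free_tiers,
                   key=lambda t: wasserstein_distance(normalize(comp),
                                                      normalize(library[t])))
        assignment[comp_name] = tier
        free_tiers.remove(tier)
    return assignment

# Hypothetical usage with the ICA output from the sketch above:
# match_components_to_tiers({"c0": S_hat[:, 0], "c1": S_hat[:, 1]},
#                           {"cache": expected_cache, "db": expected_db})
```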
4 Preliminary Results

To validate our approach, we built a distributed key-value storage service consisting of two tiers: a cache tier, which uses Redis [4] as an in-memory cache, and a database tier, which uses MySQL. The cache tier consists of multiple instances and is automatically scaled based on the workload. Users access the system through a web API that handles get and put requests. Inside the system, we monitor per-tier delays and use them as the ground truth against which to compare our estimates. Figure 1 shows the architecture of the system. Each replicated component runs in its own virtual machine. The virtual machines run on a 112-core cluster; each core has at least 2.0 GHz, 3MB of L2 cache, 2GB of DRAM, and 100GB of secondary storage. When probing the API, we manually set the cache miss rate.

Figure 1: Components and datapath for a scalable email service hosted in the cloud.

4.1 Per-tier delay distribution recovery

As mentioned in previous sections, we would like to recover the per-tier delay distributions in an API's backend. We first test our technique against a vanilla system with no other users. We probe the API by sending requests every second; the API serves only our probing requests. We collect response times after 100 seconds and run ICA to estimate per-tier delay distributions based only on these response times. We also collected the actual delay at each tier from the Redis and MySQL logs.

Figure 2 shows the actual and estimated cumulative distribution functions (CDFs) of the normalized per-tier delays. Since no other users are using the API, the variation of delays in each tier is small. Our approach estimates the distributions precisely, with error less than 3%. Next, we ran a background workload in addition to the API probes, using httperf to simulate 600 users performing 50% reads and 50% writes. The background workload increased the tail, but our per-tier delay estimates were still within 5%.

Figure 2: CDFs of observed and estimated delays of each tier, with no other workload on the system. (a) Cache tier delays. (b) Database tier delays.

4.2 Impact of Real Workloads

We repeat the experiment from Section 4.1 but run a real workload against the API to examine the performance of our approach. We simulate one day of the WorldCup98 trace [1] (day 78) and probe the API 100 times while it serves the workload. Figure 3 shows that both tiers exhibit even heavier tails because of the variation in request rates over the day. Even though the variation of per-tier delays becomes larger, the recovered distributions still follow the actual distributions precisely, with errors within 5%.

Figure 3: CDFs of observed and estimated delays of each tier while the system serves one day of the WorldCup trace. (a) Cache tier delays. (b) Database tier delays.
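The percentile errors quoted above can be computed with a few lines of NumPy. The sketch below is illustrative rather than the exact evaluation script: it assumes the recovered component and the logged per-tier delays have both been normalized to [0, 1], and the variable names in the commented usage are hypothetical.

```python
import numpy as np

def percentile_error(estimated, measured, pcts=(50, 75, 95)):
    """Relative error between estimated and directly measured delays
    at selected percentiles, after normalizing both samples to [0, 1]."""
    def normalize(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    est, ref = normalize(estimated), normalize(measured)
    return {p: abs(np.percentile(est, p) - np.percentile(ref, p))
               / max(np.percentile(ref, p), 1e-12)
            for p in pcts}

# Hypothetical usage with an ICA component and Redis log delays:
# print(percentile_error(S_hat[:, cache_idx], cache_log_delays))
```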

4.3 Choosing between competing APIs

Our ICA-based approach can give users more information about the backend implementation of a target API. When users must choose between two competing APIs, our approach provides insight into the underlying systems that can guide the choice.

To create a poorly implemented key-value storage service, we replaced the Redis instances in the cache tier of our experimental system with a ZooKeeper [2] cluster. ZooKeeper is a well-known centralized coordination system that supports high availability through redundant replicas. It keeps strong consistency using a Paxos-like protocol, which fully replicates content on all nodes. It is a good fit for small, important configuration data but a bad choice for a cache.

We run two competing key-value storage services, one using Redis as the cache and the other using ZooKeeper. We adjust the request rates for both services so that their mean and median response times are similar. For both services, we simulate concurrent user requests at a fixed rate: the Redis setup receives 100 concurrent requests every 500ms, while the ZooKeeper setup receives 50 concurrent requests every 500ms. Although ZooKeeper has unstable and slow service times, the lower request rate and the tier-2 delays make the ZooKeeper setup look almost the same as, or even better than, the Redis setup. The mean and median response times of the ZooKeeper setup are 3.8ms and 2.7ms, while those of the Redis setup are 4.5ms and 3.1ms. By these metrics, the ZooKeeper setup even appears to outperform the Redis setup. Fortunately, our profiles revealed the poorly implemented service.

Figure 4 shows the observed and estimated cumulative distribution functions of the cache tier for the two services. Even though the database tier and the difference in request rates masked the unstable component in the ZooKeeper setup, our technique still accurately recovers the per-tier delay distributions. The graph clearly shows that the ZooKeeper setup's cache tier has a fatter tail, while the Redis setup's cache tier is relatively stable with little variation. When we further increased the request rate for the ZooKeeper setup to match the Redis setup's, its mean and median response times quickly rose to 8.6ms and 7.5ms.

Figure 4: CDFs of observed and estimated delays of the cache tiers. (a) Redis setup. (b) ZooKeeper setup.
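A warning sign like the one above can be reduced to a simple tail statistic on the recovered cache-tier components. The sketch below is only illustrative: the tail-spread measure and the 1.5x gap threshold are choices made for this example, not values from our experiments, and the inputs in the commented usage are hypothetical.

```python
import numpy as np

def tail_spread(normalized_delays):
    """Ratio of the 95th percentile to the median of a recovered,
    normalized per-tier delay sample (shifted to be strictly positive)."""
    d = np.asarray(normalized_delays, dtype=float)
    d = d - d.min() + 1e-12
    return np.percentile(d, 95) / np.percentile(d, 50)

def fatter_tail_warning(cache_a, cache_b, ratio_gap=1.5):
    """Warn if one API's cache tier is markedly more variable than the
    other's, even when raw response times look comparable."""
    a, b = tail_spread(cache_a), tail_spread(cache_b)
    if a > ratio_gap * b:
        return "warning: API A's cache tier shows a fat tail"
    if b > ratio_gap * a:
        return "warning: API B's cache tier shows a fat tail"
    return "no clear warning sign"

# Hypothetical usage with recovered cache-tier components of both services:
# print(fatter_tail_warning(redis_cache_component, zookeeper_cache_component))
```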

5 Related Work

Workload profiling approaches differ according to their outputs, inputs, and targeted systems. Our research uses response time data collected as an outsider by probing web APIs. Prior work has been more invasive. PowerTracer [22] and Power Containers [26, 27] mapped local events to request contexts even when request executions moved across nodes. These low-level event traces were combined to produce diverse workload profiles, ranging from per-node system call counts to per-tier energy efficiency. Events were collected using modified kernels. These approaches targeted services within a single administrative domain where trusted code bases could be changed. Magpie [10] and X-Trace [15] also used modified kernels to collect events, but events were not automatically linked to request contexts; the system manager linked them manually. As a result, these approaches could span multiple domains, provided events can be linked. Instead of changing source code, Mantis [21] and ConfAid [9] modified application binaries to collect events, e.g., loop, branch, and method call counts.

Many domains prohibit code changes. For these systems, recent approaches have used aggregate CPU, network, and disk usage statistics collected by vanilla monitoring tools. Offline approaches measure these statistics under specific request arrival patterns [34], whereas online approaches passively collect statistics as traffic arrives [16, 29]. Generally, profiles produced by offline approaches extrapolate to a wide range of request patterns, but online approaches are supported in more domains. Hybrid approaches balance coverage with practicality [20, 31]. In cloud services, some statistics are amenable to automatic resource provisioning; specifically, resource pressure [24] and queue length [18] work well with threshold-based auto scaling.

6 Discussion and Conclusion

Web APIs are surging because the RPC paradigm aligns well with cloud computing trends: first, large datasets stay in one place, and second, growing network bandwidth leads to increased throughput. While traditional services underlie web APIs today, BSS methods will profile data-parallel services in the future. Our ICA-based approach can profile MapReduce, capturing worst-case service times for the map and reduce phases. However, iterative data-parallel platforms, like Spark [5], present challenges. Emerging workloads that exhibit highly diverse behaviors within request types because of time-varying demands also present challenges [3, 11, 19].

BSS has a strong track record in practice. A key next step for our work is to apply BSS methods to real web APIs. The challenge is to uncover more warning signs, preferably non-parametric signs that can be identified directly from profiles. Another challenge in working with real web APIs is probing overhead: web APIs enforce strict rules about the frequency and types of API access, e.g., 2 accesses per second per user [7].

Our current research closes the loop between cloud applications and web APIs. By providing more meaningful profiles of APIs, it will let cloud applications control their internal systems more effectively. It is worthwhile to explore how to build robust cloud applications using API profiles recovered by BSS. For example, a cloud application may dynamically dispatch requests to different APIs based on their profiles to avoid an API's busy hours.

In conclusion, we proposed research on profiling third party web APIs using BSS techniques. Using data collected outside of an API provider's system, we are able to "look in" at detailed workload profiles. In early results, we used ICA to recover accurate profiles. We also showed that our workload profiles were helpful, providing insight into the design of the tested services.

References

[1] 1998 World Cup workload. http://ita.ee.lbl.gov/html/contrib/WorldCup.html.
[2] Apache ZooKeeper. http://zookeeper.apache.org/.
[3] Carbon-aware energy capacity planning for datacenters.
[4] Redis. http://redis.io/.
[5] Spark: Cluster computing with working sets.
[6] Stack Overflow question 10986702: Is ZooKeeper appropriate for object caching? http://stackoverflow.com, 2012.
[7] Programmable Web: Mashups, APIs and the web as a platform. http://www.programmableweb.com, 2013.
[8] Stack Overflow question 1479442: Real world use of ZooKeeper. http://stackoverflow.com, 2013.
[9] M. Attariyan and J. Flinn. Automating configuration troubleshooting with dynamic information flow analysis. In USENIX OSDI, 2010.
[10] P. Barham, A. Donnelly, R. Isaacs, and R. Mortier. Using Magpie for request extraction and workload modelling. In OSDI, 2004.
[11] M. Leventi, C. Stewart, and K. Shen. Empirical examination of a collaborative web application. In IEEE International Symposium on Workload Characterization, Sept. 2008.
[12] P. Comon and C. Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 1st edition, 2010.
[13] T. Everts. An 11-step program to bulletproof your site against third-party failure. http://blog.radware.com/, 2013.
[14] T. Everts. Case study: Understanding the impact of slow load times on shopping cart abandonment. http://blog.radware.com/, 2013.
[15] R. Fonseca, G. Porter, R. H. Katz, S. Shenker, and I. Stoica. X-Trace: A pervasive network tracing framework. In Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation, 2007.
[16] A. Gandhi, Y. Chen, D. Gmach, M. Arlitt, and M. Marwah. Minimizing SLA violations and power consumption via hybrid resource provisioning. In IGCC, 2011.
[17] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf. Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward. In ACM SIGMETRICS, 2013.
[18] A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. A. Kozuch. AutoScale: Dynamic, robust capacity management for multi-tier data centers. ACM Transactions on Computer Systems (TOCS), 30(4):14, 2012.
[19] Í. Goiri, W. Katsak, K. Le, T. D. Nguyen, and R. Bianchini. Parasol and GreenSwitch: Managing datacenters powered by renewable energy. Mar. 2013.
[20] X. Gu and H. Wang. Online anomaly prediction for robust cluster systems. In IEEE International Conference on Data Engineering (ICDE), pages 1000–1011, 2009.
[21] Y. Kwon, S. Lee, H. Yi, D. Kwon, S. Yang, B.-G. Chun, L. Huang, P. Maniatis, M. Naik, and Y. Paek. Mantis: Automatic performance prediction for smartphone applications. In USENIX Annual Technical Conference, 2013.
[22] G. Lu, J. Zhan, H. Wang, L. Yuan, and C. Weng. PowerTracer: Tracing requests in multi-tier services to diagnose energy inefficiency. In Proceedings of the 9th International Conference on Autonomic Computing, pages 97–102. ACM, 2012.
[23] D. J. C. MacKay. Information Theory, Inference & Learning Algorithms. Cambridge University Press, New York, NY, USA, 2002.
[24] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes. AGILE: Elastic distributed resource scaling for infrastructure-as-a-service. In Proceedings of the USENIX International Conference on Autonomic Computing (ICAC '13), San Jose, CA, 2013.
[25] A. Peters. Why loading third party scripts async is not good enough. http://www.aaronpeters.nl/, 2011.
[26] K. Shen, A. Shriraman, S. Dwarkadas, X. Zhang, and Z. Chen. Power Containers: An OS facility for fine-grained power and energy management on multicore servers. In ACM ASPLOS, 2012.
[27] K. Shen, M. Zhong, S. Dwarkadas, C. Li, C. Stewart, and X. Zhang. Hardware counter driven on-the-fly request signatures. In ACM ASPLOS, Mar. 2008.
[28] C. Stewart, A. Chakrabarti, and R. Griffith. Zoolander: Efficiently meeting very strict, low-latency SLOs. In International Conference on Autonomic Computing, 2013.
[29] C. Stewart, T. Kelly, and A. Zhang. Exploiting nonstationarity for performance prediction. In ACM European Systems Conference, Mar. 2007.
[30] C. Stewart, K. Shen, A. Iyengar, and J. Yin. EntomoModel: Understanding and avoiding performance anomaly manifestations. In IEEE MASCOTS, 2010.
[31] E. Thereska and G. R. Ganger. IRONModel: Robust performance models in the wild. ACM SIGMETRICS Performance Evaluation Review, 36(1):253–264, 2008.
[32] TripAdvisor Inc. TripAdvisor reports fourth quarter and full year 2013 financial results, Feb. 2014.
[33] H. Tsukayama. Facebook outage takes down Gawker, Mashable, CNN and Post with it. http://21stcenturywire.com/, 2013.
[34] Z. Zhang, L. Cherkasova, A. Verma, and B. T. Loo. Automated profiling and resource management of Pig programs for meeting service level objectives. In Proceedings of the 9th International Conference on Autonomic Computing, pages 53–62. ACM, 2012.
