Programming Clouds
James Larus
Microsoft Research
One Microsoft Way
Redmond, WA 98052
[email protected]

Abstract. Cloud computing provides a platform for new software applications that run across a large collection of physically separate computers and free computation from the computer in front of a user. Distributed computing is not new, but the commodification of its hardware platform—along with ubiquitous networking; powerful mobile devices; and inexpensive, embeddable, networkable computers—heralds a revolution comparable to the PC. Software development for the cloud offers many new (and some old) challenges that are central to research in programming models, languages, and tools. The language and tools community should embrace this new world as a fertile source of new challenges and opportunities to advance the state of the art.

Keywords: cloud computing, programming languages, software tools, optimization, concurrency, parallelism, distributed systems.

1 Introduction

As I write this paper, cloud computing is a hot new trend in computing. By the time you read it, the bloom may be off this rose, and with a sense of disillusionment at yet another overhyped fad, popular enthusiasm may have moved on to the next great idea. Nevertheless, it is worth taking a close look at cloud computing, as it represents a fundamental break in software development that poses enormous challenges for programming languages and tools.

Cloud computing extends far beyond the utility computing services offered by Amazon's AWS, Microsoft's Azure, or Google's AppEngine. These services provide a foundation for cloud computing by supplying on-demand, internet computing resources on a vast scale and at low cost. Far more significant, however, is the software model this hardware platform enables: one in which software applications are executed across a large collection of physically separate computers and computation is no longer limited to the computer in front of you.
Distributed computing is not new, but the commodification of its hardware platform—along with ubiquitous networking; powerful mobile devices; and inexpensive, embeddable, networkable computers—may bring about a revolution comparable to the PC.

R. Gupta (Ed.): CC 2010, LNCS 6011, pp. 1–9, 2010. © Springer-Verlag Berlin Heidelberg 2010

Programming the cloud is not easy. The underlying hardware platform of clusters of networked parallel computers is familiar, but not well supported by programming models, languages, or tools. In particular, concurrency, parallelism, distribution, and availability are long-established research areas in which progress and consensus have been slow and painful. As cloud computing becomes prevalent, it is increasingly imperative to refine existing programming solutions and investigate new approaches to constructing robust, reliable software.

The languages and tools community has a central role to play in the success of cloud computing. Below is a brief and partial list of areas that could benefit from further research and development. The discussion is full of broad generalizations, so if I malign or ignore your favorite language or your research, excuse me in advance.

1. Concurrency. Cloud computing is inherently concurrent and asynchronous: autonomous processes interact by exchanging messages. This architecture gives rise to two forms of concurrency within a process:

• The first, similar to an operating system, provides control flow to respond to inherently unordered events.
• The second, similar to a web server, supports processing of independent streams of requests.

Neither use of concurrency is well supported by programming models or languages. There is a long-standing debate between proponents of threads and event handling [1-3] as to which model best supports concurrency.
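The two forms of concurrency above can be sketched concretely. The following is a minimal illustration in Python, not taken from the paper (the names `run_event_loop` and `serve` are mine): an explicit dispatch loop for unordered events, and a thread-per-request loop for independent request streams.

```python
import queue
import threading

# --- Form 1: event-handler style ------------------------------------
# A single dispatch loop reacts to inherently unordered events, much as
# an operating system does. State shared by handlers lives outside any
# one handler and must be managed explicitly.

def run_event_loop(events, handlers, state):
    """Dispatch (name, payload) events to handlers until a None sentinel."""
    while True:
        ev = events.get()
        if ev is None:
            return state
        name, payload = ev
        handlers[name](state, payload)

# --- Form 2: web-server style ---------------------------------------
# Each independent request gets its own thread of control, so the
# per-request logic reads as ordinary sequential code.

def serve(requests, results, work):
    """Run `work` on each request in its own thread until a None sentinel."""
    workers = []
    while True:
        req = requests.get()
        if req is None:
            break
        t = threading.Thread(target=lambda r=req: results.put(work(r)))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()

if __name__ == "__main__":
    # Event-handler form: accumulate unordered "add" events.
    events = queue.Queue()
    for n in (1, 2, 3):
        events.put(("add", n))
    events.put(None)
    handlers = {"add": lambda s, n: s.update(total=s["total"] + n)}
    print(run_event_loop(events, handlers, {"total": 0}))  # → {'total': 6}

    # Web-server form: double each request independently.
    requests, results = queue.Queue(), queue.Queue()
    for r in (10, 20):
        requests.put(r)
    requests.put(None)
    serve(requests, results, work=lambda r: r * 2)
    print(sorted(results.get() for _ in range(2)))  # → [20, 40]
```

The sketch makes the trade-off in the debate visible: the event loop keeps handler state explicit and external, while the per-request threads let each request read as straight-line code at the cost of one thread of control per request.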
Threads are close to a familiar, sequential programming model, but concurrency still necessitates synchronization to avoid unexpected state changes in the midst of an apparently sequential computation. Moreover, the high overhead of a thread and the cost of context switching limit concurrency and constrain system architectures.

Event handlers, on the other hand, offer low overhead and feel more closely tied to the underlying events. However, handlers provide little program structure and scale poorly to large systems. They also require developers to explicitly manage program state. Other models, such as state machines or Actors, have not yet made their way into a general-purpose programming language.

2. Parallelism. Cloud computing runs on parallel computers, both on the client and the server. Parallelism is currently the dominant approach to increasing processor performance without exceeding power-dissipation limits [4]. Future processors are likely to become more heterogeneous, as specialized functional units greatly increase performance or reduce power consumption for specific tasks.

Parallelism, unfortunately, is a long-standing challenge for computer science. Despite four decades of experience with parallel computers, we have not yet reached consensus on the underlying models and semantics or provided adequate programming languages and tools. For most developers, shared-memory parallel programs are still written in the assembly language of threads and explicit synchronization. Not surprisingly, parallel programming is difficult, slow, and error-prone, and it will be a major impediment to developing high-performance cloud applications.

The past few years have seen promising research on new, higher-level parallel programming models, such as transactional memory and deterministic execution [5, 6]. Neither is a panacea, but both abstractions could hide some complexities of parallelism.

3. Message passing.
The alternative to shared-memory parallel programming is message passing, which is ubiquitous on the large clusters used in scientific and technical computing. Because of its intrinsic advantages, message passing will be the primary parallel programming model for cloud computing as well. It scales across very large numbers of machines and is well suited to distributed systems with long communication latencies. Equally important, message passing is a better programming model than shared memory, as it provides inherent performance and correctness isolation with clearly identified points of interaction. Both aspects contribute to more secure and robust software systems [7].

Message passing can be more difficult to program than shared memory, in large measure because it is not directly supported by many programming languages. Message-passing libraries offer an inadequate interface between the asynchronous world of messages and the synchronous control flow of procedure calls and returns. A few languages, such as Erlang, integrate messages into existing language constructs such as pattern matching [8], but full support for messages requires communication contracts, such as those in Sing# [9], and tighter integration with the type system and memory model.

4. Distribution. Distributed systems are a well-studied area with proven solutions for difficult problems such as replication, consistency, and quorum. This field has focused considerable effort on understanding the fundamental problems and on formulating efficient solutions. One challenge is integrating these techniques into a mainstream programming model. Should they reside in libraries, where developers need to invoke operations at appropriate points, or can they be better integrated into a language, so developers can state properties of their code and the run-time system can ensure correct execution?

5. High availability.
The cloud end of cloud computing provides services potentially used by millions of clients, and these services must be highly available. Failures of systems used by millions of people are noteworthy events widely reported by the media. And, as these services become integrated into the fabric of everyday life, they become part of the infrastructure that people depend on for their businesses, activities, and safety.

High availability is not the same as high reliability, the focus of much research on detecting and eliminating software bugs. A reliable system that runs slowly under heavy load may fail to provide a necessary level of service. Conversely, components of a highly available system can fail frequently, but a properly architected system will continue to provide adequate levels of service [10].

Availability starts at the architectural level of the system, but programming languages have an important role to play in the implementation. Existing languages provide little support for systematically handling unexpected and erroneous conditions beyond exceptions, which are notoriously difficult to use properly [11]. Error handling is complex and delicate code that runs when program invariants are violated, but it is often written as an afterthought and rarely thoroughly tested. Better language support, for example lightweight, non-isolated transactions, could help developers handle and recover from errors [12].

6. Performance. Performance is primarily a system-level concern in cloud computing. Many performance problems involve shared resources running across large numbers of computers and complex networks. Few techniques exist to analyze a design or system in advance, to understand bottlenecks or predict performance. As a consequence, current practice is to build, overprovision, measure, tweak, and pray. One pervasive