From Cloud Computing to Sky Computing Ion Stoica and Scott Shenker UC Berkeley
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Building a SAN-Less Private Cloud with IBM Powervm and IBM Powervc
Front cover Building a SAN-less Private Cloud with IBM PowerVM and IBM PowerVC Javier Bazan Lazcano Stephen Lutz Redpaper International Technical Support Organization Building a SAN-less Private Cloud with IBM PowerVM and IBM PowerVC July 2018 REDP-5455-02 Note: Before using this information and the product it supports, read the information in “Notices” on page v. Third Edition (July 2018) This edition applies to Version 1, Release 4, Modification 1 of IBM Cloud PowerVC Manager for Software-Defined Infrastructure V1.1 (product number 5765-VCD). © Copyright International Business Machines Corporation 2017, 2018. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . .v Trademarks . vi Preface . vii Authors. vii Now you can become a published author, too! . viii Comments welcome. viii Stay connected to IBM Redbooks . viii Chapter 1. Architecture of a software-defined infrastructure solution with IBM Power Systems servers and IBM PowerVC . 1 1.1 Software-defined infrastructure . 2 1.2 The motivation behind software-defined infrastructure . 2 1.3 Use cases . 4 1.3.1 Use case 1: Building a storage area network-less cloud . 4 1.3.2 Use case 2: Building an iSCSI-backed cloud . 5 1.3.3 More use cases. 5 1.4 Rack topologies and components . 6 1.4.1 Starter cloud . 7 1.4.2 Mini cloud . 8 1.4.3 Rack scale . 9 1.4.4 Storage/management switches. 10 1.4.5 Data switches . 10 1.4.6 Network nodes . -
Is 'Distributed' Worth It? Benchmarking Apache Spark with Mesos
Is `Distributed' worth it? Benchmarking Apache Spark with Mesos N. Satra (ns532) January 13, 2015 Abstract A lot of research focus lately has been on building bigger dis- tributed systems to handle `Big Data' problems. This paper exam- ines whether typical problems for web-scale companies really benefits from the parallelism offered by these systems, or can be handled by a single machine without synchronisation overheads. Repeated runs of a movie review sentiment analysis task in Apache Spark were carried out using Apache Mesos and Mesosphere, on Google Compute Engine clusters of various sizes. Only a marginal improvement in run-time was observed on a distributed system as opposed to a single node. 1 Introduction Research in high performance computing and `big data' has recently focussed on distributed computing. The reason is three-pronged: • The rapidly decreasing cost of commodity machines, compared to the slower decline in prices of high performance or supercomputers. • The availability of cloud environments like Amazon EC2 or Google Compute Engine that let users access large numbers of computers on- demand, without upfront investment. • The development of frameworks and tools that provide easy-to-use id- ioms for distributed computing and managing such large clusters of machines. Large corporations like Google or Yahoo are working on petabyte-scale problems which simply can't be handled by single computers [9]. However, 1 smaller companies and research teams with much more manageable sizes of data have jumped on the bandwagon, using the tools built by the larger com- panies, without always analysing the performance tradeoffs. It has reached the stage where researchers are suggesting using the same MapReduce idiom hammer on all problems, whether they are nails or not [7]. -
The Cloud Is Here: Embrace the Transition
Hybrid cloud SaaS IaaS Private cloud Public cloud PaaS The cloud is here: embrace the transition How organizations can stop worrying and learn to “think cloud” Clouds in the forecast 3 The cloud’s sunny side 4 Avoid common pitfalls 6 Doing cloud “right” 12 Follow the steps 16 Conclusion 21 Contacts 22 Endnotes 23 The cloud is here: embrace the transition | Introduction Clouds in the forecast The emergence of cloud computing has business and information technology (IT) leaders asking fundamental questions: How can we better understand the risks and opportunities that cloud computing presents? How do we take advantage of these opportunities, not fall behind, and not make costly mistakes? How do we survive and thrive in the cloud? The growth and maturation of the cloud Some notable examples of leading marketplace has not only proven the software deployed via cloud services viability of commercial cloud offerings include business productivity suites (such but also represents a fundamental shift as Google G Suite, Microsoft Office 365), in technology management philosophy. customer relationship management With traditional practices garnering limited (e.g., Salesforce.com), enterprise service return on investment and providing limited management (e.g., ServiceNow), talent business agility, organizations are moving management (e.g., SuccessFactors, Taleo), away from the do‑it‑yourself mentality and and even full suites of Enterprise Resource toward using technology services from Planning (ERP) solutions. This list is rapidly cloud services providers (CSPs), who can do expanding as software companies begin things better, faster, and often cheaper. the transition from delivering their software as on‑premise installations to software While many organizations are apprehensive as a service (SaaS) delivery models. -
Integrated High Definition LED Television User's Guide
Integrated High Definition LED Television User’s Guide: 32L4300U / 39L4300U / 50L4300U / 58L4300U 50L7300U / 58L7300U / 65L7300U If you need assistance: Toshiba's Support Web site support.toshiba.com For more information, see “Troubleshooting” on page 160 in this guide. Owner's Record The model number and serial number are on the back and side of your television. Record these numbers, whenever you communicate with your Toshiba dealer about this Television. Model name: Serial number: Register your Toshiba Television at register.toshiba.com Note: To display a High Definition picture, the TV must be receiving a High Definition signal (such as an over- the-air High Definition TV broadcast, a High Definition digital cable program, or a High Definition digital satellite program). For details, contact your TV antenna installer, cable provider, or GMA300019013 satellite provider 6/13 2 CHILD SAFETY: PROPER TELEVISION PLACEMENT MATTERS TOSHIBA CARES • Manufacturers, retailers and the rest of the consumer electronics industry are committed to making home entertainment safe and enjoyable. • As you enjoy your television, please note that all televisions – new and old- must be supported on proper stands or installed according to the manufacturer’s recommendations. Televisions that are inappropriately situated on dressers, bookcases, shelves, desks, speakers, chests, carts, etc., may fall over, resulting in injury. TUNE IN TO SAFETY • ALWAYS follow the manufacturer’s recommendations for the safe installation of your television. • ALWAYS read and follow all instructions for proper use of your television. • NEVER allow children to climb on or play on the television or the furniture on which the television is placed. • NEVER place the television on furniture that can easily be used as steps, such as a chest of drawers. -
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Haoyuan Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. 2014. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks. In Proceedings of the ACM Symposium on Cloud Computing (SOCC '14). ACM, New York, NY, USA, Article 6 , 15 pages. As Published http://dx.doi.org/10.1145/2670979.2670985 Publisher Association for Computing Machinery (ACM) Version Author's final manuscript Citable link http://hdl.handle.net/1721.1/101090 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/ Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks Haoyuan Li Ali Ghodsi Matei Zaharia Scott Shenker Ion Stoica University of California, Berkeley MIT, Databricks University of California, Berkeley fhaoyuan,[email protected] [email protected] fshenker,[email protected] Abstract Even replicating the data in memory can lead to a signifi- cant drop in the write performance, as both the latency and Tachyon is a distributed file system enabling reliable data throughput of the network are typically much worse than sharing at memory speed across cluster computing frame- that of local memory. works. While caching today improves read workloads, Slow writes can significantly hurt the performance of job writes are either network or disk bound, as replication is pipelines, where one job consumes the output of another. used for fault-tolerance. Tachyon eliminates this bottleneck These pipelines are regularly produced by workflow man- by pushing lineage, a well-known technique, into the storage agers such as Oozie [4] and Luigi [7], e.g., to perform data layer. -
The Eddie Awards Issue
THE MAGAZINE FOR FILM & TELEVISION EDITORS, ASSISTANTS & POST- PRODUCTION PROFESSIONALS THE EDDIE AWARDS ISSUE IN THIS ISSUE Golden Eddie Honoree GUILLERMO DEL TORO Career Achievement Honorees JERROLD L. LUDWIG, ACE and CRAIG MCKAY, ACE PLUS ALL THE WINNERS... FEATURING DUMBO HOW TO TRAIN YOUR DRAGON: THE HIDDEN WORLD AND MUCH MORE! US $8.95 / Canada $8.95 QTR 1 / 2019 / VOL 69 Veteran editor Lisa Zeno Churgin switched to Adobe Premiere Pro CC to cut Why this pro chose to switch e Old Man & the Gun. See how Adobe tools were crucial to her work ow and to Premiere Pro. how integration with other Adobe apps like A er E ects CC helped post-production go o without a hitch. adobe.com/go/stories © 2019 Adobe. All rights reserved. Adobe, the Adobe logo, Adobe Premiere, and A er E ects are either registered trademarks or trademarks of Adobe in the United States and/or other countries. All other trademarks are the property of their respective owners. Veteran editor Lisa Zeno Churgin switched to Adobe Premiere Pro CC to cut Why this pro chose to switch e Old Man & the Gun. See how Adobe tools were crucial to her work ow and to Premiere Pro. how integration with other Adobe apps like A er E ects CC helped post-production go o without a hitch. adobe.com/go/stories © 2019 Adobe. All rights reserved. Adobe, the Adobe logo, Adobe Premiere, and A er E ects are either registered trademarks or trademarks of Adobe in the United States and/or other countries. -
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Discretized Streams: Fault-Tolerant Streaming Computation at Scale" Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica University of California, Berkeley Published in SOSP ‘13 Slide 1/30 Motivation" • Faults and stragglers inevitable in large clusters running “big data” applications. • Streaming applications must recover from these quickly. • Current distributed streaming systems, including Storm, TimeStream, MapReduce Online provide fault recovery in an expensive manner. – Involves hot replication which requires 2x hardware or upstream backup which has long recovery time. Slide 2/30 Previous Methods" • Hot replication – two copies of each node, 2x hardware. – straggler will slow down both replicas. • Upstream backup – nodes buffer sent messages and replay them to new node. – stragglers are treated as failures resulting in long recovery step. • Conclusion : need for a system which overcomes these challenges Slide 3/30 • Voila ! D-Streams Slide 4/30 Computation Model" • Streaming computations treated as a series of deterministic batch computations on small time intervals. • Data received in each interval is stored reliably across the cluster to form input datatsets • At the end of each interval dataset is subjected to deterministic parallel operations and two things can happen – new dataset representing program output which is pushed out to stable storage – intermediate state stored as resilient distributed datasets (RDDs) Slide 5/30 D-Stream processing model" Slide 6/30 What are D-Streams ?" • sequence of immutable, partitioned datasets (RDDs) that can be acted on by deterministic transformations • transformations yield new D-Streams, and may create intermediate state in the form of RDDs • Example :- – pageViews = readStream("http://...", "1s") – ones = pageViews.map(event => (event.url, 1)) – counts = ones.runningReduce((a, b) => a + b) Slide 7/30 High-level overview of Spark Streaming system" Slide 8/30 Recovery" • D-Streams & RDDs track their lineage, that is, the graph of deterministic operations used to build them. -
The Design and Implementation of Declarative Networks
The Design and Implementation of Declarative Networks by Boon Thau Loo B.S. (University of California, Berkeley) 1999 M.S. (Stanford University) 2000 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY Committee in charge: Professor Joseph M. Hellerstein, Chair Professor Ion Stoica Professor John Chuang Fall 2006 The dissertation of Boon Thau Loo is approved: Professor Joseph M. Hellerstein, Chair Date Professor Ion Stoica Date Professor John Chuang Date University of California, Berkeley Fall 2006 The Design and Implementation of Declarative Networks Copyright c 2006 by Boon Thau Loo Abstract The Design and Implementation of Declarative Networks by Boon Thau Loo Doctor of Philosophy in Computer Science University of California, Berkeley Professor Joseph M. Hellerstein, Chair In this dissertation, we present the design and implementation of declarative networks. Declarative networking proposes the use of a declarative query language for specifying and implementing network protocols, and employs a dataflow framework at runtime for com- munication and maintenance of network state. The primary goal of declarative networking is to greatly simplify the process of specifying, implementing, deploying and evolving a network design. In addition, declarative networking serves as an important step towards an extensible, evolvable network architecture that can support flexible, secure and efficient deployment of new network protocols. Our main contributions are as follows. First, we formally define the Network Data- log (NDlog) language based on extensions to the Datalog recursive query language, and propose NDlog as a Domain Specific Language for programming network protocols. -
IBM Worklight V5.0.6 Technology Overview
IBM Software Technical White Paper WebSphere IBM Worklight V5.0.6 Technology overview IBM Worklight—Overview Contents IBM® Worklight® software helps enable organizational leaders to extend their business to mobile devices. This software provides an open, 1 IBM Worklight—Overview comprehensive and advanced mobile application platform for smart- phones and tablets, helping organizations of all sizes to efficiently 2 IBM Worklight—Components develop, connect, run and manage mobile and omni-channel applications. 3 Development tools Leveraging standards-based technologies and tools, the IBM team has created Worklight software that provides a single integrated platform. 8 Runtime server environment This platform includes a comprehensive development environment, 9 The IBM Worklight Console mobile-optimized runtime middleware, a private enterprise application store and an integrated management and analytics console—all supported 9 IBM Worklight Device by a variety of security mechanisms. Runtime components 10 Security and authentication Develop. The IBM Worklight Studio and the IBM Worklight software mechanisms development kit (SDK) simplify the development of mobile and omni- channel applications (apps) throughout multiple mobile platforms, includ- ing iOS, Android, BlackBerry, Windows 8, Windows Phone and Java ME. The IBM Worklight optimization framework fosters code reuse while delivering rich user experiences that match the styling requirements of each target environment. With such code reuse, IBM Worklight reduces costs of development, reduces time-to-market and provides strong support for your ongoing management efforts. Connect. The IBM Worklight Server architecture and adapter technol- ogy simplifies the integration of mobile apps with back-end enterprise systems and cloud-based services. The IBM Worklight Server is designed IBM Software Technical White Paper WebSphere to fit quickly into your organization’s IT infrastructure and is Console. -
Apache Spark: a Unified Engine for Big Data Processing
contributed articles DOI:10.1145/2934664 for interactive SQL queries and Pregel11 This open source computing framework for iterative graph algorithms. In the open source Apache Hadoop stack, unifies streaming, batch, and interactive big systems like Storm1 and Impala9 are data workloads to unlock new applications. also specialized. Even in the relational database world, the trend has been to BY MATEI ZAHARIA, REYNOLD S. XIN, PATRICK WENDELL, move away from “one-size-fits-all” sys- TATHAGATA DAS, MICHAEL ARMBRUST, ANKUR DAVE, tems.18 Unfortunately, most big data XIANGRUI MENG, JOSH ROSEN, SHIVARAM VENKATARAMAN, applications need to combine many MICHAEL J. FRANKLIN, ALI GHODSI, JOSEPH GONZALEZ, different processing types. The very SCOTT SHENKER, AND ION STOICA nature of “big data” is that it is diverse and messy; a typical pipeline will need MapReduce-like code for data load- ing, SQL-like queries, and iterative machine learning. Specialized engines Apache Spark: can thus create both complexity and inefficiency; users must stitch together disparate systems, and some applica- tions simply cannot be expressed effi- ciently in any engine. A Unified In 2009, our group at the Univer- sity of California, Berkeley, started the Apache Spark project to design a unified engine for distributed data Engine for processing. Spark has a programming model similar to MapReduce but ex- tends it with a data-sharing abstrac- tion called “Resilient Distributed Da- Big Data tasets,” or RDDs.25 Using this simple extension, Spark can capture a wide range of processing workloads that previously needed separate engines, Processing including SQL, streaming, machine learning, and graph processing2,26,6 (see Figure 1). -
AIX Dec 2004.P65
110 December 2004 In this issue 3 Finding the biggest file in directories on the same filesystem 4 Creating a physical partition map of an hdisk by logical volume 9 AIX console mirroring 11 Using ClamAV 14 Happy 18th birthday AIX 16 Deploying multiple compiler versions on AIX 27 Mirroring tips and techniques in AIX 36 Generating a logo 41 Default user password settings in AIX 47 AIX news © Xephon Inc 2004 AIX Update Published by Disclaimer Xephon Inc Readers are cautioned that, although the PO Box 550547 information in this journal is presented in good Dallas, Texas 75355 faith, neither Xephon nor the organizations or USA individuals that supplied information in this Phone: 214-340-5690 journal give any warranty or make any Fax: 214-341-7081 representations as to the accuracy of the material it contains. Neither Xephon nor the contributing Editor organizations or individuals accept any liability of Trevor Eddolls any kind howsoever arising out of the use of such E-mail: [email protected] material. Readers should satisfy themselves as to the correctness and relevance to their Publisher circumstances of all advice, information, code, Bob Thomas JCL, scripts, and other contents of this journal E-mail: [email protected] before making any use of it. Subscriptions and back-issues Contributions A year’s subscription to AIX Update, When Xephon is given copyright, articles comprising twelve monthly issues, costs published in AIX Update are paid for at the rate $275.00 in the USA and Canada; £180.00 in of $160 (£100 outside North America) per the UK; £186.00 in Europe; £192.00 in 1000 words and $80 (£50) per 100 lines of code Australasia and Japan; and £190.50 elsewhere. -
Comparison of Spark Resource Managers and Distributed File Systems
2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) Comparison of Spark Resource Managers and Distributed File Systems Solmaz Salehian Yonghong Yan Department of Computer Science & Engineering, Department of Computer Science & Engineering, Oakland University, Oakland University, Rochester, MI 48309-4401, United States Rochester, MI 48309-4401, United States [email protected] [email protected] Abstract— The rapid growth in volume, velocity, and variety Section 4 provides information of different resource of data produced by applications of scientific computing, management subsystems of Spark. In Section 5, Four commercial workloads and cloud has led to Big Data. Traditional Distributed File Systems (DFSs) are reviewed. Then, solutions of data storage, management and processing cannot conclusion is included in Section 6. meet demands of this distributed data, so new execution models, data models and software systems have been developed to address II. CHALLENGES IN BIG DATA the challenges of storing data in heterogeneous form, e.g. HDFS, NoSQL database, and for processing data in parallel and Although Big Data provides users with lot of opportunities, distributed fashion, e.g. MapReduce, Hadoop and Spark creating an efficient software framework for Big Data has frameworks. This work comparatively studies Apache Spark many challenges in terms of networking, storage, management, distributed data processing framework. Our study first discusses analytics, and ethics [5, 6]. the resource management subsystems of Spark, and then reviews Previous studies [6-11] have been carried out to review open several of the distributed data storage options available to Spark. Keywords—Big Data, Distributed File System, Resource challenges in Big Data.