2017 PDL Fall Update

NEWSLETTER ON PDL ACTIVITIES AND EVENTS • FALL 2 0 1 7 http://www.pdl.cmu.edu/ PDL CONSORTIUM The PDL is 25! MEMBERS by Joan Digney, Greg Ganger & Bill Courtright Broadcom, Ltd. Citadel After a successful formative workshop in late 1992, Dr. Garth Gibson officially Dell EMC launched the PDL in 1993 with 7 students from CMU’s CS and ECE Departments. Google Having recently finished his Ph.D. research, which defined the industry standard Hewlett-Packard Labs RAID terminology for redundant disk arrays, Gibson guided PDL researchers Hitachi, Ltd. in advanced disk array research. The name “Parallel Data Lab” came from this Intel Corporation initial focus on parallelism in storage systems. In the PDL’s formative years, its Microsoft Research researchers developed technologies for improving failure recovery performance MongoDB (parity declustering) and maximizing performance in small-write intensive work- NetApp, Inc. loads (parity logging). They also developed an aggressive prefetching technology Oracle Corporation (transparent informed prefetching, or TIP) for converting serial access patterns Salesforce into highly parallel work- Samsung Information Systems America loads capable of exploit- Seagate Technology ing large disk arrays. Two Sigma The PDL first received Toshiba actual lab space at CMU Veritas (Wean Hall 3607) to go Western Digital with its name in January of 1994. As it grew, PDL CONTENTS became more spread out, with people in various ar- The PDL is 25! .............................. 1 eas of Wean Hall and the PDL News & Awards........................3 D-Level of Hamerschlag Hall. Today, PDL people Defenses & Proposals .......................4 are primarily located in Recent Publications ........................6 the Robert Mehrabian Garth and the Scotch Parallel File System Collaborative Innovation THE PDL PACKET Center (RMCIC) and the Gates Center for Computer Science.The equipment populating PDL lab spaces is often state of the art, a rare case for academic re- EDITOR search, and is upgraded frequently as technology advances. In large part, we have Joan Digney our sponsor companies to thank for this. CONTACTS Greg Ganger PDL’s initial seed funding came from CMU’s Data Storage Systems Center (DSSC), PDL Director then directed by Mark Kryder, and from DARPA (from which much PDL fund- Bill Courtright ing has come over the years). Additional original funding came from the member PDL Executive Director companies of the PDL Consortium, whose initial members were AT&T Global Karen Lindenfelser Information Systems, Data General, IBM, Hewlett-Packard, Seagate and Storage PDL Administrative Manager Technology. Today, PDL funding comes from DoD, DoE, NSF, and the current The Parallel Data Laboratory PDL Consortium members. The list of consortium members has become long Carnegie Mellon University over the years, as companies join, merge, and change research focus. It is currently 5000 Forbes Avenue Pittsburgh, PA 15213-3891 comprised of the following: Broadcom, Ltd., Citadel, Dell EMC, Google, Hewlett- tel 412-268-6716 Packard Labs, Hitachi, Ltd., Intel Corporation, Microsoft Research, MongoDB, fax 412-268-3010 continued on page 2 THE PDL IS 25! continued from page 1 NetApp, Inc., Oracle Corporation, Salesforce, Samsung Information Systems America, Seagate Technol- ogy, Two Sigma, Toshiba, Veritas, and Western Digital. VMware has also been generous with their financial support. In 1995, Gibson and Dr. David Nagle launched a new PDL project called Network-Attached Secure Disks (NASD), a network-attached storage architecture for achieving cost-effec- tive scalable bandwidth. In addition to their research advances, Gibson founded and chaired an industry An early PDL group photo, ca. ~1995. working group within the Information power distributors and 22 remote con- magnitude improvements in energy Storage Industry Consortium (INSIC) sole servers with a total of 3,683 cables. efficiency. to transfer the new technology and The DCO provides a computation move towards standardization of the » We continue to research the technol- and storage utility to resource-hungry NASD architecture. ogy advances needed for cloud com- research activities such as data mining, puting. Our OpenCloud and Open- In 1999, Nagle took over as PDL Di- design simulation, network intrusion Cirrus clouds provide resources for rector when Gibson went on leave to detection, and visualization. The DCO real users, as well as provide us with co-found Panasas. In 2000, Dr. Greg houses several large research clusters in- invaluable Hadoop logs, instrumen- Ganger, who joined the ECE faculty cluding Susitna, Marmot and Narwhal. tation data, and case studies. and the PDL in 1997, jointly directed the PDL with Nagle, then became Since then, each year has seen the de- » We have explored exciting new stor- Director in 2001 when Nagle went on velopment of many exiting ideas in the age technologies, such as NVM and leave. Early research initiatives under PDL. Here are a few examples: Flash SSDs. Even the disk drive is Ganger’s leadership included Self- » DISC (data-intensive supercomput- changing, with technologies like Securing Devices, PASIS (Survivable ing) allowed applications to extract shingled magnetic recording creating Storage), and Self-* (tuning, manag- deep insight from huge and dynam- a need to reconsider usage patterns ing, ...) Storage. ically-changing datasets. and interfaces. In 2006, after several years of prepara- » We explored the use of virtual ma- » Focus has reemerged on database sys- tion, the PDL opened the Data Center chines using FSVAs (file system tems, including work on automated Observatory (DCO), a machine room virtual appliances) to address the database tuning, deduplication in with over 1,875 square feet of space. As porting problems associated with databases, incremental computation, of May 2017, it is populated with 1077 the client-side component of most and more exploitation of NVM in data- computers, connected to 76 switches, 111 cluster-based designs (including bases. In particular, Andy Pavlo’s Pelo- ours). ton DBMS project combines several of » The Perspective these activities into what he refers to home storage system was as a “self-driving” database, seeking designed to simplify data to achieve autonomous adaptation to management and sharing workload and resource conditions. among the many storage- » The breadth of analytics frameworks enhanced devices (e.g., and other cloud computing activi- DVRs, iPODs, laptops). ties continues to grow and has led to resource scheduling challenges. Our » Led by Prof. David TetriSched project developed new Andersen, the FAWN (fast ways of allowing users to express their array of wimpy nodes) per-job resource type preferences project explored new (e.g., machine locality or hardware cluster architectures able accelerators) and explored the trade- Garth and students; from L to R, Dan Stodolsky, Hugo Patterson, to provide data-intensive Garth, and Bill Courtright. computing with order of continued on page 16 2 THE PDL PACKET AWARDS & OTHER PDL NEWS October 2017 chine and deep learning — an area of of providing a Lorrie Cranor Awarded FORE scientific, academic, and commercial common shared Systems Chair of Computer endeavour that will shape our world file system used Science over the next generation.” by large net- Frank Pfenning, Head of the Depart- works of peo- We are very ment of Computer Science, notes that ple, AFS in- pleased to an- “this is a tremendous opportunity for troduced novel nounce that, Garth, but we will sorely miss him in approaches to in addition to the multiple roles he plays in the de- caching, secu- a long list of partment and school: Professor (and rity, manage- accomplish- all that this entails), Co-Director of ment and administration. ments, which the MCDS program, and Associate The award recipients, including has included Dean for Masters Programs in SCS.” Computer Science Professor Mahadev a term as the Satyanarayanan, built the Andrew File Chief Tech- We are sad to see him go and will miss nologist of the Federal Trade Com- him greatly, but the opportunities System in the 1980s while working as mission, Lorrie Cranor has been presented here for world level innova- a team at the Information Technol- made the FORE Systems Professor of tion are tremendous and we wish him ogy Center (ITC) — a partnership Computer Science and Engineering & all the best. between Carnegie Mellon and IBM. Public Policy at CMU. The ACM Software System Award is June 2017 Lorrie provided information that presented to an institution or indi- “the founders of FORE Systems, Inc. Dana Van Aken’s SIGMOD viduals recognized for developing a established the FORE Systems Profes- Paper Featured in Amazon AI software system that has had a lasting sorship in 1995 to support a faculty Blog influence, reflected in contributions member in the School of Computer to concepts, in commercial accep- Please see Amazon’s AI blog at https:// Science. The company’s name is an tance, or both. aws.amazon.com/blogs/ai/tuning- acronym formed by the initials of the your-dbms-automatically-with-ma- AFS is still in use today as both an founders’ first names. Before it was chine-learning/ to read about Tun- open-source system and as a file sys- acquired by Great Britain’s Marconi in ing Your DBMS Automatically with tem in commercial applications. It 1998, FORE created technology that Machine Learning, an article by Dana has also inspired several cloud-based allows computer networks to link and Van Aken, based on the SIGMOD ‘17 storage applications. Many universi- transfer information at a rapid speed. paper “Automatic Database Manage- ties integrated AFS before it was in- Ericsson purchased much of Marconi troduced as a commercial application. in 2006.” The chair was previously ment System Tuning Through Large- scale Machine Learning,” which she In addition to Satya, the recipients held by CMU University Professor of the award include former faculty Emeritus, Edmund M. Clarke. co-authored with Andy Pavlo and Geoff Gordon. member Alfred Z. Spector, alumnus Michael L. Kazar, Robert N. Sidebo- September 2017 June 2017 tham, David A.

2017 PDL Fall Update

Microsoft and Cray to Unveil $25,000 Windows-Based Supercomputer

0321741331.Pdf

The Fourth Paradigm

The Spirit of Innovation Why Innovation and Value in Japan? Why Now?

AI Chips: What They Are and Why They Matter

2017 CFHT Annual Report

Managing Tail Latency in Large Scale Information Retrieval Systems

The Computer Games Journal Ltd Registered Company No

Methods for Answer Extraction in Textual Question Answering

Microsoft Corp. BUY Company Update : Enterprise Software

Research Statement Deepak Narayanan

What Is Kodu?