COMMUNICATIONS CACM.ACM.ORG OF THEACM 11/2016 VOL.59 NO.11

SEXAS AN ALGORITHM THE THEORY OF EVOLUTION UNDER THE LENS OF COMPUTATION

Association for Computing Machinery 31st IEEE • 2017• INTERNATIONAL May 29-June 2, 2017 Paral lel and Buena Vista Palace Hotel Distributed Orlando, Florida USA Processing SYMPOSIUM

www.ipdps.org

Orlando is home to a rich offering of indoor and outdoor attractions. Located a mile from Walt Disney World® and 4 miles from Epcot, the Buena Vista Palace Hotel is a 5-minute walk from Downtown Disney with a complimentary shuttle to all Disney Theme Parks and Water Parks. The sprawling Lake Buena Vista resort offers a full menu of amenities and family friendly activities as well as ideal meeting space for IPDPS 2017.

ANNOUNCING 24 WORKSHOPS PLANNED FOR IPDPS 2017 IN ORLANDO GENERAL CHAIR IPDPS Workshops are the “bookends” to the three-day technical program of contributed papers, Michela Taufer (University of Delaware, USA) keynote speakers, roundtable workshops, a PhD student forum, and industry participation. They provide the IPDPS community an opportunity to explore special topics and present work that is PROGRAM CHAIR more preliminary or cutting-edge than the more mature research presented in the main symposium. Marc Snir (University of Illinois at Urbana Champaign, USA) Each workshop has its own website and submission requirements, and the submission deadline for most workshops is after the main conference author notification date. See the IPDPS Workshops WORKSHOPS CHAIR page for links to Call for Papers for each workshop and due dates. Bora Uçar (CNRS and ENS Lyon, France)

IPDPS WORKSHOPS MONDAY 29 MAY 2017 (Check final schedule) WORKSHOPS VICE-CHAIR Erik Saule (University of North Carolina Charlotte, USA) HCW Heterogeneity in Computing Workshop RAW Reconfigurable Architectures Workshop STUDENT PARTICIPATION CHAIR HiComb High Performance Computational Biology Trilce Estrada (University of New Mexico, USA) EduPar NSF/TCPP W. on Parallel and Distributed Computing Education ParLearning Parallel and Distributed Computing for Machine Learning and Big Data Analytics ROUNDTABLE WORKSHOPS PDCO Parallel / Distributed Computing and Optimization These condensed workshops, organized and animated by a GABB Graph Algorithms Building Blocks few people, will be held on Tuesday and Thursday in a AsHES Accelerators and Hybrid Exascale Systems “roundtable” setting designed to promote one-on-one interaction. They will focus on an emerging area of interest to HIPS High Level Programming Models and Supporting Environments IPDPS attendees, especially topics that complement and APDCM Advances in Parallel and Distributed Computational Models “round out” the areas covered by the regular workshops. HPPAC High-Performance, Power-Aware Computing HPBDC High-Performance Big Data Computing PhD FORUM & STUDENT MENTORING This event will include traditional poster presentations by PhD IPDPS WORKSHOPS FRIDAY 2 JUNE 2017 (Check final schedule) students enhanced by a program of mentoring and coaching in scientific writing and presentation skills and a special CHIUW Chapel Implementers and Users Workshop opportunity for students to hear from and interact with senior LSPP Large-Scale Parallel Processing: Practices and Experiences researchers attending the conference. PDSEC Parallel and Distributed Scientific and Engineering Computing JSSPP Job Scheduling Strategies for Parallel Processors INDUSTRY PARTICIPATION DPDNS Dependable Parallel, Distributed and Network-centric Systems IPDPS extends a special invitation for companies to become an IPDPS 2017 Industry Partner and join us in Orlando to IPDRM Emerging Parallel and Distributed Runtime Systems and Middleware share in the benefits of associating with an international iWAPT International Workshop on Automatic Performance Tunings community of top researchers and practitioners in fields ParSocial Parallel and Distributed Processing for Computational Social System related to parallel processing and distributed computing. Visit BigDataEco Big Data Regional Innovation Hubs and Spokes: the IPDPS website to see ways to participate. Accelerating the Big Data Innovation Ecosystem GraML Graph Algorithms and Machine Learning EMBRACE Evolvable Methods for Benchmarking Realism IMPORTANT DATES and Community Engagement Conference Author Notification January 8, 2017 REPPAR Reproducibility in Parallel Computing Workshop Call for Papers Deadlines Most Fall After January 8, 2017

SPONSORED BY: IN COOPERATION WITH: ACM SIGARCH & SIGHPC IEEE Computer Society Technical Committee on Computer Architecture Technical Committee on Parallel Processing IEEE Computer Society Technical Committee on Distributed Processing Previous A.M. Recipients

1966 A.J. Perlis 1967 Maurice Wilkes 1968 R.W. Hamming 1969 Marvin Minsky 1970 J.H. Wilkinson 1971 John McCarthy 1972 E.W. Dijkstra 1973 Charles Bachman 1974 Donald Knuth 1975 Allen Newell 1975 Herbert Simon 1976 Michael Rabin 1976 Dana Scott 1977 John Backus 1978 Robert Floyd 1979 Kenneth Iverson ACM A.M. TURING AWARD 1980 C.A.R Hoare 1981 Edgar Codd 1982 Stephen Cook NOMINATIONS SOLICITED 1983 Ken Thompson 1983 Dennis Ritchie Nominations are invited for the 2016 ACM A.M. Turing Award. 1984 Niklaus Wirth 1985 Richard Karp This is ACM’s oldest and most prestigious award and is given 1986 John Hopcroft to recognize contributions of a technical nature which are of 1986 Robert Tarjan 1987 John Cocke lasting and major technical importance to the computing field. 1988 Ivan Sutherland The award is accompanied by a prize of $1,000,000. 1989 William Kahan 1990 Fernando Corbató Financial support for the award is provided by Google Inc. 1991 Robin Milner 1992 Butler Lampson Nomination information and the online submission form 1993 Juris Hartmanis 1993 Richard Stearns are available on: 1994 Edward Feigenbaum http://amturing.acm.org/call_for_nominations.cfm 1994 Raj Reddy 1995 Manuel Blum 1996 Amir Pnueli Additional information on the Turing Laureates 1997 Douglas Engelbart is available on: 1998 James Gray http://amturing.acm.org/byyear.cfm 1999 Frederick Brooks 2000 Andrew Yao 2001 Ole-Johan Dahl The deadline for nominations/endorsements is 2001 Kristen Nygaard November 30, 2016. 2002 Leonard Adleman 2002 Ronald Rivest 2002 Adi Shamir For additional information on ACM’s award program 2003 Alan Kay 2004 Vinton Cerf please visit: www.acm.org/awards/ 2004 Robert Kahn 2005 Peter Naur 2006 Frances E. Allen 2007 Edmund Clarke 2007 E. Allen Emerson 2007 Joseph Sifakis 2008 Barbara Liskov 2009 Charles P. Thacker 2010 Leslie G. Valiant 2011 Judea Pearl 2012 Shafi Goldwasser 2012 Silvio Micali 2013 Leslie Lamport 2014 Michael Stonebraker 2015 Whitfield Diffie 2015 Martin Hellman COMMUNICATIONS OF THE ACM

Departments News Viewpoints

5 Editor’s Letter 20 Privacy and Security Globalization, Computing, Cyber Defense Triad for and Their Political Impact Where Security Matters By Moshe Y. Vardi Dramatically more trustworthy cyber security is a choice. 7 Cerf’s Up By Roger R. Schell Heidelberg Anew By Vinton G. Cerf 24 Legally Speaking Fair Use Prevails in Oracle v. Google 8 Letters to the Editor Two software giants continue Learn to Live with Academic Rankings with legal sparring after an initial judicial decision. 10 BLOG@CACM By Pamela Samuelson Introducing CS to Newcomers, and JES As a Teaching Tool 18 27 Economic and Business Dimensions Valerie Barr gets high schoolers Visualization to thinking about CS, while 12 Learning Securely Understand Ecosystems Mark Guzdial mulls the benefits of Because it is easy to fool, Mapping relationships between Jython Environment for Students. machine learning must be taught stakeholders in an ecosystem how to handle adversarial inputs. to increase understanding and make 23 Calendar By Erica Klarreich better-informed strategic decisions. By Bala R. Iyer and Rahul C. Basole 123 Careers 15 Blockchain Beyond Bitcoin Blockchain technology has 31 Education the potential to revolutionize Growing Computer Science Last Byte applications and redefine Education Into a the digital economy. STEM Education Discipline 136 Future Tense By Sarah Underwood Seeking to make computing The Candidate education as available as Seeking the programmer vote, 18 Farm Automation Gets Smarter mathematics or science education. an AI delivering a slogan like As fewer people work the land, By Mark Guzdial and Briana Morrison “Make Coding Great Again” robots pick up the slack. could easily be seen as a threat. By Tom Geller 34 Viewpoint By Brian Clegg Time to Reinspect the Foundations? Questioning if computer science is outgrowing its traditional foundations. By Jack Copeland, Eli Dresner, Diane Proudfoot, and Oron Shagrir

37 Viewpoint Technology and Academic Lives Considering the need to create new modes of interaction and approaches to assessment given a rapidly evolving academic realm. By Jonathan Grudin

Association for Computing Machinery Advancing Computing as a Science & Profession IMAGE COURTESY OF BOSCH DEEPFIELD ROBOTICS COURTESY IMAGE

2 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 11/2016 VOL. 59 NO. 11

Practice Contributed Articles Review Articles

56 Apache Spark: A Unified Engine 84 Sex as an Algorithm: The Theory for Big Data Processing of Evolution Under the Lens This open source computing of Computation framework unifies streaming, batch, Looking at the mysteries of evolution and interactive big data workloads to from a computer science point of unlock new applications. view yields some unexpected insights. By Matei Zaharia, Reynold S. Xin, By Adi Livnat and Patrick Wendell, Tathagata Das, Christos Papadimitriou Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen,

Shivaram Venkataraman, Watch the authors discuss Michael J. Franklin, Ali Ghodsi, their work in this exclusive Communications video. Joseph Gonzalez, Scott Shenker, http://cacm.acm.org/videos/ and Ion Stoica sex-as-an-algorithm 40 94 Recommender Systems—

40 The Power of Babble Watch the authors discuss Beyond Matrix Completion Expect to be constantly their work in this exclusive The future success of these Communications video. and pleasantly befuddled. http://cacm.acm.org/videos/ systems depends on more than By Pat Helland spark a Netflix challenge. By Dietmar Jannach, Paul Resnick, 44 Scaling Synchronization 66 Pushing on String: The ‘Don’t Care’ Alexander Tuzhilin, and Markus Zanker in Multicore Programs Region of Password Strength Advanced synchronization Enterprises that impose stringent methods can boost the performance password-composition policies Research Highlights of multicore software. appear to suffer the same fate By Adam Morrison as those that do not. 104 Technical Perspective By Dinei Florêncio, Cormac Herley, If I Could Only Design One Circuit … 52 Research for Practice: Distributed and Paul C. van Oorschot By Kurt Keutzer Consensus and Implications of NVM on Database Management Systems 75 A Theory on Power in Networks 105 DianNao Family: Energy-Efficient Expert-curated guides to the best Actors linked to central others Hardware Accelerators for of CS research for practitioners. in networks are generally central, Machine Learning By Peter Bailis, Camille Fournier, even as actors linked to powerful By Yunji Chen, Tianshi Chen, Zhiwei Xu, Joy Arulraj, and Andrew Pavlo others are powerless. Ninghui Sun, and Olivier Temamv By Enrico Bozzo and Articles’ development led by Massimo Franceschet 113 Technical Perspective queue.acm.org FPGA Compute Acceleration Is First About Energy Efficiency By James C. Hoe

114 A Reconfigurable Fabric for About the Cover: Accelerating Large-Scale When it comes to Datacenter Services unlocking the secrets of evolution, much By A. Putnam, A.M. Caulfield, E.S. Chung, can be gained from D. Chiou, K. Constantinides, J. Demme, exploring these stories from a computer H. Esmaeilzadeh, J. Fowers, G.P. Gopal, science perspective. J. Gray, M. Haselman, S. Hauck, S. Heil, Adi Livnat and Christos Papadimitrou detail how A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, CS can be used to trace the origins and role of E. Peterson, S. Pope, A. Smith, J. Thong,

IMAGE BY ANTON WATMAN ANTON BY IMAGE sexual reproduction. P. Yi Xiao, and D. Burger

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 3 COMMUNICATIONS OF THE ACM Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields. Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional. Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts, sciences, and applications of information technology.

ACM, the world’s largest educational STAFF EDITORIAL BOARD ACM Copyright Notice and scientific computing society, delivers DIRECTOR OF GROUP PUBLISHING EDITOR-IN-CHIEF Copyright © 2016 by Association for resources that advance computing as a Scott E. Delman Moshe Y. Vardi Computing Machinery, Inc. (ACM). science and profession. ACM provides the [email protected] [email protected] Permission to make digital or hard copies computing field’s premier Digital Library of part or all of this work for personal and serves its members and the computing Executive Editor NEWS or classroom use is granted without profession with leading-edge publications, Diane Crawford Co-Chairs fee provided that copies are not made conferences, and career resources. Managing Editor William Pulleyblank and Marc Snir or distributed for profit or commercial Thomas E. Lambert Board Members advantage and that copies bear this Executive Director and CEO Senior Editor Mei Kobayashi; Michael Mitzenmacher; notice and full citation on the first Bobby Schnabel Andrew Rosenbloom Rajeev Rastogi page. Copyright for components of this Deputy Executive Director and COO work owned by others than ACM must Senior Editor/News VIEWPOINTS Patricia Ryan Larry Fisher be honored. Abstracting with credit is Director, Office of Information Systems Co-Chairs permitted. To copy otherwise, to republish, Web Editor Tim Finin; Susanne E. Hambrusch; Wayne Graves David Roman to post on servers, or to redistribute to Director, Office of Financial Services John Leslie King lists, requires prior specific permission Rights and Permissions Board Members Darren Ramdin Deborah Cotton and/or fee. Request permission to publish Director, Office of SIG Services William Aspray; Stefan Bechtold; from [email protected] or fax Donna Cappo Michael L. Best; Judith Bishop; (212) 869-0481. Art Director Director, Office of Publications Stuart I. Feldman; Peter Freeman; Andrij Borys Bernard Rous Mark Guzdial; Rachelle Hollander; For other copying of articles that carry a Associate Art Director Director, Office of Group Publishing Richard Ladner; Carl Landwehr; code at the bottom of the first or last page Margaret Gray Scott E. Delman Carlos Jose Pereira de Lucena; or screen display, copying is permitted Assistant Art Director Beng Chin Ooi; Loren Terveen; provided that the per-copy fee indicated Mia Angelica Balaquiot ACM COUNCIL Marshall Van Alstyne; Jeannette Wing in the code is paid through the Copyright Designer President Clearance Center; www.copyright.com. Vicki L. Hanson Iwona Usakiewicz Production Manager Vice-President PRACTICE Subscriptions Lynn D’Addesio Cherri M. Pancake Co-Chair An annual subscription cost is included Advertising Sales Secretary/Treasurer Stephen Bourne in ACM member dues of $99 ($40 of Juliet Chance Elizabeth Churchill Board Members which is allocated to a subscription to Past President Eric Allman; Peter Bailis; Terry Coatta; Communications); for students, cost Alexander L. Wolf Columnists Stuart Feldman; Benjamin Fried; is included in $42 dues ($20 of which Chair, SGB Board David Anderson; Phillip G. Armour; Pat Hanrahan; Tom Killalea; Tom Limoncelli; is allocated to a Communications Patrick Madden Michael Cusumano; Peter J. Denning; Kate Matsudaira; Marshall Kirk McKusick; subscription). A nonmember annual Co-Chairs, Publications Board Mark Guzdial; Thomas Haigh; George Neville-Neil; Theo Schlossnagle; subscription is $269. Jack Davidson and Joseph Konstan Leah Hoffmann; Mari Sako; Jim Waldo Pamela Samuelson; Marshall Van Alstyne Members-at-Large The Practice section of the CACM ACM Media Advertising Policy Gabriele Anderst-Kotis; Susan Dumais; Editorial Board also serves as Communications of the ACM and other CONTACT POINTS Elizabeth D. Mynatt; Pamela Samuelson; the Editorial Board of . ACM Media publications accept advertising Eugene H. Spafford Copyright permission in both print and electronic formats. All SGB Council Representatives [email protected] CONTRIBUTED ARTICLES advertising in ACM Media publications is Paul Beame; Jenna Neefe Matthews; Calendar items Co-Chairs at the discretion of ACM and is intended Barbara Boucher Owens [email protected] Andrew Chien and James Larus to provide financial support for the various Change of address Board Members activities and services for ACM members. BOARD CHAIRS [email protected] William Aiello; Robert Austin; Elisa Bertino; Current advertising rates can be found Education Board Letters to the Editor Gilles Brassard; Kim Bruce; Alan Bundy; by visiting http://www.acm-media.org or Mehran Sahami and Jane Chu Prey [email protected] Peter Buneman; Peter Druschel; Carlo Ghezzi; by contacting ACM Media Sales at Practitioners Board WEBSITE Carl Gutwin; Yannis Ioannidis; (212) 626-0686. George Neville-Neil http://cacm.acm.org Gal A. Kaminka; James Larus; Igor Markov; Gail C. Murphy; Bernhard Nebel; Single Copies REGIONAL COUNCIL CHAIRS AUTHOR GUIDELINES Lionel M. Ni; Kenton O’Hara; Sriram Rajamani; Single copies of Communications of the ACM Europe Council http://cacm.acm.org/ Marie-Christine Rousset; Avi Rubin; ACM are available for purchase. Please Dame Professor Wendy Hall Krishan Sabnani; Ron Shamir; Yoav contact [email protected]. ACM India Council ACM ADVERTISING DEPARTMENT Shoham; Larry Snyder; Michael Vitale; Srinivas Padmanabhuni 2 Penn Plaza, Suite 701, New York, NY Wolfgang Wahlster; Hannes Werthner; COMMUNICATIONS OF THE ACM ACM China Council 10121-0701 Reinhard Wilhelm (ISSN 0001-0782) is published monthly Jiaguang Sun T (212) 626-0686 by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals PUBLICATIONS BOARD F (212) 869-0481 RESEARCH HIGHLIGHTS Co-Chairs postage paid at New York, NY 10001, Co-Chairs and other mailing offices. Jack Davidson; Joseph Konstan Advertising Sales Azer Bestovros and Gregory Morrisett Board Members Board Members Juliet Chance POSTMASTER Ronald F. Boisvert; Karin K. Breitman; [email protected] Martin Abadi; Amr El Abbadi; Sanjeev Arora; Nina Balcan; Dan Boneh; Andrei Broder; Please send address changes to Terry J. Coatta; Anne Condon; Nikil Dutt; Communications of the ACM Roch Guerrin; Carol Hutchins; For display, corporate/brand advertising: Doug Burger; Stuart K. Card; Jeff Chase; Craig Pitcher Jon Crowcroft; Sandhya Dwaekadas; 2 Penn Plaza, Suite 701 Yannis Ioannidis; Catherine McGeoch; New York, NY 10121-0701 USA M. Tamer Ozsu; Mary Lou Soffa; Alex Wade; [email protected] T (408) 778-0300 Alexei Efros; Alon Halevy; Norm Jouppi; Keith Webster William Sleight Andrew B. Kahng; Sven Koenig; Xavier Leroy; [email protected] T (408) 513-3408 Steve Marschner; Kobbi Nissim; Guy Steele, Printed in the U.S.A. ACM U.S. Public Policy Office Jr.; David Wagner; Margaret H. Wright; Media Kit [email protected] Renee Dopplick, Director Nicholai Zeldovich; Andreas Zeller 1828 L Street, N.W., Suite 800 Washington, DC 20036 USA WEB T (202) 659-9711; F (202) 667-1066 Association for Computing Machinery Chair (ACM) James Landay

E R E C Computer Science Teachers Association 2 Penn Plaza, Suite 701 Board Members S Y A C E L L E Mark R. Nelson, Executive Director New York, NY 10121-0701 USA Marti Hearst; Jason I. Hong; P

T E H T (212) 869-7440; F (212) 869-0481 Jeff Johnson; Wendy E. MacKay N I I S Z M A G A

4 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 editor’s letter

DOI:10.1145/3003732 Moshe Y. Vardi Globalization, Computing, and Their Political Impact

ERCHED AS WE are on the at the broader impact of globalization for inflation, is at the level it was in in crest of the current tech on the economy, we might have reached 1968! Ironically, automation is now and computing-enrollment somewhat less sanguine conclusions. reaching developing-world economies. boom, it is hard to remem- Globalization exerted tremendous com- Manufacturing employment in Chi- ber the dark days of the early petitive pressure on manufacturing in na peaked around 1995, where rising P2000s. The NASDAQ Index peaked on developed countries. It is instructive to wages are driving automation, and a March 10, 2000, declining almost 80% examine the response to this competi- recent report from the International over the next two years. The stock-mar- tive pressure, taking U.S. manufactur- Labor Organization found that more ket crash in the U.S. caused the loss of ing as an example. To survive in the than two-thirds of Southeast Asia’s 9.2 $5 trillion in the market valuations from intensely competitive global economy, million textile and footwear jobs are 2000 to 2002. Computing enrollments U.S. manufacturing had to increase its threatened by automation. in North America went into a steep dive. productivity dramatically, substituting Until very recently, the global pro- At the same time, the Internet enabled technology for labor. U.S. manufac- fessional class, which includes most the globalization of software produc- turing productivity roughly doubled computing professionals, was some- tion, giving rise to the phenomenon of between 1995 and 2015. As a result, what oblivious to the plight of work- offshore outsourcing. There were daily while U.S. manufacturing output to- ing- and middle-class people in de- stories in the media describing major day is essentially at an all-time high, veloped countries. In fact, some have shifts in employment that were occur- employment peaked around 1980, and argued that this class lives in “an eco- ring largely as a result of software off- has been declining precipitously since nomic and cultural bubble.” Political shoring. Combined with the dot-com 1995. Neoclassical economists argue developments of the past few months bust, these reports raised concerns that when technology destroys jobs have made it clear that the issue of about the future of information tech- “people find other jobs, albeit possibly shared prosperity cannot be ignored. nology (IT) as a viable field of study and after a long period of painful adjust- It is now evident that the Brexit vote work in developed countries. ment.” They are definitely right about in the U.K., as well as the temporary In response to these concerns, ACM the painful adjustment! The impact of rise of Bernie Sanders and the rise of Council commissioned a Task Force globalization and automation over the Donald Trump in the U.S., were driven in 2004 to “look at the facts behind past 20 years on working- and middle- to a large extent by economic griev- the rapid globalization of IT and the class Americans has been quite harsh. ances. However the outcome of this migration of jobs resulting from out- This impact has been succinctly cap- month’s U.S. presidential election, sourcing and offshoring.” The Task tured recently by economist Branko Mi- globalization and automation will re- Force, co-chaired by Frank Mayadas lanovic’s “Elephant Curve” (see http:// main policy issues of the utmost priority and myself, with the assistance of Wil- prospect.org/article/worlds-inequality), and will resist simplistic solutions. liam Aspray as Editor, issued its report which shows how people around the Globalization and automation pro- Globalization and Offshoring of Software world, ranked by their income in vide huge benefits to society, but their (http://www.acm.org/globalizationreport) 1998, saw their incomes increase by adverse effects cannot and should not in 2006. The report concluded that 2008. While the incomes of the very be ignored. Technology is not destiny there is no real reason to believe that IT poor was stagnant, rising incomes in and public policy has a key role to play. jobs are migrating away from developed emerging economies lifted hundreds As actors in and beneficiaries of this countries. The passing decade has vin- of millions of people out of poverty. societal transformation, we have, I be- dicated that conclusion. People at the top of the income scale lieve, a social responsibility that goes But while the report conceded that also benefited from globalization and beyond our technical roles. “trade gains may be distributed dif- automation. But the income of work- Follow me on Facebook, Google+, ferentially,” meaning some individu- ing- and middle-class people in the de- and Twitter. als gain and some lose, some localities veloped world stagnated over that pe- gain and some lose; it was focused nar- riod. In the U.S., for example, income Moshe Y. Vardi, EDITOR-IN-CHIEF rowly on the IT industry. Had we looked of production workers today, adjusted Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 5 SHAPE THE FUTURE OF COMPUTING. JOIN ACM TODAY.

ACM is the world’s largest computing society, offering benefits and resources that can advance your career and enrich your knowledge. We dare to be the best we can be, believing what we do is a force for good, and in joining together to shape the future of computing. SELECT ONE MEMBERSHIP OPTION ACM PROFESSIONAL MEMBERSHIP: ACM STUDENT MEMBERSHIP:

q Professional Membership: $99 USD q Student Membership: $19 USD q Professional Membership plus q Student Membership plus ACM Digital Library: $42 USD ACM Digital Library: $198 USD ($99 dues + $99 DL) q Student Membership plus Print CACM Magazine: $42 USD q ACM Digital Library: $99 USD q Student Membership with ACM Digital Library plus (must be an ACM member) Print CACM Magazine: $62 USD q Join ACM-W: ACM-W supports, celebrates, and advocates internationally for the full engagement of women in all aspects of the computing field. Available at no additional cost. Priority Code: CAPP Payment Information Payment must accompany application. If paying by check or money order, make payable to ACM, Inc., in U.S. dollars Name or equivalent in foreign currency.

ACM Member # q AMEX q VISA/MasterCard q Check/money order

Mailing Address Total Amount Due

Credit Card # City/State/Province Exp. Date ZIP/Postal Code/Country Signature Email

Return completed application to: Purposes of ACM ACM General Post Office ACM is dedicated to: P.O. Box 30777 1) Advancing the art, science, engineering, and New York, NY 10087-0777 application of information technology Prices include surface delivery charge. Expedited Air 2) Fostering the open interchange of information Service, which is a partial air freight delivery service, is to serve both professionals and the public available outside North America. Contact ACM for more 3) Promoting the highest professional and information. ethics standards Satisfaction Guaranteed!

BE CREATIVE. STAY CONNECTED. KEEP INVENTING.

1-800-342-6626 (US & Canada) Hours: 8:30AM - 4:30PM (US EST) [email protected] 1-212-626-0500 (Global) Fax: 212-944-1318 acm.org/join/CAPP cerf’s up

DOI:10.1145/3005354 Vinton G. Cerf Heidelberg Anew I have just returned from the fourth annual Heidelberg Laureate Forum and I want to emphasize how very important it has been for ACM Turing laureates to participate in the program. Each year 200 math and gramming. The value of abstraction that other ACM awardees might be in- computer science undergraduates par- to aid in reasoning about expected vited to attend the annual event. ticipate in the program, approximately program function resonated very Looking back on the Lindau event 100 each. Speeches by laureates are strongly with me and, I think, with that I attended in late June at which mixed with undergraduate workshops others in attendance. Nobel Prize winners mingle with stu- and plenary open sessions. There is As always, the mathematics and dents, I was struck by the increasingly ample opportunity for interaction computer science students were full of important role of computing in dis- among students and laureates and be- energy, ideas, and eagerness to interact covery science. Simulations of physi- tween students. with each other and with the laureates cal phenomena are revealing new in- This year, Brian Schmidt gave the present. The organizers worked hard sights into the nature of our universe. Lindau lecture (from the annual No- to maximize student opportunities to One of the dramatic examples I have bel Prize winners meeting). Schmidt meet with laureates including a num- seen shows an evolving universe from discovered that the universe is not only ber of workshops where some in-depth the big bang that takes into account expanding, the expansion is accelerat- discussion could be supported. Some dark matter and dark energy and pro- ing. It would be difficult to imagine a of the laureates voiced a strong recom- duces a simulated universe with many more profound discovery. In the very mendation that every effort should be of the large scale structures we actu- long term, it appears the universe will made to allow rich interaction between ally see in the observable universe. We expand to the point that only a certain students of the two disciplines. see huge reticular structures emerging amount of local gravity will hold a gal- Looking at the available laureate at- that are largely the product of masses axy or small group of galaxies together. tendee lists, I can’t help but imagine that of dark matter that organize ordinary The rest will accelerate away and the future Heidelberg events would benefit matter into a lacework of stars and gas. universe will end in a cold whimper. from a cohort of additional younger lau- That these predictions can be tested Fortunately, there was nothing but reates so I look forward to the possibility through observation reinforces the im- stimulating conversation at this year’s portance of computing in our explora- Heidelberg Forum. More than ever tion of the natural world. we are seeing how mathematics and We are reaching We are reaching an exciting period computer science are interacting, es- in scientific discovery in which com- pecially with the arrival of neural net- an exciting period putation is as important as labora- works and quantum computers that in scientific discovery tory experiment and observation. We have capabilities quite different from can invent our own universes and test the conventional von Neumann de- in which computation them for compatibility with the real signs that have dominated computing is as important as one we can measure. Indeed, we may for over seven decades. Fundamental find that our predictions could draw questions about what is computable, laboratory experiment our attention to phenomena we might illuminated by Gödel, are getting at- and observation. never have looked for, were it not for tention in the light of these new com- the revelation of computation. puting engines. Leslie Lamport delivered another Vinton G. Cerf is vice president and Chief Internet Evangelist extraordinary lecture reinforcing the at Google. He served as ACM president from 2012–2014. value of thinking mathematically while considering the process of pro- Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 7 letters to the editor

DOI:10.1145/3002205 Learn to Live with Academic Rankings

O ONE LIKES being reduced to diversity of attributes that characterize establishing staff units to provide the a number. For example, there my department, as well as peer depart- diverse data requested by the ranking is much more to my financial ments at other universities. I know how agencies and boosting their communi- picture than my credit score unreasonable it is to reduce it all to a cations and public relations activities. alone. There is even schol- single number. But I also know there There is also evidence that these league Narly work on weaknesses in the system are prospective students, as well as their tables are beginning to (adversely) in- to compute this score. Everyone may parents and others, who find a number fluence resource-allocation and hiring agree the number is far from perfect, useful. I encourage them to consider decisions despite their glaring inad- yet it is used to make decisions that an array of factors when I am trying to equacies and limitations. matter to me, as Moshe Y. Vardi dis- recruit them to choose Michigan. But I I have been asked to serve on the pan- cussed in his Editor’s Letter “Academic cannot reasonably ask them not to look els of two of the ranking systems but have Rankings Considered Harmful!” (Sept. at the number. So it behooves me to do had to abandon my attempts to complete 2016). So I care what my credit score is. what I can to make it as good as it can the questionnaires because I just did not Many of us may even have made finan- be, and to work toward a system that have sufficient information to provide cial decisions taking into account their produces numbers that are as fair as honest responses to the kinds of difficult, potential impact on credit score. they can be. I agree it is not possible to comparative questions about such a large As an academic, I also produce such come anywhere close to perfection, but number of universities. The agencies sel- numbers. I assign grades to my stu- the less bad we can make the numbers, dom report how many “experts” they ac- dents. I strive to have the assigned grade the better off we all will be. tually surveyed or their survey-response accurately reflect a student’s grasp of the H.V. Jagadish, Ann Arbor, MI rates. As regards the relatively “objec- material in my course. But I know this is tive” ARWU ranking, it uses measures imperfect. At best, the grade reflects the like number of alumni and staff winning student’s knowledge today. When a pro- Author Responds: Nobel Prizes and Fields Medals, number spective employer looks at it two years My Editor’s Letter did not question the need of highly cited researchers selected by later, it is possible an A student had for quantitative evaluation of academic Thomson Reuters, number of articles crammed for the exam and has since programs. I presume, however, that published in journals of Nature and Sci- completely forgotten the material, while Dr. Jagadish assigns grades to his students ence, number of articles indexed in Sci- a B student deepened his or her under- rather than merely ranking them. These ence and Social Science Citation Index, standing substantially through a subse- students then graduate with a transcript, and “per capita performance” of a uni- quent internship. The employer must which reports all their grades, rather than versity. It is not at all clear to what extent learn to get past the grade to develop a just their class rank. He argues that we should the six narrowly focused indicators can richer understanding of the student’s learn to live with numbers (I agree) but capture the overall performance of mod- strengths and weaknesses. does not address any of the weaknesses ern universities, which tend to be large, As an academic, I am also a consumer of academic rankings. complex, loosely coupled organizations. of these numbers. Most universities, in- Moshe Y. Vardi, Editor-in-Chief As well, the use of measures like num- cluding mine, look at standardized test ber of highly cited researchers named scores. No one suggests they predict suc- by Thomson Reuters/ISI can exacerbate cess perfectly. But there is at least some More Negative Consequences some of the known citation malprac- correlation—enough that they are used, of Academic Rankings tices (such as excessive self-citations, ci- often as an initial filter. Surely there are I could not agree more with Moshe Y. tation rings, and journal-citation stack- students who could have done very well Vardi’s Editor’s Letter (Sept. 2016). The ing). As Vardi noted, the critical role of if admitted but were not considered ranking systems—whether U.S.-focused commercial entities in the rankings— seriously because they did not make (such as U.S. News and World Report) or notably Times, QS, USNWR, and Thom- the initial cutoff in test scores. A small global (such as Times Higher Education, son Reuters—is also a concern. handful of U.S. colleges and universities World University Reputation Ranking, QS Joseph G. Davis, Sydney, Australia have recently stopped considering stan- University Ranking, and Academic Rank- dardized test scores for undergraduate ing of World Universities, compiled by admission. I admire their courage. Most Shanghai Jiaotong University in Shang- Acknowledge Crowdworkers others have not followed suit because it hai, China)—have all acquired lives of in Crowdwork Research takes a tremendous amount of work to their own in recent years. These rank- Crowdwork promises to help integrate get behind the numbers. Even if better ings have attracted the attention of human and computational processes decisions might result, the process sim- governments and funding bodies and while also providing a source of paid ply requires too much effort. are widely reported in the media. Many work for those who might otherwise As an academic, I appreciate the rich universities worldwide have reacted by be excluded from the global economy.

8 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 letters to the editor

Daniel W. Barowy et al.’s Research We hope future coverage of crowd- the human genome at the start of the Highlight “AutoMan: A Platform for work in Communications will include third millennium. Berger et al. pointed Integrating Human-Based and Digital research incorporating the perspective out that the biologist’s growth exponent Computation” (June 2016) explored a of workers in the design of such sys- is greater even than Moore’s Law. This programming language called Auto- tems. This will help counteract the risk growth has led to an increasing fraction Man designed to integrate human of creating programming languages of medical research funding being di- workers recruited through crowdwork that could actively, even if unintention- rected to data-rich ‘omics. But the differ- markets like Amazon Mechanical Turk ally, accentuate inequality and poverty. ence between what Moore’s Law makes alongside conventional computing At a time when technology increasingly affordable and a greater medical expo- resources. The language breaks new influences political debate, social re- nent is itself in the long term also expo- ground in how to automate the compli- sponsibility is more than ever an in- nential. Berger et al. proposed smarter cated work of scheduling, pricing, and valuable aspect of computer science. algorithms to plug the gap. managing crowdwork. The article described bioinformat- While the attempt to automate this References ics’ ready adoption of cloud computing, 1. Irani, L.C. and Silberman, M.S. Turkopticon: Interrupting managerial responsibility is clearly of worker invisibility in Amazon Mechanical Turk. In but the true parallel nature of much of value, we were dismayed by the authors’ Proceedings of the SIGCHI Conference on Human Factors in e-biology went unstated; for example, Computing Systems (Paris, France, Apr. 27–May 2). ACM lack of concern for those who carry out Press, New York, 2013, 611–620. Illumina next-generation sequencers the actual work of crowdwork. Humans 2. Salehi, N., Irani, L.C., Bernstein, M.S. et al. We are can generate more than one billion Dynamo: Overcoming stalling and friction in collective and computers are not interchangeable. action for crowd workers. In Proceedings of the SIGCHI short DNA strings, each of which can Conference on Human Factors in Computing Systems (Seoul, Minimizing wages is quite different Republic of Korea, Apr. 18–23). ACM Press, New York, be processed independently in parallel. from minimizing execution time. For 2015, 1621–1630. Top-end graphics hardware—GPUs— example, the AutoMan language is de- already contain several thousands of signed to minimize crowdwork request- Barry Brown and Airi Lampinen, processing cores and deliver consider- ers’ costs by iteratively running rounds Stockholm, Sweden ably more raw processing power than of recruitment, with tasks offered at even multi-core CPUs. So it is no wonder increasing wages. However, such opti- that bioinformatics has turned to GPUs. mization is quite different from the per- Authors Respond: Berger et al. did mention BWA’s im- spective of the workers compared to the We share the concerns Brown and plementation of the Burrows-Wheeler requesters. The process is clearly not Lampinen raise about crowdworker compression transform. BarraCUDA is optimized for economic fairness. Sys- rights. In fact, AutoMan, by design, an established port of BWA to Nvidia’s tems that minimize payments could ex- automatically addresses four of the five hardware that was optimized for mod- ert negative economic force on crowd- issues raised by workers, as described ern GPUs; results were presented at the worker wages, failing to account for the by Irani and Silberman in the letter’s 2015 ACM Genetic and Evolutionary complexities of, say, Mechanical Turk Reference 1: AutoMan never arbitrarily Computation Conference.1 Also, Nvidia as a global labor market. rejects work; pays workers as soon as keeps a list of the many bioinformatics Recent research published in the the work is completed; pays workers applications and tools that run on its proceedings of Computer-Human In- the U.S. minimum wage by default; and parallel hardware. teraction and Computer-Supported automatically raises pay for tasks until After CPU clocks maxed out at 3GHz Cooperative Work conferences by Lil- enough workers agree to take them. Our more than 10 years ago, Moore’s Law ly Irani, David Martin, Jacki O’Neill, experience reflects how much workers pushed 21st century computing to be Mary L. Gray, Aniket Kittur, and others appreciate AutoMan, consistently rating parallel. GPUs and GPU-style many- shows how crowdworkers are not inter- AutoMan-generated tasks highly on core hardware are today at the center changeable cogs in a machine but real Turkopticon, the requester-reputation site. of the leading general-purpose paral- humans, many dependent on crowd- Daniel W. Barowy, Charles Curtsinger, lel computing architectures. Much of work to make ends meet. Designing for Emery D. Berger, and microbiology data processing is inher- workers as active, intelligent partners Andrew McGregor, Amherst, MA ently parallel. Computational biology in the functioning of crowdwork sys- and GPUs are a good match and set to tems has great potential. Two examples continue to grow together. where researchers have collaborated Computational Biology Is Parallel with crowdworkers are the Turkopti- Bonnie Berger et al.’s article “Computa- Reference 3. Langdon, W.B. et al. Improving CUDA DNA analysis st con system, as introduced by Irani and tional Biology in the 21 Century: Scal- software with genetic programming. In Proceedings of 1 which allows crowdwork- ing with Compressive Algorithms” (Aug. the 2015 ACM Genetic and Evolutionary Computation Silberman, Conference (Madrid, Spain, July 11–15). ACM Press, ers to review crowdwork requesters, 2016) described how modern biology New York, 2015, 1063–1070. and Dynamo, as presented by Salehi and medical research benefit from in- et al.,2 which supports discussion and tensive use of computing. Microbiology W.B. Langdon, London, U.K. collective action among crowdworkers. has become data rich; for example, the Both projects demonstrate how crowd- volume of sequence data (such as strings Communications welcomes your opinion. To submit a Letter workers can be treated as active part- to the Editor, please limit yourself to 500 words or less, of DNA and RNA bases and protein se- and send to [email protected]. ners in improving the various crowd- quences) has grown exponentially, par- work marketplaces. ticularly since the initial sequencing of © 2016 ACM 0001-0782/16/11 $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 9 The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/2994590 http://cacm.acm.org/blogs/blog-cacm

none of them seemed to have learned Introducing CS to anything about data representation inside the computer. Furthermore, though many of them were gamers, that Newcomers, and JES experience did not seem to keep them from being surprised that you could As a Teaching Tool come up with ways to represent cards or puzzle pieces inside the computer Valerie Barr gets high schoolers thinking about CS, while in ways that would allow a program to Mark Guzdial mulls the benefits of Jython Environment for Students. reason about those entities in the same ways that a human might. Certainly I do not expect K–12 CS education to get deep into mechanisms for data repre- Valerie Barr My overall goal is to try to get the sentation, but I do expect that today’s A Very Local Snapshot kids to think about problems that are students, who as a cohort carry around of K–12 CS Education not obviously numeric in nature. This devices that hold audio files, image files, http://bit.ly/2cqfRDg allows us to talk about the dual chal- and video files, should be a bit more fac- June 30, 2016 lenge of figuring out a solution to the ile with the idea that other entities, like problem and then figuring out how to cards and hard plastic puzzle pieces, I had an interesting experience recent- get the “data” into the computer. Of could also be represented in some way ly. I agreed to run a session on comput- course, I want things to be fun too! inside the computer. er science for the STEP (Science and I have two aids I typically use: SET (a We wrapped up with a quick over- Technology Entry Program) students visual perception card game) and 3D view of Traveling Salesperson (always at Union College’s Kenney Commu- Squares matching puzzles (unfortu- fun to ask kids where in the world they nity Center. The range of students was nately, no longer available for pur- would like to go) because I like to prove large, from 7th to 12th grade. Usually in chase). These lead us to talk about how to them that, as amazing as computers a session like this I start by asking two a human might solve a problem versus are, there are still problems that com- things. First, in what ways are comput- what we would have to tell the comput- puters cannot solve. That concept also ers already in their lives? Second, what er to do. Then there is that pesky ques- seemed quite surprising to them. kind of problems do they think com- tion about how to get the SET cards or As a snapshot of the state of K–12 puters can help solve? There were lots the puzzle pieces “into” the computer. CS education, my experience was not of interesting answers to both ques- This is where things got interesting. particularly encouraging, but it does tions, and I am always intrigued there I had asked the students, as they in- make a strong case for both the coher- are many computers in their lives that troduced themselves, to tell me what ent curricula and the formal profes- kids (and adults) do not think about. sort of prior experience they had had sional development on which many in It is also fun to get sidetracked by dis- with computers. Just about everyone the CS community are working. I am cussion about user interfaces—go heat one had done something formal either ever optimistic that as I continue to do something up in your microwave, for in school or in a camp setting. Several these sorts of sessions, I will begin to example, and contemplate the fact that of them had programmed in Python, at see evidence of the impact of new cur- you can run it for 99 seconds, but 100 least one knew some Java, and several ricula and better-trained teachers. Stay gets you a minute! had done robotics work, but as a group, tuned!

10 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 blog@cacm

Mark Guzdial In JES, you can only edit one file blue box around the lines in the same 14 Years of a at a time, with an interaction pane (a block as the current line (containing Learner-Centered REPL, http://bit.ly/2cEcZrl) for test- the cursor). Python IDE ing and running code. Most Python Now, when students come to me http://bit.ly/2aNXAnC programs by professionals use mul- with indentation errors, I ask them August 10, 2016 tiple files. We opted to design for the “Where’s the blue box?” And they in- I recently discovered that the latest beginner who could get lost in a sea of variably ask, “What blue box?” I point version of our Python IDE for learn- files. Having more than one file open it out to them. “Ohhhh—that blue ers, JES (Jython Environment for Stu- requires some interface for switching box! Yeah, that’s useful.” Until stu- dents), passed 10,000 downloads (see files. One file means no additional in- dents realize they have to attend to in- the count at http://bit.ly/2cyt1l2); terface. You can build bigger things dentation, they do not see the support 10,562 when I started writing this in JES, but we optimized for the most we are providing. Telling them that in- post. We created JES in 2002 in sup- common learner case. dentation is important and pointing port of our Media Computation course Imagineering Media Computation out the blue box is not nearly as effec- at Georgia Tech, which is a required to be Normal Python: We built JES to tive as encountering the error once. course for students in our colleges of facilitate students programming in Over the years, remove features: Liberal Arts, Business, and Design. Media Computation. Students can The original 2002 JES had more fea- Schools that adopted our Python Me- program anything in JES; it is a full tures in it than the most recent ver- dia Computation textbook have also Python implementation. We added sion. Two examples: mostly adopted JES, or used options additional features to JES to facilitate ˲˲ We used to have a menu item to that support the same libraries, such Media Computation with a minimum automatically turn in homework. It as Pythy or Pyjama. The current down- of cognitive load. worked for a while at Georgia Tech, load count is probably an underes- We imagineered a community but then we changed how we handled timate of users, since some schools of practice around Media Compu- homework. Other schools have always download JES only once and then dis- tation (a story we told in http://bit. had their own mechanisms, and most tribute it across campus. ly/2bRCSmd). In JES, it is normal for schools try to make it easy to turn in Most student Python programmers Python to know how to make pictures homework. We dropped that. use IDLE, the integrated development with makePicture, access individual ˲˲ We worried about students load- environment (IDE) provided with Py- pixels with getPixels, change colors ing a function from their homework thon. While Python is more learner- with setRed, and access sound sam- file, then deleting the source code (by centered than many languages used ples with getSamples. Invisibly, we accident). The program might work by professionals (see “Five Principles load libraries for the student so that correctly for them, but the file would for Programming Languages for JES is a Python that supports program- be incomplete when grading. So, we Learners,” http://bit.ly/2chERzG), the ming with multimedia from the first wrote complicated code that com- IDE should also be tailored for learn- moment of class. In normal Python, pared the internal namespace to the ers. Learners have different goals you can print anything to get its val- source code, but that complicated and needs than professional pro- ue. In JES, you explore any picture or code often caused problems of one grammers. They need programming sound to open an explorer to see color sort. We decided it was not worth a big environments that take a learner- values in pictures or to visualize the maintenance task for a rare error case, centered design approach (http://bit. samples in a sound. so we simply removed all of that com- ly/2cEcQnz), like JES or DrPython. Later on, we tell students there are plicated code. JES is likely one of the most-used libraries and how JES is automatically The lesson here is Yogi Berra’s fa- pedagogical programming environ- loading them. But on Day One, JES is a mous quote, “It’s tough to make pre- ments for Python, and the fact that it is friendly Python that knows about pix- dictions, especially about the future.” still frequently used at the ripe old age els and pictures, sounds and samples, Some of the issues that seemed criti- of 14 suggests that it has been pretty and frames and videos. No imported cal in the beginning simply were not successful. It is probably worth con- libraries, no dots, no extra details. all that important in actual practice sidering why it has worked. As lead on Sometimes, you explain errors after later. The features with more staying the team that built it and maintained they occur: When we first started teach- power were the ones that we added it for the last 14 years, I am not a good ing Python to non-technical majors at in response to learner behavior. Our judge for why it has worked. Georgia Tech, students struggled with most successful features were the Instead, I offer four brief stories indentation errors. Even today in our ones we added to meet real learners’ about JES’s development and mainte- ebooks (http://bit.ly/2bW119J), inden- needs, not what we initially imagined nance over the last 14 years. tation problems in Python are among the needs to be. Keep It Simple: From DrScheme the most common and the most diffi- and DrJava, we took the principle to cult for students to fix. Valerie Barr is a professor in the Computer Science Department at Union College, Schenectady, NY. keep it simple. We did not want an We decided to add a small feature Mark Guzdial is a professor in the College of Computing interface with many options for many to JES to help with indentation er- at the Georgia Institute of Technology, Atlanta, GA. kinds of uses. We wanted an interface rors. We wanted indentation to be sa- that worked very well for learners. lient and obvious. JES draws a small © 2016 ACM 0001-0782/16/11 $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 11 news

Science | DOI:10.1145/2994577 Erica Klarreich Learning Securely Because it is easy to fool, machine learning N must be taught how to handle adversarial inputs.

VER THE PAST five years, machine learning has blos- somed from a promising but immature technology into one that can achieve Oclose to human-level performance on a wide array of tasks. In the near future, it is likely to be incorporated into an in- creasing number of technologies that directly impact society, from self-driv- ing cars to virtual assistants to facial- School bus + tiny adversarial perturbation = “ostrich” recognition software. Yet machine learning also offers brand-new opportunities for hack- ers. Malicious inputs specially crafted by an adversary can “poison” a ma- chine learning algorithm during its training period, or dupe it after it has been trained. While the creators of a machine learning algorithm usually benchmark its average performance carefully, it is unusual for them to con- Dog + tiny adversarial perturbation = “ostrich” sider how it performs against adversar- ial inputs, security researchers say. Adversarial input can fool a machine-learning algorithm into misperceiving images. The emerging field of adversarial machine learning is exploring these of the machine learning algorithm un- the University of California, Berkeley, vulnerabilities. In the past few years, der attack. who has crafted audio files that sound researchers have figured out, for exam- Machine learning can be easy to like white noise to humans, but like ple, how to make tiny, imperceptible fool, computer scientists warn. “We commands to speech recognition al- changes to an image to fool vision pro- don’t want to wait until machine gorithms. “We need to think of the at- cessing systems into interpreting an learning algorithms are being used tacks as early as possible.” image humans see as a school bus as on billions of devices, and then wait an ostrich instead. Such deceptions of- for people to mount attacks,” said Attacking the Black Box ten can be carried out with virtually no Nicholas Carlini, a graduate student Adversarial machine learning has been

knowledge about the inner workings in adversarial machine learning at studied for more than a decade in a few ET AL. SZEGEDY CHRISTIAN BY IMAGES

12 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 news security-related settings, such as spam filtering and malware detection. How- ACM ever, the advent of deep neural net- “Only recently has works has greatly expanded the scope machine learning Member both of the potential applications of machine learning and the attacks that become accurate can be carried out upon it. enough for it to be News Until very recently, it did not make SEEKING DISTRIBUTED AI much sense to study how to trick a neu- interesting when “I was 14 years ral network, said Ian Goodfellow, an it’s inaccurate.” old in 1980 and adversarial machine learning expert a friend said the local Tandy at OpenAI, a research institute in San store was Francisco. “Five years ago, if I said ‘hey, selling Radio I found an example that my neural net- Shack computers, and I didn’t believe work misclassifies,’ that would be the him,” recalls Michael rule, not the exception,” he said. “Only a long time, and the output will be say- Wooldridge, head of the recently has machine learning become ing more and more strongly that it’s a Department of Computer accurate enough for it to be interesting cat,” Goodfellow said. “This says that Science at the University of Oxford, where he also serves as when it’s inaccurate.” neural networks can extrapolate in re- a computer science professor. A paper posted online in 2013 ally extreme ways.” Subtle adjustments Wooldridge and his buddy went launched the modern wave of adver- to pixels that nudge an image along to have a look, and started sarial machine learning research by this “cat” direction can make the neu- hanging around the electronics store. “We persuaded the guys showing, for three different image pro- ral network mistake the image for a cat, in the shop to let us write cessing neural networks, how to cre- even if to a human being, the altered programs on the computer in ate “adversarial examples”—images image looks indistinguishable from the window, and within a month I had written my first that, after tiny modifications to some the original. program. I knew that was what I of the pixels, fool the neural network At the same time, linearity does not wanted to do.” into classifying them differently from explain all the vulnerabilities of ma- Wooldridge went on to the way humans see them. Subsequent chine learning algorithms to adver- earn an undergraduate degree in computer science in 1989, papers have created many more such sarial examples. Last year, for instance, specializing in networks and adversarial examples—for instance, three researchers at the University of artificial intelligence (AI). He one that forced a deep neural network California, Berkeley—Alex Kantch- was interested in becoming a research scientist, and it to classify what humans see as a “stop” elian, Doug Tygar, and Anthony Jo- occurred to him that you sign instead as a “yield” sign. seph—showed a highly nonlinear ma- could put networks and AI While it is not clear whether such chine learning model called “boosted together and have something examples could be used in the real trees” is also highly susceptible to ad- that constituted distributed artificial intelligence. “I world to attack self-driving cars, “it’s versarial examples. “It’s easy to design wondered what that might look important for the designers of self-driv- evasions for them,” Kantchelian said. like, and what the potential ing cars to understand that one subsys- The earliest adversarial examples applications might be.” tem might totally fail because of one were concocted by researchers who Driven by curiosity, Wooldridge earned a Ph.D. adversarial example,” said Goodfellow, had full access to the machine learn- in distributed artificial one of the authors of the 2013 paper. ing model under attack. Yet even with intelligence from the University “It highlights the importance of using those first examples, researchers start- of Manchester. He became a full professor at the University several redundant safety systems, and ed noticing something strange: ex- of Liverpool in 2000, and joined making sure an attacker can’t compro- amples designed to fool one machine the University of Oxford as a mise them all at the same time.” learning algorithm often fooled other professor of computer science The reason deep neural networks machine learning algorithms, too. in 2012. Today, Wooldridge, an can be fooled by adversarial examples, That means an attacker does not ACM Fellow, is focused on Goodfellow said, is that while these necessarily need access to a machine the intersection of logic, complex networks are made up of learning model’s architecture or train- computational complexity, and game theory. “My main many layers of processors, each indi- ing data to attack it. As long as the at- research interests are in the vidual layer essentially uses a piece- tacker can query the model to see how behavior of network systems— wise linear function to transform the it classifies various data, he or she can where network nodes have input. This linearity makes the model use that data to train a different mod- their own preferences, goals, and agendas—and using prone to overconfidence about its in- el, build adversarial examples for that techniques from model ferences, he said. model, then use those to trick the origi- checking together with ideas “If you find a direction to move in nal model. from game theory to study that makes the output more likely to Researchers do not fully under- these systems.” —John Delaney say the image is a cat, then you can stand why adversarial examples trans- keep moving in the same direction for fer from one model to another, but

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 13 news

they have confirmed the phenomenon makes it easy to miss vulnerabilities. ability vectors instead of single labels, in increasingly broad settings. In a Now, he warned, the machine learn- and then trains a new neural network paper posted online in May, Goodfel- ing community is poised to make the using the probability vector labels; this low, together with Nicolas Papernot same mistake: machine learning is a more nuanced training makes the sec- and Patrick McDaniel of Pennsylvania huge field, but only a tiny slice of the ond neural network less prone to over- State University, showed that adver- community is focused on security. fitting. Fooling such a neural network, sarial examples transfer across five “We should be developing security in the researchers showed, required of the most commonly used types of from the start,” he said. “It shouldn’t eight times as much distortion to the machine learning algorithms: neural be an afterthought.” image as before the distillation. networks, logistic regression, support The good news is that adversarial ex- The work done so far is just a start, vector machines, decision trees, and amples do not just offer a way to bench- Tygar said. “We haven’t even begun to nearest neighbors. mark how vulnerable a machine learn- understand all the potential different The team carried out “black box” at- ing model is; they also can be used to environments in which you might have tacks—with no knowledge of the mod- make the model more robust against an attack” on machine learning. The el—on classifiers hosted by Amazon an adversary and, in some cases, even field of adversarial machine learning is and Google. They found after only 800 improve its overall accuracy. full of opportunities, he said. “The area queries to each classifier, they could Some researchers are making ma- is ready to move.” create adversarial examples that fooled chine learning algorithms more robust Although Tygar thinks it is a near the two models 96% and 89% of the by essentially “vaccinating” them: add- certainty we will start seeing more at- time, respectively. ing adversarial examples, correctly la- tacks on machine learning as its use Not every adversarial example beled, into the training data. In a 2014 becomes prevalent, it would be a mis- crafted on one machine learning paper, Goodfellow and two colleagues take, he said, to conclude machine model will transfer to a different tar- demonstrated in a classification task learning is too risky to be used in ad- get, Papernot said, but “for an adver- involving handwritten numbers that versarial settings. Rather, he said, sary, sometimes just a small success this kind of training not only made it the question should be, “How can we rate is enough.” harder to fool a neural network, but strengthen machine learning so it is even brought down its error rate on ready for prime time?” Getting Ready for Prime Time non-adversarial inputs. Kantchelian, Human vision systems have adversar- Tygar, and Joseph have shown a simi- Further Reading ial examples of our own, more com- lar vaccination process can greatly im- monly known as optical illusions. For prove the robustness of boosted trees. Goodfellow, I., Shlens, J., and Szegedy, C. the most part, though, people process Most of the research on adversarial Explaining and Harnessing Adversarial Examples visual data remarkably effectively. As machine learning has focused on “su- http://arxiv.org/pdf/1412.6572v3.pdf computer vision systems approach pervised” learning, in which the algo- Kantchelian, A., Tygar, J. D., and Joseph, A. human-level performance on particu- rithm learns from labeled data. Adver- Evasion and Hardening lar tasks, adversarial examples offer a sarial training offers a potential way for of Tree Ensemble Classifiers new way to benchmark performance, machine learning algorithms to learn http://arxiv.org/pdf/1509.07892.pdf besides measuring how well an algo- from unlabeled data—an exciting pros- Miyato, T., Dai, A., and Goodfellow, I rithm performs on typical inputs. Ad- pect, Goodfellow said, since labeled Virtual Adversarial Training for versarial inputs “help find the flaws in data is expensive and time-consuming Semi-Supervised Text Classification neural networks that do really well on to create. In a paper posted online in http://arxiv.org/pdf/1605.07725v1.pdf the more traditional kinds of bench- May, Takeru Miyato of Kyoto Univer- Papernot, N., McDaniel, P., marks,” Goodfellow said. sity—along with Goodfellow and An- Wu, X., Jha, X., and Swami, A. Distillation as a Defense to “In a perfect world, what I would drew Dai of Google Brain—was able to Adversarial Perturbations against like to see in papers is, ‘Here is a new improve the performance of a movie Deep Neural Networks machine learning algorithm, and here review classifier by taking unlabeled Proceedings of the 37th IEEE Symposium on is the standard benchmark for how reviews, creating adversarial versions Security and Privacy, May 2016. it does on accuracy, and here is the of them, and then teaching the classi- Papernot, N., McDaniel, P., and Goodfellow, I. standard benchmark on how it per- fier to group those reviews in the same Transferability in Machine Learning: forms against an adversary,” Carlini category. With the plethora of texts from Phenomena to Black-Box Attacks said. Adversarial machine learning ex- available on the Internet, “you can get using Adversarial Samples http://arxiv.org/pdf/1605.07277v1.pdf perts have some leads on what such a as many examples for unlabeled learn- benchmark should look like, Papernot ing as you want,” Goodfellow said. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., said, but no such benchmark has been Meanwhile, in another paper pub- and Fergus, R. established yet. lished in May, Papernot, McDaniel, Intriguing Properties of Neural Networks Software developers have a history and other researchers detail the cre- https://arxiv.org/pdf/1312.6199v4.pdf of adding security to their products ation of another defense against adver- after the fact rather than integrating sarial examples, called “defensive dis- Erica Klarreich is a mathematics and science journalist based in Berkeley, CA. it into the development phase, Car- tillation.” This approach uses a neural lini said, even though that approach network to label images with prob- © 2016 ACM 0001-0782/16/11 $15.00

14 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 news

Technology | DOI:10.1145/2994581 Sarah Underwood Blockchain Beyond Bitcoin Blockchain technology has the potential to revolutionize applications and redefine the digital economy.

LOCKCHAIN TECHNOLOGY HAS attracted attention as the basis of cryptocurren- cies such as Bitcoin, but its capabilities extend far be- Byond that, enabling existing technol- ogy applications to be vastly improved and new applications never previously practical to be deployed. Also known as distributed ledger technology, blockchain is expected to revolutionize industry and commerce and drive economic change on a global scale because it is immutable, trans- parent, and redefines trust, enabling secure, fast, trustworthy, and trans- parent solutions that can be public or private. It could empower people in developing countries with recognized identity, asset ownership, and financial inclusion; and it could avert a repeat of the 2008 financial crisis, support ef- fective healthcare programs, improve supply chains and, perhaps, clean up unethical behavior in high-value busi- nesses such as diamond trading. Blockchain, like the Internet, is an open, global infrastructure that allows block of data is added to the chain and the second Internet, personal comput- companies and individuals making shared by all on the network. Transac- ers, and local area networks. The third transactions to cut out the middle- tions are secure, trusted, auditable, platform delivers computing anywhere, man, reducing the cost of transactions and immutable. They also avoid the immediately, and allows organizations and the time lapse of working through need for copious, often duplicate, doc- to deploy and consume computing re- third parties. The technology is based umentation, third-party intervention, sources in shared communities. on a distributed ledger structure and and remediation. Says Versace, “The core capabilities consensus process. The structure al- Blockchains can be either public of the third platform of technology are lows a digital ledger of transactions to and unpermissioned, allowing any- beyond any we have seen before. Innova- be created and shared between distrib- body to use them (bitcoin is a case in tion accelerators like blockchain mean uted computers on a network. The led- point) or private and permissioned, we can achieve technology value out- ger is not owned or controlled by one creating a closed group of known par- comes that we couldn’t achieve before.” central authority or company, and can ticipants working, perhaps, in a partic- This is promising, but there are be viewed by all users on the network. ular industry or supply chain. caveats. Sandeep Kumar, managing When a user wants to add a trans- Michael Versace, global research di- director of capital markets and a block- action to the ledger, the transaction rector for digital strategies at research chain specialist at digital business con- data is encrypted and verified by other firm IDC, describes blockchain as an sulting and technology services firm computers on the network using cryp- industry and innovation accelerator Synechron, names data privacy, scal- tographic algorithms. If there is con- based on the capability of the third plat- ability, and interoperability as three sensus among the majority of comput- form of technology—the first platform key challenges to blockchain technol-

IMAGE BY IMAGENTLE BY IMAGE ers that the transaction is valid, a new being mainframes and their networks, ogy that are pervasive across applica-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 15 news

technology and building the founda- Data from Border Devices’ project. tion of a standardized, production- Everledger’s focus is on the identity grade digital ledger. and legitimacy of objects. Blockchain Benefits of Deloitte is working with clients and works well here because its history can- startups to develop solutions includ- not be changed and it enables trust by Blockchain ing Smart Identity, which can support consensus. The company’s initial work in Financial banks’ regulatory client onboarding and provides a distributed ledger of dia- Know Your Customer (KYC) processes, mond ownership and transaction his- Services while individual financial institutions, tory verification for owners, insurance insurance companies, exchanges, and companies, claimants, and law en- Secure transactions ˲˲ Avoid information leakage solutions vendors also have thrown forcement agencies. The system assists ˲˲ Reduce transaction time their weight behind blockchain. with prevention of fraud in the supply ˲˲ Remove transaction intermediaries Many are taking advantage of the chain, but also helps consumers decide ˲˲ Reduce risk of fraud and cybercrime ˲˲ Observe transactions in real time technology’s ability to act as a giant whether to buy particular diamonds. time stamp. Nasdaq is using its Linq Leanne Kemp, founder and CEO Source: IBM blockchain technology to complete of Everledger, explains, “The ultimate and record private securities trans- goal is to track diamonds from mine actions, and the Depository Trust & to market, so that consumers can see if Early Clearing Corporation, working with correct duties and taxes have been paid market participants and technology and whether a diamond is a ‘blood dia- Adopter firm Axoni, is managing post-trade mond’ that has been mined and traded events for credit default swaps. Regula- in a war zone and contributed to hu- Views on tors are also interested in the technol- man atrocity.” The company also is ogy, as its transparency and integrity considering applying its technology to Blockchain allow market activity to be monitored other big-ticket items, such as fine art, ˲˲ Platform openness is required. in real time. vintage cars, and wine. ˲˲ Features like identity, privacy, These early applications show great In addition, blockchain is expected security, operations management, potential, but there are problems to be well suited, with the addition of and interoperability need to be integrated. around data privacy, scale and latency smart contracts that use computerized ˲˲ Performance, scale, support, and in financial markets. The privacy issue transaction protocols to execute the stability are crucial. is about how much information needs terms of contracts agreed by users of ˲˲ Consortium blockchains, which are permissioned networks on which to be exposed to verify a transaction. a blockchain, to applications such as consortium members may execute This could be more than at present product manufacturing, supply chain contracts, are ideal. and could compromise the privacy of management, vehicle provenance, and a trade. Scale and latency are also is- Source: Microsoft sharing resources such as electricity. sues in a market managing huge data Emin Gün Sirer, an associate pro- volumes. These problems are being fessor of computer science at Cor- tions and have not yet been solved addressed by industry consortia and nell University and a participant in a cleanly. Other sticking points are data individual firms, but robust solutions number of blockchain projects, says transfer, and integrating with exist- remain elusive. blockchain could democratize the in- ing systems and sometimes security, which depends on application coding. Commercial Applications In the commercial world, two startups The financial Financial Applications making progress with blockchain are The financial services sector, which Factom and Everledger. services sector must innovate to cut the costs of leg- Factom’s focus is on securing data. takes advantage of acy systems and manage increasing The company is participating in the regulation, is leading the way with Honduran land registry project and blockchain’s security, blockchain and taking advantage of working on a number of projects in immutability, the technology’s security, immutabil- China, including data infrastructure ity, transparency, and ability to cut for 80 smart cities, financial technol- transparency, out the middleman. Fintech startup ogy solutions, and integrating block- and ability to cut out R3, backed by over 40 global banks, chain technology with electronic data is developing a standardized architec- notarization services to enhance integ- the middleman. ture for private ledgers that could sig- rity in information management. nificantly cut the cost and time of set- The company has also secured fund- tling transactions. Similarly, the Linux ing from the U.S. Department of Home- Foundation’s Hyperledger project is land Security’s Science and Technol- an industry initiative including tech ogy Directorate under the ‘Blockchain giant IBM that is evolving open source Software to Prove Integrity of Captured

16 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 news surance industry by using smart con- ment or livestock, and provide access tracts to pay out against insurance to working capital and, by extension, a policies without policyholders having The potential wider market. to make a claim. He adds: “The Inter- of blockchain is Digital identity enabled by block- net of Things could be an enormous chain has the potential to change lives. application area where people want diverse in Says Dahan, “If blockchain technol- to communicate with devices, but not developing countries, ogy can be used to secure robust, self- through intermediaries. There is no sovereign digital identities around killer app yet, but it is likely to feature where the initial personal data, there’s a real possibility the transparency of blockchain.” focus is on the that people in places with poor docu- While start-ups can skip some of the ments, registries and rules of law can challenges presented by blockchain trust element. establish trusted measures of their technology, established firms must good reputation. This would allow set up a network of blockchain partici- them to assert who they are and access pants, perhaps suppliers and custom- proof of their digital identity anywhere ers, and agree on technology protocols. using a private key.” Commercial firms, like others, will also With the benefit of digital identity, hit the interoperability barrier identi- ten in short supply.” many of the world’s two billion un- fied by Synechron and by Microsoft in Dahan suggests the trust element of banked individuals could store their feedback from early blockchain adopt- blockchain will play well into the 2030 identities on a blockchain, permis- ers. Kumar explains: “Blockchain is Sustainable Development Goals ad- sion banks to fulfill regulatory require- evolving in many ecosystems, such as opted by U.N. members in 2015 and de- ments such as Know Your Customer, Hyperledger and Ethereum, but there signed to end poverty, protect the plan- and gain access to bank accounts, needs to be a native way to integrate et, and ensure prosperity for all. More loans, and other financial services pre- blockchains that would allow, for ex- specifically, she notes high-potential viously inaccessible to them. ample, a transaction on Hyperledger to applications of blockchain in land regis- The potential of blockchain to revo- invoke information from Ethereum.” tration, digital identity, and finance for lutionize applications and drive global Sirer warns of less-advantageous ap- small and medium-sized enterprises. economic change is certainly there, but plications such as gambling and on- Land registration and awareness problems persist in wide-scale execu- going security problems. He cites the of its relevance to issues such as food tion. As Kumar concludes: “Blockchain spectacular rise and fall of The DAO, security, climate change, urbaniza- is not yet ready for prime time.” a distributed autonomous organiza- tion, and indigenous people’s rights tion based on Ethereum technology has increased over recent years, yet the Further Reading that acted as an investment vehicle, Independent Evaluation Group of the raising $220 million, then swiftly World Bank says 70% of the world’s The future of financial infrastructure: losing $53 million to a hacker. “We An ambitious look at how blockchain population lacks access to proper land can reshape financial services, looked at the DAO code and found it titling or demarcation. World Economic Forum, was written so badly it was open to Beginning to solve this problem https://www.weforum.org/reports/the- attack from nine different angles. In- are projects like one in the Republic of future-of-financial-infrastructure-an- cidents like this uncover the need for Georgia, where the National Agency of ambitious-look-at-how-blockchain-can- reshape-financial-services/ more multi-disciplinary research on Public Registry is working with BitFury blockchain technology.” on a pilot project that will use a trans- Cuomo, J. parent, secure ledger to manage land How Businesses and Governments Can Capitalize on Blockchain, Developing Countries titles and, if successful, cut property http://www.ibm.com/blogs/ The potential of blockchain is also registration fees by up to 95%, increase think/2016/03/16/how-businesses-and- diverse in developing countries, but transparency of land ownership, and governments-can-capitalize-on-blockchain/ where the commercial world is con- reduce fraud. A similar project partially Sirer. E.G. centrating on outstanding technology funded by the World Bank is being de- Introducing Virtual Notary challenges, developing countries are veloped in Honduras, where Factom is Hacking, Distributed initially focusing on the trust element working with the government to proto- http://hackingdistributed.com/2013/06/20/ virtual-notary-intro/ of blockchain. type a blockchain-based land registry. Mariana Dahan, senior operations Beyond land registration, Dahan Casey, M., and Dahan, M. Blockchain technology: Redefining trust officer at the World Bank in charge explains how the ability to store and for a global, digital economy of the 2030 development agenda and update property titles on a blockchain http://blogs.worldbank.org/ic4d/blockchain- United Nations (U.N.) relations, says, could, for the first time, allow poor technology-redefining-trust-global-digital- “We believe blockchain is a major people to assert reliable title claims economy?cid=EXT_WBBlogSocialShare_D_EXT breakthrough and has great potential. to their homes and use them as col- It will make an impact on, and bring lateral for borrowing. Small and medi- Sarah Underwood is a technology writer based in Teddington, U.K. value to, any transaction that requires um-sized enterprises also could prove trust, a social resource that is all too of- ownership of assets, perhaps equip- © 2016 ACM 0001-0782/16/11 $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 17 news

Society | DOI:10.1145/2994579 Tom Geller Farm Automation Gets Smarter As fewer people work the land, robots pick up the slack.

IELD FARMING IS “the world’s oldest profession,” and not just because food plants have been cultivated for over 10,000 years. Its individual practitio- Fners are old as well, the median age ris- ing rapidly as young people abandon the farming lifestyle (the U.S. Department of Agriculture reports a median age of 58 in 2012, up from 55 in 2002, with other countries showing similar data). Those who remain face the same repetitive work of seeding, weeding, feeding, and harvesting, the tedium of each task in- The BoniRob is a multipurpose robotic platform for agricultural applications featuring creasing as farms grow ever larger. independently steerable drive wheels and adjustable track width. However, today’s agricultural robots excel at repetitive tasks, letting farmers next door has cheaper labor” said Salah field, the wind blows leaves around, tend to more strategic matters. Sukkarieh, director of Research and In- lighting conditions change, you get “When we develop robots, people novation at the Australian Centre for a bit of dust; these all affect the algo- always say they should increase yields Field Robotics (ACFR) at the University rithm. We try things out in the lab, but or lower costs, but that’s not always the of Sydney. then have to play with them because case,” says Eldert van Henten, a profes- Food security is one of the driving they don’t work as well outdoors, in an sor in Wageningen University’s Farm forces behind “The Vegetable Factory,” unstructured environment.” Technology Group. “Robots that offer an indoor farm that the Japan-based Traditional farm equipment also the farmer time to devote attention to company SPREAD Co. Inc. plans to adds unpredictability to the mix, as business and management also have open in Kyoto next year. In addition Autonomous Tractor Corporation economic value.” to computer-controlled lighting, tem- (ATC) president and CEO Kraig Schulz Allen Lash, CEO of the farm manage- perature, and humidity, virtually every learned when his company started ment company Family Farms Group, aspect of a plant’s development from planning to build an autonomous trac- sees agricultural robots filling in for the seedling to harvest will be automated. tor. “Traditional [drive train] vendors shrinking agricultural workforce. “In a “We expect automation to reduce hu- couldn’t deliver what we needed, so lot of our small, rural communities, we man error, therefore stabilizing the we turned to electric drive trains, like just don’t have adequate labor to run quality and quantity of production,” Google and Tesla have for their cars. A big, sophisticated equipment anymore,” says SPREAD global marketing man- traditional drive train is hard to control; he says. “With autonomous equipment, ager J.J. Price. “Such ‘plant factories’ there’s a lot of mechanical nonsense you’ll need less labor, although the peo- will increase in the future to cover ag- between the driver and the ground. But ple you have will need more computer ricultural issues, food shortages, and I can control an electric motor to milli- technology skills. They’ll control equip- environmental problems.” seconds and fractions of an inch.” The ment remotely to a great extent, from a company later released its “eDrive” sys- control room, scanning at least three Command and Control tem as an electric engine replacement cameras all the time. So we’ll get more Farm automation is particularly diffi- for tractors already in the field. acres covered in a more efficient way; cult because the environment varies so As another step toward autono- maybe several thousand acres per em- much—especially when compared to mous farming, ATC next supplement- ployee, instead of today’s 1,000.” clean labs. “Computer scientists need ed traditional GPS navigation with a Proponents believe robotic agricul- to get outdoors more,” said ACFR’s local laser-and-radio system named ture also will help ensure food is avail- Sukkarieh. “A lot of the [computer vi- AutoDrive. “If you talk to farmers, able for everyone, everywhere, from sion] algorithms get developed on very they’ll say that GPS is good, but far local sources. “Every country is now wor- structured data, like a camera focus- from perfect. They’re in very rural ar-

ried about food security, and the country ing on objects in a kitchen. But in the eas, where reception isn’t great, and OF BOSCH DEEPFIELD ROBOTICS COURTESY IMAGE

18 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 news

GPS tends to suffer from problems cations to analyze the images for things that perceive its own situation; intel- like sunspots, reflections, and tree like disease.” ligence and control mechanism for de- coverage,” says Schulz. “A lot of driv- A different approach to weeding can termining and selecting the appropriate erless cars use optics and radar and be found in “Lettucebot,” a product action by itself; and driving mechanisms such, because there’s stuff around from U.S.-based Blue River Technology for achieving what it has determined.” them to see. When you’re in a field, that puts its intelligence in the imple- Blue River Technology’s Chostner there’s nothing to see! So you need a ment, combining vision technology with says, “A robot sees, decides, and acts. second or third source of data to get microspraying to apply herbicides to Then it closes that loop by seeing how it relative positioning in the field.” The weeds and overplanted lettuce sprouts. acted and adjusting its future actions.” company is now working on a system According to the company, Lettucebot On the other hand, Chostner says, “We’d that can be retrofit to any tractor with can identify 1.5 million plants per hour, much rather focus on the simplest tool its eDrive, adding AutoDrive and its weeding and thinning a 40-acre field that will create value for our customers, own “FieldSmart” artificial intelli- within a day. For vice president of busi- rather than the ultimate ‘robot’ that is gence software. ness development Ben Chostner, speed the company vision. Robots are always is only one benefit to this precision ap- something out in the future; automation An “App” for Planting proach. “Most inputs or chemicals on is something that’s here today, creating Another challenge for autonomous the farm are applied in a broadcast way, value. We found that farmers don’t like farming is the variety of tasks that agri- with hardly any sensors. It’s like if some- to buy robots; they like to buy machines culture demands. The hardest, accord- one in San Francisco had an infection, that work.” ing to Wageningen University’s van and the only solution was to give every- Except for limited tasks by “machines Henten, is selective harvesting. “Ro- one an antibiotic; that would solve the that work,” fully automated growing may bots can do the trick now at relatively problem, but it would be very expensive be years away. Kubota’s Iida believes the slow speeds, and without full success,” and inefficient, and would lead to nega- technology is now in the second of four he says. “We still have a long way to go. tive consequences like resistance. That’s stages: we have achieved automation of But humans excel in terms of eye-hand the same thing that’s happening with individual functions and auto-steering coordination, intelligence, flexibility, weed control today, because those are under operator control, but have yet to and robustness against variation in the the tools we have.” master unguided operation and fully au- environment. So harvesting is done by ATC’s Schulz looks forward to the tonomous farming. humans very effectively.” day that weeding, planting, and harvest- Sukkarieh believes economic chang- One problem, he says, is that “Plants ing are all treated as “apps” that a single es have cleared the way to reach that fi- that are members of the same type can machine (with specialized attachments) nal phase. “Farming is probably the last differ a lot due to a lack of water, light, could learn to perform through artificial major primary industry to focus on auto- or nutrition.” Switching from one vari- intelligence (AI) techniques. “The com- mation, and that’s because of the cost,” ety to another is also difficult. “Consider puter has to become a farmer just like a he says, “but technology costs have the tomato; usually a tomato is red, but person becomes a farmer, over time, by dropped dramatically. Even places with there are some that are yellow. There learning. That’s where AI comes in. The cheaper labor are considering agricul- are also different shapes, and beefsteak driver should think of the tractor riding tural automation. I think over the next and cherry tomatoes have very different along like a son or daughter: it observes year or two, you’ll see a large amount of sizes. But the robot still has to recognize what the farmer is doing and thinks, activity happening in this space.” them as tomatoes.” ‘this is what I have to replicate.’ The trac- A somewhat simpler task is weed- tor will start with small tasks and grow Further Reading ing, which is the bailiwick of the French competence over time. Its tillage ‘app’ company Naïo Technologies. “We start- is very different from a planting ‘app’ Robotics & Automation Society, Technical ed with weeding because it’s very ardu- Committee on Agricultural Robotics and because the task has fundamentally dif- Automation, IEEE: http://www.fieldrobot. ous for farmers,” says Julien Laffont, ferent controls, but the learning process com/ieeeras/ the company’s international business could be similar.” Past IEEE webinars on agricultural robotics: developer, “and it’s a repetitive task, so http://www.fieldrobot.com/ieeeras/Events.html it’s perfect for robots.” The company’s From Automation to Automatons Video of Lettucebot: https://www.youtube. currently available “Oz” system uses two Kubota Corp. director and senior man- com/watch?v=yCPe9Sy0TFY cameras and a laser to guide itself be- aging executive officer Satoshi Iida dif- tween rows of crops as it drags a weeding ferentiates modern, computer-driven Robohub focus on agricultural robotics: http://robohub.org/tag/robohub-focus-on- tool behind it. agents from centuries-old farming tech- agricultural-robotics/ Oz controls only the robot, not the nology like the seed drill. “An agricultur- Australian Centre for Field Robotics tool it drags behind. Laffont said the al machine can be defined as a ‘robot’ (Agriculture Projects): http://sydney.edu.au/ company’s future “Dino” robot also when it has the three technology com- acfr/agriculture will control the tools themselves, let- ponents,” says Iida, who also is general ting them do more than simply avoid manager of Kubota’s Research and De- Tom Geller is an Oberlin, Ohio-based writer and harming crops. “We’ll use cameras that velopment (R&D) headquarters, and of documentary producer. analyze the image to detect, circle, and the company’s Water and Environment count crops. We might also have appli- R&D function. “It must have sensors © 2016 ACM 0001-0782/16/11 $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 19 viewpoints

DOI:10.1145/3000606 Roger R. Schell VPrivacy and Security Cyber Defense Triad for Where Security Matters Dramatically more trustworthy cyber security is a choice.

N THE EARLY days of computers, confident that from the standpoint of security was easily provided technology there is a good chance for by physical isolation of ma- The security problem secure shared systems in the next few chines dedicated to security do- will remain as long years. However, from a practical stand- mains. Today’s systems need as manufacturers point the security problem will remain Ihigh-assurance controlled sharing as long as manufacturers remain com- of resources, code, and data across remain committed mitted to current system architectures, domains in order to build practical produced without a firm requirement systems. Current approaches to cyber to current system for security. As long as there is support security are more focused on saving architectures, for ad hoc fixes and security packages money or developing elegant techni- for these inadequate designs, and as cal solutions than on working and produced without long as the illusory results of penetra- protecting lives and property. They a firm requirement tion teams are accepted as a demon- largely lack the scientific or engi- stration of computer system security, neering rigor needed for a trustwor- for security. proper security will not be a reality.”8 thy system to defend the security of networked computers in three dimen- Current Approaches sions at the same time: mandatory ac- Aren’t Working cess control (MAC) policy, protection Our confidence in “security kernel” against subversion, and verifiability— technology was well founded, but I what I call a defense triad. development processes were neither never expected decades later to find Fifty years ago the U.S. military rec- scalable nor sustainable for advanc- the same denial of proper security so ognized subversiona as the most seri- ing computer technology and growing widespread. Although Forbes reports ous threat to security. Solutions such threats. In a 1972 workshop, I pro- spending on information security as cleared developers and technical posed “a compact security ‘kernel’ of reached $75 billion for 2015, our ad- the operating system and supporting versaries are still greatly outpacing us. hardware—such that an antagonist With that large financial incentive for a As characterized by Anderson, et al.,2 “System subversion involves the hiding of a software or could provide the remainder of the sys- vested interests, resources are mostly hardware artifice in the system that creates a tem without compromising the protec- devoted to doing more of what we knew ‘backdoor’ known only to the attacker.” tion provided.” I concluded: “We are didn’t work then, and still doesn’t.

20 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 V viewpoints

Why does cyber security seem so remaining hole. Even worse, a witted of “malware,” a preferred attack for difficult? Today’s emphasis on sur- adversary has numerous opportunities many of the most serious breaches. veillance and monitoring tries to to subvert or sabotage a computer’s An IBM executive a few years ago de- discover that an adversary has found protection software itself to introduce scribed the penetrate-and-patch cycle and exploited a vulnerability to pen- insidious new flaws. This is an example as “an arms race we cannot win.”5 etrate security and cause damage—or worse, subverted the security mecha- Figure 1. Cyber security defense triad. nism itself. Then that hole is patched. But science tells us trying to make a system secure in this way is effectively non-computable. Even after fixing known flaws, uncountable flaws re- Secure Systems • Subversion is tool for choice for witted adversary main. Recently, Steven Lipner, for-

• Only label-based MAC policy can enforce secure merly of Microsoft, wrote a Commu- AC Mandatory

Verifiability information flow

nications Privacy and Security column Limit Subversion advocating technical “secure develop- • Security kernel (reference monitor) is only known verifiable protection technology ment processes.”6 But, similar to sur- veillance, “as new classes of vulner- abilities … are discovered, the process must be updated.” This paradigm has for decades been known as “penetrate and patch.” The defender needs to find and patch most (if not all) of the holes, while the adver-

IMAGE BY ALICIA KUBISTA/ANDRIJ BORYS ASSOCIATES BORYS ALICIA KUBISTA/ANDRIJ BY IMAGE sary only needs to find and exploit one

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 21 viewpoints

Figure 2. Reference monitor. These systems did not survive long after the end of the Cold War, and ˲˲ A subject security attributes, much of the “institutional memory” for example, label for clearance is now lost. But fortunately, some ˲˲ Object security attributes, for example, label for security kernel products were main- classification tained and this original equipment manufacturer (OEM) technology is Authorization Database still commercially available today. And many commodity processors (for example, those that implement the Intel IA32 architecture) still include the hardware segmentation and pro- tection rings essential to efficient se- Subjects Reference Monitor Objects curity kernels. High assurance of no security patches is truly a paradigm ˲˲ Active entities ˲˲ Segments shift. What alternative approach ˲˲ User Processes ˲˲ Directories ˲˲ Gain access to ˲˲ Passive data comes close? information repositories Mark Heckman of the University on user’s behalf Audit Trail of San Diego and I recently published a paper focused on techniques for ˲˲ Record of all security- applying those Reference Monitor related user events properties, leveraging the fact that “associated systematic security engi- neering and evaluation methodology Cyber Defense Triad assurance it always works correctly. was codified as an engineering stan- for Secure Systems These three fundamental properties dard in the Trusted Computer System All three defense triad components are are directly reflected in the cyber de- Evaluation Criteria (TCSEC)”4 created critical for defense of both confiden- fense triad. by NSA. However, the TCSEC didn’t in- tiality and integrity of information— As illustrated in Figure 2, a Refer- clude administration and acquisition whether the sensitive information is ence Monitor controls access by sub- mandates to actually use this knowl- personally identifiable information, jects to information in objects. A secu- edge to create a market in the face of financial transactions (for example, rity kernel is a proven way to implement entrenched vested interests. I refer in- credit cards), industrial control sys- a reference monitor in a computer. terested readers to our paper for more tems in the critical infrastructure, or Whenever a user (or program acting on details on the triad components sum- something else that matters. Although behalf of a user) attempts to access in- marized here. not sufficient for perfect security, all formation in the computer system, the ˲˲ Mitigating software subversion. three are practically necessary. These Reference Monitor checks the user’s Several cyber security professionals dimensions can be thought of as three clearance against a label indicating the have concluded that subversion “is the strong “legs of a stool,” as illustrated in sensitivity of that class of data. Only au- attack of choice for the professional Figure 1. thorized users are granted access. attacker.”2 The primary means for Security for cyber systems built software subversion are Trojan horses without a trustworthy operating sys- Applying the Cyber Defense Triad and trap doors (commonly called mal- tem (OS) is simply a scientific impos- The flawed foundation of current sys- ware). Under the seven well-defined se- sibility. NIST has emphasized, “se- tems is evident in the unending stream curity classes in the TCSEC, only Class curity dependencies in a system will of OS security patches that are today A1 systems substantially deal with the form a partial ordering … The partial considered part of best practices. But problems of subversion. ordering provides the basis for trust- we can choose a better alternative. ˲˲ Mandatory access control (MAC) worthiness reasoning.”3 Proven sci- At least a half-dozen security kernel- policy. The reference monitor is fun- entific principles of the “Reference based operating systems have been damentally about access control. All Monitor” model enable engineering a produced that ran for years (even de- access control policies fall into two verifiably secure OS on which we can cades) in the face of nation-state ad- classes: Discretionary Access Control build secure cyber systems. versaries without a single reported (DAC) and MAC. Only a label-based For a Reference Monitor implemen- security patch.7 These successes were MAC policy can, with high assurance, tation to work, it must ensure three not unexpected. As a 1983 article put it, enforce secure information flow. Even fundamental properties. First, it must “the security kernel approach provides in the face of Trojan horses and other validate enforcement of the security controls that are effective against most forms of malicious software, MAC policy for every reference to informa- internal attacks—including some that policies can protect against unau- tion. Second, it must be tamper-proof, many designers never consider.”1 That thorized modification of information that is, it cannot be subverted. Lastly, is a fundamentally different result than (integrity), as well as unauthorized it must be verifiable, so we have high penetrate and patch. disclosure (confidentiality).

22 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

Lipner asserts that this reference tems (DBMS) to secure authenticated monitor approach is “not able to cope Internet communications, by applying Calendar with systems large enough to be use- commercially available security kernel ful.”6 Heckman and I respond that technology.”4 Heckman additionally of Events “this quite widely-spread assertion has describes completed research proto- been repeatedly disproven by counter- types in the past few years for things November 2–4 examples from both real systems and like source-code compatible secure VRST ‘16: 22th ACM Symposium research prototypes.”4 The paper gives Linux and a standards-compliant high- on Virtual Reality Software and Technology numerous examples of how, by leverag- ly secure Network File Service (NFS). Garching bei München, ing MAC, complex integrated systems A first necessary step is to identify Germany, can be composed from logically distinct where high-assurance security mat- Co-Sponsored: ACM/SIG, Contact: Gudrun J. Klinker, hardware and software components ters for a system. As just one example, Email: [email protected] that may have various degrees of secu- several U.S. government leaders have rity assurance or no assurance at all. expressed concern that we face an exis- November 6–9 ˲˲ Verifiability. The Reference Moni- tential cyber security threat to industri- ISS ‘16: Interactive Surfaces and Spaces Surfaces tor implementation defined as a securi- al control systems (ICS) in the critical Niagara Falls, ON, Canada, ty kernel is the only proven technology infrastructure, such as the power grid. Sponsored: ACM/SIG, for reliably achieving verifiable protec- Use of an integrity MAC security ker- Contact: Mark Hancock, tion. It does not depend on unproven nel can within a couple of years make Email: [email protected] elegant technical solutions, such as our critical infrastructure dramatically November 6–9 open source for “source code inspec- more trustworthy. The U.S. government SIGUCCS ‘16: ACM SIGUCCS tion” or “gratuitous formal methods.”2 has a unique opportunity to change the Annual Conference Security kernels have been shown to cyber security game and should aggres- Denver, CO, Sponsored: ACM/SIG, be effective for systematic, repeatable, sively engage ICS manufacturers by Contact: Laurie J. Fox, systems-oriented security evaluation of sponsoring prototypes and providing a Email: [email protected] large, distributed, complex systems. market using proven commercial secu- Lipner in his paper6 asks a critical, rity kernel OEM technology. Otherwise, November 7–10 ICCAD ‘16: IEEE/ACM but largely unanswered, question: How costs may soon be measured in lives in- International Conference on can customers have any assurance stead of bits or dollar signs. Computer-Aided Design that they are getting a secure system? Austin, TX, His answer is limited to development References Co-Sponsored: Other Societies, 1. Ames Jr, S.R., Gasser, M., and Schell, R.R. Security Contact: Frank Liu, process improvements that don’t ad- kernel design and implementation: An introduction. Email: [email protected] Computer 16, 7 (1983), 14–22. dress fundamentally what it means 2. Anderson, E.A., Irvine, C.E., and Schell, R.R. for a system to be “secure.” Heckman, Subversion as a threat in information warfare. J. Inf. November 12–16 by contrast, details how the Reference Warfare 3 (2004), 51–64. ICMI ‘16: International 3. Clark, P., Irvine, C. and Nguyen, T. Design Principles for Conference Monitor approach, with its strong defi- Security. NIST Special Publication 800-160, September on Multimodal Interaction 2016, pp. 207-221; http://csrc.nist.gov/publications/ Tokyo, Japan, nition of “secure system,” can answer drafts/800-160/sp800_160_final-draft.pdf precisely that question.4 4. Heckman, M.R. and Schell, R.R. Using proven Sponsored: ACM/SIG, reference monitor patterns for security evaluation. Contact: Yukiko Nakano, Information 7, 2 (Apr. 2016); http://dx.doi.org/10.3390/ Email: [email protected] What Should We Do Then? info7020023 5. Higgins, K.J. IBM: The security business ‘has no It can be expected to take 10–15 years future.’ Information Week Dark Reading, (4/10/2008); November 13–16 and tens of millions of dollars to build http://www.darkreading.com/ibm-the-security- GROUP ‘16: 2016 ACM business-has-no-future/d/d-id/1129423 Conference and evaluate a high-assurance secu- 6. Lipner, S.B. Security assurance. Commun. ACM 58, 11 on Supporting Groupwork rity kernel. However, once completed, (Nov. 2015), 24–26. Sanibel Island, FL, 7. Schell, R.R. A University Education Cyber Security a general-purpose security kernel is Paradigm Shift. Presented at the National Initiative Sponsored: ACM/SIG, highly reusable for delivering a new for Cybersecurity Education (NICE), (San Diego, CA, Contact: Stephan Lukosch, Nov. 2015); https://www.fbcinc.com/e/nice/ncec/ Email: [email protected] secure system in a couple of years. It presentations/2015/Schell.pdf is economical to use the same ker- 8. Schell, R.R., Downey, P.J. and Popek, G.J. Preliminary Notes on the Design of Secure Military Computer nel in architectures for a wide variety Systems. ESD, Air Force Systems Command, Hanscom AFB, MA. [MCI-73-1], Jan 1973; http://csrc. of systems, and the TCSEC’s Ratings nist.gov/publications/history/sche73.pdf Maintenance Phase (RAMP) allows the kernel to be re-verified using the latest Roger R. Schell ([email protected]) is president of technology, without the same invest- Aesec Corporation, and is currently a Distinguished Fellow at the University of San Diego Center for Cyber ment as the original evaluation. Heck- Security Engineering and Technology. Previously he man summarizes several real-world ex- was a Professor of Engineering Practice at University of Southern California. amples where, “This is demonstrated by OEM deployments of highly secure The author wishes to thank Michael J. Culver, systems and products, ranging from Mark R. Heckman, and Edwards E. Reed for their valuable feedback on an early draft of this Viewpoint. enterprise ‘cloud technology’ to gener- al-purpose database management sys- Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 23 viewpoints

VDOI:10.1145/3000608 Pamela Samuelson Legally Speaking Fair Use Prevails in Oracle v. Google Two software giants continue with legal sparring after an initial judicial decision.

RACLE AND GOOGLE have been battling in the courts for more than six years about whether Google in- fringed Oracle copyrights Oby using 37 packages of the Java ap- plication program interface (API) in developing the Android platform for smartphones. In May 2016, a jury re- jected Oracle’s copyright claim and decided that Google’s use of these 37 packages was fair and non-infringing. Oracle’s lawyers have announced that the company plans to appeal. This col- umn will explain why Oracle’s appeal is unlikely to succeed and why that’s good news for Java programmers, for the software industry, and for the public. But before getting to that, this col- umn will relate some facts about the litigation, about fair use as a defense to copyright infringement, and about Oracle and Google’s arguments about within each package, without a license mulation was that any expressiveness the fair-use defense. from Sun. It also developed more than the Java API elements Google used in 100 new packages in Java and C++ for Android had “merged” with the API Background About Oracle v. Google smartphone functions. Soon after ac- functionality and so should not be a Oracle acquired Sun Microsystems in quiring Sun, Oracle sued Google claim- basis for copyright liability. Because 2010. Sun’s assets included intellec- ing that Android infringed Oracle’s Judge Alsup found Google’s copyright- tual property rights in Java technolo- patents and copyrights. In an earlier ability defense persuasive, there was gies. Before that acquisition, Google trial, a jury decided against the patent no need to reach Google’s backup fair- negotiated with Sun about a possible claims. use defense. license to use Java technologies in An- Initially, Google’s main defense to Oracle appealed that ruling and droid. Although those negotiations Oracle’s copyright claim was not fair convinced the Court of Appeals for the broke down, Google went ahead with use. Instead, Google asserted that the Federal Circuit (CAFC) that the Java API using certain packages of the Java API, Java API packages, classes, and decla- elements incorporated into Android— and in particular, the declarations that rations it used in Android were not pro- principally, the 7,000 declarations that invoke implementing code for specific tectable by copyright law because they Google had literally copied in Android functions and the structure, sequence were too functional as components of source code—were protectable expres-

and organization (SSO) of classes the Java API system. An alternative for- sion under U.S. copyright law. (My No- ASSOCIATES/SHUTTERSTOCK ANDRIJ BORYS BY COLLAGE IMAGE

24 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints viewpoints

vember 2012 Communications Legally plaintiff’s work on the market for or that Google had made transformative Speaking column incorrectly predicted value of the plaintiff’s work is also im- uses of the Java API packages by build- that the CAFC would affirm; my March portant. When the defendant’s use is ing them into the Android platform. 2015 column criticized the CAFC deci- transformative, the focus is on whether He compared the Java API to a file cabi- sion and incorrectly predicted that the the defendant’s work would serve as a net with folders, a functional device Supreme Court would take Google’s substitute for the plaintiff’s work, not that was far from the core of copyright. appeal. Oh well.) The CAFC sent the whether the plaintiff wants the defen- He argued that Google took no more case back for trial on the fair-use issue. dant to pay a license fee. from the Java API than was necessary V to achieve its transformative purpose. What Is Fair Use? Fair-Use Arguments The transformative nature of Google’s Fair use is a statutorily recognized de- in Oracle v. Google use mitigated the harm factor because fense to a claim of copyright infringe- Oracle and Google had starkly differ- Android did not supplant demand for ment in the U.S. When this defense is ent views about whether the reuse of 37 the Java API for its original purpose. successful, the defendant will be vindi- Java API packages could be fair use. Or- Fair use is intended to promote ongo- cated and no copyright liability will be acle emphasized that Google acted in ing innovation, which Google’s lawyer found. (Most nations do not have fair- bad faith because some internal email argued Android had done. use provisions in their copyright laws.) correspondence showed that some The copyright statute says that four fac- Google employees thought Google Oracle’s Effort to Overturn tors should be considered: needed to a license to use the Java API. the Jury Verdict Purpose of Use. The purpose and Oracle also contended that Google had Oracle v. Google was tried to a jury be- character of the defendant’s use of made non-transformative use of the cause the two software giants did not the plaintiff’s work is the first factor Java API packages because it copied agree about key facts pertinent to the to consider. This includes subfactors the declarations verbatim. Its purpose, resolution of their dispute about An- such as whether the use was commer- moreover, was commercial. Oracle ar- droid. The role of a jury is to decide cial or noncommercial. Also signifi- gued all three considerations weighed which litigant’s view of the facts is most cant is whether the use was “transfor- against fair use. persuasive, and then to apply the law mative.” That is, did the defendant’s As for the other fair-use factors, to those facts in accordance with the use enable the creation of a new work Oracle insisted that the Java API was instructions the judge reads to jurors that builds upon the plaintiff’s work, highly creative. Google’s appropriation after the trial testimony has ended and giving it a different purpose, mean- was substantial because the Android the lawyers have made their closing ar- ing, or message, as a parody might do? source code included 11,500 lines of guments. Juries generally come back Non-transformative uses consume the declaring code that Google copied from with a verdict for one party or the other. work for its original purpose, as a pho- the Java API. Oracle’s economic expert They do not have to explain their find- tocopy might do. Transformative uses witness testified that Google’s reuse of ings or the reasons for their verdict. are more likely to be fair uses than the Java API packages had caused sub- After the jury found in favor of non-transformative ones. stantial harm to the market for the Java Google’s fair-use defense, Oracle’s law- Nature of Plaintiff’s Work. The na- API because Oracle had been unable to yers filed a motion asking Judge Alsup ture of the copyrighted work is a sec- collect licensing revenues from Google to set aside the jury verdict and rule in ond fair-use factor. If the plaintiff’s and from others as well. Oracle also its favor as a matter of law. The judge work is highly creative, entertaining, contended that the network effects denied that motion and wrote an opin- fanciful, or artistic, fair use is likely arising from the success of the Android ion to explain why a reasonable jury to be narrow. If the plaintiff’s work is platform had made it impossible for might have found in Google’s favor on functional or factual, fair use will tend Oracle to make a successful entrance several key fact issues: to be broader. into the smartphone market. For one thing, Oracle made much Amount Taken. The substantiality Google’s lawyer urged the jury to find during the trial about Google acting of the defendant’s taking of expression in bad faith in using the Java API dec- from the plaintiff’s work is often mea- larations when it should have gotten a sured quantitatively, that is, in terms of As for the other license. Counteracting that evidence what proportion of expression from the was testimony by some witnesses that plaintiff’s work the defendant appro- fair-use factors, Google’s reimplementation of inter- priated. But qualitative assessments Oracle insisted faces was customary in the software of substantiality are sometimes made, industry. A reasonable jury, Judge Al- especially if the defendant copied the that the Java API sup concluded, could have concluded “heart” of the work. When the defen- was highly creative. that Google’s good faith defense was dant’s use is transformative, however, more persuasive than Oracle’s bad the question becomes whether what faith accusation. the defendant took was reasonable in Second, Oracle insisted that the light of its transformative purpose. Java language, which all were free to Harm to Market. The effect of the use without permission, was distinct defendant’s use of expression from the from the Java API declarations. Google

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 25 viewpoints

Why Do These Facts Matter? By emphasizing the disagreements The jury verdict over these facts, Judge Alsup not only in Google’s favor provided ample reasons for denying Oracle’s post-trial motion. He also set was obviously good the stage for narrowing what the CAFC news for Larry Page can (or should) do with an Oracle ap- AACCMM Confeerreennccee peal. Appellate courts are supposed to and the shareholders defer to jury findings and to construe PPrrooceediinnggss of Google. evidence in the record as consistent with the jury’s verdict. NNooww AAvailaabbllee v viiaa The CAFC is already on record PPrriinntt--oon-Deemmaanndd!! that Google’s fair-use defense raised a triable issue of fact. So the best that Oracle can realistically hope for on ap- Did you know that you can peal is that its attack on Judge Alsup’s now order many popular contended that the Java language and jury instructions will prevail. Yet, this the API were inextricably intertwined. would just mean the case would go ACM conference proceedings A reasonable jury could have been per- back to Judge Alsup for another trial. via print-on-demand? suaded that there was less separability Not even Oracle’s lawyers could look than Oracle contended. The jury could forward to that. also have decided that it would be un- Institutions, libraries and reasonable to require Java program- Conclusion individuals can choose mers to have to learn and keep straight The jury verdict in Google’s favor was from more than 100 titles two dialects of Java-based APIs, one for obviously good news for Larry Page and Android and one for other Java plat- the shareholders of Google. Yet, it was on a continually updated forms, which is what an Oracle victory also good news for programmers who list through Amazon, Barnes would have meant. have been using those 37 Java API pack- Third, Judge Alsup thought that ages when developing apps for the An- & Noble, Baker & Taylor, the jury could have reasonably found droid platform. Judge Alsup recognized Ingram and NACSCORP: that Google’s reuse of the Java decla- that if Java programmers had to “master CHI, KDD, Multimedia, rations was transformative because and keep straight two different [Java] Google reimplemented the interfaces SSOs as they switched between the two SIGIR, SIGCOMM, SIGCSE, in independently written code and de- systems for different projects,” as a SIGMOD/PODS, veloped new API packages to enable verdict for Oracle would have required, and many more. smartphone functionalities that were this would have “fomented confusion quite different from the computing and error to the detriment of both Java- platforms for which the Java API was based systems and to the detriment of For available titles and originally developed. Java programmers at large.” ordering info, visit: Fourth, the jury could reasonably Google’s victory is also good news have found that the Java declarations, for competition in the software in- librarians.acm.org/pod while creative enough to be copyright- dustry, at least in the U.S., because able, were predominantly functional. fair use now a meaningful defense for Fifth, the jury could have believed those who reimplement other compa- that Google “duplicated the bare mini- nies’ APIs in independent code. While mum of the 37 Java API packages” that most cases have struck down copyright was reasonably necessary to achieving claims in interfaces necessary for in- its transformative purpose. teroperability on lack of copyrightabil- Sixth, the jury could have reason- ity grounds, fair use is now a proven al- ably found that Oracle suffered no ternative path to defense victories. The harm from Google’s use of the 37 API public benefits from Google’s victory packages because the Java API was de- because of the ongoing competition veloped for a different computing en- and innovation that reuse of APIs has vironment. Sun’s effort to promote a brought and will bring. version of Java for mobile devices was unsuccessful before Android entered Pamela Samuelson ([email protected]) is the Richard M. Sherman Distinguished Professor of Law and the market. And even before Android Information at the University of California, Berkeley, and a was released, Sun had opened the member of the ACM Council. Java API for free use under an open source license. Copyright held by author.

26 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

VDOI:10.1145/3000610 Bala R. Iyer and Rahul C. Basole Economic and Business Dimensions Visualization to Understand Ecosystems Mapping relationships between stakeholders in an ecosystem to increase understanding and make better-informed strategic decisions.

OMPANIES CAN NO longer rely on just their internal competencies. They must complement their own competencies with those of Cother firms through alliances, part- nerships, and digital relationships. Google, by opening its mobile An- droid platform to manufacturers and developers, has become the leading apps provider in the mobile space. These multifaceted interdependen- cies create complex connections, or ecosystems, that affect and are affect- ed by multiple stakeholders such as suppliers, distributors, outsourcing firms, makers of related products or services, and technology providers.3 Competition today is between eco- systems. Understanding ecosystems can then help managers improve strategic decisions and reshape the boundaries of their industries. In- terpreting this network of interac- tions, however, can be difficult. For example, a nascent ecosystem like the Internet of Things (IoT) has no dominant player and is increasing in scale, scope, and complexity.4 Under effectively, many entities, activities, to interpret the visuals and make recom- these circumstances, companies can and tools are required. Capturing, gen- mendations. Finally, companies must have difficulty determining the most erating, and interpreting ecosystems manage the data, visuals tools, and les- effective business opportunities. requires a sponsor who wants to learn sons learned in order to be effective. We describe a graph theoretic visu- more about a company, a technology, alization method to help companies or an industry, data scientists to collect A Visual Approach map relationships between ecosystem salient data and prepare it for analysis, Decision makers that want to visu-

IMAGERY BY SUMKINN BY IMAGERY stakeholders. To visualize ecosystems visualization tools, and a domain expert alize ecosystems need a structured

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 27 viewpoints

Figure 1. An Internet of Things (IoT) stack. platform companies for the IoT ecosys- tem. Lists can be augmented by semi- open sources such as Crunchbase or Application Layer for Sense Making for example, Analytics Angel.co or paid services such as Dun System Integrators for example, Accenture, Cognizant & Bradstreet, DataFox, and CB In- Infrastructure for example, Azure, Amazon Web Services, Telstra sights. Once finalized, the analyst can Platforms for Device Connectivity for example, Thingworkx, Xively, Iotivity visit each company’s website to iden- tify listed partners that provide servic- Sensory Layer for example, ARM, Intel, Sigfox es or components for platforms and document dependencies. Most com- pany websites list strategic alliances approach and several visual repre- industry. For the IoT industry, the stack or partners, and describe the types of sentation methods can be used.1 The shown in Figure 1 is an illustration relationships (for example, technical, methodology consists of steps dis- based on current industry consensus. marketing, licensing, and so forth) cussed here. This approach is not lin- Refinements to the stack can be made and content (for example, date formed, ear; there are many feedback loops be- as new companies emerge. An inde- nature, and other characteristics). tween the stages. Progress is guided by pendent list of available platforms can Finalize semantics for nodes and human decision making along the way. be collected from trade publications dependencies: Based on the data col- We use The Internet of Things (IoT) as and mapped onto the stack. lected so far, the analyst can prepare an illustration of this methodology. Identify companies and their attributes: it for visualization. Some software Determine industry structure: Think Platforms are important to technology- requires explicit specification of the of an industry as a company, its com- centric industries.5 Identifying platform visual encodings of nodes and edges plementors and competitors. Identify companies for the ecosystem can be (including the attributes that drive the value chain or stacks of activities done by searching articles in industry color, shape, size, and dependen- that deliver something of benefit to publications. Use Internet search en- cies), and particular consideration customers. These are inferred from gines, news portals, socially curated to data type (that is, quantitative, industry publications and company news websites, or social media sources ordinal, categorical). More recent websites. Industry structure helps the to locate these articles. Industry pub- software packages (for example, Tab- analyst identify the companies that are lications complemented with discus- leau, Gephi, ecoxight) allow dynamic the major platform companies in an sions with practitioners identified 34 selection of attributes and assign- ments. Node sizes are usually based Figure 2. The IoT ecosystem. on a company’s revenue or number of employees. Dependencies are often color coded, based on relationship. Visualize, analyze, and interpret: The visualizations in the figures in this column use Gephi, an open source graph visualization tool.2 Many visualization packages offer different network layout algorithms. Most com- monly used are derivations of force- directed layouts, in which prominent nodes are drawn in the center and less prominent ones are pushed to the pe- riphery. Nodes close to each other have stronger associations. After visualiza- tion, the sponsoring organization can be asked for feedback to determine if they have any insights about compa- nies in key network positions or clus- ters of interest, find any surprises, and identify companies that did not make the list. Corrections are incorporated in subsequent analysis. This iterative process creates confidence in the visu- alizations.

Interpreting the Visualization As noted, the IoT is a complex ecosys- tem not yet dominated by any single

28 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

player. In Figure 2, platform compa- Figure 3. Vertical differentiation in the IoT ecosystem. nies are the red-colored circles and component providers are the gray-col- ored circles. Dependencies between platforms and components are shown as links. This visualization can be used to identify key players and determine business opportunities.4 Alliance Strength. Density of con- nections, number of partners, and network centrality all provide a sense of the resources available to any given firm that serves as an axis. In Figure 1, it’s easy to see, for example, that the Zigbee Alliance and Industrial In- ternet Consortium each represent an important nexus of activity. If one is a small player looking to join a winning alliance then choosing from among the bigger players could be wise. While, if one is a competing nexus, choosing compatibility with other midsize players could reshape the network landscape. Vertical Integration. Companies that provide end-to-end solutions become vertically integrated and tightly controlled. For example, in its early days, the software indus- try was dominated by IBM, Digital Equipment, and International Com- puters. These end-to-end solutions Figure 4. Preference of Android to IFTTT. providers fragmented into multiple companies and divided technical leadership. The IoT ecosystem (see Figure 3) reveals many clusters, sug- gesting they are either providing end-to-end solutions or leaving in- tegration to the user. IBM among those three early pioneers survives today, but Bluemix (integrating IBM products and services), Kaa (providing middleware services that work with any stack), and 365 Farmnet (specialized for agricultural services) might be relevant for indi- vidual specializations and doomed from a network perspective. Preferential Attachment. With pro- liferation of platforms in the IoT eco- system, solutions providers must se- lect one platform over another. If they simultaneously commit to too many platforms they make significant re- source commitments. In the software industry, solutions providers attach themselves to one or two platforms based on incentives, momentum or technological superiority. Here solu- tions providers are unsure, they can

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 29 viewpoints

Figure 5. Core of the IoT ecosystem. track the ways users integrate services and devices, providing insights into how users derive value through a prod- uct’s use. Alphabet’s Nest has used this very effectively to implement the “Works With Nest” program. Because of this program, customers buying Nest will have the option to integrate it with the most common devices like LG washing machines or Amazon’s Echo. Nest’s position can be further so- lidified, if it became interoperable with highly connected platforms (as shown in Figure 3). This position gives it ac- cess to critical knowledge flows across the network.

Challenges Visualizations can help with under- standing industry structure and emer- gence of key players. The dynamic na- ture of competition makes every new entrant and every move by an existing player relevant to the topology the ecosystem changes and the competi- tive position of companies. Compa- nies must constantly track nodes and dependencies in networks. Compu- tational tools for data gathering and AI techniques for text processing and understanding can be automated so hedge their bets by selecting multiple decision to open its API, it instantly executives can focus on understanding platform. Oracle built huge market brings several hundreds of thousands and exploring interactive visuals. Eco- share by supporting multiple plat- of developers into the equation. system visualization continues to grow forms like Windows and Linux. An- Focus on the Core. An emergent eco- and mature as data scientists, graph droid (see Figure 4) is preferential to system builds infrastructure first (see theorists, and visualization research- IFTTT, ensuring that Android can inte- the core of the ecosystem in Figure 5). ers create new techniques. In this way, grate with most other products. Personal computers needed an operat- ecosystem visualization is likely to be- New Entrant and Network Effects. ing system before the explosion of appli- come ubiquitous in high-velocity busi- Major players in non-IoT industries (for cations could occur. The IoT ecosystem ness environments. example, Samsung, Apple, and Alpha- needs constant monitoring. Communi- bet) still fight for dominance in the IoT cating data across networks is critical. References 1. Basole, R.C. et al. Understanding business ecosystem ecosystem. To gain developers, these Low-power solutions and connectivity dynamics: A data-driven approach. ACM Trans. companies open up their IoT platforms Management Information Systems (TMIS) 6, 2 (Feb. are needed in the infrastructure (see the 2015), 6. and programming interfaces. Brand figure showing the core of the current 2. Bastian, M., Heymann, S., and Jacomy, M. Gephi: An open source software for exploring and manipulating recognition, existing devices, plat- IoT ecosystem) to help connect devices networks. ICWSM 8, (2009), 361–362. forms, and relationships with devel- and ensure efficiency. 3. Iansiti, M. and Levien, R. The Keystone Advantage: What the New Dynamics of Business Ecosystems opers might give them an advantage. The Power of Interoperability Emer- Mean for Strategy, Innovation, and Sustainability. As Alphabet expands into home auto- gent ecosystems require integration of Harvard Business Press, 2004. 4. Iyer, B. To predict the trajectory of the Internet mation with Nest, many Google Play devices and services. They also require of Things, look to the software industry. Harvard developers might move with it. De- interoperability between competing Business Review (Feb. 25, 2016). 5. Parker, G.G., Van Alstyne, M.W., and Choudary, S.P. veloper familiarity with Google’s API platforms. These significant challeng- Platform Revolution. Norton Publishing, New York, 2016. in one setting could also be used to es benefit from learning about the early develop products in other settings. A days in the software industry. Applica- Bala R. Iyer ([email protected]) is Professor and Chair similar trend in the software industry tion Program Interfaces (APIs) allowed TOIM Division, Babson College, MA. Twitter: @BalaIyer enabled Microsoft’s strong position firms to interact and share information Rahul C. Basole ([email protected]) is Associate Professor, School of Interactive Computing, and Director, in operating systems to help it domi- with other firms, and made it easy to Tennenbaum Institute, Georgia Institute of Technology, nate the web browser market. In Fig- achieve integration and interoperabil- GA. Twitter: @basole ure 3, Apple Homekit occupies a key ity across platforms and devices. They position in the network. With Apple’s could be used for more: APIs could Copyright held by authors.

30 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

VDOI:10.1145/3000612 Mark Guzdial and Briana Morrison Education Growing Computer Science Education Into a STEM Education Discipline Seeking to make computing education as available as mathematics or science education.

OMPUTING EDUCATION IS changing. At this year’s CRA Snowbird Conference, there was a plenary talk and three breakout sessions Cdedicated to CS education and enroll- ments. In one of the breakout ses- sions, Tracy Camp showed that much of the growth in CS classes is coming from non-CS majors, who have dif- ferent goals and needs for comput- ing education than CS majors.a U.S. President Obama in January 2016 an- nounced the CS for All initiative with a goal of making computing education available to all students.b Last year, the U.S. Congress passed the STEM Education Act of 2015, which officially made computer science part of STEM (science, technology, engi- neering, and mathematics). The federal High school students and teachers engaging in collaborative meetings about computer government offers incentives to grow science represents an important step toward making computing education as available as participation in STEM, such as scholar- science or mathematics education. ships to STEM students and to prepare STEM teachers. Declaring CS part of ing computing education so it is just hind them. Informally, many students STEM is an important step toward mak- as common requires recognition that talk to their parents about science issues ing computing education as available as education in computer science is differ- (“Why is the sky blue?”), think about the mathematics or science education. ent in important ways from education numbers in their lives (“How much is The declaration is just a first step. in STEM. We have to learn to manage the tax?”), visit science museums, and Mathematics and science classes those differences. see media about mathematics and sci- are common in schools today. Grow- ence. Students live in a world where Computer Science Is Invisible, living, chemical, and physical behavior Formally and Informally is explained by biology, chemistry, and a http://cra.org/wp-content/uploads/2016/07/ BoomCamp.pdf Students enter mathematics or science physics. They develop ideas about how b https://www.whitehouse.gov/blog/2016/01/30/ classes at the post-secondary level with the world works, some of which are

PHOTO BY CALVIN LIN, UNIVERSITY OF TEXAS, AUSTIN LIN, UNIVERSITY OF TEXAS, CALVIN BY PHOTO computer-science-all years of knowledge and experience be- wrong (like simple Lamarckian evolu-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 31 viewpoints

tion). Students start formal learning ics education, we do not have learning about mathematics and science in the progressions. Computer science edu- earliest grades. Mathematics and sci- cators mostly have had the goal of pre- ence classes can help answer their ques- paring future professional program- tions and improve the theories that a re- mers, but goals change when we talk flective child has about the world. about teaching everyone. Not every in- While computing is ubiquitous in troductory biology student will become the developed world, and cellphones a biologist and not every intro physics and other handheld computing devices student will become a physicist. We are increasingly common in the devel- need to determine what pieces of com- oping world, few students get the oppor- puting the educated citizen needs to tunity to look under the covers of those know. Then we can plan how and in

APPS devices to reflect on questions about what order we teach those pieces. computing. Maybe children might ask, What can we expect a first-year un- “Why is my browser so slow?” The con- dergraduate to learn in a single term cepts that computer science explains CS class if they have no previous com- are mostly invisible to children, such as puting experience? What can we expect digital representations, algorithms, and to be able to teach any eight-year-old or networks. If they don’t see the computa- a 12-year-old about computer science? tion in their world, it’s difficult to reflect What are the challenges to teaching Access the on and develop questions about it. We computing successfully to students have less museum or media coverage with cognitive disabilities, or who are latest issue, than the rest of STEM. Few students blind, or who have inadequate math- enter secondary or post-secondary com- ematics preparation? past issues, puter science classes with any previous We first realized in the early 1980s BLOG@CACM, formal education in computing. that we often overestimate what first- The difference means it is difficult time CS students can do.5 The McCrack- News, and for a teacher to set expectations. Some en Study2 showed that problems that students do have experience coming introductory computer science instruc- more. in to class; most do not. When does a tors find reasonable are not solvable by student need remedial help? We are in a majority of introductory students. a transitional stage where the gap be- Part of the trick is simply learning tween the haves and have-nots in com- how to measure what students have puting education is large. learned. Briana Morrison and Lauren There is a positive side to the invisi- Margulieux showed4 that textual pro- bility of computation in children’s daily gramming is cognitively complex. We life. Mathematics and science students want students to learn the concepts develop their misunderstandings of and skills of programming, but pro- Available for iPad, the world from daily life, too. Students’ gramming itself is a complex activity— iPhone, and Android incorrect science theories are wrong, asking students to program requires but mostly work. The world is not flat, students to be able to do everything. but it seems so, and you can live much Morrison and Margulieux showed they of your life believing the world is flat. can measure learning toward program- On the other hand, students develop ming using Parsons Problems.3 A Par- incorrect theories in computer science sons Problem asks students to solve only in computer science classes and a programming problem, then gives mostly because of how we taught them. them all the lines of code that solve If we change the way we teach, we can that problem, on tiles or “refrigera- Available for iOS, reduce misunderstandings. tor magnets.” They showed they could Android, and Windows tease out differences in students’ un- Developing Learning Progressions derstanding of programming, even http://cacm.acm.org/ We have been doing mathematics and when the students could not success- about-communications/ science education for a long time. We fully program the solutions yet. Being mobile-apps have a good idea of what students can able to measure the development of COMMUNICATIONS do in science and mathematics from these skills, without requiring stu- early ages, and how quickly they can dents to master the skills, is critical to develop new concepts and skills. Edu- developing learning progressions. cators call this a learning progression. If we want to teach computer sci- Since computing education is ence at younger ages, we have to figure younger than science and mathemat- out how to reduce that complexity. We

32 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

and want more computing education cation classes, we need to increase the for their children. But the parents and number of computer science classes While computer principals mostly do not understand and teachers by a magnitude. That is science is now part what it is. an enormous change with dramatic Students want computer science, implications. What we have today may of STEM in the U.S. whatever they think it is. Many of them not tell us much about tomorrow. The by fiat, students want it because of the economic value preparation, abilities, and preferences of knowledge of computer science. They of existing computer science teachers cannot access don’t know what it is, but they know it may not be predictive when we have a computer science can get them a good job and make them 10-times-larger population of teachers. more effective at the job they want. We have to invent whole new teacher classes as easily as The CS situation is different from education programs. mathematics and science or mathematics. Contrast the number of coding boot camps Steps Toward Pervasive science education. available in your area to the num- Computing Education ber of biology boot camps or algebra While computer science is now part of boot camps. While having a demand STEM in the U.S. by fiat, students can- for CS is mostly positive, it creates not access computer science classes a strange dynamic in the computer as easily as mathematics and science want students to be able to build pro- science class. Students demand the education. Many countries are ramp- grams they find motivating and engag- “real thing” (which we might inter- ing up computing education, so the ing, without mastering all the skills of pret as “what will help me in a job”), situation is going to change. As it textual programming first. David Wein- even if they don’t know what that is. does, we will have to develop more ac- trop and Uri Wilensky have shown that For example, students might com- curate expectations of how students students using blocks-based languages plain about learning a blocks-based learn CS, improve our ability to mea- (like Scratch, Snap, and Blockly) make language or using a pedagogical IDE sure learning in computing, develop far fewer errors than in textual pro- because it’s not “real”—even if they learning progressions, and create an gramming languages. Again, part of are not quite clear what “real” is. infrastructure to develop teachers their achievement is in measurement. and track progress as we reach the They developed a commutative assess- Building the Infrastructure pervasiveness of mathematics and ment6 that allowed them to compare for CS Classes science education. the same concepts in both textual and In many countries and U.S. states, blocks-based programming languages. you can learn the number of students References 1. Hewner, M. Undergraduate conceptions of the field of We will need many kinds of measures taking primary or secondary school computer science. In Proceedings of the Ninth Annual to develop learning progressions for International ACM Conference on International reading, mathematics, or science Computing Education Research (ICER ‘13). ACM, New computer science. classes. In the U.S., hardly any state York, 2013, 107–114. 2. McCracken, M. et al. A multi-national, multi-institutional can tell you the number of students study of assessment of programming skills of first- Computer Science Is in their computer science classes at year CS students. ACM SIGCSE Bulletin 33, 4 (2001), 125–140. Valued but Misunderstood any level, or what is being taught in 3. Morrison, B.B., Margulieux, L.E., and Guzdial, M. Students enter undergraduate math- those courses. (In many states, “com- Subgoals, context, and worked examples in learning computing problem solving. In Proceedings of ematics and science classes after years puter science” and “computing ap- the Eleventh Annual International Conference on of formal education, so they enter with plications” courses are considered International Computing Education Research (Omaha, Neb., 2015), 21–29. a good idea about what those fields the same.) Because computer science 4. Morrison, B.B., Margulieux, L.E., Ericson, B., and mean. That’s not true for computer sci- has only recently been declared part Guzdial, M. Subgoals help students solve Parsons problems. Paper presented at the Proceedings of the ence. Even undergraduate CS majors of STEM, it has not been tracked like 47th ACM Technical Symposium on Computing Science do not know what computer science is. other STEM subjects. We don’t know Education (Memphis, Tenn., 2016). 5. Soloway, E. Learning to program = learning to Mike Hewner showed that even under- with certainty how much computing construct mechanisms and explanations. Commun. graduate students who declare a major education is offered in the U.S. today ACM 29, 9 (Sept. 1986), 850–858. 6. Weintrop, D. and Wilensky, U. 2015. Using in computer science only have an un- nor where it is offered, which makes commutative assessments to compare conceptual 1 understanding in blocks-based and text-based clear idea of what the field is about. it difficult to plan and grow access to programs. In Proceedings of the Eleventh Annual Large-scale surveys in collabora- computing education. International Conference on International Computing Education Research (ICER ‘15). ACM, tion between Google and Gallup have We believe (from looking at data New York, NY, 2015, 101–110; DOI: http://dx.doi. shown that parents and principals about Advanced Placement CS and in org/10.1145/2787622.2787721 think courses in computer science those states that do track) that far less are about how to use personal com- than 30% of secondary school students Mark Guzdial ([email protected]) is a professor at the Georgia Institute of Technology. puters.c The surveys show parents and even have the opportunity take a com- Briana Morrison ([email protected]) is an principals highly value computing, puter science course in the U.S. today, assistant professor of computer science at the University and less than 10% of primary school of Nebraska Omaha. c https://services.google.com/fh/files/misc/ students. To reach the ubiquity of ac- images-of-computer-science-report.pdf cess to mathematics and science edu- Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 33 viewpoints

VDOI:10.1145/2908733 Jack Copeland, Eli Dresner, Diane Proudfoot, and Oron Shagrir Viewpoint Time to Reinspect the Foundations? Questioning if computer science is outgrowing its traditional foundations.

HE THEORY OF computability was launched in the 1930s, by a group of logicians who pro- posed new characterizations of the ancient idea of an algo- Trithmic process. The most prominent of these iconoclasts were Kurt Gödel, Alonzo Church, and Alan Turing. The theoretical and philosophical work that they carried out in the 1930s laid the foundations for the computer revolu- tion, and this revolution in turn fueled the fantastic expansion of scientific knowledge in the late 20th and early 21st centuries. Thanks in large part to these groundbreaking logico-mathematical investigations, unimagined number- crunching power was soon boosting all fields of scientific enquiry. (For an account of other early contributors to the emergence of the computer age see Copeland and Sommaruga.9) The motivation of these three revolu- tionary thinkers was not to pioneer the disciplines now known as theoretical and applied computer science, although with hindsight this is indeed what they did. Nor was their objective to design electronic digital computers, although Turing did go on to do so. The found- ing fathers of computability would have thought of themselves as working in a most abstract field, far from practical nerstones of current science and tech- ing, nano-computing, DNA computing, computing. They sought to clarify and nology. Since then many diverse com- neuron-like computing, computing define the limits of human computabil- putational paradigms have blossomed, over the reals, and computing involv- ity in order to resolve open questions at and still others are the object of current ing quantum random-number genera- the foundations of mathematics. theoretical enquiry: massively parallel tors. The list goes on … (for example, The 1930s revolution was a critical and distributed computing, quantum see Cooper et al.,3 and recent issues of moment in the history of science: ideas computing, real-time interactive asyn- Natural Computing and International

devised at that time have become cor- chronous computing, hypercomput- Journal of Unconventional Computing). GANZALESS BY IMAGE

34 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints viewpoints

Few of these forms of computation tion? Some will argue that the Turing- were even envisaged at the time of the machine model gives an adequate an- 1930s analysis of computability—and The founding swer to this question. But, given the yet the ideas forged then are still typi- fathers of enormous diversity in the many spe- cally regarded as constituting the very cies of computation actually in use or basis of computing. computability would under theoretical consideration, does So here’s the elephant in the room. have thought of the Turing-machine model still cap- Do the concepts introduced by the themselves ture the nature of computation? For V 1930s pioneers provide a logico-math- example, what about parallel asyn- ematical foundation for what we call as working in chronous computations? Or interac- computing today, or do we need to over- tive, process-based computations?15,24 haul the foundations in order to fit the a most abstract Now that such diversity is on the table, 21st century? Much work has been de- field, far it may be that the Turing-machine voted in recent years to analysis of the model no longer nails the essence of foundations and theoretical bounds from practical computation. If so, is there any suf- of computing. However, the results of computing. ficiently flexible and general model this diverse work—carried out by com- to replace it? Some of the authors of puter scientists, mathematicians, and this Viewpoint would bet on Turing’s philosophers—do not so far form a model, while the others would not. Ei- unified and coherent picture. It is time ther way this question is central. for the reexamination of the logico- Turning to the mysterious com- mathematical foundations of comput- puter in our skulls … just as the limits ing to move center stage. Not that we rote with paper and pencil.5–7,21–23 The of an idealized human clerk working should necessarily expect an entirely limits of a human computer don’t nec- by rote do not necessarily dictate the unified picture to emerge from these essarily set the limits of every conceiv- limits of computing hardware, nor investigations, since it is possible that able physical process (for example, see do they necessarily dictate the limits there is no common thread uniting the Pour El and Richards18), nor the limits of the human brain. For example, do very different styles of computation of every conceivable machine. So it is an human beings qua creative mathema- countenanced today. Perhaps investi- open question whether the Turing-ma- ticians carry out brain-based com- gation will conclude that nothing more chine model of computation captures putations that transcend the limits than ‘family resemblance’ links them. the physical, or even the logical, limits of human beings qua mathematical A question pressed by foundational of machine computability. clerks? The question is not merely revisionists is: How are the bounds of The Turing machine has also tra- philosophical: it seems highly likely computability, as delineated by the ditionally held a central place in that the brain will prove to be a rich 1930s pioneers—bounds that theoreti- complexity theory, with the so-called source of models for new computing cal computer science has by and large ‘complexity-theoretic’ Church-Tur- technologies. This gives us yet more simply inherited and enshrined in ing2 or ‘extended’ Church-Turing reason to return to the writings of the the textbooks—related to the bounds thesis1 asserting that there is always 1930s pioneers, since both Gödel and of physical computing? The famous at most a polynomial difference be- Turing appear to have held that math- Church-Turing thesis delineates the tween the time complexity of any rea- ematical thinking possesses features bounds of computability in terms of the sonable model of computation and going beyond the classical Turing- action of a Turing machine. But could that of a probabilistic Turing ma- machine model of computation (see there be mathematical functions that chine. (A deterministic version of the Copeland and Shagrir8). are physically computable and yet not thesis is also known as the Cobham- In 2000, the first International Hy- Turing-machine computable? Various Edmonds thesis.13) This traditional percomputation Workshop was held at physical processes have been proposed picture is today under threat, when University College, London. Now there that allegedly allow for computation some propose counterexamples to is a growing community of hundreds beyond Turing-machine computabil- the extended Church-Turing thesis. of researchers in this and allied fields, ity, appealing to physical scenarios that Bernstein and Vazirani2 gave what communicating results at conferences involve, for example, special or general they called ‘the first formal evidence’ (for example, ‘Computability in Eu- relativity11,14,16,17,19 (see Davis10 for a cri- that quantum Turing machines vio- rope,’ ‘Theory and Applications of Com- tique of the whole idea of super-Turing late the extended Church-Turing puting,’ and ‘Unconventional Com- computation). The possibility of super- thesis. Shor’s quantum algorithm puting’) and in journals (for example, Turing computation or hypercomputa- for prime factorization is arguably Applied Mathematics and Computation, tion is not ruled out by the Church-Tur- another counterexample.20 All this is Theoretical Computer Science, Comput- ing thesis, once the latter is understood highly controversial, but once again ability, and Minds and Machines). We as originally intended, namely as an the signs are that we should take a sys- hope this is only the beginning. If it was analysis of the bounds of what is com- tematic look at the foundations. important to clarify the logico-mathe- putable by an idealized human comput- The most fundamental question matical foundations of computing in er—a mathematical clerk who works by of all is, of course: what is computa- the 1930s, when computer science was

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 35 viewpoints

no more than a gleam in Turing’s eye, 8. Copeland, B.J. and Shagrir, O. Turing versus Gödel Reflections on the Foundations of Mathematics, on computability and the mind. In B.J. Copeland, C. Association for Symbolic Logic, Natick, MA, 2002, how much more important it seems Posy, and O. Shagrir, Eds. Computability: Gödel, Turing, 390–409. today, when computing technology is Church and Beyond, MIT Press, Cambridge, MA, 2013. 22. Sieg, W. On computability. In A. Irvine, Ed., Handbook 9. Copeland, B.J. and Sommaruga, G. The stored-program of the Philosophy of Mathematics, Elsevier, 2009. diversifying and mutating at an unprec- universal computer: Did Zuse anticipate Turing and 23. Stannett, M. X-machines and the halting problem: edented rate. Via the great pioneers of von Neumann? In G. Sommaruga and T. Strahm, Eds., Building a super-Turing machine. Formal Aspects of Turing’s Revolution, Birkhauser, Basel, 2006. Computing 2, (1990), 331–341. electronic computing (such as Freddie 10. Davis, M. The myth of hypercomputation. In C. 24. Wegner, P. and Goldin, D. Computation beyond Turing Williams, Tom Kilburn, Harry Huskey, Teuscher, Ed., Alan Turing: Life and Legacy of a Great machines. Commun. ACM 46, 4 (Apr. 2003), 100–102. Thinker. Springer Verlag, Berlin, 2004, 195–212. Jay Forrester, John von Neumann, Ju- 11. Etesi, G. and Németi, I. Non-Turing computations via Malament-Hogarth space-times. International lian Bigelow, and of course Turing him- Jack Copeland ([email protected]) is a Journal of Theoretical Physics 41, (2002), 341–370. professor of Philosophy at the University of Canterbury, 12. Gandy, R. Church’s thesis and principles for self) the 1930s analysis of computation New Zealand, where he is also the director of the Turing mechanisms. In J. Barwise, D. Kaplan, H.J. Keisler, Archive for the History of Computing. led to the modern computing era. Who P. Suppes, and A.S. Troelstra, Eds., The Kleene st knows where a 21 -century overhaul of Symposium, North-Holland, Amsterdam, 1980. Eli Dresner ([email protected]) is a member of the that classical analysis might lead. 13. Goldreich, O. Computational Complexity: A Conceptual Gershon H. Gordon Faculty of Social Sciences at Tel Aviv Perspective. Cambridge University Press, 2008. University. 14. Hogarth, M.L. Does general relativity allow an observer to view an eternity in a finite time? Diane Proudfoot ([email protected]) References Foundations of Physics Letters 5, (1992), 173–181. is an associate professor (reader) of Philosophy at the 1. Aharonov, D. and Vazirani, U.V. Is quantum mechanics 15. Milner, R. Elements of interaction: Turing Award University of Canterbury, where she is also the co-director falsifiable? A computational perspective on the lecture. Commun. ACM 36, 1 (Jan. 1993), 78–89. of the Turing Archive for the History of Computing. foundations of quantum mechanics. In B.J. Copeland, C. 16. Németi, I. and Andréka, H. Can general relativistic Oron Shagrir ([email protected]) is a professor Posy, and O. Shagrir, Eds., Computability: Gödel, Turing, computers break the Turing barrier? In A. Beckmann, of Philosophy and Cognitive Science at the Hebrew Church and Beyond, MIT Press, Cambridge, MA, 2013. U. Berger, B. Löwe, and J.V. Tucker, Eds., Logical University of Jerusalem. 2. Bernstein, E. and Vazirani, U.V. Quantum complexity Approaches to Computational Barriers, Springer- theory. STOC ‘93 Proceedings of the Twenty-Fifth Verlag, Berlin, 2006, 398–412. Annual ACM Symposium on Theory of Computing 17. Pitowsky, I. The physical Church thesis and physical (1993), 11–20. computational complexity. Iyyun 39 (1990), 81–99. 3. Cooper, B., Lowe, B. and Sorbi, A., Eds. New 18. Pour-El, M. and Richards, I. The wave equation with Computational Paradigms: Changing Conceptions of computable initial data such that its unique solution is not What Is Computable. Springer, 2008. computable. Advances in Mathematics 39, (1981), 215–239. 4. Copeland, B. J. The Church-Turing Thesis. The 19. Shagrir, O. and Pitowsky, I. Physical hypercomputation Stanford Encyclopedia of Philosophy. E.N. Zalta, Ed., and the Church-Turing Thesis. Special issue on 2002; http://plato.stanford.edu/entries/church-turing/. hypercomputation. Minds and Machines 13, 1 (2003), 5. Copeland, B.J. Narrow versus wide mechanism. 87–101. The Journal of Philosophy 97, (2000), 5–32. 20. Shor, P.W. Polynomial-time algorithms for prime 6. Copeland, B.J. Hypercomputation: Philosophical issues. factorization and discrete logarithms on a quantum Theoretical Computer Science 317 (2004), 251–267. computer. SIAM Journal on Computing 26, 5 (1997), 7. Copeland, B.J. and Proudfoot, D. Alan Turing’s 1484–1509. forgotten ideas in computer science. Scientific 21. Sieg, W. Calculations by man and machine: Conceptual American 280, (1999), 76–81. analysis. In W. Sieg, R. Sommer, and C. Talcott, Eds., Copyright held by authors.

The MIT Press

Software Design Decoded Driverless Photo Forensics Men, Machines, and 66 Ways Experts Think Intelligent Cars and the Road Ahead Hany Farid Modern Times 50th Anniversary Edition Marian Petre Hod Lipson and Melba Kurman “. . . a unique and delightful book, and André van der Hoek “Driverless vehicles are poised to a tour of computational photog- Elting E. Morison illustrations by Yen Quach usher in a massive disruption of our raphy by a forgery sleuth. I loved foreword by Rosalind Williams “This is the book I wish I’d had transportation system, our urban it. Hany Farid, preeminent in this with remarks by Leo Marx around throughout my journey as landscapes, our economy—and fi eld, offers code, case studies, “The most brilliant, original, and a software architect. It’s charming, quite possibly the very fabric of and advice to help readers fi nd absorbing book in American his- approachable, and full of wisdom— society. Anyone who wants to manipulated images.” tory I have read for some time.” you’ll learn things you’ll come back understand what’s coming must —William T. Freeman, MIT; —Arthur Schlesinger, Jr. to again and again.” read this fascinating book.” Research Scientist, Google —Grady Booch, IBM Fellow and —Martin Ford, New York Times best- Chief Scientist, IBM selling author of Rise of the Robots mitpress.mit.edu

36 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 viewpoints

VDOI:10.1145/2911980 Jonathan Grudin Viewpoint Technology and Academic Lives Considering the need to create new modes of interaction and approaches to assessment given a rapidly evolving academic realm.

“ ’M OVERCOMMITTED AT the mo- ment. I find the university at the center of a huge work speedup.” —Vice provost and dean I“The cult of busy-ness … is really poisonous. People don’t want to be seen to have time for a leisurely intel- lectual conversation, because having time for that means they are insuf- ficiently busy. They should be head- ing to the airport to go hustle grant money.” —Department chair

From Dagstuhl to Doonesbury, we hear about the plight of academics. Curi- ous, I consulted over a dozen com- puter and information science fac- ulty, department chairs, deans, and other administrators from leading research universities. Details differed, but heightened demands and shifting evaluation criteria were consistently demic ladder in 1995, and am now an were difficult to organize and few in raised. What drove the changes? They affiliate professor.a number. Research labs and depart- mentioned budget pressure and rising ments tended to be fiefdoms or small expectations, but unanticipated con- 1975: The Ivory Tower Before Silicon communities, each existing in a bubble. sequences of new technologies appear The world of 1975 is not easy to bring We relied on people who were with- to be at the heart of it. Technology has into focus, even for those of us who were in walking distance. Faculty thorough- dramatically improved research and there. No Internet, Web, mobile phones, ly trained their graduate students to be teaching, yet it can also complicate Federal Express, or fax.b Long-distance full collaborators. They invested in ex- academic life. telephone was expensive. Conferences plaining their research carefully to de- Let’s consider how faculty were as- partmental colleagues to get feedback, sessed in three worlds: technology- a My three snapshots of academic life are from ideas, and possible collaborations. De- intensive university departments in graduate work at Stanford, MIT, and UCSD; ris- partmental colloquia were well attend- 1975, when few had Internet access; ing from assistant to full professor at UC Irvine ed; faculty often commented cogently in the 1990s; and now an affiliate professor at in 1995, with the Internet but no Web the University of Washington. on talks outside their area. At UCSD, services; and today. I was a graduate b Facsimile machines existed but were not wide- each graduate student presented a

IMAGE BY SERGEY PAZHARSKI BY IMAGE student in 1975, climbing the aca- ly used. I first sent a fax in the late 1980s. project to the department at the end

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 37 viewpoints

of the first year, providing all faculty ence letters were more balanced. The INTERACTIONS with a view into their colleagues’ work, academic norm was lavish praise; read- methods, and student selection and ers looked for subtle negatives. My own training. Faculty helped one another, tenure and full professor promotions building the department’s reputation encountered reference-letter turbu- in an outside world that was known lence but survived. mainly through journal articles. Being able to interact at a distance It wasn’t idyllic—there were fac- has phenomenal benefits, but it does tional disputes, tenure anxiety, and reduce local interaction. This dimin- ‘publish or perish.’ Nevertheless, high ishing of community was reflected in familiarity enabled faculty promotion the increased outsourcing of faculty as- cases to be handled effectively within a sessment. It didn’t seem ideal to weight department and school. the subjective opinions of outsiders so heavily, but it was manageable. ACM’s Interactions magazine The Internet Arrives explores critical relationships In late 1979, halfway through my Ph.D. 2015: The Information Age between people and journey, our lab connected to the AR- As one of few full professors in HCI in technology, showcasing PANET. Faculty and staff could access the early 2000s, I was asked to write emerging innovations and the computer and the ARPANET via many letters for appointments and industry leaders from around modem from home. For students, the promotions. Over time, urgent per- the world across important Internet precursor was largely a curi- sonal requests gave way to form let- applications of design thinking osity. For faculty, it turned out to be ters that often omit key information. and the broadening eld of more significant. External letters appeared to be losing interaction design. Taking a postdoc in the U.K. in their dominant role. This hypothesis Our readers represent a growing 1982 was like parachuting in with was supported in my recent inqui- community of practice that is the clothes on my back. Communica- ries. A wealth of digital data is now of increasing and vital global tion with U.S. family, friends, and col- available and often relied upon. One importance. leagues was epistolary and about one department chair wrote, “There is a in 10 airmail letters went by sea and ar- pervasive atmosphere of pressure and rived a month late. On different occa- obsessive quantification, and yes, it sions, the two senior professors from affects senior people, too. In part this my UCSD lab visited and presented is because as the senior ranks fill with research co-authored with people out- people who came up through the ob- side our lab. I was stunned. This work sessive quantification, the attitudes had begun when I was a student but become entrenched ... People who was never discussed in the lab. Prior don’t buy into obsessive quantifica- to the ARPANET, everything was dis- tion get filtered out.” cussed. Suddenly, faculty could work Budget-strapped legislatures de- with distant colleagues. They no lon- mand that state universities show evi- ger had to educate departmental col- dence of impact. Great teaching may leagues to get strong feedback. With not compensate for weak research, faculty more focused on external dis- but bad teaching is tolerated less. cussions and collaborations, depart- Public universities with reduced state mental meetings became less central support and private universities fac- to academic life. ing higher costs count on rainmaking. “Funding is the first consideration for 1995: Assessment from Outside To learn more about us, promotion to full, even though that is visit our award-winning website After a stint in industry, I joined a re- http://interactions.acm.org markably democratic department: All faculty participated in all evaluations. A highly beneficial Follow us on Within months, as an assistant profes- Facebook and Twitter sor, I was voting on associate and full technology can To subscribe: professor cases. Faculty were no lon- have an undesirable http://www.acm.org/subscribe ger well acquainted with one another’s work; we relied extraordinarily heavily side effect. on external letters. This was not good Association for news for those of us in sub-fields in Computing Machinery which most professors had recently arrived from industry (such as IBM Research and Bell Labs), where refer-

38 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

IX_XRDS_ThirdVertical_V01.indd 1 3/18/15 3:35 PM viewpoints not written,” observed a Promotion of exponential growth, with the breath- and Tenure Committee chair. NSF taking wave that quickly follows the adds to reporting requirements as it If any technology first perceptible effect.3 awards proportionally fewer grants. has an irresistable To assert that we are masters of our In addition to teaching evaluations destiny is to set thrones on the beach. and grant funding, performance mea- trajectory, digital We cannot always control the conse- sures include citations and down- technology does. quences of introducing a new technol- loads, journal and conference tiering, ogy. Bows and arrows were doomed as g- and h-indices, and students matric- weapons of war by the invention of a ulated. Many universities use Academ- musket that anyone could load, point, ic Analytics. External letters remain a and shoot. The introduction of literacy factor, especially in appointments for fundamentally changed previously which the digital record is sparser, We see the unquestioned value of oral cultures; money fundamentally but their subjectivity seems alien in a technology but often underestimate changed barter cultures. These technol- world of ‘big data’ and metrics. the inextricably intertwined costs: re- ogies brought advantages and disadvan- Faculty CVs sprout new categories, duced depth of social connectedness, tages. Socrates famously railed against such as mass media notices. A dean work speedups, a troubled publication writing, but despite his eloquence, he said, “I don’t think much of the [re- culture, lost jobs, privacy intrusions, couldn’t slow it down. If any technol- search] literature is read anymore. cybercrime, distributed terrorist net- ogy has an irresistible trajectory, digi- Most of the communication is through works, and so on. tal technology does. As it is woven ever coverage in national media (of which more deeply into the fabric of our lives, the research publication is the pre- Addressing Indirect Consequences it will be more difficult to link subtle ef- text).” Another dean said “A New York Proposed solutions are often exhor- fects to causes, and more challenging to Times op-ed piece, that’s huge!” tations to roll back the clock: restrict address effects that are unexpected. Faculty assessment is back in local academic assessments to a few high- hands, but the assessment is not based quality publications, return journals to Conclusion on the direct familiarity that marked preeminence, build technical barriers The close-knit communities of our pre-Internet academic communities. to privacy incursions, and so on. Un- academic predecessors are melting Collaborating and gathering infor- fortunately, the underlying forces that away. They won’t come back. We mation across distance accelerates brought us here are powerful, because must create new modes of interac- scholarship, yet it erodes the sense of they also bring tremendous benefits. tion and approaches to assessment community. For example, when faculty We smile at the story of King Ca- that are less impersonal and stress- invested in and relied on a network of nute placing his throne on the beach ful, for faculty members and job can- local colleagues, job-hopping exacted and commanding the incoming tide to didates, grant applicants and con- a high price. Today’s weaker local ties halt. A technological tide is sweeping ference submitters, for students in enable professors to play on a world in that will never retreat. We can’t com- MOOCs and traditional courses, and stage, with benefits and costs. It is mand away undesirable side effects by more broadly throughout society. working, but stresses are felt. issuing policy statements. Perhaps to Just as technology contributed to the protect valued heritage we can build a problems, it will help address them. Unintended Consequences massive seawall Netherlands-style— Software is versatile and we are in- A highly beneficial technology can but only if we understand the tidal novative. The best metaphor may not have an undesirable side effect. Auto- forces, decide what to save and what be a seawall that strives to block the mation of routine tasks is great, but to let go, budget for the costs, and ac- forces unleashed by technology and it can put people out of work. Trans- cept that unanticipated developments forestall change; rather, consider parency is good, but it can facilitate could render our efforts futile, as would a martial art that enables us, when unwanted surveillance. Collaborating a 10-centimeter rise in ocean levels. armed with a deep understanding over distances and with more people is An irresistible force—technology— of those forces, to redirect them to fantastic, but it means less interaction meets an immovable object—our ge- achieve positive outcomes. with nearby colleagues and each col- netic constitution. Behaviors that we laborator on average. inherited from ancestors who lived in References 1. Grudin, J. Journal-conference interaction and the Effects of technology on community tightly knit communities do not stand competitive exclusion principle. ACM Interactions 20, 1, (2013), 68–73. extend beyond faculty assessment. Our in opposition to technology, but they 2. Grudin, J. Technology, conferences, and community. conferences once focused on commu- shape our reactions. Can we control Commun. ACM, 54, 2, (Feb. 2011), 41–43. 3. Grudin, J. The demon in the basement. ACM nity building. As I have written previ- those tendencies, build seawalls to Interactions 13, 6 (2006), 50–53. ously,1,2 a likely indirect effect of two of protect us when human nature risks in- our most valued technologies—word teracting disadvantageously with use- Jonathan Grudin ([email protected]) is a principal processing and digital archives—was ful technologies that it did not evolve researcher at Microsoft Research in Redmond, WA. that our conferences became arbiters alongside? Perhaps, if we recognize the of quality with high rejection rates that forces at work. But we have little time undermine cohesion. to develop understanding in our world Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 39 practice

DOI:10.1145/2980932

Article development led by queue.acm.org Expect to be constantly and pleasantly befuddled.

BY PAT HELLAND The Power of Babble

its permutations with usages that usu- ally do not converge. My personal idio- lect shifts depending on whether I am speaking to a computer science audi- METADATA DEFINES THE shape, the form, and how to ence, my team at work with its contex- tual usages, my wife, my grandkids, understand our data. It is following the trend taken by or the waiter at a local restaurant. Dif- natural languages in our increasingly interconnected ferent communities of people extend world. While many concepts can be communicated English in different ways. Computer systems have an emerging using shared metadata, no one can keep up with the and increasing common metadata for number of disparate new concepts needed to have a interoperability. XML and now JSON fill similar roles by making the parsing of common understanding. messages easy and common. It’s great English is the lingua franca of the world, yet there we are no longer arguing over ASCII ver- are many facets of humanity and the concepts held sus EBCDIC, but that is hardly the most challenging problem of understanding. by different people that simply cannot be captured in As we move up the stack of under- English no matter how pervasive the language. In fact, standing, new subtleties constantly emerge. Just when we think we under- English itself has nooks, crannies, dialects, meetups, stand, the other guy has some crazy and teenager slang that innovate and extend new ideas!

40 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 As much as we would like to have they are lucky enough to slide into a The Dialects of the Business World complete understanding of each other, trough of inactivity after a flurry of re- Computer systems and applications tend independent innovation is far more search and before huge investments in to be developed independently to sup- important than crisp and clear com- productization (see Figure 1). This ob- port the special needs of their users. In munication. Our economic future de- servation is known as the Apocalypse the past, each system would be bespoke pends on the “power of babble.” of the Two Elephants (although Clark and support detailed specifications. actually didn’t name it that).1 Increasingly, shared application plat- The Apocalypse of Two Elephants Standards that happen in this forms are leveraged, either on prem- To facilitate communications, the com- trough are effective and experience lit- ises or in the cloud. In these common puting industry, various companies, tle competition. If a standard doesn’t apps, there is common metadata—at and other organizations try to establish emerge here or the trough is squished least as far as the apps have a common standard forms of communication. We by the two humps overlapping, it’s a heritage. see TCP, IP, Ethernet, and other com- much murkier road forward. When applications are independently munication standards as well as XML, The best de jure standards are rubber developed, they have disparate concepts JSON, and even ASCII making it easier stamps over de facto standards. and representations. Many of these pur- to communicate. Above this, there are If there’s no de facto standard chased applications are designed for ex- vertical specific standards (for example, to start from, then the de jure stan- tensions. As the specific customer gloms healthcare and manufacturing stan- dard typically contains the union extensions onto the side of the app, this dards). Many companies have internal of all ideas discussed by the com- impacts the shape, form, and meaning communication standards as well. mittee. Natural selection relegates of its internal and shared data. Dave Clark of MIT observed that these standards and their clutter to When there’s a common application

IMAGE BY ANTON WATMAN ANTON BY IMAGE successful standards happen only if history books. lineage, there’s a common understand-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 41 practice

Figure 1. The Apocalypse of Two Elephants. ing of its data. Popular ERP (enterprise resource planning), CRM (customer billion-dollar relationship management), and HRM research investment (human resources management) appli- cations have their ways of solving busi- ness problems, and different compa- standards nies that have adopted these solutions

Activity may find it easier to interoperate.

Internecine Interop Still, challenges of understanding may exist even across departments or divi- Time sions of the same company. A large conglomerate may sell many prod- ucts, including light bulbs, dishwash- Figure 2. Least-lossy conversion. ers, locomotives, and nuclear power plants. I would hazard a guess that it doesn’t have a single canonical cus- service service service tomer record type. Of course, mergers and divestitures impact a company’s metadata. I know service service from personal experience how difficult it is to change my mailing address with a bank or insurance company. They can’t seem to track down all the sys- service service tems that record my address even over the course of a year. It’s not a big sur- prise that they have a hard time manag- service service ing their metadata.

What’d You Say? service service Whenever there are two representa- service tions of data, either somebody adapts or the fidelity of the translation suf- fers. In many cases, the adaptation is 12 services, 12 x 11 = 132 message transformers driven by economic power. When a manufacturer wants to sell something to a huge retailer, it may be told exactly Figure 3. ‘Double lossy’ conversion. the shape, form, and semantics of the messaging between the companies. To get the business, the manufacturer will service figure it out! service service The dog wags the tail. In any commu- nication partnership, the onus to adapt service service rests on the side that most needs the rela- tionship to work. Translating between two data repre- sentations may very likely be lossy. Not service canonical service all of the information in one form can be moved to the other form. It’s highly likely that some stuff will be nulled out service service or possibly translated into a form that doesn’t precisely map. Each translation is lossy. By the time service service service the translation occurs, a loss of knowl- edge has occurred. The best results will be from dedicated transformations de- signed to take exactly one source and 12 services, 2 x N; 2 x 12 = 24 message transformers translate it as best as possible to ex- actly one target. This is the least lossy

42 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 practice form of translation. Unfortunately, is deemed more important than hold- Similarly, the base metadata continues this results in a boatload of translators. ing an illness private. Communication to move and adjust as it assimilates Creating a specific conversion for each cannot take place without understand- those new messages and fields that source and destination pair results in ing the assumptions and interpreting made no sense at all a short time ago. great conversion fidelity but also re- through that lens. sults in N2 converters (see Figure 2). The artificial language Esperanto Relishing Diversity What to do? Many times, we simply was created in 1887 with the hope of While not understanding another capture a canonical representation achieving a common shared natural party is a pain, it probably means that and do two data translations: first, a language for all people. Some folks innovation and growth have occurred. lossy translation into the canonical grabbed hold and used it to write Economic forces will drive when and representation; then, a lossy transla- and share. Some say a few million where it’s worth the bother to invest in tion from the canonical representation speak it today. deeper understanding. into the target representation. This is The use of Esperanto has been wan- Playing loose with understanding double-lossy and just doesn’t supply as ing, however. Each of the approximately allows for better cohesion, as exempli- good a result. 6,000 languages spoken by different fied by Amazon’s product catalog and Why do the translation to a canoni- communities in the world has its own the search results from Google or Bing. cal form? Because only 2*N translators flavor and nuance. You can say certain Remember that in many cases, cul- are needed for N sources, and that is things in one language that you just tural and contextual issues will drive a heck of a lot fewer than N2, as N gets can’t say in another one. how something is interpreted. Exten- large. Using canonical metadata as a sible data does not have a prearranged common translation reduces the num- Diverse and Homogeneous understanding. Translating between ber of converters but results in a dou- The words and phrases people use and representations is lossy and frequently ble-lossy conversion (see Figure 3). the metadata that applications use involves a painful trade-off between In most cases, people use canonical follows a similar pattern. With a com- expensive handcrafted translators and metadata to bound complexity but add mon codebase DNA and history, some even lossier multiple translations. specific source-to-target translators meanings are the same. As time, evolu- Personally, as the years have gone when the lossiness is too large. tion, and commingling occur, it’s more by, I’ve gotten much more relaxed difficult to understand one another. about the things I don’t know and What Color Are New software applications either in don’t understand. A lot of stuff confus- Your Rose-Colored Glasses? the cloud or on premises sometimes es me! As we interoperate across dis- We all see stuff couched in terms of a set offer enough business benefit that parate boundaries, it would do us well of assumptions. This is a worldview that enterprises adapt their ways of doing to remember that the less stressed we allows us to interpret incoming informa- business to fit the application. The new are about perfect understanding and tion. This interpretation may be right or user adopts the canonical representa- agreement, the better we will all get wrong, but, more importantly, it is right tion of data and business processes by along. Moving forward, I expect to be or wrong for our subjective usage. sheer hard work. When the business constantly and pleasantly befuddled by Computer systems are invariably de- value of the software is high enough, the power of babble. signed for a certain company, depart- mapping to it is cost effective. Now ment, or group. The data is typically cast the enterprise is much more closely Related articles into a meaning and use that are appro- aligned to the new approach and to on queue.acm.org priate for one side but lose their deeper interoperating with other enterprises meaning through the translation. sharing the new data and process. Immutability Changes Everything Sometimes, the meaning and un- Next, the enterprise will begin to ex- Pat Helland derstanding of some data are deeply tend the system using extensibility fea- http://queue.acm.org/detail.cfm?id_2884038 couched in cultural issues. Any trans- tures. These extensions can then become Standardized Storage Clusters lation to a new environment and cul- a source of misunderstanding, but they Garth Goodson et al. http://queue.acm.org/detail.cfm?id+1317402 ture simply loses all meaning. Read- bring business value to the enterprise. ing about daily life in Medieval Europe The U.S., Canada, and many other Search Considered Integral Ryan Barrows and Jim Traverso doesn’t help much unless you study the Western countries have tremendous http://queue.acm.org/detail.cfm?id=1142068 relationships between serfs and lord diversity in their populations. New ar- as well as between men and women. rivals bring new customs. They work Reference Only then can you understand the ac- to understand the existing customs in 1. Clark, D. 2009. The Apocalypse of Two Elephants, or ‘what I really said.’ Advanced Network Architecture. tions described in the book. Similarly, their new home. While there are many MIT CSAIL; http://groups.csail.mit.edu/ana/People/ in any discussion of privacy, cultural differences at first, in a few short years DDC/Apocalypse.html. expectations must be addressed. In the immigrants fit in. Their children are North America and Europe, protecting deeply ingrained in the new country, Pat Helland has been implementing transaction systems, databases, application platforms, distributed systems, against the damage that may result by even though they still like some of that fault-tolerant systems, and messaging systems since disclosing a medical challenge is para- food their mom cooked at home. That 1978. He currently works at Salesforce. mount. In India, the essential need to food becomes as American (or English Copyright held by author. vet a prospective spouse for your child or German) as pizza, tacos, and falafel. Publication rights licensed to ACM. $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 43 practice

DOI:10.1145/2980987 To help readers make informed de- Article development led by queue.acm.org sign decisions, this article describes advanced (and practical) synchroniza- tion methods that can push the per- Advanced synchronization methods can boost formance of designs using shared mu- the performance of multicore software. table data to levels that are acceptable to many applications. BY ADAM MORRISON To get a taste of the dilemmas in- volved in designing multicore software, let us consider a concrete problem: implementing a work queue, which al- lows threads to enqueue and dequeue Scaling work items—events to handle, packets to process, and so on. Issues similar to those discussed here apply in general to multicore software design.14 Synchronization Centralized shared queue. One natural work queue design (depicted in Figure 1a) is to implement a central- ized shared (thread-safe) version of the in Multicore familiar FIFO (first in, first out) queue data structure—say, based on a linked list. This data structure supports en- Programs queuing and dequeuing with a constant number of memory operations. It also easily facilitates dynamic load balanc- ing: because all pending work is stored in the data structure, idle threads can easily acquire work to perform. To make the data structure thread-safe, however, updates to the head and tail of the queue must be synchronized, and DESIGNING SOFTWARE FOR modern multicore this inevitably limits scalability. Using locks to protect the queue se- processors poses a dilemma. Traditional software rializes its operations: only one core at designs, in which threads manipulate shared data, a time can update the queue, and the have limited scalability because synchronization of others must wait for their turns. This ends up creating a sequential bottle- updates to shared data serializes threads and limits neck and destroying performance very parallelism. Alternative distributed software designs, quickly. One possibility to increase scal- in which threads do not share mutable data, eliminate ability is by replacing locks with lock- free synchronization, which directly synchronization and offer better scalability. But manipulates the queue using atomic distributed designs make it challenging to implement instructions,1,11 thereby reducing the amount of serialization. (Serialization features that shared data structures naturally is still a problem because the hardware provide, such as dynamic load balancing and strong cache coherence mechanism1 serial- consistency guarantees, and are simply not a good fit izes atomic instructions updating the same memory location.) In practice, for every program. however, lock-free synchronization Often, however, the performance of shared mutable often does not outperform lock-based synchronization, for reasons to be dis- data structures is limited by the synchronization cussed later.

methods in use today, whether lock-based or lock-free. Partially distributed queue. Al- BALL STEVE BY IMAGE

44 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 45 practice

ternative work-queue designs seek selected at random) until finding one is the data structure’s consistency guar- scalability by distributing the data containing work. antee. In particular, unlike the central- structure, which allows for more paral- This design should scale much ized shared queue, the distributed de- lelism but gives up some of the prop- better than the centralized shared sign does not maintain the cause and erties of the centralized shared queue. queue: enqueues by different cores effect relation in the program. Even

For example, Figure 1b shows a design run in parallel, as they update differ- if core P1 enqueues x1 to its queue af-

that uses one SPMC (single-producer/ ent queues, and (assuming all queues ter core P0 enqueues x0 to its queue, x1

multiple-consumer) queue per core. contain work) dequeues by different may be dequeued before x0. The design Each core enqueues work into its cores are expected to pick different weakens the consistency guarantees queue. Dequeues can be implemented queues to dequeue from, so they will provided by the data structure. in various ways—say, by iterating over also run in parallel. The fundamental reason for this all the queues (with the starting point What this design trades off, though, weakening is that in a distributed de- sign, it is difficult (and slow) to combine Figure 1. Possible designs for a work queue. the per-core data into a consistent view of the data structure—one that would have been produced by the simple cen- (a) all cores access a centralized shared queue tralized implementation. Instead, as in this case, distributed designs usually centralized FIFO queue weaken the data structure’s consisten- cy guarantees.5,8,14 Whether the weaker P0 P1 P2 guarantees are acceptable or not de- (b) each core has an SPMC queue pends on the application, but figuring this out—reasoning about the accept- able behaviors—complicates the task

SPMC SPMC SPMC of using the data structure. Despite its more distributed nature, P0 P1 P2 this per-core SPMC queue design can

(c) each core shares an SPSC queue with every other core still create a bottleneck when load is not balanced. If only one core gener- ates work, for example, then dequeuing cores all swoop on its queue and their SPSC SPSC SPSC SPSC SPSC SPSC SPSC SPSC SPSC operations become serialized.

P0 P1 P2 Distributed queue. To eliminate many-thread synchronization alto- gether, you can turn to a design such as the one depicted in Figure 1c, with Figure 2. Critical path of lock-based code. each core maintaining one SPSC (sin- gle-producer/single-consumer) queue (a) queue-based lock for each other core in the system, into which it enqueues items that it wishes lock critical lock acquire section release its peer to dequeue. As before, this de-

P0 [ ] [ ] [ ] sign weakens the consistency guaran- tee of the queue. It also makes dynamic (spinning) load balancing more difficult because P [ ] [ ] [ ] 1 it chooses which core will dequeue an item in advance.

P2 [ ] [ ] [ ] Motivation for improving synchro- time nization. The crux of this discussion is that obtaining scalability by distribut- (b) delegation ing the queue data structure trades off lock lock some useful properties that a central- acquire release critical sections ized shared queue provides. Usually, P [ ] [ ] [ ] 0 however, these trade-offs are clouded publish op by the unacceptable performance of P 1 [ ] [ centralized data structures, which

publish op obviates any benefit they might offer. This article makes the point that much P2 [ ] time of this poor performance is a result of inefficient synchronization methods. It surveys advanced synchronization

46 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 practice methods that boost the performance Can lock-based serialization be made of centralized designs and make them even more efficient? The delegation syn- acceptable to more applications. With chronization method described here these methods at hand, designers can does so: It eliminates most lock acqui- make more informed choices when ar- sition and releases from the critical chitecting their systems. The crux of this path, and it speeds up execution of the discussion is that critical sections themselves. Scaling Locking with Delegation Delegation. In a delegation lock, the Locking inherently serializes execu- obtaining scalability core holding the lock acts as a server tions of the critical sections it protects. by distributing and executes the operations that the Locks therefore limit scaling: the more cores waiting to acquire the lock wish cores there are, the longer each core the queue data to perform. Delegation improves scal- has to wait for its turn to execute the ability in several ways (Figure 2b). First, critical section, and so beyond some structure trades it eliminates the lock acquisitions and number of cores, these waiting times off some useful releases that otherwise would have dominate the computation. This scal- been performed by waiting threads. ability limit, however, can be pushed properties that a Second, it speeds up the execution of quite high—in some cases, beyond the centralized shared operations (critical sections), because scales of current systems—by serial- the data structure is hot in the server’s izing more efficiently. Locks that serial- queue provides. cache and does not have to be trans- ize more efficiently support more op- ferred from a remote cache or from erations per second and therefore can memory. Delegation also enables new handle workloads with more cores. optimizations that exploit the seman- More precisely, the goal is to mini- tics of the data structure to speed up mize the computation’s critical path: critical-section execution even further, the length of the longest series of op- as will be described. erations that have to be performed Implementing delegation. The idea sequentially because of data depen- of serializing faster by having a sin- dencies. When using locks, the critical gle thread execute the operations of path contains successful lock acquisi- the waiting threads dates to the 1999 tions, execution of the critical sections, work of Oyama et al.13 but the over- and lock releases. heads of their implementation over- As an example of inefficient serial- shadow its benefits. Hendler et al.6 in ization that limits scalability, consider their flat combining work, were first to the lock contention that infamously oc- implement this idea efficiently and curs in simple spin locks. When many to observe that it facilitates optimiza- cores simultaneously try to acquire tions based on the semantics of the a spin lock, they cause its cache line executed operations. to bounce among them, which slows In the flat combining algorithm, down lock acquisitions/releases. This every thread about to acquire the increases the length of the critical path lock posts the operation it intends to and leads to a performance “melt- perform (for example, dequeue or down” as core counts increase.2 enqueue(x)) in a shared publication Lock contention has a known so- list. The thread that acquires the lock lution in the form of scalable queue- becomes the server; the remaining based locks.2,10 Instead of having all threads spin, waiting for their opera- waiting threads compete to be the next tions to be applied. The server scans one to acquire the lock, queue-based the publication list, applies pending locks line up the waiting threads, operations, and releases the lock when enabling a lock release to hand the done. To amortize the synchronization lock to the next waiting thread. These cost of adding records to the publica- hand-offs, which require only a con- tion list, a thread leaves its publication stant number of cache misses per record in the list and reuses it in future acquisition/release, speed up lock operations. Later work explored pig- acquisition/release and decrease the gybacking the publication list on top length of the critical path, as depict- of a queue lock’s queue,4 and boosting ed in Figure 2a: the critical path of cache locality of the operations by ded- a queue-based lock contains only a icating a core for the server role instead single transfer of the lock’s cache line of having threads opportunistically be- (dotted arrow). come servers.9

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 47 practice

Figure 3. Enqueue/dequeue throughput comparison. Figure 4. Lock-free linking of a node to the head of a linked list.

30 struct Node { struct Node* next; 25 void* value; } // Pointer to head of the list delegation 20 Node* head = NULL;

void enqueue (void* v) { 15 Node *old, *new = malloc(); M ops/second new—>value = v; 10 lock-based while (true) { old = head; new—>next = old; 5 if ( CAS(&head, old, new) ) 1 2 4 8 10 12 14 16 18 20 return; Threads } }

Semantics-based optimizations. The Deferring delegation. For operations heavy workloads can be rare—all of its server thread has a global view of con- that only update the data structure operations execute asynchronously. currently pending operations, which but do not return a value, such as en- The original implementation of it can leverage to optimize their execu- queue(), delegation facilitates an opti- this optimization still required cores tion in two ways: mization that can sometimes eliminate to synchronize when executing these ˲˲ Combining. The server can com- serialization altogether. Since these deferred operations, since it logged bine multiple operations into one and operations do not return a response the operations in a centralized (lock- thereby save repeated accesses to the to the invoking core, the core does not free) queue.7 Boyd-Wickizer et al.,3 data structure. For example, multiple have to wait for the server to execute however, implemented deferred del- counter-increment operations can be them; it can just log the requested op- egation without any synchronization converted into one addition. eration in the publication list and keep on updates by leveraging systemwide ˲˲ Elimination. Mutually canceling running. If the core later invokes an op- synchronized clocks. Their OpLog operations, such as a counter incre- eration whose return value depends on library logs invocations of response- ment and decrement, or an insertion the state of the data structure, such as less update operations in a per-core and removal of the same item from a a dequeue(), it must then wait for the log, along with their invocation times. set, can be executed without modifying server to apply all its prior operations. Operations that read the data struc- the data structure at all. But until such a time—which in update- ture become servers: they acquire the locks of all per-core logs, apply the op- Figure 5. Critical path of lock-free updating head of linked list. erations in timestamp order, and then read the updated data-structure state. (a) ideal case OpLog thus creates scalable imple- read mentations of data structures that are head updated heavily but read rarely, such as P [ ] 0 CAS LRU (least recently used) caches. Performance. To demonstrate the benefits of delegation, let’s compare P1 [ CAS ] a lock-based work queue to a queue implemented using delegation. The P [ CAS ] 2 lock-based algorithm is Michael and time Scott’s two-lock queue.11 This algorithm protects updates to the queue’s head (b) effect of CAS failures and tail with different locks, serializ- read head ing operations of the same type but al- P lowing enqueues and dequeues to run 0 [ CAS ] in parallel. Queue-based CLH (Craig, failure Landin, and Hagerstein) locks are used P 1 [ CAS ] in the evaluated implementation of the lock-based algorithm. The delegation- P 2 [ CAS ] based queue is Fatourou and Kallima- nis’s CC-Queue,4 that adds delegation time to each of the two locks in the lock- based algorithm. (It thus has two serv-

48 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 practice ers running: one for dequeues and one It atomically updates the value stored (or only light) contention. Under high for enqueues.) in addr from old to new; if the value contention, however, lock-free syn- Figure 3 shows enqueue/dequeue stored in addr is not old, the CAS fails chronization has the potential to be throughput comparison (higher is without updating memory. much more efficient than lock-based better) of the lock-based queue and CAS-based lock-free algorithms syn- synchronization, as it eliminates lock its delegation-based version. The chronize with a CAS loop pattern: A core acquire and release operations from benchmark models a generic applica- reads the shared state, computes a new the critical path, leaving only the data tion. Each core repeatedly accesses value, and uses CAS to update a shared structure operations on it (Figure 5a). the data structure, performing pairs variable to the new value. If the CAS In addition, lock-free algorithms of enqueue and dequeue operations, succeeds, this read-compute-update guarantee that some operation can al- reporting the throughput of queue op- sequence appears to be atomic; other- ways complete and thus behave grace- erations (that is, the total number of wise, the core must retry. Figure 4 shows fully under high load, whereas a lock- queue operations completed per sec- an example of such a CAS loop for link- based algorithm can grind to a halt ond). To model the work done in a real ing a node to the head of a linked list, if the operating system preempts a application, a period of “think time” is taken from Treiber’s classic LIFO (last thread that holds a lock. inserted after each queue operation. in, first out) stack algorithm.15 Similar In practice, however, lock-free al- Think times are chosen uniformly at ideas underlie the lock-free implemen- gorithms may not live up to these per- random from 1 to 100 nanoseconds to tations of many basic data structures formance expectations. Consider, for model challenging workloads in which such as queues, stacks, and priority example, Michael and Scott’s lock-free queues are heavily exercised. The C im- queues, all of which essentially perform queue algorithm.11 This algorithm im- plementations of the algorithms from an entire data-structure update with a plements a queue using a linked list, Fatourou and Kallimanis’s benchmark single atomic instruction. with items enqueued to the tail and re- framework (https://github.com/nkal- The use of (sometimes multiple) moved from the head using CAS loops. lima/sim-universal-construction) are atomic instructions can make lock- (The exact details are not as important used, along with a scalable memory free synchronization slower than a as the basic idea, which is similar in allocation library to avoid malloc bot- lock-based solution when there is no spirit to the example in Figure 4.) De- tlenecks. No semantics-based optimi- zation is implemented. Figure 6. Lock-free synchronization CAS failure problem. This benchmark (and all other ex- periments reported in this article) was (a) Lock-free vs. Lock-based Queue Throughput run on an Intel Xeon E7-4870 (West- mere EX) processor. The processor has 11 10 2.40-GHz cores, each of which mul- 10 tiplexes two hardware threads, for a to- tal of 20 hardware threads. 9 lock-based Figure 3 shows the benchmark throughput results, averaged over 10 8 runs. The lock-based algorithm scales 7

M ops/second lock-free to two threads, because it uses two locks, but fails to scale beyond that 6 amount of concurrency because of se- 5 rialization. In contrast, the delegation- 1 2 4 8 10 12 14 16 18 20 based algorithm scales and ultimately Threads performs almost 30 million operations per second, which is more than 3.5 (b) Effect of CAS Failures times that of the lock-based algo- rithm’s throughput. 30 5 CAS 25 Avoiding CAS Failures in 4 Lock-Free Synchronization Lock-free synchronization (also re- 20 CAS per op 3 ferred to as nonblocking synchroniza- CAS/op tion) directly manipulates shared data 15 M ops/second using atomic instructions instead of CAS loop 2 locks. Most lock-free algorithms use 10 the CAS (compare-and-swap) instruc- 5 1 tion (or equivalent) available on all 1 2 4 8 10 12 14 16 18 20 multicore processors. A CAS takes Threads three operands: a memory address addr, an old value, and a new value.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 49 practice

spite this, as Figure 6a shows, the lock- operations that fail in this way pile use- tions completed in Figure 5a). free algorithm fails to scale beyond four less work on the critical path. Although To estimate the amount of perfor- threads and eventually performs worse these failing CASes do not modify mem- mance wasted because of CAS failures, than the two-lock queue algorithm. ory, executing them still requires ob- Figure 6b compares the throughput The reason for this poor performance taining exclusive access to the variable’s of successful CASes executed in a CAS is CAS failure: as the amount of concur- cache line. This delays the time at which loop (as in Figure 4) to the total CAS rency increases, so does the chance that later operations obtain the cache line throughput (including failed CASes). a conflicting CAS gets interleaved in the and complete successfully (see Figure Observe that the system executes con- middle of a core’s read-compute-update 5b, in which only two operations com- tending atomic instructions at almost CAS region, causing its CAS to fail. CAS plete in the same time that three opera- three times the rate ultimately observed in the data structure. If there were a way Figure 7. Infinite array queue. to make every atomic instruction use- ful toward completing an operation, // The following defines a node you would significantly improve perfor- struct Cell { mance. But how can this be achieved, void* value; } given that CAS failures are inherent? // Queue is infinite array of nodes, The key observation to make is that // with head and tail pointers. the x86 architecture supports several Cell Q [] = { ┴, ┴, ...}; atomic instructions that always suc- int head = 0; int tail = 0; ceed. One such instruction is FAA (fetch-and-add), which atomically adds void enqueue(void* x) { an integer to a variable and returns the while (true) { t = FAA(&tail, 1) previous value stored in that variable. if ( CAS(&Q[t], ┴, x) ) return The design of a lock-free queue based } } on FAA instead of CAS is described void *dequeue() { next. The algorithm, named LCRQ (for while (true) { 12 h = FAA(&head, 1) linked concurrent ring queue), uses if ( !CAS(&Q[h], ┴, ┬) ) return Q[h] FAA instructions to spread threads if ( tail ≤ h+1 ) return NULL among items in the queue, allowing } } them to enqueue and dequeue quickly and in parallel. LCRQ operations typi- cally perform one FAA to obtain their Figure 8. Enqueue/dequeue throughput comparison of all queues. position in the queue, providing exact- ly the desired behavior.

(a) 45 The LCRQ Algorithm 40 LCRQ (FAA) This section presents an overview of the 35 LCRQ algorithm; for a detailed descrip- 12 30 delegation tion and evaluation, see the paper. 25 Conceptually, LCRQ can be viewed as 20 a practical realization of the follow- LCRQ (CAS) 15 ing simple but unrealistic queue algo- M ops/second 10 rithm (Figure 7). The unrealistic al- 5 lock-free (CAS) gorithm implements the queue using 0 an infinite array, Q, with (unbounded) 1 2 4 8 10 12 14 16 18 20 Threads head and tail indices that identify (b) the part of Q that may contain items. 45 Initially, each cell Q[i] is empty and 40 LCRQ (FAA) contains a reserved value ⊥ that may 35 not be enqueued. The head and tail 30 indices are manipulated using FAA 25 and are used to spread threads around 20 LCRQ (CAS) the cells of the array, where they syn- 15 M ops/second chronize using (uncontended) CAS. 10 lock-free (CAS) An enqueue(x) operation obtains 5 delegation a cell index t via an FAA on tail. 0 20 30 40 80 120 160 The enqueue then atomically places Threads x in Q[t] using a CAS to update Q[t] from ⊥ to x. If the CAS succeeds, the enqueue operation completes; other-

50 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 practice wise, it repeats this process. previously. The impact of CAS failures Maximizing Power Efficiency with A dequeue, D, obtains a cell in- is explored by testing LCRQ-CAS, a ver- Asymmetric Multicore Systems dex h using FAA on head. It tries to sion of LCRQ in which FAA is imple- Alexandra Fedorova et al. http://queue.acm.org/detail.cfm?id=1658422 atomically CAS the contents of Q[h] mented with a CAS loop. from ⊥ to another reserved value . Figure 8a shows the results. LCRQ Software and the Concurrency Revolution Herb Sutter and James Larus This CAS fails if Q[h] contained some outperforms all other queues beyond http://queue.acm.org/detail.cfm?id=1095421 x ≠ ⊥, in which case D returns x. Oth- two threads, achieving peak throughput erwise, the fact that D stored  in the of ≈ 40 million operations per second, or References cell guarantees an enqueue operation about 1,000 cycles per queue operation. 1. Al Bahra, S. Nonblocking algorithms and scalable multicore programming. Commun. ACM 56, 7 (July that later tries to store an item in Q[h] From eight threads onward, LCRQ out- 2013), 50–61. will not succeed. D then returns NULL performs the delegation-based queue 2. Boyd-Wickizer, S., Frans Kaashoek, M., Morris, R. and Zeldovich, N. Non-scalable locks are dangerous. In (indicating the queue is empty) if tail by 1.4 to 1.5 times and the MS (Michael Proceedings of the Ottawa Linux Symposium, 2012, ≤ h + 1 (the value of head following D’s and Scott) queue by more than three 121–132. 3. Boyd-Wickizer, S., Frans Kaashoek, M., Morris, R. and FAA is h + 1). If D cannot return NULL, times. LCRQ-CAS matches LCRQ’s Zeldovich, N. OpLog: A library for scaling update- it repeats this process. performance up to four threads, but at heavy data structures. Technical Report MIT-CSAIL- TR2014-019, 2014. This algorithm can be shown to that point its performance levels off. 4. Fatourou, P. and Kallimanis, N.D. Revisiting the implement a FIFO queue correctly, Subsequently, LCRQ-CAS exhibits the combining synchronization technique. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles but it has two major flaws that prevent throughput “meltdown” associated with and Practice of Parallel Programming, 2012, 257–266. it from being relevant in practice: us- CAS failures. Similarly, the MS queue’s 5. Haas, A., Lippautz, M., Henzinger, T.A., Payer, H., Sokolova, A., Kirsch, C.M. and Sezgin, A. Distributed ing an infinite array and susceptibility performance peaks at two threads and queues in shared memory: multicore performance and scalability through quantitative relaxation. In to livelock (when a dequeuer continu- degrades as concurrency increases. Proceedings of the ACM International Conference on ously writes  into the cell an enqueuer Oversubscribed workloads can Computing Frontiers, 2013, 17:1–17:9. 6. Hendler, D., Incze, I., Shavit, N., Tzafrir, M. Flat is about to access). The practical LCRQ demonstrate the graceful behavior of combining and the synchronization-parallelism algorithm addresses these flaws. lock-free algorithms under high load. trade-off. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, 2010, The infinite array is first collapsed to In these workloads the number of soft- 355–364. a concurrent ring (cyclic array) queue— ware threads exceeds the hardware-sup- 7. Klaftenegger, D., Sagonas, K. and Winblad, K. Delegation locking libraries for improved performance CRQ for short—of R cells. The head ported level, forcing the operating sys- of multithreaded programs. In Proceedings of the and tail indices still strictly increase, tem to context-switch between threads. 20th International European Conference on Parallel and Distributed Computing, 2014, 572–583. but now the value of an index modulo R If a thread holding a lock is preempted, 8. Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, specifies the ring cell to which it points. a lock-based algorithm cannot make G., Bala, K., Chew, L.P. Optimistic parallelism requires abstractions. In Proceedings of the 2007 ACM Because now more than one enqueuer progress until it runs again. Indeed, as SIGPLAN Conference on Programming Language and dequeuer can concurrently access Figure 8b shows, when the number of Design and Implementation, 211–222. 9. Lozi, J.-P., David, F., Thomas, G., Lawall, J. and a cell, the CRQ uses a more involved threads exceeds 20, the throughput of Muller, G. Remote core locking: migrating critical CAS-based protocol for synchronizing the lock-based delegation algorithm section execution to improve the performance of multithreaded applications. In Proceedings of the within each cell. This protocol enables plummets by 15 times, whereas both 2012 USENIX Annual Technical Conference, 65–76. 10. Mellor-Crummey, J.M. and Scott, M.L. Algorithms an operation to avoid waiting for the LCRQ and the MS queue maintain their for scalable synchronization on shared-memory completion of operations whose FAA peak throughput. multiprocessors. ACM Trans. Computer Systems 9, 1 (1991), 21–65 . returns smaller indices that also point 11. Michael, M.M. and Scott, M.L. Simple, fast, and to the same ring cell. Conclusion practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the 15th Annual ACM The CRQ’s crucial performance Advanced synchronization methods Symposium on Principles of Distributed Computing, property is that in the common fast can boost the performance of shared 1996, 267–275. 12. Morrison, A. and Afek, Y. Fast concurrent queues path, an operation executes only one mutable data structures. Synchroniza- for x86 processors. In Proceedings of the 18th ACM FAA instruction. The LCRQ algorithm tion still has its price, and when perfor- SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013, 103–112. then builds on the CRQ to prevent the mance demands are extreme (or if the 13. Oyama, Y., Taura, K. and Yonezawa, A. Executing livelock problem and handle the case properties of centralized data struc- parallel programs with synchronization bottlenecks efficiently. InProceedings of International Workshop of the CRQ filling up. The LCRQ is es- tures are not needed), then distributed on Parallel and Distributed Computing for Symbolic sentially a Michael and Scott linked data structures are probably the right and Irregular Applications, 1999, 182–204. 14. Shavit, N. Data structures in the multicore age. Comm. 11 list queue in which each node is a choice. For the many remaining cases, ACM 54, 3 (Mar. 2011), 76–84. CRQ. A CRQ that fills up or experiences 15. Treiber, R.K. Systems programming: coping with however, the methods described in parallelism. Technical Report RJ5118 (2006). IBM livelock becomes closed to further en- this article can help build high-perfor- Almaden Research Center. queues, which instead append a new mance software. Awareness of these CRQ to the list and begin working in it. methods can assist those designing Adam Morrison works on making parallel and distributed systems simpler to use without compromising their Most of the activity in the LCRQ there- software for multicore machines. performance. He is an assistant professor at the Blavatnik fore occurs in the individual CRQs, School of Computer Science, Tel Aviv University, Israel. making contention (and CAS failures) Related articles on the list’s head and tail a nonissue. on queue.acm.org Performance. This section com- Nonblocking Algorithms and Scalable pares the LCRQ to Michael and Scott’s Multicore Programming 11 classic lock-free queue, as well as to Samy Al Bahra Copyright held by author. the delegation-based variant presented http://queue.acm.org/detail.cfm?id=2492433 Publication rights licensed to ACM. $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 51 practice

DOI:10.1145/2949033 and remained primarily academic con- Article development led by queue.acm.org cerns. As a result, these protocols were largely ignored by industry. The rise of Internet-scale services and demands Expert-curated guides to the best for automated solutions to cluster of CS research for practitioners. management, failover, and sharding in the 2000s finally led to the widespread BY PETER BAILIS, CAMILLE FOURNIER, practical adoption of these techniques. JOY ARULRAJ, AND ANDREW PAVLO Adoption proved difficult, however, and the process in turn led to new (and ongoing) research on the subject. The papers in this selection highlight the challenges and the rewards of making the theory of consensus practical— Research both in theory and in practice. Second, while consensus concerns distributed shared state, our second selection concerns the impact of hard- for Practice: ware trends on single-node shared state. Joy Arulaj and Andy Pavlo provide Distributed Consensus a whirlwind tour of the implications of NVM (non-volatile memory) technolo- gies on modern storage systems. NVM and Implications of promises to overhaul the traditional paradigm that stable storage (that is, NVM on Database storage that persists despite failures) be block-addressable (that is, requires Management Systems reading and writing in large chunks). In addition, NVM’s performance char- acteristics lead to entirely different de- sign trade-offs than conventional stor- age media such as spinning disks. As a result, there is an arms race to rethink software storage-systems ar- chitectures to accommodate these new IN THIS INSTALLMENT of Research for Practice, we characteristics. This selection high- provide highlights from two critical areas in storage lights projected implications for recov- ery subsystems, data-structure design, and large-scale services: distributed consensus and and data layout. While the first NVM non-volatile memory. devices have yet to make it to market, First, how do large-scale distributed systems mediate these pragmatically oriented citations from the literature hint at the volatile access to shared resources, coordinate updates to effects of non-volatile media on future mutable state, and reliably make decisions in the storage systems. presence of failures? Camille Fournier, a seasoned I believe these two excellent contri- and experienced distributed-systems builder (and about rfp ZooKeeper PMC), has curated a fantastic selection Research for Practice combines the resources of the ACM Digital Library, on distributed consensus in practice. The history the largest collection of computer science research in the world, with the expertise of consensus echoes many of the goals of RfP: For of the ACM membership. In every RfP decades the study and use of consensus protocols column two experts share a short, curated selection of papers on a concentrated, were considered notoriously difficult to understand practically oriented topic.

52 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 butions fulfill RfP’s goal of allowing provides useful functionality for devel- Hell Is Other Programmers you, the reader, to become an expert opers. The final paper attempts to an- in a weekend afternoon’s worth of swer the question—Is there an easier Burrows, M. reading. To facilitate this process, as way?—by introducing Raft, a consen- The Chubby lock service for loosely coupled distributed systems. always, we have provided open access sus algorithm designed to be easier for Proceedings of the 7th Symposium on Operating to the ACM Digital Library for the rele- developers to understand. Systems Design and Implementation, 2006, vant citations from these selections so 335–350. you can enjoy these research results in Theory Meets Reality http://queue.a cm.org/rfp/vol14iss3.html their full glory. Keep on the lookout for our next installment, and please enjoy! Chandra, T. D., Griesemer, R., While “Paxos Made Live” discusses the — Peter Bailis Redstone, J. et al. implementation of the consensus al- Paxos made live—An engineering perspective. gorithm in detail, this paper about the Peter Bailis is assistant professor of computer science th Chubby lock service examines the over- at Stanford University. His research in the Future Data Proceedings of the 26 Annual ACM Systems group (http://futuredata.stanford.edu/) focuses Symposium on Principles of Distributed all system built around this algorithm. on the design and implementation of next-generation Computing, 2007, 398–407. data-intensive systems. As research papers go, this one is a true http://queue.acm.org/rfp/vol14iss3.html delight for the practitioner. In particu- lar, it describes designing a system and Distributed Paxos as originally stated is a page then evolving that design after it comes Consensus of pseudocode. The complete imple- into contact with real-world usage. By Camille Fournier mentation of Paxos inside of Google’s This paper should be required reading “A distributed system is Chubby lock service is several thou- for anyone interested in designing and one in which the failure sand lines of C++. What happened? developing core infrastructure soft- of a computer you didn’t “Paxos Made Live” documents the ware that is to be offered as a service. know existed can render your own com- evolution of the Paxos algorithm from Burrows begins with a discussion of puter unusable.” theory into practice. the design principles chosen as the ba- —Leslie Lamport The basic idea of Paxos is to use vot- sis for Chubby. Why make it a central- As Lamport predicted in this quote, ing by replicas with consistent storage ized service instead of a library? Why is the real challenges of distributed com- to ensure that, even in the presence of it a lock service, and what kind of lock- puting—not just communicating via failures, there can be unilateral con- ing is it used for? Chubby not only pro- a network, but communicating to un- sensus. This requires a coordinator vides locks, but also serves small files known nodes in a network—has greatly be chosen, proposals sent and voted to facilitate sharing of metadata about intensified in the past 15 years. With the upon, and finally a commit recorded. distributed system state for its clients. incredible scaling of modern systems, Generally, systems record a series of Given that it is serving files, how many “we have found ourselves in a world these consensus values to a sequence clients should Chubby expect to sup- where answering the question, what is log. This log-based variant is called port, and what will that mean for the running where?” is increasingly diffi- multi-Paxos, which is less formally caching and change notification needs? cult. Yet, we continue to have require- specified. After discussing the details of the de- ments that certain data never be lost In creating a real system, durable sign, system structure, and API, Burrows and that certain actions behave in a con- logs are written to disks, which have gets into the nitty-gritty of the imple- sistent and predictable fashion, even finite capacity and are prone to corrup- mentation. Building a highly sensitive when some nodes of the system may tion that must be detected and taken centralized service for critical operations fail. To that end, there has been a rapid into account. The algorithm must be such as distributed locking and name adoption of systems that rely on consen- run on machines that can fail, and to resolution turns out to be quite difficult. sus protocols to guarantee this consis- make it operable at scale you need to Scaling the system to tens of thousands of tency in a widely distributed world. be able to change group membership clients meant being smart about caching The three papers included in this dynamically. While the system was and deploying proxies to handle some of selection address the real world of con- expected to be fault tolerant, it also the load. The developers misused and sensus systems: Why are they needed? needed to perform quickly enough to abused the system by accident, using fea- Why are they difficult to understand? be useful; otherwise, developers would tures in unpredictable ways, attempting What happens when you try to imple- work around it and create incorrect to use the system for large data storage or ment them? Is there an easier way, abstractions. The team details their messaging. The Chubby maintainers re- something that more developers can efforts to ensure the core algorithm is sorted to reviewing other teams’ planned understand and therefore implement? expressed correctly and is testable, but uses of Chubby and denying access until The first two papers discuss the even with these conscious efforts, the review was satisfied. Through all of this reality of implementing Paxos-based need for performance optimizations, we can see that the challenge in building consensus systems at Google, focus- concurrency, and multiple develop- a consensus system goes far beyond im- ing first on the challenges of correctly ers working on the project still means plementing a correct algorithm. We are implementing Paxos itself, and second that the final system is ultimately an still building a system and must think as on the challenges of creating a system extended version of Paxos, which is dif- carefully about its design and the users based on a consensus algorithm that ficult to prove correct. we will be supporting.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 53 practice

Can We Make This Easier? were very few reliable and successful nologies—including phase-change open-source systems based on Paxos. memory, memristors, and STT-MRAM Ongaro, D., Ousterhout, J. (spin-transfer torque-magnetoresis- In search of an understandable consensus Bottlenecks, Single Points of tive random-access memory)—that algorithm. Failure, and Consensus Proceedings of the Usenix Annual Technical provide low-latency reads and writes Conference, 2014, 305–320. Developers are often tempted to use on the same order of magnitude as https://www.usenix.org/conference/atc14/ a centralized consensus system to serve DRAM (dynamic random-access mem- technical-sessions/presentation/ongaro as the system of record for distributed ory), but with persistent writes and coordination. Explicit coordination large storage capacity like an SSD (sol- Finally we come to the question, have can make certain problems much easi- id-state drive). Unlike DRAM, writes to we built ourselves into unnecessary er to reason about and correct for; how- NVM are expected to be more expen- complexity by taking it on faith that ever, that puts the consensus system in sive than reads. These devices also Paxos and its close cousins are the only the position of the bottleneck or criti- have limited write endurance, which way to implement consensus? What if cal point of failure for the other sys- necessitates fewer writes and wear- there was an algorithm that we could tems that rely on it to make progress. leveling to increase their lifetimes. also show to be correct but was de- As we can see from these papers, mak- The first NVM devices released will signed to be easier for people to com- ing a centralized consensus system have the same form factor and block- prehend and implement correctly? production-ready can come at the cost oriented access as today’s SSDs. Thus, Raft is a consensus algorithm writ- of adding optimizations and recovery today’s DBMSs will use this type of ten for managing a replicated log but mechanisms that were not dreamed of NVM as a faster drop-in replacement designed with the goal of making the in the original Paxos literature. for their current storage hardware. algorithm itself more understandable What is the way forward? Argu- By the end of this decade, however, than Paxos. This is done both by de- ably, writing systems that do not rely NVM devices will support byte-address- composing the problem into pieces on centralized consensus brokers to able access akin to DRAM. This will that can be implemented and under- operate safely would be the best op- require additional CPU architecture stood independently and by reducing tion, but we are still in the early days and operating-system support for per- the number of states that are valid for of coordination-avoidance research sistent memory. This also means that the system to hold. and development. While we wait for existing DBMSs are unable to take full Consensus is decomposed into is- more evolution on that front, Raft advantage of NVM because their inter- sues of leader election, log replication, provides an interesting alternative, nal architectures are predicated on the and safety. Leader election uses ran- an algorithm designed for readability assumption that memory is volatile. domized election timeouts to reduce and general understanding. The im- With NVM, many of the components the likelihood of two candidates for pact of having an easier algorithm to of legacy DBMSs are unnecessary and leader splitting the vote and requiring implement is already being felt, as far will degrade the performance of data- a new round of elections. It allows can- more developers are embedding Raft intensive applications. didates for leader to be elected only within distributed systems and build- We have selected three papers that if they have the most up-to-date logs. ing specifically tailored Raft-based focus on how the emergence of byte- This prevents the need for transfer- coordination brokers. Consensus addressable NVM technologies will ring data from follower to leader upon remains a tricky problem—but one impact the design of DBMS archi- election. If a follower’s log does not that is finally seeing a diversity of ap- tectures. The first two present new match the expected state for a new en- proaches to reaching a solution. abstractions for performing durable try, the leader will replay entries from atomic updates on an NVM-resident earlier in its log until it reaches a point Camille Fournier is a writer, speaker, and entrepreneur. database and recovery protocols for an Formerly the CTO of Rent the Runway, she serves on at which the logs match, thus correct- the technical oversight committee for the Cloud Native NVM DBMS. The third paper address- Computing Foundation, as a Project Management ing the follower. This also means that a Committee member of the Apache ZooKeeper project, and es the write-endurance limitations of history of changes is stored in the logs, a project overseer of the Dropwizard Web framework. NVM by introducing a collection of providing a side value of letting clients write-limited query-processing algo- read (some) historical entries, should Implications of NVM on rithms. Thus, this selection contains they desire. Database Management novel ideas that can help leverage the The authors then show that after Systems unique set of attributes of NVM devic- teaching a set of students both Paxos By Joy Arulraj and es for delivering the features required and Raft, the students were quizzed Andrew Pavlo by modern data-management applica- on their understanding of each and The advent of non-vola- tions. The common theme for these scored meaningfully higher on the tile memory (NVM) will papers is that you cannot just run an Raft quiz. Looking around the current fundamentally change existing DBMS on NVM and expect it state of consensus systems in indus- the dichotomy between to leverage its unique set of proper- try, we can see this play out in another memory and durable ties. The only way to achieve that is to way: namely, several new consensus storage in a database come up with novel architectures, pro- systems have been created since 2014 management system tocols, and algorithms that are tailor- based on Raft, where previously there (DBMS). NVM is a broad class of tech- made for NVM.

54 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 practice

ARIES Redesigned for NVM dates on an NVM-resident database new bounds on the number of writes than the previous paper. In ARIES, that different kinds of query-process- Coburn, J., et al. during recovery the DBMS first loads ing algorithms must perform. From ARIES to MARS: Transaction support the most recent snapshot. It then Viglas presents a collection of novel for next-generation, solid-state drives. Proceedings of the 24th ACM Symposium on replays the redo log to ensure that query-processing algorithms that mini- Operating Systems Principles, 2013, 197–212. all the updates made by committed mize I/O by trading off expensive NVM http://queue.acm.org/rfp/vol14iss3.html transactions are recovered. Finally, it writes for cheaper reads. One such uses the undo log to ensure that the algorithm is the segment sort. The ba- ARIES is considered the standard for changes made by incomplete transac- sic idea is to use a combination of two recovery protocols in a transactional tions are not present in the database. sorting algorithms—external merge DBMS. It has two key goals: first, it pro- This recovery process can take a lot of sort and selection sort—that splits the vides an interface for supporting scal- time, depending on the load on the input into two segments that are then able ACID (atomicity, consistency, iso- system and the frequency with which processed using a different algorithm. lation, durability) transactions; second, snapshots are taken. Thus, this paper The selection-sort algorithm uses extra it maximizes performance on disk- explores whether it is possible to le- reads, and writes out each element in based storage systems. In this paper, verage NVM’s properties to speed up the input only once at its final location. the authors focus on how ARIES should recovery from system failures. By using a combination of these two al- be adapted for NVM-based storage. The authors present a software- gorithms, the DBMS can optimize both Since random writes to the disk based primitive called non-volatile the performance and the number of whenever a transaction updates the da- pointer. When a pointer points to data NVM writes. tabase obviously decrease performance, residing on NVM, and is itself stored ARIES requires that the DBMS first re- on NVM, then it will remain valid even Game Changer for cord a log entry in the write-ahead log after the system recovers from a power DBMS Architectures (a sequential write) before updating failure. Using this primitive, the au- NVM is a definite game changer for the database itself (a random write). It thors design a library of non-volatile future DBMS architectures. It will re- adopts a no-force policy wherein the up- data structures that support durable quire system designers to rethink dates are written to the database lazily atomic updates. They propose a recov- many of the core algorithms and after the transaction commits. Such a ery protocol that, in contrast to MARS, techniques developed over the past policy assumes that sequential writes obviates the need for an ARIES-style 40 years. Using these new storage to non-volatile storage are significantly redo log. This enables the system to devices in the manner prescribed faster than random writes. The authors, skip replaying the redo log, and there- by these papers will allow DBMSs however, demonstrate that this is no by allows the NVM DBMS to recover the to achieve better performance than longer the case with NVM. database almost instantaneously. what is possible with today’s hard- The MARS protocol proposes a new Both papers propose recovery pro- ware for write-heavy database appli- hardware-assisted logging primitive tocols that target an NVM-only storage cations. This is because these tech- that combines multiple writes to ar- hierarchy. The generalization of these niques are designed to exploit the bitrary storage locations into a single protocols to a multitier storage hierar- low-latency read/writes of NVM to en- atomic operation. By leveraging this chy with both DRAM and NVM is a hot able a DBMS to store less redundant primitive, MARS eliminates the need for topic in research today. data and incur fewer writes. Further- an ARIES-style undo log and relies on more, we contend that existing in- the NVM device to apply the redo log at Trading Expensive Writes memory DBMSs are better positioned commit time. We are particularly fond for Cheaper Reads to use NVM when it is finally available. of this paper because it helps in better This is because these systems are al- appreciating the intricacies involved Viglas, S.D. ready designed for byte-addressable in designing the recovery protocol in a Write-limited sorts and joins access methods, whereas legacy disk- for persistent memory. DBMS for guarding against data loss. Proceedings of the VLDB Endowment 7, 5 oriented DBMSs will require laborious (2014), 413–424. and costly overhauls in order to use Near-Instantaneous http://www.vldb.org/pvldb/vol7/p413-viglas.pdf NVM correctly, as described in these Recovery Protocols papers. Word is bond. The third paper focuses on the higher Arulraj, J., Pavlo, A., Dulloor, S.R. write costs and limited write-endur- Joy Arulraj is a Ph.D. candidate at Carnegie Mellon Let’s talk about storage and recovery University. He is interested in the design and ance problems of NVM. For several de- implementation of next-generation database methods for non-volatile memory management systems. database systems. cades algorithms have been designed Proceedings of the ACM SIGMOD International for the random-access machine model Andy Pavlo is an assistant professor of databaseology in the Department of Computer Science at Carnegie Conference on Management of Data, 2015, where reads and writes have the same Mellon University, Pittsburgh, PA. 707–722. cost. The emergence of NVM devices, http://queue.acm.org/rfp/vol14iss3.html where writes are more expensive than reads, opens up the design space for

This paper takes a different approach new write-limiting algorithms. It will Copyright held by authors. to performing durable atomic up- be fascinating to see researchers derive Publication rights licensed to ACM. $15.00.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 55 contributed articles

DOI:10.1145/2934664 for interactive SQL queries and Pregel11 This open source computing framework for iterative graph algorithms. In the open source Apache Hadoop stack, unifies streaming, batch, and interactive big systems like Storm1 and Impala9 are data workloads to unlock new applications. also specialized. Even in the relational database world, the trend has been to BY MATEI ZAHARIA, REYNOLD S. XIN, PATRICK WENDELL, move away from “one-size-fits-all” sys- TATHAGATA DAS, MICHAEL ARMBRUST, ANKUR DAVE, tems.18 Unfortunately, most big data XIANGRUI MENG, JOSH ROSEN, SHIVARAM VENKATARAMAN, applications need to combine many MICHAEL J. FRANKLIN, ALI GHODSI, JOSEPH GONZALEZ, different processing types. The very SCOTT SHENKER, AND ION STOICA nature of “big data” is that it is diverse and messy; a typical pipeline will need MapReduce-like code for data load- ing, SQL-like queries, and iterative machine learning. Specialized engines Apache Spark: can thus create both complexity and inefficiency; users must stitch together disparate systems, and some applica- tions simply cannot be expressed effi- ciently in any engine. A Unified In 2009, our group at the Univer- sity of California, Berkeley, started the Apache Spark project to design a unified engine for distributed data Engine for processing. Spark has a programming model similar to MapReduce but ex- tends it with a data-sharing abstrac- tion called “Resilient Distributed Da- Big Data tasets,” or RDDs.25 Using this simple extension, Spark can capture a wide range of processing workloads that previously needed separate engines, Processing including SQL, streaming, machine learning, and graph processing2,26,6 (see Figure 1). These implementations use the same optimizations as special- ized engines (such as column-oriented processing and incremental updates) and achieve similar performance but THE GROWTH OF data volumes in industry and research run as libraries over a common en- poses tremendous opportunities, as well as tremendous gine, making them easy and efficient to compose. Rather than being specific computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed key insights new systems to scale out computations to multiple ˽˽ A simple programming model can capture streaming, batch, and interactive nodes. As a result, there has been an explosion of workloads and enable new applications new cluster programming models targeting diverse that combine them. ˽˽ Apache Spark applications range from 1,4,7,10 computing workloads. At first, these models were finance to scientific data processing and combine libraries for SQL, machine relatively specialized, with new models developed for learning, and graphs. new workloads; for example, MapReduce4 supported ˽˽ In six years, Apache Spark has grown to 1,000 contributors and 13 batch processing, but Google also developed Dremel thousands of deployments.

56 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 Analyses performed using Spark of brain activity in a larval zebrafish: (left) matrix factorization to characterize functionally similar regions (as depicted by different colors) and (right) embedding dynamics of whole-brain activity into lower-dimensional trajectories. Source: Jeremy Freeman and Misha Ahrens, Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA. to these workloads, we claim this result gine, Spark can run diverse functions applications that combine their func- is more general; when augmented with over the same data, often in memory. tions (such as video messaging and data sharing, MapReduce can emu- Finally, Spark enables new applica- Waze) that would not have been pos- late any distributed computation, so tions (such as interactive queries on a sible on any one device. it should also be possible to run many graph and streaming machine learn- Since its release in 2010, Spark other types of workloads.24 ing) that were not possible with previ- has grown to be the most active open Spark’s generality has several im- ous systems. One powerful analogy for source project or big data processing, portant benefits. First, applications the value of unification is to compare with more than 1,000 contributors. The are easier to develop because they use a smartphones to the separate portable project is in use in more than 1,000 or- unified API. Second, it is more efficient devices that existed before them (such ganizations, ranging from technology to combine processing tasks; whereas as cameras, cellphones, and GPS gad- companies to banking, retail, biotech- prior systems required writing the gets). In unifying the functions of these nology, and astronomy. The largest data to storage to pass it to another en- devices, smartphones enabled new publicly announced deployment has

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 57 contributed articles

Figure 1. Apache Spark software stack, with specialized processing libraries implemented returns a result to the program (here, over the core engine. the number of elements in the RDD) instead of defining a new RDD. Spark evaluates RDDs lazily, al- lowing it to find an efficient plan for Streaming SQL ML Graph the user’s computation. In particular, transformations return a new RDD ob- ject representing the result of a compu- tation but do not immediately compute it. When an action is called, Spark looks at the whole graph of transformations used to create an execution plan. For ex- ample, if there were multiple filter or map operations in a row, Spark can fuse them into one pass, or, if it knows that data is partitioned, it can avoid moving it over the network for groupBy.5 Users can thus build up programs modularly without losing performance. Finally, RDDs provide explicit sup- port for data sharing among compu- tations. By default, RDDs are “ephem- eral” in that they get recomputed each time they are used in an action (such as count). However, users can also more than 8,000 nodes.22 As Spark has across a cluster that can be manipu- persist selected RDDs in memory or grown, we have sought to keep building lated in parallel. Users create RDDs by for rapid reuse. (If the data does not on its strength as a unified engine. We applying operations called “transfor- fit in memory, Spark will also spill it (and others) have continued to build an mations” (such as map, filter, and to disk.) For example, a user searching integrated standard library over Spark, groupBy) to their data. through a large set of log files in HDFS with functions from data import to ma- Spark exposes RDDs through a func- to debug a problem might load just the chine learning. Users find this ability tional programming API in Scala, Java, error messages into memory across the powerful; in surveys, we find the major- Python, and R, where users can simply cluster by calling ity of users combine multiple of Spark’s pass local functions to run on the clus- libraries in their applications. ter. For example, the following Scala errors.persist() As parallel data processing becomes code creates an RDD representing the common, the composability of process- error messages in a log file, by search- After this, the user can run a variety of ing functions will be one of the most ing for lines that start with ERROR, and queries on the in-memory data: important concerns for both usability then prints the total number of errors: and performance. Much of data analy- // Count errors mentioning MySQL sis is exploratory, with users wishing to lines = spark.textFile(“hdfs://...”) errors.filter(s => s.contains(“MySQL”)) combine library functions quickly into errors = lines.filter( .c o u nt() a working pipeline. However, for “big s => s.startsWith(“ERROR”)) // Fetch back the time fields of errors that data” in particular, copying data be- println(“Total errors: “ + errors.count()) // mention PHP, assuming time is field #3: tween different systems is anathema to errors.filter(s => s.contains(“PHP”)) performance. Users thus need abstrac- The first line defines an RDD backed .map(line => line.split(‘\t’)(3)) tions that are general and composable. by a file in the Hadoop Distributed File .c ollect() In this article, we introduce the Spark System (HDFS) as a collection of lines of programming model and explain why it text. The second line calls the filter This data sharing is the main differ- is highly general. We also discuss how transformation to derive a new RDD ence between Spark and previous com- we leveraged this generality to build from lines. Its argument is a Scala puting models like MapReduce; other- other processing tasks over it. Finally, function literal or closure.a Finally, the wise, the individual operations (such we summarize Spark’s most common last line calls count, another type of as map and groupBy) are similar. Data applications and describe ongoing de- RDD operation called an “action” that sharing provides large speedups, often velopment work in the project. as much as 100×, for interactive que- ries and iterative algorithms.23 It is also Programming Model a The closures passed to Spark can call into any the key to Spark’s generality, as we dis- existing Scala or Python library or even refer- The key programming abstraction in ence variables in the outer program. Spark cuss later. Spark is RDDs, which are fault-toler- sends read-only copies of these variables to Fault tolerance. Apart from provid- ant collections of objects partitioned worker nodes. ing data sharing and a variety of paral-

58 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles lel operations, RDDs also automatical- RDDs usually store only temporary SQL and DataFrames. One of the ly recover from failures. Traditionally, data within an application, though most common data processing para- distributed computing systems have some applications (such as the Spark digms is relational queries. Spark SQL2 provided fault tolerance through data SQL JDBC server) also share RDDs and its predecessor, Shark,23 imple- replication or checkpointing. Spark across multiple users.2 Spark’s de- ment such queries on Spark, using uses a different approach called “lin- sign as a storage-system-agnostic techniques similar to analytical da- eage.”25 Each RDD tracks the graph of engine makes it easy for users to run tabases. For example, these systems transformations that was used to build computations against existing data support columnar storage, cost-based it and reruns these operations on base and join diverse data sources. optimization, and code generation for data to reconstruct any lost partitions. query execution. The main idea behind For example, Figure 2 shows the RDDs in Higher-Level Libraries these systems is to use the same data our previous query, where we obtain the The RDD programming model pro- layout as analytical databases—com- time fields of errors mentioning PHP by vides only distributed collections of pressed columnar storage—inside applying two filters and a map. If any objects and functions to run on them. RDDs. In Spark SQL, each record in an partition of an RDD is lost (for example, Using RDDs, however, we have built RDD holds a series of rows stored in bi- if a node holding an in-memory partition a variety of higher-level libraries on nary format, and the system generates of errors fails), Spark will rebuild it by Spark, targeting many of the use cas- applying the filter on the corresponding es of specialized computing engines. Figure 2. Lineage graph for the third query in our example; boxes represent RDDs, and block of the HDFS file. For “shuffle” op- The key idea is that if we control the arrows represent transformations. erations that send data from all nodes to data structures stored inside RDDs, all other nodes (such as reduceByKey), the partitioning of data across nodes, lines senders persist their output data locally and the functions run on them, we can filter(line.startsWith(“ERROR”)) in case a receiver fails. implement many of the execution tech- Lineage-based recovery is signifi- niques in other engines. Indeed, as we errors cantly more efficient than replication show in this section, these libraries filter(line.contains(“PHP”))) in data-intensive workloads. It saves often achieve state-of-the-art perfor- PHP errors both time, because writing data over mance on each task while offering sig- map(line.split(‘\t’)(3)) the network is much slower than writ- nificant benefits when users combine time fields ing it to RAM, and storage space in them. We now discuss the four main memory. Recovery is typically much libraries included with Apache Spark. faster than simply rerunning the pro- gram, because a failed node usually Figure 3. A Scala implementation of logistic regression via batch gradient descent in Spark. contains multiple RDD partitions, and these partitions can be rebuilt in paral- // Load data into an RDD lel on other nodes. val points = sc.textFile(...).map(readPoint).persist() A longer example. As a longer exam- // Start with a random parameter vector ple, Figure 3 shows an implementa- var w = DenseVector.random(D) tion of logistic regression in Spark. It uses batch gradient descent, a // On each iteration, update param vector with a sum simple iterative algorithm that for (i <- 1 to ITERATIONS) { val gradient = points.map { p => computes a gradient function over p.x * (1/(1+exp(-p.y*(w.dot(p.x))))-1) * p.y the data repeatedly as a parallel }.reduce((a, b) => a+b) sum. Spark makes it easy to load the w -= gradient data into RAM once and run multiple } sums. As a result, it runs faster than traditional MapReduce. For example, in a 100GB job (see Figure 4), MapRe- Figure 4. Performance of logistic regression in Hadoop MapReduce vs. Spark for 100GB of data on 50 m2.4xlarge EC2 nodes. duce takes 110 seconds per iteration because each iteration loads the data from disk, while Spark takes only one Hadoop Spark second per iteration after the first load. 2,500 Integration with storage systems. 2,000 Much like Google’s MapReduce, 1,500 Spark is designed to be used with 1,000 multiple external systems for per- 500 sistent storage. Spark is most com- Running Time (s) 0 monly used with cluster file systems 1 5 10 20 like HDFS and key-value stores like Number of Iterations S3 and Cassandra. It can also connect with Apache Hive as a data catalog.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 59 contributed articles

code to run directly against this layout. means model) are easily passed to oth- Beyond running SQL queries, er libraries. Apart from compatibility we have used the Spark SQL engine at the API level, composition in Spark to provide a higher-level abstrac- is also efficient at the execution level, tion for basic data transformations because Spark can optimize across pro- called DataFrames,2 which are RDDs Spark has a similar cessing libraries. For example, if one li- of records with a known schema. programming brary runs a map function and the next DataFrames are a common abstraction library runs a map on its result, Spark for tabular data in R and Python, with model to will fuse these operations into a single programmatic methods for filtering, MapReduce but map. Likewise, Spark’s fault recovery computing new columns, and aggrega- works seamlessly across these librar- tion. In Spark, these operations map extends it with ies, recomputing lost data no matter down to the Spark SQL engine and re- which libraries produced it. ceive all its optimizations. We discuss a data-sharing Performance. Given that these librar- DataFrames more later. abstraction ies run over the same engine, do they One technique not yet implemented lose performance? We found that by in Spark SQL is indexing, though other called “resilient implementing the optimizations we libraries over Spark (such as Indexe- distributed just outlined within RDDs, we can often dRDDs3) do use it. match the performance of specialized Spark Streaming. Spark Streaming26 datasets,” or RDDs. engines. For example, Figure 6 com- implements incremental stream pro- pares Spark’s performance on three cessing using a model called “discretized simple tasks—a SQL query, stream- streams.” To implement streaming over ing word count, and Alternating Least Spark, we split the input data into small Squares matrix factorization—versus batches (such as every 200 milliseconds) other engines. While the results vary that we regularly combine with state across workloads, Spark is generally stored inside RDDs to produce new re- comparable with specialized systems sults. Running streaming computations like Storm, GraphLab, and Impala.b For this way has several benefits over tradi- stream processing, although we show tional distributed streaming systems. results from a distributed implementa- For example, fault recovery is less expen- tion on Storm, the per-node through- sive due to using lineage, and it is pos- put is also comparable to commercial sible to combine streaming with batch streaming engines like Oracle CEP.26 and interactive queries. Even in highly competitive bench- GraphX. GraphX6 provides a graph marks, we have achieved state-of-the- computation interface similar to Pregel art performance using Apache Spark. and GraphLab,10,11 implementing the In 2014, we entered the Daytona Gray- same placement optimizations as these Sort benchmark (http://sortbench- systems (such as vertex partitioning mark.org/) involving sorting 100TB of schemes) through its choice of parti- data on disk, and tied for a new record tioning function for the RDDs it builds. with a specialized system built only MLlib. MLlib,14 Spark’s machine for sorting on a similar number of ma- learning library, implements more chines. As in the other examples, this than 50 common algorithms for dis- was possible because we could imple- tributed model training. For example, it ment both the communication and includes the common distributed algo- CPU optimizations necessary for large- rithms of decision trees (PLANET), La- scale sorting inside the RDD model. tent Dirichlet Allocation, and Alternat- ing Least Squares matrix factorization. Applications Combining processing tasks. Spark’s Apache Spark is used in a wide range libraries all operate on RDDs as the of applications. Our surveys of Spark data abstraction, making them easy to combine in applications. For example, b One area in which other designs have outper- Figure 5 shows a program that reads formed Spark is certain graph computations.12,16 some historical Twitter data using However, these results are for algorithms with Spark SQL, trains a K-means clustering low ratios of computation to communication model using MLlib, and then applies (such as PageRank) where the latency from syn- chronized communication in Spark is signifi- the model to a new stream of tweets. cant. In applications with more computation The data tasks returned by each library (such as the ALS algorithm) distributing the ap- (here the historic tweet RDD and the K- plication on Spark still helps.

60 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles users have identified more than 1,000 making applications. Published use streaming with batch and interactive companies using Spark, in areas from cases for Spark Streaming include queries. For example, video company Web services to biotechnology to fi- network security monitoring at Cis- Conviva uses Spark to continuously nance. In academia, we have also seen co, prescriptive analytics at Samsung maintain a model of content distribu- applications in several scientific do- SDS, and log mining at Netflix. Many tion server performance, querying it mains. Across these workloads, we find of these applications also combine automatically when it moves clients users take advantage of Spark’s gener- ality and often combine multiple of its Figure 5. Example combining the SQL, machine learning, and streaming libraries in Spark. libraries. Here, we cover a few top use // Load historical data as an RDD using Spark SQL cases. Presentations on many use cases val trainingData = sql( are also available on the Spark Summit “SELECT location, language FROM old_tweets”) conference website (http://www.spark- // Train a K-means model using MLlib summit.org). val model = new KMeans() Batch processing. Spark’s most com- .setFeaturesCol(“location”) mon applications are for batch proc- .setPredictionCol(“language”) essing on large datasets, including .fit(trainingData) // Apply the model to new tweets in a stream Extract-Transform-Load workloads to TwitterUtils.createStream(...) convert data from a raw format (such .map(tweet => model.predict(tweet.location)) as log files) to a more structured for- mat and offline training of machine learning models. Published examples Figure 6. Comparing Spark’s performance with several widely used specialized systems for SQL, streaming, and machine learning. Data is from Zaharia24 (SQL query and stream- of these workloads include page per- ing word count) and Sparks et al.17 (alternating least squares matrix factorization). sonalization and recommendation at Yahoo!; managing a data lake at Gold- man Sachs; graph mining at Alibaba; Response Time Throughput Response Time financial Value at Risk calculation; and (sec) (records/s) (hours) text mining of customer feedback at 20 10 x 106 6 Toyota. The largest published use case

Spark 5 we are aware of is an 8,000-node cluster 8 at Chinese social network Tencent that 15 MATLAB 4 Impala (disk) 22 Spark (disk) ingests 1PB of data per day. 6 While Spark can process data in 10 3 memory, many of the applications in 4 Storm this category run only on disk. In such 2 Impala (mem)

5 Mahout cases, Spark can still improve perfor- 2 mance over MapReduce due to its sup- Redshift 1 Spark (mem) Spark port for more complex operator graphs. GraphLab 0 0 0 Interactive queries. Interactive use of SQL Streaming Machine Learning Spark falls into three main classes. First, organizations use Spark SQL for rela- tional queries, often through business- intelligence tools like Tableau. Examples Figure 7. PanTera, a visualization application built on Spark that can interactively filter data. include eBay and Baidu. Second, devel- opers and data scientists can use Spark’s Scala, Python, and R interfaces interac- tively through shells or visual notebook environments. Such interactive use is crucial for asking more advanced ques- tions and for designing models that eventually lead to production applica- tions and is common in all deployments. Third, several vendors have developed domain-specific interactive applications that run on Spark. Examples include Tresata (anti-money laundering), Tri- facta (data cleaning), and PanTera (large- scale visualization, as in Figure 7). Stream processing. Real-time proc- essing is also a popular use case, both Source: PanTera in analytics and in real-time decision-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 61 contributed articles

across servers, in an application that queries during live experiments. Figure Figure 9, most organizations use mul- requires substantial parallel work for 8 shows an example image generated tiple components; 88% use at least two both model maintenance and queries. using Spark. of them, 60% use at least three (such Scientific applications. Spark has also Spark components used. Because as Spark Core and two libraries), and been used in several scientific domains, Spark is a unified data-processing en- 27% use at least four components. including large-scale spam detection,19 gine, the natural question is how many Deployment environments. We also image processing,27 and genomic data of its libraries organizations actually see growing diversity in where Apache processing.15 One example that com- use. Our surveys of Spark users have Spark applications run and what data bines batch, interactive, and stream shown that organizations do, indeed, sources they connect to. While the first processing is the Thunder platform use multiple components, with over Spark deployments were generally in for neuroscience at Howard Hughes 60% of organizations using at least Hadoop environments, only 40% of de- Medical Institute, Janelia Farm.5 It is three of Spark’s APIs. Figure 9 out- ployments in our July 2015 Spark sur- designed to process brain-imaging data lines the usage of each component in vey were on the Hadoop YARN cluster from experiments in real time, scaling a July 2015 Spark survey by Databricks manager. In addition, 52% of respon- up to 1TB/hour of whole-brain imaging that reached 1,400 respondents. We dents ran Spark on a public cloud. data from organisms (such as zebrafish list the Spark Core API (just RDDs) and mice). Using Thunder, researchers as one component and the higher- Why Is the Spark Model General? can apply machine learning algorithms level libraries as others. We see that While Apache Spark demonstrates (such as clustering and Principal Com- many components are widely used, that a unified cluster programming ponent Analysis) to identify neurons in- with Spark Core and SQL as the most model is both feasible and useful, it volved in specific behaviors. The same popular. Streaming is used in 46% of would be helpful to understand what code can be run in batch jobs on data organizations and machine learning makes cluster programming models from previous runs or in interactive in 54%. While not shown directly in general, along with Spark’s limita- tions. Here, we summarize a discus- Figure 8. Visualization of neurons in the zebrafish brain created with Spark, where each sion on the generality of RDDs from neuron is colored based on the direction of movement that correlates with its activity. 24 Source: Jeremy Freeman and Misha Ahrens of Janelia Research Campus. Zaharia. We study RDDs from two perspectives. First, from an expres- siveness point of view, we argue that RDDs can emulate any distributed computation, and will do so efficient- ly in many cases unless the computa- tion is sensitive to network latency. Second, from a systems point of view, we show that RDDs give applications control over the most common bottle- neck resources in clusters—network and storage I/O—and thus make it possible to express the same optimizations for these resources that characterize specialized systems. Expressiveness perspective. To study the expressiveness of RDDs, we start by com- paring RDDs to the MapReduce model, which RDDs build on. The first question is what computations can MapReduce Figure 9. Percent of organizations using each Spark component, from the Databricks 2015 itself express? Although there have been Spark survey; https://databricks.com/blog/2015/09/24/. numerous discussions about the limita- tions of MapReduce, the surprising an- swer here is that MapReduce can emu- Core late any distributed computation. SQL To see this, note that any distributed computation consists of nodes that per- Streaming form local computation and occasionally MLlib exchange messages. MapReduce offers the map operation, which allows local GraphX computation, and reduce, which allows 0% 20% 40% 60% 80% 100% all-to-all communication. Any distrib- uted computation can thus be emulated, Fraction of Users perhaps somewhat inefficiently, by breaking down its work into timesteps,

62 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles running maps to perform the local bottleneck resources in cluster com- Links. Each node has a 10Gbps computation in each timestep, and putations? And can RDDs use them ef- (1.3GB/s) link, or approximately 40× batching and exchanging messages at ficiently? Although cluster applications less than its memory bandwidth and the end of each step using a reduce. A are diverse, they are all bound by the 2× less than its aggregate disk band- series of MapReduce steps will capture same properties of the underlying hard- width; and the whole result, as in Figure 10. Re- ware. Current datacenters have a steep Racks. Nodes are organized into racks cent theoretical work has formalized storage hierarchy that limits most ap- of 20 to 40 machines, with 40Gbps– this type of emulation by showing that plications in similar ways. For example, 80Gbps bandwidth out of each rack, MapReduce can simulate many com- a typical Hadoop cluster might have the or 2×–5× lower than the in-rack net- putations in the Parallel Random Ac- following characteristics: work performance. cess Machine model.8 Repeated Map- Local storage. Each node has local Given these properties, the most Reduce is also equivalent to the Bulk memory with approximately 50GB/s important performance concern in Synchronous Parallel model.20 of bandwidth, as well as 10 to 20 lo- many applications is the placement of While this line of work shows that cal disks, for approximately 1GB/s to data and computation in the network. MapReduce can emulate arbitrary 2GB/s of disk bandwidth; Fortunately, RDDs provide the facili- computations, two problems can make the “constant factor” behind Figure 10. Emulating an arbitrary distributed computation with MapReduce. this emulation high. First, MapReduce is inefficient at sharing data across timesteps because it relies on repli- map cated external storage systems for this purpose. Our emulated system may reduce thus become slower due to writing out its state after each step. Second, the latency of the MapReduce steps (a) MapReduce provides primitives determines how well our emulation for local computation and all-to-all will match a real network, and most communication. Map-Reduce implementations were designed for batch environments with minutes to hours of latency. . . . RDDs and Spark address both of (b) By chaining these steps together, these limitations. On the data-sharing we can emulate any distributed computation. The main costs for this front, RDDs make data sharing fast by emulation are the latency of the rounds avoiding replication of intermediate data and the overhead of passing state and can closely emulate the in-memory across steps. “data sharing” across time that would happen in a system composed of long- running processes. On the latency front, Figure 11. Example of Spark’s DataFrame API in Python. Unlike Spark’s core API, DataFrames have a schema with named columns (such as age and city) and take expressions in a limited Spark can run MapReduce-like steps language (such as age > 20) instead of arbitrary Python functions. on large clusters with 100ms latency; nothing intrinsic to the MapReduce model prevents this. While some applications users.where(users[“age”] > 20) .groupBy(“city”) need finer-grain timesteps and commu- .agg(avg(“age”), max(“income”)) nication, this 100ms latency is enough to implement many data-intensive workloads, where the amount of com- putation that can be batched before a Figure 12. Working with DataFrames in Spark’s R API. We load a distributed DataFrame using Spark’s JSON data source, then filter and aggregate using standard R column ex- communication step is high. pressions. In summary, RDDs build on Map-

Reduce’s ability to emulate any dis- people <- read.df(context, “./people.json”, “json”) tributed computation but make this emulation significantly more efficient. # Filter people by age Their main limitation is increased adults = filter(people, people$age > 20) latency due to synchronization in each # Count number of people by country communication step, but this latency summarize(groupBy(adults, adults$city), count=n(adults$id)) is often not a factor. ## city count Systems perspective. Independent ##1 Cambridge 1 ##2 San Francisco 6 of the emulation approach to char- ##3 Berkeley 4 acterizing Spark’s generality, we can take a systems approach. What are the

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 63 contributed articles

ties to control this placement; the in- ity in new libraries. More than 200 third- relational optimizations under a data terface lets applications place com- party packages are also available.c In the frame API.d putations near input data (through research community, multiple projects While DataFrames are still new, an API for “preferred locations” for at Berkeley, MIT, and Stanford build on they have quickly become a popular input sources25), and RDDs provide Spark, and many new libraries (such API. In our July 2015 survey, 60% of control over data partitioning and co- as GraphX and Spark Streaming) came respondents reported using them. Be- location (such as specifying that data from research groups. Here, we sketch cause of the success of DataFrames, be hashed by a given key). Libraries four of the major efforts. we have also developed a type-safe in- (such as GraphX) can thus implement DataFrames and more declarative terface over them called Datasetse that the same placement strategies used in APIs. The core Spark API was based on lets Java and Scala programmers view specialized systems.6 functional programming over distrib- DataFrames as statically typed col- Beyond network and I/O bandwidth, uted collections that contain arbitrary lections of Java objects, similar to the the most common bottleneck tends to be types of Scala, Java, or Python objects. RDD API, and still receive relational CPU time, especially if data is in memo- While this approach was highly ex- optimizations. We expect these APIs ry. In this case, however, Spark can run pressive, it also made programs more to gradually become the standard ab- the same algorithms and libraries used difficult to automatically analyze and straction for passing data between in specialized systems on each node. For optimize. The Scala/Java/Python ob- Spark libraries. example, it uses columnar storage and jects stored in RDDs could have com- Performance optimizations. Much of processing in Spark SQL, native BLAS plex structure, and the functions run the recent work in Spark has been on per- libraries in MLlib, and so on. As we over them could include arbitrary formance. In 2014, the Databricks team discussed earlier, the only area where code. In many applications, develop- spent considerable effort to optimize RDDs clearly add a cost is network la- ers could get suboptimal performance Spark’s network and I/O primitives, al- tency, due to the synchronization at if they did not use the right operators; lowing Spark to jointly set a new record parallel communication steps. for example, the system on its own for the Daytona GraySort challenge.f One final observation from a systems could not push filter functions Spark sorted 100TB of data 3× faster perspective is that Spark may incur extra ahead of maps. than the previous record holder based costs over some of today’s special- To address this problem, we extend- on Hadoop MapReduce using 10× few- ized systems due to fault tolerance. ed Spark in 2015 to add a more declara- er machines. This benchmark was not For example, in Spark, the map tasks tive API called DataFrames2 based on executed in memory but rather on (solid- in each shuffle operation save their the relational algebra. Data frames are state) disks. In 2015, one major effort was output to local files on the machine a common API for tabular data in Py- Project Tungsten,g which removes Java where they ran, so reduce tasks can re- thon and R. A data frame is a set of re- Virtual Machine overhead from many of fetch it later. In addition, Spark imple- cords with a known schema, essentially Spark’s code paths by using code genera- ments a barrier at shuffle stages, so the equivalent to a database table, that tion and non-garbage-collected memory. reduce tasks do not start until all the supports operations like filtering One benefit of doing these optimizations maps have finished. This avoids some and aggregation using a restricted in a general engine is that they simulta- of the complexity that would be needed “expression” API. Unlike working in neously affect all of Spark’s libraries; for fault recovery if one “pushed” re- the SQL language, however, data frame machine learning, streaming, and SQL cords directly from maps to reduces in operations are invoked as function all became faster from each change. a pipelined fashion. Although removing calls in a more general programming R language support. The SparkR some of these features would speed language (such as Python and R), al- project21 was merged into Spark in up the system, Spark often performs lowing developers to easily structure 2015 to provide a programming inter- competitively despite them. The main their program using abstractions in the face in R. The R interface is based on reason is an argument similar to our host language (such as functions and DataFrames and uses almost identical previous one: many applications are classes). Figure 11 and Figure 12 show syntax to R’s built-in data frames. Oth- bound by an I/O operation (such as examples of the API. er Spark libraries (such as MLlib) are shuffling data across the network or Spark’s DataFrames offer a similar also easy to call from R, because they reading it from disk) and beyond this API to single-node packages but auto- accept DataFrames as input. operation, optimizations (such as matically parallelize and optimize the Research libraries. Apache Spark pipelining) add only a modest benefit. computation using Spark SQL’s query continues to be used to build higher- We have kept fault tolerance “on” by planner. User code thus receives op- default in Spark to make it easy to reason timizations (such as predicate push- about applications. down, operator reordering, and join d One reason optimization is possible is that Spark’s DataFrame API uses lazy evaluation algorithm selection) that were not where the content of a DataFrame is not com- Ongoing Work available under Spark’s functional API. puted until the user asks to write it out. The Apache Spark remains a rapidly evolv- To our knowledge, Spark DataFrames data frame APIs in R and Python are eager, pre- ing project, with contributions from are the first library to perform such venting optimizations like operator reordering. e https://databricks.com/blog/2016/01/04/in- both industry and research. The code- troducing-spark-datasets.html base size has grown by a factor of six c One package index is available at https:// f http://sortbenchmark.org/ApacheSpark2014.pdf since June 2013, with most of the activ- spark-packages.org/ g https://databricks.com/blog/2015/04/28/

64 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles

level data processing libraries. Recent amplab/spark-indexedrdd S., and Stoica, I. Shark: SQL and rich analytics at scale. 4. Dean, J. and Ghemawat, S. MapReduce: Simplified In Proceedings of the ACM SIGMOD/PODS Conference projects include Thunder for neurosci- data processing on large clusters. In Proceedings of (New York, June 22–27). ACM Press, New York, 2013. ence,5 ADAM for genomics,15 and Kira the Sixth OSDI Symposium on Operating Systems 24. Zaharia, M. An Architecture for Fast and General Data 27 Design and Implementation (San Francisco, CA, Dec. Processing on Large Clusters. Ph.D. thesis, Electrical for image processing in astronomy. 6–8). USENIX Association, Berkeley, CA, 2004. Engineering and Computer Sciences Department, Other research libraries (such as 5. Freeman, J., Vladimirov, N., Kawashima, T., Mu, Y., University of California, Berkeley, 2014; https://www.eecs. Sofroniew, N.J., Bennett, D.V., Rosen, J., Yang, C.-T., berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf GraphX) have been merged into the Looger, L.L., and Ahrens, M.B. Mapping brain activity 25. Zaharia, M. et al. Resilient distributed datasets: A main codebase. at scale with cluster computing. Nature Methods 11, 9 fault-tolerant abstraction for in-memory cluster (Sept. 2014), 941–950. computing. In Proceedings of the Ninth USENIX 6. Gonzalez, J.E. et al. GraphX: Graph processing in a NSDI Symposium on Networked Systems Design and Conclusion distributed dataflow framework. InProceedings of the Implementation (San Jose, CA, Apr. 25–27, 2012). 11th OSDI Symposium on Operating Systems Design 26. Zaharia, M. et al. Discretized streams: Fault-tolerant Scalable data processing will be es- and Implementation (Broomfield, CO, Oct. 6–8). streaming computation at scale. In Proceedings of sential for the next generation of USENIX Association, Berkeley, CA, 2014. the 24th ACM SOSP Symposium on Operating Systems 7. Isard, M. et al. Dryad: Distributed data-parallel Principles (Farmington, PA, Nov. 3–6). ACM Press, New computer applications but typically programs from sequential building blocks. In York, 2013. involves a complex sequence of pro- Proceedings of the EuroSys Conference (Lisbon, 27. Zhang, Z., Barbary, K., Nothaft, N.A., Sparks, E., Zahn, Portugal, Mar. 21–23). ACM Press, New York, 2007. O., Franklin, M.J., Patterson, D.A., and Perlmutter, S. cessing steps with different com- 8. Karloff, H., Suri, S., and Vassilvitskii, S. A model Scientific Computing Meets Big Data Technology: of computation for MapReduce. In Proceedings An Astronomy Use Case. In Proceedings of IEEE puting systems. To simplify this of the ACM-SIAM SODA Symposium on Discrete International Conference on Big Data (Santa Clara, task, the Spark project introduced Algorithms (Austin, TX, Jan. 17–19). ACM Press, CA, Oct. 29–Nov. 1). IEEE, 2015. New York, 2010. a unified programming model and 9. Kornacker, M. et al. Impala: A modern, open-source engine for big data applications. Our SQL engine for Hadoop. In Proceedings of the Seventh Matei Zaharia ([email protected]) is an assistant Biennial CIDR Conference on Innovative Data professor of computer science at Stanford University, experience shows such a model can Systems Research (Asilomar, CA, Jan. 4–7, 2015). Stanford, CA, and CTO of Databricks, San Francisco, CA. efficiently support today’s workloads 10. Low, Y. et al. Distributed GraphLab: A framework Reynold S. Xin ([email protected]) is the chief architect for machine learning and data mining in the cloud. on the Spark team at Databricks, San Francisco, CA. and brings substantial benefits to users. In Proceedings of the 38th International VLDB We hope Apache Spark highlights the Conference on Very Large Databases (Istanbul, Patrick Wendell ([email protected]) is the vice president of engineering at Databricks, San Francisco, CA. importance of composability in pro- Turkey, Aug. 27–31, 2012). 11. Malewicz, G. et al. Pregel: A system for large-scale Tathagata Das ([email protected]) is a software gramming libraries for big data and graph processing. In Proceedings of the ACM engineer at Databricks, San Francisco, CA. SIGMOD/PODS Conference (Indianapolis, IN, June encourages development of more eas- 6–11). ACM Press, New York, 2010. Michael Armbrust ([email protected]) is a ily interoperable libraries. 12. McSherry, F., Isard, M., and Murray, D.G. Scalability! software engineer at Databricks, San Francisco, CA. th But at what COST? In Proceedings of the 15 Ankur Dave ([email protected]) is a graduate All Apache Spark libraries described HotOS Workshop on Hot Topics in Operating Systems student in the Real-Time, Intelligent and Secure Systems in this article are open source at http:// (Kartause Ittingen, Switzerland, May 18–20). USENIX Lab at the University of California, Berkeley. Association, Berkeley, CA, 2015. spark.apache.org/. Databricks has 13. Melnik, S. et al. Dremel: Interactive analysis of Web- Xiangrui Meng ([email protected]) is a software also made videos of all Spark Summit scale datasets. Proceedings of the VLDB Endowment 3 engineer at Databricks, San Francisco, CA. (Sept. 2010), 330–339. Josh Rosen ([email protected]) is a software conference talks available for free at 14. Meng, X., Bradley, J.K., Yavuz, B., Sparks, E.R., engineer at Databricks, San Francisco, CA. https://spark-summit.org/. Venkataraman, S., Liu, D., Freeman, J., Tsai, D.B., Amde, M., Owen, S., Xin, D., Xin, R., Franklin, M.J., Shivaram Venkataraman ([email protected]) Zadeh, R., Zaharia, M., and Talwalkar, A. MLlib: is a Ph.D. student in the AMPLab at the University of Acknowledgments Machine learning in Apache Spark. Journal of Machine California, Berkeley. Learning Research 17, 34 (2016), 1–7. Michael Franklin ([email protected]) is the Liew Apache Spark is the work of hun- 15. Nothaft, F.A., Massie, M., Danford, T., Zhang, Z., Family Chair of Computer Science at the University of dreds of open source contributors Laserson, U., Yeksigian, C., Kottalam, J., Ahuja, A., Chicago and Director of the AMPLab at the University of Hammerbacher, J., Linderman, M., Franklin, M.J., California, Berkeley. who are credited in the release notes Joseph, A.D., and Patterson, D.A. Rethinking data- intensive science using scalable analytics systems. Ali Ghodsi ([email protected]) is the CEO of Databricks at https://spark.apache.org. Berke- In Proceedings of the SIGMOD/PODS Conference and adjunct faculty at the University of California, ley’s research on Spark was sup- (Melbourne, Australia, May 31–June 4). ACM Press, Berkeley. ported in part by National Science New York, 2015. Joseph E. Gonzalez ([email protected]) is an 16. Shun, J. and Blelloch, G.E. Ligra: A lightweight assistant professor in EECS at the University of California, Foundation CISE Expeditions Award graph processing framework for shared memory. Berkeley. In Proceedings of the 18th ACM SIGPLAN PPoPP CCF-1139158, Lawrence Berkeley Symposium on Principles and Practice of Parallel Scott Shenker ([email protected]) is a professor National Laboratory Award 7076018, Programming (Shenzhen, China, Feb. 23–27). ACM in EECS at the University of California, Berkeley. and DARPA XData Award FA8750- Press, New York, 2013. Ion Stoica ([email protected]) is a professor in 17. Sparks, E.R., Talwalkar, A., Smith, V., Kottalam, EECS and co-director of the AMPLab at the University of 12-2-0331, and gifts from Amazon J., Pan, X., Gonzalez, J.E., Franklin, M.J., Jordan, California, Berkeley. M.I., and Kraska, T. MLI: An API for distributed Web Services, Google, SAP, IBM, The machine learning. In Proceedings of the IEEE ICDM Thomas and Stacey Siebel Founda- International Conference on Data Mining (Dallas, TX, Copyright held by the authors. Dec. 7–10). IEEE Press, 2013. Publication rights licensed to ACM. $15.00 tion, Adobe, Apple, Arimo, Blue 18. Stonebraker, M. and Cetintemel, U. ‘One size fits all’: An Goji, Bosch, C3Energy, Cisco, Cray, idea whose time has come and gone. In Proceedings of the 21st International ICDE Conference on Data Cloudera, EMC2, Ericsson, Face- Engineering (Tokyo, Japan, Apr. 5–8). IEEE Computer book, Guavus, Huawei, Informatica, Society, Washington, D.C., 2005, 2–11. 19. Thomas, K., Grier, C., Ma, J., Paxson, V., and Song, Intel, Microsoft, NetApp, Pivotal, D. Design and evaluation of a real-time URL spam Samsung, Schlumberger, Splunk, filtering service. InProceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, May Virdata, and VMware. 22–25). IEEE Press, 2011. 20. Valiant, L.G. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103–111. References 21. Venkataraman, S. et al. SparkR; http://dl.acm.org/ 1. Apache Storm project; http://storm.apache.org citation.cfm?id=2903740&CFID=687410325&CFTO 2. Armbrust, M. et al. Spark SQL: Relational data KEN=83630888 processing in Spark. In Proceedings of the ACM 22. Xin, R. and Zaharia, M. Lessons from running large- Watch the authors discuss SIGMOD/PODS Conference (Melbourne, Australia, May scale Spark workloads; http://tinyurl.com/large- their work in this exclusive 31–June 4). ACM Press, New York, 2015. scale-spark Communications video. 3. Dave, A. Indexedrdd project; http://github.com/ 23. Xin, R.S., Rosen, J., Zaharia, M., Franklin, M.J., Shenker, http://cacm.acm.org/videos/spark

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 65 contributed articles

DOI:10.1145/2934663 Our review of tools available to im- Enterprises that impose stringent prove attack-resistance finds that, for example, compelling returns are password-composition policies appear offered by password blacklists, throt- to suffer the same fate as those that do not. tling, and hash iteration, while cur- rent password-composition policies BY DINEI FLORÊNCIO, CORMAC HERLEY, AND PAUL C. VAN OORSCHOT fail to provide demonstrable improve- ment in outcomes against offline guessing attacks. Suppose a system administrator is tasked with defending a corpo- Pushing rate, government, or university net- work or site. The user population’s passwords are targeted by attackers seeking access to network resources. on String: Passwords can be attacked by guess- ing attacks—online or offline—and by capture attacks, that is, by non- guessing attacks. How to choose The ‘Don’t Care’ among password-attack mitigations is a practical question faced by mil- lions of administrators. Little action- able guidance exists on how to do so Region of in a principled fashion. Sensible poli- cies must consider how to protect the population of user accounts. What is the best measure of the strength of a population of passwords? Is a good Password proxy for the overall ability of the network to resist guessing attacks the average, median, strongest, or weakest password? Slogans (such as Strength suggesting that all passwords be “as strong as possible”) are too vague to guide action—and also suggest that infinite user effort is both available and achievable. WE EXAMINE THE efficacy of tactics for defending key insights password-protected networks from guessing attacks, ˽˽ It has long been accepted that making taking the viewpoint of an enterprise administrator users choose more complex passwords is the price they must pay to make their whose objective is to protect a population of passwords. accounts safer; it turns out this line of thinking misunderstands how attacks Simple analysis allows insights on the limits of common actually work.

approaches and reveals that some approaches spend ˽˽ Harder-to-guess passwords do not always reduce the likelihood of effort in “don’t care” regions where added password successful guessing attacks; in fact, in a large portion of the attack space, they strength makes no difference. This happens either when make no difference at all.

passwords do more than enough to resist online attacks ˽˽ Enterprises should focus on users with the most easily guessed passwords; an while falling short of what is needed against offline attacker probably gets all the access attacks or when so many accounts have fallen that needed just by compromising them, so improving other passwords denies the an attacker gains little from additional compromises. attacker very little.

66 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 The Compromise Saturation Point Suppose our hypothetical admin- ceed, the specific attack channel is To begin, consider an administrator istrator resets all accounts. There still not necessarily clear; it is not obvious who observes unusual behavior on remains the question: How were the whether the compromised accounts his network and suspects that some credentials obtained in the first place? had weak passwords, were spear- accounts have been compromised. If If the door that led to the compromise phished, or were drive-by download he thinks it is just a few accounts, and remains open (such as undetected victims. Further, if it is not known can identify them, he might just block keyloggers on several machines) then which accounts have fallen, it may be access to those. However, if he can- nothing improves after a systemwide best to reset all of them, even those not be sure that an account is com- credential reset. On the other hand, if not compromised. To facilitate more promised until he sees suspicious the credentials were compromised by precise reasoning, let α be the frac- activity, it is difficult to figure out the guessing, then a reset (at least tempo- tion of credentials under attacker magnitude of the problem. Should he rarily) helps, and a change in the poli- control—whether or not yet exploited. block access to all accounts and trig- cies that allowed vulnerable passwords Thus, when α = 0.5, half of the accounts ger a systemwide reset? If only a few might be in order. But even if he con- (passwords) have already fallen to an accounts have been compromised, cludes that password guessing was the attacker. At that point, would the ad- perhaps not; if half of them have attack channel, was it online guessing? ministrator consider the network only been, he almost certainly should. Or, somehow, did the attacker get hold 50% compromised, or fully overrun? What about a 1% compromise rate, or of the password hash file and succeed The answer depends of course on the 5%, or 10%? At what point is global re- with an offline guessing attack? nature of the network in question. If a

IMAGE BY RANGSAN PAIDAEN RANGSAN BY IMAGE set the right answer? When attacks on passwords suc- compromised account never has impli-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 67 contributed articles

cations for any other account, then we credentials, to get others. Phishing might say the damage grows more-or- email messages that originate from less linearly with α. However, for an en- an internal account are far more likely terprise network, a compromised ac- to deceive co-workers. Depending on count almost certainly has snowballing the attacker’s objective, a toehold in effects;3 a single credential might give The argument— the network may be all she requires. access to many network resources, so “stronger The 2011 attack on RSA1 (which forced the damage to the network grows faster a recall of all SecurID tokens) began than α (one possible curve is shown in passwords are with phishing email messages to “two Figure 1). At α = 0.5, a system is argu- always better”— small groups of employees,”13 none ably completely overrun. For many en- “particularly high-profile or high-value terprise environments in this scenario, while deeply targets.” Dunagan et al.3 in examin- there would be few if any resources the ing a corporate network of more than attacker cannot access; in many social ingrained, thus 100,000 machines, found that 98.1% networks, the network value would appears untrue. of the machines allowed an outward approach zero, as spam would prob- snowballing effect allowing compro- ably render things unusable; access to mise of 1,000 additional machines. (probably well under) 50% of email in- Edward Snowden was able to compro- boxes likely yields a view to almost all mise an enormous fraction of secrets company email, as an attacker requires on the NSA network starting from just access to only one of the sender(s) or one account.14 Given that RSA and the recipient(s) of each message. NSA (organizations we might expect to All password-based systems must have above-average security standards) tolerate some compromise; passwords experienced catastrophic failures will be keylogged, cross-site scripted, caused by handfuls of credentials in and spear-phished, and a network un- attackers’ hands, we suggest a reason- able to handle this reality will not be able upper bound on the saturation able to function in a modern threat en- point for a corporate or government

vironment. As an attacker gains more network is αsat ≈ 0.1; saturation likely and more credentials in an enterprise occurs at much lower values. network, she naturally reaches some It seems likely that enterprises will

saturation point after which the im- have the lowest values of αsat; at con- pact of additional fallen credentials is sumer Web services, compromise of negligible, having relatively little effect one account has less potential to affect on the attacker’s ability to inflict harm. the whole network. As noted earlier, The first credential gives initial access our focus here is on enterprise; none- to the network, and the second, third, theless, we suggest that damage prob- and fourth solidify the beachhead, ably also grows faster than linearly but the benefit brought by each addi- at websites. For example, online ac- tional credential decreases steadily. counts at a bank should have minimal The gain is substantial when k is small, crossover effects, but at 25% compro- less when large; the second password mise all confidence in the legitimacy of guessed helps a lot more than, say, transaction requests and the privacy of password 102. customer data is likely lost.

Let αsat be the threshold value at For guessing attacks, the most eas- which the attacker effectively has con- ily guessed passwords fall first. It is trol of the password-protected part of thus the weakest passwords that de-

the system, in the sense that there is termine αsat; the number of guesses it negligible marginal gain from compro- takes to gather a cumulative fraction

mising additional credentials. That is, αsat of accounts is what it takes to reach if an attacker had control of a fraction the saturation point. Since the attack-

αsat of account credentials, there are er’s ability to harm saturates once she

very few resources that she could not reaches αsat, the excess strength of the

access; so the difference between con- remaining (1-αsat) of user passwords

trolling αsat and (αsat + ε) is negligible. is wasted. For example, the strongest In what follows, our main focus is en- 50% of passwords might indeed be very terprise networks to consider possible strong, but from a systemwide view-

values for αsat. point, that strength will come into play There are a variety of tools attack- only when the other 50% of credentials ers can use, once they have one set of has already been compromised—and

68 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles the attacker already has the run of ev- order to verify correctness of guesses gradually with the number of guesses a erything that is password-protected in offline), and, in this case, an offline password will withstand. One hundred the enterprise network. attack is necessary only if that file has guesses per account might be easy for In summary, password strength, or been properly salted and hashed; oth- an online attacker, but 1,000 is some- guessing resistance, is not an abstract erwise, simpler attacks are possible6 what more difficult, and so on; at some quantity to be pursued for its own sake. (such as rainbow tables for unsalted point, online guessing is no longer fea- Rather, it is a tool we use to deny an hashed passwords). sible. Similarly, at some point the risk attacker access to network resources. There is an enormous difference be- from an offline attack begins to gradu-

There is a saturation point, αsat, where tween the strength required to resist on- ally decrease. Let T0 be a threshold rep- the network is so thoroughly penetrat- line and offline guessing attacks. Natu- resenting the maximum number of ed that additional passwords gain the rally, the probability of falling either to guesses expected from an online attack attacker very little; resistance to guess- an online or offline attack decreases and T1 correspondingly the minimum ing beyond that point is wasted since it denies the attacker nothing. There is Figure 1. The fraction of network resources under attacker control grows faster than the fraction of credentials compromised. thus a “don’t care” region after that sat- uration level of compromise, and for enterprise networks, it appears quite For example, given half of a system’s credentials, an attacker likely effectively has access to all resources. For enterprise networks, we expect the saturation threshold, αsat, where the network is reasonable to assume that αsat is no completely penetrated, is likely under 0.1. higher than 0.1. At that level, it is not simply the case that the weakest 10% of 1.0 credentials is most important, but that the excess strength of the remaining 90% is largely irrelevant to the adminis- trator’s goal of systemwide defense. Of course there may be secondary benefits to individual users in having their pass- word withstand guessing, even after

αsat has been exceeded. For example, if reused at another site, the user still has a significant stake in seeing the pass- word withstand guessing. Since our focus is on the administrator’s goal of protecting a single site, this is not part of our model. of resources controlled by attacker Fraction

αsat 1.0 The Online-Offline Chasm Fraction of credentials compromised α We next consider the difference be- tween online and offline guessing. In online attacks, an attacker checks guesses against the defender’s server, Figure 2. “Don’t care” regions where there is no return for increasing effort. that is, submitting guesses to the same server as legitimate users. In offline, T0 is the threshold above which online attacks cease to be a threat. she uses her own hardware resources, T1 is the threshold below which passwords almost surely will not survive credible offline attacks. is the threshold fraction of compromised accounts at which an attacker effectively has control including networked or cloud resourc- αsat of system resources. Examples for these parameters might be T = 106, T = 1014, and α = 0.1. es and machines equipped with graph- 0 1 sat ical processing units (GPUs) optimized 1 for typical hash computations required 0.9 to test candidate guesses. Offline at- 0.8 tacks can thus test many-orders-of- 0.7 Online-offline chasm magnitude more guesses than online 0.6 attacks, whether or not online attacks 0.5 are rate-limited by system defenses. Attacher control 0.4

We consider online and offline guess- tive c

of credentials compromised 0.3 ing attacks separately. ff e E An online attack is always possible α 0.2 αsat against a Web-facing service, assum- 0.1

ing it is easy to obtain or generate valid Fraction 0 T T account user IDs. An offline attack is 0 1 possible only if the attacker gains ac- log10 (Number of guesses) cess to the file of password hashes (in

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 69 contributed articles

number of guesses expected from a offline gap are thus in a “don’t care” From the administrator’s point of view, credible offline attack. (The asymme- region in which they do both too much the strongest password in the popula- try in these definitions is intentional to and not enough—too much if the at- tion, or having the greatest guessing- provide a conservative estimate in rea- tack vector is online guessing and not resistance (the password at α = 1.0) is

soning about the size of the gap.) A pass- enough if it is offline. Once a password similar to one at αsat + ε, in that both fall

word withstanding T0 guesses is then is able to withstand T0 guesses, to stop after the attacker’s ability to harm is al- safe from online guessing attacks, while additional attacks, any added strength ready saturated.

one that does not withstand T1 guesses must be sufficient to move to the right In summary, shaded regions denote

will certainly not survive offline attacks. of T1. While both online and offline at- these areas where there is no return- 6 Our own previous work suggests T0 ≈ tacks are countered by guessing-resis- on-effort for “extra strength”: first, 6 14 10 and T1 ≈ 10 are reasonable coarse tance, the amount needed varies enor- passwords whose guessing-resistance estimates, giving a gap eight orders of mously. In practical terms, distinct lies within the online-offline gap; and magnitude wide (see Figure 2); we em- defenses are required to stop offline second, passwords beyond where an phasize, however, that the arguments and online attacks. This online-offline attacker gains little from additional herein are generic, regardless of the ex- chasm then gives us a second “don’t credentials. The size of the “don’t

act values of T0 and T1. While estimates care” region, besides the one defined care” region naturally depends on the

for T1 in particular are inexact, depend- by αsat. particular values of αsat, T0, and T1, but ing as they do on assumptions about in all cases the shape of the password attacker hardware and strategy (a previ- The “Don’t Care” Region distribution (as defined by the col- ous estimate6 assumed four months of The compromise saturation point and ored curves in Figure 2) matters only

cracking against one million accounts the online-offline chasm each imply in the areas below αsat AND (left of T0

using 1,000 GPUs, each doing one bil- regions where there is no return on ef- OR right of T1). An important obser-

lion guesses per second), clearly T1 fort. The marginal return on effort is vation is that, under reasonable as-

vastly exceeds T0. zero for improving any password that sumptions, the “don’t care” regions

A critical observation is that for a starts in the zone greater than T0 yet cover a majority of the design space.

password P whose guessing-resistance remains less than T1 and for passwords The relatively small unshaded regions

falls between T0 and T1, incremental with guessing-resistance above the αsat are shown in Figure 2; outside these

effort that fails to move it beyond T1 threshold. Figure 2 depicts this situa- regions, changes to the password dis- is wasted in the sense that there is no tion, with shaded areas denoting the tribution accomplish nothing, at least guessing attack to which the original “don’t care” regions. A password with- from the administrator’s viewpoint.

P falls, that the stronger password re- standing T1 − ε guesses has the same To anchor the discussion, based on

sists, since guessing attacks are either survival properties as one surviving T0 + what we know of enterprise networks online or offline, with no continuum ε; both survive online but fall to offline and attacker abilities, we have offered 6 in between. Passwords in this online- attack, that is, equivalent outcomes. estimates of αsat = 0.1, T0 = 10 , and T1 = 1014; but choosing different values Figure 3. Defensive elements aiming to improve guessing resistance. does not alter the conclusion that in

(1) α sat (the point at which attacker control saturates) can be raised by implementing basic security large areas of the guess-resistance principles (such as least-privilege and compartmentalization). vs. credentials-compromised space, (2) T (the maximum number of online guesses) can be reduced by throttling mechanisms. 0 changing the distribution of user-cho- (3) T 1 (the minimum number to be expected from an offline attack) can be reduced by iterated hash functions. sen passwords improves little of use (4) The cumulative fraction of accounts that have fallen at a given number of attacker guesses can for the enterprise defender; it causes be reduced (pushing the blue curve down) by improving the guessing-resistance of user-chosen no direct damage but, like pushing on passwords (such as to the left of T by password blacklisting and by password composition 0 string, is ineffective and wasteful of policies generally).

Changing αsat, T0 and T1 alter the size of white- and blue-shaded “don’t care” regions. user energy. To understand the consequences of the “don’t-care” regions, Figure 2 also 1 depicts guessing resistance of three hy- 0.9 pothetical password distributions. The 0.8 2 3 blue (top) and green (middle) curves 0.7 4 diverge widely on the right side of the 0.6 figure; for a fixed number of guesses, 0.5 far fewer green-curve accounts will 0.4 be compromised than accounts from 0.3 4 the blue-curve distribution. The green 0.2 1 curve might appear better since those 4 α 0.1 sat passwords are much more guess-resis-

Fraction of credentials compromised α Fraction 0 tant than those from the blue curve. T T 0 1 Nonetheless, they have identical attack log (Number of guesses) 10 survival outcomes since their diver-

gence between T0 and T1 has minimal

70 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles effect on performance against on- curve in Figure 2. However, striving for line or offline attacks, and divergence such resistance but falling short wastes above αsat happens only after the at- user effort and accomplishes nothing. tacker’s capacity for harm has already In Figure 2, all effort above and beyond plateaued. The red (lower) curve shows that necessary to withstand online at- a distribution that might survive an Password strength, tack is completely wasted in the two offline attack, as the curve lies below which actually upper curves. An organization that tries αsat, even at the number of guesses to withstand offline attacks but fails to an offline attacker might deliver. The means guessing have at least a fraction 1 − αsat of users enormous difficulty of getting users to resistance, is not survive T1 guesses fares no better than choose passwords that will withstand one that never made the attempt and offline guessing, and the waste that a universal good does worse if we assume user effort results unless almost all of them do, is a scarce resource.5 The evidence of has also been argued by Tippett.11 We to be pursued for recent work on cracking on real dis- 8,10 reemphasize that αsat, T0, and T1 are its inherent tributions suggests this is the fate of site- and implementation-dependent more or less all organizations that al- variables. We next examine how an ad- benefits; it is low user-chosen passwords, unless we ministrator can vary them to decrease useful only believe that αsat ≈ 0.1 is unreasonably the size of the “don’t care” zone. low or T1 ≈ 1014 is unreasonably high. to the extent The argument—“stronger passwords are What Should an it denies things always better”—while deeply ingrained, Administrator Optimize? thus appears untrue. Stronger pass- Ideally, a system’s population of pass- to adversaries. words lower the cumulative fraction words would withstand both online of accounts compromised at a given and offline attacks. For this to be so, number of guesses, that is, push the fraction of accounts compromised the curves in Figure 2 lower. How- must be lower than αsat at T1 guesses, ever, changes that occur within the and ideally much lower, since an of- shaded “don’t care” regions happen fline attacker may be able to go well when it no longer matters and do not beyond the expected minimum T1. improve outcomes. Unfortunately, recent work shows that It follows that a reasonable objec- user-chosen passwords do not even ap- tive is to maximize, within reasonable proach this level, even when stringent costs, guessing-resistance across a composition policies are enforced. For given system’s set of passwords at the example, Mazurek et al.10 found that expected online and offline number of 48% of Carnegie Mellon University guesses, subject to the constraint that passwords (following a length-eight the fraction of compromised accounts and four-out-of-four character sets stays below αsat. That is, so long as α 14 policy) fell within 10 guesses. On ex- < αsat, the lower α is at T0 (respectively amining eight different password-cre- T1) the better the resistance to online ation policies, Kelley et al.8 found that (respectively offline) attacks. For α > none kept the cumulative fraction of αsat, improvements that do not reduce accounts compromised below 10% by α below αsat are unrewarded. Focusing 1011 guesses. If we believe an attacker’s as our model does on a single site, we control saturates by the time a fraction remind readers that the additional ben-

αsat = 0.1 of accounts is compromised, efit of withstanding guessing to users and an offline attack can mount at who have reused their password at oth- 11 least T1 = 10 guesses per account, er sites is not captured in this model. there is thus little hope of resisting offline attack. That is, with these as- What Can an sumed values of αsat and T1, an attacker Administrator Control? who gains access to the hashed pass- What tools does an administrator have word file will have all the access she to reach these goals? The outcome of needs, no matter how far outside her any administrator actions will be in- reach the remaining fraction (1-αsat) of fluenced by the values αsat, T0, T1, and passwords lie. the shape of the cumulative password What then should an organization distribution. We show these various seek to do? It is ideal if passwords ro- forces in Figure 3. bustly withstand both online and of- The value of the compromise satu- fline attacks, as is the case for the lower ration point αsat is largely determined

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 71 contributed articles

by the network topology and might be against each of one million accounts.6 relatively difficult to control or change Furthermore, technology advances typi- in a given environment. Basic net- cally aid attackers more than defenders; work hygiene and adherence to secu- few administrators replace hardware rity principles (such as least privilege, every year, while attackers can be as- isolation, and containment) can help If the hashed sumed to have access to the latest re- minimize damage when intrusion oc- password file sources. This moves T1 to the right—as curs. These defenses are also effective computing speeds and technology ad- against intrusions that do not involve leaks, burdening vance and customized hardware gives password guessing. We assume these users with complex attackers further advantages. Since the defenses will already be in force and in hardware is controlled by the attacker, the rest of this section concentrate on composition little can be done to directly throttle measures that mostly affect password- offline guessing. However, an effective guessing attacks. policies does not way to reduce the number of trials per Improving T0. There are few, if any, alter the fact that a second is to use a slow hash, that is, one good reasons for an authentication sys- that consumes more computation. The tem to allow hundreds of thousands of competent attacker most obvious means is hash iteration;12 distinct guesses on a single account. likely gets access to recent research is also exploring the In cases of actual password forgetting, design of hash functions specifically it is unlikely that the legitimate user all the accounts she designed to be GPU unfriendly. For types more than one dozen or so dis- could possibly need. example, ignoring the counteracting tinct guesses. Mechanisms that limit force of speed gains due to advancing the number of online guesses (thus technology, an iteration count of n = 4 14 10 reducing T0) include various throttling 10 reduces T1 from 10 to 10 . Even

mechanisms (rate limiting) and IP ad- with iteration it is difficult to move T1

dress blacklisting. The possibility of all the way left to T0 to make the online- denial-of-service attacks can usually offline chasm disappear. Limits on be dealt with by IP address whitelist- further increasing n arise from the re- ing. (We mean, not applying the throt- quirement that the time to verify legiti- tling triggered by new IP addresses to mate users must be tolerable to both known addresses from which a previ- the users (wait time) and system hard- ous login succeeded; wrong guesses ware; for example, n might allow verify- from that known address should still ing 100 legitimate users/second (10ms be subject to throttling.) A simple, per user). If 10ms is a tolerable delay, easily implemented throttling mecha- an attacker with access to 1,000 GPUs nism may suffice for many sites. When can compute a total of 1,000 × 4 × 30 × denial-of-service attacks are a possibil- 24 × 60 × 60/10−2 ≈ 1012 guesses in four ity, more complex mechanisms may months. Directing this effort at 100 ac- be necessary, perhaps including IP counts would mean each would have to 10 address white- and blacklisting, and withstand a minimum of T1 = 10 guesses. methods requiring greater effort from Since these are conservative assumptions, site administrators. Such defensive it appears challenging for a typical enter-

improvements should come without prise to decrease T1 below this point. additional burdens of effort and incon- Note that, as technology evolves, the venience to users. Together with pass- number of hash iterations can easily be word blacklisting (discussed in the fol- increased, invisibly to users and on the lowing section), throttling may almost fly—by updating the stored password completely shut down generic online hashes in the systemside file to reflect guessing attacks. new iteration counts.12 Among the ap-

Improving T1. A password must pealing aspects of iterated hashing, it withstand a relatively large number of is long known as an effective defensive guesses to have any hope of withstand- tool, and costs are borne systemside ing credible offline attacks. A lower rather than by user effort. However,

bound T1 on this number may vary de- hash iteration is not a miracle cure-all; pending on the defenses put in place for a password whose guess-resistance and can be quite high. For example, if is 106, online throttling is still impor- an attacker can make 10 billion guess- tant. An online attacker who could test es per second on each of 1,000 GPUs,7 one password every 10ms (matching the then in a four-month period, she can system rate noted earlier) will succeed 14 4 try approximately T1 = 10 guesses in 10 seconds = 2 hours, 47minutes.

72 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles

Eliminating offline attacks altogeth- attacker to online guessing, since, by Password blacklisting. Attackers ex- er. The emphasis by any parties who en- design, the connection is too small to ploit the fact that certain passwords courage users to choose stronger pass- support the typical guessing rate an of- are common. Explicitly forbidding words can obscure the fact that offline fline attacker needs, or to allow export passwords known to be common can attacks are only a risk when the pass- of any file that would be useful to an of- thus reduce risk. Blacklists concen- word hash file “leaks,” a euphemism fline attacker. trate at the head of the distribution, for “is somehow stolen,” or otherwise Improving the password distribu- blocking the choice of most common becomes available to an attacker. Any tion. Finally, we consider changes to passwords. For example, a user who means or mechanisms that prevent the the password distribution as a means attempts to choose “abcdefg” is in- password hash file from leaking entire- of improving outcomes. Recall the formed this is not allowed and asked to ly remove the need for individual pass- curves in Figure 2 represent the cu- choose another. Certain large services words to withstand an offline attack. mulative fraction of accounts compro- do this; for example, Twitter black-

Since we defined T1 as the minimum mised as a function of the number of listed 380 common passwords after an number of guesses a password must guesses per account. In general, it has online guessing incident in 2009, and withstand to resist an offline attack, any been accepted without much second Microsoft applied a blacklist of several such mechanism effectively reduces T1 thought that the lower this cumulative hundred to its online consumer prop- to zero. We next discuss one particular- fraction the better; a great deal of effort erties in 2011. With improvements of ly appealing such mechanism. has gone into coercing users to choose password crackers and the recent wide Hardware security modules (HSMs). supposedly “stronger” passwords, thus availability of passwords lists, black- A properly used HSM eliminates the pushing the cumulative distribution lists need to be longer. risk of a hash file leaking6 or, equiva- curve downward in one or more of the A blacklist of, say, 106 common lently, eliminates the risk of a decryp- three regions induced by T0 and T1. passwords may help bring guessing re- tion key (or backup thereof) leaking in However, as explained earlier, lower sistance to the 105 level. A natural con- the case that the means used to protect is of tangible benefit only outside the cern with blacklists is that users may the information stored systemside to “don’t care” region; improvements to not understand why particular choices verify passwords is reversible encryp- the curve inside the “don’t care” region are forbidden. Kelley et al.8 examined tion. In such a proper HSM architec- have negligible effects on outcomes in the guessing resistance of blacklists of ture, rather than a file of the one-way any attack scenario. various sizes, but the question of how hashes of salted passwords, what is First, note that tools to influence long one can be before the decisions stored in each file entry is a message the cumulative distribution are mostly appear capricious is an open one. Kom- authentication code (MAC) computed indirect; users choose passwords, not anduri et al.9 pursue this question with over the corresponding password us- administrators. For example, by some a meter that displays, as a user types, ing a secret key. When a password can- combination of education campaigns, the most likely password completion. didate is presented for verification, password policies, and password me- A further unknown is the improve- the candidate plus the corresponding ters, administrators may try to influence ment achieved when users are told MAC from the system file are provided this curve toward “better” passwords. their password choice is forbidden. It as HSM inputs. The HSM holds the However, the cumulative distribution appears statistically unlikely that all system secret key used to compute the is ultimately determined by user pass- of the users who initially selected one MAC; importantly, this secret key is word choice; if users ignore advice, do of Twitter’s 380 blacklisted passwords by design never available outside the the minimum to comply with policies, would collide again on so small a list HSM. Upon receiving the (MAC, can- or are not motivated by meters, then ef- when making their second choice, but didate password) input pair, the HSM forts to lower the curve may have little we are not aware of any measurements independently computes a MAC over impact on user choices. of the dispersion achieved. A promis- the input candidate, compares it to the Second, note that many policy and ing recent practical password strength input MAC, and answers yes (if they education mechanisms are unfocused estimator is zxcvbn,15 a software tool agree) or no if they do not. Stealing the in the sense that they cannot be target- that can also be of use for password password hash file—in this case a pass- ed at the specific part of the cumulative blacklisting. word MAC file—is now useless to the distribution where they make the most Observe that blacklists are both di- offline attacker, because the HSM is difference—and away from the “don’t rect and focused: they explicitly pre- needed to verify guesses; that is, offline care” region where they make none. vent choices known to be bad rather attacks are no longer possible. Even if they succeed, exhortations to than rely on indirect measures, and Another interesting scheme to miti- “choose better passwords” are not con- they target those users making bad gate offline attacks, proposed by Cres- centrated at one part of the curve or choices, leaving the rest of the popu- cenzo et al.,2 bounds the number of another; if all users respond to such a lation unaffected. A blacklist appears guesses that can be made by restrict- request by improving their passwords to be one of the simplest measures to ing the bandwidth of the connection marginally, the related effort of 90% of meaningfully improve the distribution between the authentication server and users is still wasted for an enterprise in resisting online attacks. a specially constructed hash server where αsat = 0.1. We now examine com- Composition policies. Composition that requires a very large collection of mon approaches to influencing pass- policies attempt to influence user- random bits. This approach limits an word choices in this light. chosen passwords by mandating a

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 73 contributed articles

minimum length and the inclusion ing resistance were free this would not few barriers to implementing these of certain character types; a typical ex- matter, but such increases are typically simple defenses. ample is “length at least eight charac- achieved at great cost in user effort; for ters, and use three of the four charac- example, there is a void of evidence that References 1. Bright, P. RSA finally comes clean: SecurID is ter sets {lowercase, uppercase, digits, current approaches based on password compromised. Ars Technica (June 6, 2011); http:// arstechnica.com/security/news/2011/06/rsa-finally- special characters}.” Certain policies composition policies significantly im- comes-clean-securid-is-compromised.ars/ may help improve guess-resistance prove defensive outcomes and strong 2. Crescenzo, G.D., Lipton, R.J., and Walfish, S. Perfectly 5 8 secure password protocols in the bounded retrieval in the 10 to 10 range. However, for arguments that they waste much user ef- model. In Proceedings of the Theory of Cryptography what we have suggested as the reason- fort. This situation thus creates risk of a Conference (New York, Mar. 4–7). Springer-Verlag, 14 2006, 225–244. able values αsat = 0.1 and T1 = 10 , the false sense of security. 3. Dunagan, J., Zheng, A.X., and Simon, D.R. Heat-Ray: evidence strongly suggests that none It is common to assume that users Combating identity snowball attacks using machine learning, combinatorial optimization and attack of the password-composition policies must choose passwords that will with- graphs. In Proceedings of the ACM Symposium on in common use today or seriously pro- stand credible offline attacks. However, Operating Systems Principles (Big Sky, MT, Oct. 11–14). 8,10 ACM Press, New York, 305–320. posed can help; for these to become if we assume an offline attacker can 4. Florêncio, D. and Herley, C. Where do security policies 14 come from? In Proceedings of the SOUPS Symposium relevant, one must assume that an at- mount T1 = 10 guesses per account and On Usable Privacy and Security (Redmond, WA, July tacker’s ability to harm saturates much has all the access she needs by the time 14–16, 2010). 5. Florêncio, D., Herley, C., and van Oorschot, P. Password higher than αsat = 0.1 or that the attack- she compromises a fraction αsat = 0.1 of portfolios and the finite-effort user: Sustainably 14 er can manage far fewer than T1 = 10 accounts, we must acknowledge that managing large numbers of accounts. In Proceedings offline guesses. Such policies thus fail trying to stop offline attacks by aim- of the 23rd USENIX Security Symposium (San Diego, CA, Aug. 20–22). USENIX Association, Berkeley, CA, to prevent total penetration of the net- ing user effort toward choosing “better 2014, 575–590. work. Their ineffectiveness is perhaps 6. Florêncio, D., Herley, C., and van Oorschot, P.C. An passwords” is unachievable in practice. administrator’s guide to Internet password research. the reason why a majority of large Web The composition policies in current use In Proceedings of the USENIX LISA Conference 4 (Seattle, WA, Nov. 9–14). USENIX Association, services avoid onerous policies. seem so far from reaching this target Berkeley, CA, 2014, 35–52. Note that composition policies are that their use appears misguided. This 7. Goodin, D. Why passwords have never been weaker and crackers have never been stronger. Ars indirect: the constraints they impose is not to say that offline attacks are not Technica (Aug. 20, 2012); http://arstechnica.com/ are not themselves the true end objec- a serious threat. However, it appears security/2012/08/passwords-under-assault/ 8. Kelley, P.G., Komanduri, S., Mazurek, M.L., Shay, R., tives, but it is hoped they result in a that enterprises that impose stringent Vidas, T., Bauer, L., Christin, N., Cranor, L.F., and Lopez, more defensive password distribution. password-composition policies on J. Guess again (and again and again): Measuring password strength by simulating password-cracking This problem is compounded by the their users suffer the same fate as those algorithms. In Proceedings of the IEEE Symposium fact that whether the desired improve- that do not. If the hashed password file on Security and Privacy (San Francisco, May 20–23). IEEE Press, 2012, 523–537. ment is indeed achieved is concealed leaks, burdening users with complex 9. Komanduri, S., Shay, R., Cranor, L.F., Herley, C., and from an administrator. The justifiably composition policies does not alter the Schechter, S. Telepathwords: Preventing weak passwords by reading users’ minds. In Proceedings recommended practice of storing pass- fact that a competent attacker likely of the 23rd USENIX Security Symposium (San Diego, words as salted hashes means the pass- gets access to all the accounts she CA, Aug. 20–22). USENIX Association, Berkeley, CA, 2014, 591–606. word distribution is obscured, as are could possibly need. Nudging users in 10. Mazurek, M.L., Komanduri, S., Vidas, T., Bauer, L., Christin, N., Cranor, L.F., Kelley, P., Shay, R., and Ur, any improvements caused by policies. the “don’t care” region (where most B. Measuring password guessability for an entire Composition policies are also unfo- passwords appear to lie) is simply a university. In Proceedings of the 20th ACM Conference on Computer and Communications Security (Berlin, cused in that they affect all users rather waste of user effort. Germany, Nov. 4–8). ACM Press, New York, 2013. than being directed specifically where The best investments to defend 11. Tippett, P. Stronger passwords aren’t. Information Security Magazine (June 2001), 42–43. they may matter most. A policy may against offline attacks appear to in- 12. Provos, N. and Mazières, D. A future-adaptable greatly affect user password choice and volve measures transparent to users. It- password scheme. In Proceedings of the 1999 USENIX Annual Technical Conference, FREENIX still have little effect on outcome (such eration of password hashes lowers the Track (Monterey, CA, June 6–11). USENIX Association, as if all of the change in the cumulative T1 boundary; however, even with very Berkeley, CA, 1999, 81–91. 13. RSA FraudAction Research Labs. Anatomy of a hack. distribution happens inside the “don’t aggressive iteration, we expect that at RSA, Bedford, MA, Apr. 1, 2011; https://blogs.rsa.com/ care” region). least 1010 offline guesses remain quite anatomy-of-an-attack/ 14. Toxen, B. The NSA and Snowden: Securing the all- feasible for attackers. Use of MACs, seeing eye. Commun. ACM 57, 5 (May 2014), 44–51. Conclusion that is, keyed hash functions, instead 15. Wheeler, D. zxcvbn: Low-budget password strength estimation. In Proceedings of the 25th USENIX Password strength, which actually of regular (unkeyed) password hashes, Security Symposium (Austin, TX, Aug. 10–12). USENIX means guessing resistance, is not a uni- provides effective defense against of- Association, Berkeley, CA, 2016. versal good to be pursued for its inher- fline attacks that exploit leaked hash Dinei Florêncio ([email protected]) is a senior ent benefits; it is useful only to the extent files—provided the symmetric MAC researcher in the Multimedia and Interactive Experiences it denies things to adversaries. When we key is not also leaked. Use of HSMs is group of Microsoft Research, Redmond, WA. consider a population of accounts, there one method of protecting MAC keys, Cormac Herley ([email protected]) is a principal are large areas where increased guessing as discussed; though more expensive researcher at Microsoft Research, Redmond, WA. Paul C. van Oorschot ([email protected]) is a resistance accomplishes nothing—ei- than software-only defenses, HSMs professor of computer science and Canada Research Chair ther because passwords fall between the can eliminate offline attacks entirely. in Authentication and Computer Security at Carleton online and offline thresholds or because Online guessing attacks, in contrast, University, Ottawa, Canada. so many accounts have already fallen cannot be entirely eliminated, but ef-

that attacker control has already satu- fective defenses include password Copyright held by the authors. rated. If increases in password guess- blacklists and throttling. There appear Publication rights licensed to ACM. $15.00

74 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 DOI:10.1145/2934665 Actors linked to central others in networks are generally central, even as actors linked to powerful others are powerless.

BY ENRICO BOZZO AND MASSIMO FRANCESCHET A Theory on Power in Networks

A “NETWORK” CONSISTS of a crowd of actors and a set of binary relations that tie pairs of actors. Networks are pervasive in the real world. Nature, society, information, and technology are supported by ostensibly different networks that in fact share an amazing number of interesting structural properties.

Networks are modeled in math- tral (important) nodes? Many mea- ematics as “graphs,” with actors rep- sures have been proposed to address resented as points (also called nodes it. Among them, “eigenvector central- or vertices) and relations depicted as ity” (or simply centrality in this article) lines (also called edges or arcs) con- states that an actor is central if it is con- necting pairs of points. In this article, nected with central actors. This circu- we focus on undirected graphs, where lar definition is captured by an elegant the edges do not have a particular ori- recursive equation entation. A meaningful question on networks is: Which are the most cen- λx = Ax, (1)

key insights where x is a vector containing the sought centralities, A is a matrix en- ˽˽ Networks are everywhere; they are indeed the fabric of nature, society, information, coding the network, and λ is a positive technology, and sometimes even of art. constant. Two actors in a network All we need is an eye for them. that are tied by an edge are said to

˽˽ The notion of centrality claims central be neighbors. Equation (1) claims two actors are connected with central important properties of centrality: the others, a mantra repeated over the past centrality of an actor is directly correlat- 70 years in econometrics, sociometry, ed with the number of its neighbors and bibliometrics, and information retrieval. the centrality of its neighbors. Central ˽˽ Power, on the other hand, claims that actors are those with many ties or, for powerful actors are connected with an equal number of ties, central actors powerless others; it is meaningful in, say, bargaining situations, where it is favorable are those connected with central others. to negotiate with those with few options. This intriguing definition has been dis-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 75 contributed articles

covered and rediscovered many times the more ties an actor has, the more over in different contexts. It has been powerful the actor is. The second investigated, in chronological order, in property characterizes power; for an econometrics, sociometry, bibliometrics, equal number of ties, actors linked to Web information retrieval, and net- powerless others are powerful. On the work science; see Franceschet12 for Central actors other hand, actors tied to powerful an historical overview. are those with others are powerless. In some circumstances, however, We investigate the existence and centrality—the quality of being con- many ties or, for uniqueness of a solution for Equation nected to central ones—has limited util- an equal number (2), exploiting well-known results in com- ity in predicting the locus of “power” binatorial matrix theory. We study how in networks.2,8,11 Consider exchange of ties, central to regain the solution when it does networks, where the relationship in not exist, by perturbing the matrix rep- the network involves the transfer of actors are those resenting the network. We formally valued items like information, time, connected with relate the introduced notion of power money, or energy. A set of exchange with alternative notions and empiri- relations is positive if exchange in one central others. cally compare them on the European relation promotes exchange in others natural gas pipeline network. and negative if exchange in one rela- tion inhibits exchange in others.7 In Motivating Example “negative exchange networks,” power In his seminal work on power-depen- comes from being connected to those dence relations, from 1962, Richard with few options. Being connected to Emerson11 claimed that power is a those with many possibilities reduces property of the social relation, not one’s power. Think of, for instance, an attribute of the person: “X has a social network in which time is the power” is meaningless, unless we exchanged value. Imagine every actor specify “over whom.” Power resides has a limited time to listen to others implicitly in others’ dependence, and and that each actor divides its time dependence of an actor A upon actor among its neighbors. Exchange of B is directly proportional to A’s moti- time in one relation clearly precludes vational investment in goals medi- exchange of the same time in other ated by B and inversely proportional relations. Which actors receive the to the availability of those goals to A most attention? They are the nodes that outside the A–B relation. The availabil- are connected to many neighbors with ity of such goals outside that relation few options, since they receive almost refers to alternative avenues of goal full attention from all their neighbors. achievement, most notably through On the other hand, actors connected other social relations.11 This type of to few neighbors with many options relational power is endogenous with receive little consideration because respect to the network structures, their neighbors are mostly busy with meaning it is a function of the position others. of the node in the network. Exogenous In this article, we propose a theory factors (such as allure or charisma) on power in the context of networks. external to the network structure We start by this thesis: An actor is might be added to endogenous power powerful if it is connected to power- to complete the picture. less actors. We implement this circu- We begin with some small archetypal lar thesis through this equation examples typically used in exchange- x = Ax÷, (2) network theory to informally illustrate the notion of power and sometimes where x is the sought power vector, to distinguish it from the intersecting A is a matrix encoding the network, concept of centrality.10 Consider a two- and x÷ is the vector whose entries are node path the reciprocal of those of x. Equation (2) states two important properties A−B. of power: the power of an actor is directly correlated with the number The situation is perfectly symmetric, of its neighbors and is inversely corre- and a reasonable prediction is that lated with the power of its neighbors. both actors have the same power. In a The first property seems reasonable; three-node path

76 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles

A−B−C, network. Nodes are European coun- matrix of G; that is, Ai, j is the weight of

tries (country codes according to ISO edge (i, j) if such edge exists and Ai, j = 0 much is changed. Intuitively, B is pow- 3166-1), and there is an undirected otherwise. Hence, A is a square, sym- erful and A and C are not. Indeed, both edge between two nations if a natu- metric, nonnegative matrix. Loops in A and C have no alternative venues ral gas pipeline crosses the borders G correspond to elements in the main besides B (both depend on B), while B of the two countries. Data has been diagonal of A. can exclude one of them by choosing downloaded from the website of the The “centrality problem” is as fol- the other.a In a four-node path International Energy Agency (http:// lows: find a vector x with positive , www.iea.org). The original data cor- entries such that A−B−C−D responds to a directed, weighted multigraph, with edge weights corre- λx = Ax, (3) actors B and C hold power, while A sponding to the maximum flow of the and D are dependent on either B or C. pipeline. We simplified and symme- where λ > 0 is a constant. This means

Nevertheless, the power of B is less here trized the network, mapping the edge λxi = ∑j Ai, j xj; that is, the centrality of than in the three-node path; in both weights in a consistent way. a node is proportional to the weighted cases, A depends on B, but in the three- This is a negative exchange network sum of centralities of its neighbors. node path, C also depends on B, while because the exchange of gas with a This is the main idea behind PageRank, in the four-node path, C has an alterna- country precludes the exchange of the Google’s original webpage ranking tive, node D. Hence, node B is less pow- same gas with others. Intuitively, pow- algorithm. PageRank determines the erful in the four-node path with respect erful countries are those that are con- importance of a webpage in terms of to the three-node path since its neigh- nected with states with few options the importance assigned to the pages bors are more powerful. Finally, the for exchanging the gas. Suppose coun- that hyperlink to it. Besides Web infor- five-node path try B is connected to countries A and mation retrieval, this thesis has been C, and B is the only connection for successfully exploited in disparate con- A−B−C−D−E them, or A−B−C. Countries A and C texts, including bibliometrics, sociom- can sell or buy gas only from B, while etry, and econometrics.12 is interesting since it discriminates country B can choose between A and We define the “power problem” as power from centrality. All traditional C. Reasonably, the bargaining power follows: find a vector x with positive central measures (eigenvector, closeness, of B is greater, which traduces in entries such that betweenness) claim that C is the central higher revenues or less expense for B one. Nevertheless, B and D are reasonably in the gas negotiation. x = Ax÷, (4) the powerful ones. Again, this is because they negotiate with weak partners (A and A Theory on Power where we denote with x÷ the vector C or E and C), while C bargains with Let G be an undirected, weighted whose entries are the reciprocal of those strong parties (B and D). This example graph. The graph G may contain of x. This means xi = ∑j Ai, j/xj; that is, the is useful for illustrating an additional “loops,” or edges from a node to itself. power of a node is equal to the weighted subtle aspect of power. Notice that in The edges of G are labeled with posi- sum of reciprocals of power of its neigh- both the five-node path and the four- tive weights. Let A be the adjacency bors. Notice that if λx = Ax÷, then, setting node path, B is surrounded by nodes (A and C) that are locally similar; for Figure 1. The European natural gas pipeline network. instance, they have the same degree PT in both paths. However, the power of MA IE C is reasonably less in the five-­node path than in the four-node path; ES UK DZ hence, we might expect the power SE of B is greater in the five-node path NL NOBE FR with respect to the four-node path. DK TN LU This separation is possible only if the DE CH notion of power spans beyond the local IT LY neighborhood of a node, if, say, power CZ AT SI is recursively defined. FI PL HR RU SK As a larger and more realistic exam- EE BY LV LT HU ple, consider Figure 1, which depicts UA the European natural gas pipeline RS RO TR a We assume here the so-called “1-exchange IR BG GR MD rule,” meaning each node may exchange with at GE most one neighbor. Likewise, we consider a neg- MK ative exchange network in which the exchange in one relation inhibits exchange in others.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 77 contributed articles

y = , we have that y = Ay÷; hence, Proof. If DAD is doubly stochastic, matrix A is said to have “support” if the proportionality constant λ is not then DADe = e and eT DAD = eT, where e it contains a positive diagonal and is necessary in the power equation. This is a vector of all 1s. Actually, since A and D said to have “total support” if A ≠ 0 notion of power is relevant on negative are symmetric, it holds that DADe = e ⇔ and every positive element of A lies exchange networks.2,8 In these networks, eT DAD = eT. If the vector x does not have on a positive diagonal. Total support -1 when a value is exchanged between zero entries, then Dx is invertible and D x = clearly implies support. ÷ ÷ ÷ actors along a relation, it is consumed Dx . We have that x = Ax ⇔ Dxe = ADx e A matrix is “indecomposable (irre- -1 and cannot be exchanged along another ⇔ e = Dx ADx÷e ⇔ e = Dx÷ ADx÷e. ducible” if it is not possible to find a relation. Hence, important actors are permutation matrix P such that those in contact with many actors with Existence and unicity of a solu- few exchanging possibilities. tion. The link between the balanc- , Finally, the “balancing problem” is ing problem and the power problem the following: find a diagonal matrix D we established in Theorem 1 allows where X and Z are both square matrices with positive main diagonal such that us to investigate a solution of the and 0 is a matrix of 0s; otherwise A is power problem (Equation 4) using “decomposable (reducible).” A matrix S = DAD the ­well-established theory of matrix is “fully indecomposable” if it is not balancing. possible to find permutation matrices is doubly stochastic; that is, all rows Recall that the “diagonal” of a P and Q such that and columns of S sum to 1. The balanc- square n × n matrix is a sequence of ing problem is a fundamental question n elements that lies on different rows , that is claimed to have first been used and columns of the matrix. A permu- in the 1930s to calculate traffic flow4 tation matrix is a square n × n matrix where X and Z are both square matri- and since then has been applied in that has exactly one entry equal to one ces; otherwise, A is “partly decom- many disparate contexts.14 in each row and each column, while posable.” Clearly, a matrix (fully It turns out that the power problem all the other entries are equal to zero. indecomposable) is also irreducible. is intimately related to the balancing Each diagonal clearly corresponds to It also holds that full indecomposabil- 5 problem. Given a vector x, let Dx be a permutation matrix where the posi- ity implies total support. Moreover, the diagonal matrix whose diagonal tions of the diagonal elements corre- the adjacency matrix of a bipartite entries coincide with those of x. We spond to those of the unity entries of graph is never fully indecomposable, thus have the following result. the permutation matrix. In particu- while the adjacency matrix of a non- lar, the identity matrix I is a permu- bipartite graph is fully indecompos- Theorem 1. The vector x is a solution tation matrix, and the diagonal of A able if and only if it has total support of the power problem if and only if the associated with I is called the main and is irreducible.9 We say a graph has

diagonal matrix Dx÷ is a solution for the diagonal of A. A diagonal is positive if support, has total support, is irreduc- balancing problem. all its elements are greater than 0. A ible, and is fully indecomposable if the corresponding adjacency matrix Figure 2. (top left) The graph has no support since a spanning cycle forest is missing. (top- has these properties. right). The graph has support formed by edges (1, 4) and (2, 3), but the support is not total; (edges (1, 3) and (1, 2) are not part of any spanning cycle graph. (bottom left) The graph has The combinatorial notions just out- total support but is not irreducible, hence is not fully indecomposable. (bottom right) The lined are rather terse. Fortunately, most graph has total support and is irreducible and not bipartite, so is fully indecomposable. of them have a simple interpretation in graph theory. It is known that irreduc- ibility of the adjacency matrix corre- 3 3 sponds to connectedness of the graph. Moreover, given an undirected graph G, let us define a “spanning cycle forest” of 11G a spanning subgraph of G whose con- 4 4 nected components are single edges or 2 2 cycles, including loops that are cycles of length 1. It is easy to realize that there exists a correspondence between diagonals in the adjacency matrix and spanning cycle forests in the graph. 3 1 Hence, a graph has support if and only 3 if it contains a spanning cycle forest and total support if and only if each edge 1 is included in a spanning cycle forest. 4 Four examples are given in Figure 2. 2 2 The following is a well-known nec- essary and sufficient condition for the solution of the balancing problem.9,17

78 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles

Theorem 2. Let A be a symmetric Figure 3. Correlation between original and perturbed powers varying the damping parameter nonnegative square matrix. A necessary from 0 to 1 on the largest biconnected component of the social network among dolphins and sufficient condition for the existence (which has total support). The horizontal line corresponds to the correlation with diagonal of a doubly stochastic matrix S of the perturbation and maximum damping. The correlation on the other networks is similar. form DAD, where D is a diagonal matrix with positive main diagonal, is that A has

1.00 Diagonal perturbation total support. If S exists, then it is unique. Full perturbation If A is fully indecomposable, then matrix

D is unique. 0.90

It follows that the power problem Correlation 0.80 x = Ax÷ has a solution on the class of graphs that has total support.

Moreover, if the graph is fully inde- 0.70 0.0 0.2 0.4 0.6 0.8 1.0 composable, then the solution is also unique. Damping Perturbation (regaining the solu- tion). What about the power problem on graphs whose adjacency matrix in the network. We can thus play with original power; power with diagonal lacks total support? For such graphs, the diagonal of the adjacency matrix to perturbation is closer to original power the power problem has no solu- assign nodes with potentially different than power with full perturbation; and tion. Nevertheless, a solution can be entry levels of exogenous power. the larger the damping parameter, the regained by perturbing the adjacency Intuitively, the diagonal perturbation lower the adherence of perturbed solu- matrix of the graph in a suitable way. is less invasive than its full counter- tions to the original one. We investigate two perturbations on part; the former modifies the diago- Computing power. Due to the the adjacency matrix A nal elements only, and the latter established relationship between touches all matrix elements. But the balancing problem and the power D 1. Diagonal perturbation: Aα = A + αI, how invasive is the perturbation with problem, we can use known meth- where α > 0 is a damping parame- respect to the resulting power? To ods for the former in order to solve ter and I is the identity matrix. investigate this issue, we computed the latter. The simplest approach for F 2. Full perturbation: Aα = A + αE, the correlation between original and solving Equation (4) is to set up the where α > 0 is a damping param- perturbed power solutions. A simple iterative method eter and E is a full matrix of all 1s. and intuitive measure of the correla- ÷ tion between two rankings of size n is xk+1 = Axk , (5) F Matrix Aα is clearly fully indecompos- Kendall rank correlation coefficient able, has total support, and is irreduc- k, which is the difference between known as the Sinkhorn–Knopp method 17 ible. Hence, the power problem (as well the fraction of concordant pairs c (the (SKM). If we set x0 = e, the vector of all as the centrality problem) on a fully number of concordant pairs divided 1s, then the first iteration x1 = Ae; that perturbed matrix has a unique solu- by n(n − 1)/2) and that of discordant is, x1(i) = ∑j Ai, j is the degree di of i. The D ÷ tion. On the other hand, matrix Aα has pairs d in the two rankings: k = c − d. second iteration x2 = A(Ae) ; that is, total support. Indeed, if Ai, j > 0 and i = The coefficient runs from −1 to 1, with x2(i) = ∑j Ai, j/ di is the sum of reciprocals j, then the main diagonal Ak,k for 1 ≤ k negative values indicating negative cor- of the degrees of the neighbors of i.

≤ n is positive and contains Ai, j. If i ≠ j, relation, positive values indicating pos- If A has total support, then the SKM then the diagonal Ai, j, Aj,i, Ak,k for 1 ≤ k itive correlation, and values close to 0 converges, or, more precisely, the ≤ n and k ≠ i, j is positive and contains indicating independence. We used the even and odd iterates of the method

Ai, j. The power problem on a diagonally following network datasets: a social converge to power vectors that dif- perturbed matrix thus has a solution. network among dolphins, the Madrid fer by a multiplicative constant. The Moreover, the solution is unique if A train bombing terrorist network, a convergence is linear with a rate of is irreducible, since it is known that social network of jazz musicians, a net- convergence that depends of the sub- for a symmetric matrix A it holds that work of friendships between members dominant eigenvalue of the balanced A is irreducible if and only if A + I is of a karate club, a collaboration net- matrix S = DAD (see Theorem 2).17 In fully indecomposable.6 Interestingly, work of scholars in the field of network some cases, however, the convergence the diagonal perturbation, besides science, and a co-appearance network can be very slow. Knight and Ruiz14 providing convergence of the method, of characters in the novel Anna Karenina proposed a faster algorithm based on is useful for incorporating exogenous by Lev Tolstoj. Newton’s method (NM) that we now power in the model. By setting a posi- The main outcomes of the current describe according to our setting and tive value in the ith position of the experiment are as follows (see Figure 3): notations. In order to solve Equation diagonal, we are saying that node i as soon as the damping parameter is (4), we apply NM for finding the zeros of has a minimal amount of power, or not small, both diagonal and full perturba- the function f: Rn → Rn defined by f(x) = a function of the position of the node tions do not significantly change the x − Ax÷. It is not difficult to check that

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 79 contributed articles

Table 1. Complexity of computation of power with different methods: PM (benchmark); SKM and PM; and NM with diagonal per- (SKM without perturbations for totally supported networks); SKM-D (SKM with diagonal turbation is even faster than NM, and perturbation and damping 0.15); SKM-F (SKM with full perturbation and damping 0.01); NM the larger the damping parameter, the (NM without perturbations for totally supported networks); and NM-D (NM with diagonal faster the method. perturbation and damping 0.15). Relationship with alternative power measures. Bonacich2 proposed a fam- Network PM SKM SKM-D SKM-F NM NM-D ily of parametric measures depend-

Dolphin 73 294 300 72 47 30 ing on two parameters: α and β. If A is Madrid 28 416 320 78 46 27 the adjacency matrix of the graph, the Jazz 42 300 288 78 37 27 Bonacich index x is defined as Karate 42 – 494 52 – 31 Collab 65 – 9740 30 – 33 Karenina 24 – 1006 32 – 32 x = αAe + βAx. (6)

The index for a node is the sum of two components: a first one (weighted Table 2. Correlation of power, as defined in this article, with degree (D), centrality (C), by the parameter α) depends on the Bonacich power (B), Shapley power (S), and Nash power (N). node’s degree, and a second one (weighted by the parameter β ) depends Network D C B S N on the index on the node’s neighbors. Dolphin 0.81 0.35 0.89 0.91 0.72 From Equation (6), under the condition Madrid 0.62 0.33 0.69 0.68 0.48 that I − βA is not singular, it is possible Jazz 0.85 0.62 0.91 0.85 0.17 to obtain the following explicit repre- Karate 0.77 0.36 0.74 0.96 0.75 sentation of the proposed measure Collaboration 0.77 0.05 0.77 0.85 0.60 Karenina 0.75 0.45 0.62 0.89 0.86 (7) For the computation of power, we used diagonal perturbation (damping 0.15). For Bonacich power we used α = 1 and β = −0.85/r, where r is the spectral radius of the graph. The equivalence with the infinite sum holds when |β| < 1/r, where r =

We experimentally assessed the maxi|λi|, with λi the eigenvalues of complexity of computation of power A; that is, r is the spectral radius of on the real social networks; in fact, we A. When the parameter β is positive,

where δi, j = 1 if i = j and δi, j = 0 other- used the largest biconnected compo- the index is a centrality measure. In wise. We collect these partial deriva- nent for the first three networks in or- particular, the measure approaches tives in the Jacobian matrix of f that der to also work with totally supported eigenvector centrality as a limit as β turns to be graphs. We use both SKM and NM. We approaches 1/r. On the other hand, consider the computation on the orig- when β is negative, the index is a 2 ÷ Jf (x) = I + AD (x ) , inal matrix, as well as on the perturbed power measure; it corresponds to a ones. We use as a benchmark the com- weighted sum of odd-length paths where the squaring of x is to be intended plexity of the computation of central- (with positive sign) and even-length entrywise. Formally, the NM applied to ity using the power method (PM). The paths (with negative sign).2 Hence, the equation f (x) = 0 becomes complexity is expressed as the overall powerful nodes correspond to nodes number of matrix-vector product op- with many powerless neighbors. erations. If a matrix is sparse (the case Finally, when β = 0, the measure boils for all tested networks), such opera- down to degree centrality. tion has linear complexity in the num- The difficulty with this measure ber of nodes of the graph. The main is that it is parametric; that is, it empirical findings are summarized depends on parameters α and β. as follows (see Table 1): SKM on the While it is simple to set the param- To apply NM precisely it is neces- original matrix is significantly slow- eter α, and it can be used to assign sary to solve a linear system at each er than PM, and diagonal perturba- exogenous power to nodes, the choice step, but this would be too expen- tion does not significantly change its for the parameter β is more delicate. sive. Nevertheless, an approximate speed; full perturbation significantly In particular, the index makes sense solution of the system obtained by increases the speed of SKM, so the when the parameter |β| < 1/r, hence means of an iterative method is suf- complexity of SKM with full perturba- the spectral radius r must be com- ficient, giving rise to an inner–outer tion and that of PM are comparable puted or at least approximated. iteration. This approach is appealing (moreover, the larger the damping pa- The precise relationship between when the matrix that has to be bal- rameter, the faster the method); NM Bonacich power (Bonacich index anced is symmetric and sparse, which on the original matrix is much faster with negative β) and power defined is the case for the power problem on than SKM: its complexity is compa- in Equation (4) is explained as fol- 14 real networks. rable to that of fully perturbed SKM lows: If we set x0 = (1/γ)e in Newton’s

80 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles iteration for the computation of power to networks of actors was proposed described earlier we obtain in Cook et al.8 and Rochford16 and fur- ther investigated, particularly in Bayati 2 -1 1 13 x1 = 2γ (I + γ A) Ae. et al. and Kleinberg et al. In the fol- lowing, we describe the dynamics that But this first approximation is a member The power of capture such an extension. Let A be of the family of Bonacich’s measures, an actor somewhat the adjacency matrix of an undirected, with α = 2γ and β = −γ2. Since β is nega- unweighted graph G. Hence, A = 1 if inversely depends i,j tive, we indeed are facing a measure of there is an edge (i, j) in G and Ai,j = 0 power. Hence, Bonacich power can be on the power otherwise. Negotiation among actors considered as a first-order approxima- is possible only along edges; each pair tion of power using NM. of its neighbors. of actors on an edge negotiates for a Bozzo et al.3 investigated power mea- fixed amount of €1, and each actor may sures on sets of nodes. Given a node set conclude a negotiation with at most T let B(T) be the set of nodes whose one neighbor (one-exchange rule). For neighbors all belong to T. Notice that every edge (i, j), define nodes in B(T) do not have connections outside T, hence are potentially at the •• Ri,j as the amount of “revenue” mercy of nodes in T. We define a power actor i receives in a negotiation function p such that p(T) = |B(T)| − |T|. with j.

Hence, a set T is powerful if it has poten- •• Li, j as the amount of revenue actor i tial control over a much larger set of receives in the best alternative neighbors B(T). The power measure is negotiation, excluding the one with interpreted as the characteristic func- j. tion of a coalition game played on the graph and the “Shapley value” of the Notice that matrices R and L have game; or the average marginal contri- the same zero-non-zero pattern as A. bution to power carried by a node when More precisely, consider the following (0) it is added to any node set is proposed iterative process. We start with Ri, j = 1/2 (0) as a measure of power for single nodes. for all edges (i, j) and Ri, j = 0 elsewhere. Interestingly, the discovered game-theo- Let N(i) be the set of neighbors of node retic power measure corresponds to the i. For t > 0, the best alternative matrix second iteration of SKM for the compu- L(t) at time t is tation of power as defined by Equation (4); that is, to the sum of reciprocals of neighbors’ degrees. The study of power has a long his- (t) (t) (t) tory in economics (in its recognition Let the surplus Si, j = 1 - Li, j - Lj,i be of bargaining power) and sociol- the amount for which actors i and j will ogy (in its interpretation of social negotiate at time t; notice that actor power).10 Consider the most basic i will never accept an offer from j less (t) case where just two actors, A and B, than his alternate option Li, j , and actor are involved in a negotiation over j will never accept an offer from i less (t) how to divide one unit of money. than her alternate option Lj, i. The profit Each actor has an alternate option— matrix R(t) at time t is then a backup amount it can collect in case negotiations fail, say, α for A and β for B. A natural prediction, known as “Nash’s bargaining solu- tion,”15 is that the two actors will split the surplus s = 1 − α − β, if any, (t) (t) (t) equally between them; that is, if s < Notice that Ri, j + Rj, i = 1; that is, Ri, j (t) 0 no agreement between A and B is and Rj, i is the Nash’s bargaining solu- possible, since any division is worse tion of a negotiation between actors i and (t) than the backup option for at least j, given their alternate options Li, j and (t) one of the parties. On the other hand, Lj, i. Let R be the fixpoint of the itera- if s >= 0, then A and B will agree on tive process R(t) for growing time t. The

α+s/2 for A and β+s/2 for B. “Nash power” xi of node i is the best A natural extension of the Nash bar- revenue of actor i among its neighbors; gaining solution from pairs of actors that is

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 81 contributed articles

Table 3. Matrix of correlations among power and centrality measures. for an actor i depends directly on the rev- enues of i among its neighbors, which directly depend on the alternate options S B P N C of i among its neighbors, which inversely S 1.00 0.82 0.90 0.69 0.41 depends on the revenues of neighbors of B 0.82 1.00 0.84 0.61 0.46 i, which determine the power of neigh- P 0.90 0.84 1.00 0.72 0.47 bors of i. Hence, power of an actor some- N 0.69 0.61 0.72 1.00 0.36 what inversely depends on the power of C 0.41 0.46 0.47 0.36 1.00 its neighbors. S, Shapley power; B, Bonacich power; P, power as defined in this article; N, Nash power; C, centrality. Using Kendall correlation, we assessed the overlapping of power, as defined in this article, with central- ity and degree, as well as Bonacich power Table 4. The top 10 powerful and central countries in the European natural gas exchange (Bonacich index with negative param- network. eter β), Shapley power (the sum of reciprocals of neighbors’ degrees), and Nash power on the social networks P TR DE IT ES HU RU BG BE AT UK mentioned earlier. The main empirical 6.26 6.09 5.54 5.50 4.62 4.53 3.99 3.60 3.29 3.09 outcomes are summarized in the fol- B DE IT HU TR AT RU ES BE NO BG lowing (see Table 2): as expected, both 7.07 4.58 4.06 3.73 3.41 3.37 3.29 3.23 2.92 2.76 power and centrality are positively S TR ES IT DE RU HU BG RO UK AT correlated with degree, but power is negatively correlated with centrality 2.92 2.70 2.56 2.54 2.46 2.23 1.95 1.67 1.53 1.51 when the effect of degree is excluded N ES TR BG RU IT HU UK RO DK LV (we used partial correlation); power 1.00 1.00 1.00 0.87 0.83 0.83 0.75 0.75 0.75 0.75 is positively correlated with Bonacich C DE NO BE NL FR AT DK CH CZ UK power, and the association increases 1.00 0.71 0.68 0.62 0.56 0.52 0.46 0.45 0.40 0.39 as the parameter β declines below 0 down to −1/r, with r the spectral P, power as defined in this article; B, Bonacich power; S, Shapley power; N, Nash power; C, centrality. radius of the adjacency graph matrix (moreover, the association is greater when the adjacency matrix is per- Figure 4. Scatterplot of power versus centrality. Vertical and horizontal lines correspond to turbed); power is positively corre- third quartile. lated with Shapley power, and the association is generally stronger

.0 DE than with Bonacich power; and power is positively correlated with Nash bargaining network power, but the 81

0. strength of the correlation is gener- ally weaker than with Shapley power NO BE and Bonacich power. In particular, we NL 0.6 FR noticed that the Nash-based method AT maps the power scores of the nodes CH DK

Centrality of the surveyed networks into a small CZ UK 0.4 set of values, with very high frequency LU PL for values close to 0, 0.5, and 1. Hence, RU IT SK HU it is difficult to discriminate different 0.2 SIUA gradations of power for nodes. HR ES SE RO IEEEBYTN LT LV TR LYRSFI DZ BG MAGEMKPTIRMDGR Motivating Example Reloaded 0.0 Here, we revisit examples from the 123456 “Motivating Example” section, using Power them as a benchmark to compare the different notions of power described in the “Relationship with Alternative

xi = maxj Ri, j. Nash power bears some analogy with Power Measures” section. When the the one we propose and investigate graphs are not totally supported (all Among many other attractive results, here; in particular, both notions share cases but the two-node path), we used Bayati et al.1 showed that the dynamics the same recursive powerful-is-linked- diagonal perturbation with damping always converge to a fixpoint solution. with-powerless philosophy. Nash power 0.15 to obtain a solution. Moreover,

82 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 contributed articles we set Bonacich index parameters but not powerful; and there are many parametric; and it is global (the power α = 1 and β = −0.85/r, where r is the countries that are neither powerful of a node depends on the entire net- spectra radius of the graph. nor central outside the rankings. For work) and can be approximated with In the two-node path, all methods instance, Italy contracts with nations a simple local measure—the sum of agree to give identical power to both that are both powerless and periph- reciprocals of node degrees—that has nodes. In the three-node path A−B−C, eral, namely Austria, Switzerland, a game-theoretic interpretation and all methods agree B is the powerful Croatia, Tunisia, Libya, and Slovenia, can be efficiently computed on all net- one. Notably, Nash power assigns all with only Austria included in the top- works. The definition has limitations power (1) to B and no power (0) to A 10 power list and only Austria and as well, mainly that an exact solution and C, while the other methods say A Switzerland included in the top-10 exists only on the class of totally sup- and C hold a small amount of power. centrality list (not in the first posi- ported networks and is not immedi- In the four-node path A−B−C−D, all tions). The ranking according to ately normalizable, so care is needed methods claim B and C are the pow- Nash power is somewhat unusual when comparing power values for erful ones. Moreover, all methods if compared with the other power nodes in different networks. recognize that the power of B in this measures; for instance, Germany instance is less than its power in the has bargaining power 0.5 and only References th 1. Bayati, M., Borgs, C., Chayes, J., Kanoria, Y., and three-node path. Finally, in the five- in 14 postion, tied with the other Montanari, A. Bargaining dynamics in exchange node path A−B−C−D−E, all methods countries. It is fair to note that the networks. J. Econ. Theory 156 (2015), 417–454. 2. Bonacich, P. Power and centrality: A family of discriminate B and D as the most generalized Nash bargaining solu- measures. Am. J. Sociol. 92, 5 (1987), 1170–1182. powerful nodes, followed by C and tion was originally proposed in 3. Bozzo, E., Franceschet, M., and Rinaldi, F. Vulnerability and power on networks. Network Sci. 3, 2 (2015), finally A and E, with the only excep- the context of assignment problems 196–226. 4. Brown, J.B., Chase, P.J., and Pittenger, A.O. Order tion of Nash power, which assigns all (such as in matching apartments to independence and factor convergence in iterative power (1) to B and D and null power tenants and students to colleges) scaling. Linear Algebra Appl. 190 (1993), 1–38. 5. Brualdi, R.A. Matrices of 0s and 1s with total support. (0) to all other nodes; hence, the cen- and was not suggested as a rating- J. Combin. Theory Ser. A 28, 3 (1980), 249–256. tral node C has the same power as the and-ranking method for nodes in a 6. Brualdi, R.A. and Ryser, H.J. Combinatorial Matrix Theory, Vol. 39 of Encyclopedia of Mathematics peripheral nodes A and E, according network. For instance, in balanced and Its Applications. Cambridge University Press, to this method. All methods, with the matching over the gas network, Italy Cambridge, U.K., 1991. 7. Cook, K.S., Emerson, R.M., Gillmore, M.R., and exception of Shapley, notice that the preferably negotiates with Libya and Yamagishi, T. The distribution of power in exchange power of B is greater in the five-node Turkey with Georgia. In fact, Cook networks: Theory and experimental results. Am. J. 8 Sociol. 89, 2 (1983), 275–305. path with respect to the three-node and Yamagishi proposed using the 8. Cook, K.S. and Yamagishi, T. Power in exchange path. This is because Shapley is a negotiation values obtained by each networks: A power-dependence formulation. Social Networks 14 (1992), 245–265. local method, while the others are node in such a solution as a struc- 9. Csima, J. and Datta, B.N. The DAD theorem for global (recursive) methods. tural power measure; see also Easley symmetric nonnegative matrices. J. Combin. Theory 12, 1 (1972), 147–152. 10 Let us now revisit the natural gas and Kleinberg (chapter 12) for a 10. Easley, D. and Kleinberg, J. Networks, Crowds, and pipeline example. We ranked all similar interpretation. According Markets: Reasoning About a Highly Connected World. Cambridge University Press, New York, 2010. countries according to the follow- to the experiments we conducted 11. Emerson, R.M. Power-dependence relations. Am. for this article, this interpretation Sociol. Rev. 27, 1 (1962), 31–41. ing power and centrality measures: 12. Franceschet, M. PageRank: Standing on the shoulders Shapley power (S), Bonacich power might seem opinable, but further of giants. Commun. ACM 54, 6 (2011), 92–101. 13. Kleinberg, J. and Tardos, E. Balanced outcomes in (B), power as defined in this article investigation is necessary to gain a social exchange networks. In Proceedings of the 40th (P), Nash power (N), and eigenvec- solid conclusion. Annual ACM Symposium on Theory of Computing (2008), 295–304. tor centrality (C). Table 3 shows the 14. Knight, P.A. and Ruiz, D. A fast algorithm for matrix corresponding Kendall correlation Conclusion balancing. IMA J. Numer. Anal. 33, 3 (2013), 1029–1047. matrix. As expected, P is well corre- We proposed a theory on power in the 15. Nash, J. The bargaining problem. Econometrica 18 lated with its approximations B and S. context of networks. The philosophy (1950), 155–162. 16. Rochford, S.C. Symmetrically pairwise-bargained Moreover, P is positively correlated underlying our notion of power main- allocations in an assignment market. J. Econ. Theory with N, but the correlation strength is tains that an actor is powerful if it is 34, 2 (1984), 262–281. 17. Sinkhorn, R. and Knopp, P. Concerning nonnegative weaker. Also, the association between connected with many powerless actors. matrices and doubly stochastic matrices. Pac. J. P and C is positive but weak and This thesis has its roots and applica- Math. 21 (1967), 343–348. mostly explained by the association tions mainly in sociology and econom- with degree of both measures. Indeed, ics and traces an historical parallel Enrico Bozzo ([email protected]) teaches numerical algorithms in the Department of Mathematics, if we exclude the effect of degree, this with its celebrated linear counterpart, Computer Science, and Physics at the University of correlation is negative. namely eigenvector centrality.12 Udine, Udine, Italy. These associations are mirrored in The virtues of our definition of Massimo Franceschet (massimo.franceschet@uniud. it) teaches network science and generative art in the the top-10 rankings and ratings listed power are: it is a simple, elegant, and Department of Mathematics, Computer Science, and in Table 4, as well as in the scatter- understandable measure; it is theo- Physics at the University of Udine, Udine, Italy. plot comparing power and centrality retically well-grounded and directly in Figure 4. Notice how Germany (DE) related to the well-studied balanc- is both powerful and central; Italy ing problem, making it possible to (IT) and Turkey (TR) are powerful but borrow results and techniques from not central; Norway (NO) is central this setting; the formulation is not © 2016 ACM 0001-0782/16/11 $15.00 .

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 83 review articles

DOI:10.1145/2934662 of Charles Darwin’s The Origin of Spe- Looking at the mysteries of evolution cies in 1859.8 But of course nothing is that simple: during the first half of the from a computer science point of view yields 19th century, several scientists were con- some unexpected insights. vinced the diversity of life we see around us must be the result of evolution (see BY ADI LIVNAT AND CHRISTOS PAPADIMITRIOU the sidebar on Charles Babbage and the accompanying figure). Darwin’s immense contribution lies in three things: His identification of natural se- lection as the engine of evolution; his Sex as an articulation of the common descent hypothesis, stating that different spe- cies came from common ancestors, and further implying that all life came from Algorithm a common source; and the unparalleled force of argument with which he em- powered his theory. But of course The Origin was far from the last word on the The Theory subject: Darwin knew nothing about ge- netics, and had no clue about the role of sex in evolution, among several impor- of Evolution tant gaps. On the ultimate reason for sex, for instance, he wrote, “the whole subject is as yet hidden in darkness.” Mathematics has informed the the- Under the Lens ory early on. Mendel discovered genet- ics by appreciating mathematical pat- terns in the ratios of sibling pea plants of Computation exhibiting different characteristics and model building. When his laws were rediscovered 40 years later, their discrete nature was misunderstood as being at loggerheads with Darwin’s key insights

LOOK AROUND YOU, and you will be stunned by the ˽˽ Sexual reproduction is nearly ubiquitous in nature. Yet, despite a century of intense work of evolution. According to Nobel Laureate research, its evolutionary role and origin Jacques Monod, a strange thing about evolution is that is still a mystery. ˽˽ Recent research at the interface of all educated persons think they understand it fairly evolution and CS has revealed that well, and yet very few—if any, one may grumble— evolution under sex possesses a surprising and multifaceted computational actually do. Understanding evolution is essential: nature: It can be seen as a coordination game between genes played according “Nothing in biology makes sense except in the light to the powerful Multiplicative Weights th Update Algorithm; or as a randomized of evolution,” famously said the eminent 20 century algorithm for deciding whether genetic variants perform well across all possible biologist Theodosius Dobzhansky. And evolution is genetic combinations; it allows mutation closer to home than black holes and other mysteries of to process and transmit information from transient genetic combinations to future science—it feels almost like your family history. generations; and much more. More so than most scientific fields, the theory of ˽˽ Computational models and considerations are becoming an indispensable tool for

evolution has a sharp beginning: the publication unlocking the secrets of evolution. ARMIN STAUDT BY MICROSCOPE PHOTO

84 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles

continuous perception of traits. A deep As we recount in this article, recent scientific crisis raged for two decades, joint research by computer scientists and was eventually resolved with the and biologists, bringing ideas and help of mathematics: discrete alleles concepts from computation into biol- (see the glossary) can result, through ogy, has made quite unexpected prog- cumulative contributions, in con- Not only is sex ress in these questions. Additional tinuous phenotypic traits. During the essentially background and literature in the on- 1920s and the next two decades, R.A. line appendix is available in the ACM Fisher, J.B.S. Haldane, and S. Wright universal, but it Digital Library (dl.acm.org) under developed mathematical equations seems to be very Source Material. for predicting how the genetic compo- sition of a population changes over the much center stage Evolution and Computer Science generations. This mathematical theo- Over the past 70 years, computer sci- ry of population genetics is introduced in life, the basis of entists, starting with von Neumann,37 briefly in the sidebar “The Equation a fantastic variety have been inspired and intrigued by that Reconciled Darwin and Mendel.” evolution. During the 1950s, computer It is key to what is called the “modern of behavior and scientists working in optimization de- evolutionary synthesis”—the 20th-cen- structure. veloped local search heuristics: Start tury view of evolution, because it pro- with a random solution and repeat the posed one way of unifying Darwinism following step: If there is a “mutation” and Mendelism. of this solution that is better than the During the near-century since then, current one, change to that, until a the study of evolution has flourished local optimum is discovered. By “mu- into a mature, comprehensive, and tation” (much more often the word prestigious scientific discipline, while “neighbor” is used), we mean a solu- over the past two decades it has been tion differing from the present one inundated by a deluge of molecular in a very small number of features; in data, a vast scientific gold mine that the traveling salesman problem, for informs—and often challenges—its te- example, a mutation could change nets. And yet, despite the towering ac- two, or three, edges of the tour to form complishments of modern evolution- a new tour. This process is repeated ary biology, there are several important many times from random starts, a questions that are beyond our current stratagem that can be seen as a se- understanding: quential way of maintaining a popula- ˲˲ What is the role of sex in evolution? tion (see Papadimitriou and Steiglitz32 Reproduction with recombination is for a survey on the 1980s). almost ubiquitous in life (even bacteria This basic idea of local search exchange genetic material), while obli- was enhanced in the 1980s by a ther- gate asexual species appear to be rare modynamic metaphor,20 to help the evolutionary dead ends. And yet there algorithm escape local optima and is no agreement among the experts as barriers: Simulated annealing, as this to what makes sex so advantageous. variant of local search is called, allows ˲˲ What exactly is evolution optimiz- the adoption of even a disadvanta- ing, if anything? With all the evolution- geous mutation, albeit with a prob- inspired optimization heuristics coded ability decreasing with the disadvan- by computer scientists (as we discuss tage, and with time. A further variant next), this question—also very much called go with the winners1 is closer to contemplated by biologists over the evolution in that it keeps a population decades—comes to the fore. of solutions, teleporting the individu- ˲˲ The paradox of variation. Genetic als that are stuck at local optima to the variation in humans, and in many more promising spots. Notice that all other species, is much higher than the these heuristics are inspired by asexual theory predicted due to selection; and evolution (no recombination between assuming the variation is neutral has solutions happens); heuristics of this its problems too. genre have been used successfully in ˲˲ Are mutations completely ran- many realms of practice, and there dom? We know they are far from uni- are several practically important hard formly distributed in the genome, but problems, such as graph partitioning, can they be the results of elaborate ge- for which such heuristics are competi- netic mechanisms? tive with the best known.

86 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles

During the 1970s, a different family of heuristics called genetic algorithms was proposed by John Holland:14 A pop- ulation of solutions evolves through mu- Charles Babbage tations and recombination. Recombina- tion is much more difficult to apply to on Evolution optimization problems, because it pre- Charles Babbage, the grandfather of computer science, was undoubtedly the first to think of evolution in algorithmic terms. In 1837—22 years before the publication of supposes a genetic code, mapping the The Origin of Species and the same year that Darwin jotted down a sketch of the tree features of the problem to recombinable of life in his notebook (see the figure below)—Babbage wrote the Ninth Bridgewater genes (to see the difficulty, think of the Treatise.4 The treatise was ninth in a series of philosophical essays on the subject traveling salesman problem: tours can of the deity, in which he refers extensively to his Difference Engine and how it can be programmed to go on for days calculating unimaginably large integers, with no mutate, but they cannot recombine). further intervention by the programmer. Babbage used this image to argue that a The evolutionary fitness of each indi- benevolent creator can design not necessarily species, but an algorithm (he did not vidual solution in the population (that use the word) for creating species. is, the number of children it will have) is proportional to the quantity being maximized in the underlying problem. Portraits of Charles Babbage (left) and Charles Darwin (right). In the middle is Darwin’s sketch showing the principle of common descent leading to a branching pattern. After many generations, the population will presumably include some excellent solutions. Holland’s idea had instant appeal and immense following, and by now there is a vast bibliography on the subject (see, for example, Mitchell28 and Goldberg12). The terms evolutionary algorithms and evolutionary computa- tion are often used as rough synonyms of “genetic algorithms,” but often they describe more general concepts, such as the very interesting algorithmic work — also categorized as research in artificial life—whose purpose is not to find good solutions to a practical optimization problem, but instead to understand evo- lution in nature by exposing novel, com- plex evolutionary phenomena in silico, for an example, see Jong.16 Glossary Coming back to the genetic algo- Allele: Variant of a gene. Some are known to affect our susceptibility to diseases. rithms, several successes in solving ac- tual optimization problems have been Diploidy: The state of having two instances of each chromosome, and thus each gene, in a cell. Organisms with one instance of each chromosome and gene are called haploid. reported in the literature, but the gen- eral impression seems to prevail that Fitness: The ability of an individual to survive and reproduce in its natural environment, genetic algorithms are far less success- manifested in its expected number of surviving offspring. ful in practice than local search and Gene: A unit of heredity and a region of the DNA that encodes a functional product. simulating annealing. If this is true, it It is thought that humans have more than 20,000 of these. However, now that coding is quite remarkable—a great scientific is known to be far more complex than originally thought, it is no longer clear how to define these units and their boundaries. mystery—because in nature the exact opposite happens: Recombination is Genotype: The whole or a part of the genetic make-up of an individual or that is successful and ubiquitous, while obli- common to a group of individuals. gate asexual reproduction is extremely Heterozygous: An individual having two different alleles of a certain gene. rare and struggling. Mutation: A change in the hereditary material. The authors started collaborating on a computational understanding of evo- Phenotype: The characteristics of an individual other than its genetic code. lution in 2006, precisely in order to in- Recombination: In sexual reproduction, taken broadly, this term means that the genetic vestigate this mystery, and we recount material of an offspring is a bricolage of the genetic material of its parents, due to both our findings in this article. At exactly the independent assortment of chromosomes during the halving of chromosome numbers and the crossing over of chromosomes—the shuffling of segments between the same year, Leslie Valiant first wrote two corresponding chromosomes (in humans, typically 2–3 per chromosome)—that on his theory of evolvability, another occur in the generation of gametes. More generally, DNA recombination in the sense of attempt at understanding evolution exchange of genetic material between two DNA molecules or between segments of the in computational terms.36 Valiant sees same DNA molecule is an important part of mutational mechanisms. evolution as an approximation of an

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 87 review articles

The Equation that Reconciled Darwin and Mendel

t t+1 After Fisher showed how a continuous trait can result from adding The formula for going from the Pij s to the Pij s is the following: up the contributions of multiple discrete Mendelian factors, he, Wright, and Haldane established the classic population genetic , framework for studying how a population of genotypes evolves from one generation to the next. A central idea is fitness, a where is the sum of the numerators for all ijs. Of course, this mathematical articulation of Darwin’s conception of evolutionary equation is based on certain crucial assumptions: It is assumed advantage, a useful summary of an organism’s phenotype. that generations do not overlap (as if all members of one Formally, this fitness of a genotype can be thought of as the generation procreate simultaneously just before death), that the number of surviving offspring an organism with this genotype will population is so large that it can be thought of as being infinite, have, in expectation. For simplicity, imagine a haploid organism that mating is completely random and without separate sexes, (one copy of each gene) in which fitness depends only on two and that expectations can be added and divided with impunity. genes, and these genes come in three alleles each. The fitness However, for the sake of simplicity, these assumptions have been function of this species is perhaps captured by this 3 × 3 table and used very widely and successfully in evolutionary theory for many denoted wi j : i = 1; 2; 3; j = 1; 2; 3 : decades. So far we have been assuming sexual reproduction. If this species were asexual, the next generation, the 10th and 100th hence j = 1 j = 2 j = 3 would be like this: i = 1 1.05 1.00 0.96 j = 1 j = 2 j = 3 i = 2 1.01 1.03 1.02 i = 1 0.1164 0.1109 0.1064 i = 3 0.94 1.02 0.99 i = 2 0.1120 0.1142 0.1131 Suppose that at some point in time, in the tth generation of this t i = 3 0.1042 0.1131 0.1098 species, the nine genotypes have these frequencies, denoted Pij: next generation j = 1 j = 2 j = 3

i = 1 0.1111 0.1111 0.1111 j = 1 j = 2 j = 3 i = 2 0.1111 0.1111 0.1111 i = 1 0.1693 0.1039 0.0691 i = 3 0.1111 0.1111 0.1111 i = 2 0.1148 0.1397 0.1267 i = 3 0.0560 0.1267 0.0940 Assume reproduction by recombination (sex); this means that a child inherits each allele from either of its parents; assume that it in 10 generations inherits it from either with equal probability, independently for the two genes (this holds, for example, if the two genes reside in different chromosomes). Then the genotype frequencies in the j = 1 j = 2 j = 3 next generation, Pt+1 , to the fourth decimal place, will be as shown ij i = 1 0.7767 0.0059 0.0001 in this table: i = 2 0.0160 0.1135 0.0428 j = 1 j = 2 j = 3 i = 3 0.0000 0.0428 0.0022 i = 1 0.1164 0.1109 0.1064 in 100 generations i = 2 0.1120 0.1142 0.1131 i = 3 0.1042 0.1131 0.1098 and the equation is much simpler: while the frequencies in 10 and 100 generations will be as follows:

j = 1 j = 2 j = 3 with playing the same role. Naturally, much of evolution i = 1 0.1210 0.1226 0.0919 involves events not entering these equations at all: mutations i = 2 0.1263 0.1469 0.1171 creating new alleles, speciation events (the creation of a new branch in the tree of life with its own equations), and changes i = 3 0.0825 0.1085 0.0831 in the environment. These equations should be thought of as in 10 generations a mathematical description of a piece of the puzzle, still studied after nearly a century of research. j = 1 j = 2 j = 3 i = 1 0.0848 0.1409 0.0163 i = 2 0.2320 0.4424 0.0542 i = 3 0.0086 0.0185 0.0022 in 100 generations

88 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles ideal fitness function by a polynomially to sea, and concluding that he wants to standard equations used to describe large population of genotypes in poly- bring food to his family’s table. (Inci- how genotype frequencies change nomially many generations35 through dentally, the designers of genetic algo- over generations (see the Darwin and learning by mutations. Notably, there rithms are well aware of this downside Mendel sidebar), we demonstrated is no recombination (sex) in his theory, of sex, and often allow the most suc- an important difference between sex even though it can be added for a mod- cessful individuals into the next gen- and asex: In asexual evolution, the est advantage.17 Several natural classes eration, a stratagem known as “elit- best combination of alleles always pre- of functions are evolvable in this sense; ism,” which however cannot be easily vails. In the presence of sex, however, in fact, functions susceptible to a lim- imitated by nature.) natural selection favors “mixable” al- ited weak form of learning called statis- Evolutionary theorists have labored leles, those alleles that, even though tical learning.18 In the section “A Game for about a century to find other explana- they may not participate in any truly between Genes,” we discuss another in- tions for the role of sex in evolution, but great genetic combinations, perform teresting connection between machine all 20th century explanations are valid adequately across a wide variety of dif- learning algorithms and evolution. only under specific conditions, contra- ferent combinations.23–26 To put it dif- dicting the prevalence of sex in nature.9,a ferently, in the hypothetical three-by- The Role of Sex This is not a small problem. Imag- three fitness landscape in the sidebar, Sex is nearly universal in life: it occurs ine, for example, that even though the winner of asexual evolution will be in animals and plants by the coming much of the terrestrial world is green, the largest entry of the fitness matrix together of sperm and egg, in fungi by we had no clue why leaves exist. That (in this case, 1.05). In contrast, sexual the fusion of hyphae, and even in bacte- would have been a pretty big gap in evolution will favor, roughly speaking, ria:34 Two bacterial cells can pair up, for our understanding of nature. Not those alleles (rows and columns) with example, and build a bridge between knowing the role of sex is an even big- larger average value; where “average” them through which genes are trans- ger gap, because far more life forms takes into account the prevalence of ferred. Many species engage in asexual exchange genes than photosynthe- these genotypes in the population, as reproduction, or in selfing, at some size. It is no wonder the role of sex has we will explore.b times, but also engage in sexual repro- been called “the queen of problems” in duction at other times, keeping their evolutionary biology.6 Sex as a Randomized Algorithm genotypes well shuffled. In contrast, Since sex breaks down genetic com- One of the most central and striking species that do not exchange genes in binations, it has been mainly thought themes of algorithms research in the any form or manner, called “obligate in evolution that effective selection past few decades has been the surpris- asexuals,” are extremely rare, inhabit- acts on individual alleles,38 that is, each ing power of randomization.29 Para- ing sparse, recent twigs of the tree of (non-neutral) allele is either beneficial doxically, evoking chance is often the life, coming from sexual ancestral spe- or detrimental on its own. According safest and most purposeful and effec- cies that lost their sexuality, and head- to this line of thinking, two main forc- tive way of solving a computational ing toward eventual extinction without es drive allele frequencies: selection problem. For one, it helps avoid the producing daughter species.22 acting on alleles as independent ac- worst case, as in Quicksort. Second, Not only is sex essentially univer- tors (where alleles are often assumed sampling from a distribution D helps sal, but it seems to be very much cen- to be making additive contributions decide between competing hypoth- ter stage in life, the basis of a fantas- to fitness), and random genetic drift eses about D: Randomized algorithms tic variety of behavior and structure: (chance sampling effects on allele fre- for software testing, as well as for test- from bacterial conjugation to the in- quency, as discussed previously). The ing primality, or validity of polynomial tense molecular machinery of meio- interaction between alleles, within identities, are like this. sis (cell division producing gametes), and between loci—even though it has Evolution under sex can be seen from flower coloration to bird courtship been of interest in population genet- as an instance of a randomized algo- dances, from stag fights all the way to ics from the start10,39—has played a rithm of the latter type. Suppose we the drama of human passion, much of secondary role, often being treated as want to design a hypothetical evolu- life seems to revolve around sex. So why? a mere correction to the above, under tionary experiment for determining What role might sex play in evolution? the term “epistasis.” A few years ago, whether a new allele of a particular One common answer is that sex while working with biologists Mar- gene performs better than its alterna- generates vast genetic diversity, and cus Feldman and Jonathan Dushoff tive, across all genetic combinations. hence it must help evolution. But, and computer scientist Nicholas Pip- If the population is asexual, this could just as sex puts together genetic com- penger, we asked whether interactions be done by inserting this mutation in binations, it also breaks them down: between alleles could be crucial to the the genome of one individual, and a highly successful genotype will be understanding of the role of sex in a yet gauging the lineage thus founded to absent from the next generation, as unexplored manner.23–26 Based on the see if it thrives. This kind of sampling children inherit half their genes from each parent. To say the role of sex is to a See the appendix available in the ACM Digital b The original paper24 refers to the unweighted create particular, highly favorable ge- Library (dl.acm.org) under Source Material average fitness as mixability, instead of the netic combinations is like watching a for a more extensive bibliography on this and more natural average weighted by genotype man catch fish only to toss them back other subjects. frequency.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 89 review articles

is very inefficient, because we sample is written Fg = 1 + s Δg, where s is small ing surprisingly well in a multitude

from a small pool (the genotypes that and Δg is the differential fitness of the of difficult problems and contexts: happen to be available in the popula- genotype, ranging in [− 1, 1]. the multiplicative weights update al- tion), and must repeat the insertion Working with the weak selection gorithm (MWU),2 also known in ma- many times—in many individuals. assumption, and after some alge- chine learning as “no-regret learning” But if the population is sexual, then braic manipulation, we noticed the or “hedge” (see more in box “The Ex- by inserting the mutation once, af- equations of the evolution of a popu- perts Problem” in the online appen-

ter log n generations, where n is the lation under sex are mathematically dix). MWU changes the frequencies xi number of genes with which this par- equivalent to a novel process, which of the i-th allele of the gene as follows 33 ticular allele interacts, we will be entails an entirely different way of xi ← xi (1 + smi), (1)

sampling from all possible genetic looking at evolution: where mi is the expected differential combinations that could in principle ˲˲ The process is a game between fitness, positive or negative, of the i-th be constructed. Sex enables evolu- the genes of the organism. Recall that allele in the current gene pool. This

tion to sample quickly from the en- a game is a mathematical model of the quantity mi is a measure of what we have tire space of genetic combinations, in interaction between several players, called the mixability of allele i, its ability the distribution under which they ap- each player having a set of available to form fit combinations with alleles of pear in the population. What is more, actions or pure strategies, and a util- other genes in the current genetic mix. evolution under sex not only decides ity function: the objective the player To summarize, at each generation in among the competing hypotheses is trying to maximize in the game, sexual evolution, each gene boosts the (which allele performs better), but a mapping from choices of actions, frequency of each of its alleles by a fac- also implements this decision (even- one for each player, to a real number. tor that increases with the mixability tually, and with high probability, it The game here is repeated, that is, of this allele in the current generation. will fix the winner). the same precise game is played over Naturally, the quantities resulting from Finally, for yet another take on ex- and over for many rounds, with each the equation are normalized appropri- plaining the ubiquity of sex in life, in round corresponding to a generation ately so as to add to one. terms and concepts familiar to com- of the population. This is a completely new way of puter scientists, view The Red-Blue ˲˲ The available actions of each gene/ looking at evolution. And it is a produc- Tree Theorem sidebar. player correspond to the gene’s alleles. tive view, because it gets more inter- ˲˲ At every generation, each gene esting: Let us look back at the update A Game between Genes chooses and plays a mixed strategy: Equation 1 and ask once again the The search in population genetics for it randomizes over its actions, that is question: This choice of the new prob- a quantity that is optimized by natu- to say, its alleles. The probability as- abilities for the alleles by the gene, is it ral selection has a long history. Fisher signed to each allele in this mixed optimizing something? For once, the wanted a theory for evolution with a strategy corresponds to the frequency answer is very clean: Yes, the choice of mathematical law as clean and central of the allele in the population during allele frequencies by the gene shown as the second law of thermodynam- this generation. in Equation 1 optimizes the following ics,10 while Wright pointed out that ˲˲ The intricacy of game theory is function, specific to this gene, of the al- the frequency of an allele in a diploid mostly due to conflicts between the lele frequencies: locus changes in the direction that objectives of the players. But in the 1 increases the population’s mean fit- game played by the genes there are no Φ (x) = Σ xiMi – Σ xi ln xi. (2) i s i ness.40 Later, investigators tried their conflicts—this is not Dawkins’s self-

hand again at looking for a Lyapunov ish gene metaphor: All players have Here Mi denotes the cumulative rel- function that will describe evolution, the same utility function, which is ative fitness of allele i, that is, the sum

albeit with little success. the fitness of the genotype resulting of the mi in Equation 1 over all genera- Our search for an analytical maxi- from the players’ choices. Games of tions up to and including t − 1. It is easy mization principle involving mixability identical utilities are called coordina- to notice that Φ is a strictly concave ended with a surprise: We did not an- tion games in game theory, and are function, and thus has one maximum, swer the question “What is evolution the simplest possible kind. They are and this maximum can be checked by optimizing?” but, perhaps more inter- of interest only in cases in which the routine calculation to be exactly the estingly, we identified the quantity that players are cognitively weak, or cannot new frequencies as updated in Equation each gene seems to be optimizing dur- communicate effectively. 1! Now notice that the second term of Φ ing evolution under sex. Together with ˲˲ In a repeated game, the players is plainly the entropy of the distribution Erick Chastain and Umesh Vazirani,7 we must update their mixed strategies x, a well-known measure of a distribu- focused on the standard equations de- from one round to the next, based on tion’s diversity. scribed in the Darwin Mendel sidebar, in the outcome of the previous rounds. There is much that is unexpected a particular evolutionary regime known Here lies the biggest surprise: The and evocative here, but perhaps most as weak selection.30 Weak selection is the update rule used by the genes in this surprising of all is that this radically widely held assumption that fitness dif- game is identical to a venerable learn- new interpretation of evolution was ferences between genotypes are small. ing algorithm, well known to comput- lurking for almost a century so close to The fitness of a genotype g in this regime er scientists for its prowess in work- the surface of these well-trodden equa-

90 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles tions. That an algorithm as effective as MWU is involved in evolution under sex is also significant. It was pointed out in a commentary on our paper5 The Red-Blue that the MWU is also present in asex. Indeed, asexual evolution can be trivi- Tree Theorem ally thought of as MWU helping nature A completely different take on the ubiquity of sex in evolution, based on unpublished work, uses concepts and images very familiar to computer scientists. select genotypes. However, our point is Imagine a tree with nodes of two kinds: red and blue. The tree starts from a root, that in sexual evolution, the picture is far which is blue, and is generated through a random process. At each step (depth of the more sophisticated and organic, occur- tree), the nodes at this depth are replaced as shown here, each independently of the ring deeper in the hierarchy of life: Indi- others according to the probabilities as shown below. vidual genes interact, each “managing .898 .002 .1 its investments in alleles” using MWU, in a context created by the other genes and, of course, by the environment. (extinction) The function Φ is a rare explicit op- timization principle in evolution. The second term, and especially its large constant coefficient (recall that the se- lection strength s is small, and |mi|≤ 1) suggests that attention to diversity is .899 .001 .1 an important ingredient of this mecha- nism, a remark that may be relevant to the question of how genetic diversity is (extinction) maintained. But there is a mystery in the non-diversity terms: the cumulative nature of the fitness coefficients M sug- gests that performance during any pre- This is a very crude model for generating the tree of life (the probabilities here are vious generation is as important as the indicative, and not supposed to reflect some biological reality). Red nodes are sexual species, and blue nodes are distinct asexual types (the term “species” does not perfectly current generation for the determina- apply to the latter). While there are reasons to believe that life started sexually,22 and tion of the genetic make-up of the next not with a well-delineated species as exist today, we make the assumption that the tree generation. How can this be? begins with a blue node, which for our current purposes is conservative. According to the chosen parameters, sexuals and asexuals become extinct with equal probability. This surprising connection between If extinction does not happen, sexual species branch out 50% more. With small evolution and algorithms, through probability, one of the new species can be of the other kind. game theory and machine learning, as How would the leaves of the tree (that is, the life we see around us) look like, many well as the maximization principle Φ, steps down? After only 50 steps, the expected ratio of asexuals to sexuals is roughly 2:1000 (the probability of the top middle event). are very recent, and their full interpreta- Asexual nodes are rare! And, to a computer scientist, this stands to reason: We know tion is a work in progress. that branching factors (in trees, computer virus propagation, recursive algorithms, The insights about sex as we have among others) can dwarf in ultimate significance all other ingredients of a situation. Interestingly, this simple model immediately explains the main characteristics of the discussed shed some light on the mys- distribution of sexuality: It predicts that asexual forms are rare, sparsely distributed tery of genetic algorithms, as explained across the tree of life, and recently derived from sexual ancestors. Empirical evidence previously. There is a mismatch between shows that all three points are true.22 heuristics and evolution. Heuristics The precise relevant theorem is the following23 (it is a consequence of well known properties of branching, see Athreya and Ney3): should strive to create populations that Consider a continuous random process of generating a red-blue tree in which contain outstanding individuals. In con- the rates of the events B → BB, B → BR, B → extinction, R → RR, R → BR and R → trast, evolution under sex seems to excel extinction are, respectively, σA, β, εA, σS, α, and εS Let δS = σS − εS, δA = σA, − εA, and δS at something markedly different: at cre- > δA. Suppose that α > 0 and β > 0 be each small in comparison to δS − δA. Then the asymptotic ratio of asexual to sexual species at the leaves is ≈ α/ (δS − δA). ating a “good population.” So, it is small This argument suggests that, conceivably, a key advantage of sex might be its “branching wonder that genetic algorithms are not factor” in the tree of life. Even if sex were a burden for species and organisms (which is at the best heuristics around. On the other least a part of the truth), the mere fact that it promotes diversification and an exploration of new evolutionary niches might be enough for this strange trait to dominate the planet. hand, these insights also suggest that ge- netic algorithms may be valuable when the robustness of solutions is sought, or ry, as many important mutations are re- deluge of evidence pointing to involved when the true objective is unknown or arrangements of small stretches as well biological mechanisms that bring about uncertain or subject to change. as large swaths of DNA: duplications, and affect mutations.22 deletions, insertions, inversions, among We know, for example, that the chance Are Mutations Random? others.13 For a long time it has been be- of a mutation varies from one region of What is a mutation? Point mutations lieved that mutations are the results of the genome to another and is affected (changes in a single base such as an A accidents such as radiation damage or by both local and remote DNA.22 Nearly a turning into a C) are only part of the sto- replication error. But by now we have a quarter of all point mutations in humans

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 91 review articles

happen at a C base which comes before an image of evolution that is even more a G after that C is chemically modified explicitly algorithmic. It also means that (methylated);11 methylation is know to genes interacting in one organism can be the result of complex enzymatic pro- leave hereditary effects on the organ- cesses. As to rearrangement mutations, ism’s offspring.22 It no longer matters there are powerful agents of mutagenic- There is a mismatch that a lucky genetic combination created ity (creation of mutations) in the genome, between heuristics by sex is doomed to vanish from the face such as transposable elements: DNA se- of the earth (that the fisherman of our quences prone to “jump” from one place and evolution. earlier metaphor throws the fish away): It of the genome to another, carrying other Heuristics should may have achieved a lasting effect on the DNA sequences with them.13 A key step population through mutagenicity. Final- in mammalian pregnancy (decidualiza- strive to create ly, the biological mechanisms affecting tion), for instance, was the result of mas- mutations may themselves evolve. sive evolutionary rewiring of about 1,500 populations that genes mediated in part by transposable contain outstanding On the Preservation of Variation elements.27 Furthermore, the genetic se- If there is one idea that permeates all the quences that participate in a rearrange- individuals. various aspects of computational think- ment mutation are likely to be function- Evolution under ing about evolution, as explained in the ally related, since they are likely to be past sections, it is this: Interactions be- close to each other in 3D space and to sex seems to tween genes are crucial for understand- bear sequence similarity, both of which excel at something ing evolution. Gene interactions also allow interaction through recombina- come up in our recent work on the fol- tion-based mutational mechanisms (see markedly different: lowing question: How is variation within Livnat22 and references therein). Indeed, creating a a species preserved? the same machinery that effects sexual Classic data indicates that, for a large recombination is also involved in mu- “good population.” variety of plants and animals taken to- tations, and in fact produces different gether, the percentage of protein-coding types of rearrangement mutations, de- loci that are polymorphic (in the sense pending on the genetic sequences that that more than one protein variant ex- are present.13 Finally, different human ists that appears in more than 1% of the populations undergo different kinds of individuals in a population), and the mutations resulting in the same favor- percentage of such loci that are hetero- able effect, such as malaria resistance, zygous in an individual, average around suggesting that genetic differences be- 30% and 7% respectively.31 Such a num- tween populations cause differences in ber is far greater than could be explained mutation origination.22 by traditional selection-based theories.21 The idea that mutations may be Genetic variation is fed by mutations, non-accidental is still confronted with and, according to the equations of popu- suspicion due to the legacy of Lamarck, lations genetics, it is decreased by fixa- who believed around 1800 that organ- tion, the eventual triumph of one allele isms can sense, through interaction with and the extinction of all other alleles of the environment, what is needed for an the same gene. In much of the discourse evolutionary improvement, and are able on the subject, selection is assumed to to make the correct heritable change. act on individual genes, and therefore Since in the light of modern biology this fitness is additive. The equations tell us 1 seems impossible—to a computer sci- that fixation will happen after O(Δ ) gen- entist it sounds like reversing a one-way erations, where Δ is the differ- ence function, or a hash function—the acci- in relative fitness between the most fit dental mutation notion prevailed. But and the second most fit allele. Fixation in science one must not assume that the looks rather speedy. only relevant alternatives are the famil- In the spring of 2014, the Simons in- iar, inside-the-box ones. Mutations are stitute brought to Berkeley 60 biologists random—but it may be more productive and computer scientists to exchange to think of them as random in the same ideas on evolution. It was during that way that the outputs of randomized algo- time the authors, with Costis Daskalakis rithms are random. Indeed, mutations and Albert Wu, explored how our under- are biological processes, and as such standing of the speed of fixation would they must be affected by the interactions be affected if one takes into account gene between genes. This new conception of interactions. We approached the subject heredity is exciting, because it creates through some decades-old work on the

92 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles

complexity of local search;15 this theory pristine solutions to difficult problems 18. Kearns, M. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 983–1006. examines how difficult it is for a local such as communication, cooperation, vi- 19. Kimura, M. and Ohta, T. The average number of search process to reach a local optimum, sion, locomotion, and reasoning, among generations until fixation of a mutant gene in a finite population. Genetics 61, 3 (1969), 763. and the conclusion has in many cases so many more. One is tempted to ask: 20. Kirkpatrick, S., Gelatt, Jr,, C.D. and Vecchi, M.P. been: “pretty hard.” By applying this What algorithm could create all this in just Optimization by simulated annealing. Science 220 (1983), 671–680. 12 12 point of view to selection on interacting 10 steps? The number 10 —one tril- 21. Lewontin, R.C. and Hubby, J.L. A molecular approach to genes, we showed that there are n-gene lion—comes up because this is believed the study of genic heterozygosity in natural populations; amount of variation and degree of heterozygosity in systems, in which the fitness is the sum to be the number of generations since natural populations of Drosophila pseudoobscura. total of contributions of certain pairs the dawn of life 3.5 ∙ 109 years ago (notice Genetics 54 (1966), 595–609. 22. Livnat, A. Interaction-based evolution: How natural of alleles—that is, the next step beyond that most of our ancestors could not have selection and nonrandom mutation work together. Biology Direct 8, 1 (2013), 24. selection on single alleles—for which lived for much more than a day). And it 23. Livnat, A., Feldman, M.W., Papadimitriou, C. and fixation takes a number of generations is not a huge number: cellphone proces- Pippenger, N. On the advantage to sexual species in n diversification rates. Unpublished manuscript. proportional to 2 to happen. A stronger sors do many more steps in an hour. 24. Livnat, A., Papadimitriou, C., Dushoff, J. and Feldman, result can also be obtained under the Over the past decade, computer sci- M.W. A mixability theory for the role of sex in evolution. In Proceedings of the National Academy of Sciences 105, well-accepted complexity assumption entists and evolutionary biologists work- 50 (2008), 19803–19808. that local search is intractable in general ing together have come up with new 25. Livnat, A., Papadimitriou, C. and Feldman, M.W. An 15 analytical contrast between fitness maximization and (see Johnson et al. for details). The im- insights about central open problems selection for mixability. J. Theoretical Biology 273, 1 (2011), plication is that, if gene interactions are surrounding evolution—including, 232–234. 26. Livnat, A., Papadimitriou, C., Pippenger, N. and Feldman, taken into account, fixation may take rather surprisingly, a proposed answer M.W. Sex, mixability, and modularity. In Proceedings much longer than in the regime of selec- to the “algorithm” question—by looking of the National Academy of Sciences 107, 4 (2010), 1452–1457. tion on individual genes. at evolution from a computational point 27. Lynch, V.J., Leclerc, R.D., May, G. and Wagner, G.P. Does this insight explain the mystery of view. And, of course many more ques- Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in of variation? Not yet, because our analy- tions, inviting similar investigation, were mammals. Nature Genetics 43 (2011), 1154–1159. sis so far has been disregarding two oth- opened up in the process. 28. Mitchell, M. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996. er powerful forces in evolution, besides 29. Motwani, R. and Raghavan, P. Randomized Algorithms. Additional background and literature appears in an online Cambridge University Press, 1995. mutation and selection, acting on varia- appendix available with this article in the ACM Digital 30. Nagylaki, T., Hofbauer, J. and Brunovský, P. Convergence tion: the finiteness of the population, Library (dl.acm.org) under Source Material. of multilocus systems under weak epistasis or weak selection. J. Mathematical Biology 38, 2 (1999), 103–133. and heterozygosity (a diploid organism 31. Nevo, E., Beiles, A. and Ben-Shlomo, R. The References evolutionary significance of genetic diversity: Ecological, carrying two different alleles of a gene.). 1. Aldous, D. and Vazirani, U. ‘Go with the winners’ algorithms. th demographic and life history correlates. Lecture Notes in First, finiteness. Because the num- In Proceedings of the 35 Annual IEEE Symposium on Biomathematics 53 (1984), 13–213. (1994). 492–501. Foundations of Computer Science 32. Papadimitriou, C. and Steiglitz, K. Combinatorial ber of individuals carrying the alleles in 2. Arora, S., Hazan, E. and Kale, S. The multiplicative Optimization: Algorithms and Complexity. Dover, 1998. question is finite, sayN , the number of weights update method: A meta-algorithm and 33. Rabani, Y. Rabinovich, Y. and Sinclair, A. A computational applications. , 1 (2012), 121–164. Theory of Computing 8 view of population genetics. In Proceedings of the 27th individuals carrying each allele at each 3. Athreya, K. and Ney, P. Branching Processes. Springer, 1972. nd Annual ACM Symposium on Theory of Computing,(1995), generation evolves as in a kind of a ran- 4. Babbage, C. The Ninth Bridgewater Treatise. 2 edn. 83–92. John Murray, London, 1838. dom walk within the confines of [0, N], 34. Stearns, S.C. and Hoekstra, R.F. Evolution: An 5. Barton, N.H., Novak, S. and Paixão, T. Diverse forms Introduction. Oxford University Press, New York, 2005. and, ignoring selection, this results in of selection in evolution and computer science. In 35. Valiant, L. Probably Approximately Correct: Nature’s Proceedings of the National Academy of Sciences 111, 29 fixation after O(N) generations.19 Sec- Algorithms for Learning and Prospering in a Complex (2014), 10398–10399. World. Basic Books, 2013. ond, diploidy introduces the possibility 6. Bell, G. The Masterpiece of Nature: The Evolution and 36. Valiant, L.G. Evolvability. J. ACM 56, 1 (2009), 3. Genetics of Sexuality. University of California Press, of overdominance, in which organisms 37. Von Neumann, J. and A. W. Burks, A.W. Theory of self- Berkeley, CA, 1982. reproducing automata. IEEE Transactions on Neural with two different alleles of a gene are 7. Chastain, E., Livnat, A., Papadimitriou, C. and Vazirani, Networks 5, 1 (1966), 3–14. U. Algorithms, games, and evolution. In Proceedings 38. Williams, G.C. Adaptation and Natural Selection, 8th more fit than organisms with two copies of the National Academy of Sciences 111, 29 (2014), edition. Princeton University Press, 1996. of one allele or two copies of the other. In 10620–10623. 39. Wright, S. Evolution in Mendelian populations. Genetics 8. Darwin, C. On the Origin of Species by Means of Natural 16 (1931), 97–159. overdominance, the equations of selec- Selection, or the Preservation of Favoured Races in the 40. Wright, S. The distribution of gene frequencies in tion point to stable variation, with both Struggle for Life. Murray, London, 1859. populations. In Proceedings of the National Academy of 9. Feldman, M.W. Otto, P. and Christiansen, F.B. Population Sciences of the United States of America, 23, 6 (1937), alleles enjoying stably high frequency in genetic perspectives on the evolution of recombination. 307–320. Annual Review of Genetics 30 (1997), 261–295. the population. 10. Fisher, R.A. The Genetical Theory of Natural Selection. How these three effects, of finite The Clarendon Press, Oxford, U.K., 1930. Adi Livnat ([email protected]) is a Senior Lecturer population, of heterozygosity, and of 11. Fryxell, K.J. and Moon, W.-J. CpG mutation rates in in the Department of Evolutionary and Environmental the human genome are highly dependent on local GC Biology, and Institute of Evolution at the University of selection acting on combinations of content. Molecular Biology and Evolution 22 (2005), Haifa, Israel. 650–658. alleles across loci, interact with one 12. Goldberg, D. Genetic Algorithms in Search, Optimization Christos Papadimitriou ([email protected]) is another is an important subject for fur- and Machine Learning. Addison-Wesley, Reading, MA, 1989. the C. Lester Hogan Professor in the Computer Science 13. Graur, D. and Li, W.-H. Fundamentals of Molecular Division of the University of California at Berkeley. ther research. Evolution. Sinauer Associates, Sunderland, MA, 2000. 14. Holland, J.H. Adaptation in Natural and Artificial Copyright held by authors. Systems: An Introductory Analysis with Applications to Publications rights licensed to ACM. $15.00. Epilogue Biology, Control, and Artificial Intelligence. U Michigan A computer scientist marvels at the Press, 1975. 15. Johnson, D.S., Papadimitriou, C.H., and Yannakakis, M. brilliant ways in which evolution has How easy is local search? J. Computer and System Sciences 37, 1 (1988), 79–100. achieved so much: Systems with remark- 16. Jong, K.A.D. Evolutionary Computation: A Unified Watch the authors discuss able resource efficiency, reliability and Approach. MIT Press, Cambridge MA, 2006. their work in this exclusive 17. Kanade, V. Evolution with recombination. In Proceedings Communications video. survivability, adaptability to exogenous of the 52nd Annual IEEE Symposium on Foundations of http://cacm.acm.org/videos/ circumstances, let alone ingenious and Computer Science, (2011), 837–846. sex-as-an-algorithm

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 93 review articles

DOI:10.1145/2891406 a variety of signals about characteris- The future success of these systems tics of the users and items, including people’s explicit or implicit evaluations depends on more than a Netflix challenge. of items. The systems process these signals at a massive scale, often under BY DIETMAR JANNACH, PAUL RESNICK, real-time constraints. Most impor- ALEXANDER TUZHILIN, AND MARKUS ZANKER tantly, the recommendations are of sig- nificant quality on average. In empiri- cal tests, people choose the suggested items far more often than they choose suggested items based on unpersonal- Recommender ized benchmark algorithms that are based on overall item popularity. In other ways, the systems that Systems— produce these recommendations are sometimes remarkably bad. Occasion- ally, they make recommendations that Beyond Matrix are embarrassing for the system, such as recommending to a faculty mem- ber an introductory book from the Completion “for dummies” series on a topic she is expert in. Or, they continue recom- mending items the user is no longer interested in. Shortcomings like these motivate ongoing research both in industry and academia, and recom- mender systems are a very active field of research today. To provide an understanding of the THE USE OF recommender systems has exploded over state of the art of recommender sys- the last decade, making personalized recommendations tems, this article starts with a bit of his- tory, culminating in the million-dollar ubiquitous online. Most of the major companies, Netflix challenge. That challenge led to including Google, Facebook, Twitter, LinkedIn, Netflix, a formulation of the recommendation problem as one of matrix completion: Amazon, Microsoft, Yahoo!, eBay, Pandora, Spotify, Given a matrix of users by items, with and many others use recommender systems (RS) item ratings as cells, how well can an within their services. algorithm predict the values in some These systems are used to recommend a whole key insights range of items, including consumer products, movies, ˽˽ Recommender systems have become a songs, friends, news articles, restaurants and various ubiquitous part of our daily online user experience and support users in a variety others. Recommender systems constitute a mission- of domains. critical technology in several companies. For example, ˽˽ Today, the scientific community operationalizes the research problem Netflix reports that at least 75% of its downloads and mainly on principles from information retrieval and machine learning, leading rentals come from their RS, thus making it of strategic to a well-defined but narrow problem importance to the company.a characterization. ˽˽ We briefly review the history of the In some ways, the systems that produce these field, report on the recent advances, and propose a more comprehensive recommendations are remarkable. They incorporate research approach that considers both the consumer's and the provider's a http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html perspective.

94 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 cells that are deliberately held out? tions must be personalized or adapted vided by the users to rank and filter However, algorithms with maximum to the user’s situation, with different documents, for example, based on key- accuracy at the matrix completion people typically getting different item word overlap counts. Later on, more task are not sufficient to make the best suggestions. That implies maintaining elaborate techniques like weighted recommendations in many practical some kind of user history or model of term vectors (for example, TF-IDF vec- settings. We will describe why, review user interests. tors) or more sophisticated document some of the approaches that current Building user profiles: Information analysis methods like latent semantic research is taking to do better, and fi- filtering roots. In many application do- indexing (LSI) were applied to repre- nally sketch ways of approaching the mains, for example, in news recom- sent documents, with correspond- recommendation problem in a more mendation, recommenders can be ing representations of user interests comprehensive way in the future. seen as classic information filtering stored as user models. Recommender (IF) systems that scan and filter text systems based on these techniques are A Brief History documents based on personal user typically called “content-based filter- Many fields have contributed to rec- preferences or interests. The idea of us- ing” approaches. ommender systems research, includ- ing a computer to filter a stream of in- Leveraging the opinions of others. ing information systems, information coming information according to the As early as 1982, then ACM president retrieval (IR), machine learning (ML), preferences of a user dates back to the Peter J. Denning complained about human-computer interaction (HCI), 1960s, when first ideas were published “electronic (email) junk” and advo- and even more distant disciplines like under the term “selective dissemina- cated the development of more intel- marketing and physics. The common tion of information.”17 Early systems ligent systems that help to organize,

IMAGE BY ALICIA KUBISTA/ANDRIJ BORYS ASSOCIATES BORYS ALICIA KUBISTA/ANDRIJ BY IMAGE starting point is that recommenda- used explicit keywords that were pro- prioritize, and filter the incoming

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 95 review articles

sion rates. In the same report, several challenges in practical environments were discussed, in particular the prob- Recommendation as lem of scalability and the need to cre- ate recommendations in real time. Matrix Completion The matrix completion problem. By The recommendation problem viewed as a matrix completion problem as done in the this time, the research community had Netflix Prize. 1. Given a sparse matrix M, create a matrix M′ by randomly hiding a subset H of the developed some standard concepts known ratings. and terminology. The core element is 2. Predict the values (H*) of the hidden ratings using M′. the user-item rating matrix, as illus- 3. Assess the difference between the predicted and the true ratings using the Root Mean Square Error (RMSE). trated in the upper part of the accom- RMSE: Given a vector of predictions H* of length n and the vector containing the panying sidebar. Rows represent users, true values H, the RMSE is computed as columns represent items, and each cell 1 n * 2 represents a user’s subjective prefer- √ Σi = 1 (Hi – Hi) n ence for an item, determined based on an explicit report (for example, 1–5 M Users/Items I1 I2 I3 …. I100000 x Known ratings stars) or based on user behavior (for U1 4 3 2 Missing ratings example, clicking, buying, or spending U2 2 5 time on the item). U3 3 5 2 … The user-item matrix is generally U10000 2 2 5 sparse: most users have not interact- ed with most items. One formulation of the recommender problem, then, M′ Users/Items I1 I2 I3 …. I100000 x Known ratings U1 4 3 2 Missing ratings is that of a matrix completion prob- U2 2 ? ? Withheld ratings lem. That is, the problem is to predict U3 3 5 ? what the missing cells will be, in other … H words, how will users rate items they U10000 2 ? 5 haven’t rated yet? With that formulation, it was natu- Common techniques. The goal of matrix factorization techniques in RS is to ral to apply and adapt machine-learn- determine a low-rank approximation of the user-item rating matrix by decomposing it into a product of (user and item) matrices of lower dimensionality (latent factors). ing techniques from other problem The idea of ensemble methods is to combine multiple alternative machine learning settings, including various forms of models to obtain more accurate predictions. clustering, classification, regression, or singular value decomposition.3,4 Correspondingly, the community ad- streams of information.8 One of his ued the ideas of Tapestry and intro- opted evaluation measures from the IR proposals included the idea to use duced a system component, the “Better and ML fields like precision/recall and “trusted authorities” that assess docu- Bit Bureau,” which made automated the root mean squared error (RMSE) ment quality; receivers would only read predictions about which items people measure. These evaluation techniques documents that surpass some defined would like based on a nearest-neigh- withhold some of the known ratings, quality level. In 1987, the “Information bor scheme. Other research groups use the algorithm to predict ratings of Lens” personal mail processing system working independently developed those cells, and then compare the pre- was proposed.25 The system was main- similar ideas.18,36 The idea was “in the dicted ratings with the withheld ones. ly based on manually defined filtering air” that opinions of other people were The availability of some common rating rules but the authors already envisaged a valuable resource and the race was on datasets, distributed by vendors and a system where email receivers could to turn the idea into practical results. academic projects, enabled research- endorse other people whose opinions It works in e-commerce! In 1999, only ers to conduct bake-offs comparing the they value. The number and strength five years after the first CF methods performance of alternative matrix fill- of the endorsements would then pri- were proposed, Shafer et al.35 reported ing algorithms against each other. oritize incoming messages. The Tap- on several industrial applications of The Netflix Prize. Netflix, which saw estry email filtering system at Xerox recommender systems technology in e- the strategic value in improving its PARC14 adopted a similar approach of commerce; for example, for the recom- recommendations, supercharged the employing user-specified rules. It also mendation of books, movies, or music. bake-off process for matrix comple- introduced the idea that some readers Amazon.com is mentioned as one of tion algorithms in 2006. Netflix of- could classify (rate) messages and oth- the early adopters of recommendation fered a million-dollar prize, a dataset er readers could access this informa- systems. In 2003, Linden et al.23 report for training, and an infrastructure for tion, which was called “collaborative that Amazon’s use of item-to-item CF testing algorithms on withheld data. filtering” (CF). techniques as a targeted marketing The training dataset included 100 In 1994, Resnick et al.33 presented tool had a huge impact on its business million real customer ratings. The the GroupLens system, which contin- in terms of click-through and conver- prize was for the first algorithm to

96 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles outperform Netflix’s in-house system ratings of songs randomly assigned to mendation domain, the problem re- by 10% on RMSE (see the sidebar). In- users had a very different distribution mains that the “ground truth” (that is, terest in this competition was huge. than ratings of songs users had chosen whether or not an item is actually rel- More than 5,000 teams registered for to rate.26 evant for a user) is available only for a the competition and the prize was fi- As a result, algorithms that predict tiny fraction of the items. The results of nally awarded in 2009. Substantial well on held-out ratings that users pro- an empirical evaluation can depend on progress was made with respect to the vided may predict poorly on a random how the items with unknown ground application of ML approaches for the set of items the user has not rated. This truth are treated when determining rating prediction task. In particular, can mean that algorithms tuned to per- the accuracy metrics. In addition, the various forms of matrix factorization form well on past ratings are not the problem of items not missing at ran- as well as ensemble learning tech- best algorithms for recommending in dom also exists for learning-to-rank niques were further developed in the the real world.7 approaches and at least some of them course of the competition and proved In addition, the matrix completion exhibit a strong bias to recommend to be highly successful. problem setup is not suitable to as- blockbusters to everyone, which might sess the value of reminding users of be of little value for the users.19 Beyond Matrix Completion items they have already purchased or In some domains, like music rec- At the conclusion of the Netflix Prize consumed in the past. However, such ommendation, it is also important to competition, it might have been plausi- repeated recommendations can be a avoid very “bad” recommendations ble to think that recommender systems desired functionality of recommend- as they can greatly impact the user’s were a solved problem. After all, many ers, for example, in domains like music quality perception.6,21 Omitting some very talented researchers had devoted recommendation or the recommenda- “good” recommendations is not nearly themselves for an extended period of tion of consumables. so harmful, which would argue for risk- time to improve the prediction of with- In the end, the standardized evalua- averse algorithm designs that mostly held ratings. The returns on that effort tion setup and the availability of public recommend items with a high average seemed to be diminishing quite rap- rating datasets made it attractive for re- rating and low rating variance. Recom- idly, with the final small improvements searchers to focus on accuracy measures mending only such generally liked, that were sufficient to win the prize and the matrix completion setup and non-controversial items might however coming from combining the efforts of may have lured them away from inves- not be particularly helpful for some of many independent contestants. tigating the value of other information the users. However, it turns out that recom- sources and alternative ways of evaluat- System quality factors beyond accu- mender systems are far from a solved ing the utility of recommendations. racy. The Netflix Prize with its focus on problem. Here, we first give examples of Today, a growing number of academ- accuracy has undoubtedly pushed rec- why optimizing the prediction accuracy ic studies try to evaluate the performance ommender systems research forward. for held-out historical ratings might be of their methods using A/B tests on live However, it has also partially over- insufficient or even misleading. Then customers in real industrial settings shadowed many other important chal- we discuss selected quality factors of (for example, Dias et al,9 Garcin et al.,12 lenges when building a recommender recommender systems not covered by and Gorgoglione et al.16). This is a very system and today even Netflix states the matrix completion task at all and positive trend that requires cooperation “there are much better ways to help give examples of recent research that from a commercial vendor who may not people find videos to watch than focus- goes beyond matrix completion. agree to make data publicly available, ing only on those with a high predicted Pitfalls of matrix completion set- thus making it difficult for results to be star rating.”15 Next, we give examples of ups. Postdiction ≠ prediction. Predicting checked or reproduced by others. quality factors other than single-item held-out matrix entries is really pre- Not all items and errors are equally accuracy, review how recent research dicting the past rather than the future. important. RMSE, the evaluation met- has approached these problems, and If the held-out rating entries are repre- ric used in the Netflix Prize, equally sketch open challenges. sentative of the hidden rating entries, weights errors of prediction on all Novelty, diversity, and other compo- then the distinction does not matter. items. However, in most practical set- nents of utility. Making good rating pre- However, in many recommender set- tings items with low predicted ratings dictions for as-yet unrated items is al- tings, the held-out ratings are not rep- are never shown to users, so it hardly most never the ultimate goal. The true resentative of the missing ratings. matters if the correct prediction for goal of providing recommendations is One reason is the missing ratings those items is 1, 2, or 3 stars. Intuitively, rather some combination of a certain are generally not missing at random. it is more appropriate in these domains value for the user and profit for the site. Even for items that people have experi- to optimize a ranking criterion that fo- In some domains, user ratings may rep- enced, if rating requires any effort at all cuses on having the top items correct. resent a general quality assessment but they are more likely to rate items that In recent years, a number of learn- still not imply the item should be rec- they love or hate rather than those that ing-to-rank approaches have been pro- ommended. As an example, consider they feel lukewarm about. Moreover, posed in the literature to address this the problem of recommending restau- people are more likely to try items they issue, which aim to optimize (proxies rants to travelers. Most people dining expect to like. For example, in one em- of) rank-based measures. When ap- at a Michelin-starred location may give pirical study, researchers found that plying such IR measures in the recom- it five stars, but budget travelers may be

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 97 review articles

annoyed to see it recommended. As a the usefulness of the recommenda- result, some researchers have tried to tions from the perspective of the end recommend based on a more encom- user. The providers of recommenda- passing model of utility. For the budget tion services, however, often try to op- traveler, that utility or “economic val- timize their algorithms based on A/B ue” might increase with predicted rat- Predicting held-out tests using very different measures, ing but decrease with cost.13 Therefore, matrix entries is including sales volumes, conver- predicting how much a user will “like” sion rates, activity on the platform, or an item—as done in the Netflix Prize— really predicting sustained customer loyalty in terms can in many domains be insufficient. the past rather of revisiting customers or renewed The problem in reality often is to addi- subscriptions. These measures vary tionally predict the presumed utility of than the future. significantly across businesses and a recommendation for the user. sometimes even over time when the The novelty and non-obviousness of importance of different key perfor- an item are, for instance, factors that mance indicators varies over time. may affect the utility of item recom- Context matters. Even if we generally mendations. Proposing the purchase know how to assess the usefulness of an of bread and butter in a grocery shop item for a user, this usefulness might is obvious and will probably not gen- not be stable and depend on the user’s erate additional sales. Similarly, rec- current context. Assume, for example, ommending sequels of a movie that a that we have built a recommendation user liked a lot and will watch anyway system for restaurants and have done or songs by a user’s favorite artist will everything right so far. Our algorithm not help the user discover new things. is good at matching the users’ prefer- In many domains, it is not even ences and the recommendations them- meaningful to assess the utility of a selves are a good mix of familiar and single recommendation, but only sets new options as well as popular choices of recommendations. In the movie do- and insider tips. The recommendations main, once a recommendation list in- can still be perceived as poor. A restau- cludes one Harry Potter movie, there is rant in a northern climate with accept- diminished value from additional Harry able food and indoor ambience but an Potter movies. Quality measures like exceptional outdoor patio overlooking novelty, diversity and unexpectedness the harbor will probably be a good rec- have therefore moved into the focus of ommendation, but only when visited researchers in recent years.5 in summer. Traditional CF techniques While quite some progress was made unfortunately do not account for such over the past few years and researchers time aspects. In many domains context- are increasingly aware of the problem aware algorithms are therefore required that being accurate might be insuffi- as they are able to vary their recommen- cient, a number of issues remain open. dations depending on contextual fac- It is, for example, often not clear if and tors such as time, location, mood, or the to what extent a certain quality charac- presence of other people. teristic like novelty is truly desired in Over the past decade, a number of a given application for a specific user. context-aware recommendation capa- Similarly, too much diversity can some- bilities have been developed in academia times be detrimental to the user experi- and applied in a variety of application ence. Finding the right mix of novel and settings, including movies, restaurants, familiar items can be challenging, and music, travel, and mobile applica- more research is required to better un- tions.2 Typical context adaptation strat- derstand the requirements and success egies are to filter the recommendable factors in particular domains. items before or after the application of Another issue is that estimating the a non-contextualized algorithm, to col- strength of quality factors like diversity lect multiple ratings for the same item based on offline experiments is prob- in different context situations, or to de- lematic. Measures like Intra-List-Diver- sign recommendation techniques that sity have been proposed in the literature factor in context information into their but up to now it is unclear to what extent machine learning models. such objective measures correlate with Contextualization has also become the users’ diversity perception. a common feature in real applications Generally, the literature focuses on today. For example, many music web-

98 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles sites, such as Spotify, ask listeners for sational recommender systems were ating fake profiles and ratings. their current mood or adapt the recom- proposed to elicit user preferences in- In the long run, customers who were mendations depending on the time teractively and engage in a “propose, misled by such manipulated reviews of the day. Online shopping sites look feedback, and revise” cycle with us- would distrust the recommendations at the very recent navigation behavior ers.37 They are employed in domains made by the system and in the worst and infer short-term shopping goals where consumers are confronted case the online service as a whole. Be- of their visitors. Mobile recommender with high involvement buying deci- ing resilient against such manipula- systems finally constitute a special sions, such as financial services, real tions can therefore be crucial to the case of context-aware recommenders, estate, or tourism. Most approaches long-term success of a system. as more and more sensor information use forms-based dialogues to let us- There has been considerable re- becomes available, for example, about ers choose from predefined options search on manipulation resistance, the user’s location and local time. or use natural language processing where resistance is defined as attackers From a research perspective, con- techniques to cope with free-text or having only a limited ability to change text is a multifaceted concept that has oral user input. Recent alternative ap- the rating predictions that are made. been studied in various research dis- proaches also include more emotional Most of it identifies archetypal attack ciplines. Over the last 10 years signifi- ways of expressing preferences, for ex- strategies and proposes ways to detect cant progress has been made also in ample, based on additional sensors to and counteract them. For example, one the field of context-aware recommend- determine the user’s emotional state “shilling” or “profile injection” attack ers and the first comparative evalua- or by supporting alternative ways of creates profiles for fake users, with rat- tions and benchmark datasets were user input such as selecting from pic- ings for many items close to the overall published.31 Nonetheless, much more tures.30 Furthermore, the integration average for all users. Then, these fake work is required to fully understand of better recommendation functional- users give top (or bottom) ratings to this multifaceted concept and to go ity in voice-controlled virtual assistants the items that are being manipulated.22 beyond what is called the representa- like Apple’s Siri represents another This line of research has identified al- tional approach with its predefined promising path to explore by the RS re- gorithms that are more or less resistant and fixed set of observable attributes. search community. to particular attack strategies.29 Interacting with users. Coming back One key insight in conversational In recent research, the textual re- to our restaurant recommender, let systems is that users may not initially views provided by users on platforms us assume we have extended its capa- understand the space of available like TripAdvisor are used instead or in bilities and it now considers the user’s items, and so do not have well-formed combination with numerical ratings time and geographical location when preferences that can be expressed in to understand long-term user prefer- making recommendations. But what terms of attributes of items. An inter- ences. These textual reviews do not happens if the user—in contrast to her active or visualization-based recom- only carry more detailed information past preferences—is in the mood to try mender can help users explore the than the ratings, they can also be au- out something different, for example, item space and incrementally reveal tomatically analyzed to detect fake en- a vegan restaurant. How would she tell their preferences to the system. For tries.20 Research suggests that in some the system? And if she did not like it example, critiquing-based interfaces domains the fraction of manipulated afterward, how would she inform the invite users to say “Show me more like entries can be significant. system not to recommend vegan res- restaurant A, but cheaper.” Although Generally, to resist manipulation, taurants again in the future? these approaches attracted consider- algorithms take some countermeasure In many application domains, able interest in research, they are not that discards or reduces the influence short-term preferences must be elic- yet mainstream in practice.27 given to ratings or reviews that are sus- ited and recommending cannot be a Overall, with interactive systems, pected of not being trustworthy. How- one-shot process of determining and the design challenge is no longer sim- ever, this has the effect of throwing presenting a ranked list of items. In- ply one of choosing items to recom- away some good information. There is stead, various forms of user interac- mend but also to choose a sequence a lower bound on the good information tions might be required or useful to put of conversational moves as proposed that must be discarded in any attempt the user in control. Examples of typi- by Mahmood et al.24 who developed an to prevent attacks by statistical means cal interaction patterns are interactive adaptive conversational recommenda- of noticing anomalous patterns.34 No preference elicitation and refinement tion system for the tourism domain. easy solution to this problem seems to procedures, the presentation of expla- Manipulation resistance. Moving on exist, unless attackers can be prevent- nations and persuasive arguments, from the specific problems of how the ed from injecting fake profiles. or the provision of mechanisms that preferences are acquired and which Trust and loyalty. Manipulation re- help users explore the space of avail- algorithms are used in our restaurant sistance is not the only requirement for able options. The design of the user recommender, the question could building a trustworthy system. Let us experience and the provided means of arise of whether we can trust that the return to our restaurant recommender interacting with the system can be a ratings of the community are honest and assume that our user has eventu- key quality factor for the success of the and fair. Interested parties might ma- ally decided to try out one of our rec- recommendation service. nipulate the output of a recommender ommendations. Thus, from a provider In the research literature, conver- to their advantage, for example, by cre- perspective, we were successful in driv-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 99 review articles

ing the user’s short-term behavior. But ated by complex ML models. Another ally looking for new items to discover what if the user is dissatisfied after- is how to leverage additional informa- or are they seeking “more of the same” ward with her choice and in particular tion such as the browsing history or the for comparison purposes—this ques- feels that our recommendations were user’s social graph to make recommen- tion is seldom asked. biased and not objective? dations look more plausible or familiar Due to their high practical relevance, As a result, she might not trust the to the user.32 RS are naturally a field of research in service in the future and even the most disciplines other than computer sci- relevant recommendations might be From Algorithms to Systems ence (CS), including information sys- ignored. In the worst case, she will even Our brief survey on the history of the tems (IS), e-commerce, consumer re- distrust the competence and integrity field indicates that recommender sys- search, or marketing. Research work of the service provider.6 An important tems have arrived at the Main Street like Xiao and Benbasat39 that develop quality factor of a recommendation with broad industry interest and an a comprehensive conceptual model system is that it is capable of building active research community. Further- of the characteristics, use, and impact long-term loyalty through repeated more, we have seen the recommender of e-commerce “recommendation positive experiences. systems community address a variety agents” are largely unnoticed in the CS In e-commerce settings, users can of topics beyond rating prediction and literature. In their work, the authors rightly assume that economic consider- item ranking, for example, concern- develop 28 propositions that center ations might influence what is placed ing the system’s user interface or long- around two practically relevant ques- in the recommendation lists and can term effects. tions in e-commerce settings: How can be worried that what is being proposed Beyond the computer science per- RS help to improve the user’s decision is not truly optimal for them but for the spective. Many of the proposals dis- process and quality? and Which factors seller. Transparency is therefore an im- cussed earlier focus on algorithmic influence the user’s adoption of and portant factor that has been shown to aspects, for example, how to combine trust toward the system? The process positively influence the user’s trust in context information with matrix com- of actually generating the recommen- a system: What data does the RS con- pletion approaches, how to find the dations—which is the focus in the CS sider? How does the data lead to recom- most “informative” items that users field—is certainly important, but only mendations? Explanations put the fo- should be asked to rate, or how to de- one of several factors that contribute to cus on providing additional information sign algorithms that balance diversity the success of an RS. in order to answer these questions and and accuracy in an optimal way. As sus- Research questions in the context of justify the proposed recommendations. pected in Wagstaff38 for the ML com- a RS should therefore be viewed from In the research literature, a number munity, the RS research community, a more comprehensive perspective as of explanation strategies have been ex- to some extent, still seems too focused sketched in Figure 1. Whenever new plored over the past 10 years. Many of on benchmark datasets and abstract technological proposals are made, we them are based on “white-box” strate- performance metrics. Whether or not should ask which specific need or re- gies that expose how a system derived the reported improvements actually quirement in a given domain are ad- the recommendations.11 However, many matter in the real world for a certain dressed. Making better buying deci- challenges remain open. One is how to application domain, and the needs of sions can be one need from the user’s explain recommendations that are cre- the users—are they, for instance, actu- perspective; guiding customers to oth- er parts of the product spectrum can Figure 1. A more comprehensive view on the recommendation problem. be a desired effect from the provider’s side. Correspondingly, these goals de- termine the choice of the evaluation User Service Provider/Business measure that is chosen to assess the ef- Needs, e.g., Business Goals, fectiveness of the approach. Information Filtering Desired Impact on Users, At the end, the goals of a recom- Item Discovery e.g., Decision Making Customer Loyalty mendation system can be very diverse, Revenue Increase ranging from improved decision mak- Data, e.g., Sales Diversification Preferences ing over item filtering and discovery, Context to increased conversion or user en- Demographics Long-term Profile gagement on the platform. Abstract, Recommender System domain-independent accuracy mea- Design Decisions Algorithms sures as often used today are typically Interactivity insufficient to assess the true value of Explanations, ... a new technique.15 Data, e.g., Focusing on business- and util- Ratings, ity-oriented measures and the con- Item Features, Social Network, ... sideration of novelty, diversity, and Environment serendipity aspects of recommenda- tions—as discussed earlier—are im- portant steps into that direction. In

100 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 review articles

any case, which measures are actually Figure 2. A new characterization of the recommendation problem. chosen for the evaluation, always has to be justified by the specific goals that should be achieved with the sys- tem. Furthermore, in offline experi- Set of actions Acquire preferences mentations, multi-metric evaluation actions Present recommendations Purpose and Goals schemes, application-specific mea- Display explanations User/Provider Adapt recommendation strategy sures, and the consideration of recom- Short-term/Long-term ... mendation biases represent one way of assessing desired and potentially time Recommendation System undesired effects of a RS on its users.19 However, to better understand the effectiveness of a RS and its impact on users, more user-centric and util- ity-oriented research is generally re- quired within the CS community and the algorithmic works should be better explanations when desired, present al- built-in learning capabilities.10,24 connected with the already existing in- ternative or complementary shopping Finally, mobile and wearable de- sights from neighboring fields. proposals and, in general, put the user vices have become the personal digi- Putting the user back in the loop. more into control and allow for new tal assistants of today. With the recent A recommender system is usually one types of interactions. developments in speech recognition, component within an interactive ap- When looking at the recommen- gesture-based interactions, and a mul- plication. The minimal interaction dations provided by Amazon.com on titude of additional sensors of these level provided by such a component is their websites, we can see that various devices, new opportunities arise re- that a list of recommendations is dis- forms of user interaction already exist garding how we interact with recom- played and users can select them for that are underexplored in academia. mender systems. inspection or immediate consump- Amazon.com, for example, provides Toward a more comprehensive tion, for example, on media stream- multiple recommendation lists on its characterization of the recommen- ing platforms. landing page. Amazon's system also dation task. In the research literature, RS have one of their roots in the field supports explanations for the made an often-cited definition of the recom- of human-computer interaction (HCI) recommendations and even lets the mendation problem is to find a func- and the design of the user interface, the user indicate if a past user action ob- tion that outputs a relevance score for choice of the supported forms of inter- served by the system (for example, a each item given information about the activity, or the selection of the content purchase) should no longer be consid- user profile and her contextual situa- to be displayed can all have an impact ered in the recommendation process. tion, “content” information about the on the success of a recommender. However, many questions, such as how items, and information about prefer- However, the amount of research dedi- to design such interactive elements in ence patterns in the user community.1 cated to these questions is comparably the best possible way, how much cogni- Although the development of even low, particularly when compared to the tive load for the user is acceptable, or better techniques for item selection huge amount of research on item-rank- how the system can stimulate or per- and ranking will remain at the core of ing algorithms. suade people to do certain actions are the research problem, the discussions Therefore, our second tenet is that largely unexplored. here indicate this definition seems the CS community should put more Furthermore, in case a system sup- too narrow. To conclude our consider- effort on the HCI perspective of RS, ports various forms of interactivity and ations related to the HCI perspective as has been advocated earlier, for ex- is at the same time capable of acquir- on recommenders and the more com- ample, in Konstan and Riedl21 and ing additional information from the prehensive consideration of the in- McNee et al.28 Current research largely user, additional algorithmic and com- terplay between users, organizations, relies on explicit ratings and automati- putational challenges arise. An intel- and the recommendation system, we cally observable user actions—often ligent system might, for example, de- propose a new characterization of the called implicit feedback—as preference cide on the next conversational move, recommendation problem (Figure 2). indicators. Many real-world systems or whether to display an explanation A new problem characterization. A allow users to explicitly specify their or not, depending on the current state recommendation problem has the fol- preferences, for example, in terms of of the interaction or the estimated ex- lowing three components: an overall preferred item categories. Recommen- pertise and competence of the user. goal that governs the selection and dation components on websites could Some approaches in that direction ranking of items; a set of available ac- be much more interactive and act, for were proposed in the literature in the tions centered on the presentation of example, in the e-commerce domain past, but they often come at expense of recommended items; and an optimiza- as “virtual advisers”39 and social actors considerable ramp-up costs in terms of tion timeframe: that ask questions, adapt their commu- knowledge engineering and they might ˲˲ The overall goal constitutes the nication to the current user, provide appear to be quite static if they have no operationalized measure or a set of

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 101 review articles

measures that should be optimized by search. For example, the huge amounts CHI ‘95 (1995), 194–201. 19. Jannach, D., Lerche, L., Kamehkhosh, I. and an appropriate selection and ranking of user data and preference signals that Jugovac, M. What recommenders recommend: An of items from a (large) set. Optimizing become available through the Social analysis of recommendation biases and possible countermeasures. User Modeling and User-Adapted a specific rank measure can be such Web and the Internet of Things not Interaction (2015), 25:1–65. a goal, but more utility-oriented goals only leads to technical challenges such 20. Jindal, N. and Liu, B. Opinion spam and analysis. In Proceedings WSDM ‘08, (2008), 219–230. and corresponding measures like user as scalability, but also to societal ques- 21. Konstan, J. and Riedl, J. Recommender systems: From satisfaction, decreased decision efforts, tions concerning user privacy. algorithms to user experience. User Modeling and User-Adapted Interaction 22, 1-2 (2012), 101–123. revenues, or loyalty might be equally Based on our reflections on the de- 22. Lam, S.K. and Riedl, J. Shilling recommender systems important. Generally, the goals can be velopments in the field, we finally em- for fun and profit. InProceedings of WWW ‘04, (2004), 393–402. derived from the user’s perspective, the phasize the need for a more holistic 23. Linden, G., Smith, B. and York, J. Amazon.com recommendations: Item-to-item collaborative provider’s perspective, or both. research approach that combines the filtering.IEEE Internet Computing 7, 1 (2003), 76–80. ˲˲ Depending on the application do- insights of different disciplines. We 24. Mahmood, T., Ricci, F. and Venturini, A. Improving recommendation effectiveness: Adapting a dialogue main, a set of actions is available for urge that research focuses even more strategy in online travel planning. J. of IT & Tourism the recommendation system to take. on practical problems that matter and 11, 4 (2009), 285–302. 25. Malone, T.W., Grant, K.R., Turbak, F.A., Brobst, S.A. and The central action typically is the selec- are truly suited to increase the utility of Cohen, M.D. Intelligent information-sharing systems. tion and presentation of a set of items. recommendations from the viewpoint Commun. ACM 30, 5 (May 1987), 390–402. 26. Marlin, B.M. and Zemel, R.S. Collaborative prediction Additional possible moves are varying of the users. and ranking with non-random missing data. In its strategy to recommend items, pro- Proceedings RecSys ‘09 (2009), 5–12. 27. McGinty, L. and Reilly, J. On the evolution of critiquing References viding specific explanations or other recommenders. Recommender Systems Handbook, 1. Adomavicius, G. and Tuzhilin, A. Toward the next Springer, 2011, 419–453. communication content, requesting generation of recommender systems: A survey of the 28. McNee, S.M., Riedl, J. and Konstan, J.A. Being accurate state-of-the-art and possible extensions. IEEE Trans. feedback or alternative variants of user is not enough: How accuracy metrics have hurt Knowledge and Data Engineering 17, 6 (2005), 734–749. recommender systems. In Proceedings CHI ‘06, (2006), input. These conversational moves are 2. Adomavicius, G. and Tuzhilin, A. Context-aware 1097–1101. recommender systems. Recommender Systems building blocks for goal achievement. 29. Mobasher, B., Burke, R., Bhaumik, R. and Williams, Handbook. Springer, 2011, 217–253. C. Toward trustworthy recommender systems: An The selection of the most helpful next 3. Billsus, D. and Pazzani, M.J. Learning collaborative analysis of attack models and algorithm robustness. information filters. InProceedings ICML ‘98 (1998), action and its timing can be the result ACM Trans. Internet Technology 7, 4 (Oct. 2007). 46–54. 30. Neidhardt, J., Seyfang, L., Schuster, R. and Werthner, 4. Breese, J.S., Heckerman, D. and Kadie, C.M. Empirical of a reasoning process itself. H. A picture-based approach to recommender analysis of predictive algorithms for collaborative ˲˲ systems. J. of IT & Tourism 15 (2015), 1–21. The timeframe or optimization filtering. InProceedings UAI ‘98 (1998), 43–52. 31. Panniello, U., Tuzhilin, A. and Gorgoglione, M. 5. Castells, P., Wang, J., Lara, R. and Zhang, D. horizon signifies the time window over Comparing context-aware recommender systems in Introduction to the special issue on diversity and terms of accuracy and diversity. User Modeling and which the goal should be optimized. discovery in recommender systems. ACM Trans. User-Adapted Interaction 24, 1-2 (2014), 35–65. Intell. Syst. Technology 5, 4 (2014), 52:1–52:3. The explicit consideration of the time 32. Papadimitriou, A., Symeonidis, P. and Manolopoulos, 6. Chau, P.Y.K., Ho, S.Y., Ho, K.K.W. and Yao, Y. Examining Y. A generalized taxonomy of explanations styles for dimension allows us to differentiate the effects of malfunctioning personalized services on traditional and social recommender systems. Data online users’ distrust and behaviors. Decision Support between single one-shot interactions Min. Knowl. Discovery 24, 3 (2012), 555–583. Syst. 56 (2013), 180–191. 33. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. and longer time spans that can be 7. Cremonesi, P., Garzotto, F. and Turrin, R. Investigating and Riedl, J. Grouplens: An open architecture for the persuasion potential of recommender systems more relevant to businesses and users. collaborative filtering of netnews. InProceedings of from a quality perspective: An empirical study. ACM CSCW ‘94 (1994), 175–186. The recommendation problem Trans. Interact. Intell. Syst. 2, 1 (2012), 11:1–11:41. 34. Resnick, P. and Sami, R. The information cost of 8. Denning, P.J. ACM president’s letter: Electronic junk. finally can be defined as: Find a se- manipulation-resistance in recommender systems. In Commun. ACM 25, 3 (Mar. 1982), 163–165. Proceedings RecSys ‘08 (2008), 147–154. quence of conversational actions and 9. Dias, M.B., Locher, D., Li, M., El-Deredy, W. and Lisboa, 35. Schafer, J.B., Konstan, J. and Riedl, J. Recommender P.J. The value of personalised recommender systems item recommendations for each par- systems in e-commerce. In Proceedings ACM EC ‘99 to e-business: A case study. In Proceedings RecSys’08 (1999), 158–166. ticular user that optimizes the overall (2008), 291–294. 36. Shardanand, U. and Maes, P. Social information 10. Felfernig, A., Friedrich, G., Jannach, D. and Zanker, M. goal over the specified timeframe. filtering: Algorithms for automating “word of mouth.” An integrated environment for the development of In Proceedings CHI ‘95 (1995), 210–217. knowledge-based recommender applications. Int. J. 37. Shimazu, H. Expertclerk: Navigating shoppers’ buying Electron. Commerce 11, 2 (2006), 11-34. Summary process with the combination of asking and proposing. 11. Friedrich, G. and Zanker, M. A taxonomy for generating In Proceedings IJCAI ‘01 (2001), 1443–1448. explanations in recommender systems. AI Magazine Recommender systems have become 38. Wagstaff, K. Machine learning that matters. In 32, 3 (2011), 90–98. Proceedings ICML (2012), 529–536. a natural part of the user experience 12. Garcin, F., Faltings, B. Donatsch, O., Alazzawi, A., 39. Xiao, B. and Benbasat, I. E-commerce product Bruttin, C. and Huber, A. Offline and online evaluation in today’s online world. These systems recommendation agents: Use, characteristics, and of news recommender systems at swissinfo.ch. In impact. MIS Q. 31, 1 (Mar. 2007), 137–209. are able to deliver value both for users Proceedings RecSys ‘14 (2014), 169–176. and providers and are one prominent 13. Ghose, A., Ipeirotis, P.G. and Li, B. Designing ranking systems for hotels on travel search engines by mining Dietmar Jannach ([email protected]) is a example where the output of academic user-generated and crowdsourced content. Marketing Chaired Professor of Computer Science at TU Dortmund, research has a direct impact on the ad- Science 31, 3 (2012), 493–520. Germany. 14. Goldberg, D., Nichols, D., Oki, B. and Terry, D. Using vancements in industry. collaborative filtering to weave an information Paul Resnick ([email protected]) is the Michael D. Cohen In this article, we have briefly re- tapestry. Commun. ACM (1992), 61–70. Collegiate Professor of Information at the University of 15. Gomez-Uribe, C.A. and Hunt, N. The Netflix Michigan School of Information, Ann Arbor, MI. viewed the history of this multidis- Recommender System: Algorithms, business value, Alexander Tuzhilin ([email protected]) is the and innovation. ACM Trans. Manage. Inf. Syst. 6, 4 ciplinary field and looked at recent Leonard N. Stern Professor of Business in the Stern (2015), 13:1–13:19. School of Business, New York University, NY. efforts in the research community 16. Gorgoglione, M., Panniello, U. and Tuzhilin, A. to consider the variety of factors that The effect of context-aware recommendations Markus Zanker ([email protected]) is an associate on customer purchasing behavior and trust. In professor of computer science at Free University may influence the long-term success Proceedings RecSys ‘11 (2011), 85–92. of Bozen-Bolzano, Italy. 17. Hensley, C.B. Selective dissemination of information of a recommender system. The list of (SDI): State of the art in May, 1963. In Proceedings of open issues and success factors is still AFIPS ‘63 (Spring), 1963, 257–262. 18. Hill, W., Stead, L., Rosenstein, M. and Furnas, far from complete and new challenges G. Recommending and evaluating choices in a Copyright held by authors. arise constantly that require further re- virtual community of use. In Proceedings Publication rights licensed to ACM. $15.00.

102 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 research highlights

P. 104 P. 105 Technical Perspective DianNao Family: If I Could Only Energy-Efficient Hardware Design One Circuit … By Kurt Keutzer Accelerators for Machine Learning By Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam

P. 113 P. 114 Technical Perspective A Reconfigurable Fabric FPGA Compute for Accelerating Large-Scale Acceleration Is First About Datacenter Services Energy Efficiency By Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, By James C. Hoe Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 103 research highlights

DOI:10.1145/2996862 To view the accompanying paper, Technical Perspective visit doi.acm.org/10.1145/2996864 rh If I Could Only Design One Circuit … By Kurt Keutzer

NEURAL-INSPIRED COMPUTING MOD- Like many other machine learning a processor architecture for a neural ELS have captured our imagination approaches, neural net development net accelerator and puts a particularly from the very beginning of computer has two phases. The training phase is strong focus on efficiently supporting science; however, victories of this ap- essentially an optimization problem the memory access patterns of neu- proach were modest until 2012 when in which parameter weights of neural ral net computations. This includes AlexNet, a “deep” neural net of eight net models are adjusted to minimize minimizing both on-chip and off-chip layers, achieved a dramatic improve- the error of the neural net on its train- memory transfers. Other members of ment on the image classification prob- ing set. This is followed by the imple- the DianNao family include DaDian- lem. One key to AlexNet’s success was mentation or inference phase, in which Nao, ShiDianNao, and PuDianNao. its use of the increased computational the resulting neural net is deployed in DaDianNao (big computer) focuses power offered by graphics processing its target application, such as a speech on the challenges of efficiently com- units (GPUs), and it’s natural to ask: recognizer in a cellphone. puting neural nets with one billion or Just how far can we push the efficient Training neural nets is a highly more model parameters. ShiDianNao computing of neural nets? distributed optimization problem in (vision computer) is further special- Computing capability has advanced which interprocessor-communication ized to reduce memory access require- with Moore’s Law over these last three costs quickly dominate local computa- ments of Convolutional Neural Nets, decades, but integrated circuit design tional costs. On the other hand, the im- a neural net family that is used for costs have grown nearly as fast. Thus, plementation of neural nets in embed- computer vision problems. While the any discussion of novel circuit archi- ded applications, such as cellphones, number of problems solved by neural tectures must be met with a sobering calls out for a special-purpose, energy- nets grows every week, some might discussion of design costs. That said, efficient accelerator. Thus, if I could wonder: Is this a fundamental change a neural net accelerator has two big cajole my circuit designer colleagues in the field, or will the pendulum things going for it. First, it is a special- into designing only one circuit, it swing back to favor a broader range of purpose accelerator. Since the end of would surely be a special-purpose, machine learning approaches? With single-thread performance scaling due energy-efficient accelerator that is flex- the PuDianNao (general computer) to power density issues, integrated cir- ible enough to provide efficientimple - architecture, the architects hedge cuit architects have searched for clever mentations of the growing family of their bets on this question by provid- ways to exploit the increasing transis- neural net models. This is the goal of ing an accelerator for more traditional tor counts afforded by Moore’s Law DianNao (diàn naˇo, Chinese for com- machine learning algorithms. without increasing power dissipation. puter, or, literally “electric brain”). Despite, or perhaps because of, Di- This has led to a resurgence of special- The DianNao accelerator family anNao’s two Best Paper Awards, some purpose accelerators that are able to comprehensively considers the prob- readers may think that building a neu- provide 10–100x better energy efficien- lem of designing a neural net accel- ral network accelerator is just an aca- cy than general-purpose processors erator, and the following paper shows demic enterprise. These doubts should when accelerating their special func- a deep understanding of both neural be allayed by Google’s announcement tions, and which consume practically net implementations and the issues in of the Tensor Processing Unit, a novel no power when not in use. computer architecture that arise when neural network accelerator deployed Second, a neural net accelerator can building an accelerator for them. Neu- in their datacenters. These processors accelerate a broad range of applica- ral net models are evolving rapidly, and were recently used to help AlphaGo tions. Deep neural nets have begun to a significant new neural network model win at Go. It may be quite some time realize the promise that has intrigued is proposed every month, if not every before we learn of TPU’s architecture, so many for so long: a single, neuron- week. Thus, a computer architect build- but details on the DianNao family are inspired computational model that ing an accelerator for neural nets must only a page away. offers superior results on a diverse va- be familiar with their variety. A special- riety of problems. In particular, mod- izer-architecture that isn’t sufficiently Kurt Keutzer ([email protected]) is a professor of electrical engineering and computer science at the ern deep neural net models are win- flexible to accommodate a broad range University of California at Berkeley. ning competitions in computer vision, of neural net models is certain to be- speech recognition, and text analytics. come quickly outdated, wasting the ex- Without exaggeration, the list of victo- tensive chip design effort. ries achieved through the use of deep The DianNao family also engages neural nets grows every week. the issues associated with building Copyright held by author.

104 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 DOI:10.1145/2996864 DianNao Family: Energy-Efficient Hardware Accelerators for Machine Learning By Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam

Abstract aforementioned trends have already been identified by Machine Learning (ML) tasks are becoming pervasive in a researchers who have proposed accelerators implement- broad range of applications, and in a broad range of systems ing, for example, Convolutional Neural Networks (CNNs)2 (from embedded systems to data centers). As computer archi- or Multi-Layer Perceptrons43; accelerators focusing on other tectures evolve toward heterogeneous multi-cores composed domains, such as image processing, also propose efficient of a mix of cores and hardware accelerators, designing hard- implementations of some of the computational primitives ware accelerators for ML techniques can simultaneously used by machine-learning techniques, such as convolutions.37 achieve high efficiency and broad application scope. There are also ASIC implementations of ML techniques, such While efficient computational primitives are important as Support Vector Machine and CNNs. However, all these for a hardware accelerator, inefficient memory transfers can works have first, and successfully, focused on efficiently potentially void the throughput, energy, or cost advantages implementing the computational primitives but they either of accelerators, that is, an Amdahl’s law effect, and thus, voluntarily ignore memory transfers for the sake of simplic- they should become a first-order concern, just like in proces- ity,37, 43 or they directly plug their computational accelerator sors, rather than an element factored in accelerator design on to memory via a more or less sophisticated DMA.2, 12, 19 a second step. In this article, we introduce a series of hard- While efficient implementation of computational primi- ware accelerators (i.e., the DianNao family) designed for ML tives is a first and important step with promising results, (especially neural networks), with a special emphasis on the inefficient memory transfers can potentially void the through- impact of memory on accelerator design, performance, and put, energy, or cost advantages of accelerators, that is, an energy. We show that, on a number of representative neural Amdahl’s law effect, and thus, they should become a first- network layers, it is possible to achieve a speedup of 450.65x order concern, just like in processors, rather than an element over a GPU, and reduce the energy by 150.31x on average factored in accelerator design on a second step. Unlike in for a 64-chip DaDianNao system (a member of the DianNao processors though, one can factor in the specific nature of family). memory transfers in target algorithms, just like it is done for accelerating computations. This is especially important in the domain of ML where there is a clear trend towards scal- 1. INTRODUCTION ing up the size of learning models in order to achieve better As architectures evolve towards heterogeneous multi-cores accuracy and more functionality.16, 24 composed of a mix of cores and accelerators, designing hard- In this article, we introduce a series of hardware accelera- ware accelerators which realize the best possible tradeoff tors designed for ML (especially neural networks), including between flexibility and efficiency is becoming a prominent DianNao, DaDianNao, ShiDianNao, and PuDianNao as listed issue. The first question is for which category of applications in Table 1. We focus our study on memory usage, and we one should primarily design accelerators? Together with investigate the accelerator architecture to minimize memory the architecture trend towards accelerators, a second simul- transfers and to perform them as efficiently as possible. taneous and significant trend in high-performance and embedded applications is developing: many of the emerg- 2. DIANNAO: A NEURAL NETWORK ing high-performance and embedded applications, from ACCELERATOR image/video/audio recognition to automatic translation, Neural network techniques have been proved in the past few business analytics, and robotics rely on machine learning years to be state-of-the-art across a broad range of applica- techniques. This trend in application comes together with a tions. DianNao is the first member of the DianNao accel- third trend in machine learning (ML) where a small number erator family, which accommodates state-of-the-art neural of techniques, based on neural networks (especially deep learning techniques16, 26), have been proved in the past few The original version of this paper is entitled “DianNao: A years to be state-of-the-art across a broad range of applica- Small-Footprint, High-Throughput Accelerator for Ubiq- tions. As a result, there is a unique opportunity to design uitous Machine Learning” and was published in Proceed- accelerators having significant application scope as well as ings of the International Conference on Architectural Support high performance and efficiency.4 for Programming Languages and Operating Systems (ASPLOS) Currently, ML workloads are mostly executed on multi- 49, 4 (March 2014), ACM, New York, NY, 269–284. cores using SIMD,44 on GPUs,7 or on FPGAs.2 However, the

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 105 research highlights

Table 1. Accelerators in the DianNao family.

Name Process (nm) Peak performance (GOP/s) Peak power (W) Area (mm2) Applications DianNao 65 452 0.485 3.02 Neural networks DaDianNao 28 5585 15.97 67.73 Neural networks ShiDianNao 65 194 0.32 4.86 Convolutional neural networks PuDianNao 65 1056 0.596 3.51 Seven representative machine learning techniques

a network techniques (e.g., deep learning ), and inherits the Figure 1. Accelerator architecture of DianNao. broad application scope of neural networks.

Control Processor (CP) 2.1. Architecture Inst. DianNao has the following components: an input buffer DM A Instructions

for input neurons (NBin), an output buffer for output neu- Tn rons (NBout), and a third buffer for synaptic weights (SB), NFU-1 NFU-2 NFU-3

connected to a computational block (performing both syn- DM A NBin apses and neurons computations) which we call the Neural Inst. Memor Functional Unit (NFU), and the control logic (CP), see Figure 1. Inst. DM A Neural Functional Unit (NFU). The NFU implements a Tn y Interfac e

functional block of Ti inputs/synapses and Tn output neurons, T

which can be time-shared by different algorithmic blocks of nx Tn NBout neurons. Depending on the layer type, computations at the NFU can be decomposed in either two or three stages. For classifier and convolutional layers: multiplication of syn- apses × inputs, additions of all multiplications, sigmoid. The SB nature of the last stage (sigmoid or another nonlinear func- tion) can vary. For pooling layers, there is no multiplication (no synapse), and the pooling operations can be average or max. Note that the adders have multiple inputs, they are in splitting structures is to tailor the SRAMs to the appropriate fact adder trees, see Figure 1; the second stage also contains read/write width, and the second benefit of splitting storage shifters and max operators for pooling layers. In the NFU, structures is to avoid conflicts, as would occur in a cache. the sigmoid function (for classifier and convolutional layers) Moreover, we implement three DMAs to exploit spatial locality can be efficiently implemented using piecewise linear inter- of data, one for each buffer (two load DMAs for inputs, one

polation ( f(x) = ai × x + bi, x ∈ [xi; xi+1]) with negligible loss of store DMA for outputs). accuracy (16 segments are sufficient).22 On-chip Storage. The on-chip storage structures of DianNao 2.2. Loop tiling can be construed as modified buffers of scratchpads. While a DianNao leverages loop tiling to minimize memory accesses, cache is an excellent storage structure for a general-purpose and thus efficiently accommodates large neural networks. processor, it is a sub-optimal way to exploit reuse because of For the sake of brevity, here we only discuss a classifier layerb

the cache access overhead (tag check, associativity, line size, that has Nn output neurons, fully connected to Ni inputs. We speculative read, etc.) and cache conflicts. The efficient alter- present in Figure 2 the original code of the classifier, as well native, scratchpad, is used in VLIW processors but it is known as the tiled code that maps the classifier layer to DianNao. to be very difficult to compile for. However a scratchpad in a In the tiled code, the loops ii and nn reflects the aforemen-

dedicated accelerator realizes the best of both worlds: efficient tioned fact that the NFU is a functional block of Ti inputs/

storage, and both efficient and easy exploitation of locality synapses and Tn output neurons. On the other hand, input neu- because only a few algorithms have to be manually adapted. rons are reused for each output neuron, but since the num- We split on-chip storage into three structures (NBin, NBout, ber of input neurons can range anywhere between a few tens and SB), because there are three type of data (input neurons, to hundreds of thousands, they will often not fit in the NBin output neurons and synapses) with different characteris- size of DianNao. Therefore, we further tile loop ii (input neu-

tics (e.g., read width and reuse distance). The first benefit of rons) with tile factor Tii. A typical tradeoff of tiling is that improving one reference (here neuron[i] for input neurons) increases the reuse distance of another reference (sum[n] a According to a recent review25 written by LeCun, Bengio, and Hinton, Deep for partial sums of output neurons), so we need to tile for learning allows computational models that are composed of multiple processing the second reference as well, hence loop nnn and the tile layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. b Readers may refer to Ref.5 for details of other layer types.

106 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

Figure 2. Pseudo-code for a classifier layer (top: original code; bottom: tiled code).

for (int n=0; n

for (int nnn=0; nnn

factor Tnn for output neurons partial sums. The layer memory Figure 3. Snapshot of DianNao’s layout. behavior is now dominated by synapses. In a classifier layer, all synapses are usually unique, and thus there is no reuse within the layer. Overall, tiling drastically reduces the total memory bandwidth requirement of the classifier layer, and we observe a ∼50% reduction in our empirical study.5

2.3. Experimental observations We implemented a custom cycle-accurate, bit-accurate C++ simulator of the accelerator. This simulator is also used to measure time in number of cycles. It is plugged to a main memory model allowing a bandwidth of up to 250 GB/s. We also implemented a Verilog version of the accelerator, which uses Tn = Ti = 16 (16 hardware neurons with 16 syn- We employed several representative layer settings as bench- apses each), so that the design contains 256 16-bit trun- marks of our experiments, see Table 2. We report in Figure 4 cated multipliers (for classifier and convolutional layers), the speedups of DianNao over SIMD and GPU. We observe 16 adder trees of 15 adders each (for the same layers, plus that DianNao significantly outperforms SIMD, and the aver- pooling layer if average is used), as well as a 16-input shifter age speedup is 117.87x. The main reasons are twofold. First, and max (for pooling layers), and 16 16-bit truncated mul- DianNao performs 496 16-bit operations every cycle for both tipliers plus 16 adders (for classifier and convolutional lay- classifier and convolutional layers, that is, 62x more ers, and optionally for pooling layers). For classifier and than the peak performance of the SIMD baseline. Second, convolutional layers, multipliers and adder trees are active compared with the SIMD baseline without prefetcher, DianNao every cycle, achieving 256 + 16 × 15 = 496 fixed-point opera- has better latency tolerance due to an appropriate combina- tions every cycle; at 0.98 GHz, this amounts to 452 GOP/s tion of preloading and reuse in NBin and SB buffers. (Giga fixed-point OPerations per second). We have done In Figure 5, we provide the energy reductions of DianNao the synthesis and layout of the accelerator at 65 nm using over SIMD and GPU. We observe that DianNao consumes 21.08x Synopsys tools, see Figure 3. less energy than SIMD on average. This number is actually We implemented an SIMD baseline using the GEM5+McPAT27 more than an order of magnitude smaller than previously combination. We use a 4-issue superscalar x86 core with a reported energy ratios between processors and accelerators; 128-bit (8 × 16-bit) SIMD unit (SSE/SSE2), clocked at 2 GHz. for instance Hameed et al.15 report an energy ratio of about Following a default setting of GEM5, the core has a 192-entry 500x, and 974x has been reported for a small Multi-Layer reorder buffer and a 64-entry load/store queue. The L1 data Perceptron.43 The smaller ratio is largely due to the energy (and instruction) cache is 32 KB and the L2 cache is 2 MB; spent in memory accesses, which was voluntarily not factored both caches are 8-way associative and use a 64-byte line; in the two aforementioned studies. Like in these two accel- these cache characteristics correspond to those of the Intel erators and others, the energy cost of computations has been Core i7. In addition, we also employ NVIDIA K20M (28 nm considerably reduced by a combination of more efficient com- process, 5 GB GDDR5, 3.52 TFlops peak) the GPU baseline. putational operators (especially a massive number of small

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 107 research highlights

Table 2. Benchmark layers for DianNao.

Layer Nx Ny Kx Ky Ni No Description CONV1 500 375 9 9 32 48 Street scene parsing (CNN)12 (e.g., identifying “building,” “vehicle,” etc.) POOL1 492 367 2 2 12 – CLASS1 – – – – 960 20 CONV2* 200 200 18 18 8 8 Detection of faces in YouTube videos (DNN),24 largest NN to date (Google) CONV3 32 32 4 4 108 200 Traffic sign identification for car navigation (CNN)39 POOL3 32 32 4 4 100 – CLASS3 – – – – 200 100 CONV4 32 32 7 7 16 512 Google Street View house numbers (CNN)38 CONV5* 256 256 11 11 256 384 Multi-Object recognition in natural images (DNN),16 winner 2012 ImageNet POOL5 256 256 2 2 256 – competition CONV, convolutional; POOL, pooling; CLASS, classifier.

Nx × Ny is the size of an input feature map, Kx × Ky is the size of a convolutional/pooling window, and Ni and No are numbers of input and output feature maps, respectively. *Indicates private kernels.

Figure 4. Speedups of DianNao over SIMD. energy cost of main memory accesses. We tried to artifi- cially set the energy cost of the main memory accesses in Speedup both the SIMD and accelerator to 0, and we observed that 10000 the average energy reduction of the accelerator increases DianNao over SIMD by more than one order of magnitude, in line with previous results. 100 We have also explored different parameter settings for DianNao in our experimental study, where we altered the size of NFU as well as sizes of NBin, NBout, and SB. For example, 1 we evaluated a design with Tn = 8 (i.e., the NFU only has 8 hard- ware neurons), and thus 64 multipliers in NFU-1. We corre- spondingly reduced widths of all buffers to fit the NFU. As a 0.01 result, the total area of that design is 0.85 mm2, which is 3.59x

SS1 SS3 A A smaller than for the case of Tn = 16. CONV1 CONV2 CONV3 CONV4 CONV5 POOL1 POOL3 POOL5 CL CL GeoMean 3. DADIANNAO: A MACHINE LEARNING SUPERCOMPUTER In the ML community, there is a significant trend towards Figure 5. Energy reductions of DianNao over SIMD. increasingly large neural networks. The recent work 100 of Krizhevsky et al.20 achieved promising accuracy on the Energy reduction ImageNet database8 with “only” 60 million parameters.

DianNao over SIMD There are recent examples of a 1-billion parameter neural network.24 Although DianNao can execute neural networks at different scales, it has to store the values of neurons and synapses in main memory when accommodating large neu- 0 ral networks. Frequent main memory accesses greatly limit the performance and energy-efficiency of DianNao. While 1 billion parameters or more may come across as a large number from a ML perspective, it is important to real- ize that, in fact, it is not from a hardware perspective: if each parameter requires 64 bits, that only corresponds to 8 GB 1 (and there are clear indications that fewer bits are sufficient). SS1 SS3 A A While 8 GB is still too large for a single chip, it is possible to CONV1 CONV2 CONV3 CONV4 CONV5 POOL1 POOL3 POOL5 CL CL GeoMean imagine a dedicated ML computer composed of multiple chips, each chip containing specialized logic together with enough RAM that the sum of the RAM of all chips can contain 16-bit fixed-point truncated multipliers in our case), and the whole neural network, requiring no main memory. small custom storage located close to the operators (64-entry In large neural networks, the fundamental issue is the NBin, NBout, SB, and the NFU-2 registers). As a result, there memory storage (for reuse) or bandwidth requirements (for is now an Amdahl’s law effect for energy, where any further fetching) of the synapses of two types of layers: convolu- improvement can only be achieved by bringing down the tional layers with private kernels, and classifier layers (which

108 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

are usually fully connected, and thus have lots of synapses). to the actual number of neurons found in large layers. As a In DaDianNao, we tackle this issue by adopting the follow- result, for each set of input neurons broadcasted to all tiles, ing design principles: (1) we create an architecture where multiple different output neurons are being computed synapses are always stored close to the neurons which will on the same hardware neuron. The intermediate values of use them, minimizing data movement, saving both time these neurons are saved back locally in the tile RAM. When and energy; the architecture is fully distributed, there is the computation of an output neuron is finished (all input no main memory (Swanson et al. adopted a similar strategy neurons have been factored in), the value is sent through the dataflow computing41); (2) we create an asymmetric architec- fat tree to the center of the chip to the corresponding (out- ture where each node footprint is massively biased towards put neurons) central RAM bank. storage rather than computations; (3) we transfer neurons Storage. In some of the largest known neural networks, the values rather than synapses values because the former are storage size required by a layer typically range from less than orders of magnitude fewer than the latter in the aforemen- 1 MB to about 1 GB, most of them ranging in the tens of MB. tioned layers, requiring comparatively little external (across While SRAMs are appropriate for caching purposes, they are not chips) bandwidth; (4) we enable high internal bandwidth by dense enough for such large-scale storage. However, eDRAMs breaking down the local storage into many tiles. are known to have a higher storage density. For instance, a The general architecture of DaDianNao is a set of nodes, 10 MB SRAM memory requires 20.73 mm2 at 28 nm,30 while an one per chip, all identical, arranged in a classic mesh topol- eDRAM memory of the same size and at the same technology ogy. Each node contains significant storage, especially for node requires 7.27 mm2,45 that is, a 2.85x higher storage den- synapses, and neural computational units (the classic pipe- sity. In each DaDianNao node, we have implemented 16 tiles, line of multipliers, adder trees and nonlinear functions all using eDRAMs as their on-chip storage. Each tile has four implemented via linear interpolation), which we still call eDRAM banks (see Figure 6), each bank contains 1024 rows of NFU for the sake of consistency with the DianNao accelera- 4096 bits, thus the total eDRAM capacity in one tile is 4 × 1024 tor.5 Here we briefly introduce some key characteristics of the × 4096 = 2 MB. The central eDRAM in each node (see Figure 6) node architecture. has a size of 4 MB. Therefore, the total node eDRAM capacity is Tile-based organization. Putting all functional units (e.g., thus 16 × 2 + 4 = 36 MB. adders and multipliers) together in a single computational Interconnect. Because neurons are the only values trans- block (NFU) is an acceptable design choice when the NFU has ferred, and because these values are heavily reused within a moderate area, which is the case of DianNao. However, if we each node, the amount of communications, while significant, significantly scale up the NFU, the data movement between is usually not a bottleneck. As a result, we did not develop a NFU and on-chip storage will require a very high internal custom high-speed interconnect for our purpose, we turned bandwidth (even if we split the on-chip storage), resulting in to commercially available high-performance interfaces, and unacceptably large wiring overheads.6 To address this issue, we used a HyperTransport (HT) 2.0 IP block. The multinode we adopt a tile-based organization in each node, see Figure 6. DaDianNao system uses a simple 2D mesh topology, thus Each tile contains an NFU and four RAM banks to store syn- each chip must connect to four neighbors via four HT2.0 apses between neurons. IP blocks. When accommodating a neural network layer, the output We implemented a custom cycle-accurate bit-accurate C++ neurons are spread out in the different tiles, so that each simulator for the performance evaluation of the DaDianNao NFU can simultaneously process 16 input neurons of 16 out- architecture. We also implemented a Verilog version of put neurons (256 parallel operations). All the tiles are con- the DaDianNao node, and have done the synthesis and lay- nected through a fat tree which serves to broadcast the input out at a 28 nm process (see Figure 7 for the node layout). neurons values to each tile, and to collect the output neurons The node (chip), clocked at 606 MHz, consumes an area of values from each tile. At the center of the chip, there are two 67.73 mm2, and has the peak performance 5585 GOP/s. We special RAM banks, one for input neurons, the other for out- evaluated an architecture with up to 64 chips. On a sample put neurons. It is important to understand that, even with of the largest existing neural network layers, we show that a a large number of tiles and chips, the total number of hard- single DaDianNao node achieves a speedup of 21.38x over ware output neurons of all NFUs, can still be small compared the NVIDIA K20M GPU and reduces energy by 330.56x on

Figure 6. DaDianNao architecture: tile-based organization of a node Figure 7. Snapshot of DaDianNao’s node layout. (left) and tile architecture (right). HT0 PHY HT0 Controller Data Tile0Tile1 Tile4Tile5 to SB

HT2.0 (North Link) HT3 PHY SB SB

HT2.0 (West Link) Tile2Tile3 Tile6Tile7

tile tile tile tile HT2.0 (East Link ) eDRAM eDRAM

Bank0 Bank1 HT2 HT3 tile tile tile tile Controller Central Block Controller eDRAM NFU router 16 16 Tile8Tile9 Tile12 Tile13 tile tile tile tile HT2 PH Y input output SB SB neurons neurons Tile10 Tile11 HT1 Tile14 Tile15 tile tile tile tile eDRAM eDRAM Controller Bank2 Bank3 HT2.0 (South Link) HT1 PHY

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 109 research highlights

average; a 64-node system achieves a speedup of 450.65x classification of linearly separable data, complex neu- over the NVIDIA K20M GPU and reduces energy by 150.31x ral networks can easily become over-fitting, and perform on average.6 worse than a linear classifier. In application domains such as financial quantitative trading, linear regression is more 4. SHIDIANNAO: A LOW-POWER ACCELERATOR FOR widely used than neural network due to the simplicity and CONVOLUTIONAL NEURAL NETWORK interpretability of linear model.3 The famous No-Free-Lunch DaDianNao targets at high-performance ML applications, theorem, though was developed under certain theoretical and integrates eDRAMs in each node to avoid main mem- assumptions, is a good summary of the above situation: any ory accesses. In fact, the same principle is also applicable to learning technique cannot perform universally better than embedded systems, where energy consumption is a critical another learning technique.46 In this case, it is a natural idea dimension that must be taken into account. In a recent study, to further extend DianNao/DaDianNao to support a basket we focused on image applications in embedded systems, and of diverse ML techniques, and the extended accelerator will designed a dedicated accelerator (ShiDianNao9) for a state-­ have much broader application scope than its ancestors do. of-the-art deep learning technique called CNN.26 PuDianNao is a hardware accelerator accommodating In a broad class of CNNs, it is assumed that each neuron seven representative ML techniques, that is, k-means, k-NN, (of a feature map) shares its weights with other neurons, naive bayes, support vector machine, linear regression, clas- making the total number of weights far smaller than in fully sification tree, and deep neural network. PuDianNao consists connected networks. For instance, a state-of-the-art CNN has of several Functional Units (FUs), three data buffers (HotBuf, 60 millions weights21 versus up to 1 billion24 or even 10 billions ColdBuf, and OutputBuf), an instruction buffer (InstBuf), for state-of-the-art deep networks. This simple property can a control module, and a DMA, see Figure 9. The functional have profound implications for us: we know that the highest unit for machine learning (MLU) is designed to support sev- energy expense is related to memory behaviors, in particular eral basic yet important computational primitives. As illus- main memory (DRAM) accesses, rather than computation.40 trated in Figure 10, the MLU is divided into 6 pipeline stages Due to the small memory footprint of weights in CNNs, it is (Counter, Adder, Multiplier, Adder tree, Acc, and Misc), and possible to store a whole CNN within a small on-chip SRAM different combinations of selected stages collaboratively com- next to the functional units, and as a result, there is no lon- pute primitives that are common in representative ML tech- ger a need for DRAM memory accesses to fetch the CNN niques, such as dot product, distance calculations, counting, model (weights) in order to process each input. sorting, nonlinear functions (e.g., sigmoid and tanh) and so The absence of DRAM accesses combined with a care- on. In addition, there are some less common operations that ful exploitation of the specific data access patterns within CNNs allows us to design the ShiDianNao accelerator which Figure 9. Accelerator architecture of PuDianNao. is 60× more energy efficient than the DianNao accelerator (c.f., Figure 8). We present a full design down to the layout InstBuf HotBuf ColdBuf at a 65 nm process, with an area of 4.86 mm2 and a power of 320 mW, but still over 30x faster than NVIDIA K20M GPU. The detailed ShiDianNao architecture was presented at MLU MLU FUs MLU MLU 42nd ACM/IEEE International Symposium on Computer Control Module Architecture (ISCA’15).9 ALU ALU ALU ALU

5. PUDIANNAO: A POLYVALENT MACHINE LEARNING DMA OutputBuf ACCELERATOR While valid to many different ML tasks, DianNao, DaDianNao, and ShiDianNao can only accommodate neural networks. However, neural networks might not always be the best choice in every application scenario, even if regardless of Figure 10. Implementation of Machine Learning Unit (MLU). their high computational complexity. For example, in the Counter Adder Multiplier Adder tree Acc Misc

Bypass Bypass Figure 8. Snapshot of ShiDianNao’s layout. AC C

< ò NBin AC C k-Sorte r

SB NFU AC C <

NBout IB

110 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

are not supported by the MLUs (e.g., division and conditional network.13, 29, 42 However, few previous investigations on ML assignment), which will be supported by small Arithmetic accelerators can simultaneously address (a)computational Logic Units (ALUs). primitives and (b)locality properties of (c)diverse representative On the other hand, loop tiling can effectively exploit the machine learning techniques. data locality of ML techniques (as we have revealed in Section 2.2). For tiled versions of different ML techniques, we further 7. CONCLUSION observed that average reuse distances of variables often cluster In this article, we conduct an in-depth discussion on hardware into two or three classes.28 Therefore, we put three separate accelerations of ML. Unlike previous studies that mainly focus on-chip data buffers in the PuDianNao accelerator: HotBuf, on implementing major computational primitives of ML ColdBuf, and OutputBuf. HotBuf stores the input data which techniques, we also optimize memory structures of the accel- have short reuse distance, ColdBuf stores the input data with erators to reduce/remove main memory accesses, which sig- relative longer reuse distance, and OutputBuf stores output nificantly improve the energy-efficiency of ML compared with data or temporary results. systems built with general-purpose CPUs or GPUs. We implemented a cycle-accurate C simulator and a Verilog We devoted DianNao, DaDianNao, and ShiDianNao to neu- version of the accelerator, which integrates 16 MLUs (each ral network (deep learning) techniques, in order to achieve the has 49 adders and 17 multipliers), a 8 KB HotBuf (8 KB), rare combination of efficiency (due to the small number of tar- a 16 KB ColdBuf and a 8 KB OutputBuf. PuDianNao can get techniques) and broad application scope. However, using achieve a peak performance of 16 × (49 + 17) × 1 = 1056 GOP/s large-scale neural networks might not always be a promising at 1 GHz frequency, almost approaching the performance choice due to the well-known No-Free-Lunch Theorem,46 as of a modern GPU. We have done the synthesis and layout well as the distinct requirements in different application sce- of PuDianNao at a 65 nm process using Synopsys tools, narios. Therefore, we also develop the PuDianNao accelerator see Figure 11 for the layout. On 13 critical phases of seven which extends the basic DianNao architecture to support a representative ML techniques,28 the average speedups of basket of seven ML techniques. PuDianNao over Intel Xeon E5-4620 SIMD CPU (32 nm pro- Following the spirit of PuDianNao, we will further study a per- cess) and NVIDIA K20M GPU (28 nm process) are 21.29x and vasive accelerator for ML in our future work, and the purpose is to 1.20x, respectively. PuDianNao is significantly more energy- energy-efficiently accommodate a very broad range of ML tech- efficient than two general-purpose baselines, given that its niques. A comprehensive review on representative ML techniques power dissipation is only 0.596 W. can help to extract common computational primitives and local- ity properties behind the techniques. However, a straightforward 6. RELATED WORK hardware implementation of functional units and memory struc- Due to the end of Dennard scaling and the notion of Dark tures simply matching all extracted algorithmic characteristics Silicon,10, 35 architecture customization is increasingly viewed could be expensive, because it integrates redundant components as one of the most promising paths forward. So far, the empha- to resist significant diversity among ML techniques. This issue sis has been especially on custom accelerators. There have can probably be addressed by a reconfigurable ASIC accelerator, been many successful studies, working on either approxi- which supports dynamic reconfiguration of functional units mation of program functions using ML,11 or acceleration of and memory structures to adapt to diverse techniques. Such an ML itself. Yeh et al. designed a k-NN accelerator on FPGA.47 accelerator only involves a moderate number of coarse-grained Manolakos and Stamoulias designed two high-performance reconfigurable parameters, which is significantly more energy- parallel array architectures for k-NN.33, 40 There have also efficient than Field Programmable Gate Array (FPGA) having been dedicated accelerators for k-means14, 18, 34 or support vec- millions of controlling parameters. tor machine,1, 36 due to their broad applications in industry. Majumdar et al. proposed an accelerator called MAPLE, which Acknowledgments can accelerate matrix/vector and ranking operation used in This work is partially supported by the NSF of China five ML technique families (including neural network, sup- (under Grants 61133004, 61303158, 61432016, 61472396, port vector machine and k-means).31, 32 Recent advances on 61473275, 61522211, 61532016, 61521092), the 973 Program deep learning23 even triggers the rebirth of hardware neural of China (under Grant 2015CB358800), the Strategic Priority Research Program of the CAS (under Grants XDA06010403 and XDB02040009), the International Figure 11. Snapshot of PuDianNao’s layout. Collaboration Key Program of the CAS (under Grant 171111KYS-B20130002), the 10,000 talent program, a Google Faculty Research Award, and the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI).

References 1. Cadambi, S., Durdanovic, I., Jakkula, V., Machines, 2009. FCCM’09 (2009) Sankaradass, M., Cosatto, E., IEEE, 115–122. Chakradhar, S., Graf, H.P. A massively 2. Chakradhar, S., Sankaradas, M., parallel fpga-based coprocessor Jakkula, V., Cadambi, S. A dynamically for support vector machines. In configurable coprocessor for 17th IEEE Symposium on Field convolutional neural networks. In Programmable Custom Computing International Symposium on Computer

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 111 research highlights

Architecture (Saint Malo, France, June on Computer Architecture (New York, 30. Maeda, N., Komatsu, S., Morimoto, M., 39. Sermanet, P., LeCun, Y. Traffic 2010). ACM 38(3): 247–257. New York, USA, 2010). ACM, 38(3): Shimazaki, Y. A 0.41 µa standby sign recognition with multi-scale 3. Chan, E. Algorithmic Trading: Winning 37–47. leakage 32 kb embedded SRAM with convolutional networks. In Strategies and Their Rationale. John 16. Hinton, G., Srivastava, N. Improving low-voltage resume-standby utilizing International Joint Conference on Wiley & Sons, 2013. neural networks by preventing all digital current comparator in Neural Networks (July 2011). IEEE, 4. Chen, T., Chen, Y., Duranton, M., Guo, Q., co-adaptation of feature detectors. 28 nm hkmg CMOS. In International 2809–2813. Hashmi, A., Lipasti, M., Nere, A., Qiu, S., arXiv preprint arXiv: …, 1–18, 2012. Symposium on VLSI Circuits 40. Stamoulias, I., Manolakos, E.S. Sebag, M., Temam, O. BenchNN: On the 17. Hussain, H.M., Benkrid, K., Seker, H., (VLSIC), 2012. Parallel architectures for the broad potential application scope of Erdogan, A.T. Fpga implementation of 31. Majumdar, A., Cadambi, S., Becchi, M., KNN classifier–design of soft IP hardware neural network accelerators. k-means algorithm for bioinformatics Chakradhar, S.T., Graf, H.P. A cores and FPGA implementations. In International Symposium on application: An accelerated approach massively parallel, energy efficient ACM Transactions on Embedded Workload Characterization, 2012. to clustering microarray data. In 2011 programmable accelerator for Computing Systems (TECS) 13, 2 5. Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., NASA/ESA Conference on Adaptive learning and classification.ACM (2013), 22. Chen, Y., Temam, O. Diannao: A small- Hardware and Systems (AHS) (2011). Trans. Arch. Code Optim. (TACO) 9, 1 41. Swanson, S., Michelson, K., Schwerin, footprint high-throughput accelerator IEEE, 248–255. (2012), 6. A., Oskin, M. Wavescalar. In ACM/ for ubiquitous machine-learning. 18. Keckler, S. Life after Dennard and 32. Majumdar, A., Cadambi, S., IEEE International Symposium on In International Conference on how I learned to love the Picojoule Chakradhar, S.T. An energy-efficient Microarchitecture (MICRO) Architectural Support for Programming (keynote). In International Symposium heterogeneous system for embedded (Dec 2003). IEEE Computer Languages and Operating Systems on Microarchitecture, Keynote learning and classification.Embedded Society, 291. (ASPLOS), (March 2014). ACM 49(4): presentation, Sao Paolo, Dec. 2011. Systems Letters 3, 1 (2011), 42–45. 42. Temam, O. The rebirth of neural 269–284. 19. Kim, J.Y., Kim, M., Lee, S., Oh, J., Kim, 33. Manolakos, E.S., Stamoulias, I. IP- networks. In International 6. Chen, Y., Luo, T., Liu, S., Zhang, S., K., Yoo, H.-J.A. GOPS 496 mW real-time cores design for the KNN classifier. Symposium on Computer He, L., Wang, J., Li, L., Chen, T., Xu, Z., multi-object recognition processor with In Proceedings of 2010 IEEE Architecture, (2010). Sun, N., Temam, O. Dadiannao: A bio-inspired neural perception engine. International Symposium on Circuits 43. Temam, O. A defect-tolerant machine-learning supercomputer. In IEEE Journal of Solid-State Circuits and Systems (ISCAS) (2010). IEEE, accelerator for emerging high- ACM/IEEE International Symposium 45, 1 (Jan. 2010), 32–45. 4133–4136. performance applications. In on Microarchitecture (MICRO) 20. Krizhevsky, A., Sutskever, I., Hinton, G. 34. Maruyama, T. Real-time k-means International Symposium on (December 2014). IEEE Computer Imagenet classification with deep clustering for color images on Computer Architecture (Sep 2012). Society, 609–622. convolutional neural networks. In reconfigurable hardware. In 18th Portland, Oregon, 40(3), 356–367. 7. Coates, A., Huval, B., Wang, T., Wu, D.J., Advances in Neural Information International Conference on 44. Vanhoucke, V., Senior, A., Mao, M.Z. Ng, A.Y. Deep learning with cots HPC Processing Systems (2012), 1–9. Pattern Recognition (ICPR) (Aug Improving the speed of neural systems. In International Conference 21. Krizhevsky, A., Sutskever, I., Hinton, G. 2006). IEEE, Volume 2, 816–819. networks on CPUs. In Deep Learning on Machine Learning, 2013: 1337–1345. ImageNet classification with deep 35. Muller, M. Dark silicon and the and Unsupervised Feature Learning 8. Deng, J. Dong, W., Socher, R., Li, L.-J., convolutional neural networks. In internet. In EE Times “Designing Workshop (NIPS) (2011). Vol. 1. Li, K., Fei-Fei, L. ImageNet: A large- Advances in Neural Information with ARM” Virtual Conference, 26, 45. Wang, G., Anand, D., Butt, N., Cestero, A., scale hierarchical image database. In Processing Systems (2012) 1–9. 70(2010), 285–288. Chudzik, M., Ervin, J., Fang, S., Conference on Computer Vision and 22. Larkin, D., Kinane, A., O’Connor, N.E. 36. Papadonikolakis, M., Bouganis, C. A Freeman, G., Ho, H., Khan, B., Kim, B., Pattern Recognition (CVPR) (2009). Towards hardware acceleration heterogeneous FPGA architecture for Kong, W., Krishnan, R., Krishnan, S., IEEE, 248–255. of neuroevolution for multimedia support vector machine training. In Kwon, O., Liu, J., McStay, K., Nelson, E., 9. Du, Z., Fasthuber, R., Chen, T., Ienne, P., processing applications on mobile 2010 18th IEEE Annual International Nummy, K., Parries, P., Sim, J., Li, L., Luo, T., Feng, X., Chen, Y., devices. In Neural Information Symposium on Field-Programmable Takalkar, R., Tessier, A., Todi, R., Temam, O. Shidiannao: Shifting vision Processing (2006). Springer, Berlin Custom Computing Machines (FCCM) Malik, R., Stiffler, S., Iyer, S. Scaling processing closer to the sensor. Heidelberg, 1178–1188. (May 2010). IEEE, 211–214. deep trench based EDRAM on SOI In Proceedings of the 42nd ACM/ 23. Le, Q.V. Building high-level features 37. Qadeer, W., Hameed, R., Shacham, O., to 32 nm and beyond. In IEEE IEEE International Symposium on using large scale unsupervised Venkatesan, P., Kozyrakis, C., International Electron Devices Computer Architecture (ISCA’15) learning. In 2013 IEEE International Horowitz, M.A. Convolution engine: Meeting (IEDM) (2009). IEEE, 1–4. (2015). ACM, 92–104. Conference on Acoustics, Speech and Balancing efficiency & flexibility 46. Wolpert, D.H. The lack of a priori 10. Esmaeilzadeh, H., Blem, E., Amant, R.S., Signal Processing (ICASSP) (2013). in specialized computing. In distinctions between learning Sankaralingam, K., Burger, D. Dark IEEE, 8595–8598. International Symposium on algorithms. Neural Comput. 8, 7 silicon and the end of multicore 24. Le, Q.V., Ranzato, M.A., Monga, R., Computer Architecture, 2013). ACM, (1996), 1341–1390. scaling. In Proceedings of the 38th Devin, M., Chen, K., Corrado, G.S., 41(3), 24–35. 47. Yeh, Y.-J., Li, H.-Y., Hwang, W.-J., Fang, International Symposium on Computer Dean, J., Ng, A.Y. Building 38. Sermanet, P., Chintala, S., LeCun, Y. C.-Y. Fpga implementation of KNN Architecture (ISCA) (June 2011). high-level features using large Convolutional neural networks classifier based on wavelet transform IEEE, 365–376. scale unsupervised learning. In applied to house numbers digit and partial distance search. In Image 11. Esmaeilzadeh, H., Sampson, A., International Conference on Machine classification. InPattern Recognition Analysis (June 2007). Springer Berlin Ceze, L., Burger, D. Neural acceleration Learning, June 2012. (ICPR), …, 2012. Heidelberg, 512–521. for general-purpose approximate 25. LeCun, Y., Bengio, Y., Hintion, G. Deep programs. In Proceedings of the 2012 learning. Nature 521, 7553 (2015), 45th Annual IEEE/ACM International 436–444. Yunji Chen, Tianshi Chen, Zhiwei Xu, Olivier Temam ([email protected]), Symposium on Microarchitecture 26. Lecun, Y., Bottou, L., Bengio, Y., and Ninghui Sun ({cyj, chentianshi, Inria Saclay, France. (Dec 2012). IEEE Computer Society, Haffner, P. Gradient-based learning zxu, snh}@ict.ac.cn), SKL of Computer 449–460. applied to document recognition. Architecture, ICT, CAS, China. 12. Farabet, C., Martini, B., Corda, B., Proceedings of the IEEE 86 11 Akselrod, P., Culurciello, E., LeCun, Y. (1998), 2278–2324. NeuFlow: A runtime reconfigurable 27. Li, S., Ahn, J.H., Strong, R.D., dataflow processor for vision. InCVPR Brockman, J.B., Tullsen, D.M., Workshop (June 2011). IEEE, 109–116. Jouppi, N.P. McPAT: An integrated 13. Farabet, C., Martini, B., Corda, B., power, area, and timing modeling Akselrod, P., Culurciello, E., LeCun, Y. framework for multicore and Neuflow: A runtime reconfigurable manycore architectures. In Proceedings dataflow processor for vision. In2011 of the 42nd Annual IEEE/ACM IEEE Computer Society Conference International Symposium on on Computer Vision and Pattern Microarchitecture, MICRO 42 (New Recognition Workshops (CVPRW) York, NY, USA, 2009). ACM, 469–480. (2011). IEEE, 109–116. 28. Liu, D., Chen, T., Liu, S., Zhou, J., 14. Frery, A., de Araujo, C., Alice, H., Zhou, S., Teman, O., Feng, X., Zhou, X., Cerqueira, J., Loureiro, J.A., de Lima, M.E., Chen, Y. Pudiannao: A polyvalent Oliveira, M., Horta, M., et al. machine learning accelerator. Hyperspectral images clustering on In International Conference on reconfigurable hardware using the Architectural Support for Programming k-means algorithm. In Proceedings Languages and Operating Systems of the 16th Symposium on Integrated (ASPLOS) (2015). ACM, 369–381. Circuits and Systems Design, 2003. 29. Maashri, A.A., Debole, M., Cotter, M., SBCCI 2003 (2003). IEEE, 99–104. Chandramoorthy, N., Xiao, Y., 15. Hameed, R., Qadeer, W., Wachs, M., Narayanan, V., Chakrabarti, C. Azizi, O., Solomatnikov, A., Lee, B.C., Accelerating neuromorphic vision Richardson, S., Kozyrakis, C., algorithms for recognition. In Horowitz, M. Understanding sources Proceedings of the 49th Annual of inefficiency in general-purpose Design Automation Conference chips. In International Symposium (2012). ACM, 579–584. © 2016 ACM 0001-0782/16/11 $15.00

112 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 DOI:10.1145/2996866 To view the accompanying paper, Technical Perspective visit doi.acm.org/10.1145/2996868 rh FPGA Compute Acceleration Is First About Energy Efficiency By James C. Hoe

THE FOLLOWING PAPER presents a re- formance improvements by the middle these abstraction overheads and at the search deployment of Field Program- of the last decade. same time gains unrestricted optimiza- mable Gate Arrays (FPGAs) in a Micro- In the present power-constrained tion freedom, in particular the ability soft Bing datacenter. The FPGAs enabled design regime—whether dictated by to exploit all forms and granularity of more efficient search processing directly the cooling and packaging of a micro- parallelism in a task. In return, FPGA at the digital logic level. This pioneering processor or the air handling capacity computing places a great burden on ap- work is the first to successfully demon- of a datacenter, plus nowadays actual plication developers to design at a low strate at scale the idea of FPGAs as an ef- supply-side considerations in battery level of specificity to meet functional fective large-scale, first-class component powered devices—the problem of get- and performance objectives. How to in cloud computing. Following this land- ting more performance (“operations simplify application development, while mark effort, Microsoft has launched full- per second”) requires a solution that retaining FPGA’s efficiency advantage, is scale production deployment of FPGA can somehow do so while expending an important and challenging problem. accelerators in its new datacenter servers less “energy per operation.” Mapping computations to ASICs has for a range of cloud services. Parallelism provided the first solu- the same benefits as FPGAs. In fact, an For most of its 30-year existence, tion to running faster while using less ASIC implementation can be significant- FPGA technology had primarily served energy in each step. As a rule of thumb, ly more energy efficient than an FPGA as an alternative to application-specific it takes proportionally more power to implementation due to the overhead in integrated circuits (ASICs), with only increase the performance of a sequen- making the FPGAs’ logic substrate repro- niche applications in computing. To- tial task. Therefore, when parallelism grammable. But reprogrammability is day, besides Microsoft’s activities, we is available, one could use parallel ex- very important to computing. As argued are also finding Intel and IBM adding ecution to reduce power instead of for in the following paper, besides the utility FPGAs as programmable computing speedup. In an illustrative example, of repurposing a hardware investment substrates to their product lines. At given design A with throughput per- over its multiyear ownership, a particu- the root of the computing industry’s formance ThroughputA at PowerA and lar accelerated task of interest can evolve current embrace of FPGAs is the same a lower-performing design B with too quickly to be committed into ASIC

Power Wall struggle that brought about ThroughputB = ThroughputA/N and development time and cost scales. the transition to multicore micropro- PowerB < PowerA/N, we can use N cop- Finally, it should be noted that FPGA cessors in the last decade. ies of B to complete the same number computing is not the answer in every In the decades prior, single-thread- of tasks per second as A at less power, scenario or across all trade-offs. Ulti- ed microprocessors enjoyed a regular and we can use >N copies of B to exceed mately, the quest for efficiency is a mat- doubling of compute performance with the throughput of A at the same PowerA. ter of using the right tool for the job. each new Very Large Integrated Circuits Both multicore microprocessors and Driven by the contradicting needs (VLSI) scaling generation, taking advan- GPGPUs are practitioners of this prin- for more performance and better en- tage of the more numerous and faster ciple. FPGA computing can take this to ergy efficiency, parallel computing in transistors. However, each new genera- further extremes, delivering high over- the form of multicore microprocessors tion of faster microprocessors had also all performance using a sea of individu- seemingly exploded into the commer- required more power. The Power Wall ally slow processing elements that are cial mainstream in the last decade. is less about supplying power but more very energy efficient per operation. This With the Catapult effort, perhaps we about removing the resulting heat fast route to energy efficiency is of course are again seeing the start of something enough. For set upper bounds in cost, only applicable when the computing equally transformative in the pursuit weight, size, and noise of the cooling ap- task of interest is amenable to a high de- of still higher performance and energy paratus, there is a limit to how fast heat gree of parallelization. efficiency. As we watch the current ex- can be extracted from a microproces- Relative to microprocessors, FPGA citing developments unfold across the sor die. Microprocessors remained well computing has another source of energy computing industry, we should also below market-set economical cooling efficiency. The majority of the power in recognize the multiple decades of prior limits until the 1990s. Despite the best a microprocessor is consumed by the work that led up to this pivotal time. concerted efforts from software down overhead in presenting the simplifying to material science to reign in the power von Neumann abstraction to the pro- James C. Hoe ([email protected]) is a professor of electrical and computer engineering at Carnegie Mellon University, increase in the ensuing years, single- grammer and in ensuring good perfor- Pittsburgh, PA. threaded microprocessors ran out of mance across a wide range of program cooling headroom to sustain their per- behaviors. FPGA computing avoids Copyright held by author.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 113 research highlights

DOI:10.1145/2996868 A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services By Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger

Abstract general-purpose processors to make datacenters even more Datacenter workloads demand high computational capa- powerful and cost-effective. These improvements have been bilities, flexibility, power efficiency, and low cost. It is chal- largely driven by Moore’s law that predicts an exponential lenging to improve all of these factors simultaneously. To growth in the number of transistors over time, and by Dennard advance datacenter capabilities beyond what commodity scaling8 that predicts constant power consumption even as server designs can provide, we designed and built a com- the number of transistors increase within a fixed silicon area. posable, reconfigurable hardware fabric based on field In recent years, Dennard scaling has virtually ended, result- programmable gate arrays (FPGA). Each server in the fabric ing in power consumption being roughly proportional to the contains one FPGA, and all FPGAs within a 48-server rack are number of switching transistors. Thus, even though Moore’s interconnected over a low-latency, high-bandwidth network. law continues to provide more transistors for the time being, We describe a medium-scale deployment of this fabric a larger number of transistors must switch proportionally on a bed of 1632 servers, and measure its effectiveness in less frequently to maintain constant power consumption. In accelerating the ranking component of the Bing web search addition, as transistors become smaller, they are becoming engine. We describe the requirements and architecture of more expensive to manufacture, making it even less attrac- the system, detail the critical engineering challenges and tive to pack more and more transistors onto a single chip. solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resil- 1.1. Specialized hardware in the datacenter ience of the system. Under high load, the large-scale recon- One way to improve the performance and efficiency of data- figurable fabric improves the ranking throughput of each center servers is to make better uses of “power-limited” server by 95% at a desirable latency distribution or reduces transistors by specializing servers and their components tail latency by 29% at a fixed throughput. In other words, the to a particular task. In the academic literature, specializa- reconfigurable fabric enables the same throughput using tion has been shown to achieve 10×–100× or more improve- only half the number of servers. ment in energy efficiency over general-purpose processors in many cases, such as for Memcached5, 14 compression/ decompression,13, 15 K-means clustering,9, 12 and parts of web 1. INTRODUCTION search.20 However, while specializing servers for specific Cloud computing has emerged as a dominant paradigm for workloads can provide significant efficiency gains, doing so delivering scalable, reliable, and cost-effective online ser- is problematic in the datacenter for several major reasons. vices to businesses and clients across the world. According First, datacenters, by their very nature, support a wide to the IDC, public spending in IT cloud services will grow variety of applications and specializing for one service will to more than $127B in 2016 as the adoption of cloud com- likely cause inefficiencies and add cost to any other services puting accelerates worldwide.10 This major shift will offer sharing the platform. Second, specialization compromises enormous potential in unlocking new applications and in homogeneity, which is highly desirable in the datacenter improving the performance, security, and cost of computing. environment to reduce management issues and to provide The majority of today’s cloud services are realized using a consistent platform on which applications can rely upon. datacenters, which typically comprise tens if not hundreds Third, datacenter services evolve rapidly, making highly of thousands of servers built from commodity components specialized hardware features impractical and quickly such as general-purpose processors, memory, storage, and obsoleted. Thus, datacenter providers face a conundrum: networking. Datacenters are shared across applications and they need continued improvements in performance and services, providing economies of scale, reliability, scalabil- ity, and shared infrastructure management. The original version of this paper was published in the Datacenter operators have traditionally relied upon con- Proceedings for the 41st ACM/IEEE International Symposium tinuous improvements in the performance and efficiency of on Computer Architecture ( June 14–18, 2014, Minneapolis, MN), 13–24. All authors contributed to this work while employed by Microsoft.

114 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

efficiency, but cannot obtain those improvements from any Compared to a pure software implementation, the combination of standard general-purpose processors and Catapult fabric achieves a 95% improvement in throughput static specialized hardware. at each ranking server with an equivalent latency distribu- tion. At the same throughput as software, Catapult reduces 1.2. Flexible specialized hardware tail latency by 29%. In other words, adding the FPGA enables Programmable hardware, in the form of field programmable the same throughput using only half the number of servers. gate arrays (FPGAs), are devices that could potentially reconcile The system is able to run stably for long periods, with a failure the need for both flexibility and energy efficiency. FPGAs can handling service quickly reconfiguring the fabric upon errors be imagined as silicon “Legos”—collections of simple logic or machine failures. The rest of this article describes the blocks that can be configured and composed to implement Catapult architecture and our measurements in more detail. arbitrary circuits that run at very high efficiency relative to gen- eral-purpose processors. While less energy efficient than hard- 2. BACKGROUND wired Application Specific Integrated Circuits (ASICs), FPGAs FPGAs are digital chips that can be programmed (and can be rapidly reconfigured and adapted to changing work- reprogrammed) for implementing complex digital logic. loads over the lifetime of the server. However, as of this writing, Conceptually, FPGAs consist of an array of programmable logic FPGAs have not been widely deployed as compute accelerators elements, connected by a programmable routing network that in either datacenter infrastructure or client devices. carries signals from where they are generated to where they are One challenge traditionally associated with FPGAs is the consumed. Each of these features are controlled by memory need to fit the accelerated function into the available recon- cells that are configured by the end user. Designs for an FPGA figurable area on one chip. Today’s FPGAs can be virtualized using runtime reconfiguration to support more functions than could fit on a single device. However, the amount of Figure 1. Each server in a rack has a local FPGA attached to the host state that needs to be saved and restored, along with cur- via PCI Express. FPGAs communicate with their neighbors using a private low-latency, high-bandwidth serial network. Multiple FPGAs rent reconfiguration times for standard FPGAs make this can be allocated to a single service, such as the Bing ranking, approach too slow to be practical. Multiple FPGAs could without going through host CPUs. provide more silicon resources, but are extremely difficult to fit into a conventional datacenter server. Even if suffi- cient space were available, multiple FPGAs per server would cost more, consume more power, are wasteful when there Web Search Pipeline are more FPGAs than needed—and even less useful when there are still not enough FPGAs to implement the appli- FPGA FPGA FPGAFFPGA FPGAFPGA cation. On the other hand, being restricted to a single FPGA per server restricts the workloads that might be accelerated and could make the associated gains too small to justify the cost. PCle (8.0 GB/s) SLlll (2.0 GB/s) 400 ns latency 1.3. Catapult reconfigurable fabric Server Server Server Server per hop This article describes a reconfigurable fabric called Catapult, which uses FPGAs to provide the performance and efficiency gains of specialized hardware while simultaneously satisfy- Figure 2. Block diagram representing one tile of an abstract FPGA. ing the strict requirements of the datacenter. As illustrated Each tile is composed of generic logic gates (L), embedded memory in Figure 1, the Catapult fabric is embedded into racks of (RAMs), and specialized arithmetic units (DSPs). These basic tiles servers in the form of a small board with a medium-sized are replicated to create larger and larger FP-GAs. FPGA and local DRAM attached to each server. A unique characteristic of Catapult is that FPGAs are directly wired together in a high-bandwidth, low-latency network, allowing L L L L L services to allocate groups of FPGAs to provide the necessary reconfigurable area to implement the desired functionality. While proving functionality on a small number of servers L L L L L shows the potential of the design, datacenters are character- ized by massive scales and severe power, cost, and reliability RAM RAM RAM RAM constraints. To demonstrate the potential of this technology DSP DSP DSP DSP at datacenter scale, we tested the Catapult reconfigurable fabric, running a widely deployed web search workload L L L L L augmented with failure handling, on a bed of 1632 servers equipped with FPGAs. The experiments show that large L L L L L gains in search throughput and latency are achievable on a real, complex commercial workload using the large-scale reconfigurable fabric.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 115 research highlights

are typically developed in a hardware description language application development requires flexibility and robust- (HDL) such as VHDL or Verilog; these designs are compiled ness. And the physical constraints and uptime requirements down to an FPGA program (called a bitstream), similar to how make it largely impractical to modify or upgrade the hard- a software language is compiled into a software executable. ware after initial delivery. However, specifying a design in HDL is significantly more To succeed in the datacenter environment, an FPGA- complex than for a typical software language, and compilation based reconfigurable fabric must (at a minimum) meet the may take hours to complete. Once the bitstream is generated, following requirements: it can be loaded into an FPGA in a matter of seconds, configur- ing the FPGA to implement the desired computation. • preserve server homogeneity to avoid complex manage- FPGAs cover a middle-ground in performance/efficiency ment of heterogeneous servers, and flexibility compared to fully custom ASICs and general- • scale to large workloads that might not fit into a single purpose CPUs. General-purpose CPUs are the most flexible FPGA, platform, capable of implementing any application. But this • avoid consuming too much power or network flexibility generally comes at the cost of a 100× or greater bandwidth, reduction in performance and energy efficiency compared • avoid single points of failure, and have minimal impact to an ASIC designed for the same task. However, when the on reliability, application changes, the CPU can still run the new applica- • provide positive return on investment (ROI), and tion by simply recompiling the software. When the target • operate within the space and power confines of existing application for an ASIC changes, it likely requires designing servers. a new chip—a process which typically takes months and car- ries huge development costs. These requirements guided the architectural choices we FPGAs combine aspects of both CPUs and ASICs. They can made throughout the Catapult system development. be reprogrammed as applications change to provide CPU- like flexibility, but the program produces hardware which is 3.1. Integration specialized to the application, providing performance and How to integrate FPGAs into the datacenter is perhaps the power efficiency closer to ASICs. The generic FPGA logic can most important consideration when designing a recon- be configured to exploit huge amounts of fine-grained par- figurable fabric. We investigated a variety of approaches allelism and can implement very complex pipelined struc- which can be roughly broken into two categories based on tures that can be several orders-of-magnitude faster and how they integrate with conventional servers: “networked” lower power than their software equivalents. and “integrated.” Networked designs add FPGAs to special Modern FPGAs provide millions of gates of random logic FPGA-enabled servers, and arrange the servers either as and can include multiple megabytes of internal storage that entire racks of specialized servers or embedding some num- supports thousands of reads and writes on every clock cycle. ber of specialized servers in otherwise conventional racks. These chips also incorporate smaller fully custom blocks, Integrated designs add FPGAs directly inside the conven- such as complex embedded arithmetic elements (DSP tional servers, requiring no specialized servers and no net- blocks), high-speed I/O, IP protocol blocks, and may include work communication to reach the FPGA. complete microprocessors as well, either as hardened sub- Networked designs have been developed and deployed systems or by mapping the microprocessor logic into the in High Performance Computing (HPC) environments. programmable logic of the chip. While HPC systems are not subject to the same scale, cost, FPGAs have been available for several decades and they power, and homogeneity constraints as the datacenter, they have huge potential for accelerating a wide variety of appli- are one place where integrating FPGAs with CPUs has seen cations. Yet despite this promise, FPGAs have not been some success. Entire large systems have been built using widely deployed as compute accelerators, even in datacen- only specialized servers, including the Cray XD-1,7 Novo-G,11 ter environments where their potential for flexibility, power and QP.18 Examples of specialized servers that could be inte- efficiency, and performance should make them extremely grated with racks of conventional datacenter servers include attractive. the Convey HC-2,6 Maxeler MPC series,17 BeeCube BEE4,4 The Catapult architecture described here unlocks the and SRC MAPstation.19 In fact, embedding a few specialized potential of FPGAs in the datacenter. To achieve this, the servers into each rack is the approach that we first took early architecture has to be distributed, scalable, robust, and in the project. However, as we learned from our first proto- work on real-world datacenter-scale workloads. In the fol- type, the networked design approach is inappropriate for lowing sections, we describe the architecture, and show datacenter use for several reasons. how production workloads can finally unlock the potential First, specialized racks and servers are single points of efficiency and performance gains promised for so long by failure, where the failure of the specialized nodes impacts FPGAs. many dependent conventional servers—amplifying the impact to uptime and overall service reliability. 3. CATAPULT ARCHITECTURE Second, specialized racks and servers require separate Datacenters are a challenging environment for any new cooling and power provisioning, as well as different soft- technology to succeed. The scale alone demands extremely ware, firmware, and spare parts provisioning, making man- high reliability, efficiency, and low cost. The rapid pace of agement and maintenance more difficult.

116 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

Third, all communication with these racks/servers goes Figure 3. The FPGA board and the server into which it installs. through the existing network, which (in a datacenter) is not (a) Block diagram of the FPGA board. (b) Picture of the manufactured typically designed for the many-to-one communication pat- board. (c) Diagram of the 1U, half-width server that hosts the FPGA terns to the specialized racks. board. The air flows from left to right, leaving the FPGA in the Finally, all communication between the portion of the exhaust of both CPUs. application running on the conventional server and the por- (a) (b) tion in the FPGA requires going over the network, which, even QSPI FPGA Flash in the datacenter, can have high latency, unreliable deliv- ery, and variable performance. This is particularly difficult ECC SO-DIMM 2 when traffic is exhibiting the many-to-one communication 8GB DDR3 pattern. The unreliable performance makes partitioning (c) the application between CPU and FPGA extremely difficult, especially for latency-sensitive user-facing applications, and reduces the likelihood that an application can be benefi- Airflow Airflow cially offloaded to the reconfigurable fabric. These issues diminish in severity with an increas- ingly distributed design. At the extreme is the integrated design—adding an FPGA to every server. In integrated inter-FPGA network are wired to a connector on the bottom designs, an FPGA failure only impacts one server. All serv- of the board that plugs directly into the motherboard. To ers have the same cooling and power constraints. CPU to avoid changes to the server itself, we custom designed the FPGA communication does not need to go over the net- board to fit within a small 10 cm × 9 cm × 16 mm slot occupy- work at all. And attaching the FPGA directly to the CPU ing the rear of a 1U (4.45 cm high), half-width server, which greatly diminishes communication latency and variance, offered sufficient power and cooling only for a standard enabling finer-grain and more reliable offloading from the 25-Watt PCIe peripheral device. These physical constraints CPU to the FPGA. are challenging for FPGAs, but are prohibitive for modern The homogeneous nature of the integrated design compute-class Graphics Processing Units (GPUs). enables additional benefits which are key to keeping the designs cost-effective. Homogeneous computing resources 3.4. Shell architecture can be divided at arbitrary granularities, easing both short- In typical FPGA programming environments, the user is term and long-term provisioning of servers. Homogeneous often responsible for developing not only the application designs also improve economies of scale leading to more itself but also building and integrating system functions cost-effective designs. required for data marshaling, host-to-FPGA communica- tion, and inter-chip FPGA communication (if available). 3.2. Scalability System integration places a significant burden on the user On its own, integrating only one FPGA per server either lim- and can often exceed the effort needed to develop the appli- its applications to those that can fit into the resources of a cation itself. This development effort is often not portable to single FPGA or suffers the same high-latency and unreliabil- other boards, making it difficult for applications to work on ity problems with communicating over the network as spe- future platforms. cialized racks and servers. Motivated by the need for user productivity and design To overcome this shortcoming, we built a specialized re-usability when targeting the Catapult fabric, we logically network, in addition to the existing Ethernet network, to divide all programmable logic into two partitions: the shell facilitate FPGA-to-FPGA communication. This specialized and the role. The shell is a reusable portion of programma- network takes advantage of the low-latency, high-speed ble logic common across applications targeting the same serial I/O available in the FPGAs to create a two-dimensional board—while the role is the application logic itself, restricted 6 × 8 torus network. The torus topology balances routabil- to a large fixed region of the chip. ity, resilience, and cabling complexity. Each inter-FPGA net- Figure 4 shows a block-level diagram of the shell archi- work link supports 20 Gbits per second in both directions, at tecture, consisting of PCIe with a custom DMA engine, two sub-microsecond latency per hop, and requires only passive DRAM controllers, four high-speed inter-FPGA links run- copper cables with no additional networking costs such as ning the Serial Lite III protocol, the torus network router, network interface cards or switches. Another HPC system, reconfiguration logic, single event upset (SEU) scrubbing, Maxwell,3 takes a similar approach, but again targets HPC and debugging interfaces. workloads and constraints. Role designers access convenient and well-defined inter- faces and capabilities in the shell (e.g., PCIe, DRAM, rout- 3.3. FPGA board design ing, etc.) without concern for managing system correctness. Figure 3 shows the FPGA board and the server into which it The common shell also simplifies software by ensuring installs.16 The board incorporates a midrange Altera Stratix V that all applications support the same API functions for data D5 FPGA,2 which we selected to balance the cost of deploying movement, reconfiguration, and health monitoring. For thousands of FPGAs against the FPGA capacity. The board example, we co-designed the PCIe core and DMA engine to also includes two banks of DDR3-1600 DRAM. The PCIe and achieve very low latency, taking fewer than 10 µs for transfers

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 117 research highlights

Figure 4. Components of the shell architecture. that adding the FPGA card and network cost less than 30% in the total cost of ownership, including a limit of 10% for total 4 GB DDR3-1333 4 GB DDR3-1333 server power. ECC SO-DIMM ECC SO-DIMM 72 72 Shell 3.7. Datacenter deployment DDR3 Core 0 DDR3 Core 1 To test this architecture on a critical production-scale data- center service at scale, we manufactured and deployed the 256 Mb Config 4 Role Flash QSPI fabric in a production datacenter. The deployment con- (RSU) Config Flash sisted of a total of 1632 machines, that were organized into JTAG in 17 server racks. Each server uses two 12-core Intel Xeon Host 8 x8 PCIe LEDs CPUs, 64 Gbytes of DRAM, and two solid-state drives (SSDs) CPU Core Application Temp in addition to four hard-disk drives. The machines have Sensors DMA a standard 10-Gbit Ethernet network card connected to a Engine I2C 48-port top-of-rack switch, which in turn connects to the

xcvr broader datacenter network. The FPGA daughter cards and reconfig Inter-FPGA Router cable assemblies were tested at manufacture and again at SEU system integration. At deployment, we discovered that seven North South East West cards (0.4%) had a hardware failure and that one of the 3264 SLIII SLIII SLIII SLIII links (0.03%) in the cable assemblies was defective. Since

2 2 2 2 then, after several months of operation, we have seen no additional hardware failures.

4. APPLICATION CASE STUDY To study the potential impact of the Catapult fabric, we of 16 KB or less, and multithreading safety. Software devel- ported a significant fraction of Microsoft Bing’s web search opers use simple send and receive calls to transmit data, and ranking engine into reconfigurable hardware. Aside from role designers simply read and write to the PCIe interface being a representative datacenter-scale workload, Bing FIFOs. consumes a significant fraction of the datacenter capacity The shell consumes 23% of each FPGA, although extra at Microsoft and is used to power a number of popular ser- capacity can be obtained by discarding unused functions. If vices such as Yahoo! Search, Apple Siri, and search on XBox desired, partial reconfiguration allows for dynamic switch- One. As an interactive workload, Bing requires both low ing between roles while the shell remains active—even latency and high bandwidth simultaneously. Furthermore, routing inter-FPGA traffic while a reconfiguration is taking Bing has strict resiliency requirements, is operationally com- place—but it comes at the cost of reduced logic area and plex, and is programmed using tens of thousands of lines of RAM availability to the application. production-quality C++ code—making it an excellent candi- date for gauging the viability of reconfigurable computing at 3.5. Resiliency scale in a production environment. At datacenter scales, providing resiliency in the face of In our case study, we devoted the majority of our efforts hardware failures is essential given that such failures occur to the Ranking portion of the Bing search engine, which frequently, while demand for hardware availability is con- presented the largest opportunity for hardware accelera- sistently high. For instance, the fabric must stay available tion. Over 30K lines of C++ code were manually ported to the in the presence of errors, failing hardware, reboots, and Catapult fabric using the Verilog HDL. The implementation updates to the implemented algorithm. FPGAs can poten- generates results that are identical to software—even repro- tially corrupt their neighbors or crash the hosting servers ducing known software bugs. As will be discussed in further if care is not taken during reconfiguration. Our reconfigu- detail below, our implementation requires a total of seven rable fabric further requires a custom protocol to recon- FPGAs to run a single instance of the service—plus one addi- figure groups of FPGAs, remap services to recover from tional spare for redundancy. Without the availability of the failures, and report errors to the management software. In low-latency, high-bandwidth network that interconnects addition, ECC is used on all external memories, and SEU multiple FPGAs, accelerating the Ranking service would not detection and correction is implemented on the FPGA’s have been feasible. configuration memory. 4.1. Accelerating Bing ranking using FPGAs 3.6. Total cost of ownership The Bing search engine is logically divided into multiple To balance the expected per-server performance gains ver- software services spanning a large number of servers in the sus the necessary increase in total cost of ownership, includ- datacenter. When a user’s search query is processed by the ing both increased capital costs and operating expenses, Bing search engine, it is first submitted to a front-end cache we set aggressive power and cost goals to achieve a positive service that stores and delivers the results of previously sub- ROI. We are unable to give cost numbers for our production mitted and popular queries. If the search query cannot be servers due to their business sensitivity; however, we can say located in the cache, it is forwarded to the Selection and

118 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

Ranking services, which perform the actual computation Our FPGA accelerator achieves a significant advantage needed to generate search results. over software using a form of multiple-instruction, single- The Selection service is responsible for accepting a user data computation (MISD). In the FPGA, we instantiate 43 query and selecting which of the billions of documents (e.g., unique feature-extraction state machines that are used to web pages) on the Internet are worthwhile candidates. The compute nearly 4500 features per document-query pair in Ranking service further takes these selected documents and parallel. Each feature is an independent instruction stream, runs them through a sophisticated ranking algorithm that and the data is a single document—hence the MISD par- determines the order in which these documents should be allelism—which the FPGA can exploit far more effectively presented to the user. than CPUs and GPUs. The input to the Bing Ranking algorithm is a “hit vector” Each state machine reads each hit vector one item at a that corresponds to a document-query pair arriving from the time and performs a local calculation. For some features upstream Selection service. A hit vector efficiently encodes that have similar computations, a single state machine is the locations in which words in a user’s query appear responsible for calculating values for multiple features. As an within a given document (e.g., web page). The output of the example, the NumberOfOccurences feature simply counts up Ranking algorithm is a document “score,” which is used to how many times each term (i.e., word) in the query appears. determine the position in which the document is presented At the end of a document, the state machine outputs all non- to the user. zero feature values; for NumberOfOccurences, this could be Conceptually, the Bing ranking algorithm is divided into up to the number of terms in the query. three major stages: (1) Feature Extraction (FE), (2) Free- To support a large collection of state machines working Form Expressions (FFE), and (3) Machine-Learned Scoring in parallel on the same input data at a high clock rate, we (MLS). Figure 5 illustrates a hardware processing pipeline organize the blocks into a tree-like hierarchy and replicate that allocates these stages to an eight-node FPGA pipeline: each input several times. Figure 6 shows the logical organi- one FPGA for FE, two for FFEs, one for a compression stage zation of the FE hierarchy. Hit vectors are fed into a hit vector that increases scoring engine efficiency, and three for MLS. processing state machine, which produces a series of control The eighth FPGA is a spare that allows the ring to be recon- and data tokens that the various feature state machines proc- figured and rotated to keep the ranking pipeline alive in the ess. Each state machine processes each hit vector at a rate of event of a failure. one to two clock cycles per token. When a state machine fin- ishes its computation, it emits one or more feature indexes 4.2. Feature extraction and values that are fed into the feature-gathering network In FE, interesting characteristics of a given document are that coalesces the results from the 43 state machines into a dynamically extracted based on a user’s search query. As a single output stream for the downstream FFE stages. Inputs simple example, the NumberOfOccurences feature simply to FE are double buffered to increase throughput. counts up the number of times a query happens to appear within a given document. In Bing Ranking, there are poten- 4.3. Free-Form expressions tially up to thousands of features that are computed for a FFEs are mathematical combinations of the features given document-query pair. extracted during the feature-extraction stage. FFEs give developers a way to create hybrid features that are not con- veniently specified as feature-extraction state machines. Figure 5. Mapping of ranking roles to FPGAs on the reconfigurable There are typically thousands of FFEs, ranging from simple fabric. Data is sent from each server to the queue manager. It is then dispatched through the seven FPGA computation stages, and the Figure 6. The first stage of the ranking pipeline. Each hit vector is results are sent back to the source server. streamed into the hit vector preprocessing state machine, split CPU 0 into control and data tokens, and issued in parallel to the 43 unique feature state machines. The feature-gathering network collects generated feature and value pairs and forwards them to the next CPU 7 Queue Manager CPU 1 pipeline stage. FE

Spare FFE 0 Hit vector preprocessing FSM Requests & CPU 6 Scoring 2 Scoring FFE 1 CPU 2 Responses

Feature-

Scoring 1 Compress gathering network

Scoring 0 CPU 5 CPU 3

CPU 4 Feature Extraction FSMs

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 119 research highlights

(such as adding two features) to large and complex (with 4.5. Parallelism thousands of operations including conditional execution To overcome the slower clock frequency of FPGAs rela- and complex floating-point operators such as ln, pow, and tive to CPUs and GPUs, each of the scoring stages takes fpdiv). FFEs vary greatly across models, making it impracti- advantage of two forms of parallelism that are not easily cal to synthesize customized datapaths for each expression. handled by the other architectures. First, each processing One potential solution is to tile many off-the-shelf soft stage described here is configured with deep pipelines that processor cores (such as Nios II1), but these single-threaded match the amount of pipeline parallelism available in the cores are inefficient at processing thousands of threads application. with long-latency floating-point operations in the desired Second, FE and FFE exhibit multiple instruction single amount of time per macropipeline stage (8 ms). Instead, we data (MISD) parallelism, a cousin of the more commonly developed a custom multicore processor with massive multi- known single instruction multiple data (SIMD) parallelism threading and long-latency operations in mind. The result is exploited by GPUs and the vector processing units in CPUs. the FFE processor shown in Figure 7. The FFE microarchi- A single source of data (the document) is operated on by a tecture is highly area efficient, letting us instantiate 60 cores very large number of independent instruction streams (fea- on a single FPGA. ture extractors and free form expressions for FE and FFE, The custom FFE processor has three key characteris- respectively). SIMD architectures require the opposite—a tics that make it capable of executing all of the expressions large number of independent data elements operated on by within the required deadline. First, each core supports four the same instruction stream. simultaneous threads that arbitrate for functional units on While SIMD architectures can efficiently process applica- a cycle-by-cycle basis. When one thread is stalled on a long tions with MISD parallelism by batching many sets of data operation such as a floating-point divide or natural log oper- together, this comes at the cost of increased latency, which ation, other threads continue to make progress. All func- is often prohibitive in interactive cloud applications such as tional units are fully pipelined, so any unit can accept a new web search. As such, web ranking is an example of a cloud operation on each cycle. application that FPGAs can accelerate more effectively other Second, rather than fair thread scheduling, threads are parallel processing architectures. statically prioritized using a priority encoder. The assembler maps the expressions with the longest expected latency to 5. EVALUATION thread slot 0 on all cores, then fills in slot 1 on all cores, and We evaluated the Catapult fabric by deploying and mea- so forth. Once all cores have one thread in each thread slot, suring the described Bing ranking engine on a bed of 1632 the remaining threads are appended to the end of previously servers with FPGAs. Six hundred and seventy-two ran the mapped threads, starting again at thread slot 0. ranking service, and the other machines ran the selection Third, the longest-latency expressions are split across service to feed documents and queries to the ranking serv- multiple FPGAs. An upstream FFE unit can perform part of ers. We compare the average and tail latency distributions the computation and produce an intermediate result called of Bing’s production-level ranker running with and without a metafeature. These metafeatures are sent to the down- FPGAs on that bed. stream FFEs like any other feature, effectively replacing that User experience is dictated determined more by tail part of the expression computation with a simple feature latencies rather than average latencies—users care little read. if their search results come back faster than expected, but they become unhappy quickly if the results are slower than 4.4. Document scoring expected. As such, we report performance at the latency of The last stage of the pipeline is a machine-learned model the 95th percentile of queries—the time at which only 5% evaluator that takes the features and FFEs as inputs and pro- of queries are slower. Performance results are very simi- duces a single floating-point score. This score is sent back lar for average latency queries (50th percentile), and are to the search software, and all of the resulting scores for the even better for higher tail latencies (99th percentile and query are sorted and returned to the user in sorted order as 99.9th percentile), which have the biggest impact on user the search results. experience. Figure 8 illustrates how the FPGA-accelerated ranker substantially reduces the end-to-end scoring latency and Figure 7. Free-form expressions (FFEs) placed and routed on an FPGA. Sixty cores fit on a single FPGA. improves throughput relative to software. There are two ways to view the performance improvements on this graph. First, for a fixed point on the x-axis, it shows the improvement in throughput at that specified query latency. For example, at Cluster Core 0Core 1 Core 2 1.0 (which represent the maximum acceptable latency to 0 Bing at the 95th percentile), the FPGA achieves a 95% gain in FST Complex

Output scoring throughput relative to software. Core 3 Core 4 Core 5 Second, for a fixed point on the y-axis, it shows the improvement in response time at a given throughput. At 1.0, representing the average query load on a server, the FPGA reduces query latency by 29%. The improvement in FPGA

120 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11

Figure 8. Achievable performance within a given latency bound. not exceed our 30% limit in an individual server’s total cost The points on the x-axis at 1.0 shows the maximum sustained of ownership, yielding a significant overall improvement in throughputs on both the FPGA and software while satisfying Bing’s system efficiency and TCO. target for latency at the 95th percentile. Reconfigurable fabrics based on FPGAs are not the 95th percentile latency versus throughput only accelerator platform that we considered. GPUs are one option for accelerating large-scale workloads, and 5 when we first began we considered using GPUs. However, FPGA 4 the SIMD parallelism that GPUs handle so efficiently are Software not a good match for latency-sensitive, but highly diver- 3 95% more gent ranking stages (such as FE). In addition, the high (normalized) throughput 2 power consumption of GPUs meant that they couldn’t be easily incorporated into conventional servers, which 1 only have power and cooling provisioning for a standard 29% lower latency Throughput 0 25W PCIe card. Instead, they are likely more appropriate 00.5 11.5 2 for HPC environments rather than widespread datacen- Latency (normalized to 95th percentile target) ter deployment. We conclude that distributed reconfigurable fabrics are a viable path forward as increases in server performance level scoring latency increases further at higher injection rates, off, and will be crucial at the end of Moore’s law for contin- because the variability of software latency increases at ued cost and capability improvements in cloud computing. higher loads (due to contention in the CPU’s memory hier- Reconfigurability is a critical means by which hardware archy), whereas the FPGA’s performance remains stable. acceleration can keep pace with the rapid rate of change in This improved stability means that the FPGA is capable of datacenter services. absorbing bursts of traffic better than software alone, which Going forward, the biggest obstacle to widespread may reduce the need for overprovisioning for bursty traffic. adoption of FPGAs in the datacenter is likely to be pro- Given that FPGAs can be used to improve both latency grammability. FPGA development still requires exten- and throughput, Bing could reap the benefits in many sive hand-coding in Register Transfer Level and manual ways. For example, for equivalent ranking capacity, fewer tuning. Yet we believe that incorporating careful HW/ servers can be purchased. At the current average query SW co-design of custom ISAs such as those used in FE rate, Bing could use roughly half the number of servers and FFE, domain-specific languages such as OpenCL, and still achieve their performance targets while achiev- FPGA-targeted C-to-gates tools, and libraries of reusable ing massive cost savings. As another example, the faster components and design patterns, will be sufficient to response time means that additional capabilities and fea- permit high-value services to be productively targeted tures can be added to the software and/or hardware stack to FPGAs. Longer term, improvements in FPGA archi- to improve the quality of searches without exceeding the tectures for computing, more integrated development maximum allowed latency. Of course, a combination of tools, and the design of languages and tools that con- the two is also possible. sider accelerator offload as a core functionality will be necessary to increase the programmability of these fab- 6. CONCLUSION rics beyond teams of specialists working with large-scale For many years, FPGAs have shown promise for accelerating service developers. Within 10–15 years, well past the end many computational tasks. Yet despite the huge potential, of Moore’s Law, we believe that compilation to a combi- they have not yet become mainstream in modern datacen- nation of hardware and software will be commonplace. ters. Our goal in building the Catapult fabric was to under- Reconfigurable systems, such as the Catapult fabric pre- stand what problems must be solved to operate FPGAs at sented here, will be necessary to support these hybrid datacenter scale, and whether significant performance computation models. improvements are achievable for large-scale production workloads, especially workloads that change over the life- Acknowledgments time of the servers. Many people across many organizations contributed to this We found that efficiently mapping a significant portion system’s construction, and although they are too numerous of a complex datacenter workload to FPGAs is both possible to list here individually, we thank our collaborators in and provides a significant ROI. We showed that an at-scale Microsoft Global Foundation Services, Bing, the Autopilot deployment of FPGAs can increase ranking throughput in team, and our colleagues at Altera and Quanta for their a production search infrastructure by 95% at comparable excellent partnership and hard work. We thank Reetuparna latency to a software-only solution, making possible both Das, Ofer Dekel, Alvy Lebeck, Neil Pittman, Karin Strauss, cost savings with fewer servers needed and headroom for and David Wood for their valuable feedback and contribu- improved search algorithms. We achieved this without tions. We also thank Qi Lu, Harry Shum, Craig Mundie, Eric breaking the homogeneous architecture of data-center serv- Rudder, Dan Reed, Surajit Chaudhuri, Peter Lee, Gaurav ers, and without increasing the server failure rate. The added Sareen, Darryn Dieken, Darren Shakib, Chad Walters, FPGA boards increased power consumption by only 10%, did Kushagra Vaid, and Mark Shaw for their support.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 121 research highlights

References Andrew Putnam, Adrian M. Caulfield, John Demme, Columbia University, 1. Altera. Nios II Processor Reference 12. Hussain, H.M., Benkrid, K., Erdogan, A.T., Eric S. Chung, Jeremy Fowers, Gopi New York, NY. Handbook, 13.1.0 edition, 2014. Seker, H. Highly parameterized K-means Prashanth Gopal, Jan Gray, Michael 2. Altera. Stratix V Device Handbook, clustering on FPGAs: Comparative results Haselman, Scott Hauck, Stephen Heil, Hadi Esmaeilzadeh, Georgia Institute of 14.01.10 edition, 2014. with GPPs and GPUs. In Proceedings Joo-Young Kim, Sitaram Lanka, Eric Technology, Atlanta, GA. 3. Baxter, R., Booth, S., Bull, M., Cawood, of the 2011 International Conference on Peterson, Simon Pope, Aaron Smith, G., Perry, J., Parsons, M., Simpson, A., Reconfigurable Computing and FPGAs, Jason Thong, Phillip Yi Xiao and Doug Scott Hauck, University of Washington, Trew, A., Mccormick, A., Smart, G., RECONFIG’11 (Washington, DC, USA, Burger, Microsoft, Redmond, WA. Seattle. Smart, R., Cantle, A., Chamberlain, 2011). IEEE Computer Society. R., Genest, G. Maxwell – A 64 FPGA 13. IBM. IBM PureData System for Derek Chiou, Microsoft and University of Amir Hormati, Google, Inc., Mountain Supercomputer. Eng. Lett. 16 (2008), Analytics N2001, WAD12353- Texas at Austin. View, CA. 426–433, 2008. USEN-01 edition, 2013. 4. BEECube. BEE4 Hardware Platform, 14. Lavasani, M., Angepat, H., Chiou, D. Kypros Constantinides, Amazon Web James Larus, École Polytechnique 1.0 edition, 2011. An FPGA-based in-line accelerator for Services, Boston, MA. Fédérale de Lausanne (EPFL), Lausanne, 5. Blott, M., Vissers, K. Dataflow memcached. Comput. Arch. Lett. PP, Switzerland. architectures for 10Gbps line-rate 99 (2013), 1–1. key-value stores. In HotChips 2013 15. Martin, A., Jamsek, D., Agarawal, K. (August 2013). FPGA-based application acceleration: 6. Convey. The Convey HC-2 Computer, Case study with GZIP compression/ conv-12-030.2 edition, 2012. decompression streaming engine. 7. Cray. Cray XD1 Datasheet, 1.3 edition, In ICCAD Special Session 7C 2005. (November 2013). 8. Dennard, R., Rideout, V., Bassous, E., 16. Microsoft. How Microsoft Designs Its LeBlanc, A. Design of ion-implanted Cloud-Scale Servers, 2014. MOSFET’s with very small physical 17. Pell, O., Mencer, O. Surviving the end of dimensions. IEEE J. Solid-State Circ. frequency scaling with reconfigurable 9, 5 (Oct. 1974), 256–268. dataflow computing.SIGARCH 9. Estlick, M., Leeser, M., Theiler, J., Comput. Archit. News 39, 4 (Dec. 2011). Szymanski, J.J. Algorithmic 18. Showerman, M., Enos, J., Pant, A., transformations in the implementation Kindratenko, V., Steffen, C., Pennington, of K-means clustering on R., Hwu, W. QP: A Heterogeneous reconfigurable hardware. In Multi-accelerator Cluster. 2009. Proceedings of the 2001 ACM/SIGDA 19. SRC. MAPstation Systems, 70000 AH Ninth International Symposium on Field edition, 2014. Programmable Gate Arrays, FPGA’01 20. Yan, J., Zhao, Z.-X., Xu, N.-Y., Jin, X., (New York, NY, USA, 2001). ACM. Zhang, L.-T., Hsu, F.-H. Efficient 10. Gens, F. Worldwide and Regional query processing for web search Public IT Cloud Services 2014–2018 engine with FPGAs. In Proceedings Forecast (Oct. 2014). of the 2012 IEEE 20th International 11. George, A., Lam, H., Stitt, G. Symposium on Field-Programmable Novo-G: At the forefront of scalable Custom Computing Machines, reconfigurable supercomputing. FCCM’12 (Washington, DC, USA, Comput. Sci. Eng. 13, 1 (2011), 82–86. 2012). IEEE Computer Society. © 2016 ACM 0001-0782/16/11 $15.00

ACM Transactions on Parallel Computing Solutions to Complex Issues in Parallelism Editor-in-Chief: Phillip B. Gibbons, Intel Labs, Pittsburgh, USA

ACM Transactions on Parallel Computing (TOPC) is a forum for novel and innovative work on all aspects of parallel computing, including foundational and theoretical aspects, systems, languages, architectures, tools, and applications. It will address all classes of parallel-processing platforms including concurrent, multithreaded, multicore, accelerated, multiprocessor, clusters, and supercomputers.

Subject Areas

• Parallel Programming Languages and Models • Parallel System Software • Parallel Architectures • Parallel Algorithms and Theory • Parallel Applications • Tools for Parallel Computing

For further information or to submit your manuscript, visit topc.acm.org Subscribe at www.acm.org/subscribe

122 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 CAREERS

Baylor University candidates working on their dissertation with re- are expected to develop a vibrant, high-quality ex- Assistant, Associate or Full Professor of search interests in computer science or computer ternally sponsored research program, supervise Computer Science in the area of Software information systems. The Tenure Track Lecturer graduate students, and interact and collaborate Engineering position requires an MS in Computer Science or a with faculty across the department and campus. closely related field. Please visit www.bradley.edu/ Applicants must submit (i) a cover letter, (ii) The Department of Computer Science seeks a dy- humanresources/opportunities for full position current curriculum vita, (iii) statement of research namic scholar to fill this position beginning August, description and application process. interests, and (iv) statement of teaching interests 2017. For position details and application informa- and (v) arrange to have at least three references. tion please visit: www.baylor.edu/hr/facultypositions. Baylor University is a private Christian univer- Cal Poly State University, Application materials may be sent to: sity and a nationally ranked research institution, San Luis Obispo Faculty Search Committee consistently listed with highest honors among The Computer Science Department of Electrical Chronicle of Higher Education’s “Great Colleges The Electrical Engineering Department and Engineering and Computer Science to Work For.” Chartered in 1845 by the Republic of Computer Engineering Program within the Col- Case Western Reserve University Texas through the efforts of Baptist pioneers, Bay- lege of Engineering at Cal Poly San Luis Obispo 10900 Euclid Avenue lor is the oldest continuously operating university invite applications for a full-time, academic year, Cleveland, OH 44106-7071 in Texas. The university provides a vibrant cam- tenure-track faculty appointment in electrical pus community for over 15,000 students from all and computer engineering. Salary is commensu- 50 states and more than 80 countries by blending rate with background and experience. California State University, interdisciplinary research with an international Sacramento, Department of reputation for educational excellence and a faculty Computer Science. commitment to teaching and scholarship. Baylor California Institute of Technology is actively recruiting new faculty with a strong com- (Caltech) Two tenure-track assistant professor positions to mitment to the classroom and an equally strong begin with the Fall 2017 semester. Applicants spe- commitment to discovering new knowledge as we The Computing and Mathematical Sciences (CMS) cializing in any area of computer science will be pursue our bold vision, Pro Futuris. department at the California Institute of Technology considered. Those with expertise in areas related Baylor University is a private not-for-profit uni- (Caltech) invites applications for tenure-track or ten- to embedded systems, software engineering, or versity affiliated with the Baptist General Conven- ured faculty positions. CMS is a unique environment data science are especially encouraged to apply. tion of Texas. As an Affirmative Action/Equal Oppor- where innovative, interdisciplinary, and foundation- Ph.D. in Computer Science, Computer Engineer- tunity employer, Baylor is committed to compliance al research is conducted in a collegial atmosphere. ing, or closely related field required by August with all applicable anti-discrimination laws, includ- Candidates in all areas of computing and math- 2017. For detailed position information, includ- ing those regarding age, race, color, sex, national ematical sciences are invited to apply, including (but ing application procedure, please see http://www. origin, marital status, pregnancy status, military not limited to) learning and computational statis- csus.edu/about/employment/. Screening will service, genetic information, and disability. As a reli- tics, security and privacy, networked and distributed begin January 10, 2017, and remain open until gious educational institution, Baylor is lawfully per- systems, optimization and computational math- filled. AA/EEO employer. Clery Act statistics avail- mitted to consider an applicant’s religion as a selec- ematics, control and dynamical systems, theory of able. Mandated reporter requirements. Criminal tion criterion. Baylor encourages women, minorities, computation and algorithmic economics, scientific background check will be required. veterans and individuals with disabilities to apply computing, etc. Additionally, we are seeking candi- dates who have demonstrated strong connections to other fields, including the mathematical, physical, Dartmouth College Boston College biological, and social sciences. A commitment to world class research, high- The Dartmouth College Department of Com- The Boston College Computer Science Department quality teaching, and mentoring is expected. The puter Science invites applications for a tenured invites applications for a tenure-track Assistant initial appointment at the Assistant-Professor faculty position at the level of associate or full Professorship starting September 2017. A PhD in level is for four years and is contingent upon the professor. We seek candidates who will be excel- Computer Science, research conducive to sustained completion of a Ph.D. degree in Computer Sci- lent researchers and teachers in the broad range external funding, and a commitment to quality in ence, Applied Mathematics or related field. of areas related to cyber-security. This position is undergraduate teaching are required. Interest in in- Applicants are encouraged to have all their ap- the first of three hires that the College anticipates terdisciplinary collaboration with broader impact is plication materials on file by October 21st, 2016, making in the area of cyber-security. We particu- desirable. See http://cs.bc.edu for more information. but applications will be accepted until the end of larly seek candidates who will help lead, initiate, Application review begins October 15, 2016. Submit December. For a list of documents required and and participate in collaborative research projects applications at http://apply.interfolio.com/37623 full instructions on how to apply on-line, please within Computer Science and beyond, including visit http://www.cms.caltech.edu/search. Ques- Dartmouth researchers from other Arts & Sci- tions about the application process may be di- ences departments, Geisel School of Medicine, Bradley University rected to: [email protected]. Thayer School of Engineering, and Tuck School The Department of Computer Science and Caltech is an Equal Opportunity/Affirmative of Business. Information Systems at Bradley Action Employer. Women, minorities, veterans, The Computer Science department is home University invites applications for a Tenure and disabled persons are encouraged to apply. to 21 tenured and tenure-track faculty members Track Assistant Professor or and two research faculty members. Research ar- eas of the department encompass the areas of Tenure Track Lecturer position for the 2017-2018 Case Western Reserve University security, computational biology, machine learn- academic year. The Tenure Track Assistant Pro- ing, robotics, systems, algorithms, theory, digital fessor position requires a PhD in Computer Sci- Applicants should have potential for excellence arts, vision, and graphics. The Computer Science ence or a closely related field; we will consider in innovative research. All successful candidates department has strong Ph.D. and M.S. programs

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 123 CAREERS

and outstanding undergraduate majors. The ed status. Applications by members of all under- (on the Vermont border). Dartmouth has a beau- department’s security faculty are affiliated with represented groups are encouraged. tiful, historic campus, located in a scenic area on Dartmouth’s Institute for Security, Technology, Application review will begin November 1, the Connecticut River. Recreational opportuni- and Society (ISTS), which also involves faculty 2016, and continue until the position is filled. ties abound in all four seasons. from Engineering, Sociology, and Business. We seek candidates who have a demonstrated Dartmouth College, a member of the Ivy ability to contribute to Dartmouth’s undergradu- League, is located in Hanover, New Hampshire Dartmouth College ate diversity initiatives in STEM research, such as (on the Vermont border). Dartmouth has a beau- the Women in Science Program, E. E. Just STEM tiful, historic campus, located in a scenic area on The Dartmouth College Department of Computer Scholars Program, and Academic Summer Un- the Connecticut River. Recreational opportuni- Science invites applications for a tenure-track fac- dergraduate Research Experience (ASURE). We ties abound in all four seasons. ulty position at the level of assistant professor. We are especially interested in applicants with a We seek candidates who have a demonstrated seek candidates who will be excellent researchers demonstrated track record of successful teaching ability to contribute to Dartmouth’s undergradu- and teachers in the area of machine learning. We and mentoring of students from all backgrounds ate diversity initiatives in STEM research, such as particularly seek candidates who will help lead, (including first-generation college students, low- the Women in Science Program, E. E. Just STEM initiate, and participate in collaborative research income students, racial and ethnic minorities, Scholars Program, and Academic Summer Un- projects within Computer Science and beyond, women, LGBTQ, etc.). dergraduate Research Experience (ASURE). We including Dartmouth researchers from other Arts Applicants are invited to submit application are especially interested in applicants with a & Sciences departments, Geisel School of Medi- materials via Interfolio at https://apply.interfolio. demonstrated track record of successful teaching cine, Thayer School of Engineering, and Tuck com/37189. Upload a CV, research statement, and and mentoring of students from all backgrounds School of Business. teaching statement, and request at least four ref- (including first-generation college students, low- The Computer Science department is home erences to upload letters of recommendation, at income students, racial and ethnic minorities, to 21 tenured and tenure-track faculty members least one of which should comment on teaching. women, LGBTQ, etc.). and two research faculty members. Research ar- Email [email protected] with Applicants are invited to submit a cover let- eas of the department encompass the areas of any questions. ter and CV via Interfolio at http://apply.interfolio. security, computational biology, machine learn- Dartmouth College is an equal opportunity/ com/36691. ing, robotics, systems, algorithms, theory, digital affirmative action employer with a strong com- Email [email protected] with arts, vision, and graphics. The Computer Science mitment to diversity and inclusion. We prohibit any questions. department is in the School of Arts & Sciences, discrimination on the basis of race, color, reli- Dartmouth College is an equal opportunity/ and it has strong Ph.D. and M.S. programs and gion, sex, age, national origin, sexual orientation, affirmative action employer with a strong com- outstanding undergraduate majors. The depart- gender identity or expression, disability, veteran mitment to diversity and inclusion. We prohibit ment is affiliated with Dartmouth’s M.D.-Ph.D. status, marital status, or any other legally protect- discrimination on the basis of race, color, reli- program and has strong collaborations with Dart- ed status. Applications by members of all under- gion, sex, age, national origin, sexual orientation, mouth’s other schools. represented groups are encouraged. gender identity or expression, disability, veteran Dartmouth College, a member of the Ivy Application review will begin January 1, 2017, status, marital status, or any other legally protect- League, is located in Hanover, New Hampshire and continue until the position is filled.

Faculty Positions in Computer and Communication Science at the Ecole polytechnique fédérale de Lausanne (EPFL)

The School of Computer and Communication Sciences at The following documents are requested in PDF format: cover EPFL invites applications for faculty positions in computer letter, curriculum vitae including publication list, brief state- and communication sciences. We are seeking candidates for ments of research and teaching interests, names and addresses tenure-track assistant professor as well as for senior positions. (including e-mail) of 3 references for junior positions and 6 for senior positions. Screening will start on December 1, 2016. Successful candidates will develop an independent and crea- tive research program, participate in both undergraduate and Further questions can be addressed to: graduate teaching, and supervise PhD students. Prof. Arjen Lenstra The school is seeking candidates in the fields of data science Chairman of the Recruiting Committee and machine learning – including application of these tech- Email: [email protected] niques in bioinformatics, natural language processing, and speech recognition – and in security and privacy, and biocom- For additional information on EPFL, please consult: puting. Candidates in other areas will also be considered. http://www.epfl.ch or http://ic.epfl.ch

EPFL offers internationally competitive salaries, significant start-up resources, and outstanding research infrastructure. EPFL is an equal opportunity employer and strongly welcomes female applications. To apply, please follow the application procedure at https://academicjobsonline.org/ajo/jobs/7888

124 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 Lehigh University Computer Science and Engineering Department

Applications are invited for tenure-track posi- tions in the Computer Science and Engineering Department (http://www.cse.lehigh.edu) of Le- Assistant Professors (Tenure Track) high University to start in August 2017. Outstand- ing candidates in all areas of computer science of Computer Science will be considered, with priority areas including computer systems (including parallel and distrib- → The Department of Computer Science (www.inf.ethz.ch) at ETH Zurich invi- uted systems, database systems, operating sys- tems, and systems aspects of data mining), data tes applications for assistant professorships (tenure track) with focus on the analytics, cybersecurity, algorithms, and perva- following broad areas within computer science. For each area, several possible sive intelligence (robotics, the Internet of Things, examples (not exhaustive) of expertise are provided. and human-computer interaction). Rank will be commensurate with experience. The successful applicants will hold a Ph.D. in - Programming Languages and Software Engineering (language design and Computer Science, Computer Engineering, or a implementation, testing and debugging, compilers and language runtimes, closely related field. The candidates must demon- programming models, dynamic languages) strate a strong commitment to quality undergrad- uate and graduate education, and the potential - Robotics and Cyber-physical Systems (artificial intelligence, human-robot to develop and conduct a high-impact research program with external support. The successful interaction, planning and control, virtual/augmented reality, internet of applicants will also be expected to contribute to things, embedded systems, data acquisition systems) interdisciplinary research programs, including the Data X Initiative (http://lehigh.edu/datax), - Data Science (machine learning, language/media processing, data privacy, which includes not only data analytics, but also medical applications, data centers architecture and management, program- the underlying algorithms and systems that make large-scale data analytics possible. ming and runtime platforms for data centers and cloud computing) The faculty in Computer Science and Engi- neering maintains an outstanding international - All other areas in Computer Science (while there is focus on the three areas reputation in a variety of research areas, and in- above, ETH Zurich is broadly looking in all areas) cludes ACM and IEEE fellows as well as five NSF CAREER award winners. In academic year 2017- 18, the department will move to a new home in a → Please only apply for one of the above four areas as all applications will be large former industrial building now undergoing jointly reviewed. a major renovation to create a spectacular collab- orative space for the Data X Initiative. → Applicants should be strongly rooted in computer science, have internatio- Applications are accepted online at http://ac- ademicjobsonline.org/ajo/jobs/7774 and should nally recognized expertise in their field and pursue research at the forefront of include a cover letter, curriculum vita, both computer science. Successful candidates should establish and lead a strong teaching and research statements, and contact research program. They will be expected to supervise doctoral students and information for at least three references. Review of applications will begin December 1, 2016 and teach both undergraduate and graduate level courses (in German or in English). will continue until the position is filled. Collaboration in research and teaching is expected both within the department Lehigh University is an affirmative action/ and with other groups of ETH Zurich and related institutions. equal opportunity employer and does not dis- criminate on the basis of age, color, disabil- ity, gender, gender identity, genetic information, → Assistant professorships have been established to promote the careers of marital status, national or ethnic origin, race, younger scien¬tists. ETH Zurich implements a tenure track system equivalent religion, sexual orientation, or veteran status. Le- to other top international universities. For candidates with exceptional research high University is a 2010 recipient of an NSF AD- VANCE Institutional Transformation Grant for accomplishments, applications for a tenured associate or full professorship will promoting the careers of women in academic sci- also be considered. ence and engineering. Lehigh University provides comprehensive benefits including domestic → Please apply online (application period starts on 31 October 2016) at: partner benefits (see also http://www.lehigh.edu/ worklifebalance/). Lehigh Valley Inter-regional www.facultyaffairs.ethz.ch Networking & Connecting (LINC) is a newly cre- ated regional network of diverse organizations → Applications include a curriculum vitae, a list of publications with the three designed to assist new hires with dual career, most important ones marked, a statement of future research and teaching community and cultural transition needs. Please contact [email protected] for more informa- interests, the names of three references, and a description of the three most tion. Questions concerning this search may be important achievements. The letter of application should be addressed to the sent to [email protected]. President of ETH Zurich, Prof. Dr. Lino Guzzella. The closing date for appli- cations is 15 December 2016. ETH Zurich is an equal opportunity and family Georgia Institute of Technology friendly employer and is further responsive to the needs of dual career couples. We specifically encourage women to apply. Computational Science and Engineering solves real-world problems in science, engineering, health, and social domains, by using high-

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 125 CAREERS

performance computing, modeling and from women and under-represented minorities ployment without regard to race, color, religion, eth- simulation, and large-scale “big data” analytics. are strongly encouraged. nicity, sex (including pregnancy and gender identity), The School of Computational Science and For more information about Georgia Tech’s national origin, disability status, age, sexual orienta- Engineering of the College of Computing at the School of Computational Science and Engineer- tion, genetic information, protected veteran status, or Georgia Institute of Technology seeks tenure- ing please visit: http://www.cse.gatech.edu/ any other characteristic protected by law. We always track faculty at all levels. Our school seeks welcome nominations and applications from women, candidates who may specialize in a broad range members of any minority group, and others who share of application areas including biomedical and Mississippi State University our passion for building a diverse community that re- health; urban systems and smart cities; social Bagley College of Engineering flects the diversity in our student population. good and sustainable development; materials and Assistant Professors manufacturing; and national security. Applicants must have an outstanding record of research, a Mississippi State University, through its Bagley New York University/Courant Institute sincere commitment to teaching, and interest College of Engineering, is seeking four new ten- of Mathematical Sciences in engaging in substantive interdisciplinary ure-track faculty at the rank of Assistant Professor. Department of Computer Science research with collaborators in other disciplines. Applicants should have teaching and research in- Faculty Georgia Tech is located in the heart of metro terests that can enhance the strengths of the col- Atlanta, a home to more than 5.3 million people lege in one or more of the following areas of inter- The department expects to have several regular and nearly 150,000 businesses, a world-class est: (1) Energy, (2) Human Health Enhancement, faculty positions and invites candidates at all airport, lush parks and green spaces, competi- (3) Information and Decision Systems (4) Materi- levels to apply. We will consider outstanding can- tive schools and numerous amenities for enter- als – Science and Engineering, (5) Transportation didates in any area of computer science, in par- tainment, sports and restaurants that all offer a and Vehicular Systems, and (6) Water and the En- ticular in systems, machine learning and data sci- top-tier quality of life. From its diverse economy, vironment. The successful applicants from this ence, scientific computing and verification. global access, abundant talent and low costs of strategic college-level search will be placed into Faculty members are expected to be outstand- business and lifestyle, metro Atlanta is a great the most appropriate academic department. A ing scholars and to participate in teaching at all place to call “home.” Residents have easy access PhD in an appropriate engineering or computer levels from undergraduate to doctoral. New ap- to arts, culture, sports and nightlife, and can ex- science discipline is required. Screening of ap- pointees will be offered competitive salaries and perience all four seasons, with mild winters that plications will begin November 28, 2016 and will startup packages, with affordable housing within rarely require a snow shovel. continue until the position is filled. a short walking distance of the department. New For best consideration, applications are due For a complete job description and require- York University is located in Greenwich Village, by December 16, 2016. The application mate- ments, visit at https://www.bagley.msstate.edu. one of the most attractive residential areas of rial should include a full academic CV, a personal Interested candidates must apply on-line at Manhattan. narrative on teaching and research, a list of at https://www.msujobs.msstate.edu (search for po- The department has 34 regular faculty mem- least three references and up to three sample sitions in the Dean of Engineering). bers and several clinical, research, adjunct, and publications. Georgia Tech is an Affirmative Ac- MSU is an equal opportunity employer, and all visiting faculty members. The department’s cur- tion/Equal Opportunity Employer. Applications qualified applicants will receive consideration for em- rent research interests include algorithms, cryp-

The Hong Kong Polytechnic University (PolyU) is a government-funded tertiary institution in Hong Kong. It offers programmes at various levels including Doctorate, Master’s and Bachelor’s degrees. It has a full-time academic staff strength of around 1,200. The total consolidated expenditure budget of the University is about HK$6.6 billion (US$1 = HK$7.8 approximately) per year. Committed to academic excellence in a professional context, PolyU aspires to become a world-class university with an emphasis on the application value of its programmes and research. Its vision is to become a leading university that excels in professional education, applied research and partnership for the betterment of Hong Kong, the nation and the world. The University is now inviting applications or nominations for the following post: Head of Department of Computing The successful candidate will be appointed as Chair Professor/Professor, commensurate with his/her qualifications and experience, and hold a concurrent headship appointment. The headship appointment is normally for an aggregate period of six years in two three-year terms of office. Post specification can be obtained from http://www.polyu.edu.hk/hro/job/en/external_adv/deans-heads.php. Other suitable candidate(s), if deemed appropriate by the University, may be appointed as Chair Professor/Professor. Remuneration and Conditions of Service Terms of appointment and remuneration package are negotiable and highly competitive. For general information on terms and conditions for appointment of academic staff in the University, please visit the website at http://www.polyu.edu.hk/hro/TC.htm. Application Applicants are invited to send detailed curriculum vitae with the names and addresses of three referees and direct any enquiries to Human Resources Office, 13/F, Li Ka Shing Tower, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong [Fax: (852) 2764 3374; E-mail: [email protected]], quoting the position being applied for and the reference number. Recruitment will continue until the position is filled. Initial consideration of applications will commence at the end of November 2016. Candidature may be obtained by nomination. The University reserves the right to make an appointment by invitation or not to fill the position. General information about the University and the Department of Computing is available on the University’s Homepage http://www.polyu.edu.hk and http://www.comp.polyu.edu.hk respectively, or from the Human Resources Office [Tel: (852) 3400 3420]. The University Personal Information Collection Statement for recruitment can be found at http://www.polyu.edu.hk/hro/job/en/guide_forms/pics.php.

www.polyu.edu.hk

126 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 tography and theory; computational biology; dis- approvals from: Accreditation Commission for *This is a grant-funded position; continued tributed computing and networking; graphics, Education in Nurs9ing (ACEN); American Chemi- employment contingent upon the availability of vision and multimedia; machine learning and cal Society (ACS); Accreditation Council for Edu- funding. data science; natural language processing; sci- cation in Nutrition and Dietetics (ACEND); En- entific computing; and verification and program- gineering Accreditation Commission of ABET; ming languages. Computing Accreditation Commission of ABET; Rutgers University Collaborative research with industry is facili- National Accrediting Agency for Clinical Labora- Assistant or Associate Teaching Professor tated by geographic proximity to computer sci- tory Sciences (NAACLS) and; the Association of ence activities at AT&T, Facebook, Google, IBM, Technology, Management and Applied Engineer- The Department of Computer Science at Rutgers Bell Labs, NEC, and Siemens. ing (ATMAE). University invites applications for a 1-year non- Please apply at https://cs.nyu.edu/webapps/ In addition to research partnerships and cur- tenure-track renewable position at the rank of As- facapp/register ricular innovations, the College fulfills a service sistant or Associate Teaching Professor. To guarantee full consideration, applications and teaching mission that is critical to the contin- The appointment will start either Spring 2017 should be submitted no later than December 1, ued success of other schools and colleges and the or Fall 2017, and will continue through the end of 2016; however, this is not a hard deadline, as all University as a whole. the 2017-18 academic year. candidates will be considered to the full extent Incumbent’s duties and responsibilities in- Main responsibilities include teaching, man- feasible, until all positions are filled. Visiting po- clude: aging, developing, and updating undergraduate sitions may also be available. 1. Conducting interdisciplinary basic research in classes. New York University is an equal opportunity/ cyber analysis, simulation and experimenta- Please see http://apply.interfolio.com/36929 affirmative action employer. tion. for more details. 2. Conducting scientific research in intrusion detection, mitigation, and forensic analysis to Norfolk State University counter future advanced threats. Rutgers University 3. Conducting research and experimentation in Teaching Professor or Professor of Practice Norfolk State University, a historically black uni- machine learning and big data analytics. versity with an enrollment of over 5,000 under- 4. Designing, developing, operating, and manag- The Department of Computer Science at Rutgers graduate and graduate students, invites nomi- ing the Center’s website and content. University invites applications for one or more nations for the grant-funded faculty position of 5. Assisting in the planning, implementation, positions as Teaching Professor or Professor of Cybersecurity Researcher. and coordination activities for the research, ed- Practice in the area of Data Science, at the level of The College of Science, Engineering, and ucation and outreach programs for the Center. Assistant Professor, although exceptional candi- Technology (CSET), comprised of eight academic 6. Designing, developing, operating, and manag- dates may be appointed at the rank of associate departments: Biology, Mathematics, Chemistry, ing the Center’s IT infrastructure. or full professor. Computer Science, Engineering, Nursing and For more information about NSU’s College These appointments may begin in either Allied Health, and Physics an Technology, has of Science, Engineering and Technology (CSET), spring or fall semester 2017. specialized program accreditation and program visit http://cset.nsu.edu/. Main Responsibilities will include Teaching,

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 127 CAREERS

Coordination of our Capstone project series, De- strong commitment to involving undergraduates Swarthmore College sign of short term tutorials dedicated to Data Sci- in their research. A Ph.D. in Computer Science at Computer/Electrical Engineering Faculty ence topics of current interest and Coordination or near the time of appointment is required. (all ranks) of Data Science Workshops. For the tenure-track position, we are inter- To apply, please go to http://apply.interfolio. ested in applicants whose areas fit broadly into Swarthmore College invites applications for a com/37381. theory and algorithms, systems, or programming tenure-track or tenured position at any rank in languages. Priority will be given to complete ap- the area of Computer/Electrical Engineering, to plications received by November 15, 2016. start during the Fall semester of 2017. A doctor- Swarthmore College For the visiting position, strong applicants in ate in Computer or Electrical Engineering or a re- Assistant Professor any area will be considered. Priority will be given lated field is required. The appointee will pursue to complete applications received by February 1, a research program that encourages involvement The Computer Science Department invites ap- 2017. by undergraduate students. Strong interests in plications for one tenure-track position and mul- Applications for both positions will continue undergraduate teaching, supervising senior de- tiple visiting positions at the rank of Assistant to be accepted after these dates until the posi- sign projects, and student mentoring are also re- Professor to begin Fall semester 2017. tions are filled. quired. Teaching responsibilities include courses Swarthmore College is a small, selective, Applications should include a cover letter, in computer hardware such as computer archi- liberal arts college located 10 miles outside of vita, teaching statement, research statement, tecture and digital logic, and electives in the ap- Philadelphia. The Computer Science Department and three letters of reference, at least one (pref- pointee’s area of specialization. offers majors and minors at the undergraduate erably two) of which should speak to the candi- Located in the suburbs of Philadelphia, level. date’s teaching ability. In your cover letter, please Swarthmore College is a highly selective under- Swarthmore College has a strong institutional briefly describe your current research agenda; graduate liberal arts institution with 1500 stu- commitment to excellence through diversity and what would be attractive to you about teaching dents, whose mission combines academic excel- inclusivity in its educational program and em- in a liberal arts college environment; and what lence and social responsibility. Eight full-time ployment practices. The College actively seeks background, experience, or interests are likely to faculty members in the Department of Engineer- and welcomes applications from candidates with make you a strong teacher of a diverse group of ing offer a rigorous, ABET-accredited program exceptional qualifications, particularly those with Swarthmore College students. for the Bachelor of Science in Engineering to ap- demonstrated commitments to a more inclusive proximately 120 students. Sabbatical leave with society and world. For more information on Fac- Tenure-track applications are being accepted support is available every fourth year. The depart- ulty Diversity and Excellence at Swarthmore, see online at ment has an endowed equipment budget, and http://www.swarthmore.edu/faculty-diversity- https://academicjobsonline.orgajo?job- there is support for faculty/student collaborative excellence/information-candidates-new-faculty 890-8018 research. For program details, see http://engin. Applicants must have teaching experience Visiting applications are being accepted online at swarthmore.edu/. and should be comfortable teaching a wide range https://academicjobsonline.org ajo?job- Please upload your CV, brief statements de- of courses at the introductory and intermedi- 890-8020 scribing teaching philosophy and research in- ate level. Candidates should additionally have a Candidates may apply for both positions. terests, along with three letters of reference to:

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Positions in Computing

The Department of Electrical and Computer Engineering (ECE) at the University of Illinois at Urbana-Champaign invites Applications are invited for:- applications for faculty positions at all areas and levels in Department of Information Engineering computing, broadly defined, with particular emphasis on reliable Professors / Associate Professors / and secure computing; networked and distributed computing; high-performance, energy-efficient, and scientific computing; Assistant Professors data center and storage systems; data science, machine learning (Ref. 160001N0) and its applications; complex data analysis and decision science; The Department is looking for strong candidates in the area of bio-inspired computing; computational genomics; and health cyber security to supplement its existing strength in communication informatics, among other areas. Applications are encouraged from systems, networking, information theory, and deep learning research. candidates whose research programs specialize in core as well Outstanding candidates in other areas of information engineering as interdisciplinary areas of electrical and computer engineering. will also be considered. Further information about the Department From the transistor and the first computer implementation based on von Neumann’s architecture to the Blue Waters petascale is available at http://www.ie.cuhk.edu.hk. computer – the fastest computer on any university campus, ECE Applicants should have (i) a relevant PhD degree; (ii) strong Illinois faculty have always been at the forefront of computing commitment to excellence in research and teaching; and (iii) research and innovation. The department is engaged in exciting outstanding accomplishments and research potential. new and expanding programs for research, education, and professional development, with strong ties to industry. The ECE Appointments will normally be made on contract basis for up to Department has recently settled into its new 235,000 sq. ft. net- three years initially commencing August 2017, which, subject zero energy design building, which is a major campus addition to mutual agreement, may lead to longer-term appointment or with maximum space and minimal carbon footprint. substantiation later. Qualified senior candidates may also be considered for tenured Applications will be accepted until the posts are fi lled. full Professor positions as part of the Grainger Engineering Breakthroughs Initiative (http://graingerinitiative.engineering. Application Procedure illinois.edu), which is backed by a $100-million gift from the Applicants please upload the full resume with a cover letter, copies Grainger Foundation. of academic credentials, publication list with abstracts of selected Please visit http://jobs.illinois.edu to view the complete position published papers, a research plan, a teaching statement, together announcement and application instructions. Full consideration with names and e-mails addresses of three to fi ve referees to whom will be given to applications received by December 15, 2017, but the applicant’s consent has been given for their providing reference applications will continue to be accepted until all positions are filled. (unless otherwise specifi ed). Illinois is an EEO Employer/Vet/Disabled The University only accepts and considers applications submitted www.inclusiveillinois.illinois.edu. online for the post above. For more information and to apply online, The University of Illinois conducts criminal background checks please visit http://career.cuhk.edu.hk. on all job candidates upon acceptance of a contingent offer

128 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 https://academicjobsonline.org/ajo/jobs/7247. Texas State University is a member of The undergraduates and approximately 50 graduate Applicants should include a cover letter in which Texas State University System. students. The Department, funded by agencies they describe their reasons for seeking this posi- Texas State University is an EOE. such as NSF, Google, Departments of Education tion and offer ideas about how they would attract and Commerce, various Defense agencies, mul- and mentor students, especially those coming tiple State agencies, and other sponsors, gener- from diverse backgrounds. We will begin review- University of Alabama ated over $13 million in research expenditures in ing candidates on December 1, 2016; applica- Computer Science Faculty Positions FY 2015 and our doctoral program has produced tions received before January 1, 2017, will receive 33 graduates in the past five years. In 2013, the full consideration. The Department of Computer Science at the College completed construction of a $300 Shelby Swarthmore College has a strong institutional University of Alabama invites applications for Engineering and Science Complex. commitment to excellence through diversity in its multiple tenure-track faculty positions at the As- Applicants should apply online at http://fac- educational program and employment practices sistant or Associate level to begin August 2017. ultyjobs.ua.edu/postings/39546. Applicants must and actively seeks and welcomes applications Outstanding candidates in all areas of computer have an earned doctorate (Ph.D.) in computer from candidates with exceptional qualifications, science will be considered. Successful applicants science or a closely related field. The application particularly those with demonstrable commit- must show the potential to establish a quality re- package should include a cover letter, curricu- ments to a more inclusive society and world. search program, collaborate effectively with other lum vitae, and the names of three references. The faculty, and excel in teaching at both the under- cover letter should address how the applicant is graduate and graduate levels. In addition, suc- able to contribute to the ATI and AWI initiatives Texas State University cessful applicants must demonstrate the poten- described above. Review of applications will be- Department of Computer Science tial to contribute to the University of Alabama’s gin immediately. For additional details, please initiatives with respect to water, the Alabama contact Dr. David Cordes ([email protected]. Applications are invited for multiple tenure-track Water Institute (http://awi.ua.edu/), and/or trans- edu) or visit http://cs.ua.edu. The University of Assistant Professor positions in the Department portation, the Alabama Transportation Institute Alabama is an equal opportunity/affirmative ac- of Computer Science to start the fall 2017 semes- (http://ati.ua.edu/). Expertise needed by the AWI tion employer. Women and minority applicants ter. Consult the department’s faculty employ- and ATI may include, but is not limited to, areas are particularly encouraged to apply. ment page at www.cs.txstate.edu/employment/ such as big data, spatial data, data analytics, data faculty/ for job duties, qualifications, application visualization, machine learning, vehicular net- procedure, and information about the depart- works, software modeling, software engineering, University of Alaska Fairbanks (UAF) ment and the university. security, robotics, and autonomous vehicles. Texas State University, to the extent not in Located in Tuscaloosa, Alabama, the Univer- The Department of Computer Science (www. conflict with federal or state law, prohibits dis- sity of Alabama enrolls over 37,000 students and cs.uaf.edu) at the University of Alaska Fairbanks crimination or harassment on the basis of race, is the capstone of higher education in the State. (UAF) invites applications for a tenure-track fac- color, national origin, age, sex, religion, disabil- Housed in the College of Engineering, the Com- ulty position at the level of Assistant Professor to ity, veteran’s status, sexual orientation, gender puter Science Department has 22 faculty mem- start either in January or August 2017. The posi- identity or expression. bers (14 tenured/tenure-track faculty), over 600 tion requires a strong commitment to teaching

TENURE-TRACK AND TENURED POSITIONS IN ELECTRICAL ENGINEERING AND COMPUTER SCIENCE The newly launched ShanghaiTech University is built as a world-class research ADVERTISING IN CAREER university, which locates in Zhangjiang High-Tech Park. We invite highly qualified candidates to fill tenure-track/tenured faculty positions as its core team in the School OPPORTUNITIES of Information Science and Technology (SIST). Candidates should have exceptional academic records or demonstrate strong potential in cutting-edge research areas How to Submit a Classified Line Ad: Send an e-mail to of information science and technology. They must be fluent in English. Overseas [email protected]. Please include text, and indicate academic connection or background is highly desired. the issue/or issues where the ad will appear, and a contact Academic Disciplines: name and number. We seek candidates in all cutting edge areas of information science and technology. Our recruitment focus includes, but is not limited to: computer Estimates: An insertion order will then be e-mailed back to architecture and technologies, nano-scale electronics, high speed and RF circuits, you. The ad will by typeset according to CACM guidelines. intelligent and integrated signal processing systems, computational foundations, NO PROOFS can be sent. Classified line ads are NOT big data, data mining, visualization, computer vision, bio-computing, smart commissionable. energy/power devices and systems, next-generation networking, as well as inter- disciplinary areas involving information science and technology. Rates: $325.00 for six lines of text, 40 characters per line. Compensation and Benefits: $32.50 for each additional line after the first six. The MINIMUM Salary and startup funds are highly competitive, commensurate with experience is six lines. and academic accomplishment. We also offer a comprehensive benefit package Deadlines: 20th of the month/2 months prior to issue date. to employees and eligible dependents, including housing benefits. All regular ShanghaiTech faculty members will be within its new tenure-track system For latest deadline info, please contact: commensurate with international practice for performance evaluation and promotion. [email protected] Qualifications: Career Opportunities Online: Classified and recruitment • A detailed research plan and demonstrated record/potentials; display ads receive a free duplicate listing on our website at: • Ph.D. (Electrical Engineering, Computer Engineering, Computer Science, or related field); http://jobs.acm.org • A minimum relevant research experience of 4 years. Ads are listed for a period of 30 days. Applications: Submit (in English, PDF version) a cover letter, a 2-page research plan, a CV For More Information Contact: plus copies of 3 most significant publications, and names of three referees to: ACM Media Sales [email protected] (until positions are filled). For more information, please visit Job Opportunities on http://sist.shanghaitech.edu.cn/ at 212-626-0686 or [email protected] Deadline: November 30, 2016

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 129 CAREERS

at the undergraduate and graduate levels, obtain- The University of Alaska Fairbanks is an equal City and Region ing funded research, publishing in peer-reviewed opportunity/affirmative action employer and ed- The city of Buffalo is the second largest city in New publications, and performing public service. ucational institution. York state, and was recently voted as one of the top Candidates must have a PhD degree or all but ten best places to live and raise a family by Forbes dissertation in Computer Science or equivalent. magazine. Buffalo is near the world-famous Niag- Starting Salary: $81,000 DOE. Based on a 9 month University at Buffalo, The State ara Falls, the Finger Lakes, and the Niagara Wine academic contract. University of New York Trail. The city is renowned for its architecture and The department offers an ABET-accredited Department of Computer Science and features excellent museums, dining, cultural at- BS, MS, and interdisciplinary PhD degree. UAF Engineering tractions, and several professional sports teams, (www.uaf.edu) is the major research campus in and has a packed year-round calendar of cultural the University of Alaska system and hosts several The Department of Computer Science and Engi- events and sporting activities, coupled with rela- research institutes. Fairbanks, a modern com- neering, University at Buffalo invites candidates tively low house prices and great schools. The eco- munity with approximately 98,000 residents, is to apply for multiple tenured and tenure-track nomic renaissance of the region is underlined by a located in interior Alaska between the Alaska and faculty positions beginning in the 2017-2018 aca- revitalized downtown waterfront and an energetic Brooks mountain ranges and noted for the scope demic year. Candidates at all ranks from all areas tech and start-up community. In an extraordinary of unique outdoor activities. We seek candidates of computer science and engineering, including recognition of Western New York’s potential, who will strengthen our degree programs and but not limited to areas covered by existing fac- Governor Andrew M. Cuomo has committed an appreciate the unique geography and climate of ulty strength such as Algorithms, Big Data, Cyber historic $1 billion investment in the Buffalo area interior Alaska. Security, Cyber Physical Systems (or Internet of economy to create thousands of jobs and spur Required documentation includes a cover Things), Databases, Distributed Systems, Embed- billions in new investment and economic activity letter, curriculum vitae, list of three professional ded Systems, Machine Learning, Mobile Comput- over the next several years. references with contact information, and state- ing, Multimedia, Pattern Recognition, Robotics, ment of teaching and research interests. Interest- and Theory. Applicants must have a Ph.D. in com- ed candidates must apply online at http://www. puter science or a related area by August 2017 and The State University of New York at alaska.edu/jobs/ or http://careers.alaska.edu/cw/ demonstrate potential for excellence in research, Buffalo en-us/job/504751?lApplicationSubSourceID= by teaching, service and mentoring. Applicants from Department of Computer Science and submitting all the required documents. underrepresented groups, especially women and Engineering Please direct questions regarding this recruit- minorities, are strongly encouraged. We are look- Non-Tenure Track Lecturer ment, to Dr. Jon Genetti ([email protected]) ing for candidates who can operate effectively in or the department’s web site: www.cs.uaf.edu. a diverse community of students and faculty and The State University of New York at Buffalo De- The successful candidate must be eligible to share our vision of keeping all constituents reach partment of Computer Science and Engineering work in the United States in compliance with the their potential. invites candidates to apply for non-tenure track Immigration Reform and Control Act. Review of Applications will be accepted from October lecturer positions beginning in fall 2017. We in- applications will begin November 21, 2016 and 15, 2016 to January 15, 2017. Applicants must sub- vite applications from candidates from all areas the position will remain open until filled. mit their application electronically via www.ub- of Computer Science and Computer Engineering jobs.buffalo.edu. Posting number 1600687. Any who have a passion for teaching. We are particu- questions can be directed to Search Committee larly looking for candidates who can operate ef- Co-Chairs, Prof. Rohini Srihari and Chang-Wen fectively in a diverse community of students and Chen at [email protected]. The University faculty and share our vision of helping all con- at Buffalo is an Equal Opportunity Employer. stituents reach their potential. Applicants from underrepresented groups, especially women and Computer Science and Engineering Department minorities, are strongly encouraged. We are look- LECTURER POSITIONS Housed in a new $75M building, and as a part of ing for candidates who can operate effectively in Department of Electrical and the School of Engineering and Applied Sciences, a diverse community of students and faculty and Systems Engineering the Computer Science and Engineering depart- share our vision of keeping all constituents reach The University of Pennsylvania’s Department of ment offers both BA and BS degrees in Computer their potential. Electrical and Systems Engineering invites applicants Science and a BS in Computer Engineering (ac- Lecturer’s duties include teaching and devel- for two full-time Lecturer positions. The department seeks individuals with exceptional promise for, or credited by ABET) as well as MS and PhD programs. opment of undergraduate Computer Science and a proven record of, excellence in teaching, course The department currently has 38 tenured/ten- Computer Engineering courses (with an empha- and curriculum innovation. Applicants should have ure-track faculty, 7 teaching faculty, and approxi- sis on lower division), advising undergraduate a Ph.D. degree in Electrical or Systems Engineering mately 900 undergraduate majors, 450 masters students, as well as participation in department or related field. We are particularly interested in candidates that enhance our educational curricula in students, and 160 PhD students. Eighteen faculty, and university governance (service). Contribution the broad areas of: including 16 junior faculty have been hired since to research is encouraged. 1. Computer engineering & embedded systems 2010, and we are continuing to expand. Two mem- (embedded programming, distributed systems, bers of our faculty currently hold key university Computer Science and Engineering Department hardware/software co-design, model-based leadership positions and eight members of our Computer Science and Engineering department design, internet of things), and faculty are IEEE and/or ACM Fellows. Our faculty is housed in a new $75M building and, as a part of 2. Information & systems engineering (control members are actively involved in cutting-edge re- the School of Engineering and Applied Sciences, systems, optimization, robotics, signal search and successful interdisciplinary programs the department offers both BA and BS degrees in processing, stochastic systems, model-based and centers devoted to biometrics; bioinformat- Computer Science, a BS in Computer Engineer- systems engineering, systems engineering projects). ics; biomedical computing; computational and ing (accredited by ABET), a combined 5-year BS/ data science and engineering, document analysis MS program, a minor in Computer Science, two The department is strongly interested in individuals that will balance principles-based lectures with and recognition; high performance computing; joint programs (a BA/MBA and with Computa- hands-on projects addressing emerging application information assurance and cyber security; em- tional Physics), and MS and PhD programs. domains. bedded, networked and distributed systems, and The department currently has 38 tenured/ten- Diversity candidates are strongly encouraged to apply. sustainable transportation. Our annual research ure-track faculty, 7 teaching faculty, and approxi- Interested persons should submit an online applica- expenditure is about $5 Million dollars. mately 900 undergraduate majors, 450 masters tion at http://www.ese.upenn.edu/faculty-positions students, and 160 PhD students. Eighteen faculty, and include curriculum vitae, statement of teaching interests, and three references. University at Buffalo (UB) including 16 junior faculty have been hired since UB is New York’s largest and most comprehensive 2010, and we are continuing to expand. Two mem- The University of Pennsylvania is an Equal Opportunity Employer. Minorities/Women/Individuals public university, with approximately 20,000 bers of our faculty currently hold key university with Disabilities/Veterans are encouraged to apply undergraduate students and 10,000 graduate leadership positions and eight members of our students. faculty are IEEE and/or ACM Fellows. Our faculty

130 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 members are actively involved in cutting-edge re- economy to create thousands of jobs and spur teaching. Completion of all requirements for a search and successful interdisciplinary programs billions in new investment and economic activity Ph.D. in Computer Science or a related field is and centers devoted to biometrics; bioinformat- over the next several years. required at the time of appointment. Candidates ics; biomedical computing; computational and Minimum Qualifications (Position): Ideally, for Associate Professor and Professor positions data science and engineering, document analysis applicants should have a PhD degree in Computer must have demonstrated leadership in their field, and recognition; high performance computing; Science, Computer Engineering, or a related field have established an outstanding independent re- information assurance and cyber security; em- by August 2017. Exceptional applicants with a MS search program and have a record of excellence in bedded, networked and distributed systems, and degree will also be considered. The ability to teach teaching and student mentorship. sustainable transportation. Our annual research at all levels of the undergraduate curriculum is es- Applications must be submitted through the expenditure is about $5 Million dollars. sential, as is potential for excellence in teaching, University’s Academic Jobs website. To apply, go mentoring, service, and research. A background to http://tinyurl.com/zlx5vxx : University at Buffalo (UB) in Computer Science and Computer Engineering To be considered as an applicant, the follow- The University at Buffalo is New York’s largest Education, a commitment to K-12 outreach, and ing materials are required: and most comprehensive public university, with addressing the recruitment and retention of un- ˲˲cover letter approximately 20,000 undergraduate students derrepresented students are definite assets. ˲˲curriculum vitae including a list of publications and 10,000 graduate students. ˲˲statement describing past and current research accomplishments and outlining future research City and Region University of Chicago plans The city of Buffalo is the second largest city in New Assistant Professor/Associate Professor/ ˲˲description of teaching philosophy York state, and was recently voted as one of the top Professor, Computer Science ˲˲three reference letters, one of which must ad- ten best places to live and raise a family by Forbes dress the candidate’s teaching ability magazine. Buffalo is near the world-famous Niag- The Department of Computer Science at the Uni- ara Falls, the Finger Lakes, and the Niagara Wine versity of Chicago invites applications from quali- Reference letter submission information will Trail. The city is renowned for its architecture and fied candidates for faculty positions at the ranks of be provided during the application process. features excellent museums, dining, cultural at- Assistant Professor, Associate Professor, and Pro- Review of complete applications will begin tractions, and several professional sports teams, fessor. The University of Chicago has embarked January 1, 2017 and will continue until all avail- and has a packed year-round calendar of cultural on an ambitious, multi-year effort to significantly able positions are filled. events and sporting activities, coupled with rela- expand its computing and data science activities. The University of Chicago has the highest stan- tively low house prices and great schools. The eco- Candidates with research interests in all areas of dards for scholarship and faculty quality, is dedi- nomic renaissance of the region is underlined by a computer science will be considered. Applica- cated to fundamental research, and encourages revitalized downtown waterfront and an energetic tions are especially encouraged in the areas of AI collaboration across disciplines. We encourage tech and start-up community. In an extraordinary and Machine Learning, Cybersecurity, Human- connections with researchers across campus in recognition of Western New York’s potential, Computer Interaction, and Visual Computing. such areas as bioinformatics, mathematics, mo- Governor Andrew M. Cuomo has committed an Candidates must have demonstrated excel- lecular engineering, natural language processing, historic $1 billion investment in the Buffalo area lence in research and a strong commitment to statistics, and social science to mention just a few.

A personal walk down the computer industry road. BY AN EYEWITNESS. Smarter Than Their Machines: Oral Histories of the Pioneers of Interactive Computing is based on oral histories archived at the Charles Babbage Institute, University of Minnesota. These oral histories contain important messages for our leaders of today, at all levels, including that government, industry, and academia can accomplish great things when working together in an effective way.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 131 CAREERS

The Department of Computer Science (cs. To be considered an applicant, the following Chicago epitomizes the modern, livable, vi- uchicago.edu) is the hub of a large, diverse com- materials are required: brant, and diverse city. Its airports are among puting community of two hundred researchers ˲˲Curriculum vitae with a list of publications the busiest in the world, with frequent non-stop focused on advancing foundations of comput- ˲˲One page teaching statement flights to virtually anywhere. Yet the cost of living, ing and driving its most advanced applications. ˲˲Three reference letters, one of which must whether in an 88th floor condominium down- Long distinguished in theoretical computer sci- address the candidate’s teaching ability town or on a tree-lined street in one of the na- ence and artificial intelligence, the Department tion’s finest school districts, is surprisingly low. is now building strong systems and machine Reference letter submission information will Applications must be submitted at https:// learning groups. The larger community in these be provided during the application process. jobs.uic.edu/. Include a curriculum vitae, teach- areas at the University of Chicago includes the Review of complete applications, including ing and research statements, and names and ad- Department of Statistics, the Computation Insti- reference letters, will begin October 3, 2016, and dresses of at least three references in the online tute, the Toyota Technological Institute at Chi- continue until the position is filled. application. Applicants needing additional infor- cago (TTIC), the Polsky Center for Entrepreneur- The University of Chicago is an Affirmative Ac- mation may contact the Faculty Search at search- ship and Innovation, and the Mathematics and tion/Equal Opportunity/Disabled/Veterans Em- [email protected]. For fullest consideration, ap- Computer Science Division of Argonne National ployer and does not discriminate on the basis of ply by December 1, 2016, but applications will Laboratory. race, color, religion, sex, sexual orientation, gen- be accepted until the positions are filled. The The Chicago metropolitan area provides a di- der identity, national or ethnic origin, age, status University of Illinois is an Equal Opportunity, Af- verse and exciting environment. The local econ- as an individual with a disability, protected veter- firmative Action employer. Minorities, women, omy is vigorous, with international stature in an status, genetic information, or other protected veterans and individuals with disabilities are en- banking, trade, commerce, manufacturing, and classes under the law. For additional information couraged to apply. The University of Illinois con- transportation, while the cultural scene includes please see the University’s Notice of Nondiscrimi- ducts background checks on all job candidates diverse cultures, vibrant theater, world-renowned nation at http://www.uchicago.edu/about/non_ upon acceptance of contingent offer of employ- symphony, opera, jazz, and blues. The University discrimination_statement/. Job seekers in need ment. Background checks will be performed in is located in Hyde Park, a Chicago neighborhood of a reasonable accommodation to complete the compliance with the Fair Credit Reporting Act. on the Lake Michigan shore just a few minutes application process should call 773-702-5671 or from downtown. email [email protected] with The University of Chicago is an Affirmative Ac- their request. University of Kentucky tion/Equal Opportunity/Disabled/Veterans Em- Department of Computer Science ployer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gen- University of Illinois at Chicago The University of Kentucky Computer Science der identity, national or ethnic origin, age, status Department of Computer Science Department invites applications for multiple as an individual with a disability, protected veter- Information Retrieval / Natural Language tenure-track faculty positions to begin in either an status, genetic information, or other protected Processing / Theoretical Computer Science January or August of 2017. classes under the law. For additional information Faculty We are seeking energetic and creative faculty please see the University’s Notice of Nondiscrimi- who have a passion for teaching students and nation at http://www.uchicago.edu/about/non_ The Computer Science Department at the Uni- for building a research program centered on ad- discrimination_statement/. Job seekers in need versity of Illinois at Chicago (UIC) invites applica- vanced computing. We will consider all ranks, of a reasonable accommodation to complete the tions for multiple full-time tenure-track positions with preference for candidates at the assistant application process should call 773-702-5671 or at the rank of Assistant Professor (exceptional se- professor level. Outstanding candidates at the email [email protected] with nior level candidates will also be considered). All rank of assistant professor will be considered for their request. candidates must have a doctorate in Computer an endowed fellowship. Science or a closely related field by the appoint- We value collaborative and interdisciplinary ment’s starting date. Candidates will be expected research. Our faculty members have established University of Chicago to conduct world class research and teach effec- research programs with other members of the Lecturer tively at the undergraduate and graduate levels. department and with a wide variety of other de- Senior candidates must have an outstanding partments and programs, including statistics, The Department of Computer Science at the research record, a strong record of funded re- biology, linguistics, internal medicine, electrical University of Chicago invites applications for the search, demonstrated leadership in collaborative engineering, computer engineering, and the hu- position of Lecturer. Subject to the availability of research, and an excellent teaching record at the manities. We favor researchers who are eager to funding, this would be a two year position with undergraduate and graduate level. collaborate to solve problems that extend beyond the possibility of renewal. This position involves This search primarily seeks candidates in their own research areas. teaching in the fall, winter and spring quarters. three research areas. Please clearly indicate for We seek applications from excellent can- The successful candidate will have competence which one of those areas you wish to be consid- didates in all areas, with a particular desire for in teaching and superior academic credentials, ered. Exceptional candidates from other areas expertise in computer networking, security and and will carry responsibility for teaching comput- may also be considered. The focused research ar- privacy, machine learning, big data and data min- er science courses and laboratories. Completion eas of faculty search are: ing, visualization and computer vision, artificial of all requirements for a Ph.D. in Computer Sci- 1. Information Retrieval and Web search. intelligence, and software engineering. These ar- ence or a related field is required at the time of ap- 2. Natural Language Processing and computa- eas complement the department’s Laboratory for pointment and candidates must have experience tional linguistics. Advanced Networking, Software Verification and teaching Computer Science at the College level. 3. Theoretical Computer Science. Validation Lab, and established collaborations The Chicago metropolitan area provides a di- with the Center for Computational Sciences and verse and exciting environment. The local econ- In addition, we may also have a position in the Center for Biomedical Informatics. omy is vigorous, with international stature in cyber-physical systems. We value teaching and the student experi- banking, trade, commerce, manufacturing, and The Computer Science department has 31 ence. Candidates should be eager and prepared transportation, while the cultural scene includes tenure-system faculty and offers BS, MS and to teach upper-level courses in their areas of ex- diverse cultures, vibrant theater, world-renowned PhD degrees. Our faculty includes 11 NSF CA- pertise, as well as (ultimately) core courses in our symphony, opera, jazz and blues. The University REER award recipients. UIC has an advanced ABET-accredited undergraduate Computer Sci- is located in Hyde Park, a Chicago neighborhood computing and networking infrastructure in ence and Computer Engineering programs. on the Lake Michigan shore just a few minutes place for data-intensive scientific research that The University of Kentucky Computer Science from downtown. is well-connected regionally, nationally and in- Department, one of the first CS departments in Applicants must apply on line at the Universi- ternationally. Further information about the po- the United States, has 21 faculty members com- ty of Chicago Academic Careers website at http:// sitions can be found at https://www.cs.uic.edu/ mitted to excellence in education, research, and tinyurl.com/h84fu8p. MainShowJob?name=facINT. service. The Department offers programs leading

132 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 to the Bachelors, Masters, and Ph.D. degrees. The rate with current faculty within and across de- offers easy access to the region’s resources by car University of Kentucky is located in Lexington, partments at UMBC, fostering interdisciplinary or public transportation. the scenic heart of the Bluegrass Region of Ken- research. Candidates are expected to establish a Electronic submission of application is re- tucky. With recognition as one of the safest, most collaborative, externally funded and nationally quired at http://apply.interfolio.com/37306 for creative, and well-educated cities in the nation, recognized research program as well as contrib- the two positions in Data Science/Big Data and all Lexington is an ideal location to build an out- ute to graduate and undergraduate teaching, ad- Artificial Intelligence/Knowledge Management standing, work-life balanced career. vising, and mentoring that support diversity and applicants should apply at http://apply.interfolio. Candidates must have earned a PhD in Com- inclusion. com/37179. All applications for all three positions puter Science or closely related field at the time The Department offers undergraduate de- must be submitted as PDF files, which include a employment begins. To apply, a University of grees in Information Systems and Business cover letter, CV, a one-page statement of teach- Kentucky Academic Profile must be submitted at Technology Administration. Graduate degree ing interests, a one-page statement of research . Applications are now being ac- programs, MS and PhD, are offered in both Infor- interests and names and contact information of cepted. Review of credentials will begin immedi- mation Systems and Human-Centered Comput- at least three references. For inquiries, please con- ately and continue until the positions are filled. ing, including an innovative online MS program tact Barbara Morris at (410) 455-3795 or bmorris@ For more detailed information about these in IS. Consistent with the UMBC vision, the De- umbc.edu. Review of applications will begin in positions, go to http://ukjobs.uky.edu/post- partment has excellent teaching facilities, state- November 2016 and will continue until the posi- ings/123838. Questions should be directed to HR/ of-the-art laboratories, and outstanding technical tions are filled, subject to the availability of funds. Employment by phone at 1-859-257-9555 press 2 support. Further details on our research, academ- UMBC is an Affirmative Action/Equal Oppor- or email ([email protected]), or to Diane ic programs, and faculty can be found at http:// tunity Employer and is committed to increasing Mier ([email protected]) in the Computer Sci- www.is.umbc.edu. the diversity of its faculty. Applicants from tradi- ence Department. UMBC is a dynamic public research university tionally underrepresented groups are especially Upon offer of employment, successful appli- integrating teaching, research and service. As an encouraged to apply. cants must undergo a national background check Honors University, the campus offers academi- as required by University of Kentucky Human Re- cally talented students a strong undergraduate sources. The University of Kentucky is an equal liberal arts foundation that prepares them for The University of Michigan, Ann Arbor opportunity employer and especially encourages graduate and professional study, entry into the Department of Electrical Engineering and applications from minorities and women. workforce, and community service and leader- Computer Science Computer Science and ship. UMBC emphasizes science, engineering, in- Engineering Division formation technology, human services and pub- Faculty Positions UMBC lic policy at the graduate level. UMBC contributes University of Maryland Baltimore to the economic development of the State and the The University of Michigan Computer Science County region through entrepreneurial initiatives, work- and Engineering (CSE) Division expects strong An Honors University in Maryland force training, K-16 partnerships, and technology growth in the coming years and invites applica- Information Systems Department commercialization in collaboration with public tions for multiple tenure-track positions at all Two Tenure Track positions on Data Science / agencies and the corporate community. Diversity levels. Exceptional candidates from all areas Big Data is a core value of the UMBC and we believe that of computer science and computer engineer- One Tenure Track position in Artificial the educational environment is enhanced when ing will be considered. Qualifications include Intelligence/Knowledge Management diverse groups of people with diverse ideas come an outstanding academic record, a doctorate or together to learn. Therefore, members of under- equivalent in computer science or computer en- The Information Systems (IS) Department at represented groups including women, minori- gineering, and a strong commitment to teaching UMBC is committed to increasing the diversity ties, veterans and individuals with disabilities are and research. The college is especially interested of our community. We invite applications for especially encouraged to apply. in candidates who can contribute, through their three tenure-track faculty positions at the As- UMBC continues to lead U.S. News national research, teaching, and/or service, to the diver- sistant Professor level starting August 2017. We university rankings placing fourth in Most In- sity and excellence of the academic community. are searching for two candidates with research novative National University and sixth in Under- These positions encompass, but are not limited interests and experience in Data Science, a re- graduate Teaching. The Chronicle of Higher Ed- to, several cross-disciplinary areas as well as an search area with high growth and impact in en- ucation for the fifth consecutive year has listed endowed professorship in theoretical computer vironmental sciences, health care, security, ap- UMBC in the “honor roll” of “Great Colleges to science (Fischer Chair). plied statistics and others. The ideal candidate Work For”; it is the only Maryland four-year insti- The University of Michigan is one of world’s will have expertise in conducting large-scale data tution to be recognized. Our strategic location in leading research universities with annual re- science research, such as extracting knowledge the Baltimore-Washington corridor puts us close search funding of well over $1 billion. It consists from data of increasing sizes, velocity, and variety to many important federal laboratories, agen- of highly-ranked departments across engineer- to improve decision making in one or more appli- cies and high-tech companies. UMBC’s campus ing, sciences, business, and arts, as well as a lead- cation domains closely relevant to active research is located on 500 acres just off I-95 between Bal- ing medical school, providing significant oppor- areas in the IS department. We are also search- timore and Washington DC, and less than 10 tunities for research collaborations for Computer ing for a candidate with research interests and minutes from the BWI airport and Amtrak sta- Science faculty. The CSE Division continues to experience in Artificial Intelligence (AI) and/or tion. The campus includes a center for entrepre- lead as a vibrant and innovative force, with over knowledge management (KM). The ideal candi- neurship, and the bwtech@UMBC research and 50 world-class faculty members, over 300 gradu- date should have expertise in conducting AI/KM technology park, which has special programs for ate students, several Research Centers, and a research to improve decision making in applica- startups focused on cybersecurity, clean energy, large and illustrious network of alumni. Ann Ar- tion domains such as social computing, health, life sciences and training. We are surrounded by bor is known to be one of the best college towns business analytics, environmental sustainability, one of the greatest concentrations of commer- in the country. and public welfare. Candidates must have earned cial, cultural and scientific activity in the nation. We encourage candidates to apply as soon a PhD in Information Systems or a related field no Located at the head of the Chesapeake Bay, Bal- as possible. For best consideration for Fall 2017, later than August 2017. timore has all the advantages of modern, urban please apply by December 1, 2016. Positions re- The research areas in the department are: living, including professional sports, major art main open until filled and applications can be Artificial Intelligence/Knowledge Management, galleries, theaters and a symphony orchestra. submitted throughout the year. For more details Databases and Data Mining, Human Centered The city’s famous Inner Harbor area is an excit- on these positions and to apply, please visit the Computing, Software Engineering, and Health ing center for entertainment and commerce. Application Web Page. https://www.eecs.umich. Information Technology. Candidates should be The nation’s capital, Washington, DC, is a great edu/eecs/jobs/. engaged in research that fosters collaboration tourist attraction with its historical monuments The University of Michigan is a Non-Discrim- with at least one of the research areas. Therefore, and museums. Just ten minutes from downtown inatory/Affirmative Action Employer with an Ac- preference will be given to those who can collabo- Baltimore and 30 from the D.C. Beltway, UMBC tive Dual-Career Assistance Program.

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 133 CAREERS

University Of Oregon (including but not limited to architecture, data- York University Department of Computer and Information driven systems, mobile and embedded systems, Science Faculty Position networks, operating systems, and parallel and The Department of Electrical Engineering and distributed systems), privacy and security, and Computer Science, York University, is seeking a The Department of Computer and Information human computer interaction. We also welcome 3-year Contractually Limited Appointment (CLA) Science (CIS) seeks applications for two tenure applicants in other areas of traditional strength at the rank of Sessional Assistant Lecturer (alter- track faculty positions at the rank of Assistant for us including natural language processing. nate stream) to serve as Course Coordinator of all Professor, beginning September 2017. The Uni- Candidates must have a PhD in computer science first-year Major and Service computing courses versity of Oregon is an AAU research university lo- or a related discipline. offered by the Department, to commence July 1, cated in Eugene, two hours south of Portland, and Apply online via https://www.rochester.edu/ 2017, subject to budgetary approval. The success- within one hour’s drive of both the Pacific Ocean faculty-recruiting/login. ful candidate has demonstrated excellence in and the snow-capped Cascade Mountains. Consideration of applications at any rank teaching, is licensed as a Professional Engineer The open faculty positions are targeted to- will begin immediately and continue until all in- in Canada or could obtain licensure in the very wards the following three research areas: 1) high terview slots are filled. Candidates should apply short term, and will assume the teaching of up performance computing, 2) networking and dis- no later than January 1, 2017, for full consider- to 6 course sections. For full position details, see tributed systems and 3) data sciences. We are par- ation. Applications that arrive after this date risk http://www.yorku.ca/acadjobs. Applicants should ticularly interested in applicants whose research being overlooked or arriving after the interview complete the on-line process at http://lassonde. addresses security and privacy issues in these schedule has been filled. The department also yorku.ca/new-faculty/. A complete application sub-disciplines; additionally, we are interested in has a search in progress for a full-time lecturer includes a cover letter, a detailed curriculum vi- applicants whose research complements existing position. Details on eligibility and position re- tae, statement of contribution to research, teach- strengths in the department, so as to support in- sponsibilities may be obtained via http://www. ing and curriculum development, three sample terdisciplinary research efforts. Applicants must cs.rochester.edu/about/recruit.html. research publications and three reference let- have a Ph.D. in computer science or closely relat- The Department of Computer Science is re- ters. Complete applications must be received by ed field, a demonstrated record of excellence in search focused, with a distinguished history of December 31, 2016. York University is an Affirma- research, and a strong commitment to teaching. contributions in artificial intelligence, HCI, sys- tive Action (AA) employer. The AA Program can be A successful candidate will be expected to con- tems, and theory. We have a highly collaborative found at http://www.yorku.ca/acadjobs or a copy duct a vigorous research program and to teach at culture and strong ties to electrical and computer can be obtained by calling the AA office at 416- both the undergraduate and graduate levels. engineering, brain and cognitive science, linguis- 736-5713. All qualified candidates are encour- We offer a stimulating, friendly environment tics, and several departments in the medical cen- aged to apply; however, Canadian citizens and for collaborative research both within the depart- ter. We also have a growing Institute for Data Sci- permanent residents will be given priority. ment, which expects to grow substantially in the ence with potential for synergistic collaboration next few years, and with other departments on opportunities and more joint hires. Over the past campus. The department hosts two research cen- decade, a third of the department’s PhD gradu- York University ters, the Center for Cyber Security and Privacy and ates have won tenure-track faculty positions, and the NeuroInformatics Center. Successful candi- its alumni include leaders at major research labo- The Department of Electrical Engineering and dates will have access to a new high-performance ratories such as Google, Microsoft, and IBM. Computer Science, York University, is seeking a computing facility that opens in October 2016. The University of Rochester is a private, Tier I 3-year Contractually Limited Appointment (CLA) The CIS Department is part of the College of research institution located in western New York at the rank of Sessional Assistant Lecturer (alter- Arts and Sciences and is housed within the Lorry State. It consistently ranks among the top 30 insti- nate stream) to serve as Course Coordinator of all Lokey Science Complex. The department offers tutions, both public and private, in federal fund- first-year Major and Service computing courses B.S., M.S. and Ph.D. degrees. More information ing for research and development. The university offered by the Department, to commence July 1, about the department, its programs and faculty has made substantial investments in computing 2017, subject to budgetary approval. The success- can be found at http://www.cs.uoregon.edu. infrastructure through the Center for Integrated ful candidate has demonstrated excellence in Applications will be accepted electronically Research Computing (CIRC) and the Health Sci- teaching, is licensed as a Professional Engineer through the department’s web site. Applica- ences Center for Computational Innovation (HSC- in Canada or could obtain licensure in the very tion information can be found at http:// www. CI). The university includes the Eastman School short term, and will assume the teaching of up cs.uoregon.edu/Employment/. Applications re- of Music, a premiere music conservatory, and the to 6 course sections. For full position details, see ceived by December 15, 2016 will receive full con- University of Rochester Medical Center, a major http://www.yorku.ca/acadjobs. Applicants should sideration. Review of applications will continue medical school, research center, and hospital sys- complete the on-line process at http://lassonde. until the positions are filled. Please address any tem. The greater Rochester area is home to over yorku.ca/new-faculty/. A complete application questions to [email protected]. a million people, including 80,000 students who includes a cover letter, a detailed curriculum vi- The University of Oregon is an equal oppor- attend its 8 colleges and universities. Tradition- tae, statement of contribution to research, teach- tunity/affirmative action institution committed ally strong in optics research and manufacturing, ing and curriculum development, three sample to cultural diversity and is compliant with the it was recently selected by the Department of De- research publications and three reference let- Americans with Disabilities Act. The University fense as the hub of a $360M-plus Integrated Pho- ters. Complete applications must be received by encourages all qualified individuals to apply, and tonics Institute for Manufacturing Innovation. December 31, 2016. York University is an Affirma- does not discriminate on the basis of any protect- The University of Rochester has a strong com- tive Action (AA) employer. The AA Program can be ed status, including veteran and disability status. mitment to diversity and actively encourages found at http://www.yorku.ca/acadjobs or a copy The successful candidate will have the ability to applications from groups underrepresented in can be obtained by calling the AA office at 416- work effectively with faculty, staff, and students higher education. The University is an Equal Op- 736-5713. All qualified candidates are encour- from a variety of diverse backgrounds. portunity Employer. aged to apply; however, Canadian citizens and EOE Minorities / Females / Protected Veter- permanent residents will be given priority. ans/Disabled University of Rochester Faculty Positions in Computer Science US Naval Academy The University of Rochester Department of Com- puter Science seeks applicants for multiple ten- USNA Electrical & Computer Engineering is seek- ure track positions. Candidates in all areas of ing applicants to fill tenure-track Assistant Pro- computer science who see a good synergistic fit fessor positions in Computer Engineering. Ap- with research initiatives at the university are en- plicants with teaching & research interests in all couraged to apply. We are especially interested areas of computer engineering will be considered. in growing our research strengths in systems Applications accepted through “Apply URL” only.

134 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 last byte

[CONTINUED FROM P. 136] computing Selene held up his hands. “I can’t better than she does?” make promises. But let’s see what we The spokesman, a big man with a “Oh, sure. can do.” jet-black beard, scowled. “Oh, sure. And let’s elect The lights went down in the stu- And let’s elect a toaster head of a cater- dio, leaving the anchor, Ed, the JCN ing union and put a teleprompter in a toaster head founder, and Halle in a pool of shad- charge of journalists. Surely you can of a catering union ow as the spokesman unhooked his see putting AI ahead of human leader- microphone and walked away. The ship doesn’t make sense?” and put a JCN founder watched him go. “The “Come on,” said the anchor. “It’s teleprompter spokesman’s name, Adam Selene, an AI negotiating on behalf of very though. Like in Robert A. Heinlein’s human interests. That’s different, in charge of The Moon Is a Harsh Mistress. He’s not and you know it. Halle isn’t dumb journalists.” an AI, is he?” hardware.” “Genuine wetware,” said the an- “Right,” said the spokesman, “and chor, “unless ARM’s new model is bet- I fully accept that she can trade facts ter than we thought. Sorry, wait … I and figures with any corporate execu- have to power down to recharge my on- tive. But our programmers/voters will board batteries. You’re so lucky, Halle. never identify with a box, even one You wouldn’t believe the power it takes able to communicate so convincing- Halle interrupted the JCN founder to keep a humanoid form active. That’s ly. And her name … Hal from 2001: A as he started to reply. “Think about why we say, Box is Best!” Space Odyssey was not exactly a great it. I’m no threat. I can churn out code The anchor slumped in her seat. role model.” but could never equal a top-flight cre- The JCN founder frowned, walking Halle chuckled, sounding sur- ative human code cutter. What I can over to Halle. “I don’t like that ‘Box is prisingly human. “I told the guys do, sure as hell, is beat any executive Best’ thing. They were chanting it on the name was a problem, but what in a labor negotiation. I’ll know when the Pro-Mech march last week.” can you do? They’re geeks. To them, their figures are cooked and even what Although they were still commu- Hal is an inspiration. Leaving aside they’re hiding. I’m going to fight for the nicating verbally, Halle fired over the mistake of making Hal male …” members 24/7. Who else could take on the electronic equivalent of a grin Halle paused for the anchor’s laugh- 10 different calls, meetings, and email via Bluetooth. “You only hear the hu- ter, “… the whole concept of the HAL messages simultaneously? It’s not pro- manoid AIs using it. They see them- 9000 was flawed. In 2001, Hal says grammers who have to worry but the selves as second class. The anchor is ‘No 9000 computer has ever made a pen pushers.” right to worry for her job—with regu- mistake.’ That alone shows he wasn’t The anchor looked at Selene. “What lar old TV news dying the way it is— intelligent. You can’t be a thinking do you think?” but we’ll always need humanoids for entity without making mistakes. I’m Shrugging, he said, “I’d need evi- jobs requiring mobility. I’m more not built like that, Adam. Though I dence that this wasn’t a management concerned about regular flesh-and- suspect some of the other all-too-hu- trick for the sole purpose of putting blood humans. Why don’t they un- man candidates believe they’re inca- human programmers out of work, like derstand? AIs run the world already; pable of error.” everything else.” we don’t want to take away their fun. “Come on, Adam,” said Ed, the JCN The JCN founder nodded. “AI sys- It should be obvious. I mean, finan- founder. “Accept it. Halle can do the tems like Halle do away with lots of cial markets have been AI-versus-AI job better than any human.” tedious or uncreative but program- since the end of the 20th century. “My dad was an early program- mable work. I appreciate that could And more recently AIs have been mer,” said Selene. “A real hacker. In be a threat, just as mechanization handling most online advertising. the early 1980s, the British software was a threat to unskilled workers in Humans have had long enough to company D.J. AI Systems brought out the industrial revolution. But it will adapt. Why such a struggle when it’s a program called The Last One that open up far more creative opportuni- for their own good?” scared the hell out of him. The idea ties. People are capable of far more The JCN founder straightened Hal- was you bought this application and than flipping burgers or churning le’s cap. “Welcome to politics, Halle. fired your human programmers. Us- out mediocre code. We believe that. You’re going to find that human trust ers specified the requirements and it So here’s an offer. We can open up is about more than logic. Win their wrote the code. Thankfully, it took Halle to any analysis your members hearts first and the rest will follow.” five minutes to generate 100 lines would want to perform. We can dem- of BASIC, so it flopped. But that was onstrate algorithmically that she has Brian Clegg (www.brianclegg.net) is a science writer from the U.K. His most recent books are Ten Billion Tomorrows, Halle’s grandpa—not the program- no interest in providing cover for an exploration of the interplay between science and mers’ friend but their replacement. I management taking their jobs—only science fiction, and the science quiz bookHow Many Moons Does the Earth Have? agree Halle could do a better job than in achieving better job security and many programmers, but how could benefits and getting the bean coun- we ever accept her?” ters off their backs.” © 2016 ACM 0001-0782/16/11 $15.00

NOVEMBER 2016 | VOL. 59 | NO. 11 | COMMUNICATIONS OF THE ACM 135 last byte

From the intersection of computational science and technological speculation, with boundaries limited only by our ability to imagine what could be.

DOI:10.1145/2995264 Brian Clegg Future Tense The Candidate Seeking the programmer vote, an AI delivering a slogan like “Make Coding Great Again” could easily be seen as a threat.

THE NEWS ANCHOR smiled with ve- neered whiteness. “Doctor, before we meet the candidate, I guess we need to clear up the IBM/JCN rumor.” JCN’s founder did his best to pre- tend he had not heard the question before. “Call me ‘Ed,’ please. It’s co- incidence. We’re JCN because our dis- tinctive technology is based on Joseph- son Commutative Networks. We were too busy inventing the world’s most flexible computers to notice the initials were one away from IBM.” “Sure...” The anchor looked un- convinced. “So do you honestly be- lieve anyone will take a glowing green box crammed with digital compo- nents seriously as a candidate for president of the CCA, America’s lead- ing programmers union? We’ll speak to a union spokesman in a moment, but what makes you think your ma- chine has a chance?” “We’ve moved on since Google DeepMind’s machine beat a Go mas- ter and Watson triumphed over the best human players on “Jeopardy!” Halle has many qualities making her an ideal candidate. She certainly un- derstands programming, and, bet- ter, she’s no politician. Perhaps we should get her answer?” in the computing industry. My indus- processors provide general-purpose The anchor smiled agreement, and try. Why shouldn’t I take part?” intelligence and constantly learn from the light rose on a corner of the studio “That’s fine, Halle,” the anchor put their experience. It’s an ideal solution where a neat, glowing green box was on her serious face, “but a lot of people for everyday repetitive work. We may be indeed the avatar of the networked will wonder why machine intelligence descendants of Siri and Cortana, but entity called Halle, topped with a base- is needed for that role?” we’re way more flexible.” ball cap labeled “Make Coding Great “I could turn that back on you,” said “Now let’s bring in the union’s me- Again.” “As president of the Code Cut- Halle, “and ask why we need human dia spokesman, Adam Selene,” said ters Association, I would be represent- intelligence in any role involving rou- the anchor. What’s wrong with Halle ing thousands of highly skilled and tine or repetitive effort. AI is here to running for the post, Mr. Selene?

well-compensated American workers stay. The deep neural networks in my Who knows [CONTINUED ON P. 135] MOZYMOV ALEXANDER BY IMAGE

136 COMMUNICATIONS OF THE ACM | NOVEMBER 2016 | VOL. 59 | NO. 11 DE LANGE CONFERENCE X | DEC. 5–6, 2016 | RICE UNIVERSITY HUMANS, MACHINES AND THE FUTURE OF WORK

The conference will focus on issues created by the impact RENOWNED SPEAKERS AND PANELISTS: of information technology on labor markets over the next Diane Bailey John Markoff 25 years, addressing questions such as: Associate Professor, School of Senior Writer, The New York Times Information, The University of Texas at Austin Lawrence Mishel • What advances in artificial intelligence, robotics President, Economic Policy Institute and automation are expected over the next 25 Guruduth Banavar Vice President, Cognitive Computing, Joel Mokyr years? IBM Research Robert H. Strotz Professor, Northwestern University; John Seely Brown Sackler Professor by Special • What will be the impact of these advances on job Co-chairman, Deloitte’s Center for the Appointment, Eitan Berglas School creation, job destruction and wages in the labor Edge; Adviser to the provost at USC of Economics, Tel Aviv University market? Daniel Castro David Nordfors Vice President, Information Technology Co-founder and Co-chair, and Innovation Foundation Innovation for Jobs Summit • What skills are required for the job market of the Stuart Elliott Debra Satz future? Directorate for Education and Skills Marta Sutton Weeks Professor Organization for Economic Co- of Ethics in Society, Professor of operation and Development (OECD) Philosophy, Senior Associate Dean for • Can education prepare workers for that job the Humanities and Arts, J. Frederick Richard B. Freeman and Elisabeth Brewer Weintz University market? Herbert Ascherman Chair in Fellow in Undergraduate Education, Economics, Harvard University Stanford University

• What educational changes are needed? Eszter Hargittai Manuela Veloso Delaney Family Professor, Herbert A. Simon Professor, Communication Studies Department, School of Computer Science, • What economic and social policies are required to Northwestern University Carnegie Mellon University

integrate people who are left out of future labor John Leslie King Judy Wajcman markets? W.W. Bishop Professor, School of Anthony Giddens Professor of Information, University of Michigan Sociology, The London School of Economics and Political Science • How can we preserve and increase social mobility Vijay Kumar Nemirovsky Family Dean, in such an environment? School of Engineering and Applied Science, University of Pennsylvania

For registration and additional information, visit delange.rice.edu.