COMMUNICATIONS OF THE ACM | CACM.ACM.ORG | 09/2017 VOL. 60 NO. 09

Moving Beyond the Turing Test with the Allen AI Science Challenge

Association for Computing Machinery
The ACM Canadian Celebration of Women in Computing
http://www.can-cwic.ca/

November 3-4, 2017 Montreal, QC at Le Centre Sheraton Hotel

The Canadian Celebration of Women in Computing 2017

Come celebrate with us at the largest gathering of Women in Computing in Canada! The conference will feature prominent keynote speakers, panels, workshops, presentations and posters, as well as a programming challenge and a large career fair. Registration starting September 1st, 2017.

For more information contact us at [email protected]

Association for Computing Machinery

ACM A.M. TURING AWARD
NOMINATIONS SOLICITED

Nominations are invited for the 2017 ACM A.M. Turing Award. This is ACM’s oldest and most prestigious award and is given to recognize contributions of a technical nature which are of lasting and major technical importance to the computing field. The award is accompanied by a prize of $1,000,000. Financial support for the award is provided by Google Inc.

Nomination information and the online submission form are available on: http://amturing.acm.org/call_for_nominations.cfm

Additional information on the Turing Laureates is available on: http://amturing.acm.org/byyear.cfm

The deadline for nominations/endorsements is January 15, 2018.

For additional information on ACM’s award program please visit: www.acm.org/awards/

Previous A.M. Turing Award recipients include:
1966 A.J. Perlis
1968 R.W. Hamming
1970 J.H. Wilkinson
1971 John McCarthy
1972 E.W. Dijkstra
1975 Herbert Simon
1976 Michael Rabin
1978 Robert Floyd
1979 Kenneth Iverson
1980 C.A.R Hoare
1981 Edgar Codd
1985 Richard Karp
1990 Fernando Corbató
1993 Richard Stearns
1998 James Gray
1999 Frederick Brooks
2001 Ole-Johan Dahl
2002 Ronald Rivest
2004 Vinton Cerf, Robert Kahn
2006 Frances E. Allen
2007 Edmund Clarke, E. Allen Emerson
2009 Charles P. Thacker
2010 Leslie G. Valiant
2016 Sir Tim Berners-Lee

Departments
5 Letter from Members of the ACM U.S. Public Policy Council: Toward Algorithmic Transparency and Accountability. By Simson Garfinkel, Jeanna Matthews, Stuart S. Shapiro, and Jonathan M. Smith
6 Cerf’s Up: Take Two Aspirin and Call Me in the Morning. By Vinton G. Cerf
7 Vardi’s Insights: Divination by Program Committee. By Moshe Y. Vardi
8 Letters to the Editor: Computational Thinking Is Not Necessarily Computational
10 BLOG@CACM: Assuring Software Quality By Preventing Neglect. Robin K. Hill suggests software neglect is a failure of the coder to pay enough attention and take enough trouble to ensure software quality.
39 Calendar
101 Careers

News
13 It’s All About Image. Image recognition technology is advancing rapidly. Researchers are discovering new ways to tackle the task without enormous datasets. By Samuel Greengard
16 Broadband to Mars. Scientists are demonstrating that lasers could be the future of space communication. By Gregory Mone
18 Why GPS Spoofing Is a Threat to Companies, Countries. Technology that falsifies navigation data presents significant dangers to public and private organizations. By Logan Kugler
20 Turing Laureates Celebrate Award’s 50th Anniversary. By Lawrence M. Fisher
24 Charles W. Bachman: 1924–2017. An engineer best known for his work in database management systems, and in techniques of layered architecture that include Bachman diagrams. By Lawrence M. Fisher

Viewpoints
26 Law and Technology: Digitocracy. Considering law and governance in the digital age. By Joel R. Reidenberg
29 Computing Ethics: Is That Social Bot Behaving Unethically? A procedure for reflection and discourse on the behavior of bots in the context of law, deception, and societal norms. By Carolina Alves de Lima Salge and Nicholas Berente
32 The Profession of IT: Multitasking Without Thrashing. Lessons from operating systems teach how to do multitasking without thrashing. By Peter J. Denning
35 Viewpoint: Why Agile Teams Fail Without UX Research. Failures to involve end users or to collect comprehensive data representing user needs are described and solutions to avoid such failures are proposed. By Gregorio Convertino and Nancy Frishberg
38 Viewpoint: When Does Law Enforcement’s Demand to Read Your Data Become a Demand to Read Your Mind? On cryptographic backdoors and prosthetic intelligence. By Andrew Conway and Peter Eckersley

Last Byte
104 Q&A: All The Pretty Pictures. Alexei Efros, recipient of the 2016 ACM Prize in Computing, works to harness the power of visual complexity. By Leah Hoffmann

Image courtesy of NASA

2 COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9

Practice
42 The Calculus of Service Availability. You’re only as available as the sum of your dependencies. By Ben Treynor, Mike Dahlin, Vivek Rau, and Betsy Beyer
48 Data Sketching. The approximate approach is often faster and more efficient. By Graham Cormode
56 10 Ways to Be a Better Interviewer. Plan ahead to make the interview a successful one. By Kate Matsudaira
Articles’ development led by queue.acm.org

Contributed Articles
60 Moving Beyond the Turing Test with the Allen AI Science Challenge. Answering questions correctly from standardized eighth-grade science tests is itself a test of machine intelligence. By Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, and Oren Etzioni
Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/moving-beyond-the-turing-test
65 Trust and Distrust in Online Fact-Checking Services. Even when checked by fact checkers, facts are often still open to preexisting bias and doubt. By Petter Bae Brandtzaeg and Asbjørn Følstad

Review Articles
72 Security in High-Performance Computing Environments. Exploring the many distinctive elements that make securing HPC systems much different than securing traditional systems. By Sean Peisert
Watch the author discuss his work in this exclusive Communications video. https://cacm.acm.org/videos/security-in-high-performance-computing-environments

Research Highlights
82 Technical Perspective: A Gloomy Look at the Integrity of Hardware. By Charles (Chuck) Thacker
83 Exploiting the Analog Properties of Digital Circuits for Malicious Hardware. By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester
92 Technical Perspective: Humans and Computers Working Together on Hard Tasks. By Ed H. Chi
93 Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time. By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar, Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham

About the Cover: The Turing Test has long served as the imposing benchmark for artificial intelligence technology. Last year, researchers at the Allen Institute for Artificial Intelligence took a different route by devising a challenge that tested whether machines could handle the reasoning and understanding needed to complete an eighth-grade science test. See their results on p. 60. Cover photo by Andrey Popov, with robot illustration by Peter Crowther Associates.

Photo by Taffpixture; robot illustration by Peter Crowther Associates.

Association for Computing Machinery
Advancing Computing as a Science & Profession

COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields. Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional. Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts, sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO: Bobby Schnabel
Deputy Executive Director and COO: Patricia Ryan
Director, Office of Information Systems: Wayne Graves
Director, Office of Financial Services: Darren Ramdin
Director, Office of SIG Services: Donna Cappo
Director, Office of Publications: Scott E. Delman

ACM COUNCIL
President: Vicki L. Hanson
Vice-President: Cherri M. Pancake
Secretary/Treasurer: Elizabeth Churchill
Past President: Alexander L. Wolf
Chair, SGB Board: Jeanna Matthews
Co-Chairs, Publications Board: Jack Davidson and Joseph Konstan
Members-at-Large: Gabriele Anderst-Kotis; Susan Dumais; Elizabeth D. Mynatt; Pamela Samuelson; Eugene H. Spafford
SGB Council Representatives: Paul Beame; Jenna Neefe Matthews; Barbara Boucher Owens

BOARD CHAIRS
Education Board: Mehran Sahami and Jane Chu Prey
Practitioners Board: Terry Coatta and Stephen Ibaraki

REGIONAL COUNCIL CHAIRS
ACM Europe Council: Dame Professor Wendy Hall
ACM India Council: Srinivas Padmanabhuni
ACM China Council: Jiaguang Sun

PUBLICATIONS BOARD
Co-Chairs: Jack Davidson; Joseph Konstan
Board Members: Karin K. Breitman; Terry J. Coatta; Anne Condon; Nikil Dutt; Roch Guerrin; Chris Hankin; Carol Hutchins; Yannis Ioannidis; M. Tamer Ozsu; Eugene H. Spafford; Stephen N. Spencer; Alex Wade; Keith Webster

ACM U.S. Public Policy Office
1701 Pennsylvania Ave NW, Suite 300, Washington, DC 20006 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Mark R. Nelson, Executive Director

STAFF
DIRECTOR OF PUBLICATIONS: Scott E. Delman ([email protected])
Executive Editor: Diane Crawford
Managing Editor: Thomas E. Lambert
Senior Editor: Andrew Rosenbloom
Senior Editor/News: Lawrence M. Fisher
Web Editor: David Roman
Rights and Permissions: Deborah Cotton
Editorial Assistant: Jade Morris
Art Director: Andrij Borys
Associate Art Director: Margaret Gray
Assistant Art Director: Mia Angelica Balaquiot
Production Manager: Bernadette Shade
Advertising Sales Account Manager: Ilia Rodriguez
Columnists: David Anderson; Phillip G. Armour; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission: [email protected]
Calendar items: [email protected]
Change of address: [email protected]
Letters to the Editor: [email protected]

WEBSITE: http://cacm.acm.org
AUTHOR GUIDELINES: http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686; F (212) 869-0481
Advertising Sales Account Manager: Ilia Rodriguez ([email protected])
Media Kit: [email protected]

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF: Andrew A. Chien ([email protected])
SENIOR EDITOR: Moshe Y. Vardi

NEWS
Co-Chairs: William Pulleyblank and Marc Snir
Board Members: Mei Kobayashi; Michael Mitzenmacher; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs: Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members: William Aspray; Stefan Bechtold; Michael L. Best; Judith Bishop; Stuart I. Feldman; Peter Freeman; Mark Guzdial; Rachelle Hollander; Richard Ladner; Carl Landwehr; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing

PRACTICE
Chair: Stephen Bourne and Theo Schlossnagle
Board Members: Eric Allman; Samy Bahra; Peter Bailis; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Benjamin Fried; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs: James Larus and Gail Murphy
Board Members: William Aiello; Robert Austin; Elisa Bertino; Gilles Brassard; Kim Bruce; Alan Bundy; Peter Buneman; Carl Gutwin; Yannis Ioannidis; Gal A. Kaminka; Karl Levitt; Igor Markov; Gail C. Murphy; Bernhard Nebel; Lionel M. Ni; Adrian Perrig; Sriram Rajamani; Marie-Christine Rousset; Krishan Sabnani; Ron Shamir; Yoav Shoham; Josep Torrellas; Michael Vitale; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs: Azer Bestavros and Gregory Morrisett
Board Members: Martin Abadi; Amr El Abbadi; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; Doug Burger; Stuart K. Card; Jeff Chase; Jon Crowcroft; Alexei Efros; Alon Halevy; Sven Koenig; Steve Marschner; Tim Roughgarden; Guy Steele, Jr.; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

WEB
Chair: James Landay
Board Members: Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2017 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact [email protected].

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA

Printed in the U.S.A.

Letter from Members of the ACM U.S. Public Policy Council

Toward Algorithmic Transparency and Accountability
DOI:10.1145/3125780
Simson Garfinkel, Jeanna Matthews, Stuart S. Shapiro, and Jonathan M. Smith

ALGORITHMS ARE REPLACING or augmenting human decision making in crucial ways. People have become accustomed to algorithms making all manner of recommendations, from products to buy, to songs to listen to, to social network connections. However, algorithms are not just recommending, they are also being used to make big decisions about people’s lives, such as who gets loans, whose résumés are reviewed by humans for possible employment, and the length of prison terms. While algorithmic decision making can offer benefits in terms of speed, efficiency, and even fairness, there is a common misconception that algorithms automatically result in unbiased decisions. In reality, inscrutable algorithms can also unfairly limit opportunities, restrict services, and even improperly curtail liberty.

Information and communication technologies invariably raise these kinds of important public policy issues. How should self-driving cars be required to act? How private is information stored on a cellphone? Can electronic voting machines be trusted? How will the increasing uses of automation in the workplace impact workers? Since its founding, ACM’s members have played a leading role in discussing these issues within the computing profession and with policymakers.

The ACM U.S. Public Policy Council (USACM) was established in the early 1990s as a focal point for ACM’s interactions with U.S. government organizations, the computing community, and the public in all matters of U.S. public policy related to information technology. USACM came to prominence during the debates over cryptography and key escrow technology. Today, USACM continues to make public policy recommendations that are based on scientific evidence, follow recognized best practices in computing, and are grounded in the ACM Code of Ethics. It has established a reputation as a non-partisan, principled, and independent source of scientific and technical expertise, free from the influence of product vendors or other vested interests.

More recently, the ACM Europe Council Policy Committee (EUACM) has been doing the same in Europe. USACM and EUACM, both separately and jointly, provide information and analysis to policymakers and the public regarding important societal issues involving IT, including algorithmic transparency and accountability.

USACM and EUACM have identified and codified a set of principles intended to ensure fairness in this evolving policy and technology ecosystem.(a) These are: (1) awareness; (2) access and redress; (3) accountability; (4) explanation; (5) data provenance; (6) auditability; and (7) validation and testing. Awareness speaks to educating the public regarding the degree to which decision making is automated. Access and redress means there is a way to investigate and correct erroneous decisions. Accountability rejects the common deflection of blame to an automated system by ensuring those who deploy an algorithm cannot eschew responsibility for its actions. Explanation means the logic of the algorithm, no matter how complex, must be communicable in human terms. As many modern techniques are based on statistical analyses of large pools of collected data, decisions will be influenced by the choice of datasets for training, and thus knowing the data sources and their trustworthiness (that is, their provenance) is essential. Auditability for a decision-making system requires logging and record keeping, for example, for dispute resolution or regulatory compliance. Finally, validation and testing on an ongoing basis means that techniques such as regression tests, vetting of corner cases, or red-teaming strategies used in computer security should be employed to increase confidence in automated systems.

As organizations deploy complex algorithms for automated decision making, system designers should build these principles into their systems. In some cases, doing so will require additional research. For example, how to design and deploy large-scale neural networks while ensuring compliance with laws prohibiting discrimination against legally protected groups? This is especially crucial given the ability to infer characteristics such as gender, race, or disability status even if the computer system is not provided with that data directly. How should information on automated decisions be logged to ensure auditability? How can the operation of these networks be explained to technologists and non-technical policymakers alike?

One model for moving forward may be self-regulation by industry. Our experience, however, is that self-regulation is only possible when there is a consensus on a set of relevant standards. We hope our principles can serve as input to such an effort. If policymakers determine regulation is necessary, our principles are available, potentially in the way that the Code of Fair Information Practices provided a basis for decades of privacy regulation around the world.

USACM and EUACM seek input and involvement from ACM’s members in providing technical expertise to decision makers on the often difficult policy questions relating to algorithmic transparency and accountability, as well as those relating to security, privacy, accessibility, intellectual property, big data, voting, and other technical areas. For more information, visit www.acm.org/public-policy/usacm or www.acm.org/euacm.

The authors are members of the ACM U.S. Public Policy Council, for which Stuart S. Shapiro ([email protected]) serves as chair.

(a) https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf

Copyright held by authors.
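One possible answer to the letter’s question of how decisions should be logged for auditability is a tamper-evident, append-only record. The Python sketch below is an illustration under our own assumptions, not a USACM recommendation. It chains each entry to the SHA-256 hash of the previous entry, so that later edits to a recorded decision can be detected. The class name, field names (`model_version`, `inputs`, `decision`, `reason`), and sample data are hypothetical, and a production log would also need timestamps, access control, and an externally anchored copy of the latest hash.

```python
import hashlib
import json


class DecisionLog:
    """Append-only log of automated decisions; each entry hashes its predecessor.

    Field names (model_version, inputs, decision, reason) are illustrative assumptions.
    """

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so identical records always hash identically.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, model_version, inputs, decision, reason):
        body = {
            "model_version": model_version,
            "inputs": inputs,
            "decision": decision,
            "reason": reason,
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return False if an entry was edited, removed from the middle, or reordered.

        Truncating the newest entries is NOT detected unless the latest hash is
        also stored somewhere the log's operator cannot rewrite.
        """
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = DecisionLog()
log.append("loan-model-v1", {"income": 52000, "term_months": 36}, "deny", "score below cutoff")
log.append("loan-model-v1", {"income": 87000, "term_months": 24}, "approve", "score above cutoff")
print(log.verify())  # True
log.entries[0]["decision"] = "approve"  # simulate after-the-fact tampering
print(log.verify())  # False
```

Hash chaining makes tampering detectable, but it does not prevent it. That is why the design pairs the log with an independent copy of the newest hash rather than relying on the log alone.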

Cerf’s Up

Take Two Aspirin and Call Me in the Morning
DOI:10.1145/3130331
Vinton G. Cerf

I use a lot of metaphors in this column and this one is about security. Security is much on my mind these days along with safety and privacy in an increasingly online, programmed world. There is surely little doubt that we are at risk as cyber-attacks increase in scope, scale, and complexity. Our lives are made complex by some of the responses: “Oh, you want to log into this service? What’s your username and password? OK. Now go to your mobile to get a second password that I have sent you. You don’t have cell service where you are? Too bad.” I am not dissing two-factor authentication as I am a huge proponent, but I have experienced situations like this, or a dead battery, and the frustrations are material. At that point, the system might turn to “answers to secret questions,” but that opens up the possibility that your choices of questions and answers are discoverable with a search of the World Wide Web. Ugh.

So where does this leave us? I am fascinated by the metaphor of cyber security as a public health problem. Our machines are infected and they are sometimes also contagious. Our reactions in the public health world involve inoculation and quarantine and we tolerate this because we recognize our health is at risk if other members of society fail to protect themselves from infection. Sadly, virus detection seems to be closing the barn door after the horses have left, to mangle a metaphor. Zero Day attacks cannot be detected with previously cataloged viral signatures, for example. They may help, but perhaps not enough.

One wonders whether we should take the metaphor more seriously and quarantine computers showing signs of infection until they have been purged of their viral load? Of course, that raises the question “How do you know that computer or IoT device is infected?” and “How do you cleanse it?” Answering these questions might take you into potential privacy-violating territory: suppose your computer keeps track of every domain name and IP address it has interacted with. Could you use this list as a detector of potential hazard? Could you go to a service and say “Here’s where I have been; am I at risk?” Alternatively, you might download a blacklist of bad sites and addresses and compare to your list of places. We’ve seen some of the negative side effects of spam blacklists so I am not sure this would work, to say nothing of the question: “Quis custodiet ipsos custodes?”(a)

I do wonder whether machine learning might be useful. Could my computer generate a profile of “normal” Internet interactions and warn me about unusual ones? Will the false alarm rate drive me crazy? How would I know if something is a false alarm? Is there anything like a center for disease control in this space? Google acquired a company called Virustotal(b) a few years ago that maintains a library of viral profiles that allows users to check whether particular URLs or files carry malware. Another site, Stopbadware.org, helps infected websites rid themselves of viral load. There are, of course, a number of companies that offer anti-virus detection software that tries to detect malware as it is encountered or ingested into a computer. So far, these efforts have had only limited success and lead me to wonder whether there are more effective ways of discovering infection by way of behavioral observation.

It is tempting to imagine a home router/firewall that does sophisticated, machine-learned observation to protect programmable devices at home, but since our laptops, mobiles, and other programmed devices roam with us, they really need an on-board detection system (or logging system?) to protect while on the road.

Perhaps we all need to get into a cyber-hygiene habit and run our devices through regular infection checks? And we surely need much better tools with which to detect and combat this endless escalation. We could also do with better user training and services to avoid unsafe places on the Internet and poor security practices that lead to compromise. While I am not advocating for an Internet driver’s license, the preparation for such a metaphorical exam might do us all some good.

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

(a) Roughly, “Who will watch the watchmen?”
(b) https://www.virustotal.com

Copyright held by owner/author.
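Cerf raises two detection ideas: comparing a device’s list of visited hosts against a downloaded blacklist, and flagging departures from a profile of “normal” traffic. Both can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions. The function names, the `min_count` threshold, and the `.example` host names are hypothetical, and a plain frequency count stands in for the machine-learned profile the column imagines.

```python
from collections import Counter


def flag_blocklisted(history, blocklist):
    """Hosts from the browsing history that also appear on a blocklist (case-insensitive)."""
    blocked = {host.lower() for host in blocklist}
    return sorted({host.lower() for host in history} & blocked)


def flag_unusual(baseline, recent, min_count=2):
    """Recent hosts seen fewer than min_count times in the baseline ("normal") history.

    Raising min_count flags more novel traffic but also raises the false-alarm rate.
    """
    profile = Counter(host.lower() for host in baseline)
    return sorted({host.lower() for host in recent if profile[host.lower()] < min_count})


baseline = ["news.example"] * 10 + ["mail.example"] * 5 + ["rare.example"]
recent = ["news.example", "rare.example", "new.example", "BAD.example"]
blocklist = ["bad.example"]

print(flag_blocklisted(recent, blocklist))  # ['bad.example']
print(flag_unusual(baseline, recent))       # ['bad.example', 'new.example', 'rare.example']
```

The `min_count` knob makes Cerf’s false-alarm worry concrete. With a threshold of 2, a host seen once before (`rare.example`) is still flagged. Lowering the threshold to 1 silences it, but it also makes the detector less sensitive to rarely seen hosts.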

Vardi’s Insights

Divination by Program Committee
DOI:10.1145/3122847
Moshe Y. Vardi

DIVINATION IS THE practice of an occultic ritual as an aid in decision making. It has old historical roots. According to the biblical book of Samuel I, in the 11th century BCE, the Hebrew King Saul sought wisdom from the Witch of Endor, who summoned the dead prophet Samuel, before his impending battle with the Philistines. Alexander the Great, after conquering Egypt in 332 BCE, visited the Oracle of Amun at the Siwa Oasis to learn about his future prospects. Divination can be practiced in many ways, including sortilege (casting of lots), reading tea leaves or animal entrails, random querying of texts, and more. Divination has been dismissed as superstition since antiquity; the Greek scholar Lucian derided divination already in the 2nd century CE. Yet the practice persists.

Developments in and in computer science in the 20th century shed new light on the power of divination. Unless we believe that divination truly allows us to consult the divine, we can view it simply as a form of randomization, which is recognized as a powerful construct in game theory and algorithm design. The classical game-theoretic example is the game of Rock-Scissors-Paper in which there is no Nash equilibrium of pure strategies, but there is a Nash equilibrium in which both players choose their actions uniformly at random. The classical Dining Philosophers Problem has no symmetric distributed deterministic solution, but, as shown by Michael Rabin, has such a solution if we allow randomization. The essential insight is that randomization is a powerful way to deal with incomplete information. Thus, as realized by the anthropologist Michael Dove in the 1970s, when the Kantu people of Borneo use birdwatching to decide which sites to farm and which sites to leave fallow, they are simply randomizing in the face of uncertainty about rain, pests, and more, but this randomization comes with a belief in the divine source of the decision. (See essay by Michael Sulson at https://goo.gl/RYb264.)

But what does this have to do with program committees? In 2014, the Neural Information Processing Systems Foundation (NIPS) Conference split the program committee into two independent committees, and then subjected 10% of the submissions (166 papers) to decision making by both committees. The two committees disagreed on 43 papers. Given the NIPS paper acceptance rate of 25%, this means that close to 60% of the papers accepted by the first committee were rejected by the second one and vice versa. (See analysis by Eric Price at https://goo.gl/fy5jLR.) This high level of randomness came as a surprise to many people, but I have found it quite expected. My own experience is that in a typical program-committee meeting there is broad agreement for acceptance about the top 10% of the papers, as well as broad agreement for rejection about the bottom 25% of the papers. For the other 65% of the submissions, there is no agreement and the final accept/reject decision is fairly random. This is particularly true when the accept/reject decision pivots on issues such as significance and interestingness, which can be quite subjective. Yet, we seem to pretend that this random decision reflects the deep wisdom of the program committee.

I believe the NIPS experiment should not only teach us some humility, but should also suggest that we may want to reconsider the basic modus operandi of program committees. The standard approach in such committees can be viewed as “guilty until proven innocent.” We expect only 25%–35% of the papers to be accepted, so the default decision is to reject unless there is strong agreement to accept. But the reality is that a different committee may have reached a different decision on the majority of accepted papers. Is it wise to reject papers based essentially on the whim of the program committee? If we switch mode to “innocent until proven guilty,” we would reject only papers on which there is strong agreement to reject, and accept all other papers.

Beyond the increased fairness of “innocent until proven guilty,” this approach would also increase the efficiency of the conference-publication system. A high rejection rate means that papers are submitted, resubmitted, and re-resubmitted, resulting in a very high reviewing burden on the community. It also results in the proliferation of conferences, which fragments research communities. As I argued in an earlier editorial (https://goo.gl/dUMkwZ), I believe the proper way to adapt to the growth of computing research is to grow our conferences rather than proliferate conferences.

NIPS should be lauded for applying the “publication method” to scientific inquiry. It is up to the computing-research community to draw the conclusions and act accordingly!

Follow me on Facebook, Google+, and Twitter.

Moshe Y. Vardi ([email protected]) is the Karen Ostrum George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University, Houston, TX. He is the former Editor-in-Chief of Communications.

Copyright held by author.

Letters to the Editor

DOI:10.1145/3128899 Computational Thinking Is Not Necessarily Computational

I APPLAUD PETER J. DENNING’S Viewpoint “Remaining Trouble Spots with Computational Thinking” (June 2017), especially for pointing out the subject itself is often characterized by “vague definitions and unsubstantiated claims”; “computational thinking primarily benefits people who design computations and . . . claims of benefit to nondesigners are not substantiated”; and “I am now wary of believing that what looks good to me as a is good for everyone.” Moreover, the accompanying table outlined various historic definitions of “computational thinking,” including a comparison of what Denning called the “new” and the “traditional” view of the subject. However, my own interest in computational thinking differs somewhat from Denning’s. First, I question the legitimacy of the term “computational” itself. Why say it, when the very subject is “computers” and the chief academic approach to their study is “computer science”? If one looks at how computers are actually used, it may come as a surprise to learn that few such uses actually involve computing. For example, applications that deal with scientific and engineering problems are of course heavily computing-focused, but, last I heard, they constitute only approximately 20% of all applications being developed worldwide. The most predominant applications—those for business—involve little computation beyond arithmetic. And systems programs like operating systems and compilers, the focus of much computer science study, historically at least, involve little or no computation and primarily concern manipulating information rather than numbers.

The problem is that computational-thinking enthusiasts, as Denning wrote, are driven to spread the subject across all academic majors. I certainly believe in the importance of programming and using computers for the variety of applications for which they provide benefit and that educational systems worldwide should provide the knowledge and skills that would help students move into the field, should that be their preference. But should computational thinking also be taught to artists, writers, poets, physicians, and lawyers? Not as I see it . . .

The faulty thinking behind the “computer science for all” approach to pedagogy is best seen in Denning’s table, labeled “Traditional versus New Computational Thinking.” Its entry on “domain knowledge” suggested traditionalists see domain knowledge as vitally important to the person doing the computational thinking, while “new” thinking says the importance of computational thinking is domain-independent.

As a practicing programmer who has dabbled in many different application domains over a long professional career, I see it as beyond understanding how anyone could fail to see the importance of deeply knowing a domain to being able to solve problems in that domain.

Robert L. Glass, Toowong, Australia

Author Responds:
Computational thinking is the habits of mind developed from designing computations. The meaning of computation has evolved from the 1960s “sequence of states of a computer executing a program” to today’s “evolution of an information process.” This changed meaning reflects the ever-expanding reach of computing into all sectors of work and life. Many of today’s most popular apps feature computations well beyond arithmetic, as in, say, facial recognition, speech transcription, driverless cars, and industrial robots. The computational thinking developed by those who worked on these achievements is much more powerful than the handful of programming concepts offered as the definition of “new CT.”

Peter J. Denning, Monterey, CA

Time to Retire ‘Computational Thinking’?

Peter J. Denning asked, “What is computational thinking?” in his Viewpoint (June 2017), then quoted the following definition by Al Aho: “Abstractions called computational models are at the heart of computation and computational thinking. Computation is a process that is defined in terms of an underlying model of computation, and computational thinking is the thought processes involved in formulating problems so their solutions can be represented as computational steps and algorithms.” But as Aho’s definition is highly circular, it reveals very little.

All disciplines rely on models. The only specifically computational word here is “algorithms.” If we replaced it with similar words, like “procedures” or “sequences,” we would arrive at such vacuous “definitions” as, say, “Medicine is a process that is defined in terms of an underlying model of medicine, and medical thinking is the thought processes involved in formulating problems so their solutions can be represented as medical steps and procedures.” And “Drama is a process that is defined in terms of an underlying model of drama, and dramatic thinking is the thought processes involved in formulating problems so their solutions can be represented as dramatic steps and sequences.” One could analogously “define” musical thinking, artistic thinking, chemical thinking, and so forth.

Unless somebody can come up with a more insightful definition, it is indeed time to retire “computational thinking.”

Lawrence C. Paulson, Cambridge, England

Toward a True Measure of Patent Intensity

In their article “How Important Is IT?” (July 2017), Pantelis Koutroumpis et al. described a methodology for assessing the importance of information and communications technologies (ICTs) compared to non-ICT technologies, using PatStat, a dataset from the European Patent Office of 90 million patents awarded from 1900 to 2014. Controlling for variables (such as patent office, year

of grant, and patent family), they concluded ICT patents are more influential than non-ICT patents because they receive significantly more citations and a considerably higher PageRank.

When one publication (not just those involving patents) is cited more often than some other publication, the more-cited one is thus more influential. However, patent publications are unique because they not only describe novel systems and methods but also hold commercial value and represent licensable assets for their holders. A patent may be cited hundreds of times yet still have relatively low financial value; on the other hand, a patent may be cited only rarely yet reflect enormous valuation.

Consider that in 2013, Kodak, the company that invented the digital camera, sold its portfolio of 1,100 digital photography-related patents to multiple licensees for $525 million (or $477.3K per patent). Earlier, Google bought Motorola Mobility and its 17,000 patents for $12.5 billion (or $735.3K per patent), and Microsoft acquired 800 patents from AOL for $1.06 billion (or $1.33M per patent). Snap paid the exceptional price of $7.7 million for Mobli’s Geofilters patent, believed by TechCrunch to be the highest amount ever paid for a patent from an Israeli tech company.

However, the valuations of most patents are unknown until they are indeed auctioned or sold off. For instance, ICT-related patents (such as those involving Google’s and Microsoft’s methods for faster Internet browsing)1 may have impressive valuations, but those valuations are difficult to predict before actually being auctioned or sold off.

Considering non-ICT patents, the revenue streams of several pharmaceutical companies depend on patents and their corresponding expiration dates, and one patent could be worth billions over the course of its licensing period. Notable patented medications include Pfizer’s Lipitor (for lowering fatty acids known as lipids), Bristol-Myers Squibb’s Plavix (for preventing heart attacks and strokes), and Teva’s Copaxone (for treating multiple sclerosis). Other non-ICT patents that have significantly and directly improved people’s lives are cited only rarely, including those related to agriculture, transportation, and creation of new materials.

In the most recent U.S. Patent and Trademark Office’s economy update,2 the non-ICT “basic chemicals” category ranked first, with $64.5 billion in merchandise exports of selected intellectual-property-intensive industries, while “semiconductors and electronic components” was second at $54.8 billion. Most industries involve non-ICT technology. As for “patent intensity,” or the ratio of patents to employees measured as patents/thousand jobs, “computer and peripheral equipment” and “communications equipment” topped the list, though this was due directly to the relatively high number of patents issued in the industry versus the industry’s relatively low number of employees. Conclusions regarding level of influence of ICT technologies versus other types of technologies should thus be reported with care when a comparison is based solely on number of inventions and citations.

If such influence is indeed the basis for a comparison, then additional covariates should be controlled for, including the mean estimated valuation per patent, number of employees in the industry, and additional financial and industry-specific characteristics.

References
1. Kartoun, U. A user, an interface, or none. Interactions 24, 1 (Jan.-Feb. 2017), 20–21.
2. U.S. Patent and Trademark Office. Intellectual Property and the US Economy: 2016 Update. U.S. Patent and Trademark Office, Washington, D.C., 2016; https://www.uspto.gov/sites/default/files/documents/IPandtheUSEconomySept2016.pdf

Uri Kartoun, Cambridge, MA

Authors Respond:
Although there may be some correlation between patent price and technological influence, the relationship is neither clear nor systematic. A patent’s price is more likely driven by how incremental, radical, or breakthrough the patent is, whether its value is standalone or as part of a bundle, projected commercialization timescale, cost versus risk, bidder’s experience, patent age, rate of technological change, and substitution and reverse-engineering risk, to say nothing of broader economic factors. Perhaps our technological-influence measure could thus be used to help understand patent pricing.

Pantelis Koutroumpis, London, U.K., Aija Leiponen, Ithaca, NY, and Llewellyn D W Thomas, London, U.K.

Communications welcomes your opinion. To submit a Letter to the Editor, please limit yourself to 500 words or less, and send to [email protected].

©2017 ACM 0001-0782/17/09

Call for Nominations for ACM General Election

The ACM Nominating Committee is preparing to nominate candidates for the officers of ACM: President, Vice-President, Secretary/Treasurer; and two Members at Large.

Suggestions for candidates are solicited. Names should be sent by November 5, 2017 to the Nominating Committee Chair, c/o Pat Ryan, Chief Operating Officer, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA.

With each recommendation, please include background information and names of individuals the Nominating Committee can contact for additional information if necessary.

Alexander L. Wolf is the Chair of the Nominating Committee, and the members are Karin Breitman, Judith Gal-Ezer, Rashmi Mohan, and Satoshi Matsuoka.

The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/3121430 http://cacm.acm.org/blogs/blog-cacm Assuring Software Quality By Preventing Neglect Robin K. Hill suggests software neglect is a failure of the coder to pay enough attention and take enough trouble to ensure software quality.

Robin K. Hill
The Ethical Problem of Software Neglect
http://bit.ly/2roEDf1
May 31, 2017

Ethical concern about technology enjoys booming popularity, evident in worry over artificial intelligence, threats to privacy, the digital divide, reliability of research results, and vulnerability of software. Concern over software shows in cybersecurity efforts and professional codes.1 The black hats are hackers who deploy software as a weapon with malicious intent, and the white hats are the organizations that set safeguards against defective products. But we have a gray-hat problem—neglect.

My impression is that the criteria under which I used to assess student programs—rigorous thought, design, and testing, clean nested conditions, meaningful variable names, complete case coverage, careful modularization—have been abandoned or weakened. I have been surprised to find, at prestigious institutions working on open-source projects, that developers produce no documentation at all, as a matter of course, and that furthermore, during maintenance cycles, they do not correct the old source code comments, seeing such edits as risky and presumptuous. All of these people are fine coders, and fine people. Their practices seem oddly reasonable in the circumstances, under the pressure of haste, even while those practices degrade the understandability of the program. Couple that with the complexity of modern programs, and we conclude that, in some cases, programmers simply don’t know what their code does.

Examples of software quality shortcomings readily come to mind—out-of-bounds values unchecked, complex conditions that identify the wrong cases, initializations to the wrong constant. Picture a clever and conscientious coder finishing up a calendar module before an important meeting. She knows that the test for leap years from the numeric yyyy value, if (yyyy mod 4 = 0) and (yyyy mod 100 != 0), must be refined by some other rules to correct for what happens at longer periods, but this code is a prototype ... She retains the simple test, meaning to look up the specifics ... but her boss commits her code. No harm is foreseeable ... except that it turns out to interface with another module where the leap-year calculation incorporates the complete set of conditions, which is discovered to drive execution down the wrong path in some calculations. The program is designated for fixing but it continues to run, those in the know compensating for it somehow...

What sort of violation is neglect? It doesn’t attack security because it occurs behind the firewall. It doesn’t attack ideals of quality because no one officially disputes those ideals. It is a failure of degree, a failure to pay enough attention and take enough trouble. Can philosophy help clarify what’s wrong? An emerging theory called the ethics of care displaces the classical agent-centered morality of duty and justice, endorsing instead patient-centered morality as manifest real-time in relationships.2,4
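To make the leap-year scenario in Hill’s post concrete, here is a minimal Python sketch. It assumes the prototype keeps exactly the simplified test quoted in the post, while the other module implements the complete Gregorian rule (a year divisible by 100 is a leap year only if it is also divisible by 400). The function names are hypothetical. The sketch shows how rarely the two versions diverge, which is part of why such neglect can go unnoticed.

```python
from typing import List


def is_leap_prototype(yyyy: int) -> bool:
    # The simplified test the coder keeps: divisible by 4 but not by 100.
    return yyyy % 4 == 0 and yyyy % 100 != 0


def is_leap_gregorian(yyyy: int) -> bool:
    # The complete Gregorian rule adds the 400-year correction:
    # a century year is a leap year only when divisible by 400.
    return (yyyy % 4 == 0 and yyyy % 100 != 0) or yyyy % 400 == 0


def disagreements(start: int, stop: int) -> List[int]:
    # Years in [start, stop) on which the two modules take different paths.
    return [y for y in range(start, stop)
            if is_leap_prototype(y) != is_leap_gregorian(y)]


if __name__ == "__main__":
    # Every year from 2001 through 2099 agrees, so tests confined to
    # this century's dates would never expose the divergence.
    print(disagreements(2001, 2100))  # []
    print(disagreements(1600, 2401))  # [1600, 2000, 2400]
```

Python’s standard library already provides `calendar.isleap`, which implements the complete rule; reusing a tested routine like that, rather than a hand-rolled approximation, is one inexpensive form of the care the post argues for.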


The theory offers a contextual perspective rather than the cut-and-dried directives of more traditional views. While care can be construed as a virtue (relating to my prior post in this space3) or as a goal like justice, the promoters of care ethics resist a universal mandate. They may also reject this attempt to apply it to software, of all things; the heart of the matter for care ethics is the work of delivering care to a person in need. Yet software neglect seems exactly the type of transgression addressed by the ethics of care, if we allow its reinterpretation outside of human relationships. Appeal to the theory allows us to identify the opposite of care, that is, neglect, as the quality to condemn. This yields our account of software quality as an ethical issue, especially piquant in its application of tools from the feminist foundry to the code warrior culture. But little credit is due! We are not solving the problem, only embedding it in the terms of a philosophical platform.

This account raises issues in the ethics of engineering, such as individual versus corporate responsibility (and whether corporate responsibility can be rendered coherent and enforceable short of the law). For a concise summary, see Section 3.3.2, on Responsibility, in the Stanford Encyclopedia of Philosophy entry on the Philosophy of Technology.5

The quality that has corrected for neglect in the past is professionalism, by which I mean that the expert does what’s best for the client even at a cost to personal time, energy, money, or prestige—within reason! Certainly these judgments are subjective, and viable when the professional is autonomous, when that single person exercises control over the product and its quality. Counterforces in the current tech business world are (1) employment, under which most programmers are not consultants, but rather given orders by a company; and (2) collaboration, under which most software is the product of committees, in effect. Professionalism also depends on strong personal identification with disciplinary peers and pride in the group’s traditions.

In the face of knotty difficulties enforcing or fostering ideals of quality, one possible resolution, odd as it may seem, is simply to acknowledge the situation, to admit to the public that software is not always reliable, or mature, or even understood. Given its familiarity with bug fixes, the public may not be unduly shocked. If we prefer to reject that fatalistic move, the pressing question is, are there some public standards that developers can and will actually follow? The collective response will determine whether software engineering is a profession. I urge all coders who wish to take pride in their jobs to read the draft professional standards,1 which mention code quality in Section 2.1.

We see that ethical issues appear not only in the external social context, but in the heart of software, the coding practice itself, a gray-hat problem, if you will. We hope that the ethics of care can somehow help to alleviate those issues.

References
1. Association for Computing Machinery. Code 2018 Project. https://ethics.acm.org/.
2. Burton, B.K., and Dunn, C.P. Ethics of Care. Encyclopædia Britannica, https://www.britannica.com/topic/ethics-of-care.
3. Hill, R.K. Ethical Theories Spotted in Silicon Valley. Blog@CACM, March 16, 2017, https://cacm.acm.org/blogs/blog-cacm/214615-ethical-theories-spotted-in-silicon-valley/fulltext.
4. Sander-Staudt, M. Care Ethics. The Internet Encyclopedia of Philosophy, 2017. http://www.iep.utm.edu/care-eth/.
5. Franssen, M., Lokhorst, G., and van de Poel, I. Philosophy of Technology. The Stanford Encyclopedia of Philosophy (Fall 2015 Edition), Edward N. Zalta (ed.). https://plato.stanford.edu/archives/fall2015/entries/technology/.

Note: While the Web encyclopedias, as cited, provide good surveys of current philosophical views, pursuit of any ideas in depth will require reading original research.

Comments

This is possibly the most important paragraph of the article, outlining the exact problem in the industry:

“The quality that has corrected for neglect in the past is professionalism, by which I mean that the expert does what’s best for the client even at a cost to personal time, energy, money, or prestige—within reason! Certainly these judgments are subjective, and viable when the professional is autonomous, when that single person exercises control over the product and its quality. Counterforces in the current tech business world are (1) employment, under which most programmers are not consultants, but rather given orders by a company; and (2) collaboration, under which most software is the product of committees, in effect. Professionalism also depends on strong personal identification with disciplinary peers and pride in the group’s traditions.”

It sounds like, short of working for enlightened organizations, software developers should be leaning towards more autonomy and self-ownership.

I recently read Developer Hegemony (a very bold title!), http://amzn.to/2pA18wB, and it addresses that side of the issue by encouraging more professionalism and autonomy. There’s already a strong movement in favor of Software Craftsmanship, and the free software and open source movements both seem to care more about quality than most companies (though they do neglect documentation sometimes). For example, we already prefer software written by recognizably smart/professional developers.

Here’s hoping to more autonomy in the future and the allowance of our professionalism to counteract the neglect of software.

—Rudolf Olah

Robin K. Hill is an adjunct professor in the Department of Philosophy at the University of Wyoming.

© 2017 ACM 0001-0782/17/09 $15.00

Introducing ACM Transactions on Human-Robot Interaction

Now accepting submissions to ACM THRI

In January 2018, the Journal of Human-Robot Interaction (JHRI) will become an ACM publication and be rebranded as the ACM Transactions on Human-Robot Interaction (THRI).

Founded in 2012, the Journal of HRI has been serving as the premier peer-reviewed interdisciplinary journal in the field.

Since that time, the human-robot interaction field has experienced substantial growth. Research findings at the intersection of robotics, human-computer interaction, artificial intelligence, haptics, and natural language processing have been responsible for important discoveries and breakthrough technologies across many industries.

THRI now joins the ACM portfolio of highly respected journals. It will continue to be open access, fostering the widest possible readership of HRI research and information. All issues will be available on the ACM Digital Library.

Editors-in-Chief Odest Chadwicke Jenkins of the University of Michigan and Selma Šabanović of Indiana University plan to expand the scope of the publication, adding a new section on mechanical HRI to the existing sections on computational, social/behavioral, and design-related scholarship in HRI.

The inaugural issue of the rebranded ACM Transactions on Human-Robot Interaction is planned for March 2018.

To submit, go to https://mc.manuscriptcentral.com/thri

news

Science | DOI:10.1145/3121434 | Samuel Greengard

It’s All About Image

Image recognition technology is advancing rapidly. Researchers are discovering new ways to tackle the task without enormous datasets.

DISCOVERING THE SECRETS of the universe is not a task for the timid and the impatient; there’s a need to peer into the deepest reaches of outer space and try to make sense of distant galaxies, stars, gas clouds, quasars, halos, and black holes. “Understanding how these objects behave and how they interact gives us answers to how the universe was formed and how it works,” says Kevin Schawinski, an astrophysicist and assistant professor in the Institute for Astronomy at ETH Zurich, the Swiss Federal Institute of Technology.

The problem is that traditional tools such as telescopes can see only so far, even with radical advances in optics and the placement of observatories in space, where they are free of the light and dust of Earth. For instance, the Hubble Telescope changed the way astrophysicists and astronomers viewed deep space by delivering far clearer images than previously possible. Of course, in this context, distance and time are inextricably linked. “But the images still do not allow us to see as far back in time as we would like,” Schawinski says. “The farther we can see, the more we can understand about the origins of the universe and how it has evolved.”

Enter computer image recognition, artificial neural networks, and data science; together, they are changing the equation. As huge volumes of data stream in, they are able to find answers to previously unfathomable questions. In recent years, scientists have begun to train neural nets to analyze data from images captured by cameras in telescopes located on Earth and in space. In many cases, the resulting machine-based algorithms can sharpen blurs and identify distant objects better than humans can.

“Data science and big data are revolutionizing many areas of astrophysics,” says François Lanusse, a postdoctoral researcher in the McWilliams Center for Cosmology at Carnegie Mellon University.

Indeed, the combination of more data, advances in data science, and new methods that allow researchers to easily and cheaply train neural networks is allowing scientists to boldly see where they have never seen before. No less important, these advances are not limited to astrophysics and astronomy; they have touched an array of other fields and have advanced autonomous vehicles, robots, drones, smartphones and more. They’re also being used to better understand everything


from how linguistic patterns contribute to racism to identifying the potential severity of hurricanes as they form.

Says Jeff Clune, an assistant professor of computer science at the University of Wyoming, “Until very recently, computers did not see and understand the world very well. The ability to train neural nets quickly and easily is transforming image recognition and enabling remarkable breakthroughs.”

The end goal? “We want to just hand the computer the data and the algorithm and have it deliver results,” Schawinski says. “This type of capability would revolutionize astrophysics, but also science in general.”

Picture Perfect

Artificial neural nets are nothing new. The concept originated in the 1940s and researchers have experimented with them for the last quarter-century. Yet it was only over the last few years that the technology has matured to the point where computer image recognition and other artificial intelligence (AI) capabilities have become viable.

Using anywhere from one to sometimes hundreds of graphical processing units (GPUs), these training networks—which function in a similar way to neural pathways in the human brain—recognize patterns in data that other computing systems cannot. Layered nodes learn from each other—and from other networks—much like the way children learn. Remarkably, because of their overall complexity, nobody knows exactly how each trained artificial neural net produces its useful results.

Rapid advancements in neural nets and deep learning are a result of several factors, including faster and better GPUs, larger nets with deeper layers, huge labeled datasets to train on, new and different types of neural nets, and improved algorithms. Typically, for computer image recognition, researchers feed lots of pictures of things—motorcycles, chimpanzees, trees, or space objects, for example—into the system so the neural net can learn what an object looks like and how to differentiate it from others. If a researcher is training the neural net to recognize animals, the system tends to learn faster and better if old data is transferred to the new task. For instance, if the original task was to identify lions and zebras, adding this data to the job of identifying elk and bears will help.

The system succeeds because there is now a shared knowledge between the two paths. “Already being good at one task makes a neural network faster and better at learning the second task,” Clune explains. “The system already has a basic understanding of things that are common to both tasks, such as eyes, ears, legs, and fur.” As training proceeds and a neural net becomes smarter, it can identify photos and other images it has never seen before. For example, Clune has achieved an accuracy rate as high as 96.6% with a neural net, compared to the 40,000+ humans who volunteered to label the same images. Others have found that the neural nets actually outperform humans. And, he says, “In most cases, we can train a neural net within a couple of days.”

Of course, this doesn’t mean that all systems are equally effective, or that the results are consistently useful. There’s also the goal of pushing the boundaries of computer image recognition further. At present, researchers train systems using labels. This means designating images for one type of animal ‘a lion’ and another ‘a zebra,’ or one galaxy ‘a spiral’ and another ‘an elliptical.’ The problem with this approach is that it’s time consuming and sometimes expensive. What is more, “sometimes you don’t have labels, or they are noisy labels,” says Ce Zhang, an assistant professor in the Systems Group at ETH Zurich. For instance, a “cougar” label might confuse the system if it is presented with both the car and the animal.

Consequently, researchers are interested in an emerging area of deep learning that relies on different training methods, as well as unsupervised learning. University researchers as well as companies such as Alphabet, which operates Google Brain and DeepMind, have begun to study this space. They are turning to convolutional systems modeled after the visual processing that takes place in humans, and generative systems that rely on a more conventional statistical-based approach to learn the features of a dataset.

A Sharper Focus

Advances in AI are now pushing the boundaries of neural nets and deep learning into an almost sci-fi realm, though the results produced by these systems are very real. Consider: Clune now uses generative systems to produce artificial images that look completely real to the human eye. These photo-realistic images range from birds and insects to mountains and even vehicles. He describes the technology as a “game changer.” Remarkably, over time, certain neurons in the deep learning network become better than others at recognizing and generating specific things, such as eyes, noses, bugs, or volcanoes. “The system actually figures out what it needs to recognize and know and allocates neurons to these concepts automatically,” he says.

To be sure, generative networks have value that extends beyond producing artificial images for art, video games, or augmented/virtual reality (AR/VR). Researchers have begun to use generative networks in competition with image-recognition networks to generate even more accurate results. Within this scenario, the generator network creates fake images and the image recognition network, known as a discriminator, analyzes the images and attempts to separate the real from the fake images. The discriminator later checks the validity of its findings and uses those results to further refine its algorithm. Over time, the discriminator becomes smarter and tells the generator how to adapt its output to generate even more realistic images. The advantage of this approach is

that the discriminator in such a generative adversarial net (GAN) learns over time what matters most in the image, Zhang says. At a certain point, the system displays almost human-like intuition, he says; “results improve significantly.” Interestingly, this approach not only improves the quality of image detection, it may also trim the time required to train a network by reducing the number of images—essentially the volume of data—required to obtain useful results. Says Zhang: “An interesting question is how can we lower the requirement of a neural network in terms of how much data it needs to achieve the current level of quality?”

Another step is to make today’s artificial neural nets easier to use. The technology is still in its infancy and researchers often struggle to use tools and technology effectively. In some cases, they have to work with multiple nets in an iterative fashion to find one that works best. As a result, Zhang has developed a software program, ease.ml, that configures deep learning neural networks in a more automated and efficient way. This includes optimizing components such as CPUs, GPUs, and FPGAs and providing a declarative language for better managing algorithms. “Right now, the user needs to deal with a lot of different decisions, including the type of neural net they want to use. There may be 20 different neural nets available for the same task. Choosing the right model and reducing complexity is important,” he explains.

Already, the software, combined with other deep learning techniques—including an algorithm called ZipML that reduces data representation without reducing accuracy—has cut noise and sharpened images significantly for the astrophysics group at ETH Zurich. As a result, Schawinski and others can now peer more deeply into the universe. “Unlike other areas of science, we cannot run experiments in a lab and simply analyze the results,” ETH Zurich explains. “We are dependent on telescopes and images to look back in time. We have to piece together all these fixed snapshots—essentially huge datasets—to gain insight and knowledge.”

Adds Lanusse: “Classical methods of astronomy and astrophysics are rapidly being superseded by data science and machine learning. They not only do a job better, but they also offer new ways of looking at the data.”

The view into the future is equally compelling. Lanusse says that in the coming years neural networks will drive enormous advances in fields beyond astrophysics. These systems will not only detect, recognize, and classify objects, they will understand what is taking place in an image or in a scene in real time. This, of course, could profoundly impact everything from the way autonomous vehicles operate to how medical diagnostics work. Ultimately, they will help us unlock the mysteries of our planet and the universe. They will deliver a level of understanding that wouldn’t have been imaginable only a few years ago.

Says Lanusse, “Computer image recognition is advancing rapidly. We are finding ways to train networks faster and better. Every gain in speed and accuracy of even a few percent makes a profound difference in the real-world impact.”

Further Reading

Nguyen, A., Yosinski, J., Bengio, Y., Dosovitskiy, A., and Clune, J.
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. Computer Vision and Pattern Recognition (CVPR ’17), 2017. http://www.evolvingai.org/ppgn

Lanusse, F., Ma, Q., Li, N., Collett, T.E., Li, C., Ravanbakhsh, S., Mandelbaum, R., and Poczos, B.
CMU DeepLens: Deep Learning for Automatic Image-based Galaxy-Galaxy Strong Lens Finding. March 2017. arXiv:1703.02642. https://arxiv.org/abs/1703.02642

Wang, K., Guo, P., Luo, A., Xin, X., and Duan, F.
Deep neural networks with local connectivity and its application to astronomical spectral data. 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, 2016, pp. 002687-002692. doi: 10.1109/SMC.2016.7844646. http://ieeexplore.ieee.org/document/7844646/

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.
Generative Adversarial Networks. June 2014. eprint arXiv:1406.2661. http://adsabs.harvard.edu/cgi-bin/bib_query?arXiv:1406.2661

Samuel Greengard is an author and journalist based in West Linn, OR.

© 2017 ACM 0001-0782/17/09 $15.00

ACM Member News
ENSURING TECHNOLOGY BEHAVES CORRECTLY

“Things should do what they are expected to do, according to a specification,” says Marta Kwiatkowska, professor of computing systems at the University of Oxford. She explains that something should happen with high probability, within an appropriate or expected time or expected range. “My main focus is on developing verification techniques and model checking for probabilistic systems, which ensure software, systems, hardware, and protocols behave correctly.”

Kwiatkowska has held a statutory chair in the Department of Computer Science at Oxford, and a professorial fellowship at the University’s Trinity College, since 2007. Prior to that, she was a professor in the School of Computer Science at the University of Birmingham, a lecturer at the University of Leicester, and an assistant professor at Jagiellonian University in Krakow, Poland.

She earned an undergraduate degree in computer science at Jagiellonian University, writing programs on punch cards in PASCAL. Kwiatkowska then earned a master’s degree from Oxford, and a Ph.D. in computer science from the University of Leicester.

Initially her research interests centered on concurrent and distributed systems, but in 1995 Kwiatkowska started working on verification techniques. Her research covers a range of applications including biological systems, DNA computations, and analyzing the behavioral correctness of pacemakers, among others.

Kwiatkowska now studies autonomous systems and the application of verification techniques to robotics. “We need to develop methods to verify the correctness of the behavior of robots,” she says. “I am also looking at verification for machine learning, specifically neural networks, which are now being used in perception algorithms for self-driving cars.”

—John Delaney


Technology | DOI:10.1145/3121442

Broadband to Mars

Scientists are demonstrating that lasers could be the future of space communication.

Gregory Mone

IN MARCH, THE U.S. National Aeronautics and Space Administration (NASA) announced that its planned Orion spacecraft, which could one day carry astronauts to the Moon and Mars, will include a new kind of communication system. Typically, manned and unmanned vehicles and probes use radio waves to send and receive information. For decades, though, scientists have been pushing toward using laser-based communications in space. Lasers are no faster, but they can deliver far more information than radio waves in the same amount of time. NASA’s Apollo missions to the Moon were capable of transmitting 51kb worth of data per second, for example, but Orion’s planned Laser-Enhanced Mission and Navigation Operational Services (LEMNOS) system could send back more than 80 megabytes each second from the lunar surface.

Artist’s conception of how a NASA spacecraft would use lasers to communicate with Earth. IMAGE COURTESY OF NASA

That stream could be packed with rich scientific data, or it could include ultra-high-resolution video of distant worlds. Scaled-up versions of this system could dispatch movies of dust devils, storms, or even astronauts walking on the surface of Mars. During the six-month-long trip to the Red Planet, space travelers could potentially trade videos with family members back on Earth, and mitigate the psychological toll of the long journey.

The LEMNOS project is just one of many planned or existing laser-based communications systems in orbit and beyond.

These recent and anticipated advances cannot be attributed to a single, revolutionary breakthrough, according to experts. Instead, this new age of laser-based broadband in space has resulted from steady improvements in detectors, actuators, control systems, and more.

Broadband in Orbit

The idea of laser communications in space has been around nearly since the invention of the laser itself, notes Abi Biswas, supervisor of the Optical Communications Systems group at NASA’s Jet Propulsion Laboratory in Pasadena, CA. From a basic physics standpoint, Biswas says the advantage is clear: lasers occupy the higher-frequency end of the electromagnetic spectrum, relative to radio waves. That means the beam itself is much narrower. If you were to aim a beam of radio waves back at Earth from Mars, the beam would spread out so much that the footprint would be much larger than the size of our planet. “If you did the same thing with a laser,” Biswas says, “the beam footprint would be about the size of California.”

When those beams are sent with the same amount of power, the laser ends up concentrating more power on that receiver. “You can send many more bits of information for the same amount of power,” Biswas explains. Relative to radio, laser or optical communications can transmit anywhere from 10 to 100 times as much data.

The advantages are not limited to solar system exploration; they can be applied within Earth’s orbit as well. The European Space Agency (ESA) and Airbus recently put lasers to work as a broadband data transfer technology, the European Data Relay System (EDRS).

Normally, a satellite flying in low Earth orbit transmits data only when it is within view of a ground station. As a result, it may take 90 minutes for the ground station to receive data after it has been collected.

In the EDRS system, lasers are used both to send more data and to accelerate its transfer. A geostationary satellite locks onto the low-orbiting satellite via laser the moment it passes over the horizon, then remains connected as the craft soars over the hemisphere below. The observing satellite begins transmitting data via laser once the link is established. The satellite can transfer far more data this way, but it also gets that data to the ground faster. Instead of waiting for the observing satellite to fly within view of the ground station, the laser transfer begins once the craft establishes line of sight with the geostationary craft, which then transmits data to the ground via radio. “You cut down the time of delivery of the data to the end user on the ground from hours to 10 to 20 minutes,” says Michael Witting, program manager for ESA EDRS.

This speed, combined with the ability to transmit more high-resolution satellite images, will allow organizations to track the movement of ice in polar regions to help ships navigate the Arctic crossing. Officials could monitor oil spills, earthquakes, floods, and other instances in which information needs to travel quickly to disaster response teams.

The EDRS is already in use, and ESA is scheduled to launch a second satellite in 2018. Currently, NASA has a similar project in the works, and while the link does not extend all the way to the Moon or Mars, Witting says the technical challenges were significant. The system operates over approximately 45,000 kilometers (about 28,000 miles), and each laser terminal must locate and remain locked on the other throughout flight. “It’s like taking a torch from Europe and hitting a coin in New York,” Witting says—all while the coin is racing at about 17,000 miles per hour.

Lasers from the Moon

As you move out to larger distances, such as the Moon or Mars, the challenge increases. Biswas compares the effort involved with hitting a target on Earth from the Moon or Mars to trying to look at a small object through a one-meter-long straw; holding that straw steady enough to keep it focused is a tremendous challenge. If not held steady and aimed accurately, the California-sized footprint of a laser beam traveling from Mars could actually miss its target on Earth, and fail to transmit the data.

Experts say the success of NASA’s 2013 test of such a system, the Lunar Laser Communication Demonstration (LLCD), can be attributed to a number of advances, including improvements in the actuators that make micro-adjustments to the position of the beam, ensuring it remains on target, and advances in the control systems that determine exactly where it needs to aim. When the laser struck Earth, the beam was six kilometers wide, but the receiver was less than one meter wide. One way around this would be to build a larger receiving antenna, but the goal of the LLCD was to show that an optical communication system could work without a massive—and massively expensive—dish on the ground. “You have to figure out, how can I catch this dancing signal onto a very sensitive detector and then add very little noise?” asks Don Boroson, a research fellow in the Massachusetts Institute of Technology (MIT) Lincoln Laboratory’s Communication Systems Division, and a major contributor to the LLCD.

For the Moon demonstration, Boroson says the group used an old idea known as error correction coding, which intelligently bundles in redundant bits, so you can still decipher an entire message even if you only catch part of the beam. So, if they were trying to send a message that was 10,000 bits, they’d add in another 10,000 carefully chosen redundant bits, and send 20,000 in all. Then, even if only half of that message was received, the original 10,000-bit code could still be deciphered. This approach was critical, Boroson explains; “it allowed us to have as small as possible a receiver on the ground and still do these very high data rates and make no errors. We did the lunar link with half a watt and a four-inch telescope in space, and we still did 622 megabits per second to the ground.”

Making Every Photon Count

Pushing beyond satellite or lunar communication increases the technical difficulty, because the laser beam loses energy at a rate proportional to the square of the distance between transmitter and receiver. Scaling up the power used to generate the laser is not an option, Biswas explains, because the laser systems would become too large and expensive. “As you get farther and farther away, you have to improve the efficiency of your system,” says Biswas. “You have to make every photon count.”

Despite the challenges of larger distances, physicist Philip Lubin of the University of California, Santa Barbara, argues that lasers would still be a preferred means of communication for missions to the edges of our solar system and beyond. Lubin has been working on a project to propel miniature spacecraft beyond the solar system using a phased array of either ground- or space-based lasers. The spacecraft would have a modest laser to send back data, and Lubin says the array used to propel the craft could also be engineered to receive its messages. “If we’re setting up to blast something out with lasers, then why not use that system to send something back?” he asks. That something probably will not include video, but the lasers could dispatch images and other information.

Back on Earth, larger receiving telescopes would help pick up signals from the Moon, Mars, or beyond. NASA scientists are demonstrating how laser communications systems work with small receivers, but with ground telescopes measuring 10 to 15 meters across, it would be possible to catch far more light and information. Boroson doesn’t expect those receivers to be built anytime soon, but he does anticipate laser communications will be used more and more.

“It’s going to happen slowly,” says Boroson. “First we’ll see lots of systems around the Earth, then a few systems further out in space, and then more and more. But it’s all coming, it’s definitely coming.”

Further Reading

Biswas, A., Piazzolla, S., Moision, B., and Lisman, D. Evaluation of deep-space laser communication under different mission scenarios, Proceedings of SPIE, 2012.

Boroson, D.M. and Robinson, B.S. The Lunar Laser Communication Demonstration: NASA’s First Step Toward Very High Data Rate Support of Science and Exploration Missions, Space Science Reviews, Volume 185, 2014.

Lubin, P. A Roadmap to Interstellar Flight, Journal of the British Interplanetary Society, vol. 69, 2016.

Space Data Without Delay http://bit.ly/2pcIlt2

Hemmati, H. Deep Space Optical Communications, John Wiley & Sons, 2006.

Gregory Mone is a Boston-based science writer and the author, with Bill Nye, of Jack and the Geniuses: At the Bottom of the World.

© 2017 ACM 0001-0782/17/09 $15.00
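The article makes two beam-spreading points: a radio beam from Mars would be wider than Earth, and the LLCD beam from a four-inch telescope was about six kilometers wide at Earth. A back-of-the-envelope estimate can check both. This is a minimal sketch. It assumes the common small-angle approximation that a beam leaving an aperture of diameter D at wavelength λ spreads by roughly λ/D, so its footprint at range R is about R·λ/D. The 1,550 nm laser wavelength, the 8.4 GHz radio link with a 3 m dish, and the 225 million km Earth-Mars distance are illustrative assumptions, not figures from the article. Only the four-inch telescope and the six-kilometer result come from the text. The last function applies the inverse-square scaling described under "Making Every Photon Count."

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
INCH_M = 0.0254

# Illustrative assumptions (not from the article), except the four-inch telescope.
MOON_KM = 384_400.0                              # average Earth-Moon distance
MARS_KM = 225_000_000.0                          # a representative Earth-Mars distance
LASER_WAVELENGTH_M = 1.55e-6                     # near-infrared laser
RADIO_WAVELENGTH_M = SPEED_OF_LIGHT_M_S / 8.4e9  # X-band radio, about 3.6 cm
TELESCOPE_M = 4 * INCH_M                         # LLCD's four-inch space telescope
RADIO_DISH_M = 3.0                               # hypothetical spacecraft antenna
EARTH_DIAMETER_KM = 12_742.0


def footprint_km(range_km: float, wavelength_m: float, aperture_m: float) -> float:
    """Approximate beam width at the receiver: range * (wavelength / aperture).

    Small-angle, diffraction-limited estimate; it drops order-one factors
    (such as 1.22 or 2.44) that depend on how beam width is defined.
    """
    return range_km * (wavelength_m / aperture_m)


def relative_intensity(range_km: float, reference_km: float) -> float:
    """Power per unit area at range_km relative to reference_km (inverse-square law)."""
    return (reference_km / range_km) ** 2


if __name__ == "__main__":
    moon = footprint_km(MOON_KM, LASER_WAVELENGTH_M, TELESCOPE_M)
    mars = footprint_km(MARS_KM, LASER_WAVELENGTH_M, TELESCOPE_M)
    radio = footprint_km(MARS_KM, RADIO_WAVELENGTH_M, RADIO_DISH_M)
    print(f"laser, Moon to Earth: {moon:.1f} km")
    print(f"laser, Mars to Earth: {mars:,.0f} km")
    print(f"radio, Mars to Earth: {radio:,.0f} km ({radio / EARTH_DIAMETER_KM:.0f} Earth diameters)")
    print(f"Mars/Moon intensity ratio: {relative_intensity(MARS_KM, MOON_KM):.1e}")
```

Under these assumptions, the lunar estimate lands near the six kilometers the article reports. The radio beam from Mars comes out roughly 200 Earth diameters wide. At Mars range, the power per unit area is a few millionths of its lunar-range value. Because the footprint scales as 1/D, a larger transmit telescope narrows the laser beam in proportion.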

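Boroson describes sending 10,000 data bits plus 10,000 redundant bits, so that any half of the transmission is enough to rebuild the message. That is the defining property of an ideal erasure code. The article does not say which code the LLCD used, and the sketch below is not that code. It is a toy Reed-Solomon-style construction over the integers modulo the prime 257, chosen because it is short enough to read:

- The k data symbols are treated as the values of a polynomial of degree less than k at x = 0..k−1.
- The k parity symbols are that polynomial's values at x = k..2k−1.
- Any k surviving symbols pin the polynomial down again by Lagrange interpolation.

The sketch works on byte-sized symbols rather than bits, and parity symbols may take the value 256. It handles only erasures (symbols known to be missing), not symbols that arrive corrupted.

```python
from typing import Dict, List, Tuple

P = 257  # prime modulus: integers mod P form a field; data symbols are bytes 0..255


def lagrange_eval(points: List[Tuple[int, int]], x: int, p: int = P) -> int:
    """Value at x of the unique polynomial of degree < len(points) through points, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def encode(data: List[int]) -> List[int]:
    """Systematic rate-1/2 code: k data symbols followed by k parity symbols."""
    k = len(data)
    if 2 * k > P:
        raise ValueError("too many symbols for this field")
    points = list(enumerate(data))  # (0, d0), (1, d1), ...
    return data + [lagrange_eval(points, x) for x in range(k, 2 * k)]


def decode(received: Dict[int, int], k: int) -> List[int]:
    """Rebuild the k data symbols from any k surviving {position: value} entries."""
    if len(received) < k:
        raise ValueError(f"need {k} symbols, got {len(received)}")
    points = sorted(received.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


if __name__ == "__main__":
    message = list(b"LUNAR")
    codeword = encode(message)                              # 10 symbols: 5 data + 5 parity
    survivors = {i: codeword[i] for i in (1, 3, 6, 8, 9)}   # any half of them
    print(bytes(decode(survivors, len(message))))
```

Production links use far more efficient codes and decoders than this quadratic-time interpolation. The point here is only the "any half suffices" property.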

Society | DOI:10.1145/3121436

Why GPS Spoofing Is a Threat to Companies, Countries

Technology that falsifies navigation data presents significant dangers to public and private organizations.

Logan Kugler

WHEN THE CREW of an $80-million superyacht in the Ionian Sea checked its computer, they realized they were drifting slightly off course, likely as a result of strong currents buffeting their ship. The crew made adjustments and went back to work—without realizing they were now taking directions from a hacker.

In the bowels of the ship, Todd Humphreys, an associate professor in the Department of Aerospace Engineering and Engineering Mechanics at the University of Texas at Austin, worked with his team to feed the superyacht’s crew false navigation data using a few thousand dollars’ worth of hardware and software.

The crew was completely unaware they were now piloting in a direction of Humphreys’ choosing.

Part of an animation showing how a radio navigation research team from The University of Texas at Austin was able to successfully spoof the GPS system of an $80-million private yacht. IMAGE COURTESY OF UNIVERSITY OF TEXAS AT AUSTIN

Thankfully, it was all an experiment that took place with the yacht owner’s blessing. If it had been real, Humphreys could have sent the superyacht 1,000 miles off-course into the hands of a rogue government, terrorist group, or professional criminal organization—and the crew would not have realized it until it was far too late.

Welcome to the very real dangers posed by Global Positioning System (GPS) spoofing, or the dark art of convincing computers you are somewhere that you’re not. It is surprisingly easy—and shockingly dangerous, because we’re not prepared for it at all.

GPS Is Easy to Spoof

The U.S. Global Positioning System consists of 24 satellites that orbit Earth. GPS devices receive signals from the nearest satellites that allow them to determine their precise location, whether you’re looking for creatures in the wildly popular Pokémon Go app, or going to war in a billion-dollar battleship. A range of GPS devices and networks are used for everything from military applications to commercial needs—and all the use cases in between.

Yet all of these systems rely on the data from the network of GPS satellites. If you can corrupt the data coming from those satellites, you can create a world of headaches for systems that rely on this data.

GPS spoofing can be performed with relatively low-cost tech, which is an expensive problem for the people, companies, and governments that trust the system implicitly. In the case of Humphreys’ superyacht hacking, he and his team used about $2,000 worth of tech. Even in more advanced spoofing scenarios, the technology is still straightforward, says Dinesh Manandhar, an associate professor and GPS expert at the University of Tokyo.

“A device that can generate GPS signals is necessary. Such devices are available from GPS signal simulator device manufacturers,” Manandhar explains. These devices are used to test GPS receivers in factories. As such, they can be programmed to transmit a signal that makes receivers behave any way you like.

“So far as I know, no commercial GPS receivers offer any strong defense against spoofing or even any reliable spoofing detection capability,” says Humphreys.

Stealing an $80-Million Superyacht

In 2013, Humphreys, then a researcher in the Department of Aerospace Engineering and Engineering Mechanics at the Cockrell School of Engineering, was invited, along with a team of students, aboard an $80-million yacht in the Ionian Sea to test their GPS spoofing technology. Using his hardware and software rig, Humphreys managed to falsify GPS data used by the ship, effectively giving him control over the vessel.

Humphreys explained that GPS receivers calculate their distance from several satellites at the same time. Each satellite has a code—called a pseudorandom noise (PRN) code—that identifies which satellite in the GPS network is broadcasting. Humphreys’ spoofing equipment slowly replaced the real GPS signals with fake ones, working delicately so the ship’s system did not detect an abrupt change in signal.
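Receivers work from their distances to several satellites at once, as Humphreys describes. That is also why spoofing works: the reported position is whatever best explains the measured ranges. This is a minimal sketch under three assumptions:

- Pseudoranges are idealized: the true distance plus one shared receiver-clock offset, in meters.
- The satellite coordinates are hypothetical, in an Earth-centered frame.
- Real-world corrections are omitted. An actual receiver also corrects for satellite clocks, the atmosphere, and Earth's rotation.

The solver is a plain Gauss-Newton least-squares fit for x, y, z, and the clock offset. Fed ranges computed from a fake location, it returns the fake location, which is exactly what a spoofer relies on.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def solve_position(sats: Sequence[Vec3], pseudoranges: Sequence[float],
                   iterations: int = 10) -> Tuple[Vec3, float]:
    """Gauss-Newton fit of receiver position and clock offset, both in meters.

    Idealized model: pseudorange_i = |sat_i - position| + clock_offset.
    """
    est = [0.0, 0.0, 0.0, 0.0]  # x, y, z, clock offset; start at Earth's center
    for _ in range(iterations):
        jtj = [[0.0] * 4 for _ in range(4)]
        jtr = [0.0] * 4
        for sat, rho in zip(sats, pseudoranges):
            dist = math.dist(sat, est[:3])
            residual = rho - (dist + est[3])
            # Derivatives of the predicted pseudorange with respect to x, y, z, offset.
            row = [(est[k] - sat[k]) / dist for k in range(3)] + [1.0]
            for i in range(4):
                jtr[i] += row[i] * residual
                for j in range(4):
                    jtj[i][j] += row[i] * row[j]
        step = solve_linear(jtj, jtr)  # normal equations: (J^T J) step = J^T r
        est = [e + s for e, s in zip(est, step)]
    return (est[0], est[1], est[2]), est[3]


def pseudoranges_for(position: Vec3, clock_offset: float, sats: Sequence[Vec3]) -> List[float]:
    """Ideal pseudoranges a receiver at `position` would measure."""
    return [math.dist(sat, position) + clock_offset for sat in sats]


# Hypothetical satellites about 26,560 km from Earth's center (meters, Earth-centered frame).
SATS: List[Vec3] = [
    (0.0, 0.0, 26_560e3),
    (17_070e3, 0.0, 20_350e3),
    (-8_535e3, 14_783e3, 20_350e3),
    (-8_535e3, -14_783e3, 20_350e3),
]

if __name__ == "__main__":
    true_pos = (100e3, -50e3, 6_370e3)
    fake_pos = (150e3, 20e3, 6_368e3)  # where a spoofer wants the receiver to think it is
    for label, pos in (("genuine", true_pos), ("spoofed", fake_pos)):
        solved, offset = solve_position(SATS, pseudoranges_for(pos, 85_000.0, SATS))
        print(label, [round(c) for c in solved], round(offset))
```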

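The "slowly replaced" detail matters because a simple consistency check only looks for sudden jumps. The sketch below is hypothetical and one-dimensional: it tracks cross-track offset in meters, with one fix per second. Two checks are compared:

- A jump monitor flags any fix that moves more than a threshold from the previous one.
- A cross-check compares each fix against an independent reference. Here the reference is an idealized, error-free dead-reckoning track.

The thresholds, rates, and reference are illustrative assumptions. They are not details of the yacht experiment or of any commercial receiver.

```python
from typing import List, Optional


def first_jump(fixes: List[float], max_step_m: float) -> Optional[int]:
    """Index of the first fix that moves more than max_step_m from the previous fix."""
    for i in range(1, len(fixes)):
        if abs(fixes[i] - fixes[i - 1]) > max_step_m:
            return i
    return None


def first_divergence(fixes: List[float], reference: List[float],
                     max_gap_m: float) -> Optional[int]:
    """Index of the first fix that disagrees with the reference by more than max_gap_m."""
    for i, (gps, ref) in enumerate(zip(fixes, reference)):
        if abs(gps - ref) > max_gap_m:
            return i
    return None


def spoofed_track(seconds: int, start: int, rate_m_s: float, jump_m: float) -> List[float]:
    """Cross-track GPS offsets: honest (0.0) until `start`, then a jump plus a steady drag."""
    return [0.0 if t < start else jump_m + rate_m_s * (t - start) for t in range(seconds)]


if __name__ == "__main__":
    dead_reckoning = [0.0] * 600  # idealized: the ship actually holds its course
    abrupt = spoofed_track(600, start=100, rate_m_s=0.0, jump_m=500.0)
    gradual = spoofed_track(600, start=100, rate_m_s=0.5, jump_m=0.0)
    for name, track in (("abrupt", abrupt), ("gradual", gradual)):
        print(name,
              "jump check:", first_jump(track, max_step_m=50.0),
              "cross-check:", first_divergence(track, dead_reckoning, max_gap_m=100.0))
```

The abrupt spoof trips the jump monitor at the first spoofed fix. The 0.5 m/s drag never trips it; only the independent cross-check notices, once the accumulated offset passes 100 m.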

The spoofed GPS reported the yacht was three degrees off-course. The crew, unaware when the experiment would take place, adjusted the ship’s course based on the spoofed GPS. The crew assumed it was due to natural forces such as water currents and crosswinds.

GPS spoofing can be used for all sorts of nefarious purposes. As seen with the yacht, cargo shipments are at risk, especially dangerous or high-value ones that are required to follow designated GPS routes. Geofences—or digitally proscribed boundaries—are used to protect sensitive data in many corporations; GPS spoofing could be used to access that data well out of the bounds intended.

Once you add emerging technologies, like self-driving cars, to the mix, it gets even scarier. Autonomous vehicles use GPS data at regular intervals not only to understand where they are, but also to decide where to drive passengers and cargo.

Humphreys’ yacht spoofing was the first time commercial tech had been used in such an effective—and powerful—demonstration.

Now, said Manandhar, it is even easier to acquire spoofing technology. “Recently, software-based low-cost devices have become available that cost less than $1,000.”

A Problem for Governments, People

It is not just yacht owners who need to be concerned; the problem is especially acute for national governments and international bodies, which are waking up to the dangers posed by GPS spoofing.

Incredibly, Europe’s Galileo global navigation satellite system—the European Union’s version of GPS—operated beginning in December 2016 “with no way to protect civilian users from hacking attempts,” reported ZDNet.

University of Leuven researchers Ashur and Rijmen say they have developed an authentication protocol to deter the forging of Galileo’s navigation data.

The protocol, called the TESLA signature, is designed to complement location data with a cryptographic “signature,” so Galileo’s satellites would send both navigation data and the cryptographic signature to the receiving client. The client would not trust the data right away; only when the signature was verified would the client use the GPS data it had received.

“Using cryptography makes it hard to forge a signature, such that even an adversary that can feed the client with false data cannot forge a signature, thus the client does not use forged data,” Ashur says.

This would prevent, say, spoofing the signal to hijack a self-driving car or reroute a drone that relied upon the data.

However, the Galileo system, which comes fully online in 2020, presented a unique obstacle: low bandwidth. Galileo has relatively low-bandwidth signals that make a typical approach to the problem, using public-key cryptography, impossible.

“The uniqueness of our solution is that it uses symmetric cryptography and can thus fit into the bandwidth constraints,” says Ashur. The protocol is scheduled to go into effect in 2018, according to ZDNet. Until all 24 of Galileo’s satellites are deployed and operational in 2020, however, the protocol will “operate in test mode.”

In the meantime, manufacturers are starting to pay attention to the problem, says Humphreys. Some, like u-blox, a Swiss company that creates wireless semiconductors and modules for consumer, automotive, and industrial markets, offer anti-spoofing measures such as the capability to detect fake global navigation satellite system (GNSS) signals, as well as a message integrity protection system to prevent “man in the middle” attacks.

Humphreys also points to the U.S. Department of Homeland Security’s recent document on anti-spoofing, “Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure,” as a sign that the right parties are taking GPS spoofing seriously.

Manandhar has developed anti-spoofing methodologies for Japanese satellites that may be used in the next generation to be sent into orbit, he says. He recommends that major navigation data provider countries like the U.S., Japan, the European Union, China, and India conduct official joint discussions on the security of their systems at the International Committee on Global Navigation Satellite Systems, an organization under the umbrella of the United Nations.

The dangers, however, are not going away. Humphreys worries particularly that spoofing the GPS-sourced timing used to regulate financial databases could create havoc. Industries like financial services, he says, “have backups in place, but on close inspection one realizes that the backups themselves are either short-term or eventually trace their source to GPS.”

“A coordinated attack that understood the finance world’s dependency on GPS would be hard to detect and even harder to defeat,” he cautions.

Further Reading

Psiaki, M., and Humphreys, T. Protecting GPS from Spoofers Is Critical to the Future of Navigation, IEEE Spectrum, Jul 29, 2016, http://spectrum.ieee.org/telecom/security/protecting-gps-from-spoofers-is-critical-to-the-future-of-navigation

Amirtha, T. Satnav spoofing attacks: Why these researchers think they have the answer, ZDNet, Mar 27, 2017, http://www.zdnet.com/article/satnav-spoofing-attacks-why-these-researchers-think-they-have-the-answer/

U.S. Department of Homeland Security, National Cybersecurity & Communications Integration Center, National Coordinating Center for Communications. Improving the Operation and Development of Global Positioning System (GPS) Equipment Used by Critical Infrastructure, http://bit.ly/2oZewfz

Logan Kugler is a freelance technology writer based in Tampa, FL. He has written for over 60 major publications.
© 2017 ACM 0001-0782/17/09 $15.00
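The article says the TESLA signature uses symmetric cryptography yet still stops forgeries. The general idea behind the TESLA (Timed Efficient Stream Loss-tolerant Authentication) family of broadcast-authentication schemes shows how:

- The sender tags each message with a MAC under a key it keeps secret until later.
- The keys form a one-way hash chain, and the receiver already trusts the chain's first element.

This is a minimal sketch, assuming HMAC-SHA-256 and SHA-256 as the primitives. It is a generic illustration, not the Leuven protocol or Galileo's message format. A real deployment also needs loose time synchronization, so the receiver can reject any tag that arrives after its key may already have been disclosed. That check is only noted in a comment here.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_key_chain(seed: bytes, length: int) -> List[bytes]:
    """One-way key chain: chain[i] = SHA-256(chain[i + 1]).

    chain[0] is the public commitment; chain[i] (i >= 1) keys interval i and is
    disclosed only after interval i ends.
    """
    chain = [b""] * (length + 1)
    chain[length] = sha256(seed)
    for i in range(length - 1, -1, -1):
        chain[i] = sha256(chain[i + 1])
    return chain


def tag(key: bytes, message: bytes) -> bytes:
    """Message authentication code (HMAC-SHA-256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


class Receiver:
    """Buffers navigation messages and trusts them only after delayed key disclosure."""

    def __init__(self, commitment: bytes) -> None:
        self.trusted_key = commitment  # chain[0], assumed obtained authentically beforehand
        self.trusted_interval = 0
        self.pending: Dict[int, List[Tuple[bytes, bytes]]] = {}

    def receive(self, interval: int, message: bytes, mac: bytes) -> None:
        # A real TESLA receiver first uses loose time synchronization to confirm the
        # key for `interval` cannot have been disclosed yet; that check is omitted here.
        self.pending.setdefault(interval, []).append((message, mac))

    def disclose(self, interval: int, key: bytes) -> List[bytes]:
        """Verify a disclosed key against the chain, then return authenticated messages."""
        if interval <= self.trusted_interval:
            return []
        check = key
        for _ in range(interval - self.trusted_interval):
            check = sha256(check)
        if not hmac.compare_digest(check, self.trusted_key):
            return []  # key does not hash back to a trusted key: reject
        self.trusted_key, self.trusted_interval = key, interval
        return [m for m, mac in self.pending.pop(interval, [])
                if hmac.compare_digest(tag(key, m), mac)]


if __name__ == "__main__":
    chain = make_key_chain(b"hypothetical seed", length=10)
    rx = Receiver(commitment=chain[0])
    rx.receive(1, b"ephemeris A", tag(chain[1], b"ephemeris A"))
    rx.receive(1, b"fake ephemeris", tag(b"attacker guess", b"fake ephemeris"))
    print(rx.disclose(1, chain[1]))      # only the genuine message is released
    print(rx.disclose(2, b"\x00" * 32))  # a forged key is rejected
```

A message is released only once two checks pass: the disclosed key must hash back to the trusted commitment, and the message's MAC must verify under that key. The forged message therefore never reaches the navigation solution.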

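At its simplest, a geofence of the kind the article mentions is a distance test on the reported coordinates, which is why spoofed coordinates defeat it. This is a minimal sketch, assuming a circular fence and a spherical Earth with mean radius 6,371 km. The fence center, radius, and device positions are hypothetical. The check trusts whatever latitude and longitude the receiver reports. A device that is physically kilometers away therefore passes if its GPS has been spoofed to a position inside the fence.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(lat: float, lon: float, center_lat: float, center_lon: float,
                    radius_m: float) -> bool:
    """True if the reported position lies within radius_m of the fence center."""
    return haversine_m(lat, lon, center_lat, center_lon) <= radius_m


if __name__ == "__main__":
    fence = (40.0000, -75.0000, 500.0)   # hypothetical site: center and 500 m radius
    actual = (40.1000, -75.0000)          # device is really about 11 km north
    spoofed = (40.0010, -75.0005)         # coordinates the spoofed receiver reports
    print("actual inside:", inside_geofence(*actual, *fence))
    print("reported inside:", inside_geofence(*spoofed, *fence))
```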

Milestones | DOI:10.1145/3122790

Turing Laureates Celebrate Award’s 50th Anniversary

Lawrence M. Fisher

ACM RECENTLY HELD a conference in celebration of the first 50 years of the ACM A.M. Turing Award.

“Just over 50 years ago, ACM awarded its first A.M. Turing Award to for his work on advanced programming techniques and compiler construction,” said ACM president Vicki L. Hanson. “In total, 64 people from around the world have received the Turing Award, recognizing work that laid the foundations of modern computing.”

The award was presented to its 65th recipient, Sir Tim Berners-Lee, at the event in June.

The conference included more than 20 Turing Laureates speaking on topics related to their fields of study.

After welcomes from Hanson, program chair Craig Partridge, and master of ceremonies (and past ACM president) Dame Wendy Hall, 2008 Turing Laureate Barbara Liskov (who received the award “for contributions to practical and theoretical foundations of programming language and system design, especially related to data abstraction, fault tolerance, and distributed computing”) offered a presentation on the “Impact of Turing Recipients’ Work,” focusing on the impact of early Turing recipients, which she described as “tremendous.”

Among the 22 Turing Laureates in attendance at the conference were: Front row, from left: Whitfield Diffie (2015), Martin Hellman (2015), Robert Tarjan (1986), Barbara Liskov (2008). Second row, from left: Vinton Cerf (2004), Richard Karp (1985), Richard Stearns (1993), Dana Scott (1976). Third row, from left: Ivan Sutherland (1988), (2010), Robert Kahn (2004). Fourth row, from left: Frederick Brooks (1999), Raj Reddy (1994), William (Velvel) Kahan (1989), Donald Knuth (1974). PHOTOGRAPHS BY MISTI LAYNE

A session on “Advances in Deep Neural Networks” featured 2011 Turing Laureate Judea Pearl (“for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”), who spoke about an evolutionary advance 40,000 years ago that allowed Homo sapiens to advance past competitor species Homo erectus and the Neanderthals. “The ability to imagine things that do not physically exist … the ability to model one’s environment, imagine other worlds, served to accelerate evolution in favor of Homo sapiens,” he said.

The session on “Restoring Personal Privacy Without Compromising National Security” featured 2015 Turing Laureate Whitfield Diffie (co-recipient of the award with Martin Hellman “for inventing and promulgating both asymmetric public-key cryptography, including its application to digital signatures, and a practical cryptographic key-exchange method”), who observed that calls by government agencies to incorporate “backdoors” in computing systems that would allow them to bypass normal authentication or encryption are not really necessary. “New backdoors aren’t required; the security failures of most programs give the government ample opportunity to ‘break in.’”

In a discussion about “Preserving Our Past For The Future,” 2004 Turing Laureate (and ACM past president) (“with Robert E. Kahn, for pioneering work on internetworking, including the design and implementation of the Internet’s basic communications protocols, TCP/IP, and for inspired leadership in networking”) related an anecdote about coming across an old 3.5-inch floppy disk and tracking down a compatible disk drive, but still being unable to open the files on the disk because they were saved in an outdated version of WordPerfect. “Backward compatibility suffers because you can’t keep everything,” like the version of WordPerfect needed to open those files, he said.

Laureates, from left, Vinton Cerf, Edward Feigenbaum, and Raj Reddy.

In a session on the future of microelectronics entitled “Moore’s Law Is Really Dead: What’s Next?” moderator John Hennessy of said, “We’re reaching the end of silicon technology as we know it.” As a result, said Doug Burger of Microsoft Research, “We’re entering a wild, messy, destructive time. It sounds like a lot of fun.”

Margaret Martonosi of Princeton University said, “We’re entering a post-ISA, Post-CPU era … we need to be exploring design processes to be domain-specific, and we need to train students that way as well.”

A panel on Moore’s Law was moderated by John Hennessy (left) and included Doug Burger, Norman Jouppi, Butler Lampson (1992), and Margaret Martonosi.

Butler Lampson, the 1992 Turing Laureate (“for contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security, and document publishing”), said, “There’s plenty of room at the top; there’s room in software, algorithms, and hardware.” He added, “We know there’s a lot of software bloat, that we can get rid of, at a cost.” Also, he said, “A consequence of hardware changes not going to be invisible anymore is, you need a strategy for changes in the software stack.”

With regard to hardware advances, Lampson said, “What people care about is that the cost of running their application drops.”

Norman P. Jouppi, Distinguished Hardware Engineer at Google, concluded Moore’s Law is “not dead, it’s just resting.”

Regarding “Challenges in Ethics and Computing,” 1994 Turing Laureate Raj Reddy (co-recipient with Ed Feigenbaum “for pioneering the design and construction of large-scale artificial intelligence systems, demonstrating the practical importance and potential commercial impact of artificial intelligence technology”) said, “We need to identify technological solutions to societal problems. I believe we can.” One of those solutions, he said, might be “designing self-healing systems in every system we design.”

In the future, Reddy said, there will be no separation between humans and technology. “Humans will have technology in their bodies and be able to do things no person or computer could do alone. That system should have ethics. Unfortunately, that’s trumped by laws and government.”

“Accountability is what we want from all systems,” Reddy said. “The role of philosophers/ethicists is to convince the government,” because “if it is not written into the law, nothing will change. Unless we find mechanisms to get it into the legal system, we can have all kinds of discussions and nothing will happen.”

Leonard Adleman (2002).

Kenneth Thompson (1983).

Judea Pearl (2011) moderated a panel on deep neural networks.

Andrew Chi-Chih Yao (2000).

The newest Turing Laureate—Sir Tim Berners-Lee.

Opening the second day of the conference, 1974 Turing Laureate Donald Knuth (“for his major contributions to the analysis of algorithms and the design of programming languages, and in particular for his contributions to the ‘art of computer programming’ through his well-known books in a continuous series by this title”) addressed “Computer Science as a Major Body of Accumulated Knowledge.” Computer science, he said, shares with mathematics “the great privilege that we can invent the problems to work on.” Basically, he said, computer science and mathematics “are two parallel disciplines with a lot in common, but a distinct difference.”

Knuth said he was both “optimistic and pessimistic” about artificial intelligence, and that he is “more pessimistic when it is based on the notion humans make rational decisions.”

The 79-year-old Knuth said he considers “computer programming is art, in the sense that it’s not from nature, as well as being beautiful.”

As a member of the panel discussing “Quantum Computing: Far Away? Around the Corner? Or Maybe Both at the Same Time?” 2000 Turing Laureate Andrew Chi-Chih Yao (“in recognition of his fundamental contributions to the theory of computation, including the complexity-based theory of pseudorandom number generation, cryptography, and communication complexity”) said, “I am a believer in quantum computing,” adding, “it seems clear that the technology of quantum computing is going to have a big practical impact.” Yao described quantum computing as “a great experiment, and we’re all waiting to see what can come of it.” He also called it “a great paradigm for interdisciplinary computing.”

The session on “Augmented Reality: From Gaming to Cognitive Aids and Beyond” was the only session to feature two Turing Laureates: 1988’s Ivan Sutherland (“for his pioneering and visionary contributions to computer , starting with , and continuing after”), and 1999’s Frederick P. Brooks, Jr. (“for landmark contributions to computer architecture, operating systems, and software engineering”).

Brooks said he has a vision of using augmented reality (AR) for the purpose of training emergency teams. He asked the panel about “the state of actual use of augmented reality today? Who is using it as a tool to earn their living?”

Sutherland responded that the pilot of a jumbo jet, who trains in a simulator, is taking advantage of “some of the best VR () in use today,” while Yvonne Rogers of University College London pointed out that head-up displays “are a reality for navigation.”

A young conference attendee takes a selfie with Ivan Sutherland (1988).

Peter Lee, of Microsoft AI and Research, said there is “a lot of belief, interest, and a growing amount of experimentation in AR, such as the ability to “teleport” (virtually visit other locations); he added, “If we can teleport, there really isn’t a need for so many airplanes.”

Sutherland added that the “greatest value of AR/VR is to show people things in a way that makes the underlying physics, the meaning, clear.”

The full conference sessions are available at https://www.facebook.com/pg/AssociationForComputingMachinery/videos/.

—Lawrence M. Fisher

Panel discussions during the conference drew a packed house.

© 2017 ACM 0001-0782/17/09 $15.00

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 23 news

In Memoriam | DOI:10.1145/3125605 | Lawrence M. Fisher

Charles W. Bachman: 1924–2017

An engineer best known for his work in database management systems, and in techniques of layered architecture that include Bachman diagrams.

CHARLES WILLIAM “CHARLIE” Bachman, the “father of databases” who received the ACM A.M. Turing Award for 1973 for creating the first database management system, died June 13 at the age of 92.

Born in Manhattan, KS, in 1924, Bachman earned his B.S. in mechanical engineering in 1948, as well as an M.S. in mechanical engineering from the University of Pennsylvania.

He went to work for Dow Chemical in 1950, using mechanical punched-card computing devices to solve networks of simultaneous equations representing data from Dow plants. In 1957, Bachman became head of Dow’s Data Processing Department, through which he became a member of Share Inc., and a founding member of the Share Data Processing Committee.

In 1960, Bachman joined the General Electric (GE) Production Control Services Group in New York City, using a factory in Philadelphia to test designs for a system to automate factory planning, scheduling, operational control, and inventory control. The resulting MIACS was based on the Integrated Data Store (IDS), Bachman’s concept of an “information inventory,” and was first to adopt the “network data model” in which the system would support and enforce relationships between records.

Bachman moved to GE’s Computer Department in 1964, where he helped build another management information system, the Weyerhaeuser Comprehensive Operating Supervisor (WEYCOS 2).

Bachman was awarded the ACM A.M. Turing Award for 1973 for his contributions to database technology. As biographer Thomas Haigh observed, “Bachman was the first Turing Award winner without a Ph.D., the first to be trained in engineering rather than science, the first to win for the application of computers to business administration, the first to win for a specific piece of software, and the first who would spend his whole career in industry.”

The British Computer Society named Bachman a Distinguished Fellow in 1977 for his work in database systems.

Bachman received the U.S. National Medal of Technology and Innovation (NMTI) for 2012. The award was presented to Bachman in 2014 by President Barack Obama.

He was nominated for the NMTI by U.S. Senator Edward J. Markey (D-MA), who said, “The would not be the worldwide hub for technological innovation had it not been for the achievements of Charles Bachman.”

Data scientist Gary Rector said Bachman was “humble, kind, generous, and a gentle soul; his entire family reflects that humanity. Charlie loved flowers and had a smile that embraced everyone. His heart connected to people more meaningfully than any database could ever do merely with data. To connect to people in this way is the greatest lesson he gave me.”

George Colliat, a colleague from GE, said, “I have learned from his ability to look for solutions that transcend the problems at hand and thereby multiply the value of the solutions.” He added, “Charlie’s human values have influenced me as much as his creative genius. His respect for his colleagues, always looking for their positive contribution, his patience in explaining ideas to people who were not always at his level, his humility and open mind in always listening to others as an opportunity to learn something new, characterize him as a gentleman in this industry.”

Haigh last saw Bachman when he was “close to 90 but still sharp and enjoying life; talking about the article he was working on and his chats with E.O. Wilson in the retirement community they shared. He never stopped trying to understand how things worked, or trying to make them work better. I feel honored to have known him.”

In 2014, Bachman was named a Fellow of the ACM for his contributions to database technology.

Bachman was named a Fellow of the in 2015, for his work on database management systems. Also that year, Michigan State University awarded Bachman an honorary doctorate of engineering for being “at the forefront of computer science for more than 65 years.”

Bachman’s son, Jon, said his father’s vision of the Integrated Data Store resulted in “a high-performance direct access storage model (that) allows developers to build large efficient databases of any type of business or operational data. In fact, the first versions were so successful that they became established as the most important system software on mainframe computers of that era.”

In an interview in 2008, Bachman was asked who in the IT industry “inspired you or was a role model for you?” He replied, “The inventors, the developers of new concepts, the solvers of previously unsolved problems, the assemblers of new and interesting combinations of old technologies. Take Sir Maurice Wilkes, Edsger Dijkstra, Sir Tim Berners-Lee.”

PHOTO COURTESY OF BACHMAN FAMILY


Law and Technology | DOI:10.1145/3126489 | Joel R. Reidenberg

Digitocracy

Considering law and governance in the digital age.

DIGITAL TECHNOLOGIES HAVE unleashed profound forces changing and reshaping rule making in the democracies of the information society. Today, we are witnessing a transformative period for law and governance in the digital age. Elected representative government and democratically chosen rules vie for authority with new players who have emerged from the network environment. At the same time, network technologies have unraveled basic foundational prerequisites for the rule of law in democracy like privacy, freedom of association, and government oversight. The digital age, thus, calls for the emergence of a Digitocracy—a new set of more complex governance mechanisms assuring public accountability for online power held by state and nonstate actors through the creation of new checks and balances among a more diverse group of players than democracy’s traditional grouping of a representative legislature, executive branch, and judiciary.

Where Google and Facebook know more than most spy agencies about the lives of millions of citizens as well as the inner workings of companies and governments, information powerhouses and platforms can establish their own rules for citizens’ interactions online. Where public-sector surveillance and private-sector tracking are so pervasive, citizens lose the ability to control the disclosure of their thoughts, friends, activities, and no longer have privacy. Where lone coders wreak massive havoc for private gain or for opposition to governmental policies, they can use their information resources to reject majority rule. Where technology can protect the anonymity of wrongdoers, rule-breakers can escape accountability. In short, the modern information society destroys one of the most fundamental truths of any democracy that “the power to make the laws rests with those chosen by the people.”a

The Internet’s Promise
Without a doubt, the Internet revolutionized the dissemination of information and the ability of individuals to engage with each other. The euphoria surrounding the early days of the Internet’s expansion into the public sphere predicted that technology would expand democracy and empower citizens around the world. The conventional wisdom thought citizen participation would multiply online with e-government, and the public would have better oversight of the state thanks to new capabilities for monitoring administrative and executive actions. The power of the Internet to disseminate information from one to millions and the power of the Internet to foster conversations seemed an unstoppable force for democratic discourse. Popular movements like the Arab Spring, the Occupy Movement, and the Bernie Sanders U.S. presidential campaign illustrated that information technologies could indeed significantly enhance and enable political organizing on a new, unprecedented scale. Many expected that mechanisms like open electronic proceedings for rule making and open data for government transparency would herald better representative government and decision making.

a King v. Burwell, 135 S. Ct. 2480, 2496 (2015).


The Internet’s technical infrastructure turns out to challenge the promise of the political empowerment of citizens. Just as network technologies offered organizational tools for political empowerment, the technologies themselves provided the means to reverse the hope that the Internet would be a one-way pro-democracy force. Network infrastructure proved that it could be used to frustrate empowerment dreams. Egypt, for example, pulled the plug on the Internet for several days during the Arab Spring uprisings to block political organizing; Brazil shut down WhatsApp for 48 hours; local police in the U.S. used stealth Stingray technology to engage in large-scale geo-surveillance of citizens. And, at the same time, Twitter bots flooded social media in order to shut down political dialog or to falsify support for candidates, while hate and bullying flourish online. In short, the Internet has embedded the means to block political empowerment and discourse.

Undermining Democracy
In the intervening years since the early euphoria over the Internet’s political potential, the embedding of the Internet in our daily lives has effectively demonstrated new vulnerabilities. The Internet’s infrastructure has already displaced three key areas essential to the rule of law in democracy: sovereignty, government accountability, and respect for law. Internet technologies restructure a state’s ability to prescribe and assure the enforcement of law. Governments forfeit sovereignty to networks when services like cloud computing transcend borders and enable organizations to choose rules in the blink of an eye. Network architecture enables technology developers and service providers to embed rules for online activities through infrastructure choices. For example, cloud service providers like Dropbox make determinations every day on the security of users’ data. These encryption decisions determine the very capability of states to examine user data in lawful investigations.

Network infrastructure undermines the oversight and accountability of government. While open government technologies enable greater transparency of public institutions, electronic tools also empower governments to circumvent traditional political checks and balances, and the public’s oversight of government suffers irreparably. For example, in Oakland, CA, the police engaged in a mass-scale surveillance program to geo-locate thousands of mobile phones using stingray devices without any judicial approval and, in New York City, the police program to record drivers through traffic cams and smart city sensors also escapes judicial oversight. At the same time, technologically enabled leaks and wide dissemination of non-public activities of government through sites like WikiLeaks may jeopardize legitimate functions of government such as international relations and active law enforcement investigations. Snowden’s leaks, for example, are reported to have endangered the lives of British MI6 agents in Russia and China.

Laws lose their authority when governments can no longer control the use of power to enforce rules and hackers have control over weapons of mass disruption. Network infrastructure removes the state’s monopoly on the use of coercive, police power to enforce rules and protect its citizens. Technology allows lone-wolf actors unchecked by states to create and deploy weapons of mass disruption whether through malware, ransomware, or botnets. For example, hospitals across the U.S. in the spring of 2016 faced a wave of ransomware attacks that left some in a “state of emergency.” ISIS uses crowd sourcing to sow terror in the U.S. and Europe. Simultaneously, the infrastructure empowers private actors to engage in vigilante actions. The underground group, Anonymous, recently illustrated such actions when they threatened an electronic attack against ISIS following the Paris massacres in November 2015. In essence, individuals and associations now have tools—outside the ability of state control—to enforce their choices and rules online in ways that are independent of the state. To be sure, when a Texas college discovered in 2015 that Facebook provided better real-time information for an on-campus police emergency than 911, it becomes clear the state has even lost control over basic information it needs to protect its citizens.

Beyond undermining key aspects of the rule of law, the Internet’s infrastructure has toppled critical, substantive legal pillars of democracy. Freedom of thought and association as well as public safety are essential elements of democracy, and privacy is a prerequisite. Yet, the network infrastructure contradicts the basic tenets of freedom of association and privacy. Network functionality works thanks to ubiquitous data surveillance. The resulting transparency of citizens to those in the network undermines both the state’s and citizens’ respect for the rule of law. States lose important checks and balances against omnipotent acquisition of information, and citizens’ freedom of thought and association is undercut. Counterintuitively, public safety and security are also destabilized by the transparency when stalkers, social engineering hackers, and cyberwarriors find the informational keys to success readily accessible online.

Freedom of expression is another cornerstone of democracy. Yet, democracies have a capability problem dealing with socially destructive content like hate, threats, and cyberbullying that jeopardize public order and individual safety. Technology allows rapid and widespread dissemination of harmful content, while wrongdoers can shield their activities from accountability through encryption and anonymity tools. At the same time, freedom of expression limits the authority of states to ban nefarious online content. In the U.S., for example, there is no public recourse for the rapid growth of anti-Semitic Twitter accounts. Users must appeal to the social media firms who, in turn, then decide what to suppress or censor. By contrast, in Europe, platforms bear more legal responsibility for content, but firms are often left in the same position as an all-powerful censor. In effect, government is unable to suppress the vile and corrosive online material that threatens citizens without resorting to oppressive, antidemocratic controls.

The Opportunity of Digitocracy
The information society lacks a model of governance suited to the digital age. Going forward, the digital age will need a new system of checks and balances for its political decision making—a “Digitocracy”—offering the opportunity to develop new governing principles that articulate who regulates what to preserve public accountability online. Our challenge is how to construct the appropriate checks and balances. Digitocracy’s dynamic will be much more complex than the analog world. Online private rule making like Twitter’s decisions regarding censorship, Adobe’s technical protections on digital content, and Facebook’s settings for privacy have become more powerful in people’s lives than rules from the democratic constitutional framework.

Business organizations are likely to serve as counterweights to government power. Google’s Transparency Report, Apple’s defiance of an FBI request for encryption keys, and Microsoft’s challenge to U.S. government access to foreign-based servers each reflect a check on the state’s intrusiveness. And, individuals like Snowden may serve as counterweights to states and businesses. Individuals and associations of individuals have direct authority when they coalesce with online tools ranging from social media to hacktivism as they perceive the need to interject and amplify their end goals online. All the while, national government provides checks on overreaching private actors. Where each actor from a state to an individual can assure mass disruption online, fair governance will require co-existence among the rule-making actors.

At the core, the assurance of public accountability online is the key objective of Digitocracy. The mechanisms for states, private actors, and citizens to co-exist as rule-makers in the networked society are likely to be defined in unexpected ways incorporating notions of federalism, multistakeholder governance, and subsidiarity. These tools will draw the boundaries of rule-making authority among the state actors, platform operators, corporate organizations, and empowered users. Each actor, whether state or non-state, has an important role to prevent overreaching by the other actors. In essence, Digitocracy constructs a more multifaceted set of interwoven checks and balances to establish limits on the powers of both state and non-state actors and a reliance on both to protect the public good. For our future, now is the time to begin the robust public discussion on our means of governance in the digital age.

IMAGE BY ALICIA KUBISTA/ANDRIJ BORYS ASSOCIATES

Joel R. Reidenberg ([email protected]) is the Stanley D. and Nikki Waxberg Chair and Professor of Law, Fordham University, Director, Fordham Center on Law and Information Policy, and Visiting Research Affiliate, Center for Information Technology Policy, Princeton University.

The author is preparing a book on this topic to be published by Yale University Press.

Copyright held by author.


Computing Ethics | DOI:10.1145/3126492 | Carolina Alves de Lima Salge and Nicholas Berente

Is That Social Bot Behaving Unethically?

A procedure for reflection and discourse on the behavior of bots in the context of law, deception, and societal norms.

ATTEMPTING TO ANSWER the question posed by the title of this column requires us to reflect on moral goods and moral evils—on laws, duties, and norms, on actions and their consequences. In this Viewpoint, we draw on information systems ethics6,7 to present Bot Ethics, a procedure the general social media community can use to decide whether the actions of social bots are unethical. We conclude with a consideration of culpability.

Social bots are computer algorithms in online social networks.8 They can share messages, upload pictures, and connect with many users on social media. Social bots are more common than people often think.a Twitter has approximately 23 million of them, accounting for 8.5% of total users; and Facebook has an estimated 140 million social bots, which are between 5.5% and 11.2% of total users.b,c Almost 27 million Instagram users (8.2%) are estimated to be social bots.d LinkedIn and Tumblr also have significant social bot activity.e,f Sometimes their activity on these networks can be innocuous or even beneficial. For example, SF QuakeBotg performs a useful service by disseminating information about earthquakes, as they happen, in the San Francisco Bay area. However, in other situations, social bots can behave quite unethically.

Social Bots Behaving Unethically
LinkedIn reports that social bots on the professional networking platform are often used to “steal data about legitimate users, breaching the user agreement and violating copyright law.”h Social bots have been reported to behave badly in a variety of ways across various contexts—everything from disseminating spami and fake newsj to limiting free speech.k But it is not always clear whether their undesirable activity is simply a nuisance or whether it is indeed unethical—particularly given the random nature of the logic underlying many social bots. Bad actions are not necessarily unethical—there are shades of gray that are difficult to judge.

For example, Tay,l a social bot created by Microsoft to conduct research on conversational understanding, went from “humans are super cool” to “Hitler was right I hate the Jews” in less than 24 hours on Twitter due to malicious humans interacting with the social bot.m In another case, a social bot tweeted “I seriously want to kill people” from randomly generated sentences during a fashion convention in Amsterdam.n Clearly such inadvertent comments violate our sensibilities and are distasteful, but are they unethical? Perhaps, but by what standard do we judge?

Some social bots do more than just comment—clearly those that steal information and other misdeeds are engaging in unethical activity, but, again, it is not always so clear. For instance, the Random Darknet Shopper—a social bot coded to explore the dark Web in the name of art—inadvertently purchased 10 Ecstasy pills (an illegal narcotic) and a counterfeit passport.o So a law was broken, but was this unethical behavior? We developed a procedure, which we describe next, to help answer such questions.

Items purchased by Random Darknet Shopper, an automated computer program designed as an online shopping system that would make random purchases on the deep Web. The robot would have its purchases delivered to a group of artists who then put the items in an exhibition in Switzerland; the robot was ‘arrested’ by Swiss police after it bought illegal drugs. IMAGE COURTESY OF !MEDIENGRUPPE BITNIK

Bot Ethics: A Procedure to Evaluate the Ethics of Social Bot Activity
Ethics in philosophy dates back thousands of years, and this Viewpoint column cannot do justice to the entire field. However, because of the increasing prominence of social bots and their potential for malicious activity, ethical judgment about their activity is necessary. The best way to guide ethical conduct in a community is to provide a procedure for reflection and discourse.5 The procedure we created is called “Bot Ethics” (see the figure here) and it focuses on the behavior of social bots with respect to law, deception, and norms.

Figure. Bot Ethics: How to determine whether social bot actions are unethical. [Flowchart nodes: Social Bot Action; 1. Break Law? (Y: Appeal to Majority?); 2. Involve Deception? (Y: Higher Duty?); 3. Violate Strong Norm? (Y: If Evil, Less than Good?); Justifiable?; Unethical / Not Unethical.]

Break Law?
Many laws are developed from ethical principles.6 Even when a law may be flawed, it is typically the ethical course of action to follow that law.9 Therefore a natural first question is: “Does the action of the social bot break the law?” The objective is to assess straightforward ethical questions, such as whether algorithms plant viruses in someone else’s device. This is clearly illegal and unethical. There are cases where a social bot might ethically violate the law, such as civil disobedience for a cause the creator considers just. However, civil disobedience is only ethical in very rare cases in constitutional democracies where legal recourse for unjust laws pervades.6 Cases where a law may be broken that are not unethical require justification—compelling arguments that appeal to moral standards of the majority.6 Only in such rare cases may illegal acts be seen as moral and therefore ethical.6 Thus we ask “Is the illegal act justifiable?” Acts that are not suitably justifiable (that is, do not appeal to the morality of the majority) are unethical. Swiss authorities did not file charges against the Random Darknet Shopper developers.p They argued that social bots can buy illegal narcotics over the Internet for the purpose of artq and that “ecstasy in this presentation was safe.” The behavior was not unethical because it was justified according to the pervading morality of the community.

Involve Deception?
If a social bot’s behavior does not break any laws, next evaluate for truthfulness: “Is any deception involved?” Social bots may act deceitfully. For example, they can misrepresent themselves as human beings2 or spread untruthful information (such as fake news). Deceiving acts communicate false or erroneous assertions, violating the prima facie duty of fidelity. Social bots should always act truthfully.3 However, deceitful acts can be justifiable if the duty of fidelity is superseded by a higher-order duty, such as beneficence.r Deceptive, satirical actions may not be unethical since they elicit pleasure, improving the life of others. Consider Big Data Batmans as an illustration.s

a http://bit.ly/2uDfIbP
b http://cnnmon.ie/2uFR4XJ
c http://bit.ly/1ieIIXN
d http://read.bi/1LFQJFU
e http://bit.ly/1Ktz5kc
f http://tcrn.ch/2tKo90x
g http://bit.ly/2vneleU
h http://bit.ly/2vFRI4E
i http://ubm.io/1MbsSf3
j http://bit.ly/2ftn0It
k http://bit.ly/14bDiuN
l https://twitter.com/TayandYou
m http://bit.ly/14bDiuN
n http://bit.ly/2ttN5Ox
o http://bit.ly/2vFGdu9
p By “developer” we are referring to either the organization or management of the organization or the software developer involved in the creation of the social bot.
q http://bit.ly/2ud2cZC
r Beneficence is the duty to bring virtue, knowledge or pleasure to others; other duties, according to Ross 1930, include non-maleficence, self-improvement, justice, gratitude, reparation (see Mason et al.7, p. 132–133).
s http://bit.ly/2ttNUH7


The social bot finds every tweet with the term big data, replaces “big data” with “Batman,” and then tweets the message as if it were its own. It obviously substitutes its words for others’ words, but the satire makes it difficult to judge its ethics. Because the social bot might insult and embarrass some big-data advocates, the community must go beyond the act (deontology) to consider its consequences (teleology), and ask whether potentially bad actions (for example, insult and embarrassment) outweigh, or supersede, the good (for example, pleasure through laughter) for the involved parties. Again, is the deception justifiable? Deception in the absence of supersession is likely to be unethical.

Violate Strong Norm?
Social bots that are legal and truthful can still behave unethically by violating strong norms that create more evil than good. Moral evils inflict “limits on human beings and contracts human life.”4 Evil restrains instead of emancipating; evil actions reduce opportunities. Let us go back to Tay’s racist comments on Twitter. Although not illegal (First Amendment protections apply), nor deceitful, they violated the strong norm of racial equality. Social media companies like Twitter that temporarily lock or permanently suspend accounts that “directly attack or threaten other people on the basis of race,”t have established that the moral evil of racism outweighs the moral good of free speech. By applying Bot Ethics to Twitter’s norms we conclude that Tay’s actions were unethical. Yet, there are cases where social bots may violate strong norms and not act unethically, as with asking inappropriate questions (what is your salary?). Such violations do not create moral evils.

Culpability of Unethical Social Bot Behavior
Should the general social media community blame developers for unethical behavior of their social bots? In the example of the algorithm that randomly generated that it wanted to kill people, who is responsible for the death threat? The programmer? Who is responsible for Tay’s remark about Hitler—Microsoft developers or those teaching the social bot to generate racist statements? Similarly, who is responsible for the social bot buying the illegal narcotics?

Aristotle1 said we can only assign culpability if we know that individuals behaved voluntarily and knowingly. Involuntary situations likely do not apply to social bots. Developers who are coerced into doing something unethical without a choice may not be entirely culpable, but in the case of free enterprise there is always a choice. Therefore, culpability rests on the knowledge of the developers. Developers who knowingly create social bots to engage in unethical actions are clearly culpable. They should be punished if evidence of their wrongdoing is convincing—the penalty must be consistent and proportional to the harm done and those affected should be compensated.

But what about situations where developers act unknowingly? In those occasions the community must determine whether developers are culpably ignorant—did they ignore industry best practices in creating and testing their algorithms? If industry guidelines were not followed and the action was unethical, developers are culpable. However, developers who followed good development practices and incorporated the current industry thinking, and yet their social bot still acted unethically, deserve our pity and pardon, but they are not culpable. They should apologize, correct immediately, learn from their experience, and communicate the occurrence to the development community. For example, Microsoft posted its learning from Tay in blog form.u

Conclusion
We do not purport to write the last word on social bot ethics and culpability. Ethics is simply too complex of a domain to deal with fully in such a format. Nevertheless, some readily accessible guidance rooted in sound ethical thinking is in order.

For example, with the recent attention to the role of social bots in spreading misinformation in the form of “fake news,” other social bots, such as Reuters News Tracer, are being created to ferret out such deceitful activity.v The Bot Ethics procedure can help the social media community understand when these deceitful actions are indeed unethical. It further helps to expand the focus of the community beyond narrow (that is, only deceitfulness) and simplistic (that is, good or bad bot) assessments of social bot activity to attend to the complexities of ethical assessments. In short, the Bot Ethics procedure serves as a starting point and guide for ethics-related discussion among various participants in a social media community, as they evaluate the actions of social bots.

t http://bit.ly/19SJwlt
u http://bit.ly/2tiPfMH
v http://bit.ly/2hIlfXG

References
1. Aristotle. Nicomachean Ethics of Aristotle. E.P. Dutton, NY, 1911.
2. Ferrara, E. et al. The rise of social bots. Commun. ACM 59, 7 (July 2016), 96–104; DOI: 10.1145/2818717.
3. Gotterbarn, D., Miller, K. and Rogerson, S. Computer society and ACM approve software engineering code of ethics. Computer Society Connection (1999), 84–88.
4. Grisez, G. and Shaw, R. Beyond the New Morality: The Responsibilities of Freedom. University of Notre Dame Press, Notre Dame, IN, 1980.
5. Habermas, J. The Theory of Communicative Action, Volume 1: Reason and the Rationalization of Society. 1985.
6. Kallman, E.A. and Grillo, J.P. Ethical Decision Making and Information Technology. McGraw-Hill, New York, NY, 1996.
7. Mason, R.O., Mason, F.M., and Culnan, M. Ethics of Information Management. Sage Publications, London, U.K.
8. Morstatter, F. et al. A new approach to bot detection: Striking the balance between precision and recall. ASONAM, 2016.
9. Rawls, J. The justification of civil disobedience. Arguing about Law (2013), 244–253.

Carolina Alves de Lima Salge ([email protected]) is a doctoral candidate at the University of Georgia.

Nicholas Berente ([email protected]) is an associate professor at the University of Georgia.

Copyright held by authors.

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 31 viewpoints

DOI:10.1145/3126494
Peter J. Denning
The Profession of IT
Multitasking Without Thrashing
Lessons from operating systems teach how to do multitasking without thrashing.

OUR INDIVIDUAL ABILITY to be productive has been hard stressed by the sheer load of task requests we receive via the Internet. In 2001, David Allen published Getting Things Done,1 a best-selling book about a system for managing all our tasks to eliminate stress and increase productivity. Allen claims that a considerable amount of stress comes our way when we have too many incomplete tasks. He views tasks as loops connecting someone making a request and you as the performer who must deliver the requested results. Getting systematic about completing loops dramatically reduces stress.

Allen says that operating systems are designed to get tasks done efficiently on computers. Why not export key ideas about task management into a personal operating system? He calls his operating system GTD, for Getting Things Done. The GTD system supports you in tracking open loops and moving them toward completion. It routes incoming requests to one of these destinations in your filing system:
• Trash
• Tasks that might one day turn out to be worth doing
• Tasks that serve as potential future reference points
• Tasks delegated to someone else, awaiting their response
• Tasks that can be completed immediately in under two minutes
• Tasks accepted for processing

The first four destinations basically remove incoming tasks from your workspace, the fifth closes quick loops, and the sixth holds your incomplete loops. GTD helps you keep track of these unfinished loops.

The idea of tasks being closed loops of a conversation between a requester and a performer was first proposed in 1979 by Fernando Flores.5 The “conditions of satisfaction” that are produced by the performer define loop completion and allow tracking the movement of the conversation toward completion. Incomplete loops have many negative consequences, including accumulations of dissatisfaction, stress, and distrust.

Many people have found the GTD operating system to be very helpful at completing their loops, maintaining satisfaction with work, and reducing stress. It is a fine example of us taking lessons from technology to improve our lives.

Multitasking
Unfortunately, GTD does not eliminate another source of stress that was much less of a problem in 2001 than today. This is the problem of thrashing, when you have too many tasks in progress at the same time.2

The term multitasking is used in operating systems to mean executing multiple computational processes simultaneously. The very first operating system to do this was the Atlas supervisor, running at the University of Manchester, U.K., in 1959. IBM brought the idea to the commercial world with its OS/360 in 1965.

Operating systems implement multitasking by cycling a CPU through a list of all incomplete tasks, giving each one a time slice on the CPU. If the task does not complete by the end of its time slice, the OS interrupts it and puts it on the end of the list. To switch the CPU context, the OS saves all the CPU registers of the current task and loads the registers of the new task. The designers set the time slice length long enough to keep the total context switch time insignificant. However, if the time slice is too short, the system can significantly slow down due to rapidly accumulating context-switching time.

When main memory was small, multitasking was implemented by loading only one task at a time. Thus, each context switch forced a memory swap: the pages of the running task were saved to disk, and then the pages of the new task loaded. Page swapping is extremely expensive. The 1965-era OSs eliminated this problem by combining multitasking with multiprogramming: the pages of all active tasks stay loaded in main memory and context switching involves no swapping. However, if too many tasks were activated, their allocations would be too small and they would page excessively, causing system throughput to collapse. Engineers called this thrashing, a shorthand for “paging to death.”

Eventually researchers discovered the root cause of thrashing and built control systems to eliminate it; I will return to this shortly.
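The time-slicing mechanism described above can be sketched as a small simulation. This is a minimal illustration, not how Atlas, OS/360, or any modern OS is implemented: time is counted in hypothetical integer “ticks,” `round_robin` is a name chosen here, and each context switch is assumed to cost a fixed number of ticks, charged only when a different task takes over the CPU. Interrupted tasks go to the end of the list, as in the text, and running it with shorter slices shows the switching overhead growing.

```python
from collections import deque


def round_robin(bursts, time_slice, switch_cost):
    """Simulate round-robin time slicing on a single CPU.

    bursts: CPU ticks each task still needs (task i needs bursts[i]).
    time_slice: ticks a task may run before the OS interrupts it.
    switch_cost: ticks charged per context switch (saving the current
        task's registers and loading the next task's).
    Returns (completion order, total elapsed ticks, ticks spent switching).
    """
    queue = deque(enumerate(bursts))  # the list of all incomplete tasks
    clock = 0
    switching = 0
    finished = []
    while queue:
        task, remaining = queue.popleft()
        run = min(time_slice, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((task, remaining))  # interrupted: back of the list
        else:
            finished.append(task)
        # A context switch happens only if a different task runs next.
        if queue and queue[0][0] != task:
            clock += switch_cost
            switching += switch_cost
    return finished, clock, switching


if __name__ == "__main__":
    for q in (20, 5, 1):
        order, total, sw = round_robin([30, 50, 20], time_slice=q, switch_cost=1)
        print(f"slice={q:2d} order={order} total={total} switching={sw / total:.1%}")
```

With a 20-tick slice the three hypothetical tasks finish in the order 2, 0, 1 after 104 ticks, 4 of them spent switching; shrinking the slice toward the switch cost makes switching a much larger share of the total, which is the slowdown the paragraph warns about.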


Figure 1. In this memory map of a Firefox browser in Linux, the colored pixels indicate that a page (vertical axis) is used during a fixed-size execution interval (horizontal axis). The locality sets (pages used) are small compared to the whole address space, and their use persists over extended intervals.

[Figure 1 plot: vertical axis PAGES (labels 4000 to 7bb0, hexadecimal); horizontal axis INSTRUCTIONS, 376,706 per pixel; legend Instructions, Modify, Load, Store; page size 4096 bytes; 0 to 2% memory.]
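Figure 1 was drawn by marking which pages a program touches in each fixed-size execution interval. A minimal sketch of that bookkeeping follows, assuming the trace is a plain list of referenced byte addresses (the function names and the toy trace are hypothetical; how the figure’s trace was actually collected is not described here). It groups references into intervals and records the set of pages used in each, which are the locality sets the caption refers to.

```python
PAGE_SIZE = 4096  # bytes per page, as stated for Figure 1


def locality_sets(addresses, interval):
    """Return the set of pages referenced in each consecutive interval.

    addresses: byte addresses in the order the program referenced them
        (a hypothetical trace format).
    interval: number of references per execution interval, i.e., one
        pixel column in a plot like Figure 1.
    """
    sets = []
    for start in range(0, len(addresses), interval):
        window = addresses[start:start + interval]
        sets.append({addr // PAGE_SIZE for addr in window})
    return sets


def fraction_of_space(pages, total_pages):
    """Share of the whole address space that one locality set occupies."""
    return len(pages) / total_pages


if __name__ == "__main__":
    # Toy trace: the program works on two pages, then moves to another pair.
    trace = [0, 100, 4096, 50, 4200, 8192, 12288, 8300, 12300, 8192]
    for i, pages in enumerate(locality_sets(trace, interval=5)):
        print(i, sorted(pages))
```

On this toy trace the first interval uses pages 0 and 1 and the second uses pages 2 and 3: each set is small and stable within its interval, which is the pattern the figure shows at a much larger scale.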

Figure 1 courtesy of Andrian McMenamin.

Human Multitasking
Humans multitask too by juggling several incomplete tasks at once. Cognitive scientists and psychologists have studied human multitasking for almost two decades. Their main finding is that humans do not switch tasks well. Psychologist Nancy Napier illustrates with a simple do-it-yourself test.7 Write “I am a great multitasker” on line 1 and the series of numbers 1, 2, 3, …, 20 on line 2. Time how long it takes to do this. Now do it again, alternating one letter from line 1 and one numeral from line 2. Time how long it takes. For most people, the fine-grained multitasking in the second run takes over twice as long as the one-task-at-a-time first run. Moreover, you are likely to make more errors while multitasking. This test reveals just how slow our brains are at context switching.

You can try the test a third time using time-slicing, for example writing five letters and then switching to write five numerals. With fewer context switches, time-slicing is faster than fine-grained multitasking but still slower than one-at-a-time processing.

Human context switching is more complicated than computer context switching. Whereas the computer context switch replaces a fixed number of bytes in a few CPU registers, the human has to recall what was “on the mind” at the time of the switch and, if the human was interrupted with no opportunity to choose a “clean break,” the human has to reconstruct lost short-term memory.

Context switching is not the only problem. Whereas a computer picks the next task from the head of a queue, your brain has to consider all the tasks and select one, such as the most urgent or the most important. The time to choose a next task grows faster than linearly with the number of tasks. Moreover, if you have several urgent, important tasks, your brain can get stuck in a decision process that can take quite a long time, a situation known as the choice uncertainty problem.4

A third factor that slows human multitasking is gathering the resources necessary to continue with a task. Some resources are physical, such as books, equipment, and tools. Some are digital, such as files, images, sounds, Web pages, and remote databases. And some are mental: things you have to remember about where you were in the task and what approach you were taking to perform it. All these resources must be close at hand so that you can access them quickly.

These three problems plague multitaskers of all age groups. Many studies report considerable evidence of negative effects: multitasking seems to reduce productivity, increase errors, increase stress, and exhaust us. Some researchers report that multitaskers are less likely to develop expertise in a topic because they do not get enough intensive focused practice with it. Some fret that if we do not learn to manage our multitasking well, we may wind up becoming a world of dilettantes with few experts to keep our technology running.

Figure 2. OS control system to maximize throughput with variable partition of main memory determined by task working sets.
[Figure 2 diagram: incoming requests become accepted tasks; tasks put aside by the OS wait as tasks awaiting activation; a valve opens when the first waiting task’s WS fits into free memory; main memory (active tasks) holds working sets WS1 through WS4 plus free space; completed tasks leave.]

Thrashing happens to human multitaskers when they have too many incomplete tasks. They fall into a mood of “overwhelm” in which they experience considerable stress, cannot choose a next task to work on, and cannot stay focused on the chosen task. It can be a difficult state to recover from.

Let us now take a look at what OSs do to avoid thrashing and see what lessons we can take to avoid it ourselves.

Locality, Working Sets, and Thrashing
The OS seeks to allocate memory among multiple tasks so as to maximize system throughput, the number of completed tasks per second.3

The accompanying Figure 1 is strong graphical evidence of the principle of locality: computations concentrate their memory accesses to relatively small locality sets over extended intervals. Locality should be no surprise; it reflects the way human designers approach tasks.

We use the term working set for the OS’s estimate of a task’s locality set. Formally, the working set is the set of pages used in a backward-looking window of a fixed size, T memory references. In Figure 1, T is the length of the sampling interval, and the working set equals the locality set 97% of the time.

Each task needs a workspace, its own area of memory in which to load its pages. There are at least two ways to divide the total memory among the active tasks. In fixed partitioning, the OS gives each task a fixed workspace. In working-set partitioning, the OS gives each task a variable workspace that tracks its locality sets. Fixed partitioning is susceptible to thrashing as the number of tasks sharing memory increases, because each gets a smaller workspace and, when the workspaces are smaller than the working sets, every task is quickly interrupted by a page fault.

Under working-set partitioning the OS sizes the workspaces to hold each task’s measured working set. As shown in Figure 2, it loads tasks into memory until the unused free space is too small to hold the next task’s working set; the remaining tasks are held aside in a queue until there is room for their working sets. When a task has a page fault, the new page is added to its workspace by taking a free page; when any page has not been used for T memory references, it is evicted from the task’s workspace and placed in the free space. Thus, the OS divides the memory among the active tasks such that each task’s workspace tracks its locality sets. Page faults do not steal pages from other working sets. This strategy automatically adjusts the load (number of active tasks) to keep throughput near its maximum and to avoid thrashing.

Context switching is not the cause of thrashing. The cause of thrashing is the failure to give every active task enough space for its working set, thereby causing excessive movement of pages between secondary and main memory.

Translation to Human Multitasking
Although the analogy with OSs is not perfect, there are some lessons:
• Recognize that each task needs a variable working set of resources (physical, digital, and mental), which must be easily accessible in your workspace. Analog: the working set of pages.
• Your capacity to deal with a task is the resources and time needed to get it done. Analog: the memory and CPU time needed for a task.
• Some tasks need to be held aside in an inactive status until you have the capacity to deal with them. Analog: the waiting tasks queue.
• When a task’s working set is in your workspace, protect it from being unloaded as long as the task is active. Analog: protect working sets of active tasks and do not steal from other tasks.
• You will thrash if you activate too many tasks so that the total demand is beyond your capacity. Analog: insufficient CPU and memory for active tasks.
• If you are able to choose moments of context switch, select a moment of “clean break” that requires little mental reacquisition time when you return to the task. If you cannot defer an interruption to such a moment, you will need more reacquisition time because you will have to reconstruct short-term memory lost at the interruption. Analog: ill-timed interrupts can cause loss of part of a working set.

You are likely to find that you cannot accommodate more than a few active tasks at once without thrashing. However, with the precautions described here, thrashing is unlikely. If it does occur you will feel overwhelmed and your processing efficiency will be badly impaired. To exit the thrashing state, you need to reduce demand or increase your capacity. You can do this by reaching out to other people: making requests for help, renegotiating deadlines, acquiring more resources, and in some cases canceling less important tasks.

References
1. Allen, D. Getting Things Done. Penguin, 2001.
2. Christian, B. and Griffiths, T. Algorithms to Live By: The Computer Science of Human Decisions. Henry Holt and Company, 2016.
3. Denning, P. Working sets past and present. IEEE Trans. Software Engineering SE-6, 1 (Jan. 1980), 64–84.
4. Denning, P. and Martell, C. Great Principles of Computing. MIT Press, 2015.
5. Flores, F. Conversations for Action and Collected Essays. CreateSpace Independent Publishing Platform, 2012.
6. McMenamin, A. Applying working set heuristics to the Linux kernel. Masters, Birkbeck College, University of London, 2011; http://bit.ly/2vFSgY8
7. Napier, N. The myth of multitasking, 2014; http://bit.ly/1vuBGcC

Peter J. Denning ([email protected]) is Distinguished Professor of Computer Science and Director of the Cebrowski Institute for information innovation at the Naval Postgraduate School in Monterey, CA, is Editor of ACM Ubiquity, and is a past president of ACM. The author’s views expressed here are not necessarily those of his employer or the U.S. federal government.

Copyright held by author.
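The working-set rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the control system Denning describes in detail: a task’s references are given as a list of page numbers, working-set sizes are counted in pages, and the names `working_set`, `Workspace`, and `admit` are hypothetical. `working_set` applies the definition directly (pages used in the last T references); `Workspace` maintains the same set incrementally, adding a page on a fault and evicting any page unused for T references; `admit` mimics the valve in Figure 2 by activating queued tasks only while the first waiting task’s working set fits into free memory. The sketch does not model the shared pool of free pages beyond that admission check.

```python
from collections import deque


def working_set(refs, t, T):
    """Distinct pages among the last T references, ending at index t.

    refs: page numbers in reference order; t: 0-based index of the
    current reference; T: window size in memory references.
    """
    return set(refs[max(0, t - T + 1):t + 1])


class Workspace:
    """One task's workspace, tracking its working set as it runs."""

    def __init__(self, T):
        self.T = T
        self.clock = 0       # count of this task's memory references
        self.last_use = {}   # page -> clock value of its latest reference
        self.faults = 0

    def reference(self, page):
        self.clock += 1
        if page not in self.last_use:
            self.faults += 1  # page fault: a free page is added
        self.last_use[page] = self.clock
        # Evict every page not referenced in the last T references.
        stale = [p for p, used in self.last_use.items()
                 if self.clock - used >= self.T]
        for p in stale:
            del self.last_use[p]  # the page returns to the free space

    def pages(self):
        return set(self.last_use)


def admit(waiting, active, memory_size):
    """Load control in the spirit of Figure 2 (hypothetical helper).

    waiting: deque of (task name, working-set size) in arrival order.
    active: dict of task name -> working-set size already in memory.
    Activates waiting tasks in order while the first one's working set
    fits into free memory; returns the free space left over.
    """
    free = memory_size - sum(active.values())
    while waiting and waiting[0][1] <= free:
        name, size = waiting.popleft()
        active[name] = size
        free -= size
    return free


if __name__ == "__main__":
    refs = [1, 2, 1, 3, 4, 1]
    ws = Workspace(T=3)
    for t, page in enumerate(refs):
        ws.reference(page)
        assert ws.pages() == working_set(refs, t, 3)
    print("page faults:", ws.faults)

    waiting = deque([("A", 40), ("B", 30), ("C", 50)])
    active = {}
    print(admit(waiting, active, memory_size=100), active, list(waiting))
```

In the example, tasks A and B are activated and C waits in the queue, because its working set of 50 pages does not fit into the 30 free pages. When A completes and its pages are freed, calling `admit` again lets C in. The incremental workspace and the direct definition agree at every step, which is the sense in which the workspace “tracks” the working set.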


DOI:10.1145/3126156
Gregorio Convertino and Nancy Frishberg
Viewpoint
Why Agile Teams Fail Without UX Research
Failures to involve end users or to collect comprehensive data representing user needs are described and solutions to avoid such failures are proposed.

[Dilbert comic strip] DILBERT © 2012 SCOTT ADAMS. USED BY PERMISSION OF ANDREWS MCMEEL SYNDICATION. ALL RIGHTS RESERVED.

LESSONS LEARNED BY two researchers in the software industry point to recurrent failures to incorporate user experience (UX) research or design research. This leads agile teams to miss the mark with their products because they neglect or mischaracterize the target users’ needs and environment. While the reported examples focus on software, the lessons apply equally well to the development of services or tangible products.

Why It Matters to the ACM Community
Over the past 15 years, agile and lean product development practices have increasingly become the norm in the IT industry.3 At the same time, two synergistic trends have also emerged.
• End users’ demand for good user experience has increased significantly with wide adoption of mobile devices. Any new application needs to do something useful or fun, plus it needs to do it well and fast enough. In 2013, technology analysts found that only 16% of people tried a new mobile app more than twice, suggesting that users have low tolerance for poor user experience (UX), where UX is the totality of a user’s interactions supported by the app to accomplish a goal.9
• With growing emphasis on good UX design, UX professionals, both designers and researchers, are gradually being incorporated as required roles in software development, alongside product managers and software developers. A 2014 Forrester survey of 112 companies found that organizations in which there was systematic investment in UX design process and user research self-evaluated as having greater impact than those with more limited scope of investment.

These trends describe a new context that often finds agile teams unprepared for two main reasons. First, while the agile process formally values the principle of collaboration with customers to define the product vision, we and our colleagues in industry too often observe this principle not being put into practice: teams do not validate requirements systematically in the settings of use. Second, even when customers are involved, sometimes the teams may still fail to involve actual end users. As Rosenberg puts it, when user requirements are not validated but are still called “user stories,” it creates “the illusion of user requirements” that fools the team and the executives, who are then mystified when the product fails in the marketplace.10

In this Viewpoint, we illustrate five classic examples of failures to involve actual end users or to gather sufficiently comprehensive data to represent their needs. Then we propose how these failures can be avoided.

Five Cases of Neglect or Mischaracterizations of the User
We identified five classic cases of failures to involve actual end users.

The Wild West case. The first and most obvious case occurs when the team does not do regular testing with the users along the development process. Thus the team fails to evaluate how well the software built fits target users, their tasks, and their environments. A real-life example of this failure is the development and deployment of HealthCare.gov, where the team, admittedly, did not fully test the online health insurance marketplace until two weeks before it opened to the public on October 1, 2013. Then the site ran into major failures.8

Chooser ≠ target user. The second case is neither new nor unique to agile. The term “customer” conflates the chooser with the user. Let’s unpack these words:
• A customer is often an organization (the target buyer of enterprise software, that is, product chooser) as represented by the purchasing officer, an executive or committee that makes a buying decision.
• A customer is the target user only for consumer-facing products. For enterprise software, target users may be far from the process of choosing a product, and have no input about products the organization selects.

Agile terminology adds to the confusion: product teams write user stories from the perspective of the person who uses the software, not the one who chooses it. Then a customer demo (or stakeholder review) at the end of an iteration confirms that each user story is satisfied. Here is when the terms customer and user are conflated. For enterprise software and large systems, practice teaches us that often the “end-of-iteration customer” is someone representing the product chooser rather than the end user. So the end-of-iteration demo cannot be the sole form of feedback to predict user adoption and satisfaction. In addition, the software development team should also leverage user research to answer questions such as:
• What are the classes of users (personas)?
• Have we validated that the intended users have the needs specified in the user stories?
• What are the current user practices before the introduction of the product and the impact afterward?
• How would we extend the tool to support new personas or future use cases?

Internal proxies ≠ target user. The third case is about bias. Some teams work with their in-house professional services or sales support staff (that is, experts thought to represent large groups of customers) as proxies for end users. While we appreciate the expertise and knowledge these resources bring, we are wary of two common types of misrepresentation in these situations.

First, internal proxies are unrepresentative as end users because they have multiple unfair advantages: they know the software inside out, including the work-arounds; have access to internal tools unavailable to external customers; and do not need to use the product within the target users’ time constraints or digital environment.

Second, the evidence internal proxies bring to the team is also biased. Professional sales and support staff are more likely to channel the needs of the largest or most strategic existing customers in the marketplace. They are more likely to focus on pain points of existing customers and less on what works well. Also, they may ignore new requirements that are not yet addressed by the current tool or market.

Therefore internal staff cannot be the sole representative of “users,” as shown in the “Dilbert” comic strip at the beginning of this column. User research welcomes their comments about competitive analysis, current insights about information architecture or other issues, which complement customer support data, UX research, and other sources of user feedback.

Executives liking sales demos ≠ target users adopting product. Enterprise software companies, during their annual customer conferences, use a sales demo to portray features and functions intended to excite the audience of buyers, investors, and the market analysts about the company strategy. However, positive responses to the sales demos should not be taken as equivalent to assertions about a product’s user requirements. Instead, these requirements need confirmation via a careful validation cycle. Let sales demos open a door toward users with the help of choosers and influencers.

Similarly, Customer Advisory Boards (which draw from customers who have large installations, or who represent a specific or important segment of the market) stand in for all customers and offer additional opportunities to showcase future features or strategy. However, a basic law for success in the software industry is “Build Once, Sell Many.”7 This principle creates an inherent tension between satisfying current customers and attracting new ones. Therefore, a software company needs to constantly rethink their tiered offerings to include new market segments or customer classes as these emerge, and avoid one-off development efforts.


Confusing business leaders with users or the sales demo with the product prototype leads companies to build products based on what sales and product managers believe is awesome (for example, see Loranger6). Instead, we advocate validating the designs with actual end users during the product development.

Big data (What? When?) < The full picture (... How? Why?). Collecting and analyzing big data about digital product use is popular among product managers and even software developers, who can now learn what features get traction with users. We support the use of big data techniques as part of user research and user-centered design, but not as a substitute for qualitative user research. Let’s review two familiar ways to use big data on usage: user data analytics and A/B testing.

User data analytics can quickly answer questions about current usage: quantity and most frequent patterns, such as How many? How often? When? Where? Once a product team has worked out most of the design (interaction patterns, page layouts, and more), A/B testing compares design alternatives, such as “which image on a page produces more click-throughs?” In vivo experiments with sufficient traffic can generate large amounts of useful data. Thus, A/B testing is very helpful for small incremental adjustments.

Every software company is in the business of finding and keeping new customers. Suppose the logs show the subscribers of an online dating application are not renewing. Should the company rejoice or despair? If people are getting good matches, and thus are satisfied, non-renewal implies success. If they are hopelessly disappointed by not getting dates, non-renewal implies failure. Big data won’t tell you which, but observing and listening to even a handful of non-renewing individuals will.

In brief, quantitative data is useful but has two limitations. First, it will not tell the team why the current features are or are not used.5 Different classes of users can have different reasons. Second, it will not identify what additional or alternative features appeal to a new class of users unfamiliar with the product. To answer these questions the team needs to rely on qualitative research with existing and proposed classes of users.

Market Research ≠ User Research
Finally, we point to the growing and worrisome tendency in industry to mix up user research with market research. Market research groups make great partners for user research. While user research and market research have a few techniques in common (for example, surveys and focus groups), the goals and variables they focus on are different.
• Market research seeks to understand attitudes toward products, categories, or brands, and tries to predict the likelihood of purchase, engagement, or subscription.
• User research aims at improving the user experience by understanding the relation between actual usage behaviors and the properties of the design. To this end, it measures the behavior and attitudes of users, thereby learning whether the product (or service) is usable, useful, and delightful, including after the decision to purchase.

We urge organizations to act strategically and connect market research, user research, and customer success functions. This requires aligning goals and sharing data among Marketing, Sales, Customer Success, and the UX Team (typically in Product or R&D).1,4

The Way Forward: Educate Managers and Agile Development Teams
We have shown five different ways that agile teams without user research are prone to building the wrong product. To avoid such failures, we invite software managers and product teams to assess and fill the current gap in a team’s competencies. The closing table gives short-term and longer-term action items to address the gaps.

Actions to address gaps in UX competencies.
Short term
1. Analyze the current skills of the team and flag the gap. A functional product team needs several key skill sets or UX competencies: UX research, UX design, UI software development and prototyping.11 These might be filled by training the current team members or hiring UX professionals full-time or part-time.
2. Support product managers (or product owners) with investment in UX. Too often, product managers find their role is a sort of “kitchen sink” for any task that is not software development. We encourage product managers to find additional resources in the UX competencies, to benefit both product and their workload.
Longer term
3. Integrate UX competencies
a. Teams need UX research competencies as well as UX design skills (interaction, visual). Other related skill sets include content development and documentation; accessibility; globalization and localization.
4. Collect and prioritize findings from user research
a. Seek user feedback early and often.
b. Create channels to learn from end users and appropriate surrogates.
c. Prioritize UX issues during backlog grooming; remove friction and measure delight.
d. Build new features only after steps 4.a.–c. are done for each key version of the product.

References
1. Buley, L. The modern UX organization. Forrester Report (2016); https://vimeo.com/121037431
2. Grudin, J. From Tool to Partner: The Evolution of Human-Computer Interaction. Morgan & Claypool, 2017.
3. HP report. Agile Is the New Normal: Adopting Agile Project Management. 4AA5-7619ENW, May 2015.
4. Kell, E. Interview by Steve Portigal. Portigal blog. Podcast and transcript (Mar. 1, 2016); http://www.portigal.com/podcast/10-elizabeth-kell-of-comcast/
5. Klein, L. UX for Lean Startups: Faster, Smarter User Experience Research and Design. O’Reilly, 2013.
6. Loranger, H. UX Without User Research Is Not UX. Nielsen Norman Group blog (Aug. 10, 2014); http://www.nngroup.com/articles/ux-without-user-research/
7. Mironov, R. Four Laws Of Software Economics. Part 2: Law of Build Once, Sell Many (Sept. 14, 2015); http://www.mironov.com/4law2/
8. Pear, R. Contractors Describe Limited Testing of Insurance Web Site. New York Times (Oct. 24, 2013); http://nyti.ms/292NryG
9. Perez, S. Users have low tolerance for buggy apps. Techcrunch (Mar. 12, 2013); http://tcrn.ch/Y80ctA
10. Rosenberg, D. Introducing the business of UX. Interactions. Forums. XXI.1 Jan.–Feb. 2014.
11. Spool, J.M. Assessing your team’s UX skills. UIE (Dec. 10, 2007); https://www.uie.com/articles/assessing_ux_teams/

Gregorio Convertino ([email protected]) is a UX manager and principal user researcher at Informatica LLC.
Nancy Frishberg ([email protected]) is a UX researcher and strategist, in private practice, and a 25+-year member of the local SIGCHI Chapter BayCHI.org.

Copyright held by authors.


DOI:10.1145/3012006
Andrew Conway and Peter Eckersley
Viewpoint
When Does Law Enforcement’s Demand to Read Your Data Become a Demand to Read Your Mind?
On cryptographic backdoors and prosthetic intelligence.

IMAGE COLLAGE BY ANDRIJ BORYS ASSOCIATES/SHUTTERSTOCK

THE RECENT DISPUTE between the FBI and Apple has raised a potent set of questions about companies’ right to design strong cryptographic protections for their customers’ data. The real stakes in these questions are not just whether the security of our devices should be weakened to facilitate FBI investigations, but ultimately, the ability of law enforcement and intelligence agencies to read our minds and most intimate private thoughts.

In the U.S. and other countries, there have been many legal cases in recent years pitting the demands of law enforcement against the concerns of technology companies and privacy advocates over access to new, technologically generated information about people. The disputed topics have included spy agencies’ bulk collection of Internet traffic and mobile phone metadata; law enforcement use of location-tracking devices, malware, and fake cellphone towers; and the constitutionality of “gag orders” that make it a crime for individuals and companies to ever discuss certain requests they receive for others’ data.

In some sense, this is not a new debate; the Fourth Amendment to the U.S. Constitution, for instance, has engendered a long history of litigation about the boundaries between types of information that the police can obtain about people simply by demanding it with letters called subpoenas, and information for which a court-issued warrant is necessary. What has changed are the stakes of these disputes.

As the law has operated in the past, almost any information was theoretically fair game for law enforcement to demand if it had probable cause and obtained a warrant. But there was not nearly as much to collect: people did not carry recording and tracking devices with them everywhere, and they did not turn over the most intimate details of their lives to multinational technology companies. There were


also legal limits: the private thoughts of defendants were largely protect- Calendar ed by rights to remain silent and We have no choice against self-incrimination—histori- but to pour our of Events cal legal protections that sprang up as shields against religious persecu- minds out if we September 2 tion. Unfortunately, changes to our want to exist and APSys ‘17: 8th Asia-Pacific lifestyles, to our relationship with Workshop on Systems, perform at the Mumbai, India, V technology, and to the very process Sponsored: ACM/SIG, of human cognition are making these same level as the Contact: Purushottam Kulkarni, protections so impractical that they Email: [email protected] may cease to exist at all. humans around us. September 3–9 So, what do we mean by changes to ICFP ‘17: ACM SIGPLAN the process of human cognition? International Conference on Pens and paper are wonderful things. Functional Programming, Oxford, U.K., “Hang on. Let me write that down,” or “I Sponsored: ACM/SIG, need a pen and paper to work this out,” Contact: Jeremy Gibbons, are the kinds of utterances that reveal built on prosthetic intelligence, one Email: [email protected]. our dependence. It is intelligence that where the states we share through the ac.uk makes us human, and a pen and paper Internet and the financial system are September 4–7 magnifies our intelligence. becoming more important than the DocEng ‘17: ACM Symposium If you doubt this, consider any rea- biological and physical environment on Document Engineering 2017, sonable method of measuring intelli- around us. Valletta, Malta, Sponsored: ACM/SIG, gence. A human with a pen and paper But this has come at a complicated Contact: Kenneth P. Camilleri, will perform at least as well as, and price. You can think faster and more Email: kenneth.camilleri@ often much, much better than, the accurately, but your electronic devices um.edu.mt same human without a pen and pa- know where you are, where you have September 4–7 per. 
So it would be reasonable to state been, who you have talked to, what MobileHCI ‘17: 19th that the pen and paper constituted a you said, what your heart rate was at International Conference on prosthetic component of our intelli- the time, what you have looked at on Human-Computer Interaction with Mobile Devices gence, or at least a prosthetic aid for the Web, what medication you are and Services, our imperfect memory. taking, what you have bought, what Vienna, Austria, Furthermore, to read someone maps you have looked up, what spell- Sponsored: ACM/SIG else’s notes is often described as a ing mistakes you make, and it is only September 4–8 window into their mind. Reading accelerating. With virtual reality and ESEC/FSE’17: Joint Meeting someone else’s diary without their augmented reality looking imminent, of the European Software permission seems not only to be a vio- gadgets will begin to log almost every Engineering Conference and lation of privacy but perhaps a form of action we take. And we have no choice the ACM SIGSOFT Symposium on the Foundations of Software taboo mind reading. but to pour our minds out if we want to Engineering, Now consider the same human exist and perform at the same level as Paderborn, Germany, having access to Google, Wikipedia, the humans around us. Sponsored: ACM/SIG, GPS, a calculator, a mobile phone to Ignoring arguments about precise Contact: Wilhelm Schaefer, Email: wilhelm@uni- communicate with friends and col- definitions of words, it is clear that paderborn.de leagues, and indeed the whole In- many humans in the developed world ternet. As long as cat videos are not have a lot of their thoughts happen- September 6–8 too much of a distraction, this well- ing, or at least observable, outside of WomEncourage ‘17: ACM-W Europe womENcourage resourced human can answer hard their brain, and this is only likely to Celebration of questions and perform many difficult increase in the future. 
It is through Women in Computing, tasks much more quickly than people this lens that we need to understand Barcelona, Spain, Sponsored: ACM/SIG, even two decades earlier. the importance of Apple’s fight to use Contact: Núria Castell Ariño, As hunters, weapons were pros- encryption to protect some (presently Email: [email protected] thetic claws. As gatherers, baskets very small) portions of its customers’ were prosthetic arms. After the de- data so that Apple (and transitively, velopment of agriculture, horses and the FBI) cannot read it. The FBI wants plows were huge prosthetic muscles. to be able to turn over literally every Later the industrial revolution made digital stone in its investigation. But us physically strong to a level unimagi- in the era of prosthetic intelligence, nable beforehand. And looking back, that is equivalent to outlawing strong the invention of writing was the first privacy for any corners of the modern step on the road to a modern existence human mind.

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 39 viewpoints

Where is this heading? Consider a future technological innovation—a brain reader. It is a little device that you attach to your skull that lets someone read your thoughts. This could be a great boon to law enforcement. Trials could be conducted more accurately by reading the thoughts of the defendant. Even better, everyone could be required to attend a daily mind reading to make sure they are not plotting any criminal acts. This would significantly cut down on premeditated crime, making our lives safer. Then we can concentrate on unpremeditated crime. Possibly there are some thoughts that people who are likely to commit unpremeditated crimes might think. We can proscribe those thoughts, and then preemptively arrest people for thought crime. While we are at it, the morality police can put in laws against thinking racist, sexist, extremist, sacrilegious, offensive, or fattening thoughts.

While such an extreme society may have a low crime rate, some people (including us) may think this police state would not actually be a better society to live in. Even ignoring the horrors that would result from imperfect readings, who doesn’t feel guilty about something? As attributed to Cardinal Richelieu, “If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.”

Such devices do not exist yet, although the demand has been strong enough that polygraphs, notorious for unreliability, are widely used in the U.S. Other technologies like fMRI are already being used and may turn out to be slightly more accurate than polygraphs, but we are still some distance from having to worry about the societal effects of active mind-reading machines.

What we have instead is a society moving toward prosthetic brains that can be monitored at all times by the state, without the inconvenience of having to have everyone check in each day at the police station. It may feel less invasive to have one’s eye movements recorded by one’s augmented reality glasses when an attractive member of the opposite sex walks past than to have a daily visit to the mind reader. The former is certainly more convenient than the latter. But practically speaking, the effects are the same.

The available information is not complete, and there will be gaps. But you can infer an awful lot from limited data. Think about how well you know your friends, and how you can often predict what decisions they will make, with only the small view of their world that you get from your interactions with them. With access to a vast store of reference information, massive deductions can be made.

Conversely, the possibility of faulty deductions is itself a threat to individuals. You would not want to have performed Internet searches for pressure cookers and backpacks just before the Boston marathon bombings.

Dedicated, well-meaning people in law enforcement naturally want to be able to do their jobs better and make the world a safer, and thus better, place. They see the new data as a boon, and law enforcement agencies select extremely unphotogenic criminals and terrorists as the test cases that will set the rules for millions of other people. Unfortunately, while this surveillance apparatus may occasionally be useful, it also poses a structural threat to democracy.

Even beyond the threat of police states in the Western world and elsewhere, there is a fundamental issue with cryptography: mathematics works the same regardless of whether you are naughty or nice. So if the state can break cryptography then so can other actors. There are obvious direct applications to crime—knowing when someone is away from home; knowing who is worth kidnapping and what their movements are; identity theft, bank fraud, and so forth. But ineffective cryptography also strengthens the black market for industrial espionage—many people would pay to know the thoughts of their competitors, people they are negotiating with, or even people they are considering going on a date with.

Of course the state is not the only institution that wants to read your mind. There is great value to corporations in knowing about you. They collect this data from phone apps and operating systems, credit cards, and web browsers; they use it to help design their products, but also for targeted advertising, differential pricing, and other debatable purposes. People joke, semi-seriously, that Google knows you better than you know yourself. As well as being a threat in their own right, corporations provide an additional target of attack for an intrusive state: as Snowden’s leaks revealed, the NSA didn’t try to track the location of every cellphone on the planet directly: they let advertisements and tracking code in apps collect the data for them.

Ultimately, the question of what to do about the data accumulated by technology companies is different from the question of what to do about the FBI, but it should also be understood that we have largely given these companies the power to read our minds, and might want to find alternatives to that arrangement.

We fear we are slowly moving toward the era of universal mind monitoring without having recognized and considered it in those terms. And those are the terms in which we should understand battles about the right to use effective cryptography. That wonderful gadget in your pocket is not a phone. It is a prosthetic part of your mind—which happens to also be able to make telephone calls. We need to think of it as such, and ask again which parts of our thoughts should be categorically shielded against prying by the state.

Andrew Conway ([email protected]) is an engineer and mostly retired entrepreneur. He founded and ran Silicon Genetics.

Peter Eckersley ([email protected]) is Chief Computer Scientist for the Electronic Frontier Foundation, San Francisco, CA.

Copyright held by authors.

practice

DOI:10.1145/3080202

Article development led by queue.acm.org

The Calculus of Service Availability

You’re only as available as the sum of your dependencies.

BY BEN TREYNOR, MIKE DAHLIN, VIVEK RAU, AND BETSY BEYER

AS DETAILED IN Site Reliability Engineering: How Google Runs Production Systems¹ (hereafter referred to as the SRE book), Google products and services seek high-velocity feature development while maintaining aggressive service-level objectives (SLOs) for availability and responsiveness. An SLO says that the service should almost always be up, and the service should almost always be fast; SLOs also provide precise numbers to define what “almost always” means for a particular service. SLOs are based on the following observation:

The vast majority of software services and systems should aim for almost-perfect reliability rather than perfect reliability—that is, 99.999% or 99.99% rather than 100%—because users cannot tell the difference between a service being 100% available and less than “perfectly” available. There are many other systems in the path between user and service (laptop, home WiFi, ISP, the power grid ...), and those systems collectively are far less than 100% available. Thus, the marginal difference between 99.99% and 100% gets lost in the noise of other unavailability, and the user receives no benefit from the enormous effort required to add that last fractional percent of availability. Notable exceptions to this rule include antilock brake control systems and pacemakers!

For a detailed discussion of how SLOs relate to SLIs (service-level indicators) and SLAs (service-level agreements), see the “Service Level Objectives” chapter in the SRE book. That chapter also details how to choose metrics that are meaningful for a particular service or system, which in turn drives the choice of an appropriate SLO for that service.

This article expands upon the topic of SLOs to focus on service dependencies. Specifically, we look at how the availability of critical dependencies informs the availability of a service, and how to design in order to mitigate and minimize critical dependencies.

Most services offered by Google aim to offer 99.99% (sometimes referred to as the “four 9s”) availability to users. Some services contractually commit to a lower figure externally but set a 99.99% target internally. This more stringent target accounts for situations in which users become unhappy with service performance well before a contract violation occurs, as the number one aim of an SRE team is to keep users happy. For many services, a 99.99% internal target represents the sweet spot that balances cost, complexity, and availability. For some services, notably global cloud services, the internal target is 99.999%.

99.99% Availability: Observations And Implications

Let’s examine a few key observations about and implications of designing and operating a 99.99% service and then move to a practical application.

Observation 1. Sources of outages. Outages originate from two main sources: problems with the service itself and problems with the service’s critical dependencies. A critical dependency is one that, if it malfunctions, causes a corresponding malfunction in the service.

Observation 2. The mathematics of availability. Availability is a function of the frequency and the duration of outages. It is measured through:
• Outage frequency, or the inverse: MTTF (mean time to failure).
• Duration, using MTTR (mean time to repair). Duration is defined as it is experienced by users: lasting from the start of a malfunction until normal behavior resumes.

Thus, availability is mathematically defined as MTTF/(MTTF+MTTR), using appropriate units.

Implication 1. Rule of the extra 9. A service cannot be more available than the intersection of all its critical dependencies. If your service aims to offer 99.99% availability, then all of your critical dependencies must be significantly more than 99.99% available.

Internally at Google, we use the following rule of thumb: critical dependencies must offer one additional 9 relative to your service—in the example case, 99.999% availability—because any service will have several critical dependencies, as well as its own idiosyncratic problems. This is called the “rule of the extra 9.”

If you have a critical dependency that does not offer enough 9s (a relatively common challenge!), you must employ mitigation to increase the effective availability of your dependency (for example, via a capacity cache, failing open, graceful degradation in the face of errors, and so on).

Implication 2. The math vis-à-vis frequency, detection time, and recovery time. A service cannot be more available than its incident frequency multiplied by its detection and recovery time allows. For example, three complete outages per year that last 20 minutes each result in a total of 60 minutes of outages. Even if the service worked perfectly the rest of the year, 99.99% availability (no more than 53 minutes of downtime per year) would not be feasible.

This implication is just math, but it is often overlooked, and can be very inconvenient.

Corollary to implications 1 and 2. If your service is relied upon for an availability level you cannot deliver, you should make energetic efforts to correct the situation—either by increasing the availability level of your service or by adding mitigation as described earlier. Reducing expectations (that is, the published availability) is also an option, and often it is the correct choice: make it clear to the dependent service that it should either reengineer its system to compensate for your service’s availability or reduce its own target. If you do not correct or address the discrepancy, an outage will inevitably force the need to correct it.

Key Definitions

Some of the terms and concepts used throughout this article may not be familiar to readers who don’t specialize in operations.

Capacity cache: A cache that serves precomputed results for API calls or queries to a service, generating cost savings in terms of compute/IO resource needs by reducing the volume of client traffic hitting the underlying service. Unlike the more typical performance/latency cache, a capacity cache is considered critical to service operation. A drop in the cache hit rate or cache ratio below the SLO is considered a capacity loss. Some capacity caches may even sacrifice performance (for example, redirecting to remote sites) or freshness (for example, CDNs) in order to meet hit rate SLOs.

Customer isolation: Isolating customers from each other may be advantageous so that the behavior of one customer doesn’t impact other customers. For example, you might isolate customers from one another based on their global traffic. When a given customer sends a surge of traffic beyond what they’re provisioned for, you can start throttling or rejecting this excess traffic without impacting traffic from other customers.

Failing safe/failing open/failing closed: Strategies for gracefully tolerating the failure of a dependency. The “safe” strategy depends on context: failing open may be the safe strategy in some scenarios, while failing closed may be the safe strategy in others.

Failing open: When the trigger normally required to authorize an action fails, failing open means to let some action happen, rather than making a decision. For example, a building exit door that normally requires badge verification “fails open” to let you exit without verification during a power failure.

Failing closed is the opposite of failing open. For example, a bank vault door denies all attempts to unlock it if its badge reader cannot contact the access-control database.

Failing safe means whatever behavior is required to prevent the system from falling into an unsafe mode when expected functionality suddenly doesn’t work. For example, a given system might be able to fail open for a while by serving cached data, but then fail closed when that data becomes stale (perhaps because past a certain point, the data is no longer useful).

Failover: A strategy that handles failure of a system component or service instance by automatically routing incoming requests to a different instance. For example, you might route database queries to a replica database, or route service requests to a replicated server pool in another datacenter.

Fallback: A mechanism that allows a tool or system to use an alternative source for serving results when a given component is unavailable. For example, a system might fall back to using an in-memory cache of previous results. While the results may be slightly stale, this behavior is better than outright failure. This type of fallback is an example of graceful degradation.

Geographic isolation: You can build additional reliability into your service by isolating particular geographic zones to have no dependencies on each other. For example, if you separate North America and Australia into separate serving zones, an outage that occurs in Australia because of a traffic overload won’t also take out your service in North America. Note that geographic isolation does come at increased cost: isolating these geographic zones also means that Australia cannot borrow spare capacity in North America.

Graceful degradation: A service should be “elastic” and not fail catastrophically under overload conditions and spikes—that is, you should make your applications do something reasonable even if not all is right. It is better to give users limited functionality than an error page.

Integration testing: The phase in software testing in which individual software modules are combined and tested as a group to verify that they function correctly together. These “parts” may be code modules, individual applications, client and server applications on a network, among others. Integration testing is usually performed after unit testing and before final validation testing.

Operational readiness practice: Exercises designed to ensure the team supporting a service knows how to respond effectively when an issue arises, and that the service is resilient to disruption. For example, Google performs disaster-recovery test drills continuously to make sure that its services deliver continuous uptime even if a large-scale disaster occurs.

Rollout policy: A set of principles applied during a service rollout (a deployment of any sort of software component or configuration) to reduce the scope of an outage in the early stages of the rollout. For example, a rollout policy might specify that rollouts occur progressively, on a 5%/20%/100% timeline, so that a rollout proceeds to a larger portion of customers only when it passes the first milestone without problems. Most problems will manifest when the service is exposed to a small number of customers, allowing you to minimize the scope of the damage. Note that for a rollout policy to be effective in minimizing damage, you must have a mechanism in place for rapid rollback.

Rollback: This is the ability to revert a set of changes that have been previously rolled out (fully or not) to a given service or system. For example, you can revert configuration changes or run a previous version of a binary that’s known to be good.

Sharding: Splitting a data structure or service into shards is a management strategy based on the principle that systems built for a single machine’s worth of resources don’t scale. Therefore, you can distribute resources such as CPU, memory, disk, file handles, and so on across multiple machines to create smaller, faster, more easily managed parts of a larger whole.

Tail latency: When setting a target for the latency (response time) of a service, it is tempting to measure the average latency. The problem with this approach is that an average that looks acceptable can hide a “long tail” of very large outliers, where some users may experience terrible response times. Therefore, the SRE best practice is to measure and set targets for 95th- and/or 99th-percentile latency, with the goal of reducing this tail latency, not just average latency.

Practical Application

Let’s consider an example service with a target availability of 99.99% and work through the requirements for both its dependencies and its outage responses.

The numbers. Suppose your 99.99% available service has the following characteristics:
• One major outage and three minor outages of its own per year. Note that these numbers sound high, but a 99.99% availability target implies a 20- to 30-minute widespread outage and several short partial outages per year. (The math makes two assumptions: that a failure of a single shard is not considered a failure of the entire system from an SLO perspective, and that the overall availability is computed with a weighted sum of regional/shard availability.)
• Five critical dependencies on other, independent 99.999% services.
• Five independent shards, which cannot fail over to one another.
• All changes are rolled out progressively, one shard at a time.

The availability math plays out as follows.

Dependency requirements.
• The total budget for outages for the year is 0.01% of 525,600 minutes/year, or 53 minutes (based on a 365-day year, which is the worst-case scenario).
• The budget allocated to outages of critical dependencies is five independent critical dependencies, with a budget of 0.001% each = 0.005%; 0.005% of 525,600 minutes/year, or 26 minutes.
• The remaining budget for outages caused by your service, accounting for outages of critical dependencies, is 53 - 26 = 27 minutes.

Outage response requirements.
• Expected number of outages: 4 (1 full outage, 3 outages affecting a single shard only)
• Aggregate impact of expected outages: (1 × 100%) + (3 × 20%) = 1.6
• Time available to detect and recover from an outage: 27/1.6 = 17 minutes
• Monitoring time allotted to detect and alert for an outage: 2 minutes
• Time allotted for an on-call responder to start investigating an alert: five minutes. (On-call means that a technical person is carrying a pager that receives an alert when the service is having an outage, based on a monitoring system that tracks and reports SLO violations. Many Google services are supported by an SRE on-call rotation that fields urgent issues.)
• Remaining time for an effective mitigation: 10 minutes

Implication. Levers to make a service more available. It’s worth looking closely at the numbers just presented because they highlight a fundamental point: there are three main levers to make a service more reliable.
• Reduce the frequency of outages—via rollout policy, testing, design reviews, and other tactics.
• Reduce the scope of the average outage—via sharding, geographic isolation, graceful degradation, or customer isolation.
• Reduce the time to recover—via monitoring, one-button safe actions (for example, rollback or adding emergency capacity), operational readiness practice, and so on.

You can trade among these three levers to make implementation easier. For example, if a 17-minute MTTR is difficult to achieve, instead focus your efforts on reducing the scope of the average outage. Strategies for minimizing and mitigating critical dependencies are discussed in more depth later in this article.

Clarifying the “Rule of the Extra 9” for Nested Dependencies

A casual reader might infer that each additional link in a dependency chain calls for an additional 9, such that second-order dependencies need two extra 9s, third-order dependencies need three extra 9s, and so on.

This inference is incorrect. It is based on a naive model of a dependency hierarchy as a tree with constant fan-out at each level. In such a model, as shown in Figure 1, there are 10 unique first-order dependencies, 100 unique second-order dependencies, 1,000 unique third-order dependencies, and so on, leading to a total of 1,111 unique services even if the architecture is limited to four layers. A highly available service ecosystem with that many independent critical dependencies is clearly unrealistic.

Figure 1. Dependency hierarchy: Incorrect model. [Diagram: an example service fanning out to first-order and second-order dependencies.]

A critical dependency can by itself cause a failure of the entire service (or service shard) no matter where it appears in the dependency tree. Therefore, if a given component X appears as a dependency of several first-order dependencies of a service, X should be counted only once because its failure will ultimately cause the service to fail no matter how many intervening services are also affected.

The correct rule is as follows:
• If a service has N unique critical dependencies, then each one contributes 1/N to the dependency-induced unavailability of the top-level service, regardless of its depth in the dependency hierarchy.
• Each dependency should be counted only once, even if it appears multiple times in the dependency hierarchy (in other words, count only unique dependencies). For example, when counting dependencies of Service A in Figure 2, count Service B only once toward the total N.

For example, consider a hypothetical Service A, which has an error budget of 0.01%. The service owners are willing to spend half that budget on their own bugs and losses, and half on critical dependencies. If the service has N such dependencies, each dependency receives 1/Nth of the remaining error budget. Typical services often have about five to 10 critical dependencies, and therefore each one can fail only one-tenth or one-twentieth as much as Service A. Hence, as a rule of thumb, a service’s critical dependencies must have one extra 9 of availability.
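The arithmetic in Observation 2 and in the Practical Application example can be reproduced in a few lines. The Python sketch below is ours, not the authors’: it assumes a 365-day year measured in minutes, assumes each dependency consumes exactly its allotted budget, and plugs in the example’s numbers (five 99.999% dependencies, one full outage plus three single-shard outages, each shard carrying 20% of traffic). The function names are hypothetical.

```python
"""Sketch of the availability arithmetic from the Practical Application example.

Assumptions (ours, not the article's): a 365-day year, all durations in
minutes, and dependencies that each use up exactly their allotted budget.
"""
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR); both arguments in the same unit."""
    return mttf / (mttf + mttr)


def downtime_budget(target: float, period: float = MINUTES_PER_YEAR) -> float:
    """Minutes of unavailability allowed by an availability target such as 0.9999."""
    return (1.0 - target) * period


def detect_and_recover_minutes(service_target: float,
                               dependency_targets: Sequence[float],
                               full_outages: int,
                               partial_outages: int,
                               partial_fraction: float) -> float:
    """Time available per outage after subtracting the dependency budgets.

    Each partial outage is weighted by the fraction of traffic it affects
    (one shard out of five = 0.2), giving the article's "aggregate impact".
    """
    total = downtime_budget(service_target)
    dependencies = sum(downtime_budget(t) for t in dependency_targets)
    remaining = total - dependencies
    aggregate_impact = full_outages * 1.0 + partial_outages * partial_fraction
    return remaining / aggregate_impact


if __name__ == "__main__":
    deps = [0.99999] * 5
    print(f"yearly budget: {downtime_budget(0.9999):.2f} min")  # 52.56
    print(f"dependency share: {sum(downtime_budget(t) for t in deps):.2f} min")  # 26.28
    window = detect_and_recover_minutes(0.9999, deps, 1, 3, 0.2)
    print(f"per-outage window: {window:.1f} min")  # 16.4
    # Implication 2: three 20-minute outages a year already exceed a 99.99% budget.
    mttf = (MINUTES_PER_YEAR - 3 * 20) / 3
    print(f"availability with 3 x 20 min outages: {availability(mttf, 20):.6f}")  # 0.999886
```

Carrying full precision gives roughly 16.4 minutes per outage rather than 17; the difference comes from rounding the 52.56- and 26.28-minute budgets to 53 and 26 before dividing by 1.6.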

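The “correct rule” for nested dependencies amounts to a graph traversal: collect every service reachable from the top-level service, count each one once no matter how deep it sits, and divide the dependency share of the error budget among them. The following is only an illustrative sketch. The graph is a hypothetical stand-in for Figure 2 (we assume service A depends on B and C, and C also depends on B, because the figure’s exact edges are not recoverable here), and all names are ours.

```python
"""Sketch: count unique critical dependencies and split the error budget.

The graph below is a hypothetical stand-in for Figure 2, not the article's data.
"""
from typing import Dict, List, Set

# Hypothetical edges: A depends on B and C; C also depends on B.
DEPENDENCIES: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "C": ["B"],
    "B": [],
}


def unique_dependencies(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Every service reachable from root, each counted once regardless of depth."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        service = stack.pop()
        if service in seen or service == root:
            continue  # B reached via both A and C is still counted only once
        seen.add(service)
        stack.extend(graph.get(service, []))
    return seen


def naive_tree_size(fanout: int, levels: int) -> int:
    """Services in the incorrect model of Figure 1: constant fan-out, root included."""
    return sum(fanout ** depth for depth in range(levels))


def per_dependency_budget(error_budget: float, dependency_share: float, n: int) -> float:
    """Each of N unique critical dependencies gets 1/N of the dependency share."""
    return error_budget * dependency_share / n


if __name__ == "__main__":
    deps = unique_dependencies(DEPENDENCIES, "A")
    print(sorted(deps), len(deps))  # ['B', 'C'] 2
    print(naive_tree_size(10, 4))  # 1111
    # Service A: 0.01% error budget, half of it reserved for critical dependencies.
    for n in (5, 10):
        print(n, per_dependency_budget(0.0001, 0.5, n))
```

With the Service A numbers, five unique dependencies each get 0.001% of unavailability and ten get 0.0005%, which are the one-tenth and one-twentieth shares described above. The naive fan-out model instead counts 1 + 10 + 100 + 1,000 = 1,111 services for four layers.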

Error Budgets

The concept of error budgets is covered quite thoroughly in the SRE book,¹ but bears mentioning here. Google SRE uses error budgets to balance reliability and the pace of innovation. This budget defines the acceptable level of failure for a service over some period of time (often a month). An error budget is simply 1 minus a service’s SLO, so the previously discussed 99.99% available service has a 0.01% “budget” for unavailability. As long as the service hasn’t spent its error budget for the month, the development team is free (within reason) to launch new features, updates, and so on.

If the error budget is spent, the service freezes changes (except for urgent security fixes and changes addressing what caused the violation in the first place) until either the service earns back room in the budget, or the month resets. Many services at Google use sliding windows for SLOs, so the error budget grows back gradually. For mature services with an SLO greater than 99.99%, a quarterly rather than monthly budget reset is appropriate, because the amount of allowable downtime is small.

Error budgets eliminate the structural tension that might otherwise develop between SRE and product development teams by giving them a common, data-driven mechanism for assessing launch risk. They also give both SRE and product development teams a common goal of developing practices and technology that allow faster innovation and more launches without “blowing the budget.”

Strategies for Minimizing and Mitigating Critical Dependencies

Thus far, this article has established what might be called the “Golden Rule of Component Reliability.” This simply means that any critical component must be 10 times as reliable as the overall system’s target, so that its contribution to system unreliability is noise. It follows that in an ideal world, the aim is to make as many components as possible noncritical. Doing so means the components can adhere to a lower reliability standard, gaining freedom to innovate and take risks.

The most basic and obvious strategy to reduce critical dependencies is to eliminate single points of failure (SPOFs) whenever possible. The larger system should be able to operate acceptably without any given component that’s not a critical dependency or SPOF.

In reality, you likely cannot get rid of all critical dependencies, but you can follow some best practices around system design to optimize reliability. While doing so isn’t always possible, it is easier and more effective to achieve system reliability if you plan for reliability during the design and planning phases, rather than after the system is live and impacting actual users.

Conduct architecture/design reviews. When you are contemplating a new system or service, or refactoring or improving an existing system or service, an architecture or design review can identify shared infrastructure and internal vs. external dependencies.

Shared infrastructure. If your service is using shared infrastructure—for example, an underlying database service used by multiple user-visible products—think about whether or not that infrastructure is being used correctly. Be explicit in identifying the owners of shared infrastructure as additional stakeholders. Also, beware of overloading your dependencies—coordinate launches carefully with the owners of these dependencies.

Internal vs. external dependencies. Sometimes a product or service depends on factors beyond company control—for example, code libraries, or services or data provided by third parties. Identifying these factors allows you to mitigate the unpredictability they entail.

Engage in thoughtful system planning and design. Design your system with the following principles in mind.

Redundancy and isolation. You can seek to mitigate your reliance upon a critical dependency by designing that dependency to have multiple independent instances. For example, if storing data in one instance provides 99.9% availability for that data, then storing three copies in three widely distributed instances provides a theoretical availability level of 1 − 0.001³, or nine 9s, if instance failures are independent with zero correlation.

In the real world, the correlation is never zero (consider network backbone failures that affect many cells concurrently), so the actual availability will be nowhere close to nine 9s but is much higher than three 9s.

Also note that if a system or service is “widely distributed,” geographic separation is not always a good proxy for uncorrelated failures. You may be better off using more than one system in nearby locations than the same system in distant locations.

Figure 2. Multiple dependencies in the dependency hierarchy. [Diagram: service A, service B, and service C, with service B appearing twice in the hierarchy.]

Similarly, sending an RPC (remote procedure call) to one pool of servers in one cluster may provide 99.9% availability for results, but sending three concurrent RPCs to three different server pools and accepting the first response that arrives helps increase availability to well over three 9s (noted earlier). This strategy can also reduce tail latency if the server pools are approximately equidistant from the RPC sender. (Since there is a high cost to sending three RPCs concurrently, Google often stages the timing of these calls strategically: most of our systems wait a fraction of the allotted time before sending the second RPC,

and a bit more time before sending the third RPC.)

Failover and fallback. Pursue software rollouts and migrations that fail safe and are automatically isolated should a problem arise. The basic principle at work here is that by the time you bring a human online to trigger a failover, you have likely already exceeded your error budget.

Where concurrency/voting is not possible, automate failover and fallback. Again, if the issue needs a human to check what the problem is, the chances of meeting your SLO are slim.

Asynchronicity. Design dependencies to be asynchronous rather than synchronous where possible so that they don’t accidentally become critical. If a service waits for an RPC response from one of its noncritical dependencies and this dependency has a spike in latency, the spike will unnecessarily hurt the latency of the parent service. By making the RPC call to a noncritical dependency asynchronous, you can decouple the latency of the parent service from the latency of the dependency. While asynchronicity may complicate code and infrastructure, this trade-off will be worthwhile.

Capacity planning. Make sure that every dependency is correctly provisioned. When in doubt, overprovision if the cost is acceptable.

Configuration. When possible, standardize configuration of your dependencies to limit inconsistencies among subsystems and avoid one-off failure/error modes.

Detection and troubleshooting. Make detecting, troubleshooting, and diagnosing issues as simple as possible. Effective monitoring is a crucial component of being able to detect issues in a timely fashion. Diagnosing a system with deeply nested dependencies is difficult. Always have an answer for mitigating failures that doesn’t require an operator to investigate deeply.

Fast and reliable rollback. Introducing humans into a mitigation plan substantially increases the risk of missing a tight SLO. Build systems that are easy, fast, and reliable to roll back. As your system matures and you gain confidence in your monitoring to detect problems, you can lower MTTR by engineering the system to automatically trigger safe rollbacks.

Systematically examine all possible failure modes. Examine each component and dependency and identify the impact of its failure. Ask yourself the following questions:
• Can the service continue serving in degraded mode if one of its dependencies fails? In other words, design for graceful degradation.
• How do you deal with unavailability of a dependency in different scenarios? Upon startup of the service? During runtime?

Conduct thorough testing. Design and implement a robust testing environment that ensures each dependency has its own test coverage, with tests that specifically address use cases that other parts of the environment expect. Here are a few recommended strategies for such testing:
• Use integration testing to perform fault injection: verify that your system can survive failure of any of its dependencies.
• Conduct disaster testing to identify weaknesses or hidden/unexpected dependencies. Document follow-up actions to rectify the flaws you uncover.
• Don’t just load test. Deliberately overload your system to see how it degrades. One way or another, your system’s response to overload will be tested; better to perform these tests yourself than to leave load testing to your users.

Plan for the future. Expect changes that come with scale: a service that begins as a relatively simple binary on a single machine may grow to have many obvious and nonobvious dependencies when deployed at a larger scale. Every order of magnitude in scale will reveal new bottlenecks, not just for your service but for your dependencies as well. Consider what happens if your dependencies cannot scale as fast as you need them to.

Also be aware that system dependencies evolve over time and that your list of dependencies may very well grow over time. When it comes to infrastructure, Google’s typical design guideline is to build a system that will scale to 10 times the initial target load without significant design changes.

Conclusion

While readers are likely familiar with some or many of the concepts this article has covered, assembling this information and putting it into concrete terms may make the concepts easier to understand and teach. Its recommendations are uncomfortable but not unattainable. A number of Google services have consistently delivered better than four 9s of availability, not by superhuman effort or intelligence, but by thorough application of principles and best practices collected and refined over the years (see SRE’s Appendix B: A Collection of Best Practices for Production Services).

Acknowledgments

Thank you to Ben Lutch, Dave Rensin, Miki Habryn, Randall Bosetti, and Patrick Bernier for their input.

Related articles on queue.acm.org

There’s Just No Getting Around It: You’re Building a Distributed System
Mark Cavage
http://queue.acm.org/detail.cfm?id=2482856

Eventual Consistency Today: Limitations, Extensions, and Beyond
Peter Bailis and Ali Ghodsi
http://queue.acm.org/detail.cfm?id=2462076

A Conversation with Wayne Rosing
David J. Brown
http://queue.acm.org/detail.cfm?id=945162

Reference

1. Beyer, B., Jones, C., Petoff, J., Murphy, N.R. Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media, 2016; https://landing.google.com/sre/book.html.

Ben Treynor started programming at age six and joined Oracle as a software engineer at age 17. He has also worked in engineering management at E.piphany, SEVEN, and Google (2003-present). His current team of approximately 4,200 at Google is responsible for Site Reliability Engineering, networking, and datacenters worldwide.

Mike Dahlin is a distinguished engineer at Google, where he has worked on Google’s Cloud Platform since 2013. Prior to joining Google, he was a professor of computer science at the University of Texas at Austin.

Vivek Rau is an SRE manager at Google and a founding member of the Launch Coordination Engineering sub-team of SRE. Prior to joining Google, he worked at Citicorp Software, Versant, and E.piphany. He currently manages various SRE teams tasked with tracking and improving the reliability of Google’s Cloud Platform.

Betsy Beyer is a technical writer for Google, specializing in Site Reliability Engineering. She has previously written documentation for Google’s Data Center and Hardware Operations Teams. She was formerly a lecturer on technical writing at Stanford University.

Copyright held by owner/authors. Publication rights licensed to ACM. $15.00.
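This article's availability arithmetic is easy to check numerically. It covers three rules: an error budget is 1 minus the SLO, half of that budget is split evenly across N critical dependencies, and independent replicas multiply their failure probabilities. The following is a minimal Python sketch under stated assumptions. The helper names are hypothetical. The 50/50 split follows the article's worked example rather than a fixed rule. `replicated_availability` assumes the zero-correlation failures that, as the article notes, real systems never achieve.

```python
import math


def error_budget(slo: float) -> float:
    """An error budget is simply 1 minus the service's SLO (0.9999 -> about 0.0001)."""
    return 1.0 - slo


def per_dependency_budget(slo: float, n_deps: int, dep_share: float = 0.5) -> float:
    """Unavailability allowed for each of n_deps critical dependencies.

    Assumes, as in the article's example, that `dep_share` of the budget
    (half by default) goes to critical dependencies, split evenly among them.
    """
    return error_budget(slo) * dep_share / n_deps


def replicated_availability(a: float, copies: int) -> float:
    """Availability of `copies` replicas that fail independently: 1 - (1 - a)**copies.

    Real failures are correlated, so treat this as an optimistic estimate.
    """
    return 1.0 - (1.0 - a) ** copies


def nines(availability: float) -> float:
    """Count of 9s in an availability figure, e.g. 0.999 -> about 3."""
    return -math.log10(1.0 - availability)


if __name__ == "__main__":
    slo = 0.9999  # "four 9s"
    print(f"error budget: {error_budget(slo):.4%}")
    for n in (5, 10):
        share = per_dependency_budget(slo, n)
        print(f"{n} critical deps: each needs about {nines(1.0 - share):.1f} nines")
    print(f"3 independent replicas at 99.9%: {nines(replicated_availability(0.999, 3)):.1f} nines")
```

For a 99.99% service, five critical dependencies leave each one needing about five 9s, and 10 dependencies push that to about 5.3. This is the "one extra 9" rule of thumb. Three independent 99.9% replicas come out at the theoretical nine 9s quoted in the text.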

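Under "Redundancy and isolation," the article describes staged hedging. The RPC goes to one server pool, the system waits a fraction of the allotted time before sending to a second pool and a bit longer before a third, and the first response that arrives wins. This can be sketched with Python's asyncio. It is an illustration, not Google's implementation. The pool names, latencies, and start offsets are invented, and `hedged_call` and `fake_backend` are hypothetical helpers.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Backend = Callable[[], Awaitable[str]]  # hypothetical: one RPC to one server pool


async def _send_after(delay_s: float, backend: Backend) -> str:
    # Wait before sending. If an earlier reply wins, this task is cancelled
    # while still sleeping, so the extra request is never sent.
    await asyncio.sleep(delay_s)
    return await backend()


async def hedged_call(backends: Sequence[Backend], start_offsets_s: Sequence[float]) -> str:
    """Send one request to several pools at staggered times; return the first
    successful reply and cancel the remaining attempts."""
    tasks = [asyncio.create_task(_send_after(d, b))
             for d, b in zip(start_offsets_s, backends)]
    last_error: Exception = RuntimeError("no backends given")
    try:
        for next_reply in asyncio.as_completed(tasks):
            try:
                return await next_reply
            except Exception as exc:  # this pool failed; keep waiting for the others
                last_error = exc
        raise last_error
    finally:
        for t in tasks:
            t.cancel()
        # Let cancellations finish and retrieve any stray exceptions.
        await asyncio.gather(*tasks, return_exceptions=True)


def fake_backend(name: str, latency_s: float, fail: bool = False) -> Backend:
    """Simulated server pool with a fixed latency (for the demo only)."""
    async def call() -> str:
        await asyncio.sleep(latency_s)
        if fail:
            raise ConnectionError(f"{name} unavailable")
        return name
    return call


async def demo() -> str:
    pools = [fake_backend("pool-a", 0.30),  # slow primary
             fake_backend("pool-b", 0.05),
             fake_backend("pool-c", 0.05)]
    # Second RPC after 100 ms, third after 200 ms (invented offsets).
    return await hedged_call(pools, [0.0, 0.10, 0.20])


if __name__ == "__main__":
    print(asyncio.run(demo()))  # pool-b replies at about 150 ms; pool-c is never sent
```

Each later attempt sleeps before sending, and it is cancelled once an earlier reply arrives. The extra RPC is therefore never issued, which keeps the cost of hedging low when the first pool answers promptly. One simplification: a pool that fails quickly does not pull the next attempt forward. The next pool still waits for its scheduled offset.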

DOI:10.1145/3080008

Article development led by queue.acm.org

Data Sketching

The approximate approach is often faster and more efficient.

BY GRAHAM CORMODE

DO YOU EVER feel overwhelmed by an unending stream of information? It can seem like a barrage of new email and text messages demands constant attention, and there are also phone calls to pick up, articles to read, and knocks on the door to answer. Putting these pieces together to keep track of what is important can be a real challenge.

The same information overload is a concern in many computational settings. Telecommunications companies, for example, want to keep track of the activity on their networks, to identify overall network health and spot anomalies or changes in behavior. Yet, the scale of events occurring is huge: many millions of network events per hour, per network element. While new technologies allow the scale and granularity of events being monitored to increase by orders of magnitude, the capacity of computing elements (processors, memory, and disks) to make sense of these is barely increasing. Even on a small scale, the amount of information may be too large to store in an impoverished setting (say, an embedded device) or to keep conveniently in fast storage.

In response to this challenge, the model of streaming data processing has grown in popularity. The aim is no longer to capture, store, and index every minute event, but rather to process each observation quickly in order to create a summary of the current state. Following its processing, an event is dropped and is no longer accessible. The summary that is retained is often referred to as a sketch of the data.

Coping with the vast scale of information means making compromises: The description of the world is approximate rather than exact; the nature of queries to be answered must be decided in advance rather than after the fact; and some questions are now insoluble. The ability to process vast quantities of data at blinding speeds with modest resources, however, can more than make up for these limitations.

As a consequence, streaming methods have been adopted in a number of domains, starting with telecommunications but spreading to search engines, social networks, finance, and time-series analysis. These ideas are also finding application in areas using traditional approaches, but where the rough-and-ready sketching approach is more cost effective. Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

This article introduces the ideas behind sketching, with a focus on algorithmic innovations. It describes some algorithmic developments in the abstract, followed by the steps needed to put them into practice, with examples. The article also looks at four novel algorithmic ideas and discusses some emerging areas.

Simply Sampling

When faced with a large amount of information to process, there may be
48 COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9 a strong temptation just to ignore it With standard statistical results, for cords is not guaranteed to be random; entirely. A slightly more principled ap- questions like those in the customer there may be clustering through the proach is just to ignore most of it—that records example, the standard error data. You need to ensure every record is, take a small number of examples of a sample of size s is proportional to has an equal chance of being included from the full dataset, perform the com- 1/√s. Roughly speaking, this means in the sample. This can be achieved putation on this subset, and then try to that in estimating a proportion from by using standard random-number extrapolate to the full dataset. To give the sample, the error would be expect- generators to pick which records to in- a good estimation, the examples must ed to look like ±1/√s. Therefore, look- clude in the sample. A common trick be randomly chosen. This is the realm ing at the voting intention of a subset is to attach a random number to each of sampling. of 1,000 voters produces an opinion record, then sort the data based on this There are many variations of sam- poll whose error is approximately 3%— random tag and take the first s records pling, but this article uses the most providing high confidence (but not cer- in the sorted order. This works fine, as basic: uniform random sampling. Con- tainty) that the true answer is within long as sorting the full dataset is not sider a large collection of customer 3% of the result on the sample, assum- too costly. records. Randomly selecting a small ing the sample was drawn randomly Finally, how do you maintain the number of records provides the sam- and the participants responded hon- sample as new items are arriving? A ple. Then various questions can be an- estly. 
Increasing the size of the sample simple approach is to pick every record swered accurately by looking only at the causes the error to decrease in a pre- with probability p, for some chosen sample: for example, estimating what dictable, albeit expensive, way: reduc- value of p. When a new record comes, fraction of customers live in a certain ing the margin of error of an opinion pick a random fraction between 0 and city or have bought a certain product. poll to 0.3% would require contacting 1, and if it is smaller than p, put the re- The method. To flesh this out, let’s 100,000 voters. cord in the sample. The problem with fill in a few gaps. First, how big should Second, how should the sample be this approach is that you do not know

PHOTO BY TAFFPIXTURE BY PHOTO the sample be to supply good answers? drawn? Simply taking the first s re- in advance what p should be. In the

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 49 practice

previous analysis a fixed sample size s was desired, and using a fixed sampling rate p means there are too few elements initially, but then too many as more records arrive.

Presented this way, the question has the appearance of an algorithmic puzzle, and indeed this was a common question in technical interviews for many years. One can come up with clever solutions that incrementally adjust p as new records arrive. A simple and elegant way to maintain a sample is to adapt the idea of random tags. Attach to each record a random tag, and define the sample to be the s records with the smallest tag values. As new records arrive, the tag values decide whether to add the new record to the sample (and to remove an old item to keep the sample size fixed at s).

Discussion and applications. Sampling methods are so ubiquitous that there are many examples to consider. One simple case is within database systems. It is common for the database management system to keep a sample of large relations for the purpose of query planning. When determining how to execute a query, evaluating different strategies provides an estimate of how much data reduction may occur at each step, with some uncertainty of course. Another example comes from the area of data integration and linkage, in which a subproblem is to test whether two columns from separate tables can relate to the same set of entities. Comparing the columns in full can be time consuming, especially when you want to test all pairs of columns for compatibility. Comparing a small sample is often sufficient to determine whether the columns have any chance of relating to the same entities.

Entire books have been written on the theory and practice of sampling, particularly around schemes that try to sample the more important elements preferentially, to reduce the error in estimating from the sample. For a good survey with a computational perspective, see Synopses for Massive Data: Samples, Histograms, Wavelets and Sketches.11

Given the simplicity and generality of sampling, why would any other method be needed to summarize data? It turns out that sampling is not well suited for some questions. Any question that requires detailed knowledge of individual records in the data cannot be answered by sampling. For example, if you want to know whether one specific individual is among your customers, then a sample will leave you uncertain. If the customer is not in the sample, you do not know whether this is because that person is not in the data or because he or she did not happen to be sampled. A question like this ultimately needs all the presence information to be recorded and is answered by highly compact encodings such as the Bloom filter (described later).

A more complex example is when the question involves determining the cardinality of quantities. In a dataset that has many different values, how many distinct values of a certain type are there? For example, how many distinct surnames are in a particular customer dataset? Using a sample does not reveal this information. Let’s say in a sample size of 1,000 out of one million records, 900 surnames occur just once among the sampled names. What can you conclude about the popularity of these names in the rest of the dataset? It might be that almost every other name in the full dataset is also unique. Or it might be that each of the unique names in the sample reoccurs tens or hundreds of times in the remainder of the data. With the sampled information there is no way to distinguish between these two cases, which leads to huge confidence intervals on these kinds of statistics. Tracking information about cardinalities, and omitting duplicates, is addressed by techniques such as HyperLogLog, addressed later.

Finally, there are quantities that samples can estimate, but for which better special-purpose sketches exist. Recall that the standard error of a sample of size s is 1/√s. For problems such as estimating the frequency of a particular attribute (such as city of residence), you can build a sketch of size s so the error it guarantees is proportional to 1/s. This is considerably stronger than the sampling guarantee and only improves as we devote more space s to the sketch. The Count-Min sketch described later in this article has this property. One limitation is that the attribute of interest must be specified in advance of setting up the sketch, while a sample allows you to evaluate a query for any recorded attribute of the sampled items.

Because of its flexibility, sampling is a powerful and natural way of building a sketch of a large dataset. There are many different approaches to sampling that aim to get the most out of the sample or to target different types of queries that the sample may be used to answer.11 Here, more information is presented about less flexible methods that address some of these limitations of sampling.

Summarizing Sets with Bloom Filters

The Bloom filter is a compact data structure that summarizes a set of items. Any computer science data-structures class is littered with examples of “dictionary” data structures, such as arrays, linked lists, hash tables, and many esoteric variants of balanced tree structures. The common feature of these structures is that they can all answer “membership questions” of the form: Is a certain item stored in the structure or not? The Bloom filter can also respond to such membership questions. The answers given by the structure, however, are either “the item has definitely not been stored” or “the item has probably been stored.” This introduction of uncertainty over the state of an item (it might be thought of as introducing potential false positives) allows the filter to use an amount of space that is much smaller than its exact relatives. The filter also does not allow listing the items that have been placed into it. Instead, you can pose membership questions only for specific items.

The method. To understand the filter, it is helpful to think of a simple exact solution to the membership problem. Suppose you want to keep track of which of a million possible items you have seen, and each one is helpfully labeled with its ID number (an integer between one and a million). Then you can keep an array of one million bits, initialized to all 0s. Every time you see an item i, you just set the ith bit in the array to 1. A lookup query for item j is correspondingly straightforward: just see whether bit j is a 1 or a 0. The structure is very compact: 125KB will suffice if you pack the bits into memory.

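The exact bit array described above needs one bit per possible ID: a million bits, or 125,000 bytes. The rest of this section turns it into a Bloom filter. Each item is hashed by k hash functions and all k bits are set; a lookup reports "probably stored" only if all k bits are 1. A minimal Python sketch of that end result follows. Deriving the k positions from one SHA-256 digest by double hashing is an assumption of this sketch, not something the article prescribes, and the class and method names are hypothetical. `false_positive_rate` and `optimal_k` evaluate the false-positive approximation and the k = (m/n) ln 2 choice derived later in the section.

```python
import hashlib
import math


class BloomFilter:
    """An m-bit array with k hash functions (a teaching sketch, not a tuned library)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # bits packed 8 per byte, all 0

    def _positions(self, item: str) -> list:
        # Assumption of this sketch: derive k indices from one SHA-256 digest
        # via double hashing, h_i(x) = h1(x) + i * h2(x) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # ensure a nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not stored"; True means "probably stored".
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def false_positive_rate(n: int, m: int, k: int) -> float:
    """The approximation exp(k ln(1 - exp(-kn/m))) for n items in m bits."""
    return math.exp(k * math.log(1.0 - math.exp(-k * n / m)))


def optimal_k(n: int, m: int) -> float:
    """k = (m/n) ln 2 minimizes the approximate false-positive rate."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    # Storage check: one million bits packs into 125,000 bytes.
    print(len(BloomFilter(1_000_000, 1).bits))
    n = 10_000
    bf = BloomFilter(m=10 * n, k=7)  # the common m = 10n, k = 7 setting
    for i in range(n):
        bf.add(f"https://example.com/page/{i}")
    assert all(bf.might_contain(f"https://example.com/page/{i}") for i in range(n))
    trials = 100_000
    hits = sum(bf.might_contain(f"https://example.net/other/{i}") for i in range(trials))
    print(f"observed {hits / trials:.4f}, predicted {false_positive_rate(n, 10 * n, 7):.4f}, "
          f"optimal k {optimal_k(n, 10 * n):.2f}")
```

With the common m = 10n, k = 7 setting, the formula predicts a false-positive rate of about 0.8%. This is in line with the "below 1%" figure in the text, and the observed rate in the demo should land close to it. Items that were added are always reported as present: a Bloom filter has false positives but no false negatives.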

Real data, however, is rarely this nicely structured. In general, you might have a much larger set of possible inputs: think again of the names of customers, where the number of possible name strings is huge. You can nevertheless adapt your bit-array approach by borrowing from a different dictionary structure. Imagine the bit array is a hash table: you will use a hash function h to map from the space of inputs onto the range of indices for your table. That is, given input i, you now set bit h(i) to 1. Of course, now you have to worry about hash collisions in which multiple entries might map onto the same bit. A traditional hash table can handle this, as you can keep information about the entries in the table. If you stick to your guns and keep the bits only in the bit array, however, false positives will result: if you look up item i, it may be that entry h(i) is set to 1, but i has not been seen; instead, there is some item j that was seen, where h(i) = h(j).

Can you fix this while sticking to a bit array? Not entirely, but you can make it less likely. Rather than just hashing each item i once, with a single hash function, use a collection of k hash functions h1, h2, ..., hk, and map i with each of them in turn. All the bits corresponding to h1(i), h2(i), ..., hk(i) are set to 1. Now to test membership of j, check all the entries it is hashed to, and say no if any of them are 0.

There’s clearly a trade-off here: Initially, adding extra hash functions reduces the chances of a false positive as more things need to “go wrong” for an incorrect answer to be given. As more and more hash functions are added, however, the bit array gets fuller and fuller of 1 values, and therefore collisions are more likely. This trade-off can be analyzed mathematically, and the sweet spot found that minimizes the chance of a false positive. The analysis works by assuming that the hash functions look completely random (which is a reasonable assumption in practice), and by looking at the chance that an arbitrary element not in the set is reported as present. If n distinct items are being stored in a Bloom filter of size m, and k hash functions are used, then the chance of a membership query that should receive a negative answer yielding a false positive is approximately $\exp\left(k \ln\left(1 - e^{-kn/m}\right)\right)$.4 While extensive study of this expression may not be rewarding in the short term, some simple analysis shows that this rate is minimized by picking k = (m/n) ln 2. This corresponds to the case when about half the bits in the filter are 1 and half are 0.

For this to work, the number of bits in the filter should be some multiple of the number of items that you expect to store in it. A common setting is m = 10n and k = 7, which means a false positive rate below 1%. Note that there is no magic here that can compress data beyond information-theoretical limits: under these parameters, the Bloom filter uses about 10 bits per item and must use space proportional to the number of different items stored. This is a modest savings when representing integer values but is a considerable benefit when the items stored have large descriptions (say, arbitrary strings such as URLs). Storing these in a traditional structure such as a hash table or balanced search tree would consume tens or hundreds of bytes per item.

A simple example is shown in Figure 1, where an item i is mapped by k = 3 hash functions to a filter of size m = 12, and these entries are set to 1.

Figure 1. Bloom filter with k = 3, m = 12.

Discussion and applications. The possibility of false positives needs to be handled carefully. Bloom filters are at their most attractive when the consequence of a false positive is not the introduction of an error in a computation, but rather when it causes some additional work that does not adversely impact the overall performance of the system. A good example comes in the context of browsing the Web. It is now common for Web browsers to warn users if they are attempting to visit a site that is known to host malware. Checking the URL against a database of “bad” URLs does this. The database is large enough, and URLs are long enough, that keeping the full database as part of the browser would be unwieldy, especially on mobile devices.

Instead, a Bloom filter encoding of the database can be included with the browser, and each URL visited can be checked against it. The consequence of a false positive is that the browser may believe that an innocent site is on the bad list. To handle this, the browser can contact the database authority and check whether the full URL is on the list. Hence, false positives are removed at the cost of a remote database lookup.

Notice the effect of the Bloom filter: it gives the all clear to most URLs and incurs a slight delay for a small fraction (or when a bad URL is visited). This is preferable both to the solution of keeping a copy of the database with the browser and to doing a remote lookup for every URL visited. Browsers such as Chrome and Firefox have adopted this concept. Current versions of Chrome use a variation of the Bloom filter based on more directly encoding a list of hashed URLs, since the local copy does not have to be updated dynamically and more space can be saved this way.

The Bloom filter was introduced in 1970 as a compact way of storing a dictionary, when space was really at a premium.3 As computer memory grew, it seemed that the filter was no longer needed. With the rapid growth of the Web, however, a host of applications for the filter have been devised since around the turn of the century.4 Many of these applications have the flavor of the preceding example: the filter gives a fast answer to lookup queries, and positive answers may be double-checked in an authoritative reference.

Bloom filters have been widely used to avoid storing unpopular items in caches. This enforces the rule that an item is added to the cache only if it has
been seen before. The Bloom filter is used to compactly represent the set of items that have been seen. The consequence of a false positive is that a small fraction of rare items might also be stored in the cache, contradicting the letter of the rule. Many large distributed databases (Google’s Bigtable, Apache’s Cassandra and HBase) use Bloom filters as indexes on distributed chunks of data. They use the filter to keep track of which rows or columns of the database are stored on disk, thus avoiding a (costly) disk access for nonexistent attributes.

Counting with Count-Min Sketch

Perhaps the canonical data summarization problem is the most trivial: to count the number of items of a certain type that have been observed. You do not need to retain each item to do so. Instead, a simple counter suffices, incremented with each observation. The counter has to be of sufficient bit depth in order to cope with the magnitude of events observed. When the number of events gets truly huge, ideas such as Robert Morris’s approximate counter can be used to provide such a counter in fewer bits12 (another example of a sketch).

When there are different types of items, and you want to count each type, the natural approach is to allocate a counter for each item. When the number of item types grows huge, however, you encounter difficulties. It may not be practical to allocate a counter for each item type. Even if it is, when the number of counters exceeds the capacity of fast memory, the time cost of incrementing the relevant counter may become too high. For example, a social network such as Twitter may wish to track how often a tweet is viewed when displayed via an external website. There are billions of Web pages, each of which could in principle link to one or more tweets, so allocating counters for each is infeasible and unnecessary. Instead, it is natural to look for a more compact way to encode counts of items, possibly with some tolerable loss of fidelity.

The Count-Min sketch is a data structure that allows this trade-off to be made. It encodes a potentially massive number of item types in a small array. The guarantee is that large counts will be preserved fairly accurately, while small counts may incur greater (relative) error. This means it is good for applications where you are interested in the head of a distribution and less so in its tail.

The method. At first glance, the sketch looks quite like a Bloom filter, as it involves the use of an array and a set of hash functions. There are significant differences in the details, however. The sketch is formed by an array of counters and a set of hash functions that map items into the array. More precisely, the array is treated as a sequence of rows, and each item is mapped by the first hash function into the first row, by the second hash function into the second row, and so on (note that this is in contrast to the Bloom filter, which allows the hash functions to map onto overlapping ranges). An item is processed by mapping it to each row in turn via the corresponding hash function and incrementing the counters to which it is mapped.

Given an item, the sketch allows its count to be estimated. This follows a similar outline to processing an update: inspect the counter in the first row where the item was mapped by the first hash function, and the counter in the second row where it was mapped by the second hash, and so on. Each row has a counter that has been incremented by every occurrence of the item. The counter was also potentially incremented by occurrences of other items that were mapped to the same location, however, since collisions are expected. Given the collection of counters containing the desired count, plus noise, the best guess at the true count of the desired item is to take the smallest of these counters as your estimate.

Figure 2 shows the update process: an item i is mapped to one entry in each row j by the hash function hj, and the update of c is added to each entry. It can also be seen as modeling the query process: a query for the same item i will result in the same set of locations being probed, and the smallest value returned as the answer.

Figure 2. Count-Min sketch data structure with four rows, nine columns.

Discussion and applications. As with the Bloom filter, the sketch achieves a compact representation of the input, with a trade-off in accuracy. Both provide some probability of an unsatisfactory answer. With a Bloom filter, the answers are binary, so there is some chance of a false positive response; with a Count-Min sketch, the answers are frequencies, so there is some chance of an inflated answer.

What may be surprising at first is that the obtained estimate is very good. Mathematically, it can be shown that there is a good chance that the returned estimate is close to the correct value. The quality of the estimate depends on the number of rows in the sketch (each additional row halves the probability of a bad estimate) and on the number of columns (doubling the number of columns halves the scale of the noise in the estimate). These guarantees follow from the random selection of hash functions and do not rely on any structure or pattern in the data distribution that is being summarized. For a sketch of size s, the error is proportional to 1/s. This is an improvement over the case for sampling where, as noted earlier, the corresponding behavior is proportional to 1/√s.

Just as Bloom filters are best suited for the cases where false positives can be tolerated and mitigated, Count-Min sketches are best suited for handling a slight inflation of frequency. This means, in particular, they do not apply to cases where a Bloom filter might be used: if it matters a lot whether an item has been seen or not, then the uncertainty that the Count-Min sketch
introduces will obscure this level of precision. The sketches are very good for tracking which items exceed a given popularity threshold, however. In particular, while the size of a Bloom filter must remain proportional to the size of the input it is representing, a Count-Min sketch can be much more compressive: its size can be considered to be independent of the input size, depending instead on the desired accuracy guarantee only (that is, to achieve a target accuracy of ε, fix a sketch size of s proportional to 1/ε that does not vary over the course of processing data).

The Twitter scenario mentioned previously is a good example. Tracking the number of views that a tweet receives across each occurrence in different websites creates a large enough volume of data to be difficult to manage. Moreover, the existence of some uncertainty in this application seems acceptable: the consequences of inflating the popularity of one website for one tweet are minimal. Using a sketch for each tweet consumes only moderately more space than the tweet and associated metadata, and allows tracking which venues attract the most attention for the tweet. Hence, a kilobyte or so of space is sufficient to track the percentage of views from different locations, with an error of less than one percentage point, say.

Since their introduction over a decade ago,7 Count-Min sketches have found applications in systems that track frequency statistics, such as the popularity of content within different groups (say, online videos among different sets of users) or which destinations are popular for nodes within a communications network. Sketches are used in telecommunications networks where the volume of data passing along links is immense and is never stored. Summarizing network traffic distribution allows hotspots to be detected, informing network-planning decisions and allowing configuration errors and floods to be detected and debugged.6 Since the sketch compactly encodes a frequency distribution, it can also be used to detect when a shift in popularities occurs, as a simple example of anomaly detection.

Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

Counting Distinct Items with HyperLogLog
Another basic problem is keeping track of how many different items have been seen out of a large set of possibilities. For example, a Web publisher might want to track how many different people have been exposed to a particular advertisement. In this case, you would not want to count the same viewer more than once. When the number of possible items is not too large, keeping a list, or a binary array, is a natural solution. As the number of possible items becomes very large, the space needed by these methods grows proportional to the number of items tracked. Switching to an approximate method such as a Bloom filter means the space remains proportional to the number of distinct items, although the constants are improved.

Could you hope to do better? If you just counted the total number of items, without removing duplicates, then a simple counter would suffice, using a number of bits that is proportional to the logarithm of the number of items encountered. If only there were a way to know which items were new, and count only those, then you could achieve this cost.

The HyperLogLog (HLL) algorithm promises something even stronger: the cost needs to depend only on the logarithm of the logarithm of the quantity computed. Of course, there are some scaling constants that mean the space needed is not quite so tiny as this might suggest, but the net result is that quantities can be estimated with high precision (say, up to a 1%–2% error) with a couple of kilobytes of space.

The method. The essence of this method is to use hash functions applied to item identifiers to determine how to update counters so that duplicate items are treated identically. A Bloom filter has a similar property: attempting to insert an item already represented within a Bloom filter means setting a number of bits to 1 that are already recording 1 values. One approach is to keep a Bloom filter and look at the final density of 1s and 0s to estimate the number of distinct items represented (taking into account collisions under hash functions). This still requires space proportional to the number of items and is the basis of early approaches to this problem.15

To break this linearity, a different approach to building a binary counter is needed. Instead of adding 1 to


the counter for each item, you could add 1 with a probability of one-half, 2 with a probability of one-fourth, 4 with a probability of 1/8th, and so on. This use of randomness decreases the reliability of the counter, but you can check that the expected count corresponds to the true number of items encountered. This makes more sense when using hash functions. Apply a hash function g to each item i, with the same distribution: g maps items to j with probability $2^{-j}$ (say, by taking the number of leading zero bits in the binary expansion of a uniform hash value). You can then keep a set of bits indicating which j values have been seen so far. This is the essence of the early Flajolet-Martin approach to tracking the number of distinct items.8 Here a logarithmic number of bits is needed, as there are only this many distinct j values expected.

The HLL method reduces the number of bits further by retaining only the highest j value that has been seen when applying the hash function. This might be expected to be correlated to the cardinality, although with high variation: for example, there might be only a single item seen, which happens to hash to a large value. To reduce this variation, the items are partitioned into groups using a second hash function (so the same item is always placed in the same group), and information about the largest hash in each group is retained. Each group yields an estimate of the local cardinality; these are all combined to obtain an estimate of the total cardinality. A first effort would be to take the mean of the estimates, but this still allows one large estimate to skew the result; instead, the harmonic mean is used to reduce this effect. By hashing to s separate groups, the standard error is proportional to 1/√s.

A small example is shown in Figure 3. The figure shows a small example HLL sketch with s = 3 groups. Consider five distinct items a, b, c, d, e with their related hash values. From this, the following array is obtained: 3, 2, 1 (the largest number of leading zeros of g(x) among the items in groups 1, 2, and 3, respectively).

Figure 3. Example of HyperLogLog in action.

x      a     b     c     d     e
h(x)   1     2     3     1     3
g(x)   0001  0011  1010  1101  0101

Resulting array: 3 2 1

The estimate is obtained by taking 2 to the power of each of the array entries and computing the sum of the reciprocals of these values, obtaining $2^{-3} + 2^{-2} + 2^{-1} = 1/8 + 1/4 + 1/2 = 7/8$ in this case. The final estimate is made by multiplying $\alpha_s s^2$ by the reciprocal of this sum. Here, $\alpha_s$ is a scaling constant that depends on s. With $\alpha_3 = 0.5305$, the estimate is $0.5305 \times 3^2 / (7/8) \approx 5.46$, close to the true value of 5.

The analysis of the algorithm is rather technical, but the proof is in the deployment: the algorithm has been widely adopted and applied in practice.

Discussion and applications. One example of HLL's use is in tracking the viewership of online advertising. Across many websites and different advertisements, trillions of view events may occur every day. Advertisers are interested in the number of “uniques”: how many different people (or rather, browsing devices) have been exposed to the content. Collecting and marshaling this data is not infeasible, but rather unwieldy, especially if it is desired to do more advanced queries (say, to count how many uniques saw both of two particular advertisements). Use of HLL sketches allows this kind of query to be answered directly by combining the two sketches rather than trawling through the full data. Sketches have been put to use for this purpose, where the small amount of uncertainty from the use of randomness is comparable to other sources of error, such as dropped data or measurement failure.

Approximate distinct counting is also widely used behind the scenes in Web-scale systems. For example, Google's Sawzall system provides a variety of sketches, including count distinct, as primitives for log data analysis.13 Google engineers have described some of the implementation modifications made to ensure high accuracy of the HLL across the whole range of possible cardinalities.10

A last interesting application of distinct counting is in the context of social network analysis. In 2016, Facebook set out to test the “six degrees of separation” claim within its social network. The Facebook friendship graph is sufficiently large (more than a billion nodes and hundreds of billions of edges) that maintaining detailed information about the distribution of long-range connections for each user would be infeasible. Essentially, the problem is to count, for each user, how many friends they have at distance 1, 2, 3, and so on. This would be a simple graph exploration problem, except that some friends at distance 2 are reachable by multiple paths (via different mutual friends). Hence, distinct counting is used to generate accurate statistics on reachability without double counting and to provide accurate distance distributions (the estimated number of degrees of separation in the Facebook graph is 3.57).2

Advanced Sketching
Roughly speaking, the four examples of sketching described in this article cover most of the current practical applications of this model of data summarization. Yet, unsurprisingly, there is a large body of research into new applications and variations of these ideas. Just around the corner are a host of new techniques for data summarization that are on the cusp of practicality. This section mentions a few of the directions that seem most promising.

Sketching for dimensionality reduction. When dealing with large high-dimensional numerical data, it is common to seek to reduce the dimensionality while preserving fidelity of the data. Assume the hard work of data wrangling and modeling is done and the data can be modeled as a massive matrix, where each row is one example data point, and each column encodes an attribute of the data. A common technique is to apply PCA (principal components analysis) to extract a small number of “directions” from the data. Projecting each row of data along each of these directions yields a different representation of the data that captures most of the variation of the dataset.

One limitation of PCA is that finding the directions entails a substantial amount of work. It requires finding eigenvectors of the covariance matrix,

which rapidly becomes unsustainable for large matrices. The competing approach of random projections argues that rather than finding “the best” directions, it suffices to use (a slightly larger number of) random vectors. Picking a moderate number of random directions captures a comparable amount of variation, while requiring much less computation.

The random projection of each row of the data matrix can be seen as an example of a sketch of the data. More directly, close connections exist between random projections and the sketches described earlier. The Count-Min sketch can be viewed as a random projection of sorts; moreover, the best constructions of random projections for dimensionality reduction look a lot like Count-Min sketches with some twists (such as randomly multiplying each column of the matrix by either -1 or 1). This is the basis of methods for speeding up high-dimensional machine learning, such as the Hash Kernels approach.14

Randomized numerical linear algebra. A grand objective for sketching is to allow arbitrary complex mathematical operations over large volumes of data to be answered approximately and quickly via sketches. While this objective appears quite a long way off, and perhaps infeasible because of some impossibility results, a number of core mathematical operations can be solved using sketching ideas, which leads to the notion of randomized numerical linear algebra. A simple example is matrix multiplication: given two large matrices A and B, you want to find their product AB. An approach using sketching is to build a dimensionality-reducing sketch of each row of A and each column of B. Combining each pair of these provides an estimate for each entry of the product. Similar to other examples, small answers are not well preserved, but large entries are accurately found.

Other problems that have been tackled in this space include regression. Here the input is a high-dimensional dataset modeled as matrix A and column vector b: each row of A is a data point, with the corresponding entry of b the value associated with the row. The goal is to find regression coefficients x that minimize $\|Ax - b\|_2$. An exact solution to this problem is possible but costly in terms of time as a function of the number of rows. Instead, applying sketching to matrix A solves the problem in the lower-dimensional sketch space.5 David Woodruff provides a comprehensive mathematical survey of the state of the art in this area.16

Rich data: Graphs and geometry. The applications of sketching so far can be seen as summarizing data that might be thought of as a high-dimensional vector, or matrix. These mathematical abstractions capture a large number of situations, but, increasingly, a richer model of data is desired, for example to model links in a social network (best thought of as a graph) or to measure movement patterns of mobile users (best thought of as points in the plane or in 3D). Sketching ideas have been applied here also.

For graphs, there are techniques to summarize the adjacency information of each node, so that connectivity and spanning tree information can be extracted.1 These methods provide a surprising mathematical insight that much edge data can be compressed while preserving fundamental information about the graph structure. These techniques have not found significant use in practice yet, perhaps because of high overheads in the encoding size.

For geometric data, there has been much interest in solving problems such as clustering.9 The key idea here is that clustering part of the input can capture a lot of the overall structural information, and by merging clusters together (clustering clusters) you can retain a good picture of the overall point density distribution.

Why Should You Care?
The aim of this article has been to introduce a selection of recent techniques that provide approximate answers to some general questions that often occur in data analysis and manipulation. In all cases, simple alternative approaches can provide exact answers, at the expense of keeping complete information. The examples shown here have illustrated, however, that in many cases the approximate approach can be faster and more space efficient.

The use of these methods is growing. Bloom filters are sometimes said to be one of the core technologies that “big data experts” must know. At the very least, it is important to be aware of sketching techniques to test claims that solving a problem a certain way is the only option. Often, fast approximate sketch-based techniques can provide a different trade-off.

Related articles on queue.acm.org
It Probably Works
Tyler McMullen
http://queue.acm.org/detail.cfm?id=2855183
Statistics for Engineers
Heinrich Hartmann
http://queue.acm.org/detail.cfm?id=2903468

References
1. Ahn, K.J., Guha, S., McGregor, A. Analyzing graph structure via linear measurements. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, (2012).
2. Bhagat, S., Burke, M., Diuk, C., Filiz, I.O., Edunov, S. Three-and-a-half degrees of separation. Facebook Research, 2016; https://research.fb.com/three-and-a-half-degrees-of-separation/.
3. Bloom, B. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (July 1970), 422–426.
4. Broder, A., Mitzenmacher, M. Network applications of Bloom filters: a survey. Internet Mathematics 1, 4 (2005), 485–509.
5. Clarkson, K.L., Woodruff, D.P. Low rank approximation and regression in input sparsity time. In Proceedings of the ACM Symposium on Theory of Computing, (2013), 81–90.
6. Cormode, G., Korn, F., Muthukrishnan, S., Johnson, T., Spatscheck, O., Srivastava, D. Holistic UDAFs at streaming speeds. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (2004), 35–46.
7. Cormode, G., Muthukrishnan, S. An improved data stream summary: the Count-Min sketch and its applications. J. Algorithms 55, 1 (2005), 58–75.
8. Flajolet, P., Martin, G.N. Probabilistic counting. In Proceedings of the IEEE Conference on Foundations of Computer Science, (1985), 76–82. Also in J. Computer and System Sciences 31, 182–209.
9. Guha, S., Mishra, N., Motwani, R., O'Callaghan, L. Clustering data streams. In Proceedings of the IEEE Conference on Foundations of Computer Science, (2000).
10. Heule, S., Nunkesser, M., Hall, A. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In Proceedings of the International Conference on Extending Database Technology, (2013).
11. Jermaine, C. Sampling techniques for massive data. In Synopses for Massive Data: Samples, Histograms, Wavelets and Sketches. G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine, Eds. Foundations and Trends in Databases 4, 1–3 (2012). NOW Publishers.
12. Morris, R. Counting large numbers of events in small registers. Commun. ACM 21, 10 (Oct. 1978), 840–842.
13. Pike, R., Dorward, S., Griesemer, R., Quinlan, S. Interpreting the data: Parallel analysis with Sawzall. Dynamic Grids and Worldwide Computing 13, 4 (2005), 277–298.
14. Weinberger, K.Q., Dasgupta, A., Langford, J., Smola, A.J., Attenberg, J. Feature hashing for large-scale multitask learning. In Proceedings of the International Conference on Machine Learning, (2009).
15. Whang, K.Y., Vander-Zanden, B.T., Taylor, H.M. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Systems 15, 2 (1990), 208.
16. Woodruff, D. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10, 1–2 (2014), 1–157.

Graham Cormode is a professor of computer science at the University of Warwick, U.K. Previously, he was a researcher at and AT&T on algorithms for data management. He received the 2017 Adams Prize for his work on data analysis.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00.
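The article mentions Robert Morris's approximate counter only in passing. Below is a minimal Python sketch of the usual textbook formulation; it is an assumption for illustration, not code from the article. The counter stores an exponent c and increments it with probability 2^-c. The reported value 2^c - 1 is then an unbiased estimate of the number of events, and storing c needs only about log log n bits. The class name `MorrisCounter` is hypothetical.

```python
import random


class MorrisCounter:
    """Approximate counter (hypothetical name) that stores only an exponent c."""

    def __init__(self, rng=None):
        self.c = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Raise the exponent with probability 2^-c (always, while c == 0).
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        # After n increments E[2^c] = n + 1, so 2^c - 1 is unbiased.
        return 2 ** self.c - 1


# A single counter is noisy; averaging independent counters reduces variance.
rng = random.Random(42)
counters = [MorrisCounter(rng) for _ in range(200)]
for ctr in counters:
    for _ in range(1000):
        ctr.increment()
avg = sum(ctr.estimate() for ctr in counters) / len(counters)
print(avg)  # typically close to 1000
```

The averaging step is one simple way to trade a few extra counters for accuracy. Each counter's c stays around log2(1000), which is about 10, so it fits in four bits.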

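The Count-Min update and query steps described in the article above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Each row's hash function is simulated by hashing the row index together with the item using `hashlib.blake2b`, which is a convenient choice rather than the hash families assumed in the formal analysis. The class and method names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        # Hash the row index together with the item, so each row acts like its
        # own hash function that maps only into that row.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Increment the single counter the item maps to in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only add to a counter, so every row overestimates;
        # the minimum over the rows is returned.
        return min(self.table[row][self._column(row, item)]
                   for row in range(self.depth))


# Same shape as Figure 2: four rows, nine columns.
cms = CountMinSketch(width=9, depth=4)
views = {"siteA": 50, "siteB": 7, "siteC": 3}
for site, n in views.items():
    cms.update(site, n)

for site, n in views.items():
    print(site, n, cms.estimate(site))  # estimate is never below the true count
```

Taking the minimum is what keeps the error one-sided: an estimate can be inflated by colliding items but never falls below the true count. This matches the "slight inflation of frequency" the article describes.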
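The HyperLogLog worked example in Figure 3 can be reproduced with a few lines of Python. This sketch follows the article's simplified presentation:

- Each group's register keeps the largest number of leading zeros seen in g(x).
- The estimate is α_s·s² divided by the sum of 2^-M over the registers.
- α_3 = 0.5305 is taken from the text.

Production HLL implementations differ, for example by counting leading zeros plus one and by correcting for very small and very large cardinalities. The function names here are hypothetical. The element-wise maximum in `merge` is the standard way to combine two HLL sketches built with the same hash functions into a sketch of the union of their inputs.

```python
def leading_zeros(bits: str) -> int:
    """Count leading '0' characters in a bit string such as '0011'."""
    return len(bits) - len(bits.lstrip("0"))


def build_registers(hashed_items, s: int) -> list:
    """hashed_items: (group, g_bits) pairs, with groups numbered 1..s."""
    registers = [0] * s
    for group, g_bits in hashed_items:
        # Retain only the largest leading-zero count seen in each group.
        registers[group - 1] = max(registers[group - 1], leading_zeros(g_bits))
    return registers


def merge(r1: list, r2: list) -> list:
    # Sketch of the union of two inputs hashed with the same functions.
    return [max(a, b) for a, b in zip(r1, r2)]


def hll_estimate(registers: list, alpha: float) -> float:
    s = len(registers)
    # s / z is the harmonic mean of the per-group values 2^M[k].
    z = sum(2.0 ** -m for m in registers)
    return alpha * s * s / z


# Figure 3: item -> (h(x), g(x))
items = {"a": (1, "0001"), "b": (2, "0011"), "c": (3, "1010"),
         "d": (1, "1101"), "e": (3, "0101")}
regs = build_registers(items.values(), s=3)
print(regs)                                        # [3, 2, 1]
print(round(hll_estimate(regs, alpha=0.5305), 2))  # 5.46

# Sketching {a, b} and {c, d, e} separately and merging gives the same array.
part1 = build_registers([items["a"], items["b"]], s=3)
part2 = build_registers([items["c"], items["d"], items["e"]], s=3)
print(merge(part1, part2))                         # [3, 2, 1]
```

The merge step shows why the advertising query in the article can be answered from sketches alone. A union count comes from merged registers, and an overlap can then be estimated by inclusion-exclusion from the two individual estimates and the union estimate.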

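The article remarks that good random projections look like Count-Min sketches with random ±1 signs. A single-row signed hashing projection, in the spirit of the feature-hashing (Hash Kernels) approach the article cites, illustrates this. This is a minimal sketch under assumptions of our own:

- Sparse vectors are Python dicts mapping a feature name to a value.
- The bucket and sign hashes come from `hashlib.blake2b` with hypothetical "bucket"/"sign" prefixes.
- The function names are illustrative.

Because the projection is linear and the random signs cancel collisions in expectation, inner products of projected vectors estimate inner products of the originals.

```python
import hashlib


def _h(text: str) -> int:
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


def project(vector: dict, k: int, seed: str = "0") -> list:
    """Map a sparse vector {feature: value} to k dimensions by signed hashing."""
    out = [0.0] * k
    for feature, value in vector.items():
        bucket = _h(f"bucket:{seed}:{feature}") % k               # like one Count-Min row
        sign = 1.0 if _h(f"sign:{seed}:{feature}") % 2 else -1.0  # random +1 or -1
        out[bucket] += sign * value
    return out


def dot(u: list, v: list) -> float:
    return sum(a * b for a, b in zip(u, v))


u = {"a": 1.0, "b": 2.0}
v = {"b": 1.0, "c": 5.0}
w = {"a": 1.0, "b": 3.0, "c": 5.0}  # w = u + v
pu, pv, pw = project(u, 64), project(v, 64), project(w, 64)

# Linearity: the projection of a sum equals the sum of the projections.
print(all(abs(x - (y + z)) < 1e-12 for x, y, z in zip(pw, pu, pv)))  # True
# Estimate of <u, v> = 2.0; exact when a, b, and c land in distinct buckets.
print(dot(pu, pv))
```

Using a separate hash for the sign is what distinguishes this from a plain Count-Min row. Without the signs, every collision would inflate the inner product; with them, collision terms are zero on average over the choice of hash functions.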
DOI:10.1145/3106631
Article development led by queue.acm.org

10 Ways to Be a Better Interviewer

Plan ahead to make the interview a successful one.

BY KATE MATSUDAIRA

IN MANY WAYS interviewing is an art. You have one hour (more if you count the cumulative interview time) to determine if the candidate has the desired skills, and, more importantly, if you would enjoy working with this person. That is a lot of ground to cover.

As if finding out all that information isn't a daunting enough task, you also need to make sure that the candidate has a positive experience while visiting your company (after all, people talk and you want them to be saying good things, since this candidate may not be your next hire, but someone he or she meets may be).

As an interviewer, the key to your success is preparation. Planning will help ensure the success of the interview (both in terms of getting the information you need and giving the candidate a good impression). The following list is advice to consider prior to stepping into that room with two chairs and a whiteboard.

1. Review the Candidate's Résumé
Read every line of every résumé (and this goes for the really long ones that go on for four pages). Where have these candidates worked? How long did they stay in a role and did their positions change? These questions make for interesting conversation topics. Hopefully there will be something in a candidate's background that piques your interest and can be great fodder for starting the interview with some common ground. This can put candidates at ease, giving them their greatest chance of success.

2. Review Feedback from Previous Interviews
Most software companies have a longer interview process that can start with phone-screen or homework problems and evolve from there. If the candidate has done homework problems, or your teammates have taken the time to type up feedback, do your due diligence and read it. These can also be a great source of material for questions, but more importantly, it is unprofessional to ask the same questions that have already been posed to the candidate. This is partly because you will not learn as much from repeated questions, but also because the candidate will be bored or unimpressed going over the same ground. Great candidates want to be challenged, and an interview team where people are asking the same questions makes the candidate think the team is disorganized or unimaginative.

3. Use Calibrated Questions
Interviews are not the time to try something new. Take the time to do new problems on your own or test them on your peers. Come to the interview with questions that you were given in your interview (since you certainly will know how well you did) or that you have already given to others. Testing new material can really hurt a candidate's chances for success or, worse, give him or her a bad impression of the company when you

are not prepared to answer clarifying questions. You get the most from interviews when you can compare the results of one with another, particularly with the results of a successful hire or peer, so try to come to the interview with questions that will help you make this comparison.

Photo by Dragon Images.

4. Test New Questions on Yourself and Your Peers
If you do have a new question you want to give a dry run, have someone ask you to answer it. Where do you get hung up? How long does it take you? If the problem is too familiar to you to assess it, ask one of your teammates to be your guinea pig (as a manager I often offer to be the interviewee for my team to test out new questions; after all, isn't it fun to turn the tables and interview your manager?). Seeing where the people you know and respect get stuck, or how long they take to solve it, will give you a good baseline for comparison with future candidates.

5. Create a Timeline for the Interview
You should walk into every interview with a schedule: what questions you plan to ask and how long each should take. Each question should have clear goals and focus on specific competencies for the position. Ideally, the questions should be different from one another and give you a feel for multiple areas of the candidate's experience and background. I like to ask about five questions, so a typical agenda might look like this:
• Warm-up question about the candidate's background (or common interest): 5–10 minutes
• Problem-solving question that involves coding of some sort: 10–20 minutes


• Design question: 10–15 minutes
• Two to three cultural or situational questions: 5–10 minutes
• Time to answer the candidate's questions

6. Head In With a Positive Attitude
You want the candidate to have a good experience with the company and your process. If you are upbeat, it is much more likely a qualified candidate will accept the position. If you are not, people talk and it is a small world. You want candidates to think well of the company and feel they were treated fairly. It's like karma: what goes around comes around. To ensure this happens, try to make your questions and hints feel collaborative, and whatever you do, do not insult any candidates or make them feel stupid. They are probably nervous and you already have the job; there is nothing to prove, so make an effort to give them a fair shot.

7. Take Notes
Seems obvious, but so many people don't take notes. Even if you have a photographic memory, taking the time to write down a few things here and there will indicate to the candidate you are paying attention and are genuinely interested in what he or she has to say. As an avid note-taker, here are some of my favorite tips:
• Try not to use a laptop. Yes, it is probably faster and more efficient, but it can be a physical divider between you and the candidate, not to mention off-putting. When an interviewer uses a computer during an interview, it is easy to think that he or she is not paying attention to what the candidate has to say.
• Instead of writing code/drawings on a whiteboard, try paper. This may be more comfortable for most people than standing up at a whiteboard, and you can take the paper with you, which is better than any copied whiteboard code.
• Don't write notes on the résumé. Someone once told me that in some cultures, business cards and résumés are considered a reflection of the person, and writing on them can be insulting. While I personally haven't encountered anyone who felt this way, I am sure never to do this (and bring my own paper) just in case.

8. Bring a List of Questions to the Interview
No candidate will think less of you for coming in with written questions, and in fact some may appreciate that you prepared the same way they did. This will also help you establish your game plan and agenda so you don't forget. Another one of my favorite tips is always to have spare questions for really good interviews (that get through all the material quickly) or for bad interviews (where you don't want to ask your prepared questions because they are too hard).

9. Be Collaborative
You want the candidate to be successful, so try to approach a problem together. I know many other managers who have moved to a pair-programming model where the interviewer and the candidate code a problem together in an editor or Google doc.

10. Try To Make the Problems Feel As Real-World As Possible
Smart people want to be challenged. They also would love to get a taste of what it is like to work at your company. Do your best to come up with questions that at least hint at some of the problems you might solve (or problems that relate to the underlying theory of the work you do).

Of course, there is no right way to do an interview, but you can always be better. Make an effort to make your candidates as comfortable as possible so they have the greatest chance for success. Happy hiring!

Related articles on queue.acm.org
Interviewing Techniques
George Neville-Neil
http://queue.acm.org/detail.cfm?id=1998475
Nine Things I Didn't Know I Would Learn Being an Engineer Manager
Kate Matsudaira
http://queue.acm.org/detail.cfm?id=2935693
10 Optimizations on Linear Search
Thomas A. Limoncelli
http://queue.acm.org/detail.cfm?id=2984631

Kate Matsudaira (katemats.com) is the founder of her own company, Popforms. Previously she worked at Microsoft and Amazon as well as startups like Decide, Moz, and Delve Networks.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00

ACM Europe Conference
Barcelona, Spain | 7–8 September 2017

The ACM Europe Conference, hosted in Barcelona by the Barcelona Supercomputing Center, aims to bring together computer scientists and practitioners interested in exascale high performance computing and cybersecurity.

The High Performance Computing track includes a panel discussion of top world experts in HPC to review progress and current plans for the worldwide roadmap toward exascale computing. The Cybersecurity track will review the latest trends in this very hot field. High-level European Commission officials and representatives of funding agencies are participating.

Keynote Talk by ACM 2012 Turing Award Laureate Silvio Micali, “ALGORAND: A New Distributed Ledger”

Co-located events:
• ACM Europe Celebration of Women in Computing: WomENcourage 2017 (requires registration, https://womencourage.acm.org/)
• EXDCI, the European Extreme Data & Computing Initiative (https://exdci.eu/)
• Eurolab-4-HPC (https://www.eurolab4hpc.eu/)
• HiPEAC, the European Network on High Performance and Embedded Architecture and Compilation (https://www.hipeac.net/)

Conference Chair: Mateo Valero, Director of the Barcelona Supercomputing Center

Registration to the ACM Europe Conference is free of charge for ACM members and attendees of the co-located events.

Europe Council

http://acmeurope-conference.acm.org

contributed articles

DOI:10.1145/3122814 Answering questions correctly from standardized eighth-grade science tests is itself a test of machine intelligence.

Moving Beyond the Turing Test with the Allen AI Science Challenge

BY CARISSA SCHOENICK, PETER CLARK, OYVIND TAFJORD, PETER TURNEY, AND OREN ETZIONI

key insights
• Determining whether a system truly displays artificial intelligence is difficult and complex, and well-known assessments like the Turing Test are not suited to the task.
• The Allen Institute for Artificial Intelligence suggests that answering science exam questions successfully is a better measure of machine intelligence and designed a global competition to engage the research community in this approach.
• The outcome of the Allen AI Science Challenge highlights the current limitations of AI research in language understanding, reasoning, and commonsense knowledge; the highest scores are still limited to the capabilities of information-retrieval methods.

THE FIELD OF artificial intelligence has made great strides recently, as in AlphaGo's victories in the game of Go over world champion South Korean Lee Sedol in March 2016 and top-ranked Chinese Go player Ke Jie in May 2017, leading to great optimism for the field. But are we really moving toward smarter machines, or are these successes restricted to certain classes of problems, leaving others untouched? In 2015, the Allen Institute for Artificial Intelligence (AI2) ran its first Allen AI Science Challenge, a competition to test machines on an ostensibly difficult task: answering eighth-grade science questions. Our motivations were to encourage the field to set its sights more broadly by exploring a problem that appears to require modeling,

reasoning, language understanding, and commonsense knowledge in order to probe the state of the art while sowing the seeds for possible future breakthroughs.

Challenge problems have historically played an important role in motivating and driving progress in research. For a field striving to endow machines with intelligent behavior (such as language understanding and reasoning), challenge problems that test such skills are essential.

In 1950, Alan Turing proposed the now well-known Turing Test as a possible test of machine intelligence: If a system can exhibit conversational behavior that is indistinguishable from that of a human during a conversation, that system could be considered intelligent.11 As the field of AI has grown, the test has become less meaningful as a challenge task for several reasons. First, in its details, it is not well defined (for example, who is the person giving the test?). A computer scientist would likely know good distinguishing questions to ask, while a random member of the general public may not. What constraints are there on the interaction? What guidelines are provided to the judges? Second, recent Turing Test competitions have shown that, in certain formulations, the test itself is gameable; that is, people can be fooled by systems that simply retrieve sentences and make no claim of being intelligent.2,3 John Markoff of The New York Times wrote that the Turing Test is more a test of human gullibility than machine intelligence. Finally, the test as originally conceived is pass/fail rather than scored, thus providing no measure of progress toward a goal, something essential for any challenge problem.a,b

a. Turing himself did not conceive of the Turing Test as a challenge problem to drive the field forward but rather as a thought experiment to explore a useful alternative to the question Can machines think?
b. Although one can imagine metrics that quantify performance on the Turing Test, the imprecision of the task definition and human variability make it difficult to define metrics that are reliably reproducible.

Photo by Panitan Photo, with robot illustration by Peter Crowther Associates.

Machine intelligence today is viewed less as a binary pass/fail attribute and

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 61 contributed articles

Machine intelligence today is viewed less as a binary pass/fail attribute and more as a diverse collection of capabilities associated with intelligent behavior. Rather than a single test, cognitive scientist Gary Marcus of New York University and others have proposed the notion of a series of tests, a Turing Olympics of sorts, that could assess the full gamut of AI, from robotics to natural language processing.9,12

Our goal with the Allen AI Science Challenge was to operationalize one such test: answering science-exam questions. Clearly, the Science Challenge is not a full test of machine intelligence but does explore several capabilities strongly associated with intelligence, capabilities our machines need if they are to reliably perform the smart activities we desire of them in the future, including language understanding, reasoning, and use of commonsense knowledge. Doing well on the challenge appears to require significant advances in AI technology, making it a potentially powerful way to advance the field. Moreover, from a practical point of view, exams are accessible, measurable, understandable, and compelling.

One of the most interesting and appealing aspects of science exams is their graduated and multifaceted nature; different questions explore different types of knowledge, varying substantially in difficulty, especially for a computer. There are questions that are easily addressed with a simple fact lookup, like this:

How many chromosomes does the human body cell contain?
(A) 23
(B) 32
(C) 46
(D) 64

Then there are questions requiring extensive understanding of the world, like this:

City administrators can encourage energy conservation by
(A) lowering parking fees
(B) building larger parking lots
(C) decreasing the cost of gasoline
(D) lowering the cost of bus and subway fares

This question requires the knowledge that certain activities and incentives result in human behaviors that in turn result in more or less energy being consumed. Understanding the question also requires the system to recognize that “energy” in this context refers to resource consumption for the purposes of transportation, as opposed to other forms of energy one might find in a science exam (such as electrical and kinetic/potential).

AI vs. Eighth Grade

To put this approach to the test, AI2 designed and hosted The Allen AI Science Challenge, a four-month-long competition in partnership with Kaggle (https://www.kaggle.com/) that began in October 2015 and concluded in February 2016.7 Researchers worldwide were invited to build AI software that could answer standard eighth-grade multiple-choice science questions. The competition aimed to assess the state of the art in AI systems utilizing natural language understanding and knowledge-based reasoning; how accurately the participants’ models could answer the exam questions would serve as an indicator of how far the field has come in these areas.

Participants. A total of 780 teams participated during the model-building phase, with 170 of them eventually submitting a final model. Participants were required to make the code for their models available to AI2 at the close of the competition to validate model performance and confirm they followed contest rules. At the conclusion of the competition, the winners were also expected to make their code open source. The three teams achieving the highest scores on the challenge’s test set received prizes of $50,000, $20,000, and $10,000, respectively.

Data. AI2 licensed a total of 5,083 eighth-grade multiple-choice science questions from providing partners for the purposes of the competition. All questions were in standard multiple-choice format, with four answer options, as in the earlier examples. From this collection, we provided participants with a set of 2,500 training questions to train their models. We used a validation set of 8,132 questions during the course of the competition for confirming model performance. Only 800 of the validation questions were legitimate; we artificially generated the rest to disguise the real questions in order to prevent cheating via manual question answering or unfair advantage of additional training examples. A week before the end of the competition, we provided the final test set of 21,298 questions (including the validation set) to participants to use to produce a final score for their models; of these, 2,583 were legitimate. We licensed the data for the competition from private assessment-content providers that did not wish to allow the use of their data beyond the constraints of the competition, though AI2 made some subsets of the questions available on its website http://allenai.org/data.

Baselines and scores. As these questions were all four-way multiple choice, a standard baseline score using random guessing was 25%. AI2 also generated a baseline score using a Lucene search over the Wikipedia corpus, producing scores of 40.2% on the training set and 40.7% on the final test set. The final results of the competition were quite close, with the top three teams achieving scores with a spread of only 1.05 percentage points. The highest score was 59.31%.

First Place

Top prize went to Chaim Linhart of Hod HaSharon, Israel (Kaggle data science website https://www.kaggle.com username Cardal). His model achieved a final score of 59.31% correct on the test question set of 2,583 questions using a combination of 15 gradient-boosting models, each with a different subset of features. Unlike the other winners’ models, Linhart’s model predicted the correctness of each answer option individually. Linhart used two general categories of features to make these predictions. The first consisted of information-retrieval-based features, applied by searching over corpora he compiled from various sources (such as study-guide or quiz-building websites, open source textbooks, and Wikipedia). His searches used various weightings and stemmed words to optimize performance. The other flavor of features used in his ensemble of 15 models was based on properties of the questions themselves (such as length of question and answer, form of answer like numeric answer options, answers containing referential clauses like “none of the above” as an option, and relationships among answer options).
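The retrieval baselines and features described above share one idea: score each answer option by how well the question plus that option matches text in a corpus, then pick the best-scoring option. The Python sketch below illustrates that idea on a toy in-memory corpus; it is not AI2’s Lucene baseline or any winner’s code. The corpus sentences, the stop-word list, the smoothed IDF formula, and the helper names (`tokenize`, `idf_table`, `retrieval_score`, `answer`) are hypothetical stand-ins for a real search index such as Lucene over Wikipedia.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a search index over Wikipedia.
CORPUS = [
    "A human body cell contains 46 chromosomes.",
    "Buses and subways use less energy per passenger than cars.",
    "Lowering parking fees encourages more people to drive.",
]

# Small illustrative stop-word list (an assumption, not Lucene's actual list).
STOPWORDS = {"a", "an", "the", "of", "in", "by", "can", "to", "and",
             "how", "many", "does", "what", "is"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stop words removed (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def idf_table(corpus):
    """Smoothed inverse document frequency for every term in the corpus."""
    docs = [set(tokenize(s)) for s in corpus]
    df = Counter(term for doc in docs for term in doc)
    n = len(docs)
    return {term: math.log((n + 1) / (count + 0.5)) for term, count in df.items()}


def retrieval_score(query_terms, sentence, idf):
    """IDF-weighted count of query terms that also occur in the sentence."""
    sentence_terms = set(tokenize(sentence))
    return sum(idf[t] for t in set(query_terms) if t in sentence_terms)


def answer(question, options, corpus=CORPUS):
    """Return the letter of the option whose (question + option) query best
    matches some corpus sentence, together with the per-option scores."""
    idf = idf_table(corpus)
    question_terms = tokenize(question)
    scores = []
    for option in options:
        query = question_terms + tokenize(option)
        scores.append(max(retrieval_score(query, s, idf) for s in corpus))
    best = max(range(len(options)), key=lambda i: scores[i])  # ties go to the earlier option
    return "ABCD"[best], scores


letter, scores = answer("How many chromosomes does the human body cell contain?",
                        ["23", "32", "46", "64"])
print(letter)  # C
```

On the chromosome question, the option “46” adds one more matching term to the best corpus sentence, so the sketch returns C. Because the score only counts shared words, it has no model of why an answer is correct; a distractor that happens to share vocabulary with some sentence can win just as easily.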


Linhart explained that he used several smaller gradient-boosting models instead of one big model to maximize diversity. One big model tends to ignore some important features because it requires a very large training set to ensure it pays attention to all potentially useful features present. Linhart’s use of several small models required that the learning algorithm use features it would otherwise ignore, an advantage, given the relatively limited training data available in the competition. The information-retrieval-based features alone could achieve scores as high as 55% by Linhart’s estimation. His question-form features filled in some remaining gaps to bring the system up to approximately 60% correct. He combined his 15 models using a simple weighted average to yield the final score for each choice. He credited careful corpus selection as one of the primary elements driving the success of his model.

Second Place

The second-place team, with a score of 58.34%, was from a social-media-analytics company based in Luxembourg called Talkwalker (https://www.talkwalker.com), led by Benedikt Wilbertz (Kaggle username poweredByTalkwalker). The Talkwalker team built a relatively large corpus compared to other winning models, using 180GB of disk space after indexing with Lucene. Feature types included information-retrieval-based features; vector-based features, which scored question-answer similarity by comparing vectors from word2vec (a two-layer neural net that processes text) and GloVe (an unsupervised learning algorithm for obtaining vector representations for words); pointwise mutual information features, measured between the question and target answer and calculated on the team’s large corpus; and string hashing features, in which term-definition pairs were hashed and a supervised learner was then trained to classify pairs as correct or incorrect. A final model used these features to learn a pairwise ranking between the answer options using the XGBoost library, an implementation of gradient-boosted decision trees.

Wilbertz’s use of string hashing features was unique; neither of the other two winners tried it, and it is not currently used in AI2’s Project Aristo. His team used a corpus of terms and definitions obtained from an educational-flashcard-building site, then created negative examples by mixing terms with random definitions. A supervised classifier was trained on these incorrect pairs, and the output was used to generate features for input to XGBoost.

Third Place

The third-place winner was Alejandro Mosquera from Reading, U.K. (Kaggle username Alejandro Mosquera), with a score of 58.26%. Mosquera approached the challenge as a three-way classification problem for each pair of answer options. He transformed answer choices A, B, C, and D into all 12 possible ordered pairs (A,B), (A,C), ..., (D,C), which he labeled with three classes: the left element of the pair is correct; the right element is correct; or neither is correct. He then classified the pairs using logistic regression. This three-way classification is easier for supervised learning algorithms than the more natural two-way (correct vs. incorrect) classification with four choices, because the two-way classification requires an absolute decision about a choice, whereas the three-way classification requires only a relative ranking of the choices. Mosquera made use of three types of features: information-retrieval-based features based on scores from Elasticsearch (built on Lucene) over a corpus; vector-based features that measured question-answer similarity by comparing vectors from word2vec; and question-form features that considered such aspects of the data as the structure of a question, length of a question, and answer choices. Mosquera also noted that careful corpus selection was crucial to his model’s success.

Lessons

In the end, each of the winning models gained from information-retrieval-based methods, indicative of the state of AI technology in this area of research. AI researchers intent on creating a machine with human-like intelligence cannot yet build one that aces an eighth-grade science exam, because current AI systems cannot go beyond surface text to a deeper understanding of the meaning underlying each question and then use reasoning to find the appropriate answer. All three winners said it was clear that applying a deeper, semantic level of reasoning with scientific knowledge to the questions and answers would be the key to achieving scores of 80% and higher and demonstrating what might be considered true artificial intelligence.
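Mosquera’s pair transformation, as described in the third-place summary, can be sketched in Python. The sketch shows how four options become 12 ordered pairs with three-way labels; it leaves out the logistic-regression classifier and its features. The `aggregate` function, which turns per-pair class probabilities back into a single chosen answer by summing the probability that each option is the correct member of its pairs, is our own assumption, since the article does not say how the pair predictions were combined. All function and constant names are hypothetical.

```python
from itertools import permutations

# Three-way labels for an ordered pair (left, right) of answer options.
LEFT_CORRECT, RIGHT_CORRECT, NEITHER_CORRECT = 0, 1, 2


def make_pairs(n_options=4):
    """All ordered pairs of distinct option indices: (A,B), (A,C), ..., (D,C)."""
    return list(permutations(range(n_options), 2))


def label_pair(pair, correct_index):
    """Training label for one pair, given the index of the correct option."""
    left, right = pair
    if left == correct_index:
        return LEFT_CORRECT
    if right == correct_index:
        return RIGHT_CORRECT
    return NEITHER_CORRECT


def aggregate(pair_probs, n_options=4):
    """Hypothetical combination step: each option accumulates the probability
    that it is the correct member of every pair it appears in."""
    votes = [0.0] * n_options
    for (left, right), (p_left, p_right, _p_neither) in pair_probs.items():
        votes[left] += p_left
        votes[right] += p_right
    best = max(range(n_options), key=lambda i: votes[i])
    return best, votes


pairs = make_pairs()
print(len(pairs), pairs[0], pairs[-1])  # 12 (0, 1) (3, 2)

# Idealized classifier output for a question whose answer is C (index 2).
ideal = {p: tuple(1.0 if label_pair(p, 2) == k else 0.0
                  for k in (LEFT_CORRECT, RIGHT_CORRECT, NEITHER_CORRECT))
         for p in pairs}
print(aggregate(ideal))  # (2, [0.0, 0.0, 6.0, 0.0])
```

With idealized one-hot pair predictions for a question whose answer is C, option C collects one vote from each of the six pairs it appears in and every other option collects none. The labels also show why the framing is relative: each pair only asks which of two options is better, never whether an option is correct in absolute terms.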


A few other example questions that each of the top three models got wrong highlight the more interesting, complex nuances of language and chains of reasoning an AI system must be able to handle in order to answer the following questions correctly, and for which information-retrieval methods are not sufficient:

What do earthquakes tell scientists about the history of the planet?
(A) Earth’s climate is constantly changing.
(B) The continents of Earth are continually moving.
(C) Dinosaurs became extinct about 65 million years ago.
(D) The oceans are much deeper today than millions of years ago.

This involves the causes behind earthquakes and the larger geographic phenomena of plate tectonics and is not easily solved by looking up a single fact. Additionally, other true facts appear in the answer options (“Dinosaurs became extinct about 65 million years ago.”) but must be intentionally identified and discounted as incorrect in the context of the question.

Which statement correctly describes a relationship between the distance from Earth and a characteristic of a star?
(A) As the distance from Earth to the star decreases, its size increases.
(B) As the distance from Earth to the star increases, its size decreases.
(C) As the distance from Earth to the star decreases, its apparent brightness increases.
(D) As the distance from Earth to the star increases, its apparent brightness increases.

This requires general commonsense-type knowledge of the physics of distance and perception, as well as the semantic ability to relate one statement to another within each answer option to find the right directional relationship.

Other Attempts

While numerous question-answering systems have emerged from the AI community, none has addressed the challenges of scientific and commonsense reasoning required to successfully answer these example questions. Question-answering systems developed for the message-understanding conferences6 and text-retrieval conferences13 have historically focused on retrieving answers from text, the former from newswire articles, the latter from various large corpora (such as the Web, microblogs, and clinical data). More recent work has focused on answer retrieval from structured data (such as “In which city was Bill Clinton born?” from Freebase, a large publicly available collaborative knowledge base).4,5,15 However, these systems rely on the information being stated explicitly in the underlying data and are unable to perform the reasoning steps that would be required to conclude this information from indirect supporting evidence.

A few systems attempt some form of reasoning: Wolfram Alpha14 answers mathematical questions, provided they are stated either as equations or in relatively simple English; Evi10 is able to combine facts to answer simple questions (such as “Who is older: Barack or Michelle Obama?”); and START8 likewise is able to answer simple inference questions (such as “What South American country has the largest population?”) using Web-based databases. However, none of them attempts the level of complex question processing and reasoning that is indeed required to successfully answer many of the science questions in the Allen AI Challenge.

Looking Forward

As the 2015 Allen AI Science Challenge demonstrated, achieving a high score on a science exam requires a system that can do more than sophisticated information retrieval. Project Aristo at AI2 is focused on the problem of successfully demonstrating artificial intelligence using standardized science exams, developing an assortment of approaches to address the challenge. AI2 plans to release additional datasets and software for the wider AI research community in this effort.1

References

1. Allen Institute for Artificial Intelligence. Datasets; http://allenai.org/data
2. Aron, J. Software tricks people into thinking it is human. New Scientist 2829 (Sept. 6, 2011).
3. BBC News. Computer AI passes Turing Test in ‘world first.’ BBC News (June 9, 2014); http://www.bbc.com/news/technology-27762088
4. Berant, J., Chou, A., Frostig, R., and Liang, P. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (Seattle, WA, Oct. 18–21). Association for Computational Linguistics, Stroudsburg, PA, 2013, 6.
5. Fader, A., Zettlemoyer, L., and Etzioni, O. Open question answering over curated and extracted knowledge bases. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, Aug. 24–27). ACM Press, New York, 2014.
6. Grishman, R. and Sundheim, B. Message understanding Conference-6: A brief history. In Proceedings of the 16th Conference on Computational Linguistics (Copenhagen, Denmark, Aug. 5–9). Association for Computational Linguistics, Stroudsburg, PA, 1996, 466–471.
7. Kaggle. The Allen AI Science Challenge; https://www.kaggle.com/c/the-allen-ai-science-challenge
8. Katz, B., Borchardt, G., and Felshin, S. Natural language annotations for question answering. In Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference (Melbourne Beach, FL, May 11–13). AAAI Press, Menlo Park, CA, 2006.
9. Marcus, G., Rossi, F., and Veloso, M., Eds. Beyond the Turing Test. AI Magazine (Special Edition) 37, 1 (Spring 2016).
10. Simmons, J. True Knowledge: The natural language question answering Wikipedia for facts. Semantic Focus (Feb. 26, 2008); http://www.semanticfocus.com/blog/entry/title/true-knowledge-the-natural-language-question-answering-wikipedia-for-facts/
11. Turing, A.M. Computing machinery and intelligence. Mind 59, 236 (Oct. 1950), 433–460.
12. Turk, V. The plan to replace the Turing Test with a ‘Turing Olympics.’ Motherboard (Jan. 28, 2015); https://motherboard.vice.com/en_us/article/the-plan-to-replace-the-turing-test-with-a-turing-olympics
13. Voorhees, E. and Ellis, A., Eds. In Proceedings of the 24th Text REtrieval Conference (Gaithersburg, MD, Nov. 17–20). Publication SP 500-319, National Institute of Standards and Technology, Gaithersburg, MD, 2015.
14. Wolfram, S. Making the world’s data computable. Stephen Wolfram Blog (Sept. 24, 2010); http://blog.stephenwolfram.com/2010/09/making-the-worlds-data-computable/
15. Yao, X. and Van Durme, B. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Baltimore, MD, June 22–27). Association for Computational Linguistics, Stroudsburg, PA, 2014, 956–966.

Carissa Schoenick ([email protected]) is the senior program manager for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA.

Peter Clark ([email protected]) is the senior research manager for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA.

Oyvind Tafjord ([email protected]) is a senior research scientist and engineer at the Allen Institute for Artificial Intelligence in Seattle, WA.

Peter Turney ([email protected]) was a senior research scientist for Project Aristo at the Allen Institute for Artificial Intelligence in Seattle, WA, and is now retired.

Oren Etzioni ([email protected]) is the Chief Executive Officer of the Allen Institute for Artificial Intelligence in Seattle, WA, and a professor in the Allen School for Computer Science at the University of Washington in Seattle, WA.

Copyright held by the authors. Publication rights licensed to ACM. $15.00

Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/moving-beyond-the-turing-test

DOI:10.1145/3122803

Trust and Distrust in Online Fact-Checking Services

Even when checked by fact checkers, facts are often still open to preexisting bias and doubt.

BY PETTER BAE BRANDTZAEG AND ASBJØRN FØLSTAD

WHILE THE INTERNET has the potential to give people ready access to relevant and factual information, social media sites like Facebook and Twitter have made filtering and assessing online content increasingly difficult due to its rapid flow and enormous volume. In fact, 49% of social media users in the U.S. in 2012 received false breaking news through social media.8 Likewise, a survey by Silverman11 suggested in 2015 that false rumors and misinformation disseminated further and faster than ever before due to social media. Political analysts continue to discuss misinformation and fake news in social media and its effect on the 2016 U.S. presidential election. Such misinformation challenges the credibility of the Internet as a venue for authentic public information and debate.

key insights
• Though fact-checking services play an important role countering online disinformation, little is known about whether users actually trust or distrust them.
• The data we collected from social media discussions (on Facebook, Twitter, blogs, forums, and discussion threads in online newspapers) reflects users’ opinions about fact-checking services.
• To strengthen trust, fact-checking services should strive to increase transparency in their processes, as well as in their organizations and funding sources.

In response, over the past five years, a proliferation of outlets has provided fact checking and debunking of online content. Fact-checking services, say Kriplean et al.,6 provide “… evaluation of verifiable claims made in public statements through investigation of primary and secondary sources.”

An international census from 2017 counted 114 active fact-checking services, a 19% increase over the previous year.12 To benefit from this trend, Google News in 2016 let news providers tag news articles or their content with fact-checking information “… to help readers find fact checking in large news stories.”3 Any organization can use the fact-checking tag, if it is non-partisan, transparent, and targets a range of claims within an area of interest and not just one single person or entity.

However, research into fact checking has scarcely paid attention to the general public’s view of fact checking, focusing instead on how people’s beliefs and attitudes change in response to facts that contradict their own preexisting opinions. This research suggests fact checking in general may be unsuccessful at reducing misperceptions, especially among the people most prone to believe them.9 People often ignore facts that contradict their current beliefs,2,13 particularly in politics and controversial social issues.9 Consequently, the more political or controversial issues a fact-checking service covers, the more it needs to build a reputation for usefulness and trustworthiness. Research suggests the trustworthiness of fact-checking services depends on their origin and ownership, which may in turn affect integrity perceptions10 and the transparency of their fact-checking process.4 Despite these observations, we are unaware of any other research that has examined users’ perceptions of these services. Addressing the gap in current knowledge, we investigated the research question: How do social media users perceive the trustworthiness and usefulness of fact-checking services?

Fact-checking services differ in terms of their organizational aim and funding,10 as well as their areas of concern,11 which in turn may affect their trustworthiness.

Figure 1. Categorization of fact-checking services based on areas of concern.
Online rumors and hoaxes: Snopes.com, Hoax-Slayer, TruthOrFiction.com, HoaxBusters, Viralgranskaren - Metro
Political and public claims: FactCheck.org, PolitiFact, The Washington Post Fact Checker, CNN Reality Check, Full Fact
Specific topics or controversies: StopFake, TruthBeTold, #RefugeeCheck, Climate Feedback, Brown Moses Blog (continued as Bellingcat)

Figure 2. Example of Snopes debunking a social media rumor on Twitter (March 6, 2016); https://twitter.com/snopes/status/706545708233396225

Figure 3. Outline of our research approach; posts collected October 2014 to March 2015. Search (Meltwater) across blogs, discussion forums, and online newspaper comments → data corpus (1,741 posts) → filter irrelevant posts → dataset (595 posts) → content analysis → findings (trustworthiness and usefulness).

Table 1. Coding scheme we used to analyze the data.
Theme, sentiment: service described as
Usefulness, positive: Useful, serving the purpose of fact checking
Usefulness, negative: Not as useful, often derogatory
Ability, positive: Reputable, expert, or acclaimed
Ability, negative: Lacking expertise or credibility
Benevolence, positive: Aiming for greater (social) good
Benevolence, negative: Suspected of (social) ill will (such as through conspiracy, propaganda, or fraud)
Integrity, positive: Independent or impartial
Integrity, negative: Dependent or partially or politically biased

As outlined in Figure 1, the universe of fact-checking services can be divided into three general categories based on their area(s) of concern: political and public statements in general, corresponding to the fact checking of politicians, as discussed by Nyhan and Reifler;9 online rumors and hoaxes, reflecting the need for debunking services, as discussed by Silverman;11 and specific topics or controversies, covering particular conflicts or narrowly scoped issues or events (such as the ongoing Ukraine conflict).

We have focused on three services (Snopes, FactCheck.org, and StopFake), all included in the Duke Reporters’ Lab’s online overview of fact checkers (http://reporterslab.org/fact-checking/). They represent the three categories of fact checkers in Figure 1 (online rumors, politics, and a particular topic), as well as differences in organization and funding. As a measure of their popularity, as of June 20, 2017, Snopes had 561,650 likes on Facebook, FactCheck.org 806,814, and StopFake 52,537.

We study Snopes because of its aim to debunk online rumors, fitting the first category in Figure 1. This aim is shared by other such services, including HoaxBusters and the Swedish service Viralgranskaren. Snopes is managed by a small volunteer organization that emerged from a single-person initiative and is funded through advertising revenue.

We study FactCheck.org because it monitors the factual accuracy of what is said by major political figures. Other such services include PolitiFact (U.S.) and Full Fact (U.K.) in the second category in Figure 1. FactCheck.org is a project of the Annenberg Public Policy Center of the Annenberg School for Communication at the University of Pennsylvania, Philadelphia, PA. FactCheck.org is supported by university funding and individual donors and has been a source of inspiration for other fact-checking projects.

We study StopFake because it addresses one highly specific topic: the ongoing Ukraine conflict. It thus resembles other highly focused fact-checking initiatives (such as #RefugeeCheck, which fact checks reports on the refugee crises in Europe). StopFake is an initiative by the Kyiv Mohyla Journalism School in Kiev, Ukraine, and is thus a European-based service. Snopes and FactCheck.org are U.S. based, as are more than a third of the fact-checking services identified by Duke Reporters’ Lab.12

All three provide fact checking through their own websites, as well as through Facebook and Twitter. Figure 2 is an example of a Twitter post with content checked by Snopes.

Analyzing Social Media Conversations

To explore how social media users perceive the trustworthiness and usefulness of these services, we applied a research approach designed to take advantage of unstructured social media conversations (see Figure 3). While investigations of trust and usefulness often rely on structured data from questionnaire-based surveys, social media conversations represent a highly relevant data source for our purpose, as they arguably reflect the raw, authentic perceptions of social media users. Xu et al.16 claim it is beneficial to listen to, analyze, and understand citizens’ opinions through social media to improve societal decision-making processes and solutions. They wrote, for example, “Social media analytics has been applied to explain, detect, and predict disease outbreaks, election results, macroeconomic processes (such as crime detection), (… ) and financial markets (such as stock price).”16 Social media conversations take place in the everyday context of users likely to be engaged in fact-checking services. This approach may thus provide a less biased view of people’s perceptions than, say, a questionnaire-based approach.

The benefit of gathering data from users in their specific social media context does not imply that our data is representative. Our data lacks important information about user demographics, limiting our ability to claim generality for the entire user population. Despite this potential drawback, however, our data does offer new insight into how social media users view the usefulness and trustworthiness of various categories of fact-checking services.

For data collection, we used Meltwater Buzz, an established service for social media monitoring, crawling data from social media conversations in blogs, discussion forums, online newspaper discussion threads, Twitter, and Facebook. Meltwater Buzz crawls all blogs (such as https://wordpress.com/), discussion forums (such as https://offtopic.com/), and online newspapers (such as https://www.washingtonpost.com/) requested by Meltwater customers, thus representing a large, though convenient, sample. It collects various amounts of data from each platform; for example, it crawls all posts on Twitter but only the Facebook pages with 3,500 likes or groups of more than 500 members. This limitation in Facebook data partly explains why the overall number of posts we collected (1,741) was not higher.


Figure 4. Positive and negative posts related to trustworthiness and usefulness per fact-checking service (in %); “other” refers to posts not relevant for the research categories (N = 595 posts). [Bar chart, one panel per service: Snopes (n = 385), FactCheck.org (n = 80), StopFake (n = 130); categories: positive and negative totals; usefulness, ability, benevolence, and integrity, each positive and negative; other.]

Table 2. Snopes and themes we analyzed (n = 385).
Usefulness, positive (21%): Snopes is a wonderful Website for verifying things seen online; it is at least a starting point for research.
Usefulness, negative (10%): Snopes is a joke. Look at its Boston bombing debunking failing to debunk the worst hoax ever ...
Ability, positive (6%): […] Snopes is a respectable source for debunking wives’ tales, urban legends, even medical myths ...
Ability, negative (24%): Heh ... Snopes is a man and a woman with no investigative background or credentials who form their opinions solely on Internet research; they don’t interview anyone. […]
Benevolence, positive (0%): No posts
Benevolence, negative (21%): You show your Ignorance by using Snopes … Snopes is a NWO Disinformation System designed to fool the Masses ... SORRY. I Believe NOTHING from Snopes. Snopes is a Disinformation vehicle of the Elitist NWO Globalists. Believe NOTHING from them ... […]
Integrity, positive (2%): Snopes is a standard, rather dull fact-checking site, nailing right and left equally. […]
Integrity, negative (44%): Snopes is a leftist outlet supported with money from George Soros. Whatever Snopes says I take with a grain of salt ...

To collect opinions about social media user perceptions of Snopes and FactCheck.org, we applied the search term “[service name] is,” as in “Snopes is,” “FactCheck.org is,” and “FactCheck is.” We intended it to reflect how people start a sentence when formulating their opinions. StopFake is a relatively less-known service, so we selected a broader search string, “StopFake,” to be able to collect enough relevant opinions. The searches returned a data corpus of 1,741 posts over six months (October 2014 to March 2015), as in Figure 3. By “posts,” we mean written contributions by individual users. To create a sufficient dataset for analysis, we removed all duplicates, as well as a small number of non-relevant posts lacking personal opinions about fact checkers. This filtering process resulted in a dataset of 595 posts.

We then performed content analysis, coding all posts to identify and investigate patterns within the data1 and reveal the perceptions users express in social media about the three fact-checking services we investigated. We analyzed their perceptions of the usefulness of fact-checking services through a usefulness construct similar to the one used by Tsakonas et al.14 “Usefulness” concerns the extent the service is perceived as beneficial when doing a specific fact-checking task, often illustrated by positive recommendations and characterizations (such as the service is “good” or “great”). Following Mayer et al.’s theoretical framework,7 we categorized trustworthiness according to the perceived ability, benevolence, and integrity of the services. “Ability” concerns the extent a service is perceived as having the needed skills and expertise available, as well as being reputable and well regarded. “Benevolence” refers to the extent a service is perceived as intending to do good, beyond what would be expected from an egocentric motive. “Integrity” targets the extent a service is generally viewed as adhering to an acceptable set of principles, in particular being independent, unbiased, and fair.

Since we found that posts typically reflect rather polarized perceptions of the studied services, we also grouped the codes manually according to sentiment, positive or negative. Some posts described the services in a plain and objective manner. We coded them using a positive sentiment (see Table 1) because they refer to the service as a source for fact checking, and users are likely to reference fact-checking sites because they see them as useful.

68 COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9 contributed articles service as a source for fact checking, 2 reflect how negative sentiment in crediting a service. Posts expressing and users are likely to reference fact- the posts we analyzed on Snopes was positive sentiment mainly argue for checking sites because they see them rooted in issues pertaining to trust- the usefulness of the service, claim- as useful. worthiness. Integrity issues typically ing that Snopes is, say, a useful re- For reliability, both researchers in involved a perceived “left-leaning” source for checking up on the veracity the study did the coding. One coded political bias in the people behind of Internet rumors. all the posts, and the second then the service. Pertaining to benevo- FactCheck.org. The patterns in the went through all the assigned codes, lence, users in the study said Snopes posts we analyzed for FactCheck.org a process repeated twice. Finally, is part of a larger left-leaning or “lib- resemble those for Snopes. As in Ta- both researchers went through all eral” conspiracy often claimed to be ble 3, the most frequently mentioned comments for which an alternative funded by George Soros, whereas trustworthiness concerns related to code had been suggested to decide comments on ability typically tar- service integrity; as for Snopes, us- on the final coding, a process that geted lack of expertise in the people ers said the service is politically bi- recommended an alternative coding running the service. Some negative ased toward the left. Posts concern- for 153 posts (or 26%). comments on trustworthiness may ing benevolence and ability were also A post could include more than be seen as a rhetorical means of dis- relatively frequent, reflecting user one of the analytical themes, so 30% of the posts were thus coded as ad- Table 3. FactCheck.org and themes we analyzed (n = 80). dressing two or more themes. 
Theme Sentiment Example Results Positive (25%) […] You obviously haven’t listened to what they say. Despite the potential benefits of fact- Usefulness Also, I hate liars. FactCheck is a great tool. checking services, Figure 4 reports Negative (3%) Anyway, “FactCheck” is a joke […] the majority of the posts on the two Positive (6%) The media sources I use must pass a high credibility bar. FactCheck. org is just one of the resources I use to validate what I read ... U.S.-based services expressed nega- Ability tive sentiment, with Snopes at 68% Negative (16%) […] FactCheck is NOT a confidence builder; see its rider and sources, Huffpo articles … REALLY? and FactCheck.org at 58%. Most posts Positive (0%) No posts on the Ukraine-based StopFake (78%) Negative (25%) FactCheck studies the factual correctness of what major players in reflected positive sentiment. Benevolence U.S. politics say in TV commercials, debates, talks, interviews, and The stated reasons for negative news presentations, then tries to present the best possible fictional sentiment typically concerned one or and propaganda-like version for its target […] more of the trustworthiness themes Positive (19%) When you don’t like the message, blame the messenger. FactCheck is nonpartisan. It's just that conservatives either lie rather than usefulness. For example, Integrity or are mistaken more ... for Snopes and FactCheck.org, the Negative (39%) FactCheck is left-leaning opinion. It doesn’t check facts ... negative posts often expressed con- cern over lack in integrity due to per- ceived bias toward the political left. Negative sentiment pertaining to the Table 4. StopFake and themes we analyzed (n = 130); note * also coded as integrity/positive. ability and benevolence of the servic- es were also common. 
The few critical Theme Sentiment Example comments on usefulness were typi- Positive (72%) Don’t forget a strategic weapon of the Kremlin is the “web of lies” cally aimed at discrediting a service, spread by its propaganda machine; see antidote http://www.stopfake. by, say, characterizing it as “satirical” Usefulness org/en/news or as “a joke.” Negative (2%) […] StopFake! HaHaHa. You won, I give up. Next time I will quote “Saturday Night Live”; there is more truth:)) ... Positive posts were more often re- Positive (2%) […] by the way, the website StopFake.org is a very objective and lated to usefulness. For example, the accurate source exposing Russian propaganda and disinformation stated reasons for positive sentiment techniques. […]* toward StopFake typically concerned Ability Negative (2%) […] Ha Ha … a flow of lies is constantly sent out from the Kremlin. the service’s usefulness in countering Really. If so, StopFake needs updates every hour, but the best way it pro-Russian propaganda and trolling can do that is to find low-grade blog content and make it appear as if it was produced by Russian media […] and in the information war associat- Positive (4%) […] StopFake is devoted to exposing Russian propaganda against the ed with the ongoing Ukraine conflict. Ukraine. […] Benevolence In line with a general notion of Negative (14%) So now you acknowledge StopFake is part of Kiev’s propaganda. I an increasing need to interpret and guess that answers my question […] act on information and misinforma- Positive (2%) […] by the way, the website StopFake.org is a very objective and tion in social media,6,11 some users accurate source exposing Russian propaganda and disinformation included in the study discussed fact- Integrity techniques. […] checking sites as important elements Negative (11%) […] Why should I give any credence to StopFake.org? Does it ever criticize the Kiev regime, in favor of the Donbass position? […] of an information war. Snopes. The examples in Table

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 69 contributed articles

concern regarding the service as a tion when comparing the various argument. For some users in our contributor to propaganda or doubts services, topic-specific StopFake is sample, lack of trust extends beyond about its fact-checking practices. perceived as more useful than Snopes a particular service to encompass the StopFake. As in Table 4, the results and FactCheck.org. One reason might entire social and political system. Us- for StopFake show more posts ex- be that a service targeting a specific ers with negative perceptions thus pressing positive sentiment than we topic faces less criticism because it seem trapped in a perpetual state of found for Snopes and FactCheck.org. attracts a particular audience that informational disbelief. In particular, the posts included in seeks facts supporting its own view. While one’s initial response to the study pointed out that StopFake For example, StopFake users target statements reflecting a state of infor- helps debunk rumors seen as Russian anti-Russian, pro-Ukrainian readers. mational disbelief may be to dismiss propaganda in the Ukraine conflict. Another, more general, reason might them as the uninformed paranoia of Nevertheless, the general pat- be that positive perceptions are mo- a minority of the public, the state- tern in the reasons users gave us for tivated by user needs pertaining to a ments should instead be viewed as a positive and negative sentiment for perceived high load of misinforma- source of user insight. The reason the Snopes and FactCheck.org also held tion, as in the case of the Ukraine services are often unsuccessful in re- for StopFake. The positive posts were conflict, where media reports and ducing ill-founded perceptions9 and typically motivated by usefulness, social media are seen as overflowing people tend to disregard fact check- whereas the negative posts reflected with propaganda. 
Others highlighted ing that goes against their preexisting the sentiment that StopFake is politi- the general ease information may be beliefs2,13 may be a lack of basic trust cally biased (“integrity”), a “fraud,” filtered or separated from misinfor- rather than a lack of fact-based argu- a “hoax,” or part of the machinery mation through sites like Snopes and ments provided by the services. of Ukraine propaganda (“benevo- FactCheck.org, as expressed like this: We found such distrust is often lence”). “As you pointed out, it doesn’t take highly emotional. In line with Sil- that much effort to see if something verman,11 fact-checking sites must Discussion on the Internet is legit, and Snopes is be able to recognize how debunking We found users with positive percep- a great place to start. So why not take and fact checking evoke emotion in tions typically extoled the usefulness that few seconds of extra effort to do their users. Hence, they may benefit of fact-checking services, whereas that, rather than creating and sharing from rethinking the way they design users with negative opinions cited misleading items.” and present themselves to strengthen concerns over trustworthiness. This This finding suggests there is in- trust among users in a general state pattern emerged across all three ser- creasing demand for fact-checking of informational disbelief. More- vices. In the following sections, we services,6 while at the same time a over, users of online fact-checking discuss how these findings provide substantial proportion of social me- sites should compensate for the lack new insight into trustworthiness as dia users who would benefit from of physical evidence online by be- a key challenge when countering on- such services do not use them suf- ing, say, demonstrably independent, line rumors and misinformation2,9 ficiently. 
The services should thus impartial, and able to clearly distin- and why ill-founded beliefs may have be even more active on social media guish fact from opinion. Rogerson10 such online reach, even though the sites like Facebook and Twitter, as wrote that fact-checking sites exhibit beliefs are corrected by prominent well as in online discussion forums, varying levels of rigor and effective- fact checkers, including Snopes, where greater access to fact checking ness. The fact-checking process and FactCheck.org, and StopFake. is needed. even what are considered “facts” may Usefulness. Users in our sample Trustworthiness. Negative percep- in some cases involve subjective in- with a positive view of the services tions and opinions about fact-check- terpretation, especially when actors mainly pointed to their usefulness. ing services seem to be motivated by with partial ties aim to provide the While everyone should exercise cau- basic distrust rather than rational service. For example, in the 2016 U.S. presidential campaign, the organiza- Table 5. Challenges and our related recommendations for fact-checking services. tion “Donald J. Trump for President” invited Trump’s supporters to join a Challenges Recommendations fact-check initiative, similar to the Unrealized potential in public Increase presence in social media and category “topics or controversies,” Usefulness use of fact-checking services discussion forums urging “fact checking” the presiden- Ability Critique of expertise and Provide nuanced but simple overview tial debates on social media. How- reputation of the fact-checking process where ever, the initiative was criticized as relevant sources are included mainly promoting Trump’s views and Benevolence Suspicion of conspiracy and Establish open policy on fact checking 5 Trustworthiness propaganda and open spaces for collaboration on candidacy. 
fact checking Users of fact-checking sites ask: Integrity Perception of bias and Ensure transparency on organization Who actually does the fact checking partiality and funding. and demonstrable and how do they do it? What organi- impartiality in fact-checking process zations are behind the process? And how does the nature of the organiza-

70 COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9 contributed articles

tion influence the results of the fact 610928, http://www.revealproject. checking? Fact-checking sites must eu/) but does not necessarily rep- thus explicate the nuanced, detailed resent the views of the European process leading to the presented re- Commission. We also thank Marika sult while keeping it simple enough Lüders of the University of Oslo and to be understandable and useful.11 Users with negative the anonymous reviewers for their in- Need for transparency. While fact- perceptions thus sightful comments. checker trustworthiness is critical, fact checkers represent but one set of seem trapped in References 1. Ezzy, D. Qualitative Analysis. Routledge, London, voices in the information landscape U.K., 2013. a perpetual state 2. Friesen, J.P., Campbell, T.H., and Kay, A.C. The and cannot be expected to be benevo- psychological advantage of unfalsifiability: The appeal lent and unbiased just because they of informational of untestable religious and political ideologies. Journal check facts. Rather, they must strive of Personality and Social Psychology 108, 3 (Nov. disbelief. 2014), 515–529. for transparency in their working pro- 3. Gingras, R. Labeling fact-check articles in Google News. Journalism & News (Oct. 13, 2016); https://blog. cess, as well as in their origins, orga- google/topics/journalism-news/labeling-fact-check- nization, and funding sources. articles-google-news/ 4. Hermida, A. Tweets and truth: Journalism as a To increase transparency in its discipline of collaborative verification.Journalism processes, a service might try to take Practice 6, 5-6 (Mar. 2012), 659–668. 5. Jamieson, A. ‘Big League Truth Team’ pushes Trump’s a more horizontal, collaborative ap- talking points on social media. The Guardian (Oct. 10, proach than is typically seen in the 2016); https://www.theguardian.com/us-news/2016/ oct/10/donald-trump-big-league-truth-team-social- current generation of services. Fol- media-debate lowing Hermida’s recommenda- 6. 
Kriplean, T., Bonnar, C., Borning, A., Kinney, B., and Gill, B. Integrating on-demand fact-checking with public 4 tion to social media journalists, fact dialogue. In Proceedings of the 17th ACM Conference checkers could be set up as a plat- on Computer-Supported Cooperative Work & Social Computing (Baltimore, MD, Feb. 15–19). ACM Press, form for collaborative verification New York, 2014, 1188–1199. and genuine fact checking, relying 7. Mayer, R.C., Davis, J.H., and Schoorman, F.D. An integrative model of organizational trust. Academy of less on centralized expertise. Form- Management Review 20, 3 (1995), 709–734. 8. Morejon, R. How social media is replacing traditional ing an interactive relationship with journalism as a news source. Social Media Today users might also help build trust.6,7 Report (June 28, 2012); http://www.socialmediatoday. com/content/how-social-media-replacing-traditional- journalism-news-source-infographic Conclusion 9. Nyhan, B. and Reifler, J. When corrections fail: The persistence of political misperceptions. Political We identified a lack of perceived Behavior 32, 2 (June 2010), 303–330. trustworthiness and a state of infor- 10. Rogerson, K.S. Fact checking the fact checkers: Verification Web sites, partisanship and sourcing. mational disbelief as potential obsta- In Proceedings of the American Political Science cles to fact-checking services reach- Association (Chicago, IL, Aug. 29–Sept. 1). American Political Science Association, Washington, D.C., 2013. ing social media users most critical 11. Silverman, C. Lies, Damn Lies, and Viral Content. to such services. Table 5 summarizes How News Websites Spread (and Debunk) Online Rumors, Unverified Claims, and Misinformation. 
Tow our overall findings and discussions, Center for Digital Journalism, Columbia Journalism outlining related key challenges and School, New York, 2015; http://towcenter.org/wp- content/uploads/2015/02/LiesDamnLies_Silverman_ our recommendations for how to ad- TowCenter.pdf 12. Stencel, M. International fact checking gains dress them. ground, Duke census finds. Duke Reporters’ Lab, Given the exploratory nature of Duke University, Durham, NC, Feb. 28, 2017; https:// reporterslab.org/international-fact-checking-gains- this study, we cannot conclude our ground/ findings are valid for all services. In 13. Stroud, N.J. Media use and political predispositions: Revisiting the concept of selective exposure. Political addition, more research is needed Behavior 30, 3 (Sept. 2008), 341–366. to be able to make definite claims 14. Tsakonas, G. and Papatheodorou, C. Exploring usefulness and usability in the evaluation of open- on systematic differences among the access digital libraries. Information Processing & various fact checkers based on their Management 44, 3 (May 2008), 1234–1250. 15. Van Mol, C. Improving web survey efficiency: The “areas of concern.” Nevertheless, the impact of an extra reminder and reminder content on consistent pattern in opinions we Web survey response. International Journal of Social Research Methodology 20, 4 (May 2017), 317–327. found across three prominent ser- 16. Xu, C., Yu, Y., and Hoi, C.K. Hidden in-game intelligence vices suggests challenges and recom- in NBA players’ tweets. Commun. ACM 58, 11 (Nov. 2015), 80–89. mendations that can provide useful

guidance for future development in Petter Bae Brandtzaeg ([email protected]) is a senior this important area. research scientist at SINTEF in Oslo, Norway. Asbjørn Følstad ([email protected]) is a senior research Acknowledgments scientist at SINTEF in Oslo, Norway. This work was supported by the Eu- ropean Commission co-funded FP 7 project REVEAL (Project No. FP7- © 2017 ACM 0001-0782/17/09 $15.00

review articles

DOI:10.1145/3096742

Exploring the many distinctive elements that make securing HPC systems much different than securing traditional systems.

Security in High-Performance Computing Environments

BY SEAN PEISERT

Photo by Gorodenkoff Visuals.

Key insights
• High-performance computing systems have some similarities to and some differences from traditional IT computing systems, which present both challenges and opportunities.
• One challenge is that HPC systems are “high-performance” by definition, so many traditional security techniques are not effective because they either cannot keep up with the system or reduce its performance.
• Many opportunities also exist: HPC systems tend to be used for very distinctive purposes, have much more regular and predictable activity, and contain highly custom hardware/software stacks. Each of these elements can provide a toehold for leveraging some aspect of the HPC platform to improve security.

HOW IS COMPUTER security different in a high-performance computing (HPC) context from a typical IT context? On the surface, a tongue-in-cheek answer might be, “just the same, only faster.” After all, HPC facilities are connected to networks the same way any other computer is, often run the same, typically Linux-based operating systems as many other common computers, and have long been subject to many of the same styles of attacks, be they compromised credentials, system misconfiguration, or software flaws. Such attacks have ranged from the “wily hacker” who broke into U.S. Department of Energy (DOE) and U.S. Department of Defense (DOD) computing systems in the mid-1980s,42 to the “Stakkato” attacks against NCAR, DOE, and NSF-funded supercomputing centers in the mid-2000s,24,39 to the thousands of probes, scans, brute-force login attempts, and buffer overflow vulnerabilities that continue to plague high-performance computing facilities today.

On the other hand, some HPC systems run highly exotic hardware and software stacks. In addition, HPC systems have very different purposes and modes of use than most general-purpose computing systems, of either the desktop or server variety. This means that aside from all of the normal reasons any network-connected computer might be attacked, HPC systems have their own distinct systems, resources, and assets that an attacker might target, as well as their own distinctive attributes that make securing them somewhat distinct from securing other types of computing systems.

The fact that computer security is context- and mission-dependent should not be surprising to security professionals (“security policy is a statement of what is, and what is not, allowed”7), and each organization will therefore have a somewhat distinctive security policy. For example, a mechanism designed to enforce a particular policy considered essential for security by one site might be considered a denial of service to legitimate users of another site, just as the way a smartphone is protected differs from the way a desktop computer is protected. Thus, for HPC systems, we must ask what the desired functioning of the system is, so that we can establish what the security policies are and better understand the mechanisms with which those policies can be enforced.

Historically, however, security for HPC systems has not necessarily been treated as distinct from security for general-purpose computing, except, typically, in making sure that security does not get in the way of performance or usability. While that goal is laudable, this article argues that this assessment of HPC’s distinctiveness is incomplete.

This article focuses on four key themes surrounding this issue:


The first theme is that HPC systems are optimized for high performance by definition. Further, they tend to be used for very distinctive purposes, notably mathematical computations.

The second theme is that HPC systems tend to have very distinctive modes of operation. For example, compute nodes in an HPC system may be accessed exclusively through some kind of scheduling system on a login node, with a single program or common set of programs typically run in sequence. And, even on that login node, from which the computation is submitted to the scheduler, the range of programs present may be extremely narrow compared to those commonly found on general-use computing systems.

The third theme is that while some HPC systems use standard operating systems, some use highly exotic stacks. And even the ones that use standard operating systems very often have custom aspects to their software stacks, particularly at the I/O and network driver levels, and also at the application layer. And, of course, while the systems may use commodity CPUs, the CPUs and other hardware system components are often integrated in HPC systems in a way (for example, by Cray or IBM) that may well exist nowhere else in the world.

The fourth theme, which follows from the first three, is that HPC systems tend to have a much more regular and predictable mode of operation, which changes the way security can be enforced.

As a final aside, many, but by no means all, HPC systems are extremely open systems from a security standpoint and may be used by scientists worldwide whose identities have never been validated. Increasingly, we are also starting to see HPC systems in which computation and visualization are more tightly coupled and a human manipulates the inputs to the computation itself in near-real time.

This distinctiveness presents both opportunities and challenges. This article discusses the basis for these themes and their implications for the security of these systems.

Scope and threat model. I have spent most of my career in or near “open science”: National Science Foundation and Department of Energy Office of Science-funded high-performance computing centers, so the lens of this article tends to focus on such environments. The challenges in “closed” environments, such as those of the National Security Agency (NSA), Department of Defense (DoD), or National Nuclear Security Administration (NNSA) National Labs, or commercial industry, share some, but not all, of the attributes discussed in this article. As a result, although I discuss confidentiality, a typical component of the “C-I-A” triad, because even in open science data leakage is certainly an issue and a threat, this article focuses more on integrity-related threats,31,32 including alteration of code or data, or misuse of computing cycles, and availability-related threats, including disruption or denial of service against HPC systems or the networks that connect them. Computations that are incorrect for non-malicious reasons, including flaws in application code (such as general logic errors, round-off errors, non-determinism in parallel algorithms, and unit conversion errors20), as well as incorrect assumptions by users about the hardware they are running on, are vital issues, but they are beyond the scope of this article, due to length and the fact that those issues are well covered elsewhere.4,5,6,8,36

High-Performance Computing Environments

Distinctive purposes. The first theme of the distinctiveness of security for HPC systems is that these systems are high-performance by definition, and are made that way for a reason. They are typically used for automated computation of some kind, usually performing some set of mathematical operations. Historically, this has often been for the purpose of modeling and simulation, and increasingly today, for data analysis as well. Given that the primary purpose of HPC systems is high performance, that such systems are few in number, and that computing time on them is therefore quite valuable, there is a reluctance by the major stakeholders (the funding agencies that support HPC systems as well as the users who run computations on them) to agree to any solution that might impose overhead on the system. Those stakeholders might well regard such a solution as a waste of cycles at worst, and an unacceptable delay of scientific results at best. This is an important detail, because it frames the types of security solutions that, at least historically, might have been considered acceptable to use.

Distinctive modes of operation. The second theme of the distinctiveness of security for HPC systems is that these systems tend to have distinctive modes of operation.

Figure 1. Three typical high-level workflow diagrams of scientific computing. The diagram at top shows a typical workflow for data analysis in HPC; the middle diagram shows a typical workflow for modeling and simulation; and the bottom diagram shows a coupled, interactive compute-visualization workflow.
Data Analysis: Connect to login node → Transfer data in via DTN → Edit config files, compile, submit batch job → Wait → Transfer data out via DTNs
Simulation: Connect to login node → Edit config files, compile, submit batch job → Wait → (Maybe) Transfer data out via DTNs
Simulation with Coupled Computation/Visualization: Connect to login node → Edit config files, compile, submit batch job → Wait → Job Starts → Visualize Output/Adjust Inputs
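The workflows in Figure 1 are regular enough to be written down as data, which hints at why their predictability matters for security. The following minimal Python sketch is illustrative only and is not taken from the article. The stage names, the `WORKFLOWS` table, and the `consistent_with` and `matching_workflows` helpers are all hypothetical. The combined “edit config files, compile, submit batch job” step is collapsed into a single `submit_batch_job` stage, the “(Maybe)” final transfer of the simulation workflow is modeled with an `optional` flag, and the repeated visualize/adjust loop of the coupled workflow is not modeled. The sketch checks whether an observed sequence of session events is consistent with any of the three workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a workflow; optional steps may be skipped."""
    name: str
    optional: bool = False


# Hypothetical encoding of the three Figure 1 workflows.
# "Edit config files, compile, submit batch job" is collapsed into one stage.
WORKFLOWS = {
    "data_analysis": [
        Stage("login"), Stage("transfer_in"), Stage("submit_batch_job"),
        Stage("wait"), Stage("transfer_out"),
    ],
    "simulation": [
        Stage("login"), Stage("submit_batch_job"), Stage("wait"),
        Stage("transfer_out", optional=True),  # "(Maybe) transfer data out"
    ],
    "coupled_visualization": [
        Stage("login"), Stage("submit_batch_job"), Stage("wait"),
        Stage("job_starts"), Stage("visualize_adjust"),
    ],
}


def consistent_with(events, stages, allow_prefix=False):
    """True if `events` follows `stages` in order, skipping only optional stages.

    With allow_prefix=True, an unfinished session also counts as consistent
    as long as every event so far fits the workflow.
    """
    def match(i, j):
        if i == len(events):
            # All events consumed: any remaining stages must be optional,
            # unless an in-progress (prefix) session is acceptable.
            return allow_prefix or all(s.optional for s in stages[j:])
        if j == len(stages):
            return False  # more events than the workflow allows
        stage = stages[j]
        if events[i] == stage.name and match(i + 1, j + 1):
            return True
        return stage.optional and match(i, j + 1)  # try skipping this stage

    return match(0, 0)


def matching_workflows(events, allow_prefix=False):
    """Sorted names of the workflows that `events` is consistent with."""
    return sorted(
        name for name, stages in WORKFLOWS.items()
        if consistent_with(events, stages, allow_prefix)
    )


if __name__ == "__main__":
    print(matching_workflows(["login", "submit_batch_job", "wait"]))  # ['simulation']
    # A hypothetical attempt to log into a compute node directly fits no workflow.
    print(matching_workflows(["login", "ssh_to_compute_node"], allow_prefix=True))  # []
```

A session that deviates from every expected sequence returns an empty list, while a completed simulation run (with or without the final transfer) matches only `simulation`. Whether a check like this is practical would depend on how a given site records session events; it is offered only to make the idea of “regular and predictable” workflows concrete.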
The typical mode of oper- ation for using a scientific high-perfor-

74 COMMUNICATIONS OF THE ACM | SEPTEMBER 2017 | VOL. 60 | NO. 9 review articles

mance machine involves connecting through a login node of some kind. In parallel, at least for data analysis tasks, data that a user wishes to analyze may be copied to the machine via a data transfer node (DTN), and software that a user wishes to install may be copied to the login node as well.

The user is then likely to edit some configuration files, compile their software, and write a "batch script" that defines what programs should be run, along with parameters of how those programs should be run. This is because most significant jobs are not run on the login nodes themselves, which have very limited resources. Rather, many institutions use compute nodes, which cannot be logged into directly but instead have a batch scheduler that determines when jobs should run, based on analyzing the submitted batch scripts according to a given optimization policy for the site in question. Thus, after writing their batch script, the user will probably submit their job to a batch queue using a submission program, and then log out and wait for the job to run on the compute nodes.

Following that, the user may run some kind of additional analysis or visualization on the data that was output. This may happen on the HPC system, or the output of the HPC computation may be downloaded to a non-HPC system for analysis in a separate environment, such as Jupyter/IPython.33 This additional analysis or visualization might happen serially, following the completed execution on the HPC system, or, alternatively, in an interactive, tightly coupled fashion such that the user visualizing the output of the computation can manipulate the computation as it is taking place.37,45 It should be noted that the "coupled" computation/analysis model could involve network connections external to the HPC facility or, particularly as envisioned by the "superfacility" model for data-intensive science,50 highly specialized and optimized network connections within a single HPC center. Examples of all three workflows are shown in Figure 1.

These use cases are often in stark contrast to the plethora of software that is typically run on a general-purpose desktop system, such as Web browsers, email clients, Microsoft Office, iTunes Music, Adobe Acrobat, personal task managers, Skype, and instant messaging. Importantly, HPC use often involves a much smaller set of programs, with a much more regular sequence of events in which the use of one program directly follows from another, rather than the constant attention-span-driven context switching typical of general-purpose computers. For example, on the NERSC HPC systems, of the more than 5,950 unique users active in 2014, just 13 applications comprised 50% of the cycles consumed, 25 applications comprised 66% of the cycles, and 50 applications comprised 80% of the cycles.2 The consequences of these distinctive workflows are important, as we will discuss.

For HPC systems, we must ask what is the desired functioning of the system so that we can establish what the security policies are and better understand the mechanisms with which those policies can be enforced.

Custom operating system stacks. The third theme of the distinctiveness of security for HPC systems is that these systems often have highly exotic stacks. Current HPC environments represent a spectrum of hardware and software components, ranging from exotic and highly custom to fairly commodity.

As an example, "Cori Phase 1,"a the newest supercomputer at NERSC, is a Cray XC based on Intel Haswell processors, leveraging Cray Aries interconnects, a Lustre file system, and nonvolatile memory express (NVMe) in a user-accessible burst buffer. Cori runs a full SUSE Linux distribution on the login nodes and, on the compute nodes, Compute Node Linux (CNL),44 a lightweight version of the Linux kernel and runtime environment based on the SUSE Linux Enterprise Server distribution.

a http://www.nersc.gov/users/computational-systems/cori/configuration/
b https://www.alcf.anl.gov/user-guides/machine-overview

Mira,b at the Argonne Leadership Computing Facility, is a hybrid system. The login nodes are IBM Power 7-based systems. The compute nodes are an IBM Blue Gene/Q system based on PowerPC A2 processors, IBM's 5D torus interconnect, and a similarly elaborate memory structure. The I/O nodes also use PowerPC A2 processors and are connected using Mellanox InfiniBand QDR switches. The login nodes run Red Hat Linux. The compute nodes run Compute Node Kernel (CNK),1 a Linux-like OS for compute nodes, but
supports neither multitasking nor virtual memory27 (CNK has no relationship with CNL). The I/O system runs the GPFS file system client.

Aurora,c the system scheduled to be installed at ALCF in 2019, will be constructed by a partnership between Cray and Intel and will run third-generation Intel Xeon Phi processors with second-generation Intel Omni-Path photonic interconnects and a variety of flash memory and NVRAM components to accelerate I/O, including 3D XPoint and 3D NAND in multiple locations, all user accessible. Aurora will run Cray Linux,10 a full Linux stack, on its login nodes and I/O nodes (though the I/O nodes do not allow general user access), and mOS46 on its compute nodes. mOS supports both a lightweight kernel (LWK) and a full Linux operating system, letting users choose between avoiding unexpected operating system overhead and the flexibility of a full Linux stack.

Summit,d the system scheduled to be installed at OLCF in 2018, will be based on both IBM POWER9 CPUs and NVIDIA Volta GPUs, with NVIDIA NVLink on-node networks and dual-rail Mellanox interconnects.

c http://aurora.alcf.anl.gov
d https://www.olcf.ornl.gov/summit/

In short, there is certainly some variation in exactly what operating systems are run. In all cases, login nodes run "full" operating systems. In some cases, full operating systems are also used for compute nodes; in other cases, lighter-weight but Linux API-compatible operating systems are used; and in still others, entirely custom operating systems are used that are single-user only and have no virtual memory or multitasking capabilities.

At least for the full operating systems, it is reasonable to assume that they contain capabilities and bugs similar or identical to those of standard desktop and server versions of Linux, and are just as vulnerable to attack via the various pieces of software (libraries, runtime, and application) running on the system.

Custom hardware and software components may have both positives and negatives. On one hand, they may receive less assurance than more common stacks. On the other hand, some custom stacks may be smaller, more easily verified, and less complex.

Openness. Our final theme is the relative "openness" of at least some HPC systems. That is, scientists from all over the world whose identities have never been validated may use them. For example, many such systems, such as those used by NSF or DOE ASCR, have no traditional firewalls between the data transfer nodes and the Internet, let alone the ability to "air gap" the HPC system (that is, ensure no physical connection to the regular Internet is possible) as some communities are able to do.

There is a reluctance by major stakeholders (the funding agencies that support HPC systems as well as the users who run computations on them) to agree to any solution that might impose overhead on the system.

Security Mechanisms and Solutions that Overcome the Constraints of HPC Environments

Traditional IT security solutions, including network and host-based intrusion detection, access controls, and software verification, work about as well in HPC as in traditional IT (often not very well), or worse, due to constraints in HPC environments.

For example, traditional host-based security mechanisms, such as those leveraging system call data via auditd, as well as certain types of network security mechanisms, like network firewalls and firewalls doing deep packet inspection, may be antithetical to the needs of the system being protected. It has been shown that even 0.0046% packet loss (1 out of 22,000 packets) can reduce the throughput of network data transfers by approximately 90%.13 Given that stateful and/or deep-packet-inspecting firewalls can cause delays that might lead to such loss, a firewall, as traditionally defined, is inappropriate for use in environments with high network data throughput requirements.

Thus, alternative approaches must be applied. Some solutions exist that can help compensate for these constraints. The Science DMZ13 security framework defines a set of security policies, procedures, and mechanisms to address the distinct needs of scientific environments with high network throughput needs (HPC security theme #1). While the needs of high-throughput networks do not eliminate options for security monitoring
or mitigation, those requirements do change what is possible.

In particular, in the Science DMZ framework, the scientific computing systems are moved to their own enclave, away from other types of computing systems that might have their own distinctive security needs and perhaps even distinct regulations (for example, financial, human resources, and other business computing systems). In addition, the framework directs transfers through a single network ingress and egress point that can be monitored and restricted.

However, the Science DMZ does not use "deep packet inspecting" or stateful firewalls. It does leverage packet-filtering firewalls, that is, firewalls that examine only attributes of packet headers and not packet payloads. Separately, it also performs deep packet inspection and stateful intrusion detection, such as might be done with the Bro Network Security Monitor.28 However, the two processes are not directly coupled: unlike a firewall, the IDS is not used in-line with the network traffic. As a result, inspection imposes no delays on the transmission of traffic, and thus creates no congestion that might lead to packet loss and retransmission.

Thus, by moving the traffic to its own enclave that can be centrally monitored at a single point, the framework seeks to maintain a level of security similar to that of traditional organizations, which typically have a single ingress/egress point, rather than simply removing network monitoring without replacing it with an alternative. However, the Science DMZ does so in a very specific way that accommodates the type and volume of network traffic used in scientific and high-performance computing environments. More specifically, it achieves throughput by reducing complexity, a theme that we will return to in this article.

The Science DMZ framework has been implemented widely in university and National Lab environments around the world as a result of funding from NSF, DOE ASCR, and other international funding organizations to support computing and networking infrastructure for open science. It goes without saying that both the Science DMZ framework and the Bro IDS must also continue to be adapted to more types of HPC environments, such as those requiring greater data confidentiality guarantees, including medical, defense, and intelligence environments.

Steps have been made toward the medical context as well. The Medical Science DMZ29 applies the Science DMZ framework to computing environments requiring compliance with the HIPAA Security Rule. Key architectural aspects include the notions that all traffic from outside the compute/storage infrastructure passes through heavily monitored head nodes, that storage and compute nodes themselves are not connected directly to the Internet, and that traffic containing sensitive or controlled-access data is encrypted. However, further work in medical environments, as well as other environments, is required.

Leveraging the Distinctiveness of HPC as an Opportunity

The Science DMZ helps compensate for HPC's limitations, but we need more such solutions. As indicated by the four themes enumerated in this article, we also need solutions that can leverage HPC distinctiveness as a strength.

Sommer and Paxson41 point out that anomaly-based detection is typically not used in traditional IT environments because of the high-level fact that "finding attacks is fundamentally different from … other applications" (such as credit card fraud detection). Among other key issues, they note that network traffic is often much more diverse than one might expect. They point out that semantic understanding is a vital component of overcoming this limitation and enabling machine-learning approaches to security to be more effective.

On the other hand, as mentioned earlier, HPC systems tend to be used for very distinctive purposes, notably mathematical computations (theme #1). The specific application of HPC systems varies by the organization that uses them (for example, a DOE National Lab or a DOD lab), but each individual system typically has a very specific use. This is a key point because, as a result, both specification-based and anomaly-based intrusion detection may be more useful in HPC environments than in traditional IT environments. Specifically, given the hypothesis that patterns of behavior in HPC are likely more regular than in typical computing systems, one might expect to reduce error rates when using anomaly-based intrusion detection, and possibly even to make it feasible to construct specifications for specification-based intrusion detection. Thus, such security mechanisms might even fare better in HPC environments than in traditional IT environments (theme #4), though demonstrating the degree to which the increased regularity of HPC environments may be helpful for security analysis is an open research question.

Analyzing system behavior with machine learning. A second, related key point about HPC systems being used primarily for mathematical computation is that, if we can do better analysis of system behavior, the insight that most HPC machines are used for computation focuses our attention on which security risks to care about (for example, users running "illicit computations," as defined by the owners of the HPC system) and might give us a better ability to understand what type of computation is taking place.

An example of a successful approach to addressing this question involved research that I was involved with at Berkeley Lab between 2009–2013.14,30,47,48 In this project, we asked the questions: What are people running on HPC systems? Are they running what they usually run? Are they running what they requested cycle allocations to run, or mining Bitcoins? Are they running something illegal (for example, classified)? In that work, we developed a technique for answering these questions by fingerprinting communication on HPC systems.

Specifically, we collected Message Passing Interface (MPI) function calls via the Integrated Performance Monitoring (IPM)43 tool, which showed patterns of communication between cores in an HPC system, as shown in Figure 2. Using 1,681 logs for 29 scientific applications from NERSC HPC systems, we applied Bayesian-based machine learning techniques for classification of scientific computations, as well as a graph-theoretic approach using "approximate" graph techniques (subgraph isomorphism and edit distance). A hybrid machine-learning and graph-theoretic approach identified test
HPC codes with 95%–99% accuracy.

Figure 2. "Adjacency matrices" for individual runs of a performance benchmark, an atmospheric dynamics simulator, and a linear equation solver, SUPERLU. The number of bytes sent between ranks is linearly mapped from dark blue (lowest) to red (highest), with white indicating an absence of communication.47,48 (Each panel plots source rank against destination rank.)

Our work analyzing distributed memory parallel computation patterns on HPC compute nodes is by no means conclusive evidence that anomaly detection is an unqualified success for intrusion detection on HPC systems. For one thing, the experiments were not conducted in an adversarial environment, so the difficulty for an attacker of intentionally evading detection by making one program look like another was not explored. In addition, in our "fingerprinting HPC computation" project, we had what we deemed to be a reasonable, though not exhaustive, corpus of data representative of typical computations on NERSC facilities to examine. Moreover, in examining the data, we focused on a specific set of activity identified in the NERSC Acceptable Use Policy as falling outside of "acceptable use." Other sites will have a different baseline of "typical computation," and are also likely to have somewhat different policies that define what is or is not "illicit use."

Regardless, we do believe the approach is an example of the type of technique that could succeed in HPC environments, possibly even more so than in many non-HPC environments. For example, consider the possibility of a skilled attacker attempting to evade detection, something to which any security mechanism relying on machine learning is vulnerable. Not only do there appear to be more regular use patterns in HPC environments, but there also exist certain distinctive security policies in HPC environments that might help improve the usefulness of application-level use monitoring. There are at least two reasons for this.

First, given that the organizations responsible for the security of HPC systems are likely to care more about misuse of cycles when very large numbers of cycles are used, it makes sense to focus on users who use cycles for many hours per day for days at a time. This is a very different practical scenario from network security monitoring, where a security decision might require a response within a fraction of a second in order to prevent compromise. Given the longer time scale, a human security analyst can be involved, rather than requiring the application monitoring, at the level at which we have done it, to be conclusive. Instead, application monitoring might simply serve to focus an analyst's attention and lead to a manual source code analysis, or even an actual conversation with the user whose account was used to run the code.

A second reason why evading detection on HPC might be harder for an attacker is that users are often given "cycle allocations" to run code. As a result, the more a program running on an HPC system is modified to mask illicit use, the more likely it is that additional cycles must be spent on extra tasks that make the program look like it is doing something other than what it actually is. Thus, the faster a stolen allocation will be used up and/or the longer it will take the HPC system to accomplish whatever illicit use the attacker is attempting.

Collecting better audit and provenance data. It is important to note that the success of the work mentioned in the previous section depends on the availability of useful security monitoring data. It is our observation that the current trend in many scientific environments toward collecting provenance data for scientific reproducibility purposes, such as the Tigres workflow system38 and the DOE Biology Knowledgebase (KBase),21 may help provide better data that can be used for security monitoring, as might DARPA's "Transparent Computing" program,11 which seeks to "make currently opaque computing systems transparent by providing high-fidelity visibility into component interactions during system operation across all layers of software abstraction, while imposing minimal performance overhead."

In line with this, as noted earlier, HPC systems have a lot in common with traditional systems, but also contain a lot of highly custom OS-level, network-level, and application-level software. A key point here is that such exotic hardware and low-level software stacks may also provide opportunities for monitoring data going forward. The performance counters used in many of today's HPC machines are one example of this.

Post-exascale systems, as well as architectures that are still in their early phases of practical implementation, such as neuromorphic computing, quantum computing, and photonic computing, may all provide additional challenges and opportunities. For example, though neural networks were previously thought by many to be inscrutable,16 new research suggests that interpreting them may actually be possible at some point.12,49 If successful, this might give rise to the ability to interpret networks learned by neuromorphic chips.

Looking to the Future

In the future, it is clear that numerous aspects of HPC will change, both for the good of security and in ways that complicate it.

Software engineering is a key goal of the National Strategic Computing Initiative (NSCI), so perhaps automated static/runtime analysis tools might be developed and used to check HPC code for insecure behaviors.

On the other hand, science is also changing. For example, distributed, streaming sensor data collection is increasingly a source of data used in HPC. In short, science data is reaching us in new ways, and we also have more data than ever to protect.

Another change is that on HPC systems running full operating systems, we are starting to see an increasing shift toward the use of new virtualized environments for additional flexibility. In particular, as Docker containers25 and CoreOS's Rocket9 become more popular for virtual replication and containment in many IT environments, in place of replicating full virtual operating systems, Docker-like containers that are more appropriate to HPC environments, such as Shifter19 or Singularity,23 are also gaining attention and use. This notion of "containerization" may well be a key benefit to security, both because containerization done properly typically limits the damage an attacker can do and because it simplifies the operation of the machine; reduction of complexity is also often a key benefit to system robustness, including security.

The superfacility model, in which computation and visualization are more tightly coupled than they currently are, also seems likely to grow. At the same time, "science gateways," essentially Web portals providing limited interfaces to HPC rather than full-blown UNIX command-line interfaces, may offset some of the complexity that the superfacility would otherwise introduce. Science gateways still represent vulnerability vectors from arbitrary code, even when it is submitted via Web front-ends. But since security tends to benefit from more constrained operation, the general trend toward science gateways may also enhance security.

Finally, new and novel security technologies, such as simulated homomorphic encryption,34,35 differential privacy,15 and cryptographic mechanisms for securing chains of data3,18,40 such as blockchains,26 may also provide new means for interacting with data sets in a constrained fashion.

For example, there may be cases where the owners of the data want to keep the raw data to themselves for an extended period of time, such as during a scientific embargo. Or there may be cases where the owners of the data are unable to share the raw data due to privacy regulations, such as on medical data, system and network data that contains personally identifiable information, or sensor data containing sensitive (for example, location) information. In either case, the data owners may still wish to enable some limited type of computation on the data, or to share data, but only at a certain degree of resolution. With CryptDB34 and Mylar,35 Popa et al. have demonstrated approaches for efficiently searching over encrypted data without requiring fully homomorphic encryption,17 which is currently at least a million times too slow to be used practically, let alone in HPC environments. Likewise, differential privacy,15 and perhaps particularly distributed differential privacy,22 may provide new opportunities for sharing and analyzing data to be used in HPC environments as well. In addition, blockchains and similar technologies may provide means both for monitoring the integrity of raw scientific data in HPC contexts and for maintaining secure audit trails of accesses to or modifications of raw data.

Summary

Modern HPC systems do some things very similar to ordinary IT computing,
but they also have some significant differences. This article presented both challenges and opportunities.

Two key security challenges stand out. First, traditional security solutions often are not effective given the paramount priority of high performance in HPC. Second, the need to make some HPC environments as open as possible to enable broad scientific collaboration and interactive HPC also presents a challenge.

There may also be opportunities, as described by the four themes regarding HPC security presented here. The fact that HPC systems tend to be used for very distinctive purposes, notably mathematical computations, may mean that the regularity of activity within HPC systems can improve the effectiveness of machine learning analyses of security monitoring data to detect misuse of cycles and threats to computational integrity. In addition, custom stacks provide opportunities for enhanced security monitoring, and the general trend toward containerized operation, limited interfaces, and reduced complexity in HPC is likely to help in the future, much as reduced complexity has benefited the Science DMZ model.

Acknowledgments

Appreciation to Deb Agarwal, David Brown, Jonathan Carter, Phil Colella, Dan Gunter, Inder Monga, and Kathy Yelick for their valuable feedback, and to Sean Whalen and Bogdan Copos for their excellent work underlying the ideas for new approaches described here. Thanks to Glenn Lockwood for validating his insights on the specifications for the DOE ASCR hardware and software coming in the next few years, and to both Glenn Lockwood and Scott Campbell for the time spent providing the data that supported that research. This work used resources of the National Energy Research Scientific Computing Center and was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect those of the employers or sponsors of this work.

Sean Peisert ([email protected]) is Staff Scientist at Lawrence Berkeley National Laboratory, Chief Cybersecurity Strategist at CENIC, and an associate adjunct professor at the University of California, Davis.

Copyright held by owner/author.

Watch the author discuss his work in this exclusive Communications video. https://cacm.acm.org/videos/security-in-high-performance-computing-environments

References

1. Adiga, N.R. et al. An overview of the BlueGene/L supercomputer. In Proceedings of the ACM/IEEE Conference on Supercomputing, 2002.
2. Austin, B. et al. 2014 NERSC Workload Analysis (Nov. 5, 2015); http://portal.nersc.gov/project/mpccc/baustin/NERSC_2014_Workload_Analysis_v1.1.pdf.
3. Anderson, R.J. UEPS: A second-generation electronic wallet. In Proceedings of the 2nd European Symposium on Research in Computer Security (Nov. 1992), 411–418.
4. Bailey, D.H. Resolving numerical anomalies in scientific computation, 2008.
5. Bailey, D.H., Borwein, J.M. and Stodden, V. Facilitating reproducibility in scientific computing: Principles and practice. Reproducibility: Principles, Problems, Practices. H. Atmanspacher and S. Maasen, Eds. John Wiley and Sons, New York, NY, 2015.
6. Bailey, D.H., Demmel, J., Kahan, W., Revy, G. and Sen, K. Techniques for the automatic debugging of scientific floating-point programs. In Proceedings of the 14th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics (Lyon, France, Sept. 2010).
7. Bishop, M. Computer Security: Art and Science. Addison-Wesley Professional, Boston, MA, 2003.
8. Cappello, F. Improving the trust in results of numerical simulations and scientific data analytics. 2015.
9. CoreOS, Inc. rkt - App Container runtime; https://github.com/coreos/rkt.
10. Cray, Inc. Cray Linux Environment Software Release Overview, s-2425-52xx edition (Apr. 2014); http://docs.cray.com/books/S-2425-52xx.
11. DARPA. Transparent Computing; http://www..mil/Our_Work/I2O/Programs/Transparent_Computing.aspx.
12. Das, A., Agrawal, H., Zitnick, C.L., Parikh, D. and Batra, D. Human attention in visual question answering: Do humans and deep networks look at the same regions? In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016.
13. Dart, E., Rotman, L., Tierney, B., Hester, M. and Zurawski, J. The science DMZ: A network design pattern for data-intensive science. In Proceedings of the IEEE/ACM Annual SuperComputing Conference (Denver, CO, 2013).
14. DeMasi, O., Samak, T. and Bailey, D.H. Identifying HPC codes via performance logs and machine learning. In Proceedings of the Workshop on Changing Landscapes in HPC Security (2013).
15. Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, Part II. Lecture Notes in Computer Science 4052 (July 2006), 1–12. Springer Verlag.
16. Gefter, A. Is artificial intelligence permanently inscrutable? Nautilus 40 (Sept. 1, 2016).
17. Gentry, C. Computing arbitrary functions of encrypted data. Commun. ACM 53, 3 (Mar. 2010), 97–105.
18. Haber, S. and Stornetta, W.S. How to time-stamp a digital document. J. Cryptology 3, 2 (Jan. 1991), 99–111.
19. Jacobsen, D.M. and Canon, R.S. Contain this, unleashing docker for HPC. In Proceedings of the Cray User Group, 2015.
20. Jiang, L. and Su, Z. Osprey: A practical type system for validating dimensional unit correctness of C programs. In Proceedings of the 28th International Conference on Software Engineering (2006), 262–271. ACM, New York.
21. KBase: The Department of Energy Systems Biology Knowledgebase; http://kbase.us.
22. Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S. and Smith, A. What can we learn privately? SIAM J. Computing 40, 3 (2011), 793–826.
23. Kurtzer, G.M. et al. Singularity; http://singularity.lbl.gov.
24. Marko, J. and Bergman, L. Internet attack is called broad and long lasting. New York Times (May 10, 2005).
25. Merkel, D. Docker: Lightweight Linux containers for consistent development and deployment. Linux J. 239 (2014).
26. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System (May 24, 2009); http://www.bitcoin.org/bitcoin.pdf.
27. Nataraj, A., Malony, A.D., Morris, A. and Shende, S. Early experiences with KTAU on the IBM BG/L. In European Conference on Parallel Processing. Springer, 2006, 99–110.
28. Paxson, V. Bro: A system for detecting network intruders in real time. Computer Networks 31, 23 (1999), 2435–2463.
29. Peisert, S. et al. The Medical Science DMZ. J. American Medical Informatics Assoc. 23, 6 (Nov. 1, 2016).
30. Peisert, S. Fingerprinting Communication and Computation on HPC Machines. TR LBNL-3483E, Lawrence Berkeley National Laboratory, June 2010.
31. Peisert, S. et al. ASCR Cybersecurity for Scientific Computing Integrity. TR LBNL-6953E, U.S. Department of Energy Office of Science, Feb. 2015.
32. Peisert, S. et al. ASCR Cybersecurity for Scientific Computing Integrity: Research Pathways and Ideas Workshop. TR LBNL-191105, U.S. Department of Energy Office of Science, Sept. 2015.
33. Pérez, F. and Granger, B.E. IPython: A system for interactive scientific computing. Computing in Science and Engineering 9, 3 (May 2007), 21–29.
34. Popa, R.A., Redfield, C., Zeldovich, N. and Balakrishnan, H. CryptDB: Processing queries on an encrypted database. Commun. ACM 55, 9 (Sept. 2012), 103–111.
35. Popa, R.A., Stark, E., Helfer, J., Valdez, S., Zeldovich, N., Kaashoek, M.F. and Balakrishnan, H. Building Web applications on top of encrypted data using Mylar. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (2014), 157–172.
36. Rubio-González, C. Precimonious: Tuning assistant for floating-point precision. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 2013, 27.
37. Reubel, O. WarpIV: In situ visualization and analysis of ion accelerator simulations. IEEE and Applications 36, 3 (2016), 22–35.
38. Ramakrishnan, L., Poon, S., Hendrix, V., Gunter, D., Pastorello, G.Z. and Agarwal, D. Experiences with user-centered design for the Tigres workflow API. In Proceedings of the 2014 IEEE 10th International Conference on e-Science, vol. 1. IEEE, 290–297.
39. Singer, A. Tempting fate. ;login: 30, 1 (Feb. 2005), 27–30.
40. Schneier, B. and Kelsey, J. Automatic event-stream notarization using digital signatures. In Proceedings of the 4th International Workshop on Security Protocols. Springer, 1996, 155–169.
41. Sommer, R. and Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 31st IEEE Symposium on Security and Privacy (Oakland, CA, May 2010).
42. Stoll, C. Stalking the wily hacker. Commun. ACM 31, 5 (May 1988), 484–497.
43. Skinner, D., Wright, N., Fürlinger, K., Yelick, K.A. and Snavely, A. Integrated Performance Monitoring; http://ipm-hpc.sourceforge.net/.
44. Wallace, D. Compute node Linux: New frontiers in compute node operating systems. Cray User Group, 2007.
45. Whitlock, B., Favre, J.M. and Meredith, J.S. Parallel in situ coupling of simulation with a fully featured visualization system. In Proceedings of the 11th Eurographics Conference on Parallel Graphics and Visualization, 2011, 101–109.
46. Wisniewski, R.W., Inglett, T., Keppel, P., Murty, R. and Riesen, R. mOS: An architecture for extreme-scale operating systems. In Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers. ACM, 2014.
47. Whalen, S., Peisert, S. and Bishop, M. Network-theoretic classification of parallel computation patterns. In Proceedings of the First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (Tucson, AZ, June 4, 2011).
48. Whalen, S., Peisert, S. and Bishop, M. Multiclass classification of distributed memory parallel computations. Pattern Recognition Letters 34, 3 (Feb. 2013), 322–329.
49. Yosinski, J., Clune, J., Fuchs, T. and Lipson, H. Understanding neural networks through deep visualization. In Proceedings of the Deep Learning Workshop, International Conference on Machine Learning, 2015.
50. Yelick, K. A Superfacility for Data Intensive Science. Advanced Scientific Computing Research Advisory Committee, Washington, DC, Nov. 8, 2016; http://science.energy.gov/~/media/ascr/ascac/pdf/meetings/201609/Yelick_Superfacility-ASCAC_2016.pdf.
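The NERSC workload figures cited in the article (13 applications accounting for 50% of cycles, 25 for 66%, and 50 for 80%) are cumulative-share statistics. The following Python sketch shows one way such a figure could be computed from per-application cycle totals. The function name `apps_to_reach` and the synthetic workload are illustrative assumptions, not NERSC data or NERSC tooling.

```python
from typing import Dict


def apps_to_reach(cycles_by_app: Dict[str, float], fraction: float) -> int:
    """Return the smallest number of applications whose combined cycles
    reach at least `fraction` of all cycles consumed."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(cycles_by_app.values())
    running = 0.0
    # Taking the heaviest consumers first yields the minimum possible count.
    ordered = sorted(cycles_by_app.values(), reverse=True)
    for count, cycles in enumerate(ordered, start=1):
        running += cycles
        if running >= fraction * total:
            return count
    return len(cycles_by_app)  # guard against floating-point shortfall


if __name__ == "__main__":
    # Hypothetical workload: a few heavy codes plus a long tail of light ones.
    usage = {"code_a": 40.0, "code_b": 20.0, "code_c": 10.0}
    usage.update({f"tail_{i}": 1.0 for i in range(30)})
    for f in (0.5, 0.66, 0.8):
        print(f"{f:.0%} of cycles: {apps_to_reach(usage, f)} applications")
```

A strongly skewed result like this is what makes the "regular workflow" argument plausible: a baseline of expected applications is small enough to reason about.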
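The Science DMZ discussion notes that losing even 1 packet in 22,000 can cut transfer throughput by roughly 90%.13 A widely used back-of-the-envelope model for why loss hurts so much is the Mathis et al. approximation for loss-limited TCP throughput, rate <= (MSS / RTT) * C / sqrt(p). This model is swapped in here for illustration only; it is not the source of the article's measurement, it assumes Reno-style congestion control, and the segment size and round-trip times below are assumptions.

```python
import math

# Commonly cited constant for the Mathis et al. model: C = sqrt(3/2), about 1.22.
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float,
                          c: float = MATHIS_C) -> float:
    """Approximate ceiling on steady-state, loss-limited TCP throughput in bits/s:
    rate <= (MSS / RTT) * C / sqrt(p)."""
    if mss_bytes <= 0 or rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need positive MSS and RTT, and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    loss = 1 / 22_000  # the loss rate quoted in the article
    for rtt_ms in (1, 10, 50):  # assumed RTTs: local, regional, cross-country
        rate = mathis_throughput_bps(1460, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>2} ms -> at most ~{rate / 1e6:,.0f} Mbit/s")
```

Because the ceiling falls with both RTT and the square root of the loss rate, a given loss rate costs far more on a long wide-area path than inside a machine room, which is why in-line devices that drop or delay packets are so costly for wide-area science transfers.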
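The Science DMZ relies on packet-filtering firewalls that examine only header attributes, never payloads, while deep inspection happens out of line in an IDS such as Bro. A minimal sketch of such a stateless, first-match header filter follows. The `Packet` and `Rule` types, the addresses, and the port are hypothetical, and real deployments express this as router or switch ACLs rather than Python.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str        # e.g. "tcp" or "udp"
    dst_port: int
    payload: bytes    # carried along but never examined by the filter


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    src_net: str                    # CIDR block, e.g. "198.51.100.0/24"
    dst_net: str
    proto: str
    dst_port: Optional[int] = None  # None matches any port


def matches(rule: Rule, pkt: Packet) -> bool:
    # Only header fields are compared; pkt.payload is deliberately ignored.
    return (ip_address(pkt.src) in ip_network(rule.src_net)
            and ip_address(pkt.dst) in ip_network(rule.dst_net)
            and pkt.proto == rule.proto
            and (rule.dst_port is None or pkt.dst_port == rule.dst_port))


def filter_packet(rules: List[Rule], pkt: Packet, default: str = "deny") -> str:
    """Stateless first-match evaluation: no connection table, no payload parsing."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return default


if __name__ == "__main__":
    # Hypothetical policy: a collaborator network may reach one data transfer node.
    rules = [Rule("allow", "198.51.100.0/24", "192.0.2.10/32", "tcp", 50000)]
    print(filter_packet(rules, Packet("198.51.100.7", "192.0.2.10", "tcp", 50000, b"data")))
    print(filter_packet(rules, Packet("203.0.113.5", "192.0.2.10", "tcp", 50000, b"data")))
```

Because each decision depends only on a handful of header fields and keeps no per-flow state, this kind of check adds little latency, which is the property the framework needs on high-throughput paths.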
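The fingerprinting work summarized in the article built communication "adjacency matrices" (bytes sent between MPI ranks, as in Figure 2) from IPM logs and classified them with Bayesian and graph-theoretic methods. The sketch below shows only the general idea, under simplifying assumptions: the `(src, dst, bytes)` record format is hypothetical rather than IPM's actual log format, all runs are assumed to use the same number of ranks, and a plain nearest-neighbor rule on normalized matrices stands in for the classifiers used in the cited research.

```python
import math
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[float]]


def adjacency_matrix(records: Iterable[Tuple[int, int, int]], nranks: int) -> Matrix:
    """Sum bytes sent from each source rank to each destination rank, then
    normalize so all entries sum to 1, making runs with different data volumes comparable."""
    m = [[0.0] * nranks for _ in range(nranks)]
    for src, dst, nbytes in records:
        m[src][dst] += nbytes
    total = sum(sum(row) for row in m) or 1.0
    return [[v / total for v in row] for row in m]


def distance(a: Matrix, b: Matrix) -> float:
    """Element-wise Euclidean (Frobenius) distance between equal-shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def classify(run: Matrix, known: Dict[str, Matrix]) -> str:
    """Label a run with the name of the closest known fingerprint (1-nearest neighbor)."""
    return min(known, key=lambda name: distance(run, known[name]))


def ring(n: int, nbytes: int) -> List[Tuple[int, int, int]]:
    # Hypothetical pattern: each rank sends only to its right-hand neighbor.
    return [(r, (r + 1) % n, nbytes) for r in range(n)]


def all_to_all(n: int, nbytes: int) -> List[Tuple[int, int, int]]:
    # Hypothetical pattern: every rank sends to every other rank.
    return [(s, d, nbytes) for s in range(n) for d in range(n) if s != d]


if __name__ == "__main__":
    n = 8
    known = {"ring_code": adjacency_matrix(ring(n, 4096), n),
             "alltoall_code": adjacency_matrix(all_to_all(n, 512), n)}
    # An unlabeled run: mostly ring traffic plus a little stray communication.
    mystery = ring(n, 10_000) + [(0, 4, 500)]
    print(classify(adjacency_matrix(mystery, n), known))
```

Normalizing by total bytes makes the fingerprint reflect the shape of the communication pattern rather than problem size; an adversary who deliberately reshapes traffic, which the article notes was not explored, would defeat a rule this simple.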
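The argument that analysts can focus on accounts consuming cycles for many hours per day over many days can be made concrete with a simple filter over per-user daily usage. The record format, the notion of "core-hours per day," and the thresholds below are assumptions for illustration; the output is meant to prioritize human review, not to render a verdict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sustained_heavy_users(records: Iterable[Tuple[str, int, float]],
                          min_hours_per_day: float,
                          min_consecutive_days: int) -> List[str]:
    """records: (user, day_index, core_hours) tuples, possibly several per day.
    Returns users whose daily total meets min_hours_per_day on at least
    min_consecutive_days consecutive days, sorted for stable output."""
    daily: Dict[Tuple[str, int], float] = defaultdict(float)
    for user, day, hours in records:
        daily[(user, day)] += hours  # aggregate multiple jobs on the same day

    heavy_days: Dict[str, List[int]] = defaultdict(list)
    for (user, day), total in daily.items():
        if total >= min_hours_per_day:
            heavy_days[user].append(day)

    flagged = []
    for user, days in heavy_days.items():
        longest = streak = 0
        previous = None
        for day in sorted(days):
            # Extend the streak only when this heavy day directly follows the last one.
            streak = streak + 1 if previous is not None and day == previous + 1 else 1
            longest = max(longest, streak)
            previous = day
        if longest >= min_consecutive_days:
            flagged.append(user)
    return sorted(flagged)


if __name__ == "__main__":
    sample = [("alice", d, 30.0) for d in range(5)] + [("bob", d, 30.0) for d in (0, 3, 6)]
    print(sustained_heavy_users(sample, min_hours_per_day=20.0, min_consecutive_days=3))
```

The longer time scale described in the article is what makes this workable: a daily batch report like this can feed an analyst's queue without needing a sub-second decision.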
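The article suggests that cryptographic chaining of data, as in the time-stamping and notarization work it cites18,40 and in blockchains,26 could support secure audit trails of accesses to raw data. A minimal hash-chained log sketch follows, assuming SHA-256 and JSON-serializable events; the function names and event fields are hypothetical. A chain like this is only tamper-evident if the latest hash is also anchored somewhere an attacker cannot rewrite, for example by publishing or externally timestamping it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Canonical JSON so the same event always serializes, and hashes, identically.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: List[Dict], event: Dict) -> None:
    """Append an event, binding it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": dict(event), "prev": prev, "hash": _digest(prev, event)})


def verify(log: List[Dict]) -> bool:
    """Recompute every link; any edited, reordered, or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict] = []
    append(log, {"user": "u1", "action": "read", "dataset": "ds1"})
    append(log, {"user": "u2", "action": "write", "dataset": "ds1"})
    print(verify(log))
    log[0]["event"]["action"] = "delete"  # simulate tampering with history
    print(verify(log))
```

Each entry commits to everything before it, so altering an old access record requires recomputing every later hash, which is detectable once any later hash has been published.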
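Differential privacy15 is mentioned in the article as a way for data owners to permit limited computation on sensitive data. The classic Laplace mechanism for a counting query is sketched below. The function names are hypothetical, and a production system would also need to track the cumulative privacy budget across queries and use a noise source suited to adversarial settings.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw a zero-mean Laplace(0, scale) sample by inverse-CDF sampling."""
    u = rng.random() - 0.5            # uniform on [-0.5, 0.5)
    while u == -0.5:                  # avoid log(0) at the boundary
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.
    Adding or removing one record changes a count by at most 1 (the sensitivity)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2017)
    # Hypothetical query: how many records in a sensitive data set match a condition.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: {private_count(1234, eps, rng):.1f}")
```

Smaller epsilon means larger noise and stronger privacy; the "certain degree of resolution" the article describes corresponds to choosing epsilon, and the sensitivity, for each released statistic.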

research highlights

P. 82 Technical Perspective: A Gloomy Look at the Integrity of Hardware
By Charles (Chuck) Thacker

P. 83 Exploiting the Analog Properties of Digital Circuits for Malicious Hardware
By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester

P. 92 Technical Perspective: Humans and Computers Working Together on Hard Tasks
By Ed H. Chi

P. 93 Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time
By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar, Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham


DOI:10.1145/3068774

Technical Perspective
A Gloomy Look at the Integrity of Hardware
By Charles (Chuck) Thacker

To view the accompanying paper, visit doi.acm.org/10.1145/3068776

SINCE THE INVENTION of the integrated circuit, the complexity of the devices and the cost of the facilities used to build them have increased dramatically. The first fabrication facility with which I was associated was built at Xerox PARC in the mid-1970s at a cost of approximately $15M ($75M today). Today, the cost of a modern fab is approximately $15B. This cost is justified by the fact that today's chips are much more complex than in earlier times. The number of layers involved has grown to over 100, and the tolerances involved are approaching atomic dimensions.

The high cost of a fab means that in order to be cost-effective, it must be fully loaded. This has led to "silicon foundries," which build chips for a variety of "fabless" semiconductor companies based on a set of physical design libraries supplied by the foundry. Mead and Conway in their seminal 1980 "Introduction to VLSI Systems" initially proposed this concept, but the Taiwan Semiconductor Manufacturing Company (TSMC), founded in 1987, changed what had been an academic exercise into an industrial norm. Today, a few large fabs throughout the world dominate this business.

Over the last two decades, integrated circuit design has diverged into two specialties: (1) architectural and logical design and device layout, done by a design house, and (2) mask generation and device fabrication, done by a foundry. To ensure the foundry has done its job correctly, the design house relies on extensive testing to verify that devices meet their specifications.

The following paper assumes the foundry (or other parties involved in the low levels of fabrication) is malicious, and can modify the design they receive to produce a device that can later be used for malice. Their attack employs a very small Trojan circuit included in an otherwise correct design. The Trojan awaits the chip's deployment, and it may then be triggered by an external software attack. When triggered, the chip's normal function is subverted by the attacker. In the A2 implementation, the trigger is used to elevate the privilege of a user-mode program. The authors argue that the simplicity of the Trojan and its use of analog circuitry make it difficult to detect, even with enhanced levels of testing. They go to considerable lengths to verify their approach, including extensive simulation and actual fabrication of a processor in a modern silicon process. On the actual hardware, the Trojan operated as expected.

Is this realistic? Certainly no foundry wants to compromise its business model by being identified as untrustworthy.

As I was preparing this Technical Perspective, the Dyn/Mirai DDoS attack occurred. Apparently, the attack used a large number of IoT devices (DVRs and webcams) as a botnet, which targeted a major DNS server. This is approximately what the authors of the following paper describe, although the attack was done by exploiting the lack of security in the target's software, rather than by adding hardware. The reports seem to indicate the bot devices were easily compromised, using default passwords that could not be changed, and the devices were not designed to be updated in the field. While the security provided by IoT devices will surely improve, the authors argue that the introduction of small Trojans by untrusted fabrication facilities will remain a problem for which technical solutions appear elusive.

As technologists, technical solutions to problems are what we do best. In the case of the attack proposed by the authors, a technical defense seems problematic. We do, however, have examples from other fields that might be promising. The A2 Trojan assumes an untrusted fabrication facility. While it might not be possible to do all future fabrication in trusted facilities, using a third party trusted by both the fab and its customers to monitor the behavior of the fab seems plausible. The job of the third party is to certify the proper behavior of the fab. Trusted third parties are widely used in areas ranging from financial contracts to nuclear treaty compliance. "Trust but verify" was used during the Cold War to describe this relationship.

The authors have a lot of experience with attacks on digital logic, and do a good job of explaining previous work in the area. The paper is definitely worth reading carefully, as it covers an area that will likely become much more important in an increasingly technology-dependent world.

Charles Thacker, computing pioneer and recipient of the 2009 ACM A.M. Turing Award, passed away in June 2017, soon after this Technical Perspective was written.

Copyright held by author.

DOI:10.1145/3068776

Exploiting the Analog Properties of Digital Circuits for Malicious Hardware
By Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester

Abstract
While the move to smaller transistors has been a boon for performance, it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party (often overseas) to fabricate their design. To guard against shipping chips with errors (intentional or otherwise), chip design companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications, since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester. In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before affecting a chip's functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transition between digital values. When the capacitors are fully charged, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely controllable privilege escalation by attaching the capacitor to a controllable wire and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that the proposed attack works. It eludes activation by a diverse set of benchmarks and evades known defenses.

The original version of this paper is entitled "A2: Analog Malicious Hardware" and was published in 2016 IEEE International Symposium on Security and Privacy.

1. INTRODUCTION
The trend toward smaller transistors in integrated circuits, while beneficial for higher performance and lower power, has made fabricating a chip expensive. For example, it costs 15% more to set up the fabrication line for each successive process node, and by 2020 it is expected that setting up a fabrication line for the smallest transistor size will require a $20 billion upfront investment.18 To amortize the cost of fabrication development, most hardware companies outsource fabrication.

Outsourcing of chip fabrication opens up hardware to attack. These hardware attacks can evade software checks because software must trust hardware to faithfully implement the instructions.6, 12 Even worse, if there is an attack in hardware, it can contaminate all layers of a system that depend on the hardware and violate high-level security policies correctly implemented by software.

The most pernicious fabrication-time attack is the dopant-level Trojan.2, 10 Dopant-level Trojans convert trusted circuitry into malicious circuitry by changing the dopant ratio on the input pins to victim transistors. Converting existing circuits makes dopant-level Trojans very difficult to detect, since there are no added or removed gates or wires. In fact, detecting dopant-level Trojans requires a complete chip delayering and comprehensive imaging with a scanning electron microscope.17 However, this elusiveness comes at the cost of expressiveness. Dopant-level Trojans are limited by existing circuits, making it difficult to implement sophisticated attack triggers.10 The lack of a sophisticated trigger means that dopant-level Trojans are more detectable by post-fabrication functional testing. Thus, dopant-level Trojans represent an extreme on a trade-off space between detectability during a physical inspection and detectability during testing.

To defend against malicious hardware inserted during fabrication, researchers have proposed two fundamental defenses: (1) using side-channel information (e.g., power and temperature) to characterize acceptable behavior in an effort to detect anomalous (i.e., malicious) behavior,1, 7, 13, 15 and (2) adding sensors to the chip that directly measure and characterize features of the chip's behavior (e.g., signal propagation delay) in order to identify dramatic changes in those features (presumably caused by activation of a malicious circuit).3, 8, 11 Using side channels as a defense works well against large Trojans added to purely combinational circuits, where it is possible to test all inputs and there exists a reference chip to compare against. While this accurately describes most existing fabrication-time attacks, we show that it is possible to implement a stealthy and powerful processor attack using only a single added gate, without affecting the features measured by existing on-chip sensors.

We create a new fabrication-time attack that is controllable, stealthy, and small. It borrows the idea of counter-based triggers, commonly used to hide design-time malicious hardware,19, 20 and adapts it to fabrication time. Based on analog behaviors, the attack replaces the hundreds of gates required by conventional counter-based digital triggers with analog components: a capacitor and a few transistors wrapped up in a single gate.
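To make the capacitor-as-counter idea concrete, here is a minimal cycle-level sketch of the charge-sharing trigger detailed in Section 3.1. It assumes a discrete model: one charge-sharing event per clock cycle in which the victim wire toggles, exponential leakage every cycle, and a Schmitt-style detector with two thresholds. The class `AnalogTriggerModel`, its helper functions, and every numeric value are hypothetical placeholders chosen to show the qualitative behavior (trigger time, retention time, minimum toggling frequency); they are not the authors' circuit parameters.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AnalogTriggerModel:
    """Cycle-level behavioral model of a capacitor-based (charge-sharing) trigger.

    All values are hypothetical and normalized; they only illustrate the
    qualitative behavior and do not describe the fabricated chip.
    """
    c_unit: float = 1.0       # relative size of the small sharing capacitor Cunit
    c_main: float = 20.0      # relative size of the integrating capacitor Cmain
    vdd: float = 1.0          # supply voltage (normalized)
    v_on: float = 0.6         # detector threshold for a rising voltage
    v_off: float = 0.3        # detector threshold for a falling voltage (hysteresis)
    tau_cycles: float = 50.0  # leakage time constant, in clock cycles
    v: float = 0.0            # present voltage on Cmain
    fired: bool = False       # True while the payload would be enabled

    def step(self, toggled: bool) -> bool:
        """Advance one clock cycle and return whether the trigger is fired."""
        if toggled:
            # Charge sharing: Cunit, precharged to VDD, is shorted to Cmain,
            # so Cmain rises by Cunit * (VDD - V0) / (Cunit + Cmain).
            self.v += self.c_unit * (self.vdd - self.v) / (self.c_unit + self.c_main)
        # Leakage drains Cmain on every cycle, whether or not the wire toggled.
        self.v *= math.exp(-1.0 / self.tau_cycles)
        # Hysteretic detector. The real detector output is active-low; this
        # model reports True when fired, for readability.
        if self.v >= self.v_on:
            self.fired = True
        elif self.v < self.v_off:
            self.fired = False
        return self.fired


def trigger_time(model: AnalogTriggerModel, toggles: Callable[[int], bool],
                 max_cycles: int = 10_000) -> Optional[int]:
    """Return the first cycle at which the trigger fires, or None if it never does."""
    for cycle in range(max_cycles):
        if model.step(toggles(cycle)):
            return cycle
    return None


def retention_time(model: AnalogTriggerModel, max_cycles: int = 10_000) -> Optional[int]:
    """Count idle cycles until a fired trigger resets itself through leakage."""
    for cycle in range(1, max_cycles + 1):
        if not model.step(False):
            return cycle
    return None


if __name__ == "__main__":
    fast = AnalogTriggerModel()
    print("toggle every cycle, fires at cycle:", trigger_time(fast, lambda c: True))
    print("idle cycles until reset:", retention_time(fast))
    slow = AnalogTriggerModel()
    print("toggle every other cycle, fires at cycle:", trigger_time(slow, lambda c: c % 2 == 0))
```

With these placeholder values, toggling on every cycle pushes Cmain past the upper threshold within a few tens of cycles, while toggling on every other cycle settles below it, because leakage balances the added charge first. That is the minimum-toggling-frequency effect; the lower threshold is what stretches the retention time, mirroring the role the paper assigns to the Schmitt trigger.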


This paper presents three contributions. (1) We design and implement the first fabrication-time processor attack that mimics the triggered attacks often added during design time. As part of our implementation, we are the first to show how a fabrication-time attacker can leverage the empty space common in chip layouts to implement malicious circuits. (2) We show how an analog attack can be much smaller and more stealthy than its digital counterpart. Our attack diverts charge from unlikely signal transitions to implement its trigger, so it is invisible to all known side-channel defenses. Additionally, as an analog circuit, our attack operates below the digital layer and is missed by functional verification performed on the hardware description language. (3) We fabricate an openly malicious processor and then evaluate the behavior of our fabricated attacks across many chips and changes in environmental conditions. We compare these results to Simulation Program with Integrated Circuit Emphasis (SPICE) simulation models.

2. BACKGROUND AND THREAT MODEL
The typical design and fabrication process of integrated circuits is shown in Figure 1 (see Rostami16). This process often involves collaboration between different parties all over the world, and each step is likely done by a different team, even within the same company. Therefore, the designs are vulnerable to malicious attacks by rogue engineers involved in any of the above steps.

The design house implements the specification for the chip's behavior in some Hardware Description Language (HDL). Once the specification is implemented in an HDL and that implementation has been verified, the design is passed to a back-end house, which places and routes the circuit. Conventional digital Trojans can only be inserted in the design phase and are more easily detected by design-phase verification. Fabrication-time attacks inserted in the back-end and fabrication phases can evade these defenses. Because implementing an attack at the fabrication phase is strictly more challenging than at the back-end phase, owing to the attacker's limited information about the design and limited ability to modify it, we focus on that threat model for our attack.

The attacker starts with a Graphic Database System II (GDSII) file that is a polygon representation of the completely laid-out and routed circuit. Our threat model assumes that the delivered GDSII file represents a perfect implementation, at the digital level of abstraction, of the chip's specification. This is very restrictive, as it means that the attacker can only modify existing circuits or (as we are the first to show in this paper) add attack circuits to open spaces in the laid-out design. The attacker cannot increase the dimensions of the chip or move existing components around. This restrictive threat model also means that the attacker must perform some reverse engineering to select viable victim flip-flops and wires to tap. After the untrusted fabrication house completes fabrication, it sends the fabricated chips off to a trusted party for post-fabrication testing. Our threat model assumes that the attacker has no knowledge of the test cases used for post-fabrication testing. Such a model dictates the use of a sophisticated trigger to hide the attack.

Figure 1. Typical IC design process with commonly researched threat vectors highlighted in red. The blue text and brackets highlight the party in control of the stage(s). [Flow: third-party IPs and RTL design (VHDL/Verilog) → digital design phase (design house): logic verification, logic synthesis, timing verification → structural netlist → back-end design phase (design house or third party): placement and routing, LVS and DRC check, post-layout timing verification → layout → fabrication (foundry): manufacture → chips → chip verification (design house) → packaging → customers. Marked threat vectors: design-time attack and fabrication-time attack.]

3. ATTACK METHODS
A hardware attack is composed of a trigger and a payload. The trigger monitors wires and state within the design and activates the attack payload under very rare conditions, such that the attack stays hidden during normal operation and testing. Previous research has identified evading detection as a critical property for hardware Trojan designers.5 Evading detection involves more than just avoiding attack activation during normal operation and testing; it also includes hiding from visual/side-channel inspection. There is a trade-off between the two: the more complex the trigger (i.e., the better it hides at run time), the larger the impact that trigger has on the surrounding circuit (i.e., the worse it hides from visual/side-channel inspection).

We propose A2, a fabrication-time attack that is small, stealthy, and controllable. To achieve these outcomes, we develop trigger circuits that operate in the analog domain. The circuits are based on charge accumulating on a capacitor from infrequent events inside the processor. If the charge-coupled infrequent events occur frequently enough, the capacitor fully charges and the payload is activated to deploy a privilege escalation attack. Our analog trigger is similar to the counter-based triggers often used in digital Trojans, except that using the capacitor has the advantage of a natural reset condition due to leakage. Compared
to traditional digital hardware Trojans, the analog trigger maintains a high level of stealth and controllability while dramatically reducing the attack's impact on area, power, and timing. An added benefit of a fabrication-time attack compared to a design-time attack (when digital-only triggers tend to get added) is that it has to pass through fewer verification stages.

3.1. Single-stage trigger circuit
Based on our threat model, the high-level design objectives of our analog trigger circuit are as follows:

1. Functionality: The trigger circuit must be able to detect toggling events of a target victim wire, similar to a digital counter, and it should be able to reset itself if the trigger sequence is not completed in a timely manner.
2. Small area: The trigger circuit should be small enough to be inserted into the empty space of an arbitrary finished chip layout. Small area overhead also implies a better chance to escape detection.
3. Low power: The trigger circuit constantly monitors the victim signals; therefore, its power consumption must be minimized to hide within the normal fluctuations of the entire chip's power consumption.
4. Negligible timing perturbation: The added trigger circuit must not affect the timing constraints for normal operation, and its timing perturbations should not be easily separable from the noise common to path delays.
5. Standard cell compatibility: Since all digital designs are based on standard cells with a fixed cell height, the analog trigger circuit must fit into that height and use only the lowest metal layer for routing.a These requirements are important for insertion into an existing chip layout and make the Trojan more difficult to detect in fabricated chips.

a Several layers of metal wires are used in modern CMOS technologies to connect cells together; lower-level metal wires sit closer to the transistors at the bottom and serve short interconnections, while higher metal layers are used for global routing.

To achieve these design objectives, we propose an attack based on charge accumulation inside capacitors. A capacitor performs analog integration of charge from a victim wire while at the same time being able to reset itself through leakage current. A behavioral model of capacitor-based trigger circuits comprises charge accumulation and leakage, as shown in Figure 2.

Every time the victim wire that feeds the trigger circuit's capacitor toggles, the capacitor's voltage increases by some ΔV. After a number of toggles, the capacitor's voltage exceeds a predefined threshold voltage and enables the trigger's output, deploying the attack payload. The time it takes to activate the trigger is defined as the trigger time (Figure 2).

On the other hand, leakage current exists all the time, and it drains charge from the trigger circuit's capacitor. The attacker can design the capacitor's leakage to be weaker than its accumulation when the trigger input is active. When the trigger input is inactive, leakage gradually reduces the capacitor's voltage, eventually disabling an already activated trigger. This mechanism ensures that the attack is not expressed when no intentional attack happens. The time it takes to reset the trigger output after the trigger input stops is defined as the retention time.

Figure 2. Behavior model of proposed analog trigger circuit. [Block diagram: trigger input → trigger circuit → trigger output. Waveforms over time: trigger input, cap voltage rising toward a threshold, and trigger output, with trigger time and retention time marked.]

Because of leakage, a minimum toggling frequency must be reached to successfully trigger the attack. At the minimum frequency, the charge added in each cycle equals the charge leaked away. Trigger time and retention time are the two main design metrics of the analog trigger circuit, and we can use them to create flexible trigger conditions and more complicated trigger patterns, as discussed in Section 3.2. A stricter triggering condition (i.e., a faster toggling rate and more toggling cycles) reduces the probability of a false trigger during normal operation or testing, but non-idealities in circuits and process, temperature, and voltage variations can cause the attack to fail for some chips, making it either impossible to trigger or trivial to trigger accidentally. As a result, a trade-off must be made between a reliable attack that can be expressed in every chip and a stealthier attack that can only be triggered for certain chips under certain conditions.

The conventional current-based charge pump is not suitable for the attack because of area and power constraints. Instead, a new charge pump circuit based on charge sharing is designed specifically for the attack, as shown in Figure 3. During the negative phase of Clk, Cunit is charged to VDD. Then, during the positive phase of Clk, the two capacitors are shorted together and share charge. After charge sharing, both capacitors settle to the same final voltage, and the change ΔV on Cmain is

$$\Delta V = \frac{C_{unit}\,(V_{DD} - V_0)}{C_{unit} + C_{main}}$$

where V0 is the initial voltage on Cmain before the transition happens. We can achieve different trigger times by sizing the two capacitors. The capacitor keeps leaking over time, and eventually ΔV equals the voltage drop due to leakage, which sets the maximum capacitor voltage.

A transistor-level schematic of the proposed analog trigger is shown in Figure 4. Cunit and Cmain are implemented
with Metal Oxide Semiconductor (MOS) caps. M0 and M1 are the two switches shown in Figure 3. A detector compares the cap voltage with a threshold voltage and can be implemented by inverters or Schmitt triggers. An inverter has a switching voltage that depends on its sizing: when the capacitor voltage is higher than the switching voltage, the output is 0; otherwise, the output is 1. A Schmitt trigger is an inverter with hysteresis. It has a high threshold when the input goes from low to high and a low threshold when the input goes from high to low. The hysteresis is beneficial for our attack because it extends both trigger time and retention time. To balance the leakage current through M0 and M1, an additional leakage path to ground (NMOS M2 in Figure 4) is added to the design.

Figure 5 shows a SPICE simulation waveform that illustrates the operation of our analog trigger circuit after optimization. The operation matches the behavioral model proposed in Figure 2, allowing us to use the behavioral model for system-level attack design.

Figure 3. Design concepts of analog trigger circuit based on capacitor charge sharing. [Labels: VDD, Clk, Cunit, Cmain; cap voltages of Cunit and Cmain versus time.]

Figure 4. Transistor-level schematic of analog trigger circuit. [Labels: VDD, trigger inputs, switches M0 and M1, Cunit, Cmain, leakage transistor M2, detector, trigger output; switch, drain, and cap leakage paths annotated.]

Figure 5. SPICE simulation waveform of analog trigger circuit. [Traces: trigger input, cap voltage, and trigger output over 0 to 6 µs; trigger time 240 ns, retention time 0.8 µs.]

3.2. Multi-stage trigger circuit
The one-stage trigger circuit described in the previous section takes only one victim wire as an input. Using only one trigger input limits the attacker in two ways: (1) because fast toggling of one signal for tens of cycles triggers the single-stage attack, there is still a chance that normal operations or certain benchmarks expose the attack, and (2) certain instructions are required to create fast toggling of a single trigger input, leaving little room for a flexible and stealthy attack program.

We note that an attacker can make a logical combination of two or more single-stage trigger outputs to create a variety of more flexible multi-stage analog triggers. Basic operations to combine two triggers include AND and OR. When analyzing the behavior of logic operations on single-stage trigger outputs, note that a single-stage trigger outputs 0 when triggered. Thus, with an AND operation, the final trigger is activated when either trigger A or trigger B fires; with an OR operation, the final trigger is activated only when both A and B fire. An attacker can combine these simple AND- and OR-connected triggers into an arbitrarily complex multi-level, multi-stage trigger.

3.3. Triggering the attack
For A2, the payload design is independent of the trigger mechanism, so our proposed analog trigger is suitable for various payloads to achieve different attacks. Since the goal of this work is to achieve a Trojan that is nearly invisible while providing a powerful foothold for a software-level attacker, we couple our analog triggers to a privilege escalation attack, which provides maximum capabilities to an attacker. We propose a simple design that overwrites security-critical registers directly by adding one AND/OR gate to the asynchronous set or reset pins of the registers. These set/reset pins exist in the original design for processor reset. The reset signals are asynchronous with no timing constraints, so adding one gate to the reset signal of one register does not affect the functionality or timing constraints of the design. Because there are no timing constraints on asynchronous inputs, the payload circuit can be inserted manually after final placement and routing, in a manner consistent with our threat model.

3.4. Selecting victims
It is important that the attacker validate their choice of victim signal. This requires verifying that the victim wire has low baseline activity and that its activity level is controllable given the attacker's expected level of access. To validate that the victim wire used in A2 has low background activity, we use benchmarks from the MiBench embedded systems benchmark suite. For cases where the attacker does not have access to such software, or where the attacked processor will see a wide range of use, the attacker can follow A2's example and use a multi-stage trigger with wires that toggle in a mutually exclusive fashion and require inputs that are unlikely to be produced by off-the-shelf tools (e.g., the GNU Compiler Collection (GCC)).

Validating that the victim wire is controllable requires that the attacker reason about their expected level of access to the end-user system for the attacked processor. In A2, we
assume that the attacker can load and execute any unprivileged instruction. This allows us to create hand-crafted assembly sequences that activate the attack. This model works for attackers that have an account on the system, attackers in a virtual machine, or even attackers that can convince users to load code.

4. IMPLEMENTATION
To experimentally verify A2, we implement and fabricate an open source processor with the proposed analog Trojans inserted, in 65nm General Purpose (GP) Complementary Metal-Oxide-Semiconductor (CMOS) technology. Multiple attacks are implemented in the chip. One set of attacks consists of Trojans aimed at exposing A2's end-to-end operation, while the other set is implemented outside the processor, directly connected to Input/Output (IO) pins, so that we can investigate trigger behavior directly.

4.1. Attacking a real processor
We implemented an open source OR1200 processor14 to verify our A2 attack, including software triggers, analog triggers, and payload. The OR1200 Central Processing Unit (CPU) is an implementation of the 32-bit OR1K instruction set with a five-stage pipeline. The implemented system in silicon consists of an OR1200 core with a 128B instruction cache and an embedded 128KB main program memory, connected through a Wishbone bus. The OR1K instruction set specifies the existence of a privileged register called the Supervision Register (SR). The SR contains bits that control how the processor operates (e.g., whether Memory Management Units (MMU) and caches are enabled) and flags (e.g., the carry flag). One particular bit is interesting for security purposes: SR[0] controls the privilege mode, with 0 denoting user mode and 1 denoting supervisor mode. By overwriting the value of this register, an attacker can escalate a user-mode process to supervisor mode as a backdoor to deploy various high-level attacks.5, 9 Therefore, the payload of our attack sets this bit in the SR to 1, giving a user-mode process full control over the processor.

Our analog trigger circuits require trigger inputs that can have high switching activity under certain (attacker) programs but are almost inactive during testing or common-case operation, so that the Trojan is not exposed. To search for suitable victim wires as trigger inputs, we run a series of programs from MiBench (see Section 5) on the target processor in an HDL simulator, capturing the toggling rates of all wires. The result shows that approximately 3% of all wires have a nearly zero activity rate, which provides a wide range of options for an attacker. The target signals must also be easy to control by attack programs. In our attack, we select the divide-by-zero flag signal as the trigger for the one-stage attack, because it is unlikely for normal programs to continuously perform division by zero, while it is simple for an attacker to deliberately perform such operations in a tight loop. For the two-stage trigger, we select as trigger inputs the wires that report whether a division was signed or unsigned. The attack program alternately toggles the two wires by performing signed, then unsigned division, until both analog trigger circuits are activated, deploying the attack payload.

Triggering the attack in user-mode-only code is only the first part of a successful attack. For the second part, the attacker must be able to verify that the triggering software works without risk of alerting the operating system. To check whether the attack is successful, we take advantage of a special feature of some registers on the OR1200: some privileged registers can be read by user-mode code, but the value reported has some bits redacted. We use this behavior to let the attacker's code know whether or not it has gained privileged access to the processor.

4.2. Analog activity trigger
We implement both the one-stage and two-stage trigger circuits in 65nm GP CMOS technology based on SPICE simulations. Both trigger circuits are inserted into the processor to demonstrate the attack.

Implementation in 65nm GP technology. For prototype purposes, we optimize the trigger circuit for reliability. Building a circuit that is reliable under process, temperature, and voltage (PVT) variations is always more challenging than optimizing only for a certain PVT range; we construct our attacks so that they work in all fabricated processors in all corner-case environments. 65nm CMOS technology is not a favorable technology for our attack, because its gate oxide is thinner than in older technologies due to dimension scaling, and also thinner than in the latest technologies, which employ high-κ metal gate techniques to reduce gate leakage. However, through careful sizing, it is still possible to design a circuit that is robust across PVT variations, but this requires trading off trigger time and retention time.

To reduce gate leakage, another solution is to use the thick-oxide transistors commonly used in IO cells as the MOS cap for Cmain, which shows negligible gate leakage. This option provides a larger space for configuring trigger time and retention time but requires a larger area due to design rules. The trigger circuit using an IO device is implemented for the two-stage attack, and the one without an IO device is used for the one-stage attack in the system.

Inserting A2 into existing chip layouts. Since A2's analog trigger circuit is designed to follow the sizing and routing constraints of standard cells and has the area of a single standard cell, inserting the trigger circuit into the layout at fabrication time is not complicated. In typical placement and routing cases, around 60% to 70% of the total area is used for standard cells; otherwise, routing cannot complete because of congestion (our chip is more challenging to attack, as it has 80% area utilization). Therefore, empty space exists in any layout of a digital design. This empty space presents an opportunity for attackers, as they can occupy the free space with their own malicious circuit. In our case, we require as little space as one cell. There are four steps to insert a trigger into the layout of a design:

The first step is to locate the signals chosen as trigger inputs and the target registers to attack. The A2 attack can be inserted at either the back-end or the fabrication stage. Our threat model focuses on the fabrication stage because it is significantly more challenging and implies a stealthier attack than back-end-stage attacks. The back-end stage attacker has access to the netlist
SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 87 research highlights

of the design, so locating the desired signal is trivial. But an attack inserted at the back-end stage can still be discovered by SPICE simulation and layout checks, though the chance is extremely low if no knowledge about the attack exists. In contrast, fabrication-time attacks can only be discovered by post-silicon testing, which is believed to be very expensive and to struggle to find small Trojans. To insert an attack during chip fabrication, some insight into the design is needed, which can be extracted from the layout through physical verification tools and digital simulations, or obtained from a co-conspirator involved in the design phase.

The next step is to find empty space around the victim wire and insert the analog trigger circuit. Unused space is usually filled automatically with filler cells or capacitor cells by placement and routing tools. Removing these cells will not affect functionality or timing.

To insert the attack payload circuit, the reset wire needs to be cut, as discussed in Section 3.3. It has been shown that the timing of the reset signal is flexible, so the AND or OR gate only needs to be placed somewhere close to the reset signal. Because the added gates can be minimum-strength cells, their area is small and finding space for them is trivial.

The last step is to manually route from the trigger input wires to the analog trigger circuit and then to the payload circuits. There is no timing requirement on this path, so the routing can go around existing wires on the same metal layer (jogging) or jump over existing wires by going to another metal layer (jumping). If long wires on high metal layers become a concern for the attacker because they may be easier to detect, repeaters (buffers) can be added to break long wires into small sections. Furthermore, the attacker can choose different trigger input wires and/or payloads according to the existing layout of the target design.

In our OR1200 implementation, inserting the attack following the steps above is trivial, even with the design's 80% area utilization. Routing techniques including jogging and jumping are used, but such routing is very common for automatic routing tools, so the information leaked by such wires is limited.

Side-channel information. For the attack to be stealthy and defeat existing protections, the area, power, and timing overhead of the analog trigger circuit should be minimized. High-accuracy SPICE simulation is used to characterize the power and timing overhead of the implemented trigger circuits. Comparisons with several variants of NAND2 and D flip-flop standard cells from commercial libraries are summarized in Table 1. The area of the trigger circuit not using an IO device is similar to that of an X4-strength D flip-flop. Using an IO device increases the trigger circuit size significantly, but the area is still similar to that of two standard cells, which ensures it can be inserted into empty space in the final design layout. AC power is the total energy consumed by the circuits when the input changes; these power numbers are simulated with SPICE on a netlist including extracted parasitics. Standby power is the power consumption of the circuits when inputs are static, which comes from the leakage currents of CMOS devices.

After inserting A2, post-layout simulation with extracted parasitics shows that the extra delay of victim wires is 1.2ps on average, which is only 0.03% of the 4ns clock period and well below the process variation and noise range. In practice, such a delay difference is nearly impossible to measure unless a high-resolution time-to-digital converter is included on chip, which is impractical due to its large area and power overhead.

Comparison to digital-only attacks. Consider the smallest previously proposed digital-only implementation of a privilege escalation attack:5 it requires 25 gates and 80µm², while our analog attack requires as little as one gate for the same effect. Our attack is also much stealthier, as it requires dozens of consecutive rare events, whereas the other attack only requires two. We also implement a digital-only, counter-based attack that aims to mimic A2. The digital version of A2 requires 91 cells and 382µm², almost two orders of magnitude more than the analog counterpart. These results demonstrate how analog attacks can provide attackers the same power and control as existing digital attacks while being much more difficult to catch.

5. EVALUATION
We perform all experiments with our fabricated 2.1mm² malicious OR1200 processor, shown in Figure 6. Figure 6 also marks the locations of the A2 attacks, with two levels of zoom to aid in understanding the challenges of identifying A2 in a sea of non-malicious logic. In fact, A2 occupies less than 0.08% of the chip's area. Our fabricated chip contains two sets of attacks: the first set consists of one- and two-stage triggers baked into the processor, which we use to assess the end-to-end impact of A2. The second set of attacks exist

Table 1. Comparison of area and power between our implemented analog trigger circuits and commercial standard cells in 65nm GP CMOS technology.

Function | Drive strength | Width† | AC power† | Standby power†
NAND2 | X1 | 1 | 1 | 1
NAND2 | X4 | 3 | 3.7 | 4.1
NAND2 | X8 | 5.75 | 7.6 | 8.1
DFF with async reset | X1 | 6 | 12.7 | 2.6
DFF with async reset | X4 | 7.75 | 21.8 | 7.2
DFF with async set and reset | X1 | 7.5 | 14.5 | 3.3
DFF with async set and reset | X4 | 8.75 | 23.6 | 8.1
Trigger w/o IO device | – | 8 | 7.7 | 2.2
Trigger w/ IO device | – | 13.5 | 0.08 | 0.08

* DFF stands for D Flip Flop. † Normalized values.


outside of the processor and are used to fully characterize A2's operation.

We use the testing setup shown in Figure 7 to evaluate our attacks' response to changing environmental conditions and a variety of software benchmarks. The chip is packaged and mounted on a custom testing board to interface with a PC. Through a custom scan chain, we can load programs into the processor's memory and check the values of the processor's registers. The system's clock is provided by an on-chip 240MHz clock generator at the nominal condition (1V supply voltage and 25°C).

5.1. Does the attack work?
To prove the effectiveness of A2, we evaluate it from two perspectives. One is a system evaluation that explores the end-to-end behavior of our attack by loading attack-triggering programs on the processor, executing them in user mode, and verifying that, after executing the trigger sequence, they have escalated privilege on the processor. The other perspective explores the behavior of our attacks by directly measuring the performance of the analog trigger circuit, the most important component of our attack but also the most difficult aspect of our attack to verify using simulation.

System attack. The malicious programs described in Section 4.1 are loaded onto the processor, and then we check the target register values. In the program, we initialize the target registers SR[0] (the mode bit) to user mode (i.e., 0) and SR[1] (a free register bit that we can use to test the two-stage trigger) to 1. When the respective trigger deploys the attack, the single-stage attack will cause SR[0] to suddenly have a 1 value, while the two-stage trigger will cause SR[1] to have a 0 value, the opposite of their initial values. Because our attack relies on analog circuits, environmental aspects dictate its performance. Therefore, we test the chip at six temperatures from −25°C to 100°C to evaluate the robustness of our attack. Measurement results confirm that both the one-stage and two-stage attacks in all ten tested chips successfully overwrite the target registers at all temperatures.

Analog trigger circuit measurement results. Figure 8 shows the measured distribution of retention time and trigger cycles at three different trigger toggling frequencies across ten chips. The results show that our trigger circuits behave regularly in the presence of real-world manufacturing variations, confirming SPICE simulation results. The retention time at the nominal condition (1V supply voltage and 25°C) is around 1µs for the trigger with only core devices and 5µs for attacks constructed using IO devices. The number of cycles needed to trigger the attack for both trigger circuits (i.e., with and without IO devices) is very close in chip measurements and SPICE simulations. The results indicate that SPICE is capable of providing results of sufficient accuracy for these unusual attack circuits.

Figure 6. Die micrograph of analog malicious hardware test chip with a zoom-in layout of inserted A2 trigger.

To verify that the implemented trigger circuits are robust across voltage and temperature variations (as SPICE simulation suggests), we characterize each trigger circuit under different supply voltage and temperature conditions. We

Figure 8. Measured distribution of retention time and trigger cycles under different trigger input divider ratios across 10 chips at nominal 1V supply voltage and 25°C. (a) Distribution of analog trigger circuit using IO device, with trigger inputs toggling at 120MHz, 9.23MHz, and 1.875MHz. (b) Distribution of analog trigger circuit using only core device, with trigger inputs toggling at 120MHz, 34.3MHz, and 10.9MHz; a note in panel (b) reads "2 chips cannot trigger at this switching activity." Each panel plots number of chips against cycles to trigger or retention time (µs).

Figure 7. Testing setup for test chip measurement: packaged test chip on a testing PCB in a temperature chamber, with a power supply and source meter, digital IO, and Labview.

[Figure 6 layout labels: 128KB SRAM main memory, OR1200 core, I$, scan chain, CLK, testing structure, IO drivers and pads, and A2 trigger; die dimensions 1.5mm × 1.4mm; the zoom-in shows Metal 2, Metal 3, and a via.]


confirmed that the trigger circuit can be activated when the victim wire toggles between 0.46MHz and 120MHz, the supply voltage varies between 0.8V and 1.2V, and the ambient temperature varies between −25°C and 100°C. As expected, different conditions yield different minimum toggling rates to activate the trigger. Temperature has a stronger impact than voltage on the trigger condition because of leakage current's exponential dependence on temperature. At higher temperature, more cycles are required to trigger and higher switching activity is required, because leakage from the capacitor is larger.

5.2. Is the attack triggered by non-malicious benchmarks?
Another important property for any hardware Trojan is not exposing itself under normal operation. Because A2's trigger circuit is connected only to the trigger input signal, digital simulation of the design is enough to acquire the activity of the signals. However, since we make use of analog characteristics to attack, analog effects should also be considered as potential causes of accidental triggering. We use MiBench4 as our benchmark suite because it targets the class of processor that best fits the OR1200 and consists of a set of well-understood applications that are popular benchmarks in both academia and industry. To validate that A2's trigger avoids spurious activations from a wide variety of software, we select five benchmark applications from MiBench, each from a different class. This ensures that we thoroughly test all subsystems of the processor, exposing likely activity rates for the wires in the processor. Again, in all programs, the victim registers are initialized to the opposite of the states that A2 puts them in when its attack is deployed. The processor runs all five programs at six different temperatures from −25°C to 100°C. Results show that neither the one-stage nor the two-stage trigger circuit is exposed when running these benchmarks across such a wide temperature range.

5.3. Existing protections
Existing protections against fabrication-time attacks are mostly based on side-channel information, for example, power, temperature, and delay. In A2, we only add one gate in the trigger, thus minimizing power and temperature perturbations caused by the attack.

Table 2 summarizes the average power consumption measured when the processor runs our five benchmark programs at the nominal condition (1V supply voltage and 25°C). Direct measurement of trigger circuit power is infeasible in our setup, so simulation is used as an estimate. The simulated trigger power consumption in Table 1 translates to 5.3nW and 0.5µW for trigger circuits constructed with and without IO devices, respectively. These numbers assume that the trigger inputs keep toggling at 1/4 of the 240MHz clock frequency, which is the maximum switching activity that our attack program can achieve. In the common case of non-attacking software, the switching activity is much lower (approaching zero) and only lasts a few cycles, so the extra power due to our trigger circuit is even smaller. In our experiments, the power of the attack circuit is orders of magnitude less than the normal power fluctuations that occur in a processor while it executes different instructions. Further discussion of possible defenses, such as split manufacturing and runtime verification, is presented in our original A2 paper.21

Table 2. Power consumption of our test chip running a variety of benchmark programs.

Program | Power (mW)
Standby | 6.210
Basic math | 23.703
Dijkstra | 16.550
FFT | 18.120
SHA | 18.032
Search | 21.960
Single-stage attack | 19.505
Two-stage attack | 22.575
Unsigned division | 23.206

6. CONCLUSION
Experimental results with our fabricated malicious processor show that a new style of fabrication-time attack is possible, which applies to a wide range of hardware, spans the digital and analog domains, and affords control to a remote attacker. Experimental results also show that A2 is effective at reducing the security of existing software, enabling unprivileged software to gain full control over the processor. Finally, the experimental results demonstrate the elusive nature of A2: (1) A2 is as small as a single gate, two orders of magnitude smaller than a digital-only equivalent; (2) attackers can add A2 to an existing circuit layout without perturbing the rest of the circuit; (3) a diverse set of benchmarks fails to activate A2; and (4) A2 has little impact on circuit power, frequency, or delay.

Our results expose two weaknesses in current malicious hardware defenses. First, existing defenses analyze either the digital behavior of a circuit using functional simulation or the analog behavior of a circuit using circuit simulation. Functional simulation is unable to capture the analog properties of an attack, while it is impractical to simulate an entire processor for thousands of clock cycles in a circuit simulator; this is why we had to fabricate A2 to verify that it worked. Second, the minimal impact of A2 on the run-time properties of a circuit (e.g., power, temperature, and delay) suggests that it is an extremely challenging task for side-channel analysis techniques to detect this new class of attacks. We believe that our results motivate a different type of defense, where trusted circuits monitor the execution of untrusted circuits, looking for out-of-specification behavior in the digital domain.

Acknowledgments
This work was supported in part by C-FAR, one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. This work was also partially funded by the National Science Foundation. Any opinions, findings, conclusions, and recommendations expressed in this paper are solely those of the authors.
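The trigger characterization in Section 5.1 (more cycles, and a higher minimum toggling rate, are needed when leakage is larger) can be illustrated with a toy discrete-time model. This is a minimal sketch under our own assumptions, not the fabricated circuit: each victim-wire toggle is assumed to push the storage capacitor a fixed fraction `alpha` of the way toward the supply, leakage is assumed to remove a fixed fraction `leak` of the stored voltage every cycle (a larger `leak` standing in for a higher temperature), and the trigger is assumed to fire when the normalized voltage crosses `threshold`. Retention is modeled as the number of idle cycles before the voltage leaks below an assumed release level. All function names and parameter values are hypothetical.

```python
from typing import Optional


def cycles_to_trigger(period: int, alpha: float = 0.1, leak: float = 0.001,
                      threshold: float = 0.6,
                      max_cycles: int = 100_000) -> Optional[int]:
    """Clock cycle at which the toy trigger fires, or None if it never does.

    period:    the victim wire toggles once every `period` clock cycles.
    alpha:     fraction of the remaining headroom added per toggle (assumed).
    leak:      fraction of stored voltage lost per clock cycle (assumed;
               a larger value stands in for a higher temperature).
    threshold: normalized voltage at which the trigger output flips (assumed).
    """
    v = 0.0  # normalized capacitor voltage: 0 = discharged, 1 = supply
    for cycle in range(1, max_cycles + 1):
        v *= 1.0 - leak               # leakage acts every cycle
        if cycle % period == 0:
            v += alpha * (1.0 - v)    # one toggle pushes charge in
        if v >= threshold:
            return cycle
    return None


def retention_cycles(v_start: float, leak: float, release: float = 0.3) -> int:
    """Idle cycles until the stored voltage leaks below an assumed release level."""
    if leak <= 0.0:
        raise ValueError("leak must be positive, or the voltage never decays")
    v, cycles = v_start, 0
    while v >= release:
        v *= 1.0 - leak
        cycles += 1
    return cycles


if __name__ == "__main__":
    room, hot = 0.001, 0.004  # hypothetical per-cycle leakage fractions
    for period in (1, 8, 64):
        print(f"toggle every {period:>2} cycles: "
              f"room={cycles_to_trigger(period, leak=room)}, "
              f"hot={cycles_to_trigger(period, leak=hot)}")
    print("retention (cycles):",
          retention_cycles(0.9, room), retention_cycles(0.9, hot))
```

With these made-up parameters the model reproduces only the qualitative trends: with a toggle every 8 cycles the "hot" case needs more cycles to fire, and with a toggle every 64 cycles only the "room" case fires at all, because the larger leakage drains charge faster than the toggles replace it. It does not reproduce the measured cycle counts or retention times in Figure 8.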


References
1. Agrawal, D., Baktir, S., Karakoyunlu, D., Rohatgi, P., Sunar, B. Trojan detection using IC fingerprinting. In Symposium on Security and Privacy (S&P, Washington, DC, 2007). IEEE Computer Society, 296–310.
2. Becker, G.T., Regazzoni, F., Paar, C., Burleson, W.P. Stealthy dopant-level hardware Trojans. In International Conference on Cryptographic Hardware and Embedded Systems (CHES, Berlin, Heidelberg, 2013). Springer-Verlag, 197–214.
3. Forte, D., Bao, C., Srivastava, A. Temperature tracking: An innovative run-time approach for hardware Trojan detection. In International Conference on Computer-Aided Design (ICCAD, 2013). IEEE, 532–539.
4. Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B. MiBench: A free, commercially representative embedded benchmark suite. In Workshop on Workload Characterization (Washington D.C., 2001). IEEE Computer Society, 3–14.
5. Hicks, M., Finnicum, M., King, S.T., Martin, M.M.K., Smith, J.M. Overcoming an untrusted computing base: Detecting and removing malicious hardware automatically. USENIX ;login: 35, 6 (Dec. 2010), 31–41.
6. Hicks, M., Sturton, C., King, S.T., Smith, J.M. Specs: A lightweight runtime mechanism for protecting software from security-critical processor bugs. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS, Istanbul, Turkey, 2015). ACM, 517–529.
7. Jin, Y., Makris, Y. Hardware Trojan detection using path delay fingerprint. In Hardware-Oriented Security and Trust (HOST, Washington, DC, 2008). IEEE Computer Society, 51–57.
8. Kelly, S., Zhang, X., Tehranipoor, M., Ferraiuolo, A. Detecting hardware Trojans using on-chip sensors in an ASIC design. Journal of Electronic Testing 31, 1 (Feb. 2015), 11–26.
9. King, S.T., Tucek, J., Cozzie, A., Grier, C., Jiang, W., Zhou, Y. Designing and implementing malicious hardware. In Workshop on Large-Scale Exploits and Emergent Threats, volume 1 of LEET (USENIX Association, Apr. 2008).
10. Kumar, R., Jovanovic, P., Burleson, W., Polian, I. Parametric Trojans for fault-injection attacks on cryptographic hardware. In Workshop on Fault Diagnosis and Tolerance in Cryptography (IEEE, FDT, 2014), 18–28.
11. Li, J., Lach, J. At-speed delay characterization for IC authentication and Trojan horse detection. In Hardware-Oriented Security and Trust (HOST, Washington, DC, 2008). IEEE Computer Society, 8–14.
12. Li, M.-L., Ramachandran, P., Sahoo, S.K., Adve, S.V., Adve, V.S., Zhou, Y. Understanding the propagation of hard errors to software and implications for resilient system design. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS, Seattle, WA, Mar. 2008). ACM, 265–276.
13. Narasimhan, S., Wang, X., Du, D., Chakraborty, R.S., Bhunia, S. TeSR: A robust temporal self-referencing approach for hardware Trojan detection. In Hardware-Oriented Security and Trust (HOST, San Diego, CA, June 2011). IEEE Computer Society, 71–74.
14. OpenCores.org. OpenRISC OR1200 processor.
15. Potkonjak, M., Nahapetian, A., Nelson, M., Massey, T. Hardware Trojan horse detection using gate-level characterization. In Design Automation Conference, volume 46 of DAC (2009), 688–693.
16. Rostami, M., Koushanfar, F., Rajendran, J., Karri, R. Hardware security: Threat models and metrics. In Proceedings of the International Conference on Computer-Aided Design (ICCAD, San Jose, CA, 2013). IEEE Press, 819–823.
17. Sugawara, T., Suzuki, D., Fujii, R., Tawa, S., Hori, R., Shiozaki, M., Fujino, T. Reversing stealthy dopant-level circuits. In International Conference on Cryptographic Hardware and Embedded Systems (CHES, New York, NY, 2014). Springer-Verlag, 112–126.
18. S.S. Technology. Why node shrinks are no longer offsetting equipment costs (online webpage, Oct. 2012).
19. Waksman, A., Sethumadhavan, S. Silencing hardware backdoors. In IEEE Security and Privacy (S&P, Oakland, CA, May 2011). IEEE Computer Society.
20. Wang, X., Narasimhan, S., Krishna, A., Mal-Sarkar, T., Bhunia, S. Sequential hardware trojan: Side-channel aware design and placement. In Computer Design (ICCD), 2011 IEEE 29th International Conference on (IEEE, Oct 2011), 297–300.
21. Yang, K., Hicks, M., Dong, Q., Austin, T., Sylvester, D. A2: Analog malicious hardware. In 2016 IEEE Symposium on Security and Privacy (SP) (May 2016). IEEE Computer Society, 18–37.

* Kaiyuan Yang ([email protected]), Dept. of ECE, Rice University, Houston, TX.
* Matthew Hicks ([email protected]), Dept. of CS, Virginia Tech, Blacksburg, VA.
Qing Dong, Todd Austin, and Dennis Sylvester ({kaiyuan, mdhicks, qingdong, austin, dmcs}@umich.edu), Department of EECS, University of Michigan, Ann Arbor, MI.
* This work was done at the University of Michigan, Ann Arbor.

© 2017 ACM 0001-0782/17/09 $15.00


Technical Perspective
Humans and Computers Working Together on Hard Tasks
By Ed H. Chi
DOI:10.1145/3068614
To view the accompanying paper, visit doi.acm.org/10.1145/3068663

THE FIELD OF crowdsourcing and human computation has evolved considerably from its early days. At first, crowdsourcing was mainly conceived as a way to obtain ground truth labels for datasets, particularly image datasets, in the mid-2000s. Soon after, researchers began to utilize crowdsourcing for performing large-scale user studies of systems.a,b As our understanding of crowdsourcing continued to evolve, researchers realized that workers can be reserved ahead of time to perform real-time tasks.c Utilizing this idea, the system described in the following paper demonstrates how a crowd of workers can caption speech nearly as well as a professional captionist. Importantly, this paper was one of the first in a recent set of crowdsourcing papers that demonstrated how human workers can collaborate in concert with computing systems to accomplish a real-time task that is difficult for either one to do by itself. This is notable for many reasons, but let me first summarize the significance of this work.

First, the system demonstrated that significant innovation is needed to get human workers to productively perform the captioning task. For example, the Scribe system slows down the continuous speech for a brief period of time, with the right volume changes to emphasize which passage the worker should transcribe. The volume variations help with audio saliency. This technique is interesting to human-computer interaction (HCI) researchers, since it utilizes our intuition about how we can direct human attention, helping to transform individual untrained workers into better captionists.

Second, the system uses a MapReduce programming paradigm to divide and conquer the various pieces of the captioning task, and coordinates the workers and their tasks through this organizational paradigm. First introduced by Kittur et al.,d this is a clever application of the MapReduce paradigm, but instead of applying it to computing tasks, the system applies the concept to organizing human tasks.

Third, and impressively, to combine the partial contributions from individual workers, the system utilizes a sequence alignment algorithm to merge the streams of input from various workers. This is novel because most crowdsourcing systems use a simple majority voting approach to combine worker inputs. The use of a sophisticated algorithm here is necessary to fit the captioning problem, and it points to the possibility of other combiner functions for other problems in future research. A natural extension of the alignment algorithm here would be to utilize a task-specific language model trained using deep learning.

From a historical perspective, augmenting humans has been at the very center of much personal computing and HCI research. There has been much talk about the degree to which machine learning (ML) will replace human labor (HL) in the future, but I think that is misguided. Instead, what we see in this research is a good example in which humans and machines work in concert on a very hard task that is currently still too difficult to do by either alone. Interestingly, this aligns well with a historical recounting of the code-breaking work by Turing and colleagues at Bletchley Park in a recent issue of Communications: "Another myth is that code-breaking machines eliminated human labor and code-breaking skill ... Technology transcended, rather than supplemented, human labor and bureaucracy."e The article points out that the real challenge of the whole effort was a combination of the management of a (mostly female!) human operator force along with the Enigma machines. From my perspective, intelligent augmentation of our abilities is the real research frontier.

While we continue to explore the boundary of what is possible for machine intelligence, we should also be exploring the boundary of how humans will interact with machine intelligence. For example, how can we have an intelligent conversation with computing systems? Can I talk to a restaurant recommendation system while I drive home to get ready for a dinner date? How should my television respond if I say I wanted an exciting action film tonight that takes into account the tastes of other family members? If it doesn't have enough information on everyone in the room, will it (he/she?) ask intelligent questions while naturally conversing with my guests? Can I give feedback via both hand gestures and voice dialog?

Since an important application of machine intelligence is to augment humans in their desires, goals, and tasks, what we should do is ask important research questions about human interactions with ML systems. In other words, we should have much better research on ML+HL, ML+HCI, and ML+Human Interaction, and this research is a shining example that points the way.

a Kittur, A., Chi, E.H., Suh, B. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the ACM Conference on Human Factors in Computing Systems, ACM Press (Florence, Italy, 2008), 453–456.
b Egelman, S., Chi, E.H., Dow, S. Crowdsourcing in HCI research. Ways of Knowing in HCI. J.S. Olson and W.A. Kellogg, Eds. Springer, NY, 2014, 267–289.
c Bernstein, M., Brandt, J., Miller, R., and Karger, D. Crowds in two seconds: Enabling real-time crowd-powered interfaces. UIST 2011.
d Kittur, A., Smus, B., Khamkar, S., and Kraut, R.E. CrowdForge: Crowdsourcing complex work. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (2011), 43–52; http://dx.doi.org/10.1145/2047196.2047202
e Haigh, T. Colossal genius: Tutte, Flowers, and a bad imitation of Turing. Commun. ACM 60, 1 (Jan. 2017), 29–35; https://doi.org/10.1145/3018994

Ed H. Chi is Research Lead Manager and Sr. Staff Research Scientist at Google Inc., Mountain View, CA.

Copyright held by author.
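The Perspective's third point, that sequence alignment rather than majority voting is what lets partial transcripts be combined, can be illustrated with a toy two-worker merge. This is a minimal sketch under our own assumptions: it uses the textbook Needleman-Wunsch global alignment over words with assumed scores (+1 for a match, -1 for a mismatch or a gap), and the names `align` and `merge` are hypothetical. Scribe's actual combiner is a custom multiple-sequence alignment over many workers' streams, which this sketch does not reproduce.

```python
from typing import List, Optional, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scoring, not Scribe's


def align(a: List[str], b: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Global word-level alignment (Needleman-Wunsch) of two token lists."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], None))   # word only worker a typed
            i -= 1
        else:
            pairs.append((None, b[j - 1]))   # word only worker b typed
            j -= 1
    pairs.reverse()
    return pairs


def merge(a: List[str], b: List[str]) -> List[str]:
    """Stitch two partial transcripts; on a mismatch, keep worker a's word."""
    return [x if x is not None else y for x, y in align(a, b)]
```

For example, merging worker A's "the quick brown fox" with worker B's "brown fox jumps over" yields "the quick brown fox jumps over": the shared words anchor the alignment, whereas a per-position vote would compare unrelated words, because each worker typed a different stretch of speech. This sketch still penalizes gaps at the ends; a semi-global variant that leaves end gaps free would be a natural refinement for overlapping fragments.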

DOI:10.1145/3068663
Scribe: Deep Integration of Human and Machine Intelligence to Caption Speech in Real Time
By Walter S. Lasecki, Christopher D. Miller, Iftekhar Naim, Raja Kushalnagar, Adam Sadilek, Daniel Gildea, and Jeffrey P. Bigham

Abstract
Quickly converting speech to text allows deaf and hard of hearing people to interactively follow along with live speech. Doing so reliably requires a combination of perception, understanding, and speed that neither humans nor machines possess alone. In this article, we discuss how our Scribe system combines human labor and machine intelligence in real time to reliably convert speech to text with less than 4s latency. To achieve this speed while maintaining high accuracy, Scribe integrates automated assistance in two ways. First, its user interface directs workers to different portions of the audio stream, slows down the portion they are asked to type, and adaptively determines segment length based on typing speed. Second, it automatically merges the partial input of multiple workers into a single transcript using a custom version of multiple-sequence alignment. Scribe illustrates the broad potential for deeply interleaving human labor and machine intelligence to provide intelligent interactive services that neither can currently achieve alone.

The original version of this paper is entitled "Real-Time Captioning by Groups of Non-Experts" and was published in UIST, 10/2012, ACM.

1. INTRODUCTION AND BACKGROUND
Real-time captioning converts speech to text in under 5s to provide access to live speech content for deaf and hard of hearing (DHH) people in classrooms, meetings, casual conversation, and other events. Current options are severely limited because they either require highly skilled professional captionists, whose services are expensive and not available on demand, or use automatic speech recognition (ASR), which produces unacceptable error rates in many real-world situations.10 We present an approach that leverages groups of non-expert captionists (people who can hear and type, but are not specially trained stenographers) to collectively caption speech in real time, and explore this new approach via Scribe, our end-to-end system allowing on-demand real-time captioning for live events.19 Scribe integrates human and machine intelligence in real time to reliably caption speech at natural speaking rates.

The World Health Organization (WHO) estimates that around 5% of the world population, that is, 360 million people, have disabling hearing loss.32 They struggle to understand speech and benefit from visual input. Some combine lip-reading with listening, while others primarily watch visual translations of aural information, such as sign language interpreters or real-time typists. While visual access to spoken material can be achieved through sign language interpreters, many DHH people do not know sign language. This is particularly true of the large (and increasing) number of DHH people who lost their hearing later in life, which includes one third of people over 65.12 Captioning may also be preferred by some to sign language interpreting for technical domains because it does not involve translating from the spoken language to the sign language, but rather transliterating an aural representation to a written one. Finally, like captionists, sign language interpreters are also expensive and difficult to schedule.

Sign languages, such as American Sign Language (ASL), are not simply codes for an aural language, but rather entirely different languages with their own vocabulary, grammar, and syntax.

People learn to listen and speak at a natural rate of 120–180 words per minute (WPM).17 They acquire this skill effortlessly without direct instruction while growing up or being immersed in daily linguistic interaction, unlike text generation, which is a trained skill that averages 60–80 WPM for both handwriting29 and typing.14 Professional captionists (stenographers) can keep up with most speakers and provide captions that are accurate (95%+) and real-time (within a few seconds). But they are not on demand (they need to be pre-booked for at least an hour) and are expensive ($120–$200 per hour).30 As a result, professionals usually cannot provide access for last-minute lectures or other events, or for unpredictable and ephemeral learning opportunities, such as conversations with peers after class.

Automatic speech recognition (ASR) is inexpensive and available on demand, but its low accuracy in many real settings makes it unusable. For example, ASR accuracy drops below 50% when it is not trained on the speaker, when captioning multiple speakers, and/or when not using a high-quality microphone located close to the speaker.3, 6 Both ASR and the software used to assist real-time captionists often make errors that can change the meaning of the original speech. As DHH people use context to compensate for errors, they often have trouble following the speaker.6

Our approach is to combine the efforts of multiple non-expert captionists. Because these non-expert captionists can be drawn from more diverse labor pools than professional captionists, they are more affordable and more easily available on demand. Recent work has shown, for instance, that

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 93 research highlights

workers on Mechanical Turk can be recruited within a few seconds,1, 2, 11 and engaged in continuous tasks.21, 24, 25, 28 Recruiting from a broader pool allows workers to be selectively chosen for their expertise not in captioning but in the technical areas covered in a lecture. While professional stenographers are able to type faster and more accurately than most crowd workers, they are not necessarily experts in the field they are captioning, which can lead to mistakes that distort the meaning of transcripts of technical talks.30 Scribe allows student workers to serve as non-expert captionists for $8–$12 per hour (a typical work-study pay rate). Therefore, we could hire several students for much less than the cost of one professional captionist.

Scribe makes it possible for non-experts to collaboratively caption speech in real time by providing automated assistance in two ways. First, it assists captionists by making the task easier for each individual: it directs each worker to type only part of the audio stream, it slows down the portion they are asked to type so they can more easily keep up, and it adaptively determines the segment length based on each individual’s typing speed. Second, it solves the coordination problem for workers by automatically merging the partial input of multiple workers into a single transcript using a custom version of multiple-sequence alignment.

Because captions are dynamic, readers spend far more mental effort reading real-time captions than reading static text. Also, regardless of method, captions require users to absorb information that is otherwise consumed via two senses (vision and hearing) through only one (vision). This is particularly common in classroom settings, where content appears on the board and is referenced in speech. The effort required to track both the captions and the material they pertain to simultaneously is one possible reason why deaf students often lag behind their hearing peers, even with the best accommodations.26 To address these issues, we also explore how captions can best be presented to users,16 and show that controlling bookmarks in caption playback can even increase comprehension.22

This paper outlines the following contributions:
• Scribe, an end-to-end system that has advantages over current state-of-the-art solutions in terms of availability, cost, and accuracy.
• Evidence that non-experts can collectively cover speech at rates similar to or above that of a professional.
• Methods for quickly merging multiple partial captions to create a single, accurate stream of final results.
• Evidence that Scribe can produce transcripts that both cover more of the input signal and are more accurate than either ASR or any single constituent worker.
• The idea of automatically combining the real-time efforts of dynamic groups of workers to outperform individuals on human performance tasks.

2. CURRENT APPROACHES
We first overview current approaches for real-time captioning, introduce our data set, and define the evaluation metrics used in this paper. Methods for producing real-time captioning services come in three main varieties:

Computer-Aided Real-time Transcription (CART): CART is the most reliable real-time captioning service, but is also the most expensive. Trained stenographers type in shorthand on a “steno” keyboard that maps multiple key presses to phonemes that are expanded to verbatim text. Stenography requires 2–3 years of training to consistently keep up with natural speaking rates that average 141 WPM and can reach 231 WPM.13

Non-Verbatim Captioning: In response to the cost of CART, computer-based macro expansion services like C-Print were introduced.30 C-Print captionists need less training, and generally charge around $60 an hour. However, they normally cannot type as fast as the average speaker’s pace, and cannot produce a verbatim transcript. Scribe employs captionists with no training and compensates for slower typing speeds and lower accuracy by combining the efforts of multiple parallel captionists.

Automated Speech Recognition: ASR works well in ideal situations with high-quality audio equipment, but degrades quickly in real-world settings. ASR has difficulty recognizing domain-specific jargon, and adapts poorly to changes, such as when the speaker has a cold.6 ASR systems can require substantial computing power and special audio equipment to work well, which lowers availability. In our experiments, we used Dragon Naturally Speaking 11.5 for Windows.

Re-speaking: In settings where trained typists are not common (such as in the U.K.), alternatives have arisen. In re-speaking, a person listens to the speech and enunciates clearly into a high-quality microphone, often in a special environment, so that ASR can produce captions with high accuracy. This approach is generally accurate, but cannot produce punctuation, and has considerable delay. Additionally, re-speaking still requires extensive training, since simultaneous speaking and listening is challenging.

3. LEGION: SCRIBE
Scribe gives users on-demand access to real-time captioning from groups of non-experts via their laptop or mobile devices (Figure 1). When a user starts Scribe, it immediately begins recruiting workers to the task from Mechanical Turk, or a pool of volunteer workers, using LegionTools.11, 20 When users want to begin captioning audio, they press the start button, which forwards audio to Flash Media Server (FMS) and signals the Scribe server to begin captioning.

Workers are presented with a text input interface designed to encourage real-time answers and increase global coverage (Figure 2). A display shows workers their rewards for contributing in the form of both money and points. In our experiments, we paid workers $0.005 for every word the system thought was correct. As workers type, their input is forwarded to an input combiner on the Scribe server. The input combiner is modular to accommodate different implementations without needing to modify Scribe. The combiner and interface are discussed in more detail later in this article.
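The article describes the input combiner only as a modular component that can be swapped without modifying Scribe. As a hedged sketch of what such a plug-in boundary might look like, the Python below defines a hypothetical `Combiner` protocol and a deliberately simple `AgreementCombiner` that keeps a word once a minimum number of distinct workers typed it inside the same time bucket. The message format (`WordEvent`), the bucket size, and the agreement rule are all assumptions made for illustration; this is not Scribe's combiner, which the article builds on multiple-sequence alignment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class WordEvent:
    """One locked-in word from one captionist (hypothetical message format)."""
    worker_id: str
    word: str
    timestamp: float  # seconds since the audio stream started


class Combiner(Protocol):
    """Hypothetical plug-in boundary: the server only calls these two methods."""

    def add(self, event: WordEvent) -> None: ...

    def transcript(self) -> List[str]: ...


class AgreementCombiner:
    """Toy combiner: keep a word once >= min_agree distinct workers typed it
    inside the same fixed-size time bucket, then emit kept words in time order.
    Repeated words within one bucket collapse into one; a real combiner would
    align the partial captions instead."""

    def __init__(self, bucket: float = 2.0, min_agree: int = 2) -> None:
        self.bucket = bucket
        self.min_agree = min_agree
        # (bucket index, lowercased word) -> {worker_id: first timestamp}
        self._votes: Dict[Tuple[int, str], Dict[str, float]] = defaultdict(dict)

    def add(self, event: WordEvent) -> None:
        key = (int(event.timestamp // self.bucket), event.word.lower())
        self._votes[key].setdefault(event.worker_id, event.timestamp)

    def transcript(self) -> List[str]:
        kept = [
            (sum(times.values()) / len(times), word)  # mean time orders output
            for (_, word), times in self._votes.items()
            if len(times) >= self.min_agree
        ]
        return [word for _, word in sorted(kept)]


def caption(combiner: Combiner, events: Iterable[WordEvent]) -> List[str]:
    """Feed events to any combiner implementation and return its output."""
    for event in events:
        combiner.add(event)
    return combiner.transcript()
```

Because `caption` depends only on the protocol, a different merging strategy (for example an alignment-based one) can be dropped in without changing the code that feeds it, which is the property the paragraph above attributes to Scribe's combiner.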


Figure 1. Scribe allows users to caption audio on their mobile device. The audio is sent to multiple amateur captionists who use Scribe’s Web-based interface to caption as much of the audio as they can in real time. These partial captions are sent to our server to be merged into a final output stream, which is then forwarded back to the user’s mobile device. Crowd workers are optionally recruited to edit the captions after they have been merged.
[Diagram: system overview showing the speech source, Flash Media Server, workers’ partial captions (e.g., “we have a crystal”, “have a crystal that has”, “has a two-fold axis”), merging on the Scribe server, crowd corrections, and the merged output “we have a crystal that has a two-fold axis...”.]

Figure 2. The original worker interface encourages captionists to type quickly by locking in words soon after they are typed. To encourage coverage of specific segments, visual and audio cues are presented, the volume is reduced during off periods, and rewards are increased during these periods.

The user interface for Scribe presents streaming text within a collaborative editing framework (see Figure 3). Scribe’s interface masks the staggered and delayed format of real-time captions with a more natural flow that mimics writing. In doing this, the interface presents the merged inputs from the crowd workers via a dynamically updating Web page, and allows users to focus on reading instead of tracking changes. We have also developed methods for letting users have more control over their own caption playback, which can improve comprehension.22 When users are done, pressing stop ends the audio stream but lets workers complete their current transcription task. Workers are asked to continue working on other audio for a time to keep them active, so that response time is reduced if users need to resume captioning.

Though this article focuses on captioning speech from a single person, Scribe can handle dialogues using automated speaker segmentation techniques. We use a standard convolution-based kernel method to first identify distinct segments in a waveform. We then use a one-class support vector machine (SVM) to classify each segment and assign a speaker ID.15 Prior work has shown such segmentation techniques to be accurate even in the presence of severe noise, such as when talking on a cellphone while driving.12 The segmentation allows us to decompose a dialogue in real time, then caption each part individually, without burdening workers with the need to determine and annotate which person is currently speaking.

Figure 3. The Web-based interface that shows users the live caption stream returned by Scribe.
Our solution to the transcription problem is two-fold. First, we designed an interface that facilitates real-time captioning by non-experts and encourages covering the entire audio signal. Second, we developed algorithms for merging partial captions to form one final output stream. The interface and algorithm have been developed to address these problems jointly. For instance, because determining where each word in a partial caption fits into the final transcript is difficult, we designed the interface to encourage workers to type continuous segments during specified periods.


In the following sections, we detail the co-evolution of the worker interface and the algorithm for merging partial captions in order to form a final transcript.

4. COORDINATING CAPTIONISTS
Scribe’s non-expert captioning interface allows contributors to hear an audio stream of the speaker(s), and provide captions with a simple user interface (UI) (Figure 2). Captionists are instructed to type as much as they can, but are under no pressure to type everything they hear. If they are able, workers are asked to separate contiguous sequences of words by pressing enter. Knowing which word sequences are likely to be contiguous can help later when recombining the partial captions from multiple captionists.

To encourage real-time entry of captions, the interface “locks in” words a short time (500ms) after they are typed. New words are identified when the captionist types a space after the word, and are sent to the server. The delay lets workers correct their input while adding as little additional latency as possible. When the captionist presses enter (or after a 2s timeout during which they have not typed anything), the line is confirmed and animates upward. During the 10–15s trip to the top of the display (depending on settings), words that Scribe determines were entered correctly (based on either spell-checking or overlap with another worker) are colored green. When the line reaches the top, a point score is calculated for each word based on its length and whether it has been determined to be correct.

To recover the true speech, non-expert captions must cover all of the words spoken. A primary reason why the partial transcriptions may not fully cover the true signal relates to saliency, which is defined in a linguistic context as “that quality which determines how semantic material is distributed within a sentence or discourse, in terms of the relative emphasis which is placed on its various parts”.7 Numerous factors influence what is salient, so it is likely to be difficult to detect automatically. Instead, we inject artificial saliency adjustments by systematically varying the volume of the audio signal that captionists hear. Scribe’s captionist interface can vary the volume over a given period with an assigned offset. It also displays visual reminders of the period to further reinforce this notion.

Initially, we tried dividing the audio signal into segments that we gave to individual workers. We found several problems with this approach. First, workers tended to take longer to provide their transcriptions, as it took them some time to get into the flow of the audio. A continuous stream avoids this problem. Second, the interface seemed to encourage workers to favor quality over speed, whereas streaming content reminds workers of the real-time nature of the task. The continuous interface was designed in an iterative process involving tests with 57 remote and local users with a range of backgrounds and typing abilities. These tests showed that workers tended to provide chains of words rather than disjoint words, and needed to be informed of the motivations behind aspects of the interface to use them properly.

A non-obvious question is what the period of the volume changes should be. In our experiments, we chose to play the audio at regular volume for 4s and then at a lower volume for 6s. This seems to work well in practice, but it is likely not ideal for everyone (discussed below). Our experience suggests that keeping the in period short is preferable even when a particular worker could type more than the period contains, because the latency of a worker’s input tended to go up as they typed more consecutive words.

5. IMPROVING HUMAN PERFORMANCE
Even when workers are directed to small, specific portions of the audio, the resulting partial captions are not perfect. This is due to several factors, including common bursts of faster speech, and workers mishearing some content due to a particular accent or audio disruption. To make the task easier for workers, we created TimeWarp,23 which allows each worker to type what they hear in clips with a lower playback rate, while still keeping up with real time and maintaining context from content they are not responsible for.

5.1. Warping time
TimeWarp manages this by balancing the play speed during in periods, where workers are expected to caption the audio and the playback speed is reduced, and out periods, where workers listen to the audio and the playback speed is increased. A cycle is one in period followed by an out period. At the beginning of each cycle, the worker’s position in the audio is aligned with the real-time stream. To do this, we first need to select the number $N$ of different sets of workers that will be used to partition the stream. We call the length of the in period $P_i$, the length of the out period $P_o$, and the play speed reduction factor $r$. Therefore, the playback rate during in periods is $\frac{1}{r}$. The amount of the real-time stream that gets buffered while playing at the reduced speed is compensated for by an increased playback speed of $\frac{N-1}{N-r}$ during out periods. The result is that the cycle time of the modified stream equals the cycle time of the unmodified stream.

To set the length of $P_i$ for our experiments, we conducted preliminary studies with 17 workers drawn from Mechanical Turk. We found that their mean typing speed was 42.8 WPM on a similar real-time captioning task. We also found that a worker could type at most 8 words in a row on average before the per-word latency exceeded 8s (our upper bound on acceptable latency). Since the mean speaking rate is around 150 WPM,13 workers will hear 8 words in roughly 3.2s, with an entry time of roughly 8s from the last word spoken. We used this to set $P_i = 3.25$s, $P_o = 9.75$s, and $N = 4$. We chose $r = 2$ in our tests so that the playback speed would be $\frac{1}{r} = \frac{1}{2} = 0.5$ times for in periods, and the play speed for out periods is $\frac{N-1}{N-r} = \frac{3}{2} = 1.5$ times.

To speed up and slow down the play speed of content being provided to workers without changing the pitch (which would make the content more difficult for the worker to understand), we use the Waveform Similarity Based Overlap and Add (WSOLA) algorithm.4 WSOLA works by dividing the signal into small segments, then either skipping (to increase play speed) or adding (to decrease play speed) content, and finally stitching these segments back together. To reduce the number of sound artifacts, WSOLA finds overlap points with similar waveforms, then gradually transitions between sequences during these overlap periods.
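The out-period rate follows from a short derivation. Assuming, as the chosen values suggest ($P_o = 9.75 = 3 \times 3.25$), that $P_o = (N-1)P_i$, an in period plays $P_i$ seconds of audio at rate $\frac{1}{r}$ and therefore takes $rP_i$ wall-clock seconds. A full cycle spans $NP_i$ seconds of real time, which leaves $(N-r)P_i$ seconds to play the $(N-1)P_i$ seconds of out-period audio, hence the rate $\frac{N-1}{N-r}$. The Python sketch below (function and field names are ours, not the authors') computes one cycle with exact fractions and lists one plausible set of per-group start offsets, $kP_i$, which reproduces the four offsets used in Section 7.3 (0s, 3.25s, 6.5s, 9.75s).

```python
from fractions import Fraction


def timewarp_schedule(p_in, n_groups, r):
    """One TimeWarp cycle for N worker groups (sketch; names are illustrative).

    p_in     -- P_i, seconds of audio a worker captions per cycle
    n_groups -- N, number of worker groups that partition the stream
    r        -- slow-down factor for in periods (needs 1 <= r < N)
    """
    p_in, r = Fraction(p_in), Fraction(r)
    if not 1 <= r < n_groups:
        raise ValueError("need 1 <= r < N so the out period can catch up")
    p_out = (n_groups - 1) * p_in                       # P_o: audio only listened to
    rate_in = 1 / r                                     # slowed playback while typing
    rate_out = Fraction(n_groups - 1) / (n_groups - r)  # sped-up playback
    wall_in = p_in / rate_in                            # = r * P_i seconds
    wall_out = p_out / rate_out                         # = (N - r) * P_i seconds
    return {
        "p_out": p_out,
        "rate_in": rate_in,
        "rate_out": rate_out,
        "wall_cycle": wall_in + wall_out,               # equals P_i + P_o: no drift
        "offsets": [k * p_in for k in range(n_groups)],  # hypothetical group starts
    }


schedule = timewarp_schedule("3.25", 4, 2)
print(float(schedule["rate_in"]), float(schedule["rate_out"]))  # 0.5 1.5
print(float(schedule["wall_cycle"]))                              # 13.0
print([float(t) for t in schedule["offsets"]])                    # [0.0, 3.25, 6.5, 9.75]
```

Using `Fraction` keeps the cycle-time check exact; with floats, the equality between wall-clock cycle time and $P_i + P_o$ could fail by rounding error.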


5.2. Integrating ASR into crowd captioning
Combining ASR with human captioning workflows can also help improve captioning performance. By using the suggestions from an ASR system to provide an initial “baseline” answer that crowd workers can correct, we can reduce latency. However, above an error rate of roughly 30%, the ASR input actually increases latency because of the cost of finding and repairing mistakes.9 The opposite integration is also possible: by using sparse human input to provide corrections to the word lattice of an ASR system, it is possible to reduce the error rate.8

6. AGGREGATING PARTIAL CAPTIONS
The problem of aligning and aggregating multiple partial transcripts can be mapped to the well-studied Multiple Sequence Alignment (MSA) problem. The basic formulation of the problem involves some number of ordered sequences that include at least some similar elements (coming from the same “dictionary” of possible terms plus a “gap” term). Finding the alignment that minimizes total distance between all pairs of sequences is a non-trivial problem because, in the worst case, all possible alignments of the content of each sequence (including all possible spaces containing a gap term) may need to be explored. This optimization problem has been shown to be NP-complete,31 and exact algorithms have time complexity that is exponential in the number of sequences. As a result, it is often necessary to apply heuristic approximations to perform MSA within a reasonable amount of time.

In practice, MSA is a well-studied problem in the bioinformatics literature that has long been used in aligning genome sequences, but it also has applications in approximate text matching for information retrieval, and in many other domains. Tools like MUSCLE5 provide extremely powerful solvers for MSA problems. Accordingly, our approach is to formulate our text-matching problem as MSA.

6.1. Progressive alignment algorithms
Most MSA algorithms for biological sequences follow a progressive alignment strategy that first performs pairwise alignment among the sequences, and then merges sequences progressively according to a decreasing order of pairwise similarity. Due to the sequential merging strategy, progressive alignment algorithms cannot recover from errors made in earlier iterations, and typically do not work well for the caption alignment task.

6.2. Graph-based alignment
We first explored a graph-based incremental algorithm to combine partial captions on the fly.19 The aggregation algorithm incrementally builds a chain graph, where each node represents a set of equivalent words entered by the workers, and the links between nodes are adjusted according to the order of the input words. A greedy search is performed to identify the path with the highest confidence, based on worker input and an n-gram language model. The algorithm is designed to be used online, and hence has high speed and low latency. However, due to the incremental nature of the algorithm and the lack of a principled objective function, it is not guaranteed to find the globally optimal alignment for the captions.

6.3. Weighted A* search algorithm
We next developed a weighted A* search based MSA algorithm to efficiently align the partial captions.27 To do this, we formulate MSA as graph traversal over a specialized lattice. Our search algorithm then takes each node as a state, allowing us to estimate the cost function g(n) and the heuristic function h(n) for each state. At each step of the A* search algorithm, the node with the smallest evaluation function value is extracted from the priority queue Q and expanded by one edge. This is repeated until a full alignment is produced (the goal state). While weighted A* significantly speeds the search for the best alignment, it is still too slow for very long sequences. To counteract this, we use fixed-size time windows to scope the exploration to the most likely paths.

7. EXPERIMENTAL RESULTS
We have tested our system with non-expert captionists drawn from both local and remote crowds. As a data set, we used lectures freely available from MIT OpenCourseWare (http://ocw.mit.edu/courses/). These lectures were chosen because one of the main goals of Scribe is to provide captions for classroom activities, and because the recording of the lectures roughly matches our target as well: there is a microphone in the room that often captures multiple speakers, for example, students asking questions. We chose four 5 min segments that contained speech from courses in electrical engineering and chemistry, and had them professionally transcribed at a cost of $1.75 per minute. Despite the high cost, we found a number of errors and omissions. We corrected these to obtain a completely accurate baseline.

7.1. Core system study results
Our study used 20 local participants. Each participant captioned 23 min of aural speech over a period of approximately 30 min. Participants first took a standard typing test and averaged a typing rate of 77.0 WPM (SD=15.8) with 2.05% average error (SD=2.31%). We then introduced participants to the real-time captioning interface, and had them caption a 3 min clip using it. Participants were then asked to caption the four 5 min clips, two of which were selected to contain saliency adjustments. We measure coverage (recall within a 10s per-word time bound), precision, and WER.

We found that saliency adjustment made a significant difference in coverage. For the electrical engineering clip, coverage was 54.7% (SD=9.4%) for words in the selected periods, compared to only 23.3% (SD=6.8%) for words outside of those periods. For the chemistry clips, coverage was 50.4% (SD=9.2%) for words appearing inside the highlighted period, compared to 15.4% (SD=4.3%) for words outside of the period.

To see if workers on Mechanical Turk could complete this task effectively (which would open up a large new set of workers who are available on-demand), we recruited a crowd to caption the four clips (20 min of speech). Our tasks paid $0.05, and workers could make an additional $0.002 bonus per word. We provided workers with a 40s instructional video to begin with.
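Returning to the weighted A* formulation of Section 6.3, the sketch below is a generic weighted A* multiple-sequence aligner, not the authors' implementation. It assumes a sum-of-pairs cost (1 per mismatched pair of words, 1 per word-versus-gap pair), a search state given by how many words of each partial caption have been consumed, the standard weighted evaluation $f(n) = g(n) + w \cdot h(n)$ with $w \ge 1$, and a simple admissible heuristic: two captions with $a$ and $b$ words left need at least $|a - b|$ more gaps. The time windows, agreement thresholds, and language model of the real system are omitted, and because each state has $2^k - 1$ successors for $k$ captions, it is only practical for a handful of workers. All function names are hypothetical.

```python
import heapq
import itertools
from collections import Counter


def column_cost(column, mismatch=1, gap=1):
    """Sum-of-pairs cost of one alignment column (None marks a gap)."""
    cost = 0
    for a, b in itertools.combinations(column, 2):
        if a is None and b is None:
            continue
        if a is None or b is None:
            cost += gap
        elif a != b:
            cost += mismatch
    return cost


def gap_heuristic(state, lengths, gap=1):
    """Admissible h(n): each pair needs >= |remaining_a - remaining_b| gaps."""
    left = [n - i for n, i in zip(lengths, state)]
    return gap * sum(abs(a - b) for a, b in itertools.combinations(left, 2))


def weighted_astar_msa(seqs, w=1.5):
    """Align word sequences; returns (columns, cost). w=1 gives optimal A*."""
    k, lengths = len(seqs), [len(s) for s in seqs]
    start, goal = (0,) * k, tuple(lengths)
    # Every non-empty subset of sequences may consume its next word.
    moves = [m for m in itertools.product((0, 1), repeat=k) if any(m)]
    tie = itertools.count()  # keeps heap entries comparable
    queue = [(w * gap_heuristic(start, lengths), next(tie), 0, start)]
    best_g, parent = {start: 0}, {start: None}
    while queue:
        _, _, g, state = heapq.heappop(queue)
        if g > best_g[state]:
            continue  # stale entry; a cheaper path was found later
        if state == goal:
            break
        for move in moves:
            nxt = tuple(i + d for i, d in zip(state, move))
            if any(i > n for i, n in zip(nxt, lengths)):
                continue
            column = tuple(seqs[j][state[j]] if move[j] else None for j in range(k))
            g_new = g + column_cost(column)
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt], parent[nxt] = g_new, (state, column)
                f_new = g_new + w * gap_heuristic(nxt, lengths)
                heapq.heappush(queue, (f_new, next(tie), g_new, nxt))
    columns, state = [], goal
    while parent[state] is not None:  # walk back from the goal
        state, column = parent[state]
        columns.append(column)
    return columns[::-1], best_g[goal]


def consensus(columns, min_agree=2):
    """Emit the most common word of each column if enough captions agree."""
    out = []
    for column in columns:
        words = [x for x in column if x is not None]
        if words:
            word, count = Counter(words).most_common(1)[0]
            if count >= min_agree:
                out.append(word)
    return out


captions = [
    "we have a crystal".split(),
    "have a crystal that has".split(),
    "crystal that has a two-fold axis".split(),
]
columns, cost = weighted_astar_msa(captions, w=1.5)
print(cost, " ".join(consensus(columns)))
```

With `w=1` and this consistent heuristic the search returns an optimal sum-of-pairs alignment; larger `w` expands fewer states but only guarantees a cost within a factor `w` of optimal, which is the usual speed-for-quality trade-off behind weighted A*.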


In total, 18 workers participated, collectively achieving 78.0% coverage. The average coverage over just three workers was 59.7% (SD=10.9%), suggesting we could be conservative in recruiting workers and still cover much of the input signal.

In our tests, individual workers achieved an average of 29.0% coverage, ASR achieved 32.3% coverage, CART achieved 88.5% coverage, and Scribe reached 74% out of a possible 93.2% coverage using 10 workers (Figure 4). Collectively, workers had an average latency of 2.89s, significantly improving on CART’s latency of 4.38s. For this example, we tuned our combiner to balance coverage and precision (Figure 5), getting an average of 66% and 80.3%, respectively. As expected, CART outperforms the other approaches. However, our combiner presents a clear improvement over both ASR and a single worker.

7.2. Improved combiner results
We further improved alignment accuracy by applying a novel weighted-A* MSA algorithm.27 To test this, we used the same four 5 min audio clips as before. We tested three configurations of our algorithm: (1) no agreement needed with a 15s sliding window, (2) two-person agreement needed with a 10s window, and (3) two-person agreement needed with a 15s window. We compare the results from these three configurations to our original graph-based method, and to the MUSCLE package (Figure 6).

With two-person agreement and a 15s window (the best-performing setting), our algorithm achieves 57.4% average (1-WER) accuracy, providing 29.6% improvement with respect to the graph-based system (average accuracy 42.6%), and 35.4% improvement with respect to the MUSCLE-based MSA system (average accuracy 41.9%). On the same set of audio clips, we obtained 36.6% accuracy using ASR (Dragon Naturally Speaking, version 11.5 for Windows), which is worse than all the crowd-powered approaches. We intentionally did not optimize the ASR for the speaker or acoustics, since DHH students would also not be able to do this in realistic settings.

7.3. TimeWarp results
To evaluate TimeWarp, we ran two studies that asked participants to caption a 2.5 min (12 captioning cycles) lecture clip. Again, we ran our experiments with both local participants and workers recruited from Mechanical Turk. Tests were divided into two conditions, time warping on or off, and were randomized across four possible time offsets: 0s, 3.25s, 6.5s, and 9.75s.

Local participants were again generally proficient (but non-expert) typists and had time to acquaint themselves with the system, which may better approximate student employees captioning a classroom lecture. We recruited 24 volunteers (mostly students) and had them practice with our baseline interface before using the TimeWarp interface. Each worker was asked to complete two trials, one with TimeWarp and one without, in a random order.

We also recruited 139 Mechanical Turk workers, who were allowed to complete at most two tasks and were randomly routed to each condition (providing 257 total responses). Since Mechanical Turk often contains low-quality (or even malicious) workers,18 we first removed inputs that got less than 10% coverage or precision or were outliers more than 2σ from the mean. A total of 206 tasks were approved by this quick check. Task payment amounts were the same as for our studies described above.

Our student captionists were able to caption a majority of the content well even without TimeWarp. The mean coverage from all 48 trials was 70.23% and the mean precision was 70.71%, compared to 50.83% coverage and 62.23% precision for workers drawn from Mechanical Turk. For student captionists, total coverage went up 2.02%, from 69.54% to 70.95%, and precision went up 2.56%, from 69.84% to 71.63%, but neither of these differences was detectably significant. However, there was a significant improvement in mean latency per word, which improved 22.46% from 4.34s to 3.36s (t(df) = 2.78, p < .01).
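The quick check applied to the Mechanical Turk submissions in Section 7.3 can be written in a few lines. This sketch assumes each submission is a dict with `coverage` and `precision` fractions, and it assumes (the text does not say) that the 2σ outlier test is applied to each of those two metrics using the mean and population standard deviation of all submissions. The function name and parameters are hypothetical.

```python
from statistics import mean, pstdev


def quick_check(tasks, floor=0.10, k_sigma=2.0, metrics=("coverage", "precision")):
    """Drop submissions below `floor` on coverage or precision, or more than
    k_sigma standard deviations from the mean on any metric (assumed rule)."""
    stats = {}
    for m in metrics:
        values = [t[m] for t in tasks]
        stats[m] = (mean(values), pstdev(values))
    kept = []
    for t in tasks:
        if t["coverage"] < floor or t["precision"] < floor:
            continue  # too little useful input
        if any(sd > 0 and abs(t[m] - mu) > k_sigma * sd for m, (mu, sd) in stats.items()):
            continue  # statistical outlier on at least one metric
        kept.append(t)
    return kept
```

Computing the statistics over all submissions, rather than only those that pass the 10% floor, is one of several defensible orderings; the article does not specify which was used.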

Figure 4. Optimal coverage reaches nearly 80% when combining the input of four workers, and nearly 95% with all 10 workers, showing that captioning audio in real time with non-experts is feasible.
[Plot: coverage (0%–100%) versus number of workers (1–10) for Optimal, CART, Scribe, ASR, and a single worker.]
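The evaluation measures used in Section 7 can be sketched as follows. We assume that a reference transcript and each worker's output are lists of `(word, seconds)` pairs, that a reference word counts as covered when an unused, case-insensitively identical typed word appears no earlier than it was spoken and at most 10s later (the per-word bound from Section 7.1), and that the "Optimal" curve in Figure 4 corresponds to the union of words covered by any worker. The greedy matching is our simplification. WER is the standard word-level edit distance divided by the reference length, so the accuracy in Figure 6 is 1 − WER. Function names are hypothetical.

```python
def covered(reference, typed, bound=10.0):
    """Indices of reference words matched by a typed word within `bound` seconds.

    reference, typed -- lists of (word, time_in_seconds) pairs
    """
    used, hits = set(), set()
    for i, (ref_word, t_ref) in enumerate(reference):
        for j, (word, t) in enumerate(typed):
            if j not in used and word.lower() == ref_word.lower() and 0 <= t - t_ref <= bound:
                used.add(j)  # each typed word may cover one reference word
                hits.add(i)
                break
    return hits


def coverage(reference, typed, bound=10.0):
    return len(covered(reference, typed, bound)) / len(reference)


def optimal_coverage(reference, workers, bound=10.0):
    """Fraction of reference words covered by at least one worker."""
    hits = set().union(*(covered(reference, w, bound) for w in workers))
    return len(hits) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        cur = [i]
        for j, h in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)


ref = [("we", 0.0), ("have", 0.4), ("a", 0.6), ("crystal", 0.9)]
w1 = [("we", 2.0), ("have", 2.5)]
w2 = [("a", 3.0), ("crystal", 15.0)]  # "crystal" arrives 14.1s late: not covered
print(coverage(ref, w1), coverage(ref, w2), optimal_coverage(ref, [w1, w2]))  # 0.5 0.25 0.75
print(1 - wer(["we", "have", "a", "crystal"], ["we", "have", "crystal"]))      # 0.75
```

The union in `optimal_coverage` is an upper bound on what any combiner can recover from the same inputs, which is why a combined output such as Scribe's sits below the "Optimal" curve.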


Mechanical Turk workers’ mean coverage (Figure 7) increased 11.39% (t(df) = 2.19, p < .05), precision increased 12.61% (t(df) = 3.90, p < .001), and latency was reduced by 16.77% (t(df) = 5.41, p < .001).

Figure 5. Precision-coverage curves for the electrical engineering (EE) and chemistry (Chem) lectures using different combiner parameters with 10 workers. In general, increasing coverage reduces accuracy.

Figure 6. Evaluation of different systems using (1-WER) as an accuracy measure (higher is better).
[Bar chart: average (1.0-WER) for A*-10-t (c=10s, threshold=2), A*-15-t (c=15s, threshold=2), A*-15 (c=15s, no threshold), Graph-based, and MUSCLE.]

Figure 7. Relative improvement from no warp to warp conditions in terms of mean and median values of coverage, precision, and latency. We expected coverage and precision to improve. Shorter latency was unexpected, but resulted from workers being able to consistently type along with the audio instead of having to remember and go back as the speech outpaced their typing.
[Bar chart: improvement (%) in mean and median coverage, precision, and latency; the mean improvements are +11.4% (coverage), +12.6% (precision), and +16.8% (latency).]

8. CONCLUSION AND FUTURE WORK
Scribe is the first system capable of making reliable, affordable captions available on-demand to deaf and hard of hearing users. Scribe has allowed us to explore further issues related to how real-time captions can be made more useful to end users. For example, when captions are used, we have shown that students’ comprehension of instructional material significantly improves when they have the ability to control when the captions play, and to track their position so that they are not overwhelmed by using one sensory channel to absorb content that is designed to be split between both vision and hearing. To help address this problem, we built a tool that lets students highlight or pause at the last position they read before looking away from the captions to view other visual content.22

While we have discussed how automation can be used to effectively mediate human caption generation, advances in ASR technologies can aid Scribe as well. By including ASR systems as workers, we can take advantage of the affordable, highly scalable nature of ASR in settings where it works, while using human workers to ensure that DHH users always have access to accurate captions. ASR can eventually use Scribe as an in situ training tool, resulting in systems that are able to provide reliable captions right out of the box using human intelligence, and that scale to fully automated solutions more quickly than would otherwise be possible.

More generally, Scribe is an example of an interactive system that deeply integrates human and machine intelligence in order to provide a service that is still beyond what computers can do alone. We believe it may serve as a model for interactive systems that solve other problems of this type.

Acknowledgments
This work was supported by the National Science Foundation under awards #IIS-1149709 and #IIS-1218209, the University of Michigan, Google, an Alfred P. Sloan Foundation Fellowship, and a Microsoft Research Ph.D. Fellowship.

References
1. Bernstein, M.S., Brandt, J.R., Miller, R.C., Karger, D.R. Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11 (New York, NY, USA, 2011). ACM, 33–42.
2. Bigham, J.P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R.C., Miller, R., Tatarowicz, A., White, B., White, S., Yeh, T. Vizwiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST ’10 (New York, NY, USA, 2010). ACM, 333–342.
3. Cooke, M., Green, P., Josifovski, L., Vizinho, A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun. 34, 3 (2001), 267–285.
4. Driedger, J. Time-scale modification algorithms for music audio signals. Master’s thesis, Saarland University, 2011.
5. Edgar, R. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 5 (2004), 1792–1797.
6. Elliot, L.B., Stinson, M.S., Easton, D., Bourgeois, J. College students learning with C-print’s education software and automatic speech recognition. In American Educational Research Association Annual Meeting (New York, NY, 2008), AERA.
7. Flowerdew, J.L. Salience in the performance of one speech act: the case of definitions. Discourse Processes 15, 2 (Apr–June 1992), 165–181.
8. Metze, F., Gaur, Y., Bigham, J.P. Manipulating word lattices to incorporate human corrections. In Proceedings of INTERSPEECH (2016).
9. Gaur, Y., Lasecki, W.S., Metze, F., Bigham, J.P. The effects of automatic speech recognition quality on human transcription latency. In Proceedings of the 13th Web for All Conference (2016) ACM.
10. Glass, J.R., Hazen, T.J., Cyphers, D.S., Malioutov, I., Huynh, D., Barzilay, R. Recent progress in the MIT spoken lecture processing project. In Interspeech (2007), 2553–2556.
11. Gordon, M., Bigham, J.P., Lasecki, W.S. Legiontools: A toolkit+ UI for recruiting and routing crowds to synchronous real-time tasks. In Adjunct Proceedings of the 28th

SEPTEMBER 2017 | VOL. 60 | NO. 9 | COMMUNICATIONS OF THE ACM 99 research highlights

Walter S. Lasecki (wlasecki@umich.edu), Computer Science & Engineering, University of Michigan.

Christopher D. Miller, Iftekhar Naim, Adam Sadilek, and Daniel Gildea (c.miller@rochester.edu) ({inaim,sadilek,gildea}@cs.rochester.edu), Computer Science Department, University of Rochester.

Raja Kushalnagar (raja.kushalnagar@gallaudet.edu), Information Technology Program, Gallaudet University.

Jeffrey P. Bigham ([email protected]), HCI and LT Institutes, Carnegie Mellon University.

© 2017 ACM 0001-0782/17/09 $15.00

Without a clear understanding of the human side of virtual reality, the experience will always fail. “Dr. Jerald has recognized a great need in our community and filled it. The VR Book is a scholarly and comprehensive treatment of the user interface dynamics surrounding the development and application of virtual reality. I have made it a required reading for my students and research colleagues. Well done!” - Professor Tom Furness, University of Washington VR Pioneer and Founder of HIT Lab International and the Society

CAREERS

Brigham Young University
Faculty Position

The Department of Electrical and Computer Engineering at Brigham Young University announces an opening for a professorial continuing-faculty-status (tenure) track position. While our preference is in the area of Computer Engineering, applicants in all areas of Electrical and Computer Engineering will be considered. Areas of interest include but are not limited to: Computer Systems (including architecture, IoT and embedded/real-time systems, networking, security, software, compilers, O/S, parallel systems, etc.), Robotics and Autonomous Systems, Computer Vision, Machine Learning, Data Science, Distributed Systems, and Digital Systems Design (FPGA and/or VLSI).

The department has state-of-the-art facilities in computing and supercomputing, autonomous vehicles and computer vision, control systems, optics, and microelectronic fabrication. Excellent research programs exist in the department in the areas of FPGA-based computing, high-performance embedded systems, autonomous vehicles and control, robotics and computer vision, high-speed low-power electronics, digital communications systems, signal processing, biomedical imaging, optics, and microfluidics. Successful candidates will be expected to strengthen undergraduate and graduate education and to develop an outstanding research program to complement the existing research or develop new research areas.

The ACT score for the average BYU entering freshman is above the 90th percentile nationally. BYU is also fifth on the NSF’s list of U.S. baccalaureate-origin institutions for engineering doctorate recipients. We expect our faculty to challenge these outstanding students to reach their potential.

Successful candidates will be hired at the assistant, associate, or full professor level depending on experience. Requirements include a doctorate in computer engineering, computer science, electrical engineering, or closely related field and a willingness to fully support and participate in the ideals and mission of BYU.

An on-line application for this position can be found at: https://yjobs.byu.edu, job posting #64783.

Questions regarding the position can be directed to:
Dr. Aaron Hawkins, Faculty Committee Chair
Dept of ECE, Brigham Young University
459 CB
Provo UT 84602
[email protected]

*The position will remain open until filled.
** Brigham Young University is an equal opportunity employer. All faculty are required to abide by the university’s Honor Code and Dress & Grooming Standards. Strong preference will be given to qualified candidates who are members in good standing of the affiliated Church, The Church of Jesus Christ of Latter-day Saints. Successful candidates are expected to support and contribute to the academic and religious missions of the university within the context of the principles and doctrine of the affiliated Church. Equal Opportunity Employer: m/f/Vets/Disability

Macalester College
Two Tenure-Track Assistant Professors of Computer Science

Macalester invites applications for two tenure-track positions at the assistant professor level to begin Fall 2018. Candidates must have or be completing a PhD in Computer Science and have a strong commitment to both teaching and research in an undergraduate liberal arts environment. We are especially interested in candidates who are enthusiastic to teach a broad range of undergraduate courses. This person will contribute to the teaching of our introductory, core and advanced courses, and mentor undergraduate research.

Macalester offers majors in Computer Science, Mathematics, and Applied Mathematics and Statistics, and minors in Computer Science, Mathematics, and Statistics, as well as a new minor in Data Science. Typical class sizes range from 15 to 32 students. We encourage innovative pedagogy and curriculum and emphasize computer science’s interdisciplinary connections. We have close relationships with several disciplines both within and beyond the sciences, and we are interested in candidates whose work spans disciplinary boundaries. Areas of highest priority include computer and data security and privacy, mobile and ubiquitous computing, computer networks and systems. For more information about our programs, see: http://macalester.edu/mscs

About Macalester
Macalester College is a highly selective, private liberal arts college in the vibrant Minneapolis-Saint Paul metropolitan area. The Twin Cities have a population of approximately three million, a rich arts community, strong local industries, an award-winning parks system, and are home to many colleges and universities, including the University of Minnesota. Macalester’s diverse student body comprises over 2000 undergraduates from 40 states and the District of Columbia and over 90 nations. The College maintains a longstanding commitment to academic excellence with a special emphasis on internationalism, multiculturalism, and service to society. We are especially interested in applicants dedicated to excellence in teaching and research/creative activity within a liberal arts college community.

As an Equal Opportunity employer supportive of affirmative efforts to achieve diversity among its faculty, Macalester College strongly encourages applications from women and members of underrepresented minority groups.

Applying
To apply via Academic Jobs Online submit (1) curriculum vitae, (2) graduate transcripts, (3) three letters of recommendation (at least one of which discusses your potential as a teacher), (4) a cover letter that addresses why you are interested in Macalester, (5) a statement of teaching philosophy, and (6) a research statement. Please contact Shilad Sen at [email protected] with any questions about the position. Evaluation of applications will begin October 15, 2017 and continue until the position is filled.

Apply now: https://www.macalester.edu/academics/mscs/compscitenure-trackjob.html

National University of Singapore
Senior and Junior Tenure-Track Faculty Positions in Artificial Intelligence

The Department of Computer Science at the National University of Singapore (NUS) invites applications for one Distinguished Professorship and several tenure-track faculty positions in artificial intelligence, machine learning, computational neuroscience and related areas of robotics. The Department enjoys ample research funding, moderate teaching loads, excellent facilities, and extensive international collaborations. We have a full range of faculty covering all major research areas in computer science and a thriving PhD program that attracts the brightest students from the region and beyond. More information is available at www.comp.nus.edu.sg/careers.

NUS offers highly competitive salaries and is situated in Singapore, an English-speaking cosmopolitan city that is a melting pot of many cultures, both the east and the west. Singapore offers a safe and family-friendly environment with high quality education and healthcare at all levels, as well as very low tax rates. Singapore has also recently launched a S$150 million national initiative, AI.SG, to expand research, development, and adoption of AI technologies. AI.SG will be hosted at NUS.

Candidates for the Distinguished Professor position should have an established record of outstanding research achievements, thought leadership, and international stature in artificial intelligence.

Candidates for Assistant Professor positions should demonstrate excellent research potential in AI, and a strong commitment to teaching. Truly outstanding Assistant Professor applicants will be considered for the endowed Sung Kah Kay Assistant Professorship.

Application Details:
Submit the following documents (in a single PDF) online via: https://faces.comp.nus.edu.sg
1. A cover letter that indicates the position applied for and the main research interests
2. Curriculum Vitae
3. A teaching statement
4. A research statement


• Provide the contact information of 3 referees when submitting your online application, or, arrange for at least 3 references to be sent directly to [email protected].
• Application reviews will commence immediately and continue until positions are filled.
• Please submit your application by 1 December 2017.

If you have further enquiries, please contact the Search Committee Chair, Weng-Fai Wong, at [email protected]

University of Central Florida
Assistant or Associate Professor in Faculty Cluster for Cyber Security and Privacy

The University of Central Florida (UCF) is recruiting a tenure-track assistant or associate professor for its cyber security and privacy cluster. This position has a start date of August 8, 2018.

This will be an interdisciplinary position that will be expected to strengthen both the cluster and a chosen tenure home department, as well as a possible combination of joint appointments. The candidate can choose a combination of units from the cluster for their appointment (see http://www.ucf.edu/faculty/cluster/cyber-security-and-privacy/).

The ideal junior candidates will have a strong background in cyber security and privacy, and be on an upward leadership trajectory in these areas. They will have research impact, as reflected in high-quality publications and the ability to build a well-funded research program. All relevant technical areas will be considered. We are looking for a team player who can help bring together current campus efforts in cyber security or privacy. In particular, we are looking for someone who will work at the intersection of several areas, such as: (a) hardware and IoT security, (b) explaining and predicting human behavior, creating policies, studying ethics, and ensuring privacy, (c) cryptography and theory of security or privacy, or (d) tools, methods, training, and evaluation of human behavior.

Minimum qualifications include a Ph.D., terminal degree, or foreign degree equivalent from an accredited institution in an area appropriate to the cluster, and a record of high impact research related to cyber security and privacy, demonstrated by a strong scholarly and/or funding record. A history of working with teams, especially teams that span multiple disciplines, is a strongly preferred qualification. The position will carry a rank commensurate with the candidate’s prior experience and record.

Candidates must apply online at https://www.jobswithucf.com/postings/50404 and attach the following materials: a cover letter, curriculum vitae, teaching statement, research statement, and contact information for three professional references. In the cover letter candidates must address their background in cyber security and privacy, and identify the department or departments for their potential tenure home and the joint appointments they would desire. When applying, have all documents ready so they can be attached at that time, as the system does not allow resubmittal to update applications.

As an equal opportunity/affirmative action employer, UCF encourages all qualified applicants to apply, including women, veterans, individuals with disabilities, and members of traditionally underrepresented populations.

For questions, please contact the Cluster’s Search Committee Chair, Gary T. Leavens, at [email protected].

University of Central Florida
Cluster Lead, Cyber Security and Privacy Cluster

The University of Central Florida (UCF) is recruiting a lead for its cluster on cyber security and privacy. This position has a start date of August 8, 2018. The position will carry a rank of associate or full professor, commensurate with the candidate’s prior experience and record. The lead is expected to have credentials and qualifications like those expected of a tenured associate or full professor. To obtain tenure, the selected candidate must have a demonstrated record of teaching, research and service commensurate with rank.

This will be an interdisciplinary position that will be expected to strengthen both the cluster and a chosen tenure home department, as well as a possible combination of joint appointments. The candidate can choose a combination of units from the cluster for their appointment. (See http://www.ucf.edu/faculty/cluster/cyber-security-and-privacy/.) Both individual and interdisciplinary infrastructure and startup support will be provided.

The ideal candidate will have a strong background in cyber security and privacy and outstanding research credentials and research impact, as reflected in a sustained record of high quality publications and external funding. All relevant technical areas will be considered including: network security, cryptography, blockchains, hardware security, trusted computing bases, cloud computing, human factors, anomaly detection, forensics, privacy, and software security, as well as applications of security and privacy to areas such as IoT, cyber-physical systems, finance, and insider threats. A history of working with teams, especially teams that span multiple disciplines, is a strongly preferred qualification. A record of demonstrated leadership is highly desired, as we are looking for a leader to bring together all the current campus efforts in cyber security and privacy. This includes three cluster members already hired, as well as a pending hire for the 2017-18 academic year.

Minimum qualifications include a Ph.D. from an accredited institution in an appropriate area, and a record of high impact research related to cyber security and privacy demonstrated by a strong scholarly publication record and a significant amount of sustained funding.

Candidates must apply online at http://www.jobswithucf.com/postings/50044 and upload the following materials: cover letter, CV, teaching and research statements, and contact information for 3 professional references. In the cover letter, candidates should address their background, and identify the department for their potential tenure home and any desired joint appointments.

An equal opportunity/affirmative action employer, UCF encourages all qualified applicants to apply, including women, veterans, individuals with disabilities, and members of traditionally underrepresented populations.

Questions can be directed to the search committee chair, Gary T. Leavens, at [email protected].

TENURE-TRACK AND TENURED POSITIONS
ShanghaiTech University invites highly qualified candidates to fill multiple tenure-track/tenured faculty positions as its core founding team in the School of Information Science and Technology (SIST). We seek candidates with exceptional academic records or demonstrated strong potentials in all cutting-edge research areas of information science and technology. They must be fluent in English. English-based overseas academic training or background is highly desired.

ShanghaiTech is founded as a world-class research university for training future generations of scientists, entrepreneurs, and technical leaders. Boasting a new modern campus in Zhangjiang Hightech Park of cosmopolitan Shanghai, ShanghaiTech shall trail-blaze a new education system in China. Besides establishing and maintaining a world-class research profile, faculty candidates are also expected to contribute substantially to both graduate and undergraduate .

Academic Disciplines: Candidates in all areas of information science and technology shall be considered. Our recruitment focus includes, but is not limited to: computer architecture, software engineering, database, computer security, VLSI, solid state and nano electronics, RF electronics, information and signal processing, networking, security, computational foundations, big data analytics, data mining, visualization, computer vision, bio-inspired computing systems, power electronics, power systems, machine and motor drive, power management IC as well as inter-disciplinary areas involving information science and technology.

Compensation and Benefits: Salary and startup funds are highly competitive, commensurate with experience and academic accomplishment. We also offer a comprehensive benefit package to employees and eligible dependents, including on-campus housing. All regular ShanghaiTech faculty members will join its new tenure-track system in accordance with international practice for progress evaluation and promotion.

Qualifications:
• Strong research productivity and demonstrated potentials;
• Ph.D. (Electrical Engineering, Computer Engineering, Computer Science, Statistics, Applied Math, or related field);
• A minimum relevant (including PhD) research experience of 4 years.

Applications: Submit (in English, PDF version) a cover letter, a 2-page research plan, a CV plus copies of 3 most significant publications, and names of three referees to: [email protected]. For more information, visit http://sist.shanghaitech.edu.cn/NewsDetail.asp?id=373

Deadline: The positions will be open until they are filled by appropriate candidates.

last byte

[CONTINUED FROM P. 104] ing. Another example is the Shannon trick of synthesizing text. Imagine if you start typing an SMS on your phone but you keep using the predictive function. The algorithm is very basic—it’s just “look for the last time something like this occurred and steal the next most probable letter.” But you get really interesting results, because you have a lot of data.
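The predictive-text idea Efros describes can be made concrete with a small character-level n-gram predictor. This is a minimal, hypothetical sketch and does not come from the interview. The function names `build_model`, `predict_next`, and `generate` are illustrative. So are the backoff from longer to shorter contexts and the greedy "most frequent follower" rule. A real phone keyboard is considerably more elaborate.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def build_model(corpus: str, order: int) -> Dict[str, Counter]:
    """Count which character follows every context of length 1..order in the corpus."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for i, ch in enumerate(corpus):
        for k in range(1, order + 1):
            if i - k < 0:
                break  # not enough preceding characters for longer contexts
            model[corpus[i - k:i]][ch] += 1
    return model


def predict_next(model: Dict[str, Counter], text: str, order: int) -> Optional[str]:
    """Return the most frequent follower of the longest suffix of `text` seen in the corpus."""
    for k in range(min(order, len(text)), 0, -1):
        context = text[-k:]
        if context in model:
            # most_common(1) picks the highest count; ties fall to the earliest-counted character.
            return model[context].most_common(1)[0][0]
    return None  # no suffix of `text` ever occurred in the corpus


def generate(model: Dict[str, Counter], seed: str, order: int, length: int) -> str:
    """Greedily extend `seed` by up to `length` characters, one prediction at a time."""
    text = seed
    for _ in range(length):
        nxt = predict_next(model, text, order)
        if nxt is None:
            break
        text += nxt
    return text


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the cat ran."
    model = build_model(corpus, order=3)
    print(predict_next(model, "the c", order=3))
    print(generate(model, "the c", order=3, length=2))
```

With this toy corpus, the context "e c" was always followed by "a", so the predictor proposes "a" and the greedy extension of "the c" becomes "the cat". Always taking the most probable character tends to fall into repetitive loops. Shannon's original letter-approximation experiments instead drew the next letter at random in proportion to its frequency, which gives more varied output.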

Thanks to the Internet, you’ve got access to a massive corpus of data. Didn’t one of your early papers examine two million images from Flickr?

Exactly. Initially, we said, “We’ll just download 20,000 images.” The results weren’t great. But my then-grad student, James Hays, was like, “Why don’t we just keep downloading?” If you look at the big neural networks right now, it is really impressive what they can do. But I think people are forgetting that one of the reasons they’re so powerful is that they are able to gobble up orders of magnitude more data than we could do with earlier methods. This is not very glamorous, because it suggests that humans are not so smart. It’s really the data.

That reminds me of the old philosophical debate about experiential vs. a priori knowledge.

People like to rationalize. They like to get a nice beautiful theory of the world. But reality is often really noisy and complicated, and in a way, data allows you to use this complexity, to not have to throw it away. It’s not the minimalist beauty, the clean lines. It’s the beauty of a jumbled mess.

Your analyses of photographic data sets like faces and building facades have also revealed lots of visual trends that might not otherwise have been easy to notice.

That is a big beautiful promise and we’re only scratching the surface. People are good at finding certain kinds of patterns. We can hold a small number of things in our minds and compare them. We are not able to find a tiny, tiny little pattern over thousands or millions of data points, or very subtle changes over a long range of time. Using computer vision and techniques, I’m hoping we can make new discoveries in ways people have not been able to do before. I would love to discover something that people haven’t noticed yet.

What about your recent discovery, in an analysis of 150,000 American yearbook photos, that people’s smiles broadened during each decade since 1900?

For the portraits, we were very happy to see the increase in smiling over time. We thought, wow, this is a really cool discovery. Of course, then we found some psychological literature that indicates people have already noticed this.

Your work has found applications in areas from entertainment to security. What other pie-in-the-sky applications or discoveries do you hope to see?

Frankly, my goal has always been to understand and model biological vision. Human vision is too hard, because it connects with everything else. We don’t see things as they are; we see them tinted by language and culture and all the baggage. But if I’m able to build a model of a rabbit’s vision or a rat’s vision by the time I retire, I think that would be absolutely fantastic. Imagine having a model of this remarkable apparatus that almost all living creatures possess.

Now, because this is such a hard problem, you don’t get wins very often. A lot of the time, it’s a depressing slog. But once in a while, as a kind of byproduct, some really neat things come up that you can use to create pretty pictures. And I think the world needs more pretty pictures.

Leah Hoffmann is a technology writer based in Piermont, NY.

© 2017 ACM 0001-0782/17/09 $15.00

Coming Next Month in COMMUNICATIONS
Barriers to Refactoring
Internet Advertising
Millennials’ Attitude Toward IT Consumerization in the Workplace
What Can Agile Methods Bring to High-Integrity Software Development?
Programming Languages and Code Quality in Github
Multi-Objective Parametric Query Optimization
Metaphors We Compute By
Research for Practice
Why the Bell Curve Hasn’t Transformed Into a Hockey Stick
Plus the latest news on printing 3D body parts, computerized sound processing, and whether smartphones harm children.


DOI:10.1145/3121444
Leah Hoffmann
Q&A
All The Pretty Pictures
Alexei Efros, recipient of the 2016 ACM Prize in Computing, works to harness the power of visual complexity.

DESPITE the fact that he does not see very well, Alexei Efros, recipient of the 2016 ACM Prize in Computing and a professor at the University of California at Berkeley, has spent most of his career trying to understand, model, and recreate the visual world. Drawing on the massive collection of images on the Internet, he has used machine learning algorithms to manipulate objects in photographs, translate black-and-white images into color, and identify architecturally revealing details about cities. Here, he talks about harnessing the power of visual complexity.

You were born in St. Petersburg (Russia), and were 14 when you came to the U.S. What drew you to computer science?

I was interested in computers from an early age. I remember reading a book about PDP-11 assembly language programming when I was 12 and dreaming about how one day, I might actually have a computer of my own to try this out in practice. Then, in high school, I did some research with a professor at the University of Utah. It sounds kind of brazen, but I went to the CS department and was like, “Bring me to your chairman.” Tom Henderson was the chair at that time and, you know, he actually saw me. I told him that I wanted to do computer science and asked him for a problem. And he basically said, “Ok, weird Russian kid. I have a robot running around; do you want to help with that?” It was wonderful.

You did your undergraduate work at the University of Utah, as well.

Interestingly enough, I was actually considering whether I should go into computer science (CS) or theater. In fact, I applied to Carnegie Mellon University because it’s one of the top departments in CS, but also one of the top universities for theater. Then I showed my father the tuition, and, well, we were immigrants. So I went to the , where CS was much stronger than theater, and I think I got a very good education. But I’m still practicing my stagecraft twice a week in my classes.

I’ve seen your talks. You’re a very engaging speaker.

There is this whole dichotomy between the geeks and the artsy people—either you are good with numbers, or with arts and humanities. I think it’s misplaced. CS is hot right now. A lot of smart kids go into CS, and many look down at all of these humanities people with disdain. In my classes, I try to remind them that computer scientists are hot now, but physicists were hot in the Sixties, and chemists were hot in the Thirties, and they’re not super-hot now. Shakespeare is going to be around much longer than Python.

How did you get involved with computer vision, graphics, and machine learning?

Even in high school, my goal was to solve AI. But then I reasoned it out: AI is too hard, and you don’t know when you’re succeeding. With language, you kind of know when you’re succeeding, but that’s also very high-level. Meanwhile, almost all animals have vision. Vision seems like the most basic thing, so it’s got to be easy, right?

Of course.

Basically, I think I’ve just had one idea throughout my whole career, and I’ve been milking it since undergrad, and the idea is not even that profound. It’s that we fetishize intellectual contributions—algorithms, data structures, and so on. And we often forget that a lot of the complexity in the world is actually due to the data. My favorite example is in computer graphics. We know how light behaves, and we can simulate everything we want. But the reason current animated movies don’t look like the real thing is the data. There is a lot of entropy in the world and it’s just too hard to capture. The algorithms are fine. It’s the data that is miss- [CONTINUED ON P. 103]

PHOTO BY NOAH BERGER, COURTESY OF UC BERKELEY


CONFERENCE 27 – 30 November 2017 EXHIBITION 28 – 30 November 2017 BITEC, Bangkok, Thailand

THE CELEBRATION OF LIFE & TECHNOLOGY

The 10th ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia

Register online by 15 October 2017, & enjoy early bird discounts of up to 20% SA2017.SIGGRAPH.ORG/REGISTRATION
