COMMUNICATIONS OF THE ACM | CACM.ACM.ORG | 12/2015 VOL. 58 NO. 12

What Makes Paris Look Like Paris?
Personalizing Maps
How to De-Identify Data
Internet Use and Psychological Well-Being

Association for Computing Machinery

An ACM-W Celebration!

Delta City Centre, Ottawa, ON, Canada

JAN 22-23, 2016

Canadian Celebration of Women in Computing 2016
Come celebrate with us at the largest gathering of Women in Computing in Canada! The conference will feature prominent keynote speakers, panels, workshops, presentations and posters, as well as a large career fair. Register today! www.can-cwic.ca

INSPIRING MINDS FOR 200 YEARS
Ada’s Legacy illustrates the depth and diversity of writers, thinkers, and makers who have been inspired by Ada Lovelace, the English mathematician and writer. The volume commemorates the bicentennial of Ada’s birth in December 1815, celebrating her many achievements as well as the impact of her work, which has reverberated widely since the late 19th century. This is a unique contribution to a resurgence in Lovelace scholarship, thanks to the expanding influence of women in science, technology, engineering and mathematics.

ACM Books is a new series of high-quality books for the computer science community, published by the Association for Computing Machinery with Morgan & Claypool Publishers.

COMMUNICATIONS OF THE ACM

Departments
5 Editor’s Letter: On Lethal Autonomous Weapons. By Moshe Y. Vardi
7 Cerf’s Up: Advancing the ACM Agenda. By Vinton G. Cerf
8 Letters to the Editor: What About Statistical Relational Learning?
10 BLOG@CACM: What Do We Do When the Jobs Are Gone, and Why We Must Embrace Active Learning. Moshe Y. Vardi ponders the outlook for people when all work is automated, while Mark Guzdial emphasizes the importance of active learning in teaching computer science.
31 Calendar
122 Careers

Last Byte
136 Q&A: Redefining Architectures. Mary Jane Irwin on building advanced circuits, special processors, and a hardware description language, while advocating for women in computer science. By Leah Hoffmann

News
12 When Data Is Not Enough. Reproducibility of code is increasingly crucial to verifying scientific claims. By Don Monroe
15 The Hyper-Intelligent Bandage. Scientists are developing smart, sensor-packed dressings to help heal chronic wounds. By Gregory Mone
17 Technology Brings Online Education in Line with Campus Programs. Whether sitting in front of a screen or in a classroom, online and campus-based institutions want to verify students actually attend classes and take exams. By Keith Kirkpatrick

Viewpoints
20 Historical Reflections: The Digital Dark Age. …and why it will have to wait. By David Anderson
24 The Profession of IT: Why Our Theories of Innovation Fail Us. Until we moderate our fascination with creating ideas, we will not achieve the rate of innovations we seek. By Peter J. Denning and Nicholas Dew
27 Computing Ethics: Coupled Ethical-Epistemic Analysis in Teaching Ethics. Critical reflection on value choices. By Nancy Tuana
30 Kode Vicious: Pickled Patches. On repositories of patches and tension between security professionals and in-house developers. By George V. Neville-Neil
33 Broadening Participation: Increasing the Participation of Individuals with Disabilities in Computing. Lessons learned from a decade of practice. By Richard E. Ladner and Sheryl Burgstahler
37 Viewpoint: Creating a New Generation of Computational Thinkers. Experiences with a successful school program in Scotland. By Jeremy Scott and Alan Bundy
41 Viewpoint: I Can’t Let You Do That, Dave. Computers should not treat their owners as adversaries. By Cory Doctorow
43 Point/Counterpoint: The Case for Banning Killer Robots. Ban the bots? Considering both sides of the argument for and against. By Stephen Goose/Ronald Arkin

About the Cover: Is it possible for a computer to distinguish a major city by its visual essence? Through the use of Google Street View imagery, this month’s cover story illustrates how the identifying look of a city does not rely on famous landmarks but rather on stylistic elements of daily life. Cover collage by Iwona Usakiewicz/Andrij Borys Associates/Shutterstock.

IMAGE COURTESY OF THE NATIONAL SCIENCE FOUNDATION

2 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12

Practice
48 How to De-Identify Your Data: Balancing statistical accuracy and subject privacy in large social-science datasets. By Olivia Angiuli, Joe Blitzstein, and Jim Waldo
56 Lean Software Development: Building and Shipping Two Versions. Catering to developers’ strengths while still meeting team objectives. By Kate Matsudaira
59 Challenges of Memory Management on Modern NUMA Systems: Optimizing NUMA systems applications with Carrefour. By Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, and Mark Roth
Articles’ development led by queue.acm.org

Contributed Articles
68 Personalizing Maps: Digital maps can be engineered to adapt to a person’s unique interests and experience in geographic space. By Andrea Ballatore and Michela Bertolotto. Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/personalizing-maps
75 Propositions as Types: Connecting mathematical logic and computation, it ensures that some aspects of programming are absolute. By Philip Wadler
86 Smart Data Pricing: Using Economics to Manage Network Congestion. Economic incentives that alleviate congestion for Internet customers can also improve business performance for network operators. By Soumya Sen, Carlee Joe-Wong, Sangtae Ha, and Mung Chiang

Review Articles
94 Internet Use and Psychological Well-Being: Effects of Activity and Audience. The connection between online communication and psychological well-being depends on whom you are communicating with. By Robert Kraut and Moira Burke

Research Highlights
102 Technical Perspective: Paris Beyond Frommer’s. By Noah Snavely
103 What Makes Paris Look Like Paris? By Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/what-makes-paris-look-like-paris
111 Technical Perspective: In-Situ Database Management. By David Maier
112 NoDB: Efficient Query Execution on Raw Data Files. By Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, Stratos Idreos, and Anastasia Ailamaki

Association for Computing Machinery
Advancing Computing as a Science & Profession

IMAGES BY FREEBIRD PHOTOS; JOHN LUND

COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields. Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional. Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM enjoys today are built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts, sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO: Bobby Schnabel
Deputy Executive Director and COO: Patricia Ryan
Director, Office of Information Systems: Wayne Graves
Director, Office of Financial Services: Darren Ramdin
Director, Office of SIG Services: Donna Cappo
Director, Office of Publications: Bernard Rous
Director, Office of Group Publishing: Scott E. Delman

ACM COUNCIL
President: Alexander L. Wolf
Vice-President: Vicki L. Hanson
Secretary/Treasurer: Erik Altman
Past President: Vinton G. Cerf
Chair, SGB Board: Patrick Madden
Co-Chairs, Publications Board: Jack Davidson and Joseph Konstan
Members-at-Large: Eric Allman; Ricardo Baeza-Yates; Cherri Pancake; Radia Perlman; Mary Lou Soffa; Eugene Spafford; Per Stenström
SGB Council Representatives: Paul Beame; Barbara Boucher Owens

BOARD CHAIRS
Education Board: Mehran Sahami and Jane Chu Prey
Practitioners Board: George Neville-Neil

REGIONAL COUNCIL CHAIRS
ACM Europe Council: Fabrizio Gagliardi
ACM India Council: Srinivas Padmanabhuni
ACM China Council: Jiaguang Sun

PUBLICATIONS BOARD
Co-Chairs: Jack Davidson; Joseph Konstan
Board Members: Ronald F. Boisvert; Anne Condon; Nikil Dutt; Roch Guérin; Carol Hutchins; Yannis Ioannidis; Catherine McGeoch; M. Tamer Ozsu; Mary Lou Soffa; Alex Wade; Keith Webster

ACM U.S. Public Policy Office
Renee Dopplick, Director
1828 L Street, N.W., Suite 800, Washington, DC 20036 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Lissa Clayborn, Acting Executive Director

STAFF
Director of Group Publishing: Scott E. Delman, [email protected]
Executive Editor: Diane Crawford
Managing Editor: Thomas E. Lambert
Senior Editor: Andrew Rosenbloom
Senior Editor/News: Larry Fisher
Web Editor: David Roman
Rights and Permissions: Deborah Cotton
Art Director: Andrij Borys
Associate Art Director: Margaret Gray
Assistant Art Director: Mia Angelica Balaquiot
Designer: Iwona Usakiewicz
Production Manager: Lynn D’Addesio
Director of Media Sales: Jennifer Ruzicka
Publications Assistant: Juliet Chance
Columnists: David Anderson; Phillip G. Armour; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission: [email protected]
Calendar items: [email protected]
Change of address: [email protected]
Letters to the Editor: [email protected]

WEBSITE: http://cacm.acm.org
AUTHOR GUIDELINES: http://cacm.acm.org/

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686; F (212) 869-0481
Director of Media Sales: Jennifer Ruzicka, [email protected]
Media Kit: [email protected]

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF: Moshe Y. Vardi, [email protected]

NEWS
Co-Chairs: William Pulleyblank and Marc Snir
Board Members: Mei Kobayashi; Kurt Mehlhorn; Michael Mitzenmacher; Rajeev Rastogi

VIEWPOINTS
Co-Chairs: Tim Finin; Susanne E. Hambrusch; John Leslie King
Board Members: William Aspray; Stefan Bechtold; Michael L. Best; Judith Bishop; Stuart I. Feldman; Peter Freeman; Mark Guzdial; Rachelle Hollander; Richard Ladner; Carl Landwehr; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing

PRACTICE
Co-Chairs: Stephen Bourne
Board Members: Eric Allman; Terry Coatta; Stuart Feldman; Benjamin Fried; Pat Hanrahan; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; George Neville-Neil; Theo Schlossnagle; Jim Waldo
The Practice section of the CACM Editorial Board also serves as the Editorial Board of .

CONTRIBUTED ARTICLES
Co-Chairs: Andrew Chien and James Larus
Board Members: William Aiello; Robert Austin; Elisa Bertino; Gilles Brassard; Kim Bruce; Alan Bundy; Peter Buneman; Peter Druschel; Carlo Ghezzi; Carl Gutwin; Gal A. Kaminka; James Larus; Igor Markov; Gail C. Murphy; Bernhard Nebel; Lionel M. Ni; Kenton O’Hara; Sriram Rajamani; Marie-Christine Rousset; Avi Rubin; Krishan Sabnani; Ron Shamir; Yoav Shoham; Larry Snyder; Michael Vitale; Wolfgang Wahlster; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs: Azer Bestavros and Gregory Morrisett
Board Members: Martin Abadi; Amr El Abbadi; Sanjeev Arora; Nina Balcan; Dan Boneh; Andrei Broder; Doug Burger; Stuart K. Card; Jeff Chase; Jon Crowcroft; Sandhya Dwarkadas; Matt Dwyer; Alon Halevy; Norm Jouppi; Andrew B. Kahng; Sven Koenig; Xavier Leroy; Steve Marschner; Kobbi Nissim; Steve Seitz; Guy Steele, Jr.; David Wagner; Margaret H. Wright

WEB
Chair: James Landay
Board Members: Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2015 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $100.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact [email protected].

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA

Printed in the U.S.A.

editor’s letter

DOI:10.1145/2839512
Moshe Y. Vardi
On Lethal Autonomous Weapons

THE IMPRESSIVE PROGRESS in artificial intelligence (AI) over the past decade and the prospect of an impending global race in AI-based weaponry have led to the publication, on July 28, of “Autonomous Weapons: An Open Letter from AI & Robotics Researchers,” with over 20,000 signatories by now, calling for “a ban on offensive autonomous weapons beyond meaningful human control.” Communications is following up on this letter with a Point-Counterpoint debate between Stephen Goose and Ronald Arkin on the subject of lethal autonomous weapons systems (LAWS) beginning on page 43.

“War is hell,” said General William T. Sherman, a Union Army general during the American Civil War. Since 1864, the world’s nations have developed a set of treaties (known as the “Geneva Conventions”) aimed at somewhat diminishing the horror of war and banning weapons that are considered particularly inhumane. Some notable successes have been the banning of chemical and biological weapons, the banning of anti-personnel mines, and the banning of blinding laser weapons. Banning LAWS seems to be the next frontier in the effort to “somewhat humanize” war.

While I am sympathetic to the desire to curtail a new generation of even more lethal weapons, I must confess, however, to having a deep sense of pessimism as I read the Open Letter, as well as the two powerful Point and Counterpoint articles. I suspect many computer scientists, like me, like to believe that, on the whole, computing benefits humanity. Thus, it is disturbing for us to realize computing is also making a major contribution to military technology. In fact, since the 1991 Gulf War, information and computing technology has been a major driver in what has become known as the “Revolution in Military Affairs.” The “third revolution in warfare,” referred to in the Open Letter, has already begun! Today, every information and computing technology has some military application. Let us not forget, for example, that the Internet came out of ARPAnet, which was funded by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense. Do we really believe AI can, somehow, get an exemption from military applicability? AI is already seeing wide military deployment.

Rather than call for a general ban on military application of AI, the Open Letter calls for a more specific ban on “offensive autonomous weapons,” which “select and engage targets without human intervention.” But the concept of “autonomous” is intrinsically vague. In the 1984 science-fiction film The Terminator, the title character is a cyborg assassin sent back in time from the year 2029. The Terminator seems to be precisely the nightmarish future the Open Letter signatories are attempting to block, but the Terminator did not select its fictional target, Sarah Connor; that selection was done by Skynet, an AI defense network that has become “self-aware.” So the Terminator itself was not autonomous! In fact, the Terminator can be viewed as a “fire-and-forget” weapon, which does not require further guidance after launch. My point here is not to debate a science-fiction scenario but to point out the intrinsic philosophical vagueness of the concept of autonomy.

Goose argues that ceding life-and-death decisions to machines on the battlefield crosses a fundamental moral and ethical line. This assumes humans make every life-and-death decision on today’s battlefield. But today’s battles are conducted by systems of enormous complexity. A lethal action is the result of many actions and decisions, some by humans and some by machines. Defining causality when discussing composite actions by highly complex systems is nearly impossible. The “fundamental moral and ethical line” discussed by Goose is fundamentally vague.

Arkin’s position is that AI technology could and should be used to protect noncombatants in the battlespace. I am afraid I am as skeptical of the potential of technology to humanize war as I am of the prospect of banning technology in war. Arkin argues that judicious design and use of LAWS can lead to the potential saving of noncombatant life. Technically, this may be right. But the main effort of military designers has been and will be to increase the lethality of their weapons. I fear that protecting noncombatant life has been and will be a minor goal at best.

The bottom line is that the important issue raised by the Open Letter and by the Point-Counterpoint articles is highly complex. Knowledgeable, well-meaning experts are arguing the two sides of the LAWS issue. To the best of my knowledge, this is the first time the computing-research community is publicly grappling with an issue of such weight. That, I believe, is a very positive development.

Follow me on Facebook, Google+, and Twitter.

Moshe Y. Vardi, EDITOR-IN-CHIEF

Copyright held by author.

SHAPE THE FUTURE OF COMPUTING. JOIN ACM TODAY.

ACM is the world’s largest computing society, offering benefits and resources that can advance your career and enrich your knowledge. We dare to be the best we can be, believing what we do is a force for good, and in joining together to shape the future of computing.

SELECT ONE MEMBERSHIP OPTION

ACM PROFESSIONAL MEMBERSHIP:
[ ] Professional Membership: $99 USD
[ ] Professional Membership plus ACM Digital Library: $198 USD ($99 dues + $99 DL)
[ ] ACM Digital Library: $99 USD (must be an ACM member)

ACM STUDENT MEMBERSHIP:
[ ] Student Membership: $19 USD
[ ] Student Membership plus ACM Digital Library: $42 USD
[ ] Student Membership plus Print CACM Magazine: $42 USD
[ ] Student Membership with ACM Digital Library plus Print CACM Magazine: $62 USD

[ ] Join ACM-W: ACM-W supports, celebrates, and advocates internationally for the full engagement of women in all aspects of the computing field. Available at no additional cost.

Priority Code: CAPP

Payment Information
Payment must accompany application. If paying by check or money order, make payable to ACM, Inc., in U.S. dollars or equivalent in foreign currency.

Name
ACM Member #
Mailing Address
City/State/Province
ZIP/Postal Code/Country
Email

[ ] AMEX [ ] VISA/MasterCard [ ] Check/money order
Total Amount Due
Credit Card #
Exp. Date
Signature

Return completed application to:
ACM General Post Office
P.O. Box 30777
New York, NY 10087-0777

Prices include surface delivery charge. Expedited Air Service, which is a partial air freight delivery service, is available outside North America. Contact ACM for more information.

Satisfaction Guaranteed!

Purposes of ACM
ACM is dedicated to:
1) Advancing the art, science, engineering, and application of information technology
2) Fostering the open interchange of information to serve both professionals and the public
3) Promoting the highest professional and ethics standards

BE CREATIVE. STAY CONNECTED. KEEP INVENTING.

1-800-342-6626 (US & Canada)
1-212-626-0500 (Global)
Hours: 8:30AM - 4:30PM (US EST)
Fax: 212-944-1318
[email protected]
acm.org/join/CAPP

cerf’s up

DOI:10.1145/2842510
Vinton G. Cerf
Advancing the ACM Agenda

ACM is a fairly compact organization, looking only at its modest but spirited staff, now led by the energetic team of CEO Bobby Schnabel and COO Pat Ryan. For the sake of saving space, I have not tried to list all the other wonderful staffers who tend to the machinery of ACM’s operation and support the much larger volunteer structure that makes ACM what it is today. On the record, we have a first-rate team of full-time talent keeping ACM humming. But, alone, the staffers do not and could not make ACM the diverse and multifaceted organization it is. The volunteers, many of whom have served in multiple roles over decades, provide ACM with the intellectual and operational muscle power it needs to carry out myriad activities in support of computer science professionals and practitioners around the world. It is interesting to note that many of our dedicated ACM volunteers started with a single, focused commitment to, say, a program committee or a particular publication and, through time and further engagement, have become some of the most effective leaders of our association. Indeed, ACM volunteers come to value the impact they can make serving our community.

Despite the fact I have been a member of ACM since 1967, I am still discovering new things that ACM is involved in, and this after serving as president and now past-president for the past 3.5 years! The diversity of activity is well illustrated in ACM’s Web pages but I confess I do not visit them enough to stay on top of everything. My New Year’s resolution is going to be to visit the pages at least once a week to see what is going on. ACM is looking for new ways to draw attention to its agenda and to opportunities for members and beneficiaries of ACM’s work to discover and participate in the many volunteer opportunities that abound in our online and offline, computer-driven world. Just scanning the special interest group activities highlights the remarkable range of interests we serve. With the addition of the regional councils and chapters in India, China, Europe, and elsewhere, ACM’s footprint and scope of opportunity have increased notably.

This brings me to the first point of this column: diversity is key to ACM’s ability to prosecute its agenda and to deliver benefits to members and practitioners around the world. The second point is that volunteers are absolutely essential to achieving the many goals ACM’s volunteer leadership has adopted over the years. We need to expand outreach to underrepresented populations of computer professionals and educators and to the general public. We would benefit from many more volunteers and members drawn from non-U.S. populations, women, and other underrepresented minority groups, all of whom bring important perspectives and energy to ACM’s mission. Just thinking about accessibility of computer-based systems reminds me how important it is to engage with users who would benefit from various assistive technologies and methods. We have a SIG focused on this community (SIGACCESS), but we also need members of these communities to feel and be welcome and supported in all our SIGs and our activities. We need to make ACM itself and its functions and activities as accessible as possible.

Diversity does not stop with accessibility. If we are to be a successful global organization, diversity on all fronts must be a global ambition. That means we must work with and empower our regional councils to help draw attention to the volunteer opportunities available within the ACM framework and to the valuable products and services that are the output of ACM’s efforts. It means we need to encourage and spur efforts to bring women and minority groups more fully into the ACM family. And that brings me to another point about diversity.

We are entering into our biennial election process for leadership in ACM’s volunteer component. The able, top-leadership team of Alex Wolf, Vicki Hanson, and Erik Altman will have completed their statutory two-year terms by the end of June 2016 and it is time to elect new officers. As the electorate, you have both the opportunity and the obligation to give careful thought to your choices for leadership. I chair the nominations committee as past president and I have also sought to impress on the committee the importance of diversity in our recommendations for candidates for volunteer leadership positions. It is my hope we can collectively put ACM on a track toward greater diversity in all dimensions of its activities, including our SIGs and other special interest groups, such as ACM-W.

We are already actively pursuing these goals and your election choices represent one way to spur the organization toward greater success in volunteer diversity in the years ahead.

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author.

letters to the editor

DOI:10.1145/2841423
What About Statistical Relational Learning?

WHILE STUART RUSSELL’S review article “Unifying Logic and Probability” (July 2015) provided an excellent summary of a number of attempts to unify these two representations, it also gave an incomplete picture of the state of the art. The entire field of statistical relational learning (SRL), which was never mentioned in the article, is devoted to learning logical probabilistic models. Although the article said little is known about computationally feasible algorithms for learning the structure of these models, SRL researchers have developed a wide variety of them. Likewise, contrary to the article’s statement that generic inference for logical probabilistic models remains too slow, many efficient algorithms for this purpose have been developed.

The article mentioned Markov logic networks (MLNs), arguably the leading approach to unifying logic and probability, but did not accurately describe them. While the article conflated MLNs with Nilsson’s probabilistic logic, the two are quite different in a number of crucial respects. For Nilsson, logical formulas are indivisible constraints; in contrast, MLNs are log-linear models that use first-order formulas as feature templates, with one feature per grounding of the formula. This novel use of first-order formulas allows MLNs to compactly represent most graphical models, something previous probabilistic logics could not do. This capability contributes significantly to the popularity of MLNs. And since MLNs subsume first-order Bayesian networks, the article’s claim that MLNs have problems with variable numbers of objects and irrelevant objects that Bayes-net approaches avoid is incorrect. MLNs and their variants can handle not only object uncertainty but relation uncertainty as well. Further, the article said MLNs perform inference by applying MCMC to a ground network, but several lifted inference algorithms for them exist.

Another major strand of research in this area that the article did not portray accurately is probabilistic logic programming. The article said the “… first significant probabilistic programming language was Pfeffer’s IBAL.” While IBAL is definitely significant, Poole’s ICL and Sato’s PRISM were developed much earlier and have had a significant impact on the field. ICL and PRISM essentially extend the Prolog programming language by labeling facts with probabilities. They then use these probabilistic facts in the same way probabilistic databases use labeled tuples: to define a probability distribution over possible worlds. They can represent Bayesian networks, as well as cope with infinite possible worlds and an unknown number of objects. Sato received the test-of-time award from the International Conference on Logic Programming in 2015 for his seminal 1995 paper on PRISM.

The article concluded with, “… these are early days in the process of unifying logic and probability.” On the contrary: with developments like MLNs, probabilistic logic programming, lifted inference, statistical relational learning, and, more generally, statistical relational AI, we are well on our way to solving this longstanding problem.

Pedro Domingos, Seattle, WA, Kristian Kersting, Dortmund, Germany, Raymond Mooney, Austin, TX, and Jude Shavlik, Madison, WI

Author’s Response:
Domingos et al. take me to task for my cautious claims about the state of the art. Given the history of occasional overstatement in artificial intelligence, caution seems appropriate. Domingos et al. claim “many efficient algorithms” exist for generic inference in these languages, as well as “a wide variety … of computationally feasible algorithms for learning the structure of these models.” If true, this is excellent news for computer science, since these problems subsume the Halting Problem, as well as general program synthesis.

I am also accused of ignoring “the entire field of statistical relational learning.” The excellent book Introduction to Statistical Relational Learning, compiled by my former student Lise Getoor and the late Ben Taskar (MIT Press, 2007), has 13 chapters on SRL languages and systems. My article referred to 10 of them.

My comment that IBAL was the first probabilistic programming language (PPL) was in no way intended as a slight to Sato’s PRISM and Poole’s ICL, contributions for which I have the highest respect. My article placed these approaches, along with BLOG, within the tradition of languages for defining probability distributions over logical worlds, as did Sato and Poole. For example, the PRISM website (http://rjida.meijo-u.ac.jp/prism/) says, “The program defines a probability distribution over the set of possible Herbrand interpretations.” My article clearly distinguished this approach from the PPL tradition based on distributions over execution traces of an arbitrary programming language, due to Koller, McAllester, and Pfeffer. Perhaps this is just a matter of terminology, although it seems worthwhile to point out that execution traces need have no relational structure at all. At the time of writing, the extensive bibliography at http://probabilistic-programming.org/research/ does not include the early papers by Sato and Poole, but a broader notion of PPL might well include them.

Stuart Russell, Berkeley, CA

Give Me ‘Naked’ Braces
I was appalled by A. Frank Ackerman’s letter to the editor “Ban ‘Naked’ Braces!” (Oct. 2015), which recommended programmers adopt a policy of always following the closing brace of each code block (presumably, in Algol-like languages like C and Java) with a comment intended to make it clear exactly which code block the closing brace belongs to. However,

8 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 letters to the editor assuming one’s code is properly in- ticularly in the case of closing braces, dented and the size and complexity as advocated by Ackerman. of every function is below some rea- Weston Markham, Pittsburgh, PA sonable limit, the best way (in my firmly held opinion) to determine the code block closed by any given brace Author’s Response: is simply to follow the indentation up In matters of style there are always a variety to find its matching brace. Although of opinions. I personally find the practice I I readily admit a “reasonable limit” recommended useful both in development is a subjective matter (and may be and in improving readability for maintenance. strongly influenced by the particular I have used it for many years and taught it to development tools being used) the my students in several different procedural mere presence of a code block large languages. Code has a tendency to move and complex enough to cause poten- around quite bit during development, and tial confusion should serve as a red my proposed tags help keep the logic blocks flag, indicating the code in question straight when it happens. ought to be decomposed into smaller A. Frank Ackerman, Butte, MT functions. In such cases, it may in- deed be expedient to use comments I found an interesting juxtaposition in the manner Ackerman described, between A. Frank Ackerman’s letter if one wishes to defer the decisions to the editor (Oct. 2015) and George Spatial Computing of exactly which parts to split out, V. Neville-Neil’s Viewpoint “Storming how to pass parameters to and/or the Cubicle” (Oct. 2015), with Neville- Open Data and Civic Apps from the resulting functions, or what Neil implying Ackerman’s suggestion names to give these functions. How- would not work in practice. 
Ackerman ever, this is a rather poor (and, one reasonably recommended developers The Building Blocks hopes, temporary) reason to add such tag their construct terminators (with, COMMUNICATIONS of a Cloud Strategy comments. Every developer should say, closing braces) to help avoid pro- perceive them as being unusual in gramming errors and seemed to as- Algebraic Fingerprints production-quality code, rather than sume developers would code this way the normal way to end a code block. if asked. But Neville-Neil illustrated for Faster Algorithms The main problem with a com- the well-known fact that uneducated ment that is supposed to document and/or unmotivated developers rou- Time Is an Illusion which code block is ended by a giv- tinely ignore quality in favor of do- en brace is it merely documents the ing their work as quickly and easily Immutability Changes comment’s belief about the purpose as possible. Fortunately, Ackerman of each closing brace, and there- ended by noting educating beginners fore opens up the possibility for the about code quality is the one true Unbalanced comments to be inaccurate. In the solution to such problems. I would Data Leads to Obsolete absence of any support by the devel- like to add that education can take Economic Advice opment tools, every bug that was pos- many forms, not only for beginners sible before adopting such coding in an educational environment but Why Knowledge practice is still possible afterward. So, even for experienced programmers any degree to which subsequent de- through on-the-job training, continu- Representation Matters velopers trust the comments simply ing education, and, most important,

increases their surprise when a bug peer mentoring. in Next Month Coming Crowdsourced  happens. As a result, the comments Geoffrey A. Lowney, Issaquah, WA Enumeration Queries actually communicate their author’s skepticism, rather than confidence, as to whether or not the braces are in Communications welcomes your opinion. To submit a Bare-Metal Performance Letter to the Editor, please limit yourself to 500 words or agreement with the intended struc- less, and send to [email protected]. for I/O Virtualization ture of the code. Broadly speaking, I would not sup- port an attempt to convince coders they can provide a significant benefit to the quality of code by simply add- Plus the latest news about preserving the Internet, ing comments to indicate what they non-volatile memory, and believe the code is doing. Absent a how computers recognize way to validate those beliefs, such at- and understand images. tempts are generally misguided, par- © 2015 ACM 0001-0782/15/12 $15.00
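Markham's central objection is that a tag such as "// end if" is an unchecked claim about the code. One way to turn such tags into checked claims (an illustration only; neither correspondent proposes it) is a small script that matches braces with a stack and compares each tag with the construct the brace actually closes. The "// end <name>" convention, the keyword list, and the rules for guessing a block's name are assumptions made for this sketch, and it deliberately ignores braces inside string literals and /* */ comments.

```python
from __future__ import annotations

import re

# Assumed tag convention for this sketch: "} // end <name>", where <name> is a
# keyword (if, for, while, ...) or the name of the function being closed.
KEYWORD = re.compile(r"\b(if|else|for|while|do|switch)\b")
CALL = re.compile(r"(\w+)\s*\(")
TAG = re.compile(r"\s*end\s+(\w+)")


def construct_name(header: str) -> str:
    """Guess which construct a '{' opens from the code that precedes it."""
    keywords = KEYWORD.findall(header)
    if keywords:
        return keywords[-1]          # "} else {" -> "else"
    call = CALL.search(header)
    if call:
        return call.group(1)         # "int main(void) {" -> "main"
    return "block"


def check_brace_tags(source: str) -> list[str]:
    """Report closing-brace tags that disagree with the block actually closed.

    Braces are matched with a stack, so the check follows the real nesting,
    not the indentation. Naive: braces or "//" inside string literals and
    block comments are not handled.
    """
    stack = []        # (line opened, construct name) for each open brace
    problems = []
    prev_code = ""    # last non-blank code line, for braces on their own line
    for lineno, line in enumerate(source.splitlines(), start=1):
        code, sep, comment = line.partition("//")
        last_closed = None
        start = 0     # index just past the previous brace on this line
        for i, ch in enumerate(code):
            if ch == "{":
                header = code[start:i]
                if not header.strip():
                    header = prev_code
                stack.append((lineno, construct_name(header)))
                start = i + 1
            elif ch == "}":
                if stack:
                    last_closed = stack.pop()
                else:
                    problems.append(f"line {lineno}: unmatched '}}'")
                start = i + 1
        tag = TAG.match(comment) if sep else None
        if tag:
            if last_closed is None:
                problems.append(f"line {lineno}: tag with no closing brace")
            elif tag.group(1) != last_closed[1]:
                problems.append(
                    f"line {lineno}: tag says '{tag.group(1)}' but brace "
                    f"closes '{last_closed[1]}' opened on line {last_closed[0]}"
                )
        if code.strip():
            prev_code = code
    for opened, name in stack:
        problems.append(f"line {opened}: '{name}' block is never closed")
    return problems


if __name__ == "__main__":
    sample = (
        "int main(void) {\n"
        "    for (int i = 0; i < 3; i++) {\n"
        "        if (i == 1) {\n"
        '            puts("one");\n'
        "        } // end if\n"
        "    } // end while\n"
        "} // end main\n"
    )
    for problem in check_brace_tags(sample):
        print(problem)
```

On the sample, the script prints a single complaint, "line 6: tag says 'while' but brace closes 'for' opened on line 2". The design choice matters for the debate: the tags are compared against the brace structure the compiler sees, so a stale tag is flagged instead of silently misleading the reader, which is exactly the tool support Markham finds missing.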

The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/2833120 http://cacm.acm.org/blogs/blog-cacm What Do We Do When the Jobs Are Gone, and Why We Must Embrace Active Learning Moshe Y. Vardi ponders the outlook for people when all work is automated, while Mark Guzdial emphasizes the importance of Active Learning in teaching computer science.

Moshe Y. Vardi
"The Future of Work: But What Will Humans Do?"
http://bit.ly/1geYbLj
September 11, 2015

While artificial intelligence has proved much more difficult than some early pioneers believed, its progress has been nothing short of inexorable. In 2004, economists argued that driving was unlikely to be automated in the near future. A year later, a Stanford autonomous vehicle won a DARPA Grand Challenge by driving over 100 miles along an unrehearsed desert trail. A decade later, one hears regularly about the exploits of the Google driverless car. I believe that in 30 years it will be quaint, perhaps even illegal, for humans to drive on public roads.

Once driving is automated, delivery will be quick to follow; companies such as Amazon are already working hard on fully automating their whole supply chain. The list of jobs likely to be automated grows daily, as AI increases its cognitive ability (it won at chess in 1997 and "Jeopardy!" in 2011), and its situational awareness and physical dexterity.

The unstoppable march of AI suggests that Herbert Simon was probably right when he wrote in 1956 that "machines will be capable … of doing any work a man can do." I do not expect this to happen soon, but I do believe that by 2045 machines will be able to do much of the work that humans can do. So the question is: If machines can do almost any work humans can, what will humans do?

A typical answer is that if machines will do all our work, we will be free to pursue leisure activities. Of course, our economic system would have to undergo a radical restructuring to enable billions of people to live lives of leisure. One can imagine perhaps a society of a small number of haves and a large number of have-nots, supported, say, by government subsidies. This is reminiscent of "panem et circenses," the Roman practice of providing free bread and entertainment to the masses. Yet I do not find this a promising future, as I do not find the prospect of a leisure-only life appealing. I believe work is essential to human well-being. Is this our future?

It is instructive to recall the biblical story of the Garden of Eden in the book of Genesis (Chapters 2 and 3). God places Adam and Eve in the Garden and tells them: "But of the tree of the knowledge of good and evil, thou shalt not eat of it." The Serpent then tempts Eve, who, in turn, tempts Adam, to eat from the Tree of Knowledge. This leads to the expulsion of Adam and Eve from Eden. Furthermore, God metes out punishment to the Serpent, Eve, and Adam: "And unto Adam he said, 'cursed is the ground for thy sake; in sorrow shalt thou eat of it all the days of thy life; in the sweat of thy face shalt thou eat bread'." So, according to this biblical story, our need to work for a living is an outcome of the failure of humanity to follow the word of God.

But let us contemplate humanity before and after the expulsion. Before the expulsion, Adam and Eve spent their time frolicking naked in the garden, where food is amply available without work; one could say they were no better than apes. One could even see the story as a metaphor for the roots of humanity in pre-human primates. After the expulsion, humans had to work for a living, but they had eaten from the fruit of the Tree of Knowledge. They were inventive. They learned to hunt, mastered fire, invented agriculture, and eventually launched the Industrial Revolution. We are about to launch another Industrial Revolution, where work will be almost fully automated.

In a sense, humans used the knowledge they gained from the Forbidden Fruit to overcome God's punishment; they will no longer need to work for a living; no more "by the sweat of thy face." But can humanity go back to the Garden of Eden? Will we be happy just frolicking? Furthermore, human progress has been driven to a large extent by our desire to eliminate work or, at least, to lighten the toil. What will drive humanity once that goal has by and large been accomplished?

Thus, even if we manage to solve the economic implications of the complete or almost-complete automation of work, the question of the consequences to quality of life remains wide open. The classical Greek philosophers, starting with Socrates, discussed "Eudaimonia," often translated as "the good life," in other words, human flourishing. Aristotle viewed this question as one of the most central in philosophy. So the question facing us today is whether we can achieve the good life without work.

I believe the question of how humanity will occupy itself in the presence of intelligent machinery is one of the most central challenges facing society today. To repeat my earlier question: If machines are capable of almost any work humans can do, what will humans do?

Mark Guzdial
"Be It Resolved: Teaching Statements Must Embrace Active Learning and Eschew Lecture"
http://bit.ly/1RkVbuC
August 14, 2015

In 1964, the U.S. Surgeon General produced a report unequivocally stating that smoking was a health hazard. The report had a dramatic impact on public policy and how people viewed smoking. Fifty years later, we know that the impact of that report was to save thousands and maybe millions of lives (see 50-year retrospective report at http://1.usa.gov/1RkVyoM).

We are at a similar point in understanding that lecture is an ineffective way of teaching. Active learning methods lead to better learning and greater retention. Moreover, there is increasing evidence that poor teaching disproportionately impacts students from disadvantaged and underrepresented groups.

Last year, the Proceedings of the National Academy of Sciences published a meta-analysis of 225 studies (http://bit.ly/1Lovtqb). The conclusion appeared as the title of the paper, Active learning increases student performance in science, engineering, and mathematics. There is increasing evidence that improved teaching reduces the achievement gap between disadvantaged and more advantaged students, for example, in biology (http://bit.ly/1j5Kp06) and in computer science (see new paper from ICER 2015 at http://dl.acm.org/citation.cfm?doid=2787622.2787728).

Now, Nature has just published a paper (http://bit.ly/1Od6G8U), "Why we are teaching science wrong, and how to make it right," which includes the quote, "At this point it is unethical to teach any other way." Wired magazine's article on the active learning papers (http://wrd.cm/1VvlAXa) makes the connection more explicit: "The impact of these data should be like the Surgeon General's report on 'Smoking and Health' in 1964—they should put to rest any debate about whether active learning is more effective than lecturing."

It is now a matter of science, not opinion. Active learning methods are more effective than lecturing. We should encourage use of active learning methods in our classrooms. The blog post at http://bit.ly/1MOMMxO connects to resources for improved teaching methods in computer science. There are active learning methods that we can use even in large classes, like Peer Instruction (see PeerInstruction4CS.org).

Here is something concrete that we in academia can do. We can change the way we select teachers for computer science and how we reward faculty. All teaching statements for faculty hiring, promotion, and tenure should include a description of how the candidate uses active learning methods and explicitly reduces lecture.

We create the incentive to teach better. We might simply add a phrase to our job ads and promotion and tenure policies like, "Teaching statements will be more valued that describe how the candidate uses active learning methods and seeks to reduce lecture." We should read these critically. We should be convinced the candidates are not just mouthing active learning rhetoric, but are actually investigating and using active learning methods.

It is a small step, but it is an important one. Incentives change behavior. Stating clearly what we value in teaching statements will send messages that change how CS faculty teach over time. This step will likely have a critical impact on how we teach and who succeeds in computing.

I do not know if any other STEM disciplines are changing how they evaluate teaching statements in response to the Nature and PNAS papers. Let us lead. Let us be first. In 50 years, we might be looking back to find that our response to these reports brought more and more diverse students to computing.

Moshe Y. Vardi is the Karen Ostrum George Distinguished Professor of Computational Engineering at Rice University, and editor-in-chief of Communications. Mark Guzdial is a professor in the College of Computing at the Georgia Institute of Technology.


Science | DOI:10.1145/2833138 Don Monroe When Data Is Not Enough Reproducibility of code is increasingly crucial to verifying scientific claims.

MASSIVE DATASETS and digital processing are transforming and accelerating science, but there is growing concern that many scientific results may not be trustworthy. Scientific procedures developed over centuries to assure reliable knowledge are sometimes overwhelmed by new ways of generating and processing scientific information. As a result, the scientific community is implementing requirements that help independent researchers reproduce published results, a cornerstone of the scientific method.

For data, the revolution is well under way. Inspired by projects like the Human Genome Project, the National Institutes of Health has provided infrastructure (and funding) for massive repositories of genetic and other data. In this field and others, researchers are expected to make their data available to other researchers.

Yet there is a growing recognition that provisions must also be made for the data-analysis software that supports the conclusions. Policy discussions of reproducibility "almost always talk primarily about data," cautioned Victoria Stodden of the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. "This is a huge gap for computational science."

"There has been an enormous flip in the last 20 years from data collection to data analysis," said Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore. In many fields, the critical challenges now reside in analysis of widely available big datasets, he said. This shift "has happened more quickly than I think most fields have been prepared to deal with."

A Reproducibility Crisis
Reproducibility is always central to science, but especially when data may be fudged. In a recent case, Duke University researchers described a gene "signature" to predict cancer-drug response, which generated enormous interest and even clinical trials at Duke. Excited clinicians at the University of Texas MD Anderson Cancer Center enlisted statisticians Keith Baggerly and Kevin Coombes to confirm the conclusions, but the pair found numerous problems in the data analysis. Eventually the trials were stopped, but only after misrepresentations were found on researcher Anil Potti's résumé.

Because of the limited published information, this reanalysis required "about 30% of our time for several months," Baggerly said. "This is not a scalable solution." Based on this experience, he said, "It became obvious that improving the reproducibility of reports was an important problem in and of itself," including the posting of code as well as data.

"This was certainly an atypical case," Baggerly stressed, but encouraging reproducibility is critical even without the possibility of misconduct. For example, a widely noted 2012 comment in Nature said retesting by the pharmaceutical company Amgen confirmed only six of 53 "landmark" biomedical studies. A study published in June on "The Economics of Reproducibility in Preclinical Research" (http://bit.ly/1K7NeWL) estimated the costs of irreproducibility in life science at $28 billion per year in the U.S., attributing a quarter of that sum to "data analysis and reporting."

Recognizing this challenge, systematic efforts are under way to validate important studies in biomedicine and also in psychology, which has been plagued by irreproducible results.

Documenting Science in Progress
Traditionally, the laboratory notebook was a standard part of experimental research, documenting procedures, measurements, and calculations that led to the conclusions.

As analysis has moved to computers, researchers may explore their data using simple but powerful tools like Excel spreadsheets, which leave no record of the calculations. "It's a little bit crazy," Stodden said, since those programs are "not designed with scientific tracking and sharing needs in mind."

Reproducibility of code should force scientists to migrate to platforms that document their manipulations explicitly. Although there is not yet any standard to rival the lab notebook, "it's gotten markedly easier," said Baggerly. "You've seen the advent of a whole bunch of tools which make tracking of computation and reconstruction easier."

For code, GitHub is widely used to track rapidly evolving programs. This support is important for code, Stodden says, because unlike data, "well-used popular software is all forked and branched; it's changing all the time." Other tools are also available, but so far none is appropriate for everyone. To help computational scientists navigate this complex landscape, Peng and two colleagues offer an online course on reproducible research, and he, Stodden, and a third author also collaborated on a book, Implementing Reproducible Research. "Everyone involved in science will need to know some data analysis," said Peng, or at least "how to manage people who are doing these complex data analyses" as part of increasingly multidisciplinary teams.

Putting Up Resistance
There are many reasons researchers may not want to post their code, noted Yolanda Gil of the University of Southern California's Information Sciences Institute, who also chairs the ACM Special Interest Group on Artificial Intelligence. She likens the task to commenting code, a thankless task that everyone endorses but which many programmers neglect. On top of the work involved in making their software usable by strangers, researchers "often believe it is not very high-quality," she said, or that "no one will use it."

If the code and data are used by others, that can create even more work, as climate scientists learned when outside skeptics pored over their work. "It's easy to take data that someone has provided and do something to it and say that something fraudulent has occurred," said Peng. In principle, such scrutiny will improve the science, but the extra work can make researchers reluctant to share.

Publications
In practice, researchers rarely recreate experiments simply to confirm them, but they often retrace the key steps in order to extend the results. Scientific-article style evolved specifically to ensure readers have enough information to repeat the procedures.

An official publication is an ideal time to assess the reliability of the results and to deposit data and code in a public repository, and journals increasingly allow online posting of supporting material. They also frequently require authors to affirm that data has been placed in public repositories, although those requirements are not necessarily enforced. "The legacy publishing system hasn't quite caught up to the fact that we need to have code and data and digital scholarly objects associated with our publications," said Stodden.

One exception to this neglect is the journal Mathematical Programming Computation, founded in 2009. Following the mathematics tradition of checking proofs before publication, this journal not only requires software but, whenever possible, technical editors confirm the software works as advertised. "It can sometimes be quite difficult," said founding editor William Cook, a professor of Combinatorics and Optimization at the University of Waterloo in Canada, who says this process has uncovered serious unintentional errors. "I am now skeptical about any computational results (in my own area of mathematical optimization) that have not gone through a review like we do."

Still, "the extra review is a great burden on authors," he admits. Some of them complain that they get no extra credit for publishing in this journal, "so they choose to go elsewhere."

As a demonstration for less-mathematical science, Gil has been shepherding The Geosciences Paper of the Future Initiative, sponsored by the National Science Foundation. For this project, a dozen or so teams of geoscientists have volunteered to submit articles that illustrate transparency and reproducibility, to be published in the next few months. Among other things, "We're trying to incorporate these best practices as you do the work, not at the end," Gil stressed.

Providing Incentives
"Everyone agrees" about the need to address code reproducibility, Stodden said. "We'd have much better science if we resolved these issues." However, expecting early-career researchers to unilaterally devote effort to this end is not reasonable, she stressed. "They're just not going to, until the incentives change."

As a step toward providing those incentives, last winter the non-profit Center for Open Science hosted a committee including scientific experts as well as representatives of journals and funding agencies. The group published a general framework for journals to codify their transparency requirements for each of eight standards, notably including "analytic methods (code) transparency." For each standard, the framework specifies four tiers of rigor, ranging from vague support or silence to detailed validation of posted resources.

The standards had to be flexible, said executive director Brian Nosek, a psychologist at the University of Virginia in Charlottesville. "Different fields have different documentation needs," and some journals have limited resources. "There can't be a one-size-fits-all solution." The graduated structure should make it easy for journals to get started and to increase their commitment over time.

Still, "just having a standard isn't a solution in itself," Nosek acknowledged. Ultimately, requirements for deposition of data, code, and other research materials also will have to be enforced broadly throughout the scientific ecosystem, including funding agencies and committees making tenure decisions. "A focus on journals was our start," Nosek said. "It was low-hanging fruit."

Into the Future
Code used to derive scientific results is finally being recognized as a critical ingredient for reproducibility, but supporting tools are still primitive. "The only solution that we have now is to dump all the code onto you," said Peng, which he likens to trying to teach a musician a song with a bit-level sound file. Researchers are still trying to devise the code equivalent of sheet music that makes the significance of sounds humanly useful.

Simply making code available also fails to answer the key question: whether it actually does the calculations it is supposed to do. Such verification is burdensome and may be almost impossible for large programs like those used for climate, unless researchers can implement intrinsically robust techniques like those being developed for mathematical proofs. (See "A New Type of Mathematics?" Communications, February 2014.)

In addition, even a complete snapshot of the code used in the publication will miss "all the trails and avenues that scientists went down that didn't pan out," Stodden warned. Although such exploration is an important part of science, it naturally selects pleasant surprises. The statistical significance of a clinical trial, for example, can only be assessed if there was a clear protocol on record beforehand. Tracking the dead ends would require a much more sophisticated software environment, one that does not yet exist. In principle, such a platform could document the entire history and workflow of a project. Unlike the minutiae of physical and chemical experiments, data and code can be completely and cheaply captured, so software tools could lead the way to improving reproducibility across the entire scientific enterprise.

For now, however, advocates are just hoping to improve the transparency of code. "It's a serious collective-action problem," Stodden said. Only the highest-impact journals can impose new requirements without the risk of alienating authors, for example. "The progress is very slow," Stodden said, but "we're getting there."

Further Reading
Ince, D.C., Hatton, L., and Graham-Cumming, J. The case for open computer programs. Nature 482 (2012), 485–488. doi:10.1038/nature10836
Stodden, V., Borwein, J., and Bailey, D. Setting the default to reproducible in computational science research. SIAM News, June 3, 2013. http://sinews.siam.org/DetailsPage/tabid/607/ArticleID/351/
Stodden, V., Leisch, F., and Peng, R.D. Implementing Reproducible Research. CRC Press, 2014. https://books.google.com/books?id=WVTSBQAAQBAJ
Stodden, V. Opportunities and Challenges for Open Data and Code: Facilitating Reproducibility. 2013. http://bit.ly/1MzGSBB
Baggerly, K. The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics. 2010. http://bit.ly/1EEUWmM

Don Monroe is a science and technology writer based in Boston, MA.
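The article contrasts spreadsheets, which "leave no record of the calculations," with the observation that data and code "can be completely and cheaply captured." As a minimal sketch of what such capture can look like (an illustration, not a tool named in the article), the Python below builds a provenance record for one analysis run: a SHA-256 hash of each input file, the parameters used, the interpreter and platform, and the current git commit when the code lives in a git repository. The function names and record fields are hypothetical choices for this example.

```python
from __future__ import annotations

import hashlib
import json
import platform
import subprocess
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_commit() -> str | None:
    """Return the current commit hash, or None outside a git repo or without git."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def provenance_record(inputs: list[Path], params: dict) -> dict:
    """Collect what a later reader would need to rerun the analysis (hypothetical schema)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "code_commit": git_commit(),
        "parameters": params,
        "input_sha256": {str(p): sha256_of(p) for p in inputs},
    }


def write_record(record: dict, out: Path) -> None:
    """Store the record as sorted, indented JSON next to the results."""
    out.write_text(json.dumps(record, indent=2, sort_keys=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "measurements.csv"
        data.write_text("x,y\n1,2\n3,4\n")
        record = provenance_record([data], {"alpha": 0.05, "seed": 42})
        write_record(record, Path(tmp) / "provenance.json")
        print(json.dumps(record, indent=2, sort_keys=True))
```

Committing such a JSON file alongside the results gives a reader something closer to the lab notebook the article describes: the hashes show whether the data changed, and the commit pins the exact version of rapidly forking code. It does not, however, capture the abandoned "trails and avenues" Stodden mentions; that would need the richer workflow platform the article says does not yet exist.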


Technology | DOI:10.1145/2834057 Gregory Mone The Hyper-Intelligent Bandage Scientists are developing smart, sensor-packed dressings to help heal chronic wounds.

IN THE RECENT movie Avengers: Age of Ultron, the supremely skilled archer Hawkeye suffers a major injury to his midsection during the opening battle. Hawkeye is quickly wrapped up and whisked back to superhero headquarters, where a bioengineer repairs the terrible wound, using a tool that scans and then prints a new layer of synthetic skin across the injured area. After a quick recovery, he is soon up and racing back into action.

If only wound treatment were that advanced in the real world. Unfortunately, effectively dealing with major wounds and burns, whether they stem from disease or the battlefield, remains a technical and scientific challenge.

A wound is a complex microenvironment packed with competing influences, from the agents dispatched by the body's immune response to the bacteria that cause infections. Standard bandages and dressings cannot always maintain the infection-free, moist, oxygen-rich environment required for a wound to heal. One often-cited example is the population of patients with severe diabetes who incur external ulcers on their feet; approximately 25% of diabetics suffer from a foot ulcer at some point in their lives, and these wounds are a leading cause of amputations.

Recently, scientists have begun working on new types of smart bandages capable of monitoring and even treating such chronic wounds. The research is still in the prototype stage, and the approaches vary, but these intelligent devices will be closer to miniature medical labs than advanced bandages: they will protect the wound, provide a scaffold on which new cells can grow, monitor the area for infections, wirelessly alert caregivers to changes in the status of the injury, and potentially deliver medications directly.

This prototype electronic skin patch can store and transmit data about a person's movements, receive diagnostic information, and release drugs into skin. (Image courtesy of the National Science Foundation.)

Chemist Conor Evans of the Wellman Center for Photomedicine at Massachusetts General Hospital compares the nascent technology to the life-supporting protective garb of the space-walking astronaut. "What we're trying to do is make a space suit for wounds—something that can go over a wound, keep it safe, and start to allow it to heal," he says.

Sensing Problems at the Site
The treatment of wounds has changed dramatically over the millennia, beginning with ancient Sumerians applying poultices of honey and animal fats to an injury, but the body's healing process has remained the same. First, the body attempts to stop bleeding through coagulation. Then inflammation takes over, followed by the proliferation of new cells, repair of the injured skin, and eventually the growth of new skin to cover the wound.

In the case of chronic wounds, the first step is not as much of a concern as the later ones, during which bacteria and other factors conspire to slow or halt healing. One of the major areas of focus in smart bandage research is detecting biological warning signs of these problems. At Northeastern University, chemical engineer Edgar Goluch and colleagues are developing sensors designed to identify the presence of specific bacterial species.

Oxygen is another key metric: healing tissues consume large amounts of oxygen; without it, they die. This is one of the main reasons for diabetic ulcers and the chronic wounds that can result from bedsores, in which the pressure of one's own body actually denies oxygen to the surrounding tissue.

Evans and his group developed a prototype bandage that measures tissue oxygenation through the use of oxygen-sensitive phosphors, or light-emitting molecules. When oxygen bumps into these molecules, it effectively dims the phosphors; in low-oxygen environments, "as oxygen decreases, it glows brighter and brighter," he explains. The result is a bandage that maps oxygen levels across the wound. Initially, the bandage had to be photographed to reveal the results, but now oxygen-deficient areas are visible to the naked eye. Evans says the ability to pinpoint these trouble spots could allow for a more targeted response.

Other groups, including a team led by Harvard University tissue engineer Ali Khademhosseini, are developing technologies to measure not just oxygen, but temperature, pH, and more. For diabetic patients, who often lose feeling in their extremities and thus fail to notice a worsening wound, this kind of feedback could be extremely valuable; a smart bandage with such sensors could alert them to the presence of an infection.

Responding in Real Time
The next step in the process is reacting to alarming information. Khademhosseini and his multidisciplinary team are building smart bandages that consist of a flexible substrate with embedded microelectronics and communications components, along with sensors and drug delivery mechanisms. Current prototypes are about a half-inch thick and six to eight inches long. They communicate wirelessly with smartphones and computers through low-energy Bluetooth, and a coin cell battery powers the bandages, affording them a lifetime of about one week. The group is also working to incorporate wireless charging through wireless power transfer technology.

In one prototype, low oxygen levels in the wound trigger a chemical reaction inside the bandage; the by-product of this reaction is oxygen, which then migrates out of the bandage and into the deprived area. Khademhosseini and his colleagues plan to conduct animal testing of this oxygen-releasing system, along with

"There's an ensemble of technologies that will be applied to wound healing, because every patient is different, and every wound is different, too."

moisture, and pH levels, is allowing the skin to properly reform. One of the ways our bodies deal with a severe injury is through scarring, but scar tis-

the hybrid material in animal models. The results have not yet been published, but the fibronectin seemed to accelerate the healing and prevent scarring in mice. The work is still in its early stages, but Parker is encouraged. "We want skin that's soft, almost like baby skin, so if you have craniofacial burns you can still move your mouth," Parker says, "and we think we've nailed it."

Eventually, Parker says, this new material could be incorporated into smart bandage technology, but he cautions there will not be a single cure-all device for chronic wounds and burns. Khademhosseini's group is developing a range of different prototypes for that reason, and none of the scientists are underestimating the difficulty of the problem they are working to address. "Wound site biology is very complex," Parker explains.
“There’s a whole another that administers antibiotics. sue can be painfully taut and unnatu- ecosystem in there before you even The goal is to graduate to human clinical ral, especially in areas like the face. start talking about infections. There’s trials within a few years. In the case of an Harvard University biophysicist Kevin an ensemble of technologies that will actual patient, the bandage could wire- “Kit” Parker, an Army veteran, has seen be applied to wound healing, because lessly relay sensor data to a smartphone this firsthand in U.S. soldiers returning every patient is different, and every on a regular basis. An app could translate home from Iraq and Afghanistan with wound is different, too. We’re coming that raw data, then generate updates on severe craniofacial burns. “The scar- up with one technology, but there’s no whether the wound is healing or becom- ring can be really terrible,” Parker says. silver bullet.” ing infected, and relay this information “They struggle to use their mouths and to a physician for review. open and close their eyes.” Further Reading “The system could transfer all the The soldiers’ struggles inspired sensing parameters to medical staff, in- Parker to search for a better alterna- Dargaville, T.R., Farrugia, B.L., Broadbent, J.A., cluding the oxygen concentration of the tive—a dressing that would allow the Pace, S., Upton, Z., and Voelcker, N.H. Sensors and Imaging for Wound Healing: wound (and) temperature,” suggests Ali skin to heal with minimal scarring. A Review. Biosensors and Bioelectronics, 41. Tamayol, a bioengineer at Harvard who The key may be the material that actu- Li, Z., Roussakis, E., Evans, C.L., et. al. is also involved in the project. “They ally comes in contact with the wound as Non-Invasive Transdermal Two- could then decide if things are going new cells move in to replace the dam- Dimensional Mapping of Cutaneous well or if they have to do something.” aged ones. 
The skin cells benefit when Oxygenation with a Rapid-Drying Liquid Yet Khademhosseini says the there is a ‘scaffold’ in place, and Parker Bandage. Biomedical Optics Express, Vol. 5, smart bandage could also be a fully and his team found that a substrate Issue 11, 2014. closed-loop system, given the power made of precisely aligned nanofibers Abrigo, M., McArthur, S.L., and Kingshott, P. of micro-processors and the relatively is an ideal material. Other scientists— Electrospun Nanofibers as Dressings for Chronic Wound Care: Advances, Challenges, small amount of data to be analyzed. including Tamayol and Khademhos- and Future Prospects. Macromolecular The system could be programmed seini—have tested this idea of using Bioscience. 2014, 14. so certain conditions trigger it to re- nanofiber dressings for wound healing, Najafabadi, A.H., Tamayol, A., lease oxygen molecules, while oth- with positive results. Parker’s group, Khademhosseini, A., et. al. ers spur the targeted delivery of an- however, has developed a new manu- Biodegradable Nanofibrous Polymeric tibiotics. Tamayol explains that you facturing technique, inspired by a cot- Substrates for Generating Elastic and could define a critical range for pH, ton candy machine, that allows them to Flexible Electronics. Advanced Materials. Sept. 3, 2014. for example, and if those parameters incorporate a wider range of materials were breached, then a particular drug into the nanofibers. Badrossamay, M.R., Balachandran, K., In their research of the literature Parker, K.K., et. al. could be released. “The system could Engineering Hybrid Polymer-Protein Super- understand and react,” he says. “Ide- on wounds, Parker and his colleagues Aligned Nanofibers via Rotary Jet Spinning. ally, it would all be automatic.” learned about a protein called fibronec- Biomaterials, March 2014. tin that can accelerate the healing pro- Healing the Wound cess and prevent scarring. 
The group in- Gregory Mone is a Boston, MA-based science writer and children’s novelist. The final step, after warding off infec- corporated some of this protein into its tion and maintaining the right oxygen, polymer-based nanofibers, then tested © 2015 ACM 0001-0782/15/12 $15.00
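Tamayol's example in the smart-bandage article amounts to simple threshold logic: a critical pH range whose breach releases a drug, and low oxygen that triggers the oxygen-generating reaction. The Python sketch below shows only that idea. The threshold values, action names, and the `Reading` type are hypothetical placeholders, not parameters of the Harvard prototypes.

```python
from dataclasses import dataclass

# Hypothetical placeholder thresholds; the article does not give real values.
PH_RANGE = (6.5, 7.5)        # assumed "critical range" for wound pH
MIN_OXYGEN_PERCENT = 5.0     # assumed low-oxygen trigger level


@dataclass
class Reading:
    """One sensor sample from a (hypothetical) smart bandage."""
    ph: float
    oxygen_percent: float


def decide_actions(reading: Reading) -> list[str]:
    """Return the interventions a closed-loop bandage might trigger for one reading."""
    actions = []
    low, high = PH_RANGE
    if not low <= reading.ph <= high:
        # pH has left the critical range: release a drug (placeholder action name).
        actions.append("release_antibiotic")
    if reading.oxygen_percent < MIN_OXYGEN_PERCENT:
        # Oxygen-deprived tissue: trigger the oxygen-generating reaction.
        actions.append("release_oxygen")
    return actions


def summarize(readings: list[Reading]) -> str:
    """A crude status line an app might relay to a physician for review."""
    flagged = sum(1 for r in readings if decide_actions(r))
    return f"{flagged}/{len(readings)} readings out of range"


print(summarize([Reading(7.0, 10.0), Reading(8.2, 2.0)]))  # prints: 1/2 readings out of range
```

In an open-loop design, only `summarize` would run, and a clinician would act on its output. In a closed-loop design, `decide_actions` would drive the bandage's release mechanisms directly.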


Society | DOI:10.1145/2834055
Keith Kirkpatrick
Technology Brings Online Education in Line with Campus Programs
Whether sitting in front of a screen or in a classroom, online and campus-based institutions want to verify students actually attend classes, take exams.

“WHEN YOU CHEAT, you only cheat yourself,” is an oft-repeated pearl of wisdom from teachers, parents, and enlightened students alike. Nevertheless, 65%–75% of college-age students have admitted to cheating at one time or another, according to surveys conducted in the early 1960s, and recent studies indicate cheating remains rampant. Indeed, Harvard University forced out 70 students for cheating during a May 2012 final exam, and in 2015, more than 60 Dartmouth College students were accused of cheating in a sports-ethics class.

In response to such incidents, as well as growing pressure from accreditation bodies, employers, and even alumni, traditional campus-based institutions and online universities are implementing technological solutions for stopping cheaters.

A “flipped” classroom: students watch video lectures at home and work on problems in class.
IMAGE BY IDA LIESZKOVSZKY/STATEIMPACT OHIO

The challenge faced by these institutions is twofold: schools must ensure the work a student turns in throughout the course is his or her own, and must prevent or discourage students from cheating on examinations. Ultimately, if cheating is seen as pervasive, the reputation of the school, as well as its graduates and faculty, will suffer.

Institutions also are being closely monitored by the U.S. Department of Education, which oversees the financial aid/student loan eligibility process through which most schools receive the majority of their funding. “The Department of Education took a stronger stance when the Office of Inspector General did its audits for financial aid disbursements,” according to Tim Dutta, CEO of Verificient Technologies, a provider of student verification and anti-cheating technology. “From that perspective, schools have taken a more proactive and rigorous approach” to ensuring academic integrity with their programs. “Now these [verification and integrity solutions] have kind of gotten on the radar of institutions as a ‘need to have,’ rather than a ‘nice to have.’”

One Step Ahead
When students from earlier eras plagiarized authors’ works and liberally quoted from other students’ previous papers, it took nearly as much time and effort to retype the work as it would to simply complete the assignment properly. The advent of computer- and Internet-based learning made it easier to copy-and-paste from ever-growing annals of legitimate source material. Furthermore, websites have sprung up to offer custom-written papers on hundreds of subjects, presumably properly sourced and annotated to pass muster with most professors.

The ease and pervasiveness of plagiarism is driving the use of technology to assess the originality of student-submitted work. Scanning software such as Turnitin (http://turnitin.com/), which The New York Times reported being used by more than 3,500 universities, compares student-submitted work with content on the Web, as well as previously submitted student papers. According to Turnitin, the software uses a matching algorithm to identify passages of words within submitted work identical to those found within its repository of content. The result is an “originality report,” indicating the degree of similarity between submitted work and sources of content contained in the database. Because the report measures similarity rather than intent, human judgment is also required to determine if an instance of plagiarism has occurred.

Software, however, cannot catch all instances of students passing off the work of others as their own, particularly as online courses continue to grow in popularity among both traditional educational institutions and for-profit providers. For providers that offer fully online courses, it has become increasingly important to verify a student’s work is that of the person who originally registered for the course and will be receiving a credential for completing it successfully.

Testing the Limits
Most course providers that offer a certification or fully accredited degree use examinations to demonstrate proficiency with a subject area. Not surprisingly, many students try to game the system and cheat on these exams, generally motivated by achieving an external reward such as a good grade in a course, the satisfaction of a degree program and, eventually, the securing of a good (or better) job.

Students have migrated from old-school cheating techniques (such as writing answers on their clothing) to more technologically advanced methods that are specifically designed to avoid detection by proctors. For example, students have scanned a beverage label into an image-editing program, replaced the small text listing the ingredients with exam answers, and reprinted and pasted the label back onto the bottle to serve as a cheating aid that is hiding in plain sight, usually undetected by proctors. Other students have used electronic devices to facilitate cheating, such as by loading cheat sheets into cellphones, calculators, watches, or other devices, or using them to share information surreptitiously.

To combat these types of schemes, traditional universities have embraced an approach that mirrors a ‘clean room’: a highly monitored, controlled environment that makes cheating very difficult. The campus-based University of Central Florida is one of many schools that use an ultra-secure testing facility to thwart cheaters; students entering the center must lock up all personal property, with the exception of identification documents. The room is under surveillance by video cameras and live proctors, there are no scheduled breaks, and two forms of ID are required to verify the test taker’s identity.

Online Verification
In an online test-taking environment, it is much harder to verify students are not cheating, simply because it is not possible to truly know if something or someone else is assisting them during the exam.

There are a number of software-based solutions designed to make it more difficult for online students to cheat. These systems are not foolproof and will not stop a truly determined person from cheating, but they may deter casual cheaters, and they demonstrate a school’s commitment to reducing academic fraud. Software Secure’s RPNow and Verificient Technologies’ ProctorTrack each use a combination of software, analytics engines, and vision- and sensor-based systems to verify students are who they say they are, are engaged in coursework, and are closely monitored to prevent cheating.

At massive open online course (MOOC) provider edX, vice president of product Beth Porter says, “Our vendor is Software Secure, and what it does is look for anomalies.”

Software Secure’s RPNow captures images of student test takers and their respective photo IDs. The test takers must confirm the images can later be used to authenticate their identity by matching the photo when the test-taker logs in to take the test. The system also supports application blacklisting to prevent the test-taker from using unauthorized programs on their computer before or during a test, and records the entire exam session using their webcam and microphone. This video is stored, along with everything that is displayed on the user’s screen, to allow quick review of the test and the environment by certified proctors. They can review any anomalies that may indicate someone trying to cheat, such as frequently looking away from the screen to read a nearby book, cheat sheet, or second computer screen out of view of the webcam.

Verificient Technologies’ ProctorTrack operates in a similar fashion, tracking eye movement to determine if the subject is looking away from the screen (to prevent stealing glances at a cheat sheet), while the microphone records audio to detect extraneous noise that could be indicative of someone else in the room assisting the test-taker.

Verificient’s Dutta contends monitoring test-takers is not about forcing students to be “robots” during an exam, but simply ensuring they can be “seen” by the verification software, and creating a record that can be reviewed to see if behavior that could be construed as cheating is present.

Some observers have raised concerns about student privacy with anti-cheating software that records facial and other personal identification details, which are sent back over the Internet. For its part, Verificient claims ProctorTrack complies with the U.S. Family Educational Rights and Privacy Act (FERPA). It has also pledged that the software does not access any files on a student’s hard drive, does not share any data with third parties, and purges personal student data within 30–60 days, depending upon the university.

Unlike K–12 students, college attendees ultimately have the choice of whether or not to attend a school that uses this type of technology. Moreover, students seeking an online degree that is well respected and valued in the marketplace likely will seek out schools that are taking steps to improve their reputations, even if it means subjecting themselves to remote monitoring.

The key, according to security and technology expert Joseph Kirkpatrick of Kirkpatrick Price, a Tampa, FL-based licensed CPA and PCI quality security assessor firm, is for both the vendors supplying the monitoring software and the schools utilizing it to have clear privacy and security policies, as well as a compliance officer or team to ensure these policies are enforced.

Kirkpatrick expects the debate over the fair, ethical use of monitoring and verification of student activities for online and traditional universities will continue, likely as part of the overarching debate over data collection, privacy, and protection by private companies and the government. “There are all these ethical questions,” Kirkpatrick says. However, he notes that to enjoy certain conveniences, such as attending classes and taking exams online, “you’re going to give up some privacy.”

Still, stopping or reducing cheating is not the only area in which online educational providers must consider how they interact with students. In a recent settlement with the U.S. Department of Justice, edX agreed to make its website and course delivery platform accessible to blind and hard-of-hearing students and to those with other physical disabilities. The settlement has thrown a light on the need for online education providers to comply with the Americans With Disabilities Act (ADA). This is especially important for any institution that receives federal funds (via grants or student loans), as Section 504 of the Rehabilitation Act of 1973 specifies no otherwise qualified individual with a disability will be excluded from a program or activity that receives federal funding.

Similarly, educational institutions are likely to want to comply with Section 508 of the Rehabilitation Act, which requires electronic and information technology to be accessible to people with disabilities. Doing so attracts the maximum number of potential students and avoids any potential litigation related to discrimination.

Overall, the benefits of online education providers deploying anti-cheating, verification, and accessibility software far outweigh the costs, particularly with respect to the growing population of non-traditional students flocking to online degree programs.

Dan Butin, founding dean of the School of Education & Social Policy at Merrimack College, notes that implementing these systems is fairly easy for schools to do, even if there is some pushback from students. Indeed, students must agree to the terms and conditions of using the school’s software-based online learning platform in order to take courses. As such, they likely will agree to use this type of exam monitoring as a condition of receiving course credentials.

“The economics of this [online] model are going to necessitate that,” Butin says. “The front-end investment that [Arizona State University] or anyone else is going to have to put in for student verification or anti-cheating technology is minimal, compared with the potential for revenue generation that will happen once they figure out these questions.”

For its part, edX says there will be no changes in course pricing due to the company’s accessibility efforts. “Any investment by edX or course content providers with regard to accessibility is not passed on to the students,” says edX general counsel Tena Herlihy.

Further Reading

Harvard Cheating Scandal: http://www.thecrimson.com/article/2012/8/30/academic-dishonesty-ad-board/

Cheating Survey: https://www.insidehighered.com/news/2012/03/16/arizona-survey-examines-student-cheating-faculty-responses

University of Central Florida Testing Center Video: How It Works: http://utc.sdes.ucf.edu/

Keith Kirkpatrick is principal of 4K Research & Consulting LLC, based in Lynbrook, NY.

© 2015 ACM 0001-0782/15/12 $15.00

ACM Member News
HEISER FOCUSES ON MICROKERNELS
When it comes to operating systems, Gernot Heiser is a minimalist. Heiser, Scientia Professor and John Lions Chair of operating systems at the University of New South Wales’ School of Computer Science and Engineering in Sydney, Australia, develops microkernels, the rudimentary core of operating systems.

Heiser received an undergraduate degree in physics from Germany’s University of Freiburg, and a master of science degree in semiconductor physics and computer science from Brock University in Ontario, Canada. He earned his Ph.D. in computer engineering and 3D semiconductor device simulation at Switzerland’s ETH Zurich.

To Heiser, “Microkernels are sexy; they make software and systems truly reliable and much more secure.” Microkernels are slimmed-down versions of traditional operating systems, which consist of tens of millions of lines of code. Experts estimate developers make three errors for every 100 lines of code, making those large operating systems inherently “more complex, faulty, and error-prone” than microkernels, Heiser says.

Heiser and his team developed the Security Enhanced L4 (seL4) microkernel with just 10,000 lines of code. They employed mathematical proof techniques, providing basic isolation mechanisms (the ability to partition resources and switch between the partitions). These techniques, coupled with the modest amount of code, “reduce the attack vector/surface and the propensity for flaws, making seL4 essentially bug-free and highly secure,” Heiser says. seL4, he says, is the only protected-mode OS with sound and complete worst-case execution time analysis.

Heiser also is director of Forests Alive, an Australian company whose mission is to demonstrate it makes economic sense “to save the world’s forests by using them as carbon sinks.”
By Laura DiDio
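Heiser's figure of three errors per 100 lines can be turned into a back-of-the-envelope estimate. The estimate assumes defects scale linearly with code size, which is a simplifying assumption since real defect densities vary widely. It uses \(r\) for the error rate and \(L\) for lines of code, and takes \(10^7\) as a lower bound for "tens of millions":

```latex
\[
  E = r \cdot L, \qquad r = \frac{3}{100}
\]
\[
  E_{\text{seL4}} = \frac{3}{100} \times 10{,}000 = 300,
  \qquad
  E_{\text{large OS}} \ge \frac{3}{100} \times 10^{7} = 300{,}000 .
\]
```

Under this rough model, even a 10,000-line kernel would be expected to contain hundreds of defects. That suggests why small size alone is not enough, and why the team paired it with mathematical proof.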
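The online-education article says only that Turnitin matches passages of words in a submission against its repository and reports the degree of similarity. The sketch below is a generic word n-gram ("shingle") overlap score, a common textbook technique. It is not Turnitin's algorithm. The function names and the default `n` are illustrative assumptions.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams ("shingles") from a document."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_report(submission: str, repository: dict[str, str], n: int = 5) -> dict[str, float]:
    """Fraction of the submission's shingles that also occur in each repository document."""
    sub = shingles(submission, n)
    if not sub:
        return {}  # submission shorter than n words: nothing to compare
    return {doc_id: len(sub & shingles(text, n)) / len(sub)
            for doc_id, text in repository.items()}


repo = {
    "web-page": "the quick brown fox sleeps",
    "old-paper": "completely different words here today friends",
}
print(similarity_report("The quick brown fox jumps over the lazy dog", repo, n=3))
```

A score near 1.0 flags heavy overlap with one source. As the article notes, a person still has to decide whether that overlap is plagiarism or, say, a properly attributed quotation.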


DOI:10.1145/2835856
David Anderson
Historical Reflections
The Digital Dark Age
…and why it will have to wait.

HE GROWTH IN accessible digi- Among professionals working in the drawn into the field from a back- tized primary source mate- digital preservation field, the reaction ground in what has come to be called rials, together with the im- has been much less accepting. Many of the “digital humanities,” where I con- provement in quantity and the people with whom I come into con- tinue to take a particular interest in quality of online research tact exhibit a simmering resentment the history of computing. For most of Ttools has changed fundamentally the that a great deal of pioneering and fun- that time, it has been something of an way in which historians go about their damental work carried out in Europe uphill struggle to persuade academic business. While the fundamental im- (and more widely), is being overlooked colleagues that digital preservation portance of primary sources remains in the publicity storm that Cerf’s pro- was not only something that needed the same, it is ever more the case that nouncements have generated. to be taken seriously, but was a field the sources themselves were born digi- This is entirely understandable, in which there were, and are, interest- tal, consumed digitally, and (hopefully) but I think it is almost certainly the ing and intellectually satisfying chal- preserved digitally. But what assurance wrong response. For most of the last lenges to be addressed. What was true do we have that in the next century, or decade, I have been working actively of fellow academics was doubly so of the next millennium, historians will in digital preservation, having been business leaders, politicians, and be able to access materials that were other opinion formers. Fellow digital produced on machines, and software preservationists were often inclined packages that have long ceased to ex- The situation to scratch their heads, and with their ist? 
This is the question Google Vice thousand-yard stare firmly in place, President and Chief Internet Evan- with the protection ask what they have to do to get preser- gelist (and ACM past president), Vint of our digital heritage vation onto the political and business Cerf raised in a number of engaging agenda. I have attended a number of recent public talks, interviews, and is far from bleak, conferences where participants de- in his Communications’ “Cerf’s Up” and there are, bated whether the best approach to column. His conclusion is that we are getting preservation taken seriously standing on the edge of a precipice, indeed, many reasons was to deploy somewhat apocalyptic and that unless we take appropriate to be cheerful. tales of the dangers of losing large steps a digital dark age awaits us. I slices of our cultural heritage as a cannot easily gauge the effect of Cerf’s result of inattention to preservation, remarks in the U.S., but in Europe, his or to encourage better digital custo- comments have been taken up widely, dianship by more positive means. It and accepted more or less uncritical- probably needs a little of both, but ly, by the broadcast and print media. the uncomfortable truth is the mes-

20 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 V viewpoints

senger is often more important than ings, and sites of archaeological and machines running application state the message, and whether the ap- cultural importance in the Syrian civil capture files, Cerf has drawn attention proach is to scare the world at large war, makes talk of a digital dark age to the importance of the human, soci- into taking preservation seriously, seem particularly resonant. The re- etal, and organizational dimensions or to provide less dramatic encour- cent execution by ISIS of the longtime of preservation. His call, for example, agement, the messenger needs to be keeper of Palmyra’s extraordinary cul- to revisit the rules on copyright to al- able to cut through the noise and be tural artifacts, octogenarian Khaled low for a “fair use” provision in the heard. Vint Cerf, as one of the archi- al-Asaad, for ‘crimes’ including repre- case of preservation activity would be tects of the Internet, and a key figure senting Syria at “infidel conferences,” a real step forward. on the U.S. politico-scientific stage, is and serving as “the director of idola- Despite the impression left by extremely well placed to bring digital try” in Palmyra, should remind us that Cerf’s remarks, Olive is not the only preservation to greater prominence in some parts of the world, protecting fruit of the digital preservation field. than has hitherto been the case, and cultural heritage comes at a very high In Europe, in response to Cerf, the should therefore be welcomed whole- price. However, whatever grounds Executive Director of the Digital Pres- heartedly, even if he does not always there are for despondency in the Mid- ervation Coalition, William Kilbride, capture fully the digital preservation dle East, and somewhat at odds with has issued a call for the preservation zeitgeist. 
The media interest in Cerf’s the picture being drawn by Cerf, the community to highlight preservation talk of a looming digital dark age can situation with the protection of our activities or projects in which they only serve to help raise awareness digital heritage is far from bleak, and have been involved, so that a fuller among computer scientists and engi- there are, indeed, many reasons to be picture of the preservation landscape neers and encourage them to engage cheerful. Over recent years the global over recent years might be given. with the outstanding technological digital preservation community has Many of these contributions have challenges. His intervention may been very active and there has been been gathered together in a Twitter even be instrumental in persuading real and substantial progress made. feed with the pleasingly optimistic funders to put their dollars and euros In addition to calling for new technol- hashtag (https://twitter.com/hashtag/ behind much-needed digital preser- ogies and techniques, and promoting nodigitaldarkage). vation research. the “Olive” project at Carnegie Mellon For the remainder of this column, The almost daily news of the dam- (https://olivearchive.org), which ap- and in the spirit of the nodigitaldark-

IMAGE BY DAN SAMOILA DAN BY IMAGE age being caused to historic build- proaches preservation by using virtual age campaign, I would like to draw

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 21 viewpoints

some attention to a number of projects and other activities members of my own team have led, or in which they have played prominent roles. These cover the development of innovative tools and techniques, dedicated outreach and dissemination, and the organizational aspects of preservation and reuse of culturally significant material in digital form.

Vint Cerf’s comments give very little background of the intellectual roots of projects like Olive. Most of the early work in digital preservation concentrated on migration as a preservation approach. This, in essence, depends on copying or converting digital objects originally intended to run on one technology platform to run on another. Inevitably, migration involves changing some of the characteristics of the original digital object, so a lot of attention must be paid to ensuring the properties considered to be of the greatest importance for a designated stakeholder community are preserved intact. Major practical limitations in the migration approach are exposed when the digital objects in question are inherently complex. Migrating a modern computer game from one platform to another, for example, involves a level of technical expertise that simply does not exist in cultural heritage organizations, and raises intellectual property rights issues that are, for all practical purposes, insurmountable. As digital objects tend to become ever more complex, interest has turned to developing approaches to preservation that depend, as Olive does, on emulation. The KEEP (Keeping Emulation Environments Portable) project (http://www.keep-project.eu) was the first publicly funded project to develop emulation services to enable accurate rendering of both static and dynamic digital objects: text, sound, and image files; multimedia documents, websites, databases, videogames, and so forth. The overall aim of the project was to facilitate universal access to our cultural heritage by developing flexible tools for accessing and storing a wide range of digital objects. KEEP also considered legal issues concerning the implementation of emulation-based systems and proposed solutions that comply with European and national copyright laws.

Digital objects are, of course, connected intimately with the technical environments in which they were created and used. In order to ensure long-term preservation of, and access to, digital material it is therefore essential to carefully record the hardware and software dependencies of each digital object in a preserved corpus. Typical information required includes details of the computer hardware, operating system, plug-ins, software libraries, and so forth, which a preserved object originally required, together with information on the hardware and software environment that was used during any subsequent preservation actions such as migration or emulation. There is substantial complexity involved in assembling and maintaining even the basic technical environment metadata required for subsequent emulation. This is a seriously time-consuming, detailed, and complex task in its own right. To address this challenge, my colleague Janet Delve led the development of the TOTEM (Trustworthy Online Technical Environment Metadata) technical registry (http://amzn.to/1JuKR3c). The TOTEM generic data models, a database implementation, and a metadata schema have been combined with a compatible OWL ontology created within the PLANETS project.

Although engaged in public outreach, Cerf does not refer to the considerable efforts that have been made in this area by organizations like the Open Preservation Foundation, or the

22 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12


Digital Preservation Coalition. Nor does he refer to any of the projects that have played a role in this space. The POCOS (Preservation of Complex Objects Symposia) project (http://bit.ly/1LmMaPa) concentrated not on the development of new tools or techniques, but was established to give global thought-leaders in research into the Preservation of Complex Objects an opportunity to share and thereby extend the body of knowledge on this topic through a series of symposia at locations across the U.K. The fundamental task facing these symposia was to present material of great technological and organizational complexity in a lucid, cogent, relevant, and approachable manner so as to engage U.K. High-End Instrumentation researchers and practitioners in a wide variety of disciplines, as well as reaching those further afield in, for example, commerce, industry, cinema, government, games, and films classification boards.

The seminars were arranged around three general themes: visualizations and simulations; software art; and videogames and virtual worlds. Each of these domains involves the development, use, and manipulation of complex digital objects, and each presents a different, although clearly related, set of preservation challenges. A substantial and innovative dissemination program was established to ensure the various stakeholder communities obtained the maximum long-term value. This included the production of a peer-reviewed book (http://bit.ly/1NxL3i8) presenting the key outputs.

Considerable efforts have been made to address the need for integrated approaches to digital preservation that include organizations as well as tools. On the organizational side, archives provide an indispensable component of the digital ecosystem by safeguarding information and enabling access to it. Harmonization of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation, and reuse.

To address this, the E-ARK project (http://www.eark-project.com/), working cooperatively with commercial systems providers, is creating and piloting a pan-European methodology for electronic document archiving. The emphasis is not on “blue-sky” research but on synthesizing existing tools and techniques that have been developed over the last decade or so, both commercially and within the context of publicly funded research projects. National and international best practices that will keep records and databases authentic and usable over time are also being drawn together and integrated, with the intention of providing a single, scalable, robust approach capable of meeting the needs of diverse organizations, public and private, large and small, and able to support complex data types. E-ARK will therefore demonstrate the potential benefits for public administrations, public agencies, public services, citizens, and business by providing simple, efficient access to the workflows for the three main activities of an archive: acquiring, preserving, and enabling reuse of information.

The methodology will be implemented in various national contexts, using existing, near-to-market tools and services developed by the partners. This will allow memory institutions and their clients (public- and private-sector) to assess, in an operational context, the suitability of those state-of-the-art technologies. The practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving records.

The project will be public facing, providing a fully operational archival service, and access to information for its users. The project results will be generic and scalable in order to build an archival infrastructure across the EU and in environments where different legal systems and records management traditions apply. E-ARK will provide new types of access for business users.

E-ARK will pilot an end-to-end OAIS-compliant e-archival service covering ingest, vendor-neutral archiving, and reuse of structured and unstructured data, thus covering both databases and records, addressing the needs of data subjects, owners, and users. The pilot and methodology will also focus on the essential pre-ingest phase of data export and normalization in source systems. The pilot will integrate tools currently in use in partner organizations, and provide a framework for providers of these and similar tools ensuring compatibility and interoperability. A core component of the project is the integration platform that uses the existing ESSArch Preservation Platform (EPP) application as an Archival Information System, which is already in productive deployment at the National Archives of Norway and Sweden. In order to achieve scalability, E-ARK will adopt a data management and storage layer for this tool on top of the proven open source Cloudera CDH4 distribution of Apache Hadoop, enabling storage and computational power to be seamlessly added to the system.

All in all, there is considerable reason to feel hopeful that significant progress will continue to be made in digital preservation in the years ahead. Many of the problems that appeared, just a few years ago, to be almost intractable are being brought under control. The pace of technological change continues unabated, and this brings with it fresh preservation challenges, but despite rumors to the contrary, it appears the digital dark age will have to wait a little while longer.

I dedicate this column to Khaled al-Asaad.

David Anderson ([email protected]) is the CiTECH Research Centre Director at the School of Creative Technologies, University of Portsmouth, U.K.

Copyright held by author.
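The dependency-recording task described above for TOTEM (noting the hardware, operating system, plug-ins, and libraries an object originally required, plus the environment used by later migration or emulation) can be made concrete with a small data structure. What follows is a minimal Python sketch under stated assumptions, not the TOTEM data model or schema: the class names `Component`, `PreservationAction`, and `DigitalObjectRecord`, the method `missing_from`, and the sample object and environment values are all hypothetical. It reports which originally required components a candidate environment does not supply.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """One element of a technical environment (hypothetical model, not TOTEM's)."""
    kind: str     # illustrative labels: "hardware", "os", "plugin", "library"
    name: str
    version: str


@dataclass
class PreservationAction:
    """A later preservation action and the environment it ran in."""
    action: str                                        # e.g. "migration" or "emulation"
    environment: list[Component] = field(default_factory=list)


@dataclass
class DigitalObjectRecord:
    """Technical-environment metadata for one preserved digital object."""
    identifier: str
    original_environment: list[Component]
    actions: list[PreservationAction] = field(default_factory=list)

    def missing_from(self, candidate: list[Component]) -> list[Component]:
        """Return originally required components that the candidate lacks."""
        available = set(candidate)  # Component is frozen, hence hashable
        return [c for c in self.original_environment if c not in available]


# Usage with made-up values: an old game and a partial emulated environment.
game = DigitalObjectRecord(
    identifier="example-game-1995",
    original_environment=[
        Component("hardware", "x86 PC", "486DX"),
        Component("os", "MS-DOS", "6.22"),
        Component("library", "sound driver", "2.0"),
    ],
)
emulated = [Component("hardware", "x86 PC", "486DX"), Component("os", "MS-DOS", "6.22")]
game.actions.append(PreservationAction("emulation", emulated))
print(game.missing_from(emulated))  # the sound driver is not yet provided
```

Exact name-and-version matching is a deliberately strict simplification; a practical registry would likely also need to express compatible version ranges and provenance, which this sketch omits.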


DOI:10.1145/2835854 Peter J. Denning and Nicholas Dew

The Profession of IT
Why Our Theories of Innovation Fail Us
Until we moderate our fascination with creating ideas, we will not achieve the rate of innovations we seek.

ONLY 1 IN 500 patents makes its inventor money, and businesses are awash in great ideas of dubious market value (only about 4% make money).1 So why do people think innovation begins with a creative idea, is sold through an imaginative story, and diffuses through society because of novelty and merit? Innovators mobilize people to adopt ideas. Although they might start with idea creation, innovators focus mostly on other aspects: market offers, market testing, beta prototyping, production, sales, and customer-support infrastructures that companies use to get products adopted. In fact, 90% of innovation is in fostering adoption.1,4 Ideas are often stories invented after the fact to explain innovations that already emerged, as with the iPhone example discussed later in this column.

Yet the media telling of the story makes it sound as if ideation (the creation of ideas) is 90% of the work of innovation. Ideation has produced many inventions that never became innovations because no one adopted them. Many people are misled by stories that inaccurately equate innovation with invention. People who believe these stories put too little effort into adoption and are disappointed by their low success rates.1,5

Bob Metcalfe, the inventor of Ethernet, tells the story of 3Com, a company he founded to make and sell Ethernets. His story is full of accounts of his having to convince executives of companies they needed a network product they had never heard of before, and then living up to the expectations he left them with. He spent one year developing his Ethernet idea and the next 10 years selling Ethernets. Sales do not matter in invention, but they matter in a big way in innovation. He summarized his effort with his famous saying, “Invention is a flower, innovation is a weed.”4

Three Flawed Memes: Hindsight, Oversimplification, and Ideation

Innovation stories are tremendously influential in guiding our perceptions (and interpretations) about the innovation process and hence our actions. An example is the innovation pipeline: an innovation begins as an idea and flows through stages of prototyping, production, and marketing before arriving in the marketplace. Another is the innovation funnel: a set of ideas is progressively winnowed by reviews, prototype tests, and market tests, until the few with greatest merit make it to the marketplace. A third example is network diffusion: an innovator injects an idea into a social network, where it spreads out across the communication channels of the network until everyone has a chance to adopt it. A fourth is the


innovation cell, a protected pocket of innovators spinning off ideas into the surrounding environs.

These stories are all “sticky.” It is easy to form a mental picture of a pipeline with ideas flowing through it, or a series of progressively narrower funnels flowing one into the next, or waves of adoption washing through a network, or ideas spinning off a roundtable. They are memes that hold our attention. However, these sticky stories contain flaws that lead the unwary into actions that do not work.

The first flaw is that our stories about innovation are retrospective. In hindsight, we can see all the actions involved in an innovation and describe a pattern they seem to follow. But as innovators “in the trenches” we experience things quite differently. Every action seems to have an unpredictable outcome and we cannot tell if it leads us closer to our desired innovation. So many things depend on actions of other people. Doubt and uncertainty are irreducible. You cannot “see” where you are in the pipeline, funnel, network, or cell; only future historians can pass those judgments. Bob Metcalfe did not find executives ready and waiting for Ethernets; he constantly had to confront their doubts about a product they had never heard of before, persuade them of its benefits to their companies, and convince them that he would be a trustworthy supplier of Ethernets. If you try to form an innovation plan around the pipeline, funnel, network, or cell model, your plan will almost always fail because the people involved cannot tell where they are in your imagined structure.

The second flaw is that our stories about innovation are tremendously oversimplified. The stories present the successful actions of innovators as deliberate, considered, and sometimes inspired choices by persons able to make sense of the situation and control it. Their individual actions fit together into neat causal chains whose outcomes align with the innovator’s intentions. Bob Metcalfe could say afterward that he visited the “ABC” company, overcame their doubts, and got their order for Ethernets. But when he was there nothing was certain. He had to learn their doubts and concerns, find a way to show them Ethernet took care of an important concern, and build trust in him as the salesman and supplier. How did he learn their doubts? Discover their unmet concerns? Construct a proposal on the spot for how they could try Ethernets at acceptable risk? Lead them to the conclusion he was sincere, competent, and had their best interests at heart? Bob will tell you he often had no idea what it would take to close a deal, and in many cases he failed to close a deal. He did not feel in control. The best he could do was approach each encounter with a sense of confidence he could lead the conversation to a successful conclusion. How did Bob cultivate a mood in himself that disposed him toward success?

The third flaw is that all the innovation models assume an idea starts the process. Someone’s idea triggers the pipeline, feeds the funnel, starts a wave in the network, or seeds the cell. What if most innovations do not begin with an idea? For example, social innovations such as Mothers Against Drunk Driving, or more recently legalized marijuana and same-sex marriage, welled up in popular opinion and swept many people along. The leaders of these movements report they were reacting to injustices and not creating ideas. Many technology innovations seemed to well up out of circumstances of the time without anyone claiming to have put an idea into action. For example, the iPhone and smartphones that imitated it seemed to catch on because “the time was right” even though many previous similar attempts had failed. Blogging seemed to well up without anyone inventing blogging or even stepping forward afterward when there was an opportunity to claim credit and be recognized in Wikipedia. These examples illustrate the larger pattern: most innovations “emerge” in the practices of communities and are not caused by someone’s good idea.3 In fact, most of what we call “ideas” behind innovations are actually stories made up in hindsight to explain the practices already emerging.

With these flaws, it is difficult to see how careful strategic planning, innovation process management, and charismatic leadership can work consistently well. In a review of Barbara Tuchman’s March of Folly: From Troy to Vietnam, written many years ago, Gordon Wood wrote there was but one big lesson of history: “Nothing ever works out quite the way its managers intended or expected.”6 This larger lesson unfortunately has not yet made it into our dominant narratives about innovation.

Sticky innovation stories are easy to recall and fun to retell. The only way to displace these stories is to interpret innovation with new and better stories. You need a new story to dislodge a story.3

If Not Ideation, Then What?

If ideation is a relatively easy 10% of your effort, how should you spend the other 90%? What should you do? We like the story from Fernando Flores about innovation emergence.2 This story begins with the notion that innovations are new practices adopted in a community, which displace other practices. Emergence of a new practice begins when someone makes a proposal to combine existing practices in a new way to meet an unmet concern. The proposal is contingent on many factors: technologies and practices already in existence, unmet concerns in the community, the proposer’s timing and choice of concern to address, the social power of the proposer’s network, and the strength of the opposition.2 Bringing an innovation into reality, therefore, is unpredictable and relies on explicitly working for adoption.

For example, Steve Jobs did not simply create the iPhone in a flash of genius and sit back and wait for the profits to roll in. His contribution was to believe in a vision of a lightweight portable phone that could be customized to its owner’s detailed personal preferences, and to mobilize a business network to make it happen. The Apple company invested a lot of work to transform the iPhone vision into an adopted technology. The transformation was contingent on the existence of other components already, or soon to be, in place. Apple worked with suppliers to build smaller and more energy-efficient components such as hard disks, touch displays, scratch-resistant Gorilla Glass, sensors for GPS and motion, and batteries. Apple adapted the operating system MacOS into iOS, which would manage an interface presenting a large collection of user-chosen apps. Apple adapted the iTunes store into an apps store and cultivated a network of a million programmers to populate it with downloadable apps. Apple worked with the telecommunication companies, initially AT&T, to create data plans within the cellular phone network. Apple worked with professional product designers and marketers to position the iPhone as a lifestyle enhancer rather than a mobile phone. The iPhone was contingent on all these components and the business deals that made them work. Its adoption took a great deal of business and political skill. Yet the popular stories focus on Jobs alone and ignore the huge amount of work Apple invested to get the iPhone widely adopted. You will find similar stories in all the other technology companies. The standard stories focus on the genius of the founders and ignore the hard work they put into adoption.

Six Fundamental Skills

The six skills in the accompanying table nicely summarize what innovators do.2 Innovations emerge in spaces of practices, which are constantly drifting and changing as powerful forces converge and conflict. Innovators propose changes of practice and shape their adoption. In the swirl of the forces nothing is certain. Multiple people are likely to come up with competing proposals at about the same time, each responding to the sense of an unmet concern that anyone who cares to listen can detect. These six skills are based on your ability to listen for concerns, histories, movements of social power, barriers, moods, reactions to offers, and followers in networks. They depend only loosely on communicating your ideas or telling your stories.

Offering and mobilizing are the core skills. Your offers are proposals to take designs into social movements. Can you make offers that intrigue people with new possibilities to address their (often unspoken) concerns and do not seem too risky? Can you turn your networks (or build a network) into a following of people who commit to the new practices the offer brings? Do you understand who will resist or support and what actions will harness the power of the network to shape the emerging new future?

Detecting, appropriating, navigating, and surfing all support the core skills. Detecting means to sense an unmet concern and form an inkling that you can do something about it. Appropriating means to immerse in related domains to discover marginal practices and interpretations that can help you with your inkling. Navigating means to move toward a goal in a complex and uncertain world; the metaphor recalls seafaring explorers in open oceans who must respect the power of the waves and the limitations of their crews, avoid storms, and deal with emergencies. Surfing means to ride waves that move in the direction you seek and keep your balance when turbulent network forces buffet you.

To be an innovator, learn these six skills.

Six skills for achieving adoption.
Offering: Making proposals of combinations of existing practices and technologies to meet an unmet concern, then observing reactions and modifying the offer to be more attractive.
Mobilizing: Getting a social network to back your offer and help make it happen; depends on the social power of the network and on your personal power.
Detecting: Sensing an opportunity in an unmet concern or a disharmony; being unsettled by an anomaly.
Appropriating: Investigating related domains to understand the history behind their concerns, and to discover existing practices that might help with the concern you are dealing with.
Navigating: Finding your way amidst conflicting waves of possibilities, coming to your goal without having a detailed plan to get there.
Surfing: Finding waves of possibilities moving toward your goal and riding with them, retaining your balance and center when hitting turbulence.

References
1. Denning, P. and Dunham, R. The Innovator’s Way. MIT Press, 2010.
2. Denning, P. and Flores, F. Emergent innovation. Commun. ACM 58, 6 (June 2015).
3. Freedman, L. Strategy: A History. Oxford University Press, 2013.
4. Metcalfe, R. Invention is a flower, innovation is a weed. MIT Technology Review (Nov. 1999); http://bit.ly/1Pt5ygT.
5. Read, S., Sarasvathy, S., Dew, N., Wiltbank, R., and Ohlsson, A.-V. Effectual Entrepreneurship. Routledge, 2010.
6. Wood, G. History lessons. Review of Barbara Tuchman, March of Folly: From Troy to Vietnam. New York Review of Books (Mar. 29, 1984).

Peter J. Denning ([email protected]) is Distinguished Professor of Computer Science and Director of the Cebrowski Institute for information innovation at the Naval Postgraduate School in Monterey, CA, is Editor of ACM Ubiquity, and is a past president of ACM. The author’s views expressed here are not necessarily those of his employer or the U.S. federal government.

Nicholas Dew ([email protected]) is Associate Professor of Strategic Management at the Naval Postgraduate School in Monterey, CA. He researches entrepreneurship and is a co-author of the textbook Effectual Entrepreneurship. The author’s views expressed here are not necessarily those of his employer or the U.S. federal government.

Copyright held by authors.


DOI:10.1145/2835957 Nancy Tuana

Computing Ethics
Coupled Ethical-Epistemic Analysis in Teaching Ethics
Critical reflection on value choices.

ETHICS IS AN important component of STEM education, as illustrated by the fact that ABET accreditation requires proof of training in ethics for engineering fields. But what range of knowledge and skills is required for integrity? Covering codes of ethics does not teach ethical sensibility or the ethical reasoning skills required for research and professional integrity.10 So what should be included in ethics education for engineers?

Engineers like to “get it right,” and engineering education should focus on knowledge and skills to ensure that engineering work meets the highest standards of empirical and theoretical adequacy. This is what Mason5 refers to as a “covenant with reality.” But this does not ensure the best interests of the broader community and the environment are served. “Getting it right” requires understanding what is epistemically and ethically salient to the issues being addressed, communicating information and sharing technologies in ways users can understand, and supporting responsible application.

Integrity in engineering research and practice depends on values. This is especially important in the responsible conduct of research. “Computer experts,” Deborah Johnson reminds us, “aren’t just building and manipulating hardware, software, and code, they are building systems that help to achieve important social functions, systems that constitute social arrangements, relationships, institutions, and values.” Johnson claims computer experts can “facilitate and constrain behavior, and materialize social values.”4 Social media pages, for instance, can be designed to track Internet browsing histories, even after the user logs off, to tailor advertisements or software or enable companies to engage in workplace surveillance. Is this OK and why? An important but often overlooked component of ethics education is the ability to identify values and appreciate the ethical dimensions of the broader impacts from actions computing professionals might take. Coupled ethical-epistemic analysis is an important lens for critical reflection on value choices. This column explains why and how.

Values serve as a guide to action and knowledge. They are relevant to all aspects of scientific and engineering practice, including discovery, analysis, and application. Cognitive scientists have found values to be inextricable components of STEM research. Paul Thagard explains, “the decisions that scientists and others need to make about what projects to pursue, what theories to accept, and what applications to enact will unavoidably have an emotional, value-laden aspect,” and concludes, “the best course is not to eliminate values and emotions, but to try to ensure that the best values are used in the most effective ways.”9 Decisions about whether there is robust evidence for a claim (an epistemic value) can, for example, be influenced by possible effects on human well-being (an ethical value).

To use the best values effectively, I advocate that ethics education in STEM fields be based on analysis of values in four dimensions of research and practice. Various types of values can be involved in each domain, including ethical values (the good of society, equity, sustainability), aesthetic values (simplicity, elegance, complexity), or epistemic values (predictive power, reliability, coherence, scope).
- What is a good basis for the selection of research topics?
- What counts as evidence and what constitutes robust evidentiary support?
- What is the likelihood that a model, hypothesis, or theoretical explanation will provide convincing explanation?
- Are epistemic and ethical values relevant to applying results to other research problems or to social problems (for example, via decision-support)?

Coupled ethical-epistemic analysis will help answer these questions (see the accompanying figure).

[Figure: A visualization of coupled ethical-epistemic analysis. Ethical analysis (How should we act?) and epistemic analysis (What can we know?) combine in coupled ethical-epistemic analysis (How do we support responsible action with what we know?), spanning responsible selection of research topics, uncertainty quantification, model selection, and science-to-society impacts, with ethical values that inform epistemic decisions and epistemic decisions that have ethical import. Ethical values: fairness, sustainability, caring, human well-being. Epistemic values: robustness of evidence, predictive power, convergence of evidence, scope.]

Values transparency and analysis is central to the National Science Foundation’s Sustainability Research Network on Sustainable Climate Risk Management (SCRiM, http://scrimhub.org). Coupled ethical-epistemic analysis has helped identify new and refined research topics, and informed modeling for multi-objective, robust decision making.8 An example is the debate over ice sheet data in modeling sea level rise. Whether ice sheet melt data is sufficiently robust for sea level rise projection models was debated by Intergovernmental Panel on Climate Change (IPCC) scientists. They had to balance evidential robustness and predictive power (epistemic values), a decision that remains controversial despite improvements in scientific understanding and modeling of ice sheet dynamics (for example, Chang et al.2). Balancing these epistemic values has ethical implications: a decision to not include ice sheet data might result in underreporting future sea level rise.7 Does the value assigned to evidential robustness outweigh the impact on predictive power? Also at stake is whether ice sheet data helps the wider community understand the impact of climate change mitigation and adaptation decisions. Epistemic and ethical values are coupled.

Mason5 urges that modelers adopt a covenant with reality and a covenant with values to make models faithful not only with relevant facts, but also with the values of intended model users. Fleischmann and Wallace3 augment Mason’s work with a third covenant, values transparency in modeling, which “not only allows the client to assess whether the model conforms to the first two covenants, it allows the client to assess when the model is misbehaving or malfunctioning.” They suggest that transparency can allow clients to avoid “circumstances where the model errors might lead to negative consequences for those affected by the model.”3

I applaud and support these positions, and add a further step. Users (and modelers) are not always aware of relevant values. Ethics education designed to promote values identification and analysis is a key element


of this ability, but values analysis requires techniques to identify values. An example is values-informed mental models (ViMM).1 ViMM is an empirically grounded method for gathering information about individuals’ beliefs and inferences about an issue, and elicits values in addition to beliefs and inferences. It is designed to provide transparency regarding users’ and modelers’ values in a decision situation.

Epistemic and ethical value decisions have important social implications. This fact is reflected in the National Science Foundation broader impacts criterion and its view of the role and purpose of ethics education: “Ethics education is particularly critical to the science and engineering community as it faces an increasingly competitive funding environment; rising collaboration with international colleagues who may follow different guidelines; and growing recognition of the relevance of science and engineering to social, economic, and ethical issues of wide public and political interest.”6

A twofold approach should be used in training for coupled ethical-epistemic analysis. This requires some changes in ethics education for engineers, philosophers, and social scientists. Engineers must become more aware of the ethical and epistemic values embedded in all components of their work, as well as their salience and role in research and practice, through training for values transparency and coupled ethical-epistemic analysis skills. This better prepares engineers to understand and take responsibility for the epistemic and/or ethical import of the values embedded in their work, so values of the users and those impacted by use become essential elements in that work. Parallel training of philosophers and social scientists is needed so they can assist with coupled ethical-epistemic analysis. Techniques such as ViMM assist in identifying the range of relevant values, and serve as a basis for careful analysis of the implications of value choices. This approach works best when trained philosophers and social scientists are embedded in research teams, collaborating with engineers and scientists. The SCRiM network illustrates the benefits of this transdisciplinary approach to coupled ethical-epistemic analysis.

Engineering ethics training involving coupled ethical-epistemic analysis helps with “getting it right” in all senses.

References
1. Bessette, D. et al. Values-informed mental models: A new tool for decision support and analysis. Nature Climate Change. Forthcoming.
2. Chang, W. et al. Probabilistic calibration of a Greenland Ice Sheet model using spatially-resolved synthetic observations: Toward projections of ice mass loss with uncertainties. Geoscientific Model Development, 7 (2014), 1933–1943; DOI: 10.5194/gmd-7-1933-2014.
3. Fleischmann, K.R. and Wallace, W.A. A covenant with transparency: Opening the black box of models. Commun. ACM 48, 5 (May 2005), 93–97.
4. Johnson, D.A. Computer experts: Guns-for-hire or professionals? Commun. ACM 51, 10 (Oct. 2008), 24–26; DOI: 10.1145/1400181.1400190.
5. Mason, R.O. Morality and models. Ethics in Modeling, W.A. Wallace, Ed. Elsevier, Tarrytown, NY, 1994.
6. National Science Foundation. Status Update on NSF Implementation of Section 7009 of the America COMPETES Act (ACA): Responsible Conduct of Research. Advisory Committee for Business and Operations Spring Meeting, May 2008; http://1.usa.gov/1G6c0IR.
7. Rahmstorf, S. A new view on sea level rise. Nature Reports Climate Change, 4 (2010), 44–45; DOI: 10.1038/climate.2010.29.
8. Singh, R., Reed, M., and Keller, K. Many-objective robust decision making for managing an ecosystem with a deeply uncertain threshold response. Ecology and Society. Forthcoming.
9. Thagard, P. The Cognitive Science of Science: Explanation, Discovery, and Conceptual Change. MIT Press, Cambridge, MA, 2012.
10. Tuana, N. An ethical leadership developmental framework. A Handbook of Ethical Educational Leadership. C. Branson and S.J. Gross, Eds. Routledge, 2014.

Nancy Tuana ([email protected]) is DuPont/Class of 1949 Professor of Philosophy and Women’s Studies at the Nancy Tuana Directorship of the Rock Ethics Institute, Penn State University, State College, PA.

This work was partially supported by the National Science Foundation through the Network for Sustainable Climate Risk Management (SCRiM) under NSF cooperative agreement GEO-1240507. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. The theory and practice of coupled ethical-epistemic analysis was informed by dialogue with members of the SCRiM team. Thanks to Rachelle Hollander and several anonymous reviewers for insightful editorial feedback on the column.

Copyright held by author.

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 29 viewpoints

Kode Vicious
Pickled Patches
On repositories of patches and tension between security professionals and in-house developers.
George V. Neville-Neil
DOI:10.1145/2835959
Article development led by queue.acm.org

Dear KV,
I recently came upon a software repository that was not a repo of code, but a repo of patches. The project seemed to build itself out of several other components and then had complicated scripts that applied the patches in a particular order. I had to look at this repo because I wanted to fix a bug in the system, but trying to figure out what the code actually looked like at any particular point in time was baffling. Are there tools that would help in working like this? I have never come across this type of system before, where there were more than 100 patches, some of which contained thousands of lines of code.

Pick a Peck of Pickled Patches

Dear Pickled,
The appropriate tools for such a system do exist, but they require a background check in many states in the U.S. and are banned outright in the more developed countries of the world.

What you are faced with is a project that probably ought to have forked the projects it was working with, but, instead, started with one patch, then two patches, then four patches, until you have what you see before you. When a project is developing quickly and has not started out with the understanding that it is a significant derivative work, the proper use of source code control tools may not occur soon enough in the development process. It requires discipline to spend some upfront time thinking about how to integrate existing code with new development, and if that work does not get done early, it often does not get done at all.

If you want to fix a single bug in the system, I suggest you contact the developers, because they should understand what they have done—and the mess they are in—sufficiently well to be able to address your problem more quickly than you can sort out what they have done with their system. On the other hand, if you need to do significant work on the system you are looking at, you may have to take more extreme measures.

You mention that the project you are looking at is a repo of patches for use with another project. That being the case, you need to lay down what I will refer to as a base track. The ultimate, upstream software the system is based on has to be the base layer. It also needs to be placed into a source code control system that allows you to update that base layer from the ultimate source. With the base layer in place, you should create a branch per patch from the derivative system. You could do this blindly, but it is probably best—although possibly quite frustrating—to read through the project build scripts carefully beforehand. I will wager that some of the patches you see in the derived project are not standalone, but instead depend on each other to fix some underlying bug or to implement a complex feature of the system. Once you have collected the patches into groups, you can then create the patch branches and import the patches from the derived repo. With code now properly contained in a source code control system, you should branch the base layer into its own development branch. Never directly modify the base layer in your project repository, as this will make integrating changes from the ultimate upstream repository nearly impossible. Let me say that again: never directly modify the base layer in your project repository, as this will make integrating changes from the ultimate upstream repository nearly impossible.

For the base layer you will always have at least two branches: the pristine branch that includes only changes coming from the upstream, and the development branch that takes code from the pristine branch and merges it with the patches. You can now integrate patches into the development branch and test them one by one to make sure they work individually before trying to make them all work together. KV often goes on about testing, but in your case it bears a good deal of emphasis. Unless you are in close communication with your upstream providers, you have no idea how they are testing these patches, and accepting them wholesale without incremental tests is a great way to wind up paying a lot of money to someone who puts you on a couch and asks you questions about your childhood.

Of course your best bet is to find the people creating patches of patches and then give them this response. You might want to inscribe it in golden fire on tablets first, but that is up to you.

KV

Calendar of Events

December 1–4
CoNEXT ’15: Conference on Emerging Networking Experiments and Technologies, Heidelberg, Germany. Sponsored: ACM/SIG. Contact: Felipe Huici, Email: [email protected]

December 5–9
MICRO-48: The 48th Annual IEEE/ACM International Symposium on Microarchitecture, Waikiki, Hawaii. Contact: Milos Prvulovic, Email: [email protected]

December 7–10
UCC ’15: 8th International Conference on Utility and Cloud Computing. Contact: Ashiq Anjum, Email: [email protected]

December 7–11
Middleware ’15: 16th International Middleware Conference. Sponsored: ACM/SIG. Contact: Rodger Lea, Email: [email protected]

2016

January

January 10–13
SMA ’16: The 4th International Conference on SmartMedia and Applications, Danang, Viet Nam. Contact: Youngchul Kim, Email: [email protected]

February

February 14–17
TEI ’16: Tenth International Conference on Tangible, Embedded, and Embodied Interaction, Eindhoven, Netherlands. Sponsored: ACM/SIG. Contact: Saskia Bakker, Email: [email protected]

February 21–23
FPGA ’16: The 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA. Sponsored: ACM/SIG. Contact: Deming Chen, Email: [email protected]

Dear KV,
Many organizations may be institutionalizing tension between security professionals and in-house developers. In organizations where the security professionals and the in-house developers are in different organizational units and the security professionals have the ultimate responsibility for security, it is natural for the security perspective to dominate dialogue between these two camps.

Security professionals have a clear mandate to protect the organization, and their toolset necessarily includes rigidly standardized computer settings and policies and enforcement mechanisms.

In-house developers frequently require exceptions to security policies because their work may require access to software tools excluded from the standard office suite (for example, integrated development environments), security testing tools (such as OWASP Zap), and/or elevated privileges. Also, developers who work with multiple projects (as most of us do) may need multiple virtual machines in order to manage multiple development-project contexts.

As a software developer who attempts due diligence with respect to security, I am often disappointed when security professionals seem to pay so little attention to the concerns of in-house software developers. When security policies are inflexible, useful tools or approaches are disapproved, and in-house developers are unable to apply software development best practices, the organization is not necessarily more secure.

I am hoping you will consider using your voice to stimulate debate on this topic. Please consider a blog article discussing how security professionals might collaborate with in-house developers to the benefit of all. You might consider discussing alternative approaches for reconciling corporate policies (or USGCB) with developers executing security probes against their own code, for example with OWASP Zap.

Zapped

Dear Zapped,
Once upon a time I was one of those in-house security professionals; I still have cards with my title, “Paranoid,” printed right on them. I never keep old business cards, but I kept those because they are the only ones I have ever had that were that honest. I could print more honest cards, but then I could not hand them out in polite company.

I have never been a fan of blanket bans of, well, just about anything, and definitely not software that would help developers produce better and more secure code. Blanket bans usually come from a misguided belief that the rules of engagement can be defined by a small group and that if everyone sticks to those rules nothing can possibly go wrong. That belief is not only mistaken, but incredibly dangerous. Any security team with a clue knows you set out general guidelines and then work with the development group in an advisory role to ensure the guidelines make sense. Only an idiot would create a set of rules that apply to both the development and the accounting teams. Alas, the world is filled with idiots as well as those who simply fear the unknown.

A cursory glance at the software you mention does not show it to be more dangerous than any other piece of software a developer might use, right down to a compiler or a debugger. What is important in any of these discussions is an ability to come up with reasonable boundaries and safeguards so the software in question can be used without causing accidental damage to the systems. Reasonable guidelines are developed in concert with the teams doing the work. Members of a good security team know they are playing a supporting role and they must gain the trust of the people they are working with in order to do their job.

When developers need to do something considered particularly dangerous, for instance attack their own systems, it often makes sense to do this in a lab environment, at least at first. Unleashing the latest penetration testing toy on the company website might be amusing, in much the way that some people consider car accidents fun to watch, but it is not going to improve site reliability or security. Large companies have dedicated teams of pen testers—often within the security team. Having a small group that can work with developers to create appropriate security test scenarios and schedule them to run at times that are convenient for testing is a good solution if sufficient resources are available. Startups and smaller companies will need to have their developers do this type of work, just as they have them do everything else, from coding to testing and documentation. The rise of cheap cloud computing should actually help in this area because a service can be cloned, walled off, and “attacked” with tools without harming the active service. One has to be careful with this type of testing, as some cloud providers may flag your attack testing as a real attack and shut you down. You mention virtual machines in your letter, and this is another way to achieve the same ends as the cloud solution, by spinning up a cluster of virtual machines on a virtual network on a large server dedicated to security testing.

The case for elevated privileges is another one that comes up frequently between security and development teams. The basic problem is that software systems, including operating systems, but also extending to databases and other pieces of critical software, are not engineered with the idea of security in mind. The now-famous XKCD comic (https://xkcd.com/149/) about the sudo command pretty much says it all. Most software is written to run either as an individual user, who usually has sufficient privilege to run the service but not enough to test or debug it, or as root. When developers are debugging or testing, they want to run the software as “root” or with a similar superuser-type power because then “it just works.” While these two levels of power are easy to understand—user vs. root—and while they do underpin a lot of modern computing thanks to their use in the original Unix operating system, they are insufficiently expressive. But that is a topic for another, much longer discussion.

Short of rewriting a ton of existing software, we come down to needing test environments, walled off from most of the rest of the system, or requiring special commands to get external access, to give developers a safe sandbox in which to work.

If elevated privileges are absolutely required to get a job done, that privilege must have a timeout. To me it seems reasonable to give a developer root powers on a production box so long as they are working with one other person, and all their commands are written to a file. The sudo program can actually do all of this. I would either set the timeout for a day or to when the problem was fixed, whichever came first. That statement is not meant to be a policy to be blindly adopted by all of my willing thralls; it is simply an example of how security and development teams can work together to get a job done.

KV

Related articles on queue.acm.org
Painting the Bike Shed
Kode Vicious
http://queue.acm.org/detail.cfm?id=1557897
Security in the Browser
Thomas Wadlow
http://queue.acm.org/detail.cfm?id=1516164
Patching the Enterprise
George Brandman
http://queue.acm.org/detail.cfm?id=1053344

George V. Neville-Neil ([email protected]) is the proprietor of Neville-Neil Consulting and co-chair of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Copyright held by author.
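Kode Vicious’s base-track recipe for Pickled (keep a pristine copy of upstream, collect interdependent patches into groups, then apply them to a separate development branch one at a time, testing after each) can be sketched in Python. This is a minimal in-memory model under stated assumptions, not a tool from the column: patches are bare names with declared dependencies, “applying” a patch appends it to a list, and `run_tests` is a hypothetical stand-in for a real test suite. Ordering uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) and grouping uses a small union-find; a real project would drive its version-control system instead.

```python
"""Minimal in-memory sketch of the "base track" workflow (hypothetical model).

Assumptions, not from the column: a patch is a name that declares which
other patches it needs first; "applying" a patch appends it to a list that
stands for the development branch; run_tests stands in for a real test
suite. No version-control commands are issued.
"""
from graphlib import TopologicalSorter
from typing import Callable, Optional

Deps = dict[str, set[str]]  # patch name -> names of patches it needs first


def group_patches(deps: Deps) -> list[set[str]]:
    """Collect patches that depend on each other into groups
    (connected components of the dependency graph, via union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for patch, needs in deps.items():
        find(patch)  # register patches that have no dependencies
        for need in needs:
            parent[find(patch)] = find(need)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for patch in list(parent):
        groups.setdefault(find(patch), set()).add(patch)
    return list(groups.values())


def application_order(deps: Deps) -> list[str]:
    """Order patches so each comes after every patch it needs.
    graphlib raises CycleError if patches depend on each other circularly."""
    return list(TopologicalSorter(deps).static_order())


def build_development_branch(
    pristine: list[str],
    deps: Deps,
    run_tests: Callable[[list[str]], bool],
) -> tuple[list[str], Optional[str]]:
    """Copy the pristine branch, apply patches one at a time, and test after
    each; stop at the first patch that breaks the tests.

    Returns (last good development branch, offending patch or None)."""
    dev = list(pristine)  # work on a copy: never modify the base layer itself
    for patch in application_order(deps):
        dev.append(patch)
        if not run_tests(dev):
            dev.pop()  # back out the breaking patch
            return dev, patch
    return dev, None


if __name__ == "__main__":
    upstream = ["upstream-1.0"]  # hypothetical pristine branch contents
    deps = {
        "fix-locking": set(),
        "new-scheduler": {"fix-locking"},
        "scheduler-tuning": {"new-scheduler"},
        "doc-typos": set(),
    }
    print("groups:", group_patches(deps))
    # Pretend the tests start failing once scheduler-tuning is applied.
    dev, bad = build_development_branch(
        upstream, deps, lambda branch: "scheduler-tuning" not in branch)
    print("development branch:", dev)
    print("first failing patch:", bad)
    print("pristine untouched:", upstream)
```

Working on a copy of `pristine` mirrors the column’s warning never to modify the base layer directly, and stopping at the first failing patch is what makes the incremental testing pay off: it leaves a single suspect rather than a pile of them.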

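KV’s rule for elevated privileges can likewise be modeled as a small policy object. The column notes that sudo itself can enforce this arrangement; the sketch below does not reproduce sudo’s configuration, and `ElevatedGrant`, its fields, and the in-memory `log` list (standing in for the file every command is written to) are hypothetical names chosen for illustration. It encodes KV’s example: root powers only while a second person is present, every command recorded, and expiry after one day or when the problem is marked fixed, whichever comes first.

```python
"""Hypothetical model of a time-limited root grant (not sudo configuration).

Rules modeled: a second person must be present, every command attempt is
logged, and the grant ends after one day or when the problem is marked
fixed, whichever comes first.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ElevatedGrant:
    developer: str
    partner: Optional[str]  # the second person working alongside
    granted_at: datetime
    max_duration: timedelta = timedelta(days=1)
    fixed_at: Optional[datetime] = None  # set when the problem is fixed
    log: list[str] = field(default_factory=list)  # stands in for a log file

    def expires_at(self) -> datetime:
        deadline = self.granted_at + self.max_duration
        if self.fixed_at is not None:
            return min(deadline, self.fixed_at)  # whichever comes first
        return deadline

    def is_active(self, now: datetime) -> bool:
        has_partner = self.partner is not None and self.partner != self.developer
        return has_partner and self.granted_at <= now < self.expires_at()

    def run_as_root(self, command: str, now: datetime) -> bool:
        """Log the attempt and report whether it may run with root powers.
        Actually executing the command is out of scope for this sketch."""
        allowed = self.is_active(now)
        verdict = "ALLOWED" if allowed else "DENIED"
        self.log.append(f"{now.isoformat()} {self.developer} {verdict}: {command}")
        return allowed


if __name__ == "__main__":
    start = datetime(2015, 12, 1, 9, 0)
    grant = ElevatedGrant("dev", "reviewer", start)
    print(grant.run_as_root("systemctl restart app", start + timedelta(hours=2)))  # True
    grant.fixed_at = start + timedelta(hours=3)  # problem fixed: the grant ends here
    print(grant.run_as_root("tail /var/log/app.log", start + timedelta(hours=4)))  # False
    print("\n".join(grant.log))
```

Denied attempts are logged as well, so the record shows what was tried after the grant lapsed, not only what actually ran.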

Broadening Participation
Increasing the Participation of Individuals with Disabilities in Computing
Lessons learned from a decade of practice.
Richard E. Ladner and Sheryl Burgstahler
DOI:10.1145/2835961

“Lacking diversity on an engineering team, we limit the set of solutions that will be considered and we may not find the best, the elegant solution.”8 This insight, as expressed by computer scientist William A. Wulf, has motivated efforts to encourage women and racial/ethnic minorities to pursue computing careers. However, the underrepresentation of individuals with disabilities in these disciplines has been largely ignored. Not so at the University of Washington. Our AccessComputing initiative has worked tirelessly in this domain at a national level and, over a decade, has achieved promising outcomes in increasing the number of people with disabilities completing degrees and pursuing careers in computing fields.

A student engages with a computer through a touchscreen.

AccessComputing efforts have increased the capacity of key players—postsecondary institutions, professional organizations, and industry—to fully include individuals with disabilities and helped individuals with disabilities move toward computing degrees and careers.1 In this column, we share examples of project activities and lessons learned to encourage other educators and employers to join our efforts in increasing diversity in computing disciplines by engaging more individuals with disabilities in these professions.

The Need
Increasing the participation of individuals with disabilities in computing fields is not just a matter of quantity. Engaging this population can help us meet the mandate of the National Science Foundation (NSF) to support “the best ideas from the most capable researchers and educators.”6 This applies to the design of information technology (IT) to be accessible to a broad audience and preparation of professionals with accessibility skills to help employers meet legal mandates under the Americans with Disabilities Act (ADA) and address the needs of an aging population. Individuals with disabilities are uniquely suited to contribute to this effort. However, people with disabilities face challenges in pursuing science, technology, engineering, and mathematics (STEM), completing degrees and succeeding in careers. Student veterans with post-traumatic stress disorder and traumatic brain injury,
“signature injuries” of recent conflicts, can face additional challenges related to social adjustment, finances, and adjustment to their disabilities.3

One ongoing challenge to address is ensuring the availability of information technology that is accessible to everyone. There are well-documented solutions for making IT accessible; they help with access to keyboards and mice for those with limited mobility, to content within images for students who are blind, and to audio content for students who are deaf or hard of hearing.7 Students with learning disabilities or who have difficulty with focus, reading comprehension, or other cognitive issues can benefit from inclusive instructional approaches that do not make them stand out as different.5 The success stories of the relatively few individuals with disabilities in computing fields demonstrate the opportunities for those who develop academic and self-determination skills and who overcome inaccessibility in facilities, curricula, and IT and inadequate academic support. However, the biggest barrier is often lack of encouragement by individuals with whom they interact, including educators and employers. Integrating content about how IT can improve the lives of individuals with disabilities in computing courses can encourage students with disabilities and others interested in social impact who may not otherwise consider the computing field.

AccessComputing: It Is About Engaging Professionals
AccessComputing was created in 2006 to address the need to increase diversity in computing by including more individuals with disabilities. Funded by the NSF, the initiative engages individuals with disabilities as well as those who support, educate, and employ them, in sustainable practices for making computing disciplines more welcoming and accessible to people with disabilities. Project objectives fall into two areas:
• Organizational capacity development: Increase the capacity of postsecondary institutions, organizations that broaden participation in computing, and industry to fully include individuals with disabilities in computing fields.
• Individual participation: Implement evidence-based practices (for example, workshops, mentoring, internships) to increase the number of individuals with disabilities moving through critical junctures to computing careers.

Bringing together multiple stakeholder groups is a key to AccessComputing’s success. This starts with leadership. As its leaders, we represent the Department of Computer Science and Engineering and the DO-IT (Disabilities, Opportunities, Internetworking, and Technology) Center, both at the University of Washington. Together, we have a depth of experience in computer science as well as access issues and solutions for individuals with disabilities. In addition, AccessComputing engages 58 project partners who represent a diverse set of postsecondary institutions and computing organizations. Three of our partners, National Technical Institute for the Deaf at Rochester Institute of Technology, Gallaudet University, and Landmark College, serve students with disabilities as a specific mission. Institutional representatives engage online and onsite to explore promising practices and develop resources that enhance understanding and promote replication at educational institutions, in professional organizations, and in the computing industry.

AccessComputing leaders have also worked with 175 collaborators, some who have hosted student interns and others who have hosted events for students with disabilities or faculty and staff at their sites. We have awarded 75 mini-grants to collaborators to help fund their activities that support the project objectives.

A student uses a speech interface to gain the functionality of a keyboard and mouse.

We have worked with 78 computing departments and professional organizations to help make them more accessible and welcoming to people with disabilities. Examples include working with 21 computing departments to evaluate and help improve the accessibility of their websites; with ACM SIGCHI in its efforts to make CHI conferences more accessible; with our partner, Center for Minorities and People with Disabilities in IT (CMD-IT), in its efforts to make the Tapia Conference, Academic Careers Workshop, and its other activities more accessible and welcoming
to people with disabilities; and EDUCAUSE in making its events more accessible and to address disability-related topics in its conference exhibits and presentations. We have also worked with partners and collaborators to expand the AccessComputing online searchable knowledge base of 487 questions and answers, case studies, and evidence-based practices that can help leaders and practitioners make computing disciplines more inclusive.1

AccessComputing: It Is About Engaging People with Disabilities
Over the past 10 years almost 450 students around the U.S. have participated in the AccessComputing Team, an online community of computing students. They have engaged with peer and career mentors, in research and industrial internships, and professional conferences and other career development activities. Of the 410 AccessComputing internships, 110 were research internships at various universities. The outcomes of participation are impressive. For example, in a follow-up survey, 151 AccessComputing interns rated six statements on a 5-point scale, where 1 is “strongly disagree” and 5 is “strongly agree.” The items and average responses are represented in the accompanying table.

Respondent ratings of gains from internships: Six questions on a 5-point scale (1 = strongly disagree, 5 = strongly agree).

Survey Statement                                                          Average Response
I am more motivated to study and work toward a career.                    4.33
My knowledge of my career interests has increased.                        4.27
I have learned the skills I need to succeed in specific job tasks.        4.08
I have learned skills I need to effectively work with co-workers.         4.00
I have learned the skills I need to effectively work with supervisors.    3.99
I learned about disability-related accommodations I may need at work.     3.67

These results are consistent with an earlier study where STEM interns with disabilities reported increased motivation to work toward a career, knowledge about careers and the workplace, job-related skills, ability to work with supervisors and coworkers, and self-advocacy skills.2 Profiles of 17 AccessComputing Team members are highlighted on the Choose Computing web page of our website.1

We measure project impact by gathering participant feedback and by tracking the progress of student participants. AccessComputing participants are tracked through critical junctures4 to computing degrees and careers. Some AccessComputing participants participate in the AccessSTEM/AccessComputing/DO-IT Longitudinal Transition Study (ALTS), an ongoing research study that tracks the progress of students with disabilities through critical junctures to postsecondary degrees and careers. Analysis of results supports the efficacy of evidence-based interventions by documenting higher levels of success—in terms of high school graduation, college attendance/persistence, computing majors/degrees and careers—of participants with disabilities than of people with disabilities in comparison groups.

Lessons Learned
A decade of AccessComputing efforts has resulted in students with disabilities moving through critical junctures to computing degrees and careers; increased capacity of computing faculty and professional organizations to include participants with disabilities; useful resources for educators and employers; and sustained engagements of stakeholder groups. Others who want to host a project like AccessComputing can benefit from what we have learned. For example, we have learned that positive outcomes result from cooperative efforts between organizations focused on computing and those focused on disability and that new activities benefit from facilitation by experienced staff and provision of external funding for initial activities on which institutionalization can build.

Many lessons have been learned with respect to the participation of students with disabilities in computing fields.
• Individuals face common issues as well as unique challenges related to specific disabilities, in both academic and non-academic (for example, self-advocacy) arenas.
• Motivational activities should be undertaken to recruit students without initial interest in computing.
• For students with disabilities interested in computing, comprehensive preparation and retention interventions should be undertaken in order to produce more positive outcomes than isolated efforts.
• We need to develop more leaders and role models with disabilities in computing fields by undertaking more efforts like AccessComputing.
• Aggressive retention efforts are essential because the percentage of students in computing majors who have disabilities decreases as education levels increase.

Much has been learned about organizational capacity development as well.
• Computing departments, programs, and professional organizations should be welcoming and accessible to people with disabilities,
consulting individuals with disabilities and/or disability services to make accessibility a centerpiece of the organization. Being responsive to legal obligations means designing for accessibility and providing instructions for requesting accommodations.
• Employers need to become more aware of how they can benefit from hiring employees with disabilities because addressing accessibility challenges in their daily lives has made this group good problem solvers.
• Both students and employees with disabilities provide unique perspectives on the usability of products and services that benefit academic programs and industry.
• Instructors and employers should consult with individuals with disabilities about what will make them most productive in computing activities; after all, they are the experts about their own access challenges and solutions.

Conclusion
Evaluation results of AccessComputing activities suggest that computing departments, professional organizations, and employment opportunities have become more accessible and welcoming because of the engagement between AccessComputing partners and collaborators. Many individual students have been positively impacted by their participation in the project as well. Clearly, much has been accomplished. But critical work remains to be done by AccessComputing and complementary projects. NSF has demonstrated its agreement with this conclusion by authorizing five more years of funding for the AccessComputing initiative. The renewal of the grant will particularly focus on the following new areas:
• Supporting the integration of relevant disability/accessibility/universal design content into computing courses in order to increase knowledge and skills among future computing professionals.
• Engaging the computing industry to increase focus on placement of individuals with disabilities in careers.

Those interested in moving this diversity agenda forward should consult the AccessComputing website at http://www.uw.edu/accesscomputing or contact us at accesscomputing@uw.edu. Ultimately, outcomes of projects like AccessComputing will benefit society by making computing opportunities available to more people and by enhancing computing fields with the talents and perspectives of people with disabilities.

References
1. AccessComputing. AccessComputing. University of Washington, Seattle; http://www.uw.edu/accesscomputing
2. Bellman, S., Burgstahler, S., and Ladner, R. Work-based learning experiences help students with disabilities transition to careers: A case study of University of Washington projects. WORK: A Journal of Prevention, Assessment & Rehabilitation 48 (2014), 399–405.
3. Bernton, H. Senate passes legislation to improve care for soldiers. The Seattle Times (Dec. 15, 2007); http://seattletimes.nwsource.com/html/politics/2004074552_warriors15m.html
4. Burgstahler, S. AccessSTEM: Progress of teens with disabilities toward STEM careers: Positive inputs leading to critical junctures. University of Washington, Seattle, 2006; http://www.uw.edu/doit/Stem/flowchart.html
5. Burgstahler, S. Universal design: Implications for computing education. ACM Transactions on Computing Education 11, 3 (Oct. 2011).
6. Congressional Commission on the Advancement of Women and Minorities in Science, Engineering, and Technology Development. Land of Plenty: Diversity as America’s Competitive Edge in Science, Engineering and Technology. Washington, D.C., Sept. 2000.
7. World Wide Web Consortium. Web Content Accessibility Guidelines 2.0. 2008; http://www.w3.org/TR/wcag20/
8. Wulf, W.A. How shall we satisfy the long-term educational needs of engineers? In Proceedings of the IEEE 88, 4 (2000), 593–596.

Richard E. Ladner ([email protected]) is a professor in computer science and engineering at the University of Washington. He is principal investigator for AccessComputing, an ACM Fellow, and was awarded the 2014 SIGCHI Social Impact Award.

Sheryl Burgstahler ([email protected]) is Director of the Disabilities, Opportunities, Internetworking, and Technology (DO-IT) Center and Affiliate Professor in Education at the University of Washington and is the co-principal investigator and Director of AccessComputing. The recipient of many awards, DO-IT won a 1997 Presidential Award for Excellence in Science, Mathematics, and Engineering Mentoring.

This column was created with funding from the National Science Foundation (grant numbers CNS-0540615, CNS-0837508, and CNS-1042260). Any opinions, findings, and conclusions or recommendations expressed in this column are those of the authors and do not necessarily reflect the views of the federal government.

Copyright held by authors.

36 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 viewpoints

DOI:10.1145/2791290
Jeremy Scott and Alan Bundy
Viewpoint
Creating a New Generation of Computational Thinkers
Experiences with a successful school program in Scotland.

PREVIOUS COMMUNICATIONS VIEWPOINTS have discussed the crisis in school computing teaching.1,4,5 In many countries, school computing lessons have degenerated into the teaching of office skills, often by unqualified teachers. Students typically found this boring and uninspiring. There is a worldwide movement to tackle this problem, manifest in the U.K. by the Computing At School organization (CAS), the Royal Society report Shut Down or Restart?6 and Google Executive Chairman Eric Schmidt’s McTaggart Lecture at the 2011 Edinburgh Television Festival.

It is important to create a community of toolmakers rather than just tool users. To achieve this we must teach schoolchildren to program, but this is not sufficient. Students also need to think computationally: to use abstraction, modularity, hierarchy, and so forth in understanding and solving problems. It is also necessary to employ a pedagogy that is informed by the latest research into the most effective ways to teach computing.

In Scotland, we have recently taken advantage of a unique opportunity to reform the teaching of computing. The new Curriculum for Excellence (CfE) has granted teachers significantly more autonomy in deciding what to teach and how to teach it. CS is now a core entitlement for all students during the first three years of secondary school, and an option in a new series of public qualifications during the senior years. CS teachers are eager to obtain teaching materials that address the CfE’s aims and objectives.

Scotland starts from a stronger position than many other countries. While not immune to a decline in specialist CS teachers and student demand,3 this decline has been more gentle and from a higher starting point than in the rest of the U.K. Scottish teachers are all required to have a teaching qualification in their specialty and some public CS qualifications have included a significant and rising element of programming. Scotland also benefits from a unitary national curriculum and a unitary assessment authority for public examinations. Lastly, it is also small enough that one can get representatives from all the key stakeholders around a table, get buy-in from them all, and thereby be more nimble in solving problems.

Scotland thus provided an ideal laboratory in which to develop a program for teaching computing: combining programming, computational thinking (CT), and evidence-based


pedagogy, in order to address the computing teaching crisis. We encapsulated this program in teaching materials that we have made freely available and which have been enthusiastically adopted, not just in Scotland, but in many other countries. Some Scottish schools previously taught only information systems in the senior years, which does not include programming (beyond basic scripting), rather than computing, which does. Moreover, business education teachers sometimes teach computing in junior years. Therefore, it was necessary for our materials to be easily used by a non-specialist.

In our view, the success of our program shows that, with the aid of professionally curated materials and evidence-based pedagogy, schoolchildren can be taught to think computationally and to become toolmakers.

The RSE/BCS Exemplification Project
In 2010, the Royal Society of Edinburgh (RSE) and the BCS Academy of Computing joined forces in a project to support schoolteachers to develop materials to exemplify the CS CfE aims and objectives. A wide variety of bodies generously and enthusiastically provided funding: Scottish university CS departments, national and international industry and the government agency Education Scotland.

One of the co-authors of this article, Jeremy Scott, was appointed, initially to a one-year, part-time post, but this was eventually extended to three years from 2011–2014, at a total cost of £90,000. The project was overseen by an advisory group that drew on all the stakeholders: RSE, BCS, universities, industry, CAS Scotland, CS teachers, and various other agencies. This multi-agency engagement helped get the project taken seriously by government, as did the presence of respected members of the advisory group.

Pedagogy and Approach
In Phase 1, we focused on the first three years of secondary school (ages 11–14 in Scotland)—the “Broad General Education” that seeks to give all of Scotland’s students a solid grounding across the curriculum—including the teaching of CS.

Context is also vital. Today’s learners have a different experience of computing from previous generations: it is online, social, and increasingly mobile. Computing devices have become tactile and personal, the result of the convergence of numerous technologies from multitouch to motion sensing and GPS. We therefore had to connect to learners in a way that resonated with their own experiences of the digital world, eschewing the tedium of calculating payroll and other such mundane tasks.

New and rich programming environments allowed us to focus on problem solving without requiring students to learn a demanding syntax—something that often defeats learners before they get started. To that end, we selected Scratch and App Inventor, both maintained by MIT’s Media Lab, as the environments for Phase 1. Their blocks-based approach to coding finesses problems of syntax, enabling students to concentrate on the logic of programming, while supporting the construction of rich multimedia applications (Scratch) and smartphone apps (App Inventor). Students learn all the usual programming concepts: sequential, conditional, and iterative connectives, but this happens implicitly. Both Scratch and App Inventor enabled the construction of quite sophisticated applications with short and easily developed programs, so learners were quickly rewarded for their efforts. Note that none of this would have been possible 5–10 years earlier.

In Phase 2, we built on Phase 1 by developing materials for the new Scottish qualifications of National 4 and 5 that follow the Broad General Education. This required progressing from the previous blocks-based languages to more conventional text-based ones. We chose LiveCode, a modern equivalent of Apple’s HyperCard, which permits cross-platform development and deployment of rich apps to a variety of desktop and mobile platforms.

In addition to traditional programming, the new qualifications cover information systems design and development. While computational thinking certainly applies to information systems, there are further overarching ideas that together we referred to as “informational thinking.” Informational thinking stresses

the notion that information systems rely on abstracting knowledge into data structures. These structures are linked to develop user-centered systems for the storage, processing, and retrieval of information.

Our materials were designed to be used by even non-specialist teachers and to make available to them the latest, evidence-based pedagogical methodology, for example, “did you understand” questions. Each off-the-shelf pack included:
• Screencasts: Videos illustrate key program construction operations step by step to the YouTube generation, which makes the materials easier for non-specialist teachers to use.
• Buggy code: Students are lured into making errors and then challenged to debug their faulty code.
• Did you understand?: Embedded questions reinforce and assess learning and encourage discussion and collaboration between students (which pedagogic research suggests is an important aid to deep understanding2). Some of the questions involve debugging code samples, while others address key CT themes, such as sequencing, scope and hierarchy (see the appendices to this Viewpoint).
• Background: Opportunities are taken to deepen students’ understanding of broader underlying principles, such as data representation, and place developments in computing in a historical and social context.
• Design not imitation: The emphasis is on the student’s design of the algorithm, not on imitation without understanding.
• Excitement: The environments and applications are chosen to motivate the target age group, in contrast to some of the standard applications in the past, mentioned earlier.
• Computational thinking: Underlying principles such as abstraction, modularity, and hierarchy, for example, are abstracted from the examples and highlighted via box-outs.

We believe this overall package, combining the elements listed here, allowed the project’s materials to aid the learning of CT informed by evidence-based pedagogy.

The Materials
All of the materials comprise student and teacher packs.a The latter are annotated with pedagogical suggestions and suggestions for further work, including interdisciplinary learning—a cornerstone of Scotland’s new curriculum—while mapping to the aims and objectives of CfE. This enables non-specialist teachers to step out of their comfort zone and provide the kind of computing teaching our students require.

The sequence of materials was envisaged to support CS for students aged approximately 11–15, over, say, 15 hours per pack:
• Starting from Scratch: An introduction to computing science using Scratch.
• Itching for More: Intermediate material using BYOB (Build Your Own Blocks) Snap!, a modification of Scratch developed at UC Berkeley.
• I ♥ My Smartphone: Consolidating previous work through the medium of mobile app development using App Inventor.
• Information Everywhere: An introduction to web-based information systems.
• Live to Code: A programming course for Scotland’s new high school CS qualifications using the LiveCode app development language.

a Computing Science. The materials discussed in this Viewpoint are available from the Royal Society of Edinburgh at http://bit.ly/1LaWlXr.

Response
The materials were successfully trialled in a number of Scottish schools before general release, receiving widespread praise from both teachers and students. Since release we have had further positive feedback, not just from within Scotland but across the U.K. and internationally. A typical comment was: “The RSE materials set the pace on a global scale—they’re comprehensive, accessible and were clearly written by someone with an understanding of how kids learn. They also employ a lot of research-based pedagogy, which is evident in both the teacher notes and student materials.” [Cameron Fadjo, while Director of Software Engineering Education for the New York City Department of Education (now Project Lead in Computer Science Education at Google)]

INRIA has also translated Starting From Scratch into French for use in French schools.

Lessons
The project’s success exceeded our expectations. The materials have been widely and successfully adopted, not just in Scotland, but across the U.K. and elsewhere. We achieved a great deal at relatively low cost, as much of the project worked on good will: busy academics and industry figures selflessly gave up their time, ably coordinated by expert help from the RSE. Jeremy Scott’s school was also flexible in accommodating his secondment and speaking engagements.

But why else did the project succeed and what advice would we give to others trying to tackle the same issues?
• A coordinated approach is key: universities, learned societies, and industry are taken seriously by governments and we successfully leveraged their support to get our voice heard.
• We were also ‘squeaky wheels’, making noise with government and through the press. We seized upon the project’s initial success to extend its original remit and raise broader issues facing the subject.
• It also happened at a time when government wanted to be seen to be doing something about it. A launch event for the first phase of materials maintained the interest of politicians who were happy to be associated with it.
• This allowed us to engage with our public education bodies in order to help shape the curriculum, rather than simply react to it.


• Everyone is “Scratching,” but there is a risk that teachers could just let students loose on a programming platform, such as Scratch, and tick the CS strand on their curriculum. For us, these wonderful tools are the medium but CT is the message. It is about laying firm foundations, which our materials provide.

The Future
We are now at a critical point in Scottish CS school teaching. On the one hand we have our, and other (for example, CAS inspired), well-received materials; a band of able and energetic teachers; enthusiastic students; and buy-in from industry, government, universities, and learned societies. On the other hand, in some schools, we still have declining numbers taking CS qualifications and non-replacement of CS teachers. Some head teachers, career advisers, parents, and local education authorities are still unaware of the rewards of programming, the intellectual content of CT, and the exciting CS job opportunities.

It could go either way. It is a matter of urgency to get the message out to schools, parents, local education authorities and other stakeholders to reverse the current decline. In Scotland, we hope to play a significant role in spreading this message both within our country and worldwide. With the support of our government and the media, we have made a successful start.

References
1. Bell, T. Establishing a nationwide CS curriculum in New Zealand high schools. Commun. ACM 57, 2 (Feb. 2014), 28–30.
2. Chi, M.T.H. Active-constructive-interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science 1 (2009), 73–105.
3. Computing Science Teachers in Scotland 2014. A report by Computing At School Scotland, December 2014.
4. Cooper, S., Grover, S. and Simon, B. Building a virtual community of practice for K–12 CS teachers. Commun. ACM 57, 5 (May 2014), 39–41.
5. Shein, E. Should everybody learn to code? Commun. ACM 57, 2 (Feb. 2014), 16–18.
6. Shut Down or Restart? The Way Forward for Computing in UK Schools. The Royal Society, January 2012.

Jeremy Scott ([email protected]) is Principal Teacher of Computing Science at George Heriot’s School, Edinburgh, Scotland.

Alan Bundy ([email protected]) is Professor of Automated Reasoning at the School of Informatics, University of Edinburgh, Scotland.

We wish to thank the numerous funders of our project, the project’s advisory group, and the anonymous Communications referees, whose constructive comments helped shape our Viewpoint.

Copyright held by authors.

Did You Understand? question from Starting from Scratch.
Nested repeat question
How many times will sprite move 10 steps? Why?
Answer: the sprite (character) will move 10 steps 15 times. This example demonstrates to learners concepts of fixed loops and nested commands.

Did You Understand? question from I ♥ My Smartphone. In this example, a button can be clicked to increase the line width in a finger-painting application.
FingerPaint app question
A user starts up a FingerPaint app and immediately clicks ButtonBigBrush (code shown below). However, when the user tries to paint, nothing appears on the canvas until they click ButtonBigBrush a second time.
Discuss with your partner why this happens and what change(s) should be made to the code to fix this bug.
Answer: Variable brushSize is initially set to zero, then assigned to the LineWidth property before it is incremented. Consequently, when the button is pressed for the first time, the user paints a line of width 0 pixels. The answer is to increment the brushSize variable before assigning it to the LineWidth property. This example demonstrates to learners concepts of initialization, assignment, the difference between variables and properties to which they can be assigned and, above all, sequencing.

Did You Understand? question from I ♥ My Smartphone. In this example, a bat from a classic “Pong”-style game is constantly moving, while its heading left and right can be changed by tilting the phone.
Bat & ball question
Algorithm: Moving bat (using Orientation Sensor)
    if orientation sensor roll > 0 then (phone is tilted to the right)
        set the bat heading to the right (0)
    else (phone is tilted to the left)
        set the bat heading to the left (180)
In what direction will the bat move if the phone’s tilt is zero (completely level)? Why?
Answer: The bat will move to the left because the orientation sensor roll measurement is not greater than zero. This example demonstrates the binary logic in an IF…THEN…ELSE statement. The text in brackets could also be interpreted as internal commentary for the IF and ELSE statements, demonstrating further the concept of programmer logic errors.
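The three Did You Understand? questions each turn on one mechanical point: a nested loop multiplies its iteration counts, the order of an assignment and an increment matters, and a strict `roll > 0` test sends a level phone down the else branch. The Python sketch below transcribes them so the stated answers can be checked by running code. It is an illustration only, not part of the RSE materials (which use Scratch and App Inventor). The repeat counts 3 and 5 are assumptions, since the original figure's counts are not reproduced in this text (any pair whose product is 15 matches the stated answer); the one-pixel brush increment and every function, class, and variable name are likewise hypothetical.

```python
# Illustrative Python transcription of the three "Did You Understand?" questions.
# Assumptions (not from the original materials): repeat counts 3 and 5,
# a 1-pixel brush increment, and all names used below.

# Q1: nested repeat. The inner loop runs in full on every pass of the outer
# loop, so "move 10 steps" executes outer_times * inner_times times.
def count_moves(outer_times, inner_times):
    moves = 0
    for _ in range(outer_times):        # outer "repeat" block
        for _ in range(inner_times):    # inner "repeat" block
            moves += 1                  # one "move 10 steps"
    return moves

# Q2: FingerPaint sequencing bug. brushSize starts at 0 and, in the buggy
# version, is copied into the LineWidth property before being incremented.
BRUSH_STEP = 1  # hypothetical increment per click; the real value is not given

class FingerPaint:
    def __init__(self, fixed):
        self.fixed = fixed
        self.brush_size = 0     # global variable brushSize, initialized to zero
        self.line_width = None  # stands in for the canvas LineWidth property

    def click_big_brush(self):
        if self.fixed:
            self.brush_size += BRUSH_STEP        # increment first...
            self.line_width = self.brush_size    # ...then assign the property
        else:
            self.line_width = self.brush_size    # bug: first click assigns 0
            self.brush_size += BRUSH_STEP

# Q3: bat heading. Anything not strictly greater than zero, including a
# perfectly level phone (roll == 0), falls through to the else branch.
RIGHT, LEFT = 0, 180

def bat_heading(roll):
    if roll > 0:    # "phone is tilted to the right"
        return RIGHT
    else:           # labeled "tilted to the left", but roll == 0 also lands here
        return LEFT

if __name__ == "__main__":
    print(count_moves(3, 5))                    # 15

    buggy, fixed = FingerPaint(fixed=False), FingerPaint(fixed=True)
    buggy.click_big_brush()
    fixed.click_big_brush()
    print(buggy.line_width, fixed.line_width)   # 0 1
    buggy.click_big_brush()
    print(buggy.line_width)                     # 1: visible only after the second click

    print(bat_heading(10), bat_heading(-10), bat_heading(0))  # 0 180 180
```

Swapping the two statements in the buggy branch is the entire fix, which is exactly the sequencing point the FingerPaint question teaches. Treating a level phone differently (for example, keeping the bat's current heading) would need a third branch that the original two-way algorithm does not have.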


DOI:10.1145/2801133
Cory Doctorow
Viewpoint
I Can’t Let You Do That, Dave
Computers should not treat their owners as adversaries.

IT HAS BEEN 25 years since the Electronic Frontier Foundation was founded to ensure the civil liberties that mattered in the real world followed us into the online world, and it has been a heady quarter-century, with many significant victories, and we have learned some alarming, important lessons on the way. First among these lessons: there is no distinction between the “virtual” and “real” worlds. The Internet is the nervous system of the 21st century. This was obvious even before the Internet of Things (IoT) turned our cars and homes into computers that engulf our bodies, but the IoT has made the issue especially urgent.

The problem of regulation and the IoT is an old one, but with a new urgency. Since 1998, EFF has been part of the movement to reform the Digital Millennium Copyright Act (DMCA). The DMCA has many flaws, the worst of which is in Section 1201, the “anti-circumvention” rule, whose ills have been magnified by recent technological developments, turning them into something of an existential threat to a free and fair future.

DMCA 1201 prohibits breaking all “digital locks” that restrict access to copyrighted works. Though it was originally conceived as a means of preventing piracy, it has proved most useful at preventing competition and the creation of legitimate, otherwise legal technologies. Copyright law has many flexibilities and exclusions that product designers, developers, and users can freely exercise, without any permission from the copyright holder. But under 1201, you can only make these uses if you do not have to break a lock.

For example, it is legal for you to rip your CDs. Put a CD in your computer’s optical drive and the manufacturer-supplied OS will launch a tool that invites you to rip and library the music on the disc, automating the process of taking your music with you on a mobile device. Ripping DVDs is legal under the same theory, with one important difference: DVDs were born with DMCA-covered digital locks. Insert a DVD into your computer and the only feature you get is the same one your DVD player had in 1996: playing the movie. If you want to watch your movie on your phone legally, you cannot do so, because despite the legality of transcoding and moving files for personal use, the legal inviolability of the digital lock you must break to do that computation means you must buy all your movies over again to watch them on the go.

Watching a movie you paid for on a device you own is not piracy by any definition, and it is bad enough that the DMCA prevents this lawful feature. It has been 19 years since the DVD was introduced and not one single feature has been introduced to the platform in all that time.


But the main event is not user rights or innovation: it is security and free speech. As ACM members doubtlessly appreciate, preventing the owner of a computer from executing the code of their choice is an impossible task. No matter how cleverly the operating system and its services monitor the user and hide the keys necessary to unlock files without permission, users will eventually find a flaw in the defenders’ code and use it to jailbreak the system, allowing arbitrary code execution. Even if you stipulate that locking computer users out of their own computers is a legitimate objective, it is still a technological nonsense. A security model that treats the computer’s user as an attacker is doomed. We cannot hide keys in devices we give to attackers for the same reason we cannot keep safes—no matter how well designed—in bank-robbers’ living rooms.

The DMCA tries to address this by threatening people who publish code or information that would help remove a lock with severe penalties: five years in prison and $500,000 in fines for a first offense.

But information about flaws in a computer is not just useful to people who want to add functionality to their computers: it also provides opportunities for malware to seize control over the system. By criminalizing disclosure of flaws, the DMCA ensures systems covered by its measures become reservoirs of long-lived digital pathogens. This is bad enough in the context of mobile devices—your phone is not just a distraction rectangle that lets you throw birds at pigs; it is a sensor-studded supercomputer that is privy to your every movement, conversation, and authentication credential—but it gets much worse in the age of the IoT.

Internet of Things startups are under intense investor pressure to restrict their devices with DMCA-covered locks to create managed “ecosystems.” GM and John Deere both filed comments with the U.S. Copyright Office this spring asserting the software locks on their products are covered by the DMCA. They want to ensure independent mechanics cannot jailbreak those vehicles and provide service without first entering into a license agreement through which they promise only to buy original parts.

We are familiar with this model: it is the printer-ink business-model, where digital locks are used to ensure people who buy a product are locked into buying consumables for it from one company, at the highest possible price. It is just one of the ways the DMCA rewards IoT businesses that treat their customers as the enemy.

Another is the ability to make promises to other people about what your customers will and will not do with your product. The World Wide Web Consortium yielded to pressure from Netflix and the BBC to add digital locks to the standards for HTML5 so that these companies could promise copyright holders viewers would not be able to save streamed video. Smart-thermostat offerings like Nest want to be able to promise power authorities they can lower their customers’ thermostats without customers turning them back up again.

Finally, the DMCA lets vendors extract rent from, and exact control over, independent software vendors. Putting locks on the devices you sell means you can set up app stores and no one else can set up competing stores. This lets you charge high commissions on sales and refuse to carry apps that add functionality your users want, but that you would rather not see.

DMCA 1201 is turning all of IoT into a playground for malware, where reporting vulnerabilities and releasing third-party improvements to systems are chilled by a law that was stupid in 1998 and is deadly in 2015.

Malware is always frightening, but it is much worse on systems already designed to treat their owners as adversaries. Infections on devices that take pains to hide their processes and files from their owners are much more difficult to detect and root out. Those devices are supposed to run programs that user-space apps cannot see or terminate, so malware that avails itself of this privilege becomes nearly bulletproof.

The Electronic Frontier Foundation’s new Apollo 1201 project aims to reform DMCA 1201, and all of the laws like it around the world, within a decade. We want to litigate the constitutionality of 1201, representing scholars, researchers and academics, these being the kind of unimpeachable clients judges are loath to find against.

We know from our own off-the-record conversations with academics and researchers that they quietly violate 1201 in their work all the time, and that there are plenty of legitimate projects that never launch for fear of violating the law. If you do this sort of work, the Electronic Frontier Foundation would like to discuss it with you. If you know someone who does this kind of work, encourage that person to get in touch with the Electronic Frontier Foundation.

The model of fixing social problems by locking users out of their own devices is an invitation to even worse security policies. When FBI Director James Comey and U.K. Prime Minister David Cameron call for backdoors in our crypto, they are necessarily implying a means of ensuring you cannot install code of your choosing on your devices, lest you choose to install working crypto.

The Internet of Things is here already. The most salient fact about a person’s pacemaker is its network stack and security model; the most salient fact about a person’s car is its informatics. As computers move inside our bodies and as we move our bodies into computers, it is clearer than ever that “offline” and “online” are not meaningful distinctions. If the information age is to be habitable, then crucial free speech provisions that let experts blow the whistle on unsafe practices in our infrastructure cannot be denied.

Cory Doctorow ([email protected]) is the co-editor of the blog Boing Boing and the former European Affairs Coordinator for the Electronic Frontier Foundation.

Copyright held by author.


Stephen Goose/Ronald Arkin
Point/Counterpoint
The Case for Banning Killer Robots
Ban the bots? Considering both sides of the argument for and against.

DOI:10.1145/2835963
Point: Stephen Goose

DIPLOMATS AND MILITARY experts from more than 90 countries gathered in Geneva in April 2015 for their second meeting on “lethal autonomous weapons systems,” also known as fully autonomous weapons, or more colloquially, killer robots. Noted artificial intelligence expert Stuart Russell informed delegates the AI community is beginning to recognize the specter of autonomous weapons is damaging to its reputation and indicated several professional associations were moving toward votes to take a position on the topic.

On July 28, 2015, more than 1,000 AI professionals, roboticists, and others released an open letter promoting a “ban on offensive autonomous weapons beyond meaningful human control.”a

Thus, it is timely and appropriate that Communications is focusing on fully autonomous weapons, particularly the call for a preemptive prohibition on their development, production, and use as has been made by the Campaign to Stop Killer Robots, many in the AI community, more than 20 Nobel Peace Laureates, and many others.

There is no doubt that AI and greater autonomy can have military and humanitarian benefits. Most of those calling for a ban support AI and robotics research. But, weaponizing fully autonomous robotic systems is seen by many as a step too far.

The Many Reasons to Ban Fully Autonomous Weapons
The central concern is with weapons that once activated, would be able to select and engage targets without further human involvement. There would no longer be a human operator deciding whom to fire at or when to shoot. Instead, the weapon system itself would undertake those tasks. This would constitute not just a new type of armament, but a new method of warfare that would radically change how wars are fought, and not to the betterment of humankind.

Many in the AI community have focused on the serious international security and proliferation concerns related to fully autonomous weapons. There is the real danger that if even one nation acquires these weapons, others may feel they have to follow suit in order to defend themselves and to avoid falling behind in a robotic arms race.

The open letter signed by some of the most renowned AI experts stated, “If any major military power pushes ahead with AI weapon development, a global arms race is virtually inevitable … We therefore believe that a military AI arms race would not be beneficial for humanity. There are many ways in which AI can make battlefields safer for humans, especially civilians, without creating new tools for killing people … [M]ost AI researchers have no interest in building AI weapons—and do not want others to tarnish their field by doing so, potentially creating a major public backlash against AI that curtails its future societal benefits.”b

There is also the prospect that fully autonomous weapons could be acquired by repressive regimes or non-state armed groups with little regard for the law. These weapons could be perfect tools of repression and terror for autocrats.

Another type of proliferation concern is that such weapons would increase the likelihood of armed attacks, making resort to war more likely, as decision makers would not have the same concerns about loss of soldiers’ lives. This could have an overall destabilizing effect on international security.

a The Future of Life Institute organized the letter. The text and list of signatories can be found at: http://bit.ly/1UQI7QE
b http://bit.ly/1UQI7QE


For many people, these weapons would cross a fundamental moral and ethical line by ceding life and death decisions on the battlefield to machines. Giving such responsibilities to machines in such circumstances has been called the ultimate attack on human dignity.c The notion of allowing compassionless robots to make decisions about the application of violent force is repugnant to many. Compassion is a key check on the killing of other human beings.

In my extensive engagement with a variety of audiences on this issue, it has been striking how most people have a visceral negative reaction to the notion of fully autonomous weapons. The Martens Clause, which is articulated in the Geneva Conventions and elsewhere, is a key provision in international law that takes into account this notion of general repugnance on the part of the public. Under the Martens Clause, fully autonomous weapons should comply with the “principles of humanity” and the “dictates of public conscience.” They likely would not comply with either.

There are serious questions about whether fully autonomous weapons would ever be capable of complying with core principles of international humanitarian law (IHL) during combat, or international human rights law (IHRL) during law enforcement operations, border control, or other circumstances. There is of course no way of predicting what technology might produce many years from now, but there are strong reasons to be skeptical about compliance with international law in the future, including the basic principles of distinction and proportionality.

Could robots replicate the innately human qualities of judgment and intuition necessary to comply with IHL, including judgment of an individual’s intentions, as well as subjective determinations? Compliance with the rule of proportionality, which prohibits attacks in which expected civilian harm outweighs anticipated military gain, would be especially difficult, as it relies heavily on situational and contextual factors that could change considerably with a slight alteration of the facts.

There are also serious concerns about the lack of accountability when fully autonomous weapons fail to comply with IHL in any particular engagement. Holding a human responsible for the actions of a robot that is acting autonomously could prove difficult, be it the operator, superior officer, programmer, or manufacturer.

Scientists and military leaders have also raised a host of technical and operational issues with these weapons that could pose grave dangers to civilians, and to soldiers, in the future.d A particular concern for many is how “robot vs. robot” warfare would unfold, and how devices controlled by complex algorithms would interact.e

Taken together, this multitude of concerns has led to the call for a preemptive prohibition on fully autonomous weapon systems: a new international treaty that would ban the development, production, and use of fully autonomous weapons, and require that there always be meaningful human control over targeting and kill decisions.

Rapid and Growing Support for a Ban
Over the past two years, the question of what to do about fully autonomous weapons has rocketed to the top ranks of concern in the field of disarmament and arms control, or what is now often called humanitarian disarmament. Within this short period of time, world leaders including the Secretary-General of the United Nations, the U.N. disarmament chief, and the head of the International Committee of the Red Cross have expressed deep concerns about the development of fully autonomous weapons and urged immediate action.

The Campaign to Stop Killer Robots, an international coalition of nongovernmental organizations (NGOs), was launched in April 2013 calling for a preemptive ban on fully autonomous weapon systems.f Coordinated by Human Rights Watch, it is modeled on the successful civil society campaigns that led to international bans on antipersonnel landmines, cluster munitions, and blinding lasers. A month after the campaign launched, the U.N. special rapporteur on extrajudicial killings presented the Human Rights Council with a report that echoed many of the campaign’s concerns and called on governments to adopt national moratoria on the weapons.

In October 2013, more than 270 prominent engineers, computing and artificial intelligence experts, roboticists, and professionals from related disciplines issued a statement calling for a ban. This was organized by the International Committee for Robot Arms Control (ICRAC), which was founded in 2009 by roboticists, ethicists, and others.g As noted earlier, in July 2015 more than 1,000 AI experts signed an open letter supporting a prohibition; the letter was organized by the Future of Life Institute.

During 2014, the European Parliament passed a resolution that calls for a ban, more than 20 Nobel Peace Laureates issued a joint statement in favor of a ban, and more than 70 prominent faith leaders from around the world released a statement calling for a ban. A Canadian robotics company, Clearpath, became the first commercial entity to support a ban and declare it will not work toward the development of fully autonomous weapons systems.

Governments have become seized with the issue of fully autonomous weapons since 2013, though few have articulated formal policy positions. Most importantly, the 120 States Parties to the Convention on Conventional Weapons (CCW) agreed in November 2013 to take up the issue, holding meetings in May 2014 and then again in April 2015. In the diplomatic world, the decision to take on killer robots was made at lightning speed.

It appears certain that nations will agree to continue their CCW deliberations next year, though questions remain about the nature, content, and duration of the work. A small but growing number of states have already called for a preemptive ban, while most participating states have expressed interest in discussing the concept of meaningful human control of weapons systems, indicating they see a need to draw a line before weapons become fully autonomous.

The April 2015 CCW experts meeting was by far the richest, most in-depth discussion held to date on autonomous weapons. Not a single state said it is actively pursuing them, yet the week featured extensive discussion about the potential benefits of such weapons. The U.S. and Israel were the only states to explicitly say they were keeping the door open to the acquisition of fully autonomous weapons, but there are plenty of indicators that many states are contemplating them. Without question, many advanced militaries are rapidly marching toward ever-greater autonomy in their weapons systems, and there are no stop signs in their path.

Why a Ban Is the Best Solution
Some oppose a preemptive and comprehensive prohibition, saying it is too early and we should “wait and see” where the technology takes us. Others believe restrictions would be more appropriate than a ban, limiting their use to specific situations and missions. Some say existing international humanitarian law will be sufficient to address the challenges posed.

The point of a preemptive treaty is to prevent future harm, and with all the dangers and concerns associated with fully autonomous weapons, it would be irresponsible to take a “wait and see” approach and only try to deal with the issue after the harm has already occurred. Once developed, they will be irreversible; it will not be possible to put the genie back in the bottle as the weapons spread rapidly around the world.

There is precedent for a preemptive treaty. The best example is the 1995 CCW protocol that bans blinding laser weapons. After initial opposition from the U.S. and others, states came to agree the weapons would pose unacceptable dangers to soldiers and civilians. The weapons were seen as counter to the dictates of public conscience, and nations came to recognize their militaries would be better off if no one had the weapons than if everyone had them. These same rationales apply to fully autonomous weapons.

While some rightly point out that there is no “proof” there cannot be a technological fix to the problems of fully autonomous weapons, it is equally true there is no proof there can be. Given the scientific uncertainty that exists, and given the potential benefits of a new legally binding instrument, the precautionary principle in international law is directly applicable. The principle suggests the international community need not wait for scientific certainty, but could and should take action now.

Fully autonomous weapons represent a new category of weapons that could change the way wars are fought and pose serious risks to civilians. As such, they demand new, specific law that clarifies and strengthens existing international humanitarian law.

A specific treaty banning a weapon is also the best way to stigmatize the weapon. Experience has shown that stigmatization has a powerful effect even on those who have not yet formally joined the treaty, inducing them to comply with the key provisions, lest they risk international condemnation. A regulatory approach restricting use to certain locations or to specific purposes would be prone to longer-term failure, as countries would likely be tempted to use them in other, possibly inappropriate, ways during the heat of battle or in dire circumstances. Once legitimized, the weapons would no doubt be mass produced and proliferate worldwide; only a preemptive international treaty will prevent that.

The call for a ban on development of fully autonomous weapons is not aimed at impeding broader research into military robotics or weapons autonomy or full autonomy in the civilian sphere. It is not intended to curtail basic AI research in any way. Research and development activities should be banned if they are directed at technology that can only be used for fully autonomous weapons or that is explicitly intended for use in such weapons.

Conclusion
While there are at this stage still many doubters, my experience leads me to conclude a preemptive ban is not only warranted, but is achievable and is the only possible approach that would successfully address the potential dangers of fully autonomous weapons. However, the involvement, advice, and expertise of the AI community are needed both to get to a ban and to ensure it is the most effective ban possible.

The AI community has an important role to play in bringing about the ban on fully autonomous weapons. This is not a political issue to be avoided in the name of pure science, but rather an issue of humanity for which we are all responsible.

c This point has been made repeatedly by the U.N. Special Rapporteur Christof Heyns. See, for example, U.N. Human Rights Council, Report of the Special Rapporteur on extrajudicial, summary or arbitrary executions, Christof Heyns, Lethal Autonomous Robotics, A/HRC/23/47, Apr. 9, 2013.
d See, for example, “Autonomy in Weapons Systems,” U.S. Department of Defense, Directive Number 3000.09, Nov. 21, 2012. This directive, which guides U.S. policy on autonomous weapons, cites a multitude of technical issues that would have to be overcome before fielding fully autonomous weapons.
e An October 2013 joint statement calling for a ban on fully autonomous weapons, signed by more than 270 computing experts, said, “Such interactions could create unstable and unpredictable behavior, behavior that could initiate or escalate conflicts, or cause unjustifiable harm to civilian populations.” See http://icrac.net/call/
f http://www.stopkillerrobots.org/
g http://icrac.net/call/

Stephen Goose ([email protected]) is the director of the Human Rights Watch Arms Division.

Copyright held by author.


DOI:10.1145/2835965

Counterpoint: Ronald Arkin

LET ME UNEQUIVOCALLY state: The status quo with respect to innocent civilian casualties is utterly and wholly unacceptable. I am not Pro Lethal Autonomous Weapon Systems (LAWS), nor for lethal weapons of any sort. I would hope that LAWS would never need to be used, as I am against killing in all its manifold forms. But if humanity persists in entering into warfare, which is an unfortunate underlying assumption, we must protect the innocent noncombatants in the battlespace far better than we currently do. Technology can, must, and should be used toward that end. Is it not our responsibility as scientists to look for effective ways to reduce man’s inhumanity to man through technology? Research in ethical military robotics could and should be applied toward achieving this goal.

I have studied ethology (animal behavior in their natural environment) as a basis for robotics for my entire career, spanning frogs, insects, dogs, birds, wolves, and human companions. Nowhere has it been more depressing than to study human behavior in the battlefield (for example, the Surgeon General’s Office 2006 report10 and Killing Civilians: Method, Madness, and Morality in War9). The commonplace occurrence of slaughtering civilians in conflict over millennia gives rise to my pessimism in reforming human behavior, yet provides optimism for robots being able to exceed human moral performance in similar circumstances. The regular commission of atrocities is well documented both historically and in the present day, reported almost on a daily basis. Due to this unfortunate low bar, my claim that robots may be able to eventually outperform humans with respect to adherence to international humanitarian law (IHL) in warfare (that is, be more humane) is credible. I have the utmost respect for our young men and women in the battlespace, but they are placed into situations where no human has ever been designed to function. This is exacerbated by the tempo at which modern warfare is conducted. Expecting widespread compliance with IHL given this pace and resultant stress seems unreasonable and perhaps unattainable by flesh and blood warfighters.

I believe judicious design and use of LAWS can lead to the potential saving of noncombatant life. If properly developed and deployed, it can and should be used toward achieving that end. It should not be simply about winning wars. We must locate this humanitarian technology at the point where war crimes, carelessness, and fatal human error lead to noncombatant deaths. It is not my belief that an unmanned system will ever be able to be perfectly ethical in the battlefield, but I am convinced such systems can ultimately perform more ethically than human soldiers.

I have stated that I am not averse to a ban should we be unable to achieve the goal of reducing noncombatant casualties, but for now we are better served by a moratorium, at least until we can agree upon definitions regarding what we are regulating, and it is indeed determined whether we can realize humanitarian benefits through the use of this technology. A preemptive ban ignores the moral imperative to use technology to reduce the persistent atrocities and mistakes that human warfighters make. It is at the very least premature. History indicates that technology can be used toward these goals.4 Regulate LAWS usage instead of prohibiting them entirely.6 Consider restrictions in well-defined circumstances rather than an outright ban and stigmatization of the weapon systems. Do not make decisions based on unfounded fears; remove pathos and hype and focus on the real technical, legal, ethical, and moral implications.

In the future, autonomous robots may be able to outperform humans from an ethical perspective under battlefield conditions for numerous reasons:
• Their ability to act conservatively, as they do not need to protect themselves in cases of low certainty of target identification.
• The eventual development and use of a broad range of robotic sensors better equipped for battlefield observations than humans currently possess.
• They can be designed without emotions that cloud their judgment or result in anger and frustration with ongoing battlefield events.
• Avoidance of the human psychological problem of “scenario fulfillment” is possible, a factor contributing to the downing of an Iranian airliner by the USS Vincennes in 1988.7
• They can integrate more information from more sources far faster than a human possibly could in real time before responding with lethal force.
• When working in a team of combined human soldiers and autonomous systems, they have the potential of independently and objectively monitoring ethical behavior in the battlefield by all parties and reporting infractions that might be observed.

LAWS should not be considered an end-all military solution; far from it. Their use must be limited to specific circumstances. Current thinking recommends:
• Specialized missions only where bounded morality,a,1 applies, for example, room clearing, countersniper operations, or perimeter protection in the DMZ.b
• High-intensity interstate warfare, not counterinsurgencies, to minimize likelihood of civilian encounter.
• Alongside soldiers, not as a replacement. A human presence in the battlefield should be maintained.

Smart autonomous weapon systems may enhance the survival of noncombatants. Consider Human Rights Watch’s position on the use of precision-guided munitions in urban settings: a moral imperative. LAWS in effect may be mobile precision-guided munitions, resulting in a similar moral imperative for their use. Consider not just the possibility of LAWS making a decision when to fire, but rather deciding when not to fire (for example, smarter context-sensitive cruise missiles). Design them with runtime human overrides to ensure meaningful human control,11 something everyone wants. Additionally, LAWS can use fundamentally different tactics, assuming far more risk on behalf of noncombatants than human warfighters are capable of, to assess hostility and hostile intent, while assuming a “First do no harm” rather than “Shoot first and ask questions later” stance.

To build such systems is not a short-term goal but will require a mid- to long-term research agenda addressing the many very challenging research questions. By exploiting bounded morality within a narrow mission context, however, I would contend that the goal of achieving better performance with respect to preserving noncombatant life is achievable and warrants a robust research agenda on humanitarian grounds. Other researchers have begun related work on at least four continents. Nonetheless, there remain many daunting research questions regarding lethality and autonomy yet to be resolved. Discussions regarding regulation of LAWS must be based on reason and not fear. Some contend that existing IHL may be sufficient to afford adequate protection to noncombatants from the potential misuse of LAWS.2 A moratorium is more appropriate at this time than a ban, until these questions are resolved; only then can careful, graded introduction of the technology into the battlespace be ensured. Proactive management of these issues is necessary. Other technological approaches are of course welcome, perhaps such as the creation of ethical advisory systems for human warfighters to assist in their decision-making when in conflict.

Restating my main point: The status quo is unacceptable with respect to noncombatant deaths. It may be possible to save noncombatant lives through the use of this technology, if done correctly, and these efforts should not be prematurely terminated by a preemptive ban.

Quoting from a recent Newsweek article3: “But autonomous weapon systems would not necessarily be like those crude weapons [poison gas, landmines, cluster bombs]; they could be far more discriminating and precise in their target selection and engagement than even human soldiers. A preemptive ban risks being a tragic moral failure rather than an ethical triumph.”

Similarly from the Wall Street Journal8: “Ultimately, a ban on lethal autonomous systems, in addition to being premature, may be feckless. Better to test the limits of this technology first to see what it can and cannot deliver. Who knows? Battlefield robots might yet be a great advance for international humanitarian law.”

I say to my fellow researchers, if your research is of any value, someone somewhere someday will put it to work in a military system. You cannot be absolved from your responsibility in the creation of this new class of technology simply by refusing a particular funding source. Bill Joy argued for the relinquishment of robotics research in his Wired article “Why the Future Doesn’t Need Us.”5 Perhaps it is time for some to walk away from AI if their conscience so dictates.

But I believe AI can be used to save innocent life, where humans may and do fail. Nowhere is this more evident than on the battlefield. Until that goal can be achieved, I support a moratorium on the development and deployment of this technology. If our research community, however, firmly believes the goal of achieving better performance than a human warfighter with respect to adherence to IHL is unattainable, and states collectively that we cannot ever reach this level of exceeding human morality in narrow battlefield situations where bounded morality applies and where humans are often at their worst, then I would be moved to believe our community asserts artificial intelligence in general is unattainable. This appears to contradict those who espouse their goal of doing just that.

We must reduce civilian casualties if we are foolish enough to continue to engage in war. I believe AI researchers have a responsibility to achieve such reductions in death and damage during the conduct of warfare. We cannot simply accept the current status quo with respect to noncombatant deaths. Do not turn your back on those innocents trapped in war. It is a truly hard problem and challenge, but the potential saving of human life demands such an effort by our community.

a Bounded morality refers to adhering to moral standards within the situations that a system has been designed for, in this case specific battlefield missions and not in a more general sense.
b For more specifics on these missions see Arkin, R.C., Governing Lethal Behavior in Autonomous Systems, Chapman-Hall, 2009.

References
1. Allen, C., Wallach, W., and Smit, I. Why machine ethics? IEEE Intelligent Systems (Jul./Aug. 2006), 12–17.
2. Anderson, K. and Waxman, K. Law and ethics for autonomous weapon systems: Why a ban won’t work and how the laws of war can. Stanford University, The Hoover Institution (Jean Perkins Task Force on National Security and Law Essay Series), 2013.
3. Bailey, R. Bring on the killer robots. Newsweek (Feb. 1, 2015); http://bit.ly/1K3VaYK
4. Horowitz, M. and Scharre, P. Do killer robots save lives? Politico Magazine (Nov. 19, 2014).
5. Joy, B. Why the future doesn’t need us. Wired 8, 4 (Apr. 2000).
6. Muller, V. and Simpson, T. Killer robots: Regulate, don’t ban. Blavatnik School of Government Policy Memo, Oxford University, Nov. 2014.
7. Sagan, S. Rules of engagement. In Avoiding War: Problems of Crisis Management. A. George, Ed., Westview Press, 1991.
8. Schechter, E. In defense of killer robots. Wall Street Journal (July 10, 2014).
9. Slim, H. Killing Civilians: Method, Madness, and Morality in War. Columbia University Press, New York, 2008.
10. Surgeon General’s Office, Mental Health Advisory Team (MHAT) IV Operation Iraqi Freedom 05-07, Final Report, Nov. 17, 2006.
11. U.N. The Weaponization of Increasingly Autonomous Technologies: Considering How Meaningful Human Control Might Move the Discussion Forward. UNIDIR Resources, Report No. 2, 2014.

Ronald Arkin ([email protected]) is a Regents’ Professor and is the director of the Mobile Robot Laboratory in the College of Computing at the Georgia Institute of Technology.

Copyright held by author.

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 47 practice

DOI:10.1145/2814340

Article development led by queue.acm.org

Balancing statistical accuracy and subject privacy in large social-science datasets.

How to De-Identify Your Data

BY OLIVIA ANGIULI, JOE BLITZSTEIN, AND JIM WALDO

BIG DATA IS all the rage; using large datasets promises to give us new insights into questions that have been difficult or impossible to answer in the past. This is especially true in fields such as medicine and the social sciences, where large amounts of data can be gathered and mined to find insightful relationships among variables. Data in such fields involves humans, however, and thus raises issues of privacy that are not faced by fields such as physics or astronomy.

Such privacy issues become more pronounced when researchers try to share their data with others. Data sharing is a core feature of big-data science, allowing others to verify research that has been done and to pursue other lines of inquiry the original researchers may not have attempted. But sharing data about human subjects triggers a number of regulatory regimes designed to protect the privacy of those subjects. Sharing medical data, for example, requires adherence to HIPAA (Health Insurance Portability and Accountability Act); sharing educational data triggers the requirements of FERPA (Family Educational Rights and Privacy Act). These laws require that, to share data generally, the data be de-identified or anonymized (note that, for the purposes of this article, these terms are interchangeable). While FERPA and HIPAA define the notion of de-identification slightly differently, the core idea is that if a dataset has certain values removed, the individuals whose data is in the set cannot be identified, and their privacy will be preserved.

Previous research has looked at how well these requirements protect the identities of those whose data is in a dataset.2 Violations of privacy, like re-identification, generally work by linking data from a de-identified dataset with outside data sources. It is often surprising how little information is needed to re-identify a subject.

More recent research has shown a different, and perhaps more troubling, aspect of de-identification. These studies have shown the conclusions one can draw from a de-identified dataset are significantly different from those that would be drawn when the original dataset is used.1 Indeed, it appears the process of de-identification makes it difficult or impossible to use a de-identified (and therefore easily sharable) version of a dataset either to verify conclusions drawn from the original dataset or to do new science that will be meaningful. This would seem to put big-data social science in the uncomfortable position of having either to reject notions of privacy or to accept that data cannot be easily shared, neither of which is a tenable position.

This article looks at a particular dataset, generated by the massive open online courses (MOOCs) offered through the edX platform by Harvard University and the Massachusetts Institute of Technology during the first year of those offerings. It examines which aspects of the de-identification process for that dataset caused it to change significantly, and it presents a different approach to de-identification that shows promise to allow both sharing and privacy.

Defining Anonymization
The first step in de-identifying a dataset is determining the anonymization requirements for that set. The notion of privacy that was used throughout the de-identification of this particular dataset was guided by FERPA, which requires that personally identifiable information be removed, such as name, address, Social Security number, and mother’s maiden name. FERPA also requires that other information, alone or in combination, not enable identification of any student with “reasonable certainty.”

To meet these privacy specifications, the HarvardX and MITx research team (guided by the general counsel for the two institutions) opted for a k-anonymization framework, which requires every individual in the dataset to have the same combination of identity-revealing traits as at least k-1 other individuals in the dataset. Identity-revealing traits, termed quasi-identifiers, are those that allow linking to other datasets; information that is meaningful within only a single dataset is not of concern.

Anonymizing a dataset with regard to quasi-identifiers is important in order to prevent the re-identification of individuals that would be made possible if these traits were linked with external data that shares the same traits. The example in Figure 1 illustrates how two datasets could be combined in such a way that allows re-identification.2

Figure 1. Combination of two datasets that allow re-identification. Medical data (ethnicity, visit date, diagnosis, procedure, medication, total charge) and a voter list (name, address, date registered, party affiliation, date last voted) share zip, birth date, and sex.

In the edX dataset, the quasi-identifiers were course ID, level of education, year of birth, gender, country, and number of forum posts. The number of forum posts is considered a quasi-identifier because the forum was a publicly accessible website that could be scraped in order to link user IDs with their number of forum posts. Course ID is considered a quasi-identifier because unique combinations of courses could conceivably enable linking personally identifiable information that a student posts in a forum with the edX dataset.

The required value of k within k-anonymization was set to 5 in this context, based on the U.S. Department of Education’s Privacy Technical Assistance Center’s claim that “statisticians consider a cell size of 3 to be the absolute minimum” and that values of 5 to 10 are even safer. A higher value of k corresponds to a stricter privacy standard, because more individuals are required to have a given combination of identity-revealing traits.3 Note this is not a claim that de-identifying the dataset to a privacy standard of k = 5 assures no one in the dataset can be re-identified. Rather, this privacy standard was chosen to allow legal sharing of the data.

Figure 2. Distortion of mean grade increasing with k. (Plot: mean grade versus k, for k = 1 to 8.)

Figure 3. Distortion of mean grade decreasing with bin size. (Plot: mean grade versus forum post bin size, 1 to 6.)
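The k-anonymity requirement can be checked mechanically: group records by their combination of quasi-identifier values and confirm that every group has at least k members. The following is a minimal Python sketch under stated assumptions, not the HarvardX/MITx tooling; the helper names (`qi_key`, `class_sizes`, `is_k_anonymous`) and the toy records, whose field names loosely mirror the edX quasi-identifiers, are hypothetical.

```python
from collections import Counter


def qi_key(record, quasi_identifiers):
    """Return the record's combination of quasi-identifier values."""
    return tuple(record[q] for q in quasi_identifiers)


def class_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(qi_key(r, quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each record shares its combination with at least k-1 others,
    i.e., every group of identical quasi-identifier values has size >= k."""
    return all(n >= k for n in class_sizes(records, quasi_identifiers).values())


# Hypothetical toy records (field names are illustrative only).
QI = ["course", "yob", "gender", "country", "posts"]
records = [
    {"course": "6.00x", "yob": 1990, "gender": "f", "country": "US", "posts": 0},
    {"course": "6.00x", "yob": 1990, "gender": "f", "country": "US", "posts": 0},
    {"course": "6.00x", "yob": 1990, "gender": "f", "country": "US", "posts": 0},
    {"course": "6.00x", "yob": 1985, "gender": "m", "country": "IN", "posts": 42},
]

print(is_k_anonymous(records, QI, k=3))      # False: the last record is unique
print(is_k_anonymous(records[:3], QI, k=3))  # True: one group of size 3
```

Because the check depends only on group sizes, raising k can only turn a passing dataset into a failing one, which is why a higher k is the stricter standard.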

What Methods Allow Anonymization?
There are two techniques to achieve a k-anonymous dataset: generalization and suppression. Generalization occurs when granular values are combined to create a broader category that will contain more records. This can be achieved both for numerical variables (for example, combining ages 20, 21, and 22 into a broader category of 20–22) and for categorical variables (for example, generalizing location data from “Boston” to “Massachusetts”). Suppression occurs when a record that violates anonymity standards is deleted from the dataset entirely.

Generalization and suppression techniques introduce differing kinds and degrees of distortion during the anonymization process. Relying on suppression can mean a large number of records in the dataset will be removed. Suppression-only de-identification also skews the integrity of a dataset when values are eliminated disproportionately to the original distribution of the data, causing distortion in resulting analyses.

On the other hand, generalized values are often less powerful than granular values; it may be difficult, for example, to fit a linear regression line on generalized numeric attributes. Further, while generalization-only de-identification leaves non-quasi-identifier fields intact, quasi-identifiers may become generalized to a point where few conclusions can be drawn about their relationship with other fields. Finally, since generalization is applied to whole columns, it decreases the quality of the entire dataset, whereas suppression decreases the quality of the dataset on a record-by-record basis.

The anonymization process used to de-identify edX data for public release in 2014 employed a “suppression-emphasis” approach toward k-anonymization. In this approach, the names of the countries were first generalized to region or continent names, then date-time stamps were transformed into date stamps, and finally any existing records that were not k-anonymous after these generalizations were suppressed. In the process, records that claimed a birth date before 1931 (which seemed unlikely to be correct) were automatically suppressed.

Daries et al.’s 2014 study of edX data confirmed a suppression-emphasis approach tended to distort mean values of de-identified columns, whereas a generalization-emphasis approach tended to distort correlations between de-identified columns.1

Understanding the Mechanisms of Distortion
Daries et al. showed de-identification distorted measures of class participation by suppressing records of rare (generally higher) levels of participation. We set out to investigate where distortion of summary statistics was being introduced into the dataset.

Intuitively, distortion is introduced whenever a row becomes generalized or suppressed. Under k-anonymity, this occurs only when a row’s combination of quasi-identifier values occurs fewer than k times. If rare quasi-identifier values tend to be associated with high grades or participation levels, then the de-identified dataset would be expected to have a lower mean grade or participation level than the original dataset.

We did, in fact, find that a quasi-identifier whose frequency of occurrence is correlated with a numeric attribute is the most likely to create distortion in that numeric attribute. Specifically, we confirmed this hypothesis in three ways, using the edX data:
• As privacy requirements increase (that is, k is increased), distortion increases in such numeric attributes as mean grade, shown in Figure 2. The fact that more distortion is introduced as more rows are suppressed is consistent with the hypothesis that the association of rare quasi-identifier values with high grades will cause more distortion of the dataset as the privacy standard is increased.
• The deletion of quasi-identifier columns whose values’ frequency of occurrence is highly correlated with numeric attributes results in a decreased amount of distortion in numeric attributes. This supports the hypothesis that the presence of a correlation between the frequency of quasi-identifier values and numeric attributes introduces distortion of the dataset by de-identification.
• As the correlation between the frequency of occurrence of quasi-identifier values and other numeric attributes is manually increased, more distortion is introduced into those attributes. This, too, supports the hypothesis that the magnitude of a correlation between the frequency of quasi-identifier values and numeric attributes increases distortion of those attributes.


attributes by de-identification.

What methods may alleviate distortion introduced by de-identification? The analyses here indicate that associations between quasi-identifier traits and numeric attributes may introduce distortion of means through suppression during de-identification. We therefore consider a prospective role for generalization in alleviating distortion during de-identification.

Since the number of forum posts is the quasi-identifier whose frequency of values is most correlated with grade, we first explore the effect of generalizing this attribute. As the bin size increases (for example, from 0, 1, 2, 3 to values of 0–1, 2–3, and so on), the number of rows requiring suppression decreases, as shown in Figure 3. Further, the mean grade approaches the true value (of 0.045) as bin size increases, suggesting generalization may alleviate distortion by preventing records associated with rarer quasi-identifier values from being suppressed.

Figure 4. Number of rows suppressed vs. bin capacity. (Line chart: number of rows suppressed, in thousands, for 3-anonymous through 7-anonymous datasets, with one line per forum-post bin capacity from 1k to 25k.)

Generalization, however, can make it difficult to draw statistical conclusions from a dataset. Certain statistical properties of a column, like its mean, can be maintained after generalization by recording the mean of the pregeneralized values within each bin: the average of these bin means, weighted by the number of records in each bin, equals the true mean of the pregeneralized values. Such a solution, however, cannot easily preserve two-dimensional relationships among generalized values. Table 1 illustrates that the correlation of the number of forum posts with various numeric attributes becomes increasingly distorted with increasing forum-post bin size.

This is the fundamental trade-off between generalization and suppression discussed earlier: although an approach emphasizing suppression may introduce bias into an attribute where a correlation exists between quasi-identifier frequency and numeric attributes, generalization may also distort correlational and other multidimensional relationships inherent within datasets.

Figure 5. Original and de-identified data, 5-anonymous, 3k bins. (Bar chart: percent of students who registered, viewed, explored, and were certified, original vs. de-identified, for each of 16 MITx and HarvardX course offerings.)

Decreasing distortion introduced by generalization. One potential improvement to generalization may be to distribute the number of records more evenly within each bin, using small bucket sizes for values that are well represented and larger bucket sizes for less-well-represented values. When the number of forum posts is generalized into groups of five for values greater than 10 (for example, 1, 2, 3, …, 11–15, 16–20, and so on), the correlations between the number of forum posts and other characteristics become less distorted than with generalization schemes that use constant bin widths. This suggests optimizing for equal numbers of records within each bin may enable a compromise


between the loss of utility and the distortions caused in numeric analysis, such as correlations between different variables. Using this framework for generalization, let’s now explore its relationship to suppression in more detail.

A Trade-Off Between Generalization and Suppression
To reach a compromise between the distortions introduced by suppression and by generalization, we first want to quantify the relationship between suppression and generalization. As generalization is increased, how much suppression is prevented, and does this change at a constant rate as generalization is increased?

Each of the quasi-identifiers was individually binned to ensure a minimum number of records in each bin, termed bin capacity. An increase in bin capacity from 1,000 to 5,000 drastically decreases the number of records that have to be suppressed, but this improvement drops off as bin capacity continues to increase. Furthermore, in Figure 4, the decreasing slope of the lines as the bin size increases suggests that the larger the chosen bin capacities, the smaller the marginal cost of a greater degree of anonymity.

Figure 6. Original and de-identified data, 5-anonymous, 5k bins. (Bar chart: percent of students who registered, viewed, explored, and were certified, original vs. de-identified, for each of 16 MITx and HarvardX course offerings.)

We then quantify the distortion that was introduced under each choice of bin capacity. Concentrating on sets that were 5-anonymous with bin capacities of 3k, 5k, and 10k, we compare the resulting de-identified datasets with the original set on the percentage of students who simply registered for the course; those who registered and viewed (defined as looking at less than half of the material); those who explored (defined as looking at more than half of the material but not completing the course); and those who were certified (completed the material). This comparison shows the greatest disparity in the de-identification scheme that favors suppression; the results are skewed by as much as 20% with the suppression-emphasis de-identification approach.

Table 1. Increasing distortion of correlation with increasing bin size.
Correlations of forum posts with numeric attributes, by bin size

                Original  1       2       3       4       5       6
Grade           0.159     0.105   0.0980  0.0919  0.0833  0.0732  0.0533
Viewed          0.0444    0.0683  0.0582  0.0462  0.0372  0.0294  0.0228
Explored        0.127     0.0744  0.0710  0.0661  0.0620  0.0554  0.0418
Certified       0.152     0.0868  0.0810  0.0758  0.0699  0.0598  0.0482
# Active Days   0.236     0.117   0.111   0.106   0.0940  0.0855  0.0649
# Chapters      0.154     0.143   0.127   0.115   0.100   0.0858  0.0715
# Events        0.283     0.103   0.103   0.0964  0.0986  0.0913  0.0597
# Video Plays   0.0929    0.0943  0.105   0.103   0.125   0.110   0.0683

A generalization scheme using bin capacities of 3,000 entries, as shown in Figure 5, produces a distribution of participation that is somewhat closer to the original distribution than the suppression-only approach. While in some categories the distortion is large (such as the certification rates for MITx/7.00x during the Spring semester), others are much closer to the original values.

The situation gets considerably better by using bins with a minimum of 5,000 entries, as shown in Figure 6. The distribution of participation is nearly the same in the de-identified set as in the original dataset. The maximum difference between the measures is less than three percentage points; most are within one percentage point.
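The suppression mechanism and the capacity-based binning described above can be illustrated with a minimal sketch on a toy table. It assumes hypothetical rows in which a few rare forum-post counts go with high grades, and it uses a simple greedy pass over sorted values as one plausible reading of binning “to ensure a minimum number of records in each bin.” The helpers `capacity_bins`, `generalize`, `k_anonymize`, and `mean_grade` are illustrative names, not the tooling used on the edX data.

```python
from collections import Counter
from statistics import fmean

def capacity_bins(values, capacity):
    """Greedily group sorted distinct values into contiguous bins holding at
    least `capacity` records each; a short final bin is merged into the one
    before it. Returns a mapping value -> (low, high) bin label."""
    counts = Counter(values)
    bins, current, size = [], [], 0
    for v in sorted(counts):
        current.append(v)
        size += counts[v]
        if size >= capacity:
            bins.append(current)
            current, size = [], 0
    if current:  # leftover values that never reached capacity
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return {v: (b[0], b[-1]) for b in bins for v in b}

def generalize(rows, col, mapping):
    """Replace each row's raw value in `col` with its bin label."""
    return [{**r, col: mapping[r[col]]} for r in rows]

def k_anonymize(rows, qi_cols, k):
    """Suppress every row whose quasi-identifier combination occurs fewer than k times."""
    def key(r):
        return tuple(r[c] for c in qi_cols)
    freq = Counter(key(r) for r in rows)
    return [r for r in rows if freq[key(r)] >= k]

def mean_grade(rows):
    return round(fmean(r["grade"] for r in rows), 3)

# Hypothetical toy table: common post counts go with low grades, rare ones with high grades.
rows = ([{"posts": 0, "grade": 0.0}] * 5 + [{"posts": 1, "grade": 0.2}] * 5
        + [{"posts": 7, "grade": 0.9}, {"posts": 9, "grade": 1.0}, {"posts": 12, "grade": 1.0}])

suppressed_only = k_anonymize(rows, ["posts"], k=3)
mapping = capacity_bins([r["posts"] for r in rows], capacity=3)
binned = k_anonymize(generalize(rows, "posts", mapping), ["posts"], k=3)

print(mean_grade(rows), len(suppressed_only), mean_grade(suppressed_only),
      len(binned), mean_grade(binned))
# 0.3 10 0.1 13 0.3
```

In this toy case, suppression alone drops the three rare, high-grade rows and pulls the mean grade from 0.3 down to 0.1, while merging the rare post counts into a single bin of capacity 3 keeps every row and the true mean. The price is that the merged bin no longer distinguishes 7 posts from 12, which is exactly the kind of information loss that distorts correlations.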


Moving to a bin capacity of 10,000 gives even better results, as shown in Figure 7. While there are one or two cases of results differing by almost three percentage points, in most cases the difference is a fraction of a percentage point.

Figure 7. Original and de-identified data, 5-anonymous, 10k bins. (Bar chart: percent of students who registered, viewed, explored, and were certified, original vs. de-identified, for each of 16 MITx and HarvardX course offerings.)

As expected, the decrease in the distortion of the mean of certain attributes is accompanied by an increase in the distortion of the correlation between quasi-identifier fields and numeric attributes as bin capacity increases. The table in Figure 8 shows the correlations between the number of forum posts and numeric attributes under various bin capacities. The column corresponding to a bin capacity of 1 represents a suppression-only approach.

Encouragingly, we observe that a bin capacity of 3,000 produces a dataset whose correlations are close to those of the original, non-de-identified dataset, as shown in Figure 8. Even though a bin capacity of 3,000 did not produce optimal results in terms of minimizing class-participation distortion, these results may signal the existence of a bin capacity that produces an acceptable balance of distortion between single- and multidimensional relationships.

Figure 8. Correlation between number of forum posts with various attributes. (Line chart: correlation, 0 to 0.5, against bin capacity from original through 15,000, with one line each for grade, viewed, explored, certified, # active days, # chapters, # events, and # video plays.)

                Bin capacity
                Original  1      100    1000   2500   3000   5000   10000
Grade           0.145     0.155  0.037  0.071  0.113  0.161  0.369  0.386
Viewed          0.039     0.063  0.013  0.019  0.028  0.038  0.122  0.129
Explored        0.127     0.128  0.037  0.060  0.095  0.133  0.314  0.332
Certified       0.138     0.141  0.036  0.067  0.108  0.153  0.333  0.349
# Active Days   0.207     0.207  0.068  0.123  0.177  0.227  0.434  0.449
# Chapters      0.145     0.155  0.039  0.067  0.128  0.174  0.367  0.383
# Events        0.294     0.294  0.073  0.135  0.167  0.222  0.407  0.420
# Video Plays   0.091     0.091  0.016  0.043  0.058  0.084  0.240  0.256

Further Opportunities for Optimization
Given these results, the question naturally arises whether bin capacities may be chosen differently for each quasi-identifier in order to minimize distortion further.

The edX dataset contains two numeric, generalizable quasi-identifiers: year of birth and number of forum posts. Experimentation with different bin capacity combinations yielded the results shown in Table 2, which gives the number of records that must be suppressed for each combination of generalization. It is particularly noteworthy that generalization of each quasi-identifier has uneven effects: the required number of suppressed records drops off much more quickly as the bin capacity for number of forum posts increases than as the bin capacity for year of birth increases.

Table 2. Number of rows suppressed: number of forum posts bin size vs. year of birth bin size.

Forum posts      Year of birth: bin capacity
bin capacity     100    200    300    400    500
100              1119   1074   1057   1017   1009
200              577    530    508    468    456
300              400    356    337    291    280
400              355    303    281    238    221
500              283    236    210    169    167
600              261    208    187    146    131
700              220    161    140    107    100
800              224    164    148    112    101
900              208    138    129    88     76
1000             186    122    111    72     61
1100             183    120    109    70     61
1200             164    94     78     53     46
1300             145    81     70     48     41
1400             145    81     70     48     41
1500             136    78     63     42     35
1600             134    76     61     45     35

Such an analysis of the trade-offs between generalization and suppression becomes exponentially harder as the number of quasi-identifier values increases. A brute-force method of calculating the number of suppressed records would demand excessive computation time with datasets like edX’s that contain six quasi-identifier fields. The development of approximation algorithms for these calculations would enable researchers to quickly determine a near-optimal generalization scheme that strikes an ideal balance between distortions introduced by generalization versus suppression. This is an area where further research is needed.

Conclusion
De-identification techniques will continue to be important as long as the regulations around big datasets involving human subjects require a level of anonymity before those sets can be shared. While there is some indication regulators may be rethinking the tie between de-identification and ensuring privacy, there is no indication the regulations will be changed anytime soon. For now, sharing will require de-identification.

But de-identification is hard. We have known for some time it is difficult to ensure the dataset does not allow subsequent re-identification of individuals, but we now find it is also difficult to de-identify datasets without introducing bias into those sets that can lead to spurious results.

A combination of record suppression and data generalization offers a promising path to solving the second of these problems, but there seems to be no magic bullet here; our best results were obtained by trying a number of different combinations of generalization, sizing, and record suppression. There is further work to be done, such as investigating the possibility of choosing different bin capacities for different quasi-identifiers, which may mitigate some of the distortions introduced by anonymity. We are more confident than we were a year ago that some form of de-identification may allow sharing of datasets without distorting the analyses done on those shared sets beyond the point of usefulness, but there is much left to investigate.

Related articles on queue.acm.org
Privacy, Anonymity, and Big Data in the Social Sciences
Jon P. Daries et al.
http://queue.acm.org/detail.cfm?id=2661641
Broadcast Messaging: Messaging to the Masses
Frank Jania
http://queue.acm.org/detail.cfm?id=966719
Modeling People and Places with Internet Photo Collections
David Crandall, Noah Snavely
http://queue.acm.org/detail.cfm?id=2212756

References
1. Daries, J.P. et al. Privacy, anonymity, and big data in the social sciences. Commun. ACM 57, 9 (Sept. 2014), 56–63.
2. Sweeney, L. k-anonymity: A model for protecting privacy. Intern. J. Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557–570.
3. Young, E. Educational privacy in the online classroom: FERPA, MOOCs, and the big data conundrum. Harvard Journal of Law & Technology 28, 2 (2015), 549–592.

Olivia Angiuli ([email protected]) is a data scientist at Quora. She is ultimately interested in harnessing big data for social good.

Joe Blitzstein ([email protected]) is a professor of the practice of statistics at Harvard University, whose research is a mixture of statistics, probability, and combinatorics. He is especially interested in graphical models, complex networks, and Monte Carlo algorithms.

Jim Waldo ([email protected]) is a Gordon McKay Professor of the Practice in Computer Science, a member of the faculty of the Kennedy School, and the Chief Technology Officer at Harvard University. His research centers around distributed systems and topics in technology and policy, especially around privacy and cyber security.

Copyright held by authors. Publication rights licensed to ACM. $15.00


DOI:10.1145/2814346

Article development led by queue.acm.org

Catering to developers’ strengths while still meeting team objectives.

Lean Software Development—Building and Shipping Two Versions

BY KATE MATSUDAIRA

ONCE UPON A time (and isn’t that how all good stories start?) I was managing a software team and we were working on several initiatives. Projects were assigned based on who was available, their skillsets, and their development goals. This resulted in two developers, let’s call them Mary and Melissa, being assigned to the same project.

Mary and Melissa had been working together for a few weeks when I started hearing complaints in my one-on-ones with each of them about the other. Mary was complaining that Melissa was taking too long to do her part, and spending time on unit tests that did not make sense because things were in flux with the project. Meanwhile, Melissa was complaining that Mary wrote sloppy code and did not write enough tests (she even showed me comments in the code like “hack:….” and “to do: someone should add more error handling here”). Each of them had valid points and feedback for the other.

I spent the next few weeks coaching each of them on how to improve and address the other’s concerns. With Mary I was focused on pushing her to write more tests and not just throw things together, and with Melissa I was focused on having her prototype something quickly and then add the polish after it was working. Both of them tried really hard, and both were miserable. It just was not what they were meant to do. These two women were just not meant to collaborate.

This got me thinking: how could we change the process so they both could write code the way they liked to create (and that catered to their strengths) and we could still meet our team objectives?

From this was born the v1/v2 development process.

You see, certain developers love building the first versions or prototyping; they are the ones who love hacking things together to get something working quickly. They love (and are best at) building version 1 of a product. The other type of developers love building the second version. They see their code as a craft and write unit tests for everything. “Test coverage” and “beautiful code” are phrases they use a lot.

And of course this definition is not black and white: there are people who fall on different sides of the line at different times, so it is more like a spectrum.

Typically, the v1 person does not like working with the v2 person, and vice versa, not for personal reasons, but because of the ways they think and create software. At the heart of it, the fundamental thing they enjoy about what they do is different.

By changing the way we thought about software, we were able to address this concern and actually make the team more agile and solve some business problems along the way.

IMAGERY BY 3D-SPARROW

The V1/V2 Process
When it comes to software development there are many ways to build and create products.

In my experience it is difficult to build the right product the first time. By “right” I do not just mean something customers will want and use, and that hopefully will generate revenue, but also the right technical solution. How customers will use products can be difficult to predict. For example, before you ship the first version of a product it can be difficult to answer questions like:
• How quickly will the data grow?
• How fast will writes need to be?
• Will it be feasible to write directly to the data store or do you need a queue to manage writes?
• What will the throughput be, and is it enough?
• Will the system scale with usage?

Even if you could reasonably answer these questions, or if you built a system that took all these things into account such that you could “just add hardware,” the effort would probably be larger than if you had simply pushed to ship a first version quickly. And it is possible (if not always probable) such a system would be over-engineered (since it was designed to solve problems or scale in a way that may not be useful), or even have other unforeseen issues. While there are certainly pros to doing it right the first time, in practice it is very difficult to get it right, and building the end-all system will take longer, delaying the answers to all of the questions indicating you are moving in the right direction.

There is a popular movement called Lean Startup that advocates fast iteration and data collection to home in on the right product and requirements as quickly as possible. It is all about figuring out what customers will use and then building those things. These ideas can also be applied to the technical ways we build products.

Build V1 Fast, Then Build V2 Right
Similar to creating a minimum viable product (MVP), build the minimum viable technical implementation that meets the business goals (meaning the product can meet current usage requirements and performance).

Most engineering teams have been taught to think about software design and architecture from the beginning and to build something that can scale. However, if you do not have any customers yet, scaling really is not your challenge. So in some ways, designing for scale when you do not have to scale yet is solving the wrong problem.

Of course I am not advocating doing something stupid, or writing bad code, but do not over-engineer or solve problems before you have them. Here are some examples:
• Forgoing the deployment of a multi-node Cassandra cluster and storing the initial data in a single database (with backup) that is fast and easy to make arbitrary queries against;
• Faster UI development using standard forms and charts versus interesting shapes or fancy custom graphs; and
• Cutting corners on unit tests (because the units are in flux) or comments and documentation.

All these shortcuts will help you get something out faster. They may not be best practices but they do ensure you are building the right thing.

In many ways, this process is about looking at the problems differently and, instead of solving the “big” problem, focusing on speed and efficiency: getting the highest ROI (return on investment) for your development resources.

So you modify your development process to ship the first version quickly, and then once v1 is out in the wild, plan to start on v2 right afterward.

Then your v2 can address all the problems with the first version of the product. You can improve usability, cut or add features based on customer usage and feedback, pick the right technology and libraries to power the product, and spend time writing acceptance tests and unit tests now that the units are unlikely to change. From a business perspective, you get the product out there faster, so it is easy to see if this product is going to drive revenue and success.

Why This Process Makes Sense
It reduces the risk of over-engineering. As noted earlier, getting something out quicker and understanding how it is used (or how successful it is in the market) will help confirm you are building the right product and solving the right technical problems for your business. This allows you to mitigate the risk of building or scaling out systems that may not address the production issues.

If you build the wrong thing you can correct it. Since the product is truly the minimum viable product, you reduce the risk of wasting resources creating the wrong product. Furthermore, you will get feedback earlier in the release cycle and can pivot or make changes faster (since it is always more difficult to change a system with a plethora of moving parts).

Chances are you will get to market faster. Building something quickly will help you launch sooner, which will help you test and determine the viability of your product faster than waiting for the bigger launch. This means you will not spend time building software your customers will not use or you cannot sell.

Staffing and personnel happiness. This is one of the biggest upsides to this method. Software engineers get to do what they love, in a way that caters to their strengths. Moreover, they are not forced to work with people who do not share their approach, and they do not have to work in a code base they cannot sculpt into their desired piece of art.

So, What Are the Downsides?
Total time to build is more than if you just got it right the first time. In this model it is likely v2 development will be finished later and the total development time will be greater than if you built the product once. If you build the v1 using one technology (say, a MySQL database that will not scale the way you need it to), and then the v2 needs a different one (say, a proper key-value store for faster queries), the team will spend time shipping and building expertise on one database, only to have to learn and manage another.

Operations can be a headache. Most seasoned engineers will tell you that building something really fast can result in operations nightmares if the usage or data grows in a way that was not considered. No team likes being operational and fighting fires. Of course, if this is a problem it probably means your product has achieved some adoption and success, so at least you are solving something that produces real value. However, even for the v1 it is still worthwhile to identify risks and potential bottlenecks, so in the event there are problems the team will have some ideas for solutions.

So You Want to Try It at Home … Err … Work?
Like any software process or methodology it is important to assess if this makes sense for your business, product, and company; it certainly is not one-size-fits-all. However, it definitely has its merits, so feel free to steal the pieces you like.

If you do embark on this journey, here are some pointers from my experience:

Get management buy-in. This is really critical, because a key to this approach being successful is always building a v1 and a v2. If you do not have buy-in and your boss is content to keep and operate the v1, your team is going to hate you. Seriously. No one wants to get stuck supporting crufty v1 software indefinitely. So make sure everyone knows and understands the plan ahead of time.

Measure everything. In order to build the right v2 you need to know what to build. Where are your bottlenecks? How is the data growing? Without this data it is very difficult to realize many of the advantages of this model. Make sure you think about what you need to measure, and how you will measure it, before the product ships.

The v2 will take much longer than the v1. I have personally seen this process evolve across four projects (no, it is not a large sample set), and in each case the v2 took more than twice as long as the v1 to create and launch. Of course, every v2 I have launched had a lot of extra features, and they did not always leverage a lot of the v1 code. Some people assume the v2 will be faster, since “you have already built it once before,” but I have not found this to be the case.

Have a tight feedback loop with your customers. Make sure it is easy to get insight into what people use, what they like, and what they do not like; that way you can build a great v2.

Make sure v1s and v2s are celebrated equally. Sometimes teams or companies celebrate the first launch of a feature in a much grander way than the next version. But in this case the next version is what you need to grow your business. It is just as important as v1, if not more so. Make sure you value each of these equally, or people will gravitate to the project with the most fanfare, not necessarily the one suited to their strengths and talents.

Be open to people being good at both v1s and v2s. Sometimes I think I have someone pegged and then they prove me wrong. Be open to the fact there are not two types of people, and some people are just good at everything. (I just wish I were one of those people!)

I hope some of these ideas will prove useful for you and your team. If anything, let this anecdote inspire you to question the way you are doing things, and look critically at innovative ways to improve how you build software.

Related articles on queue.acm.org
A Conversation with Tim Bray
http://queue.acm.org/detail.cfm?id=1046941
Big Games, Small Screens
Mark Callow, Paul Beardow, and David Brittain
http://queue.acm.org/detail.cfm?id=1331296
Major-league SEMAT - Why Should an Executive Care?
Ivar Jacobson, Pan-Wei Ng, Ian Spence, and Paul E. McMahon
http://queue.acm.org/detail.cfm?id=2590809

Kate Matsudaira (katemats.com) is the founder of her own company Popforms. Previously, she worked in engineering leadership roles at companies such as Decide (acquired by eBay), Moz, and Amazon.

© 2015 ACM 00010782/15/11 $15.00

DOI:10.1145/2814328

Article development led by queue.acm.org

Optimizing NUMA systems applications with Carrefour.

Challenges of Memory Management on Modern NUMA Systems

BY FABIEN GAUD, BAPTISTE LEPERS, JUSTIN FUNSTON, MOHAMMAD DASHTI, ALEXANDRA FEDOROVA, VIVIEN QUÉMA, RENAUD LACHAIZE, AND MARK ROTH

MODERN SERVER-CLASS SYSTEMS are typically built as several multicore chips put together in a single system. Each chip has a local DRAM (dynamic random-access memory) module; together they are referred to as a node. Nodes are connected via a high-speed interconnect, and the system is fully coherent.

This means that, transparently to the uniform memory access (or NUMA). programmer, a core can issue requests Systems with NUMA characteristics to its node’s local memory as well as to were built as early as the 1980s, and the memories of other nodes. The key along with the hardware operating sys- distinction is that remote requests will tem, support for NUMA has evolved. take longer, because they are subject Modern NUMA systems are quite dif- to longer wire delays and may have to ferent from the old ones, so we must jump several hops as they traverse the revisit our assumptions about them interconnect. The latency of memory- and rethink how to build NUMA-aware access times is hence non-uniform, be- operating systems. This article evalu- cause it depends on where the request ates performance characteristics of a originates and where it is destined to representative modern NUMA system, go. Such systems are referred to as non- describes NUMA-specific features

MONTH 2009 | VOL. 00 | NO. 00 | COMMUNICATIONS OF THE ACM 59 practice

in Linux, and presents a memory- core accesses memory from within the 1 is a diagram of a typical NUMA sys- management algorithm that delivers same node, it is called a local access. tem with four nodes and four cores per substantially reduced memory-access Similarly, an access to a different node node. At the time of this writing, NUMA times and better performance. is called a remote access. Remote ac- systems are built with up to eight nodes cesses have longer latencies than local and 10 cores per node. A Modern NUMA System ones, because they must traverse one Current x86 NUMA systems are NUMA systems consist of several nodes, or more interconnect links, communica- cache coherent (called ccNUMA), which each containing a subset of the system’s tion pathways between nodes that also means programs can transparently CPU cores and a portion of its RAM. If a service cache-coherency traffic. Figure access memory on local and remote nodes without changes to the code or Figure 1. A modern NUMA system. special operating system support. This allows easy migration to NUMA sys- tems, but it does not address important C1 C2 C5 C6 L3 Cache L3 Cache performance considerations. A naïve C3 C4 C7 C8 implementation—for example, a pro- Node 1 Node 2 gram that allocates all of its memory on Memory Node 1 Memory Node 2 a single node—can easily cause exces- sive remote accesses or the overloading of a memory controller. New vs. old NUMA systems. Avoiding Memory Node 3 Memory Node 4 performance pitfalls on NUMA systems requires considering how the nodes are C9 C10 C13 C14 connected, where the program’s mem- Node 3 L3 Cache L3 Cache Node 4 C11 C12 C15 C16 ory is placed, and how it accesses that memory. Previous NUMA-aware operat- ing systems focused on locality, attempt- ing to minimize the number of remote Figure 2. Performance differences. accesses at all costs in order to avoid the performance penalty. 
Modern NUMA systems, however, have a strikingly different latency profile compared with the older ones. A remote access takes approximately 30% longer than a local one,2,7 while on older hardware, it could take up to seven times longer.3 The remote-access penalty is substantially reduced on modern NUMA systems.
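A back-of-the-envelope calculation shows why the smaller penalty changes the picture. The sketch below assumes a congestion-free model in which average latency is simply a weighted mix of local and remote latency. The 300-cycle local latency is an assumption based on the "normal" latency mentioned in the next paragraph, and the function name is hypothetical.

```python
def average_latency(local_cycles: float, local_ratio: float, remote_penalty: float) -> float:
    """Mean memory latency in a congestion-free model.

    local_ratio    -- fraction of accesses served by the local node (0.0 to 1.0)
    remote_penalty -- remote/local latency ratio (1.3 means remote is 30% slower)
    """
    remote_cycles = local_cycles * remote_penalty
    return local_ratio * local_cycles + (1.0 - local_ratio) * remote_cycles


LOCAL_CYCLES = 300.0  # assumed uncontended local latency, in cycles

for label, penalty in [("modern, 1.3x", 1.3), ("older, 7x", 7.0)]:
    all_remote = average_latency(LOCAL_CYCLES, 0.0, penalty)
    quarter_local = average_latency(LOCAL_CYCLES, 0.25, penalty)
    print(f"{label}: all remote {all_remote:.1f} cycles, 25% local {quarter_local:.1f} cycles")
```

Under these assumptions, an all-remote placement raises average latency from 300 to 390 cycles with the modern penalty, but to 2,100 cycles with the older one. With 25% local accesses, the figures are 367.5 and 1,650 cycles. This suggests why avoiding remote accesses dominated the design of earlier NUMA-aware systems.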

On the other hand, current CPUs can generate an immense load on the memory subsystem, causing congestion on memory controllers and interconnect links (if requests are remote). If multiple cores are heavily accessing a single node, memory latencies can be as long as 1,200 cycles (!) due to congestion, while normal latencies are only around 300 cycles. Avoiding memory controller and interconnect congestion, therefore, becomes the key concern on modern NUMA systems. Here, we examine the effects of congestion on performance.
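Before the measurements, a toy model shows how page placement alone can pile traffic onto one memory controller. This is our own simplification, not the article's methodology. It uses the two Linux policies discussed later in the article. It assumes that a single thread on node 0 initializes every page of a shared array, so first-touch places all pages there, and that all threads then access every page equally. Imbalance is computed as the standard deviation of per-controller load as a percentage of the mean, matching the metric defined later in the article. Using the population standard deviation is our assumption, and all function names are hypothetical.

```python
import statistics


def first_touch(num_pages: int, toucher_node: int) -> list:
    """Each page lands on the node of the thread that touches it first.

    Assumption: one initialization thread on `toucher_node` touches every page.
    """
    return [toucher_node] * num_pages


def interleave(num_pages: int, nodes: int) -> list:
    """Pages are spread across the nodes round-robin."""
    return [page % nodes for page in range(num_pages)]


def controller_load(placement: list, nodes: int) -> list:
    """Requests served per node, assuming every page receives the same number of accesses."""
    load = [0] * nodes
    for node in placement:
        load[node] += 1
    return load


def imbalance_pct(load: list) -> float:
    """Standard deviation of per-controller load as a percentage of the mean."""
    return 100.0 * statistics.pstdev(load) / statistics.mean(load)


NODES, PAGES = 4, 4096
for name, placement in [("first-touch", first_touch(PAGES, 0)),
                        ("interleave", interleave(PAGES, NODES))]:
    load = controller_load(placement, NODES)
    shares = [f"{count / PAGES:.0%}" for count in load]
    print(f"{name:12s} share per node {shares}, imbalance {imbalance_pct(load):.0f}%")
```

The all-on-one-node case yields an imbalance of about 173%, similar in magnitude to the 170% the article later measures for Streamcluster under first-touch. Round-robin interleaving brings the idealized imbalance to 0%. In this model, both placements leave 25% of accesses local, so the difference between them is entirely one of congestion.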

NUMA Performance: Locality and Congestion
Benchmarks can provide a complete picture of the performance characteristics of NUMA systems. The Numerical Aerodynamic Simulation (NAS),1 Princeton Application Repository for Shared-Memory Computers (PARSEC),10 and Metis MapReduce9 suites were chosen here because they have a CPU utilization greater than 30%. This allows the focus to be on NUMA effects and not on other factors such as disk I/O or blocking synchronization. The experiments were conducted on an AMD server with four quad-core CPUs, as we describe later.

The first experiment quantified the effect of only the remote-access penalty, without the presence of memory-subsystem congestion. To do this, the benchmarks were run with only a single thread, limiting the pressure on the memory controllers and interconnects. They were then compared under two different memory configurations: local and remote. In the local memory configuration, applications were executed with their memory and thread on the same node. In the remote memory configuration, standard Linux tools were used to force the application thread to run on a different node from its memory. Therefore, in the remote-memory case all memory accesses were remote, and in the local-memory case all memory accesses were local.

Figure 2a shows performance differences between local and remote memory configurations for the single-threaded versions of applications used in this experiment. Performance never degraded by more than 20%, even when all memory requests were remote.

Although the remote-access penalty is worth minimizing when possible, it is not the whole story of NUMA performance effects. To demonstrate this, the benchmarks were run using multiple threads, with one thread per core, under two common NUMA memory-allocation policies: first-touch and interleave. Linux's default policy is first-touch, where memory is allocated on the same node as the thread that first accesses a memory page. The first-touch policy is meant to maximize local accesses over remote accesses, but of course it cannot guarantee local accesses, because threads on multiple nodes can share data. The interleave policy, on the other hand, distributes memory allocations equally across all nodes regardless of which threads access them. Interleaving ensures that memory allocations are balanced, but not necessarily that memory accesses will be balanced. Both policies work at the granularity of a page (typically 4KB).

Figure 2b shows the absolute performance difference between first-touch (F) and interleave (I) for multithreaded versions of applications. The applications are labeled (F) or (I) depending on which policy performed best. The figure compares the two policies by showing the performance difference between the best and worst policy for each benchmark. If the difference was negligible, the application is labeled (–). The first observation to make is that no one policy is best for all applications. Several applications perform best with the first-touch policy, but many prefer interleaving. The second observation is that NUMA effects beyond the remote-access penalty can indeed severely affect performance. For the Streamcluster benchmark, using the first-touch policy nearly doubled the running time compared with the interleave policy.

We further investigated the NUMA performance characteristics of Streamcluster and PCA (another benchmark that has significant performance loss with the first-touch policy). We used hardware performance counters to gather the following key metrics:
• Local access ratio. The portion of RAM accesses that result in a local access.
• Memory latency. The average number of cycles it takes to perform a RAM access. Higher latencies mean the CPU must stall for longer on a last-level cache miss, which will negatively affect performance.
• Memory-controller imbalance. The standard deviation (as a percentage of the mean) of the load on the memory controllers, where the load is measured as the number of requests per time unit. Along with interconnect imbalance, it is a sign of congestion.
• Average interconnect usage. The average bandwidth utilization of the interconnect links. A low interconnect usage could imply either that the application is not very memory intensive, or that there is imbalance because some links are left underutilized.
• Average interconnect imbalance. The standard deviation (as a percentage of the mean) of the bandwidth utilization of interconnect links.
• L3MPKI. The number of last-level cache misses per 1,000 instructions.
This is a relative indicator of how much pressure an application puts on the memory subsystem and of how sensitive an application is to memory latencies.
• Instructions per cycle (IPC). For the same application and workload, a higher IPC means better performance.

The metrics for Streamcluster and PCA are reported in the accompanying table. Traffic congestion effects are highlighted by differences in key NUMA metrics for each benchmark under the first-touch (F) and interleave (I) policies. The performance difference between the two NUMA policies cannot be explained by a change in the last-level cache miss rate, which stays the same. Nor can it be explained by the local-access ratio. That ratio stays the same for Streamcluster and is actually worse for PCA under the better-performing interleave policy. The local-access ratio of PCA drops from 33% to 25% when interleaving memory, but performance improves significantly. The conclusion is that better locality does not necessarily improve performance.

Traffic congestion effects.
                                 Streamcluster            PCA
                                 Best (I)   Worst (F)     Best (I)   Worst (F)
Local-access ratio               25%        25%           25%        33%
Memory latency (cycles)          476        1197          465        660
Memory-controller imbalance      8%         170%          5%         130%
Interconnect imbalance           22%        85%           20%        68%
Interconnect usage               59%        33%           48%        31%
L3MPKI                           16.85      16.89         7.35       7.4
IPC                              0.29       0.15          0.52       0.36

When using the first-touch policy, both applications show signs of congestion, with high last-level cache miss rates, memory-controller imbalance, and interconnect imbalance. The congestion results in high memory latencies. In the case of Streamcluster, the average memory latency with the first-touch policy is more than double the latency with the interleave policy. Interleaving balances the memory among the nodes, which reduces traffic hotspots and congestion, and therefore improves memory latency and overall performance. A visualization of the memory traffic and congestion of Streamcluster is shown in Figure 3, with traffic imbalance under first-touch on the top and interleaving on the bottom. Nodes and links bearing the majority of the traffic are shown proportionately larger in size and in brighter colors. The percentage values show the fraction of memory requests destined for each node.

Figure 3. Memory traffic of Streamcluster under first-touch (top) and interleave (bottom). Under first-touch, 97% of memory requests are destined for a single node and 1% for each of the other three nodes. Under interleave, each of the four nodes receives 25%.

NUMA Memory Placement Strategies
The results in Figure 2 and the table motivate a NUMA memory-management algorithm that places importance on congestion management, rather than focusing solely on reducing remote accesses. That is not to say reducing remote accesses is unimportant (they do, after all, add latency and contribute to interconnect congestion), but it should not be the only goal. Managing congestion effectively means being concerned with how the memory-access traffic is spread across the system. It is not enough simply to use interleaving. Many applications do not suffer from imbalance, so they would needlessly incur remote-access delays (for example, the benchmarks in Figure 2b that prefer the first-touch policy). The algorithm must be able to intelligently place memory based on the application's access patterns. Congestion should be reduced whenever possible, but locality should not be sacrificed when congestion is minimal. Since access patterns are not known a priori, the algorithm must also be able to determine the access patterns and move memory at runtime with low overhead.

Later, we present our NUMA algorithm, called Carrefour, which takes all of these considerations into account. First, though, we describe existing NUMA tools available on Linux. (This article gives an overview of the Carrefour algorithm. Please see Dashti et al.6 for a complete discussion, including implementation details and exhaustive experimental results.)

NUMA on Linux. Linux allows administrators to set the NUMA policy for applications via the numactl utility. The NUMA policies available are first-touch, interleave, and restricting allocations to specific nodes. As described earlier, first-touch is the policy of allocating memory on the same node as the CPU that first accesses the memory page. The interleave policy distributes memory pages across nodes in a round-robin manner.

Linux exposes manual NUMA memory-management functions to programmers through the libnuma library and associated system calls. These allow a program to query the NUMA topology, set NUMA policies for specific address ranges, and migrate memory pages to different nodes at runtime. (See Lameter8 for detailed information on Linux's NUMA facilities.)

Linux also provides robust support for hardware performance counters. These are used for counting CPU events such as cycles elapsed, instructions retired, branch mispredictions, or cache misses. These events can be used to calculate the important NUMA metrics listed previously. Perf is the standard Linux tool for using hardware performance counters to profile applications. It can gather data for several events with negligible performance overhead and minimal developer effort.

Hardware instruction sampling is an advanced CPU feature similar to performance counters. With instruction sampling, a proportion of instructions are tagged by the hardware. Tagged instructions record extra information about their execution. Instruction sampling is necessary for obtaining some NUMA-related statistics, including memory-access latency and the addresses of memory accesses. The feature is implemented as IBS (instruction-based sampling) on AMD CPUs and as PEBS (precise event-based sampling) on Intel CPUs. Unfortunately, Linux support for hardware-instruction sampling is limited and requires a custom kernel module for most uses.

AutoNUMA. AutoNUMA4 aims to provide Linux with a more proactive NUMA solution. A kernel task routinely iterates through the allocated memory of each process and tallies the number of memory pages on each node for that process. It also clears the present bit on the pages, which forces the CPU to stop and enter the page-fault handler when the page is next accessed. In the page-fault handler, it records which node and thread are trying to access the page before setting the present bit and allowing execution to continue. Pages that are accessed from remote nodes are put into a queue to be migrated to that node. After a page has already been migrated once, though, future migrations require two recorded accesses from a remote node. This rule is designed to prevent excessive migrations (known as page bouncing).

AutoNUMA's memory-placement algorithm, now known as Automatic NUMA Balancing, has been merged into the Linux kernel. It can be enabled through the sysctl interface by setting kernel.numa_balancing to 1.

AutoNUMA also uses thread placement to try to improve locality. The scheduler will consider migrating or swapping threads if doing so will cause more of the thread's accesses to be local (based on the gathered page-fault statistics), but not at the cost of introducing load imbalance among CPUs.

Carrefour is a memory-placement algorithm for NUMA systems that focuses on traffic management: placing memory so as to minimize congestion on interconnect links or memory controllers. Carrefour uses global information and memory-usage statistics to inform three primary techniques for limiting congestion:
• Memory collocation. Moving memory to a different node so accesses will likely be local.
• Replication. Copying memory to several nodes so threads from each node can access it locally (useful for read-only and read-mostly data).
• Interleaving. Moving memory such that it is distributed evenly among all nodes.

All three of these techniques have been analyzed individually in prior studies, but Carrefour combines them into a novel algorithm that is effective for modern NUMA systems.

To combine these techniques and apply them judiciously, Carrefour collects per-page, per-process, and global statistics from hardware performance counters. Carrefour also uses hardware-instruction sampling to log which threads and nodes access which memory pages. Instruction sampling has lower overhead per sample than AutoNUMA's page-fault-handler technique, so Carrefour can gather many more samples while keeping overhead low.

The first metric Carrefour uses is the number of RAM accesses per microsecond. If it is less than the threshold (50 in our experiments), then the rest of the algorithm is completely disabled until the rate becomes greater than the threshold. If the rate of RAM accesses is low, then the application is unlikely to benefit from better memory placement, so this rule avoids the overhead of Carrefour when it is not needed. If the algorithm does remain enabled, then Carrefour iterates over the memory pages for which it has gathered statistics and applies the replication, collocation, and interleaving techniques.

We implemented memory replication in the Linux kernel with a patch
to the virtual memory layer, and our implementation is able to automatically maintain consistency when there is a write to a replicated page. To enable replication, there must be enough free memory available, and at least 95% of the application's memory accesses must be reads, because the performance cost of a write to a replicated page is quite high. The rule for replicating particular pages is simple: pages are replicated if they are observed to have accesses from multiple nodes in read-only mode. Replication improves both locality and congestion, because a replicated page can be accessed locally from more than one node.

Figure 4. PARSEC and Metis.

Collocation is enabled if the local access ratio is less than 80%. Pages that have been accessed only by a single remote node are migrated to that node, thereby improving locality.

The primary purpose of interleaving is to alleviate congestion by distributing memory (and, therefore, memory accesses) among multiple nodes. The first step is to consider the memory-controller imbalance. If it is below 35%, then interleaving is deemed unprofitable and is disabled globally. Otherwise, pages that have recorded read and write accesses from more than one node are migrated to a random node. The probability of a page being migrated to a specific node is inversely proportional to the relative load on that node's memory controller.

The source code for Carrefour is available at https://github.com/Carrefour.
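The Carrefour rules described above can be condensed into a small decision procedure. The following Python sketch is our own reading of the prose, not the authors' kernel code (which is available at the URL above). The thresholds come from the text: 50 RAM accesses per microsecond, 95% reads, an 80% local-access ratio, and 35% memory-controller imbalance. Several details are assumptions: the data classes, the function names, and the order in which the per-page rules are tried. Another is our interpretation of "read and write accesses from more than one node" as "accessed from several nodes, with at least one write".

```python
import random
from dataclasses import dataclass, field

# Thresholds quoted in the text; everything else in this sketch is illustrative.
MIN_DRAM_ACCESSES_PER_US = 50
MIN_READ_FRACTION = 0.95       # replication needs read-mostly traffic
MAX_LOCAL_RATIO = 0.80         # collocation only below this local-access ratio
MIN_MC_IMBALANCE_PCT = 35.0    # interleaving only at or above this imbalance


@dataclass
class GlobalStats:
    dram_accesses_per_us: float
    read_fraction: float       # share of memory accesses that are reads
    enough_free_memory: bool
    local_access_ratio: float  # 0.0 to 1.0
    mc_imbalance_pct: float    # std. dev. of controller load / mean, in percent


@dataclass
class PageStats:
    home_node: int
    reader_nodes: set = field(default_factory=set)
    writer_nodes: set = field(default_factory=set)


def enabled_techniques(g: GlobalStats) -> set:
    """Global rules: which techniques are worth considering at all."""
    if g.dram_accesses_per_us < MIN_DRAM_ACCESSES_PER_US:
        return set()  # too little memory traffic; skip everything
    techniques = set()
    if g.enough_free_memory and g.read_fraction >= MIN_READ_FRACTION:
        techniques.add("replicate")
    if g.local_access_ratio < MAX_LOCAL_RATIO:
        techniques.add("collocate")
    if g.mc_imbalance_pct >= MIN_MC_IMBALANCE_PCT:
        techniques.add("interleave")
    return techniques


def pick_interleave_target(mc_load: list, rng: random.Random) -> int:
    """Random node, with probability inversely proportional to controller load."""
    weights = [1.0 / max(load, 1e-9) for load in mc_load]
    return rng.choices(range(len(mc_load)), weights=weights, k=1)[0]


def decide(page: PageStats, techniques: set, mc_load: list, rng: random.Random):
    """Per-page action; rules are tried in the order replicate, collocate, interleave."""
    nodes = page.reader_nodes | page.writer_nodes
    if "replicate" in techniques and len(nodes) > 1 and not page.writer_nodes:
        return ("replicate", sorted(nodes))    # read-only page shared by several nodes
    if "collocate" in techniques and len(nodes) == 1:
        (only_node,) = nodes
        if only_node != page.home_node:
            return ("migrate", only_node)      # used by a single remote node
    if "interleave" in techniques and len(nodes) > 1 and page.writer_nodes:
        return ("migrate", pick_interleave_target(mc_load, rng))
    return ("keep", page.home_node)


# Hypothetical example: busy, read-mostly, poorly balanced workload.
rng = random.Random(42)
stats = GlobalStats(dram_accesses_per_us=120, read_fraction=0.97,
                    enough_free_memory=True, local_access_ratio=0.25,
                    mc_imbalance_pct=170.0)
techniques = enabled_techniques(stats)
load = [97.0, 1.0, 1.0, 1.0]  # hypothetical per-controller load
for page in [PageStats(0, {1, 2}), PageStats(0, {3}), PageStats(0, {0, 2}, {2})]:
    print(decide(page, techniques, load, rng))
```

In this sketch, replication is gated twice: once globally (read fraction and free memory) and once per page (read-only and shared by several nodes). This mirrors the two levels the text describes. The sketch does not model the kernel-side work of copying pages or keeping replicas consistent on writes.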

Evaluation
Testbed. All experiments were conducted on an AMD system with 64GB of RAM and four quad-core Opteron 8385 processors running at 2.3GHz. It is divided into four NUMA nodes with four cores and 16GB of RAM per node (the topology is shown in Figure 1), interconnected with HyperTransport 1.0 links. The operating system was Linux kernel v3.6, and the AutoNUMA configuration used v27 of the patch.

A variety of multithreaded benchmarks were used for the evaluation: PARSEC benchmark suite v2.1,10 FaceRec v5.0,5 Metis MapReduce benchmark suite,9 and the NAS parallel benchmark suite v3.3.1 The PARSEC benchmarks used the "native" workloads, and the NAS benchmarks used problem sizes that provided running times of at least 10 seconds. Applications that had CPU utilizations below 33% were excluded because they are not affected by memory-management policies. Each configuration and benchmark was run 10 times, which resulted in standard deviations of less than 2% for the Carrefour, default Linux, and interleaving configurations. AutoNUMA gave standard deviations of up to 9%.

Figure 5. NAS parallel benchmarks: performance improvement with respect to Linux (%) for AutoNUMA, manual interleaving, and Carrefour on BT, CG, DC, EP, FT, IS, LU, MG, SP, and UA.

Figure 6. Load imbalance for selected benchmarks (Linux, AutoNUMA, manual interleaving, and Carrefour on Facesim, Streamcluster, FaceRec, FaceRecLong, PCA, MG, and SP). (a) Memory controllers: load imbalance on memory controllers (%). (b) Interconnect links: load imbalance on interconnect links (%).

Figure 7. DRAM latency and locality for selected benchmarks (Linux, AutoNUMA, manual interleaving, and Carrefour on Facesim, Streamcluster, FaceRec, FaceRecLong, PCA, MG, and SP). (a) Local memory access ratio (%), higher is better. (b) Average memory latency (cycles per request), lower is better.

Performance. We evaluated Carrefour's performance against three alternatives: Linux's default policy (first-touch), manual interleaving using Linux's interleave policy, and the AutoNUMA patch. Figures 4 and 5 show the performance improvement relative to default Linux.

There are three general classes of applications. First are those that have the same performance no matter which NUMA technique is used (for example, Bodytrack and Swaptions). These applications are not memory intensive and tend to have a low last-level cache miss rate. They also do not suffer much overhead from Carrefour or AutoNUMA, because most of the overhead is proportional to the memory intensiveness of the application.

The second class of applications is memory intensive, but the default first-touch policy works well for them. BT, CG, DC, FT, MG, and UA fall into this category. For these applications, manual interleaving hurts performance because it eliminates the locality benefit of first-touch without reducing congestion. Carrefour, on the other hand, does not cause poor memory placement and adds only a small overhead.

The remaining benchmarks suffer from poor memory placement under default Linux. AutoNUMA is able to improve the performance of some of these applications, but for others (for example, FaceRec and PCA) it has only a small impact. Carrefour, on the other hand, significantly improves the performance of these applications, and in two cases it greatly outperforms the second-best technique. It improves the performance of FaceRecLong by 120% over default Linux, whereas manual interleaving improves performance by only 60%. Similarly, Carrefour improves the performance of Streamcluster by 180%, whereas manual interleaving improves it by only 100%. One exception is IS, which is improved by manual interleaving but not by Carrefour. Carrefour's sampling
and migration rate cannot keep up with the burst of traffic produced by IS, so the memory is not balanced in time to improve performance.

We further profiled selected applications to see how Carrefour affects the key imbalance, locality, and latency metrics. Figures 6 and 7 present the results. Figure 6 shows the load imbalance for selected benchmarks; lower is better.

In Figure 6a, Carrefour consistently minimizes the imbalance on memory controllers, as does manual interleaving. AutoNUMA is sometimes able to reduce the imbalance, but usually not to the same degree, and in the case of FaceRec it makes the imbalance worse than default Linux. The imbalance on interconnect links, depicted in Figure 6b, shows similar trends.

Although manual interleaving is able to reduce imbalance, it always does so at the cost of locality, as Figure 7a makes evident. Carrefour, on the other hand, always has the highest or nearly the highest local memory-access ratio. For applications that have good locality under default Linux (MG and SP), AutoNUMA retains the good locality, and it is also able to improve the locality of Facesim.

The effects of imbalance and the local access ratio are reflected in the memory-access latency, shown in Figure 7b. As expected, Carrefour produces the lowest (or is tied for the lowest) average memory latencies for each profiled application. There is also a strong correlation between the average memory latency and a benchmark's performance. For example, Streamcluster and FaceRec have large reductions in memory latency with Carrefour, and they show large performance improvements in Figure 4.

Overall, we can conclude that Carrefour systematically fixes NUMA memory-placement issues in nearly all situations. It is able to greatly outperform other techniques in some circumstances and does not significantly hurt performance for any application.

Conclusion
NUMA architecture is used to scale the processor count of today's server-class systems. In the near future, expect systems to have even more NUMA nodes and more complicated NUMA topologies. The experiments described here show that the performance effects of NUMA are significant and the problem is nontrivial, which motivates careful study and a comprehensive solution.

Contrary to previous NUMA studies, our experiments found that congestion causes the most serious NUMA problems. Congestion happens when the rate of requests to memory controllers or the rate of traffic over interconnects is too high, which causes excessive delays for memory accesses. It can be alleviated by balancing the traffic among multiple memory controllers and interconnect links. The other factor in NUMA performance is locality, which is what previous NUMA algorithms have focused on. Good locality means that most memory accesses will be to the local node and therefore will not pay the latency cost of traversing interconnect links.

As shown earlier, the two NUMA concerns of congestion and locality are difficult to reconcile, and for any particular application we cannot know the best memory placement beforehand. Carrefour uses hardware performance counters and hardware sampling to determine an application's memory-access patterns online with low overhead. It then uses that knowledge to apply three page-level techniques: memory replication, memory interleaving, and memory collocation. Each technique serves a specific purpose: collocation improves locality, interleaving reduces imbalance, and replication does both in situations where reads vastly outnumber writes. The novelty of Carrefour is in combining these strategies and applying each only when appropriate.

As a result, Carrefour improves performance compared with default Linux for many applications. For two benchmarks, Streamcluster and FaceRecLong, performance more than doubles with Carrefour. Unlike manual interleaving and AutoNUMA, Carrefour never significantly degrades performance by making improper page placements.

As NUMA systems grow and the number of cores issuing memory requests increases, NUMA effects will continue to be a concern. Carrefour demonstrates a collection of techniques that effectively reduce these concerns. Developers can use the methods and insights gained from Carrefour, along with the tools described earlier, to optimize their applications for NUMA systems.

Acknowledgments
We thank Oracle Labs and the British Columbia Innovation Council for funding this work.

Related articles on queue.acm.org
NUMA (Non-Uniform Memory Access): An Overview
Christoph Lameter
http://queue.acm.org/detail.cfm?id=2513149
Scalability Techniques for Practical Synchronization Primitives
Davidlohr Bueso
http://queue.acm.org/detail.cfm?id=2698990
Photoshop Scalability: Keeping It Simple
Clem Cole and Russell Williams
http://queue.acm.org/detail.cfm?id=1858330

References
1. Bailey, D. NAS Parallel Benchmarks. RNR Technical Report (1994); http://www.nas.nasa.gov/publications/npb.html.
2. Boyd-Wickizer, S. et al. Corey: An operating system for many cores. In Proceedings of the 8th Usenix Symposium on Operating Systems Design and Implementation (2008), 43–57.
3. Brecht, T. On the importance of parallel application placement in NUMA multiprocessors. In Proceedings of the Usenix Symposium on Experiences with Distributed and Multiprocessor Systems (1993) 4, 1.
4. Corbet, J. AutoNUMA: The other approach to NUMA scheduling. LWN.net (2012); http://lwn.net/Articles/488709/.
5. CSU Face Identification Evaluation System. Evaluation of Face Recognition Algorithms. Colorado State University (2010); http://www.cs.colostate.edu/evalfacerec/index10.php.
6. Dashti, M. et al. Traffic management: A holistic approach to memory placement on NUMA systems. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (2013), 381–394.
7. David, T., Guerraoui, R. and Trigonakis, V. Everything you always wanted to know about synchronization but were afraid to ask. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (2013), 33–48.
8. Lameter, C. An overview of non-uniform memory access. Commun. ACM 56, 9 (2013), 59–65.
9. Metis MapReduce Library; http://pdos.csail.mit.edu/metis/.
10. PARSEC Benchmark Suite; http://parsec.cs.princeton.edu/.

Fabien Gaud is a senior software engineer at Coho Data, focusing on performance and scalability.

Baptiste Lepers is a postdoc at EPFL. His research topics include performance profiling, optimizations for NUMA systems, and multicore programming.

Justin Funston and Mohammad Dashti are Ph.D. students at the University of British Columbia. Funston's research interests include memory management, thread scheduling, and multicore systems. Dashti's research focuses on operating systems, GPGPU, and heterogeneous CPU/GPU systems.

Alexandra Fedorova is an associate professor in the ECE department at the University of British Columbia. Her research focuses on the performance, usability, and energy efficiency of computer systems.

Vivien Quéma is a professor at Grenoble INP (ENSIMAG), France. His research is about understanding, designing, and building (distributed) systems.

Renaud Lachaize is an assistant professor at the University of Grenoble, France. His research interests are in the area of operating systems and distributed systems, with a current focus on multicore systems.

Mark Roth currently works at Google as an engineer.

Copyright held by authors. Publication rights licensed to ACM. $15.00

ACM Books. In-depth. Innovative. Insightful.

The VR Book: Human-Centered Design for Virtual Reality
By Jason Jerald, PhD

The VR Book focuses on human-centered design of virtual reality (VR) experiences. Creating compelling VR experiences is an incredibly complex challenge. When VR is done well, these experiences can be brilliant and pleasurable, but when done badly, they can result in frustration and sickness. While there are many causes of bad VR, such as the limitations of technology, much is centered on a lack of understanding of human perception, interaction, design principles, and real users. This book focuses on human elements of VR, such as how users perceive and intuitively interact with various forms of reality, causes of VR sickness, creating useful and pleasing content, and how to design and iterate upon effective VR applications. It is not just for VR designers; it is for the entire team, as they all should understand the basics of perception, VR sickness, interaction, content creation, and iterative design.

Good VR design requires strong communication between human and machine, indicating what interactions are possible, what is currently occurring, and what is about to occur. A human-centered design principle, like lean methods, is to avoid completely defining the problem at the start and to iterate upon repeated approximations and modifications through rapid tests of ideas with real users.

Thus, The VR Book is intended as a foundation for anyone and everyone involved in creating VR experiences, including designers, managers, programmers, artists, psychologists, engineers, students, educators, and user experience professionals.

Paperback ISBN: 9781970001129
Hardcover ISBN: 9781970001150
Ebook ISBN: 9781970001136
DOI: 10.1145/2792790

For more info please visit http://books.acm.org or contact ACM at [email protected]

Association for Computing Machinery
2 Penn Plaza, Suite 701
New York, NY 10121-0701, USA
Phone: +1-212-626-0658
Email: [email protected]

Morgan & Claypool Publishers
1210 Fifth Avenue, Suite 250
San Rafael, CA 94901, USA
Phone: +1-415-462-0004
Email: [email protected]

contributed articles

DOI:10.1145/2756546

Personalizing Maps

Digital maps can be engineered to adapt to a person’s unique interests and experience in geographic space.

BY ANDREA BALLATORE AND MICHELA BERTOLOTTO

GEOGRAPHIC MAPS CONSTITUTE a ubiquitous medium through which we understand, construct, and navigate our natural and built surroundings. At the intersection of the explosion of geographic information online, data-mining techniques, and the increasing popularity of Web maps, a novel possibility has emerged: Instead of generating one map for large numbers of users, user profiling and implicit feedback analysis can support creation of a different map for each person. The automated personalization of the map-making process is still in its infancy but has the potential to provide more relevant maps to millions of users worldwide.

key insights
• Cartography traditionally focuses on producing maps for large groups of readers, and digital maps, including Google’s, have barely begun to challenge this approach; collecting explicit and implicit feedback from users, digital cartography is able to capture a person’s geographic knowledge, experiences, and attitudes, better supporting spatial learning and decision making.
• As a research frontier, automated map personalization requires real-time task detection, geographic user profiling, trajectory analytics, data fusion, geo-visualization, and sentiment analysis, along with insight from cognitive psychology and human geography.
• Depending on design, personal maps could foster exploration of the urban environment beyond the user’s known territory or reinforce segregation, fragmenting the collective knowledge of the spaces we inhabit.

ILLUSTRATION BY CHRISTIAN NORTHEAST

While mapmaking has traditionally aimed to produce static maps to be printed and distributed to a target audience, geographic information systems (GISs) provide interactive tools to collect and process information dynamically, transforming not only cartography but also geography, urban planning, and any activity that relies on geographic knowledge. Since the 1960s, using GISs, geographers, urban planners, army generals, and economists have been generating different representations of the same input data to better understand diverse geospatial phenomena. Over the past decade, GISs have further merged with Web technologies and mobile computing, enabling mass adoption of digital maps while overcoming the limitations of paper maps. As interactive digital maps replace paper maps, this “ubiquitous cartography”5,7 is quietly becoming part of our lives, changing not only the consumption but also the production of geographic information. Just below the surface of this tumultuous reconfiguration, the fundamental problems of cartography have hardly changed. Complex, dynamic, and uncertain geographic data needs to be represented on a screen, selecting what needs to be displayed and how to display it with respect to the user’s informational needs.6 Appropriate cartographic projections, scales, generalization principles, human-computer interaction, and semiotic conventions constitute essential ingredients for the design of usable digital maps.

Although maps are often perceived as a form of objective, scientific knowledge about the world, the same area can be represented from many alternative perspectives, including and excluding different pieces of uncertain information, and choosing arbitrary graphical and symbolic conventions; for example, Figure 1 includes alternative representations of University College Dublin. For cartographers, it is uncontroversial that radically different maps are needed to perform different tasks. Nautical charts, tourist maps, and urban-planning maps display different geographic information tailored to specific tasks (such as reaching a port safely, understanding the structure of a city, or identifying a suitable location to build a new bridge). Less obvious is the fact that different maps might be needed by different people to perform the same task. Since the 1950s, psychological studies have shown every person perceives and develops an individual mental model of their environment, based on direct and mediated subjective experiences.11

Figure 1. Alternative cartographic representations of University College Dublin. Maps a–c are from commercial services, and d–f are based on OpenStreetMap open data using different themes; visualization generated with GeoFabrik tools.

(a) Bing Satellite (b) Google Maps (c) Bing Maps

(d) OpenStreetMap (e) OpenStreetMap (f) OpenStreetMap

Likewise, since the late 1990s, the economic value of personalization of Web-based services has attracted considerable attention, resulting in now-ubiquitous personalized news stories, commercial offers, film recommendations, and search results. In 1995, Nicholas Negroponte, founder of MIT’s Media Lab, imagined a newspaper called Daily Me that would automatically collect and arrange stories relevant to the reader, rather than impose the same content on everyone, overcoming the paradigm of mass production that dominated the 20th century.16 Knowing a customer’s behavior and tastes through surveillance techniques has become commonplace in marketing, in a tight feedback loop between companies and their current or prospective consumers, in what has been called by the oxymoron “mass customization.”19

As Web-based digital maps progressively become the main portal through which to view the world and its places, the idea of applying mass personalization to maps comes within reach. It is now conceivable to develop mapping platforms that generate personalized maps not only for a specific task but for a specific individual, taking into account the individual’s experience, behavior, knowledge, and particular viewpoint. Surprisingly, while many online products and services have been personalized over the past decade, digital maps are still fundamentally untouched by mass customization.

Map Personalization So Far

Maps are complex cultural and technical objects that assemble multiple data sources, assumptions about the user, cartographic traditions and practices, and design choices. All elements that form a digital map can in principle be personalized to increase the usability, efficiency, and clarity of the map with respect to a task.20 To personalize maps, useful information can be provided by the user through explicit or implicit feedback.10

“Explicit feedback” is the conscious selection of preferences in the interface, and any action that is explicitly aimed at expressing a preference about any element of the map, as in, say, changing the language and the default settings of the interface. By contrast, implicit feedback includes any data through which the user expresses a preference indirectly; for example, moving the mouse cursor and clicking on a geographic area expresses a form of interest in that region, while hiding a layer at the beginning of each session indicates lack of relevance. The user’s location represents another instance of implicit feedback about what areas of the map are of particular interest. Implicit feedback can in principle be extracted from any data generated by the user, including activity on social media, instant messaging, purchases from online stores, and email messages.

The most important indicator that must be determined is the task the user is currently performing on the map. Common tasks performed on popular online services (such as Yahoo! Maps and Google Maps) include information retrieval, general exploration of a region of interest, and routing. Different maps suit different tasks, with respect to features, layers, and controls. While in some cases the user’s intentions are easy to detect (for instance, typing a place name is likely to indicate an information-retrieval and/or a routing task), many behaviors do not imply specific tasks and present a considerable interpretive challenge. Automatic task detection can be performed with many indicators, including user demographics, interaction logs, search history, and the user’s context and its spatial, temporal, social, and computational aspects.

Over the past 15 years, academic researchers, including us, have investigated ideas and techniques through which map personalization can be achieved, working in two complementary strands. On the one hand, the automated adaptation of the map is pursued to increase clarity and efficiency and reduce information overload by removing or highlighting features based on user preferences and current task. On the other, the area of recommendation has generated techniques to personalize search results and recommendations of hotels, restaurants, and other points of interest based on individual and/or collective preferences. These two strands overlap in that similar techniques can be used to adapt the map and personalize search results and recommendations.

In a pioneering work in 2000, Oppermann and Specht17 developed Hippie, a tool that presents museum information based on the context of use. It relies on a user model that represents the user’s knowledge and interests, a domain model of the information being displayed, and a space model in which the interaction occurs. Brunato and Battiti4 devised the Personal Item Locator and General Recommendation Index Manager (PILGRIM), a recommender system that takes the user’s location into account to rank Web pages. Although recommender systems have been widely adopted since the mid-2000s, little work has been done to increase their spatial awareness.21

Our work focuses on the use of implicit feedback to adapt the map content itself. The core assumption is that implicit feedback indicators (such as mouse movements and navigational behavior) can be used to infer user interests.13 A recurring cognitive issue, particularly in the context of mobile computing, is that of spatial information overload, or display of excessive amounts of information on the map, hindering, rather than helping, the user. CoMPASS is a GIS application that monitors user interaction to recommend groups of features (such as layers) to users.12,22 The MAPPER system generates maps containing specific features, taking into account the user’s preferences and the computational context by implicitly monitoring the interactions of users when browsing maps and inferring individual and group preferences.23 This approach has been evaluated on a variety of map-based tasks,24 showing increased efficiency in task completion; a similar approach was applied to detect the current task and adapt the map to it.14

In the RecoMap prototype,2 we explored the possibility of computing interest scores for geographic features based on two complementary aspects, interaction (amount of interaction with a feature) and proximity (physical proximity to the feature), to generate personalized recommendations. A memory model simulates the decay of interest over time, assuming that if the user does not interact with an object, the user’s interest in it declines. Moreover, we investigated the possibility of integrating crowdsourced spatial data into the personalization analysis and utilizing Linked Open Data, a network of inter-connected datasets, to increase the semantic structure of the geographic features.1 Although this body of research initiated the theoretical and practical development of map personalization, new concepts, paradigms, and techniques await further investigation and evaluation.

This line of research promises to generate a plethora of commercial applications, greatly enriching current Web-mapping platforms. Since the mid-2000s, following increased bandwidth and more sophisticated Web browsers, a growing non-specialist mass market for Web maps has emerged, first on desktop computers, and more recently on GPS-enabled smartphones.a In order to review the state of the art of map personalization in products on the consumer market, it is useful to distinguish between manual and automated personalization, reflecting explicit and implicit feedback, respectively. Manual personalization allows users to modify aspects of the map, using preferences, bookmarking, and map editors. By contrast, automated personalization relies on implicit feedback to modify the map without user intervention, using data mining to model the user’s tastes and intentions; the table here lists personalization capabilities of popular, global, currently active mapping services.

Google Maps is the only service today that provides some automated personalization, tailoring the search results and ads based on the user’s search history and ratings, claiming to generate “a map for every person and place.”b

a http://www.comscore.com/Insights/Blog/Map_Searches_Shift_from_Desktops_to_Smartphones
b http://google-latlong.blogspot.it/2013/05/meet-new-google-maps-map-for-every.html
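The RecoMap prototype described in this section scores features by combining interaction and proximity, with a memory model that lets interest fade when the user stops interacting. The article does not give the formulas, so the following Python sketch is only one plausible instance under stated assumptions: a weighted sum of an interaction term and a proximity term, with exponential decay of interaction interest. The weights, the one-hour half-life, the 500 m distance scale, and every name in the code (FeatureState, decay, interest_score, rank) are hypothetical, not RecoMap's published formulation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureState:
    """Hypothetical per-user record for one geographic feature."""
    interactions: int    # e.g., clicks or hovers on the feature
    distance_m: float    # user's current distance to the feature, in meters
    last_seen_s: float   # seconds since the user last interacted with it


def decay(age_s: float, half_life_s: float = 3600.0) -> float:
    """Assumed memory model: interest halves every half_life_s seconds without interaction."""
    return 0.5 ** (age_s / half_life_s)


def interest_score(f: FeatureState, w_interaction: float = 0.6,
                   w_proximity: float = 0.4, distance_scale_m: float = 500.0) -> float:
    """Weighted sum of a decayed interaction term and a proximity term, in [0, 1]."""
    # Interaction term saturates with repeated interaction, then fades with time.
    interaction = (1.0 - math.exp(-f.interactions)) * decay(f.last_seen_s)
    # Proximity term is 1 at the feature and falls off exponentially with distance.
    proximity = math.exp(-f.distance_m / distance_scale_m)
    return w_interaction * interaction + w_proximity * proximity


def rank(features: Dict[str, FeatureState], top_k: int = 3) -> List[str]:
    """Return the names of the top_k features, highest interest first."""
    return sorted(features, key=lambda name: interest_score(features[name]),
                  reverse=True)[:top_k]


if __name__ == "__main__":
    features = {
        "museum": FeatureState(interactions=5, distance_m=50, last_seen_s=60),
        "cafe": FeatureState(interactions=5, distance_m=50, last_seen_s=36000),
        "bridge": FeatureState(interactions=0, distance_m=5000, last_seen_s=0),
    }
    print(rank(features, top_k=2))
```

Here a museum the user clicked a minute ago outranks an equally close café last touched ten hours ago, which in turn outranks an untouched bridge 5 km away. Keeping the two terms separate makes it easy to re-weight them per task, for example favoring proximity during routing.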


Table. Personalization in Web maps, October 2015.

Product | Manual personalization | Automated personalization
Google Maps | Vector base map. In “My Places,” users can set their home and work address. MapsEngine cloud tool can be used to create new maps by adding layers on a set of base maps. | Search results, recommendations, and advertisements are based on previous ratings and searches.
ArcGIS Online (ESRI) | Complex new maps can be created, combining base maps with user-defined layers. Advanced user-defined analytics available. | None.
Apple Maps | Vector base map. Users can bookmark locations in iCloud. | None.
OpenStreetMap (OSM Foundation) | OSM open data can be used to generate customized maps with a variety of dedicated tools (such as MapBox). | None.
HERE (Nokia) | Users can create “collections” of locations. | None.
MapQuest (AOL) | Users can save favorite locations and vehicles to improve routing. With “My Maps,” they can save collections of locations and routes. | None.
Yahoo! Maps | Based on Nokia’s HERE. | None.
ViaMichelin | In “My Michelin,” users can bookmark locations, restaurants, and itineraries. | None.
OS Map Finder (U.K. Ordnance Survey) | Users can draw paths. | None.
Bing Maps (Microsoft) | Users can bookmark locations in “My Places.” | None.

Other popular Web mapping products (such as ArcGIS Online and Yahoo! Maps) offer some manual personalization, typically in the form of bookmarks or editors to create and share new maps with user-provided data and visual styles. None of these products attempts to perform automated personalization.

Computational Challenges

The considerable increase in variety and volume of data provides a largely unexplored yet fertile basis from which to rethink maps. As we have shown, the field of map personalization is still at an early stage, with little research conducted or applied to commercial products. To further map personalization, several components must be integrated into a coherent conceptual framework (see Figure 2). A personalization engine must be able to perform multivariate feedback analysis on the many channels through which users express their spatial interests and preferences. By monitoring how users perform their tasks, the engine should be able to mine and extract meaningful patterns, inferring effective user models. Based on these models, the current task can be detected and trigger personalization in its two dimensions—adaptation and recommendation—at appropriate moments, unobtrusively supporting the user.

Figure 2. A framework for map personalization. (Diagram: Users, characterized by needs, language, culture, demographics, social network, mental maps, and knowledge, produce interaction streams with Maps (features/layers, projections/scales, generalization, landmarks, symbols, toolbars) in various Contexts (devices, situations) while performing Tasks (free exploration, place search, wayfinding/routing, spatial analysis). Data sources (activity logs, past activity, bookmarks, ratings) feed a Personalization Engine that records and analyzes feedback (social network analysis, sentiment analysis, data mining), infers and updates User Models (interests, preferences), detects the current task, and performs adaptation/recommendation based on the user models.)

Many possibilities lie ahead to deliver more relevant, effective, and useful maps to Internet users. Given the same geographic data, such a system will tailor different maps for, say, a Japanese tourist in San Francisco and for an Italian ex-pat who lives and works in San Francisco. The tourist explicitly indicated an interest in architecture and a dislike for fast food. Her interaction shows implicitly an interest in historical areas, reflected in the map by increasing the prominence of history-themed museums. The map also captures that she was in the city before, displaying previously visited points of interest, facilitating spatial comprehension and wayfinding, while at the same time emphasizing unknown areas of the city that feature notable buildings. Fast food restaurants remain hidden, unless she searches for them explicitly or when they might provide useful navigation landmarks. Before meal times, the map emphasizes restaurants that were recommended by her friends who visited the city, while at night it increases the visibility of movie theaters, taking into account her interest in cinema.
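The engine loop in Figure 2 (detect the current task, then adapt the map through the user model) can be sketched in a few lines. This Python sketch is illustrative only: the event names, the rule-based task detector, the UserModel fields, and the 0-to-1 prominence scale are assumptions made for this example, not part of the framework, and a real engine would learn such rules from interaction logs rather than hard-code them. The sample profile mirrors the tourist scenario: an explicit interest in architecture, an explicit dislike of fast food, and an inferred interest in history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UserModel:
    """Hypothetical user model mixing explicit and implicit feedback."""
    likes: Set[str] = field(default_factory=set)               # explicit interests
    dislikes: Set[str] = field(default_factory=set)            # explicit dislikes
    implicit: Dict[str, float] = field(default_factory=dict)   # inferred interest, 0..1


def detect_task(events: List[str]) -> str:
    """Toy rule-based task detector over recent interaction events (assumed event names)."""
    if "route_request" in events:
        return "routing"
    if "typed_place_name" in events:
        return "place_search"
    return "free_exploration"


def prominence(category: str, model: UserModel, task: str,
               searched: Set[str], is_landmark: bool = False) -> float:
    """Display weight in [0, 1] for a feature category; 0 means hidden."""
    if category in searched:
        return 1.0  # an explicit search always overrides personalization
    if category in model.dislikes:
        # Disliked features stay hidden unless they serve as routing landmarks.
        return 0.4 if (is_landmark and task == "routing") else 0.0
    weight = 0.5
    if category in model.likes:
        weight += 0.3
    weight += 0.2 * model.implicit.get(category, 0.0)
    if task == "routing" and category not in model.likes:
        weight -= 0.2  # declutter unrelated features while routing
    return max(0.0, min(1.0, weight))


if __name__ == "__main__":
    tourist = UserModel(likes={"architecture"}, dislikes={"fast_food"},
                        implicit={"history": 0.9})
    task = detect_task(["pan", "zoom", "click"])
    for cat in ["architecture", "history", "parking", "fast_food"]:
        print(task, cat, round(prominence(cat, tourist, task, searched=set()), 2))
```

Separating task detection from the prominence rule reflects the two-step structure of the framework: the same user model can yield different maps as the detected task changes, and an explicit search always wins over inferred preferences.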

By contrast, the Italian ex-pat specified an interest in music. As he navigates the city, the map tends to hide familiar tourist attractions while highlighting content neighboring his home-to-work commutes. Based on his interaction history, the system also captures an interest in the Mission District, which becomes more prominent and detailed on the map. As he uses a car to drive around the city, the public transport infrastructure fades into the background of the map. However, if his movement patterns match an efficient bus route, the system discreetly suggests an alternative transportation option. Occasionally, the map emphasizes an unfamiliar neighborhood that presents a high density of music venues, inviting him to go beyond the borders of his daily routine. To make this vision real, several computational challenges lie ahead.

Real-time task detection and prediction. As maps are used in a variety of situations for different tasks, the system must be able to detect and predict them effectively. For this purpose, specific machine-learning techniques must be developed and optimized. In the collection of implicit feedback, relevant features include the user’s spatial and temporal context, as well as search and interaction streams.

Spatial user modeling. Because of maps’ spatial nature, personalization requires deep understanding of user behavior in space and time. Hence, the aggregation and interpretation of large numbers of noisy spatiotemporal trajectories containing GPS fixes, clicks, and search logs are essential for developing models of user behavior able to capture and predict recurring patterns and anomalies (such as sightseeing, as opposed to daily commuting). Recording, storing, and mining a large volume of spatiotemporal trajectories constitute an open research challenge.25 Trajectories can traverse the geographic space, as well as other spaces, including mouse trajectories in a user interface. Spatial social network analysis can also illuminate the deep structures that influence interaction with maps.

Geographically weighted personalization. As maps assist users in the exploration and navigation of the geographic space, personalization should be able to tap the spatial variation in human activities. Development of geographically weighted techniques relies on the assumption that individual information needs and content relevance change both spatially and temporally. Such spatialization of users and content and their relationships increases the computational complexity of traditional personalization models. In turn, spatiotemporal user models would enable finer and more sensitive personalization of contents.

Geosemantic interoperability and data fusion. Map personalization requires aggregation and fusion of a range of heterogeneous sources of geographic information characterized by intrinsic uncertainty, vagueness, and rapid obsolescence, ranging from traditional government agencies to crowdsourced, volunteered datasets. In this sense, research in the context of the semantic Web and linked open data provides suitable representational tools to organize, store, explore, and retrieve spatiotemporal objects and data streams.9 Reliable mechanisms are needed to reference entities and perform identity resolution, reducing the friction caused by interoperability issues.3

Geoparsing and sentiment analysis. As a vast amount of geographic knowledge is expressed in natural language, natural language processing (NLP) is crucial for map personalization, extracting value from unstructured data. Geoparsing, or extraction of geographic information from natural language, is an open problem in NLP, intimately connected to word-sense disambiguation. Detection of affect, sentiment, and emotion in text is an emerging yet important aspect of interpreting user behavior, improving extraction and modeling of users’ opinions about places.

Cognitive map design. Concepts and principles from cognitive map design15 can be applied to generate and validate alternative cartographic representations, providing an exciting opportunity to test cognitive theories against real scenarios on large numbers of users. From a technological viewpoint, interactive maps have moved from a tile-based approach, in which maps are served through cached pre-rendered images, to a more flexible vector-based approach, in which the rendering occurs in real time in the client, providing the ideal platform for experimenting with alternative rendering choices and styles. Knowledge from spatial cognition could also be useful for producing better personalized maps and gaining further insight into the human perception and understanding of the geographic environment.8

To achieve significant advances in these areas, academic researchers and commercial developers must tap the informational wealth produced by millions of users worldwide in their daily interactions mediated by online platforms, in which space and place are deeply intertwined with social, cultural, and economic processes. Due to the intrinsic complexity of these processes, map personalization needs help from thriving research areas in the context of big-data analytics. From a complementary perspective, NLP and sentiment analysis can be used to mine user-generated opinions about places to increase or decrease the emphasis of specific features. Advanced data-mining techniques are needed to extract meaning from noisy interaction logs. Beyond these computational steps, the challenges of map personalization are intrinsically multidisciplinary, harnessing ideas and tools from geographic information science, cartography, cognitive psychology, human-computer interaction, and software design.

Consequences

From a societal viewpoint, automated map personalization at a mass scale could have serious implications that should be responsibly taken into account, particularly by commercial developers whose products reach millions of users. Beyond the obvious concern for privacy, fostered by any surveillance-based technology, specific problems include the potential loss of a common representation of geographic realities. Personalized maps might result in what Internet activist Eli Pariser calls a “filter bubble,” increasing social and cultural segregation between groups of users.18 Likewise, personalized landmarks can be useful for increasing the clarity of maps but might also reduce the common semantic ground shared by the inhabitants of a geographical area. In this regard, Google Maps, currently the only commercial product to include some form of map personalization, presents several unresolved questions. The most conspicuous is the product’s lack of transparency, making it difficult for users to understand why certain features are recommended over others. The user models generated by Google are black boxes not accessible to the users they are supposed to represent, and, more important, there is no visible “off” button to disable the personalization; even when logged out, the search results are still personalized in unclear ways based on cookies and the IP location of the user’s machine.

A serious challenge for academic research in map personalization is the lack of realistic interaction datasets for evaluating novel systems and approaches. As private corporations are understandably reluctant to share their map-interaction logs, the studies discussed here have limited evaluations, failing to reflect the complexity, noise, and variety of situations in real mapping applications, limiting their observation to small, artificial, and controlled contexts. A few large corporations (such as Google and Microsoft) attract the vast majority of online map users and are thus in a privileged position to unobtrusively devise and evaluate proprietary techniques on large groups of users on a variety of tasks, interpreting their behavior as implicit feedback. For this reason, academic research must either focus on well-defined cognitive, computational, and cartographic aspects of map personalization that can be convincingly evaluated or work in close partnership with map providers. No progress in map personalization can be assessed in the absence of rigorous measures to quantify the effectiveness of techniques, algorithms, and models.

Conclusion

Development of personalized maps has important applications in many domains. To date, the focus has been on commercial applications; for example, Google has been exploring location-based advertising, trying to maximize the relevance and profitability of ads that have a strong spatial component. Similarly, most research focuses on efficiency, reducing information overload, increasing clarity, and helping users complete tasks more quickly or with lower cognitive load, as in, say, decision making, information retrieval, and routing. However, exciting possibilities also exist beyond increased efficiency. Personalized maps need not reinforce users’ biases and limited perspectives but can be designed to operate in the opposite way, attracting attention to the unknown and unfamiliar and promoting diversity, serendipity, and discovery. In education, self-adaptive maps could support students, tailoring maps to different learning styles and backgrounds. It is reasonable to expect map personalization could trigger a quiet but deep reconfiguration of familiar maps, leading to unexpected changes in the way we perceive and imagine the world around us.

Acknowledgments

The research presented here was funded by a Strategic Research Cluster grant (07/SRC/I1168) by Science Foundation Ireland under the National Development Plan.

References

1. Ballatore, A. and Bertolotto, M. Semantically enriching VGI in support of implicit feedback analysis. In Web and Wireless Geographical Information Systems, K. Tanaka, P. Fröhlich, and K.-S. Kim, Eds. LNCS, Vol. 6574 (Kyoto, Japan, Mar. 3–4). Springer, Berlin, Germany, 2011, 78–93.
2. Ballatore, A., McArdle, G., Kelly, C., and Bertolotto, M. RecoMap: An interactive and adaptive map-based recommender. In Proceedings of the 25th ACM Symposium on Applied Computing (Sierre, Switzerland, Mar. 22–26). ACM Press, New York, 2010, 887–891.
3. Ballatore, A., Wilson, D., and Bertolotto, M. A survey of volunteered open geo-knowledge bases in the semantic Web. In Quality Issues in the Management of Web Information, G. Pasi, G. Bordogna, and K. Jain, Eds. Intelligent Systems Reference Library, Vol. 50. Springer, Berlin, Germany, 2013, 93–120.
4. Brunato, M. and Battiti, R. PILGRIM: A location broker and mobility-aware recommendation system. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications (Fort Worth, TX, Mar. 23–26). IEEE, Piscataway, NJ, 2003, 265–272.
5. Gartner, G., Bennett, D., and Morita, T. Towards ubiquitous cartography. Cartography and Geographic Information Science 34, 4 (2007), 247–257.
6. Haklay, M., Ed. Interacting with Geospatial Technologies. John Wiley & Sons, Chichester, U.K., 2010.
7. Haklay, M., Singleton, A., and Parker, C. Web mapping 2.0: The neogeography of the GeoWeb. Geography Compass 2, 6 (Nov. 2008), 2011–2039.
8. Hirtle, S.C. Geographical design: Spatial cognition and geographical information science. Synthesis Lectures on Human-Centered Informatics 4, 1 (2011), 1–67.
9. Janowicz, K., Scheider, S., Pehle, T., and Hart, G. Geospatial semantics and linked spatiotemporal data: Past, present, and future. Semantic Web 3 (2012), 321–332.
10. Kelly, D. and Teevan, J. Implicit feedback for inferring user preference: A bibliography. ACM SIGIR Forum 37, 2 (Fall 2003), 18–28.
11. Lynch, K. The Image of the City. MIT Press, Cambridge, MA, 1960.
12. Mac Aoidh, E. and Bertolotto, M. Improving spatial data usability by capturing user interactions. In The European Information Society: Leading the Way with Geo-information, S.I. Fabrikant and M. Wachowicz, Eds. Springer, Berlin, Germany, 2007, 389–403.
13. Mac Aoidh, E., Bertolotto, M., and Wilson, D.C. Analysis of implicit interest indicators for spatial data. In Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems (Seattle, WA, Nov. 7–9). ACM Press, New York, 2007, 1–4.
14. Mac Aoidh, E., McArdle, G., Petit, M., Ray, C., Bertolotto, M., Claramunt, C., and Wilson, D. Personalization in adaptive and interactive GIS. Annals of GIS 15, 1 (2009), 23–33.
15. Montello, D. Cognitive map-design research in the twentieth century: Theoretical and empirical approaches. Cartography and Geographic Information Science 29, 3 (2002), 283–304.
16. Negroponte, N. Being Digital. Random House, New York, 1995.
17. Oppermann, R. and Specht, M. A context-sensitive nomadic exhibition guide. In Handheld and Ubiquitous Computing, P. Thomas and H.-W. Gellersen, Eds. LNCS, Vol. 1927. Springer, Berlin, Germany, 2000, 127–142.
18. Pariser, E. The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think. Penguin, London, U.K., 2012.
19. Piller, F.T. and Tseng, M.M., Eds. Handbook of Research in Mass Customization and Personalization, Volume 1. World Scientific, Singapore, 2010.
20. Reichenbacher, T. Adaptive concepts for a mobile cartography. Journal of Geographical Sciences 11, 1 (2001), 43–53.
21. Stiller, C., Ros, F., and Ament, C. Towards spatial awareness in recommender systems. In Proceedings of the International Conference for Internet Technology and Secured Transactions (London, U.K., Nov. 9–12). IEEE, Piscataway, NJ, 2009, 1–7.
22. Weakliam, J., Bertolotto, M., and Wilson, D. Implicit interaction profiling for recommending spatial content. In Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems (Bremen, Germany, Nov. 4–5). ACM Press, New York, 2005, 285–294.
23. Weakliam, J., Wilson, D., and Bertolotto, M. Personalising map feature content for mobile map users. In Map-based Mobile Services: Theories, Methods and Implementations, L. Meng, A. Zipf, and T. Reichenbacher, Eds. Springer, Berlin, Germany, 2008, 125–145.
24. Wilson, D., Bertolotto, M., and Weakliam, J. Personalizing map content to improve task completion efficiency. International Journal of Geographical Information Science 24, 5 (2010), 741–760.
25. Zheng, Y. and Zhou, X., Eds. Computing with Spatial Trajectories. Database Management & Information Retrieval. Springer, Berlin, Germany, 2011.

Andrea Ballatore ([email protected]) is a postdoctoral researcher and research coordinator in the Center for Spatial Studies at the University of California, Santa Barbara.

Michela Bertolotto ([email protected]) is a senior lecturer in the School of Computer Science and Informatics at University College Dublin, Ireland.

© 2015 ACM 0001-0782/15/12 $15.00

Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/personalizing-maps

DOI:10.1145/2699407

Propositions as Types

Connecting mathematical logic and computation, it ensures that some aspects of programming are absolute.

BY PHILIP WADLER

POWERFUL INSIGHTS ARISE from linking two fields of study previously thought separate. Examples include Descartes’s coordinates, which link geometry to algebra, Planck’s Quantum Theory, which links particles to waves, and Shannon’s Information Theory, which links thermodynamics to communication. Such a synthesis is offered by the principle of Propositions as Types, which links logic to computation. At first sight it appears to be a simple coincidence, almost a pun, but it turns out to be remarkably robust, inspiring the design of automated proof assistants and programming languages, and continuing to influence the forefronts of computing.

key insights
• Propositions as Types observes a deep correspondence between logic and computation: propositions in a logic correspond to types in a programming language; proofs of propositions correspond to programs of the corresponding type; and simplification of proofs corresponds to evaluation of programs.
• Propositions as Types is broadly applicable, applying to a wide variety of logics (intuitionistic, second-order, classical, linear) and of language features (lambda calculus, parametric polymorphism, continuations, concurrency).
• Often the same ideas are discovered independently by logicians and computer scientists, demonstrating some aspects of programming language design are not arbitrary but absolute.

Propositions as Types is a notion with many names and many origins. It is closely related to the BHK Interpretation, a view of logic developed by the intuitionists Brouwer, Heyting, and Kolmogorov in the 1930s. It is often referred to as the Curry–Howard Isomorphism, referring to a correspondence observed by Curry in 1934 and refined by Howard in 1969. Others draw attention to significant contributions from de Bruijn’s Automath and Martin-Löf’s Type Theory in the 1970s.

Propositions as Types is a notion with depth. It describes a correspondence between a given logic and a given programming language. At the surface, it says that for each proposition in the logic there is a corresponding type in the programming language, and vice versa. Thus we have

propositions as types.

It goes deeper, in that for each proof of a given proposition, there is a program of the corresponding type, and vice versa. Thus we also have

proofs as programs.
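As an illustration only (this sketch is not from the article), the Python below reads two ordinary functions as proofs. The names `identity` and `compose` are hypothetical. Python type hints are not enforced at run time, so the reading assumes a static checker such as mypy, and it holds only loosely: a Python function may raise an exception or loop forever, which a proof cannot.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def identity(a: A) -> A:
    """A program of type A -> A, read as a proof of the proposition A ⊃ A."""
    return a


def compose(f: Callable[[A], B]) -> Callable[[Callable[[B], C]], Callable[[A], C]]:
    """A program of type (A -> B) -> ((B -> C) -> (A -> C)),
    read as a proof of (A ⊃ B) ⊃ ((B ⊃ C) ⊃ (A ⊃ C))."""
    def given_g(g: Callable[[B], C]) -> Callable[[A], C]:
        def given_a(a: A) -> C:
            # From a proof of A, f yields a proof of B, and g turns that into a proof of C.
            return g(f(a))
        return given_a
    return given_g
```

For example, `compose(len)(lambda n: n * 2)("abc")` evaluates to 6: the “proof” of A ⊃ C is obtained by chaining the proofs of A ⊃ B and B ⊃ C.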


And it goes deeper still, in that for each way to simplify a proof there is a corresponding way to evaluate a program, and vice versa. Thus we further have

simplification of proofs as evaluation of programs.

Hence, we have not merely a shallow bijection between propositions and types but a true isomorphism, preserving the deep structure of proofs and programs, simplifications, and evaluation.

Propositions as Types is a notion with breadth. It applies to a range of logics, including propositional, predicate, second-order, intuitionistic, classical, modal, and linear. It underpins the foundations of functional programming, explaining features including functions, records, variants, parametric polymorphism, data abstraction, continuations, monads, linear types, and session types. It has inspired automated proof assistants and programming languages, including Agda, Automath, Coq, Epigram, F#, F*, Haskell, LF, ML, NuPRL, Scala, Singularity, and Trellys.

Propositions as Types is a notion with mystery. Why should it be the case that intuitionistic natural deduction, as developed by Gentzen in the 1930s, and simply typed lambda calculus, as developed by Church around the same time for an unrelated purpose, should be discovered 30 years later to be essentially identical? And why should it be the case that the same correspondence arises again and again? The logician Hindley and the computer scientist Milner independently developed the same type system, now dubbed Hindley–Milner. The logician Girard and the computer scientist Reynolds independently developed the same calculus, now dubbed Girard–Reynolds. Curry–Howard is a double-barreled name that ensures the existence of other double-barreled names. Those of us who design and use programming languages may often feel they are arbitrary, but Propositions as Types assures us some aspects of programming are absolute.

(See the online appendix, which contains a full version of this article, along with additional details and references, plus a historic note provided by William Howard.)

Church and the Theory of Computation

The origins of logic lie with Aristotle and the stoics in classical Greece, Ockham and the scholastics in the middle ages, and Leibniz’s vision of a calculus ratiocinator at the dawn of the enlightenment. Our interest in the subject lies with formal logic, which emerged from the contributions of Boole, De Morgan, Frege, Peirce, Peano, and others in the 19th century.

As the 20th century dawned, Whitehead and Russell’s Principia Mathematica demonstrated formal logic could express a large part of mathematics. Inspired by this vision, Hilbert and his colleagues at Göttingen became the leading proponents of formal logic, aiming to put it on a firm foundation.

One goal of Hilbert’s Program was to solve the Entscheidungsproblem (decision problem), that is, to develop an “effectively calculable” procedure to determine the truth or falsity of any statement. The problem presupposes completeness: that for any statement, either it or its negation possesses a proof. In his address to the 1930 Mathematical Congress in Königsberg, Hilbert affirmed his belief in this principle, concluding “Wir müssen wissen, wir werden wissen” (“We must know, we will know”), words later engraved on his tombstone. Perhaps a tombstone is an appropriate place for these words, given that any basis for Hilbert’s optimism had been undermined the day before, when at the selfsame conference Gödel18 announced his proof that arithmetic is incomplete.

While the goal was to satisfy Hilbert’s program, no precise definition of “effectively calculable” was required. It would be clear whether a given procedure was effective or not, like Justice Stewart’s characterization of obscenity, “I know it when I see it.” But to show the Entscheidungsproblem undecidable required a formal definition of “effectively calculable.”

One can find allusions to the concept of algorithm in the work of Euclid and, eponymously, al-Khwarizmi, but the concept was formalized only in the 20th century, and then simultaneously received three independent definitions by logicians. Like buses, you wait 2,000 years for a definition of “effectively calculable,” and then three come along at once. The three were lambda calculus, published in 1936 by Church;7 recursive functions, proposed by Gödel at lectures in Princeton in 1934 and published in 1936 by Kleene;24 and Turing machines, published in 1937 by Turing.36

Lambda calculus was introduced by Church at Princeton, and further developed by his students Rosser and Kleene. At that time, Princeton rivaled Göttingen as a center for the study of logic. The Institute for Advanced Study was co-located with the Mathematics Department in Fine Hall. In 1933, Einstein and von Neumann joined the Institute, and Gödel arrived for a visit.

Logicians have long been concerned with the idea of function. Lambda calculus provides a concise notation for functions, including “first-class” functions that may appear as arguments or results of other functions. It is remarkably compact, containing only three constructs: variables, function abstraction, and function application. Church6 at first introduced lambda calculus as a way to define notations for logical formulas (almost like a macro language) in a new presentation of logic. All forms of bound variable could be subsumed to lambda binding; for instance, instead of ∃x. A[x], Church wrote Σ(λx. A[x]). However, it was later discovered by Kleene and Rosser that Church’s system was inconsistent. By this time, Church and his students had realized that the system was of independent interest. Church had foreseen this possibility in his first paper on the subject, where he wrote, “There may, indeed, be other applications of the system than its use as a logic.”

Church discovered a way of encoding numbers as terms of lambda calculus. The number n is represented by a function that accepts a function f and a value x, and applies the function to the value n times; for instance, the number three is λf. λx. f (f (f (x))). With this representation, it is easy to encode lambda terms that can add or multiply, but it was not clear how to encode the predecessor function, which finds the number one less than a given number. One day in the dentist’s office, Kleene suddenly saw how
to define predecessor.23 Once this hurdle was overcome, Church and his students soon became convinced any “effectively calculable” function of numbers could be represented by a term in the lambda calculus.

Church proposed λ-definability as the definition of “effectively calculable,” what we now know as Church’s Thesis, and demonstrated there was a problem whose solution was not λ-definable, that of determining whether a given λ-term has a normal form, what we now know as the Halting Problem. A year later, he demonstrated there was no λ-definable solution to the Entscheidungsproblem.

In 1933, Gödel arrived for a visit at Princeton. He was unconvinced by Church’s contention that every effectively calculable function was λ-definable. Church responded by offering that if Gödel would propose a different definition, then Church would “undertake to prove it was included in λ-definability.” In a series of lectures at Princeton in 1934, based on a suggestion of Herbrand, Gödel proposed what came to be known as “general recursive functions” as his candidate for effective calculability. Kleene took notes and published the definition.24 Church and his students soon determined that the two definitions are equivalent; every general recursive function is λ-definable, and vice versa. Rather than mollifying Gödel, this result caused him to doubt his own definition was correct! Things stood at an impasse.

Meanwhile, at Cambridge, Turing, a student of Max Newman, independently formulated his own notion of “effectively calculable” in the form of what we now call a Turing machine, and used it to show the Entscheidungsproblem undecidable. Before the paper was published, Newman was dismayed to discover Turing had been scooped by Church. However, Turing’s approach was sufficiently different from Church’s to merit independent publication. Turing hastily added an appendix sketching the equivalence of λ-definability to his machines, and his paper36 appeared in print a year after Church’s, when Turing was 23. Newman arranged for Turing to travel to Princeton, where he completed a doctorate under Church’s supervision.

Turing’s most significant difference from Church was not in logic or mathematics but in philosophy. Whereas Church merely presented the definition of λ-definability and baldly claimed it corresponded to effective calculability, Turing undertook an analysis of the capabilities of a “computer” (at this time, the term referred to a human performing a computation assisted by paper and pencil). Turing argued that the number of symbols must be finite (for if infinite, some symbols would be arbitrarily close to each other and undistinguishable), that the number of states of mind must be finite (for the same reason), and that the number of symbols under consideration at one moment must be bounded (“We cannot tell at a glance whether 9999999999999999 and 999999999999999 are the same”). Later, Gandy14 would point out that Turing’s argument amounts to a theorem asserting any computation a human with paper and pencil can perform can also be performed by a Turing machine. It was Turing’s argument that finally convinced Gödel; since λ-definability, recursive functions, and Turing machines had been proved equivalent, he now accepted that all three defined “effectively calculable.”

As mentioned, Church’s first use of lambda calculus was to encode formulas of logic, but this encoding had to be abandoned because it led to inconsistency. The failure arose for a reason related to Russell’s paradox, namely that the system allowed a predicate to act on itself, and so Church adapted a solution similar to Russell’s, that of classifying terms according to types. Church’s simply typed lambda calculus ruled out self-application, permitting lambda calculus to support a consistent logical formulation.8

Whereas self-application in Russell’s logic leads to paradox, self-application in Church’s untyped lambda calculus leads to non-terminating computations. Conversely, Church’s simply typed lambda calculus guarantees every term has a normal form, that is, corresponds to a computation that halts.

Untyped lambda calculus or typed lambda calculus with a construct for
general recursion (sometimes called a fixpoint operator) permits the definition of any effectively computable function but has a Halting Problem that is unsolvable. Typed lambda calculus without a construct for general recursion has a Halting Problem that is trivial (every program halts!) but cannot define some effectively computable functions. Both kinds of calculus have their uses, depending on the intended application.

Gentzen and the Theory of Proof

A second goal of Hilbert’s program was to establish the consistency of various logics. If a logic is inconsistent, it can derive any formula, rendering it useless.

In 1935, at the age of 25, Gentzen15 introduced not one but two new formulations of logic, natural deduction and sequent calculus, that became established as the two major systems for formulating a logic and remain so to this day. He showed how to normalize proofs to ensure they were not “roundabout,” yielding a new proof of the consistency of Hilbert’s system. And, to top it off, to match the use of the symbol ∃ for the existential quantification introduced by Peano, Gentzen introduced the symbol ∀ to denote universal quantification. He wrote implication as A ⊃ B (if A holds, then B holds), conjunction as A & B (both A and B hold), and disjunction as A ∨ B (at least one of A or B holds).

Gentzen’s insight was that proof rules should come in pairs, a feature not present in earlier systems (such as Hilbert’s). In natural deduction, these are introduction and elimination pairs. An introduction rule specifies under what circumstances one may assert a formula with a logical connective (for instance, to prove A ⊃ B, one may assume A and then must prove B), while the corresponding elimination rule shows how to use that logical connective (for instance, from a proof of A ⊃ B and a proof of A, one may deduce B, a property dubbed modus ponens in the middle ages). As Gentzen noted, “The introductions represent, as it were, the ‘definitions’ of the symbols concerned, and the eliminations are no more, in the final analysis, than the consequences of these definitions.”

A consequence of this insight was that any proof could be normalized to one that is not “roundabout,” where “no concepts enter into the proof other than those contained in the final result.” For example, in a normalized proof of the formula A & B, the only formulas that may appear are A & B itself, its subformulas A and B, and the subformulas of A and B themselves. No other formula (such as (B & A) ⊃ (A & B) or A ∨ B) may appear; this is called the Subformula Principle. An immediate consequence was consistency. It is a contradiction to prove false, written ⊥. The only way to derive a contradiction is to prove, say, both A ⊃ ⊥ and A for some formula A. But given such a proof, one could normalize it to one containing only subformulas of its conclusion, ⊥. But ⊥ has no subformulas! It is like the old saw, “What part of no don’t you understand?”

Logicians became interested in normalization of proofs because of its role in establishing consistency. Gentzen preferred the system of Natural Deduction because it was, in his view, more natural. He introduced Sequent Calculus mainly as a technical device for proving the Subformula Principle, though it has independent interest. It is an irony that Gentzen was required to introduce Sequent Calculus in order to prove the Subformula Principle for Natural Deduction. He needed a roundabout proof to show the absence of roundabout proofs!

Later, in 1965, Prawitz showed how to prove the Subformula Principle directly, by introducing a way to simplify Natural Deduction proofs; and this set the ground for Howard’s work described in the next section.

Propositions as Types

In 1934, Curry observed a curious fact, relating a theory of functions to a theory of implication.11 Every type of a function (A → B) could be read as a proposition (A ⊃ B), and under this reading the type of any given function would always correspond to a provable proposition. Conversely, for every provable proposition there was a function with the corresponding type.

In 1969, Howard circulated a xeroxed manuscript;22 it was not published until 1980, when it appeared in a Festschrift dedicated to Curry. Motivated by Curry’s observation, Howard pointed out there is a similar correspondence between natural deduction, on the one hand, and simply typed lambda calculus, on the other, and he made explicit the third and deepest level of the correspondence, as described in the introduction, that simplification of proofs corresponds to evaluation of programs. Howard showed the correspondence extends to the other logical connectives, conjunction and disjunction, by extending his lambda calculus with constructs that represent pairs and disjoint sums. Just as proof rules come in introduction and elimination pairs, so do typing rules; introduction rules correspond to ways to define or construct a value of the given type, and elimination rules correspond to ways to use or deconstruct values of the given type.

We can describe Howard’s observation as follows:

• Conjunction. Conjunction A & B corresponds to Cartesian product A × B, or a record with two fields, also known as a pair. A proof of the proposition A & B consists of a proof of A and a proof of B. Similarly, a value of type A × B consists of a value of type A and a value of type B.
• Disjunction. Disjunction A ∨ B corresponds to a disjoint sum A + B, or a variant with two alternatives. A proof of the proposition A ∨ B consists of either a proof of A or a proof of B, including an indication of which of the two has been proved. Similarly, a value of type A + B consists of either a value of type A or a value of type B, including an indication of whether this is a left or right summand.
• Implication. Implication A ⊃ B corresponds to function space A → B. A proof of the proposition A ⊃ B consists of a procedure that given a proof of A yields a proof of B. Similarly, a value of type A → B consists of a function that when applied to a value of type A returns a value of type B.

This reading of proofs goes back to the intuitionists and is often called the BHK interpretation, named for Brouwer, Heyting, and Kolmogorov.
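The three bullets can be sketched in Python, again as a hedged illustration rather than anything in Howard’s paper. Tuples stand for products, two hypothetical classes `Left` and `Right` stand for the two injections of a disjoint sum (aliased here as the hypothetical `Sum`), and callables stand for function spaces; each function’s type, read through the bullets, is the proposition it proves. The names `swap`, `swap_sum`, and `modus_ponens` are illustrative, and the types are checked only by an external tool such as mypy, not at run time.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Left(Generic[A]):
    """Injection into the left summand: evidence that the left disjunct holds."""
    value: A


@dataclass(frozen=True)
class Right(Generic[B]):
    """Injection into the right summand: evidence that the right disjunct holds."""
    value: B


# Disjoint sum A + B: every value is tagged Left or Right, so we always
# know which of the two disjuncts has been proved.
Sum = Union[Left[A], Right[B]]


def swap(z: tuple[B, A]) -> tuple[A, B]:
    """(B & A) ⊃ (A & B): take the pair apart and rebuild it the other way round."""
    return (z[1], z[0])


def swap_sum(s: Sum[A, B]) -> Sum[B, A]:
    """(A ∨ B) ⊃ (B ∨ A): case analysis on the tag, then inject on the other side."""
    if isinstance(s, Left):
        return Right(s.value)
    return Left(s.value)


def modus_ponens(f: Callable[[A], B], a: A) -> B:
    """From A ⊃ B and A, conclude B (the rule ⊃-E): elimination is application."""
    return f(a)
```

Here `swap` plays the role of the program λz. ⟨π2 z, π1 z⟩ that the article builds in Figure 6, and `swap((y, x))` returns `(x, y)`, mirroring the evaluation in Figure 8.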

Brouwer founded intuitionism, and Heyting and Kolmogorov formalized intuitionistic logic and developed the interpretation in the 1920s and 1930s. Realizability, introduced by Kleene in the 1940s, is based on a similar interpretation.

Given the intuitionistic reading of proofs, it hardly seems surprising that intuitionistic natural deduction and lambda calculus should correspond so closely. But it was not until Howard that the correspondence was laid out clearly, in a way that allowed working logicians and computer scientists to put it to use.

Howard’s paper22 divides into two halves. The first half explains a correspondence between two well-understood concepts, the propositional connectives &, ∨, ⊃ on the one hand and the computational types ×, +, → on the other hand. The second half extends this analogy, and for well-understood concepts from logic proposes new concepts for types that correspond to them. In particular, Howard proposes that the predicate quantifiers ∀ and ∃ correspond to new types we now call “dependent types.”

With the introduction of dependent types, every proof in predicate logic can be represented by a term of a suitable typed lambda calculus. Mathematicians and computer scientists proposed numerous systems based on this concept, including de Bruijn’s Automath,13 Martin-Löf’s type theory,26 Bates and Constable’s PRL and nuPRL,2 and Coquand and Huet’s Calculus of Constructions,9 which developed into the Coq proof assistant.

Applications include CompCert, a certified compiler for the C programming language verified in Coq; a computer-checked proof of the four-color theorem also verified in Coq; parts of the Ensemble distributed system verified in NuPRL; and 20,000 lines of browser plug-ins verified in F*.

de Bruijn’s work was independent of Howard’s, but Howard directly inspired Martin-Löf and all the other work listed earlier. Howard was (justly!) proud of his paper, citing it as one of the two great achievements of his career.34

Intuitionistic Logic

In Gilbert and Sullivan’s The Gondoliers, Casilda is told that as an infant she was married to the heir of the King of Batavia, but that due to a mix-up no one knows which of two individuals, Marco or Giuseppe, is the heir. Alarmed, she wails, “Then do you mean to say that I am married to one of two gondoliers, but it is impossible to say which?” To which the response is “Without any doubt of any kind whatever.”

Logic comes in many varieties, and one distinction is between “classical” and “intuitionistic.” Intuitionists, concerned by cavalier assumptions made by some logicians about the nature of infinity, insist upon a constructionist notion of truth. In particular, they insist that a proof of A ∨ B must show which of A or B holds, and hence they would reject the claim that Casilda is married to Marco or Giuseppe until one of the two was identified as her husband. Perhaps Gilbert and Sullivan anticipated intuitionism, for their story’s outcome is that the heir turns out to be a third individual, Luiz, with whom Casilda is, conveniently, already in love.

Intuitionists also reject the law of the excluded middle, which asserts A ∨ ¬A for every A, since the law gives no clue as to which of A or ¬A holds. Heyting formalized a variant of Hilbert’s classical logic that captures the intuitionistic notion of provability. In particular, the law of the excluded middle is provable in Hilbert’s logic, but not in Heyting’s. Further, if the law of the excluded middle is added as an axiom to Heyting’s logic, then it becomes equivalent to Hilbert’s.

Propositions as Types was first formulated for intuitionistic logic. It is a perfect fit, because in the intuitionist interpretation the formula A ∨ B is provable exactly when one exhibits either a proof of A or a proof of B, so the type corresponding to disjunction is a disjoint sum.

Other Logics, Other Computation

The principle of Propositions as Types would be remarkable even if it applied only to one variant of logic and one variant of computation. How much more remarkable, then, that it applies to a wide variety of logics and of computation.

Quantification over propositional variables in second-order logic corresponds to type abstraction in second-order lambda calculus. For this reason, the second-order lambda calculus was discovered twice, once by the logician Girard16 and once by the computer scientist Reynolds.33 And for the same reason, a similar system that supports principal type inference was also discovered twice, once by the logician Hindley20 and once by the computer scientist Milner.27 Building on the correspondence, Mitchell and Plotkin28 observed existential quantification in second-order logic corresponds precisely to data abstraction, an idea that now underpins much research in the semantics of programming languages. The design of generic types in Java and C# draws directly upon Girard–Reynolds, while the type systems of functional languages, including ML and Haskell, are based on Hindley–Milner. Philosophers might argue as to whether mathematical systems are “discovered” or “devised,” but the same system arising in two different contexts argues that here the correct word is “discovered.”

Two major variants of logic are intuitionistic and classical. Howard’s original paper observed a correspondence with intuitionistic logic. Not until two decades later was the correspondence extended to also apply to classical logic, when Griffin19 observed that Peirce’s Law in classical logic provides a type for the call/cc operator of Scheme. Murthy31 went on to note that Kolmogorov and Gödel’s double-negation translation, widely used to relate intuitionistic and classical logic, corresponds to the continuation-passing style transformation widely used by both semanticists and implementers of lambda calculus. Parigot,32 Curien and Herbelin,10 and Wadler39 introduced various computational calculi motivated by correspondences to classical logic.

Modal logic permits propositions to be labeled as “necessarily true” or “possibly true.” Clarence Lewis introduced modal logic in 1910, and his 1938 textbook25 describes five variants, S1–S5. Some claim each of these variants has an interpretation as a form of computation via Propositions as Types, and a down payment on this claim is given by an interpretation of S4 as staged computation due to Davies and Pfenning,12 and of S5 as spatially distributed computation due to Murphy et al.30

Moggi29 introduced monads as a
technique to explain the semantics of important features of programming languages such as state, exceptions, and input–output. Monads became widely adopted in the functional language Haskell and later migrated into other languages, including Clojure, Scala, F#, and C#. Benton et al.3 observed that monads correspond to yet another modal logic, differing from all of S1–S5.

In classical, intuitionistic, and modal logic, any hypothesis can be used an arbitrary number of times: zero, once, or many. Linear logic, introduced in 1987 by Girard,17 requires that each hypothesis is used exactly once. Linear logic is “resource conscious” in that facts may be used up and superseded by other facts, suiting it for reasoning about the world where situations change. Computational aspects of linear logic are discussed by Abramsky1 and Wadler,38 among many others.

Most recently, Session Types, a way of describing communication protocols introduced by Honda,21 have been related to intuitionistic linear logic by Caires and Pfenning,4 and to classical linear logic by Wadler.40 Propositions as Types remains a topic of active research.

Natural Deduction

We now turn to a more formal development, presenting a fragment of natural deduction and a fragment of typed lambda calculus in a style that makes clear the connection between the two.

We begin with the details of natural deduction as defined by Gentzen15; the proof rules are shown in Figure 1. To simplify our discussion, we consider just two of the connectives of natural deduction. We write A and B as placeholders standing for arbitrary formulas. Conjunction is written A & B, and implication is written A ⊃ B.

Figure 1. Gerhard Gentzen (1935): Natural Deduction.
$$
\frac{A \qquad B}{A \mathbin{\&} B}\,\text{\&-I}
\qquad
\frac{A \mathbin{\&} B}{A}\,\text{\&-E}_1
\qquad
\frac{A \mathbin{\&} B}{B}\,\text{\&-E}_2
$$
$$
\frac{\begin{array}{c}[A]^{x}\\ \vdots\\ B\end{array}}{A \supset B}\,{\supset}\text{-I}^{x}
\qquad
\frac{A \supset B \qquad A}{B}\,{\supset}\text{-E}
$$

Figure 2. A proof.
$$
\dfrac{
  \dfrac{
    \dfrac{[B \mathbin{\&} A]^{z}}{A}\,\text{\&-E}_2
    \qquad
    \dfrac{[B \mathbin{\&} A]^{z}}{B}\,\text{\&-E}_1
  }{A \mathbin{\&} B}\,\text{\&-I}
}{(B \mathbin{\&} A) \supset (A \mathbin{\&} B)}\,{\supset}\text{-I}^{z}
$$

Figure 3. Simplifying proofs.
$$
\dfrac{
  \dfrac{\begin{array}{c}\vdots\\ A\end{array} \qquad \begin{array}{c}\vdots\\ B\end{array}}{A \mathbin{\&} B}\,\text{\&-I}
}{A}\,\text{\&-E}_1
\quad\Longrightarrow\quad
\begin{array}{c}\vdots\\ A\end{array}
$$
$$
\dfrac{
  \dfrac{\begin{array}{c}[A]^{x}\\ \vdots\\ B\end{array}}{A \supset B}\,{\supset}\text{-I}^{x}
  \qquad
  \begin{array}{c}\vdots\\ A\end{array}
}{B}\,{\supset}\text{-E}
\quad\Longrightarrow\quad
\begin{array}{c}\vdots\\ A\\ \vdots\\ B\end{array}
$$

Figure 4. Simplifying a proof.
$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{[B \mathbin{\&} A]^{z}}{A}\,\text{\&-E}_2
      \qquad
      \dfrac{[B \mathbin{\&} A]^{z}}{B}\,\text{\&-E}_1
    }{A \mathbin{\&} B}\,\text{\&-I}
  }{(B \mathbin{\&} A) \supset (A \mathbin{\&} B)}\,{\supset}\text{-I}^{z}
  \qquad
  \dfrac{B \qquad A}{B \mathbin{\&} A}\,\text{\&-I}
}{A \mathbin{\&} B}\,{\supset}\text{-E}
$$
$$
\Longrightarrow\quad
\dfrac{
  \dfrac{\dfrac{B \qquad A}{B \mathbin{\&} A}\,\text{\&-I}}{A}\,\text{\&-E}_2
  \qquad
  \dfrac{\dfrac{B \qquad A}{B \mathbin{\&} A}\,\text{\&-I}}{B}\,\text{\&-E}_1
}{A \mathbin{\&} B}\,\text{\&-I}
\quad\Longrightarrow\quad
\dfrac{A \qquad B}{A \mathbin{\&} B}\,\text{\&-I}
$$

We represent proofs by trees, where each node of the tree is an instance of a proof rule. Each proof rule consists of zero or more formulas written above a line, called the “premises,” and a single formula written below the line, called the “conclusion.” The interpretation of a rule is that when all the premises hold, then the conclusion follows.

The proof rules come in pairs, with rules to introduce and to eliminate each connective, labeled -I and -E, respectively. As we read the rules from top to bottom, introduction and elimination rules do what they say on the tin: The first “introduces” a formula for the connective, which appears in the conclusion but not in the premises; the second “eliminates” a formula for the connective, which appears in a premise but not in the conclusion. An introduction rule describes under what conditions we say the connective holds, that is, how to define the connective. An elimination rule describes what we may conclude when the connective holds, that is, how to use the connective.

The introduction rule for conjunction, &-I, states that if formula A holds and formula B holds, then the formula A & B must hold as well. There are two elimination rules for conjunction. The first, &-E1, states that if the formula A & B holds, then the formula A must hold as well. The second, &-E2, concludes B rather than A.

The introduction rule for implication, ⊃-I, states that if from the assumption that formula A holds we may derive the formula B, then we may conclude the formula A ⊃ B holds and discharge the assumption. To indicate that A is used as an assumption zero, once, or many times in the proof of B, we write A in brackets and tether it to B via ellipses. A proof is complete only when every assumption in it has been discharged by a corresponding use of ⊃-I, which is indicated by writing the same name (here x) as a superscript on each instance of the discharged assumption and on the discharging
rule. The elimination rule for implication, ⊃-E, states that if formula A ⊃ B holds and if formula A holds, then we may conclude formula B holds as well; as mentioned earlier, this rule also goes by the name modus ponens.

Critical readers will observe we use similar language to describe rules (“when-then”) and formulas (“implies”). The same idea applies at two levels, the meta level (rules) and the object level (formulas), and in two notations, using a line with premises above and conclusion below for implication at the meta level, and the symbol ⊃ with premise to the left and conclusion to the right at the object level. It is almost as if to understand implication one must first understand implication! This Zeno’s paradox of logic was wryly observed by Carroll.5 We need not let it disturb us; everyone possesses a good informal understanding of implication, which may act as a foundation for its formal description.

A proof of the formula

(B & A) ⊃ (A & B)

is shown in Figure 2; that is, if B and A hold, then A and B hold. This may seem so obvious as to be hardly deserving of proof! However, the formulas B ⊃ A and A ⊃ B have meanings that differ, and we need some formal way to conclude that the formulas B & A and A & B have meanings that are the same. This is what our proof shows, and it is reassuring it can be constructed from the rules we posit.

The proof reads as follows. From B & A we conclude A, by &-E2, and from B & A we also conclude B, by &-E1. From A and B we conclude A & B, by &-I. That is, from the assumption B & A (used twice) we conclude A & B. We discharge the assumption and conclude (B & A) ⊃ (A & B) by ⊃-I, linking the discharged assumptions to the discharging rule by writing z as a superscript on each.

Figure 5. Alonzo Church (1935): Lambda Calculus.
$$
\frac{M : A \qquad N : B}{\langle M, N \rangle : A \times B}\,{\times}\text{-I}
\qquad
\frac{L : A \times B}{\pi_1\, L : A}\,{\times}\text{-E}_1
\qquad
\frac{L : A \times B}{\pi_2\, L : B}\,{\times}\text{-E}_2
$$
$$
\frac{\begin{array}{c}[x : A]^{x}\\ \vdots\\ N : B\end{array}}{\lambda x.\, N : A \to B}\,{\to}\text{-I}^{x}
\qquad
\frac{L : A \to B \qquad M : A}{L\, M : B}\,{\to}\text{-E}
$$

Figure 6. A program.
$$
\dfrac{
  \dfrac{
    \dfrac{[z : B \times A]^{z}}{\pi_2\, z : A}\,{\times}\text{-E}_2
    \qquad
    \dfrac{[z : B \times A]^{z}}{\pi_1\, z : B}\,{\times}\text{-E}_1
  }{\langle \pi_2\, z, \pi_1\, z \rangle : A \times B}\,{\times}\text{-I}
}{\lambda z.\, \langle \pi_2\, z, \pi_1\, z \rangle : (B \times A) \to (A \times B)}\,{\to}\text{-I}^{z}
$$

Figure 7. Evaluating programs.
$$
\dfrac{
  \dfrac{\begin{array}{c}\vdots\\ M : A\end{array} \qquad \begin{array}{c}\vdots\\ N : B\end{array}}{\langle M, N \rangle : A \times B}\,{\times}\text{-I}
}{\pi_1\, \langle M, N \rangle : A}\,{\times}\text{-E}_1
\quad\Longrightarrow\quad
\begin{array}{c}\vdots\\ M : A\end{array}
$$
$$
\dfrac{
  \dfrac{\begin{array}{c}[x : A]^{x}\\ \vdots\\ N : B\end{array}}{\lambda x.\, N : A \to B}\,{\to}\text{-I}^{x}
  \qquad
  \begin{array}{c}\vdots\\ M : A\end{array}
}{(\lambda x.\, N)\, M : B}\,{\to}\text{-E}
\quad\Longrightarrow\quad
\begin{array}{c}\vdots\\ M : A\\ \vdots\\ N[M/x] : B\end{array}
$$

Figure 8. Evaluating a program.
$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{[z : B \times A]^{z}}{\pi_2\, z : A}\,{\times}\text{-E}_2
      \qquad
      \dfrac{[z : B \times A]^{z}}{\pi_1\, z : B}\,{\times}\text{-E}_1
    }{\langle \pi_2\, z, \pi_1\, z \rangle : A \times B}\,{\times}\text{-I}
  }{\lambda z.\, \langle \pi_2\, z, \pi_1\, z \rangle : (B \times A) \to (A \times B)}\,{\to}\text{-I}^{z}
  \qquad
  \dfrac{y : B \qquad x : A}{\langle y, x \rangle : B \times A}\,{\times}\text{-I}
}{(\lambda z.\, \langle \pi_2\, z, \pi_1\, z \rangle)\, \langle y, x \rangle : A \times B}\,{\to}\text{-E}
$$
$$
\Longrightarrow\quad
\dfrac{
  \dfrac{\dfrac{y : B \qquad x : A}{\langle y, x \rangle : B \times A}\,{\times}\text{-I}}{\pi_2\, \langle y, x \rangle : A}\,{\times}\text{-E}_2
  \qquad
  \dfrac{\dfrac{y : B \qquad x : A}{\langle y, x \rangle : B \times A}\,{\times}\text{-I}}{\pi_1\, \langle y, x \rangle : B}\,{\times}\text{-E}_1
}{\langle \pi_2\, \langle y, x \rangle, \pi_1\, \langle y, x \rangle \rangle : A \times B}\,{\times}\text{-I}
\quad\Longrightarrow\quad
\dfrac{x : A \qquad y : B}{\langle x, y \rangle : A \times B}\,{\times}\text{-I}
$$

Some proofs are unnecessarily roundabout. Rules for simplifying proofs appear in Figure 3, and an example appears in Figure 4. Let us focus on the example first.

The top of Figure 4 shows a larger proof built from the proof in Figure 2. The larger proof assumes as premises two formulas, B and A, and concludes with the formula A & B. However, rather
DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 81 contributed articles

than concluding it directly we derive the result in a roundabout way, in order to illustrate an instance of ⊃-E, modus ponens. The proof reads as follows: On the left is the proof given previously, concluding in (B & A) ⊃ (A & B); on the right, from B and A we conclude B & A by &-I. Combining these yields A & B by ⊃-E.

We may simplify the proof by applying the rewrite rules of Figure 3. These rules specify how to simplify a proof when an introduction rule is immediately followed by the corresponding elimination rule. Each rule shows two proofs connected by an arrow, indicating that the “redex” (the proof on the left) may be rewritten, or simplified, to yield the “reduct” (the proof on the right). Rewrites always take a valid proof to another valid proof.

For &, the redex consists of a proof of A and a proof of B that combine to yield A & B by &-I, which in turn yields A by &-E1. The reduct consists simply of the proof of A, discarding the unneeded proof of B. There is a similar rule, not shown, to simplify an occurrence of &-I followed by &-E2.

For ⊃, the redex consists of a proof of B from assumption A, which yields A ⊃ B by ⊃-I, and a proof of A, which combine to yield B by ⊃-E. The reduct consists of the same proof of B, but now with every occurrence of the assumption A replaced by the given proof of A. The assumption A may be used zero, once, or many times in the proof of B in the redex, so the proof of A may be copied zero, once, or many times in the proof of B in the reduct. For this reason, the reduct may be larger than the redex, but it will be simpler in the sense it has removed an unnecessary detour via the subproof of A ⊃ B.

We can think of the assumption of A in ⊃-I as a debt that is discharged by the proof of A provided in ⊃-E. The proof in the redex accumulates debt and pays it off later; while the proof in the reduct pays directly each time the assumption is used. Proof debt differs from monetary debt in that there is no interest, and the same proof may be duplicated freely as many times as needed to pay off an assumption, the very property that money, by being difficult to counterfeit, is designed to avoid!

Figure 4 demonstrates use of these rules to simplify a proof. The first proof contains an instance of ⊃-I followed by ⊃-E and is simplified by replacing each of the two assumptions of B & A on the left by a copy of the proof of B & A on the right. The result is the second proof, which, as a result of the replacement, now contains an instance of &-I followed by &-E2, and another instance of &-I followed by &-E1. Simplifying each of these yields the third proof, which derives A & B directly from the assumptions A and B and can be simplified no further.

It is not difficult to see that proofs in normal form satisfy the Subformula Principle: Every formula of such a proof must be a subformula of one of its undischarged assumptions or of its conclusion. The proof in Figure 2 and the final proof of Figure 4 both satisfy this property, while the first proof of Figure 4 does not, since (B & A) ⊃ (A & B) is not a subformula of A & B.

Lambda Calculus

We now turn our attention to the simply typed lambda calculus of Church8; the type rules are in Figure 5. To simplify our discussion, we take both products and functions as primitive types; Church’s original calculus contained only function types, with products as a derived construction. We now write A and B as placeholders for arbitrary types, and L, M, N as placeholders for arbitrary terms. Product types are written A × B, and function types are written A → B. Now, instead of formulas, our premises and conclusions are judgments of the form

M : A

indicating term M has type A.

Like proofs, we represent type derivations by trees, where each node of the tree is an instance of a type rule. Each type rule consists of zero or more judgments written above a line, called the “premises,” and a single judgment written below the line, called the “conclusion.” The interpretation of a rule is that when all the premises hold, then the conclusion follows.

Like proof rules, type rules come in pairs. An introduction rule describes how to define or construct a term of the given type, while an elimination rule describes how to use or deconstruct a term of the given type.

The introduction rule for products, ×-I, states that if term M has type A and term N has type B, then we may form the pair term ⟨M, N⟩ of product type A × B. There are two elimination rules for products. The first, ×-E1, states that if term L has type A × B, then we may form the term π1 L of type A, which selects the first component of the pair. The second, ×-E2, is similar, save that it forms the term π2 L of type B.

The introduction rule for functions, →-I, states that if given a variable x of type A we have formed a term N of type B, then we may form the lambda term λx. N of function type A → B. The variable x appears free in N and bound in λx. N. Undischarged assumptions correspond to free variables, while discharged assumptions correspond to bound variables. To indicate that the variable x may appear zero, once, or many times in the term N, we write x : A in brackets and tether it to N : B via ellipses. A term is closed only when every variable in it is bound by a corresponding λ term. The elimination rule for functions, →-E, states that given term L of type A → B and term M of type A we may form the application term L M of type B.

For natural deduction, we noted earlier there might be confusion between implication at the meta level and at the object level. For lambda calculus the distinction is clearer, as we have implication at the meta level (if terms above the line are well typed, then so are terms below) but functions at the object level (a function has type A → B because if it is passed a value of type A then it returns a value of type B). What previously had been discharge of assumptions (perhaps a slightly diffuse concept) becomes binding of variables (a concept understood by most computer scientists).

The reader will have observed a striking similarity between Gentzen’s rules from the preceding section and Church’s rules from this section; ignoring the terms in Church’s rules then they are identical if one replaces & by × and ⊃ by →. The coloring of the rules is chosen to highlight the similarity.

A program of type

(B × A) → (A × B)

is shown in Figure 6. Whereas the difference between B & A and A & B appears a mere formality, the difference between


B × A and A × B is easier to appreciate; converting the latter to the former requires swapping the elements of the pair, which is precisely the task performed by the program corresponding to our former proof.

The program reads as follows. From variable z of type B × A we form term π2 z of type A by ×-E2 and also term π1 z of type B by ×-E1. From these two terms we form the pair ⟨π2 z, π1 z⟩ of type A × B by ×-I. Finally, we bind the free variable z to form the lambda term λz. ⟨π2 z, π1 z⟩ of type (B × A) → (A × B) by →-I, connecting the bound typings to the binding rule by writing z as a superscript on each. The function accepts a pair and swaps its elements, exactly as described by its type.

A program may be evaluated by rewriting. Rules for evaluating programs appear in Figure 7, and an example appears in Figure 8. Let us focus on the example first.

The top of Figure 8 shows a larger program built from the program in Figure 6. The larger program has two free variables, y of type B and x of type A, and constructs a value of type A × B. However, rather than constructing it directly we reach the result in a roundabout way, in order to illustrate an instance of →-E, function application. The program reads as follows: On the left is the program given previously, forming a function of type (B × A) → (A × B). On the right, from B and A we form the pair ⟨y, x⟩ of type B × A by ×-I. Applying the function to the pair forms a term of type A × B by →-E.

We may evaluate this program by applying the rewrite rules of Figure 7. These rules specify how to rewrite a term when an introduction rule is immediately followed by the corresponding elimination rule. Each rule shows two derivations connected by an arrow, indicating the “redex” (the term on the left) may be rewritten, or evaluated, to yield the “reduct” (the term on the right). Rewrites always take a valid type derivation to another valid type derivation, ensuring rewrites preserve types, a property known as “subject reduction” or “type soundness.”

For ×, the redex consists of term M of type A and term N of type B that combine to yield term ⟨M, N⟩ of type A × B by ×-I, which in turn yields term π1 ⟨M, N⟩ of type A by ×-E1. The reduct consists simply of term M of type A, discarding the unneeded term N of type B. There is a similar rule, not shown, to rewrite an occurrence of ×-I followed by ×-E2.

For →, the redex consists of a derivation of term N of type B from variable x of type A, which yields the lambda term λx. N of type A → B by →-I, and a derivation of term M of type A, which combine to yield the application (λx. N) M of type B by →-E. The reduct consists of the term N[M/x], which replaces each free occurrence of the variable x in term N by term M. Further, if in the derivation that N has type B we replace each assumption that x has type A by the derivation that M has type A, we get a derivation showing N[M/x] has type B. Since the variable x may appear zero, once, or many times in the term N, the term M may be copied zero, once, or many times in the reduct N[M/x]. For this reason, the reduct may be larger than the redex, but it will be simpler in the sense it has removed a subterm of type A → B. Discharge of assumptions thus corresponds to applying a function to its argument.

Figure 8 demonstrates use of these rules to evaluate a program. The first program contains an instance of →-I followed by →-E, and is rewritten by replacing each of the two occurrences of z of type B × A on the left by a copy of the term ⟨y, x⟩ of type B × A on the right. The result is the second program, which, as a result of the replacement, now contains an instance of ×-I followed by ×-E2, and another instance of ×-I followed by ×-E1. Rewriting each of these yields the third program, which derives the term ⟨x, y⟩ of type A × B, and can be evaluated no further.

Hence, simplification of proofs corresponds exactly to evaluation of programs, in this instance demonstrating that applying the function to the pair indeed swaps its elements.

Conclusion

Propositions as Types informs our view of the universality of certain programming languages.

The Pioneer spaceship includes a plaque designed to communicate with aliens, if any should ever intercept it (see Figure 9). They may find some parts of it easier to interpret than others. A radial diagram shows the distance of 14 pulsars and the center of our galaxy from Sol. Aliens are likely to determine the length of each line is proportional to the distances to each body. Another diagram shows humans in front of a silhouette of Pioneer. If Star Trek gives an accurate conception of alien species, they may respond, “They look just like us, except they lack pubic hair.” However, if the aliens’ perceptual system differs greatly from our own, they may be unable to decipher these squiggles.

Figure 9. Plaque on Pioneer spaceship.

What would happen if we tried to communicate with aliens by transmitting a computer program? In the movie Independence Day, the heroes destroy the invading alien mothership by infecting


it with a computer virus. Close inspection of the transmitted program shows it contains curly braces; it is written in a dialect of C! It is unlikely that alien species would program in C and unclear that aliens could decipher a program written in C if presented with one.

What about lambda calculus? Propositions as Types tells us lambda calculus is isomorphic to natural deduction. It seems difficult to conceive of alien beings who do not know the fundamentals of logic, and we might expect the problem of deciphering a program written in lambda calculus to be closer to the problem of understanding the radial diagram of pulsars than of understanding the image of a man and a woman on the Pioneer plaque.

We might be tempted to conclude lambda calculus is universal, but first ponder the suitability of the word “universal.” These days, the multiple-worlds interpretation of quantum physics is widely accepted. Scientists imagine that in different universes one might encounter different fundamental constants (such as the strength of gravity or the Planck constant). But easy as it may be to imagine a universe where gravity differs, it is difficult to conceive of a universe where fundamental rules of logic fail to apply. Natural deduction, and hence lambda calculus, should not only be known by aliens throughout our universe but also throughout others. So we may conclude it would be a mistake to characterize lambda calculus as a universal language, because calling it universal would be too limiting.

Acknowledgments

Thank you to Gershom Bazerman, Pete Bevin, Guy Blelloch, Rintcius Blok, Ezra Cooper, Ben Darwin, Benjamin Denckla, Peter Dybjer, Johannes Emerich, Martin Erwig, Yitz Gale, Mikhail Glushenkov, Gabor Greif, Vinod Grover, Sylvain Henry, Philip Hölzenspies, William Howard, John Hughes, Colin Lupton, Daniel Marsden, Craig McLaughlin, Tom Moertel, Simon Peyton-Jones, Benjamin Pierce, Lee Pike, Andrés Sicard-Ramírez, Scott Rostrup, Dann Toliver, Moshe Vardi, Jeremy Yallop, Richard Zach, Leo Zovik, and the referees. This work was funded under Engineering and Physical Sciences Research Council grant EP/K034413/1.

References

1. Abramsky, S. Computational interpretations of linear logic. Theoretical Computer Science 111, 1–2 (1993), 3–57.
2. Bates, J.L., Constable, R.L. Proofs as programs. Transactions on Programming Languages and Systems 7, 1 (Jan. 1985), 113–136.
3. Benton, P.N., Bierman, G.M., de Paiva, V. Computational types from a logical perspective. Journal of 8, 2 (1998), 177–193.
4. Caires, L., Pfenning, F. Session types as intuitionistic linear propositions. In Proceedings of the 21st International Conference on Concurrency Theory (Paris, France, Aug. 31–Sept. 3, 2010), 222–236.
5. Carroll, L. What the Tortoise said to Achilles. Mind 4, 14 (Apr. 1895), 278–280.
6. Church, A. A set of postulates for the foundation of logic. Ann. Math. 33, 2 (1932), 346–366.
7. Church, A. An unsolvable problem of elementary number theory. American Journal of Mathematics 58, 2 (Apr. 1936), 345–363; presented to the American Mathematical Society, Apr. 19, 1935; abstract in Bulletin of the American Mathematical Society 41 (May 1935).
8. Church, A. A formulation of the simple theory of types. Journal of Symbolic Logic 5, 2 (June 1940), 56–68.
9. Coquand, T., Huet, G.P. The calculus of constructions. Information and Computation 76, 2/3 (1988), 95–120.
10. Curien, P.-L., Herbelin, H. The duality of computation. In Proceedings of the International Conference on Functional Programming (Montreal, Canada, Sept. 18–20). ACM Press, New York, 2000, 233–243.
11. Curry, H.B. Functionality in combinatory logic. Proceedings of the National Academy of Science 20 (1934), 584–590.
12. Davies, R., Pfenning, F. A modal analysis of staged computation. In Principles of Programming Languages (St. Petersburg Beach, FL, 1996), 258–270.
13. de Bruijn, N.G. The mathematical language Automath, its usage, and some of its extensions. In Proceedings of the Symposium on Automatic Demonstration, Volume 125 of Lecture Notes in Computer Science (Versailles, France, Dec.). Springer-Verlag, 1968, 29–61.
14. Gandy, R. The confluence of ideas in 1936. In The Universal Turing Machine: A Half-century Survey, R. Herken, Ed. Springer, 1995, 51–102.
15. Gentzen, G. Untersuchungen über das logische Schließen. Math. Z. 39, 2–3 (1935), 176–210, 405–431; reprinted in Szabo.35
16. Girard, J.Y. Interprétation functionelle et élimination des coupures dans l’arithmétique d’ordre supérieure. Thèse d’État, Université Paris VII, 1972.
17. Girard, J.-Y. Linear logic. Theoretical Computer Science 50 (1987), 1–102.
18. Gödel, K. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik 38 (1931), 173–198; reprinted in Heijenoort.37
19. Griffin, T. A formulae-as-types notion of control. In Proceedings of the 40th Annual Symposium on Principles of Programming Languages (Rome, Italy, Jan. 23–25). ACM Press, New York, 1990, 47–58.
20. Hindley, R. The principal type scheme of an object in combinatory logic. Transactions of the American Mathematical Society 146 (Dec. 1969), 29–60.
21. Honda, K. Types for dyadic interaction. In Proceedings of the Fourth International Conference on Concurrency Theory (Hildesheim, Germany, Aug. 23–26, 1993), 509–523.
22. Howard, W.A. The formulae-as-types notion of construction. In To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus, and Formalism. Academic Press, 1980, 479–491; original version was circulated privately in 1969.
23. Kleene, S. Origins of recursive function theory. Annals of the History of Computing 3, 1 (1981), 52–67.
24. Kleene, S.C. General recursive functions of natural numbers. Mathematical Annalen 112, 1 (Dec. 1936); abstract in Bulletin of the AMS (July 1935).
25. Lewis, C., Langford, C. Symbolic Logic, 1938; reprinted by Dover, 1959.
26. Martin-Löf, P. Intuitionistic Type Theory. Bibliopolis, Naples, Italy, 1984.
27. Milner, R. A theory of type polymorphism in programming. Journal of Computer and System Sciences 17, 3 (1978), 348–375.
28. Mitchell, J.C., Plotkin, G.D. Abstract types have existential type. Transactions on Programming Languages and Systems 10, 3 (July 1988), 470–502.
29. Moggi, E. Notions of computation and monads. Information and Computation 93, 1 (1991), 55–92.
30. Murphy VII, T., Crary, K., Harper, R., Pfenning, F. A symmetric modal lambda calculus for distributed computing. In Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (Turku, Finland, July 13–17). IEEE Press, 2004, 286–295.
31. Murthy, C. An evaluation semantics for classical proofs. In Proceedings of Sixth Annual IEEE Symposium on Logic in Computer Science (Amsterdam, the Netherlands, July 15–18). IEEE Press, 1991, 96–107.
32. Parigot, M. λμ-calculus: An algorithmic interpretation of classical natural deduction. In Logic Programming and Automated Reasoning, Volume 624 of Lecture Notes in Computer Science. Springer-Verlag, 1992, 190–201.
33. Reynolds, J.C. Towards a theory of type structure. In Proceedings of the Symposium on Programming, Volume 19 of Lecture Notes in Computer Science (1974), 408–423.
34. Shell-Gellasch, A.E. Reflections of my advisor: Stories of mathematics and mathematicians. The Mathematical Intelligencer 25, 1 (2003), 35–41.
35. Szabo, M.E., Ed. The Collected Papers of Gerhard Gentzen. North Holland Publishing Co., Amsterdam, the Netherlands, 1969.
36. Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2–42, 1 (1937); received May 28, 1936, read Nov. 12, 1936.
37. van Heijenoort, J. From Frege to Gödel: A Sourcebook in Mathematical Logic, 1879–1931. Harvard University Press, Cambridge, MA, 1967.
38. Wadler, P. A taste of linear logic. In Proceedings of the 18th International Symposium on Mathematical Foundations of Computer Science, Volume 711 of Lecture Notes in Computer Science (Gdańsk, Poland, Aug. 30–Sept. 3). Springer-Verlag, 1993, 185–210.
39. Wadler, P. Call-by-value is dual to call-by-name. In Proceedings of the International Conference on Functional Programming (Uppsala, Sweden, Aug. 25–29). ACM Press, New York, 2003, 189–201.
40. Wadler, P. Propositions as sessions. In Proceedings of the International Conference on Functional Programming (Copenhagen, Denmark, Sept. 10–12). ACM Press, New York, 2012, 273–286.

Philip Wadler ([email protected]) is Professor of Theoretical Computer Science in the Laboratory for Foundations of Computer Science in the School of Informatics at the University of Edinburgh, Scotland.

Copyright held by authors. Publication rights licensed to ACM. $15.00.
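The type rules of Figure 5 and the evaluation rules of Figure 7 are compact enough to run. What follows is a minimal Python sketch that is not part of the original article. It checks types by following Figure 5 and evaluates by repeatedly applying the two rewrites of Figure 7, then replays the example of Figure 8. It makes several assumptions. Each lambda binder carries its type (λz:B × A, in the Church style), because the article writes the type only in the derivation. Substitution is deliberately not capture-avoiding, so bound names must differ from the free variables of the argument. The names Prod, Fun, type_of, subst, and normalize are hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

# Types: a base type is a string such as "A"; Prod is A × B, Fun is A → B.
@dataclass(frozen=True)
class Prod:
    left: Type
    right: Type
@dataclass(frozen=True)
class Fun:
    arg: Type
    res: Type
Type = Union[str, Prod, Fun]

# Terms: x, ⟨M, N⟩, π1 L / π2 L, λx:A. N (binder carries its type), L M.
@dataclass(frozen=True)
class Var:
    name: str
@dataclass(frozen=True)
class Pair:
    fst: Term
    snd: Term
@dataclass(frozen=True)
class Proj:
    index: int  # 1 for π1, 2 for π2
    body: Term
@dataclass(frozen=True)
class Lam:
    var: str
    ty: Type
    body: Term
@dataclass(frozen=True)
class App:
    fun: Term
    arg: Term
Term = Union[Var, Pair, Proj, Lam, App]

def type_of(t: Term, env: dict[str, Type]) -> Type:
    """Type rules of Figure 5; env maps free variables to their types."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, Pair):  # ×-I
        return Prod(type_of(t.fst, env), type_of(t.snd, env))
    if isinstance(t, Proj):  # ×-E1 / ×-E2
        ty = type_of(t.body, env)
        if not isinstance(ty, Prod):
            raise TypeError("projection from a non-product")
        return ty.left if t.index == 1 else ty.right
    if isinstance(t, Lam):  # →-I: binds (discharges) var : ty
        return Fun(t.ty, type_of(t.body, {**env, t.var: t.ty}))
    if isinstance(t, App):  # →-E
        fty = type_of(t.fun, env)
        if not isinstance(fty, Fun) or type_of(t.arg, env) != fty.arg:
            raise TypeError("ill-typed application")
        return fty.res
    raise TypeError(f"not a term: {t!r}")

def subst(n: Term, x: str, m: Term) -> Term:
    """N[M/x]. Not capture-avoiding: assumes no binder in n is free in m."""
    if isinstance(n, Var):
        return m if n.name == x else n
    if isinstance(n, Pair):
        return Pair(subst(n.fst, x, m), subst(n.snd, x, m))
    if isinstance(n, Proj):
        return Proj(n.index, subst(n.body, x, m))
    if isinstance(n, Lam):  # an inner binder with the same name shadows x
        return n if n.var == x else Lam(n.var, n.ty, subst(n.body, x, m))
    return App(subst(n.fun, x, m), subst(n.arg, x, m))

def normalize(t: Term) -> Term:
    """Apply the rewrites of Figure 7 anywhere in t until none remain."""
    if isinstance(t, Pair):
        return Pair(normalize(t.fst), normalize(t.snd))
    if isinstance(t, Lam):
        return Lam(t.var, t.ty, normalize(t.body))
    if isinstance(t, Proj):
        body = normalize(t.body)
        if isinstance(body, Pair):  # π1 ⟨M, N⟩ ⇒ M,  π2 ⟨M, N⟩ ⇒ N
            return body.fst if t.index == 1 else body.snd
        return Proj(t.index, body)
    if isinstance(t, App):
        fun, arg = normalize(t.fun), normalize(t.arg)
        if isinstance(fun, Lam):  # (λx. N) M ⇒ N[M/x]
            return normalize(subst(fun.body, fun.var, arg))
        return App(fun, arg)
    return t  # variables are already normal

# Figure 6 (swap) applied to ⟨y, x⟩, as at the top of Figure 8.
swap = Lam("z", Prod("B", "A"), Pair(Proj(2, Var("z")), Proj(1, Var("z"))))
program = App(swap, Pair(Var("y"), Var("x")))
env = {"y": "B", "x": "A"}
print(type_of(swap, {}))      # Fun(arg=Prod(left='B', right='A'), res=Prod(left='A', right='B'))
print(type_of(program, env))  # Prod(left='A', right='B')
print(normalize(program))     # Pair(fst=Var(name='x'), snd=Var(name='y'))
```

The normal form ⟨x, y⟩ matches the third program of Figure 8. Because type_of returns A × B for both the original program and its normal form, the run also gives a small, concrete instance of subject reduction.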

Inviting Young Scientists

Meet Great Minds in Computer Science and Mathematics

As one of the founding organizations of the Heidelberg Laureate Forum http://www.heidelberg-laureate-forum.org/, ACM invites young computer science and mathematics researchers to meet some of the preeminent scientists in their field. These may be the very pioneering researchers who sparked your passion for research in computer science and/or mathematics. These laureates include recipients of the ACM A.M. Turing Award, the Abel Prize, the Fields Medal, and the Nevanlinna Prize. The Heidelberg Laureate Forum is September 18–23, 2016 in Heidelberg, Germany. This week-long event features presentations, workshops, panel discussions, and social events focusing on scientific inspiration and exchange among laureates and young scientists.

Who can participate? New and recent Ph.D.s, doctoral candidates, other graduate students pursuing research, and undergraduate students with solid research experience and a commitment to computing research.

How to apply: Online: https://application.heidelberg-laureate-forum.org/ Materials to complete applications are listed on the site.

What is the schedule? Application deadline: February 3, 2016. We reserve the right to close the application website early depending on the volume. Successful applicants will be notified by end of March/early April 2016.

More information available on Heidelberg social media

PHOTOS: ©HLFF / B. Kreutzer (top); ©HLFF / C. Flemming (bottom)

DOI:10.1145/2756543

Smart Data Pricing: Using Economics to Manage Network Congestion

Economic incentives that alleviate congestion for Internet customers can also improve business performance for network operators.

BY SOUMYA SEN, CARLEE JOE-WONG, SANGTAE HA, AND MUNG CHIANG

IMAGE BY JOHN LUND

Key insights
• Network operators should not see demand growth as a problem to be solved by penalizing users but as an opportunity to monetize their networks by broadening the revenue base and managing congestion by creating the right economic incentives for users.
• Two complementary approaches, time-dependent pricing and traffic offloading, aim to reduce network congestion by giving users incentives and mechanisms to shift their use to less-congested times or frequencies and networks.
• Smart data pricing encompasses other approaches in network pricing and engineering, including toll-free sponsored data, to benefit network operators, consumers, and content providers alike.

SINCE 2010, THERE has been rapid evolution in pricing practices among Internet service providers (ISPs) in the U.S. and other international markets, particularly in moving away from flat-rate to usage-based pricing in cellular networks.19 In 2010, AT&T eliminated unlimited data plans and introduced a tiered plan of about $10/GB, along with deliberately slowing the traffic of heavy users and adding hefty overage fees. Verizon and other U.S. ISPs have since introduced similar data plans. These changes have fueled the continuing debate about Net neutrality and the openness of the Internet.

ISPs argue these price increases and usage penalties are needed in response to rapid growth in data traffic, as driven by increasing demand for smart devices, bandwidth-hungry applications, cloud-based services, machine-to-machine traffic, and media-rich Web content.5,20 This explosive growth, which Cisco’s visual networking index projects will cause a nearly tenfold increase in global wireless traffic between 2014 and 2019,4 requires investment in expanding wired and wireless network capacities (such as additional spectrum, Wi-Fi hotspots for offloading data traffic, backhauling infrastructure, and newer technologies like 4G/LTE and femtocells). The benefits of this capacity expansion are partly accrued by the content providers who attract more advertising and e-commerce revenue from greater user demand while further driving demand for bandwidth. ISPs contend they are trapped in a vicious cycle that does not allow them to match their prices to their costs. Measures (such as throttling, data caps, and usage-based metered pricing) are thus viewed as essential for regulating demand and managing network congestion. However, penalizing demand would be harmful for the Internet ecosystem and could restrict network access for some users. Appreciating the role of economics in network management and understanding how these models can be realized in practical network systems is crucial to the Internet’s long-term growth and sustainability.

Exploring the link between economic principles and network engineering is the goal of several recent research efforts in smart data pricing (SDP).24 SDP mechanisms go beyond simple byte-counting schemes to include time/location/app-based dynamic pricing, usage-based pricing with differentiated speed tiers, auction-based smart markets, Wi-Fi offloading, proactive caching, zero-rating or sponsored content, and quota-aware content adaptation. In general, SDP asks three kinds of questions along the following dimensions:

Who pays? Who should pay for bandwidth (such as zero-rating, sponsored content, and two-sided pricing)?
What services? What service should be charged for (such as transaction-based pricing and quality-based pricing)?
How to charge? How to enable and charge for mechanisms like time/location/congestion-dependent pricing and traffic offloading?

In this article, we mainly address “How?,” reporting on two research directions: time-dependent pricing (TDP) and traffic-offloading mechanisms. Although researchers have been exploring the interplay between networks and economics for years5,8,15 (for a detailed survey of various pricing proposals, see Sen et al.21), the need for designing and demonstrating fully functional prototypes has become more urgent due to users’ growing demand for data. Consequently, a key aspect of recent SDP research has been how to bridge the gap between analytical models and practical considerations:

From practice to modeling. Network technology should be complemented with sound economics, while analytical models should account for the real-world constraints imposed by existing technical and practical operational considerations, including parameter measurability, data granularity, solution scalability and complexity, integration and deployment feasibility, and regulatory requirements.

From modeling to practice. Analytical models should provide guidance on the economics of bandwidth pricing and policy, as well as on the architecture that will implement the models in operational networks. The prototypes developed must then be tested by deploying them in the wild.

To incorporate this interplay in SDP, researchers should take a holistic approach bringing together ideas from economics, networking, information systems, and human-computer interaction. In general, SDP research involves three stages, each with interesting research questions essential to realizing a new pricing scheme, as in the following:

Analytical modeling. How can ISPs create economic models for computing optimized prices or incentives they are willing to offer and users are willing to accept for modifying their bandwidth-consumption behavior in the desired manner?
System development. How can ISPs develop scalable systems and related signaling protocols for these incentive mechanisms? How should the various required functionalities (such as congestion measurement, traffic profiling, and delay-optimal scheduling) be divided among network backend and end-user devices?
Field trials. How can ISPs use human-computer interaction principles to design end-user interfaces so users understand and respond to the pricing signals?

A series of new initiatives since 2012, including the Internet Architecture Board’s Workshop on Internet Technology Adoption & Transition,14 Smart Data Pricing Forum,23 and Internet Research Task Force Working Group on Global Access to the Internet for All,17 provides momentum for such interdisciplinary collaborations. This article discusses two complementary research themes in SDP, TDP and traffic offloading, that aim to reduce network congestion by giving users incentives and mechanisms to “shift” their use to less-congested times or to other supplementary networks (such as Wi-Fi). We also consider SDP’s implications for the Internet’s long-term sustainability and accessibility to a wider user population.

Figure 1. Components of the TDP system: (a) control-feedback loop of dynamic TDP; (b) functionality separation between user-side and operator-side devices.
[Panel (a) labels: User Interface, Price Optimizer, User Response, Aggregate Traffic Measurement, User Behavior Estimation. Panel (b) labels: User Device, User Interface, Autopilot App, Usage Prediction, Scheduler, Allow or Block, Secure Connection, Price and Usage, ISP Server, App Backend, Price Optimizer, Database, Aggregate Traffic Measurement, User Behavior Estimation.]

a http://googlepublicpolicy.blogspot.com/2008/08/whats-reasonable-approach-for-managing.html

Case 1. Time-Dependent Pricing

Much of the need for expanded network capacity is due to large peak demand created by users’ simultaneous consumption of data; Cisco’s visual networking index predicts peak hour traffic will grow at 64% CAGR.4 Yet, as Odlyzko et al.16 correctly said, ISP attempts to slow this growth by transitioning from flat-rate to usage-based pricing are unlikely to solve the problem. To discourage large peak demand, prices should have a temporal component or vary over different times of the day, as in TDP. Only then will users be incentivized to spread their demand over time, improving network resource utilization by reducing the peaks and filling in the valley periods. Writing in 2008 at Google’s Public Policy blog,a Vinton G. Cerf advocated a similar view, saying, “Network Management also should be narrowly tailored, with bandwidth constraints aimed essentially at times of actual congestion.” He cautioned against ISPs rushing to change their pricing, favoring a more detailed study on the efficacy of such dynamic

or time-varying pricing mechanisms. The work we present here is one such attempt to understand the efficacy and feasibility of TDP for mobile data.

The telecommunications industry has long used dynamic TDP plans for voice calls as a response to demand variability in call volume by adjusting users’ prices and incentives. However, dynamic pricing plans for data traffic, in spite of their theoretical potential to make resource allocation much more efficient, have remained largely unrealized in the global market, possibly due to the gap between the large body of analytical work on the topic and the lack of functional prototypes implementing these ideas.

The extensive academic literature on dynamic pricing theories21 includes responsive pricing (setting prices so as to keep user demand under a certain threshold), proportional fairness pricing (setting prices to optimize a proportional fairness criterion on the amount of bandwidth allocated to different users), priority pricing (explicitly accounting for QoS by allowing users to pay less by accepting a longer delay at congested times), and “smart market” auction pricing (deciding whether to admit a packet into the network at congested times based on the user-specified bid attached to that packet). But such schemes face two practical challenges: users prefer flat rates over the uncertainties associated with near-real-time price fluctuations,15,22 and users are often reluctant to delegate price bidding or traffic scheduling to automated agents, preferring the psychological assurance of manual control despite the greater convenience of automation.22 For users to be comfortable with dynamic pricing for data, ISPs must be willing to provide guarantees on available future incentives or prices, design intuitive user interfaces to aid manual decision making, and demonstrate the underlying system’s feasibility with a proof-of-concept prototype.

Practice to models. To address the practical issues within an analytical framework, we have explored an alternative pricing approach, dynamic day-ahead TDP,9 in which the ISP computes hourly prices one day in advance and advertises them to all users. The provider continues to compute new prices to maintain a sliding one-day window of announced prices; that is, a new price point for the 24th hour is computed and announced every hour. Users thus receive day-ahead price guarantees while being rewarded for shifting some portion of their data usage (such as non-critical traffic) according to the announced price points. ISPs can measure changes in usage volume at different hours of the day in response to the given set of prices and estimate users’ willingness to shift different types of traffic, which is in turn used to compute the next set of optimized prices. These prices are computed through a convex optimization formulation that minimizes the provider’s total cost of overshooting capacity and the cost of providing these incentives to users while accounting for current estimates of users’ willingness to shift traffic, temporal variations in usage volume, and capacity constraints.9 The framework thus requires a control-feedback loop between ISPs and users (see Figure 1a).

In addition to considering users’ psychological preference for certainty regarding future price points, TDP must accommodate technological and regulatory concerns. In order to preserve users’ privacy, our formulation does not require ISPs to track each individual user’s usage pattern, precluding the need for any deep packet inspection to realize this pricing scheme. The formulation remains computationally efficient as the number of network users grows; the model in Ha et al.9 implicitly assigns the aggregate traffic from all users and applications into virtual traffic classes characterized by different delay- and price-sensitivity estimates. It then accounts for heterogeneity across users by separately modeling the probabilistic deferral behavior for each traffic class. This optimization model avoids utility functions, which can be difficult to measure quantitatively. Instead, all parameters can be measured directly (such as changes in usage volume in each period in response to prices) or estimated by the ISP (such as users’ price and delay sensitivities for different traffic classes) without compromising the system’s scalability or user privacy.

Models to practice. Figure 1b outlines the system we use to realize this day-ahead TDP for mobile data; the aggregate traffic measurement, user sensitivity estimation, and price computation engines are on the ISP side, while a mobile application that communicates with the pricing engine is on each end-user device. The primary purpose of the mobile application is to give users information on available price points, but it also has optional features like usage monitoring and alerts, as well as an auto-pilot mode for automated scheduling of applications based on available prices and user-specified delay sensitivities. For privacy reasons, such scheduling information remains in the app but is not communicated to the ISP. For more on the user-interface design, efficacy, and user response to

green for >30%). Users could view their usage history superimposed over the offered prices to visualize how much they spent and saved on data. Additionally, to help users save money, the app, called TUBE (a free version with only traffic-monitoring features, called DataWiz, is available in both iOS and Android appstores; see http://scenic.princeton.edu/datawiz/), provided an interface for users to see their top five bandwidth-consuming applications, set alerts, and configure their weekly budgets and app delay sensitivities for automatic scheduling, as in Figure 3b–3e. Using the analytical model in Ha et

user interface design to ensure users are able to understand and respond to pricing signals.22

With optimized day-ahead time-dependent prices, resource utilization at off-peak hours nearly doubled, indicating TDP can also improve utilization of network capacity by flattening and distributing demand over different times of the day. The resulting maximum observed daily peak-to-average ratio (PAR) decreased by 30% with dynamic day-ahead TDP, and approximately 20% of the PARs from the pre-trial period were greater than the maximum PAR with TDP.
Although the population size of these features, see Sen et al.22 al.9 we offered optimized day-ahead our field trial was somewhat limited We developed and tested a proto- time-varying price discounts on the due to the complexity of actually con- type of this system over eight months in baseline price of $10/GB to all users. ducting it, the results are promising, 2012 in different phases of a random- We found users did shift their traffic demonstrating it is possible to not only ized field experiment in Princeton, from high-price to low-price periods operationalize dynamic pricing for mo- NJ, with 50 mobile users on the AT&T under TDP. Overall data use by iPad bile data but also to use time-varying network. In it, we effectively became a users decreased by 10.1% in high-price prices to change user behavior. resale ISP offering this data plan with periods and increased by 15.7% in low- TDP can also be offered as time- day-ahead dynamic prices to the trial price periods; that is, for most users, dependent sponsored content, whereby participants. We separated the 3G traf- average use decreased in high-price consumers do not see a price fluctua- fic of the participants from that of other periods relative to average use at the tion, but the valley discount is reflected customers using an access-point-name same hours before the trial.9 Addition- in the price sponsoring parties (such as setup, tunneling participants’ 3G traf- ally, by focusing on use in consecutive content providers and enterprises) pay fic from the ISP’s core network into lab periods, where discounts differed by to an ISP. servers (see Figure 2). The participants only 1% but the colors of the price in- installed the TDP mobile application dicator bars were different (such as Case 2. Traffic Offloading on their iOS devices. 
Wi-Fi use, voice comparing usage volumes in a yellow In addition to shifting demand from calls, and text messaging were not in- price period with a 29% discount to peak to valley time periods, ISPs can alle- cluded in the trial traffic, as these ser- the following green price period with viate congestion by shifting demand off vices do not count toward 3G data caps. a 30% discount), we found a significant their cellular networks onto supplemen- The mobile application running on change in use, even though the abso- tary networks like Wi-Fi or femtocells, a users’ devices displayed the prices/dis- lute percent change in discount was process known as “traffic offloading.” counts available for the next 24 hours only 1%. This resulting changed behav- Many ISPs encourage offloading by in a color-coded format (see Figure 3a). ior indicates users paid more attention selling bundles of base and supple- Each price is color-coded by its discount to the color-coding than to the actual mentary technologies; for example, the rate (such as red for 0%–10%, orange value of the price discounts, emphasiz- French telecommunications company for 11%–19%, yellow for 20%–29%, and ing the need for careful and intuitive Orange offers a £2 bundle of 3G and

Figure 2. Field trial setup for dynamic day-ahead, time-dependent usage-based pricing.

Gateway 3G Core Network PSTN BSS GMSC MSC

Firewall AuC VLR HLR DNS User Data Flow VPN iPhone BSC or iPad Internet NAT GGSN

SGSN Lab Servers

90 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 contributed articles

13 Wi-Fi hotspot access. Conventional Figure 3. Graphical user interfaces of the mobile app used in the TDP trial: (a) landscape wisdom says increasing the access view of superimposed price and usage history by day, week, and month; (b) view of the price of the base technology (such as top-five bandwidth-consuming apps in the bottom split-screen; (c) weekly budget adjust- 3G or 4G) will encourage users to pur- ment screen; (d) app-delay sensitivity settings screen; and (e) app-specific temporal block- ing in parental control. chase the bundle and offload more to the supplementary network. This seems to be a direction many service (a) providers are pursuing today with vari- ous penalty mechanisms on their base technology. But this strategy does not account for the fact the supplementary network can itself become more con- gested as more and more users offload their traffic, potentially making it less attractive to users. That is, there is a complex interaction among the prices of the base and supplementary tech- nologies, the relative network conges- tion externalities of these two tech- nologies, and the coverage area of the supplementary technology. Economic models and their practical realiza- tion in field trials can help ISPs design more effective offloading mechanisms. Here, we discuss related results from (b) (c) our studies that focus on the theory, as well as implementation, of ideas that can improve offloading performance and make it easier for users to make offloading decisions. Practice to models. Understanding how users decide to adopt the base technology’s network or a bundle of base and supplementary technolo- gies, as well as deriving the resulting equilibrium and transient market out- comes, requires analytical models that incorporate practical issues like con- gestion on both networks and the cov- erage area of supplementary networks. 
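The interaction among prices, congestion, and coverage can be explored with a toy model. The Python sketch below is a hypothetical illustration, not the model from the studies cited in this article. It rests on four assumptions:
- A continuum of users has quality valuations θ spread over (0, 1).
- Each user chooses base-only service, the bundle, or neither.
- Bundle users are on the supplementary network for a fixed fraction of the time, equal to its coverage.
- Each network's value falls linearly with its load, which is the congestion externality.

A damped best-response iteration then searches for an adoption equilibrium, meaning shares at which no user wants to switch. The names `Market`, `choice_utilities`, and `adoption_equilibrium`, and every parameter value, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Market:
    """Hypothetical parameters for a toy two-technology market (illustrative values only)."""
    q_base: float = 1.0     # intrinsic quality of the base network (e.g., 3G)
    q_supp: float = 1.2     # intrinsic quality of the supplementary network (e.g., Wi-Fi)
    p_base: float = 0.30    # access price for the base technology alone
    p_bundle: float = 0.45  # access price for the base + supplementary bundle
    coverage: float = 0.5   # fraction of time a bundle user is inside supplementary coverage
    g_base: float = 0.4     # congestion penalty per unit of load on the base network
    g_supp: float = 0.6     # congestion penalty per unit of load on the supplementary network


def choice_utilities(theta, load_base, load_supp, m):
    """Return (base-only, bundle) utilities for a user whose quality valuation is theta."""
    base_value = theta * m.q_base - m.g_base * load_base
    supp_value = theta * m.q_supp - m.g_supp * load_supp
    u_base = base_value - m.p_base
    # Bundle users offload while in coverage and fall back to the base network otherwise.
    u_bundle = m.coverage * supp_value + (1.0 - m.coverage) * base_value - m.p_bundle
    return u_base, u_bundle


def adoption_equilibrium(m, n_users=500, rounds=300, damping=0.5):
    """Damped best-response iteration on (base-only share, bundle share)."""
    thetas = [(i + 0.5) / n_users for i in range(n_users)]  # valuations spread over (0, 1)
    share_base, share_bundle = 0.0, 0.0
    for _ in range(rounds):
        # Load on each network implied by the current adoption shares.
        load_base = share_base + (1.0 - m.coverage) * share_bundle
        load_supp = m.coverage * share_bundle
        n_base = n_bundle = 0
        for theta in thetas:
            u_base, u_bundle = choice_utilities(theta, load_base, load_supp, m)
            if u_bundle > max(u_base, 0.0):  # bundle beats both alternatives
                n_bundle += 1
            elif u_base > 0.0:               # base-only beats not subscribing
                n_base += 1
        # Move only part of the way toward the best response to damp oscillations.
        share_base += damping * (n_base / n_users - share_base)
        share_bundle += damping * (n_bundle / n_users - share_bundle)
    return share_base, share_bundle


if __name__ == "__main__":
    for cov in (0.2, 0.5, 0.8):
        base, bundle = adoption_equilibrium(Market(coverage=cov))
        print(f"coverage={cov:.1f}: base-only share {base:.3f}, bundle share {bundle:.3f}")
```

Rerunning with different coverage or prices shows how load moves between the two networks in this toy setting. The damping is a design choice: a plain best-response update can oscillate between over- and under-subscription. Convergence is not guaranteed for every parameter set, so check the returned shares (for example, by increasing `rounds`) before reading them as an equilibrium.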
Sen et al.18 introduced a model to study the dynamics of competition between two generic network technologies with cross-network externalities in the presence of network gateways or converters. In Joe-Wong et al.,13 we extended this framework to develop an analytical model in which users individually make their adoption decisions based on several factors: the technologies' intrinsic qualities, users' heterogeneity in the evaluation of these qualities, negative congestion externalities from the presence of other subscribers on the technologies, and the access rates charged by an ISP. Using this model, we studied how user-level decisions translate into aggregate adoption dynamics and
characterized the equilibrium outcomes for different system parameters. The model reveals that seemingly intuitive strategies can sometimes have unintended consequences for congestion on the base network technology. For example, increasing the coverage area of the supplemental technology can increase traffic on the supplemental network. This can motivate some users to drop the bundled service altogether and use only the base technology.

Likewise, increasing the base technology's access price can lead some users to conclude that the base technology's coverage and relative network congestion conditions do not offset the decrease in their utility from the price increase. These users are then motivated to drop the bundled service and use just the base technology, which adds to congestion on that network. Careful analysis with such economic models can therefore provide valuable guidance and insight into the likely outcomes of network pricing and policy changes.

Models to practice. Analytical frameworks like the one we discuss here can provide insight into users' long-term adoption behaviors (such as equilibrium and stability). Users' minute-to-minute offloading decisions, by contrast, reflect their more immediate concerns. Users face a three-way trade-off among cost, throughput, and delay. They can save money and receive greater throughput by waiting for Wi-Fi access, but they may not want to wait for certain critical applications.

To navigate this trade-off, we propose a practical, cost-aware Wi-Fi offloading system called Adaptive bandwidth Management through USer-Empowerment, or AMUSE.10 AMUSE is a user-centric tool that learns users' behavior and mobility patterns and decides which applications to offload to which times of day, enabling users to stay under their data caps. To do so, it uses a utility-optimization algorithm that decides whether, and how much, 3G bandwidth to allocate to each application at any moment of the day. The algorithm bases this decision on three inputs:
- the user's available budget;
- application delay sensitivities;
- predictions of the user's future demand patterns and Wi-Fi availability.

To implement this allocation in practice, AMUSE uses a receiver-side transmission control protocol (TCP) bandwidth control algorithm. It enforces each application's assigned download rate by controlling the TCP advertisement window on the user side. The algorithm is thus fully contained on end-user devices and requires no modification on the TCP server side, making it suitable for real-world deployment (see Figure 4).

We tested an AMUSE implementation on Windows 7 tablets with simulated user behavior based on 3G and Wi-Fi usage and availability data from a field study of 37 mobile users. Other offloading algorithms yielded 14% lower user utility than AMUSE for light users and 27% lower for heavy users.10

Intelligently managing users' competing interests for cost, throughput, and delay through user-side management tools can thus facilitate user decisions. The resulting offloading benefits could be improved further by integrating ISP-centric approaches, such as ad hoc authorization for offloading to hidden Wi-Fi access points, or market-based solutions for offloading from base stations to third-party-owned Wi-Fi access points and femtocells.3,6,11 Such measures push some control from the network core out to end users. They complement previous and existing efforts of the networking research community, as with Briscoe et al.2

Figure 4. System modules and interaction across AMUSE components.
[Modules shown: App-Level Session Tracker; User Interface; 3G Usage Budget; App Delay Tolerances; App Usage History; App Usage Prediction; Wi-Fi Access Prediction; Utility Maximization Algorithm; TCP Rate Controller.]

Win-Win Solution

The two approaches to managing network congestion discussed earlier, realizations of TDP and delay-optimal traffic offloading, can help create a financial win-win solution for ISPs and their users. ISPs benefit through reduced peak congestion, and users gain more choices and technologies to help them save on their monthly bills.

Research on network resource pricing also has implications for bridging the digital divide between those who can and those who cannot access the Internet regularly. Rural local exchange carriers (RLECs) often suffer from congestion in their wired networks because the middle-mile problem persists. The cost of middle-mile bandwidth has declined over the years as demand to fill the middle mile increased, but the bandwidth requirements of home users have also increased sharply. The cost of middle-mile upgrades to meet the Federal Communications Commission's target speed of 4 Mbps will be substantial and is a barrier to digital expansion in rural areas.7

New access-pricing mechanisms like TDP can help reduce middle-mile investment costs in two ways: by reducing RLECs' needs to provision or lease peak capacity, and by improving resource utilization in the valley periods. Providers can thus match their prices to the cost of delivery while also creating incentives for light users to adopt broadband services. Instead of being charged by the volume of data consumed, users can save on their monthly bill by choosing "when" to consume the data. An extension of this idea can be used to create
ultra-affordable data plans for delay-tolerant users. Such plans would allow automated, opportunistic access to the network only when large discounts are available or when the network is lightly loaded. Dynamic TDP and opportunistic offloading schemes of this kind can help service providers better utilize their available resources. Because they are financially attractive even to light data users, they can also increase Internet adoption.

Content providers have also tried to bridge the digital divide with zero-rating, app-based pricing, and sponsored content,1,12 all of which open further interesting questions for SDP research. For example, how should platforms for sponsored content or zero-rating be made compatible with Net neutrality regulations, as in the case of Facebook's Internet.org initiative? And how will such subsidized plans affect broadband adoption and network congestion? Much of the current SDP research focuses on such questions.

Conclusion

As demand for bandwidth grows in both wireless and wired networks, ISPs are pursuing penalty mechanisms to manage their available network capacity. These include deliberately slowing traffic, capping, overage charges, and usage-based fees. But such measures are arguably suboptimal and even harmful to the Internet ecosystem. In contrast, ideas from economics can help design incentives and pricing policies that benefit both service providers and their users. Many analytical models for pricing-based network management have been proposed, but their implementation has begun only recently. Tackling today's challenges therefore requires two things. The first is analytical models that incorporate practical concerns such as measurability, scalability, privacy, and user behavior. The second is prototypes and field trials that demonstrate efficacy and feasibility.

Here, we have focused on "shifting" demand through two complementary efforts that aim to alleviate network congestion. Both create incentives and mechanisms that modify user behavior, shifting demand either to less congested times (through TDP) or to a supplementary network (through delay-optimized traffic offloading). The results indicate that such measures, if implemented with intuitive designs, can help ISPs better monetize and manage their network capacity. At the same time, they give users more options to avoid hefty overage fees. Because they rely on implicit pricing signals and automated decision making, these solutions can be readily adapted to emerging machine-to-machine and Internet of Things applications. These approaches can also make Internet access affordable for many more users, helping to bridge the digital divide.

The scope of SDP is much broader than the two particular cases we outlined here. The complementary questions "Who?" and "What?" are gaining traction, in addition to the question "How?" on which we focused. Researchers must move beyond models, trials, and prototype testing. They must also consider integration with existing infrastructure, the design of pricing and signaling protocols, and regulatory concerns. SDP research will help bring together academic researchers, network providers, content providers, the e-commerce industry, and policymakers. Together, these groups can design and deploy new mechanisms that help ensure the sustainable growth of the Internet, mobile, and content markets.

Acknowledgments

We thank many collaborators in academic institutions and industry, especially Krishan Sabnani of Bell Labs for his encouragement and feedback on this article.

References

1. Andrews, M., Rieman, M., Wang, Q., and Ozen, U. Economic models of sponsored content in wireless networks with uncertain demand. In Proceedings of the IEEE Conference on Computer Communications Workshops (Turin, Italy, Apr. 19). IEEE, New York, 2013, 345–350.
2. Briscoe, B., Darlagiannis, V., Heckman, O., Oliver, H., Siris, V., Songhurst, D., and Stiller, B. A market managed multi-service Internet. Computer Communications 26, 1 (Mar. 2003), 404–414.
3. Cheung, M. and Huang, J. Optimal delayed Wi-Fi offloading. In Proceedings of the 11th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (Tsukuba Science City, Japan, May 13–17). IEEE, 2013.
4. Cisco. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014–2019; http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html
5. El-Sayed, M., Mukhopadhyay, A., Urrutia-Valdés, C., and Zhao, Z.J. Mobile data explosion: Monetizing the opportunity through dynamic policies and QoS pipes. Bell Labs Technical Journal 16, 2 (Sept. 2011), 79–100.
6. Gao, L., Iosifidis, G., Huang, J., Tassiulas, L., and Li, D. Bargaining-based mobile data offloading. IEEE Journal on Selected Areas in Communications 32, 6 (June 2014), 1114–1125.
7. Glass, V., Stefanova, S., and Dibelka, R. Customer price sensitivity to broadband service speed: What are the implications for public policy? In Smart Data Pricing, S. Sen, C. Joe-Wong, S. Ha, and M. Chiang, Eds. John Wiley & Sons, Inc., Hoboken, NJ, 2014.
8. Gupta, A., Stahl, D.O., and Whinston, A.B. The economics of network management. Commun. ACM 42, 9 (Sept. 1999), 57–63.
9. Ha, S., Sen, S., Joe-Wong, C., Im, Y., and Chiang, M. TUBE: Time dependent pricing for mobile data. ACM SIGCOMM Computer Communication Review 42, 4 (Oct. 2012), 247–258.
10. Im, Y., Joe-Wong, C., Ha, S., Sen, S., Kwon, T., and Chiang, M. AMUSE: Empowering users for cost-aware offloading with throughput-delay trade-offs. IEEE Transactions on Mobile Computing (forthcoming, 2015).
11. Iosifidis, G., Gao, L., Huang, J., and Tassiulas, L. A double auction mechanism for mobile data offloading markets. IEEE/ACM Transactions on Networking (Sept. 2014).
12. Joe-Wong, C., Ha, S., and Chiang, M. Sponsoring mobile data: An economic analysis of the impact on users and content providers. In Proceedings of the 34th IEEE International Conference on Computer Communications (Hong Kong, China, Apr. 26–May 1). IEEE, New York, 2015.
13. Joe-Wong, C., Sen, S., and Ha, S. Offering supplementary network technologies: Adoption behavior and offloading benefits. IEEE/ACM Transactions on Networking 23, 2 (Feb. 2014).
14. Lear, E. Report from the IAB Workshop on Internet Technology Adoption and Transition. IETF Network Working Group, Internet-Draft, May 19, 2014; http://tools.ietf.org/html/draft-iab-itat-report-03
15. Odlyzko, A. Will smart pricing finally take off? In Smart Data Pricing, S. Sen, C. Joe-Wong, S. Ha, and M. Chiang, Eds. John Wiley & Sons, Inc., Hoboken, NJ, 2014.
16. Odlyzko, A., St. Arnaud, B., Stallman, E., and Weinberg, M. Know Your Limits: Considering the Role of Data Caps and Usage-Based Billing in Internet Access Service. White Paper. Public Knowledge, Washington, D.C., May 2012.
17. Sathiaseelan, A. Researching Global Access to the Internet for All (GAIA). The IETF Journal. Internet Society IRTF Working Group, Reston, VA, 2014.
18. Sen, S., Jin, Y., Guerin, R., and Hosanagar, K. Modeling the dynamics of network technology adoption and the role of converters. IEEE/ACM Transactions on Networking 18, 6 (Dec. 2010), 1793–1805.
19. Sen, S., Joe-Wong, C., Ha, S., and Chiang, M. Incentivizing time-shifting of data: A survey of time-dependent pricing for Internet access. IEEE Communications Magazine 50, 11 (Nov. 2012), 91–99.
20. Sen, S., Joe-Wong, C., Ha, S., and Chiang, M. Smart data pricing: Economic solutions to network congestion. ACM SIGCOMM eBook on Recent Advances in Networking, ACM SIGCOMM, New York, 2013.
21. Sen, S., Joe-Wong, C., Ha, S., and Chiang, M. A survey of broadband data pricing: Past proposals, current plans, and future trends. ACM Computing Surveys 46, 2 (Nov. 2013), 1–37.
22. Sen, S., Joe-Wong, C., Ha, S., and Chiang, M. When the price is right: Enabling time-dependent pricing of broadband data. In Proceedings of ACM SIGCHI (Paris, France, Apr. 27–May 2). ACM Press, New York, 2013, 2477–2486.
23. Smart Data Pricing Forum; http://scenic.princeton.edu/sdp/
24. Smart Data Pricing Research; http://sdpresearch.org/

Soumya Sen is an assistant professor in the Department of Information Systems & Decision Sciences in the Carlson School of Management at the University of Minnesota, Minneapolis.

Carlee Joe-Wong is a Ph.D. candidate in the Program in Applied and Computational Mathematics at Princeton University, Princeton, NJ.

Sangtae Ha is an assistant professor in the Department of Computer Science with a joint appointment in the Interdisciplinary Telecommunications Program at the University of Colorado at Boulder.

Mung Chiang is a professor of electrical engineering and an affiliated faculty member in applied and computational mathematics and in computer science at Princeton University, Princeton, NJ.

Copyright held by the authors. Publication rights licensed to ACM. $15.00

review articles

DOI:10.1145/2739043

Internet Use and Psychological Well-Being: Effects of Activity and Audience

BY ROBERT KRAUT AND MOIRA BURKE

The connection between online communication and psychological well-being depends on whom you are communicating with.

key insights
• Talking with close friends online is linked to improvements in social support, depression, and other measures of well-being, but talking with strangers and reading about acquaintances are not.
• Readers should be skeptical of cross-sectional and survey-based studies linking well-being to Internet use.
• Instead, experiments or longitudinal designs pairing surveys with log data provide more reliable insights.
• Human agency is key: The effect of technology on our lives depends on how we use it, what we talk about, and whom we talk to.

PEOPLE AROUND THE world have incorporated the Internet into their daily lives, using it to find information, communicate with friends and family, shop, play games, and pass the time. How does it affect our well-being? The media frequently claim the Internet is changing our social lives, warning of a "Sad, Lonely World Discovered in Cyberspace"13 or asking "Is Facebook Making Us Lonely?"16 Even Pope Benedict warned that "virtual contact cannot and must not take the place of direct human contact."19 A major reason for this fascination with new technologies and social relationships is that these relationships have important consequences for both physical and psychological health (for recent reviews, see Callaghan8 and Thoits31). These concerns have been reflected in the scholarly literature as well. For example, prior research suggests Internet use influences the amount of interpersonal communication people engage in, the partners with whom
they communicate, and the quality of the communication episodes in which they participate (for example, Cummings9). Much research shows people communicate online primarily with the people they also communicate with offline, that is, their relatively strong social ties. Their online communication supplements rather than replaces their offline communication.10,20 For example, in early research, Bikman and Eveland3 and Hampton and Wellman12 showed how everyday use of computer networks increased people's recognition of those they communicated with online. If so, one would expect increased use of the Internet, especially for communication with strong ties, to increase the amount of social support people have available. This would in turn improve the downstream consequences associated with social support, for example, by reducing depression, stress, and loneliness, and improving mood and even physical health.21,28

However, other research suggests the quality of the communication people have online is impoverished. On this view, it is less valuable than time spent in spoken communication, either face-to-face or by phone (for example, Cummings9). Moreover, the relatively low cost of online communication and its insensitivity to distance may encourage people to increase their communication with weak ties or strangers disproportionately.29 Communication with weak ties and strangers is unlikely to have the same psychological benefits as communication with stronger ties. Friendships initiated online can develop into close relationships over time,17,13 but the preponderance of ties in our online social networks are acquaintances and other weak ties.29 Suppose substitution is occurring, with communication with weak ties crowding out communication with stronger ones. In that case, increased use of the Internet could decrease the social support people have available and harm their psychological well-being.

Our research over the past 15 years has attempted to determine how everyday use of the Internet influences users' psychological well-being.

IMAGE BY NATALIYA YAKOVLEVA

A Methodological Critique

The answers to questions about the impact of Internet use on well-being are unclear for two reasons. First, questions about causation and change demand longitudinal or experimental data,27 yet typical research in this area uses cross-sectional survey techniques. Cross-sectional analysis can produce misleading conclusions. For example, in one of our early studies with the Pew Internet and American Life Project, cross-sectional results showed that Internet communication with a particular other person was strongly associated with phone and face-to-face communication with that person. In contrast, longitudinal analyses of the same data showed that greater Internet use during a time period was associated with declines in in-person visits to that partner.25

One reason not to trust conclusions from cross-sectional analyses is that they confound predispositions for using technology with its effects.26 The difficulty in understanding the causal connection between Internet use and psychological well-being is this: initial well-being, or relatively stable social characteristics such as social competence or extraversion, can influence both how people use the Internet and their social and psychological well-being. For example, in longitudinal work, Mikami and colleagues demonstrated that adolescents' social competence and psychological health (for example, depression) predicted their online activities seven years later. These activities included their online social network size and the number of their ties offering verbal social support.18 Given this evidence that adolescents' depression predicts their adult Internet use, it is difficult to argue from cross-sectional studies alone that Internet use causes depression.

The second major problem is that much of the extant research does not distinguish between types of Internet use, or between use by different types of people. It therefore provides little insight into the mechanisms at play. Early research simply separated Internet users from non-users.14 More recent research often examines how much people use the Internet without comparing qualitatively different types of use.28 Yet research that does distinguish types of use suggests that different uses of the Internet may have distinct associations with well-being for different kinds of people. Three studies illustrate the point:
- Over a one-year period, depression increased among Dutch adolescents who used the Internet for Web surfing but declined among those who used it for chatting with friends. Moreover, these associations occurred only among adolescents who received little social support from their closest friends.23
- In another study, instant messaging use predicted increases in depression, but email use did not.32 The increase in depression may have been limited to instant messaging because instant messages tend to be substantively shorter and more superficial than email.9
- Burke and colleagues found that exchanging messages with one's Facebook ties was associated with increases in social capital, while simply reading news about them was not. However, individuals with lower social communication skills seemed to benefit from both types of activities.6

Understanding the complicated relationship between Internet use and well-being therefore requires examining both the Internet's myriad forms and the differences among users that both drive online activity and moderate its effects.

Figure 1. Relative changes in self-reported depression symptoms (CES-D) over a six-month period predicted by initial social support and types of Internet use. Zero on the y-axis represents changes in depression among survey respondents with average levels of social support and Internet use. Each point shows the relative change in depression, in standard deviation units, with its standard error. The change is associated with having more initial social support and/or using the Internet to communicate with friends and family or with new people. "High" means one standard deviation above the average of a predictor variable (adapted from Bessière et al.1).
[Plot: y-axis "Change in Depression" (from −0.2 to 0.3); seven numbered conditions combining average or high initial social support with average or high Internet use for friends/family or for new people.]
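Figure 1 reports predicted changes in depression from longitudinal survey data. A common way to estimate such changes is a lagged dependent variable regression. The outcome at the end of an interval is regressed on its value at the start of the interval plus the predictors of interest. Each predictor's coefficient then describes its association with change while initial well-being is held constant.

The Python sketch below shows only the mechanics, using synthetic, standardized two-wave data. The variable names, the coefficient values in `TRUE_COEFS`, and the helpers `simulate_panel` and `ols` are hypothetical. They are not the authors' model, data, or results.

```python
import random

# Hypothetical "true" coefficients used only to simulate data; they are NOT
# estimates from the studies discussed in this article.
TRUE_COEFS = {
    "intercept": 0.0,
    "depression_t1": 0.6,  # lagged outcome: depression at the start of the interval
    "support_t1": -0.2,
    "use_friends_family": -0.1,
    "use_new_people": 0.0,
}


def simulate_panel(n=2000, noise_sd=0.5, seed=0):
    """Generate synthetic two-wave data with standardized, independent predictors."""
    rng = random.Random(seed)
    names = list(TRUE_COEFS)
    betas = [TRUE_COEFS[name] for name in names]
    rows, y = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in names[1:]]  # leading 1.0 is the intercept
        depression_t2 = sum(b * xi for b, xi in zip(betas, x)) + rng.gauss(0.0, noise_sd)
        rows.append(x)
        y.append(depression_t2)
    return names, rows, y


def ols(rows, y):
    """Least squares via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda row_idx: abs(aug[row_idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


if __name__ == "__main__":
    names, rows, y = simulate_panel()
    # Regress wave-2 depression on wave-1 depression plus predictors (lagged dependent variable).
    for name, b in zip(names, ols(rows, y)):
        print(f"{name:>20}: {b:+.3f}")
```

The simulated predictors are drawn independently, so the estimates should land close to `TRUE_COEFS`. In real panel data, Internet use may itself depend on initial well-being, which is why the lagged outcome is included as a control. A statistics package would also report standard errors, which this sketch omits.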

[Figure 1 plot. y-axis: Change in Depression (–0.2 to 0.3). Points: (1) Avg. Support and Avg. Internet Use; (2) Avg. Support and High Internet Use; (3) Avg. Support and High Internet Use for New People; (4) Avg. Support and High Internet Use for Friends/Family; (5) High Support and Avg. Internet Use; (6) High Support and High Internet Use for New People; (7) High Support and High Internet Use for Friends/Family.]

Differentiating Internet Uses in Longitudinal Research

Research from our lab over the past 15 years has attempted to overcome these challenges. It employs a common methodology, using lagged dependent variable linear regression on longitudinal data to examine how different uses of the Internet during a time interval predict changes from the beginning to the end of the interval in social and psychological outcomes such as social support, depression, and loneliness. This research is correlational and not as powerful as random-assignment experiments (for example, Shaw and Gant24) in demonstrating the causal impact of an intervention. However, random assignment is generally impractical if the goal is to identify the long-term impact of Internet use, because in developed countries where Internet use is already pervasive, few people would willingly agree to be randomly assigned to forgo Internet access for long periods of time. A panel design mitigates many threats to making causal inferences from correlational data. By observing the same individuals multiple times, by knowing the temporal ordering of the intervention (Internet activities) and the outcomes (well-being), and by controlling for initial well-being, this research design minimizes the possibility that self-selection to the intervention and reverse causation account for associations between Internet use and changes in well-being.

Our first research on this topic was conducted in 1997, the dawn of the Internet era for most people who did not work in universities or research laboratories.15 In this research, families with high school-aged children were given computers and Internet access. Their email and Web use was monitored, and family members completed three surveys over a one-year period. Results indicated that the more people used the Internet, independent of the way they used it, the more their depression increased and social support and other measures of psychological well-being declined. Although depressed people typically engage in less social contact than do less-depressed people,18 initial depression was statistically controlled in the analysis and Internet use was measured subsequent to the initial measures of depression; therefore variations in initial depression cannot account for these results. Moreover, this research tested for and found no evidence for reverse causation, where early measures of social support, depression, loneliness, or stress predicted subsequent Internet use.

Even though this research differentiated asocial Internet use (Web browsing) from social uses (online communication via email and participation in online groups), it did not differentiate communication with stronger or weaker ties. Indeed, during this early era, most participants' close ties were not yet online. Therefore, use of the Internet for any purpose, even highly social, interpersonal communication, may have presented an opportunity cost, shifting people's time and attention away from more fruitful offline communication with closer friends. Or, other changes in unmeasured factors, such as participants' satisfaction with their relationships, may have driven increases in both time spent online and depression.

We conducted a replication with a new sample, when a larger fraction of people's social networks were Internet users and when the Internet offered a wider variety of services.1 In this research, which started in late 2000, 922 respondents from a national sample of U.S. households were contacted using random digit dialing. In three surveys spread over a year, respondents described on multi-item scales how frequently they used the Internet for different purposes: communicating with friends and family; communicating in online groups and to meet people; retrieving and using information; seeking entertainment or escape; shopping; and acquiring health information or talking about health. Confirmatory factor analyses demonstrated that differentiating these types of Internet use fit the data better than a model that assumed use reflected a single dimension ranging from light use to heavy use.

Figure 1 illustrates how Internet use was associated with changes in well-being. It shows changes in self-reported depression over a six-month period among people who initially had different levels of social resources available and who used the Internet for different purposes, compared to people with an average amount of social support and of Internet use. The first point shows the change among people who initially had average levels of social support and who used the Internet an average amount across the different purposes; it has been normalized to zero for comparison purposes. Point 2 shows that the more people used the Internet overall, the more their self-reported levels of depression increased compared to the base rate, although the increase was not statistically significant during the first six-month interval reported in Bessière1 (p < 0.10). However, additional data collection in a six-month follow-up with the same respondents showed the increase in depression with overall Internet use was statistically significant over the year-long period (p < .02).2
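The lagged dependent variable regression used throughout this line of research can be illustrated with a minimal sketch. It assumes an ordinary least squares fit with NumPy on synthetic, standardized data; the function name `lagged_dv_regression`, the variable names, and the coefficients used to generate the data are hypothetical illustrations, not the authors' code or estimates. Including the outcome measured at the start of the interval as a regressor is what lets the coefficients on the Internet-use measures be read as predicting change in the outcome over the interval.

```python
import numpy as np


def lagged_dv_regression(y_before, y_after, predictors):
    """OLS fit of y_after = b0 + b1 * y_before + sum_k c_k * predictors[:, k].

    Returns [b0, b1, c_1, ..., c_K]. Because y_before is a regressor, each
    c_k describes how a predictor relates to the outcome at the end of the
    interval after adjusting for its level at the start.
    """
    y_before = np.asarray(y_before, dtype=float)
    y_after = np.asarray(y_after, dtype=float)
    x = np.asarray(predictors, dtype=float)
    if x.ndim == 1:  # allow a single predictor passed as a 1-D array
        x = x[:, np.newaxis]
    n = y_after.shape[0]
    # Design matrix columns: intercept, lagged outcome, Internet-use measures.
    design = np.column_stack([np.ones(n), y_before, x])
    coef, *_ = np.linalg.lstsq(design, y_after, rcond=None)
    return coef


# Hypothetical synthetic panel with two standardized Internet-use measures.
rng = np.random.default_rng(0)
n = 5000
depression_t1 = rng.normal(size=n)
use_friends = rng.normal(size=n)    # e.g., communicating with friends/family
use_strangers = rng.normal(size=n)  # e.g., meeting new people online
# Made-up generating coefficients, chosen only for illustration.
depression_t2 = (0.6 * depression_t1 - 0.10 * use_friends
                 + 0.08 * use_strangers + rng.normal(scale=0.5, size=n))

coef = lagged_dv_regression(
    depression_t1, depression_t2, np.column_stack([use_friends, use_strangers])
)
print("intercept, lagged outcome, friends, strangers:", np.round(coef, 3))
```

Because the data are simulated, the fit simply recovers the generating coefficients. With real panel data the same coefficients remain correlational, as the text cautions, and the published analyses included further controls not shown here.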


Moreover, differentiating types of Internet use and communication partners clarified these results. Use of the Internet for finding information, for entertainment, or for commerce was not associated with greater changes in depression than the base rate (ps < .25), but use for communication was. Because the online population had grown so much, the research could differentiate online communication with close ties from communication with weaker ties and strangers. More use of the Internet to communicate with weaker ties (in particular, to meet new people and to hang out in groups composed primarily of strangers) was associated with increases in depression compared to the base rate (see point 3). In contrast, more online communication with friends and family was associated with declines in depression (point 4). Moreover, interaction results suggest a substitution effect. Respondents reported on their social resources on the initial questionnaire: their perceived social support, the number of friends with whom they regularly communicated, and their extroversion. Compared to those who initially had fewer social resources, people who initially had greater social resources offline and used the Internet to communicate with strangers reported larger increases in depression than one would expect simply from their social resources or their Internet use alone (compare point 6 to points 3 and 5). In contrast, people who used the Internet to communicate more with friends and family had decreases in depression regardless of their initial social resources (compare point 7 with points 4 and 5).

This correlational data cannot determine why online communication with strangers and other weak ties was associated with increases in depression while communication with friends and family was associated with declines in depression. One possibility is that interactions with weak ties were simply less satisfying than interactions with family or friends. This explanation is suggested by other research we conducted, which asked survey respondents in a national sample to evaluate a specific online communication episode they had participated in the previous day.9 Another possibility is a substitution effect: interactions with weak ties reduced the amount of time and attention people had available to spend on richer and more valuable interactions with closer friends. This explanation is consistent with the finding that online communication with strangers was associated with increases in depression most strongly among individuals who had higher levels of offline social resources (social support) at the start of the study.

Our most recent research assessing how the Internet influences psychological well-being focused on the variety of uses people make of online social network sites, in particular Facebook.4,6,7 As of 2012, two-thirds of American Internet-using adults subscribed to an online social network, with Facebook being the most popular. Approximately 65% of Facebook's 1.4 billion monthly active users worldwide visit the site on a typical day.11

The goal of this research was to assess how the well-being of Facebook users changed with different kinds of Facebook use and interaction with different partners. To conduct this analysis, we combined data from three surveys administered one month apart measuring social support, depression, and other aspects of psychological well-being with de-identified, aggregated counts of Facebook activity. The Facebook use data consisted of counts from server logs of online activity (for example, number of wall posts and comments posted and read, likes delivered and received, stories read, and photos viewed); no content was analyzed. Respondents reported how close they felt to up to eight of their Facebook friends, up to six of whom were close friends they identified (mean = 4.4 close ties) and the remaining ones randomly selected from their Facebook friend networks. They rated themselves as substantially closer to the ties they chose than to the randomly selected ties. Respondents' ratings of closeness were used to train a linear regression model to estimate their tie strength with each of their approximately 130 Facebook friends (see Burke5 for details of the estimation procedures). The large sample (N = 1,927 respondents communicating with over 2.4 million Facebook friends in total) and logs of participants' activity on Facebook allow us to differentiate types of communication: one-on-one exchanges such as wall posts or comments that were targeted at the recipient; reading friends' status updates that were broadcast widely rather than being tailored for a specific viewer; and broadcasting one's own content, such as status updates, game scores, and photos, out to a wider circle of friends. Confirmatory factor analyses demonstrated the validity of differentiating one-on-one communication from broadcast communication.

We hypothesized that one-on-one exchanges, by virtue of being tailored for a single recipient, would be more likely to increase social support, while reading broadcasts from a wider audience and sending them out would not, because they require less effort per recipient and are likely to deal with less intimate topics than messages meant for a single person. Using the tie-strength model, we were also able to distinguish whether the people they interacted with on Facebook were close friends or weaker acquaintances. Using a tie-strength threshold of 5 on a 7-point closeness scale to distinguish close from weak ties, 39.4% of participants' ties were considered "strong" and the median user had 38 strong ties (M = 47). The cutoff of 5 was both the mean and median tie strength for the ties participants selected as very close friends in the survey; however, the results we report here are substantively the same if we use a higher threshold and consider close friends to be ones with a threshold of 6 or 7 on the 7-point scale. Participants also reported recent major life events, such as the death of a family member or losing a job. We used these events as control variables in our models and as a baseline for understanding what constitutes a meaningful change in well-being.

Results paralleled our earlier studies. The relationship between Facebook use and well-being depended on how people used the site and with whom they communicated. As shown in Figure 2, receiving targeted, one-on-one communication such as private messages, wall posts, or comments from one's strong ties was associated with improvements in perceived social support (see point 1), as well as happiness, self-reported health, depression, loneliness, negative affect and stress (all ps < .05). Reverse causation cannot account for these results; measures of initial psychological well-being did not predict changes in either respondents' communication with strong ties or the ties' communication directed at respondents. In contrast to communication with strong ties, communication with weaker ties (point 2), reading content such as status updates that were broadcast to a larger audience (point 3), or broadcasting one's own content (point 4) were not associated with these improvements in well-being. Moreover, the association of one-on-one communication with improvements in well-being was stronger for more substantive communication (for example, written comments) than for stylized, low-effort communication (for example, Facebook "likes").7

The effect sizes were large in comparison to the effects of other life events. For example, receiving a standard deviation more communication from strong ties—approximately 60 more comments—was associated with increases in perceived social support as large as those that occur following a death in the family, a time when the bereaved receive an outpouring of support and condolences from others (compare points 1 and 5 in Figure 2). The effect size for strong-tie communication was also comparable to the effect size for other major life events, like getting married, having a new baby, or losing one's job.6 In contrast, after accounting for this strong-tie communication, well-being did not improve with other Facebook activities, such as talking with weak ties, reading friends' broadcasts, or broadcasting content oneself.

Figure 2. Changes in self-reported social support over a month-long period associated with different uses of Facebook (left) and major life events (right). The y-axis represents change in social support for an average Facebook user. Each point on the left shows the mean change in social support and its standard error associated with a standard-deviation increase in one of four uses of Facebook (communicating with strong ties, communicating with weak ties, reading others' broadcast content, and broadcasting one's own content). Each point on the right shows the association between changes in social support and major life events, such as a death in the family or getting married (adapted from Burke5).

[Figure 2 plot. y-axis: Change in Social Support (–0.10 to 0.10). Left panel, With Facebook Use: (1) Strong-tie comm.; (2) Weak-tie comm.; (3) Reading; (4) Broadcasting. Right panel, With Major Life Events: (5) Death; (6) Marriage; (7) New Baby; (8) New Job; (9) Illness; (10) Lost Job; (11) Divorce.]

Summary and Conclusion

Does penetration of the Internet into people's lives for connecting to other people, finding information, and entertainment have larger consequences, beyond directly supporting these tasks? The lessons from this literature review are both substantive and methodological.

In terms of substantive conclusions, research reviewed here is consistent with a hypothesis that communicating online with close friends and family can have beneficial effects on psychological well-being as measured by declines in depression, loneliness and stress, and increases in perceived social support, mood, and life satisfaction. In contrast, many other uses of the Internet, including using the Internet for information, entertainment and communicating online with weaker ties, do not have similar, positive associations with psychological well-being. Indeed, our earliest research showed that communication with strangers was associated with declines in psychological well-being.

Methodologically, one should be skeptical about conclusions drawn from cross-sectional research about the Internet's possible effects, because preexisting differences in psychological well-being can shape how people use the Internet. Indeed, our research demonstrates different conclusions from cross-sectional and longitudinal analyses of the same data.25 To better understand the relationship between Internet use and well-being, one needs longitudinal data and data that differentiates types of Internet use and classes of communication partners. Preexisting conditions like depression often cause withdrawal from social activities both online and offline; however, the analyses in this article control for respondents' initial well-being. Therefore, differences in preexisting conditions like depression and its association with withdrawal from social interactions cannot account for the findings. We caution, however, that despite the advantages that derive from panel data, the analyses are correlational. Therefore, it is possible that changes during the measurement period in some third variable, like relocating, losing a job, or acquiring a serious illness, can shift both how people use the Internet and their psychological well-being.

Much has changed in the decades since we first began communicating on the Internet, and so in addition to being skeptical of cross-sectional, undifferentiated research, we must continuously reevaluate the impact the Internet has on our lives. For example, ubiquitous access via smartphones and other mobile devices has changed social norms of connectivity. The composition of our online networks has expanded to include grandparents and coworkers as well as many more strangers available on Twitter and other social media. In addition to text that dominated early online interpersonal communication, the genres of online interactions have expanded to include game-playing and the exchange of pictures and movies. Millennials have never known a world without the Internet. We can see detailed histories of our online interactions with Facebook friends and their relationships with others. There are many opportunities to investigate how these changes affect our well-being over the long term. However, our research over the past 15 years has demonstrated that, much like offline communication, the impact depends on the nature of the communication and with whom we are talking.

If use of the Internet does cause changes in psychological well-being, what is the nature of these "Internet effects"? Although evidence suggests the Internet, like print media and television before it, seems to have identifiable effects on psychological well-being, these associations do not imply a strong technological determinism. Human agency is key, because the technological effects depend upon how people decide to use technology. However, as with other daily-life activities, the way choices are framed and the effort involved in engaging in the activities are likely to bias the choices people make (see Thaler30 for a fuller presentation of this argument). Even though people seem willing to invest more effort to communicate with closer ties than weaker ties and receive more benefits from interactions with them,22 modern information technology can change the effort needed to keep up with distant friends, to meet and have discussions with strangers, to have rich communication with specific ties or to have superficial interactions with acquaintances. These changes in effort can shift how people spend their time, which relationships they retain over time, and what they talk about. Thus, it is not the use of technology per se, but these decisions, which can be biased by technology, which directly influence psychological well-being.

References
1. Bessière, K., Kiesler, S., Kraut, R. and Boneva, B. Effects of Internet use and social resources on changes in depression. Information, Communication & Society 11, 1 (2008), 47–70.
2. Bessière, K., Pressman, S., Kiesler, S. and Kraut, R.E. Effects of Internet use on health and depression: A longitudinal study. J. Medical Internet Research 12, 1 (2010), e6; http://www.jmir.org/2010/2011/e2016/HTML.
3. Bikson, T.K. and Eveland, J.D. The interplay of work group structures and computer support. Intellectual Teamwork: Social and Technological Foundations of Cooperative Work. J. Galegher, R.E. Kraut et al. (Eds.). Lawrence Erlbaum Associates, Inc., Hillsdale, NJ, 1990, 245–290.
4. Burke, M., Marlow, C. and Lento, T. Social network activity and social well-being. In Proceedings of the 28th International Conference on Human Factors in Computing Systems. ACM Press, New York, 2010, 1909–1912.
5. Burke, M. Reading, writing, relationships: The impact of social network sites on relationships and well-being. Ph.D. Thesis. Carnegie Mellon University, Pittsburgh, PA, 2011.
6. Burke, M., Kraut, R.E. and Marlow, C. Social capital on Facebook: Differentiating uses and users. In Proceedings of CHI 2011: The ACM Conference on Human Factors in Computing Systems. ACM, New York, 2011, 571–580.
7. Burke, M. and Kraut, R. Using Facebook after losing a job: Differential benefits of strong and weak ties. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, New York, 2013, 1419–1430.
8. Callaghan, P. and Morrissey, J. Social support and health: A review. J. Advanced Nursing 18, 2 (2008), 203–210.
9. Cummings, J., Butler, B. and Kraut, R. The quality of online social relationships. Commun. ACM 45, 7 (July 2002), 103–108.
10. Ellison, N.B., Steinfield, C. and Lampe, C. The benefits of Facebook ‘friends:’ Social capital and college students' use of online social network sites. J. Computer-Mediated Communication 12, 4 (2007), 1143–1168.
11. Facebook, Inc. Company info. Facebook, Menlo Park, CA, 2015.
12. Hampton, K. and Wellman, B. Neighboring in Netville: How the Internet supports community and social capital in a wired suburb. City & Community 2, 4 (2003), 277–311.
13. Harmon, A. Sad, lonely world discovered in cyberspace. New York Times (Aug. 30, 1998), section 1, 1.
14. Katz, J.E. and Rice, R.E. Social Consequences of Internet Use: Access, Involvement, and Interaction. MIT Press, Cambridge, MA, 2002.
15. Kraut, R.E., Patterson, M., Lundmark, V., Kiesler, S., Mukhopadhyay, T. and Scherlis, W. Internet paradox: A social technology that reduces social involvement and psychological well-being? American Psychologist 53, 9 (1998), 1017–1031.
16. Marche, S. Is Facebook making us lonely? The Atlantic (May 2012).
17. McKenna, K., Green, A.S. and Gleason, M. Relationship formation on the Internet: What's the big attraction? J. Social Issues 58, 1 (2002), 9–31.
18. Mikami, A.Y., Szwedo, D.E., Allen, J.P., Evans, M.A. and Hare, A.L. Adolescent peer relationships and behavior problems predict young adults' communication on social networking websites. Developmental Psychology 46, 1 (2010), 46–56.
19. Pope Benedict XVI. Truth, proclamation and authenticity of life in the digital age. June 5, 2011.
20. Rainie, L., Boase, J., Horrigan, J.B. and Wellman, B. The Strength of Internet Ties. Pew Internet and American Life Project, Washington, D.C., 2006.
21. Rains, S.A. and Young, V. A meta-analysis of research on formal computer-mediated support groups: Examining group characteristics and health outcomes. Human Communication Research 35, 3 (2009), 309–336.
22. Roberts, S.G. and Dunbar, R.I. The costs of family and friends: An 18-month longitudinal study of relationship maintenance and decay. Evolution and Human Behavior 32, 3 (2011), 186–197.
23. Selfhout, M.H., Branje, S.J., Delsing, M., ter Bogt, T.F. and Meeus, W.H. Different types of Internet use, depression, and social anxiety: The role of perceived friendship quality. J. Adolescence 32, 4 (2009), 819–833.
24. Shaw, L.H. and Gant, L.M. In defense of the Internet: The relationship between Internet communication and depression, loneliness, self-esteem, and perceived social support. CyberPsychology & Behavior 5, 2 (2002), 157–171.
25. Shklovski, I., Kraut, R. and Rainie, L. The Internet and social participation: Contrasting cross-sectional and longitudinal analyses. J. Computer-Mediated Communication 10, 1 (2004).
26. Shklovski, I., Kiesler, S. and Kraut, R.E. The Internet and social interaction: A meta-analysis and critique of studies, 1995–2003. Computers, Phones, and the Internet: Domesticating Information Technology. R.E. Kraut, M. Brynin and S. Kiesler (Eds.). Oxford University Press, New York, 2006, 251–264.
27. Singer, J.D. and Willett, J.B. Applied Longitudinal Data Analysis. Oxford University Press, New York, 2003.
28. Steinfield, C., Ellison, N. and Lampe, C. Social capital, self-esteem, and use of online social network sites: A longitudinal analysis. J. Applied Developmental Psychology 29, 6 (2008), 434–445.
29. Sutcliffe, A., Dunbar, R., Binder, J. and Arrow, H. Relationships and the social brain: Integrating psychological and evolutionary perspectives. British J. Psychology 103, 2 (2011), 149–168.
30. Thaler, R. and Sunstein, C. Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press, New Haven, CT, 2008.
31. Thoits, P.A. Mechanisms linking social ties and support to physical and mental health. J. Health and Social Behavior 52, 2 (2011), 145–161.
32. van den Eijnden, R.J.J.M., Meerkerk, G.J., Vermulst, A.A., Spijkerman, R. and Engels, R.C.M.E. Online communication, compulsive Internet use, and psychosocial well-being among adolescents: A longitudinal study. Developmental Psychology 44, 3 (2008), 655.
33. Walther, J.B. Interpersonal effects in computer-mediated interaction: A relational perspective. Communication Research 19, 1 (1992), 52–90.

Recommended Reading
Burke, M. and Kraut, R. Using Facebook after losing a job: Differential benefits of strong and weak ties. In CSCW'13: Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, NY, 2013, 1419–1430.
Ellison, N.B., Steinfield, C. and Lampe, C. The benefits of Facebook ‘friends:’ Social capital and college students' use of online social network sites. J. Computer-Mediated Communication 12, 4 (2007), 1143–1168.
Kraut, R.E. et al. Internet paradox: A social technology that reduces social involvement and psychological well-being? American Psychologist 53, 9 (1998), 1017–1031.
Sutcliffe, A. et al. Relationships and the social brain: Integrating psychological and evolutionary perspectives. British J. Psychology 103, 2 (2011), 149–168.

Robert Kraut is the Herbert A. Simon Professor of Human-Computer Interaction in the Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, PA.

Moira Burke is a data scientist at Facebook, Menlo Park, CA.

Copyright held by authors. Publication rights licensed to ACM. $15.00.

research highlights

P. 102 Technical Perspective: Paris Beyond Frommer's
By Noah Snavely

P. 103 What Makes Paris Look Like Paris?
By Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros

P. 111 Technical Perspective: In-Situ Database Management
By David Maier

P. 112 NoDB: Efficient Query Execution on Raw Data Files
By Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, Stratos Idreos, and Anastasia Ailamaki


Technical Perspective
Paris Beyond Frommer's
By Noah Snavely

DOI:10.1145/2830538
To view the accompanying paper, visit doi.acm.org/10.1145/2830541

TRY TO VISUALIZE PARIS. If you are like me, you will imagine a tourist-guidebook composite—the Eiffel Tower lit up at night, the Notre Dame Cathedral, the bridges over the Seine, and so on. But this is not what most of Paris looks like. It turns out that many people can look at a photograph of any randomly selected street corner in Paris, and can correctly identify the city with high accuracy, even without paying attention to any text in the photo. One must conclude that all of Paris is infused with a "Paris-ness"—a certain je ne sais quoi—that leaves an indelible visual mark on the City of Light.

How can we quantify this Paris-ness? Can a computer automatically discover and tell us what makes Paris look so much like Paris? More broadly, this question of visual style is an important one in computer graphics and vision. Identifying the key elements that characterize a style—whether a style of interior design, art, or, in the case of a city, its architecture and ornamentations—could aid in a range of applications, such as obtaining reference imagery for a new design task, or for summarizing or categorizing the look of a large set of images.

However, identifying and characterizing visual style automatically is a very challenging problem, one that is difficult even to formulate in a rigorous way. This is where the following paper steps in. This work, and several companion papers in computer vision, offers a creative, inspiring new approach to discovering the visual style of a city like Paris. The authors achieve this feat through new algorithms that analyze massive collections of photos of Paris and other places around the world.

One key aspect of this visual discovery problem is starting with the right image data. A popular approach in computer vision is to mine data from a large set of consumer photos shared on sites like Flickr. However, in the case of a city like Paris, this approach would result in a representation of Paris akin to the collage of landmarks of my own tourist-centered imagination, because this is how Paris is represented in photos shared online. To be sure, these landmarks are important, but to capture the real essence of Paris we need to look further.

This work instead turns to Google Street View to gather many thousands of photos captured systematically throughout the city from cars—the kind of views you would see on a stroll down the street. Hence, the visual elements discovered in this work are the sort you would encounter in the city every day, perhaps not even noticing they make up its visual fabric.

But given a large, representative set of images of a place, how could we use them to compute these visual elements? One approach would be to break each image up into small patches (say, about the size of a door or window), group these patches by visual similarity using a clustering algorithm, and then identify the most common clusters as the key visual elements. But, as this work shows, this approach does not work very well—even if you account for uninteresting patches such as those in the sky, the results of standard clustering methods are unremarkable patches such as edges and corners.

Instead, the authors start from a crucial insight: what makes Paris look like Paris is not necessarily image patches that commonly appear in Paris, but instead those patches that appear in Paris but nowhere else—in other words, patches that distinguish Paris from all other cities. Using a new discriminative clustering technique, they show how they can automatically identify such distinctive patches. The discovered patches they show are remarkably evocative of Paris, capturing its unique balconies, signs, light posts, and other elements. When I first saw these results at SIGGRAPH, it immediately struck me they were on to something new and important, and showed something that no previous visual clustering method had shown. And many other interesting insights follow—such as the fact that U.S. cities are pretty similar to each other, but are distinctive from cities on the Continent in that they are filled with cars.

Using large image collections for computer vision and graphics is by now a tried-and-true approach. Past work has used photos mined from the Internet to train better object recognition systems or build 3D models. But it is exciting to see the fresh and innovative use of big data presented here. There is something magical about automatically distilling the visual signature of a place—signatures we all can sense but cannot easily articulate. And more broadly, this work represents an exciting new direction in discovering visual styles from big data.

Noah Snavely is an associate professor in the computer science department at Cornell University, Ithaca, NY, where he is a member of the graphics and vision group; he also works on the technical staff at Google.

Copyright held by author.

What Makes Paris Look Like Paris?
By Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros
COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 | DOI:10.1145/2830541

Abstract
Given a large repository of geo-tagged imagery, we seek to automatically find visual elements, for example windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.

The original version of this paper was published in SIGGRAPH, 2012.

1. INTRODUCTION
Consider the two photographs in Figure 1, both downloaded from Google Street View. One comes from Paris, the other one from London. Can you tell which is which? Surprisingly, even for these nondescript street scenes, people who have been to Europe tend to do quite well on this task. In an informal survey, we presented 11 subjects with 100 random Street View images of which 50% were from Paris, and the rest from eleven other cities. We instructed the subjects (who have all been to Paris) to try to ignore any text in the photos, and collected their binary forced-choice responses (Paris/Not Paris). On average, subjects were correct 79% of the time (std = 6.3), with chance at 50% (when allowed to scrutinize the text, performance for some subjects went up as high as 90%). What this suggests is that people are remarkably sensitive to the geographically informative features within the visual environment. But what are those features? In informal debriefings, our subjects suggested that for most images, a few localized, distinctive elements “immediately gave it away.” For example, for Paris, things like windows with railings, the particular style of balconies, the distinctive doorways, the traditional blue/green/white street signs, etc. were particularly helpful. Finding those features can be difficult though, since every image can contain more than 25,000 candidate patches, and only a tiny fraction will be truly distinctive.

In this work, we want to find such local geo-informative features automatically, directly from a large database of photographs from a particular place, such as a city. Specifically, given tens of thousands of geo-localized images of some geographic region R, we aim to find a few hundred visual elements that are both: (1) repeating, that is, they occur often in R, and (2) geographically discriminative, that is, they occur much more often in R than in R^C (the complement of R, i.e., everywhere else). Figure 1 shows sample output of our algorithm: for each photograph we show three of the most geo-informative visual elements that were automatically discovered. For the Paris scene (left), the street sign, the window with railings, and the balcony support are all flagged as informative.

But why is this topic important for modern computer graphics? (1) Scientifically, the goal of understanding which visual elements are fundamental to our perception of a complex visual concept, such as a place, is an interesting and useful one. Our paper shares this motivation with a number of other recent works that do not actually synthesize new visual imagery, but rather propose ways of finding and visualizing existing image data in better ways, be it selecting candid portraits from a video stream,5 summarizing a scene from photo collections,19 finding iconic images of an object,1 etc. (2) More practically, one possible future application of the ideas presented here might be to help CG modelers by generating the so-called “reference art” for a city. For instance, when modeling Paris for Pixar’s Ratatouille, the co-director Jan Pinkava faced exactly this problem: “The basic question for us was: ‘what would Paris look like as a model of Paris?’, that is, what are the main things that give the city its unique look?”14 Their solution was to “run around Paris for a week like mad tourists, just looking at things, talking about them, and taking lots of pictures” not just of the Eiffel Tower but of the many stylistic Paris details, such as signs, doors, etc.14 (see photos on pp. 120–121). But if going “on location” is not feasible, our approach could serve as the basis for a detail-centric reference art retriever, which would let artists focus their attention on the most statistically significant stylistic elements of the city. (3) And finally, more philosophically, our ultimate goal is to provide a stylistic narrative for a visual experience of a place. Such a narrative, once established, can be related to others in a kind of geo-cultural visual reference graph, highlighting similarities and differences between regions. For example, one could imagine finding a visual appearance “trail” from Greece, through Italy and Spain and into Latin America. In this work, we only take the first steps in this direction—connecting visual appearance across cities, finding similarities within a continent, and differences between neighborhoods. But we hope that our work might act as a catalyst for research in this new area, which might be called computational geo-cultural modeling.
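The two criteria can be made concrete with a toy filter. This is only an illustration, not the authors' algorithm: it assumes the occurrence counts of each candidate pattern inside R and in R^C are already known (in practice they are exactly what is unknown, which is why Section 4 estimates them implicitly), and the pattern names, counts, and the thresholds `min_count` and `min_ratio` are hypothetical. Counts are turned into per-image rates because the negative collection (11 other cities at roughly 10,000 images each) is far larger than the positive one.

```python
def geo_informative(counts, n_inside, n_outside, min_count=10, min_ratio=5.0):
    """Keep patterns that are both repeating and geographically discriminative.

    counts: dict mapping a pattern name to (occurrences in R, occurrences in R^C).
    n_inside, n_outside: number of images in R and in R^C, used to turn counts into rates.
    min_count, min_ratio: hypothetical thresholds chosen for illustration only.
    """
    kept = []
    for pattern, (c_in, c_out) in counts.items():
        repeating = c_in >= min_count                      # criterion (1): occurs often in R
        rate_in = c_in / n_inside
        rate_out = c_out / n_outside
        discriminative = rate_in >= min_ratio * rate_out   # criterion (2): much more often in R
        if repeating and discriminative:
            kept.append(pattern)
    return kept


# Hypothetical tallies for about 10,000 images of R and 110,000 images elsewhere.
toy_counts = {
    "sidewalk": (9000, 95000),       # frequent everywhere: not discriminative
    "car": (7000, 80000),            # frequent everywhere: not discriminative
    "Eiffel Tower": (2, 0),          # discriminative but far too rare
    "street sign": (600, 300),
    "cast-iron balcony": (400, 700),
}
print(geo_informative(toy_counts, n_inside=10_000, n_outside=110_000))
# -> ['street sign', 'cast-iron balcony']
```

Dropping either test lets through exactly the failure cases named at the start of Section 4: without the ratio test, sidewalks and cars pass; without the count test, the Eiffel Tower passes.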

Figure 1. These two photos might seem nondescript, but each contains hints about which city it might belong to. Given a large image database of a given city, our algorithm is able to automatically discover the geographically informative elements (patch clusters to the right of each photo) that help in capturing its “look and feel.” On the left, the emblematic street sign, a balustrade window, and the balcony support are all very indicative of Paris, while on the right, the neoclassical columned entryway sporting a balcony, a Victorian window, and, of course, the cast-iron railings are very much features of London.
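Section 4.1 tests the discriminative criterion without explicit counts: each candidate's top 20 nearest neighbors are retrieved from the full dataset (positive and negative) by normalized correlation, and the fraction that come from the positive set serves as its initial geo-informativeness. The sketch below shows only that scoring and ranking step on plain Python lists. The descriptors are toy stand-ins for the HOG+color features, the function names are our own, the brute-force sort would be far too slow at the paper's scale, and the near-duplicate rejection the paper also applies is left out.

```python
import math


def normalized_correlation(a, b):
    """Normalized (Pearson) correlation between two equal-length descriptors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant descriptor has no pattern to correlate with
    return sum(x * y for x, y in zip(da, db)) / denom


def geo_informativeness(candidate, dataset, k=20):
    """Fraction of the candidate's top-k neighbors that come from the positive set.

    dataset: list of (descriptor, is_positive) pairs pooled from the positive set
    (inside the city) and the negative set (the other cities).
    """
    ranked = sorted(dataset,
                    key=lambda item: normalized_correlation(candidate, item[0]),
                    reverse=True)
    top = ranked[:k]
    return sum(1 for _, is_positive in top if is_positive) / len(top)


def seed_candidates(candidates, dataset, k=20, keep=1000):
    """Keep the `keep` candidates whose neighbors fall most often in the positive set."""
    return sorted(candidates,
                  key=lambda c: geo_informativeness(c, dataset, k),
                  reverse=True)[:keep]


toy_data = [
    ([1.0, 2.0, 3.0, 5.0], True),
    ([2.0, 4.0, 6.0, 8.0], True),
    ([4.0, 3.0, 2.0, 1.0], False),
    ([1.0, 1.0, 1.0, 1.0], False),
]
print(geo_informativeness([1.0, 2.0, 3.0, 4.0], toy_data, k=2))  # 1.0
```

Normalized correlation ignores each descriptor's mean and scale, so two patches match when their patterns agree rather than when their overall contrast does.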

2. PRIOR WORK
In the field of architectural history, descriptions of urban and regional architectural styles and their elements are well established. Such local elements and rules for combining them have been used in computer systems for procedural modeling of architecture to generate 3D models of entire cities in an astonishing level of detail, for example, Mueller et al.,12 or to parse images of facades, for example, Teboul et al.22 However, such systems require significant manual effort from an expert to specify the appropriate elements and rules for each architectural style.

At the other end of the spectrum, data-driven approaches have been leveraging the huge datasets of geo-tagged images that have recently become available online. For example, Crandall et al.2 use the GPS locations of 35,000 consumer photos from Flickr to plot photographer-defined frequency maps of cities and countries. Geo-tagged datasets have also been used for place recognition8, 17 including famous landmarks.10, 11 Our work is particularly related to Schindler et al.17 and Knopp et al.,8 where geo-tags are also used as a supervisory signal to find sets of image features discriminative for a particular place. While these approaches can work very well, their image features typically cannot generalize beyond matching specific buildings imaged from different viewpoints. Alternatively, global image representations from scene recognition, such as the GIST descriptor,13 have been used for geolocalization of generic scenes on the global Earth scale.6, 7 There, too, reasonable recognition performance has been achieved, but the use of global descriptors makes it hard for a human to interpret why a given image gets assigned to a certain location.

Finally, our paper is related to a line of work on unsupervised object discovery16, 20 (and especially Quack et al.,15 who also deal with mining geo-tagged image data). Such methods attempt to explicitly discover features or objects which occur frequently in many images and are also useful as human-interpretable elements of visual representation. But being unsupervised, these methods are limited to only discovering things that are both very common and highly visually consistent.

In contrast, here we propose a discovery method that is weakly constrained by location labels derived from GPS tags, and which is able to mine representative visual elements automatically from a large online image dataset. Not only are the resulting visual elements geographically discriminative (i.e., they occur only in a given locale), but they also typically look meaningful to humans, making them suitable for a variety of geo-data visualization applications. The next section describes the data used in this work, followed by the full description of our algorithm.

3. THE DATA
Flickr has emerged as the data-source of choice for most recently developed data-driven applications in computer vision and graphics, including visual geo-location.2, 6, 11 However, the difficulty with Flickr and other consumer photo-sharing websites for geographical tasks is that there is a strong data bias toward famous landmarks. To correct for this bias and provide a more uniform sampling of the geographical space, we turn to Google Street View, a huge database of street-level imagery, captured as panoramas using specially designed vehicles. This enables extraction of roughly fronto-parallel views of building facades and, to some extent, avoids dealing with large variations of camera viewpoint.

Given a geographical area on a map, we automatically scrape a dense sampling of panoramas of that area from Google Street View. From each panorama, we extract two perspective images (936 × 537 pixels), one on each side of the capturing vehicle, so that the image plane is roughly parallel to the vehicle’s direction of motion. This results in approximately 10,000 images per city. For this project, we downloaded 12 cities: Paris, London, Prague, Barcelona, Milan, New York, Boston, Philadelphia, San Francisco, São Paulo, Mexico City, and Tokyo.

4. DISCOVERING GEO-INFORMATIVE ELEMENTS
Our goal is to discover visual elements which are characteristic of a given geographical locale (e.g., the city of Paris). That is, we seek patterns that are both frequently occurring within the given locale, and geographically discriminative, that is, they appear in that locale and do not appear elsewhere. Note that neither of these two requirements by itself is enough: sidewalks and cars occur frequently in Paris but are hardly discriminative, whereas the Eiffel Tower is very discriminative, but too rare to be
useful (<0.0001% in our data). In this work, we will represent visual elements by square image patches at various resolutions, and mine them from our large image database. The database will be divided into two parts: (i) the positive set containing images from the location whose visual elements we wish to discover (e.g., Paris); and (ii) the negative set containing images from the rest of the world (in our case, the other 11 cities in the dataset). We assume that many frequently occurring but uninteresting visual patterns (trees, cars, sky, etc.) will occur in both the positive and negative sets, and should be filtered out. Our biggest challenge is that the overwhelming majority of our data is uninteresting, so matching the occurrences of the rare interesting elements is like finding a few needles in a haystack.

One possible way to attack this problem would be to first discover repeated elements and then simply pick the ones which are the most geographically discriminative. A standard technique for finding repeated patterns in data is clustering. For example, in computer vision, “visual word” approaches21 use k-means clustering on image patches represented by SIFT descriptors. Unfortunately, standard visual words tend to be dominated by low-level features, like edges and corners (Figure 2a), not the larger visual structures we are hoping to find. While we can try clustering using larger image patches (with a higher-dimensional feature descriptor, such as HOG3), k-means behaves poorly in very high dimensions, producing visually inhomogeneous clusters (Figure 2b). We believe this happens because k-means (and similar approaches) partition the entire feature space. This tends to lose the needles in our haystack: the rare discriminative elements get mixed with, and overwhelmed by, less interesting patches, making it unlikely that a distinctive element could ever emerge as its own cluster.

Figure 2. (a) k-Means clustering using SIFT (visual words) is dominated by low-level features. (b) k-Means clustering over higher-dimensional HOG features produces visually incoherent clusters.

In this article, we propose an approach that avoids partitioning the entire feature space into clusters. Instead, we start with a large number of randomly sampled candidate patches, and then give each candidate a chance to see if it can converge to a cluster that is both frequent and discriminative. We first compute the nearest neighbors (NNs) of each candidate, and reject candidates with too many neighbors in the negative set. Then we gradually build clusters by applying iterative discriminative learning to each surviving candidate. The following section presents the details of this algorithm.

4.1. Our approach
From the tens of millions of patches in our full positive set, we randomly sample a subset of 25,000 high-contrast patches to serve as candidates for seeding the clusters. Throughout the algorithm, we represent such patches using a HOG+color descriptor. First, the initial geo-informativeness of each patch is estimated by finding the top 20 NN patches in the full dataset (both positive and negative), measured by normalized correlation, and counting how many of them come from Paris. Figure 3 shows NNs for a few randomly selected patches and for patches whose neighbors all come from Paris. Note that the latter patches are not only more Parisian, but also considerably more coherent. This is because generating a coherent cluster is a prerequisite to retrieving matches exclusively from Paris: any patch whose matches are incoherent will likely draw those matches randomly from inside and outside Paris. We keep the candidate patches that have the highest proportion of their NNs in the positive set, while also rejecting near-duplicate patches (measured by spatial overlap of more than 30% between any 5 of their top 50 NNs). This reduces the number of candidates to about 1000.

Figure 3. Left: Randomly sampled candidate patches and their nearest neighbors according to a standard distance metric. Right: After sorting the candidates by the number of retrieved neighbors that come from Paris, coherent Parisian elements have risen to the top.

Some good elements, however, get matched incorrectly during the nearest-neighbors phase. Figure 4 shows a patch that contains both a street sign and a vertical bar on the right (the end of the facade). The naïve distance metric does not know what is important, and so it tries to match both. Yet too few patches containing both exist in the dataset; for the remainder, the algorithm matches the vertical bar simply because it is more frequent. To fix this problem, we aim to learn a distance metric that gives higher weight to the features that make the patch geo-discriminative.

Figure 4. Top: Using the naïve distance metric for this patch retrieves some good matches and some poor matches, because the patch contains both a street sign and a vertical bar on the right. Bottom: Our algorithm reweights the dimensions of our patch descriptor to separate Paris from non-Paris. The algorithm learns that focusing on the street sign achieves maximum separation from the non-Paris walls.

Recently, Shrivastava et al.18 showed how one can improve visual retrieval by adapting the distance metric to the given query using discriminative learning. We adopt similar machinery, training a linear SVM detector for each visual element in an iterative manner as in Singh et al.20 Unlike these previous works, however, we emphasize that the weak labels are the workhorse of the distance learning. In the case of Figure 4, for example, we know that the street sign is more important because it occurs only in Paris, whereas the vertical bar occurs everywhere. We train an SVM detector for each visual element, using the top k NNs from the positive set as positive examples, and all negative-set patches as negative examples. While this produces a small improvement (Figure 5, row 2), it is not enough, since the top k matches might not have been very good to begin with. So, we iterate the SVM learning, using the top k detections from the previous round as positives (we set k = 5 for all experiments). The idea is that with each round, the top detections will become better and better, resulting in a continuously improving detector. However, doing this directly would not produce much improvement because the SVM tends to over-fit to the initial positive examples (Singh et al.20), and will prefer them in each next round over new (and better) ones. Therefore, we apply cross-validation by dividing both the positive and the negative parts of the dataset into l equally sized subsets (we set l = 3 for all experiments). At each iteration of the training, we apply the detectors trained on the previous round to a new, unseen subset of data to select the top k detections for retraining. In our experiments, we used three iterations, as most good clusters did not need more to converge (i.e., stop changing). After the final iteration, we rank the resulting detectors based on their accuracy: the percentage of the top 50 firings that are in the positive dataset (i.e., in Paris). We return the top few hundred detectors as our geo-informative visual elements.

Figure 5 illustrates the progression of these iterations. For example, in the left column, the initial NNs contain only a few windows with railings. However, windows with railings differ more from the negative set than the windows without railings; thus the detector quickly becomes more sensitive to them as the algorithm progresses. The rightmost example does not appear to improve, either in visual similarity or in geo-discriminativeness. This is because the original candidate patch was intrinsically not very geo-informative and would not make a good visual element. Such patches have a low final accuracy and are discarded.

Figure 5. Steps of our algorithm for three sample candidate patches in Paris. First row: initial candidate and its NN matches. Rows 2–4: iterations of SVM learning (trained using patches on left). Red boxes indicate matches outside Paris. Rows show every 7th match for clarity. Notice how the number of non-Paris matches decreases with each iteration, except for the rightmost cluster, which is eventually discarded.

Implementation Details: Our current implementation considers only square patches (although it would not be difficult to add other aspect ratios), and takes patches at scales ranging from 80-by-80 pixels all the way to height-of-image size. Patches are represented with standard HOG3 (8 × 8 cells with 31 dimensions each), plus an 8 × 8 color image in L*a*b colorspace (a and b only). Each of the 8 × 8 cells therefore carries 31 HOG values plus 2 color values, so the resulting feature has 8 × 8 × 33 = 2112 dimensions. During iterative learning, we use a soft-margin SVM with C fixed to 0.1. The full mining computation is quite expensive; a single city requires approximately 1800 CPU-hours. But since the algorithm is highly parallelizable, it can be done overnight on a cluster.

4.2. Results and validation
Figure 6 shows the results of running our algorithm on several well-known cities. For each city, the left column shows randomly chosen images from Google Street View, while the right column shows some of the top-ranked visual element clusters that were automatically discovered (due to space limitations, a subset of elements was selected manually to show variety; see the project webpage for the full list). Note that for each city, our visual elements convey a better stylistic feel of the city than do the random images. For example, in Paris, the top-scoring elements zero in on some of the main features that make Paris look like Paris: doors, balconies, windows with railings, street signs and special Parisian lampposts. It is also interesting to note that, on the whole, the algorithm had more trouble with American cities: it was able to discover only a few geo-informative elements, and some of them turned out to be different brands of cars, road tunnels, etc. This might be explained by the relative lack of stylistic coherence and uniqueness in American cities (with their melting pot of styles and influences), as well as the supreme reign of the automobile on American streets.

Figure 6. Google Street View versus geo-informative elements for six cities. Arguably, the geo-informative elements (right) are able to provide a better stylistic representation of a city than randomly sampled Google Street View images (left).
(Panels: random Street View images and extracted visual elements for Paris, Prague, London, Barcelona, San Francisco, and Boston.)

In addition to the qualitative results, we would also like to provide a more quantitative evaluation of our algorithm. While validating data-mining approaches is difficult in general, it is possible to measure (1) to what extent our elements are specific to particular locations, and (2) whether users find them subjectively geo-informative in a visual discrimination task.

To evaluate how geo-informative our visual elements are, we ran the top 100 Paris element detectors over an unseen dataset which was 50% from Paris and 50% from elsewhere. For each element, we found its geo-informativeness by computing the percentage of the time it fired in Paris out of the top 100 firings. The average accuracy of our top detectors was 83% (where chance is 50%). We repeated this for our top 100 Prague detectors, and found the average accuracy on an unseen dataset of Prague to be 92%.

Next, we repeated the above experiment with people rather than computers. To avoid subject fatigue, we reduced the dataset to 100 visual elements, 50 from Paris and 50 from Prague. Fifty percent of the elements were the top-ranked ones returned by our algorithm for Paris and Prague. The other 50% were randomly sampled patches of Paris and Prague (but biased to be high-contrast, as before, to avoid empty sky patches, etc.). In a web-based study, subjects (who have all been to Paris but not necessarily Prague) were asked to label each of the 100 patches as belonging to either Paris or Prague (forced choice). The results of our study (22 naive subjects) are as follows: average classification performance for the algorithm-selected patches was 78.5% (std = 11.8), while for random patches it was 58.1% (std = 6.1); the p-value for a paired-samples t-test was < 10^−8. While on random patches subjects did not do much better than chance, performance on our geo-informative elements was roughly comparable to the much simpler full-image classification task reported in the beginning of the paper (although since here we only used Prague, the setups are not quite the same).

5. APPLICATIONS
Now that we have a tool for discovering geographically informative visual elements for a given locale, we can use them to explore ways of building stylistic narratives for cities and of making visual connections between them. Here we discuss just a few such directions.

5.1. Mapping patterns of visual elements
So far, we have shown the discovered visual elements for a given city as an ordered list of patch clusters (Figure 6). Given that we know the GPS coordinates of each patch, however, we could easily display them on a map, and then search for interesting geo-spatial patterns in the occurrences of a given visual element. Figure 7 shows the geographical locations for the top-scoring detections for each of three different visual elements (a sampling of detections is shown below each map), revealing interestingly non-uniform distributions. For example, it seems that balconies with cast-iron railings (left) occur predominantly on the large thoroughfares (bd Saint-Michel, bd Saint-Germain, rue de Rivoli), whereas windows with cast-iron railings (middle) appear mostly on smaller streets. The arch-supporting column (right) is a distinguishing feature of the famous Place des Vosges, yet it also appears in other parts of Paris, particularly as part of the more recent Marché Saint-Germain (this is a possible example of the so-called “architectural citation”). Automatically discovering such architectural patterns may be useful to both architects and urban historians.

Figure 7. Examples of geographic patterns in Paris (shown as red dots on the maps) for three discovered visual elements (shown below each map). Balconies with cast-iron railings are concentrated on the main boulevards (left). Windows with railings mostly occur on smaller streets (middle). Arch-supporting columns are concentrated on Place des Vosges and the St. Germain market (right).
Map data © OpenStreetMap contributors, CC BY-SA

5.2. Visual correspondences across cities
Given a set of architectural elements (windows, balconies, etc.) discovered for a particular city, it is natural to ask what these same elements might look like in other cities. As it turns out, a minor modification to our algorithm can often accomplish this task. We have observed that a detector for a location-specific architectural element will often fire on functionally similar elements in other cities, just with a much lower score. That is, a Paris balcony detector will return mostly London balconies if it is forced to run only on London images. Naturally these results will be noisy, but we can clean them up using an iterative learning approach similar to the one in Section 4.1. The only difference is that we require the positive patches from each iteration of training to be taken not just from the source city, but from all the cities where we wish to find correspondences. For example, to find correspondences between Paris, Prague, and London, we initialize with visual elements discovered in Paris and then, at each round of “clean-up” training, we use nine top positive matches to train each element SVM, three from each of the three cities. Figure 8 illustrates the result of this procedure. Note how capturing the correspondence between similar visual elements across cities can often highlight certain stylistic differences, such as the material for the balconies, the style of the street-lamps, or the presence and position of ledges on the facades.

Figure 8. Visual correspondence. Each row shows corresponding detections of a single visual element detector across three different cities (columns: Paris, France; Prague, Czech Republic; London, England).

Figure 9. Object-centric image averages for the element detector in the top row of Figure 8. Note how the context captures the differences in facade styles between Paris (left) and London (right).
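The clean-up procedure of Section 5.2 reuses the iterative, cross-validated training of Section 4.1 and changes only where the positives come from. The sketch below is a toy version under stated assumptions, not the authors' code: `train_linear_svm` is a minimal soft-margin linear SVM fitted by full-batch subgradient descent (a stand-in for a real solver) with C = 0.1 as in Section 4.1; each city's data is assumed to be pre-split into l disjoint subsets; negatives are a fixed list rather than being split as well; and `per_city=3` mirrors the "three from each of the three cities" rule. All function names, `epochs`, and `lr` are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(pos, neg, c=0.1, epochs=200, lr=0.05):
    """Minimize 0.5*||w||^2 + c * sum_i max(0, 1 - y_i*(w.x_i + b)) by subgradient descent."""
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0          # the regularizer contributes w itself
        for x, y in data:
            if y * (dot(w, x) + b) < 1.0:      # margin violated: hinge term is active
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def top_detections(w, b, pool, k):
    """The k patches in `pool` that the detector scores highest."""
    return sorted(pool, key=lambda x: dot(w, x) + b, reverse=True)[:k]


def clean_up(seeds, city_subsets, negatives, per_city=3, iterations=3):
    """Iterative clean-up training with positives drawn from every city.

    city_subsets: dict city -> list of disjoint subsets (lists of descriptors).
    Round t trains on the current positives, fires on subset t of each city
    (cycling through the subsets), and keeps the top per_city hits per city.
    """
    positives = list(seeds)
    for t in range(iterations):
        w, b = train_linear_svm(positives, negatives)
        positives = []
        for subsets in city_subsets.values():
            positives += top_detections(w, b, subsets[t % len(subsets)], per_city)
    w, b = train_linear_svm(positives, negatives)  # final detector on the last selection
    return w, b, positives


def detector_accuracy(w, b, labeled, top=50):
    """Share of the top-scoring firings that land in the positive set (used for ranking)."""
    ranked = sorted(labeled, key=lambda item: dot(w, item[0]) + b, reverse=True)[:top]
    return sum(1 for _, is_positive in ranked if is_positive) / len(ranked)


seeds = [[1.0, 0.1]]
negatives = [[0.0, 1.0], [0.1, 0.9]]
city_subsets = {
    "paris": [[[0.9, 0.0], [0.0, 0.8]], [[0.8, 0.2], [0.1, 0.9]], [[1.0, 0.1], [0.2, 1.0]]],
    "london": [[[0.85, 0.1], [0.1, 0.7]], [[0.9, 0.2], [0.0, 0.9]], [[0.8, 0.0], [0.1, 0.8]]],
}
w, b, positives = clean_up(seeds, city_subsets, negatives, per_city=1)
print(positives)  # -> [[1.0, 0.1], [0.8, 0.0]]
```

Drawing each round's positives from a subset the detector was not trained on is what keeps the SVM from simply re-selecting its own earlier positives, the over-fitting problem Section 4.1 describes.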

5.3. Visualizing facade layout
Another interesting observation is that some discovered visual elements, despite having a limited spatial extent, can often encode a much larger architectural context. This becomes particularly apparent when looking at the same visual element detector applied in different cities. Figure 9 shows object-centric averages (in the style of Torralba and Oliva23) for the detector in the top row of Figure 8 for Paris and London. That is, for each city, the images with the top 100 detections of the element are first centered on that element and then averaged together in image space. Note that not only do the average detections (red squares) look quite different between the two cities, but the average contexts reveal quite a lot about the differences in the structure and style of facades. In Paris, one can clearly see four equal-height floors, with a balcony row on the third floor. In London, though, floor heights are uneven, with the first floor much taller and more stately.

6. CONCLUSION
So, what makes Paris look like Paris? We argued that the “look and feel” of a city rests not so much on the few famous landmarks (e.g., the Eiffel Tower), but largely on a set of stylistic elements, the visual minutiae of daily urban life. We proposed a method that can automatically find a subset of such visual elements from a large dataset offered by Google Street View, and demonstrated some promising applications. This work is but a first step toward our ultimate goal of providing stylistic narratives to explore the diverse visual geographies of our world. Currently, the method is limited to discovering only local elements (image patches), so a logical next step would be trying to capture larger structures, both urban (e.g., facades), as well as natural (e.g., fields, rivers). Finally, the proposed algorithm is not limited to geographic data. Figure 10 shows promising results for mining discriminative patches on indoor scenes and cars, suggesting that visual elements can be a useful tool for exploring a wide variety of image data domains.

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 109 research highlights

Figure 10. Our algorithm applied to other data sources. Top: Elements for indoor scenes, where the weak label is one of 67 indoor scene categories.4, 20 Bottom: Stylistic elements that differentiate cars from different decades.9

[Figure panels. Indoor scenes: Church, Bedroom, Bathroom, Office, Casino, Closet, Shoe store. Cars over time: 1920s, 1940s, 1960s, 1980s.]

References
1. Berg, T., Berg, A. Finding iconic images. In The 2nd Internet Vision Workshop at Conference on Computer Vision and Pattern Recognition (CVPR) (2009), IEEE, 1–8.
2. Crandall, D., Backstrom, L., Huttenlocher, D., Kleinberg, J. Mapping the world’s photos. In Proceedings of the 18th International Conference on World Wide Web (WWW) (2009), 761–770.
3. Dalal, N., Triggs, B. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Volume 1 (2005), IEEE, 886–893.
4. Doersch, C., Gupta, A., Efros, A.A. Mid-level visual element discovery as discriminative mode seeking. In Advances in Neural Information Processing Systems (NIPS). Volume 26 (2013), 494–502.
5. Fiss, J., Agarwala, A., Curless, B. Candid portrait selection from video. ACM Trans. Graph. (SIGGRAPH Asia) 30, 6 (2011), 128.
6. Hays, J., Efros, A. Im2gps: Estimating geographic information from a single image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008), IEEE, 1–8.
7. Kalogerakis, E., Vesselova, O., Hays, J., Efros, A., Hertzmann, A. Image sequence geolocation with human travel priors. In IEEE 12th International Conference on Computer Vision (ICCV) (2009), IEEE, 253–260.
8. Knopp, J., Sivic, J., Pajdla, T. Avoiding confusing features in place recognition. In European Conference on Computer Vision (ECCV) (2010), Springer, 748–761.
9. Lee, Y.J., Efros, A.A., Hebert, M. Style-aware mid-level representation for discovering visual connections in space and time. In IEEE 14th International Conference on Computer Vision (ICCV) (2013), IEEE, 1857–1864.
10. Li, X., Wu, C., Zach, C., Lazebnik, S., Frahm, J.-M. Modeling and recognition of landmark image collections using iconic scene graphs. In European Conference on Computer Vision (ECCV) (2008), Springer, 427–440.
11. Li, Y., Crandall, D., Huttenlocher, D. Landmark classification in large-scale image collections. In IEEE 12th International Conference on Computer Vision (ICCV) (2009), IEEE, 1957–1964.
12. Mueller, P., Wonka, P., Haegler, S., Ulmer, A., Van Gool, L. Procedural modeling of buildings. ACM Trans. Graph. (SIGGRAPH) 25, 3 (2006), 614–623.
13. Oliva, A., Torralba, A. Building the gist of a scene: The role of global image features in recognition. Prog. Brain Res. 155 (2006), 23–36.
14. Paik, K. The Art of Ratatouille. Chronicle Books, 2006.
15. Quack, T., Leibe, B., Van Gool, L. World-scale mining of objects and events from community photo collections. In Proceedings of the International Conference on Content-based Image and Video Retrieval (CIVR) (2008), 47–56.
16. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A. Using multiple segmentations to discover objects and their extent in image collections. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006), IEEE, 1605–1614.
17. Schindler, G., Brown, M., Szeliski, R. City-scale location recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007), IEEE, 1–7.
18. Shrivastava, A., Malisiewicz, T., Gupta, A., Efros, A.A. Data-driven visual similarity for cross-domain image matching. ACM Trans. Graph. (SIGGRAPH Asia) 30, 6 (2011), 154.
19. Simon, I., Snavely, N., Seitz, S.M. Scene summarization for online image collections. In IEEE 11th International Conference on Computer Vision (ICCV) (2007), IEEE, 1–8.
20. Singh, S., Gupta, A., Efros, A.A. Unsupervised discovery of mid-level discriminative patches. In European Conference on Computer Vision (ECCV) (2012), Springer, 73–86.
21. Sivic, J., Zisserman, A. Video google: A text retrieval approach to object matching in videos. In IEEE 9th International Conference on Computer Vision (ICCV) (2003), IEEE, 1470–1477.
22. Teboul, O., Simon, L., Koutsourakis, P., Paragios, N. Segmentation of building facades using procedural shape priors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010), IEEE, 3105–3112.
23. Torralba, A., Oliva, A. Statistics of natural image categories. Netw. Comput. Neural Syst. 14, 3 (2003), 391–412.

Carl Doersch ([email protected]), Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA.
Saurabh Singh (saurabh.me@gmail.com), Computer Science Department, University of Illinois, Urbana-Champaign, Champaign, IL.
Abhinav Gupta ([email protected]), Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
Josef Sivic ([email protected]), Computer Science Department, INRIA/Ecole Normale Supérieure, Paris, France.
Alexei A. Efros ([email protected]), Electrical Engineering and Computer Science (EECS) Department, University of California, Berkeley, Berkeley, CA.

Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/what-makes-paris-look-like-paris

Copyright held by authors. Publication rights licensed to ACM. $15.00.

DOI:10.1145/2830506

Technical Perspective
In-Situ Database Management
By David Maier

To view the accompanying paper, visit doi.acm.org/10.1145/2830508

IMAGINE YOU HAVE a collection of data files—say, cell-tower call records—and you believe some of them might contain useful information: towers that are near capacity, numbers whose calls are frequently dropped, failed hand-offs. You want to run a few queries over selected files to explore which ones might merit further analysis and determine what kinds of knowledge you might extract. If a file is small, you might transfer it to your own computer and inspect it with a spreadsheet program or an analysis environment such as R. However, suppose individual files are too large to fit on your computer. What are your choices then for exploring them?

One possibility is to load the data into a database management system (DBMS) and use the query language—likely SQL—to ask your questions. While you get the advantage of a high-level data language, it might take hours to load the data before you can pose your first query. If you want to switch to another file, then you have to wait again while that file loads. Moreover, any file you do load now takes up at least twice the storage space, since its data is in both the database and the file system. Deleting the original file might not be possible if it has other users.

An option is to use a MapReduce framework, such as Hadoop, to run your preliminary analyses. Now your “time to insight” is delayed by having to write (and probably debug) a program. Even if you are able to formulate your questions as programs fairly quickly (perhaps using a language layer such as Hive or Pig Latin), each query you run will scan the whole file anew. In addition, you lose performance enhancements such as the indexes and optimization available in a DBMS.

Such unattractive trade-offs face nearly everyone wanting to quickly explore a new data source. The following paper by Alagiannis et al. investigates a third approach, extending a DBMS so it can use the file data in situ, without having to load it first. They term their approach “NoDB” to indicate it does not require a separate copy of the data stored internally to the DBMS.

Note that some DBMSs do support links to external files that are viewed as tables, which are parsed and temporarily loaded on demand. That approach does avoid the initial load into persistent storage and allows the loading process to overlap with other query stages. However, loading happens on every query, as with the MapReduce approach. Furthermore, such external data is a second-class citizen, lacking the indexes and statistics that speed performance on internal data. The NoDB approach, in contrast, tries to make in-situ data first class, by using an incremental, pay-as-you-go approach to providing DBMS functionality that tries to minimize up-front load costs, while capturing the work that is done for the benefit of later queries.

The authors built a specific instance of a NoDB system called PostgresRaw. The main techniques it uses are to avoid parsing portions of a file not needed by the current query, and to reuse the work it does do via a “positional map” (a sort of structural index) that remembers the location of fields in records it does access. Thus, initial queries avoid the full load cost of an external file, while later queries take advantage of previous parsing work. The authors also consider incremental methods on in-situ data, such as collecting statistics and caching data from one query to another. The evaluation of PostgresRaw shows the performance advantage of NoDB technology over the alternatives mentioned here.

Should we expect that our DBMSs will, in the future, do away with internal storage and run entirely over in-situ data? Likely not—the NoDB approach targets large, static datasets (though it could be extended to handle certain classes of updates, such as appending records). Data subject to small, frequent changes is best maintained by the DBMS. Also, if you determine at some point that a file will be intensely queried in the future, then it is worthwhile to incur the up-front load cost, in exchange for faster queries. Nevertheless, the paper is exciting, as it minimizes up-front costs when exploring new data sources, and it opens up a wide range of additional techniques to pursue for in-situ data management, such as incremental value-based indexing, synthesizing access methods for common file types, and selective transfer of in-situ data to internal storage.

David Maier ([email protected]) is Maseeh Professor of Emerging Technologies in the Department of Computer Science at Portland State University, Portland, OR.

© 2015 ACM 0001-0782/15/12 $15.00


DOI:10.1145/2830508

NoDB: Efficient Query Execution on Raw Data Files

By Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, Stratos Idreos, and Anastasia Ailamaki

Abstract
As data collections become larger and larger, users are faced with increasing bottlenecks in their data analysis. More data means more time to prepare and to load the data into the database before executing the desired queries. Many applications already avoid using database systems, for example, scientific data analysis and social networks, due to the complexity and the increased data-to-query time, that is, the time between getting the data and retrieving its first useful results. For many applications data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, when we expect to have much more data than we can move or store, let alone analyze.

We here present the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files a first-class citizen, fully integrated with the query engine. Through our design and lessons learned by implementing the NoDB philosophy over a modern Database Management System (DBMS), we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure. We conclude that NoDB systems are feasible to design and implement over modern DBMS, bringing an unprecedented positive effect in usability and performance.

1. INTRODUCTION
We are in the era of data deluge, where the amount of generated data outgrows the capabilities of query processing technology. Many emerging applications, from social networks to scientific experiments, are representative examples of this deluge, where the rate at which data is produced exceeds any past experience. Scientific disciplines such as astronomy are soon expected to collect multiple Terabytes of data on a daily basis. Similarly, web-based businesses such as social networks or web log analysis are already confronted with a growing stream of large data inputs. Therefore, there is a clear need for efficient big data processing to enable the evolution of businesses and sciences to the new era of data deluge.

Motivation. Although Database Management Systems (DBMS) remain overall the predominant data analysis technology, they are rarely used for emerging applications. This is largely due to the complexity involved; there is a significant initialization cost in loading data and preparing the database system for queries. For example, a scientist needs to quickly examine a few Terabytes of new data in search of certain properties. Even though only a few attributes might be relevant for the task, the entire data must first be loaded inside the database. Besides being a significant time investment, it is also important to consider the extra computing resources required for a full load and its side-effects with respect to energy consumption and economic sustainability.

Instead of using database systems, emerging applications rely on custom solutions that usually miss important database features. For instance, declarative queries, schema evolution and complete isolation from the internal representation of data are rarely present. There is a wide variety of competing approaches, but users remain exposed to many low-level details and must work close to the physical level to obtain adequate performance and scalability. A growing part of the database community recognizes the need for significant and fundamental changes to database design, ranging from low-level architectural redesigns to changes in the way users interact with the system.2, 5, 8, 9, 12, 14, 16, 17, 21

The NoDB philosophy. We recognize this new need, which is a direct consequence of the data deluge, and describe the roadmap toward NoDB, a new database design philosophy that we believe will come to define how future database systems are designed. The goal of the NoDB philosophy is to make database systems more accessible to the user by eliminating major bottlenecks of current state-of-the-art technology that increase the data-to-query time. The data-to-query time is of critical importance as it defines the moment when a database system becomes usable and thus useful. There are, however, fundamental processes in modern database architectures that represent a major bottleneck for data-to-query time. The NoDB philosophy changes the way a user interacts with a database system by eliminating one of the most important bottlenecks, that is, data loading. We advocate querying over raw data, in situ (i.e., in its original place) as the principal way to manage data in a database and we propose to redesign the query processing layers of database systems to incrementally and adaptively query raw data files in situ, while automatically creating and refining auxiliary structures to speed-up future queries.

The original version of this paper was published in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (Scottsdale, Arizona, USA).


Adaptive data loads. We originally introduced the idea 2.1. Straightforward approaches of adaptive data loading in an earlier vision paper.9 The cur- We describe two straightforward ways to directly query rent paper makes numerous and significant contributions, raw data files. The first approach is to simply run the load- toward demonstrating the feasibility and the potential of ing procedure whenever a relevant query arrives: when a that vision. Using a mature and complete implementation query referring to table R arrives, only then load table R, and over a modern DBMS, we identify and overcome fundamen- immediately evaluate the query over the loaded data. Data tal limitations in NoDB systems. Most importantly, we show may be loaded into temporary tables that are discarded after how to make raw files first-class citizens without sacrific- processing the query, or it may be loaded into persistent ing query performance. We also introduce several innova- tables stored on disk. These approaches however, signifi- tive techniques such as selective parsing, adaptive indexing cantly penalize the first query, since creating the complete structures that operate on the raw files, caching techniques, table before evaluating the query implies that the same data and statistics collection over raw files. Overall, we describe needs to be accessed twice, once for loading and once for how to exploit current relational databases to conform to query evaluation. the NoDB philosophy while identifying limitations and A better approach is to tightly integrate the raw file opportunities in the process. accesses with the query execution. This is accomplished by Contributions. Our contributions are as follows: enriching the leaf operators of the query plans, for example, the scan operator, with the ability to access raw data files. 
• We convert a traditional relational database (PostgreSQL) Therefore, the scan operator tokenizes and parses a raw file into a NoDB system (PostgresRaw), and discover that on-the-fly, creates the tuples and passes them to the remain- the main bottlenecks are the repeated access and pars- ing of the query plan. The key difference is that data parsing ing of raw files. Therefore, we design an innovative and processing occur in a pipelined fashion, that is, the raw adaptive indexing mechanism that makes the trip back file is read from disk in chunks and once a tuple or a group to the raw files efficient. of tuples is produced, the scan immediately passes those • We demonstrate that the query response time of a tuples upstream. NoDB system can be competitive with a traditional Both straw-man techniques require that the proper DBMS, even without prior data loading. schema be known a priori; the user needs to declare the • We show that NoDB systems provide quick access to the schema and mark all tables as in situ tables. Other than data under a variety of workloads. PostgresRaw query that, both techniques represent a straightforward imple- performance improves adaptively as it processes addi- mentation of in situ query processing; they do not require tional queries and it quickly matches or outperforms significant new technology other than a careful integration traditional DBMS, including MySQL and PostgreSQL. of existing loading procedures with query processing. • We describe opportunities with the NoDB philosophy, Limitations of straightforward approaches. The approaches as well as challenges such a research path brings. discussed here are similar to the external files function- ality offered by modern database systems such as Oracle 2. QUERYING RAW DATA and MySQL. Such solutions are not viable for extensive In this section, we introduce the NoDB philosophy. For ease and repeated query processing. 
For example, if data is not of presentation, we first discuss a straw-man approach to kept in persistent tables, then every future query needs to in situ querying, where every query relies exclusively on raw perform loading from scratch, which is a major overhead. files for query processing. Then, we address the weaknesses Materializing loaded data into persistent tables however, of the straw-man approach by introducing the core concepts forces a single query to incur all loading costs. Therefore, of NoDB that enable efficient access to raw data. such approaches are only viable if a user needs to fire few Typical storage and execution. A row-store DBMS organizes queries. data in the form of tuples, stored sequentially one tuple after the Neither straw-man technique allows the implementation other in the form of slotted pages. Each page contains a collec- of important database systems functionality. In particu- tion of tuples as well as additional metadata information to help lar, given that data is not loaded, there is no mechanism to in-page navigation. These pages are created during the loading exploit indexing; modern database systems do not support process. Before being able to submit queries, the data must first indexes on raw data. Without index support, query plans for be loaded, which transforms it from the raw format to the data- straw-man techniques rely only on full scans, incurring a base page format. During query processing the system brings significant performance degradation compared to a DBMS pages into memory and processes the tuples. In order to create with loaded data and indexes. In addition, the optimizer proper query plans, that is, to decide the operators and their cannot exploit any statistics, since statistics in a modern order of execution, an optimizer is used, which exploits previ- DBMS are created only after data is loaded. The lack of sta- ously collected statistics about the data. 
A query plan can be seen tistics and indexing means that straw-man techniques do as a tree where each node is a relational operator and each leaf not provide query processing performance comparable to a corresponds to a data access method. The access methods define modern DBMS and any time gained by skipping data load- how the system accesses the tuples. Each tuple is then passed ing is lost after only a few queries. one-by-one through the operators of a query plan. The NoDB Even though in situ features, such as external files, are ­philosophy needs to be integrated with the afore-mentioned important for the users, current implementations are far design for efficient and adaptive query execution. from the NoDB vision of providing an instant gateway to the

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 113 research highlights

data, without losing the performance advantages achieved grand challenge is to come up with a seamless design that by modern DBMS. integrates such features into a modern DBMS.

2.2. The NoDB philosophy 3. POSTGRESRAW: BUILDING NoDB IN POSTGRESQL The NoDB philosophy aims to provide in situ access with In this section, we discuss the design of our NoDB proto- query processing performance that is competitive with a type, called PostgresRaw, implemented by modifying the database system operating over previously loaded data. In open-source DBMS PostgreSQL. We show how to minimize other words, the vision is to completely shed the loading parsing and tokenizing costs within a row-store engine via costs, while achieving or improving the query processing selective and adaptive parsing actions. In addition, we pres- performance of a traditional DBMS. Such performance char- ent a novel raw file indexing structure that adaptively main- acteristics make the DBMS usable and flexible; a user may tains positional information to speed-up future accesses only think about the kind of queries to pose and not about on raw files. Finally, we present caching and exploitation of setting up the system in advance and going through all the statistics in PostgresRaw. The ideas described in this section initialization steps that are necessary today. can be used as guidelines for turning modern row-stores The design we propose in this work takes significant steps into NoDB systems. in identifying and eliminating or greatly minimizing initial- In the remaining of this section we assume that raw data ization and query processing costs that are unique for in situ is stored in comma-separated value (CSV) files. Comma- systems. The target behavior is visualized in Figure 1. It illus- separated value files as textual files are challenging for trates an important aspect of the NoDB philosophy; even an in situ engine, considering the high conversion cost to though individual queries may take longer to respond than binary format and the fact that fields may be variable length. 
in a traditional system, the data-to-query time is reduced, Nonetheless, being a common data source, they present an because there is no need to load and prepare data in advance ideal use case for PostgresRaw. or to fine tune the system when different queries arrive. In addition, performance improves gradually as a function of 3.1. On-the-fly parsing the number of queries processed. We first discuss aspects related to on-the-fly raw file pars- New challenges of NoDB systems. The main bottleneck ing and essential features such as selective parsing and of in situ query processing is the access to raw data. The tuple formation. We later describe the core PostgresRaw costs involved in raw data access significantly degrade components. query performance. In a traditional DBMS, parsing raw Query plans in PostgresRaw. When a query submitted data files is more expensive than accessing database pages. to PostgresRaw references relational tables that are not The NoDB philosophy aims at making raw data a first-class yet loaded, PostgresRaw needs to access the respective raw citizen, integrating raw data access in an abstract way into file(s). PostgresRaw overrides the scan operator with the the query processing layer, allowing query processing with- ability to access raw data files directly, while the remain- out a priori loading. However, a NoDB system can only be ing query plan, generated by the optimizer, works without useful and attractive in practice if it achieves performance changes compared to a conventional DBMS. levels comparable to a modern DBMS. Therefore, the main Parsing and tokenizing raw data. Every time a query needs challenge for a NoDB system is to minimize the cost of to access raw data, PostgresRaw has to perform parsing and accessing raw data. tokenization. 
In a typical CSV structure, each CSV file repre- From a high level point of view, we distinguish between sents a relational table, each row in the CSV file represents two directions; the first one aims at minimizing the cost of a tuple of a table and each entry in a row represents an attri- raw data access through the careful design of data struc- bute value of the tuple. During parsing, PostgresRaw needs tures that can speed-up such accesses; the second one aims first to identify each tuple, or row in the raw file. Once all at selectively eliminating the need for raw data access by tuples have been identified, PostgresRaw must then search careful caching and scheduling raw data accesses. The final for the delimiter separating different values and transform those characters into their proper binary values. Overall, these extra parsing and tokenizing actions represent a signif- Figure 1. Improving user interaction with NoDB. icant overhead inherent to in situ query processing; a typical DBMS performs all these steps at loading time and directly Q4 reads binary database pages during query processing. Q4 Q3 Selective tokenizing. PostgresRaw reduces the tokeniz- Q3 Q2 ing costs by opportunistically aborting tokenizing tuples as Q1 soon as the required attributes for a query have been found. Q4 This occurs at a per tuple basis. Given that CSV files are orga- Q2 Q3 Response time Load Q2 nized in a row-by-row basis, selective tokenizing does not Q1 bring any I/O benefits; nonetheless, it significantly reduces Q1 the CPU processing costs. DBMS with DBMS NoDB Selective parsing. In addition to selective tokenizing, external files PostgresRaw also employs selective parsing to further reduce raw access costs. PostgresRaw transforms to binary only the

114 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12

values required to answer the query. For example, if a query requests the 4th and 8th attributes of a given file and contains a selection on the 4th attribute, PostgresRaw with selective parsing converts all values of the 4th attribute to binary but delays the binary transformation of the 8th attribute until it knows that the given tuple qualifies.

Selective tuple formation. To fully capitalize on selective parsing and tokenizing, PostgresRaw also applies selective tuple formation. Tuples are not fully composed; they contain only the attributes required for a given query. In PostgresRaw, tuples are created only after the select operator, that is, after knowing which tuples qualify.

Overall, selective tokenizing, parsing, and tuple formation significantly reduce the on-the-fly processing costs, since PostgresRaw parses only what is necessary to produce query answers.

3.2. Indexing

Even with selective tokenizing, parsing, and tuple formation, the cost of accessing raw data is still significant. This section introduces an auxiliary structure that allows PostgresRaw to compete with a DBMS over previously loaded data. This auxiliary structure is a positional map, and it forms a core component of PostgresRaw.

Adaptive positional map. We introduce the adaptive positional map to reduce parsing and tokenizing costs. It maintains low-level metadata on the structure of the flat file, which is used to navigate and retrieve raw data faster. This metadata refers to positions of attributes in the raw file. For example, if a query needs an attribute X that is not loaded, PostgresRaw can exploit the metadata describing the position of X in the raw file and jump directly to the correct position without performing expensive tokenizing steps to find X.

Map population. The positional map is created on-the-fly during query processing, continuously adapting to queries. Initially, the positional map is empty. As queries arrive, PostgresRaw adaptively and continuously augments it. The map is populated during the tokenizing phase, that is, while tokenizing the raw file for the current query, PostgresRaw adds information to the map. PostgresRaw learns as much information as possible during each query. For instance, it keeps positions not only for the attributes requested in the query but also for attributes tokenized along the way; for example, if a query requires attributes in positions 10 and 15, all positions from 1 to 15 may be kept.

Storage format. The dynamic nature of the positional map requires a physical organization that is easy to update and incurs low cost during query execution. To achieve efficient reads and writes, the PostgresRaw positional map is implemented as a collection of chunks, partitioned vertically and horizontally. Each chunk fits comfortably in the CPU caches, allowing PostgresRaw to efficiently acquire all information regarding several attributes and tuples with a single access. The map can also be extended by adding more chunks either vertically (i.e., adding positional information about more tuples of already partially indexed attributes) or horizontally (i.e., adding positional information about currently non-indexed attributes). Figure 2 shows an example of a positional map, where the attributes do not necessarily appear in the map in the same order as in the raw file. The positional map does not mirror the raw file. Instead, it adapts to the workload, keeping attributes that are accessed together during query processing in the same chunk.

Exploiting the positional map. The information contained in the positional map can be used to jump to the exact position in the file, or as close to it as possible. For example, if a query is looking for the 9th attribute of a file, while the map contains information for the 4th and the 8th attributes, PostgresRaw uses the positional map to jump to the 8th attribute and parses forward from there until it finds the 9th attribute.

Maintenance. The positional map is an auxiliary structure and may be dropped fully or partly at any time without any loss of critical information; the next query simply starts rebuilding the map from scratch. PostgresRaw assigns a storage threshold for the size of the positional map such that the map fits comfortably in memory. Once the storage threshold is reached, PostgresRaw drops parts of the map to ensure it always stays within the threshold.

Adaptive behavior. The positional map is an adaptive data structure: the positions it indexes change continuously with the workload.

Figure 2. An example of indexing raw files with a positional map. (Panels: a raw file of tuples, Tuple 1 through Tuple 6, each with attributes a1, a2, a3, …, an; the positional map after Query 1 on a4, a7, holding positions p4, p7 for every tuple; and the positional map after Query 2 on a2, a5, which additionally holds positions p2, p5.)
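
The selective parsing and positional-map mechanics above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not PostgresRaw's implementation: rows are comma-separated strings, attributes are 0-indexed (so the text's 4th and 8th attributes are indices 3 and 7), and the map is a plain per-row dictionary of character offsets rather than the chunked layout of Figure 2, with no storage threshold or eviction. `PositionalMap`, `fetch`, and `scan` are hypothetical names.

```python
# Hypothetical sketch, not PostgresRaw's code. Rows are comma-separated
# strings, attributes are 0-indexed, and the map is a plain dict per row.


class PositionalMap:
    """Learned character offsets of attribute starts, per row."""

    def __init__(self):
        self.offsets = {}  # row -> {attribute index -> offset within the row}

    def record(self, row, attr, offset):
        self.offsets.setdefault(row, {})[attr] = offset

    def nearest(self, row, attr):
        """Closest known (attribute, offset) at or before `attr`."""
        best = (0, 0)  # attribute 0 always starts at offset 0
        for a, off in self.offsets.get(row, {}).items():
            if best[0] < a <= attr:
                best = (a, off)
        return best


def fetch(line, row, attr, pmap, delim=","):
    """Return the raw text of `attr`, tokenizing only from the nearest known position."""
    known_attr, pos = pmap.nearest(row, attr)
    for a in range(known_attr, attr):
        pos = line.index(delim, pos) + 1
        pmap.record(row, a + 1, pos)  # keep positions tokenized along the way
    end = line.find(delim, pos)
    return line[pos:] if end == -1 else line[pos:end]


def scan(lines, pred_attr, pred, proj_attr, pmap):
    """Selective parsing: convert the predicate attribute for every row, but
    convert the projected attribute only for rows that qualify. Result tuples
    carry only the projected attribute."""
    result = []
    for row, line in enumerate(lines):
        if pred(int(fetch(line, row, pred_attr, pmap))):
            result.append(int(fetch(line, row, proj_attr, pmap)))
    return result
```

For example, `scan(rows, 3, lambda v: v > 10, 7, pmap)` converts attribute 3 for every row but attribute 7 only for qualifying rows, and records every offset it tokenizes on the way; a later `fetch` of attribute 5 in a row already tokenized past it starts at the recorded offset instead of the beginning of the line.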

Specifically, the positional map indexes positions based on the most recent queries, covering both the requested attributes and the patterns, or combinations, in which those attributes are used. As the workload evolves, some attributes may no longer be relevant and are dropped by a least recently used (LRU) policy. Similarly, combinations of attributes used in the same query, which are also stored together, may be dropped to make space for new combinations. Whether to populate the map with a new combination is decided during pre-fetching, depending on where the requested attributes are located in the current map. The distance that triggers indexing of a new attribute combination is a PostgresRaw parameter. In our prototype, the default setting is that if all requested attributes for a query belong to different chunks, the new combination is indexed.

3.3. Caching

The positional map allows for efficient access to raw files. An alternative and complementary direction is to avoid raw file access altogether. Therefore, PostgresRaw also contains a cache that temporarily holds previously accessed data, for example, a previously accessed attribute or even parts of an attribute. If the attribute is requested by future queries, PostgresRaw reads it directly from the cache.

The cache holds binary data and is populated on-the-fly during query processing. To minimize parsing costs and to maintain the adaptive behavior of PostgresRaw, caching does not force additional data to be parsed; only the attributes requested by the current query are transformed to binary. The cache follows the format of the positional map so that it is easy to integrate into the PostgresRaw query flow, allowing queries to seamlessly exploit both the cache and the positional map in the same query plan. The size of the cache is a parameter that can be tuned depending on the available resources. PostgresRaw follows an LRU policy to drop and populate the cache. Overall, the PostgresRaw cache can be seen as the placeholder for adaptively loaded data.

3.4. Statistics

Optimizers rely on statistics to create good query plans. The most important plan choices depend on selectivity estimation, which helps in ordering operators such as joins. Creating statistics in modern databases, however, is only possible after data is loaded.

We extend the PostgresRaw scan operator to create statistics on-the-fly. We carefully invoke the native statistics routines of the DBMS, providing them with a sample of the data. Statistics are then stored and exploited in the same way as in a conventional DBMS. To minimize the overhead of creating statistics during query processing, PostgresRaw creates statistics only on requested attributes, that is, only on attributes that PostgresRaw needs to read and that are required by at least the current query.

On-the-fly creation of statistics adds a small overhead to the PostgresRaw scan operator, while allowing PostgresRaw to implement high-quality query execution plans.

4. EXPERIMENTAL EVALUATION

In this section, we present an experimental analysis of PostgresRaw. PostgresRaw is implemented on top of PostgreSQL 9.0; thus, a direct comparison between the two systems is important for understanding the impact of in situ querying. Note that PostgresRaw is highly affected by any performance bottlenecks present in PostgreSQL, since they share the same query engine.

All experiments are conducted on a Sun X4140 server with 2× Quad-Core AMD Opteron processor (64 bit), 2.7 GHz, 512KB L1 cache, 2MB L2 cache and 6MB L3 cache, 32GB RAM, 4× 250GB 10,000 RPM SATA disks (RAID-0), running Ubuntu 9.04.

The experiments presented in this section use a raw data file of 11GB, containing 7.5 × 10⁶ tuples. Each tuple contains 150 attributes with integers distributed randomly in the range [0, 10⁹).

4.1. Positional map

Impact. The first experiment investigates the impact of the positional map. In particular, we investigate how the behavior of PostgresRaw is affected as the map is populated dynamically with positional information based on the workload.

The setup of the experiment is as follows. We create a random set of queries accessing a subset of the attributes found in the raw file. We refer to the queries as random because they may ask for any attribute. Each query asks for 10 random attributes and retrieves all the rows of the file. We measure the average time PostgresRaw needs to process all queries with a varying storage capacity for the positional map, from 14.3MB up to 2.1GB.

The results are shown in Figure 3. The impact of the positional map is significant, as it eventually improves response times by more than a factor of 2. In addition, performance improves rapidly, without requiring the maximum capacity. With a little less than 1/4 of the pointers (260 million positions) collected, execution time is already within 15% of the fully indexed case. After 3/4 of the pointers are collected, response time remains constant even though the workload is random. Therefore, PostgresRaw does not need to maintain positional information for the complete raw file, thereby saving significant storage and access costs without compromising performance.

Figure 3. Effect of the number of pointers in the positional map. (Axes: execution time (s), 0–50, versus # pointers (in millions), 0–1200.)

Scalability. The next experiment investigates the scalability of PostgresRaw when exploiting the positional map. The setup is the same as in the previous experiment, except that this time the file size is increased gradually from 2GB to 92GB. We use two ways to increase
the file size: first, by adding more attributes to the file, and second, by appending more rows to the file. In the first case, queries remain the same as before. In the second case, queries incrementally access more attributes as we increase the file size. We ensure that, for every case we compare, queries perform similar I/O and computation actions. We allow unlimited storage space for the positional map; nevertheless, we store only positions accessed by the most recent queries.

Figure 4 depicts the results. In both cases we observe linear scalability; PostgresRaw exploits the positional map to scale well as raw files grow both vertically and horizontally.

Figure 4. Scalability of the positional map. (Axes: execution time (s), 0–400, versus file size (GB), 0–100; series: Vary #tuples, Vary #attributes.)

4.2. Positional maps and caching

This experiment investigates the behavior of PostgresRaw when exploiting both the positional map and caching, or only one of them. We create 50 queries, where each query randomly accesses five columns and all the rows of the raw file. We study four variations. The first one, called Baseline, uses neither positional maps nor caching, representing the behavior of PostgresRaw as if it were a straw-man external files implementation. The second variation, called PostgresRaw PM, uses only the positional map, while the third, called PostgresRaw C, uses only the cache and an additional minimal map with positional information for the end of lines. The final version, called PostgresRaw PM+C, combines all previous techniques.

Figure 5 plots the response time for each query. Since there is no a priori knowledge to exploit, all PostgresRaw variations need to touch the raw file to extract the needed data for the first query; thus, they all show similar performance. Performance improves drastically from the second query onward. When the cache and the positional map are enabled, the second query is 82–88% faster than the first. The Baseline variation improves slightly, mainly due to file system caching, and from there on it provides constant performance, which is not competitive with the other variations; every query needs to scan the raw file without any help from indexing and caching.

When only the positional map is used, the first few queries collect metadata, improving future attribute retrieval by minimizing the parsing and tokenizing costs. The rest of the queries benefit from this information, demonstrating improved and stable performance. The positional map allows PostgresRaw to navigate as close as possible to the required attributes, which is important particularly when a … parse the raw file, which increases the overall execution time (3–5 times in this example). Figure 5 shows that the combined effects of the positional map and caching achieve the best performance; PostgresRaw PM+C outperforms all other approaches across the entire query sequence.

Figure 5. Effect of the positional map and caching. (Axes: execution time (s), log scale 1–100, versus query sequence, 0–50; series: PostgresRaw PM+C, PostgresRaw PM, PostgresRaw C, Baseline.)

4.3. Adapting to workload changes

In this experiment, we demonstrate that PostgresRaw progressively and transparently adapts to changes in the workload. We use the same raw file as in the previous experiments, but the query sequence is expanded to 250 queries. Each query again refers to five random attributes of the file. The query sequence is divided into five epochs, and in each epoch we execute 50 different queries. All queries within the same epoch focus on a given part of the raw file. The maximum size of the cache is limited to 2.8GB, while the positional map does not exceed 715MB.

Figure 6 depicts the results, separating each epoch with vertical lines at positions 50, 100, …, 200. The graph plots both the response time for each query in the sequence and how the size of the PostgresRaw cache evolves as queries are evaluated.

During the first epoch, queries refer to columns 1–50. The cache and the positional map are initially empty. After 32 queries, all data in this part of the file is cached; the cache stops growing and performance remains stable. In the second epoch, queries retrieve data between columns 51–100. Performance fluctuates, as some queries can fully exploit the cache and have faster response times, while others need to go back to the file.

Figure 6. Adapting to changes in the workload. (Axes: execution time (s), log scale 1–100, and cache usage (%), 0–100, versus query sequence, 0–250; series: execution time, cache utilization.)
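
The cache behavior in this experiment (hits for columns already converted to binary, misses that go back to the raw file, and LRU eviction once the budget is exhausted) can be sketched as follows. This is an illustrative sketch, not PostgresRaw's cache: `ColumnCache` and `read_column` are hypothetical names, capacity is counted in cached values as a stand-in for bytes, and `load_from_raw` stands for whatever parses a column from the file (for example, via the positional map).

```python
from collections import OrderedDict


class ColumnCache:
    """LRU cache of attribute columns already converted to binary values.

    Hypothetical sketch: capacity counts cached values, a stand-in for bytes.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.columns = OrderedDict()  # attribute index -> list of converted values

    def get(self, attr):
        """Return the cached column, or None on a miss."""
        if attr not in self.columns:
            return None
        self.columns.move_to_end(attr)  # mark as most recently used
        return self.columns[attr]

    def put(self, attr, values):
        """Insert a column, evicting least recently used columns to make room."""
        if attr in self.columns:
            self.used -= len(self.columns.pop(attr))
        if len(values) > self.capacity:
            return  # larger than the whole budget: do not cache
        while self.used + len(values) > self.capacity:
            _, evicted = self.columns.popitem(last=False)  # oldest entry first
            self.used -= len(evicted)
        self.columns[attr] = values
        self.used += len(values)


def read_column(attr, cache, load_from_raw):
    """Serve a column from the cache if possible; otherwise parse it from the
    raw file and cache it. Returns (values, cache_hit)."""
    cached = cache.get(attr)
    if cached is not None:
        return cached, True
    values = load_from_raw(attr)
    cache.put(attr, values)
    return values, False
```

With room for two three-row columns, reading columns 0, 1, 0, and then 2 evicts column 1, the least recently used one. This mirrors the later epochs, in which newly requested columns displace previously cached data.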

After the second epoch, the cache is full and all queries enjoy good performance. During the third epoch, we launch a random set of queries requesting columns between 1 and 100, that is, the same regions used in the previous epochs. Since PostgresRaw has built a complete cache of this region, no I/O or parsing is required. In the fourth epoch, queries ask for columns 75–125, that is, half of the queries hit previously explored areas and half hit new regions. PostgresRaw uses an LRU replacement policy in its cache and drops previously cached data to accommodate the new requests. During the last epoch, the workload shifts slightly to the region of columns 85–135. PostgresRaw needs to replace parts of its cache, while parts of the requested data are retrieved from the file by exploiting the positional map.

Overall, we observe that PostgresRaw gracefully adapts to changes in the workload. In every epoch, PostgresRaw quickly adapts, adjusting and populating its cache and the positional maps, and automatically stabilizes at good performance levels. Additionally, maintaining the cache and the positional map does not add significant overhead to query execution.

4.4. PostgresRaw versus other DBMS

In our next experiment, we evaluate PostgresRaw against state-of-the-art DBMS: MySQL (5.5.13), DBMS X (a commercial system), and PostgreSQL, with PostgresRaw running with positional maps and caching enabled. MySQL and DBMS X offer "external files" functionality, which enables direct querying over raw files. Therefore, for MySQL and DBMS X we include two sets of performance results: (a) using external files, and (b) using previously loaded data. For queries over loaded data we also report the time required to load the data; our goal is to show the overall data-to-query time.

We study the cumulative time needed to run a sequence of queries with each system. We use a sequence of nine queries in which we also vary selectivity and projectivity. All queries have one selection predicate and then project and run aggregations on the rest of the attributes. The first query requires all attributes and accesses all rows of the file. This is the worst case for PostgresRaw, since we have to pay the whole cost of populating the positional map and the cache up front. The next four queries are the same, except that they access fewer rows, in steps of 20%. The final four queries are again similar to the first query, except that they refer to fewer attributes, in steps of 20%.

Figure 7 shows the results. PostgresRaw achieves the best overall performance. It is competitive with DBMS X and MySQL for this sequence of queries. External files in MySQL (CSV Engine) and DBMS X are significantly slower than querying over loaded data or PostgresRaw, since each query repeatedly scans the entire file. Conventional wisdom indicates that the overhead inherent to in situ querying is problematic. This is indeed the case for straightforward in situ techniques such as external files. Nonetheless, these results show that the in situ overhead is not a bottleneck if we apply more advanced techniques that amortize the overhead across a sequence of queries, allowing for quick access to the data. Compared to PostgreSQL, PostgresRaw shows a significant advantage (25.75% in this case) even though it uses the same query engine. PostgreSQL is 53% slower than DBMS X if we consider only query execution time (without the loading costs). PostgresRaw, on the other hand, manages to be 6% faster than DBMS X even though it uses the same engine as PostgreSQL; by avoiding the loading costs, PostgresRaw has already answered the first four queries when DBMS X starts processing the first query.

Overall, PostgresRaw shows that it is feasible to amortize the overheads inherent to in situ querying over a sequence of queries, making an in situ system competitive with a conventional DBMS without requiring a priori data loading.

Figure 7. Comparing the performance of PostgresRaw with other DBMS. (Stacked cumulative execution time (s), split into Load and Q1–Q9, for MySQL CSV Engine, DBMS X w/ external files, MySQL, DBMS X, PostgreSQL, and PostgresRaw PM + C; bar totals shown: ~5971 s, 2357 s, 1671 s, 831 s, 656 s, and 617 s.)

4.5. Statistics in PostgresRaw

In our final experiment, we demonstrate the behavior of PostgresRaw when statistics are created on-the-fly during query processing. We use four instances of TPC-H decision support benchmark Query 1. We compare two versions of PostgresRaw: the first generates statistics on-the-fly in an adaptive way, while the second neither generates nor exploits statistics.

Figure 8 shows the response times when running all four queries. The first query uses the same plan in both versions of PostgresRaw and also initializes the positional map and the cache. Collecting statistics adds an overhead of 4.5 s to the execution time of the first query. PostgresRaw analyzes and creates statistics only for the attributes required for the current query. After the first query, the remaining queries behave differently in the two versions.
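
The on-the-fly statistics of Section 3.4 hand the DBMS's native statistics routines a sample of each requested attribute. The text does not say how that sample is drawn; the sketch below assumes reservoir sampling (Algorithm R), which fits a single pass over a file of unknown length, and then estimates a predicate's selectivity from the sample. `reservoir_sample` and `estimate_selectivity` are hypothetical helpers, not PostgreSQL's statistics API.

```python
import random


def reservoir_sample(values, k, rng):
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, v in enumerate(values):
        if i < k:
            sample.append(v)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = v  # item i survives with probability k / (i + 1)
    return sample


def estimate_selectivity(sample, pred):
    """Fraction of sampled values that satisfy the predicate."""
    if not sample:
        return 1.0  # no information: assume the predicate keeps everything
    return sum(1 for v in sample if pred(v)) / len(sample)
```

Sampling only the attributes a query actually reads keeps this overhead small, in line with PostgresRaw creating statistics only on requested attributes. For values spread uniformly over [0, 1000), a 100-value sample from `reservoir_sample(values, 100, random.Random(42))` should give an estimate for `v < 250` near 0.25.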

Although all four queries follow the same query template, in the PostgresRaw version with statistics support the remaining queries run three times faster than in the version without statistics. By examining the query plans, we notice that the optimizer selects a different set of operators and changes the ordering of operators in PostgresRaw with statistics, which explains the improvement in performance. Generating statistics on-the-fly adds only a small overhead, while significantly improving query plan selection.

Figure 8. Execution time as PostgresRaw generates statistics. (Axes: execution time (s), 0–150, versus query sequence Q1_a, Q1_b, Q1_c, Q1_d; series: w/ statistics, w/o statistics.)

5. IN SITU QUERYING: TRADE-OFFS

In situ querying, although desirable in theory, is thought to be prohibitive in practice. Executing queries directly over raw data files incurs additional execution overhead compared to query execution over previously loaded data. Nonetheless, our PostgresRaw implementation demonstrates that auxiliary structures reduce the time to access raw data files and amortize the overhead across a sequence of queries. In situ query execution, however, introduces a new set of trade-offs, which require further analysis:

Data type conversion. For ASCII files, PostgresRaw must convert the data into its proper type, for example, from string to integer. Conventional DBMS perform this conversion only once, at loading time. To alleviate the data type conversion overhead, PostgresRaw converts only the attributes in the tuple that are actually needed to answer a query. Nonetheless, data type conversion is not always an overhead: if a raw data file consists of variable-length strings, then PostgresRaw over CSV files is actually faster than a conventional DBMS, because there is no need to convert data or to create secondary copies when loading data into a DBMS. Different data types, however, affect NoDB performance in different ways and should be taken into account when deciding which data to cache.

File size versus database size. Loading data into a DBMS creates a second copy of the data. This copy can be stored in an optimized manner: for example, integers stored in a database page (in binary) likely take less space than in ASCII. Nonetheless, there are cases where a second copy does not imply less data. Variable-sized data stored in fixed-size fields usually takes more space in a database page than in its raw form. Therefore, depending on the workload, in situ engines can benefit from keeping data in its raw form.

Complex database schemas. Database Management Systems support complex database schemas with a large number of tables and columns within a table. Nonetheless, complex schemas usually require a database administrator (DBA) to tune vendor-specific configuration settings. For instance, a commercial DBMS we tested does not allow a row to be split across pages; if there are many columns within a table, or columns have large fields, the DBA must manually increase the page size, buffer pool, and table space. These configurations are not straightforward and are also subject to additional limitations: for example, pages must also have a minimum number of rows. In addition, larger tuples cause unpredictable behavior due to the use of slotted pages in the DBMS.

Types of data analysis. Current DBMS are best suited to managing data that is loaded only once, or rarely and incrementally, with well-known and rarely changing workloads. DBMS require physical design steps for best performance, such as creating indexes, which are time-consuming tasks. In situ databases, however, are better suited for users who need to explore data without having to load entire datasets. Such users should be willing to pay a penalty during the early queries, as long as they do not need to create data loading scripts. In situ databases are also useful when datasets are large but users need to frequently analyze small subsets of the data; such scenarios are increasingly common.

Integration with external tools. Database Management Systems are designed to be the main repository for the data, which makes the integration of DBMS data with external tools inherently hard. Techniques such as ODBC, stored procedures, and user-defined functions aim to facilitate interaction with data stored in the DBMS. Nonetheless, none of these techniques is fully satisfactory; in fact, this is a common complaint of scientific users, who have large repositories of legacy code that operates on raw data files. Migrating and reimplementing these tools in a DBMS would be difficult and would likely require vendor-specific hooks. The NoDB philosophy significantly facilitates such data integration, since users may continue to rely on their legacy code in parallel with systems such as PostgresRaw.

Database independence. Database Management Systems store data in database pages using proprietary, vendor-specific formats. The DBMS has complete ownership over the data, which is a cause of concern for some users. The NoDB philosophy, however, achieves database independence, since the data files remain the main data repository.

6. OPPORTUNITIES

The NoDB philosophy drastically and fundamentally redefines the way database systems are designed. It requires revisiting well-established assumptions and implementation techniques, while also enabling new opportunities, which are discussed in this section.

Flexible storage. NoDB systems do not require a priori loading, which implies no need for a priori decisions on how data is physically organized during loading. Data that is adaptively loaded can be cached in memory or written to disk in a format that enables faster access. Data compression
can also be applied, where beneficial. Deciding the proper storage layout is an open research question. Rows, columns, and hybrids all have comparative advantages and disadvantages. Nevertheless, a NoDB system benefits from not having to choose in advance. Physical layout decisions can be made online and can change over time as the workload changes.3

Adaptive indexing. The NoDB philosophy brings new opportunities toward achieving fully autonomous database systems, that is, systems that require zero initialization and administration. Recent efforts in database cracking and adaptive indexing7, 10, 11, 13 demonstrate the potential for incrementally building and refining indexes without requiring an administrator to tune the system or knowledge of the workload. Still, all data has to be loaded up front, forcing a delay in data-to-query time. We envision that adaptive indexing can be exploited and enhanced for NoDB systems.

Auto-tuning tools. In this paper, we have considered the hard case of zero a priori idle time or workload knowledge. Traditional systems assume "infinite" idle time and knowledge to perform all necessary initialization steps. In many cases, though, reality lies somewhere in between. For example, there might be some idle time, but not enough to load all data. Auto-tuning tools for NoDB systems, given a budget of idle time and workload knowledge, can exploit idle time to load and index as much of the relevant data as possible. The rest of the data remains unloaded and unindexed until relevant queries arrive. A NoDB tuning tool should consider raw data access costs and I/O costs in addition to the typical workload-based parameters. The NoDB philosophy brings new opportunities in exploiting every single bit of idle time or workload knowledge.

Information integration. Another major opportunity with the NoDB vision is the potential to query multiple different data sources and formats. NoDB systems can adopt format-specific plugins to handle different raw data file formats. Implementing these plugins in a reusable manner requires applying data integration techniques, but may also require the development of new techniques, so that commonalities between formats are determined and reused. Additionally, supporting different file formats also requires the development of hybrid query processing techniques, or even adding support for multiple data models (e.g., for array data).

File system interface. Another interesting opportunity that comes with NoDB is that of bridging the gap between file systems and databases. Unlike in traditional database systems, data in NoDB systems is always stored in file systems, such as NTFS or ext4. This gives NoDB the opportunity to intercept file system calls and gradually create auxiliary data structures that speed up future queries.

7. RELATED WORK

The NoDB philosophy draws inspiration from several decades of research on database technology, and it is related to a plethora of research topics. We briefly discuss related work in this section.

Auto-tuning. The NoDB philosophy advocates minimizing or eliminating the data-to-query time, which is also the goal of auto-tuning tools. Every major database vendor offers offline indexing features, where an auto-tuning tool performs offline analysis to determine the proper physical design for a specific workload.1, 6, 18, 22 More recently, these ideas have been extended to support online indexing,4, 20 removing the need to know the workload in advance. These techniques are a significant step forward, but still require all data to be loaded in advance.

Adaptive indexing. Database cracking and adaptive indexing introduce the notion of incrementally refining the physical design by following and matching the workload patterns.7, 10, 11, 13 This shares the adaptive goal of the NoDB philosophy, where each query is seen as advice on how to refine indexes. Nonetheless, as in the previous case, existing adaptive indexing techniques also require all data to be loaded up front.

External files. Most modern DBMS offer the ability to query data files directly with SQL, that is, without loading. External files, however, can only access raw data with no support for database features such as DML operations, indexes, or statistics, and they require every query to access the entire data file, as if no other query did so in the past. In practice, this functionality is used to facilitate data loading tasks and not for querying. NoDB systems, however, provide incremental data loading, on-the-fly index creation, and caching to assist future queries and drastically improve performance.

Information extraction. Information extraction techniques have been extended to provide direct access to raw text data,15 similarly to external files. The difference from external files is that raw data access relies on information extraction techniques instead of directly parsing raw data files. These efforts are motivated by the need to bridge multiple different data formats and make them accessible via SQL, usually by relying on wrappers.19

8. CONCLUSION

Very large data processing is increasingly becoming a necessity for modern applications in business and science. For state-of-the-art database systems, the incoming data deluge is a problem. In this paper, we introduce a database design philosophy that turns the data deluge into a tremendous opportunity for database systems. It requires drastic changes to existing query processing technology but eliminates one of the most fundamental bottlenecks present in classical database systems for the past 40 years, namely, the data loading overhead. Until now, it has not been possible to exploit database technology until data is fully loaded. NoDB systems permanently remove this restriction by enabling in situ querying.

This article describes the NoDB philosophy and identifies problems, solutions, and opportunities. It also describes the transformation of a modern row-store, PostgreSQL, into a NoDB prototype system, which we call PostgresRaw. Experiments on PostgresRaw demonstrate competitive performance with traditional DBMS. PostgresRaw, however, does not require any previous assumptions about which data to load, how to load it, or which physical design steps to perform before querying the data. Instead, it accesses the
raw data files adaptively and incrementally, allowing users to explore new data quickly and improving the usability of database systems.

The NoDB philosophy does not stop here, however. We describe open issues and research challenges for the database community at large. We expect that addressing these new challenges will enable a new generation of database systems that serve the needs of modern applications and users.

References

1. Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., Syamala, M. Database tuning advisor for Microsoft SQL server 2005. In VLDB (2004), 1110–1121.
2. Ailamaki, A., Kantere, V., Dash, D. Managing scientific data. Commun. ACM 53 (2010), 68–78.
3. Alagiannis, I., Idreos, S., Ailamaki, A. H2O: A hands-free adaptive store. In SIGMOD (2014), 1103–1114.
4. Bruno, N., Chaudhuri, S. To tune or not to tune? A lightweight physical design alerter. In VLDB (2006), 499–510.
5. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J., Welton, C. MAD skills: New analysis practices for big data. PVLDB 2 (2009), 1481–1492.
6. Dash, D., Polyzotis, N., Ailamaki, A. CoPhy: A scalable, portable, and interactive index advisor for large workloads. PVLDB 4 (2011), 362–372.
7. Graefe, G., Kuno, H. Self-selecting, self-tuning, incrementally optimized indexes. In EDBT (2010), 371–381.
8. Gray, J., Liu, D., Nieto-Santisteban, M., Szalay, A., DeWitt, D., Heber, G. Scientific data management in the coming decade. SIGMOD Rec. 34 (2005), 34–41.
9. Idreos, S., Alagiannis, I., Johnson, R., Ailamaki, A. Here are my data files. Here are my queries. Where are my results? In CIDR (2011).
10. Idreos, S., Kersten, M., Manegold, S. Database cracking. In CIDR (2007).
11. Idreos, S., Kersten, M., Manegold, S. Self-organizing tuple reconstruction in column-stores. In SIGMOD (2009), 297–308.
12. Idreos, S., Liarou, E. dbTouch: Analytics at your fingertips. In CIDR (2013).
13. Idreos, S., Manegold, S., Kuno, H., Graefe, G. Merging what's cracked, cracking what's merged: Adaptive indexing in main-memory column-stores. PVLDB 4 (2011), 586–597.
14. Jagadish, H.V., Chapman, A., Elkiss, A., Jayapandian, M., Li, Y., Nandi, A., Yu, C. Making database systems usable. In SIGMOD (2007), 13–24.
15. Jain, A., Doan, A., Gravano, L. Optimizing SQL queries over text databases. In ICDE (2008), 636–645.
16. Kersten, M., Idreos, S., Manegold, S., Liarou, E. The researcher's guide to the data deluge: Querying a scientific database in just a few seconds. PVLDB 4 (2011), 1474–1477.
17. Nandi, A., Jagadish, H.V. Guided interaction: Rethinking the query-result paradigm. PVLDB 4 (2011), 1466–1469.
18. Papadomanolakis, S., Ailamaki, A. AutoPart: Automating schema design for large scientific databases using data partitioning. In SSDBM (2004), 383–392.
19. Roth, M.T., Schwarz, P. Don't scrap it, wrap it! A wrapper architecture for legacy data sources. In VLDB (1997), 266–275.
20. Schnaitter, K., Abiteboul, S., Milo, T., Polyzotis, N. COLT: Continuous on-line tuning. In SIGMOD (2006), 793–795.
21. Stonebraker, M., Becla, J., DeWitt, D., Lim, K.-T., Maier, D., Ratzesberger, O., Zdonik, S. Requirements for science data bases and SciDB. In CIDR (2009).
22. Zilio, D., Rao, J., Lightstone, S., Lohman, G., Storm, A., Garcia-Arellano, C., Fadden, S. DB2 design advisor: Integrated automatic physical database. In VLDB (2004).

Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, and Anastasia Ailamaki ({ioannis.alagiannis, renata.borovica, miguel.branco, anastasia.ailamaki}@epfl.ch), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

Stratos Idreos ([email protected]), Harvard University, Cambridge, MA.

© 2015 ACM 0001-0782/15/12 $15.00

World-Renowned Journals from ACM ACM publishes over 50 magazines and journals that cover an array of established as well as emerging areas of the computing field. IT professionals worldwide depend on ACM's publications to keep them abreast of the latest technological developments and industry news in a timely, comprehensive manner of the highest quality and integrity. For a complete listing of ACM's leading magazines & journals, including our renowned Transaction Series, please visit the ACM publications homepage: www.acm.org/pubs.

ACM Transactions on Interactive Intelligent Systems (TIIS). This quarterly journal publishes papers on research encompassing the design, realization, or evaluation of interactive systems incorporating some form of machine intelligence.

ACM Transactions on Computation Theory (ToCT). This quarterly peer-reviewed journal has an emphasis on computational complexity, foundations of cryptography and other computation-based topics in theoretical computer science.

PLEASE CONTACT ACM MEMBER SERVICES TO PLACE AN ORDER
Phone: 1.800.342.6626 (U.S. and Canada), +1.212.626.0500 (Global)
Fax: +1.212.944.1318
(Hours: 8:30am–4:30pm, Eastern Time)
Email: [email protected]
Mail: ACM Member Services, General Post Office, PO Box 30777, New York, NY 10087-0777 USA

www.acm.org/pubs

CAREERS

Boise State University
Department of Computer Science
Eight Open Rank, Tenured/Tenure-Track Faculty Positions

The Department of Computer Science at Boise State University invites applications for eight open rank, tenured/tenure-track faculty positions. Seeking applicants in the areas of big data (including distributed systems, HPC, machine learning, visualization), cybersecurity, human computer interaction and computer science education research. Strong applicants from other areas of computer science will also be considered. Applicants should have a commitment to excellence in teaching, a desire to make significant contributions in research, and experience in collaborating with faculty and local industry to develop and sustain funded research programs. A PhD in Computer Science or a closely related field is required by the date of hire. For additional information, please visit http://coen.boisestate.edu/cs/jobs.

Boston College
Computer Science Department
Tenure Track Faculty Position

The Boston College Computer Science Department invites applications for a full-time non-tenure track faculty position, beginning September 2016. The position is a three-year renewable contract. Applicants should have a Ph.D. in Computer Science or related discipline, and possess a strong commitment to undergraduate teaching. We will begin reviewing applications on December 1, 2015, and will continue considering applications until the position is filled. Additional information is available at www.cs.bc.edu/employment.

Boston University
Department of Electrical & Computer Engineering (ECE)
Assistant Professor

The Department of Electrical & Computer Engineering (ECE) at Boston University (BU) is seeking candidates for a tenure-track Assistant Professor position in the general domain of computer systems and software, motivated by areas such as the Internet of Things (IoT), cybersecurity, privacy, and the cloud. Please visit http://www.bu.edu/ece/facultysearch for instructions on how to apply. Candidates must possess a relevant Ph.D. degree, show strong potential for attracting external research funding, and possess a strong commitment to teaching. This ECE position is aligned with a broader university-level initiative in Data Science, an area of strategic growth for Boston University (http://www.bu.edu/datascience/). The application deadline is December 31, 2015. The review of applications will begin on October 1, 2015.

Boston University is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law. We are a VEVRAA Federal Contractor.

Bradley University
Computer Science and Information Systems Department
Assistant Professor

The Computer Science and Information Systems Department at Bradley University invites applications for a tenure-track Assistant Professor position starting in August 2016. The tenure-track Assistant Professor requires a PhD in Computer Science or a closely related field; candidates working on their dissertation with anticipated completion date before August 2016 will be considered. Please visit www.bradley.edu/humanresources/opportunities for full position description and application process.

Cal Poly
Electrical Engineering
Assistant or Associate Professor - Electrical Engineering and Aerospace Engineering

The Electrical Engineering Department and Aerospace Engineering Department at Cal Poly, SLO, invite applications for a full-time tenure-track faculty position at the Assistant or Associate Professor rank for Fall 2016. Duties include teaching coursework in Electrical Engineering and Aerospace Engineering, building a collaborative research program in the area of Satellite Communication and Mobile Terrestrial Communications. Desired areas of expertise include satellite electronic systems including the Cube Sat form factor, and mobile terrestrial communication systems. For details, qualifications, and application instructions (online application required), visit WWW.CALPOLYJOBS.ORG and apply to requisition #103796. Application review begins Jan. 4, 2016. EEO.

Cal Poly
Electrical Engineering
Assistant or Associate Professor - Electrical & Computer Engineering

The Electrical Engineering Department and Computer Engineering Program at Cal Poly, SLO, invite applications for a full-time, tenure-track faculty position at the Assistant or Associate Professor rank for Fall 2016. Duties include teaching undergraduate and graduate computer/electrical engineering courses and building a collaborative research program in the area of mobile computing. Teaching responsibilities include digital design, computer architecture, embedded systems, design projects, technical electives, and graduate courses. Desired areas of expertise include computer engineering, and hardware and firmware aspects of mobile computing platforms. For details, qualifications, and application instructions (online application required), visit WWW.CALPOLYJOBS.ORG and apply to requisition #103795. Application review begins Jan. 4, 2016. EEO.

California State University, Fullerton
Department of Computer Science
Assistant Professor

The Department of Computer Science invites applications for tenure-track positions at the Assistant Professor level starting August 2016. For a complete description of the department, the position, desired specialization and other qualifications, please visit http://hr.fullerton.edu/diversity/job-openings/.

Columbia University
Department of Computer Science
Faculty Positions

Columbia Engineering invites applications for faculty positions in the Department of Computer Science at Columbia University in the City of New York. Applications at the assistant professor, and in exceptional cases, at the associate professor and full professor levels, will be considered. Applications are sought in all areas of computer science, with particular emphasis on, but not limited to, the following areas: Theory (all levels) and Programming languages (Assistant or Associate Professor level). Candidates must have a Ph.D. or its professional equivalent by the starting date of the appointment. Applicants for this position at the Assistant Professor and Associate Professor without tenure levels must demonstrate the potential to do pioneering research and to teach effectively. Applicants for this position at the tenured level (Associate or Full Professor) must have a demonstrated record of outstanding research accomplishments, excellent teaching credentials and established leadership in the field.

The successful candidate is expected to contribute to the advancement of their field and the department by developing an original and leading externally funded research program, and by contributing to the undergraduate and graduate educational mission of the Department. Columbia fosters multidisciplinary research and encourages collaborations with academic departments and units across Columbia University. The Department is especially interested in qualified candidates who can contribute, through their research, teaching, and/or service, to the diversity and excellence of the academic community.

122 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 For additional information and to apply, please to upload letters of recommendation, at least one see: http://engineering.columbia.edu/faculty-job- of which should comment on teaching. Email fac- Florida Institute of Technology opportunities. Applications should be submitted [email protected] with any questions. Faculty position in Software Engineering electronically and include the following: curricu- Dartmouth is an equal opportunity/ affirma- lum-vitae including a publication list, a description tive action employer with a strong commitment The Department of Computer Sciences & Cyber- of research accomplishments, a statement of re- to diversity. In that spirit, we are particularly in- security at the Florida Institute of Technology search and teaching interests and plans, contact in- terested in receiving applications from a broad invites applications for an open faculty position formation for three experts who can provide letters spectrum of people, including women, persons in Software Engineering, beginning in Fall 2016. of recommendation, and up to three pre/reprints of of color, persons with disabilities, veterans or any The department is ABET accredited in both scholarly work. All applications received by Decem- other legally protected group. Computer Science and Software Engineering. ber 15, 2015 will receive full consideration. Application review will begin November 1, Required qualifications for the position include Applicants can consult www.cs.columbia.edu 2015, and continue until the position is filled. an earned Ph.D. with a specialization in software for more information about the department. engineering, evidence of the ability to develop Columbia is an affirmative action/equal op- and sustain an active research program and a portunity employer with a strong commitment to Florida Institute of Technology sincere interest in quality teaching, at both the the quality of faculty life. 
CS Faculty Positions undergraduate and graduate levels. Our current software engineering strengths are in testing, Florida Tech invites applications for open-rank cybersecurity, maintenance and evolution, but Dartmouth College faculty positions in computer science that be- we welcome applicants from all areas. Our prefer- Department of Computer Science gin in Fall 2016. All areas of CS are considered. ence is for faculty members who conduct research Assistant Professor of Computer Science The CS department offers BS, MS, and PhD that is both pragmatic and academically rigorous. degree programs. Florida Tech is in the Space The Department has significant active research The Dartmouth College Department of Computer Coast, where NASA Kennedy Space Center is funding from multiple government agencies and Science invites applications for two tenure-track located. According to Brookings Institute’s commercial companies. New faculty will have the faculty positions at the level of assistant profes- report on “America’s Advanced Industries,” opportunity to work with the Harris Institute for sor. We seek candidates who will be excellent re- Space Coast ranks 7th in Advanced Industries’ Assured Information. Florida Tech is a NSA/DHS searchers and teachers in the areas of: (1) security Share of Total 2013 Employment. Submission designated Center for Academic Excellence in or machine learning; and (2) theoretical computer of applications and more information are at: Information Assurance Research. Florida Tech is science; although outstanding candidates in any cs.fit.edu/careers/ located in Melbourne on Florida’s Space Coast, area will be considered. We particularly seek can- Equal Opportunity Employer Minorities/ one of the nation’s fastest-growing high-tech ar- didates who will help lead, initiate, and participate Women/Veterans/Disabled eas. 
The campus is 5 m inutes from the Indian in collaborative research projects both within We are an E-verify employer River estuary, 10 minutes from the Atlantic Ocean Computer Science and involving other Dartmouth EEO is the Law and 50 minutes from Kennedy Space Center and researchers, including those in other Arts & Sci- http://www.fit.edu/hr/documents/eeoc_law.pdf Orlando. For more information on the Depart- ences departments, Dartmouth’s Geisel School of ment of Computer Sciences please visit our web- Medicine, Thayer School of Engineering, and Tuck School of Business. A Ph.D. degree or ABD in Com- puter Science or a closely related field is required. The department is home to 20 tenured and tenure-track faculty members and two research faculty members. Research areas of the depart- ment encompass the areas of security, computa- tional biology, machine learning, robotics, sys- tems, algorithms, theory, digital arts, vision, and Assistant or Associate Professor graphics. The Computer Science department is Software Engineering Program in the School of Arts & Sciences, and it has strong Ph.D. and M.S. programs and outstanding under- The Software Engineering Program, jointly administered by the Departments of Electrical and graduate majors. The department is affiliated with Computer Engineering (ECpE) and Computer Science (CompSci) at Iowa State University, Dartmouth’s M.D.-Ph.D. program and has strong Ames, IA, invites applications for the position of Assistant or Associate Professor in the area collaborations with Dartmouth’s other schools. of software engineering, particularly as it relates to cyber-security or large scale data analysis. Dartmouth College, a member of the Ivy The Department of Electrical and Computer Engineering and the Department of Computer League, is located in Hanover, New Hampshire Science have strong graduate programs. Almost all Ph.D. students are supported by research or (on the Vermont border). 
Dartmouth has a beau- teaching assistantships. These well-funded research programs provide an excellent academic tiful, historic campus, located in a scenic area on environment in which to work and make career progress. In addition, the cutting-edge research the Connecticut River. Recreational opportuni- and educational activities are nurtured through interdisciplinary interactions facilitated by the ties abound in all four seasons. With an even dis- Laurence H. Baker Center for Bioinformatics and Biological Statistics; the Center for tribution of male and female students and over Computational Intelligence, Learning and Discovery; the Center for Integrative Animal one third of the undergraduate student popula- Genomics; the Cyber Innovation Institute; the Information Assurance Center; the Virtual Reality tion members of minority groups, Dartmouth is Applications Center; the Center for Nondestructive Evaluation; and the U.S. Department of committed to diversity and encourages applica- Energy’s Ames Laboratory, all of which are located on the Iowa State University campus. tions from women and minorities. Iowa State University is an Equal Opportunity/Affirmative Action employer. All qualified To create an atmosphere supportive of re- applicants will receive consideration for employment without regard to race, color, age, search, Dartmouth offers new faculty members religion, sex, sexual orientation, gender identity, genetic information, national origin, marital grants for research-related expenses, a quarter status, disability, or protected veteran status, and will not be discriminated against. Inquiries of sabbatical leave for each three academic years can be directed to the Director of Equal Opportunity, 3350 Beardshear Hall, 515-294-7612. in residence, and flexible scheduling of teaching All interested, qualified persons are encouraged to apply early and must apply for this position responsibilities. 
by visiting http://www.iastatejobs.com/postings/14051 and complete the Employment Applicants are invited to submit application Application for vacancy #500149. Inquiries regarding the faculty search should be directed to materials via Interfolio at http://apply.interfolio. Professor Johnny Wong at [email protected] or (515) 294-2586. For full consideration, com/31847 (for the Security/ML position) or http:// applications must be received by Dec. 15, 2015. apply.interfolio.com/31850 (for the Theory posi- tion). Upload a CV, research statement, and teach- Iowa State University is an Equal Opportunity/Affirmative Action Employer. ing statement, and request at least four references

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 123 CAREERS

site (http://cs.fit.edu). Information on the Harris tion in cybersecurity. Those with expertise in net- http://www.cis.fordham.edu. Institute is also available online (http://harris- work security, software security, cyber-physical Applications can be electronically submitted institute.fit.edu/). Applicants should send letters systems, computer forensics, wireless security, to Interfolio Scholar Services: of intent, curriculum vitae, research and teach- biometrics-based security, e-Systems Security, For Cybersecurity Position: ing summaries and full contact information for and other related areas are encouraged to apply. apply.interfolio.com/31854 at least three references, via email (swe-faculty- This postdoctoral researcher will conduct high- For Data Analytics Position: [email protected] ). quality research in cybersecurity and teach cours- apply.interfolio.com/31855 Review of applications will begin in Novem- es for MS program in Cybersecurity. ber and continue until the position is filled. Send a detailed CV, research statement, Include(1) Cover letter with qualifications,(2) * Official transcripts of all collegiate work teaching statement, and info of 3 references in a Curriculum vitae,(3) Research Statement,(4) must be sent directly from the attended institu- single pdf file to [email protected]. Include Teaching Statement,(5) Sample scholarship, tion to the Human Resources Office prior to the job-code “POSTDOC-Cybersecurity” in subject and(6) At least three letters of recommendation. first day of employment. All international degrees line. Review of applications will begin immedi- Applications will be accepted until the position must have a course-by-course official evaluation ately and continue until the position is filled. is filled. Preference will be given to applications and translation sent to the Human Resources For more details, visit: http://www.cis.ford- received by January 15, 2016. 
Office directly from an evaluation company affili- ham.edu/events/CybersecPostdoc.pdf For inquiries, contact: Palma Hutter at: hut- ated with the National Association of Credential [email protected]. Evaluation Services, Inc. (NACES). Fordham University, an independent, Catho- Equal Opportunity Employer Minorities/ Fordham University lic University in the Jesuit tradition, is commit- Women/Veterans/Disabled CIS Department ted to excellence through diversity and welcomes We are an E-verify employer Tenure-track Assistant Professors candidates of all backgrounds. Fordham is an EEO is the Law Equal Opportunity Employer. Fordham University invites applications for two tenure track Assistant Professor Positions in the Fordham University CIS Department, to start in fall 2016. The two Harvard University Department of Computer and Information positions require a Ph.D. in Computer Science, John A. Paulson School of Engineering and Science Information Science or related fields, a commit- Applied Sciences Postdoctoral Position in Cybersecurity ment to teaching excellence, and an active pro- Tenure-track Faculty Position in Computer gram of research. One of the positions is in Cyber- Science We invite applications for a postdoctoral re- security and the other in Data Analytics. searcher position in Cybersecurity in the Depart- These selected candidates are expected to The Harvard John A. Paulson School of Engineer- ment of Computer and Information Science of teach graduate and undergraduate courses in ing and Applied Sciences seeks applicants for a Fordham University. Computer and Information Science, and conduct position at the tenure-track level in Computer Sci- This position requires a Ph.D. in Computer high-quality research. ence, with an expected start date of July 1, 2016. Science or a closely related area with a specializa- For information about the department, visit This is a broad faculty search and we welcome

Florida International University is a comprehensive university offering 340 majors in Ideal candidates for junior positions should have a record of exceptional research in their 188 degree programs in 23 colleges and schools, with innovative bachelor’s, master’s early careers. Candidates for senior positions must have an active and proven record and doctoral programs across all disciplines including medicine, public health, law, of excellence in funded research, publications, and professional service, as well as a journalism, hospitality, and architecture. FIU is Carnegie-designated as both a research demonstrated ability to develop and lead collaborative research projects. In addition to university with high research activity and a community-engaged university. Located developing or expanding a high-quality research program, all successful applicants must in the heart of the dynamic south Florida urban region, our multiple campuses serve be committed to excellence in teaching at both the graduate and undergraduate levels. An over 55,000 students, placing FIU among the ten largest universities in the nation. Our earned Ph.D. in Computer Science or related disciplines is required. annual research expenditures in excess of $100 million and our deep commitment to Non-tenure track instructor positions (Job Opening 507474) engagement have made FIU the go-to solutions center for issues ranging from local We seek well-qualified candidates in all areas of Computer Science and Information to global. FIU leads the nation in granting bachelor’s degrees, including in the STEM Technology. Ideal candidates must be committed to excellence in teaching a variety of fields, to minority students and is first in awarding STEM master’s degrees to Hispanics. courses at the undergraduate level. 
A graduate degree in Computer Science or related Our students, faculty, and staff reflect Miami’s diverse population, earning FIU the disciplines is required; significant prior teaching and industry experience and/or a Ph.D. in designation of Hispanic-Serving Institution. At FIU, we are proud to be ‘Worlds Ahead’! Computer Science is preferred. For more information about FIU, visit fiu.edu. HOW TO APPLY: The School of Computing and Information Sciences (SCIS) seeks exceptionally qualified Qualified candidates for open- rank faculty positions are encouraged to apply to (Job candidates for tenure-track and tenured faculty positions at all levels as well as non-tenure Opening ID #508676); and candidates for instructor positions are encouraged to apply track faculty positions at the level of Instructor, including visiting instructor appointments. to (Job Opening ID # 507474). Submit applications at facultycareers.fiu.edu and SCIS is a rapidly growing program of excellence at the University, with 30 tenure-track attach cover letter, curriculum vitae, statement of teaching philosophy, research statement, faculty members and over 2,000 students, including over 80 Ph.D. students. SCIS offers etc as individual attachments. Candidates will be required to provide names and contact B.S., M.S., and Ph.D. degrees in Computer Science, an M.S. degree in Telecommunications information for at least three references who will be contacted as determined by the search and Networking, an M.S. degree in Cybersecurity, and B.S., B.A., and M.S. degrees in committee. To receive full consideration, applications and required materials should be Information Technology. SCIS has received over $22M in the last four years in external received by December 31st, 2015. Review will continue until position is filled. research funding, has six research centers/clusters with first-class computing and support infrastructure, and enjoys broad and dynamic industry and international partnerships. 
If you are interested in a visiting appointment please contact the department directly by emailing Dr. Mark Weiss at [email protected]. All other applicants should apply by going Open-Rank Tenure Track/Tenured Positions (Job ID# 508676) to facultycareers.fiu.edu. SCIS seeks exceptionally qualified candidates for tenure-track and tenured faculty positions at all levels. We seek well-qualified candidates in all areas; researchers in the areas of FIU is a member of the State University System of Florida and an Equal Opportunity, computer systems, cybersecurity, cognitive computing, data science, health informatics, Equal Access Affirmative Action Employer. All qualified applicants will receive and networking are particularly encouraged to apply. Preference will be given to candidates consideration for employment without regard to race, color, religion, sex, national origin, who will enhance or complement our existing research strengths. disability status, protected veteran status, or any other characteristic protected by law.

124 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 outstanding applicants in all areas of computer uate teaching and graduate training. and service. science, including applicants whose research Required application documents include a The department of Intelligent Systems Engi- and interests connect to such areas as engineer- cover letter, cv, a statement of research interests, neering is an innovative new program that focus- ing, medicine, and the social sciences. We are a teaching statement, and up to three represen- es on the engineering of systems of smaller-scale, particularly interested in candidates working in tative papers. Candidates are also required to often mobile devices that draw upon modern in- the broad areas of machine learning, human- submit the names and contact information for formation technology techniques including intel- computer interaction, programming languages, at least three and up to five references, and the ligent systems, big data and user interface design. and systems (including networking, architecture, application is complete only when three letters Its foundation also includes computer engineer- and databases). have been submitted. We encourage candidates ing, cyber-physical systems, sensor and detector The Computer Science program at Harvard to apply by December 15, 2015, but will continue technologies, signal processing, and information University is experiencing a period of strong to review applications until the position is filled. and control theory. 
We intend to add about 20 growth and expansion following an extraordinary Applicants will apply on-line at https://academic- new faculty over the next 4 years covering these gift in support of new faculty from alumnus and positions.harvard.edu/postings/6497 areas as well as interdisciplinary thrusts in bioen- former Microsoft CEO Steve Ballmer, ‘77, and the Harvard is an equal opportunity employer gineering, molecular and nanoscale engineering, largest gift in the University’s history, received and all qualified applicants will receive consid- environmental engineering and neuro-engineer- from John A. Paulson, M.B.A. ’80, in support of eration for employment without regard to race, ing with an interdisciplinary IT component. The SEAS. color, religion, sex, sexual orientation, gender program will offer a BS and a PhD for students Computer Science at Harvard benefits from identity, national origin, disability status, pro- entering in fall 2016, while a future MS will be outstanding undergraduate and graduate stu- tected veteran status, or any other characteristic developed this fall.The program will draw upon dents, world-leading faculty, an excellent loca- protected by law. IU Bloomington’s considerable education and tion, significant industrial collaboration, and research strengths in biology, chemistry, com- substantial support from the Harvard Paulson puter science, environmental science, informat- School. Information about Harvard’s current Indiana University ics, physics, network science, psychological and faculty, research, and educational programs School of Informatics and Computing brain sciences, business and law. New faculty will in computer science is available at http://www. Faculty Positions in Intelligent Systems have considerable opportunity and responsibil- seas.harvard.edu/computer-science. The associ- Engineering ity to shape the development of curricula and re- ated Institute for Applied Computational Science search. 
There will be a strong emphasis on world- (http://iacs.seas.harvard.edu) fosters connec- The School of Informatics and Computing (SoIC) class research, built around a few strong focused tions among computer science, applied math, at Indiana University (IU) Bloomington invites ap- laboratories and proactively involving undergrad- data science, and various domain sciences at Har- plications for faculty positions in Intelligent Sys- uates. More information can be found at https:// vard through its graduate programs and events. tems Engineering. Multiple positions are open www.engineering.indiana.edu/. The department Candidates are required to have a doctorate at all levels (asst, assoc, or full). Cluster hires are will be located in the new SoIC building which or terminal degree by the expected start date. In encouraged as are interdisciplinary applications will be complete in about 30 months. addition, we seek candidates who have a strong spanning the interests of SoIC and collaborating Interested candidates should review the research record and a commitment to undergrad- with IU units. Duties include teaching, research, application requirements and submit their

TENURE-TRACK AND TENURED FACULTY POSITIONS IN INFORMATION SCIENCE AND TECHNOLOGY

The newly launched ShanghaiTech University invites talented faculty candidates to fill multiple tenure-track/tenured positions as its core founding team in the School of Information Science and Technology (SIST). Candidates should have outstanding academic records or demonstrate strong potential in cutting-edge research areas of information science and technology. They must be fluent in English. Overseas academic training is highly desired. Besides establishing and maintaining a world-class research profile, faculty candidates are also expected to contribute substantially to graduate and undergraduate education within the school. ShanghaiTech is marching towards becoming a world-class research university and a hub for training future generations of scientists, entrepreneurs, and technological leaders. Located on a brand-new campus in Zhangjiang High-Tech Park in cosmopolitan Shanghai, ShanghaiTech is at the forefront of modern education reform in China.

Academic Disciplines: We seek candidates in all cutting-edge areas of information science and technology, including but not limited to: computer architecture and technologies, micro-electronics, high-speed and RF circuits, intelligent and integrated information processing systems, computations, foundations and applications of big data, visualization, computer vision, bio-computing, smart energy/power devices and systems, next-generation networking, and statistical analysis, as well as interdisciplinary areas involving information science and technology.

Compensation and Benefits: Salary and startup funds are internationally competitive, commensurate with experience and academic accomplishment. We also offer a comprehensive benefit package to employees and eligible dependents, including housing benefits. All regular faculty members will be within our new tenure-track system, commensurate with international practice for performance evaluation and promotion.

Qualifications:
• Ph.D. (Electrical Engineering, Computer Engineering, Computer Science, or related field)
• A minimum of 4 years of relevant research experience.

Applications: Submit (in English, PDF version) a cover letter, a 2-3 page detailed research plan, a CV with demonstrated strong record/potentials, plus copies of 3 most significant publications, and names of three referees to: sist@shanghaitech.edu.cn. For more information, visit http://www.shanghaitech.edu.cn.

Deadline: December 31, 2015 (or until positions are filled).

Call for Postdoctoral Fellows in EXECUTABLE BIOLOGY

Executable biology is the study of biological systems as reactive dynamic systems (i.e., systems that evolve with time in response to external events).

Are you a talented and motivated scientist looking for an opportunity to conduct research at the intersection of BIOLOGY and COMPUTER SCIENCE at a young, dynamic institution that fosters scientific excellence and interdisciplinary collaboration?

Apply at www.ist.ac.at/executablebiology
Deadline December 31, 2015

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 125 CAREERS

application at: http://indiana.peopleadmin.com/postings/1900. For full consideration, applications are due by 2/1/16, but earlier submission is encouraged. Questions regarding the positions or application process can be directed to isechair@indiana.edu or Faculty Search, SoIC, 919 E 10th St, Bloomington, IN 47408.

Indiana University is an equal employment and affirmative action employer and a provider of ADA services. All qualified applicants will receive consideration for employment without regard to age, ethnicity, color, race, religion, sex, sexual orientation or identity, national origin, disability status or protected veteran status.

The Johns Hopkins University
Department of Computer Science
Tenure-Track Faculty Positions

The Johns Hopkins University's Department of Computer Science seeks applicants for tenure-track faculty positions at all levels and across all areas of computer science. Particular emphasis is at the junior level and in the areas of systems, distributed systems, networks and system security; however, all qualified applicants in all areas of computer science will be considered.

The Department of Computer Science has 25 full-time tenured and tenure-track faculty members, 8 research and 3 teaching faculty members, 130 PhD students, 130 MSE/MSSI and 216 undergraduate students. We have several affiliated research centers and institutes including the Center for Language and Speech Processing, JHU Information Security Institute, Laboratory for Computational Sensing and Robotics, Institute for Data Intensive Engineering and Science, Engineering in Healthcare and more. More information about the Department of Computer Science can be found at www.cs.jhu.edu and about the Whiting School of Engineering at www.engineering.jhu.edu/about/. Qualifications and required materials can be found at http://www.cs.jhu.edu/about/employment-opportunities/.

Applicants should submit a curriculum vitae, a research statement, a teaching statement, three recent publications, and complete contact information for at least three references. Applications must be made on-line at https://academicjobsonline.org/ajo/jobs/6431. Review of applications will begin in December 2015. While candidates who complete their applications by December 15, 2015 will receive full consideration, the department will consider exceptional applicants at any time.

The Johns Hopkins University is committed to active recruitment of a diverse faculty and student body. The University is an Affirmative Action/Equal Opportunity Employer of women, minorities, protected veterans and individuals with disabilities and encourages applications from these and other protected group members. Consistent with the University's goals of achieving excellence in all areas, we will assess the comprehensive qualifications of each applicant. The Whiting School of Engineering and the Department of Computer Science are committed to building a diverse educational environment.

Indiana University – Purdue University Fort Wayne
Department of Computer Science
Tenure-Track Faculty Position

The Department of Computer Science at IPFW invites applications for a tenure-track position at the Assistant Professor rank with expertise in Information Systems or Computer Science. The department is ABET accredited in Computer Science. Candidates specializing in Business Intelligence, Information Science, Health Informatics, Enterprise Systems and Business Process Management, Information Management, or Web Science will be considered. In addition to these specific areas, outstanding candidates in any Computer Science area will receive full consideration. The position will start August 15, 2016.

Requirements for the position are: Ph.D. in Information Systems, Computer Science, or closely related field; a strong record of research that supports the teaching and research missions of the Department; excellent communication and interpersonal skills; and an interest in working with students and collaborating with the business community.

Job Responsibilities include: teaching undergraduate and graduate level courses in Information Systems and Computer Science, academic advising, and a strong pursuit of scholarly endeavors. Service to and engagement with the University, Department, and community is also required.

Review of applications will begin December 15, 2015 and will continue until the position is filled. A complete application should include a cover letter, CV, statements on research and teaching, with the names and contact information of three references. For teaching, we require evidence of teaching experience or effectiveness and a 1-2 page teaching philosophy. All materials must be submitted electronically to the Chair of the Search Committee at cssearch@ipfw.edu. All candidates who are invited to interview on campus will be required to prepare a 45 to 60 minute instructional student-based presentation. An official transcript and three letters of recommendation are required upon acceptance of offer. Employment is contingent on a satisfactory background records check.

IPFW is an EEO/AA employer fully committed to achieving a diverse workforce. All individuals, including minorities, women, individuals with disabilities, and protected veterans are encouraged to apply.

Macalester College
Assistant Professor

Applications are invited for a tenure-track Computer Science position at Macalester College to begin Fall, 2016. Candidates must have or be completing a PhD in CS and have a strong commitment to both teaching and research in an undergraduate liberal arts environment. Areas of highest priority include computer and data security and privacy, mobile and ubiquitous computing, human-computer interaction, and visualization. See http://www.macalester.edu/mscs for details. Evaluation of applications will begin December 1. Apply URL: https://academicjobsonline.org/ajo/jobs/5794

Director

The College of Engineering at The Pennsylvania State University invites nominations and applications for the position of Director of the School of Electrical Engineering and Computer Science. The School houses the Electrical Engineering Department and the Computer Science and Engineering Department. The College seeks an individual who will provide innovative and energetic leadership with effective administrative skills and a strong commitment to higher education. The Director will lead two Department Heads in administering the School, and will be a member of the College of Engineering Leadership Team. An ideal candidate will have the vision, energy, and experience to continue the process of integrating the two highly ranked departments into an agile, intellectually broad school, and to brand the new school and establish its footprint in learning, discovery and service to the technical community and society. The candidate should have a widely recognized reputation in Electrical or Computer Engineering, Computer Science, or experience in related fields. An earned doctorate is required.

Established in 1893, the Department of Electrical Engineering is among the largest, oldest and most innovative in the nation. Approximately 550 undergraduate students and 220 graduate students are enrolled in the department, which has 40 faculty. The Department offers BS, MS, MEng and PhD degrees in electrical engineering. The Department of Computer Science and Engineering was created in 1993 with the merger of the Computer Engineering Program and the Computer Science Department. The department offers B.S. degrees in both computer engineering and computer science, and M.S., MEng, and Ph.D. degrees in computer science and engineering. There are approximately 550 undergraduate students and 150 graduate students enrolled in the department, with 43 faculty members. There are a number of unique research facilities associated with the School of Electrical Engineering and Computer Science. In recent years, our faculty received a $35M Network Science Center Award, a $48M Collaborative Research Alliance for the Science of Security, a $10M NSF Expeditions in Computing Award, a $5M Dow Chemical Award for Flexible Electronics and a $2.9M DOE ARPA-E Award for Solar Cell Research. Growth is planned in core areas as well as numerous interdisciplinary areas. These key areas are strongly supported by the University through the Institute for Cyber Science, the Huck Institutes of the Life Sciences, the Materials Research Institute, the Penn State Institutes of Energy and the Environment, and the Applied Research Laboratory.

Nominations and applications will be considered until the position is filled. Candidates from academia, industry or government agencies are encouraged to apply. Screening of applicants will begin on November 1, 2015 and it is intended that the position be filled by the beginning of the 2016/17 academic year. Applicants should submit a statement of professional interests, a curriculum vita, and the names and addresses of four references. Please submit these three items in one pdf file electronically to http://apptrkr.com/677384. Applications will be treated with the strictest confidence. Inquiries can be made to Thomas La Porta via e-mail to [email protected] or by phone at: 814-865-6725.

CAMPUS SECURITY CRIME STATISTICS: For more about safety at Penn State, and to review the Annual Security Report which contains information about crime statistics and other safety and security matters, please go to http://www.police.psu.edu/clery/, which will also provide you with detail on how to request a hard copy of the Annual Security Report.

Penn State is an equal opportunity, affirmative action employer, and is committed to providing employment opportunities to all qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.

Max Planck Institute for Software Systems
Tenure-track openings

Applications are invited for tenure-track faculty positions in all areas related to the theory and practice of software systems, including security and privacy, embedded and mobile systems, computational social science, legal, economic, and social aspects of computing, NLP, machine learning, information and knowledge management, programming languages, verification, parallel and distributed systems. A doctoral degree in computer science or related areas and an outstanding research record are required. Successful candidates are expected to build a team and pursue a highly visible research agenda, both independently and in collaboration with other groups.

MPI-SWS, founded in 2005, is part of a network of over 80 Max Planck Institutes, Germany's premier basic research facilities. MPIs have an established record of world-class, foundational research in the sciences, technology, and the humanities. The institute offers a unique environment that combines the best aspects of a university department and a research laboratory: faculty enjoy academic freedom, receive institutional funding and attract additional third-party funds to build and lead a team of graduate students and post-docs; they supervise doctoral theses, and have the opportunity to teach graduate and undergraduate courses. The institute offers outstanding technical infrastructure and administrative support, as well as internationally competitive compensation.

The institute is located in Kaiserslautern and Saarbruecken, in the tri-border area of Germany, France and Luxembourg. We maintain an international and diverse work environment and seek applications from outstanding researchers worldwide. The working language is English; knowledge of the German language is not required for a successful career at the institute.

Qualified candidates should apply at "https://apply.mpi-sws.org/". To receive full consideration, applications should be received by December 15, 2015.

The institute is committed to increasing the representation of minorities, women and individuals with physical disabilities in Computer Science. We particularly encourage such individuals to apply.

The initial tenure-track appointment is for five years; it can be extended to seven years based on a midterm evaluation in the fourth year. A permanent contract can be awarded upon a successful tenure evaluation in the sixth year.

Mississippi State University
Department of Computer Science and Engineering
Professor and Head

Applications and nominations are being sought for the Professor and Head of the Department of Computer Science and Engineering (www.cse.msstate.edu) at Mississippi State University. The Head is responsible for the overall administration of the department, and this is a 12-month tenured position.

The successful Head will provide:
• Vision and leadership for nationally recognized computing education and research programs
• Exceptional academic and administrative skills
• A strong commitment to faculty recruitment and development
• A strong commitment to promoting diversity

Applicants must have a Ph.D. in computer science, software engineering, computer engineering, or a closely related field. The successful candidate must have earned national recognition by a distinguished record of accomplishments in computer science education and research. Demonstrated administrative experience is desired, as is teaching experience at both the undergraduate and graduate levels. The successful candidate must qualify for the rank of professor.

Applicants must apply online at www.jobs.msstate.edu (PARF#9306) by completing the Personal Data Information Form and submitting a cover letter outlining your experience and vision for this position, a curriculum vitae, and the names and contact information of at least three professional references. Screening of candidates will begin January 15, 2016 and will continue until the position is filled.

MSU is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, ethnicity, sex (including pregnancy and gender identity), national origin, disability status, age, sexual orientation, genetic information, protected veteran status, or any other characteristic protected by law. We always welcome nominations and applications from women, members of any minority group, and others who share our passion for building a diverse community that reflects the diversity in our student population.

Please direct any questions to Dr. Jonathan Pote, Search Committee Chair, at (662) 325-3280 or [email protected].

Mississippi State University
Department of Computer Science and Engineering
Faculty Positions in Computer Science and Engineering

The Department of Computer Science and Engineering (http://www.cse.msstate.edu) is seeking to fill one open position for a tenure-track faculty member at the Assistant/Associate Professor level. The primary areas of interest for this position are software engineering and computer security; however, exceptional candidates in all areas will be considered.

Mississippi State University is a comprehensive land-grant institution with approximately 20,000 students and about 1,300 faculty members. The Department of Computer Science and Engineering has 16 tenure-track faculty positions and offers academic programs leading to the bachelor's, master's and doctoral degrees in computer science and bachelor's degrees in software engineering and computer engineering. Faculty members and graduate students work with a number of on-campus research centers. Research expenditures total about $5.2 million annually, and the university as a whole is ranked 72nd among U.S. institutions in computer science expenditures.

Candidates for this position are expected to hold a PhD in computer science or a closely related field (ABDs may be considered). Level of appointment is commensurate with qualifications and experience. The preferred candidate will have an established research record as demonstrated by significant publications beyond the dissertation or research grants in relevant areas. Experience in teaching university courses in computer science is also preferred.

Applicants must apply on-line at http://www.jobs.msstate.edu/ (PARF 9253) and complete a Personal Data Information Form. A letter of application, curriculum vita, teaching and research statements, and names and contact information of at least three references must also be submitted. Review of applications will begin as early as November 2015 and will continue until the position is filled.

MSU is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, ethnicity, sex (including pregnancy and gender identity), national origin, disability status, age, sexual orientation, genetic information, protected veteran status, or any other characteristic protected by law. We always welcome nominations and applications from women, members of any minority group, and others who share our passion for building a diverse community that reflects the diversity in our student population.

North Carolina State University
Department of Computer Science
Security Faculty Positions

The Department of Computer Science at North Carolina State University (NCSU) seeks to fill tenure-track faculty positions in the area of Security starting August 16, 2016. Successful security candidates must have a strong commitment to academic and research excellence, and an outstanding research record commensurate with the expectations of a major research university. Required credentials include a doctorate in Computer Science or a related field. While the department expects to hire at the Assistant Professor level, candidates with exceptional research records are encouraged to apply for a senior position.

The department is one of the largest and oldest in the country. It is part of a top US College of Engineering, and has excellent and extensive ties with industry and government laboratories. The department's research expenditures and recognition have been growing steadily, as has the recognition of our impact in the areas of security, systems, software engineering, educational informatics, networking, and games. For example, we have one of the largest concentrations of NSF Early Career Award winners (25 of our current or former faculty have received one).

NCSU is located in Raleigh, the capital of North Carolina, which forms one vertex of the world-famous Research Triangle Park (RTP). RTP is an innovative environment, both as a metropolitan area with one of the most diverse industrial bases in the world, and as a center of excellence promoting technology and science. The Research Triangle area is routinely recognized in nationwide surveys as one of the best places to live in the U.S. We enjoy outstanding public schools, affordable housing, and great weather, all in close proximity to the mountains and the seashore.


Applications will be reviewed as they are received. The positions will remain open until suitable candidates are identified. Applicants are encouraged to apply by December 15, 2015. Applicants should submit the following materials online at http://jobs.ncsu.edu (reference position number 00001096): cover letter, curriculum vitae, research statement, teaching statement, and names and complete contact information of four references, including email addresses and phone numbers. Candidates can obtain information about the department and its research programs, as well as more detail about the position advertised here, at http://www.csc.ncsu.edu/. Inquiries may be sent via email to: [email protected].

NCSU is an equal opportunity and affirmative action employer. In addition, NCSU welcomes all persons without regard to sexual orientation or genetic information. Individuals with disabilities requiring disability-related accommodations in the application and interview process should call (919) 515-3148.

The Ohio State University
Computer Science and Engineering Department
Multiple Tenure-Track Positions at the Assistant Professor level

The Computer Science and Engineering Department at The Ohio State University seeks to fill multiple tenure-track positions at the assistant professor level. We are particularly interested in recruiting in the following areas: cybersecurity, machine learning, distributed systems & cloud computing, and data management.

The department is committed to enhancing faculty diversity; women, minorities, and individuals with disabilities are especially encouraged to apply.

Some of these positions are partially funded by the university-wide Discovery Themes Initiative, a significant investment in key thematic areas, including the Data Analytics Collaborative, which will establish a singular presence in data analytics at Ohio State. The university is also responsive to dual-career families and strongly promotes work-life balance through a suite of institutionalized policies.

Applicants should hold or be completing a PhD in computer science & engineering or a closely related field, have a commitment to and demonstrated record of excellence in research, and a commitment to excellence in teaching.

To apply, please submit your application via the online database. The link can be found at: https://web.cse.ohio-state.edu/cgi-bin/portal/fsearch/apply.cgi

Review of applications will begin in December and will continue until the positions are filled.

The Ohio State University is an Equal Opportunity/Affirmative Action Employer.

Oregon State University
School of Electrical Engineering and Computer Science
Tenure-Track position in Data Science and Engineering

The School of Electrical Engineering and Computer Science at Oregon State University invites applications for one or more full-time nine-month tenure-track faculty positions in the area of data science and engineering broadly construed. Subareas of interest include (but are not limited to) databases and data management, visualization and visual analytics, security, data mining, and signal processing for big data. The appointments are anticipated at the Assistant Professor rank, but exceptionally strong candidates may be considered at the rank of Associate Professor or Professor.

A Ph.D. in Computer Science, Electrical and Computer Engineering, or related field is required by the start of employment. Duties include teaching undergraduate and graduate courses, conducting research in the area of interest, securing research funding, and service. Candidates should show evidence of strong research promise, potential for developing an externally funded research program, and commitment to quality advising and teaching at the graduate and undergraduate levels. Applicants should demonstrate a strong commitment to collaboration with other research groups in the School of EECS, with other departments at Oregon State University, and outside the university. Applicants who are women or under-represented minorities are strongly encouraged to apply.

The School of EECS emphasizes a culture of collegiality and excellence in both research and education. With 56 tenured/tenure-track faculty, we enroll 200 PhD, 250 MS and 2,800 undergraduate students. Oregon State University is located in Corvallis, a college town renowned for its high quality of life.

For full consideration, apply online by Feb 15, 2016 with a letter of interest; vita; two-page statement of research interests; one-page statement of teaching interests; one-page statement of philosophy toward equity, inclusion and diversity; and names and contact information for at least three references. For more information, visit eecs.oregonstate.edu/jobs.

Oregon State University
Tenure-Track Faculty Position in Robotics

The Oregon State University Robotics program seeks applications for full-time faculty positions to support our M.S. and Ph.D. programs in Robotics (http://robotics.oregonstate.edu). All areas of robotics will be considered. Appointments are anticipated at the Assistant Professor rank, but exceptionally strong candidates may be considered at the rank of Associate Professor or Professor. The successful candidates will have office and laboratory space physically located with the existing Robotics group, and will have an administrative home most appropriate to their area of expertise within the College of Engineering. Because Oregon State is a Land/Sea/Air/Space grant institution with strong ties to oceanography and the NOAA fleet, and hosts an FAA UAV test site, there are many opportunities to collaborate across disciplines and utilize Robotics as an enabling technology.

Candidates should hold a Ph.D. degree in robotics, mechanical engineering, electrical and computer engineering, computer science, or other robotics-related disciplines by the start of employment, and have a demonstrated record of scholarship. We particularly welcome candidates with a proven record of working with robots in real environments. Duties include teaching undergraduate and graduate courses, conducting research, securing research funding, and service. Candidates should show evidence of strong research promise, potential for developing an externally funded research program, and commitment to quality advising and teaching at the graduate and undergraduate levels. Applicants who are women or under-represented minorities are strongly encouraged to apply.

Oregon State University is located in Corvallis, a college town renowned for its high quality of life.

For full consideration, apply online by Feb 15, 2016 with a letter of interest; vita; two-page statement of research interests; one-page statement of teaching interests; one-page statement of philosophy toward equity, inclusion and diversity; and names and contact information for at least three references.

More information, including a complete position description, is available at http://robotics.oregonstate.edu/jobs.

Pacific Lutheran University
Assistant Professor of Computer Science and Computer Engineering

PLU invites applications for a tenure-track Assistant Professor to begin September 1, 2016. This position teaches a variety of lower- and upper-division courses. For qualifications and details, visit https://employment.plu.edu. Apply: https://employment.plu.edu/postings/3715. EOE/AA

Purdue University
Department of Computer Science
Tenure-Track/Tenured Faculty Positions

The Department of Computer Science at Purdue University is in a phase of significant growth. Applications are solicited for seven tenure-track and tenured positions at the Assistant, Associate and Full Professor levels. Outstanding candidates in all areas of computer science will be considered. Review of applications and candidate interviews will begin early in October 2015, and will continue until the positions are filled.

The Department of Computer Science offers a stimulating academic environment with research programs in most areas of computer science. Information about the department and a description of open positions are available at http://www.cs.purdue.edu.

Applicants should hold a PhD in Computer Science or a related discipline, be committed to excellence in teaching, and have demonstrated excellence in research. Successful candidates will be expected to conduct research in their fields of expertise, teach courses in computer science, and participate in other department and university activities. Salary and benefits are competitive, and Purdue is a dual career friendly employer. Applicants are strongly encouraged to apply online at https://hiring.science.purdue.edu. Alternatively, hardcopy applications can be sent to: Faculty Search Chair, Department of Computer Science, 305 N. University Street, Purdue University, West Lafayette, IN 47907. A background check will be required for employment. Purdue University is

128 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12
an EEO/AA employer fully committed to achieving a diverse workforce. All individuals, including minorities, women, individuals with disabilities, and protected veterans are encouraged to apply.

San Diego State University
Department of Computer Science
Chair of Computer Science

Department of Computer Science at SDSU seeks candidates for the Chair position with a PhD in Computer Science or a closely related field, and a sustained record of supported research. The Department is a dynamic and growing unit looking for a visionary Chair to lead it into its next phase of expansion.
We strive to build and sustain a welcoming environment for all. SDSU is seeking applicants with commitment to working effectively with individuals from diverse backgrounds and members of underrepresented groups.
For more details and application procedures, please apply via http://apply.interfolio.com/31841.
SDSU is a Title IX, equal opportunity employer. A full version of this ad can be found at: http://cs.sdsu.edu/

San José State University
Computer Engineering Department
Assistant/Associate Professor, Computer/Software Engineering
Job Opening ID (JOID): 23459

The Computer Engineering Department at San José State University invites applications for 4 tenure-track faculty positions at the rank of Assistant or Associate Professor. Areas of particular interest include cloud computing and virtualization, big data analytics, networking, storage, mobile systems, parallel and distributed systems, cyber security, and embedded systems, but other areas in computer and software engineering will also be considered.
Qualifications:
• Applicants must have a Ph.D. in Computer Engineering, Software Engineering, Computer Science, or Electrical Engineering.
• For appointment at the Assistant Professor rank, the candidate must demonstrate potential for teaching and scholarly excellence. For appointment at an Associate rank, the candidate must have a record of broad teaching experience and significant scholarly and professional achievements commensurate with the rank.
• Applicants should have awareness of and sensitivity to the educational goals of a multicultural population as might have been gained in cross-cultural study, training, teaching and other comparable experience.
Responsibilities:
• A faculty member is expected to teach, supervise, and advise students in both undergraduate and graduate programs, and to establish a research program related to his/her field of interest.
• A faculty member will participate in department, college, and university committee and other service assignments.
• A faculty member must address the needs of a student population of great diversity – in age, cultural background, ethnicity, primary language and academic preparation – through course materials, teaching strategies and advisement.
Salary Range: Commensurate with qualifications and experience.
Starting Date: August 22, 2016
Employment is contingent upon proof of eligibility to work in the United States.
For full description and consideration, visit http://apptrkr.com/681277
Screening of candidates will begin February 1, 2016. The positions remain open until filled.
The Computer Engineering program at SJSU was ranked number one in the nation among public institutions in its category by US News in 2014. Ideally situated at the center of Silicon Valley, the program enjoys close and multi-leveled industry ties and offers ample opportunities for industry collaboration and applied research.
The Computer Engineering Department at San José State University is a leading provider of engineering professionals to Silicon Valley's high-tech industries. It provides its faculty with opportunities for close collaborative ties with industry and research partners in Silicon Valley. The department serves more than 2500 undergraduate and graduate students and offers BS and MS degrees in both computer and software engineering. AA/EOE.

Swarthmore College
Computer Science Department
Tenure Track and Visiting Positions

The Computer Science Department invites applications for one tenure-track position and multiple visiting positions at the rank of Assistant Professor to begin Fall semester 2016.
Swarthmore College has a strong institutional commitment to excellence through diversity and inclusivity in its educational program and employment practices. The College actively seeks and welcomes applications from candidates with exceptional qualifications, particularly those with demonstrated commitments to a more inclusive society and world.
Swarthmore College is a small, selective, liberal arts college located 10 miles outside of Philadelphia. The Computer Science Department offers majors and minors at the undergraduate level.
Applicants must have teaching experience and should be comfortable teaching a wide range of courses at the introductory and intermediate level. Candidates should additionally have a strong commitment to involving undergraduates in their research. A Ph.D. in Computer Science at or near the time of appointment is required.
For the tenure-track position, we are particularly interested in applicants whose areas will complement and broaden our program, including theory and algorithms, programming languages, and systems areas. Strong applicants in other areas will also be considered.
For the visiting position, strong applicants in any area will be considered.
For the tenure-track position, priority will be given to complete applications received by December 15. For the visiting position, priority will be given to complete applications received by February 15. Applications for both positions will continue to be accepted after these dates until the positions are filled.
Applications should include a cover letter, vita, teaching statement, research statement, and three letters of reference, at least one (preferably two) of which should speak to the candidate's teaching ability. In your cover letter, please briefly describe your current research agenda; what would be attractive to you about teaching in a liberal arts college environment; and what background, experience, or interests are likely to make you a strong teacher of Swarthmore College students.
Tenure-track applications are being accepted online at https://academicjobsonline.org/ajo/jobs/6161.
Visiting applications are being accepted online at https://academicjobsonline.org/ajo/jobs/6173.
Candidates may apply for both positions.

Swarthmore College
Department of Computer Science
Lab Lecturer

The Department of Computer Science is currently accepting applications for a Lab Lecturer. The Lab Lecturer position is full time during the academic year (Fall and Spring semesters) with summers off. The start date is January 11, 2016.
Swarthmore College has a strong institutional commitment to excellence through diversity and in its educational program and employment practices. The College actively seeks and welcomes applications from candidates with exceptional qualifications, particularly those with demonstrated commitments to a more inclusive society and world.
Swarthmore College is a small, selective, liberal arts college located 10 miles outside of Philadelphia. The Computer Science Department offers majors and minors at the undergraduate level.
The Lab Lecturer position is an Instructional Staff position at the college. The responsibilities of the position include, but are not limited to: teaching lab sections of the introductory courses in the Computer Science Department; working with faculty to develop lab assignments for the introductory courses; creating lab assignment write-ups and documentation on tools used in introductory labs; supporting faculty in creating and setting up lab code examples, documentation, and software tools for lab work; lab grading and coordinating student graders; and holding regular office hours and helping students in the lab during open lab hours. More information about the Computer Science Department can be found on our website at www.cs.swarthmore.edu.
A master's degree or Ph.D. in computer science or a related field with extensive computer science background is required. Prior teaching experience at the college level is preferred.
Applications should include a vita, teaching statement, and two letters of reference that speak to the candidate's teaching ability.
Applications are being accepted online at https://academicjobsonline.org/ajo/jobs/6465. We will begin reviewing applications on November 9. Applications will continue to be accepted until the position is filled.

Trinity College, Hartford, Connecticut
Assistant Professor of Computer Science

Applications are invited for a tenure-track position in computer science at the rank of Assistant Professor to start in the fall of 2016. Candidates must hold a Ph.D. in computer science at the time of appointment. We are seeking candidates with teaching and research interests in applied areas associated with Big Data (such as database and information systems, data mining and knowledge discovery, machine learning and cloud computing), but other related areas will be seriously considered. Applications should be submitted to: https://trincoll.peopleadmin.com/. Consideration of applications will begin on December 15, 2015.
Trinity College is an Equal-Opportunity/Affirmative-Action employer.

Tufts University
Department of Computer Science in the School of Engineering
Assistant/Associate Professor in Machine Learning/Data Mining

The Department of Computer Science in the School of Engineering at Tufts University invites applications for a tenure-stream faculty appointment to begin in September 2016. We are seeking outstanding junior-level or mid-career-level candidates for an appointment at the rank of Assistant Professor or Associate Professor.
We welcome outstanding applicants with a strong vision and research programs in the areas of machine learning and data mining broadly interpreted to include related fields where data and its analysis are central. These include, for example, adaptive systems in robotics, computational linguistics, computational sustainability, and visual analytics.
Application materials should be submitted online through Interfolio at https://apply.interfolio.com/30915
For more information please visit http://www.cs.tufts.edu/.
Inquiries should be emailed to [email protected].
Review of applications will begin January 5, 2016 and will continue until the position is filled.
Tufts University is an Affirmative Action/Equal Opportunity employer. We are committed to increasing the diversity of our faculty. Members of underrepresented groups are strongly encouraged to apply.

University of Alabama
Tenured/Tenure-Track Faculty Positions, Computer Science
Software Engineering Focus: Data Analytics and Computational Modeling

Openings exist for two Assistant/Associate/Full professors in software engineering with specific application to data analytics or computational modeling starting in Fall 2016. Outstanding candidates in all areas will be considered. At the time of appointment, candidates must have earned a Ph.D. in Computer Science or a related field. Candidates will be expected to form collaborations with the new NOAA Water Center on UA's campus (http://nws.noaa.gov/oh/nwc/). The CS department has twenty-four faculty members (15 tenured/tenure track faculty, seven of whom have interests in software engineering), over 600 undergraduates in an ABET accredited B.S. degree program, and 40 graduate students. The department also offers a Software Engineering Concentration for its undergraduates.
For additional details and to apply, visit http://se.cs.ua.edu/facultyjobs or contact Dr. Jeffrey Carver ([email protected]). Review of applications will begin immediately. The University of Alabama is an equal opportunity/affirmative action employer. Women and minority applicants are particularly encouraged to apply.

University of California, Riverside
Department of Computer Science and Engineering
Bourns College of Engineering
Faculty Positions

The Department of Computer Science and Engineering, University of California, Riverside invites applications for two tenure-track and/or tenured faculty positions to begin in the 2016-17 academic year. Priority will be given to candidates in the areas of (1) Computer Graphics, Animation, and Gaming, and (2) Software Engineering; however, exceptional candidates in other areas may also be considered. Salary level will be competitive and commensurate with qualifications and experience. A Ph.D. in Computer Science (or in a closely related field) at the time of employment is required. Senior candidates need to have an outstanding record of research, funding support, teaching, and graduate student mentorship, while junior candidates need to show potential to excel in these areas. Advancement through the faculty ranks at the University of California is through a series of structured, merit-based evaluations, occurring every 2-3 years, each of which includes substantial peer input.
The CSE department offers several undergraduate degrees, as well as MS and Ph.D. degrees in Computer Science. The Department currently has 25 faculty members, including multiple ACM/IEEE/AAAS Fellows and Young Investigator/NSF CAREER award holders, who pride themselves in combining top quality teaching with cutting edge research. The research projects in the department are funded by federal (NSF, NIH, AFOSR, DoD) or industrial sponsors, with the new awards for 2015/16 exceeding 6 million dollars. More information regarding the department is available at http://www.cs.ucr.edu.
Full consideration will be given to applications received by January 2, 2016. We will continue to consider applications until the positions are filled. To apply, please register through the weblink at http://www.engr.ucr.edu/facultysearch/. For inquiries and questions, please contact us at [email protected].
UCR is a world-class research university with an exceptionally diverse undergraduate student body. Its mission is explicitly linked to providing routes to educational success for underrepresented and first-generation college students. A commitment to this mission is a preferred qualification.
The University of California is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or any characteristic protected by law.

University of California, Riverside
Bourns College of Engineering
Faculty Position in Big Data Management

The University of California at Riverside (UCR) is embarking on a major new hiring initiative that will add 300 tenured and tenure-track positions in 33 cross-disciplinary areas selected through a peer-reviewed competition. Over the next three years, we will hire multiple faculty members in each area and invest in research infrastructure to support their work. This initiative will build critical mass in vital and emerging fields of scholarship, foster truly cross-disciplinary work and further diversify the faculty at one of America's most diverse research universities. We encourage applications from scholars committed to excellence and seeking to help redefine the research university for the next generation.
The Bourns College of Engineering is leading cluster hires to enhance UCR's research strengths in Data Science. Five such hires have been approved for the Data Science cluster and will have potential home departments in Engineering or the Sciences. Candidates are expected to foster research collaborations with existing faculty across academic departments working on Data Science related topics (including astronomy, biological sciences, computational biology, environmental sciences, physics, precision agriculture, etc.). This year we are seeking to fill one tenured/tenure-track faculty position from the Data Science cluster, in the area of Big Data Management with emphasis on scalable data management, big data analytics, and scalable data mining. While priority will be given to senior candidates, promising junior candidates will also be considered and are encouraged to apply.
Salary level will be competitive and commensurate with qualifications and experience. A Ph.D. in a relevant area at the time of employment is a minimum requirement. Senior candidates need to have an outstanding record of research, funding support, teaching, and graduate student mentorship, while junior candidates need to show potential to excel in these areas.
UCR is a world-class research university with an exceptionally diverse undergraduate student body. Its mission is explicitly linked to providing routes to educational success for underrepresented and first-generation college students. A commitment to this mission is a preferred qualification.
Advancement through the faculty ranks at the University of California is through a series of structured, merit-based evaluations, occurring every 2-3 years, each of which includes substantial peer input.
Full consideration will be given to applications received by January 4, 2016. We will continue to consider applications until the position is filled. To apply, please register through the weblink at http://www.engr.ucr.edu/facultysearch/. For inquiries and questions, please contact us at [email protected].
The University of California is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or any characteristic protected by law.

University of California, Riverside
Bourns College of Engineering
Department of Computer Science and Engineering
Faculty Positions in Cybersecurity

The University of California at Riverside (UCR) is embarking on a major new hiring initiative that will add 300 tenured and tenure-track positions in 33 cross-disciplinary areas selected through a peer-reviewed competition. Over the next three years, we will hire multiple faculty members in each area and invest in research infrastructure to support their work. This initiative will build critical mass in vital and emerging fields of scholarship, foster truly cross-disciplinary work and further diversify the faculty at one of America's most diverse research universities. We encourage applications from scholars committed to excellence and seeking to help redefine the research university for the next generation.
The Department of Computer Science and Engineering, University of California, Riverside invites applications for three tenure-track and/or tenured faculty positions in Cyber-security, with particular interest in: (a) operating systems/distributed system security, (b) software security, (c) applied cryptography, and (d) human computer interaction for understanding and improving the security of systems. Senior candidates need to have a strong record of research, teaching, and graduate student mentorship, while junior candidates need to show potential to excel in these areas.
The CSE department offers several undergraduate degrees, as well as MS and Ph.D. degrees in Computer Science. The Department currently has 25 faculty members, including multiple ACM/IEEE/AAAS Fellows and Young Investigator/NSF CAREER award holders, who pride themselves in combining top quality teaching with cutting edge research. The research projects in the department are funded by federal (NSF, NIH, AFOSR, DoD) or industrial sponsors, with the new awards for 2015/16 exceeding 6 million dollars. More information regarding the department is available at http://www.cs.ucr.edu.
UCR is a world-class research university with an exceptionally diverse undergraduate student body. Its mission is explicitly linked to providing routes to educational success for underrepresented and first-generation college students. A commitment to this mission is a preferred qualification.
Advancement through the faculty ranks at the University of California is through a series of structured, merit-based evaluations, occurring every 2-3 years, each of which includes substantial peer input.
Full consideration will be given to applications received by January 2, 2016. We will continue to consider applications until the positions are filled. Salary level will be competitive and commensurate with qualifications and experience. Positions require a Ph.D. in Computer Science (or in a closely related field) at the time of employment. To apply, please register through the weblink at http://www.engr.ucr.edu/facultysearch/. For inquiries and questions, please contact us at [email protected].
The University of California is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or any other characteristic protected by law.

University of Central Florida
Computer Science Division, College of Engineering and Computer Science
Assistant or Associate Professor

The Computer Science Division in the College of Engineering and Computer Science at UCF invites applications for three tenure-track (or tenured) positions at the assistant (or associate) professor level, starting Fall 2016. Successful candidates will have a record of high-quality publications and be recognized for their expertise and the impact of their research. Successful candidates at the level of associate professor must have a record commensurate with that rank. We welcome exceptional applicants from all CS research areas though we are particularly interested in applicants in the following areas: cyber security, algorithms and the theory of computing, and human-computer interaction.
All applicants must have a Ph.D. from an accredited institution in an area appropriate to Computer Science and a strong commitment to the academic process, including scholarly publications, sponsored research, and teaching.
Computer Science at UCF has a rapidly-growing educational and research program with over $4.5 million in research contracts and expenditures annually and over 215 graduate students. Computer Science has strong areas of research in Computer Vision, Machine Learning, Virtual and Mixed Reality, Big Data, and Human-Computer Interaction. The CS Division is also well-known for the success of its two-time defending National Champion Cyber Defense team and the exceptional record of its programming teams in regional, national, and world competitions. More information about the Computer Science Division can be found at http://www.cs.ucf.edu/.
Research sponsors include NSF, NIH, NASA, DOT, DARPA, ONR, and other agencies of the DOD. Industry sponsors include AMD, Boeing, Canon, Electronic Arts, General Dynamics, Harris, Hitachi, Intel, Lockheed Martin, Oracle, SAIC, Symantec, Toyota USA, and Walt Disney World, as well as local startups.
UCF has the top-tier Carnegie Foundation designation of a "very high research activity" university, is the nation's second largest university, and is ranked by U.S. News and World Report as the third most up-and-coming university in terms of innovative changes in the areas of academics, faculty, and student life. Adjacent to UCF is a thriving research park that hosts more than 100 high-technology companies and the Institute for Simulation and Training. The Central Florida area is designated by the State of Florida as the Center of Excellence in Modeling and Simulation. UCF also has an accredited medical school, which opened in 2009.
To apply, applicants must complete an online job application at https://www.jobswithucf.com/postings/43185. In addition to the application, candidates must also submit a signed cover letter, complete curriculum vitae, maximum two page statement outlining research vision and teaching interests, and a list of at least three professional references complete with address, phone number, and email address.

University of Colorado Boulder
Department of Computer Science
Assistant Professor

The Department of Computer Science at the University of Colorado Boulder seeks applications for multiple tenure-track positions. The openings are targeted at the level of Assistant Professor, although candidates at higher ranks may be considered. Research areas of particular interest include, but are not limited to, the areas of secure and reliable software systems (with an emphasis in software engineering and/or security), network science, scientific data analysis and visualization, computer systems as applied to autonomous and networked devices, and theoretical computer science. Candidates must have a Ph.D. in computer science or a related discipline and must show promise in their ability to develop an independent and internationally recognized research program. They must also display an ability, a record of excellence, and/or a commitment to teaching and working with undergraduate and graduate students of diverse backgrounds. Our department values inclusive excellence and we seek candidates that understand the benefits that diversity brings to scientific innovation and who, through their work, develop technologies that impact a wide range of communities. Our department is also responsive to dual career situations.
We will be accepting applications starting November 5th, 2015. Details available at: http://www.cs.colorado.edu/~kena/2015CSFacultySearch/
The University of Colorado Boulder is an Equal Opportunity Employer.
The University of Colorado Boulder conducts background checks for all final applicants.

University of Detroit Mercy
Department of Mathematics, Computer Science, and Software Engineering
Tenure Track Faculty Position

University of Detroit Mercy is seeking a faculty member in Computer Science or Software Engineering with emphasis on teaching and research with UG & GR students. The deadline for submissions is December 20, 2015. Visit: http://eng-sci.udmercy.edu/faculty/faculty-positions/

University of Illinois at Urbana-Champaign
Department of Electrical and Computer Engineering (ECE)
Positions in Computing

The Department of Electrical and Computer Engineering (ECE) at the University of Illinois at Urbana-Champaign invites applications for faculty positions at all levels and in all areas in computing, broadly defined, with particular emphasis on big data and its applications, including data analytics; data center and storage systems; parallel, high-performance, and energy-efficient computing; reliable and secure computing; distributed computing; bio-inspired computing; verification; wired/wireless networking; social networking; mobile, wearable sensing & applications; and computational genomics. From the transistor and the first computer implementation based on von Neumann's architecture to the Blue Waters petascale computer – the fastest computer on any university campus – ECE ILLINOIS faculty have always been at the forefront of computing research and innovation. Applications are encouraged from candidates whose research programs specialize in core as well as interdisciplinary areas of electrical and computer engineering. The department is engaged in exciting new and expanding programs for research, education, and professional development, with strong ties to industry. The ECE Department has recently settled into its new 235,000 sq. ft. net-zero energy design building, which is a major campus addition with maximum space and minimal carbon footprint.
Qualified senior candidates may also be considered for tenured full Professor positions as part of the Grainger Engineering Breakthroughs Initiative (http://graingerinitiative.engineering.illinois.edu), which is backed by a $100-million gift from the Grainger Foundation to support research in big data and bioengineering, broadly defined. In addition, the University of Illinois is home to Blue Waters petascale computer, which is supported by the National Science Foundation and developed and operated by the University of Illinois' National Center for Supercomputing Applications. Qualified candidates may be hired as Blue Waters Professors who will be provided substantial allocations on and expedited access to the supercomputer. To be considered as a Blue Waters Professor, candidates need to mention Blue Waters as one of their preferred research areas in their online application, and include a reference to Blue Waters in their cover letter.
Please visit http://jobs.illinois.edu to view the complete position announcement and application instructions. Full consideration will be given to applications received by December 15, 2015, but applications will continue to be accepted until all positions are filled.
The University of Illinois conducts criminal background checks on all job candidates upon acceptance of a contingent offer.
Illinois is an EEO Employer/Vet/Disabled www.inclusiveillinois.illinois.edu.

University of Kentucky
Department of Computer Science
Tenure-Track Faculty Positions

The University of Kentucky Computer Science Department expects to hire two tenure-track faculty members to begin employment in August of 2016.
The department seeks to hire energetic researchers/educators who are interested in the application of advanced computing to challenging and relevant problems. Our faculty undertake interdisciplinary research, working with other departments including statistics, biology, linguistics, electrical engineering, computer engineering, and the humanities. We therefore favor researchers who can collaborate to solve problems involving multiple disciplines. The department has two openings. One is tied to the newly established University of Kentucky Institute for Biomedical Informatics (IBI; http://ibi.uky.edu), in which the focus areas include databases, biomedical informatics, data analytics, and visualization. For the other position, all areas of computing will be considered, but applicants with expertise in security and privacy, big data, and cloud computing will receive preference. These areas are associated with the department's Laboratory for Advanced Networking, Center for Visualization and Virtual Environments, and Software Verification and Validation Lab. The candidate should be able to teach upper-level courses with a focus on databases and/or security, as well as (ultimately) courses in introductory systems sequence.
Candidates must have earned a PhD in Computer Science or closely related field at the time employment begins. To apply, a University of Kentucky Academic Profile must be submitted at http://ukjobs.uky.edu/postings/80509. Applications are now being accepted. Review of credentials will begin immediately and continue until the positions are filled.
For more detailed information about these positions, go to www.cs.uky.edu/opportunities/faculty. Questions should be directed to HR/Employment by phone at 1-859-257-9555 press 2 or email ([email protected]), or to Diane Mier ([email protected]) in the Computer Science Department.

University of Maryland, Baltimore County
Computer Science and Electrical Engineering
Tenure-Track Faculty Positions

UMBC's Department of Computer Science and Electrical Engineering invites applications for three tenure-track Assistant Professor positions to begin in Fall 2016. Exceptionally strong candidates for higher ranks may be considered. Applicants must have or be completing a PhD in a relevant discipline, have demonstrated the ability to pursue a research program, and have a strong commitment to undergraduate and graduate teaching.
All areas of specialization will be considered, but we are especially interested in candidates in the following areas: information assurance and cybersecurity; mobile, wearable and IoT systems; big data with an emphasis on machine learning, analytics, and high-performance computing; knowledge and database systems; hardware systems and experimental methods in circuits, devices, VLSI, FPGA, and sensors; cyber-physical systems; low-power systems; biomedical and healthcare systems; and methods and tools for hardware-software co-design.
The department is energetic, research-oriented and multi-disciplinary with programs in Computer Science, Computer Engineering, Electrical Engineering and Cybersecurity. Our faculty (34 tenure-track, six teaching, 15 research) enjoy collaboration, working across our specializations as well as with colleagues from other STEM, humanities and arts departments and external partners. We have 1500 undergraduate CS and CE majors and 400 MS and PhD students in our CS, CE, EE and Cybersecurity graduate programs. A diverse portfolio of government and industrial sponsors supports our research, which has over $5M in yearly research expenditures.
UMBC is a dynamic public research university integrating teaching, research and service. As an Honors University, the campus offers academically talented students a strong undergraduate liberal arts foundation that prepares them for graduate and professional study, entry into the workforce, and community service and leadership. The 2015 US News and World Report Best Colleges report puts UMBC fourth in the Most Innovative National Universities and sixth in the Best Undergraduate Teaching, National Universities categories. Our strategic location in the Baltimore-Washington corridor is close to many federal laboratories, agencies and high-tech companies.
Applicants should submit a cover letter, a brief statement of teaching and research experience and interests, a CV, and three letters of recommendation at http://apply.interfolio.com/31543. Applications received by January 15, 2016 are assured full consideration and those received later will be evaluated as long as the positions remain open. Send questions to jobsTT@csee.umbc.edu and see http://csee.umbc.edu/jobs for more information. We are committed to inclusive excellence and innovation and welcome applications from women, minorities, veterans and individuals with disabilities. UMBC is an affirmative action/equal opportunity employer.

University of Maryland College Park
Department of Computer Science
Faculty Positions

The Department of Computer Science at the University of Maryland, College Park, MD, USA has several openings for faculty positions effective July 1, 2016 or earlier. The openings are at the tenure-track Assistant Professor level or "junior-level" tenured Associate Professor level. Applicants will be considered for joint appointments between the Department of Computer Science and the Institute for Advanced Computer Studies (UMIACS).
We are especially interested in recruiting in Computational Biology, Cybersecurity, Machine Learning, and Databases; however, exceptional candidates in all areas will be considered.
Applications from women and other underrepresented groups are especially welcome.
Please apply online at https://ejobs.umd.edu and https://hiring.cs.umd.edu. Candidates must apply to both websites to receive consideration. The review of applications will begin on December 1, 2015, and applicants are strongly encouraged to submit complete applications by that date for full consideration. Questions can be directed to the faculty recruitment committee at: [email protected].
Founded in 1856, University of Maryland, College Park is the flagship institution in the University System of Maryland. Our 1,250-acre College Park campus is minutes away from Washington, D.C., and the nexus of the nation's legislative, executive, and judicial centers of power. This unique proximity to business and technology

132 COMMUNICATIONS OF THE ACM | DECEMBER 2015 | VOL. 58 | NO. 12 leaders, federal departments and agencies, and communication skills, and a broad background EEO/AA Women, under-represented groups, indi- a myriad of research entities, embassies, think in computing. viduals with disabilities, and veterans are encour- tanks, cultural centers, and non-profit organiza- The Department has an ABET/CAC-accred- aged to apply. Apply online at: tions offers unparalleled synergistic opportuni- ited undergraduate program and MS and PhD https://www.unrsearch.com/postings/19013 ties for our faculty and students. The Department programs. See the website http://www.cs.olemiss. (for the big data position) of Computer Science at the University of Mary- edu for more information about the Department. https://www.unrsearch.com/postings/18990 land has been consistently ranked among the top The University is located in Oxford, one of (for the high performance computing 15 nationally. We have 47 full time tenured and America’s top-ranked college towns. Oxford has position) tenure track faculty in a wide variety of research a wonderful small-town atmosphere with afford- https://www.unrsearch.com/postings/19015 areas, and over 200 doctoral students drawn from able housing and excellent schools. (for the cybersecurity position) top undergraduate programs internationally. Individuals may apply online at http://jobs. https://www.unrsearch.com/postings/19004 Additional information about the Depart- olemiss.edu. The applicant is asked to supply (for the open position) ment of Computer Science and the Institute for a cover letter, curriculum vitae, research and Advanced Computer Studies is available at http:// teaching statements, and contact information for Review of applications will begin on January www.cs.umd.edu and at http://www.umiacs.umd. four references. Review of applications will begin 5, 2016 and will continue until the search closes edu. 
The Department is planning to move into immediately and continue until the position is on February 15, 2016. Inquiries should be direct- the Iribe Center in the near future, for more infor- filled or an adequate applicant pool is reached. ed to Ms. Lisa Cody, [email protected]. mation, see http://csctr.cs.umd.edu. The University of Mississippi is an EOE/AA/ The University of Maryland is an Equal Oppor- Minorities/Females/Vet/Disability/Sexual Ori- tunity, Affirmative Action Employer. entation/Gender Identity/Title VI/Title VII/Title The University of North Carolina at IX/504/ADA/ADEA employer. Greensboro Department of Computer Science University of Minnesota-Twin Cities Tenure Track Assistant Professor Department of Computer Science and University of Nevada, Reno Engineering CSE Department The University of North Carolina at Greensboro Multiple Tenure-Track Faculty Positions in Four Tenure-Track Faculty Positions (UNCG) seeks applications for a tenure-track Cyber Security position at the rank of Assistant Professor in the UNIVERSITY OF NEVADA, RENO. The CSE De- Department of Computer Science starting Au- The Department of Computer Science and Engi- partment invites applications for four tenure- gust 1, 2016. We are looking for candidates who neering at the University of Minnesota-Twin Cit- track faculty positions. Two positions are at assis- show exceptional promise in both research and ies invites applications for multiple tenure-track tant professor level: the first position is in the area teaching. 
Preferred research areas are those that faculty positions in cyber security and in support of big data with emphasis on security and privacy, build upon our existing areas of strength, which of a University-wide initiative (MnDRIVE) on ro- and the second position is in the area of high per- include artificial intelligence, databases and data botics, sensors, and advanced manufacturing formance computing with emphasis on parallel mining, foundations of computer science, hu- (http://cse.umn.edu/r/mndrive-minnesota-dis- and distributed computing. The third position, man-computer interaction, networking, security, covery-research-and-innovation-economy/). Spe- in the area of cybersecurity, is at associate profes- and big data analytics, but applicants in other re- cific topics of interest for the positions include sor level, and will fill the role of Technical Direc- search areas are also encouraged to apply. We are cyber security, sensing and networking, machine tor of the newly established Cybersecurity Center particularly interested in candidates who can pur- learning, computer vision, robot design, ma- (CSC) at UNR. The fourth position is at assistant, sue interdisciplinary research in the natural sci- nipulation, mobility, human-robot interaction, associate or full professor level, and is open to ences or computational mathematics, especially planning, algorithmic foundations, and embed- all research areas, with preference given to can- through research in bioinformatics or molecular ded systems. Applicants from other areas will be didates with expertise in embedded systems (In- modeling. Experience building an independent considered as long as they address how their work ternet of Things, cyber-physical systems, VLSI research program and mentoring graduate stu- fits into the security or MnDRIVE themes. Senior design), machine learning (deep learning, data dents is a plus. applicants will also be considered. 
We encourage analytics, bioinformatics), computer graphics UNCG is a public coeducational, doctoral- applications from women and under-represented and visualization.. Applicants must have a Ph.D. granting residential university chartered in 1891, minorities. Candidates should have a Ph.D. in in Computer Science or Computer Engineering with a Carnegie classification of RU/H (Research Computer Science or a closely related discipline by July 1, 2016. Candidates must be strongly com- University/High Research Activity). The Depart- at the time of appointment. The positions are mitted to excellence in research and teaching and ment of Computer Science at UNCG is a thriving open until filled, but for full consideration apply should demonstrate potential for developing ro- department with an established, ABET- accredit- at https://www.cs.umn.edu/resources/employ- bust externally funded research programs. The ed B.S. degree program and an active M.S. degree ment/faculty by December 15, 2015. The Universi- department has several faculty with NSF Career program, and is experiencing rapid enrollment ty of Minnesota is an equal opportunity employer awards and leaders in statewide and multi-state growth. The department currently has 7 tenured and educator. multi-million dollar NSF awards. Our research faculty members who are all active in research, as is supported by NSF, DoD, DHS, NASA, Google, well as lecturers and part-time faculty. For more Microsoft, Ford and AT&T. The department’s an- information on the Computer Science Depart- The University of Mississippi nual research expenditures have exceeded $2M ment, visit the Department’s web page at http:// Department of Computer and Information in recent years, while FY15 funding exceeds $3M. www.uncg.edu/cmp. Science We offer B.S., M.S., and Ph.D. degrees and have Candidates must hold or anticipate a Ph.D. 
in Assistant Professor Position strong research and education programs in Intel- Computer Science or a related discipline by Au- ligent Systems, Computer and Network Systems, gust 1, 2016. The Department of Computer and Information Software Systems, and Games and Simulations. Submit curriculum vitae, research and teach- Science at the University of Mississippi invites ap- In the last five years, the College of Engineering ing statements, and four letters of reference plications for a tenure-track Assistant Professor has witnessed an unprecedented growth in stu- through UNCG JobSearch at http://jobsearch. position. dent enrollment and number of faculty positions. uncg.edu. You may direct your informal inquiries An applicant must hold a PhD or equivalent The College is positioned to further enhance the to Dr. Stephen Tate, Department of Computer Sci- in computer science or a closely related field by growth of its students, faculty, staff, and facilities ence, University of North Carolina at Greensboro, August 15, 2016. An applicant must have the as well as its research productivity and its gradu- Greensboro, NC 27402 ([email protected]). ability to teach both graduate and undergradu- ate and undergraduate programs. UNR, Nevada’s Review of applications will begin on January ate students, conduct research in major areas of land grant University, has nearly 21,000 students. 18, 2016 and continue until the position is filled. computer and information science, and super- Reno is a half-hour drive to beautiful Lake Tahoe, UNC Greensboro is especially proud of the di- vise MS and PhD students. An applicant must an excellent area for a wide range of outdoor ac- versity of its student body and we seek to attract provide evidence of research potential, effective tivities. San Francisco is within a four-hour drive. an equally diverse applicant pool for this posi-

DECEMBER 2015 | VOL. 58 | NO. 12 | COMMUNICATIONS OF THE ACM 133 CAREERS

tion. We are an EEO/AA employer with a strong Sciences. Current annual research expenditures Washington College commitment to increasing faculty diversity and are around $4.5 million. Department of Mathematics & Computer will respond creatively to the needs of dual-career The Department currently has 25 faculty Science couples. members and offers BS, MS, and PhD degrees in Assistant Professor Electrical Engineering and Computer Engineer- ing (jointly with the Department of Computer The Department of Mathematics & Computer Sci- University of Northern Iowa Science). Current enrollment consists of 436 ence at Washington College in Chestertown MD Department of Computer Science undergraduate and 175 graduate students. The invites applications for a tenure-track position in Assistant Professor of Computer Science Swanson School of Engineering just completed computer science at the rank of Assistant Profes- a $100 million renovation and redesign of Ben- sor beginning in August 2016. For full details visit The Department of Computer Science at the Uni- edum Engineering Hall with state-of-the-art http://www.washcoll.edu/offices/human-resourc- versity of Northern Iowa invites applications for teaching and research laboratories, which is the es/employment.php. EOE M/F/D/V a tenure-track assistant professor position to be- Department home. gin August 2016. Applicants must hold a Ph.D. in The University of Pittsburgh, located just min- Computer Science or a closely-related discipline. utes from downtown, is adjacent to the University Wesleyan University The department seeks candidates able to teach of Pittsburgh Medical Center and Carnegie Mel- Department of Mathematics and Computer and conduct research in software engineering, lon University. The Pittsburgh metropolitan area Science as well as to participate broadly in the CS curricu- has all the amenities of a large urban area includ- Assistant Professor of Computer Science lum. 
ing major sports, theatre, opera, ballet, sympho- Detailed information about the position ny, museums, zoo, etc. The urban renaissance in The Department of Mathematics and Computer and the department are available at http://www. Pittsburgh is directly attributable to the strong Science at Wesleyan University invites applica- cs.uni.edu/. To apply, visit http://jobs.uni.edu/. growth in high tech industries, such as informa- tions for a tenure track assistant professorship in Applications received by January 15, 2016, will be tion/computer technology, robotics, and biotech- Computer Science to start in 2016-2017. Wesleyan given full consideration. nology. has a strong research group in theory, program- Pre-employment background checks are re- Review of applications will begin on Decem- ming languages, and software engineering and a quired. UNI actively seeks to enhance diversity ber 1, 2015 and applications will be accepted 2-1 teaching load. We encourage candidates in all and is an Equal Opportunity/Affirmative Action until the position is filled. The start date for this areas to apply, but especially databases, machine employer. The University encourages applica- position is on or after September 1, 2016. Please learning, natural language processing, software tions from persons of color, women, individuals send nominations or application materials (cur- engineering and software systems. For descrip- living with disabilities, and protected veterans. riculum vitae, names and contact information tion and application procedure see http://www. All qualified applicants will receive consideration of at least three references, and any other docu- wesleyan.edu/mathcs/employment.html for employment without regard to age, color, ments deemed relevant) electronically to Ms. creed, disability, gender identity, national origin, Nancy Donaldson ([email protected]) and Dr. 
race, religion, sex, sexual orientation, protected Sanjeev Shroff, Chair, Electrical and Computer Yahoo Labs veteran status, or any other basis protected by fed- Engineering Chair Search Committee (sshroff@ Intern Scientist (Job Number: 1545124) eral and/or state law. UNI is a smoke-free campus. pitt.edu). (Yahoo Labs Summer Internship Program - We highly encourage candidates from under- 2016) represented US minority groups and/or females University of Pittsburgh to apply for this position. The University of Pitts- Yahoo Labs is pioneering the new sciences un- Swanson School of Engineering burgh is an affirmative action/equal opportunity derlying the Web. As the center of scientific excel- Chair, Department of Electrical and Computer employer and does not discriminate on the basis lence for Yahoo, Yahoo Labs delivers both funda- Engineering (ECE) of age, color, disability, gender, gender identity, mental and applied scientific leadership through marital status, national or ethnic origin, race, re- published research and new technologies power- The Swanson School of Engineering at the Uni- ligion, sexual orientation or veteran status. ing the company’s products. versity of Pittsburgh (http://www.engineering. Yahoo Labs is looking for exceptional PhD pitt.edu/) invites applications and nominations students to work with us in our summer intern for the position of Chair of the Department of The University of Texas at San Antonio program. We will have openings in the US (New Electrical and Computer Engineering (http:// Department of Computer Science York City, Sunnyvale, San Francisco) plus our www.engineering.pitt.edu/ECE/). We are looking Faculty Positions in Computer Science locations in London and Haifa. We seek world- for an energetic leader with a vision to inspire in- class graduate students in pursuit of a PhD in novative research and educational programs. 
A The Department of Computer Science at The Computer Science, Mathematics, Statistics, or a successful candidate must have a strong record University of Texas at San Antonio invites appli- related area. We are particularly interested in stu- of academic and professional accomplishments cations for a tenured/tenure-track position at the dents working on Machine Learning, algorithms, to support an appointment as a full professor assistant or associate professor level, starting Fall Natural Language Processing, Knowledge Repre- with tenure. 2016. Interested candidates with research focus sentation, HCI, Multimedia, Mobile Innovations, The Department has a tradition of excellence in one or more areas of system software, data search (systems or algorithms), collaborative in education and research, with many of our fac- science, high performance computing, or cloud filtering, auctions, mechanism design, linear ulty winning awards for outstanding teaching. computing are encouraged to apply. algebra, Systems or analysis of large data. Ideal Research activities are organized around four See http://www.cs.utsa.edu/fsearch for infor- candidates will have finished at least 3 years of main areas: (1) Biomedical Electronics and Sig- mation on the Department and application in- graduate work. nal Processing, (2) Energy and Electric Power structions. Screening of applications will begin Interns are expected to work with our scien- Systems and Technologies, (3) Nano/Micro- immediately. Full consideration will be given to tists to perform original research, apply scientific Electronics & Photonics, and (4) Cyber Systems applications received by January 4, 2016, and the thinking and techniques to improve the perfor- and Technologies. All four of these focus areas search will continue until the positions are filled mance and effectiveness of our products, and fall under the theme of “Reshaping the World or the search is closed. 
The University of Texas at solve problems for our users and advertisers by on All Scales.” The Department faculty members San Antonio is an Affirmative Action/Equal Op- analyzing mountains of data. They will have the have significant research collaborations with portunity Employer. opportunity to publish their work and expand the their colleagues from other engineering depart- Department of Computer Science horizons of web science. Candidates will need ments, the Center for Energy (http://www.en- RE: Faculty Search to submit a CV plus a letter of recommendation gineering.pitt.edu/cfe/), the Peterson Institute The University of Texas at San Antonio from their graduate advisor. for NanoScience and Engineering (http://www. One UTSA Circle Yahoo Inc. is an equal opportunity employer. nano.pitt.edu/), the School of Medicine, and de- San Antonio, TX 78249-0667 For more information or to search all of our open- partments from the Dietrich School of Arts and Phone: 210-458-4436 ings please visit http://labs.yahoo.com/careers.

DOI:10.1145/2833226
Leah Hoffmann

Q&A
Redefining Architectures
Mary Jane Irwin on building advanced circuits, special processors, and a hardware description language, while advocating for women in computer science.

MARY JANE IRWIN, Evan Pugh Professor and A. Robert Noll Chair in Engineering in the Department of Computer Science and Engineering at Pennsylvania State University, is as committed to her research as she is to serving the computer science community, in particular women in the field. Her interests and accomplishments span computer architecture, multicore systems design, and energy-aware design. She has also been active in the Computing Research Association’s Committee on the Status of Women in Computing Research (CRA-W), the Grace Hopper Celebration of Women in Computing, the Board on Army Science and Technology, and the National Academy of Engineering’s Membership Policy Committee. She has been deeply involved with ACM as well, co-founding the Journal on Emerging Technologies in Computing Systems and serving as an elected member of ACM’s Council, as vice president from 1997 to 1998, and as editor-in-chief of ACM’s Transactions on the Design Automation of Electronic Systems from 1998 to 2004.

You grew up in Memphis [Tennessee], where your dad was a professor at what was then Memphis State University. What drew you to computer science?
I was good at math, but it was early in the days of computing and there was no degree in computer science at Memphis State at the time. So I majored in math and took all the computing-related courses that they had. Then I went to graduate school at the University of Illinois at Urbana-Champaign (UIUC) to become a professor, although at the time I really had no idea what that entailed, other than college-level teaching. UIUC was the luck of the draw because my husband got a job there, so that’s the only place I applied. In that case, my draw was very lucky!

At UIUC, you worked with James E. Robertson, and you wrote your dissertation on computer arithmetic.
It’s a challenge to build arithmetic components so that they run as fast as possible without consuming a lot of power, and of course they’re a central part of what’s going on in the CPU. Jim Robertson was very famous in computer arithmetic. I was attracted to that research area, but I also think there was some empathy between us, because I was one of the very few women in UIUC’s computer science graduate program, and Jim was of American Indian descent. He’d spent time on the reservation growing up, and he knew what it was like to be from an underrepresented group in a majority population.

And you’ve been at Penn State ever since?
Next year I celebrate my 40th anniversary at Penn State. I used to tell my students, “I’ve been on the faculty longer than you’ve been alive.” Now I can say, “much longer than you’ve been alive.”

How did your research interests evolve?
I was also really interested in hardware design, in building advanced circuits and special processors. At Penn State, I had a number of strong grad students, one of whom completed his Ph.D., worked for IBM, and then came back as a faculty member. We had a very productive collaboration until he passed away.

You’re referring to Robert M. Owens, who died in 1997.
We started building projects and they were a wonderful experience. We built two different special-purpose signal processors, including custom chip design, board design, algorithm design, software design: all sorts of things that we didn’t know we were getting into when we started the projects.

The first, called the Arithmetic Cube, was a high-speed programmable VLSI processor for solving linear digital signal processing problems.
We designed and implemented a custom architecture that could run at the speed of the commercial parts that were available at the time, even though it was an academic project and not implemented in the most current technology node. It had custom CMOS chips that we had fabricated through MOSIS (Metal Oxide Semiconductor Implementation Service), a board-level design, and novel signal processing algorithms.

What we discovered along the way is that the design tools we needed to support the design process either didn’t exist, or existed only in companies that didn’t distribute them. So we had to build the design tools as well. We built the logic synthesis tools, a high-level synthesis tool, automatic cell-generation tools, and the software we needed to test the design. It turned out to be a much bigger project than we anticipated.

After that, you built a processor known as the MicroGrain Array Processor, or MGAP, designed to address complex signal and image processing problems.
We wanted to build a more programmable processor than the Arithmetic Cube. That led to the MGAP implementation, which was very FPGA-like, looking back at it, except that it only allowed nearest-neighbor communication. There were two generations of that. We used a lot of the design tools that we had built for the Arithmetic Cube, and of course we designed new tools.

You also came up with a new hardware description language: Gate LangUagE, or GLUE.
This was in the early days before there was Verilog or VHDL, so we came up with our own HDL which, looking back, was very much like a rudimentary version of Verilog. We got some things pretty close to right in that regard.

Several of the tools you developed over the course of your work on those projects were pioneering in their own right (for instance, the architectural-level power simulator that eventually became SimplePower).
At the end of the MGAP Project, we started working on an accurate power simulation tool. Until then, we just focused on performance and area, but we were beginning to realize, as had many, that power was going to be the next big problem. Bob passed away in the very early stages of its design, so I continued on that work with a new Penn State colleague, Vijay Narayanan, and once again a bunch of graduate students. Most of our work since then has been this simulation-based work. There’s been very little building.

Is that because of cost?
It’s really, really expensive these days to build custom hardware at the current or near-current technology node. You just can’t afford to do it at a university, so most architects do simulation-based research. Some design with FPGAs, and that’s great, but the number of people building custom hardware in universities has dwindled to almost nothing.

What are you working on now?
Right now, some colleagues and I are working on a grant that examines reducing the power consumption of the non-core components of a chip. There are a lot of people who are looking at how to make the cores more energy efficient. We’re looking at how to make the components that support the cores more energy efficient. We’re focusing now on the cache space: how much of the cache you really need to have turned on, especially when some of the cores are turned off, and how you manage it. But there are a lot of non-core components to work on.

Throughout your career, you’ve been involved in advocating on behalf of women in the field.
I’ve been involved with outreach to women both through ACM (ACM-W) and in particular through the Computing Research Association’s Committee on Women (CRA-W). And I still am involved in local campus activities and try to attend the Grace Hopper Celebration about every other year.

What’s changed?
Unfortunately, the numbers at the undergraduate level have not come up like we had hoped. There are some locations where they’ve improved (Harvey Mudd, for example, and Carnegie Mellon, though I haven’t seen their latest figures). But at many places, the number of undergraduate women in computer science and computer engineering is still in the low teens. My sense is that we do have more women in leadership positions.

At the graduate level, the numbers are better because we have women from offshore, which is great. Most of these women come to the U.S. because they want to get a Ph.D. If they’re going to be adding to the infrastructure either here or in their home country, it’s a great thing. But the number of native-born U.S. women going into computer science and engineering is still way too low. It’s a pipeline issue. So that’s always a little frustrating for those of us who have spent a lot of time working to change it.

Leah Hoffmann is a technology writer based in Piermont, NY.

© 2015 ACM 0001-0782/15/12 $15.00

Check out the new acmqueue app
FREE TO ACM MEMBERS

acmqueue is ACM’s magazine by and for practitioners, bridging the gap between academics and practitioners of computer science. After more than a decade of providing unique perspectives on how current and emerging technologies are being applied in the field, the new acmqueue has evolved into an interactive, socially networked, electronic magazine.

Broaden your knowledge with technical articles focusing on today’s problems affecting CS in practice, video interviews, roundtables, case studies, and lively columns.

Keep up with this fast-paced world on the go. Download the mobile app.

Desktop digital edition also available at queue.acm.org. Bimonthly issues free to ACM Professional Members. Annual subscription $19.99 for nonmembers.

Designing Interactive Systems

June 4 – 8

Brisbane Australia

dis2016.org bit.ly/dis16 @DIS2016