Software > Artificial Intelligence > Tech Careers > Wearables


FEBRUARY 2019 www.computer.org

IEEE COMPUTER SOCIETY computer.org • +1 714 821 8380

STAFF

Editor: Cathy Martin
Publications Operations Project Specialist: Christine Anthony
Publications Marketing Project Specialist: Meghan O’Dell
Production & Design: Carmen Flores-Garvey
Publications Portfolio Managers: Carrie Clark, Kimberly Sperka
Publisher: Robin Baldwin
Senior Advertising Coordinator: Debbie Sims

Circulation: ComputingEdge (ISSN 2469-7087) is published monthly by the IEEE Computer Society. IEEE Headquarters, Three Park Avenue, 17th Floor, New York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720; voice +1 714 821 8380; fax +1 714 821 4010; IEEE Computer Society Headquarters, 2001 L Street NW, Suite 700, Washington, DC 20036. Postmaster: Send address changes to ComputingEdge-IEEE Membership Processing Dept., 445 Hoes Lane, Piscataway, NJ 08855. Periodicals Postage Paid at New York, New York, and at additional mailing offices. Printed in USA. Editorial: Unless otherwise stated, bylined articles, as well as product and service descriptions, reflect the author’s or firm’s opinion. Inclusion in ComputingEdge does not necessarily constitute endorsement by the IEEE or the Computer Society. All submissions are subject to editing for style, clarity, and space. Reuse Rights and Reprint Permissions: Educational or personal use of this material is permitted without fee, provided such use: 1) is not made for profit; 2) includes this notice and a full citation to the original work on the first page of the copy; and 3) does not imply IEEE endorsement of any third-party products or services. Authors and their companies are permitted to post the accepted version of IEEE-copyrighted material on their own Web servers without permission, provided that the IEEE copyright notice and a full citation to the original work appear on the first screen of the posted copy. An accepted manuscript is a version which has been revised by the author to incorporate review suggestions, but not the published version with copy-editing, proofreading, and formatting added by IEEE. For more information, please go to: http://www.ieee.org/publications_standards/publications/rights/paperversionpolicy.html. 
Permission to reprint/republish this material for commercial, advertising, or promotional purposes or for creating new collective works for resale or redistribution must be obtained from IEEE by writing to the IEEE Intellectual Property Rights Office, 445 Hoes Lane, Piscataway, NJ 08854-4141 or [email protected]. Copyright © 2019 IEEE. All rights reserved. Abstracting and Library Use: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Unsubscribe: If you no longer wish to receive this ComputingEdge mailing, please email IEEE Computer Society Customer Service at [email protected] and type “unsubscribe ComputingEdge” in your subject line. IEEE prohibits discrimination, harassment, and bullying. For more information, visit www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

IEEE Computer Society Magazine Editors in Chief

Computer: David Alan Grier (Interim), Djaghe LLC
IEEE Software: Ipek Ozkaya, Software Engineering Institute
IEEE Internet Computing: George Pallis, University of Cyprus
IT Professional: Irena Bojanova, NIST
IEEE Security & Privacy: David Nicol, University of Illinois at Urbana-Champaign
IEEE Micro: Lizy Kurian John, University of Texas, Austin
IEEE Computer Graphics and Applications: Torsten Möller, University of Vienna
IEEE Pervasive Computing: Marc Langheinrich, University of Lugano
Computing in Science & Engineering: Jim X. Chen, George Mason University
IEEE Intelligent Systems: V.S. Subrahmanian, Dartmouth College
IEEE MultiMedia: Shu-Ching Chen, Florida International University
IEEE Annals of the History of Computing: Gerardo Con Diaz, University of California, Davis

FEBRUARY 2019 • VOLUME 5, NUMBER 2

Software

Managing Energy Consumption as an Architectural Quality Attribute
RICK KAZMAN, SERGE HAZIYEV, ANDRIY YAKUBA, AND DAMIAN A. TAMBURRI

Silver Bullet Talks with Ksenia Dmitrieva-Peguero
GARY MCGRAW

Artificial Intelligence

Software Engineering for Machine-Learning Applications: The Road Ahead
FOUTSE KHOMH, BRAM ADAMS, JINGHUI CHENG, MARIOS FOKAEFS, AND GIULIANO ANTONIOL

From Raw Data to Smart Manufacturing: AI and Semantic Web of Things for Industry 4.0
PANKESH PATEL, MUHAMMAD INTIZAR ALI, AND AMIT SHETH

Tech Careers

Artificial Intelligence and IT Professionals
SUNIL MITHAS, THOMAS KUDE, AND JONATHAN WHITAKER

A Different Lens on Diversity and Inclusion: Creating Research Opportunities for Small Liberal Arts Colleges
WENDI K. SAPP AND MARY ANN LEUNG

Wearables

P2PLoc: Peer-to-Peer Localization of Fast-Moving Entities
ASHUTOSH DHEKNE, UMBERTO J. RAVAIOLI, AND ROMIT ROY CHOUDHURY

Earables for Personal-Scale Behavior Analytics
FAHIM KAWSAR, CHULHONG MIN, AKHIL MATHUR, AND ALESSANDRO MONTANARI

Departments

Magazine Roundup
Editor’s Note: New Considerations in Software Development

Subscribe to ComputingEdge for free at www.computer.org/computingedge.

CS FOCUS

Magazine Roundup

The IEEE Computer Society’s lineup of 12 peer-reviewed technical magazines covers cutting-edge topics ranging from software design and computer graphics to Internet computing and security, from scientific applications and machine intelligence to visualization and microchip design. Here are highlights from recent issues.

Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE

Computer

Increasing Women and Underrepresented Minorities in Computing: The Landscape and What You Can Do
In this article from the October 2018 issue of Computer, two experts on increasing women and underrepresented minorities in computing discuss the current landscape in academia and industry, presenting steps that organizations (and you, the reader) can take to bring about changes to increase diversity and inclusion.

Computing in Science & Engineering

HPC Opens a New Frontier in Fuel-Engine Research
The authors of this article from the September/October 2018 issue of Computing in Science & Engineering discuss how an industry-led research team at the US Department of Energy’s Argonne National Laboratory recently conducted a computationally guided combustion system optimization on the IBM Blue Gene/Q Mira supercomputer. The team used a high-fidelity simulation approach to optimize the fuel spray and combustion bowl geometry of a heavy-duty diesel engine using a gasoline-like fuel. The accelerated simulation time allowed the team to evaluate an unprecedented number of design variations and improve the production design using a new fuel.

IEEE Annals of the History of Computing

Interview with Charles Bigelow
Charles Bigelow’s career parallels the development of digital font technology. He has designed fonts and consulted about font technology to many of the companies that created desktop publishing systems. He has also written extensively on digital font technology and taught at RISD, Stanford, and RIT. Read more in the July–September 2018 issue of IEEE Annals of the History of Computing.

IEEE Computer Graphics and Applications

OpenSpace: Bringing NASA Missions to the Public
This article from the September/October 2018 issue of IEEE Computer Graphics and Applications presents OpenSpace, an open-source astro-visualization software project designed to bridge the gap between scientific discoveries and their public dissemination. A wealth of data exists for space missions from NASA and other sources. OpenSpace brings together this data and combines it in a range of immersive settings. Through non-linear storytelling and guided exploration, interactive immersive experiences help the public to engage with advanced space mission data and models, and thus be better informed and educated about NASA missions, the solar system, and outer space. The authors demonstrate this capability by exploring the OSIRIS-REx mission.

IEEE Intelligent Systems

Towards Musicologist-Driven Mining of Handwritten Scores
Historical musicologists have been seeking objective and powerful techniques to collect, analyze, and verify their findings for many decades. The aim of this study, published in the July/August 2018 issue of IEEE Intelligent Systems, is to show the importance of such domain-specific problems to achieve actionable knowledge discovery in the real world. The focus is on finding evidence for the chronological ordering of J.S. Bach’s manuscripts by proposing a musicologist-driven mining method for extracting quantitative information from early music manuscripts. Bach’s C-clefs were extracted from a wide range of manuscripts under the direction of domain experts, and with these, the classification of C-clefs was conducted. The proposed methods were evaluated on a dataset containing over 1,000 clefs extracted from Bach’s manuscripts. The results show more than 70-percent accuracy for dating Bach’s manuscripts. Dating of Bach’s lost manuscripts was quantitatively hypothesized, providing a rough barometer to be combined with other evidence to evaluate musicologists’ hypotheses.

IEEE Internet Computing

Efficient Cloud Provisioning for Video Transcoding: Review, Open Challenges, and Future Opportunities
Video transcoding is the process of encoding an initial video sequence into multiple sequences of different bitrates, resolutions, and video standards, so that it can be viewed on devices of various capabilities and with various network access characteristics. Because video coding is a computationally expensive process and the amount of video in social-media networks drastically increases every year, large media providers’ demand for transcoding cloud services will continue rising. This article, which appears in the September/October 2018 issue of IEEE Internet Computing, surveys the state of the art of related cloud services. It also summarizes research on video transcoding and provides indicative results for a transcoding scenario of interest related to Facebook. Finally, it illustrates open challenges in the field and outlines paths for future research.

IEEE Micro

Not in Name Alone: A Memristive Memory Processing Unit for Real In-Memory Processing
Data movement between processing and memory is the root cause of the limited performance and energy efficiency in modern von Neumann systems. To overcome the data-movement bottleneck, the authors of this article from the September/October 2018 issue of IEEE Micro present the memristive Memory Processing Unit (mMPU), a real processing-in-memory system in which the computation is done directly in the memory cells, thus eliminating the necessity for data transfer. Furthermore, with its enormous inner parallelism, this system is ideal for data-intensive applications that are based on single instruction, multiple data (SIMD), providing high throughput and energy efficiency.

IEEE MultiMedia

A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking
With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous representation spaces and deep-learning methods are becoming of great interest. Multimodal representations are typically obtained with auto-encoders that reconstruct multimodal data. In this article from the April–June 2018 issue of IEEE MultiMedia, the authors describe an alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN). Using the lessons learned from multimodal retrieval, they present a BiDNN-based system that performs video hyperlinking and recommends interesting video segments to a viewer. Results established using TRECVID’s 2016 video hyperlinking benchmarking initiative show that their method obtained the best score, thus defining the state of the art.

IEEE Pervasive Computing

Teaching Pervasive Computing in Liberal Arts Colleges
In this article from the July–September 2018 issue of IEEE Pervasive Computing, the authors reflect on the critical role of liberal arts education in fostering creative, collaborative, and ethical innovators for pervasive computing. They discuss why liberal arts education is important as a foundation for advanced studies and leadership in ubiquitous computing, and they share their experiences teaching pervasive computing in liberal arts colleges.

IEEE Security & Privacy

Fingerprinting for Cyber-Physical System Security: Device Physics Matters Too
Due to the increasing attacks against cyber-physical systems, it is important to develop novel solutions to secure these critical systems. System security can be improved by using the physics of process actuators (that is, devices). Device physics can be used to generate device fingerprints to increase the integrity of responses from process actuators. Read more in the September/October 2018 issue of IEEE Security & Privacy.

IEEE Software

Software Engineering’s Top Topics, Trends, and Researchers
This article, which is part of the September/October 2018 issue of IEEE Software on software engineering’s 50th anniversary, offers an overview of the twists, turns, and numerous redirections seen over the years in the software engineering (SE) research literature. Nearly a dozen topics have dominated the past few decades of SE research, and these have been redirected many times. Some are gaining popularity, whereas others are becoming increasingly rare.

IT Professional

A Cognitive Assistance Framework for Supporting Human Workers in Industrial Tasks
Cognitive systems are capable of human-like actions such as perception, learning, planning, reasoning, self- and context-awareness, interaction, and performing actions in unstructured environments. The authors of this article from the September/October 2018 issue of IT Professional present an implemented framework for an interactive cognitive system enabling human-centered, adaptive industrial assistance in cooperative, complex human-in-the-loop assembly tasks. The functionality of the cognitive system includes enabling perception and awareness, understanding and interpreting situations, reasoning, decision making, and autonomous acting.

ADVERTISER INFORMATION

Advertising Personnel

Debbie Sims: Advertising Coordinator
Email: [email protected]
Phone: +1 714 816 2138 | Fax: +1 714 821 4010

Advertising Sales Representatives (display)

Central, Northwest, Southeast, Far East: Eric Kincaid
Email: [email protected]
Phone: +1 214 673 3742
Fax: +1 888 886 8599

Northeast, Midwest, Europe, Middle East: David Schissler
Email: [email protected]
Phone: +1 508 394 4026
Fax: +1 508 394 1707

Southwest, California: Mike Hughes
Email: [email protected]
Phone: +1 805 529 6790

Advertising Sales Representative (Classifieds & Jobs Board)

Heather Buonadies
Email: [email protected]
Phone: +1 201 887 1703

Advertising Sales Representative (Jobs Board)

Marie Thompson
Email: [email protected]
Phone: 714-813-5094

EDITOR’S NOTE

New Considerations in Software Development

Software development has traditionally focused on highly visible qualities such as performance reliability and user-friendly design. However, as computing becomes more ubiquitous and central to society, software engineers have been recognizing the need to address new considerations, such as energy consumption and security, that are often less noticeable to the user but are no less important.

In this issue of ComputingEdge, IEEE Software’s “Managing Energy Consumption as an Architectural Quality Attribute” argues that energy should be treated as one of the key considerations in architectural design, along with modifiability, performance, and availability. The authors use a case study to explain how measuring and modeling energy use can lead to insights on ways to save energy. IEEE Security & Privacy’s “Silver Bullet Talks with Ksenia Dmitrieva-Peguero” discusses how awareness of secure coding practices has grown and what still needs to happen to make software more secure.

Artificial intelligence (AI) is also a concern for today’s software engineers. “Software Engineering for Machine-Learning Applications: The Road Ahead,” from IEEE Software, covers the inherent challenges of machine learning–based systems and ideas for making them more accurate, testable, and applicable. Meanwhile, the authors of IEEE Intelligent Systems’ “From Raw Data to Smart Manufacturing: AI and Semantic Web of Things for Industry 4.0” show how AI is helping revolutionize our factories.

AI is also having a profound effect on software engineering jobs, potentially changing the types of roles humans have in the software development process. IT Professional’s “Artificial Intelligence and IT Professionals” predicts that low-level programming tasks will increasingly be performed by AI. Another concern when it comes to careers in technology is creating equal opportunities for students to participate in research. In Computing in Science & Engineering’s “A Different Lens on Diversity and Inclusion: Creating Research Opportunities for Small Liberal Arts Colleges,” the authors discuss programs that help small colleges collaborate with large research institutions.

Finally, this issue of ComputingEdge includes two articles about wearables. Computer’s “P2PLoc: Peer-to-Peer Localization of Fast-Moving Entities” proposes wearable Internet of Things devices for real-time group-motion tracking. IEEE Pervasive Computing’s “Earables for Personal-Scale Behavior Analytics” presents an in-ear multisensory stereo device with applications in areas like healthcare and communication.

PURPOSE: The IEEE Computer Society is the world’s largest association of computing professionals and is the leading provider of technical information in the field.

MEMBERSHIP: Members receive the monthly magazine Computer, discounts, and opportunities to serve (all activities are led by volunteer members). Membership is open to all IEEE members, affiliate society members, and others interested in the computer field.

COMPUTER SOCIETY WEBSITE: www.computer.org

OMBUDSMAN: Direct unresolved complaints to ombudsman@computer.org.

CHAPTERS: Regular and student chapters worldwide provide the opportunity to interact with colleagues, hear technical experts, and serve the local professional community.

AVAILABLE INFORMATION: To check membership status, report an address change, or obtain more information on any of the following, email Customer Service at [email protected] or call +1 714 821 8380 (international) or our toll-free number, +1 800 272 6657 (US):
• Membership applications
• Publications catalog
• Draft standards and order forms
• Technical committee list
• Technical committee application
• Chapter start-up procedures
• Student scholarship information
• Volunteer leaders/staff directory
• IEEE senior member grade application (requires 10 years practice and significant performance in five of those 10)

PUBLICATIONS AND ACTIVITIES

Computer: The flagship publication of the IEEE Computer Society, Computer, publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications.
Periodicals: The society publishes 12 magazines, 15 transactions, and two letters. Refer to membership application or request information as noted above.
Conference Proceedings & Books: Conference Publishing Services publishes more than 275 titles every year.
Standards Working Groups: More than 150 groups produce IEEE standards used throughout the world.
Technical Committees: TCs provide professional interaction in more than 30 technical areas and directly influence computer engineering conferences and publications.
Conferences/Education: The society holds about 200 conferences each year and sponsors many educational activities, including computing science accreditation.
Certifications: The society offers three software developer credentials. For more information, visit www.computer.org/certification.

2019 BOARD OF GOVERNORS MEETINGS

31 January – 1 February: Sheraton Anaheim Hotel, Anaheim, CA
6 – 7 June: Hyatt Regency Coral Gables, Miami, FL
(TBD) November: Teleconference

EXECUTIVE COMMITTEE

President: Cecilia Metra
President-Elect: Leila De Floriani; Past President: Hironori Kasahara; First VP: Forrest Shull; Second VP: Avi Mendelson; Secretary: David Lomet; Treasurer: Dimitrios Serpanos; VP, Member & Geographic Activities: Yervant Zorian; VP, Professional & Educational Activities: Kunio Uchiyama; VP, Publications: Fabrizio Lombardi; VP, Standards Activities: Riccardo Mariani; VP, Technical & Conference Activities: William D. Gropp
2018–2019 IEEE Division V Director: John W. Walz
2019 IEEE Division V Director Elect: Thomas M. Conte
2019–2020 IEEE Division VIII Director: Elizabeth L. Burd

BOARD OF GOVERNORS

Term Expiring 2019: Saurabh Bagchi, Leila De Floriani, David S. Ebert, Jill I. Gostin, William Gropp, Sumi Helal, Avi Mendelson
Term Expiring 2020: Andy Chen, John D. Johnson, Sy-Yen Kuo, David Lomet, Dimitrios Serpanos, Forrest Shull, Hayato Yamana
Term Expiring 2021: M. Brian Blake, Fred Douglis, Carlos E. Jimenez-Gomez, Ramalatha Marimuthu, Erik Jan Marinissen, Kunio Uchiyama

EXECUTIVE STAFF

Executive Director: Melissa Russell
Director, Governance & Associate Executive Director: Anne Marie Kelly
Director, Finance & Accounting: Sunny Hwang
Director, Information Technology & Services: Sumit Kacker
Director, Marketing & Sales: Michelle Tubb
Director, Membership Development: Eric Berkowitz

COMPUTER SOCIETY OFFICES

Washington, D.C.: 2001 L St., Ste. 700, Washington, D.C. 20036-4928 • Phone: +1 202 371 0101 • Fax: +1 202 728 9614 • Email: [email protected]
Los Alamitos: 10662 Los Vaqueros Cir., Los Alamitos, CA 90720 • Phone: +1 714 821 8380 • Email: [email protected]
Asia/Pacific: Watanabe Building, 1-4-2 Minami-Aoyama, Minato-ku, Tokyo 107-0062, Japan • Phone: +81 3 3408 3118 • Fax: +81 3 3408 3553 • Email: [email protected]

MEMBERSHIP & PUBLICATION ORDERS

Phone: +1 800 272 6657 • Fax: +1 714 821 4641 • Email: [email protected]

IEEE BOARD OF DIRECTORS

President & CEO: Jose M.D. Moura
President-Elect: Toshio Fukuda
Past President: James A. Jefferies
Secretary: Kathleen Kramer
Treasurer: Joseph V. Lillie
Director & President, IEEE-USA: Thomas M. Coughlin
Director & President, Standards Association: Robert S. Fish
Director & VP, Educational Activities: Witold M. Kinsner
Director & VP, Membership and Geographic Activities: Francis B. Grosz, Jr.
Director & VP, Publication Services & Products: Hulya Kirkici
Director & VP, Technical Activities: K.J. Ray Liu

revised 5 December 2018

THE PRAGMATIC ARCHITECT
Editor: Eoin Woods, Endava, [email protected]

Managing Energy Consumption as an Architectural Quality Attribute

Rick Kazman, Serge Haziyev, Andriy Yakuba, and Damian A. Tamburri

IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY 0740-7459/18/$33.00 © 2018 IEEE

Energy used to be free, or so we thought. In the past, software architects rarely considered software’s energy consumption. Those days are gone. With mobile devices as the primary form of computing for most people,¹ with the increasing industry and government adoption of the IoT, and with the ubiquity of cloud services as the backbone of our computing infrastructure, energy has become an issue architects can no longer ignore. Energy is no longer “free” and unlimited. Mobile devices’ energy efficiency affects us all, and large corporations are increasingly concerned with their server farms’ energy efficiency. Forbes reported that in 2016, datacenters globally accounted for more energy consumption (by 40 percent) than the entire UK, about 3 percent of all energy consumed worldwide.²

At both the low and high ends, computational devices’ energy consumption has become a crucial concern. This means that we, as architects, now must add energy efficiency to the long list of competing qualities we consider when designing a system. And, like every other quality attribute, it involves nontrivial tradeoffs: energy use versus performance, availability, modifiability, security, or time to market.

We can’t hope to address all these concerns here, but we want to make two important general points:

• Treating energy efficiency as a quality attribute is no different from treating any other architectural quality.
• We can, with a small effort in experimentation and prototyping, and small design changes, substantially improve an application’s energy use.

Both of these points are good news for architects! Their most immediate consequence is that we can reason about energy consumption architecturally. And, by making a relatively small investment in design experiments and design changes, we’re repaid by enormous savings in energy use.

To illustrate these two points, we report here on a small case study we performed on the design of an IoT application: an automated weather station. The station reports telemetry from sensors related to the ambient temperature, humidity, wind speed, rainfall, leaf wetness, soil temperature, and so on. All the collected data is processed and used to help farmers achieve more efficient plant growth.

In this domain, energy efficiency is paramount: these weather stations are left unattended for long periods of time, might be snow-covered, and need to work in low-light conditions for weeks on end. The automated weather station we describe here was initially designed with attention to energy efficiency, but without explicitly modeling or measuring energy use. In the following, we describe our experiments, the design changes they helped motivate, their tradeoffs, and their energy savings.

The Experiments

The automated weather station is connected to a 12 V DC power source. Every 15 minutes it wakes up from a low-power deep-sleep mode and reports telemetry to the Azure cloud using a 3G connection. After that, it suspends all running tasks and falls back into deep-sleep mode.

Between the DC power supply and the weather station, we inserted a current meter based on the MAX471 current-sensing amplifier. (For the MAX471 data sheet, see http://html.alldatasheet.com/html-pdf/73441/MAXIM/MAX471/125/1/MAX471.html.) This meter was connected to an analog sense pin of an AVR MCU (microcontroller). The MCU registered readings every 1 ms; after 100 readings, it sent the average value over a UART (universal asynchronous receiver–transmitter) to a PC. A simple logging application on the other side of the wire converted the UART input into a CSV (comma-separated values) file. A simple Python script then calculated energy consumption on the basis of the CSV file and plotted a graph.

To calculate energy consumption, we used these formulas:

E = P * T,

where P represents power and T represents time, and

P = I * V

for DC, where I represents current and V represents voltage.

Because voltage is constant in these devices, we assumed that the power consumed depended fully on the current draw and time. In all experiments, we measured the total energy consumption in watt-hours (Wh).

To simplify measurements and further calculations, we split the timeline for the automated weather system into two main components: work mode and sleep mode. Figure 1 depicts a sample plot of work mode, showing the power (current) consumed over 309 seconds. The average current was 0.0295 A, and the energy consumption was 0.03048 Wh.

FIGURE 1. A sample plot of work mode for an automated weather station (current in A versus time in s). The average current was 0.0295 A, and the energy consumption was 0.03048 Wh.

Sleep mode consumed little energy, as we expected. The power consumption was nearly zero, with a watchdog process causing minor spikes of energy consumption. For example, for a sleep mode of 490 s, the average current was 0.001013 A, and the energy consumption was 0.0016538 Wh. Thus, sleep mode consumed roughly 3 percent as much energy as work mode.

On the basis of these measurements and some small simplifying assumptions, we estimated the energy consumption for 1 hour (assuming our typical sleep interval of 15 minutes):

• Work + Sleep Duration = 309/60 + 15 = 20.15 min
• Cycles (C) per h = 60/20.15 = 2.9776
• Average Power Consumption per h = (E_work + E_sleep) * C = (0.03048 + 0.0002025 * 15) * C = 0.0335175 * 2.9776 = 0.0998 Wh

Given this baseline, we now describe our experiments.

Experiment 1

In this experiment, we changed the telemetry messages’ payload format from plaintext to Google protocol buffers. The assumption was that the reduced message length should result in less time to send data and fewer chances to fail. However, this came with a tradeoff: increased memory and CPU use to transform messages to and from the protocol-buffer format.

Table 1 shows the kinds of messages being sent and their sizes, using plaintext and protocol buffers.

Table 1. For experiment 1, the kinds of messages being sent and their sizes (in bytes), using plaintext and Google protocol buffers.

                         Plaintext                            Protocol buffers
Message                  Header   Payload   Total             Header   Payload   Total
Status                   375      822       1,197             387      275       662
Current telemetry        374      188       562               399      40        439
Historical telemetry*    384      460       844 (5 records)   402      400       802 (10 records)

* Historical telemetry was sent in batches. In the plaintext version, the batch contained five records. With protocol buffers, we could pack 10 records into a batch message, saving 42 bytes.
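The size comparison in Table 1 can be checked with a few lines of Python. This is only an illustrative sketch: the byte counts are copied from the table and its footnote, while the dictionary layout and the function names (`total_bytes`, `reduction_percent`, `historical_bytes_per_record`) are hypothetical and not part of the weather station's code.

```python
# Sizes in bytes copied from Table 1: (header, payload) per message kind.
PLAINTEXT = {
    "status": (375, 822),
    "current_telemetry": (374, 188),
    "historical_telemetry": (384, 460),
}
PROTOBUF = {
    "status": (387, 275),
    "current_telemetry": (399, 40),
    "historical_telemetry": (402, 400),
}
# Records carried by one historical batch, from the Table 1 footnote.
HISTORICAL_RECORDS = {"plaintext": 5, "protobuf": 10}


def total_bytes(sizes, kind):
    """Header plus payload for one message of the given kind."""
    header, payload = sizes[kind]
    return header + payload


def reduction_percent(kind):
    """Relative size reduction when moving from plaintext to protocol buffers."""
    before = total_bytes(PLAINTEXT, kind)
    after = total_bytes(PROTOBUF, kind)
    return 100.0 * (before - after) / before


def historical_bytes_per_record(sizes, records):
    """Total historical batch size divided by the number of records it carries."""
    return total_bytes(sizes, "historical_telemetry") / records


for kind in PLAINTEXT:
    print(f"{kind}: {total_bytes(PLAINTEXT, kind)} -> "
          f"{total_bytes(PROTOBUF, kind)} bytes "
          f"({reduction_percent(kind):.1f}% smaller)")
```

Seen this way, the headers grow slightly with protocol buffers while the payloads shrink, and the historical batch drops from 168.8 to 80.2 bytes per record because twice as many records fit into a slightly smaller message.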

[Figure 2 plot: current (A), from 0.00 to 0.06, versus time (s), from 0 to 300.]

FIGURE 2. Energy use during message transmission, using protocol buffers. Using the buffers improved energy consumption by 8 percent.
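The baseline estimate described earlier (E = P * T, P = I * V, and the per-hour calculation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' script: the function names are hypothetical, the 12 V supply is taken from the article, and the constant 0.0002025 is interpreted here as sleep-mode energy per minute (0.0016538 Wh over 490 s), which is how the article's per-hour formula appears to use it.

```python
SUPPLY_VOLTAGE_V = 12.0        # 12 V DC source, as stated in the article
SLEEP_WH_PER_MIN = 0.0002025   # assumed meaning: sleep energy per minute


def energy_wh(avg_current_a, duration_s, voltage_v=SUPPLY_VOLTAGE_V):
    """E = P * T with P = I * V, converted from watt-seconds to watt-hours."""
    power_w = avg_current_a * voltage_v
    return power_w * duration_s / 3600.0


def hourly_energy_wh(work_energy_wh, work_s, sleep_min,
                     sleep_wh_per_min=SLEEP_WH_PER_MIN):
    """Energy per hour for repeated work/sleep cycles.

    One cycle is a work phase of work_s seconds followed by sleep_min
    minutes of sleep; the cycle energy is scaled by cycles per hour.
    """
    cycle_min = work_s / 60.0 + sleep_min
    cycles_per_hour = 60.0 / cycle_min
    cycle_energy_wh = work_energy_wh + sleep_wh_per_min * sleep_min
    return cycle_energy_wh * cycles_per_hour


# Work mode: 0.0295 A average over 309 s (article reports 0.03048 Wh).
work_estimate = energy_wh(0.0295, 309)

# Baseline: measured work energy of 0.03048 Wh and a 15-minute sleep interval.
baseline = hourly_energy_wh(0.03048, 309, 15)

print(f"work-mode energy from average current: {work_estimate:.5f} Wh")
print(f"baseline consumption per hour: {baseline:.4f} Wh")
```

Computing the work-mode energy from the average current gives a value close to, but not identical with, the reported 0.03048 Wh, which is expected when the original figure comes from integrating individual readings rather than from the rounded average.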

But did this change save energy? Sending these messages using protocol buffers required 293 s, with an average current of 0.027362 A (see Figure 2). The power consumption was 0.0267519 Wh. Using the previous formula to calculate the average energy consumption per hour, we got

• Work + Sleep Duration = 293/60 + 15 = 19.883 min
• C per h = 60/19.883 = 3.017
• Average Power Consumption per h = (0.027362 + 0.0002025 * 15) * C = 0.0303995 * 3.017 = 0.0917 Wh

The difference was 0.0998 - 0.0917 = 0.0081 Wh (8 percent). Although 8 percent wasn't a huge improvement, it was nontrivial. Given protocol buffers' other advantages (they describe data using an interface description language and generate the code to handle it), this was clearly a win. But given that this energy savings wasn't huge, the lesson for architects is that tradeoffs among CPU time, memory, and message length must be assessed empirically.

Furthermore, while taking these measurements and observing power consumption graphs, we noticed a strange idle period, just before the device went into sleep mode (see the area in the green rectangle in Figure 2). Debugging and studying this issue led us to discover a bug in EEPROM persistence functionality and the peripheral-device-scanning logic. The bug's details aren't important for this research; what's important is that we didn't notice this problem until we measured and visualized the power consumption. Because the device had been functioning as expected, it wasn't obvious that there was a bug until we measured the energy.

Following are the new measurements of device power consumption.


By fixing this bug, we decreased the time required to perform all duties during the polling cycle by almost two-thirds, from 293 to 99 s (see Figure 3). The energy consumption was 0.0116034 Wh. Using the previous values, we recalculated the total power consumption:

• Work + Sleep Duration = 99/60 + 15 = 16.65 min
• C per h = 60/16.65 = 3.603
• Average Power Consumption per h = (0.0116034 + 0.0002025 * 15) * C = 0.014641 * 3.603 = 0.0527 Wh

The difference, compared to this experiment's original results, was 0.0917 - 0.0527 = 0.039 Wh (42 percent). The lesson is that you can't manage it if you don't measure it! By measuring energy consumption, we were able to increase our understanding of the inner workings of the automated-weather-station application.

[Figure 3 plot: current (A), from 0.00 to 0.10, versus time (s), from 0 to 100.]

FIGURE 3. Energy use during message transmission after a bug fix. The difference, compared to this experiment's original results, was a 42 percent improvement.

Experiment 2

This experiment aimed to reduce the number of polling cycles, leading to longer sleep times with significantly less power consumption. The tradeoff here is that more data would be sent in fewer batches. This could have the effect of increasing the time and power needed to send a batch, and it also introduced the need for a send-fail-retry mechanism in case of message failure.

As we discovered from experiment 1, the weather station could send up to 10 historical records in a single batch. A single batch had the energy profile shown in Figure 4, consuming 0.00127 Wh over approximately 10.5 s.

[Figure 4 plot: current (A), from 0.025 to 0.060, versus time (s), from 8 to 18.]

FIGURE 4. The energy profile of a single batch transmission. On the basis of this profile, we could estimate the energy use.

On the basis of this profile, we could now estimate the energy use. That is, given our measurements, we could now form a model of the quality attribute that was no different from a model of performance or availability. For simplicity's sake, we chose a polling interval of 1 hour. So, the historical data would be sent in batches of four records (one record each 15 min). The predicted power consumption was

• Work Time = 99 s + 10.5 s ≈ 110 s
• Consumed Power during Awake = 0.0116034 + 0.00127 = 0.012874 Wh
• Sleep Time = 3,600 s = 60 min
• Consumed Power during Sleep Time = 60 * 0.0002025 = 0.01215 Wh
• C per h = 3,600/(110 + 3,600) = 0.97
• Estimated Average Power Consumption per h = (Ew + Es) * C = (0.012874 + 0.01215) * 0.97 = 0.02427 Wh

For this experiment, we set up a weather station with a polling interval of 3,600 seconds and ran it for 8,225 seconds (a little over two hours); see Figure 5. The average current draw was 0.0011344 A, and the total power consumption was 0.0312867 Wh. So, we calculated the average power consumption per hour as

E/(Duration/3,600) = 0.0312867/(8,225/3,600) = 0.0312867/2.2847 = 0.013694 Wh

[Figure 5 plot: current (A), from 0.00 to 0.10, versus time (s), from 0 to 8,000.]

FIGURE 5. The energy profile of a weather station running for several hours with a polling interval of 3,600 seconds. Most of the sleep time was recorded as 0 A, meaning that the actual energy consumption was close to the meter's tolerances.

Comparing the results with our estimated consumption of 0.02427 Wh, we realized that there was some error in our calculations or our model. Taking a closer look at the raw recording, we noticed that most of the sleep time was recorded as 0 A, meaning that the actual energy consumption was close to the meter's tolerances. So, we were unable to measure such low current consumption precisely in real time, and could only trust sleep mode measurements made over long runs.

This insight highlights a new lesson: the need to build both architectural models and prototypes. On one hand, models let us efficiently reason about architectural quality attributes and their tradeoffs. On the other hand, prototypes mandate empirical testing, thus incrementally refining the assumptions built into both models and prototypes.

The difference in power consumption from Experiment 1 was 0.0527 - 0.013694 = 0.039006 Wh (42 percent). The difference from the original model (before we improved the design) was 0.0998 - 0.013694 = 0.086106 Wh (86 percent). Table 2 summarizes these differences.

Table 2. The differences between the experiments.

Setup          Description                                        Consumption per hour (Wh)   Total energy savings (%)
Original       Plaintext payload and a 15-min polling interval    0.0998                      0
Experiment 1   Binary format                                      0.0917                      8
               Binary format + bug fix                            0.0527                      47
Experiment 2   A polling interval of 1 h                          0.0137                      86
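The Experiment 2 comparison between the model and the measurement, and the savings column of Table 2, can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, the inputs are the figures reported above, and 0.0002025 is again assumed to be sleep energy per minute. The sketch uses the unrounded work time (109.5 s) and cycle count, so its estimate differs slightly from the rounded 0.02427 Wh.

```python
SLEEP_WH_PER_MIN = 0.0002025   # assumed meaning: sleep energy per minute
BASELINE_WH_PER_H = 0.0998     # original design, from Table 2


def hourly_energy_wh(work_energy_wh, work_s, sleep_min,
                     sleep_wh_per_min=SLEEP_WH_PER_MIN):
    """Modeled energy per hour for repeated work/sleep cycles."""
    cycle_min = work_s / 60.0 + sleep_min
    cycles_per_hour = 60.0 / cycle_min
    return (work_energy_wh + sleep_wh_per_min * sleep_min) * cycles_per_hour


def measured_hourly_wh(total_energy_wh, duration_s):
    """Average energy per hour from a long measurement run."""
    return total_energy_wh / (duration_s / 3600.0)


def savings_percent(wh_per_hour, baseline=BASELINE_WH_PER_H):
    """Savings relative to the original design, as in Table 2."""
    return 100.0 * (baseline - wh_per_hour) / baseline


# Model: polling-cycle work (0.0116034 Wh, 99 s) plus one batch
# transmission (0.00127 Wh, 10.5 s), then 60 minutes of sleep.
predicted = hourly_energy_wh(0.0116034 + 0.00127, 99 + 10.5, 60)

# Measurement: 0.0312867 Wh recorded over an 8,225 s run.
measured = measured_hourly_wh(0.0312867, 8225)

print(f"predicted: {predicted:.5f} Wh/h, measured: {measured:.6f} Wh/h")
for label, value in [("binary format", 0.0917),
                     ("binary format + bug fix", 0.0527),
                     ("1 h polling interval", 0.0137)]:
    print(f"{label}: {savings_percent(value):.0f}% savings")
```

The gap between the predicted and measured values is the discrepancy the text attributes to sleep-mode currents near the meter's tolerance; the savings loop reproduces the 8, 47, and 86 percent entries of Table 2.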

As you can see, with some modest changes to the design, we reaped enormous energy savings, which would greatly increase the robustness of the IoT devices and the system as a whole. Despite operating in sometimes adverse conditions, these systems must never lose data, even if they're kept offline for 24 hours. With this 86 percent energy improvement, we could now meet this requirement.

As computing moves toward greater scale and mobility, energy use will inevitably become a key concern for architects. We hope we've convinced you that energy can be treated like any other architectural quality attribute. It's no different, from the perspective of architectural design, than modifiability, performance, or availability. It can be modeled and prototyped, and we can reason about the design tradeoffs required to achieve better energy use. Of course, the design primitives, models, tools, and tradeoffs are specific to energy use, but the fundamental principles and reasoning methods for architectural design don't change.

As we said before, you can't manage what you don't measure. So, architects need to begin thinking about making their software more energy aware, monitoring it and adapting it to environmental or application conditions.

References
1. T. Bindi, "Mobile and Tablet Internet Usage Surpasses Desktop for First Time: StatCounter," ZDNet, 2 Nov. 2016; https://www.zdnet.com/article/mobile-and-tablet-internet-usage-exceeds-desktop-for-first-time-statcounter.
2. R. Danilak, "Why Energy Is a Big and Rapidly Growing Problem for Data Centers," Forbes, 15 Dec. 2017; https://www.forbes.com/sites/forbestechcouncil/2017/12/15/why-energy-is-a-big-and-rapidly-growing-problem-for-data-centers/#65451c435a30.

ABOUT THE AUTHORS

RICK KAZMAN is a professor of information technology management at the University of Hawai'i at Mānoa and a researcher at Carnegie Mellon University's Software Engineering Institute. Contact him at [email protected].

SERGE HAZIYEV is the senior vice president of SoftServe's Advanced Technology Group. Contact him at shaziyev@softserveinc.com.

ANDRIY YAKUBA is a senior IoT engineer at SoftServe. Contact him at [email protected].

DAMIAN A. TAMBURRI is an assistant professor in the Jheronimus Academy of Data Science and Technical University of Eindhoven. Contact him at [email protected].

This article originally appeared in IEEE Software, vol. 35, no. 5, 2018.

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING
SUBMIT TODAY

SCOPE

The IEEE Transactions on Sustainable Computing (T-SUSC) is a peer-reviewed journal devoted to publishing high-quality papers that explore the different aspects of sustainable computing. The notion of sustainability is one of the core areas in computing today and can cover a wide range of problem domains and technologies ranging from software to hardware designs to application domains. Sustainability (e.g., energy efficiency, natural resources preservation, using multiple energy sources) is needed in computing devices and infrastructure and has grown to be a major limitation to usability and performance.

Contributions to T-SUSC must address sustainability problems in different computing and information processing environments and technologies, and at different levels of the computational process. These problems can be related to information processing, integration, utilization, aggregation, and generation. Solutions for these problems can call upon a wide range of algorithmic and computational frameworks, such as optimization, machine learning, dynamical systems, prediction and control, decision support systems, meta-heuristics, and game-theory to name a few.

T-SUSC covers pure research and applications within novel scope related to sustainable computing, such as computational devices, storage organization, data transfer, software and information processing, and efficient algorithmic information distribution/processing. Articles dealing with hardware/software implementations, new architectures, modeling and simulation, mathematical models and designs that target sustainable computing problems are encouraged.

SUBSCRIBE AND SUBMIT
For more information on paper submission, featured articles, calls for papers, and subscription links visit: www.computer.org/tsusc

INTERVIEW
Editor: Gary McGraw, [email protected]

Silver Bullet Talks with Ksenia Dmitrieva-Peguero

Gary McGraw | Synopsys

Hear the full podcast at www.computer.org/silverbullet. Show links, notes, and an online discussion can be found at www.cigital.com/silverbullet.

Ksenia Dmitrieva-Peguero is a principal consultant in Synopsys' Software Integrity Group. She has many years of hands-on experience in software and systems security and is an expert in many practices including penetration testing, static analysis tool design and execution, customization and deployment, and threat modeling. Throughout her career as a consultant, Dmitrieva-Peguero has established and evolved secure coding guidance and best practices for many different firms and has delivered numerous software security training sessions. She's passionate about cutting-edge web technologies and probing systems security, and she speaks regularly around the world on topics such as HTML5, CSP, and JavaScript.

2469-7087/19/$33.00 © 2019 IEEE. Published by the IEEE Computer Society.
1540-7993/17/$33.00 © 2017 IEEE. Copublished by the IEEE Computer and Reliability Societies.

You've been doing hands-on software security in the real world for many years. How has the field evolved since you started doing consulting seven years ago?
Seven years ago, a lot of people didn't know what we were talking about when we talked about security. We would look for people to hire with programming experience, and we didn't care about security experience. You would learn everything on the job. These days, there are degrees in security, and we expect candidates to know security basics and have some experience. When you talk to clients, they know what software security is. They've definitely heard of penetration testing and static analysis. Today, the awareness is much higher.

Do you think we've made progress as a field in those seven years, beyond raising awareness?
There are still problems, but I think we're making progress. The field has definitely grown. There's more demand and more understanding of the kinds of problems we're trying to solve, of the difference between software security and all the other things that have to do with IT and security in a company. In terms of quality, I'm not sure.

Maybe the quality remained constant?
There's more software and therefore more bugs, and there are more things to be attacked. Now we have software in mobile devices and cars and everywhere else. So, it might look like there are more security issues, but it's probably also society being more aware of the issues. It's hard to say if we've made any qualitative progress.

If you could fix any practice area in software security today, which one would it be?
Probably threat modeling. Architecture review follows because that's the more complicated one, and people have very different descriptions and understanding of what it is. Everybody has their own version of threat modeling and risk analysis.

It might be because we can build a program to go through our code and look for bugs, but we can't build a similar system to go through a design and look for flaws.
Absolutely. There's less tooling because the process is much more involved. Maybe with the development of artificial intelligence we'll be better able to develop some sort of automation for understanding design problems.

Frameworks like Angular, Node, and Express have been popping up all over the place. In your view, do all frameworks have the same security posture?
Definitely not. All frameworks have the problem of building security in and how much each framework developer or maintainer wants to make security a part of it. Unfortunately, these days that's not the highest priority for the framework creators. Very few even bother with it. Then developers are left to solve all the security problems themselves while they're using the framework. Instead of having the features built in, they have to reinvent the wheel every single time.
On one hand, security should be the responsibility of the framework creators and maintainers. On the other hand, the developers using the framework should demand it. When they choose a framework, they should ask about its security posture. I don't see any questions on forums like Stack Overflow saying, "Hey, which framework is more secure?" Usually you see a question like, "Hey, which framework has more visual components, features, or is faster?"

So, the developers building the frameworks don't really deal with security, and the developers using the frameworks can screw everything up from a security perspective.
Right. For validation, developers might write their own routine to validate email addresses, and that will most likely be code copied from Stack Overflow or another GitHub repository where the quality isn't guaranteed. Depending on the developer, the framework, and how well the library is integrated with the framework, they might be using a third-party library, which is often a better solution: a little more validated, updated, and secure. But in both cases, they have opportunities to screw things up because even if you use a third-party library or a plug-in, it usually requires some configuration. We've seen these examples in Angular where a plug-in comes with a pretty good default security setting, but it doesn't satisfy some features that developers need. So developers turn off security settings; they change them and then the plug-in doesn't do the job it's supposed to do.

You've spent time digging into Angular and thinking about how to automate aspects of security analysis. What have you learned about Angular, and what have you done to get that into automation?
Angular is interesting. On one hand, it's client-side code. We don't trust anything that runs on the client. A lot of things can be bypassed. But on the other hand, the way the templating is done in Angular provides good protection from cross-site scripting even if malicious data is passed through the server-side code. In terms of automatically finding security problems, that's still a big question. There isn't a good tool for JavaScript today. It's possible to find some dataflow and cross-site scripting issues, but the issues that have to do with the configuration of the framework or plug-ins are harder because they are updated so often. We're always chasing the updates that happen every couple of months. When a new version of Angular comes out, not everything changes, but there might be enough changes that our automated tools don't find things anymore.

Have you talked to the developers building Angular? Are they aware of what you're doing?
We haven't communicated directly with them. But from the Angular developer standpoint, when they introduce new features, they often say, "Hey, this is a breaking build, and we don't really care about the older versions. We don't have the requirements or the desire to support whatever was built before. So, we're just going to go ahead and start a new version."

You have to figure out what they did, and then figure out how to adjust whatever you built to work again every time?
If a developer started using Angular 1.6 before Angular 2 came out, sometimes they'll just stick with the old version because they'd have to rewrite their whole application due to the breaking changes in the framework. Not every company will do that. If developers are still using the older 1.6 version, the tools that were built for 1.6 will still work. But they won't work for a new application.

Do you think version controlling and keeping everything somewhat similar in terms of its security posture are the biggest open problems?
I think one of the problems is keeping up with all the versions and upgrading to the new features and plug-ins. The second problem is keeping up with the variety of frameworks. Last year, Angular was number one, but I think it's fading out and React is stepping in. Now we'll need to build automation tools for React. Six months from now, it'll be something else.

Here's a trick question. What's more important: code review or architecture risk analysis [ARA]?
It depends on the application. If it's a standard web app that has a database, a back end, and a front end, you might not gain as much doing ARA. You could start with a code review and get more bang for your buck that way. But if it's software in a car that isn't standard or
another very complicated application then, yes, we should definitely start with ARA.

How do you think code review and ARA are related?
If you're reviewing a complex system, you'll start with an ARA, which will help you identify potential issues. To find out if these issues actually exist in the system, you would do a code review. You would look at something and say, "Hey, this system is talking to this third-party back end that has its own protocol. How about we review the code of this third-party back end and this protocol and how the communication happens?" In doing code review of this scope, we might find some issues.

You've experienced and lived in different cultures all over the world. Do diverse cultures approach computer security differently, or is it the same?
I don't think there's much difference in the countries I've lived and worked in. I'm from Russia and have worked in Europe and the US. I didn't see any differences in how people understand or relate to computer security. Maybe in other cultures, it's different. I think if you look at other areas like education or psychology or maybe medicine, there might be cultural differences, but if you look at technical stuff (math and computers), I think it's pretty straightforward in any culture.

About Ksenia Dmitrieva-Peguero
Ksenia Dmitrieva-Peguero is a principal consultant at Synopsys, where she leads a JavaScript research group that concentrates on common web application vulnerabilities and best practices. Her key areas of expertise include Web 2.0, JavaScript, HTML5, and Content Security Policy. Dmitrieva-Peguero loves studying new technologies, finding their vulnerabilities, and discovering ways to protect them. She presents at conferences frequently, including AppSec Europe, BSides Security London, and RSA Asia Pacific and Japan. Dmitrieva-Peguero received an MS in computer science from George Washington University. Outside of the office, she is a competitive ballroom dancer. She lives in Virginia with her husband and their brand-new baby girl.

This article originally appeared in IEEE Security & Privacy, vol. 15, no. 5, 2017.

US. The conference is more like the exchange of ideas and not the exchange of business cards and promotional materials.

As a hardcore technologist and a woman, what's your view of sexism in the field? Do you think it's harder to gain respect as a technologist if you're a woman?
Yes, unfortunately, it's there. It's harder to gain respect for sure, and I think it's especially hard to gain respect at the middle level. When you're at a higher level (for example, when speaking at a big conference), there are requirements to have men and women represented equally. And a conference might actually tend to select talks from women more often than from men. At the middle level, working with clients and establishing your position

Oh, come on, always. One last question. One of the coolest things you do is competitive ballroom dancing. When did you start dancing, and did you ever think about going pro?
It's been about 10 years. I've never thought about going pro because I started dancing in Russia. Most people in Russia start ballroom dancing when they are six or seven years old. If you're starting after high school, there's no way you can become a professional; it's treated only as a hobby. In the US, you can become a pro, but you have to make your life out of it. For me, it's a hobby that I get a lot of enjoyment from.

The Silver Bullet Podcast with Gary McGraw is cosponsored by Cigital (part of Synopsys) and this magazine and is syndicated by
You give a lot of talks all over the and trust as a woman can be very SearchSecurity. the frameworks can screw every- updates that happen every couple place. What’s your favorite confer- challenging. thing up from a security perspective. of months. When a new version of Here’s a trick question. What’s more ence to attend or speak at? Gary McGraw is vice president of Right. For validation, developers Angular comes out, not everything important: code review or architec- One of my favorites so far has been You have to sort of prove yourself a security technology at Synop- might write their own routine to changes, but there might be enough ture risk analysis [ARA]? AppSec in Europe. There’s a great little bit more? sys. He’s the author of Software validate email addresses, and that changes that our automated tools It depends on the application. If concentration of web technology Yes. I’ve been in situations where I Security: Building Security In will most likely be code copied from don’t find things anymore. it’s a standard web app that has a experts, which is my playground. was working with a male colleague (Addison-Wesley 2006) and eight Stack Overflow or another GitHub database, a back end, and a front I really enjoy communicating and the client would interact with other books. McGraw received a repository where the quality isn’t Have you talked to the developers end, you might not gain as much and interacting with and listen- him but not with me. BA in philosophy from the Uni- guaranteed. Depending on the building Angular? Are they aware of doing ARA. You could start with a ing to all these amazingly smart versity of Virginia and a dual PhD developer, the framework, and how what you’re doing? code review and get more bang for people. In Europe, I think people Even though you probably knew in computer science and cognitive well the library is integrated with We haven’t communicated directly your buck that way. 
But if it’s soft- are very relaxed and friendly and more than the other guy? science from Indiana University. the framework, they might be using with them. But from the Angular ware in a car that isn’t standard or less commercialized than in the Sometimes, yes. Contact him via garymcgraw.com.

IEEE Security & Privacy, September/October 2017, www.computer.org/security

IEEE TRANSACTIONS ON BIG DATA: SUBMIT TODAY

SCOPE

The IEEE Transactions on Big Data (TBD) publishes peer-reviewed articles with big data as the main focus. The articles provide cross-disciplinary innovative research ideas and application results for big data, including novel theory, algorithms, and applications. Research areas for big data include, but are not restricted to, big data analytics, big data visualization, big data curation and management, big data semantics, big data infrastructure, big data standards, big data performance analyses, intelligence from big data, scientific discovery from big data, and security, privacy, and legal issues specific to big data. Applications of big data in the fields of endeavor where massive data is generated are of particular interest.

SUBSCRIBE AND SUBMIT

For more information on paper submission, featured articles, calls for papers, and subscription links visit: www.computer.org/tbd

INVITED CONTENT

Editor: Giuliano Antoniol, Polytechnique Montréal, [email protected]
Editor: Steve Counsell, Brunel University, [email protected]
Editor: Phillip Laplante, Pennsylvania State University, [email protected]

Software Engineering for Machine-Learning Applications

The Road Ahead

Foutse Khomh, Bram Adams, Jinghui Cheng, Marios Fokaefs, and Giuliano Antoniol

THE NEED AND desire for more automation and intelligence have led to breakthroughs in machine learning (ML) and artificial intelligence (AI), yet we still experience failures and shortcomings in the resulting software systems. The main reason is the shift in the development paradigm induced by ML and AI. Traditionally, software systems are constructed deductively, by writing down the rules that govern the system behaviors as program code. However, with ML techniques, these rules are inferred from training data (from which the requirements are generated inductively). This paradigm shift makes reasoning about the behavior of software systems with ML components difficult, resulting in software systems that are intrinsically challenging to test and verify. Given the critical and increasing role of ML- and AI-based systems in our society, it’s imperative for both the software engineering (SE) and ML communities to research and develop innovative approaches to address these challenges. In fact, the learned behavior of an ML-based system might be incorrect, even if the learning algorithm is implemented correctly, a situation in which traditional testing techniques are ineffective. A critical problem is how to effectively develop, test, and evolve such systems, given that they don’t have (complete) specifications or even source code corresponding to some of their critical behaviors.

Motivated by these challenges, we organized the First Symposium on Software Engineering for Machine Learning Applications (SEMLA) at Polytechnique Montréal on 12 and 13 June 2018, with the kind support of Polytechnique Montréal’s Department of Computer Engineering and Software Engineering, the Institute for Data Valorization (IVADO), SAP, and Red Hat. The event attracted around 160 participants from all over the world, including students, academics, and industrial practitioners. SEMLA’s main objective was to create a space in which SE and ML experts could come together to discuss challenges, new insights, and practical ideas regarding the engineering of ML- and AI-based systems. The program included talks and panels presented by renowned academic researchers and industrial practitioners, including keynote speakers David Parnas, Lionel Briand, and Yoshua Bengio. The full program is at http://semla.polymtl.ca. Here, we summarize some key challenges these experts identified.

System Accuracy

The first topic concerned the accuracy of systems built using ML and AI models, and the responsibilities of engineers building them. For example, one keynote speaker mentioned three categories of AI research:

• building programs that imitate human behavior to better understand human thinking (used in psychology research),
• building programs that play games well (challenging and fun), and
• demonstrating that practical computerized products can use the same methods that humans use (risky and often naive).

He stressed that researchers should be very concerned about AI systems in the third category because they can’t guarantee 100 percent accuracy or correct answers in all cases. He also raised concerns that people are using the Turing test to falsely claim intelligence in systems. He commented, “Turing did not claim that his test was a test for artificial intelligence!”

In response, a leading AI expert stated that AI’s goal is not to achieve 100 percent accuracy because

• humans are also far from 100 percent accuracy in their daily tasks, and
• AI technology’s strength comes from the ability to abstract up from different factors of variation between environments, to obtain models that can generalize and transfer to situations that weren’t encountered before.

He further explained that AI technologies’ main challenge is the curse of dimensionality, that is, the need for sufficient, labeled data to cover all important factors (features) of a given problem. AI, in fact, needs more training data than humans do! Whereas the key properties of techniques such as deep learning (for example, compositionality, encoding into a simpler domain, and conditional computation) aim to reduce dimensionality’s impact, applications of AI still risk being limited to domains in which labeled data is cheap. Although labeled data is somehow abundant in some SE domains (such as defect prediction), other domains (such as requirement elicitation) are more challenging. Overall, AI’s full impact on SE is still unclear.

Because of AI and ML systems’ intrinsic imperfection, one panelist argued, only harmless AI technology or applications should be released to the public, since the responsibility of every engineer is to protect the public. He also mentioned that the public should be informed accurately of the AI technology it’s being exposed to. For example, instead of touting a “100 percent self-driving car,” automotive companies should advertise their products as “AI-assisted cars,” with a clear list of the ways in which AI is assisting.

Another panelist emphasized that AI isn’t a panacea. He illustrated how simple techniques could give the illusion of AI, or how the blind application of AI wouldn’t improve the workflow of workers. For example, in principle, an intelligent robot could easily replace a human worker to hand another worker the right tool for a given job, but not if the worker afterward throws the tool back on a pile. (The robot will have a hard time retrieving the right tool from an unordered pile.) However, using an intelligent robot to return tools in an ordered fashion (which is a different problem) could allow other robots later on to be deployed to hand over tools to workers. If a traditional computer science algorithm can solve a problem, we should just use that.

System Testing

The second hot topic our experts discussed was the difficulty of testing ML and AI systems. Our panelists debated whether we should tackle the testing of those systems the same way we do the testing of traditional systems, since an AI system’s behavior might be incorrect even if the learning algorithms are implemented correctly. One keynote speaker explained how in complex cyber-physical systems (CPSs), when no clear specifications of the intended systems exist (that is, humans have a lot of knowledge but can’t formalize it), only AI can approximate the system’s intended behavior by learning models from the available data.

This is a clear improvement over the manual design of models and controllers. However, it pushes most of the risk toward the trained models’ quality. So, how can we perform adequate quality assurance (QA) of AI models, given that the number of environments in which the models will be deployed is unlimited and that the human operator will require a detailed explanation of any failures?

Fortunately, we can use AI technology to reduce the search space of the environments to be tested, nudging QA techniques to those environments most likely to have failures or violate important safety constraints. Such an approach could even work in the system-of-systems context of CPSs, where each sensor and actuator must be validated not only in isolation but also in close integration with each other.

However, this QA doesn’t guard against hardware failure. So, hardware systems should incorporate fault-tolerance mechanisms to cope with such failures. One audience participant also observed that hardware could incorporate fault-tolerance mechanisms to mitigate the effect of AI model errors, improving AI systems’ robustness.

Another major challenge is that humans, once they’ve started trusting AI in their daily tasks, could

IEEE SOFTWARE | WWW.COMPUTER.ORG/SOFTWARE | @IEEESOFTWARE

ABOUT THE AUTHORS

JINGHUI CHENG is an assistant professor in Polytechnique Montréal’s Department of Computer Engineering and Software Engineering. Contact him at [email protected].

begin adapting their behavior to the AI assistance. For example, a study in Munich showed how assisted braking initially reduced the number of accidents, until drivers relied too much on the assisted braking and drove more aggressively. So, when is an AI-enabled product ready for release to the public? Although four million miles of test drives can’t prevent a serious accident in the next mile, how much information in the test drives can be used to debug and fix the corresponding fault?

An additional important question discussed at SEMLA is humans’ role in an AI-driven world. Are humans obsolete once AI technologies become mainstream? The panelists unanimously disagreed because humans are essential for putting the decisions of AI into context. Although the outcome (and potential failures) of the AI impacts the humans’ recommendations, those recommendations are also a human filter for AI failures. Further sociological research is necessary to study how AI technology affects human behavior.

Industrial Applications

SEMLA’s second day was devoted to industrial applications of AI. These industrial speakers discussed the current state of AI in industry and the challenges they face when applying AI models. They also discussed some open problems and what they consider to be their biggest needs and top priorities.

For example, a presenter from Google Brain pinpointed several programming-language issues involved in ML libraries, models, and frameworks. Different approaches in the current libraries have different advantages and disadvantages. Creating an efficient syntax for automatic differentiation that can deliver ease of implementation, performance, usability, and flexibility is important but difficult. Testing and debugging these implementations are also salient challenges. Industrial practitioners further mentioned that collaboration among experts from

IEEE SOFTWARE | SEPTEMBER/OCTOBER 2018

different fields is important for developing ML applications.

Healing the Rift

From these two days of intensive discussions, two key questions emerged:

• How should software development teams integrate the AI model lifecycle (training, testing, deploying, evolving, and so on) into their software process?
• What new roles, artifacts, and activities come into play, and how do they tie into existing agile or DevOps processes?

Answering these questions requires combined knowledge in SE and ML. Unfortunately, a rift exists between these two communities, which we tried to understand through SEMLA. One reason for this rift is that stakeholders in the AI community focus on algorithms and their performance characteristics, whereas stakeholders in the SE community focus on implementing and deploying those algorithms.

So far, no real venue has integrated both fields, yet intersections exist between them, one of which is testing. The notion of coming up with ways to break a system is integral to ML, and the scale of test sets (thousands to millions of instances) is huge compared to the number of test cases in software systems. On the other hand, missing test cases or test cases that are known to fail (but are rare) are a larger issue for ML than for regular software systems.

We believe that the SE and ML communities should work together to solve the critical challenges of assuring the quality of AI and software systems in general. We have a lot to benefit from each other!

This article originally appeared in IEEE Software, vol. 35, no. 5, 2018.
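The System Testing discussion notes that a learned behavior can be incorrect even when the learning algorithm is implemented correctly. The following is a minimal Python sketch of that situation, not something presented at SEMLA: the least-squares routine `fit_line`, the quadratic `true_rule`, and the sample points are all hypothetical choices made for illustration. A conventional unit test of the algorithm passes, yet the learned model is badly wrong outside the range covered by its training data.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Traditional unit test of the learning algorithm: it recovers y = 2x + 1 exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
algorithm_ok = abs(a - 2.0) < 1e-9 and abs(b - 1.0) < 1e-9


def true_rule(x):
    """Hypothetical intended behavior that nobody wrote down as a specification."""
    return x * x


# Train the (correct) algorithm on data that covers only the interval [0, 1].
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
slope, intercept = fit_line(train_x, [true_rule(x) for x in train_x])


def learned(x):
    """The behavior actually deployed: a line inferred from the training data."""
    return slope * x + intercept


# Small error where training data exists, large error in an unseen environment.
in_range_error = max(abs(learned(x) - true_rule(x)) for x in train_x)
out_of_range_error = abs(learned(3.0) - true_rule(3.0))

print(algorithm_ok, in_range_error, out_of_range_error)
```

Testing the code of `fit_line` says nothing about the gap at `x = 3.0`; only tests of the learned behavior in environments beyond the training data reveal it, which is why the choice of environments to test matters so much.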


DEPARTMENT: Internet of Things

From Raw Data to Smart Manufacturing:

AI and Semantic Web of Things for Industry 4.0

Pankesh Patel, Fraunhofer USA
Muhammad Intizar Ali, National University of Ireland
Amit Sheth, Kno.e.sis Center

AI techniques combined with recent advancements in the Internet of Things, Web of Things, and Semantic Web (jointly referred to as the Semantic Web) promise to play an important role in Industry 4.0. As part of this vision, the authors present a Semantic Web of Things for Industry 4.0 (SWeTI) platform. Through realistic use case scenarios, they showcase how SWeTI technologies can address Industry 4.0’s challenges, facilitate cross-sector and cross-domain integration of systems, and develop intelligent and smart services for smart manufacturing.

Industry 4.0 refers to the Fourth Industrial Revolution, the recent trend of automation and data exchange in manufacturing technologies. To fully realize the Industry 4.0 vision, manufacturers need to unlock capabilities such as vertical integration through connected and smart manufacturing assets of a factory, horizontal integration through connected discrete operational systems of a factory, and end-to-end integration throughout the entire supply chain. Several architectures and conceptual platforms (for example, RAMI 4.0 in Europe, IIRA in the US) have been proposed to develop Industry 4.0 applications [1]. However, these reference architectures are largely missing the granularity of Semantic Web and AI technologies. We believe that combined AI and Semantic Web technologies are a good fit for the plethora of complex problems related to interoperability, automated, flexible, and self-configurable systems such as Industry 4.0 systems [2].

In this article, we present a Semantic Web of Things for Industry 4.0 (SWeTI) platform for building Industry 4.0 applications. We first present a representative set of Industry 4.0 use cases, followed by a layered SWeTI platform for building these use cases. Each layer contains a variety of tools and techniques to build smart applications that can process raw sensory information and support smart manufacturing using Semantic Web and AI techniques. Finally, we present our existing tools and middleware for realizing the SWeTI platform.

USE CASES

Below is a representative set of Industry 4.0 use cases that can potentially leverage Semantic Web technologies.

Use Case 1: Deep Integration

In this use case, an integrated and holistic view of a factory is established to improve decision-making across different departments and to reduce overall complexity. This includes the interlinking of diverse data sources such as sensor measurements (for example, temperature, vibration, pressure, power), the manufacturing execution system (for example, work orders, material needed for the production, incoming material number), business processes, work force, and so on. Although much of this data is already captured by IT systems, it remains largely inaccessible in an integrated way without investing significant manual effort. Thus, the objective of this use case is to make all data available in a unified model to support users (factory planners, machinists, controllers, field technicians, and so forth) in decision-making. For example, consider the following scenarios:

• A factory planner needs input from diverse sources regarding order plans, machine maintenance schedules, workforce availability, and so on [3].
• A field technician must quickly troubleshoot an onsite industrial asset, and is seeking a solution that combines a summary of the problem, including difficulty and time estimates; links to relevant manuals and necessary parts; additional physical tools to resolve the problems and the current location of these tools; and, if the problem is difficult to resolve, additional support from people with the necessary expertise.
• During production, a machinist needs to know which tools are required to perform the task at hand, the location of these tools and materials, and quality control standards to be adhered to [3].
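To illustrate what a unified model buys in the field technician scenario, here is a minimal Python sketch. It is an assumption-laden toy, not the SWeTI implementation: facts from different factory systems are stored as plain (subject, predicate, object) triples in the spirit of Semantic Web data, and every identifier (`machine:press01`, `requiresTool`, and so on) is hypothetical.

```python
# Hypothetical facts, each of which would normally live in a separate IT system.
TRIPLES = [
    ("machine:press01", "hasFault", "fault:overheat"),          # sensor analytics
    ("fault:overheat", "estimatedRepairMinutes", "45"),         # maintenance history
    ("fault:overheat", "documentedIn", "doc:press-manual-7"),   # document store
    ("fault:overheat", "requiresPart", "part:fan-42"),          # parts catalogue
    ("fault:overheat", "requiresTool", "tool:thermal-camera"),  # work instructions
    ("tool:thermal-camera", "locatedAt", "loc:bay-3"),          # asset tracking
    ("fault:overheat", "hasExpert", "person:shift-engineer"),   # workforce system
]


def objects(subject, predicate):
    """Return all objects linked to `subject` by `predicate`, sorted for stability."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def troubleshooting_view(machine):
    """Join facts across sources into the single view a field technician needs."""
    view = []
    for fault in objects(machine, "hasFault"):
        view.append({
            "fault": fault,
            "repair_minutes": objects(fault, "estimatedRepairMinutes"),
            "manuals": objects(fault, "documentedIn"),
            "parts": objects(fault, "requiresPart"),
            # Map each required tool to its current location(s).
            "tools": {tool: objects(tool, "locatedAt")
                      for tool in objects(fault, "requiresTool")},
            "experts": objects(fault, "hasExpert"),
        })
    return view


summary = troubleshooting_view("machine:press01")
print(summary)
```

Because all sources share one graph-shaped model, the join from machine to fault to tool to location is a few lookups rather than a manual search across systems. A production system would more plausibly use RDF and SPARQL than Python lists; the list of tuples is only a stand-in.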

Use Case 2: Horizontal Integration

This use case extends the vertical integration of all factory operations into the horizontal dimension, knitting together the relevant players in a manufacturing supply chain (the raw materials and parts suppliers, logistics, inventory of supplied goods, production process, warehouses and distributors of finished products, sales and marketing, customers) through an interconnected network of Internet of Things (IoT) devices and external information sources such as social media and web services (for example, financial services or weather forecasting), overseen via an overarching semantic-enabled engine. Some examples include the following:

• A smart factory manager wants to optimize supply chain and warehouse facilities to ensure that the right amount of raw material is always available in the warehouse to support production processes. Because the factory produces customized products, allowing customers to choose their food products’ ingredients, it is difficult to estimate the amount and type of raw material required to fulfill customer orders. However, efficient data-mining and machine-learning techniques that harness social media data can provide insights into ongoing trends and customer preferences, which will help the warehouse manager optimize the supply chain and ensure that the right amount of raw ingredients is always available in the warehouse to replenish the processing machine at the food production factory.
• A production manager in a manufacturing unit needs an integrated view of the supply chain, including raw materials and the distribution network, to optimize internal manufacturing processes. A horizontal integration of smart factory production processes, supply, and the distribution network including fleet management data and external datasets such as traffic congestion, weather, and social media is required to build an optimal strategy for the production of perishable food products. Using this integrated information, the production manager can adapt internal business processes on the fly.
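The first scenario above can be made concrete with a deliberately simple Python sketch. It assumes that the share of social media posts mentioning an ingredient is a usable proxy for its share of upcoming customized orders. That assumption, the functions `mention_shares` and `reorder_quantities`, and all figures are hypothetical; a real deployment would rely on proper data-mining and machine-learning models rather than word counts.

```python
from collections import Counter


def mention_shares(posts, ingredients):
    """Fraction of ingredient mentions per ingredient across a batch of posts."""
    counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        for ingredient in ingredients:
            if ingredient in words:
                counts[ingredient] += 1
    total = sum(counts.values())
    return {i: (counts[i] / total if total else 0.0) for i in ingredients}


def reorder_quantities(stock_kg, forecast_orders, kg_per_order, shares):
    """Raw material to reorder so stock covers the trend-weighted forecast."""
    plan = {}
    for ingredient, share in shares.items():
        needed = forecast_orders * share * kg_per_order
        plan[ingredient] = max(0.0, needed - stock_kg.get(ingredient, 0.0))
    return plan


# Hypothetical inputs: recent posts, current warehouse stock, and an order forecast.
posts = [
    "Loving the mango yogurt",
    "mango smoothie again",
    "oat bars are great",
    "mango and oat bowl",
]
shares = mention_shares(posts, ["mango", "oat"])
plan = reorder_quantities({"mango": 10.0, "oat": 25.0},
                          forecast_orders=100, kg_per_order=0.5, shares=shares)
print(shares, plan)
```

With these inputs, mango accounts for three of five mentions, so the plan reorders mango and leaves the already well-stocked oat alone. The point is only the data flow from an external source into a warehouse decision, which is what horizontal integration is meant to enable.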

IEEE INTELLIGENT SYSTEMS | INTERNET OF THINGS | July/August | www.computer.org/intelligent

Use Case 3: Autonomous System

This class of use cases deals with enabling factory devices to cooperate to achieve the factory’s overall objectives. The following are just a few examples:

• Self-organization. When a production order comes down to the factory, machines can communicate and exchange information with one another to organize resources to complete the orders on time. Resource allocation is determined at runtime, rather than pre-allocated, depending on the machines’ current conditions, including current factory workload, machine availability and maintenance schedule time, backlog of customer orders, and machine capacity. Moreover, resource allocation could consider external electricity rates data to achieve the goal of reducing the factory’s energy consumption and carbon footprint.
• Flexible manufacturing. Decentralized control is very useful when market demands lead to the introduction of new machines in factories. The new machines can participate by simply announcing their services and features during the resource allocation process. This illustrates the flexibility and adaptability of a factory, where new machines can be integrated in a plug-and-produce fashion according to market demands with minimal downtime.
• Fault tolerance. The highly dynamic and self-organization features result in a fault-tolerant system: faulty sensors can be replaced by discovering new sensors with similar functionality to prevent downtime during the production process.

A SEMANTIC WEB OF THINGS FOR INDUSTRY 4.0 PLATFORM

In this section, we present our SWeTI platform’s layered architecture for Industry 4.0 application design. Figure 1 depicts the overall architecture beginning with the data processing pipeline at the device-, sensor-, or machine-level and moving toward intelligent autonomous applications or dashboards at the application layer. In what follows, we briefly describe each layer and its components and functionality.
This use case extends the vertical integration of all factory operation into the horizontal dimension, knitting together the relevant players in a manufacturing supply chain—the raw materials and parts suppliers, logistics, inventory of supplied goods, production process, warehouses and distributors of finished products, sales and marketing, customers—through an interconnected networks of Internet of Things (IoT) devices and external information sources such as social media and web services (for example, financial services or weather forecasting), overseen via an overarching semantic-enabled engine. Some examples include the following:

• A smart factory manager wants to optimize supply chain and warehouse facilities to ensure that the right amount of raw material is always available in the warehouse to support production processes. Because the factory produces customized products, allowing customers to choose their food products’ ingredients, it is difficult to estimate the amount and type of raw material required to fulfill customer orders. However, efficient data-mining and machine-learning techniques that harness social media data can provide insights into ongoing trends and customer preferences, which will help the warehouse manager optimize the supply chain and ensure that the right amount of raw ingredients is always available in the warehouse to replenish the processing machine at the food production factory. • A production manager in a manufacturing unit needs an integrated view of the supply chain, including raw materials and the distribution network, to optimize internal manufacturing processes. A horizontal integration of smart factory production processes, supply, and the distribution network including fleet management data and external da- tasets such as traffic congestion, weather, and social media is required to build an opti- Figure 1. A layered view of the Semantic Web of Things for Industry 4.0 (SWeTI) platform. mal strategy for the production of perishable food products. Using this integrated information, the production manager can adapt internal business processes on the fly.


Device Layer
On the factory floor, devices range from production machines (such as PLCs, industrial motors, pumps, and robots) to smart devices and tools (such as smartwatches, glasses, sensors, and smartphones) that provide an extended human–machine interface and functionality. From a connectivity viewpoint, these could be legacy devices (which communicate through legacy protocols such as PROFIBUS, Modbus, and OPC) as well as new equipment with embedded technology that allows these devices to communicate through recent IoT standards such as OPC-UA, MQTT, Bluetooth, and so on.

Edge Layer
The edge layer transforms data generated by factory floor devices into information. Typically, industrial gateways, which are relatively powerful compared with the devices in the lowest layer, are deployed at this layer. The broad functionality at this layer is as follows:

• Interoperability. For instance, OPC-UA specifies real-time communication of plant data between control devices from different vendors (such as ABB or Siemens). Production Performance Management Protocol (PPMP; https://bit.ly/2P0WGUc) describes the payload definition of three types of messages generated from industrial assets: machine messages, measurement messages, and process messages. This protocol is independent of transport protocols such as HTTP/REST, MQTT, and AMQP.
• Analytics-based actions. Actions include reactions to various production events such as executing rule-based alerts and/or sending commands back (for example, reconfiguration of industrial machines) to the production equipment and tools.
• Edge analytics. These provide data aggregation techniques, data filtering, and cleansing techniques to refine data, implementing a device-independent model as a base for decisions, and analytical logic for specific device domains and types.

Greengrass (https://aws.amazon.com/greengrass) is an edge-analytic software solution from Amazon. Similarly, Microsoft offers Azure IoT Edge. Greengrass has a small footprint and can run on gateway devices such as BeagleBone and Raspberry Pi. Using Lambda functions, AWS Greengrass provides data filtering and computation capabilities. Developers can push small analytical capabilities to the edge by deploying Lambda functions from the AWS cloud (for further details, see “On Using the Intelligent Edge for IoT Analytics”4).
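To make the edge-layer functions above more concrete, the following is a minimal sketch of the kind of small analytical function a gateway might run: it cleanses a window of readings, aggregates them, and applies a rule-based alert. It is plain Python and does not use the Greengrass or Azure IoT Edge APIs; the machine name, plausibility range, and alert threshold are illustrative assumptions, not values from any vendor or from the SWeTI platform.

```python
from statistics import mean

# Hypothetical alert threshold in degrees Celsius (an assumption for illustration).
TEMP_LIMIT_C = 80.0


def clean(readings, low=-40.0, high=150.0):
    """Data cleansing: drop missing values and readings outside a plausible range."""
    return [r for r in readings if r is not None and low <= r <= high]


def summarize(machine_id, readings):
    """Aggregate one window of readings and attach a rule-based alert flag."""
    valid = clean(readings)
    if not valid:
        return {"machine": machine_id, "count": 0, "alert": False}
    return {
        "machine": machine_id,
        "count": len(valid),
        "mean_c": round(mean(valid), 2),
        "max_c": max(valid),
        # Rule-based action: flag the window if any valid reading exceeds the limit.
        "alert": max(valid) > TEMP_LIMIT_C,
    }


# One window of raw sensor values, including a missing value and a sensor glitch.
window = [71.2, 70.8, None, 999.0, 72.5, 84.1]
summary = summarize("press-01", window)
print(summary)
```

Only the compact summary, rather than every raw reading, would then need to be forwarded to the upper layers, which is the usual motivation for doing this work at the edge.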

Cyber Layer
The cyber layer acts as a distributed information hub. Having gathered massive amounts of information from diverse distributed sources, this layer prepares the ground for the data-analytic layer to perform specific analytics. These information sources take several key forms.

Enterprise-wide knowledge. Industry 4.0 envisions access to data across the different players (logistics, customers, distributors, and suppliers) in the supply chain. Information from various machines on factory floors (through the edge layer or directly from the device layer) is pushed to form the linked network of information (that is, Linked Data; https://bit.ly/29YZz5b) generated from production machines. Linked data is a natural fit because it provides an abstraction layer on top of a distributed set of data stored across the supply chain.

An alternative technology solution could be blockchain (https://bit.ly/2nna7Ac). It is decentralized and distributed across peer-to-peer networks. Each participant in the network can read and write data in the blockchain, and the blockchain network remains in sync. For instance, blockchain can be used for timely industrial asset maintenance, sharing necessary information across different organizations.5 Repair partners could monitor the blockchain for maintenance needs to minimize downtime in factory production, and record their work on the blockchain for further use. Regulators of industrial plant equipment would have access to asset records, allowing them to provide timely certification to ensure that the asset is safe for factory workers.
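As a rough illustration of the Linked Data idea described above, the sketch below stores facts as subject-predicate-object triples and follows links across them to answer a supply chain question. It is a toy in-memory model in plain Python, not an RDF store; every identifier with the `ex:` prefix and both helper functions are hypothetical names chosen for this example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Each fact is a (subject, predicate, object) triple, in the spirit of RDF.
# All "ex:" identifiers are made up for illustration.
TRIPLES: Set[Triple] = {
    ("ex:press-01", "ex:locatedIn", "ex:plant-a"),
    ("ex:press-01", "ex:hasSensor", "ex:temp-17"),
    ("ex:temp-17", "ex:measures", "ex:Temperature"),
    ("ex:press-01", "ex:consumes", "ex:steel-coil"),
    ("ex:steel-coil", "ex:suppliedBy", "ex:supplier-a"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def suppliers_for(machine: str) -> List[str]:
    """Follow links machine -> material -> supplier across the data set."""
    materials = [obj for _, _, obj in match(s=machine, p="ex:consumes")]
    return sorted({
        obj
        for material in materials
        for _, _, obj in match(s=material, p="ex:suppliedBy")
    })


print(suppliers_for("ex:press-01"))
```

The point of the sketch is that the machine data and the supplier data could live with different organizations; because both use shared identifiers, a query can still traverse them as one graph.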


Domain knowledge sources include the Linked Open Data (LOD), Linked Open Reasoning (LOR), and Linked Open Services (LOS). These approaches reuse data, rules, and services, respectively, from the web.6 For instance, Sensor-based Linked Open Rules (S-LOR)6 is a data set of interoperable rules used to interpret data produced by sensors. LOS is intended to share and reuse services and applications that can be used to compose crossdomain Industry 4.0 applications.

External sources are also available. Real-time data streams from external sources such as social media and tracking devices from logistics, weather, traffic, and news feeds can be brought into the cyber layer. This data can be linked with other information (for example, events affecting supply shipments, possibly with the support of information extraction techniques such as event identification) at the data-analytic layer for strategic optimization, such as route network improvements. This would provide a fully transparent view of the supply chain and let managers make decisions to keep the supply chain flow moving.

An alternative to the distributed information hub is to use the cloud to build Industry 4.0 applications. Cloud-based manufacturing is a centralized one-stop shop that allows manufacturers to apply industrial analytics on top of stored data. For example, Microsoft Azure (https://bit.ly/2KLKbs9) uses the term “data lake” for its big data store. A cloud-based centralized repository allows users to store structured and unstructured data at any scale. Users can run different analytics, from simple analytics such as data visualization to complex analytics such as real-time analytics, machine learning, and big data processing.

Additional analytics on top of the data sources described above enables several capabilities, described in the next section.

Data-Analytic Layer
The massive amount of data available on the cyber layer creates a distributed data lake, which creates an opportunity to add industrial analytics on top of this data lake by leveraging AI algorithms. The purpose of industrial analytics is to identify invisible relationships among data at the application layer (discussed in the next section), thus enabling decision-makers to make optimized decisions.

The industrial analytics algorithms can be on-premise (written by an enterprise) or cloud-based. Various cloud vendors allow manufacturers to store and integrate data, apply industrial analytics available in a cloud marketplace, and develop customized business solutions leveraging cloud-based tools. As an example, GE developed Predix (www.predix.io/catalog), an industrial Internet platform that offers a marketplace to deploy various apps and services, including predictive maintenance, anomaly detection, and algorithms for the intelligent edge. Similarly, Azure AI gallery (https://gallery.azure.ai) offers a catalog service for finding different machine learning algorithms for the industrial Internet. Microsoft has integrated AzureML (https://studio.azureml.net), a visual programming environment, to develop machine-learning algorithms. AzureML developers can therefore make their algorithms available to the community by publishing them to the Azure AI gallery. Recently, Siemens launched MindSphere (https://siemens.mindsphere.io; hosted on AWS), a cloud-based Industry 4.0 OS that lets manufacturers connect their industrial machines to the cloud and offers a marketplace (like an AppStore) for deployment-ready industrial applications.

Application Layer
The application layer presents the acquired knowledge from the cyber layer to users (such as domain experts and decision-makers) so that correct decisions can be taken. This layer is about building meaningful and customized applications on top of services and data exposed by the data-analytic layer and presenting the acquired knowledge to the user in an appropriate manner. A broad variety of machine learning approaches7 have been developed to analyze and extract higher-level information from diverse IoT data.
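As one small, hedged illustration of extracting higher-level information from IoT data, the sketch below flags anomalous sensor readings with a simple z-score test, a basic stand-in for the anomaly-detection services mentioned for the data-analytic layer. The function name, the sample vibration values, and the threshold of 2.0 are assumptions for illustration; production analytics would typically use more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    if len(readings) < 2:
        return []  # a sample standard deviation needs at least two points
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, x) for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical vibration readings (mm/s) from one machine, with one spike.
vibration_mm_s = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0, 9.5, 2.1, 2.0, 2.2]
anomalies = find_anomalies(vibration_mm_s)
print(anomalies)
```

With short windows the threshold has to be chosen with care: for a sample of n points, no z-score can exceed (n - 1) / sqrt(n), so a threshold of 3 could never fire on a ten-reading window.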


In recent years, industrial vendors have demonstrated a wide variety of Industry 4.0 applications, combining a range of technologies such as digital twin, chatbot/natural language processing, and augmented reality. For instance, by combining the functionality exposed by the data-analytic layer with an industrial asset, developers can create a digital twin, that is, a virtual representation of an industrial asset. An extension of this work is to integrate different interfaces with the digital twin. GE Digital has shown an extended demo of a digital twin (https://youtu.be/2dCz3oL2rTw); customers can ask the digital twin questions about its performance and potential issues in natural language and receive answers back in natural language. Moreover, customers can interact with the digital twin through augmented reality (AR) devices such as the Microsoft HoloLens and obtain a 3D view of an asset (for example, a steam turbine) to analyze its internal parts.

The Industry 4.0 application-building approaches are divided into two broad categories:

• Rapid application development (RAD). To address the problem of long development times, RAD tools provide abstractions that hide low-level programming details. For example, Node-RED (https://nodered.org) offers a node to read data from an OPC-UA server. Similarly, Kura Wires (www.eclipse.org/kura) offers visual programming constructs to acquire data using industrial protocols such as Modbus and OPC-UA.
• Cloud-based tools. These tools offer services that implement common application development functions, such as connecting industrial assets to the cloud, data storage, and data visualization. Developers can configure these tools and develop applications very rapidly. Various vendors have started offering cloud-based tools for building Industry 4.0 applications. For instance, Amazon Lex (https://aws.amazon.com/lex) offers a service for building conversational interfaces into any application using text and voice. Microsoft Azure Accelerator (https://bit.ly/2P1qEan) offers preconfigured and customizable solutions. Developers can deploy these solutions in minutes, connect factory floor assets, and deploy Industry 4.0 solutions such as remote monitoring, connected factory, and predictive maintenance.

SWETI PLATFORM COMPONENTS
We continue to leverage our existing tools and middleware to build Industry 4.0 applications. We plan to use some open source tools for Industry 4.0 (https://bit.ly/2gIbVow) from the Eclipse Foundation. Here, we present our existing open source tools for building Industry 4.0 applications.

IoTSuite
IoTSuite (https://bit.ly/2M9FldX) is a framework for prototyping IoT applications that eases application development by hiding development-related complexity from developers. It takes platform-independent high-level specifications as inputs, parses them, and generates platform-specific code that results in a distributed software system collaboratively hosted by heterogeneous IoT devices. The high-level specification covers sensors, actuators, storage devices, and computational components, as well as a deployment specification that describes device properties. Thus, developers need not concern themselves with the platform- and runtime-specific aspects of development. The key characteristic of IoTSuite that makes it suitable for our Industry 4.0 projects is its flexibility:

• It can generate code for different target programming languages (for example, C, C++, Python, Java) because the code generator for each target programming language is exposed as a plug-in. To support a new programming language, IoTSuite developers simply write a plug-in that generates code in that target language.
• It can plug in different runtime systems (for example, MQTT, CoAP, OPC-UA) because it exposes well-defined interfaces for doing so. IoTSuite developers simply have to implement runtime-specific interfaces to plug in a target runtime system.
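The plug-in idea in the list above can be sketched as a small registry that maps a target language to a code-generation function. This is only an illustration of the general pattern, not IoTSuite's actual API: the `register` decorator, the `generate` function, the specification fields, and the emitted `sensor_read`/`sensorRead` calls are all hypothetical.

```python
from typing import Callable, Dict

Spec = Dict[str, str]

# Registry of code-generator plug-ins, keyed by target language.
GENERATORS: Dict[str, Callable[[Spec], str]] = {}


def register(language: str):
    """Decorator that registers a generator plug-in for one target language."""
    def decorator(fn: Callable[[Spec], str]) -> Callable[[Spec], str]:
        GENERATORS[language] = fn
        return fn
    return decorator


@register("python")
def gen_python(spec: Spec) -> str:
    name, pin = spec["name"], spec["pin"]
    return f"def read_{name}():\n    return sensor_read('{pin}')\n"


@register("java")
def gen_java(spec: Spec) -> str:
    name, pin = spec["name"].capitalize(), spec["pin"]
    return f'double read{name}() {{ return sensorRead("{pin}"); }}\n'


def generate(spec: Spec, language: str) -> str:
    """Turn a platform-independent spec into code for the requested language."""
    try:
        generator = GENERATORS[language]
    except KeyError:
        raise ValueError(f"no generator plug-in registered for {language!r}") from None
    return generator(spec)


# A toy platform-independent sensor specification (hypothetical fields).
spec = {"name": "temperature", "pin": "A0"}
python_code = generate(spec, "python")
java_code = generate(spec, "java")
print(python_code)
print(java_code)
```

Under this pattern, supporting another target means adding one more registered function; the specification and the `generate` entry point stay unchanged.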


This framework has been used for industrial-grade devices such as ABB’s RIO 600 as well as popular devices such as Raspberry Pi, Arduino, and Android devices. The framework’s current version integrates standards such as OPC-UA, MQTT, CoAP, and WebSocket, and generates code for targets such as Node.js, Android, and Java.

Machine-to-Machine Measurement
The Machine-to-Machine Measurement (M3) framework (https://bit.ly/2OWMWdk) is intended to help build crossdomain IoT applications. It uses semantic technologies to achieve interoperability among heterogeneous IoT systems. Reasoning over semantically annotated data produced by IoT devices can help create user suggestions. For example, a body temperature value of 38 degrees could be associated with a naturopathy application that would suggest home remedies when a constant high fever was sensed. This framework uses LOD, LOR, Linked Open Vocabularies (LOV), and LOS to enhance interoperability and get meaningful knowledge from data.6

ACEIS (https://bit.ly/2OYQtYA) middleware contains a set of tools designed for IoT data analytics and uses Semantic Web technologies to build various components including automated streaming data discovery, integration on the fly, event detection, and contextually aware decision support systems.8 This middleware can be used to build smart applications for Industry 4.0.

CONCLUSION
Industry 4.0 is an emerging area of research. In this article, we described the example of the SWeTI platform, which augments Semantic Web, AI, and data analytics to support the building of smart IoT applications for Industry 4.0. We presented a set of realistic use case scenarios to advocate for the SWeTI platform.

ACKNOWLEDGMENTS
We acknowledge the fruitful contributions of the SWeTI workshop (https://swetiworkshop.wordpress.com) participants and contributors. A productive discussion concluded by the workshop inspired some of the ideas presented in this article. This work is partially funded by SFI under grant no. SFI/16/RC/3918 and SFI/12/RC/2289.

REFERENCES
1. I. Grangel-González et al., “The Industry 4.0 Standards Landscape from a Semantic Integration Perspective,” Proc. 22nd IEEE Int’l Conf. Emerging Technologies and Factory Automation (ETFA), 2017, pp. 1–8.
2. A. Sheth, “Internet of Things to Smart IoT Through Semantic, Cognitive, and Perceptual Computing,” IEEE Intelligent Systems, vol. 31, no. 2, 2016, pp. 108–112.
3. N. Petersen et al., “Monitoring and Automating Factories Using Semantic Models,” Proc. 6th Joint Int’l Conf. Semantic Technology (JIST), 2016, pp. 100–115.
4. P. Patel, M.I. Ali, and A. Sheth, “On Using the Intelligent Edge for IoT Analytics,” IEEE Intelligent Systems, vol. 32, no. 5, 2017, pp. 64–69.
5. D. Miller, “Blockchain and the Internet of Things in the Industrial Sector,” IT Professional, vol. 20, no. 3, 2018, pp. 15–18.
6. A. Gyrard et al., “Building the Web of Knowledge with Smart IoT Applications,” IEEE Intelligent Systems, vol. 31, no. 5, 2016, pp. 83–88.
7. M.S. Mahdavinejad et al., “Machine Learning for Internet of Things Data Analysis: A Survey,” Digital Communications and Networks, vol. 4, no. 3, 2017, pp. 161–175.
8. F. Gao et al., “Automated Discovery and Integration of Semantic Urban Data Streams: The ACEIS Middleware,” Future Generation Computer Systems, vol. 76, 2017.


ABOUT THE AUTHORS
Pankesh Patel is a senior research scientist at Fraunhofer USA’s Center for Experimental Software Engineering (CESE). His current focus is implementation of Industry 4.0 techniques and methodologies in commercial environments. Contact him at dr.pankesh.patel@gmail.com.

Muhammad Intizar Ali is an adjunct lecturer, research fellow, and research unit leader of the Reasoning, Querying, and IoT Data Analytics Unit at the Insight Centre for Data Analytics, National University of Ireland. His research interests include Semantic Web, Internet of Things, and data analytics. Contact him at [email protected].

Amit Sheth is the LexisNexis Ohio Eminent Scholar and executive director of the Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), and a Fellow of IEEE and AAAI. Contact him at [email protected].

This article originally appeared in IEEE Intelligent Systems, vol. 33, no. 4, 2018.


FROM THE EDITORS

Artificial Intelligence and IT Professionals

Sunil Mithas, Muma College of Business at the University of South Florida
Thomas Kude, ESSEC Business School
Jonathan Whitaker, University of Richmond Robins School of Business

As “self-programming techniques” manifest in the form of artificial intelligence (AI), many are wondering how AI will affect IT professionals. For example, some predict that AI could reduce the number of jobs for software developers by 70 percent in India, which accounts for 65 percent of global IT offshore work and 40 percent of IT-enabled business process work.1

However, such dire predictions are not new. It is helpful to recall a similar prediction almost 60 years ago when Herbert Simon, a Nobel Prize winner sometimes called ‘the founding father of AI,’ predicted that ‘self-programming techniques’ would lead to the extinction of the computer programming occupation by 1985. Simon noted:2

“…we can dismiss the notion that computer programers [sic] will become a powerful elite in the automated corporation. It is far more likely that the programing occupation will become extinct (through the further development of self-programing techniques) than that it will become all powerful. More and more, computers will program themselves….”

While massive industrial and technical developments—including personal computers in the 1980s; the World Wide Web in the 1990s; outsourcing and offshoring in the 2000s; and social media, mobile computing, and cloud computing in the 2010s—created some peaks and valleys, the computer programming occupation has continued its inexorable growth, belying the initial pessimism. Rather than attempt a blunt prediction of future decades, we approach the question of how AI will affect IT professionals by first identifying the factors that influence the demand for software programmers, then discussing how these factors relate to AI, and finally articulating the likely impact of AI on IT professionals.

FACTORS THAT INFLUENCE THE DEMAND FOR PROGRAMMERS AND IT PROFESSIONALS
We begin by delving into what software programmers do and how those activities are affected by technical developments. To ‘program’ is to develop a series of instructions or operations to be performed by a mechanism such as a computer. The personal computer revolution of the 1980s and the advent of the World Wide Web in the 1990s greatly increased the information intensity of industrial activity by allowing numerous occupational tasks to be codified and standardized.3 The ability to access world-class intra-firm efficiencies through the personal computer and inter-firm efficiencies through the World Wide Web drove demand for software such as enterprise resource planning (ERP) software. In turn, this demand for software accelerated growth in the number of computer science degrees during the mid-1980s and late 1990s, as shown in Figure 1. In this way, the codification and standardization enabled by IT created significant new demand for IT and the software programming profession during the 1980s and 1990s.

Figure 1. Degrees in computer and information sciences conferred by US postsecondary institutions. (Source: Digest of Education Statistics, 2016, US National Center for Education Statistics, https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2017094)


firm efficiencies through the World Wide Web drove demand for software such as enterprise resource planning (ERP) software. In turn, this demand for software accelerated growth in the number of computer science degrees during the mid-1980s and late 1990s, as shown in Figure 1. In this way, the codification and standardization enabled by IT created significant new demand for IT and the software programming profession during the 1980s and 1990s.

It is important to note that the impacts of offshoring vary for different types of IT occupations. For example, in our related research we found that information-intensive and high-skill occupations experienced higher employment growth, despite a slight decline in salary growth in the US from 2000-2004, suggesting that many information-intensive service occupations have a tacit component that makes them more difficult to relocate offshore. In one of our research papers, we note that total employment for computer and information systems managers (a more tacit and less codifiable occupation) increased 20 percent from 283,480 in 2000 to 341,250 in 2015, while wages increased 76 percent from $80,250 to $141,000 during the same timeframe. In contrast, employment for computer programmers (a more codifiable occupation) declined 45 percent from 2000 to 2015, and wages for computer programmers increased 38 percent from 2000 to 2015, only half the rate of increase for manager positions.

More recently, firms are using crowdsourcing or micro-sourcing through platforms such as Amazon Mechanical Turk or upwork.com to outsource or offshore some activities. However, the extent to which these platforms will negatively impact the work of software programmers that involves complex workflows remains debatable.4

While the increase of outsourcing and the advent of offshoring during the 2000s might not have changed the overall demand for software programming, it certainly shifted and reallocated that demand across firms and geographies. Outsourcing and offshoring of software development and other occupations during the past two decades was enabled by higher levels of modularization. Modularization is the decomposition of a product or service into components, such that the components can be performed independently by separate people in different firms or geographies, and later be reintegrated. We find evidence of the impact of modularization on software programmers in developed economies by noting the relatively flat level of employment for software programmers in the US (see Figure 2) and of information and communication technology (ICT) specialists in the European Union during the 2000s. Meanwhile, India’s National Association of Software and Service Companies (NASSCOM) reports that the number of IT and business process outsourcing (BPO) professionals in India increased almost ten-fold from 430,000 in 2001 to 3,860,000 in 2017.

Figure 2. Computer programmer employment and wages in the US. (Sources: 1970 data from US Census Bureau supplementary report number PC(S1)-32, https://www.census.gov/library/publications/1973/dec/population-pc-s1-32.html; 1980 data from US Census Bureau supplementary report number PC80(S1)-8, https://www.census.gov/content/dam/Census/library/publications/1983/demo/pc80-s1-8.pdf; 1987-1996 data from US General Accounting Office report GAO/HEHS-98-159R, https://www.gao.gov/products/HEHS-98-159R; 1997-2017 data from US Bureau of Labor Statistics Occupational Employment Statistics, https://www.bls.gov/oes/tables.htm)

To place these figures in context, the research firm IDC estimates that the number of global software development professionals was 11,000,000 in 2014. Combining this IDC estimate with the US Census Bureau data in Figure 2 and the NASSCOM data above suggests that about 15 percent of software development professionals are in the US, and about 30 percent of software development professionals are in India.

In addition to the offshoring of information-intensive activities, another factor that facilitated the disaggregation of business processes is the global movement of labor. The cross-border movement of IT professionals continues to attract significant debate on immigration, visa issues, and employment and wages of IT professionals in developed economies.5 Even firms that offshored and outsourced realized the limits of outsourcing, and progressive firms kept at least some critical IT capabilities and programmers onshore and in-house for strategic reasons.

Figure 1 shows that the number of bachelor’s degrees in computer and information science declined by 26 percent from 2004-2005 to 2009-2010, but then increased sharply from 2009-2010 to reach an all-time high in 2014-2015. The sharp decline in computer and information science degrees might have been due to concerns about offshoring because we do not see a decline in two related fields (business and engineering/engineering technologies) from 2004-2005 to 2009-2010, where the number of bachelor’s degrees in business increased by 15 percent, and the number of bachelor’s degrees in engineering and engineering technologies increased by 12 percent. Note that the number of computer and information science bachelor’s degrees surged by 50 percent from 2009-2010 to 2014-2015 as concerns about offshoring subsided, an increase far greater than the 2 percent increase in business degrees and 30 percent increase in engineering/engineering technologies degrees during the same timeframe.

The above findings in the context of technological and organizational developments during the 1980s, 1990s, and 2000s can inform the discussion of ‘extinction’ or ‘substitution’ for the software programming occupation, because activities that can be codified, standardized, and modularized are also more likely to be automated through AI.

While the modularization of software development has reduced the complexity of individual activities, complexity is increased by the need to coordinate work across software teams and integrate individual modules to create a product that is Apple-simple, Google-fast, and SAP-reliable at the same time.6 The complexity of software programming jobs has also increased due to changes in system development methodologies and the rise of agile methods that call for closer collaboration among software developers and customers integrating design thinking and related approaches.7

For example, given the difficulty of eliciting precise requirements from clients upfront, agile software development insists on constant customer feedback and collaboration, which might be difficult to achieve in a disaggregated work mode.8 Because of the need for innovation and closer collaboration between software developers and users, firms are realizing the value of investing in their internal technical capabilities and digital transformation by bringing more software development in-house, sometimes moving toward a hybrid model that includes both on-premise and cloud computing.9

The foregoing discussion suggests that Simon’s 1960 prediction about computer programmers becoming extinct needs to be seen in the context of major industrial and technical developments. Simon made his prediction before the advent of the personal computer and the World Wide Web, changes in software development methodologies and the role of IT departments in firms, and many other wider trends in technology, business, and society.10 As part of these developments, the demand for computer programmers has increased (see Figure 2) and the IT profession, which consisted mostly of code development in the 1960s, has now diversified into many different job descriptions and responsibilities.11 For example, in addition to computer programmers, software developers, and web developers, current US Bureau of Labor Statistics computer occupations include information security analysts, network and computer systems administrators, computer network architects, computer user support specialists, and computer network support specialists.

SOFTWARE DEVELOPMENT IN THE AGE OF ARTIFICIAL INTELLIGENCE

While Simon wrote that “more and more, computers will program themselves” during a time of great anticipation for AI, this anticipation did not come to fruition at that time. Now that we are again at a time of enthusiasm based on recent advances in AI and machine learning, there is a need to take a more thoughtful perspective on how the factors discussed above fit into AI, and on how AI is likely to influence the software development profession.

We consider two different roles of AI for software development: (1) AI as a tool to program software, and (2) AI as the software itself, sometimes referred to as Software 2.0.12 Making some cautious but informed predictions, it is likely that both of these roles will be relevant for software development in the future, and that there will be a place for human software developers in both of these AI roles.

The first role of AI as a tool to program software means that AI directly writes program code or indirectly helps human programmers to write program code, in the sense of instructions for computers. Consequently, the tasks of human software developers will follow the general trajectory of automation and outsourcing, where high-level tasks carried out by human programmers will move to even higher levels of value creation, while lower-level programming tasks will increasingly be performed by AI.13 This is in line with earlier conjectures that emerging technologies can destroy some jobs, and in this case, AI will replace jobs that involve lower-level


programming. Thus, AI will substitute for humans by simplifying the entire job (robots replacing workers) or substituting some activities within a job that are amenable to rule-based logic (similar to automated teller machines taking over some functions of a human teller). These trends have been underway for some time and are already visible across industries.

However, technologies also create new jobs (for example, data scientists), change the mix of jobs in the economy, and alter the nature of activities within jobs.14 For example, AI might complement humans in jobs that require pattern recognition or case-based reasoning. In the case of software development, recent discussions on the role of AI suggest that AI assistance might help human programmers avoid errors and strategic mistakes when coding.15 For example, AI could act as a pair programming partner, reducing the resource needs for the established agile practice of pair programming. Agile practices could also be useful in test-driven development, where humans could focus on writing test cases and AI would create code that satisfies the test.15 In this way, software development would be conducted through human-machine interaction.16 Going beyond software development, other occupations that are menial and/or prone to error, and therefore good candidates for AI-enabled displacement, include cashiers, laboratory technicians, accountants, auditors, and tax preparers.

The second role of AI as the software itself suggests that we would not use traditional programming code (in the sense of instructions as to what the computer should do) but would replace program code with AI (Software 2.0).12 However, we believe that traditional program code will continue to be relevant. For example, Brynjolfsson and Mitchell17 suggest that AI/Software 2.0 is particularly useful for stable tasks. Thus, traditional programming might still be needed in more dynamic environments such as the case of frequently changing customer requests or the context of more exploratory research projects.

It seems feasible that some code will be replaced by AI and that new problems will be addressed through AI instead of traditional code. For example, traditional instructions could be replaced by the weights in a neural network. We see this already in the context of translation services, speech recognition, and video gaming.12 But even in such a context, we would likely still need software developers. The work of software developers might shift away from traditional coding and toward designing and developing the architecture that brings together AI modules to solve a problem, to development tasks related to data governance, and/or to activities requiring judgment rather than activities requiring rule-based decisions. Furthermore, the omnipresence of AI in ubiquitous or experiential computing18 will create a continuing need for software developers. Relatedly, recent efforts to create explainable AI (XAI) or to address ethical questions related to AI, such as bias and discrimination, will likely continue to require software programmers.19

To further explore the potential impacts of AI on various IT occupations, Table 1 shows US Bureau of Labor Statistics 2016-2017 data for the average wage and current number of positions, and 2016-2026 projections for the growth of each occupation. These seven IT occupations each include at least 100,000 employees, and reasonably represent the range of average wages among IT professionals.

The Bureau of Labor Statistics classifies the growth of each occupation into one of four categories: decline, average growth (5 to 9 percent), faster than average growth (10 to 14 percent), and much faster than average growth (15 percent and up). The projected growth for each IT occupation provides some clues for how AI could affect various IT occupations, considering these Bureau of Labor Statistics projections at face value.

The projection that software developer (applications) and web developer occupations are expected to grow much faster than average from 2016-2026 suggests that AI is expected to complement (not displace) traditional programming over the next decade. Similarly, the much faster than average growth projection for information security analysts suggests that AI could create demand for IT professionals who can address both cybersecurity and privacy considerations when bad actors use AI capabilities to design more sophisticated attacks.20


Table 1. 10-year projected growth for various IT occupations. Source: Digest of Education Statistics, 2016, US National Center for Education Statistics, available at https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2017094.

| Position | 2016-2017 wages | 2016-2017 employment | 2016-2026 projected employment growth |
| --- | --- | --- | --- |
| Software developers (applications) | $101,790 | 831,000 | Much faster than average (15%+) |
| Web developers | $67,990 | 163,000 | Much faster than average (15%+) |
| Information security analysts | $95,510 | 100,000 | Much faster than average (15%+) |
| Computer user support specialists | $50,210 | 637,000 | Faster than average (10-14%) |
| Database administrators | $87,020 | 120,000 | Faster than average (10-14%) |
| Network and computer systems administrators | $81,100 | 391,000 | Average (5-9%) |
| Computer network support specialists | $62,340 | 199,000 | Average (5-9%) |
| Computer network architects | $104,650 | 163,000 | Average (5-9%) |

The projection that the computer user support specialist occupation is expected to grow faster than average also suggests a complementary role for AI over the next decade, as additional applications and functionality will draw additional users. Similarly, the need for effective management of additional data that results from additional applications and functionality is consistent with the projection that the database administrator occupation is expected to grow faster than average, reinforcing a complementary role for AI over the next decade. However, the projection that the occupations of network and computer systems administration, computer network support specialists, and computer network architects will grow at only an average rate suggests that AI is expected to automate computation-intensive tasks such as managing the flow of network traffic. It will be interesting to examine the extent to which these projections map to reality as capabilities of AI unfold over time and reveal the extent to which AI complements or substitutes activities in IT occupations.
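The earlier observation in this section that traditional instructions could be replaced by the weights in a neural network can be made concrete with a toy example. The sketch below is not taken from the article or from any cited source: it contrasts a hand-written rule for logical AND with a single perceptron that learns the same behavior from examples. All names (`and_rule`, `train_perceptron`, `and_learned`) and the training settings are hypothetical choices for illustration.

```python
# Traditional code: the behavior is spelled out as an explicit instruction.
def and_rule(a: int, b: int) -> int:
    return 1 if (a == 1 and b == 1) else 0


# "Software 2.0" in miniature: the behavior is stored in learned weights.
def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Fit two weights and a bias with the classic perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (a, b), target in examples:
            activation = weights[0] * a + weights[1] * b + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted
            # Nudge the parameters only when the prediction was wrong.
            weights[0] += learning_rate * error * a
            weights[1] += learning_rate * error * b
            bias += learning_rate * error
    return weights, bias


def and_learned(a: int, b: int, weights, bias) -> int:
    return 1 if (weights[0] * a + weights[1] * b + bias) > 0 else 0


# The training data is generated from the rule itself, purely for illustration.
examples = [((a, b), and_rule(a, b)) for a in (0, 1) for b in (0, 1)]
weights, bias = train_perceptron(examples)

for (a, b), expected in examples:
    print(a, b, "rule:", expected, "learned:", and_learned(a, b, weights, bias))
```

Even in this tiny case, a developer still has to choose the training data, the model form, and the check that the learned behavior matches the intended one, which is in keeping with the article's point that software developers remain needed when weights replace instructions.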

CONCLUSION

AI will bring many changes to the IT profession. While it is difficult to predict precisely how these changes will unfold, just as it was difficult for Simon to predict six decades ago, there are reasons to believe that software development will change and that with appropriate investments in human capital, software programmers should be able to respond to the changes in technologies and customer needs.21

For computer science students, we do not expect any major short-term changes in curricula as students still need to learn the basics of computer programming. However, over time we expect that more entry-level computer programming concepts will trickle down into high school curricula and coding boot camps, and more advanced concepts such as AI and machine learning will extend beyond computer and information science degrees into other majors such as business and


the natural sciences. On the whole, there are reasons to be optimistic about the future of software programmers and IT professionals because the seamless integration of “human and computer intelligence to solve interesting and important problems that impact the future of work, organizations, and broader society” will continue the high demand for their talents and creativity.22

REFERENCES

1. V. Ganesh, “Automation to Kill 70% of IT Jobs,” Hindu BusinessLine, blog, November 2017; https://www.thehindubusinessline.com/info-tech/automation-to-kill-70-of-it-jobs/article9960555.ece.
2. H.A. Simon, “The Corporation: Will It Be Managed by Machines?,” Management and the Corporations, M.L. Anshen and G.L. Bach, McGraw Hill, 1960.
3. S. Mithas and J. Whitaker, “Is the World Flat or Spiky? Information Intensity, Skills and Global Service Disaggregation,” Information Systems Research, vol. 18, no. 3, 2007, pp. 237–259.
4. D. Retelny, M.S. Bernstein, and M.A. Valentine, “No Workflow Can Ever Be Enough: How Crowdsourcing Workflows Constrain Complex Work,” Proc. ACM Human-Computer Interaction, 2017.
5. S. Mithas and H.C. Lucas, “Are Foreign IT Workers Cheaper? U.S. Visa Policies and Compensation of Information Technology Professionals,” Management Science, vol. 56, no. 5, 2010, pp. 745–765.
6. S. Earley et al., “From BYOD to BYOA, Phishing, and Botnets,” IT Professional, vol. 16, no. 5, 2014, pp. 16–18.
7. C.T. Schmidt et al., “How Agile Practices Influence the Performance of Software Development Teams: The Role of Shared Mental Models and Backup,” Proceedings of the 34th International Conference on Information Systems, 2014.
8. K. Schwaber and M. Beedle, Agile Software Development with Scrum, Prentice Hall, 2001.
9. J. Bennett, “Why GM Hired 8,000 Programmers,” The Wall Street Journal, blog, February 2015; http://www.wsj.com/articles/gm-built-internal-skills-to-manage-internet-sales-push-1424200731?KEYWORDS=why+GM+hired.
10. S. Mithas and F.W. McFarlan, “What Is Digital Intelligence?,” IT Professional, vol. 19, no. 4, 2017, pp. 3–6.
11. R. Moncarz, “Training for Techies: Career Preparation in Information Technology,” Occupational Outlook Quarterly, vol. 46, no. 3, 2002, pp. 38–45.
12. A. Karpathy, “Software 2.0,” Medium, blog, November 2017; https://medium.com/@karpathy/software-2-0-a64152b37c35.
13. P. Smith, “SAP Founder and CEO Say Governments Must Act on AI Challenge as Google Lays Out Core Principles,” The Australian Financial Review, June 2018; https://www.afr.com/technology/sap-founder-and-ceo-say-governments-must-act-on-ai-challenge-as-google-lays-out-core-principles-20180609-h116c4.
14. F. Levy and R.J. Murnane, The New Division of Labor: How Computers are Creating The Next Job Market, Russell Sage Foundation, 2004.
15. I. Huston, “AI Is Not the End of Software Developers: A Data Scientist’s Take on Software 2.0,” Built to Adapt, blog, January 2018; https://builttoadapt.io/ai-is-not-the-end-of-software-developers-28d80df3c331.
16. B. McDermott, “Machines Can’t Dream,” SAP, January 2018; https://news.sap.com/2018/01/impact-of-artificial-intelligence-machines-cant-dream.
17. E. Brynjolfsson and T. Mitchell, “What Can Machine Learning Do? Workforce Implications,” Science, vol. 358, no. 6370, 2017, pp. 1530–1534.
18. Y. Yoo, “Computing in Everyday Life: A Call for Research on Experiential Computing,” MIS Quarterly, vol. 34, no. 2, 2010, pp. 213–231.
19. G. Nott, “‘Explainable Artificial Intelligence’: Cracking Open the Black Box of AI,” Computer World, April 2017.
20. I. Bojanova et al., “Cybersecurity or Privacy,” IT Professional, vol. 18, no. 5, 2016, pp. 16–17.
21. S. Murugesan, “Stay Professionally Fit, Always,” IT Professional, vol. 19, no. 6, 2017, pp. 4–7.


22. H. Jain et al., “Special Issue of Information Systems Research-Humans, Algorithms, and Augmented Intelligence: The Future of Work, Organizations, and Society,” Information Systems Research, vol. 29, no. 1, 2018, pp. 250–251.

ABOUT THE AUTHORS

Sunil Mithas is a world-class scholar and professor at the Muma College of Business at the University of South Florida. His research interests include strategies for managing innovation and excellence for corporate transformation, focusing on the role of technology and other intangibles. Mithas is the author of the books Digital Intelligence: What Every Smart Manager Must Have for Success in an Information Age (Finerplanet, 2016) and Dancing Elephants and Leaping Jaguars: How to Excel, Innovate, and Transform Your Organization the Tata Way (2014). He is a member of IT Professional’s editorial board. Contact him at [email protected].

Thomas Kude is an associate professor at ESSEC Business School in France. His current research focuses on digital ecosystems, agile software development, and IT management. In his research, Kude regularly works with companies in the software industry and beyond. Kude received a PhD from the University of Mannheim in Germany, and his work has been published in renowned academic journals and presented at international conferences. Contact him at [email protected].

Jonathan Whitaker is an associate professor at the University of Richmond Robins School of Business. Prior to his academic career, he worked as a technology consultant with Price Waterhouse and A.T. Kearney. He earned an MBA from the University of Chicago and a PhD from the University of Michigan, and his research has been published in leading academic journals and profiled in the Wall Street Journal and MIT Sloan Management Review. Contact him at [email protected].

This article originally appeared in IT Professional, vol. 20, no. 5, 2018.


DEPARTMENT: Diversity and Inclusion

A Different Lens on Diversity and Inclusion:

Creating Research Opportunities for Small Liberal Arts Colleges

Wendi K. Sapp, Oak Ridge National Laboratory
Mary Ann Leung, Sustainable Horizons Institute
Editor: Mary Ann Leung, [email protected]

Much of the focus on diversity originates from the characteristics of an individual: race, gender, socioeconomic status, and so on. However, when we consider the concept of diversity from another angle, what do we see? What are the fundamental reasons for considering diversity? In the world of science, technology, engineering, and mathematics (STEM) research, a magnifying glass has been placed on the inequities among researchers. We have all heard of organizations or movements promoting women and minorities to pursue STEM careers or providing support for those already in the profession. However, there is an additional facet to the idea of underrepresented groups in STEM: small liberal arts colleges. Affiliation with small liberal arts colleges might translate to decreased opportunities. Nevertheless, some programs attempt to bridge this gap.

INFLUENCES OF DIVERSITY

Standard definitions of diversity (from the perspective of protected classes of individuals) have historically centered around a person’s race or ethnicity. More recently, the focus has grown to include socioeconomic status, religion, sex, and age.1 Beyond the perspective of protected classes, a new emphasis has been made on the productivity of diverse groups.2 When a diverse group of people comes together to solve a problem, unique ideas and solutions are created. In the world of STEM, this kind of broad-minded perspective is required to tackle the multifaceted scientific issues that exist today. This is especially true when it comes to computational science and engineering, an innately multidisciplinary field.

Another aspect of diversity that is rarely explored is the individual’s association with an academic institution, which can affect student and faculty opportunities.3 For example, if an undergraduate student wishes to perform medical research, she might find it easier to do so at a university that includes a medical school. Otherwise, that student might need to look for outside opportunities, perhaps as a summer intern, which could be quite competitive and present barriers such as family responsibilities, money for travel, and time commitments.4 Similarly, a member of the faculty might be looking to gather information and data to draw some conclusions about his classroom teaching style, but might not have the luxury of time to devote to it because his emphasis is on performing research in his field. Perhaps if the student and the professor had access to different resources, both could accomplish their goals within their institutions.

Computing in Science & Engineering, July/August 2018. Copublished by the IEEE CS and the AIP. 1521-9615/18/$33 ©2018 IEEE.
ComputingEdge, February 2019. Published by the IEEE Computer Society. 2469-7087/19/$33.00 © 2019 IEEE.

However, no one institution can meet everyone’s needs. There is not a single university that holds all possible resources. Academic institutions focus on aspects of higher learning, which can broadly be boiled down to two categories: creating new knowledge, and imparting knowledge. In these categories is where we find the distinction between research universities and small liberal arts institutions. Granted, other academic institutions might fall into these categories too, such as community colleges, but the focus of this text will be on the aforementioned.

Some of us might be familiar with the differences between research universities and small liberal arts colleges. Probably the most well-known difference between these two institutions is the way that faculty are expected to spend their time. At a large research university, the professors primarily focus on training graduate students, performing research, and publishing the results in journals. This kind of activity contributes knowledge. In contrast, small liberal arts professors spend more time teaching courses and mentoring students.5 Many professors engage their students in research activities, even if resources might be limited. Professors who are employed by small liberal arts colleges typically choose to do so because they enjoy teaching and having a closer relationship with students, which is made possible, in part, by the small class sizes.6 Faculty at small liberal arts colleges understand the importance of undergraduate research experiences. However, developing research projects for undergraduate students is not a simple task.

Small liberal arts colleges, despite some of the resource challenges, successfully produce graduates who will later obtain advanced degrees in research fields. In fact, Cech found evidence of this while analyzing the National Science Foundation’s Survey of Earned Doctorates data.7 Cech states, “Only about 8 percent of students who attend four-year colleges or universities are enrolled in baccalaureate colleges (a category that includes national liberal arts colleges). Among the students who obtain PhDs in science, 17 percent received their undergraduate degree at a baccalaureate college. Thus, these colleges are about twice as productive as the average institution in training eventual PhDs.”8

SMALL LIBERAL ARTS COLLEGES ARE UNDERREPRESENTED

Professors, students, and staff from small liberal arts colleges might be at a disadvantage when it comes to national grants, award programs, research opportunities, and access to educational tools and resources. When applications are made to nationally competitive programs, applications that are affiliated with larger institutions could appear stronger because of their lengthy list of previous scientific activities. These individuals and groups from larger institutions have probably had more opportunities historically than applicants from small liberal arts colleges.

At a small liberal arts college in Frederick, Maryland, Dr. Xinlian Liu enjoys the mentoring relationships he can build with his students. Like many professors who decide to work at small liberal arts colleges, he often finds it tricky to secure opportunities for his students and himself. The time allotted to teaching and spending time with students can reduce the amount of time left for research. This, paired with the added difficulty of accessing supercomputers and other computing resources, causes Xinlian to fear that he will only be able to produce students who can read about technology in class but never get their hands on it. He enjoys teaching courses on high-performance computing (HPC) but states “there are a [lack of] resources available to support curricula such as [HPC].” Upon considering the field of HPC, he feels that students who attend small liberal arts colleges are “among the least represented in the field.” To improve his students’ awareness of computational education, Dr. Liu volunteers as an XSEDE Campus Champion, has founded an Nvidia CUDA Education Center, and mentors Blue Waters undergraduate interns.


OPPORTUNITIES TO ENGAGE WITH SMALL LIBERAL ARTS COLLEGES

Faculty at small liberal arts colleges and large research institutions performed the same duties as graduate students. These individuals begin with similar experiences, but as they advance, their strengths, accomplishments, and preferences often shape careers. After graduation, some individuals might pursue positions that emphasize teaching and administration, while others might seek out posts that emphasize creating new knowledge and significant grant-writing. Ultimately, faculty, regardless of the institution at which they work, have a similar skill set. Additionally, there is an inherent curiosity that often accompanies those involved in STEM fields. This curiosity does not disappear merely because an individual no longer performs research.

Encouraging faculty from small liberal arts colleges to pursue research collaboration opportunities can help improve the quality of research projects by providing fresh perspectives and new skills. Additionally, these professors can include their students in the research activities.

Dr. Liu’s efforts to encourage students to consider careers in HPC had a lasting effect on two of his students in the summer of 2017. The Sustainable Research Pathways (SRP) program, a partnership between the Sustainable Horizons Institute (SHI) and the Lawrence Berkeley National Laboratory (LBNL), was looking for faculty and students from colleges and universities to be matched with researchers at LBNL. Dr. Liu jumped at the opportunity to pursue research collaborations with scientists at LBNL.

During the program’s first three years, it cultivated more than 30 new research collaborations. The program begins with a matching workshop. At this one-day event, faculty showcase their research projects during a poster session while researchers from the laboratories deliver presentations about their work. Afterward, the researchers and faculty participate in speed meetings to explore matches. Once matches are made, plans are hatched to begin the summer research experience. Overall, 87 percent of workshop attendees match with staff, indicating a high degree of mutual interest in collaboration.

Among past workshop attendees, 9 percent of faculty have come from liberal arts colleges (see Figure 1). “The goal of the SRP Program is to not only support faculty in building research collaborations, but also to develop a pipeline of diverse candidates through students who participate.”9 During the summer research program, students accompany faculty who have been matched. So far, 40 students have participated in the SRP program.

Figure 1. The composition of the SRP workshop attendees’ home institutions.


In 2017, after the initial matching event, Dr. Liu learned that he had been matched with researcher Dr. Silvia Crivelli, and that two students would accompany him to LBNL that summer to work on the challenging problem of applying deep learning algorithms to 3D structures.

“Throughout the summer, Dr. Crivelli involved me in several projects she was working on,” Dr. Liu said of the intensive 10-week summer research program. He is sure that the experience has revived his aptitude for performing cutting-edge research. Dr. Crivelli was glad to have Xinlian’s machine learning and convolutional neural networks expertise in the lab. Furthermore, Dr. Crivelli observed of Xinlian that “he seemed to have a passion for research and he doesn’t have many opportunities to pursue it as a professor in a small liberal arts college. I thought the lab internship would be a great opportunity for him to get back to research.” She was optimistic that the summer would be productive and fruitful, and she was right. After a summer of rigorous research activities, the team’s hard work paid off when their poster was accepted to the Supercomputing Conference (SC17).

Tom Corcoran, one of Dr. Liu’s students, says this about the success: “Having my hard work publicly validated and recognized by being accepted into such a prestigious conference will undoubtedly improve my chances of being able to conduct similar work in the future, and to be accepted into a strong graduate program at a good school.” Rafael Zamora-Resendiz, the second student to accompany Dr. Liu to LBNL, also understands the implications of his involvement in a program such as SRP. He believes the SRP program helped to provide a platform for small liberal arts colleges: “I hope our project communicates that students from smaller institutions, like Hood College, are very capable of producing competitive work.”

Both students are planning to pursue their heightened interest in HPC in graduate school. This is a trajectory that neither of them would have considered for themselves before this experience. Tom and Rafael are first-generation scholars who lacked confidence that their applications to graduate school would be competitive. In fact, more than 50 percent of the SRP applicants’ students were first-generation scholars. For most first-generation students, attending college and obtaining a bachelor’s degree can be daunting. Indeed, only 24 percent of first-generation students graduate with a degree after eight years.10 But now, with a prestigious poster acceptance to SC17 and advanced research experience, not only are Rafael and Tom completing their bachelor’s degrees, they are well on their way to graduate school and rewarding careers.

CONCLUSION

After experiencing the success of a program such as SRP, Dr. Liu maintains a hopeful and enthusiastic attitude toward the ongoing research collaborations that resulted from the program. He is excited about “continued work with Dr. Crivelli for years to come.” The SRP workshop events helped to ensure a good match between participants. Xinlian believes this was “the foundation of everything else.” The opportunity for research collaborations with faculty at small liberal arts institutions will have a lasting influence on the students who take part, which leads to greater diversity in STEM research fields.

The impact that the SRP Program has had on Dr. Liu and his students is a lasting one. In addition to the direct benefit to the researchers in SRP, visiting faculty often extend the impact of the program by using their research experience in the classroom at their home institutions, co-publishing papers, presenting findings, and continuing their collaborations at the laboratory during subsequent summers with a new group of students.

ACKNOWLEDGMENTS

The SRP Program is funded through Lawrence Berkeley National Laboratory by the Office of Advanced Scientific Computing Research, US Department of Energy, under contract no. DE-AC02-05CH11231.


REFERENCES

1. T. Haring-Smith, “Broadening Our Definition of Diversity,” Lib. Educ., vol. 98, 2012, pp. 6–13.
2. E. Ostrom, “The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies,” Perspect. Polit., vol. 6, 2008, pp. 828–829.
3. M. Clarke, “The Impact of Higher Education Rankings on Student Access, Choice, and Opportunity,” High. Educ. Eur., vol. 32, 2007, pp. 59–70.
4. M. Falasca, “Barriers to Adult Learning: Bridging the Gap,” Aust. J. Adult Learn., vol. 51, 2011, pp. 583–590.
5. K.A. Feldman, “Class Size and College Students’ Evaluations of Teachers and Courses: A Closer Look,” Res. High. Educ., vol. 21, 1984, pp. 45–116.
6. I. Puri, “Don’t Overlook Liberal Arts Schools: Small Class Size and Access to Faculty,” Huffington Post, 21 June 2016; www.huffingtonpost.com/ishan-puri/dont-overlook-liberal-art_b_10574942.html.
7. T.B. Hoffer et al., “Doctorate Recipients from United States Universities: Summary Report 2000,” Survey of Earned Doctorates, report, Nat'l Science Foundation, 2001.
8. T.R. Cech, “Science at Liberal Arts Colleges: A Better Education?,” Daedalus, vol. 128, 1999, pp. 195–216.
9. H. Haskell and M.A. Leung, “Changing Lives and Building a Pipeline through Sustainable Research Pathways,” Sustainable Horizons Institute, 2017; doi.org/shinstitute.org/changing-lives-and-building-a-pipeline-through-srp.
10. X. Chen and C.D. Carroll, “First-Generation Students in Postsecondary Education: A Look at Their College Transcripts,” Nat'l Center for Education Statistics, government report NCES 2005–171, July 2005.

ABOUT THE AUTHORS

Wendi K. Sapp is a contracted systems engineer and technical writer in the National Center for Computational Sciences at Oak Ridge National Laboratory. Contact her at [email protected].

Mary Ann Leung is president and founder of the Sustainable Horizons Institute. Contact her at [email protected].

This article originally appeared in Computing in Science & Engineering, vol. 20, no. 4, 2018.

THE IOT CONNECTION

P2PLoc: Peer-to-Peer Localization of Fast-Moving Entities

Ashutosh Dhekne, Umberto J. Ravaioli, and Romit Roy Choudhury, University of Illinois at Urbana-Champaign

P2PLoc envisions wearable Internet of Things devices that compute the relative positions of each user, resulting in a topology or configuration of mobile users that can be tracked in real time for group-motion applications.

FROM THE EDITOR

Precise indoor location systems are coming of age, and enabling context-aware operations will result in a wide range of effective Internet of Things (IoT) applications. However, for many of these systems to operate, they need location reference points, or some fixed nodes of known location. This article makes the case that there are many peer-to-peer location applications that only require the relative positions of the mobile nodes, and explores the issues that need to be considered to accurately determine their topology. (Roy Want)

EDITOR ROY WANT, Google; [email protected]

Localization has been extensively studied in various contexts, both indoor and outdoor. Yet, emerging applications continue to ask for new requirements that challenge existing localization mechanisms. For instance, a team of soccer or basketball players might seek their precise positions during a game; this is valuable to coaching and sports analytics applications. As another example, a swarm of wirelessly connected IoT drones carrying chemical probes might need to fly in precise formations to analyze water samples from a polluted lake while constantly reporting their sensor readings and each drone’s relative position in the swarm to a central aggregator. Similarly, an army troop on the ground or a group of first responders in a disaster-relief effort could benefit from the ability to continuously visualize their group’s configuration. However, GPS might not be adequately precise or even available on a battlefield, and environmental infrastructure might not be available, for example, on a basketball court.

As a solution, P2PLoc (peer-to-peer localization) envisions wearable IoT devices on users’ arms or wrists that exchange wireless messages to ultimately compute the relative positions of each group member. The outcome is a topology or configuration of mobile users that can be tracked in real time. We believe this can be a valuable primitive to various group-motion applications.

Existing localization approaches are usually based on creating a large database of received signal strength from a few fixed access points and then matching the measured signal strength to report the approximate location of the user. However, in the sports or army contexts, we might not have the liberty to create such a fingerprint of the entire arena. Instead, we propose to use the time wireless signals take to travel between two devices as a measure of the distance between them. The precision of this time measurement directly correlates with the bandwidth of the wireless signal used. Therefore, we use ultra-wideband (UWB) radios with a GHz bandwidth. When used with a packet-handshake protocol called two-way ranging (TWR), today’s UWB platforms can estimate the distance between two devices with about cm precision without clock synchronization.

A group of players or military personnel can be abstracted as a network topology (see Figure 1), with each node representing an individual and the edges representing the distance between them. Given n nodes, TWR performed between every pair generates the distance of each edge in the network. These distances naturally over-determine the system, producing the relative topology graph (relative because the produced topology could be a rotated version of the true topology) of all the participating devices. Our goal is to localize dynamic nodes whose locations change over time. Tracking mobile nodes requires fast collection time to prevent measurements from becoming too stale. Collecting each pairwise distance is not possible because each TWR handshake consumes time and there are O(n²) pairwise distances to be measured, scaling poorly as the number of devices, n, grows. Of course, instead of over-determining the system through n² measurements, we can still solve the topology with 3n/2 pairwise measurements. This would significantly reduce the total localization time. Thus, the essential question for this approach comes down to which O(n) pairwise distance measurements will result in fast and accurate tracking of the topology.

Figure 1. A group of players abstracted as a graph. Nodes represent the players and edges denote each distance measurement performed.

There are three main factors to consider when choosing the pairs:

› the total number of wireless message exchanges while executing the TWR protocol;
› the geometric dilution of precision, which changes with the topology; and
› occlusions caused by humans that make some links unusable.

TWR PROTOCOL OPTIMIZATIONS

Figure 2a shows the original TWR protocol. It is simply a ping-pong of messages with precisely measured timings at both participating devices. It is comprised of three time-stamped messages exchanged between a device pair: an initiator and a responder. We obtain the time of flight by averaging the difference between the two round-trip times and the two turnaround times. For a group of devices, one would require all three messages to be exchanged between each selected device pair. This would mean 3 · d messages need to be exchanged to obtain d distance measurements (see Figure 2b). Also, we need at least three distance measurements for every node to uniquely solve a topology. By carefully picking the edges, it is possible to obtain a solution in ⌈3n/2⌉ distance measurements for n nodes. Figure 2b shows one such careful choice.

The original TWR protocol is designed for one-to-one distance measurement. However, the broadcast nature of wireless channels permits one-to-many operations, providing an opportunity to reduce the total number of messages exchanged. As shown in Figure 2, the initiator’s POLL message can be heard by all other nodes. They can then take turns to send the RESP message back to the initiator. A single FINAL message then suffices for all the responders to calculate their distance from the initiator. A further optimization is possible in which all initiators take turns to send their POLL messages and the responders take turns as well.
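The timing arithmetic behind a single TWR exchange, and the message budget for one-to-one ranging, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the argument order, and the use of seconds as the time unit are assumptions made here. It relies only on what the text states, namely that time of flight is the average difference between the two round-trip times and the two turnaround times, that each pairwise measurement costs three messages, and that ⌈3n/2⌉ measurements suffice for n nodes.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight(round1, reply1, round2, reply2):
    """One-way time of flight from the four TWR intervals (all in seconds).

    round1: initiator's POLL-to-RESP round-trip time
    reply1: responder's turnaround time between POLL and RESP
    round2: responder's RESP-to-FINAL round-trip time
    reply2: initiator's turnaround time between RESP and FINAL
    Each (round - reply) difference spans two one-way trips, so the
    average of the two differences is halved once more.
    """
    return ((round1 - reply1) + (round2 - reply2)) / 4.0


def distance_m(round1, reply1, round2, reply2):
    """Convert the time of flight to a distance in meters."""
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight(round1, reply1, round2, reply2)


def pairwise_messages(num_distances):
    """POLL, RESP, and FINAL are exchanged for every measured pair."""
    return 3 * num_distances


def min_pairwise_measurements(num_nodes):
    """ceil(3n/2): each node needs three edges, and each edge serves two nodes."""
    return (3 * num_nodes + 1) // 2


# Hypothetical example: two devices 10 m apart with unequal turnaround times.
tof = 10.0 / SPEED_OF_LIGHT_M_PER_S
reply1, reply2 = 200e-6, 150e-6
print(distance_m(2 * tof + reply1, reply1, 2 * tof + reply2, reply2))
print(min_pairwise_measurements(7), pairwise_messages(11))
```

For seven nodes the bound gives 11 measurements, which cost 33 messages under one-to-one TWR; that cost is what the broadcast-based variants aim to reduce.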


In this further optimization, each responder sends only one RESP message. This is followed by the initiators sending FINAL messages. Just three initiators are required to solve the topology. This optimized ranging protocol and the resultant set of distance measurements are shown in Figure 2c and 2d, respectively. While this protocol is suboptimal in the number of distances measured, it is optimal in the number of wireless messages exchanged. Ultimately, minimizing the message exchanges speeds up localization.

Figure 2. (a) The original two-way ranging (TWR) protocol. Each distance measurement needs three messages. (b) For triangulation, each node must have at least three edges. (c) Our modified TWR protocol to minimize the number of wireless messages exchanged in a group of nodes. (d) Only three nodes initiate TWR; a minimal number of messages exchanged ensures faster overall protocol time. Labels recoverable from the panels: (a) 1 distance measurement, 3 messages used; (b) 11 distances measured, 33 messages used; (c) only three nodes behave as initiators, and all initiators also send RESP messages; (d) 15 distances measured, 13 messages used.

DILUTION OF PRECISION

The optimized TWR protocol provides an upper bound on the system update rate for n nodes. However, it makes no claims about the accuracy obtained through a particular choice of three initiator nodes. If all distance measurements were precise, this choice would not matter. However, if distance measurements have even small errors, such as those introduced by hardware noise, then the localization accuracy can be severely affected by the choice of initiator nodes. The dark overlapping area in Figure 3 shows the region of confusion: node T could be anywhere within this region. Observe how the geometry of the initiators (labeled A1, A2, and A3) affects this dark region.

This problem, called geometric dilution of precision (DoP), occurs in GPS receivers as well. GPS DoP solutions2,3 should be applicable in this situation. However, there is a key difference in the way GPS estimates DoP and what would be required in a short-range system like ours. GPS only uses the angles between vectors formed by the initiator positions but ignores the magnitude. While this works reasonably well for GPS (because of the very large distance between Earth and the satellites), ignoring the magnitude can lead to poor choices in short-range systems. Instead, we calculate the estimated localization error directly and select the best initiators.

HUMAN OCCLUSIONS

Accurate distance measurements depend on the wireless device’s ability to identify the direct line of sight (LOS) path between two nodes. This can become challenging due to body blocking when the device is worn by humans. Non-line-of-sight (NLOS) paths can then be misinterpreted as being the first path, causing large ranging errors. Figure 4 demonstrates the impact of body blocking with a set of nodes (blue squares) arranged in a semicircle around a person wearing a UWB device on his or her arm. Distance estimates for devices blocked by the person’s body are significantly scattered and erroneous (red streaks), while those obtained by non-occluded devices are more precise (green dots). Using distance estimates from occluded nodes to solve for the topology can cause severe localization errors. In a fast-moving topology, human occlusions are common and a scheme that does not cater to such situations will fail miserably. Thus, even if DoP considerations indicate a set of initiators to be optimal, occlusions might render that choice infeasible.

Determining occlusion based on link quality between every node pair is time-consuming. Fortunately, every device can overhear all ongoing communication.
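One way to picture the selection step discussed under Dilution of Precision, combined with the point that occlusions can rule out an otherwise good choice, is to score each candidate set of three initiators by a simulated localization error and to discard sets that include an occluded link. The sketch below is an illustration under stated assumptions, not the authors' algorithm: the 2D trilateration, the Gaussian range-noise model, the Monte Carlo estimate, and all names (`trilaterate`, `expected_error`, `pick_initiators`, `occluded_links`) are hypothetical, and it presumes approximate node positions are already known, for example from a previous round.

```python
import itertools
import math
import random


def trilaterate(anchors, dists):
    """Solve for (x, y) from three anchor positions and three distances.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system. Returns None when the anchors are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        return None
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def expected_error(anchors, target, sigma=0.1, trials=500, seed=0):
    """Mean position error (m) when each range carries Gaussian noise."""
    rng = random.Random(seed)
    true_dists = [math.dist(a, target) for a in anchors]
    total = 0.0
    for _ in range(trials):
        noisy = [d + rng.gauss(0.0, sigma) for d in true_dists]
        estimate = trilaterate(anchors, noisy)
        if estimate is None:
            return math.inf
        total += math.dist(estimate, target)
    return total / trials


def pick_initiators(positions, occluded_links=frozenset(), sigma=0.1):
    """Return the feasible trio with the lowest mean estimated error."""
    best_trio, best_err = None, math.inf
    for trio in itertools.combinations(sorted(positions), 3):
        # Assumed policy: skip trios in which an initiator has an occluded link.
        if any(frozenset((i, n)) in occluded_links
               for i in trio for n in positions if n != i):
            continue
        anchors = [positions[i] for i in trio]
        responders = [n for n in positions if n not in trio]
        err = sum(expected_error(anchors, positions[n], sigma)
                  for n in responders) / max(len(responders), 1)
        if err < best_err:
            best_trio, best_err = trio, err
    return best_trio, best_err


# Hypothetical last-known positions in meters.
positions = {"a": (0, 0), "b": (10, 0), "c": (5, 8), "d": (5, 0.2), "e": (5, 3)}
spread = expected_error([(0, 0), (10, 0), (5, 8)], (5, 3))
flat = expected_error([(0, 0), (5, 0.2), (10, 0)], (5, 3))
print(spread, flat)
print(pick_initiators(positions, occluded_links={frozenset(("a", "e"))}))
```

Because the score is computed from positions rather than from angles alone, both the spacing and the geometry of the initiators influence it; in the example, the nearly collinear trio scores worse than the well-spread one.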



to send only one RESP message each. This is followed by the initiators sending FINAL messages. Just three initiators are required to solve the topology. This optimized ranging protocol and the resultant set of distance measurements are shown in Figure 2c and 2d, respectively. While this protocol is suboptimal in the number of distances measured, it is optimal in the number of wireless messages exchanged. Ultimately, minimizing the message exchanges speeds up localization.

Figure 2. (a) The original two-way ranging (TWR) protocol. Each distance measurement needs three messages. (b) For triangulation, each node must have at least three edges. (c) Our modified TWR protocol to minimize the number of wireless messages exchanged in a group of nodes. (d) Only three nodes initiate TWR; a minimal number of messages exchanged ensures faster overall protocol time. [Panel annotations: (a) 1 distance measurement, 3 messages used; (b) 11 distances measured, 33 messages used; (c) only three nodes behave as initiators, and all initiators also send RESP messages; (d) 15 distances measured, 13 messages used.]

DILUTION OF PRECISION

The optimized TWR protocol provides an upper bound on the system update rate for n nodes. However, it makes no claims about the accuracy obtained through a particular choice of three initiator nodes. If all distance measurements were precise, this choice would not matter. However, if distance measurements have even small errors, such as those introduced by hardware noise, then the localization accuracy can be severely affected by the choice of initiator nodes. The dark overlapping area in Figure 3 shows the region of confusion: node T could be anywhere within this region. Observe how the geometry of the initiators (labeled A1, A2, and A3) affects this dark region.

Figure 3. The node’s location can be estimated to be anywhere in the region of confusion. The shape and area of this region depend on both the magnitude and the angle of radius vectors formed by the initiators.

This problem, called geometric dilution of precision (DoP), occurs in GPS receivers as well. GPS DoP solutions [2, 3] should be applicable in this situation. However, there is a key difference in the way GPS estimates DoP and what would be required in a short-range system like ours. GPS only uses the angles between vectors formed by the initiator positions but ignores the magnitude. While this works reasonably well for GPS (because of the very large distance between Earth and the satellites), ignoring the magnitude can lead to poor choices in short-range systems. Instead, we calculate the estimated localization error directly and select the best initiators.

HUMAN OCCLUSIONS

Accurate distance measurements depend on the wireless device’s ability to identify the direct line of sight (LOS) path between two nodes. This can become challenging due to body blocking when the device is worn by humans. Non-line-of-sight (NLOS) paths can then be misinterpreted as being the first path, causing large ranging errors. Figure 4 demonstrates the impact of body blocking with a set of nodes (blue squares) arranged in a semicircle around a person wearing a UWB device on his or her arm. Distance estimates for devices blocked by the person’s body are significantly scattered and erroneous (red streaks), while those obtained by non-occluded devices are more precise (green dots).

Figure 4. Effect of human occlusions on estimated distance. Ultra-wideband (UWB) devices that are blocked by a person’s body obtain erroneous distance estimates. [Labels: occluded device, erroneous distance estimates; non-occluded device, correct distance estimates; human body occluding the device; UWB device strapped to one arm.]

Using distance estimates from occluded nodes to solve for the topology can cause severe localization errors. In a fast-moving topology, human occlusions are common and a scheme that does not cater to such situations will fail miserably. Thus, even if DoP considerations indicate a set of initiators to be optimal, occlusions might render that choice infeasible.

Determining occlusion based on link quality between every node pair is time-consuming. Fortunately, because every device can overhear all ongoing communication, a device can deduce its link quality with all other devices just by listening to the channel without incurring time costs. Each device independently deduces occlusions and produces an exclusion list, which is updated at every round of the pipelined TWR protocol.

EVALUATION PLATFORM AND RESULTS

We invited 10 volunteers to wear UWB arm bands while playing basketball. The volunteers took specific positions on a basketball court, creating a topology. Each UWB node (https://www.decawave.com/products/evk1000-evaluation-kit), shown in Figure 5, ran our modified TWR protocol and chose a set of appropriate initiator nodes based on DoP and occlusions. The volunteers moved into 22 different topologies, mimicking important positions in a basketball game. Overall, the 75th percentile localization accuracy for all the volunteers across all topologies was around 0.8 m. Of course, some topologies provided poor occlusion-free choices, causing a relatively long tail. In a real game, we expect such cases to be few and short-lived. To measure the impact of occlusions alone, we repeated the game by mounting the UWB nodes on tripods. The resulting localization error stayed under 0.2 m, showing the significant impact of human occlusions.

IMPLEMENTATION IN IOT DEVICES

We have discussed specific optimizations and pitfalls in implementing a peer-to-peer relative localization scheme in the context of sports and other group activities. Whereas we used UWB devices for performing the distance measurements, the fundamentals discussed here remain applicable for any ranging technology. Recent advancements in the IEEE 802.11-REVmc protocol [4] allow for wireless time-of-flight measurements on commodity WiFi devices and access points, which can easily be adapted to perform the peer-to-peer measurements envisioned in this article. IoT devices that support this technology can be built today. P2PLoc can thus transform ad hoc playgrounds into sports-analytics arenas without relying on expensive tracking technology.


Figure 5. A wearable arm band carrying a UWB device. [Labels: armband, UWB antenna, power source, UWB device.]

Figure 6. Overall localization error remains under 1 m for most cases, even with human occlusions. Without human occlusions, localization error is under 20 cm. CDF: cumulative distribution function. [Plot: CDF versus localization error (m) for tripod-based and volunteer-based runs; annotated percentiles: 75% = 0.82 m, 90% = 2.65 m.]

The proliferation of IoT devices in everyday sensing and analytics is pushing the envelope for location tracking. P2P location tracking is well suited for many of these applications due to its low energy footprint and extreme robustness under any environmental condition. Despite the challenges of peer-to-peer location tracking, from our results, we see the potential in the feasibility and vast utility of such a primitive. P2PLoc is only a first step in this direction, enabling accurate and fast tracking of a team of devices in the absence of external infrastructure.

REFERENCES

1. R. Want, W. Wang, and S. Chesnutt, “Accurate Indoor Location for the IoT,” Computer, vol. 51, no. 8, 2018, pp. 66–70.
2. R.B. Langley, Dilution of Precision, GPS World, May 1999, pp. 52–59; www2.unb.ca/gge/Resources/gpsworld.may99.pdf.
3. P. Massatt and K. Rudnick, Geometric Formulas for Dilution of Precision Calculations, Navigation, 1990, pp. 379–391; https://onlinelibrary.wiley.com/doi/abs/10.1002/j.2161-4296.1990.tb01563.x.
4. IEEE Std. P802.11-REVmc, IEEE Approved Draft Standard for Information Technology - Telecommunications and Information Exchange between Systems - Local and Metropolitan Area Networks - Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE, 2016; https://ieeexplore.ieee.org/document/7558098.

ASHUTOSH DHEKNE is a PhD student at the University of Illinois at Urbana-Champaign. Contact him at [email protected].

UMBERTO J. RAVAIOLI is an analyst at Toyon Research Corporation. Contact him at um.ravaioli@gmail.com.

ROMIT ROY CHOUDHURY is a professor and Jerry Sanders III Scholar at University of Illinois at Urbana-Champaign. Contact him at croy@illinois.edu.

This article originally appeared in Computer, vol. 51, no. 10, 2018.


DEPARTMENT: WEARABLE COMPUTING

Earables for Personal-Scale Behavior Analytics

Fahim Kawsar, Nokia Bell Labs, Cambridge
Chulhong Min, Nokia Bell Labs, Cambridge
Akhil Mathur, Nokia Bell Labs, Cambridge
Alessandro Montanari, Nokia Bell Labs, Cambridge

Department Editors: Oliver Amft; [email protected]; Kristof Van Laerhoven; [email protected]

The rise of consumer wearables promises to have a profound impact on people’s lives by going beyond counting steps. Wearables such as eSense (an in-ear multisensory stereo device) for personal-scale behavior analytics could help accelerate our understanding of a wide range of human activities in a nonintrusive manner.

The era of wearables has arrived. As more established forms of wearables such as timepieces, rings, and pendants get a digital makeover, they are reshaping our everyday experiences with new, useful, exciting, and sometimes entertaining services. Millions of people now wear commercial wearables on a daily basis to quantify their physical activity, social lifestyle, and health [1]. However, for wearables to have a broader impact on our lives, next-generation consumer wearables must expand their monitoring capabilities beyond a narrow set of exercise-related physical activities.

One of the barriers for modern wearables in modeling richer and broader human activities is the limited diversity of the embedded sensors. Inertial sensors, such as gyroscopes, are constrained to track motion activities; microphones are primarily limited to conversational activity sensing; and radio-wave-based sensing primarily works for proxemic context detection. Also, many noncommercial academic endeavors exploring newer wearables with ambitious modeling targets suffer from poor ergonomics, limiting large-scale studies and experimental trials. We argue that one way to address these challenges is to leverage a form factor with an already established function and augment it with diverse, established sensing modalities while maintaining its ergonomics and comfort.
Indeed, several recent wearable initiatives, including smart eyeglasses [2] and smart wristbands [3], aimed at modeling medical-grade biomarkers but allowed open data access to spur new research. We envision that wearables equipped with multimodal sensing and real-time data accessibility could foster new research leading toward a comprehensive understanding of various human behavioral traits in a nonintrusive manner. Such understanding will uncover opportunities for new and useful applications in the areas of precise, predictive, and personalized medicine; digital, physical, mental, and social well-being; cognitive assistance; and sensory human communication experiences.

To this end, we present eSense, an in-ear, multisensory, high-definition wireless stereo device in an aesthetically pleasing and ergonomically comfortable form factor. eSense offers a set of APIs that allow developers to record real-time data streams from its sensors and to reconfigure system parameters to suit behavioral inferences. We briefly discuss the system’s design and development and its potential in a wide range of research studies and applications in personal-scale behavior analytics.

DESIGN DYNAMICS

Wearables with many embedded sensors are now mainstream. Many of these wearables manifest in traditional forms, for example a wristband, smartwatch, or pendant. As such, a new wearable form demands careful design assessment from multiple perspectives including ergonomics, aesthetics, functionality, and interaction usability. We have chosen the wireless earbud for efficient, robust, and multimodal sensing of behavioral attributes for the following benefits:

• Established functionality. Earbuds are already integrated into people’s lives, and provide access to high-definition music during work, commuting, and exercise. Earbuds also allow people to make hands-free calls. eSense, first and foremost, is a comfortable earpiece capable of producing a high-definition wireless audio experience in a compelling form, but it has added sensing functionalities. Consequently, eSense seamlessly augments established earpieces without demanding changes to users’ current behaviors.
• Unique placement. The ear is a relatively stationary part of the body, so placement of a sensing unit in the ear offers two concrete benefits. First, due to the stationary nature, sensory signals (accelerometer, gyroscope, audio, and so on) are less susceptible to motion artifacts and external noises. As a result, sensor data carries more accurate and precise information for recognizing physical and conversational activities than data from other wearable devices, such as those worn on the wrist. Such signal clarity has a profound impact on the robustness of sensory models. Second, placement in the ear enables earables to monitor head and mouth movements, in addition to whole-body movements, in a noninvasive way. This unique capability creates opportunities for many novel applications in the areas of personal health, dietary monitoring, and attention management.
• Privacy-preserving interaction. Earbuds are intimate and discreet, enabling users to have immediate and hands-free access to information in a privacy-preserving and socially acceptable way. Other wearable form factors typically demand explicit actions and attention from users, but earbuds can deliver auditory information even when a user is mindfully engaged in a physical or social activity.

Wireless earbuds provide users with freedom of movement and hands-free interaction, minimizing situational disability and fragmentation of attention. In addition, earables can be worn for many hours without any impact on primary motor or cognitive activities. These advantages collectively shaped our design decision to select the earbud as an ideal wearable platform for personal-scale behavior analytics.

ESENSE

eSense is designed to be able to track a set of head- and mouth-related behavioral activities including speaking, eating, drinking, shaking, and nodding, as well as a set of whole-body movements. Automatically tracking these activities has profound value to various applications in the areas of quantified lifestyle, computational social science, healthcare, and well-being. While these application areas benefit from a diverse range of sensory signals including audio, motion, orientation, photoplethysmogram (PPG), temperature, and galvanic skin response, we designed eSense with three sensory channels: audio, motion, and Bluetooth Low Energy (BLE) radio. Three aspects influenced this design decision: the physical dimension of the eSense printed circuit board to maintain the aesthetics and comfort, the minimization of signal interference from adjacent sensors, and the maximization of battery life to offer the primary functional service: high-definition music playback.

Figure 1. eSense system. (a) Audio, motion, and Bluetooth Low Energy radio sensing are powered by a CSR processor and a 40-mAh battery. (b) Schematic design. (c) Front and back of the printed circuit board.

Hardware

Two main concerns drove the hardware design of eSense (see Figure 1): physical size and functional requirements. Physically, we wanted eSense to equal the size of a standard wireless earbud (including battery, electronics, and all outside connections) to ensure that it could be worn naturally with comfort. Functionally, we wanted eSense to permit reprogramming the sensors and recharging of the battery by users. Taking both of these concerns into account, we used a custom-designed 15 × 15 × 3 mm PCB. eSense is composed of a Qualcomm CSR8670, a programmable Bluetooth dual-mode flash audio system-on-chip (SoC) with one microphone; a TDK MPU6050 six-axis inertial measurement unit (IMU) including a three-axis accelerometer, a three-axis gyroscope, and a digital motion processor; a two-state button; a circular LED; associated power regulation; and battery-charging circuitry. There is no internal storage or real-time clock. We opted for an ultra-thin 40-mAh LiPo battery to provide the system with power. This battery offers a reasonable energy profile: 3.0 h of inertial sensing at 50 Hz and 1.2 h for simultaneous audio sensing at 16 kHz and inertial sensing at 50 Hz. The carrier casing is equipped with an external battery enabling recharging of eSense earbuds on the go. Each earbud weighs 20 g and is 18 × 20 × 20 mm.

Firmware

We developed an energy-aware firmware that implements the classic Bluetooth stack including the Advanced Audio Distribution Profile (A2DP) for high-definition audio streaming, and mono channel recording. The firmware also implements the full BLE radio stack for delivering the accelerometer and gyroscope data and configuring different parameters. A set of BLE characteristics exposes these functionalities for setting the sampling rate and duty cycle of the microphone and IMU, setting the advertisement packet interval and connection interval of BLE, and receiving the sensor data. Also, we have designed standard BLE characteristics for receiving the battery voltage and advertisement packets. In our design, continuous bi-directional audio streaming uses classic Bluetooth, and motion data streaming uses BLE. To accommodate simultaneous audio and motion data transfer, eSense implements a multiplexing protocol transparently without requiring any modification to the host device stack. Finally, to enable continuous proximity sensing, eSense broadcasts advertisement packets continuously, and the advertisement interval can be configured programmatically to maintain the right balance between battery life and application requirements.
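As a rough back-of-the-envelope check on the battery figures quoted in the Hardware section, the sketch below converts the stated runtimes into an implied average current draw. It assumes the full 40-mAh capacity is usable and that the draw is constant over the discharge, neither of which the article states; the function name is hypothetical.

```python
def average_current_ma(capacity_mah: float, runtime_h: float) -> float:
    """Average current (mA) implied by draining capacity_mah in runtime_h hours."""
    if runtime_h <= 0:
        raise ValueError("runtime must be positive")
    return capacity_mah / runtime_h


CAPACITY_MAH = 40.0  # battery capacity quoted in the article

# Runtimes quoted in the article for the two sensing configurations.
imu_only = average_current_ma(CAPACITY_MAH, 3.0)       # inertial sensing at 50 Hz
audio_and_imu = average_current_ma(CAPACITY_MAH, 1.2)  # audio at 16 kHz plus inertial at 50 Hz

print(f"IMU only:    {imu_only:.1f} mA")
print(f"Audio + IMU: {audio_and_imu:.1f} mA")
```

Under these assumptions the implied draw is roughly 13 mA for inertial sensing alone and roughly 33 mA once audio sensing is added, which gives a sense of why the firmware exposes the sampling rate and duty cycle as configurable parameters.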


Middleware

To support application development with eSense, we have created a thin middleware for the iOS and Android operating systems, and Node.js middleware for desktop platforms. This middleware lets developers connect to and configure eSense and ingest sensory data in real time. It also offers a set of predefined audio- and movement-based context primitives that developers can readily use in their applications in an event-driven manner.
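The middleware’s actual API is not described in the article. Purely to illustrate what consuming context primitives in an event-driven manner could look like, here is a minimal sketch in which every class, method, event name, and payload field is hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ContextEventBus:
    """Hypothetical dispatcher for illustration; not the eSense middleware API."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, primitive: str, handler: Callable[[dict], None]) -> None:
        """Register a callback for a named context primitive, e.g. 'nodding'."""
        self._handlers[primitive].append(handler)

    def emit(self, primitive: str, payload: dict) -> int:
        """Deliver payload to every registered handler and return how many ran."""
        handlers = self._handlers[primitive]
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = ContextEventBus()
seen = []
bus.on("nodding", lambda event: seen.append(event["timestamp_ms"]))

# In a real pipeline a recognizer fed by the sensor stream would call emit().
bus.emit("nodding", {"timestamp_ms": 1200})
print(seen)
```

The point of the pattern is that application code only registers interest in a primitive and never touches raw accelerometer or audio samples.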

SIGNAL CHARACTERISTICS

We have experimentally explored the differential characteristics of the BLE, audio, and inertial signals captured by eSense. Comparing eSense to two other forms, a smartphone and a smartwatch, we looked at several key factors that impact behavior analytics pipelines including sampling variability, the signal-to-noise ratio (SNR), placement invariance, and sensitivity to motion artifacts. The experimental results suggest that eSense is robust in modeling these signals and in most conditions demonstrates comparable and often superior performance in signal stability and noise sensitivity.

Figure 2 shows the SNR of motion and audio signals received under identical physical activity conditions (sitting, standing, and walking) for eSense, an LG Urbane 2 smartwatch, and a Google Nexus 6 smartphone in the pocket. For inertial sensors, we derived the SNR by computing the power ratio of the signal and noise in decibel (dB) scale. The noise profile (electrical noise and sensor biases) was obtained from the stationary states of the devices. We computed the SNR for audio signals as the power ratio of the signal and noise in the context of speech during different physical activities. We obtained the noise profile in the absence of speech signals during these activities; before calculating the SNR in dB scale, this noise was removed from the recorded signal through spectral subtraction. For inertial sensing, we observe that the earbud and smartwatch carry more information than a smartphone about the target physical activities. With respect to audio sensing, the earbud provides the highest SNR in all physical activity conditions owing to the device’s unique placement, which is less affected by the acoustic profile of human activities.
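The inertial SNR computation described above can be sketched as follows. This is a minimal illustration, assuming power is the mean of squared samples after removing a constant bias estimated from the stationary recording; the article does not specify these details, and the function names are hypothetical. The spectral subtraction step used for audio is not shown.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float], bias: float = 0.0) -> float:
    """Mean squared amplitude after subtracting a constant bias."""
    return sum((s - bias) ** 2 for s in samples) / len(samples)


def snr_db(activity: Sequence[float], stationary: Sequence[float]) -> float:
    """SNR in dB, using a stationary recording as the noise profile (assumed method)."""
    bias = sum(stationary) / len(stationary)  # sensor bias estimate
    noise_power = mean_power(stationary, bias)
    if noise_power == 0:
        raise ValueError("stationary recording has zero noise power")
    signal_power = mean_power(activity, bias)
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic single-axis traces: the activity swing is 10x the stationary jitter,
# and both share the same 0.5 offset standing in for a sensor bias.
stationary = [0.5 + 0.01 * (-1) ** i for i in range(100)]
activity = [0.5 + 0.10 * (-1) ** i for i in range(100)]
print(round(snr_db(activity, stationary), 2))
```

A tenfold amplitude ratio corresponds to a hundredfold power ratio, so the synthetic example evaluates to about 20 dB.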

Figure 2. Signal-to-noise ratio (SNR) of accelerometer, gyroscope, and microphone signals of eSense, a smartphone, and a smartwatch under different activity conditions. eSense’s earbud captures differential signals in all cases.

APPLICATION LANDSCAPE

The most plausible consumer-focused applications for multisensory wearables such as eSense are smart music players that can react to social and emotional contexts, fitness trackers, and affective communication tools. Building upon many academic research efforts, several commercial-grade earbuds today such as Bragi’s The Dash, Bose SoundSport, Jabra Elite Sport, and Sony Xperia offer a superior noise-cancelling music experience augmented with fitness coaching. Two other active application areas that are gaining momentum are in-ear personal assistants and real-time language translators. A number of crowdfunded projects are currently exploring these services with wearables.

There are, however, many other application areas in which we envision earables providing significant benefits. In fact, over the past two decades, seminal research in the ubiquitous computing and wearable computing domains has sought to achieve useful, engaging, and sometimes ambitious behavioral analytics with ear-worn sensing devices including continuous monitoring of cardiovascular function, heart rate and stress [4, 5], measurement of oxygen consumption and blood flow [6], tracking eating episodes [7], dietary and swallowing activities [8, 9], and several other biomarkers. Here we briefly look at potential applications for eSense and similar earables in these areas.

Health and Well-being

eSense can be effectively used to monitor head- and mouth-related behavioral activities including speaking, eating, drinking, shaking, and nodding, as well as a set of whole-body movements. It can also be extended to detect more minute head and neck movements to augment various clinical medicine applications related to neck and head injury. Moreover, with eSense’s conversational activity monitoring capabilities, social interactions can be quantified to further help treat different mental health conditions and provide well-being feedback. Figure 3 shows a workplace well-being application that models eSense sensory streams for a variety of physical, digital, and social well-being metrics [10] and provides personalized and actionable feedback in conversational and visual representations.

Figure 3. A personalized well-being feedback app for eSense that captures physical, mental, and social well-being at the workplace.

Cognitive Assistant

In recent years, we have seen the emergence of conversational agents such as Siri, Cortana, and Google Assistant into mainstream use by millions of users on a daily basis. However, these agents are not yet capable of understanding users’ physical, social, and environmental contexts. We envision that a platform such as eSense will pave the path for these agents to be hyperaware of their users’ context, and thereby be able to offer personalized services more persuasively and integrate with our inner cognition loop seamlessly. Also, subtle gestures such as nodding or shaking can act as semi-implicit interaction techniques with these agents. Such cognitive agents will be extremely beneficial for the mobile workforce (for example, field service professionals and retail workers) as well as structured-workplace workers (for example, office workers and call center workers) by providing them with situational guidance and activity-aware recommendations, enabling them to adhere precisely to workplace safety rules and regulations.


Driving Behavior Monitoring

One specialized behavior analytics application for eSense is tracking drivers’ head movements to ensure that drivers are awake, alert, and looking in the right direction. The hands-free and immediate interaction affordance of eSense can also be leveraged as a contextual communication interface to provide drivers with relevant information about their routes, and to prevent them from reading texts while driving.

Contextual Notification Management

Timely delivery of notifications on mobile devices has been an active area of study in the past few years. Researchers have primarily focused on understanding the receptivity of mobile notifications and predicting opportune moments to deliver notifications to optimize metrics such as response time, engagement, and emotion. eSense’s ability to understand situational context could be incorporated into designing effective notification delivery mechanisms in the future.

Lifelogging

eSense could also enable lifelogging using nonvision modalities. While today’s lifelogging applications are primarily vision based, audio, motion, and proximity sensing can collectively identify and capture users’ memorable and vital everyday events and can help them intuitively recall their past experiences.

Computational Social Science Research

There is a rich body of literature on tracking face-to-face interactions and the impact of the space on enabling them. These works often apply to controlled environments and settings such as offices. eSense could be used as a platform for conducting such studies at scale, thereby opening up opportunities for new insights in computational social science.

CONCLUSION

Wearable devices such as eSense hold enormous potential in accelerating our understanding of a wide range of human activities in a nonintrusive manner. As such, we plan to share this platform with ubiquitous computing researchers and create an open data repository for eSense-driven research and research on other wearables. Academic researchers are encouraged to visit http://www.esense.io and register their interest in participating in an early access pilot project in the fourth quarter of 2018 using donated eSense units, associated software libraries, and a data-sharing platform. We will also showcase eSense at the 2018 UbiComp and ISWC conferences. We expect that these efforts will collectively help our community to achieve the next breakthroughs in wearable sensing systems, especially in understanding the dynamics of human behavior in the real world.

REFERENCES 1. O. Amft and K. Van Laerhoven, “What Will We Wear after Smartphones?,” IEEE Pervasive Computing, vol. 16, no. 4, 2017, pp. 80–85. 2. F. Wahl et al., “Personalizing 3D-Printed Smart Eyeglasses to Augment Daily Life,” Computer, vol. 50, no. 2, 2017, pp. 26–35.

+VMZo4FQUFNCFS56 ComputingEdge XXXDPNQVUFSPSHQFSWBTJWFFebruary 2019 IEEE PERVASIVE COMPUTING WEARABLE COMPUTING

3. M.-Z. Poh, N.C. Swenson, and R.W. Picard, “A Wearable Sensor for Unobtrusive, Long-Term Assessment of Electrodermal Activity,” IEEE Trans. Biomedical Eng., vol. 57, no. 5, 2010, pp. 1243–1252.
4. Y. Tomita and Y. Mitsukura, “An Earbud-Based Photoplethysmography and Its Application,” Electronics and Communications in Japan, vol. 101, no. 1, 2018, pp. 32–38.
5. S. Nirjon et al., “MusicalHeart: A Hearty Way of Listening to Music,” Proc. 10th ACM Conf. Embedded Network Sensor Systems (SenSys 12), 2012, pp. 43–56.
6. S.F. LeBoeuf et al., “Earbud-Based Sensor for the Assessment of Energy Expenditure, Heart Rate, and VO2max,” Medicine and Science in Sports and Exercise, vol. 46, no. 5, 2014, pp. 1046–1052.
7. A. Bedri et al., “EarBit: Using Wearable Sensors to Detect Eating Episodes in Unconstrained Environments,” Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 3, 2017; doi.org/10.1145/3130902.
8. K. Taniguchi et al., “Earable RCC: Development of an Earphone-Type Reliable Chewing-Count Measurement Device,” J. Healthcare Eng., 2018; doi.org/10.1155/2018/6161525.
9. O. Amft et al., “Analysis of Chewing Sounds for Dietary Monitoring,” Proc. 7th Int’l Conf. Ubiquitous Computing (UbiComp 05), 2005, pp. 56–72.
10. A. Mashhadi et al., “Understanding the Impact of Personal Feedback on Face-to-face Interactions in the Workplace,” Proc. 18th ACM Int’l Conf. Multimodal Interaction (ICMI 16), 2016, pp. 362–369.

ABOUT THE AUTHORS
Fahim Kawsar leads pervasive systems research at Nokia Bell Labs in Cambridge, UK, and holds a Design United Professorship at TU Delft. Contact him at fahim.kawsar@nokia-bell-labs.com.
Chulhong Min is a research scientist at Nokia Bell Labs in Cambridge, UK. Contact him at [email protected].
Akhil Mathur is a research scientist at Nokia Bell Labs in Cambridge, UK. Contact him at [email protected].
Alessandro Montanari is a research scientist at Nokia Bell Labs in Cambridge, UK. Contact him at [email protected].

This article originally appeared in IEEE Pervasive Computing, vol. 17, no. 3, 2018.

CONFERENCES in the Palm of Your Hand

IEEE Computer Society’s Conference Publishing Services (CPS) is now offering conference program mobile apps! Let your attendees have their conference schedule, conference information, and paper listings in the palm of their hands.

The conference program mobile app works for Android devices, iPhone, iPad, and the Kindle Fire.

For more information please contact [email protected]

Call for Articles

IEEE Software seeks practical, readable articles that will appeal to experts and nonexperts alike. The magazine aims to deliver reliable information to software developers and managers to help them stay on top of rapid technology change. Submissions must be original and no more than 4,700 words, including 250 words for each table and figure.

Author guidelines: www.computer.org/software/author
Further details: [email protected]
www.computer.org/software


Want to know more about the Internet? This magazine covers all aspects of Internet computing, from programming and standards to security and networking. www.computer.org/internet

IEEE Letters of the Computer Society (LOCS) is a rigorously peer-reviewed forum for rapid publication of brief articles describing high-impact results in all areas of interest to the IEEE Computer Society.

Topics include, but are not limited to:
• software engineering and design;
• information technology;
• software for IoT, embedded, and cyberphysical systems;
• cybersecurity and secure computing;
• autonomous systems;
• machine intelligence;
• parallel and distributed software and algorithms;
• programming environments and languages;
• computer graphics and visualization;
• services computing;
• databases and data-intensive computing;
• cloud computing and enterprise systems;
• hardware and software test technology.

EDITOR IN CHIEF
Darrell Long - University of California, Santa Cruz

ASSOCIATE EDITORS
• Dan Feng - Huazhong University of Science and Technology
• Gary Grider - Los Alamos National Laboratory
• Kanchi Gopinath - Indian Institute of Science (IISc), Bangalore
• Katia Obraczka - University of California, Santa Cruz
• Thomas Johannes Emil Schwarz - Marquette University
• Marc Shapiro - Sorbonne-Université–LIP6 & Inria
• Kwang Mong Sim - Shenzhen University

LOCS offers open access options for authors. Learn more about IEEE open access publishing: www.ieee.org/open-access

Learn more about LOCS, submit your paper, or become a subscriber today: www.computer.org/locs

IEEE Security & Privacy magazine provides articles with both a practical and research bent by the top thinkers in the field.
✔ Stay current on the latest security tools and theories and gain invaluable practical and research knowledge,
✔ Learn more about the latest techniques and cutting-edge technology, and
✔ Discover case studies, tutorials, columns, and in-depth interviews and podcasts for the information security industry.

www.computer.org/subscribe
